154 98 6MB
English Pages 253 [250] Year 2021
Lecture Notes in Control and Information Sciences 485
Timm Faulwasser Matthias A. Müller Karl Worthmann Editors
Recent Advances in Model Predictive Control Theory, Algorithms, and Applications
Lecture Notes in Control and Information Sciences Volume 485
Series Editors Frank Allgöwer, Institute for Systems Theory and Automatic Control, Universität Stuttgart, Stuttgart, Germany Manfred Morari, Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, USA Advisory Editors P. Fleming, University of Sheffield, UK P. Kokotovic, University of California, Santa Barbara, CA, USA A. B. Kurzhanski, Moscow State University, Moscow, Russia H. Kwakernaak, University of Twente, Enschede, The Netherlands A. Rantzer, Lund Institute of Technology, Lund, Sweden J. N. Tsitsiklis, MIT, Cambridge, MA, USA
This series reports new developments in the fields of control and information sciences—quickly, informally and at a high level. The type of material considered for publication includes: 1. 2. 3. 4.
Preliminary drafts of monographs and advanced textbooks Lectures on a new field, or presenting a new angle on a classical field Research reports Reports of meetings, provided they are (a) of exceptional interest and (b) devoted to a specific topic. The timeliness of subject material is very important.
Indexed by EI-Compendex, SCOPUS, Ulrich’s, MathSciNet, Current Index to Statistics, Current Mathematical Publications, Mathematical Reviews, IngentaConnect, MetaPress and Springerlink.
More information about this series at http://www.springer.com/series/642
Timm Faulwasser Matthias A. Müller Karl Worthmann •
•
Editors
Recent Advances in Model Predictive Control Theory, Algorithms, and Applications
123
Editors Timm Faulwasser Department of Electrical Engineering and Information Technology, Institute of Energy Systems, Energy Efficiency and Energy Economics TU Dortmund University Dortmund, Germany
Matthias A. Müller Faculty of Electrical Engineering and Computer Science, Institute of Automatic Control Leibniz University Hannover Hannover, Germany
Karl Worthmann Faculty of Mathematics and Natural Sciences, Institute for Mathematics, Optimization-based Control Technische Universität Ilmenau Ilmenau, Germany
ISSN 0170-8643 ISSN 1610-7411 (electronic) Lecture Notes in Control and Information Sciences ISBN 978-3-030-63280-9 ISBN 978-3-030-63281-6 (eBook) https://doi.org/10.1007/978-3-030-63281-6 Mathematics Subject Classification: 93-B52, 93-C10, 93-D15, 49-N35, 90-C90 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Model predictive control (MPC) has nowadays become one of the most successful modern control technologies. In recent years, significant progress has been made in terms of theoretical foundations, computational methods, and application-oriented MPC schemes. This book presents a collection of some of these recent advances. Most of the authors of the chapters in this book were members of a scientific network in the area of MPC funded by the German Research Foundation (DFG). Such scientific networks are intended to foster collaborations between scientists on a specific topic of interest, and this book can hence be seen as an outcome, in which the following research areas were investigated • MPC: Set Point Stabilization, Path Following, and Distributed Control (Grant number WO 2056/1-1) • MPC: Beyond Set Point Stabilization (Grant number WO 2056/4-1). In the following, we give a brief account of the contents of this book, which can be categorized in novel theoretical developments (Chaps. 1–3), advances in optimization algorithms required for real-time implementation of MPC (Chaps. 4–6), and applications of MPC in different fields (Chaps. 7–9).
Theory In Chap. 1, a model predictive path-following control scheme is presented for differentially flat systems. One of the distinctive features of this approach compared to previous work is that no stabilizing terminal constraints are employed in the formulation of the underlying optimization problems. Estimates on the required length of the prediction horizon are rigorously derived such that solvability of the constrained path-following problem is guaranteed, i.e., convergence of the closed-loop trajectory to the pre-specified path and convergence of the path parameter to the end point of the path. To this end, cost controllability, i.e., a key building block for sufficient stability conditions, is established for a large class of v
vi
Preface
differentially flat systems. Subsequently, the design procedure is illustrated for a two-degree-of-freedom robotic arm. Chapters 2 and 3 focus on the topic of economic MPC. The distinctive feature of this type of MPC schemes is that compared to classical stabilizing (tracking) MPC, general (and not necessarily positive definite) cost functions can be considered. Chapter 2 provides a review on different notions of (strict) dissipativity that have been employed in the context of economic MPC. Such dissipativity properties are of crucial importance since they allow to classify the optimal system behavior (given the dynamics, cost function, and constraints) and to conclude closed-loop convergence and stability properties. First, the basic case is treated where steady-state operation is optimal, before optimal periodic operation and even more general (potentially time-varying) optimal operating regimes are considered. It is shown that in all these cases, similar dissipativity conditions can be used. Also, approaches are discussed how these dissipativity properties can be verified for certain classes of systems (in particular linear and nonlinear polynomial systems). In Chap. 3, three classes of economic MPC schemes that have been proposed in recent years are compared, namely, (i) with (standard primal) terminal constraints, (ii) without terminal constraints and terminal cost functions, and (iii) without (primal) terminal constraints but with a specific terminal cost that implies a terminal constraint for dual variables. These three classes of economic MPC schemes are contrasted in terms of (i) the size of the feasible sets of the underlying optimal control problems, (ii) invariance of the optimal steady state under the resulting receding horizon feedback, and (iii) bounds on the required prediction horizon length in order to conclude (practical) asymptotic stability of the resulting closed-loop system. This chapter closes by considering a numerical example.
Algorithms In Chaps. 4–6 the focus is shifted to algorithmic developments. While Chap. 4 is concerned with real-time applicability of MPC in general, Chaps. 5 and 6 deal with distributed computation and control, i.e., the decomposition of the optimal control problem to be solved in each MPC step. Nonlinear economic MPC schemes are considered in Chap. 4. The authors propose an extension of the so-called multi-level iteration scheme. On the one hand, they thoroughly discuss the interplay of exact and approximate Hessian regularization strategies based on inexact derivatives. On the other hand, the applicability is demonstrated in a tutorial example such that the presented findings are illustrated in detail. Moreover, the proper initialization of the optimal control problem shifted in time is addressed. While this task is straightforward in setpoint or tracking MPC, it is much more involved for the economic case. The authors propose several approaches to successfully tackle this problem, they compare them with the standard techniques shift initialization and warm start, and they consider the
Preface
vii
numerically challenging example of wind turbine control to show the effectiveness of the proposed strategies. In Chap. 5, the authors investigate the numerical solution of linear MPC schemes based on quadratic costs subject to polytopic constraints. The resulting large-scale quadratic programs (QPs) are typically solved in an iterative manner. However, due to real-time requirements, the computation is often stopped prematurely, i.e., before convergence to the optimal solution is (approximately) achieved. In this chapter, a real-time iteration based on the alternating direction method of multipliers (ADMM) with an a priori fixed iteration number is considered. The contribution is twofold: On the one hand, the authors make use of the concept of an augmented state-space model to rigorously derive statements w.r.t. closed-loop stability. On the other hand, a detailed sensitivity analysis provides insights in parameter tuning such that approximately optimal performance is realized despite maintaining restrictive real-time requirements. In Chap. 6, the authors provide a tutorial-style introduction to the recently proposed distributed optimization algorithm ALADIN, where the acronym stands for augmented lagrangian-based alternating direction inexact newton. The distinctive feature of ALADIN is its applicability to non-convex optimization problems— in contrast to, e.g., ADMM. Moreover, ALADIN locally exhibits a superlinear or even quadratic convergence and, thus, outperforms state-of-the-art solvers if high-accuracy solutions are required. Moreover, detailed explanations on the numerical implementation are presented and illustrated by conducting numerical case studies.
Applications In Chaps. 7–9, open problems closely linked to particular applications are investigated. This spans from the internet of things (IoT) via chemical processes to non-cooperative control of mobile robots. Here, different and rather new techniques are used: deep learning, Gaussian processes, and state quantization. Chapter 7 deals with systems interconnected either via physical couplings and/or communication links. In particular, applications within the broad theme internet of things (IoT) are considered, e.g., low-power wide area networks (LPWANs). While such use cases come along with a high degree of flexibility, they also exhibit certain characteristic traits, i.e., limited computational power, low memory storage, and strongly limited communication among the subsystems (e.g., micro-controllers and sensors). Hence, the task is the controller design facing these challenges while maintaining privacy and security requirements at the same time. To this end, a combination of explicit and/or robust predictive control schemes is typically employed to deal with the limited resources and the induced additional uncertainties. In this chapter, the authors propose a deep-learning-based approach to approximate the explicit MPC solution and, thus, to successfully resolve this
viii
Preface
conundrum of the seemingly contradicting objectives robustness and low computational power. In Chap. 8, the focus is on modeling the part of the dynamics typically not reflected by first principles. To this end, hybrid Gaussian processes are used. Advantages of this approach are twofold: Firstly, predictions of the future plant behavior are enabled—a conditio sine qua non for advanced process control methodologies like MPC. Secondly, the residual uncertainty is quantified, such that it can be properly taken into account within the employed control scheme. Indeed, Monte Carlo samples even allow to further tighten these constraints and, thus, to ensure joint probabilistic constraint satisfaction during the runtime of the algorithm. Furthermore, a challenging case study on control of a semi-batch reactor is presented to highlight features like data efficiency. In Chap. 9, control of multiple mobile robots in a non-cooperative setting is studied. In this context, communication and computation time are conflicting control objectives. To address both challenges, the author proposes a distributed model predictive control algorithm based on state quantization and so-called differential updates. The former is employed to attenuate the transmission load associated with the exchange of state predictions but requires additional care afterwards to rigorously ensure constraint satisfaction, e.g., to avoid collisions between the mobile robots. The latter, i.e., the differential communication, alleviates communication requirements. In particular, it takes care of not sending redundant information. The applicability of the newly introduced methods is demonstrated by excessive numerical simulations. In summary, this book takes a snapshot on recent developments in theory, algorithms, and applications of predictive control. However, one can expect that especially in the areas of learning-based control and network systems numerous important developments are yet to be made. Dortmund, Germany Hannover, Germany Ilmenau, Germany July 2020
Timm Faulwasser Matthias A. Müller Karl Worthmann
Contents
1 Predictive Path Following Control Without Terminal Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . T. Faulwasser, M. Mehrez, and K. Worthmann
1
2 Dissipativity in Economic Model Predictive Control: Beyond Steady-State Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . M. A. Müller
27
3 Primal or Dual Terminal Constraints in Economic MPC? Comparison and Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . T. Faulwasser and M. Zanon
45
4 Multi-level Iterations for Economic Nonlinear Model Predictive Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. Nurkanović, S. Albrecht, and M. Diehl
65
5 On Closed-Loop Dynamics of ADMM-Based MPC . . . . . . . . . . . . . 107 Moritz Schulze Darup and Gerrit Book 6 Distributed Optimization and Control with ALADIN . . . . . . . . . . . . 135 B. Houska and Y. Jiang 7 Model Predictive Control for the Internet of Things . . . . . . . . . . . . . 165 B. Karg and S. Lucia 8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model Predictive Control of Batch Processes . . . . . . . . . . 191 E. Bradford, L. Imsland, M. Reble, and E. A. del Rio-Chanona 9 Collision Avoidance for Mobile Robots Based on an Occupancy Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 Tobias Sprodowski
ix
Chapter 1
Predictive Path Following Control Without Terminal Constraints T. Faulwasser , M. Mehrez , and K. Worthmann
Abstract We consider model predictive path-following control (MPFC) without stabilizing terminal constraints or costs. We investigate sufficient stability conditions in the framework of cost controllability. Then, we analyze cost controllability for pathfollowing problems of differentially flat systems. Using this result, we establish that under suitable assumptions MPFC without terminal constraints or penalties leads to asymptotic stability provided the prediction horizon is sufficiently long. Further, the proposed methodology allows to quantify a stabilizing prediction horizon. We illustrate our findings considering a robotic manipulator example. We explicitly verify cost controllability and conduct numerical experiments.
1.1 Introduction Recently, besides the usual control problems of setpoint stabilization and trajectory tracking, so-called path-following problems have received a considerable attention. The latter refers to problems, where the reference is a geometric curve in the output space of a system, which should be traversed while the velocity is not specified a priori [1, 24]. Path-following problems are of immense interest in motion planning, robotics, and mechatronics [29, 35]. Moreover, beginning with [10] there has been an active development of MPC schemes to tackle this problem, see, e.g., [2, 6, 12, 21, 34]; the resulting class of schemes is referred to as model predictive path-following
T. Faulwasser TU Dortmund University, Dortmund, Germany M. Mehrez (B) University of Waterloo, Waterloo, Canada e-mail: [email protected] K. Worthmann Technische Universität Ilmenau, Ilmenau, Germany © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Faulwasser et al. (eds.), Recent Advances in Model Predictive Control, Lecture Notes in Control and Information Sciences 485, https://doi.org/10.1007/978-3-030-63281-6_1
1
2
T. Faulwasser et al.
control (MPFC). The main advantages of MPFC are its ability to systematically consider input and state constraints and to treat both trajectory planning and tracking within the same framework. Except for [11, 25], essentially all of the works mentioned above rely on terminal constraints and/or costs to guarantee closed-loop asymptotic stability.1 The present work attempts to close this gap. To this end, we will focus on pathfollowing problems for differentially flat systems where the reference path is given by a flat output. While predictive path following for flat systems has already been considered in [6] (relying on terminal constraints) and in [11] (not providing a full-fledged convergence analysis), we present sufficient stability conditions without terminal constraints. Extending our previous conference paper [25], we show how the concept of cost controllability, see [7] and the references therein, can be combined with a flat parametrization of the path manifold as proposed by [6] to establish a sufficient stability condition for MPFC. Notice that, while the conference paper [25] considered a specific class of mobile robots, here we handle a much richer class of systems. Moreover, we present mild conditions which allow for an explicit construction of the growth bound required in cost controllability. The remainder of the chapter is structured as follows: in Sect. 1.2, we show the class of dynamical systems considered in this study, the path-following control problem, the tailored state set of the augmented system, and the proposed MPFC scheme. Stability results for MPC schemes without terminal conditions are revisited and summarized in Sect. 1.3. Cost controllability is adapted to MPFC and analyzed in Sect. 1.4. A detailed procedure for deriving the growth bound is shown in Sect. 1.5 for a robotic arm example, where a method for determining a stabilizing horizon length is shown. Finally, concluding remarks are discussed in Sect. 1.6. Notation: R denotes the real numbers. The kth derivative of a (sufficiently smooth) function r : I → Rn on an (open) interval I ⊆ R is written as r (k) . If the argument p refers to time, we also use the common abbreviation r˙ = r (1) . Lloc ([0, ∞), Rm ) stands for the space of Lebesgue-measurable and locally absolutely p-integrable functions on the domain [0, ∞) mapping into Rm .2
1.2 Preliminaries and Problem Statement We consider nonlinear control systems of the form x(t) ˙ = f (x(t), u(t)), x(0) = x0 ,
(1.1a)
y(t) = h (x(t)) ,
(1.1b)
1 Reference [21] relies on a contraction constraint, which might lead to a loss of recursive feasibility. 2 We
identify functions of the same equivalence class, i.e., if they coincide everywhere except on a set of measure zero in order to avoid a cumbersome notation.
1 Predictive Path Following Control Without Terminal Constraints
3
where y(t) ∈ Rn y denotes the output at time t ∈ R. We assume that the continuous vector field f : Rn x × Rn u → Rn x is locally Lipschitz w.r.t. its first argument x. A solution of the initial value problem (1.1a), starting at x0 at time 0, driven by nu the input u ∈ L∞ loc ([0, ∞), R ), is denoted as x(·; x 0 , u) on its maximal interval of nx existence. The state x ∈ R and the input u ∈ Rn u are constrained by the compact sets X ⊆ Rn x and U ⊆ Rn u , respectively. The control u is said to be admissible on the interval [0, ∞) if and only if the solution exists and the conditions u(t) ∈ U and x(t; x0 , u) ∈ X are satisfied for all t ∈ [0, ∞). We consider the class of differentially flat systems [15, 23, 31, 32]. Definition 1.1 (Differentially flat system [31]) A variable ξ = (ξ1 , ξ2 , . . . , ξn u ) is called a flat output of system (1.1a) if the following properties are locally satisfied: 1. The variable ξ can be written as a function of the state variables x = (x1 , . . . , xn x ) , the input variables u = (u 1 , . . . , u n u ) , and a finite number of time derivatives of the input variables, i.e., (l ) ξ = g x, u 1 , . . . , u 1(l1 ) , . . . , u n u , . . . , u n unu .
(1.2)
2. The state x and the control u can be expressed as functions of the variable ξ and κ nx a finite number of time derivatives n u of ξ . Hence, there exist maps Φ1 : R → R κ+n u nu and Φ2 : R → R , κ = i=1 ki , such that (k −1) , x = Φ1 ξ1 , . . . , ξ1(k1 −1) , . . . , ξn u , . . . , ξn u nu (k ) u = Φ2 ξ1 , . . . , ξ1(k1 ) , . . . , ξn u , . . . , ξn u nu .
(1.3a) (1.3b)
3. The components of ξ are differentially independent, i.e., they do not fulfill any differential equation. If a flat output exists, system (1.1a) is called (differentially) flat. We remark that the third condition is stated for the sake of completeness. It helps to avoid pathological cases in which the components of the flat output fulfill a homogeneous differential equation, cf. [31, Sect. 2.1]. We will utilize flatness for the further analysis in this paper. Thus, we assume the following. Assumption 1.2 (Flat system) System (1.1a) is differentially flat and (1.1b) is one of its flat outputs. Furthermore, the maps Φ1 , Φ2 given by (1.3) are continuous on sufficiently large subsets I ⊆ Rκ , J = I × Jˆ ⊆ Rκ+n u of their domains such that the following inclusion holds: X × U ⊆ Φ1 (I) × Φ2 (J).
(1.4)
In essence, Assumption 1.2 states that flatness and continuity of the parametrizations (1.3) hold for all (x, u) ∈ X × U.
4
T. Faulwasser et al.
1.2.1 Path-Following Problems Our goal is to drive the flat system (1.1) along a path P, which is given by a parametrization p : [θ¯ , 0] → Rn y of a regular curve in the output space (1.1b), i.e., ¯ 0] : y = p(θ ) . P = y ∈ Rn y | ∃ θ ∈ [θ,
(1.5)
The scalar variable θ is called the path parameter. In general, the path parameter θ is time dependent, but its time evolution t → θ (t) is not fixed a priori. We consider output paths, which are defined in a flat output space of (1.1). The path defined in the flat output can be mapped to the state set X ∈ Rn x . In this case, the result is a manifold referred to as zero path error manifold (or path manifold) [9]. Let p : (θ, θ˙ , θ¨ , . . . , θ (ˆr −1) ) → Rn y ׈r be given by p = p(θ )
d dt
p(θ ) . . .
d(ˆr −1) d(ˆr −1) t
p(θ )
with rˆ = max{k1 , . . . , kn u }. Using this shorthand, possible parametrizations of this manifold are given by Γ = x ∈ Rn x | ∃ θ, θ˙ , . . . , θ (ˆr −1) : x = Φ1 ◦ p := xP θ, θ˙ , . . . , θ (ˆr −1) , (1.6) i.e., xP is a function of the path parameter θ and a finite number of its time derivatives. Analogously, a feedforward input u P ensuring that system (1.1) follows the path (1.5) is given by u P = Φ2 ◦ p := Ψ (θ, θ˙ , . . . , θ (ˆr −1) ) ⊆ Rn u ,
(1.7)
see, e.g., [13]. For the sake of simplicity, we use an integrator chain as a timing law, i.e., the timing of the path parameter θ is specified via the ODE θ (ˆr ) = v,
θ (i) (0) = θ0(i) , i = 0, . . . , rˆ − 1,
(1.8)
where depending on the value of rˆ = max{k1 , . . . , kn u }, the variable v can be regarded as the speed, acceleration, or jerk of the reference. It is worth to be noted that the time evolution t → θ (t)—and thus also the evolution of the reference p(θ (t))— can be controlled via the virtual input v : [0, ∞) → V. In order to avoid impulsive path parameter dynamics (1.8) in the later optimization, the admissible values of the virtual path parameter inputs v are restricted to a compact set V ⊂ R containing 0 in its interior. Moreover, we remark that via (1.6) and (1.7) the output path is transformed into an equivalent state-space manifold. This mapping of paths from the output space to corresponding manifolds in the state space has been considered, for example, in [6,
1 Predictive Path Following Control Without Terminal Constraints
5
9, 11]. We also refer to the works of [4, 27] to [14] and [9, Chap. 4] for a detailed discussion. In many applications it is natural to formulate the path-following problem in the output space. However, similar to [6] and to the end of deriving an MPFC stability proof without terminal constraints, we tackle the reformulated problem in the state space, i.e., we consider the following problem. Problem 1.3 (Output-path following in the state space) Given the flat system (1.1) and the reference path P parametrized by (1.5), design a controller in order to compute u and v such that the following properties hold: 1. Path convergence: the state trajectory converges to the parametrized path, i.e., ˙ . . . , θ (ˆr −1) (t)) = 0. lim x(t) − xP (θ (t), θ(t),
t→∞
2. Convergence on path: the system moves along P in forward direction, i.e., θ˙ (t) ≥ 0 and lim |θ (t)| = 0. t→∞
3. Constraint satisfaction: the state and input constraints x(t) ∈ X and u(t) ∈ U, respectively, are satisfied for all t ∈ [0, ∞). Introducing the auxiliary variable z := (θ, θ˙ , . . . , θ (ˆr −1) ) , path-following problems can be investigated via the following augmented system [12]
x˙ f (x, u) x˙ = = =: f(x, u). z˙ g(z, v)
e x − xP (z) . = z1 θ
(1.9a) (1.9b)
In this representation, the dynamics (1.1a) are augmented by that of the path parameter θ , i.e., by z˙ = g(z, v) = A z z + Bz u, which is simply a state-space representation of (1.8). The two elements of the output (1.9b) are the path-following error e = x − xP (z) and the path parameter θ = z 1 . By referring to the augmented system (1.9a), state-space path-following (Problem 1.3) requires the convergence of the error e and the path parameter θ to 0, which corresponds to the end point of the path. The augmented state vector x := (x , z ) ∈ Rn x +ˆr comprises the state of the system as well as the virtual path parameter state z, while u := (u , v) ∈ Rn u +1 refers to the augmented input. Conceptually similar to [25], and for a parameter ¯ 0), the constraint set Xε ⊂ Rn x +ˆr of the augmented state variable x is defined ε ∈ (θ, as ¯ ε), X × Z, θ ∈ [θ, Xε = (1.10) Γ × Z, θ ∈ [ε, 0],
6
T. Faulwasser et al.
where Z ⊆ Rrˆ holds. The set Z is constructed such that it is a controlled forwardinvariant set ensuring that z 1 = θ ∈ [θ¯ , 0], as well as z 2 = θ˙ ≥ 0. This way, we enforce forward motion along the path. Moreover, the structure of Xε requires the controlled system state to be on the reference path P if the path parameter θ satisfies θ > ε, i.e., close to the end point of the path. This helps to avoid a local stabilizability assumption around the final point of the path, e.g., based on stabilizability of the linearization. We remark that while Xε imposes a constraint on the system motion close to the end point of the path P, it will not be enforced at the end of each prediction horizon (introduced later)—i.e., it does not constitute a stabilizing MPC terminal constraint in the generic sense. Finally, notice that the vector of augmented control actions u := (u , v) ∈ Rn u +1 contains the system control input as well as the virtual control. The input constraint of u is U = U × V. The path-following problem is reduced now to stabilizing the output (1.9b) to zero.
1.2.2 Model Predictive Path Following (MPFC) In [10], it has been suggested to tackle path-following problems via nonlinear MPC. To this end, we define the stage cost : Xε × U → R≥0 by x − xP (z) 2 u − u P (z) 2 , (x, u) = + z v Q R
(1.11)
where xP (z) and u P (z) are calculated via (1.6) and (1.7) and Q = diag(q1 , . . . , qn x , q¯1 , . . . , q¯rˆ ), and R = diag(r1 , . . . , rn u , rv ) are positive definite weighting matrices with units adjusted such that is dimensionless. Note that the stage cost is continuous and satisfies the conditions (x , 0) = 0
inf (x, u) > 0 ∀ x ∈ Xε \ x ,
and
u∈U
(1.12)
where x is the path end point given by
x :=
xP (z), z
.
(1.13)
z=0Rrˆ
For technical reasons we require that the path P ends in a controlled equilibrium of the original system (1.1), which can be formulated as follows. Assumption 1.4 (Path ends in equilibrium) The condition 0 = f(x , 0) holds with u = 0 ∈ int U = int (U × V).
1 Predictive Path Following Control Without Terminal Constraints
7
As standard in MPC, the applied input is based on repeatedly solving an optimal control problem (OCP). That is, at each sampling instance tk = kδ, k ∈ N0 , with δ > 0, we solve an OCP minimizing the cost functional JT (xk , u) =
tk +T
(x(τ ; xk , u), u(τ )) d τ.
(1.14)
tk
The subscript ·k indicates that the corresponding variable corresponds to the kth sampling instant tk . The constant T ∈ (δ, ∞) is the prediction horizon. The OCP to be solved in a receding horizon fashion at the sampling time tk reads VT (xk ) :=
min
u∈L∞ ([tk ,tk +T ],Rn u +1 )
JT (xk , u)
(1.15a)
subject to the constraints, for each τ ∈ [tk , tk + T ), x˙ (τ ) = f(x(τ ), u(τ )), x(tk ) = xk
(1.15b)
x(τ ) ∈ Xε u(τ ) ∈ U.
(1.15c) (1.15d)
The statement of conditions that ensure the existence of optimal solutions to the OCP (1.15) is beyond the scope of this chapter; we refer to [5, 22]. Rather, we assume that an optimal solution exists and is attained in order to avoid technical difficulties.3 Note that the decision variable u in (1.15a) contains the real system input u as well as the virtual path parameter input v. In other words, by solving (1.15) we obtain the system input and the reference evolution at the same time. Hence the time-varying feedback law is given by u (t; x(tk )), t ∈ [tk , tk + δ).
(1.16)
Note that the initial condition xk in the optimization is composed of the system state x(tk ) and the path parameter state z(tk ).4 Moreover, the initial condition for z(tk ) is set as z(tk ) := z(tk ; z k−1 , v ), i.e., the corresponding value of the last predicted trajectory. Due to this memory the feedback (1.16) is a dynamic and not a static one, cf. [12] for a detailed discussion. The state and input constraints of the augmented 3 Indeed, typical results on existence of optimal controls will require the OCP data to be at least twice
continuously differentiable. Interestingly, [28] appears to be one of the few paper leveraging flatness for numerical optimal control. Moreover, the question of how to leverage flatness for analysis of the existence of optimal solutions in a non-differentiable setting appears to be open. 4 In cases where no initial condition at k = 0 is given, one may obtain z(0) via: z(t0 ) = (θ(t0 ), 0, . . . , 0)T θ(t0 ) = arg min h(x0 ) − p(θ). ¯ θ ∈[θ,0]
(1.17a) (1.17b)
Note that this problem might have multiple optimal solutions, and we simply choose one of them.
8
T. Faulwasser et al.
system (1.9a) are enforced by (1.15c) and (1.15d), respectively. We remark that satisfaction of the constraint Xε given by (1.10) requires the system to follow the final part of the path exactly. However, OCP (1.15) does not involve any terminal constraint. Algorithm 1.1 summarizes the proposed MPFC scheme. Algorithm 1.1 MPFC Scheme Set: iteration counter k = 0, prediction horizon T , sampling period δ ∈ (0, T ), and initial condition xˆ := x(tk ) = (x (tk ), z (tk )) ∈ Xε . 1. Compute a minimizing control function u (·) ∈ U such that JT (ˆx, u (·)) = VT (ˆx) holds. 2. Apply the following feedback to the controlled system u (t; x(tk )), t ∈ [tk , tk + δ). 3. Increment k by 1 and set the initial condition xˆ = x(tk ) = (x (tk ), z (tk )), where x(tk ) := x(tk ; xk−1 , u ) and z(tk ) := z(tk ; z k−1 , v ), and go to Step 1.
1.3 MPC Stability and Performance Bounds Asymptotic stability of the path end point x given by (1.13) w.r.t. the proposed MPFC scheme (Algorithm 1.1) yields a solution of Problem 1.3. We show this claim by verifying cost controllability—a sufficient stability condition. To this end, we construct a so-called growth function, which relates the increase of the value function VT w.r.t. the prediction horizon T with the stage cost—uniformly in the state x. Hereby, we exploit differential flatness. We first recapitulate NMPC stability results presented in [7], which extend previously derived results for discrete [16, 19, 37] and continuous-time systems [30] to the class of systems with stabilizable homogeneous approximation. We use the subscript (·)T,δ to denote the MPC closed-loop variables (states and control inputs) of the augmented system (1.9a) for a specific choice of T and δ, where u is applied according to (1.16). As shown in [17], the key idea to ensure stability and performance measures of MPC schemes without stabilizing terminal constraints or costs is the relaxed Lyapunov inequality VT (xT,δ (δ; x)) ≤ VT (x) − αT,δ
δ
(xT,δ (t; x), uT,δ (t; x)) d t
0
with degree of suboptimality αT,δ > 0.
∀ x ∈ Xε , (1.18)
1 Predictive Path Following Control Without Terminal Constraints
9
The following theorem, which combines cost controllability based on the growth condition introduced in [33] and the stability results of [7, 30], is used to determine a prediction horizon T such that Inequality (1.18) holds. Theorem 1.5 Assume existence of a monotonically increasing and bounded function B : R≥0 → R≥0 satisfying Vt (x0 ) ≤ B(t) · (x0 )
∀ t ≥ 0 and x0 ∈ Xε ,
(1.19)
where (x0 ) := inf u∈U (x0 , u). Suppose that, for given sampling period δ > 0, the prediction horizon T > δ is chosen such that the degree of suboptimality is positive, i.e., αT,δ > 0 holds with αT,δ
T
B(t)−1 dt
T
−1
· e− T −δ B(t) dt . := 1 − T T −1 −1 1 − e− δ B(t) dt 1 − e− T −δ B(t) dt e−
δ
Then the relaxed Lyapunov inequality (1.18) as well as the performance estimate u
∞
V∞T,δ (x) := 0
−1 (xT,δ (t; x), uT,δ (t; x)) d t ≤ αT,δ · V∞ (x)
(1.20)
u
are satisfied for all x ∈ Xε , whereby V∞T,δ (x) are the integrated stage cost on the infinite time interval [0, ∞) evaluated along the MPC closed loop. If, in addition, there exists K∞ -functions η, η¯ : R≥0 → R≥0 such that ¯ − x ) ∀ x ∈ Xε η(x − x ) ≤ (x) ≤ η(x holds, the origin is asymptotically stable w.r.t. the MPC closed loop.
(1.21)
While condition (1.21) holds trivially for the chosen stage cost (1.11), verifying cost controllability (1.19) is, in general, non-trivial, see, e.g., [38, 39] for a discrete- and a continuous-time example. Furthermore, we refer to [26] for a detailed discussion on potential relaxations of the properties imposed on the growth function B and or an example that quadratic stage costs do not always work (see, e.g., [7] for possible remedies) and the fact that there always exists a prediction horizon T satisfying αT,δ > 0 under the assumed cost controllability. The next result provides the connection between Theorem 1.5 and Problem 1.3. Proposition 1.6 Consider the MPFC scheme proposed in Algorithm 1.1 and suppose that the assumptions of Theorem 1.5 hold and that for a given sampling period δ the prediction horizon T is chosen such that αT,δ > 0 given in Theorem 1.5 hold. Moreover, suppose that OCP (1.15) is recursively feasible. Then, the MPFC feedback (1.16) solves Problem 1.3. Proof The chosen stage cost (1.11) implies that, as the error e given by (1.9b) tends to 0, the system state x converges to the path P and the path parameter state z
10
T. Faulwasser et al.
converges to 0Rrˆ . This implies, as recursive feasibility of the OCP (1.15) is assumed, the convergence of the state trajectory to x and, thus, an admissible solution of Problem 1.3. Recursive feasibility in nonlinear MPC, as invoked in Proposition 1.6, is investigated for continuous-time systems without terminal constraints and costs in, e.g., [8]. Moreover, in [20], so-called generic terminal ingredients were considered. Also, the latter reference proposes a design procedure for discrete-time systems, its main conceptual ideas should be transferable to a continuous-time setting.
1.4 Cost Controllabiltiy for Differentially Flat Systems In this section, we construct a growth function B satisfying Inequality (1.19) for differentially flat systems. This ensures cost controllability. Then, for given sampling period δ, we can choose a prediction horizon T such that the stability condition αT,δ > 0 holds. Then, using Theorem 1.5, we conclude asymptotic stability of the desired setpoint. In our construction of the growth function B, we exploit the interplay between the structure (1.10) of the constraint set Xε and differentially flat systems. The rationale of the subsequent developments is as follows: Firstly, we construct a bounded function B˜ such that (1.19) is satisfied for all x0 ∈ Xε close to the desired setpoint. This is done by constructing an admissible (open loop) control function u = ux0 : [0, ∞) → U (parametrized in the state x0 ) that steers the corresponding state trajectories (of course, again, parametrized in the state x0 ) to the end point of the path P in finite time tx0 with uniform bound t¯, i.e., tx0 ≤ t¯ for all x0 ∈ Xε . Then, we derive an (absolutely) integrable function c : [0, ∞) → R≥0 , which is independent on x0 ∈ Xε , satisfying (x(t; x0 , ux0 ), ux0 (t)) ≤ c(t) · (x0 )
∀ x0 ∈ Xε
(1.22)
for all t ≥ 0. Since (x(t¯; x0 , ux0 ), ux0 (t)) = 0 holds, we can set c(t) = 0 for all t ≥ t¯. Hence, the function B˜ defined by ˜ B(t) =
t
c(s)d s
(1.23)
0
is bounded and satisfies Inequality (1.19) for all x0 ∈ Xε . Secondly, we modify the auxiliary function B˜ such that it satisfies (1.19) also for states x0 ∈ Xε not contained in the previously considered neighborhood of the desired setpoint. To this end, we use the fact that (·) has a lower bound while differential flatness also implies an upper bound on the value function Vt (·); both uniform w.r.t. the initial state x0 .
1 Predictive Path Following Control Without Terminal Constraints
11
1.4.1 Existence of Feasible Motions for Flat Systems As indicated above, we obtain the growth function B via the construction of feasible pairs (x(t), u(t)), t ∈ [0, ∞). For specific systems such trajectories can be generated using specific characteristics, e.g., symmetries, of the systems in consideration, see, e.g., [25]. Here, however, we propose a different approach using the controllability properties of flat systems. Specifically, we rely on results for trajectory generation of constrained flat systems from [14]. To this end, we recall the following (auxiliary) problem. Problem 1.7 (Constrained setpoint transition) Given the setpoints (x0 , u 0 ) ∈ X × U and (x f , u f ) ∈ X × U of system (1.1) satisfying 0 = f (xi , u i ), i ∈ {0, f }. Compute a finite time t f > 0 and a piecewise-continuous control function u : [t0 , t f ] → U such that x(t; x0 , u) ∈ X ∀ t ∈ [0, t f ], and (x(ti ; x0 , u), u(ti )) = (xi , u i ) ∀ i ∈ {0, f }.
(1.24a) (1.24b)
One can rely on differential flatness to tackle Problem 1.7. As shown in Definition 1.1, differential flatness reduces to the existence of a flat output in which all system states x and control inputs u can be expressed as functions (Φ1 , Φ2 ) of the flat output and a finite number of its time derivatives. As shown in [23, Chap. 7], if we would like to construct a trajectory with the initial and final conditions (x0 , u 0 ) and (x f , u f ), respectively, it suffices to build a corresponding trajectory of the flat output t → y(t). A quite general solution to Problem 1.7, which is based on the topological properties of the set of the admissible steady-state values of the flat output, is presented in [14]. Therein, the main results for the existence of a solution of Problem 1.7 are summarized by the following definition and theorem. ˜ ⊂ Rn y of the strictly Definition 1.8 (Consistent stationary outputs) The set Y steady-state consistent outputs of (1.1) is defined by ˜ := {y = h(x) | ∃ (x, u) ∈ int(X × U), f (x, u) = 0} . Y The proof of the following theorem can be found in [14, Theorem 1]. Theorem 1.9 (Feasibility of Setpoint Transitions) Given a differentially flat system, ˜ of consistent stationary i.e., a system (1.1) satisfying Assumption 1.2, and the set Y outputs, see Definition 1.8. Then, for any pair of setpoints (xi , u i ), i ∈ {0, f } for ˜ ⊂ Rn y such which there exists an open, simply connected, and bounded set K ⊆ Y that h(xi ) ∈ K,
i ∈ {0, f }
(1.25)
holds, the constrained finite-time setpoint transition, i.e., Problem 1.7, is feasible.
12
T. Faulwasser et al.
In essence, Theorem 1.9 ensures the existence of a trajectory t → (x(t), u(t)) for system (1.1) such that Problem 1.7 is satisfied. Importantly, the proof of this result is based on the existence of a sufficiently smooth geometric path connecting the output ˜ In view of the MPFC scheme at hand, the setpoints h(xi ), i ∈ {0, f } through K ⊆ Y. bottleneck of Theorem 1.9 is that it refers to motions between controlled equilibria of (1.1). We will need to address this issue in our further developments.
1.4.2 Construction of a Growth Function To the end of presenting a transparent construction of a growth function, we first make the following assumption. Assumption 1.10 (Constrained path-followability and feasible stopping) ˜ such that P ⊂ K, i.e., in virtue of (i) There exists a simply connected set K ⊆ Y Theorem 1.9 the path P is exactly followable. (ii) For all x0 ∈ X, there exist τˆ ∈ [0, τmax ] and an admissible control function u x0 ∈ L∞ ([0, τˆ ], U) satisfying h(x(τˆ ; x0 , u x0 )) ∈ K. Assumption 1.10 requires that the system can be steered to a (controlled) equilibrium from everywhere in the state constraint set. While for a general system, Assumption 1.10 is a tricky condition, we will see in Sect. 1.5 that it is satisfied for stiff robotic manipulators under rather mild assumptions. Now we have all the elements needed for our main result. Theorem 1.11 (Asymptotic Stability under MPFC) Consider system (1.1) satisfying Assumptions 1.2 and 1.10. Then, there exists a bounded function B satisfying Condition (1.19). Hence, cost controllability holds and, for arbitrary but fixed δ > 0, the origin is asymptotically stable w.r.t. the MPFC closed loop for a sufficiently large prediction horizon T > δ. Proof We provide a framework for the construction of a bounded growth function B satisfying Inequality (1.19). To this end, we partition the set Xε into initial states (1) x0 = (x0 z 0 ) with z 0,1 := z 1 (0) = θ0 ≤ ε and (2) x0 = (x0 z 0 ) with z 0,1 := z 1 (0) = θ0 ≥ ε. For each case, we derive bounded growth functions Bi , i ∈ {1, 2}, satisfying Inequality (1.19) for all states contained in the respective set of initial conditions. Then, the function B : [0, ∞) → R≥0 defined by B(t) := max {B1 (t), B2 (t)}
(1.26)
satisfies Inequality (1.19) for all x0 ∈ Xε . For the construction of each Bi , i ∈ {1, 2}, we proceed as described in the beginning of this section exploiting particular properties depending on the case in consideration. We emphasize that a similar construction was carried out in [38] for classical setpoint stabilization.
1 Predictive Path Following Control Without Terminal Constraints
13
Case 1, i.e., initial conditions x0 = (x0 z 0 ) with z 0,1 := z 1 (0) = θ0 ≤ ε. Here, the minimized stage cost (x0 ) given by x0 − xP (z 0 ) 2 (x0 ) = z0 Q
is uniformly lower bounded by q¯1 ε2 . Next, we construct a (parametrized) controlstate pair (ux0 (t), x(t; x0 , ux0 )), t ∈ R≥0 , such that Condition (1.22) holds with an absolutely integrable (auxiliary) function c1 : [0, ∞) → R≥0 hold. Here, it is essential that c1 does not depend on the parameter x0 . In particular, ux0 steers the system state from x0 to the path end point x . We construct this trajectory over three steps. First step: Since the set of considered initial conditions is compact, invoking Assumption 1.10 indeed yields a time t1 (independent of x0 ) such that all states x0 of system (1.1) can be driven to a controlled equilibrium x(t1 ) using an admissible control u x0 ∈ L1 ([0, t1 ], Rn u ) (if x0 is already a controlled equilibrium, the assertion holds trivially). During the same time interval, we stop the path parameter dynamics (1.8), i.e., z i (t1 ) = 0 holds for i ∈ {2, 3, . . . , rˆ }. The time t1 is chosen such that the constraints on the controls u and v are satisfied and both maneuvers are finished until/before t1 (again, we use the compactness of the considered set of initial conditions). The function c1 (t), t ∈ [0, t1 ], can be computed as the pointwise supremum, which exists due to uniform boundedness of the left-hand side of Inequality (1.22) (compactness, again) and the mentioned uniform lower bound on . We refer to the techniques (admissible swaps, replacements) introduced in [38, Definition 2 and Lemma 3] in order to reduce conservatism in the derived bounds in this and the following steps. Second step: The attained state of system (1.1) is steered to the point x(t2 ) = xP (z(t1 )) on the path xP (z) until time t2 , t2 > t1 , while leaving the path parameter dynamics stationary, i.e., v = 0 for all t ∈ (t1 , t2 ]. Feasibility of this state transition is ensured by means of Theorem 1.9 based on Problem 1.7. Then, the function c1 (t), t ∈ (t1 , t2 ] can be computed analogously to the first step of this procedure (existence and boundedness are inferred from the same arguments as well). Third step: During the time interval [t2 , t3 ], we steer the state x of the augmented (z) z ) |z=0Rrˆ . To this end, system (1.9a) to the end point of the path, i.e., x = (xP we apply the control u(t) = (u P (t) vr e f (t)) , t ∈ (t2 , t3 ], where u P is given by the parameterization (1.7) such that we attain x(t; x(t2 ), u P ) = xP (z(t; z(t2 ), vr e f )). Moreover, we choose the control vr e f such that the path parameter state z reaches the end of the path at time t = t3 , i.e., z(t2 ) = z(t1 )
and
z(t3 ) = 0Rrˆ
(1.27)
14
T. Faulwasser et al.
holds. This state transition of z is realized based on Theorem 1.9 where t3 is set to the maximal time needed for this maneuver. In conclusion, the function c1 (t), t ∈ (t2 , t3 ] is well defined and bounded (analogous line of reasoning as used in the first step) such that Condition (1.22) holds for all initial states considered in Case 1. We emphasize that parts of the maneuver not needed, are not skipped. Rather, one waits until the respective time is reached and, then, continues the next (needed) motion, see, e.g., [38] for further details. Case 2: We consider initial conditions x0 = (x0 z 0 ) with z 1 (0) = θ0 ≥ ε, for which we have x0 = xP (z 0 ) in view of thedefinition of the set Xε , cf. (1.10). Hence, ˆ q¯i z 0,i . we have the equation (x0 ) = z 0 2Q = ri=1 In this case, we generate a trajectory t → (x(t; x0 , ux0 ), ux0 (t)), which directly steers the system state from the initial condition x0 to the path end point x . To this end, we apply the control u(t) = (u P (t) vr e f (t)) , t ≥ 0, where u P (t) is given by the parameterization (1.7) and ensures x(t; x0 , u P ) = xP (z(t; z 0 , vr e f )). Then, the resulting stage cost is simply given by rˆ q¯i z i (t; z 0 , vr e f )2 + rv vr e f (t)2 . (x(t; x0 , ux0 ), ux0 (t)) =
(1.28)
i=0
Moreover, the control vr e f is chosen such that the path parameter state z is steered to the end of the path as t → ∞, i.e., z(t; z 0 , vr e f ) → 0Rrˆ . To be more precise, the control vr e f is set to a state feedback, i.e., vr e f (t) = −K z(t; z 0 , vr e f ) for some gain matrix K , such that asymptotic stability of the origin w.r.t. the closed-loop dynamics of the path parameter state z is ensured; the construction of this maneuver is similar to the analysis presented in [12]. The chosen feedback control results in trajectories z(t) = z 0 e(Az −Bz K )t , i.e., control and solution trajectory of the dynamics matched by the path dynamics resemble a linear system. Hence, taking the definition of the stage cost (1.11) into account, we get the (uniform) estimate (x(t; x0 , ux0 ), ux0 (t)) ≤ c2 (t) (x0 )
∀t ≥ 0
with an integrable function c2 analogously to the linear quadratic case, cf. [19]. The specific construction of the constrained set Xε allows to avoid any cost controllability (see, e.g., [7]), stabilizability w.r.t. the linearized dynamics, or similar assumptions.
1.5 Example–2-DoF Robot In this section, we establish cost controllability for a planar robotic arm with two rigid links to demonstrate the applicability of the procedure outlined in the proof of Theorem 1.11. The model of the robot in consideration is given by
1 Predictive Path Following Control Without Terminal Constraints
x˙ =
x˙ p x˙v
=
xv , M −1 (x p )(u − C(x p , xv )xv − g(x p ))
15
y = x p,
where the vectors x p = (x1 x2 ) ∈ R2 and xv = (x3 x4 ) ∈ R2 are the robot joint angles and speeds, respectively. M : R2 → R2×2 is the (invertible) mass matrix, C : R2 × R2 → R2×2 is the centrifugal and Coriolis forces matrix, and g : R2 → R2 is the vector of gravitational forces. Here, we consider the robot in the horizontal plane, i.e., g(x p ) = 0. Moreover, u = (u 1 u 2 ) ∈ R2 is the vector of the applied torques at each joint, see [12] for further details. Moreover, we impose the state and inputs constraints X = x = (x1 x2 x3 x4 ) ∈ S2 × R2 | (x3 x4 ) ∞ ≤ ω¯ , U = u ∈ R2 | u∞ ≤ u¯ with positive constants ω¯ and u. ¯ The considered system is differentially flat and the output y = x p is one of its flat outputs, for which the maps Φ1 and Φ2 , cf. (1.3), are given by Φ1 = (y y˙ ) ,
Φ2 = M(y) y¨ + C(y, y˙ ) y˙ .
Since the components of y are differentially independent, Assumption 1.2 is satisfied. The desired path to be followed by the considered system is defined as
p1 (θ ) P = y ∈ R2 | θ ∈ [θ¯ , 0] → p(θ ) := p2 (θ )
(1.29)
with (at least) twice continuously differentiable path parametrization p. We define the mapping xP from the path to the system space by xP (θ, θ˙ ) =
p(θ ) , p (θ )θ˙
(1.30)
cf. (1.6), where p (θ ) = ∂ p(θ )/∂θ . Moreover, the corresponding feedforward control u P = ψ(θ, θ˙ , θ¨ ), is ˙ p (θ )θ, ˙ u P = M( p(θ )) p (θ )θ¨ + p (θ )θ˙ 2 + C( p(θ ), p (θ )θ)
(1.31)
cf. (1.7), where p (θ ) = ∂ 2 p(θ )/∂θ 2 . For the considered path-following problem, the path dynamics are
z z˙ 1 = 2 , z˙ 2 v Finally, we define the set
θ z 1 (0) = ˙0 := z 0 . z 2 (0) θ0
(1.32)
16
T. Faulwasser et al.
Fig. 1.1 Visualization of the state set Z given by (1.33) and the open-loop maneuver of the path parameter state z for four different initial conditions for t ∈ [0, t1 ]. Here, t1 = 1/3 s and z¯ 2 = 5
6 5 4 3 2 1 0
-6
-5
-4
-3
-2
z1 ⊂ R2 , Z := [θ¯ , 0] × 0, min z¯ 2 , t1
-1
0
(1.33)
where z¯ 2 is an arbitrary but fixed upper bound on z 2 = θ˙ while the time t1 will be specified later. The special structure of Z ensures that this set is control forward invariant as will be shown later in this section, see Fig. 1.1 for a visualization. In summary, the augmented system (1.9a) is represented by the state vector x = (x z ) ∈ Xε ⊂ R6 and the control u = (u v) ∈ U ⊂ R3 .
1.5.1 Cost Controllability: Growth Function We follow the approach outlined in the proof of Theorem 1.11 to construct a bounded growth function B for the planar robotic arm. Let the weighting matrices Q = diag(q1 , q2 , q3 , q4 , q¯1 , q¯2 ) and R = diag(r1 , r2 , rv ) satisfy the following conditions in order to simplify the upcoming analysis: r1 = r2 , q1 = q2 , q3 = q4 , n :=
q1 q2 q¯1 = = . q3 q4 q¯2
(1.34)
Moreover, we choose the reference path p(θ = z 1 ) in (1.29) and the corresponding full-state path xP (z 1 , z 2 ) in (1.30) as p(z 1 ) = (z 1 z 1 )
and
xP (z 1 , z 2 ) = (z 1 , z 1 , z 2 , z 2 ) ,
(1.35)
respectively. That is, the chosen reference path is a straight-line passing through the origin. Therefore, the minimized stage cost (x0 ), cf. the right-hand side of
1 Predictive Path Following Control Without Terminal Constraints
17
Inequality (1.19), is given by
(x0 ) =
2
qi (x0,i − z 0,1 ) + 2
i=1
4
qi (x0,i − z 0,2 ) + 2
i=3
2
2 q¯i z 0,i .
(1.36)
i=1
Case 1 in the proof of Theorem 1.11, i.e., we begin with the derivation of the growth bound for initial values x0 = (x0 z 0 ) satisfying z 0,1 = θ0 ≤ ε. The goal is to deduce the function B1 in (1.26). Note that is uniformly lower bounded by q¯1 ε2 on the set of considered initial values x0 . We proceed in three steps analogously to those presented in the proof of Theorem 1.11. First step: We, simultaneously, steer the robot initial joint speeds (x0,3 x0,4 ) to (0 0) and the path dynamics speed z 0,2 to 0 in time t1 via the straight-line trajectories, i.e.,
t t , i ∈ {3, 4}, and z 2 (t) = z 0,2 1 − , t ∈ [0, t1 ). xi (t) = x0,i 1 − t1 t1 (1.37) The corresponding trajectories of the joint angular positions and the path parameter read
t xi (t) = x0,i + t x0,i+2 1 − 2t1
t , i ∈ {1, 2}, and z 1 (t) = z 0,1 + t z 0,2 1 − , 2t1
see Fig. 1.1 for a visualization of the path parameter state z resulting from this maneuver. As a result, at time t1 , the joint angles and the path parameter become xi (t1 ) = x0,i + t1 x0,i+2 /2, i ∈ {1, 2},
and
z 1 (t1 ) = z 0,1 + t1 z 0,2 /2.
(1.38)
The open-loop control required to conduct this maneuver is, for t ∈ [0, t1 ],
x3 (t) x0,3 /t1 + C(x(t)) u(t) = −M(x1 (t), x2 (t)) x0,4 /t1 x4 (t)
and
v(t) = −z 0,2 /t1 , (1.39)
where t1 is calculated such that the constraints on u and v are satisfied. As a result of this maneuver, the augmented system state becomes x(t1 ) = (x1 (t1 ), x2 (t1 ), 0, 0, z 1 (t1 ), 0) . Now, using the assumptions q1 = q2 , q3 = q4 , and r1 = r2 in (1.34) leads, for t ∈ [0, t1 ], to the stage cost
18
T. Faulwasser et al. (x(t; x0 , ux0 ), ux0 (t)) = q1
2 2 t (x0,i − z 0,1 ) + t (x0,i+2 − z 0,2 ) 1 − 2t1 i=1 ≤2(x 0,i −z 0,1 )2 +2t 2 (x 0,i+2 −z 0,2 )2 (1−t/(2t1 ))2
4
2 t 2 t + q3 1 − (x0,i − z 0,2 )2 + q¯1 z 0,1 + z 0,2 t 1 − t1 2t1 i=3 2 + q¯2 z 0,2 1−
t t1
2 +t 2 z 2 (1−t/(2t ))2 ≤z 0,1 1 0,2
2
+ r1 u(t) − u P (t)2 + rv v(t)2 ,
where we used z 0,1 z 0,2 ≤ 0. The controls u(t) and v(t) are given by (1.39) while u P (t) is given by (1.31). Thus, employing the remaining assumptions in (1.34), the stage cost can be estimated by
(x(t; x0 , ux0 ), ux0 (t)) ≤ 2q1
2 (x0,i − z 0,1 )2 + r1 u(t) − u P (t)2 i=1
+ q3 2nt 2 (1 − t/(2t1 ))2 + (1 − t/t1 )2
4 (x0,i − z 0,2 )2 i=3
2 + q¯1 z 0,1
2 + q¯2 nt 2 (1 − t/(2t1 ))2 + (1 − t/t1 )2 + rv /t12 z 0,2
for t ∈ [0, t1 ]. Moreover, the term u(t) − u P (t)2 can be estimated by 2r1 (2u) ¯ 2, 2 2 (x0 ) which yields the inequality u(t) − u P (t) ≤ 8r1 u¯ q¯1 ε2 . Consequently, the function c1 : [0, t1 ] → R≥0 of Condition (1.22) can be defined by
(t1 − t)2 c1 (t) = max 2, + max t12
nt 2 (2t1 − t)2 nt 2 (2t1 − t)2 rv , + 2 2 2t1 4t1 q¯2 t12
+
8r1 2 u¯ . q¯1 ε2
We remark that a less conservative estimate (as a linear factor of (x0 )) may be derived also for the term u(t) − u P (t)2 by inspecting (1.39) and (1.31) in more detail; however, we skip these further calculations to keep the presentation technically simple. Second step: We steer the joint angles (x1 (t1 ) x2 (t1 )) to the reference path at the point (z 1 (t1 ) z 1 (t1 )) until time t2 . Theorem 1.9 states that this task is manageable if a geometric path connecting the initial condition of the robot at time t1 to the path exists. Since the desired path of the controlled arm is defined in the joint space, a straight-line path is sufficient, e.g., parametrized by
z 1 (t1 ) − x1 (t1 ) x1 (t1 ) +s , p(s) = x2 (t1 ) z 1 (t1 ) − x2 (t1 )
where the path parameter s ∈ [0, 1] is used for the defined path in order to avoid confusion; the main path is denoted by p(θ ) in (1.29). At the end of this step,
1 Predictive Path Following Control Without Terminal Constraints
19
the robot attains the position (z_1(t_1)  z_1(t_1)  0  0)^⊤. A possible time parameterization of the parameter s can be computed using the polynomial interpolation s(t) = Σ_{i=0}^{3} a_i (t − t_1)^i, t ∈ [t_1, t_2]. Then, employing this parameterization yields the open-loop state trajectory x_i(t) = x_i(t_1) + s(t)(z_1(t_1) − x_i(t_1)) and x_{i+2}(t) = ṡ(t)(z_1(t_1) − x_i(t_1)) for i ∈ {1, 2} and t ∈ [t_1, t_2], where (t_2 − t_1) is the time required to conduct the described maneuver. As the initial and final joint speeds of this maneuver are zero, the time parameterization s(t) is given by

$$s(t) = \frac{3(t - t_1)^2}{(t_2 - t_1)^2} - \frac{2(t - t_1)^3}{(t_2 - t_1)^3}, \quad t \in [t_1, t_2]. \tag{1.40}$$
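To illustrate (1.40) concretely, the following small Python sketch (the chapter's own simulations are run in MATLAB; the maneuver times below are merely hypothetical values) evaluates the cubic interpolation and confirms the boundary conditions s(t_1) = 0, s(t_2) = 1, and ṡ(t_1) = ṡ(t_2) = 0 that motivate this choice.

```python
def s(t, t1, t2):
    # cubic time parameterization (1.40); zero speed at both endpoints
    tau = (t - t1) / (t2 - t1)
    return 3.0 * tau**2 - 2.0 * tau**3

def s_dot(t, t1, t2):
    # time derivative of (1.40)
    tau = (t - t1) / (t2 - t1)
    return (6.0 * tau - 6.0 * tau**2) / (t2 - t1)

t1, t2 = 1.0 / 3.0, 2.0  # hypothetical maneuver times
assert abs(s(t1, t1, t2)) < 1e-12 and abs(s(t2, t1, t2) - 1.0) < 1e-12
assert abs(s_dot(t1, t1, t2)) < 1e-12 and abs(s_dot(t2, t1, t2)) < 1e-12
print("boundary conditions of (1.40) verified")
```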
Additionally, the corresponding open-loop control is

$$u(t) = M(x_1(t), x_2(t)) \begin{pmatrix} \dot{x}_3(t) \\ \dot{x}_4(t) \end{pmatrix} + C(x(t)) \begin{pmatrix} x_3(t) \\ x_4(t) \end{pmatrix}, \quad t \in [t_1, t_2]. \tag{1.41}$$
t_2 is chosen such that the speed limit ω̄ of the joints and the control limit ū are maintained. During this part of the maneuver, the control input v is set to zero; thus, u_P defined by (1.31) is equal to zero. Now, invoking the conditions assumed in (1.34) and the representation (1.38) of x_i(t_1), i ∈ {1, 2}, and z_1(t_1) leads, for t ∈ [t_1, t_2], to the stage cost

$$\ell(x(t; x_0, u_{x_0}), u_{x_0}(t)) = \bar{q}_1 \underbrace{\big[z_{0,1} + t_1 z_{0,2}/2\big]^2}_{\le\, z_{0,1}^2 + t_1^2 z_{0,2}^2/4} + r_1 \|u(t)\|^2 + \big[q_1 (1 - s(t))^2 + q_3 \dot{s}(t)^2\big] \sum_{i=1}^{2} \underbrace{\Big[(x_{0,i} - z_{0,1}) + \frac{t_1}{2}(x_{0,i+2} - z_{0,2})\Big]^2}_{\le\, 2(x_{0,i} - z_{0,1})^2 + t_1^2 (x_{0,i+2} - z_{0,2})^2/2},$$

where we used that z_{0,1} · z_{0,2} ≤ 0 holds. As a result, the function c_1(t), t ∈ (t_1, t_2], of Condition (1.22) can be defined by

$$c_1(t) = \max\Big\{1, \frac{n t_1^2}{4}\Big\} (1 - s(t))^2 + \max\Big\{2, \frac{n t_1^2}{2}\Big\} \frac{\dot{s}(t)^2}{n} + \frac{2 r_1}{\bar{q}_1 \varepsilon^2}\, \bar{u}^2.$$

Fig. 1.2 Visualization of the z-trajectory on [t_2, ∞) for five different initial conditions with z_2(t_2) = 0 (black-filled circles) and ω_n = 1.5

Third step: We steer the augmented state x to the end point of the path, i.e., to x̄_P = (x_P(z), z)|_{z = 0_{R²}}, during the time interval [t_2, ∞). To this end, we apply the control u(t) = (u_P(t), v_ref(t))^⊤, t > t_2, where u_P(t) is given by (1.31) while v_ref(t) is the control input that steers the path parameter z from (z_1(t_1)  0)^⊤ to (0, 0)^⊤ in an admissible way as will be shown later. Applying u_P(t) to the considered system ensures x(t; x(t_2), u_P) = x_P(z(t; z(t_2), v_ref)) for all t ≥ t_2. Therefore, the resulting stage cost for t ≥ t_2 reduces to

$$\ell(x(t; x_0, u_{x_0}), u_{x_0}(t)) = \bar{q}_1 z_1(t)^2 + \bar{q}_2 z_2(t)^2 + r_v\, v_{ref}(t)^2. \tag{1.42}$$
We design v_ref(t), t ≥ t_2, via the state feedback control law v_ref(t) = −K z(t) with the gain matrix K = (k_1  k_2) such that the resulting closed-loop dynamics of the path parameter z, i.e.,

$$\dot{z}(t) = \left(\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \end{pmatrix} K\right) z(t) =: A_{cl}\, z(t),$$

are asymptotically stable and satisfy the constraints Z given by (1.33). The corresponding characteristic equation reads λ² + k_2λ + k_1 = 0. Comparing this equation with the standard second-order characteristic equation λ² + 2ζω_nλ + ω_n² = 0 and choosing k_1 = ω_n² and k_2 = 2ω_n leads to a critically damped response (ζ = 1) w.r.t. the path parameter z. Then, for t ≥ t_2, we have

$$z(t) = e^{A_{cl}(t - t_2)} \begin{pmatrix} z_1(t_1) \\ 0 \end{pmatrix} = e^{-\omega_n (t - t_2)} \underbrace{\begin{pmatrix} 1 + \omega_n (t - t_2) \\ -\omega_n^2 (t - t_2) \end{pmatrix}}_{=:\Delta} z_1(t_1)$$

with z_1(t_1) = z_{0,1} + t_1 z_{0,2}/2. Here, ω_n can be chosen such that the constraints on v_ref and z are satisfied for all t ≥ t_2, see Fig. 1.2. As a result, the stage cost (1.42) can be written as
(x(t; x0 , ux0 ), ux0 (t)) = q¯1 Δ21 + q¯2 Δ22 + rv (k1 Δ1 + k2 Δ2 )2 e−2ωn (t−t2 ) z 1 (t1 )2 .
2 2 Since z 0,1 z 0,2 ≤ 0, we have the estimate z 1 (t1 )2 = z 0,1 + t1 z 0,2 /2 ≤ z 0,1 + 2 2 t1 z 0,2 /4. Now, employing the assumptions in (1.34), c1 (t), t ≥ t2 , can be defined by n · t12 1 2 rv 2 2 Δ1 + Δ2 + c1 (t) = max 1, (k1 Δ1 + k2 Δ2 ) e−2ωn (t−t2 ) . 4 n q¯1
(1.43)
In conclusion, the function c_1 is (absolutely) integrable on [0, ∞) and, thus, the growth bound B_1 calculated according to (1.23) is bounded. Case 2: We consider initial values x_0 = (x_0  z_0)^⊤ with z_{0,1} = θ_0 ≥ ε and x_0 = x_P(z_0). Therefore, the minimized stage cost ℓ*(x_0), cf. (1.36), is given by ℓ*(x_0) = q̄_1 z_{0,1}² + q̄_2 z_{0,2}². The growth bound B_2 in Eq. (1.26) can be obtained as follows. Firstly, we stop the dynamics of the path parameter until time t_1. To this end, we use the same straight-line trajectory of z_2 as in Case 1, cf. Eq. (1.37), while applying the control u_P to the robot arm joints as outlined in the proof of Theorem 1.11. This motion results in z(t_1) = (z_1(t_1)  0)^⊤, where z_1(t_1) is given in Eq. (1.38), while x(t; x_0, u_P) = x_P(z(t; z_0, v_ref)) holds. As a result, the stage cost on the interval [0, t_1] reads
$$\ell(x(t; x_0, u_{x_0}), u_{x_0}(t)) = \bar{q}_1 \underbrace{\Big[z_{0,1} + t\Big(1 - \frac{t}{2t_1}\Big) z_{0,2}\Big]^2}_{\le\, z_{0,1}^2 + (1 - t/(2t_1))^2 t^2 z_{0,2}^2} + \big[\bar{q}_2 (t_1 - t)^2 + r_v\big]\, z_{0,2}^2 / t_1^2,$$
where we again used z_{0,1} z_{0,2} ≤ 0. Now, employing the assumptions in (1.34) on q̄_1 and q̄_2, the function c_2 of Condition (1.22) can be defined by
$$c_2(t) = \max\Big\{1,\ n\Big(t - \frac{t^2}{2 t_1}\Big)^2 + \Big(1 - \frac{t}{t_1}\Big)^2 + \frac{r_v}{\bar{q}_2 t_1^2}\Big\}, \quad t \in [0, t_1].$$
Secondly, we steer the augmented state x to the end point of the path x̄_P during the time interval [t_1, ∞), similar to the third step of Case 1. This allows us to define the function c_2 of Condition (1.22) by c_2(t) := c_1(t + (t_2 − t_1)), cf. (1.43). We remark that the above two steps could be combined into one step, cf. the proof of Theorem 1.11; however, this would lead to even more technical details.
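As a complement to the constructions above, the following Python sketch evaluates the closed-form z-trajectory of the third step and checks the bounds on v_ref and z numerically. The values ω_n = 1.5 and z_1(t_1) = −2π are merely illustrative choices consistent with the case study in Sect. 1.5.2, not part of the derivation.

```python
import numpy as np

def z_traj(tau, wn, z1_t1):
    # closed-form z(t) for the critically damped design; tau = t - t2 >= 0,
    # Delta as defined in the text (without the exponential factor)
    delta = np.array([1.0 + wn * tau, -(wn**2) * tau])
    return np.exp(-wn * tau) * delta * z1_t1

wn, z1_t1 = 1.5, -2.0 * np.pi      # illustrative values only
k1, k2 = wn**2, 2.0 * wn           # critically damped gains (zeta = 1)
taus = np.linspace(0.0, 20.0, 2001)
Z = np.array([z_traj(tau, wn, z1_t1) for tau in taus])
v_ref = -(k1 * Z[:, 0] + k2 * Z[:, 1])       # v_ref(t) = -K z(t)
print("max |v_ref| =", np.abs(v_ref).max())  # compare against the bound on V
print("max z_2     =", Z[:, 1].max())        # compare against the bound on z_2
```

Sweeping ω_n in such a script is one simple way to pick a value for which both printed quantities stay within the admissible sets, which is exactly the requirement stated above.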
1.5.2 Stabilizing Prediction Horizon Length

We use the growth bound B derived in the previous subsection, cf. Eq. (1.26), to determine a stabilizing horizon length T̂ such that the end point of the path is asymptotically stable for the closed-loop dynamics generated by Algorithm 1.1. Using the α_{T,δ}-formula,

$$\hat{T} = \min\{T > 0 \mid \exists\, N \in \mathbb{N} : T = N\delta \text{ and } \alpha_{T,\delta} > 0\}$$

is a stabilizing horizon in view of Theorem 1.5. The horizon T̂ depends on the constraints X_ε and U, the sampling period δ, the weights of the stage cost (1.11), the parameter ε, and the shape of the (reference) path given by (1.35). Here, we choose θ̄ = −2π. Moreover, we choose ω̄ = π [rad/s], ū = 3000 [Nm], the virtual control set V = [−15, 15], the parameter ω_n = 1.5, and the bound z̄_2 = 5. Then, the set Z given by (1.33) reads Z = [−2π, 0] × [0, min(5, z_1/t_1)] ⊂ R². The resulting maneuver times are t_1 = 1/3 [s] and t_2 = 3 max(2π + ε, −ε)/(2ω̄) [s].

Fig. 1.3 Left: effect of changing ε on α_{T,δ} for n = 1. Right: effect of changing the weight ratio n on α_{T,δ} for ε = −3

Figure 1.3 shows the effect of changing ε and the weight ratio n, cf. Eq. (1.34), on α_{T,δ} and thus on the stabilizing horizon length. For the cases shown in the figure, we used the parameters r_v = 10⁻⁶, r_1 = 10⁻⁷, and the sampling time δ = 0.1 s. As can be inferred from the figure, increasing ε results in a shorter stabilizing prediction horizon. The same is observed for increasing the ratio n. However, beyond a certain value, i.e., n = 37, increasing n worsens the estimates on the stabilizing prediction horizon. The stabilizing prediction horizon can be improved by employing δ as a time shift rather than as a sampling time. Here, δ is still the length of the first portion of the MPFC minimizing control function as shown in (1.16), see [40] for further details. In this setting, the stabilizing prediction horizon becomes shorter as δ gets closer to T/2. Figure 1.5 (left) shows the effect of changing ε and n on α_{T,δ} with δ = T/2. As can be seen from the figure, much less conservative stabilizing prediction horizons than the ones presented in Fig. 1.3 can be obtained. Furthermore, we remark that the stabilizing prediction horizon can be improved even further by designing a less conservative maneuver than the one presented in Sect. 1.5.1.
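The defining property of T̂ can be evaluated mechanically once α_{T,δ} is computable. The Python sketch below assumes a user-supplied routine alpha_fn implementing the α_{T,δ}-formula (its closed-form expression from the literature is not restated here) and returns the smallest integer multiple of δ with positive α; alpha_dummy is a placeholder for illustration only.

```python
def stabilizing_horizon(alpha_fn, delta, T_max):
    """Smallest T = N*delta with alpha_fn(T, delta) > 0, cf. the definition
    of T_hat above; alpha_fn is assumed to implement the alpha_{T,delta}-formula."""
    N = 1
    while N * delta <= T_max:
        T = N * delta
        if alpha_fn(T, delta) > 0.0:
            return T
        N += 1
    raise ValueError("no stabilizing horizon found up to T_max")

# usage with a placeholder degradation measure (illustration only):
alpha_dummy = lambda T, delta: 1.0 - 2.0 / T
print(stabilizing_horizon(alpha_dummy, delta=0.1, T_max=50.0))
```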
1.5.3 Simulation Results

Here, we show closed-loop simulations obtained by applying Algorithm 1.1 to the planar robotic arm. The weighting matrices G and R of the stage cost are given by diag(10³, 10³, 10⁻², 10⁻², 1, 10⁻⁵) and diag(10⁻⁶, 10⁻⁶, 10⁻⁷), respectively. We set the parameter ε = −3 and conduct our simulations for the prediction horizon T = 0.5 [s] and the time shift δ = 0.025 [s]. All simulations are run using MATLAB. The OCP (1.15) is formulated using the CasADi toolbox [3] utilizing the interior-point optimization method provided by the IPOPT package, see [36]. Closed-loop trajectories are computed until the condition V_T(x_{T,δ}(δ; x)) ≤ 10⁻⁶ is met, see Fig. 1.4 (left). In all cases, the MPFC Algorithm 1.1 successfully steers the joint angles to the end point on the imposed path. Moreover, the value function V_T is strictly decreasing along all simulated closed-loop trajectories, see Fig. 1.4 (right). This confirms that the relaxed Lyapunov inequality (1.18) is satisfied for the chosen prediction horizon, which allows us to conclude closed-loop stability in a trajectory-based setting, cf. [18]. In turn, this indicates that the growth bound B derived in the previous subsection led to a very conservative stabilizing horizon length. However, we emphasize that the presented general construction is applicable to a broad system class. Finally, we illustrate the closed-loop trajectories of the path parameter state vector z in Fig. 1.5 (right). It can be noticed that for initial conditions with z_{0,2} > 0, the closed-loop control actions first steer the state z toward the z_1-axis, i.e., z_2 ≈ 0, before the state z is controlled such that it converges to the origin; this closed-loop maneuver is very similar to the proposed maneuver for z presented in Sect. 1.5.1.
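The simulations above use the MATLAB interface of CasADi; a structurally analogous Python sketch of one receding-horizon step is shown below. The dynamics f, stage cost, dimensions, discretization, and input bounds are placeholders, since the robot model is not restated here; only the CasADi Opti/IPOPT workflow mirrors the setup described above.

```python
import casadi as ca

def mpc_step(f, stage_cost, x0, N=20, dt=0.025):
    # Multiple-shooting sketch of one MPFC optimization; f and stage_cost
    # are assumed user-supplied CasADi-compatible functions.
    opti = ca.Opti()
    nx, nu = x0.shape[0], 3                  # assumed dimensions
    X = opti.variable(nx, N + 1)
    U = opti.variable(nu, N)
    opti.subject_to(X[:, 0] == x0)
    J = 0
    for k in range(N):
        # explicit Euler discretization of the dynamics (for illustration)
        opti.subject_to(X[:, k + 1] == X[:, k] + dt * f(X[:, k], U[:, k]))
        opti.subject_to(opti.bounded(-1, U[:, k], 1))   # placeholder bounds
        J += dt * stage_cost(X[:, k], U[:, k])
    opti.minimize(J)
    opti.solver("ipopt", {"print_time": False}, {"print_level": 0})
    sol = opti.solve()
    return sol.value(U[:, 0]), sol.value(J)
```

Iterating mpc_step, applying the returned first input over one time shift δ, and re-solving from the new state reproduces the receding-horizon loop of Algorithm 1.1; monitoring the returned cost value corresponds to the stopping criterion on V_T used above.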
Fig. 1.4 Left: closed-loop trajectories of x1 and x2 for four initial conditions (filled circles) with ε = −3. Right: evolution of the corresponding value functions VT
Fig. 1.5 Left: effect of changing n and ε on αT,δ with δ = T /2. Right: closed-loop trajectories of z 1 and z 2 for four initial conditions (filled circles)
1.6 Summary and Conclusions

In this chapter, we considered model predictive path-following control without stabilizing terminal costs or constraints. We established cost controllability, a sufficient stability condition, for a large class of differentially flat systems. To this end, we explicitly constructed the required growth function satisfying the desired relation with the (open-loop) value function. The proof depends on available trajectory generation algorithms, which guarantee the existence of setpoint-transition trajectories for differentially flat systems. Using the presented techniques guarantees that the MPFC value function uniformly decreases along the closed-loop trajectories, which enables us to rigorously show that the path end point is (asymptotically) stabilized. Finally, we demonstrated the applicability of the presented techniques and, in particular, of the proposed MPFC scheme by conducting a case study based on a planar robotic arm system.

Acknowledgements Mohamed W. Mehrez acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC, funding reference number PDF-532957-2019). Karl Worthmann is indebted to the German Research Foundation (DFG; grant WO 2056/6-1) for their support (Heisenberg professorship for optimization-based control).
References

1. Aguiar, A.P., Hespanha, J.P.: Trajectory-tracking and path-following of underactuated autonomous vehicles with parametric modeling uncertainty. IEEE Trans. Autom. Control 52(8), 1362–1379 (2007)
2. Alessandretti, A., Aguiar, P., Jones, C.: Trajectory-tracking and path-following controllers for constrained underactuated vehicles using model predictive control. In: Proceedings of the European Control Conference 2013, Zürich, Switzerland
3. Andersson, J.: A general-purpose software framework for dynamic optimization. Ph.D. thesis, Arenberg Doctoral School, KU Leuven, Department of Electrical Engineering, Belgium (2013)
4. Banaszuk, A., Hauser, J.: Feedback linearization of transverse dynamics for periodic orbits. Syst. Control Lett. 26(2), 95–105 (1995)
5. Berkovitz, L.D.: Optimal Control Theory. Applied Mathematical Sciences. Springer, Berlin (1974)
6. Böck, M., Kugi, A.: Real-time nonlinear model predictive path-following control of a laboratory tower crane. IEEE Trans. Control Syst. Technol. 22(4), 1461–1473 (2014)
7. Coron, J.-M., Grüne, L., Worthmann, K.: Model predictive control, cost controllability, and homogeneity. SIAM J. Control Optim. 58(5), 2979–2996 (2020)
8. Esterhuizen, W., Worthmann, K., Streif, S.: Recursive feasibility of continuous-time model predictive control without stabilising constraints. IEEE Control Syst. Lett. 5(1), 265–270 (2021)
9. Faulwasser, T.: Optimization-Based Solutions to Constrained Trajectory-Tracking and Path-Following Problems. Shaker, Aachen (2013)
10. Faulwasser, T., Findeisen, R.: Nonlinear model predictive path-following control. Nonlinear Model Predictive Control: Towards New Challenging Applications. Lecture Notes in Control and Information Sciences, pp. 335–343. Springer, Berlin (2009)
11. Faulwasser, T., Findeisen, R.: Predictive path following without terminal constraints. In: Proceedings of the 20th International Symposium on Mathematical Theory of Networks and Systems (MTNS), Melbourne, Australia (2012)
12. Faulwasser, T., Findeisen, R.: Nonlinear model predictive control for constrained output path following. IEEE Trans. Autom. Control 61(4), 1026–1039 (2016)
13. Faulwasser, T., Hagenmeyer, V., Findeisen, R.: Optimal exact path-following for constrained differentially flat systems. In: Proceedings of the 18th IFAC World Congress, Milano, Italy, pp. 9875–9880 (2011)
14. Faulwasser, T., Hagenmeyer, V., Findeisen, R.: Constrained reachability and trajectory generation for flat systems. Automatica 50(4), 1151–1159 (2014)
15. Fliess, M., Lévine, J., Martin, P., Rouchon, P.: Flatness and defect of non-linear systems: introductory theory and examples. Int. J. Control 61(6), 1327–1361 (1995)
16. Grimm, G., Messina, M.J., Tuna, S.E., Teel, A.R.: Model predictive control: for want of a local control Lyapunov function, all is not lost. IEEE Trans. Autom. Control 50(5), 546–558 (2005)
17. Grüne, L.: Analysis and design of unconstrained nonlinear MPC schemes for finite and infinite dimensional systems. SIAM J. Control Optim. 48, 1206–1228 (2009)
18. Grüne, L., Pannek, J.: Practical NMPC suboptimality estimates along trajectories. Syst. Control Lett. 58(3), 161–168 (2009)
19. Grüne, L., Pannek, J., Seehafer, M., Worthmann, K.: Analysis of unconstrained nonlinear MPC schemes with varying control horizon. SIAM J. Control Optim. 48(8), 4938–4962 (2010)
20. Köhler, J., Müller, M.A., Allgöwer, F.: A nonlinear model predictive control framework using reference generic terminal ingredients. IEEE Trans. Autom. Control (2019)
21. Lam, D., Manzie, C., Good, M.C.: Model predictive contouring control for biaxial systems. IEEE Trans. Control Syst. Technol. 21(2), 552–559 (2013)
22. Lee, E.B., Markus, L.: Foundations of Optimal Control Theory. The SIAM Series in Applied Mathematics. Wiley, New York (1967)
23. Lévine, J.: Analysis and Control of Nonlinear Systems: a Flatness-Based Approach. Mathematical Engineering. Springer, Berlin (2009)
24. Matschek, J., Bäthge, T., Faulwasser, T., Findeisen, R.: Model predictive path following and tracking: an introduction and perspective. In: Rakovic, S., Levine, J. (eds.) The Handbook of Model Predictive Control, pp. 169–198. Birkhäuser, Basel (2019)
25. Mehrez, M.W., Worthmann, K., Mann, G.K.I., Gosine, R., Faulwasser, T.: Predictive path following of mobile robots without terminal stabilizing constraints. IFAC-PapersOnLine 50(1), 9852–9857 (2017)
26. Müller, M.A., Worthmann, K.: Quadratic costs do not always work in MPC. Automatica 82, 269–277 (2017)
27. Nielsen, C., Maggiore, M.: On local transverse feedback linearization. SIAM J. Control Optim. 47, 2227–2250 (2008)
28. Oldenburg, J., Marquardt, W.: Flatness and higher order differential model representations in dynamic optimization. Comput. Chem. Eng. 26(3), 385–400 (2002)
29. Raczy, C., Jacob, G.: Fast and smooth controls for a trolley crane. J. Decision Syst. 8(3), 367–388 (1999)
30. Reble, M., Allgöwer, F.: Unconstrained model predictive control and suboptimality estimates for nonlinear continuous-time systems. Automatica 48(8), 1812–1817 (2012)
31. Rothfuß, R.: Anwendung der flachheitsbasierten Analyse und Regelung nichtlinearer Mehrgrößensysteme. Fortschr.-Ber. VDI Reihe 8 Nr. 664. VDI Verlag, Düsseldorf (1997)
32. Sira-Ramírez, H., Agrawal, S.K.: Differentially Flat Systems. Control Engineering Series. Marcel Dekker Inc., New York (2004)
33. Tuna, S.E., Messina, M.J., Teel, A.R.: Shorter horizons for model predictive control. In: Proceedings of the American Control Conference, pp. 863–868 (2006)
34. van Duijkeren, N., Faulwasser, T., Pipeleers, G.: Dual-objective NMPC: considering economic costs near manifolds. IEEE Trans. Autom. Control (2018)
35. Verscheure, D., Demeulenaere, B., Swevers, J., De Schutter, J., Diehl, M.: Time-optimal path tracking for robots: a convex optimization approach. IEEE Trans. Autom. Control 54(10), 2318–2327 (2009)
36. Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
37. Worthmann, K.: Estimates on the prediction horizon length in model predictive control. In: Proceedings of the 20th International Symposium on Mathematical Theory of Networks and Systems (MTNS), Melbourne, Australia (2012)
38. Worthmann, K., Mehrez, M.W., Zanon, M., Mann, G.K.I., Gosine, R.G., Diehl, M.: Regulation of differential drive robots using continuous time MPC without stabilizing constraints or costs. IFAC-PapersOnLine 48(23), 129–135 (2015)
39. Worthmann, K., Mehrez, M.W., Zanon, M., Mann, G.K.I., Gosine, R.G., Diehl, M.: Model predictive control of nonholonomic mobile robots without stabilizing constraints and costs. IEEE Trans. Control Syst. Technol. 24(4), 1394–1406 (2016)
40. Worthmann, K., Mehrez, M.W., Mann, G.K.I., Gosine, R.G., Pannek, J.: Interaction of open and closed loop control in MPC. Automatica 82, 243–250 (2017)
Chapter 2
Dissipativity in Economic Model Predictive Control: Beyond Steady-State Optimality M. A. Müller
Abstract This chapter provides a concise survey on different dissipativity conditions that have appeared in the literature on economic model predictive control and discusses their decisive role in this context. In particular, besides the well-studied case where steady-state operation is optimal, we also cover the cases of optimal periodic operation and more general optimal operating conditions. Furthermore, we discuss the computation of storage functions and extensions to time-varying settings.
2.1 Introduction

Within the last decade, the study of economic MPC schemes has received a significant amount of attention. Here, in contrast to standard stabilizing or tracking MPC, some general performance criterion is considered, which is not necessarily related to the tracking of a given setpoint or (dynamic) trajectory. As a result, the employed cost function is not necessarily positive definite with respect to such a setpoint or trajectory. Such general control objectives occur in many applications and can, e.g., correspond to the maximization of a certain product in the process industry (or profit maximization in general), the minimization of energy consumption, or a cost-efficient scheduling of a production process, to name but a few. These types of applications also served as motivation for the term economic MPC [3, 15, 32]. When considering such general cost functions in the context of MPC, various questions arise. The first immediate issue is that of determining what the optimal operating conditions are for given system dynamics, cost function, and constraints. This means to assess whether it is optimal (in terms of the employed cost function) to operate the system at the best steady state, at some periodic orbit, or under some more complex operating conditions (e.g., some general set or time-varying trajectory). The second question which directly follows is whether the closed-loop system resulting from application of an economic MPC scheme converges to the optimal regime of operation, i.e., whether the closed-loop system "does the right thing". For example, in case of optimal periodic operation, the closed-loop system should converge to the optimal periodic orbit. In order to answer the above questions, certain dissipativity conditions have turned out to play a crucial role in the context of economic MPC. In the literature, first the most basic case where steady-state operation is optimal has been studied, which is by now fairly well understood. On the other hand, results for more general cases which go beyond steady-state optimality have only been obtained recently and the picture is (at least partially) still much less complete here. The goal of this chapter is to give a concise survey on the dissipativity conditions that have appeared in the economic MPC literature and to discuss their decisive role in this context. The structure of this chapter is as follows. After the presentation of the setup and a brief introduction to the concept of dissipativity (Sects. 2.2 and 2.3), we start with the basic case of optimal steady-state operation (Sect. 2.4), and then move on to the cases of optimal periodic operation (Sect. 2.5) and more general optimal regimes of operation (Sect. 2.6). After that, we briefly discuss how the employed dissipativity conditions can be verified in Sect. 2.7. In Sect. 2.8, we discuss the time-varying case, before concluding this chapter in Sect. 2.9. We remark that we do not aim at providing a survey on all existing economic MPC approaches, but focus on the role played by dissipativity in the context of economic MPC. For a more comprehensive introduction to economic MPC in general, the interested reader is, e.g., referred to the recent survey article [15] or book [12].
2.2 Setup

We consider discrete-time nonlinear systems¹ of the form

$$x(k + 1) = f(x(k), u(k)), \quad x(0) = x_0, \tag{2.1}$$

with k ∈ N_0. The input and state are required to satisfy pointwise-in-time input and state constraints, respectively, i.e.,² u(k) ∈ U and x(k) ∈ X for all k ∈ N_0. Define U_N(x) := {u ∈ U^N : x_u(k; x) ∈ X for all k = 0, ..., N} for N ∈ N ∪ {∞} as the set of feasible input sequences of length N. The performance criterion to be optimized is specified by the stage cost function ℓ : R^n × R^m → R, which is assumed to be continuous. As already discussed in Sect. 2.1, in economic MPC no conditions (other than continuity) are imposed on the stage cost ℓ, in particular no positive definiteness requirements. The MPC optimization problem at time t, given the current measured state x̂ := x(t), is then given by
1 Some
of the following results are also available for continuous-time systems, compare, e.g., [13, 14]. 2 Also coupled constraints of the form (x, u) ∈ Z ⊆ Rn × Rm can be considered.
$$V(\hat{x}) = \inf_{u \in U_N(\hat{x})} \sum_{k=0}^{N-1} \ell(x_u(k; \hat{x}), u(k)), \tag{2.2}$$
and the input u(t) = u*(0; x̂) is applied to system (2.1) in a standard receding-horizon fashion. Various existing economic MPC schemes employ additional terminal constraints and/or a terminal cost function in (2.2). The dissipativity conditions which are discussed in the following play a crucial role in both settings, with and without such additional terminal ingredients.
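For illustration, the following Python sketch applies (2.2) in the described receding-horizon fashion to a toy scalar example; the dynamics f and the (non-positive-definite) economic stage cost ℓ are hypothetical placeholders, and a generic NLP solver stands in for whatever optimizer one would actually use.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x, u: 0.9 * x + u        # placeholder dynamics (2.1)
ell = lambda x, u: u**2 - x         # placeholder economic stage cost
N, x = 10, 1.0                      # horizon and initial state

def cost(useq, x0):
    J, xk = 0.0, x0
    for u in useq:                  # sum of stage costs as in (2.2)
        J += ell(xk, u)
        xk = f(xk, u)
    return J

for t in range(30):
    res = minimize(cost, np.zeros(N), args=(x,), bounds=[(-1, 1)] * N)
    u0 = res.x[0]                   # u(t) = u*(0; x_hat)
    x = f(x, u0)                    # apply first input, then re-optimize
print("state after 30 closed-loop steps:", x)
```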
2.3 Dissipativity

The concept of dissipativity was introduced by Willems in [36] (compare also [7] for a discrete-time version) and is as follows.

Definition 2.1 The system (2.1) is dissipative on X × U with respect to the supply rate s : R^n × R^m → R if there exists a storage function λ : R^n → R_{≥0} such that the following inequality is satisfied for all (x, u) ∈ X × U with f(x, u) ∈ X:

$$\lambda(f(x, u)) - \lambda(x) \le s(x, u). \tag{2.3}$$

If there exists ρ ∈ K_∞ and a set X* ⊆ X such that for all (x, u) ∈ X × U with f(x, u) ∈ X it holds that³

$$\lambda(f(x, u)) - \lambda(x) \le -\rho(|x|_{X^*}) + s(x, u), \tag{2.4}$$

then system (2.1) is strictly dissipative with respect to the supply rate s and the set X*.

Interpreting the storage function λ as "generalized energy", the property of dissipativity means that along any solution of system (2.1), energy is dissipated, i.e., the difference in stored energy is less than or equal to what is supplied to the system from the outside (via the supply rate s). Analogously, strict dissipativity means that energy is strictly dissipated along all trajectories which do not completely lie inside the set X*. In the following sections, we discuss how dissipativity (or strict dissipativity) with respect to different supply rates (and different sets X*) can be employed in the context of economic MPC, both for classifying different optimal operating conditions and for concluding closed-loop convergence.
3 Here,
|x|X∗ denotes the distance of the point x to the set X∗ , i.e., |x|X∗ := min y∈X∗ |x − y|.
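Although Definition 2.1 is a functional condition, a candidate storage function can at least be falsified numerically by sampling the dissipation inequality (2.3). The Python sketch below does this for a toy scalar linear system with a linear storage function (a choice justified for linear systems in Sect. 2.4); all model data are illustrative assumptions.

```python
import numpy as np

# Toy data (illustrative): x+ = 0.5 x + u on X = [-1, 1], U = [-0.5, 0.5].
f = lambda x, u: 0.5 * x + u
ell = lambda x, u: (u - 0.1) ** 2 + 0.05 * x      # economic stage cost
xs, us = 0.1, 0.05                                # optimal steady state here
lam = lambda x: 0.1 - 0.1 * x                     # candidate storage, >= 0 on X
s = lambda x, u: ell(x, u) - ell(xs, us)          # supply rate of the form (2.5)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 10_000)
u = rng.uniform(-0.5, 0.5, 10_000)
mask = np.abs(f(x, u)) <= 1.0                     # keep pairs with f(x,u) in X
gap = s(x, u) - (lam(f(x, u)) - lam(x))           # must be >= 0 under (2.3)
print("min gap over samples:", gap[mask].min())   # here equals (u - 0.05)^2 >= 0
```

Such sampling can only disprove dissipativity; certifying it requires the analytical or computational tools discussed in Sect. 2.7.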
2.4 Optimal Steady-State Operation

The paper [3] was the first⁴ paper where connections between dissipativity and economic MPC were made apparent. In that work, the most basic case was considered where the optimal operating regime for system (2.1) is steady-state operation. In order to formally define this notion, we first define an optimal steady state and input pair (x*, u*) as a minimizer⁵ of the following optimization problem:

$$\min_{x \in X,\, u \in U,\, x = f(x, u)} \ell(x, u).$$
The property of optimal steady-state operation can now be defined as follows [3]:

Definition 2.2 The system (2.1) is optimally operated at steady state if for each x ∈ X with U_∞(x) ≠ ∅ and each u ∈ U_∞(x) the following holds:

$$\liminf_{T \to \infty} \frac{\sum_{k=0}^{T} \ell(x_u(k; x), u(k))}{T + 1} \ge \ell(x^*, u^*).$$
Definition 2.2 means that no other feasible input and state trajectory pair of system (2.1) can result in an asymptotic average cost which is better than that of the optimal steady state. As was shown in [3], a sufficient condition for optimal steady-state operation is that system (2.1) is dissipative on X × U with respect to the supply rate

$$s(x, u) = \ell(x, u) - \ell(x^*, u^*). \tag{2.5}$$
This statement can be proven as follows. If the system is dissipative on X × U with respect to the supply rate (2.5), we can sum up the corresponding dissipation inequality (2.3) along any feasible state and input sequence pair to obtain

$$\lambda(x_u(T; x)) - \lambda(x) \le \sum_{k=0}^{T-1} \big[\ell(x_u(k; x), u(k)) - \ell(x^*, u^*)\big]$$
for all T ∈ N0 . Dividing this inequality by T , taking the lim inf on both sides and exploiting the fact that the storage function λ is non-negative, this results in
4 In the earlier work [9], a special case of a dissipativity condition requiring a linear storage function
has already been used, although the employed assumption was not recognized as a dissipativity condition. ⁵ Given continuity of ℓ, such a minimizer exists, e.g., if X and U are compact. If the minimizer is not unique, in the following (x*, u*) denotes an arbitrary one of the multiple minimizers.
$$0 \le \liminf_{T \to \infty} \frac{\lambda(x_u(T; x)) - \lambda(x)}{T} \le \liminf_{T \to \infty} \frac{\sum_{k=0}^{T-1} \big[\ell(x_u(k; x), u(k)) - \ell(x^*, u^*)\big]}{T} = -\ell(x^*, u^*) + \liminf_{T \to \infty} \frac{\sum_{k=0}^{T-1} \ell(x_u(k; x), u(k))}{T}.$$

But this implies that the system is optimally operated at steady state according to Definition 2.2. Similarly, it was proven in [3, 30] that strict dissipativity on X × U with respect to the supply rate (2.5) and the set X* = {x*} is a sufficient condition for a slightly stronger property than optimal steady-state operation. This slightly stronger notion was termed (uniform) suboptimal operation off steady state and, loosely speaking, means that every other feasible input and state trajectory pair of system (2.1) either results in an asymptotic average cost which is strictly worse than that of the optimal steady state or enters a neighborhood of the optimal steady state sufficiently often.
(2.6)
and to note that strict dissipativity with respect to the supply rate (2.5) and the set X∗ = {x ∗ } implies that L is positive definite with respect to the optimal steady state x ∗ . In case that suitable additional terminal constraints are used, one can show ˆ to problem (2.2) is the same as that to problem (2.2) that the optimal solution u ∗ (·; x) with replaced by L. This was first shown using terminal equality constraints [3, 9] and subsequently extended to a setting with a terminal region and terminal cost [2]. Positive definiteness of L then allows to use standard stability results from stabilizing MPC with positive definite cost functions, and hence to conclude asymptotic stability of the optimal steady state x ∗ for the resulting closed-loop system. In case that no additional terminal constraints are employed, this equivalence between the optimal solutions of problem (2.2) and the modified problem using L instead of does no
32
M. A. Müller
longer hold. Here, positive definiteness of the rotated cost L allows to establish certain turnpike properties of problem (2.2), which in turn can be used to establish (practical) stability of the resulting closed-loop system, compare [17, 21]. In fact, it turns out that under certain conditions, strict dissipativity with respect to the supply rate (2.5) and the set X∗ = {x ∗ } and the turnpike property of problem (2.2) at the optimal steady state x ∗ are equivalent [14, 19]. To summarize the above, (strict) dissipativity with respect to the supply rate (2.5) and the set X∗ = {x ∗ } both allows to conclude that the system is optimally operated at steady state and that the closed loop converges to x ∗ , i.e., the optimal operating regime is found. An interesting question is to determine classes of systems that satisfy this dissipativity property. To this end, the following results have been obtained in the literature. It has first been noted in [9] (compare also [8] for a rigorous proof) that linear systems with convex constraints (satisfying a Slater condition) and a strictly convex cost function are strictly dissipative with respect to the supply rate (2.5) and the set X∗ = {x ∗ }, using a linear storage function6 λ(x) = a T x + c. This result has been extended in [8, 18] to cost functions which are only convex (instead of strictly convex) in the state. In this case, the above strict dissipativity condition (with a quadratic storage function λ) is equivalent to certain eigenvalue conditions on the system matrix. For linear systems with convex constraints and indefinite quadratic cost functions, the paper [5] develops conditions under which strict dissipativity with respect to the supply rate (2.5) and the set X∗ = {x ∗ } is satisfied. In this case, the optimal steady state is often located on the boundary of the constraint set. Finally, in [33], it was shown that under some technical conditions, convex, (state-)monotone nonlinear systems with convex constraints and convex, (state-)monotone cost function are dissipative with respect to the supply rate (2.5).
2.5 Optimal Periodic Operation We now turn to the more general case of optimal periodic operation. As described below, similar results as in Sect. 2.4 can be obtained. To this end, two different types of dissipativity conditions have been developed in the literature, a periodic dissipativity condition and a dissipativity condition for a multi-step system. Recently, it was shown that both these conditions are, in fact, equivalent to a standard dissipativity condition for system (2.1), analogous to Sect. 2.4. This will be discussed in detail in the following. We first define the notion of a periodic orbit. p
p
p
p
Definition 2.3 A set of state/input pairs Π = {(x0 , u 0 ), . . . , (x P−1 , u P−1 )} with p p P ∈ N is a feasible P-periodic orbit of system (2.1) if (xk , u k ) ∈ X × U for all k = p p p p p p 0, . . . , P − 1, xk+1 = f (xk , u k ) for all k = 0, . . . , P − 2, and x0 = f (x P−1 , u P−1 ). Given a periodic orbit Π , we denote its average cost by 6 Nonnegativity
of λ as required in Definition 2.1 then holds on any bounded set X.
2 Dissipativity in Economic Model Predictive Control …
P−1 Π :=
k=0
p
33 p
(xk , u k ) . P
(2.7)
We can now define optimal periodic operation of a system as follows, analogous to optimal steady-state operation. Definition 2.4 The system (2.1) is optimally operated at a periodic orbit Π , if for each x ∈ X with U∞ (x) = ∅ and each u ∈ U∞ (x) the following holds: T lim inf T →∞
k=0
(xu (k; x), u(k)) ≥ Π . T +1
(2.8)
Analogous to optimal steady-state operation, optimal periodic operation means that no other feasible input and state trajectory pair of system (2.1) can result in an asymptotic average cost which is better than that of the periodic orbit Π . Clearly, if a system is optimally operated at some periodic orbit Π ∗ , then Π ∗ is the optimal periodic orbit for this system, i.e., Π ∗ =
inf
P∈N,Π∈SΠP
Π ,
(2.9)
where SΠP denotes the set of all feasible P-periodic orbits. Furthermore, note that the definition of optimal periodic operation contains the definition of optimal steady-state operation as a special case (for P = 1). As mentioned above, two different dissipativity conditions have separately been developed in the literature to analyze optimal periodic operation. The first is a periodic dissipativity condition and has been proposed in7 [38]. Definition 2.5 The system (2.1) is P-periodically dissipative on X × U with respect p p to the supply rates sk (x, u) = (x, u) − (xk , u k ), k = 0, . . . , P − 1, if there exist n storage functions λk : R → R≥0 for k = 0, . . . , P with λ P = λ0 such that the following inequality is satisfied for all (x, u) ∈ X × U with f (x, u) ∈ X and all k = 0, . . . , P − 1: λk+1 ( f (x, u)) − λk (x) ≤ sk (x, u).
(2.10)
If there exists ρ ∈ K∞ such that (2.10) holds with sk (x, u) on the right-hand side replaced by sk (x, u) − ρ(|(x, u)|Π ), then system (2.1) is strictly dissipative with respect to the supply rates sk and the periodic orbit Π .
7 The first periodic dissipativity condition in the context of economic MPC has already been proposed
in [37]. However, besides the fact that the employed periodic storage functions are required to be linear as an extension of the strong duality condition in [9], the periodic dissipativity condition as formulated in [37] can only be satisfied by time-varying (periodic) systems, but not by time-invariant systems as in (2.1), compare [38, Remark 3.5].
34
M. A. Müller
For P = 1 this definition recovers the dissipativity condition of Sect. 2.4, i.e., dissipativity of system (2.1) with respect to the supply rate8 (2.5). In [38], it was shown that P-periodic dissipativity with respect to the supply rates sk (x, u) = p p (x, u) − (xk , u k ) implies that the system (2.1) is optimally operated at the periodic orbit Π . Analogous to the optimal steady-state case in Sect. 2.4, strict P-periodic p p dissipativity with respect to the supply rates sk (x, u) = (x, u) − (xk , u k ) and the periodic orbit Π implies a slightly stronger property than optimal periodic operation (uniform suboptimal operation off the periodic orbit Π , compare [30, 38] for a precise definition). Furthermore, under the same strict periodic dissipativity condition, asymptotic stability of the optimal periodic orbit for the closed-loop system can be established if suitable periodic terminal constraints are added to problem (2.2), compare [38]. The second dissipativity condition that has been used in the literature to examine the case of optimal periodic operation is based on the P-step system dynamics. Namely, define an extended state x˜ = (x˜0 , . . . , x˜ P−1 ) ∈ X P , input u˜ = (u˜ 0 , . . . , u˜ P−1 ) ∈ U P and dynamics ⎡
$$\tilde{x}(t + 1) = \begin{pmatrix} x_{\tilde{u}}(1; \tilde{x}_{P-1}) \\ \vdots \\ x_{\tilde{u}}(P; \tilde{x}_{P-1}) \end{pmatrix} = \begin{pmatrix} f(\tilde{x}_{P-1}, \tilde{u}_0) \\ f(f(\tilde{x}_{P-1}, \tilde{u}_0), \tilde{u}_1) \\ \vdots \end{pmatrix}.$$
(2.11)
Furthermore, define the cost function associated to the P-step system (2.11) as ℓ̃(x̃, ũ) := Σ_{j=0}^{P−1} ℓ(x_ũ(j; x̃_{P−1}), ũ_j) and |(x̃, ũ)|_Π := Σ_{j=0}^{P−1} |(x_ũ(j; x̃_{P−1}), ũ_j)|_Π. In [30], it was shown that dissipativity of the P-step system (2.11) with respect to the supply rate
(2.12)
is sufficient and (under an additional controllability condition) also necessary for optimal periodic operation. Similarly, strict dissipativity with respect to the supply rate (2.12) and the set9 Π is sufficient and (again under a suitable controllability condition) necessary for uniform suboptimal operation off the periodic orbit Π . Furthermore, the same strict dissipativity condition for the P-step system can be used to show closed-loop convergence to the optimal periodic orbit when applying economic MPC schemes with [35] and without [27] additional terminal constraints in (2.2). Interestingly, however, convergence guarantees cannot necessarily be given definition of strict P-periodic dissipativity with P = 1 is slightly stronger than strict dissipativity with respect to the supply rate (2.5) and the set X∗ = {x ∗ }, since both strictness in state and input is required. Such a slightly stronger property is typically required in the context of optimal periodic operation in order to establish closed-loop convergence to the optimal periodic orbit, compare, e.g., [27, 38]. Furthermore, in [38] a weaker variant of strict periodic dissipativity with ρ(|(x, u)|Π ) replaced by ρ(|(x)|ΠX ) has been considered, where ΠX denotes the projection of Π on X, resulting in a slightly weaker closed-loop stability property. 9 Here, again the slightly stronger version of strict dissipativity compared to (2.4) is needed, i.e., strictness in both state and input, compare see footnote 8. 8 The
2 Dissipativity in Economic Model Predictive Control …
35
for a standard (one-step) MPC scheme but in general only when using a P-step MPC ˆ scheme, meaning that the first P components of the optimal input sequence u ∗ (·; x) are applied, before problem (2.2) is solved again (compare [27]). In a recent publication [24], it was shown that under some technical conditions, the two above-discussed different notions of (strict) dissipativity in the context of optimal periodic operation are equivalent. In fact, they are equivalent to system (2.1) being dissipative with respect to the supply rate s(x, u) = (x, u) − Π ,
(2.13)
(respectively, strictly dissipative with respect to the supply rate (2.13) and the set Π ). Note that this is a standard (one-step) dissipativity condition for system (2.1), as defined in Definition 2.1. The benefit of this result is that the cases of optimal steady state and optimal periodic operation can now be treated within the same framework using standard dissipation inequalities (i.e., without having to define periodic dissipativity notions or multi-step system dynamics). Also, the employed supply rates (2.5) and (2.13) are quite similar. In particular, they are of the form s(x, u) = (x, u) − c, where the constant c is the asymptotic average cost of the optimal operating behavior (optimal steady state or optimal periodic orbit). As shown in the following section, such a dissipativity condition can also be used beyond optimal periodic operation. Furthermore, when using the one-step dissipativity condition (2.13), various of the previous assumptions (such as local controllability of the P-step system at the optimal periodic orbit) can be relaxed, and closed-loop convergence to the optimal periodic orbit can be established under certain conditions also for standard one-step MPC schemes without terminal constraints, compare [24].
2.6 General Optimal Operating Conditions The results of the previous two sections can be extended to optimal operating conditions which are more general than steady state or periodic operation, as was recently shown in [10, 26]. Namely, consider an arbitrary feasible state and input trajectory pair of system (2.1) (x0∗ , u ∗0 ), (x1∗ , u ∗1 ), . . .
(2.14)
with corresponding (best) asymptotic average cost defined as T av := lim inf T →∞
(xk∗ , u ∗k ) . T +1
k=0
(2.15)
Then, it was proven in [10] that if system (2.1) is dissipative with respect to the supply rate
36
M. A. Müller
s(x, u) = (x, u) − av ,
(2.16)
the optimal asymptotic average performance is given by av . This means that for each x ∈ X with U∞ (x) = ∅ and each u ∈ U∞ (x) the following holds: T lim inf T →∞
k=0
(xu (k; x), u(k)) ≥ av . T +1
The proof of this fact is analogous to the one provided in Sect. 2.4 for optimal steadystate operation, replacing (x ∗ , u ∗ ) by av . It is easy to see that both optimal steady state and optimal periodic operation are included as a special case, when using a constant or periodic trajectory in (2.14), respectively. Furthermore, a strict version of this dissipativity condition can again be used in order to show closed-loop convergence to the optimal regime of operation. Namely, denote by Π the closure of the set of all points of the trajectory (2.14), i.e., Π := cl{(x0∗ , u ∗0 ), (x1∗ , u ∗1 ), . . . }, and the projection of Π on X by ΠX , i.e., ΠX := cl{x0∗ , x1∗ , . . . }. Then, if a suitable terminal region and terminal cost function is employed and some technical assumptions hold, strict dissipativity of system (2.1) with respect to the supply rate (2.16) and the set ΠX implies asymptotic stability of the set ΠX for the resulting closed-loop system. A different dissipativity condition has recently been proposed in [26] in the context of optimal set operation, using parametric storage functions. Namely, consider some control invariant set X ⊆ X and define Z := {(x, u) ∈ X × U : f (x, u) ∈ X} as well ¯ := {u ∈ U N : xu (k; x) ∈ X for all k ≥ 0}. Now consider the dissipation as U ∞ (x) inequality ¯ u), ¯ λ f (x, ¯ u) ¯ ( f (x, u)) − λx¯ (x) ≤ (x, u) − ( x,
(2.17)
where (x, ¯ u) ¯ ∈ Z. As shown in [26], if (2.17) is satisfied for all (x, u) ∈ X × U with f (x, u) ∈ X and all (x, ¯ u) ¯ ∈ Z, then system (2.1) is optimally operated at the set Z. This means that for each x ∈ X with U∞ (x) = ∅, each u ∈ U∞ (x), each x¯ ∈ X with U ∞ (x) ¯ = ∅, and each u¯ ∈ U ∞ (x) ¯ the following holds: T lim inf T →∞
k=0
(xu (k; x), u(k)) ≥ lim inf T →∞ T +1
T k=0
(x¯u¯ (k; x), ¯ u(k)) ¯ , T +1
i.e., the optimal asymptotic average performance is given by the asymptotic average performance of an arbitrary trajectory inside the set Z. Again, also the converse statement is true under an additional controllability assumption [26]. The dissipativity condition (2.17) uses a storage function which is parametrized by states x¯ ∈ X and can be seen as a generalization of the periodic dissipativity condition in Definition 2.5. Namely, while Definition 2.5 uses P storage functions (i.e., parametrized by the optimal periodic orbit in terms of its period length), the storage function in (2.17) is parametrized along arbitrary trajectories in the set X.
Table 2.1 Dissipativity conditions used in economic MPC

Optimal steady-state operation (Sect. 2.4): dissipativity w.r.t. the supply rate s(x, u) = ℓ(x, u) − ℓ(x*, u*)

Optimal periodic operation (Sect. 2.5): dissipativity w.r.t. the supply rate s(x, u) = ℓ(x, u) − ℓ_Π with ℓ_Π as in (2.7); P-periodic dissipativity (Definition 2.5); dissipativity of the P-step system (2.11) w.r.t. the supply rate (2.12)

General optimal operating conditions (Sect. 2.6): dissipativity w.r.t. the supply rate s(x, u) = ℓ(x, u) − ℓ_av with ℓ_av as in (2.15); dissipativity with parametrized storage function as defined in (2.17)
Using suitable terminal equality constraints specified by a trajectory in X, it can be shown that the dissipativity condition (2.17) strengthened to strict dissipativity with respect to the set X ensures closed-loop convergence to X as desired [26]. Table 2.1 summarizes the different dissipativity conditions employed in the context of economic MPC for the cases of optimal steady-state operation (Sect. 2.4), optimal periodic operation (Sect. 2.5), and general optimal operating conditions (Sect. 2.6).
2.7 Computation of Storage Functions In this section, we briefly discuss how the above dissipativity properties can be verified, i.e., how suitable storage functions satisfying the relevant dissipation inequalities can be computed. First, we note that also without explicitly verifying dissipativity, the above results can be used as an a posteriori guarantee that—given a suitably designed economic MPC scheme—the closed-loop system “does the right thing”, i.e., converges to the optimal operating behavior: (strict) optimal steady-state/periodic/set operation implies (strict) dissipativity, which in turn can be used to conclude closed-loop convergence. On the other hand, in order to obtain a priori guarantees about the optimal operating conditions and the closed-loop behavior, (strict) dissipativity needs to be verified. To this end, different approaches are available for different system classes (although it has to be mentioned that no systematic procedure is available for general nonlinear systems). As has already been discussed at the end of Sect. 2.4, for linear systems linear or quadratic storage functions can be computed under certain conditions. If nonlinear polynomial systems subject to polynomial cost and constraints are considered, sum-of-squares (SOS) programming can be employed to verify dissipativity, compare, e.g., [11]. In the context of economic MPC, this method has, e.g., been
38
M. A. Müller
applied in [10, 13] for verifying dissipativity in case of optimal steady-state and optimal periodic operation, respectively. In these examples, the optimal steady-state cost ℓ(x*, u*) and the cost of the optimal periodic orbit ℓ_Π, appearing in the supply rates (2.5) and (2.13), respectively, had been assumed known (i.e., precomputed before verifying dissipativity). On the other hand, in general the optimal operating behavior might be unknown a priori and hence the optimal average cost appearing in the supply rates (2.5), (2.13), and (2.16) has to be computed together with a suitable storage function. To this end, a computational procedure has recently been proposed concurrently in [6, 31]. Namely, there the following optimization problem has been considered:

$$\underset{c \in \mathbb{R},\, \lambda \in \Lambda}{\text{maximize}} \quad c \tag{2.18}$$
$$\text{s.t.} \quad \ell(x, u) - c + \lambda(x) - \lambda(f(x, u)) \ge 0 \quad \text{for all } (x, u) \in Z, \tag{2.19}$$
where Z := {(x, u) ∈ X × U : f (x, u) ∈ X} and Λ ⊆ C(Rn ) is a given set of functions. Constraint (2.19) ensures that system (2.1) is dissipative with respect to the supply rate s(x, u) = (x, u) − c.
(2.20)
Denote by c∗ and λ∗ the optimizers to this problem (assuming they exist). Hence, using the same arguments as above, it follows that c∗ is a lower bound on the best achievable asymptotic average cost, compare [10]. Unfortunately, problem (2.18)– (2.19) can, in general, not be solved efficiently, since (even for a fixed, finite parametrization of Λ) it is a semi-infinite optimization problem. Nevertheless, for polynomial system dynamics f and polynomial cost function , sum-of-squares programming can again be employed to efficiently solve a relaxed formulation of problem (2.18)–(2.19), compare [6, 31]. This is straightforward in case of no constraints, i.e., X = Rn and U = Rm . In case that state and/or input constraints are present which are given in terms of polynomial inequalities, the Positivstellensatz or S-procedure can be used in order to obtain again a relaxed formulation of problem (2.18)–(2.19), which can efficiently be solved by SOS techniques [6, 31]. These methods can also be used to verify approximate dissipativity for nonpolynomial systems by considering (polynomial) Taylor approximations [31]. In [6], it was additionally considered how strict dissipativity can be verified using problem (2.18)–(2.19). Namely, consider the set of points for which the constraint (2.19) is satisfied with equality for c = c∗ and λ = λ∗ , i.e., M := {(x, u) ∈ Z : (x, u) − c∗ + λ∗ (x) − λ∗ ( f (x, u)) = 0}.
(2.21)
The following results can now be obtained. If M is a periodic orbit, then M = c∗ and system (2.1) is strictly dissipative with respect to the supply rate (2.20) and the periodic orbit M. Conversely, if system (2.1) is strictly dissipative with respect to the supply rate (2.13) and some periodic orbit Π with storage function λ ∈ Λ, then c∗ =
2 Dissipativity in Economic Model Predictive Control …
39
Π and Π ⊆ M. Extensions to more general operating conditions are also possible. Furthermore, in case that the above-discussed relaxations via SOS programming are employed, similar sufficient conditions for strict dissipativity can be obtained (based on a set defined similarly to M in (2.21) and an additional complementary slackness condition), albeit necessity does no longer hold, compare [6].
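Besides SOS relaxations, a crude but instructive way to approach (2.18)–(2.19) numerically is to enforce the constraint only on a finite grid over Z, which for a linearly parametrized storage function yields a standard linear program. The Python sketch below does this for a toy scalar example (all model data assumed); note that the grid relaxation can only overestimate the exact optimal value c*, so, unlike the SOS approach of [6, 31], it does not certify dissipativity.

```python
import numpy as np
from scipy.optimize import linprog

# Grid relaxation of (2.18)-(2.19): constraint enforced on sampled points of Z,
# with lambda(x) = w0*x + w1*x**2, giving an LP in the variables (c, w0, w1).
f = lambda x, u: 0.5 * x + u
ell = lambda x, u: (u - 0.1) ** 2 + 0.05 * x
phi = lambda x: np.array([x, x**2])

rows, rhs = [], []
for x in np.linspace(-1.0, 1.0, 41):
    for u in np.linspace(-0.5, 0.5, 41):
        xp = f(x, u)
        if -1.0 <= xp <= 1.0:                       # (x, u) in Z
            # rewrite (2.19) as: c + w^T (phi(f(x,u)) - phi(x)) <= ell(x,u)
            rows.append(np.concatenate([[1.0], phi(xp) - phi(x)]))
            rhs.append(ell(x, u))
res = linprog(c=[-1.0, 0.0, 0.0],                   # maximize c
              A_ub=np.array(rows), b_ub=np.array(rhs),
              bounds=[(None, None)] * 3)
print("c from grid relaxation (upper bound on exact c*):", res.x[0])
```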
2.8 Time-Varying Case

In this section, we discuss how the above results can be extended to the time-varying case, i.e., time-varying system dynamics of the form

$$x(k + 1) = f(k, x(k), u(k)), \quad x(t_0) = x_0, \tag{2.22}$$
together with a time-varying cost function (k, x(k), u(k)) and constraint sets X(k) and U(k). While the main insights and proof techniques carry over, various technical subtleties become more involved in this case. The first issue is how to properly define the optimal operating behavior in the timevarying case. To this end, two different approaches have been used in the literature. In [1], an averaged criterion similar to the time-invariant case has been used. Namely, the system is optimally operated at some feasible state and input trajectory pair (x ∗ (·), u ∗ (·)), if for all x ∈ X(k) with10 U∞ (k, x) = ∅ and each u ∈ U∞ (k, x) the following holds: T lim inf
k=0
T →∞
(k, xu (k; x), u(k)) − (k, x ∗ (k), u ∗ (k)) ≥ 0. T +1
(2.23)
Similar to the previous sections, this means that no feasible state and input trajectory pair results in a better asymptotic average cost than the pair (x ∗ (·), u ∗ (·)). Alternatively, a slightly stronger definition of optimal system operation is employed in [20, 23], where the inequality (2.23) is evoked in a non-averaged sense. In particular, optimal system operation at some feasible state and input trajectory pair (x ∗ (·), u ∗ (·)) is given if for all x ∈ X(k) with U∞ (k, x) = ∅ and each u ∈ U∞ (k, x) the following holds: lim inf T →∞
T
(k, xu (k; x), u(k)) − (k, x ∗ (k), u ∗ (k)) ≥ 0.
(2.24)
k=0
This definition is related to the concept of overtaking optimality [16] in the sense that the (cumulative) cost encountered along the optimal trajectory pair (x ∗ (·), u ∗ (·)) is 10 Analogous to the time-invariant case, U (k, x) denotes the (time-varying) set of all feasible input ∞
sequences of infinite length for a given initial state x at time k.
40
M. A. Müller
“overtaken” by the (cumulative) cost encountered along any other feasible state and input trajectory pair at some point in time. Again, a suitable (time-varying) dissipativity condition can be employed to classify optimal system operation and to analyze the closed-loop behavior of time-varying economic MPC schemes. Definition 2.6 The system (2.22) is dissipative with respect to the supply rate s : N0 × Rn × Rm → R if there exists a storage function λ : N0 × Rn → R≥0 such that the following inequality is satisfied for all k ∈ N0 and all (x, u) ∈ X(k) × U(k) with f (k, x, u) ∈ X(k + 1): λ(k + 1, f (k, x, u)) − λ(k, x) ≤ s(k, x, u).
(2.25)
If there exists ρ ∈ K∞ such that (2.25) holds with s(k, x, u) on the right-hand side replaced by s(k, x, u) − ρ(|(x, u)|(x ∗ (k),u ∗ (k)) ), then system (2.22) is strictly dissipative with respect to the supply rates s and the trajectory pair (x ∗ (·), u ∗ (·)). Using the same arguments as in the time-invariant case, one can show that dissipativity with respect to the supply rate s(k, x, u) = (k, x, u) − (k, x ∗ (k), u ∗ (k)) is a sufficient condition for optimal system operation at the trajectory pair (x ∗ (·), u ∗ (·)) in the sense of (2.23), compare [1]. However, this is not necessarily the case when using the stronger, non-averaged version (2.24) of optimal system operation. Furthermore, strict dissipativity with respect to the supply rate s(k, x, u) = (k, x, u) − (k, x ∗ (k), u ∗ (k)) and the trajectory pair (x ∗ (·), u ∗ (·)) can (together with some technical assumptions) be employed to establish (i) time-varying turnpike properties [23] and (ii) closed-loop (practical) convergence to the optimal trajectory pair (x ∗ (·), u ∗ (·)), both for economic MPC schemes with11 [1] and without [20] suitable time-varying terminal cost and terminal region.
2.9 Conclusions

Dissipativity has turned out to play a crucial role in the context of economic model predictive control, both for determining the optimal operating behavior of a system and for the analysis of the closed-loop dynamics. As shown above, to this end a dissipativity condition with a supply rate of the form

$$s(x, u) = \ell(x, u) - c$$
(2.26)
[1], the additional condition that λ is constant along x ∗ (·) is imposed. Note that from (2.25) together with the definition of the supply rate s(k, x, u) = (k, x, u) − (k, x ∗ (k), u ∗ (k)), it follows that λ converges to a constant value λ¯ when evaluated along x ∗ (·), i.e., limk→∞ λ(k, x ∗ (k)) = λ¯ , but not necessarily that λ is constant along x ∗ (·) for all k.
11 In
2 Dissipativity in Economic Model Predictive Control …
41
can be used,12 where the constant c is the asymptotic average cost of the optimal operating behavior. This holds true for the basic case of optimal steady-state operation, for optimal periodic operation, and also for more general optimal regimes of operation. This allows for the following high-level intuition/interpretation. Recalling the “energy” interpretation of dissipativity given in Sect. 2.3, a negative value of the supply rate s corresponds to “extracting energy” from the system. In case of the supply rate (2.26), this is the case for all points (x, u) which have a lower cost than the optimal asymptotic average cost c. Since dissipativity means that we cannot extract an infinite amount of energy from the system, on average (asymptotically) the supply rate must be non-negative, which by (2.26) means that the optimal asymptotic average cost cannot be lower than c. Recently, some extensions and generalizations of the above results have been proposed, e.g., in the context of uncertain systems [4], for discounted optimal control problems [22, 28], and in distributed [25] and multi-objective [34] settings. Here, however, the picture of the interplay of suitable dissipativity conditions, optimal system operation, and economic MPC schemes is still much less complete and allows for many interesting directions for future research.
References 1. Alessandretti, A., Aguiar, A.P., Jones, C.N.: On convergence and performance certification of a continuous-time economic model predictive control scheme with time-varying performance index. Automatica 68, 305–313 (2016) 2. Amrit, R., Rawlings, J.B., Angeli, D.: Economic optimization using model predictive control with a terminal cost. Annu. Rev. Control 35(2), 178–186 (2011) 3. Angeli, D., Amrit, R., Rawlings, J.B.: On average performance and stability of economic model predictive control. IEEE Trans. Autom. Control 57(7), 1615–1626 (2012) 4. Bayer, F.A., Müller, M.A., Allgöwer, F.: On optimal system operation in robust economic MPC. Automatica 88, 98–106 (2018) 5. Berberich, J., Köhler, J., Allgöwer, F., Müller, M.A.: Indefinite linear quadratic optimal control: strict dissipativity and turnpike properties. IEEE Control Syst. Lett. 2(3), 399–404 (2018) 6. Berberich, J., Köhler, J., Allöwer, F., Müller, M.A.: Dissipativity properties in constrained optimal control: a computational approach. Automatica 114, 108840 (2020) 7. Byrnes, C.I., Lin, W.: Losslessness, feedback equivalence, and the global stabilization of discrete-time nonlinear systems. IEEE Trans. Autom. Control 39(1), 83–98 (1994) 8. Damm, T., Grüne, L., Stieler, M., Worthmann, K.: An exponential turnpike theorem for dissipative discrete time optimal control problems. SIAM J. Control Optim. 52(3), 1935–1957 (2014) 9. Diehl, M., Amrit, R., Rawlings, J.B.: A Lyapunov function for economic optimizing model predictive control. IEEE Trans. Autom. Control 56(3), 703–707 (2011) 10. Dong, Z., Angeli, D.: Analysis of economic model predictive control with terminal penalty functions on generalized optimal regimes of operation. Int. J. Robust Nonlinear Control 28(16), 4790–4815 (2018) 12 As discussed in the previous sections, this holds for the time-invariant case. The supply rates in the time-varying case or when using parametrized storage functions (compare Sects. 2.8 and 2.6, respectively) are of a similar structure and hence allow for the same interpretation as discussed in the following for supply rate (2.26).
11. Ebenbauer, C., Allgöwer, F.: Analysis and design of polynomial control systems using dissipation inequalities and sum of squares. Comput. Chem. Eng. 30(10), 1590–1602 (2006). Papers from Chemical Process Control VII
12. Ellis, M., Liu, J., Christofides, P.: Economic Model Predictive Control: Theory, Formulations and Chemical Process Applications. Springer, Berlin (2017)
13. Faulwasser, T., Korda, M., Jones, C.N., Bonvin, D.: Turnpike and dissipativity properties in dynamic real-time optimization and economic MPC. In: Proceedings of the 53rd IEEE Conference on Decision and Control, pp. 2734–2739 (2014)
14. Faulwasser, T., Korda, M., Jones, C.N., Bonvin, D.: On turnpike and dissipativity properties of continuous-time optimal control problems. Automatica 81, 297–304 (2017)
15. Faulwasser, T., Grüne, L., Müller, M.A.: Economic nonlinear model predictive control. Found. Trends® Syst. Control 5(1), 1–98 (2018)
16. Gale, D.: On optimal development in a multi-sector economy. Rev. Econ. Stud. 34(1), 1–18 (1967)
17. Grüne, L.: Economic receding horizon control without terminal constraints. Automatica 49(3), 725–734 (2013)
18. Grüne, L., Guglielmi, R.: Turnpike properties and strict dissipativity for discrete time linear quadratic optimal control problems. SIAM J. Control Optim. 56(2), 1282–1302 (2018)
19. Grüne, L., Müller, M.A.: On the relation between strict dissipativity and turnpike properties. Syst. Control Lett. 90, 45–53 (2016)
20. Grüne, L., Pirkelmann, S.: Economic model predictive control for time-varying systems: performance and stability results. Optimal Control Appl. Methods (2019)
21. Grüne, L., Stieler, M.: Asymptotic stability and transient optimality of economic MPC without terminal constraints. J. Process Control 24(8), 1187–1196 (2014)
22. Grüne, L., Kellett, C.M., Weller, S.R.: On a discounted notion of strict dissipativity. In: Proceedings of the 10th IFAC Symposium on Nonlinear Control Systems, pp. 247–252 (2016)
23. Grüne, L., Pirkelmann, S., Stieler, M.: Strict dissipativity implies turnpike behavior for time-varying discrete time optimal control problems. In: Control Systems and Mathematical Methods in Economics: Essays in Honor of Vladimir M. Veliov. Lecture Notes in Economics and Mathematical Systems, vol. 687, pp. 195–218. Springer, Cham (2018)
24. Köhler, J., Müller, M.A., Allgöwer, F.: On periodic dissipativity notions in economic model predictive control. IEEE Control Syst. Lett. 2(3), 501–506 (2018)
25. Köhler, P.N., Müller, M.A., Allgöwer, F.: Approximate dissipativity and performance bounds for interconnected systems. In: Proceedings of the European Control Conference (ECC), pp. 787–792 (2019)
26. Martin, T., Köhler, P.N., Allgöwer, F.: Dissipativity and economic model predictive control for optimal set operation. In: Proceedings of the American Control Conference (ACC) (2019)
27. Müller, M.A., Grüne, L.: Economic model predictive control without terminal constraints for optimal periodic behavior. Automatica 70, 128–139 (2016)
28. Müller, M.A., Grüne, L.: On the relation between dissipativity and discounted dissipativity. In: Proceedings of the 56th IEEE Conference on Decision and Control (CDC), pp. 5570–5575 (2017)
29. Müller, M.A., Angeli, D., Allgöwer, F.: On necessity and robustness of dissipativity in economic model predictive control. IEEE Trans. Autom. Control 60(6), 1671–1676 (2015)
30. Müller, M.A., Grüne, L., Allgöwer, F.: On the role of dissipativity in economic model predictive control. In: Proceedings of the 5th IFAC Conference on Nonlinear Model Predictive Control, pp. 110–116 (2015)
31. Pirkelmann, S., Angeli, D., Grüne, L.: Approximate computation of storage functions for discrete-time systems using sum-of-squares techniques. Preprint, University of Bayreuth (2019)
32. Rawlings, J.B., Amrit, R.: Optimizing process economic performance using model predictive control. In: Magni, L., Raimondo, D.M., Allgöwer, F. (eds.) Nonlinear Model Predictive Control – Towards New Challenging Applications. Lecture Notes in Control and Information Sciences, pp. 119–138. Springer, Berlin (2009)
33. Schmitt, M., Ramesh, C., Goulart, P., Lygeros, J.: Convex, monotone systems are optimally operated at steady-state. In: Proceedings of the 2017 American Control Conference (ACC), pp. 2662–2667 (2017)
34. Stieler, M.: Performance estimates for scalar and multiobjective model predictive control schemes. Ph.D. thesis, University of Bayreuth (2018)
35. Wabersich, K.P., Bayer, F.A., Müller, M.A., Allgöwer, F.: Economic model predictive control for robust periodic operation with guaranteed closed-loop performance. In: Proceedings of the European Control Conference, pp. 507–513 (2018)
36. Willems, J.C.: Dissipative dynamical systems. Part I: General theory. Arch. Rational Mech. Anal. 45(5), 321–351 (1972)
37. Zanon, M., Gros, S., Diehl, M.: A Lyapunov function for periodic economic optimizing model predictive control. In: Proceedings of the 52nd IEEE Conference on Decision and Control, pp. 5107–5112 (2013)
38. Zanon, M., Grüne, L., Diehl, M.: Periodic optimal control, dissipativity and MPC. IEEE Trans. Autom. Control 62(6), 2943–2949 (2017)
Chapter 3
Primal or Dual Terminal Constraints in Economic MPC? Comparison and Insights

T. Faulwasser and M. Zanon
Abstract This chapter compares different formulations of economic nonlinear model predictive control (EMPC) which are all based on an established dissipativity assumption on the underlying optimal control problem (OCP). This includes schemes with and without stabilizing terminal constraints as well as schemes with stabilizing terminal costs. We recall that a recently proposed approach based on gradient-correcting terminal penalties implies a terminal constraint on the adjoints of the OCP. We analyze the feasibility implications of these dual/adjoint terminal constraints, and we compare our findings to approaches with and without primal terminal constraints. Moreover, we suggest a conceptual framework for the approximation of the minimal stabilizing horizon length. Finally, we illustrate our findings considering a chemical reactor as an example.
3.1 The Dissipativity Route to Optimal Control and MPC

Since the late 2000s, there has been substantial interest in so-called economic nonlinear model predictive control (EMPC). Early works include [26, 30]; recent overviews can be found in [2, 10, 18]. Indeed, the underlying idea of EMPC is appealing, as it refers to receding-horizon control based on optimal control problems (OCPs) comprising generic stage costs, i.e., costs more general than the typical tracking objectives.
In this context, beginning with [3, 8], a dissipativity notion for OCPs has received considerable attention. Indeed, numerous key insights have been obtained via the dissipativity route. Dissipativity is closely related to optimal operation at steady state [29]: under suitable conditions they are equivalent. In its strict form, dissipativity also implies the so-called turnpike property of OCPs. These properties are a classical notion in
optimal control. They refer to similarity properties of OCPs parametric in the initial condition and the horizon length, whereby the optimal solutions stay close to the optimal steady state during the middle part of the horizon, and the time spent close to the turnpike grows with increasing horizon length. Early observations of this phenomenon can be traced back to John von Neumann and the 1930/40s [33]. The term as such was coined in [9] and has received widespread attention in economics [6, 27].
Remarkably early in the development of EMPC, the key role of turnpike properties for such schemes had been observed [30]. However, it took until [20] for a first stability proof which directly leveraged their potential. Turnpikes are also of interest for generic OCPs in finite and infinite dimensions, see [19, 25, 32]. Strict dissipativity and turnpike properties of OCPs are—under suitable technical assumptions—equivalent [17, 21]. The turnpike property also allows one to show recursive feasibility of EMPC without terminal constraints [11, 18]. Dissipativity can be used to build quadratic tracking costs for MPC [7, 42–44] yielding approximate economic optimality. It is also related to a positive definite Gauss–Newton-like Hessian approximation for EMPC [39]. Strict dissipativity of an OCP allows deriving sufficient stability conditions with terminal constraints [1, 3, 8] and without them [11, 16, 20, 40]. Dissipativity and turnpike concepts can be extended to time-varying cases [24, 41, 45] and to OCPs with discrete controls [14]. Finally, there exists a close relation between dissipativity notions of OCPs and infinite-horizon optimal control problems [13].
This substantial body of results obtained along the dissipativity route to EMPC and OCPs might be surprising at first glance. However, taking into account the history of system-theoretic dissipativity notions—in particular the foundational works of Jan Willems [35–37]—the close relation between both topics is far from astounding. Yet, this chapter does not attempt a full-fledged introduction to EMPC; the interested reader is referred to the recent overviews [2, 10, 18]. Neither will it give a comprehensive overview of dissipativity notions. To this end, we refer, for example, to Chap. 2 of this book. Instead, here, we focus on a comparison of EMPC schemes with and without terminal constraints. In general, one can distinguish three main classes of dissipativity-based stability approaches:
• schemes relying on terminal {equality, inequality} constraints and corresponding terminal penalties, see, e.g., [1, 3, 8];
• schemes using neither terminal constraints nor penalties, e.g., [11, 20];
• schemes avoiding primal terminal constraints and instead using gradient-correcting terminal penalties (which imply a dual terminal constraint) [16, 40].
Specifically, this chapter compares the three schemes above with respect to different aspects such as the structure of the optimality conditions, primal and dual feasibility properties, and the required length of the stabilizing horizon.
The remainder of this chapter is structured as follows: In Sect. 3.2 we recall the EMPC schemes to be compared and the corresponding stability results. In Sect. 3.3 we analyze the schemes with respect to different properties. We also derive formal optimization problems which, upon solving, certify the length of the stabilizing
horizon. Section 3.4 will present results of a comparative numerical case study. The chapter closes with conclusions and outlook in Sect. 3.5.
3.2 Economic MPC Revisited

In this chapter, we consider EMPC schemes based on the following family of OCPs:

    V_N(x̂_i) := min_{x,u}  Σ_{k=0}^{N−1} ℓ(x_k, u_k) + V_f(x_N)          (3.1a)
    subject to
    x_0 = x̂_i,                                                           (3.1b)
    x_{k+1} = f(x_k, u_k),   k ∈ I_[0,N−1],                               (3.1c)
    g(x_k, u_k) ≤ 0,         k ∈ I_[0,N],                                 (3.1d)
    x_N ∈ X_f,                                                            (3.1e)
where we use the shorthand notation I_[a,b] := {a, …, b} for integers a, b. The constraint set for z = [x⊤ u⊤]⊤ ∈ R^{n_x+n_u} is defined as

    Z := {z ∈ R^{n_x+n_u} | g_j(z) ≤ 0, j ∈ I_[1,n_g]},                   (3.2)

where Z is assumed to be compact and g_j: R^{n_x+n_u} → R.¹ To avoid cumbersome technicalities, we assume that the problem data of (3.1) is sufficiently smooth, i.e., at least twice differentiable, and that the minimum exists.² The superscript ·* denotes optimal solutions. Occasionally, we denote the optimal solutions as

    u*(x̂_i) := [u*_0(x̂_i)⊤ … u*_{N−1}(x̂_i)⊤]⊤,
    x*(x̂_i) := [x*_0(x̂_i)⊤ … x*_N(x̂_i)⊤]⊤;

whenever no confusion about the initial condition can arise, we will drop it. We use the shorthand notation g(x, u) = [g_1(x, u), …, g_{n_g}(x, u)]⊤. Finally, V_N: R^{n_x} → R denotes the optimal value function of (3.1).
As usual in NMPC and EMPC, the receding-horizon solution to (3.1) implies the following closed-loop dynamics

    x̂_{i+1} = f(x̂_i, κ(x̂_i)),   x̂_0 ∈ X_0,   i ∈ N,                      (3.3)
where the superscript ·̂ distinguishes actual system variables from their predictions and the MPC feedback κ: X → U is defined as usual:
    κ(x̂) := u*_0(x̂).

¹ The x-projection of Z is denoted as X and the u-projection as U.
² Provided the feasible set is non-empty and compact, existence of a minimum follows from continuity of the objective (3.1a).
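To make the scheme concrete, the following sketch assembles OCP (3.1) in CasADi and evaluates the feedback κ for one initial value. The dynamics f, stage cost ℓ, dimensions, and solver settings are illustrative placeholders and not part of the chapter; the inequality path constraints (3.1d) are omitted for brevity.

```python
# Minimal sketch of OCP (3.1) and the feedback kappa; f and ell are
# illustrative placeholders, not the chapter's example.
import casadi as ca

nx, nu, N = 2, 1, 20

def f(x, u):    # placeholder discrete-time dynamics
    return ca.vertcat(x[0] + 0.1 * x[1], x[1] + 0.1 * u[0])

def ell(x, u):  # placeholder (economic) stage cost
    return -x[1] + 0.01 * u[0] ** 2

def empc_feedback(x_hat, Vf=None, Xf_point=None):
    """Solve (3.1) for initial value x_hat and return u_0*(x_hat).
    Vf: optional terminal penalty; Xf_point: optional terminal equality."""
    X = ca.SX.sym('X', nx, N + 1)
    U = ca.SX.sym('U', nu, N)
    cost = 0
    g = [X[:, 0] - x_hat]                           # (3.1b)
    for k in range(N):
        cost += ell(X[:, k], U[:, k])
        g.append(X[:, k + 1] - f(X[:, k], U[:, k]))  # (3.1c)
    if Vf is not None:
        cost += Vf(X[:, N])                          # e.g. scheme (iii)
    if Xf_point is not None:
        g.append(X[:, N] - Xf_point)                 # scheme (i): x_N = x_bar
    nlp = {'x': ca.veccat(X, U), 'f': cost, 'g': ca.vertcat(*g)}
    solver = ca.nlpsol('solver', 'ipopt', nlp, {'ipopt.print_level': 0})
    sol = solver(x0=0, lbg=0, ubg=0)
    w = sol['x'].full().ravel()
    return w[nx * (N + 1):nx * (N + 1) + nu]         # u_0*(x_hat)
```

In this sketch, scheme (i) corresponds to empc_feedback(x_hat, Xf_point=x_bar), scheme (ii) to empc_feedback(x_hat), and scheme (iii) to empc_feedback(x_hat, Vf=lambda xN: ca.dot(ca.DM(lam_bar), xN)), once x̄ and λ̄ are available.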
3.2.1 Dissipativity-Based Stability Results

Recall the following standard definition: a function α: R_{≥0} → R_{≥0} is said to belong to class K if it is continuous, strictly increasing, and α(0) = 0. We begin by recalling a dissipativity notion for OCPs.
Definition 3.1 (Strict dissipativity)
1. System (3.1c) is said to be dissipative with respect to the steady-state pair (x̄, ū) ∈ Z if there exists a non-negative function S: R^{n_x} → R such that for all (x, u) ∈ Z

    S(f(x, u)) − S(x) ≤ ℓ(x, u) − ℓ(x̄, ū).                               (3.4a)

2. If, additionally, there exists α ∈ K such that

    S(f(x, u)) − S(x) ≤ −α(‖(x, u) − (x̄, ū)‖) + ℓ(x, u) − ℓ(x̄, ū),       (3.4b)

then (3.1c) is said to be strictly dissipative with respect to (x̄, ū) ∈ Z.
3. If, for all N ∈ N and all x_0 ∈ X_0, the dissipation inequalities (3.4) hold along any optimal pair of (3.1), then OCP (3.1) is said to be (strictly) dissipative with respect to (x̄, ū).

It is worth remarking that ℓ in (3.4) is the stage cost of (3.1). Moreover, we define the so-called supply rate s: Z → R as s(x, u) := ℓ(x, u) − ℓ(x̄, ū), while we denote S in (3.4) as a storage function. We then see that (3.4) are indeed dissipation inequalities [28, 36, 38]. Finally, we note that in the literature on EMPC different variants of the dissipation inequality (3.4b) are considered: in the early works [3, 20, 29] strictness is required only in x, while more recent works consider strictness in x and u, see [18, Remark 3.1] and [11, 17, 21].
Notice that the strict dissipation inequality (3.4b) implies that (x̄, ū) is the unique globally optimal solution to the following steady-state optimization problem (SOP)

    min_{x,u} ℓ(x, u)   subject to   x = f(x, u)  and  (x, u) ∈ Z.        (3.5)
As we will assume strict dissipativity throughout this chapter, the unique globally optimal solution to (3.5) is denoted by the superscript ·̄. Moreover, the optimal Lagrange
multiplier vector of the equality constraint x = f(x, u) is denoted as λ̄ ∈ R^{n_x} (assuming it is unique). Subsequently, we compare three different EMPC schemes (i)–(iii) based on (3.1) which differ in terms of the terminal penalty V_f and the terminal constraint X_f. These schemes are defined as follows:

(i)   V_f(x) = 0      and   X_f = {x̄};
(ii)  V_f(x) = 0      and   X_f = R^{n_x};
(iii) V_f(x) = λ̄⊤x    and   X_f = R^{n_x}.
Note that in the third scheme the optimal Lagrange multiplier vector λ̄ of (3.5) is used in the terminal penalty definition. Moreover, we remark that we consider these three schemes because their stability proofs rely on dissipativity of OCP (3.1) while explicit knowledge of a storage function is not required.
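The quantities x̄, ū, and λ̄ needed by schemes (i) and (iii) can be computed offline from the SOP (3.5). The following hedged sketch does so with CasADi, reusing the placeholder f and ℓ from the sketch above; note that the sign of the returned multiplier depends on the solver's Lagrangian convention, so it may have to be flipped to match the convention of this chapter.

```python
# Sketch: solve the SOP (3.5) and extract (x_bar, u_bar, lam_bar).
# The bound constraints (x,u) in Z are omitted here for brevity.
import casadi as ca

x = ca.SX.sym('x', nx)
u = ca.SX.sym('u', nu)
sop = {'x': ca.vertcat(x, u), 'f': ell(x, u), 'g': x - f(x, u)}
solver = ca.nlpsol('sop', 'ipopt', sop, {'ipopt.print_level': 0})
sol = solver(x0=0, lbg=0, ubg=0)
x_bar = sol['x'][:nx]
u_bar = sol['x'][nx:]
lam_bar = sol['lam_g']   # multiplier of x = f(x,u); sign convention is solver-specific
```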
3.2.1.1 Asymptotic Stability via Terminal Constraints
We begin our comparison by recalling conditions under which the EMPC scheme defined by (3.1) yields {practical, asymptotic} stability of the closed-loop system (3.3). The next assumption summarizes conditions sufficient to guarantee stability properties of the closed-loop system (3.3).

Assumption 3.2 (Dissipativity) OCP (3.1) is strictly dissipative with respect to (x̄, ū) ∈ Z in the sense of Definition 3.1(iii).

In [3] the following result analyzing the EMPC scheme (3.1) for V_f(x) = 0 and X_f = {x̄} has been presented.

Theorem 3.3 (Asymptotic stability of EMPC with terminal constraints) Let Assumption 3.2 hold. Suppose that V_f(x) = 0 and X_f = {x̄} in (3.1). Moreover, suppose that S and V_N are continuous at x = x̄. Then, for all initial conditions x̂_0 for which OCP (3.1) is feasible, it remains feasible for i ≥ 0, and the closed-loop system (3.3) is asymptotically stable at x̄.

This result, and its precursor in [8], is appealing as no knowledge about the storage function S is required. However, knowledge about the optimal steady state is required to formulate the terminal equality constraint X_f = {x̄}. Indeed, a dissipativity-based scheme with terminal inequality constraints, which requires knowledge of a storage function, has been proposed in [1].
3.2.1.2 Practical Asymptotic Stability Without Terminal Constraints and Penalties
Consequently, and similarly to conventional tracking NMPC schemes, the extension to the case without terminal constraints (X_f = R^{n_x}) has been of considerable interest.³ This has been done in [18, 20, 23] using further assumptions. To this end, we consider a set of initial conditions X_0 ⊂ R^{n_x}.

Assumption 3.4 (Reachability and local controllability)
(i) For all x_0 ∈ X_0, there exist an infinite-horizon admissible input u(·; x_0) and constants c ∈ (0, ∞), ρ ∈ [0, 1) such that

    ℓ(x(k; x_0, u(·; x_0)), u(k; x_0)) − ℓ(x̄, ū) ≤ c ρ^k,

i.e., the steady state x̄ is exponentially reachable.
(ii) The Jacobian linearization (A, B) of system (3.1c) at (x̄, ū) ∈ int Z is n_x-step reachable.⁴
(iii) The optimal steady state satisfies (x̄, ū) ∈ int Z.

In [18, 20] the following result analyzing the EMPC scheme (3.1) for V_f(x) = 0 and X_f = R^{n_x} has been presented. It uses the notion of a class KL function. A function γ: R_{≥0} → R_{≥0} is said to belong to class L if it is continuous, strictly decreasing, and lim_{s→∞} γ(s) = 0. A function β: R_{≥0} × R_{≥0} → R_{≥0} is said to belong to class KL if it is of class K in its first argument and of class L in its second argument.

Theorem 3.5 (Practical stability of EMPC without terminal constraints) Let Assumptions 3.2 and 3.4 hold. Suppose that Z is compact and that V_f(x) = 0 and X_f = R^{n_x} in (3.1). Then the closed-loop system (3.3) has the following properties:
(i) If x_0 ∈ X_0 and N ∈ N is sufficiently large, then OCP (3.1) is feasible for all i ∈ N.
(ii) There exist γ(N) ∈ R_+ and β ∈ KL such that, for all x_0 ∈ X_0, the closed-loop trajectories generated by (3.3) satisfy

    ‖x_i − x̄‖ ≤ max{β(‖x_0 − x̄‖, i), γ(N)}.

The proof of this result is based on the fact that the dissipativity property of OCP (3.1) implies a turnpike property; it can be found in [18]. The original versions in [20, 22] do not entail the recursive feasibility statement. Under additional continuity assumptions on S and on the optimal value function, one can show that the size of the neighborhood—i.e., γ(N) in (ii)—converges to 0 as N → ∞, cf. [18, Lemma 4.1 and Theorem 4.1].

³ Note that the input-state constraints defined via Z from (3.2) are imposed also at the end of the prediction horizon, i.e., at k = N.
⁴ We remark that n_x-step reachability of x⁺ = Ax + Bu implies that starting from x = 0 one can reach any x ∈ R^{n_x} within n_x time steps, and one can steer any x ≠ 0 to the origin within n_x time steps, cf. [34]. In other words, n_x-step reachability implies n_x-step controllability.
Similar results can also be established for the continuous-time case [11].
We remark that the size of the practically stabilized neighborhood of x̄ might be quite large—especially for short horizons. We will illustrate this via an example in Sect. 3.4; further ones can be found in [18, 40]. Moreover, comparison of Theorems 3.3 and 3.5 reveals a gap: while with terminal constraints X_f = {x̄} strict dissipativity implies asymptotic stability of the EMPC loop, without terminal constraints merely practical stability is attained.
3.2.1.3 Asymptotic Stability via Gradient Correction
We turn toward the question of how this gap can be closed. In [16, 40] we analyzed EMPC using V_f(x) = λ̄⊤x and X_f = R^{n_x}.
In order to state the core stability result concisely, we first recall a specific linear-quadratic approximation of OCP (3.1). To this end, and similar to [16], we consider the following Lagrangian⁵ of OCP (3.1):

    L_0 = λ_0⊤ (x_0 − x̂_i),                                                        k = 0,           (3.6a)
    L_k = ℓ(x_k, u_k) + λ_{k+1}⊤ (f(x_k, u_k) − x_{k+1}) + μ_k⊤ g(x_k, u_k),        k ∈ I_[0,N−1],   (3.6b)
    L_N = V_f(x_N) + μ_N⊤ g(x_N, u_N),                                              k = N.           (3.6c)

Accordingly, we have the Lagrangian of SOP (3.5) as

    L̄ = ℓ(x, u) + λ⊤ (f(x, u) − x) + μ⊤ g(x, u).                                   (3.7)
In correspondence with this Lagrangian, we denote the optimal dual variables of SOP (3.5) as λ̄, μ̄ ≥ 0. The optimal adjoint and multiplier trajectories of OCP (3.1) are written as λ*(·; x̂_i) and μ*(·; x̂_i) ≥ 0. Subsequently, we consider the following linear time-invariant LQ-OCP

    min_{x,u}  Σ_{k=0}^{N−1} ( ½ z_k⊤ W z_k + w⊤ z_k ) + ½ x_N⊤ P_N x_N + x_N⊤ p_N   (3.8a)
    s.t.  x_0 = x̂_0,                                                                (3.8b)
          x_{k+1} = A x_k + B u_k,      k ∈ I_[0,N−1],                               (3.8c)
          C x_k + D u_k − g(z̄) ≤ 0,    k ∈ I_[0,N],                                 (3.8d)

with z_k := [x_k⊤ u_k⊤]⊤,

⁵ Arguably, one could denote L from (3.6) also as a Hamiltonian. However, as we are working with discrete-time systems, we stick to the nonlinear programming terminology. Readers familiar with optimal control theory will hopefully experience no difficulties translating this to a continuous-time notion, cf. [40].
where the linear dynamics and path constraints are defined via the Jacobians A = f_x, B = f_u, C = g_x, D = g_u, and the quadratic objective is given by

    W = [ Q   S ]        w = [ q ]
        [ S⊤  R ],           [ r ]

with

    Q = L̄_xx,   S = L̄_xu,   R = L̄_uu,   q = ℓ_x,   r = ℓ_u,   P_N = V_{f,xx},   p_N = V_{f,x},
where the functions and derivatives above are all evaluated at the primal-dual optimal solution of the SOP (3.5), i.e., at x̄ = 0, ū = 0, λ̄, μ̄ = 0. Observe that μ̄ = 0 corresponds to Assumption 3.4(iii) and that x̄ = 0, ū = 0 is assumed without loss of generality. However, in general λ̄ ≠ 0, as detailed in [16, 40]. Moreover, we denote the optimal primal and dual variables of the LQ-OCP (3.8) by

    u*_{LQ}(x̂_i) := [u*_{LQ,0}(x̂_i)⊤ … u*_{LQ,N−1}(x̂_i)⊤]⊤,
    ξ*_{LQ}(x̂_i) := [ξ*_{LQ,0}(x̂_i)⊤ … ξ*_{LQ,N}(x̂_i)⊤]⊤,   ξ ∈ {x, λ, μ}.

Assumption 3.6 (Local approximation and stabilization properties)
(i) The SOP (3.5) is such that the optimal dual variable λ̄ is unique.
(ii) There exists a horizon length N < ∞ such that the receding-horizon feedback generated by the LQ-OCP (3.8) without the constraint (3.8d) stabilizes the linearized system (A, B) at some point (x̃, ũ), which may differ from (x̄, ū).
(iii) The primal and dual optimal solutions of OCP (3.1) and LQ-OCP (3.8) satisfy

    ‖ξ*_{LQ}(x̂_i) − ξ*(x̂_i)‖ = O(‖x̂_i‖²),   ξ ∈ {x, u, λ, μ}.
Part (i) of the above assumption is an implicit requirement of linear independence constraint qualification (LICQ) in (3.5), while Part (ii) is essential for our later developments as it allows the assessment of asymptotic stability. Finally, Part (iii) can be read as a regularity property of OCP (3.1). In [16, Proposition 1] we have discussed that this can be enforced, for example, via strict complementarity.
In [16] the following result analyzing the EMPC scheme (3.1) for V_f(x) = λ̄⊤x and X_f = R^{n_x} has been presented. The continuous-time counterpart can be found in [40].

Theorem 3.7 (Asymptotic stability of EMPC with linear terminal penalty) Let Assumptions 3.2, 3.4, and 3.6 hold. Suppose that Z is compact and that V_f(x) = λ̄⊤x and X_f = R^{n_x} in (3.1). Then, if x_0 ∈ X_0, there exists a sufficiently large finite horizon N ∈ N such that
(i) OCP (3.1) is feasible for all i ∈ N;
(ii) x̄ is an exponentially stable equilibrium of the closed-loop system (3.3).
Leaving the technicalities of Assumption 3.6 aside, it is fair to ask which inner mechanisms cause the linear end penalty V_f(x) = λ̄⊤x to make the difference between practical and asymptotic stability. Moreover, a comparison of the results of Theorems 3.3, 3.5 and 3.7 is clearly in order. We will comment on both aspects below.
3.3 Comparison

We begin our comparison of the three EMPC schemes with an observation: neither Theorem 3.3 nor Theorem 3.5 uses any sort of statement on dual variables—i.e., Lagrange multipliers μ and adjoints λ. Indeed, a close inspection of the proofs of Theorem 3.3 [3] and of Theorem 3.5 [18, 20] confirms that they are solely based on primal variables. Actually, despite the crucial nature of optimization for NMPC [22, 31], the vast majority of NMPC proofs does not involve any information on dual variables. However, as documented in [18, 40], the proof of Theorem 3.7 relies heavily on dual variables.
3.3.1 Discrete-Time Euler–Lagrange Equations

Based on this observation, we begin our comparison of the three EMPC schemes by detailing the first-order necessary conditions of optimality (NCO), i.e., the KKT conditions of OCP (3.1), which we present in the form of discrete-time Euler–Lagrange equations [5]. The overall Lagrangian of OCP (3.1) reads as

    L = Σ_{k=0}^{N} L_k,

with L_k from (3.6). The first-order NCO are given by ∇L = 0, which entails

    L_λ = 0   ⇒   x_{k+1} = f(x_k, u_k),   x_0 = x̂_i,                    (3.9a)
    L_x = 0   ⇒   λ_k = f_x⊤ λ_{k+1} + ℓ_x + g_x⊤ μ_k,                    (3.9b)
    L_u = 0   ⇒   0 = f_u⊤ λ_{k+1} + ℓ_u + g_u⊤ μ_k.                      (3.9c)
The reader familiar with the Euler–Lagrange equations might notice that (3.9) misses a crucial piece, i.e., the terminal constraint on either the primal state x_N or the dual (adjoint) variable λ_N. Depending on the specific choice for V_f and X_f, and assuming that the constraint (3.1d) is not active at k = N (which implies μ_N = 0), these terminal constraints read
(i)   x_N = x̄,        λ_N ∈ R^{n_x},   if V_f(x_N) = 0       and X_f = {x̄};
(ii)  x_N ∈ R^{n_x},   λ_N = 0,         if V_f(x_N) = 0       and X_f = R^{n_x};
(iii) x_N ∈ R^{n_x},   λ_N = λ̄,         if V_f(x_N) = λ̄⊤x_N   and X_f = R^{n_x}.   (3.9d)
Before proceeding, we remark that the full KKT conditions would also comprise primal feasibility, dual feasibility and complementarity constraints. We do not detail those here, as the discrete-time Euler–Lagrange equations (3.9) provide sufficient structure for our analysis, and all left-out optimality conditions coincide for the three problem formulations. Comparing the three EMPC schemes (i)–(iii) at hand, the first insight is obtained from (3.9d): the only difference in the optimality conditions is the boundary constraint. We comment on the implications of this fact next.
3.3.2 Primal Feasibility and Boundary Conditions of the NCO

In case of Scheme (i), the NCO comprise the primal terminal constraint (primal boundary condition) x_N = x̄. In fact, for any x̂_0 ∈ X_0, finite-time reachability of x̄ must be given in order for (3.1) to be feasible. In other words, feasibility of x_N = x̄ is a necessary condition for OCP (3.1) to admit optimal solutions.
In case of Schemes (ii) and (iii), the NCO (3.9) comprise the dual boundary conditions λ_N = 0, respectively λ_N = λ̄. The crucial difference between primal and dual boundary conditions is that the existence of an optimal solution certifies that the latter are feasible. In terms of logical implications—and provided that OCP (3.1) viewed as an NLP satisfies LICQ and the assumption on continuity of problem data—we have that

    primal feasibility of OCP (3.1) ⇒ existence of optimal solutions of OCP (3.1) ⇒ dual feasibility of NCO (3.9).

Let

    F^{V_f}_{X_f}(x_0, N) ⊆ Z × ⋯ × Z ⊆ R^{(N+1)(n_x+n_u)}

denote the feasible set of OCP (3.1) parametrized by the initial condition x_0 and the horizon length N, where the subscript ·_{X_f} refers to the considered terminal constraint and the superscript ·^{V_f} highlights the terminal penalty. In terms of feasible sets, the differences between the three considered schemes, and thus the implications of their respective primal and dual boundary constraints, can be expressed as follows:

    F^0_{{x̄}}(x_0, N) ⊆ F^0_{R^{n_x}}(x_0, N) ≡ F^{λ̄⊤x}_{R^{n_x}}(x_0, N),   ∀ x_0 ∈ X_0, ∀ N ∈ N.
The first set relation follows from the fact that any feasible solution of Scheme (i) is also feasible in Schemes (ii) and (iii), but the opposite is not true. The second set relation is also evident, since Schemes (ii) and (iii) differ only in the objective function and not in the primal constraints.
3.3.3 Invariance of x̄ Under EMPC

The next aspect we address in terms of comparing Schemes (i)–(iii) is related to the invariance of x̄ under the EMPC feedback, which, as we shall see, also crucially depends on the boundary constraints (3.9d).
The invariance of x̄ under the EMPC feedback can be analyzed turning to the NCO (3.9), which entail

    x_{k+1} = f(x_k, u_k),
    λ_k = f_x⊤ λ_{k+1} + ℓ_x + g_x⊤ μ_k,
    0 = f_u⊤ λ_{k+1} + ℓ_u + g_u⊤ μ_k.

Recall that the SOP (3.5) implies the (partial) KKT optimality conditions

    x̄ = f(x̄, ū),                                                         (3.10a)
    λ̄ = f_x⊤ λ̄ + ℓ_x + g_x⊤ μ̄,                                           (3.10b)
    0 = f_u⊤ λ̄ + ℓ_u + g_u⊤ μ̄,                                           (3.10c)

for which, due to Assumption 3.2, (x̄, ū, λ̄, μ̄ = 0) is the unique solution. Note that μ̄ = 0 follows from (x̄, ū) ∈ int Z. In other words, the KKT conditions of SOP (3.5) coincide with the steady-state version of the NCO (3.9). As we will see, this observation, which is also sketched in Fig. 3.1, is crucial in analyzing invariance of x̄ under the EMPC feedback. We mention that, to the best of our knowledge, the first usages of this observation appear to be [32, 40]. Invariance of x̄ means

    x̄ = f(x̄, κ^{V_f}_{X_f}(x̄)).                                          (3.11)

This invariance holds if and only if, for the considered EMPC scheme,

    κ^{V_f}_{X_f}(x̄) = ū.

Here, instead of providing fully detailed proofs, we focus on the crucial system-theoretic aspects. Assumption 3.6 suggests to consider the NCO of the LQ approximation (3.8). Taking (x̄, ū) ∈ int Z into account, those NCO entail
Fig. 3.1 Relation of OCP and the corresponding Euler–Lagrange equations with the SOP and the corresponding KKT conditions
    x̄ = A x̄ + B ū,
    λ_k = A⊤ λ_{k+1} + Q x̄ + q,
    0 = B⊤ λ_{k+1} + R ū + r.

Assuming that det R ≠ 0 and neglecting the first equation, we obtain

    λ_k = A⊤ λ_{k+1} + Q x̄ + q,                                           (3.12a)
    ū = −R^{−1} (B⊤ λ_{k+1} + r).                                         (3.12b)
This in turn can be read as a linear uncontrolled system (3.12a) with a linear output (3.12b). Importantly, uniqueness of the steady-state adjoint λ̄ and controllability of (A, B) (⇔ observability of (A⊤, B⊤)) imply that λ_k = λ_{k+1} = λ̄ is the only steady-state solution of (3.12), see [16]. This shows that the closed-loop invariance condition (3.11) holds at x̄ if and only if the optimal adjoint satisfies λ*_0(x̄) = λ̄, i.e., if the state initial condition is x̄, then the adjoint at time 0 is λ̄.
Now, consider the boundary conditions of the adjoint λ as given in (3.9d). We start with case (ii), i.e., V_f(x_N) = 0, X_f = R^{n_x}, and have λ_N = 0. This implies
    λ_k = A⊤ λ_{k+1} + Q x̄ + q,   λ_N = 0.

If λ_0 = λ̄ held, then the linearity of the adjoint dynamics would imply λ_k ≡ λ̄ for all k ∈ I_[0,N]. This, however, contradicts the boundary constraint λ_N = 0. Hence the adjoint has to leave the optimal steady-state value directly in the first time step.⁶ A formal proof of the above considerations can be found in [16]. Moreover, we remark without further elaboration that in the continuous-time case, the mismatch induced by the dual terminal constraint λ_N = 0 leads to a periodic orbit which appears in closed loop, cf. [40].
Finally, in case of singular OCPs with det L̄_uu = det R = 0, the NCO do not directly allow one to infer the controls. One can arrive at one of two cases:
• For sufficiently long horizons, the open-loop optimal solutions reach the optimal steady state (x̄, ū) exactly. In this case, one says that the turnpike at (x̄, ū) is exact. Moreover, one can show that even with λ_N = 0 (i.e., no terminal penalty), invariance of x̄ under the EMPC feedback law and thus also asymptotic stability is attained. We refer to [12] for a detailed analysis of the continuous-time case.
• Despite the OCP being singular, for all finite horizons the open-loop optimal solutions get only close to the optimal steady state (x̄, ū), i.e., the turnpike is not exact but approximate [11]. Without any terminal penalty or primal terminal constraint, one cannot expect to achieve asymptotic stability in this case.
⁶ Likewise, considering the adjoint dynamics backwards in time from k = N to k = 0 implies that λ̄ is never reached, due to linearity of the dynamics.
In case (iii), i.e., V_f(x_N) = λ̄⊤x_N, X_f = R^{n_x}, we arrive at

    λ_k = A⊤ λ_{k+1} + Q x̄ + q,   λ_N = λ̄,
    ū = −R^{−1} (B⊤ λ_{k+1} + r).

In this case, the boundary condition λ_N = λ̄ prevents the adjoint from leaving λ̄.
Finally, for Scheme (i), i.e., V_f(x_N) = 0, X_f = {x̄}, there is no boundary constraint for the adjoint. Here invariance of x̄ is directly enforced by the primal terminal constraint x_N = x̄. Slightly simplifying, one may state that the local curvature implied by the strict dissipation inequality (3.4b) makes staying at x̄ cheaper than leaving and returning. Moreover, one can also draw upon the stability result from Theorem 3.3 to conclude that the invariance condition (3.11) holds, which in our setting implies that the adjoint is also at its steady state λ̄.
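The core of the argument in case (ii) is elementary and can be checked numerically. The following sketch, with arbitrarily chosen toy data rather than any example from this chapter, iterates the backward adjoint recursion from both boundary values:

```python
# Toy illustration: the backward adjoint recursion lam_k = A' lam_{k+1} + c
# stays at its fixed point lam_bar only if the boundary value is lam_bar.
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])
Q = np.eye(2)
x_bar = np.array([1.0, 0.5])
q = np.array([-0.3, 0.1])
c = Q @ x_bar + q
lam_bar = np.linalg.solve(np.eye(2) - A.T, c)   # fixed point of the recursion

def backward(lam_N, N=20):
    lam = lam_N
    for _ in range(N):
        lam = A.T @ lam + c
    return lam                                   # this is lam_0

print(backward(lam_bar) - lam_bar)       # ~0: adjoint stays at lam_bar
print(backward(np.zeros(2)) - lam_bar)   # nonzero: lam_N = 0 forces lam_0 != lam_bar
```

For λ_N = λ̄ the recursion stays at the fixed point exactly; for λ_N = 0 the backward iterates approach λ̄ as the horizon grows (a dual turnpike) but, by linearity, never reach it, matching the discussion above.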
3.3.4 Bounds on the Stabilizing Horizon Length

The next aspect we are interested in analyzing is the minimal stabilizing horizon length induced by the differences in Schemes (i)–(iii). To this end, recall Assumption 3.4(i), which requires exponential reachability of x̄ from all x_0 ∈ X_0. This also means that the set X_0 is controlled forward invariant. In other words, for all x_0 ∈ X_0, there exist infinite-horizon controls such that the solutions stay in X_0 for all times.
Similarly to the feasible sets F^0_{{x̄}}(N, x), F^0_{R^{n_x}}(N, x) and F^{λ̄⊤x}_{R^{n_x}}(N, x), we use super- and subscripts to distinguish the EMPC feedback laws κ^0_{x̄}(N, x), κ^0_{R^{n_x}}(N, x) and κ^{λ̄⊤x}_{R^{n_x}}(N, x) for the three EMPC schemes. Moreover, using the arguments N, x we highlight the dependence of the feedback on the horizon length N.
In case of Scheme (i), the computation of the minimal stabilizing horizon length can be formalized via the following bi-level optimization problem:

    N^0_{x̄} = min_{N ∈ N} N                                              (3.13a)
    subject to
    F^0_{{x̄}}(N, x) ≠ ∅,             ∀ x ∈ X_0,                           (3.13b)
    f(x, κ^0_{x̄}(N, x)) ∈ X_0,       ∀ x ∈ X_0.                           (3.13c)

The first constraint (3.13b) encodes the observation that for Scheme (i) feasibility implies closed-loop stability of x̄. The second constraint (3.13c) encodes that, for any NMPC scheme to be stabilizing, the set X_0 is indeed rendered forward invariant by the NMPC feedback. The latter constraint implies the bi-level optimization nature of the problem: in order to solve (3.14)/(3.15), one needs to simulate the EMPC closed loop to obtain the feedback in (3.13c).
Note that this constraint will not be (strongly) active in case of a terminal point constraint. This can be seen from the fact that leaving (3.13c) out will not change the value of N^0_{x̄}. The reason is that, provided the problem was feasible at the previous time step, the pointwise terminal constraint makes it immediate to construct a feasible
guess for the NMPC problem (3.1). Therefore, (3.13) can be rewritten equivalently as a single-level optimization problem. Here we include this constraint to simplify the comparison of the three NMPC schemes.
Notice that the problem above, if solved to global optimality, provides the true minimal stabilizing horizon length for Scheme (i). Yet, solving it is complicated by the fact that F^0_{{x̄}}(N, x_0) ≠ ∅, ∀ x_0 ∈ X_0, is an infinite-dimensional constraint. Moreover, as long as one does not tighten exponential reachability to finite-time reachability of x̄ from all x_0 ∈ X_0, the optimal solution to (3.13) might be N^0_{x̄} = ∞.
The straightforward counterparts to (3.13) for Schemes (ii) and (iii) read
(3.14a)
N ∈N
subject to
FR0n x (N , x) = ∅, f (x, κR0 n x (N , x))
∈ X0 ,
∀ x ∈ X0 ,
(3.14b)
∀ x ∈ X0 ,
(3.14c)
respectively, ¯
NRλn xx = min N
(3.15a)
N ∈N
subject to
λ¯ x Rn x
F
(N , x) = ∅,
¯ f (x, κRλ n xx (N , x))
∈ X0 ,
∀ x ∈ X0 ,
(3.15b)
∀ x ∈ X0 .
(3.15c)
If, in all problems (3.13)–(3.15), the forward invariance constraints (3.13c)–(3.15c) are inactive—which is indeed the case for (3.13c) or if there are no state constraints implied by Z—the following relation is easily derived:

    N^0_{R^{n_x}} = N^{λ̄⊤x}_{R^{n_x}} ≤ N^0_{x̄}.

At first sight, this relation appears to be a rigorous advantage of Schemes (ii) and (iii). However, for those schemes there is, to the best of our knowledge, no general guarantee that recursive feasibility alone implies asymptotic properties. As a matter of fact, Scheme (ii) only admits practical asymptotic stability properties. Hence, for Schemes (ii) and (iii), the horizon length computed via (3.14) or (3.15) constitutes a lower bound on the minimal stabilizing horizon length of the underlying schemes.
Finally, we remark that while for (3.13) the invariance constraint (3.13c) is inactive, in (3.14) and (3.15) the feasibility set constraints (3.14b) and (3.15b) are inactive. The reason is that Assumption 3.4(i) implies that X_0 is a controlled forward invariant set. In turn, this directly implies that for all x ∈ X_0 and any N ∈ N_+ the feasibility sets in (3.14b) and (3.15b) are non-empty, as there are no primal terminal constraints in the underlying OCPs. However, inactivity of these constraints does not provide a handle to overcome the bi-level optimization nature of (3.14) and (3.15). Hence we turn toward an approximation procedure.
Computational Approximation
To approximate the solution to (3.13), we fix a set of initial conditions

    X̃_0 := {x_0^j, j = 1, …, M}.

For all samples x_0^j, we solve the minimum-time problem

    N^0_{x̄}(x_0^j) := min_{x,u,N} N                                      (3.16a)
    subject to
    x_0 = x_0^j,                                                          (3.16b)
    x_{k+1} = f(x_k, u_k),   k ∈ I_[0,N−1],                               (3.16c)
    g(x_k, u_k) ≤ 0,         k ∈ I_[0,N],                                 (3.16d)
    x_N = x̄.                                                              (3.16e)

By virtue of Bellman's optimality principle, the minimum-time problem defined above allows one to conclude the behavior of the closed loop with primal terminal constraint. Whenever no solution to this problem is found, we define N^0_{x̄}(x_0^j) := ∞. Eventually, an approximation of (3.13) is given by

    N^0_{x̄} ≈ max_{x_0^j ∈ X̃_0} N^0_{x̄}(x_0^j),

provided that the samples x_0^j are sufficiently dense and cover a sufficiently large subset of the x-projection of Z. Observe that additional information is contained in the tuples (x_0^j, N^0_{x̄}(x_0^j)): large values of N^0_{x̄}(x_0^j) indicate that reaching x̄ from x_0^j is difficult.
The conceptual counterpart to (3.16) for Schemes (ii) and (iii) is a forward simulation of the closed loop with † ∈ {0, λ̄⊤x} and
    N^†_{R^{n_x}}(x_0^j) := min_N N                                       (3.17a)
    subject to
    x_0 = x_0^j,                                                          (3.17b)
    x_{k+1} = f(x_k, κ^†_{R^{n_x}}(N, x_k)),   k ∈ I_[0,N_cl],            (3.17c)
    x_{N_cl} ∈ B_ρ(x̄),                                                    (3.17d)

where B_ρ denotes a ball of radius ρ and N_cl denotes the closed-loop simulation horizon, which should ideally be infinite but can in practice only be finite. Obviously, the above problem requires one to solve a large number of OCPs when simulating the closed loop. Moreover, one has to rigorously define the radius ρ, i.e., the accuracy by which the optimal steady state should be attained. This approximation procedure leads to

    N^†_{R^{n_x}} ≈ max_{x_0^j ∈ X̃_0} N^†_{R^{n_x}}(x_0^j),   † ∈ {0, λ̄⊤x}.
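The search over N in (3.16) and (3.17) boils down to a simple loop per sample. The following sketch outlines the sampling procedure; solve_ocp_feasible and closed_loop_reaches_ball are hypothetical callbacks standing in for an NLP-based feasibility check of (3.16) and a closed-loop simulation of (3.17), respectively.

```python
# Schematic approximation of the minimal stabilizing horizon lengths via
# sampling; the two callbacks are placeholders, not part of the chapter.
def min_time_horizon(x0, solve_ocp_feasible, N_max=100):
    """Approximate N_xbar^0(x0) of (3.16): increase N until the
    terminal-constrained OCP becomes feasible."""
    for N in range(1, N_max + 1):
        if solve_ocp_feasible(x0, N):
            return N
    return float('inf')

def min_stabilizing_horizon(x0, closed_loop_reaches_ball, N_max=100):
    """Approximate N^dagger(x0) of (3.17): simulate the EMPC closed loop
    and check whether it ends in the rho-ball around x_bar."""
    for N in range(1, N_max + 1):
        if closed_loop_reaches_ball(x0, N):
            return N
    return float('inf')

def approximate_over_samples(samples, horizon_of_sample):
    # The outer maximization over the sample set X_tilde_0.
    return max(horizon_of_sample(x0) for x0 in samples)
```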
3.4 Simulation Example

We consider a system with state x = [x_A, x_B]⊤ ∈ [0, 1]², control u ∈ [0, 20], and stage cost and dynamics

    ℓ(x, u) = −2u x_B + 0.5u + 0.1(u − 12)²,                              (3.18a)
    f(x, u) = [ x_A + 0.01 u (1 − x_A) − 0.12 x_A ]
              [ x_B − 0.01 u x_B + 0.12 x_A      ].                       (3.18b)
This example has also been considered in [16]. The optimal steady state is x̄ = [0.5, 0.5]⊤, ū = 12 with λ̄ = [−100, −200]⊤. We computed the stabilizing horizon length for the three economic NMPC variants. For the two schemes without terminal constraints we observed that, in case V_f(x) = λ̄⊤x, we obtain stability with N = 1 for any ρ > 0, while for V_f(x) = 0 the stabilizing horizon length is independent of the initial condition but depends on ρ, as displayed in Fig. 3.2 (right). For the formulation with the terminal constraint x_N = x̄, the stabilizing horizon length depends on the initial condition, as some initial conditions are potentially infeasible. We display the stability regions for several choices of N in Fig. 3.2 (left) and observe that for N ≥ 7 stability is obtained for all feasible initial states. These results are also summarized in Table 3.1.
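The reported steady state and multiplier can be verified directly against the steady-state KKT conditions (3.10); the following short check (plain numpy, with the Jacobians of (3.18) written out by hand) confirms them:

```python
# Numerical check of the optimal steady state of (3.18):
# f(x_bar, u_bar) = x_bar and the KKT conditions (3.10) hold with mu_bar = 0.
import numpy as np

x_bar, u_bar = np.array([0.5, 0.5]), 12.0
lam_bar = np.array([-100.0, -200.0])

def f(x, u):
    xA, xB = x
    return np.array([xA + 0.01 * u * (1 - xA) - 0.12 * xA,
                     xB - 0.01 * u * xB + 0.12 * xA])

# Jacobians and cost gradients at (x_bar, u_bar):
fx = np.array([[1 - 0.01 * u_bar - 0.12, 0.0],
               [0.12, 1 - 0.01 * u_bar]])
fu = np.array([0.01 * (1 - x_bar[0]), -0.01 * x_bar[1]])
lx = np.array([0.0, -2 * u_bar])                  # d ell / dx
lu = -2 * x_bar[1] + 0.5 + 0.2 * (u_bar - 12)     # d ell / du

print(f(x_bar, u_bar) - x_bar)             # ~0  (steady state, (3.10a))
print(lx + (fx.T - np.eye(2)) @ lam_bar)   # ~0  (stationarity in x, (3.10b))
print(lu + fu @ lam_bar)                   # ~0  (stationarity in u, (3.10c))
```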
Fig. 3.2 Stabilizing horizon length. Left graph: N^0_{x̄} as a function of the initial state. Right graph: ρ as a function of N for Scheme (ii)

Table 3.1 Numerical approximation results of the minimal stabilizing horizon length for (3.18)

| EMPC                                 | Minimal stabilizing horizon length for x_0 ∈ X̃_0 | Eventual deviation from x̄   |
|--------------------------------------|---------------------------------------------------|------------------------------|
| (i)   V_f(x) = 0,    X_f = {x̄}      | N = 7                                             | 0                            |
| (ii)  V_f(x) = 0,    X_f = R^{n_x}   | N ≥ 1                                             | ρ > 0 (cf. Fig. 3.2, right)  |
| (iii) V_f(x) = λ̄⊤x,  X_f = R^{n_x}  | N = 1                                             | 0                            |
Chapter 4
Multi-level Iterations for Economic Nonlinear Model Predictive Control
A. Nurkanović et al.

… is a small constant, usually taken between 10^{−12} and 10^{−8}. Due to the special structure of the Hessian, the eigenvalue decomposition can be done blockwise; therefore it is computationally relatively cheap. The regularization strategies above usually result in linear convergence when applied to constrained optimization problems. Another obstacle is that some fast structure-exploiting QP solvers, such as FORCES [16], HPMPC [17], or qpDUNES [18], need full-space positive definite Hessians. The regularization techniques above will in general not yield quadratic convergence rates, which makes calculating exact Hessians unattractive. However, the second-order sufficient conditions (SOSC) for optimality require positive definiteness only of the reduced Hessian, i.e., the Hessian matrix projected onto the null space of the active constraint Jacobians [13]. A way to overcome all this in the case of ENMPC is to use the structure-preserving convexification algorithm from [53], which has a complexity that is linear in the horizon length N. For an exact-Hessian RTI this algorithm is available in the open-source package acados [12]. The procedure replaces the full-space Hessian with a positive definite one without altering the reduced Hessians. Note that this procedure maintains the block-diagonal structure and preserves the quadratic convergence rate, see Theorem 23 in [53]. However, if the reduced Hessian is indefinite, then the corresponding indefinite block is replaced by a mirrored or projected variant of it.
The exact Hessian depends only on the current iterate, so it has no memory. Another approach without memory for a specific class of problems that results in convex Hessians is the sequential convex quadratic programming (SCQP) method [54]. This algorithm exploits the convexities of the NLP while calculating the Hessian of the Lagrangian. Second-order derivatives are only evaluated for the convex objective and inequality constraint functions so that the computationally expensive second-order sensitivity propagation of the process dynamics (4.5c) is avoided. In the case of least-squares objectives which are used in classical NMPC, SCQP reduces to the well-known generalized Gauss–Newton method [55].
4.5.1 Hessian Update Formulae

Another popular approach to obtain computationally cheap (and possibly positive definite) Hessian approximations A^{k+1} for Newton-type optimization methods is the use of Hessian update formulas. In contrast to the approaches mentioned above, every new Hessian approximation A^{k+1} is obtained by a formula that uses the previous approximation A^k, the step s^k = Δw^k, and the difference of gradients of the Lagrangian y^k = ∇_w L(w^{k+1}, λ^{k+1}, μ^{k+1}) − ∇_w L(w^k, λ^{k+1}, μ^{k+1}). The motivation for Hessian update formulas is to start with some symmetric (positive definite) approximation A^0 and to incorporate curvature information based on the most recent step. There is no general way to start the procedure, and usually a scaled unit matrix A^0 = αI, α > 0, is used. We remind the reader that in the case of ENMPC in level D1 of the MLI, the exact Hessian of the previous level D2 iteration is used as a starting approximation, and thus we expect to have a good and well-scaled starting guess.
Hessian update formulas have been very popular in offline optimization, and most software implementations of SQP methods use them. One of the most widely used is the Broyden–Fletcher–Goldfarb–Shanno (BFGS) update [13]:
    A^{k+1} = A^k − (A^k s^k s^{k⊤} A^k)/(s^{k⊤} A^k s^k) + (y^k y^{k⊤})/(s^{k⊤} y^k).   (4.23)
However, the updates will stay positive definite only if the curvature condition

    s^{k⊤} y^k > 0                                                        (4.24)

is satisfied. A naive way to overcome this is to simply skip an update if s^{k⊤} y^k is negative or close to zero, but this may lead to many skipped steps, and therefore no curvature information would be captured. Powell [56, 57] suggested a modification to the BFGS formula, usually called damped BFGS, which keeps the Hessian approximation A^{k+1} positive definite even if the underlying exact Hessian is indefinite.
Analytical and experimental studies have shown that the BFGS update formula has good self-correcting properties, i.e., if the curvature is wrongly captured at some step, the Hessian approximation will tend to correct itself within a few steps; however, this holds only if a globalization strategy is applied, see Sect. 6.1 in [13].
Hessian update formulas have not been popular in online optimization since they often show fluctuating contraction rates. In the case of MPC, a state jump or external noise might result in a poor approximation, and many updates may be needed in order to correct the contraction rate. But in the case of MLI, better performance can be expected, since after every few iterates a new starting guess is provided, so that all the disturbances are removed from the "memory" of the Hessian approximation. Another widely used formula is the Symmetric-Rank-1 (SR1) update:

    A^{k+1} = A^k + ((y^k − A^k s^k)(y^k − A^k s^k)⊤)/((y^k − A^k s^k)⊤ s^k).   (4.25)
In contrast to the damped BFGS formula, this update does not guarantee to maintain positive definiteness, and therefore it might actually better capture the behavior of the exact Hessian. In [52] the projected Hessian is used as regularization to replace possibly indefinite Hessians. But it can be expected that the reduced Hessian is positive definite even when the SR1 update results in an indefinite full-space Hessian; hence it is reasonable to apply the convexification procedure mentioned above. One drawback is that the denominator in (4.25) might vanish for some iterates. Furthermore, it has been observed in [34] that small values of the denominator yield ill-conditioned Hessian approximations. This is often the case in online optimization, where the updates are small when no big parameter changes occur. One way to handle this is to skip the update unless the following condition holds [52]:

    |(y^k − A^k s^k)⊤ s^k| > ε_SR1 ‖y^k − A^k s^k‖ ‖s^k‖,   ε_SR1 ∈ (0, 1).   (4.26)
If ε_SR1 is chosen too small, the Hessian approximation might still become ill-conditioned over time. Conversely, a too big value causes many skipped updates, and therefore the curvature information will not be captured. In a practical implementation, ε_SR1 is usually chosen between 10^{−8} and 10^{−6}. Note that this update-skipping procedure leads to a variant of fractional-level iterations, i.e., the Hessian blocks of the stages with no big changes in the states and controls are kept unchanged. The motivation for skipping updates is not exclusively to save computation time, but to always have well-defined iterates. Note that, if all updates are skipped, level D1 reduces to level D0.
Other update formulas that are sometimes used are the Broyden update, the Davidon–Fletcher–Powell (DFP) update, and the related Broyden class formulas, see Chap. 6 in [13]. Note that some of the update formulas do not result in a symmetric matrix. In this case, we symmetrize the corresponding block, A_i^k ← (A_i^k + A_i^{k⊤})/2, which is computationally cheap since it can be done block-wise.
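For concreteness, the following plain-numpy sketches implement the two updates discussed above: Powell's damped BFGS and SR1 with the skipping safeguard (4.26). The damping threshold 0.2 is the customary choice from [56], and eps_sr1 = 1e-7 lies in the interval recommended above; both routines operate on one dense block and are illustrative, not the chapter's implementation.

```python
import numpy as np

def damped_bfgs(A, s, y, theta_min=0.2):
    """Powell's damped BFGS update; assumes A is symmetric positive definite."""
    As = A @ s
    sAs = s @ As
    sy = s @ y
    # Damping: blend y with A s so that the curvature condition holds.
    theta = 1.0 if sy >= theta_min * sAs else (1 - theta_min) * sAs / (sAs - sy)
    r = theta * y + (1 - theta) * As
    return A - np.outer(As, As) / sAs + np.outer(r, r) / (s @ r)

def sr1(A, s, y, eps_sr1=1e-7):
    """SR1 update with the safeguard (4.26): skip if the denominator is tiny."""
    v = y - A @ s
    if abs(v @ s) <= eps_sr1 * np.linalg.norm(v) * np.linalg.norm(s):
        return A                     # skipped update, keep the old block
    return A + np.outer(v, v) / (v @ s)
```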
It has been shown that, under mild assumptions, Hessian update formulas yield super-linear convergence [13]. Note that for any Hessian update formula no second-order derivative evaluations are needed: to get a new Hessian approximation, only additional linear algebra operations are required. However, it is non-trivial to compare the computational complexity of Hessian update formulas and exact Hessians, since in the latter case additional function evaluations are needed, whereas in the former mostly matrix-vector operations are performed.
Since the approximations are reused in update formulas, it is reasonable to take the evolution of time into account, similar to the shift initialization for primal-dual variables. Since the first block of the Hessian corresponds to an already old value, one can use a Hessian "recycled by a shift" [58]:

    diag(A_0, A_1, …, A_{N−1}, A_N)  →  diag(A_1, …, A_{N−1}, A_{N−1}, A_N),

where the first block A_0 is dropped, and the blocks at stages N−1 and N are initialized with A_{N−1} and A_N, i.e., kept from the previous iterate. However, similar to the shift strategy, this approach is only easily applicable for equidistant grids and when the feedback rate corresponds to Δt_MS. If the problem has a relatively short horizon, we suggest to rather use a warm-start strategy, i.e., taking the matrix A^k from the previous iterate without changes.
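In code, this shift is a one-line list operation; a minimal sketch, assuming blocks holds the list [A_0, …, A_N] of dense per-stage Hessian blocks:

```python
def shift_hessian_blocks(blocks):
    # Drop A_0, move every block one stage forward, duplicate the block of
    # stage N-1 for the new stage N-1, and keep the terminal block A_N.
    return blocks[1:-1] + [blocks[-2], blocks[-1]]
```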
4.5.2 Relative Cost of Exact Hessian Computation in Direct Multiple Shooting In practice computation of the exact Hessian is avoided as it is considered to be computationally expensive. However, it is interesting to note that in direct multiple shooting framework the ratio between Hessian and Jacobian evaluation times on each stage does not depend on the number of states. The reason is the following: if n x is the number of states, the Jacobian evaluation is about n x times more expensive than a simple forward simulation, if we use the forward mode of AD. On the other hand, the Lagrange gradient on the same multiple shooting stage can be obtained by a fixed multiple of a forward simulation if we use the reverse mode of AD [59]. The Hessian can be obtained by n x forward derivatives of the gradient (“forward over adjoint”) and thus costs a fixed multiple of Jacobian. We will consider the chain of masses as a benchmark example, which is, for instance, described in Sect. 5.3.1 in [9]. This example is easily scaled since just adding a further mass three new states are introduced. We simulate the ODE with an RK4 integrator implemented in CasADi [60] with five steps for 0.2 [s]. Thereby we
84
A. Nurkanovi´c et al. 10 3
ratio [-]
time [ms]
10 2 10 1 10 0 10 -1
Jacobian/Integration Hessian/Integration Hessian/Jacobian
10 2
Integration Jacobian Hessian
10 1
10 0 10 2
10 2
state dimension - n x
state dimension - n x
(a) Evaluation times in ms.
(b) Ratios between evaluation times.
Fig. 4.1 Ratios between function and derivative evaluation times for the chain of masses example
calculate the first-order sensitivity using forward mode of AD and the second-order sensitivity using forward over adjoint AD. In our benchmark we calculate the sensitivities for n x starting from 12 to 234 states (corresponding to 3–40 masses in the chain). The results are depicted in Fig. 4.1. We evaluate all functions 200 times and take the average values of the evaluation times. In Fig. 4.1b one can see that the ratio between the Hessian and Jacobian evaluation is for higher values of n x fixed between 4 and 5 (more precisely, for n x = 234, the ratio is 4.168), which gives an empirical confirmation of our argument. This suggests that after 4–5 iterations of levels involving only first-order derivatives we should calculate a full D2 iteration. Note that this holds only for evaluation of the second-order derivatives of the constraints for fixed Lagrangian multipliers, which we need for the Hessian of the Lagrangian. The first- and second-order derivative evaluation times of the objective scale with the N xi . dimension of the input space, a simple illustrative example is, e.g., f (x) = i=1
4.6 A Tutorial Example To illustrate the convergence rates of the different variants, the quality of generalized tangential predictors and the path-following capabilities of different MLI schemes we regard a simple nonlinear pendulum from Exercise 8.10 in [6], with the dynamics: x(t) ˙ = f (x, u) =
C sin
v(t) p(t) C
+u
(4.27)
180 with the state vector x = [ p, v]T and C = 10π , where p is the position, v the velocity and u the control force. The goal is to bring the pendulum from x0 = [10, 0]T to x N = [0, 0] in a time horizon of T = 10 [s] while minimizing the control effort. The
direct multiple shooting method is used to discretize the arising OCP with 50 stages on an equidistant grid with a stage length of Δt_MS = 0.2 [s]. On every stage a single step of a fourth-order Runge–Kutta integrator is used. With the bounds on p, v, and u denoted by p_max, v_max, and u_max, the discretized OCP reads as
    min_{s_0,…,s_N, q_0,…,q_{N−1}}  Σ_{i=0}^{N−1} ½ |q_i|²                (4.28a)
    s.t.  s_0 − ξ = 0,                                                    (4.28b)
          s_{i+1} − x_i(t_{i+1}; s_i, q_i) = 0,   i = 0, …, N−1,          (4.28c)
          s_N = 0,                                                        (4.28d)
          −x_max ≤ s_i ≤ x_max,   i = 0, …, N,                            (4.28e)
          −u_max ≤ q_i ≤ u_max,   i = 0, …, N−1.                          (4.28f)
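For reference, a compact CasADi transcription of (4.28) via direct multiple shooting is sketched below, solved here with IPOPT (the chapter instead applies its own SQP variants to this NLP). The bound values p_max, v_max, u_max are chosen ad hoc for illustration, as the chapter does not fix them here.

```python
# Direct multiple shooting transcription of (4.28); bounds are ad hoc.
import casadi as ca
import numpy as np

N, dt, C = 50, 0.2, 180.0 / (10.0 * np.pi)
p_max, v_max, u_max = 15.0, 10.0, 10.0
xi = np.array([10.0, 0.0])

x = ca.SX.sym('x', 2)
u = ca.SX.sym('u')
f = ca.Function('f', [x, u], [ca.vertcat(x[1], C * ca.sin(x[0] / C) + u)])

def rk4(xk, uk):
    # single RK4 step per stage, as in the text
    k1 = f(xk, uk)
    k2 = f(xk + dt / 2 * k1, uk)
    k3 = f(xk + dt / 2 * k2, uk)
    k4 = f(xk + dt * k3, uk)
    return xk + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

S = ca.SX.sym('S', 2, N + 1)   # shooting nodes s_0 .. s_N
Q = ca.SX.sym('Q', 1, N)       # controls q_0 .. q_{N-1}
cost = 0.5 * ca.sumsqr(Q)      # (4.28a)
g = [S[:, 0] - xi]             # (4.28b)
for i in range(N):
    g.append(S[:, i + 1] - rk4(S[:, i], Q[:, i]))   # (4.28c)
g.append(S[:, N])              # (4.28d): s_N = 0

w = ca.veccat(S, Q)
lbw = ca.veccat(ca.repmat(ca.DM([-p_max, -v_max]), N + 1, 1),
                -u_max * ca.DM.ones(N))             # (4.28e)-(4.28f)
nlp = {'x': w, 'f': cost, 'g': ca.vertcat(*g)}
sol = ca.nlpsol('sol', 'ipopt', nlp)(x0=0, lbx=lbw, ubx=-lbw, lbg=0, ubg=0)
```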
4.6.1 Convergence Rates

We will first look at the convergence rates of SQP algorithms with different Hessian and Jacobian approximations. The OCP (4.28) is solved with a full-step SQP algorithm, as described in Sect. 4.2. The algorithm is initialized at [w⊤, λ⊤, μ⊤]⊤ = 0.
Figure 4.2a shows the convergence rates of SQP algorithms for different Hessian and Jacobian approximations. The exact Hessian variant shows quadratic convergence, which is desired in real-time applications in order to reject disturbances and state jumps.
Fig. 4.2 Convergence of different SQP algorithms. The distance to a local optimum is plotted on the vertical axis, the iteration number on the horizontal axis. (a) Convergence rates for different Hessian and Jacobian approximations. (b) The effect of different regularization procedures on the convergence rate
Fig. 4.3 Convergence of different full-step quasi-Newton SQP algorithms, where some of the iterates are performed with an exact Hessian SQP algorithm. The distance to a local optimum is plotted on the vertical axis, the iteration number on the horizontal axis. (a) Exact Hessian SQP iterations only at the first three iterates, followed by quasi-Newton SQP. (b) Exact Hessian SQP iterations at the first three iterates, followed by quasi-Newton SQP; additionally, every five iterates another exact Hessian SQP step is taken to get a new starting guess for the Hessian update formulas
use of a regularization strategy can harm the convergence rates: the projected SR1 results only in linear convergence. Interestingly, BFGS does not converge if not used with care. This formula always needs a positive definite starting guess, which in this example was not the case for the full-space exact Hessian. A similar positive effect of proper starting Hessian approximations can also be observed in Fig. 4.3b. Again, the simple regularization strategies harm the convergence, and the BFGS SQP does not converge if the starting guess is not positive definite.
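To make the two update formulas discussed above concrete, the following minimal NumPy sketch implements an SR1 update with a skipping safeguard in the spirit of (4.26) and a Powell-damped BFGS update that preserves positive definiteness. The function names and the threshold r are our own illustrative choices, not taken from the chapter's implementation:

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """SR1 Hessian update with a standard skipping safeguard.
    B: current Hessian approximation, s: step difference,
    y: difference of Lagrangian gradients. The update is skipped when
    the denominator is too small, which avoids ill-conditioning."""
    v = y - B @ s
    denom = v @ s
    if abs(denom) < r * np.linalg.norm(s) * np.linalg.norm(v):
        return B  # skip: keep the previous approximation
    return B + np.outer(v, v) / denom

def damped_bfgs_update(B, s, y):
    """Powell-damped BFGS update: y is replaced by a convex combination
    with B s so that the updated matrix stays positive definite even if
    the true curvature s^T y is not sufficiently positive."""
    sBs = s @ B @ s
    sy = s @ y
    theta = 1.0 if sy >= 0.2 * sBs else (0.8 * sBs) / (sBs - sy)
    r_vec = theta * y + (1.0 - theta) * (B @ s)
    return B - np.outer(B @ s, B @ s) / sBs + np.outer(r_vec, r_vec) / (s @ r_vec)
```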
4.6.2 Generalized Tangential Predictors

In this section the goal is to visualize the generalized tangential predictors obtained when using QPs with different Jacobian and Hessian approximations, which we abbreviate as predictors from now on. The initial value ξ is varied from [0, 0]^T to [10, 0]^T and the OCP (4.28) is solved to convergence for every ξ in order to obtain the solution manifold y*(ξ), which is the gray curve in the plots in Figs. 4.4, 4.5, 4.6, and 4.7. Since this is a high-dimensional problem, we plot only the first part of the control, i.e., q₀*. We make several iterations for a fixed initial value ξ = [7, 0]^T in order to have just the predictor part in the QP and visualize the approximated predictors of an exact and strongly regular solution y*(ξ) for different SQP variants. Figure 4.4a depicts the predictors for an exact Hessian SQP and Fig. 4.4b for GN SQP. As expected, the exact Hessian yields a tangential predictor; the GN, however, results in a bad approximation for this example. Even though GN has good convergence properties, this predictor might yield large errors if, for instance, many level A iterates are performed using the QP obtained from a GN SQP algorithm. Next we look at the predictors from a QP obtained from an SR1 SQP in Fig. 4.5a and from the SQP variant used in level D0 in Fig. 4.5b, i.e., where the Hessian matrix is kept fixed. For the SR1 SQP we used the update-skipping heuristic (4.26) to avoid ill-conditioned Hessians and, as starting guess for the Hessian of a problem with a new parameter value p, we used the Hessian obtained for the previous parameter. In the first case an excellent and computationally cheap predictor is obtained; in the latter case a larger error can be seen. Note that the quality of the D0 predictor strongly depends on the Hessian provided by the higher levels; in this case an exact Hessian calculated at the solution of the OCP with the initial value [0, 0]^T was used. Next, the predictors from QPs obtained via a Broyden SQP (Fig. 4.6a) and an adjoint SQP (Fig. 4.6b) are illustrated. Similar to SR1, the first variant results in a very good predictor. The adjoint SQP yields a rather bad approximation of the exact predictor but, similar to D0, the quality of the predictor strongly depends on the Hessian and Jacobian provided by higher levels. In order to illustrate the performance of the adjoint SQP without the influence of other levels, in this case a scaled identity matrix was used for the Hessian and the Jacobians were evaluated at the solution of the OCP with the initial value [0, 0]^T.
Fig. 4.4 Generalized tangential predictors from QPs with an exact Hessian and QPs with a Gauss–Newton Hessian approximation; the gray curve is the solution manifold. (a) Exact Hessian. (b) Gauss–Newton
Fig. 4.5 Generalized tangential predictors from QPs with an SR1 Hessian approximation and QPs with derivative approximations according to the level D0 variant of the MLI scheme. (a) SR1. (b) Level D0
Fig. 4.6 Generalized tangential predictors from QPs with Hessian approximations according to Broyden's formula and from QPs corresponding to the adjoint SQP scheme. (a) Broyden. (b) Adjoint SQP
Fig. 4.7 Influence of regularization strategies on the generalized tangential predictors. (a) Projected exact Hessian. (b) Projected SR1
As seen in the previous subsection, the use of regularization strategies can strongly influence the convergence rate of SQP algorithms even if exact Hessians or Hessian update formulas are used. The same holds for the resulting predictors based on the modified Hessians. The predictors for a projected exact Hessian and a projected SR1 are depicted in Fig. 4.7. Even with an exact Hessian, the regularization yields a predictor similar to the computationally cheap variants with fixed Hessians. This underlines again the need for proper convexity-revealing strategies.
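For intuition, the sketch below computes a plain tangential predictor for an equality-constrained problem by a single KKT solve; for the generalized predictors above, this linear solve is replaced by a QP that also handles active-set changes. All names and the sign conventions are our own assumptions:

```python
import numpy as np

def tangential_predictor(H, G, F, y, lam, dxi):
    """One linear KKT solve giving a tangential predictor step for a
    parameter change dxi, assuming the parametric QP
    min 0.5 y'Hy  s.t.  Gy = F xi and a strongly regular solution
    (y, lam) at the old parameter."""
    n, m = H.shape[0], G.shape[0]
    kkt = np.block([[H, G.T], [G, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), F @ dxi])
    step = np.linalg.solve(kkt, rhs)
    return y + step[:n], lam + step[n:]
```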
4.6.3 Path-Following

Ideally, with unlimited computational resources, one would immediately calculate an optimal solution for a given parameter ξ and pass it back to the process. Due to the limited computational resources, however, solving the problem to convergence would result in outdated solutions, since the process has evolved in the meantime and has a different initial value ξ. Therefore, as discussed before, in order to avoid large feedback delays we can approximately solve the problem via a predictor–corrector path-following method in a minimal amount of computational time and try to track the optimal solution manifold y*(ξ) as closely as possible. We will now illustrate the path-following capabilities of different MLI schemes. In order to imitate the evolution of a process in time, we take discrete steps in the parameter ξ. As the higher levels take more time, we take larger steps for them to mimic greater changes in the parameter, since more time has passed, and smaller steps for the less computationally intensive levels. In such a setting, MLI schemes allow us to calculate an approximation of y*(ξ) for more values of ξ, which is in this setting equivalent to having feedback at higher rates. First, we illustrate the effect of refining the linearization point with the AS-RTI (Algorithm 4.2) compared to a warm-start strategy.
Fig. 4.8 Illustration of the predictor–corrector path-following capabilities of the exact Hessian RTI with different initialization strategies (solution manifold, new linearization point, tangential predictors, after corrector step, output). (a) Warm start. (b) AS-RTI with one extra QP solve in the preparation phase
In this section, we always use just a single level A iteration in the preparation phase of the AS-RTI, since it is computationally cheap. To keep the presentation clear, we use an exact Hessian RTI for this comparison and take discrete steps of 1 in the parameter ξ. In Fig. 4.8a one can see the path-following capabilities of the exact Hessian RTI with a warm start (given by Algorithm 4.1), i.e., the output (red circle) of the current iteration is the new linearization point for the next iteration (blue star). In this illustration we always take a corrector step starting from the new linearization point and then a predictor step starting from the corrected point, which completes a full RTI. The output of an RTI iteration is marked with a red circle. The use of Algorithm 4.2 is depicted in Fig. 4.8b. Thereby, we assumed a wrong guess for the next state by adding 0.8 to the last actual initial value (instead of the true value, which is obtained by adding 1.0). The new linearization point (blue star) lies on the predictor of the previous iteration (this corresponds to one level A iteration in line 2 of Algorithm 4.2). From this point a corrector step is made and then another predictor step, where the new predictor corresponds to the new linearization point. One can see that in this setting the outputs of the iterations are closer to the solution manifold than in the warm-start scenario. Note that with a perfect guess, the second predictor step would be equal to zero. In the next scenarios, whenever new derivatives need to be evaluated in some of the MLI levels, we solve one extra QP in the preparation phase to refine the linearization point, as in Algorithm 4.2. To keep the figures clear, each time only two MLI levels are run in parallel. For the lower level we always take discrete steps of 0.125 in the parameter ξ. Figure 4.9a depicts the path-following capabilities of an MLI scheme with levels D2 and A in parallel. One can see that the level A iterations lie on the predictor corresponding to the last D2 iteration, which illustrates that level A paired with a higher level is in some sense an adaptive linear MPC controller. Figure 4.9b depicts the results for the same setting when running levels D2 and C in parallel. In contrast to the first case, level C iterations also have corrector properties
Fig. 4.9 Illustration of the predictor–corrector path-following capabilities of different MLI schemes (solution manifold, output D2, tangential predictors, output A resp. output C). (a) D2 and A. (b) D2 and C
and get closer to the solution manifold. The plots for levels D0 and D1 (using the SR1 update formula with the update-skipping heuristic given by (4.26)) running in parallel with D2 look virtually the same as Fig. 4.9b: since in both cases the corrector part is present, they track the manifold closely. Furthermore, the lower levels use the last output of the upper level; hence, if this output is better, the derivative information is more accurate, which improves the performance of the lower levels. This example shows that a good combination of MLI levels allows one to follow the solution manifold accurately at very high rates.
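A hypothetical loop mimicking such an MLI combination is sketched below; prepare_qp and solve_qp are placeholders (a higher level refreshing derivatives every few steps, cheap lower-level iterations tracking the manifold in between), not functions from the chapter's code:

```python
import numpy as np

# Hypothetical two-level path-following loop: every `period` steps the
# higher level evaluates new derivatives (prepare_qp); in between, only
# the cheap QP-based predictor(-corrector) step is taken.
def mli_path_following(xi_path, y0, prepare_qp, solve_qp, period=5):
    y, qp_data = y0, None
    outputs = []
    for k, xi in enumerate(xi_path):
        if k % period == 0:          # higher level: new linearization
            qp_data = prepare_qp(y)
        y = y + solve_qp(qp_data, y, xi)  # lower level: track y*(xi)
        outputs.append(y.copy())
    return outputs
```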
4.7 Wind Turbine Control Example

In this section we analyze the presented algorithms on a more challenging numerical example: various ENMPC schemes for a wind turbine control (WTC) problem from [34], where the goal is to maximize the energy output of the wind turbine.
4.7.1 The Optimal Control Problem

ENMPC is a promising alternative to classical control of wind turbines, as it can naturally handle the operational constraints given by the grid operator and the physical limitations of the wind turbine. Moreover, using LIDAR (LIght Detection And Ranging) systems, accurate short-term wind speed predictions can be incorporated into the optimization problem and the control can be scheduled in time. Some successful applications of MPC for WTC are reported in [61–65]. We use a simple reduced model as presented in [61], in which structural dynamics, whose influence on the power generated by the wind turbine is rather negligible [66], are ignored.
Table 4.1 Wind turbine parameters [67]

Description                      Parameter    Value          Unit
Total rotor inertia              J            40.47 · 10^6   [kg m^2]
Gearbox ratio                    r_g          1/97           [–]
Air density                      ρ            1.23           [kg/m^3]
Rotor radius                     R            63             [m]
Rotor area                       A            1.25 · 10^4    [m^2]
Rated rotor speed                ω_rated      1.26           [rad/s]
Cut-in rotor speed               ω_cut-in     0.33           [rad/s]
Rated wind speed                 v_in         11.2           [m/s]
Cut-in wind speed                v_cut-in     3              [m/s]
Cut-out wind speed               v_out        25             [m/s]
Rated generator torque           T_g,rated    41.51          [kNm]
Rated power output               P_rated      5              [MW]
Generator efficiency             η            0.97           [–]
Maximum pitch angle              β_max        30             [°]
Maximum pitch angle rate         β̇_max        7              [°/s]
Maximum generator torque rate    Ṫ_g,max      4.15           [kNm/s]
The model reads as

J \dot{\omega} = T_{\mathrm{aero}} - r_g^{-1} T_g,   (4.29a)
T_{\mathrm{aero}} = \frac{1}{2} \rho A \, c_P(\omega, \beta, v_w) \, \frac{v_w^3}{\omega}.   (4.29b)
Equation (4.29a) models the drive train dynamics, where ω is the rotor speed, T_g the generator torque, and T_aero the aerodynamic torque perceived by the rotor, which is given by (4.29b). Variable r_g is the gearbox ratio and J the total turbine inertia seen from the rotor, given as J = J_r + r_g^{-2} J_g, where J_r is the rotor inertia including the hub and the blades and J_g the generator inertia. Variable β is the collective pitch angle of the blades, v_w the wind speed, ρ the air density, and A the rotor area. The simulations are performed with the model parameters of a 5 MW three-bladed pitch-controlled variable-speed wind turbine modeled by the National Renewable Energy Laboratory (NREL) [67]; all relevant parameters are given in Table 4.1. The state vector reads as x(t) = [ω(t), β(t), T_g(t)]^T and the control inputs are the pitch and generator torque rates, i.e., u(t) = [β̇(t), Ṫ_g(t)]^T. The aerodynamic power factor is denoted by c_P(ω, β, v_w), or often given as c_P(β, λ), where λ = Rω/v_w is the tip speed ratio. It is given as a two-dimensional look-up table and is computed with the blade-momentum model of the wind turbine [68].
Table 4.2 Power factor parameters

Parameter    Value
c1           0.7834
c2           138.8
c3           0.4219
c4           0.05772
c5           12
c6           18.05
x            2
We choose to use the following nonlinear function for approximating the c_P table [69]:

c_P = c_1 \left[ c_2 f(\beta, \lambda) - c_3 \beta - c_4 \beta^x - c_5 \right] e^{c_6 f(\beta, \lambda)},   (4.30a)
f(\beta, \lambda) = \frac{1}{\lambda - 0.02\beta} - \frac{0.003}{\beta^3 + 1}.   (4.30b)
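For reference, a direct Python transcription of (4.30) with the fitted parameters of Table 4.2, useful for checking the approximation:

```python
import numpy as np

# Power coefficient approximation (4.30) with the parameters of Table 4.2.
C1, C2, C3, C4, C5, C6, X = 0.7834, 138.8, 0.4219, 0.05772, 12.0, 18.05, 2.0

def c_p(beta, lam):
    f = 1.0 / (lam - 0.02 * beta) - 0.003 / (beta**3 + 1.0)
    return C1 * (C2 * f - C3 * beta - C4 * beta**X - C5) * np.exp(C6 * f)

# tip speed ratio: lam = R * omega / v_w with R = 63 [m]
```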
The coefficients c₁, c₂, c₃, c₄, c₅, c₆, and x in (4.30) were determined via MATLAB's Curve Fitting Toolbox and are given in Table 4.2. The natural goal is to maximize the energy output of the wind turbine over the control horizon. However, taking this directly as the cost functional has a drawback that is common in ENMPC and known as the turnpike phenomenon; for more details see [66]. A formulation that avoids the turnpike effect is to maximize the aerodynamic power [66]. The continuous-time OCP reads as
\min_{x(\cdot),\, u(\cdot)} \; -\frac{1}{2} \int_0^T \rho A \, c_P(\omega, \beta, v_w) \, v_w^3 \, \mathrm{d}t   (4.31a)
s.t.  x(0) - \xi = 0,   (4.31b)
      \dot{\omega}(t) = \frac{1}{J} \left( \frac{1}{2} \rho A \, c_P(\omega, \beta, v_w) \frac{v_w^3}{\omega} - r_g^{-1} T_g \right),   (4.31c)
      \dot{\beta}(t) = u_1(t),   (4.31d)
      \dot{T}_g(t) = u_2(t),   (4.31e)
      0 \le \beta(t) \le \beta_{\max},   (4.31f)
      0 \le T_g(t) \le T_{g,\max},   (4.31g)
      -\dot{\beta}_{\max} \le \dot{\beta}(t) \le \dot{\beta}_{\max},   (4.31h)
      -\dot{T}_{g,\max} \le \dot{T}_g(t) \le \dot{T}_{g,\max},   (4.31i)
      \omega_{\text{cut-in}} \le \omega(t) \le \omega_{\max}.   (4.31j)
The rotor speed ω and the generator torque T_g are subject to so-called operational constraints [70]. The maximum rotor speed ω_max results from structural fatigue, power electronics limitations, and noise regulations [66]. The maximum rotor speed ω_max and torque T_g,max are often chosen slightly above their rated values, allowing the rated values to be exceeded temporarily [65]. In our implementation they are chosen 5% higher than the rated values, which also makes a higher generator power P_E possible. Additionally, a minimum rotor speed ω_cut-in is introduced, below which the operation of the generator is infeasible, and the generator torque cannot be negative. Furthermore, the pitch rate β̇ and generator torque rate Ṫ_g are also limited in order not to exceed the limits of the power electronics [71] and to provide a smooth power output, which is required by the grid operator [72].
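For illustration, the right-hand side of the reduced model (4.29), resp. (4.31c)–(4.31e), can be written down directly with the parameters of Table 4.1 and the c_p function sketched above; this is our own transcription, not the chapter's ACADO-based implementation, and all quantities are assumed to be in SI units (in particular, torque in Nm although Table 4.1 lists kNm):

```python
import numpy as np

# Parameters from Table 4.1 (SI units assumed).
J, RG, RHO, AREA, R_ROT = 40.47e6, 1.0 / 97.0, 1.23, 1.25e4, 63.0

def wind_turbine_rhs(x, u, v_w):
    """ODE right-hand side of (4.31c)-(4.31e); x = [omega, beta, t_g],
    u = [pitch rate, torque rate], v_w the current wind speed."""
    omega, beta, t_g = x
    lam = R_ROT * omega / v_w                 # tip speed ratio
    t_aero = 0.5 * RHO * AREA * c_p(beta, lam) * v_w**3 / omega
    return np.array([(t_aero - t_g / RG) / J, u[0], u[1]])
```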
4.7.2 Simulation Results

As time horizon length we choose T = 12 [s], which is divided into 30 equidistant multiple shooting subintervals. The influence of the horizon length on the controller performance is analyzed in [34, 65]. On every subinterval, one integration step of an implicit Runge–Kutta scheme based on Gauss collocation of order 4 with a fixed step size of h = 0.4 [s] is used. The MLI schemes presented above are implemented in MATLAB. For numerical integration and sensitivity generation we use the standalone ACADO integrators [73], and for solving the QPs we use the solvers available in acados and its MATLAB interface [11]. More information about the performance of the QP solvers in acados can be found in [74]. In this numerical experiment we use HPIPM [75], as it is suitable for the sparse QPs resulting from problems with a large number of shooting nodes [76]. This example is rather small and the dynamics are not very fast, so computational times might not be an issue on modern hardware. However, it is still suitable to demonstrate the performance of different MLI combinations and how they react to disturbances. As a reference solution we use a fully converged SQP solution with a sampling time of Δt = 40 [ms]. In this case there is virtually no difference between an exact Hessian RTI with warm start and one with an adapted linearization point, since with such a high sampling rate the neighboring problems do not differ much. Obviously, in this case, where the length of the shooting subintervals does not match the sampling time, a shift initialization is not applicable. The effect of the new linearization strategy is more visible in cases with rather long sampling times, as in the meantime the system evolves more and the old solution becomes increasingly outdated. In this example, high sampling rates also make it possible to incorporate wind predictions with a finer resolution. Since in practice the wind prediction is always uncertain, updating the prediction more often might bring benefits for the controller performance. The simulation length is T_sim = 60 [s] with a wind profile provided by Siemens, cf. Fig. 4.10. Furthermore, at t = 40 [s] we add a disturbance, where the pitch angle jumps to β = 6°, so that we can see how the controller reacts to further unforeseen disturbances.
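For completeness, one step of the 2-stage Gauss collocation IRK scheme of order 4 mentioned above can be sketched as follows (a compact illustration under our own assumptions, not the ACADO integrator used in the experiments):

```python
import numpy as np
from scipy.optimize import fsolve

# Butcher tableau of the 2-stage Gauss-Legendre method (order 4).
A_IRK = np.array([[0.25, 0.25 - np.sqrt(3) / 6],
                  [0.25 + np.sqrt(3) / 6, 0.25]])
B_IRK = np.array([0.5, 0.5])

def gauss_irk_step(f, x, u, h=0.4):
    """One implicit RK step x -> x(h) for an autonomous rhs f(x, u);
    the stage derivatives k are found by solving the collocation
    equations with a generic root finder."""
    n = x.size
    def residual(kflat):
        k = kflat.reshape(2, n)
        return np.concatenate([k[i] - f(x + h * (A_IRK[i] @ k), u)
                               for i in range(2)])
    k = fsolve(residual, np.tile(f(x, u), 2)).reshape(2, n)
    return x + h * (B_IRK @ k)
```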
Fig. 4.10 Realistic wind profile
First, we look at sole level D0 iterations with the same sampling rate as the reference solution, see Fig. 4.11. This experiment is done to illustrate the need for frequent exact Hessian evaluations. The Hessian for the D0 iterations was evaluated at the offline fully converged solution of the first OCP arising in this MPC simulation. Sole level D0 iterations are not able to follow the reference closely; however, no operational constraints are violated. Furthermore, this scheme is able to react to the disturbance on the pitch angle, though more slowly than the reference. The inferior performance of sole D0 iterations compared to exact Hessian RTIs is also confirmed by the energy output: the former yields only 269.133 [kWh] and the latter 275.036 [kWh], i.e., we lose about 2% of energy. For further numerical testing, we now consider a few scenarios where we run some of the levels introduced above in parallel with D2 iterations. The D2 iterations are now always performed with a sampling time of Δt_D2 = 0.4 [s]. The first scenario is D2 paired with level A iterations such that we have the same sampling rate as in the reference solution, i.e., 9 level A iterations between two level D2 iterations. In the second scenario, we perform only 6 level A iterations between two D2 iterations, which results in a sampling time of Δt = 57.14 [ms]. The results are depicted in Fig. 4.12. In both scenarios the MLI solutions closely follow the reference solution, especially in the case where the disturbance occurs. Furthermore, no constraint violations can be observed. In terms of power output, the MLI schemes deliver only a slightly lower energy output, namely, below 0.1% less than the reference solution. However, at time t ∈ [5, 15] [s], when the wind speed is high, the pitch angle rate does not follow the reference closely, but still yields a satisfying performance. For brevity, we omit the further performance analysis of levels B and C, as they have been extensively studied in [33] and references therein, and continue with the novel levels introduced in this chapter. In the next scenario we examine the performance of level D2 and D0 iterations in parallel. D0 iterations are more expensive than sole QP solving, but for comparison we consider the same feedback rates as in the previous experiment (this scheme is still real-time feasible). The results are depicted in Fig. 4.13. Again, the MLI schemes
Fig. 4.11 Reference solution obtained via a fully converged SQP method for the simulation horizon of 60 s (black) compared to sole D0 iterations with the same feedback rate (red). (a) Rotor speed ω. (b) Pitch angle β. (c) Generator torque T_g. (d) Electric power output P_E. (e) Pitch angle rate β̇. (f) Generator torque rate Ṫ_g
Fig. 4.12 Reference solution (black) compared to level D2 every 0.8 [s] paired with level A every 0.04 [s] iterations (red) and level A every 0.16 [s] (blue). (a) Rotor speed ω. (b) Pitch angle β. (c) Generator torque T_g. (d) Electric power output P_E. (e) Pitch angle rate β̇. (f) Generator torque rate Ṫ_g
Fig. 4.13 Reference solution (black) compared to level D2 paired with 9 level D0 iterations (red) and level D2 with 6 level D0 iterations (blue). (a) Rotor speed ω. (b) Pitch angle β. (c) Generator torque T_g. (d) Electric power output P_E. (e) Pitch angle rate β̇. (f) Generator torque rate Ṫ_g
Fig. 4.14 Reference solution (black) compared to level D2 paired with 6 level D1 iterations (red) and level D2 with 4 level D1 iterations (blue). (a) Rotor speed ω. (b) Pitch angle β. (c) Generator torque T_g. (d) Electric power output P_E. (e) Pitch angle rate β̇. (f) Generator torque rate Ṫ_g
Table 4.3 Average computational times for different MLI modes in milliseconds

MLI mode    Preparation phase    Feedback phase
D2          4.913                1.012
D1          4.515                1.002
D0          4.101                0.995
A           0                    1.015
are able to follow the reference solution very closely and to react fast and reliably to the disturbances. Compared to sole level D0 iterations, the D0 iterations in the MLI setting perform much better due to the new exact Hessian they obtain with every D2 iteration. Furthermore, during the high wind speed period the D2–D0 combination follows the reference much better than the previous scenario. In terms of power output we again have only a slightly lower energy output than the reference, namely, below 0.1% less. Finally, in the last scenario we examine the performance of level D2 and D1 iterations. We choose the SR1 update formula paired with the step-skipping heuristic described in Sect. 4.5, since it has shown the best overall performance in our numerical experiments. Since it turns out that in this example the D1 iterations are not significantly cheaper than the D2 iterations, we consider MLI schemes with 6 and 4 D1 iterations between two D2 iterations. As could be expected, these MLI schemes are also able to closely follow the reference solution, see Fig. 4.14. In some cases the solver HPIPM was not able to solve the QPs in level D1, so we used the previously computed control input. Level D2 was always able to produce a feedback, so it can be seen as a fall-back controller. The average computational times for the different MLI modes in our numerical experiments are given in Table 4.3. To conclude, in this numerical example we could see that the new MLI variants are able to closely follow the fully converged reference solution even at very high sampling rates, which is encouraging for the use of MLI for ENMPC. Furthermore, the benefit of new Hessian matrices provided from higher to lower levels was clearly visible, compare Figs. 4.11 and 4.12. Also, level A iterations are able to successfully react even to large disturbances if the reference QP is updated often enough. Level D1 iterations showed impressive performance on the tutorial example; in this case, however, some QPs could not be solved, so further investigation is needed.
Fig. 4.15 Average power output [kWh/s] as a function of the computational load (CPU load [%]) for different MLI schemes (D2 and A, D2 and D0, D2 and D1) for wind turbine control, ranging from Δt_D2 = 0.8 [s], Δt_A = 0.16 [s] to Δt_D2 = 0.2 [s], Δt_A = 0.04 [s]. The red line corresponds to the average power output of the reference solution
4.8 Summary

We reviewed and extended state-of-the-art algorithms for ENMPC for nonlinear dynamic systems and briefly presented the idea of predictor–corrector path-following methods. These methods can be exploited for ENMPC in an SQP context, resulting in the RTI and MLI schemes. The MLI was extended and the interplay between the D levels was further emphasized. The Hessian matrix of the QP plays an important role in the different level D iterations; Sect. 4.5 gave an overview of common techniques and recent advances. Furthermore, we showed that the ratio between Hessian and Jacobian evaluation times on each multiple shooting stage does not have a fixed value. We demonstrated on a fairly simple example how some regularization strategies can harm the convergence rates and tangential predictors, even when exact Hessians are evaluated. Initializing subsequent problems is not always an easy task; Sect. 4.4 surveys known techniques and presents the AS-RTI scheme, which improves the initial guess. The efficiency of this approach is clearly visible even in the simple example. The first numerical experiments suggest that the extended MLI scheme paired with the new initialization strategies enables control inputs at very high rates which closely follow the optimal solution manifold. It would be interesting to further analyze schemes such as DOPUS in an MLI context. Furthermore, adaptive level choices could be analyzed further, since many new combinations are now possible. Both methods could possibly be improved with second-order information, which is available through level D2.

Acknowledgements The first author acknowledges the support of the German Federal Ministry of Education and Research (BMBF) via the funded Kopernikus project SynErgie (03SFK3U0). We thank Robin Verschueren and Dimitris Kouzoupis for helpful discussions and suggestions during their stay at the Department of Microsystems Engineering (IMTEK), University of Freiburg, 79110 Freiburg, Germany. We also acknowledge the contributions of two anonymous reviewers, whose close reading and constructive comments led to improvements of this chapter.
References 1. Quirynen, R., Houska, B., Vallerio, M., Telen, D., Logist, F., Impe, J.V., Diehl, M.: Symmetric algorithmic differentiation based exact Hessian SQP method and software for economic MPC. In: IEEE Conference on Decision and Control (CDC), pp. 2752–2757 (2014) 2. Bock, H.G., Diehl, M., Kostina, E.A., Schlöder, J.P.: Constrained optimal feedback control of systems governed by large differential algebraic equations. Real-Time and Online PDEConstrained Optimization, pp. 3–22 (2007) SIAM J. Constraint Optim. 3. Diehl, M., Gros, S.: Numerical optimal control. Expected to be published in 2020 4. Bock, H., Plitt, K.: A multiple shooting algorithm for direct solution of optimal control problems. In: IFAC World Congress, pp. 1603–1608 (1984) 5. Diehl, M., Ferreau, H.J., Haverbeke, N.: Efficient numerical methods for nonlinear MPC and moving horizon estimation. In: Magni, L., Raimondo, M., Allgöwer, F. (eds.) Nonlinear Model Predictive Control. Lecture Notes in Control and Information Sciences, vol. 384, pp. 391–417. Springer, Berlin (2009) 6. Rawlings, J.B., Mayne, D.Q., Diehl, M.: Model Predictive Control: Theory, Computation, and Design, 2nd edn. Nob Hill, Madison (2017) 7. Houska, B., Ferreau, H., Diehl, M.: ACADO toolkit - an open source framework for automatic control and dynamic optimization. Optimal Control Appl. Methods 32(3), 298–312 (2011) 8. Quirynen, R., Vukov, M., Zanon, M., Diehl, M.: Autogenerating microsecond solvers for nonlinear MPC: a tutorial using ACADO integrators. Optimal Control Appl. Methods (2014) 9. Quirynen, R.: Numerical simulation methods for embedded optimization. Ph.D. thesis, KU Leuven and University of Freiburg (2017) 10. Diehl, M., Leineweber, D., Schäfer, A.: MUSCOD-II users’ manual, IWR-Preprint 2001-25, University of Heidelberg (2001) 11. Verschueren, R., Frison, G., Kouzoupis, D., van Duijkeren, N., Zanelli, A., Novoselnik, B., Frey, J., Albin, T., Quirynen, R., Diehl, M.: ACADOS: a modular open-source framework for fast embedded optimal control (2019) 12. ACADOS - fast and embedded optimal control problem solvers. http://www.acados.org/. Accessed: 21 May 2019 13. Nocedal, J., Wright, S.: Numerical Optimization. Springer, Berlin (2006) 14. Frison, G., Jrgensen, J.: Algorithms and methods for high-performance model predictive control. Ph.D. thesis, Technical University of Denmark (DTU) (2016) 15. Frison, G., Kouzoupis, D., Jørgensen, J.B., Diehl, M.: An efficient implementation of partial condensing for nonlinear model predictive control. In: Proceedings of the IEEE Conference on Decision and Control (CDC) (2016) 16. Domahidi, A., Zgraggen, A.U., Zeilinger, M.N., Morari, M., Jones, C.N.: Efficient interior point methods for multistage problems arising in receding horizon control. In: IEEE Conference on Decision and Control (CDC), pp. 668–674 (2012) 17. Frison, G., Sørensen, H.B., Dammann, B., Jørgensen, J.B.: High-performance small-scale solvers for linear model predictive control. In: IEEE European Control Conference (ECC), pp. 128–133 (2014) 18. Frasch, J.V., Sager, S., Diehl, M.: A parallel quadratic programming method for dynamic optimization problems. Math. Program. Comput. 7(3), 289–329 (2015) 19. Frison, G., Kouzoupis, D., Zanelli, A., Diehl, M.: BLASFEO: basic linear algebra subroutines for embedded optimization. CoRR (2017). arXiv:1704.02457 20. Banjac, G., Goulart, P., Stellato, B., Boyd, S.: Infeasibility detection in the alternating direction method of multipliers for convex optimization (2017). optimization-online.org 21. 
Ferreau, H., Kirches, C., Potschka, A., Bock, H., Diehl, M.: qpOASES: a parametric active-set algorithm for quadratic programming. Math. Program. Comput. 6(4) (2014) 22. Diehl, M.: Real-time optimization for large scale nonlinear processes. Ph.D. thesis, University of Heidelberg (2001) 23. Diehl, M., Bock, H.G., Schlöder, J.P.: A real-time iteration scheme for nonlinear optimization in optimal feedback control. SIAM J. Control Optim. 43(5), 1714–1736 (2005)
24. Diehl, M., Findeisen, R., Allgöwer, F., Bock, H.G., Schlöder, J.P.: Nominal stability of the real-time iteration scheme for nonlinear model predictive control. IEE Proc.-Control Theory Appl. 152(3), 296–308 (2005) 25. Diehl, M., Findeisen, R., Allgöwer, F.: A stabilizing real-time implementation of nonlinear model predictive control. In: Biegler, L., Ghattas, O., Heinkenschloss, M., Keyes, D., van Bloemen Waanders, B. (eds.) Real-Time and Online PDE-Constrained Optimization, pp. 23– 52 (2007) SIAM J. Control Optim. 26. Müller, M.A.: Distributed and economic model predictive control: beyond setpoint stabilization. Ph.D. thesis, Logos Verlag Berlin GmbH (2014) 27. Diehl, M., Amrit, R., Rawlings, J.B.: A Lyapunov function for economic optimizing model predictive control. IEEE Trans. Autom. Control 56(3), 703–707 (2011) 28. Lindscheid, C., Hakerl, D., Meyer, A., Potschka, A., Bock, H.G., Engell, S.: Parallelization of modes of the multi-level iteration scheme for nonlinear model-predictive control of an industrial process. In: IEEE Conference on Control Applications (CCA), pp. 1506–1512 (2016) 29. Hakerl, D., Meyer, A., Azadfallah, N., Engell, S., Potschka, A., Wirsching, L., Bock, H.G.: Study of the performance of the multi-level iteration scheme for dynamic online optimization for a fed-batch reactor example. In: IEEE European Control Conference (ECC), pp. 459–464 (2016) 30. Bock, H.G., Diehl, M., Kühl, P., Kostina, E., Schiöder, J.P., Wirsching, L.: Numerical Methods for Efficient and Fast Nonlinear Model Predictive Control, pp. 163–179. Springer, Berlin (2007) 31. Albersmeyer, J., Beigel, D., Kirches, C., Wirsching, L., Bock, H.G., Schlöder, J.: Fast nonlinear model predictive control with an application in automotive engineering. Nonlinear Model Predictive Control. Lecture Notes in Control and Information Sciences, vol. 384, pp. 471–480 (2009) 32. Kudruss, M., Koryakovskiy, I., Vallery, H., Mombaur, K., Kirches, C.: Combining multi-level real-time iterations of nonlinear model predictive control to realize squatting motions on Leo. Technical report (2017) 33. Wirsching, L.: Multi-level iteration schemes with adaptive level choice for nonlinear model predictive control. Ph.D. thesis, Universität Heidelberg (2018) 34. Nurkanovi´c, A.: Numerical methods for fast nonlinear model predictive control. Master’s thesis, Technical University of Munich (2017) 35. Tran-Dinh, Q., Savorgnan, C., Diehl, M.: Adjoint-based predictor-corrector sequential convex programming for parametric nonlinear optimization. SIAM J. Optim. 22(4), 1258–1284 (2012) 36. Diehl, M., Bock, H.G., Schlöder, J., Findeisen, R., Nagy, Z., Allgöwer, F.: Real-time optimization and nonlinear model predictive control of processes governed by differential-algebraic equations. J. Process Control 12(4), 577–585 (2002) 37. Guddat, J., Vasquez, F.G., Jongen, H.: Parametric Optimization: Singularities, Pathfollowing and Jumps. Teubner, Stuttgart (1990) 38. Frasch, J.V., Wirsching, L., Sager, S., Bock, H.G.: Mixedlevel iteration schemes for nonlinear model predictive control. In: IFAC Conference on Nonlinear Model Predictive Control, pp. 138– 144 (2012) 39. Gros, S., Zanon, M., Quirynen, R., Bemporad, A., Diehl, M.: From linear to nonlinear MPC: bridging the gap via the real-time iteration. Int. J. Control (2016) 40. Griewank, A., Walther, A.: Evaluating Derivatives, 2nd edn. SIAM, Philadelphia (2008) 41. Kirches, C., Wirsching, L., Sager, S., Bock, H.G.: Efficient Numerics for Nonlinear Model Predictive Control, pp. 339–357. 
Springer, Berlin (2010)
42. Diehl, M., Walther, A., Bock, H.G., Kostina, E.: An adjoint-based SQP algorithm with quasi-Newton Jacobian updates for inequality constrained optimization. Optim. Methods Softw. 25(4), 531–552 (2010)
43. van Duijkeren, N., Pipeleers, G., Swevers, J., Diehl, M.: Towards dynamic optimization with partially updated sensitivities. In: IFAC World Congress, pp. 8680–8685 (2017)
44. Chen, Y., Cuccato, D., Bruschetta, M., Beghi, A.: An inexact sensitivity updating scheme for fast nonlinear model predictive control based on a curvature-like measure of nonlinearity. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 4382–4387. IEEE (2017)
45. Diehl, M., Magni, L., Nicolao, G.D.: Efficient NMPC of unstable periodic systems using approximate infinite horizon closed loop costing. Annu. Rev. Control 28(1), 37–45 (2004) 46. Bock, H.G., Diehl, M., Leineweber, D.B., Schlöder, J.: Efficient direct multiple shooting in nonlinear model predictive control. In: Keil, F., Mackens, W., Voß, H., Werther, J. (eds.) Scientific Computing in Chemical Engineering II, vol. 2, pp. 218–227. Springer, Berlin (1999) 47. Nurkanovi´c, A., Zanelli, A., Albrecht, S., Diehl, M.: The advanced step real time iteration for NMPC. In: Proceedings of the IEEE Conference on Decision and Control (CDC), pp. 5298– 5305 (2019) 48. Zavala, V.M., Biegler, L.T.: The advanced-step NMPC controller: optimality, stability and robustness. Automatica 45(1), 86–93 (2009) 49. Zavala, V., Anitescu, M.: Real-time nonlinear optimization as a generalized equation. SIAM J. Control Optim. 48(8), 5444–5467 (2010) 50. Nurkanovi´c, A., Mešanovi´c, A., Zanelli, A., Frey, J., Frison, G., Albrecht, S., Diehl, M.: Realtime nonlinear model predictive control for microgrid operation. In: Proceedings of the American Control Conference (ACC) (2020) 51. Nurkanovi´c, A., Zanelli, A., Frison, G., Albrecht, S., Diehl, M.: Contraction properties of the advanced step real-time iteration for NMPC. In: Proceedings of the IFAC World Congress, vol. 21 (2020) 52. Tenny, M.J., Wright, S.J., Rawlings, J.B.: Nonlinear model predictive control via feasibilityperturbed sequential quadratic programming. Comput. Optim. Appl. 28(1), 87–121 (2004) 53. Verschueren, R., Zanon, M., Quirynen, R., Diehl, M.: A sparsity preserving convexification procedure for indefinite quadratic programs arising in direct optimal control. SIAM J. Optim. 27(3), 2085–2109 (2017) 54. Verschueren, R., van Duijkeren, N., Quirynen, R., Diehl, M.: Exploiting convexity in direct optimal control: a sequential convex quadratic programming method. In: IEEE Conference on Decision and Control (CDC) (2016) 55. Bock, H.G.: Recent Advances in Parameter Identification Techniques for O.D.E., pp. 95–121. Birkhäuser, Boston (1983) 56. Powell, M.J.D.: Algorithms for nonlinear constraints that use Lagrangian functions. Math. Program. 14(1), 224–248 (1978) 57. Powell, M.J.: The convergence of variable metric methods for non-linearly constrained optimization calculations. Nonlinear Programming, vol. 3 (1978) 58. Bock, H.G., Diehl, M., Leineweber, D.B., Schlöder, J.P.: A Direct Multiple Shooting Method for Real-Time Optimization of Nonlinear DAE Processes, pp. 245–267. Birkhäuser, Boston (2000) 59. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, Philadelphia (2008) 60. Andersson, J.: A general-purpose software framework for dynamic optimization. Ph.D. thesis, Arenberg Doctoral School, KU Leuven, Department of Electrical Engineering (ESAT/SCD) and Optimization in Engineering Center, Kasteelpark Arenberg 10, 3001-Heverlee, Belgium (2013) 61. Schlipf, D., Schlipf, D.J., Kühn, M.: Nonlinear model predictive control of wind turbines using LIDAR. Wind Energy 16(7), 1107–1129 (2013) 62. Koerber, A., King, R.: Nonlinear model predictive control for wind turbines. In: European Wind Energy Conference (EWEC) (2011) 63. Soliman, M., Malik, O., Westwick, D.: Multiple model MIMO predictive control for variable speed variable pitch wind turbines. In: IEEE American Control Conference (ACC), pp. 2778– 2784 (2010) 64. 
Gros, S., Quirynen, R., Diehl, M.: An improved real-time economic NMPC scheme for wind turbine control using spline-interpolated aerodynamic coefficients. In: IEEE Conference on Decision and Control (CDC), pp. 935–940 (2014) 65. Gros, S., Schild, A.: Real-time economic nonlinear model predictive control for wind turbine control. Int. J. Control 90(12), 2799–2812 (2017)
66. Gros, S.: An economic NMPC formulation for wind turbine control. In: IEEE Conference on Decision and Control (CDC), pp. 1001–1006 (2013)
67. National Renewable Energy Laboratory: Definition of a 5-MW Reference Wind Turbine for Offshore System Development. BiblioBazaar (2012)
68. Bossanyi, E.: GH Bladed Theory Manual. GH & Partners Ltd. (2003)
69. Slootweg, J.G., de Haan, S.W.H., Polinder, H., Kling, W.L.: General model for representing variable speed wind turbines in power system dynamics simulations. IEEE Trans. Power Syst. 18(1), 144–151 (2003)
70. Munteanu, I., Bratcu, A.I., Cutululis, N.-A., Ceangă, E.: Optimal Control of Wind Energy Systems: Towards a Global Approach. Springer, London (2008)
71. Gros, S., Vukov, M., Diehl, M.: A real-time MHE and NMPC scheme for wind turbine control. In: IEEE Conference on Decision and Control (CDC) (2013)
72. Hovgaard, T.G., Larsen, L.F.S., Jorgensen, J.B., Boyd, S.: MPC for wind power gradients—utilizing forecasts, rotor inertia, and central energy storage. In: IEEE European Control Conference (ECC), pp. 4071–4076 (2013)
73. Quirynen, R., Gros, S., Diehl, M.: Fast auto generated ACADO integrators and application to MHE with multi-rate measurements. In: IEEE European Control Conference (ECC), pp. 3077–3082 (2013)
74. Kouzoupis, D., Frison, G., Zanelli, A., Diehl, M.: Recent advances in quadratic programming algorithms for nonlinear model predictive control. Vietnam J. Math. 46(4), 863–882 (2018)
75. Frison, G., Diehl, M.: HPIPM: a high-performance quadratic programming framework for model predictive control. In: Proceedings of the IFAC World Congress, vol. 21 (2020)
76. Vukov, M., Domahidi, A., Ferreau, H.J., Morari, M., Diehl, M.: Auto-generated algorithms for nonlinear model predictive control on long and on short horizons. In: Proceedings of the IEEE Conference on Decision and Control (CDC), pp. 5113–5118 (2013)
Chapter 5
On Closed-Loop Dynamics of ADMM-Based MPC

Moritz Schulze Darup and Gerrit Book
Abstract In this chapter, we study the closed-loop dynamics of linear systems under approximate model predictive control (MPC). More precisely, we consider MPC implementations based on a finite number of ADMM iterations per time step. We first show that the closed-loop dynamics can be described based on a nonlinear augmented model. We then characterize an invariant set around the augmented origin, where the dynamics become linear. Finally, we investigate the performance of the approximate MPC for various choices of the ADMM parameters based on a comprehensive numerical benchmark.
As already highlighted in previous chapters, model predictive control (MPC) is an optimization-based control strategy. Typically, an optimal control problem (OCP) is solved in every time step to evaluate the control action to be applied. The objective function of this OCP specifies the performance metric, while the constraints encode a model of the system as well as constraints on states and inputs. For a convex quadratic performance metric, a linear model, and polytopic constraints, the resulting OCP is a convex quadratic program (QP). Several methods for solving QPs arising in control exist. Examples include interior-point methods [24], active-set procedures [7], multiparametric programming [1], and proximal algorithms such as projected gradient schemes [10, 19] and the alternating direction method of multipliers (ADMM) [16, 18]. What the various solvers have in common is that they iteratively approach the optimal solution. It is usually assumed that the number of iterations is high enough to approximate the optimum sufficiently well. This assumption can be hard to realize for MPC implementations tailored for resource-constrained embedded platforms, networked systems, or very high sampling rates. For those applications, termination of the optimization after a small number of iterations can be required even if the iterates have not yet converged to the optimum.
At first sight, “incomplete” optimization seems to be doomed to fail. However, in the framework of optimization-based control, a fixed number of iterations per time step can be sufficient, since additional iterations follow at future sampling instances. In the resulting setup, optimization iterates are, to some extent, coupled to sampling times and thus called real-time iterations. MPC based on real-time iterations has been realized using various optimization schemes. Newton-type single and multiple shooting solvers are considered in [5] and [4]. Projected gradient schemes have been discussed in [12] and [23], and ADMM real-time iterations have recently been investigated in [20]. Most of these works focus on the special case of a single optimization iteration per time step. Moreover, state and input constraints are often not considered. In fact, constraints are neglected in [4, 5] (at least for the theoretical statements) and only input constraints are included in [12, 23]. In this chapter, motivated by the promising results in [20], we study MPC based on real-time ADMM for linear systems with state and input constraints. However, in contrast to [20] and previous works on real-time iterations, we allow multiple iterations per time step. Nevertheless, the number of iterations M is a priori fixed for every sampling instant and, in particular, independent of the current system state. As a consequence, the control law is explicitly defined based on the M ADMM iterations. The purpose of this work is to analyze how the closed-loop system dynamics change with the parameters of the ADMM scheme. To this end, inspired by [23] and [20], we show that an augmented state-space model allows us to describe the behavior of the controlled system. The subsequent analysis of the augmented system is twofold. First, we will investigate the dynamics and the stability of the augmented system. Second, a numerical benchmark will indicate that MPC based on real-time ADMM is competitive (compared to standard MPC) if a suitable parametrization is used. For example, in one analyzed scenario with M = 10 iterations per time step (that is specified in line 8 of Table 5.2), 486 out of 500 initial states (that are feasible for the original MPC) are steered to the origin without violating the constraints and with a performance decrease of only 0.03% (compared to the original MPC). The chapter is organized as follows. We briefly comment on the specific notation used in this chapter in the following section. In Sect. 5.2, we summarize basic results on MPC for linear systems and corresponding implementations using ADMM. In Sect. 5.3, we introduce the real-time ADMM scheme and derive the augmented model for the closed-loop dynamics. We show in Sect. 5.4 that the dynamics become linear around the augmented origin. We further investigate the linear regime in Sects. 5.5 and 5.6 by specifying an invariant set and the cost-to-go around the origin, respectively. In Sects. 5.7 and 5.8, we discuss different choices of the ADMM parameters and analyze their impact based on a numerical benchmark. Finally, conclusions and an outlook are given in Sect. 5.9.
5.1 Specific Notation

This chapter is an extension of the conference paper [20]. To be consistent with the notation in [20], we slightly differ from the notation in the other book chapters. Here, x refers to the state of the augmented system. The set of natural numbers (including 0) is denoted by N. The identity matrix in R^{n×n} is called I_n. With 0_{m×n}, we denote the zero matrix in R^{m×n}. For the zero vector in R^m, we write 0_m instead of 0_{m×1}. For vectors x ∈ X ⊂ R^n and z ∈ Z ⊂ R^q, we occasionally write (x^⊤ z^⊤)^⊤ ∈ X × Z instead of (x, z) ∈ X × Z, i.e., we omit the splitting of concatenated vectors into vector pairs. Vector-valued inequalities such as z ≤ z̄ with z, z̄ ∈ R^q are understood element-wise. The indicator function I_Z of some set Z is defined as

I_{\mathbb{Z}}(z) := \begin{cases} 0 & \text{if } z \in \mathbb{Z}, \\ \infty & \text{if } z \notin \mathbb{Z}. \end{cases}
5.2 Background on ADMM-Based MPC for Linear Systems

In this chapter, we consider linear discrete-time systems

x(k+1) = A x(k) + B u(k),  x(0) := x_0   (5.1)

with state and input constraints of the form

x(k) \in \mathbb{X} := \{ x \in \mathbb{R}^n \mid \underline{x} \le x \le \overline{x} \}  and   (5.2a)
u(k) \in \mathbb{U} := \{ u \in \mathbb{R}^m \mid \underline{u} \le u \le \overline{u} \}.   (5.2b)
The box constraints are characterized by bounds \underline{x}, \overline{x} ∈ R^n and \underline{u}, \overline{u} ∈ R^m satisfying \underline{x} < 0_n < \overline{x} and \underline{u} < 0_m < \overline{u}. Now, standard MPC is based on solving the OCP

V_N(x) := \min_{\hat{x}(0),\dots,\hat{x}(N),\, \hat{u}(0),\dots,\hat{u}(N-1)} \; \varphi(\hat{x}(N)) + \sum_{k=0}^{N-1} \ell(\hat{x}(k), \hat{u}(k))   (5.3a)
s.t.  \hat{x}(0) = x,   (5.3b)
      \hat{x}(k+1) = A \hat{x}(k) + B \hat{u}(k)  \forall k \in \{0, \dots, N-1\},   (5.3c)
      \hat{u}(k) \in \mathbb{U}  \forall k \in \{0, \dots, N-1\},   (5.3d)
      \hat{x}(k) \in \mathbb{X}  \forall k \in \{1, \dots, N\}   (5.3e)
in every time step for the current state x = x(k). The objective function thereby consists of the quadratic cost functions

\varphi(\hat{x}) := \hat{x}^\top P \hat{x}  and  \ell(\hat{x}, \hat{u}) := \hat{x}^\top Q \hat{x} + \hat{u}^\top R \hat{u},   (5.4)

where the weighting matrices Q and R are design parameters and where P is chosen as the solution of the discrete-time algebraic Riccati equation (DARE)

A^\top \left( P - P B (R + B^\top P B)^{-1} B^\top P \right) A - P + Q = 0.   (5.5)
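Numerically, P can be obtained with standard tools; the following snippet solves (5.5) with SciPy for an illustrative double-integrator example (the system data here are our own, not taken from the chapter) and checks the residual of (5.5):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Example data: discrete-time double integrator.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.eye(2)
R = np.array([[0.1]])

P = solve_discrete_are(A, B, Q, R)  # solution of the DARE (5.5)

# Residual check of (5.5).
S = np.linalg.solve(R + B.T @ P @ B, B.T @ P)
residual = A.T @ (P - P @ B @ S) @ A - P + Q
assert np.allclose(residual, 0.0, atol=1e-8)
```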
The control action in every time step then is

u(k) = \hat{u}^*(0),   (5.6)
i.e., the first element of the optimal control sequence for (5.3). For completeness, we make the following standard assumption.

Assumption 5.1 The pair (A, B) is stabilizable, R is positive definite, and Q can be written as L^⊤L with (A, L) detectable.

We additionally note that no terminal set is considered in (5.3). Recursive feasibility and convergence to the origin may thus not hold for every initially feasible state

x \in \mathcal{F}_N := \{ x \in \mathbb{X} \mid \text{(5.3) is feasible} \}.   (5.7)

We stress, however, that recursive feasibility and convergence guarantees can be obtained for almost every x ∈ F_N by suitably choosing the horizon length N (see, e.g., [13], [2, Thm. 13], or [21, Thm. 3]). It is easy to see that the OCP (5.3) is a QP parametrized by the current state x. As stated in the introduction, we here use ADMM (see, e.g., [3]) for its approximate solution. To prepare the application of ADMM, we rewrite (5.3) as the QP
V_N(x) = \min_{z \in \mathbb{Z}} \; \tfrac{1}{2} z^\top H z + x^\top Q x   (5.8a)
         s.t.  G z = F x   (5.8b)
with the decision variables

z := \begin{pmatrix} \hat{u}(0) \\ \hat{x}(1) \\ \vdots \\ \hat{u}(N-1) \\ \hat{x}(N) \end{pmatrix},   (5.9)
the constraint set

\mathbb{Z} := \{ z \in \mathbb{R}^q \mid \underline{z} \le z \le \overline{z} \},   (5.10)

and suitable matrices F ∈ R^{p×n}, G ∈ R^{p×q}, and H ∈ R^{q×q}, where p := Nn and q := p + Nm. We first note that the constraint (5.3b) and the associated variable
x̂(0) have been eliminated in (5.8). We further note that the specific order of x̂(k) and û(k) in (5.9) facilitates some mathematical expressions in the subsequent sections. We finally note that the bounds \underline{z}, \overline{z} ∈ R^q and the matrix H are uniquely determined by the constraints (5.2), the cost functions (5.3a) and (5.4), and definition (5.9). In contrast, G and F are not unique. To simplify reproducibility of our results, we use
−B In 0n×m ⎜0n×m −A −B ⎜ G := ⎜ . .. ⎝ .. . 0n×m
0n×n In .. . −A
0n×n ..
. −B
⎞ ⎟ A ⎟ (5.11) ⎟ and F := 0( p−n)×n ⎠
In
throughout the chapter. Now, by introducing the set E(x) := {z ∈ Rq | Gz = F x}, it is straightforward to show that the optimizers of (5.8) and min y,z
1 ρ y H y + IE(x) (y) + IZ (z) + y − z 22 , 2 2 s.t. y = z,
(5.12a) (5.12b)
are equivalent (see [16, Eqs. (9)–(10)]) for any positive ρ ∈ R. Due to (5.12b), the decision variable y acts as a copy of z. Hence, y and z are interchangeable in (5.12a). However, the specific choice in (5.12a) turns out to be useful [16]. We next investigate the dual problem to (5.12), which is given by max μ
inf L ρ (y, z, μ) , y,z
(5.13)
where the (augmented) Lagrangian with the Lagrange multipliers μ reads L ρ (y, z, μ) :=
1 ρ y H y + IE(x) (y) + IZ (z) + y − z 22 + μ (y − z). 2 2
ADMM solves (5.13) by repeatedly carrying out the iterations
y ( j+1) := arg min L ρ y, z ( j) , μ( j) , y
( j+1) z := arg min L ρ y ( j+1) , z, μ( j) , and z
μ( j+1) := μ( j) + ρ y ( j+1) − z ( j+1)
(5.14a) (5.14b) (5.14c)
(cf. [3, Sect. 3.1]). Since z ( j) and μ( j) are constant in (5.14a), we obviously have
112
M. Schulze Darup and G. Book
1 y (H + ρ I p )y + μ( j) − ρz ( j) y y 2 s.t. Gy = F x.
y ( j+1) = arg min
The solution to this equality-constrained QP results from
H + ρ Iq G G 0 p× p
y ( j+1) ∗
( j) ρz − μ( j) = . Fx
(5.15)
Thus, precomputing the matrix E= allows to evaluate
E 11 E 12 E 12 E 22
:=
H + ρ Iq G G 0 p× p
−1
y ( j+1) = E 11 ρz ( j) − μ( j) + E 12 F x.
(5.16)
(5.17)
According to [16, Sect. III.B], we further have 1 z ( j+1) = projZ y ( j+1) + μ( j) , ρ
(5.18)
i.e., z ( j+1) results from projecting y ( j+1) + ρ −1 μ( j) onto the set Z. In summary, by substituting y ( j+1) from (5.17) into (5.18) and (5.14c), we obtain the two iterations z
( j+1)
μ( j+1)
1 ( j) and + E 12 F x + μ := projZ E 11 ρz − μ ρ
:= μ( j) + ρ E 11 ρz ( j) − μ( j) + E 12 F x − z ( j+1)
( j)
( j)
(5.19a) (5.19b)
that are independent of the copy y introduced in (5.12). The iterations (5.19) form the basis for our subsequent analysis of ADMM-based MPC. For their derivation, we followed the procedure in [16] that considered the “uncondensed” QP (5.8) with equality constraints. We stress that ADMM-based MPC can be implemented differently. For example, an implementation for the “condensed” QP without equality constraints is discussed in [8]. For the analysis to be presented, the uncondensed form is more intuitive. However, in contrast to our approach, the method in [8] allows to efficiently incorporate terminal constraints. In this context, we note that ADMM can only be efficiently applied if the underlying iterations are easy to evaluate, where the crucial step is usually a projection similar to (5.19a). Here, the projection onto Z can be efficiently evaluated due to (5.10). In fact, for such box-constraints, we easily compute ⎧ ⎨ z i if z i < z i ,
(5.20) projZ (z) i = z i if z i ≤ z i ≤ z i , ⎩ z i if z i < z i .
5 On Closed-Loop Dynamics of ADMM-Based MPC
113
We note that the consideration of a terminal set in (5.3) (i.e., x(N ˆ ) ∈ T instead of x(N ˆ ) ∈ X) will, in general, result in a non-trivial set Z and, hence, in a non-trivial projection. However, we could consider terminal sets that allow for efficient projections such as, e.g., box-shaped or ellipsoidal sets T. We further note that implementing (5.18) can be more efficient than (5.19a) if a sparse factorization of the matrix in (5.15) is used to compute y ( j+1) instead of the dense matrix E.
5.3 An Augmented Model for the Closed-Loop Dynamics Generally, many iterations (5.19) are required to solve (5.8) (resp. (5.3)) with a certain accuracy. The number of required iterations varies, among other things, with the current state. Here, we fix the number of iterations to M ∈ N for the whole runtime of the controller and consequently independent of the current state. In contrast to existing works, we do not investigate the accuracy of the resulting ADMM scheme for a specific state but we study the dynamics of the corresponding closed-loop system in general. To this end, we initially provide a more precise description of the controlled system. In every time step k, we compute z (M) (k) and μ(M) (k) according to (5.19) based on x(k), z (0) (k), and μ(0) (k). Inspired by (5.6), the input u(k) is then chosen as the first element of the input sequence contained in z (M) (k), i.e., u(k) = Cu z (M) (k)
with
Cu := Im 0m×(q−m) .
(5.21)
Obviously, the resulting input depends on the initializations z (0) (k) and μ(0) (k) of the iterations (5.19). In principle, we can freely choose these initializations in every time step. However, it turns out to be useful to reuse data from previous time steps. This approach is called warm-start and it is well-established in optimization-based control. More precisely, we choose the initial values z (0) (k + 1) and μ(0) (k + 1) at step k + 1 based on the final iterates z (M) (k) and μ(M) (k) from step k. Moreover, for simplicity, we restrict ourselves to linear updates of the form z (0) (k + 1) := Dz z (M) (k) (0)
(M)
μ (k + 1) := Dμ μ
(k),
and
(5.22a) (5.22b)
where we note that similar updates have been considered in [5, Sect. 2.2]. Suitable choices for Dz ad Dμ will be discussed later in Sect. 5.7. Here, we focus on the structural dynamics of the controlled system. To this end, we first note that the openloop dynamics (5.1), the M iterations (5.19), the input (5.21), and the updates (5.22) determine the closed-loop behavior. More formally, we show in the following that the controlled system can be described using the augmented state
114
M. Schulze Darup and G. Book
⎛
⎞
x
x := ⎝ z (0) ⎠ ∈ Rr μ(0)
(5.23)
with r := n + 2q. In fact, according to (5.1), (5.21), and (5.22), the closed-loop dynamics are captured by the augmented system

x(k+1) = 𝒜 x(k) + ℬ v(x(k)),   x(0) := x_0,   (5.24)
with the augmented system matrices

𝒜 := [A, 0_{n×q}, 0_{n×q};  0_{q×n}, 0_{q×q}, 0_{q×q};  0_{q×n}, 0_{q×q}, 0_{q×q}]   and   ℬ := [B C_u, 0_{n×q};  D_z, 0_{q×q};  0_{q×q}, D_μ],

and the augmented control law v : R^r → R^{2q} with

v(x) := (z^{(M)}; μ^{(M)}).   (5.25)
According to the following proposition, v(x) is continuous and piecewise affine in x.

Proposition 5.1 Let v be defined as in (5.25) with z^{(M)} and μ^{(M)} resulting from M ∈ N iterations (5.19). Then, v is continuous and piecewise affine in x.

Proof We prove the claim by showing that the iterates z^{(j)} and μ^{(j)} are continuous and piecewise affine in x not only for j = M but for every j ∈ {0, ..., M}. We prove the latter statement by induction. For j = 0, continuity and piecewise affinity hold by construction since

(z^{(0)}; μ^{(0)}) = [0_{2q×n}  I_{2q}] x,

i.e., z^{(0)} and μ^{(0)} are linear in x. It remains to prove that z^{(j)} and μ^{(j)} being continuous and piecewise affine implies continuity and piecewise affinity of z^{(j+1)} and μ^{(j+1)}. To this end, we first note that the iterations (5.19) can be rewritten as
z^{(j+1)} = proj_Z( K^{(1)} (x; z^{(j)}; μ^{(j)}) )   and   μ^{(j+1)} = ρ ( K^{(1)} (x; z^{(j)}; μ^{(j)}) − z^{(j+1)} )   (5.26)

with

K^{(1)} := [E_12 F,  ρ E_11,  (1/ρ) I_q − E_11].   (5.27)

Clearly,
ζ^{(j)} := K^{(1)} (x; z^{(j)}; μ^{(j)})

is continuous and piecewise affine in x if these properties hold for z^{(j)} and μ^{(j)}. Moreover, proj_Z(ζ) is continuous and piecewise affine in ζ as apparent from (5.20). Since the composition of two continuous and piecewise affine functions results in a continuous and piecewise affine function, z^{(j+1)} as in (5.26) is indeed continuous and piecewise affine in x. As a consequence, μ^{(j+1)} = ρ (ζ^{(j)} − z^{(j+1)}) is continuous and piecewise affine in x since it results from the addition of two continuous and piecewise affine functions.

Obviously, the augmented system (5.24) inherits continuity and piecewise affinity from v(x). This trivial result is summarized in the following corollary for reference. Moreover, we point out a connection between the augmented systems in (5.24) and [20, Eq. (21)] in Remark 5.3.

Corollary 5.2 Let v be defined as in (5.25) with z^{(M)} and μ^{(M)} resulting from M ∈ N iterations (5.19). Then, the dynamics (5.24) are continuous and piecewise affine in x.

Remark 5.3 In [20], the special case M = 1 is considered. The corresponding closed-loop dynamics are captured by an augmented system that is similar to but slightly different from (5.24). To identify the connection between the systems in (5.24) and [20, Eq. (21)], we note that μ^{(1)} = ρ K^{(1)} x − ρ z^{(1)} and consequently

μ^{(0)}(k+1) = D_μ μ^{(1)}(k) = ρ D_μ K^{(1)} x(k) − ρ D_μ z^{(1)}(k)   (5.28)

for M = 1. Relation (5.28) allows us to formulate the augmented system (5.24) more compactly with an augmented control action that only depends on z^{(1)}. This modification leads to the augmented system in [20, Eq. (21)].
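To make the fixed-iteration scheme concrete, the following Python sketch simulates the closed loop defined by the iterations (5.26), the input selection (5.21), and the warm starts (5.22). It assumes that K^{(1)} from (5.27) and the box bounds of Z have been precomputed; all variable names are ours, not the chapter's:

```python
import numpy as np

def rt_admm_step(x, z0, mu0, K1, rho, M, z_lo, z_hi):
    """M projected iterations (5.26) for the current state x, with
    K1 = [E_12 F, rho*E_11, (1/rho)*I - E_11] precomputed as in (5.27)."""
    z, mu = z0, mu0
    for _ in range(M):
        zeta = K1 @ np.concatenate([x, z, mu])
        z_new = np.clip(zeta, z_lo, z_hi)   # projection onto the box Z
        mu = rho * (zeta - z_new)
        z = z_new
    return z, mu

def closed_loop(x0, z0, mu0, A, B, Cu, Dz, Dmu, K1, rho, M, z_lo, z_hi, steps):
    """Simulate the augmented dynamics (5.24): plant (5.1), input (5.21),
    and linear warm starts (5.22)."""
    x, z, mu = x0, z0, mu0
    traj = [x]
    for _ in range(steps):
        zM, muM = rt_admm_step(x, z, mu, K1, rho, M, z_lo, z_hi)
        u = Cu @ zM                          # first predicted input, cf. (5.21)
        x = A @ x + B @ u                    # plant step, cf. (5.1)
        z, mu = Dz @ zM, Dmu @ muM           # warm start, cf. (5.22)
        traj.append(x)
    return np.array(traj)
```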
5.4 Linear Dynamics Around the Augmented Origin

The augmented input v(x) results from M ADMM iterations. In every iteration, evaluating z^{(j+1)} includes a projection. Apart from the projection, the iterations (5.19) are linear. We next derive conditions under which the projections have no effect (in the sense that proj_Z(z) = z) and, hence, under which v(x) is linear in x. As a preparation, we define the matrices

K^{(j)} := [ (Σ_{i=0}^{j−1} (ρ E_11)^i) E_12 F,   (ρ E_11)^j,   (ρ E_11)^{j−1} ((1/ρ) I_q − E_11) ]   (5.29)

for every j ∈ {1, ..., M}, where we note that (5.29) for j = 1 is in line with the definition of K^{(1)} in (5.27). These matrices are instrumental for the following result.
Proposition 5.4 Let M ∈ N with M ≥ 1 and let x ∈ R^r be such that

K^{(j)} x ∈ Z   (5.30)

for every j ∈ {1, ..., M}. Then,

z^{(j)} = K^{(j)} x   and   μ^{(j)} = 0_q   (5.31)
for every j ∈ {1, ..., M}.

Proof We prove the claim by induction. For j = 1, we have z^{(1)} = proj_Z(K^{(1)} x) and μ^{(1)} = ρ (K^{(1)} x − z^{(1)}) according to (5.26). Now, (5.30) implies z^{(1)} = K^{(1)} x and consequently μ^{(1)} = 0_q as proposed. We next show that (5.31) holds for j+1 if it holds for j. To this end, we first note that satisfaction of (5.31) for j implies

z^{(j+1)} = proj_Z( K^{(1)} (x; z^{(j)}; μ^{(j)}) ) = proj_Z( K^{(1)} (x; K^{(j)} x; 0_q) ).   (5.32)
By definition of K^{(j)}, we further obtain

K^{(1)} (x; K^{(j)} x; 0_q) = [ (Σ_{i=0}^{j} (ρ E_11)^i) E_12 F,   (ρ E_11)^{j+1},   (ρ E_11)^j ((1/ρ) I_q − E_11) ] x = K^{(j+1)} x.   (5.33)
Substituting (5.33) in (5.32) and using (5.30) (for j + 1) implies
z^{(j+1)} = proj_Z( K^{(j+1)} x ) = K^{(j+1)} x   (5.34)
as proposed. It remains to prove μ^{(j+1)} = 0_q, which follows from

μ^{(j+1)} = ρ ( K^{(1)} (x; z^{(j)}; μ^{(j)}) − z^{(j+1)} ) = ρ ( K^{(1)} (x; K^{(j)} x; 0_q) − K^{(j+1)} x ) = 0_q,

where we used (5.31) for j and relations (5.26), (5.33), and (5.34).
Obviously, conditions (5.30) are satisfied for x = 0r . Not too surprisingly, the conditions are also satisfied in a neighborhood of the augmented origin, and we will investigate this neighborhood in more detail in the next section. For now, we study the effect of the conditions (5.30) on the closed-loop dynamics. According to
Proposition 5.4, having (5.30) implies

v(x) = [K^{(M)}; 0_{q×r}] x

for M ≥ 1. Hence, around the augmented origin, the piecewise affine dynamics of the augmented system (5.24) turn into the linear dynamics

x(k+1) = S_M x(k),   (5.35)
where

S_M := 𝒜 + ℬ [K^{(M)}; 0_{q×r}]
     = [ A + B C_u Σ_{i=0}^{M−1} (ρ E_11)^i E_12 F,   B C_u (ρ E_11)^M,   B C_u (ρ E_11)^{M−1} ((1/ρ) I_q − E_11);
         D_z Σ_{i=0}^{M−1} (ρ E_11)^i E_12 F,         D_z (ρ E_11)^M,    D_z (ρ E_11)^{M−1} ((1/ρ) I_q − E_11);
         0_{q×n},                                      0_{q×q},           0_{q×q} ].   (5.36)

Clearly, the eigenvalues of the matrix S_M determine whether the closed-loop behavior around the augmented origin is asymptotically stable or not. At this point, we observe that three categories of parameters have an effect on S_M. First, the parameters of the original system in terms of A and B. Second, the parameters Q, R, and N of the MPC. Third, the parameters ρ, M, and D_z of the ADMM scheme. We note that C_u is not counted as a parameter since there seems to be no competitive alternative to the choice in (5.21). We further note that the update matrix D_μ has no effect on S_M. Since we are dealing with a stabilizable pair (A, B), an MPC parametrized as in Sect. 5.2 is stabilizing in some neighborhood around the origin for every prediction horizon N ≥ 1. In particular, this neighborhood includes the set where the MPC acts identically to the linear quadratic regulator (LQR), i.e., where the MPC law (5.6) is equivalent to

u(k) = K x(k)   with   K := −(R + B^T P B)^{−1} B^T P A.   (5.37)
Hence, instability can only result from inappropriate choices for ρ, D_z, and M. Based on this observation, it is interesting to study the effect of the ADMM parameters on the eigenvalues of S_M. As a step in this direction, we next present two basic results: a general lower bound for the number of zero eigenvalues of S_M is given in Proposition 5.5, and the existence of a stabilizing parameter set in terms of ρ, D_z, and M is guaranteed by Proposition 5.7. We note that neither result completely characterizes the eigenvalues of S_M, but they substantiate the numerical benchmark in Sect. 5.8 and they should serve as a basis for future research.

Proposition 5.5 Let ρ ∈ R with ρ > 0, let M ∈ N with M ≥ 1, and let D_z ∈ R^{q×q}. Then, S_M as in (5.36) has at least q + p − n = (2N−1)n + Nm zero eigenvalues.
We use the following lemma to prove Proposition 5.5.

Lemma 5.6 Let ρ ∈ R with ρ > 0 and let M ∈ N with M ≥ 1. Then,

rank(E_11^M) ≤ rank(E_11) = q − p = Nm   (5.38)

holds for E_11 as in (5.16).

Proof Clearly, the first inequality in (5.38) represents a standard result and the last relation in (5.38) holds by definition of q and p. Hence, it remains to prove rank(E_11) = q − p. To this end, we note that E as in (5.16) results from inverting a 2×2 block matrix, where the upper-left block (i.e., H + ρ I_q) is invertible and where the off-diagonal blocks (i.e., G^T and G) have full rank as apparent from (5.11). Thus, we obtain

E_11 = (H + ρ I_q)^{−1} − (H + ρ I_q)^{−1} G^T ( G (H + ρ I_q)^{−1} G^T )^{−1} G (H + ρ I_q)^{−1}   (5.39)

(see, e.g., [17, Thm. 2.1]). Due to rank(H + ρ I_q) = q and E_11 ∈ R^{q×q}, we next find
rank(E_11) = rank( (H + ρ I_q) E_11 (H + ρ I_q) ) = rank( H + ρ I_q − G^T ( G (H + ρ I_q)^{−1} G^T )^{−1} G ).   (5.40)
To further investigate (5.40), we choose the matrix Ḡ ∈ R^{(q−p)×q} such that

𝒢 := [G; Ḡ]

is invertible, where we note that such a choice is always possible. Using the latter matrix, H + ρ I_q can be rewritten as
H + ρ I_q = 𝒢^T ( 𝒢^{−T} (H + ρ I_q) 𝒢^{−1} ) 𝒢 = 𝒢^T ( 𝒢 (H + ρ I_q)^{−1} 𝒢^T )^{−1} 𝒢.   (5.41)
After substituting (5.41) in (5.40) and some lengthy but basic manipulations, we find

rank( H + ρ I_q − G^T ( G (H + ρ I_q)^{−1} G^T )^{−1} G ) = rank(Ḡ).

This completes the proof since rank(Ḡ) = q − p.
Proof of Proposition 5.5 To prove the claim, we first note that S_M obviously has at least q zero eigenvalues since the last q rows of S_M consist of zero entries. The remaining q + n eigenvalues correspond to the eigenvalues of the upper-left submatrix
S̄_M := [ A + B C_u Σ_{i=0}^{M−1} (ρ E_11)^i E_12 F,   B C_u (ρ E_11)^M;
         D_z Σ_{i=0}^{M−1} (ρ E_11)^i E_12 F,         D_z (ρ E_11)^M ]
     = [ A, 0_{n×q};  0_{q×n}, 0_{q×q} ] + [ B C_u;  D_z ] [ Σ_{i=0}^{M−1} (ρ E_11)^i E_12 F,   (ρ E_11)^M ].

Using standard rank inequalities for the sum and the multiplication of matrices (see, e.g., [15, p. 13]) and the result in Lemma 5.6, we infer from the latter equation that

rank(S̄_M) ≤ rank(A) + min{ rank([ Σ_{i=0}^{M−1} (ρ E_11)^i E_12 F,  (ρ E_11)^M ]),  rank([ B C_u;  D_z ]) }
           ≤ n + min{ q,  n + rank(E_11^M) } ≤ n + min{ q,  n + q − p } = 2n + Nm.
Hence, S̄_M has at least q + n − 2n − Nm = (N−1)n zero eigenvalues, which implies that S_M has at least (N−1)n + q = (2N−1)n + Nm zero eigenvalues.

Proposition 5.7 Let M = 1 and let D_z ∈ R^{q×q} be arbitrary. Then, there exists a ρ > 0 such that S_M is Schur stable.

Proof Clearly, S_M is Schur stable if and only if S̄_M is Schur stable. Now, for M = 1, we have

S̄_1 = [ A + B C_u E_12 F,  ρ B C_u E_11;   D_z E_12 F,  ρ D_z E_11 ].   (5.42)

It remains to show that there exists a ρ > 0 such that S̄_1 is Schur stable, where we note that E_11 and E_12 depend on ρ as apparent from (5.16). It will turn out that a sufficiently small ρ implies Schur stability of S̄_1. In this context, we study the limit lim_{ρ→0} E and find
[ E*_11, E*_12;  (E*_12)^T, E*_22 ] := lim_{ρ→0} [ E_11, E_12;  E_12^T, E_22 ] = [ H, G^T;  G, 0_{p×p} ]^{−1}   (5.43)
according to (5.16). Based on this result in combination with (5.42), we infer

S̄*_1 := lim_{ρ→0} S̄_1 = [ A + B C_u E*_12 F,  0_{n×q};   D_z E*_12 F,  0_{q×q} ].   (5.44)
As a consequence, for every δ > 0, there exists a ρ > 0 such that ‖S̄_1 − S̄*_1‖ < δ for some matrix norm ‖·‖. Moreover, since the spectrum of a matrix depends continuously on its entries (see, e.g., [6]), for every ε > 0, there exists a δ > 0 such that ‖S̄_1 − S̄*_1‖ < δ implies

max_i min_j |λ_i − λ*_j| < ε,   (5.45)
where λ_i and λ*_j denote the i-th and j-th eigenvalue of S̄_1 and S̄*_1, respectively. In summary, for every ε > 0, there exists a ρ > 0 such that (5.45) holds. We next show that S̄*_1 is Schur stable. This completes the proof since (5.45) implies that, for a sufficiently small ρ, the eigenvalues of S̄_1 get arbitrarily close to those of S̄*_1. Now, it is apparent from (5.44) that S̄*_1 is Schur stable if and only if A + B C_u E*_12 F is Schur stable. At this point, we note that the matrix on the right-hand side of (5.43) is related to the unconstrained solution of the original QP (5.8). In fact, the optimizer of (5.8) subject to z ∈ R^q instead of z ∈ Z satisfies
[ H, G^T;  G, 0_{p×p} ] (z*; ν*) = (0; F x)   (5.46)

for some associated multiplier ν* ∈ R^p. The unconstrained solution of (5.8) is, on its own, related to the LQR. In fact, we have C_u z* = K x for z* as in (5.46). Taking (5.43) into account, we further obtain z* = E*_12 F x. Hence, K = C_u E*_12 F and consequently A + B C_u E*_12 F = A + B K. Clearly, A + B K is Schur stable since the LQR stabilizes every stabilizable pair (A, B).
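As a quick numerical check, the LQR gain in (5.37) and the Schur stability of A + BK can be computed with standard tools; a minimal sketch assuming, as usual in this setting, that P in (5.5) is the stabilizing solution of the discrete-time algebraic Riccati equation:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """K from (5.37), with P taken as the stabilizing DARE solution."""
    P = solve_discrete_are(A, B, Q, R)
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

def is_schur(mat, tol=1e-9):
    """Schur stability test via the spectral radius."""
    return np.max(np.abs(np.linalg.eigvals(mat))) < 1 - tol
```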
5.5 Positive Invariance Around the Augmented Origin

In the previous section, we showed that the closed-loop behavior around the (augmented) origin obeys the linear dynamics (5.35). In this section, we analyze the neighborhood where (5.35) applies in more detail. The starting point for this analysis is the conditions (5.30) that imply linearity according to Proposition 5.4. As a consequence, the dynamics (5.35) apply to all states x in the set

K_M := { x ∈ R^r | K^{(j)} x ∈ Z, ∀ j ∈ {1, ..., M} }.

Unfortunately, having x ∈ K_M does not imply x+ := S_M x ∈ K_M. Hence, we next study the largest positively invariant set (for the linear dynamics) contained in K_M. Clearly, this set corresponds to

P_M := { x ∈ R^r | S_M^k x ∈ K_M, ∀ k ∈ N }.   (5.47)
Now, assume an augmented state trajectory enters P_M at step k*, i.e., x(k*) ∈ P_M. Then, all subsequent inputs, i.e.,

u(k) = C_u K^{(M)} S_M^{k−k*} x(k*)   (5.48)

for every k ≥ k*, satisfy the constraints U by construction. In fact, for k ≥ k*, we have x(k) = S_M^{k−k*} x(k*) ∈ P_M due to x(k*) ∈ P_M and consequently K^{(M)} x(k) ∈ Z and C_u K^{(M)} x(k) ∈ U. However, the original states
x(k) = C_x S_M^{k−k*} x(k*)   with   C_x := [I_n  0_{n×(r−n)}]   (5.49)
may or may not satisfy the constraints X for k ≥ k*. To compensate for this drawback, we focus on positively invariant subsets of P_M that take the original state constraints explicitly into account. In addition, it turns out to be useful to restrict our attention to sequences

z^{(0)}(k) = C_z S_M^{k−k*} x(k*)   with   C_z := [0_{q×n}  I_q  0_{q×q}]

that satisfy the constraints Z. Hence, we consider the set

P*_M := { x ∈ R^r | C_M S_M^k x ∈ X × Z × Z^M, ∀ k ∈ N }   (5.50)

with Z^M := Z × ··· × Z (M times) and
C_M := [ C_x;  C_z;  K^{(1)};  ...;  K^{(M)} ].   (5.51)
We note that the first two blocks in C_M incorporate the conditions C_x S_M^k x ∈ X and C_z S_M^k x ∈ Z, respectively, whereas the last M blocks refer to S_M^k x ∈ K_M. Hence, we have P*_M ⊆ P_M by construction. Apparently, the sets (5.47) and (5.50) are similar to the output admissible sets studied in [9]. According to [9, Thms. 2.1 and 4.1], P*_M as in (5.50) is bounded and finitely determined if (i) S_M is Schur stable, (ii) the pair (C_M, S_M) is observable, (iii) X × Z × Z^M is bounded, and (iv) 0_{n+(M+1)q} is in the interior of X × Z × Z^M. In this context, we recall that finite determinedness implies the existence of a finite k̄ ∈ N such that

P*_M = { x ∈ R^r | C_M S_M^k x ∈ X × Z × Z^M, ∀ k ∈ {0, ..., k̄} }.

Now, X × Z × Z^M is bounded and contains the origin as an interior point by construction. We already analyzed Schur stability of S_M in the previous section and found that stabilizing parameters ρ, D_z, and M always exist (see Prop. 5.7). Thus, it remains to study observability of (C_M, S_M). To this end, we first derive the following lemma.

Lemma 5.8 Let ρ ∈ R with ρ > 0. Then, (1/ρ) I_q − E_11 is positive definite.

Proof Clearly, H + ρ I_q is symmetric and positive definite. Hence, (1/ρ) I_q − E_11 is positive definite if and only if the matrix
(H + ρ I_q) ( (1/ρ) I_q − E_11 ) (H + ρ I_q) = (1/ρ) (H + ρ I_q)^2 − H − ρ I_q + G^T ( G (H + ρ I_q)^{−1} G^T )^{−1} G

is positive definite, where the right-hand side of the equation results from (5.39). Now, positive definiteness of the latter matrix can be easily verified since it is the sum of the positive definite matrix

(1/ρ) (H + ρ I_q)^2 − H − ρ I_q = (H + ρ I_q) ( (1/ρ) (H + ρ I_q) − I_q ) = (1/ρ) H^2 + H

and the positive semi-definite matrix G^T ( G (H + ρ I_q)^{−1} G^T )^{−1} G.
Based on Lemma 5.8, it is straightforward to prove observability of (C_M, S_M).

Proposition 5.9 Let ρ ∈ R with ρ > 0, let M ∈ N with M ≥ 1, and let S_M and C_M be as in (5.36) and (5.51), respectively. Then, the pair (C_M, S_M) is observable.

Proof We prove the claim by showing that the observability matrix has full rank, i.e., rank r. To this end, we note that the r×r-matrix

[ C_x;  C_z;  K^{(1)} ] = [ I_n, 0_{n×q}, 0_{n×q};   0_{q×n}, I_q, 0_{q×q};   E_12 F, ρ E_11, (1/ρ) I_q − E_11 ]   (5.52)
is a submatrix of C_M for every M ≥ 1. Hence, a sufficient condition for (C_M, S_M) being observable is (5.52) having full rank. Now, verifying that (5.52) has rank r = n + 2q is easy since the diagonal blocks of the block-triangular matrix have rank n, q, and q, where the latter holds according to Lemma 5.8.

Since (C_M, S_M) is observable according to Proposition 5.9, P*_M as in (5.50) is bounded and finitely determined for every stabilizing choice of ρ, M, and D_z (with M ≥ 1). It is interesting to note that boundedness does not hold for the superset P_M in (5.47) as shown in the following remark.

Remark 5.10 In contrast to P*_M, the set P_M is not bounded since we have

x* := ( 0_n;  z;  −ρ ( (1/ρ) I_q − E_11 )^{−1} E_11 z ) ∈ P_M   (5.53)
for every z ∈ R^q, where the inverse exists according to Lemma 5.8. To retrace (5.53), we note that

K^{(j)} x* = ρ^j E_11^j z − ρ^j E_11^{j−1} ( (1/ρ) I_q − E_11 ) ( (1/ρ) I_q − E_11 )^{−1} E_11 z = 0_q
for every j ∈ {1, ..., M}. Hence, x* ∈ K_M. Moreover, we clearly have 𝒜 x* = 0_r and

ℬ [K^{(M)}; 0_{q×r}] x* = ℬ · 0_{2q} = 0_r.

As a consequence, we find S_M x* = 0_r by definition of S_M in (5.36). In combination, we obtain S_M^k x* ∈ K_M for every k ∈ N and thus x* ∈ P_M.

We finally note that the characterization of positively invariant sets can be simplified in exchange for slightly conservative results. In fact, regarding (5.36), it is easy to see that
μ^{(0)}(k) = C_μ S_M^{k−k*} x(k*) = 0_q   with   C_μ := [0_{q×(n+q)}  I_q]

for every k > k*, every M ≥ 1, and every x(k*) ∈ R^r. Hence, during the investigation of positive invariance, we could assume μ = 0 and restrict our attention to the x–z-subspace. However, for brevity and in order to incorporate the case C_μ x(k*) ≠ 0_q, we do not detail this modification.
5.6 Cost-to-Go Around the Augmented Origin

Let us assume that a trajectory of the augmented system (5.24) enters P*_M at time step k* ∈ N, i.e., x(k*) ∈ P*_M. Positive invariance of P*_M combined with the linear dynamics (5.35) then implies

x(k) = S_M^{k−k*} x(k*) ∈ P*_M

for all k ≥ k*. For stabilizing parameters ρ, M, and D_z (that imply Schur stability of S_M), we additionally find

lim_{k→∞} S_M^{k−k*} x(k*) = 0_r,

and hence convergence to the origin. Thereby, the evolution of the original states obeys (5.49) and the applied inputs are (5.48) for k ≥ k*. According to the following proposition, the corresponding infinite-horizon cost is given by (5.55).

Proposition 5.11 Let the parameters ρ > 0, D_z, and M be such that S_M is Schur stable, let

𝒬 := C_x^T Q C_x + (K^{(M)})^T C_u^T R C_u K^{(M)},   (5.54)

and assume that x(k*) ∈ P*_M for some k* ∈ N. Then,
Σ_{k=k*}^{∞} ℓ(x(k), u(k)) = x(k*)^T 𝒫 x(k*),   (5.55)

where 𝒫 is the solution of the Lyapunov equation

𝒫 = 𝒬 + S_M^T 𝒫 S_M.   (5.56)
Proof The definition of the stage cost in (5.4) combined with the input and state sequences (5.48)–(5.49) immediately leads to

Σ_{k=k*}^{∞} ℓ(x(k), u(k)) = x(k*)^T [ Σ_{k=k*}^{∞} (S_M^{k−k*})^T ( C_x^T Q C_x + (C_u K^{(M)})^T R C_u K^{(M)} ) S_M^{k−k*} ] x(k*)
                           = x(k*)^T [ Σ_{k̃=0}^{∞} (S_M^{k̃})^T 𝒬 S_M^{k̃} ] x(k*),

where the second equation results from (5.54) and the substitution k̃ := k − k*. It remains to prove that

Σ_{k=0}^{∞} (S_M^k)^T 𝒬 S_M^k = 𝒫.   (5.57)
At this point, we first note that the sum in (5.57) converges since S_M is Schur stable. Moreover, 𝒬 as in (5.54) is obviously positive semi-definite. Using standard arguments, it is then straightforward to show that 𝒫 can be inferred from (5.56).

Remark 5.12 The matrix 𝒬 in (5.56) is usually required to be positive definite in order to guarantee a unique and positive definite solution 𝒫. The Lyapunov equation can, however, also be solved for positive semi-definite 𝒬 yielding a positive semi-definite matrix 𝒫. A suitable algorithm can, e.g., be found in [14].
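Numerically, (5.56) is a standard discrete-time Lyapunov equation; a minimal sketch (the function name is ours) using SciPy:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def cost_to_go_matrix(S_M, Q_aug):
    """Solve P = Q_aug + S_M^T P S_M, cf. (5.56); requires S_M Schur stable.
    solve_discrete_lyapunov(a, q) returns the x satisfying x = a @ x @ a.T + q."""
    return solve_discrete_lyapunov(S_M.T, Q_aug)
```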
5.7 Design Parameters of the Real-Time ADMM

The previous sections provide some insights into the closed-loop dynamics of ADMM-based MPC. However, the identified model (5.24) and invariant set (5.50) depend on various parameters that need to be specified. Analogously to the analysis of S_M in Sect. 5.4, we can distinguish three categories of involved parameters: the system parameters, the MPC parameters, and the ADMM parameters. Here, we focus on suitable choices for the ADMM parameters ρ, M, D_z, D_μ, z^{(0)}(0), and μ^{(0)}(0) that correspond to the weighting factor in (5.12a), the number of iterations (5.19), the
update matrices in (5.22), and the "free" initial values in (5.24), respectively. We stress, in this context, that the initial state x(0) of the original system (5.1) is, of course, not freely selectable.

Now, it is well-known that the performance of ADMM significantly varies with ρ. Optimal choices for ρ are available for some specific setups (see, e.g., [8, 11]). Unfortunately, these setups do not match the ADMM scheme considered here. Hence, in our numerical experiments in the next section, we choose ρ ∈ {1, 10, 100} in order to cover different magnitudes. Regarding the number of ADMM iterations, we consider M ∈ {1, 5, 10} in the numerical benchmark. For M = 1, we deliberately reproduce the results from [20] that focus on a single ADMM iteration per time step.

In order to make reasonable choices for D_z and D_μ, we have to better understand the role of these update matrices. To this end, let us first ignore the system dynamics (5.1) and assume x(k+1) = x(k). In this case, we can choose D_z and D_μ such that the real-time iterations become classical ADMM iterations solving (5.8) for a fixed state x. In fact, for

D_z = D_μ = I_q,   (5.58)

we find z^{(0)}(k+1) = z^{(M)}(k) and μ^{(0)}(k+1) = μ^{(M)}(k), which reflects the classical setup in the sense that z^{(j)}(k+1) = z^{(M+j)}(k) and μ^{(j)}(k+1) = μ^{(M+j)}(k). However, in reality, (5.1) usually implies x(k+1) ≠ x(k). Hence, the choice (5.58) might, in general, not be useful. To identify suitable alternatives, it is helpful to recall the definition of z. As apparent from (5.9), z contains predicted states and inputs for N steps. Intuitively, after applying the first predicted input according to (5.21), it is reasonable to reuse the remaining N−1 steps as initial guesses for z^{(0)}(k+1). This classical idea is omnipresent in MPC and also exploited for the real-time scheme in [5, Sect. 2.2]. Applying the shifting requires extending the shortened predictions by one terminal step. In this context, a simple choice is û(N−1) = 0_m and consequently x̂(N) = A x̂(N−1). Clearly, this choice leads to

D_z = [ 0_{l×(n+m)}, I_l, 0_{l×n};   0_{n×(n+m)}, 0_{n×l}, I_n;   0_{m×(n+m)}, 0_{m×l}, 0_{m×n};   0_{n×(n+m)}, 0_{n×l}, A ],   (5.59)
where l := q − 2n − m is introduced for brevity. Another popular choice is û(N−1) = K x̂(N−1), which implies x̂(N) = (A + BK) x̂(N−1) and

D_z = [ 0_{l×(n+m)}, I_l, 0_{l×n};   0_{n×(n+m)}, 0_{n×l}, I_n;   0_{m×(n+m)}, 0_{m×l}, K;   0_{n×(n+m)}, 0_{n×l}, S ],   (5.60)
where S := A + B K . It remains to apply the shifting to the update of the Lagrange multipliers μ. In contrast to the predictions for z, it is hard to reasonably extend the shortened predictions for μ. Hence, we choose
D_μ = [ 0_{(q−n−m)×(n+m)}, I_{q−n−m};   0_{(n+m)×(n+m)}, 0_{(n+m)×(q−n−m)} ]   (5.61)
as a counterpart for (5.59) as well as (5.60). For both cases, this choice can be interpreted as the assumption that the added terminal step in z is optimal.

The previously discussed choices for ρ, M, D_z, and D_μ determine the dynamics of the augmented system (5.24). The closed-loop trajectory additionally depends on the initial state

x_0 = x(0) = ( x(0);  z^{(0)}(0);  μ^{(0)}(0) ) = ( x_0;  z_0;  μ_0 ).

As mentioned above, z_0 and μ_0 can be freely chosen. We propose three different initial choices for z_0 that are, to some extent, related to the three discussed choices for D_z. A naive choice for z_0 is 0_q. In fact, this choice is only optimal if x_0 = 0_n, i.e., if the system is initialized at the setpoint. A more reasonable initialization results for the predicted inputs û(0) = ··· = û(N−1) = 0_m and the related states x̂(k) = A^k x_0. For this choice, z_0 can be written as z_0 := D_0 x_0 with

D_0 := ( 0_{m×n};  A;  0_{m×n};  A^2;  ...;  0_{m×n};  A^N ).   (5.62)

We note that this initial choice satisfies the input constraints by construction. However, the state constraints might be violated, especially for unstable system matrices A. In this case, the choice

D_0 := ( K S^0;  S;  K S^1;  S^2;  ...;  K S^{N−1};  S^N )   (5.63)

might be useful, which is based on the input predictions û(k) = K x̂(k) and the related states x̂(k) = S^k x_0. It is hard to construct initializations for the Lagrange multipliers that reflect the different approaches for z_0. We thus choose μ_0 = 0_q independent of the initialization for z_0. A summary of the different choices for the parameter ρ, the number of iterations M, the update matrices D_z and D_μ, and the initialization z_0 is listed in Table 5.1. Hence, by considering all combinations of the parameter choices, we obtain 81 = 3^4 different realizations of the proposed real-time ADMM.
Table 5.1 Overview of different choices for the weighting factor ρ, the number of iterations M, the update matrices D_z and D_μ, and the initializations z_0 = D_0 x_0 (from left to right)

Weighting ρ | Iterations M | Updates    | D_z    | D_μ    | Initialization | D_0
1           | 1            | Copy       | I_q    | I_q    | Naive          | 0_{q×n}
10          | 5            | Shift-zero | (5.59) | (5.61) | Zero           | (5.62)
100         | 10           | Shift-LQR  | (5.60) | (5.61) | LQR            | (5.63)
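For completeness, the shift matrices (5.59)–(5.61) are easy to assemble programmatically; a sketch assuming the stacking order of z implied by (5.21), i.e., z = (û(0), x̂(1), ..., û(N−1), x̂(N)), with our own function names:

```python
import numpy as np

def shift_matrices(A, B, K, n, m, N, mode="zero"):
    """Build D_z as in (5.59) (mode "zero") or (5.60) (mode "LQR"),
    and D_mu as in (5.61)."""
    q = N * (n + m)
    l = q - 2 * n - m
    Dz = np.zeros((q, q))
    Dz[:l, n + m:n + m + l] = np.eye(l)      # shift the inner stages forward
    Dz[l:l + n, q - n:] = np.eye(n)          # old x(N) becomes new x(N-1)
    if mode == "zero":                        # u_new(N-1) = 0, x_new(N) = A x(N)
        Dz[l + n + m:, q - n:] = A
    else:                                     # u_new(N-1) = K x(N), x_new(N) = (A+BK) x(N)
        Dz[l + n:l + n + m, q - n:] = K
        Dz[l + n + m:, q - n:] = A + B @ K
    Dmu = np.zeros((q, q))                    # (5.61): shifted multipliers, zero tail
    Dmu[:q - n - m, n + m:] = np.eye(q - n - m)
    return Dz, Dmu
```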
5.8 Numerical Benchmark

To investigate the performance of the ADMM-based MPC, we apply the 81 ADMM parametrizations from the previous section to the (discretized) double integrator with the system matrices

A = [1, 1;  0, 1]   and   B = [0.5;  1]

and the state and input bounds

x̄ = −x̲ = (25; 5)   and   ū = −u̲ = 1.
The MPC cost functions in (5.4) are specified by Q = I_2, R = 0.1, and P as in (5.5). The prediction horizon is chosen as N = 5. For every parametrization, we first investigate the linear dynamics (5.35) around the augmented origin. More precisely, we study the Schur stability of the augmented system matrix S_M. As pointed out in Sect. 5.4, only the parameters ρ, M, and D_z have an effect on S_M. For the corresponding 3^3 = 27 parametrizations, it is easy to verify numerically that S_M as in (5.36) is always Schur stable. This observation is promising since Schur stability has, so far, not been proven for specific parameter choices. In fact, Proposition 5.7 merely proves the existence of stabilizing parameters and the proof focuses on small weighting factors ρ. Moreover, one can easily verify that the (meaningless but possible) choice ρ = 10, M = 1, D_z = −2 I_q results in an unstable matrix S_M. Now, since the pair (C_M, S_M) (with C_M as in (5.51)) is always observable according to Proposition 5.9, the set P*_M (as in (5.50)) is finitely determined for all considered parametrizations. We next compute P*_M for every parametrization using the standard procedure in [9, Sect. III]. The resulting sets P*_M are high-dimensional. In fact, P*_M is r-dimensional with r = n + 2q = n + 2(n+m)N = 62 for n = 2, m = 1, and N = 10 as in the example. In order to get a feeling for the size of the various sets P*_M, we will analyze low-dimensional slices of the form
S_{z,μ} := { x ∈ R^n | (x; z; μ) ∈ P*_M }   (5.64)
that result from fixing z and μ to some specific values. It is easy to see that the slices S_{z,μ} are subsets of the state constraints X for every choice of z and μ. Moreover, the slices S_{z,μ} have some similarities with the set

T := { x ∈ R^n | S^k x ∈ D, ∀ k ∈ N },   (5.65)
where D := { x ∈ X | K x ∈ U }, where S = A + BK as above, and where K is as in (5.37). Clearly, the set T refers to the largest set where the LQR can be applied without violating the constraints and where (5.37) applies (for P as in (5.5)). It can be easily computed (see Fig. 5.1) and typically serves as a terminal set for MPC. The similarities between S_{z,μ} and T are as follows: for every state x in these sets, the upcoming trajectories (resulting from ADMM-based MPC, respectively, classical MPC) are captured by linear (augmented) dynamics and converge to the (augmented) origin. Hence, both sets provide a numerically relatively cheap underestimation of the domain of attraction (DoA) for the corresponding predictive control scheme. We compare the size of these underestimations by evaluating the ratio

vol(S_{z,μ}) / vol(T)   (5.66)
for different z and μ. More precisely, we consider z = D_0 x for the three variants of D_0 in Table 5.1 and μ = 0_q, i.e., we consider slices related to the different initializations for z_0 and μ_0. Numerical values for the ratios (5.66) and all 81 parametrizations are listed in Table 5.2 (see columns "vol."). Interestingly, all values are larger than or equal to 1. Hence, the DoA of the system controlled by the ADMM-based MPC is at least as large as the set T. Moreover, for some parametrizations, S_{z,μ} is significantly larger than T. For example, the ratio (5.66) evaluates to 31.00 for the parametrization in line 13 of Table 5.2 and M = 5 (see Fig. 5.1). Another interesting observation is that the ratios are decreasing with M for most parametrizations except for those in lines 4, 5, and 13 of Table 5.2. One explanation for this trend is the fact that the number of rows in C_M (see (5.51)) and, consequently, the number of potential hyperplanes restricting P*_M is increasing with M.

The slices S_{z,μ} provide a useful underestimation of the DoA. However, in order to get a more complete impression of the DoA, we have to take into account initial states x_0 outside of S_{z,μ}. To this end, we randomly generated 500 initial states x_0 that are feasible for the classical MPC, i.e., that are contained in F_5 defined as in (5.7) (see Fig. 5.1). Although not enforced by a terminal set, all of these states are steered to the origin by the classical MPC. In fact, the corresponding trajectories enter the set T after at most 15 time steps. In contrast, not all trajectories resulting from the proposed ADMM-based MPC converge. The percentages of converging trajectories for the different parametrizations are listed in Table 5.2 (see columns "cnvg."). In this context, a trajectory is counted as "converging" if it reaches the set P*_M after at most 50 time steps. Apparently, the numbers of converging trajectories differ significantly for the various parametrizations (see also Fig. 5.1). In fact, only 13% of the trajectories converge for the parametrization in line 21 in Table 5.2 and M = 1, whereas 100% converge, e.g., for the parametrization in line 25 and M = 10. We further observe that the convergence ratios are greater than 87% whenever shifted updates are used (i.e., D_z as in (5.59) or (5.60) and D_μ as in (5.61)). In contrast, for copied updates (i.e., D_z = D_μ = I_q), the majority of convergence rates is below 67%. A simple explanation for this observation is that the copied updates ignore the variation of the state x(k) from one time step to the next.
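Both T and, structurally, P*_M are maximal output admissible sets in the sense of [9] and can be computed by adding constraints over a growing horizon until redundancy is detected; a minimal sketch for T under the assumption that D is given as {x | H x ≤ h} (all names are ours):

```python
import numpy as np
from scipy.optimize import linprog

def output_admissible_set(S, H, h, k_max=100):
    """Maximal output admissible set {x | H S^k x <= h for all k} for a
    Schur-stable S, following [9]: constraints for growing k are added
    until those for k+1 are redundant."""
    Hk, hk = H.copy(), h.copy()      # constraints for k = 0, ..., current k
    Sk1 = S.copy()                    # holds S^(k+1)
    for _ in range(k_max):
        redundant = True
        for i in range(H.shape[0]):
            # maximize H_i S^(k+1) x over the current polytope
            res = linprog(-(H[i] @ Sk1), A_ub=Hk, b_ub=hk,
                          bounds=[(None, None)] * S.shape[0])
            if res.status != 0 or -res.fun > h[i] + 1e-9:
                redundant = False
                break
        if redundant:
            return Hk, hk             # finitely determined
        Hk = np.vstack([Hk, H @ Sk1])
        hk = np.concatenate([hk, h])
        Sk1 = Sk1 @ S
    raise RuntimeError("not finitely determined within k_max steps")
```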
Table 5.2 Numerical benchmark for the 81 real-time ADMM parametrizations applied to the double integrator example. The abbreviations "vol.", "cnvg.", and "perf." are short for volume, convergence, and performance ratio, respectively. The listed iterations M* are required for standard ADMM to achieve the accuracy (5.70)

Line | Updates    | Init. | ρ   | M=1 vol. | M=1 cnvg. | M=1 perf. | M=5 vol. | M=5 cnvg. | M=5 perf. | M=10 vol. | M=10 cnvg. | M=10 perf. | M*
1    | Shift-LQR  | LQR   | 100 | 1.00     | 0.94      | 0.94      | 1.00     | 0.98      | 0.98      | 1.00      | 0.97       | 0.97       | 61.7
2    | Shift-LQR  | LQR   | 10  | 1.00     | 0.94      | 0.93      | 1.00     | 0.93      | 1.00      | 1.00      | 0.97       | 1.00       | 30.5
3    | Shift-LQR  | LQR   | 1   | 1.00     | 0.94      | 0.91      | 1.00     | 0.91      | 0.99      | 1.00      | 0.89       | 1.00       | 277.6
4    | Shift-LQR  | Zero  | 100 | 1.05     | 0.95      | 0.50      | 1.32     | 0.98      | 0.69      | 1.78      | 0.97       | 0.84       | 77.0
5    | Shift-LQR  | Zero  | 10  | 1.69     | 0.95      | 0.81      | 2.94     | 0.91      | 0.98      | 1.70      | 0.97       | 1.00       | 32.3
6    | Shift-LQR  | Zero  | 1   | 2.17     | 0.92      | 0.94      | 1.04     | 0.91      | 0.99      | 1.00      | 0.89       | 1.00       | 277.8
7    | Shift-LQR  | Naive | 100 | 2.00     | 0.94      | 0.97      | 1.93     | 0.98      | 0.99      | 1.85      | 0.96       | 0.98       | 64.8
8    | Shift-LQR  | Naive | 10  | 1.86     | 0.94      | 0.96      | 1.48     | 0.94      | 1.00      | 1.26      | 0.97       | 1.00       | 30.8
9    | Shift-LQR  | Naive | 1   | 1.36     | 0.94      | 0.93      | 1.02     | 0.91      | 0.99      | 1.00      | 0.89       | 1.00       | 277.6
10   | Shift-zero | LQR   | 100 | 1.00     | 0.91      | 0.61      | 1.00     | 0.97      | 0.85      | 1.00      | 0.98       | 0.96       | 177.1
11   | Shift-zero | LQR   | 10  | 1.00     | 0.94      | 0.94      | 1.00     | 0.93      | 1.00      | 1.00      | 0.95       | 1.00       | 44.0
12   | Shift-zero | LQR   | 1   | 1.00     | 0.94      | 0.91      | 1.00     | 0.91      | 0.99      | 1.00      | 0.89       | 1.00       | 279.0
13   | Shift-zero | Zero  | 100 | 26.36    | 0.97      | 0.14      | 31.00    | 0.98      | 0.60      | 17.07     | 0.95       | 0.82       | 192.4
14   | Shift-zero | Zero  | 10  | 19.02    | 0.96      | 0.79      | 2.95     | 0.92      | 0.98      | 1.70      | 0.95       | 1.00       | 45.7
15   | Shift-zero | Zero  | 1   | 2.26     | 0.92      | 0.94      | 1.04     | 0.91      | 0.99      | 1.00      | 0.89       | 1.00       | 279.2
16   | Shift-zero | Naive | 100 | 2.00     | 0.87      | 0.54      | 1.93     | 0.97      | 0.86      | 1.85      | 0.96       | 0.97       | 180.3
17   | Shift-zero | Naive | 10  | 1.86     | 0.94      | 0.97      | 1.48     | 0.95      | 1.00      | 1.26      | 0.95       | 1.00       | 44.3
18   | Shift-zero | Naive | 1   | 1.36     | 0.94      | 0.93      | 1.02     | 0.91      | 0.99      | 1.00      | 0.89       | 1.00       | 279.1
19   | Copy       | LQR   | 100 | 1.00     | 0.77      | 0.72      | 1.00     | 0.98      | 0.67      | 1.00      | 1.00       | 0.81       | 319.7
20   | Copy       | LQR   | 10  | 1.00     | 0.59      | 0.74      | 1.00     | 0.90      | 0.98      | 1.00      | 0.92       | 1.00       | 64.8
21   | Copy       | LQR   | 1   | 1.00     | 0.13      | 0.83      | 1.00     | 0.52      | 0.87      | 1.00      | 0.65       | 0.95       | 431.8
22   | Copy       | Zero  | 100 | 12.95    | 0.66      | 0.24      | 11.82    | 0.98      | 0.56      | 8.58      | 0.98       | 0.78       | 335.2
23   | Copy       | Zero  | 10  | 11.16    | 0.95      | 0.78      | 2.61     | 0.89      | 0.98      | 1.70      | 0.93       | 1.00       | 66.6
24   | Copy       | Zero  | 1   | 2.11     | 0.17      | 0.90      | 1.05     | 0.53      | 0.87      | 1.00      | 0.64       | 0.95       | 431.9
25   | Copy       | Naive | 100 | 2.00     | 0.81      | 0.82      | 1.93     | 0.98      | 0.68      | 1.86      | 1.00       | 0.82       | 322.9
26   | Copy       | Naive | 10  | 1.87     | 0.64      | 0.83      | 1.50     | 0.90      | 0.99      | 1.28      | 0.93       | 1.00       | 65.1
27   | Copy       | Naive | 1   | 1.38     | 0.15      | 0.87      | 1.03     | 0.52      | 0.87      | 1.00      | 0.65       | 0.95       | 431.8
Fig. 5.1 Illustration of various sets, initial states, and trajectories resulting from the numerical example for M = 5 and the parametrizations in lines 13 (top) and 23 (bottom) of Table 5.2, respectively. In both plots, the sets F_5 (gray), S_{z_0,μ_0} (yellow), and T (green) are shown. Moreover, the 500 generated initial states x_0 are marked with crosses. In this context, black, respectively red, crosses indicate whether the corresponding ADMM-based trajectories converge or not. Finally, the trajectories resulting from the ADMM-based MPC (solid blue) and the classical MPC (dashed blue) are depicted for the initial state x_0 = (−18.680, 3.646)^T
It remains to investigate the performance of the ADMM-based MPC. To this end, we compare the overall costs of converging trajectories to the corresponding costs resulting from classical MPC. By definition, a converging trajectory enters P∗M after at most 50 time steps. Hence, according to Proposition 5.11, the overall cost for a converging trajectory evaluates to
V_∞^{ADMM}(x_0) := Σ_{k=0}^{∞} ℓ(x(k), u(k)) = x(k*)^T 𝒫 x(k*) + Σ_{k=0}^{k*−1} ℓ(x(k), u(k)),   (5.67)
where k* ≤ 50 is such that x(k*) ∈ P*_M. Similarly, the overall cost of a classical MPC trajectory can be calculated as

V_∞^{MPC}(x_0) := Σ_{k=0}^{∞} ℓ(x(k), u(k)) = ϕ(x(k_∞)) + Σ_{k=0}^{k_∞−1} ℓ(x(k), u(k)),   (5.68)
where k_∞ is such that x(k_∞) ∈ T. We separately compute the performance ratio

V_∞^{MPC}(x_0) / V_∞^{ADMM}(x_0)
for every converging trajectory and list the mean values for the different parametrizations in Table 5.2 (see columns "perf."). The (average) performance ratios indicate that the ADMM-based MPC results, in most cases, in a performance decrease. Nevertheless, the performance ratios are larger than 74% for all cases with ρ ∈ {1, 10}. Furthermore, the performance ratios are increasing with M for most parametrizations. This observation is reasonable since the ADMM iterations (5.19) monotonically converge to the optimizers z* and μ* in the sense that the quantity

ρ ‖z^{(j)} − z*‖_2^2 + (1/ρ) ‖μ^{(j)} − μ*‖_2^2   (5.69)
decreases with j (see, e.g., [3, App. A]). Nevertheless, being closer to the optimum does not necessarily imply constraint satisfaction. This is apparent from the convergence ratios in Table 5.2 that are decreasing with M in some cases.

In order to compare the proposed real-time iteration schemes with standard ADMM, we finally investigate the number of iterations (5.19) necessary to solve the arising QPs to a certain level of accuracy. More precisely, we evaluate the first iteration j that satisfies

‖z^{(j)} − z*‖_2^2 ≤ 10^{−4}   (5.70)

and denote it with M* for each QP. Regarding the parametrization, we consider again the 27 variants in Table 5.2 and consequently warm-starts similar to (5.22). Since z^{(M*)} ≈ z*, the warm-starts are only meaningful if we follow the trajectories resulting from the original MPC (and not those resulting from the real-time schemes). Hence, the listed M* in Table 5.2 refer to the number of iterations required to obtain (5.70) averaged over all QPs along all 500 MPC trajectories for each parametrization. Apparently, M* is, on average, significantly larger than the considered number of iterations M for the investigated real-time schemes. In fact, we observe M* ∈ [30.5, 431.9] while M ∈ [1, 10] has been considered for the proposed schemes.
5.9 Conclusions and Outlook

This chapter focused on MPC based on real-time ADMM. The restriction to a finite number of ADMM iterations per time step allowed us to systematically analyze the dynamics of the controlled system. The first part of the analysis showed that the closed-loop dynamics can be described based on a piecewise affine augmented model. The associated augmented state consists of the original states x, the decision variables z, and the Lagrange multipliers μ. The second part of the analysis revealed that the nonlinear dynamics turn into linear ones around the augmented origin. We further investigated where the linear dynamics apply and characterized a positively invariant set in the augmented state space. The third part of the analysis addressed the influence of various ADMM parameters. We motivated different choices for every parameter and evaluated their efficiency in a comprehensive numerical benchmark. The benchmark clearly indicates that real-time ADMM is competitive with classical MPC for suitable parametrizations. In fact, for some parametrizations (e.g., in line 8 of Table 5.2 with M = 10), we found a high convergence ratio of 97% and simultaneously a performance ratio of almost 100%, i.e., nearly optimal performance.

The obtained results extend the findings in [20] in many directions. First, the analysis in [20] is restricted to one ADMM iteration per time step (i.e., M = 1), whereas the new results support M ≥ 1. Second, Propositions 5.5 and 5.7 provide novel findings on Schur stability and eigenvalues of S_M. Third, observability of the pair (C_M, S_M) has been formally proven in Proposition 5.9. While the presented results are more complete than the pioneering work [20], many promising extensions are left for future research. First, the ADMM-based MPC here and in [20] builds on the "uncondensed" QP (5.8). It would be interesting to study ADMM-based MPC derived from the "condensed" QP in [8]. Second, the stability analysis of the augmented system is still incomplete. It may, however, be possible to extend the guaranteed domain of attraction beyond the linear regime by exploiting (5.69) or the contraction estimates recently proposed in [25]. Third, robustness against disturbances has not been considered yet. Fourth, real-world applications of the proposed predictive control scheme should be addressed. In this context, it is interesting to note that real-time optimization schemes not only support embedded and fast controller implementations. In fact, they also pave the path for encrypted predictive control as in [22].
References

1. Bemporad, A., Morari, M., Dua, V., Pistikopoulos, E.N.: The explicit linear quadratic regulator for constrained systems. Automatica 38(1), 3–20 (2002)
2. Boccia, A., Grüne, L., Worthmann, K.: Stability and feasibility of state constrained MPC without stabilizing terminal constraints. Syst. Control Lett. 72, 14–21 (2014)
3. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
4. Diehl, M., Bock, H.G., Schlöder, J.P.: A real-time iteration scheme for nonlinear optimization in optimal feedback control. SIAM J. Control Optim. 43(5), 1714–1736 (2005)
5. Diehl, M., Findeisen, R., Allgöwer, F., Bock, H.G., Schlöder, J.P.: Nominal stability of real-time iteration scheme for nonlinear model predictive control. IEE Proc. Control Theory Appl. 152(3), 296–308 (2005)
6. Elsner, L.: On the variation of the spectra of matrices. Linear Algebra Appl. 47, 127–138 (1982)
7. Ferreau, H.J., Kirches, C., Potschka, A., Bock, H.G., Diehl, M.: qpOASES: a parametric active-set algorithm for quadratic programming. Math. Prog. Comp. 6, 327–363 (2014)
8. Ghadimi, E., Teixeira, A., Shames, I., Johansson, M.: Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems. IEEE Trans. Autom. Control 60(3), 644–658 (2015)
9. Gilbert, E.G., Tan, K.T.: Linear systems with state and control constraints: the theory and application of maximal output admissible sets. IEEE Trans. Autom. Control 36(9), 1008–1020 (1991)
10. Giselsson, P., Boyd, S.: Metric selection in fast dual forward backward splitting. Automatica 62, 1–10 (2015)
11. Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas-Rachford splitting and ADMM. IEEE Trans. Autom. Control 62(2), 532–544 (2017)
12. Graichen, K., Kugi, A.: Stability and incremental improvement of suboptimal MPC without terminal constraints. IEEE Trans. Autom. Control 55(11), 2576–2580 (2010)
13. Grieder, P., Borelli, F., Torrisi, F., Morari, M.: Computation of the constrained infinite time linear quadratic regulator. Automatica 40(4), 701–708 (2004)
14. Hammarling, S.: Numerical solution of the discrete-time, convergent, non-negative definite Lyapunov equation. Syst. Control Lett. 17, 137–139 (1991)
15. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press (1985)
16. Jerez, J.L., Goulart, P.J., Richter, S., Constantinides, G.A., Kerrigan, E.C., Morari, M.: Embedded online optimization for model predictive control at megahertz rates. IEEE Trans. Autom. Control 59(12), 3238–3251 (2014)
17. Lu, T.-T., Shiou, S.-H.: Inverses of 2 × 2 block matrices. Comput. Math. Appl. 43, 119–129 (2002)
18. O'Donoghue, B., Stathopoulos, G., Boyd, S.: A splitting method for optimal control. IEEE Trans. Control Syst. Technol. 21(6), 2432–2442 (2013)
19. Patrinos, P., Bemporad, A.: An accelerated dual gradient-projection algorithm for embedded linear model predictive control. IEEE Trans. Autom. Control 59(1), 18–33 (2014)
20. Schulze Darup, M., Book, G., Giselsson, P.: Towards real-time ADMM for linear MPC. In: Proceedings of the 2019 European Control Conference, pp. 4276–4282 (2019)
21. Schulze Darup, M., Cannon, M.: Some observations on the activity of terminal constraints in linear MPC. In: Proceedings of the 2016 European Control Conference, pp. 4977–4983 (2016)
22. Schulze Darup, M., Redder, A., Quevedo, D.E.: Encrypted cloud-based MPC for linear systems with input constraints. In: Proceedings of 6th IFAC Nonlinear Model Predictive Control Conference, pp. 635–642 (2018)
23. Van Parys, R., Pipeleers, G.: Real-time proximal gradient method for linear MPC. In: Proceedings of the 2018 European Control Conference, pp. 1142–1147 (2018)
24. Wang, Y., Boyd, S.: Fast model predictive control using online optimization. IEEE Trans. Control Syst. Technol. 18(2), 267–278 (2010)
25. Zanelli, A., Tran-Dinh, Q., Diehl, M.: Contraction estimates for abstract real-time algorithms for NMPC. In: Proceedings of the 58th IEEE Conference on Decision and Control (2019)
Chapter 6
Distributed Optimization and Control with ALADIN

B. Houska and Y. Jiang
Abstract This chapter aims to give a concise overview of distributed optimization and control algorithms based on the augmented Lagrangian based alternating direction inexact Newton (ALADIN) method. Here, our goal is to provide a tutorial-style introduction to this relatively new distributed optimization algorithm. In contrast to other existing algorithms, which are often tailored for convex optimization problems, ALADIN is particularly suited for solving non-convex optimization problems. Moreover, another principal advantage of ALADIN is that it can achieve a super-linear or even quadratic convergence rate if suitable Hessian approximations are used.
6.1 Introduction

This chapter aims to give a concise overview of distributed optimization and control algorithms based on the augmented Lagrangian based alternating direction inexact Newton (ALADIN) method. Here, our goal is to provide a tutorial-style introduction to this relatively new distributed optimization algorithm. In contrast to other existing algorithms, which are often tailored for convex optimization problems, ALADIN is particularly suited for solving non-convex optimization problems. Moreover, another principal advantage of ALADIN is that it can achieve a super-linear or even quadratic convergence rate if suitable Hessian approximations are used.

6.1.1 Brief Literature Overview

Existing algorithms for distributed convex optimization include dual decomposition [2, 12] as well as the alternating direction method of multipliers (ADMM) [15, 16]. The main idea of dual decomposition is that for minimization problems with
6.1.1 Brief Literature Overview Existing algorithms for distributed convex optimization include dual decomposition [2, 12] as well as the alternating direction method of multipliers (ADMM) [15, 16]. The main idea of dual decomposition is that for minimization problems with B. Houska (B) · Y. Jiang ShanghaiTech University, Shanghai, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Faulwasser et al. (eds.), Recent Advances in Model Predictive Control, Lecture Notes in Control and Information Sciences 485, https://doi.org/10.1007/978-3-030-63281-6_6
135
136
B. Houska and Y. Jiang
separable strictly convex objective function and affine coupling constraint, the dual function can be evaluated by solving decoupled optimization problems. One way to solve the dual maximization problem is by using a gradient or an accelerated gradient method as suggested in a variety of articles [26, 32, 33]. Other methods for solving this maximization problem are based on semi-smooth Newton methods [13, 14, 25], which have, however, the disadvantage that additional smoothing heuristics and line-search routines are needed, as the dual function of most practically relevant convex optimization problems is usually not twice differentiable. An example for a software based on a combination of dual decomposition and a semi-smooth Newton algorithm is the code qpDunes [14].

An alternative to dual decomposition algorithms is the alternating direction method of multipliers (ADMM), which has originally been introduced in [15, 16]. In contrast to dual decomposition, which constructs a dual function by minimizing the Lagrangian over the primal variables, ADMM is based on the construction of augmented Lagrangian functions. During the last decade, ADMM received great attention from many researchers and can by now be considered as one of the most promising methods for distributed optimization [6, 8, 9]. In particular, [4] contains a self-contained convergence proof for a rather general class of convex optimization problems. The local convergence behavior of ADMM is linear for most problems of practical relevance, as, for example, discussed in [20] under mild technical assumptions.

Besides dual decomposition, ADMM, and their variants, there exist a variety of other large-scale optimization methods, some of which admit the parallelization or even distribution of most of their operations. For example, although sequential quadratic programming methods [3, 30, 31, 37] have not originally been developed for solving distributed optimization problems, they can exploit the partially separable structure of the objective function by using either block-sparse or low-rank Hessian approximations [36]. In particular, limited memory BFGS (L-BFGS) methods are highly competitive candidates for large-scale optimization [27]. As an alternative class of large-scale optimization methods, augmented Lagrangian methods [1, 19, 29, 35] have been analyzed and implemented in the software collection GALAHAD [7, 17]. A more exhaustive review of such augmented Lagrangian based decomposition methods for convex and non-convex optimization algorithms can be found in [18].
6.1.2 On the road between ADMM and SQP

One of the main differences between ADMM and general Newton-type methods or, more specifically, in the context of optimization, sequential quadratic programming (SQP) methods is that ADMM is not invariant under scaling. In practice, this means that it is advisable to apply a pre-conditioner in order to pre-scale the optimization variables before applying an ADMM method, as the method may converge very slowly otherwise. A similar statement holds for dual decomposition, if the dual maximization problem is solved with a gradient or an accelerated gradient method. Of course, the question whether it is desirable to exploit second-order information in a distributed or large-scale optimization method must be discussed critically. On
the one hand, one would like to avoid linear algebra overhead and decomposition of matrices in large-scale optimization, while, on the other hand, (approximate) second-order information, as, for example, exploited by many SQP methods, might improve the convergence rate as well as the robustness of an optimization method with respect to the scaling of the optimization variables. Notice that the above reviewed L-BFGS methods are good examples for a class of optimization algorithms which attempt to approximate second-order terms while the algorithm is running, and which are competitive for large-scale optimization when compared to purely gradient-based methods. Thus, if sparse or low-rank matrix approximation techniques, as, for example, used in SQP methods, could be featured systematically in a distributed optimization framework, then this might lead to highly competitive distributed optimization algorithms that are robust with respect to scaling.
6.1.3 Why ALADIN?

The goal of the current chapter is to provide a light introduction to the recently proposed ALADIN method, which can solve convex and non-convex optimization problems in a distributed way [22]. As such, this chapter does not present any new technical contribution relative to the original ALADIN article [22] or its variants that can be found in [10, 24, 28], but it summarizes recent developments from a unifying perspective. Here, the main reasons why we focus on ALADIN can be summarized as follows:

1. ALADIN is a distributed algorithm that is locally equivalent to a Newton-type method, which means that locally super-linear or quadratic convergence rates can be achieved under certain regularity assumptions by using suitable Hessian matrix approximations. If the objective or constraint functions are non-differentiable, these regularity assumptions typically fail to hold. In this case, ALADIN can still be shown to converge, but only weaker convergence rate estimates are possible. It can also be shown that ALADIN contains SQP methods in the limit case if no nonlinear constraints are present and if the augmented Lagrangian parameters tend to infinity as shown in [22].
2. ALADIN can be used to solve non-convex optimization problems to local optimality. Although there exist similar results for ADMM methods [21] under certain assumptions on the augmented Lagrangian parameter, ALADIN has the advantage that its local convergence properties are unaffected by how this parameter is adjusted. Moreover, in [10], a detailed numerical comparison of ADMM and ALADIN is presented for large-scale power network optimization problems, where it is indeed confirmed that both ALADIN and ADMM converge in principle, but ALADIN needs much fewer iterations to achieve the same accuracy.
6.2 Problem Formulation

This section is about structured optimization problems of the form

min_x Σ_{i=1}^{N} f_i(x_i)   s.t.   Σ_{i=1}^{N} A_i x_i = b   | λ.   (6.1)
Here, the functions f_i : R^n → R ∪ {∞} are, in the most general case, lower semi-continuous functions, while the matrices A_i ∈ R^{m×n} and the vector b ∈ R^m are given. Throughout this paper, dual variables are written immediately after the constraint; that is, in the above problem formulation, λ ∈ R^m denotes the multiplier that is associated with the coupled equality constraint. In practice, optimization problems of the above form often have the following characteristics:

1. the number N ∈ N is potentially large,
2. the functions f_i are potentially non-convex, and
3. the matrices A_i are typically sparse.

In order to solve (6.1) numerically, it is desirable to distribute the required computations. Here, one often assumes that the i-th processor or agent stores the function f_i and the matrix A_i. For example, in the special case that we have no equality constraints, or if these equality constraints happen to be redundant, (6.1) can be solved by solving the smaller decoupled optimization problems

min_{x_i} f_i(x_i)   (6.2)
for all i ∈ {1, ..., N} separately. Of course, in practice, the equality constraints are usually not redundant, but the solution of (6.2) might still be a good initial approximation of the solution of (6.1). Intuitively, one might expect that this is the case if the dual variable λ is relatively small compared to variations of the objective functions f_i.
6.2.1 Consensus Constraints

As much as distributed optimization problems arise in many applications, they are almost never in standard form (6.1). In this case, one needs to rewrite the optimization problem first in order to bring it into standard form. In order to understand this, it is helpful to have a look at the sensor network that is shown in Fig. 6.1. In this example, we have a network with seven sensors. The position of each sensor is denoted by χ_i ∈ R^2. If each sensor only takes a measurement η_i of its position χ_i and solves

min_{χ_i} ‖χ_i − η_i‖_2^2 ,   ∀ i ∈ {1, ..., 7},
Fig. 6.1 Example for a two-dimensional sensor network. The location of the i-th sensor is denoted by χi ∈ R2 . The distance between the first and the second sensor is a nonlinear function of χ1 and χ2 , the distance between the second and third sensor depends on χ2 and χ3 , and so on. Thus, one needs to introduce auxiliary variables in order to express these distances by using local variables only
all sensors can estimate their position independently; that is, without communicating with other sensors. However, as soon as the sensors additionally measure the distance to their neighbors, we would like to solve a least-squares localization problem of the form

min_χ Σ_{i=1}^{7} ( ‖χ_i − η_i‖_2^2 + (‖χ_i − χ_{i+1}‖_2 − η̄_i)^2 )   with   χ_8 = χ_1.   (6.3)
Here, η̄_i denotes the measurement for the distance between the i-th and the (i+1)-th sensor. Notice that this optimization problem is not yet in the form (6.1), because the objective function in (6.3) comprises terms of the form (‖χ_i − χ_{i+1}‖_2 − η̄_i)^2 that are, unfortunately, coupled in χ_i and χ_{i+1}. However, such couplings can be resolved by introducing the auxiliary variables

x_i = (χ_i, ζ_i)^T   with   ζ_i = χ_{i+1}.
This has the advantage that we can introduce the separable non-convex objective functions

f_i(x_i) = (1/2) ‖χ_i − η_i‖_2^2 + (1/2) ‖ζ_i − η_{i+1}‖_2^2 + (1/2) (‖χ_i − ζ_i‖_2 − η̄_i)^2.

Moreover, the affine coupling, ζ_i = χ_{i+1}, can be written as

Σ_{i=1}^{7} A_i x_i = 0
by defining the matrices A_i ∈ {−1, 0, 1}^{14×4} appropriately. Constraints of this form are called consensus constraints, because they enforce the local copies of all variables of the different agents to coincide. Thus, a general strategy for formulating distributed optimization problems can be summarized as follows (see the sketch after this list for an illustration):

1. introduce local copies for neighbor variables until all terms in the objective function are decoupled, and
2. introduce affine consensus constraints that enforce all local copies of the same variable to coincide.

Notice that the same strategy can also be used to decouple inequality constraints as further elaborated in the section below.
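For the sensor ring of Fig. 6.1, the consensus matrices can be built mechanically; a minimal Python sketch (function and variable names are ours):

```python
import numpy as np

def ring_consensus_matrices(num_sensors=7, d=2):
    """Consensus constraints zeta_i = chi_{i+1} for the sensor ring, written
    as sum_i A_i x_i = 0 with x_i = (chi_i, zeta_i); indices are taken mod 7."""
    m = num_sensors * d                    # one d-dimensional constraint per edge
    A = [np.zeros((m, 2 * d)) for _ in range(num_sensors)]
    for i in range(num_sensors):
        rows = slice(i * d, (i + 1) * d)
        A[i][rows, d:2 * d] = np.eye(d)    # + zeta_i
        A[(i + 1) % num_sensors][rows, 0:d] = -np.eye(d)   # - chi_{i+1}
    return A
```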
6.2.2 Inequality Constraints and Hidden Variables

Although we regard (6.1) as the standard form of distributed optimization problems, it is important to keep in mind that the functions f_i often have a very particular structure. This is because one is often interested in solving inequality constrained nonlinear optimization problems of the form
min_{x,z} Σ_{i=1}^{N} F_i(x_i, z_i)   s.t.   { Σ_{i=1}^{N} A_i x_i = b  | λ ;    h_i(x_i, z_i) ≤ 0  | κ_i }.   (6.4)
Here, the functions h_i can be used to model bounds or other constraints on the local variables. Moreover, the above problem formulation distinguishes between the variables z_i that are entirely local and the variables x_i that are coupled via an affine constraint. In order to write (6.4) in the form of the standard problem (6.1), one needs to set

f_i(x_i) = min_{z_i} F_i(x_i, z_i)   s.t.   h_i(x_i, z_i) ≤ 0.   (6.5)
This means that an evaluation of the functions $f_i$ requires one to solve a parametric optimization problem. In particular, this notation is only possible if one additionally defines $f_i(x_i) = \infty$ for all $x_i$ at which (6.5) is infeasible. Notice that this notation hides both the inequality constraints and the variables $z_i$. This is why we call the variables $z_i$ hidden variables. An obvious advantage of introducing the functions $f_i$ is that this simplifies notation and makes some of our theoretical derivations below easier to read. However, one has to keep in mind that this construction implies that

1. the functions $f_i$ are typically non-differentiable, and
2. the functions $f_i$ potentially take the value $+\infty$ (if the hidden optimization problem is infeasible).
Thus, in the following, although we will usually work with the simplified notation in (6.1), we will sometimes come back to formulation (6.4) whenever the particular structure of this hidden optimization problem needs to be exploited.
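As an illustration, the following Python sketch evaluates $f_i$ as defined in (6.5) by solving the inner parametric problem numerically. The data $F_i$ and $h_i$ below are invented for the example, and the convention $f_i(x_i) = \infty$ for infeasible $x_i$ is implemented by returning `np.inf` when the inner solver fails.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative local data: F_i(x, z) = ||z - x||^2 + ||z||^2 with h_i(x, z) = z - 0.5 <= 0.
def F_i(x, z):
    return np.sum((z - x) ** 2) + np.sum(z ** 2)

def h_i(x, z):
    return z - 0.5

# f_i(x_i) = min_z F_i(x_i, z) s.t. h_i(x_i, z) <= 0, cf. (6.5).
def f_i(x, n_z=2):
    cons = {"type": "ineq", "fun": lambda z: -h_i(x, z)}   # scipy expects g(z) >= 0
    res = minimize(lambda z: F_i(x, z), x0=np.zeros(n_z), constraints=[cons])
    return res.fun if res.success else np.inf              # hidden problem infeasible

print(f_i(np.array([1.0, 2.0])))
```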
6.3 Augmented Lagrangian Based Alternating Direction Inexact Newton (ALADIN) Method

The most basic variant of ALADIN proceeds by repeating two main steps. In the first step, one solves decoupled optimization problems of the form

$$\min_{y_i} \; f_i(y_i) + \lambda^T A_i y_i + \frac{1}{2} \| y_i - x_i \|_{\Sigma_i}^2, \tag{6.6}$$
where $\lambda \in \mathbb{R}^m$ is the current iterate for the multiplier of the consensus constraint, $x_i$ the current iterate for the primal variable, and $\Sigma_i \succ 0$ a positive definite scaling matrix. The optimization problem is called "decoupled" because its objective depends only on the $i$-th objective function $f_i$ and on the matrix $A_i$ that belongs to the $i$-th agent, too. This means in particular that the optimization problems (6.6) can be solved in parallel for all $i \in \{1, \ldots, N\}$ without the need to communicate information between the agents. In order to understand the terms in the objective, it is helpful to first consider the case that the functions $f_i$ are twice continuously differentiable. In this case, the optimality condition of the decoupled NLPs (6.6) can be written in the form

$$\nabla f_i(y_i) = \Sigma_i (x_i - y_i) - A_i^T \lambda.$$

This optimality condition must be compared to the first-order stationarity condition $\nabla f_i(x_i^*) = -A_i^T \lambda^*$ that is necessarily satisfied at KKT points $(x^*, \lambda^*)$ of the original optimization problem (6.1) that we wish to compute. Here, it becomes clear that if our current iterate $(x, \lambda)$ is close to $(x^*, \lambda^*)$, then the solution $y_i$ of (6.6) can be expected to be close to $x_i^*$.

Remark 6.1 The objective function of (6.6) is known in the distributed optimization literature under the name augmented Lagrangian. The properties of such augmented Lagrangians have been analyzed by many authors [2, 18]. They are frequently used for developing proximal methods [6] and they may also be considered one of the basic ingredients for developing ADMM methods [5, 8].

In the context of ALADIN, the computation of decoupled NLP solutions is alternated with the second main step, namely, the solution of coupled quadratic programming problems of the form
$$\min_{\Delta y} \; \sum_{i=1}^{N} \frac{1}{2} \Delta y_i^T H_i \Delta y_i + g_i^T \Delta y_i \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i (y_i + \Delta y_i) = b \;\; \mid \; \lambda^+. \tag{6.7}$$
Here, the gradient $g_i = \nabla f_i(y_i)$ and the Hessian matrix approximation $H_i \approx \nabla^2 f_i(y_i)$ are both evaluated at the solutions of the decoupled NLP problems. In contrast to ADMM methods, the introduction of this coupled QP is inspired by the field of sequential quadratic programming (SQP). In the most basic variant of ALADIN, the NLP (6.6) and the QP (6.7) are all we need to set up a simple method for distributed optimization. This basic variant is summarized in the form of Algorithm 1. Before we discuss the convergence properties of this method in more detail, we have a brief look at possible termination conditions as well as certain advanced variants, which helps us to get an overview of why and how this basic variant of ALADIN can be refined.

Algorithm 1: Basic ALADIN
Input: Initial guesses $x_i \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}^m$, scaling matrices $\Sigma_i \in \mathbb{S}_{++}^n$, and a termination tolerance $\varepsilon > 0$.
Repeat:
1. Solve for all $i \in \{1, \ldots, N\}$ the decoupled NLPs
$$\min_{y_i} \; f_i(y_i) + \lambda^T A_i y_i + \frac{1}{2} \| y_i - x_i \|_{\Sigma_i}^2.$$
2. Set $g_i = \nabla f_i(y_i)$ and $H_i \approx \nabla^2 f_i(y_i)$.
3. Solve the coupled equality constrained QP
$$\min_{\Delta y} \; \sum_{i=1}^{N} \frac{1}{2} \Delta y_i^T H_i \Delta y_i + g_i^T \Delta y_i \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i (y_i + \Delta y_i) = b \;\; \mid \; \lambda^+.$$
4. Set $x \leftarrow x^+ = y + \Delta y$ and $\lambda \leftarrow \lambda^+$ and continue with Step 1.
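To illustrate Algorithm 1, the following Python sketch applies it to a small strictly convex quadratic instance of (6.1), for which Step 1 has a closed-form solution and the coupled QP of Step 3 is solved via the dual Hessian route detailed later in Sect. 6.3.4. All problem data are invented for the example; with exact Hessians, the iteration terminates after one step, in line with Remark 6.2 below.

```python
import numpy as np

# Toy instance of (6.1): f_i(x_i) = 0.5 * (x_i - a_i)' Q_i (x_i - a_i),
# coupled by A_1 x_1 + A_2 x_2 = b. All data are illustrative.
Q = [np.diag([2.0, 1.0]), np.diag([1.0, 3.0])]
a = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
A = [np.array([[1.0, 0.0]]), np.array([[-1.0, 0.0]])]
b = np.zeros(1)
N, Sigma, eps = 2, [np.eye(2), np.eye(2)], 1e-8

x, lam = [np.zeros(2), np.zeros(2)], np.zeros(1)
for it in range(20):
    # Step 1: decoupled NLPs (6.6); closed form since the f_i are quadratic.
    y = [np.linalg.solve(Q[i] + Sigma[i],
                         Q[i] @ a[i] + Sigma[i] @ x[i] - A[i].T @ lam)
         for i in range(N)]
    if max(np.linalg.norm(y[i] - x[i]) for i in range(N)) <= eps:
        break                                   # termination condition (6.11)
    # Step 2: gradients and (here exact) Hessians at the NLP solutions.
    g = [Q[i] @ (y[i] - a[i]) for i in range(N)]
    H = [Q[i] for i in range(N)]
    # Step 3: coupled QP via the dual Hessian, cf. (6.16)-(6.18).
    M = sum(A[i] @ np.linalg.solve(H[i], A[i].T) for i in range(N))
    R = sum(A[i] @ (y[i] - np.linalg.solve(H[i], g[i])) for i in range(N)) - b
    lam_plus = np.linalg.solve(M, R)
    dy = [-np.linalg.solve(H[i], g[i] + A[i].T @ lam_plus) for i in range(N)]
    # Step 4: update the primal and dual iterates.
    x = [y[i] + dy[i] for i in range(N)]
    lam = lam_plus

print("iterations:", it, " x:", x, " lambda:", lam)
```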
Remark 6.2 If the functions $f_i$ are quadratic forms and if the matrices $H_i$ are set to the exact Hessian matrices, Algorithm 1 trivially converges in one step. This is in contrast to standard ADMM methods, which usually do not converge in one step, even if all variables are scaled optimally. For example, if we consider the quadratic optimization problem

$$\min_{x} \; (qx - 1)^2 \quad \text{s.t.} \quad x = 0,$$

with $q \neq 0$, the standard way of applying parallel ADMM would be to write this problem in consensus form

$$\min_{x, y} \; I_0(x) + (qy - 1)^2 \quad \text{s.t.} \quad x = y, \qquad \text{with} \qquad I_0(x) = \begin{cases} 0 & \text{if } x = 0 \\ \infty & \text{otherwise.} \end{cases}$$
The augmented Lagrangian of this optimization problem can be written in the form

$$L_\rho(x, y, \lambda) = I_0(x) + (qy - 1)^2 + \lambda (y - x) + \frac{\rho}{2} (y - x)^2$$

such that the associated ADMM iteration has the form

$$y^+ = \arg\min_{y} \; L_\rho(x, y, \lambda) = \frac{1}{2q^2 + \rho} \left[ 2q - \lambda + \rho x \right] \tag{6.8}$$
$$\lambda^+ = \lambda + \rho (y^+ - x) \tag{6.9}$$
$$x^+ = \arg\min_{x} \; L_\rho(x, y^+, \lambda^+) = 0, \tag{6.10}$$

which simplifies to

$$\left[ \lambda^+ - 2q \right] = \frac{2q^2}{2q^2 + \rho} \left[ \lambda - 2q \right].$$
It is clear that this ADMM iteration converges for any $\rho > 0$, as $\frac{2q^2}{2q^2 + \rho} \in (0, 1)$. Because we have to choose $\rho < \infty$, the contraction factor is never equal to $0$. Moreover, rescaling the optimization variable $y$ is equivalent to using a different $q \neq 0$. Thus, no matter how we scale the variables or choose $\rho$, ADMM never converges in one step (assuming that we do not initialize at the optimal solution). Thus, ALADIN has, at least for this test example, a clear advantage over ADMM, since it can achieve one-step convergence by using exact Hessian matrices.
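The contraction factor is easy to verify numerically. The following sketch iterates (6.8)-(6.10) for the illustrative values $q = 2$ and $\rho = 1$ and prints the ratio of successive dual errors, which settles at $2q^2/(2q^2 + \rho) = 8/9$.

```python
import numpy as np

q, rho = 2.0, 1.0             # illustrative values with q != 0
x, y, lam = 0.0, 0.0, 0.0
err_prev = abs(lam - 2 * q)   # dual error |lambda - 2q|
for k in range(8):
    y = (2 * q - lam + rho * x) / (2 * q ** 2 + rho)   # (6.8)
    lam = lam + rho * (y - x)                          # (6.9)
    x = 0.0                                            # (6.10)
    err = abs(lam - 2 * q)
    print(f"k = {k}: |lam - 2q| = {err:.6f}, ratio = {err / err_prev:.4f}")
    err_prev = err
```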
6.3.1 Termination Conditions

Notice that Algorithm 1 has two main primal iterates: the iterates $x$, which are obtained as the solutions of the coupled QPs, and the iterates $y$, whose block components are obtained as the solutions of the decoupled NLPs. Now, one can terminate Algorithm 1 if the condition

$$\| y_i - x_i \| \leq \varepsilon \tag{6.11}$$

is satisfied for all $i \in \{1, \ldots, N\}$ for a user-specified numerical tolerance $\varepsilon > 0$. Notice that this termination condition measures the distance between the iterates $x$ and $y$. Because the iterates $y$ satisfy $\nabla f_i(y_i) = \Sigma_i (x_i - y_i) - A_i^T \lambda$, we have

$$(6.11) \implies \left\| \nabla f_i(y_i) + A_i^T \lambda \right\| \leq O(\varepsilon),$$

and hence this termination condition is sufficient to ensure that the violation of the first-order stationary KKT condition is small if $\varepsilon$ is small. Moreover, because the iterates $x$ satisfy the coupled feasibility condition
$$\sum_{i=1}^{N} A_i x_i = b, \qquad (6.11) \implies \left\| \sum_{i=1}^{N} A_i y_i - b \right\| \leq O(\varepsilon),$$
where the latter conclusion follows from (6.11) and the triangle inequality for norms. Thus, in summary, it is sufficient to check that (6.11) holds upon termination if our goal is to ensure that $(y_i, \lambda)$ satisfies the first-order KKT optimality conditions up to a term of order $O(\varepsilon)$.

Remark 6.3 In practice, one has different options for choosing the norm $\| \cdot \|$ in (6.11). Because the scaling matrices $\Sigma_i$ are assumed to be positive definite, one can use the norm

$$\| y_i - x_i \| = \| y_i - x_i \|_{\Sigma_i} = \sqrt{(y_i - x_i)^T \Sigma_i (y_i - x_i)}.$$

Moreover, if a positive definite Hessian approximation $H_i$ is available, one can directly set $\Sigma_i = H_i$. In Sect. 6.4.2, we will show that this particular scaling leads to a consistent convergence proof.
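In code, the scaled termination test of Remark 6.3 amounts to a few lines. The helper below is a minimal sketch assuming the local iterates and scaling matrices are given as lists of numpy arrays; the function name is our own.

```python
import numpy as np

# Check (6.11) in the scaled norms || . ||_{Sigma_i} of Remark 6.3.
def terminated(y, x, Sigma, eps):
    return all(np.sqrt((y[i] - x[i]) @ Sigma[i] @ (y[i] - x[i])) <= eps
               for i in range(len(y)))
```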
6.3.2 Derivative-Free Variants

The basic ALADIN Algorithm 1 was first proposed (at least in a very similar version) in [22]. By now, several variants of this algorithm have appeared. One of the most important generalizations (see, for example, [28]) is the derivative-free variant that is briefly discussed in this section. Here, the main motivation for generalizing Algorithm 1 is that this algorithm requires evaluating the first-order derivatives $g_i = \nabla f_i(y_i)$ of the functions $f_i$. However, as we have discussed in Sect. 6.2.2, these functions are, in practice, not always differentiable. But, as long as the functions $f_i$ are Lipschitz continuous, we still have

$$\Sigma_i (x_i - y_i) - A_i^T \lambda \in \partial f_i(y_i)$$

at the solution $y_i$ of the decoupled NLPs (6.6). Here, $\partial f_i(y_i)$ denotes the Clarke subdifferential of $f_i$ at $y_i$, which coincides with the standard subdifferential of $f_i$ if this function is convex. Clearly, the above inclusion motivates setting

$$g_i = \Sigma_i (x_i - y_i) - A_i^T \lambda,$$

such that $g_i \in \partial f_i(y_i)$. This expression has the advantage that we can compute $g_i$ without needing to evaluate a derivative of $f_i$; in fact, such derivatives do not even have to exist at $y_i$. The corresponding derivative-free variant of ALADIN is summarized in the form of Algorithm 2.
Algorithm 2: Derivative-free ALADIN
Input: Initial guesses $x_i \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}^m$ and a termination tolerance $\varepsilon > 0$.
Repeat:
1. Solve for all $i \in \{1, \ldots, N\}$ the decoupled NLPs
$$\min_{y_i} \; f_i(y_i) + \lambda^T A_i y_i + \frac{1}{2} \| y_i - x_i \|_{\Sigma_i}^2.$$
2. Choose matrices $H_i \in \mathbb{S}_{++}^n$ and set $g_i = H_i (x_i - y_i) - A_i^T \lambda$.
3. Solve the coupled equality constrained QP
$$\min_{\Delta y} \; \sum_{i=1}^{N} \frac{1}{2} \Delta y_i^T H_i \Delta y_i + g_i^T \Delta y_i \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i (y_i + \Delta y_i) = b \;\; \mid \; \lambda^+.$$
4. Set $x \leftarrow x^+ = y + \Delta y$ and $\lambda \leftarrow \lambda^+$ and continue with Step 1.
As we shall see in the following section, one can still ensure convergence of Algorithm 2 under surprisingly mild conditions; that is, without assuming that the functions $f_i$ are differentiable. Notice that for this variant of ALADIN the matrices $H_i$ and $\Sigma_i$ should both be chosen as suitable positive definite matrices in order to scale the method.
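In an implementation, Step 2 of Algorithm 2 replaces the gradient evaluation of Algorithm 1 by pure linear algebra. The following one-line helper is a sketch of this substitution (for $\Sigma_i = H_i$, the returned vector is exactly the subgradient obtained from the optimality condition of the decoupled NLP); the function name is illustrative.

```python
import numpy as np

# Derivative-free gradient of Step 2 in Algorithm 2: no derivative of f_i
# is evaluated; g_i is recovered from the decoupled NLP solution y_i.
def derivative_free_gradient(H_i, A_i, x_i, y_i, lam):
    return H_i @ (x_i - y_i) - A_i.T @ lam
```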
6.3.3 Inequality Constraint Variants

In this section, we recall that in many applications the evaluation of the functions $f_i$ itself requires solving optimization problems of the form

$$f_i(x_i) = \min_{z_i} \; F_i(x_i, z_i) \quad \text{s.t.} \quad h_i(x_i, z_i) \leq 0. \tag{6.12}$$

Here, we also recall that the variables $z_i$ are called hidden variables, as discussed in Sect. 6.2.2. Of course, if the functions $f_i$ have this form, one should not introduce a bilevel structure for solving the decoupled NLPs, but solve the joint augmented Lagrangian minimization problem instead,

$$\min_{y_i, z_i} \; F_i(y_i, z_i) + \lambda^T A_i y_i + \frac{1}{2} \| y_i - x_i \|_{\Sigma_i}^2 \quad \text{s.t.} \quad h_i(y_i, z_i) \leq 0 \;\; \mid \; \kappa_i. \tag{6.13}$$
Notice that this optimization problem corresponds to the expanded form of the decoupled NLPs in Algorithm 2. Here, one should recall that Algorithm 2 is rather general and can, at least in principle, be applied to arbitrary non-differentiable functions $f_i$. However, if the functions $F_i$ and $h_i$ are twice Lipschitz-continuously differentiable,
one might be interested in evaluating the second derivatives of these functions in order to achieve a better scaling of Algorithm 2. Unfortunately, there is in general no systematic way of pre-computing optimal choices for the positive definite matrices $H_i$. However, one possible heuristic is to introduce a Gauss–Newton proximal objective function of the form

$$\tilde{F}_i(\xi_i, \zeta_i) = F_i(\xi_i, \zeta_i) + \sum_{j \in \mathbb{A}_i} \frac{\mu_{i,j}}{2} \left( \begin{pmatrix} \nabla_y h_j(y_i, z_i) \\ \nabla_z h_j(y_i, z_i) \end{pmatrix}^T \begin{pmatrix} \xi_i - y_i \\ \zeta_i - z_i \end{pmatrix} \right)^2$$

with tuning parameters $\mu_{i,j} > 0$. Here, $y_i$ and $z_i$ denote the solutions of the current (expanded) decoupled NLPs and $\mathbb{A}_i = \{ j \mid \kappa_{i,j} > 0 \}$ the strictly active set of the $i$-th decoupled optimization problem. The second derivatives of $\tilde{F}_i$ can now be evaluated at $y_i$ and $z_i$, which suggests setting the Hessian matrix $H_i$ to the Schur complement

$$H_i = \nabla_{\xi,\xi}^2 \tilde{F}_i(y_i, z_i) - \nabla_{\xi,\zeta}^2 \tilde{F}_i(y_i, z_i) \left( \nabla_{\zeta,\zeta}^2 \tilde{F}_i(y_i, z_i) \right)^{-1} \nabla_{\zeta,\xi}^2 \tilde{F}_i(y_i, z_i). \tag{6.14}$$

We will see in the sections below that this choice for $H_i$ can be justified in the sense that one can establish locally quadratic convergence for this variant of Algorithm 2 under suitable regularity assumptions, as long as the inverse tuning parameters,

$$\frac{1}{\mu_{i,j}} = O\left( \| x_i - y_i \| \right),$$

tend to $0$ as the algorithm converges. However, at the current status of research, we only have working heuristics but no systematic way of choosing these tuning parameters. More systematic inequality handling routines for ALADIN remain among the most important open problems for future research.
6.3.4 Implementation Details

Notice that almost all steps of Algorithms 1 and 2 are completely decoupled. Consequently, all these steps can be implemented in parallel by the agents of the system without communication. Here, the only exception is the step in which we have to solve the coupled QP

$$\min_{\Delta y} \; \sum_{i=1}^{N} \frac{1}{2} \Delta y_i^T H_i \Delta y_i + g_i^T \Delta y_i \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i (y_i + \Delta y_i) = b \;\; \mid \; \lambda^+, \tag{6.15}$$

which requires the agents to communicate with each other. Let us have a closer look at how we can solve this coupled QP. To this end, we introduce the terms
$$R = \sum_{i=1}^{N} A_i \left[ y_i - H_i^{-1} g_i \right] - b \qquad \text{and} \qquad M = \sum_{i=1}^{N} A_i H_i^{-1} A_i^T, \tag{6.16}$$
which can be computed by evaluating the above running sums, communicating the vectors $A_i [y_i - H_i^{-1} g_i]$ and the projected inverse Hessian matrices $A_i H_i^{-1} A_i^T$; this is possible as long as we choose positive definite Hessian approximations. In the following, we call $M$ the dual Hessian matrix, which is known to be invertible if the matrices $A_i$ satisfy the linear independence constraint qualification (LICQ); that is, if the matrix $[A_1, A_2, \ldots, A_N]$ has full rank. Next, the dual solution of the QP is given by

$$\lambda^+ = M^{-1} R. \tag{6.17}$$
Notice that the evaluation of this matrix-vector product can often be further simplified if the matrix $M$ (and its inverse) have a particular structure. Here, we assume that the LICQ condition holds such that the matrix $M$ is indeed invertible. The primal solution can then be recovered as

$$\Delta y_i = -H_i^{-1} \left( g_i + A_i^T \lambda^+ \right). \tag{6.18}$$
The evaluation of this term can be carried out in decoupled mode. Notice that if the matrices $H_i$ are kept constant during the iterations, the dual Hessian matrix $M$ and its pseudo-inverse (or a suitable decomposition of $M$) can be computed in advance, before starting the algorithm. However, one of the main motivations for developing Algorithms 1 and 2 is their similarity to sequential quadratic programming algorithms, which motivates updating the matrices $H_i$ during the iterations, as discussed next. In this case, the decomposition of $M$ needs to be updated, too.
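The complete communication step can be summarized in a short routine. The sketch below assembles $R$ and $M$ from the agents' contributions and evaluates (6.17) and (6.18), assuming the LICQ condition so that $M$ is invertible; function and variable names are our own.

```python
import numpy as np

# Solve the coupled QP (6.15) via (6.16)-(6.18). Each agent communicates
# its summand of R and M once; lam_plus is computed centrally and the
# primal steps Delta y_i are then recovered locally in decoupled mode.
def solve_coupled_qp(H, g, A, y, b):
    N = len(H)
    R = sum(A[i] @ (y[i] - np.linalg.solve(H[i], g[i])) for i in range(N)) - b
    M = sum(A[i] @ np.linalg.solve(H[i], A[i].T) for i in range(N))
    lam_plus = np.linalg.solve(M, R)                                   # (6.17)
    dy = [-np.linalg.solve(H[i], g[i] + A[i].T @ lam_plus) for i in range(N)]
    return dy, lam_plus                                                # (6.18)
```

If the $H_i$ are kept constant, a factorization of $M$ can of course be cached across iterations instead of being recomputed from scratch.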
6.4 Convergence Analysis

The goal of this section is to concisely summarize the main ideas for establishing convergence of Algorithms 1 and 2. Here, one should keep in mind that the stronger the assumptions on the functions $f_i$, the more can be said about the convergence properties of ALADIN. The following sections focus on two prototype situations:

1. If the functions $f_i$ are non-convex, global convergence statements can in general not be made. Therefore, Sect. 6.4.1 focuses on the local convergence properties of ALADIN under various regularity assumptions, but without assuming that the functions $f_i$ are convex.
2. Another case of practical interest is the one where the functions $f_i$ are (strictly) convex. In this case, ALADIN converges under rather mild assumptions to global
minimizers of (6.1) without needing regularity conditions on the minimizer or differentiability of the functions $f_i$.

At this point, it should be mentioned that the sections below are not intended to be exhaustive. Our summary of convergence conditions should rather be understood as an introduction, intended to help the reader understand the main conceptual ideas for proving convergence of ALADIN. Of course, in practice, these ideas might have to be re-combined or generalized depending on the particular properties of the functions $f_i$. For a complete (but also much more technical) discussion of the convergence properties of ALADIN, we refer to [22].
6.4.1 Local Convergence Results

One of the main advantages of Algorithm 1 compared to other distributed optimization algorithms is its favorable local convergence behavior. In order to analyze this local convergence behavior, we assume for a moment that the objective functions $f_i$ are twice Lipschitz-continuously differentiable in a neighborhood of a local minimizer $x^*$ of (6.1). This has the advantage that the following auxiliary result can be derived, which has, in a similar version, originally been established in [22].

Lemma 6.4 Let the functions $f_i$ be twice continuously differentiable and let $\Sigma_i$ be such that the Hessian matrices of the decoupled NLPs are all positive definite,

$$\nabla^2 f_i(x_i) + \Sigma_i \succeq \sigma I, \qquad \sigma > 0,$$

for all $x$ in a local neighborhood of a minimizer $x^*$ of (6.1). Then there exist constants $\chi_1, \chi_2 < \infty$ such that the solution $y$ of the decoupled NLPs satisfies

$$\| y - x^* \| \leq \chi_1 \| x - x^* \| + \chi_2 \| \lambda - \lambda^* \|, \tag{6.19}$$

whenever $\| x - x^* \|$ and $\| \lambda - \lambda^* \|$ are sufficiently small.

Proof By writing out the first-order necessary optimality conditions for both the decoupled NLPs and the original coupled optimization problem (6.1), we find the equations

$$0 = \nabla f_i(y_i) + A_i^T \lambda + \Sigma_i (y_i - x_i) \tag{6.20}$$
$$0 = \nabla f_i(x_i^*) + A_i^T \lambda^*. \tag{6.21}$$

Subtracting the second from the first equation and rearranging terms yields

$$\nabla f_i(x_i^*) - \nabla f_i(y_i) + \Sigma_i (x_i - x_i^*) = A_i^T (\lambda - \lambda^*) + \Sigma_i (y_i - x_i^*). \tag{6.22}$$
Next, we use the fact that the Hessian matrix $\nabla^2 f_i(x_i) + \Sigma_i$ is positive definite in a neighborhood of $x_i^*$, which means that the inequality

$$\left\| \nabla f_i(x_i^*) - \nabla f_i(y_i) + \Sigma_i (x_i^* - y_i) \right\| \geq \sigma \| x_i^* - y_i \|$$

holds in this neighborhood. Thus, if we take the norm on both sides of (6.22) and substitute the above inequality, we find the inequality

$$\| y - x^* \| \leq \frac{\| A \|}{\sigma} \| \lambda - \lambda^* \| + \frac{\| \Sigma \|}{\sigma} \| x - x^* \|.$$

This inequality implies that the statement of the lemma holds with $\chi_1 = \frac{\| \Sigma \|}{\sigma}$ and $\chi_2 = \frac{\| A \|}{\sigma}$.
Theorem 6.5 Let the functions $f_i$ be twice Lipschitz-continuously differentiable and let $x^*$ be a (local) minimizer of (6.1) at which the conditions of Lemma 6.4 are satisfied. If the LICQ condition for NLP (6.1) is satisfied and if we set $H_i = \nabla^2 f_i(x_i)$ in Step 2 of Algorithm 1, then this algorithm converges at a locally quadratic rate; that is, there exists a constant $\omega < \infty$ such that we have

$$\| x^+ - x^* \| + \| \lambda^+ - \lambda^* \| \leq \omega \left( \| x - x^* \| + \| \lambda - \lambda^* \| \right)^2,$$

if $\| x - x^* \|$ and $\| \lambda - \lambda^* \|$ are sufficiently small.

Proof Notice that the statement of this theorem has been established in a very similar version (and under more general conditions) in [22]. Therefore, we only briefly recall the two main steps of the proof. In the first step, we analyze the decoupled NLPs in Step 1 of Algorithm 1. Lemma 6.4 states that there exist constants $\chi_1, \chi_2 < \infty$ such that the solution of the decoupled NLPs satisfies

$$\| y - x^* \| \leq \chi_1 \| x - x^* \| + \chi_2 \| \lambda - \lambda^* \| \tag{6.23}$$

for all $x, \lambda$ in a neighborhood of $(x^*, \lambda^*)$. Thus, it remains to analyze the coupled QP, which can be written in the form

$$\min_{\Delta y} \; \sum_{i=1}^{N} \frac{1}{2} \Delta y_i^T \nabla^2 f_i(x_i) \Delta y_i + \nabla f_i(y_i)^T \Delta y_i \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i (y_i + \Delta y_i) = b \;\; \mid \; \lambda^+,$$

since we may substitute $g_i = \nabla f_i(y_i)$ and $H_i = \nabla^2 f_i(x_i)$. Next, since the functions $\nabla^2 f_i$ are Lipschitz continuous, we can apply a result from standard SQP theory [27] to show that there exists a constant $\chi_3 < \infty$ such that
$$\| x^+ - x^* \| \leq \chi_3 \| y - x^* \|^2 \qquad \text{and} \qquad \| \lambda^+ - \lambda^* \| \leq \chi_3 \| y - x^* \|^2. \tag{6.24}$$

The statement of the theorem now follows by combining (6.23) and (6.24).
Remark 6.6 Recall that for the inequality constrained case, which has been discussed in Sect. 6.3.3, the functions $f_i$ are in general not twice differentiable and, as such, the above results are not directly applicable. However, under the additional assumption that $(x^*, z^*)$ is a regular KKT point of (6.4) (that is, the LICQ condition, the second-order sufficient optimality condition, and the strict complementarity condition are satisfied at this point), $f_i$ is still twice continuously differentiable in a neighborhood of $x^*$ (see [27] for details). Thus, under such a regularity assumption the statement of Lemma 6.4 can be rescued. The statement of Theorem 6.5 can be generalized for this case, too: if the Hessian matrix $H_i$ is chosen as in (6.14) and if $\frac{1}{\mu_{i,j}} = O(\| x_i - y_i \|)$, then we can still establish locally quadratic convergence of Algorithm 2. This follows simply from the fact that the particular construction of $H_i$ in (6.14) ensures that solving the coupled QP is locally equivalent to an (inexact) SQP step. Notice that a more formal proof of this result for the case with inequalities can be found in [22].
6.4.2 Global Convergence Results

This section discusses conditions under which the iterates $y$ of Algorithm 2 converge globally to the set of minimizers of (6.1). Such global convergence conditions can be established for the special case that the matrices $\Sigma_i = H_i$ are constant and under the additional assumption that the functions $f_i$ are closed, proper, and convex. Although these assumptions are restrictive, the result is nevertheless relevant, since it implies that ALADIN converges for strictly convex optimization problems from any starting point and without needing a line search or other globalization routines. Moreover, the condition that we must set $\Sigma_i = H_i$ and that these matrices are constant can later be relaxed, but it simplifies our analysis, because under these assumptions the following statement holds.

Lemma 6.7 If we set $\Sigma_i = H_i \succ 0$, then the solutions $x^+ = y + \Delta y$ and $\lambda^+$ of the coupled QP in Algorithms 1 and 2 satisfy

$$M \lambda^+ = M \lambda + 2r \tag{6.25}$$
$$x_i^+ = 2 y_i - x_i - H_i^{-1} A_i^T (\lambda^+ - \lambda), \tag{6.26}$$

with $r = \sum_{i=1}^{N} A_i y_i - b$.

Proof We start with Eq. (6.17), which can be written in the form

$$M \lambda^+ = R = \sum_{i=1}^{N} A_i \left[ y_i - H_i^{-1} g_i \right] - b. \tag{6.27}$$
Next, because we have $\Sigma_i = H_i$, we substitute the expression for the gradient, $g_i = \Sigma_i (x_i - y_i) - A_i^T \lambda = H_i (x_i - y_i) - A_i^T \lambda$, which yields

$$M \lambda^+ = \sum_{i=1}^{N} A_i \left[ y_i - (x_i - y_i) + H_i^{-1} A_i^T \lambda \right] - b \tag{6.28}$$
$$\qquad = M \lambda + 2 \sum_{i=1}^{N} A_i y_i - 2b = M \lambda + 2r, \tag{6.29}$$

where the second step uses the feasibility of the previous QP iterate, $\sum_{i=1}^{N} A_i x_i = b$.
This corresponds to the first equation of the lemma. The second equation follows after substituting the above explicit expressions for $g_i$ and $\lambda^+$ in (6.18).

The main technical idea for establishing global convergence results for ALADIN is to introduce the function

$$\mathcal{L}(x, \lambda) = \| \lambda - \lambda^* \|_M^2 + \sum_{i=1}^{N} \| x_i - x_i^* \|_{H_i}^2, \tag{6.30}$$

recalling that $M = \sum_{i=1}^{N} A_i H_i^{-1} A_i^T$ denotes the dual Hessian matrix of the coupled QP. Here, $x^*$ denotes a primal and $\lambda^*$ a dual solution of (6.1).

Definition 6.8 In the following, we use the symbol $\mathcal{K}$ to denote the set of continuous and monotonously increasing functions $\alpha : \mathbb{R}_+ \to \mathbb{R}$ that satisfy $\alpha(x) > 0$ for all $x > 0$ and $\alpha(0) = 0$.

An important descent property of $\mathcal{L}$ along the iterates of Algorithms 1 and 2 is established next.

Theorem 6.9 Let the functions $f_i$ be closed, proper, and strictly convex, let Problem (6.1) be feasible and such that strong duality holds, and let the matrices $H_i = \Sigma_i$ be symmetric and positive definite. If $x^* = y^*$ denotes the primal and $\lambda^*$ a (not necessarily unique) dual solution of (6.1), then the iterates of Algorithm 2 satisfy

$$\mathcal{L}(x^+, \lambda^+) \leq \mathcal{L}(x, \lambda) - \alpha\left( \| y - y^* \| \right) \tag{6.31}$$

for a function $\alpha \in \mathcal{K}$.

Proof In Lemma 6.7, we have shown that the optimality conditions for the coupled QPs in Algorithms 1 and 2 can be written in the form

$$M \lambda^+ = M \lambda + 2r \tag{6.32}$$
$$x_i^+ = 2 y_i - x_i - H_i^{-1} A_i^T (\lambda^+ - \lambda), \tag{6.33}$$
where $r = \sum_{i=1}^{N} A_i y_i - b$ denotes the constraint residuum. Next, the optimality condition for the decoupled NLPs can be written in the form

$$0 \in \partial f_i(y_i) + A_i^T \lambda + H_i (y_i - x_i) \overset{(6.33)}{=} \partial f_i(y_i) + A_i^T \lambda^+ + H_i (x_i^+ - y_i),$$

recalling that $\partial f_i(y_i)$ denotes the subdifferential of $f_i$ at $y_i$. Since the functions $f_i$ are assumed to be convex, $y_i$ is a minimizer of the auxiliary function

$$\tilde{f}_i(\xi) = f_i(\xi) + \left( A_i^T \lambda^+ + H_i (x_i^+ - y_i) \right)^T \xi.$$

Moreover, since we additionally assume that $f_i$ is strictly convex, the function $\tilde{f}_i$ inherits this property and we find

$$\tilde{f}_i(y_i) \leq \tilde{f}_i(y_i^*) - \tilde{\alpha}\left( \| y_i - y_i^* \| \right)$$

for a function $\tilde{\alpha} \in \mathcal{K}$. Summing the above inequalities over all $i$, substituting the definition of $\tilde{f}_i$, and rearranging terms yields

$$\sum_{i=1}^{N} f_i(y_i) - f_i(y_i^*) \leq \sum_{i=1}^{N} \left( A_i^T \lambda^+ + H_i (x_i^+ - y_i) \right)^T (y_i^* - y_i) - \tilde{\alpha}\left( \| y_i - y_i^* \| \right)$$
$$= -r^T \lambda^+ + \sum_{i=1}^{N} (x_i^+ - y_i)^T H_i (y_i^* - y_i) - \tilde{\alpha}\left( \| y_i - y_i^* \| \right). \tag{6.34}$$

Here, the second equality holds due to the primal feasibility of $y^*$; that is, $b = A y^*$. In order to be able to proceed, we need a lower bound on the left-hand expression of the above inequality. Fortunately, we can use Lagrangian duality to construct such a lower bound. Here, the main idea is to introduce the auxiliary function

$$G(\xi) = \sum_{i=1}^{N} f_i(\xi_i) + \left( \sum_{i=1}^{N} A_i \xi_i - b \right)^T \lambda^*.$$

Since $y^*$ is a minimizer of the (strictly convex) function $G$, we find that

$$G(y^*) \leq G(y) - \hat{\alpha}\left( \| y - y^* \| \right) \tag{6.35}$$

for a function $\hat{\alpha} \in \mathcal{K}$. This inequality can be written in the equivalent form

$$-r^T \lambda^* + \hat{\alpha}\left( \| y - y^* \| \right) \leq \sum_{i=1}^{N} f_i(y_i) - f_i(y_i^*). \tag{6.36}$$
By combining (6.34) and (6.36) and sorting terms, it follows that

$$-\alpha\left( \| y - y^* \| \right) \geq r^T (\lambda^+ - \lambda^*) + \sum_{i=1}^{N} (y_i - y_i^*)^T H_i (x_i^+ - y_i)$$
$$\overset{(6.32)}{=} \frac{1}{2} (\lambda^+ - \lambda)^T M (\lambda^+ - \lambda^*) + \sum_{i=1}^{N} (y_i - x_i^*)^T H_i (x_i^+ - y_i),$$

with $\alpha = \tilde{\alpha} + \hat{\alpha}$. The sum on the right-hand side of the above equation can be written in the form

$$\sum_{i=1}^{N} (y_i - x_i^*)^T H_i (x_i^+ - y_i)$$
$$\overset{(6.26)}{=} \frac{1}{4} \sum_{i=1}^{N} \left( x_i^+ + x_i - 2 x_i^* + H_i^{-1} A_i^T (\lambda^+ - \lambda) \right)^T H_i \left( x_i^+ - x_i - H_i^{-1} A_i^T (\lambda^+ - \lambda) \right)$$
$$= -\frac{1}{4} (\lambda^+ - \lambda)^T M (\lambda^+ - \lambda) + \frac{1}{4} \sum_{i=1}^{N} \| x_i^+ - x_i^* \|_{H_i}^2 - \frac{1}{4} \sum_{i=1}^{N} \| x_i - x_i^* \|_{H_i}^2.$$

By substituting this expression back into the former inequality, it turns out that

$$-\alpha\left( \| y - y^* \| \right) \geq \frac{1}{2} (\lambda^+ - \lambda)^T M (\lambda^+ - \lambda^*) - \frac{1}{4} (\lambda^+ - \lambda)^T M (\lambda^+ - \lambda) + \frac{1}{4} \sum_{i=1}^{N} \| x_i^+ - x_i^* \|_{H_i}^2 - \frac{1}{4} \sum_{i=1}^{N} \| x_i - x_i^* \|_{H_i}^2 = \frac{1}{4} \mathcal{L}(x^+, \lambda^+) - \frac{1}{4} \mathcal{L}(x, \lambda).$$

This inequality is equivalent to (6.31), which corresponds to the statement of the theorem.

Notice that Theorem 6.9 can be used to establish global convergence of ALADIN from any starting point if the matrices $H_i$ are kept constant during the iterations. This result is summarized in the corollary below.

Corollary 6.10 Let the functions $f_i$ be closed, proper, and strictly convex and let Problem (6.1) be feasible and such that strong duality holds. If the matrices $H_i = \Sigma_i \succ 0$ are kept constant during the iterations, then the primal iterates $y$ of Algorithm 2 converge (globally) to the unique primal solution $x^* = y^*$ of (6.1),

$$y \to y^*.$$

Proof If the matrices $H_i$ are constant, it follows from (6.31) that the function $\mathcal{L}$ is strictly monotonously decreasing whenever $y \neq y^*$. As $\mathcal{L}$ is bounded from below by $0$, the value of $\mathcal{L}(x, \lambda)$ must converge as the algorithm progresses, but this is only possible if $y$ converges to $y^*$.

The statement of Corollary 6.10 is rather general in the sense that it establishes global convergence of the primal ALADIN iterates for potentially non-differentiable but
strictly convex functions $f_i$ and without assuming that any constraint qualification holds (although the dual iterates might not converge in such a general scenario). Nevertheless, so far, we have not yet addressed the question of what happens if the functions $f_i$ are convex but not strictly convex. Before we answer this question, it should first be noted that if the functions $f_i$ are only known to be convex, the set $Y^*$ of minimizers of (6.1) is in general not a singleton. Thus, the best we can hope for in such a general case is that the ALADIN iterates $y$ converge to the set $Y^*$ rather than to a specific minimizer. The following theorem shows that such a convergence statement is indeed possible.

Theorem 6.11 Let the functions $f_i$ be closed, proper, and convex and let us assume that strong duality holds for (6.1). Let $Y^*$ denote the set of minimizers of (6.1). If the matrices $H_i \succ 0$ are kept constant during all iterations, then the iterates $y$ of Algorithm 1 converge (globally) to $Y^*$,

$$\min_{z \in Y^*} \| y - z \| \to 0.$$
Proof Let $\Lambda^*$ denote the set of dual solutions of (6.1). As $\Lambda^*$ is a closed convex set, we can pick a $\lambda^* \in \mathrm{relint}(\Lambda^*)$, where $\mathrm{relint}(\Lambda^*)$ denotes the relative interior of $\Lambda^*$ (if $\Lambda^*$ is a singleton, we have $\mathrm{relint}(\Lambda^*) = \Lambda^*$). Now, we define the function $\mathcal{L}$ as above, but using any $x^* = y^* \in Y^*$ and the above particular choice of $\lambda^*$ in the relative interior of $\Lambda^*$. The main idea of the proof of this theorem is to have a closer look at the auxiliary function

$$G(\xi) = \sum_{i=1}^{N} f_i(\xi_i) + \left( \sum_{i=1}^{N} A_i \xi_i - b \right)^T \lambda^*,$$

which has already been used in the proof of Theorem 6.9. Clearly, since we assume that strong duality holds, $y^*$ is a minimizer of this function, but we may have $G(y) = G(y^*)$ even if $y \neq y^*$. However, fortunately, we know that $G(y) = G(y^*)$ if and only if $y \in Y^*$, since we have strong duality and we have chosen $\lambda^*$ in the relative interior of $\Lambda^*$. Consequently, since closed, proper, and convex functions are lower semicontinuous [5], there must exist a continuous and strictly monotonously increasing function $\alpha : \mathbb{R} \to \mathbb{R}$ with $\alpha(0) = 0$ such that

$$G(y^*) \leq G(y) - \frac{1}{4} \alpha\left( \min_{z \in Y^*} \| y - z \| \right).$$

By following the same argumentation as in the proof of Theorem 6.9, we find that this implies

$$\mathcal{L}(x^+, \lambda^+) \leq \mathcal{L}(x, \lambda) - \alpha\left( \min_{z \in Y^*} \| y - z \| \right). \tag{6.37}$$
The proof of the theorem now follows by an argumentation analogous to that of Corollary 6.10.

Notice that the convergence statements of Corollary 6.10 and Theorem 6.11 only prove that the iterates $y$ of Algorithm 1 converge to the set $Y^*$ of minimizers of Problem (6.1); no statement is made about the convergence of the iterates $x$ and $\lambda$. However, if additional regularity assumptions are introduced, the iterates $x$ and $\lambda$ also converge, as expected.

Lemma 6.12 If the conditions of Theorem 6.11 are satisfied and if the functions $f_i$ are continuously differentiable, then we have both $x \to y$ and, as a consequence,

$$\min_{z \in Y^*} \| x - z \| \to 0.$$

Moreover, if the LICQ condition for (6.1) holds, the dual iterates also converge, $\lambda \to \lambda^*$.

Proof As we assume that the functions $f_i$ are continuously differentiable, we have

$$\sum_{i=1}^{N} \left\| \nabla f_i(y_i) + A_i^T \lambda^* \right\|_{H_i^{-1}}^2 \to 0,$$

since $\nabla f_i$ is continuous and $y \to y^*$. By writing the left-hand expression in the form

$$\sum_{i=1}^{N} \left\| \nabla f_i(y_i) + A_i^T \lambda^* \right\|_{H_i^{-1}}^2 = \sum_{i=1}^{N} \left\| H_i (x_i - y_i) - A_i^T (\lambda - \lambda^*) \right\|_{H_i^{-1}}^2$$
$$= (\lambda - \lambda^*)^T M (\lambda - \lambda^*) + \sum_{i=1}^{N} \| x_i - y_i \|_{H_i}^2 + 2 r^T (\lambda - \lambda^*),$$

and noting that $y \to y^*$ implies $r \to 0$, we consequently find

$$(\lambda - \lambda^*)^T M (\lambda - \lambda^*) + \sum_{i=1}^{N} \| x_i - y_i \|_{H_i}^2 \to 0,$$

which implies $x - y \to 0$. If LICQ holds, the dual Hessian matrix $M$ is invertible and we also have $\lambda \to \lambda^*$.

Remark 6.13 The assumption that the functions $f_i$ are continuously differentiable is rather restrictive for practical applications. However, this regularity condition can be relaxed by generalizing the proof of Lemma 6.12. For example, if $f_i$ can be written in the form

$$f_i(x_i) = \tilde{f}_i(x_i) + \begin{cases} 0 & \text{if } \tilde{h}_i(x_i) \leq 0 \\ \infty & \text{otherwise} \end{cases}$$
with closed, proper, convex, and differentiable functions $\tilde{f}_i : \mathbb{R}^n \to \mathbb{R}$ as well as $\tilde{h}_i : \mathbb{R}^n \to \mathbb{R}^{n_h}$, and if $(x^*, \lambda^*)$ is a regular KKT point of (6.1) as defined in [27], we can still show that $x \to y$ and $\lambda \to \lambda^*$. The proof of this generalization of Lemma 6.12 is technical but straightforward, and therefore not further elaborated in the current chapter.
6.5 Numerical Implementation and Examples

In this section, we present one explicitly worked out tutorial example, which illustrates the convergence properties of ALADIN, as well as an application of distributed optimization that arises in the context of model predictive control.
6.5.1 Tutorial Example

In order to understand the local and global convergence properties of Algorithm 2, let us consider the tutorial optimization problem

$$\min_{x} \; \frac{1}{2} q_1 x_1^2 + \frac{1}{2} q_2 (x_2 - 1)^2 \quad \text{s.t.} \quad x_1 - x_2 = 0 \;\; \mid \; \lambda \tag{6.38}$$

with $q_1, q_2 \geq 0$ and $q_1 + q_2 > 0$. The explicit solution of this problem is given by

$$z^* = x_1^* = x_2^* = \frac{q_2}{q_2 + q_1} \qquad \text{and} \qquad \lambda^* = -\frac{q_2 q_1}{q_2 + q_1}.$$

The initialization of the primal variable is set to $x_1 = x_2 = z$. Thus, if we choose $H_1 = H_2 = H > 0$, the subproblems from Step 1 of Algorithm 2 can be written in the form

$$\min_{y_1} \; \frac{1}{2} q_1 y_1^2 + \lambda y_1 + \frac{H}{2} (y_1 - z)^2 \qquad \text{and} \qquad \min_{y_2} \; \frac{1}{2} q_2 (y_2 - 1)^2 - \lambda y_2 + \frac{H}{2} (y_2 - z)^2.$$
Hz − λ H z + λ + q2 and y2 = . q1 + H q2 + H
Next, we work out the solution of the QP in Step 3 (z + , λ+ ), with z + = x1+ = x2+ , which yields
+ λ − λ∗ λ − λ∗ = C z+ − z∗ z − z∗
6 Distributed Optimization and Control with ALADIN
157
Fig. 6.2 The absolute value of the maximum eigenvalue, |eig(C)|, of the contraction matrix C versus the scaling H ∈ [10−3 , 103 ] for q1 = 0.1 and q2 = 10. Notice that we have |eig(C)| < 1 for all H > 0, which implies that ALADIN converges for all choices of H . However, the method contracts faster if H ≈ q1 or if H ≈ q2 , as the eigenvalues of C are close to 0 in these cases
with
1 C= (q1 + H )(q2 + H )
q1 q2 − H 2 H 2 (q2 − q1 ) q1 − q2 H 2 − q1 q2
.
In this form, it becomes clear that the convergence rate of the ALADIN iterates depends on the magnitude of the eigenvalues of the matrix $C$, which are given by

$$\mathrm{eig}(C) = \pm \sqrt{ \frac{(q_1 - H)(q_2 - H)}{(q_1 + H)(q_2 + H)} }.$$

Notice that for any $H > 0$ these eigenvalues are contained in the open unit disc. This implies that ALADIN converges independently of how we choose $H$. However, the above eigenvalues depend on the term $\frac{q_1 - H}{q_1 + H} \in (-1, 1)$, which can be interpreted as the relative Hessian approximation error of the first objective function. Similarly, $\frac{q_2 - H}{q_2 + H} \in (-1, 1)$ is the relative error associated with the second objective function. Thus, the closer $H$ approximates $q_1$ or $q_2$, the better the convergence rate becomes. This convergence behavior is also visualized in Fig. 6.2.
6.5.2 Model Predictive Control

This section applies ALADIN in the context of model predictive control (MPC). We consider an optimal control problem in discrete-time form,

$$\min_{\xi, \nu} \; \sum_{i=0}^{N-1} \ell(\xi_i, \nu_i) + T(\xi_N) \quad \text{s.t.} \quad \begin{cases} \forall i \in \{0, \ldots, N-1\}, \\ \xi_{i+1} = \mathbf{A} \xi_i + \mathbf{B} \nu_i, \\ \xi_i \in \mathbb{X}, \;\; \nu_i \in \mathbb{U}, \;\; \xi_N \in \mathbb{X}_N. \end{cases} \tag{6.39}$$
The main idea of MPC is to solve (6.39) iteratively at every sampling time based on the current initial state or measurement $\xi_0$. Here, $\xi$ and $\nu$ denote the state trajectory and the control inputs, respectively. The stage cost is denoted by $\ell$ and $T$ denotes the terminal cost. The matrices $\mathbf{A}$ and $\mathbf{B}$ are assumed to be given. Moreover, $\mathbb{X}$ and $\mathbb{U}$ denote the state and control constraint sets, and $\mathbb{X}_N$ denotes a terminal set.
6.5.2.1 Standard Form

The above optimization problem can be written in the form of the distributed optimization problem (6.1). To this end, we introduce the optimization variables $x_0 = \nu_0$, $x_i = [\xi_i^T, \nu_i^T]^T$, and $x_N = \xi_N$. The associated objective functions are given by

$$f_i(x_i) = \begin{cases} \ell(\xi_i, \nu_i) & \text{if } (\xi_i, \nu_i) \in \mathbb{X} \times \mathbb{U} \\ \infty & \text{otherwise} \end{cases} \qquad \text{and} \qquad f_N(x_N) = \begin{cases} T(\xi_N) & \text{if } \xi_N \in \mathbb{X}_N \\ \infty & \text{otherwise.} \end{cases}$$

Last but not least, we introduce the matrices

$$A_0 = \begin{pmatrix} \mathbf{B} \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad A_1 = \begin{pmatrix} -I & 0 \\ \mathbf{A} & \mathbf{B} \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 0 \\ -I & 0 \\ \mathbf{A} & \mathbf{B} \\ \vdots & \vdots \\ 0 & 0 \end{pmatrix}, \; \ldots, \; A_N = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ -I \end{pmatrix}$$

in order to represent the dynamics in the form of an affine coupling constraint. Notice that this optimization problem has $6N$ primal and $4(N+1)$ dual variables.
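The block structure of the matrices $A_i$ is straightforward to generate programmatically. The following sketch is our reading of the construction above for generic system matrices and horizon $N$; the function names and the right-hand side convention $b = (-\mathbf{A}\xi_0, 0, \ldots, 0)$ are illustrative and should be checked against the conventions of a concrete implementation.

```python
import numpy as np

# Coupling matrices A_0, ..., A_N of the MPC standard form for the variable
# ordering x_0 = nu_0, x_i = (xi_i, nu_i) for 0 < i < N, and x_N = xi_N.
# The dynamics xi_{i+1} = A xi_i + B nu_i become sum_i A_i x_i = b.
def standard_form_matrices(A, B, N):
    nx, nu = B.shape
    A0 = np.zeros((N * nx, nu)); A0[:nx, :] = B
    blocks = [A0]
    for i in range(1, N):
        Ai = np.zeros((N * nx, nx + nu))
        Ai[(i - 1) * nx:i * nx, :nx] = -np.eye(nx)   # -xi_i in row block i-1
        Ai[i * nx:(i + 1) * nx, :nx] = A             # A xi_i in row block i
        Ai[i * nx:(i + 1) * nx, nx:] = B             # B nu_i in row block i
        blocks.append(Ai)
    AN = np.zeros((N * nx, nx)); AN[-nx:, :] = -np.eye(nx)
    blocks.append(AN)
    return blocks

def standard_form_rhs(A, xi0, N):
    b = np.zeros(N * A.shape[0]); b[:A.shape[0]] = -A @ xi0
    return b

# Sanity check on random data: a simulated trajectory satisfies the coupling.
nx, nu, N = 4, 2, 5
A_sys, B_sys = np.random.randn(nx, nx), np.random.randn(nx, nu)
xi0, nus = np.random.randn(nx), [np.random.randn(nu) for _ in range(N)]
xis = [xi0]
for k in range(N):
    xis.append(A_sys @ xis[-1] + B_sys @ nus[k])
x = [nus[0]] + [np.concatenate([xis[i], nus[i]]) for i in range(1, N)] + [xis[N]]
Ai = standard_form_matrices(A_sys, B_sys, N)
res = sum(Ai[i] @ x[i] for i in range(N + 1)) - standard_form_rhs(A_sys, xi0, N)
assert np.allclose(res, 0.0)
```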
6.5.2.2 Parameters

In order to set up a benchmark case study with 4 states and 2 controls, we use the system matrices

$$\mathbf{A} = \begin{pmatrix} 0.999 & -3.008 & -0.113 & -1.608 \\ 0.000 & 0.986 & 0.048 & -0.029 \\ 0.000 & 2.083 & 1.010 & -0.868 \\ 0.000 & 0.053 & 0.050 & 1.000 \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} -0.080 & -0.635 \\ 0.000 & -0.014 \\ 0.000 & -0.092 \\ -0.022 & -0.002 \end{pmatrix},$$

the quadratic stage cost

$$\ell(\xi, \nu) = \xi^T \begin{pmatrix} 0.1 & 0 & 0 & 0 \\ 0 & 100 & 0 & 0 \\ 0 & 0 & 0.1 & 0 \\ 0 & 0 & 0 & 100 \end{pmatrix} \xi + \nu^T \begin{pmatrix} 10 & 0 \\ 0 & 10 \end{pmatrix} \nu,$$

the locally exact terminal cost
$$T(\xi) = \xi^T \begin{pmatrix} 1.42 & -26.08 & -0.96 & 10.33 \\ -26.08 & 1462.89 & 53.93 & -776.41 \\ -0.96 & 53.93 & 10.25 & 36.37 \\ 10.33 & -776.41 & 36.37 & 1291.95 \end{pmatrix} \xi,$$

and the constraint sets

$$\mathbb{X} = \left\{ \xi \in \mathbb{R}^4 \;\middle|\; \begin{pmatrix} -0.5 \\ -10 \end{pmatrix} \leq \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \xi \leq \begin{pmatrix} 0.5 \\ 10 \end{pmatrix} \right\}, \qquad \mathbb{U} = \left\{ \nu \in \mathbb{R}^2 \;\middle|\; \begin{pmatrix} -25 \\ -25 \end{pmatrix} \leq \nu \leq \begin{pmatrix} 25 \\ 25 \end{pmatrix} \right\}.$$

The terminal constraint is set to $\mathbb{X}_N = \mathbb{X}$ and we use the initial state $\xi_0 = (10, 0, 10, 10)^T$.
6.5.2.3 Numerical Results

In this section, we solve the above MPC problem with ADMM [4] as well as with the presented ALADIN method. Figure 6.3 shows the convergence of two variants of ALADIN versus traditional ADMM for the first MPC iteration. Here, the first variant of ALADIN uses a constant Hessian matrix while the other one updates $H_i$ during the iterations. Notice that both ADMM and ALADIN with a constant Hessian exhibit a linear convergence rate. This is in contrast to the other variant of ALADIN, where we use exact Hessians in order to achieve a locally quadratic convergence rate, as discussed in Theorem 6.5. Figure 6.4 shows the closed-loop state trajectory that has been obtained by running the proposed algorithm with a fixed number of iterations per sampling time. Here, we simulate 200 time steps with sampling time $0.05$ using 3 or 10 ALADIN iterations per sampling time, respectively. The corresponding optimal closed-loop trajectory is shown as a reference in the form of the black solid line.
Fig. 6.3 Logarithm of the distance of the current iterates to the optimal solution for ALADIN with exact Hessian (red circles), ALADIN with fixed Hessians (black triangles), and ADMM (blue rectangles)
Fig. 6.4 First two components of the closed-loop state trajectory for 3 ALADIN iterations per sampling time (red dotted line) and 10 ALADIN iterations per sampling time (blue dashed line). The black solid line depicts the optimal closed-loop trajectory
6.6 Conclusions

This chapter has presented a light introduction to a relatively new distributed optimization algorithm, the augmented Lagrangian based alternating direction inexact Newton (ALADIN) method. This algorithm can be used to solve both convex and non-convex optimization problems while exploiting separable structures. Here, one major advantage over traditional ADMM methods is that one can update the
Hessian matrix during the iterations such that a locally quadratic convergence rate can be observed. Moreover, we have discussed how global convergence guarantees can be derived for ALADIN under the assumption that the objective functions are strictly convex. In the second part of this chapter, we have additionally illustrated the numerical performance of ALADIN compared to ADMM, as well as its use in real-time model predictive control.

During the past years, a number of articles on applications of ALADIN to convex and non-convex distributed optimization have appeared [10, 11, 22–24, 28, 34], all of which report promising numerical performance for a large variety of applications. Nevertheless, as we have also discussed in this chapter, there are still a number of pressing open problems in distributed optimization. For example, it is not clear at the current status of research whether one can still ensure global convergence of ALADIN for convex problems if the Hessian matrices are updated during the iterations. Similarly, it is not clear at all how to update these Hessian matrices in the presence of active set changes in the distributed solvers. Last but not least, although one option for designing a globalization routine for ALADIN with application to non-convex problems has been presented in [22], much more research will be needed to make these globalization routines robust enough to perform well on larger benchmark case studies for non-convex optimization. Thus, as much as this chapter has provided an introduction to ALADIN, it is also intended to serve as an invitation to the optimization and control community to join this promising direction of distributed optimization method development.
References

1. Andreani, R., Birgin, E.G., Martinez, J.M., Schuverdt, M.L.: On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18, 1286–1309 (2007)
2. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific (1999)
3. Boggs, P.T., Tolle, J.W.: Sequential quadratic programming. Acta Numerica 4, 1–51 (1995)
4. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2011)
5. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
6. Chen, G., Teboulle, M.: A proximal-based decomposition method for convex minimization problems. Math. Program. 64, 81–101 (1994)
7. Conn, A.R., Gould, G.I.M., Toint, P.L.: LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A), vol. 17. Springer Science & Business Media (2013)
8. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
9. Eckstein, J., Ferris, M.C.: Operator-splitting methods for monotone affine variational inequalities, with a parallel application to optimal control. INFORMS J. Comput. 10, 218–235 (1998)
10. Engelmann, A., Jiang, Y., Mühlpfordt, T., Houska, B., Faulwasser, T.: Towards distributed OPF using ALADIN. IEEE Trans. Power Syst. 34(1), 584–594 (2019)
11. Engelmann, A., Mühlpfordt, T., Jiang, Y., Houska, B., Faulwasser, T.: Distributed stochastic AC optimal power flow based on polynomial chaos expansion. In: Proceedings of the American Control Conference, pp. 6188–6193 (2018)
12. Everett, H.: Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Oper. Res. 11, 399–417 (1963)
13. Ferreau, H.J., Kozma, A., Diehl, M.: A parallel active-set strategy to solve sparse parametric quadratic programs arising in MPC. In: Proceedings of the 4th IFAC Nonlinear Model Predictive Control Conference (2012)
14. Frasch, J.V., Sager, S., Diehl, M.: A parallel quadratic programming method for dynamic optimization problems. Math. Program. Comput. 7(3), 289–329 (2015)
15. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximations. Comput. Math. Appl. 2, 17–40 (1976)
16. Glowinski, R., Marrocco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Revue Française d'Automatique, Informatique, et Recherche Opérationnelle 9, 41–76 (1975)
17. Gould, N.I.M., Orban, D., Toint, P.: GALAHAD, a library of thread-safe Fortran 90 packages for large-scale nonlinear optimization. ACM Trans. Math. Softw. 29(4), 353–372 (2004)
18. Hamdi, A., Mishra, S.K.: Decomposition methods based on augmented Lagrangian: a survey. In: Mishra, S.K. (ed.) Topics in Nonconvex Optimization, Chapter 11, pp. 175–204 (2011)
19. Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 302–320 (1969)
20. Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1), 165–199 (2017)
21. Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)
22. Houska, B., Frasch, J., Diehl, M.: An augmented Lagrangian based algorithm for distributed non-convex optimization. SIAM J. Optim. 26(2), 1101–1127 (2016)
23. Jiang, Y., Oravec, J., Houska, B., Kvasnica, M.: Parallel explicit model predictive control. arXiv preprint arXiv:1903.06790 (2019)
24. Kouzoupis, D., Quirynen, R., Houska, B., Diehl, M.: A block based ALADIN scheme for highly parallelizable direct optimal control. In: Proceedings of the American Control Conference, pp. 1124–1129 (2016)
25. Kozma, A., Frasch, J., Diehl, M.: A distributed method for convex quadratic programming problems arising in optimal control of distributed systems. In: Proceedings of the 52nd IEEE Conference on Decision and Control, pp. 1526–1531 (2013)
26. Necoara, I., Doan, D., Suykens, J.A.K.: Application of the proximal center decomposition method to distributed model predictive control. In: Proceedings of the 47th IEEE Conference on Decision and Control, pp. 2900–2905 (2008)
27. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer (2006)
28. Oravec, J., Jiang, Y., Houska, B., Kvasnica, M.: Parallel explicit MPC for hardware with limited memory. In: Proceedings of the 20th IFAC World Congress, pp. 3356–3361 (2017)
29. Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization. Academic Press (1969)
30. Powell, M.J.D.: A fast algorithm for nonlinearly constrained optimization calculations. In: Numerical Analysis Dundee. Springer, Berlin (1977)
31. Powell, M.J.D.: The convergence of variable metric methods for nonlinearly constrained optimization calculations. In: Nonlinear Programming 3. Academic Press, New York and London (1978)
32. Rantzer, A.: Dynamic dual decomposition for distributed control. In: Proceedings of the 2009 American Control Conference, pp. 884–888 (2009)
33. Richter, S., Morari, M., Jones, C.N.: Towards computational complexity certification for constrained MPC based on Lagrange relaxation and the fast gradient method. In: Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC) (2011)
34. Shi, J., Zheng, Y., Jiang, Y., Zanon, M., Hult, R., Houska, B.: Distributed control algorithm for vehicle coordination at traffic intersections. In: Proceedings of the 17th European Control Conference, pp. 1166–1171 (2017)
35. Tapia, R.A.: Quasi-Newton methods for equality constrained optimization: equivalence of existing methods and a new implementation. In: Nonlinear Programming 3, pp. 125–164. Elsevier (1978)
36. Toint, P.: On sparse and symmetric matrix updating subject to a linear equation. Math. Comput. 31(140), 954–961 (1977)
37. Wilson, R.B.: A simplicial algorithm for concave programming. Ph.D. thesis, Graduate School of Business Administration, Harvard University (1963)
Chapter 7
Model Predictive Control for the Internet of Things

B. Karg and S. Lucia
Abstract In this chapter, we argue that model predictive control (MPC) can be a very powerful technique to mitigate some of the challenges that arise when designing and deploying control algorithms in the context of the internet of things. The development of new low-power communication technologies and the widespread availability of sensing and computing capabilities, which are characteristic of the internet of things, enable the consideration of a large number of interconnections and feedback loops. However, this also introduces important challenges such as the very limited communication capabilities of low-power wide area networks or the limited computational resources of low-cost micro-controllers. We argue that, as a predictive control scheme, MPC is a powerful technique to deal with limited communication capabilities and can be naturally extended to the context of distributed control, for cases where all the sensing information cannot be centralized. We also present an approach to approximate the solution of MPC problems using deep neural networks, rendering the embedded implementation of complex controllers possible even on very limited hardware. The ideas are illustrated with an example of distributed model predictive control of a smart building.
7.1 Introduction

The advent of new communication technologies for the internet of things (IoT) and the ubiquity of sensors and micro-controllers drive the spread of wireless sensor networks and interconnected devices. These new developments also raise new challenges in the field of control engineering. On the one hand, some of the new wireless technologies like low-power wide area networks (LPWANs) offer very restricted communication. On the other hand, the limited computational capabilities of the available hardware pose an important challenge for the application of advanced and
highly complex control algorithms that are necessary to achieve the desired high-performance operation.

An interconnected system consists of various subsystems which can be coupled via communication links, physical couplings, or both. The exchange of information using wireless technologies has been an important subject of study in the field of control [55]. Recently, new communication technologies such as Sigfox [12], LoRa [5, 12], and NB-IoT [21] have gained increasing popularity in the context of the internet of things due to their low energy requirements and long range. For example, for LoRa a world record distance of 703 km was set using as little as 25 mW [70]. These technologies, classified as LPWANs, enable the cheap installation of battery-powered micro-controllers and sensors and a high degree of flexibility. One main characteristic of LPWANs is their strongly limited communication, including a defined number of events per day to receive data (downlink) and to send data (uplink), where each communication event has a restricted data volume. For systems where external factors and coupling effects have a major impact, communication can be crucial, and its limitation is a major challenge for the deployment of control systems based on low-power wide area networks. Such communication restrictions necessarily introduce additional uncertainties that affect the control of the interconnected system. Using robust predictive control algorithms, these uncertainties can be directly taken into account and safe operation of the subsystems can be achieved. To deal with the uncertainty that arises from the communication limitations, as well as any other model uncertainty or disturbances, robust MPC approaches such as tube-based MPC [48] or multi-stage MPC [18, 43, 65] can be considered. However, robustifying MPC can result in a computationally more demanding optimization problem, which needs to be solved at every control instant. This poses a second main challenge for MPC in the context of the internet of things, which arises because of the limited computational power and low memory storage of IoT devices such as low-cost micro-controllers.

Different approaches have been proposed in the field of MPC to tackle this downside. Some methods focus on optimization algorithms tailored to embedded devices based on first-order methods like Nesterov's fast gradient method [25, 42, 61] or the alternating direction method of multipliers (ADMM) [10]. Another family of methods tries to obtain an explicit formulation of the MPC controller [8]. The tailored algorithms can suffer from a significant computational burden, while explicit solutions require a notable amount of memory storage, which grows exponentially with the problem dimension. Methods that reduce the necessary memory via an alternative representation include the use of a different number representation [32], the elimination of redundant regions [24], or the use of a lattice representation [73]. Other approaches approximate the exact explicit solution of the MPC problem to obtain a further reduction of the memory requirements. Approximate methods include the use of fewer regions [30] or a simplicial partition [9] to describe the MPC law, as well as the use of neural networks [54] and radial basis functions [17] to approximate the explicit MPC law. In this chapter, deep learning is proposed as a method to approximate the explicit MPC solution, which is motivated by new theoretical advances on the approximation capabilities of deep neural networks [27, 50, 66].
By using deep neural networks, the computational load and the memory footprint can be significantly reduced while still obtaining near-optimal performance [13, 29, 38, 74]. This renders the deployment of complex control algorithms possible on edge devices, which makes sending sensitive information, e.g., occupancy data for climate control of buildings [49], superfluous and helps to tackle privacy and security issues for the internet of things [63, 67]. This chapter provides an overview of current methods and possibilities to tackle the following challenges in the context of model predictive control and the internet of things:

• Non-centralized MPC approaches for systems with very limited communication capacities.
• Deep learning-based MPC approaches for systems with low storage space and computational power.

The remainder of this chapter is organized as follows. In Sect. 7.2, different MPC strategies for the control of interconnected systems are presented. Sect. 7.3 presents methods to cope with some of the challenges for MPC in the context of the internet of things. It shows how communication limitations can be directly considered in predictive control algorithms and how deep neural networks can be used to approximate complex control schemes, enabling the use of advanced algorithms on computationally limited hardware. In Sect. 7.4, a building model for multi-room temperature control is presented as a case study. In Sect. 7.5, the case study is evaluated to illustrate the potential of different MPC schemes. In the final Sect. 7.6, the results of this chapter are summarized and a short outlook is given.
7.2 MPC for Interconnected Systems

7.2.1 Interconnected Systems

Interconnected systems are large networks of $n_s$ coupled subsystems. Each subsystem is either physically connected to other subsystems, connected via a communication link, or both. Additionally, external factors $d_j$ might act on each subsystem $S_j$. An example of an interconnected system consisting of six subsystems is depicted in Fig. 7.1. Subsystem $S_4$, marked blue, is coupled both physically and via communication links to subsystem $S_3$ and subsystem $S_5$. The communication links enable the application of a distributed controller for $S_4$ by computing the optimal inputs based on exchanged information such as local measurements or predicted trajectories. Subsystem $S_6$, marked green, is connected to subsystem $S_5$ only physically. This allows the implementation of a decentralized controller, for which couplings can only be considered in the design process, because there is no communication link, and therefore no new coupling information can be obtained online. We assume that each subsystem $S_j$, with $j = 1, \ldots, n_s$, can be described by the linear time-invariant dynamics
Fig. 7.1 Interconnected system consisting of six subsystems. Dotted lines indicate communication links and solid lines couplings. Arrows represent external disturbances and the orange rectangle represents the system border of the global interconnected system
$$x_j(k+1) = A_j x_j(k) + B_j u_j(k) + E_j d_j(k) + F_j z_j(k), \tag{7.1}$$
where $k$ is the time step, $x_j(k) \in \mathbb{R}^{n_j}$ are the states, $u_j(k) \in \mathbb{R}^{m_j}$ are the manipulated variables, and $d_j(k) \in \mathbb{R}^{p_j}$ are external disturbances. The variable $z_j(k) \in \mathbb{R}^{q_j}$ contains the coupling variables of the linked subsystems. For the exemplary interconnected system in Fig. 7.1, the coupling variable $z_4(k)$ includes the states and inputs from $S_3$ and $S_5$ that have an impact on the dynamics or constraints of subsystem $S_4$. The coupling variables $z_j(k)$ can be affected by the controllers of the interconnected subsystems, but the disturbances $d_j(k)$ are independent. The matrices $A_j \in \mathbb{R}^{n_j \times n_j}$, $B_j \in \mathbb{R}^{n_j \times m_j}$, $E_j \in \mathbb{R}^{n_j \times p_j}$, and $F_j \in \mathbb{R}^{n_j \times q_j}$ describe the influence of the corresponding variables on the states of subsystem $j$.
7.2.2 Centralized Versus Distributed Versus Decentralized Model Predictive Control

Three general strategies for controlling interconnected systems, differing with respect to communication and autonomy, are usually distinguished in the literature: centralized, distributed, and decentralized control. A centralized control approach views an interconnected system as one single system, as depicted in Fig. 7.1 by the orange rectangle. Since this system contains all subsystems, coupling variables no longer exist and the system equation can be reduced to

$$x(k+1) = A x(k) + B u(k) + E d(k), \tag{7.2}$$
where $x(k) = [x_1(k)^T, x_2(k)^T, \ldots, x_{n_s}(k)^T]^T$ is the concatenation of the states of all subsystems. The inputs $u(k)$, disturbances $d(k)$, input matrix $B$, and disturbance matrix $E$ include the elements of the different subsystems, and $A$ is the system matrix that describes the global system. A centralized MPC approach requires communication between all subsystems and the central controller at each sampling instant.
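As a small illustration of this reduction, the sketch below assembles the global matrices of (7.2) from two scalar subsystems of the form (7.1), where we additionally assume, purely for this example, that the coupling variables satisfy $z_j = G_j x$ for given selection matrices $G_j$.

```python
import numpy as np
from scipy.linalg import block_diag

# Two illustrative scalar subsystems (7.1) with couplings z_1 = x_2, z_2 = x_1.
A_loc = [np.array([[0.9]]), np.array([[0.8]])]
B_loc = [np.array([[1.0]]), np.array([[0.5]])]
F_loc = [np.array([[0.1]]), np.array([[0.2]])]
G = [np.array([[0.0, 1.0]]), np.array([[1.0, 0.0]])]   # z_j = G_j x (assumed)

# Folding the couplings F_j z_j into the global system matrix yields (7.2).
A_glob = block_diag(*A_loc) + np.vstack([F_loc[j] @ G[j] for j in range(2)])
B_glob = block_diag(*B_loc)
print(A_glob)   # [[0.9, 0.1], [0.2, 0.8]]
```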
Such permanent communication might not be available for interconnected subsystems with flexible topology and wireless communication. In addition, the number of subsystems can be large, which might render a centralized control approach computationally costly or even intractable. To take the characteristics of interconnected systems into account and to overcome the poor scaling of the centralized approach, two non-centralized approaches can be considered, in which an individual controller is deployed locally for each subsystem:
• Distributed MPC: The local controllers can communicate with each other to exchange information (such as sensor measurements or local input and state predictions), and a connection to the Internet might be available.
• Decentralized MPC: The local controllers have no communication links to other subsystems. This means that couplings with neighboring subsystems can only be considered in the design phase. A connection to the Internet might still be available.
Remark 7.1 In the literature, decentralized MPC and distributed MPC are often used synonymously, meaning that in both cases communication between subsystems is allowed [59, 60, 62]. In this work, we use the terms distributed and decentralized to distinguish between two different MPC approaches.
For distributed MPC, the communication links are directly considered in the control approach. By informing interconnected subsystems about the evolution of the corresponding coupling variables, the connected subsystems can take this information into account when computing their local optimal control solution. An overview of different approaches for distributed MPC is provided in [51, 64]. Methods range from negotiations between controllers [46] and plug-and-play capability [62] to the exchange of contracts [44, 45]. In the case of decentralized MPC, no communication links exist [64], and most approaches are thus based on robust MPC concepts such as min-max approaches [57] or contractive constraints [47]. Because centralized MPC has full knowledge of the global system and does not depend on predictions from other controllers, it outperforms both distributed and decentralized MPC. The use of communication links in distributed MPC to exchange information generally leads to superior performance compared to decentralized MPC, which does not require any information exchange.
7.2.2.1 How to Obtain Guarantees for Non-centralized MPC Approaches?
To derive guarantees for non-centralized MPC, several methods have been presented in the literature, which use different kinds of insights about the global system. One of the first methods was presented in [11], defining contractive stability constraints for linear systems that are able to communicate with their neighbors. This approach was extended in [47] to nonlinear systems and to the absence of communication. Using game-theoretic approaches, [26] provides stability conditions for the synthesis of local controllers based on the Nash equilibrium. Another technique to
guarantee stability is to synthesize local controllers with local (variable) terminal cost functions and terminal sets, which additionally guarantee recursive feasibility for the distributed controllers [15, 16]. In [22], consensus is guaranteed when optimizing a coupled objective function subject to input constraints; this is extended in [68] to state and input constraints, and the alternating direction method of multipliers (ADMM) [10] is proposed for fast convergence of the distributed optimization task. An alternative algorithm for fast convergence of distributed optimization problems is the augmented Lagrangian alternating direction inexact Newton (ALADIN) method [31], which was applied in [20] to an optimal power flow problem.
7.3 Challenges for Distributed MPC in the Internet of Things

Using the advanced decision-making capabilities of model predictive control in the context of the internet of things is a challenging task. This work illustrates different currently available methods to deal with two of the most important arising questions. How can non-centralized MPC help to optimize the control performance in case of extremely restricted communication? How can the limited computational power of typical IoT devices be used for advanced and computationally challenging algorithms?
7.3.1 Using Low-Power Wide Area Networks for Control

One of the main goals of this work is to illustrate that model predictive control can be leveraged to achieve high-performance control despite the very limited communication possibilities offered by low-power wide area networks. The main idea is depicted in Fig. 7.2 for a system that is influenced by external factors. Obtaining predictions of the external factors for each sampling time in the prediction horizon is necessary to achieve accurate predictions of the state trajectories for the optimization. In the upper plot, a new downlink is available at step k and six new future values of the external factor (green dots) are received, because only six numbers can be transmitted in each communication event due to the strong limitations of low-power wide area networks. For the sampling times where no external information is available, an interpolation rule can be used to compute the missing values. Since new information is not available at each sampling time due to the limited number of possible downlinks per day, the old information is used until a new downlink is available. Note that it is, therefore, desirable to send longer prediction trajectories of external factors covering the prediction horizon of all subsequent steps until a new downlink is available. Even if it is not possible to send a value for each step, the sent trajectories counteract the fact that new information will not be available until the next downlink (dashed vertical lines in Fig. 7.2).
Fig. 7.2 Illustration of two consecutive steps of model predictive control with limited communication. Receiving current and future external information from a downlink at time step k can be used throughout the subsequent steps until a new downlink is available (dashed vertical lines). Via interpolation between data points (green dots) values for each control instant can be derived. The gray rectangle depicts the receding time-window
Using this simple strategy, the predictive nature of MPC can mitigate the drawbacks of the important communication limitations typical of IoT applications.
When LPWANs are used, the resulting optimal control problem that needs to be solved at each sampling time for each interconnected subsystem S_j described by (7.1) is given by

    min_{x_j, u_j}  x_j(N_j)^T P_j x_j(N_j) + \sum_{k=0}^{N_j-1} [ x_j(k)^T Q_j x_j(k) + u_j(k)^T R_j u_j(k) ]        (7.3a)
    subject to  x_j(k + 1) = A_j x_j(k) + B_j u_j(k) + E_j d_j(k) + F_j z_j(k),        (7.3b)
                C_{x,j} x_j(k) ≤ c_{x,j},  C_{f,j} x_j(N_j) ≤ c_{f,j},        (7.3c)
                C_{u,j} u_j(k) ≤ c_{u,j},        (7.3d)
                x_j(0) = x_j^0,        (7.3e)
                ∀ k = 0, ..., N_j − 1,        (7.3f)
where N_j is the horizon and P_j ∈ R^{n_j × n_j} ⪰ 0, Q_j ∈ R^{n_j × n_j} ⪰ 0, and R_j ∈ R^{m_j × m_j} ≻ 0 are weighting matrices. The n_{cx,j} state constraints are described by the pair C_{x,j} ∈ R^{n_{cx,j} × n_j} and c_{x,j} ∈ R^{n_{cx,j}}, the n_{cu,j} input constraints by the pair C_{u,j} ∈ R^{n_{cu,j} × m_j} and c_{u,j} ∈ R^{n_{cu,j}}, and the n_{cf,j} terminal constraints by the pair C_{f,j} ∈ R^{n_{cf,j} × n_j} and c_{f,j} ∈ R^{n_{cf,j}}. A summary of MPC under very limited communication is given in Algorithm 7.1.

Algorithm 7.1 MPC with limited communication
1: For each subsystem j: obtain x_j^0, d_j(k), and z_j(k) (cf. Remark 7.2)
2: if new downlink available then
3:    Download and process (e.g., by interpolation) the new data z_j^+(k) and d_j^+(k)
4: end if
5: Simultaneously solve (7.3) for each subsystem j
6: Apply the first control input û_j^* for each subsystem j
7: k ← k + 1
8: GOTO 1
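Step 5 of Algorithm 7.1 amounts to solving the quadratic program (7.3). A minimal sketch using the CVXPY modeling language (an assumed choice of tool; any QP solver can be used) could look as follows:

    import cvxpy as cp
    import numpy as np

    def solve_ocp(A, B, E, F, Q, R, P, Cx, cx, Cu, cu, Cf, cf, x0, d, z, N):
        """Sketch of problem (7.3) for one subsystem; the matrices are assumed
        given as NumPy arrays, and d, z hold the (interpolated) forecasts for
        steps 0..N-1 as columns."""
        nx, nu = B.shape
        x = cp.Variable((nx, N + 1))
        u = cp.Variable((nu, N))
        cost = cp.quad_form(x[:, N], P)
        cons = [x[:, 0] == x0]
        for k in range(N):
            cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
            cons += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k]
                                    + E @ d[:, k] + F @ z[:, k],
                     Cx @ x[:, k] <= cx,
                     Cu @ u[:, k] <= cu]
        cons += [Cf @ x[:, N] <= cf]
        cp.Problem(cp.Minimize(cost), cons).solve()
        return u[:, 0].value  # first input, as applied in Algorithm 7.1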
Remark 7.2 The information about the disturbances d_j(k) = [d_j(k)^T, ..., d_j(k + N_j − 1)^T]^T needed to solve (7.3) can be extracted from d_j^+(k) = [d_j(k)^T, ..., d_j(k + N_j − 1 + n_down)^T]^T. The vector d_j^+(k) is obtained at every downlink, and n_down is chosen such that the control horizons are covered until a new downlink is available. The couplings z_j(k) from the neighboring subsystems can be obtained for each time step k + c by shifting the last prediction obtained at time k by c time steps and computing the c last time steps via the terminal controllers of the interconnected systems.
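A minimal sketch of the interpolation rule mentioned in Remark 7.2 and Fig. 7.2 is given below; linear interpolation with constant extrapolation beyond the last received point is an assumption, and the numeric values are purely illustrative:

    import numpy as np

    def interpolate_forecast(t_received, v_received, horizon_times):
        """Fill in forecast values for every control instant from the few
        points obtained at the last downlink; np.interp holds the last
        value constant beyond the final received point, matching the idea
        of reusing old information until a new downlink arrives."""
        return np.interp(horizon_times, t_received, v_received)

    # e.g., six values received for hours [0, 4, 8, 12, 16, 20], 12 h horizon
    d_plus = interpolate_forecast([0, 4, 8, 12, 16, 20],
                                  [16.0, 15.2, 18.4, 22.9, 23.5, 19.1],
                                  np.arange(12))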
7.3.1.1 How to Obtain Guarantees in the Presence of Imperfect Communication?
One drawback of wireless communication is its vulnerability to dropouts and delays; moreover, quantization might lead to the transmission of values that deviate noticeably from the real value. To counteract these effects, ideas from robust MPC can be exploited. For example, a tube-based approach is used in [34] to guarantee collision-free control of a fleet of vehicles in case of communication breakdown. Most approaches are based on the synthesis of controllers to obtain a network of stable controllers. By exploiting the prediction capabilities of accurate plant models, an approach that achieves high performance despite delays and data loss is presented in [53].
In [75], conditions for exponential stability in the presence of delays and packet dropouts are given for switching linear systems. A scenario in which packet loss can occur between the sensor and the controller or between the controller and the actuator, while stability is still guaranteed, is investigated in [19]. In [69], a controller synthesis that is robust with respect to dropouts and quantization effects is presented, including a guarantee of recursive feasibility. Solving optimization problems of the form (7.3) on resource-constrained platforms can be very challenging. If uncertainties need to be considered via robust MPC schemes, the computational load can be even larger. For this reason, we propose a method based on deep learning to enable an easy deployment of advanced control approaches on low-cost embedded hardware.
7.3.1.2 Deep Learning-Based Approximate MPC
Optimal control problems of the form (7.3) are parametric optimization problems that depend, in this case, on three parameters: the initial state x_j^0, the prediction of the external disturbances d_j(k), and the prediction of the coupling variables z_j(k). The optimization problem (7.3) implicitly defines the mapping between the parameters and the optimal solution, and it can be written in condensed form as

    min_{u_j}  u_j^T F_j u_j + p_j^T G_j u_j + p_j^T H_j p_j        (7.4a)
    subject to  C_{c,j} u_j ≤ T_j p_j + c_{c,j},        (7.4b)
where F_j ∈ R^{N_j m_j × N_j m_j}, G_j ∈ R^{(n_j + N_j(p_j + q_j)) × N_j m_j}, H_j ∈ R^{(n_j + N_j(p_j + q_j)) × (n_j + N_j(p_j + q_j))}, C_{c,j} ∈ R^{N_j n_{ineq,j} × N_j m_j}, T_j ∈ R^{N_j n_{ineq,j} × (n_j + N_j(p_j + q_j))}, c_{c,j} ∈ R^{N_j n_{ineq,j}}, and n_{ineq,j} is the total number of inequalities in (7.3). The vector p_j = [x_j(0)^T, d_j(0)^T, ..., d_j(N_j − 1)^T, z_j(0)^T, ..., z_j(N_j − 1)^T]^T contains the parameters. The explicit solution of (7.4) is a piecewise affine (PWA) function with n_r regions of the form [8]:

    K(p_j) = K_i p_j + g_i   if p_j ∈ R_i,   i = 1, ..., n_r,        (7.5)

where the R_i are the regions and K_i ∈ R^{N_j m_j × (n_j + N_j(p_j + q_j))} and g_i ∈ R^{N_j m_j} describe the corresponding affine control laws. Each region is a polyhedron

    R_i = { p_j ∈ R^{n_j + N_j(p_j + q_j)} | Z_i p_j ≤ z_i }   ∀ i = 1, ..., n_r,        (7.6)

where Z_i ∈ R^{c_i × (n_j + N_j(p_j + q_j))} and z_i ∈ R^{c_i} describe the c_i hyperplanes of the i-th region. The union of the regions defines a bounded polytopic partition R = ∪_{i=1}^{n_r} R_i with int(R_i) ∩ int(R_j) = ∅ for all i ≠ j. This reduces the solution of the optimization problem to finding the region R_i that contains the parameters p_j and evaluating the corresponding affine control law.
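A naive (linear-time) point location routine for (7.5)-(7.6) can be sketched as follows; the data structure holding the regions is a hypothetical illustration:

    import numpy as np

    def eval_pwa(p, regions):
        """Evaluate the explicit law (7.5): regions is a list of tuples
        (Z_i, z_i, K_i, g_i) describing the polyhedra (7.6) and the
        corresponding affine laws."""
        for Z, z, K, g in regions:
            if np.all(Z @ p <= z + 1e-9):   # p inside polyhedron R_i
                return K @ p + g
        raise ValueError("p lies outside the feasible partition R")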
The number of regions n_r can grow exponentially with the horizon and the number of constraints, which aggravates the point location problem and enlarges the memory requirements. To facilitate the point location problem, a binary search tree (BST) [71] can be computed whose nodes represent unique hyperplanes. Hyperplanes shared by several regions are evaluated only once, and hyperplanes describing the convex hull of R are discarded. While traversing the tree, it is checked at each node whether the parameters lie left or right of the hyperplane, until a leaf node is reached. Each leaf node represents a unique feedback law, which can then be evaluated to obtain the optimal input. Still, the computation of a BST might be prohibitive or intractable for large systems [7]. Methods that approximate the BST to reduce the computational load include considering only hypercubic regions [35], using truncated BSTs in combination with direct search to obtain shallower trees [6], and computing arbitrary hyperplanes to balance the tree and minimize its depth [23].
Instead of solving the point location problem, neural networks can be used to approximate the explicit function due to their strong approximation properties [27], as already done in [54]. If rectified linear units (ReLUs) are used as the activation function, a deep neural network is itself a PWA function, and the number of linear regions it represents can grow exponentially with the number of layers. This characteristic has recently been exploited [13, 38] to obtain a computationally cheap and memory-efficient approximation of the explicit description of the optimal control law (7.5) via supervised deep learning. In [38], it was additionally shown that an explicit feedback law (7.5) can be exactly represented by a deep ReLU network of a defined depth and width. However, in the general case it is not possible to make rigorous statements about the network size necessary to obtain a certain approximation quality.
A feed-forward neural network with constant width in the hidden layers is a mapping N : R^{n_j + N_j(p_j + q_j)} → R^{m_j} described by

    N(p_j; θ, M, L) = f_{L+1} ∘ g_L ∘ f_L ∘ ··· ∘ g_1 ∘ f_1(p_j)   for L ≥ 2,
    N(p_j; θ, M, L) = f_{L+1} ∘ g_1 ∘ f_1(p_j)                     for L = 1,        (7.7)
where the input of the network is p_j ∈ R^{n_j + N_j(p_j + q_j)} and the output of the network is u ∈ R^{m_j}. M is the number of neurons in each hidden layer and L is the number of hidden layers. Neural networks with L ≥ 2 are called deep, whereas neural networks with L = 1 are called shallow. Each hidden layer consists of two elements, the first of which is an affine function:

    f_l(ξ_{l−1}) = W_l ξ_{l−1} + b_l,        (7.8)
where ξ_{l−1} ∈ R^M is the output of the previous layer, with ξ_0 = p_j. The second element is a nonlinear activation function g_l, which is applied elementwise to all components of the layer. Common choices for the nonlinearity include
the rectified linear unit (ReLU), which computes the maximum between zero and the affine function of the current layer l:

    g_l(f_l) = max(0, f_l),        (7.9)

and the hyperbolic tangent function (tanh):

    g_l(f_l) = (e^{f_l} − e^{−f_l}) / (e^{f_l} + e^{−f_l}).        (7.10)
For IoT devices with limited resources, the ReLU activation is the favorable choice due to its lower computational requirements, as shown in [37]. The weights W_l and biases b_l, l = 1, ..., L + 1, are summarized in the parameter θ = {θ_1, ..., θ_{L+1}}, whose elements contain the weights of the affine function of each layer:

    θ_l = {W_l, b_l}   ∀ l = 1, ..., L + 1,        (7.11)

where the weights are

    W_l ∈ R^{M × (n_j + N_j(p_j + q_j))}   if l = 1,
    W_l ∈ R^{M × M}                        if l = 2, ..., L,
    W_l ∈ R^{m_j × M}                      if l = L + 1,        (7.12)

and the biases are

    b_l ∈ R^M       if l = 1, ..., L,
    b_l ∈ R^{m_j}   if l = L + 1.        (7.13)
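The forward pass (7.7)-(7.9) with the parameter shapes (7.12)-(7.13) reduces to a few lines of NumPy; the sketch below assumes ReLU activations in all hidden layers:

    import numpy as np

    def nn_forward(p, weights, biases):
        """Evaluate N(p; theta, M, L): L hidden ReLU layers followed by a
        linear output layer; weights/biases are lists ordered l = 1..L+1
        with the shapes given in (7.12)-(7.13)."""
        xi = p
        for W, b in zip(weights[:-1], biases[:-1]):
            xi = np.maximum(0.0, W @ xi + b)    # g_l(f_l) = max(0, W_l xi + b_l)
        return weights[-1] @ xi + biases[-1]    # affine output layer f_{L+1}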
The data for training the deep neural network is generated by repeatedly solving the optimal control problem (7.3) n_tr times. The n_tr different parameters p_j can be chosen via random sampling from the feasible space. By using active learning approaches [56], the number of samples necessary to achieve a certain approximation quality can be reduced. Another possibility to generate training data is to sample from the space of optimal trajectories by exploiting closed-loop trajectories as the training set [37, 38], which usually leads to increased approximation quality under the assumption that the system will always be operated near optimal trajectories. Since the optimal input is applied and recomputed after every step, only the first step û_j^* of the computed optimal input trajectory is of interest. This results in input–output tuples (p_{j,i}, û_i^*) for i = 1, ..., n_tr with û_i^* = K(p_{j,i}). Once all the training data has been generated by solving the optimal control problem (7.3) n_tr times, the neural network is trained by solving the following optimization problem:
    θ^* = arg min_θ \sum_{i=1}^{n_tr} ( N(p_{j,i}; θ, M, L) − û_i^* )^2,        (7.14)
which seeks the best approximation of the optimal control law with a given network structure. The process of learning an approximate controller is summarized in Algorithm 7.2.

Algorithm 7.2 Deep learning-based model predictive control
1: Design the MPC controller
2: for i = 1, ..., n_tr do
3:    Solve (7.3) for feasible values of p_{j,i}
4:    Add the pair (p_{j,i}, û_i^*) to the training data
5: end for
6: Minimize (7.14) with the chosen network structure (L layers, M neurons each)
7: The approximate controller is defined by M, L, and θ^*
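Steps 2-5 of Algorithm 7.2 can be sketched as follows; sample_feasible_p and solve_ocp are hypothetical placeholders for the problem-specific sampling rule and the OCP solver:

    import numpy as np

    def generate_training_data(n_tr, sample_feasible_p, solve_ocp):
        """Collect the n_tr input-output tuples (p_{j,i}, u_i^*) by solving
        the optimal control problem (7.3) for sampled feasible parameters."""
        P, U = [], []
        for _ in range(n_tr):
            p = sample_feasible_p()    # e.g., uniform sampling of x0, d, z
            u_star = solve_ocp(p)      # first input of the optimal trajectory
            P.append(p)
            U.append(u_star)
        return np.array(P), np.array(U)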
Typically, only a local minimizer θ ∗ can be found when solving (7.14) with variants of stochastic gradient descent. In addition, the resulting neural network is only an approximation of the real MPC feedback law (7.5), and therefore constraint satisfaction cannot be rigorously guaranteed.
7.3.1.3 How to Obtain Guarantees for Learning-Based Approximate Controllers?
Because the learning of the optimal controller is sampling-based, the learned controller incurs an approximation error that cannot be exactly quantified. Three general methods can be found in the literature to mitigate the effect of this approximation error: using ideas from robust MPC to incorporate the approximation error in the problem formulation, using statistical learning theory to verify the controller after the learning process, and applying a projection operator to ensure feasibility of the solutions. A method belonging to the robust approaches is presented in [72], where a backup controller in the sense of tube-based MPC is designed to guarantee feasibility of the closed loop. In [29], robust MPC is combined with statistical learning theory: the approximation error is formulated as an uncertainty in the MPC controller from which the training samples are obtained. By analyzing the approximation error in a sampling-based manner using Hoeffding's inequality and designing a robust MPC controller, a statistical guarantee for stability and recursive feasibility is obtained. Probabilistic bounds on the feasibility and suboptimality of a learned controller can be given via the a posteriori probabilistic validation of a learned optimal control law and the solution of the dual problem [74]. Probabilistic validation techniques for general closed-loop performance indicators are presented in [36, 39]. In [13, 38], a projection performed after the evaluation of the neural network is proposed as a method to
guarantee recursive feasibility for the approximate controllers. A different approach to ensure that the obtained approximate controllers have the desired performance is to directly analyze the closed-loop behavior using probabilistic safe sets, which can be derived despite model errors and other disturbances by means of probabilistic validation techniques [38]. In this work, we do not apply the mentioned strategies, because small approximation errors can be tolerated in the considered application and the main goal is to illustrate the potential of the different techniques.
7.4 Temperature Control of a Smart Building

Smart algorithms based on accurate models and high-resolution measurements can reduce the energy usage of buildings significantly [4, 40]. The internet of things provides an environment that facilitates the application of smart control methods through the easy installation of wireless sensor networks via battery-powered devices [2, 3], e.g., by retrofitting older buildings. Installing battery-powered sensors with a long battery lifetime (up to several years [3, 33]) is possible due to the low-power communication and the deep sleep modes of recent devices. Furthermore, due to the long communication range of LPWANs in comparison to Wi-Fi, one gateway is sufficient to cover a building or even a residential district. While energy consumption is not an issue for the computation of the control action, as energy is usually available at the actuators (e.g., HVAC equipment), the available hardware is often very restricted in terms of memory and computation capabilities, making learning-based MPC an advantageous approach. All these reasons lead to significantly lower installation costs that would enable a widespread use of advanced building control schemes using the methods described in this work.
We illustrate the possibilities of MPC in the context of the internet of things with an academic example using a temperature control system for a smart building based on [52], where a limited LPWAN is used as the means of communication. The smart building consists of two rooms which are physically coupled via a door. Each room has one heating, ventilation, and air conditioning (HVAC) unit to independently control the room temperature. The setup of the interconnected system including external factors is visualized in Fig. 7.3. Each room S_j, j = 1, 2, has three states x_j = [T_{R,j}, T_{Win,j}, T_{Wex,j}]^T, where T_{R,j} is the room temperature, T_{Win,j} the temperature of a wall between the two rooms, and T_{Wex,j} the temperature of a wall facing the environment. The vector of manipulated variables is given by u_j = [P_{hvac,j}]^T, where P_{hvac,j} is the power of the HVAC unit of room j. Positive values of P_{hvac,j} mean that the room is heated and negative values of P_{hvac,j} mean that the room is cooled. The external factors are d_j = [T_{ext}, s_r, d_{ig,j}]^T, where T_{ext} is the ambient temperature, s_r is the solar radiation, and d_{ig,j} are the internal gains describing the heat fluxes from humans or devices like computers. The coupling variables z_j are the room temperature of the other room, e.g., z_1 = [T_{R,2}]. The system matrices for both rooms are identical and given by
Fig. 7.3 Smart building with two rooms S1 and S2 which are physically coupled via a door. Each room is additionally influenced by three disturbance variables and the controllers can obtain weather predictions
    A_j = [ 0.8511  0.0541  0.0707          B_j = [ 0.0035
            0.1293  0.8635  0.0055                  0.0003
            0.0989  0.0003  0.0002 ],               0.0002 ],

    E_j = 10^{-3} [ 22.217  1.7912  42.212          F_j = [ 0.1
                    1.5376  0.6944  2.9214                  0
                    103.18  0.1032  196.04 ],               0 ].
The bounds on the room temperatures and the maximum available HVAC power are constrained individually. The temperature range for the first room is T_{min,1} = 21.0 °C ≤ T_{R,1} ≤ 23.0 °C = T_{max,1} and for the second room T_{min,2} = 16.0 °C ≤ T_{R,2} ≤ 18.0 °C = T_{max,2}. Due to on average higher internal gains in the first room, the corresponding HVAC unit is more powerful, with |P_{hvac,1}| ≤ 1000.0 W = P_{max,1}, than the one in the second room, with |P_{hvac,2}| ≤ 700.0 W = P_{max,2}. The temperature band in which the system can be operated allows a controller to exploit the thermal mass to reduce the energy consumption. The goal of the control is to minimize the average energy consumption

    P_avg = (1 / n_sim) \sum_{i=0}^{n_sim − 1} ( |P_{hvac,1,i}| + |P_{hvac,2,i}| )        (7.15)

over a given period T_tot = n_sim · t_sim, where n_sim is the number of simulation steps and t_sim is the simulation sampling time. Note that both positive and negative values of P_{hvac,j} mean that energy is consumed.
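Assuming the matrices above are stored as NumPy arrays, a simulation roll-out of (7.1) and the evaluation of the performance metric (7.15) can be sketched as:

    import numpy as np

    def simulate_room(A, B, E, F, x0, u_seq, d_seq, z_seq):
        """Roll out the room dynamics (7.1) for given input, disturbance,
        and coupling sequences (lists of vectors per time step)."""
        x = [x0]
        for u, d, z in zip(u_seq, d_seq, z_seq):
            x.append(A @ x[-1] + B @ u + E @ d + F @ z)
        return np.array(x)

    def average_power(P1, P2):
        """Average HVAC energy usage (7.15) over n_sim simulation steps."""
        return np.mean(np.abs(P1) + np.abs(P2))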
The optimal control problem for the centralized controller is given by

    min_{x, u}  \sum_{k=0}^{N−1} \sum_{j=1}^{2} |P_{hvac,j}(k)|        (7.16a)
    subject to  x(k + 1) = A x(k) + B u(k) + E d(k),        (7.16b)
                T_{min,j} ≤ T_{R,j}(k) ≤ T_{max,j},        (7.16c)
                T_{min,j} ≤ T_{R,j}(N) ≤ T_{max,j},        (7.16d)
                |P_{hvac,j}(k)| ≤ P_{max,j},        (7.16e)
                x(0) = x^0,        (7.16f)
                ∀ k = 0, ..., N − 1,        (7.16g)
                ∀ j = 1, 2,        (7.16h)
where N is the horizon. For non-centralized optimal control, the problem for room j is given by

    min_{x_j, u_j}  \sum_{k=0}^{N_j−1} |P_{hvac,j}(k)|        (7.17a)
    subject to  x_j(k + 1) = A_j x_j(k) + B_j u_j(k) + E_j d_j(k) + F_j z_j(k),        (7.17b)
                T_{min,j} ≤ T_{R,j}(k) ≤ T_{max,j},        (7.17c)
                T_{min,j} ≤ T_{R,j}(N_j) ≤ T_{max,j},        (7.17d)
                |P_{hvac,j}(k)| ≤ P_{max,j},        (7.17e)
                x_j(0) = x_j^0,        (7.17f)
                ∀ k = 0, ..., N_j − 1.        (7.17g)
Remark 7.3 The objective functions in (7.16) and (7.17) are written with the absolute values of the inputs for the sake of readability. In practice, a differentiable objective is used by modeling P_{hvac,j} as two inputs, P_{hvac,j}^{pos} and P_{hvac,j}^{neg}, with the corresponding constraints

    0 ≤ P_{hvac,j}^{pos} ≤ P_{max,j},        (7.18a)
    −P_{max,j} ≤ P_{hvac,j}^{neg} ≤ 0.        (7.18b)
Note that this transformation is not generally applicable. For the chosen example, it does not change the optimal solution and is, therefore, used in the remainder of the chapter.
A schematic overview of the non-centralized control procedure is given in Algorithm 7.3.

Algorithm 7.3 Non-centralized MPC
1: For each subsystem j: obtain x_j^0, z_j(k), and d_j(k) (cf. Remark 7.4)
2: Simultaneously solve (7.17) for all subsystems S_j
3: Apply the first control input û_j^*
4: if distributed then
5:    Exchange information with neighboring subsystems to obtain z_j(k + 1)
6: end if
7: if new downlink available then
8:    Update the external information d_j^+(k + 1)
9: end if
10: k ← k + 1
11: GOTO 1

Remark 7.4 We assume that the initial values for z_j(k) and d_j(k) are such that the initial problem at time step k is feasible for all subsystems. The initial values of z_j(k) can be derived either from solving the centralized problem once or by using the distributed methods presented in [16, 58]. In our case, each element of z_j(k) is initialized as the mean value between the upper and lower bound of the corresponding temperature. This is motivated by the relatively slow dynamics of the case study and because small violations of the constraints are uncritical. The values of the coupling variables can be updated in each loop of the algorithm if distributed MPC is applied. New information on the disturbance variables can be obtained when a new downlink is available. In the case of decentralized MPC, information about z_j(k) has to be stored locally and updated according to predefined rules, e.g., if a periodic reference trajectory is known, by shifting this trajectory according to the time step k.
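A sketch of the input split of Remark 7.3 for one room in CVXPY (an assumed choice of modeling tool) is given below; the numeric values follow the case study, and the remaining constraints of (7.17) would be added analogously before solving:

    import cvxpy as cp

    # |P_hvac| in the objective is replaced by P_pos - P_neg with the bounds
    # (7.18), which yields a linear objective; at the optimum at most one of
    # the two inputs is nonzero, so P_pos - P_neg equals |P_hvac|.
    N, Pmax = 12, 1000.0
    P_pos = cp.Variable(N)
    P_neg = cp.Variable(N)
    objective = cp.Minimize(cp.sum(P_pos - P_neg))   # sum of |P_hvac(k)|
    constraints = [P_pos >= 0, P_pos <= Pmax,        # (7.18a)
                   P_neg <= 0, P_neg >= -Pmax]       # (7.18b)
    # ... add the dynamics (7.17b) with u(k) = P_pos[k] + P_neg[k] and the
    # temperature constraints (7.17c)-(7.17f), then solve the problem.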
7.5 Results

We first show in this section how MPC can help to counteract the challenges that new low-power communication technologies introduce. Second, we show how deep learning-based MPC can help to deploy complex decision-making strategies on the resource-constrained hardware platforms that are characteristic of the internet of things. To simulate the external disturbances that affect the presented case study, we use real weather data obtained from the weather station Berlin Tegel.
7.5.1 Value of Communication

The effect of communication on the control quality is investigated by varying the number of available downlinks per day. The temperature of each room is controlled by a decentralized MPC with horizons N_1 = N_2 = 12, corresponding to 12 h. No robust MPC scheme is applied, in order to highlight the impact of communication on the control performance. A communication setting similar to the one offered by Sigfox is considered, where no direct communication between the controllers is allowed. Since a decentralized MPC scheme is used, each controller assumes that the coupling variables z_1 = T_{R,2} and z_2 = T_{R,1} are in the middle of their allowed temperature bounds, i.e., z_1 = 17.0 °C and z_2 = 22.0 °C. It is assumed that with each downlink, six future values for each of the three external variables T_ext, s_r, and d_{ig,j} can be obtained. In Fig. 7.4, forecasts of the ambient temperature T_ext (solid black line) for each hour during one day are shown for different downlink frequencies.
Fig. 7.4 Interpolation of the ambient temperature T_ext [°C] of a summer day (time in h) for different downlink intervals at time t = 0 (real temperature vs. 2 and 4 downlinks per day). Markers indicate when a new downlink will be available
Fig. 7.5 Average violation of the room temperature constraint [°C h^{-1}] for different downlink rates [d^{-1}] (the Sigfox limit is marked). Enabling more communication leads to smaller constraint violations, and therefore to a better closed-loop performance
Because of the limited communication, it is not possible to receive a value for each hour, and an interpolation has to be performed. The quality of the interpolation depends on the number of daily downlinks that can be performed. The fewer the downlinks (dashed blue line), the worse the approximation of the real signal. The impact of the downlink rate on the interpolation quality is particularly visible in Fig. 7.4 at the peak around 15 h, where the interpolation with four downlinks per day (dashed orange line) significantly outperforms the interpolation based on two downlinks per day. The higher approximation quality when more communication is allowed leads to a significant reduction of the average violations of the temperature constraint under decentralized MPC, as shown in Fig. 7.5. The average violations can be decreased by 20.3% by allowing 4 instead of 3 downlinks per day, and allowing 24 downlinks instead of 4 leads to a further decrease of 14.8%. These results show that, despite the restrictive communication, a small amount of information can still significantly improve the performance of the closed-loop control.
7.5.2 Embedded Decentralized Control with Limited Communication

Since no information on the coupling variables can be obtained and the information on the external factors can only be updated at certain time intervals, robustness against the missing or inaccurate information should be incorporated in the decentralized MPC framework. In the case of the disturbance variables, the uncertainty is modeled using multiplicative uncertainties. For instance, the real solar radiation that affects the system is the product s_r = δ_sr ŝ_r, where δ_sr is an uncertain parameter and ŝ_r is the predicted value. The uncertain parameters follow a uniform distribution with δ_Text = unif(0.9, 1.1), δ_sr = unif(0.95, 1.05), and δ_dig,j = unif(0.8, 1.2), and are sampled at each control interval. The coupling variables, which are the temperatures of the neighboring rooms, are assumed to lie in the allowed temperature range. To cope with the uncertainties, we use a robust MPC scheme called multi-stage MPC and adapt (7.3) according to [43]. For multi-stage MPC, a scenario tree is constructed in which each branch represents a possible combination of discrete scenarios. A lower and an upper bound are considered as extreme values for each parameter, which for linear systems guarantees constraint satisfaction for any possible value within the assumed range. We consider a robust horizon of one step, which means that the tree is branched only in the first stage. Because the centralized, distributed, and decentralized approaches handle the coupling variables differently, the scenario tree results in 8 scenarios for centralized and distributed MPC and 16 scenarios for decentralized MPC; the combinations can be enumerated as in the sketch below. The 8 scenarios are formed from the upper and lower bounds of the uncertainties of the three external disturbances (2^3 = 8), while the 16 scenarios for decentralized MPC are obtained by adding the uncertainty of the coupling variable (2^4 = 16). For centralized MPC, the coupling variables are inherent states and do not need to be considered in the scenario tree. The cost function used in the robust MPC includes all corresponding scenarios with equal weights. In the case of decentralized MPC, pre-computed trajectories or constraint sets of the coupling variables can be stored on the local controller to foster a certain degree of cooperation. Because no direct communication between the controllers is available, the only available source of external information is the predictions of the external factors. The different choices of the bounds on the uncertain variables for the different control approaches are summarized in Table 7.1.
If the three approaches have unlimited communication, i.e., a new downlink is available at each control instant and prediction values are obtained for each control instant within the horizon, it can be seen in Table 7.2 that all three approaches achieve a comparable performance, differing only by 1.6% in the energy usage of the HVAC. The minor violations for distributed MPC occur because it is not robust to the slight difference between the communicated predicted trajectories of the linked subsystems and the actual values. This can be avoided by including consistency constraints, as done in a contract-based approach [44]. The similar behavior of the three approaches is illustrated in Fig. 7.6.
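The following short sketch enumerates these bound combinations (the values follow Table 7.1; the coupling interval z is only included for decentralized MPC):

    from itertools import product

    # All combinations of lower/upper bounds of the uncertain parameters:
    # 3 external uncertainties give 2^3 = 8 branches (centralized/distributed),
    # adding the coupling variable gives 2^4 = 16 branches (decentralized).
    bounds = {"d_Text": (0.9, 1.1), "d_sr": (0.95, 1.05),
              "d_ig":  (0.8, 1.2),  "z":    (16.0, 18.0)}
    scenarios = list(product(*bounds.values()))
    print(len(scenarios))  # 16 branches in the first (robust) stage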
Table 7.1 Overview of the uncertainties considered in the scenario tree for the different approaches. For centralized MPC, the coupling variables are an inherent system state, and for distributed MPC a prediction is available because the controllers can exchange their planned trajectories. For decentralized MPC, three different intervals are considered, because under limited communication additional errors occur, which can be taken into account using larger uncertainty intervals

    Symbol   Type      Centralized   Distributed   Decentralized
                                                   Tight          Middle          Large
    δ_Text   external  [0.9, 1.1]    [0.9, 1.1]    [0.9, 1.1]     [0.7, 1.3]      [0.5, 1.5]
    δ_sr     external  [0.95, 1.05]  [0.95, 1.05]  [0.95, 1.05]   [0.75, 1.5]     [0.5, 2.0]
    δ_dig    external  [0.8, 1.2]    [0.8, 1.2]    [0.8, 1.2]     [0.65, 1.35]    [0.5, 1.5]
    z_1      coupling  state         exch. traj.   [16, 18]       [15.75, 18.25]  [15.5, 18.5]
    z_2      coupling  state         exch. traj.   [21, 23]       [20.75, 23.25]  [20.5, 23.5]
Table 7.2 Average power usage of the HVAC and average and maximum violation of the temperature constraint for the three MPC strategies when full prediction information is available at each control instant

    Method                        Centralized   Distributed   Decentralized
    HVAC (avg.) [W]               580.78        585.82        590.07
    Violation (avg.) [°C h^{-1}]  0.0           1.24×10^{-5}  0.0
    Violation (max.) [°C]         0.0           4.56×10^{-3}  0.0
Limited communication leads to interpolation errors that add additional uncertainty to the model, which can jeopardize the performance of the MPC controller. To deal with this effect, we build three different scenario trees based on tight, medium-sized, and large intervals for δ_Text, δ_sr, and δ_dig, described in the last three columns of Table 7.1, and analyze the performance for each setting. The tight setting assumes that no interpolation error occurs and considers only the uncertainty related to the weather forecast. The uncertainty ranges for the medium and large settings are enlarged to take the additional interpolation errors into account. By enlarging the intervals of the uncertainties from tight to large, the average violations can be reduced by two orders of magnitude, as the first three columns of Table 7.3 show. The energy consumption increases by 5.0% when considering large intervals due to the more conservative control.
The 16 scenarios to be considered for decentralized MPC in the multi-stage framework lead to a significant computational load. This load can be reduced by learning the parametric solution with a deep neural network. A deep neural network with 5 hidden layers of 20 neurons each was used to approximate the decentralized MPC with the large uncertainty interval setting. The training data was generated by simulating the exact decentralized MPC for four years with weather data from Berlin Tegel, which led to 35041 training samples.
Fig. 7.6 Four different MPC approaches compared over a duration of 3 days (room temperatures T_{R,1}, T_{R,2} [°C] and HVAC power P_{hvac,1} [W] over time [h]). Centralized, distributed, and decentralized MPC obtain new weather predictions at each control instant; decentralized (interpolation) relies on very limited communication and has to interpolate the necessary values of the disturbance variables

Table 7.3 Average power usage of the HVAC and average and maximum violation of the temperature constraint for exact decentralized MPC with three different settings of the uncertainty intervals, and for the deep learning-based approximate decentralized MPC with the largest uncertainty intervals

    Method                        MPC                                        Neural network
                                  Tight         Medium        Large          Large
    HVAC (avg.) [W]               588.90        603.27        618.40         618.60
    Violation (avg.) [°C h^{-1}]  2.54×10^{-2}  3.63×10^{-3}  2.92×10^{-4}   2.94×10^{-4}
    Violation (max.) [°C]         0.740         0.378         0.396          0.468
Due to the limited communication, the input parameter vector p_j = [(x_j^0)^T, d_{j,int}(0)^T, ..., d_{j,int}(N_j − 1)^T]^T ∈ R^{n_j + N_j p_j} includes the interpolated disturbance values d_{j,int} but no coupling variables. The neural network was trained using Keras [14] with the TensorFlow [1] back-end. The samples were scaled such that all values lie in [0, 1]. Using the optimization algorithm Adam [41], a mean-squared error of 6.2676×10^{-5} for the first room and 7.7577×10^{-5} for the second room was obtained after 500 training epochs with batch size 200. The performance of the exact robust controller and of the learned neural network controller are very similar, as the difference in average violation and energy usage is less than 1.0% (see Table 7.3).
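A sketch of this training setup in Keras (layer sizes and hyperparameters as stated above; a single scaled output per room is an assumption) could look as follows:

    import numpy as np
    from tensorflow import keras

    # 39 inputs (3 states + 12 x 3 interpolated disturbances), 5 hidden
    # ReLU layers with 20 neurons each, trained on the 35041 sampled
    # parameter/input pairs P_train, U_train (assumed pre-scaled to [0, 1]).
    model = keras.Sequential(
        [keras.layers.Dense(20, activation="relu", input_shape=(39,))] +
        [keras.layers.Dense(20, activation="relu") for _ in range(4)] +
        [keras.layers.Dense(1)]                      # scaled HVAC power
    )
    model.compile(optimizer=keras.optimizers.Adam(), loss="mse")
    # model.fit(P_train, U_train, epochs=500, batch_size=200)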
Fig. 7.7 Comparison of exact decentralized robust MPC for very limited communication and an approximation via neural networks
A comparison of the approach with the large uncertainty intervals and its deep learning-based approximation is shown in Fig. 7.7; it can be seen that the performance is very similar. The main advantages of the neural network approach are the reduced computation times and the very low memory requirements, at the cost of a minor loss in performance. Storing the MPC problem formulation with condensed matrices would require 110.25 kB, neglecting the additional storage needed to actually solve the optimization problem. Because of the large number of parameters (39 in this case), the exact explicit solution cannot be computed with standard explicit MPC tools [28]. The proposed deep learning-based approximation needs less than 10% of this memory, 9.77 kB, compared to just storing the MPC problem formulation. Evaluating the approximate controller for one sampling instant on a low-cost 32-bit ARM Cortex-M3 microcontroller with 96 kB of RAM and 512 kB of flash storage running at 89 MHz required only 14.0 ms.
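As a rough plausibility check (assuming 32-bit floats and a single output neuron), the parameter count of a 39-20-20-20-20-20-1 network reproduces the reported footprint:

    # weights + biases: input layer, 4 inner hidden layers, output layer
    n_params = (39 * 20 + 20) + 4 * (20 * 20 + 20) + (20 * 1 + 1)  # = 2501
    print(n_params * 4 / 1024)  # ~9.77 kB at 4 bytes per parameter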
7.6 Conclusion

The recent advances in communication technologies and low-cost computing hardware that drive the progress of the internet of things also bring important challenges for the application of advanced control techniques such as model predictive control. We have shown that MPC can be used to mitigate some of the disadvantages related to the strong limitations of low-power wide area networks. Furthermore, when
these communication technologies are used in a distributed or decentralized MPC framework, uncertainty arises due to couplings between subsystems, disturbances, and a limited information exchange. Robust MPC schemes can be used to cope with the uncertainty but they lead to a significant increase of the computational requirements, which complicates the deployment of these strategies on the low-cost embedded hardware that is usually available. Motivated by recent theoretical advances in the field of deep learning, we have shown that learning the MPC policy using deep neural networks leads to high-quality approximations that can be easily deployed on low-cost micro-controllers. The results for a multi-room building control case study show that the proposed ideas can be used to deploy decentralized MPC schemes with very limited communication and simple hardware with only a minor performance degradation when compared to an exact centralized approach.
References

1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jozefowicz, R., Jia, Y., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Schuster, M., Monga, R., Moore, S., Murray, D., Olah, C., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow. tensorflow.org (2015)
2. Andersen, M.P., Kim, H.-S., Chen, K., Kumar, S., Zhao, W.J., Ma, K., Culler, D.E.: System architecture directions for post-SoC/32-bit networked sensors. In: Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems. ACM (2018)
3. Andersen, M.P., Kim, H.-S., Culler, D.E.: Hamilton: a cost-effective, low power networked sensor for indoor environment monitoring. In: Proceedings of the 4th ACM International Conference on Systems for Energy-Efficient Built Environments. ACM (2017)
4. Ascione, F., Bianco, N., De Stasio, C., Mauro, G.M., Vanoli, G.P.: Simulation-based model predictive control by the multi-objective optimization of building energy performance and thermal comfort. Energy Build. 111. Elsevier (2016)
5. Augustin, A.: A study of LoRa: long range & low power networks for the internet of things. Sensors 16(9). Multidisciplinary Digital Publishing Institute (2016)
6. Bayat, F., Johansen, T.A., Jalali, A.A.: Combining truncated binary search tree and direct search for flexible piecewise function evaluation for explicit MPC in embedded microcontrollers. IFAC Proc. Vol. 44(1). Elsevier (2011)
7. Bayat, F., Johansen, T.A., Jalali, A.A.: Flexible piecewise function evaluation methods based on truncated binary search trees and lattice representation in explicit MPC. IEEE Trans. Control Syst. Technol. 20(3). IEEE (2011)
8. Bemporad, A., Morari, M., Dua, V., Pistikopoulos, E.N.: The explicit linear quadratic regulator for constrained systems. Automatica 38(1). Elsevier (2002)
9. Bemporad, A., Oliveri, A., Poggi, T., Storace, M.: Ultra-fast stabilizing model predictive control via canonical piecewise affine approximations. IEEE Trans. Autom. Control 56(12). IEEE (2011)
10. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1). Now Publishers (2011)
11. Camponogara, E., Jia, D., Krogh, B.H., Talukdar, S.: Distributed model predictive control. IEEE Control Syst. Mag. 22(1). IEEE (2002)
12. Centenaro, M., Vangelista, L., Zanella, A., Zorzi, M.: Long-range communications in unlicensed bands: the rising stars in the IoT and smart city scenarios. IEEE Wireless Commun. 23(5). IEEE (2016)
13. Chen, S., Saulnier, K., Atanasov, N., Lee, D.D., Kumar, V., Pappas, G.J., Morari, M.: Approximating explicit model predictive control using constrained neural networks. In: 2018 Annual American Control Conference (ACC) (2018)
14. Chollet, F., et al.: Keras. https://github.com/fchollet/keras (2015)
15. Conte, C., Jones, C.N., Morari, M., Zeilinger, M.N.: Distributed synthesis and stability of cooperative distributed model predictive control for linear systems. Automatica 69. Elsevier (2016)
16. Conte, C., Voellmy, N.R., Zeilinger, M.N., Morari, M., Jones, C.N.: Distributed synthesis and control of constrained linear systems. In: American Control Conference. IEEE (2012)
17. Csekő, L.H., Kvasnica, M., Lantos, B.: Explicit MPC-based RBF neural network controller design with discrete-time actual Kalman filter for semiactive suspension. IEEE Trans. Control Syst. Technol. 23(5). IEEE (2015)
18. de la Peña, M.D., Bemporad, A., Alamo, T.: Stochastic programming applied to model predictive control. In: Proceedings of the 44th IEEE Conference on Decision and Control (2005)
19. Ding, B.: Stabilization of linear systems over networks with bounded packet loss and its use in model predictive control. Automatica 47(11). Elsevier (2011)
20. Engelmann, A., Jiang, Y., Mühlpfordt, T., Houska, B., Faulwasser, T.: Toward distributed OPF using ALADIN. IEEE Trans. Power Syst. 34(1). IEEE (2018)
21. Eric Wang, Y.-P., Lin, X., Adhikary, A.: A primer on 3GPP narrowband Internet of Things (NB-IoT). arXiv preprint arXiv:1606.04171 (2016)
22. Ferrari-Trecate, G., Galbusera, L., Marciandi, M.P.E., Scattolini, R.: A model predictive control scheme for consensus in multi-agent systems with single-integrator dynamics and input constraints. In: 46th IEEE Conference on Decision and Control. IEEE (2007)
23. Fuchs, A.N., Jones, C., Morari, M.: Optimized decision trees for point location in polytopic data sets: application to explicit MPC. In: Proceedings of the 2010 American Control Conference (2010)
24. Geyer, T., Torrisi, F.D., Morari, M.: Optimal complexity reduction of polyhedral piecewise affine systems. Automatica 44(7). Elsevier (2008)
25. Giselsson, P., Doan, M.D., Keviczky, T., De Schutter, B., Rantzer, A.: Accelerated gradient methods and dual decomposition in distributed model predictive control. Automatica 49(3). Elsevier (2013)
26. Gu, D.: A differential game approach to formation control. IEEE Trans. Control Syst. Technol. 16(1) (2007)
27. Hanin, B.: Universal function approximation by deep neural nets with bounded width and ReLU activations. arXiv preprint arXiv:1708.02691 (2017)
28. Herceg, M., Kvasnica, M., Jones, C.N., Morari, M.: Multi-parametric toolbox 3.0. In: European Control Conference 2013. IEEE (2013)
29. Hertneck, M., Köhler, J., Trimpe, S., Allgöwer, F.: Learning an approximate model predictive controller with guarantees. IEEE Control Syst. Lett. 2(3). IEEE (2018)
30. Holaza, J., Takács, B., Kvasnica, M.: Synthesis of simple explicit MPC optimizers by function approximation. In: 2013 International Conference on Process Control (PC) (2013)
31. Houska, B., Kouzoupis, D., Jiang, Y., Diehl, M.: Convex optimization with ALADIN. Optimization Online preprint (2017)
32. Ingole, D., Kvasnica, M., De Silva, H., Gustafson, J.: Reducing memory footprints in explicit model predictive control using universal numbers. IFAC-PapersOnLine 50(1). Elsevier (2017)
33. Ishibashi, K., Takitoge, R., Manyvone, D., Ono, N., Yamaguchi, S.: Long battery life IoT sensing by beat sensors. In: International Conference on Industrial Cyber Physical Systems. IEEE (2019)
34. Izadi, H.A., Gordon, B.W., Zhang, Y.: Decentralized receding horizon control of multiple vehicles subject to communication failure. In: 2009 American Control Conference. IEEE (2009)
35. Johansen, T.A., Grancharova, A.: Approximate explicit constrained linear model predictive control via orthogonal search tree. IEEE Trans. Autom. Control 48(5). IEEE (2003)
36. Karg, B., Alamo, T., Lucia, S.: Probabilistic performance validation of deep learning-based robust NMPC controllers. arXiv preprint arXiv:1910.13906 (2019)
37. Karg, B., Lucia, S.: Deep learning-based embedded mixed-integer model predictive control. In: 2018 European Control Conference (ECC) (2018)
38. Karg, B., Lucia, S.: Efficient representation and approximation of model predictive control laws via deep learning. IEEE Trans. Cybern. 50(9), 3866–3878. https://doi.org/10.1109/TCYB.2020.2999556 (2020)
39. Karg, B., Lucia, S.: Learning-based approximation of robust nonlinear predictive control with state estimation applied to a towing kite. In: Proceedings of the 18th European Control Conference (2019)
40. Killian, M., Kozek, M.: Ten questions concerning model predictive control for energy efficient buildings. Build. Environ. 105. Elsevier (2016)
41. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
42. Kögel, M., Findeisen, R.: A fast gradient method for embedded linear predictive control. IFAC Proc. Vol. 44(1). Elsevier (2011)
43. Lucia, S., Finkler, T., Engell, S.: Multi-stage nonlinear model predictive control applied to a semi-batch polymerization reactor under uncertainty. J. Process Control 23(9). Elsevier (2013)
44. Lucia, S., Kögel, M., Findeisen, R.: Contract-based predictive control of distributed systems with plug and play capabilities. IFAC-PapersOnLine 48(23). Elsevier (2015)
45. Lucia, S., Kögel, M., Zometa, P., Quevedo, D.E., Findeisen, R.: Predictive control, embedded cyberphysical systems and systems of systems: a perspective. Annu. Rev. Control 41. Elsevier (2016)
46. Maestre, J.M., de la Peña, M.D., Camacho, E.F., Alamo, T.: Distributed model predictive control based on agent negotiation. J. Process Control 21(5). Elsevier (2011)
47. Magni, L., Scattolini, R.: Stabilizing decentralized model predictive control of nonlinear systems. Automatica 42(7). Elsevier (2006)
48. Mayne, D.Q., Seron, M.M., Raković, S.V.: Robust model predictive control of constrained linear systems with bounded disturbances. Automatica 41(2). Elsevier (2005)
49. Mirakhorli, A., Dong, B.: Occupancy behavior based model predictive control for building indoor climate: a critical review. Energy Build. 129. Elsevier (2016)
50. Montufar, G.F., Pascanu, R., Cho, K., Bengio, Y.: On the number of linear regions of deep neural networks. Adv. Neural Inf. Process. Syst. (2014)
51. Negenborn, R.R., Maestre, J.M.: Distributed model predictive control: an overview and roadmap of future research opportunities. IEEE Control Syst. Mag. 34(4). IEEE (2014)
52. Oldewurtel, F., Jones, C.N., Morari, M.: A tractable approximation of chance constrained stochastic MPC based on affine disturbance feedback. In: 47th IEEE Conference on Decision and Control (2008)
53. Onat, A., Naskali, T., Parlakay, E., Mutluer, O.: Control over imperfect networks: model-based predictive networked control systems. IEEE Trans. Ind. Electron. 58(3). IEEE (2010)
54. Parisini, T., Zoppoli, R.: A receding-horizon regulator for nonlinear systems and a neural approximation. Automatica 31(10). Elsevier (1995)
55. Park, P., Ergen, S.C., Fischione, C., Lu, C., Johansson, K.H.: Wireless network design for control systems: a survey. IEEE Commun. Surv. Tutor. 20(2). IEEE (2017)
56. Quindlen, J.F.: Data-driven methods for statistical verification of uncertain nonlinear systems (2018)
57. Raimondo, D.M., Magni, L., Scattolini, R.: Decentralized MPC of nonlinear systems: an input-to-state stability approach. Int. J. Robust Nonlinear Control. Wiley (2007)
58. Raković, S.V., Kern, B., Findeisen, R.: Practical robust positive invariance for large-scale discrete time systems. IFAC Proc. Vol. 44(1). Elsevier (2011)
59. Richards, A., How, J.: A decentralized algorithm for robust constrained model predictive control. In: Proceedings of the 2004 American Control Conference, vol. 5. IEEE (2004)
60. Richards, A., How, J.: Decentralized model predictive control of cooperating UAVs. In: 43rd Conference on Decision and Control, vol. 4. IEEE (2004)
61. Richter, S., Jones, C.N., Morari, M.: Computational complexity certification for real-time MPC with input constraints based on the fast gradient method. IEEE Trans. Autom. Control 57(6). IEEE (2011)
62. Riverso, S., Farina, M., Ferrari-Trecate, G.: Plug-and-play decentralized model predictive control for linear systems. IEEE Trans. Autom. Control 58(10). IEEE (2013)
63. Roman, R., Zhou, J., Lopez, J.: On the features and challenges of security and privacy in distributed internet of things. Comput. Netw. 57(10). Elsevier (2013)
64. Scattolini, R.: Architectures for distributed and hierarchical model predictive control: a review. J. Process Control 19(5). Elsevier (2009)
65. Scokaert, P.O.M., Mayne, D.Q.: Min-max feedback model predictive control for constrained linear systems. IEEE Trans. Autom. Control 43(8). IEEE (1998)
66. Serra, T., Tjandraatmadja, C., Ramalingam, S.: Bounding and counting linear regions of deep neural networks. arXiv preprint arXiv:1711.02114 (2017)
67. Sicari, S., Rizzardi, A., Grieco, L.A., Coen-Porisini, A.: Security, privacy and trust in Internet of Things: the road ahead. Comput. Netw. 76. Elsevier (2015)
68. Summers, T.H., Lygeros, J.: Distributed model predictive consensus via the alternating direction method of multipliers. In: 50th Annual Allerton Conference on Communication, Control, and Computing. IEEE (2012)
69. Tang, X., Ding, B.: Model predictive control of linear systems over networks with data quantizations and packet losses. Automatica 49(5). Elsevier (2013)
70. Telkamp, T., Slats, L.: Ground breaking world record! LoRaWAN packet received at 702 km (436 miles) distance. https://www.thethingsnetwork.org/article/ground-breaking-worldrecord-lorawan-packet-received-at-702-km-436-miles-distance (2017)
71. Tøndel, P., Johansen, T.A., Bemporad, A.: Evaluation of piecewise affine control via binary search tree. Automatica 39(5). Elsevier (2003)
72. Wabersich, K.P., Zeilinger, M.N.: Linear model predictive safety certification for learning-based control. In: Conference on Decision and Control 2018. IEEE (2018)
73. Wen, C., Ma, X., Ydstie, B.E.: Analytical expression of explicit MPC solution via lattice piecewise-affine function. Automatica 45(4). Elsevier (2009)
74. Zhang, X., Bujarbaruah, M., Borrelli, F.: Safe and near-optimal policy learning for model predictive control using primal-dual neural networks. arXiv preprint arXiv:1906.08257 (2019)
75. Zhang, W.-A., Yu, L.: Modelling and control of networked control systems with both network-induced delay and packet-dropout. Automatica 44(12). Elsevier (2008)
Chapter 8
Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model Predictive Control of Batch Processes E. Bradford , L. Imsland , M. Reble, and E. A. del Rio-Chanona
Abstract Nonlinear model predictive control (NMPC) is an effective approach for the control of nonlinear multivariable dynamic systems with constraints. However, NMPC requires an accurate plant model. Plant models can often be determined from first principles; parts of the model are, however, difficult to derive using physical laws alone. In this paper, a new hybrid modeling scheme is proposed to overcome this issue, which combines physical models with Gaussian process (GP) modeling. The GPs are employed to model the parts of the physical model that are difficult to describe using first principles. GPs not only give predictions, but also quantify the residual uncertainty of this model. It is vital to account for this uncertainty in the control algorithm to prevent constraint violations and performance deterioration. Monte Carlo samples of the GPs are generated offline to tighten the constraints of the NMPC and thus ensure joint probabilistic constraint satisfaction online. Advantages of our method include fast online evaluation times and exploiting the flexibility of GPs and the data efficiency of first principles models. The algorithm is verified on a case study involving a challenging semi-batch bioreactor.
E. Bradford (B) · L. Imsland Engineering Cybernetics, Norwegian University of Science and Technology, Trondheim, Norway e-mail: [email protected] L. Imsland e-mail: [email protected] M. Reble BASF SE, Ludwigshafen, Germany e-mail: [email protected] E. A. del Rio-Chanona Centre for Process Systems Engineering, Department of Chemical Engineering, Imperial College London, London, UK e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Faulwasser et al. (eds.), Recent Advances in Model Predictive Control, Lecture Notes in Control and Information Sciences 485, https://doi.org/10.1007/978-3-030-63281-6_8
191
192
E. Bradford et al.
8.1 Introduction Model predictive control (MPC) was developed in the late 70s and has progressed significantly since then. Model predictive control is the only advanced control methodology, which has made a significant impact on industrial control engineering [34]. MPC is especially useful to deal with multivariable control problems and important process constraints. Many processes are highly nonlinear and may be operated along state trajectories, which motivates the use of nonlinear MPC (NMPC). In particular, NMPC applications based on first principles models are becoming increasingly popular due to the advent of improved optimization methods and the availability of more models [6]. In this paper, we focus on finite-horizon control problems, for which chemical batch processes are a particularly important example. These are employed in many different chemical sectors due to their inherent flexibility. Previous works for batch processes include NMPC based on the extended and unscented Kalman filter [7, 39], polynomial chaos expansions [9, 37], and multi-stage NMPC [33]. A major limitation of NMPC in practice is the requirement for an accurate dynamic plant model, which has been cited to take up to 80% of the MPC commissioning effort [50]. The required dynamic model for NMPC is often derived from first principles taking advantage of the available prior knowledge of the process [40]. While this can be an efficient modeling approach, often parts of the model are notoriously difficult to represent using physical laws. In addition, modeling certain phenomena may require excessive amounts of computational time. Hybrid modeling1 approaches have, therefore, been proposed, which combine first principles modeling with datadriven regression methods. For example, in chemical engineering, hybrid models have been developed to capture chemical reaction kinetics [43, 51], the complex mechanics of catalyst deactivation [3], or for the correction of first principles models using available measurements [5, 21]. Most hybrid modeling applications focused on using neural networks (NNs). In this paper, we propose to use Gaussian processes (GPs) instead [45] due to their ability to not only provide predictions, but also provide a measure of uncertainty for these predictions difficult to obtain by other nonlinear modeling approaches [27]. GPs compute predictive Gaussian distributions as output estimates, whose mean serves as the prediction and the variance as a measure of uncertainty. GPs for regression were first introduced by [41]. It has been shown in [44] that GPs achieve comparable predictive accuracy as other modeling approaches like NNs. It is important to account for the GP measure of uncertainty to avoid constraint violations and performance deterioration. To consider uncertainty for NMPC formulations explicitly robust MPC [13] and stochastic MPC [19] approaches have been developed. Previous works on using GPs for hybrid modeling mainly focused on linear ordinary differential equation systems of first and second order that can be solved exactly, see, for example [1, 31, 46].
1 Hybrid
modeling refers to the combination of physical models with data-driven models. This should not be confused with hybrid systems, which refers to a dynamical system that exhibits both continuous and discrete dynamic behavior.
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
193
GP-based MPC was first proposed in [38], in which the GP is recursively updated for reference tracking. In [28, 29], it is proposed to identify the GP offline and apply it online for NMPC instead. The variance therein is constrained to avoid the NMPC steering into regions of high uncertainty. Furthermore, GPs have been used to overcome deviations between the approximate plant model and the real plant model [25, 35]. GPs may also act as an efficient surrogate to estimate the mean and variance required for stochastic NMPC [8]. Applications of GP-based MPC include the control of an unmanned quadrotor [14], the control of a gas–liquid separation process [32], and the steering of miniature cars [22]. While these works show the feasibility of GP-based MPC, most formulations use stochastic uncertainty propagation to account for the uncertainty measure provided by the GP, e.g. [14, 22, 28, 29]. An overview of these approaches can be found in [23]. Major limitations of stochastic propagation are open-loop uncertainty growth and significantly increased computational times. Recently, several papers have proposed alternative techniques to consider the GP uncertainty measure. In [30], ellipsoidal sets are propagated using linearization and the linearization error is accounted for by employing Lipschitz constants, which is, however, relatively conservative. In [36], a robust MPC approach is used that bounds the one-step ahead error from the GP, while [48] suggests a robust control approach for linear systems to account for unmodeled nonlinearities. This approach may, however, be infeasible if the deviation between the nonlinear system and linear system is too large. In this paper, we extend a method first proposed in [10, 11] to the hybrid modeling case. It is commonly assumed for most GP works that the function values required can be directly observed; however, the GP is now part of the physical model. Therefore, a new method is proposed for the identification of the GP. In addition, the first principle model is commonly in continuous time, and hence the GP error needs to be accounted for in continuous time. The main contributions of this works are • New hybrid modeling scheme using GPs, which applies Maximum a posteriori (MAP) estimation to identify the required latent function values. • Novel sampling approach of discretization points to account for the hybrid GP error in continuous time. The approach determines explicit back-offs to tighten constraints offline using closed-loop Monte Carlo (MC) simulations for finite-horizon control problems. These in turn guarantee the satisfaction of probabilistic constraints online. There are several advantages of this approach including avoidance of closed-loop uncertainty growth, fast online computational times, and probabilistic guarantees on constraint satisfaction. In addition, sampled GPs lead to deterministic models that can be easily handled in a hybrid modeling framework. In contrast, obtaining statistical moments for stochastic uncertainty propagation for hybrid models is difficult. The paper consists of the following sections. In Sect. 8.2, the problem definition is given. Thereafter, in Sect. 8.3, we outline the solution approach. Section 8.4 outlines the semi-batch bioprocess case study to be solved, while in Sect. 8.5 results and discussions for this case study are presented. Section 8.6 concludes the paper.
194
E. Bradford et al.
8.2 Problem Formulation The goal in this work is the use of GPs for hybrid modeling and a NMPC formulation exploiting this model for control. Hybrid modeling in this paper refers to the combination of first principles modeling and data-driven modeling. It is assumed that a physical model is available in continuous time with corresponding input/output data. Furthermore, it is assumed that a part of this physical model cannot be easily derived from first principles, and therefore GPs are exploited to identify this part. It is commonly assumed for most GP works that the function values required can be directly observed; however, the GP is now part of the physical model. To overcome this issue we propose the use of MAP to estimate these function values. The prior in this regard is an important component to avoid overfitting. The main advantage of GPs is their probabilistic nature, which quantifies the residual uncertainty of the identified functions. It is important to consider this uncertainty to prevent constraint violations. MC samples of the GPs are generated offline to tighten constraints of the NMPC using explicit back-offs, which in turn ensure the satisfaction of a probabilistic constraint online. The continuous-time equations need to be discretized, and hence the GP uncertainty needs to be accounted for at several points. A new sampling approach is, therefore, proposed, which samples the GP at the discretization points. The dynamic system in this paper is assumed to be given by a nonlinear equation system with additive disturbance noise and an unknown function q(·, uk ): xk+1 = F(xk , uk , q(·, uk )) + ωk , x0 ∼ N (μx0 , Σ x0 ),
(8.1)
where xk ∈ Rn x represent the states, uk denotes the control inputs, q : Rn x × Rn u → Rn q are unknown nonlinear functions, and F : Rn x × Rn u × Rn q → Rn x are known nonlinear functions. The initial condition x0 is assumed to follow a Gaussian distribution with mean μx0 and covariance Σ x0 . Additive disturbance noise is denoted by ωk , which is assumed to follow a Gaussian distribution with zero mean and covariance matrix Σ ω , ωk ∼ N (0, Σ ω ). Note most first principles models are given in continuous-time, which has important implications on the unknown function q(·, uk ). For example, this model needs to be well-identified not only at discrete times. Let δt = tk − tk−1 be a constant sampling time at which measurements are taken. The corresponding continuous-time model to F(·) is represented by f(·) assuming a zero hold
tk+1
F(xk , uk , q(·, uk )) =
f(x(t), uk , q(x(t), uk ))dt + xk ,
(8.2)
tk
where xk = x(tk ) is the value of the state at discrete-time k. In general q(·) may be composed of n q separate scalar functions: q(x, u) = [q1 (q1in (x, u)), . . . , qn q (qninq (x, u))]T
(8.3)
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model … n
195
with n q separate input functions qiin : Rn x × Rn u → R qi for i = 1, . . . , n q . Note that these input functions are assumed to be known, since commonly the unknown function denotes an unmodeled physical process, for which the inputs are known a priori. The input dimension n qiin is usually much lower than the dimension of states and control inputs combined, and therefore modeling these components can be considerably more data-efficient than determining the full state-space model from data instead. The measurement at discrete-time t = tk can be expressed as follows: in
yk = Hxk + νk ,
(8.4)
where yk is the corresponding measurement, H ∈ Rn y ×n x is the linear observation model, and νk denotes additive measurement noise with zero mean and a covariance matrix Σ ν . The control objective is to minimize a finite-horizon cost function: VT (x0 , U) = E
T −1
(xk , uk ) + f (xT ) ,
(8.5)
k=0
where T ∈ N is the time horizon, U = [u0 , . . . , uT −1 ]T ∈ RT ×n u is a joint matrix over all control inputs for time horizon T , : Rn x × Rn u → R represent the stage costs, and f : Rn x → R is the terminal cost. The control inputs are subject to hard constraints: uk ∈ U ∀k ∈ {0, . . . , T − 1},
(8.6)
where U ∈ Rn u denotes the set of feasible control inputs. The states are subject to the satisfaction of a joint nonlinear chance constraint over the time horizon T , which can be stated as P
T
{xk ∈ Xk } ≥ 1 − ,
(8.7a)
k=0
where Xt is defined as Xk = {x ∈ Rn x | g (k) j (x) ≤ 0, j = 1, . . . , n g }.
(8.7b)
The state constraint requires the joint event of all xk for all k ∈ {0, . . . , T } fulfilling the nonlinear constraint sets Xk to have a probability greater than 1 − . It is assumed that f(·) and qiin for i = 1, . . . , n q are known, while q(·) is unknown and needs to be identified from data. We assume we are given N noisy measurements according to Eq. 8.4, which are represented by the following two matrices:
196
E. Bradford et al.
Z = [zk(1) , . . . , zk(N ) ]T ∈ R N ×n z , Y=
(1) (N ) T , . . . , yk+1 ] [yk+1
∈R
N ×n y
(8.8a) ,
(8.8b)
where zk(i) = (xk(i) , uk(i) ) is a tuple of xk(i) and uk(i) , which are the i-th input of the data (i) at discrete-time k with corresponding noisy measurements given by yk+1 at discretetime k + 1. The matrix Z is a collection of input data with the corresponding noisy observations collected in Y. The uncertainty in this problem arises in part from the additive disturbance noise ω and from the noisy initial condition x0 . The more important source of noise, however, originates from the unknown function q(·), which is identified from only finite amount of data. To solve this problem, we train GPs to approximate q(·) from the data in Eq. 8.8. In the next section, we first introduce GPs to model the function q(·), which then also represent the residual uncertainty of q(·). This uncertainty representation is, thereafter, exploited to obtain the required stochastic constraint satisfaction of the closed-loop system.
8.3 Solution Approach 8.3.1 Gaussian Process Hybrid Model Training In this section, we introduce GPs to obtain a probabilistic model description for q(·). GPs are an example of flexible non-parametric models. The main appeal of GPs is the fact that they require little prior process knowledge and quantify the residual model uncertainty. Formally, a GP is a collection of random variables of which any finite subset follows a Gaussian distribution. For more information on GPs refer to [45, 54]. A GP describes a distribution over functions and can be viewed as a generalization of multivariate Gaussian distributions, and hence can be utilized as a prior probability distribution on unknown functions in a Bayesian framework. In Fig. 8.1, we illustrate a prior GP in the top graph and the posterior GP in the bottom graph. The GP prior uses a smooth covariance function, and hence the resulting samples are also smooth. The mean function is assumed to be 0. Once the GP has been updated with data, it can be seen that the mean function passes closely through the data points. One can also notice that close to these observations the uncertainty is greatly reduced; however, areas far from observations exhibit greater uncertainty. Consequently, the resulting GP samples which are contained with a high probability inside the confidence regions are shown. We use a separate GP for each component qi (qiin (x, u)) for i = 1, . . . , n q , which is a standard practice in the GP community to handle multivariate outputs [17]. Let i refer to the GP of function i of q. Assume qi (·) is distributed as a GP with mean function m i (·) and covariance function ki (·, ·), which fully specifies the GP prior
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
197
Fig. 8.1 Illustration of a GP of a one-dimensional function perturbed by noise. On the top the prior of the GP is shown, while on the bottom the Gaussian process was fitted to several data points to obtain the posterior
qi (·) ∼ GP(m i (·), ki (·, ·)).
(8.9)
The choice of the mean and covariance function define the GP prior. In this study, we use a zero mean function and the squared-exponential (SE) covariance function: m i (qiin ) := 0,
1 ki (qiin , qiin ) := ζi2 exp − (z − z )T i (z − z ) , 2
(8.10a) (8.10b)
where qiin , qiin ∈ Rn z are arbitrary inputs, ζi2 denotes the covariance magnitude, and i := diag(λ1 , . . . , λn z ) is a scaling matrix. Remark 8.1 (Prior assumptions) Zero mean can be realized by normalizing the data. Choosing the SE covariance function assumes the function to be modeled qi (·) is smooth, and the stochastic process to be stationary. Now assume we are given N values of qi (·) , which we jointly denote as Qi = [qi(1) , . . . , qi(N ) ]T .
(8.11)
Assume these correspond to their q(·) values at the inputs defined in Z in Eq. 8.8. The corresponding input response matrices to Z are then given by
198
E. Bradford et al. in (1) in (N ) T Qin i = [qi (zk ), . . . , qi (zk )] .
(8.12)
According to the GP prior, the data vectors Qi follow the multivariate normal distribution: (8.13) Qi ∼ N (0, Σ Qi ), where [Σ Qi ]lm = ki (qiin (zk(l) ), qiin (zk(m) )) + σνi2 δl,m for each pair (l, m) ∈ {1, . . . , N }2 and σνi2 is a pseudo-noise term for regularization [45]. In essence this places a likelihood on the training dataset based on the continuity and smoothness assumptions made by the choice of the covariance function. The length-scales and hyperparameters introduced are jointly denoted by Ψ i = [λ1 , . . . , λn qin , ζi2 , σνi2 ]T . i Given a value of Qi , we can further determine a likelihood for values not part of Qi using conditioning. Let Qˆ i represent Nˆ such values at the inputs: ˆ in (1) ˆ in Q zk ), . . . , qiin (ˆzk( N ) )]T . i = [qi (ˆ
(8.14)
ˆ i follow a joint Gaussian distribution: From the prior GP assumption Qi and Q
Σ Qi Σ TQˆ ,Q Qi 0 i i , , ˆi ∼N 0 Q Σ Qˆ i ,Qi Σ Qˆ i
(8.15)
where [Σ Qˆ i ]lm = k(qiin (ˆzk(l) ), qiin (ˆzk(m) )) + σνi2 δl,m for each pair (l, m) ∈ {1, . . . , Nˆ }2
and [Σ Qˆ i ,Qi ]lm = k(qiin (ˆzk(l) ), qiin (zk(m) )) for each pair (l, m) ∈ {1, . . . , N } × {1, . . . , Nˆ }. ˆ i conditioned on Qi is given by [45] The likelihood of Q ˆ i ∼ N (μ ˆ |Qi , Σ ˆ |Qi ), Q Qi Qi
(8.16)
−1 T where μQˆ i |Qi := Σ TQˆ ,Q Σ −1 ˆ i |Qi = Σ Q ˆ i − ΣQ ˆ i ,Qi . ˆ i ,Qi Σ Qi Σ Q ˆ i Qi and Σ Q Q i i So far the treatment of GPs has been relatively standard; however, we are unable ˆ i directly. This problem is a common occurrence for latent stateto observe Qi and Q space models, for which MCMC sampling [20] or maximum a posteriori (MAP) [26] has been applied. In this paper, we apply MAP to obtain the required vectors Qi , for which we require the following likelihood based on Eqs. 8.7 and 8.4:
yk+1 ∼ N (Hxk+1 , Σ ν + HΣω HT ),
(8.17a)
t where xk+1 = tkk+1 f(x(t), uk , q(x(t), uk ))dt + xk is dependent on the dynamics and crucially on the unknown function q(·). ˆ i , and Ψ i , respectively, i.e., ˆ and Ψ refer to the joint Qi , Q Let Q, Q, ˆ = [Q ˆ 1, . . . , Q ˆ n q ], Ψ = [Ψ 1 , . . . , Ψ n q ]. Q = [Q1 , . . . , Qn q ], Q
(8.18)
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
199
Based on the different likelihoods, we can now write down the likelihood equation for the data: ˆ Ψ , Z) ∝ p(Y|Q, Q, ˆ Z) p(Q|Q, ˆ p(Y|Q, Q, Ψ , Z) ˆ p(Ψ ), × p(Q|Ψ , Z) p(Q) p(Q)
(8.19a)
where the different likelihoods are given as follows: ˆ Z) p(Y|Q,
=
N
( j)
( j)
( j)
T ˆ ˆ N (yk+1 ; F(x k , uk , Q), Σ ν + HΣω H ),
(8.19b)
ˆ i ; μ ˆ |Qi , Σ ˆ |Qi ), N (Q Qi Qi
(8.19c)
N (Qi ; 0, Σ Qi ),
(8.19d)
j=1
ˆ p(Q|Q, Ψ , Z)
=
nq i=1 nq
p(Q|Ψ , Z)
=
i=1
ˆ k , uk , Q) ˆ refers to a discretized state-space model, for which Q ˆ represents where F(x the values of q(·) at the discretization points. The likelihoods stated above can be ˆ Z) is the likelihood of the observed data given understood as follows: p(Y|Q, Q, ˆ ˆ ˆ given Q, Ψ , Z, and lastly p(Q|Ψ , Z) Q, Z, p(Q|Q, Ψ , Z) is the likelihood of Q refers to the likelihood of Q given Ψ , Z. ˆ k , uk , Q) ˆ can in general repExample 8.2 (Example discretization for MAP) F(x resent any valid discretization rule. Assume, for example, we apply the trapezium rule for discretization, then we obtain the following relation for the known input ( j) ( j) ( j) zk = (xk , uk ): ( j) ( j) ( j) ( j) ( j) ( j) ( j) ( j) xk+1 = xk + 0.5δt f(ˆx1 , uk , qˆ 1 ) + f(ˆx2 , uk , qˆ 2 ) ( j)
(8.20)
where xˆ i and qˆ 1 refer to the ith state and q-value of the discretization rule and ˆ k , uk , Q) ˆ = x( j) . These states xˆ 1( j) = x( j) and xˆ 2( j) = x( j) , however, note in genF(x k+1 k k+1 eral that the initial- and end point may not be part of the discretization points. Let the number of discretization points required per interval be given by ds , such that for the trapezium rule above ds = 2. The corresponding matrices required for the ˆ = [qˆ 1(1) , . . . , qˆ (1) , . . . , qˆ 1(N ) , . . . , qˆ (N ) ]T ∈ R(ds N )×n q MAP likelihood are given by Q ds ds ( j) ( j) ( j) (1) (1) and Zˆ = [ˆz1 , . . . , zˆ ds , . . . , zˆ 1(N ) , . . . , zˆ d(Ns ) ]T ∈ R(ds N )×n q , where zˆ i = (ˆxi , uk ). For implicit integration rules like the one above either a Newton solver needs to be ( j) employed or the unknown values xˆ 2 are added to the optimization variables with Eq. 8.20 as additional equality constraints for each training data-point.
200
E. Bradford et al.
ˆ and p(Ψ ) are prior distributions of Q, Q, ˆ The remaining likelihoods p(Q), p(Q), and Ψ , respectively. These are a helpful tool to avoid overfitting and can be used to easily integrate prior knowledge into the optimization problem, e.g., knowledge on the approximate magnitude of Q. Refer to [26], for example, on how priors can be used to incorporate prior knowledge on latent variables, such as Q. The required values for Q and Ψ are then found by minimizing the negative log-likelihood of Eq. 8.19a: ˆ ∗ ) ∈ argmin L (Q, Ψ ) = − log p(Y|Q, Q, ˆ Ψ , Z), (Q∗ , Ψ ∗ , Q
(8.21)
ˆ Q,Ψ ,Q
ˆ ∗ are the required maximum a posteriori (MAP) estimates. where Q∗ , Ψ ∗ , Q In the following sections, we assume that the GP has been fitted in this way such that we have MAP values Q∗ and Ψ ∗ . The predictive distribution of q(·) at an arbitrary input z = (x, u) given the dataset D = (Z, Q∗ ) is as follows [54]: q(z)|D ∼ N (μq (z; D), Σ q (z; D))
(8.22a)
with −1 ∗ T ∗ T μq (z; D) = [k1T Σ −1 Q1 Q1 , . . . , kn q Σ Qn q , Qn q ]
Σ q (z; D) = −1 2∗ 2∗ 2∗ T diag ζ12∗ + σν1 − k1T Σ −1 Q1 k1 , . . . , ζn q + σνn q − kn q Σ Qn q kn q ,
(8.22b)
(8.22c)
where ki = [ki (qiin (z), qiin (zk(1) )), . . . , ki (qiin (z), qiin (zk(N ) ))]T .
8.3.2 Hybrid Gaussian Process Model Predictive Control Formulation In this section, we define the NMPC optimal control problem (OCP) based on the GP hybrid nominal model fitted in the previous section, where the nominal model refers to the mean function in Eq. 8.22. The initial state for the GP hybrid NMPC formulation is assumed to be measured or estimated, and propagated forward using Eq. 8.1. The predicted states are exploited to optimize the objective subject to tightened constraints. Let the corresponding optimization problem be denoted as PT μq (·; D); x, k for the current known state x at discrete-time k based on the mean function μq (·; D):
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
ˆ k:T −1 ) = minimize VˆT (x, k, U ˆ k:T −1 U
T −1
201
(ˆx j , uˆ j ) + f (ˆxT )
j=k+1
subject to: ˆ x j , uˆ j , μq (ˆz(t); D)), zˆ (t) = (ˆx(t), uˆ j ) ∀ j ∈ {k, . . . , T − 1} xˆ j+1 = F(ˆ
(8.23)
xˆ j+1 ∈ X j+1 , uˆ j ∈ U ∀ j ∈ {k, . . . , T − 1} xˆ k = x, ˆ and VˆT (·) refer to the states, control inputs, and control objective of the where xˆ , u, ˆ k:T −1 = [uˆ k , . . . , uˆ T −1 ]T , and Xk is a tightened constraint set MPC formulation, U denoted by Xk = {x ∈ Rn x | gi(k) (x) + bi(k) ≤ 0, i = 1, . . . , n g }. The variables bi(k) represent so-called back-offs, which tighten the original constraints Xk defined in Eq. 8.7. Remark 8.3 (Objective in expectation) Note the objective above in Eq. 8.23 aims to determine the optimal trajectory for the nominal and not the expectation of the objective as defined in Eq. 8.5, since it is computationally expensive to obtain the expectation of a nonlinear function [23]. The NMPC algorithm solves PT μq (·; D); xk , k at each sampling time tk given the current state xk to obtain an optimal control sequence: ∗ ˆ k:T ˆ k∗ μq (·; D); xk , k , . . . , uˆ ∗T −1 μq (·; D); xk , k ]T . U −1 μq (·; D); xk , k = [u (8.24) Only the first optimal control action is applied to the plant at time tk before the same optimization problem is solved at time tk+1 with a new state measurement xk+1 . This procedure implicitly defines the following feedback control law, which needs to be repeatedly solved for each new measurement xk : κ(μq (·; D); xk , k) = uˆ k∗ μq (·; D); xk , k . (8.25) It is explicitly denoted that the control actions depend on the GP hybrid model used. Remark 8.4 (Full state feedback) Note that in the control algorithm we have assumed full state feedback, i.e., it is assumed that the full state can be measured without noise. This assumption can be dropped by introducing a suitable state estimator based on the nominal GP into the closed-loop simulations.
8.3.3 Closed-Loop Monte Carlo Sample In Eq. 8.25, the control policy is stated, which is obtained by repeatedly solving the optimization problem in Eq. 8.23 with updated initial conditions. GPs are distribution
202
E. Bradford et al.
over functions and hence a GP sample describes a deterministic function. An example of this can be seen in Fig. 8.1, in which several GP samples are depicted. In this section, we outline how MC samples of GPs can be obtained for a finite time horizon, each describing separate state trajectories according to Eq. 8.1. These are then exploited in the next section to tighten the constraints defined in the previous section. In general, exact GP realizations cannot be obtained by any known approach, since generating such a sample would require sampling an infinite-dimensional stochastic process. Instead, approximate approaches have been applied, such as spectral sampling [12]. Exact samples of GP are, however, possible if the GP only needs to be evaluated at a finite number of points. This is, for example, the case for discrete-time GP state-space models as proposed in [16, 52]. We next outline this technique and show how this can be extended to the continuous-time case for hybrid GP models, in which discretization is applied. Assume we are given a state-space model defined as in Sect. 8.2 in Eq. 8.1, and a fitted GP model for q(·) determined from Sect. 8.3.1. The predictive distribution based on available data D = (Z, Y) is then given by Eq. 8.22. The aim here is to show how to obtain a sample of the state sequence, which can be repeated multiple times to obtain multiple possible state sequences. The initial condition x0 follows a known Gaussian distribution as defined in Eq. 8.1. Let the state sequence be given by (s) T (8.26) X (s) = [χ(s) 0 , . . . , χT ] , (s) where χ(s) k represents the state sequence of a GP realization s and χk the state of this realization at time t = tk . Further, let the corresponding control actions at time t = tk be denoted by u(s) k . The control actions are assumed to be the result of the GP nominal NMPC feedback policy defined in Eq. 8.25 and hence can be stated as (s) u(s) k = κ(μq (·; D); χk , k).
(8.27)
We denote the control actions over the time horizon T jointly as (s)
(s)
(s)
(s)
U (s) = [u0 , . . . , uT −1 ]T = [κ(μq (·; D ); χ0 , 0), . . . , κ(μq (·; D ); χT −1 , T − 1)]T ,
(8.28)
which are different for each MC sample s due to feedback. To obtain a sample of a state sequence, we first need to sample the initial state x0 ∼ N (μ x 0 , Σ x 0 ) to attain the realization χ(s) 0 . Thereafter, the next state is given by Eq. 8.2, which is dependent on the fitted GP of q(·). An exact approach to obtain an independent sample of a GP is as follows. Any time the GP needs to be evaluated at a certain point, the response at this point q(·) is sampled according to the predictive distribution in Eq. 8.22. This sampled point is then part of the sampled function path, and hence the GP needs to be conditioned on it. This necessitates treating this point as a noiseless pseudo-training point without changing the hyperparameters. Note if the sample path were to return to the same evaluation point, it would then lead to the
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
203
same output due to this conditioning procedure. Consequently, the sampled function is deterministic as expected. Furthermore, we also need to sample ωk ∼ N (0, Σ ω ) for each k. We refer to these realizations as w(s) k . We assume Eq. 8.2 has been adequately discretized, such that the GP of q(·) needs to be evaluated at only a finite number of points. The state sequence for realization s can then be given as follows: (s) ˆ (s) (s) (s) ∀k ∈ {1, . . . , T }, χ(s) k+1 = F(χk , uk , Qk ) + wk
(8.29)
where Qk(s) are discretization points sampled from the GP following the procedure ˆ represents the discretized version of Eq. 8.2. outlined above and F(·) Example 8.5 We give an example here of the procedure above exploiting the trapezˆ (s) , u(s) , Q (s) ). Note the covariance matrix and dataset of the GPs ium rule for F(χ k k k are updated recursively. Assume we are at time k for MC sample s, and the covari(s) (s) ∗(s) ance matrix of qi (·) is given by Σ (s) Qik with the updated dataset Dk = (Zk , Qk ), k ∗(s) (s) (s1) where Q∗(s) = [Q∗(s) , . . . , z(s N ) ]T as in Sect. 8.3.1. The k 1k , . . . , Qn q k ] and Zk = [z dataset size N k = N + (k − 1) × ds , since at each time step k, ds discretization points are added to the dataset. Let the number of discretization points per time interval be given by ds , for the trapezium rule ds = 2. The discretization points then follow the distribution: ˆ (s) ∈ Rds ∼ N (μ ˆ (s) |Q(s) , Σ ˆ (s) |Q(s) ), Q ik ik ik Q Q ik
(8.30)
ik
(s) (s) := Σ TQˆ (s) ,Q(s) Σ −1 Q(s) and Σ Qˆ (s) |Qik = Σ Qˆ (s) − Σ TQˆ (s) ,Q(s) Σ Q(s) where μQˆ (s) |Qik ˆ (s) ik ik
ik
ik
Qik
ik
ik
ik
ik
ik
Σ Qˆ (s) ,Q(s) , [Σ Qˆ (s) ]lm = k(qiin (ˆzk(sl) ), qiin (ˆzk(sm) )) for each pair (l, m) ∈ {1, . . . , ds }2 ik
ik
ik
and [Σ Qˆ (s) ,Q(s) ]lm = k(qiin (ˆzk(sl) ), qiin (zk(sm) )) for each pair (l, m) ∈ {1, . . . , N k } × ik ik {1, . . . , ds }. Firstly, we sample ds independent standard normally distributed ξi ∈ Rn q ∼ N (0, I) for each GP i. The sampled discretization points can then be expressed by 1
(s) (s) (s) = μQˆ (s) |Qik + ξi · Σ 2ˆ (s) |Qik ), Qik Qik
ik
(8.31)
(s) (s) , Σ Qˆ (s) |Qik ). where Qk(s) ∈ Rds ∼ N (μQˆ (s) |Qik ik
ik
Once Qk(s) has been sampled we arrive at the next state k + 1 for the MC sample as follows: ( j) ( j) ( j) ( j) (s) (s) (s) ˆ 1 , uk , Q1k ) + f(χˆ 2 , uk , Q2k ) , (8.32) χ(s) k+1 = χk + 0.5δt f(χ ( j)
( j)
ˆ 2 = χ(s) where χˆ 1 = χ(s) k and χ k+1 , since for the trapezium rule the discretization points coincide with the initial and the end-points, which is not true for other discretization rules. The value of the inputs for the discretization points are consequently
204
E. Bradford et al.
given by zˆ k(sl) = (χˆ l(s) , u(s) k ). For implicit integration rules like the one above a Newton solver needs to be employed using Eqs. 8.31 and 8.32. Equation 8.31 is required ( j) due to the dependency of zˆ k(sl) on χˆ l . Lastly, the data matrices for the particular MC sample need to be updated as follows: (s) (s1) ˆ k , . . . , zˆ k(sds ) ]T , Z(s) k+1 = [Zk , z
(8.33a)
∗(s)T Q∗(s) , Qk(s)T ]T , k+1 = [Qk
[Σ Qik ]lm =
(8.33b)
ki (qiin (zk(sl) ), qiin (zk(sm) ))
+
σνi2 δl,m
∀(l, m) ∈ {1, . . . , N
} . (8.33c)
k+1 2
Repeating this procedure multiple times then gives us multiple MC samples of the state sequence X (s) . The aim then is to use the information obtained from these sequences to iteratively tighten the constraints for the GP NMPC problem in Eq. 8.23 to obtain the probabilistic constraint satisfaction required from the initial problem definition in Sect. 8.2.
8.3.4 Constraint Tightening Using Monte Carlo Samples This section outlines how to systemically tighten the constraints based on MC samples using the procedure outlined in the previous chapter. Firstly, define the function C(·), which is a single-variate random variable that represents the satisfaction of the joint chance constraints: C(X) =
max
( j,k)∈{1,...,n g }×{0,...,T }
FC(X) = P {C(X) ≤ 0} = P
g (k) j (xk ),
T
(8.34a)
{xk ∈ Xk } ,
(8.34b)
k=0
where X = [x0 , . . . , xT ]T defines a state sequence, and Xk = {x ∈ Rn x | g (k) j (x) ≤ 0, j = 1, . . . , n g }. The evaluation of the probability in Eq. 8.34 is generally intractable, and instead a non-parametric sample approximation is applied, known as the empirical cumulative distribution function (ecdf). Assuming we are given S MC samples of the state trajectory X and hence of C(X), the ecdf estimate of the probability in Eq. 8.34 can be defined as follows: S 1 1{C(X (s) ) ≤ 0}, FC(X) ≈ FˆC(X) = S s=1
(8.35)
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
205
where X (s) is the s-th MC sample and FˆC(X) is the ecdf approximation of the true probability FC(X) . The accuracy of the ecdf in Eq. 8.35 significantly depends on the number of samples used and it is, therefore, paramount to account for the residual uncertainty of this sample approximation. This problem has been previously studied in statistics, for which the following probabilistic lower bound has been proposed known as “exact confidence bound” [15]: Theorem 8.6 (Confidence interval for empirical cumulative distribution function) Assume we are given a value of the ecdf, βˆ = FˆC(X) , as defined in Eq. 8.35 based on S independent samples of C(X), then the true value of the cdf, β = FC(X) , as defined in Eq. 8.34 has the following lower confidence bounds: P β ≥ βˆlb ≥ 1 − α,
ˆ S βˆ , βˆlb = betainv α, S + 1 − S β,
(8.36)
Proof The proof uses standard results in statistics and can be found in [15, 49]. In other words, the probability that the probability defined in Eq. 8.34, β, exceeds βˆlb is greater than 1 − α. Consequently, for small α βˆlb can be seen as a conservative lower bound of the true probability β accounting for the statistical error introduced through the finite sample approximation. Based on the definition of C(X) and the availability of S closed-loop MC simulations of the state sequence X, assuming we are given a value for βˆlb according to Eq. 8.36 with a confidence level of 1 − α, then the following Corollary holds: Corollary 8.7 (Feasibility probability) Assuming the stochastic system in Eq. 8.1 is a correct description of the uncertainty of the system including the fitted GP and ignoring possible inaccuracies due to discretization errors, and given a value of the lower bound βˆlb ≥ 1 − defined in Eq. 8.36 with a confidence level of 1 − α, then the original chance constraint in Eq. 8.7 holds true with a probability of at least 1 − α. Proof The realizations of possible state sequences described in Sect. 8.3.3 are exact within an arbitrary small discretization error, and therefore these S independent state trajectories X provide a valid lower bound βˆlb from Eq. 8.36 to the true cdf value β. If βˆlb is greater than or equal to 1 − , then the following probabilistic bound holds on the true cdf value β according to Theorem 1: P β ≥ βˆlb ≥ 1 − ≥ 1 − α, which in other words means that β = P {C(X ≤ 0)} ≤ 1 − with a probability of at least 1 − α. Now assume we want to determine back-off values for the nominal GP NMPC algorithm in Eq. 8.23, such that βlb is equal to 1 − for a chosen confidence level 1 − α. This in turn guarantees the satisfaction of the original chance constraint with a probability of at least 1 − α. The update rule to accomplish this has two steps: Firstly, an approximate constraint set is defined and secondly this set is iteratively adjusted. The approximate constraint set should reflect the difference of the constraint
206
E. Bradford et al.
values for the state sequence of the nominal MPC model and the constraint values of possible state sequence realizations of the real system in Eq. 8.1. The back-offs are first set to zero and S MC samples are run according to Sect. 8.3.3. Now assume we aim to obtain back-off values that imply satisfaction of individual chance constraints as follows to attain an approximate initial constraint set: (k) (k) g (k) j (χk ) + b j = 0 =⇒ P g j (χk ) ≤ 0 ≥ 1 − δ,
(8.37)
where δ is a tuning parameter and should be set to a reasonably low value and χk refers to states according to the nominal trajectory as defined in Sect. 8.3.3. It is proposed in [42] to exploit the inverse ecdf to fulfill the requirement given in Eq. (8.37) using the S MC samples available. The back-offs can then be stated as (k) (k) ˆ −1 b˜ (k) j = F (k) (1 − δ) − g j (χk ) ∀( j, k) ∈ {1, . . . , n g } × {1, . . . , T }, gj
(8.38)
˜ (t) refers to these where Fˆ −1 (t) denotes the inverse of the ecdf given in Eq. 8.35 and b j gj
initial back-off values. The inverse of an ecdf can be determined by the quantile values of the S constraint values from the MC samples with cut-off probability 1 − δ. This first step gives us an initial constraint set that depends on the difference between the nominal prediction χk as used in the MPC and possible state sequences according to the MC simulations. The parameter δ in this case is only a tuning parameter to obtain the initial back-off values. In the next step, these back-off values are modified using a back-off factor γ : ˜ (k) ∀( j, k) ∈ {1, . . . , n (k) b(k) g } × {1, . . . , T }. j = γ bj
(8.39)
A value of γ is sought for which the lower bound βlb is equal to 1 − to obtain the required chance constraint satisfaction in Eq. 8.7, which can be formulated as a root finding problem: (8.40) h(γ ) = βˆlb (γ ) − (1 − ), where the aim is to determine a value of γ , such that h(γ ) is approximately zero. βˆlb (γ ) refers to the implicit dependence of βˆlb on the S MC simulations resulting from the tightened constraints of the nominal GP NMPC algorithm according to Eq. 8.39. In other words, the back-off values of the NMPC are adjusted until they return the required chance constraint satisfaction in Eq. 8.7. To drive h(γ ) to zero we employ the bisection technique [4], which seeks the root of a function in an interval aγ and bγ , such that h(aγ ) and h(bγ ) have opposite signs. It is expected that a too high value of the back-off factor leads to a highly conservative solution with a positive sign of h(·), while a low value of the back-off factor often results in negative values of h(·). In our algorithm, the initial aγ is set to zero to evaluate b˜ (k) j in the first step. The bisection method repeatedly bisects the interval, in which the root is contained.
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
207
The output of the algorithm are the required back-offs in n b back-off iterations. The overall procedure to attain the back-offs is given in Algorithm 8.1. Algorithm 8.1 Back-off iterative updates ˆ k:T −1 ), Xk , Uk , , α, δ, S, n b Input: μx0 , Σ x0 , μq (z; D ), Σ q (z; D ), D , T , VT (x, k, U (k)
Initialize: Set all b j = 0 and δ to some reasonable value, set aγ = 0 and bγ to some reasonably high value, such that bγ − (1 − ) has a positive sign. for n b back-off iterations do if n b > 0 then cγ := (aγ + bγ )/2 (t) (t) (t) b j := cγ b˜ j ( j, t) ∈ {1, . . . , n g } × {1, . . . , T } end if (t) Define GP NMPC in Equation 8.23 with back-offs b j (s) Run S MC simulations to obtain X using the GP NMPC policy with updated back-offs S 1{C(X (s) ) ≤ 0} βˆ := FˆC(X (s) ) = 1S s=1 ˆ S βˆ βˆlb := betainv α, S + 1 − S β, if nb = 0 then (t) (t) (t) b˜ j = Fˆ −1 (t) (δ) − g j (χt ) ∀( j, t) ∈ {1, . . . , n g } × {1, . . . , T } gj
a βˆlbγ := βˆlb − (1 − ) else c βˆlbγ := βˆlb − (1 − ) c a if sign(βˆlbγ ) = sign(βˆlbγ ) then aγ := cγ a c βˆlbγ := βˆlbγ else bγ := cγ end if end if end for (t) (t) Output: b j ∀( j, t) ∈ {1, . . . , n g } × {1, . . . , T }, βˆlb
8.3.5 Algorithm A summary of the overall algorithm proposed in this paper is given in this section. As a first step, the problem needs to be specified following the problem definition in Sect. 8.2. From the available data, the GP hybrid model needs to be trained as outlined in Sect. 8.3.1. Thereafter, the back-offs are determined offline iteratively following Algorithm 8.1. These back-offs then define the tightened constraint set for the GP NMPC feedback policy, which is run online to solve the problem initially outlined. An overall summary can be found in Algorithm 8.2.
208
E. Bradford et al.
Algorithm 8.2 Back-off GP NMPC Offline Computations 1. Build GP hybrid model from dataset D = (Z, Y) as shown in Section 8.3.1. 2. Choose time horizon T , initial condition mean μx0 and covariance Σ x0 , measurement covariance matrix Σ ν , disturbance covariance matrix Σ ω , stage costs and f , constraint sets Xk , Uk ∀k ∈ {1, . . . , T }, chance constraint probability , ecdf confidence α, tuning parameter δ, the number of back-off iterations n b , and the number of Monte Carlo simulations S to estimate the back-offs. Determine explicit back-off constraints using Algorithm 8.1. 3. Check final probabilistic value βˆlb from Algorithm 8.1 if it is close enough to . Online Computations for k = 0, . . . , T − 1 do 1. Solve the MPC problem in Equation 8.23 with the tightened constraint set from the Offline Computations. 2. Apply the first control input of the optimal solution to the 3. real plant. 4. Measure the current state xk . end for
8.4 Case Study The case study is based on a semi-batch reaction for the production of fatty acid methyl ester (FAME) from microalgae, which is considered a promising renewable feedstock to meet the growing global energy demand. FAME is the final product of this process, which can be employed as biodiesel [18]. We exploit a simplified dynamic model to verify the hybrid GP NMPC algorithm proposed in this paper. The GP NMPC has an economic objective, which is to maximize the FAME (biodiesel) concentration for the final batch product subject to two path constraints and one terminal constraint.
8.4.1 Semi-batch Bioreactor Model The simplified dynamic system consists of four ODEs describing the evolution of the concentration of biomass, nitrate, nitrogen quota, and FAME. We assume a fixed volume fed-batch reactor. The balance equations can be stated as follows [18]:
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
209
Table 8.1 Parameter values for ordinary differential equation system in Eq. 8.41 Parameter Value Units μM μd kq μN KN ks ki α θ γ τ δ φ β L
0.359 0.004 1.963 2.692 0.8 91.2 100.0 196.4 6.691 7.53 ×103 0.01 1.376 9.904 16.89 0.0 0.0044
h−1 h−1 mg g−1 mg g−1 h−1 mg L−1 μmol m−2 s−1 μmol m−2 s−1 L mg−1 m−1 – – – – – – m−1 m
kq N dC X C X − μd C X , C X (0) = C X 0 = 2μm (I0 , C X ) 1 − dt q N + KN
CN dC N C X + FN , C N (0) = C N 0 = −μ N (8.41) dt CN + K N
kq dq CN − μm (I0 , C X ) 1 − = μN q, q(0) = q0 dt CN + K N q
kq dFA = μm (I0 , C X )(θ q − FA) 1 − dt q
C N C X , FA(0) = FA0 , − γ μN CN + K N where C X is the concentration of biomass in gL−1 , C N is the nitrate concentration in mgL−1 , q is the dimensionless intracellular nitrogen content (nitrogen quota), and FA is the concentration of FAME (biodiesel) in gL−1 . Control inputs are given by the incident light intensity (I0 ) in μmol m−2 s−1 and nitrate inflow rate (FN ) in mg L−1 h−1 . The state vector is hence given by x = [C X , C N , q, FA]T and the input vector by u = [I0 , FN ]T . The corresponding initial state vector is given by x0 = [C X 0 , C N 0 , q0 , FA0 ]T . The remaining parameters can be found in Table 8.1 taken in part from [18]. The function μm (I0 , C X ) describes the complex effects of light intensity on the biomass growth, which we assume to be unknown in this study. This helps simplify
210
E. Bradford et al.
the model significantly, since these effects are dependent on the distance from the light source and hence would lead to a partial differential equation (PDE) model if modeled by first principles. The actual function can be given as follows to obtain values to train the hybrid GP: μM μm (I0 , C X ) = L
L
z=0
I (z, I0 , C X ) I (z, I0 , C X ) + ks +
I (z,I0 ,C X )2 ki
dz,
(8.42)
where I (z, I0 , C X ) = I0 exp −(α C X + β )z , z is the distance from the light source in m, and L is the reactor width in m.
8.4.2 Problem Setup The problem has a time horizon T = 12 with a batch time of 480 h, and hence a sampling time of 40 h. Next we state the objective and constraint functions according to the general problem definition in Sect. 8.2 based on the dynamic system in Eq. 8.41. Measurement noise covariance matrix Σ ν and disturbance noise matrix Σ ω are defined as Σ ν = 10−4 × diag 2.52 , 8002 , 5002 , 30002 , Σ ω = 10−4 × diag 0.12 , 2002 , 102 , 1002 .
(8.43a) (8.43b)
The mean and covariance of the initial condition are set to μx0 = [0.4, 0, 150, 0]T , Σ x0 = 10−3 × diag(0.22 , 0, 1002 , 0).
(8.44)
The aim of the control problem is to maximize the amount of biodiesel in the final batch with a penalty on the change of control actions. The corresponding stage and terminal costs can be given as (xt , ut ) = Tut Rut , f (xT ) = −FAT ,
(8.45)
where ut = ut − ut−1 and R = 5 × 10−3 × diag(1/4002 , 1/402 ). The objective is then defined by Eq. 8.5. There are two path constraints. Firstly, the nitrate is constrained to be below 800 mg/L. Secondly, the ratio of nitrogen quota q to biomass may not exceed 0.011 for high-density biomass cultivation. These are then defined as g1(t) = C N t − 800 ≤ 0
∀t ∈ {0, . . . , T },
(8.46a)
g2(t)
∀t ∈ {0, . . . , T }.
(8.46b)
= qt − 0.011C X t ≤ 0
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
211
Further, the nitrate should reach a concentration below 150 mg/L for the final batch. This constraint can be stated as g3(T ) (xT ) = C N T − 200 ≤ 0, g3(t) (xt ) = 0 ∀t ∈ {0, . . . , T − 1}.
(8.47)
The control inputs light intensity and nitrate inflow rate are subject to the following box constraints: 120 ≤ It ≤ 300 0 ≤ FN t ≤ 10
∀t ∈ {0, . . . , T }, ∀t ∈ {0, . . . , T }.
(8.48a) (8.48b)
The priors were set to the following values: p(Q) = N (−6 × 1, 50 × I),
(8.49a)
ˆ = N (−6 × 1, 50 × I), p(Q) −3 T
(8.49b) −6
p(Ψ ) = N ([0, 5 × 10 ] , diag(20 × I, 1 × 10 )).
(8.49c)
Maximum probability of violation was to = 0.1. To compute the back-offs, a total of S = 1000 MC iterations are employed for each iteration according to δ = 0.05 and α = 0.01. The number of back-off iterations was set to n b = 14.
8.4.3 Implementation and Initial Dataset Generation The discretization rule used for the MAP fit, for the GP MC sample, and for the GP NMPC formulation exploits direct collocation with fourth-order polynomials with the Radau collocation points. The MAP optimization problem and the GP NMPC optimization problem are solved using Casadi [2] to obtain the gradients of the problem using automatic differentiation in conjunction with IPOPT [53]. IDAS [24] is utilized to simulate the real plant. The input dataset Z was designed using the Sobol sequence [47] for the entire input data in the range zi ∈ [0, 3] × [0, 800] × [0, 600] × [0, 3500] × [120, 300] × [0, 10]. The ranges were chosen for the data to cover the expected operating region. The outputs Y were then obtained from the IDAS simulation of the system perturbed by Gaussian noise as defined in the problem setup.
8.5 Results and Discussions Firstly, the accuracy of the proposed hybrid GP model is verified by creating 1000 random datapoints. For these we calculate the absolute prediction error and the abso-
212
E. Bradford et al.
lute error over the standard deviation, which gives an indication on the accuracy of the uncertainty measure provided by the GP. These results are summarized in Fig. 8.2. For comparison purposes, three cases of the GP NMPC algorithm are compared. Firstly, we run the above case study using 30 datapoints and 50 datapoints. In addition, we compare this with the previously proposed GP NMPC algorithm in [11] that aims to model the dynamic state-space equations using GPs using 50 datapoints. Lastly, these three cases are further compared to their nominal variations, i.e., setting all back-offs in the formulations to zero. The results of these runs are highlighted in Figs. 8.3, 8.4, 8.5, 8.6, 8.7, and 8.8 and in Table 8.2. From these results, we can draw the following conclusions: • From Fig. 8.2, we can see in the first graph that the median absolute error decreases significantly going from a dataset size of 30 to 50, which is as expected. Overall the hybrid model predictions seem reasonably well. The GP error measure can be tested by dividing the absolute error by the standard deviation, for which the vast majority of values should be within approximately a range of 0 to 3. A value above 3 has a chance of 99.4% of occurrence according to the underlying Gaussian distribution. For N = 30, we observe no value above 3, while for N = 50 we observed 1.1% of values above 3. It can, therefore, be said that the error measure for N = 30 is more conservative, but both seem to show reasonable behavior. • From Figs. 8.3, 8.4, and 8.5, it can be seen that the hybrid approaches both lead to generally good solutions, while the non-hybrid approach is unable to deal with the spread of the trajectories for constraint g2 . Further, it can be seen that the uncertainty of GP hybrid 50 is less than GP hybrid 30 from the significantly smaller spread of constraint g2 , which is as expected given the observations from Fig. 8.2. • Figure 8.6 illustrates the better performance of GP hybrid 50 over GP hybrid 30 obtaining a nearly 40% increase in the objective on average. This is due to two reasons: Firstly, more data leads to better decisions on average; and secondly, lower uncertainty means that the GP hybrid 50 is less conservative than GP hybrid 30. Lastly, GP non-hybrid 50 achieves high objective values by violating the second constraint g2 by a substantial amount. • Figures 8.7 and 8.8 show that the nominal approach ignoring back-offs leads to constraint violations for all GP NMPC variations, while with back-offs the two hybrid approaches remain feasible throughout. GP non-hybrid 50 overshoots the constraint by a huge amount due to the NMPC becoming infeasible for the real plant. Overall, the importance of back-offs is shown to maintain feasibility given the presence of plant–model mismatch for both GP hybrid cases. However, for GP non-hybrid 50, the uncertainty is too large to attain a reasonable solution. • In Table 8.2, the average computational times are between 78ms and 174ms. It can be seen that the GP hybrid approaches have higher computational times, which is due to the discretization required in the NMPC optimization problem. Overall the computational time of a single NMPC iteration is relatively low, while the offline computational times required to attain the back-offs is relatively high.
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
213
Fig. 8.2 GP hybrid model cross-validation for dataset sizes N = 30 and N = 50 using 1000 randomly generated points in the same range as the training datapoints. The LHS graph shows the box plot of the absolute error, while the RHS graph shows the absolute error over the standard deviation
Fig. 8.3 The 1000 MC trajectories at the final back-off iteration of the nitrate concentration for the constraints g1 and g2 (LHS) and the ratio of bioproduct to biomass constraint g2 (RHS) for hybrid GP N = 30
Fig. 8.4 The 1000 MC trajectories at the final back-off iteration of the nitrate concentration for the constraints g1 and g2 (LHS) and the ratio of bioproduct to biomass constraint g2 (RHS) for hybrid GP N = 50
214
E. Bradford et al.
Fig. 8.5 The 1000 MC trajectories at the final back-off iteration of the nitrate concentration for the constraints g1 and g2 (LHS) and the ratio of bioproduct to biomass constraint g2 (RHS) for the non-hybrid GP with N = 50 modeling the entire state-space model
Fig. 8.6 Probability density function for the “real” plant objective values for GP hybrid N = 30 and N = 50 on the LHS, and for the non-hybrid GP with N = 50 on the RHS
Fig. 8.7 90th percentile trajectory values of the nitrate concentration for constraints g1 and g3 (LHS) and the ratio of the bioproduct constraint g2 (RHS) for all variations applied to the “real” plant with the final tightened constraint set
8 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model …
215
Fig. 8.8 90th percentile trajectory values of the nitrate concentration for constraints g1 and g3 (LHS) and the ratio of the bioproduct constraint g2 (RHS) for all variations applied to the “real” plant with back-off values set Table 8.2 Lower bound on the probability of satisfying the joint constraint βˆlb , average computational times to solve a single OCP for the GP NMPC, and the average computational time required to complete one back-off iteration Algorithm variation Probability βˆlb OCP time (ms) Back-off iteration time (s) GP hybrid 30 GP hybrid 50 GP non-hybrid 50
0.89 0.91 0.91
109 174 78
1316 2087 824
8.6 Conclusions In conclusion, a new approach is proposed to combine first principles derived models with GP regression for NMPC. In addition, it is shown how the probabilistic nature of the GPs can be exploited to sample functions of possible dynamic models. These in turn are used to determine explicit back-offs, such that closed-loop simulations of the sampled models remain feasible to a high probability. It is shown how probabilistic guarantees can be obtained based on the number of constraint violations of the simulations. Online computational times are kept low by carrying out the constraint tightening offline. Lastly, a challenging semi-batch bioreactor case study demonstrates the efficiency and potential for this technique to operate complex dynamic systems.
216
E. Bradford et al.
References 1. Alvarez, M., Luengo, D., Lawrence, N.D.: Latent force models. In: Artificial Intelligence and Statistics, pp. 9–16 (2009) 2. Andersson, J.A.E., Gillis, J., Horn, G., Rawlings, J.B., Diehl, M.: CasADi: a software framework for nonlinear optimization and optimal control. Math. Program. Comput. 1–36 (2018) 3. Azarpour, A., Borhani, T.N.G., Alwi, S.R.W., Manan, Z.A., Mutalib, M.I.A.: A generic hybrid model development for process analysis of industrial fixed-bed catalytic reactors. Chem. Eng. Res. Des. 117, 149–167 (2017) 4. Beers, K.J., Beers, K.J.: Numerical Methods for Chemical Engineering: Applications in Matlab. Cambridge University Press (2007) 5. Bhutani, N., Rangaiah, G.P., Ray, A.K.: First-principles, data-based, and hybrid modeling and optimization of an industrial hydrocracking unit. Ind. Eng. Chem. Res. 45(23), 7807–7816 (2006) 6. Biegler, L.T.: Nonlinear Programming: Concepts, Algorithms, and Applications to Chemical Processes, vol. 10. Siam (2010) 7. Bradford, E., Imsland, L.: Economic stochastic model predictive control using the unscented Kalman filter. IFAC-PapersOnLine 51(18), 417–422 (2018) 8. Bradford, E., Imsland, L.: Stochastic nonlinear model predictive control using Gaussian processes. In: 2018 European Control Conference (ECC), pp. 1027–1034. IEEE (2018) 9. Bradford, E., Imsland, L.: Output feedback stochastic nonlinear model predictive control for batch processes. Comput. Chem. Eng. 126, 434–450 (2019) 10. Bradford, E., Imsland, L., del Rio-Chanona, E.A.: Nonlinear model predictive control with explicit back-offs for Gaussian process state space models. In: 58th Conference on decision and control (CDC), page accepted. IEEE (2019) 11. Bradford, E., Imsland, L., Zhang, D., del Rio-Chanona, E.A.: Stochastic data-driven model predictive control using Gaussian processes. arXiv preprint arXiv:1908.01786 (2019) 12. Bradford, E., Schweidtmann, A., Lapkin, A.: Efficient multiobjective optimization employing Gaussian processes, spectral sampling and a genetic algorithm. J. Global Optim. 71(2), 407–438 (2018) 13. Campo, P.J., Morari, M.: Robust model predictive control. In: American Control Conference, 1987, pp. 1021–1026. IEEE (1987) 14. Cao, G., Lai, E.M.-K., Alam, F.: Gaussian process model predictive control of an unmanned quadrotor. J. Intell. Robot. Syst. 88(1), 147–162 (2017) 15. Clopper, C.J., Pearson, E.S.: The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4), 404–413 (1934) 16. Conti, S., Gosling, J.P., Oakley, J.E., O’Hagan, A.: Gaussian process emulation of dynamic computer codes. Biometrika 96(3), 663–676 (2009) 17. Deisenroth, M., Rasmussen, C.E.: PILCO: a model-based and data-efficient approach to policy search. In: Proceedings of the 28th International Conference on machine learning (ICML-11), pp. 465–472 (2011) 18. del RioChanona, E.A., Liu, J., Wagner, J.L., Zhang, D., Meng, Y., Xue, S., Shah, N.: Dynamic modeling of green algae cultivation in a photobioreactor for sustainable biodiesel production. Biotechnol. Bioeng. 115(2), 359–370 (2018) 19. Farina, M., Giulioni, L., Scattolini, R.: Stochastic linear model predictive control with chance constraints a review. J. Process Control 44, 53–67 (2016) 20. Frigola, R., Lindsten, F., Schön, T.B., Rasmussen, C.E.: Bayesian inference and learning in Gaussian process state-space models with particle MCMC. In: Advances in Neural Information Processing Systems, pp. 3156–3164 (2013) 21. 
Hermanto, M.W., Braatz, R.D., Chiu, M.: Integrated batchtobatch and nonlinear model predictive control for polymorphic transformation in pharmaceutical crystallization. AIChE J. 57(4), 1008–1019 (2011)
Chapter 9
Collision Avoidance for Mobile Robots Based on an Occupancy Grid
Tobias Sprodowski
Abstract Communication among distributed systems governed by a distributed model predictive control (DMPC) scheme, for example, mobile robots, is in many cases a necessary ingredient to steer such a system to its defined target in reasonable time. Considering the short sampling times required by moving robots, the communication burden grows with the density of robots in a shared continuous space. To attenuate the transmission load, many approaches rely on triggering events, which transmit an update when the system state changes significantly. Since robotic scenarios with fast dynamics require a constant communication exchange, we instead discuss quantization approaches to attenuate the communication effort: based on state quantization, different communication reduction strategies are presented, relying on differential updates, bounding boxes, and the estimation of the trajectories of the other robots. We demonstrate these approaches in simulations with mobile robots under non-cooperative control and derive additional sufficient conditions to ensure the collision avoidance constraints in the DMPC scheme.
9.1 Introduction

Empowered by Industrie 4.0, distributed systems have become a fundamental architecture in many production-related, robotic, or traffic scenarios, for example, production workstations extended to cyber-physical systems [41], industrial robots collaborating in manufacturing processes [1], or mobile robots fulfilling a common goal such as exploring an unknown environment as efficiently as possible with respect to the shortest path and the lowest energy consumption, see [42]. As full knowledge about the state of the overall system is often not available to each subsystem due to missing information or privacy concerns, a globally optimal solution computed by a central instance is therefore difficult to obtain.
The complexity of the overall system is in most cases increased by two factors: the computational load of nonlinear models, and scalability, as many subsystems (machines or vehicles) have to be considered. This led to the extension of MPC to distributed MPC, in which every subsystem is equipped with a controller and solves an optimal control problem (OCP) in a (non-)cooperative manner [2]. The problem can be divided by methods such as dual decomposition [9] or by establishing hierarchical structures [30]; alternatively, if the subsystems are naturally distributed (for example, mobile robots or vehicles) and their local states do not necessarily depend on external states, a coupling can be established via coupling constraints to ensure, for example, collision avoidance [8]. As the global system state is not available to all subsystems and information has to be exchanged, some coordination between the subsystems has to be established; Jia and Krogh proposed in [15] a coordinated DMPC scheme in which the controllers exchange min-max open-loop predictions with each other. The min-max predictions, induced by a bounded disturbance for each controller, are taken into account in the local OCP as constraints to guarantee feasibility. Other approaches focus on coupling constraints or coupling penalty functions among the subsystems to coordinate them [5, 6]. Venkat et al. showed in [40], by comparing decentralized and distributed control, that distributed systems have to incorporate communication providing the subsystems with information about the overall environment in order to approach the convergence time of a centralized system toward a stable state. Stewart et al. compared the performance for setpoint stabilization for linear and nonlinear systems in [37, 38], respectively; their implementation requires neither a hierarchical decomposition nor a central coordination instance. While communication can be carried out as broadcasts such that every controller receives all transmitted messages, other approaches restrict the information to neighboring subsystems or to successor/predecessor communication as stated in [8, 36]. We do not focus on reducing communication by reducing the number of communication links, but on the communication load between two controllers themselves. A reason for this can be the guarantee to deliver high-priority messages between controllers within a certain time delay, which mostly coincides with the requirement that the bandwidth of the communication channel is never exhausted. Hence, this chapter focuses on approaches that shrink the communication load on established communication links between controllers while still retaining a near-optimal solution. Therefore, in the following paragraphs, approaches for reducing the communication load on links between connected controllers are discussed. Then, an overview of methods to shrink the communication effort in distributed systems is presented, mostly considering event-triggering and quantization methods, before the idea of the occupancy grid is introduced.
9.1.1 The Burden of Communication

Depending on the type of system, communication between subsystems is subject to bandwidth and delay restrictions: one possibility to implement communication between mobile robots or vehicles is based on wireless protocols including ad-hoc networks, which are specified for vehicles by the IEEE as VANET [19, 24]. Considering highly dynamic models, for example, autonomous vehicles or mobile robots, response times have to be fast, and therefore the sampling time interval has to be short, see [14]. As MPC is implemented as a receding horizon control, the predicted plan, for example, discrete points along the planned trajectory, has to be communicated at each time instant. As most mobile robots use digital wireless communication channels to exchange information, the capacity is limited due to the few available channels, and both bandwidth and delay suffer from a growing number of participants, since access to the communication channels has to be executed exclusively, creating a bottleneck as the number of participants grows. Although mechanisms for prioritized channel access exist, packet drops and bandwidth would still be an issue on congested highways, especially in urban scenarios with a high density of participants and higher speeds, see [16, 18]. Therefore, motivated by saving energy and/or bandwidth, different approaches to improve the efficiency of communication have been proposed. Aiming to save energy and subsequently bandwidth, event-triggered approaches based on nonlinear MPC use a threshold function, which computes a necessary control when the state deviates from a certain region [39], partition the state space into several disjoint sets to obtain a necessary change in the input [10], weigh the deviation from a set point against the communication effort necessary to steer the system back [21], or use Lyapunov-based function triggers, which can be activated if a predicted trajectory leads to an increase of the cost function to be minimized [7].
9.1.2 Lifting Communication Burden with Quantization

While these event-triggered approaches are applied to systems with slowly changing dynamics [4], for systems with fast-changing dynamics (mobile robots or vehicles) event-triggering may be inefficient with regard to communication reduction. Therefore, quantization approaches allow systems with fast-changing states to reduce the communication effort; the corresponding quantizers can be classified as uniform, dynamic, and logarithmic. In [17], the load-balancing problem is addressed with fixed quantizers, which is a specialization of the consensus problem, with theoretical extensions of upper bounds regarding the convergence time. Static quantizers were introduced in [26] for a linear optimal control problem subject to a low data rate encoding control values, examining the performance with respect to the limited data rate and comparing it to the performance of a linear quadratic regulator (LQR). In [3], quantizers were applied for a state estimation
of measured output values via robust Kalman filters, where the relative states were quantized. As the constant quantization scheme may require high data rates to cover the set of output values, a dynamic quantizer using relative quantized distances is proposed. As the underlying model is linear, the quantization region is calculated by solving the Riccati equation. In [13], a quantized communication scheme is implemented for the consensus problem based on quantized relative states. A Lyapunov function candidate was derived from the communication structure of the graph, and convergence was shown for first- and second-order systems, reaching the consensus region for uniform quantizers and the consensus point for logarithmic quantizers. For dynamic quantizers with adapting finite levels, the consensus problem was solved in [44] for linear multi-agent systems, applying a zoom parameter to obtain a suitable resolution for the quantizer. A mixture of fixed and dynamic quantizers was developed in [43], which allows switching the quantization levels depending on a dynamic or fixed communication graph among the agents. A similar approach was pursued in [28], applying a flexible quantization scheme to a distributed optimization problem, where the number of bits is limited and evaluated in each iteration. Combining event-triggered and quantized communication, a distributed subgradient-based optimization problem is addressed in [22], utilizing a finite number of levels to quantize the information about the states of the agents. Each agent therein is equipped with a coder/decoder to quantize information about the predicted trajectory with regard to a local cost function subject to a common constraint set among all agents. Another approach for handling distributed optimization problems based on multi-agents with event-triggered communication was proposed in [23], which uses dynamic quantizers, also based on a finite number of levels, giving conditions for convergence bounded by the quantization error. Extending this approach, in [20] each agent calculates the necessary quantization levels, which depend on the deviation between all connected agents. By the requirement that a feasible control can be calculated one step ahead, the number of necessary quantization levels is known beforehand, and via a zooming technique the optimal solution can be achieved. To conclude the overview, each quantizer type yields advantages and drawbacks: uniform quantizers may be insufficient to cover the whole state space, and therefore a large number of levels has to be used, which also induces larger sizes of encoded information. On the other hand, they are robust when the system is near the optimal state, as the quantization error is constant. Nevertheless, the constant quantization levels may hinder the overall system from reaching the optimum. Dynamic quantizers may handle this quantization error, as their quantization levels are adaptable, but a careful parameter selection has to be conducted to avoid an infinite number of quantization levels when the system is near the optimum, which would negate the advantages of quantization. Logarithmic quantizers combine the advantages of appropriate quantization levels near the optimum and shorter convergence, but may fail to map the whole state space.
9.1.3 Non-cooperative Control and Communication

While most of the discussed scenarios consider a cooperative control scheme, following a common objective known to all subsystems and sharing a common restricted constraint set among the agents, we consider a non-cooperative control scheme consisting of mobile robots with individually assigned targets and without a cooperative goal, see [27]. This is also more realistic for traffic scenarios, where all vehicles have individual targets but are restricted to a rule set managing the traffic, and a global information state would be difficult to obtain. The robots operate on a shared continuous space (for example, a 2D plane); hence, dynamic constraints based on the states and open-loop predictions of the other robots have to be incorporated to avoid collisions between the robots. Each robot is equipped with a local controller that executes the MPC scheme at synchronized time instants. The optimization is carried out in the continuous space, solving an optimal control problem (OCP) over a finite horizon to obtain a feasible (sub)optimal control that predicts a feasible trajectory incorporating all constraints. These predicted trajectories are exchanged between the robots utilizing a quantizer with a finite number of quantization levels. Here, the shared space is quantized into equidistant cells numbered by positive integers, where the quantization level size is equal among all robots. Each robot quantizes the optimal trajectory by mapping the predicted states onto the cells and broadcasts the quantized trajectory to the other robots. After the quantized trajectories have been received from all other robots, yielding an occupancy grid for the local robot, the trajectories are converted back into the continuous space and induced as coupling constraints in the local OCP, and the optimization is executed in the continuous space aiming for the assigned individual target. Hence, the disadvantage of the quantization for the solution quality can be avoided, as the convergence of the robots to their target states does not depend on it. Nevertheless, an appropriate choice of the quantization level (here, the cell size) influences the speed of convergence, as the cell size impacts the trajectories of the robots (requiring larger or smaller deviations).
9.1.4 Occupancy Grid

Using an occupancy grid for the communication between the robots allows the communication effort to be reduced by using integers instead of floating point values. This was shown in numerical simulations for holonomic and non-holonomic robots in [25]. Lower bounds for the cell size of the occupancy grid were derived in [31]. Therein, lower bounds for the safety margin incorporating the quantization error were also presented, and the communication effort was reduced further by using incremental updates instead of broadcasting the full trajectory. This was also examined in an intersection scenario with autonomous connected vehicles in [33]. In a further approach, adopting the interval arithmetic idea from [45], the
predicted trajectory states (here, the cell indices) are restricted by a bounding box, which allows the communication to be reduced to the largest and smallest cell spanning the rectangle [35]. While this approach reduces the communication to two cells per time instant, many more cells are reserved for the planned trajectories by using the bounding box. Therefore, we use the idea of relaxed constraints to attenuate the effect of the much larger space needed by this approach [32].

The chapter is organized as follows: In the following section, the problem setting is revisited; then the prediction coherence and incremental update idea are reviewed in Sect. 9.3 and illustrated in simulation examples with holonomic robots. In Sect. 9.4, the idea of using further information without additional communication effort is revisited (bounding boxes), and sufficient conditions are derived to ensure a sufficient distance among the robots. Then, the DMPC scheme is adapted and presented in Sect. 9.5, and a necessary condition on the dynamics of the robots is presented, before concluding this article.

Notation: Sets of non-negative real and natural numbers are denoted by $\mathbb{R}_{\geq 0}$ and $\mathbb{N}_0$ with $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$. We use the shorthand $[x : y] = \{x, x+1, \ldots, y\}$ for all $x, y \in \mathbb{N}_0$ with $x \leq y$. For a vector $x \in \mathbb{R}^n$, $n \in \mathbb{N}$, we define the infinity norm $\|x\|_\infty := \max_{i \in [1:n]} |x_i|$. Tuples are denoted in round brackets, for example, $x = (a, b)$. A sequence of tuples $(x_1, x_2, \ldots, x_N)$ is denoted as $\mathbf{x} = (x(k))_{k=1}^{N}$.
9.2 Problem Setting

In this setting, a group of $P$ mobile robots is considered, which are governed by a discrete-time model for each robot $p$ with

$$x_p^+ = f(x_p, u_p) = f_0(x_p) + \sum_{i=1}^{m} f_i(x_p)\, u_{p,i}, \qquad (9.1)$$

where $f_i : \mathbb{R}^d \to \mathbb{R}^d$ describe smooth vector fields for all $i \in \{1, \ldots, m\}$ with $d, m \in \mathbb{N}_0$, and $f_0(\cdot)$ can be interpreted as a drift term. Here, for $x_p = (z_p, y_p) \in X$, planar states are considered for the 2D plane, while for $d > 2$ additional states (for example, the orientation) may be considered. As we focus on the quantization purposes, we restrict the state w.l.o.g. to $x_p \in X \subset \mathbb{R}^2_{\geq 0}$. State $x_p$ and control $u_p$ are constrained by

$$x_p \in X \subset \mathbb{R}^d_{\geq 0} \quad \text{and} \quad u_p \in U \subset \mathbb{R}^m. \qquad (9.2)$$

The control values are imposed component-wise on the vector fields with the abbreviation $u_p := (u_{p,1}, \ldots, u_{p,m})$.
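To make the control-affine structure of (9.1) concrete, the following minimal Python sketch evaluates one step of such a system; the drift and input vector fields below are illustrative placeholders (the unit fields happen to match the holonomic model used later in Example 9.4), not a definitive implementation.

import numpy as np

def step(x, u, f0, fs):
    # One step of the control-affine system x+ = f0(x) + sum_i fi(x) * u_i, cf. (9.1).
    x_next = f0(x)
    for f_i, u_i in zip(fs, u):
        x_next = x_next + f_i(x) * u_i
    return x_next

# Illustrative planar fields: integrator drift f0(x) = x and unit input fields.
f0 = lambda x: x
fs = [lambda x: np.array([1.0, 0.0]), lambda x: np.array([0.0, 1.0])]
x_next = step(np.array([4.5, 4.5]), np.array([-0.3, -0.3]), f0, fs)  # -> [4.2, 4.2]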
For each robot $p$, a finite discrete state trajectory over a finite prediction horizon length $N$ is defined, which starts from a feasible initial point $x_p^0 \in X$ with

$$\mathbf{x}_p^u = \left(x_p^u\left(k; x_p^0\right)\right)_{k=0}^{N} = \left(x_p^0,\ f\!\left(x_p^0, u_p(0)\right),\ f\!\left(f\!\left(x_p^0, u_p(0)\right), u_p(1)\right),\ \ldots\right) = \left((z_0, y_0), (z_1, y_1), \ldots, (z_N, y_N)\right), \qquad (9.3)$$

while the sequence of control values is incorporated as $\mathbf{u}_p := \left(u_p(0), u_p(1), \ldots, u_p(N-1)\right) \in \mathbb{R}^{(N-1) \times m}$. Moreover, we define that the robots are able to stop immediately, which is needed later to show feasibility.

Property 9.1 (Immediate Hold) Each robot can come to an immediate hold, that is,

$$\forall\, x_p \in X\ \exists\, u_p \in U : \quad x_p = f(x_p, u_p). \qquad (9.4)$$
In practice, this may be difficult to achieve, as actuators may react with a certain delay to an applied control, and drift is not taken into account. Therefore, to attenuate this assumption in practice, a larger safety margin may be used to account for the delay of sensors and actuators and to absorb a violation of this assumption.

The continuous space $X$ is quantized by $G := [1, \ldots, a_{\max}] \times [1, \ldots, b_{\max}]$, where $a_{\max}, b_{\max} \in \mathbb{N}_{\geq 0}$ are the suprema of $G$. Each cell is then defined by a cell size $c$ based on the continuous space with $z, y, z_{\max}, y_{\max} \in X$, where

$$a_{\max} = \left\lfloor \frac{z_{\max}}{c} \right\rfloor + 1 \quad \text{and} \quad b_{\max} = \left\lfloor \frac{y_{\max}}{c} \right\rfloor + 1,$$

and the grid indices are calculated via a quantize function $q : X \to G$ with

$$q(x_p) = \left( \left\lfloor \frac{z_p}{c} \right\rfloor + 1,\ \left\lfloor \frac{y_p}{c} \right\rfloor + 1 \right).$$
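A direct transcription of the quantize function into Python could look as follows (a minimal sketch; the floor operations mirror the definition above):

import math

def quantize(x, c):
    # q: X -> G, mapping a planar state x = (z, y) to grid indices (a, b).
    z, y = x
    return (math.floor(z / c) + 1, math.floor(y / c) + 1)

# With cell size c = 0.5, the state (4.5, 4.5) is mapped to cell (10, 10).
assert quantize((4.5, 4.5), 0.5) == (10, 10)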
To express the trajectory $\mathbf{x}_p$ from (9.3) as a finite sequence of quantized states for robot $p$, this is denoted as a sequence of tuples

$$I_{p,n} := \left( \left(n+k,\ q\!\left(x_p^u\left(k; x_p^0\right)\right)\right) \right)_{k=0}^{N} = \left( \left(n, q\!\left(x_p^u(0; x_p^0)\right)\right), \left(n+1, q\!\left(x_p^u(1; x_p^0)\right)\right), \ldots, \left(n+N, q\!\left(x_p^u(N; x_p^0)\right)\right) \right) \qquad (9.5)$$

with $k \in \mathbb{N}_0$, where each tuple consists of a time stamp and a state. $I_{p,n}(j)$ denotes the $j$th tuple $\left(n+j, q\!\left(x_p^u(j; x_p^0)\right)\right)$ in the sequence with $k \leq j \leq N$. As the robots broadcast the quantized trajectory, and therefore all other robots receive these trajectories, they are assembled into an occupancy grid for each robot $p$ as

$$i_{p,n} := \left( I_{1,n}, \ldots, I_{p-1,n}, I_{p+1,n}, \ldots, I_{P,n} \right).$$
Fig. 9.1 Possible swapping of cells of two robots $x_{1,2}$ with starting conditions $x_{1,2}^0$ for time instants $n = 0, 1, 2, \ldots$, which would go unnoticed if the cell size $c$ is too small
To incorporate the information in the optimization problem, the cell indices are converted back to the continuous space via $h : G \to X$ with

$$h\!\left(n, q(x_p)\right) = \underbrace{\left( (a - 0.5)\, c,\ (b - 0.5)\, c \right)}_{=:\, (x^c,\, y^c)\, \in\, X}, \quad (a, b) \in G,$$

and then used to formulate the coupling constraints evaluating the necessary margin between the 2D state $x_p = (z_p, y_p)$ of robot $p$ and the quantized state $I_{q,n}$ of the other robot $q$ with $q \neq p$ via the infinity norm with

$$g_{q,k} := g\!\left(x_p^u\left(k; x_p^0\right), I_{q,n}(k)\right) = \left\| \begin{pmatrix} z_p(k) \\ y_p(k) \end{pmatrix} - h\!\left(I_{q,n}(k)\right) \right\|_\infty - r_{\min} \geq 0, \quad k \in [1:N], \qquad (9.6)$$

where $r_{\min} \geq 0$ denotes a numerical safety margin.
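A sketch of the back-conversion $h$ and the resulting coupling constraint (9.6); cell_center and coupling_margin are hypothetical helper names, not part of the original scheme:

def cell_center(cell, c):
    # h: G -> X, mapping grid indices (a, b) to the centre of the cell.
    a, b = cell
    return ((a - 0.5) * c, (b - 0.5) * c)

def coupling_margin(x_p, cell_q, c, r_min):
    # g_{q,k} from (9.6): non-negative iff robot p keeps the margin r_min
    # to the cell centre of the quantized state of robot q (infinity norm).
    zc, yc = cell_center(cell_q, c)
    return max(abs(x_p[0] - zc), abs(x_p[1] - yc)) - r_min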
For choosing the lower bound for the cell size $c$, see [31] for the derivation of sufficient safety margins, which consider the quantization error and prevent cells from being skipped in two successive time instants. In short, skipping could occur if two robots are located opposite each other and the next step of each would be the cell occupied by the other robot, see Fig. 9.1. Then, to prevent the two robots from swapping their cells in one time instant, and thereby colliding undetected, a minimum cell size depending on the dynamics of the robots has to be found. To obtain this, we recall the proposition from [31].

Proposition 9.2 (Minimum Cell Size) Consider a time-discrete system (9.1) with state and control constraints (9.2) to be given. Then the minimum cell size to avoid that cells are skipped satisfies

$$\max_{u_p}\ \left\| h\!\left(0, q(x_p)\right) - h\!\left(0, q\!\left(f(x_p, u_p)\right)\right) \right\|_\infty \leq c \quad \forall\, (x_p, u_p) \in X \times U$$

and $c \leq d_{\min}$, where $d_{\min} > 0$ describes the diameter of the robots. Then the minimum cell size satisfies $c \geq \underline{c}$, where the lower bound is given by

$$\underline{c} := \max_{u_p \in U}\ \left( c_{f_0} + \sum_{i=1}^{m} c_{f_i}\, |u_{p,i}| \right)$$

with

$$c_{f_0} := \max_{x_p \in X}\ \max_{j \in \{1,2\}} \left| f_0(x_p)_j - (x_p)_j \right| \quad \text{and} \quad c_{f_i} := \max_{x_p \in X}\ \max_{j \in \{1,2\}} \left| f_i(x_p)_j \right|.$$

The proof was presented in [31].
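The bound of Proposition 9.2 can be estimated numerically by sampling the state and control sets; the sketch below is a crude stand-in for the exact maximization and only illustrates the structure of the bound, assuming planar states:

import numpy as np

def estimate_lower_cell_size(f0, fs, X_samples, U_samples):
    # c_underline = max_u ( c_f0 + sum_i c_fi * |u_i| ), with the constants
    # c_f0 and c_fi taken as sampled maxima over the state constraint set X.
    c_f0 = max(np.max(np.abs(f0(x) - x)) for x in X_samples)
    c_fi = [max(np.max(np.abs(fi(x))) for x in X_samples) for fi in fs]
    return max(c_f0 + sum(ci * abs(ui) for ci, ui in zip(c_fi, u))
               for u in U_samples)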
All constraints are assembled into a set of constraints according to the occupancy grid with

$$G\!\left(x_p^u\left(k; x_p^0\right), i_{p,n}(k)\right) = \left(g_{1,k}, \ldots, g_{p-1,k}, g_{p+1,k}, \ldots, g_{P,k}\right), \quad g_{i,k} \geq 0 \ \ \forall\, i \in [1:P] \setminus \{p\}, \qquad (9.7)$$

where $i_{p,n}(k)$ collects the $k$th tuples of robots $1, \ldots, p-1, p+1, \ldots, P$. These definitions allow us to discuss the idea of differential updates and prediction coherence in the following section to reduce the effort of constant communication. Each robot is equipped with a local controller to minimize a positive semi-definite objective function $\ell_p : X \times U \to \mathbb{R}_{\geq 0}$ subject to the current (measured) state $x_p^0$ and an individual target $x_p^*$ specified for each robot. Then, the OCP is formulated over a finite horizon $N$ as a minimization problem with

$$\operatorname*{argmin}_{\mathbf{u}_p}\ J\!\left(\mathbf{u}_p; x_p^0, x_p^*\right) = \operatorname*{argmin}_{\mathbf{u}_p}\ \sum_{k=0}^{N-1} \ell_p\!\left(x_p^u\left(k; x_p^0\right), u_p(k)\right) \qquad (9.8)$$

w.r.t.

$$\begin{aligned}
x_p^u\left(k+1; x_p^0\right) &= f\!\left(x_p^u\left(k; x_p^0\right), u_p(k)\right), && k \in [0:N-1],\\
u_p(k) &\in U, && k \in [0:N-1],\\
G\!\left(x_p^u\left(k; x_p^0\right), i_{p,n}(k)\right) &\geq 0, && k \in [1:N],\\
x_p^u\left(k; x_p^0\right) &\in X, && k \in [1:N],
\end{aligned} \qquad (9.9)$$
which is executed by each robot until the target condition is matched with respect to a tolerance margin $0 < \delta \ll 1$. The overall DMPC scheme is stated in Algorithm 9.1. As this DMPC algorithm is based on the scheme of [29], no terminal costs or constraints are imposed, following [12]. In the first time instant, as no robot has solved an OCP yet, the current feasible states are assumed to be kept over the full prediction horizon $N$ (Algorithm 9.1, line 3). Then, in a sequential order, as long as the target is not reached (Algorithm 9.1, line 9), each robot assembles the occupancy grid from the received tuples (Algorithm 9.1, line 10), solves the OCP (Algorithm 9.1, line 11), and sends the predicted trajectory as a quantized sequence of tuples to the other robots (Algorithm 9.1, line 12). Note that the quantization does not affect the solution quality, due to the execution of the optimization in the continuous space.
Algorithm 9.1 DMPC scheme for the overall system
1: Given feasible initial states and targets $x_p^0, x_p^*$ for $p \in [1:P]$, $k := 0$
2: for $k = 0$ to $N$ do
3:   Set $I_{p,n}(k) := \left(k, q(x_p^0)\right)$ $\forall\, p \in [1:P]$
4: end for
5: Broadcast $I_{p,n}$ to all robots $\forall\, p \in [1:P]$
6: for $n = 1, 2, \ldots$ do
7:   for $p = 1$ to $P$ do
8:     Measure $x_p^0$
9:     if $\|x_p^0 - x_p^*\|_2 > \delta$ then
10:      Receive $I_{q,n}$ and construct $G\!\left(x_p^u(k; x_p^0), i_{p,n}(k)\right)$ via (9.7) for $k \in [1:N]$
11:      Solve OCP (9.8) and apply $u_p^*(0)$
12:      Broadcast $I_{p,n}$ based on (9.5)
13:    end if
14:  end for
15: end for
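A compact Python rendering of one time instant of Algorithm 9.1 may be structured as below; measure_state, receive_occupancy_grid, solve_ocp, apply, quantize_trajectory, and broadcast are hypothetical interfaces standing in for a concrete solver and communication middleware.

import numpy as np

def dmpc_instant(robots, n, N, delta):
    # Sequential order p = 1..P (Algorithm 9.1, lines 7-14).
    for robot in robots:
        x0 = robot.measure_state()                                  # line 8
        if np.linalg.norm(x0 - robot.target) > delta:               # line 9
            grid = robot.receive_occupancy_grid()                   # line 10
            u_opt, x_pred = robot.solve_ocp(x0, grid, N)            # line 11
            robot.apply(u_opt[0])                                   # apply first control only
            robot.broadcast(robot.quantize_trajectory(x_pred, n))   # line 12, cf. (9.5)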
Hence, the additional safety margin due to the quantization error impacts the convergence time only. To ensure that the problem is initially and recursively feasible, we recap the theorem from [35]:

Theorem 9.3 (Initial and Recursive Feasibility) Consider a set of $P$ robots with underlying model (9.1) and given constraints (9.2) satisfying Property 9.1 and the condition that, for each time instant $n \in \mathbb{N}$, a minimizer of (9.8) exists if the feasible set is non-empty. If Algorithm 9.1 is applied, then the problem is recursively feasible, that is, for all $n \in \mathbb{N}$ and all $p \in [1:P]$ there exists a solution to the OCP (9.8).

The proof for this theorem was presented in [35]. The recursive feasibility relies on Property 9.1 and on the algorithm not terminating unexpectedly. If a robot is not capable of finding a trajectory that decreases the actual costs, keeping the previous (shifted) control sequence is always feasible, and thus a feasible solution exists for an arbitrarily chosen robot at any time instant. Now, with this quantized information exchange, we can investigate further ideas for reducing the communication effort of the exchanged predicted trajectories.
9.3 Prediction Coherence and Differential Updates

As the predicted trajectories are exchanged between the robots in every time instant, an investigation of these open-loop predictions promises further reduction: the finite MPC horizon is shifted after each applied step, and a warm start $\tilde{\mathbf{u}}_p = \left(u_p(1), \ldots, u_p(N-1), 0\right)$ is used in the next time instant, which ensures initial feasibility in every time instant, incorporating the assumption that an immediate hold of the robots is possible, see [31, Assumption 1]. Furthermore, with this warm start, the trajectory of one robot exhibits similarities in two successive time instants, which raises
the question of whether a transmission of the full trajectory in every time instant is necessary. Let the trajectories of two successive time instants, for example $n = 1$ and $n = 2$, be denoted as $\mathbf{x}_p^u = \left(x_p^u\left(k; x_p^0\right)\right)_{k=1}^{N+1}$ and $\tilde{\mathbf{x}}_p^u = \left(x_p^u\left(k; x_p^0\right)\right)_{k=0}^{N}$. Then, to measure the similarity of two successive trajectories, the difference between them, named the prediction coherence $r_c : X^N \times X^N \to \mathbb{R}_{\geq 0}$, is evaluated by

$$r_c\!\left(\mathbf{x}_p^u, \tilde{\mathbf{x}}_p^u\right) = \sum_{k=0}^{N} \left\| x_p^u\left(k+1; x_p^0\right) - \tilde{x}_p^u\left(k; x_p^0\right) \right\|_2 \qquad (9.10)$$

for the trajectories in continuous space. For the prediction coherence of the quantized trajectory, we accordingly define the quantized trajectories of two successive time instants with

$$I_{p,n} = \left( \left(n+k,\ q\!\left(x_p^u\left(k; x_p^0\right)\right)\right) \right)_{k=1}^{N+1} \quad \text{and} \quad I_{p,n-1} = \left( \left((n-1)+k,\ q\!\left(x_p^u\left(k; x_p^0\right)\right)\right) \right)_{k=0}^{N}$$

for $n > 0$, which allows the prediction coherence $r_q : G^{N+1} \times G^{N+1} \to \mathbb{R}_{\geq 0}$ to be stated by

$$r_q\!\left(I_{p,n}, I_{p,n-1}\right) = \sum_{k=0}^{N} \left\| I_{p,n}(k+1) - I_{p,n-1}(k) \right\|_2, \qquad (9.11)$$
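Both coherence measures translate directly into code; a minimal sketch, assuming trajectories are given as lists of planar points (for $r_c$) or grid index pairs (for $r_q$):

import numpy as np

def coherence(new_seq, old_seq):
    # r_c from (9.10) and, applied to grid indices, r_q from (9.11):
    # summed 2-norm gap between the new prediction shifted by one step
    # and the preserved previous prediction.
    return sum(np.linalg.norm(np.asarray(a) - np.asarray(b))
               for a, b in zip(new_seq[1:], old_seq))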
while the latter measures the differences of the grid indices, cf. [34].

Example 9.4 (Prediction coherence for holonomic robots) As an example, we illustrate the problem in simulations of a system consisting of four holonomic mobile robots defined by the kinematic model

$$\begin{pmatrix} z_p \\ y_p \end{pmatrix}^{\!+} = \begin{pmatrix} z_p \\ y_p \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} v_p^z + \begin{pmatrix} 0 \\ 1 \end{pmatrix} v_p^y,$$

where $v_p^z$ and $v_p^y$ are defined as the control inputs and bounded by $\left(v_p^z\right)^2 + \left(v_p^y\right)^2 \leq 0.5$. The minimum cell size was obtained according to [31] as $\underline{c} = 0.5$, and the evaluation was done for cell sizes $c = \{0.5, 1.0, 1.5, 2.0\}$ over a given continuous space $X = [-6, 6]^2$. The stage cost function, which is used as a performance criterion, is defined as

$$\ell_p\!\left(x_p, u_p\right) := \left\| \begin{pmatrix} z_p - z_p^* \\ 5\left(y_p - y_p^*\right) \end{pmatrix} \right\|_2^2 + 0.2\, \left\| u_p \right\|_2^2,$$

where $x_p^* := (z_p^*, y_p^*)$ defines the target position for robot $p$; the quadratic terms and the coefficient on the $y$-deviation were chosen to allow for comparison with previous work [25, 34]. The individual start and target positions of the robots are given in Table 9.1, and a plot illustrating the scenario is presented in Fig. 9.2.
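The simulation setup of Example 9.4 can be sketched as follows; the stage cost implements the weighted quadratic form as reconstructed above, and the control bound is checked explicitly (a sketch, not the original simulation code):

import numpy as np

def f(x, v):
    # Holonomic kinematics: (z, y)+ = (z, y) + (v_z, v_y).
    assert v[0] ** 2 + v[1] ** 2 <= 0.5  # control constraint of Example 9.4
    return x + v

def stage_cost(x, v, x_star):
    # l_p with weight 5 on the y-deviation and 0.2 on the control effort.
    return (x[0] - x_star[0]) ** 2 + (5.0 * (x[1] - x_star[1])) ** 2 + 0.2 * float(v @ v)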
Table 9.1 Initial and target states of the robots

Robot ($p$)   Initial states ($x_p^0$)   Target states ($x_p^*$)
1             (4.5, 4.5)                 (−4.5, −4.5)
2             (−4.5, 4.5)                (4.5, −4.5)
3             (4.5, −4.5)                (−4.5, 4.5)
4             (−4.5, −4.5)               (4.5, 4.5)

Fig. 9.2 Example for a collision avoidance conflict of four robots with given initial conditions $x_i(0)$ and targets $x_i^*$ for $i \in \{1, 2, 3, 4\}$ with $a = b = 7$ and the conflicting area (gray)
The cumulated closed-loop costs over each time instant $n$ are defined by

$$M_P(n) := \sum_{p=1}^{P} \ell_p\!\left(x_p^u\left(n; x_p^0\right), u_p(n)\right).$$
Considering the closed-loop costs, we stick to the definition stated in [25, 31] to allow for comparison. Figures 9.3 and 9.4 show the prediction coherence regarding (9.10) and the occupancy grid coherence regarding (9.11) for the given scenario with four robots for cell sizes $c = \{0.5, 1.0, 1.5, 2.0\}$ (left figures) and for each robot in one scenario with cell size $c = 2.0$, developing over the closed-loop time instants denoted by $n$ (right figures). Here, the continuous prediction coherence shows that for larger cell sizes there are more similarities between two predictions, as the robots need more time to cross a cell (see Fig. 9.3, left). For the middle-sized cells $c = \{1.0, 1.5\}$, the differences appear larger around the time instants near $n = 20$. For the occupancy grid coherence, the picture is similar to the former figures, leading to
Fig. 9.3 Four robots depicting the prediction coherence rc for N = 12 for cell sizes c = {0.5, 1.0, 1.5, 2.0} (left) and c = 2.0 for each robot (right)
a smaller difference with larger cell sizes (see Fig. 9.4, left). Especially the situation with a chosen cell size $c = 2.0$ shows that the fixed execution order of the robots has an impact on the coherence. This is particularly visible for robot 1, which carries out its optimization first: it exhibits the smallest difference and hence the highest coherence (green graph). The closed-loop costs are illustrated in Fig. 9.5, showing the decrease of the cumulated costs over all cell sizes (left) and, for the particular scenario with cell size $c = 2.0$, the decrease of the individual costs of each robot (right). The individual decrease of costs allows the conclusion that robots 3 and 4 have to take larger detours, as they optimize last in each time instant. Here, to compare the performance of the approach using grid cells against the DMPC scheme without quantization, the continuous version of the DMPC scheme is given with $c = 0$. For small and large cell sizes the robots converge fast, while for middle-sized cells the convergence time increases. In Fig. 9.5 (right), the closed-loop costs are depicted for each robot, showing that the fixed sequential order of the robots gives an advantage to robot 1, whose optimization is carried out first, leading to fewer unnecessary detours and thus the first arrival at the target. The results from [34] and Figs. 9.3 and 9.4 show a higher prediction coherence (in terms of similarity) with increasing cell sizes. For larger cell sizes, however, the prediction coherence is not proportional to the cell size. This indicates that there exists a lower bound of necessary communication, which is independent of the chosen cell size. Hence, the communication effort is bounded by two situations: On the one hand, at least one message per robot has to be transmitted per time instant, as the new predicted state at the end of the horizon has to be broadcasted. On the other hand, as ensuring collision avoidance leads to detours, a certain number of cells in the prediction horizon have to be updated, especially with smaller cell sizes. Therefore, depending on the dynamics of the robots, the cell size, and the prediction horizon length,
Fig. 9.4 Four robots depicting the occupancy grid coherence rq for N = 12 for cell sizes c = {0.5, 1.0, 1.5, 2.0} with cumulated coherence (left) and c = 2.0 for each robot individually (right)
Fig. 9.5 Four robots depicting cumulated closed-loop costs M p for cell sizes c = {0.5, 1.0, 1.5, 2.0} (left) and individual costs for c = 2.0 for each robot (right) over horizon length N = 12
it is an open question which cell size and prediction horizon length suit best for a low communication load and fast convergence.

The prediction coherence of the quantized trajectory suggests broadcasting differential updates instead of fully transmitted trajectories, as the grid indices of the quantized predicted trajectories are easily comparable for successive time instants. Therefore, the quantized predicted trajectory, consisting of a sequence of tuples, is preserved from the last time instant as $I_{p,n-1}$, which allows the differential update $I_{p,n}^c$ to be obtained for comparison with
$$I_{p,n}^c(k) := \begin{cases} \emptyset, & h\!\left(n+k, (a_p, b_p)\right) = h\!\left(n+k+1, (\tilde{a}_p, \tilde{b}_p)\right),\ k \in [1:N-1],\\[2pt] \left(n+k, (\tilde{a}_p, \tilde{b}_p)\right), & h\!\left(n+k, (a_p, b_p)\right) \neq h\!\left(n+k+1, (\tilde{a}_p, \tilde{b}_p)\right),\ k \in [1:N-1],\\[2pt] \left(n+k, (\tilde{a}_p, \tilde{b}_p)\right), & k = N,\\[2pt] \left(n+k, (a_p, b_p)\right), & \text{otherwise}, \end{cases} \qquad (9.12)$$

for $(a_p, b_p) \in I_{p,n-1}$ and $(\tilde{a}_p, \tilde{b}_p) \in I_{p,n}$,
which adds the tuples from the new trajectory $I_{p,n}$ if they differ from the previous one ($I_{p,n-1}$). For $k = N$, the last tuple is always added due to the shifted horizon. On the other hand, when the differential update $I_{q,n}^c$ is received, $I_{q,n}$ is constructed via

$$I_{q,n}(k) := \begin{cases} \left(n+k, (a_q, b_q)\right) := I_{q,n}^c(k), & \exists\, \left(n+k, (a_q, b_q)\right) \in I_{q,n}^c,\\[2pt] \left(n+k, (a_q, b_q)\right) := I_{q,n-1}(k), & \text{otherwise}, \end{cases} \qquad (9.13)$$

with $q \in [1:P] \setminus \{p\}$. If an update exists in the transmission $I_{q,n}^c$ received from the other robot, the tuple is taken; otherwise, the tuples are taken from the preserved memory $I_{q,n-1}$. Hence, the inclusion of the time instant is essential to enable this differential update method. Then, the communication of the DMPC algorithm is extended to incorporate the differential updates, which is presented in Algorithm 9.2.

Algorithm 9.2 DMPC scheme with differential updates for the overall system
1: Given feasible initial states and targets $x_p^0, x_p^*$ for $p \in [1:P]$, $k := 0$
2: for $k = 0$ to $N$ do
3:   Set $I_{p,n}(k) := \left(k, q(x_p^0)\right)$ $\forall\, p \in [1:P]$
4: end for
5: Broadcast $I_{p,n}$ to all robots $\forall\, p \in [1:P]$
6: for $n = 1, 2, \ldots$ do
7:   for $p = 1$ to $P$ do
8:     Measure $x_p^0$
9:     if $\|x_p^0 - x_p^*\|_2 > \delta$ then
10:      Set $I_{p,n-1} := I_{p,n}$, $I_{q,n-1} := I_{q,n}$, $q \in [1:P] \setminus \{p\}$
11:      if $n = 1$ then
12:        Receive $I_{q,n}$
13:      else
14:        Receive $I_{q,n}^c$ and construct $I_{q,n}$ via (9.13)
15:      end if
16:      Construct $G\!\left(x_p^u(k; x_p^0), i_{p,n}(k)\right)$ via (9.7) for $k \in [1:N]$
17:      Solve OCP (9.8) and apply $u_p^*(0)$
18:      Broadcast $I_{p,n}^c$ based on (9.12)
19:    end if
20:  end for
21: end for

Here, in the initialization phase (Algorithm 9.2, line 5), the full prediction is sent and received from the other robots (line 12). Then, the predictions that were sent ($I_{p,n-1}$) and received from the other robots ($I_{q,n-1}$) are preserved to calculate the differential update (Algorithm 9.2, line 10). While the robots have not reached their targets (Algorithm 9.2, line 9), the differential updates are received and the full occupancy grid is constructed with the help of the preserved memory $I_{q,n-1}$ (Algorithm 9.2, line 14). After solving the OCP, the obtained trajectory is examined to determine which updates have to be transmitted (Algorithm 9.2, line 18). Note that the communication of differential updates assumes noiseless communication channels without delays or other disturbances. While full communication may be used to gain robustness, for example using the open-loop prediction to bridge a delay in communication [11], our focus here is only on communication reduction.

Example 9.5 (Differential Updates) We implement the setup from Example 9.4, but with differential updates in the transmitted predictions; the closed-loop costs are depicted in Fig. 9.6. The differential updates are reported as the average number of broadcasted tuples over the simulation execution length $n^{\#}$, which is given by

$$K := \frac{1}{n^{\#}} \sum_{n=1}^{n^{\#}} \sum_{p=1}^{P} \# I_{p,n}^c \big|_n,$$

where $\# I_{p,n}^c \big|_n$ represents the broadcasted tuples of robot $p$ at time instant $n$; for the sake of completeness, for full communication (sending full predictions in each time instant) this is

$$\tilde{K} = \frac{1}{n^{\#}} \sum_{n=1}^{n^{\#}} \sum_{p=1}^{P} \# I_{p,n} \big|_n.$$
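In code, the differential update of (9.12) and the reconstruction of (9.13) reduce to a dictionary comparison when predictions are keyed by their absolute time stamps; a minimal sketch under that assumption:

def encode_update(I_prev, I_new):
    # (9.12): transmit a (stamp, cell) pair only where it differs from the
    # preserved previous prediction; the newly appended last stamp is always
    # transmitted since it is absent from I_prev.
    return {t: cell for t, cell in I_new.items() if I_prev.get(t) != cell}

def decode_update(I_prev, update, stamps):
    # (9.13): received tuples override, all others come from the memory.
    return {t: update.get(t, I_prev.get(t)) for t in stamps}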
Here, the results show a significantly lower average number of messages for the differential update over all cell sizes. Additionally, unlike for full communication, the average message load does not increase for the differential updates with larger cell sizes, although the simulation execution time varies. As the robots need more time to cross the cells for a large cell size, the differential update shrinks to as little as one cell per time instant and thus reduces the communication load the most. While the differential update transmission load is still influenced by the dynamics and the speed of the robots, another possibility to reduce this number of transmitted cell indices is the bounding box formulation, which admits a constant communication stream independent of the chosen cell size or the dynamics of the underlying system.
Fig. 9.6 Average number of messages with full communication (blue) and with differential updates (beige) over the cell sizes $c = \{0.5, 1.0, 1.5, 2.0\}$ (left) and the development of the cumulated closed-loop costs $M_p$ (right) for all cell sizes $c$ for four robots over horizon length $N = 12$
9.4 Bounding Box Constraint Formulation

In the previous two sections, we introduced the quantization of the predicted trajectories and the calculation of differential updates. Therein, the communication load still depends on the dynamics and the difference of the successive open-loop trajectories. Hence, stemming from the idea of Interval Superposition Arithmetic [45], we characterize the finite quantized trajectory prediction of the robots by a bounding box. Following the quantized trajectory formulation (9.5), the bounding box is defined as a rectangle by evaluating the minimum and maximum of the predicted trajectory of robot $p$ with

$$X_p = \left( \left( \min_{a_p} C_p,\ \min_{b_p} C_p \right),\ \left( \max_{a_p} C_p,\ \max_{b_p} C_p \right) \right) \quad \text{with} \quad C_p := \left\{ (a_p, b_p)\ \middle|\ q\!\left(x_p^u\left(k; x_p^0\right)\right),\ k \in [0:N] \right\}. \qquad (9.14)$$
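Computing the bounding box (9.14) from the quantized trajectory is a pair of min/max scans; a minimal sketch:

def bounding_box(cells):
    # X_p per (9.14): min/max corner cells of the quantized predicted trajectory.
    a_vals = [a for a, b in cells]
    b_vals = [b for a, b in cells]
    return ((min(a_vals), min(b_vals)), (max(a_vals), max(b_vals)))

# A trajectory through cells (3, 4), (4, 4), (5, 6) is summarized by two cells.
assert bounding_box([(3, 4), (4, 4), (5, 6)]) == ((3, 4), (5, 6))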
For an illustration, see Fig. 9.7 (left). To this end, this results in two cells that have to be communicated in each time instant. Thus, the robots are able to construct the bounding box within which the other robot is moving along its trajectory. The bounding boxes of the other robots have to be avoided over the full prediction horizon, as the exact trajectory is unknown, which may lead to longer detours with respect to the trajectory [35]. Now, as all robots share the same dynamics (9.1) and the same limitations considering state and control constraints (9.2), a prediction of the movement of the robots can be made if the start points ($I_{q,n}(k)$) and end points ($I_{q,n}(k+N)$) of the quantized trajectories of all other robots $q$, $q \in [1:P] \setminus \{p\}$, are known. From the received information, which yields a bounding box, no information about the start and end can be obtained. Therefore, as the grid indices are non-negative, that is, $a_p, b_p \in G$, the encoding of the leading signs can be used to include more specific information about the positions of the robots without additional communication effort: the directions of the robots are encoded by leading signs using the combinations $(++, --, +-, -+)$ for the first grid index $X_q(1)$, referring to the directions "to right top", "to left bottom", "to right bottom", and "to left top", respectively, see also [32]. For an illustration, see Fig. 9.7 (right).

Fig. 9.7 Illustration of the bounding box declared by min/max cells (left), illustration of the usage of signs for $(a, b)$, depending on the direction (right)

The communicated bounding box from (9.14) is then extended to

$$I_p^b := X_p = \left( \underbrace{(\pm a_1, \pm b_1)}_{\text{direction}},\ (a_2, b_2) \right) \qquad (9.15)$$

as a 2-tuple with $(0, a_1, b_1) = I_p^b(1)$ and $(0, a_2, b_2) = I_p^b(2)$, respectively. Here, the grid indices are still interpreted as strictly positive, namely letting, for example, $C_p(-2, -2) = C_p(-2, 2) = C_p(2, 2) = C_p(2, -2)$ be the same cell. To obtain the direction of a received sequence based on (9.15), the initial point $x_q^u\left(k; x_q^0\right) := h\!\left(I_q^b(1)\right)$, which represents the start position in the current sequence, and the (intermediate) target point $x_q^u\left(k+N; x_q^0\right) := h\!\left(I_q^b(2)\right)$, which represents the end of the predicted trajectory, are calculated via (9.16). Here, based on the received sequence, the grid indices where the robots start and end are assembled in the correct order by examining the signs via
Fig. 9.8 Reconstruction of the movement of one robot (green cells) by (9.16) based on received X p (dashed cells), for example, here ((+a1 , −b1 ), (a2 , b2 ))
$$I_q^b := \begin{cases} X_q, & \operatorname{sgn}(X_q) = (1, 1),\\[2pt] \left(X_q(2), -X_q(1)\right), & \operatorname{sgn}(X_q) = (-1, -1),\\[2pt] \left(\left(X_q(1)(1), X_q(2)(2)\right), \left(X_q(2)(1), -X_q(1)(2)\right)\right), & \operatorname{sgn}(X_q) = (1, -1),\\[2pt] \left(\left(X_q(2)(1), X_q(1)(2)\right), \left(-X_q(1)(1), X_q(2)(2)\right)\right), & \operatorname{sgn}(X_q) = (-1, 1), \end{cases} \qquad (9.16)$$

with $\operatorname{sgn}(X_q) := \left( \operatorname{sgn}\!\left(X_q(1)(1)\right),\ \operatorname{sgn}\!\left(X_q(1)(2)\right) \right),$
where $X_q(i)(j)$ returns the $j$th value of the $i$th tuple. For an illustration of this reconstruction of the direction, see Fig. 9.8. We can assume that for a large bounding box, the reserving robot releases the first cells quickly due to a higher speed, and therefore the cells especially at the beginning (close to $I_q^b(1)$) could be used by the other robots. Although the direction of the robot is now known, a condition is needed that guarantees that a reserved cell is no longer in use. Otherwise, it cannot be guaranteed that a cell inside the bounding box is free at a given time $k$. For example, if the robot has to let another robot pass, it is unknown to the other robots at which speed the robot is moving inside the bounding box. Nevertheless, it is possible to calculate a lower bound ensuring a sufficient distance between two robots $p$ and $q$, assuming that robot $q$ uses the latest possible time instant to reach the intermediate target point $I_q^b(2)$ with maximum speed. This is formulated in the following theorem, stating for each cell of the bounding box a minimum release time after which the robot will no longer use it.

Theorem 9.6 (Minimum release time of a cell over prediction horizon length) Consider two arbitrary robots $p, q$ with $x_p, x_q \in X$ fulfilling (9.2), and let a bounding box based on (9.15) from robot $q$ be provided via $I_q^b := X_q$ with $p \neq q$. Then, for a minimum cell size $c$ satisfying Proposition 9.2, a finite horizon length $N > 1$, and $k \in [n : n+N]$ with discrete-time instants $n = 1, 2, \ldots$, the minimum release time of a cell $I_{p,n}(k)$ regarding the distance of robot $p$ to robot $q$ is lower bounded by

$$\max\left\{ \left\| h\!\left(I_q^b(2)\right) - h\!\left(I_q^b(1)\right) \right\|_\infty - (k-1)c,\ \min\left\{ \left\| h\!\left(I_q^b(2)\right) - h\!\left(I_q^b(1)\right) \right\|_\infty,\ \left\| h\!\left(I_q^b(2)\right) + (N-k)\,c \right\|_\infty \right\} \right\} \leq \left\| h\!\left(I_{p,n}(k)\right) - h\!\left(I_q^b(2)\right) \right\|_\infty. \qquad (9.17)$$

Proof We distinguish here the cases $k = 1$ and $1 < k \leq N$: For $k = 1$, the first term of the maximum evaluation, $\left\| h\!\left(I_q^b(2)\right) - h\!\left(I_q^b(1)\right) \right\|_\infty - (k-1)c$, ensures that the bounding box of robot $q$ is not violated by robot $p$ with

$$\left\| h\!\left(I_q^b(2)\right) - h\!\left(I_q^b(1)\right) \right\|_\infty \leq \left\| h\!\left(I_{p,n}(1)\right) - h\!\left(I_q^b(2)\right) \right\|_\infty.$$
For $1 < k \leq N$, the second term of the maximum expression applies: the cell size $c$ is lower bounded according to Proposition 9.2 by the maximum distance per control value, $\max_{u_p} \left\| h\!\left(0, q(x_p)\right) - h\!\left(0, q\!\left(f(x_p, u_p)\right)\right) \right\|_\infty \leq c$ for all $(x_p, u_p) \in X \times U$, assuming that robot $q$ seeks to achieve maximum speed. To ensure that the bounding box does not become larger, whereby the predicted trajectory of robot $p$ might become infeasible, the minimum is taken, that is, the original bounding box stated by robot $q$, as the estimation via $c$ might be larger. Then, for increasing $k$, the distance formulated as the radius $(N-k)c$ decreases, and hence the bounding box decreases. As the trajectory of robot $q$ is assumed feasible, the possible speed, upper bounded by $c$, ensures that the position $x_q^u\left(k; x_q^0\right)$ of robot $q$ is inside $\left\| h\!\left(I_q^b(2)\right) + (N-k)\,c \right\|_\infty$ for any time instant $1 \leq k \leq N$. $\square$

We can conclude that a cell whose distance is less than the lower bound stated in the former theorem still has to be considered for the bounding box.

Lemma 9.7 (Examination of cells for the bounding box) If, for a given cell $(i, j)$ with $(i, j) \in G$, Theorem 9.6 does not hold, that is, the lower bound is violated with

$$\max\left\{ \left\| h\!\left(I_q^b(2)\right) - h\!\left(I_q^b(1)\right) \right\|_\infty - (k-1)c,\ \min\left\{ \left\| h\!\left(I_q^b(2)\right) - h\!\left(I_q^b(1)\right) \right\|_\infty,\ \left\| h\!\left(I_q^b(2)\right) + (N-k)\,c \right\|_\infty \right\} \right\} \geq \left\| h\!\left((k, (i, j))\right) - h\!\left(I_q^b(2)\right) \right\|_\infty,$$

then $(i, j)$ has to be added to the occupancy grid $I_{q,n}$.

Lemma 9.7 allows the examination of all cells included in the bounding box: for all robots $q \in [1:P] \setminus \{p\}$ with $I_q^b(1) \leq (i, j) \leq I_q^b(2)$, the coupling constraints can be constructed according to (9.6), which is carried out in Algorithm 9.3. This algorithm ensures that all cells that could be used by the other robot $q$ over the prediction horizon length $N$ are included in the occupancy grid.

Algorithm 9.3 Occupancy grid construction (for robot $p$) from the received sequence $X_q$ of robots $q \in [1:P] \setminus \{p\}$
1: Input $X_q$, $k$ and $N$
2: Compute $I_q^b$ from (9.16)
3: Set $i_{\min} := I_q^b(1)(1)$, $i_{\max} := I_q^b(2)(1)$
4: Set $j_{\min} := I_q^b(1)(2)$, $j_{\max} := I_q^b(2)(2)$
5: for $k = n$ to $n + N$ do
6:   for $i = i_{\min}$ to $i_{\max}$ do
7:     for $j = j_{\min}$ to $j_{\max}$ do
8:       if Lemma 9.7 holds then
9:         Append $I_{q,n}(k) := I_{q,n}(k) \cup (k, (i, j))$
10:      end if
11:    end for
12:  end for
13: end for
14: return $I_{q,n}$

With increasing $k$, the bounding box becomes smaller, imposing the assumption that the trajectories of the robots are not intertwined and that they move straight toward their intermediate target $I_q^b(2)$, which is discussed in the next section.
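A sketch of Algorithm 9.3 in Python; the release-time test is implemented here in a simplified reading of Theorem 9.6, blocking a cell at stamp $k$ only while it is still within the radius $(N-k)c$ around the intermediate target, so the exact bound (9.17) should be substituted for a faithful implementation:

import numpy as np

def h(cell, c):
    return np.array([(cell[0] - 0.5) * c, (cell[1] - 0.5) * c])

def build_occupancy(Iqb, n, N, c):
    # Iqb = (start_cell, end_cell), decoded from the signed sequence via (9.16).
    start, end_cell = Iqb
    i_lo, i_hi = sorted((start[0], end_cell[0]))
    j_lo, j_hi = sorted((start[1], end_cell[1]))
    end = h(end_cell, c)
    grid = {}
    for k in range(n, n + N + 1):
        radius = (n + N - k) * c  # shrinking reachable set, simplified from (9.17)
        grid[k] = [(i, j)
                   for i in range(i_lo, i_hi + 1)
                   for j in range(j_lo, j_hi + 1)
                   if np.max(np.abs(h((i, j), c) - end)) <= radius]
    return grid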
9.5 DMPC with Decreasing Bounding Box Constraints

Now, with the reconstruction of the occupancy grid from the bounding box and the use of the direction from the incorporated signs to attenuate the number of cells used, the DMPC algorithm using the decreasing bounding box method is defined in Algorithm 9.4. Here, in contrast to the previous communication methods, even in the initialization phase only two cells are transmitted (see Algorithm 9.4, line 10). Then, the bounding box is received from the other robots $q$ with $q \in [1:P] \setminus \{p\}$, and the occupancy grid is constructed (Algorithm 9.4, line 12) in order to create the coupling constraints for the OCP. After that, the bounding box of the predicted trajectory is calculated, with leading signs included, and broadcasted. The overall procedure continues until the target condition is matched (Algorithm 9.4, line 8). Note that the trajectory of each robot is assumed to be inside the bounding box and the movement is assumed as a linearization. Therefore, the underlying dynamics of the robots has to fulfill

$$\left\| I_q^b(2) - (N-k)\,c \right\|_\infty \geq \left\| h\!\left(I_q^b(2)\right) - (N-k)\, x_p^u\!\left(k; x_p^0\right) \right\|_\infty \quad \text{for each } k \in [n : n+N].$$

In other words, if the kinematic model is non-holonomic, the linearization can only guarantee collision avoidance for those types of non-holonomic models whose dynamics are capable of keeping the trajectory inside the decreasing bounding box over the prediction horizon length.
Algorithm 9.4 DMPC scheme with decreasing bounding box constraints for the overall system
1: Given feasible initial states and targets $x_p^0, x_p^*$ for $p \in [1:P]$, $k := 0$
2: for $k = 0$ to $N$ do
3:   Set $I_{p,n}(k) := \left(k, q(x_p^0)\right)$ $\forall\, p \in [1:P]$
4: end for
5: for $n = 1, 2, \ldots$ do
6:   for $p = 1$ to $P$ do
7:     Measure $x_p^0$
8:     if $\|x_p^0 - x_p^*\|_2 > \delta$ then
9:       if $n = 1$ then
10:        Broadcast $I_p^b$ to all robots according to (9.15)
11:      end if
12:      Receive $X_q$ for $q \in [1:P] \setminus \{p\}$ and construct $I_{q,n}$ via Algorithm 9.3
13:      Construct $G\!\left(x_p^u(k; x_p^0), i_{p,n}(k)\right)$ via (9.7) for $k \in [1:N]$
14:      Solve OCP (9.8) and apply $u_p^*(0)$
15:      Broadcast $I_p^b$ to all robots according to (9.15)
16:    end if
17:  end for
18: end for
Example 9.8 (DMPC with decreasing bounding boxes) For the decreasing bounding box approach, we refer to the parameters of Example 9.4, which are also utilized here. The closed-loop costs are depicted in Fig. 9.9. As the robots utilize the bounding box and are restricted to move along vertical or horizontal lines, the closed-loop costs are quite close among all cell sizes. To illustrate the proportion of the bounding box and the maximum possible speed of the robots, this is evaluated via
$$k_l := \left\| I_q^b(2) - N\,c \right\|_2 - \left\| I_q^b(2) - I_q^b(1) \right\|_2 \qquad (9.18)$$

to indicate whether the robots might leave their initial cells early or have to wait, either to let other robots pass or to reach their end point. Here, on the left of Fig. 9.10, for small cell sizes (with $c = 0.5$) the conflicts are minor for robot 1, which starts first: its optimization is carried out first, and it is therefore able to reserve most of the available space. Consequently, its latest start time is low, as this robot moves fast in the beginning. The other robots have to incorporate the constraints imposed by robot 1 and therefore cannot apply maximum speed, which leads to a higher latest start time (for example, for robot 4 as the last robot in the order). As the simulation progresses, the robots have to slow down, especially robot 4, which leads to a lower speed and thus to a larger $k_l$. As this robot is the last one to optimize, it has to include all imposed trajectories from the other robots, which reduces the number of available cells. At the end of the simulations, all robots slow down, revealing a large latest start time and therefore a much smaller
Fig. 9.9 Development of closed-loop costs M p over time n (left) and of individual closed-loop costs for c = 2.0 (right) with horizon length N = 12
Fig. 9.10 Development of $k_l$ (9.18) with $c = 0.5$ (left) and $c = 2.0$ (right) for horizon length $N = 12$
rectangle, which is then claimed by the robots. Note that using a holonomic model yields the lowest bound for the latest start time. As other kinematic models (for example, four-wheel or tricycle models) are constrained in their possible directions, all such models have the same or later start times, as turn-arounds or curves necessary to reach a certain position might require more steps to follow a predicted trajectory.
9.6 Conclusion

In this article, we have discussed and reviewed a quantization technique based on state quantization to minimize the communication via differential updates. Furthermore, utilizing state bounding boxes leads to larger required safety margins, which can be attenuated via the estimation of the movement of the other robots. Using leading signs gives the robots more insight into the expected movement of the other ones, which leads to shorter convergence times. Numerical simulations were presented as examples for the differential update method and the bounding box method in a setting with holonomic robots to allow for comparison with former approaches and to show the improvements regarding lower communication loads and reduced convergence times. Further considerations should cover theoretical aspects of the optimal choice of the cell size in the context of shorter convergence times and low communication rates. Other practical issues are the introduction of priority rules, which may reveal an optimal order of the robots regarding the current state of the whole system.
References

1. Brending, S., Lawo, M., Pannek, J., Sprodowski, T., Zeising, P., Zimmermann, D.: Certifiable software architecture for human robot collaboration in industrial production environments. IFAC-PapersOnLine 50(1), 1983–1990 (2017)
2. Camponogara, E., Jia, D., Krogh, B.H., Talukdar, S.: Distributed model predictive control. IEEE Control Syst. Mag. 22(1), 44–52 (2002)
3. Cheng, T.M., Malyavej, V., Savkin, A.V.: Set-valued state estimation for uncertain continuous-time systems via limited capacity communication channels. IFAC Proc. Vol. 41(2), 3755–3760 (2008)
4. Christofides, P.D., Scattolini, R., de la Peña, D.M., Liu, J.: Distributed model predictive control: a tutorial review and future research directions. Comput. Chem. Eng. 51, 21–41 (2013)
5. Dunbar, W.B.: Distributed receding horizon control of cost coupled systems. In: 46th IEEE Conference on Decision and Control, New Orleans, LA, USA, pp. 2510–2515. IEEE (2007)
6. Dunbar, W.B.: Distributed receding horizon control of coupled nonlinear oscillators: theory and application. In: Proceedings of the IEEE Conference on Decision and Control (CDC), San Diego, CA, USA, pp. 4854–4860. IEEE (2006)
7. Eqtami, A., Dimarogonas, D.V., Kyriakopoulos, K.J.: Novel event-triggered strategies for model predictive controllers. In: Proceedings of the IEEE Conference on Decision and Control (CDC), Orlando, FL, USA, pp. 3392–3397. IEEE (2011)
8. Farina, M., Perizzato, A., Scattolini, R.: Application of distributed predictive control to motion and coordination problems for unicycle autonomous robots. Robot. Auton. Syst. 72, 248–260 (2015)
9. Giselsson, P., Rantzer, A.: Distributed model predictive control with suboptimality and stability guarantees. In: 49th IEEE Conference on Decision and Control, Atlanta, Georgia, USA, pp. 7272–7277. IEEE (2010)
10. Grüne, L., Müller, F.: An algorithm for event-based optimal feedback control. In: 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, Shanghai, China, pp. 5311–5316. IEEE (2009)
11. Grüne, L., Pannek, J., Worthmann, K.: A prediction based control scheme for networked systems with delays and packet dropouts. In: 48th IEEE Conference on Decision and Control (CDC), Shanghai, China, pp. 537–542. IEEE (2009)
12. Grüne, L., Worthmann, K.: A distributed NMPC scheme without stabilizing terminal constraints. In: Distributed Decision Making and Control, pp. 261–287. Springer, London (2013)
13. Guo, M., Dimarogonas, D.V.: Consensus with quantized relative state measurement. Automatica 49(8), 2531–2537 (2013)
14. Isermann, R.: Digital Control Systems. Springer, Berlin (1991)
15. Jia, D., Krogh, B.: Min-max feedback model predictive control for distributed control with communication. In: Proceedings of the American Control Conference, Anchorage, AK, USA, pp. 4507–4512. IEEE (2002)
16. Karabulut, M.A., Shah, A.F.M.S., Ilhan, H.: The performance of the IEEE 802.11 DCF for different contention window in VANETs. In: 41st International Conference on Telecommunications and Signal Processing (TSP), Athens, Greece, pp. 1–4. IEEE
17. Kashyap, A., Başar, T., Srikant, R.: Quantized consensus. Automatica 43, 1192–1203 (2007)
18. Khaliq, K.A., Qayyum, A., Pannek, J.: Performance analysis of proposed congestion avoiding protocol for IEEE 802.11s. Int. J. Adv. Comput. Sci. Appl. 2, 356–369 (2017)
19. Khaliq, K.A., Qayyum, A., Pannek, J.: Synergies of advanced technologies and role of VANET in logistics and transportation. Int. J. Adv. Comput. Sci. Appl. 7(11), 359–369 (2016)
20. Li, H., Liu, S., Soh, Y.C., Xie, L.: Event-triggered communication and data rate constraint for distributed optimization of multiagent systems. IEEE Trans. Syst. Man Cybern.: Syst. 48(11), 1908–1919 (2018)
21. Li, H., Yan, W., Shi, Y.: Adaptive self-triggered model predictive control of discrete-time linear systems. In: IEEE 56th Annual Conference on Decision and Control, Melbourne, Australia, pp. 6165–6170. IEEE (2017)
22. Liu, S., Xie, L., Quevedo, D.E.: Event-triggered quantized communication based distributed convex optimization. IEEE Trans. Control Netw. Syst. 5870(99), 1–11 (2016)
23. Liu, S., Xie, L., Quevedo, D.E.: Event-triggered quantized communication-based distributed convex optimization. IEEE Trans. Control Netw. Syst. 5(1), 167–178 (2018)
24. Martinez, F.J., Cano, J.C., Calafate, C.T., Manzoni, P.: A performance evaluation of warning message dissemination in 802.11p based VANETs. In: Proceedings of the Conference on Local Computer Networks (LCN), pp. 221–224 (2009)
25. Mehrez, M.W., Sprodowski, T., Worthmann, K., Mann, G.K.I., Gosine, R.G., Sagawa, J.K., Pannek, J.: Occupancy grid based distributed MPC for mobile robots. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, Canada, pp. 4842–4847. IEEE (2017)
26. Nair, G.N., Huang, M., Evans, R.J.: Optimal infinite horizon control under a low data rate. In: 14th IFAC Symposium on System Identification, Newcastle, Australia. IFAC (2006)
27. Pannek, J.: Parallelizing a state exchange strategy for noncooperative distributed NMPC. Syst. Control Lett. 62(1), 29–36 (2013)
28. Pu, Y., Zeilinger, M.N., Jones, C.N.: Quantization design for distributed optimization with time-varying parameters. In: 54th IEEE Conference on Decision and Control (CDC), Osaka, Japan, pp. 2037–2042. IEEE (2015)
29. Richards, A., How, J.: Decentralized model predictive control of cooperating UAVs. In: 43rd IEEE Conference on Decision and Control, Nassau, Bahamas, vol. 4, pp. 4286–4291. IEEE (2004)
30. Scattolini, R.: Architectures for distributed and hierarchical model predictive control – a review. J. Process Control 19(5), 723–731 (2009)
31. Sprodowski, T., Mehrez, M.W., Worthmann, K., Mann, G.K.I., Gosine, R.G., Sagawa, J.K., Pannek, J.: Differential communication with distributed model predictive control of mobile robots based on an occupancy grid. Inf. Sci. 453, 426–441 (2018)
32. Sprodowski, T., Pannek, J.: Relaxed collision constraints based on interval superposition principle in a DMPC scheme. In: 24th International Conference on Parallel and Distributed Systems, Singapore, pp. 831–838. IEEE (2018)
33. Sprodowski, T., Sagawa, J.K., Pannek, J.: Connection between quantisation and bandwidth requirements of distributed model predictive control. IFAC-PapersOnLine 50(1), 10329–10334 (2017)
34. Sprodowski, T., Sagawa, J.K., Pannek, J.: Connection between quantisation and bandwidth requirements of distributed model predictive control. IFAC-PapersOnLine 50(1), 10329–10334 (2017)
35. Sprodowski, T., Zha, Y., Pannek, J.: Interval superposition arithmetic inspired communication for distributed model predictive control. In: Freitag, M., Kotzab, H., Pannek, J. (eds.) Proceedings of the 6th International Conference on Dynamics in Logistics (LDIC 2018), Bremen, Germany, pp. 327–334. Springer International Publishing (2018)
36. Stewart, B.T., Rawlings, J.B.: Coordinating multiple optimization-based controllers: new opportunities and challenges. J. Process Control 18(9), 839–845 (2008)
37. Stewart, B.T., Venkat, A.N., Rawlings, J.B., Wright, S.J., Pannocchia, G.: Cooperative distributed model predictive control. Syst. Control Lett. 59(8), 460–469 (2010)
38. Stewart, B.T., Wright, S.J., Rawlings, J.B.: Cooperative distributed model predictive control for nonlinear systems. J. Process Control 21(5), 698–704 (2011)
39. Varutti, P., Faulwasser, T., Kern, B., Kögel, M., Findeisen, R.: Event-based reduced-attention predictive control for nonlinear uncertain systems. In: IEEE International Symposium on Computer-Aided Control System Design (CACSD), Part of Multi-conference on Systems and Control (MSC), Yokohama, Japan, pp. 1085–1090. IEEE (2010)
40. Venkat, A.N., Rawlings, J.B., Wright, S.J.: Stability and optimality of distributed model predictive control. In: 44th IEEE Conference on Decision and Control (CDC), Seville, Spain, pp. 6680–6685. IEEE (2005)
41. Wang, L., Törngren, M., Onori, M.: Current status and advancement of cyber-physical systems in manufacturing. J. Manuf. Syst. 37, 517–527 (2015)
42. Weining, L., Tao, Z., Jun, Y., Xueqian, W.: A formation control approach with autonomous navigation of multi-robot system in unknown environment. In: Proceedings of the 34th Chinese Control Conference, Hangzhou, China, pp. 5230–5234. IEEE (2015)
43. Yi, P., Hong, Y.: Quantized sub-gradient algorithm and data rate analysis for distributed optimization. IEEE Trans. Control Netw. Syst. 1(4), 380–392 (2014)
44. Yu, S., Wang, Y., Jin, L., Zheng, K.: Asymptotic average consensus of continuous-time multi-agent systems with dynamically quantized communication. IFAC Proc. Vol. 47(3), 1819–1824 (2014)
45. Zha, Y., Houska, B.: Interval superposition arithmetic (2016). arXiv:1610.05862