Stochastic Linear-Quadratic Optimal Control Theory: Differential Games and Mean-Field Problems [1st ed.] 9783030483050, 9783030483067

This book gathers the most essential results, including recent ones, on linear-quadratic optimal control problems, which

299 106 2MB

English Pages XII, 130 [138] Year 2020

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
Front Matter ....Pages i-xii
Some Elements of Linear-Quadratic Optimal Controls (Jingrui Sun, Jiongmin Yong)....Pages 1-13
Linear-Quadratic Two-Person Differential Games (Jingrui Sun, Jiongmin Yong)....Pages 15-67
Mean-Field Linear-Quadratic Optimal Controls (Jingrui Sun, Jiongmin Yong)....Pages 69-123
Back Matter ....Pages 125-130
Recommend Papers

Stochastic Linear-Quadratic Optimal Control Theory: Differential Games and Mean-Field Problems [1st ed.]
 9783030483050, 9783030483067

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

SPRINGER BRIEFS IN MATHEMATICS

Jingrui Sun Jiongmin Yong

Stochastic Linear-Quadratic Optimal Control Theory: Differential Games and Mean-Field Problems

SpringerBriefs in Mathematics Series Editors Nicola Bellomo, Torino, Italy Michele Benzi, Pisa, Italy Palle Jorgensen, Iowa City, USA Tatsien Li, Shanghai, China Roderick Melnik, Waterloo, Canada Otmar Scherzer, Linz, Austria Benjamin Steinberg, New York City, USA Lothar Reichel, Kent, USA Yuri Tschinkel, New York City, USA George Yin, Detroit, USA Ping Zhang, Kalamazoo, USA

SpringerBriefs in Mathematics showcases expositions in all areas of mathematics and applied mathematics. Manuscripts presenting new results or a single new result in a classical field, new field, or an emerging topic, applications, or bridges between new results and already published works, are encouraged. The series is intended for mathematicians and applied mathematicians. All works are peer-reviewed to meet the highest standards of scientific literature.

BCAM SpringerBriefs Editorial Board Enrique Zuazua Deusto Tech Universidad de Deusto Bilbao, Spain and Departamento de Matemáticas Universidad Autónoma de Madrid Cantoblanco, Madrid, Spain Irene Fonseca Center for Nonlinear Analysis Department of Mathematical Sciences Carnegie Mellon University Pittsburgh, USA Juan J. Manfredi Department of Mathematics University of Pittsburgh Pittsburgh, USA Emmanuel Trélat Laboratoire Jacques-Louis Lions Institut Universitaire de France Université Pierre et Marie Curie CNRS, UMR, Paris Xu Zhang School of Mathematics Sichuan University Chengdu, China BCAM SpringerBriefs aims to publish contributions in the following disciplines: Applied Mathematics, Finance, Statistics and Computer Science. BCAM has appointed an Editorial Board, who evaluate and review proposals. Typical topics include: a timely report of state-of-the-art analytical techniques, bridge between new research results published in journal articles and a contextual literature review, a snapshot of a hot or emerging topic, a presentation of core concepts that students must understand in order to make independent contributions. Please submit your proposal to the Editorial Board or to Francesca Bonadei, Executive Editor Mathematics, Statistics, and Engineering: [email protected].

More information about this series at http://www.springer.com/series/10030

Jingrui Sun Jiongmin Yong •

Stochastic Linear-Quadratic Optimal Control Theory: Differential Games and Mean-Field Problems

123

Jingrui Sun Department of Mathematics Southern University of Science and Technology Shenzhen, Guangdong, China

Jiongmin Yong Department of Mathematics University of Central Florida Orlando, FL, USA

ISSN 2191-8198 ISSN 2191-8201 (electronic) SpringerBriefs in Mathematics ISBN 978-3-030-48305-0 ISBN 978-3-030-48306-7 (eBook) https://doi.org/10.1007/978-3-030-48306-7 © The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Our Parents Yuqi Sun and Xiuying Ma Wenyao Yong and Xiangxia Chen

Preface

Linear-quadratic optimal control theory (LQ theory, for short) has a long history, and most people feel that the LQ theory is quite mature. Three well-known relevant issues are involved: existence of optimal controls, solvability of the optimality system (which is a two-point boundary value problem), and solvability of the associated Riccati equation. A rough impression is that these three issues are somehow equivalent. In the past few years, we, together with our collaborators, have been re-investigating the LQ theory for stochastic systems with deterministic coefficients. A number of interesting delicate issues have been identified, including: • For finite-horizon LQ problems, open-loop optimal controls and closed-loop optimal strategies should be distinguished because the existence of the latter implies the existence of the former, but not vice versa. Whereas, for infinite-horizon LQ problems, under proper conditions, the open-loop and closed-loop solvability are equivalent. • For finite-horizon two-person (not necessarily zero-sum) differential games, the open-loop and closed-loop Nash equilibria are two different concepts. The existence of one of them does not imply the existence of the other, which is different from LQ optimal control problems. • The closed-loop representation of an open-loop Nash equilibrium is not necessarily the outcome of a closed-loop Nash equilibrium. Our investigations also revealed some previously unknown facts concerning two-person differential games. A partial list is: • For two-person (not necessarily zero-sum) differential games in finite horizons, the existence of an open-loop Nash equilibrium is equivalent to the solvability of a system of coupled FBSDEs, together with the convexities of the cost functionals; the existence of a closed-loop Nash equilibrium is equivalent to the solvability of a Lyapunov-Riccati type equation.

vii

viii

Preface

• For two-person zero-sum differential games, both in finite and infinite horizons, if closed-loop saddle points exist, and an open-loop saddle point exists and admits a closed-loop representation, then the representation must be the outcome of some closed-loop saddle point. Such a result also holds for LQ optimal control problems. • For two-person zero-sum differential games over an infinite horizon, the existence of an open-loop and a closed-loop Nash equilibrium are equivalent. • Some of the results concerning LQ optimal control problems can further be extended to the case when expectations of the state and the control are involved. This kind of LQ problems is referred to as the mean-field problem. The purpose of this book is to systematically present the above-mentioned results concerning LQ differential games and mean-field LQ optimal control problems. We assume that readers are familiar with basic stochastic analysis and stochastic control theory. This work is supported in part by NSFC Grant 11901280 and NSF Grants DMS-1406776, DMS-1812921. The authors also would like to express their gratitude to the anonymous referees for their constructive comments, which led to this improved version. Shenzhen, China Orlando, USA March 2020

Jingrui Sun Jiongmin Yong

Contents

1 Some Elements of Linear-Quadratic Optimal Controls . . 1.1 LQ Optimal Control Problems in Finite Horizons . . . . 1.2 LQ Optimal Control Problems in Infinite Horizons . . . 1.3 Appendix: Pseudoinverse and Infinite-Horizon BSDEs

. . . .

. . . .

. . . .

1 1 8 12

...... ......

15 15

......

19

......

28

...... ......

33 39

......

40

...... ......

43 51

. . . . .

. . . . .

69 70 72 74 78

.......

79

....... .......

88 94

. . . .

2 Linear-Quadratic Two-Person Differential Games . . . . . . . . 2.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Open-Loop Nash Equilibria and Their Closed-Loop Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Closed-Loop Nash Equilibria and Symmetric Riccati Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Relations Between Open-Loop and Closed-Loop Nash Equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Zero-Sum Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.1 Characterizations of Open-Loop and Closed-Loop Saddle Points . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.2 Relations Between Open-Loop and Closed-Loop Saddle Points . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Differential Games in Infinite Horizons . . . . . . . . . . . . . . 3 Mean-Field Linear-Quadratic Optimal Controls . . . . . . . . . 3.1 Problem Formulation and General Considerations . . . . . . 3.2 Open-Loop Solvability and Mean-Field FBSDEs . . . . . . 3.3 A Hilbert Space Point of View . . . . . . . . . . . . . . . . . . . 3.4 Uniform Convexity and Riccati Equations . . . . . . . . . . . 3.4.1 Solvability of Riccati Equations: Sufficiency of the Uniform Convexity . . . . . . . . . . . . . . . . . 3.4.2 Solvability of Riccati Equations: Necessity of the Uniform Convexity . . . . . . . . . . . . . . . . . 3.4.3 Sufficient Conditions for the Uniform Convexity .

. . . . .

. . . .

. . . .

. . . . .

. . . .

. . . . .

. . . .

. . . . .

. . . .

. . . . .

ix

x

Contents

3.5 Multi-dimensional Brownian Motion Case 3.6 Problems in Infinite Horizons . . . . . . . . . 3.6.1 Stability and Stabilizability . . . . . . 3.6.2 Stochastic LQ Problems . . . . . . . .

and Applications . . . ................ ................ ................

. . . .

. 97 . 106 . 107 . 119

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

Frequently Used Notation

I. Notation for Euclidean Spaces and Matrices 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22.

Rnm : the space of all n  m real matrices. Rn ¼ Rn1 ; R ¼ R1 ; R ¼ ½1; 1. Sn : the space of all symmetric n  n real matrices. Snþ : the subset of Sn consisting of positive definite matrices. n : the subset of Sn consisting of positive semi-definite matrices. S þ In : the identity matrix of size n, which is also denoted simply by I if no confusion occurs. M > : the transpose of a matrix M. M y : the Moore-Penrose pseudoinverse of a matrix M. trðMÞ: the sum of diagonal elements of a square matrix M, called the trace of M. h; i: the inner product on a Hilbert  space.  In particular, the usual inner product on Rnm is given by hM; N i 7! tr M > N . pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi jMj , trðM > MÞ: the Frobenius norm of a matrix M. RðMÞ: the range of a matrix or an operator M. NðMÞ: the kernel of a matrix or an operator M. A > B: A  B is a positive semi-definite symmetric matrix. QðPÞ , PA þ A> P þ C > PC þ Q. SðPÞ , B> P þ D> PC þ S. RðPÞ , R þ D> PD. b , A þ A,  B b , C þ C,  D b , B þ B,  C b , D þ D.  A b , Q þ Q,  b b , G þ G.  b , R þ R,  G Q S , S þ S, R > > b b b b b b QðP; PÞ , P A þ A P þ C P C þ Q. b b þb b >PC b >P þ D S. SðP; PÞ , B > b bþD b P D. b RðPÞ ,R

xi

xii

Frequently Used Notation

II. Sets and Spaces of Functions and Processes Let H be a Euclidian space (which could be Rn , Rnm , etc.). 1. Cð½t; T; HÞ: the space of H-valued, continuous functions on ½t; T. 2. Lp ðt; T; HÞ: the space of H-valued functions that are pth ð1 6 p\1Þ power Lebesgue integrable on ½t; T. 3. L1 ðt; T; HÞ: the space of H-valued, Lebesgue measurable functions that are essentially bounded on ½t; T. 4. L2F t ðX; HÞ: the space of F t -measurable, H-valued random variables n such that Ejnj2 \1. 5. L2F ðX; L1 ðt; T; HÞÞ: the space of F-progressively measurable, H-valued prohR i2 T cesses u : ½t; T  X ! H such that E t juðsÞjds \1. 6. L2F ðt; T; HÞ: the space of F-progressively measurable, H-valued processes u : RT ½t; T  X ! H such that E t juðsÞj2 ds\1. 7. L2F ðHÞ: the space of F-progressively measurable, H-valued processes u : R1 ½0; 1Þ  X ! H such that E 0 juðtÞj2 dt\1. 8. L2F ðX; Cð½t; T; HÞÞ: the space of F-adapted, continuous, H-valued processes h i u : ½t; T  X ! H such that E sups2½t;T juðsÞj2 \1. 9. 10. 11. 12. 13.

X t ¼ L2F t ðX; Rn Þ. X ½t; T ¼ L2F ðX; Cð½t; T; Rn ÞÞ. U½t; T ¼ L2F ðt;TT; Rm Þ. X loc ½0; 1Þ ¼ T [ 0 X ½0; T. X ½0; 1Þ: the subspace of X loc ½0; 1Þ consisting of processes u which are R1 square-integrable: E 0 juðtÞj2 dt\1.

Chapter 1

Some Elements of Linear-Quadratic Optimal Controls

Abstract This chapter is a brief review on the stochastic linear-quadratic optimal control. Some useful concepts and results, which will be needed throughout this book, are presented in the context of finite and infinite horizon problems. These materials are mainly for beginners and may also serve as a quick reference for knowledgeable readers. Keywords Linear-quadratic · Optimal control · Finite horizon · Infinite horizon · Riccati equation · Open-loop · Closed-loop In this chapter, we briefly review the stochastic linear-quadratic (LQ, for short) optimal control problem and present some useful concepts and results in the context of finite and infinite horizon problems. Most of the results recalled here are quoted from the book [48] by Sun and Yong, where rigorous proofs can be found. In the sequel, (Ω, F, P) denotes a complete probability space on which a standard onedimensional Brownian motion W = {W (t); 0  t < ∞} is defined, and F denotes the usual augmentation of the natural filtration {Ft }t0 generated by W . For a random variable ξ, we write ξ ∈ Ft if ξ is Ft -measurable, and for a stochastic process ϕ, we write ϕ ∈ F if it is F-progressively measurable.

1.1 LQ Optimal Control Problems in Finite Horizons Consider the following controlled linear stochastic differential equation (SDE, for short) on a finite horizon [t, T ]: ⎧ ⎪ ⎨ d X (s) = [A(s)X (s) + B(s)u(s) + b(s)]ds + [C(s)X (s) + D(s)u(s) + σ(s)]dW (s), (1.1.1) ⎪ ⎩ X (t) = x, where A, C : [0, T ] → Rn×n , B, D : [0, T ] → Rn×m , called the coefficients of the state equation (1.1.1), and b, σ : [0, T ] × Ω → Rn , called the nonhomogeneous terms, satisfy the following assumption. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 J. Sun and J. Yong, Stochastic Linear-Quadratic Optimal Control Theory: Differential Games and Mean-Field Problems, SpringerBriefs in Mathematics, https://doi.org/10.1007/978-3-030-48306-7_1

1

2

1 Some Elements of Linear-Quadratic Optimal Controls

(H1) The coefficients and the nonhomogeneous terms of (1.1.1) satisfy 

A ∈ L 1 (0, T ; Rn×n ),

B ∈ L 2 (0, T ; Rn×m ),

C ∈ L 2 (0, T ; Rn×n ),

D ∈ L ∞ (0, T ; Rn×m ), σ ∈ L 2F (0, T ; Rn ).

b ∈ L 2F (Ω; L 1 (0, T ; Rn )),

In the above assumption we have adopted the following notation: For a subset H of some Euclidean space (which could be Rn , Rn×m , etc.), 

 T L p (t, T ; H) = ϕ : [t, T ] → H  |ϕ(s)| p ds < ∞ (1  p < ∞), t

  ∞ L (t, T ; H) = ϕ : [t, T ] → H  ϕ is essentially bounded , T 

 2  L F (t, T ; H) = ϕ : [t, T ] × Ω → H ϕ ∈ F, E |ϕ(s)|2 ds < ∞ , t

  2 1  L F (Ω; L (t, T ; H)) = ϕ : [t, T ] × Ω → H ϕ ∈ F, E

T

|ϕ(s)|ds

2

−∞ for every initial pair (t, x). Remark 1.1.4 It is clear that the process u ∗ in (1.1.3) depends on the initial pari (t, x). The notion of an open-loop optimal control is employed to emphasize this dependence.

4

1 Some Elements of Linear-Quadratic Optimal Controls

Write for simplicity Θ[t, T ] = L 2 (t, T ; Rm×n ). Let (Θ, v) ∈ Θ[t, T ] × U[t, T ], and let X be the solution to the SDE ⎧ ⎪ ⎨ d X (s) = [(A + BΘ)X + Bv + b]ds ⎪ ⎩

+ [(C + DΘ)X + Dv + σ]dW, s ∈ [t, T ], X (t) = x,

(1.1.4)

where, for notational convenience, we have suppressed the argument s in the drift and diffusion terms. The process u(s) = Θ(s)X (s) + v(s), s ∈ [t, T ]

(1.1.5)

is easily seen to be square-integrable and hence is a control. Plugging (1.1.5) into the SDE (1.1.4), we see that (1.1.4) is exactly the form (1.1.1). Definition 1.1.5 A closed-loop strategy over [t, T ] is a pair (Θ, v) ∈ Θ[t, T ] × U[t, T ]. The process defined by (1.1.5) is called a closed-loop control generated by (Θ, v), and the SDE (1.1.4) is called a closed-loop system. Remark 1.1.6 A closed-loop strategy (Θ, v) is independent of the initial state x. However, the closed-loop control generated by (Θ, v) still depends on x. Definition 1.1.7 A closed-loop strategy (Θ, v) ∈ Θ[t, T ] × U[t, T ] is said to be optimal if for every initial state, the closed-loop control generated by (Θ, v) is openloop optimal. In this case, (Θ, v) is called a closed-loop optimal strategy. Definition 1.1.8 Problem (SLQ) is said to be (uniquely) closed-loop solvable over [t, T ], if a (unique) closed-loop optimal strategy exists on [t, T ]. In order to study Problem (SLQ), it is often convenient to consider first the homogeneous problem associated with Problem (SLQ), for which we want to find a control u ∈ U[t, T ] such that  J (t, x; u)  E G X (T ), X (T ) +

T

0

t



Q S S R

      X X , ds , u u

is minimized subject to ⎧ ⎪ ⎨ d X (s) = [A(s)X (s) + B(s)u(s)]ds + [C(s)X (s) + D(s)u(s)]dW (s), s ∈ [t, T ], ⎪ ⎩ X (t) = x. We denote this homogeneous problem by Problem (SLQ)0 and its value function by V 0 (t, x). Now we present a characterization of open-loop optimal controls.

1.1 LQ Optimal Control Problems in Finite Horizons

5

Theorem 1.1.9 Let (H1)–(H2) hold, and let (t, x) ∈ [0, T ) × Rn be a given initial pair. Then u ∈ U[t, T ] is an open-loop optimal control of Problem (SLQ) for (t, x) if and only if (i) the mapping v → J 0 (t, 0; v) is convex, or equivalently, J 0 (t, 0; v)  0, ∀v ∈ U[t, T ];

(1.1.6)

(ii) the adapted solution (X, Y, Z ) to the decoupled forward-backward stochastic differential equation (FBSDE, for short) ⎧ + Bu + b)ds + (C X + Du + σ)dW, ⎪ ⎨ d X (s) = (AX    dY (s) = − A Y + C  Z + Q X + S  u + q ds + Z d W, ⎪ ⎩ X (t) = x, Y (T ) = G X (T ) + g satisfies the stationarity condition B  Y + D  Z + S X + Ru + ρ = 0, a.e. on [t, T ], a.s. The closed-loop optimal strategy can be characterized by means of the Riccati equation, which is a nonlinear ordinary differential equation (ODE, for short) of the following form: 

˙ P(s) + Q(s, P(s)) − S(s, P(s)) R(s, P(s))† S(s, P(s)) = 0, P(T ) = G,

(1.1.7)

where the superscript † denotes the Moore–Penrose pseudoinverse of matrices, and ⎧   ⎪ ⎨ Q(s, P)  P A(s) + A(s) P + C(s) PC(s) + Q(s), S(s, P)  B(s) P + D(s) PC(s) + S(s), ⎪ ⎩ R(s, P)  R(s) + D(s) P D(s).

(1.1.8)

The Eq. (1.1.7) is symmetric, so by a solution we mean a continuous Sn -valued function that satisfies (1.1.7) for almost all s. To simplify notation we will frequently suppress the variable s and write Q(s, P(s)), S(s, P(s)), and R(s, P(s)) as Q(P), S(P), and R(P), respectively. Let R(M) denotes the range of a matrix M, and let C([t, T ]; Sn ) be the space of all Sn -valued, continuous functions over [t, T ]. Definition 1.1.10 Let P ∈ C([t, T ]; Sn ) be a solution to the Riccati equation (1.1.7) on [t, T ]. We say that the solution P is regular if (i) R(P)† S(P) ∈ Θ[t, T ]; (ii) R(S(P)) ⊆ R(R(P)) a.e. on [t, T ]; (iii) R(P)  0 a.e. on [t, T ], i.e., R(P) is positive semi-definite a.e. on [t, T ].

6

1 Some Elements of Linear-Quadratic Optimal Controls

The Riccati equation (1.1.7) is said to be regularly solvable when a regular solution exists. Definition 1.1.11 A solution P of the Riccati equation (1.1.7) on [t, T ] is said to be strongly regular if there exists a constant λ > 0 such that R(s) + D(s) P(s)D(s)  λIm , a.e. s ∈ [t, T ]. The Riccati equation (1.1.7) is said to be strongly regularly solvable when a strongly regular solution exists. The following result provides a characterization of closed-loop optimal strategies. Theorem 1.1.12 Let (H1)–(H2) hold. Then Problem (SLQ) is closed-loop solvable on [t, T ] if and only if (i) the Riccati equation (1.1.7) admits a regular solution P ∈ C([t, T ]; Sn ); (ii) with Θ(s)  −R(s, P(s))† S(s, P(s)), s ∈ [t, T ], the adapted solution (η, ζ) to the backward stochastic differential equation (BSDE, for short)  ⎧    ⎪ ⎨ dη(s) = − (A + BΘ) η + (C + DΘ) ζ + (C + DΘ) Pσ + Θ  ρ + Pb + q ds + ζdW, s ∈ [t, T ], ⎪ ⎩ η(T ) = g

(1.1.9)

satisfies the following conditions: κ  B  η + D  ζ + D  Pσ + ρ ∈ R(R(P)), a.e. on [t, T ], a.s., v  −R(P)† κ ∈ U[t, T ]. In this case, the closed-loop optimal strategy (Θ ∗ , v ∗ ) admits the following representation: Θ ∗ = Θ + [I − R(P)† R(P)], v ∗ = v + [I − R(P)† R(P)]π, where (, π) ∈ Θ[t, T ] × U[t, T ] is arbitrary. Moreover,  V (t, x) = E P(t)x, x + 2η(t), x +

P(s)σ(s), σ(s) + 2η(s), b(s) t

+ 2ζ(s), σ(s) − R(s, P(s))† κ(s), κ(s) ds . T

An equivalent statement of Theorem 1.1.12 is as follows.

1.1 LQ Optimal Control Problems in Finite Horizons

7

Theorem 1.1.13 Let (H1)–(H2) hold. Then a closed-loop strategy (Θ, v) ∈ Θ[t, T ] × U[t, T ] is optimal if and only if (i) the solution P ∈ C([t, T ]; Sn ) to the symmetric Lyapunov type equation 

P˙ + Q(P) + Θ  R(P)Θ + S(P) Θ + Θ  S(P) = 0, P(T ) = G,

satisfies the following two conditions: for almost all s ∈ [t, T ], R(P)  0, S(P) + R(P)Θ = 0; (ii) the adapted solution (η, ζ) to the BSDE  ⎧    ⎪ ⎨ dη(s) = − (A + BΘ) η + (C + DΘ) ζ + (C + DΘ) Pσ + Θ  ρ + Pb + q ds + ζdW (s), s ∈ [t, T ], ⎪ ⎩ η(T ) = g, satisfies the following condition: for almost all s ∈ [t, T ], B  η + D  ζ + D  Pσ + ρ + R(P)v = 0, a.s. The closed-loop solvability trivially implies the open-loop solvability of Problem (SLQ). An example has been constructed in [48] to show that the converse implication does not hold in general. The following result gives a condition under which these two types of solvability are equivalent. Theorem 1.1.14 Let (H1)–(H2) hold. Suppose that there exists a constant λ > 0 such that T 0 J (0, 0; u)  λE |u(s)|2 ds, ∀u ∈ U[0, T ]. (1.1.10) 0

Then the following hold: (i) Problem (SLQ) is uniquely open-loop solvable. (ii) The Riccati equation (1.1.7) admits a unique strongly regular solution P ∈ C([0, T ]; Sn ) and hence Problem (SLQ) is uniquely closed-loop solvable over [t, T ] for every t ∈ [0, T ). (iii) The unique closed-loop optimal strategy over each [t, T ] is given by the following unified form: Θ ∗ = −R(P)−1 S(P), v ∗ = −R(P)−1 (B  η + D  ζ + D  Pσ + ρ), where (η, ζ) is the adapted solution of (1.1.9) with Θ replaced by Θ ∗ .

8

1 Some Elements of Linear-Quadratic Optimal Controls

(iv) The unique open-loop optimal control for the initial pair (t, x) is the closedloop control generated by the closed-loop optimal strategy (Θ ∗ , v ∗ ) over [t, T ]. It should be noted that part of the converse of Theorem 1.1.14 is true. In fact, we have the following result. Theorem 1.1.15 Let (H1)–(H2) hold. Then the following are equivalent: (i) there exists a constant λ > 0 such that (1.1.10) holds; (ii) the Riccati equation (1.1.7) admits a strongly regular solution on [0, T ]; (iii) there exists an Sn -valued function P such that R(t, P(t))  λIm , a.e. t ∈ [0, T ] holds for some constant λ > 0 and V 0 (t, x) = P(t)x, x, ∀(t, x) ∈ [0, T ] × Rn .

1.2 LQ Optimal Control Problems in Infinite Horizons Let L 2F (Rk ) be the space of Rk -valued, F-progressively measurable processes that are square-integrable on [0, ∞). Consider the controlled linear SDE 

d X (t) = [AX (t) + Bu(t) + b(t)]dt + [C X (t) + Du(t) + σ(t)]dW, X (0) = x,

over the infinite horizon [0, ∞) and the quadratic cost functional



J (x; u)  E



0

Q S S R



       X (t) X (t) q(t) X (t) , +2 , dt, u(t) u(t) ρ(t) u(t)

where A, C ∈ Rn×n ,

B, D ∈ Rn×m ,

Q ∈ Sn , S ∈ Rm×n ,

R ∈ Sm

are given constant matrices, and b, σ, q ∈ L 2F (Rn ), ρ ∈ L 2F (Rm ) are given processes. In the above, the control process u belongs to the space L 2F (Rm ). A control u is said to be admissible with respect to the initial state x if the corresponding state process {X (t; x, u); 0  t < ∞} satisfies

1.2 LQ Optimal Control Problems in Infinite Horizons

E



9

|X (t; x, u)|2 dt < ∞.

0

The set of admissible controls with respect to x is denoted by Uad (x). The linear-quadratic optimal control problem over [0, ∞) can be stated as follows. Problem (SLQ)∞ . For a given initial state x ∈ Rn , find an admissible control u ∗ ∈ Uad (x) such that J (x; u ∗ ) = inf J (x; u) ≡ V (x). u∈Uad (x)

In the special case of b, σ, q, ρ = 0, we denote Problem (SLQ)∞ by Problem (SLQ)0∞ , the cost functional by J 0 (x; u), and the value function by V 0 (x). In order to obtain conditions under which the admissible control sets Uad (x) are nonempty, we introduce the following concepts. Definition 1.2.1 The uncontrolled system d X (t) = AX (t)dt + C X (t)dW (t), t  0

(1.2.1)

is said to be L 2 -stable if for every initial state x ∈ Rn , its solution X (t; x) is squareintegrable on [0, ∞), that is,



E

|X (t; x)|2 dt < ∞, ∀x ∈ Rn .

0

Definition 1.2.2 The controlled system d X (t) = [AX (t) + Bu(t)]dt + [C X (t) + Du(t)]dW (t), t  0

(1.2.2)

is said to be L 2 -stabilizable if there exists a matrix Θ ∈ Rm×n such that the uncontrolled system d X (t) = (A + BΘ)X (t)dt + (C + DΘ)X (t)dW (t), t  0 is L 2 -stable. In this case, Θ is called a stabilizer of (1.2.2). The set of all stabilizers of (1.2.2) is denoted by S ≡ S [A, C; B, D]. Theorem 1.2.3 The following statements are equivalent: (i) Uad (x) = ∅ for all x ∈ Rn ; (ii) S [A, C; B, D] = ∅; (iii) The following algebraic Riccati equation (ARE, for short) admits a positive definite solution P ∈ Sn+ :

10

1 Some Elements of Linear-Quadratic Optimal Controls

P A + A P + C  PC + I − (P B + C  P D)(I + D  P D)−1 (B  P + D  PC) = 0.

(1.2.3)

If the above are satisfied and P ∈ Sn+ is a solution of (1.2.3), then Γ  −(I + D  P D)−1 (B  P + D  PC) ∈ S [A, C; B, D]. According to Theorem 1.2.3, Problem (SLQ)∞ is well-posed (for all x ∈ Rn ) only if (1.2.2) is L 2 -stabilizable. So it is reasonable to assume the following: (S) The system (1.2.2) is L 2 -stabilizable, i.e., S [A, C; B, D] = ∅. Definition 1.2.4 An element u ∗ ∈ Uad (x) is called an open-loop optimal control of Problem (SLQ)∞ for the initial state x ∈ Rn if J (x; u ∗ )  J (x; u), ∀u ∈ Uad (x). If an open-loop optimal control (uniquely) exists for x, Problem (SLQ)∞ is said to be (uniquely) open-loop solvable at x. Problem (SLQ)∞ is said to be (uniquely) open-loop solvable if it is (uniquely) open-loop solvable at all x ∈ Rn . Definition 1.2.5 A pair (Θ, v) ∈ S [A, C; B, D] × L 2F (Rm ) is called a closed-loop strategy of Problem (SLQ)∞ . The outcome u(t)  Θ X (t) + v(t), t  0 of a closed-loop strategy (Θ, v) is called a closed-loop control for the initial state x. Definition 1.2.6 A closed-loop strategy (Θ ∗ , v ∗ ) is said to be optimal if J (x; Θ ∗ X ∗ + v ∗ )  J (x; Θ X + v), for all (x, Θ, v) ∈ Rn × S [A, C; B, D] × L 2F (Rm ). If a closed-loop optimal strategy (uniquely) exists, Problem (SLQ)∞ is said to be (uniquely) closed-loop solvable. Now similar to (1.1.8), we define for a constant matrix P ∈ Sn that ⎧   ⎪ ⎨ Q(P)  P A + A P + C PC + Q, S(P)  B  P + D  PC + S, ⎪ ⎩ R(P)  R + D  P D.

(1.2.4)

Note that in the above, all matrices are time-invariant. In order to study the open-loop and closed-loop solvability of Problem (SLQ)∞ , we further introduce the following concept.

1.2 LQ Optimal Control Problems in Infinite Horizons

11

Definition 1.2.7 The following constrained nonlinear algebraic equation ⎧  † ⎪ ⎨ Q(P) − S(P) R(P) S(P) = 0, R(S(P)) ⊆ R(R(P)), ⎪ ⎩ R(P)  0,

(1.2.5)

with the unknown P ∈ Sn , is called a generalized algebraic Riccati equation (GARE, for short). A solution P of (1.2.5) is said to be stabilizing if there exists a  ∈ Rm×n such that the matrix Θ  −R(P)† S(P) + [I − R(P)† R(P)] is a stabilizer of the system (1.2.2). The main result on the solvability of Problem (SLQ)∞ is as follows. Theorem 1.2.8 Let (S) hold. Then the following statements are equivalent: (i) Problem (SLQ)∞ is open-loop solvable; (ii) Problem (SLQ)∞ is closed-loop solvable; (iii) The GARE (1.2.5) admits a stabilizing solution P ∈ Sn , and the BSDE  dη(t) = − [A − BR(P)† S(P)] η + [C − DR(P)† S(P)] ζ + [C − DR(P)† S(P)] Pσ − S(P) R(P)† ρ

+ Pb + q dt + ζdW, t  0 admits an L 2 -stable adapted solution1 (η, ζ) such that θ(t)  B  η(t) + D  ζ(t) + D  Pσ(t) + ρ(t) ∈ R(R(P)), a.e. t ∈ [0, ∞), a.s. In the above case, all closed-loop optimal strategies (Θ ∗ , v ∗ ) are given by Θ ∗ = −R(P)† S(P) + [I − R(P)† R(P)], v ∗ = −R(P)† θ + [I − R(P)† R(P)]ν, where  ∈ Rm×n is chosen so that Θ ∗ ∈ S [A, C; B, D] and ν ∈ L 2F (Rm ) is arbitrary; every open-loop optimal control u ∗ for the initial state x admits a closed-loop representation: u ∗ (t) = Θ ∗ X ∗ (t) + v ∗ (t), t  0, 1 See

the next section for the notion of an L 2 -stable adapted solution to BSDEs over an infinite horizon.

12

1 Some Elements of Linear-Quadratic Optimal Controls

where (Θ ∗ , v ∗ ) is a closed-loop optimal strategy of Problem (SLQ)∞ and X ∗ is the corresponding closed-loop state process. Moreover, V (x) = P x, x + 2Eη(0), x ∞ Pσ, σ + 2η, b + 2ζ, σ − R(P)† θ, θ dt. +E 0

1.3 Appendix: Pseudoinverse and Infinite-Horizon BSDEs Let M be an m × n real matrix. The Moore-Penrose pseudoinverse of M, denoted by M † , is an n × m real matrix such that M M † M = M, (M M † ) = M M † , M † M M † = M † , (M † M) = M † M. Every matrix has a unique (Moore-Penrose) pseudoinverse. If M ∈ Sn , then M † ∈ Sn , M M † = M † M, and M  0 if and only if M †  0. Proposition 1.3.1 Let I be an interval. Let L(t) and N (t) be two Lebesgue measurable functions on I, with values in Rn×k and Rn×m , respectively. Then the equation N (t)X (t) = L(t) has a solution X (t) ∈ L 2 (I; Rm×k ) if and only if (i) R(L(t)) ⊆ R(N (t)), and (ii) N (t)† L(t) ∈ L 2 (I; Rm×k ), in which case the general solution is given by X (t) = N (t)† L(t) + [Im − N (t)† N (t)]Y (t), where Y (t) ∈ L 2 (I; Rm×k ) is arbitrary. Remark 1.3.2 The following are obvious: (i) The condition R(L(t)) ⊆ R(N (t)) is equivalent to N (t)N (t)† L(t) = L(t). (ii) If N (t) ∈ Sn and N (t)X (t) = L(t), then X (t) N (t)X (t) = L(t) N (t)† L(t). Next, let X [0, ∞) be the subspace of L 2F (Rn ) whose elements are continuous. Consider the following BSDE over the infinite horizon [0, ∞):

1.3 Appendix: Pseudoinverse and Infinite-Horizon BSDEs

13

dY (t) = −[A Y (t) + C  Z (t) + ϕ(t)]dt + Z (t)dW (t), t ∈ [0, ∞),

(1.3.1)

where A, C ∈ Rn×n are given constant matrices, and {ϕ(t); 0  t < ∞} is a given F-progressively measurable, Rn -valued process. Definition 1.3.3 An L 2 -stable adapted solution to the BSDE (1.3.1) is a pair (Y, Z ) ∈ X [0, ∞) × L 2F (Rn ) that satisfies the integral version of (1.3.1): Y (t) = Y (0) −

t 0

A Y (s) + C  Z (s) + ϕ(s) ds +

t

Z (s)dW (s), t  0.

0

The following theorem establishes the basic existence and uniqueness result for the BSDE (1.3.1). Theorem 1.3.4 Suppose that the system (1.2.1) is L 2 -stable. Then for every ϕ ∈ L 2F (Rn ), the BSDE (1.3.1) admits a unique L 2 -stable adapted solution.

Chapter 2

Linear-Quadratic Two-Person Differential Games

Abstract The purpose of this chapter is to develop a theory for stochastic linearquadratic two-person differential games. Open-loop and closed-loop Nash equilibria are explored in the context of nonzero-sum and zero-sum differential games. The existence of an open-loop Nash equilibrium is characterized in terms of a system of constrained forward-backward stochastic differential equations, and the existence of a closed-loop Nash equilibrium is characterized by the solvability of a system of coupled symmetric differential Riccati equations. It is shown that in the nonzerosum case, the closed-loop representation for open-loop Nash equilibria is different from the outcome of closed-loop Nash equilibria in general, whereas they coincide in the zero-sum case when both exist. Some results for infinite-horizon zero-sum differential games are also established in terms of algebraic Riccati equation. Keywords Linear-quadratic · Differential game · Two-person · Nonzero-sum · Zero-sum · Nash equilibrium · Saddle point · Open-loop · Closed-loop · Riccati equation Throughout this chapter, (Ω, F, P) is a complete probability space in the background, and F = {Ft }t0 is a filtration over [0, ∞). As usual, we assume that the probability space (Ω, F, P) is rich enough to support a standard one-dimensional Brownian motion W = {W (t); 0  t < ∞}, and that F is the usual augmentation of the natural filtration generated by W . We employ throughout this chapter the notation of Chap. 1.

2.1 Formulation Consider the controlled linear SDE on a finite time horizon [t, T ]: ⎧ ⎪ ⎨ d X (s) = [A(s)X (s) + B1 (s)u 1 (s) + B2 (s)u 2 (s) + b(s)]ds + [C(s)X (s) + D1 (s)u 1 (s) + D2 (s)u 2 (s) + σ(s)]dW (s), ⎪ ⎩ X (t) = x, © The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 J. Sun and J. Yong, Stochastic Linear-Quadratic Optimal Control Theory: Differential Games and Mean-Field Problems, SpringerBriefs in Mathematics, https://doi.org/10.1007/978-3-030-48306-7_2

(2.1.1)

15

16

2 Linear-Quadratic Two-Person Differential Games

where A, B1 , B2 , C, D1 , D2 , b and σ are given coefficients. We suppose that in this chapter, the following assumption holds. (G1) The coefficients and the nonhomogeneous terms of (2.1.1) satisfy ⎧ 1 n×n Bi ∈ L 2 (0, T ; Rn×m i ), ⎪ ⎨ A ∈ L (0, T ; R ), C ∈ L 2 (0, T ; Rn×n ), Di ∈ L ∞ (0, T ; Rn×m i ), ⎪ ⎩ b ∈ L 2F (Ω; L 1 (0, T ; Rn )), σ ∈ L 2F (0, T ; Rn ).

i = 1, 2, i = 1, 2,

In the above, X is the n-dimensional state process, and u i (i = 1, 2), valued in Rm i , is the control process selected by Player i. For i = 1, 2 and t ∈ [0, T ), the set of admissible (open-loop) controls of Player i on [t, T ] is defined by   T Ui [t, T ] = u i : [t, T ] × Ω → Rm i | u i ∈ F and E t |u i (s)|2 ds < ∞ , and the cost functional for Player i is defined by 



J i (t, x; u 1 , u 2 ) = E G i X (T ), X (T ) + 2 g i , X (T ) ⎞⎛ ⎞ ⎛ ⎞ ⎛ T Q i (s) S1i (s) S2i (s) X (s) X (s)  i i ⎝ S1i (s) R11 (s) R12 (s) ⎠⎝u 1 (s)⎠, ⎝u 1 (s)⎠ + i i i t u 2 (s) u 2 (s) S2 (s) R21 (s) R22 (s)

⎛q i (s)⎞ ⎛ X (s) ⎞   + 2 ⎝ρi1 (s)⎠, ⎝u 1 (s)⎠ ds . u 2 (s) ρi2 (s)

(2.1.2)

We assume that the weighting coefficients satisfy the following assumption so that the integral on the right-hand side makes sense: (G2) The weighting coefficients satisfy the following conditions: ⎧ i G ∈ Sn , g i ∈ L 2FT (Ω; Rn ), Q i ∈ L 1 (0, T ; Sn ), ⎪ ⎪ ⎪ ⎪ ⎨ S i ∈ L 2 (0, T ; Rm j ×n ), R i ∈ L ∞ (0, T ; Sm j ), j jj ⎪ q i ∈ L 2F (Ω; L 1 (0, T ; Rn )), ρij ∈ L 2F (0, T ; Rm j ), ⎪ ⎪ ⎪ ⎩ i i  R12 = (R21 ) ∈ L ∞ (0, T ; Rm 1 ×m 2 ); i, j = 1, 2.

2.1 Formulation

17

The problem we are concerned with in this chapter is the following linear-quadratic stochastic two-person differential game. Problem (SDG). For given initial pair (t, x) ∈ [0, T ) × Rn , how the players should choose their controls u i ∈ Ui [t, T ] so that their payoff J i (t, x; u 1 , u 2 ) is minimized. The above problem is called an LQ stochastic two-person zero-sum differential game if one player’s gain is the other’s loss, or equivalently, the sum of the winnings and losses of the players is always zero, i.e., J 1 (t, x; u 1 , u 2 ) + J 2 (t, x; u 1 , u 2 ) = 0, ∀u i ∈ Ui [t, T ], i = 1, 2.

(2.1.3)

If (2.1.3) is not necessarily true, we shall call Problem (SDG) an LQ stochastic two-person nonzero-sum differential game, emphasizing that (2.1.3) is not assumed. Remark 2.1.1 The LQ optimal control problem studied in Chap. 1, Sect. 1.1 can be regarded as a special case of Problem (SDG) where m 1 = 0 or m 2 = 0. For notational simplicity, we let m = m 1 + m 2 and denote B = (B1 , B2 ),  i S S i = 1i , S2

D = (D1 , D2 ),   i i R11 R12 Ri = , i i R21 R22

i i R1i = (R11 , R12 ),  i ρ ρi = i1 , ρ2

i i R2i = (R21 , R22 ),   u1 u= . u2

Naturally, we identify U[t, T ] = U1 [t, T ] × U2 [t, T ]. With such notation, the state equation (2.1.1) can be rewritten as ⎧ ⎪ ⎨ d X (s) = [A(s)X (s) + B(s)u(s) + b(s)]ds + [C(s)X (s) + D(s)u(s) + σ(s)]dW (s), s ∈ [t, T ], ⎪ ⎩ X (t) = x, and the cost functionals (2.1.2) can be rewritten as 



J i (t, x; u) =E G i X (T ), X (T ) + 2 g i , X (T )     T  i Q (s) S i (s) X (s) X (s) + , S i (s) R i (s) u(s) u(s) t     i   X (s) q (s) , ds . +2 ρi (s) u(s) Definition 2.1.2 A pair (u ∗1 , u ∗2 ) ∈ U1 [t, T ] × U2 [t, T ] is called an open-loop Nash equilibrium of Problem (SDG) for the initial pair (t, x) if J 1 (t, x; u ∗1 , u ∗2 )  J 1 (t, x; u 1 , u ∗2 ), ∀u 1 ∈ U1 [t, T ], J 2 (t, x; u ∗1 , u ∗2 )  J 2 (t, x; u ∗1 , u 2 ), ∀u 2 ∈ U2 [t, T ].

(2.1.4)

18

2 Linear-Quadratic Two-Person Differential Games

In the zero-sum case, (u ∗1 , u ∗2 ) is called an open-loop saddle point. For i = 1, 2, let us denote Θ i [t, T ] = L 2 (t, T ; Rm i ×n ) and write Θ[t, T ] = Θ 1 [t, T ] × Θ 2 [t, T ]. In a similar fashion to the LQ optimal control problem, we can consider, for any Θ = (Θ1 , Θ2 ) ∈ Θ 1 [t, T ] × Θ 2 [t, T ] and v = (v1 , v2 ) ∈ U1 [t, T ] × U2 [t, T ], the state equation ⎧ ⎪ ⎨ d X (s) = [(A + BΘ)X + Bv + b]ds + [(C + DΘ)X + Dv + σ]dW (s), s ∈ [t, T ], ⎪ ⎩ X (t) = x,

(2.1.5)

and the cost functionals (i = 1, 2) 



J i (t, x; Θ1 X + v1 , Θ2 X + v2 ) = E G i X (T ), X (T ) + 2 g i , X (T )     i     T  i i   Q (S ) X X q X , +2 , ds . + Si Ri ρi Θ X +v Θ X +v Θ X +v t We shall call (Θi , vi ) a closed-loop strategy of Player i, and (2.1.5) the closed-loop system of the original system under closed-loop strategies (Θ1 , v1 ) and (Θ2 , v2 ) of Players 1 and 2. To emphasize that the solution X to (2.1.5) depends on (Θi , vi ) (i = 1, 2), as well as on the initial pair (t, x), we frequently write X (·) = X (· ; t, x, Θ1 , v1 , Θ2 , v2 ). The control pair (u 1 , u 2 ) defined by u 1 = Θ1 X + v1 , u 2 = Θ2 X + v2 is called the outcome of the closed-loop strategy (Θ1 , v1 ; Θ2 , v2 ). Definition 2.1.3 A 4-tuple (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) ∈ Θ 1 [t, T ] × U1 [t, T ] × Θ 2 [t, T ] × U2 [t, T ] is called a closed-loop Nash equilibrium of Problem (SDG) on [t, T ], if for any x ∈ Rn and (Θ1 , v1 ; Θ2 , v2 ) ∈ Θ 1 [t, T ] × U1 [t, T ] × Θ 2 [t, T ] × U2 [t, T ], J 1 (t, x; Θ1∗ X ∗ + v1∗ , Θ2∗ X ∗ + v2∗ )  J 1 (t, x; Θ1 X + v1 , Θ2∗ X + v2∗ ), J

2

(t, x; Θ1∗ X ∗

+

v1∗ , Θ2∗ X ∗

+

v2∗ )

J

2

(t, x; Θ1∗ X

+

v1∗ , Θ2 X

+ v2 ).

(2.1.6) (2.1.7)

In the zero-sum case, (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) is called a closed-loop saddle point. Remark 2.1.4 Similar to the closed-loop optimal strategy of Problem (SLQ), the closed-loop Nash equilibrium is also required to be independent of the initial state x. One should note that in both (2.1.6) and (2.1.7), X ∗ (·) = X (· ; t, x, Θ1∗ , v1∗ , Θ2∗ , v2∗ ), whereas, in (2.1.6) X (·) = X (· ; t, x, Θ1 , v1 , Θ2∗ , v2∗ ), and in (2.1.7) X (·) = X (· ; t, x, Θ1∗ , v1∗ , Θ2 , v2 ). Thus, the X appeared in (2.1.6) and (2.1.7) are different in general.

2.1 Formulation

19

The following result provides some equivalent definitions of closed-loop Nash equilibrium, whose proof is similar to the case of LQ optimal control problems; see Proposition 2.1.5 of [48, Chap. 2]. Proposition 2.1.5 Let (G1)–(G2) hold, and let (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) ∈ Θ 1 [t, T ] × U1 [t, T ] × Θ 2 [t, T ] × U2 [t, T ]. The following are equivalent: (i) (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) is a closed-loop Nash equilibrium on [t, T ]; (ii) for any x ∈ Rn and (v1 , v2 ) ∈ U1 [t, T ] × U2 [t, T ], J 1 (t, x; Θ1∗ X ∗ + v1∗ , Θ2∗ X ∗ + v2∗ )  J 1 (t, x; Θ1∗ X + v1 , Θ2∗ X + v2∗ ), J 2 (t, x; Θ1∗ X ∗ + v1∗ , Θ2∗ X ∗ + v2∗ )  J 2 (t, x; Θ1∗ X + v1∗ , Θ2∗ X + v2 ); (iii) for any x ∈ Rn and (u 1 , u 2 ) ∈ U1 [t, T ] × U2 [t, T ], J 1 (t, x; Θ1∗ X ∗ + v1∗ , Θ2∗ X ∗ + v2∗ )  J 1 (t, x; u 1 , Θ2∗ X + v2∗ ), J

2

(t, x; Θ1∗ X ∗

+

v1∗ , Θ2∗ X ∗

+

v2∗ )

J

2

(t, x; Θ1∗ X

+

v1∗ , u 2 ).

(2.1.8) (2.1.9)

Suppose that (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) is a closed-loop Nash equilibrium of Problem (SDG) on [t, T ]. If we denote by (u ∗1 , u ∗2 ) the outcome of this closed-loop Nash equilibrium, i.e., u i∗ = Θi∗ X ∗ + vi∗ , i = 1, 2, then (2.1.8) and (2.1.9) respectively become J 1 (t, x; u ∗1 , u ∗2 )  J 1 (t, x; u 1 , Θ2∗ X + v2∗ ), J 2 (t, x; u ∗1 , u ∗2 )  J 2 (t, x; Θ1∗ X + v1∗ , u 2 ).

(2.1.10) (2.1.11)

Since in (2.1.10), X corresponds to u 1 and (Θ2∗ , v2∗ ), one might not have u ∗2 = Θ2∗ X + v2∗ . Likewise, in (2.1.11) one might not have u ∗1 = Θ1∗ X + v1∗ either. Hence, comparing this with (2.1.4), we see that, in general, the outcome of a closed-loop Nash equilibrium is not necessarily an open-loop Nash equilibrium for the initial pair (t, x).

2.2 Open-Loop Nash Equilibria and Their Closed-Loop Representations To begin our study of open-loop Nash equilibria, we observe that the open-loop controls selected by the players are free to choose from Ui [t, T ]. This makes it possible to treat the two-person differential game as two related optimal control

20

2 Linear-Quadratic Two-Person Differential Games

problems. To elaborate on the idea, let us suppose that (u ∗1 , u ∗2 ) is an open-loop Nash equilibrium of Problem (SDG) for the initial pair (t, x). Consider the following two LQ optimal control problems: (1) To minimize

J (t, x; u 1 )  J 1 (t, x; u 1 , u ∗2 )

subject to the state equation ⎧ ∗ ⎪ ⎨ d X (s) = [A(s)X (s) + B1 (s)u 1 (s) + B2 (s)u 2 (s) + b(s)]ds + [C(s)X (s) + D1 (s)u 1 (s) + D2 (s)u ∗2 (s) + σ(s)]dW (s), ⎪ ⎩ X (t) = x. (2) To minimize

J (t, x; u 2 )  J 2 (t, x; u ∗1 , u 2 )

subject to the state equation ⎧ ∗ ⎪ ⎨ d X (s) = [A(s)X (s) + B2 (s)u 2 (s) + B1 (s)u 1 (s) + b(s)]ds + [C(s)X (s) + D2 (s)u 2 (s) + D1 (s)u ∗1 (s) + σ(s)]dW (s), ⎪ ⎩ X (t) = x. According to Definition 2.1.2, u ∗1 is an open-loop optimal control of Problem (1), and u ∗2 is an open-loop optimal control of problem (2). Thus, we may apply Theorem 1.1.9 of Chap. 1 to the above control problems. This leads to the following result. Theorem 2.2.1 Let (G1)–(G2) hold. Then (u ∗1 , u ∗2 ) ∈ U1 [t, T ] × U2 [t, T ] is an openloop Nash equilibrium of Problem (SDG) for the initial pair (t, x) if and only if for i = 1, 2, (a) the adapted solution (X ∗ , Yi∗ , Z i∗ ) to the FBSDE on [t, T ] ⎧ d X ∗ (s) = (AX ∗ + B1 u ∗1 + B2 u ∗2 + b)ds ⎪ ⎪ ⎪ ⎪ ⎪ + (C X ∗ + D1 u ∗1 + D2 u ∗2 + σ)dW (s), ⎪ ⎨  dYi∗ (s) = − A Yi∗ + C  Z i∗ + Q i X ∗ ⎪  ⎪ ⎪ ⎪ + (S1i ) u ∗1 + (S2i ) u ∗2 + q i ds + Z i∗ dW (s), ⎪ ⎪ ⎩ X ∗ (t) = x, Yi∗ (T ) = G i X ∗ (T ) + g i ,

(2.2.1)

satisfies the stationarity condition i ∗ i ∗ Bi Yi∗ + Di Z i∗ + Sii X ∗ + Ri1 u 1 + Ri2 u 2 + ρii = 0,

a.e. s ∈ [t, T ], a.s. (b) for any u i ∈ Ui [t, T ],

(2.2.2)

2.2 Open-Loop Nash Equilibria and Their Closed-Loop Representations

 i E G X i (T ), X i (T ) +

T

t



Q i (Sii ) Sii Riii



21

    Xi Xi , ds  0, ui ui

where X i is the solution to the SDE 

d X i (s) = (AX i + Bi u i )ds + (C X i + Di u i )dW (s), s ∈ [t, T ], X i (t) = 0,

(or equivalently, the mapping u i → J i (t, x; u 1 , u 2 ) is convex). Remark 2.2.2 (1) Note that for i = 1, 2 the FBSDEs (2.2.1) are coupled through the relation (2.2.2). In fact, from (2.2.2) we see that 

1 1 R11 R12 2 2 R21 R22

  u ∗1 u ∗2

 =−

B1 Y1∗ + D1 Z 1∗ + S11 X ∗ + ρ11 B2 Y2∗ + D2 Z 2∗ + S22 X ∗ + ρ22

 .

Thus, say, in the case that the coefficient matrix on the left-hand side is invertible, one has  −1     1 1 R11 B1 Y1∗ + D1 Z 1∗ + S11 X ∗ + ρ11 R12 u ∗1 =− . 2 2 u ∗2 R21 R22 B2 Y2∗ + D2 Z 2∗ + S22 X ∗ + ρ22 Plugging the above into (2.2.1), we see the coupling between these two FBSDEs. (2) An easily verifiable condition for (b) of Theorem 2.2.1 is G i  0, Riii (s) > 0, Q i (s) − Sii (s) Riii (s)−1 Sii (s)  0; a.e. s ∈ [t, T ]. This can be seen by completing the square: For any x ∈ Rn and u ∈ Rm i ,





Q i (s)x, x + 2 Sii (s)x, u + Riii (s)u, u  

= Q i (s) − Sii (s) Riii (s)−1 Sii (s) x, x    

+ Riii (s) u + Riii (s)−1 Sii (s)x , u + Riii (s)−1 Sii (s)x  0, a.e. s ∈ [t, T ].

Note that for i = 1, 2, the conditions (b) of Theorem 2.2.1 are independent, and both can be regarded as the convexity condition for certain Problem (SLQ) with appropriate state equation and cost functional. We now rewrite (2.2.1) and (2.2.2) in more compact forms. Recall the notation introduced right after Remark 2.1.1. Let Ik denote the identity matrix of size k, and let

22

2 Linear-Quadratic Two-Person Differential Games



       A 0 B 0 C 0 D 0 , B= , C= , D= , 0 A 0 B 0 C 0 D   1   1   1   1 Q 0 S 0 R 0 G 0 , Q = , S = , R = , G= 0 G2 0 Q2 0 S2 0 R2  1  1  1   q ρ g I q= , ρ= 2 , Ik = k . g= 2 , g q2 ρ Ik A=

Note that the matrices G, Q, and R are symmetric. If we denote by J the 2m × m matrix ⎛ ⎞ ⎛ ⎞ Im 1 0 Im 1 0m 1 ×m 2 ⎜ 0 0 ⎟ ⎜0m 2 ×m 1 0m 2 ×m 2 ⎟ ⎜ ⎟ ⎜ ⎟ ⎝ 0 0 ⎠ = ⎝0m 1 ×m 1 0m 1 ×m 2 ⎠ , 0m 2 ×m 1 Im 2 0 Im 2 then

⎞ Im 1 0   0 0 ⎟ B1 0 B1 B2 0 0 ⎜ ⎟ ⎜ = , BJ = 0 0 B1 B2 ⎝ 0 0 ⎠ 0 B2 0 Im 2 ⎛ ⎞   Im 1 0   D1 D2 0 0 ⎜ 0 0 ⎟ D1 0 ⎜ ⎟ DJ = , = 0 0 D1 D2 ⎝ 0 0 ⎠ 0 D2 0 Im 2 ⎛ 1 ⎞  S11 0  1   S2 0 ⎟ S 0 Im 1 0 0 0 ⎜  ⎜ ⎟ J S= = 1 2 , 0 0 0 Im 2 ⎝ 0 S12 ⎠ 0 S2 0 S22 ⎛ 1 ⎞  R11 0  1   ⎟ I 0 0 0 ⎜ ⎜ R2 02 ⎟ = R1 02 , J  R = m1 0 0 0 Im 2 ⎝ 0 R1 ⎠ 0 R2 0 R22 ⎛ 1⎞  ρ11  1  ⎟ I 0 0 0 ⎜ ⎜ρ22 ⎟ = ρ21 . J ρ = m1 ρ2 0 0 0 Im 2 ⎝ρ1 ⎠ ρ22 



With the above notation and with   Y1 (s) , Y (s)  Y2 (s)



 Z(s) 

 Z 1 (s) , Z 2 (s)

we can express the FBSDEs (2.2.1) and the stationarity conditions (2.2.2) (i = 1, 2), respectively, more compactly as (dropping ∗)

2.2 Open-Loop Nash Equilibria and Their Closed-Loop Representations

⎧ + Bu + b)ds + (C X + Du + σ)dW (s), ⎪ ⎨ d X (s) = (AX !  " dY (s) = − A Y + C  Z + Q I n X + S I m u + q ds + ZdW (s), ⎪ ⎩ X (t) = x, Y (T ) = G I n X (T ) + g,

23

(2.2.3)

and ! " J  B  Y + D Z + S I n X + R I m u + ρ = 0, a.e. s ∈ [t, T ], a.s.

(2.2.4)

From Theorem 2.2.1 we see that the open-loop Nash equilibria are determined by (2.2.3)–(2.2.4). To solve for an open-loop Nash equilibrium (u 1 , u 2 ) from (2.2.3)– (2.2.4), we introduce the ansatz that the adapted solution (X, Y , Z) to the FBSDE (2.2.3) has the form Y=

        Π1 X + η1 Π1 η Y1 = ≡ Π X + η; Π  , η 1 , Y2 Π2 X + η2 Π2 η2

where Πi : [t, T ] → Rn×n ; i = 1, 2, are differentiable maps to be determined, and η : [t, T ] × Ω → R2n is a stochastic process satisfying a certain equation. To match the terminal condition Y (T ) = G I n X (T ) + g, we impose the requirements Π(T ) = G I n , η(T ) = g. The second requirement suggests that the equation for η should be a BSDE: 

dη(s) = α(s)ds + ζ(s)dW (s), s ∈ [t, T ], η(T ) = g,

where α : [t, T ] × Ω → R2n is to be determined. Applying Itô’s formula to both sides of Y = Π X + η, we obtain " ! − A Y + C  Z + Q I n X + S I m u + q ds + ZdW     ˙ X + Π(AX + Bu + b) + α ds + Π(C X + Du + σ) + ζ dW = Π     ˙ + Π A)X + Π Bu + Πb + α ds + ΠC X + Π Du + Πσ + ζ dW. = (Π Comparing the drift and diffusion coefficients, we find A Y + C  Z + Q I n X + S I m u + q ˙ + Π A)X + Π Bu + Πb + α = 0, + (Π

(2.2.5)

Z = ΠC X + Π Du + Πσ + ζ.

(2.2.6)

and

24

2 Linear-Quadratic Two-Person Differential Games

Substituting Y = Π X + η and (2.2.6) into (2.2.4) yields ! " ! " J  B  Π + D ΠC + S I n X + J  R I m + D Π D u ! " + J  B  η + D ζ + D Πσ + ρ = 0.

(2.2.7)

! " If the Rm×m -valued function J  R I m + D Π D has bounded inverse, then (2.2.7) implies " −1 !  u = − J  (R I m + D Π D) J  B  Π + D ΠC + S I n X  −1 ! " − J  (R I m + D Π D) J  B  η + D ζ + D Πσ + ρ ,

(2.2.8)

which, together with (2.2.5) and (2.2.6), in turn yields 0 = A (Π X + η) + C  (ΠC X + Π Du + Πσ + ζ) ! " ˙ + Π A X + Π Bu + Πb + α + Q I n X + S I m u + q + Π ! " ! " ˙ + Π A + A Π + C  ΠC + Q I n X + Π B + C  Π D + S I m u = Π + A η + C  ζ + C  Πσ + Πb + q + α ! " ˙ + Π A + A Π + C  ΠC + Q I n X = Π ! " −1 − Π B + C  Π D + S I m J  (R I m + D Π D) ! " × J  B  Π + D ΠC + S I n X ! " −1 − Π B + C  Π D + S I m J  (R I m + D Π D) ! " × J  B  η + D ζ + D Πσ + ρ + A η + C  ζ + C  Πσ + Πb + q + α. Comparing the coefficients of X in the above, we see that Π should be the solution to the following Riccati equation on [t, T ]: ! " ⎧     ˙ ⎪ ⎨ Π + Π A + A Π + C ΠC + Q I n − Π B + C Π D + S I m   " −1  !    × J B = 0, (R I + D Π D) J Π + D ΠC + S I m n ⎪ ⎩ Π(T ) = G I n , and that the BSDE for (η, ζ) should be ⎧ " !  " !  " #!     ⎪ ⎨ dη(s) = − A + K B η +$ C + K D ζ + C + K D Πσ + Πb + q + K ρ ds + ζdW (s), ⎪ ⎩ η(T ) = g, where

" ! −1 K = − Π B + C  Π D + S I m J  (R I m + D Π D) J  .

(2.2.9)

2.2 Open-Loop Nash Equilibria and Their Closed-Loop Representations

25

! If the Riccati equation (2.2.9) indeed admits a solution Π such that J  R I m + " D Π D has bounded inverse, then by reversing the above argument, we see the 4-tuple (X, Y , Z, u), defined through the forward SDE % ⎧   "& −1 !    ⎪ d X = A − B J B X (R I + D Π D) J Π + D ΠC + S I m n ⎪ ⎪ ⎪  ⎪  −1 ! " ⎪ ⎪ ⎪ − B J (R I m + DΠ D) J  B  η + D ζ + DΠσ + ρ + b ds ⎪ ⎪ ⎨ %   "& −1 !    + C − D J B X (R I + D Π D) J Π + D ΠC + S I m n ⎪ ⎪ ⎪  ⎪ ⎪ ⎪ − D  J (R I + DΠ D)−1 J ! B  η + D ζ + DΠσ + ρ" + σ dW, ⎪ m ⎪ ⎪ ⎪ ⎩ X (t) = x, and the Eqs. (2.2.8), Y = Π X + η, and (2.2.6), satisfies (2.2.3)–(2.2.4). If, in addition, the convexity condition (b) of Theorem 2.2.1 holds for i = 1, 2, then by Theorem 2.2.1, the control pair u defined by (2.2.8) is an open-loop Nash equilibrium for the initial state x. Observe that u takes the form u(s) = Θ(s)X (s) + v(s), s ∈ [t, T ], where (Θ, v) ∈ Θ[t, T ] × U[t, T ] is independent of x. Writing (Θ, v) componentwise as (Θ, v) = (Θ1 , v1 ; Θ2 , v2 ) with Θi ∈ Θ i [t, T ], vi ∈ Ui [t, T ]; i = 1, 2, we see then that (Θ, v) is a pair of closed-loop strategies. This leads to the following definition. Definition 2.2.3 We say that the open-loop Nash equilibria of Problem (SDG) with initial time t admit a closed-loop representation, if there exists a pair (Θ, v) ∈ Θ[t, T ] × U[t, T ] of closed-loop strategies such that for any initial state x ∈ Rn , the process u(s)  Θ(s)X (s) + v(s), s ∈ [t, T ] (2.2.10) is an open-loop Nash equilibrium of Problem (SDG) for (t, x), where X (·) = X (· ; t, x, Θ, v) is the solution to the following closed-loop system: # $ ⎧ + B(s)Θ(s)]X (s) + B(s)v(s) + b(s) ds ⎪ ⎨ d X (s) = [A(s) # $ + [C(s) + D(s)Θ(s)]X (s) + D(s)v(s) + σ(s) dW (s), ⎪ ⎩ X (t) = x.

(2.2.11)

Comparing Definitions 2.1.3 and 2.2.3, it is natural to ask: Does the closedloop representation of open-loop Nash equilibria coincide with the outcome of some

26

2 Linear-Quadratic Two-Person Differential Games

closed-loop Nash equilibrium? The answer is no. A counterexample will be presented in Sect. 2.4. Now we present a characterization of the closed-loop representation of open-loop Nash equilibria. Theorem 2.2.4 Let (G1)–(G2) hold. Let (Θ, v) ∈ Θ[t, T ] × U[t, T ]. The openloop Nash equilibria of Problem (SDG) with initial time t admit the closed-loop representation (2.2.10) if and only if (a) the convexity condition (b) of Theorem 2.2.1 holds for i = 1, 2, (b) the solution Π ∈ C([t, T ]; Rn×2n ) to the ODE on [t, T ] ⎧   ˙ ⎪ ⎨ Π + !Π A + A Π + C ΠC +" Q I n + Π B + C  Π D + S I m Θ = 0, ⎪ ⎩ Π(T ) = G I n ,

(2.2.12)

satisfies 

! "  J  (R I m + D Π D) Θ + J  B  Π + D ΠC + S I n = 0

(2.2.13)

almost everywhere on [t, T ], and (c) the adapted solution (η, ζ) to the BSDE on [t, T ] ! "   ⎧    ⎪ ⎨ dη(s) = − A η + C ζ + Π B+ C Π D + S I m v + C  Πσ + Πb + q ds + ζdW (s), ⎪ ⎩ η(T ) = g,

(2.2.14)

satisfies 

!  " J  (R I m + D Π D) v + J  B  η + D ζ + D Πσ + ρ = 0

(2.2.15)

almost surely and almost everywhere on [t, T ]. Proof For arbitrary but fixed x ∈ Rn , let X be the solution to (2.2.11). Let u be defined by (2.2.10) and set Y = Π X + η,

Z = Π(C + DΘ)X + Π Dv + Πσ + ζ.

Then Y (T ) = G I n X (T ) + g, and

2.2 Open-Loop Nash Equilibria and Their Closed-Loop Representations

27

˙ X ds + Πd X + dη dY = Π  ˙ X + Π(A + BΘ)X + Π Bv + Πb − A η − C  ζ = Π !  " − Π B + C  Π D + S I m v − C  Πσ − Πb − q ds   + Π(C + DΘ)X + Π Dv + Πσ + ζ dW  ! " = − A Π + C  ΠC + Q I n + C  Π DΘ + S I m Θ X " !  − A η − C  ζ − C  Π D + S I m v − C  Πσ − q ds + ZdW #   = − A (Π X +η) − Q I n X − C  Π(C + DΘ)X +Π Dv+Πσ+ζ $ − S I m (Θ X + v) − q ds + ZdW ! " = − A Y − Q I n X − C  Z − S I m u − q ds + ZdW. This shows that (X, Y , Z, u) satisfies the FBSDE (2.2.3). According to Theorem 2.2.1, the control pair u defined by (2.2.10) is an open-loop Nash equilibrium for (t, x) if and only if (a) holds and ! " 0 = J  B  Y + D Z + S I n X + R I m u + ρ # = J  B  (Π X + η) + D [Π(C + DΘ)X + Π Dv + Πσ + ζ] $ + S I n X + R I m (Θ X + v) + ρ   = J  B  Π + D ΠC + S I n + (R I m + D Π D)Θ X   + J  B  η + D ζ + D Πσ + ρ + (R I m + D Π D)v . (2.2.16) Since the initial state x is arbitrary and J  [B  η + D ζ + D Πσ + ρ + (R I m + D Π D)v] is independent of x, we conclude that (2.2.16) is equivalent to (2.2.13) and (2.2.15).

We conclude this section with a remark on the solution Π of Eq. (2.2.12). The Eqs. (2.2.12) and (2.2.13) can be written componentwise as ⎧   ˙ Qi ⎪ ⎨ Πi + Πi A + A Πi + C Πi C +  + Πi B + C  Πi D + (S i ) Θ = 0, ⎪ ⎩ Πi (T ) = G i ; i = 1, 2, and

  1 R1 + D1 Π1 D R22 + D2 Π2 D

Θ+

(2.2.17)

   B1 Π1 + D1 Π1 C + S11 = 0, B2 Π2 + D2 Π2 C + S22

(2.2.18)

respectively. The relation (2.2.18) shows that the equations for Π1 and Π2 are coupled. Since the Rm×m -valued function 

R11 + D1 Π1 D R22 + D2 Π2 D



 =

1 1 R11 + D1 Π1 D1 R12 + D1 Π1 D2 2 2 R21 + D2 Π2 D1 R22 + D2 Π2 D2



28

2 Linear-Quadratic Two-Person Differential Games

is not necessarily symmetric even if D1 and D2 are zero, we see that, in general, Π1 and Π2 are not symmetric.

2.3 Closed-Loop Nash Equilibria and Symmetric Riccati Equations In this section we characterize the closed-loop Nash equilibria of Problem (SDG). The idea is similar to that of the previous section: We regard the components of a closed-loop Nash equilibrium as closed-loop optimal strategies of certain LQ control problems. To make this idea precise, let (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) be a pair of closed-loop strategies of Problem (SDG) on [t, T ], and consider the state equation ⎧ ∗ ∗ ⎪ ⎨ d X (s) = [(A + B2 Θ2 )X + B1 u 1 + B2 v2 + b]ds + [(C + D2 Θ2∗ )X + D1 u 1 + D2 v2∗ + σ]dW (s), ⎪ ⎩ X (t) = x,

(2.3.1)

with the cost functional J'(t, x; u 1 )  J 1 (t, x; u 1 , Θ2∗ X + v2∗ ), that is, 



J'(t, x; u 1 ) = E G 1 X (T ), X (T ) + 2 g 1 , X (T ) ⎡⎛ ⎞⎛ ⎞ ⎛ ⎞ T Q 1 (S11 ) (S21 ) X X 1 1 ⎠⎝ ⎣ ⎝ S11 R11 ⎠, ⎝ ⎠ u1 u1 R12 + 1 1 1 ∗ ∗ ∗ ∗ t S2 R21 R22 Θ2 X + v2 Θ2 X + v2 ⎞⎤ ⎫

⎛q 1 ⎞ ⎛ X ⎬ ⎠ ⎦ ds u1 + 2 ⎝ρ11 ⎠, ⎝ ⎭ ρ12 Θ2∗ X + v2∗ 



= E G 1 X (T ), X (T ) + 2 g 1 , X (T )           T  '' Q S X X qˆ X , + 2 ds + c, ˆ , + 1 ' u u ρ ˆ u S R11 1 1 1 t where



T

cˆ = E t



(2.3.2)



 1 ∗ R22 v2 , v2∗ + 2 ρ12 , v2∗ ds

is a constant and 1 ' = Q 1 + (S21 ) Θ2∗ + (Θ2∗ ) S21 + (Θ2∗ ) R22 Q Θ2∗ ,

1 ' S = S11 + R12 Θ2∗ ,

1 qˆ = q 1 + (Θ2∗ ) ρ12 + (S21 + R22 Θ2∗ ) v2∗ ,

1 ∗ ρˆ = ρ11 + R12 v2 .

2.3 Closed-Loop Nash Equilibria and Symmetric Riccati Equations

29

We denote the problem of minimizing (2.3.2) subject to the Eq. (2.3.1) by Problem (1). In a similar fashion, we may consider the problem of minimizing J'(t, x; u 2 )  J 2 (t, x; Θ1∗ X + v1∗ , u 2 ) subject to the state equation ⎧ ∗ ∗ ⎪ ⎨ d X (s) = [(A + B1 Θ1 )X + B2 u 2 + B1 v1 + b]ds + [(C + D1 Θ1∗ )X + D2 u 2 + D1 v1∗ + σ]dW (s), ⎪ ⎩ X (t) = x, which we denote by Problem (2). Theorem 2.3.1 Let (G1)–(G2) hold. Let (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) ∈ Θ 1 [t, T ] × U1 [t, T ] ×Θ 2 [t, T ] × U2 [t, T ] and denote  ∗  ∗ Θ1 v ∗ , v = 1∗ . Θ = Θ2∗ v2 ∗

Then (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) is a closed-loop Nash equilibrium of Problem (SDG) on [t, T ] if and only if for i = 1, 2, (a) the solution Pi ∈ C([t, T ]; Sn ) to the symmetric Lyapunov type equation ⎧ P˙i + Pi A + A Pi + C  Pi C + Q i + (Θ ∗ ) (R i + D  Pi D)Θ ∗ ⎪ ⎪ ⎪ ⎪ ⎨ +  Pi B + C  Pi D + (S i ) Θ ∗   ⎪ + (Θ ∗ ) B  Pi + D  Pi C + S i = 0, ⎪ ⎪ ⎪ ⎩ Pi (T ) = G i

(2.3.3)

satisfies the following two conditions: for a.e. s ∈ [t, T ],

Bi Pi

+

Riii + Di Pi Di  0,  Di Pi C + Sii + (Rii + Di Pi D)Θ ∗

(2.3.4) = 0;

(2.3.5)

(b) the adapted solution (ηi , ζi ) to the BSDE on [t, T ] ⎧  ⎪ dη (s) = − (A + BΘ ∗ ) ηi + (C + DΘ ∗ ) ζi ⎪ i ⎪ ⎪ ⎪   ⎪ ⎨ + (Θ ∗ ) (R i + D Pi D) + Pi B + C  Pi D + (S i ) v ∗  (2.3.6) ⎪ ∗  ∗  i i ⎪ ds + ζ + (C + DΘ ) P σ + (Θ ) ρ + P b + q dW, ⎪ i i i ⎪ ⎪ ⎪ ⎩ ηi (T ) = g i satisfies the following condition: for a.e. s ∈ [t, T ],

30

2 Linear-Quadratic Two-Person Differential Games

Bi ηi + Di ζi + Di Pi σ + ρii + (Rii + Di Pi D)v ∗ = 0, a.s.

(2.3.7)

Proof From Proposition 2.1.5 we see that (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) is a closed-loop Nash equilibrium of Problem (SDG) if and only if (Θ1∗ , v1∗ ) is a closed-loop optimal strategy of Problem (1) and (Θ2∗ , v2∗ ) is a closed-loop optimal strategy of Problem (2). According to Theorem 1.1.13 of Chap. 1, (Θ1∗ , v1∗ ) is a closed-loop optimal strategy of Problem (1) if and only if the solution P1 ∈ C([t, T ]; Sn ) to the ODE ⎧ P˙ + P1 (A + B2 Θ2∗ ) + (A + B2 Θ2∗ ) P1 + (C + D2 Θ2∗ ) P1 (C + D2 Θ2∗ ) ⎪ ⎪ ⎪ 1 ⎪ ⎪ ∗  1  ∗ ' ⎪ ⎪ ⎨ + Q + (Θ1 ) (R11 + D1 P1 D1 )Θ1  + P1 B1 + (C + D2 Θ2∗ ) P1 D1 + ' S  Θ1∗ ⎪   ⎪ ∗   ∗ ⎪ ' ⎪ + (Θ ) B P + D P (C + D Θ ) + S = 0, 1 1 2 ⎪ 1 1 1 2 ⎪ ⎪ ⎩ 1 P1 (T ) = G is such that for a.e. s ∈ [t, T ], 1 + D1 P1 D1  0, R11 1 B1 P1 + D1 P1 (C + D2 Θ2∗ ) + ' S + (R11 + D1 P1 D1 )Θ1∗ = 0,

and the adapted solution (η1 , ζ1 ) to the BSDE ⎧  ⎪ dη (s) = − [(A + B2 Θ2∗ ) + B1 Θ1∗ ] η1 + [(C + D2 Θ2∗ ) + D1 Θ1∗ ] ζ1 ⎪ 1 ⎪ ⎪ ⎪ ⎪ ⎨ + [(C + D2 Θ2∗ ) + D1 Θ1∗ ] P1 (D2 v2∗ + σ)  ⎪ ∗  ∗ ⎪ + (Θ ) ρ ˆ + P (B v + b) + q ˆ ds + ζ1 dW (s), ⎪ 1 2 2 1 ⎪ ⎪ ⎪ ⎩ η1 (T ) = g 1 is such that for a.e. s ∈ [t, T ], 1 + D1 P1 D1 )v1∗ = 0, a.s. B1 η1 + D1 ζ1 + D1 P1 (D2 v2∗ + σ) + ρˆ + (R11

A tedious but straightforward calculation shows that the above equations can respectively be simplified to the following: ⎧   ˙ Q 1 + (Θ ∗ ) (R 1 + D  P1 D)Θ ∗ ⎪ ⎨ P1 + P1 A + A P1 + C P1 C +    + P1 B + C  P1 D + (S 1 ) Θ ∗ + (Θ ∗ ) B  P1 + D  P1 C + S 1 = 0, ⎪ ⎩ P1 (T ) = G 1 , 1 + D1 P1 D1  0, R11

B1 P1 + D1 P1 C + S11 + (R11 + D1 P1 D)Θ ∗ = 0,

2.3 Closed-Loop Nash Equilibria and Symmetric Riccati Equations

31

⎧  ⎪ dη (s) = − (A + BΘ ∗ ) η1 + (C + DΘ ∗ ) ζ1 ⎪ 1 ⎪ ⎪ ⎪   ⎪ ⎨ + (Θ ∗ ) (R 1 + D P1 D) + P1 B + C  P1 D + (S 1 ) v ∗  ⎪ ∗  ∗  1 1 ⎪ ds + ζ1 dW (s), + (C + DΘ ) P σ + (Θ ) ρ + P b + q ⎪ 1 1 ⎪ ⎪ ⎪ ⎩ η1 (T ) = g 1 , B1 η1 + D1 ζ1 + D1 P1 σ + ρ11 + (R11 + D1 P1 D)v ∗ = 0, a.s. From the above, we see that (Θ1∗ , v1∗ ) is a closed-loop optimal strategy of Problem (1) if and only if the conditions (a) and (b) hold for i = 1. In a similar manner we can show that (Θ2∗ , v2∗ ) is a closed-loop optimal strategy of Problem (2) if and only if the conditions (a) and (b) hold for i = 2.

Note that conditions (2.3.5) and (2.3.7) are, respectively, equivalent to:  

B1 P1 + D1 P1 C + S11 B2 P2 + D2 P2 C + S22

B1 η1 + D1 ζ1 + D1 P1 σ + ρ11 B2 η2 + D2 ζ2 + D2 P2 σ + ρ22



 +



 +

R11 + D1 P1 D R22 + D2 P2 D R11 + D1 P1 D

 Θ ∗ = 0,  v ∗ = 0.

R22 + D2 P2 D

Therefore,  ∗

Θ =−  ∗

v =−

R11 + D1 P1 D R22 + D2 P2 D R11 + D1 P1 D R22 + D2 P2 D

−1  −1 

B1 P1 + D1 P1 C + S11 B2 P2 + D2 P2 C + S22

 ,

B1 η1 + D1 ζ1 + D1 P1 σ + ρ11 B2 η2 + D2 ζ2 + D2 P2 σ + ρ22

(2.3.8)  ,

provided the involved inverse (which is an Rm×m -valued function) exists. Plugging (2.3.8) into (2.3.3), we see that the equations for P1 and P2 are coupled, symmetric, and of Riccati type. We conclude this section with a compact form of the Lyapunov type Eqs. (2.3.3) (i = 1, 2). The differential equations in (2.3.3) (i = 1, 2) can be expressed in block matrix notation as

32

2 Linear-Quadratic Two-Person Differential Games

        P1 0 A 0 P1 0 P˙1 0 A 0 + 0= + 0 P2 0 P2 0 A 0 A 0 P˙2        P1 0 C 0 C 0 Q1 0 + + 0 P2 0 Q2 0 C 0 C  ∗   1   ∗  0 Θ 0 R + D P1 D Θ 0 + 0 Θ∗ 0 R 2 + D  P2 D 0 Θ∗  ∗   0 Θ 0 P1 B + C  P1 D + (S 1 ) + 0 P2 B + C  P2 D + (S 2 ) 0 Θ∗  ∗     0 Θ 0 B P1 + D  P1 C + S 1 + . 0 Θ∗ 0 B  P2 + D  P2 C + S 2 

Denoting

 P≡

  ∗  Θ 0 P1 0 , Θ≡ . 0 P2 0 Θ∗

and using the notation introduced in the previous section, we can write 

 R 1 + D P1 D 0 = R + DP D, 0 R 2 + D P2 D   0 P1 B + C P1 D + (S 1 ) = P B + C P D + S. 0 P2 B + C P2 D + (S 2 ) Consequently, one sees that the Eqs. (2.3.3) (i = 1, 2) are equivalent to ! " ⎧    ˙ R + D P D Θ ⎪ ⎨ P + !P A + A P + C P C" + Q + Θ ! " + P B + C  P D + S Θ + Θ  B  P + D P C + S = 0, ⎪ ⎩ P(T ) = G, which is easily seen to be symmetric. Note that in the notation of the previous section, the conditions (2.3.5) (i = 1, 2) can be rewritten as

2.3 Closed-Loop Nash Equilibria and Symmetric Riccati Equations



B1 P1 + D1 P1 C + S11





R11 + D1 P1 D

33



+ Θ∗ B2 P2 + D2 P2 C + S22 R22 + D2 P2 D            P1 0 In D1 0 P1 0 C 0 B1 0 In + = 0 B2 0 P2 0 D2 0 P2 In In 0 C  1     1    S 0 In Im 1 0 0 0 R 0 Im Im 1 0 0 0 + Θ∗ + In Im 0 0 0 Im 2 0 S2 0 0 0 Im 2 0 R2       P1 0 D 0 Im D1 0 Θ∗ + Im 0 D2 0 P2 0 D "  ! "  ! = J  B  P + D P C + S I n + J  R + D P D I m Θ ∗ .

0=

2.4 Relations Between Open-Loop and Closed-Loop Nash Equilibria Recall that for the LQ optimal control problem, which is a special case of Problem (SDG), the closed-loop solvability implies the open-loop solvability. It is natural to expect that an analogous result holds for the LQ two-person differential game. However, as the next example shows, the existence of a closed-loop Nash equilibrium does not necessarily imply the existence of an open-loop Nash equilibrium. In this section, we discuss the connections between open-loop and closed-loop Nash equilibria by presenting three examples. It turns out that the open-loop and closed-loop solvability of Problem (SDG) are quite different. The following example shows that Problem (SDG) may have only closed-loop Nash equilibria and no open-loop Nash equilibria. Example 2.4.1 Consider the one-dimensional state equation 

d X (s) = u 1 (s)ds + u 2 (s)dW (s), s ∈ [t, 1], X (t) = x,

and the cost functionals

34

2 Linear-Quadratic Two-Person Differential Games







1

J (t, x; u 1 , u 2 ) = E |X (1)| + |u 1 (s)| ds , t  1% &  2 2 2 2 − |X (s)| + |u 2 (s)| ds . J (t, x; u 1 , u 2 ) = E −|X (1)| + 1

2

2

t

Let t ∈ [0, 1) be arbitrary. We claim that (Θ1∗ (s), v1∗ (s); Θ2∗ (s), v2∗ (s)) = ((s − 2)−1 , 0; 0, 0) is a closed-loop Nash equilibrium of the above problem on [t, 1]. According to Theorem 2.3.1, to verify this claim it suffices to show that for i = 1, 2, the solution Pi to the Eq. (2.3.3) satisfies (2.3.4) and (2.3.5), since in this example the solution (ηi , ζi ) to the BSDE (2.3.6) is identically zero. Simplifying the equations for P1 and P2 we obtain  P˙1 (s) + P1 (s)Θ2∗ (s)2 + 2P1 (s)Θ1∗ (s) + Θ1∗ (s)2 = 0, P1 (1) = 1,  P˙2 (s) + P2 (s)Θ2∗ (s)2 + 2P2 (s)Θ1∗ (s) + Θ2∗ (s)2 − 1 = 0, P2 (1) = −1. Substituting in Θ1∗ (s) = (s − 2)−1 and Θ2∗ (s) = 0 and solving for P1 and P2 yield P1 (s) =

1 , 2−s

P2 (s) =

−(2 − s)3 − 2 . 3(2 − s)2

Now one can easily verify that (2.3.4) and (2.3.5) hold. Next we prove, by contradiction, that the problem does not have open-loop Nash equilibria. Suppose that (u ∗1 , u ∗2 ) is an open-loop Nash equilibrium for some initial pair (t, x) with t ∈ [0, 1). Then u ∗2 is an open-loop optimal control of the following Problem (SLQ): To minimize  2 ' J (t, x; u 2 ) = E −|X (1)| +

1

%

&  − |X (s)| + |u 2 (s)| ds 2

2

t

subject to the state equation 

d X (s) = u ∗1 (s)ds + u 2 (s)dW (s), s ∈ [t, 1], X (t) = x.

(2.4.1)

2.4 Relations Between Open-Loop and Closed-Loop Nash Equilibria

35

Take u 2 to be an arbitrary constant λ. Then the corresponding solution to the above equation is given by X (s) = x + t

s

u ∗1 (r )dr + λ[W (s) − W (t)].

(2.4.2)

Let ε > 0 be undetermined. Substituting (2.4.2) into (2.4.1) and using the inequality (a + b)2  (1 − 1ε )a 2 + (1 − ε)b2 , we obtain 

2   1 / 02 1 u ∗1 (s)ds + (ε − 1)E λ[W (1) − W (t)] −1 E x + ε t  1  2 s 1 ∗ x+ −1 E + u 1 (r )dr ds ε t t 1/ 02 λ[W (s) − W (t)] ds + λ2 (1 − t) + (ε − 1)E t  1  2 1  2 2 1 s 1 ∗ ∗ x+ −1 E x + = u 1 (s)ds + u 1 (r )dr ds ε t t t   1 + λ2 (1 − t) 2ε + (ε − 1)(1 − t) . 2

J'(t, x; λ) 

Choosing ε > 0 small enough so that 2ε + (ε − 1)(1 − t) < 0 and then letting λ → ∞, we see that inf J'(t, x; u 2 ) = −∞, u 2 ∈L 2F (t,1;R)

which contradicts the fact that the LQ problem has an open-loop optimal control u ∗2 . The following example shows that Problem (SDG) may have only open-loop Nash equilibria and no closed-loop Nash equilibria. Example 2.4.2 Consider the one-dimensional state equation 

d X (s) = [u 1 (s) + u 2 (s)]ds + [u 1 (s) − u 2 (s)]dW (s), s ∈ [t, 1], X (t) = x,

and the cost functionals J 1 (t, x; u 1 , u 2 ) = J 2 (t, x; u 1 , u 2 ) = E|X (1)|2 ≡ J (t, x; u 1 , u 2 ). Let t ∈ [0, 1) be arbitrary, and let β 

1 . 1−t

We claim that

36

2 Linear-Quadratic Two-Person Differential Games

  / 0 βx βx β β 1[t,t+ β1 ] (s), 1[t,t+ β1 ] (s) , s ∈ [t, 1] u 1 (s), u 2 (s) = − 2 2 is an open-loop Nash equilibrium for the initial pair (t, x). Indeed, it is clear that β J (t, x; u 1 , u 2 )  0 for any u 1 ∈ L 2F (t, 1; R). On the other hand, the state process X β β β corresponding to (u 1 , u 2 ) and (t, x) satisfies X β (1) = 0. Hence, β

β

β

β

β

J (t, x; u 1 , u 2 ) = 0  J (t, x; u 1 , u 2 ), ∀u 1 ∈ L 2F (t, 1; R). Likewise, β

J (t, x; u 1 , u 2 ) = 0  J (t, x; u 1 , u 2 ), ∀u 2 ∈ L 2F (t, 1; R). This establishes the claim. However, the problem does not admit a closed-loop Nash equilibrium. This can be proved by contradiction. Suppose that (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) is a closed-loop Nash equilibrium on [t, 1] with t < 1. Consider the corresponding ODEs in Theorem 2.3.1, which now become  P˙i + Pi (Θ1∗ − Θ2∗ )2 + 2Pi (Θ1∗ + Θ2∗ ) = 0, i = 1, 2. (2.4.3) Pi (1) = 1. The corresponding constraints read P1 , P2  0,

P1 + P1 (Θ1∗ − Θ2∗ ) = 0,

P2 − P2 (Θ1∗ − Θ2∗ ) = 0.

(2.4.4)

Since P1 and P2 satisfy the same ODE (2.4.3), we have P1 = P2 by the uniqueness of solutions. Then the last two equations in (2.4.4) imply P1 ≡ 0, which contradicts the terminal condition P1 (1) = 1. The next example shows that the closed-loop representation of open-loop Nash equilibria is not necessarily the outcome of a closed-loop Nash equilibrium. Example 2.4.3 Consider the one-dimensional state equation 

d X (s) = [u 1 (s) + u 2 (s)]ds + X (s)dW (s), s ∈ [t, T ], X (t) = x,

and the cost functionals J 1 (t, x; u 1 , u 2 ) = E X (T )2 + J 2 (t, x; u 1 , u 2 ) = E X (T )2 +

T

 u 1 (s)2 ds ,

T

 u 2 (s)2 ds .

t

t

2.4 Relations Between Open-Loop and Closed-Loop Nash Equilibria

37

In this example, A = 0, C = 1,   0 1 2 ρ =ρ = , 0

B1 = B2 = 1,   0 1 2 S =S = , 0

D1 = D2 = 0,   10 1 R = , 00

b = σ = 0,   00 2 R = , 01

q 1 = q 2 = 0,

Q 1 = Q 2 = 0,

G 1 = G 2 = 1,

g 1 = g 2 = 0.

Clearly, the convexity condition (b) of Theorem 2.2.1 holds for i = 1, 2. The corresponding Eqs. (2.2.12)–(2.2.13) can be written componentwise as Π˙ i (s) + Πi (s) + Πi (s)[Θ1 (s) + Θ2 (s)] = 0, Πi (T ) = 1; Θi (s) + Πi (s) = 0; i = 1, 2. It is not hard to see that the solutions Π1 and Π2 are equal and both are given by Π1 (s) = Π2 (s) =

e T −s . 2e T −s − 1

Note that if we take v to be zero, then the adapted solution (η, ζ) to the BSDE (2.2.14) is identically (0, 0) and hence (2.2.15) holds. Thus, by Theorem 2.2.4, the open-loop Nash equilibria of this problem (with initial time t) admit a closed-loop representation which is given by u 1 (s) = u 2 (s) = −

e T −s X (s), 2e T −s − 1

s ∈ [t, T ].

(2.4.5)

Next we show that the problem admits a unique closed-loop Nash equilibrium (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) on [t, T ]. In light of Theorem 2.3.1, to determine (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ), we need to solve the constrained Riccati equations  

P˙1 (s) + P1 (s) + Θ1∗ (s)2 + 2P1 (s)[Θ1∗ (s) + Θ2∗ (s)] = 0, P1 (T ) = 1, P1 (s) + Θ1∗ (s) = 0,

(2.4.6)

P˙2 (s) + P2 (s) + Θ2∗ (s)2 + 2P2 (s)[Θ1∗ (s) + Θ2∗ (s)] = 0, P2 (T ) = 1, P2 (s) + Θ2∗ (s) = 0,

(2.4.7)

as well as the constrained BSDEs

38

2 Linear-Quadratic Two-Person Differential Games

# ∗ ⎧ ∗ ∗ ∗ ⎪ ⎨ dη1 (s) = − [Θ1 (s) + Θ2 (s)]η1 (s) +$ ζ1 (s) + Θ1 (s)v1 (s) + P1 (s)[v1∗ (s) + v2∗ (s)] ds + ζ1 (s)dW (s), ⎪ ⎩ η1 (T ) = 0, η1 (s) + v1∗ (s) = 0, # ∗ ⎧ ∗ ∗ ∗ ⎪ ⎨ dη2 (s) = − [Θ1 (s) + Θ2 (s)]η2 (s) +$ ζ2 (s) + Θ2 (s)v2 (s) + P2 (s)[v1∗ (s) + v2∗ (s)] ds + ζ2 (s)dW (s), ⎪ ⎩ η2 (T ) = 0, η2 (s) + v2∗ (s) = 0.

(2.4.8)

(2.4.9)

Using the relations Pi (s) + Θi∗ (s) = 0; i = 1, 2, we can further rewrite (2.4.6) and (2.4.7) as follows:  

P˙1 (s) = P1 (s)2 + 2P1 (s)P2 (s) − P1 (s), P1 (T ) = 1, Θ1∗ (s) = −P1 (s), P˙2 (s) = P2 (s)2 + 2P2 (s)P1 (s) − P2 (s), P2 (T ) = 1, Θ2∗ (s) = −P2 (s).

Now it is easily seen that P1 (s) = P2 (s) =

e T −s e T −s , Θ1∗ (s) = Θ2∗ (s) = − T −s . −2 3e −2

3e T −s

To solve (2.4.8) and (2.4.9), we first use the relations ηi (s) + vi∗ (s) = 0; i = 1, 2 to rewrite them as  # $ dη1 (s) = − K (s)η1 (s) + ζ1 (s) − K (s)[v1∗ (s) + v2∗ (s)] ds + ζ1 (s)dW (s), 

η1 (T ) = 0, v1∗ (s) = −η1 (s), # $ dη2 (s) = − K (s)η2 (s) + ζ2 (s) − K (s)[v1∗ (s) + v2∗ (s)] ds + ζ2 (s)dW (s), η2 (T ) = 0, v2∗ (s) = −η2 (s),

where K (s) = Θ1∗ (s). Since (η1 , ζ1 ) and (η2 , ζ2 ) satisfy the same BSDE, they must be equal. Substituting v1∗ (s) = v2∗ (s) = −η1 (s) into the BSDE for (η1 , ζ1 ) yields 

dη1 (s) = −[3K (s)η1 (s) + ζ1 (s)]ds + ζ1 (s)dW (s), η1 (T ) = 0,

from which we find (η1 , ζ1 ) = (0, 0), and hence v1∗ (s) = v2∗ (s) = 0. Therefore, the unique closed-loop Nash equilibrium (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) is given by Θ1∗ (s) = Θ2∗ (s) = −

e T −s , v1∗ (s) = v2∗ (s) = 0. 3e T −s − 2

(2.4.10)

2.4 Relations Between Open-Loop and Closed-Loop Nash Equilibria

39

Comparing (2.4.5) with (2.4.10), we see that the closed-loop representation of open-loop Nash equilibria is different from the outcome of the closed-loop Nash equilibrium. Remark 2.4.4 From Example 2.4.3 we see that the closed-loop representation of open-loop Nash equilibria is not the outcome of a closed-loop Nash equilibrium in general. The reason is that the Riccati equation (2.3.3) for Pi is symmetric, whereas the Eq. (2.2.17) for Πi is not.

2.5 Zero-Sum Games In this section we consider LQ two-person zero-sum differential games in which one player’s gain is the other’s loss. In this case, the sum of the payoffs/costs of the players is always zero, i.e., J 1 (t, x; u 1 , u 2 ) + J 2 (t, x; u 1 , u 2 ) = 0, ∀u i ∈ Ui [t, T ], i = 1, 2. Thus, we may assume, without loss of generality, that the weighting matrices in J 1 and J 2 have opposite signs, i.e., G 1 + G 2 = 0,

g 1 + g 2 = 0,

S 1j + S 2j = 0,

R 1jk + R 2jk = 0,

Q 1 + Q 2 = 0, q 1 + q 2 = 0, ρ1j + ρ2j = 0;

j, k = 1, 2.

(2.5.1)

To simplify the notation we shall denote J (t, x; u 1 , u 2 ) = J 1 (t, x; u 1 , u 2 ) and B = (B1 , B2 ), g = g 1 , q = q 1 , D = (D1 , D2 ), G = G 1 , Q = Q 1 , R jk = R 1jk ,    1  1   R11 R12 S ρ u1 R= , S = 11 , ρ = 11 , u = . R21 R22 S2 ρ2 u2 Then the state equation (2.1.1) and the cost functional J 1 (t, x; u 1 , u 2 ) can be rewritten, respectively, as ⎧ ⎪ ⎨ d X (s) = [A(s)X (s) + B(s)u(s) + b(s)]ds + [C(s)X (s) + D(s)u(s) + σ(s)]dW (s), ⎪ ⎩ X (t) = x, and

40

2 Linear-Quadratic Two-Person Differential Games

 J (t, x; u) = E G X (T ), X (T ) + 2g, X (T )     T  X (s) X (s) Q(s) S(s) , + S(s) R(s) u(s) u(s) t       q(s) X (s) +2 , ds . ρ(s) u(s) In terms of the single cost functional J (t, x; u 1 , u 2 ), the notions of open-loop and closed-loop saddle points can be represented as follows. Definition 2.5.1 Let (2.5.1) hold. (i) A pair (u ∗1 , u ∗2 ) ∈ U1 [t, T ] × U2 [t, T ] is called an open-loop saddle point of Problem (SDG) for the initial pair (t, x) ∈ [0, T ) × Rn if J (t, x; u ∗1 , u 2 )  J (t, x; u ∗1 , u ∗2 )  J (t, x; u 1 , u ∗2 ), ∀(u 1 , u 2 ) ∈ U1 [t, T ] × U2 [t, T ]. (ii) A 4-tuple (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) ∈ Θ 1 [t, T ] × U1 [t, T ] × Θ 2 [t, T ] × U2 [t, T ] is called a closed-loop saddle point of Problem (SDG) on [t, T ] if the inequalities J (t, x; Θ1∗ X + v1∗ , u 2 )  J (t, x; Θ1∗ X ∗ + v1∗ , Θ2∗ X ∗ + v2∗ )  J (t, x; u 1 , Θ2∗ X + v2∗ ) hold for all x ∈ Rn and (u 1 , u 2 ) ∈ U1 [t, T ] × U2 [t, T ].

2.5.1 Characterizations of Open-Loop and Closed-Loop Saddle Points In the case of zero-sum games, Theorems 2.2.1 and 2.3.1 can be reformulated in the following simpler ways. Theorem 2.5.2 Let (G1)–(G2) and (2.5.1) hold. Let (u ∗1 , u ∗2 ) ∈ U1 [t, T ] × U2 [t, T ] and denote  ∗ u1 ∗ . u = u ∗2 Then (u ∗1 , u ∗2 ) is an open-loop saddle point of Problem (SDG) for the initial pair (t, x) if and only if (a) the adapted solution (X ∗ , Y ∗ , Z ∗ ) to the FBSDE

2.5 Zero-Sum Games

41

⎧ ∗ ∗ + Bu ∗ + b)ds + (C X ∗ + Du ∗ + σ)dW (s), ⎪ ⎨ d X (s) = (AX ! " dY ∗ (s) = − A Y ∗ + C  Z ∗ + Q X ∗ + S  u ∗ + q ds + Z ∗ dW (s), ⎪ ⎩ X ∗ (t) = x, Y ∗ (T ) = G X ∗ (T ) + g, satisfies the stationarity condition B  Y ∗ + D  Z ∗ + S X ∗ + Ru ∗ + ρ = 0, a.e. on [t, T ], a.s.; (b) the following convexity-concavity condition holds: For i = 1, 2, T 

 (−1)i−1 E G X i (T ), X i (T ) + t

Q (Si ) Si Rii



    Xi Xi , ds  0 ui ui

for all u i ∈ Ui [t, T ], where X i is the solution to the SDE 

d X i (s) = (AX i + Bi u i )ds + (C X i + Di u i )dW (s), s ∈ [t, T ], X i (t) = 0,

(or equivalently, the cost functional J (t, x; u 1 , u 2 ) is convex in u 1 and concave in u 2 ). Proof The result follows obviously from the fact that in the case of (2.5.1), the

solutions (Yi∗ , Z i∗ ); i = 1, 2, to the BSDE in (2.2.1) are mutual additive inverses. Theorem 2.5.3 Let (G1)–(G2) and (2.5.1) hold. Let (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) ∈ Θ 1 [t, T ] × U1 [t, T ] × Θ 2 [t, T ] × U2 [t, T ] and denote  ∗  ∗ Θ1 v ∗ , v = 1∗ . Θ = Θ2∗ v2 ∗

Then (Θ1∗ , v1∗ ; Θ2∗ , v2∗ ) is a closed-loop saddle point of Problem (SDG) on [t, T ] if and only if (a) the solution P ∈ C([t, T ]; Sn ) to the symmetric Lyapunov type equation ⎧   ∗   ∗ ˙ ⎪ ⎨ P + P A + A P + C PC + Q + (Θ ) (R + D P D)Θ + (P B + C  P D + S  )Θ ∗ + (Θ ∗ ) (B  P + D  PC + S) = 0, ⎪ ⎩ P(T ) = G, satisfies the following conditions: for a.e. s ∈ [t, T ], R11 + D1 P D1  0, R22 + D2 P D2  0, B  P + D  PC + S + (R + D  P D)Θ ∗ = 0; (b) the adapted solution (η, ζ) to the BSDE on [t, T ]

(2.5.2) (2.5.3)

42

2 Linear-Quadratic Two-Person Differential Games

 ⎧ ∗  ∗  ⎪ ⎨ dη(s) = − (A + BΘ ) η + (C + DΘ ) ζ  + (C + DΘ ∗ ) Pσ + (Θ ∗ ) ρ + Pb + q ds + ζdW, (2.5.4) ⎪ ⎩ η(T ) = g, satisfies the following condition: for a.e. s ∈ [t, T ], B  η + D  ζ + D  Pσ + ρ + (R + D  P D)v ∗ = 0, a.s. Proof In the case of (2.5.1), the solutions Pi ; i = 1, 2, to the ODE (2.3.3) are mutual additive inverses: (2.5.5) P1 (s) = −P2 (s), s ∈ [t, T ]. It is clear then that the conditions (a) in Theorems 2.3.1 and 2.5.3 are equivalent. From the relations (2.5.1) and (2.5.5) we see that the solutions (ηi , ζi ); i = 1, 2, to the BSDE (2.3.6) are also mutual additive inverses. Moreover, by substituting (2.5.3) into (2.3.6), the equation for (η1 , ζ1 ) reduces to (2.5.4). The equivalence between the conditions (b) in Theorems 2.3.1 and 2.5.3 then follows easily.

The following statement is an equivalent version of Theorem 2.5.3, which provides an explicit representation for the closed-loop saddle points. Theorem 2.5.4 Let (G1)–(G2) and (2.5.1) hold. Then Problem (SDG) admits a closed-loop saddle point on [t, T ] if and only if (a) the Riccati equation ⎧     ˙ ⎪ ⎨ P + P A + A P + C PC + Q − (P B + C P D + S ) × (R + D  P D)† (B  P + D  PC + S) = 0, s ∈ [t, T ], ⎪ ⎩ P(T ) = G

(2.5.6)

admits a solution P ∈ C([t, T ]; Sn ) such that R11 + D1 P D1  0, R22 + D2 P D2  0, a.e. on [t, T ], 





R(B P + D PC + S) ⊆ R(R + D P D), a.e. on [t, T ], 





(R + D P D) (B P + D PC + S) ∈ L (t, T ; R †

2

m×n

);

(2.5.7) (2.5.8) (2.5.9)

(b) the adapted solution (η, ζ) of the BSDE  ⎧ ' ' ' ⎪ ⎨ dη(s) = − (A + B Θ) η + (C + D Θ)ζ + (C + D Θ)Pσ  ' ρ + Pb + q ds + ζdW (s), s ∈ [t, T ], +Θ ⎪ ⎩ η(T ) = g, ' ≡ −(R + D  P D)† (B  P + D  PC + S), satisfies where Θ

(2.5.10)

2.5 Zero-Sum Games

43

B  η + D  ζ + D  Pσ + ρ ∈ R(R + D  P D), a.e. on [t, T ], a.s. (R + D  P D)† (B  η + D  ζ + D  Pσ + ρ) ∈ L 2F (t, T ; Rm ). In this case, the closed-loop saddle points are given by Θ ∗ = − (R + D  P D)† (B  P + D  PC + S) + [I − (R + D  P D)† (R + D  P D)]Γ, ∗







(2.5.11)



v = − (R + D P D) (B η + D ζ + D Pσ + ρ) †

+ [I − (R + D  P D)† (R + D  P D)]γ,

(2.5.12)

where Γ ∈ L 2 (t, T ; Rm×n ) and γ ∈ L 2F (t, T ; Rm ) are arbitrary. Proof The proof of the equivalence of Theorems 2.5.3 and 2.5.4 is straightforward, using Proposition 1.3.1 of Chap. 1.



2.5.2 Relations Between Open-Loop and Closed-Loop Saddle Points We have seen in Sect. 2.4 that the existence of open-loop Nash equilibria does not necessarily imply the existence of closed-loop Nash equilibria, and vice-versa. Even if both open-loop and closed-loop Nash equilibria exist, the outcome of a closed-loop Nash equilibrium is not necessarily the closed-loop representation of open-loop Nash equilibria (see Example 2.4.3). The situation is somewhat different in the zero-sum case. We shall see in this section that any closed-loop representation of open-loop saddle points coincides with the outcome of a closed-loop saddle point, as long as both exist. Let (Θ, v) ∈ Θ[t, T ] × U[t, T ], and assume that the open-loop saddle points of Problem (SDG) on [t, T ] admit the closed-loop representation (2.2.10). Recalling (2.5.1) and the notation introduced right after (2.5.1), we may rewrite the Eqs. (2.2.17) (i = 1, 2) for Π1 and Π2 as 

Π˙ 1 + Π1 A + A Π1 + C  Π1 C + Q + (Π1 B + C  Π1 D + S  )Θ = 0, Π1 (T ) = G,

and 

Π˙ 2 + Π2 A + A Π2 + C  Π2 C − Q + (Π2 B + C  Π2 D − S  )Θ = 0, Π2 (T ) = −G,

44

2 Linear-Quadratic Two-Person Differential Games

respectively. It is easily seen that Π1 = −Π2 . Hence, denoting Π  Π1 , we may rewrite (2.2.18) as     R12 + D1 Π D2 B1 Π + D1 ΠC + S1 R11 + D1 Π D1 Θ + = 0, −R21 − D2 Π D1 − R22 − D2 Π D2 −B2 Π − D2 ΠC − S2



or equivalently, (R + D  Π D)Θ + B  Π + D  ΠC + S = 0. According to Proposition 1.3.1 of Chap. 1, the latter in turn is equivalent to R(B  Π + D  ΠC + S) ⊆ R(R + D  Π D), a.e. on [t, T ], 





(R + D Π D) (B Π + D ΠC + S) ∈ L (t, T ; R †

2

m×n

),

(2.5.13) (2.5.14)

and in this case Θ is given by Θ = − (R + D  Π D)† (B  Π + D  ΠC + S) + [I − (R + D  Π D)† (R + D  Π D)]Γ,

(2.5.15)

where Γ ∈ L 2 (t, T ; Rm×n ) is arbitrary. Upon substitution of (2.5.15) into the equation for Π = Π1 , the latter becomes ⎧     ˙ ⎪ ⎨ Π + Π A + A Π + C ΠC + Q − (Π B + C Π D + S ) × (R + D  Π D)† (B  Π + D  ΠC + S) = 0, s ∈ [t, T ], ⎪ ⎩ Π (T ) = G.

(2.5.16)

Note that the Eq. (2.5.16) is symmetric. Now we write the solution (η, ζ) to the BSDE (2.2.14) into a vector form as     η1 ζ η= , ζ= 1 , η2 ζ2 where each component is valued in Rn . In a similar manner we can show that (i) (η1 , ζ1 ) = −(η2 , ζ2 ), and both satisfy  ⎧ ˜  ˜  ˜  ⎪ ⎨ dη(s) = − (A + B Θ) η + (C + D Θ) ζ + (C + D Θ) Π σ + Θ˜  ρ + Π b + q ds + ζdW (s), s ∈ [t, T ], ⎪ ⎩ η(T ) = g, where Θ˜ = −(R + D  Π D)† (B  Π + D  ΠC + S), (ii) the constraint (2.2.15) is equivalent to

(2.5.17)

2.5 Zero-Sum Games

45

B η + D ζ + D Π σ + ρ ∈ R(R + D Π D), a.e. on [t, T ], a.s. 







(R + D Π D) (B η + D ζ + D Π σ + ρ) ∈ †

L 2F (t, T ; Rm ),

(2.5.18) (2.5.19)

(iii) and in this case v is given by v = − (R + D  Π D)† (B  η + D  ζ + D  Π σ + ρ) + [I − (R + D  Π D)† (R + D  Π D)]γ, where γ ∈ L 2F (t, T ; Rm ) is arbitrary. We summarize these observations in the following theorem. Theorem 2.5.5 Let (G1)–(G2) and (2.5.1) hold. The open-loop saddle points of Problem (SDG) with initial time t admit a closed-loop representation if and only if the following hold: (a) the convexity-concavity condition (b) of Theorem 2.5.2 holds; (b) the Riccati equation (2.5.16) admits a solution Π ∈ C([t, T ]; Sn ) such that (2.5.13)–(2.5.14) hold, and the adapted solution (η, ζ) to the BSDE (2.5.17) satisfies (2.5.18)–(2.5.19). In this case, Θ(s)X (s) + v(s); s ∈ [t, T ] is a closed-loop representation of openloop saddle points if and only if Θ = − (R + D  Π D)† (B  Π + D  ΠC + S) + [I − (R + D  Π D)† (R + D  Π D)]Γ, v = − (R + D  Π D)† (B  η + D  ζ + D  Π σ + ρ) + [I − (R + D  Π D)† (R + D  Π D)]γ, for some Γ ∈ L 2 (t, T ; Rm×n ) and γ ∈ L 2F (t, T ; Rm ). Proof The result can be proved by combining Theorem 2.2.4 and the previous argument. We leave the details to the interested reader.

Comparing Theorems 2.5.4 and 2.5.5, one may wonder whether the closed-loop representation of open-loop saddle points coincide with the outcome of closed-loop saddle points when both exist. The answer to this question is affirmative, as shown by the following result. Theorem 2.5.6 Let (G1)–(G2) and (2.5.1) hold. Suppose that both open-loop and closed-loop saddle points exist on [t, T ]. If the open-loop saddle points admit a closed-loop representation, then this representation must be the outcome of a closedloop saddle point. Proof The proof is immediate from Theorems 2.5.4 and 2.5.5, once we show that the solution P to the Riccati equation (2.5.6) with constraints (2.5.7)–(2.5.9) coincides

46

2 Linear-Quadratic Two-Person Differential Games

with the solution Π to (2.5.16) with constraints (2.5.13)–(2.5.14). To this end, we fix an arbitrary t  ∈ [t, T ]. First, we note that if the convexity-concavity condition (b) of Theorem 2.5.2 holds for the initial time t, it also holds for t  . Indeed, for any u 1 ∈ U1 [t  , T ], let X 1 be the solution to  d X 1 (s) = (AX 1 + B1 u 1 )ds + (C X 1 + D1 u 1 )dW (s), s ∈ [t  , T ], X 1 (t  ) = 0, and define the zero-extension of u 1 as follows:  0, s ∈ [t, t  ), [01[t,t  ) ⊕ u 1 ](s) = u 1 (s), s ∈ [t  , T ]. Then u˜ 1 ≡ [01[t,t  ) ⊕ u 1 ] ∈ U1 [t, T ], and due to the initial state being 0, the solution X˜ 1 of  d X˜ 1 (s) = (A X˜ 1 + B1 u˜ 1 )ds + (C X˜ 1 + D1 u˜ 1 )dW (s), s ∈ [t, T ], X˜ 1 (t) = 0 

is such that X˜ 1 (s) = It follows that  E G X 1 (T ), X 1 (T ) +

0, s ∈ [t, t  ), X 1 (s), s ∈ [t  , T ].

     Q (S1 ) X 1 X1 , ds S R u u1  1 11 1 t       T 

Q (S1 ) X˜ 1 X˜ 1 ˜ ˜ , ds  0. = E G X 1 (T ), X 1 (T ) + S R u ˜ u˜ 1 1 11 1 t T



This proves the case i = 1. The case i = 2 can be treated similarly. Now let (η P , ζ P ) denote the adapted solution to (2.5.10). It is clear that over the interval [t  , T ], with Π replaced by P and (η, ζ) replaced by (η P , ζ P ), the condition (b) of Theorem 2.5.5 still holds. It follows by Theorem 2.5.5 that if (Θ ∗ , v ∗ ) is a closed-loop saddle point on [t, T ], then the outcome u ∗ (s) = Θ ∗ (s)X ∗ (s) + v ∗ (s), s ∈ [t  , T ] is a closed-loop representation of open-loop saddle points, where X ∗ is the solution of

2.5 Zero-Sum Games

47

⎧ ∗ ∗ ∗ ∗ ⎪ ⎨ d X (s) = [(A + BΘ )X + Bv + b]ds ∗ ∗ + [(C + DΘ )X + Dv ∗ + σ]dW (s), s ∈ [t  , T ], ⎪ ⎩ ∗  X (t ) = x, with arbitrary initial state x ∈ Rn . Using the representation (2.5.11)–(2.5.12) of (Θ ∗ , v ∗ ), one can easily verify that P satisfies P˙ + P(A + BΘ ∗ ) + (A + BΘ ∗ ) P + (C + DΘ ∗ ) P(C + DΘ ∗ ) + (Θ ∗ ) RΘ ∗ + S  Θ ∗ + (Θ ∗ ) S + Q = 0, and that (η P , ζ P ) satisfies  dη P = − (A + BΘ ∗ ) η P + (C + DΘ ∗ ) ζ P + (C + DΘ ∗ ) Pσ  + (Θ ∗ ) ρ + Pb + q ds + ζ P dW. Then applying Itô’s formula to s → P(s)X ∗ (s), X ∗ (s) yields EG X ∗ (T ), X ∗ (T ) − EP(t  )x, x T

− [(Θ ∗ ) RΘ ∗ + S  Θ ∗ + (Θ ∗ ) S + Q]X ∗ , X ∗ =E t

+ 2 P(Bv ∗ + b) + (C + DΘ ∗ ) P(Dv ∗ + σ), X ∗ 



+ D  P Dv ∗ , v ∗ + 2 D  Pσ, v ∗ + Pσ, σ ds, and applying Itô’s formula to s → η P (s), X ∗ (s) yields Eg, X ∗ (T ) − Eη P (t  ), x T

− (C + DΘ ∗ ) Pσ + (Θ ∗ ) ρ + Pb + q, X ∗ =E t 

 + B η P + D  ζ P , v ∗ + η P , b + ζ P , σ ds. Substituting for EG X ∗ (T ), X ∗ (T ) and Eg, X ∗ (T ) in the cost functional J (t  , x; u ∗ ) = J (t  , x; Θ ∗ X ∗ + v ∗ )  = E G X ∗ (T ), X ∗ (T ) + 2g, X ∗ (T ) T%

+ [Q + S  Θ ∗ + (Θ ∗ ) S + (Θ ∗ ) RΘ ∗ ]X ∗ , X ∗ 

t + 2 (RΘ ∗ + S) v ∗ + (Θ ∗ ) ρ + q, X ∗ &  + Rv ∗ , v ∗ + 2ρ, v ∗ ds

48

2 Linear-Quadratic Two-Person Differential Games

and noting that (R + D  P D)Θ ∗ + B  P + D  PC + S = 0, we obtain  J (t  , x; u ∗ ) = E P(t  )x, x + 2η P (t  ), x T%

Pσ, σ + 2η P , b + 2ζ P , σ + (R + D  P D)v ∗ , v ∗ +  t

&   (2.5.20) + 2 B η P + D  ζ P + D  Pσ + ρ, v ∗ ds . Next let (ηΠ , ζΠ ) denote the adapted solution to (2.5.17), and let Θ = −(R + D  Π D)† (B  Π + D  ΠC + S), v = −(R + D  Π D)† (B  ηΠ + D  ζΠ + D  Π σ + ρ). According to Theorem 2.5.5, for the initial time t  , u(s) = Θ(s)X (s) + v(s), s ∈ [t  , T ] is a closed-loop representation of open-loop saddle points. We may proceed as previously to obtain  J (t  , x; u) = E Π (t  )x, x + 2ηΠ (t  ), x T%

Π σ, σ + 2ηΠ , b + 2ζΠ , σ + (R + D  Π D)v, v + t

&   (2.5.21) + 2 B ηΠ + D  ζΠ + D  Π σ + ρ, v ds .    ∗ u1 u1 and u = are open-loop saddle points for (t  , x), we Since both u = u ∗2 u2 have ∗

J (t  , x; u ∗1 , u ∗2 )  J (t  , x; u 1 , u ∗2 )  J (t  , x; u 1 , u 2 )  J (t  , x; u ∗1 , u 2 )  J (t  , x; u ∗1 , u ∗2 ), which implies J (t  , x; u ∗ ) = J (t  , x; u). Since x is arbitrary, we conclude from

(2.5.20) and (2.5.21) that P(t  ) = Π (t  ). Remark 2.5.7 Theorem 2.5.6 is based on the assumption that both the closed-loop representation of open-loop saddle points and the closed-loop saddle points exist on [t, T ]. This assumption is necessary because, in general, neither of these two kinds

2.5 Zero-Sum Games

49

of existence implies the other (see Examples 2.5.8 and 2.5.9 below). It is different from Problem (SLQ), in which closed-loop solvability always implies open-loop solvability. Recall from Theorem 1.1.12 of Chap. 1 that for Problem (SLQ), when a closed-loop optimal strategy exists, the solution P to the associated Riccati equation satisfies R + D  P D  0. This positivity condition actually implies the convexity condition (i) of Theorem 1.1.9 in Chap. 1. However, in the case of Problem (SDG), one cannot deduce the convexity-concavity condition (b) of Theorem 2.5.2 from the counterpart (2.5.7) of R + D  P D  0, and vice-versa. We now present an example to show that the existence of a closed-loop saddle point does not necessarily imply the existence of an open-loop saddle point. Example 2.5.8 Consider the one-dimensional state equation 

d X (s) = [u 1 (s) − u 2 (s)]ds + [u 1 (s) − u 2 (s)]dW (s), s ∈ [t, 1], X (t) = x,

and the cost functional  2 J (t, x; u 1 , u 2 ) = E |X (1)| +

1

% &  2 2 |u 1 (s)| − |u 2 (s)| ds .

t

The associated Riccati equation reads ⎧  †   ⎪ 1 1 + P(s) −P(s) ⎨ P(s) ˙ = P(s)(1, −1) P(s) = 0, −1 −P(s) −1 + P(s) ⎪ ⎩ P(1) = 1. One can easily check that P(s) = 1 (0  s  1) is the unique solution. Since 

2 −1 R(s) + D(s) P(s)D(s) = −1 0 



is invertible, the range inclusion condition (2.5.8) automatically holds. Also note that   1  †  , [R(s) + D(s) P(s)D(s)] B(s) P(s) = 1 R11 (s) + D1 (s) P(s)D1 (s) = 2,

R22 (s) + D2 (s) P(s)D2 (s) = 0.

Hence the conditions (2.5.7) and (2.5.9) hold. The associated BSDE (2.5.10) reads dη(s) = ζ(s)dW (s), s ∈ [t, 1];

η(1) = 0,

50

2 Linear-Quadratic Two-Person Differential Games

whose solution is clearly (0, 0), and hence the condition (b) of Theorem 2.5.4 is satisfied. Therefore, by Theorem 2.5.4, the game admits a unique closed-loop saddle point (Θ ∗ , v ∗ ) over [t, 1], which is given by     1 0 ∗ Θ (s) = − , v (s) = ; s ∈ [t, 1]. 1 0 ∗

We next prove that the game does not have open-loop saddle points by showing that the convexity-concavity condition (b) of Theorem 2.5.2 fails. To this end, we take u 2 to be an arbitrary constant λ = 0. The solution to the SDE 

d X 2 (s) = −λds − λdW (s), s ∈ [t, 1], X 2 (t) = 0

is given by X (s) = −λ(s − t) − λ[W (s) − W (t)], 

and hence



1

E |X 2 (1)| − 2

 |u 2 (s)| ds = λ2 (1 − t)2 > 0. 2

t

This means that the convexity-concavity condition (b) of Theorem 2.5.2 does not hold for i = 2. The next example shows that the existence of an open-loop saddle point does not necessarily imply the existence of a closed-loop saddle point. Example 2.5.9 Consider the two-dimensional controlled state equation    u 1 (s) X 1 (s) = ds, s ∈ [t, T ]; d X 2 (s) u 2 (s) 



   X 1 (t) x = 1 , x2 X 2 (t)

and the cost functional & % J (t, x; u 1 , u 2 ) = E |X 1 (T )|2 − |X 2 (T )|2 . Let (t, x) ∈ [0, T ) × R2 be an arbitrary initial pair with x = (x1 , x2 ) . Choose constants λi  T 1−t ; i = 1, 2, and define u i∗ (s) = −λi xi 1[t,t+ λ1 ] (s), s ∈ [t, T ]; i

i = 1, 2.

It is straightforward to verify that for any (u 1 , u 2 ) ∈ L 2F (t, T ; R) × L 2F (t, T ; R) J (t, x; u ∗1 , u 2 )  J (t, x; u ∗1 , u ∗2 ) = 0  J (t, x; u 1 , u ∗2 ).

2.5 Zero-Sum Games

51

Thus, (u ∗1 , u ∗2 ) is an open-loop saddle point for (t, x). However, there is no closedloop saddle point for this problem. Indeed, the associated Riccati equation reads ˙ P(s) = 0, s ∈ [t, T ];



 1 0 P(T ) = G = . 0 −1

Clearly, the solution to the above equation is P(s) = G, t  s  T. Since in this example, B  P + D  PC + S = P = G,

R + D  P D = 0,

we see that the inclusion relation (2.5.8) does not hold. Consequently, the closed-loop saddle point does not exist by Theorem 2.5.4.

2.6 Differential Games in Infinite Horizons We now look at an infinite-horizon problem. As before, we let (Ω, F, F, P) be a complete filtered probability space on which a one-dimensional standard Brownian motion W = {W (t); t  0} is defined with F = {Ft }t0 being the usual augmentation of the natural filtration generated by W . Recall the following notation:   3 ∞ L 2F (Rn ) = ϕ : [0, ∞) × Ω → H 3 ϕ ∈ F, E 0 |ϕ(t)|2 dt < ∞ ,  3 Xloc [0, ∞) = ϕ : [0, ∞) × Ω → Rn 3 ϕ ∈ F is continuous, % &  and E sup0tT |ϕ(t)|2 < ∞ for every T > 0 ,   3 ∞ X [0, ∞) = ϕ ∈ Xloc [0, ∞) 3 E 0 |ϕ(t)|2 dt < ∞ . For brevity, we write Ui = L 2F (Rm i ), i = 1, 2;

U = U1 × U2 .

Consider the following controlled linear SDE on [0, ∞): ⎧ ⎪ ⎨ d X (t) = [AX (t) + B1 u 1 (t) + B2 u 2 (t) + b(t)]dt + [C X (t) + D1 u 1 (t) + D2 u 2 (t) + σ(t)]dW (t), ⎪ ⎩ X (0) = x,

(2.6.1)

52

2 Linear-Quadratic Two-Person Differential Games

where we assume the following (which is comparable with (G1)): (G1) The coefficients and the nonhomogeneous terms of (2.6.1) satisfy A, C ∈ Rn×n ;

Bi , Di ∈ Rn×m i , i = 1, 2; b, σ ∈ L 2F (Rn ).

Clearly, under (G1) , for any x ∈ Rn and (u 1 , u 2 ) ∈ U1 × U2 , (2.6.1) admits a unique solution X (·) ≡ X (· ; x, u 1 , u 2 ) ∈ Xloc [0, ∞), which is the state process. In this section, we are only concerned with zero-sum games. So Player 1 and Player 2 share the same performance functional:



J (x; u 1 , u 2 )  E 0

⎞⎛ ⎞ ⎛ ⎞ ⎛ Q S1 S2 X (t) X (t)  ⎝ S1 R11 R12 ⎠⎝u 1 (t)⎠, ⎝u 1 (t)⎠ S2 R21 R22 u 2 (t) u 2 (t)

⎛ q(t) ⎞ ⎛ X (t) ⎞  + 2 ⎝ρ1 (t)⎠, ⎝u 1 (t)⎠ ρ2 (t) u 2 (t)

dt,

(2.6.2)

for which we impose the following assumption: (G2) The weighting coefficients in (2.6.2) satisfy Q ∈ Sn ,

 R21 = R12 ∈ Rm 1 ×m 2 , q ∈ L 2F (Rn ),

Si ∈ Rm i ×n ,

Rii ∈ Sm i , ρi ∈ L 2F (Rm i ); i = 1, 2.

Note that in general, for (x, u 1 , u 2 ) ∈ Rn × U1 × U2 , the solution X (·) ≡ X (· ; x, u 1 , u 2 ) of (2.6.1) might just be in Xloc [0, ∞) and the performance functional J (x; u 1 , u 2 ) might not be defined. Therefore, we introduce the following set: # $ Uad (x)  (u 1 , u 2 ) ∈ U | X (· ; x, u 1 , u 2 ) ∈ X [0, ∞) . An element (u 1 , u 2 ) ∈ Uad (x) is called an admissible control pair for the initial state x, and the corresponding state process X (·) ≡ X (· ; x, u 1 , u 2 ) is called an admissible state process with initial state x. In the current zero-sum game, we assume that Player 1 is the minimizer and Player 2 is the maximizer. That is, Player 1 wishes to minimize (2.6.2) by selecting a control u 1 , and Player 2 wishes to maximize (2.6.2) by selecting a control u 2 . Thus, (2.6.2) represents the cost of Player 1 and the payoff of Player 2. The problem is to find an admissible control pair (u ∗1 , u ∗2 ) that both players can accept. We refer to such a problem as an infinite-horizon linear-quadratic stochastic two-person zero-sum differential game, and denote it by Problem (SDG)∞ for short. Similar to the finite-horizon problem, for notational convenience we let m = m 1 + m 2 and denote

2.6 Differential Games in Infinite Horizons

53

B = (B1 , B2 ), D = (D1 , D2 ),         R11 R12 S1 ρ1 u1 R= , S= , ρ= , u= . R21 R22 S2 ρ2 u2 With the above notation, the state equation can be rewritten as ⎧ ⎪ ⎨ d X (t) = [A(t)X (t) + B(t)u(t) + b(t)]dt + [C(t)X (t) + D(t)u(t) + σ(t)]dW (t), t  0, ⎪ ⎩ X (0) = x, and the performance functional can be rewritten as



J (x; u) = E 0



Q S S R



       X (t) X (t) q(t) X (t) , +2 , dt. u(t) u(t) ρ(t) u(t)

Also, when b, σ, q, ρ = 0, we denote the corresponding Problem (SDG)∞ by Problem (SDG)0∞ and the corresponding performance functional by J 0 (x; u). Similar to Problem (SLQ)∞ , we will assume that the set of stabilizers of the system d X (t) = [AX (t) + Bu(t)]dt + [C X (t) + Du(t)]dW (t), t  0

(2.6.3)

is nonempty, that is, $ # S [A, C; B, D]  Θ ∈ Rm×n | Θ stabilizes the system (2.6.3) = ∅. Moreover, for Θi ∈ Rm i ×n , i = 1, 2, we let  S1 (Θ2 ) = Θ1 ∈ Rm 1 ×n  S2 (Θ1 ) = Θ2 ∈ Rm 2 ×n

  Θ1 ∈ S [A, C; B, D] , : Θ2    Θ1 ∈ S [A, C; B, D] . : Θ2 

Note that in general, say, S1 (Θ2 ) is not necessarily non-empty for some Θ2 ∈ Rm 2 ×n . However, if Θ  (Θ1 , Θ2 ) ∈ S [A, C; B, D], then both S1 (Θ2 ) and S2 (Θ1 ) are non-empty. Definition 2.6.1 For a given initial state x ∈ Rn , a pair (u¯ 1 , u¯ 2 ) ∈ Uad (x) is called an open-loop saddle point of Problem (SDG)∞ if J (x; u¯ 1 , u 2 )  J (x; u¯ 1 , u¯ 2 )  J (x; u 1 , u¯ 2 ) for any (u 1 , u 2 ) ∈ U such that J (x; u¯ 1 , u 2 ) and J (x; u 1 , u¯ 2 ) are defined. Definition 2.6.2 A 4-tuple (Θ1∗ , u ∗1 ; Θ2∗ , u ∗2 ) ∈ Rm 1 ×n × U1 × Rm 2 ×n × U2 is called a closed-loop saddle point of Problem (SDG)∞ if

54

2 Linear-Quadratic Two-Person Differential Games

(i) Θ ∗  ((Θ1∗ ) , (Θ2∗ ) ) ∈ S [A, C; B, D], and (ii) for any x ∈ Rn , (Θ1 , Θ2 ) ∈ S1 (Θ2∗ ) × S2 (Θ1∗ ) and (u 1 , u 2 ) ∈ U1 × U2 , J (x; Θ1∗ X + u ∗1 , Θ2 X + u 2 )  J (x; Θ1∗ X ∗ + u ∗1 , Θ2∗ X ∗ + u ∗2 )  J (x; Θ1 X + u 1 , Θ2∗ X + u ∗2 ). Remark 2.6.3 (a) Although both players are non-cooperative, when choosing Θi (i = 1, 2), they prefer to at least work together so that Θ = ((Θ1 ) , (Θ2 ) ) is a stabilizer of (2.6.3) (and the system will not be crashed). Thus, in Definition 2.6.2, we only require Θ ∗ being a stabilizer of (2.6.3) rather than Θi∗ (i = 1, 2) being a stabilizer of the system d X (t) = [AX (t) + Bi u i (t)]dt + [C X (t) + Di u i (t)]dW (t), t  0. (b) By a similar method used in proving Proposition 2.1.5 of [48, Chap. 2], one can show that the condition (ii) in Definition 2.6.2 is equivalent to the following: (ii) for any x ∈ Rn and (u 1 , u 2 ) ∈ U1 × U2 , J (x; Θ1∗ X + u ∗1 , Θ2∗ X + u 2 )  J (x; Θ1∗ X ∗ + u ∗1 , Θ2∗ X ∗ + u ∗2 )  J (x; Θ1∗ X + u 1 , Θ2∗ X + u ∗2 ). Let Θ∗ =

 ∗  ∗ Θ1 u1 ∗ ∈ S [A, C; B, D], u ∈ U. = Θ2∗ u ∗2

Consider the state equation ⎧ ∗ ⎪ ⎨ d X (t) = [(A + BΘ )X (t) + Bu(t) + b(t)]dt + [(C + DΘ ∗ )X (t) + Du(t) + σ(t)]dW (t), t  0, ⎪ ⎩ X (0) = x and the performance functional J4(x; u 1 , u 2 )  J (x; Θ1∗ X + u 1 , Θ2∗ X + u 2 )          ∞  44 Q S X X q˜ X =E , + 2 , dt, 4 u u ρ u S R 0 where 4 = Q + (Θ ∗ ) S + S  Θ ∗ + (Θ ∗ ) RΘ ∗ , Q 4 S = S + RΘ ∗ , q˜ = q + (Θ ∗ ) ρ.

2.6 Differential Games in Infinite Horizons

55

From (ii) of Remark 2.6.3, we see that (Θ1∗ , u ∗1 ; Θ2∗ , u ∗2 ) is a closed-loop saddle point of Problem (SDG)∞ if and only if (u ∗1 , u ∗2 ) is an open-loop saddle point for the problem with the above state equation and performance functional. Applying the idea used in the proof of Theorem 2.5.2, we see that (Θ1∗ , u ∗1 ; Θ2∗ , u ∗2 ) is a closed-loop saddle point of Problem (SDG)∞ if and only if for any x ∈ Rn , the adapted solution (X ∗ , Y ∗ , Z ∗ ) ∈ X [0, ∞) × X [0, ∞) × L 2F (Rn ) to the FBSDE   ⎧ d X ∗ (t) = (A + BΘ ∗ )X ∗ + Bu ∗ + b dt ⎪ ⎪   ⎪ ⎪ ⎪ + (C + DΘ ∗ )X ∗ + Du ∗ + σ dW (t), t  0, ⎪ ⎨  dY ∗ (t) = − (A + BΘ ∗ ) Y ∗ + (C + DΘ ∗ ) Z ∗ ⎪  ⎪ ⎪  ∗ 4X ∗ + 4 ⎪ + Q S u + q ˜ dt + Z ∗ dW (t), t  0, ⎪ ⎪ ⎩ ∗ X (0) = x

(2.6.4)

satisfies the following stationarity condition: S X ∗ + ρ = 0, a.e. a.s. Ru ∗ + B  Y ∗ + D  Z ∗ + 4 and the following condition holds for i = 1, 2:



(−1)i−1 E 0



44 Q Si 4 Si Rii



   Xi Xi , dt  0, ∀u i ∈ Ui , ui ui

where 4 Si = Si + Ri Θ ∗ and X i is the solution of ⎧ ∗ ⎪ ⎨ d X i (t) = [(A + BΘ )X i (t) + Bi u i (t)]dt + [(C + DΘ ∗ )X i (t) + Di u i (t)]dW (t), t  0, ⎪ ⎩ X i (0) = 0. By subtracting solutions of (2.6.4) with initial states x and 0, we obtain the following result. Proposition 2.6.4 If (Θ1∗ , u ∗1 ; Θ2∗ , u ∗2 ) is a closed-loop saddle point of Problem (SDG)∞ , then (Θ1∗ , 0; Θ2∗ , 0) is a closed-loop saddle point of Problem (SDG)0∞ . Now we introduce the following algebraic Riccati equation (ARE): ⎧  † ⎪ ⎨ Q(P) − S(P) R(P) S(P) = 0, R(S(P)) ⊆ R(R(P)), ⎪ ⎩ R1 (P)  0, R2 (P)  0, where

(2.6.5)

56

2 Linear-Quadratic Two-Person Differential Games

Q(P) = P A + A P + C  PC + Q, S(P) = B  P + D  PC  + S, R(P) = R + D  P D, Ri (P) = Rii + Di P Di ; i = 1, 2. Definition 2.6.5 A matrix P ∈ Sn is called a stabilizing solution of (2.6.5) if P satisfies (2.6.5) and there exists a matrix Π ∈ Rm×n such that −R(P)† S(P) + [I − R(P)† R(P)]Π ∈ S [A, C; B, D]. The following result provides a necessary condition for the existence of a closedloop saddle point. Proposition 2.6.6 Suppose that Problem (SDG)0∞ admits a closed-loop saddle point. Then the ARE (2.6.5) admits a stabilizing solution P. Proof We assume without loss of generality that (Θ1∗ , 0; Θ2∗ , 0) is a closed-loop saddle point of Problem (SDG)0∞ . It is easily seen that the function V 0 (x) defined by V 0 (x)  J 0 (x; Θ1∗ X ∗ , Θ2∗ X ∗ ) is a quadratic form, that is, there exists a matrix P ∈ Sn such that V 0 (x) = P x, x , ∀x ∈ Rn . Consider the state equation ⎧ ∗ ⎪ ⎨ d X 1 (t) = [(A + B2 Θ2 )X 1 (t) + B1 u 1 (t)]dt + [(C + D2 Θ2∗ )X 1 (t) + D1 u 1 (t)]dW (t), t  0, ⎪ ⎩ X 1 (0) = x and the cost functional J1 (x; u 1 )  J 0 (x; u 1 , Θ2∗ X 1 ) ⎞⎛ ⎞ ⎛ ⎞ ⎛ ∞ Q S1 S2 X1 X1  ⎝ S1 R11 R12 ⎠ ⎝ u 1 ⎠ , ⎝ u 1 ⎠ dt =E 0 S2 R21 R22 Θ2∗ X 1 Θ2∗ X 1 ∞ [Q + (Θ2∗ ) R22 Θ2∗ + (Θ2∗ ) S2 + S2 Θ2∗ ]X 1 , X 1 =E 0  + R11 u 1 , u 1 + 2(S1 + R12 Θ2∗ )X 1 , u 1 dt. Then (Θ1∗ , 0) is a closed-loop optimal control of Problem (SLQ)0∞ with the above state equation and cost functional, and the value function of this Problem (SLQ)0∞ is

2.6 Differential Games in Infinite Horizons

57

given by P x, x with P satisfying 4 4 4 4 41 + A PA 1 P + C1 P C1 + Q 1   41 P D1 + 4 41 + 4 − (P B1 + C S1 )R1 (P)† (B1 P + D1 P C S1 ) = 0, ∗   4 R1 (P)  0, R1 (P)Θ1 + (B1 P + D1 P C1 + 4 S1 ) = 0,

(2.6.6) (2.6.7)

where 41 = C + D2 Θ2∗ , R1 (P) = R11 + D1 P D1 , 41 = A + B2 Θ2∗ , C A 41 = Q + (Θ2∗ ) R22 Θ2∗ + (Θ2∗ ) S2 + S2 Θ2∗ , 4 S1 = S1 + R12 Θ2∗ . Q Similarly, by considering the state equation ⎧ ∗ ⎪ ⎨ d X 2 (t) = [(A + B1 Θ1 )X 2 (t) + B2 u 2 (t)]dt + [(C + D1 Θ1∗ )X 2 (t) + D2 u 2 (t)]dW (t), t  0, ⎪ ⎩ X 2 (0) = x and the cost functional J2 (x; u 2 )  −J 0 (x; Θ1∗ X 2 , u 2 ), we can obtain 42 + 4 S2 ) = 0, R2 (P)  0, R2 (P)Θ2∗ + (B2 P + D2 P C

(2.6.8)

where 42 = C + D1 Θ1∗ , 4 S2 = S2 + R21 Θ1∗ . R2 (P) = R22 + D2 P D2 , C Let Θ ∗ = ((Θ1∗ ) , (Θ2∗ ) ) . Combining (2.6.7) and (2.6.8), one has R(P)Θ ∗ + S(P) = (R + D  P D)Θ ∗ + (B  P + D  PC + S) = 0,

(2.6.9)

from which we conclude that R(S(P)) ⊆ R(R(P)) and the existence of a matrix Π ∈ Rm×n such that Θ ∗ = −R(P)† S(P) + [I − R(P)† R(P)]Π ∈ S [A, C; B, D]. Using (2.6.6)–(2.6.9), we have

(2.6.10)

58

2 Linear-Quadratic Two-Person Differential Games

41 + A 4 4 4 4 0 = PA 1 P + C1 P C1 + Q 1   41 P D1 + 4 41 + 4 − (P B1 + C S1 )R1 (P)† (B1 P + D1 P C S1 )   ∗  ∗ 41 + A 41 P + C 41 P C 41 + Q 41 − (Θ1 ) R1 (P)Θ1 = PA = P A + A P + C  PC + Q + (Θ2∗ ) R2 (P)Θ2∗ − (Θ1∗ ) R1 (P)Θ1∗ + (P B2 + C  P D2 + S2 )Θ2∗ + (Θ2∗ ) (B2 P + D2 PC + S2 ) = Q(P) − (Θ1∗ ) R1 (P)Θ1∗ − (Θ2∗ ) R2 (P)Θ2∗   + (Θ2∗ ) R2 (P) + (P B2 + C  P D2 + S2 ) Θ2∗   + (Θ2∗ ) (B2 P + D2 PC + S2 ) + R2 (P)Θ2∗ = Q(P) − (Θ1∗ ) R1 (P)Θ1∗ − (Θ2∗ ) R2 (P)Θ2∗ − (Θ1∗ ) (D1 P D2 + R12 )Θ2∗ − (Θ2∗ ) (D2 P D1 + R21 )Θ1∗ = Q(P) − (Θ ∗ ) R(P)Θ ∗ = Q(P) − S(P) R(P)† S(P).



Therefore, P is a stabilizing solution of the ARE (2.6.5).

The following result, which is the main result of this section, gives a characterization for closed-loop saddle points of Problem (SDG)∞ . Theorem 2.6.7 Problem (SDG)∞ admits a closed-loop saddle point  ∗   ∗  u Θ1 , 1∗ ∈ Rm×n × U (Θ , u ) = Θ2∗ u2 ∗



(2.6.11)

if and only if (i) the ARE (2.6.5) admits a stabilizing solution P, and (ii) the BSDE  dη(t) = − [A − BR(P)† S(P)] η + [C − DR(P)† S(P)] ζ + [C − DR(P)† S(P)] Pσ − [R(P)† S(P)] ρ  + Pb + q dt + ζdW (t), t  0,

(2.6.12)

admits an L 2 -stable adapted solution (η, ζ) such that B  η(t) + D  ζ(t) + D  Pσ(t) + ρ(t) ∈ R(R(P)), a.e. t ∈ [0, ∞), a.s. In this case, the closed-loop saddle point (Θ ∗ , u ∗ ) admits the following representation:  ∗ Θ = −R(P)† S(P) + [I − R(P)† R(P)]Π, (2.6.13) u ∗ = −R(P)† [B  η + D  ζ + D  Pσ + ρ] + [I − R(P)† R(P)]ν,

2.6 Differential Games in Infinite Horizons

59

where Π ∈ Rm×n is chosen such that Θ ∗ ∈ S [A, C; B, D], and ν ∈ U is arbitrary. Furthermore, the value function admits the following representation:  V (x) = P x, x + E 2η(0), x +

% Pσ, σ + 2η, b + 2ζ, σ 0 3 32 &  1 − 3[R(P)† ] 2 (B  η + D  ζ + D  Pσ + ρ)3 dt . ∞

Proof Necessity. Suppose that the pair (Θ ∗ , u ∗ ) in (2.6.11) is a closed-loop saddle point of Problem (SDG)∞ . Then by Proposition 2.6.4, (Θ1∗ , 0; Θ2∗ , 0) is a closedloop saddle point of Problem (SDG)0∞ , and hence by Proposition 2.6.6, the ARE (2.6.5) admits a stabilizing solution P. Moreover, from the proof of Proposition 2.6.6, we see that Θ ∗ is given by (2.6.10) for some matrix Π ∈ Rm×n . To prove (ii), let (X ∗ , Y ∗ , Z ∗ ) be the solution of (2.6.4). Then Ru ∗ + B  Y ∗ + D  Z ∗ + (S + RΘ ∗ )X ∗ + ρ = 0, a.e. a.s.

(2.6.14)

It follows that # $ 4X ∗ + 4 S  u ∗ + q˜ dt + Z ∗ dW dY ∗ = − (A+ BΘ ∗ ) Y ∗ + (C + DΘ ∗ ) Z ∗ + Q # = − A Y ∗ + C  Z ∗ + (Q + S  Θ ∗ )X ∗ + S  u ∗ + q $ + (Θ ∗ ) [B  Y ∗ + D  Z ∗ + (S + RΘ ∗ )X ∗ + Ru ∗ + ρ] dt + Z ∗ dW # $ = − A Y ∗ + C  Z ∗ + (Q + S  Θ ∗ )X ∗ + S  u ∗ + q dt + Z ∗ dW. Define for t  0, η(t) = Y ∗ (t) − P X ∗ (t), ζ(t) = Z ∗ (t) − P(C + DΘ ∗ )X ∗ (t) − P Du ∗ (t) − Pσ(t). Noting that Q(P) + S(P) Θ ∗ = 0, we have dη = dY ∗ − Pd X ∗ = − [A Y ∗ + C  Z ∗ + (Q + S  Θ ∗ )X ∗ + S  u ∗ + q]dt + Z ∗ dW − P[(A + BΘ ∗ )X ∗ + Bu ∗ + b]dt − P[(C + DΘ ∗ )X ∗ + Du ∗ + σ]dW # = − A (η + P X ∗ ) + C  [ζ + P(C + DΘ ∗ )X ∗ + P Du ∗ + Pσ] $ + (Q + S Θ ∗ )X ∗ + S  u ∗ + q + P[(A+ BΘ ∗ )X ∗ + Bu ∗ + b] dt + ζdW # = − A η + C  ζ + Q(P)X ∗ + S(P) Θ ∗ X ∗ + S(P) u ∗ $ + C  Pσ + Pb + q dt + ζdW # $ = − A η + C  ζ + S(P) u ∗ + C  Pσ + Pb + q dt + ζdW. According to (2.6.14), we have (noting that S(P) + R(P)Θ ∗ = 0)

60

2 Linear-Quadratic Two-Person Differential Games

0 = B  Y ∗ + D  Z ∗ + (S + RΘ ∗ )X ∗ + Ru ∗ + ρ = B  (η + P X ∗ ) + D  [ζ + P(C + DΘ ∗ )X ∗ + P Du ∗ + Pσ] + (S + RΘ ∗ )X ∗ + Ru ∗ + ρ = [S(P) + R(P)Θ ∗ ]X ∗ + B  η + D  ζ + D  Pσ + ρ + R(P)u ∗ = B  η + D  ζ + D  Pσ + ρ + R(P)u ∗ . Hence,

and

B  η + D  ζ + D  Pσ + ρ ∈ R(R(P)), a.e. a.s. u ∗ = −R(P)† (B  η + D  ζ + D  Pσ + ρ) + [I − R(P)† R(P)]ν

for some ν ∈ U. Consequently, S(P) u ∗ = − S(P) R(P)† (B  η + D  ζ + D  Pσ + ρ) + S(P) [I − R(P)† R(P)]ν = − S(P) R(P)† (B  η + D  ζ + D  Pσ + ρ). Then A η + C  ζ + S(P) u ∗ + C  Pσ + Pb + q = A η + C  ζ − S(P) R(P)† (B  η + D  ζ + D  Pσ + ρ) + C  Pσ + Pb + q = [A − S(P) R(P)† B  ]η + [C  − S(P) R(P)† D  ]ζ + [C  − S(P) R(P)† D  ]Pσ − S(P) R(P)† ρ + Pb + q. Therefore, (η, ζ) is an L 2 -stable solution to (2.6.12). Sufficiency. Let (Θ ∗ , u ∗ ) be given by (2.6.13), where Π ∈ Rm×n is chosen so that Θ ∈ S [A, C; B, D]. Then we have ∗

R(P)Θ ∗ + S(P) = 0, 



(2.6.15) ∗ 

∗ 



Q(P) + S(P) Θ + (Θ ) S(P) + (Θ ) R(P)Θ = 0, 







B η + D ζ + D Pσ + ρ = −R(P)u ,

(2.6.16) (2.6.17)

and [(Θ ∗ ) + S(P) R(P)† ](B  η + D  ζ + D  Pσ + ρ) = −Π  [I − R(P)R(P)† ]R(P)u ∗ = 0.

(2.6.18)

2.6 Differential Games in Infinite Horizons

61

Take an arbitrary u ∈ U and let X (·) ≡ X (· ; x, u) be the solution to the following closed-loop system: ⎧ ∗ ⎪ ⎨ d X (t) = [(A + BΘ )X (t) + Bu(t) + b(t)]dt + [(C + DΘ ∗ )X (t) + Du(t) + σ(t)]dW (t), t  0, ⎪ ⎩ X (0) = x. Then     X X Q S , S R Θ∗ X + u Θ∗ X + u 0      q X dt +2 , ρ Θ∗ X + u ∞ =E [Q + S  Θ ∗ + (Θ ∗ ) S + (Θ ∗ ) RΘ ∗ ]X, X

J (x; Θ ∗ X + u) = E







0

+ 2(S + RΘ ∗ )X, u + Ru, u  + 2q + (Θ ∗ ) ρ, X + 2ρ, u dt.

(2.6.19)

Applying Itô’s formula to t → P X (t), X (t) , one has (recalling (2.6.15)) −P x, x = E

∞ 0

[P(A + BΘ ∗ ) + (A + BΘ ∗ ) P]X, X



+ P(C + DΘ ∗ )X, (C + DΘ ∗ )X + 2P X, Bu + b

 + 2P(C + DΘ ∗ )X, Du + σ ) + P(Du + σ), Du + σ dt ∞ [(P A + A P + C  PC) + (P B + C  P D)Θ ∗ =E 0

+ (Θ ∗ ) (B  P + D  PC) + (Θ ∗ ) D  P DΘ ∗ ]X, X



+ 2(B  P + D  PC + D  P DΘ ∗ )X, u + 2P(C + DΘ ∗ )X, σ  + D  P Du, u + 2D  Pσ, u + 2P X, b + Pσ, σ dt ∞

=E [Q(P)+S(P)Θ ∗ +(Θ ∗ )S(P)+(Θ ∗ )R(P)Θ ∗ ]X, X 0

− [Q + S  Θ ∗ + (Θ ∗ ) S + (Θ ∗ ) RΘ ∗ ]X, X

+ 2 [S(P)+R(P)Θ ∗ −(S + RΘ ∗ )]X, u + 2P(C + DΘ ∗ )X, σ  + D  P Du, u + 2D  Pσ, u + 2P X, b + Pσ, σ dt ∞ =E 2P(C + DΘ ∗ )X, σ + D  P Du, u + 2D  Pσ, u 0

+ 2P X, b + Pσ, σ − 2(S + RΘ ∗ )X, u

 − [Q + S  Θ ∗ + (Θ ∗ ) S + (Θ ∗ ) RΘ ∗ ]X, X dt.

(2.6.20)

62

2 Linear-Quadratic Two-Person Differential Games

Applying Itô’s formula to t → η(t), X (t) , one has (recalling (2.6.18))

∞ 

[A − BR(P)† S(P)] η + [C − DR(P)† S(P)] ζ

Eη(0), x = E 0

+ [C − DR(P)† S(P)] Pσ − S(P) R(P)† ρ + Pb + q, X



 − (A+ BΘ ∗ )X + Bu + b, η − ζ, (C + DΘ ∗ )X + Du + σ dt ∞

=E − [Θ ∗ +R(P)† S(P)] (B  η+ D  ζ + D  Pσ+ρ), X 0

+ P(C + DΘ ∗ )X, σ + (Θ ∗ ) ρ + Pb + q, X  − Bu + b, η − ζ, Du + σ dt ∞

=E P(C + DΘ ∗ )X, σ + (Θ ∗ ) ρ + Pb + q, X 0  − Bu + b, η − ζ, Du + σ dt. (2.6.21) Combining (2.6.19)–(2.6.21) and recalling (2.6.17), we have J (x; Θ ∗ X + u) − P x, x − 2Eη(0), x ∞ =E R(P)u, u + 2B  η + D  ζ + D  Pσ + ρ, u 0  + 2b, η + 2ζ, σ + Pσ, σ dt ∞  =E R(P)u, u − 2R(P)u ∗ , u + 2b, η + 2ζ, σ + Pσ, σ dt 0 ∞  =E R(P)(u − u ∗ ), u − u ∗ − R(P)u ∗ , u ∗ 0  + 2b, η + 2ζ, σ + Pσ, σ dt. Consequently, J (x; Θ1∗ X + u 1 , Θ2∗ X + u ∗2 ) − J (x; Θ ∗ X ∗ + u ∗ ) ∞ =E R1 (P)(u 1 − u ∗1 ), u 1 − u ∗1 dt  0, 0

since R1 (P)  0. Similarly, J (x; Θ1∗ X + u ∗1 , Θ2∗ X + u 2 ) − J (x; Θ ∗ X ∗ + u ∗ ) ∞ =E R2 (P)(u 2 − u ∗2 ), u 2 − u ∗2 dt  0, 0

2.6 Differential Games in Infinite Horizons

63

since R2 (P)  0. Therefore, (Θ ∗ , u ∗ ) is a closed-loop saddle point of Problem (SQG)∞ . Finally, recalling (2.6.17), we have R(P)u ∗ , u ∗ = R(P)R(P)† R(P)u ∗ , u ∗ = R(P)† R(P)u ∗ , R(P)u ∗ 3 32 1 = 3[R(P)† ] 2 (B  η + D  ζ + D  Pσ + ρ)3 , and hence, V (x) = J (x; Θ ∗ X + u ∗ ) = P x, x + 2Eη(0), x ∞  +E − R(P)u ∗ , u ∗ + 2b, η + 2ζ, σ + Pσ, σ dt 0 ∞% = P x, x + 2Eη(0), x + E Pσ, σ + 2η, b + 2ζ, σ 0 3 32 & 1 − 3[R(P)† ] 2 (B  η + D  ζ + D  Pσ + ρ3 dt.



This completes the proof.

To conclude this section, we present some examples illustrating how the “stabilizing solution” of algebraic Riccati equations plays an important role in the study of closed-loop saddle points. The first example shows that the algebraic Riccati equation may only admits non-stabilizing solutions even if the system (2.6.3) is stabilizable. Example 2.6.8 Consider the one-dimensional state equation ⎧ ⎨ d X (t) = − 1 X (t)dt + [u (t) + u (t)]dW (t), t  0, 1 2 2 ⎩ X (0) = x, and the performance functional



J (x; u 1 , u 2 ) = E 0

⎛ 1 1 −1 ⎞ ⎛ X (t) ⎞ ⎛ X (t) ⎞ ⎝ 1 1 0 ⎠ ⎝u 1 (t)⎠ , ⎝u 1 (t)⎠ dt. −1 0 −1 u 2 (t) u 2 (t)

In this example, 1 A=− , 2

B = (0, 0), C = 0, D = (1, 1),     1 1 0 Q = 1, S = , R= . −1 0 −1 It is clear that Θ = (Θ1 , Θ2 ) ∈ S [A, C; B, D] if and only if

64

2 Linear-Quadratic Two-Person Differential Games

−1 + (Θ1 + Θ2 )2 = 2(A + BΘ) + (C + DΘ)2 < 0, or equivalently, if and only if −1 < Θ1 + Θ2 < 1. Note that R + D  P D is invertible for all P ∈ R with inverse 

(R + D P D)

−1

 =

P +1 P P P −1

−1



 −P + 1 P = . P −P − 1

Then the corresponding ARE reads 0 = P A + A P + C  PC + Q − (P B + C  P D + S  )(R + D  P D)† (B  P + D  PC + S)    −P + 1 P 1 = −P + 1 − (1, −1) P −P − 1 −1 = 3P + 1. Thus, P = −1/3 and R11 + D1 P D1 =

2  0, 3

R22 + D2 P D2 = −

4  0. 3

Also, the range condition R(B  P + D  PC + S) ⊆ R(R + D  P D) holds automatically since R + D  P D is invertible. However, for any Π ∈ R, [I − R(P)† R(P)]Π − R(P)† S(P) =

/

1 0 5 − ,− , 3 3

which is not a stabilizer of the system (2.6.3). Hence, by Theorem 2.6.7, the above problem does not admit closed-loop saddle points. Next we give an example of Problem (SDG)0∞ which admits uncountably many closed-loop saddle points. This example also tells us when the algebraic Riccati equation is solvable, −R(P)† S(P) might not be a stabilizer of the system (2.6.3) in general, and we should carefully choose Π so that [I − R(P)† R(P)]Π − R(P)† S(P) is a closed-loop saddle point of the game.

2.6 Differential Games in Infinite Horizons

65

Example 2.6.9 Consider the one-dimensional state equation ⎧ & % ⎨ d X (t) = − 1 X (t) + 1 u (t) dt + [−X (t) + u (t)]dW (t), t  0, 2 1 4 2 ⎩ X (0) = x, and the performance functional J (x; u 1 , u 2 ) = E



0

⎛ 1 −1 − 1 ⎞ ⎛ X (t) ⎞ ⎛ X (t) ⎞ 2 2 ⎝ −1 1 0 ⎠ ⎝u 1 (t)⎠ , ⎝u 1 (t)⎠ dt. u 2 (t) u 2 (t) − 21 0 0

In this example, / 1 10 A = − , B = 0, − , C = −1, D = (1, 0), 4 2    1 −1 10 Q= , S= , R = . 00 − 21 2 Clearly, Θ = (Θ1 , Θ2 ) ∈ S [A, C; B, D] if and only if / 1 1 0 2 − − Θ2 + (−1 + Θ1 )2 = 2(A + BΘ) + (C + DΘ)2 < 0, 4 2 or equivalently, if and only if Θ12 − 2Θ1 +

1 < Θ2 . 2

The corresponding ARE reads 0 = P A + A P + C  PC + Q − (P B + C  P D + S  )(R + D  P D)† (B  P + D  PC + S) / 0  P + 1 0†  −(P + 1)  1 1 = (P + 1) − − (P + 1), − (P + 1) 0 0 − 21 (P + 1) 2 2  †   1 1 2 P +1 0 = (P + 1) − (P + 1)2 (2, 1) . (2.6.22) 1 0 0 2 4 It is easy to verify that P = −1 is the unique solution of (2.6.22). Thus,     00 0   , S(P) = B P + D PC + S = . R(P) = R + D P D = 00 0 

Hence, all the conditions

66

2 Linear-Quadratic Two-Person Differential Games

R11 + D1 P D1  0,

R22 + D2 P D2  0,

R(B  P + D  PC + S) ⊆ R(R + D  P D) hold. By Theorem 2.6.7, we see that (Θ1 , ν1 ; Θ2 , ν2 ) with Θ12 − 2Θ1 +

1 < Θ2 , ν1 , ν2 ∈ L 2F (R) 2

are all the closed-loop saddle points of the above problem. However, / S [A, C; B, D]. −R(P)† S(P) = (0, 0) ∈ From this example, we also see that even if −R(P)† S(P) is not a stabilizer of the system, Problem (SDG)∞ may still admit closed-loop saddle points, thanks to the fact that we can properly choose Π = 0 so that the term [I − R(P)† R(P)]Π could play a role. Finally, we present an example showing that not all of the stabilizers are necessarily closed-loop saddle points of the game. It may happens that the system (2.6.3) has more than one (uncountably many) stabilizer, while the closed-loop saddle point is unique. Example 2.6.10 Consider the one-dimensional state equation 

d X (t) = [−8X (t) + u 1 (t) − u 2 (t)]dt + [u 1 (t) + u 2 (t)]dW (t), t  0, X (0) = x,

and the performance functional



J (x; u 1 , u 2 ) = E

% & 12|X (t)|2 + |u 1 (t)|2 − |u 2 (t)|2 dt.

0

In this example, A = −8,

B = (1, −1), C = 0, D = (1, 1),     0 1 0 Q = 12, S = , R= . 0 0 −1 Again, Θ = (Θ1 , Θ2 ) ∈ S [A, C; B, D] if and only if − 16 + 2(Θ1 − Θ2 ) + (Θ1 + Θ2 )2 < 0. The corresponding ARE reads

(2.6.23)

2.6 Differential Games in Infinite Horizons

67

0 = P A + A P + C  PC + Q − (P B + C  P D + S  )(R + D  P D)† (B  P + D  PC + S)   †  1 P +1 P = −16P + 12 − P 2 (1, −1) −1 P P −1    −P + 1 P 1 = −16P + 12 − P 2 (1, −1) P −P − 1 −1 = 4P 3 − 16P + 12, which has three solutions: P1 = 1,

√ −1 + 13 , P2 = 2

√ −1 − 13 P3 = . 2

All of them satisfy the range condition R(B  P + D  PC + S) ⊆ R(R + D  P D), since R + D  P D is invertible for any P ∈ R. However, only P1 = 1 satisfies R11 + D1 P D1  0,

R22 + D2 P D2  0.

For any Π ∈ R, [I − R(P1 )† R(P1 )]Π − R(P1 )† S(P1 ) = (1, −3) , which satisfies (2.6.23) and hence is a stabilizer of the system (2.6.3). By Theorem 2.6.7, the above problem admits a unique closed-loop saddle point (1, 0; −3, 0). On the other hand, by verifying (2.6.23), we see that all the following (Θ1∗ , Θ2∗ ) = (1, −3),

(Θ1 , Θ2 ) = (0, 0),

(Θ1∗ , Θ2 )

(Θ1 , Θ2∗ ) = (0, −3),

= (1, 0),

are stabilizers of the system (2.6.3), but only (Θ1∗ , Θ2∗ ) is the closed-loop saddle point of the problem.

Chapter 3

Mean-Field Linear-Quadratic Optimal Controls

Abstract This chapter is concerned with a more general class of linear-quadratic optimal control problems, the mean-field linear-quadratic optimal control problem, in which the expectations of the state process and the control are involved. Two differential Riccati equations are introduced for the problem. The strongly regular solvability of these two Riccati equations is proved to be equivalent to the uniform convexity of the cost functional. In terms of the solutions to the Riccati equations, the unique optimal control is obtained as a linear feedback of the state process and its expectation. An application of the mean-field linear-quadratic optimal control theory is presented, in which analytical optimal portfolio policies are constructed for a continuous-time mean-variance portfolio selection problem. The mean-field linear-quadratic optimal control problem over an infinite horizon is also studied. Keywords Mean-field · Linear-quadratic · Optimal control · Riccati equation · Open-loop solvability · Uniform convexity · Mean-variance portfolio selection

In this chapter, we study a more general class of linear-quadratic optimal control problems, in which the expectations of the state process and the control, called the mean-field, are involved. As before, we shall have a complete probability space (Ω, F, P) on which a standard one-dimensional Brownian motion W = {W (t); 0  t < ∞} is defined. We denote by F = {Ft }t0 the usual augmentation of the natural filtration generated by W and employ throughout this chapter the notation of Chap. 1.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 J. Sun and J. Yong, Stochastic Linear-Quadratic Optimal Control Theory: Differential Games and Mean-Field Problems, SpringerBriefs in Mathematics, https://doi.org/10.1007/978-3-030-48306-7_3

69

70

3 Mean-Field Linear-Quadratic Optimal Controls

3.1 Problem Formulation and General Considerations Consider the following controlled state equation on a finite horizon [t, T ]:  ⎧ ¯ d X (s) = A(s)X (s) + A(s)E[X (s)] ⎪ ⎪  ⎪ ⎪ ¯ ⎪ + B(s)u(s) + B(s)E[u(s)] + b(s) ds ⎪ ⎨  ¯ + C(s)X (s) + C(s)E[X (s)] ⎪  ⎪ ⎪ ¯ ⎪ + D(s)u(s) + D(s)E[u(s)] + σ(s) dW (s), ⎪ ⎪ ⎩ X (t) = ξ.

(3.1.1)

The initial pair (t, ξ) belongs to   D = (t, ξ) | t ∈ [0, T ], ξ ∈ Xt ≡ L 2Ft (Ω; Rn ) , and the control u is taken from U[t, T ] ≡ L 2F (t, T ; Rm ), which is the same as Problem (SLQ). Note that in (3.1.1), the expectations of X (s) and u(s) are involved. We therefore call (3.1.1) a mean-field SDE. The cost functional we are considering is a quadratic form and also involves the expectations of X (s) and u(s): ¯ ξ; u), J (t, ξ; u)  L(t, ξ; u) + L(t,

(3.1.2)

where  L(t, x; u) = E G X (T ), X (T ) + 2g, X (T )





 T X (s) X (s) Q(s) S(s) , + S(s) R(s) u(s) u(s) t



  q(s) X (s) +2 , ds , ρ(s) u(s) ¯ ξ; u) = GE[X ¯ L(t, (T )], E[X (T )] + 2g, ¯ E[X (T )]





 T  ¯ ¯ Q(s) S(s) E[X (s)] E[X (s)] , + ¯ ¯ E[u(s)] E[u(s)] S(s) R(s) t



 q(s) ¯ E[X (s)] +2 , ds. ρ(s) ¯ E[u(s)] To make a precise statement of the mean-field linear-quadratic optimal control problem (MFLQ problem, for short), we introduce the following assumptions, which are comparable with (H1) and (H2) introduced in Chap. 1.

3.1 Problem Formulation and General Considerations

71

(A1) The coefficients of (3.1.1) satisfy ⎧ 1 n×n ¯ B, B¯ ∈ L 2 (0, T ; Rn×m ), ⎪ ⎨ A, A ∈ L (0, T ; R ), C, C¯ ∈ L 2 (0, T ; Rn×n ), D, D¯ ∈ L ∞ (0, T ; Rn×m ), ⎪ ⎩ b ∈ L 2F (Ω; L 1 (0, T ; Rn )), σ ∈ L 2F (0, T ; Rn ). (A2) The weighting matrices in (3.1.2) satisfy ⎧ Q, Q¯ ∈ L 1 (0, T ; Sn ), S, S¯ ∈ L 2 (0, T ; Rm×n ), ⎪ ⎪ ⎪ ⎪ ⎪ ∞ m ⎪ G, G¯ ∈ Sn , ⎪ R, R¯ ∈ L (0, T ; S ), ⎨ q ∈ L 2F (Ω; L 1 (0, T ; Rn )), q¯ ∈ L 1 (0, T ; Rn ), ⎪ ⎪ ⎪ ⎪ ρ ∈ L 2F (0, T ; Rm ), ρ¯ ∈ L 2 (0, T ; Rm ), ⎪ ⎪ ⎪ ⎩ g ∈ L 2FT (Ω; Rn ), g¯ ∈ Rn . It can be shown that under the assumptions (A1) and (A2), for any initial pair (t, ξ) ∈ D and control u ∈ U[t, T ], the state Equation (3.1.1) has a unique solution X (·) ≡ X (· ; t, ξ, u) ∈ X [t, T ] ≡ L 2F (Ω; C([t, T ]; Rn )), and the cost functional J (t, ξ; u) is therefore well-defined. The MFLQ problem is then stated as follows. Problem (MFLQ). For given initial pair (t, ξ) ∈ D, find a control u ∗ ∈ U[t, T ] such that (3.1.3) J (t, ξ; u ∗ ) = inf J (t, ξ; u) ≡ V (t, ξ). u∈U [t,T ]

An element u ∗ ∈ U[t, T ] that satisfies (3.1.3) is called an open-loop optimal control of Problem (MFLQ) for the initial pair (t, ξ). The solution X ∗ (·) ≡ X (· ; t, ξ, u ∗ ) of (3.1.1) corresponding to an optimal control u ∗ is called an optimal state process. The function V (t, ξ) is called the value function of Problem (MFLQ). We shall denote by Problem (MFLQ)0 the particular case of Problem (MFLQ) where the nonhomogeneous terms b, σ of the state equation and the coefficients of the linear terms g, g, ¯ q, q, ¯ ρ, ρ¯ in the cost functional are zero. The cost functional and value function of Problem (MFLQ)0 will be denoted by J 0 (t, ξ; u) and V 0 (t, ξ), respectively. Definition 3.1.1 Problem (MFLQ) is said to be (i) finite at the initial pair (t, ξ) ∈ D if V (t, ξ) > −∞; (ii) finite at t ∈ [0, T ] if V (t, ξ) > −∞ for all ξ ∈ Xt ; (iii) finite if it is finite at all t ∈ [0, T ]. Definition 3.1.2 Problem (MFLQ) is said to be (i) (uniquely) open-loop solvable at the initial pair (t, ξ) ∈ D if there exists a (unique) u ∗ ∈ U[t, T ] such that (3.1.3) holds; (ii) (uniquely) open-loop solvable at t if for any ξ ∈ Xt , there exists a (unique) u ∗ ∈ U[t, T ] such that (3.1.3) holds;

72

3 Mean-Field Linear-Quadratic Optimal Controls

(iii) (uniquely) open-loop solvable if it is (uniquely) open-loop solvable at all t ∈ [0, T ). It should be noted that Problem (MFLQ) might be finite but not solvable at an initial pair.

3.2 Open-Loop Solvability and Mean-Field FBSDEs In order to study the open-loop solvability of Problem (MFLQ), we introduce the following mean-field BSDE:   ⎧  ¯ ¯ ¯ ⎪ ⎨ dY (s) = − A Y + A E[Y ] + C Z + C E[Z ] + Q X + QE[X ]   + S u + S¯ E[u] + q + q¯ ds + Z d W, s ∈ [t, T ], ⎪ ⎩ ¯ Y (T ) = G X (T ) + GE[X (T )] + g + g, ¯

(3.2.1)

where X is the solution to the state Equation (3.1.1) corresponding to some initial pair (t, ξ) ∈ D and control u ∈ U[t, T ]. This mean-field BSDE is called the adjoint equation associated with (3.1.1). Similar to the standard BSDE theory, the mean-field BSDE (3.2.1) admits a unique adapted solution (Y, Z ) ∈ L 2F (Ω; C([t, T ]; Rn )) × L 2F (t, T ; Rn ). Proposition 3.2.1 Let (A1)–(A2) hold, and let (t, ξ) ∈ D be a given initial pair. The following holds for any scalar λ ∈ R and controls u, v ∈ U[t, T ]: J (t, ξ; u + λv) − J (t, ξ; u) T   B Y + B¯  E[Y ] + D  Z + D¯  E[Z ] = λ2 J 0 (t, 0; v) + 2λE t  ¯ ¯ + S X + SE[X ] + Ru + RE[u] + ρ + ρ, ¯ v ds, (3.2.2) where X is the state process corresponding to (t, ξ, u), and (Y, Z ) is the adapted solution to the associated adjoint equation (3.2.1). Consequently, the map u → J (t, ξ; u) is Fréchet differentiable with the Fréchet derivative given by  Du J (t, ξ; u) = 2 B  Y + B¯  E[Y ] + D  Z + D¯  E[Z ]  + S X + S¯ E[X ] + Ru + R¯ E[u] + ρ + ρ¯ . Proof Let X λ (·) = X (· ; t, ξ, u + λv) be the state process corresponding to the control u + λv and with initial pair (t, ξ), and let X˚ 0v be the solution to the mean-field SDE

3.2 Open-Loop Solvability and Mean-Field FBSDEs

73

⎧   v v ¯ X˚ 0v ] + Bv + BE[v] ¯ ⎪ ds ⎪ d X˚ 0 (s) = A X˚ 0 + AE[ ⎨   v v ¯ X˚ 0 ] + Dv + DE[v] ¯ + C X˚ 0 + CE[ dW, s ∈ [t, T ], ⎪ ⎪ ⎩ ˚v X 0 (t) = 0. It is easily seen, by the linearity of the state equation, that X λ = X + λ X˚ 0v . Substituting this relation into the expression of J (t, ξ; u + λv), we obtain by a direct computation that J (t, ξ; u + λv) − J (t, ξ; u)   ¯ (T )] + g + g, ¯ X˚ 0v (T ) = λ2 J 0 (t, 0; v) + 2λE G X (T ) + GE[X T   ¯ Q X + QE[X ] + S  u + S¯  E[u] + q + q, ¯ X˚ 0v +   ¯ ¯ + S X + SE[X ] + Ru + RE[u] + ρ + ρ, ¯ v ds . 

t

(3.2.3)

Now by applying Itô’s formula to s → Y (s), X˚ 0v (s), we have ¯ EG X (T ) + GE[X (T )] + g + g, ¯ X˚ 0 (T ) T ¯ − A Y + A¯  E[Y ] + C  Z + C¯  E[Z ] + Q X + QE[X ] =E t  + S  u + S¯  E[u] + q + q, ¯ X˚ 0v ¯ X˚ 0v ] + Bv + BE[v], ¯ + A X˚ 0v + AE[ Y  ¯ X˚ 0v ] + Dv + DE[v], ¯ + C X˚ 0v + CE[ Z  ds T =E B  Y + B¯  E[Y ] + D  Z + D¯  E[Z ], v t  ¯ ¯ X˚ 0v  ds. (3.2.4) − Q X + QE[X ] + S  u + S¯  E[u] + q + q, Substituting (3.2.4) into (3.2.3) yields (3.2.2).



Theorem 3.2.2 Let (A1)–(A2) hold, and let (t, ξ) ∈ D be given. Let u ∈ U[t, T ] and (X, Y, Z ) be the adapted solution to the following (decoupled) mean-field FBSDE: ⎧   ¯ ¯ ⎪ ⎪ d X (s) = AX + AE[X ] + Bu + BE[u] + b ds ⎪   ⎪ ⎪ ¯ ¯ ⎪ + C X + CE[X ] + Du + DE[u] + σ dW, ⎪ ⎨  ¯ dY (s) = − A Y + A¯  E[Y ] + C  Z + C¯  E[Z ] + Q X + QE[X ] ⎪  ⎪   ⎪ ⎪ + S u + S¯ E[u] + q + q¯ ds + Z d W, ⎪ ⎪ ⎪ ⎩ ¯ X (t) = ξ, Y (T ) = G X (T ) + GE[X (T )] + g + g. ¯

(3.2.5)

74

3 Mean-Field Linear-Quadratic Optimal Controls

Then u is an open-loop optimal control of Problem (MFLQ) for the initial pair (t, ξ) if and only if (3.2.6) J 0 (t, 0; v)  0, ∀v ∈ U[t, T ], and the following holds almost everywhere on [t, T ] and almost surely: B  Y + D  Z + S X + Ru + ρ ¯ ¯ + B¯  E[Y ] + D¯  E[Z ] + SE[X ] + RE[u] + ρ¯ = 0.

(3.2.7)

Proof By (3.2.2), we see that u is an optimal control of Problem (MFLQ) for the initial pair (t, ξ) if and only if

T

λ J (t, 0; v) + λE 2

0

Du J (t, ξ; u)(s), v(s)ds

t

= J (t, ξ; u + λv) − J (t, ξ; u)  0, ∀λ ∈ R, ∀v ∈ U[t, T ], which is equivalent to (3.2.6), together with the following:

T

E

Du J (t, ξ; u)(s), v(s)ds = 0, ∀v ∈ U[t, T ].

t



The result follows immediately.

3.3 A Hilbert Space Point of View According to Proposition 3.2.1, the cost functional J (t, ξ; u) can be written as J (t, ξ; u) = J 0 (t, 0; u) + Du J (t, ξ; 0), u + J (t, ξ; 0). For fixed initial pair (t, ξ) ∈ D, the second term is linear in the control variable u, and the last term is a constant. We now look at the structure of J 0 (t, 0; u) from a Hilbert space point of view. For an initial pair (t, ξ) ∈ D and a control u ∈ U[t, T ], we denote by X˚ ξu the solution to the homogeneous state equation ⎧   ¯ X˚ ξu ] + Bu + BE[u] ¯ ⎪ d X˚ ξu (s) = A X˚ ξu + AE[ ds ⎪ ⎨   ¯ X˚ ξu ] + Du + DE[u] ¯ dW, + C X˚ ξu + CE[ ⎪ ⎪ ⎩ ˚u X ξ (t) = ξ.

(3.3.1)

3.3 A Hilbert Space Point of View

75

In terms of X˚ ξu , the cost functional J 0 (t, ξ; u) becomes  ¯ X˚ ξu (T )], E[ X˚ ξu (T )] J 0 (t, ξ; u) = E G X˚ ξu (T ), X˚ ξu (T ) + GE[

u u  T Q(s) S(s) X˚ ξ (s) X˚ ξ (s) , ds + S(s) R(s) u(s) u(s) t





  T ¯ ¯  Q(s) S(s) E[ X˚ ξu (s)] E[ X˚ ξu (s)] + , ds . ¯ ¯ S(s) R(s) E[u(s)] E[u(s)] t When ξ = 0, we can define two bounded linear operators Lt : U[t, T ] → X [t, T ] by Lt u = X˚ 0u

and

and

t : U[t, T ] → XT L

t u = X˚ 0u (T ), L

∗t , and E∗ the adjoints of Lt , L t , and the expectation respectively. Denote by L∗t , L operator E, respectively. Then  ¯ X˚ 0u (T )], E[ X˚ 0u (T )] J 0 (t, 0; u) = E G X˚ 0u (T ), X˚ 0u (T ) + GE[

u u  T Q(s) S(s) X˚ 0 (s) X˚ 0 (s) , ds + S(s) R(s) u(s) u(s) t



 

T ¯ ¯  Q(s) S(s) E[ X˚ 0u (s)] E[ X˚ 0u (s)] + , ds ¯ ¯ S(s) R(s) E[u(s)] E[u(s)] t     t u, L t u + GE[ t u], E[L t u] ¯ L = GL + QLt u, Lt u + 2SLt u, u + Ru, u       ¯ ¯ ¯ + QE[L t u], E[Lt u] + 2 SE[Lt u], E[u] + RE[u], E[u] . Let ∗t (G + E∗ GE) t + L∗t (Q + E∗ QE)L ¯ L ¯ Mt = L t ∗¯ ∗  ∗ ¯ ¯ + (S + E SE)Lt + Lt (S + E S E) + (R + E∗ RE), which is a bounded self-adjoint linear operator from U[t, T ] into itself. Then the above can be further written as J 0 (t, 0; u) = Mt u, u, and the cost functional J (t, ξ; u) admits the following representation:

(3.3.2)

76

3 Mean-Field Linear-Quadratic Optimal Controls

J (t, ξ; u) = Mt u, u + Du J (t, ξ; 0), u + J (t, ξ; 0), ∀(t, ξ) ∈ D, ∀u ∈ U[t, T ].

(3.3.3)

We see that the cost functional J (t, ξ; u) of the mean-field LQ problem has the same structure as that of Problem (SLQ) presented in Sect. 2 of [48, Chap. 2]. So most results there remain true for Problem (MFLQ). To be more precise, let us first introduce the following relevant assumptions. (A3) J 0 (t, 0; u)  0 for all u ∈ U[t, T ]. (A4) There exists a constant δ > 0 such that

T

J 0 (t, 0; u)  δ E

|u(s)|2 ds, ∀u ∈ U[t, T ].

t

From (3.3.2) and (3.3.3), we see that (A3) is equivalent to each of the following conditions: (i) Mt u, u  0 for all u ∈ U[t, T ]. (ii) The map u → J 0 (t, ξ; u) is convex for every ξ ∈ Xt . (iii) The map u → J (t, ξ; u) is convex for every ξ ∈ Xt . Likewise, (A4) is equivalent to each of the following conditions: (i) There exists a constant δ > 0 such that

T

Mt u, u  δ E

|u(s)|2 ds, ∀u ∈ U[t, T ].

t

(ii) The map u → J 0 (t, ξ; u) is uniformly convex for every ξ ∈ Xt . (iii) The map u → J (t, ξ; u) is uniformly convex for every ξ ∈ Xt . The following result shows that (A3) is a necessary condition for the finiteness (and open-loop solvability) of Problem (MFLQ) at the initial time t, and that (A4) is a sufficient condition for the open-loop solvability at t. Proposition 3.3.1 Let (A1)–(A2) hold and let t be a given initial time. The following statements are true: (i) If Problem (MFLQ) is finite at t, then (A3) must hold. (ii) Suppose that (A4) holds. Then Problem (MFLQ) is uniquely open-loop solvable at t, and the unique optimal control for the initial pair (t, ξ) is given by 1 Du J (t, ξ; 0). u ∗ = − M−1 2 t Moreover,

2 1 −1 V (t, ξ) = J (t, ξ; 0) − Mt 2 Du J (t, ξ; 0) . 4

3.3 A Hilbert Space Point of View

77

Proof The proof is straightforward, and it is omitted.



Assume that the necessary condition (A3) for the finiteness of Problem (MFLQ) holds. For ε > 0, let us consider the state Equation (3.1.1) and the following uniformly convex cost functional:

T

Jε (t, ξ; u)  J (t, ξ; u) + εE

|u(s)|2 ds

t

= (Mt + εI )u, u + Du J (t, ξ; 0), u + J (t, ξ; 0).

(3.3.4)

Denote the corresponding optimal control problem and value function by Problem (MFLQ)ε and Vε (t, ξ), respectively. According to Proposition 3.3.1(ii), for any ξ ∈ Xt , Problem (MFLQ)ε admits a unique optimal control 1 u ∗ε = − (Mt + εI )−1 Du J (t, ξ; 0), 2

(3.3.5)

and its value function is given by 2 1 1 Vε (t, ξ) = J (t, ξ; 0) − (Mt + εI )− 2 Du J (t, ξ; 0) . 4

(3.3.6)

Theorem 3.3.2 Let (A1)–(A3) hold, and let ξ ∈ Xt be a given initial state. We have the following: (i) limε→0 Vε (t, ξ) = V (t, ξ). (ii) The family {u ∗ε }ε>0 defined by (3.3.5) is a minimizing family of the map u → J (t, ξ; u). That is, lim J (t, ξ; u ∗ε ) =

ε→0

inf

u∈U [t,T ]

J (t, ξ; u) = V (t, ξ).

(3.3.7)

(iii) The following statements are equivalent: (a) Problem (MFLQ) is open-loop solvable at (t, ξ); (b) The family {u ∗ε }ε>0 is bounded in U[t, T ]; (c) u ∗ε converges strongly in U[t, T ] as ε → 0. In this case, the strong limit of u ∗ε is an open-loop optimal control of Problem (MFLQ) for (t, ξ). The proof of Theorem 3.3.2 follows from Proposition 1.3.4 of [48, Chap. 1] (see also Theorem 2.6.2 of [48, Chap. 2]). The details are omitted here.

78

3 Mean-Field Linear-Quadratic Optimal Controls

3.4 Uniform Convexity and Riccati Equations Theorem 3.3.2 tells us that under the necessary condition (A3), in order to solve Problem (MFLQ), we need only solve Problem (MFLQ)ε , which has the uniformly convex cost functional, then pass to the limit with ε → 0. When the uniform convexity condition (A4) holds, Proposition 3.3.1 gives a representation of the unique optimal control u ∗ in terms of the operator Mt and Du J (t, ξ; 0). However, such a representation is not convenient for applications, since both Mt and Du J (t, ξ; 0) are in abstract forms and very difficult to compute. To obtain a more explicit form of the optimal control, we will consider the following two Riccati equations:  

P˙ + Q(P) − S(P) R(P)† S(P) = 0, P(T ) = G,   Π ) R(P)  † S(P,  Π ) = 0, Π˙ + Q(P, Π ) − S(P,  Π (T ) = G,

(3.4.1) (3.4.2)

where in (3.4.1), we have adopted the notation Q(P) = P A + A P + C  PC + Q, R(P) = R + D  P D, S(P) = B  P + D  PC + S,

(3.4.3)

introduced in (1.1.8), and in (3.4.2), we have employed the following notation: ¯  ¯ C ¯ ¯  = A + A,  = C + C,  = D + D, A B = B + B, D ¯  ¯ ¯ ¯  = Q + Q,  = R + R,  = G + G, Q S = S + S, R G   + A Π + C  P C  + Q,  R(P) + D  P D,  Q(P, Π) = Π A =R  Π) =   P C + S(P, BΠ + D S.

(3.4.4)

It should be noted that in the above the variable s has been suppressed for simplicity. In this section, we will (i) establish the equivalence between the uniform convexity of the cost functional and the strongly regular solvability of the above two Riccati equations, (ii) use the solutions of (3.4.1) and (3.4.2) to construct the optimal control, and (iii) present some sufficient conditions on the coefficients of the state equation and the weighting matrices of the cost functional that guarantee the uniform convexity of the cost functional. To clearly present the results, we will divide the developments into three subsections.

3.4 Uniform Convexity and Riccati Equations

79

3.4.1 Solvability of Riccati Equations: Sufficiency of the Uniform Convexity Our first result of this section is as follows. Theorem 3.4.1 Let (A1)–(A2) and (A4) hold. Then the Riccati equation (3.4.1) admits a unique solution P ∈ C([t, T ]; Sn ) such that  + D  P D  0, ≡R R(P) ≡ R + D  P D 0, R(P)

(3.4.5)

and the Riccati equation (3.4.2) admits a unique solution Π ∈ C([t, T ]; Sn ). The proof of Theorem 3.4.1 will be divided into two steps. In the first step, we will prove that the Riccati equation (3.4.1) admits a unique solution P ∈ C([t, T ]; Sn ) such that the first condition in (3.4.5) holds. In the second step, we will further show that the solution P satisfies the second condition in (3.4.5), and that the Riccati equation (3.4.2) is solvable. For the first step, we need the following lemmas. Lemma 3.4.2 Let (A1)–(A2) and (A4) hold. Then there exists a constant α ∈ R such that value function of Problem (MFLQ)0 satisfies V 0 (s, ξ)  αE|ξ|2 , ∀(s, ξ) ∈ D with s  t and Eξ = 0. Proof Let s ∈ [t, T ]. For a control u ∈ U[s, T ], we define its zero-extension on [t, T ] as follows:  0, r ∈ [t, s), [01[t,s) ⊕ u](r ) = u(r ), r ∈ [s, T ]. It is easily seen that v  01[t,s) ⊕ u ∈ U[t, T ]. Let X˚ 0v be the solution to the following mean-field SDE over [t, T ]: ⎧   ¯ )E[ X˚ 0v (r )] + B(r )v(r ) + B(r ¯ )E[v(r )] dr ⎪ d X˚ 0v (r ) = A(r ) X˚ 0v (r ) + A(r ⎪ ⎨   ¯ )E[ X˚ 0v (r )] + D(r )v(r ) + D(r ¯ )E[v(r )] dW, + C(r ) X˚ 0v (r ) + C(r ⎪ ⎪ ⎩ ˚v X 0 (t) = 0. Since the initial state is zero, we have X˚ 0v (r ) = 0 for r ∈ [t, s]. Hence, J 0 (s, 0; u) = J 0 (t, 0; 01[t,s) ⊕ u) T |[01[t,s) ⊕ u](r )|2 dr = δ E  δE t

T

|u(r )|2 dr.

s

Let (X, Y, Z ) be the adapted solution to the mean-field FBSDE

(3.4.6)

80

3 Mean-Field Linear-Quadratic Optimal Controls

    ⎧ ¯ ¯ d X (r ) = AX + AE[X ] dr + C X + CE[X ] dW, r ∈ [s, T ], ⎪ ⎪ ⎪ ⎪ ⎨ dY (r ) = −  A Y + A¯  E[Y ] + C  Z + C¯  E[Z ]  ¯ ⎪ + Q X + QE[X ] dr + Z d W, r ∈ [s, T ], ⎪ ⎪ ⎪ ⎩ ¯ X (s) = ξ, Y (T ) = G X (T ) + GE[X (T )], and let (X, Y, Z) be the adapted solution to the matrix FBSDE ⎧ + CXdW, r ∈ [0, T ], ⎪ ⎨ dX(r ) = AXdr    dY(r ) = − A Y + C  Z + QX dr + ZdW, r ∈ [0, T ], ⎪ ⎩ X(0) = In , Y(T ) = GX(T ). According to Proposition 3.2.1, we have J 0 (s, ξ; u) − J 0 (s, ξ; 0) − J 0 (s, 0; u) T    ¯ B Y + B¯  E[Y ] + D  Z + D¯  E[Z ] + S X + SE[X ], u dr. = 2E

(3.4.7)

s

If E[ξ] = 0, then

E[X (r )] = 0, ∀r ∈ [s, T ],

(3.4.8)

and one can easily verify that for r ∈ [s, T ], (X (r ), Y (r ), Z (r )) = (X(r )X(s)−1 , Y(r )X(s)−1 , Z(r )X(s)−1 )ξ.

(3.4.9)

Since for r ∈ [s, T ], X(r )X(s)−1 , Y(r )X(s)−1 and Z(r )X(s)−1 are independent of Fs , we have E[X (r )] = E[Y (r )] = E[Z (r )] = 0, ∀r ∈ [s, T ], and thereby (3.4.7) reduces to

T

J 0 (s, ξ; u) − J 0 (s, ξ; 0) − J 0 (s, 0; u) = 2E

B  Y + D  Z + S X, udr.

s

Now using the Cauchy-Schwarz inequality and (3.4.6), we obtain J (s, ξ; u)  J (s, ξ; 0) + J (s, 0; u) − δE 0

0

0

T

|u(r )|2 dr

s

T 1 − E |B  Y + D  Z + S X |2 dr δ s T 1  J 0 (s, ξ; 0) − E |B  Y + D  Z + S X |2 dr. δ s

(3.4.10)

3.4 Uniform Convexity and Riccati Equations

81

Recalling (3.4.8) and (3.4.9), we can rewrite J 0 (s, ξ; 0) as   T J (s, ξ; 0) = E G X (T ), X (T ) + Q(r )X (r ), X (r )dr s    = E ξ  X(s)−1 X(T ) GX(T )X(s)−1 ξ 0

+ ξ



T

   X(s)−1 X(r ) Q(r )X(r )X(s)−1 dr ξ .

s

Similarly, E

T

|B(r ) Y (r ) + D(r ) Z (r ) + S(r )X (r )|2 dr  T     = E ξ X(s)−1 B(r ) Y(r ) + D(r ) Z(r ) + S(r )X(r ) s

   × B(r ) Y(r ) + D(r ) Z(r ) + S(r )X(r ) X(s)−1 dr ξ . s

Thus, with the notation   M(s) = E X(s)−1 X(T ) GX(T )X(s)−1 T   X(s)−1 X(r ) Q(r )X(r )X(s)−1 dr + s

   1 T − X(s)−1 B(r ) Y(r ) + D(r ) Z(r ) + S(r )X(r ) δ s    × B(r ) Y(r ) + D(r ) Z(r ) + S(r )X(r ) X(s)−1 dr , (3.4.10) becomes

  J 0 (s, ξ; u)  E ξ  M(s)ξ .

The desired result then follows from the fact that the function M : [t, T ] → Sn is continuous.  Lemma 3.4.3 Let (A1)–(A2) hold. For Θ ∈ Θ[t, T ], let PΘ ∈ C([t, T ]; Sn ) denote the solution to the following Lyapunov equation: ⎧   ˙ ⎪ ⎨ PΘ + PΘ (A + BΘ) + (A + BΘ) PΘ + (C + DΘ) PΘ (C + DΘ) (3.4.11) + Θ  RΘ + S  Θ + Θ  S + Q = 0, ⎪ ⎩ PΘ (T ) = G.

82

3 Mean-Field Linear-Quadratic Optimal Controls

If there exist constants α ∈ R and β > 0 such that for all Θ ∈ Θ[t, T ], PΘ (s)  αIn , R(s) + D(s) PΘ (s)D(s)  β Im , a.e. s ∈ [t, T ],

(3.4.12)

then the Riccati equation (3.4.1) is strongly regularly solvable. Proof The proof is the same as the proof of Chap. 2, Theorem 2.5.6, part (i) ⇒ (ii) in [48].  Proof of Theorem 3.4.1. Step 1: We need only show that the condition stated in Lemma 3.4.3 holds. To this end, let Θ ∈ Θ[t, T ] and denote simply by P the corresponding solution of (3.4.11). Take an arbitrary deterministic function u ∈ L 2 (t, T ; Rm ) ⊆ U[t, T ], and let X u denote the solution to the following SDE over [t, T ]: 

d X u (s) = [(A + BΘ)X u + BuW ]ds + [(C + DΘ)X u + DuW ]dW, X u (t) = 0.

One sees that v  Θ X u + uW ∈ U[t, T ],

(3.4.13)

E[X u (s)] = 0, E[v(s)] = 0, ∀s ∈ [t, T ].

(3.4.14)

and that

Thus, X u also satisfies the mean-field SDE ⎧   u ¯ ¯ d X u (s) = AX u + AE[X ] + Bv + BE[v] ds ⎪ ⎪ ⎨   u ¯ ¯ + C X u + CE[X ] + Dv + DE[v] dW, s ∈ [t, T ], ⎪ ⎪ ⎩ X u (t) = 0. This means that J 0 (t, 0; v) = EG X u (T ), X u (T ) T  Q X u , X u  + 2S X u , v + Rv, v ds. +E t

By applying Itô’s formula to s → P(s)X u (s), X u (s), we obtain

(3.4.15)

3.4 Uniform Convexity and Riccati Equations

83

EG X u (T ), X u (T ) T  P˙ X u , X u  + P[(A + BΘ)X u + BuW ], X u  =E t

+ P X u , (A + BΘ)X u + BuW 

 + P[(C + DΘ)X u + DuW ], (C + DΘ)X u + DuW  ds T − (Θ  RΘ + S  Θ + Θ  S + Q)X u , X u  =E  + 2[B  P + D  P(C + DΘ)]X u , uW  + D  P DuW, uW  ds. t

Substituting (3.4.13) and the above into (3.4.15) gives J 0 (t, 0; v) = E

T

 2[S(P) + R(P)Θ]X u , uW  + R(P)uW, uW  ds.

t

(3.4.16)

By the assumption (A4),

T

J (t, 0; v)  δE 0



T

|v(s)| ds = δE 2

t

|Θ(s)X u (s) + u(s)W (s)|2 ds. (3.4.17)

t

Combining (3.4.16) and (3.4.17) yields

T

E

  2[S(P) + (R(P) − δ Im )Θ]W X u , u + W 2 [R(P) − δ Im ]u, u ds

t



T

= δE

|Θ(s)X u (s)|2 ds  0.

(3.4.18)

t

For simplicity of notation, let us write Δ = S(P) + [R(P) − δ Im ]Θ,  = R(P) − δ Im . Noting that u is a deterministic function, we can rewrite (3.4.18) as

T



 2Δ(s)E[W (s)X u (s)], u(s) + s(s)u(s), u(s) ds  0.

(3.4.19)

t

By applying Itô’s formula to s → W (s)X u (s) and taking expectations, we see that V (s)  E[W (s)X u (s)] satisfies the following ODE: 

V˙ (s) = [A(s) + B(s)Θ(s)]V (s) + s B(s)u(s), s ∈ [t, T ], V (t) = 0.

84

3 Mean-Field Linear-Quadratic Optimal Controls

Now we take u to be the form u(s) = u 0 1[t  ,t  +h] (s), where u 0 ∈ Rm is a constant vector and t  t  < t  + h  T . Then ⎧ s ∈ [t, t  ], ⎪ ⎨ 0, s∧(t  +h) E[W (s)X u (s)] = ⎪ Φ(r )−1 B(r )r u 0 dr, s ∈ [t  , T ], ⎩ Φ(s) t

where Φ is the fundamental matrix for the system x˙ (s) = [A(s) + B(s)Θ(s)]x(s), s ∈ [0, T ]. Consequently, (3.4.19) becomes

t  +h t

s     2 Δ(s)Φ(s) Φ(r )−1 B(r )r u 0 dr, u 0 + s(s)u 0 , u 0  ds  0. t

Dividing both sides by h and letting h → 0, we obtain t  (t  )u 0 , u 0   0, a.e. t  ∈ [t, T ]. Since u 0 ∈ Rm is arbitrary, the above implies that (s)  0 for almost every s ∈ [t, T ], or equivalently, R(s) + D(s) P(s)D(s)  δ Im , a.e. s ∈ [t, T ]. To show that P(s)  αIn , a.e. s ∈ [t, T ] for some constant α independent of Θ, we denote by X the solution of 

d X (r ) = (A + BΘ)X dr + (C + DΘ)X d W, r ∈ [s, T ], X (s) = W (s)x,

and set w = Θ X ∈ U[s, T ]. Similar to the previous arguments, by applying Itô’s formula to r → P(r )X (r ), X (r ), we can derive that J 0 (s, W (s)x; w) = EP(s)W (s)x, W (s)x = sP(s)x, x.

3.4 Uniform Convexity and Riccati Equations

85

Then with α being the constant in Lemma 3.4.2, sP(s)x, x  V 0 (s, W (s)x)  αE|W (s)x|2 = sα|x|2 , ∀(s, x) ∈ [t, T ] × Rn , which further implies that P(s)  αIn for all s ∈ [t, T ].



For the second step of the proof of Theorem 3.4.1, we introduce a deterministic LQ optimal control problem. With the notation of (3.4.4), let us consider the state equation   y˙ (s) = A(s)y(s) + B(s)u(s), s ∈ [t, T ], (3.4.20) y(t) = x, and the cost functional  y(T ), y(T ) + J(t, x; u) = G

t

T

   0) Q(P, 0) S(P, y y , ds.  0) R(P)  u u S(P,

We pose the following problem. Problem (DLQ). For given initial pair (t, x) ∈ [0, T ) × Rn , find a control u ∗ ∈ L 2 (t, T ; Rm ) such that J(t, x; u ∗ ) =

inf

u∈L 2 (t,T ;Rm )

J(t, x; u).

Note that the Riccati equation associated with this Problem (DLQ) is exactly (3.4.2). Thus, the second step will be accomplished once we prove the following result. Theorem 3.4.4 Let (A1)–(A2) and (A4) hold. Then the map u → J(t, 0; u) is uniformly convex, i.e., there exists a constant λ > 0 such that J(t, 0; u)  λ



T

|u(s)|2 ds, ∀u ∈ L 2 (t, T ; Rm ).

t

Hence, the strongly regular solution P of the Riccati equation (3.4.1) satisfies  + D PD  0, R(P) ≡R and the Riccati equation (3.4.2) admits a unique solution Π ∈ C([t, T ]; Sn ). Proof Let Θ = −R(P)−1 S(P). We claim that J 0 (t, 0; Θ X + v) = J(t, 0; Θ y + v), ∀v ∈ L 2 (t, T ; Rm ).

(3.4.21)

86

3 Mean-Field Linear-Quadratic Optimal Controls

To prove (3.4.21), fix an arbitrary v ∈ L 2 (t, T ; Rm ). Let y be the solution of 

 y˙ (s) = A(s)y(s) + B(s)[Θ(s)y(s) + v(s)], s ∈ [t, T ], y(t) = 0,

and X be the solution of   ⎧ ¯ ¯ + AE[X ] + B(Θ X + v) + BE[Θ X + v] ds ⎪ ⎨ d X (s) = AX   ¯ ¯ + C X + CE[X ] + D(Θ X + v) + DE[Θ X + v] dW, s ∈ [t, T ], ⎪ ⎩ X (t) = 0. Since Θ and v are deterministic, we have     dE[X (s)] = AE[X ]+  B(ΘE[X ] + v) ds, s ∈ [t, T ], E[X (t)] = 0. Thus, the functions s → E[X (s)] and s → y(s) satisfy the same ODE, and hence by the uniqueness of solutions, E[X (s)] = y(s), s ∈ [t, T ]. Let z(s)  X (s) − E[X (s)] = X (s) − y(s). Then ¯  y(T ), y(T ), EG X (T ), X (T )+GE[X (T )], E[X (T )] = EGz(T )z(T )+G ¯ y, y. ], E[X ] = EQz, z+ Q EQ X, X + QE[X Also, let u(s)  Θ(s)X (s) + v(s). Then E[u(s)] = Θ(s)y(s) + v(s). Consequently, ¯ ES X, u +  SE[X ], E[u] = ESz, Θz +  S y, Θ y + v, ¯  y + v), Θ y + v. E[u] = ERΘz, Θz +  R(Θ ERu, u +  RE[u],

3.4 Uniform Convexity and Riccati Equations

87

It follows that J 0 (t, 0; Θ X + v)  = E Gz(T )z(T ) +

T

   Qz, z + 2Sz, Θz + RΘz, Θz ds

t



 y, y + 2 Q S y, Θ y + v t   y + v), Θ y + v ds. +  R(Θ

 y(T ), y(T ) + + G

T

(3.4.22)

Observe that z satisfies the following SDE: 

  y + D(Θ  y + v) dW, dz(s) = (A + BΘ)zds + (C + DΘ)z + C z(t) = 0.

Also, keep in mind that v is deterministic and E[z(s)] ≡ 0, and note that 0 = P˙ + P(A + BΘ) + (A + BΘ) P + (C + DΘ) P(C + DΘ) + Θ  RΘ + S  Θ + Θ  S + Q. Then by applying Itô’s formula to s → P(s)z(s), z(s), we can obtain  E Gz(T )z(T ) +

T

= t

T



  Qz, z + 2Sz, Θz + RΘz, Θz ds

t



 P C y, y + 2 D  P C y, Θ y + v C   y + v), Θ y + v ds.  P D(Θ + D

(3.4.23)

Substituting (3.4.23) into (3.4.22) gives (3.4.21). Consequently, we have by the assumption (A4) that J(t, 0; Θ y + v)  δE

t T

δ t

T

|Θ(s)X (s) + v(s)|2 ds

|E[Θ(s)X (s) + v(s)]|2 ds = δ

T

|Θ(s)y(s) + v(s)|2 ds.

t

Since v ∈ L 2 (t, T ; Rm ) is arbitrary, the uniform convexity of u → J(t, 0; u) follows. The rest assertions are immediate consequences of Theorem 1.1.15 (with the initial time 0 replaced by t). 

88

3 Mean-Field Linear-Quadratic Optimal Controls

3.4.2 Solvability of Riccati Equations: Necessity of the Uniform Convexity We have shown in Theorem 3.4.1 that the uniform convexity condition (A4) implies the solvability of the Riccati equations (3.4.1) and (3.4.2). In this section we establish the converse to Theorem 3.4.1. First we need the following lemma. Lemma 3.4.5 Let (A1)–(A2) hold. For u ∈ U[t, T ], let X 0u denote the solution to   ⎧ u u u ¯ ¯ ⎪ 0 + AE[X 0 ] + Bu + BE[u] ds ⎨ d X 0 (s) = AX   u ¯ ¯ + C X 0u + CE[X 0 ] + Du + DE[u] dW, s ∈ [t, T ], ⎪ ⎩ X 0u (t) = 0.

(3.4.24)

Then for every Θ, Θ¯ ∈ Θ[t, T ], there exists a constant γ > 0 such that for all u ∈ U[t, T ],

T

E t

|u(s) − Θ(s){X 0u (s) − E[X 0u (s)]}|2 ds  γE

T t

u 2 ¯ |E[u(s)] − Θ(s)E[X 0 (s)]| ds  γ

t T



T

|u(s)|2 ds,

(3.4.25)

|E[u(s)]|2 ds.

(3.4.26)

t

Proof For each Θ ∈ Θ[t, T ], we can define a bounded linear operator A : U[t, T ] → U[t, T ] by Au = u − Θ(X 0u − E[X 0u ]). The operator A is bijective and its inverse A−1 is given by  v  X 0 − E[  X 0v ] , A−1 v = v + Θ  where  X 0v = {  X 0v (s); t  s  T } is the solution of   ⎧ ¯ v + BΘ)  X 0v + ( A¯ − BΘ)E[  X 0v ] + Bv + BE[v] ds ⎪ ⎨ d X 0 (s) = (A   v v + (C + DΘ)  X 0 + (C¯ − DΘ)E[  X 0 ] + Dv + DE[v] dW, ⎪ ⎩ v X 0 (t) = 0. By the open mapping theorem, A−1 is also a bounded operator with norm A−1  > 0. Thus, for each u ∈ U[t, T ],

3.4 Uniform Convexity and Riccati Equations



T

E



T

|u(s)| ds = E 2

t

89

−1

−1



T

|(A Au)(s)| ds  A E

t −1

2

|(Au)(s)|2 ds

t



T

= A  E t

|u(s) − Θ(s){X 0u (s) − E[X 0u (s)]}|2 ds.

This shows that (3.4.25) holds with γ = A−1 −1 . To prove (3.4.26), for each deterministic v ∈ L 2 (t, T ; Rm×n ), let y v denote the solution to   y˙ (s) = A(s)y(s) + B(s)v(s), s ∈ [t, T ], (3.4.27) y(t) = 0. For every Θ¯ ∈ Θ[t, T ], we can define a bounded linear operator B : L 2 (t, T ; Rm ) → L 2 (t, T ; Rm ) by Bv = v − Θ¯ y v . Similar to the previous argument, we can show that B is invertible and that

T

v ¯ |v(s) − Θ(s)y (s)|2 ds 

t

1 B −1 



T

|v(s)|2 ds, ∀v ∈ L 2 (t, T ; Rm×n ).

t

Since E[X 0u ] satisfies (3.4.27) with v = E[u], (3.4.26) follows.



Now we state and prove the converse to Theorem 3.4.1. Theorem 3.4.6 Let (A1)–(A2) hold. Suppose that the Riccati equation (3.4.1) admits a solution P ∈ C([t, T ]; Sn ) such that (3.4.5) holds and that the Riccati equation (3.4.2) admits a solution Π ∈ C([t, T ]; Sn ). Then the uniform convexity condition (A4) holds. Proof Recall the notation R(P) = R + D  P D,  + D  P D,  R(P) =R and set

S(P) = B  P + D  PC + S,  Π) =   P C + S(P, BΠ + D S,

 Π ).  −1 S(P,   −R(P) Θ  −R−1 (P)S(P), Θ

Let (η, ζ) and η¯ be the solutions to the BSDE  ⎧    ⎪ ⎨ dη(s) = − (A + BΘ) η + (C + DΘ) ζ + (C + DΘ) Pσ + Θ  ρ + Pb + q ds + ζdW, s ∈ [t, T ], ⎪ ⎩ η(T ) = g,

(3.4.28)

90

3 Mean-Field Linear-Quadratic Optimal Controls

and the ODE   ⎧    ˙   ⎪ ⎨ η¯ + ( A + B Θ) η¯ + Θ D (PE[σ] + E[ζ]) + E[ρ] + ρ¯  (PE[σ] + E[ζ]) + E[q] + q¯ + Π E[b] = 0, s ∈ [t, T ], +C ⎪ ⎩ η(T ¯ ) = E[g] + g, ¯ respectively, and set   ϕ = −R(P)−1 B  η + D  (ζ + Pσ) + ρ ,     −1   (E[ζ] + PE[σ]) + E[ρ] + ρ¯ . ϕ¯ = −R(P) B η¯ + D

(3.4.29)

For any ξ ∈ L 2Ft (Ω; Rn ) and u ∈ U[t, T ], let X = {X (s); t  s  T } be the corresponding solution to the mean-field state equation (3.1.1), and let z(s) = X (s) − E[X (s)], v(s) = u(s) − E[u(s)], y(s) = E[X (s)]; t  s  T. Then z satisfies   ⎧ + Bv + b − E[b] ds ⎪ ⎨ dz(s) = Az   y + DE[u]  + C z + Dv + σ + C dW, s ∈ [t, T ], ⎪ ⎩ z(t) = ξ − E[ξ], and y satisfies  + y˙ (s) = Ay BE[u] + E[b], s ∈ [t, T ];

y(t) = E[ξ].

Rewrite the cost functional as follows:  J (t, ξ; u) = E Gz(T ) + 2g, z(T )



   z z q z + , +2 , ds v v ρ v t    y(T ) + 2(E[g] + g), + G ¯ y(T )





 T   y Q S y , +   E[u] E[u] S R t



  E[q] + q¯ y +2 , ds. (3.4.30) E[ρ] + ρ¯ E[u]

T

Q S S R

3.4 Uniform Convexity and Riccati Equations

91

By applying Itô’s formula to s → P(s)z(s) + 2η(s), z(s) and noting that E[z(s)] = 0, E[v(s)] = 0; t  s  T, we obtain EGz(T ) + 2g, z(T ) − EP(t)(ξ − E[ξ]) + 2η(t), ξ − E[ξ] T     ( P˙ + P A + A P + C  PC)z, z + 2 (P B + C  P D)v, z =E      t + D  P Dv, v + 2 B  η + D  ζ + D  Pσ, v − Θz − 2 Θ  ρ + q, z     y + DE[u]  y + DE[u]),  y + DE[u]  + 2 PE[σ] + E[ζ], C + P(C C  + Pσ, σ + 2η, b − E[b] + 2ζ, σ ds. It follows that  E Gz(T ) + 2g, z(T ) +

   z z q z , +2 , ds v v ρ v t T = EP(t)(ξ − E[ξ]) + 2η(t), ξ − E[ξ] + E Θ  R(P)Θz, z T



Q S S R

t

− 2Θ  R(P)v, z + R(P)v, v − 2R(P)ϕ, v − Θz  P C y, y + 2C  P DE[u],   P DE[u],  + C y +  D E[u]         + 2C PE[σ] + E[ζ] , y + 2 D PE[σ] + E[ζ] , E[u]  + Pσ, σ + 2η, b − E[b] + 2ζ, σ ds = EP(t)(ξ − E[ξ]) + 2η(t), ξ − E[ξ] T R(P)(v − Θz − ϕ), v − Θz − ϕ − R(P)ϕ, ϕ +E t

 P C y, y + 2C  P DE[u],   P DE[u],  + C y +  D E[u]        PE[σ] + E[ζ] , y + 2 D  PE[σ] + E[ζ] , E[u] + 2C  + Pσ, σ + 2η, b − E[b] + 2ζ, σ ds.

(3.4.31)

Similarly, by applying the integration by parts formula to s → Π (s)y(s) + 2η(s), ¯ y(s), we can obtain

92

3 Mean-Field Linear-Quadratic Optimal Controls





 T   y Q S y  G y(T ) + 2(E[g] + g), ¯ y(T ) + ,   E[u] E[u] S R t



  E[q] + q¯ y +2 , ds E[ρ] + ρ¯ E[u] T + A   Π + Q)y,  y (Π˙ + Π A = Π (t)E[ξ] + 2η(t), ¯ E[ξ] + t

  η¯ + E[q] + q¯ + Π E[b], y + 2(Π  B + S  )E[u], y + 2η˙¯ + A   + 2  B  η¯ + E[ρ] + ρ, ¯ E[u] +  RE[u], E[u] + 2η, ¯ E[b] ds. (3.4.32) Substituting (3.4.31) and (3.4.32) into (3.4.30) gives J (t, ξ; u) = EP(t)(ξ − E[ξ]) + 2η(t), ξ − E[ξ] + Π (t)E[ξ] + 2η(t), ¯ E[ξ] T R(P)(v − Θz − ϕ), v − Θz − ϕ − R(P)ϕ, ϕ +E t  + Pσ, σ + 2η, b − E[b] + 2ζ, σ + 2η, ¯ E[b] ds T  Θ   R(P)  y, y − 2Θ  R(P)E[u], Θ y + t    ϕ,  y ds + R(P)E[u], E[u] − 2R(P) ¯ E[u] − Θ = EP(t)(ξ − E[ξ]) + 2η(t), ξ − E[ξ] + Π (t)E[ξ] + 2η(t), ¯ E[ξ] T Pσ, σ + 2η, b − E[b] + 2ζ, σ + 2η, ¯ E[b] +E t

 ϕ, − R(P) ¯ ϕ ¯ − R(P)(ϕ − E[ϕ]), ϕ − E[ϕ] + R(P)(v − Θz − ϕ + E[ϕ]), v − Θz − ϕ + E[ϕ]    y − ϕ),  y − ϕ + R(P)(E[u] −Θ ¯ E[u] − Θ ¯ ds.

(3.4.33)

 Since R(P), R(P) 0, (3.4.33) implies that J (t, ξ; u)  EP(t)(ξ − E[ξ]) + 2η(t), ξ − E[ξ] + Π (t)E[ξ] + 2η(t), ¯ E[ξ] T Pσ, σ + 2η, b − E[b] + 2ζ, σ + 2η, ¯ E[b] +E t   ϕ, − R(P)(ϕ − E[ϕ]), ϕ − E[ϕ] − R(P) ¯ ϕ ¯ ds, (3.4.34) with the equality holding if and only if  y + ϕ, v = Θz + ϕ − E[ϕ], E[u] = Θ ¯ or equivalently, if and only if

3.4 Uniform Convexity and Riccati Equations

 u = Θ(X − E[X ]) + ΘE[X ] + ϕ − E[ϕ] + ϕ. ¯

93

(3.4.35)

In particular, when b, σ, g, g, ¯ q, q, ¯ ρ, ρ¯ vanish, we have (η(s), ζ(s)) = (0, 0), η(s) ¯ = 0, ϕ(s) = ϕ(s) ¯ = 0; ∀t  s  T. Then we may take ξ = 0 to get from (3.4.33) that

T

R(P){u −E[u]−Θ(X −E[X ])}, u −E[u]−Θ(X −E[X ])     + R(P)(E[u] − ΘE[X ]), E[u] − ΘE[X ] ds. (3.4.36)

J (t, 0; u) = E 0

t

 Noting that R(P), R(P)  δ I for some δ > 0 and using Lemma 3.4.5, we obtain

T

J (t, 0; u)  δ E 0



   |u − E[u] − Θ(X − E[X ])|2 + |E[u] − ΘE[X ]|2 ds

t

 |u − Θ(X − E[X ])|2 − 2u − Θ(X − E[X ]), E[u] t  + (1 + γ)|E[u]|2 ds T δγ  E |u − Θ(X − E[X ])|2 ds 1+γ t T δγ 2  |u(s)|2 ds, ∀u ∈ U[t, T ], E 1+γ t  δE

T

for some γ > 0. The desired result follows.



To conclude this section, we present the following result, which is a direct consequence of (3.4.34)–(3.4.35). Corollary 3.4.7 Under the assumptions of Theorem 3.4.6, the unique optimal control of Problem (MFLQ) for the initial pair (t, ξ) is given by ∗  ] + ϕ − E[ϕ] + ϕ, ¯ u ∗ = Θ(X ∗ − E[X ∗ ]) + ΘE[X

 are defined as in (3.4.28), ϕ, ϕ¯ are defined as in (3.4.29), and X ∗ is the where Θ, Θ solution to the closed-loop system  ⎧ ∗ +   d X ∗ (s) = (A + BΘ)(X ∗ − E[X ∗ ]) + ( A B Θ)E[X ] ⎪ ⎪  ⎪ ⎪ ⎪ + B(ϕ − E[ϕ]) +  B ϕ¯ + b ds ⎪ ⎨  ∗ ∗ + D Θ)E[X  ] + (C + DΘ)(X − E[X ∗ ]) + (C ⎪  ⎪ ⎪ ϕ¯ + σ dW, s ∈ [t, T ], ⎪ + D(ϕ − E[ϕ]) + D ⎪ ⎪ ⎩ ∗ X (t) = ξ.

94

3 Mean-Field Linear-Quadratic Optimal Controls

Moreover, the value V (t, ξ) is given by V (t, ξ) = EP(t)(ξ − E[ξ]) + 2η(t), ξ − E[ξ] + Π (t)E[ξ] + 2η(t), ¯ E[ξ] T Pσ, σ + 2η, b − E[b] + 2ζ, σ + 2η, ¯ E[b] +E t   ϕ, − R(P)(ϕ − E[ϕ]), ϕ − E[ϕ] − R(P) ¯ ϕ ¯ ds. In particular, the value function of Problem (MFLQ)0 is given by V 0 (t, ξ) = E P(t)(ξ − E[ξ]), ξ − E[ξ] + Π (t)E[ξ], E[ξ].

3.4.3 Sufficient Conditions for the Uniform Convexity In this section we present a couple of sufficient conditions for the uniform convexity of the cost functional. Proposition 3.4.8 Let (A1)–(A2) hold. If there exists a constant δ > 0 such that for almost every s ∈ [t, T ],   δ Im , R(s), R(s)   0, G, G

Q(s) − S(s) R(s)−1 S(s)  0,  −  −1 Q(s) S(s) R(s) S(s)  0,

then the uniform convexity condition (A4) holds. Proof Denote by X 0u = {X 0u (s); t  s  T } the solution of (3.4.24) corresponding to the control u ∈ U[t, T ] and set y(s) = X 0u (s) − E[X 0u (s)], v(s) = u(s) − E[u(s)]. Then

   T y y Q S , ds J (t, 0; u) = E Gy(T ), y(T ) + S R v v t   u u  + GE[X 0 (T )], E[X 0 (T )]





 T   E[X 0u ] Q S E[X 0u ] , ds +   E[u] E[u] S R t

 T Q S y y  , ds S R v v t





 T   E[X 0u ] Q S E[X 0u ] , ds. (3.4.37) +   E[u] E[u] S R t 0

3.4 Uniform Convexity and Riccati Equations

95

The first term on the right-hand side of (3.4.37) is equal to

T

E

  Qy, y + 2Sy, v + Rv, v ds

t



T

=E



   (Q − S  R −1 S)y, y + R(v + R −1 Sy), v + R −1 Sy ds

t



T

 δE

  v + R −1 Sy 2 ds.

t

Similarly, the second term on the right-hand side of (3.4.37) is equal to

T



t

     u u u    QE[X ], E[X ] + 2 SE[X ], E[u] + RE[u], E[u] ds 0 0 0

T

δ t

2  E[u] + R −1 SE[X 0u ] ds.

−1 Consequently, by using Lemma 3.4.5 with Θ = −R −1 S and Θ¯ = − R S, we obtain

T

J 0 (t, 0; u)  δ E t

    u − E[u] + R −1 S X u − E[X u ] 2 + γ|E[u]|2 ds 0 0

T δγ  E 1+γ t T δγ 2 E  1+γ t

   u + R −1 S X u − E[X u ] 2 ds 0 0 |u(s)|2 ds, ∀u ∈ U[t, T ]. 

This completes the proof.

Proposition 3.4.9 Let (A1)–(A2) hold. Let P and Π be the solutions to the following 0 ∈ L 1 (t, T ; S¯ n+ ), respectively: Lyapunov equations for some Q 0 , Q  

P˙ + P A + A P + C  PC + Q − Q 0 = 0, s ∈ [t, T ], P(T ) = G, + A Π + C  P C + Q − Q 0 = 0, s ∈ [t, T ], Π˙ + Π A  Π (T ) = G.

If for some δ > 0,

S(P) Q0 S(P) R(P) − δ Im

 0,

 Π ) 0 S(P, Q  Π ) R(P)  S(P,

a.e. on [t, T ], then the uniform convexity condition (A4) holds.

 0,

(3.4.38)

96

3 Mean-Field Linear-Quadratic Optimal Controls

Proof Let u ∈ U[t, T ] and X be the solution to the homogeneous state equation with initial state ξ = 0:   ⎧ ¯ ¯ + AE[X ] + Bu + BE[u] ds ⎪ ⎨ d X (s) = AX   ¯ ¯ + C X + CE[X ] + Du + DE[u] dW, s ∈ [t, T ], ⎪ ⎩ X (t) = 0. By letting y(s) = E[X (s)], z(s) = X (s) − E[X (s)], v(s) = u(s) − E[u(s)], we can rewrite J 0 (t, 0; u) as follows:  J 0 (t, 0; u) = E Gz(T ), z(T ) +  y(T ), y(T ) + + G



T



T



t

t

  z z , ds v v





  y y S , ds.  E[u] E[u] R

Q S S R

 Q  S

Observe that y and z satisfy, respectively, the following ODE and SDE:  

 + y˙ (s) = Ay BE[u], s ∈ [t, T ], y(t) = 0, y + DE[u])dW,  dz(s) = (Az + Bv)ds + (C z + Dv + C s ∈ [t, T ], z(t) = 0,

and that E[z] = 0, E[v] = 0. Then using integration by parts, we have



 ( P˙ + P A + A P + C  PC)z, z t     + 2 (P B + C  P D)v, z + D  P Dv, v   y + DE[u]),  y + DE[u]  + P(C C ds T     (Q 0 − Q)z, z + 2 (P B + C  P D)v, z =E T

EGz(T ), z(T ) = E

   y + DE[u]),  y + DE[u]  C ds. + D  P Dv, v + P(C 

t

Similarly,  y(T ), y(T ) = G

t

T



   0 − Q −C  P C)y,  y +2 Π (Q BE[u], y ds.

3.4 Uniform Convexity and Riccati Equations

97

Consequently,

T

J (t, 0; u) = E 0

t

+

t

T

 z z Q 0 S(P) , ds S(P) R(P) v v





  Π ) 0 S(P, y y Q , ds.  Π ) R(P)  E[u] E[u] S(P,



The desired result follows from (3.4.38).



3.5 Multi-dimensional Brownian Motion Case and Applications In the previous sections, the Brownian motion has been assumed to be onedimensional for the purpose of simplicity. In the case of a d-dimensional Brownian motion W = {W (t) = (W1 (t), . . . , Wd (t)) ; 0  t < ∞}, the MFLQ problem is to find a control u ∗ ∈ U[t, T ] such that the quadratic cost functional (3.1.2) is minimized subject to the following state equation over [t, T ]:   ⎧ ¯ ¯ d X (s) = AX + AE[X ] + Bu + BE[u] + b ds ⎪ ⎪ ⎪ ⎪ d ⎨    + Ci X + C¯ i E[X ] + Di u + D¯ i E[u] + σi dWi , ⎪ ⎪ i=1 ⎪ ⎪ ⎩ X (t) = ξ,

(3.5.1)

where the weighting matrices in the cost functional satisfy (A2), and the coefficients of the state equation satisfy the following assumption that is similar to (A1): (A1)∗ The coefficients of (3.5.1) satisfy ⎧ 1 n×n ¯ B, B¯ ∈ L 2 (0, T ; Rn×m ), ⎪ ⎨ A, A ∈ L (0, T ; R ), Ci , C¯i ∈ L 2 (0, T ; Rn×n ), Di , D¯ i ∈ L ∞ (0, T ; Rn×m ), ⎪ ⎩ b ∈ L 2F (Ω; L 1 (0, T ; Rn )), σi ∈ L 2F (0, T ; Rn ); i = 1, . . . , d. In this case, the associated Riccati equations (3.4.1) and (3.4.2) become

98

3 Mean-Field Linear-Quadratic Optimal Controls

⎧ d d ! ⎪   ⎪   ⎪ ˙ ⎪ P + P A + A P + C PC + Q − P B + Ci P Di + S  i ⎪ i ⎪ ⎪ ⎪ i=1 i=1 ⎨ d d !−1 !   ⎪ ⎪ × R+ B P + Di P Di Di PCi + S = 0, ⎪ ⎪ ⎪ ⎪ i=1 i=1 ⎪ ⎪ ⎩ P(T ) = G,

(3.5.2)

and ⎧ d d ! ⎪   ⎪    ⎪     i +   i P D ˙ ⎪ Π + Π A + A Π + Q + P C − Π B + S C C i ⎪ i ⎪ ⎪ ⎪ i=1 i=1 ⎨ d !−1 ! (3.5.3)  "d ⎪  + i=1  i i +  i P D i P C ⎪ × R BΠ + S = 0, D D ⎪ ⎪ ⎪ ⎪ i=1 ⎪ ⎪ ⎩  Π (T ) = G,      G  are as in (3.4.4) and respectively, where A, B, Q, S, R, i = Ci + C¯ i , C

i = Di + D¯ i ; D

1  i  d.

Similar to the case of one-dimensional Brownian motion, we have the following result. Theorem 3.5.1 Let (A1)∗ and (A2) hold. The uniform convexity condition (A4) holds if and only if (i) the Riccati equation (3.5.2) admits a solution P ∈ C([t, T ]; Sn ) such that R(P)  R +

d  i=1

Di P Di

 + 0, R(P) R

d 

i 0, i P D D

i=1

(ii) the Riccati equation (3.5.3) admits a solution Π ∈ C([t, T ]; Sn ). In this case, the unique optimal control of Problem (MFLQ) for the initial pair (t, ξ) is given by ∗  u ∗ = Θ(X ∗ − E[X ∗ ]) + ΘE[X ] + ϕ − E[ϕ] + ϕ, ¯

3.5 Multi-dimensional Brownian Motion Case and Applications

99

 ϕ, ϕ¯ are defined by where Θ, Θ, −1

d 



Θ  −R(P)

B P+

! Di PCi + S ,

i=1

 −1    −R(P) BΠ + Θ

d 

! i +  i P C S , D

i=1

 ϕ  −R(P)−1 B  η +

d 

 Di (ζi + Pσi ) + ρ ,

i=1



 −1  B  η¯ + ϕ¯  −R(P)

d 

 i (E[ζi ] + PE[σi ]) + E[ρ] + ρ¯ , D

i=1

with (η, ζ) and η¯ being the solutions to the BSDE ⎧ d  ⎪  ⎪  ⎪ ⎪ dη(s) = − (A + BΘ) η + (Ci + Di Θ) (ζi + Pσi ) ⎪ ⎪ ⎪ ⎪ i=1 ⎨ d    ⎪ ⎪ + Θ ρ + Pb + q ds + ζi dWi , s ∈ [t, T ], ⎪ ⎪ ⎪ ⎪ i=1 ⎪ ⎪ ⎩ η(T ) = g,

(3.5.4)

and the ODE ⎧ d   ⎪ ⎪ ⎪ +  i (PE[σi ] + E[ζi ]) + E[ρ] + ρ¯   η¯ + Θ  ⎪ η˙¯ + ( A B Θ) D ⎪ ⎪ ⎪ ⎪ i=1 ⎨ d (3.5.5)  ⎪ i (PE[σi ] + E[ζi ]) + E[q] + q¯ + Π E[b] = 0, s ∈ [t, T ], ⎪ + C ⎪ ⎪ ⎪ ⎪ i=1 ⎪ ⎪ ⎩ η(T ¯ ) = E[g] + g, ¯ respectively, and X ∗ is the solution to the closed-loop system

100

3 Mean-Field Linear-Quadratic Optimal Controls

⎧  ∗ ⎪ +   d X ∗ (s) = (A + BΘ)(X ∗ − E[X ∗ ]) + ( A B Θ)E[X ] ⎪ ⎪ ⎪ ⎪  ⎪ ⎪ ⎪ ⎪ + B(ϕ − E[ϕ]) +  B ϕ¯ + b ds ⎪ ⎪ ⎪ ⎪ ⎨ d   ∗ i + D i Θ)E[X  (Ci + Di Θ)(X ∗ − E[X ∗ ]) + (C + ] ⎪ ⎪ ⎪ i=1 ⎪ ⎪  ⎪ ⎪ ⎪ i ϕ¯ + σi dWi , s ∈ [t, T ], ⎪ + D (ϕ − E[ϕ]) + D i ⎪ ⎪ ⎪ ⎪ ⎩ X ∗ (t) = ξ.

Moreover, the value V (t, ξ) is given by V (t, ξ) = EP(t)(ξ − E[ξ]) + 2η(t), ξ − E[ξ] + Π (t)E[ξ] + 2η(t), ¯ E[ξ] T  d d  +E Pσi , σi  + 2η, b − E[b] + 2ζi , σi  t

i=1

i=1

   ϕ, + 2η, ¯ E[b] − R(P)(ϕ − E[ϕ]), ϕ − E[ϕ] − R(P) ¯ ϕ¯ ds.

In particular, the value function of Problem (MFLQ)0 is given by V 0 (t, ξ) = E P(t)(ξ − E[ξ]), ξ − E[ξ] + Π (t)E[ξ], E[ξ].

As an application of the mean-field control theory, let us consider a market in which d + 1 assets (or “securities”) are traded continuously. One of the assets, called the bond, has a price P0 (t) which evolves according to the ODE 

d P0 (t) = r (t)P0 (t)dt, t ∈ [0, T ], P0 (0) = p0 .

(3.5.6)

3.5 Multi-dimensional Brownian Motion Case and Applications

101

The remaining d assets, called stocks, are “risky”; their prices are modeled by the linear SDE for i = 1, . . . , d: ⎧ d  ⎪ ⎪ ⎨ d Pi (t) = Pi (t)bi (t)dt + Pi (t) σi j (t)dW j (t), t ∈ [0, T ], (3.5.7) j=1 ⎪ ⎪ ⎩ Pi (0) = pi . The process W = {W (t) = (W1 (t), . . . , Wd (t)) ; 0  t  T } is a standard d-dimensional Brownian motion on a probability space (Ω, F, P), and the filtration F = {Ft }t0 is the augmentation under P of the natural filtration generated by W . The interest rate r (t), as well as the vector of mean rates of return b(t) = (b1 (t), . . . , bd (t)) and the dispersion matrix σ(t) = (σi j (t))1i, jd , is assumed to be deterministic (nonrandom) and bounded uniformly in t ∈ [0, T ]. We assume that for some number δ > 0, σ(t)σ(t)  δ Id , ∀t ∈ [0, T ].

(3.5.8)

We imagine now an investor who starts with some initial wealth x0 > 0 and invests it in the d + 1 assets described previously. Let Ni (t) denote the number of shares of asset i owned by the investor at time t. Then the investor’s wealth at time t is x(t) =

d 

Ni (t)Pi (t).

(3.5.9)

i=0

Assume that the trading of shares is self-financed and takes place continuously, and that transaction cost and consumptions are not considered. Then one has ⎧ d ⎪  ⎪ ⎨ d x(t) = Ni (t)d Pi (t), t ∈ [0, T ], i=0 ⎪ ⎪ ⎩ x(0) = x0 . Taking (3.5.6), (3.5.7), (3.5.9) into account and denoting by u i (t)  Ni (t)Pi (t) the amount invested in the i-th stock, 1 < i < d, we may write (3.5.10) as

(3.5.10)

102

3 Mean-Field Linear-Quadratic Optimal Controls

⎧ d    ⎪ ⎪ ⎪ ⎪ d x(t) = r (t)x(t) + [bi (t) − r (t)]u i (t) dt ⎪ ⎪ ⎪ ⎪ i=1 ⎨ d d  ⎪ + σi j (t)u i (t)dW j (t), t ∈ [0, T ], ⎪ ⎪ ⎪ ⎪ i=1 j=1 ⎪ ⎪ ⎪ ⎩ x(0) = x0 .

(3.5.11)

Definition 3.5.2 A portfolio process u = {u(t) = (u 1 (t), . . . , u d (t)) , Ft ; 0  t  T } is a progressively measurable process for which

T

E

|u(t)|2 dt < ∞.

0

In other words, a portfolio process is an element in L 2F (0, T ; Rd ). The objective of the investor is to maximize the mean terminal wealth Ex(T ), or equivalently, to minimize h 1 (u)  −Ex(T ), and at the same time to minimize the variance of the terminal wealth h 2 (u)  var x(T ) = Ex(T )2 − [ Ex(T )]2 . Definition 3.5.3 A portfolio process u ∗ is called an efficient portfolio if there exists no portfolio process u such that h 1 (u)  h 1 (u ∗ ), h 2 (u)  h 2 (u ∗ ), and at least one of the inequalities is strict. In this case, (h 1 (u ∗ ), h 2 (u ∗ )) ∈ R2 is called an efficient point. In other words, an efficient portfolio is one where there exists no other portfolio better than it with respect to both the mean and variance criteria. The problem then is to identify the efficient portfolios and is referred to as a mean-variance portfolio selection problem (MV problem, for short). An efficient portfolio can be found by solving a single-objective optimization problem where the objective is a weighted average of the two original criteria, as shown by the following result. Proposition 3.5.4 Let λ > 0 be a given real number. If u ∗ is a minimum of λh 1 (u) + h 2 (u), then it is an efficient portfolio of the MV problem. Proof We prove the conclusion by contradiction. Suppose that there exists a portfolio v such that, say,

3.5 Multi-dimensional Brownian Motion Case and Applications

103

h 1 (v) < h 1 (u ∗ ), h 2 (v)  h 2 (u ∗ ). Then λh 1 (v) + h 2 (v) < λh 1 (u ∗ ) + h 2 (u ∗ ), which contradicts to the fact that u ∗ is a minimum of λh 1 (u) + h 2 (u).  Let us now consider the following MFLQ problem for λ > 0. Problem (MFLQ)λ . For given initial state x0 ∈ R, find a control u ∗ ∈ L 2F (0, T ; Rd ) such that inf Jλ (x0 ; u), (3.5.12) Jλ (x0 ; u ∗ ) = u∈L 2F (0,T ;Rd )

subject to (3.5.11), where Jλ (x0 ; u)  2λh 1 (u) + h 2 (u) = Ex(T )2 − [ Ex(T )]2 − 2λEx(T ). According to Proposition 3.5.4, we may find efficient portfolios of the MV problem by solving the above Problem (MFLQ)λ , in which the initial time has been taken to be t = 0. We now apply Theorem 3.5.1 to get the optimal control for Problem (MFLQ)λ . The Riccati equations associated to Problem (MFLQ)λ are ⎧ d !−1 ⎪  ⎪ ˙ ⎨ P + P A + A P − P B Di P Di B  P = 0, i=1 ⎪ ⎪ ⎩ P(T ) = G, ⎧ d !−1  ⎪ ⎪ ⎨ Π˙ + Π A + A Π − Π  i  P D  B D B  Π = 0, i

⎪ ⎪ ⎩

i=1

(3.5.13)

(3.5.14)

 Π (T ) = G,

in which  = r (t), A(t) = A(t)  = 0, G = 1, G

B(t) =  B(t) = (b1 (t) − r (t), . . . , bd (t) − r (t)), i (t) = (σ1i (t), . . . , σdi (t)). Di (t) = D

Note that P(t) ∈ R and d 

Di Di = σ(t)σ(t)  δ Id ,

i=1

With the notation

−1  B(t) , μ(t) = B(t) σ(t)σ(t)

we can rewrite (3.5.13) and (3.5.14) as

104

3 Mean-Field Linear-Quadratic Optimal Controls



˙ P(t) = [μ(t) − 2r (t)]P(t), t ∈ [0, T ], P(T ) = 1,

⎧ μ(t) ⎨ Π(t) ˙ = Π (t)2 − 2r (t)Π (t), t ∈ [0, T ], P(t) ⎩ Π (T ) = 0.

(3.5.15)

(3.5.16)

Clearly, the positive function P(t) = e

#T t

[2r (s)−μ(s)]ds

, 0  t  T,

(3.5.17)

is the solution of (3.5.15) such that d 

Di P Di

i=1

=

d 

i 0, i P D D

i=1

and Π (t) ≡ 0 is the solution of (3.5.16). Observe that for Problem (MFLQ)λ ,  −1  = 0; 0  t  T, Θ(t) = − σ(t)σ(t) B(t) , Θ(t) and that all the coefficients in (3.5.4) are deterministic. Thus, the BSDE (3.5.4) reduces to the following ODE: 

dη(t) = −[r (t) − μ(t)]η(t)dt, t ∈ [0, T ], η(T ) = 0,

and the ODE (3.5.5) reduces to 

η¯˙ (t) + r (t)η(t) ¯ = 0, t ∈ [0, T ], η(T ¯ ) = −λ.

Clearly, the solutions of the above two ODEs are given by η(t) = 0, η(t) ¯ = −λe

#T t

r (s)ds

; 0  t  T.

Therefore, ϕ(t) = 0, ϕ(t) ¯ = −λΘ(t)e

#T t

[μ(s)−r (s)]ds

; 0  t  T,

3.5 Multi-dimensional Brownian Motion Case and Applications

105

and the optimal control of Problem (MFLQ)λ is given by  #T  u ∗λ (t) = −Θ(t) λe t [μ(s)−r (s)]ds − [x ∗ (t) − Ex ∗ (t)] .

(3.5.18)

Note that Ex ∗ (t) satisfies the following ODE:   ⎧ dy(t) = r (t)y(t) + B(t)Eu ∗λ (t) dt ⎪ ⎨ #T   = r (t)y(t) + λμ(t)e t [μ(s)−r (s)]ds dt, t ∈ [0, T ], ⎪ ⎩ y(0) = x0 , from which we get Ex ∗ (t) = x0 e = x0 e

#t 0

#t 0

#T

t

+ λe−

r (s)ds

 #T + λe− t r (s)ds e

t

r (s)ds



r (s)ds

μ(s)e

0 #T 0

μ(s)ds

#T s

μ(v)dv

−e

#T t

ds

μ(s)ds

 .

So (3.5.18) can be further written as  #t  #T #T u ∗λ (t) = −Θ(t) x0 e 0 r (s)ds + λe 0 μ(s)ds− t r (s)ds − x ∗ (t) .

(3.5.19)

Moreover, from the last assertion of Theorem 3.5.1 we have

$  % σ(t)σ(t) P(t)ϕ(t), ¯ ϕ(t) ¯ dt 0 T #T #T = −2λx0 e 0 r (s)ds − λ2 P(t)μ(t)e t 2[μ(s)−r (s)]ds dt.

Jλ (x0 ; u ∗λ ) = 2η(0), ¯ x0  −

T

0

Substituting (3.5.17) into the above gives Jλ (x0 ; u ∗λ ) = −2λx0 e = −2λx0 e

#T 0

#T 0

r (s)ds r (s)ds

− λ2 

T

0 #T

− λ2 e

0

μ(t)e μ(s)ds

#T t

μ(s)ds

dt

 −1 .

Therefore, the efficient point corresponding to the efficient portfolio (3.5.19) is given by  #T  + λ e 0 μ(s)ds − 1 ,  #T  var x ∗ (T ) = Jλ (x0 ; u ∗λ ) + 2λEx ∗ (T ) = λ2 e 0 μ(s)ds − 1 . Ex ∗ (T ) = x0 e

#T 0

r (s)ds

106

3 Mean-Field Linear-Quadratic Optimal Controls

3.6 Problems in Infinite Horizons In this section, we consider mean-field linear-quadratic optimal control problems over the infinite time horizon [0, ∞). For simplicity of presentation, we consider only the homogeneous problem. Thus, the state equation of interest here takes the following form:   ⎧ ¯ ¯ (s) + AE[X (s)] + Bu(s) + BE[u(s)] ds ⎪ ⎨ d X (s) = AX   ¯ ¯ + C X (s) + CE[X (s)] + Du(s) + DE[u(s)] dW, ⎪ ⎩ X (0) = x, and the cost functional is of the form ∞ ¯ Q X (s), X (s) +  QE[X (s)], E[X (s)] J (x; u) = E 0  ¯ + Ru(s), u(s) +  RE[u(s)], E[u(s)] ds.

(3.6.1)

(3.6.2)

In the above, the coefficients of (3.6.1) and the weighting matrices in (3.6.2) satisfy the following two assumptions, respectively. ¯ C, C¯ ∈ Rn×n , B, B, ¯ D, D¯ ∈ Rn×m are constant-valued. (A1) A, A, (A2) Q, Q¯ ∈ Sn , R, R¯ ∈ Sm are constant-valued and Q, Q + Q¯  0,

R, R + R¯ > 0.

For convenience, let us recall the following notation:    #∞ L 2F (Rn ) = ϕ : [0, ∞) × Ω → Rn  ϕ ∈ F and E 0 |ϕ(t)|2 dt < ∞ ,   Xloc [0, ∞) = ϕ : [0, ∞) × Ω → Rn  ϕ is F-adapted, continuous,    and E sup0tT |ϕ(t)|2 < ∞ for every T > 0 ,    #∞ X [0, ∞) = ϕ ∈ Xloc [0, ∞)  E 0 |ϕ(t)|2 dt < ∞ . By a standard argument using contraction mapping theorem, one can show that for any (x, u) ∈ Rn × L 2F (Rn ), the state Equation (3.6.1) admits a unique solution X (·) = X (· ; x, u) ∈ Xloc [0, ∞). It should be noted that in general, the solution X of (3.6.1) might not be in X [0, ∞) and thereby the cost functional J (x; u) might not be defined. So we introduce the following set:   Uad (x) = u ∈ L 2F (Rm ) | J (x; u) is defined .

3.6 Problems in Infinite Horizons

107

An element u ∈ Uad (x) is called an admissible control for the initial state x, and the corresponding state process X (·) ≡ X (· ; x, u) is called an admissible state process. Our optimal control problem can then be stated as follows. Problem (MFLQ)∞ . For given x ∈ Rn , find an admissible control u ∗ ∈ Uad (x) such that J (x; u ∗ ) = inf J (x; u) ≡ V (x). u∈Uad (x)

The structure of Uad (x) is kind of complicated, since it involves not only the state equation, but also the cost functional. Also, observe that the cost functional J (x; u) can be written as ∞ 1 ¯ 21 E[X (s)]|2 |Q 2 {X (s) − E[X (s)]}|2 + |(Q + Q) J (x; u) = E 0 ! ¯ + Ru(s), u(s) +  RE[u(s)], E[u(s)] ds. So u ∈ Uad (x) if and only if



E

! 1 ¯ 21 E[X (s)]|2 ds < ∞. |Q 2 {X (s) − E[X (s)]}|2 + |(Q + Q)

(3.6.3)

0

¯ might be degenerate, we might not have X ∈ X [0, ∞) Since Q and/or (Q + Q) even though u ∈ Uad (x). Later, we will give some better description of Uad (x) under certain conditions.

3.6.1 Stability and Stabilizability Let us first look at an example, in which the admissible control set Uad (x) is empty for nonzero initial state x. Example 3.6.1 Consider the one-dimensional controlled system   d X (s) = X (s)ds + E[X (s)] + u(s) dW, s  0, and the cost functional



J (x; u) = E



 |X (s)|2 + |u(s)|2 + |E[u(s)]|2 ds.

0

Clearly, dE[X (s)] = E[X (s)]ds, s  0,

108

3 Mean-Field Linear-Quadratic Optimal Controls

so with initial state x, E[X (s)] = xes . Consequently, d X (s) = X (s)ds + [xes + u(s)]dW, s  0, from which we get 



t

X (t) = e x + t



  x + e u(s) dW , t  0. −s

0

As long as x = 0,



J (x; u)  E 0





|X (s)|2 ds =

  t  2 x + e−s u(s) ds dt = ∞. e2t x 2 +

0

0

In this case, Uad (x) = ∅. Therefore, the corresponding Problem (MFLQ)∞ is not meaningful. From the above example, we see that before investigating Problem (MFLQ)∞ , one should find conditions for the system and the cost functional so that the set Uad (x) is at least non-empty and hopefully it admits an accessible characterization. To this end, we first consider the following uncontrolled linear mean-field SDE (which amounts to taking u = 0 or letting B = B¯ = D = D¯ = 0): 

    ¯ ¯ d X (t) = AX (t) + AE[X (t)] dt + C X (t) + CE[X (t)] dW, X (0) = x.

(3.6.4)

¯ C, C], ¯ and for simplicity, we write We will briefly denote the above system by [A, A, [A, C] = [A, 0, C, 0] when the mean-field terms are absent. ¯ C, C] ¯ is said to be Definition 3.6.2 Let (A1) –(A2) hold. The system [A, A, (i) L 2 -exponentially stable if for every initial state x ∈ Rn , the solution X of (3.6.4) satisfies lim eλt E|X (t)|2 = 0 t→∞

for some λ > 0; (ii) L 2 -globally integrable if for every x ∈ Rn , the solution X of (3.6.4) is in X [0, ∞), i.e., ∞ E|X (t)|2 dt < ∞; 0

(iii) L 2 -asymptotically stable if for every x ∈ Rn , the solution X of (3.6.4) satisfies lim E|X (t)|2 = 0;

t→∞

(3.6.5)

3.6 Problems in Infinite Horizons

109

(iv) L 2Q, Q¯ -globally integrable if for every x ∈ Rn , the solution X of (3.6.4) satisfies (3.6.3). ¯ C, C] ¯ is L 2Q -globally integrable if it We shall simply say that the system [A, A, 2 is L Q,0 -globally integrable. Remark 3.6.3 We shall say that a matrix M ∈ Rn×n is exponentially stable if the system [M, 0, 0, 0] is L 2 -exponentially stable. Proposition 3.6.4 Let (A1) –(A2) hold. Among the statements (i) (ii) (iii) (iv)

¯ C, C] ¯ is the system [A, A, ¯ C, C] ¯ is the system [A, A, ¯ C, C] ¯ is the system [A, A, ¯ C, C] ¯ is the system [A, A,

L 2 -exponentially stable; L 2 -globally integrable; L 2 -asymptotically stable; L 2Q, Q¯ -globally integrable,

we have the following implications: (i) ⇒ (ii) ⇒ (iii);

(ii) ⇒ (iv).

Proof The implications (i) ⇒ (ii) and (ii) ⇒ (iv) are trivial. For the implication (ii) ⇒ (iii), we note that E|X (t)|2 − E|X (s)|2 t  ¯ ¯ 2X (s), AX (s) + AE[X (s)] + |C X (s) + CE[X (s)]|2 ds =E s t L E|X (s)|2 ds. (3.6.6) s

Here and hereafter, L > 0 stands for a generic constant which could be different from line to line. Since ∞ t 2 E|X (s)| ds  E |X (s)|2 ds < ∞, s

0

(3.6.6) implies that E|X (t)|2 is bounded uniformly in t ∈ [0, ∞) and thereby Lipschitz-continuous. Combining the Lipschitz-continuity and the integrability of  E|X (t)|2 over [0, ∞), we obtain (3.6.5). Theorem 3.6.5 Let (A1) hold. ¯ C, C] ¯ is L 2 -asymptotically stable, then A + A¯ is exponen(i) If the system [A, A, tially stable. ¯ C, C] ¯ is (ii) If A + A¯ is exponentially stable, then the system [A, A, L 2 -exponentially stable if either [A, C] is L 2 -globally integrable or C + C¯ = 0.

(3.6.7)

110

3 Mean-Field Linear-Quadratic Optimal Controls

Proof (i) Suppose that (3.6.5) holds. Taking expectations in (3.6.4) yields 

¯ dE[X (t)] = (A + A)E[X (t)]dt, t  0, E[X (0)] = x,

from which we obtain

¯

E[X (t)] = e(A+ A)t x, t  0.

Since A, A¯ are constant matrices and |E[X (t)]|2  E|X (t)|2 → 0

as t → ∞,

the exponential stability of A + A¯ follows. (ii) Let P ∈ Sn be a positive definite matrix. By integration by parts, EP{X (t) − E[X (t)]}, X (t) − E[X (t)] t (P A + A P + C  PC){X (s) − E[X (s)]}, X (s) − E[X (s)] =E 0  ¯  P(C + C)E[X ¯ + (C + C) (s)], E[X (s)] ds. (3.6.8) If (3.6.7) holds, (3.6.8) implies that for some constant L > 0,

t

var [X (t)]  L

var [X (s)]ds, ∀t  0.

0

Then, by Gronwall’s inequality, var [X (t)] = 0 and hence ¯

X (t) = E[X (t)] = e(A+ A)t x, t  0. ¯ C, C] ¯ is L 2 -exponentially stable since A + A¯ is exponentially So the system [A, A, stable. If [A, C] is L 2 -globally integrable, then by Theorem 1.2.3 of Chap. 1 (with B = D = 0), the equation (3.6.9) P A + A P + C  PC + I = 0 admits a positive definite solution P ∈ Sn . With ρ > 0 being the smallest eigenvalue ¯  P(C + C), ¯ (3.6.8) implies of P and μ  0 being the largest eigenvalue of (C + C)

$$\rho\,\mathrm{var}[X(t)] \leqslant -\int_0^t \mathrm{var}[X(s)]\, ds + \mu\int_0^t \big|\mathbb{E}[X(s)]\big|^2\, ds, \quad t \geqslant 0.$$
By Gronwall's inequality,
$$\mathrm{var}[X(t)] \leqslant \frac{\mu}{\rho}\int_0^t e^{\frac{s-t}{\rho}}\,\big|\mathbb{E}[X(s)]\big|^2\, ds, \quad t \geqslant 0. \tag{3.6.10}$$
Since $A + \bar A$ is exponentially stable, there exist constants $L, \lambda > 0$ such that
$$\big|\mathbb{E}[X(t)]\big| = |e^{(A+\bar A)t}x| \leqslant Le^{-\lambda t}, \quad t \geqslant 0. \tag{3.6.11}$$
Combining (3.6.10) and (3.6.11), we can obtain
$$\mathrm{var}[X(t)] \leqslant \frac{L\mu}{1 - \lambda\rho}\big(e^{-\lambda t} - e^{-\rho^{-1}t}\big), \quad t \geqslant 0.$$
This results in
$$\mathbb{E}|X(t)|^2 = \mathrm{var}[X(t)] + \big|\mathbb{E}[X(t)]\big|^2 \leqslant \frac{L\mu}{1 - \lambda\rho}\big(e^{-\lambda t} - e^{-\rho^{-1}t}\big) + L^2 e^{-2\lambda t}, \quad t \geqslant 0,$$
from which the $L^2$-exponential stability of $[A, \bar A, C, \bar C]$ follows. □
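The stability notions above can also be examined numerically. Taking expectations in (3.6.4) shows, as in part (i) of the proof, that $m(t) = \mathbb{E}[X(t)]$ satisfies $\dot m = (A + \bar A)m$; a standard Itô computation shows that the second moment $M(t) = \mathbb{E}[X(t)X(t)^\top]$ satisfies a closed linear ODE as well, so $\mathbb{E}|X(t)|^2 = \operatorname{tr} M(t)$ can be computed without simulating sample paths. The following sketch is our illustration only, not part of the text; the coefficient matrices are arbitrary and SciPy is assumed to be available.

```python
# Minimal numerical sketch: check L^2-asymptotic stability of the mean-field
# system (3.6.4) by integrating the closed moment ODEs
#   m' = (A + Abar) m,
#   M' = A M + M A^T + Abar m m^T + m m^T Abar^T
#        + C M C^T + C m m^T Cbar^T + Cbar m m^T C^T + Cbar m m^T Cbar^T,
# where m = E[X] and M = E[X X^T], so that E|X(t)|^2 = tr M(t).
import numpy as np
from scipy.integrate import solve_ivp

def moment_odes(t, y, A, Abar, C, Cbar, n):
    m, M = y[:n], y[n:].reshape(n, n)
    mm = np.outer(m, m)
    drift = A @ M + Abar @ mm
    dM = drift + drift.T \
         + C @ M @ C.T + C @ mm @ Cbar.T + Cbar @ mm @ C.T + Cbar @ mm @ Cbar.T
    return np.concatenate([(A + Abar) @ m, dM.ravel()])

# Arbitrary illustrative coefficients (our own choice, not from the text)
n = 2
A    = np.array([[-1.0, 0.5], [0.0, -1.5]])
Abar = np.array([[ 0.2, 0.0], [0.1,  0.0]])
C    = 0.3 * np.eye(n)
Cbar = np.zeros((n, n))
x0   = np.array([1.0, -2.0])

y0 = np.concatenate([x0, np.outer(x0, x0).ravel()])
sol = solve_ivp(moment_odes, (0.0, 50.0), y0,
                args=(A, Abar, C, Cbar, n), rtol=1e-8, atol=1e-10)
print(np.trace(sol.y[n:, -1].reshape(n, n)))  # E|X(50)|^2, small if stable
```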

It is worth noting that in high dimensions, the $L^2$-exponential stability, the $L^2$-global integrability, and the $L^2$-asymptotic stability of the system $[A, \bar A, C, \bar C]$ are not equivalent. However, in the one-dimensional case, we have the following equivalence result.

Proposition 3.6.6 For the one-dimensional system
$$\begin{cases} dX(t) = \big\{aX(t) + \bar a\,\mathbb{E}[X(t)]\big\}dt + \big\{cX(t) + \bar c\,\mathbb{E}[X(t)]\big\}dW, \quad t \geqslant 0, \\ X(0) = x, \end{cases}$$
where $a, \bar a, c, \bar c$ are real numbers, the following statements are equivalent:

(i) It is $L^2$-exponentially stable.
(ii) It is $L^2$-globally integrable.
(iii) It is $L^2$-asymptotically stable.
(iv) $a + \bar a < 0$, and either $2a + c^2 < 0$ or $c + \bar c = 0$.

Proof It suffices to prove the implication (iii) ⇒ (iv). By Itô's rule, we have
$$\mathbb{E}|X(t)|^2 = x^2 + \int_0^t \Big[(2a + c^2)\,\mathbb{E}|X(s)|^2 + (2\bar a + \bar c^2 + 2c\bar c)\big|\mathbb{E}[X(s)]\big|^2\Big]\, ds.$$
So $\mathbb{E}|X(t)|^2$ satisfies the following ODE:
$$\dot y(t) = (2a + c^2)y(t) + (2\bar a + \bar c^2 + 2c\bar c)\big|\mathbb{E}[X(t)]\big|^2, \quad t \geqslant 0; \qquad y(0) = x^2.$$
Thus, noting that $\mathbb{E}[X(t)] = e^{(a+\bar a)t}x$, we have by the variation of constants formula that
$$\begin{aligned} \mathbb{E}|X(t)|^2 &= x^2 e^{(2a+c^2)t} + x^2\big[2\bar a - c^2 + (c + \bar c)^2\big]e^{(2a+c^2)t}\int_0^t e^{(2\bar a - c^2)s}\, ds \\ &= x^2 e^{(2a+c^2)t} + x^2(2\bar a - c^2)e^{(2a+c^2)t}\int_0^t e^{(2\bar a - c^2)s}\, ds + x^2(c + \bar c)^2 e^{(2a+c^2)t}\int_0^t e^{(2\bar a - c^2)s}\, ds \\ &= x^2 e^{2(a+\bar a)t} + x^2(c + \bar c)^2 e^{(2a+c^2)t}\int_0^t e^{(2\bar a - c^2)s}\, ds. \end{aligned} \tag{3.6.12}$$
Since the system $[a, \bar a, c, \bar c]$ is $L^2$-asymptotically stable, Theorem 3.6.5(i) shows that $a + \bar a < 0$. Then, from (3.6.12) we further have
$$(c + \bar c)^2 e^{(2a+c^2)t}\int_0^t e^{(2\bar a - c^2)s}\, ds \to 0 \quad \text{as } t \to \infty. \tag{3.6.13}$$
It is not hard to see that (3.6.13) holds only if $2a + c^2 < 0$ or $c + \bar c = 0$. □
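Since (3.6.12) expresses $\mathbb{E}|X(t)|^2$ in closed form, the characterization (iv) is easy to test numerically. The sketch below is our illustration only; the sample coefficients are arbitrary.

```python
# Sanity check of the closed-form second moment (3.6.12) against
# condition (iv) of Proposition 3.6.6.
import numpy as np

def second_moment(t, a, abar, c, cbar, x=1.0):
    """E|X(t)|^2 from (3.6.12); the integral is evaluated in closed form,
    with the degenerate case 2*abar - c**2 == 0 handled separately."""
    k = 2*abar - c**2
    integral = t if abs(k) < 1e-12 else (np.exp(k*t) - 1.0)/k
    return x**2*np.exp(2*(a+abar)*t) + x**2*(c+cbar)**2*np.exp((2*a+c**2)*t)*integral

def condition_iv(a, abar, c, cbar):
    return a + abar < 0 and (2*a + c**2 < 0 or c + cbar == 0)

for (a, abar, c, cbar) in [(-1.0, 0.5, 0.8,  0.0),   # stable: 2a + c^2 < 0
                           (-1.0, 0.5, 1.5, -1.5),   # stable: c + cbar = 0
                           (-1.0, 0.5, 1.5,  0.0)]:  # unstable: both fail
    decays = second_moment(50.0, a, abar, c, cbar) < 1e-6
    print(condition_iv(a, abar, c, cbar), decays)    # the two flags agree
```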

For the $L^2_{Q,\bar Q}$-global integrability of the system $[A, \bar A, C, \bar C]$, we have the following result.

Theorem 3.6.7 Let (A1)′–(A2)′ hold. If $[A, \bar A, C, \bar C]$ is $L^2_{Q,\bar Q}$-globally integrable, then $A + \bar A$ is $L^2_{Q+\bar Q}$-globally integrable, i.e.,
$$\int_0^\infty \big|(Q + \bar Q)^{\frac{1}{2}} e^{(A+\bar A)t}\big|^2\, dt < \infty. \tag{3.6.14}$$
Conversely, if (3.6.14) holds, then $[A, \bar A, C, \bar C]$ is $L^2_{Q,\bar Q}$-globally integrable, provided either (3.6.7) holds, or $[A, C]$ is $L^2_Q$-globally integrable and
$$\mathcal{N}(Q + \bar Q) \subseteq \mathcal{N}(C + \bar C), \tag{3.6.15}$$
where $\mathcal{N}(G)$ stands for the null space of a matrix $G$.

Proof Suppose that (3.6.3) holds. Since $Q \geqslant 0$, we have for any $x \in \mathbb{R}^n$,
$$\begin{aligned} \int_0^\infty \big|(Q + \bar Q)^{\frac{1}{2}} e^{(A+\bar A)t}x\big|^2\, dt &= \int_0^\infty \big\langle (Q + \bar Q)\mathbb{E}[X(t)],\, \mathbb{E}[X(t)]\big\rangle\, dt \\ &\leqslant \int_0^\infty \Big[\mathbb{E}\langle QX(t), X(t)\rangle + \langle \bar Q\,\mathbb{E}[X(t)], \mathbb{E}[X(t)]\rangle\Big]\, dt \\ &= \mathbb{E}\int_0^\infty \Big[\big|Q^{\frac{1}{2}}\{X(t) - \mathbb{E}[X(t)]\}\big|^2 + \big|(Q + \bar Q)^{\frac{1}{2}}\mathbb{E}[X(t)]\big|^2\Big]\, dt < \infty, \end{aligned}$$
and (3.6.14) follows.

Conversely, let us suppose that (3.6.14) holds. Recall that (3.6.8) is true for any $P \in \mathbb{S}^n$. So if (3.6.7) holds, the same argument as in the proof of Theorem 3.6.5(ii) shows that
$$X(t) = \mathbb{E}[X(t)] = e^{(A+\bar A)t}x, \quad t \geqslant 0.$$
Consequently,
$$\mathbb{E}\int_0^\infty \Big[\big|Q^{\frac{1}{2}}\{X(t) - \mathbb{E}[X(t)]\}\big|^2 + \big|(Q + \bar Q)^{\frac{1}{2}}\mathbb{E}[X(t)]\big|^2\Big]\, dt = \int_0^\infty \big|(Q + \bar Q)^{\frac{1}{2}} e^{(A+\bar A)t}x\big|^2\, dt < \infty.$$

If (3.6.15) holds and $[A, C]$ is $L^2_Q$-globally integrable, then proceeding similarly to the proof of Theorem 3.2.3 in [48, Chap. 3] (with $I$ replaced by $Q$), one can show that there exists a matrix $P \geqslant 0$ such that
$$PA + A^\top P + C^\top PC + Q = 0.$$
With this $P$, (3.6.8) yields
$$\begin{aligned} 0 \leqslant \mathbb{E}\big\langle P\{X(t) - \mathbb{E}[X(t)]\},\, X(t) - \mathbb{E}[X(t)]\big\rangle = \mathbb{E}\int_0^t \Big[&-\big\langle Q\{X(s) - \mathbb{E}[X(s)]\},\, X(s) - \mathbb{E}[X(s)]\big\rangle \\ &+ \big\langle (C + \bar C)^\top P(C + \bar C)\mathbb{E}[X(s)],\, \mathbb{E}[X(s)]\big\rangle\Big]\, ds. \end{aligned}$$
Now, the condition (3.6.15) implies that
$$\big\langle (C + \bar C)^\top P(C + \bar C)y,\, y\big\rangle \leqslant L\big\langle (Q + \bar Q)y,\, y\big\rangle, \quad \forall y \in \mathbb{R}^n,$$
for some $L > 0$. Thus,
$$\mathbb{E}\int_0^t \big\langle Q\{X(s) - \mathbb{E}[X(s)]\},\, X(s) - \mathbb{E}[X(s)]\big\rangle\, ds \leqslant L\int_0^t \big\langle (Q + \bar Q)\mathbb{E}[X(s)],\, \mathbb{E}[X(s)]\big\rangle\, ds, \quad \forall t \geqslant 0.$$
Consequently,
$$\begin{aligned} \mathbb{E}\int_0^\infty &\Big[\big|Q^{\frac{1}{2}}\{X(s) - \mathbb{E}[X(s)]\}\big|^2 + \big|(Q + \bar Q)^{\frac{1}{2}}\mathbb{E}[X(s)]\big|^2\Big]\, ds \\ &= \mathbb{E}\int_0^\infty \Big[\big\langle Q\{X(s) - \mathbb{E}[X(s)]\},\, X(s) - \mathbb{E}[X(s)]\big\rangle + \big\langle (Q + \bar Q)\mathbb{E}[X(s)],\, \mathbb{E}[X(s)]\big\rangle\Big]\, ds \\ &\leqslant (L + 1)\int_0^\infty \big\langle (Q + \bar Q)\mathbb{E}[X(s)],\, \mathbb{E}[X(s)]\big\rangle\, ds = (L + 1)\int_0^\infty \big|(Q + \bar Q)^{\frac{1}{2}} e^{(A+\bar A)s}x\big|^2\, ds < \infty. \end{aligned}$$
This means that the system is $L^2_{Q,\bar Q}$-globally integrable. □
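Condition (3.6.14) can also be verified without numerical integration: whenever $A + \bar A$ is exponentially stable, and reading $|\cdot|$ as the Frobenius norm, the integral in (3.6.14) equals $\operatorname{tr} X$, where $X = \int_0^\infty e^{(A+\bar A)^\top t}(Q + \bar Q)e^{(A+\bar A)t}\, dt$ is the unique solution of the Lyapunov equation $(A + \bar A)^\top X + X(A + \bar A) + Q + \bar Q = 0$. A sketch, assuming SciPy and arbitrary sample matrices of our own choosing:

```python
# Check (3.6.14) through a Lyapunov equation instead of quadrature.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A    = np.array([[-1.0, 0.3], [0.0, -0.7]])
Abar = np.array([[ 0.1, 0.0], [0.0,  0.1]])
Q    = np.diag([1.0, 0.0])
Qbar = np.diag([0.0, 1.0])

M = A + Abar
assert np.all(np.linalg.eigvals(M).real < 0)      # exponential stability of A + Abar
X = solve_continuous_lyapunov(M.T, -(Q + Qbar))   # M^T X + X M = -(Q + Qbar)
print(np.trace(X))   # value of the integral in (3.6.14), finite here
```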

We now return to the controlled linear mean-field SDE (3.6.1), which we denote by $[A, \bar A, C, \bar C; B, \bar B, D, \bar D]$ for simplicity of notation.

Definition 3.6.8 Let (A1)′–(A2)′ hold. The system $[A, \bar A, C, \bar C; B, \bar B, D, \bar D]$ is said to be MF-$L^2_{Q,\bar Q}$-stabilizable if there exist matrices $K, \bar K \in \mathbb{R}^{m\times n}$ such that for any $x \in \mathbb{R}^n$, if $X^{K,\bar K}$ is the solution of
$$\begin{cases} dX(t) = \big\{(A + BK)X(t) + [\bar A + \bar B\bar K + B(\bar K - K)]\mathbb{E}[X(t)]\big\}dt \\ \phantom{dX(t) =\ } + \big\{(C + DK)X(t) + [\bar C + \bar D\bar K + D(\bar K - K)]\mathbb{E}[X(t)]\big\}dW, \\ X(0) = x, \end{cases}$$
and
$$u^{K,\bar K} = K\big\{X^{K,\bar K} - \mathbb{E}[X^{K,\bar K}]\big\} + \bar K\,\mathbb{E}[X^{K,\bar K}],$$
then
$$\mathbb{E}\int_0^\infty \Big[\big\langle QX^{K,\bar K}(t),\, X^{K,\bar K}(t)\big\rangle + \big\langle \bar Q\,\mathbb{E}[X^{K,\bar K}(t)],\, \mathbb{E}[X^{K,\bar K}(t)]\big\rangle + |u^{K,\bar K}(t)|^2\Big]\, dt < \infty.$$

In this case, the pair $(K, \bar K)$ is called an MF-$L^2_{Q,\bar Q}$-stabilizer of the system. When the stronger condition
$$\mathbb{E}\int_0^\infty \Big[|X^{K,\bar K}(t)|^2 + |u^{K,\bar K}(t)|^2\Big]\, dt < \infty$$
holds, we say that the system is MF-$L^2$-stabilizable and call $(K, \bar K)$ an MF-$L^2$-stabilizer of the system.

Definition 3.6.9 Let (A1)′–(A2)′ hold. The system $[A, \bar A, C, \bar C; B, \bar B, D, \bar D]$ is said to be $L^2_{Q,\bar Q}$-stabilizable if there exists a matrix $K \in \mathbb{R}^{m\times n}$ such that for any $x \in \mathbb{R}^n$, if $X^K$ is the solution of
$$\begin{cases} dX(t) = \big\{(A + BK)X(t) + (\bar A + \bar BK)\mathbb{E}[X(t)]\big\}dt \\ \phantom{dX(t) =\ } + \big\{(C + DK)X(t) + (\bar C + \bar DK)\mathbb{E}[X(t)]\big\}dW, \\ X(0) = x, \end{cases}$$
and $u^K(t) = KX^K(t)$, $t \geqslant 0$, then
$$\mathbb{E}\int_0^\infty \Big[\big\langle QX^K(t),\, X^K(t)\big\rangle + \big\langle \bar Q\,\mathbb{E}[X^K(t)],\, \mathbb{E}[X^K(t)]\big\rangle + |u^K(t)|^2\Big]\, dt < \infty.$$
In this case, $K$ is called an $L^2_{Q,\bar Q}$-stabilizer of the system. In the special case of $\bar Q = 0$, we simply say that the system is $L^2_Q$-stabilizable and call $K$ an $L^2_Q$-stabilizer. When the stronger condition
$$\mathbb{E}\int_0^\infty \Big[|X^K(t)|^2 + |u^K(t)|^2\Big]\, dt < \infty$$
holds, we say that the system is $L^2$-stabilizable and call $K$ an $L^2$-stabilizer of the system.

The importance of the notions defined above is that when the system $[A, \bar A, C, \bar C; B, \bar B, D, \bar D]$ is MF-$L^2_{Q,\bar Q}$-stabilizable, the admissible control set $\mathcal{U}_{ad}(x)$ is nonempty for every initial state $x \in \mathbb{R}^n$, since $u^{K,\bar K} \in \mathcal{U}_{ad}(x)$. In particular, $\mathcal{U}_{ad}(x)$ is nonempty for all $x \in \mathbb{R}^n$ if the system is MF-$L^2$-stabilizable.

It is easily seen that if $[A, \bar A, C, \bar C; B, \bar B, D, \bar D]$ is MF-$L^2_{Q,\bar Q}$-stabilizable, then with $(K, \bar K)$ being a stabilizer, the uncontrolled system
$$[A + BK,\ \bar A + \widetilde B\bar K - BK,\ C + DK,\ \bar C + \widetilde D\bar K - DK], \tag{3.6.16}$$

where $\widetilde B = B + \bar B$ and $\widetilde D = D + \bar D$, is $L^2_{Q,\bar Q}$-globally integrable. Moreover, when the mean-field terms are absent, the notion of $L^2$-stabilizability defined here coincides with that defined in Chap. 1, Sect. 1.2.

Note that if the system (3.6.1) is $L^2$-stabilizable, then it is also MF-$L^2$-stabilizable. In fact, if $K$ is an $L^2$-stabilizer, then $(K, K)$ is an MF-$L^2$-stabilizer. The following example shows that in general, MF-$L^2$-stabilizability does not imply $L^2$-stabilizability.

Example 3.6.10 Consider the one-dimensional controlled mean-field SDE
$$\begin{cases} dX(t) = \big\{aX(t) + \bar a\,\mathbb{E}[X(t)] + bu(t) + \bar b\,\mathbb{E}[u(t)]\big\}dt \\ \phantom{dX(t) =\ } + \big\{cX(t) + \bar c\,\mathbb{E}[X(t)] + du(t) + \bar d\,\mathbb{E}[u(t)]\big\}dW, \\ X(0) = x, \end{cases} \tag{3.6.17}$$
where the coefficients are $a = -1$, $\bar a = 2$, $b = 1$, $\bar b = 0$, $c = 1$, $\bar c = 0$, $d = -1$, $\bar d = 1$.

To see that the system (3.6.17) is MF-$L^2$-stabilizable, we need only find $k, \bar k \in \mathbb{R}$ such that the closed-loop system
$$dX(t) = \big\{(a + bk)X(t) + [\bar a + \bar b\bar k + b(\bar k - k)]\mathbb{E}[X(t)]\big\}dt + \big\{(c + dk)X(t) + [\bar c + \bar d\bar k + d(\bar k - k)]\mathbb{E}[X(t)]\big\}dW$$
is $L^2$-globally integrable. By Proposition 3.6.6 (note that the closed-loop version of $c + \bar c$ equals $c + \bar c + (d + \bar d)\bar k = 1 \neq 0$, so the alternative $c + \bar c = 0$ in item (iv) cannot occur), the above closed-loop system is $L^2$-globally integrable if and only if
$$a + \bar a + (b + \bar b)\bar k = \bar k + 1 < 0, \qquad 2(a + bk) + (c + dk)^2 = k^2 - 1 < 0.$$
So $(k, \bar k) = (0, -2)$ is an MF-$L^2$-stabilizer.

To see that the system (3.6.17) is not $L^2$-stabilizable, we need to verify that for any $k \in \mathbb{R}$, the system
$$dX(t) = \big\{(a + bk)X(t) + (\bar a + \bar bk)\mathbb{E}[X(t)]\big\}dt + \big\{(c + dk)X(t) + (\bar c + \bar dk)\mathbb{E}[X(t)]\big\}dW$$
is not $L^2$-globally integrable. This is equivalent to verifying that the system of inequalities
$$(a + bk) + (\bar a + \bar bk) = k + 1 < 0, \qquad 2(a + bk) + (c + dk)^2 = k^2 - 1 < 0$$
has no solution, which is plainly the case: the first inequality requires $k < -1$, while the second requires $-1 < k < 1$.
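The two inequality systems in Example 3.6.10 can be checked mechanically; the following sketch is our illustration only:

```python
# Verify that (k, kbar) = (0, -2) satisfies the MF-stabilizability
# inequalities of Example 3.6.10, while no single k works for plain
# L^2-stabilizability.
import numpy as np

a, abar, b, bbar, c, cbar, d, dbar = -1.0, 2.0, 1.0, 0.0, 1.0, 0.0, -1.0, 1.0

def mf_conditions(k, kbar):
    """Conditions from Proposition 3.6.6 applied to the closed-loop system."""
    return (a + abar + (b + bbar)*kbar < 0) and (2*(a + b*k) + (c + d*k)**2 < 0)

def l2_conditions(k):
    return ((a + b*k) + (abar + bbar*k) < 0) and (2*(a + b*k) + (c + d*k)**2 < 0)

print(mf_conditions(0.0, -2.0))                                  # True
print(any(l2_conditions(k) for k in np.linspace(-5, 5, 10001)))  # False
```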

Now we present a result concerning the MF-$L^2_{Q,\bar Q}$-stabilizability of (3.6.1). We shall use the following notation:
$$\widetilde A = A + \bar A, \quad \widetilde B = B + \bar B, \quad \widetilde C = C + \bar C, \quad \widetilde D = D + \bar D, \quad \widetilde Q = Q + \bar Q.$$

Theorem 3.6.11 Let (A1)′–(A2)′ hold.

(i) If the system (3.6.1) is MF-$L^2_{Q,\bar Q}$-stabilizable, then for some $\bar K \in \mathbb{R}^{m\times n}$,
$$\int_0^\infty \big|\widetilde Q^{\frac{1}{2}} e^{(\widetilde A + \widetilde B\bar K)t}\big|^2\, dt < \infty. \tag{3.6.18}$$
(ii) Suppose that there exists $\bar K \in \mathbb{R}^{m\times n}$ such that (3.6.18) and
$$\mathcal{N}(\widetilde Q) \subseteq \mathcal{N}(\widetilde C + \widetilde D\bar K) \tag{3.6.19}$$
hold. If the controlled SDE system
$$dX(s) = [AX(s) + Bu(s)]ds + [CX(s) + Du(s)]dW \tag{3.6.20}$$
is $L^2_Q$-stabilizable, then the system (3.6.1) is MF-$L^2_{Q,\bar Q}$-stabilizable.
(iii) Suppose that there exists $\bar K \in \mathbb{R}^{m\times n}$ such that (3.6.18) and
$$\widetilde C + \widetilde D\bar K = 0 \tag{3.6.21}$$
hold. Then the system (3.6.1) is MF-$L^2_{Q,\bar Q}$-stabilizable.

Proof The proof of Theorem 3.6.11 follows easily from Theorem 3.6.7. The details are left to the reader. □

Corollary 3.6.12 Let (A1)′–(A2)′ hold.

(i) If the system (3.6.1) is MF-$L^2$-stabilizable, then there exists a matrix $\bar K \in \mathbb{R}^{m\times n}$ such that
$$\sigma(\widetilde A + \widetilde B\bar K) \subseteq \mathbb{C}^- \equiv \{z = \alpha + i\beta \in \mathbb{C} \mid \alpha < 0\}, \tag{3.6.22}$$
where $\sigma(\widetilde A + \widetilde B\bar K)$ is the spectrum of $\widetilde A + \widetilde B\bar K$.
(ii) Suppose that (3.6.22) holds for some $\bar K \in \mathbb{R}^{m\times n}$, and that the system (3.6.20) is $L^2$-stabilizable. Then the controlled mean-field SDE system (3.6.1) is MF-$L^2$-stabilizable.
(iii) Suppose that there exists a matrix $\bar K \in \mathbb{R}^{m\times n}$ such that (3.6.21) and (3.6.22) hold. Then the controlled mean-field SDE system (3.6.1) is MF-$L^2$-stabilizable.

Note that the assumptions in Corollary 3.6.12(ii) do not involve $\bar C$ or $\bar D$. However, the condition (3.6.21) in Corollary 3.6.12(iii) involves both $\bar C$ and $\bar D$. We point out that (3.6.21) means that
$$\mathcal{R}(\widetilde C) \subseteq \mathcal{R}(\widetilde D), \tag{3.6.23}$$
where $\mathcal{R}(G)$ stands for the range of a matrix $G$. In the case of $m < n$, the above could be a big restriction on $\widetilde C$ and $\widetilde D$.

In Corollary 3.6.12(iii), the matrix $\bar K \in \mathbb{R}^{m\times n}$ should satisfy (3.6.21) and (3.6.22) simultaneously. The general solution of (3.6.21) is given by
$$\bar K = -\widetilde D^\dagger \widetilde C + (I - \widetilde D^\dagger \widetilde D)K,$$
where $K \in \mathbb{R}^{m\times n}$ is arbitrary and $\widetilde D^\dagger$ denotes the Moore-Penrose pseudoinverse of $\widetilde D$. Thus, in order for (3.6.22) to hold, we need
$$\sigma\big(\widetilde A + \widetilde B[-\widetilde D^\dagger \widetilde C + (I - \widetilde D^\dagger \widetilde D)K]\big) \subseteq \mathbb{C}^-,$$
which means that the ODE system
$$\dot X(t) = (\widetilde A - \widetilde B\widetilde D^\dagger \widetilde C)X(t) + \widetilde B(I - \widetilde D^\dagger \widetilde D)u(t), \quad t \geqslant 0 \tag{3.6.24}$$
is stabilizable. Hence, we obtain the following result.

Proposition 3.6.13 Let (A1)′–(A2)′ and (3.6.23) hold. Then the system (3.6.1) is MF-$L^2$-stabilizable if the ODE system (3.6.24) is stabilizable, which is the case, in particular, if $m = n$, $\widetilde D$ is invertible, and
$$\sigma(\widetilde A - \widetilde B\widetilde D^{-1}\widetilde C) \subseteq \mathbb{C}^-. \tag{3.6.25}$$
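The recipe behind Proposition 3.6.13 is directly implementable: solve (3.6.21) through the Moore-Penrose pseudoinverse and test the spectrum condition numerically. A sketch with arbitrary sample matrices of our own choosing, assuming NumPy:

```python
# Build Kbar from (3.6.21) via the pseudoinverse and check (3.6.22)/(3.6.25).
import numpy as np

At = np.array([[0.0, 1.0], [-1.0, -0.5]])   # A tilde
Bt = np.eye(2)                              # B tilde
Ct = 0.5 * np.eye(2)                        # C tilde
Dt = np.diag([1.0, 2.0])                    # D tilde (invertible, m = n)

Dt_pinv = np.linalg.pinv(Dt)
Kbar = -Dt_pinv @ Ct        # particular solution; the term
                            # (I - Dt_pinv @ Dt) @ K can be added when
                            # Dt is not injective
assert np.allclose(Ct + Dt @ Kbar, 0.0)     # (3.6.21) holds
eigs = np.linalg.eigvals(At + Bt @ Kbar)    # here equals At - Bt Dt^{-1} Ct
print(eigs, np.all(eigs.real < 0))          # (3.6.25) check
```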

From (3.6.25) it appears that the MF-$L^2$-stabilizability of the mean-field SDE system (3.6.1) could have nothing to do with the stabilizability of the SDE system (3.6.20). In the case that $\bar A = \bar C = 0$ and $\bar B = \bar D = 0$, we have the following controlled linear SDE:
$$dX(t) = [AX(t) + Bu(t)]dt + [CX(t) + Du(t)]dW, \quad t \geqslant 0.$$
Suppose that $m = n$ and $D^{-1}$ exists. Then the condition (3.6.25) becomes
$$\sigma(A - BD^{-1}C) \subseteq \mathbb{C}^-. \tag{3.6.26}$$
In this case, if we take
$$u(t) = -D^{-1}CX(t), \quad t \geqslant 0,$$
then the closed-loop system becomes
$$dX(t) = (A - BD^{-1}C)X(t)\, dt, \quad t \geqslant 0,$$
which is exponentially stable if (3.6.26) holds.

3.6.2 Stochastic LQ Problems

We now return to Problem (MFLQ)∞. In order for the admissible control set $\mathcal{U}_{ad}(x)$ to be nonempty for each initial state $x \in \mathbb{R}^n$, we make the following assumption.

(A3)′ The controlled ODE system
$$\dot X(t) = \widetilde A X(t) + \widetilde B u(t), \quad t \geqslant 0$$
is stabilizable, and the controlled SDE system
$$dX(t) = [AX(t) + Bu(t)]dt + [CX(t) + Du(t)]dW, \quad t \geqslant 0$$
is $L^2$-stabilizable.

From Corollary 3.6.12(ii), we know that under the assumptions (A1)′–(A3)′, the system (3.6.1) is MF-$L^2$-stabilizable. It is possible to relax (A3)′ in various ways. However, for simplicity of presentation, we would like to keep (A3)′ in the sequel.

Let us first introduce the following notation (similar to that used for the finite-horizon problem):
$$\widetilde A = A + \bar A, \quad \widetilde B = B + \bar B, \quad \widetilde C = C + \bar C, \quad \widetilde D = D + \bar D, \quad \widetilde Q = Q + \bar Q, \quad \widetilde R = R + \bar R, \quad \widetilde G = G + \bar G,$$
and for constant matrices $P, \Pi \in \mathbb{S}^n$,
$$\begin{aligned} \mathcal{Q}(P) &= PA + A^\top P + C^\top PC + Q, \\ \mathcal{S}(P) &= B^\top P + D^\top PC, \\ \mathcal{R}(P) &= R + D^\top PD, \end{aligned} \qquad \begin{aligned} \widetilde{\mathcal{Q}}(P, \Pi) &= \Pi\widetilde A + \widetilde A^\top\Pi + \widetilde C^\top P\widetilde C + \widetilde Q, \\ \widetilde{\mathcal{S}}(P, \Pi) &= \widetilde B^\top\Pi + \widetilde D^\top P\widetilde C, \\ \widetilde{\mathcal{R}}(P) &= \widetilde R + \widetilde D^\top P\widetilde D. \end{aligned}$$

Note that, different from (3.4.3) and (3.4.4), all the matrices in the above are constant-valued, and for simplicity we take $S = \bar S = 0$.

Theorem 3.6.14 Let (A1)′–(A3)′ hold. We have the following results:

(i) For every initial state $x \in \mathbb{R}^n$, Problem (MFLQ)∞ admits a unique optimal control $u^* \in \mathcal{U}_{ad}(x)$.
(ii) The algebraic Riccati equations (AREs, for short)
$$\mathcal{Q}(P) - \mathcal{S}(P)^\top\mathcal{R}(P)^{-1}\mathcal{S}(P) = 0, \tag{3.6.27}$$
$$\widetilde{\mathcal{Q}}(P, \Pi) - \widetilde{\mathcal{S}}(P, \Pi)^\top\widetilde{\mathcal{R}}(P)^{-1}\widetilde{\mathcal{S}}(P, \Pi) = 0 \tag{3.6.28}$$
admit a solution pair $(P, \Pi) \in \bar{\mathbb{S}}^n_+ \times \bar{\mathbb{S}}^n_+$.
(iii) The pair $(\Theta, \widetilde\Theta)$ defined by
$$\Theta \triangleq -\mathcal{R}(P)^{-1}\mathcal{S}(P), \qquad \widetilde\Theta \triangleq -\widetilde{\mathcal{R}}(P)^{-1}\widetilde{\mathcal{S}}(P, \Pi)$$
is an MF-$L^2_{Q,\bar Q}$-stabilizer of the system (3.6.1).
(iv) The optimal control $u^*$ for the initial state $x$ admits the following state feedback representation:
$$u^*(t) = \Theta\big\{X(t) - \mathbb{E}[X(t)]\big\} + \widetilde\Theta\,\mathbb{E}[X(t)], \quad t \geqslant 0,$$
where $X$ is the solution to the following mean-field closed-loop system:
$$\begin{cases} dX(t) = \big\{(A + B\Theta)X(t) + (\bar A + \widetilde B\widetilde\Theta - B\Theta)\mathbb{E}[X(t)]\big\}dt \\ \phantom{dX(t) =\ } + \big\{(C + D\Theta)X(t) + (\bar C + \widetilde D\widetilde\Theta - D\Theta)\mathbb{E}[X(t)]\big\}dW, \\ X(0) = x. \end{cases} \tag{3.6.29}$$
(v) The value function of Problem (MFLQ)∞ is given by
$$V(x) \triangleq \inf_{u \in \mathcal{U}_{ad}(x)} J(x; u) = \langle \Pi x, x\rangle, \quad \forall x \in \mathbb{R}^n.$$

Proof Under (A1)′–(A3)′, the admissible control set $\mathcal{U}_{ad}(x)$ is nonempty and convex for each $x \in \mathbb{R}^n$. Writing the cost functional $J(x; u)$ as
$$\begin{aligned} J(x; u) = \mathbb{E}\int_0^\infty \Big[&\big\langle Q\{X(t) - \mathbb{E}[X(t)]\},\, X(t) - \mathbb{E}[X(t)]\big\rangle + \big\langle \widetilde Q\,\mathbb{E}[X(t)],\, \mathbb{E}[X(t)]\big\rangle \\ &+ \big\langle R\{u(t) - \mathbb{E}[u(t)]\},\, u(t) - \mathbb{E}[u(t)]\big\rangle + \big\langle \widetilde R\,\mathbb{E}[u(t)],\, \mathbb{E}[u(t)]\big\rangle\Big]\, dt, \end{aligned}$$
we see, by the assumption (A2)′, that
$$J(x; u) \geqslant \delta\,\mathbb{E}\int_0^\infty |u(t)|^2\, dt, \quad \forall u \in \mathcal{U}_{ad}(x)$$
for some $\delta > 0$. Thus, the map $u \mapsto J(x; u)$ is strictly convex on $\mathcal{U}_{ad}(x)$, and hence the optimal control of Problem (MFLQ)∞ for $x$ is unique.

To prove the existence of an optimal control as well as the assertions (ii)–(v), define for each $T > 0$ the cost functional
$$J_T(x; u) = \mathbb{E}\int_0^T \Big[\langle QX(t), X(t)\rangle + \langle \bar Q\,\mathbb{E}[X(t)], \mathbb{E}[X(t)]\rangle + \langle Ru(t), u(t)\rangle + \langle \bar R\,\mathbb{E}[u(t)], \mathbb{E}[u(t)]\rangle\Big]\, dt$$
and consider the following problem.

Problem (MFLQ)T. For given $x \in \mathbb{R}^n$, find a control $u_T^* \in \mathcal{U}[0, T]$ such that
$$J_T(x; u_T^*) = \inf_{u \in \mathcal{U}[0,T]} J_T(x; u) \equiv V_T(x).$$

By the results of Sect. 3.4 in this chapter, Problem (MFLQ)T admits a unique optimal control $u_T^*$ and
$$V_T(x) = J_T(x; u_T^*) = \langle \Pi_T(0)x, x\rangle, \quad \forall x \in \mathbb{R}^n,$$
where $\Pi_T(t) \geqslant 0$ is the solution to the Riccati equation
$$\begin{cases} \dot\Pi_T(t) + \widetilde{\mathcal{Q}}(P_T(t), \Pi_T(t)) - \widetilde{\mathcal{S}}(P_T(t), \Pi_T(t))^\top\widetilde{\mathcal{R}}(P_T(t))^{-1}\widetilde{\mathcal{S}}(P_T(t), \Pi_T(t)) = 0, \quad t \in [0, T], \\ \Pi_T(T) = 0, \end{cases}$$
with $P_T(t) \geqslant 0$ being the solution to the Riccati equation
$$\begin{cases} \dot P_T(t) + \mathcal{Q}(P_T(t)) - \mathcal{S}(P_T(t))^\top\mathcal{R}(P_T(t))^{-1}\mathcal{S}(P_T(t)) = 0, \quad t \in [0, T], \\ P_T(T) = 0. \end{cases}$$
Furthermore, with the notation
$$\Theta_T(t) = -\mathcal{R}(P_T(t))^{-1}\mathcal{S}(P_T(t)), \qquad \widetilde\Theta_T(t) = -\widetilde{\mathcal{R}}(P_T(t))^{-1}\widetilde{\mathcal{S}}(P_T(t), \Pi_T(t)),$$
the optimal control $u_T^*$ admits the following state feedback representation:
$$u_T^*(t) = \Theta_T(t)\big\{X_T(t) - \mathbb{E}[X_T(t)]\big\} + \widetilde\Theta_T(t)\,\mathbb{E}[X_T(t)], \quad t \in [0, T],$$
where $X_T$ is the solution to the following closed-loop system over $[0, T]$:
$$\begin{cases} dX_T(t) = \big\{\big(A + B\Theta_T(t)\big)X_T(t) + \big(\bar A + \widetilde B\widetilde\Theta_T(t) - B\Theta_T(t)\big)\mathbb{E}[X_T(t)]\big\}dt \\ \phantom{dX_T(t) =\ } + \big\{\big(C + D\Theta_T(t)\big)X_T(t) + \big(\bar C + \widetilde D\widetilde\Theta_T(t) - D\Theta_T(t)\big)\mathbb{E}[X_T(t)]\big\}dW, \\ X_T(0) = x. \end{cases}$$
To see that the ARE (3.6.27) admits a solution $P \in \bar{\mathbb{S}}^n_+$, let us consider the state equation

$$\begin{cases} dX(t) = [AX(t) + Bu(t)]dt + [CX(t) + Du(t)]dW, \quad t \geqslant 0, \\ X(0) = x, \end{cases}$$
and the following cost functionals:
$$\widehat J_T(x; u) = \mathbb{E}\int_0^T \big[\langle QX(t), X(t)\rangle + \langle Ru(t), u(t)\rangle\big]\, dt, \qquad \widehat J(x; u) = \mathbb{E}\int_0^\infty \big[\langle QX(t), X(t)\rangle + \langle Ru(t), u(t)\rangle\big]\, dt.$$
By Theorem 1.1.15 of Chap. 1,
$$\langle P_T(0)x, x\rangle = \inf_{u \in \mathcal{U}[0,T]} \widehat J_T(x; u) \equiv \widehat V_T(x), \quad \forall x \in \mathbb{R}^n.$$
Since the controlled SDE system
$$dX(t) = [AX(t) + Bu(t)]dt + [CX(t) + Du(t)]dW, \quad t \geqslant 0$$
is $L^2$-stabilizable, we have
$$\widehat V(x) \triangleq \inf_{u \in L^2_{\mathbb{F}}(\mathbb{R}^m)} \widehat J(x; u) < \infty.$$

It is not hard to see that for any $0 < T < T'$,
$$\widehat V_T(x) \leqslant \widehat V_{T'}(x) \leqslant \widehat V(x), \quad \forall x \in \mathbb{R}^n,$$
from which we conclude that the limit
$$P \triangleq \lim_{T\to\infty} P_T(0) \geqslant 0$$
exists and is finite. By Lemma 3.3.4 of [48, Chap. 3], $P$ is a solution of (3.6.27). Further, proceeding exactly as in the proof of Lemma 3.3.4 in [48, Chap. 3], we can show that
$$P_T(T - s) = P_{T'}(T' - s), \quad \forall s \in [0, T],$$
from which it follows that $P_T(0) = P_{T+t}(t)$ for all $t \geqslant 0$, and hence
$$\lim_{T\to\infty} P_T(t) = P, \quad \forall t \geqslant 0. \tag{3.6.30}$$
By a similar argument, we can show that
$$\Pi \triangleq \lim_{T\to\infty} \Pi_T(0) \geqslant 0$$
exists and is a solution of (3.6.28), and that
$$\lim_{T\to\infty} \Pi_T(t) = \Pi, \quad \forall t \geqslant 0. \tag{3.6.31}$$
From (3.6.30) and (3.6.31) we obtain
$$\begin{aligned} \lim_{T\to\infty} \Theta_T(t) &= -\lim_{T\to\infty} \mathcal{R}(P_T(t))^{-1}\mathcal{S}(P_T(t)) = -\mathcal{R}(P)^{-1}\mathcal{S}(P) = \Theta, \\ \lim_{T\to\infty} \widetilde\Theta_T(t) &= -\lim_{T\to\infty} \widetilde{\mathcal{R}}(P_T(t))^{-1}\widetilde{\mathcal{S}}(P_T(t), \Pi_T(t)) = -\widetilde{\mathcal{R}}(P)^{-1}\widetilde{\mathcal{S}}(P, \Pi) = \widetilde\Theta. \end{aligned}$$
Thus, for any $t \geqslant 0$, we have almost surely
$$\lim_{T\to\infty} X_T(t) = X(t), \qquad \lim_{T\to\infty} u_T^*(t) = \Theta\big\{X(t) - \mathbb{E}[X(t)]\big\} + \widetilde\Theta\,\mathbb{E}[X(t)] \equiv u^*(t),$$
where $X$ is the solution of (3.6.29). Recall that
$$\langle \Pi_T(0)x, x\rangle = J_T(x; u_T^*) = \mathbb{E}\int_0^T \Big[\langle QX_T(t), X_T(t)\rangle + \langle \bar Q\,\mathbb{E}[X_T(t)], \mathbb{E}[X_T(t)]\rangle + \langle Ru_T^*(t), u_T^*(t)\rangle + \langle \bar R\,\mathbb{E}[u_T^*(t)], \mathbb{E}[u_T^*(t)]\rangle\Big]\, dt.$$
Letting $T \to \infty$, we have by Fatou's lemma,
$$V(x) \geqslant \lim_{T\to\infty} V_T(x) = \langle \Pi x, x\rangle \geqslant \mathbb{E}\int_0^\infty \Big[\langle QX(t), X(t)\rangle + \langle \bar Q\,\mathbb{E}[X(t)], \mathbb{E}[X(t)]\rangle + \langle Ru^*(t), u^*(t)\rangle + \langle \bar R\,\mathbb{E}[u^*(t)], \mathbb{E}[u^*(t)]\rangle\Big]\, dt \geqslant V(x).$$
This shows that $u^*$ is an optimal control for $x$ and that the assertions (ii)–(v) hold. □
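The proof suggests a practical way to compute $(P, \Pi)$ and the gains $(\Theta, \widetilde\Theta)$: integrate the finite-horizon Riccati equations in the time-to-go variable until they become stationary, which realizes the limits (3.6.30)–(3.6.31). The following sketch is our illustration only; it uses a plain Euler step and, for brevity, takes all the bar-coefficients to be zero, in which case the two AREs coincide and $\Theta = \widetilde\Theta$.

```python
# Solve the ARE (3.6.27) by integrating the Riccati ODE in time-to-go
# until stationary, then form the feedback gain Theta.
import numpy as np

def solve_are(A, B, C, D, Q, R, h=1e-3, steps=200_000, tol=1e-10):
    """P(tau) with dP/dtau = Q(P) - S(P)^T R(P)^{-1} S(P), P(0) = 0."""
    P = np.zeros_like(A)
    for _ in range(steps):
        Sp = B.T @ P + D.T @ P @ C
        Rp = R + D.T @ P @ D
        dP = P @ A + A.T @ P + C.T @ P @ C + Q - Sp.T @ np.linalg.solve(Rp, Sp)
        P = P + h * dP
        if np.linalg.norm(dP) < tol:
            break
    return P

# Illustrative data (our own choice) with Abar = Bbar = Cbar = Dbar = 0,
# Qbar = Rbar = 0, so that (3.6.27) and (3.6.28) coincide and P = Pi.
A = np.array([[0.0, 1.0], [0.0, 0.0]]); B = np.eye(2)
C = 0.1 * np.eye(2);                    D = np.zeros((2, 2))
Q = np.eye(2);                          R = np.eye(2)

P = solve_are(A, B, C, D, Q, R)
Theta = -np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)
print(np.linalg.eigvals(A + B @ Theta))  # closed-loop spectrum in C^-
```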

Bibliography

1. Ahmed, N.U.: Nonlinear diffusion governed by McKean-Vlasov equation on Hilbert space and optimal control. SIAM J. Control Optim. 46, 356–378 (2007)
2. Ahmed, N.U., Ding, X.: Controlled McKean-Vlasov equations. Commun. Appl. Anal. 5, 183–206 (2001)
3. Ait Rami, M., Moore, J.B., Zhou, X.Y.: Indefinite stochastic linear quadratic control and generalized differential Riccati equation. SIAM J. Control Optim. 40, 1296–1311 (2001)
4. Andersson, D., Djehiche, B.: A maximum principle for SDEs of mean-field type. Appl. Math. Optim. 63, 341–356 (2011)
5. Bellman, R., Glicksberg, I., Gross, O.: Some Aspects of the Mathematical Theory of Control Processes. RAND Corporation, Santa Monica, CA (1958)
6. Berkovitz, L.D.: Lectures on differential games. In: Kuhn, H.W., Szego, G.P. (eds.) Differential Games and Related Topics, North-Holland, Amsterdam, The Netherlands, pp. 3–45 (1971)
7. Bernhard, P.: Linear-quadratic, two-person, zero-sum differential games: necessary and sufficient conditions. J. Optim. Theory Appl. 27, 51–69 (1979)
8. Bensoussan, A., Sung, K.C.J., Yam, S.C.P., Yung, S.P.: Linear-quadratic mean field games. J. Optim. Theory Appl. 169, 496–529 (2016)
9. Bismut, J.M.: Linear quadratic optimal stochastic control with random coefficients. SIAM J. Control Optim. 14, 419–444 (1976)
10. Borkar, V.S., Kumar, K.S.: McKean-Vlasov limit in portfolio optimization. Stoch. Anal. Appl. 28, 884–906 (2010)
11. Buckdahn, R., Cardaliaguet, P., Rainer, C.: Nash equilibrium payoffs for nonzero-sum stochastic differential games. SIAM J. Control Optim. 43, 624–642 (2004)
12. Buckdahn, R., Djehiche, B., Li, J.: A general maximum principle for SDEs of mean-field type. Appl. Math. Optim. 64, 197–216 (2011)
13. Carmona, R.: Lectures on BSDEs, stochastic control, and stochastic differential games with financial applications. SIAM Book Ser. Financ. Math. 1 (2016)
14. Chan, T.: Dynamics of the McKean-Vlasov equation. Ann. Probab. 22, 431–441 (1994)
15. Delfour, M.C.: Linear quadratic differential games: saddle point and Riccati differential equations. SIAM J. Control Optim. 46, 750–774 (2007)
16. Delfour, M.C., Sbarba, O.D.: Linear quadratic differential games: closed loop saddle points. SIAM J. Control Optim. 47, 3138–3166 (2009)
17. El Karoui, N., Hamadène, S.: BSDEs and risk-sensitive control, zero-sum and nonzero-sum game problems of stochastic functional differential equations. Stoch. Proc. Appl. 107, 145–169 (2003)

18. Friedman, A.: Stochastic differential games. J. Differ. Equ. 11, 79–108 (1972)
19. Hamadène, S.: Backward-forward SDE's and stochastic differential games. Stoch. Proc. Appl. 77, 1–15 (1998)
20. Hamadène, S.: Nonzero sum linear-quadratic stochastic differential games and backward-forward equations. Stoch. Anal. Appl. 17, 117–130 (1999)
21. Hamadène, S., Mu, R.: Existence of Nash equilibrium points for Markovian non-zero-sum stochastic differential games with unbounded coefficients. Stoch. Int. J. Probab. Stoch. Process 87, 85–111 (2015)
22. Ho, Y.C., Bryson, A.E., Baron, S.: Differential games and optimal pursuit-evasion strategies. IEEE Trans. Automat. Control 10, 385–389 (1965)
23. Huang, J., Li, X., Yong, J.: A linear-quadratic optimal control problem for mean-field stochastic differential equations in infinite horizon. Math. Control Relat. Fields 5, 97–139 (2015)
24. Huang, M., Malhamé, R.P., Caines, P.E.: Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Commun. Inf. Syst. 6, 221–251 (2006)
25. Ichikawa, A.: Linear quadratic differential games in a Hilbert space. SIAM J. Control Optim. 14, 120–136 (1976)
26. Kalman, R.E.: Contributions to the theory of optimal control. Bol. Soc. Mat. Mexicana 5, 102–119 (1960)
27. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus, 2nd edn. Springer, New York (1991)
28. Li, X., Sun, J., Xiong, J.: Linear quadratic optimal control problems for mean-field backward stochastic differential equations. Appl. Math. Optim. 80, 223–250 (2019)
29. Li, X., Sun, J., Yong, J.: Mean-field stochastic linear quadratic optimal control problems: closed-loop solvability. Probab. Uncertain. Quant. Risk 1, 2 (2016). https://doi.org/10.1186/s41546-016-0002-3
30. Lin, Q.: A BSDE approach to Nash equilibrium payoffs for stochastic differential games with nonlinear cost functionals. Stoch. Proc. Appl. 122, 357–385 (2012)
31. Lukes, D.L., Russell, D.L.: A global theory for linear-quadratic differential games. J. Math. Anal. Appl. 33, 96–123 (1971)
32. Ma, J., Protter, P., Yong, J.: Solving forward-backward stochastic differential equations explicitly–a four-step scheme. Probab. Theory Relat. Fields 98, 339–359 (1994)
33. Ma, J., Yong, J.: Forward-Backward Stochastic Differential Equations and Their Applications. Lecture Notes in Mathematics, vol. 1702. Springer, New York (1999)
34. McAsey, M., Mou, L.: Generalized Riccati equations arising in stochastic games. Linear Algebra Appl. 416, 710–723 (2006)
35. Meyer-Brandis, T., Øksendal, B., Zhou, X.Y.: A mean-field stochastic maximum principle via Malliavin calculus. Stochastics 84, 643–666 (2012)
36. Mou, L., Yong, J.: Two-person zero-sum linear quadratic stochastic differential games by a Hilbert space method. J. Ind. Manag. Optim. 2, 95–117 (2006)
37. Nash, J.: Non-cooperative games. Ann. Math. 54, 286–295 (1951)
38. Penrose, R.: A generalized inverse of matrices. Proc. Camb. Philos. Soc. 52, 17–19 (1955)
39. Pham, H.: Linear quadratic optimal control of conditional McKean-Vlasov equation with random coefficients and applications. Probab. Uncertain. Quant. Risk 1, 7 (2016). https://doi.org/10.1186/s41546-016-0008-x
40. Pham, H., Basei, M.: Linear-quadratic McKean-Vlasov stochastic control problems with random coefficients on finite and infinite horizon, and applications (2017). arXiv:1711.09390
41. Rainer, C.: Two different approaches to nonzero-sum stochastic differential games. Appl. Math. Optim. 56, 131–144 (2007)
42. Schmitendorf, W.E.: Existence of optimal open-loop strategies for a class of differential games. J. Optim. Theory Appl. 5, 363–375 (1970)
43. Sun, J.: Mean-field stochastic linear quadratic optimal control problems: open-loop solvabilities. ESAIM: COCV 23, 1099–1127 (2017)

44. Sun, J., Li, X., Yong, J.: Open-loop and closed-loop solvabilities for stochastic linear quadratic optimal control problems. SIAM J. Control Optim. 54, 2274–2308 (2016)
45. Sun, J., Wang, H.: Mean-field stochastic linear-quadratic optimal control problems: weak closed-loop solvability (2020). https://doi.org/10.3934/mcrf.2020026
46. Sun, J., Yong, J.: Linear quadratic stochastic differential games: open-loop and closed-loop saddle points. SIAM J. Control Optim. 52, 4082–4121 (2014)
47. Sun, J., Yong, J.: Linear-quadratic stochastic two-person nonzero-sum differential games: open-loop and closed-loop Nash equilibria. Stoch. Proc. Appl. 129, 381–418 (2019)
48. Sun, J., Yong, J.: Stochastic Linear-Quadratic Optimal Control Theory: Open-Loop and Closed-Loop Solutions. SpringerBriefs in Mathematics. Springer, Cham (2020)
49. Sun, J., Yong, J., Zhang, S.: Linear quadratic stochastic two-person zero-sum differential games in an infinite horizon. ESAIM: COCV 22, 743–769 (2016)
50. Wonham, W.M.: On a matrix Riccati equation of stochastic control. SIAM J. Control 6, 681–697 (1968)
51. Yong, J.: Linear-quadratic optimal control problems for mean-field stochastic differential equations. SIAM J. Control Optim. 51, 2809–2838 (2013)
52. Yong, J.: Differential Games–A Concise Introduction. World Scientific Publisher, Singapore (2015)
53. Yong, J.: Linear-quadratic optimal control problems for mean-field stochastic differential equations–time-consistent solutions. Trans. Am. Math. Soc. 369, 5467–5523 (2017)
54. Yong, J., Zhou, X.Y.: Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer, New York (1999)
55. Zhang, P.: Some results on two-person zero-sum linear quadratic differential games. SIAM J. Control Optim. 43, 2157–2165 (2005)

Index

A
Adjoint equation, 70
Admissible, 8
Admissible control, 16, 104
Admissible control pair, 51
Admissible state process, 51, 104
Algebraic Riccati Equation (ARE), 9, 54, 117

B
Backward Stochastic Differential Equation (BSDE), 6
Bond, 98

C
Closed-loop control, 4, 10
Closed-loop optimal strategy, 4, 10
Closed-loop representation, 25
Closed-loop saddle point, 52
Closed-loop solvable, 4, 10
Closed-loop strategy, 4, 10, 18
Closed-loop system, 4, 18
Coefficient, 2
Control, 2
Convexity-concavity condition, 40
Cost functional, 2

D
Dispersion, 98

E
Efficient point, 99
Efficient portfolio, 99
Exponentially stable, 106

F
Finite, 3, 69

G
Generalized Algebraic Riccati Equation (GARE), 11

H
Homogeneous problem, 4

I
Initial pair, 3, 68
Interest rate, 98

L
$L^2$-globally integrable, 105
$L^2$-stabilizable, 9, 112
$L^2$-stabilizer, 112
$L^2$-stable, 9
  asymptotically, 105
  exponentially, 105
$L^2$-stable adapted solution, 11, 13
$L^2_Q$-stabilizable, 112
$L^2_Q$-stabilizer, 112
$L^2_{Q,\bar Q}$-globally integrable, 106
$L^2_{Q,\bar Q}$-stabilizable, 112
$L^2_{Q,\bar Q}$-stabilizer, 112
$L^2_Q$-globally integrable, 106
Linear-Quadratic (LQ), 1, 16
Lyapunov equation, 29, 40, 79, 93

M
Mean-field, 67
Mean-field BSDE, 70
Mean-field FBSDE, 71
Mean-field linear-quadratic optimal control problem, 68
Mean-field SDE, 68
Mean rates of return, 98
Mean-variance portfolio selection problem, 99
MF-$L^2$-stabilizable, 112
MF-$L^2$-stabilizer, 112
MF-$L^2_{Q,\bar Q}$-stabilizable, 111
MF-$L^2_{Q,\bar Q}$-stabilizer, 112
Minimizing family, 75
Moore-Penrose pseudoinverse, 5, 12
MV problem, 99

N
Nash equilibrium
  closed-loop, 18
  open-loop, 17
Nonhomogeneous term, 2

O
Open-loop optimal control, 10
Open-loop saddle point, 52
Open-loop solvable, 3, 10, 70
Optimal control
  open-loop, 3, 69
Optimal state process, 69
Ordinary Differential Equation (ODE), 5

P
Portfolio process, 99
Problem (DLQ), 83
Problem (MFLQ), 69
Problem (MFLQ)⁰, 69
Problem (MFLQ)λ, 100
Problem (MFLQ)∞, 104
Problem (MFLQ)ε, 75
Problem (MFLQ)T, 118
Problem (SDG), 16
Problem (SDG)∞, 51
Problem (SDG)⁰∞, 52
Problem (SLQ), 3
Problem (SLQ)⁰, 4
Problem (SLQ)⁰∞, 9
Problem (SLQ)∞, 9

R
Riccati equation, 5
  regularly solvable, 6
  regular solution, 5
  strongly regular, 76
  strongly regularly solvable, 6
  strongly regular solution, 6

S
Saddle point
  closed-loop, 18, 39
  open-loop, 17, 39
SLQ problem, 3
Stabilizer, 9
Stabilizing solution, 11, 55
State equation, 2
Stationarity condition, 5, 20, 40
Stochastic Differential Equation (SDE), 1
Stochastic LQ optimal control problem, 3
Stock, 98

T
Two-person differential game, 16
  infinite-horizon, 51
  nonzero-sum, 17
  zero-sum, 16, 38

V
Value function, 3, 69

W
Weighting matrix, 3

Z
Zero-extension, 45, 77