CAMBRIDGE STUDIES IN ADVANCED MATHEMATICS 207 Editorial Board J. BERTOIN, B. BOLLOBÁS, W. FULTON, B. KRA, I. MOERDIJK, C. PRAEGER, P. SARNAK, B. SIMON, B. TOTARO
OPTIMAL MASS TRANSPORT ON EUCLIDEAN SPACES Over the past three decades, optimal mass transport has emerged as an active field with wide-ranging connections to the calculus of variations, partial differential equations (PDEs), and geometric analysis. This graduate-level introduction covers the field’s theoretical foundation and key ideas in applications. By focusing on optimal mass transport problems in a Euclidean setting, the book is able to introduce concepts in a gradual, accessible way with minimal prerequisites, while remaining technically and conceptually complete. Working in a familiar context will help readers build geometric intuition quickly and give them a strong foundation in the subject. This book explores the relation between the Monge and Kantorovich transport problems, solving the former for both the linear transport cost (which is important in geometric applications) and the quadratic transport cost (which is central in PDE applications), starting from the solution of the latter for arbitrary transport costs. Francesco Maggi is Professor of Mathematics at the University of Texas at Austin. His research interests include the calculus of variations, partial differential equations, and optimal mass transport. He is the author of Sets of Finite Perimeter and Geometric Variational Problems: An Introduction to Geometric Measure Theory, published by Cambridge University Press.
Optimal Mass Transport on Euclidean Spaces FRANCESCO MAGGI University of Texas at Austin
Shaftesbury Road, Cambridge CB2 8EA, United Kingdom One Liberty Plaza, 20th Floor, New York, NY 10006, USA 477 Williamstown Road, Port Melbourne, VIC 3207, Australia 314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India 103 Penang Road, #05–06/07, Visioncrest Commercial, Singapore 238467 Cambridge University Press is part of Cambridge University Press & Assessment, a department of the University of Cambridge. We share the University’s mission to contribute to society through the pursuit of education, learning and research at the highest international levels of excellence. www.cambridge.org Information on this title: www.cambridge.org/9781009179706 DOI: 10.1017/9781009179713 © Francesco Maggi 2023 This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press & Assessment. First published 2023 A catalogue record for this publication is available from the British Library Library of Congress Cataloging-in-Publication Data Names: Maggi, Francesco, 1978– author. Title: Optimal mass transport on Euclidean spaces / Francesco Maggi. Description: Cambridge ; New York, NY : Cambridge University Press, 2023. | Series: Cambridge studies in advanced mathematics ; 207 | Includes bibliographical references and index. Identifiers: LCCN 2023016425 | ISBN 9781009179706 (hardback) | ISBN 9781009179713 (ebook) Subjects: LCSH: Transport theory – Mathematical models. | Mass transfer. | Generalized spaces. 
Classification: LCC QC175.2 .M34 2023 | DDC 530.13/8–dc23/eng/20230817 LC record available at https://lccn.loc.gov/2023016425 ISBN 978-1-009-17970-6 Hardback Cambridge University Press & Assessment has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
“Francesco Maggi’s book is a detailed and extremely well written explanation of the fascinating theory of Monge–Kantorovich optimal mass transfer. I especially recommend Part IV’s discussion of the ‘linear’ cost problem and its subtle mathematical resolution.” – Lawrence C. Evans, UC Berkeley “Over the last three decades, optimal transport has revolutionized the mathematical analysis of inequalities, differential equations, dynamical systems, and their applications to physics, economics, and computer science. By exposing the interplay between the discrete and Euclidean settings, Maggi’s book makes this development uniquely accessible to advanced undergraduates and mathematical researchers with a minimum of prerequisites. It includes the first textbook accounts of the localization technique known as needle decomposition and its solution to Monge’s centuries old cutting and filling problem (1781). This book will be an indispensable tool for advanced undergraduates and mathematical researchers alike.” – Robert McCann, University of Toronto
to Vicki
Contents

Preface
Notation

PART I THE KANTOROVICH PROBLEM

1 An Introduction to the Monge Problem
1.1 The Original Monge Problem
1.2 A Modern Formulation of the Monge Problem
1.3 Optimality via Duality and Transport Rays
1.4 Monotone Transport Maps
1.5 Knothe Maps

2 Discrete Transport Problems
2.1 The Discrete Kantorovich Problem
2.2 c-Cyclical Monotonicity with Discrete Measures
2.3 Basics about Convex Functions on $\mathbb{R}^n$
2.4 The Discrete Kantorovich Problem with Quadratic Cost
2.5 The Discrete Monge Problem

3 The Kantorovich Problem
3.1 Transport Plans
3.2 Formulation of the Kantorovich Problem
3.3 Existence of Minimizers in the Kantorovich Problem
3.4 c-Cyclical Monotonicity with General Measures
3.5 Kantorovich Duality for General Transport Costs
3.6 Two Additional Results on c-Cyclical Monotonicity
3.7 Linear and Quadratic Kantorovich Dualities

PART II SOLUTION OF THE MONGE PROBLEM WITH QUADRATIC COST: THE BRENIER–MCCANN THEOREM

4 The Brenier Theorem
4.1 The Brenier Theorem: Statement and Proof
4.2 Inverse of a Brenier Map and Fenchel–Legendre Transform
4.3 Brenier Maps under Rigid Motions and Dilations

5 First Order Differentiability of Convex Functions
5.1 First Order Differentiability and Rectifiability
5.2 Implicit Function Theorem for Convex Functions

6 The Brenier–McCann Theorem
6.1 Proof of the Existence Statement
6.2 Proof of the Uniqueness Statement

7 Second Order Differentiability of Convex Functions
7.1 Distributional Derivatives of Convex Functions
7.2 Alexandrov’s Theorem for Convex Functions

8 The Monge–Ampère Equation for Brenier Maps
8.1 Convex Inverse Function Theorem
8.2 Jacobians of Convex Gradients
8.3 Derivation of the Monge–Ampère Equation

PART III APPLICATIONS TO PDE AND THE CALCULUS OF VARIATIONS AND THE WASSERSTEIN SPACE

9 Isoperimetric and Sobolev Inequalities in Sharp Form
9.1 A Jacobian–Laplacian Estimate
9.2 The Euclidean Isoperimetric Inequality
9.3 The Sobolev Inequality on $\mathbb{R}^n$

10 Displacement Convexity and Equilibrium of Gases
10.1 A Variational Model for Self-interacting Gases
10.2 Displacement Interpolation
10.3 Displacement Convexity of Internal Energies
10.4 The Brunn–Minkowski Inequality

11 The Wasserstein Distance $W_2$ on $\mathcal{P}_2(\mathbb{R}^n)$
11.1 Displacement Interpolation and Geodesics in $(\mathcal{P}_2(\mathbb{R}^n), W_2)$
11.2 Some Basic Remarks about $W_2$
11.3 The Wasserstein Space $(\mathcal{P}_2(\mathbb{R}^n), W_2)$

12 Gradient Flows and the Minimizing Movements Scheme
12.1 Gradient Flows in $\mathbb{R}^n$ and Convexity
12.2 Gradient Flow Interpretations of the Heat Equation
12.3 The Minimizing Movements Scheme

13 The Fokker–Planck Equation in the Wasserstein Space
13.1 The Fokker–Planck Equation
13.2 First Variation Formulae for Inner Variations
13.3 Analysis of the Entropy Functional
13.4 Implementation of the Minimizing Movements Scheme
13.5 Displacement Convexity and Convergence to Equilibrium

14 The Euler Equations and Isochoric Projections
14.1 Isochoric Transformations of a Domain
14.2 The Euler Equations and Principle of Least Action
14.3 The Euler Equations as Geodesics Equations

15 Action Minimization, Eulerian Velocities, and Otto’s Calculus
15.1 Eulerian Velocities and Action for Curves of Measures
15.2 From Vector Fields to Curves of Measures
15.3 Displacement Interpolation and the Continuity Equation
15.4 Lipschitz Curves Admit Eulerian Velocities
15.5 The Benamou–Brenier Formula
15.6 Otto’s Calculus

PART IV SOLUTION OF THE MONGE PROBLEM WITH LINEAR COST: THE SUDAKOV THEOREM

16 Optimal Transport Maps on the Real Line
16.1 Cumulative Distribution Functions
16.2 Optimal Transport on the Real Line

17 Disintegration
17.1 Statement of the Disintegration Theorem and Examples
17.2 Proof of the Disintegration Theorem
17.3 Stochasticity of Transport Plans
17.4 $K_c = M_c$ for Nonatomic Origin Measures

18 Solution to the Monge Problem with Linear Cost
18.1 Transport Rays and Transport Sets
18.2 Construction of the Sudakov Maps
18.3 Kantorovich Potentials Are Countably $C^{1,1}$-Regular
18.4 Proof of the Sudakov Theorem
18.5 Some Technical Measure-Theoretic Arguments

19 An Introduction to the Needle Decomposition Method
19.1 The Payne–Weinberger Comparison Theorem
19.2 The Localization Theorem
19.3 $C^{1,1}$-Extensions of Kantorovich Potentials
19.4 Concave Needles and Proof of the Localization Theorem

Appendix A Radon Measures on $\mathbb{R}^n$ and Related Topics
A.1 Borel and Radon Measures, Main Examples
A.2 Support and Concentration Set of a Measure
A.3 Fubini’s Theorem
A.4 Push-Forward of a Measure
A.5 Approximation Properties
A.6 Weak-Star Convergence
A.7 Weak-Star Compactness
A.8 Narrow Convergence
A.9 Differentiation Theory of Radon Measures
A.10 Lipschitz Functions and Area Formula
A.11 Vector-valued Radon Measures
A.12 Regularization by Convolution
A.13 Lipschitz Approximation of Functions of Bounded Variation
A.14 Coarea Formula
A.15 Rectifiable Sets
A.16 The $C^{1,1}$-Version of the Whitney Extension Theorem

Appendix B Bibliographical Notes
References
Index
Preface
In a hypothetical hierarchy of mathematical theories, the theory of optimal mass transport (OMT hereafter) lies at quite a fundamental level, yielding a formidable descriptive power in very general settings. The most striking example in this direction is the theory of curvature-dimension conditions, which exploits OMT to construct fine analytic and geometric tools in ambient spaces as general as metric spaces endowed with a measure. The Bourbakist aesthetics would thus demand OMT to be presented in the greatest possible generality from the outset, narrowing the scope of the theory only when strictly necessary. In contrast, this book stems from the pedagogically more pragmatic viewpoint that many key features of OMT (and of its applications) already appear in full focus when working in the simplest ambient space, the Euclidean space $\mathbb{R}^n$, and with the simplest transport costs per unit mass, namely, the “linear” transport cost $c(x, y) = |x - y|$ and the quadratic transport cost $c(x, y) = |x - y|^2$. Readers of this book, who are assumed to be graduate students with an interest in Analysis, should find in these pages sufficient background to start working on research problems involving OMT – especially those involving partial differential equations (PDEs); at the same time, having mastered the basics of the theory in its most intuitive and grounded setting, they should be in an excellent position to study more complete and general accounts on OMT, like those contained in the monographs [Vil03, Vil09, San15, AGS08]. For other introductory treatments that could serve the same purpose well, see, for example, [ABS21, FG21].

The story of OMT began in 1781, that is, in the midst of a founding period for the fields of Analysis and PDE, with the formulation of a transport problem by Monge. Examples of famous problems formulated roughly at the same time include the wave equation (1746), the Euler equations (1757), Plateau’s
problem¹ and the minimal surface equation (1760), and the heat equation and the Navier–Stokes equations (1822). An interesting common trait of all these problems is that the frameworks in which they had been originally formulated have all proved inadequate for their satisfactory solutions. For example, the study of heat and wave equations has stimulated the investigation of Fourier’s series, with the corresponding development of functional and harmonic analysis, and of the notion of distributional solution. Similarly, the study of the minimal surface equation and of Plateau’s problem has inspired a profound revision of the notion of surface itself, leading to the development of geometric measure theory. The Monge problem has a similar history:² an original formulation (essentially intractable), and a modern reformulation, proposed by Kantorovich in the 1940s, which leads to a broader class of problems and to many new questions. The main theme of this book is exploring the relation between the Monge and Kantorovich transport problems, solving the former both for the linear transport cost (the one originally considered by Monge, which is of great importance in geometric applications) and for the quadratic transport cost (which is central in applications to PDE), starting from the solution of the latter for arbitrary transport costs.

The book is divided into four parts, requiring increasing levels of mathematical maturity and technical proficiency at the reader’s end. Besides a prerequisite³ familiarity with the basic theory of Radon measures in $\mathbb{R}^n$, the book is essentially self-contained. Part I opens with an introduction to the original minimization problem formulated by Monge in terms of transport maps and includes a discussion about the intractability of the Monge problem by a direct approach, as well as some basic examples of (sometimes optimal) transport maps (Chapter 1).
It then moves to the solution of the discrete OMT problem with a generic transport cost $c(x, y)$ (Chapter 2), which serves to introduce in a natural way three key ideas behind Kantorovich’s approach to OMT problems: the notions of transport plan, c-cyclical monotonicity, and Kantorovich duality. Kantorovich’s theory is then presented in Chapter 3, leading to existence and characterization results for optimal transport plans with respect to generic transport costs.

¹ The minimization of surface area under a prescribed boundary condition (together with the related minimal surface equation) has been studied by mathematicians at least since the work of Lagrange (1760). The modern consolidated terminology calls this minimization problem “Plateau’s problem,” although Plateau’s contribution was dated almost a century later (1849) and consisted in extensive experimental work on soap films.
² For a complete and accurate account on the history of the Monge problem and on the development of OMT, see the bibliographical notes of Villani’s treatise [Vil09].
³ All the relevant terminologies, notations, and required results are summarized in Appendix A, which is for the most part a synopsis of [Mag12, Part I].
The optimal transport plans constructed in Kantorovich’s theory are more general and flexible objects than the optimal transport maps sought by Monge, which explains why solving the Kantorovich problem is way easier than solving the Monge problem. Moreover, a transport map canonically induces a transport plan with the same transport cost, thus leading to the fundamental question: When are optimal transport plans induced by optimal transport maps? Parts II and IV provide answers to these questions in the cases of the quadratic and linear transport costs, respectively. Part II opens with the Brenier theorem (Chapter 4), which asserts the existence of an optimal transport map in the Monge problem with quadratic cost under the assumptions that the origin mass distribution is absolutely continuous with respect to the Lebesgue measure and that both the origin and final mass distributions have finite second order moments; moreover, this optimal transport map comes in the form of the gradient of a convex function, and is uniquely determined, and therefore called the Brenier map from the origin to the final mass distribution. In Chapter 5, we establish some sharp results on the first order differentiability of convex functions, which we then use in Chapter 6 to prove McCann’s remarkable extension of the Brenier theorem – in which the absolute continuity assumption on the origin mass distribution is sharply weakened, and the finiteness of second order moments is entirely dropped. In both the Brenier theorem and the Brenier–McCann theorem, the transport condition is expressed in a measure-theoretic form (see (1.6) in Chapter 1) which is weaker than the “infinitesimal transport condition” originally envisioned by Monge (see (1.1) in Chapter 1). The former implies the latter for transport maps that are Lipschitz continuous and injective, but, unfortunately, both properties are not generally valid for gradients of convex functions. 
To address this point, in Chapter 7, we provide a detailed analysis of the second order differentiability properties of convex functions, which we then exploit in Chapter 8 to prove the validity of the Monge–Ampère equation for Brenier maps between absolutely continuous distributions of mass. In turn, the latter result is of key technical importance for the applications of quadratic OMT problems to PDE and geometric/functional inequalities.

Part III has two main themes: the first one describes some celebrated applications of Brenier maps to mathematical models of physical interest; the second one introduces the geometric structure of the Wasserstein space, that is, the space of finite second-moment probability measures $\mathcal{P}_2(\mathbb{R}^n)$ endowed with the distance $W_2$ defined by taking the square root of the minimum value in the Kantorovich problem with quadratic cost. The close relation between the purely geometrical properties of the Wasserstein space and the inner workings of many mathematical models of basic physical importance is one of the most charming and inspiring traits of OMT theory and definitely the reason why OMT is so relevant for mathematicians with such largely different backgrounds. We begin the exposition of these ideas in Chapter 9, where we present OMT proofs of two inequalities of paramount geometric and physical importance, namely, the Euclidean isoperimetric inequality and the Sobolev inequality. We then continue in Chapter 10 with the analysis of a model for self-interacting gases at equilibrium. While studying the uniqueness problem for minimizers in this model, we naturally introduce an OMT-based notion of convex combination between probability measures, known as displacement interpolation, together with a corresponding class of displacement convex “internal energies.” The latter include an example of paramount physical importance, namely, the (negative) entropy $S$ of a gas. As a further application of displacement convexity (beyond the uniqueness of equilibria of self-interacting gases), we close Chapter 10 with an OMT proof of another key geometric inequality, the Brunn–Minkowski inequality. In Chapter 11, we introduce the Wasserstein space $(\mathcal{P}_2(\mathbb{R}^n), W_2)$, build up some geometric intuition on it by a series of examples, prove that it is a complete metric space, and explain how to interpret displacement convexity as geodesic interpolation in $(\mathcal{P}_2(\mathbb{R}^n), W_2)$. We then move, in the two subsequent chapters, to illustrate how to interpret many parabolic PDEs as gradient flows of displacement convex energies in the Wasserstein space. In Chapter 12, we introduce the notion of gradient flow, discuss why interpreting an evolution equation as a gradient flow is useful, how it is possible that the same evolution equation may be seen as the gradient flow of different energies, and how to construct gradient flows through the minimizing movements scheme.
Then, in Chapter 13, we exploit the minimizing movements scheme framework to prove that the Fokker–Planck equation (describing the motion of a particle under the action of a chemical potential and of white noise forces due to molecular collisions) can be characterized (when the chemical potential is convex) as the gradient flow of a displacement convex functional on the Wasserstein space, and, as a further application, we derive quantitative rates for convergence to equilibrium in the Fokker–Planck equation. In Chapter 14, we obtain additional insights about the geometry of the Wasserstein space by looking at the Euler equations for the motion of an incompressible fluid. The Euler equations describe the motion of an incompressible fluid in the absence of friction/viscosity and can be characterized as geodesics equations in the (infinite-dimensional) “manifold” $\mathcal{M}$ of volume-preserving transformations of a domain. At the same time, geodesics (on a manifold embedded in some Euclidean space) can be characterized by a limiting procedure involving an increasing number of “mid-point projections” in the ambient space: there lies the connection with OMT, since the Brenier theorem allows us to characterize $L^2$-projections over $\mathcal{M}$ as compositions with Brenier maps. The analysis of the Euler equations serves also to introduce two crucial objects: the action functional of an incompressible fluid (time integral of the total kinetic energy) and the continuity equation (describing how the Eulerian velocity of the fluid transports its mass density). In Chapter 15, we refer to these objects when formally introducing the concepts of (Eulerian) velocity of a curve of measures in $\mathcal{P}_2(\mathbb{R}^n)$ and characterize the Wasserstein distance between the end points of such a curve in terms of minimization of a corresponding action functional. This is the celebrated Benamou–Brenier formula, which provides the entry point to understand the “Riemannian” (or “infinitesimally Hilbertian”) structure of the Wasserstein space. We only briefly explore the latter direction, by quickly reviewing the notion of gradient induced by such Riemannian structure (Otto’s calculus).

Part IV begins with Chapter 16, where a sharp result for the existence of optimal transport maps in dimension one is presented. In Chapter 17, we first introduce the fundamental disintegration theorem and then exploit it to give a useful geometric characterization of transport plans induced by transport maps and to prove the equivalence of infima for the Monge and Kantorovich problems when the origin mass distribution is atomless. We then move, in Chapter 18, to construct optimal transport maps for the Monge problem with linear transport cost. We do so by implementing a celebrated argument due to Sudakov, which exploits disintegration theory to reduce the construction of an optimal transport map to the solution of a family of one-dimensional transport problems.
The generalization of Sudakov’s argument to more general ambient spaces (like Riemannian manifolds or even metric measure spaces) lies at the heart of a powerful method for proving geometric and functional inequalities, known as the “needle decomposition method,” and usually formalized in the literature as a “localization theorem.” In Chapter 19, we present these ideas in the Euclidean setting. Although this restricted setting does not allow us to present the most interesting applications of the technique itself, its discussion seems, however, sufficient to illustrate several key aspects of the method, thus putting readers in an ideal position to undertake further reading on this important subject. Having put the focus on the clarity of the mathematical exposition above anything else, the main body of this book contains very few bibliographical references and almost no bibliographical digressions. A set of bibliographical notes has been included in the appendix with the main intent of acknowledging the original papers and references used in the preparation of the book and of pointing students to a few of the many possible further readings.
For comprehensive bibliographies and historical notes on OMT, we refer to [AGS08, Vil09, San15]. This book originates from a short course on OMT I taught at the Universidad Autónoma de Madrid in 2015 at the invitation of Matteo Bonforte, Daniel Faraco, and Juan Luis Vázquez. An expanded version of the lecture notes of that short course formed the initial core of a graduate course I taught at the University of Texas at Austin during the fall of 2020, and whose contents roughly correspond to the first 14 chapters of this book. The remaining chapters have been written in a more advanced style and without the precious feedback generated from class teaching. For this reason, I am very grateful to Fabio Cavalletti, Nicola Gigli, Carlo Nitsch, and Aldo Pratelli who, in reading those final four chapters, have provided me with very insightful comments, spotted subtle problems, and suggested possible solutions that led to some major revisions (and, I think, eventually, to a very nice presentation of some deep and beautiful results!). I would also like to thank Lorenzo Brasco, Kenneth DeMason, Luigi De Pascale, Andrea Mondino, Robin Neumayer, Daniel Restrepo, Filippo Santambrogio, and Daniele Semola for providing me with additional useful comments that improved the correctness and clarity of the text. Finally, I thank Alessio Figalli for his initial encouragement in turning my lecture notes into a book. With my gratitude to Luigi Ambrosio and Cedric Villani, from whom I have first learned OMT about 20 years ago during courses at the Scuola Normale Superiore di Pisa and at the Mathematisches Forschungsinstitut Oberwolfach, and with my sincere admiration for the many colleagues who have contributed to the discovery of the incredibly beautiful Mathematics contained in this book, I wish readers to find here plenty of motivations, insights, and enjoyment while learning about OMT and preparing themselves for contributing to this theory with their future discoveries!
Notation
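a.e., almost everywhere
s.t., such that
w.r.t., with respect to
$\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, natural, integer, rational, and real numbers
$\mathbb{R}^n$, the n-dimensional Euclidean space
$B_r(x)$, open ball in $\mathbb{R}^n$ with center $x$ and radius $r$ (Euclidean metric)
$\mathrm{Int}(E)$, interior of a set $E \subset \mathbb{R}^n$ (Euclidean topology)
$\mathrm{Cl}(E)$, closure of a set $E \subset \mathbb{R}^n$ (Euclidean topology)
$\partial E$, boundary of a set $E \subset \mathbb{R}^n$ (Euclidean topology)
$\mathbb{S}^n$, the n-dimensional sphere in $\mathbb{R}^{n+1}$
$\nu_E$, the outer unit normal $\nu_E : \partial E \to \mathbb{S}^{n-1}$ of a set $E \subset \mathbb{R}^n$ with $C^1$-boundary
$\mathbb{R}^{n\times m}$, matrices with n-rows and m-columns
$A^*$, the transpose of a matrix/linear operator
$\mathbb{R}^{n\times n}_{\mathrm{sym}}$, matrices $A \in \mathbb{R}^{n\times n}$, with $A = A^*$
$w \otimes v$ ($v \in \mathbb{R}^n$, $w \in \mathbb{R}^m$), the linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$ defined by $(w \otimes v)[e] = (v \cdot e)\, w$
$\mathcal{B}(\mathbb{R}^n)$, the Borel subsets of $\mathbb{R}^n$
$\mathcal{L}^n$, the Lebesgue measure on $\mathbb{R}^n$
$\mathcal{H}^k$, the k-dimensional Hausdorff measure on $\mathbb{R}^n$
$\mathcal{P}(\mathbb{R}^n)$, probability measures on $\mathbb{R}^n$
$\mathcal{P}_{ac}(\mathbb{R}^n)$, measures in $\mathcal{P}(\mathbb{R}^n)$ that are absolutely continuous w.r.t. $\mathcal{L}^n$
$\mathcal{P}_p(\mathbb{R}^n)$, measures in $\mathcal{P}(\mathbb{R}^n)$ with finite p-moment ($1 \le p < \infty$)
$\mathcal{P}_{p,ac}(\mathbb{R}^n) = \mathcal{P}_{ac}(\mathbb{R}^n) \cap \mathcal{P}_p(\mathbb{R}^n)$
$\mu$ […]

1 An Introduction to the Monge Problem

[…]

$$\sigma(T(x))\,|\det \nabla T(x)| = \rho(x) \qquad \forall x \in \{\rho > 0\}. \tag{1.1}$$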
We call (1.1) the pointwise transport condition from $\rho(x)\,dx$ to $\sigma(y)\,dy$. Taking $|y - x|$ as the transport cost¹ to move a unit mass from $x$ to $y$, the original Monge problem (from $\rho(x)\,dx$ to $\sigma(y)\,dy$) is the minimization problem

$$M = \inf\Big\{ \int_{\mathbb{R}^n} |T(x) - x|\,\rho(x)\,dx : \text{$T$ is smooth, injective and (1.1) holds} \Big\}. \tag{1.2}$$

Since work has the dimensions of force times length, if $\lambda$ denotes the amount of force per unit mass at our disposal to implement the “instructions” of transport maps, then $\lambda M$ is the minimal amount of work needed to transport $\rho(x)\,dx$ into $\sigma(y)\,dy$. Since work is a form of (mechanical) energy, (1.2) is, in precise physical terms, an “energy minimization problem.”

From a mathematical viewpoint – even from a modern mathematical viewpoint that takes advantage of all sorts of compactness and closure theorems discovered since Monge’s time – (1.2) is a very challenging minimization problem. Let us consider, for example, the problem of showing the mere existence of a minimizer. The baseline, modern strategy to approach this kind of question, the so-called Direct Method of the Calculus of Variations, works as follows. Consider an abstract minimization problem, $m = \inf\{ f(x) : x \in X \}$, defined by a function $f : X \to \mathbb{R}$ such that $m \in \mathbb{R}$. By definition of infimum of a set of real numbers, we can consider a minimizing sequence² for $m$, that is, a sequence $\{x_j\}_j$ in $X$ such that $f(x_j) \to m$ as $j \to \infty$. Assuming that: (i) there is a notion of convergence in $X$ such that “$\{f(x_j)\}_j$ bounded in $\mathbb{R}$ implies, up to subsequences, that $x_j \to x \in X$,” and (ii) “$f(x) \le \liminf_j f(x_j)$ whenever $x_j \to x$,” we conclude that any subsequential limit $x$ of $\{x_j\}_j$ is a minimizer of $m$, since, using in the order, $x \in X$, properties (i) and (ii), and the minimizing sequence property, we find

$$m \le f(x) \le \liminf_{j \to \infty} f(x_j) = m.$$
With this method in mind, and back to the original Monge problem (1.2), we assume that $M$ is finite (i.e., we assume the existence of at least one transport map with finite transport cost) and consider a minimizing sequence $\{T_j\}_j$ for (1.2). Thus, $\{T_j\}_j$ is a sequence of smooth and injective maps with
\[ \sigma(T_j(x))\,|\det \nabla T_j(x)| = \rho(x), \tag{1.3} \]
for all $x \in \{\rho > 0\}$ and $j \in \mathbb{N}$, and such that
\[ \lim_{j\to\infty} \int_{\mathbb{R}^n} |T_j - x|\,\rho = M < \infty. \tag{1.4} \]
¹ The transport cost $|x - y|$ is commonly named the “linear cost,” although evidently $(x, y) \mapsto |x - y|$ is not linear.
² Notice that a subsequence of a minimizing sequence is still a minimizing sequence.
Trying to check assumption (i) of the Direct Method, we ask if (1.4) implies the compactness of $\{T_j\}_j$, say, in the sense of pointwise (a.e.) convergence. Compactness criteria enforcing this kind of convergence, like the Ascoli–Arzelà criterion, or the compactness theorem of Sobolev spaces, would require some form of uniform control on the gradients (or on some sort of incremental ratio) of the maps $T_j$. It is, however, clear that no control of that sort is contained in (1.4). It is natural to think about pointwise convergence here, because should the maps $T_j$ converge pointwise to some limit $T$, then by Fatou’s lemma, we would find
\[ \int_{\mathbb{R}^n} |T - x|\,\rho \le \lim_{j\to\infty} \int_{\mathbb{R}^n} |T_j - x|\,\rho = M, \]
thus verifying assumption (ii) of the Direct Method. Finally, even if pointwise convergence could somehow be obtained, we would still face the issue of showing that the limit map $T$ belongs to the competition class (i.e., $T$ is smooth and injective, and it satisfies the transport constraint (1.1)) in order to infer $\int_{\mathbb{R}^n} |T - x|\,\rho \ge M$ and close the Direct Method argument. Deducing all these properties on $T$ definitely requires some form of convergence of $\nabla T_j$ toward $\nabla T$ (as is evident from the problem of passing to the limit the nonlinear constraint (1.3)), a task that is even more out of reach than proving the pointwise convergence of $T_j$ in the first place! Thus, even from a modern perspective, establishing the mere existence of minimizers in the original Monge problem is a formidable task.
1.2 A Modern Formulation of the Monge Problem
We now introduce the modern formulation of the Monge problem that will be used in the rest of this book. The first difference with respect to Monge’s original formulation is that we extend the class of distributions of mass to be transported to the whole family $\mathcal{P}(\mathbb{R}^n)$ of probability measures on $\mathbb{R}^n$. We usually denote by $\mu$ the origin distribution of mass, and by $\nu$ the final one, thus going back to Monge’s original formulation by setting $\mu = \rho\, d\mathcal{L}^n$ and $\nu = \sigma\, d\mathcal{L}^n$, where $\mathcal{L}^n$ is the Lebesgue measure on $\mathbb{R}^n$. This first change demands a second one, namely, we need to reformulate the pointwise transport condition (1.1) in a way that makes sense even when $\mu$ and $\nu$ are not absolutely continuous with respect to $\mathcal{L}^n$. This is done by resorting to the notion of push-forward (or direct image) of a measure through a map, which we now recall (see also Appendix A.4). We say that $T$ transports $\mu$ if there exists a Borel set $F \subset \mathbb{R}^n$ such that
\[ T : F \to \mathbb{R}^n \text{ is a Borel map, and } \mu \text{ is concentrated on } F \text{ (i.e., } \mu(\mathbb{R}^n \setminus F) = 0\text{)}. \tag{1.5} \]
Whenever $T : F \to \mathbb{R}^n$ transports $\mu$, we can define a Borel measure $T_\#\mu$ (the push-forward of $\mu$ through $T$) by setting, for every Borel set $E \subset \mathbb{R}^n$,
\[ (T_\#\mu)(E) = \mu\big(T^{-1}(E)\big), \qquad \text{where } T^{-1}(E) = \{ x \in F : T(x) \in E \}. \tag{1.6} \]
Notice that, according to this definition, $(T_\#\mu)(\mathbb{R}^n) = \mu(F)$; therefore, the requirement that $\mu$ is concentrated on $F$ in (1.5) is necessary to ensure that $T_\#\mu \in \mathcal{P}(\mathbb{R}^n)$ if $\mu \in \mathcal{P}(\mathbb{R}^n)$. Finally, we say that $T$ is a transport map from $\mu$ to $\nu$ if $T_\#\mu = \nu$. Clearly, the transport condition (1.6) does not require $T$ to be differentiable, nor injective; moreover, it boils down to the pointwise transport condition (1.1) whenever the latter makes sense, as illustrated in the following proposition.
Proposition 1.1 Let $\mu = \rho\, d\mathcal{L}^n$ and $\nu = \sigma\, d\mathcal{L}^n$ belong to $\mathcal{P}(\mathbb{R}^n)$, $\mu$ be concentrated on a Borel set $F$, and $T : F \to \mathbb{R}^n$ be an injective Lipschitz map. Then, $T_\#\mu = \nu$ if and only if
\[ \sigma(T(x))\,|\det \nabla T(x)| = \rho(x), \qquad \text{for } \mathcal{L}^n\text{-a.e. } x \in F. \tag{1.7} \]
Proof By the injectivity and Lipschitz continuity of $T$, the area formula,
\[ \int_{T(F)} \varphi(y)\,\sigma(y)\,dy = \int_F \varphi(T(x))\,|\det \nabla T(x)|\,\sigma(T(x))\,dx, \tag{1.8} \]
holds for every Borel function $\varphi : T(F) \to [0, \infty]$ (see Appendix A.10). Since $T$ is injective, for every Borel set $G \subset F$, we have $G = T^{-1}(T(G))$. Therefore, by definition of $T_\#\mu$ and by (1.8) with $\varphi = 1_{T(G)}$, we find
\[ (T_\#\mu)(T(G)) = \mu\big(T^{-1}(T(G))\big) = \int_G \rho, \qquad \nu(T(G)) = \int_{T(G)} \sigma(y)\,dy = \int_G |\det \nabla T(x)|\,\sigma(T(x))\,dx. \]
By arbitrariness of $G \subset F$, we find that $T_\#\mu = \nu$ if and only if (1.7) holds.
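As an illustration of Proposition 1.1, the following minimal sketch checks both the pointwise condition (1.7) and the push-forward identity (1.6) on a hypothetical one-dimensional example that is not taken from the text: $\rho$ uniform on $[0,1]$, $T(x) = 2x$, and $\sigma$ uniform on $[0,2]$. The function names and the midpoint quadrature are illustrative assumptions.

```python
# Hypothetical 1D example (not from the text): rho = 1 on [0, 1],
# T(x) = 2x, sigma = 1/2 on [0, 2].

def rho(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def sigma(y):
    return 0.5 if 0.0 <= y <= 2.0 else 0.0

def T(x):
    return 2.0 * x

def dT(x):
    # derivative of T; in dimension one, |det grad T| = |T'|
    return 2.0

def pushforward_mass(a, b, n=1000):
    """Midpoint-rule approximation of (T#mu)([a, b)) = mu({x : T(x) in [a, b)})."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        if a <= T(x) < b:
            total += rho(x) * h
    return total

def nu_mass(a, b):
    """Exact value of nu([a, b)) for nu = sigma dL^1."""
    lo, hi = max(a, 0.0), min(b, 2.0)
    return 0.5 * max(hi - lo, 0.0)

# Pointwise transport condition (1.7) at sample points of [0, 1].
pointwise_ok = all(
    abs(sigma(T(x)) * abs(dT(x)) - rho(x)) < 1e-12
    for x in [k / 100 for k in range(101)]
)
```

Comparing `pushforward_mass(a, b)` with `nu_mass(a, b)` on a few intervals gives a quick sanity check of $T_\#\mu = \nu$ for this particular example.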
Based on these considerations, given $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$, we formally introduce the Monge problem from $\mu$ to $\nu$ by letting
\[ M_1(\mu, \nu) = \inf\Big\{ \int_{\mathbb{R}^n} |T(x) - x|\,d\mu(x) \;:\; T_\#\mu = \nu \Big\}. \tag{1.9} \]
Problem (1.9) is, in principle, more tractable than (1.2). Transport maps are no longer required to be smooth and injective, as reflected in the new transport condition (1.6). It is still unclear, however, if, given two arbitrary μ, ν ∈ P (Rn ),
there always exists at least one transport map from $\mu$ to $\nu$, and if such a transport map can be found with finite transport cost; whenever this is not the case, we have $M_1(\mu, \nu) = +\infty$, and the Monge problem is ill posed. A more fundamental issue is that, even in a situation where we know from the onset that $M_1(\mu, \nu) < \infty$, it is still very much unclear how to verify assumption (i) in the Direct Method: what notion of subsequential convergence (for minimizing sequences $\{T_j\}_j$) is needed for passing the transport condition $(T_j)_\#\mu = \nu$ to a limit map $T$? This difficulty will eventually be solved by working with the Kantorovich formulation of the transport condition, which requires extending competition classes for transport problems from the family of transport maps to that of transport plans. From this viewpoint, the modern formulation of the Monge problem, as much as the original one, is still somewhat intractable by a direct approach. We can, of course, formulate the Monge problem with respect to a general³ transport cost $c : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. Interpreting $c(x, y)$ as the cost needed to transport a unit mass from $x$ to $y$ (notice that $c$ does not need to be symmetric in $(x, y)$!), we define the Monge problem with transport cost $c$ by setting
\[ M_c(\mu, \nu) = \inf\Big\{ \int_{\mathbb{R}^n} c(x, T(x))\,d\mu(x) \;:\; T_\#\mu = \nu \Big\}. \tag{1.10} \]
Our focus will be largely (but not completely) specific to the cases of the linear cost $c(x, y) = |x - y|$ and of the quadratic cost $c(x, y) = |x - y|^2$. In the following, when talking about “the Monge problem,” we shall either assume that the transport cost under consideration is evident from the context or otherwise add the specification “with general cost,” “with linear cost,” or “with quadratic cost.” From the historical viewpoint, of course, only the Monge problem with linear cost should be called “the Monge problem.”
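To make (1.10) concrete, here is a minimal brute-force sketch for a hypothetical discrete setting that is not discussed at this point of the text: $\mu$ and $\nu$ each put mass $1/N$ on $N$ distinct points of the real line. Under that assumption the transport maps from $\mu$ to $\nu$ are exactly the bijections between the atoms, so the infimum is a minimum over permutations. The helper name `monge_cost` is illustrative.

```python
from itertools import permutations

def monge_cost(xs, ys, cost):
    """Brute-force minimum of the average cost over bijections x_i -> y_perm(i).

    Assumes mu and nu put mass 1/N on N distinct points each, so that the
    transport maps from mu to nu are exactly the bijections between atoms.
    Only practical for small N, since it enumerates N! permutations.
    """
    n = len(xs)
    best_value, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        value = sum(cost(xs[i], ys[perm[i]]) for i in range(n)) / n
        if value < best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm

def linear(x, y):
    return abs(x - y)

def quadratic(x, y):
    return (x - y) ** 2

xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 3.0]
m1, perm1 = monge_cost(xs, ys, linear)
m2, perm2 = monge_cost(xs, ys, quadratic)
```

In this example several bijections attain the minimal linear cost, while only the order-preserving one attains the minimal quadratic cost.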
1.3 Optimality via Duality and Transport Rays
We now anticipate an observation that we will formally reintroduce later on⁴ in our study of the Kantorovich duality theory and that provides a simple and effective criterion to check the optimality of a transport map in the Monge problem. The remark is that, if $f : \mathbb{R}^n \to \mathbb{R}$ is a Lipschitz function with $\mathrm{Lip}(f) \le 1$ (briefly, a 1-Lipschitz function), and if $T$ is a transport map from $\mu$ to $\nu$, then
\[ \int_{\mathbb{R}^n} f\,d\nu - \int_{\mathbb{R}^n} f\,d\mu = \int_{\mathbb{R}^n} [f(T(x)) - f(x)]\,d\mu(x) \le \int_{\mathbb{R}^n} |T(x) - x|\,d\mu(x), \]
so that one always has
\[ \sup_{\mathrm{Lip}(f) \le 1} \Big\{ \int_{\mathbb{R}^n} f\,d\nu - \int_{\mathbb{R}^n} f\,d\mu \Big\} \le \inf_{T_\#\mu = \nu} \int_{\mathbb{R}^n} |T(x) - x|\,d\mu(x). \tag{1.11} \]
³ In practice, we shall work with transport costs that are at least lower semicontinuous, thus guaranteeing the Borel measurability of $x \mapsto c(x, T(x))$.
⁴ See Section 3.7.
In particular, if for a given transport map $T$ from $\mu$ to $\nu$, we can find a 1-Lipschitz function $f$ such that
\[ f(T(x)) - f(x) = |T(x) - x|, \qquad \text{for } \mu\text{-a.e. } x \in \mathbb{R}^n, \tag{1.12} \]
then, integrating (1.12) with respect to $d\mu$ and exploiting $T_\#\mu = \nu$, we find
\[ \int_{\mathbb{R}^n} f\,d\nu - \int_{\mathbb{R}^n} f\,d\mu = \int_{\mathbb{R}^n} |T(x) - x|\,d\mu(x), \]
thus deducing from (1.11) that $T$ is a minimizer in the Monge problem $M_1(\mu, \nu)$ (and, symmetrically, that $f$ is a maximizer in the “dual” maximization problem appearing on the left-hand side of (1.11)). We illustrate this idea with the so-called “book-shifting example.” Given $N \ge 2$, let us consider the Monge problem from $\mu$ to $\nu$ with
\[ \mu = \frac{1_{[0,N]}}{N}\,d\mathcal{L}^1, \qquad \nu = \frac{1_{[1,N+1]}}{N}\,d\mathcal{L}^1. \]
We can think of $\mu$ as a collection of $N$ books of mass $1/N$ that we want to shift to the right (not necessarily in their original order) by a unit length. The map $T(t) = t + 1$ for $t \in \mathbb{R}$ (corresponding to shifting each book to the right by a unit length) is a minimizer in $M_1(\mu, \nu)$ since it satisfies (1.12) with $f(t) = t$. By computing the transport cost of $T$, we see that $M_1(\mu, \nu) = 1$. We easily check that the transport map $S$ defined by $S(t) = t$, for $t \in [1, N]$, and $S(t) = t + N$, for $t \in [0, 1)$, which corresponds to moving only the left-most book to the right by a length equal to $N$, also has unit transport cost and thus is also optimal in $M_1(\mu, \nu)$. This shows, in particular, that the Monge problem can admit multiple minimizers. It is interesting to notice that the connection between optimal transport maps and 1-Lipschitz “potential functions” expressed in (1.12) was also clear to Monge, who rather focused on the more expressive identity
\[ \nabla f(x) = \frac{T(x) - x}{|T(x) - x|}. \tag{1.13} \]
The relation between (1.13) and (1.12) is clarified by noticing that $\mathrm{Lip}(f) \le 1$ and (1.12) imply that $f$ is affine with unit slope along the oriented segment from $x$ to $T(x)$, that is,
\[ f\big(x + t\,(T(x) - x)\big) = f(x) + t\,|T(x) - x|, \qquad \forall t \in [0, 1], \tag{1.14} \]
from which (1.13) follows⁵ if $f$ is differentiable at $x$. Such oriented segments are called transport rays, and their study plays a central role in the solution to the Monge problem (presented in Part IV). Notice that, by (1.14), the graph of $f$ above the union of such segments is a developable surface; this connection seems to be the reason why Monge started (independently from Euler) the systematic study of developable surfaces.
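The two transport costs in the book-shifting example can be checked numerically. The sketch below is only an illustration: the choice $N = 4$, the function names, and the midpoint quadrature are assumptions, while the maps $T(t) = t + 1$ and $S$ are those of the example.

```python
def transport_cost(T, N, samples_per_unit=1000):
    """Midpoint-rule approximation of the integral of |T(t) - t| dmu(t),
    where mu = (1/N) 1_[0,N] dL^1 as in the book-shifting example."""
    n = N * samples_per_unit
    h = N / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += abs(T(t) - t) * h / N
    return total

N = 4  # illustrative choice; the example only requires N >= 2

def shift_all(t):
    # T(t) = t + 1: every book moves one unit to the right
    return t + 1.0

def move_first(t):
    # S(t) = t + N on [0, 1), S(t) = t on [1, N]: only the left-most book moves
    return t + N if t < 1.0 else t

cost_T = transport_cost(shift_all, N)
cost_S = transport_cost(move_first, N)
```

Both values agree with $M_1(\mu, \nu) = 1$ up to quadrature error, consistent with the non-uniqueness of minimizers noted above.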
1.4 Monotone Transport Maps
In dimension $n = 1$, it is particularly easy to construct optimal transport maps by looking at monotone transport maps. Here we just informally discuss this important idea, which will be addressed rigorously in Chapter 16. It is quite intuitive that, in dimension 1, moving mass by monotone increasing maps must be a good transport strategy (the book-shifting example from Section 1.3 confirming that intuition). Considering the case when $\mu = \rho\, d\mathcal{L}^1$ and $\nu = \sigma\, d\mathcal{L}^1$, for an increasing map to be a transport map, we only need to check that the “rate of mass transfer,” i.e., the derivative of the transport map, is compatible with the transport condition (1.1), and this can be achieved quite easily by defining $T(x)$ through the formula
\[ \int_{-\infty}^{x} \rho = \int_{-\infty}^{T(x)} \sigma, \qquad \forall x \in \mathbb{R}. \tag{1.15} \]
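Formula (1.15) says that $T(x)$ is obtained by matching the cumulative masses of $\mu$ and $\nu$. The following minimal sketch solves that equation by bisection for a hypothetical pair of densities not taken from the text ($\rho = 1$ on $[0,1]$ and $\sigma(y) = 2y$ on $[0,1]$, for which $T(x) = \sqrt{x}$); the function name and the bracketing interval are assumptions.

```python
def monotone_rearrangement(cdf_mu, cdf_nu, x, lo, hi, iters=60):
    """Solve cdf_nu(T) = cdf_mu(x) for T in [lo, hi] by bisection.

    Assumes cdf_nu is continuous and strictly increasing on [lo, hi],
    and that [lo, hi] contains the support of nu.
    """
    target = cdf_mu(x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf_nu(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cdf_mu(x):
    # mu((-inf, x)) for rho = 1 on [0, 1]
    return min(max(x, 0.0), 1.0)

def cdf_nu(y):
    # nu((-inf, y)) for sigma(y) = 2y on [0, 1]
    return min(max(y, 0.0), 1.0) ** 2

# For these densities (1.15) reads x = T(x)^2, so T(1/4) should be 1/2.
T_quarter = monotone_rearrangement(cdf_mu, cdf_nu, 0.25, 0.0, 1.0)
```

Bisection is used here only because it needs nothing beyond monotonicity of the cumulative distribution function; any root finder would do.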
In more geometric terms, we are prescribing that the mass stored by $\mu$ to the left of $x$ corresponds to the mass stored by $\nu$ to the left of $T(x)$, i.e., we are setting $\mu((-\infty, x)) = \nu((-\infty, T(x)))$; see Figure 1.1. Indeed, an informal differentiation in $x$ of (1.15) gives $\rho(x) = T'(x)\,\sigma(T(x))$, which (thanks to $T' \ge 0$) is the pointwise transport condition (1.1). The map $T$ defined in (1.15) is called the monotone rearrangement of $\mu$ into $\nu$ and provides a minimizer in the Monge problem. We present here a simple argument in support of this assertion, which works under the assumption that $\{T \ge \mathrm{id}\} = \{x : T(x) \ge x\}$ and $\{T < \mathrm{id}\} = \{x : T(x) < x\}$ are equal, respectively, to complementary half-lines $[a, \infty)$ and $(-\infty, a)$ for some $a \in \mathbb{R}$: indeed, in this case, we can consider a 1-Lipschitz function $f : \mathbb{R} \to \mathbb{R}$ with f (x) = 1 {T ≥id} (x) − 1 {T 0, c(x i , y j ) = λ for every i, j. Indeed, in that case,
\[ \mathrm{Cost}(\gamma) = \sum_{i,j} c(x_i, y_j)\,\gamma_{ij} = \lambda \sum_{i=1}^{N} \sum_{j=1}^{M} \gamma_{ij} = \lambda \sum_{i=1}^{N} \mu_i = \lambda, \]
i.e., cost is constant on $\Gamma(\mu, \nu)$, so every discrete transport plan is optimal.
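The computation above can be reproduced on a small hypothetical instance: two origin sites and two destination sites with masses $1/2$ each, and a constant cost $\lambda = 3$. The two plans and the helper `plan_cost` are illustrative assumptions.

```python
def plan_cost(gamma, cost_matrix):
    """Cost of a discrete plan: sum over i, j of c(x_i, y_j) * gamma_ij."""
    return sum(
        c * g
        for row_c, row_g in zip(cost_matrix, gamma)
        for c, g in zip(row_c, row_g)
    )

lam = 3.0
cost_matrix = [[lam, lam], [lam, lam]]   # c(x_i, y_j) = lam for every i, j

# Two different plans with the same marginals mu = nu = (1/2, 1/2).
diagonal_plan = [[0.5, 0.0], [0.0, 0.5]]
product_plan = [[0.25, 0.25], [0.25, 0.25]]

cost_diagonal = plan_cost(diagonal_plan, cost_matrix)
cost_product = plan_cost(product_plan, cost_matrix)
```

Both plans have cost $\lambda$, as the identity $\mathrm{Cost}(\gamma) = \lambda$ predicts for any plan with total mass one.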
2.2 c-Cyclical Monotonicity with Discrete Measures
We now further develop the observation made in Remark 2.3 to obtain a necessary optimality condition for minimizers $\gamma$ in problem $K_c(\mu, \nu)$. We start by noticing that we could well have $\gamma_{ij} = 0$ for some pair of indexes $(i, j)$: when this happens, it means that the optimal plan $\gamma$ finds no advantage in sending any of the mass stored at $x_i$ to the destination $y_j$. We thus look at those pairs $(i, j)$ such that $\gamma_{ij} > 0$ and consider the set
\[ S(\gamma) = \{ (x_i, y_j) \in \mathbb{R}^n \times \mathbb{R}^n : \gamma_{ij} > 0 \} \tag{2.8} \]
of those pairs of locations in the supports of $\mu$ and $\nu$ that are exchanging mass under the plan $\gamma$. We now formulate a necessary condition for the optimality of $\gamma$ in terms of a geometric property of $S(\gamma)$.
Theorem 2.5 If $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ are discrete (i.e., if (2.1) and (2.2) hold) and $\gamma$ is a minimizer in the discrete Kantorovich problem $K_c(\mu, \nu)$, then for every finite subset $\{(z_\ell, w_\ell)\}_{\ell=1}^{L}$ of $S(\gamma)$ we have
\[ \sum_{\ell=1}^{L} c(z_\ell, w_\ell) \le \sum_{\ell=1}^{L} c(z_{\ell+1}, w_\ell), \tag{2.9} \]
where $z_{L+1} = z_1$.
Remark 2.6 Based on Theorem 2.5, we introduce the following crucial notion, to be discussed at length in the sequel: a set $S \subset \mathbb{R}^n \times \mathbb{R}^n$ is c-cyclically monotone if every finite subset $\{(z_\ell, w_\ell)\}_{\ell=1}^{L}$ of $S$ satisfies (2.9).
⁴ This situation can of course be achieved in many different ways: for example, we could work in $\mathbb{R}^2$, with $x_1 = (0, 0)$, $x_2 = (1, 1)$, $y_1 = (1, 0)$, $y_2 = (0, 1)$, and with $c(x, y)$ being any nonnegative function of the Euclidean distance $|x - y|$; see Figure 2.5.
Discrete Transport Problems
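Proof of Theorem 2.5 By construction, $z_\ell = x_{i(\ell)}$ and $w_\ell = y_{j(\ell)}$ for suitable functions $i : \{1, \dots, L\} \to \{1, \dots, N\}$ and $j : \{1, \dots, L\} \to \{1, \dots, M\}$, and $\alpha_\ell = \gamma_{i(\ell) j(\ell)} > 0$ for every $\ell$. Given $\varepsilon > 0$ with $\varepsilon < \min_\ell \alpha_\ell$, we construct a family of transport plans $\gamma^\varepsilon$ by making, first of all, the following changes:
\[ \begin{aligned}
\gamma_{i(1) j(1)} &\to \gamma^\varepsilon_{i(1) j(1)} = \gamma_{i(1) j(1)} - \varepsilon, \\
\gamma_{i(2) j(2)} &\to \gamma^\varepsilon_{i(2) j(2)} = \gamma_{i(2) j(2)} - \varepsilon, \\
&\;\;\vdots \\
\gamma_{i(L) j(L)} &\to \gamma^\varepsilon_{i(L) j(L)} = \gamma_{i(L) j(L)} - \varepsilon,
\end{aligned} \]
i.e., we decrease by $\varepsilon$ the amount of mass sent by $\gamma$ from $z_\ell = x_{i(\ell)}$ to $w_\ell = y_{j(\ell)}$. Without further changes, the resulting plan $\gamma^\varepsilon$ is not admissible: indeed, we have left unused an $\varepsilon$ of mass at each origin site $z_\ell$, while each of the destination sites $w_\ell$ is missing an $\varepsilon$ of mass. To fix things, we transport the excess mass $\varepsilon$ sitting at $z_{\ell+1}$ to $w_\ell$ and thus prescribe the following changes:
\[ \begin{aligned}
\gamma_{i(2) j(1)} &\to \gamma^\varepsilon_{i(2) j(1)} = \gamma_{i(2) j(1)} + \varepsilon, \\
\gamma_{i(3) j(2)} &\to \gamma^\varepsilon_{i(3) j(2)} = \gamma_{i(3) j(2)} + \varepsilon, \\
&\;\;\vdots \\
\gamma_{i(L+1) j(L)} &\to \gamma^\varepsilon_{i(L+1) j(L)} = \gamma_{i(1) j(L)} + \varepsilon,
\end{aligned} \]
where $i(L + 1) = 1$. Notice that, since $\varepsilon > 0$ by assumption, we do not need $\gamma_{i(\ell+1) j(\ell)}$ to be positive to prescribe the second round of changes. Finally, by setting $\gamma^\varepsilon_{ij} = \gamma_{ij}$ for every $(i, j) \notin \{(i(\ell), j(\ell)) : 1 \le \ell \le L\}$, we find that $\gamma^\varepsilon \in \Gamma(\mu, \nu)$ and therefore that
\[ 0 \le \mathrm{Cost}(\gamma^\varepsilon) - \mathrm{Cost}(\gamma) = \sum_{\ell=1}^{L} \big[ -\varepsilon\, c(z_\ell, w_\ell) + \varepsilon\, c(z_{\ell+1}, w_\ell) \big]. \]
Given that $\varepsilon > 0$, we deduce the validity of (2.9).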
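The cyclic rerouting used in the proof, and condition (2.9) itself, can be sketched in a few lines. The code below is an illustration under stated assumptions: points are real numbers, the check is a brute-force enumeration over all ordered selections of distinct pairs (feasible only for very small sets), and the names `is_c_cyclically_monotone` and `cyclic_perturbation_gain` are hypothetical.

```python
from itertools import permutations

def is_c_cyclically_monotone(pairs, cost, tol=1e-12):
    """Brute-force check of (2.9) over every ordered selection of distinct pairs."""
    for L in range(2, len(pairs) + 1):
        for chain in permutations(pairs, L):
            direct = sum(cost(z, w) for z, w in chain)
            # send the mass at z_{l+1} to w_l, with z_{L+1} = z_1
            shifted = sum(cost(chain[(l + 1) % L][0], chain[l][1]) for l in range(L))
            if direct > shifted + tol:
                return False
    return True

def cyclic_perturbation_gain(chain, cost, eps):
    """Cost(gamma^eps) - Cost(gamma) for the rerouting described in the proof."""
    L = len(chain)
    return eps * sum(
        cost(chain[(l + 1) % L][0], chain[l][1]) - cost(chain[l][0], chain[l][1])
        for l in range(L)
    )

def linear(x, y):
    return abs(x - y)

def quadratic(x, y):
    return (x - y) ** 2

ordered = [(0.0, 1.0), (1.0, 2.0)]   # x_1 -> y_1, x_2 -> y_2
crossed = [(0.0, 2.0), (1.0, 1.0)]   # x_1 -> y_2, x_2 -> y_1

gain = cyclic_perturbation_gain(crossed, quadratic, 0.1)
```

For the quadratic cost the crossed set violates (2.9), and the corresponding perturbation strictly lowers the cost; for the linear cost both sets pass the check.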
When minimizing (as done in $K_c(\mu, \nu)$) a linear function $f$ on a compact convex set $K$, if $f$ is nonconstant on $K$, then any minimum point $x_0$ will necessarily lie on $\partial K$. In particular, given a unit vector $\tau$ with $x_0 + t\,\tau \in K$ for every sufficiently small and positive $t$, by differentiating in $t$ the inequality $f(x_0 + t\,\tau) \ge f(x_0)$, we find that $\nabla f(x_0) \cdot \tau \ge 0$. The family of inequalities $\nabla f(x_0) \cdot \tau \ge 0$ indexed over all the admissible directions $\tau$ is then a necessary and sufficient condition for $x_0$ to be a minimum point of $f$ on $K$. From this viewpoint, in Theorem 2.5 we have identified a family of “directions $\tau$” that can be used to take admissible one-sided variations of an optimal transport plan $\gamma$. Understanding if c-cyclical monotonicity is not only a necessary condition for minimality but also a sufficient one is tantamount to proving
that such one-sided variations exhaust all the admissible ones. The Kantorovich duality theorem (Theorem 3.13) provides an elegant way to prove that this is indeed the case, i.e., that c-cyclical monotonicity fully characterizes optimality in $K_c(\mu, \nu)$. The main difficulties related to the Kantorovich duality theorem are not of a technical character (actually the theorem is deduced by somewhat elementary considerations) but rather conceptual, for example, if one insists (as it would seem natural when working with transport problems) on having a clear geometric understanding of things. Indeed, while the combinatorial nature of c-cyclical monotonicity is evident, its geometric content is definitely less immediate. Luckily, in the case of the quadratic cost $c(x, y) = |x - y|^2$, c-cyclical monotonicity is immediately related to convexity. By examining this relation in detail, and by moving in analogy with it in the case of general costs, we will develop geometric and analytical ways to approach c-cyclical monotonicity, as well as develop the Kantorovich duality theory. To explain the relation with convexity, it is sufficient to look at (2.9), with $c(x, y) = |x - y|^2$, to expand the squares $|z_\ell - w_\ell|^2$ and $|z_{\ell+1} - w_\ell|^2$, to cancel out the sums over $\ell$ of $|z_\ell|^2$, $|w_\ell|^2$ and $|z_{\ell+1}|^2$ (as we can thanks to $z_{L+1} = z_1$), and, finally, to obtain the equivalent condition
\[ \sum_{\ell=1}^{L} w_\ell \cdot (z_{\ell+1} - z_\ell) \le 0, \qquad \text{for all } \{(z_\ell, w_\ell)\}_{\ell=1}^{L} \subset S. \tag{2.10} \]
This condition is (very well) known in convex geometry as cyclical monotonicity (of $S$). As proved in the next section, (2.10) is equivalent to requiring that $S$ lies in the graph of the gradient of a convex function on $\mathbb{R}^n$. We can quickly anticipate this result by looking at the simple case when $n = 1$ and $L = 2$, and (2.10) just says $0 \le w_2\,(z_2 - z_1) - w_1\,(z_2 - z_1) = (w_2 - w_1)(z_2 - z_1)$, that is,
\[ \left\{ \begin{array}{l} (z_1, w_1),\ (z_2, w_2) \in S, \\ z_1 \le z_2, \end{array} \right. \quad \Rightarrow \quad w_1 \le w_2. \tag{2.11} \]
The geometric meaning of (2.11) is absolutely clear (see Figure 2.1): S must be contained in the extended graph of a monotone increasing function from R to R (where the term “extended” indicates that vertical segments corresponding to jump points are included in the graph). Since monotone functions are the gradients of convex functions, the connection between convexity and OMT problems with quadratic transport cost is drawn.
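The cancellation that turns (2.9) with quadratic cost into (2.10) amounts to the identity $\sum_\ell |z_{\ell+1} - w_\ell|^2 - \sum_\ell |z_\ell - w_\ell|^2 = -2 \sum_\ell w_\ell \cdot (z_{\ell+1} - z_\ell)$. The sketch below checks it numerically on random chains; the helper names, the random test data, and the sample points on the graph of $t \mapsto t^3$ are assumptions made for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def gap_2_9(chain):
    """Right-hand side minus left-hand side of (2.9) for c(x, y) = |x - y|^2."""
    L = len(chain)
    shifted = sum(sq_dist(chain[(l + 1) % L][0], chain[l][1]) for l in range(L))
    direct = sum(sq_dist(z, w) for z, w in chain)
    return shifted - direct

def lhs_2_10(chain):
    """Left-hand side of (2.10): sum of w_l . (z_{l+1} - z_l), with z_{L+1} = z_1."""
    L = len(chain)
    total = 0.0
    for l in range(L):
        z_next, z = chain[(l + 1) % L][0], chain[l][0]
        total += dot(chain[l][1], tuple(a - b for a, b in zip(z_next, z)))
    return total

rng = random.Random(0)
random_chain = [
    (tuple(rng.uniform(-1, 1) for _ in range(3)),
     tuple(rng.uniform(-1, 1) for _ in range(3)))
    for _ in range(5)
]

# n = 1: points on the graph of the increasing function t -> t^3.
monotone_chain = [((0.0,), (0.0,)), ((1.0,), (1.0,)), ((3.0,), (27.0,))]
```

Since the gap in (2.9) equals $-2$ times the left-hand side of (2.10), the two conditions hold or fail together.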
Figure 2.1 (a) The graph of an increasing function $f : \mathbb{R} \to \mathbb{R}$ with a discontinuity at a point $x_0$. The black dot indicates that the function takes the lowest possible value compatible with being increasing; (b) The extended graph of $f$ is a subset of $\mathbb{R}^2$ which contains the graph of $f$ and the whole vertical segment of values that $f$ may take at $x_0$ without ceasing to be increasing. The property of being contained in the extended graph of an increasing function is easily seen to be equivalent to (2.11).
2.3 Basics about Convex Functions on $\mathbb{R}^n$
We now review some key concepts concerning convex functions on $\mathbb{R}^n$ that play a central role in our discussion. In OMT it is both natural and convenient to consider convex functions taking values in $\mathbb{R} \cup \{+\infty\}$. Since this setting may be unfamiliar to some readers, we offer here a review of the main results, including proofs of the less obvious ones.
Convex sets: A convex set in $\mathbb{R}^n$ is a set $K \subset \mathbb{R}^n$ such that $t\,x + (1 - t)\,y \in K$ whenever $t \in (0, 1)$ and $x, y \in K$. If $K \ne \emptyset$, the (affine) dimension of $K$ is defined as the dimension of the smallest affine space containing $K$. The relative interior $\mathrm{Ri}(K)$ of a convex set $K$ is its interior as a subset of the smallest affine space containing it; of course $\mathrm{Ri}(K) = \mathrm{Int}(K)$, where $\mathrm{Int}(K)$ is the set of interior points of $K$ as a subset of $\mathbb{R}^n$, whenever $K$ has dimension $n$. Given $E \subset \mathbb{R}^n$ we say that $z \in \mathbb{R}^n$ is a convex combination in $E$ if $z = \sum_{i=1}^{N} t_i\,x_i$ for some coefficients $t_i \in [0, 1]$ such that $\sum_{i=1}^{N} t_i = 1$ and some $\{x_i\}_{i=1}^{N} \subset E$. The convex envelope $\mathrm{conv}(E)$ of $E \subset \mathbb{R}^n$ is the collection of all the convex combinations in $E$. The convex envelope can be characterized as the intersection of all the convex sets containing $E$. Of course, $K$ is convex if and only if $K = \mathrm{conv}(K)$.
Convex functions: A function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a convex function if
\[ f(t\,x + (1 - t)\,y) \le t\,f(x) + (1 - t)\,f(y), \qquad \forall t \in [0, 1],\ x, y \in \mathbb{R}^n, \tag{2.12} \]
or, equivalently, if the epigraph of $f$, $\mathrm{Epi}(f) = \{(x, t) : t \ge f(x)\} \subset \mathbb{R}^{n+1}$, is a convex set in $\mathbb{R}^{n+1}$. The domain of $f$, $\mathrm{Dom}(f) = \{x \in \mathbb{R}^n : f(x) < \infty\} = \{f < \infty\}$, is a convex set in $\mathbb{R}^n$. Notice that, with this definition, whenever $f$
is not identically equal to $+\infty$, the dimension of $\mathrm{Dom}(f)$ could be any integer between 0 and $n$. Given a convex set $K$ in $\mathbb{R}^n$ and a function $f : K \to \mathbb{R}$ satisfying (2.12) for $x, y \in K$, by extending $f = +\infty$ on $\mathbb{R}^n \setminus K$, we obtain a convex function with $K = \mathrm{Dom}(f)$; therefore, the point of view adopted here includes what is probably the more standard notion of “finite-valued, convex function defined on a convex set” that readers may be familiar with. The indicator function $I_K$ of a convex set $K \subset \mathbb{R}^n$, defined by setting $I_K(x) = 0$ if $x \in K$ and $I_K(x) = +\infty$ if $x \notin K$, is a convex function. In particular, the basic optimization problem “minimize a finite-valued convex function $g$ over a convex set $K$” can be simply recast as the minimization over $\mathbb{R}^n$ of the $\mathbb{R} \cup \{+\infty\}$-valued function $f = g + I_K$, that is
\[ \inf_{K} g = \inf_{\mathbb{R}^n} \big( g + I_K \big). \tag{2.13} \]
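Identity (2.13) is easy to experiment with using floating-point infinity. The sketch below is a hypothetical one-dimensional instance ($g(x) = (x - 2)^2$, $K = [0, 1]$) evaluated on a finite grid, so it illustrates the bookkeeping with $+\infty$ rather than proving anything; all names are assumptions.

```python
import math

def indicator(in_K):
    """Indicator function I_K in the sense of convex analysis: 0 on K, +inf outside."""
    return lambda x: 0.0 if in_K(x) else math.inf

def g(x):
    return (x - 2.0) ** 2

def in_K(x):
    return 0.0 <= x <= 1.0

I_K = indicator(in_K)

def f(x):
    # extended-valued function f = g + I_K
    return g(x) + I_K(x)

grid = [k / 100.0 for k in range(-300, 301)]
inf_over_K = min(g(x) for x in grid if in_K(x))
inf_over_R = min(f(x) for x in grid)

# Midpoint convexity of f on a coarse grid; inequalities involving +inf hold trivially.
coarse = [k / 10.0 for k in range(-30, 31)]
midpoint_convex = all(
    f(0.5 * (x + y)) <= 0.5 * (f(x) + f(y)) + 1e-12
    for x in coarse for y in coarse
)
```

Both infima are attained at $x = 1$, the point of $K$ closest to the unconstrained minimizer of $g$.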
Finally, a more practical reason for considering $\mathbb{R} \cup \{+\infty\}$-valued convex functions is that, as we shall see subsequently, many natural convex functions arise by taking suprema of families of affine functions, and thus they may very well take the value $+\infty$ outside of a convex set.
Lipschitz continuity and a.e. differentiability: We prove that convex functions are always locally Lipschitz⁵ in the relative interiors of their domains. We give details in the case when $\mathrm{Dom}(f)$ has affine dimension $n$, since the general case is proved similarly. Let $\Omega = \mathrm{Int}\,\mathrm{Dom}(f)$, and let us first prove that $f$ is locally bounded in $\Omega$. To this end, given $B_r(x) \subset\subset \Omega$ we notice that, for every $z \in B_r(x)$, $y = 2x - z \in B_r(x)$ is such that $x = (y + z)/2$. Hence, $f(x) \le (f(y) + f(z))/2$, which gives
\[ \inf_{B_r(x)} f \ge 2\,f(x) - \sup_{B_r(x)} f, \]
i.e., $f$ is locally bounded in $\Omega$ if it is locally bounded from above in $\Omega$. To show the boundedness from above, let us fix $n + 1$ unit vectors $\{v_i\}_{i=1}^{n+1}$ in $\mathbb{R}^n$ so that the simplex $\Sigma$ with vertexes $v_i$ (defined as the set of all the convex combinations $\sum_{i=1}^{n+1} t_i\,v_i$ corresponding to $0 < t_i < 1$ with $\sum_{i=1}^{n+1} t_i = 1$) is an open set in $\mathbb{R}^n$ containing the origin in its interior. Now, for each $x \in \Omega$, we can find $r > 0$ such that $\Sigma_{x,r} = x + r\,\Sigma$ is contained in $\Omega$, and since the convexity of $f$ implies that
\[ f\Big( \sum_{i=1}^{N} t_i\,x_i \Big) \le \sum_{i=1}^{N} t_i\,f(x_i), \qquad \forall N \in \mathbb{N},\ \forall \{t_i\}_{i=1}^{N} \subset [0, 1] \text{ s.t. } \sum_{i=1}^{N} t_i = 1,\ \forall \{x_i\}_{i=1}^{N} \subset \mathbb{R}^n, \tag{2.14} \]
we conclude that
\[ \sup_{\Sigma_{x,r}} f \le \max_{1 \le i \le n+1} f(x + r\,v_i) < \infty. \]
⁵ See Appendix A.10 for the basics on Lipschitz functions.
By a covering argument, we conclude that $f$ is locally bounded (from above and, thus, also from below) in $\Omega$. We next exploit local boundedness to show that $f$ is locally Lipschitz in $\Omega$. Indeed, if $B_{2r}$ is a ball of radius $2r$ compactly contained in $\Omega$, and if $B_r$ is concentric to $B_{2r}$ with radius $r$, then
\[ \mathrm{Lip}(f; B_r) \le \frac{1}{r} \Big( \sup_{B_{2r}} f - \inf_{B_{2r}} f \Big). \]
To show this, pick $x, y \in B_r$ and write $y = t\,x + (1 - t)\,z$ for some $z \in \partial B_{2r}$. Then, $|x - y| = |1 - t|\,|x - z| \ge r\,|1 - t|$ so that
\[ f(y) - f(x) \le t\,f(x) + (1 - t)\,f(z) - f(x) \le |1 - t|\,|f(x) - f(z)| \le \frac{|x - y|}{r} \Big( \sup_{B_{2r}} f - \inf_{B_{2r}} f \Big). \]
In particular, by Rademacher’s theorem, $f$ is a.e. differentiable in $\Omega$, and a simple consequence of (2.12) shows that
\[ f(y) \ge f(x) + \nabla f(x) \cdot (y - x), \qquad \forall y \in \mathbb{R}^n, \tag{2.15} \]
whenever $f$ is differentiable at $x$ with gradient $\nabla f(x)$. Condition (2.15) expresses the familiar property that convex functions lie above the tangent hyperplanes to their graphs whenever the latter are defined. In fact, inequality (2.15) points at a very fruitful way to think about convex functions, which we are now going to discuss.
Convex functions as suprema of affine functions: If $\mathcal{A}$ is any family of affine functions on $\mathbb{R}^n$ (i.e., if $\alpha \in \mathcal{A}$, then $\alpha(x) = a + y \cdot x$ for some $a \in \mathbb{R}$ and $y \in \mathbb{R}^n$), then it is trivial to check that
\[ f = \sup_{\alpha \in \mathcal{A}} \alpha \tag{2.16} \]
defines a convex function on $\mathbb{R}^n$ with values in $\mathbb{R} \cup \{+\infty\}$. A convex function defined in this way is automatically lower semicontinuous on $\mathbb{R}^n$, as it is the supremum of continuous functions. Of course, not every convex function is going to be lower semicontinuous on $\mathbb{R}^n$ (e.g., $f(x) = I_{[0,1)}(x)$ is not lower semicontinuous at $x = 1$), so not every convex function will satisfy an identity like (2.16) on the whole $\mathbb{R}^n$. However, it is not hard to deduce from (2.15) that
\[ f(z) = \sup\big\{ f(x) + \nabla f(x) \cdot (z - x) : f \text{ is differentiable at } x \big\}, \qquad \forall z \in \mathrm{Int}\,\mathrm{Dom}(f), \tag{2.17} \]
so that (2.16) always holds on $\mathrm{Int}\,\mathrm{Dom}(f)$ if $\mathcal{A} = \{\alpha_x\}_x$ for $x$ ranging among the points of differentiability of $f$, $\alpha_x(z) = a_x + y_x \cdot z$, $a_x = f(x) - \nabla f(x) \cdot x$, and $y_x = \nabla f(x)$. We now introduce the concepts of subdifferential and Fenchel–Legendre transform of a convex function. These concepts lead to a representation formula for convex functions similar to (but more robust than) (2.17).
Subdifferential at a point: Given a convex function $f$, a point $x \in \mathrm{Dom}(f)$, and a hyperplane $L$ in $\mathbb{R}^{n+1}$, we say that $L$ is a supporting hyperplane of $f$ at $x$ if $L$ is the graph of an affine function $\alpha$ on $\mathbb{R}^n$ such that $\alpha \le f$ on $\mathbb{R}^n$ and $\alpha(x) = f(x)$. If $a \in \mathbb{R}$ and $y \in \mathbb{R}^n$ are such that $\alpha(z) = a + y \cdot z$ for all $z \in \mathbb{R}^n$, then $y$ is called the slope of $L$. The subdifferential $\partial f(x)$ of $f$ at $x$ is defined as follows: if $x \in \mathrm{Dom}(f)$, then we set
\[ \begin{aligned}
\partial f(x) &= \{ \text{slopes of all the supporting hyperplanes of } f \text{ at } x \} \\
&= \big\{ y \in \mathbb{R}^n : \exists\, a \in \mathbb{R} \text{ s.t. } a + y \cdot z \le f(z)\ \forall z \in \mathbb{R}^n,\ a + y \cdot x = f(x) \big\} \\
&= \big\{ y \in \mathbb{R}^n : f(z) \ge f(x) + y \cdot (z - x)\ \forall z \in \mathbb{R}^n \big\};
\end{aligned} \tag{2.18} \]
otherwise, i.e., if $f(x) = +\infty$, we set $\partial f(x) = \emptyset$. If $f$ is differentiable at some $x \in \mathrm{Dom}(f)$, then $x \in \mathrm{Int}\,\mathrm{Dom}(f)$ (because $f$ is finite in a neighborhood of $x$) and
\[ \partial f(x) = \{\nabla f(x)\} \tag{2.19} \]
(see Proposition 2.7 for the proof). At a generic point $x \in \mathrm{Dom}(f)$, where $f$ may not be differentiable, we always have that $\partial f(x)$ is a closed convex set in $\mathbb{R}^n$. For example, if $f(x) = |x|$, then $\partial f(0)$ is the closed unit ball in $\mathbb{R}^n$ centered at the origin (see Figure 2.2); if $f$ is the maximum of finitely many affine functions $\alpha_i$, with slope $y_i$, then $\partial f(x)$ is the convex envelope of those $y_i$ such that $x$ belongs to $\{f = \alpha_i\}$. In the following proposition we prove (2.19) together with a sort of continuity property of subdifferentials.
Proposition 2.7 (Continuity of subdifferentials) If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$. Moreover, for every $\varepsilon > 0$ there exists $\delta > 0$ such that
\[ \partial f(B_\delta(x)) \subset B_\varepsilon(\nabla f(x)). \tag{2.20} \]
Proof Step one: We prove (2.19). Given $y \in \partial f(x)$, set
\[ F_{x,y}(z) = f(z) - f(x) - y \cdot (z - x), \qquad z \in \mathbb{R}^n. \]
Notice that $F_{x,y}$ has a minimum at $x$, since $F_{x,y}(z) \ge 0 = F_{x,y}(x)$ for every $z \in \mathbb{R}^n$. In particular, if $f$ is differentiable at $x$, then $F_{x,y}$ is differentiable at $x$ with $0 = \nabla F_{x,y}(x) = \nabla f(x) - y$ so that (2.19) is proved.
Figure 2.2 The subdifferential of $f(x) = |x|$ when $n = 1$.
Step two: If (2.20) fails, then there exist $\varepsilon > 0$ and $x_j \to x$ as $j \to \infty$ such that $|y_j - \nabla f(x)| \ge \varepsilon$ for every $j$ and $y_j \in \partial f(x_j)$. It is easily seen that since $f$ is bounded in a neighborhood of $x$, the sequence $\{y_j\}_j$ must be bounded in $\mathbb{R}^n$ and thus, up to extracting subsequences, that $y_j \to y$ as $j \to \infty$. By taking limits as $j \to \infty$ in “$f(z) \ge f(x_j) + y_j \cdot (z - x_j)$ for every $z \in \mathbb{R}^n$,” we deduce that $y \in \partial f(x) = \{\nabla f(x)\}$, in contradiction with $|y_j - \nabla f(x)| \ge \varepsilon$ for every $j$.
Fundamental theorem of (convex) Calculus: It is well-known that a smooth function $g : \mathbb{R} \to \mathbb{R}$ is the derivative $f'$ of a smooth convex function $f : \mathbb{R} \to \mathbb{R}$ if and only if $g$ is increasing on $\mathbb{R}$. The notion of subdifferential and a proper generalization of monotonicity to $\mathbb{R}^n$ allow us to extend this theorem to $\mathbb{R} \cup \{+\infty\}$-valued convex functions on $\mathbb{R}^n$. First of all, let us introduce the notion of (total) subdifferential of $f$, defined as
\[ \partial f = \bigcup_{x \in \mathbb{R}^n} \{x\} \times \partial f(x); \tag{2.21} \]
see Figure 2.2. Notice that $\partial f$ is a subset of $\mathbb{R}^n \times \mathbb{R}^n$, which is closed as soon as $f$ is lower semicontinuous. Recall that, as set in (2.10), $S \subset \mathbb{R}^n \times \mathbb{R}^n$ is cyclically monotone if for every finite set $\{(x_i, y_i)\}_{i=1}^{N} \subset S$ one has
\[ \sum_{i=1}^{N} y_i \cdot (x_{i+1} - x_i) \le 0, \qquad \text{where } x_{N+1} = x_1. \tag{2.22} \]
Thus, we have the following theorem.
Theorem 2.8 (Rockafellar theorem) Let $S \subset \mathbb{R}^n \times \mathbb{R}^n$ be a non-empty set: $S$ is cyclically monotone if and only if there exists a convex and lower semicontinuous function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that
\[ S \subset \partial f. \tag{2.23} \]
Remark 2.9 Notice that $S \ne \emptyset$ and $S \subset \partial f$ imply $\mathrm{Dom}(f) \ne \emptyset$.
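For a finite set $S$, the convex function in Theorem 2.8 can be computed explicitly. The sketch below follows the chain construction (2.24) used in the proof: it first computes, for each pair in $S$, the largest chain sum from the base pair to that pair, and then takes a maximum of finitely many affine functions. The relaxation loop, which mimics a longest-path computation, is an implementation choice made here and is not part of the text; it relies on the assumption that $S$ is cyclically monotone, so that no cycle has positive total weight. All names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def rockafellar_potential(S, base=0):
    """Return f(z) as in (2.24) for a finite, cyclically monotone list S of pairs (x, y).

    h[k] is the largest value of the chain sum among chains that start at
    S[base] and end at S[k]. Since no cycle has positive weight, len(S)
    rounds of relaxation are enough to reach the maximum.
    """
    m = len(S)
    h = [-math.inf] * m
    h[base] = 0.0
    for _ in range(m):
        for a, (xa, ya) in enumerate(S):
            if h[a] == -math.inf:
                continue
            for b, (xb, _) in enumerate(S):
                h[b] = max(h[b], h[a] + dot(ya, sub(xb, xa)))

    def f(z):
        # last link of the chain: y_N . (z - x_N)
        return max(h[k] + dot(yk, sub(z, xk)) for k, (xk, yk) in enumerate(S))

    return f

# Points on the graph of the gradient of x -> x^2 / 2 in dimension one.
S = [((0.0,), (0.0,)), ((-1.0,), (-1.0,)), ((2.0,), (2.0,))]
f = rockafellar_potential(S, base=0)
grid = [(k / 4.0,) for k in range(-12, 13)]
subgradient_ok = all(
    f(z) >= f(xk) + dot(yk, sub(z, xk)) - 1e-12
    for xk, yk in S for z in grid
)
```

In this example the result is the piecewise affine function $f(z) = \max\{-z - 1,\ 0,\ 2z - 4\}$, which vanishes at the base point and has each $y_k$ as a subgradient at $x_k$.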
Proof Proof that (2.23) implies cyclical monotonicity: Let us consider a finite subset $\{(x_i, y_i)\}_{i=1}^{N}$ of $S$. Then, $y_i \in \partial f(x_i)$ implies $\partial f(x_i) \ne \emptyset$, and thus $f(x_i) < \infty$. For every $i = 1, \dots, N$, we know that
\[ f(x) \ge f(x_i) + y_i \cdot (x - x_i), \qquad \forall x \in \mathbb{R}^n, \]
so that testing the i-th inequality at $x = x_{i+1}$ and summing up over $i = 1, \dots, N$ gives
\[ \sum_{i=1}^{N} f(x_{i+1}) \ge \sum_{i=1}^{N} f(x_i) + \sum_{i=1}^{N} y_i \cdot (x_{i+1} - x_i). \]
We find (2.22) since $\sum_{i=1}^{N} f(x_{i+1}) = \sum_{i=1}^{N} f(x_i)$ by the convention $x_{N+1} = x_1$.
Proof that cyclical monotonicity implies (2.23): We need to define a convex function $f$ which contains $S$ in its subdifferential. To this end, we fix⁶ $(x_0, y_0) \in S$ and define
\[ f(z) = \sup\Big\{ y_N \cdot (z - x_N) + \sum_{i=1}^{N-1} y_i \cdot (x_{i+1} - x_i) + y_0 \cdot (x_1 - x_0) \;:\; \{(x_i, y_i)\}_{i=1}^{N} \subset S \Big\} \tag{2.24} \]
for $z \in \mathbb{R}^n$.
⁶ The choice of $(x_0, y_0)$ is analogous to the choice of an arbitrary additive constant in the classical fundamental theorem of Calculus.
Clearly, $f$ is a convex and lower semicontinuous function on $\mathbb{R}^n$ with values in $\mathbb{R} \cup \{+\infty\}$. We also notice that $f(x_0) \in \mathbb{R}$. Indeed, by applying (2.22) to $\{(x_i, y_i)\}_{i=0}^{N} \subset S$ we find
\[ y_N \cdot (x_0 - x_N) + \sum_{i=1}^{N-1} y_i \cdot (x_{i+1} - x_i) + y_0 \cdot (x_1 - x_0) \le 0, \qquad \forall \{(x_i, y_i)\}_{i=1}^{N} \subset S, \tag{2.25} \]
so that $f(x_0) \le 0$. (Actually, we can even say that $f(x_0) = 0$, since $f(x_0) \ge 0$ by testing (2.24) with $\{(x_1, y_1)\} = \{(x_0, y_0)\}$ at $z = x_0$.) Interestingly, the proof that $f(x_0) < \infty$ is the only point of this argument where cyclical monotonicity plays a role. We now prove that $S \subset \partial f$. Indeed, let $(x_*, y_*) \in S$ and let $t \in \mathbb{R}$ be such that $t < f(x_*)$. By definition of $f(x_*)$, we can find $\{(x_i, y_i)\}_{i=1}^{N} \subset S$ such that
\[ y_N \cdot (x_* - x_N) + \sum_{i=0}^{N-1} y_i \cdot (x_{i+1} - x_i) \ge t. \tag{2.26} \]
If we now define $\{(x_i, y_i)\}_{i=1}^{N+1} \subset S$ by setting $x_{N+1} = x_*$ and $y_{N+1} = y_*$, then, by testing the definition of $f$ with $\{(x_i, y_i)\}_{i=1}^{N+1} \subset S$, we find that, for every $z \in \mathbb{R}^n$,
\[ \begin{aligned}
f(z) &\ge y_{N+1} \cdot (z - x_{N+1}) + \sum_{i=0}^{N} y_i \cdot (x_{i+1} - x_i) \\
&= y_* \cdot (z - x_*) + y_N \cdot (x_* - x_N) + \sum_{i=0}^{N-1} y_i \cdot (x_{i+1} - x_i) \\
&\ge t + y_* \cdot (z - x_*),
\end{aligned} \tag{2.27} \]
where in the last inequality we have used (2.26). Since f (z) is finite at z = x 0 , by letting t → f (x ∗ ) − in (2.27) first with z = x 0 , we see that x ∗ ∈ Dom( f ), and then, by taking the same limit for an arbitrary z, we see that y∗ ∈ ∂ f (x ∗ ). Fenchel–Legendre transform: Given a convex function f , and the slope y of a supporting hyperplane to f , we know that there exists a ∈ R such that a + y · x ≤ f (x),
∀x ∈ Rn .
The largest value of $a \in \mathbb{R}$ such that this condition holds can be obviously characterized as $a = -f^*(y)$, where

$$ f^*(y) = \sup\big\{ y \cdot x - f(x) : x \in \mathbb{R}^n \big\}. \tag{2.28} $$

The function $f^*$ is called the Fenchel–Legendre transform of $f$. It is a convex function, and it is automatically lower semicontinuous on $\mathbb{R}^n$. Moreover, as is easily seen, $f^{**}$ is the lower semicontinuous envelope of $f$, i.e., the largest lower semicontinuous function lying below $f$: in particular, if $f$ is convex and lower semicontinuous, then $f = f^{**}$, i.e.,

$$ f(x) = \sup\big\{ x \cdot y - f^*(y) : y \in \mathbb{R}^n \big\} \qquad \forall x \in \mathbb{R}^n. \tag{2.29} $$

This is the “more robust” reformulation of (2.17). The last basic fact about convex functions that will be needed in the sequel is contained in the following two assertions:

$$ f(x) + f^*(y) \ge x \cdot y, \qquad \forall x, y \in \mathbb{R}^n, \tag{2.30} $$

$$ f(x) + f^*(y) = x \cdot y, \qquad \text{iff } x \in \mathrm{Dom}(f) \text{ and } y \in \partial f(x). \tag{2.31} $$
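As a rough numerical illustration of (2.28) and (2.30), the following sketch approximates $f^*$ in dimension one by maximizing over a finite grid, for $f(x) = |x|^p/p$ with $p = 3$, and compares the result with $|y|^{p'}/p'$, where $p' = p/(p-1)$. The grid, the sample points and the helper names (`legendre_on_grid`, `f_star_exact`) are assumptions made only for this illustration; a grid maximum can only underestimate the supremum.

```python
def legendre_on_grid(f, y, grid):
    """Approximate f*(y) = sup_x (y*x - f(x)) by a maximum over a finite grid."""
    return max(y * x - f(x) for x in grid)

p = 3.0
p_conj = p / (p - 1.0)            # conjugate exponent p' = p / (p - 1)

def f(x):
    return abs(x) ** p / p

def f_star_exact(y):
    # Closed form of the transform of |x|^p / p in dimension one.
    return abs(y) ** p_conj / p_conj

# Grid on [-5, 5] with step 1e-3 (illustrative choice; it contains the maximizers).
grid = [k / 1000.0 for k in range(-5000, 5001)]

samples = [-2.0, -0.5, 0.0, 1.0, 1.5]
# Gap between the closed form and the grid approximation (small and nonnegative).
gaps = [f_star_exact(y) - legendre_on_grid(f, y, grid) for y in samples]

# Fenchel-Legendre inequality (2.30) at a few sample pairs (x, y).
young_ok = all(
    f(x) + f_star_exact(y) >= x * y - 1e-12
    for x in (-1.0, 0.3, 2.0)
    for y in samples
)
```

For this choice of $f$, inequality (2.30) is exactly Young's inequality, which is what `young_ok` checks on the sample pairs.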
Notice that (2.30) is immediate from the definition (2.28). If $f(x) + f^*(y) = x \cdot y$, then $x \cdot y - f(x) = f^*(y) \ge y \cdot z - f(z)$, i.e., $f(z) \ge f(x) + y \cdot (z - x)$ for every $z \in \mathbb{R}^n$, i.e., $y \in \partial f(x)$; and, vice versa, if $y \in \partial f(x)$, then $x \cdot y - f(x) \ge y \cdot z - f(z)$ for every $z \in \mathbb{R}^n$ so that $f^*(y) \le x \cdot y - f(x)$, which combined with (2.30) gives $f^*(y) = x \cdot y - f(x)$. Many common inequalities in analysis can be interpreted as instances of the Fenchel–Legendre inequality (2.30): for example, if $1 < p < \infty$ and $f(x) = |x|^p/p$, one computes that $f^*(y) = |y|^{p'}/p'$ for $p' = p/(p-1)$, and thus finds [7] that (2.30) boils down to the classical Young’s inequality.

Extremal points and the Choquet theorem: [8] Given a convex set $K$, we say that $x_0$ is an extremal point of $K$ if $x_0 = (1-t)\,y + t\,z$ with $t \in [0,1]$ and $y, z \in K$ implies that either $t = 0$ or $t = 1$. We claim that

$$ \text{if } K \subsetneq \mathbb{R}^n \text{ is non-empty, closed, and convex, then } K \text{ has at least one extremal point.} \tag{2.32} $$
To this end, we argue by induction on $n$, with the case $n = 1$ being trivial. If $n \ge 2$, since $K$ is not empty and not equal to $\mathbb{R}^n$, there is a closed half-space $H$ such that $K \subset H$ and $\partial H \cap \partial K \ne \emptyset$. In particular, $J = K \cap \partial H$ is a convex set with affine dimension $(n-1)$, and, by inductive hypothesis, there is an extremal point $x_0$ of $J$. We conclude by showing that $x_0$ is also an extremal point of $K$. Should this not be the case, we could find $t \in (0,1)$ and $x, y \in K$ such that $x_0 = (1-t)\,x + t\,y$. On the one hand, it must be $x, y \in \partial H$: otherwise, assuming, for example, that $x \in \mathrm{Int}(H)$, by $x_0 \in \partial H$ and $t \in (0,1)$ we would then find $y \notin H$, against $y \in K$; on the other hand, $x, y \in \partial H$ implies $x, y \in J$, and thus $x_0 = (1-t)\,x + t\,y$ with $t \in (0,1)$ would contradict the fact that $x_0$ is an extremal point of $J$.

Having proved (2.32), we deduce from it the following statement (known as the Choquet theorem):

$$ \begin{gathered} \text{if } K \subset \mathbb{R}^n \text{ is convex and compact, and } f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\} \text{ is convex and lower semicontinuous,} \\ \text{then there is an extremal point } x_0 \text{ of } K \text{ such that } f(x_0) = \inf_K f. \end{gathered} \tag{2.33} $$

This is trivially true by (2.32) if $\mathrm{Dom}(f) = \emptyset$. Otherwise, since $f$ is lower semicontinuous and $K$ is compact, we can apply the Direct Method to show that the set $J$ of the minimum points of $f$ over $K$ is non-empty and compact. Since $f$ is convex, $J$ is also convex. Hence, by (2.32), $J$ admits an extremal point, and (2.33) is proved.
[7] Of course, we have not just discovered an incredibly short proof of Young’s inequality: indeed, showing that $f^*(y) = |y|^{p'}/p'$ is equivalent to proving Young’s inequality! From this viewpoint, the importance of Fenchel’s inequality is more conceptual than practical.
[8] These results are only used in Section 2.5 and can be omitted on a first reading.

2.4 The Discrete Kantorovich Problem with Quadratic Cost

We now use the fundamental theorem of Calculus for convex functions proved in Section 2.3 to give a complete discussion of the discrete transport problem
with quadratic transport cost $c(x,y) = |x-y|^2$. In particular, we make our first encounter with the Kantorovich duality formula; see (2.36), which comes into play as our means for proving that cyclical monotonicity is a sufficient condition for minimality in the transport problem.

Theorem 2.10 If $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ are discrete (i.e., if (2.1) and (2.2) hold) and $c(x,y) = |x-y|^2$, then, for every discrete transport plan $\gamma \in \Gamma(\mu,\nu)$, the following three statements are equivalent:
(i) $\gamma$ is a minimizer of $K_c(\mu,\nu)$;
(ii) $S(\gamma) = \{(x_i, y_j) : \gamma_{ij} > 0\} \subset \mathbb{R}^n \times \mathbb{R}^n$ is cyclically monotone;
(iii) there exists a convex function $f$ such that $S(\gamma) \subset \partial f$.
Moreover, denoting by $\mathcal{H}$ the family of pairs $(\alpha, \beta)$ such that $\alpha, \beta : \mathbb{R}^n \to \mathbb{R}$ satisfy

$$ \alpha(x) + \beta(y) \le -x \cdot y \qquad \forall x, y \in \mathbb{R}^n \tag{2.34} $$

and defining $H : \mathcal{H} \to \mathbb{R}$ by setting

$$ H(\alpha, \beta) = \sum_{i=1}^N \alpha(x_i)\,\mu_i + \sum_{j=1}^M \beta(y_j)\,\nu_j, $$

we have, for every $\gamma \in \Gamma(\mu,\nu)$ and $(\alpha,\beta) \in \mathcal{H}$,

$$ \sum_{i,j} |x_i - y_j|^2\,\gamma_{ij} \ge \sum_{i=1}^N |x_i|^2\,\mu_i + \sum_{j=1}^M |y_j|^2\,\nu_j + 2\,H(\alpha,\beta). \tag{2.35} $$

Finally, if $\gamma$ satisfies (iii), then (2.35) holds as an identity with $(\alpha,\beta) = (-f, -f^*)$. In particular,

$$ K_c(\mu,\nu) = \sum_{i=1}^N |x_i|^2\,\mu_i + \sum_{j=1}^M |y_j|^2\,\nu_j + 2 \sup_{(\alpha,\beta) \in \mathcal{H}} H(\alpha,\beta). \tag{2.36} $$
Remark 2.11 Theorem 2.10 is, of course, a particular case of Theorem 3.20, in which the same assertions are proved without the discreteness assumption on $\mu$ and $\nu$.

Proof of Theorem 2.10 Step one: We prove that (i) implies (ii) and that (ii) implies (iii). If (i) holds, then, by Theorem 2.5, we have

$$ \sum_{\ell=1}^{L} |z_\ell - w_\ell|^2 \le \sum_{\ell=1}^{L} |z_{\ell+1} - w_\ell|^2, \tag{2.37} $$

whenever $\{(z_\ell, w_\ell)\}_{\ell=1}^L \subset S(\gamma)$ and $z_{L+1} = z_1$. By expanding the squares in (2.37),

$$ \sum_{\ell=1}^{L} w_\ell \cdot (z_{\ell+1} - z_\ell) \le 0, \qquad \forall \{(z_\ell, w_\ell)\}_{\ell=1}^L \subset S(\gamma), \tag{2.38} $$

so that $S(\gamma)$ is cyclically monotone. In turn, if (ii) holds, then (iii) follows immediately by Rockafellar’s theorem (Theorem 2.8).
Step two: For every $\gamma \in \Gamma(\mu,\nu)$ we have

$$ \mathrm{Cost}(\gamma) = \sum_{i,j} |x_i - y_j|^2\,\gamma_{ij} = \sum_{i=1}^N |x_i|^2\,\mu_i + \sum_{j=1}^M |y_j|^2\,\nu_j + 2 \sum_{i,j} (-x_i \cdot y_j)\,\gamma_{ij}, $$

so that (2.34) gives

$$ \mathrm{Cost}(\gamma) - \sum_{i=1}^N |x_i|^2\,\mu_i - \sum_{j=1}^M |y_j|^2\,\nu_j = -2 \sum_{i,j} x_i \cdot y_j\,\gamma_{ij} \ge 2\,H(\alpha,\beta), $$

that is (2.35). Now, if $\gamma$ satisfies (iii), then, by the Fenchel–Legendre inequality (2.30), we have $(-f, -f^*) \in \mathcal{H}$, while (2.31) and $S(\gamma) \subset \partial f$ give

$$ f(x_i) + f^*(y_j) = x_i \cdot y_j, \qquad \text{if } \gamma_{ij} > 0, \tag{2.39} $$
which in turn implies that (2.35) holds as an identity if we choose $(\alpha,\beta) = (-f, -f^*)$. This shows at once that (2.36) holds and that $\gamma$ is a minimizer of $K_c(\mu,\nu)$.

The following three remarks concern the lack of uniqueness in the discrete Kantorovich problem.

Remark 2.12 We already know that uniqueness does not hold in problem $K_c(\mu,\nu)$ for arbitrary data; recall Remark 2.4. However, the following statement (which will be proved in full generality in Theorem 3.15) provides a “uniqueness statement of sorts” for the quadratic transport cost: If $S = \bigcup_\gamma S(\gamma)$, where $\gamma$ ranges over all the optimal plans in the quadratic-cost transport problem defined by two discrete measures $\mu$ and $\nu$, then $S$ is cyclically monotone; in particular, there exists a convex function $f$ such that $S(\gamma) \subset \partial f$ for every such optimal plan $\gamma$. We interpret this as a uniqueness statement since the subdifferential $\partial f$ appearing in it has the property of “bundling together” all the optimal transport plans of problem $K_c(\mu,\nu)$. As done in Remark 2.13, this property can indeed be exploited to prove uniqueness in special situations. Notice that the cyclical monotonicity of $S = \bigcup_\gamma S(\gamma)$ is not obvious, since, in general, the union of cyclically monotone sets is not cyclically monotone; see Figure 2.3. The reason why $S = \bigcup_\gamma S(\gamma)$ is, nevertheless, cyclically monotone lies in the linearity of Cost combined with the convexity of $\Gamma(\mu,\nu)$. Together, these two properties imply that the set $\Gamma_{\mathrm{opt}}(\mu,\nu)$ of all optimal transport plans for $K_c(\mu,\nu)$ is convex: in particular, if $\gamma^1$ and $\gamma^2$ are optimal in $K_c(\mu,\nu)$, then $(\gamma^1 + \gamma^2)/2$ is an optimal plan, and thus $S((\gamma^1 + \gamma^2)/2)$ is cyclically monotone, and since

$$ S\Big( \frac{\gamma^1 + \gamma^2}{2} \Big) = S(\gamma^1) \cup S(\gamma^2), $$

we conclude that $S(\gamma^1) \cup S(\gamma^2)$ is also cyclically monotone.
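The failure of cyclical monotonicity under unions can be checked by brute force on small finite sets. The sketch below is only an illustration: the function name `is_cyclically_monotone` and the sample sets `S1`, `S2` are assumptions, not objects taken from the text. It tests the inequality $\sum_i y_i \cdot (x_{i+1} - x_i) \le 0$ over all cycles of distinct pairs, which is enough for a finite set because any cycle with repetitions splits into cycles without repetitions.

```python
from itertools import permutations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_cyclically_monotone(S, tol=1e-12):
    """Brute-force check of sum_i y_i . (x_{i+1} - x_i) <= 0 (with x_{N+1} = x_1)
    over all cycles of distinct pairs (x_i, y_i) of the finite set S."""
    S = list(S)
    for N in range(2, len(S) + 1):
        for cycle in permutations(S, N):
            total = 0.0
            for i in range(N):
                x_i, y_i = cycle[i]
                x_next = cycle[(i + 1) % N][0]
                step = tuple(a - b for a, b in zip(x_next, x_i))
                total += dot(y_i, step)
            if total > tol:
                return False
    return True

# Points of R x R written as pairs of 1-tuples (x, y).
S1 = [((0.0,), (0.0,)), ((1.0,), (2.0,))]   # lies on an increasing graph
S2 = [((2.0,), (1.0,))]                      # a single pair

results = (
    is_cyclically_monotone(S1),
    is_cyclically_monotone(S2),
    is_cyclically_monotone(S1 + S2),
)
```

Here the pairs $(1, 2) \in S_1$ and $(2, 1) \in S_2$ satisfy $(y_2 - y_1)(x_2 - x_1) < 0$, which is the kind of violation described in the caption of Figure 2.3. The search is exponential in the size of the set and is meant only for very small examples.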
Figure 2.3 Two sets $S_1$ and $S_2$ that are cyclically monotone in $\mathbb{R} \times \mathbb{R}$, whose union $S_1 \cup S_2$ is not cyclically monotone. Indeed, by suitably picking points $(x_1, y_1) \in S_1$ and $(x_2, y_2) \in S_2$, we see that $(y_2 - y_1)(x_2 - x_1) < 0$, thus violating the cyclical monotonicity inequality on $\{(x_i, y_i)\}_{i=1}^2$.
Remark 2.13 (Uniqueness in dimension one) When $n = 1$, the statement in Remark 2.12 can be used to prove the uniqueness of minimizers for the discrete Kantorovich problem with quadratic cost. We only discuss this result informally. First of all, we construct a monotone discrete transport plan $\gamma^*$ from $\mu$ to $\nu$. Assuming without loss of generality that $\{x_i\}_{i=1}^N$ and $\{y_j\}_{j=1}^M$ are indexed so that $x_i < x_{i+1}$ and $y_j < y_{j+1}$, we define $\gamma^*_{ij}$ as follows: if $\mu_1 \le \nu_1$, then set $\gamma^*_{11} = \mu_1$, and $\gamma^*_{1j} = 0$ for $j \ge 2$; otherwise, we let $j(1)$ be the largest index $j$ such that $\mu_1 > \nu_1 + \dots + \nu_j$ and set

$$ \gamma^*_{1j} = \begin{cases} \nu_j, & \text{if } 1 \le j \le j(1), \\ \mu_1 - (\nu_1 + \dots + \nu_{j(1)}), & \text{if } j = j(1) + 1, \\ 0, & \text{if } j(1) + 1 < j \le M. \end{cases} $$

In this way we have allocated all the mass $\mu_1$ sitting at $x_1$ among the first $j(1)+1$ receiving sites, with the first $j(1)$ receiving sites completely filled. Next, we distribute the mass $\mu_2$ at site $x_2$: we move the largest possible fraction of it to $y_{j(1)+1}$ (which can now receive a $\nu_{j(1)+1} - [\mu_1 - (\nu_1 + \dots + \nu_{j(1)})]$ amount of mass), and keep moving any excess mass to the subsequent sites $y_{j(1)+k}$, $k \ge 2$, if needed. Evidently, the resulting transport plan $\gamma^*$ is such that $S(\gamma^*)$ is contained in the extended graph of an increasing function so that $\gamma^*$ is indeed optimal in $K_c(\mu,\nu)$. A heuristic explanation of why this is the unique optimal transport plan is given in Figure 2.4a. For a proof, see Theorem 16.1-(i,ii).
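The filling procedure of Remark 2.13 can be sketched as a short greedy routine. This is a minimal illustration, assuming that the weights are positive, have equal total mass, and are listed in the order of increasing $x_i$ and $y_j$; the function name `monotone_plan` and the example weights are chosen for this illustration, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def monotone_plan(mu, nu):
    """Greedy filling: send the mass of x_1, x_2, ... to y_1, y_2, ... in order,
    filling each receiving site before moving on to the next one.
    mu[i] and nu[j] are the weights at the sorted sites x_i and y_j."""
    N, M = len(mu), len(nu)
    gamma = [[Fraction(0)] * M for _ in range(N)]
    left_mu = [Fraction(m) for m in mu]   # mass still to be sent from each x_i
    left_nu = [Fraction(n) for n in nu]   # capacity still available at each y_j
    i = j = 0
    while i < N and j < M:
        moved = min(left_mu[i], left_nu[j])
        gamma[i][j] += moved
        left_mu[i] -= moved
        left_nu[j] -= moved
        if left_mu[i] == 0:
            i += 1      # site x_i is empty: pass to the next source
        else:
            j += 1      # site y_j is full: pass to the next target
    return gamma

mu = [Fraction(3, 4), Fraction(1, 4)]
nu = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
gamma_star = monotone_plan(mu, nu)

row_sums = [sum(row) for row in gamma_star]
col_sums = [sum(col) for col in zip(*gamma_star)]
```

In this example the first source fills $y_1$ and $y_2$ and sends its remaining mass to $y_3$, which is then completed by the second source, so the support of the plan moves monotonically through the index pairs.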
Remark 2.14 We review Remark 2.4 in light of the results of this chapter. Denoting with superscripts the coordinates of points, so that $p = (p^1, p^2)$ is the generic point of $\mathbb{R}^2$, we take

$$ x_1 = (0, 1), \qquad x_2 = (0, -1), \qquad y_1 = (-1, 0), \qquad y_2 = (1, 0). $$
Figure 2.4 (a): A discrete transport problem with quadratic cost and $n = 1$. The black squares indicate all the possible interaction pairs $(x_i, y_j)$. Weights $\mu_i$ and $\nu_j$ are such that $S(\gamma^*)$ consists of the circled black squares. If $\gamma$ is another optimal transport plan, then, by the statement in Remark 2.12, $S(\gamma) \cup S(\gamma^*)$ is contained in the subdifferential of a convex function. This implies that $S(\gamma) \setminus S(\gamma^*)$ can only contain either $(x_2, y_3)$ or $(x_3, y_2)$. However, given the construction of $\gamma^*$, the fact that $S(\gamma^*)$ is “jumping diagonally” from $(x_2, y_2)$ to $(x_3, y_3)$ means that $\mu_1 + \mu_2 = \nu_1 + \nu_2$. So there is no mass left for $\gamma$ to try something different: if $\gamma$ activates $(x_2, y_3)$ (i.e., if $\gamma$ sends a fraction of $\mu_2$ to $y_3$), then $\gamma$ must activate $(x_3, y_2)$ (sending a corresponding fraction of $\mu_3$ to $y_2$ to compensate the first modification), and this violates cyclical monotonicity. Therefore, $\gamma^*$ is the unique optimal plan for $K_c(\mu,\nu)$. (b): A geometric representation of a potential $f$ such that $S(\gamma^*) \subset \partial f$. Notice that $\partial f(x_1) = [y_1, y_2]$ (with $y_1 = f'(x_1^-)$ and $y_2 = f'(x_1^+)$), $\partial f(x_2) = \{f'(x_2) = y_2\}$, $\partial f(x_3) = [y_3, y_5]$ (with $y_3 = f'(x_3^-)$, $y_5 = f'(x_3^+)$ and $y_4$ in the interior of $\partial f(x_3)$), and $\partial f(x_4) = \{f'(x_4) = y_5\}$. Notice that we have large freedom in accommodating the $y_j$s as elements of the subdifferentials $\partial f(x_i)$; in particular, we can find other convex functions $g$ with $S(\gamma^*) \subset \partial g$ and such that $f - g$ is not constant.
No matter what values of $\mu_i$ and $\nu_j$ are chosen, all the admissible transport plans will have the same cost. If, say, $\mu_i = \nu_j = 1/2$ for all $i$ and $j$, then all the plans

$$ \gamma^t_{11} = t, \qquad \gamma^t_{12} = \frac{1}{2} - t, \qquad \gamma^t_{21} = \frac{1}{2} - t, \qquad \gamma^t_{22} = t, $$

corresponding to $t \in [0, 1/2]$ are optimal. If $t \in (0, 1/2)$, then $S(\gamma^t)$ contains all the four possible pairs $S = \{(x_i, y_j)\}_{i,j}$. How many convex functions (modulo additive constants) can contain $S$ in their subdifferential? Just one. Indeed, the slopes $y_1 = (-1, 0)$ and $y_2 = (1, 0)$ correspond to the affine functions $\ell(p) = a - p^1$ and $m(p) = b + p^1$ for $a, b \in \mathbb{R}$. The only way for $y_1, y_2 \in \partial f(x_1) \cap \partial f(x_2)$ is that the set $\{\ell = m\}$ contains both $x_1 = (0, 1)$ and $x_2 = (0, -1)$. Hence, we must have $a = b$ and, modulo additive constants, there exists a unique convex potential; see Figure 2.5.
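The claim that all the plans $\gamma^t$ have the same cost can be verified directly, since every pair $(x_i, y_j)$ in this example is at squared distance $2$. The following sketch, whose helper names (`sq_dist`, `plan`, `cost`) are introduced only for this illustration, evaluates the quadratic cost for a few values of $t \in [0, 1/2]$.

```python
from fractions import Fraction

# The four points of Remark 2.14.
x = [(0, 1), (0, -1)]
y = [(-1, 0), (1, 0)]

def sq_dist(p, q):
    """Squared Euclidean distance |p - q|^2."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def plan(t):
    """The plan gamma^t with entries t and 1/2 - t (mu_i = nu_j = 1/2)."""
    half = Fraction(1, 2)
    return [[t, half - t], [half - t, t]]

def cost(gamma):
    return sum(
        sq_dist(x[i], y[j]) * gamma[i][j]
        for i in range(2)
        for j in range(2)
    )

# t = 0, 1/10, ..., 1/2
costs = [cost(plan(Fraction(k, 10))) for k in range(6)]
```

Every entry of `costs` equals $2$, in agreement with the fact that $|x_i - y_j|^2 = 2$ for all $i, j$ and that each plan has total mass one.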
Figure 2.5 An example of discrete transport problem with quadratic cost where we have nonuniqueness of optimal transport plans, but where there is a unique (up to additive constants) convex potential, $f(p) = |p^1|$, such that the statement in Remark 2.12 holds.
2.5 The Discrete Monge Problem

We close this chapter with a brief discussion of the discrete Monge problem. The following theorem is our main result in this direction.

Theorem 2.15 If $\{x_i\}_{i=1}^L$ and $\{y_j\}_{j=1}^L$ are families of $L$ distinct points, and $\mu$ and $\nu$ denote the discrete measures

$$ \mu = \sum_{i=1}^L \frac{\delta_{x_i}}{L}, \qquad \nu = \sum_{j=1}^L \frac{\delta_{y_j}}{L}, \tag{2.40} $$

then for every optimal transport plan $\gamma$ in $K_c(\mu,\nu)$ there is a transport map $T$ from $\mu$ to $\nu$ with the same transport cost as $\gamma$; in particular, $T$ is optimal in $M_c(\mu,\nu)$ and $M_c(\mu,\nu) = K_c(\mu,\nu)$.

Remark 2.16 It is interesting to notice that, by a perturbation argument, given an arbitrary pair of discrete probability measures $(\mu,\nu)$ (i.e., $\mu$ and $\nu$ satisfy (2.1) and (2.2)), we can find a sequence $\{(\mu^\ell, \nu^\ell)\}_\ell$ of discrete probability measures such that $\mu^\ell \overset{*}{\rightharpoonup} \mu$ and $\nu^\ell \overset{*}{\rightharpoonup} \nu$ as $\ell \to \infty$, and each $(\mu^\ell, \nu^\ell)$ satisfies the assumptions of Theorem 2.15 (with some $L = L_\ell$ in (2.40)). In a first approximation step, we can reduce to the case when all the weights $\mu_i$ and $\nu_j$ are rational numbers. Writing these weights with a common denominator $L$, we find $m_i, n_j \in \{1, \dots, L\}$ such that

$$ \mu_i = \frac{m_i}{L}, \qquad \nu_j = \frac{n_j}{L}, \qquad L = \sum_{i=1}^N m_i = \sum_{j=1}^M n_j. $$
Figure 2.6 The second step in the approximation procedure of Remark 2.16: (a) The starting measures $\mu = (3/4)\,\delta_{x_1} + (1/4)\,\delta_{x_2}$ and $\nu = (1/4)(\delta_{y_1} + \delta_{y_2}) + (1/2)\,\delta_{y_3}$; notice that there is no transport map between these two measures; (b) The measures $\mu^t = (1/4) \sum_{h=1}^4 \delta_{X_h}$ and $\nu^t = (1/4) \sum_{h=1}^4 \delta_{Y_h}$ resulting from the approximation by splitting with $t > 0$. As $t \to 0^+$, these measures weak-star converge to $\mu$ and $\nu$ respectively. Notice that there is a transport map from $\mu^t$ to $\nu^t$ corresponding to every permutation of $\{1, \dots, 4\}$. One of them is optimal for the Monge problem with quadratic cost from $\mu^t$ to $\nu^t$.
Then, in a second approximation step, we consider a set of $L$ distinct unit vectors $\{\tau_k\}_{k=1}^L$ in $\mathbb{R}^n$ and notice that, on the one hand,

$$ \frac{1}{L} \sum_{k=1}^{m_i} \delta_{x_i + t\,\tau_k} \overset{*}{\rightharpoonup} \frac{m_i}{L}\,\delta_{x_i} \qquad \text{as } t \to 0^+, $$

while, on the other hand, for every sufficiently small but positive $t$, both

$$ \{X_h\}_{h=1}^L = \{x_i + t\,\tau_k : 1 \le i \le N,\ 1 \le k \le m_i\}, \qquad \{Y_h\}_{h=1}^L = \{y_j + t\,\tau_k : 1 \le j \le M,\ 1 \le k \le n_j\}, $$

consist of $L$-many distinct points. By combining these two approximation steps, we find a sequence $\{(\mu^j, \nu^j)\}_j$ with the required properties; see Figure 2.6.

Proof of Theorem 2.15 We notice that, by (2.40),

$$ \Gamma(\mu,\nu) = \Big\{ \{\gamma_{ij}\} : \gamma_{ij} = \frac{b_{ij}}{L},\ b = \{b_{ij}\} \in B_L \Big\}, \tag{2.41} $$

where $B_L \subset \mathbb{R}^{L \times L}$ is the set of the $L \times L$-bistochastic matrices $b = \{b_{ij}\}$, i.e., for every $i, j$, $b_{ij} \in [0,1]$ and $\sum_i b_{ij} = \sum_j b_{ij} = 1$. Therefore, by the Choquet theorem (2.33), if $\gamma$ is an optimal plan in $K_c(\mu,\nu)$, then $b = \{b_{ij} = L\,\gamma_{ij}\}$ is an
extremal point of $B_L$. The latter are characterized as follows (a result known as the Birkhoff theorem):

$$ \text{permutation matrices are the extremal points of } B_L, \tag{2.42} $$

where $b = \{b_{ij}\}$ is an $L \times L$-permutation matrix if $b_{ij} = \delta_{j,\sigma(i)}$ for a permutation $\sigma$ of $\{1, \dots, L\}$. The proof of Birkhoff’s theorem is very similar to the proof of Theorem 2.5 and goes as follows. It is enough to prove that if $b \in B_L$ and $b_{i(1)j(1)} \in (0,1)$ for some pair $(i(1), j(1))$, then $b$ is not an extremal point of $B_L$. Indeed, by $\sum_j b_{i(1)j} = 1$ we can find, on the $i(1)$-row of $b$, an entry $b_{i(1)j(2)} \in (0,1)$ with $j(2) \ne j(1)$. We can then find, in the $j(2)$-column of $b$, an entry $b_{i(2)j(2)} \in (0,1)$ with $i(2) \ne i(1)$. If we iterate this procedure, there is a first step $k \ge 3$ such that either $j(k) = j(1)$ or $i(k) = i(1)$. In the first case we have identified an even number of entries $b_{ij}$ of $b$; in the second case, discarding the entry $b_{i(1)j(1)}$, we have also identified an even number of entries $b_{ij}$ of $b$; in both cases, these entries are arranged into a closed loop in the matrix representation of $b$, and belong to $(0,1)$. We can exploit this cyclical structure to define a family of variations $b^t$ of $b$: considering, for notational simplicity, the case when $j(3) = j(1)$, these variations take the form

$$ b^t_{i(1)j(1)} = b_{i(1)j(1)} + t, \qquad b^t_{i(1)j(2)} = b_{i(1)j(2)} - t, \qquad b^t_{i(2)j(1)} = b_{i(2)j(1)} - t, \qquad b^t_{i(2)j(2)} = b_{i(2)j(2)} + t, $$

and $b^t_{ij} = b_{ij}$ otherwise. In this way there is $t_0 \in (0,1)$ such that $b^t = \{b^t_{ij}\} \in B_L$ whenever $|t| \le t_0$. In particular, by $b = (b^{t_0} + b^{-t_0})/2$, we see that $b$ is not an extremal point of $B_L$ and conclude the proof of (2.42).

Having proved that for every discrete optimal transport plan $\gamma$ in $K_c(\mu,\nu)$ there is a permutation $\sigma$ of $\{1, \dots, L\}$ such that $\gamma_{ij} = \delta_{\sigma(i),j}/L$, we can define a map $T : \{x_i\}_{i=1}^L \to \mathbb{R}^n$ by setting $T(x_i) = y_{\sigma(i)}$. By construction, $T_\#\mu = \nu$, with

$$ M_c(\mu,\nu) \le \int_{\mathbb{R}^n} c(x, T(x))\,d\mu(x) = \mathrm{Cost}(\gamma) = K_c(\mu,\nu) \le M_c(\mu,\nu), $$

where we have used (2.7) and the general inequality $K_c(\mu,\nu) \le M_c(\mu,\nu)$.
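Since, under the assumptions of Theorem 2.15, the value $K_c(\mu,\nu)$ is attained at a plan induced by a permutation, for a very small number of points the optimal transport map can be found by enumerating all permutations. The sketch below does this for the quadratic cost; the function name `best_transport_map` and the sample points are assumptions made for the example, and the enumeration is exponential in $L$, so it is an illustration rather than a practical algorithm.

```python
from fractions import Fraction
from itertools import permutations

def sq_dist(p, q):
    """Squared Euclidean distance |p - q|^2."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def best_transport_map(xs, ys):
    """Minimize (1/L) * sum_i |x_i - y_sigma(i)|^2 over permutations sigma.
    Returns (sigma, cost), where T(x_i) = y_sigma[i]."""
    L = len(xs)

    def total(sigma):
        return sum(sq_dist(xs[i], ys[sigma[i]]) for i in range(L))

    sigma = min(permutations(range(L)), key=total)
    return sigma, Fraction(total(sigma), L)

# Three unit-weight atoms on the real line, each of mass 1/3.
xs = [(0,), (1,), (3,)]
ys = [(5,), (2,), (4,)]
sigma, cost = best_transport_map(xs, ys)
```

In this one-dimensional example the minimizing permutation pairs the points in increasing order ($0 \mapsto 2$, $1 \mapsto 4$, $3 \mapsto 5$), consistently with the monotone construction of Remark 2.13.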
3 The Kantorovich Problem
In this chapter we finally present the general formulation of the Kantorovich problem. In Section 3.1 we extend the notion of transport plan, introduced in the discrete setting in Section 2.1, to the case of general probability measures. We then formulate the Kantorovich problem (Section 3.2), prove the existence of minimizers (Section 3.3), the c-cyclical monotonicity of supports of optimal transport plans (Section 3.4), the Kantorovich duality theorem (Section 3.5), and a few auxiliary, general results on c-cyclical monotonicity (Section 3.6). We close the chapter with Section 3.7, where we examine the Kantorovich duality theorem in the cases of the linear and quadratic transport costs.
3.1 Transport Plans

In Section 2.1, given two discrete probability measures

$$ \mu = \sum_{i=1}^N \mu_i\,\delta_{x_i}, \qquad \nu = \sum_{j=1}^M \nu_j\,\delta_{y_j}, $$

we have introduced discrete transport plans from $\mu$ to $\nu$ as matrices $\gamma = \{\gamma_{ij}\} \in \mathbb{R}^{N \times M}$, with the idea that the entry $\gamma_{ij} \in [0,1]$ represents the amount of mass at $x_i$ to be transported to $y_j$. We have then imposed the transport conditions

$$ \mu_i = \sum_{j=1}^M \gamma_{ij}, \qquad \nu_j = \sum_{i=1}^N \gamma_{ij} $$

(all the mass sitting at $x_i$ is transported to $\nu$, and all the mass needed at $y_j$ arrives from $\mu$) to express the condition that $\gamma$ transports $\mu$ into $\nu$. When working with arbitrary probability measures $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$, it is just natural to consider probability measures $\gamma$ on $\mathbb{R}^n \times \mathbb{R}^n$, rather than matrices, with the idea that $\gamma(E \times F)$ is the amount of mass sitting at the Borel set $E \subset \mathbb{R}^n$ that,
under the instructions contained in $\gamma$, has to be transported to the Borel set $F \subset \mathbb{R}^n$. The conditions expressing that $\gamma$ is a transport plan from $\mu$ to $\nu$ are then

$$ \gamma(E \times \mathbb{R}^n) = \mu(E), \qquad \gamma(\mathbb{R}^n \times F) = \nu(F), \qquad \forall E, F \in \mathcal{B}(\mathbb{R}^n), \tag{3.1} $$

or, using the notation of push-forward of measures (see Appendix A.4),

$$ p_\#\gamma = \mu, \qquad q_\#\gamma = \nu, \tag{3.2} $$

where $p, q : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ are the projection maps defined, for $(x,y) \in \mathbb{R}^n \times \mathbb{R}^n$, by $p(x,y) = x$ and $q(x,y) = y$. The family of the transport plans from $\mu$ to $\nu$ is then denoted by

$$ \Gamma(\mu,\nu) = \big\{ \gamma \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n) : p_\#\gamma = \mu,\ q_\#\gamma = \nu \big\}. \tag{3.3} $$

It is easily seen that if $\mu$ and $\nu$ satisfy (2.1) and (2.2), and $\gamma = \{\gamma_{ij}\} \in \Gamma(\mu,\nu) \subset \mathbb{R}^{N \times M}$ for $\Gamma(\mu,\nu)$ defined as in (2.4), then

$$ \gamma = \sum_{i,j} \gamma_{ij}\,\delta_{(x_i, y_j)} \in \Gamma(\mu,\nu) $$

for $\Gamma(\mu,\nu)$ defined as in (3.3). In other words, the notation set in Chapter 2 is compatible with the one just introduced now. It is also convenient to notice that (3.1) can be equivalently formulated in terms of test functions, by saying that

$$ \int_{\mathbb{R}^n \times \mathbb{R}^n} \varphi(x)\,d\gamma(x,y) = \int_{\mathbb{R}^n} \varphi\,d\mu, \qquad \int_{\mathbb{R}^n \times \mathbb{R}^n} \varphi(y)\,d\gamma(x,y) = \int_{\mathbb{R}^n} \varphi\,d\nu, \tag{3.4} $$

whenever $\varphi \in C_b^0(\mathbb{R}^n)$ or $\varphi : \mathbb{R}^n \to [0,\infty]$ is a Borel function.

The notion of transport plan just introduced is deeper than it may seem at first sight. There is a subtle indeterminacy in the transport instructions contained in $\gamma \in \Gamma(\mu,\nu)$. Indeed, given $x \in \mathbb{R}^n$, we cannot really answer the question “Where are we taking the mass sitting at $x$?” All we can say is that, given $r > 0$, we must take a $\gamma(B_r(x) \times B_r(y))$ amount of the mass stored by $\mu$ in $B_r(x)$ and transport it entirely inside $B_r(y)$. The stochastic character of transport plans will be further investigated in Section 17.3. For the moment, it is sufficient to clarify that transport maps can indeed be seen as transport plans.

Proposition 3.1 (Transport maps induce transport plans) If $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ and $T$ transports $\mu$ (i.e., $\mu$ is concentrated on a Borel set $E \subset \mathbb{R}^n$ and $T : E \to \mathbb{R}^n$ is a Borel map), then $T$ is a transport map from $\mu$ to $\nu$ ($T_\#\mu = \nu$) if and only if $\gamma_T = (\mathrm{id} \times T)_\#\mu$ is a transport plan from $\mu$ to $\nu$. Here, $(\mathrm{id} \times T) : \mathbb{R}^n \to \mathbb{R}^n \times \mathbb{R}^n$ is defined by

$$ (\mathrm{id} \times T)(x) = (x, T(x)), \qquad \forall x \in \mathbb{R}^n. \tag{3.5} $$
Proof Since $\mu \in \mathcal{P}(\mathbb{R}^n)$, it is obvious that $\gamma_T \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n)$. Moreover, using the general fact that the push-forward by the composition is the composition of the push-forwards, we see that $p_\#\gamma_T = \mu$ and $q_\#\gamma_T = T_\#\mu$: indeed, if $F \subset E$ is a Borel set, then

$$ \begin{aligned} (p_\#\gamma_T)(F) &= (\mathrm{id} \times T)_\#\mu\,(F \times \mathbb{R}^n) = \mu\big(\{x : (x, T(x)) \in F \times \mathbb{R}^n\}\big) = \mu(F), \\ (q_\#\gamma_T)(F) &= (\mathrm{id} \times T)_\#\mu\,(\mathbb{R}^n \times F) = \mu\big(\{x : (x, T(x)) \in \mathbb{R}^n \times F\}\big) = \mu(T^{-1}(F)) = (T_\#\mu)(F). \end{aligned} $$

As a consequence, (3.2) holds if and only if $T_\#\mu = \nu$.

Remark 3.2 We notice that transport plans always exist, since

$$ \mu \times \nu \in \Gamma(\mu,\nu), \qquad \forall \mu, \nu \in \mathcal{P}(\mathbb{R}^n). $$

Indeed, if $\gamma = \mu \times \nu$ and $E \in \mathcal{B}(\mathbb{R}^n)$, then $p_\#\gamma(E) = (\mu \times \nu)(p^{-1}(E)) = (\mu \times \nu)(E \times \mathbb{R}^n) = \mu(E)\,\nu(\mathbb{R}^n) = \mu(E)$, and, similarly, $q_\#\gamma = \nu$. The fact that $\gamma = \mu \times \nu$ is always a transport plan is somehow reflected in the fact that, as a transport plan, $\mu \times \nu$ is indeed giving the most generic transport instructions (and, thus, the least likely to be efficient in terms of transport cost!). Indeed, $\gamma(E \times F) = \mu(E)\,\nu(F)$ means that the amount of mass stored by $\mu$ at $E$ has to be distributed uniformly along all possible “receiving sites” $F$ for $\nu$ (i.e., those sets with $\nu(F) > 0$).
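For discrete measures, Proposition 3.1 and Remark 3.2 reduce to statements about the row and column sums of a matrix. The sketch below is an illustration under that discreteness assumption; the helper names (`marginals`, `product_plan`, `plan_from_map`) and the sample weights are hypothetical. It builds the product plan and the plan induced by a map given as an index list, and computes their marginals.

```python
from fractions import Fraction

def marginals(gamma):
    """Row sums (first marginal) and column sums (second marginal) of a plan."""
    rows = [sum(row) for row in gamma]
    cols = [sum(col) for col in zip(*gamma)]
    return rows, cols

def product_plan(mu, nu):
    """Discrete version of mu x nu: gamma_ij = mu_i * nu_j."""
    return [[m * n for n in nu] for m in mu]

def plan_from_map(mu, T, M):
    """Discrete version of (id x T)_# mu, where T[i] = j means T(x_i) = y_j."""
    gamma = [[Fraction(0)] * M for _ in mu]
    for i, m in enumerate(mu):
        gamma[i][T[i]] = m
    return gamma

mu = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
nu = [Fraction(3, 4), Fraction(1, 4)]

prod_marginals = marginals(product_plan(mu, nu))
good_map_marginals = marginals(plan_from_map(mu, [0, 0, 1], len(nu)))  # T_# mu = nu
bad_map_marginals = marginals(plan_from_map(mu, [0, 1, 1], len(nu)))   # T_# mu != nu
```

The first marginal of a map-induced plan is always $\mu$, while the second marginal is $T_\#\mu$, so the induced plan belongs to $\Gamma(\mu,\nu)$ exactly when $T_\#\mu = \nu$, as in the first of the two maps above.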
3.2 Formulation of the Kantorovich Problem

A (transport) cost is simply a Borel function $c : \mathbb{R}^n \times \mathbb{R}^n \to [0,\infty]$, so $c(x,y)$ represents the cost to transport a unit mass from location $x$ to location $y$. Although our focus will be mostly on the cases of the linear cost (appearing in Monge’s original problem (1.2)) $c(x,y) = |x-y|$ and the quadratic cost $c(x,y) = |x-y|^2$, in this section we will work with general cost functions. This choice has the advantage of not obscuring the true nature of some very general facts about transport problems, which have nothing to do with the linear or the quadratic cost; moreover, there are many examples of transport costs (beyond the linear and the quadratic case) that have theoretical and practical importance. Given a transport plan $\gamma \in \Gamma(\mu,\nu)$, the transport cost of $\gamma$ is given by

$$ \int_{\mathbb{R}^n \times \mathbb{R}^n} c(x,y)\,d\gamma(x,y), \tag{3.6} $$

and the Kantorovich problem (with general cost $c$, from $\mu$ to $\nu$) is the minimization problem

$$ K_c(\mu,\nu) = \inf\Big\{ \int_{\mathbb{R}^n \times \mathbb{R}^n} c(x,y)\,d\gamma(x,y) : \gamma \in \Gamma(\mu,\nu) \Big\}. \tag{3.7} $$

Minimizers in $K_c(\mu,\nu)$ are called optimal transport plans. Of course, using the notion of transport map, we can formulate a Monge problem for every cost function $c$ by setting

$$ M_c(\mu,\nu) = \inf\Big\{ \int_{\mathbb{R}^n} c(x, T(x))\,d\mu(x) : T_\#\mu = \nu \Big\}, \tag{3.8} $$

and we always have

$$ M_c(\mu,\nu) \ge K_c(\mu,\nu). \tag{3.9} $$

To prove (3.9) we notice that by Proposition 3.1, if $T$ is a transport map from $\mu$ to $\nu$ and $\gamma_T = (\mathrm{id} \times T)_\#\mu$, then $\gamma_T \in \Gamma(\mu,\nu)$, and thus

$$ K_c(\mu,\nu) \le \int_{\mathbb{R}^n \times \mathbb{R}^n} c(x,y)\,d\gamma_T(x,y) = \int_{\mathbb{R}^n} c(x, T(x))\,d\mu(x). $$
The crucial consequence of (3.9) is that it suggests an approach to the Monge problem: first, prove the existence of an optimal transport plan in Kc (μ, ν); second, when possible, show that optimal transport plans in Kc (μ, ν) are induced by transport maps as in (3.5). Of course, when this strategy works, it implies that (3.9) holds as an identity, therefore the interest of the following two remarks (see also Remark 4.1 in Chapter 4): Remark 3.3 (Mc > Kc may happen) If c ∈ L 1 (μ×ν) (so that Kc (μ, ν) < ∞), but there are no transport maps from μ to ν (so that Mc (μ, ν) = +∞), then (3.9) is obviously a strict inequality. In Remark 2.1 we have already seen that this can happen in the framework of discrete transport problems. More generally, we expect similar problems to emerge whenever μ has atoms, namely, whenever there is x ∈ Rn such that μ({x}) > 0, and thus it may be necessary to “split T (x) into more than one value” to achieve the transport condition toward a given ν. Remark 3.4 (Mc = Kc if μ has no atoms) However, under very general conditions on the cost function, if μ has no atoms (i.e., if μ({x}) = 0 for every x ∈ Rn ), then (3.9) holds as an equality. The idea behind this result is that, when μ has no atoms, then every γ ∈ Γ(μ, ν) can be approximated by maps-induced transport plans γT with the desired precision in transport cost. The formalization of this idea requires deeper technical tools than what is advisable to employ at this stage of our discussion, which will be postponed until Chapter 17; see, in particular, Theorem 17.11 therein.
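Remark 3.5 (Symmetric costs) Notice that we are not requiring the cost function to be symmetric, that is, to satisfy $c(x,y) = c(y,x)$ for every $x, y \in \mathbb{R}^n$. When the cost is symmetric, as for the linear or the quadratic cost, we have

$$ K_c(\mu,\nu) = K_c(\nu,\mu). \tag{3.10} $$

Indeed, if we let $R : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n \times \mathbb{R}^n$ be the reflection map $R(x,y) = (y,x)$, then $\gamma \in \Gamma(\mu,\nu)$ if and only if $R_\#\gamma \in \Gamma(\nu,\mu)$, and

$$ \int_{\mathbb{R}^n \times \mathbb{R}^n} c(x,y)\,d[R_\#\gamma](x,y) = \int_{\mathbb{R}^n \times \mathbb{R}^n} c(y,x)\,d\gamma(x,y). $$

In particular, if $c$ is symmetric, then $\gamma$ and $R_\#\gamma$ have the same transport cost, and thus (3.10) follows.

Remark 3.6 Whenever $\mu$ and $\nu$ are discrete measures (see (2.1) and (2.2)) and we denote by $\gamma$ both a discrete transport plan $\{\gamma_{ij}\} \in \mathbb{R}^{N \times M}$ from $\mu$ to $\nu$ and its realization as transport plan $\sum_{i,j} \gamma_{ij}\,\delta_{(x_i, y_j)} \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n)$, the cost function $\mathrm{Cost}(\gamma)$ defined in (2.5) and the one defined in (3.6) agree, since

$$ \int_{\mathbb{R}^n \times \mathbb{R}^n} c\,d\gamma = \sum_{i,j} c(x_i, y_j)\,\gamma_{ij}. $$

In particular, it is possible to see the theory developed in Chapter 2 as a particular case of the theory developed in this chapter. We close this section with a useful remark, where we summarize the relations between $\mathrm{spt}\,\mu$, $\mathrm{spt}\,\nu$, and $\mathrm{spt}\,\gamma$ when $\gamma \in \Gamma(\mu,\nu)$.

Remark 3.7 (On the supports of transport plans) If $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ and $\gamma \in \Gamma(\mu,\nu)$, then we have

$$ \mathrm{spt}\,\gamma \subset \mathrm{spt}\,\mu \times \mathrm{spt}\,\nu. \tag{3.11} $$

Moreover,

$$ \mu \text{ is concentrated on } p(\mathrm{spt}\,\gamma) \text{ and } \nu \text{ on } q(\mathrm{spt}\,\gamma), \tag{3.12} $$

and, finally, if $\mathrm{spt}\,\nu$ is compact, then

$$ \forall x \in \mathrm{spt}\,\mu \text{ there exists } y \in \mathrm{spt}\,\nu \text{ s.t. } (x,y) \in \mathrm{spt}\,\gamma. \tag{3.13} $$

To prove (3.11), we notice that (A.1) gives

$$ \mathrm{spt}\,\gamma = \big\{ (x,y) \in \mathbb{R}^n \times \mathbb{R}^n : \gamma\big(B_r(x) \times B_r(y)\big) > 0 \ \ \forall r > 0 \big\}, \tag{3.14} $$

so that (3.11) follows by $\gamma(B_r(x) \times \mathbb{R}^n) = \mu(B_r(x))$ and $\gamma(\mathbb{R}^n \times B_r(y)) = \nu(B_r(y))$. To prove (3.12), we notice that if $\mu(\mathbb{R}^n \setminus p(\mathrm{spt}\,\gamma)) > 0$, then $\gamma(A) > 0$ for the open set $A = [\mathbb{R}^n \setminus p(\mathrm{spt}\,\gamma)] \times \mathbb{R}^n$. Thus, there exists $(x,y) \in A \cap \mathrm{spt}\,\gamma$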
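so that $x \in p(\mathrm{spt}\,\gamma)$, a contradiction to $(x,y) \in A$. Before proving (3.13) we notice that the property fails without assuming the compactness of $\mathrm{spt}\,\nu$: for example, let

$$ \gamma = \sum_{j=1}^\infty \frac{1}{2^j}\,\delta_{(1/j,\,j)}, \qquad \mu = \sum_{j=1}^\infty \frac{1}{2^j}\,\delta_{1/j}, \qquad \nu = \sum_{j=1}^\infty \frac{1}{2^j}\,\delta_j, $$

then $0 \in \mathrm{spt}\,\mu$, but there is no $y \in \mathrm{spt}\,\nu$ such that $(0,y) \in \mathrm{spt}\,\gamma$. To prove (3.13): If $x \in \mathrm{spt}\,\mu$, then for every $r > 0$ we have $\gamma(B_r(x) \times \mathbb{R}^n) > 0$. The measure $\lambda(E) = \gamma(B_r(x) \times E)$ must have non-empty support, so that for every $r > 0$ we can find $y(r) \in \mathbb{R}^n$ such that $\gamma(B_r(x) \times B_s(y(r))) > 0$ for every $s > 0$. Notice that $y(r) \in \mathrm{spt}\,\nu$ since $0 < \gamma(B_r(x) \times B_s(y(r))) \le \gamma(\mathbb{R}^n \times B_s(y(r))) = \nu(B_s(y(r)))$ for every $s > 0$. By compactness of $\mathrm{spt}\,\nu$, given $r_j \to 0^+$ we can find $y_j \in \mathrm{spt}\,\nu$ with $y_j \to y$ and $y \in \mathrm{spt}\,\nu$ such that $\gamma(B_{r_j}(x) \times B_s(y_j)) > 0$ for every $s > 0$. In particular, for every $\rho > 0$ we can find $j$ large enough so that $\rho > r_j$, $B_{\rho/2}(y_j) \subset B_\rho(y)$, and thus $\gamma(B_\rho(x) \times B_\rho(y)) \ge \gamma(B_{r_j}(x) \times B_{\rho/2}(y_j)) > 0$, thus proving that $(x,y) \in \mathrm{spt}\,\gamma$ with $y \in \mathrm{spt}\,\nu$.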
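3.3 Existence of Minimizers in the Kantorovich Problem

As in the discrete case, the existence of minimizers in the Kantorovich problem is easily obtained as a reflection of the fact that we are minimizing the linear functional defined in (3.6) over the convex set $\Gamma(\mu,\nu)$. The main issue to be addressed is the compactness of $\Gamma(\mu,\nu)$, since now, at variance with the discrete case, we have an infinite-dimensional competition class. This is the content of the following theorem, where we prove the existence of minimizers in the Kantorovich problem under the sole assumption that the cost $c$ is lower semicontinuous.

Theorem 3.8 If $c : \mathbb{R}^n \times \mathbb{R}^n \to [0,\infty]$ is lower semicontinuous, then for every $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ there exists a minimizer $\gamma$ of $K_c(\mu,\nu)$.

Proof If $K_c(\mu,\nu) = +\infty$, then every $\gamma \in \Gamma(\mu,\nu)$ is a minimizer in $K_c(\mu,\nu)$. Therefore, we assume that $K_c(\mu,\nu) < +\infty$ and consider a minimizing sequence $\{\gamma_j\}_j$ for $K_c(\mu,\nu)$, i.e.,

$$ K_c(\mu,\nu) = \lim_{j \to \infty} \int_{\mathbb{R}^n \times \mathbb{R}^n} c\,d\gamma_j, \qquad \gamma_j \in \Gamma(\mu,\nu). \tag{3.15} $$

Since $\gamma_j(\mathbb{R}^n \times \mathbb{R}^n) = 1$ for every $j$, by the compactness theorem for Radon measures there exists a Radon measure $\gamma$ on $\mathbb{R}^n \times \mathbb{R}^n$ such that, up to extracting subsequences, $\gamma_j \overset{*}{\rightharpoonup} \gamma$ as $j \to \infty$. Aiming at applying the narrow convergence criterion of Proposition A.1, we now show that for every $\varepsilon > 0$ we can find $K_\varepsilon \subset \mathbb{R}^n \times \mathbb{R}^n$ compact with

$$ \sup_{j \in \mathbb{N}} \gamma_j\big( (\mathbb{R}^n \times \mathbb{R}^n) \setminus K_\varepsilon \big) < \varepsilon. \tag{3.16} $$

Indeed, since both $\mu(\mathbb{R}^n)$ and $\nu(\mathbb{R}^n)$ are finite, we can find compact sets $K^\mu$ and $K^\nu$ in $\mathbb{R}^n$ such that $\mu(\mathbb{R}^n \setminus K^\mu) < \varepsilon/2$ and $\nu(\mathbb{R}^n \setminus K^\nu) < \varepsilon/2$; since $K_\varepsilon = K^\mu \times K^\nu$ is compact in $\mathbb{R}^n \times \mathbb{R}^n$ and since

$$ (\mathbb{R}^n \times \mathbb{R}^n) \setminus K_\varepsilon \subset \big[ (\mathbb{R}^n \setminus K^\mu) \times \mathbb{R}^n \big] \cup \big[ \mathbb{R}^n \times (\mathbb{R}^n \setminus K^\nu) \big], $$

by $\gamma_j \in \Gamma(\mu,\nu)$ we find

$$ \gamma_j\big( (\mathbb{R}^n \times \mathbb{R}^n) \setminus K_\varepsilon \big) \le \gamma_j\big( (\mathbb{R}^n \setminus K^\mu) \times \mathbb{R}^n \big) + \gamma_j\big( \mathbb{R}^n \times (\mathbb{R}^n \setminus K^\nu) \big) = \mu(\mathbb{R}^n \setminus K^\mu) + \nu(\mathbb{R}^n \setminus K^\nu) < \varepsilon. $$

Having proved (3.16), we deduce from Proposition A.1 and from $\gamma_j(\mathbb{R}^n \times \mathbb{R}^n) = 1$ for every $j$ that $\gamma(\mathbb{R}^n \times \mathbb{R}^n) = 1$ and that $\gamma_j$ narrowly converges to $\gamma$ on $\mathbb{R}^n \times \mathbb{R}^n$. In particular, since $\varphi \in C_b^0(\mathbb{R}^n)$ implies $\varphi \circ p \in C_b^0(\mathbb{R}^n \times \mathbb{R}^n)$, we find that

$$ \int_{\mathbb{R}^n} \varphi\,d\mu = \int_{\mathbb{R}^n \times \mathbb{R}^n} (\varphi \circ p)\,d\gamma_j \to \int_{\mathbb{R}^n \times \mathbb{R}^n} (\varphi \circ p)\,d\gamma \qquad \text{as } j \to \infty, $$

that is

$$ \int_{\mathbb{R}^n \times \mathbb{R}^n} \varphi(x)\,d\gamma(x,y) = \int_{\mathbb{R}^n} \varphi\,d\mu \qquad \forall \varphi \in C_b^0(\mathbb{R}^n). $$

This shows $p_\#\gamma = \mu$, and, similarly, we have $q_\#\gamma = \nu$. To complete the proof we notice that, since $c$ is lower semicontinuous on $\mathbb{R}^n \times \mathbb{R}^n$, the super level sets $\{c > t\}$ of $c$ are open sets. Thus, by (A.8) we have

$$ \liminf_{j \to \infty} \gamma_j(\{c > t\}) \ge \gamma(\{c > t\}) \qquad \forall t \in \mathbb{R}, $$

so, by applying the layer cake formula (A.2) twice and by Fatou’s lemma, we find

$$ \begin{aligned} K_c(\mu,\nu) &= \lim_{j \to \infty} \int_{\mathbb{R}^n \times \mathbb{R}^n} c\,d\gamma_j = \lim_{j \to \infty} \int_0^\infty \gamma_j(\{c > t\})\,dt \\ &\ge \int_0^\infty \liminf_{j \to \infty} \gamma_j(\{c > t\})\,dt \ge \int_0^\infty \gamma(\{c > t\})\,dt \\ &= \int_{\mathbb{R}^n \times \mathbb{R}^n} c\,d\gamma \ge K_c(\mu,\nu), \end{aligned} $$

where in the last inequality we have used the fact that $\gamma \in \Gamma(\mu,\nu)$. This proves that $\gamma$ is a minimizer of $K_c$.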
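The layer cake formula used in the last chain of inequalities, $\int c\,d\gamma = \int_0^\infty \gamma(\{c > t\})\,dt$, can be checked exactly when $\gamma$ is a finite sum of weighted atoms, since $t \mapsto \gamma(\{c > t\})$ is then piecewise constant. The sketch below assumes such a discrete $\gamma$ with nonnegative cost values; the function names and sample data are illustrative only.

```python
from fractions import Fraction

def integral(values, weights):
    """Integral of c with respect to gamma = sum_k weights[k] * delta_{z_k},
    where values[k] = c(z_k)."""
    return sum(c * w for c, w in zip(values, weights))

def layer_cake(values, weights):
    """Integral over (0, inf) of t -> gamma({c > t}). The integrand is constant
    on each interval between consecutive levels of c (values assumed >= 0)."""
    levels = sorted(set(values) | {0})
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        # For lo <= t < hi, the set {c > t} consists of the atoms with c > lo.
        mass_above = sum(w for c, w in zip(values, weights) if c > lo)
        total += (hi - lo) * mass_above
    return total

values = [0, 1, 4, 4, 9]
weights = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 8), Fraction(1, 4), Fraction(1, 4)]

direct = integral(values, weights)
layered = layer_cake(values, weights)
```

Both computations return the same number for this data, which is the discrete form of the identity applied twice in the proof.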
3.4 c-Cyclical Monotonicity with General Measures

Next, in analogy with Theorem 2.5 for the discrete transport problem, we relate the property of being an optimal transport plan to $c$-cyclical monotonicity.

Theorem 3.9 If $c : \mathbb{R}^n \times \mathbb{R}^n \to [0,\infty)$ is a continuous function, $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ are such that $K_c(\mu,\nu) < \infty$, and $\gamma$ is an optimal transport plan in $K_c(\mu,\nu)$, then

$$ \sum_{i=1}^N c(x_i, y_i) \le \sum_{i=1}^N c(x_{i+1}, y_i) $$

whenever $\{(x_i, y_i)\}_{i=1}^N \subset \mathrm{spt}\,\gamma$ and $x_{N+1} = x_1$. In particular, $\mathrm{spt}\,\gamma$ is $c$-cyclically monotone.

Remark 3.10 (Non-crossing condition for the linear cost) If $\gamma$ is an optimal transport plan in $K_c(\mu,\nu)$ with $c(x,y) = |x-y|$ and $(x,y), (x',y') \in \mathrm{spt}\,\gamma$ (so that, in particular, $x, x' \in \mathrm{spt}\,\mu$ and $y, y' \in \mathrm{spt}\,\nu$ by (3.11)), then the $c$-cyclical monotonicity of $\mathrm{spt}\,\gamma$ with respect to the linear cost implies

$$ |x - y| + |x' - y'| \le |x - y'| + |x' - y|. \tag{3.17} $$

Geometrically, this means that if the segments $[x,y] = \{(1-t)\,x + t\,y : t \in [0,1]\}$ and $[x',y']$ have nontrivial intersection, then they either have one common endpoint (i.e., $x = x'$ and/or $y = y'$) or have the same orientation (i.e., $(y-x)/|y-x| = (y'-x')/|y'-x'|$). This non-crossing condition, which was already met in (1.18), will play a crucial role in solving the Monge problem in Chapter 18; see (18.21).

Proof of Theorem 3.9 As in the proof of Theorem 2.5, given $\{(x_i, y_i)\}_{i=1}^N \subset \mathrm{spt}\,\gamma$ and setting $x_{N+1} = x_1$, we want to create variations of $\gamma$ by not transporting an $\varepsilon$ of the mass that $\gamma$ transports from $x_i$ to $y_i$ and by sending instead to each $y_i$ an $\varepsilon$ of the mass stored by $\mu$ at $x_{i+1}$. The difference is that now we are not working with discrete measures, so we cannot really identify objects like “the mass sent by $\gamma$ from $x_i$ to $y_i$” but have rather to work with $\gamma(B_r(x_i) \times B_r(y_i))$ for a value of $r$ positive and small. This said, for every $r > 0$ and $(x_i, y_i) \in \mathrm{spt}\,\gamma$ we set

$$ \varepsilon_i = \gamma\big( B_r(x_i) \times B_r(y_i) \big) > 0, \qquad \gamma_i = \frac{1}{\varepsilon_i}\,\gamma \llcorner \big[ B_r(x_i) \times B_r(y_i) \big] \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n), $$

$$ \mu_i = p_\#\gamma_i \in \mathcal{P}(\mathbb{R}^n), \qquad \nu_i = q_\#\gamma_i \in \mathcal{P}(\mathbb{R}^n), $$
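so that $\gamma_i$ encodes “what $\gamma$ does near $(x_i, y_i)$,” while $\mu_{i+1} \times \nu_i$ is a good approximation of the operation of “taking the mass transported by $\gamma$ from $x_{i+1}$ to $y_{i+1}$, and sending it to $y_i$ instead.” With this insight, we define our family of competitors parameterized over $0 < \varepsilon < \min_{i=1,\dots,N} \varepsilon_i$ by setting (with $\mu_{N+1} = \mu_1$)
\[ \gamma^\varepsilon = \gamma - \frac{\varepsilon}{N} \sum_{i=1}^N \gamma_i + \frac{\varepsilon}{N} \sum_{i=1}^N \mu_{i+1} \times \nu_i. \]
We first check that $\gamma^\varepsilon \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n)$: indeed,
\[ \gamma^\varepsilon(\mathbb{R}^n \times \mathbb{R}^n) = 1 - \frac{\varepsilon}{N} \sum_{i=1}^N \gamma_i(\mathbb{R}^n \times \mathbb{R}^n) + \frac{\varepsilon}{N} \sum_{i=1}^N \mu_{i+1}(\mathbb{R}^n)\, \nu_i(\mathbb{R}^n) = 1, \]
while for every $A \subset \mathbb{R}^n \times \mathbb{R}^n$ we have $\gamma^\varepsilon(A) \ge 0$, since $\varepsilon/\varepsilon_i \le 1$, and thus
\[ \gamma^\varepsilon(A) \ge \gamma(A) - \frac{\varepsilon}{N} \sum_{i=1}^N \frac{\gamma\big(A \cap (B_r(x_i) \times B_r(y_i))\big)}{\varepsilon_i} \ge \gamma(A) - \frac{1}{N} \sum_{i=1}^N \gamma(A) = 0. \tag{3.18} \]
Moreover, $\gamma^\varepsilon \in \Gamma(\mu, \nu)$: the marginals of $\gamma_i$ and of $\mu_{i+1} \times \nu_i$ are, respectively, $(\mu_i, \nu_i)$ and $(\mu_{i+1}, \nu_i)$, and $\sum_{i=1}^N \mu_{i+1} = \sum_{i=1}^N \mu_i$. Finally, denoting by $\omega$ a uniform modulus of continuity for $c$ at the points $\{(x_i, y_i)\}_{i=1}^N \cup \{(x_{i+1}, y_i)\}_{i=1}^N$, we notice that since $\gamma_i$ is concentrated on $B_r(x_i) \times B_r(y_i)$ and $\mu_{i+1} \times \nu_i$ is concentrated on $B_r(x_{i+1}) \times B_r(y_i)$, we have
\[ 0 \le \int_{\mathbb{R}^n \times \mathbb{R}^n} c\, d\gamma^\varepsilon - \int_{\mathbb{R}^n \times \mathbb{R}^n} c\, d\gamma = -\frac{\varepsilon}{N} \sum_{i=1}^N \int_{\mathbb{R}^n \times \mathbb{R}^n} c\, d\gamma_i + \frac{\varepsilon}{N} \sum_{i=1}^N \int_{\mathbb{R}^n \times \mathbb{R}^n} c\, d[\mu_{i+1} \times \nu_i] \]
\[ \le -\frac{\varepsilon}{N} \sum_{i=1}^N \big( c(x_i, y_i) - \omega(r) \big) + \frac{\varepsilon}{N} \sum_{i=1}^N \big( c(x_{i+1}, y_i) + \omega(r) \big). \]
Dividing by $\varepsilon/N > 0$, we find that
\[ \sum_{i=1}^N c(x_i, y_i) \le 2 N\, \omega(r) + \sum_{i=1}^N c(x_{i+1}, y_i), \]
and letting $r \to 0^+$ we conclude the proof.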
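The statement of Theorem 3.9 can be tested numerically in the discrete setting of Chapter 2. The following is a minimal sketch, not part of the original argument: it assumes uniform measures on finitely many points of the real line and the illustrative cost $c(x, y) = |x - y|^2$, and the helper names `cost`, `optimal_pairs`, and `is_c_cyclically_monotone` are hypothetical. An optimal plan is found by brute force over permutations, and the inequality of Theorem 3.9 is then checked on every chain of distinct points of its support.

```python
from itertools import permutations
import random


def cost(x, y):
    """Illustrative continuous cost c(x, y) = |x - y|^2 on the real line."""
    return (x - y) ** 2


def optimal_pairs(xs, ys):
    """Support of an optimal plan between the uniform measures on xs and ys.

    With equally many atoms of equal mass, some optimal plan is induced by a
    permutation, so a brute-force search over permutations suffices.
    """
    n = len(xs)
    best = min(
        permutations(range(n)),
        key=lambda s: sum(cost(xs[i], ys[s[i]]) for i in range(n)),
    )
    return [(xs[i], ys[best[i]]) for i in range(n)]


def is_c_cyclically_monotone(pairs, tol=1e-12):
    """Check sum_i c(x_i, y_i) <= sum_i c(x_{i+1}, y_i) on chains of distinct pairs."""
    for k in range(2, len(pairs) + 1):
        for chain in permutations(pairs, k):
            lhs = sum(cost(x, y) for x, y in chain)
            rhs = sum(cost(chain[(i + 1) % k][0], chain[i][1]) for i in range(k))
            if lhs > rhs + tol:
                return False
    return True


random.seed(0)
xs = [random.uniform(0, 1) for _ in range(5)]
ys = [random.uniform(0, 1) for _ in range(5)]
print(is_c_cyclically_monotone(optimal_pairs(xs, ys)))     # support of an optimal plan
print(is_c_cyclically_monotone([(0.0, 1.0), (1.0, 0.0)]))  # a crossing, non-optimal pairing
```

The brute-force search is only meant for a handful of atoms; its purpose is to make the necessity of c-cyclical monotonicity visible, not to solve transport problems efficiently.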
3.5 Kantorovich Duality for General Transport Costs

Having proven c-cyclical monotonicity of spt γ to be necessary for optimality in Kc (μ, ν), we now discuss its sufficiency. The idea is to repeat the analysis performed in the case of the discrete transport problem with quadratic cost in Section 2.4. In that case the crucial step for proving sufficiency, discussed in
the implication “(iii) implies (i)” in Theorem 2.10, was exploiting the relation of cyclical monotonicity to the notion of subdifferential of a convex function, and the relation between subdifferentials and equality cases in the Fenchel–Legendre inequality. We thus set ourselves to the task of introducing an appropriate notion of convexity related to the cost $c$ and then develop related concepts of subdifferential and Fenchel–Legendre duality, with the final goal of repeating the argument of Theorem 2.10. The appropriate definitions are correctly guessed by recalling how convexity is related to the quadratic cost transport problem, and in particular how the Euclidean scalar product, which plays a crucial role in the definitions of subdifferential and of Fenchel–Legendre transform, enters into the proof of Theorem 2.10. In this vein, we can notice that the identity
\[ \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma(x, y) = \int_{\mathbb{R}^n} |x|^2\, d\mu(x) + \int_{\mathbb{R}^n} |y|^2\, d\nu(y) - 2 \int_{\mathbb{R}^n \times \mathbb{R}^n} x \cdot y\, d\gamma(x, y) \]
implies that $K_c(\mu, \nu)$ with $c(x, y) = |x - y|^2$ is actually equivalent to $K_c(\mu, \nu)$ with $c(x, y) = -x \cdot y$ and then make the educated guess that we should introduce notions of c-convexity, of c-subdifferential, and of c-Fenchel–Legendre transform, by systematically replacing the scalar product $x \cdot y$ with $-c(x, y)$ in the corresponding definitions for convex functions from Section 2.3. We thus proceed to consider the following definitions:

c-convexity: Following (2.29), we say that $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is c-convex if there is a function $\alpha : \mathbb{R}^n \to \mathbb{R} \cup \{-\infty\}$ such that
\[ f(x) = \sup\big\{ \alpha(y) - c(x, y) : y \in \mathbb{R}^n \big\} \qquad \forall x \in \mathbb{R}^n. \tag{3.19} \]
We set $\mathrm{Dom}(f) = \{f < \infty\}$ and notice that if $c$ is continuous, then every c-convex function is lower semicontinuous on $\mathbb{R}^n$. When $c(x, y) = -x \cdot y$, we are just defining convex, lower-semicontinuous functions on $\mathbb{R}^n$, so that $f = f^{**}$. In the convex case we are taking of course $\alpha(y) = -f^*(y)$. Since $f^*$ takes values in $\mathbb{R} \cup \{+\infty\}$, we have allowed $\alpha$ to take values in $\mathbb{R} \cup \{-\infty\}$.

c-subdifferential of a c-convex function: If $f$ is c-convex, then, in analogy with (2.18) and (2.21) (and noticing that, if $c(x, y) = -x \cdot y$, then $y \cdot (z - x) = c(x, y) - c(z, y)$ for every $x, y, z \in \mathbb{R}^n$), we set
\[ \partial_c f(x) = \big\{ y \in \mathbb{R}^n : f(z) \ge f(x) + c(x, y) - c(z, y) \ \ \forall z \in \mathbb{R}^n \big\}, \tag{3.20} \]
\[ \partial_c f = \bigcup_{x \in \mathbb{R}^n} \{x\} \times \partial_c f(x) \subset \mathbb{R}^n \times \mathbb{R}^n. \tag{3.21} \]
Clearly, if the cost c is continuous and f is c-convex (and thus lower semicontinuous), then ∂c f is closed in Rn × Rn .
c-Fenchel–Legendre transform: If $f$ is c-convex, then, in analogy with (2.28), we set
\[ f^c(y) = \sup\big\{ -c(x, y) - f(x) : x \in \mathbb{R}^n \big\} \qquad \forall y \in \mathbb{R}^n. \tag{3.22} \]
Finally, we recall that $S \subset \mathbb{R}^n \times \mathbb{R}^n$ is c-cyclically monotone if for every $\{(x_i, y_i)\}_{i=1}^N \subset S$ one has
\[ \sum_{i=1}^N c(x_i, y_i) \le \sum_{i=1}^N c(x_{i+1}, y_i), \qquad \text{where } x_{N+1} = x_1. \tag{3.23} \]
The importance of definitions (3.19), (3.20), (3.21), and (3.22) is established by the following two results.

Theorem 3.11 (c-Rockafellar theorem) Let $c : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ be given. A set $S \subset \mathbb{R}^n \times \mathbb{R}^n$ is c-cyclically monotone if and only if there exists a c-convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that
\[ S \subset \partial_c f. \tag{3.24} \]
Moreover, in the “only if” implication one can take $f$ to be lower semicontinuous as soon as $c$ is continuous.

Proposition 3.12 (c-Fenchel inequality) Let $c : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ be given. If $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a c-convex function, then
\[ c(x, y) \ge -f(x) - f^c(y) \qquad \forall (x, y) \in \mathbb{R}^n \times \mathbb{R}^n, \tag{3.25} \]
\[ c(x, y) = -f(x) - f^c(y) \quad \text{if and only if} \quad y \in \partial_c f(x). \tag{3.26} \]
Before proving Theorem 3.11 and Proposition 3.12, we show how they are used in the analysis of the Kantorovich problem, and, in particular, in proving that c-cyclical monotonicity of supports effectively characterizes optimal transport plans, and in showing the validity of the duality formula (3.27) (which generalizes (2.36)):

Theorem 3.13 (Kantorovich theorem) Let $c : \mathbb{R}^n \times \mathbb{R}^n \to [0, \infty)$ be a continuous function, $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$, and $c \in L^1(\mu \times \nu)$. Then, for every $\gamma \in \Gamma(\mu, \nu)$, the following three statements are equivalent:
(i) $\gamma$ is an optimal transport plan in $K_c(\mu, \nu)$;
(ii) $\operatorname{spt} \gamma$ is c-cyclically monotone in $\mathbb{R}^n \times \mathbb{R}^n$;
(iii) there exists a (lower semicontinuous) c-convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $\operatorname{spt} \gamma \subset \partial_c f$.
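Moreover, there exists a c-convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $f \in L^1(\mu)$, $f^c \in L^1(\nu)$, and
\[ K_c(\mu, \nu) = \sup_{(\alpha, \beta) \in \mathcal{A}} \int_{\mathbb{R}^n} \alpha\, d\mu + \int_{\mathbb{R}^n} \beta\, d\nu = \int_{\mathbb{R}^n} (-f)\, d\mu + \int_{\mathbb{R}^n} (-f^c)\, d\nu, \tag{3.27} \]
where $\mathcal{A}$ is the family of those pairs $(\alpha, \beta) \in L^1(\mu) \times L^1(\nu)$ such that
\[ \alpha(x) + \beta(y) \le c(x, y) \qquad \forall (x, y) \in \mathbb{R}^n \times \mathbb{R}^n. \tag{3.28} \]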
Remark 3.14 (Kantorovich dual problem in Economics) The maximization problem appearing in (3.27) is known as the Kantorovich dual problem, and (3.27) has the following economic interpretation. Let $c_{\rm ship}$ be the transport cost paid by a shipping company, and let $c$ be the one that an individual would pay by transporting mass by their own means. Expressing both costs in the same currency, if the shipping company is well organized, then we arguably have $c_{\rm ship}(x, y) \le c(x, y)$ (with a large gap if $|x - y|$ is large enough). The shipping company wants to establish prices $\alpha(x)$ and $\beta(y)$ for the services of, respectively, picking up a unit mass at a location $x$ and of delivering a unit mass at location $y$. As soon as condition (3.28) holds, it will be convenient for any individual to delegate their transport needs to the shipping company. An optimal pair $(-f, -f^c)$ in (3.27) represents the maximal prices the shipping company can ask to be convenient for individuals. The actual prices will need to be lower than that to attract customers, but still such that $\alpha(x) + \beta(y) > c_{\rm ship}(x, y)$ in order to keep the company profitable.

Proof of Theorem 3.13 Theorem 3.9 shows that (i) implies (ii), while Theorem 3.11 shows that (ii) implies (iii). We now show that if $\gamma \in \Gamma(\mu, \nu)$ and there exists a c-convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $\operatorname{spt} \gamma \subset \partial_c f$, then $\gamma$ is an optimal transport plan in $K_c(\mu, \nu)$. To this end let us first notice that
\[ \int_{\mathbb{R}^n \times \mathbb{R}^n} c\, d\bar\gamma \ge \int_{\mathbb{R}^n} \alpha\, d\mu + \int_{\mathbb{R}^n} \beta\, d\nu \]
for every $\bar\gamma \in \Gamma(\mu, \nu)$ and $(\alpha, \beta) \in \mathcal{A}$, i.e.,
\[ K_c(\mu, \nu) \ge \sup_{(\alpha, \beta) \in \mathcal{A}} \int_{\mathbb{R}^n} \alpha\, d\mu + \int_{\mathbb{R}^n} \beta\, d\nu. \]
Therefore, to show that $\gamma$ is an optimal transport plan in $K_c(\mu, \nu)$ (and, at the same time, that (3.27) holds), it suffices to prove that $(-f, -f^c) \in \mathcal{A}$ and
\[ \int_{\mathbb{R}^n \times \mathbb{R}^n} c\, d\gamma = \int_{\mathbb{R}^n} (-f)\, d\mu + \int_{\mathbb{R}^n} (-f^c)\, d\nu. \tag{3.29} \]
To this end, let us recall that, by (3.25), $(\alpha, \beta) = (-f, -f^c)$ satisfies (3.28), while $\operatorname{spt} \gamma \subset \partial_c f$ implies that if $(x, y) \in \operatorname{spt} \gamma$, then $y \in \partial_c f(x)$ and thus, thanks to (3.26), that
\[ c(x, y) = -f(x) - f^c(y), \qquad \forall (x, y) \in \operatorname{spt} \gamma. \tag{3.30} \]
Now, by $c \in L^1(\mu \times \nu)$ it follows that for $\nu$-a.e. $y \in \mathbb{R}^n$, $\int_{\mathbb{R}^n} c(x, y)\, d\mu(x) < \infty$; similarly, by (3.12), $\nu$ is concentrated on $q(\operatorname{spt} \gamma)$, while $q(\operatorname{spt} \gamma) \subset \{f^c < \infty\}$ thanks to (3.30); therefore, for $\nu$-a.e. $y \in \mathbb{R}^n$ we have both $\int_{\mathbb{R}^n} c(x, y)\, d\mu(x) < \infty$ and $f^c(y) \in \mathbb{R}$; we fix such a value of $y$ and integrate $|f(x)| \le c(x, y) + |f^c(y)|$ in $d\mu(x)$ over $x \in p(\operatorname{spt} \gamma)$ to find that
\[ \int_{\mathbb{R}^n} |f|\, d\mu \le \int_{\mathbb{R}^n} c(x, y)\, d\mu(x) + |f^c(y)| < \infty, \]
thus proving $f \in L^1(\mu)$; we then easily deduce that $f^c \in L^1(\nu)$ and conclude that $(-f, -f^c) \in \mathcal{A}$ and satisfies (3.29), thus completing the proof of the theorem.

The proofs of Theorem 3.11 and Proposition 3.12 are identical to those of their convex counterparts. Indeed, in proving Rockafellar's theorem, Fenchel's inequality and Fenchel's identity we have never made use of the bilinearity of the scalar product, nor of its continuity, homogeneity, or symmetry. For the sake of clarity we include the details anyway.

Proof of Theorem 3.11 Proof that (3.24) implies c-cyclical monotonicity: Given a finite subset $\{(x_i, y_i)\}_{i=1}^N$ of $S$, (3.24) implies that $y_i \in \partial_c f(x_i)$; in particular, $\partial_c f(x_i)$ is non-empty, $f(x_i) < \infty$, and
\[ f(x) \ge f(x_i) + c(x_i, y_i) - c(x, y_i), \qquad \forall x \in \mathbb{R}^n. \]
Testing this inequality at $x = x_{i+1}$ (with $x_{N+1} = x_1$) and summing up over $i = 1, \dots, N$ gives
\[ \sum_{i=1}^N f(x_{i+1}) \ge \sum_{i=1}^N \big( f(x_i) + c(x_i, y_i) - c(x_{i+1}, y_i) \big), \]
and since $\sum_{i=1}^N f(x_{i+1}) = \sum_{i=1}^N f(x_i)$ we deduce that (3.23) holds.

Proof that c-cyclical monotonicity implies (3.24): Since $S$ is non-empty we can pick $(x_0, y_0) \in S$ and (by replacing every term of the form $-z \cdot w$ with $c(z, w)$ in (2.24)) we define for $z \in \mathbb{R}^n$
\[ f(z) = \sup\Big\{ -c(z, y_N) + c(x_N, y_N) + \sum_{i=1}^{N-1} \big[ c(x_i, y_i) - c(x_{i+1}, y_i) \big] - c(x_1, y_0) + c(x_0, y_0) : \{(x_i, y_i)\}_{i=1}^N \subset S \Big\}. \tag{3.31} \]
If we set $\alpha(y) = -\infty$ for $y \notin q(S)$, and
\[ \alpha(y) = \sup\Big\{ c(x_N, y_N) + \sum_{i=0}^{N-1} \big[ c(x_i, y_i) - c(x_{i+1}, y_i) \big] : \{(x_i, y_i)\}_{i=1}^N \subset S,\ y_N = y \Big\}, \]
if $y \in q(S)$ (so that the above supremum is taken over a non-empty set), then (3.31) takes the form
\[ f(z) = \sup\big\{ \alpha(y) - c(z, y) : y \in \mathbb{R}^n \big\} \qquad \forall z \in \mathbb{R}^n, \tag{3.32} \]
showing that $f$ is, indeed, c-convex (and that $f$ is lower semicontinuous as soon as $c$ is continuous). We claim that $f(x_0) = 0$ (so that $f$ is not identically equal to $+\infty$). Indeed, by testing (3.31) with $\{(x_1, y_1)\} = \{(x_0, y_0)\}$ at $z = x_0$, we find $f(x_0) \ge 0$, while $f(x_0) \le 0$ is equivalent to showing that, for every $\{(x_i, y_i)\}_{i=1}^N \subset S$, we have
\[ c(x_N, y_N) - c(x_0, y_N) + \sum_{i=1}^{N-1} \big[ c(x_i, y_i) - c(x_{i+1}, y_i) \big] + c(x_0, y_0) - c(x_1, y_0) \le 0. \tag{3.33} \]
This follows by applying (3.23) to $\{(x_i, y_i)\}_{i=0}^N \subset S$.

Finally, we prove that $S \subset \partial_c f$. Indeed, let $(x_*, y_*) \in S$ and let $t \in \mathbb{R}$ be such that $t < f(x_*)$. By definition of $f(x_*)$, we can find $\{(x_i, y_i)\}_{i=1}^N \subset S$ such that
\[ -c(x_*, y_N) + c(x_N, y_N) + \sum_{i=0}^{N-1} \big[ c(x_i, y_i) - c(x_{i+1}, y_i) \big] \ge t. \tag{3.34} \]
Letting $x_{N+1} = x_*$ and $y_{N+1} = y_*$ and testing (3.31) with $\{(x_i, y_i)\}_{i=1}^{N+1} \subset S$, we find that, for every $z \in \mathbb{R}^n$,
\[ f(z) \ge -c(z, y_{N+1}) + c(x_{N+1}, y_{N+1}) + \sum_{i=0}^{N} \big[ c(x_i, y_i) - c(x_{i+1}, y_i) \big] \]
\[ = -c(z, y_*) + c(x_*, y_*) - c(x_*, y_N) + c(x_N, y_N) + \sum_{i=0}^{N-1} \big[ c(x_i, y_i) - c(x_{i+1}, y_i) \big] \]
\[ \ge -c(z, y_*) + c(x_*, y_*) + t, \tag{3.35} \]
where in the last inequality we have used (3.34). Since $f(z)$ is finite at $z = x_0$, by letting $t \to f(x_*)^-$ in (3.35) first with $z = x_0$ we see that $x_* \in \mathrm{Dom}(f)$, and then, by taking the same limit for an arbitrary $z$, we see that $y_* \in \partial_c f(x_*)$.

Proof of Proposition 3.12 Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a c-convex function. The validity of (3.25) is immediate from the definition of $f^c$, so we just have to prove that
\[ c(x, y) = -f(x) - f^c(y) \quad \text{if and only if} \quad y \in \partial_c f(x). \]
On the one hand, if $c(x, y) = -f(x) - f^c(y)$ for some $x, y \in \mathbb{R}^n$, then by using $f^c(y) \ge -c(z, y) - f(z)$ for every $z \in \mathbb{R}^n$ we deduce
\[ c(x, y) \le -f(x) + c(z, y) + f(z) \qquad \forall z \in \mathbb{R}^n, \tag{3.36} \]
i.e., $y \in \partial_c f(x)$. Conversely, if $y \in \partial_c f(x)$ for some $x, y \in \mathbb{R}^n$, then (3.36) holds, and the arbitrariness of $z$ and the definition of $f^c(y)$ give $c(x, y) \le -f(x) - f^c(y)$. Since the opposite inequality holds true, we deduce $c(x, y) = -f(x) - f^c(y)$.
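The constructions of this section can be followed step by step in a discrete example. The sketch below is an illustration under stated assumptions rather than the method of the text: the measures are uniform on finitely many points of the real line, the cost is $c(x, y) = |x - y|^2$, the supremum defining $f^c$ is restricted to the atoms of $\mu$, and all function names are hypothetical. It builds the potential $f$ of (3.31) from the support $S$ of an optimal plan (the supremum over chains is computed by repeatedly extending chains by one step), and then compares the transport cost with the dual value of the pair $(-f, -f^c)$, as in (3.29).

```python
from itertools import permutations


def cost(x, y):
    """Illustrative continuous cost c(x, y) = |x - y|^2 on the real line."""
    return (x - y) ** 2


def rockafellar_potential(S, c):
    """Potential f of (3.31) for a finite c-cyclically monotone set S, base point S[0]."""
    n = len(S)
    x0, y0 = S[0]
    # d[k] = best value of sum_{i=0}^{N-1} [c(x_i, y_i) - c(x_{i+1}, y_i)]
    # over chains in S ending at (x_N, y_N) = S[k]; start from chains with N = 1.
    d = [c(x0, y0) - c(S[k][0], y0) for k in range(n)]
    for _ in range(n):  # each round allows chains that are one step longer
        d = [
            max(d[j] + c(S[j][0], S[j][1]) - c(S[k][0], S[j][1]) for j in range(n))
            for k in range(n)
        ]
    alpha = [d[k] + c(S[k][0], S[k][1]) for k in range(n)]  # alpha(y_k) as in (3.32)
    return lambda z: max(alpha[k] - c(z, S[k][1]) for k in range(n))


def c_transform(f, xs, c):
    """f^c(y) = sup_x [-c(x, y) - f(x)], with the supremum restricted to xs."""
    return lambda y: max(-c(x, y) - f(x) for x in xs)


def primal_and_dual(xs, ys, c=cost):
    """Cost of an optimal permutation plan and dual value of the pair (-f, -f^c)."""
    n = len(xs)
    best = min(
        permutations(range(n)),
        key=lambda s: sum(c(xs[i], ys[s[i]]) for i in range(n)),
    )
    S = [(xs[i], ys[best[i]]) for i in range(n)]
    f = rockafellar_potential(S, c)
    fc = c_transform(f, xs, c)
    primal = sum(c(x, y) for x, y in S) / n
    dual = sum(-f(x) for x in xs) / n + sum(-fc(y) for y in ys) / n
    return primal, dual, f, fc


xs = [0.0, 1.0, 2.5, 4.0]
ys = [3.0, 0.5, 5.0, 1.5]
primal, dual, f, fc = primal_and_dual(xs, ys)
print(primal, dual)  # the two values should agree up to rounding
```

Restricting the supremum in $f^c$ to the atoms of $\mu$ is enough here because the constraint (3.28) is only tested on pairs of atoms, and on the support of the plan the supremum is attained at the matched atom.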
3.6 Two Additional Results on c-Cyclical Monotonicity

We now prove two additional general results concerning c-cyclical monotonicity and transport problems. The first one, Theorem 3.15, is the generalization of Theorem 2.12 to general origin and final measures. The second one, Theorem 3.16, concerns the existence of c-cyclically monotone plans in $\Gamma(\mu, \nu)$ even when $K_c(\mu, \nu)$ is not finite.

Theorem 3.15 (Set of optimal transport plans) Let $c : \mathbb{R}^n \times \mathbb{R}^n \to [0, \infty)$ be a continuous function, let $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ and assume that $c \in L^1(\mu \times \nu)$. Let $\Gamma_{\rm opt}(\mu, \nu, c)$ be the family of optimal transport plans in $K_c(\mu, \nu)$. Then $\Gamma_{\rm opt}(\mu, \nu, c)$ is convex and compact in the narrow convergence of Radon measures. Moreover,
\[ S = \bigcup \big\{ \operatorname{spt} \gamma : \gamma \in \Gamma_{\rm opt}(\mu, \nu, c) \big\} \]
is c-cyclically monotone, and in particular there exists a (lower-semicontinuous and) c-convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ whose c-subdifferential $\partial_c f$ contains the supports of all the optimal plans in $K_c(\mu, \nu)$.

Proof Given $\gamma \in \Gamma_{\rm opt}(\mu, \nu, c)$, by Theorem 3.13, there exists a c-convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $\operatorname{spt} \gamma \subset \partial_c f$; moreover, since $c$ is continuous and $f$ is lower semicontinuous, we have that $\partial_c f$ is closed. Now, if $\gamma' \in \Gamma_{\rm opt}(\mu, \nu, c)$, then by (3.27) we have
\[ \int_{\mathbb{R}^n \times \mathbb{R}^n} c(x, y)\, d\gamma'(x, y) = \int_{\mathbb{R}^n} (-f)\, d\mu + \int_{\mathbb{R}^n} (-f^c)\, d\nu, \]
so that Proposition 3.12 implies that $\gamma'$ is concentrated on $\partial_c f$. Since $\partial_c f$ is closed, this implies that $\operatorname{spt} \gamma' \subset \partial_c f$.
Transport plans with prescribed marginals and whose support is c-cyclically monotone can be constructed even outside of the context of a well-posed transport problem, i.e., even when the assumption Kc (μ, ν) < ∞ is dropped.
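The proof is based on an approximation argument and on the remark that $K_c(\mu, \nu) < \infty$ always holds if $\mu$ and $\nu$ are discrete measures.

Theorem 3.16 Let $c : \mathbb{R}^n \times \mathbb{R}^n \to [0, \infty)$ be a continuous function. If $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$, then there exist $\gamma \in \Gamma(\mu, \nu)$ and a c-convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $\operatorname{spt} \gamma$ is c-cyclically monotone and contained in $\partial_c f$.

Proof Step one: We prove that every probability measure on $\mathbb{R}^n$ can be approximated by discrete probability measures in the narrow convergence. This is usually proved, in much greater generality, by means of the Banach–Alaoglu theory; in the spirit of our presentation, however, we adopt here a more concrete argument. For $R > 0$ and for some integer $M \ge 2$, let $Q_R = (-R/2, R/2)^n$ and let $\{Q_R^i\}_{i=1}^N$ be a covering of $Q_R$ by $N = M^n$ many (neither open nor closed) cubes of side length $R/M$. Denote by $x_R^i$ the center of $Q_R^i$, set $\lambda_R^i = \mu(Q_R^i)/\mu(Q_R)$, and let
\[ \mu_{R,M} = \sum_{i=1}^N \lambda_R^i\, \delta_{x_R^i} \in \mathcal{P}(\mathbb{R}^n). \]
Then, for every $\varphi \in C_c^0(\mathbb{R}^n)$, denoting by $\omega_\varphi$ a modulus of continuity for $\varphi$, we have
\[ \Big| \int_{\mathbb{R}^n} \varphi\, d\mu - \int_{Q_R} \varphi\, d\mu \Big| \le \|\varphi\|_{C^0(\mathbb{R}^n)}\, \mu(\mathbb{R}^n \setminus Q_R), \]
\[ \Big| \int_{Q_R} \varphi\, d\mu - \int_{\mathbb{R}^n} \varphi\, d\mu_{R,M} \Big| \le \sum_{i=1}^N \Big( \int_{Q_R^i} |\varphi - \varphi(x_R^i)|\, d\mu + |\varphi(x_R^i)|\, \Big| \mu(Q_R^i) - \frac{\mu(Q_R^i)}{\mu(Q_R)} \Big| \Big) \]
\[ \le \omega_\varphi\Big( \sqrt{n}\, \frac{R}{M} \Big)\, \mu(Q_R) + \|\varphi\|_{C^0(\mathbb{R}^n)}\, \frac{1 - \mu(Q_R)}{\mu(Q_R)}. \]
Setting $R = \sqrt{M}$ and letting $M \to \infty$, we have proved that
\[ \mu_{\sqrt{M}, M} \overset{*}{\rightharpoonup} \mu \qquad \text{as } M \to \infty. \]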
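The discretization of step one is easy to reproduce numerically. The sketch below is only an illustration under assumptions that are not in the text: the dimension is $n = 1$ (so that $Q_R$ is split into $M$ cells of side $R/M$), $\mu$ is the standard Gaussian measure, whose cell masses are computed with the error function, and the test function is $\varphi = \cos$, which is bounded and continuous and satisfies $\int \cos\, d\mu = e^{-1/2}$. The names `normal_cdf`, `discretize`, and `integrate` are hypothetical.

```python
import math


def normal_cdf(t):
    """Distribution function of the standard Gaussian measure on the line."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def discretize(M):
    """Atoms and weights of mu_{R,M} with R = sqrt(M), for mu = N(0, 1) and n = 1."""
    R = math.sqrt(M)
    h = R / M                                        # side length of each cell
    total = normal_cdf(R / 2) - normal_cdf(-R / 2)   # mu(Q_R)
    atoms, weights = [], []
    for i in range(M):                               # M cells cover Q_R = (-R/2, R/2)
        a = -R / 2 + i * h                           # left endpoint of the i-th cell
        atoms.append(a + h / 2)                      # center of the cell
        weights.append((normal_cdf(a + h) - normal_cdf(a)) / total)
    return atoms, weights


def integrate(phi, atoms, weights):
    """Integral of phi against the discrete measure sum_i weights[i] * delta_{atoms[i]}."""
    return sum(w * phi(x) for x, w in zip(atoms, weights))


exact = math.exp(-0.5)  # integral of cos against N(0, 1)
for M in (4, 25, 400):
    atoms, weights = discretize(M)
    print(M, abs(integrate(math.cos, atoms, weights) - exact))
```

The printed errors should decrease as $M$ grows, in agreement with the two estimates above: the first term is controlled by the mass outside $Q_R$, the second by the oscillation of $\varphi$ on cells of side $R/M = 1/\sqrt{M}$.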
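Since each $\mu_{\sqrt{M}, M}$ is a probability measure, by Proposition A.1, the convergence of $\mu_{\sqrt{M}, M}$ to $\mu$ is narrow.

Step two: Now let $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$, and let $\{\mu_j\}_{j=1}^\infty$ and $\{\nu_j\}_{j=1}^\infty$ be discrete probability measures such that
\[ \mu_j \overset{n}{\rightharpoonup} \mu, \qquad \nu_j \overset{n}{\rightharpoonup} \nu, \qquad \text{as } j \to \infty. \]
By the theory developed in this section, for each $j$ there is an optimal plan $\gamma_j$ in $K_c(\mu_j, \nu_j) < \infty$, and $\operatorname{spt} \gamma_j$ is c-cyclically monotone. Up to extracting a subsequence, we have $\gamma_j \overset{*}{\rightharpoonup} \gamma$ for some Radon measure $\gamma$ on $\mathbb{R}^n \times \mathbb{R}^n$. By narrow convergence of $\mu_j$ and $\nu_j$ to $\mu$ and $\nu$ respectively, we have that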
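\[ \lim_{R \to \infty} \sup_j \big( \mu_j(\mathbb{R}^n \setminus B_R) + \nu_j(\mathbb{R}^n \setminus B_R) \big) = 0. \]
In particular, by arguing as in the proof of (3.16), we find that $\gamma_j \overset{n}{\rightharpoonup} \gamma$, and thus that $\gamma \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n)$. By (A.12), for every $(x, y) \in \operatorname{spt} \gamma$ there are $(x_j, y_j) \in \operatorname{spt} \gamma_j$ such that $(x_j, y_j) \to (x, y)$ as $j \to \infty$: since the c-cyclical monotonicity inequality for a continuous cost $c$ is a closed condition, we deduce the c-cyclical monotonicity of $\operatorname{spt} \gamma$ from that of $\operatorname{spt} \gamma_j$. By Theorem 3.11, there exists a c-convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $\operatorname{spt} \gamma \subset \partial_c f$. Moreover, the narrow convergence of $\mu_j$, $\nu_j$ and $\gamma_j$ to $\mu$, $\nu$ and $\gamma$ respectively implies that $\gamma \in \Gamma(\mu, \nu)$.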
3.7 Linear and Quadratic Kantorovich Dualities

We now examine some immediate consequences of the Kantorovich duality for the transport problems with linear and quadratic costs. The corresponding results, see Theorem 3.17 and Theorem 3.20, will be the starting points for the detailed analysis of the corresponding Monge problems in Part IV and Part II, respectively. It will be convenient to work with the set of probability measures with finite p-moment,
\[ \mathcal{P}_p(\mathbb{R}^n) = \Big\{ \mu \in \mathcal{P}(\mathbb{R}^n) : \int_{\mathbb{R}^n} |x|^p\, d\mu(x) < \infty \Big\}, \qquad 1 \le p < \infty, \]
so that the Kantorovich problem corresponding to $c(x, y) = |x - y|^p$, denoted by $K_p(\mu, \nu)$, is such that $K_p(\mu, \nu) < \infty$ for every $\mu, \nu \in \mathcal{P}_p(\mathbb{R}^n)$; indeed, if $\mu, \nu \in \mathcal{P}_p(\mathbb{R}^n)$, then $\mu \times \nu$ defines a transport plan from $\mu$ to $\nu$ with finite p-cost. This remark is useful because the finiteness of $K_c(\mu, \nu)$ is a necessary assumption to apply Kantorovich's theory.

Theorem 3.17 If $\mu, \nu \in \mathcal{P}_1(\mathbb{R}^n)$, then $K_1(\mu, \nu)$ admits optimal transport plans, and there exists a Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$ with $\mathrm{Lip}(f) \le 1$, called a Kantorovich potential from $\mu$ to $\nu$, such that $\gamma \in \Gamma(\mu, \nu)$ is an optimal transport plan in $K_1(\mu, \nu)$ if and only if
\[ f(y) = f(x) + |x - y| \qquad \forall (x, y) \in \operatorname{spt} \gamma. \tag{3.37} \]
Moreover, the Kantorovich duality formula (3.27) holds in the form
\[ K_1(\mu, \nu) = \sup\Big\{ \int_{\mathbb{R}^n} g\, d\nu - \int_{\mathbb{R}^n} g\, d\mu : g : \mathbb{R}^n \to \mathbb{R},\ \mathrm{Lip}(g) \le 1 \Big\}, \tag{3.38} \]
and any Kantorovich potential $f$ achieves the supremum in (3.38).
Remark 3.18 A special situation where all Kantorovich potentials agree up to additive constants is described in Remark 19.5.

Remark 3.19 It is easily seen (arguing along the lines of the following proof) that if $f$ is a Kantorovich potential from $\mu$ to $\nu$, then $f^c = -f$ and $-f$ is a Kantorovich potential from $\nu$ to $\mu$.

Proof of Theorem 3.17 In the following we set $c(x, y) = |x - y|$.

Step one: By Theorem 3.15, there exists a c-convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $\gamma \in \Gamma(\mu, \nu)$ is an optimal transport plan for $K_1(\mu, \nu)$ if and only if (3.37) holds. The c-convexity of $f$ means that
\[ f(x) = \sup\big\{ \alpha(y) - c(x, y) : y \in \mathbb{R}^n \big\} \qquad \forall x \in \mathbb{R}^n, \tag{3.39} \]
where $\alpha : \mathbb{R}^n \to \mathbb{R} \cup \{-\infty\}$. We notice that $\mathrm{Dom}(f) \ne \emptyset$: indeed, $\partial_c f \ne \emptyset$ since there is an optimal plan $\gamma$ for $K_1(\mu, \nu)$ (Theorem 3.8), and therefore $\emptyset \ne \operatorname{spt} \gamma \subset \partial_c f$ by Theorem 3.15. In turn $\mathrm{Dom}(f) \ne \emptyset$ implies that $f$ is 1-Lipschitz. Indeed, let us notice as a general fact that if $f$ is c-convex (with $c(x, y) = |x - y|$), then $\mathrm{Dom}(f) \ne \emptyset$ if and only if $\mathrm{Lip}(f) \le 1$ (and, in particular, $f$ is everywhere finite). Indeed, if $f$ is c-convex and $f(x_0) < \infty$, then
\[ \alpha(y) \le f(x_0) + |y - x_0| \qquad \forall y \in \mathbb{R}^n, \tag{3.40} \]
and in particular $\alpha(y) - |x - y| \le f(x_0) + |y - x_0| - |x - y| \le f(x_0) + |x - x_0|$, so that, taking the supremum over $y \in \mathbb{R}^n$, $f(x) \le f(x_0) + |x - x_0|$ and $\mathrm{Dom}(f) = \mathbb{R}^n$. Since $f(x) < \infty$ we can interchange roles and prove, by the same argument, that $f(x_0) \le f(x) + |x - x_0|$, thus showing that $\mathrm{Lip}(f) \le 1$. Conversely, if $\mathrm{Lip}(f) \le 1$, then $f$ is c-convex: indeed (3.39) holds trivially with $\alpha(y) = f(y)$ for $y \in \mathbb{R}^n$. The fact that $f$ is a 1-Lipschitz function implies that
\[ y \in \partial_c f(x) \quad \text{if and only if} \quad f(y) = f(x) + |x - y|. \tag{3.41} \]
In other words, $y \in \partial_c f(x)$ if and only if $y$ “saturates” the 1-Lipschitz condition of $f$ with respect to $x$. In one direction, if $y \in \partial_c f(x)$, then (3.20) implies that
\[ f(z) \ge f(x) + |x - y| - |z - y| \qquad \forall z \in \mathbb{R}^n. \]
Setting $z = y$ one finds $f(y) \ge f(x) + |x - y|$; since the opposite inequality follows by $\mathrm{Lip}(f) \le 1$, we find $f(y) = f(x) + |y - x|$. Conversely, if $f(y) = f(x) + |x - y|$ and $z \in \mathbb{R}^n$, then by $\mathrm{Lip}(f) \le 1$ we find $f(z) \ge f(y) - |z - y| = f(x) + |x - y| - |z - y|$, and thus, by arbitrariness of $z$, $y \in \partial_c f(x)$.
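Step two: We now prove (3.38). First of all we trivially have
\[ K_1(\mu, \nu) \ge \sup\Big\{ \int_{\mathbb{R}^n} g\, d\nu - \int_{\mathbb{R}^n} g\, d\mu : g : \mathbb{R}^n \to \mathbb{R},\ \mathrm{Lip}(g) \le 1 \Big\}. \]
At the same time if $\gamma$ is optimal in $K_1(\mu, \nu)$ and $f$ is a Kantorovich potential from $\mu$ to $\nu$, then $f$ belongs to the competition class of the supremum problem since $\mathrm{Lip}(f) \le 1$, while (3.37) gives
\[ K_1(\mu, \nu) = \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|\, d\gamma(x, y) = \int_{\mathbb{R}^n \times \mathbb{R}^n} \big( f(y) - f(x) \big)\, d\gamma(x, y) = \int_{\mathbb{R}^n} f\, d\nu - \int_{\mathbb{R}^n} f\, d\mu, \]
thus proving (3.38).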
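Two facts used in this proof, namely that a c-convex function with non-empty domain is 1-Lipschitz when $c(x, y) = |x - y|$, and that $f^c = -f$ for such functions (Remark 3.19), can be observed on a grid. The following sketch is illustrative only: the dimension is one, the function $\alpha$ in (3.39) is assumed to be finite at three hypothetical points and $-\infty$ elsewhere, and the supremum defining $f^c$ is restricted to the grid.

```python
def c_convex_linear(alpha, ys):
    """f(x) = max_j [alpha_j - |x - y_j|], a c-convex function for c(x, y) = |x - y|."""
    return lambda x: max(a - abs(x - y) for a, y in zip(alpha, ys))


grid = [i / 10 for i in range(-50, 51)]
f = c_convex_linear([0.3, -1.0, 2.0], [-2.0, 0.5, 3.0])

# Largest difference quotient on the grid: an estimate from below of Lip(f).
lip = max(abs(f(s) - f(t)) / abs(s - t) for s in grid for t in grid if s != t)


def fc(y):
    """f^c(y) = sup_x [-|x - y| - f(x)], with the supremum restricted to the grid."""
    return max(-abs(x - y) - f(x) for x in grid)


# Largest deviation from the identity f^c = -f at the grid points.
gap = max(abs(fc(y) + f(y)) for y in grid)
print(lip, gap)
```

Since every grid point is admissible in the supremum defining `fc`, the choice $x = y$ gives $f^c(y) \ge -f(y)$, while the 1-Lipschitz bound gives the opposite inequality; this is the same argument that yields (3.41).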
Let us now consider the case of the quadratic cost $c(x, y) = |x - y|^2/2$. It is easily seen that,
(i) a function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is c-convex if and only if it is lower semicontinuous on $\mathbb{R}^n$ and $x \mapsto f(x) + |x|^2/2$ is convex on $\mathbb{R}^n$;
(ii) for every $x \in \mathbb{R}^n$, denoting by $\partial$ the subdifferential of a convex function,
\[ \partial_c f(x) = \partial\Big( f + \frac{|\cdot|^2}{2} \Big)(x); \]
(iii) if $f$ is a c-convex function, then
\[ f^c(y) = \Big( f + \frac{|\cdot|^2}{2} \Big)^{*}(y) - \frac{|y|^2}{2}. \tag{3.42} \]
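Identity (3.42) follows by expanding $-|x - y|^2/2 = x \cdot y - |x|^2/2 - |y|^2/2$ inside the supremum defining $f^c$, and it can be checked numerically. The sketch below makes assumptions that are not in the text: the dimension is one, both suprema are restricted to the same finite grid, and $f(x) = |x - 1| - 0.3\,x^2$ is a hypothetical c-convex function (indeed $f + |x|^2/2$ is convex).

```python
def c_transform(f, grid):
    """f^c(y) = sup_x [-|x - y|^2 / 2 - f(x)], with the supremum restricted to the grid."""
    return lambda y: max(-0.5 * (x - y) ** 2 - f(x) for x in grid)


def legendre(h, grid):
    """h^*(y) = sup_x [x * y - h(x)], with the supremum restricted to the grid."""
    return lambda y: max(x * y - h(x) for x in grid)


def f(x):
    """Hypothetical c-convex function: f + x^2/2 = |x - 1| + 0.2 x^2 is convex."""
    return abs(x - 1.0) - 0.3 * x * x


grid = [i / 20 for i in range(-100, 101)]
fc = c_transform(f, grid)
h_star = legendre(lambda x: f(x) + 0.5 * x * x, grid)

# Largest deviation between the two sides of (3.42) at the grid points.
gap = max(abs(fc(y) - (h_star(y) - 0.5 * y * y)) for y in grid)
print(gap)
```

The two sides agree up to rounding because the expressions inside the two suprema coincide term by term after the expansion above.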
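It is thus immediate to deduce the following result.

Theorem 3.20 If $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^n)$, then $K_2(\mu, \nu)$ admits optimal transport plans, and there exists a convex, lower semicontinuous function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $\gamma \in \Gamma(\mu, \nu)$ is an optimal transport plan in $K_2(\mu, \nu)$ if and only if $\operatorname{spt} \gamma \subset \partial f$, and such that
\[ K_2(\mu, \nu) = \int_{\mathbb{R}^n} \frac{|x|^2}{2}\, d\mu + \int_{\mathbb{R}^n} \frac{|y|^2}{2}\, d\nu - \int_{\mathbb{R}^n} f\, d\mu - \int_{\mathbb{R}^n} f^*\, d\nu. \tag{3.43} \]
Moreover, the Kantorovich duality formula (3.27) holds in the form
\[ K_2(\mu, \nu) = \sup\Big\{ \int_{\mathbb{R}^n} (-g)\, d\mu + \int_{\mathbb{R}^n} (-g^c)\, d\nu : g \in C_b^0(\mathbb{R}^n) \cap \mathrm{Lip}(\mathbb{R}^n) \Big\}, \tag{3.44} \]
where $c(x, y) = |x - y|^2/2$.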
Remark 3.21 It is qualitatively clear that the notion of subdifferential in the case of the linear cost, highlighted in (3.41), is much less stringent than the notion of subdifferential for the quadratic cost. Although just at a heuristic level, this remark points in the right direction in indicating that the linear cost problem is rougher than the quadratic cost problem, an indication that will be confirmed under every viewpoint: uniqueness of optimal plans, existence, uniqueness and regularity of transport maps, and so on.

Remark 3.22 (Improved Kantorovich duality) Identity (3.44) points to the problem of the continued validity of the Kantorovich duality formula (3.27) when one restricts the competition class of the dual problem to only include functions more regular than Borel measurable: this may of course require attention since, by restricting the competition class of a supremum, we are possibly decreasing its value. We will address this problem only for the quadratic cost, since this is the case of the improved Kantorovich duality formula needed (or at least, convenient for technical reasons) in the proof of the Brenier–Benamou formula (see Chapter 15, Theorem 15.6). In particular, step two and step three of the proof of Theorem 3.20, which are devoted to the proof of (3.44), can be safely skipped on a first reading.

Proof of Theorem 3.20 Step one: Since $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^n)$ implies $c(x, y) = |x - y|^2 \in L^1(\mu \times \nu)$, by Theorem 3.15, there exists a c-convex, lower-semicontinuous function $g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that every optimal transport plan $\gamma$ in $K_2(\mu, \nu)$ satisfies $\operatorname{spt} \gamma \subset \partial_c g$. Conversely by Theorem 3.13, if $\operatorname{spt} \gamma \subset \partial_c g$, then $\gamma$ is optimal in $K_2(\mu, \nu)$. Moreover,
\[ K_2(\mu, \nu) = \int_{\mathbb{R}^n} (-g)\, d\mu + \int_{\mathbb{R}^n} (-g^c)\, d\nu, \tag{3.45} \]
thanks to (3.27). By the above remarks, setting $f(x) = g(x) + |x|^2/2$ for $x \in \mathbb{R}^n$, $\operatorname{spt} \gamma \subset \partial_c g$ is equivalent to $\operatorname{spt} \gamma \subset \partial f$, where $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is convex and lower semicontinuous on $\mathbb{R}^n$, and (3.43) follows by $g(x) = f(x) - |x|^2/2$ and by (3.42).

Step two: To approach the proof of (3.44), we start by making the following remark: if $c$ is a nonnegative, bounded, and Lipschitz continuous cost function, $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ are such that $K_c(\mu, \nu) < \infty$, $\gamma$ is an optimal plan in $K_c(\mu, \nu)$, and $g$ is the c-convex function associated by Theorem 3.11 to $S = \operatorname{spt} \gamma$ and to a given $(x_0, y_0) \in \mathbb{R}^n \times \mathbb{R}^n$ (see (3.31)), then both $g$ and $g^c$ are bounded and Lipschitz continuous. Indeed, by (3.32), $g$ is Lipschitz continuous (as the supremum of a family of uniformly Lipschitz functions) and bounded from below (since $c$ is bounded from above). From $g^c(y) = \sup\{-c(x, y) - g(x) : x \in \mathbb{R}^n\}$, we thus see that $g^c$ is bounded from above (as $g$ is bounded from below and
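$c \ge 0$), bounded from below (as $g(x_0) = 0$, $g^c(y) \ge -c(x_0, y) \ge -\sup c$), and Lipschitz continuous (again, as a supremum of uniformly Lipschitz continuous functions). From $g(x) = g^{cc}(x) = \sup\{-c(x, y) - g^c(y) : y \in \mathbb{R}^n\}$ and since $g^c$ is bounded from below, we also see that $g$ is bounded from above.

Step three: Now let $c(x, y) = |x - y|^2/2$ and let $\{c_j\}_j$ be such that $c_j = c$ on $B_j^{2n} = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^n : |x|^2 + |y|^2 < j^2\}$, $c_j \uparrow c$ on $\mathbb{R}^n \times \mathbb{R}^n$, and $c_j$ is bounded and Lipschitz continuous on $\mathbb{R}^n \times \mathbb{R}^n$. By step two and by Theorem 3.13, if $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^n)$, then we have
\[ K_{c_j}(\mu, \nu) = \sup_{(\alpha, \beta) \in \mathcal{A}_j} \int_{\mathbb{R}^n} \alpha\, d\mu + \int_{\mathbb{R}^n} \beta\, d\nu, \tag{3.46} \]
where $\mathcal{A}_j$ is the class of those functions $\alpha, \beta \in C_b^0(\mathbb{R}^n) \cap \mathrm{Lip}(\mathbb{R}^n)$ such that $\alpha(x) + \beta(y) \le c_j(x, y)$ for every $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$. If $\gamma_j$ is an optimal transport plan for $K_{c_j}(\mu, \nu)$, then by $\gamma_j \in \Gamma(\mu, \nu)$, and up to extracting subsequences, there is $\gamma \in \Gamma(\mu, \nu)$ such that $\gamma_j \overset{n}{\rightharpoonup} \gamma$. In particular, for each $(x, y) \in \operatorname{spt} \gamma$, there is $(x_j, y_j) \in \operatorname{spt} \gamma_j$ such that $(x_j, y_j) \to (x, y)$ as $j \to \infty$, and we can use this fact, together with the observation that testing the c-cyclical monotonicity of $\operatorname{spt} \gamma$ requires the consideration of only finitely many points in $\operatorname{spt} \gamma$ at a time, to see that the $c_j$-cyclical monotonicity of $\operatorname{spt} \gamma_j$ implies the c-cyclical monotonicity of $\operatorname{spt} \gamma$, and thus that, thanks to Theorem 3.13,
\[ K_2(\mu, \nu) = \int_{\mathbb{R}^n \times \mathbb{R}^n} c\, d\gamma. \tag{3.47} \]
Now, since $c = c_j$ on $B_j^{2n}$, setting $V_r = (\mathbb{R}^n \times \mathbb{R}^n) \setminus B_r^{2n}$ we see that, if $r < j$ is such that $\gamma(\partial B_r^{2n}) = 0$ ($\mathcal{L}^1$-a.e. $r < j$ has this property), then
\[ \Big| \int_{\mathbb{R}^n \times \mathbb{R}^n} c\, d\gamma - \int_{\mathbb{R}^n \times \mathbb{R}^n} c_j\, d\gamma_j \Big| \le \Big| \int_{B_r^{2n}} c\, d\gamma - \int_{B_r^{2n}} c\, d\gamma_j \Big| + \int_{V_r} c\, d\gamma + \int_{V_r} c_j\, d\gamma_j, \]
where the first term converges to zero as $j \to \infty$ by $\gamma(\partial B_r^{2n}) = 0$, while
\[ \int_{V_r} c_j\, d\gamma_j \le \int_{V_r} \big( |x|^2 + |y|^2 \big)\, d\gamma_j = \int_{\mathbb{R}^n} |x|^2\, d\mu + \int_{\mathbb{R}^n} |y|^2\, d\nu - \int_{B_r^{2n}} \big( |x|^2 + |y|^2 \big)\, d\gamma_j, \]
so that, by $\gamma_j \overset{n}{\rightharpoonup} \gamma$ and lower semicontinuity on open sets,
\[ \limsup_{j \to \infty} \int_{V_r} c_j\, d\gamma_j \le \int_{\mathbb{R}^n} |x|^2\, d\mu + \int_{\mathbb{R}^n} |y|^2\, d\nu - \int_{B_r^{2n}} \big( |x|^2 + |y|^2 \big)\, d\gamma = \int_{V_r} \big( |x|^2 + |y|^2 \big)\, d\gamma. \]
In summary,
\[ \limsup_{j \to \infty} \Big| \int_{\mathbb{R}^n \times \mathbb{R}^n} c\, d\gamma - \int_{\mathbb{R}^n \times \mathbb{R}^n} c_j\, d\gamma_j \Big| \le 2 \int_{V_r} \big( |x|^2 + |y|^2 \big)\, d\gamma, \]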
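where the latter quantity converges to zero as $r \to \infty$, since $\gamma \in \mathcal{P}_2(\mathbb{R}^n \times \mathbb{R}^n)$. Combining this with (3.47) and (3.46) we see that
\[ K_2(\mu, \nu) = \lim_{j \to \infty} \sup_{(\alpha, \beta) \in \mathcal{A}_j} \int_{\mathbb{R}^n} \alpha\, d\mu + \int_{\mathbb{R}^n} \beta\, d\nu \le \sup\Big\{ \int_{\mathbb{R}^n} (-g)\, d\mu + \int_{\mathbb{R}^n} (-g^c)\, d\nu : g \in C_b^0(\mathbb{R}^n) \cap \mathrm{Lip}(\mathbb{R}^n) \Big\} \le K_2(\mu, \nu), \]
where in the first inequality we have used $c_j \le c$ to notice that every $(\alpha, \beta) \in \mathcal{A}_j$ satisfies $\alpha(x) + \beta(y) \le c(x, y)$ whenever $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$, while in the second inequality we have used (3.27).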
PART II Solution of the Monge Problem with Quadratic Cost: The Brenier–McCann Theorem
4 The Brenier Theorem
In this chapter we start our analysis of the Monge problem $M_c(\mu, \nu)$ for the quadratic cost $c(x, y) = |x - y|^2$, namely, of
\[ M_2(\mu, \nu) = \inf\Big\{ \int_{\mathbb{R}^n} |x - T(x)|^2\, d\mu(x) : T_\# \mu = \nu \Big\}. \]
As noticed in Remark 2.1 the competition class for this problem could be empty, and, as explained in Chapter 1, even in a situation where the existence of transport maps is not in doubt (e.g., because we assume from the outset that $\nu = T_\# \mu$ for some map $T$ that transports $\mu$), it is not clear how to prove the existence of minimizers. At the same time, in Theorem 3.20, given $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^n)$, we have proved the existence of optimal transport plans in the quadratic Kantorovich problem
\[ K_2(\mu, \nu) = \inf\Big\{ \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma(x, y) : \gamma \in \Gamma(\mu, \nu) \Big\}. \]
Since K2 (μ, ν) ≤ M2 (μ, ν), a natural strategy for proving the existence of minimizers of M2 (μ, ν) is to show that for every optimal plan γ in K2 (μ, ν) there is a transport map T from μ to ν such that γ = γT = (id × T )# μ (cf. Proposition 3.1). Theorem 3.20 strongly suggests that such a map T should be the gradient of a convex function. Indeed, by Theorem 3.20, if μ, ν ∈ P2 (Rn ), then there is a convex, lower-semicontinuous function f : Rn → R ∪ {+∞} such that every optimal transport plan γ in K2 (μ, ν) satisfies spt γ ⊂ ∂ f = (x, y) : y ∈ ∂ f (x) . (4.1) For a generic x, ∂ f (x) is a set, not a point; however, by Rademacher’s theorem (see Appendix A.10), the local Lipschitz property of convex functions, and Proposition 2.7, we know that ∂ f (x) = {∇ f (x)} for L n -a.e. x ∈ Dom( f ). In particular, if μ 0 such that ∂ f (Bδ (x)) ⊂ Bε (∇ f (x)).
(4.14)
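By (4.14), we see that if $x \in F$, then for every $\varepsilon > 0$ we can find $\delta > 0$ such that, thanks to $(\nabla f)_\# \mu = \nu$,
\[ \nu(B_\varepsilon(\nabla f(x))) \ge \nu(\partial f(B_\delta(x))) \ge \nu(\nabla f(F \cap B_\delta(x))) = \mu\big( \{ z \in F : \nabla f(z) \in \nabla f(F \cap B_\delta(x)) \} \big) \ge \mu(F \cap B_\delta(x)) = \mu(B_\delta(x)). \]
In particular, if $x \in (\operatorname{spt} \mu) \cap F$, then $\mu(B_\delta(x)) > 0$ and the arbitrariness of $\varepsilon$ give $\nabla f(x) \in \operatorname{spt} \nu$. We have thus proved $\nabla f(F \cap \operatorname{spt} \mu) \subset \operatorname{spt} \nu$, and since $\operatorname{spt} \nu$ is closed, (4.7) follows.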
4.2 Inverse of a Brenier Map and Fenchel–Legendre Transform

Of course, the case when both $\mu$ and $\nu$ are absolutely continuous with respect to $\mathcal{L}^n$ is of particular importance. In this case, in addition to the Brenier map $\nabla f$ from $\mu$ to $\nu$, we have the Brenier map $\nabla g$ from $\nu$ to $\mu$. In the following theorem we prove that $\nabla g$ is the inverse of $\nabla f$ (in the proper a.e. sense), and that the Fenchel–Legendre transform $f^*$ of $f$ is always a suitable choice for the convex potential $g$.

Theorem 4.4 If $\mu, \nu \in \mathcal{P}_{2,ac}(\mathbb{R}^n)$ and $\nabla f$ is the Brenier map from $\mu$ to $\nu$, then $\nabla f^*$ is the Brenier map from $\nu$ to $\mu$. Moreover, the following properties hold:
(i) for $\nu$-a.e. $y \in \mathbb{R}^n$, $f$ is differentiable at $x = \nabla f^*(y)$, and $\nabla f(\nabla f^*(y)) = y$;
(ii) for $\mu$-a.e. $x \in \mathbb{R}^n$, $f^*$ is differentiable at $y = \nabla f(x)$, and $\nabla f^*(\nabla f(x)) = x$.
4.3 Brenier Maps under Rigid Motions and Dilations
Proof Step one: Let R(x, y) = (y, x). As already noticed in Remark 3.5, if γ ∈ Γ(μ, ν), then R# γ ∈ Γ(ν, μ) and the two plans have the same quadratic cost. Therefore, K2 (μ, ν) = K2 (ν, μ). If now γ = (id × ∇ f )# μ is optimal in K2 (μ, ν), then γ ∗ = R# γ is optimal in K2 (ν, μ). By (4.10), for γ-a.e. (x, y) ∈ Rn × Rn we have y ∈ ∂ f (x). Since y ∈ ∂ f (x) if and only if x ∈ ∂ f ∗ (y), we conclude that for γ ∗ -a.e. (x, y) ∈ Rn × Rn we have x ∈ ∂ f ∗ (y). Since ν 0 and μ = ρ dL n ∈ P2,ac (Rn ), then the measures τx0 μ(E) = μ(E − x 0 ),
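(4.15)
\[ Q_\# \mu(E) = \mu(Q^{-1}(E)), \tag{4.16} \]
\[ \mu^\lambda(E) = \mu(E/\lambda), \qquad E \in \mathcal{B}(\mathbb{R}^n), \tag{4.17} \]
belong to $\mathcal{P}_{2,ac}(\mathbb{R}^n)$ and are such that
\[ \operatorname{spt} \tau_{x_0} \mu = x_0 + \operatorname{spt} \mu, \qquad \operatorname{spt}(Q_\# \mu) = Q(\operatorname{spt} \mu), \qquad \operatorname{spt} \mu^\lambda = \lambda \operatorname{spt} \mu. \]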
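Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex such that $\nabla f$ transports $\mu$, so that $\nabla f$ is the Brenier map from $\mu$ to $\nu = (\nabla f)_\# \mu$ by Theorem 4.2. We can easily obtain the Brenier maps from $\tau_{x_0} \mu$, $Q_\# \mu$, and $\mu^\lambda$ to, respectively, $\tau_{x_0} \nu$, $Q_\# \nu$, and $\nu^\lambda$, by looking at the convex functions
\[ x \mapsto f(x - x_0) + x \cdot x_0, \qquad x \mapsto f(Q^*(x)), \qquad x \mapsto \lambda^2 f(x/\lambda). \]
For example, if $\varphi \in C_c^0(\mathbb{R}^n)$, then $(\nabla f)_\# \mu = \nu$ gives
\[ \int_{\mathbb{R}^n} \varphi\, d[\tau_{x_0} \nu] = \int_{\mathbb{R}^n} \varphi(y + x_0)\, d\nu(y) = \int_{\mathbb{R}^n} \varphi(\nabla f(x) + x_0)\, d\mu(x), \]
while at the same time if $g$ is convex and such that $(\nabla g)_\# [\tau_{x_0} \mu] = \tau_{x_0} \nu$, then
\[ \int_{\mathbb{R}^n} \varphi\, d[\tau_{x_0} \nu] = \int_{\mathbb{R}^n} \varphi(\nabla g)\, d[\tau_{x_0} \mu] = \int_{\mathbb{R}^n} \varphi(\nabla g(x + x_0))\, d\mu(x), \]
so that we must have $\nabla g(x) = \nabla f(x - x_0) + x_0$ for $\mu$-a.e. $x \in \mathbb{R}^n$; in particular, we may take $g(x) = f(x - x_0) + x \cdot x_0$ and (4.15) holds. The proofs of (4.16) and (4.17) are entirely analogous.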
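The formulas for the translated and dilated potentials can be checked by finite differences. The sketch below is a one-dimensional illustration with a hypothetical convex potential $f(x) = x^4/4 + x^2/2$ (so that $\nabla f(x) = x^3 + x$); the values of `x0` and `lam` and the helper `num_grad` are assumptions made for the example. It compares the numerical gradient of $x \mapsto f(x - x_0) + x\, x_0$ with $\nabla f(x - x_0) + x_0$, and that of $x \mapsto \lambda^2 f(x/\lambda)$ with $\lambda \nabla f(x/\lambda)$.

```python
def f(x):
    """Hypothetical convex potential on the real line."""
    return x ** 4 / 4 + x ** 2 / 2


def grad_f(x):
    """Exact derivative of f, playing the role of the Brenier map."""
    return x ** 3 + x


def num_grad(h, x, eps=1e-6):
    """Central finite-difference approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)


x0, lam = 0.7, 2.5


def g_translated(x):
    """Potential for the translated measures: f(x - x0) + x * x0."""
    return f(x - x0) + x * x0


def g_dilated(x):
    """Potential for the dilated measures: lam^2 * f(x / lam)."""
    return lam ** 2 * f(x / lam)


points = [i / 4 for i in range(-8, 9)]
err_translation = max(
    abs(num_grad(g_translated, x + x0) - (grad_f(x) + x0)) for x in points
)
err_dilation = max(
    abs(num_grad(g_dilated, lam * x) - lam * grad_f(x)) for x in points
)
print(err_translation, err_dilation)
```

In both cases a point $x$ of $\operatorname{spt} \mu$ is moved to $x + x_0$ (respectively $\lambda x$), and the new map sends it to $\nabla f(x) + x_0$ (respectively $\lambda \nabla f(x)$), which is exactly the image of $\nabla f(x)$ under the same rigid motion or dilation.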
5 First Order Differentiability of Convex Functions
In this chapter we establish two basic facts concerning the first order differentiability of convex functions: a sharp dimensional estimate on the set of non-differentiability points (Theorem 5.1), and a non-smooth, convex version of the implicit function theorem (Theorem 5.3). These results will then be used in Chapter 6 to relax the condition of $\mathcal{L}^n$-absolute continuity on the origin measure in the Brenier theorem.
5.1 First Order Differentiability and Rectifiability
The $\mathcal{L}^n$-a.e. differentiability of convex functions was deduced in Section 2.3 as a direct consequence of Rademacher’s theorem and the local Lipschitz bounds implied by convexity. This approach does not take full advantage of convexity, and indeed a bit of experimenting suggests that the set of non-differentiability points of a convex function should have, at most, codimension one; in Theorem 5.1 below, we indeed prove its countable $(n-1)$-rectifiability. Here we are using the terminology (see Appendix A.15 for more context) according to which a Borel set $M \subset \mathbb{R}^n$ is countably $k$-rectifiable in $\mathbb{R}^n$ (where $0 \le k \le n-1$ is an integer) if there exist countably many Lipschitz maps $g_j : \mathbb{R}^k \to \mathbb{R}^n$ such that
\[ M \subset \bigcup_{j \in \mathbb{N}} g_j(\mathbb{R}^k). \]
Theorem 5.1 (First order differentiability of convex functions) If $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a convex function on $\mathbb{R}^n$ with set of differentiability points $F$, then $\mathrm{Dom}(f) \setminus F$ is a countably $(n-1)$-rectifiable subset of $\mathbb{R}^n$.
Proof For a preliminary insight into the proof, see Figure 5.1.
Figure 5.1 Proof of Theorem 5.1. On the left, the subdifferential $\partial f$ of a convex function $f$. On the right, the graph $S$ of $(\mathrm{id} + \partial f)$: by monotonicity of $\partial f$ we can invert $(\mathrm{id} + \partial f)$ on $S$ as a 1-Lipschitz map. Now, if $x \in \mathrm{Dom}(f) \setminus F$, then $\partial f(x)$ has affine dimension at least 1; in particular, $\partial f(x)$ intersects every element of an open ball in the metric space of affine hyperplanes (this ball is represented by a “cloud of lines” in the picture). By taking a countable dense subset in the space of affine hyperplanes and using the inverse map of $(\mathrm{id} + \partial f)$, we thus cover $\mathrm{Dom}(f) \setminus F$ by countably many Lipschitz images of $\mathbb{R}^{n-1}$.
Step one: Let us consider the multi-valued map $(\mathrm{id} + \partial f) : \mathbb{R}^n \to \mathcal{K}$, i.e.,
\[ (\mathrm{id} + \partial f)(x) = \{ x + y : y \in \partial f(x) \}, \qquad x \in \mathbb{R}^n, \]
where $\mathcal{K}$ is the family of closed convex subsets of $\mathbb{R}^n$, and let
\[ S = (\mathrm{id} + \partial f)(\mathbb{R}^n) = \{ z \in \mathbb{R}^n : z = x + y,\ x \in \mathbb{R}^n,\ y \in \partial f(x) \}. \]
We first claim that for every $z \in S$ there exists a unique $x \in \mathbb{R}^n$ so that $z = x + y$ for some $y \in \partial f(x)$. Indeed, if $z_1 = x_1 + y_1$ and $z_2 = x_2 + y_2$ are elements of $S$, then by cyclical monotonicity of $\partial f$ we have
\[ (y_1 - y_2) \cdot (x_1 - x_2) \ge 0, \tag{5.1} \]
and therefore
\[ |x_1 - x_2|^2 \le [(x_1 - x_2) + (y_1 - y_2)] \cdot (x_1 - x_2) = (z_1 - z_2) \cdot (x_1 - x_2) \le |z_1 - z_2|\,|x_1 - x_2|, \]
so that $|x_1 - x_2| \le |z_1 - z_2|$. In particular, if $z_1 = z_2$, then $x_1 = x_2$ and, necessarily, $y_1 = y_2$. Based on this claim we can define a map $T = (\mathrm{id} + \partial f)^{-1} : S \to \mathbb{R}^n$ so that $T(z) = x$ is the unique element of $\mathbb{R}^n$ with the property that $z - x \in \partial f(x)$, and the resulting map is 1-Lipschitz, i.e.,
\[ |T(z_1) - T(z_2)| \le |z_1 - z_2| \qquad \forall z_1, z_2 \in S. \]
Finally, we extend $T$ as a 1-Lipschitz map from $\mathbb{R}^n$ to $\mathbb{R}^n$.
Step two: We can assume that $\Omega = \mathrm{Int}\,\mathrm{Dom}(f) \ne \emptyset$: otherwise $F = \emptyset$, $\mathrm{Dom}(f)$ is contained in a hyperplane, and the theorem is proved. Since $\mathrm{Dom}(f) \setminus \Omega$ is contained in the boundary of the convex set $\Omega$ (which is easily seen to be countably $(n-1)$-rectifiable), we can directly work with $\Omega \setminus F$. To conclude the argument, let us denote by $G_n^{n-1}$ the metric space of $(n-1)$-dimensional linear subspaces of $\mathbb{R}^n$, and let $\mathcal{A}_n^{n-1} = \mathbb{R}^n \times G_n^{n-1}$ represent the metric space of $(n-1)$-dimensional affine subspaces of $\mathbb{R}^n$. We let $Q = \{A_j\}_{j \in \mathbb{N}}$ denote a countable dense subset of $\mathcal{A}_n^{n-1}$. Now, if $x \in \Omega \setminus F$, then $\partial f(x)$ is a non-trivial convex set of affine dimension at least 1. In particular, there exists an open ball in $\mathcal{A}_n^{n-1}$ such that every affine plane $A$ in this ball intersects $\partial f(x)$: by density, we can find $j \in \mathbb{N}$ such that $\partial f(x) \cap A_j \ne \emptyset$. Let $g_j : A_j \to \mathbb{R}^n$ be defined as the restriction of $T$ to $A_j$: if $z \in A_j \cap \partial f(x)$, then $g_j(z) = x$. In summary, for every $x \in \Omega \setminus F$, there exists $j \in \mathbb{N}$ such that $x \in g_j(A_j)$. Since $A_j \equiv \mathbb{R}^{n-1}$ (and $g_j$ is Lipschitz since $T$ is), $\Omega \setminus F$ is countably $(n-1)$-rectifiable.
Remark 5.2 With the same proof, the set of those $x$ such that $\partial f(x)$ has affine dimension greater than or equal to $k$ is proved to be countably $(n-k)$-rectifiable. In particular, $\partial f(x)$ has non-empty interior only at countably many $x \in \mathbb{R}^n$.
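The map $T = (\mathrm{id} + \partial f)^{-1}$ from step one can be written down explicitly in a simple case. The following sketch takes the illustrative choice $f(x) = |x|$ on the real line (an assumption made here for concreteness, not an example from the text), for which $\partial f(0) = [-1, 1]$ and $T$ is the soft-thresholding map. It checks the 1-Lipschitz bound on a grid and shows that the whole segment $(\mathrm{id} + \partial f)(0) = [-1, 1]$ is sent to the single non-differentiability point $0$.

```python
# The inverse of (id + ∂f) for the illustrative choice f(x) = |x| on the line.
# Here ∂f(x) = {sign(x)} for x != 0 and ∂f(0) = [-1, 1], so that
# (id + ∂f)(0) = [-1, 1] and T = (id + ∂f)^(-1) is soft-thresholding.

def T(z):
    # Returns the unique x such that z - x belongs to ∂f(x).
    if z > 1.0:
        return z - 1.0
    if z < -1.0:
        return z + 1.0
    return 0.0

grid = [k / 100.0 for k in range(-300, 301)]

# 1-Lipschitz bound |T(z1) - T(z2)| <= |z1 - z2| on all pairs of grid points.
lipschitz_ok = all(
    abs(T(z1) - T(z2)) <= abs(z1 - z2) + 1e-12
    for z1 in grid for z2 in grid
)

# The segment [-1, 1] is collapsed onto the non-differentiability point 0.
collapsed = {T(z) for z in grid if abs(z) <= 1.0}

print(lipschitz_ok, collapsed)
```

In this one-dimensional case $n - 1 = 0$, so the covering of the non-differentiability set by Lipschitz images of $\mathbb{R}^{n-1}$ reduces to covering $\{0\}$ by the image of a point.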
5.2 Implicit Function Theorem for Convex Functions
The implicit function theorem for $C^1$-functions makes crucial use of the continuity of the gradients and is indeed false for Lipschitz functions.¹ In the following theorem we prove a version of the implicit function theorem for convex functions where the continuity of gradients is replaced by the continuity property of convex subdifferentials proved in Proposition 2.7.
¹ An example is given in [McC95]: let $n = 2$, $f(x) = x_1$ if $|x_1| \ge x_2^2$, $f(x) = 0$ if $|x_1| \le x_2^2/2$, and let $f(x)$ be defined by a Lipschitz extension elsewhere. In this example $f$ is differentiable at $x = 0$ with $\nabla f(0) = e_1$, and $f(0) = g(0) = 0$, where $g \equiv 0$. Nevertheless, the level set $\{f = g\} = \{f = 0\}$ contains an open set in every neighborhood of the origin.
Theorem 5.3 (Convex implicit function theorem) If $f, g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ are convex functions, both differentiable at $x$ and with $\nabla f(x) \ne \nabla g(x)$, then there exists $r > 0$ such that $B_r(x) \cap \{f = g\}$ is contained in the graph of a Lipschitz function of $(n-1)$ variables.
Proof The idea is to follow the classical proof of the implicit function theorem in the case of the two smooth functions $f_\varepsilon = f \star \rho_\varepsilon$ and $g_\varepsilon = g \star \rho_\varepsilon$ defined by $\varepsilon$-regularization of $f$ and $g$. By exploiting the continuity of subdifferentials, i.e., the fact that if $f$ is differentiable at $x$, then for every $\varepsilon > 0$ there exists $\delta > 0$ such that
\[ (\partial f)(B_\delta(x)) \subset B_\varepsilon(\nabla f(x)), \tag{5.2} \]
we will be able to take the limit $\varepsilon \to 0^+$ in the implicit functions so constructed, thus proving the theorem. Without loss of generality, we set $x = 0$, let $h = f - g$, assume that
\[ \nabla h(0) = 2\lambda\, e_n, \qquad \text{for } \lambda > 0, \tag{5.3} \]
and introduce the projection $\pi : \mathbb{R}^n \to \mathbb{R}^{n-1}$ defined by $\pi(y) = (y_1, \dots, y_{n-1})$, and the corresponding $(n-1)$-dimensional disks and cylinders
\[ K_r = \{ y \in \mathbb{R}^n : |\pi(y)| < r,\ y_n = 0 \}, \qquad C_r = \{ y \in \mathbb{R}^n : |\pi(y)| < r,\ |y_n| < r \}; \]
see Figure 5.2.
Figure 5.2 The notation used in the proof of Theorem 5.3.
We claim that there exist $\varepsilon_0 > 0$ and $r_0 > 0$ such that
\[ \nabla h_\varepsilon(y) \cdot e_n \ge \lambda \qquad \forall y \in C_{r_0},\ \forall \varepsilon < \varepsilon_0, \tag{5.4} \]
where $h_\varepsilon = h \star \rho_\varepsilon$. Indeed, if (5.4) fails, then we can find $\varepsilon_j \to 0^+$ and $y_j \to 0$ such that $\nabla h_{\varepsilon_j}(y_j) \cdot e_n < \lambda$. If $F$ is the set of differentiability points of $f$, then
\[ \nabla f_{\varepsilon_j}(y_j) = \int_{B_{\varepsilon_j}(y_j)} \rho_{\varepsilon_j}(z - y_j)\, \nabla f(z)\, dz \in \mathrm{Conv}\big( (\nabla f)(F \cap B_{\varepsilon_j}(y_j)) \big) \subset \mathrm{Conv}\big( \partial f(B_{\varepsilon_j + |y_j|}(0)) \big), \]
so that, thanks to (5.2), we find $\nabla f_{\varepsilon_j}(y_j) \to \nabla f(0)$. Similarly, $\nabla g_{\varepsilon_j}(y_j) \to \nabla g(0)$, and thus $\nabla h(0) \cdot e_n \le \lambda$, a contradiction with (5.3). This proves (5.4). By (5.4), we find
\[ h_\varepsilon(r_0 e_n) \ge h_\varepsilon(0) + \lambda r_0, \qquad h_\varepsilon(-r_0 e_n) \le h_\varepsilon(0) - \lambda r_0. \]
By combining these bounds with the basic uniform estimate sup ∇hε C 0 (C r
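ε u(z), then $w_n > u_\varepsilon(z)$ for every $\varepsilon$ small enough, and thus, thanks to (5.6) and (5.4),
\[ h_\varepsilon(w) = h_\varepsilon(z, u_\varepsilon(z)) + (w_n - u_\varepsilon(z)) \int_0^1 \nabla h_\varepsilon\big(z, u_\varepsilon(z) + t\,(w_n - u_\varepsilon(z))\big) \cdot e_n \, dt \ge \lambda\,(w_n - u_\varepsilon(z)), \]
which in turn, in the limit $\varepsilon \to 0^+$, gives $0 = h(w) \ge \lambda(w_n - u(z)) > 0$, a contradiction.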
6 The Brenier–McCann Theorem
We now exploit the results presented in Chapter 5 to improve the Brenier theorem in several directions.
Theorem 6.1 (The Brenier–McCann theorem) If $\mu \in \mathcal{P}(\mathbb{R}^n)$ is such that
\[ \mu(S) = 0 \tag{6.1} \]
for every countably $(n-1)$-rectifiable set $S$ in $\mathbb{R}^n$, then:
Existence: For every $\nu \in \mathcal{P}(\mathbb{R}^n)$ there exists a lower semicontinuous and convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ with set of differentiability points $F$ such that $\Omega = \mathrm{Int}\,\mathrm{Dom}(f) \ne \emptyset$, $\mu$ is concentrated on $F$, $(\nabla f)_\#\mu = \nu$, and $\mathrm{Cl}(\nabla f(F \cap \mathrm{spt}\,\mu)) = \mathrm{spt}\,\nu$. Moreover, if $\mathrm{spt}\,\nu$ is compact, then we can construct $f$ so that
\[ \mathrm{Dom}(f) = \mathbb{R}^n, \qquad \nabla f(x) \in \mathrm{spt}\,\nu \ \text{ for every } x \in F. \tag{6.2} \]
Uniqueness: If $f, g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ are convex functions such that $(\nabla f)_\#\mu = (\nabla g)_\#\mu$, then $\nabla f = \nabla g$ $\mu$-a.e. on $\mathbb{R}^n$.
Remark 6.2 The existence statement in Theorem 6.1 improves on Theorem 4.2 in two respects: first, it drops the assumption that $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^n)$; second, $\mu$ is no longer required to be absolutely continuous with respect to $\mathcal{L}^n$, but just to be null on countably $(n-1)$-rectifiable sets. The first improvement simply hinges on the use of Theorem 3.16 in place of Theorem 3.20. The second improvement is based on the first order differentiability result for convex functions established in Theorem 5.1. Notice in particular that, in light of Remark 4.1, condition (6.1) is sharp; i.e., if $\mu$ is positive on some $(n-1)$-dimensional set, there may be no optimal transport map. The uniqueness statement in Theorem 6.1 is formally the same as the uniqueness of Brenier maps proved in Theorem 4.2, but the proof is substantially different. The reason is, once again, that we may not have $|x - y|^2 \in L^1(\mu \times \nu)$ and thus may be unable to exploit the Kantorovich duality theorem (which is behind Theorem 3.15, and thus the uniqueness statements in the Brenier theorem). We will thus need an entirely new argument, which will be based on the implicit function theorem for convex functions (Theorem 5.3).
6.1 Proof of the Existence Statement
Step one: By Theorem 3.16 there are $\gamma \in \Gamma(\mu, \nu)$ and a convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $\mathrm{spt}\,\gamma \subset \partial f$. We now argue as in step one of the proof of the Brenier theorem, using (6.1) in place of μ 0, and a Lipschitz function $u : v^\perp \to \mathbb{R}$ such that u(0)ν = x 0 and
\[ B_{\delta_0}(x_0) \cap \{f > g\} = B_{\delta_0}(x_0) \cap \{ z + t\,v : z \in v^\perp,\ t > u(z) \}, \]
where $v^\perp = \{ z \in \mathbb{R}^n : z \cdot v = 0 \}$. By (6.1) we have $\mu(B_{\delta_0}(x_0) \cap \{f = g\}) = 0$, which, combined with the fact that $\mu(B_\delta(x_0)) > 0$ for every $\delta > 0$, implies that for every $\delta \in (0, \delta_0)$ we have
\[ \text{either } \mu\big(B_\delta(x_0) \cap \{f > g\}\big) > 0 \quad \text{or} \quad \mu\big(B_\delta(x_0) \cap \{f < g\}\big) > 0. \tag{6.4} \]
To find a contradiction to the fact that $\nabla f$ and $\nabla g$ define the same push-forward of $\mu$, we need to relate the behaviors of $\nabla f$ and $\nabla g$ on $\{f > g\}$ and $\{f < g\}$. To this end, we first prove that if $x \in G$, then
\[ \nabla g(x) \in \partial f(\{f > g\}) \quad \Rightarrow \quad f(x) > g(x). \tag{6.5} \]
Indeed, since $\nabla g(x) \in \partial f(z)$ for some $z$ such that $f(z) > g(z)$, for every $w \in \mathbb{R}^n$ we have
\[ f(w) \ge f(z) + \nabla g(x) \cdot (w - z) > g(z) + \nabla g(x) \cdot (w - z) \ge g(x) + \nabla g(x) \cdot (z - x) + \nabla g(x) \cdot (w - z), \]
i.e.,
\[ f(w) > g(x) + \nabla g(x) \cdot (w - x) \qquad \forall w \in \mathbb{R}^n. \tag{6.6} \]
Taking $w = x$ proves (6.5). We can prove a stronger localized version of (6.5) near $x_0$, namely we can show the existence of $\eta > 0$ such that if $x \in G$, then
\[ \nabla g(x) \in \partial f(\{f > g\}) \quad \Rightarrow \quad \begin{cases} f(x) > g(x), \\ |x - x_0| \ge \eta. \end{cases} \tag{6.7} \]
Indeed, if (6.7) fails, then we can find $x_j \to x_0$ with $\nabla g(x_j) \in \partial f(z_j)$ for $z_j \in \{f > g\}$, so that by (6.6)
\[ f(w) > g(x_j) + \nabla g(x_j) \cdot (w - x_j) \qquad \forall j,\ \forall w \in \mathbb{R}^n; \]
since $\nabla g(x_j) \to \nabla g(x_0)$ by the continuity of convex subdifferentials, we find
\[ f(w) \ge A^g_{x_0}(w) \qquad \forall w \in \mathbb{R}^n. \tag{6.8} \]
Since $f(x_0) = g(x_0) = A^g_{x_0}(x_0)$, we conclude that $\nabla g(x_0) \in \partial f(x_0) = \{\nabla f(x_0)\}$, a contradiction. This proves (6.7). Therefore, by repeating the argument with the roles of $f$ and $g$ inverted, we have proved the existence of $\eta > 0$, depending on $f$, $g$ and $x_0$, such that
\[ (\nabla g)^{-1}\big(\partial f(\{f > g\})\big) \subset \{f > g\} \setminus B_\eta(x_0), \qquad (\nabla f)^{-1}\big(\partial g(\{g > f\})\big) \subset \{g > f\} \setminus B_\eta(x_0). \tag{6.9} \]
Assuming without loss of generality that $\eta < \delta_0$ and that the first condition in (6.4) holds for $\delta = \eta$, i.e., that $\mu(B_\eta(x_0) \cap \{f > g\}) > 0$, we now conclude the proof. Indeed, setting $Z = \partial f(\{f > g\})$ and taking into account that we trivially have
\[ F \cap \{f > g\} = \{ y \in F : \nabla f(y) \in Z \} = (\nabla f)^{-1}(Z), \tag{6.10} \]
we conclude that, by $(\nabla g)_\#\mu = \nu$, (6.9), $\mu(\mathbb{R}^n \setminus F) = 0$, and (6.10), one has
\[ \begin{aligned} \nu(Z) &\le \mu\big(\{f > g\} \setminus B_\eta(x_0)\big) = \mu\big((F \cap \{f > g\}) \setminus B_\eta(x_0)\big) \\ &\le \mu\big(((\nabla f)^{-1}(Z) \cap \{f > g\}) \setminus B_\eta(x_0)\big) \\ &= \mu\big((\nabla f)^{-1}(Z) \cap \{f > g\}\big) - \mu\big((\nabla f)^{-1}(Z) \cap \{f > g\} \cap B_\eta(x_0)\big) \\ &= \mu\big((\nabla f)^{-1}(Z)\big) - \mu\big(F \cap \{f > g\} \cap B_\eta(x_0)\big) \\ &= \nu(Z) - \mu\big(\{f > g\} \cap B_\eta(x_0)\big) < \nu(Z), \end{aligned} \]
where in the last identity we have used $(\nabla f)_\#\mu = \nu$ and $\mu(\mathbb{R}^n \setminus F) = 0$, and where in the last inequality we have used (6.4) with $\delta = \eta$. This contradiction proves that $\nabla f = \nabla g$ on $\mathrm{spt}\,\mu \cap F \cap G$, and thus $\mu$-a.e.
7 Second Order Differentiability of Convex Functions
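In Chapter 1 we have introduced two different formulations of the Monge problem: one based on the pointwise transport condition (1.1) (and limited to the case when μ, ν ε}; therefore, if $\varphi \in C_c^\infty(\Omega)$ with $\varphi \ge 0$, then for $\varepsilon$ small enough we have $\mathrm{spt}\,\varphi \subset\subset \Omega_\varepsilon$ and
\[ 0 \le \int_{\mathrm{spt}\,\varphi} \varphi\, \nabla^2_{vv} f_\varepsilon = \int_{\mathrm{spt}\,\varphi} f_\varepsilon\, \nabla^2_{vv} \varphi, \]
so that, letting $\varepsilon \to 0^+$, $D^2_{vv} f \ge 0$ on $C_c^\infty(\Omega)$, as claimed. Now, setting $D^2_{ij} f = D^2_{e_i e_j} f$, the identity
\[ (e_i + e_j) \cdot A[e_i + e_j] = e_i \cdot A[e_i] + e_j \cdot A[e_j] + 2\, e_i \cdot A[e_j] \qquad \forall A \in \mathbb{R}^{n \times n}_{\mathrm{sym}} \]
applied to $A = \nabla^2 \varphi$ gives
\[ 2\, D^2_{ij} f = D^2_{vv} f - D^2_{ii} f - D^2_{jj} f, \qquad v = e_i + e_j, \]
thus proving that $D^2_{ij} f$ is a signed Radon measure in $\Omega$. This proves (7.2), and since $D^2 f = D(\nabla f)$, we also have $\nabla f \in BV_{\mathrm{loc}}(\Omega; \mathbb{R}^n)$.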
7.2 Alexandrov’s Theorem for Convex Functions
We now address the existence of second order Taylor expansions for convex functions, proving the following classical result of Alexandrov.
Theorem 7.2 (Alexandrov’s theorem) Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex on $\mathbb{R}^n$, and let $F_2$ denote the set of points of twice differentiability of $f$, namely, for every $x \in F_2$, $f$ is differentiable at $x$ with gradient $\nabla f(x)$ and there exists $\nabla^2 f(x) \in \mathbb{R}^{n \times n}_{\mathrm{sym}}$ such that
\[ f(x + v) = f(x) + \nabla f(x) \cdot v + \frac{1}{2}\, v \cdot \nabla^2 f(x)[v] + o(|v|^2), \tag{7.6} \]
as $v \to 0$ in $\mathbb{R}^n$. Then, $\mathcal{L}^n(\mathrm{Dom}(f) \setminus F_2) = 0$, and, for $\mathcal{L}^n$-a.e. $x \in F_2$ one can take $\nabla^2 f(x)$ to be the value at $x$ of the density of $D^2 f$ with respect to the Lebesgue measure.
Proof Of course, there is nothing to prove unless $\Omega = \mathrm{Int}\,\mathrm{Dom}(f) \ne \emptyset$. Let $D^2 f = \nabla^2 f \, d\mathcal{L}^n \llcorner \Omega + [D^2 f]^s$ be the Radon–Nikodym decomposition of $D^2 f$ with respect to $\mathcal{L}^n \llcorner \Omega$, and let $x$ be a Lebesgue point for both $\nabla f$ and $\nabla^2 f$, and a point such that $\nabla^2 f(x) = D_{\mathcal{L}^n}[D^2 f](x)$, so that, as $r \to 0^+$,
\[ \int_{B_r(x)} |\nabla f - \nabla f(x)| = o_x(r^n), \tag{7.7} \]
\[ \int_{B_r(x)} |\nabla^2 f - \nabla^2 f(x)| = o_x(r^n), \tag{7.8} \]
\[ \lim_{r \to 0^+} \frac{D^2 f(B_r(x))}{\omega_n r^n} = \nabla^2 f(x); \tag{7.9} \]
see Appendix A.9. Notice that (7.8) and (7.9) imply
\[ |[D^2 f]^s|(B_r(x)) = o_x(r^n), \tag{7.10} \]
as $r \to 0^+$. Without loss of generality, we set
\[ x = 0, \qquad B_r(x) = B_r, \qquad \Phi(y) = f(y) - f(0) - \nabla f(0) \cdot y - \frac{1}{2}\, y \cdot \nabla^2 f(0)[y], \]
for $y \in \mathbb{R}^n$, and notice that the thesis follows by showing
\[ \|\Phi\|_{C^0(B_r)} = o(r^2) \qquad \text{as } r \to 0^+. \tag{7.11} \]
We first prove an $L^1$-version of (7.11) and then conclude by an interpolation argument.
Step one: We prove that
\[ \int_{B_r} |\Phi| = o(r^{n+2}), \qquad \text{as } r \to 0^+. \tag{7.12} \]
Given standard regularizing kernels $\rho_\varepsilon$, we set $f_\varepsilon = f \star \rho_\varepsilon$ and let
\[ \Phi_\varepsilon(y) = f_\varepsilon(y) - f_\varepsilon(0) - \nabla f_\varepsilon(0) \cdot y - \frac{1}{2}\, y \cdot \nabla^2 f(0)[y], \qquad y \in \mathbb{R}^n. \]
(Notice that we are intentionally using $\nabla^2 f(0)$ and not $\nabla^2 f_\varepsilon(0)$ in defining $\Phi_\varepsilon$.) Since $f$ is continuous in $\Omega$, $f_\varepsilon \to f$ locally uniformly in $\Omega$; moreover, since $0$ is a Lebesgue point of $\nabla f$, we have $\nabla f_\varepsilon(0) \to \nabla f(0)$. Therefore, $\Phi_\varepsilon \to \Phi$ locally uniformly on $\mathbb{R}^n$. In particular, if $\varphi \in C_c^0(\Omega)$, then
\[ \int_{B_r} \varphi\, \Phi = \lim_{\varepsilon \to 0^+} \int_{B_r} \varphi\, \Phi_\varepsilon. \tag{7.13} \]
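Notice that by applying Taylor’s formula for $f_\varepsilon$ we find
\[ \Phi_\varepsilon(y) = \int_0^1 (1 - s)\, y \cdot \nabla^2 f_\varepsilon(s y)[y] \, ds - \frac{1}{2}\, y \cdot \nabla^2 f(0)[y] = \int_0^1 (1 - s)\, y \cdot \big( \nabla^2 f_\varepsilon(s y) - \nabla^2 f(0) \big)[y] \, ds, \]
so that, setting
\[ h_\varepsilon^\varphi(s) = \int_{B_{rs}} \varphi(z/s)\, z \cdot \big( \nabla^2 f_\varepsilon(z) - \nabla^2 f(0) \big)[z] \, dz, \qquad s \in (0, 1), \]
by the change of variables $s y = z$ we find
\[ \int_{B_r} \varphi(y)\, \Phi_\varepsilon(y) \, dy = \int_0^1 \frac{(1 - s)}{s^{n+2}}\, h_\varepsilon^\varphi(s) \, ds. \tag{7.14} \]
Now, in the weak-star convergence of $\mathbb{R}^{n \times n}_{\mathrm{sym}}$-valued Radon measures on $\Omega$, we have
\[ \big( \nabla^2 f_\varepsilon - \nabla^2 f(0) \big) \, d\mathcal{L}^n \llcorner \Omega = \big( D^2 f - \nabla^2 f(0) \, d\mathcal{L}^n \llcorner \Omega \big) \star \rho_\varepsilon \overset{*}{\rightharpoonup} D^2 f - \nabla^2 f(0) \, d\mathcal{L}^n \llcorner \Omega, \qquad \text{as } \varepsilon \to 0^+. \]
Therefore, for every $\varphi \in C_c^0(\Omega)$ and $s \in (0, 1)$, we have $h_\varepsilon^\varphi(s) \to h^\varphi(s)$ as $\varepsilon \to 0^+$, where $h^\varphi$ is given by
\[ h^\varphi(s) = \int_{B_{rs}} \varphi(z/s)\, z \cdot \big( \nabla^2 f(z) - \nabla^2 f(0) \big)[z] \, dz + \sum_{i,j=1}^n \int_{B_{rs}} \varphi(z/s)\, z_i z_j \, d[D^2 f]^s_{ij}; \]
here we have taken into account that $D^2 f = \nabla^2 f \, d[\mathcal{L}^n \llcorner \Omega] + [D^2 f]^s$ and have denoted by $[D^2 f]^s_{ij}$ the $(i, j)$-component of the $\mathbb{R}^{n \times n}$-valued Radon measure $[D^2 f]^s$. Setting for brevity $\mu = D^2 f - \nabla^2 f(0) \, d\mathcal{L}^n \llcorner \Omega$ and recalling the basic estimate for Radon measures
\[ \int_{B_t} |\mu \star \rho_\varepsilon| \le C(n)\, \frac{\min\{t^n, \varepsilon^n\}}{\varepsilon^n}\, |\mu|(B_{t + \varepsilon}) \qquad \text{(see (A.23))}, \]
we find that
\[ \begin{aligned} |h_\varepsilon^\varphi(s)| &\le \|\varphi\|_{C^0}\, (r s)^2 \int_{B_{rs}} \big| (D^2 f - \nabla^2 f(0) \, d\mathcal{L}^n) \star \rho_\varepsilon \big|(z) \, dz \\ &\le C(n)\, \|\varphi\|_{C^0}\, (r s)^2\, \frac{\min\{\varepsilon^n, (r s)^n\}}{\varepsilon^n}\, |D^2 f - \nabla^2 f(0) \, d\mathcal{L}^n|(B_{rs + \varepsilon}) \\ &\le C(n)\, \|\varphi\|_{C^0}\, \frac{\min\{\varepsilon^n, (r s)^n\}}{\varepsilon^n}\, (r s)^2\, (r s + \varepsilon)^n \le C(n)\, \|\varphi\|_{C^0}\, (r s)^{n+2}, \end{aligned} \]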
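where in the penultimate inequality we have used (7.8) and (7.9). This proves that, uniformly in $\varepsilon$,
\[ |h_\varepsilon^\varphi(s)| \le C(n)\, \|\varphi\|_{C^0}\, (r s)^{n+2}, \qquad \forall s \in (0, 1), \tag{7.15} \]
so that by (7.13), (7.14), and dominated convergence we get
\[ \int_{B_r} \varphi\, \Phi = \lim_{\varepsilon \to 0^+} \int_{B_r} \varphi\, \Phi_\varepsilon = \lim_{\varepsilon \to 0^+} \int_0^1 \frac{(1 - s)}{s^{n+2}}\, h_\varepsilon^\varphi(s) \, ds = \int_0^1 \frac{(1 - s)}{s^{n+2}}\, h^\varphi(s) \, ds. \]
We now notice that, again by (7.8) and (7.9), we have
\[ \int_0^1 \frac{(1 - s)}{s^{n+2}}\, h^\varphi(s) \, ds \le r^2\, \|\varphi\|_{C^0} \left\{ \int_0^1 \frac{1 - s}{s^n} \, ds \int_{B_{rs}} |\nabla^2 f(z) - \nabla^2 f(0)| \, dz + \int_0^1 \frac{1 - s}{s^n}\, |[D^2 f]^s|(B_{rs}) \, ds \right\} = \|\varphi\|_{C^0}\, o(r^{n+2}), \]
as $r \to 0^+$, where $o(r^{n+2})$ is independent of $\varphi$. We conclude that
\[ \int_{B_r} |\Phi| = \sup\left\{ \int_{B_r} \varphi\, \Phi : \varphi \in C_c^0(\Omega),\ |\varphi| \le 1 \right\} = o(r^{n+2}), \qquad \text{as } r \to 0^+, \]
as claimed.
Step two: In this step we show how to deduce (7.11) from (7.12), thus concluding the proof of the theorem. Let us start by noticing that we can write $\Phi = \Phi_1 - \Phi_2$, where $\Phi_1$ and $\Phi_2$ are the convex functions defined by
\[ \Phi_1(y) = f(y) - f(0) - \nabla f(0) \cdot y, \qquad \Phi_2(y) = \frac{1}{2}\, y \cdot \nabla^2 f(0)[y]. \]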
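By applying (7.1) to both $\Phi_1$ and $\Phi_2$, we find
\[ \|\nabla \Phi\|_{L^\infty(B_r)} \le \frac{C(n)}{r^{n+1}} \left\{ \|\Phi_1\|_{L^1(B_{2r})} + \|\Phi_2\|_{L^1(B_{2r})} \right\} \le \frac{C(n)}{r^{n+1}} \left\{ \|\Phi\|_{L^1(B_{2r})} + 2\, \|\Phi_2\|_{L^1(B_{2r})} \right\}, \]
where $\|\Phi_2\|_{C^0(B_{2r})} \le \Lambda\, r^2$ if $\Lambda = |\nabla^2 f(0)|$, so that
\[ \|\nabla \Phi\|_{L^\infty(B_r)} \le C(n, \Lambda) \left\{ r + \frac{1}{r^{n+1}} \int_{B_{2r}} |\Phi| \right\}, \qquad \forall B_{3r} \subset\subset \Omega. \tag{7.16} \]
Interestingly, we cannot deduce (7.11) directly from the basic interpolation of (7.16) and (7.12) obtained by looking at the first order Taylor’s formula for $\Phi$, namely,
\[ |\Phi(y)| \le |y| \int_0^1 |\nabla \Phi(s y)| \, ds \le C\, |y|^2 + \frac{C}{|y|^n} \int_{B_{2|y|}} |\Phi|; \]
indeed, in this way, we just find $\|\Phi\|_{C^0(B_r)} = O(r^2)$. We rather argue as follows. For every $\varepsilon$ small enough consider the bad set $K_\varepsilon = \{ y \in B_r : |\Phi| \ge \varepsilon\, r^2 \}$ and notice that by (7.12) we can find $r_\varepsilon$ such that
\[ \int_{B_r} |\Phi| \le \varepsilon^{n+1}\, r^2\, |B_r|, \qquad \forall r < r_\varepsilon, \]
and, correspondingly,
\[ |B_r \cap K_\varepsilon| \le \frac{1}{\varepsilon\, r^2} \int_{B_r} |\Phi| \le \varepsilon^n\, |B_r|, \qquad \forall r < r_\varepsilon. \]
In particular, if $r < r_\varepsilon$ and $y \in B_{r/2}$, then $B_{2\varepsilon r}(y) \setminus K_\varepsilon \ne \emptyset$, for otherwise $\varepsilon^n |B_r| \ge |B_r \cap K_\varepsilon| \ge |B_{2\varepsilon r}(y)| = 2^n \varepsilon^n |B_r|$, a contradiction. Hence, for every $r < r_\varepsilon$ and $y \in B_{r/2}$ there exists $z$ with $|z - y| < 2\varepsilon\, r$ and $|\Phi(z)| \le \varepsilon\, r^2$, so that (7.16) gives
\[ |\Phi(y)| \le |\Phi(z)| + |z - y|\, \|\nabla \Phi\|_{L^\infty(B_r)} \le \varepsilon\, r^2 + 2\, \varepsilon\, r\, C \left\{ r + \frac{1}{r^{n+1}} \int_{B_{2r}} |\Phi| \right\} \le C\, \varepsilon\, r^2, \]
where in the last inequality we have again used (7.12). This proves (7.11).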
We finally discuss the validity of a first order Taylor’s expansion for the gradient of a convex function. In general, the validity of (7.6) does not imply that $\nabla f(x + v) - \nabla f(x) - \nabla^2 f(x)[v] = o(|v|)$ as $v \to 0$ (as is seen, for example, by taking $f(x) = |x|^3 \sin(|x|^{-2})$ if $x \ne 0$, and $f(0) = 0$, with $n = 1$). For convex functions, however, the existence of a second order Taylor expansion at a point implies the existence of a first order expansion at the level of gradients, although the latter concept has to be properly formulated since the classical gradient is not everywhere defined.
Theorem 7.3 (Differentiability of the subdifferential) If $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is convex and twice differentiable at $x_0$, then
\[ \lim_{x \to x_0}\ \sup_{y \in \partial f(x)} \frac{|y - \nabla f(x_0) - \nabla^2 f(x_0)[x - x_0]|}{|x - x_0|} = 0. \tag{7.17} \]
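The one-dimensional counterexample $f(x) = |x|^3 \sin(|x|^{-2})$ mentioned before Theorem 7.3 can be explored numerically. The sketch below is an illustration under stated assumptions: the derivative formula is computed by hand for $x > 0$, and the sequence $t_k = (2\pi k)^{-1/2}$ is a convenient choice made here, not one taken from the text. Along this sequence $|f(t)|/t^2 \le t \to 0$, so (7.6) holds at $0$ with $\nabla f(0) = 0$ and $\nabla^2 f(0) = 0$, while $f'(t_k)$ stays near $-2$ and is therefore not $o(t_k)$.

```python
import math

# One-dimensional function from the discussion above (it is not convex):
# f(x) = |x|**3 * sin(|x|**-2) for x != 0, and f(0) = 0.
def f(x):
    return 0.0 if x == 0 else abs(x)**3 * math.sin(abs(x)**-2)

# Derivative for x > 0, computed by hand:
# f'(x) = 3 x^2 sin(x^-2) - 2 cos(x^-2).
def df(x):
    return 3 * x**2 * math.sin(x**-2) - 2 * math.cos(x**-2)

# Along t_k = (2 pi k)^(-1/2) we have t_k^-2 = 2 pi k, so cos = 1 and sin = 0.
ts = [(2 * math.pi * k) ** -0.5 for k in (10, 100, 1000)]

second_order_ratios = [abs(f(t)) / t**2 for t in ts]  # bounded by t
gradients = [df(t) for t in ts]                        # stay close to -2

print(second_order_ratios, gradients)
```

This is consistent with the remark that, without convexity, a second order expansion of the function does not control the gradient to first order.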
Remark 7.4 Notice that, in turn, the validity of (7.17) with some $A \in \mathbb{R}^{n \times n}_{\mathrm{sym}}$ in place of $\nabla^2 f(x_0)$ implies that
\[ f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \frac{1}{2}\, (x - x_0) \cdot A[x - x_0] + o(|x - x_0|^2) \]
as $x \to x_0$ in $\mathbb{R}^n$, and thus that $A = \nabla^2 f(x_0)$. To prove this, let
\[ Q(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \frac{1}{2}\, (x - x_0) \cdot A[x - x_0], \qquad x \in \mathbb{R}^n, \]
pick any $v \in \mathbb{S}^n$, and integrate over $[x_0, x_0 + t v]$ to find that
\[ \begin{aligned} |f(x_0 + t v) - Q(x_0 + t v)| &\le \int_0^t \big| \nabla f(x_0 + s v) - \nabla f(x_0) - A[s v] \big| \, ds \\ &\le \int_0^t \sup_{y \in \partial f(x_0 + s v)} \frac{|y - \nabla f(x_0) - A[s v]|}{|s v|}\, |s v| \, ds \\ &\le \frac{t^2}{2}\ \sup_{x \in B_t(x_0)}\ \sup_{y \in \partial f(x)} \frac{|y - \nabla f(x_0) - A[x - x_0]|}{|x - x_0|}, \end{aligned} \]
i.e., $|f(x_0 + t v) - Q(x_0 + t v)| = o(t^2)$ as $t \to 0^+$, uniformly on $v \in \mathbb{S}^n$.
Proof of Theorem 7.3 Step one: We show that pointwise convergence of convex functions implies locally uniform convergence of their subdifferentials. More formally, if $f_j$ and $f$ are convex functions on $\mathbb{R}^n$, $K \subset\subset \mathrm{Int}\,\mathrm{Dom}(f)$, and $f_j \to f$ pointwise on $\mathbb{R}^n$ as $j \to \infty$, then
\[ \lim_{j \to \infty} \sup\big\{ \mathrm{dist}(y, \partial f(x)) : y \in \partial f_j(x),\ x \in K \big\} = 0. \tag{7.18} \]
Indeed, if $x$ is interior to the domain of $f$, then we can find an $n$-dimensional simplex $\Sigma$ containing $x$ and compactly contained in $\mathrm{Dom}(f)$. Every convex function will be bounded on $\Sigma$ by its values at the $(n+1)$ vertices of $\Sigma$, and in turn its Lipschitz constant on a ball $B_r(x_0)$ such that $B_{2r}(x_0) \subset \Sigma$ will be bounded in terms of the oscillation of the convex function on $\Sigma$. Covering $K$ by finitely many such simplexes we see that, for $A$ an open neighborhood of $K$ with $A \subset\subset \mathrm{Int}\,\mathrm{Dom}(f)$, we have $\mathrm{Lip}(f_j; A) \le M < \infty$ for every $j$. In particular, $f_j \to f$ uniformly on $K$. Now let $x_j \in K$ and $y_j \in \partial f_j(x_j)$ be such that
\[ \mathrm{dist}(y_j, \partial f(x_j)) + \frac{1}{j} \ge \sup\nolimits_j, \]
where $\sup_j$ is an abbreviation for the $j$-dependent supremum appearing in (7.18). Since $K$ is compact and $\mathrm{Lip}(f_j; A) \le M$, we deduce that both $x_j$ and $y_j$ are bounded sequences and thus, up to extracting subsequences, have limits $x_0$ and $y_0$. Since $f_j \to f$ uniformly on $K$, we find that $f_j(x_j) \to f(x_0)$. By taking the limit $j \to \infty$ in the inequalities $f_j(z) \ge f_j(x_j) + y_j \cdot (z - x_j)$ for every $z \in \mathbb{R}^n$ we see that $y_0 \in \partial f(x_0)$. Since $\mathrm{dist}(y_j, \partial f(x_j)) \to \mathrm{dist}(y_0, \partial f(x_0)) = 0$ we conclude that $\sup_j \to 0$, as claimed.
Step two: We know that $\partial f(x_0) = \{\nabla f(x_0)\}$ and that there exists $A = \nabla^2 f(x_0) \in \mathbb{R}^{n \times n}_{\mathrm{sym}}$ such that, locally uniformly on $v \in \mathbb{R}^n$,
\[ \frac{f(x_0 + h v) - f(x_0) - h\, \nabla f(x_0) \cdot v}{h^2} \to \frac{1}{2}\, v \cdot A[v] \qquad \text{as } h \to 0^+. \]
This fact can be reformulated by saying that the convex functions
\[ g_h(v) = \frac{f(x_0 + h v) - f(x_0) - h\, \nabla f(x_0) \cdot v}{h^2}, \qquad g(v) = \frac{1}{2}\, v \cdot A[v], \]
are such that $g_h \to g$ uniformly on compact subsets of $\mathbb{R}^n$ as $h \to 0^+$. Clearly, $\nabla g(v) = A[v]$ for every $v \in \mathbb{R}^n$, while
\[ \partial g_h(v) = \left\{ \frac{y - \nabla f(x_0)}{h} : y \in \partial f(x_0 + h v) \right\}. \]
By applying step one with $K = \{|v| \le 1\}$, and taking into account that $\partial g(v) = \{A[v]\}$, and thus $\mathrm{dist}(y, \partial g(v)) = |y - A[v]|$, we conclude that
\[ \begin{aligned} 0 &= \lim_{h \to 0^+} \sup\big\{ |w - A[v]| : w \in \partial g_h(v),\ |v| \le 1 \big\} \\ &= \lim_{h \to 0^+} \sup\left\{ \left| \frac{y - \nabla f(x_0)}{h} - A[v] \right| : y \in \partial f(x_0 + h v),\ |v| \le 1 \right\} \\ &= \lim_{x \to x_0}\ \sup_{y \in \partial f(x)} \frac{|y - \nabla f(x_0) - A[x - x_0]|}{|x - x_0|}, \end{aligned} \]
as claimed.
8 The Monge–Ampère Equation for Brenier Maps
In this chapter we address the fundamental problem of the validity, for Brenier maps, of the ($\mathcal{L}^n$-a.e.) pointwise transport condition (1.1): namely, we prove that, if $\mu = \rho \, d\mathcal{L}^n$, $\nu = \sigma \, d\mathcal{L}^n$, and $\nabla f$ is the Brenier map from $\mu$ to $\nu$, then
\[ \rho(x) = \sigma(\nabla f(x))\, \det \nabla^2 f(x) \qquad \mathcal{L}^n\text{-a.e. on } \mathrm{Dom}(f). \tag{8.1} \]
This identity plays a crucial role in the applications of the quadratic Monge problem discussed in Part III. It also asserts that the convex potential $f$ solves (in the interior of its domain) the Monge–Ampère equation $\det \nabla^2 f = h$ (with $h(x) = \rho(x)/\sigma(\nabla f(x))$). This is a type of equation with a rich regularity theory, which, under suitable assumptions on $\rho$ and $\sigma$, can be exploited to infer regularity results for Brenier maps. We cannot deduce (8.1) from $(\nabla f)_\#\mu = \nu$ as in Proposition 1.1, since there we were making use of the area formula for injective Lipschitz maps $T$, whereas $T = \nabla f$ is, in general, neither Lipschitz nor injective. The second order differentiability theory of convex functions developed in Chapter 7 provides the starting point to address these two issues and prove the main result of this chapter, Theorem 8.1. Notice that Theorem 8.1 can be seen as a further improvement on Theorem 6.1 in the special case when $\mu, \nu \in \mathcal{P}_{ac}(\mathbb{R}^n)$. We set the following convenient notation. Given a convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, we denote by $F_1$ and $F_2$ the points of first and second differentiability of $f$, and set
\[ F_{2,\mathrm{inv}} = \{ x \in F_2 : \nabla^2 f(x) \text{ is invertible} \}. \]
The corresponding sets for $f^*$ are denoted by $F_1^*$, $F_2^*$, and $F_{2,\mathrm{inv}}^*$. Moreover, given $w \in L^1_{\mathrm{loc}}(\Omega)$, $\Omega$ an open set in $\mathbb{R}^n$, we denote by $\mathrm{Leb}(w)$ the set of the Lebesgue points of $w$ in $\Omega$, see (A.15). With this notation in place we state the main result of this chapter.
Theorem 8.1 (Monge–Ampère equation for Brenier maps) If $\mu, \nu \in \mathcal{P}_{ac}(\mathbb{R}^n)$ with
\[ \mu = \rho \, d\mathcal{L}^n, \qquad \nu = \sigma \, d\mathcal{L}^n, \]
then there exists a convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $\Omega = \mathrm{Int}\,\mathrm{Dom}(f) \ne \emptyset$, $(\nabla f)_\#\mu = \nu$, $\mathrm{Cl}(\nabla f(F_1 \cap \mathrm{spt}\,\mu)) = \mathrm{spt}\,\nu$, and $\mu$ is concentrated on $X$, where
\[ X = F_{2,\mathrm{inv}} \cap (\nabla f)^{-1}(\mathrm{Leb}(\sigma)) \cap \mathrm{Leb}(\rho) \cap \{\rho > 0\}. \tag{8.2} \]
Moreover, for every $x \in X$ we have
\[ \nabla f(x) \in \{\sigma > 0\}, \tag{8.3} \]
\[ \sigma(\nabla f(x))\, \det \nabla^2 f(x) = \rho(x). \tag{8.4} \]
In particular, (8.3) and (8.4) hold $\mu$-a.e. on $\mathbb{R}^n$. Finally, $\det \nabla^2 f \in L^1_{\mathrm{loc}}(\Omega)$ and the area formula
\[ \int_{(\nabla f)(M)} \varphi = \int_M \varphi(\nabla f)\, \det \nabla^2 f, \tag{8.5} \]
holds for every Borel function $\varphi : \mathbb{R}^n \to [0, \infty]$, where $M = F_{2,\mathrm{inv}} \cap \mathrm{Leb}(\det \nabla^2 f)$.
Remark 8.2 The convex potential $f$ in Theorem 8.1 is constructed as in Theorem 6.1; in particular, if $\mathrm{spt}\,\nu$ is compact, then we can further assert that $\mathrm{Dom}(f) = \mathbb{R}^n$ and that $\nabla f(x) \in \mathrm{spt}\,\nu$ for every $x \in F_1$.
We now discuss the strategy of the proof of Theorem 8.1. The first step consists of the following basic proposition, which holds in greater generality than Theorem 8.1, and is of independent interest.
Proposition 8.3 If $\mu \in \mathcal{P}(\mathbb{R}^n)$, $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is convex, and $\nabla f$ transports $\mu$, then for every Borel set $E$ in $\mathbb{R}^n$ we have
\[ (\nabla f)_\#\mu\,(E) = \mu\big(\partial f^*(E)\big). \tag{8.6} \]
Proof Indeed, recalling that $F_1$ denotes the differentiability set of $f$, the fact that $\nabla f$ transports $\mu$ means that $\mu$ is concentrated on $F_1$. At the same time,
\[ \begin{aligned} F_1 \cap \partial f^*(E) &= \{ x \in F_1 : \exists\, y \in E \text{ s.t. } x \in \partial f^*(y) \} \\ &= \{ x \in F_1 : \exists\, y \in E \text{ s.t. } y \in \partial f(x) = \{\nabla f(x)\} \} \\ &= \{ x \in F_1 : \nabla f(x) \in E \} = (\nabla f)^{-1}(E), \end{aligned} \]
so that $\partial f^*(E)$ is $\mu$-equivalent to $(\nabla f)^{-1}(E)$, as claimed.
With (8.6) at hand, we can illustrate the main idea behind the proof of (8.4). With $\mu$ and $\nu$ as in Theorem 8.1, if $x$ is both a Lebesgue point of $\rho$ and such that $\nabla f(x)$ is a Lebesgue point of $\sigma$, then, thanks to (8.6),
\[ \frac{\sigma(\nabla f(x))}{\rho(x)} \approx \frac{\nu(B_r(\nabla f(x)))}{\omega_n r^n}\, \frac{\omega_n r^n}{\mu(B_r(x))} = \frac{\mu\big(\partial f^*(B_r(\nabla f(x)))\big)}{\mu(B_r(x))}. \tag{8.7} \]
We thus need to understand how the subdifferential of the convex function $f^*$ is deforming volumes: of course, should $f^*$ be twice differentiable at $y = \nabla f(x)$, we may hope that, in analogy with the smooth case,
\[ \frac{\mu\big(\partial f^*(B_r(y))\big)}{\mu(B_r(x))} \approx \det \nabla^2 f^*(y). \tag{8.8} \]
By (8.7) and (8.8), we could prove (8.4) by further showing that
\[ \nabla^2 f^*(y) \text{ is the inverse matrix of } \nabla^2 f(x). \tag{8.9} \]
With this scheme of proof in mind we now discuss these various issues separately: (8.9) is addressed in Section 8.1, (8.8) in Section 8.2, and finally the formal proof of Theorem 8.1 is presented in Section 8.3.
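The identity (8.1) can be verified by hand in a simple one-dimensional situation, which may help fix ideas before the general proof. The sketch below assumes, purely for illustration, that $\mu = N(0, 1)$ and $\nu = N(m, s^2)$ on the real line with the hypothetical values `m = 1.0` and `s = 2.5`; the convex potential $f(x) = m x + s x^2/2$ then has the affine increasing gradient $m + s x$, which pushes $\mu$ forward to $\nu$.

```python
import math

def gaussian_density(x, mean, std):
    # Density of the normal law N(mean, std**2) at the point x.
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Illustrative data (assumptions): mu = N(0, 1) and nu = N(m, s^2) on the line.
m, s = 1.0, 2.5

def grad_f(x):
    # Gradient of the convex potential f(x) = m*x + s*x**2/2; in one dimension
    # this increasing affine map pushes N(0, 1) forward to N(m, s^2).
    return m + s * x

hess_f = s  # f''(x), constant and positive

xs = [-3.0, -1.0, 0.0, 0.4, 2.0]
# Residual of rho(x) = sigma(grad f(x)) * det(Hess f(x)) at the sample points.
residuals = [
    abs(gaussian_density(x, 0.0, 1.0)
        - gaussian_density(grad_f(x), m, s) * hess_f)
    for x in xs
]

print(max(residuals))
```

The residual should vanish up to round-off, since here $\sigma(m + s x)\, s = \rho(x)$ exactly.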
8.1 Convex Inverse Function Theorem
Theorem 8.4 Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex and twice differentiable at $x_0 \in \mathbb{R}^n$. If $\nabla^2 f(x_0)$ is invertible (as an element of $\mathbb{R}^{n \times n}_{\mathrm{sym}}$), then $f^*$ is twice differentiable at $y_0 = \nabla f(x_0)$ with
\[ \nabla f^*(y_0) = x_0 \qquad \text{and} \qquad \nabla^2 f^*(y_0) = [\nabla^2 f(x_0)]^{-1}. \tag{8.10} \]
Vice versa, if $\nabla^2 f(x_0)$ is not invertible, then $f^*$ is not twice differentiable at $y_0$. Finally, if $F_2$ denotes the set of points where $f$ is twice differentiable and $F_{2,\mathrm{inv}} = \{ x \in F_2 : \nabla^2 f(x) \text{ is invertible} \}$, then
\[ F_2 \setminus F_{2,\mathrm{inv}} \subset \partial f^*\big(\mathrm{Dom}(f^*) \setminus F_2^*\big), \tag{8.11} \]
where $F_2^*$ is the set of those $y_0 \in \mathbb{R}^n$ such that $f^*$ is twice differentiable at $y_0$.
Proof Step one: We prove that if $A = \nabla^2 f(x_0)$ is invertible, then $f^*$ is differentiable at $y_0 = \nabla f(x_0)$ with $\nabla f^*(y_0) = x_0$. Indeed, $y_0 \in \partial f(x_0)$ implies $x_0 \in \partial f^*(y_0)$. If $f^*$ is not differentiable at $y_0$, then there is some $x_1 \in \partial f^*(y_0)$, $x_1 \ne x_0$. Since $\partial f^*(y_0)$ is a convex set, we deduce that $x_t = (1 - t)\, x_0 + t\, x_1 \in \partial f^*(y_0)$ for every $t \in [0, 1]$. Since $f$ is twice differentiable at $x_0$, by Theorem 7.3 we have
\[ 0 = \lim_{x \to x_0}\ \sup_{y \in \partial f(x)} \frac{|y - y_0 - A[x - x_0]|}{|x - x_0|} \ge \lim_{t \to 0^+}\ \sup_{y \in \partial f(x_t)} \left| \frac{y - y_0}{t\, |x_1 - x_0|} - A\left[ \frac{x_1 - x_0}{|x_1 - x_0|} \right] \right|. \]
Since $x_t \in \partial f^*(y_0)$ implies $y_0 \in \partial f(x_t)$ we can pick $y = y_0$ and conclude that $A[x_1 - x_0] = 0$, a contradiction to the invertibility of $A$.
Step two: We prove that if $A$ is invertible, then $f^*$ is twice differentiable at $y_0$ with $\nabla^2 f^*(y_0) = A^{-1}$. Thanks to Remark 7.4, it is enough to show that
\[ \lim_{y \to y_0}\ \sup_{x \in \partial f^*(y)} \frac{|x - x_0 - A^{-1}[y - y_0]|}{|y - y_0|} = 0. \tag{8.12} \]
Indeed, by taking into account that $x \in \partial f^*(y)$ if and only if $y \in \partial f(x)$, that $\partial f(x_0) = \{y_0\}$, that $\partial f^*(y_0) = \{x_0\}$, and the continuity of subdifferentials, Proposition 2.7, we find that
\[ \lim_{y \to y_0}\ \sup_{x \in \partial f^*(y)} a(x, y) = \lim_{x \to x_0}\ \sup_{y \in \partial f(x)} a(x, y), \qquad a(x, y) = \frac{|x - x_0 - A^{-1}[y - y_0]|}{|y - y_0|}, \tag{8.13} \]
whenever one of the two limits exists. Since $A$ is invertible we have
\[ \frac{x - x_0 - A^{-1}[y - y_0]}{|y - y_0|} = -A^{-1}\left[ \frac{y - y_0 - A(x - x_0)}{|x - x_0|} \right] \frac{|x - x_0|}{|y - y_0|}. \tag{8.14} \]
Theorem 7.3 gives
\[ \lim_{x \to x_0}\ \sup_{y \in \partial f(x)} \frac{|y - y_0 - A(x - x_0)|}{|x - x_0|} = 0, \]
which in turn implies that
\[ \lim_{x \to x_0}\ \sup_{y \in \partial f(x)} \frac{|x - x_0|}{|y - y_0|} = \lim_{x \to x_0}\ \sup_{y \in \partial f(x)} \frac{|x - x_0|}{|A(x - x_0)|} < \infty, \]
since $A$ is invertible. Combining (8.13) and (8.14) with these facts we conclude the proof.
Step three: We prove that if $A$ is not invertible, then $f^*$ is not twice differentiable at $y_0 = \nabla f(x_0)$. We can directly consider the case when $f^*$ is differentiable at $y_0$, in which case it must be $\nabla f^*(y_0) = x_0$. Under this assumption we prove that, for every $B \in \mathbb{R}^{n \times n}_{\mathrm{sym}}$,
\[ \limsup_{y \to y_0}\ \sup_{x \in \partial f^*(y)} \frac{|x - x_0 - B[y - y_0]|}{|y - y_0|} = \infty, \]
thus showing that $f^*$ is not twice differentiable at $y_0$. Indeed, if $A[w] = 0$ for some $w \in \mathbb{S}^n$ then we can deduce from (7.17) the existence of $x_j \to x_0$ and $y_j \in \partial f(x_j)$ such that $|y_j - y_0| = o(|x_j - x_0|)$ as $j \to \infty$. Therefore, as $j \to \infty$,
\[ \sup_{x \in \partial f^*(y_j)} \frac{|x - x_0 - B[y_j - y_0]|}{|y_j - y_0|} \ge \frac{|x_j - x_0|}{|y_j - y_0|} - \|B\| \to +\infty. \]
Step four: We finally prove (8.11). If $x \in F_2 \setminus F_{2,\mathrm{inv}} \subset F_1$, then $y = \nabla f(x)$ is well defined and $x \in \partial f^*(y)$, so that $\partial f^*(y)$ is non-empty, and hence $y \in \mathrm{Dom}(f^*)$. This shows $x \in \partial f^*(\mathrm{Dom}(f^*))$. If $f^*$ is not differentiable at $y$, this also shows that $x \in \partial f^*(\mathrm{Dom}(f^*) \setminus F_1^*) \subset \partial f^*(\mathrm{Dom}(f^*) \setminus F_2^*)$. If, instead, $f^*$ is differentiable at $y$, then the fact that $\nabla^2 f(x)$ is not invertible implies that $f^*$ is not twice differentiable at $y = \nabla f(x)$ by step three, and thus $y \notin F_2^*$, as required.
8.2 Jacobians of Convex Gradients

Having in mind (8.8), we now discuss the volume-deforming properties of convex subdifferentials. The first step is obtained by combining the differentiability theorem for convex subdifferentials, Theorem 7.3, with the convex inverse function theorem, Theorem 8.4, to prove a sort of Lipschitz property for convex subdifferentials at points of twice differentiability of a convex function. In the statement, we set $I_r(X) = \{x \in \mathbb{R}^n : \mathrm{dist}(x, X) < r\}$.

Theorem 8.5 (Lipschitz continuity of convex subdifferentials) If $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is convex and twice differentiable at $x_0$ with $y_0 = \nabla f(x_0)$ and $A = \nabla^2 f(x_0)$, then
\[ \partial f(B_r(x_0)) \subset y_0 + r\, I_{\omega(r)}(A[B_1]), \qquad \forall B_r(x_0) \subset\subset \Omega, \tag{8.15} \]
where $\omega$ is an increasing function with $\omega(0^+) = 0$ and $\Omega = \mathrm{Int}\,\mathrm{Dom}(f)$. If, in addition, $A$ is invertible and $r$ is such that $B_{(1+\omega(\|A\| r))\, r}(x_0) \subset\subset \Omega$, then
\[ \partial f(B_r(x_0)) \subset y_0 + r\, A[B_{1 + \|A^{-1}\| \omega(r)}], \tag{8.16} \]
\[ y_0 + A[B_r] \subset \partial f(B_{(1+\omega(\|A\| r))\, r}(x_0)). \tag{8.17} \]

Proof By (7.17) we have
\[ \sup_{y \in \partial f(x)} |y - y_0 - A[x - x_0]| \le |x - x_0|\, \omega(|x - x_0|), \]
for a nonnegative increasing function $\omega : [0, \infty) \to (0, \infty)$ such that $\omega(0^+) = 0$. In particular, if $B_r(x_0) \subset\subset \Omega$, then
\[ \partial f(B_r(x_0)) \subset y_0 + I_{\omega(r)\, r}(A[B_r]) = y_0 + r\, I_{\omega(r)}(A[B_1]), \]
that is (8.15). If now $A$ is invertible, then we have
\[ I_\varepsilon(A[B_1]) \subset A[B_{1 + \|A^{-1}\| \varepsilon}], \qquad \forall \varepsilon > 0, \]
and thus (8.16) follows from (8.15). Moreover, by Theorem 8.4 we also have that $y_0$ is a point of twice differentiability of $f^*$ with $\nabla^2 f^*(y_0) = A^{-1}$, so that
\[ \sup_{x \in \partial f^*(y)} |x - x_0 - A^{-1}[y - y_0]| \le |y - y_0|\, \omega(|y - y_0|), \]
up to possibly increasing the values of $\omega$. Now, if $y \in y_0 + A[B_s]$, then $|y - y_0| \le \|A\| s$ and
\[ \partial f^*(y_0 + A[B_s]) \subset x_0 + I_{\omega(\|A\| s)\, s}\big(A^{-1}[A[B_s]]\big) \subset B_{(1+\omega(\|A\| s))\, s}(x_0), \]
so that (8.17) follows thanks to the fact that
\[ \partial f^*(F) \subset E \quad \text{iff} \quad F \subset \partial f(E), \tag{8.18} \]
whenever $E \subset \mathrm{Dom}(f)$ and $F \subset \mathrm{Dom}(f^*)$.
Next we combine Theorem 8.5 with the area formula for linear maps to obtain volume distortion estimates for subdifferentials. Let us recall indeed that, whenever $L : \mathbb{R}^n \to \mathbb{R}^n$ is a linear map,
\[ \mathcal{L}^n(L[B_r]) = |\det L|\, \omega_n r^n \qquad \forall r > 0. \tag{8.19} \]
Theorem 8.6 (Jacobian of $\nabla f$ and volume distortion) If $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is convex and $\Omega = \mathrm{Int}\,\mathrm{Dom}(f)$, then:

(i) if $f$ is twice differentiable at $x_0$, then
\[ \lim_{r \to 0^+} \frac{\mathcal{L}^n(\partial f(B_r(x_0)))}{\omega_n r^n} = \det \nabla^2 f(x_0). \tag{8.20} \]

(ii) if $f$ is twice differentiable at $x_0$, $\nabla^2 f(x_0)$ is invertible, and $y_0 = \nabla f(x_0) \in \mathrm{Leb}(w)$ for some $w \in L^1_{\mathrm{loc}}(B_\eta(y_0))$, $\eta > 0$, then
\[ \lim_{r \to 0^+} \frac{1}{\mathcal{L}^n(\partial f(B_r(x_0)))} \int_{\partial f(B_r(x_0))} |w - w(y_0)| = 0. \tag{8.21} \]

(iii) if $\mu = u\, d\mathcal{L}^n$, $\nabla f$ transports $\mu$, $f$ is twice differentiable at $x_0$, $\nabla^2 f(x_0)$ is invertible, and $x_0$ is a Lebesgue point of $u$, then
\[ \lim_{r \to 0^+} \frac{[(\nabla f)_\# \mu](B_r(y_0))}{\omega_n r^n} = \frac{u(x_0)}{\det \nabla^2 f(x_0)}, \tag{8.22} \]
where $y_0 = \nabla f(x_0)$.

Proof Step one: We prove statement (i). Let $y_0 = \nabla f(x_0)$ and $A = \nabla^2 f(x_0)$. If $A$ is not invertible, then $A[B_1]$ is contained in an $(n-1)$-dimensional plane, and therefore
\[ \mathcal{L}^n(I_{\omega(r)}(A[B_1])) \le C(n, A)\, \omega(r). \]
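By combining this estimate with (8.15) we find that $\mathcal{L}^n(\partial f(B_r(x_0))) \le r^n C(n, A)\, \omega(r)$ and (8.20) holds with $\det A = 0$. Assuming from now on that $A$ is invertible, by (8.16), (8.19) and $|\det A| = \det A$, we have
\[ \mathcal{L}^n(\partial f(B_r(x_0))) \le r^n \mathcal{L}^n\big(A[B_{1 + \|A^{-1}\| \omega(r)}]\big) = \omega_n r^n (1 + \|A^{-1}\| \omega(r))^n \det A, \]
that is,
\[ \limsup_{r \to 0^+} \frac{\mathcal{L}^n(\partial f(B_r(x_0)))}{\omega_n r^n} \le \det A. \tag{8.23} \]
Finally, by (8.17) and since $s \mapsto (1 + \omega(\|A\| s))\, s$ is strictly increasing on $s > 0$, we conclude that
\[ \begin{aligned} \liminf_{r \to 0^+} \frac{\mathcal{L}^n(\partial f(B_r(x_0)))}{\omega_n r^n} &= \liminf_{s \to 0^+} \frac{\mathcal{L}^n\big(\partial f(B_{(1+\omega(\|A\| s))\, s}(x_0))\big)}{\omega_n \big((1 + \omega(\|A\| s))\, s\big)^n} \\ &\ge \liminf_{s \to 0^+} \frac{\mathcal{L}^n(A[B_s])}{\omega_n \big((1 + \omega(\|A\| s))\, s\big)^n} = \det A, \end{aligned} \tag{8.24} \]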
which combined with (8.23) proves (8.20) also when $A$ is invertible.

Step two: We prove statement (ii). We start by showing that $\{\partial f(B_r(x_0))\}_{r > 0}$ shrinks nicely to $y_0 = \nabla f(x_0)$ with respect to $\mathcal{L}^n$ as $r \to 0^+$, in the sense that we can find $\rho(r) \to 0^+$ as $r \to 0^+$ such that
\[ \partial f(B_r(x_0)) \subset B_{\rho(r)}(y_0) \quad \forall r < r_0, \qquad \liminf_{r \to 0^+} \frac{\mathcal{L}^n(\partial f(B_r(x_0)))}{\mathcal{L}^n(B_{\rho(r)}(y_0))} \ge c, \tag{8.25} \]
for some $r_0, c > 0$. Indeed (8.15) implies
\[ \partial f(B_r(x_0)) \subset B_{(\|A\| + \omega(r))\, r}(y_0) \qquad \forall B_r(x_0) \subset\subset \Omega, \]
so that, setting $\rho(r) = (\|A\| + \omega(r))\, r$ and using (8.24) we find that (8.25) holds with $c = \det A / \|A\|^n$. If now $w \in L^1_{\mathrm{loc}}(B_\eta(y_0))$, $\eta > 0$, and $y_0$ is a Lebesgue point of $w$, then
\[ \frac{1}{\mathcal{L}^n(\partial f(B_r(x_0)))} \int_{\partial f(B_r(x_0))} |w - w(y_0)| \le \frac{\omega_n \rho(r)^n}{\mathcal{L}^n(\partial f(B_r(x_0)))} \cdot \frac{1}{\omega_n \rho(r)^n} \int_{B_{\rho(r)}(y_0)} |w - w(y_0)|, \]
so that (8.21) follows by (8.25).

Step three: We prove statement (iii). By Proposition 8.3, setting $\nu = (\nabla f)_\# \mu$ we have
\[ \nu(B_r(y_0)) = \mu(\partial f^*(B_r(y_0))), \qquad y_0 = \nabla f(x_0). \]
Taking into account that $\mu = u\, d\mathcal{L}^n$ we have that
\[ \frac{\nu(B_r(y_0))}{\omega_n r^n} = \frac{\int_{\partial f^*(B_r(y_0))} u}{\mathcal{L}^n(\partial f^*(B_r(y_0)))} \cdot \frac{\mathcal{L}^n(\partial f^*(B_r(y_0)))}{\omega_n r^n}. \]
By statement (i), if $y_0 \in F_2^*$, then the second factor converges to $\det \nabla^2 f^*(y_0)$, while by statement (ii), if $x_0$ is a Lebesgue point of $u$ and $y_0 \in F_{2,\mathrm{inv}}^*$, then the first factor converges to $u(x_0)$. Therefore,
\[ \lim_{r \to 0^+} \frac{\nu(B_r(y_0))}{\omega_n r^n} = u(x_0) \det \nabla^2 f^*(y_0), \]
whenever $x_0 \in \mathrm{Leb}(u)$ and $y_0 = \nabla f(x_0) \in F_{2,\mathrm{inv}}^*$. If $x_0 \in F_{2,\mathrm{inv}} \cap \mathrm{Leb}(u)$, then by Theorem 8.4, $y_0 = \nabla f(x_0) \in F_{2,\mathrm{inv}}^*$ with $\nabla^2 f^*(y_0) = [\nabla^2 f(x_0)]^{-1}$, so that we have completed the proof of (8.22).
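8.3 Derivation of the Monge–Ampère Equation

Proof of Theorem 8.1 By assumption, $\mu = \rho\, d\mathcal{L}^n$ and $\nu = \sigma\, d\mathcal{L}^n$. By Theorem 6.1 there exists $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $\Omega = \mathrm{Int}\,\mathrm{Dom}(f) \ne \varnothing$, $\mu$ is concentrated on $F_1$, $(\nabla f)_\# \mu = \nu$, and $\mathrm{Cl}(\nabla f(F_1 \cap \mathrm{spt}\, \mu)) = \mathrm{spt}\, \nu$.

Step one: We show that $\mu$ is concentrated on $F_{2,\mathrm{inv}}$. Since $\mu$ is concentrated on $F_1$, $\mu$ […] 0, we find that $\sigma(\nabla f(x)) > 0$, so that (8.3) is proved.

Step four: We finally prove that $\det \nabla^2 f \in L^1_{\mathrm{loc}}(\Omega)$ and that the area formula (8.5) holds true. Let us first show that
\[ \int_K \det \nabla^2 f \le \mathcal{L}^n(\partial f(K)) \qquad \forall K \subset\subset \Omega. \tag{8.28} \]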
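Indeed, set $\Omega^* = \mathrm{Int}\,\mathrm{Dom}(f^*)$ and consider the measure $\lambda$ defined by
\[ \lambda(E) = (\nabla f^*)_\# (\mathcal{L}^n \llcorner \Omega^*)(E) = \mathcal{L}^n(\partial f(E)) \qquad \forall E \in \mathcal{B}(\mathbb{R}^n), \]
where in the last identity we have used Proposition 8.3, the fact that $\nabla f^*$ transports $\mathcal{L}^n \llcorner \Omega^*$, and that $f^{**} = f$. Since $\nabla f \in L^\infty_{\mathrm{loc}}(\Omega; \mathbb{R}^n)$, we easily see that if $K \subset\subset \Omega$, then $\partial f(K)$ is a bounded set in $\mathbb{R}^n$. Therefore, $\lambda$ is locally finite, and thus a Radon measure, in $\Omega$. By the Lebesgue–Besicovitch differentiation theorem there exist $h \in L^1_{\mathrm{loc}}(\Omega)$ and a Radon measure $\lambda^s \perp \mathcal{L}^n \llcorner \Omega$ such that
\[ \lambda = h\, \mathcal{L}^n \llcorner \Omega + \lambda^s, \qquad h(x) = \lim_{r \to 0^+} \frac{\lambda(B_r(x))}{\omega_n r^n} \quad \text{for } \mathcal{L}^n\text{-a.e. } x \in \Omega. \]
At the same time, thanks to statement (a) and Alexandrov's theorem, for $\mathcal{L}^n$-a.e. $x \in \Omega$ the limit defining $h(x)$ is equal to $\det \nabla^2 f(x)$, so that $\det \nabla^2 f \in L^1_{\mathrm{loc}}(\Omega)$, as claimed. To prove (8.5) we need to show that
\[ (\nabla f)_\# \big(\det \nabla^2 f\, d\mathcal{L}^n \llcorner M\big) = \mathcal{L}^n \llcorner \nabla f(M), \qquad M = F_{2,\mathrm{inv}} \cap \mathrm{Leb}(\det \nabla^2 f). \tag{8.29} \]
Since $\nabla f$ transports the measure $\pi = \det \nabla^2 f\, d\mathcal{L}^n \llcorner M$ (indeed, $M \subset F_1$), we can apply Theorem 8.6-(iii) with $u = \det \nabla^2 f$, and setting $\omega = (\nabla f)_\# \pi$, we find
\[ \lim_{r \to 0^+} \frac{\omega(B_r(\nabla f(x_0)))}{\omega_n r^n} = \frac{\det \nabla^2 f(x_0)}{\det \nabla^2 f(x_0)} = 1 \qquad \forall x_0 \in M. \tag{8.30} \]
In particular, $\omega \llcorner \nabla f(M) = \mathcal{L}^n \llcorner \nabla f(M)$. Since $\pi$ is concentrated on $M$ and $\omega = (\nabla f)_\# \pi$, it follows that $\omega$ is concentrated on $(\nabla f)(M)$, and thus (8.29) holds.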
PART III Applications to PDE and the Calculus of Variations and the Wasserstein Space
9 Isoperimetric and Sobolev Inequalities in Sharp Form
As detailed in the introduction, the third part of this book presents a series of applications of the Brenier–McCann theorem to the analysis of various problems in PDE and the Calculus of Variations, and to the introduction of the Wasserstein space. We begin in this chapter by proving sharp forms of the Euclidean isoperimetric inequality and of the Sobolev inequality. By “sharp form” we mean that we are going to prove these inequalities with their sharp constants, although we will avoid the more technical discussions needed to characterize their equality cases.
9.1 A Jacobian-Laplacian Estimate

We begin our analysis by proving a proposition which lies at the heart of the method discussed in this chapter. The proposition is based on the arithmetic-geometric mean inequality: given non-negative numbers $\alpha_k \ge 0$, we have
\[ \prod_{k=1}^n \alpha_k^{1/n} \le \frac{1}{n} \sum_{k=1}^n \alpha_k, \tag{9.1} \]
with equality if and only if there is $\alpha \ge 0$ such that $\alpha_k = \alpha$ for every $k = 1, \dots, n$.

Proposition 9.1 If $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is convex with $\Omega = \mathrm{Int}\,\mathrm{Dom}(f) \ne \varnothing$, then, as Radon measures in $\Omega$,
\[ \mathrm{Div}(\nabla f) \ge n\, (\det \nabla^2 f)^{1/n}\, d(\mathcal{L}^n \llcorner \Omega), \tag{9.2} \]
where $\mathrm{Div}$ denotes the distributional divergence operator.

Proof We have proved in Theorem 7.1 that $\nabla f \in BV_{\mathrm{loc}}(\Omega; \mathbb{R}^n)$ and that the distributional Hessian $D^2 f$ of $f$ is an $\mathbb{R}^{n \times n}_{\mathrm{sym}}$-valued Radon measure in $\Omega$ such that, for every $v \in \mathbb{R}^n$, $D^2_{vv} f$ is a (standard, nonnegative) Radon measure
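in $\Omega$, where as usual we have set $D^2_{vw} f(\varphi) = \int_{\mathbb{R}^n} f\, v \cdot \nabla^2 \varphi(x)[w]\, dx$ for every $v, w \in \mathbb{R}^n$ and $\varphi \in C_c^\infty(\Omega)$. Moreover, we have the Radon–Nikodym decomposition of $D^2 f$ with respect to $\mathcal{L}^n \llcorner \Omega$,
\[ D^2 f = \nabla^2 f\, d(\mathcal{L}^n \llcorner \Omega) + [D^2 f]^s, \qquad \text{where } [D^2 f]^s = D^2 f \llcorner Y, \]
for a suitable Borel set $Y$, see (A.14). In particular,
\[ \int_{\mathbb{R}^n} \varphi\, v \cdot d[D^2 f]^s[v] \ge 0, \qquad \forall \varphi \in C_c^0(\Omega),\ \varphi \ge 0,\ \forall v \in \mathbb{R}^n, \]
so that for every $v \in \mathbb{R}^n$
\[ D^2_{vv} f \ge v \cdot \nabla^2 f[v]\, d(\mathcal{L}^n \llcorner \Omega) \]
as Radon measures in $\Omega$. Since $\nabla f \in L^1_{\mathrm{loc}}(\Omega; \mathbb{R}^n)$ gives $D(\nabla f) = D^2 f$, we find that, as Radon measures in $\Omega$,
\[ \mathrm{Div}(\nabla f) = \sum_{i=1}^n D^2_{e_i e_i} f \ge \mathrm{trace}(\nabla^2 f)\, d(\mathcal{L}^n \llcorner \Omega), \tag{9.3} \]
where $\mathrm{trace}(A) = \sum_{i=1}^n A_{ii}$ if $A \in \mathbb{R}^{n \times n}$. Now, for a.e. $x \in \Omega$, the absolutely continuous density $\nabla^2 f(x)$ of $D^2 f$ with respect to $\mathcal{L}^n$ is an $n \times n$ symmetric matrix (as the limit as $r \to 0^+$ of the $n \times n$ symmetric matrices $D^2 f(B_r(x)) / \omega_n r^n$). Therefore $\nabla^2 f(x)$ can be represented, in an $x$-dependent orthonormal basis, as a diagonal matrix with nonnegative entries $\{\alpha_k(x)\}_{k=1}^n$, so that, by the arithmetic-geometric mean inequality (9.1) we find
\[ \mathrm{trace}(\nabla^2 f(x)) = \sum_{k=1}^n \alpha_k(x) \ge n \prod_{k=1}^n \alpha_k(x)^{1/n} = n\, (\det \nabla^2 f(x))^{1/n}. \tag{9.4} \]
Combining (9.3) with (9.4) we conclude the proof.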
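The pointwise step (9.4) is easy to sanity-check numerically. The following is a minimal sketch rather than part of the argument: the function name `trace_minus_n_det_root` is hypothetical, and its input is assumed to be the list of eigenvalues $\alpha_k(x)$ of $\nabla^2 f(x)$ in the diagonalizing orthonormal basis.

```python
import math


def trace_minus_n_det_root(eigenvalues):
    """Return trace(A) - n * det(A)**(1/n) for A = diag(eigenvalues), A >= 0.

    By the arithmetic-geometric mean inequality (9.1) this gap is
    nonnegative, and it vanishes exactly when all eigenvalues coincide.
    """
    n = len(eigenvalues)
    if n == 0 or any(a < 0 for a in eigenvalues):
        raise ValueError("expected a non-empty list of nonnegative eigenvalues")
    trace = sum(eigenvalues)
    det = math.prod(eigenvalues)
    return trace - n * det ** (1.0 / n)


if __name__ == "__main__":
    # A generic Hessian, an isotropic one, and a degenerate one.
    for alphas in ([1.0, 4.0], [3.0, 3.0, 3.0], [0.0, 2.0, 5.0]):
        print(alphas, trace_minus_n_det_root(alphas))
```

The gap is strictly positive for anisotropic Hessians, which is why equality in the final inequalities of this chapter forces $\nabla^2 f$ to be a multiple of the identity.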
9.2 The Euclidean Isoperimetric Inequality

In Theorem 9.2 we present an OMT proof of the Euclidean isoperimetric inequality. Let us recall that if $E$ is an open set with smooth boundary in $\mathbb{R}^n$ and outer unit normal $\nu_E$, then the Gauss–Green formula
\[ \int_E \nabla \varphi = \int_{\partial E} \varphi\, \nu_E\, d\mathcal{H}^{n-1} \qquad \forall \varphi \in C_c^\infty(\mathbb{R}^n), \tag{9.5} \]
can be interpreted in distributional sense as saying that the characteristic function $1_E$ of $E$ (which, trivially, is locally summable, and thus admits a distributional derivative) satisfies
\[ D 1_E = \nu_E\, \mathcal{H}^{n-1} \llcorner \partial E. \tag{9.6} \]
In particular, $D 1_E$ is an $\mathbb{R}^n$-valued Radon measure, since $|D 1_E| = \mathcal{H}^{n-1} \llcorner \partial E$ and $\partial E$ is locally $\mathcal{H}^{n-1}$-finite. Indeed, for every $x \in \partial E$ there exist an $(n-1)$-dimensional disk $M$ centered at $x$, a cylinder $K$ with section $M$, and a smooth function $u : M \to \mathbb{R}$ with bounded gradient such that $K \cap \partial E$ is the graph of $u$ over $M$, which combined with (9.6) gives
\[ |D 1_E|(K) = \mathcal{H}^{n-1}(K \cap \partial E) = \int_M \sqrt{1 + |\nabla u|^2} < \infty. \]
Defining the perimeter of $E$ by setting $P(E) = \mathcal{H}^{n-1}(\partial E) = |D 1_E|(\mathbb{R}^n)$, one has the Euclidean isoperimetric inequality, asserting that if $0 < |E| < \infty$, then
\[ P(E) \ge P(B_{r(E)}), \qquad r(E) = \Big(\frac{|E|}{\omega_n}\Big)^{1/n}, \]
where $r(E)$ is the radius such that $|B_{r(E)}| = |E|$, and where equality holds if and only if $E = x + B_{r(E)}$ for some $x \in \mathbb{R}^n$. Given that
\[ P(B_{r(E)}) = n\, \omega_n^{1/n} |E|^{(n-1)/n}, \]
the Euclidean isoperimetric inequality can be written as in (9.7).

Theorem 9.2 If $E$ is a bounded open set with smooth boundary in $\mathbb{R}^n$, then
\[ P(E) \ge n\, \omega_n^{1/n} |E|^{(n-1)/n}. \tag{9.7} \]

Proof Step one: We prove that if $u \in C_c^\infty(\mathbb{R}^n)$ with $u \ge 0$ and $\int_{\mathbb{R}^n} u^{n/(n-1)} = 1$, then
\[ \int_{\mathbb{R}^n} |\nabla u| \ge n\, \omega_n^{1/n}. \tag{9.8} \]
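Indeed, let us consider the probability measures
\[ \mu = u^{n/(n-1)}\, d\mathcal{L}^n, \qquad \nu = \frac{1_{B_1}}{\omega_n}\, d\mathcal{L}^n. \]
Since $\mathrm{spt}\, \nu$ is compact, by Theorem 8.1 there exists a convex function $f : \mathbb{R}^n \to \mathbb{R}$ such that $(\nabla f)_\# \mu = \nu$, with $\nabla f(x) \in B_1$ for $\mathcal{L}^n$-a.e. $x \in \{u > 0\}$, and
\[ \det(\nabla^2 f(x)) = \omega_n\, \frac{u(x)^{n/(n-1)}}{1_{B_1}(\nabla f(x))}, \qquad \text{for } \mathcal{L}^n\text{-a.e. } x \in \{u > 0\}. \tag{9.9} \]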
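By integrating $h(y) = 1_{B_1}(y)^{-1/n}$ with respect to $\nu$ and using first $(\nabla f)_\# \mu = \nu$ and then Monge's transport condition (9.9), we find
\[ \begin{aligned} 1 = \int_{\mathbb{R}^n} 1_{B_1}(y)^{-1/n}\, d\nu(y) &= \int_{\mathbb{R}^n} 1_{B_1}(\nabla f)^{-1/n}\, u^{n/(n-1)} \\ &= \frac{1}{\omega_n^{1/n}} \int_{\mathbb{R}^n} \Big(\frac{\det \nabla^2 f}{u^{n/(n-1)}}\Big)^{1/n} u^{n/(n-1)} \\ &= \frac{1}{\omega_n^{1/n}} \int_{\mathbb{R}^n} (\det \nabla^2 f)^{1/n}\, u \le \frac{1}{n\, \omega_n^{1/n}} \int_{\mathbb{R}^n} u\, d[\mathrm{Div}(\nabla f)], \end{aligned} \]
where in the last inequality we have used (9.2) (which holds with $\Omega = \mathbb{R}^n$). Given that $u \in C_c^\infty(\mathbb{R}^n)$, by definition of distributional divergence we find that
\[ n\, \omega_n^{1/n} \le \int_{\mathbb{R}^n} u\, d[\mathrm{Div}(\nabla f)] = -\int_{\mathbb{R}^n} \nabla f \cdot \nabla u \le \int_{\mathbb{R}^n} |\nabla u|, \]
where in the last inequality we have used $\nabla u = 0$ on $\{u = 0\}$ and $\nabla f(x) \in B_1$ for $\mathcal{L}^n$-a.e. $x \in \{u > 0\}$. This proves (9.8).

Step two: Now let $E$ be a bounded open set with smooth boundary in $\mathbb{R}^n$, and let $u_\varepsilon = 1_E \star \rho_\varepsilon$. Since $E$ is bounded we have that $u_\varepsilon \in C_c^\infty(\mathbb{R}^n)$, and by (9.8) we have
\[ \int_{\mathbb{R}^n} |\nabla u_\varepsilon| \ge n\, \omega_n^{1/n} \Big(\int_{\mathbb{R}^n} u_\varepsilon^{n/(n-1)}\Big)^{(n-1)/n}. \]
As $\varepsilon \to 0^+$ we have $u_\varepsilon(x) \to 1_E(x)$ at every Lebesgue point $x$ of $1_E$, and
\[ \int_{\mathbb{R}^n} |\nabla u_\varepsilon| \to |D 1_E|(\mathbb{R}^n) = P(E) \]
thanks to (A.22). Therefore (9.7) is proved.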
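As an illustration of (9.7), the ratio $P(E) / (n\, \omega_n^{1/n} |E|^{(n-1)/n})$ can be evaluated on explicit shapes. The sketch below is only a numerical aid with hypothetical helper names. It uses the standard formula $\omega_n = \pi^{n/2} / \Gamma(n/2 + 1)$, and it includes a cube for comparison. A cube does not have smooth boundary, so it falls outside Theorem 9.2 as stated; it is used here under the assumption that its perimeter is the total area of its $2n$ faces.

```python
import math


def unit_ball_volume(n):
    """omega_n = pi^(n/2) / Gamma(n/2 + 1), the volume of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def isoperimetric_ratio(perimeter, volume, n):
    """P(E) / (n * omega_n^(1/n) * |E|^((n-1)/n)); inequality (9.7) says this is >= 1."""
    bound = n * unit_ball_volume(n) ** (1.0 / n) * volume ** ((n - 1) / n)
    return perimeter / bound


def ball(n, r):
    """(perimeter, volume) of a ball of radius r in R^n."""
    omega_n = unit_ball_volume(n)
    return n * omega_n * r ** (n - 1), omega_n * r ** n


def cube(n, a):
    """(perimeter, volume) of a cube of side a in R^n: 2n faces of area a^(n-1)."""
    return 2 * n * a ** (n - 1), a ** n


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, isoperimetric_ratio(*ball(n, 2.0), n), isoperimetric_ratio(*cube(n, 2.0), n))
```

Balls give ratio one in every dimension, consistent with the formula for $P(B_{r(E)})$ recalled before Theorem 9.2, while the cube's ratio is strictly larger than one.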
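9.3 The Sobolev Inequality on $\mathbb{R}^n$

The sharp Sobolev inequality on $\mathbb{R}^n$ states that if $1 < p < n$, $n \ge 2$, $u \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ is such that $\nabla u \in L^p(\mathbb{R}^n; \mathbb{R}^n)$, and $\{|u| > t\}$ has finite Lebesgue measure for every $t > 0$, then
\[ \Big(\int_{\mathbb{R}^n} |\nabla u|^p\Big)^{1/p} \ge S(n, p) \Big(\int_{\mathbb{R}^n} |u|^{p^*}\Big)^{1/p^*}, \qquad p^* = \frac{np}{n - p}, \tag{9.10} \]
with equality if and only if for some $a \in \mathbb{R}$, $x_0 \in \mathbb{R}^n$ and $r > 0$ one has $u = u_{a, x_0, r}$, where, setting $p' = p/(p-1)$,
\[ u_{a, x_0, r}(x) = \frac{a}{(1 + r\, |x - x_0|^{p'})^{(n/p) - 1}}, \qquad x \in \mathbb{R}^n. \tag{9.11} \]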
Evaluation of the $L^{p^*}$-norm of $u_{a, x_0, r}$ and of the $L^p$-norm of $\nabla u_{a, x_0, r}$ allows one to give an explicit formula for $S(n, p)$ in terms of gamma functions. In Theorem 9.3 we prove (9.10) with the sharp constant $S(n, p)$ by using Brenier maps. The interest of establishing the Sobolev inequality in sharp form is paramount both from the physical and geometric viewpoint. In the latter direction, we may recall the geometric interpretation of (9.10) when $p = 2$ and $n \ge 3$. In this setting, the conformally flat Riemannian manifold $(M, g) = (\mathbb{R}^n, u^{4/(n-2)} \delta_{ij})$ defined by a smooth and positive function $u : \mathbb{R}^n \to (0, \infty)$ is such that
\[ \int_{\mathbb{R}^n} |\nabla u|^2 = \frac{n - 2}{4 (n - 1)} \int_M R_g\, d\mathrm{vol}_g, \qquad \int_{\mathbb{R}^n} u^{2^*} = \mathrm{vol}_g(M), \]
where $R_g$ denotes the scalar curvature of $(M, g)$. In other words, $S(n, 2)$ is the minimum of the total scalar curvature functional among conformally flat Riemannian manifolds with unit volume. Since (9.11) gives
\[ u_{1,0,1}(x)^{4/(n-2)} = \frac{1}{(1 + |x|^2)^2}, \qquad x \in \mathbb{R}^n, \]
and since $(1 + |x|^2)^{-2}$ is the conformal factor of the standard metric on the sphere $\mathbb{S}^n$ under stereographic projection onto $\mathbb{R}^n$, we conclude that the sharp Sobolev inequality with $p = 2$ is equivalent to saying that standard spheres minimize total scalar curvature among conformally flat manifolds with fixed volume.

Theorem 9.3 (Sharp Sobolev inequality) If $n \ge 2$, $1 < p < n$, and $u \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ with $\nabla u \in L^p(\mathbb{R}^n; \mathbb{R}^n)$ and $\mathcal{L}^n(\{|u| > t\}) < \infty$ for every $t > 0$, then
\[ \Big(\int_{\mathbb{R}^n} |\nabla u|^p\Big)^{1/p} \ge S(n, p) \Big(\int_{\mathbb{R}^n} |u|^{p^*}\Big)^{1/p^*}. \tag{9.12} \]
Moreover, equality holds if $u = u_{a, x_0, r}$ for some $a \in \mathbb{R}$, $x_0 \in \mathbb{R}^n$ and $r > 0$.

Proof Step one: We show that if $u, v \in C_c^\infty(\mathbb{R}^n)$ with $u, v \ge 0$ on $\mathbb{R}^n$ and
\[ \int_{\mathbb{R}^n} u^{p^*} = \int_{\mathbb{R}^n} v^{p^*} = 1, \]
then
\[ \frac{n \int_{\mathbb{R}^n} v^{p^\#}}{p^\# \Big(\int_{\mathbb{R}^n} |y|^{p'} v(y)^{p^*}\, dy\Big)^{1/p'}} \le \Big(\int_{\mathbb{R}^n} |\nabla u|^p\Big)^{1/p}, \qquad p^\# = \frac{p (n - 1)}{n - p}. \tag{9.13} \]
Let $\nabla f$ be the Brenier map from $\mu = u^{p^*}\, d\mathcal{L}^n$ to $\nu = v^{p^*}\, d\mathcal{L}^n$, so that
\[ \int_{\mathbb{R}^n} h\, v^{p^*} = \int_{\mathbb{R}^n} (h \circ \nabla f)\, u^{p^*} \tag{9.14} \]
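for every Borel function $h : \mathbb{R}^n \to [0, \infty]$. By Theorem 8.1, and since $\mathrm{spt}\, \nu$ is compact, we can assume that $\mathrm{Dom}(f) = \mathbb{R}^n$ and that $\nabla f(x) \in \{v > 0\}$ and
\[ \det \nabla^2 f(x) = \frac{u(x)^{p^*}}{v(\nabla f(x))^{p^*}}, \qquad \text{for } \mathcal{L}^n\text{-a.e. } x \in \{u > 0\}. \tag{9.15} \]
By applying (9.14) to $h = v^{-p^*/n}$, and noticing that $p^\# = p^* - (p^*/n)$, we find that
\[ \int_{\mathbb{R}^n} v^{p^\#} = \int_{\mathbb{R}^n} v^{-p^*/n}\, v^{p^*} = \int_{\mathbb{R}^n} v(\nabla f)^{-p^*/n}\, u^{p^*}, \]
and thus, by (9.15),
\[ \int_{\mathbb{R}^n} v^{p^\#} = \int_{\mathbb{R}^n} \frac{(\det \nabla^2 f)^{1/n}}{u^{p^*/n}}\, u^{p^*} = \int_{\mathbb{R}^n} (\det \nabla^2 f)^{1/n}\, u^{p^\#} \le \frac{1}{n} \int_{\mathbb{R}^n} u^{p^\#}\, d[\mathrm{Div}\, \nabla f], \]
where in the last inequality we have used the key inequality (9.2) (with $\Omega = \mathbb{R}^n$). Since $u \in C_c^\infty(\mathbb{R}^n)$ we can integrate by parts and apply the Hölder inequality to find
\[ \begin{aligned} n \int_{\mathbb{R}^n} v^{p^\#} \le \int_{\mathbb{R}^n} u^{p^\#}\, d[\mathrm{Div}(\nabla f)] &= -p^\# \int_{\mathbb{R}^n} u^{p^\# - 1}\, \nabla u \cdot \nabla f \\ &\le p^\# \Big(\int_{\mathbb{R}^n} u^{(p^\# - 1) p'}\, |\nabla f|^{p'}\Big)^{1/p'} \Big(\int_{\mathbb{R}^n} |\nabla u|^p\Big)^{1/p}. \end{aligned} \]
Since $(p^\# - 1)\, p' = p^*$ we can apply (9.14) again, this time with $h(y) = |y|^{p'}$, and find
\[ n \int_{\mathbb{R}^n} v^{p^\#} \le p^\# \Big(\int_{\mathbb{R}^n} |y|^{p'} v(y)^{p^*}\, dy\Big)^{1/p'} \Big(\int_{\mathbb{R}^n} |\nabla u|^p\Big)^{1/p}. \]
This proves (9.13).

Step two: By a non-trivial density argument that is omitted for the sake of brevity, (9.13) implies that if $u \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ with $\nabla u \in L^p(\mathbb{R}^n; \mathbb{R}^n)$ and $\mathcal{L}^n(\{|u| > t\}) < \infty$ for every $t > 0$, then
\[ \Big(\int_{\mathbb{R}^n} |\nabla u|^p\Big)^{1/p} \ge S(n, p) \Big(\int_{\mathbb{R}^n} |u|^{p^*}\Big)^{1/p^*}, \]
where $S(n, p)$ is defined as
\[ S(n, p) = \sup\Big\{ M(v) : v \in C_c^\infty(\mathbb{R}^n),\ v \ge 0,\ \int_{\mathbb{R}^n} v^{p^*} = 1 \Big\}, \qquad M(v) = \frac{n \int_{\mathbb{R}^n} v^{p^\#}}{p^\# \Big(\int_{\mathbb{R}^n} |y|^{p'} |v(y)|^{p^*}\, dy\Big)^{1/p'}}. \]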
By letting $u(x) = v(x) = c(n, p)\, (1 + |x|^{p'})^{1 - (n/p)}$ for a suitable choice of $c(n, p)$ that guarantees $\int_{\mathbb{R}^n} u^{p^*} = \int_{\mathbb{R}^n} v^{p^*} = 1$, a direct computation (or, better said, another omitted density argument) shows that
\[ \frac{\|\nabla u\|_{L^p(\mathbb{R}^n)}}{\|u\|_{L^{p^*}(\mathbb{R}^n)}} = M(v), \tag{9.16} \]
thus proving the optimality of (9.11) in (9.12).
Remark 9.4 The above proof leaves the impression that, without prior knowledge of the optimal functions in the Sobolev inequality, one could not have guessed their form from the mass transport argument. This is, however, a wrong impression, since the proof of Theorem 9.3 actually indicates a simple way to identify the specific profiles of the optimizers in the Sobolev inequality. Indeed, we can inspect the proof of Theorem 9.3 looking for functions $u$ such that the choice $v = u$ saturates the duality inequality (9.13). Since in this case we can set $\nabla f(x) = x$ for every $x \in \mathbb{R}^n$, the only non-trivial inequality sign left in the argument is found in the application of the Hölder inequality,
\[ \int_{\mathbb{R}^n} u^{p^\# - 1}\, (-\nabla u) \cdot x \le \Big(\int_{\mathbb{R}^n} |x|^{p'} u^{p^*}(x)\, dx\Big)^{1/p'} \Big(\int_{\mathbb{R}^n} |\nabla u|^p\Big)^{1/p}. \]
Equality holds in this Hölder inequality if and only if, for some $\lambda > 0$,
\[ u^{p^\# - 1}\, x = \lambda\, |\nabla u|^{p-2} (-\nabla u); \]
writing this condition for $u(x) = \zeta(|x|)$, with $\zeta = \zeta(r)$ a decreasing function, we find that
\[ (-\zeta')^{p-1} = \zeta^{p^\# - 1}\, \frac{r}{\lambda} \qquad \forall r > 0, \]
which is equivalent to
\[ -\mu\, \zeta'\, \zeta^{-n/(n-p)} = r^{1/(p-1)}, \qquad \mu = \lambda^{1/(p-1)}, \]
for every $r \in \{\zeta > 0\}$. This is easily integrated to find that $\zeta(r)^{-p/(n-p)} = \alpha + \beta\, r^{p'}$ for positive constants $\alpha$ and $\beta$, and thus that $\zeta(r) = (\alpha + \beta\, r^{p'})^{1 - (n/p)}$ for every $r > 0$.
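The proof of Theorem 9.3 and Remark 9.4 rely on a handful of identities among the exponents $p^*$, $p^\#$ and $p'$. The sketch below verifies them in exact rational arithmetic; it is offered only as a bookkeeping aid, and the function names are hypothetical.

```python
from fractions import Fraction


def sobolev_exponents(n, p):
    """Return p* = np/(n-p), p# = p(n-1)/(n-p) and p' = p/(p-1) as exact fractions."""
    n, p = Fraction(n), Fraction(p)
    if not (1 < p < n):
        raise ValueError("the Sobolev range is 1 < p < n")
    return {
        "p_star": n * p / (n - p),
        "p_sharp": p * (n - 1) / (n - p),
        "p_prime": p / (p - 1),
    }


def check_identities(n, p):
    """Check the exponent identities used in Theorem 9.3 and Remark 9.4."""
    e = sobolev_exponents(n, p)
    n, p = Fraction(n), Fraction(p)
    return (
        # p# = p* - p*/n, used when applying (9.14) to h = v^(-p*/n)
        e["p_sharp"] == e["p_star"] - e["p_star"] / n,
        # (p# - 1) p' = p*, used after the Hoelder inequality
        (e["p_sharp"] - 1) * e["p_prime"] == e["p_star"],
        # (p# - 1)/(p - 1) = n/(n - p), the power of zeta in the ODE of Remark 9.4
        (e["p_sharp"] - 1) / (p - 1) == n / (n - p),
        # 1 - n/(n - p) = -p/(n - p), the power obtained after integrating the ODE
        1 - n / (n - p) == -p / (n - p),
    )


if __name__ == "__main__":
    print(sobolev_exponents(3, 2), check_identities(3, 2))
```

For $n = 3$ and $p = 2$ this gives $p^* = 6$, $p^\# = 4$ and $p' = 2$, and all four identities hold.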
10 Displacement Convexity and Equilibrium of Gases
Displacement convexity is one of the most important ideas stemming from the study of OMT problems. Following McCann’s original presentation, in this chapter we introduce displacement convexity (and the related notion of displacement interpolation) in connection with an apparently simple problem in the Calculus of Variations, namely, establishing the uniqueness of equilibrium states in a basic variational model for self-interacting gases. A displacement convexity proof of the Brunn–Minkowski inequality closes the chapter.
10.1 A Variational Model for Self-interacting Gases

We consider a model where the state of a gas is entirely determined by its mass density $\rho$. We normalize the total mass of the gas to one, and thus identify the state variable $\rho$ with an absolutely continuous probability measure $\mu = \rho\, d\mathcal{L}^n \in \mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$. The energy $\mathcal{E} = \mathcal{U} + \mathcal{V}$ of the state $\rho$ is defined as the sum of an internal energy $\mathcal{U}$ and of a self-interaction energy $\mathcal{V}$: the internal energy $\mathcal{U}$ has the form
\[ \mathcal{U}(\rho) = \int_{\mathbb{R}^n} U(\rho(x))\, dx, \]
where $U : [0, \infty) \to \mathbb{R}$ is continuous, with $U(0) = 0$, while the self-interaction energy $\mathcal{V}$ has the form
\[ \mathcal{V}(\rho) = \frac{1}{2} \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} \rho(x)\, V(x - y)\, \rho(y)\, dx\, dy \]
corresponding¹ to a continuous function $V : \mathbb{R}^n \to \mathbb{R}$.

¹ In the physical case $n = 3$, the dimensions of $U$ are energy per unit volume, and those of $V$ energy times squared unit mass.
Internal energies: The internal energy $\mathcal{U}$ models the behavior of the gas under compression and dilution. Compression and dilution correspond, respectively, to the limits $\lambda \to 0^+$ and $\lambda \to \infty$ of the rescaled densities $\rho_\lambda(x) = \lambda^{-n} \rho(x/\lambda)$ ($\lambda > 0$). Since gases oppose compression and favor dilution, we want
\[ \lambda \in (0, \infty) \mapsto \mathcal{U}(\rho_\lambda) = \int_{\mathbb{R}^n} U\Big(\frac{\rho(x)}{\lambda^n}\Big) \lambda^n\, dx \]
to be a decreasing function. By the arbitrariness of $\rho$, this means requiring that
\[ \lambda \in (0, \infty) \mapsto \lambda^n\, U(\lambda^{-n}) \ \text{is a decreasing function.} \tag{10.1} \]
Physically relevant examples of internal energy densities $U$ satisfying (10.1) are given by
\[ U(\rho) = \rho \log \rho, \tag{10.2} \]
\[ U(\rho) = \rho^\gamma, \qquad \gamma \ge 1, \tag{10.3} \]
\[ U(\rho) = -\rho^\gamma, \qquad 1 \ge \gamma \ge \frac{n - 1}{n}. \tag{10.4} \]
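Condition (10.1), together with the convexity of $\lambda \mapsto \lambda^n U(\lambda^{-n})$ that Theorem 10.5 additionally requires, can be probed numerically for the three model densities. The following is a rough sketch with hypothetical helper names. It samples the function on a uniform grid and tests monotonicity and midpoint convexity up to a tolerance, so it can suggest that a density satisfies the conditions but cannot prove it.

```python
import math


def scaled_energy_density(U, lam, n):
    """g(lam) = lam^n * U(lam^(-n)), the function appearing in (10.1)."""
    return lam ** n * U(lam ** (-n))


def is_decreasing(U, n, grid, tol=1e-9):
    """Check that g is non-increasing along an increasing grid."""
    values = [scaled_energy_density(U, lam, n) for lam in grid]
    return all(b <= a + tol for a, b in zip(values, values[1:]))


def is_midpoint_convex(U, n, grid, tol=1e-9):
    """Check 2 g(x_i) <= g(x_{i-1}) + g(x_{i+1}) on a uniform grid."""
    values = [scaled_energy_density(U, lam, n) for lam in grid]
    return all(2 * b <= a + c + tol for a, b, c in zip(values, values[1:], values[2:]))


def entropy(rho):
    """U(rho) = rho log rho, as in (10.2), with U(0) = 0."""
    return rho * math.log(rho) if rho > 0 else 0.0


def power(gamma):
    """U(rho) = rho^gamma, as in (10.3) when gamma >= 1."""
    return lambda rho: rho ** gamma


def negative_power(gamma):
    """U(rho) = -rho^gamma, as in (10.4) when (n-1)/n <= gamma <= 1."""
    return lambda rho: -(rho ** gamma)


if __name__ == "__main__":
    n = 3
    grid = [0.1 + 0.05 * k for k in range(100)]
    for name, U in [("entropy", entropy), ("power 2", power(2.0)),
                    ("negative power 2/3", negative_power(2.0 / 3.0)),
                    ("negative power 1/2", negative_power(0.5))]:
        print(name, is_decreasing(U, n, grid), is_midpoint_convex(U, n, grid))
```

With $n = 3$, the exponent $\gamma = 1/2$ lies below $(n-1)/n$: the function $-\lambda^{3/2}$ is still decreasing, but it fails the convexity test, which is one way to see the role of the lower bound on $\gamma$ in (10.4).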
With $U$ as in (10.3) or (10.4), the functional $\mathcal{U}$ is well-defined on $\mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$ with values, respectively, in $[0, \infty]$ and in $[-\infty, 0]$. The case when $U(\rho) = \rho \log \rho$, and thus $\mathcal{U}$ is the (negative) entropy of the gas, is of course more delicate, because both $U(\rho)^+ = \max\{U(\rho), 0\}$ and $U(\rho)^- = \max\{-U(\rho), 0\}$ may have infinite integrals. To avoid this difficulty one usually assumes a moment bound. For example, given $p \in [1, \infty)$, by taking into account that $\rho\, [\log(\rho) + |x|^p] = U(\rho\, e^{|x|^p})\, e^{-|x|^p}$ and by applying Jensen's inequality to the convex function $U$ and to the probability measure $c(p)\, e^{-|x|^p}\, dx$, it is easily seen that
\[ \mathcal{U}(\rho) \ge -\int_{\mathbb{R}^n} |x|^p \rho(x)\, dx - \log\Big(\int_{\mathbb{R}^n} e^{-|x|^p}\, dx\Big) \qquad \forall \mu = \rho\, d\mathcal{L}^n \in \mathcal{P}_{p,\mathrm{ac}}(\mathbb{R}^n). \tag{10.5} \]
Finally, as a remark that will play an important role in Theorem 10.5, we notice that in all three examples listed above the function of $\lambda$ defined in (10.1) is also convex, in addition to being decreasing.

Self-interaction energies: The energy term $\mathcal{V}$ takes into account the interaction between different particles in the gas mediated by the self-interaction potential $V$. The Newtonian potential $V(z) = 1/|z|$ for $z \in \mathbb{R}^3 \setminus \{0\}$ is, by and large, the physically most important example of a self-interaction potential: in this case, the attraction between particles decreases with an increase of their mutual distance. We shall rather focus on the case of interaction potentials $V$ such that
\[ V \ \text{is strictly convex on } \mathbb{R}^n. \tag{10.6} \]
This assumption excludes the Newtonian potential, and refers more naturally to situations where the attraction between particles increases with their mutual distance. We now turn to the analysis of the minimization problem
\[ \inf\Big\{ \mathcal{E}(\rho) = \mathcal{U}(\rho) + \mathcal{V}(\rho) : \mu = \rho\, d\mathcal{L}^n \in \mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n) \Big\}, \tag{10.7} \]
whose minimizers represent the equilibrium states of a gas modeled by $U$ and $V$. By a classical technique in the Calculus of Variations, the concentration-compactness method, and under mild growth assumptions on $U$ and $V$, one checks the existence of minimizers in (10.7): we do not provide the details of this existence argument, since they are not really related to OMT. We rather focus on the question of the uniqueness of minimizers, which is considerably trickier. The standard approach to uniqueness of minimizers, for example in the minimization problem $\min_K F$ of a real-valued function $F$ defined over a convex subset $K$ of a linear space, requires the strict convexity of $F$: if for every $x_0, x_1 \in K$ and $t \in (0, 1)$ one has $F((1 - t) x_0 + t x_1) \le (1 - t) F(x_0) + t F(x_1)$, with equality for some $t \in (0, 1)$ if and only if $x_0 = x_1$, then $\min_K F$ can be achieved only at one point; indeed $F(x_0) = F(x_1) = \min_K F = m$ and convexity imply that $F$ is constantly equal to $m$ along the segment $(1 - t) x_0 + t x_1$, $t \in (0, 1)$, thus implying, by strict convexity, that $x_0 = x_1$. As the reader may expect at this point of the discussion, the problem is that under the assumptions on $U$ and $V$ made above, the energy $\mathcal{E}$ will not be strictly convex with respect to the standard notion of convex interpolation on $\mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$. By standard interpolation of $\mu_0 = \rho_0\, d\mathcal{L}^n$ and $\mu_1 = \rho_1\, d\mathcal{L}^n$ we mean the family of measures
\[ \mu_t = \big((1 - t) \rho_0 + t \rho_1\big)\, d\mathcal{L}^n \in \mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n), \tag{10.8} \]
satisfying of course $\mu_{t=0} = \mu_0$ and $\mu_{t=1} = \mu_1$. The internal energy $\mathcal{U}$ is strictly convex along the standard interpolation (10.8) as soon as $U$ is a strictly convex function on $(0, \infty)$, a condition which is met by all the examples in our list (10.2), (10.3) and (10.4). Thus, for these choices of the internal energy $\mathcal{U}$, the total energy $\mathcal{E}$ is strictly convex along (10.8) as soon as $\mathcal{V}$ is convex along (10.8). However, as we are going to see in a moment, under very mild assumptions on $V$ (namely, $V(0) = 0$ and $V > 0$ on $\mathbb{R}^n \setminus \{0\}$), the self-interaction energy $\mathcal{V}$ may very well be concave along the interpolation (10.8)! Let us take, for example,
\[ \rho_0^\varepsilon = \frac{1_{B_\varepsilon(x_0)}}{\omega_n \varepsilon^n}, \qquad \rho_1^\varepsilon = \frac{1_{B_\varepsilon(x_1)}}{\omega_n \varepsilon^n}, \qquad \varepsilon < |x_0 - x_1|, \tag{10.9} \]
Figure 10.1 (a) Two probability densities in $\mathbb{R}^n$, obtained one as a translation of the other; (b) the standard convex interpolation with $t = 1/2$ is defined by taking the average of the two densities; (c) the new displacement interpolation is defined by continuously transporting $\mu_0$ into $\mu_1$ through a convex combination of the identity map and the Brenier map: in the depicted case, this amounts to a continuous translation.
so that $\rho_t^\varepsilon = (1 - t) \rho_0^\varepsilon + t \rho_1^\varepsilon$, and $\mu_t^\varepsilon = \rho_t^\varepsilon\, d\mathcal{L}^n \overset{*}{\rightharpoonup} \mu_t$ as $\varepsilon \to 0^+$, where
\[ \mu_0 = \delta_{x_0}, \qquad \mu_1 = \delta_{x_1}, \qquad \mu_t = (1 - t)\, \delta_{x_0} + t\, \delta_{x_1}, \qquad t \in (0, 1). \tag{10.10} \]
If, for example, $V(0) = 0$, then, letting $\varepsilon \to 0^+$, we find
\[ \begin{aligned} \mathcal{V}(\rho_t^\varepsilon) &\to \frac{1}{2} \int_{\mathbb{R}^n} d\mu_t(x) \int_{\mathbb{R}^n} V(x - y)\, d\mu_t(y) \\ &= \frac{1}{2} \int_{\mathbb{R}^n} \big[(1 - t)\, V(x - x_0) + t\, V(x - x_1)\big]\, d\mu_t(x) \\ &= \frac{1}{2} \big[(1 - t)\, t\, V(x_1 - x_0) + t\, (1 - t)\, V(x_0 - x_1)\big], \end{aligned} \]
which is clearly a concave function of $t$ as soon as $V > 0$ on $\mathbb{R}^n \setminus \{0\}$. Therefore we can essentially rule out the use of the standard convex interpolation (10.8) in addressing uniqueness of minimizers in (10.7). The crucial remark to exit this impasse is that self-interaction energies $\mathcal{V}$ corresponding to convex potentials $V$ are indeed convex if tested along an interpolation between the supports of the densities $\rho_0$ and $\rho_1$ (rather than between their values, as is the case with (10.8)). This simple idea is illustrated in Figure 10.1, and can be preliminarily tested in the simple case when $\mu_0 = \sum_{i=1}^N \lambda_i \delta_{x_i}$ and $\mu_1 = \sum_{i=1}^N \lambda_i \delta_{y_i}$ for some $\lambda_i \in (0, 1)$ with $\sum_{i=1}^N \lambda_i = 1$. In this case, we are thinking about replacing the standard convex interpolation $\mu_t = (1 - t) \mu_0 + t \mu_1$, see (10.10), with the displacement interpolation²

² To be precise, (10.11) coincides with the formal notion of displacement interpolation (introduced in (10.12) for the case when $\mu$ […] $0\}$, and thus for $\mathcal{L}^n$-a.e. $x \in \mathbb{R}^n$. We thus have,

Proposition 10.1 (Displacement convexity of interaction energies) If $V : \mathbb{R}^n \to \mathbb{R}$ is convex, then the self-interaction energy $\mathcal{V}$ is convex along displacement interpolations in $\mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$. Moreover, if $V$ is strictly convex and $\mu_0 = \rho_0\, d\mathcal{L}^n$ and $\mu_1 = \rho_1\, d\mathcal{L}^n$ are such that $\mathcal{V}(\rho_t) = (1 - t)\, \mathcal{V}(\rho_0) + t\, \mathcal{V}(\rho_1)$ for some $t \in (0, 1)$, then there is $x_0 \in \mathbb{R}^n$ such that $\rho_0(x) = \rho_1(x - x_0)$ for $\mathcal{L}^n$-a.e. $x \in \mathbb{R}^n$.

Proof By the above discussion and Proposition 10.2.
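The contrast between the two interpolations can be seen on discrete measures in one dimension. The sketch below is illustrative only and rests on stated assumptions: a measure is a list of (weight, position) pairs, the helper names are hypothetical, and the displacement interpolation of $\sum_i \lambda_i \delta_{x_i}$ and $\sum_i \lambda_i \delta_{y_i}$ is taken to be $\sum_i \lambda_i \delta_{(1-t) x_i + t y_i}$ with atoms paired index by index. This is the natural discrete analogue of (10.15), and the pairing is assumed (not checked) to be the optimal one, which on the line holds when both lists are sorted.

```python
def interaction_energy(measure, V):
    """(1/2) * sum_{i,j} w_i w_j V(x_i - x_j) for a discrete measure [(w_i, x_i), ...]."""
    return 0.5 * sum(wi * wj * V(xi - xj) for wi, xi in measure for wj, xj in measure)


def standard_interpolation(mu0, mu1, t):
    """(1 - t) mu0 + t mu1: atoms stay in place, only the weights are mixed."""
    return [((1 - t) * w, x) for w, x in mu0] + [(t * w, y) for w, y in mu1]


def displacement_interpolation(mu0, mu1, t):
    """Move atom i of mu0 along the segment towards atom i of mu1.

    Assumption: mu0 and mu1 carry the same weights and the index-wise
    pairing is the optimal transport plan (true on the line for sorted atoms).
    """
    return [(w, (1 - t) * x + t * y) for (w, x), (_, y) in zip(mu0, mu1)]


if __name__ == "__main__":
    V = lambda z: z * z  # a strictly convex potential with V(0) = 0
    mu0 = [(0.5, 0.0), (0.5, 1.0)]
    mu1 = [(0.5, 3.0), (0.5, 6.0)]
    chord = 0.5 * (interaction_energy(mu0, V) + interaction_energy(mu1, V))
    print("chord value at t = 1/2:      ", chord)
    print("standard interpolation:      ", interaction_energy(standard_interpolation(mu0, mu1, 0.5), V))
    print("displacement interpolation:  ", interaction_energy(displacement_interpolation(mu0, mu1, 0.5), V))
```

At $t = 1/2$ the standard interpolation lies above the chord (the concave behavior described above), while the displacement interpolation lies below it, as expected from the convexity of $V$ along each segment $(1 - t)(x_i - x_j) + t (y_i - y_j)$.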
This is of course very promising, but notice that, in fixing the convexity issues of $\mathcal{V}$, we have created a potential problem on the front of the internal energy $\mathcal{U}$. Indeed, if true, the convexity in $t \in (0, 1)$ of
\[ \mathcal{U}(\rho_t) = \int_{\mathbb{R}^n} U(\rho_t(x))\, dx, \qquad \text{where } \rho_t\, d\mathcal{L}^n = \big((1 - t)\, \mathrm{id} + t\, \nabla f\big)_\# \mu_0, \]
is definitely not an immediate consequence of a convexity assumption on $U$, as was the case for the standard convex interpolation. Interestingly, the physically natural condition that $\lambda \mapsto \lambda^n U(\lambda^{-n})$ is convex and decreasing on $\lambda > 0$ is sufficient to establish the convexity of $\mathcal{U}$ along the displacement interpolation (10.12), and is tightly related to the theory of Brunn–Minkowski inequalities. The rest of this chapter is organized as follows. We first discuss some basic properties of displacement interpolation, and then provide a general result about the displacement convexity of internal energies. Finally, we briefly discuss a proof of the Brunn–Minkowski inequality by means of displacement interpolation.
10.2 Displacement Interpolation

Whenever $\mu_0 \in \mathcal{P}(\mathbb{R}^n)$ is such that $\mu_0(S) = 0$ for every countably $(n-1)$-rectifiable set $S$ in $\mathbb{R}^n$ and $\mu_1 \in \mathcal{P}(\mathbb{R}^n)$, we can construct the Brenier map $\nabla f$ from $\mu_0$ to $\mu_1$ (Theorem 6.1) and define the displacement interpolation from $\mu_0$ to $\mu_1$ by setting
\[ \mu_t = \big((1 - t)\, \mathrm{id} + t\, \nabla f\big)_\# \mu_0, \qquad t \in [0, 1]. \tag{10.15} \]
At times the notation $[\mu_0, \mu_1]_t$ is used in place of $\mu_t$, because it stresses the directionality of (10.15): we know that we can start from $\mu_0$ and end up on $\mu_1$ by formula (10.15), but unless $\mu_1$ also satisfies $\mu_1(S) = 0$ for every countably $(n-1)$-rectifiable set $S$ in $\mathbb{R}^n$, we do not have a Brenier map from $\mu_1$ to $\mu_0$ to define $[\mu_1, \mu_0]_t$. The following proposition is thus very useful, as it shows that displacement interpolation is closed in $\mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$.

Proposition 10.2 If $\mu_0 = \rho_0\, d\mathcal{L}^n \in \mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$ and $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a convex function such that $\nabla f$ transports $\mu_0$, then
\[ \mu_t = \big((1 - t)\, \mathrm{id} + t\, \nabla f\big)_\# \mu_0 \in \mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n), \]
for every $t \in [0, 1)$, i.e., $\mu_t = \rho_t\, d\mathcal{L}^n$ where $\rho_t \in L^1(\mathbb{R}^n)$, $\rho_t \ge 0$ and $\int_{\mathbb{R}^n} \rho_t = 1$.

Remark 10.3 Notice that in the setting of Proposition 10.2 we may have $\mu_1 = (\nabla f)_\# \mu_0 \notin \mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$. The sole condition that $\mu_0 \in \mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$ is enough to guarantee that $\mu_t \in \mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$ whenever $t \in [0, 1)$.

Proof of Proposition 10.2 Let $f_t(x) = (1 - t)\, |x|^2/2 + t\, f(x)$ for $x \in \mathbb{R}^n$. Since $\nabla f_t = (1 - t)\, \mathrm{id} + t\, \nabla f$ we clearly have that $\nabla f_t$ transports $\mu_0$ and that $\mu_t = (\nabla f_t)_\# \mu_0$. If $F$ is the set of the differentiability points of $f$, then $F$ is also the set of differentiability points of $f_t$. Moreover, if $x_1, x_2 \in F$, then
\[ \begin{aligned} |\nabla f_t(x_1) - \nabla f_t(x_2)|\, |x_1 - x_2| &\ge \big(\nabla f_t(x_1) - \nabla f_t(x_2)\big) \cdot (x_1 - x_2) \\ &= (1 - t)\, |x_1 - x_2|^2 + t\, \big(\nabla f(x_1) - \nabla f(x_2)\big) \cdot (x_1 - x_2) \\ &\ge (1 - t)\, |x_1 - x_2|^2, \end{aligned} \]
where in the last inequality we have used the monotonicity of $\nabla f$, namely $(\nabla f(x_1) - \nabla f(x_2)) \cdot (x_1 - x_2) \ge 0$ for all $x_1, x_2 \in F$ (see (5.1)). We thus find
\[ |\nabla f_t(x_1) - \nabla f_t(x_2)| \ge (1 - t)\, |x_1 - x_2| \qquad \forall x_1, x_2 \in F. \]
Setting $S_t = (\nabla f_t)(F)$, this means that $(\nabla f_t)^{-1}$ defines a map on $S_t$ with values in $\mathbb{R}^n$ such that $\mathrm{Lip}((\nabla f_t)^{-1}; S_t) \le (1 - t)^{-1}$. In particular, for every Borel set $E \subset \mathbb{R}^n$ we have
\[ \mathcal{L}^n\big((\nabla f_t)^{-1}(E)\big) = \mathcal{L}^n\big((\nabla f_t)^{-1}(E \cap S_t)\big) \le (1 - t)^{-n}\, \mathcal{L}^n(E \cap S_t) \le (1 - t)^{-n}\, \mathcal{L}^n(E), \]
so that $\mathcal{L}^n((\nabla f_t)^{-1}(E)) = 0$ if $\mathcal{L}^n(E) = 0$. In particular,
\[ \mu_t(E) = \int_{(\nabla f_t)^{-1}(E)} \rho_0(x)\, dx = 0 \qquad \text{if } \mathcal{L}^n(E) = 0, \]
showing that $\mu_t$ […] 0, and to the measures $\tau_{x_0} \mu(E) = \mu(E - x_0)$, $Q_\# \mu(E) = \mu(Q^{-1}(E))$ and $\mu_\lambda(E) = \mu(E/\lambda)$ ($E$ a Borel set in $\mathbb{R}^n$). It is easily seen that $f(x) = |x|^2/2 + x_0 \cdot x$ is such that $\nabla f$ transports $\mu = \rho\, d\mathcal{L}^n$ into $\tau_{x_0} \mu$, so that $(1 - t)\, x + t\, \nabla f(x) = x + t\, x_0$, and thus
\[ [\mu, \tau_{x_0} \mu]_t = \rho_t(x)\, dx \qquad \text{with} \qquad \rho_t(x) = \rho(x - t\, x_0). \]
In particular, the displacement interpolation between $\mu$ and its translation $\tau_{x_0} \mu$ is obtained by continuously translating $\mu$, i.e., $[\mu, \tau_{x_0} \mu]_t = \tau_{t x_0} \mu$. Similarly, $f(x) = \lambda\, |x|^2/2$ is such that $\nabla f$ transports $\mu = \rho\, d\mathcal{L}^n$ into $\mu_\lambda$, so that $(1 - t)\, x + t\, \nabla f(x) = [(1 - t) + t \lambda]\, x$ and
\[ [\mu, \mu_\lambda]_t = \rho_t(x)\, dx \qquad \text{with} \qquad \rho_t(x) = \frac{\rho(x/\lambda(t))}{\lambda(t)^n}, \qquad \lambda(t) = (1 - t) + t \lambda. \]
In particular, the displacement interpolation between $\mu$ and its rescaling $\mu_\lambda$ is obtained by continuously rescaling $\mu$, i.e., $[\mu, \mu_\lambda]_t = \mu_{\lambda(t)}$. However, the displacement interpolation between $\mu$ and $Q_\# \mu$ is not obtained by a continuous rotation process of $\mu$! Indeed, being a (non-trivial) rotation and being the gradient of a convex function are incompatible conditions. More precisely, let $\nabla f$ be the Brenier map between $\mu = \rho\, d\mathcal{L}^n$ and $Q_\# \mu$; then we have
\[ \int_{\mathbb{R}^n} \varphi(\nabla f)\, \rho = \int_{\mathbb{R}^n} \varphi(Q[x])\, \rho(x)\, dx \qquad \forall \varphi \in C_c^0(\mathbb{R}^n). \]
Of course if we could find a convex function $f$ such that $\nabla f(x) = Q[x]$ for $\mu$-a.e. $x \in \mathbb{R}^n$, then $\nabla f$ would be the Brenier map from $\mu$ to $Q_\# \mu$: however in that case the condition $\nabla^2 f = Q$ would imply that $Q \in \mathbb{R}^{n \times n}_{\mathrm{sym}} \cap O(n)$ with $Q \ge 0$, and thus that $Q = \mathrm{Id}$ (indeed, $Q^* Q = \mathrm{Id}$ and $Q = Q^*$ give $Q^2 = \mathrm{Id}$, so that $Q \ge 0$ implies $Q = \mathrm{Id}$). So unless we are in the trivial case when $Q = \mathrm{Id}$, the Brenier map $\nabla f$ from $\mu$ to $Q_\# \mu$ cannot be the rotation defined by $Q$.
10.3 Displacement Convexity of Internal Energies

The goal of this section is proving the following theorem, which provides a large class of displacement convex internal energies.

Theorem 10.5 Let $U : [0, \infty) \to \mathbb{R}$ be continuous, with $U(0) = 0$ and
\[ \lambda \in (0, \infty) \mapsto \lambda^n\, U(\lambda^{-n}) \ \text{is a convex decreasing function.} \tag{10.16} \]
Let $K \subset \mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$ satisfy:

(i) if $\mu_0, \mu_1 \in K$, then $\mu_t \in K$ for every $t \in (0, 1)$, where $\mu_t = \rho_t\, d\mathcal{L}^n$ is the displacement interpolation from $\mu_0$ to $\mu_1$;

(ii) either “$\mathcal{U}(\rho)$ is well-defined in $\mathbb{R} \cup \{+\infty\}$ for every $\mu = \rho\, d\mathcal{L}^n \in K$” or “$\mathcal{U}(\rho)$ is well-defined in $\mathbb{R} \cup \{-\infty\}$ for every $\mu = \rho\, d\mathcal{L}^n \in K$.”

Then for every $\mu_0, \mu_1 \in K$,
\[ t \mapsto \mathcal{U}(\rho_t) \ \text{is a convex function on } [0, 1], \tag{10.17} \]
where $\mu_t = \rho_t\, d\mathcal{L}^n$ is the displacement interpolation from $\mu_0$ to $\mu_1$.
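Remark 10.6 When $U$ is as in (10.3) and (10.4), then assumptions (i) and (ii) are satisfied by $K = \mathcal{P}_{\mathrm{ac}}(\mathbb{R}^n)$. Indeed, (i) follows by Proposition 10.2, while (ii) is trivially true since $U \ge 0$ on $[0, \infty)$ if (10.3) holds, while $U \le 0$ on $[0, \infty)$ if (10.4) holds. We claim that, when $U(\rho) = \rho \log \rho$, then $K = \mathcal{P}_{p,\mathrm{ac}}(\mathbb{R}^n)$ (with $p \ge 1$) satisfies (i) and (ii). The validity of (ii) follows by (10.5), while the fact that $\rho_t\, d\mathcal{L}^n \in \mathcal{P}_{p,\mathrm{ac}}(\mathbb{R}^n)$ if $\mu_0, \mu_1 \in \mathcal{P}_{p,\mathrm{ac}}(\mathbb{R}^n)$ is proved by combining Proposition 10.2 with the following argument: if $\nabla f$ is the Brenier map from $\mu_0$ to $\mu_1$ and $f_t(x) = (1 - t)\, |x|^2/2 + t\, f(x)$, then $\nabla f_t$ is the Brenier map from $\mu_0$ to $\mu_t$, so that
\[ \begin{aligned} \int_{\mathbb{R}^n} |x|^p \rho_t(x)\, dx &= \int_{\mathbb{R}^n} |\nabla f_t(x)|^p \rho_0(x)\, dx \\ &\le C(p) \Big[ \int_{\mathbb{R}^n} |x|^p \rho_0(x)\, dx + \int_{\mathbb{R}^n} |\nabla f(x)|^p \rho_0(x)\, dx \Big] \\ &= C(p) \Big[ \int_{\mathbb{R}^n} |x|^p \rho_0(x)\, dx + \int_{\mathbb{R}^n} |x|^p \rho_1(x)\, dx \Big] < \infty, \end{aligned} \]
thus proving that $\mathcal{P}_{p,\mathrm{ac}}(\mathbb{R}^n)$ is closed under displacement interpolation.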
Proof of Theorem 10.5 Step one: We first show that if $A \in \mathbb{R}^{n\times n}_{\rm sym}$, $A \ge 0$, then
$$t \in [0,1] \mapsto \det\big((1-t)\,{\rm Id} + t\,A\big)^{1/n}$$
defines a concave function, which is strictly concave unless $A = \lambda\,{\rm Id}$ for some $\lambda \ge 0$. Indeed, $A = {\rm diag}(\lambda_i)$ with $\lambda_i \ge 0$ in a suitable orthonormal basis of $\mathbb{R}^n$. Therefore, setting $s = t/(1-t)$, we find that
$$\varphi(t) = \det\big((1-t)\,{\rm Id} + t\,A\big)^{1/n} = \prod_{i=1}^n \big((1-t) + t\,\lambda_i\big)^{1/n} = (1-t) \prod_{i=1}^n (1 + s\,\lambda_i)^{1/n} \ge (1-t) \Big\{ 1 + \prod_{i=1}^n (s\,\lambda_i)^{1/n} \Big\} = (1-t)\,\varphi(0) + t\,\varphi(1),$$
where in the inequality we have used
$$\prod_{i=1}^n (1 + a_i)^{1/n} \ge 1 + \prod_{i=1}^n a_i^{1/n} \qquad \forall a_i \ge 0. \tag{10.18}$$
Inequality (10.18) is obtained by applying the arithmetic–geometric mean inequality (9.1) to the sets of nonnegative numbers $\{1/(1+a_i)\}_{i=1}^n$ and $\{a_i/(1+a_i)\}_{i=1}^n$: indeed in this way one finds
$$\prod_{i=1}^n \frac{1}{(1+a_i)^{1/n}} + \prod_{i=1}^n \frac{a_i^{1/n}}{(1+a_i)^{1/n}} \le \frac1n \sum_{i=1}^n \frac{1}{1+a_i} + \frac1n \sum_{i=1}^n \frac{a_i}{1+a_i} = 1.$$
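Step one can be sanity-checked numerically. The following minimal sketch (our own illustration, using only the Python standard library; the function names are hypothetical) samples random diagonal matrices $A = {\rm diag}(\lambda_i)$ with $\lambda_i \ge 0$ and checks both (10.18) and the chord inequality $\varphi(t) \ge (1-t)\varphi(0) + t\varphi(1)$ up to rounding.

```python
import math
import random

def geo_mean_det(t, eigenvalues):
    """det((1-t) Id + t A)^(1/n) for A = diag(eigenvalues), eigenvalues >= 0."""
    n = len(eigenvalues)
    return math.prod((1.0 - t) + t * lam for lam in eigenvalues) ** (1.0 / n)

def check_10_18(a):
    """Return (lhs, rhs) of prod(1 + a_i)^(1/n) >= 1 + prod(a_i)^(1/n)."""
    n = len(a)
    lhs = math.prod(1.0 + ai for ai in a) ** (1.0 / n)
    rhs = 1.0 + math.prod(a) ** (1.0 / n)
    return lhs, rhs

random.seed(0)
worst_gap = float("inf")
for _ in range(1000):
    n = random.randint(1, 6)
    lam = [random.uniform(0.0, 5.0) for _ in range(n)]
    t = random.random()
    # chord between the end-point values phi(0) = 1 and phi(1) = det(A)^(1/n)
    chord = (1.0 - t) * geo_mean_det(0.0, lam) + t * geo_mean_det(1.0, lam)
    worst_gap = min(worst_gap, geo_mean_det(t, lam) - chord)
    lhs, rhs = check_10_18(lam)
    assert lhs >= rhs - 1e-12

print("smallest observed value of phi(t) - chord:", worst_gap)
```

A random test of this kind cannot replace the proof; it is only meant to make the statement concrete. When all the $\lambda_i$ coincide the two sides of (10.18) agree, which matches the equality case discussed next.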
Notice that equality holds in (10.18) if and only if there exists $a \ge 0$ such that $a_i = a$ for every $i$. In particular, if for some $t \in (0,1)$ we have $\varphi(t) = (1-t)\,\varphi(0) + t\,\varphi(1)$, then there exists $a \ge 0$ such that $a = t\,\lambda_i/(1-t)$ for every $i$, and thus $A = \lambda\,{\rm Id}$ for a constant $\lambda \ge 0$.

Step two: Now let $\mu_0 = \rho_0\, d\mathcal{L}^n$, $\mu_1 = \rho_1\, d\mathcal{L}^n$, let $\nabla f$ be the Brenier map from $\mu_0$ to $\mu_1$, and consider the convex function $f_t : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ defined by
$$f_t(x) = (1-t)\,\frac{|x|^2}{2} + t\,f(x), \qquad x \in \mathbb{R}^n.$$
If $F_1^t$, $F_2^t$, $F_{2,{\rm inv}}^t$ denote the sets of differentiability, twice differentiability, and twice differentiability with positive Hessian of the convex function $f_t$, then for every $t \in (0,1)$ we have
$$F_1^t = F_1, \qquad F_{2,{\rm inv}}^t = F_2^t = F_2, \tag{10.19}$$
where as usual $F_1$ and $F_2$ are the corresponding sets for $f$. Since $|x|^2/2$ is smooth we trivially have $F_1^t = F_1$ and $F_2^t = F_2$. To complete the proof of (10.19) we notice that if $x \in F_2$ and $A = \nabla^2 f(x)$, then, in a suitable orthonormal basis, $A = {\rm diag}(\lambda_i)$ with $\lambda_i \ge 0$. Correspondingly, $(1-t)\,{\rm Id} + t\,A = {\rm diag}(1 - t + t\,\lambda_i)$ with $1 - t + t\,\lambda_i \ge 1 - t > 0$, so that $(1-t)\,{\rm Id} + t\,A$ is invertible for every $t \in [0,1)$, thus showing $F_{2,{\rm inv}}^t = F_2^t$.

We prove (10.17). We notice that $\nabla f_t$ is the Brenier map from $\mu_0$ to $\mu_t = (\nabla f_t)_\# \mu_0$. We can thus apply Theorem 8.1 to $\mu_0$ and $\mu_t$ and find that $\mu_0$ is concentrated on a Borel set $Y_t \subset F_{2,{\rm inv}}^t = F_2$ such that
$$\rho_t(\nabla f_t) = \frac{\rho_0}{\det \nabla^2 f_t} \quad \text{on } Y_t, \tag{10.20}$$
$$\int_{(\nabla f_t)(Y_t)} \varphi = \int_{Y_t} \varphi(\nabla f_t)\, \det \nabla^2 f_t, \tag{10.21}$$
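for every Borel function $\varphi : \mathbb{R}^n \to [0,\infty]$. Thanks to assumptions (i) and (ii) we have that either $\varphi = U(\rho_t)^+$ or $\varphi = U(\rho_t)^-$ has finite integral, therefore we can first write (10.21) with $\varphi = U(\rho_t)$, and then apply (10.20), to find that
$$\int_{(\nabla f_t)(Y_t)} U(\rho_t) = \int_{Y_t} U\Big(\frac{\rho_0}{\det \nabla^2 f_t}\Big)\, \det \nabla^2 f_t \tag{10.22}$$
holds as an identity between extended real numbers. By $(\nabla f_t)_\# \mu_0 = \mu_t$ we have $\mu_t(\mathbb{R}^n \setminus (\nabla f_t)(Y_t)) \le \mu_0(\mathbb{R}^n \setminus Y_t) = 0$, i.e., $\rho_t = 0$ $\mathcal{L}^n$-a.e. on $\mathbb{R}^n \setminus (\nabla f_t)(Y_t)$, and since $U(0) = 0$ we conclude that
$$\int_{\mathbb{R}^n} U(\rho_t) = \int_{(\nabla f_t)(Y_t)} U(\rho_t). \tag{10.23}$$
Now, $Y_t \subset F_2$ and $\rho_0 = 0$ $\mathcal{L}^n$-a.e. on $F_2 \setminus Y_t$, so that exploiting again $U(0) = 0$ we find that
$$\int_{Y_t} U\Big(\frac{\rho_0}{\det \nabla^2 f_t}\Big)\, \det \nabla^2 f_t = \int_{F_2} U\Big(\frac{\rho_0}{\det \nabla^2 f_t}\Big)\, \det \nabla^2 f_t, \tag{10.24}$$
where the integrand is well-defined also on $F_2 \setminus Y_t$ since $F_{2,{\rm inv}}^t = F_2$. By combining (10.22), (10.23) and (10.24) we conclude that
$$\mathcal{U}(\rho_t) = \int_{F_2} U\Big(\frac{\rho_0}{\det \nabla^2 f_t}\Big)\, \det \nabla^2 f_t,$$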
as extended real numbers. For every $x \in F_2$ we now set
$$\lambda_x(t) = \big(\det \nabla^2 f_t(x)\big)^{1/n}, \quad t \in [0,1], \qquad \Phi_x(\lambda) = \lambda^n\, U\Big(\frac{\rho_0(x)}{\lambda^n}\Big), \quad \lambda > 0,$$
and $\psi_x(t) = \Phi_x(\lambda_x(t))$. By step one, $\lambda_x$ is concave on $[0,1]$, and thus $\lambda_x(t) \ge (1-t)\,\lambda_x(0) + t\,\lambda_x(1)$; therefore by exploiting first that $\Phi_x$ is decreasing and then that it is convex, we find that
\begin{align}
\psi_x(t) = \Phi_x(\lambda_x(t)) &\le \Phi_x\big((1-t)\,\lambda_x(0) + t\,\lambda_x(1)\big) \tag{10.25}\\
&\le (1-t)\,\Phi_x(\lambda_x(0)) + t\,\Phi_x(\lambda_x(1)) \tag{10.26}\\
&= (1-t)\,\psi_x(0) + t\,\psi_x(1), \notag
\end{align}
so that $\mathcal{U}(\rho_t) = \int_{F_2} \psi_x(t)\,dx$ is convex on $[0,1]$, as claimed.
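As a worked check of hypothesis (10.16), consider the two energy densities that appear explicitly in this chapter, $U(\rho) = \rho \log \rho$ (Remark 10.6) and $U(\rho) = -\rho^{1-(1/n)}$ (used in Section 10.4). The computation below is a short sketch added for illustration and uses only the definition in (10.16).

```latex
\begin{align*}
U(\rho) = \rho \log \rho : \quad
  & \lambda^n\, U(\lambda^{-n}) = \lambda^n\, \lambda^{-n}\, \log(\lambda^{-n}) = -n \log \lambda , \\
U(\rho) = -\rho^{1-(1/n)} : \quad
  & \lambda^n\, U(\lambda^{-n}) = -\lambda^n\, \lambda^{-n+1} = -\lambda .
\end{align*}
```

Both functions of $\lambda$ are convex and decreasing on $(0,\infty)$, so (10.16) holds in both cases. The first one is strictly convex, while the second one is linear and therefore only convex.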
Remark 10.7 (Strict convexity and internal energy) It is interesting to see what happens if assumption (10.16) in Theorem 10.5 is strengthened to
$$\lambda \in (0,\infty) \mapsto \lambda^n\, U(\lambda^{-n}) \ \text{ is a strictly convex decreasing function.} \tag{10.27}$$
Under this assumption we have that if $\rho_0$ and $\rho_1$ are such that $\mathcal{U}(\rho_t) = (1-t)\,\mathcal{U}(\rho_0) + t\,\mathcal{U}(\rho_1)$ for some $t \in (0,1)$, then
$$\nabla^2 f(x) = {\rm Id} \quad \text{for } \mathcal{L}^n\text{-a.e. } x \in F_2. \tag{10.28}$$
Indeed, let $x \in F_2$ be such that $(1-t)\,\psi_x(0) + t\,\psi_x(1) = \psi_x(t)$. Since $\Phi_x$ is strictly convex and decreasing, and thus also strictly decreasing, equality in (10.25) implies that $(1-t)\,\lambda_x(0) + t\,\lambda_x(1) = \lambda_x(t)$, and thus by step one that $\nabla^2 f(x) = \sigma(x)\,{\rm Id}$ for some $\sigma(x) \ge 0$. Equality in (10.26) implies that $\lambda_x(0) = \lambda_x(1)$, where
$$\lambda_x(0) = \det({\rm Id})^{1/n} = 1, \qquad \lambda_x(1) = \det(\nabla^2 f(x))^{1/n} = \sigma(x),$$
so that $\sigma(x) = 1$; this proves (10.28). Now $F_2$ is $\mathcal{L}^n$-equivalent to $\{\rho_0 > 0\}$. If $\{\rho_0 > 0\}$ is an open connected set, then (10.28) implies the existence of $x_0 \in \mathbb{R}^n$ such that $\nabla f(x) = x + x_0$ for every $x \in \{\rho_0 > 0\}$, and thus that $\rho_1 = \tau_{x_0} \rho_0$. If we just assume that $\{\rho_0 > 0\}$ is open, but possibly disconnected, then different connected components of $\{\rho_0 > 0\}$ may be compatible with different translations, and an additional argument (tailored to the specific problem under consideration) may be needed to conclude that $\rho_1 = \tau_{x_0} \rho_0$.
10.4 The Brunn–Minkowski Inequality

We close this chapter with an application of Theorem 10.5 to geometric inequalities, and give, in particular, a proof of the Brunn–Minkowski inequality. Let us recall that the Minkowski sum of $E_0$ and $E_1$ subsets of $\mathbb{R}^n$ is defined as
$$E_0 + E_1 = \big\{ x + y : x \in E_0,\ y \in E_1 \big\}.$$
The Brunn–Minkowski inequality provides a lower bound on the volume of $E_0 + E_1$ in terms of the volumes of $E_0$ and $E_1$:
$$|E_0 + E_1|^{1/n} \ge |E_0|^{1/n} + |E_1|^{1/n}, \qquad \forall E_0, E_1 \in \mathcal{B}(\mathbb{R}^n). \tag{10.29}$$
This inequality, which at first sight may look a bit abstract, plays a central role in many mathematical problems. As an indication of its pervasiveness, we analyze its meaning in the special case when $E_1 = B_r(0)$. In this case, setting for brevity $E_0 = E$, we have that $E_0 + E_1 = I_r(E)$ is the open $r$-neighborhood of $E$, so that (10.29) gives a very general lower bound on the volume of $I_r(E)$:
$$|I_r(E)| \ge \big( |E|^{1/n} + r\,|B_1|^{1/n} \big)^n \qquad \forall r > 0. \tag{10.30}$$
Notice that equality holds in (10.30) if $E = B_R(x)$ for some $x \in \mathbb{R}^n$ and $R > 0$, a fact that suggests a link between (10.30) and Euclidean isoperimetry. Indeed, if $E$ is a bounded open set with smooth boundary, then for $r$ sufficiently small (in terms of the curvatures of $\partial E$) we have $|I_r(E)| = |E| + r\,P(E) + O(r^2)$, so that (10.30) implies
$$P(E) = \lim_{r \to 0^+} \frac{|I_r(E)| - |E|}{r} \ge \lim_{r \to 0^+} \frac{\big( |E|^{1/n} + r\,|B_1|^{1/n} \big)^n - |E|}{r} = n\,|B_1|^{1/n}\,|E|^{(n-1)/n},$$
which is the sharp Euclidean isoperimetric inequality (9.7).

Proof of the Brunn–Minkowski inequality (10.29) Without loss of generality we can assume that $|E_0 + E_1| < +\infty$ and that $|E_0|, |E_1| > 0$. Since both $E_0$ and $E_1$ are non-empty, $E_0 + E_1$ contains both a translation of $E_0$ and of $E_1$, and thus we also have $|E_0|, |E_1| < \infty$. Therefore, the densities
$$\rho_0 = \frac{1_{E_0}}{|E_0|}, \qquad \rho_1 = \frac{1_{E_1}}{|E_1|},$$
define probability measures $\mu_0, \mu_1 \in \mathcal{P}_{ac}(\mathbb{R}^n)$, and the internal energy
$$\mathcal{U}(\rho) = -\int_{\mathbb{R}^n} \rho(x)^{1-(1/n)}\,dx,$$
corresponding to $U(\rho) = -\rho^{1-(1/n)}$, $\rho > 0$, is such that
$$\mathcal{U}(\rho_0) = -|E_0|^{1/n}, \qquad \mathcal{U}(\rho_1) = -|E_1|^{1/n}.$$
Now let $\nabla f$ be the Brenier map from $\mu_0$ to $\mu_1$, so that the displacement interpolation $\mu_t$ from $\mu_0$ to $\mu_1$ satisfies $\mu_t = \rho_t\, d\mathcal{L}^n = (\nabla f_t)_\# \mu_0$ where $\nabla f_t = (1-t)\,{\rm id} + t\,\nabla f$. By Theorem 8.1, and denoting by $F$ the set of differentiability points of $f$, we have
$$\exists\, X_1 \subset E_0 \cap F \ \text{ s.t. } \mu_0 \text{ is concentrated on } X_1 \text{ and } \nabla f(X_1) \subset E_1.$$
In particular, $\mu_t = (\nabla f_t)_\# \mu_0$ and the fact that $F$ is also the set of differentiability points of $f_t$ give
$$\mu_t \ \text{ is concentrated on } \nabla f_t(X_1).$$
By combining these two facts we find that for $\mathcal{L}^n$-a.e. $y \in \{\rho_t > 0\}$ there exists $x \in X_1$ such that $y = \nabla f_t(x) = (1-t)\,x + t\,\nabla f(x) \in (1-t)\,E_0 + t\,E_1$, which of course implies
$$|\{\rho_t > 0\}| \le \big| (1-t)\,E_0 + t\,E_1 \big|. \tag{10.31}$$
We now take $t = 1/2$. By $|E_0 + E_1| < +\infty$ we have $|\{\rho_{1/2} > 0\}| < \infty$, while by displacement convexity of $\mathcal{U}$, recall Theorem 10.5, we find
\begin{align}
-\frac{|E_0|^{1/n} + |E_1|^{1/n}}{2} = \frac{\mathcal{U}(\rho_0) + \mathcal{U}(\rho_1)}{2}
&\ge \mathcal{U}(\rho_{1/2}) = \int_{\{\rho_{1/2} > 0\}} U(\rho_{1/2}(x))\,dx \notag\\
&\ge |\{\rho_{1/2} > 0\}|\; U\Big( \frac{1}{|\{\rho_{1/2} > 0\}|} \int_{\{\rho_{1/2} > 0\}} \rho_{1/2}(x)\,dx \Big) \notag\\
&= -|\{\rho_{1/2} > 0\}|^{1/n}, \tag{10.32}
\end{align}
where in the last inequality we have used Jensen’s inequality, the convexity of $U(\rho)$ and the finiteness of $|\{\rho_{1/2} > 0\}|$. By combining (10.31) and (10.32) we obtain the Brunn–Minkowski inequality.
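Inequality (10.29) can be checked by hand on axis-parallel boxes, where the Minkowski sum is again a box whose side lengths are the sums of the side lengths. The sketch below is our own illustration (the helper names are hypothetical) and evaluates the gap between the two sides of (10.29) in this special case; it does not reproduce the transport proof above.

```python
import math

def box_volume(sides):
    """Volume of the box [0, sides[0]] x ... x [0, sides[n-1]]."""
    return math.prod(sides)

def brunn_minkowski_gap(sides0, sides1):
    """|E0 + E1|^(1/n) - |E0|^(1/n) - |E1|^(1/n) for two axis-parallel boxes."""
    n = len(sides0)
    sum_sides = [a + b for a, b in zip(sides0, sides1)]  # E0 + E1 is again a box
    return (box_volume(sum_sides) ** (1 / n)
            - box_volume(sides0) ** (1 / n)
            - box_volume(sides1) ** (1 / n))

# Two boxes of different shape: the inequality is strict.
print(brunn_minkowski_gap([1.0, 4.0], [2.0, 1.0]))
# Two homothetic boxes: the gap vanishes up to rounding.
print(brunn_minkowski_gap([1.0, 4.0], [0.5, 2.0]))
```

For boxes, (10.29) reduces to the elementary inequality (10.18) after rescaling, which is consistent with the role that (10.18) plays in step one of the proof of Theorem 10.5.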
11 The Wasserstein Distance W2 on P2 (Rn )
In this chapter we introduce the ($L^2$-) Wasserstein space $(\mathcal{P}_2(\mathbb{R}^n), W_2)$, which is the set of probability measures with finite second moments, $\mathcal{P}_2(\mathbb{R}^n)$, endowed with the ($L^2$-) Wasserstein distance $W_2(\mu,\nu) = \mathbf{K}_2(\mu,\nu)^{1/2}$. We prove the completeness of $(\mathcal{P}_2(\mathbb{R}^n), W_2)$, that the $W_2$-convergence is equivalent to narrow convergence of Radon measures plus convergence of the second moments, and that displacement interpolation can be interpreted as “geodesic interpolation” in $(\mathcal{P}_2(\mathbb{R}^n), W_2)$. The Wasserstein space is a central object in OMT, in all its applications to PDE, and in those to Riemannian and metric geometry.
11.1 Displacement Interpolation and Geodesics in $(\mathcal{P}_2(\mathbb{R}^n), W_2)$

We begin by showing that every transport plan $\gamma \in \Gamma(\mu_0, \mu_1)$ induces a Lipschitz curve in $(\mathcal{P}_2(\mathbb{R}^n), W_2)$ with end-points at $\mu_0$ and $\mu_1$, and that optimal transport plans saturate the corresponding Lipschitz bounds. As usual we set $p(x,y) = x$ and $q(x,y) = y$ for the standard projection operators on $\mathbb{R}^n \times \mathbb{R}^n$.

Proposition 11.1 If $\mu_0, \mu_1 \in \mathcal{P}_2(\mathbb{R}^n)$ and $\gamma \in \Gamma(\mu_0, \mu_1)$, then
$$\mu_t = \big( (1-t)\,p + t\,q \big)_\# \gamma, \qquad t \in [0,1], \tag{11.1}$$
defines a Lipschitz curve in $(\mathcal{P}_2(\mathbb{R}^n), W_2)$, in the sense that
$$W_2(\mu_t, \mu_s) \le L\,|t - s|, \qquad \forall t, s \in [0,1], \tag{11.2}$$
where $L$ is the quadratic transport cost of $\gamma$, i.e.,
$$L = \Big( \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma(x,y) \Big)^{1/2}.$$
If in addition $\gamma$ is an optimal transport plan in $\mathbf{K}_2(\mu_0, \mu_1)$, then $\mu_t$ is called the displacement interpolation from $\mu_0$ to $\mu_1$ induced by $\gamma$, and satisfies
$$W_2(\mu_t, \mu_s) = W_2(\mu_0, \mu_1)\,|t - s| \qquad \forall t, s \in [0,1]. \tag{11.3}$$
Remark 11.2 The terminology just introduced is consistent with the one introduced in (10.12). Indeed, if $\mu_0, \mu_1 \in \mathcal{P}_{2,ac}(\mathbb{R}^n)$, then by Theorem 4.2 there exists a unique optimal transport plan $\gamma$ in $\mathbf{K}_2(\mu_0, \mu_1)$, induced by the Brenier map $\nabla f$ from $\mu_0$ to $\mu_1$ through the formula $\gamma = ({\rm id} \times \nabla f)_\# \mu_0$, and with this choice of $\gamma$ it is easily seen that $\big( (1-t)\,p + t\,q \big)_\# \gamma = \big( (1-t)\,{\rm id} + t\,\nabla f \big)_\# \mu_0$. The main difference between the case when $\mu_0, \mu_1 \in \mathcal{P}_{2,ac}(\mathbb{R}^n)$ and the general case is simply that, in the general case, there could be more than one displacement interpolation between $\mu_0$ and $\mu_1$. This fact should be understood in analogy with the possible occurrence of multiple geodesics with same end-points in a Riemannian manifold (see also Remark 11.3).

Proof of Proposition 11.1 Let $\gamma \in \Gamma(\mu_0, \mu_1)$. If $0 \le t < s \le 1$ and $p_t = (1-t)\,p + t\,q : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$, then the map $(p_t, p_s) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n \times \mathbb{R}^n$ can be used to define a plan $\gamma_{t,s} = (p_t, p_s)_\# \gamma \in \Gamma(\mu_t, \mu_s)$. Therefore,
$$\mathbf{K}_2(\mu_t, \mu_s) \le \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma_{t,s} = \int_{\mathbb{R}^n \times \mathbb{R}^n} |p_t(x,y) - p_s(x,y)|^2\, d\gamma = |t - s|^2 \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma = |t - s|^2\, L^2,$$
that is (11.2). If in addition $\gamma$ is optimal in $\mathbf{K}_2(\mu_0, \mu_1)$, then we have proved that
$$W_2(\mu_t, \mu_s) \le W_2(\mu_0, \mu_1)\,|t - s|, \qquad \forall t, s \in [0,1].$$
By applying this inequality on the intervals $[0,t]$, $[t,s]$ and $[s,1]$, and by exploiting the triangular inequality in $(\mathcal{P}_2(\mathbb{R}^n), W_2)$ (which is proved in Theorem 11.10 without making use of this proposition), we thus find that
$$W_2(\mu_0, \mu_1) \le W_2(\mu_0, \mu_t) + W_2(\mu_t, \mu_s) + W_2(\mu_s, \mu_1) \le \big( t + (s - t) + (1 - s) \big)\, W_2(\mu_0, \mu_1) = W_2(\mu_0, \mu_1),$$
thus showing the validity of (11.3).
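Identity (11.3) is easy to observe on the real line. The sketch below is our own illustration and relies on a standard fact that is not proved in this section: for two uniform measures on the same number of atoms in $\mathbb{R}$, the monotone (sorted) matching is optimal for the quadratic cost. The function names and the sample points are hypothetical.

```python
import math

def w2_equal_weights(xs, ys):
    """W2 between the uniform measures on the atoms xs and ys (same count, in R).

    Uses the sorted matching, which is optimal for the quadratic cost in 1D.
    """
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))

def displacement_interpolation(xs, ys, t):
    """Atoms of mu_t = ((1-t) p + t q)_# gamma for the sorted matching gamma."""
    return [(1 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]

xs = [0.0, 1.0, 3.0, 7.0]
ys = [2.0, 2.5, 10.0, 4.0]
d01 = w2_equal_weights(xs, ys)
for t, s in [(0.0, 1.0), (0.25, 0.75), (0.1, 0.6)]:
    mu_t = displacement_interpolation(xs, ys, t)
    mu_s = displacement_interpolation(xs, ys, s)
    # the last two printed numbers should agree up to rounding, as in (11.3)
    print(t, s, w2_equal_weights(mu_t, mu_s), abs(t - s) * d01)
```

The atoms of $\mu_t$ remain ordered for every $t$, which is why the same sorted matching is reused to compute $W_2(\mu_t, \mu_s)$.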
Remark 11.3 (Displacement interpolation as geodesic interpolation) The notion of length of a smooth curve $\gamma : [0,1] \to M$ on a Riemannian manifold,
$$\ell(\gamma) = \int_0^1 \sqrt{ g_{\gamma(t)}(\gamma'(t), \gamma'(t)) }\; dt = \int_0^1 |\gamma'(t)|_{g_{\gamma(t)}}\, dt,$$
can be used, when $M$ is path-connected, to define a metric $d_g$ on $M$,
$$d_g(p, q) = \inf \big\{ \ell(\gamma) : \gamma(0) = p,\ \gamma(1) = q \big\}. \tag{11.4}$$
Minimizers in $d_g(p,q)$ always exist, are possibly non-unique, and are called minimizing geodesics. More generally, if $\nabla$ denotes the Levi-Civita connection of $(M, g)$, then geodesics are defined as solutions of $\nabla_{\gamma'} \gamma' = 0$, i.e., as critical points of the length functional (among curves with fixed ends). The notion of minimizing geodesic is usually more restrictive than that of geodesic (as illustrated by a pair of complementary arcs with non-antipodal end-points inside an equatorial circle: both arcs are geodesics, but only one is a minimizing geodesic). A geodesic, as every other curve in $M$, can be reparametrized so as to make its speed $|\gamma'|_{g_\gamma}$ constant, and constant speed geodesics satisfy
$$d_g(\gamma(t), \gamma(s)) = |t - s|\; d_g(\gamma(0), \gamma(1)) \qquad \forall t, s \in [0,1]. \tag{11.5}$$
Vice versa, if $\gamma$ is a geodesic in $M$ and (11.5) holds, then $\gamma$ has constant speed. Generalizing on these considerations, it is customary to say that a constant speed geodesic in a metric space $(X, d)$ is any curve $\gamma : [0,1] \to X$ such that $d(\gamma(t), \gamma(s)) = |t - s|\, d(\gamma(0), \gamma(1))$ for every $t, s \in [0,1]$. With this convention, in view of Proposition 11.1, displacement interpolation is the (constant speed) geodesic interpolation on $(\mathcal{P}_2(\mathbb{R}^n), W_2)$.
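11.2 Some Basic Remarks about $W_2$

Remark 11.4 ($W_2$ “extends” the Euclidean distance from $\mathbb{R}^n$ to $\mathcal{P}_2(\mathbb{R}^n)$) Setting $d(x,y) = |x - y|$ for $x, y \in \mathbb{R}^n$, we see that $(\mathbb{R}^n, d)$ embeds isometrically into $(\mathcal{P}_2(\mathbb{R}^n), W_2)$ through the map $x \in \mathbb{R}^n \mapsto \delta_x$. Indeed, if $x_0, x_1 \in \mathbb{R}^n$, then
$$W_2(\delta_{x_0}, \delta_{x_1}) = |x_0 - x_1|. \tag{11.6}$$

Remark 11.5 ($W_2$ is lower semicontinuous under weak-star convergence) Indeed, let $\mu_j, \mu, \nu_j, \nu \in \mathcal{P}_2(\mathbb{R}^n)$ be such that $\mu_j \overset{*}{\rightharpoonup} \mu$ and $\nu_j \overset{*}{\rightharpoonup} \nu$ as $j \to \infty$ (and thus narrowly, since $\mu(\mathbb{R}^n) = \nu(\mathbb{R}^n) = 1$): we claim that
$$W_2(\mu, \nu) \le \liminf_{j \to \infty} W_2(\mu_j, \nu_j). \tag{11.7}$$
Up to extracting a subsequence we can assume the liminf is a limit. Moreover, if $\gamma_j$ is optimal in $\mathbf{K}_2(\mu_j, \nu_j)$, then up to extracting a further subsequence $\gamma_j \overset{*}{\rightharpoonup} \gamma$ as $j \to \infty$ for some Radon measure $\gamma$ on $\mathbb{R}^n \times \mathbb{R}^n$. As usual, given the narrow convergence of $\mu_j$ and $\nu_j$, we have that $\gamma \in \Gamma(\mu, \nu)$ and that the convergence of $\gamma_j$ to $\gamma$ is narrow. Hence, given $\varphi \in C_c^0(\mathbb{R}^n \times \mathbb{R}^n)$ with $0 \le \varphi \le 1$ we find
$$\mathbf{K}_2(\mu, \nu) \le \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma = \sup_\varphi \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, \varphi\, d\gamma = \sup_\varphi \lim_{j \to \infty} \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, \varphi\, d\gamma_j \le \lim_{j \to \infty} \mathbf{K}_2(\mu_j, \nu_j).$$
Notice that the inequality in (11.7) can be strict: for example, with $\mathbb{R}^n = \mathbb{R}$, if $\mu = \nu = \delta_0$ and $\mu_j = (1 - (1/j))\,\delta_0 + (1/j)\,\delta_j$, then $\mu_j \overset{*}{\rightharpoonup} \mu$ as $j \to \infty$, but
$$W_2(\mu_j, \mu)^2 = \int_{\mathbb{R}} |x|^2\, d\mu_j(x) = j \to \infty.$$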
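The example closing Remark 11.5 can be tabulated exactly. In the sketch below (our own illustration, with hypothetical helper names) the distance to $\delta_0$ is computed as a second moment, because the only transport plan between a measure and a Dirac mass is their product; exact rational arithmetic avoids any rounding.

```python
from fractions import Fraction

def w2_squared_to_dirac_at_zero(atoms):
    """W2(mu, delta_0)^2 for mu given as (position, weight) pairs on the real line.

    The only plan between mu and a Dirac mass is the product measure, so the
    transport cost equals the second moment of mu.
    """
    return sum(weight * position ** 2 for position, weight in atoms)

def mu(j):
    """mu_j = (1 - 1/j) delta_0 + (1/j) delta_j as (position, weight) pairs."""
    return [(0, 1 - Fraction(1, j)), (j, Fraction(1, j))]

for j in (1, 10, 100, 1000):
    escaping_mass = mu(j)[1][1]  # weight of the atom sitting at x = j
    print(j, escaping_mass, w2_squared_to_dirac_at_zero(mu(j)))
```

The mass that moves away from the origin tends to zero, which is compatible with weak-star convergence to $\delta_0$, while the squared distance grows linearly in $j$.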
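Remark 11.6 ($W_2$ is a contraction under $\varepsilon$-regularization) Precisely, if $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^n)$, $\mu_\varepsilon = \mu \star \rho_\varepsilon$ and $\nu_\varepsilon = \nu \star \rho_\varepsilon$ for a standard regularizing kernel $\rho_\varepsilon$, then
$$W_2(\mu, \nu) \ge W_2\big( \mu_\varepsilon\, d\mathcal{L}^n, \nu_\varepsilon\, d\mathcal{L}^n \big), \tag{11.8}$$
$$W_2(\mu, \nu) = \lim_{\varepsilon \to 0^+} W_2\big( \mu_\varepsilon\, d\mathcal{L}^n, \nu_\varepsilon\, d\mathcal{L}^n \big). \tag{11.9}$$
Indeed, if $\gamma$ is an optimal plan for $W_2(\mu, \nu)$ and $\gamma^{(\varepsilon)}$ is defined by
$$\int_{\mathbb{R}^n \times \mathbb{R}^n} \varphi\, d\gamma^{(\varepsilon)} = \int_{\mathbb{R}^n \times \mathbb{R}^n} d\gamma(x,y) \int_{\mathbb{R}^n} \varphi(x + z, y + z)\, \rho_\varepsilon(z)\, dz, \tag{11.10}$$
for every $\varphi \in C_c^0(\mathbb{R}^n \times \mathbb{R}^n)$, then $\gamma^{(\varepsilon)} \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n)$ and, for every $\varphi \in C_c^0(\mathbb{R}^n)$,
$$\int_{\mathbb{R}^n \times \mathbb{R}^n} \varphi(x)\, d\gamma^{(\varepsilon)}(x,y) = \int_{\mathbb{R}^n \times \mathbb{R}^n} d\gamma(x,y) \int_{\mathbb{R}^n} \varphi(x + z)\, \rho_\varepsilon(z)\, dz = \int_{\mathbb{R}^n} d\mu(x) \int_{\mathbb{R}^n} \varphi(x + z)\, \rho_\varepsilon(z)\, dz = \int_{\mathbb{R}^n} d\mu(x) \int_{\mathbb{R}^n} \varphi(w)\, \rho_\varepsilon(w - x)\, dw = \int_{\mathbb{R}^n} \varphi\, (\mu \star \rho_\varepsilon).$$
Thus $\gamma^{(\varepsilon)} \in \Gamma(\mu_\varepsilon\, d\mathcal{L}^n, \nu_\varepsilon\, d\mathcal{L}^n)$. Since $\mu_\varepsilon\, d\mathcal{L}^n$ and $\nu_\varepsilon\, d\mathcal{L}^n$ belong to $\mathcal{P}_2(\mathbb{R}^n)$, (11.10) holds with $\varphi(x,y) = |x - y|^2$, and thus
$$W_2\big( \mu_\varepsilon\, d\mathcal{L}^n, \nu_\varepsilon\, d\mathcal{L}^n \big)^2 \le \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma^{(\varepsilon)}(x,y) = \int_{\mathbb{R}^n \times \mathbb{R}^n} d\gamma(x,y) \int_{\mathbb{R}^n} |x + z - (y + z)|^2\, \rho_\varepsilon(z)\, dz = \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma(x,y) = W_2(\mu, \nu)^2.$$
This proves (11.8), and then (11.9) follows by (11.7) and (11.8).

Example 11.7 (Wasserstein distance and translations) This example should be helpful in developing some geometric intuition about the Wasserstein distance. Given two measures $\mu_0, \mu_1 \in \mathcal{P}_2(\mathbb{R}^n)$, let us consider the Wasserstein distance $W_2(\mu_0, \tau_{x_0} \mu_1)$ between $\mu_0$ and the translation of $\mu_1$ by $x_0 \in \mathbb{R}^n$,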
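namely, $\tau_{x_0} \mu_1(E) = \mu_1(E - x_0)$, $E \in \mathcal{B}(\mathbb{R}^n)$. Given (11.6), we would expect this distance to grow linearly in $|x_0|$, at least in certain directions. We have indeed that
$$W_2(\mu_0, \tau_{x_0} \mu_1) \ge \sqrt{ W_2(\mu_0, \mu_1)^2 + |x_0|^2 } \qquad \text{if } x_0 \cdot \big( {\rm bar}\,\mu_1 - {\rm bar}\,\mu_0 \big) \ge 0, \tag{11.11}$$
where ${\rm bar}\,\mu = \int_{\mathbb{R}^n} x\, d\mu(x)$ denotes the barycenter of $\mu$. Indeed, let us first assume that μ0 , μ1 0. Since for every $\mu \in \mathcal{P}_2(\mathbb{R}^n)$ we have ${\rm bar}[(\mu \star \rho_\varepsilon)\, d\mathcal{L}^n] \to {\rm bar}\,\mu$ as $\varepsilon \to 0^+$, we find that $x_0 \cdot ({\rm bar}\,\mu_1^\varepsilon - {\rm bar}\,\mu_0^\varepsilon) > 0$ if $\mu_k^\varepsilon = (\mu_k \star \rho_\varepsilon)\, d\mathcal{L}^n$. Since με0 , με1 1,
$$\mu_0 = \frac12\, \delta_{(0,L)} + \frac12\, \delta_{(1,-L)}, \qquad \mu_1 = \frac12\, \delta_{(1,L)} + \frac12\, \delta_{(0,-L)}, \qquad \nu = \frac12\, \delta_{(0,0)} + \frac12\, \delta_{(1,0)}. \tag{11.12}$$
In this way
$$\mu_t = \frac12\, \delta_{(t,L)} + \frac12\, \delta_{(1-t,-L)},$$
see Figure 11.1, and the optimal plan $\gamma_t$ in $\mathbf{K}_2(\mu_t, \nu)$ is clearly given by
$$\gamma_t = \frac12\, \delta_{[(t,L),(0,0)]} + \frac12\, \delta_{[(1-t,-L),(1,0)]} \qquad \forall t \in (0, 1/2],$$
$$\gamma_t = \frac12\, \delta_{[(t,L),(1,0)]} + \frac12\, \delta_{[(1-t,-L),(0,0)]} \qquad \forall t \in [1/2, 1).$$
In particular,
$$W_2(\mu_t, \nu)^2 = t^2 + L^2 \quad \forall t \in (0, 1/2], \qquad W_2(\mu_t, \nu)^2 = (t - 1)^2 + L^2 \quad \forall t \in [1/2, 1),$$
and $t \mapsto W_2(\mu_t, \nu)$, having a strict maximum at $t = 1/2$, is not convex.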
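The failure of convexity along the displacement interpolation (11.12) can be reproduced numerically. The sketch below is our own illustration: for two measures with two equal-weight atoms each, the transport plans form the segment between the two possible matchings and the cost is linear on it, so the minimum is attained at one of the matchings. The value $L = 2$ is an illustrative choice, and the helper names are hypothetical.

```python
import math

def w2_squared_two_atoms(a, b):
    """W2^2 between (delta_a0 + delta_a1)/2 and (delta_b0 + delta_b1)/2 in the plane.

    The cheaper of the two matchings is an optimal plan.
    """
    def sq(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    straight = 0.5 * (sq(a[0], b[0]) + sq(a[1], b[1]))
    crossed = 0.5 * (sq(a[0], b[1]) + sq(a[1], b[0]))
    return min(straight, crossed)

L = 2.0  # illustrative value, not taken from the text
nu = [(0.0, 0.0), (1.0, 0.0)]

def mu(t):
    """Atoms of mu_t from (11.12)."""
    return [(t, L), (1.0 - t, -L)]

def g(t):
    """t -> W2(mu_t, nu)."""
    return math.sqrt(w2_squared_two_atoms(mu(t), nu))

left, mid, right = g(0.25), g(0.5), g(0.75)
print(left, mid, right)
print("midpoint convexity violated:", mid > 0.5 * (left + right))
```

The computed values agree with the formulas $t^2 + L^2$ and $(t-1)^2 + L^2$ for $W_2(\mu_t, \nu)^2$ on the two halves of the interval.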
Remark 11.9 (Convexity of $W_2^2$ with respect to standard interpolation) We finally notice that, if $\mu_0, \mu_1, \nu \in \mathcal{P}_2(\mathbb{R}^n)$, then
$$W_2\big( (1-t)\,\mu_0 + t\,\mu_1, \nu \big)^2 \le (1-t)\, W_2(\mu_0, \nu)^2 + t\, W_2(\mu_1, \nu)^2. \tag{11.13}$$
Indeed, if $\gamma_0 \in \Gamma(\mu_0, \nu)$ and $\gamma_1 \in \Gamma(\mu_1, \nu)$ are optimal transport plans for the quadratic cost, then $\gamma_t = (1-t)\,\gamma_0 + t\,\gamma_1 \in \Gamma((1-t)\,\mu_0 + t\,\mu_1, \nu)$ and thus
$$W_2\big( (1-t)\,\mu_0 + t\,\mu_1, \nu \big)^2 \le \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma_t = (1-t) \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma_0 + t \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma_1,$$
which is (11.13).
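11.3 The Wasserstein Space $(\mathcal{P}_2(\mathbb{R}^n), W_2)$

We finally come to the main result of this chapter.

Theorem 11.10 The Wasserstein space $(\mathcal{P}_2(\mathbb{R}^n), W_2)$ is a complete metric space. Moreover, given $\mu_j, \mu \in \mathcal{P}_2(\mathbb{R}^n)$, $j \in \mathbb{N}$, we have that
$$W_2(\mu_j, \mu) \to 0, \qquad \text{as } j \to \infty, \tag{11.14}$$
if and only if, as $j \to \infty$,
$$\mu_j \overset{*}{\rightharpoonup} \mu \quad \text{as Radon measures on } \mathbb{R}^n, \tag{11.15}$$
$$\int_{\mathbb{R}^n} |x|^2\, d\mu_j(x) \to \int_{\mathbb{R}^n} |x|^2\, d\mu(x). \tag{11.16}$$
In particular, if (11.14) holds, then $\mu_j$ narrowly converges to $\mu$ as $j \to \infty$.

Proof In step one we prove that $(\mathcal{P}_2(\mathbb{R}^n), W_2)$ is a metric space, while steps two and three discuss the equivalence between (11.14) and (11.15)–(11.16). Finally, in step four, we show the completeness of $(\mathcal{P}_2(\mathbb{R}^n), W_2)$.

Step one: We prove that $W_2$ defines a metric on $\mathcal{P}_2(\mathbb{R}^n)$. We have already noticed that $\mathbf{K}_2(\mu, \nu) = \mathbf{K}_2(\nu, \mu)$, hence $W_2$ is symmetric. If $W_2(\mu, \nu) = 0$ and $\gamma$ is an optimal plan from $\mu$ to $\nu$, then $0 = \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma(x,y)$ implies that $\gamma$ is concentrated on $\{(x,y) : x = y\}$: hence for every $E \in \mathcal{B}(\mathbb{R}^n)$ we have $\gamma(E \times (\mathbb{R}^n \setminus E)) = \gamma((\mathbb{R}^n \setminus E) \times E) = 0$, and thus
$$\mu(E) = \gamma(E \times \mathbb{R}^n) = \gamma(E \times E) = \gamma(\mathbb{R}^n \times E) = \nu(E),$$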
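that is, $\mu = \nu$. We are thus left to prove the triangular inequality
$$W_2(\mu_1, \mu_3) \le W_2(\mu_1, \mu_2) + W_2(\mu_2, \mu_3), \qquad \forall \mu_1, \mu_2, \mu_3 \in \mathcal{P}_2(\mathbb{R}^n). \tag{11.17}$$
Thanks to (11.9) it is sufficient to prove (11.17) with $\mu_k^\varepsilon = (\mu_k \star \rho_\varepsilon)\, d\mathcal{L}^n$ in place of $\mu_k$. To this end, let $\nabla f_{12}$ and $\nabla f_{23}$ be the Brenier maps from $\mu_1^\varepsilon$ to $\mu_2^\varepsilon$ and from $\mu_2^\varepsilon$ to $\mu_3^\varepsilon$ respectively. Denoting by $F_{12}$ and $F_{23}$ the differentiability sets of $f_{12}$ and $f_{23}$, we define $T : F_{12} \cap (\nabla f_{12})^{-1}(F_{23}) \to \mathbb{R}^n$ by setting
$$T(x) = \nabla f_{23}(\nabla f_{12}(x)), \qquad x \in F_{12} \cap (\nabla f_{12})^{-1}(F_{23}).$$
We immediately see that $T_\# \mu_1^\varepsilon = \mu_3^\varepsilon$ since $(\nabla f_{12})_\# \mu_1^\varepsilon = \mu_2^\varepsilon$ and $(\nabla f_{23})_\# \mu_2^\varepsilon = \mu_3^\varepsilon$: therefore,
$$W_2(\mu_1^\varepsilon, \mu_3^\varepsilon) \le \| T - {\rm id} \|_{L^2(\mu_1^\varepsilon)} \le \| T - \nabla f_{12} \|_{L^2(\mu_1^\varepsilon)} + W_2(\mu_1^\varepsilon, \mu_2^\varepsilon).$$
We conclude since $(\nabla f_{12})_\# \mu_1^\varepsilon = \mu_2^\varepsilon$ gives
$$\| T - \nabla f_{12} \|_{L^2(\mu_1^\varepsilon)} = \Big( \int_{\mathbb{R}^n} |(\nabla f_{23}) \circ (\nabla f_{12}) - \nabla f_{12}|^2\, d\mu_1^\varepsilon \Big)^{1/2} = \Big( \int_{\mathbb{R}^n} |\nabla f_{23} - {\rm id}|^2\, d\mu_2^\varepsilon \Big)^{1/2} = W_2(\mu_2^\varepsilon, \mu_3^\varepsilon).$$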
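Step two: We prove that if $W_2(\mu_j, \mu) \to 0$ as $j \to \infty$, then (11.15)–(11.16) holds. Of course it is enough to prove that (11.15) holds up to extracting subsequences. We start proving that there is a positive constant $M$ such that
$$\sup_j\, \mu_j(\mathbb{R}^n \setminus B_R) \le \frac{M}{R^2}, \qquad \forall R > 0. \tag{11.18}$$
Indeed if $\gamma_j \in \Gamma(\mu_j, \mu)$ is an optimal plan in $\mathbf{K}_2(\mu_j, \mu)$, then by first integrating the inequality
$$|x|^2 \le (1 + \varepsilon)\, |y|^2 + \Big( 1 + \frac{1}{\varepsilon} \Big)\, |x - y|^2 \qquad \forall x, y \in \mathbb{R}^n,$$
with respect to $\gamma_j$, and then by exploiting $p_\# \gamma_j = \mu_j$ and $q_\# \gamma_j = \mu$, we obtain
$$\int_{\mathbb{R}^n} |x|^2\, d\mu_j(x) \le (1 + \varepsilon) \int_{\mathbb{R}^n} |y|^2\, d\mu(y) + C(\varepsilon)\, \mathbf{K}_2(\mu_j, \mu).$$
In particular, since $W_2(\mu_j, \mu) \to 0$ as $j \to \infty$, we find
$$\limsup_{j \to \infty} \int_{\mathbb{R}^n} |x|^2\, d\mu_j(x) \le \int_{\mathbb{R}^n} |y|^2\, d\mu(y), \tag{11.19}$$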
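which implies (11.18). Now, up to extracting a subsequence, $\gamma_j \overset{*}{\rightharpoonup} \gamma$ for some Radon measure $\gamma$ on $\mathbb{R}^n \times \mathbb{R}^n$, and if $\varphi \in C_c^0(\mathbb{R}^n \times \mathbb{R}^n)$ with $0 \le \varphi \le 1$, then
$$\int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, \varphi(x,y)\, d\gamma = \lim_{j \to \infty} \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, \varphi(x,y)\, d\gamma_j \le \lim_{j \to \infty} \mathbf{K}_2(\mu_j, \mu) = 0,$$
so that
$$x = y \quad \gamma\text{-a.e. on } \mathbb{R}^n \times \mathbb{R}^n. \tag{11.20}$$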
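For completeness, here is a sketch of the two elementary facts used at the beginning of step two: the weighted inequality integrated against $\gamma_j$, and the passage from (11.19) to (11.18). Reading them as consequences of Young's inequality and of Chebyshev's inequality is our own gloss on the argument; with this reading one can take $C(\varepsilon) = 1 + 1/\varepsilon$.

```latex
\begin{align*}
|x|^2 &\le \big( |y| + |x-y| \big)^2 = |y|^2 + 2\,|y|\,|x-y| + |x-y|^2 \\
      &\le (1+\varepsilon)\,|y|^2 + \Big( 1 + \frac{1}{\varepsilon} \Big)\,|x-y|^2
      \qquad \text{since } 2ab \le \varepsilon\,a^2 + \frac{b^2}{\varepsilon}, \\[4pt]
\mu_j(\mathbb{R}^n \setminus B_R)
      &\le \frac{1}{R^2} \int_{\mathbb{R}^n \setminus B_R} |x|^2\, d\mu_j(x)
      \le \frac{1}{R^2}\, \sup_j \int_{\mathbb{R}^n} |x|^2\, d\mu_j .
\end{align*}
```

Thus (11.18) holds with $M = \sup_j \int_{\mathbb{R}^n} |x|^2\, d\mu_j$, which is finite by (11.19) and because each $\mu_j$ has finite second moment.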
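Moreover (11.18) and $\gamma_j \in \Gamma(\mu_j, \mu)$ imply that $\gamma_j$ converges narrowly to $\gamma$, so that we can pick any $\varphi \in C_c^0(\mathbb{R}^n)$ and find that
$$\lim_{j \to \infty} \int_{\mathbb{R}^n} \varphi\, d\mu_j = \lim_{j \to \infty} \int_{\mathbb{R}^n \times \mathbb{R}^n} \varphi(x)\, d\gamma_j(x,y) = \int_{\mathbb{R}^n \times \mathbb{R}^n} \varphi(x)\, d\gamma(x,y).$$
By (11.20) and again by narrow convergence of $\gamma_j$ to $\gamma$, we find however that
$$\int_{\mathbb{R}^n \times \mathbb{R}^n} \varphi(x)\, d\gamma(x,y) = \int_{\mathbb{R}^n \times \mathbb{R}^n} \varphi(y)\, d\gamma(x,y) = \lim_{j \to \infty} \int_{\mathbb{R}^n \times \mathbb{R}^n} \varphi(y)\, d\gamma_j(x,y) = \lim_{j \to \infty} \int_{\mathbb{R}^n} \varphi\, d\mu = \int_{\mathbb{R}^n} \varphi\, d\mu,$$
where in the penultimate identity we have used $q_\# \gamma_j = \mu$. We have thus proved that $\mu_j \overset{*}{\rightharpoonup} \mu$ as $j \to \infty$. Finally (11.19) and $\mu_j \overset{*}{\rightharpoonup} \mu$ give
$$\int_{\mathbb{R}^n} |x|^2\, d\mu = \sup_\varphi \int_{\mathbb{R}^n} \varphi\, |x|^2\, d\mu = \sup_\varphi \lim_{j \to \infty} \int_{\mathbb{R}^n} \varphi\, |x|^2\, d\mu_j \le \liminf_{j \to \infty} \int_{\mathbb{R}^n} |x|^2\, d\mu_j \le \int_{\mathbb{R}^n} |x|^2\, d\mu,$$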
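where $\sup_\varphi$ runs among those $\varphi \in C_c^0(\mathbb{R}^n)$ with $0 \le \varphi \le 1$. This proves (11.16).

Step three: We prove that (11.15)–(11.16) imply $W_2(\mu_j, \mu) \to 0$ as $j \to \infty$. Once again we only need to prove this up to the extraction of subsequences. We notice that (11.16) implies (11.19) and thus (11.18). In particular we find that (i):
$$\int_{\mathbb{R}^n} \varphi\, d\mu_j \to \int_{\mathbb{R}^n} \varphi\, d\mu, \qquad \forall \varphi \in C^0(\mathbb{R}^n) \text{ with quadratic growth}, \tag{11.21}$$
i.e., whenever $\varphi \in C^0(\mathbb{R}^n)$ is such that $|\varphi(x)| \le C\,(1 + |x|^2)$ for every $x \in \mathbb{R}^n$ and a suitable $C < \infty$; and (ii): for every $\varepsilon > 0$, we can find $\varphi_\varepsilon \in C_c^0(\mathbb{R}^n)$ with $0 \le \varphi_\varepsilon \le 1$, $\varphi_\varepsilon = 1$ on $B_{R(\varepsilon)/2}$ and $\varphi_\varepsilon = 0$ on $\mathbb{R}^n \setminus B_{R(\varepsilon)}$ for some $R(\varepsilon) \to +\infty$ as $\varepsilon \to 0^+$, so that
$$\sup_{\gamma \in \Gamma(\mu_j, \mu)} \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, \big[ 1 - \varphi_\varepsilon(x)\, \varphi_\varepsilon(y) \big]\, d\gamma < \varepsilon. \tag{11.22}$$
As a consequence, if we introduce the transport costs
$$c_\varepsilon^p(x,y) = |x - y|^p\, \varphi_\varepsilon(x)\, \varphi_\varepsilon(y), \qquad (x,y) \in \mathbb{R}^n \times \mathbb{R}^n, \quad p = 1, 2,$$
and notice that $c_\varepsilon^2 \le 2\,R(\varepsilon)\, c_\varepsilon^1$ and $c_\varepsilon^1 \le |x - y|$ on $\mathbb{R}^n \times \mathbb{R}^n$, then we find by (11.22)
$$\mathbf{K}_2(\mu_j, \mu) \le \mathbf{K}_{c_\varepsilon^2}(\mu_j, \mu) + \varepsilon \le 2\,R(\varepsilon)\, \mathbf{K}_{c_\varepsilon^1}(\mu_j, \mu) + \varepsilon \le 2\,R(\varepsilon)\, \mathbf{K}_1(\mu_j, \mu) + \varepsilon,$$
for every $j$ and $\varepsilon > 0$. We have thus reduced to proving that
$$\lim_{j \to \infty} \mathbf{K}_1(\mu_j, \mu) = 0, \tag{11.23}$$
and in turn, by the Kantorovich duality theorem with linear cost, Theorem 3.17, see in particular (3.38), it is enough to show that
$$\lim_{j \to \infty} \sup \Big\{ \int_{\mathbb{R}^n} u\, d\mu_j - \int_{\mathbb{R}^n} u\, d\mu \; : \; u : \mathbb{R}^n \to \mathbb{R},\ {\rm Lip}(u) \le 1 \Big\} = 0. \tag{11.24}$$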
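Let $u_j$ be $1/j$-close to realize the $\sup_u$ in (11.24). Notice that we can add a constant to $u_j$ without affecting the almost optimality property of $u_j$ in $\sup_u$, so that we can also assume $u_j(0) = 0$. By the Ascoli–Arzelà theorem, up to extracting a further subsequence, we have that $u_j \to u$ locally uniformly on $\mathbb{R}^n$ for some $u$ with ${\rm Lip}(u) \le 1$ and $u(0) = 0$. Moreover, we clearly have $\max\{|u_j(x)|, |u(x)|\} \le |x|$ for every $x \in \mathbb{R}^n$ and $j \in \mathbb{N}$. To conclude the proof we are left to show that
$$\lim_{j \to \infty} \Big( \int_{\mathbb{R}^n} u_j\, d\mu_j - \int_{\mathbb{R}^n} u_j\, d\mu \Big) = 0. \tag{11.25}$$
Indeed, given $R > 0$, we have that
$$\Big| \int_{\mathbb{R}^n} u_j\, d\mu_j - \int_{\mathbb{R}^n} u_j\, d\mu \Big| \le \Big| \int_{\mathbb{R}^n} u\, d\mu_j - \int_{\mathbb{R}^n} u\, d\mu \Big| + 2\, \| u_j - u \|_{C^0(B_R)} + \int_{\mathbb{R}^n \setminus B_R} \big( |u_j| + |u| \big)\, d\mu_j + \int_{\mathbb{R}^n \setminus B_R} \big( |u_j| + |u| \big)\, d\mu.$$
By $\max\{|u_j|, |u|\} \le |x|$, we find that
$$\sup_j \int_{\mathbb{R}^n \setminus B_R} \big( |u_j| + |u| \big)\, d[\mu_j + \mu] \le 2\, \sup_j \Big( \int_{\mathbb{R}^n \setminus B_R} |x|^2\, d[\mu_j + \mu] \Big)^{1/2} \to 0^+$$
as $R \to +\infty$, so that
$$\limsup_{j \to \infty} \Big| \int_{\mathbb{R}^n} u_j\, d\mu_j - \int_{\mathbb{R}^n} u_j\, d\mu \Big| \le \limsup_{j \to \infty} \Big| \int_{\mathbb{R}^n} u\, d\mu_j - \int_{\mathbb{R}^n} u\, d\mu \Big| = 0,$$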
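where in the last identity we have used the fact that $u$ has quadratic growth (since it has linear growth) in conjunction with (11.21).

Step four: Finally, let $\{\mu_j\}_j$ be a Cauchy sequence in $(\mathcal{P}_2(\mathbb{R}^n), W_2)$. Given $\varepsilon > 0$ we can find $j(\varepsilon)$ such that if $j \ge j(\varepsilon)$, then $W_2(\mu_j, \mu_{j(\varepsilon)}) \le \varepsilon$. At the same time we can find $R(\varepsilon)$ such that $\mu_{j(\varepsilon)}(B_{R(\varepsilon)}) \ge 1 - \varepsilon$. In particular, if $u : \mathbb{R}^n \to [0,1]$ is such that $u = 1$ on $B_{R(\varepsilon)}$, $u = 0$ on $\mathbb{R}^n \setminus B_{R(\varepsilon)+1}$ and ${\rm Lip}(u) \le 1$, then for every $k \ge j(\varepsilon)$ we find
$$\varepsilon \ge W_2(\mu_k, \mu_{j(\varepsilon)}) \ge \mathbf{K}_1(\mu_k, \mu_{j(\varepsilon)}) \ge \int_{\mathbb{R}^n} u\, d\mu_{j(\varepsilon)} - \int_{\mathbb{R}^n} u\, d\mu_k \ge \mu_{j(\varepsilon)}(B_{R(\varepsilon)}) - \mu_k(B_{R(\varepsilon)+1}) \ge 1 - \varepsilon - \mu_k(B_{R(\varepsilon)+1}),$$
that is,
$$\mu_k(B_{R(\varepsilon)+1}) \ge 1 - 2\,\varepsilon, \qquad \forall k \ge j(\varepsilon).$$
In other words, $\sup_j \mu_j(\mathbb{R}^n \setminus B_R) \to 0$ as $R \to \infty$, so that we can find $\mu \in \mathcal{P}(\mathbb{R}^n)$ such that, up to extracting subsequences, $\mu_j$ converges to $\mu$ narrowly. Moreover,
$$\int_{\mathbb{R}^n} |x|^2\, d\mu_j = W_2(\mu_j, \delta_0)^2 \le \big( W_2(\mu_{j(\varepsilon)}, \delta_0) + W_2(\mu_{j(\varepsilon)}, \mu_j) \big)^2$$
and $W_2(\mu_j, \mu_{j(\varepsilon)}) \le \varepsilon$ for $j \ge j(\varepsilon)$ imply that
$$\limsup_{j \to \infty} \int_{\mathbb{R}^n} |x|^2\, d\mu_j(x) < \infty, \tag{11.26}$$
so that $\mu \in \mathcal{P}_2(\mathbb{R}^n)$ by lower semicontinuity. By step three, $W_2(\mu_j, \mu) \to 0$.

Remark 11.11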
The proof of Theorem 11.10 is easily adapted to show that $(\mathcal{P}_1(\mathbb{R}^n), W_1)$ (where $W_1(\mu, \nu) = \mathbf{K}_1(\mu, \nu)$) is a complete metric space, and to characterize the $W_1$-convergence. The main difference concerns the proof of the triangular inequality, which in this case follows immediately from the Kantorovich duality formula (3.38).
12 Gradient Flows and the Minimizing Movements Scheme
One of the most interesting applications of OMT is the possibility of characterizing various parabolic PDE as gradient flows of physically meaningful energy functionals. The fundamental example is that of the heat equation, which, in addition to its classical characterization as the gradient flow of the Dirichlet energy in the Hilbert space $L^2$, can also be characterized, in a physically more compelling way, as the gradient flow of the entropy functional in the Wasserstein space. Given that gradients are defined by representing differentials via inner products, the most immediate framework for discussing gradient flows is definitely the Hilbertian one. For the same reason it is less clear how to define gradient flows in metric spaces which, like $(\mathcal{P}_2(\mathbb{R}^n), W_2)$, do not obviously admit a Hilbertian structure. This point is addressed in this chapter with the introduction of the versatile and powerful algorithm known as the minimizing movements scheme. The implementation of this algorithm in the Wasserstein space is then discussed in Chapter 13, specifically, in the case study provided by the Fokker–Planck equation on $\mathbb{R}^n$. In Section 12.1 we review the concepts of Lyapunov functional and of dissipation for gradient flows in finite dimensional Euclidean spaces, we present the general argument for convergence toward equilibrium of convex gradient flows (Theorem 12.1), and we illustrate the flexibility of these notions (see Remark 12.2). In Section 12.2 we heuristically exemplify these ideas in a proper PDE setting by looking at the case of the heat equation on $\mathbb{R}^n$. Finally, in Section 12.3, we present the minimizing movements scheme in the finite dimensional setting.
12.1 Gradient Flows in $\mathbb{R}^n$ and Convexity

Every smooth vector field $u : \mathbb{R}^n \to \mathbb{R}^n$ defines a family of systems of ODE (depending on the initial condition $x_0 \in \mathbb{R}^n$)
$$\begin{cases} x'(t) = u(x(t)), & t \ge 0, \\ x(0) = x_0. \end{cases} \tag{12.1}$$
There always exists a unique solution $x(t) = x(t; x_0)$ to (12.1) defined on a non-trivial interval $[0, t_0)$, with $t_0$ uniformly positive for $x_0$ in a compact set. We call the collection of these solutions the flow generated by $u$. When $t_0 = +\infty$ we can try to describe the long time behavior of these solutions by looking for quantities $H : \mathbb{R}^n \to \mathbb{R}$ which behave monotonically along the flow. It is customary to focus, for the sake of definiteness, on monotonically decreasing quantities, which are usually called either entropies (a terminology which creates confusion with the physical notion of entropy, which is actually increasing along diffusive processes) or Lyapunov functionals. In general, a flow admits many Lyapunov functionals: Indeed, a necessary and sufficient condition for being a Lyapunov functional is just that $u \cdot \nabla H \le 0$ on $\mathbb{R}^n$, as it is easily seen by using (12.1) to compute
$$\frac{d}{dt} H(x(t)) = x'(t) \cdot \nabla H(x(t)) = (u \cdot \nabla H)(x(t)).$$
The dissipation $D : \mathbb{R}^n \to [0, \infty)$ of $H$ along the flow defined by $u$ is defined by $D = -u \cdot \nabla H$, and of particular importance is the special case when (12.1) is a gradient flow, in the sense that $u = -\nabla f$ for an “energy functional” $f : \mathbb{R}^n \to \mathbb{R}$. In this case (12.1) takes the form
$$\begin{cases} x'(t) = -\nabla f(x(t)), & t > 0, \\ x(0) = x_0, \end{cases} \tag{12.2}$$
and $H = f$ is itself a Lyapunov functional of its own gradient flow, with the (somehow maximal) dissipation $D = |\nabla f|^2$. Under (12.2), the flow $x(t)$ evolves according to an infinitesimal minimization principle, that is, the system “sniffs around” for the nearest accessible states of lower energy by moving in the direction $-\nabla f(x(t))$. The dynamics of a gradient flow can be quite rich. Every critical point $x_0$ of $f$ is stationary for the corresponding gradient flow, in the sense that $x(t) \equiv x_0$ for every $t > 0$ is the unique solution of (12.2) with initial datum $x_0$.
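The criterion $u \cdot \nabla H \le 0$ does not require $u$ to be a gradient. The sketch below is our own illustration: it takes the hypothetical planar field $u(x) = (-x_1 - x_2,\ x_1 - x_2)$, which has a rotational part, and the candidate $H(x) = |x|^2/2$, for which $D = -u \cdot \nabla H = |x|^2 \ge 0$. The flow is approximated by the explicit Euler scheme with a small step, which is a numerical assumption and not part of the text.

```python
def u(x):
    """A planar vector field that is not a gradient (it has a rotational part)."""
    return (-x[0] - x[1], x[0] - x[1])

def H(x):
    """Candidate Lyapunov functional H(x) = |x|^2 / 2."""
    return 0.5 * (x[0] ** 2 + x[1] ** 2)

def dissipation(x):
    """D = -u . grad H, where grad H(x) = x."""
    ux = u(x)
    return -(ux[0] * x[0] + ux[1] * x[1])

def euler_flow(x0, step, n_steps):
    """Explicit Euler approximation of x' = u(x), x(0) = x0."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        ux = u(x)
        xs.append((x[0] + step * ux[0], x[1] + step * ux[1]))
    return xs

trajectory = euler_flow((1.0, 0.5), step=0.01, n_steps=500)
values = [H(x) for x in trajectory]
print(values[0], values[-1])
print("H is non-increasing along the discrete flow:",
      all(b <= a for a, b in zip(values, values[1:])))
```

For this particular field each Euler step multiplies $|x|^2$ by $1 - 2h + 2h^2$, where $h$ is the step, so the monotonicity seen in the output holds for every step $h < 1$ and is not an accident of the chosen initial datum.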
Among critical points, local maxima will be unstable (under small perturbations of the initial datum, the solution will flow away); but all local minima will be possible asymptotic equilibria (limit points of $x(t)$ as $t \to \infty$). This last remark illustrates the great importance of understanding local minimizers, and not just global minimizers, in the study of variational problems. It also clarifies why convexity is such a powerful assumption in studying gradient flows. Indeed, as we had already reason to appreciate when discussing cyclical monotonicity, the general principle “critical points of convex functions are global minimizers” holds. As a reflection of this fact, and as proved in the following elementary theorem, convex gradient flows have particularly simple dynamics.

Theorem 12.1 (Convergence of convex gradient flows) If $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth convex function such that
$$\lim_{|x| \to \infty} f(x) = +\infty, \tag{12.3}$$
then for every $x_0 \in \mathbb{R}^n$ there is a solution $x(t)$ of (12.2) defined for every $t \ge 0$. Moreover:
(a) if $f$ is strictly convex on $\mathbb{R}^n$, then $f$ has a unique global minimizer $x_{\min}$ on $\mathbb{R}^n$ and for every $x_0 \in \mathbb{R}^n$ the solution of (12.2) is such that $x(t) \to x_{\min}$ as $t \to \infty$;
(b) if $f$ is uniformly convex on $\mathbb{R}^n$, in the sense that there exists $\lambda > 0$ such that
$$v \cdot \nabla^2 f(x)[v] \ge \lambda\, |v|^2 \qquad \forall x, v \in \mathbb{R}^n, \tag{12.4}$$
then for every $x_0 \in \mathbb{R}^n$ the solution $x(t)$ of (12.2) converges to $x_{\min}$ exponentially fast, with
$$|x(t) - x_{\min}| \le e^{-\lambda t}\, |x_0 - x_{\min}|, \qquad \forall t \ge 0. \tag{12.5}$$
Proof Thanks to (12.3) there is no loss of generality in assuming that f ≥ 0. The flow exists for every time because, as already noticed, f itself is a Lyapunov functional of (12.2): Hence, f (x(t)) ≤ f (x 0 ) for every t ∈ (0,t 0 ), and in particular x(t) never leaves the closed, bounded (by (12.3)) set { f ≤ f (x 0 )}. Since ∇ f is locally Lipschitz, a standard application of the Cauchy-Picard theorem proves that we can take t 0 = +∞. By coercivity and strict convexity of f there exists a unique minimizer x min of f on Rn . We now prove that x(t) → x min as t → ∞. We first notice that d f (x(t)) = −|∇ f (x(t))| 2 , dt implies the dissipation inequality ∞ |∇ f (x(t))| 2 dt ≤ f (x 0 ). 0
(12.6)
132
Gradient Flows and the Minimizing Movements Scheme
Differentiating in turn $|\nabla f(x(t))|^2$, and using the second order characterization of convexity
$$v \cdot \nabla^2 f(x)[v] \ge 0 \qquad \forall x, v \in \mathbb{R}^n,$$
we deduce that
$$\frac{d}{dt} |\nabla f(x(t))|^2 = 2\,\nabla f(x(t)) \cdot \nabla^2 f(x(t))[x'(t)] \tag{12.7}$$
$$\qquad\qquad = -2\,\nabla f(x(t)) \cdot \nabla^2 f(x(t))[\nabla f(x(t))] \le 0. \tag{12.8}$$
Combining (12.6) and (12.7) we see that $|\nabla f(x(t))| \to 0$ as $t \to \infty$. Since $f(x(t))$ is bounded and $f$ is coercive, for every $t_j \to \infty$ there exists a subsequence $z_j$ of $x(t_j)$ such that $z_j \to z_\infty$ as $j \to \infty$ for some $z_\infty \in \mathbb{R}^n$. By continuity of $\nabla f$, we find $\nabla f(z_\infty) = 0$, and since $f$ is convex and admits a unique global minimum, we find $z_\infty = x_{\min}$. By the arbitrariness of $t_j$, we conclude that
$$\lim_{t\to\infty} x(t) = x_{\min}. \tag{12.9}$$
To prove exponential convergence, we take the scalar product of
$$\nabla f(x(t)) - \nabla f(x_{\min}) = \int_0^1 \nabla^2 f\big(s\,x(t) + (1-s)\,x_{\min}\big)[x(t) - x_{\min}]\,ds$$
with $(x(t) - x_{\min})$ and apply uniform convexity to find
$$\lambda\,|x(t) - x_{\min}|^2 \le \big(\nabla f(x(t)) - \nabla f(x_{\min})\big) \cdot (x(t) - x_{\min}).$$
But thanks to (12.2)
$$\frac{d}{dt} |x(t) - x_{\min}|^2 = 2\,(x(t) - x_{\min}) \cdot x'(t) = -2\,(x(t) - x_{\min}) \cdot \big(\nabla f(x(t)) - \nabla f(x_{\min})\big) \le -2\lambda\,|x(t) - x_{\min}|^2,$$
so that (12.5) follows.

Remark 12.2 (Gradient flow identification) Let $A \in \mathbb{R}^{n\times n}_{\rm sym}$ be positive definite, and let $(x, y)_A = A[x] \cdot y$ denote the scalar product on $\mathbb{R}^n$ induced by $A$. Given a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, the gradient of $f$ at $x$ defined by $A$ is the unique vector $\nabla_A f(x)$ such that $df_x[v] = (\nabla_A f(x), v)_A$. When $A = {\rm Id}$ we are of course back to the usual definition of gradient, i.e., $\nabla_{\rm Id} = \nabla$. Now, consider the system of ODE
$$x'(t) = -x(t), \tag{12.10}$$
and notice that (i) as soon as A is positive definite, f A (x) = (1/2) (x, x) A is a Lyapunov functional of (12.10); (ii) clearly (12.10) is the gradient flow of
f Id (x) = |x| 2 /2 (with respect to the usual definition of gradient); (iii) at the same time, (12.10) is also the gradient flow of f A (x) = (1/2) (x, x) A , if “gradient” means “gradient defined by A,” since ∇ A f A (x) = x for every x ∈ Rn . In summary, the same evolution equation can be seen as the gradient flow of different Lyapunov functionals by tailoring the notion of gradient on the considered functional (where “tailoring” stands for “choosing an inner product”). This remark, which is a bit abstract and artificial in the Euclidean context, is actually quite substantial when working in the infinite dimensional setting of parabolic PDE. In the latter situation, one can typically guess different Lyapunov functionals (by differentiating a candidate functional, substituting the PDE in the resulting formula, and then seeking if the resulting expression can be shown to have a sign), and then can ask if the PDE under study is actually the gradient flow of any of these Lyapunov functionals (abstractly, by trying out different Hilbertian structures – but in the next section we will see more concretely how this works). Whenever it is possible to identify a PDE as the gradient flow of a convex functional, then the proof of Theorem 12.1 provides a sort of blueprint to address convergence to equilibrium.
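The identification in Remark 12.2 can be checked numerically in a small case. The following is only a sketch under stated assumptions: the matrix `A`, the point `x0` and all helper names are illustrative choices that do not come from the text, and the computation is restricted to $n = 2$. It uses the fact that $df_x[v] = \nabla f(x)\cdot v = A[\nabla_A f(x)]\cdot v$, so that $\nabla_A f = A^{-1}\nabla f$, and verifies that $\nabla_A f_A(x) = x$ and that $f_A$ decreases along the explicit solution $x(t) = e^{-t}x_0$ of (12.10).

```python
import math

def mat_vec(A, x):
    """Product of a 2x2 matrix with a vector of length 2."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def inv2(A):
    """Inverse of an invertible 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def f_A(A, x):
    """f_A(x) = (1/2) (x, x)_A = (1/2) A[x] . x"""
    Ax = mat_vec(A, x)
    return 0.5 * (Ax[0] * x[0] + Ax[1] * x[1])

def grad_A(A, euclidean_grad):
    """Gradient defined by A: the vector g with A[g] . v = euclidean_grad . v."""
    return mat_vec(inv2(A), euclidean_grad)

# Illustrative (assumed) data: a symmetric positive definite matrix and a point.
A = [[2.0, 0.5], [0.5, 1.0]]
x0 = [1.0, -3.0]

# The Euclidean gradient of f_A at x0 is A[x0]; the A-gradient should be x0 itself.
gA = grad_A(A, mat_vec(A, x0))

# f_A evaluated along the solution x(t) = exp(-t) x0 of x' = -x.
times = (0.0, 0.5, 1.0, 2.0)
values = [f_A(A, [math.exp(-t) * c for c in x0]) for t in times]
```

Here `gA` reproduces `x0` up to rounding, so $-\nabla_A f_A(x) = -x$ is the right-hand side of (12.10), while `values` is decreasing, in agreement with point (i) of the remark.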
12.2 Gradient Flow Interpretations of the Heat Equation

Following up on Remark 12.2, we now illustrate how to give different gradient flow interpretations of the most basic parabolic PDE,¹ namely, the heat equation on $\mathbb{R}^n$. Given an initial datum $\mu_0 = \rho_0\,d\mathcal{L}^n \in \mathcal{P}(\mathbb{R}^n)$, we look for $\rho = \rho(x,t) : \mathbb{R}^n \times [0,\infty) \to \mathbb{R}$ such that
$$\begin{cases} \partial_t \rho = \Delta\rho, & \text{in } \mathbb{R}^n \times (0,\infty), \\ \rho|_{t=0} = \rho_0. \end{cases} \tag{12.11}$$
One can easily find several Lyapunov functionals for (12.11). For example, setting
$$H(\rho) = \int_{\mathbb{R}^n} h(\rho), \qquad h : [0,\infty) \to \mathbb{R} \text{ smooth and convex},$$
we easily see that $H(\rho)$ decreases along the heat flow. Indeed, by informally applying the divergence theorem² to the vector field $h'(\rho)\,\nabla\rho$ we find that if $\rho$ is a solution to (12.11), then $h'' \ge 0$ gives
$$\frac{d}{dt} H(\rho) = \int_{\mathbb{R}^n} h'(\rho)\,\partial_t\rho = \int_{\mathbb{R}^n} h'(\rho)\,\Delta\rho = -\int_{\mathbb{R}^n} h''(\rho)\,|\nabla\rho|^2 \le 0.$$
Another natural³ Lyapunov functional for the heat equation is the Dirichlet energy
$$H(\rho) = \frac12 \int_{\mathbb{R}^n} |\nabla\rho|^2.$$
Indeed, again thanks to an informal application of the divergence theorem, we see that if $\rho$ solves (12.11), then
$$\frac{d}{dt} H(\rho) = \int_{\mathbb{R}^n} \nabla\rho \cdot \nabla(\partial_t\rho) = -\int_{\mathbb{R}^n} \partial_t\rho\,\Delta\rho = -\int_{\mathbb{R}^n} (\Delta\rho)^2 \le 0.$$
Next we ask the question: Can we interpret the heat equation as the gradient flow of one of its many Lyapunov functionals? For example, it is easily seen that (12.11) can be interpreted as the gradient flow of the Dirichlet energy. On the Hilbert space $L^2(\mathbb{R}^n)$ endowed with the standard $L^2$-scalar product, we consider the functional $H$ defined by
$$H(\rho) = \frac12 \int_{\mathbb{R}^n} |\nabla\rho|^2 \qquad \text{if } \rho \in L_0 = W^{1,2}(\mathbb{R}^n) \subset L^2(\mathbb{R}^n),$$
and by $H(\rho) = +\infty$ otherwise, that is, if $\rho \in L^2(\mathbb{R}^n) \setminus L_0$. Notice that $H$ is convex on $L^2(\mathbb{R}^n)$, and that the differential of $H$ at $\rho \in W^{1,2}(\mathbb{R}^n)$ in the direction $\varphi \in W^{1,2}(\mathbb{R}^n)$ is given by
$$dH_\rho[\varphi] = \lim_{t\to0} \frac{H(\rho + t\varphi) - H(\rho)}{t} = \int_{\mathbb{R}^n} \nabla\rho \cdot \nabla\varphi, \qquad \forall \varphi \in W^{1,2}(\mathbb{R}^n).$$
In particular, if $\rho \in W^{2,2}(\mathbb{R}^n)$, then by
$$dH_\rho[\varphi] = -\int_{\mathbb{R}^n} \varphi\,\Delta\rho,$$
we can uniquely extend $dH_\rho$ as a bounded linear functional on the whole $L^2(\mathbb{R}^n)$, with
$$\nabla_{L^2} H(\rho) = -\Delta\rho \qquad \forall \rho \in W^{2,2}(\mathbb{R}^n).$$
Therefore, setting $\rho(t) = \rho(x,t)$, the gradient flow equation
$$\frac{d}{dt}\rho(t) = -\nabla_{L^2} H(\rho(t)), \qquad \rho(0) = \rho_0,$$
is equivalent to (12.11). What about Lyapunov functionals of the form $H(\rho) = \int_{\mathbb{R}^n} h(\rho)$? It can be seen, for example, that (12.11) is the gradient flow of the Lyapunov functional defined by $h(\rho) = \rho^2/2$ if gradients are computed in the scalar product of the dual space to $W^{1,2}(\mathbb{R}^n)$. The situation is however less clear in the case of physically interesting Lyapunov functionals like the (negative) entropy
$$H(\rho) = \int_{\mathbb{R}^n} \rho \log\rho.$$
Informally (which here means, disregarding integrability issues, admissibility of the variations, etc.) we have
$$H(\rho + t\varphi) = H(\rho) + t \int_{\mathbb{R}^n} (1 + \log\rho)\,\varphi + O(t^2) \tag{12.13}$$
so that the $L^2$-gradient of $H$ at $\rho$ is $1 + \log\rho$ whenever $\log\rho \in L^2(\mathbb{R}^n)$, i.e.,
$$\nabla_{L^2} H(\rho) = 1 + \log\rho \qquad \forall \rho \text{ such that } \log\rho \in L^2(\mathbb{R}^n);$$
but, of course, the corresponding gradient flow, $\partial_t\rho = -1 - \log\rho$, has nothing to do with the heat equation! This is puzzling, because one of the most natural interpretations of the heat equation is that of a model for the time evolution of the position probability density of a particle undergoing random molecular collisions, a process that is intuitively driven by the maximization of the physical entropy, i.e., by $-\int_{\mathbb{R}^n} \rho\log\rho$. Therefore, there should be one way to interpret $\partial_t\rho = \Delta\rho$ as the gradient flow of $H(\rho) = \int_{\mathbb{R}^n} \rho\log\rho$, and this interpretation should be very significant from a physical viewpoint.

¹ Clearly, we consider this example for purely illustrative purposes, because any reasonable question about the (unique) solution of (12.11) can be directly tackled by means of the resolution formula
$$\rho(x,t) = \int_{\mathbb{R}^n} \frac{e^{-|x-y|^2/4t}}{(4\pi t)^{n/2}}\,\rho_0(y)\,dy, \qquad (x,t) \in \mathbb{R}^n \times (0,\infty). \tag{12.12}$$
However, resolution formulas for PDE are definitely not the norm, hence the interest of exploring less direct methods of investigating their behavior.
² Precisely, one applies the divergence theorem on a ball $B_R$ to find
$$\int_{B_R} h'(\rho)\,\Delta\rho = -\int_{B_R} h''(\rho)\,|\nabla\rho|^2 + \int_{\partial B_R} h'(\rho)\,\frac{x}{|x|}\cdot\nabla\rho$$
and then needs to show the vanishing of the boundary integral in the limit $R \to \infty$. This requires suitable decay assumptions on the initial datum $\rho_0$ and checking their consequent validity on $\rho(\cdot,t)$ for every $t > 0$.
³ Notice that the Dirichlet energy is (up to constants) the dissipation of the Lyapunov functional $\int_{\mathbb{R}^n} u^2$.
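The two monotonicity computations of this section can be observed on a discretized heat equation. The sketch below makes several assumptions that are not in the text: it works in one space dimension on a periodic grid (rather than on $\mathbb{R}^n$), it uses an explicit finite-difference step with $dt \le dx^2/2$, and all names (`heat_step`, `entropy`, `dirichlet`) are hypothetical. Under this step-size restriction each update is an average of neighboring values, which is what makes the discrete versions of $\int \rho\log\rho$ and $\frac12\int|\nabla\rho|^2$ nonincreasing.

```python
import math

def heat_step(rho, dt, dx):
    """One explicit finite-difference step of rho_t = rho_xx on a periodic grid."""
    n = len(rho)
    return [rho[i] + dt * (rho[(i + 1) % n] - 2.0 * rho[i] + rho[i - 1]) / dx ** 2
            for i in range(n)]

def entropy(rho, dx):
    """Discrete version of the (negative) entropy, the integral of rho log rho."""
    return sum(r * math.log(r) for r in rho) * dx

def dirichlet(rho, dx):
    """Discrete version of the Dirichlet energy (1/2) * integral of |rho_x|^2."""
    n = len(rho)
    return 0.5 * sum((rho[(i + 1) % n] - rho[i]) ** 2 for i in range(n)) / dx

n = 64
dx = 1.0 / n
dt = 0.4 * dx ** 2          # assumed step, satisfies dt <= dx^2 / 2
# Illustrative positive initial density with unit mass on the periodic interval [0, 1).
rho = [1.0 + 0.8 * math.sin(2 * math.pi * i * dx) + 0.1 * math.cos(6 * math.pi * i * dx)
       for i in range(n)]

mass0 = sum(rho) * dx
entropy_hist = [entropy(rho, dx)]
dirichlet_hist = [dirichlet(rho, dx)]
for _ in range(200):
    rho = heat_step(rho, dt, dx)
    entropy_hist.append(entropy(rho, dx))
    dirichlet_hist.append(dirichlet(rho, dx))
mass_final = sum(rho) * dx
```

Both `entropy_hist` and `dirichlet_hist` are nonincreasing while the total mass is preserved, which is the discrete counterpart of the two Lyapunov computations above.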
12.3 The Minimizing Movements Scheme

The minimizing movements scheme provides a perspective on the concept of gradient flow which is somehow more fundamental than the one based on the use of Hilbertian structures for representing differentials as gradients. The idea is looking at gradient flows as limits of discrete-in-time flows defined by sequences of minimization processes associated to the choice of a reference metric. We are now going to introduce this construction in the case of the finite dimensional example (12.2). Given a smooth, nonnegative function $f : \mathbb{R}^n \to [0,\infty)$ with bounded sublevel sets (i.e., $\{f \le a\}$ is bounded for every $a > 0$), and given $h > 0$ and $x_0 \in \mathbb{R}^n$, we define a sequence $\{x_k^{(h)}\}_{k=1}^\infty \subset \mathbb{R}^n$ by taking $x_0^{(h)} = x_0$ and, for $k \ge 1$,
$$x_k^{(h)} \text{ is the unique minimizer of } x \mapsto f(x) + \frac{1}{2h}\,|x - x_{k-1}^{(h)}|^2. \tag{12.14}$$
Correspondingly, we consider the discrete flow of step $h$, $x^{(h)} : [0,\infty) \to \mathbb{R}^n$, defined by
$$x^{(h)}(t) = x_{k-1}^{(h)} \qquad \text{if } (k-1)\,h \le t < k\,h, \quad k \ge 1, \tag{12.15}$$
and claim that (i) as $h \to 0^+$, $x^{(h)}(t)$ has a locally uniform limit $x(t)$ on $[0,\infty)$; (ii) the pointwise limit $x(t)$ is the unique solution to the gradient flow (12.2), i.e., $x'(t) = -\nabla f(x(t))$ for $t > 0$, $x(0) = x_0$. We expect this to happen because the minimality property (12.14) implies the validity of the critical point condition
$$\frac{x_k^{(h)} - x_{k-1}^{(h)}}{h} = -\nabla f(x_k^{(h)}),$$
which, in turn, is a discrete-in-time approximation of $x'(t) = -\nabla f(x(t))$. The interest of this construction is that it is only the Euclidean distance that appears in the definition of the discrete flows $x^{(h)}$, and not the Euclidean scalar product. This feature makes it possible to consider the same scheme starting from a function defined on a metric space: When the resulting scheme converges (as $h \to 0^+$) to a limit flow, we can say that such flow has the structure of a gradient flow (although no gradient was actually computed in the proper sense of representing a differential through an inner product), namely, of the (metric) gradient flow of the considered function with respect to the considered metric. The convergence of the minimizing movements scheme in the Euclidean setting is discussed in the next theorem. As in the case of Theorem 12.1, the proof is particularly interesting because it provides a blueprint for approaching the same problem in more general settings.

Theorem 12.3 (Convergence of the minimizing movements scheme) Given a smooth $f : \mathbb{R}^n \to [0,\infty)$ with bounded sublevel sets, $x_0 \in \mathbb{R}^n$, and $h > 0$, let $x^{(h)}(t)$ be the discrete flow of step $h$ defined by (12.14) and (12.15). Then, uniformly in $h$, $x^{(h)}(t)$ is locally bounded, and $(1/2)$-Hölder continuous above scale $2h$, on $[0,\infty)$. Moreover, its locally uniform limit $x(t)$ on $[0,\infty)$ is such that $x(t) \in C^\infty([0,\infty); \mathbb{R}^n)$, $x'(t) = -\nabla f(x(t))$ for every $t > 0$ and $x(0) = x_0$.

Proof
Step one: We claim that for some $M > 0$, depending on $f$ and $x_0$ only,
$$\sup_{h>0}\,\sup_{t\ge0}\,|x^{(h)}(t)| \le M, \tag{12.16}$$
$$\sup_{h>0}\,\sup_{|t-s|\ge 2h}\,\frac{|x^{(h)}(t) - x^{(h)}(s)|}{\sqrt{|t-s|}} \le M. \tag{12.17}$$
Comparing the value of $x \mapsto f(x) + (1/2h)\,|x - x_{k-1}^{(h)}|^2$ at $x = x_{k-1}^{(h)}$ with the minimum value assumed at $x = x_k^{(h)}$, we find that
$$f(x_k^{(h)}) + \frac{1}{2h}\,|x_k^{(h)} - x_{k-1}^{(h)}|^2 \le f(x_{k-1}^{(h)}) \qquad \forall k \ge 1. \tag{12.18}$$
Notice that (12.18) implies $f(x_k^{(h)}) \le f(x_0^{(h)}) = f(x_0)$, i.e., $f$ is decreasing along the discrete flow $x^{(h)}$. Since $f$ has bounded sublevel sets, this proves (12.16). To prove (12.17), we start by adding up (12.18) on $k = 1, \ldots, N$ and canceling out the terms $f(x_k^{(h)})$, $k = 1, \ldots, N-1$, which appear on both sides, to find that
$$f(x_N^{(h)}) + \frac{1}{2h} \sum_{k=1}^N |x_k^{(h)} - x_{k-1}^{(h)}|^2 \le f(x_0), \qquad \forall N \ge 1.$$
In particular, since $f \ge 0$, we find
$$\sum_{k=1}^\infty |x_k^{(h)} - x_{k-1}^{(h)}|^2 \le 2h\,f(x_0). \tag{12.19}$$
If now $s > t \ge 0$ and $|t - s| \ge 2h$ then we can find integers $j > k \ge 1$ such that
$$(k-1)\,h \le t < k\,h < (k+1)\,h \le (j-1)\,h \le s < j\,h, \qquad j - k - 1 \le \frac{s-t}{h},$$
so that, by (12.19), we find
$$|x^{(h)}(s) - x^{(h)}(t)| = |x_{j-1}^{(h)} - x_{k-1}^{(h)}| \le \sum_{\ell=k}^{j-1} |x_\ell^{(h)} - x_{\ell-1}^{(h)}| \le \sqrt{j-k}\,\Big(\sum_{\ell=k}^{j-1} |x_\ell^{(h)} - x_{\ell-1}^{(h)}|^2\Big)^{1/2}$$
$$\le \sqrt{1 + [(s-t)/h]}\,\sqrt{2h\,f(x_0)} \le \sqrt{2 f(x_0)}\,\sqrt{s - t + h} \le M \sqrt{s-t},$$
since $s - t \ge 2h$. This proves (12.17).

Step two: We show that $x^{(h)}$ solves a “weak (integral) discrete form” of the gradient flow system of ODE $x'(t) = -\nabla f(x(t))$. More precisely, we prove that if $s > t \ge 0$, then
$$\Big| x^{(h)}(s) - x^{(h)}(t) + \int_t^s \nabla f\big(x^{(h)}(r)\big)\,dr \Big| \le M\,h, \tag{12.20}$$
for a constant $M$ depending on $f$ and $x_0$ only. Indeed, the critical point condition associated to the minimality property (12.14) gives
$$x_k^{(h)} - x_{k-1}^{(h)} = -h\,\nabla f\big(x_k^{(h)}\big), \qquad \forall k \ge 1, \tag{12.21}$$
so that if $(k-1)\,h \le t < k\,h$ and $(j-1)\,h \le s < j\,h$ for integers $1 \le k \le j$, then
$$x^{(h)}(s) - x^{(h)}(t) = x_{j-1}^{(h)} - x_{k-1}^{(h)} = \sum_{\ell=k}^{j-1} \big(x_\ell^{(h)} - x_{\ell-1}^{(h)}\big) = -h \sum_{\ell=k}^{j-1} \nabla f\big(x_\ell^{(h)}\big) = -\int_{kh}^{jh} \nabla f\big(x^{(h)}(r)\big)\,dr. \tag{12.22}$$
Since $x^{(h)}(r) \in \{f \le f(x_0)\}$ for every $r \ge 0$ and $\nabla f$ is bounded by a constant $M/2$ depending only on $f$ and $x_0$ on the bounded set $\{f \le f(x_0)\}$ we find that
$$\Big| \int_t^s \nabla f\big(x^{(h)}(r)\big)\,dr - \int_{kh}^{jh} \nabla f\big(x^{(h)}(r)\big)\,dr \Big| \le \frac{M}{2}\,\big(|j\,h - s| + |k\,h - t|\big) \le M\,h,$$
that is (12.20).

Step three: We claim that for each $h_j \to 0^+$ there exists a function $x(t) : [0,\infty) \to \mathbb{R}^n$ such that, up to extracting a not relabeled subsequence, $x^{(h_j)} \to x$ locally uniformly on $[0,\infty)$. If the claim holds, then $x(0) = x_0$ and, by letting $h = h_j$ in (12.20),
$$x(s) = x(t) - \int_t^s \nabla f(x(r))\,dr, \qquad \forall s > t \ge 0,$$
so that $x$ is smooth on $[0,\infty)$, and it is actually the unique solution of $x'(t) = -\nabla f(x(t))$ such that $x(0) = x_0$. In particular, the limit of $x^{(h_j)}$ does not depend on the particular subsequence that had been extracted, and therefore $x^{(h)} \to x$ as $h \to 0^+$. We are left to prove the claim. The claim would follow from the Ascoli–Arzelà theorem, (12.16) and (12.17) if only (12.17) held without the restriction that $|t - s| \ge 2h$. However one can easily adapt the classical proof of the Ascoli–Arzelà theorem, and thus establish compactness, even under the (weaker than usual) condition (12.17) that the equicontinuity of $\{x^{(h)}\}_h$ only holds above the scale $2h$. The simple details are omitted.
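As a concrete illustration of Theorem 12.3, the scheme (12.14) can be run for a one-dimensional example. This is a sketch under assumptions that are not part of the text: the function $f(x) = x^4/4 + x^2/2$, the initial point, the final time and the Newton iteration used to solve the critical point condition (12.21) are all illustrative choices, and the helper names are hypothetical. For this $f$ the gradient flow $x' = -x^3 - x$ has the closed-form solution $x(t) = x_0 e^{-t}/\sqrt{1 + x_0^2(1 - e^{-2t})}$, against which the discrete flow is compared.

```python
import math

def f(x):
    return 0.25 * x ** 4 + 0.5 * x ** 2

def grad_f(x):
    return x ** 3 + x

def mm_step(x_prev, h):
    """Minimize x -> f(x) + |x - x_prev|^2 / (2h).

    The objective is strictly convex here, so the minimizer is the unique root of
    g(x) = x - x_prev + h f'(x), which is found by Newton's method.
    """
    x = x_prev
    for _ in range(50):
        g = x - x_prev + h * grad_f(x)
        dg = 1.0 + h * (3.0 * x ** 2 + 1.0)
        x_new = x - g / dg
        if abs(x_new - x) < 1e-15:
            return x_new
        x = x_new
    return x

def discrete_flow(x0, h, T):
    """Return the iterates x_0, x_1, ..., x_N of (12.14), with N = T / h."""
    xs = [x0]
    for _ in range(int(round(T / h))):
        xs.append(mm_step(xs[-1], h))
    return xs

def exact(x0, t):
    """Closed-form solution of x' = -x^3 - x with x(0) = x0."""
    return x0 * math.exp(-t) / math.sqrt(1.0 + x0 ** 2 * (1.0 - math.exp(-2.0 * t)))

x0, T = 2.0, 1.0
errors = {}
square_sums = {}
for h in (0.1, 0.01, 0.001):
    xs = discrete_flow(x0, h, T)
    errors[h] = abs(xs[-1] - exact(x0, T))
    # Left-hand side of (12.19), truncated at k = N.
    square_sums[h] = sum((xs[k] - xs[k - 1]) ** 2 for k in range(1, len(xs)))
```

The error at time $T$ shrinks as $h$ decreases, and the sums in `square_sums` stay below $2h\,f(x_0)$, which is the bound (12.19) used in step one of the proof.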
13 The Fokker–Planck Equation in the Wasserstein Space
We continue the discourse started in Chapter 12 with an example of implementation of the minimizing movements scheme in the Wasserstein space. In Section 13.1 we introduce the Fokker–Planck equation and its associated free energy $\mathcal{F}$, which consists of a potential energy term and of the (negative) entropy term $\mathcal{S} = \int_{\mathbb{R}^n} \rho\log\rho$ already met in Chapter 10 (see (10.2)). We show that $\mathcal{F}$ has a unique critical point, which is also a stationary solution of the Fokker–Planck equation, and thus the natural candidate for describing the long time behavior of general solutions. In Section 13.2 we introduce the inner variations of functionals on $\mathcal{P}_{ac}(\mathbb{R}^n)$, and (having in mind the importance of the critical point condition (12.21) in the proof of Theorem 12.3) derive the corresponding first variation formulae for the potential energy term in $\mathcal{F}$ and for $W_2$. Taking inner variations of the entropy term $\mathcal{S}$ is more delicate, and is separately addressed in Section 13.3. Finally, in Section 13.4 we construct the minimizing movements scheme for $\mathcal{F}$ with respect to $W_2$, and prove its convergence toward a solution to the Fokker–Planck equation, while in Section 13.5 we informally discuss how to prove convergence toward equilibrium in the Fokker–Planck equation when the free energy $\mathcal{F}$ is uniformly displacement convex (compare with the use of (12.4) in proving (12.5) in Theorem 12.1).
13.1 The Fokker–Planck Equation

The Fokker–Planck equation models the time evolution of the position probability density $\rho(t) = \rho(\cdot,t) : \mathbb{R}^n \to [0,\infty)$ of a particle moving under the action of a chemical potential $\Psi : \mathbb{R}^n \to [0,\infty)$ and of white noise forces due to molecular collisions. It takes the form
$$\frac{\partial\rho}{\partial t} = {\rm div}\,(\nabla\Psi(x)\,\rho) + \frac{1}{\beta}\,\Delta\rho, \qquad \text{on } \mathbb{R}^n \times [0,\infty), \tag{13.1}$$
where $\beta$ is a positive constant with the dimensions of inverse temperature. We impose the initial condition $\rho(x,0) = \rho_0(x)$, for $\rho_0 : \mathbb{R}^n \to [0,\infty)$ such that
$$\int_{\mathbb{R}^n} \rho_0(x)\,dx = 1, \qquad \rho_0 \ge 0. \tag{13.2}$$
The conditions (13.2) are propagated in time by the Fokker–Planck equation: indeed, non-negativity is preserved by the maximum principle, while the total mass of $\rho(t)$ is constant in time since, at least informally,¹
$$\frac{d}{dt} \int_{\mathbb{R}^n} \rho(t) = \int_{\mathbb{R}^n} \frac{\partial\rho}{\partial t} = \int_{\mathbb{R}^n} {\rm div}\,\Big(\rho\,\nabla\Psi + \frac{\nabla\rho}{\beta}\Big) = 0.$$
Therefore it makes sense to study the Fokker–Planck equation as a flow in $\mathcal{P}_{ac}(\mathbb{R}^n)$, under the identification between densities $\rho \in L^1(\mathbb{R}^n)$ with $\rho \ge 0$ and $\int_{\mathbb{R}^n} \rho = 1$ and probability measures $\mu = \rho\,d\mathcal{L}^n \in \mathcal{P}_{ac}(\mathbb{R}^n)$. In this direction, we notice that a stationary state for the Fokker–Planck equation, and thus a possible limit for the Fokker–Planck flow, exists in $\mathcal{P}_{ac}(\mathbb{R}^n)$ if the potential $\Psi$ is such that $e^{-\beta\Psi} \in L^1(\mathbb{R}^n)$. Indeed, $\rho = e^{-\beta\Psi}$ is such that $\rho\,\nabla\Psi + \beta^{-1}\,\nabla\rho = 0$, so that, as soon as $e^{-\beta\Psi}$ has finite integral, the density $\rho_*$ defined by
$$\rho_*(x) = \frac{e^{-\beta\Psi(x)}}{Z}, \qquad Z = \int_{\mathbb{R}^n} e^{-\beta\Psi(x)}\,dx, \tag{13.3}$$
is a stationary state for (13.1) and satisfies $\mu_* = \rho_*\,d\mathcal{L}^n \in \mathcal{P}_{ac}(\mathbb{R}^n)$. A functional $\mathcal{F} : \mathcal{P}_{ac}(\mathbb{R}^n) \to \mathbb{R} \cup \{\pm\infty\}$ which is naturally associated with the Fokker–Planck equation is the free energy
$$\mathcal{F}(\rho) = \mathcal{E}(\rho) + \frac{1}{\beta}\,\mathcal{S}(\rho) = \int_{\mathbb{R}^n} \Psi(x)\,\rho + \frac{1}{\beta} \int_{\mathbb{R}^n} \rho\log\rho \tag{13.4}$$
consisting of the sum of the chemical energy $\mathcal{E}$ defined by the potential $\Psi$ and the (negative) entropy functional $\mathcal{S}$. We now make two informal remarks pointing to the conclusion that it should be possible to interpret the Fokker–Planck equation as the gradient flow of the free energy $\mathcal{F}$.

Remark 13.1 The free energy $\mathcal{F}$ is a Lyapunov functional of (13.1). We can prove this informally (i.e., assuming that differentiations and integrations by parts can be carried over as expected) by noticing that if $\rho(t) = \rho(x,t)$ solves (13.1), then
$$\frac{d}{dt}\,\mathcal{F}(\rho(t)) = \int_{\mathbb{R}^n} \Big(\Psi(x) + \frac{1 + \log\rho}{\beta}\Big)\,\frac{\partial\rho}{\partial t}$$
$$= \int_{\mathbb{R}^n} \Big(\Psi(x) + \frac{1 + \log\rho}{\beta}\Big)\,\Big[{\rm div}\,(\nabla\Psi(x)\,\rho) + \frac{1}{\beta}\,\Delta\rho\Big]$$
$$= -\int_{\mathbb{R}^n} \Big(\nabla\Psi(x) + \frac{1}{\beta}\,\frac{\nabla\rho}{\rho}\Big) \cdot \Big(\nabla\Psi(x)\,\rho + \frac{\nabla\rho}{\beta}\Big)$$
$$= -\int_{\mathbb{R}^n} \Big|\nabla\Big(\Psi + \frac{\log\rho}{\beta}\Big)\Big|^2 \rho \le 0.$$
In particular,
$$\mathcal{D}(\rho) = \int_{\mathbb{R}^n} \Big|\nabla\Big(\Psi + \frac{\log\rho}{\beta}\Big)\Big|^2 \rho, \tag{13.5}$$

¹ A formal derivation would require establishing a suitably strong decay at infinity for $\rho$.
is the dissipation functional of the free energy $\mathcal{F}$ along the Fokker–Planck flow.

Remark 13.2 The free energy $\mathcal{F}$ admits a unique critical point, which is the stationary state $\rho_*$ described in (13.3). More precisely, if $\mu = \rho\,d\mathcal{L}^n \in \mathcal{P}_{ac}(\mathbb{R}^n)$ is such that $\rho$ is continuous with $\{\rho > 0\} = \mathbb{R}^n$, and if $\rho$ is a critical point of $\mathcal{F}$ in the sense that
$$\frac{d}{ds}\Big|_{s=0} \mathcal{F}(\rho + s\,\varphi) = 0, \qquad \forall \varphi \in C_c^\infty(\mathbb{R}^n), \quad \int_{\mathbb{R}^n} \varphi = 0, \tag{13.6}$$
then $\rho = \rho_*$ with $\rho_*$ as in (13.3). Indeed, $\{\rho > 0\} = \mathbb{R}^n$ implies that for every $\varphi$ as above one has $(\rho + s\,\varphi)\,d\mathcal{L}^n \in \mathcal{P}_{ac}(\mathbb{R}^n)$ whenever $s$ is sufficiently small, and then (13.6) boils down to
$$\int_{\mathbb{R}^n} \Big(\Psi + \frac{1 + \log\rho}{\beta}\Big)\,\varphi = 0, \qquad \forall \varphi \in C_c^\infty(\mathbb{R}^n), \quad \int_{\mathbb{R}^n} \varphi = 0.$$
This implies that $\Psi + \beta^{-1}(1 + \log\rho)$ is constant on $\mathbb{R}^n$, i.e., $\rho = \rho_*$.

We close this section presenting an additional motivation for studying the Fokker–Planck equation, namely, we prove that the Fokker–Planck equation with quadratic potential is the blow-up flow of the heat equation. (This remark plays no role in the analysis of the minimizing movements scheme done in the subsequent sections.)

Remark 13.3 Let us consider the solution $\rho = \rho(x,t)$ of the heat equation on the whole space $\mathbb{R}^n$, that is $\partial_t\rho = \Delta\rho$ on $\mathbb{R}^n$, with initial datum $\rho_0 \in L^1(\mathbb{R}^n)$, $\rho_0 \ge 0$. As it is easily deduced from (12.12), we know that $\rho(t) \to 0^+$ uniformly on $\mathbb{R}^n$, with $0 \le \rho \le t^{-n/2}\,\|\rho_0\|_{L^1(\mathbb{R}^n)}$. It is thus interesting to “blow-up” the heat flow near an arbitrary point $x_0 \in \mathbb{R}^n$, trying to capture more precisely the nature of this decay toward 0. To this end we consider $L^1$-norm preserving rescalings of $\rho$ of the form
$$\sigma(y,\tau) = \phi(\tau)^n\,\rho\big(\phi(\tau)\,(y - x_0), \psi(\tau)\big), \qquad y \in \mathbb{R}^n, \ \tau \ge 0,$$
determined by functions $\phi, \psi : [0,\infty) \to (0,\infty)$ with $\phi(0) = 1$ and $\psi(0) = 0$. Notice that, whatever the choice of $\phi$ and $\psi$ we make, the Ansatz made on $\sigma$ is such that
$$\int_{\mathbb{R}^n} \sigma(\tau) = \int_{\mathbb{R}^n} \rho(\psi(\tau)) = \int_{\mathbb{R}^n} \rho_0 \qquad \forall \tau > 0.$$
We now want to choose $\phi$ and $\psi$ wisely, so that the dynamics of $\sigma$ is also described by a parabolic PDE: to this end, we compute
$$\nabla_y^2 \sigma = \phi^{n+2}\,\nabla_x^2 \rho,$$
$$\partial_\tau \sigma = n\,\phi^{n-1}\,\phi'\,\rho + \phi^n\,\phi'\,(y - x_0) \cdot \nabla_x\rho + \phi^n\,\psi'\,\partial_t\rho = \frac{\phi'}{\phi}\,{\rm div}_y\big((y - x_0)\,\sigma\big) + \phi^n\,\psi'\,\partial_t\rho.$$
In particular, if $\phi' = \phi$ and $\phi^n\,\psi' = \phi^{n+2}$, that is (taking into account the initial conditions $\phi(0) = 1$ and $\psi(0) = 0$), if
$$\phi(\tau) = e^\tau, \qquad \psi(\tau) = \frac{e^{2\tau} - 1}{2},$$
then $\sigma$ solves the Fokker–Planck equation with $\Psi(y) = |y - x_0|^2/2$ and $\beta = 1$:
$$\partial_\tau \sigma = {\rm div}\,\big((y - x_0)\,\sigma\big) + \Delta\sigma. \tag{13.7}$$
As much as the heat equation describes the dynamics of the position probability density $\rho$ of a particle undergoing random molecular collisions, the above scaling analysis shows that, near a point $x_0$ in space, the heat equation dynamics is the superposition of random molecular collisions with the action of the force field $y - x_0$ pointing away from $x_0$. Based on our previous discussion on the Fokker–Planck equation, we expect this refined local dynamics to have the asymptotic equilibrium state $\sigma(y) = e^{-|y - x_0|^2/2}$. This result, read in terms of the heat equation by inverting $\tau = \psi^{-1}(t) = \log\sqrt{1 + 2t}$, means that
$$(1 + 2t)^{n/2}\,\rho\big(\sqrt{1 + 2t}\,(y - x_0), t\big) \approx e^{-|y - x_0|^2/2} \qquad \text{as } t \to +\infty,$$
a prediction that is easily confirmed by working with the resolution formula (12.12).
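The prediction at the end of Remark 13.3 can be tested on an explicit solution. The sketch below rests on assumptions that are not in the text: dimension $n = 1$, $x_0 = 0$, and a centered Gaussian initial datum of variance `s2` and unit mass, for which (12.12) gives again a Gaussian, of variance `s2 + 2t`. The function names are hypothetical. With unit mass, the limit profile is the normalized version $(2\pi)^{-1/2} e^{-y^2/2}$ of the equilibrium state.

```python
import math

def psi(tau):
    """Time change psi(tau) = (exp(2 tau) - 1) / 2 from Remark 13.3."""
    return (math.exp(2.0 * tau) - 1.0) / 2.0

def tau_of_t(t):
    """Inverse of psi: tau = log sqrt(1 + 2t)."""
    return 0.5 * math.log(1.0 + 2.0 * t)

def heat_gaussian(x, t, s2):
    """1D solution of rho_t = rho_xx whose initial datum is N(0, s2)."""
    v = s2 + 2.0 * t
    return math.exp(-x * x / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def rescaled(y, t, s2):
    """(1 + 2t)^{1/2} rho(sqrt(1 + 2t) y, t), i.e. sigma(y, tau_of_t(t)) with x0 = 0."""
    phi = math.sqrt(1.0 + 2.0 * t)
    return phi * heat_gaussian(phi * y, t, s2)

def standard_normal(y):
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

grid = [-3.0 + 0.1 * i for i in range(61)]
s2 = 4.0   # assumed initial variance

def sup_error(t):
    """Largest gap on the grid between the rescaled solution and the limit profile."""
    return max(abs(rescaled(y, t, s2) - standard_normal(y)) for y in grid)

err_early, err_late = sup_error(10.0), sup_error(1.0e6)
roundtrip = psi(tau_of_t(3.7))
```

In this example `err_late` is much smaller than `err_early`, consistently with the convergence of the rescaled heat flow to the Gaussian equilibrium, and `roundtrip` recovers the input time, confirming that `tau_of_t` inverts `psi`.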
13.2 First Variation Formulae for Inner Variations

Implementing the minimizing movements scheme for the free energy $\mathcal{F}$ with respect to the Wasserstein distance $W_2$ involves the minimization in $\rho$ of functionals of the form
$$\int_{\mathbb{R}^n} \Psi\,\rho + \int_{\mathbb{R}^n} \rho\log\rho + \frac{1}{2h}\,W_2\big(\rho\,d\mathcal{L}^n, \rho_0\,d\mathcal{L}^n\big)^2, \tag{13.8}$$
associated to a given $\rho_0\,d\mathcal{L}^n \in \mathcal{P}_{2,ac}(\mathbb{R}^n)$. As in the finite dimensional case (see Theorem 12.3, and (12.21) in particular), the critical point condition (or Euler–Lagrange equation) of the functional (13.8) plays a crucial role in proving the convergence of the discrete-in-time flow as $h \to 0^+$. Therefore, in this and in the next section, we introduce a notion of variation for the functional (13.8), and then characterize the corresponding notion of critical point. Given $\mu = \rho\,d\mathcal{L}^n \in \mathcal{P}_{ac}(\mathbb{R}^n)$ we can take variations $\mu_t$ of $\mu$ in the form
$$\mu_t = (\rho + t\,\varphi)\,d\mathcal{L}^n, \tag{13.9}$$
whenever ϕ ∈ Cc0 (Rn ) is such that spt ϕ ⊂⊂ { ρ > δ} for some δ > 0, t is sufficiently small with respect to δ and ϕC 0 (Rn ) , and provided Rn ϕ = 0. Indeed, under these conditions, we trivially have μt ∈ Pac (Rn ), and therefore we can try to differentiate at t = 0 any functional defined on Pac (Rn ) along the curve t → μt . However, given the discussion contained in Section 10.1 (see, in particular (12.13)), we do not expect the Fokker–Planck equation and the free energy F to be linked through the “outer” variations defined in (13.9). We shall rather resort to a kind of variation that looks more natural from the point of view of displacement convexity (see Section 10.1), that is the notion of “inner” variation. Given ε > 0, a vector field u ∈ Cc∞ (Rn ; Rn ), and Φ ∈ C ∞ (Rn × (−ε, ε); Rn ), we say that {Φt } |t |0} = U ( ρ) + t U ( ρ) − ρ U ( ρ) div u + O(t 2 ), {ρ>0}
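and (13.17) follows. Looking back at the examples of internal energies listed in Chapter 10, we find that
$$\frac{d}{dt}\Big|_{t=0} \int_{\mathbb{R}^n} U(\rho_t) = -\int_{\{\rho>0\}} \rho\,{\rm div}\,u, \qquad \text{if } U(\rho) = \rho\log\rho,$$
$$\frac{d}{dt}\Big|_{t=0} \int_{\mathbb{R}^n} U(\rho_t) = (1 - \gamma) \int_{\{\rho>0\}} \rho^\gamma\,{\rm div}\,u, \qquad \text{if } U(\rho) = \rho^\gamma.$$
In particular, when $\gamma = 1$ and $U(\rho) = \rho$, the first variation is always identically 0 (as expected, since we have $\int_{\mathbb{R}^n} \rho = \int_{\mathbb{R}^n} \rho_t = 1$ for every $t$).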
13.3 Analysis of the Entropy Functional

We now take a closer look at the entropy functional
$$\mathcal{S}(\rho) = \int_{\mathbb{R}^n} \rho\log\rho,$$
where some care has to be paid toward integrability issues. We shall work in the class
$$\mathcal{A} = \big\{\rho : \rho\,d\mathcal{L}^n \in \mathcal{P}_{2,ac}(\mathbb{R}^n)\big\} = \Big\{\rho : \int_{\mathbb{R}^n} \rho = 1, \ \rho \ge 0, \ M(\rho) < \infty\Big\}, \tag{13.18}$$
$$M(\rho) = \int_{\mathbb{R}^n} |x|^2\,\rho(x)\,dx. \tag{13.19}$$
Thanks to (10.5) with $p = 2$, we have
$$\mathcal{S}(\rho) \ge -M(\rho) - \log\Big(\int_{\mathbb{R}^n} e^{-|x|^2}\Big), \qquad \forall \rho \in \mathcal{A}, \tag{13.20}$$
so that $\mathcal{S}$ is well-defined, with values in $\mathbb{R} \cup \{+\infty\}$, on $\mathcal{A}$. However, this is pretty much all that is implied by (13.20), as exemplified in the following remarks.

Remark 13.7 ($\mathcal{S}$ is unbounded from below on $\mathcal{A}$) In the limit $R \to +\infty$, the densities
$$\rho_R = \frac{1_{B_R}}{\omega_n\,R^n}, \qquad R > 0,$$
correspond to a “cloud of gas” that rarefies indefinitely over increasingly larger portions of space. We thus expect the physical entropy of such configurations to diverge to $+\infty$ as $R \to +\infty$, and indeed
$$\mathcal{S}(\rho_R) = \log(1/\omega_n) + n\,\log(1/R) \to -\infty \qquad \text{as } R \to +\infty.$$
Of course in this case we have $M(\rho_R) = O(R^2)$ as $R \to +\infty$.

Remark 13.8 ($\mathcal{S}$ can take the value $+\infty$ on $\mathcal{A}$) Consider, for example,
$$\rho(x) = c_0\,\frac{1_{B_{1/2} \setminus \{0\}}(x)}{|x|\,\log^2(|x|)}, \qquad x \in \mathbb{R}.$$
By the first identity in
$$\int \frac{dr}{r\,\log^2(r)} = -\frac{1}{\log(r)}, \qquad \int \frac{dr}{r\,(-\log(r))} = -\log(-\log(r)), \tag{13.21}$$
we have $\rho \in L^1(\mathbb{R})$, so that we can get $\int_{\mathbb{R}} \rho = 1$ by suitably choosing $c_0$. Moreover, $M(\rho) < \infty$ since $\{\rho > 0\}$ is bounded so that $\rho \in \mathcal{A}$. At the same time $\rho\log\rho \ge 1/\big(|x|\,|\log|x||\big)$ for $0 < |x| < 1/2$, so that the second identity in (13.21) gives $\mathcal{S}(\rho) = +\infty$.
In the next theorem we collect various properties of the entropy functional S over A. Notice (13.22), which improves the lower bound in (13.20) with the appearance of a sublinear power of M ( ρ): this will be useful in subsequent compactness arguments. Theorem 13.9 (Properties of the entropy functional S) The following properties hold: (i) Entropy-moment bound: If ρ ∈ A and α ∈ (n/(n + 2), 1), then ρ log ρ ≥ −C(n, α) 1 + M ( ρ) α . (13.22) 0≥ {ρ 0, Rn ϕ ρ0 + ρ ζ ∇Ψ · ∇ϕ − Δϕ + ϕ ζ = 0, (13.56) ζ (0) Rn
for every ζ ∈
Rn ×(0,∞)
Cc∞ ([0, ∞))
and ϕ ∈ Cc∞ (Rn ), and such, that for every T > 0,
M ( ρ(·)) L ∞ (0,T ) ≤ CT for every T > 0, ρ(x, ·) Ψ(x) dx ∞ ≤ CT . L (0,T )
Rn
(13.57) (13.58)
Indeed, if we set $U(r) = r\,\max\{0, \log(r)\}$, then (13.42) implies that
$$\sup_{h>0}\,\sup_{0\le t\le T} \int_{\mathbb{R}^n} U\big(\rho^{(h)}(t)\big)\,dx \le C_T.$$
Since $U(r)/r \to +\infty$ as $r \to +\infty$, by the Dunford–Pettis criterion³ we find that for every $h_j \to 0^+$ there exists a not-relabeled subsequence of $h_j$ and a measurable function $\rho : (0,\infty) \times \mathbb{R}^n \to [0,\infty)$ such that $\rho \in L^1((0,T) \times \mathbb{R}^n)$ for every $T < \infty$ and
$$\rho^{(h_j)} \rightharpoonup \rho \qquad \text{weakly in } L^1((0,T) \times \mathbb{R}^n) \text{ for every } T < \infty. \tag{13.59}$$

³ See, e.g., [AFP00, Theorem 1.38].
We notice that (13.59) implies (13.56) by setting $h = h_j$ and letting $j \to \infty$ in (13.47). Moreover, testing (13.59) on $1_{\mathbb{R}^n \times I}(x,t)$ for a bounded Borel set $I \subset (0,\infty)$ shows that
$$\int_{\mathbb{R}^n \times I} \rho(x,t)\,dx\,dt = \lim_{j\to\infty} \int_{\mathbb{R}^n \times I} \rho^{(h_j)}(x,t)\,dx\,dt = \mathcal{L}^1(I).$$
Should $\mathcal{L}^1\big(\{t : \int_{\mathbb{R}^n} \rho(t) \ne 1\}\big)$ be positive, we could choose $I$ so as to reach a contradiction in the above identity. Therefore
$$\int_{\mathbb{R}^n} \rho(t) = 1 \qquad \text{for a.e. } t > 0. \tag{13.60}$$
Given $t_0 \in (0,T)$, we can similarly test (13.59) with $1_{B_R \times (t_0 - \varepsilon, t_0 + \varepsilon)}(x,t)\,|x|^2$ to find
$$\int_{t_0-\varepsilon}^{t_0+\varepsilon} M(\rho(t))\,dt = \lim_{R\to+\infty} \int_{B_R \times (t_0-\varepsilon, t_0+\varepsilon)} |x|^2\,\rho(x,t)\,dx\,dt \le \lim_{R\to+\infty}\,\limsup_{j\to\infty} \int_{B_R \times (t_0-\varepsilon, t_0+\varepsilon)} |x|^2\,\rho^{(h_j)}(x,t)\,dx\,dt \le C_T\,2\,\varepsilon,$$
thanks to (13.41): in particular, by the Lebesgue points theorem, we find (13.57). A similar argument, based on (13.43), proves (13.58).

Step four: We conclude the proof by showing that there exists a unique function $\rho : \mathbb{R}^n \times (0,\infty) \to \mathbb{R}$ such that $\rho \in L^1((0,T) \times \mathbb{R}^n)$ for every $T > 0$, $\int_{\mathbb{R}^n} \rho(t) = 1$ for a.e. $t > 0$, and (13.56), (13.57) and (13.58) hold. We first notice that if $\rho$ is a smooth solution of the Fokker–Planck equation (13.1), that is, if $\partial_t\rho = {\rm div}\,(\rho\,\nabla\Psi) + \Delta\rho$ on $\mathbb{R}^n$ with $\rho(0) = \rho_0$, then multiplication of (13.1) by $\varphi(x)\,\zeta(t)$ with $\varphi \in C_c^\infty(\mathbb{R}^n)$ and $\zeta \in C_c^\infty([0,\infty))$ leads to (13.56). Vice versa, one can show that any solution $\rho$ of (13.56) which satisfies (13.57) and (13.58) is actually a classical smooth solution of (13.1) such that $\rho(t) \to \rho_0$ strongly in $L^1(\mathbb{R}^n)$ as $t \to 0^+$. Since this argument really pertains to the theory of linear parabolic PDEs more than to OMT, we omit the details.⁴ We rather explain in detail why there is a unique smooth solution of (13.1) such that $\rho(t) \to \rho_0$ strongly in $L^1(\mathbb{R}^n)$ as $t \to 0^+$ and (13.57) and (13.58) hold: indeed, this step is crucial for proving the convergence of the whole scheme as $h \to 0^+$! To this end, let $\sigma$ denote the difference of two such solutions, so that $\sigma \in C^\infty(\mathbb{R}^n \times (0,\infty))$, and $\partial_t\sigma = {\rm div}\,(\sigma\,\nabla\Psi) + \Delta\sigma$,
(13.61) 0+ ,
as t → σ(t) → 0 in |σ(t)| 1 + |x| 2 ≤ CT . sup L 1 (Rn )
0 0 thanks to the arbitrariness of ϕ0 (0).

Remark 13.12 One can prove, in analogy with (12.17), that the discrete flow $\rho^{(h)}$ is $1/2$-Hölder continuous in the Wasserstein distance (above time scale $h$), in the sense that
$$\sup_{0\le t,\,s\le T} W_2\big(\rho^{(h)}(t), \rho^{(h)}(s)\big) \le C_T\,\sqrt{|s - t| + h}, \tag{13.65}$$
compare with (12.17). Indeed, if 0 ≤ t < s ≤ T, then there exists integers k ≤ j such that (k − 1) h ≤ t < k h ≤ ( j − 1) h ≤ s < j h ≤ T. If j = k, then the left-hand side of (13.65) is 0. If j = k + 1, then by (13.44) √ W2 ρ(h) (t), ρ(h) (s) = W2 ρ(h) , ρ(h) ≤ CT h, k−1 k and (13.65) holds. Finally, if j ≥ k + 2, then s − t ≥ ( j − k − 1) h and (13.44) give j−2 (h) (h) , ρ W2 ρ(h) ≤ W2 ρ(h) (t), ρ(h) (s) = W2 ρ(h) j−1 +1 , ρ k−1
≤
j −k −1
j−1 =k
≤
j −k −1
which implies (13.65) with CT
√
j−1 =k
=k
(h) 2 W2 ρ(h) +1 , ρ
1/2
(h) (h) 2 1/2 W2 ρ+1 , ρ ≤
s−t CT h, h
s − t on the right-hand side.
13.5 Displacement Convexity and Convergence to Equilibrium

In Theorem 12.1 we have proved that if $f : \mathbb{R}^n \to \mathbb{R}$ is uniformly convex, in the sense that for some $\lambda > 0$ we have
$$v \cdot \nabla^2 f(x)[v] \ge \lambda\,|v|^2 \qquad \forall x, v \in \mathbb{R}^n,$$
then the unique solution of $x'(t) = -\nabla f(x(t))$ converges exponentially to the unique critical point (and global minimum) $x_{\min}$ of $f$, in the sense that
$$|x(t) - x_{\min}| \le e^{-\lambda t}\,|x(0) - x_{\min}|, \qquad \forall t \ge 0.$$
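The exponential bound just recalled can be observed numerically. The following sketch is built on assumptions that do not come from the text: the one-dimensional function $f(x) = x^2/2 + \cosh(x) - 1$, for which $f'' = 1 + \cosh(x) \ge 2$ (so that (12.4) holds with $\lambda = 2$) and $x_{\min} = 0$, and an explicit Euler discretization of the flow with a small step `dt`. All names are illustrative.

```python
import math

LAMBDA = 2.0      # uniform convexity constant of f(x) = x^2/2 + cosh(x) - 1
X_MIN = 0.0       # unique minimizer of f

def grad_f(x):
    """Derivative of f(x) = x^2/2 + cosh(x) - 1."""
    return x + math.sinh(x)

x0 = 2.0
dt = 0.01         # assumed time step of the explicit Euler scheme
steps = 500

# Explicit Euler approximation of x'(t) = -grad f(x(t)), x(0) = x0.
xs = [x0]
for _ in range(steps):
    xs.append(xs[-1] - dt * grad_f(xs[-1]))

# Compare |x(t_k) - x_min| with exp(-lambda t_k) |x0 - x_min| at every step.
bound_holds = all(
    abs(xs[k] - X_MIN) <= math.exp(-LAMBDA * k * dt) * abs(x0 - X_MIN) + 1e-12
    for k in range(steps + 1)
)
```

For this choice of `f` and `dt` the discrete trajectory stays positive and below the exponential envelope at every step, so `bound_holds` is `True`; the envelope is not sharp here because $f''$ is strictly larger than $\lambda$ away from the minimizer.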
Given the identification (Theorem 13.11) of the Fokker–Planck equation (13.1) with the gradient flow of the free energy
$$\mathcal{F}(\rho) = \mathcal{E}(\rho) + \mathcal{S}(\rho) = \int_{\mathbb{R}^n} \Psi(x)\,\rho + \int_{\mathbb{R}^n} \rho\log\rho, \tag{13.66}$$
on the metric space $(\mathcal{A}, W_2)$, $\mathcal{A} = \{\rho : \rho\,d\mathcal{L}^n \in \mathcal{P}_{2,ac}(\mathbb{R}^n)\}$, we may thus ask if the Fokker–Planck flow $\rho(t)$ converges to the equilibrium state $\rho_\infty = e^{-\Psi}/Z$ of $\mathcal{F}$ ($Z = \int_{\mathbb{R}^n} e^{-\Psi}$). The answer is affirmative, and one can obtain the following type of statement: if $\Psi$ is $\lambda$-uniformly convex on $\mathbb{R}^n$, and if the initial datum $\rho_0$ is comparable to $\rho_\infty$, then
$$\rho(t) \text{ converges to } \rho_\infty \text{ in } W_2 \text{ as } t \to \infty, \tag{13.67}$$
at an explicit exponential rate which depends on $\lambda$. The assumption on $\Psi$ is the existence of $\lambda > 0$ such that
$$\tau \cdot \nabla^2\Psi(x)[\tau] \ge \lambda\,|\tau|^2 \qquad \forall x, \tau \in \mathbb{R}^n, \tag{13.68}$$
while $\rho_0$ is comparable to $\rho_\infty$ if there are $a \in (0,1)$ and $b > 1$ such that
$$a\,\rho_\infty \le \rho_0 \le b\,\rho_\infty \qquad \text{on } \mathbb{R}^n. \tag{13.69}$$
The main goal of this section is providing an informal justification of (13.67), highlighting in the process some key ideas that, more generally, are useful in addressing the same kind of questions in other contexts. Following the blueprint of the proof of Theorem 12.1, convergence to equilibrium is deduced from two basic inequalities: (i): a control on $W_2(\rho, \rho_\infty)^2 = W_2(\rho\,d\mathcal{L}^n, \rho_\infty\,d\mathcal{L}^n)^2$ in terms of the (nonnegative) energy gap $\mathcal{F}(\rho) - \mathcal{F}(\rho_\infty)$; (ii): a control on the energy gap $\mathcal{F}(\rho) - \mathcal{F}(\rho_\infty)$ in terms of the dissipation $\mathcal{D}(\rho)$ of $\mathcal{F}$ along the Fokker–Planck flow, see (13.5). We now sketch the arguments needed for obtaining these two results.

Sketch of proof of (i): We show that
$$\mathcal{F}(\rho) - \mathcal{F}(\rho_\infty) \ge \frac{\lambda}{2}\,W_2(\rho, \rho_\infty)^2, \qquad \forall \rho \in \mathcal{A}. \tag{13.70}$$
Indeed, let $\rho_t\,d\mathcal{L}^n = (\nabla f_t)_\#\mu$, where $\nabla f$ is the transport map from $\mu = \rho\,d\mathcal{L}^n$ to $\mu_\infty = \rho_\infty\,d\mathcal{L}^n$, and where $\nabla f_t = (1 - t)\,{\rm id} + t\,\nabla f$. By (13.68) we obtain a quantification of the gap in the basic convexity inequality for $\Psi$, namely
$$(1 - t)\,\Psi(x) + t\,\Psi(y) \ge \Psi\big((1 - t)\,x + t\,y\big) + \frac{\lambda}{2}\,t\,(1 - t)\,|x - y|^2, \tag{13.71}$$
for every $x, y \in \mathbb{R}^n$ and $t \in [0,1]$. Setting $y = \nabla f(x)$ in (13.71), multiplying by $\rho$ the resulting inequality, and integrating over $\mathbb{R}^n$ we obtain
$$(1 - t)\,\mathcal{E}(\rho) + t\,\mathcal{E}(\rho_\infty) = (1 - t) \int_{\mathbb{R}^n} \Psi\,\rho + t \int_{\mathbb{R}^n} \Psi(\nabla f)\,\rho \tag{13.72}$$
$$\ge \int_{\mathbb{R}^n} \Psi(\nabla f_t)\,\rho + \frac{\lambda}{2}\,t\,(1 - t) \int_{\mathbb{R}^n} |x - \nabla f|^2\,\rho = \mathcal{E}(\rho_t) + \frac{\lambda}{2}\,t\,(1 - t)\,W_2(\rho, \rho_\infty)^2.$$
Recalling that $\mathcal{S}$ is displacement convex thanks to Theorem 10.5, we have
$$(1 - t)\,\mathcal{S}(\rho) + t\,\mathcal{S}(\rho_\infty) \ge \mathcal{S}(\rho_t). \tag{13.73}$$
Since $\rho_\infty$ is the absolute minimizer of $\mathcal{F}$ over $\mathcal{P}_{2,ac}(\mathbb{R}^n)$, we can add up (13.72) and (13.73) to obtain
$$(1 - t)\,\mathcal{F}(\rho) + t\,\mathcal{F}(\rho_\infty) \ge \mathcal{F}(\rho_t) + \frac{\lambda}{2}\,t\,(1 - t)\,W_2(\rho, \rho_\infty)^2 \ge \mathcal{F}(\rho_\infty) + \frac{\lambda}{2}\,t\,(1 - t)\,W_2(\rho, \rho_\infty)^2.$$
Rearranging terms, simplifying 1 − t, and then letting t → 1− , we find (13.70). Sketch of proof of (ii): We prove the “energy–dissipation inequality” 2
∇Ψ + ∇ρ
ρ ≥ 2λ F ( ρ) − F ( ρ ) , ∞
ρ
Rn
(13.74)
(compare with (13.5)). Let us consider the quantification of the supporting hyperplane inequality for Ψ obtained from (13.68), namely, Ψ(y) ≥ Ψ(x) + ∇Ψ(x) · (y − x) +
λ |x − y| 2 , 2
∀x, y ∈ Rn .
(13.75)
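Taking $y = \nabla f(x)$ in (13.75), multiplying by $\rho$, and integrating over $\mathbb{R}^n$ we obtain
\[
\mathcal{E}(\rho_\infty) \ge \mathcal{E}(\rho) + \int_{\mathbb{R}^n}\nabla\Psi\cdot(\nabla f - x)\,\rho + \frac{\lambda}{2}\,W_2(\rho,\rho_\infty)^2.
\]
At the same time, since $\rho = \rho_\infty(\nabla f)\,\det\nabla^2 f$ $\mathcal{L}^n$-a.e. on $\{\rho > 0\}$, we have that
\[
\mathcal{S}(\rho_\infty) = \int_{\mathbb{R}^n}\rho_\infty\log(\rho_\infty) = \int_{\mathbb{R}^n}\rho\,\log(\rho_\infty(\nabla f)) = \int_{\mathbb{R}^n}\rho\,\log\Big(\frac{\rho}{\det\nabla^2 f}\Big) = \mathcal{S}(\rho) - \int_{\mathbb{R}^n}\rho\,\log\det\nabla^2 f.
\]
Adding up the two inequalities we have proved that
\[
\mathcal{F}(\rho) - \mathcal{F}(\rho_\infty) + \frac{\lambda}{2}\,W_2(\rho,\rho_\infty)^2 \le \int_{\mathbb{R}^n}\nabla\Psi\cdot(x - \nabla f)\,\rho + \int_{\mathbb{R}^n}\rho\,\log\det\nabla^2 f. \tag{13.76}
\]
Now, for $\mathcal{L}^n$-a.e. $x \in \{\rho > 0\}$ we have that $\nabla^2 f(x)$ is a diagonal matrix in a suitable orthonormal basis, therefore, by the elementary inequality
\[
\log\Big(\prod_{k=1}^n (1+\lambda_k)\Big) \le \sum_{k=1}^n \lambda_k,
\]
we deduce that, for $\mathcal{L}^n$-a.e. $x \in \{\rho > 0\}$, $\log(\det\nabla^2 f(x)) = \log\det[\mathrm{Id} + (\nabla^2 f(x) - \mathrm{Id})] \le \mathrm{trace}\,(\nabla^2 f(x) - \mathrm{Id})$, and thus, by the non-negativity of $D^2 f$, that $\log(\det\nabla^2 f)\,d\mathcal{L}^n \le \mathrm{div}\,(\nabla f - x)$ in the sense of distributions on $\mathbb{R}^n$. Now, by (13.69) and by the maximum principle, we have
\[
a\,\rho_\infty \le \rho(t) \le b\,\rho_\infty \qquad \text{on } \mathbb{R}^n, \text{ for every } t > 0. \tag{13.77}
\]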
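By (13.77), $0 \le \Psi \le C(1+|x|^2)$ on $\mathbb{R}^n$, and the fact that $\nabla f$ is the Brenier map from $\mu = \rho\,d\mathcal{L}^n$ to $\mu_\infty = \rho_\infty\,d\mathcal{L}^n$, we can obtain bounds on $|\nabla f - x|$ such that the following integration by parts
\[
\int_{\mathbb{R}^n}\rho\,\log\det\nabla^2 f \le \int_{\mathbb{R}^n}\rho\,d[\mathrm{div}\,(\nabla f - x)] = \int_{\mathbb{R}^n}\nabla\rho\cdot(x - \nabla f),
\]
can be justified, and then can be combined with (13.76) to obtain (13.74),
\[
\begin{aligned}
\mathcal{F}(\rho) - \mathcal{F}(\rho_\infty) + \frac{\lambda}{2}\,W_2(\rho,\rho_\infty)^2 &\le \int_{\mathbb{R}^n}\Big(\nabla\Psi + \frac{\nabla\rho}{\rho}\Big)\cdot(x - \nabla f)\,\rho \\
&\le \Big(\int_{\mathbb{R}^n}\Big|\nabla\Psi + \frac{\nabla\rho}{\rho}\Big|^2\rho\Big)^{1/2}\Big(\int_{\mathbb{R}^n}|\nabla f - x|^2\,\rho\Big)^{1/2} \\
&\le \frac{1}{2\lambda}\int_{\mathbb{R}^n}\Big|\nabla\Psi + \frac{\nabla\rho}{\rho}\Big|^2\rho + \frac{\lambda}{2}\,W_2(\rho,\rho_\infty)^2,
\end{aligned}
\]
where in the last inequality we have used $ab \le (a^2/2\lambda) + (b^2\lambda/2)$.

Convergence to equilibrium: Finally we combine (13.70) and (13.74). Indeed, differentiating $\mathcal{F}(\rho(t))$ along (13.1), as done in Section 13.1, we find that
\[
\frac{d}{dt}\,\mathcal{F}(\rho(t)) = -\int_{\mathbb{R}^n}\Big|\nabla\Psi + \frac{\nabla\rho(t)}{\rho(t)}\Big|^2\rho(t).
\]
Therefore (13.74) gives
\[
\frac{d}{dt}\big(\mathcal{F}(\rho(t)) - \mathcal{F}(\rho_\infty)\big) \le -2\lambda\,\big(\mathcal{F}(\rho(t)) - \mathcal{F}(\rho_\infty)\big)
\]
from which in turn we deduce, using also (13.70),
\[
\big(\mathcal{F}(\rho_0) - \mathcal{F}(\rho_\infty)\big)\,e^{-2\lambda t} \ge \mathcal{F}(\rho(t)) - \mathcal{F}(\rho_\infty) \ge \frac{\lambda}{2}\,W_2(\rho(t),\rho_\infty)^2.
\]
In summary, we have proved
\[
W_2(\rho(t),\rho_\infty) \le e^{-\lambda t}\sqrt{\frac{2}{\lambda}\,\big(\mathcal{F}(\rho_0) - \mathcal{F}(\rho_\infty)\big)},
\]
that is, exponential convergence to equilibrium in Wasserstein distance.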
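The chain of inequalities just obtained can be checked by hand in the simplest uniformly convex case. The sketch below is an illustration under assumptions that are not taken from the text: we work on the real line with $\Psi(x) = \lambda x^2/2$, we assume that (13.1) reads $\partial_t\rho = \partial_x(\partial_x\rho + \rho\,\Psi')$, and we start from a Gaussian datum, so that the solution stays Gaussian with explicitly known mean and variance. The function names and numerical values are hypothetical.

```python
import math

def free_energy(mean, var, lam):
    """F(rho) = int Psi*rho + int rho*log(rho) for rho = N(mean, var), Psi(x) = lam*x**2/2."""
    energy = 0.5 * lam * (mean ** 2 + var)
    entropy = -0.5 * math.log(2.0 * math.pi * math.e * var)
    return energy + entropy

def w2_gaussians(m1, v1, m2, v2):
    """Quadratic Wasserstein distance between N(m1, v1) and N(m2, v2) on the real line."""
    return math.hypot(m1 - m2, math.sqrt(v1) - math.sqrt(v2))

def solution(m0, v0, lam, t):
    """Mean and variance at time t of the Gaussian solution started at N(m0, v0)."""
    mean = m0 * math.exp(-lam * t)
    var = 1.0 / lam + (v0 - 1.0 / lam) * math.exp(-2.0 * lam * t)
    return mean, var

lam, m0, v0 = 2.0, 3.0, 0.1          # illustrative values only
f_inf = free_energy(0.0, 1.0 / lam, lam)   # equilibrium is N(0, 1/lam)
gap0 = free_energy(m0, v0, lam) - f_inf

rows = []
for t in (0.0, 0.25, 0.5, 1.0, 2.0):
    m, v = solution(m0, v0, lam, t)
    gap = free_energy(m, v, lam) - f_inf
    dist = w2_gaussians(m, v, 0.0, 1.0 / lam)
    bound = math.exp(-lam * t) * math.sqrt(2.0 * gap0 / lam)
    rows.append((t, gap, dist, bound))
    print(f"t={t:4.2f}  F-gap={gap:.3e}  W2={dist:.3e}  bound={bound:.3e}")
```

Each printed row compares the free energy gap, the Wasserstein distance to equilibrium, and the right-hand side of the final estimate; in this Gaussian example all three inequalities of the argument can be verified directly.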
14 The Euler Equations and Isochoric Projections
This chapter contains one more physically intriguing application of the Brenier theorem and the prelude to a crucial insight on the geometry of the Wasserstein space. Both developments originate in the study of the Euler equations for an incompressible fluid. These are among the most fascinating and challenging PDEs in mathematics, and they are closely related to OMT since the motion of an incompressible fluid is naturally described, from the kinematical viewpoint, by using isochoric (i.e., volume-preserving) transformations. The study of time-dependent isochoric transformations leads in turn to the transport equation, while the derivation of the Euler equations from the principle of least action leads us to consider the minimization of the action functional for an incompressible fluid. These last two objects provide the entry point for understanding the “Riemannian” (or “infinitesimally Hilbertian”) structure of the Wasserstein space, which will be further addressed in Chapter 15. In Section 14.1 we introduce isochoric transformations, while in Section 14.2 we derive the Euler equations from the principle of least action. In Section 14.3 we interpret the principle of least action for an incompressible fluid as the geodesics equation in the space of isochoric transformations. We then introduce the idea of geodesics as limits of “iterative projections of mid-points,” and then present (as a rather immediate consequence of Theorem 4.2) the Brenier projection theorem, relating the quadratic Monge problem to the $L^2$-projection on the “manifold” of isochoric transformations.
14.1 Isochoric Transformations of a Domain
We consider an open connected set $\Omega \subset \mathbb{R}^n$, which plays the role of a container completely occupied by an incompressible fluid. An isochoric transformation of $\Omega$ is a smooth and bijective map $\Phi : \mathrm{Cl}(\Omega) \to \mathrm{Cl}(\Omega)$ such that $\Phi(\partial\Omega) = \partial\Omega$ and $\Phi_\#(\mathcal{L}^n \llcorner \Omega) = \mathcal{L}^n \llcorner \Omega$, i.e.,
\[
|\Phi^{-1}(E)| = |E| \qquad \text{for every Borel set } E \subset \Omega. \tag{14.1}
\]
A by now familiar argument shows that (14.1) is equivalent to
\[
|\det\nabla\Phi(x)| = 1 \qquad \text{for every } x \in \Omega, \tag{14.2}
\]
see Proposition 1.1. In fact, by connectedness of $\Omega$, we either have $\det\nabla\Phi = 1$ or $\det\nabla\Phi = -1$ on $\Omega$; depending on which condition holds, we have an orientation-preserving or orientation-inverting isochoric transformation. We denote by $\mathcal{M}(\Omega)$ the set of isochoric transformations of $\Omega$. The basic kinematic assumption of classical Fluid Mechanics is that the motion of a fluid can be described by a time-dependent family of orientation-preserving isochoric transformations of $\Omega$. Precisely, an incompressible motion in $\Omega$ is a smooth function $\Phi \in C^\infty(\Omega \times [0,\infty); \Omega)$ such that, setting $\Phi_t(x) = \Phi(x,t)$, we have
\[
\Phi_0 = \mathrm{id}, \qquad \{\Phi_t\}_{t \ge 0} \subset \mathcal{M}(\Omega).
\]
The time derivative $v = \partial_t\Phi$ is the velocity field of the incompressible motion $\Phi$. The fact that $\Phi_t \in \mathcal{M}(\Omega)$ is reflected at the level of $v$ by the validity of the following two conditions,
\[
v(x,t)\cdot\nu_\Omega(x) = 0, \qquad \forall x \in \partial\Omega,\ t \ge 0, \tag{14.3}
\]
\[
\mathrm{trace}\big(\nabla\Phi_t(x)^{-1}\,\nabla v(x,t)\big) = 0, \qquad \forall x \in \Omega,\ t \ge 0. \tag{14.4}
\]
The validity of (14.3) is immediate from $\Phi(\partial\Omega,t) = \partial\Omega$ and $v = \partial_t\Phi$. To prove (14.4) we notice that (14.2) and $\nabla\Phi_{t+h}(x) = \nabla\Phi_t(x) + h\,\nabla v(x,t) + O(h^2)$ imply
\[
1 = \det\nabla\Phi_{t+h}(x) = \det\nabla\Phi_t(x)\,\det\big(\mathrm{Id} + h\,\nabla\Phi_t(x)^{-1}\nabla v(x,t) + O(h^2)\big) = 1 + h\,\mathrm{trace}\big(\nabla\Phi_t(x)^{-1}\nabla v(x,t)\big) + O(h^2),
\]
where in the last identity we have used $\det(\mathrm{Id} + h\,A) = 1 + h\,\mathrm{trace}(A) + O(h^2)$ for $h \to 0$, see, e.g., [Mag12, Lemma 17.4]. Conversely, one can see that, given a smooth vector field $v \in C^\infty(\Omega \times [0,\infty); \mathbb{R}^n)$ satisfying (14.3) and (14.4), by setting $\Phi(x,t) = X(t)$ where $X(t)$ is the solution of the ODE $X'(t) = v(X(t),t)$ with $X(0) = x$, we define an incompressible motion in $\Omega$ with velocity field $v$. Both the constraints (14.3) and (14.4) take a simpler form if, rather than on $v$, we focus on the Eulerian velocity
\[
u(y,t) = v(\Phi_t^{-1}(y),t), \qquad (y,t) \in \Omega \times [0,\infty). \tag{14.5}
\]
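The first-order expansion of the determinant used above in the proof of (14.4) is easy to test numerically. The following sketch (the matrix, the seed, and the step sizes are arbitrary choices made for illustration) compares the remainder $\det(\mathrm{Id} + hA) - 1 - h\,\mathrm{trace}(A)$, divided by $h^2$, with the exact second-order coefficient $\frac12\big((\mathrm{trace}\,A)^2 - \mathrm{trace}(A^2)\big)$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))        # arbitrary test matrix

def remainder(h):
    """det(Id + h A) minus its first-order expansion 1 + h trace(A)."""
    return np.linalg.det(np.eye(3) + h * A) - (1.0 + h * np.trace(A))

# Exact coefficient of h^2 in det(Id + h A): ((trace A)^2 - trace(A^2)) / 2.
second_order = 0.5 * (np.trace(A) ** 2 - np.trace(A @ A))

for h in (1e-1, 1e-2, 1e-3):
    ratio = remainder(h) / h ** 2
    print(f"h={h:.0e}  remainder/h^2={ratio:+.6f}  coefficient={second_order:+.6f}")
```

As $h$ decreases, the printed ratio should settle on the second-order coefficient, which is the quantitative content of the $O(h^2)$ term.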
The relation between $v$ and $u$ is simple: while $v(x,t)$ is the velocity at time $t$ of the fluid particle that at time $t = 0$ occupies position $x$ (Lagrangian viewpoint), $u(y,t)$ is the velocity of the fluid particle that at time $t$ is transiting through position $y$ (Eulerian viewpoint). We claim that (14.3) and (14.4) are equivalent to
\[
u(y,t)\cdot\nu_\Omega(y) = 0, \qquad \forall y \in \partial\Omega,\ t \ge 0, \tag{14.6}
\]
\[
\mathrm{div}\,u(y,t) = 0, \qquad \forall y \in \Omega,\ t \ge 0. \tag{14.7}
\]
The equivalence between (14.3) and (14.6) is immediate, while by differentiating in $x$ the identity $v(x,t) = u(\Phi_t(x),t)$ one finds $\nabla\Phi_t(x)^{-1}\nabla v(x,t) = (\nabla u)(\Phi_t(x),t)$ and deduces, by the bijectivity of $\Phi_t$, that (14.7) and (14.4) are equivalent.
14.2 The Euler Equations and Principle of Least Action
We now derive the equations of motion for an incompressible fluid. We consider the situation when, at the initial time $t = 0$, the velocity field $v_0(x)$ and density of mass per unit volume $\rho_0(x)$ of the fluid are known, and consider the problem of determining the future motion of the fluid. We are assuming there are no external forces and that no friction/viscosity effects are in place to dissipate the initial kinetic energy of the fluid, which is given by
\[
\frac{1}{2}\int_\Omega \rho_0\,|v_0|^2.
\]
Under these idealized conditions, the initial kinetic energy should be conserved, and the fluid be stuck in an endless motion driven by the sole effect of the incompressibility constraint. Indeed, preserving the isochoric character of the motion induces a mechanism such that fluid particles have to continuously move away from their position to make space for incoming particles. The force causing this motion is exerted by the internal pressure of the fluid, which is ultimately due to the intramolecular forces responsible for the validity of the incompressibility constraint.

Equation for the mass density: We denote by $\rho(y,t)$ the density of mass per unit volume of the fluid particle that at time $t$ is transiting through position $y$. If $E$ is an arbitrary region in $\Omega$, then $\{\Phi_t(E)\}_{t \ge 0}$ describes the motion of the part of the fluid occupying the region $E$ at time $t = 0$. By the Principle of Conservation of Mass, the total mass of $\Phi_t(E)$ must stay constant in time, i.e.,
\[
0 = \frac{d}{dt}\int_{\Phi_t(E)}\rho(y,t)\,dy.
\]
By using $\det\nabla\Phi_t = 1$, we find that
\[
0 = \frac{d}{dt}\int_E \rho(\Phi_t(x),t)\,dx = \int_E \big[(\partial_t\rho)(\Phi_t(x),t) + \nabla\rho(\Phi_t(x),t)\cdot v(x,t)\big]\,dx.
\]
By arbitrariness of $E$ and by $u(\Phi_t(x),t) = v(x,t)$, the transport equation¹
\[
\partial_t\rho + \nabla\rho\cdot u = 0 \qquad \text{on } \Omega \times [0,\infty) \tag{14.8}
\]
holds. If the Eulerian velocity $u$ is known, the transport equation, coupled with the initial condition $\rho(\cdot,0) = \rho_0$, allows one to determine the mass density of the fluid during the motion. Notice that if $\rho_0$ is constant, then $\rho \equiv \rho_0$ solves (14.8): in other words, it makes sense to study the motion of an incompressible fluid under the assumption that the mass density is constant throughout the motion. We also notice that, exactly under the incompressibility constraint $\mathrm{div}\,u = 0$, the transport equation (14.8) is equivalent to the PDE
\[
\partial_t\rho + \mathrm{div}\,(\rho\,u) = 0 \qquad \text{on } \Omega \times [0,\infty), \tag{14.9}
\]
known as the continuity equation (and playing a pivotal role in Chapter 15).

Equations for the velocity field: We now derive the equations of motion by exploiting the equivalence between the Newton laws of motion and the principle of least action. In the absence of external or friction/viscosity forces, the action between times $t_1$ and $t_2 > t_1$ of a motion $\Phi$ is defined as the total kinetic energy associated to the motion between time $t_1$ and time $t_2$, i.e.,
\[
\mathcal{A}(\Phi; t_1,t_2) = \frac{1}{2}\int_{t_1}^{t_2}dt\int_\Omega \rho(\Phi_t^{-1}(x),t)\,|v(x,t)|^2\,dx = \frac{1}{2}\int_{t_1}^{t_2}dt\int_\Omega \rho(y,t)\,|u(y,t)|^2\,dy. \tag{14.10}
\]
To formulate the principle of least action we need to introduce the notion of variation of an incompressible motion $\Phi$. We say that $\Psi$ is a variation of $\Phi$ over $(t_1,t_2)$ if $\Psi = \Psi(x,t,s) : \Omega \times [0,\infty) \times (-\varepsilon,\varepsilon) \to \Omega$ is a smooth function such that, setting $\Psi_t^s(x) = \Psi^s(x,t) = \Psi(x,t,s)$, $\Psi^s$ is an incompressible motion of $\Omega$ for every $|s| < \varepsilon$,
\[
\Psi^0 = \Phi \quad\text{and}\quad \Psi_{t_1}^s = \Phi_{t_1} \ \text{and}\ \Psi_{t_2}^s = \Phi_{t_2} \ \text{in } \Omega, \quad \forall |s| < \varepsilon. \tag{14.11}
\]
The principle of least action postulates that $\Phi$ is an actual motion of the fluid if and only if it is a critical point of $\mathcal{A}$, i.e., if for every variation $\Psi$ of $\Phi$ over $(t_1,t_2)$, we have
\[
\frac{d}{ds}\Big|_{s=0}\mathcal{A}(\Psi^s; t_1,t_2) = 0. \tag{14.12}
\]
¹ Every physical quantity conserved along the fluid motion will satisfy a transport equation analogous to (14.8).

Writing (14.12) just in terms of $\Phi$ leads to the Euler equations. We shall do this under the assumption that $\rho_0$ is constant, and thus, as discussed earlier, that $\rho \equiv \rho_0$ for every time $t \ge 0$. In this way,
\[
\frac{d}{ds}\,\mathcal{A}(\Psi^s; t_1,t_2) = \frac{d}{ds}\,\frac{\rho_0}{2}\int_{t_1}^{t_2}dt\int_\Omega |\partial_t\Psi(x,t,s)|^2\,dx = \rho_0\int_{t_1}^{t_2}dt\int_\Omega \partial_t\Psi\cdot\partial_{st}\Psi,
\]
so that, setting $X(x,t) = \partial_s\Psi(x,t,0)$ and noticing that $\partial_t\Psi(x,t,0) = \partial_t\Phi(x,t) = v(x,t)$, we find that (14.12) is equivalent to
\[
0 = \int_\Omega\int_{t_1}^{t_2}dt\; v\cdot\partial_t X = -\int_\Omega\int_{t_1}^{t_2}dt\; \partial_t v\cdot X, \tag{14.13}
\]
where we have used (14.11) to claim that $X(\cdot,t_1) = X(\cdot,t_2) = 0$. Now, should there be no constraint on the variations $\Psi$, then $X$ would be an arbitrary vector field, and (14.13) would imply that $\partial_t v = 0$; in other words, the fluid would move by inertial motion. This is the point where the isochoric nature of the fluid motion enters into play. The fact that $\Psi^s$ is an incompressible motion implies that $1 = \det\nabla\Psi_t^s$ in $\Omega$ for every $|s| < \varepsilon$ and $t \ge 0$. In particular, if we differentiate this identity in $s$ and set $s = 0$, thanks to $\Psi_t^0 = \Phi_t$ we find $\mathrm{trace}\big((\nabla\Phi_t)^{-1}\nabla X\big) = 0$, or equivalently, setting $Y(y,t) = X(\Phi_t^{-1}(y),t)$, $\mathrm{div}\,Y = 0$ on $\Omega \times [0,\infty)$. Therefore, changing variables in (14.13) we get
\[
0 = \int_{t_1}^{t_2}dt\int_\Omega (\partial_t v)(\Phi_t^{-1}(y),t)\cdot Y(y,t)\,dy, \tag{14.14}
\]
where $Y$ satisfies
\[
\begin{cases}
\mathrm{div}\,Y = 0, & \text{on } \Omega \times [0,\infty), \\
Y(\cdot,t_1) = Y(\cdot,t_2) = 0, & \text{on } \Omega, \\
Y\cdot\nu_\Omega = 0, & \text{on } \partial\Omega \times [0,\infty).
\end{cases} \tag{14.15}
\]
Vice versa, whenever $Y$ is a vector field such that (14.15) holds, then, setting $X(x,t) = Y(\Phi_t(x),t)$ and defining $\Psi$ as the flow in the $s$ variable generated by $X$, we obtain a variation $\Psi$ of $\Phi$ on $(t_1,t_2)$. Therefore (14.14) holds for every $Y$ satisfying (14.15). Let us now recall that if a locally integrable vector field $F : \Omega \to \mathbb{R}^n$ satisfies
\[
\int_\Omega F\cdot Z = 0,
\]
for every smooth test vector field $Z$ such that $\mathrm{div}\,Z = 0$ in $\Omega$ and $Z\cdot\nu_\Omega = 0$ on $\partial\Omega$, then the distributional curl of $F$ equals zero, and thus there exists a potential $f : \Omega \to \mathbb{R}$ such that $F = \nabla f$ in $\Omega$. Therefore, coming back to (14.14) we conclude that there exists $p = p(y,t) : \Omega \times [0,\infty) \to \mathbb{R}$ such that
\[
\frac{\partial v}{\partial t}(\Phi_t^{-1}(y),t) = -\nabla p(y,t), \qquad \forall y \in \Omega,\ t \ge 0. \tag{14.16}
\]
Finally, by differentiating in time $v(x,t) = u(\Phi_t(x),t)$, we notice that the acceleration of the fluid particle that at time $t = 0$ is at the position $x$ is
\[
\partial_t v(x,t) = \big(\nabla u[u] + \partial_t u\big)(\Phi_t(x),t).
\]
The action of the $n \times n$ matrix $\nabla u$ on the vector $u$ is commonly denoted by
\[
(u\cdot\nabla)u = \sum_{i=1}^n u_i\,\partial_i\Big(\sum_{j=1}^n u_j\,e_j\Big) = \sum_{i,j=1}^n \partial_i u_j\,u_i\,e_j = (\nabla u)[u],
\]
so that the acceleration of the particle that at time $t$ is in position $y$ is given by
\[
\big(\partial_t u + (u\cdot\nabla)u\big)(y,t) = \partial_t v(\Phi_t^{-1}(y),t).
\]
This last fact combined with (14.16), (14.3), and (14.7) leads us to the (constant mass density) Euler equations
\[
\begin{cases}
\partial_t u + (u\cdot\nabla)u = -\nabla p, \\
\mathrm{div}\,u = 0,
\end{cases} \qquad \text{in } \Omega \times [0,\infty), \tag{14.17}
\]
which are coupled with the boundary condition $u\cdot\nu_\Omega = 0$ on $\partial\Omega \times [0,\infty)$ and with the initial condition $u(\cdot,0) = u_0$ in $\Omega$. The function $p$ is called the pressure associated with the fluid motion defined by $u$. Based on our preliminary physical considerations, we would expect a smooth solution of (14.17) to conserve kinetic energy. And indeed, taking into account (14.16) we find
\[
\begin{aligned}
\frac{d}{dt}\,\frac{1}{2}\int_\Omega |v|^2 &= \int_\Omega v\cdot\partial_t v = \int_\Omega u(y,t)\cdot(\partial_t v)(\Phi_t^{-1}(y),t)\,dy \\
&= -\int_\Omega u(y,t)\cdot\nabla p(y,t)\,dy = \int_\Omega p\,\mathrm{div}\,u - \int_{\partial\Omega} p\,u\cdot\nu_\Omega = 0,
\end{aligned}
\]
so that kinetic energy is conserved by smooth solutions of the Euler equations.²

² Constructing solutions of the Euler equations that exist for every $t > 0$ and conserve kinetic energy (e.g., because they are smooth) is an extremely challenging problem. In the planar case $n = 2$ this is possible by a classical result of Yudovich, stating that if the vorticity $\omega = \nabla \times u$ is bounded at time $t = 0$, then there is a global-in-time solution, which is unique and conserves kinetic energy. In the physical case $n = 3$, and in higher dimensions, however, the situation is incredibly more complex and many basic questions are open. Our expectation that the motion of an ideal fluid (no friction, no external forces, just the pressure induced by the isochoric constraint) would consist of a never-ending smooth conservative motion clashes with the cold reality that we are actually able to construct weak (non-smooth) solutions of (14.17) which dissipate kinetic energy (even at a prescribed rate!) and violate uniqueness.
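A simple explicit solution may help in reading (14.17). On the unit disk of the plane, the rigid rotation $u(y) = (-y_2, y_1)$ with pressure $p(y) = |y|^2/2$ is a stationary solution: here $(u\cdot\nabla)u = -y = -\nabla p$, $\mathrm{div}\,u = 0$, and $u\cdot\nu_\Omega = 0$ on the unit circle. The sketch below checks these three facts by central differences; the choice of domain, of sample points, and all function names are ours and only serve as an illustration.

```python
import numpy as np

def u(y):
    """Stationary velocity field: rigid rotation about the origin."""
    return np.array([-y[1], y[0]])

def p(y):
    """Pressure balancing the centripetal acceleration of the rotation."""
    return 0.5 * (y[0] ** 2 + y[1] ** 2)

def jacobian(field, y, h=1e-6):
    """Central-difference Jacobian; entry (j, i) approximates d_i u_j."""
    return np.column_stack([(field(y + h * e) - field(y - h * e)) / (2 * h) for e in np.eye(2)])

def gradient(func, y, h=1e-6):
    """Central-difference gradient of a scalar function."""
    return np.array([(func(y + h * e) - func(y - h * e)) / (2 * h) for e in np.eye(2)])

def euler_residual(y):
    """(u . grad) u + grad p; the term d_t u vanishes because u is stationary."""
    return jacobian(u, y) @ u(y) + gradient(p, y)

def divergence(y):
    return np.trace(jacobian(u, y))

inner_point = np.array([0.3, -0.2])
boundary_point = np.array([np.cos(1.0), np.sin(1.0)])   # outer unit normal equals the point itself
print(euler_residual(inner_point), divergence(inner_point), u(boundary_point) @ boundary_point)
```

Being stationary, this solution trivially conserves kinetic energy, in agreement with the computation above.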
14.3 The Euler Equations as Geodesics Equations
Based on the above derivation from the principle of least action, the Euler equations (14.17) can be naturally interpreted, as originally proposed by Arnold, as the geodesics equation in the space $\mathcal{M}(\Omega)$ of isochoric transformations of $\Omega$. Indeed, constant speed minimizing geodesics in a Riemannian manifold $(M,g)$ (defined over an interval $(t_1,t_2)$) are minimizers (and thus satisfy the Euler–Lagrange equation) of the action functional
\[
\frac{1}{2}\int_{t_1}^{t_2}|\gamma'(t)|^2_{g_{\gamma(t)}}\,dt. \tag{14.18}
\]
Interpreting an admissible fluid motion $\Phi$ as a curve $\{\Phi_t\}_{t_1 \le t \le t_2}$ of isochoric transformations connecting $\Phi_{t_1}$ to $\Phi_{t_2}$, and using the $L^2(\Omega;\mathbb{R}^n)$-norm to measure the size of its velocity $v = \partial_t\Phi$, we can see that the action of $\Phi$ over $(t_1,t_2)$, which is defined as
\[
\mathcal{A}(\Phi; t_1,t_2) = \frac{1}{2}\int_{t_1}^{t_2}dt\int_\Omega \rho(\Phi_t^{-1}(x),t)\,|v(x,t)|^2\,dx = \frac{1}{2}\int_{t_1}^{t_2}dt\int_\Omega \rho(y,t)\,|u(y,t)|^2\,dy, \tag{14.19}
\]
can be understood (up to multiplicative constants) in analogy to the action functional (14.18) for a curve in a manifold. This suggests interpreting solutions to the Euler equations as geodesics in the space of isochoric transformations.

Inspired by this analogy, we now recall that given a manifold $M$ embedded in $\mathbb{R}^n$, there is a classical approach to the construction of geodesics in $M$ (equipped with the Riemannian metric induced by the embedding into $\mathbb{R}^n$), an approach that we may try to reproduce on the Euler equations. The idea goes as follows: given points $p_1$ and $p_2$ in $M$, we can compute their mid-point $(p_1 + p_2)/2$ in $\mathbb{R}^n$, and then project it back on $M$ (which is identified as a subset of $\mathbb{R}^n$), thus defining
\[
p_{12} = \text{projection on } M \text{ of } \frac{p_1 + p_2}{2}.
\]
(For this construction to make sense, we need of course $p_1$ and $p_2$ to be sufficiently close to each other, so that $(p_1 + p_2)/2$ lies in a neighborhood of $M$ where the projection on $M$ is well-defined.)
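The mid-point projection step, together with its repeated application to consecutive pairs of points, is easy to visualize on the simplest embedded manifold, the unit sphere of $\mathbb{R}^3$, where the projection is just the normalization $q \mapsto q/|q|$. The sketch below is an illustration under this assumption (the end-points and the number of refinement steps are arbitrary): on the sphere the projected mid-point is exactly the mid-point of the great-circle arc, so the points produced by the refinement are equally spaced along the geodesic.

```python
import numpy as np

def project_to_sphere(q):
    """Nearest-point projection onto the unit sphere (defined for q != 0)."""
    return q / np.linalg.norm(q)

def refine(points):
    """Insert the projected mid-point between each pair of consecutive points."""
    refined = [points[0]]
    for a, b in zip(points[:-1], points[1:]):
        refined.append(project_to_sphere(0.5 * (a + b)))
        refined.append(b)
    return refined

def midpoint_geodesic(p1, p2, steps):
    """Apply the mid-point projection refinement `steps` times, starting from the end-points."""
    points = [p1, p2]
    for _ in range(steps):
        points = refine(points)
    return np.array(points)

p1 = np.array([1.0, 0.0, 0.0])
p2 = project_to_sphere(np.array([0.0, 1.0, 1.0]))
curve = midpoint_geodesic(p1, p2, steps=5)      # 2**5 + 1 points on the sphere

# Angles between consecutive points; for a great-circle arc they are all equal.
cosines = np.clip(np.sum(curve[:-1] * curve[1:], axis=1), -1.0, 1.0)
angles = np.arccos(cosines)
print(len(curve), angles.min(), angles.max(), angles.sum())
```

Here $p_1$ and $p_2$ are orthogonal, so the arc has length $\pi/2$ and each of the $32$ consecutive angles is expected to be $\pi/64$.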
Iterating this basic step, we first project the mid-points between $p_1$ and $p_{12}$ and between $p_{12}$ and $p_2$ on $M$, and then continue indefinitely: at the $k$-th step of the construction, $O(2^k)$ points on $M$ have been identified, and we expect the probability measure associated to these points to converge toward a geodesic curve with end-points $p_1$ and $p_2$. We would now like to explore the idea of adapting this finite-dimensional construction to $\mathcal{M}(\Omega)$. It seems natural to consider $\mathcal{M}(\Omega)$ “embedded” in $L^2(\Omega;\mathbb{R}^n)$, since we are already using the $L^2(\Omega;\mathbb{R}^n)$-norm to measure tangent vectors $v = \partial_t\Phi$ in defining the action of a curve in $\mathcal{M}(\Omega)$. The projection problem is delicate, because $\mathcal{M}(\Omega)$ is evidently not closed in $L^2(\Omega;\mathbb{R}^n)$: however, the closure of $\mathcal{M}(\Omega)$ in $L^2(\Omega;\mathbb{R}^n)$ can be characterized³ as the set of all Borel measurable isochoric transformations of $\Omega$, i.e., as
\[
\mathcal{M}^*(\Omega) = \big\{T : \Omega \to \Omega : T \text{ is Borel measurable and } T_\#(\mathcal{L}^n \llcorner \Omega) = \mathcal{L}^n \llcorner \Omega\big\}.
\]
A celebrated theorem of Brenier shows that the projection operator of $L^2(\Omega;\mathbb{R}^n)$ onto $\mathcal{M}^*(\Omega)$ is well-defined, and can be described by composition with Brenier maps.

Theorem 14.1 (Brenier projection theorem) Let $\Omega$ be a bounded open set in $\mathbb{R}^n$, let $T \in L^2(\Omega;\mathbb{R}^n)$ be such that
T# (L n Ω) 0 L n -a.e. in Ω. Indeed, by the general form of the area formula 0 dy g(x) dH (x) = g JT, Ω
3
See [BG03, Corollary 1.1].
T −1 (y)
Ω
which holds for every Borel function g : Rn → [0, ∞], we see that T# (L n Ω)(E) = L n = Ω = Ω = E
Ω ∩ T −1 (E) = L n Ω ∩ T −1 (E) ∩ { JT > 0} 1 {JT >0}∩T −1 (E) (x) JT (x) dx JT (x) 1 {JT >0}∩T −1 (E) (x) dH 0 (x) dy JT (x) T −1 (y) 1 {JT >0} (x) dH 0 (x) = 0 dy JT (x) T −1 (y)
if L n (E) = 0. More generally, (14.20) holds whenever we can find a L n a.e. Borel partition {E j } j of Ω such that Lip(T; E j ) < ∞ and JT > 0 L n -a.e. on each E j . This kind of partition can be constructed, for example, when T ∈ BVloc (Ω; Rn ) and | det ∇T | > 0 L n -a.e. in Ω, where DT = ∇T d[L n Ω] + [DT]s is the usual Radon-Nikodym decomposition of DT with respect to L n Ω. Remark 14.3 Theorem 14.1 is commonly known 4 as Brenier’s polar factorization theorem, in reference to the formula T = (∇ f ∗ ) ◦ T0
L n -a.e. in Ω,
(14.23)
which allows us to rearrange a generic map T ∈ L 2 (Ω; Rn ) with T# (L n Ω) 0 on Rn × [0, 1], v ∈ Cc∞ (Rn ; Rn ) is such that
Action Minimization, Eulerian Velocities, and Otto’s Calculus
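$\mathrm{div}\,v = 0$ and $u$ transports $\boldsymbol{\mu}$, then for every $s \in \mathbb{R}$ we have that $u + s\,[v/\rho]$ transports $\boldsymbol{\mu}$, since
\[
\mathrm{div}\,\big(\rho\,(u + s\,[v/\rho])\big) = \mathrm{div}\,(\rho\,u) = -\partial_t\rho. \tag{15.7}
\]
The basic questions that now we want to address are (i) given a vector field $u$, how to generate curves of measures transported by $u$? (ii) given a curve of measures $\boldsymbol{\mu}$, how to find a vector field $u$ which transports $\boldsymbol{\mu}$? (iii) can $W_2$ be characterized in terms of minimization of the action defined in (15.5)? Before moving to the analysis of these questions, we make the following remark.

Proposition 15.1 Given a curve of measures $\boldsymbol{\mu}$ in $\mathcal{P}_2(\mathbb{R}^n)$ and a Borel vector field $u : \mathbb{R}^n \times [0,1] \to \mathbb{R}^n$ such that $u(t) = u(\cdot,t) \in L^2(\mu_t)$ for a.e. $t \in (0,1)$ and $\mathcal{A}(\boldsymbol{\mu},u) < \infty$, we have that $\boldsymbol{\mu}$ is transported by $u$ if and only if, for every $\varphi \in C_c^\infty(\mathbb{R}^n)$,
\[
\text{the map } t \in [0,1] \mapsto \int_{\mathbb{R}^n}\varphi\,d\mu_t \text{ is absolutely continuous on } [0,1] \tag{15.8}
\]
and for a.e. $t \in (0,1)$
\[
\frac{d}{dt}\int_{\mathbb{R}^n}\varphi\,d\mu_t = \int_{\mathbb{R}^n}\nabla\varphi\cdot u(t)\,d\mu_t. \tag{15.9}
\]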
Proof If $\boldsymbol{\mu}$ is transported by $u$, then testing (15.6) with $\psi(x,t) = \varphi(x)\,\zeta(t)$ with $\varphi \in C_c^\infty(\mathbb{R}^n)$ and $\zeta \in C^\infty([0,1])$ we find that
\[
\int_0^1 \zeta'(t)\,dt\int_{\mathbb{R}^n}\varphi\,d\mu_t = \zeta(t)\int_{\mathbb{R}^n}\varphi\,d\mu_t\,\Big|_{t=0}^{t=1} - \int_0^1 \zeta(t)\,dt\int_{\mathbb{R}^n}\nabla\varphi\cdot u(t)\,d\mu_t,
\]
from which it follows that $f(t) = \mu_t[\varphi]$ is absolutely continuous on $[0,1]$ with $f'(t) = \int_{\mathbb{R}^n}\nabla\varphi\cdot u(t)\,d\mu_t$ for a.e. $t \in (0,1)$, that is (15.9). Vice versa, if (15.8) and (15.9) hold, then (15.6) holds for every $\psi(x,t) = \varphi(x)\,\zeta(t)$ corresponding to $\varphi \in C_c^\infty(\mathbb{R}^n)$ and $\zeta \in C^\infty([0,1])$, and by the density of the span of this kind of test functions in $C_c^\infty(\mathbb{R}^n \times [0,1])$ we conclude the proof.

Not surprisingly, when working with curves of measures, one may need to discuss the differentiability of (15.8) for test functions $\varphi$ less regular than $C_c^\infty(\mathbb{R}^n)$. The following proposition (which will only be used here in the proof of Theorem 15.6, and can be safely skipped on a first reading) addresses the case when $\varphi$ is bounded and Lipschitz continuous. The delicate point here is that, since $\nabla\varphi$ may only exist $\mathcal{L}^n$-a.e. on $\mathbb{R}^n$, (15.9) may not even make sense if $\mu_t$ is singular with respect to $\mathcal{L}^n$. Nevertheless, (15.8) still holds, and (15.9) can be replaced by an estimate in terms of the asymptotic Lipschitz constant $|\nabla^*\varphi|$ of $\varphi$ (which, at variance with $\nabla\varphi$, has a precise geometric meaning at every point of $\mathbb{R}^n$).
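Before weakening the regularity of the test function, it may help to see identity (15.9) of Proposition 15.1 in a case where everything is explicit. In the sketch below, which is only an illustration, $\mu_0$ is an empirical measure on the real line (a hypothetical sample of 200 atoms), the vector field is $u(x) = -x$, whose flow is $x \mapsto x\,e^{-t}$, and $\mu_t$ is the push-forward of $\mu_0$ through this flow. The test function $\varphi(x) = e^{-x^2}$ is smooth and rapidly decaying rather than compactly supported; it is used here as a convenient stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
atoms0 = rng.standard_normal(200)      # atoms of mu_0, each carrying mass 1/200

def flow(x, t):
    """Flow of the vector field u(x) = -x."""
    return x * np.exp(-t)

def u(x):
    return -x

def phi(x):
    return np.exp(-x ** 2)

def dphi(x):
    return -2.0 * x * np.exp(-x ** 2)

def integral_phi(t):
    """Integral of phi against mu_t = (flow_t)# mu_0."""
    return np.mean(phi(flow(atoms0, t)))

def time_derivative(t, h=1e-5):
    """Central-difference approximation of d/dt of the integral of phi against mu_t."""
    return (integral_phi(t + h) - integral_phi(t - h)) / (2 * h)

def transport_term(t):
    """Integral of phi' * u against mu_t, the right-hand side of (15.9)."""
    atoms_t = flow(atoms0, t)
    return np.mean(dphi(atoms_t) * u(atoms_t))

for t in (0.1, 0.5, 0.9):
    print(t, time_derivative(t), transport_term(t))
```

The two printed columns are expected to agree up to the discretization error of the difference quotient.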
15.1 Eulerian Velocities and Action for Curves of Measures
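Proposition 15.2 If $\boldsymbol{\mu}$ is a curve of measures in $\mathcal{P}_2(\mathbb{R}^n)$ transported by a vector field $u$, then there is $I \subset [0,1]$ with $\mathcal{L}^1([0,1] \setminus I) = 0$ with the following property. If $\varphi : \mathbb{R}^n \to \mathbb{R}$ is bounded and Lipschitz continuous, then
\[
\Phi(t) = \int_{\mathbb{R}^n}\varphi\,d\mu_t
\]
is absolutely continuous on $[0,1]$, with
\[
|\Phi(t) - \Phi(s)| \le \int_t^s d\tau\int_{\mathbb{R}^n}|\nabla^*\varphi|\,|u(\tau)|\,d\mu_\tau, \qquad \forall (t,s) \subset (0,1), \tag{15.10}
\]
\[
\limsup_{h \to 0}\frac{|\Phi(t+h) - \Phi(t)|}{|h|} \le \int_{\mathbb{R}^n}|\nabla^*\varphi|\,|u(t)|\,d\mu_t, \qquad \forall t \in I. \tag{15.11}
\]
Here, $|\nabla^*\varphi| : \mathbb{R}^n \to [0,\infty)$ is the bounded, upper semicontinuous function called the asymptotic Lipschitz constant of $\varphi$, and defined by
\[
|\nabla^*\varphi|(x) = \lim_{r \to 0^+}\mathrm{Lip}(\varphi; B_r(x)) = \lim_{r \to 0^+}\ \sup_{y,z \in B_r(x),\ y \ne z}\frac{|\varphi(y) - \varphi(z)|}{|y - z|}. \tag{15.12}
\]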
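Definition (15.12) can be explored numerically on a one-dimensional example. The sketch below is an illustration, not part of the proposition: we take the bounded Lipschitz function $\varphi(x) = \min\{\max\{x,0\},1\}$ and approximate $\mathrm{Lip}(\varphi; B_r(x))$ by the largest difference quotient over a finite grid of points in $[x-r,x+r]$ (the grid size and the radii are arbitrary choices). For this $\varphi$ one expects $|\nabla^*\varphi| = 1$ on $[0,1]$, including the two corners where $\varphi$ is not differentiable, and $|\nabla^*\varphi| = 0$ elsewhere, in agreement with upper semicontinuity.

```python
import numpy as np

def phi(x):
    """Bounded Lipschitz test function with corners at 0 and 1."""
    return np.clip(x, 0.0, 1.0)

def local_lipschitz(func, x, r, m=201):
    """Largest difference quotient of func over a grid of m points in [x - r, x + r]."""
    ys = np.linspace(x - r, x + r, m)
    value_gaps = np.abs(func(ys)[:, None] - func(ys)[None, :])
    point_gaps = np.abs(ys[:, None] - ys[None, :])
    distinct = point_gaps > 0
    return float(np.max(value_gaps[distinct] / point_gaps[distinct]))

def asymptotic_lipschitz(func, x, radii=(1e-1, 1e-2, 1e-3)):
    """Local Lipschitz constants on shrinking balls; the last entry approximates |grad* func|(x)."""
    return [local_lipschitz(func, x, r) for r in radii]

for x in (-0.5, -0.05, 0.0, 0.5, 1.0, 1.5):
    print(x, asymptotic_lipschitz(phi, x))
```

At $x = -0.05$ the largest ball still reaches the region where $\varphi$ has slope one, while the smaller balls do not, which shows why the limit $r \to 0^+$ is needed in (15.12).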
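Proof Step one: If $\eta_k$ is a cut-off function between $B_k$ and $B_{k+1}$ and $v_\varepsilon = v \star \rho_\varepsilon$ denotes the $\varepsilon$-regularization of $v \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, then we find
\[
\Big|\Phi(t) - \int_{\mathbb{R}^n}(\eta_k\varphi)_\varepsilon\,d\mu_t\Big| \le \|\varphi\|_{C^0(\mathbb{R}^n)}\,\mu_t(\mathbb{R}^n \setminus B_{k+1}) + \|\eta_k\varphi - (\eta_k\varphi)_\varepsilon\|_{C^0(\mathbb{R}^n)}.
\]
By applying (15.9) to $(\eta_k\varphi)_\varepsilon \in C_c^\infty(\mathbb{R}^n)$, we thus find that for every $(s,t) \subset (0,1)$
\[
\begin{aligned}
|\Phi(s) - \Phi(t)| &= \lim_{k \to \infty}\lim_{\varepsilon \to 0^+}\Big|\int_{\mathbb{R}^n}(\eta_k\varphi)_\varepsilon\,d\mu_s - \int_{\mathbb{R}^n}(\eta_k\varphi)_\varepsilon\,d\mu_t\Big| \\
&= \lim_{k \to \infty}\lim_{\varepsilon \to 0^+}\Big|\int_t^s d\tau\int_{\mathbb{R}^n}\nabla[(\eta_k\varphi)_\varepsilon]\cdot u(\tau)\,d\mu_\tau\Big| \\
&\le \limsup_{k \to \infty}\ \limsup_{\varepsilon \to 0^+}\int_t^s d\tau\int_{B_{k+1}}|\nabla\varphi_\varepsilon|\,|u(\tau)|\,d\mu_\tau.
\end{aligned} \tag{15.13}
\]
Since, clearly, $|\nabla\varphi_\varepsilon(x)| \le \mathrm{Lip}(\varphi; B_\varepsilon(x))$ we immediately find
\[
\limsup_{\varepsilon \to 0^+}|\nabla\varphi_\varepsilon(x)| \le |\nabla^*\varphi|(x), \qquad \forall x \in \mathbb{R}^n.
\]
By combining this last estimate with Fatou’s lemma and (15.13) we conclude that
\[
|\Phi(s) - \Phi(t)| \le \limsup_{k \to \infty}\int_t^s d\tau\int_{B_{k+1}}|\nabla^*\varphi|\,|u(\tau)|\,d\mu_\tau
\]
so that (15.10) follows by monotone convergence.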
Step two: We prove the existence of $I \subset (0,1)$ with $\mathcal{L}^1((0,1) \setminus I) = 0$ such that if $t \in I$ and $f : \mathbb{R}^n \to \mathbb{R}$ is bounded and upper semicontinuous on $\mathbb{R}^n$, then
\[
\limsup_{h \to 0}\frac{1}{h}\int_t^{t+h}d\tau\int_{\mathbb{R}^n} f\,|u(\tau)|\,d\mu_\tau \le \int_{\mathbb{R}^n} f\,|u(t)|\,d\mu_t. \tag{15.14}
\]
(Then (15.11) will follow by taking $f = |\nabla^*\varphi|$.) To clarify the issue at hand, we first notice that, $f$ being bounded, the map $t \mapsto \int_{\mathbb{R}^n} f\,|u(t)|\,d\mu_t$ belongs to $L^1(0,1)$; thus, denoting by $I(f)$ the set of the Lebesgue points of this map, we see that (15.14) holds as an identity at every $t \in I(f)$. The goal is thus finding a set of good values of $t$ which is independent of $f$. To this end, let $\zeta_k$ be a smooth cut-off function between $B_k$ and $B_{k+1}$ ($k \in \mathbb{N}$), let $G$ be a countable dense set in $C_c^0(\mathbb{R}^n)$ (in the local uniform convergence), and let
\[
I = \bigcap_{k \in \mathbb{N}}\ \bigcap_{g \in G} I(\zeta_k\,g) \cap I(\zeta_k) \cap I(1 - \zeta_k),
\]
so that $\mathcal{L}^1((0,1) \setminus I) = 0$. If $t \in I$ and $f \in C_b^0(\mathbb{R}^n)$, then integrating
\[
\int_{\mathbb{R}^n} f\,|u(\tau)|\,d\mu_\tau \le \int_{\mathbb{R}^n}\Big[\zeta_k\,g + \|f - g\|_{C^0(B_{k+1})}\,\zeta_k + \|f\|_{C^0(\mathbb{R}^n)}\,(1 - \zeta_k)\Big]\,|u(\tau)|\,d\mu_\tau
\]
in $d\tau$ over $(t,t+h)$, dividing by $h$, and then letting $h \to 0$, we find that
\[
\limsup_{h \to 0}\frac{1}{h}\int_t^{t+h}d\tau\int_{\mathbb{R}^n} f\,|u(\tau)|\,d\mu_\tau \le \int_{\mathbb{R}^n}\Big[\zeta_k\,g + \|f - g\|_{C^0(B_{k+1})}\,\zeta_k + \|f\|_{C^0(\mathbb{R}^n)}\,(1 - \zeta_k)\Big]\,|u(t)|\,d\mu_t. \tag{15.15}
\]
Taking $\{g_j\}_j \subset G$ such that $g_j \to f$ locally uniformly on $\mathbb{R}^n$, and letting first $j \to \infty$ and then $k \to \infty$ in (15.15) with $g = g_j$, we deduce the validity of (15.14) whenever $f \in C_b^0(\mathbb{R}^n)$. If now $f$ is merely upper semicontinuous and bounded, then
\[
f_s(x) = \sup_{y \in \mathbb{R}^n}\Big\{f(y) - \frac{|x - y|^2}{2s}\Big\}, \qquad x \in \mathbb{R}^n,\ s > 0, \tag{15.16}
\]
is such that $f_s \in C_b^0(\mathbb{R}^n)$ (so that (15.14) holds for $f_s$ at every $t \in I$) with $f \le f_s \downarrow f$ on $\mathbb{R}^n$ as $s \to 0^+$ (so that (15.14) for $f_s$ implies (15.14) for $f$ at every $t \in I$). To prove the claimed properties of $f_s$, we notice that the supremum in (15.16) is always achieved. Denoting by $y_s(x)$ a maximum point, it must be
\[
|y_s(x) - x| \le \sqrt{2\,s\,\mathrm{osc}_{\mathbb{R}^n} f} \tag{15.17}
\]
for otherwise
\[
f(x) \le f_s(x) = f(y_s(x)) - \frac{|x - y_s(x)|^2}{2s} < f(y_s(x)) - \mathrm{osc}_{\mathbb{R}^n} f \le \inf_{\mathbb{R}^n} f,
\]
a contradiction. By (15.17), $y_s(x) \to x$ as $s \to 0^+$, so that $f(x) \le f_s(x) \le f(y_s(x))$ and the upper semicontinuity of $f$ imply $f_s \downarrow f$ on $\mathbb{R}^n$ as $s \to 0^+$. Finally, if $x, z \in \mathbb{R}^n$, then we have
\[
f_s(x) - f_s(z) \le f_s(x) - f(y_s(x)) + \frac{|z - y_s(x)|^2}{2s} \le \frac{|z - y_s(x)|^2}{2s} - \frac{|x - y_s(x)|^2}{2s},
\]
and by the elementary identity $|z - y|^2 - |x - y|^2 = |z - x|^2 + 2\,(y - x)\cdot(x - z)$ and (15.17) we conclude that
\[
f_s(x) - f_s(z) \le \frac{|z - x|^2}{2s} + \frac{|z - x|\,\sqrt{2\,\mathrm{osc}_{\mathbb{R}^n} f}}{\sqrt{s}}.
\]
By symmetry in $(x,z)$, we deduce the (local) Lipschitz continuity of $f_s$.
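The regularization (15.16) used in the last part of the proof can be tried out on a grid. In the sketch below, which is only an illustration, $f$ is the indicator function of a closed interval (a bounded upper semicontinuous function), the supremum in (15.16) is restricted to grid points, and we compare two values of $s$; the grid, the interval, and the parameters are hypothetical choices. The properties $f \le f_s$, monotonicity in $s$, and the Lipschitz-type bound obtained at the end of the proof can then be checked between neighboring grid points.

```python
import numpy as np

def sup_convolution(f_vals, grid, s):
    """f_s(x) = max over grid points y of f(y) - |x - y|^2 / (2 s), cf. (15.16)."""
    penalty = (grid[:, None] - grid[None, :]) ** 2 / (2.0 * s)
    return np.max(f_vals[None, :] - penalty, axis=1)

grid = np.linspace(-2.0, 2.0, 401)
f = np.where(np.abs(grid) <= 0.5, 1.0, 0.0)   # indicator of a closed interval
osc = f.max() - f.min()
dx = grid[1] - grid[0]

s_coarse, s_fine = 0.5, 0.05
f_coarse = sup_convolution(f, grid, s_coarse)
f_fine = sup_convolution(f, grid, s_fine)

def lipschitz_bound(step, s):
    """Right-hand side of the final estimate of the proof for |z - x| = step."""
    return step ** 2 / (2.0 * s) + step * np.sqrt(2.0 * osc / s)

print(f_fine.max(), f_coarse.max(), np.abs(np.diff(f_fine)).max(), lipschitz_bound(dx, s_fine))
```

As $s$ decreases the regularized function gets closer to $f$ from above, at the price of a larger Lipschitz constant.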
15.2 From Vector Fields to Curves of Measures
We now exploit some classical ODE theory to show how sufficiently regular vector fields can be used to construct curves of measures transported according to the continuity equation. The regularity assumptions used here are not so convenient (e.g., the use of Proposition 15.3 in proving the Benamou–Brenier formula requires a quite delicate approximation argument), nevertheless this result has been included here because of its clear conceptual interest.

Proposition 15.3 If $u : \mathbb{R}^n \times [0,1] \to \mathbb{R}^n$ is a Borel vector field such that
\[
\int_0^1 \|u(t)\|_{C^0(\mathbb{R}^n)} + \mathrm{Lip}(u(t); \mathbb{R}^n)\,dt = L < \infty, \tag{15.18}
\]
then the flow $\Phi : \mathbb{R}^n \times [0,1] \to \mathbb{R}^n$ of $u$, defined by
\[
\begin{cases}
\partial_t\Phi(x,t) = u(\Phi(x,t),t), \\
\Phi_0(x) = x,
\end{cases} \tag{15.19}
\]
is such that $\Phi_t : \mathbb{R}^n \to \mathbb{R}^n$ is a Lipschitz homeomorphism with
\[
\max\{\mathrm{Lip}\,\Phi_t, \mathrm{Lip}\,\Phi_t^{-1}\} \le e^L, \qquad \forall t.
\]
Moreover, if $\mu_0 \in \mathcal{P}_2(\mathbb{R}^n)$, then $\mu_t = (\Phi_t)_\#\mu_0$ defines a curve of measures $\boldsymbol{\mu}$ in $\mathcal{P}_2(\mathbb{R}^n)$ (with $\mu_t \in \mathcal{P}_{2,ac}(\mathbb{R}^n)$ if $\mu_0 \in \mathcal{P}_{2,ac}(\mathbb{R}^n)$) which is transported by $u$ and is such that
\[
\mathcal{A}(\boldsymbol{\mu},u) \ge \frac{1}{2}\,W_2(\mu_0,\mu_1)^2. \tag{15.20}
\]
Finally, if µ ∗ is a curve of measures in P2 (Rn ) transported by u and such that μ∗t 0). However: (16.15) cannot hold, since h strictly increasing and s > 0 give h(r + s + t) − h(r + s) > h(r + t) − h(r); (16.16) cannot hold, since h increasing, h strictly increasing and r + s > 0 give 2
A more geometric (and, possibly, a conceptually clearer) way to see this argument is that, by the properties of h, the difference between the right-hand side and left-hand side of (16.14) can be decreased by moving to the right min{x, y } (which is the left-most point in {x, x , y, y }), or by moving to the left max{x , y } (which is the right-most point of the four). One can do this up to a position where either x = x or y = y , and thus (16.14) becomes an identity.
Optimal Transport Maps on the Real Line
$h(r+s+t) = h(r+s+t) - h(r+s) + h(r+s) - h(s) + h(s) \ge h(r+s+t) - h(r+s) + h(s) > (h(t) - h(0)) + h(s) = h(t) + h(s)$; and, finally, (16.17) cannot hold, since $r > 0$ and $h$ is strictly increasing. We have thus proved that $x < x'$ and (16.14) imply $y \le y'$, as claimed.

Step three: We now assume that $\mu$ has no atoms, and set $I_\mu = \{0 < M_\mu < 1\}$ and $T = W_\nu \circ M_\mu$. To prove that $\mu$ is concentrated on $I_\mu$ it is enough to notice that, by (16.1), $\mu(\{x : M_\mu(x) \in \{0,1\}\}) = \mathcal{L}^1(\{0,1\}) = 0$. The set $I_\mu$ is an interval in general, and an open interval since $\mu$ has no atoms. Evidently $T$ is defined and increasing on $I_\mu$, while the fact that $T_\#\mu = \nu$ is immediate by combining (16.1) with (16.5) (applied to $\nu$). If $J$ is the smallest closed interval containing $\mathrm{spt}\,\nu$ and $T(x) < \inf J$ for some $x \in \{0 < M_\mu < 1\}$, then the interval $\{0 < M_\mu < M_\mu(x)\}$, which has positive $\mu$-measure, is mapped by $T$ into a $\nu$-null set; we rule out similarly that $T(x) > \sup J$ for some $x \in \{0 < M_\mu < 1\}$. This proves statement (iii). We begin the proof of statement (iv) by considering an increasing map $S : I_\mu \to \mathbb{R}$ such that $S_\#\mu = \nu$. For every $x \in I_\mu = (a,b)$ we have $(a,x] \subset \{y \in (a,b) : S(y) \le S(x)\}$, so that $S_\#\mu = \nu$ gives $M_\mu(x) \le \mu(\{y \in (a,b) : S(y) \le S(x)\}) = \nu((-\infty,S(x)]) = M_\nu(S(x))$. Since $W_\nu(M_\nu(y)) \le y$ for every $y \in \mathbb{R}$ (recall (16.4)), by definition of $T$, by the previous inequality, and since $W_\nu$ is increasing we find that
\[
T(x) = W_\nu(M_\mu(x)) \le W_\nu(M_\nu(S(x))) \le S(x), \qquad \forall x \in I_\mu. \tag{16.18}
\]
Now pick $x \in (a,b)$ such that, for some $\varepsilon_x > 0$, $T(x) \le S(x) - \varepsilon_x$. By applying $M_\nu$ to both sides of this inequality we find $M_\mu(x) \le M_\nu(S(x) - \varepsilon)$ for every $\varepsilon \in (0,\varepsilon_x)$, while $S^{-1}((-\infty,S(x) - \varepsilon)) \subset (-\infty,x)$ gives $M_\nu(S(x) - \varepsilon) \le M_\mu(x)$, and hence $M_\mu(x) = M_\nu(S(x) - \varepsilon)$ for every $\varepsilon \in (0,\varepsilon_x)$. This shows that if $x \in (a,b)$ is such that $T(x) < S(x)$, then there is an open interval $I_x$ such that $M_\nu$ is constantly equal to $M_\mu(x)$ on $I_x$. Evidently, there can be at most countably many disjoint intervals on which $M_\nu$ is constant, therefore $\{T < S\}$ is at most countable, and since $\mu$ has no atoms, we conclude that $T = S$ $\mu$-a.e. on $\mathbb{R}$. We complete the proof of statement (iv) by considering $\gamma \in \Gamma(\mu,\nu)$ such that $\mathrm{spt}\,\gamma$ is increasing, and proving that $\gamma = (\mathrm{id} \times T)_\#\mu$. Indeed, if we set
\[
Z := \big\{x \in \mathbb{R} : \text{there exist } y_1 \ne y_2 \text{ such that } (x,y_1), (x,y_2) \in \mathrm{spt}\,\gamma\big\},
\]
16.2 Optimal Transport on the Real Line
then to each $x \in Z$ we can associate an open interval $I_x$, with the property that if $x_1, x_2 \in Z$, $x_1 \ne x_2$, then $I_{x_1} \cap I_{x_2} = \emptyset$. Therefore $Z$ is at most countable, and thus $\mu$-negligible. We conclude that $\mu$ is concentrated on a set $E \subset \mathbb{R}$ such that for every $x \in E$ there is a unique $S(x) \in \mathbb{R}$ such that $(x,S(x)) \in \mathrm{spt}\,\gamma$, and such that if $x < x'$ ($x, x' \in E$), then $S(x) \le S(x')$. Since $\gamma$ is concentrated on $G = \{(x,S(x)) : x \in E\}$, we are in a position to exploit (16.6) to conclude that $\gamma = (\mathrm{id} \times S)_\#\mu$. As a consequence, $\nu = q_\#\gamma = S_\#\mu$, and thus by the first part of (iv), $S = T$ $\mu$-a.e. on $\mathbb{R}$, and thus $\gamma = (\mathrm{id} \times T)_\#\mu$. To prove statement (v) let us first consider $h$ as in statement (ii). If $c[h] \in L^1(\mu \times \nu)$, then there is an optimal transport plan $\gamma$ in $K_c(\mu,\nu)$, which is $c[h]$-cyclically monotone by Theorem 3.9, and thus such that $\mathrm{spt}\,\gamma$ is increasing thanks to statement (ii). Hence statement (iv) implies that $\gamma = \gamma_T = (\mathrm{id} \times T)_\#\mu$, and in particular that $K_c(\mu,\nu) = M_c(\mu,\nu)$ with $T$ optimal in $M_c(\mu,\nu)$. Now consider $h_\varepsilon(r) = \sqrt{\varepsilon^2 + r^2}$ and $c_\varepsilon(x,y) = h_\varepsilon(|x - y|)$. If $\mu, \nu \in \mathcal{P}_1(\mathbb{R})$, then $c_\varepsilon \in L^1(\mu \times \nu)$, and hence $\gamma_T$ is optimal in $K_{c_\varepsilon}(\mu,\nu)$. In particular, $\gamma_T$ is $c_\varepsilon$-cyclically monotone, a property which is easily seen to imply that $\gamma_T$ is $c_0$-cyclically monotone, where $c_0(x,y) = |x - y|$ is the linear transport cost. Hence $\gamma_T$ is optimal in $K_1(\mu,\nu)$, and $T$ is optimal in $M_1(\mu,\nu)$. Finally, to prove statement (vi) it is enough to check that when both $\mu$ and $\nu$ have no atoms, then $W_\mu \circ M_\nu$ is the inverse function of $W_\nu \circ M_\mu$. We omit the simple details of this proof.
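The optimality of increasing transport maps proved above has a transparent discrete counterpart, which we sketch here as an illustration only: the measures below are hypothetical empirical measures with equally weighted atoms (so they do have atoms, and the map $W_\nu \circ M_\mu$ of the text is replaced by the increasing matching obtained by sorting). For a small number of atoms one can compare the cost of the increasing matching with the minimum over all permutations, both for the quadratic and for the linear cost.

```python
import itertools
import numpy as np

def monotone_map(x, y):
    """Increasing matching of the atoms x_i to the atoms y_j: the k-th smallest x goes to the k-th smallest y."""
    order_x = np.argsort(x)
    images = np.empty_like(y)
    images[order_x] = np.sort(y)
    return images                      # images[i] is the image of x[i]

def cost(x, images, h):
    """Transport cost: average of h(|x_i - T(x_i)|) over the atoms."""
    return float(np.mean(h(np.abs(x - images))))

def brute_force_cost(x, y, h):
    """Minimum cost over all bijections between the atoms (feasible only for few atoms)."""
    n = len(y)
    return min(cost(x, y[list(perm)], h) for perm in itertools.permutations(range(n)))

rng = np.random.default_rng(2)
x = rng.standard_normal(6)             # atoms of mu
y = rng.standard_normal(6)             # atoms of nu
T = monotone_map(x, y)

quadratic = lambda r: r ** 2
linear = lambda r: r
for name, h in (("quadratic", quadratic), ("linear", linear)):
    print(name, cost(x, T, h), brute_force_cost(x, y, h))
```

For the linear cost the optimal matching need not be unique, but the increasing one always attains the minimum, in analogy with statement (v).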
17 Disintegration
The disintegration theorem (Theorem 17.1) is the second main ingredient in Sudakov’s strategy for the solution of the Monge problem, and can be introduced as a generalization of Fubini’s theorem. Given integers n = m + k, Fubini’s theorem provides a decomposition of L n as a superposition (i.e., as a sum/integral) of an m-dimensional family of k-dimensional measures concentrated on a partition (disjoint covering) of Rn by k-dimensional planes obtained as counter-images {p−1 (y)}y ∈Rm of the projection map p : Rn → Rm . Generalizing on this example, we consider the problem of decomposing a finite Borel measure μ on Rn as a superposition of probability measures μσ on Rn , concentrated on the counter-images P−1 (σ) of a Borel function P : Rn → S. The superposition operation is performed 1 through the Radon measure P# μ on S, and the resulting disintegration formula takes the form ϕ dμ = d(P# μ)(σ) ϕ dμσ ∀ϕ ∈ Cc0 (Rn ). (17.1) Rn
S
P −1 (σ)
In many applications one can simply take $S = \mathbb{R}^m$, but in our case we will need $S$ to be the space of (possibly unbounded) oriented segments in $\mathbb{R}^n$ (to be introduced in Section 18.1). For this reason, we will allow $S$ to be a locally compact and separable metric space. The chapter is organized as follows. In Section 17.1 we state the disintegration theorem (Theorem 17.1), relate it to Fubini's theorem and to the coarea formula, and make some other important remarks concerning the localization and reparametrization of disintegrations. In Section 17.2 we prove the disintegration theorem, and, finally, in Section 17.3 we discuss some interesting implications of the disintegration theorem concerning the relation between the notions of transport map and transport plan.

¹ The choice of $P_\# \mu$ here is essentially the only possible one; see Theorem 17.1-(ii).
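In the purely discrete case the disintegration formula (17.1) reduces to conditioning on the value of $P$, and it can be verified exactly. The following is a minimal sketch under that assumption: the measure `mu`, the map `P`, and the test function `phi` are hypothetical, and exact rational arithmetic is used so that both sides of (17.1) can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction


def disintegrate(mu, P):
    """Disintegrate a finite discrete measure mu = {point: mass} with respect to P.

    Returns the push-forward P#mu as {sigma: mass} and, for each sigma of positive
    mass, the probability measure mu_sigma concentrated on the fiber P^{-1}(sigma).
    """
    push = defaultdict(Fraction)
    for x, mass in mu.items():
        push[P(x)] += mass
    cond = defaultdict(dict)
    for x, mass in mu.items():
        sigma = P(x)
        if push[sigma] > 0:
            cond[sigma][x] = mass / push[sigma]
    return dict(push), dict(cond)


def integrate(measure, phi):
    """Integral of phi against a discrete measure {point: mass}."""
    return sum(phi(x) * mass for x, mass in measure.items())


# Hypothetical finite measure on points of Z^2 (total mass 2, not a probability).
mu = {
    (0, 0): Fraction(1, 2),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 2): Fraction(1, 8),
    (2, 2): Fraction(1),
}


def P(x):
    """Projection on the first coordinate; its fibers are vertical lines."""
    return x[0]


def phi(x):
    """An arbitrary test function."""
    return x[0] + 3 * x[1] ** 2


push, cond = disintegrate(mu, P)
lhs = integrate(mu, phi)
rhs = sum(push[s] * integrate(cond[s], phi) for s in push)
print(lhs, rhs)
```

Each `cond[s]` has total mass one and is supported in the fiber of `s`, which is the discrete counterpart of the requirement that $\mu_\sigma$ be a probability measure concentrated on $P^{-1}(\sigma)$.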
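17.1 Statement of the Disintegration Theorem and Examples

To formally state the disintegration theorem we need to introduce the notion of Borel measurable map with values in the space of Borel measures: given a Borel set $\Sigma$ in a locally compact and separable metric space $S$, and a collection of Borel measures $\{\mu_\sigma\}_{\sigma \in \Sigma}$, we say that the map $\sigma \in \Sigma \mapsto \mu_\sigma$ is Borel measurable if for every $E \in \mathcal{B}(\mathbb{R}^n)$, the map $\sigma \in \Sigma \mapsto \mu_\sigma(E)$ is Borel measurable; or, equivalently, if for every bounded Borel function $\varphi : \mathbb{R}^n \times S \to \mathbb{R}$, the map
\[
\sigma \in \Sigma \mapsto \int_{\mathbb{R}^n} \varphi(x, \sigma) \, d\mu_\sigma(x)
\]
is Borel measurable. Notice that whenever $\eta$ is a Radon measure concentrated on a Borel set $\Sigma \subset S$, and $\sigma \in \Sigma \mapsto \mu_\sigma$ is a Borel measurable map, then the formula
\[
\int_{\mathbb{R}^n} \varphi \, d[\mu_\sigma \otimes \eta] = \int_\Sigma d\eta(\sigma) \int_{\mathbb{R}^n} \varphi(x) \, d\mu_\sigma(x), \qquad \varphi \in C_c^0(\mathbb{R}^n),
\]
defines a Radon measure $\mu_\sigma \otimes \eta$ on $\mathbb{R}^n$.

Theorem 17.1 (Disintegration) Let $S$ be a locally compact and separable metric space, let $\mu$ be a finite Borel measure concentrated on a Borel set $E$, and let $P : E \to S$ be a Borel map. Then the following properties hold:

(i) there exists a Borel set $\Sigma \subset S$ of full $(P_\# \mu)$-measure such that for every $\sigma \in \Sigma$ there exists $\mu_\sigma \in \mathcal{P}(\mathbb{R}^n)$ such that $\mu_\sigma$ is concentrated on $P^{-1}(\sigma)$, $\sigma \in \Sigma \mapsto \mu_\sigma$ is Borel measurable, and
\[
\int_{\mathbb{R}^n} \varphi \, d\mu = \int_\Sigma d(P_\# \mu)(\sigma) \int_{\mathbb{R}^n} \varphi \, d\mu_\sigma, \tag{17.2}
\]
for every bounded Borel function $\varphi : \mathbb{R}^n \to \mathbb{R}$, i.e., $\mu = \mu_\sigma \otimes [P_\# \mu]$;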
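(ii) if $F$ is a Borel subset of $E$, $\eta$ is a finite Borel measure concentrated on a Borel set $\Sigma \subset S$, and for every $\sigma \in \Sigma$ there exists a finite Borel measure $\mu^*_\sigma$ such that $\sigma \in \Sigma \mapsto \mu^*_\sigma$ is a Borel measurable map, and
\[
\mu^*_\sigma(\mathbb{R}^n) > 0, \tag{17.3}
\]
\[
\mu^*_\sigma \text{ is concentrated on } F \cap P^{-1}(\sigma), \tag{17.4}
\]
\[
\int_F \varphi \, d\mu = \int_\Sigma d\eta(\sigma) \int_{\mathbb{R}^n} \varphi \, d\mu^*_\sigma, \tag{17.5}
\]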
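for every bounded Borel function $\varphi : \mathbb{R}^n \to \mathbb{R}$, then η 0 for L n -a.e. (x, y) ∈ E, we can apply (17.9) to $\varphi = (\rho/CP)\,\psi$ for an arbitrary Borel function $\psi : \mathbb{R}^n \to [0, \infty]$, to find that
\[
\int_{\mathbb{R}^n} \psi \, d\mu = \int_E \varphi \, CP = \int_{\mathbb{R}^m} dy \int_{P^{-1}(y)} \varphi \, d\mathcal{H}^{n-m}. \tag{17.10}
\]
Taking $\psi = 1_{P^{-1}(Y)}$ for an arbitrary Borel set $Y \subset \mathbb{R}^m$ we find
\[
(P_\# \mu)(Y) = \int_Y dy \int_{P^{-1}(y)} \frac{\rho}{CP} \, d\mathcal{H}^{n-m}, \qquad \text{i.e.,} \qquad P_\# \mu = \Big( \int_{P^{-1}(y)} \frac{\rho}{CP} \, d\mathcal{H}^{n-m} \Big) \, dy.
\]
Again by (17.10), for every Borel function $\psi : \mathbb{R}^n \to [0, \infty]$ we have
\[
\int_{\mathbb{R}^n} \psi \, d\mu = \int_{\mathbb{R}^m} d(P_\# \mu)(y) \, \frac{\int_{P^{-1}(y)} \psi \, (\rho/CP) \, d\mathcal{H}^{n-m}}{\int_{P^{-1}(y)} (\rho/CP) \, d\mathcal{H}^{n-m}},
\]
which we compare to (17.5): by arguing as in Remark 17.4, we also verify (17.3) and (17.4), and conclude by Theorem 17.1-(ii) that
\[
\mu_y = \frac{(\rho/CP) \, \mathcal{H}^{n-m} \llcorner [P^{-1}(y)]}{\int_{P^{-1}(y)} (\rho/CP) \, d\mathcal{H}^{n-m}}, \tag{17.11}
\]
for $(P_\# \mu)$-a.e. $y \in \mathbb{R}^m$. Identity (17.11) shows, in particular, that $\mu_y$ is absolutely continuous with respect to $\mathcal{H}^{n-m} \llcorner [P^{-1}(y)]$, a fact that will be crucially used in the solution of the Monge problem with linear cost (Theorem 18.1).

² If $A \in \mathbb{R}^m \otimes \mathbb{R}^n$ with $n \ge m$, the condition $\det(A A^*) > 0$ is equivalent to $\mathrm{rank}(A) = m$.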
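The description of $P_\# \mu$ as a measure with density given by a fiber integral can be tested numerically in a simple planar case. The following is a sketch under assumptions that are not taken from the text: $\mu$ is the Lebesgue measure on the unit square (so $\rho = 1$), $P(x, y) = x + y$, hence $CP = |\nabla P| = \sqrt{2}$ and the fiber over $t \in (0, 2)$ is a segment of length $\sqrt{2}\,\min\{t, 2 - t\}$. The fiber integral of $\rho/CP$ is then $\min\{t, 2 - t\}$, and the code compares its integral over $(0, t)$ with a direct grid estimate of $\mu(\{P \le t\})$.

```python
import math


def fiber_density(t):
    """Fiber integral of rho/CP for P(x, y) = x + y on the unit square with rho = 1.

    The fiber {x + y = t} meets the square in a segment of length sqrt(2) * min(t, 2 - t),
    and CP = |grad P| = sqrt(2) is constant.
    """
    if t <= 0.0 or t >= 2.0:
        return 0.0
    fiber_length = math.sqrt(2.0) * min(t, 2.0 - t)
    return fiber_length / math.sqrt(2.0)


def cdf_by_grid(t, n=400):
    """mu({P <= t}) estimated by counting cell midpoints of an n x n grid of the square."""
    h = 1.0 / n
    count = sum(
        1
        for i in range(n)
        for j in range(n)
        if (i + 0.5) * h + (j + 0.5) * h <= t
    )
    return count * h * h


def cdf_by_fibers(t, n=2000):
    """Integral of the fiber density over (0, t), computed with the midpoint rule."""
    h = t / n
    return sum(fiber_density((k + 0.5) * h) for k in range(n)) * h


for t in (0.5, 1.0, 1.7):
    print(t, cdf_by_grid(t), cdf_by_fibers(t))
```

The two columns agree up to the discretization error of the grid, which is of the order of the cell size.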
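Remark 17.6 (Reparameterizing a disintegration) With $E$, $\mu$, $P$, and $S$ as in the statement of Theorem 17.1, let us now consider another separable, locally compact metric space $T$ and let
\[
\eta : P(E) \to T \text{ be an injective Borel map.} \tag{17.12}
\]
Then, the Borel map $Q : E \to T$ defined by setting $Q = \eta \circ P$ is such that
\[
\mu^P_\sigma = \mu^Q_{\eta(\sigma)} \qquad \text{for } (P_\# \mu)\text{-a.e. } \sigma \in S, \tag{17.13}
\]
where $\{\mu^P_\sigma\}_\sigma$ and $\{\mu^Q_\tau\}_\tau$ denote the disintegrations of $\mu$ with respect to $P$ and $Q$ respectively. Indeed, since $Q = \eta \circ P$ gives $Q_\# \mu = \eta_\#(P_\# \mu)$, if $\varphi \in C_c^0(\mathbb{R}^n)$, then
\[
\int_{\mathbb{R}^n} \varphi \, d\mu = \int_T d(Q_\# \mu)(\tau) \int_{\mathbb{R}^n} \varphi \, d\mu^Q_\tau = \int_T d\big[\eta_\#(P_\# \mu)\big](\tau) \int_{\mathbb{R}^n} \varphi \, d\mu^Q_\tau = \int_S d(P_\# \mu)(\sigma) \int_{\mathbb{R}^n} \varphi \, d\mu^Q_{\eta(\sigma)},
\]
so that (17.5) holds with $F = E$, $\eta = P_\# \mu$ (thus $h = 1$) and $\mu^*_\sigma = \mu^Q_{\eta(\sigma)}$. Now let $\Sigma \subset S$ and $\Theta \subset T$ denote index sets for $\{\mu^P_\sigma\}_\sigma$ and $\{\mu^Q_\tau\}_\tau$. Since $P_\# \mu$ is concentrated on both $\eta^{-1}(\Theta)$ and $\Sigma$, it is also concentrated on $\eta^{-1}(\Theta) \cap \Sigma$. We thus deduce (17.3) since, if $\sigma \in \eta^{-1}(\Theta) \cap \Sigma$, then
\[
\mu^*_\sigma(\mathbb{R}^n) = \mu^Q_{\eta(\sigma)}(\mathbb{R}^n) = 1 \qquad (\text{since } \eta(\sigma) \in \Theta).
\]
Moreover, $\mu^*_\sigma = \mu^Q_{\eta(\sigma)}$ is concentrated on $Q^{-1}(\eta(\sigma)) = P^{-1}\big(\eta^{-1}(\eta(\sigma))\big) = P^{-1}(\sigma)$, where in the last identity we have used that $\eta$ is injective. In particular (17.4) holds too. We can thus deduce (17.13) by Theorem 17.1-(ii).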
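17.2 Proof of the Disintegration Theorem

Proof of Theorem 17.1 Step one: Denote by $p$ and $p^S$ the projections of $\mathbb{R}^n \times S$ onto $\mathbb{R}^n$ and $S$. We prove that if $\gamma$ is a finite Borel measure on $\mathbb{R}^n \times S$, concentrated on a Borel set $G$, then there is a Borel set $\Sigma \subset S$ of full $(p^S_\# \gamma)$-measure such that for every $\sigma \in \Sigma$ there is a probability measure $\gamma_\sigma$ on $\mathbb{R}^n$, concentrated on $\{x \in \mathbb{R}^n : (x, \sigma) \in G\}$, and such that, for every $\zeta \in C_c^0(\mathbb{R}^n \times S)$,
\[
\sigma \in \Sigma \mapsto \int_{\mathbb{R}^n} \zeta(x, \sigma) \, d\gamma_\sigma(x) \quad \text{is Borel measurable,}
\]
and
\[
\int_{\mathbb{R}^n \times S} \zeta \, d\gamma = \int_\Sigma d\big(p^S_\# \gamma\big)(\sigma) \int_{\mathbb{R}^n} \zeta(x, \sigma) \, d\gamma_\sigma(x). \tag{17.14}
\]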
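We start by noticing that for each $\varphi \in C_c^0(\mathbb{R}^n)$, the correspondence
\[
\psi \in C_c^0(S) \mapsto \ell_\varphi[\psi] = \int_{\mathbb{R}^n \times S} \varphi(x) \, \psi(\sigma) \, d\gamma(x, \sigma)
\]
defines a bounded linear functional
\[
\ell_\varphi : L^1(p^S_\# \gamma) \to \mathbb{R} \qquad \text{with} \qquad \|\ell_\varphi\|_{(L^1)^*} \le \|\varphi\|_{C^0(\mathbb{R}^n)}.
\]
By applying the Riesz theorem to each functional $\ell_\varphi$, we construct a linear map $m : C_c^0(\mathbb{R}^n) \to L^\infty(p^S_\# \gamma)$ so that
\[
\|m[\varphi]\|_{L^\infty(p^S_\# \gamma)} \le \|\varphi\|_{C^0(\mathbb{R}^n)} \qquad \forall \varphi \in C_c^0(\mathbb{R}^n),
\]
and
\[
\int_{\mathbb{R}^n \times S} \varphi(x) \, \psi(\sigma) \, d\gamma(x, \sigma) = \ell_\varphi[\psi] = \int_S m[\varphi](\sigma) \, \psi(\sigma) \, d\big(p^S_\# \gamma\big)(\sigma) \tag{17.15}
\]
for every $(\varphi, \psi) \in C_c^0(\mathbb{R}^n) \times C_c^0(S)$. Given $\varphi \in C_c^0(\mathbb{R}^n)$, let $\Sigma_\varphi$ be a set of full $(p^S_\# \gamma)$-measure in $S$ such that
\[
|m[\varphi](\sigma)| \le \|m[\varphi]\|_{L^\infty(p^S_\# \gamma)} \le \|\varphi\|_{C^0(\mathbb{R}^n)} \qquad \forall \sigma \in \Sigma_\varphi.
\]
If $\mathcal{F}$ is a countable dense subset of $C_c^0(\mathbb{R}^n)$ and $\Sigma = \bigcap \{\Sigma_\varphi : \varphi \in \mathcal{F}\}$, then $\Sigma$ is a set of full $(p^S_\# \gamma)$-measure in $S$ such that $|m[\varphi](\sigma)| \le \|\varphi\|_{C^0(\mathbb{R}^n)}$ for every $\sigma \in \Sigma$ and $\varphi \in \mathcal{F}$. Therefore the linear functional $\gamma_\sigma : C_c^0(\mathbb{R}^n) \to \mathbb{R}$ defined by setting $\gamma_\sigma[\varphi] = m[\varphi](\sigma)$ is such that
\[
\sup \big\{ \gamma_\sigma[\varphi] : \varphi \in \mathcal{F}, \ \|\varphi\|_{C^0(\mathbb{R}^n)} \le 1 \big\} \le 1, \qquad \forall \sigma \in \Sigma.
\]
In particular, again by the Riesz theorem, if $\sigma \in \Sigma$, then $\gamma_\sigma$ is a Radon measure on $\mathbb{R}^n$ with $\gamma_\sigma(\mathbb{R}^n) \le 1$, and, thanks to (17.15), with
\[
\int_{\mathbb{R}^n \times S} \varphi(x) \, \psi(\sigma) \, d\gamma(x, \sigma) = \int_\Sigma \psi(\sigma) \, d\big(p^S_\# \gamma\big)(\sigma) \int_{\mathbb{R}^n} \varphi(x) \, d\gamma_\sigma(x), \tag{17.16}
\]
for every $(\varphi, \psi) \in C_c^0(\mathbb{R}^n) \times C_c^0(S)$. By approximating $1_{\mathbb{R}^n}(x)$ and $1_S(\sigma)$ with $C_c^0$-functions $\varphi(x)$ and $\psi(\sigma)$, we deduce from (17.16) that
\[
\gamma(\mathbb{R}^n \times S) = \int_\Sigma \gamma_\sigma(\mathbb{R}^n) \, d(p^S_\# \gamma)(\sigma) \le p^S_\# \gamma(S) = \gamma(\mathbb{R}^n \times S),
\]
so that $\gamma_\sigma(\mathbb{R}^n) = 1$ for $(p^S_\# \gamma)$-a.e. $\sigma \in \Sigma$. Denoting by $C_c^0(\mathbb{R}^n) \otimes C_c^0(S)$ the subset of those $\zeta \in C_c^0(\mathbb{R}^n \times S)$ such that $\zeta(x, \sigma) = \varphi(x) \, \psi(\sigma)$ for $(\varphi, \psi) \in C_c^0(\mathbb{R}^n) \times C_c^0(S)$, and noticing that the span of $C_c^0(\mathbb{R}^n) \otimes C_c^0(S)$ is dense in $C_c^0(\mathbb{R}^n \times S)$, we see that (17.16) implies
\[
\int_{\mathbb{R}^n \times S} \zeta \, d\gamma = \int_\Sigma d\big(p^S_\# \gamma\big)(\sigma) \int_{\mathbb{R}^n} \zeta(x, \sigma) \, d\gamma_\sigma(x), \tag{17.17}
\]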
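for every $\zeta \in C_c^0(\mathbb{R}^n \times S)$. By approximating $1_{(\mathbb{R}^n \times S) \setminus G}(x, \sigma)$ with $C_c^0$ functions $\zeta(x, \sigma)$ in (17.17), we find
\[
0 = \int_\Sigma \gamma_\sigma\big( \{ x \in \mathbb{R}^n : (x, \sigma) \notin G \} \big) \, d\big(p^S_\# \gamma\big)(\sigma),
\]
so that for $(p^S_\# \gamma)$-a.e. $\sigma \in \Sigma$, $\gamma_\sigma$ is concentrated on $\{x \in \mathbb{R}^n : (x, \sigma) \in G\}$.

Step two: We now let $\mu$ be a finite Borel measure on $\mathbb{R}^n$, concentrated on a Borel set $E$, and given a Borel map $P : E \to S$, we consider the Borel map $g : E \to \mathbb{R}^n \times S$ defined by setting $g(x) = (x, P(x))$, $x \in E$. We apply the claim proved in step one to the Radon measure $\gamma = g_\# \mu$, which is concentrated on the graph $G = \{(x, \sigma) \in E \times S : \sigma = P(x)\}$ of $P$ over $E$, and is such that
\[
p_\# \gamma = \mu, \qquad p^S_\# \gamma = P_\# \mu. \tag{17.18}
\]
Hence, for $(p^S_\# \gamma) = (P_\# \mu)$-a.e. $\sigma \in S$, there is $\gamma_\sigma \in \mathcal{P}(\mathbb{R}^n)$, concentrated on $\{x \in E : (x, \sigma) \in G\} = P^{-1}(\sigma)$, and such that (17.14) holds. By applying (17.14) to $\zeta(x, \sigma) = \varphi(x)$, $\varphi \in C_c^0(\mathbb{R}^n)$ (an approximation argument using the finiteness of $\gamma$ is needed here, since $\zeta$ does not have compact support) and by (17.18),
\[
\int_{\mathbb{R}^n} \varphi \, d\mu = \int_{\mathbb{R}^n \times S} \zeta \, d\gamma = \int_S d\big(p^S_\# \gamma\big)(\sigma) \int_{\mathbb{R}^n} \zeta(x, \sigma) \, d\gamma_\sigma = \int_S d(P_\# \mu)(\sigma) \int_{\mathbb{R}^n} \varphi(x) \, d\gamma_\sigma(x).
\]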
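Therefore $\mu_\sigma = \gamma_\sigma$ has the required properties.

Step three: Now let $F$, $\eta$, $\Sigma$, and $\mu^*_\sigma$ be as in (ii). If $\Omega$ is a $(P_\# \mu)$-negligible Borel set in $S$, then $P^{-1}(\Omega)$ is a $\mu$-negligible Borel set in $\mathbb{R}^n$. By applying (17.5) to $\varphi = 1_{P^{-1}(\Omega)}$, and recalling that $\mu^*_\sigma(P^{-1}(\sigma)) = \mu^*_\sigma(\mathbb{R}^n)$, we find that
\[
0 = \int_\Sigma \mu^*_\sigma\big(P^{-1}(\Omega)\big) \, d\eta(\sigma) = \int_\Sigma \mu^*_\sigma\big(P^{-1}(\sigma) \cap P^{-1}(\Omega)\big) \, d\eta(\sigma) = \int_{\Sigma \cap \Omega} \mu^*_\sigma(\mathbb{R}^n) \, d\eta(\sigma).
\]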
In particular, since μ∗σ (Rn ) > 0 for every σ ∈ Σ , and η is concentrated on Σ , we find η(Ω) = 0, thus proving η 0, we can tune in the parameter ε > 0 used in the construction so to obtain that T takes values in Iδ (q(spt γ)), and thus that spt γT ⊂ spt μ × spt (T# μ) ⊂ Iδ (spt γ) = {y ∈ Rn : dist(y, spt γ) ≤ δ}. Coming to the criterion for equality in (3.9), we have the following general result by Ambrosio: 4 if μ, ν ∈ P (Rn ), μ has not atoms, c : Rn → [0, ∞) is continuous, and Kc (μ, ν) < ∞, then Kc (μ, ν) = Mc (μ, ν). Here we limit ourselves to present a proof in the case when μ and ν have compact supports and c is locally Lipschitz continuous (in particular, the linear and quadratic costs are admissible here).
4
See [Amb03, Theorem 2.1].
Theorem 17.11 (Equality of Monge's and Kantorovich's transport costs) If $\mu, \nu \in \mathcal{P}(\mathbb{R}^n)$ have compact supports, $\mu$ has no atoms, and $c : \mathbb{R}^n \times \mathbb{R}^n \to [0, \infty)$ is locally Lipschitz continuous, then $K_c(\mu, \nu) = M_c(\mu, \nu)$.

Proof Thanks to (3.9) we only need to prove that $K_c(\mu, \nu) \ge M_c(\mu, \nu)$. Setting $K = \mathrm{spt}\,\mu \times \mathrm{spt}\,\nu$, and given $i \in \mathbb{N}$, by applying Remark 17.10 to $\mu$, $\nu$, and $\gamma$ an optimal plan in $K_c(\mu, \nu)$, we can find Borel maps $T_1^i : \mathbb{R}^n \to \mathbb{R}^n$ such that $(T_1^i)_\# \mu$ has no atoms, $T_1^i$ takes values in a $2^{-i}$-neighborhood of $q(\mathrm{spt}\,\gamma) \subset q(K)$, and the plan $(\mathrm{id} \times T_1^i)_\# \mu$, whose second marginal satisfies $q_\# (\mathrm{id} \times T_1^i)_\# \mu = (T_1^i)_\# \mu$, is sufficiently close to $\gamma$ in weak-star convergence to entail that
\[
\Big| K_c(\mu, \nu) - \int_{\mathbb{R}^n \times \mathbb{R}^n} c \, d\big[(\mathrm{id} \times T_1^i)_\# \mu\big] \Big| < \frac{1}{2^i}, \qquad K_1\big(\nu, (T_1^i)_\# \mu\big) < \frac{1}{2^i}. \tag{17.22}
\]
Here we have used that $K_c(\mu, \nu) = \int_{\mathbb{R}^n \times \mathbb{R}^n} c \, d\gamma$ and that $c$ is bounded on
\[
\mathrm{spt}\,\big[(\mathrm{id} \times T_1^i)_\# \mu\big] \subset I_1(K) = \big\{ \mathrm{dist}(\cdot, K) \le 1 \big\},
\]
as well as the fact that, if $\lambda_i \rightharpoonup^* \lambda$ in $\mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n)$ with uniformly bounded supports, then $K_1(q_\# \lambda_i, q_\# \lambda) \to 0$ as $i \to \infty$ (compare with Theorem 11.10). If we had $(T_1^i)_\# \mu = \nu$, we could easily conclude the proof by letting $i \to \infty$. We therefore start an iterative application of Remark 17.10 aimed at decreasing the size of $K_1(\nu, (T_1^i)_\# \mu)$. Since $(T_1^i)_\# \mu$ has no atoms, we can apply Remark 17.10 to $(T_1^i)_\# \mu$, $\nu$, and $\gamma_1^i$ an optimal plan⁵ in $K_1((T_1^i)_\# \mu, \nu)$. We can thus find a Borel map $T_2^i : \mathbb{R}^n \to \mathbb{R}^n$ such that, setting $S_1^i = T_1^i$ and $S_2^i = T_2^i \circ T_1^i$, then $(S_2^i)_\# \mu = (T_2^i)_\# [(T_1^i)_\# \mu]$ has no atoms, $S_2^i$ takes values in a $2^{-(i+1)}$-neighborhood of $q(\mathrm{spt}\,\gamma_1^i)$, and the map-induced plan $(\mathrm{id} \times T_2^i)_\# ((T_1^i)_\# \mu) = (S_1^i \times S_2^i)_\# \mu$, which has second marginal $q_\# (S_1^i \times S_2^i)_\# \mu = (S_2^i)_\# \mu$,

⁵ Notice carefully that $\gamma_1^i$ is taken optimal in $K_1$, and not in $K_c$. While, obviously, the starting plan $\gamma$ had to be taken optimal in $K_c$, in this iteration we work with $K_1$ since we aim at reducing the size of $K_1(\nu, (T_1^i)_\# \mu)$. We can do this while keeping track of $c$-transport costs since $c$ is locally Lipschitz continuous.
17.4 Kc = Mc for Nonatomic Origin Measures
satisfies K1 ν, (S2i )# μ
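0 we find
\[
\begin{aligned}
M_c(\mu, \nu) &\le \int_{\mathbb{R}^n} c(x, S^i(x)) \, d\mu(x) \le \int_{\mathbb{R}^n} c(x, S^i_{j+1}(x)) \, d\mu(x) + L \, \delta^i_{j+1} \\
&\le \int_{\mathbb{R}^n} c(x, T_1^i(x)) \, d\mu(x) + L \sum_{\ell=1}^{j} \int_{\mathbb{R}^n} |S^i_{\ell+1} - S^i_\ell| \, d\mu + L \, \delta^i_{j+1} \\
&\le \int_{\mathbb{R}^n \times \mathbb{R}^n} c \, d\big[(\mathrm{id} \times T_1^i)_\# \mu\big] + C(L) \, 2^{-i} + L \, \delta^i_{j+1} \le K_c(\mu, \nu) + C(L) \, 2^{-i} + L \, \delta^i_{j+1},
\end{aligned}
\]
where in the last inequality we have used (17.22) (which exploited the optimality of $\gamma$ in $K_c(\mu, \nu)$). We let $j \to \infty$ and then $i \to \infty$ to conclude.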
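The role of the no-atoms assumption on $\mu$ in Theorem 17.11 can be seen in the smallest possible example: a single atom cannot be split by a map, so there may be no transport map at all, while transport plans always exist. The sketch below uses hypothetical finitely supported measures and helper names; it enumerates all maps between the supports and keeps those with $T_\# \mu = \nu$.

```python
from fractions import Fraction
from itertools import product


def pushforward(mu, T):
    """Push-forward of a discrete measure mu = {point: mass} through a map T given as a dict."""
    out = {}
    for x, mass in mu.items():
        out[T[x]] = out.get(T[x], 0) + mass
    return out


def transport_maps(mu, nu):
    """All maps T from spt(mu) to spt(nu) such that T#mu = nu (finite supports only)."""
    xs, ys = list(mu), list(nu)
    maps = []
    for images in product(ys, repeat=len(xs)):
        T = dict(zip(xs, images))
        if pushforward(mu, T) == nu:
            maps.append(T)
    return maps


# mu is a single atom at 0, nu splits its mass between -1 and 1.
mu = {0: Fraction(1)}
nu = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
maps_from_atom = transport_maps(mu, nu)

# The plan sending half of the atom to each target is admissible in the Kantorovich problem.
plan = {(0, -1): Fraction(1, 2), (0, 1): Fraction(1, 2)}
plan_cost = sum(abs(x - y) * mass for (x, y), mass in plan.items())

# With two equal atoms on each side, transport maps do exist.
mu2 = {0: Fraction(1, 2), 3: Fraction(1, 2)}
maps_from_two_atoms = transport_maps(mu2, nu)

print(maps_from_atom, plan_cost, len(maps_from_two_atoms))
```

In the first case the Monge problem has no competitors while the Kantorovich cost with linear cost equals 1; in the second case both bijections between the supports are transport maps.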
18 Solution to the Monge Problem with Linear Cost
This chapter is devoted to the proof of the following theorem concerning the existence of solutions to the Monge problem with linear cost, namely,
\[
M_1(\mu, \nu) = \inf \Big\{ \int_{\mathbb{R}^n} |T(x) - x| \, d\mu(x) : T_\# \mu = \nu \Big\}, \qquad \mu, \nu \in \mathcal{P}_1(\mathbb{R}^n). \tag{18.1}
\]
Theorem 18.1 (Sudakov theorem) If μ, ν ∈ P1 (Rn ) and μ 0} intersects {y : y (1) = 0} when t = t(x) is given by √ −x (1) ∈ 0, 2 ε ; t(x) = (1) τG (x)
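the corresponding map $t : E \to [0, \sqrt{2}\,\varepsilon]$ is Lipschitz continuous on $E$, since properties (b1–3) give
\[
\begin{aligned}
|t(x_1) - t(x_2)| &= \Big| \frac{x_1^{(1)}}{\tau_G(x_1)^{(1)}} - \frac{x_2^{(1)}}{\tau_G(x_2)^{(1)}} \Big| \le \sqrt{2} \, |x_1^{(1)} - x_2^{(1)}| + \varepsilon \, \Big| \frac{1}{\tau_G(x_1)^{(1)}} - \frac{1}{\tau_G(x_2)^{(1)}} \Big| \\
&\le \sqrt{2} \, |x_1 - x_2| + \varepsilon \, \max_{i = 1, 2} \Big( \frac{1}{\tau_G(x_i)^{(1)}} \Big)^2 |\tau_G(x_1) - \tau_G(x_2)| \le (\sqrt{2} + 2 \, \varepsilon \, L) \, |x_1 - x_2|,
\end{aligned}
\]
and then $Q$ is Lipschitz continuous on $E$ since it satisfies
\[
Q(x) = x + t(x) \, \tau_G(x) = x - \frac{x^{(1)}}{\tau_G(x)^{(1)}} \, \tau_G(x), \qquad \forall x \in E. \tag{18.35}
\]
Let us now denote by $\{\mu^*_y\}_y$ the disintegration of $\mu$ with respect to $Q$. We claim that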
μ∗y 0 L n -a.e. on E. Now, (CQ) ˆ is the sum of the squares of the (n − 1) × (n − 1)-minors of ∇Q, in particular CQˆ ≥ det ∇ xˆ Qˆ , where ∇ xˆ denotes the gradient along the variables (x (2) , . . . , x (n) ). From the second identity in (18.35), since ∇ xˆ x (1) = 0, ∇ xˆ xˆ = Id (n−1)×(n−1) and τG is Lipschitz continuous on E, we find that
18.2 Construction of the Sudakov Maps
x (1) x (1) τ ∇ + τG (x) ⊗ ∇ xˆ τG(1) , x ˆ G τG (x) (1) [τG (x) (1) ]2 (18.37) (b1–3) we have Lip(τ ; E) ≤ L for L n -a.e. on x ∈ E. Since, by assumptions G √ as well as |x (1) | ≤ ε and |τG(1) | ≥ 1/ 2 on E, we conclude from (18.37) that ˆ ∇ xˆ Q(x) = Id (n−1)×(n−1) −
∇ xˆ Q(x) ˆ − Id (n−1)×(n−1)
≤ C(n, L) ε, and thus that det ∇ xˆ Qˆ ≥ 1 − C(n, L) ε, where C(n, L) denotes a suitable constant depending on n and L only. Taking ε small enough in terms of n and L we complete the proof of (18.36), thus of step one. Step two: We complete the proof of (18.32) (under the sole assumptions (a) and (b)) by reducing to the situation of step one. To this aim, we consider the following partitioning process: first, we use assumption (b) to introduce a countable Borel partition {E j } j of E (modulo a Lebesgue negligible set) such that Lip(τG , E j ) ≤ j ∈ N; second, we pick a finite set of directions {νk }k ⊂ that for every τ ∈ Sn−1 there is at least one k such Sn−1 with the property √ that τ · νk ≥ 1/ 2, and then we further subdivide each E j into Borel sets E j k such that 1 ∀x ∈ E j k ; τG (x) · νk ≥ , 2 obviously the sets E j k are Borel sets since the map x → τG (x) · νk is Borel measurable (assumption (a)); third, given > 0 and setting Σ( ) = {σ ∈ S ◦ : −1 (Σ( )) is H 1 (Φ(σ)) > }, we notice that Σ( ) is open in S ◦ , and thus that PG n a Borel subset of R (assumption (a) again): in particular, we can construct a Borel partition {Ejkm }m of E j k so that PG (x) ∈ Σ(1/m),
∀x ∈ Ejkm ;
fourth, for a positive ε j to be chosen later on in terms of n and j (≥ Lip(τG , E j )), we further subdivide each Ejkm into Borel sets of the form 1 Ejkmh =Ejkm ∩ Zjkmh , Zjkmh = y ∈ Rn : α h − min ε j , √ ≤ y · νk ≤ α h 2m corresponding to a choice of {α h }h ⊂ R such that the stripes Zjkmh cover the whole Rn . Finally, we notice that, up to a rotation taking νk into e1 , and a translation taking {y : y (1) = α h } into {y : y (1) = 0}, the set Ejkmh satisfies assumptions (b1–4) with L = j, ε = ε j , and = 1/m. We can thus apply jkmh step one to deduce that, if { μσ }σ denotes the disintegration of μEjkmh with respect to (PG )| Ejkmh , then jkmh
μσ
0, B r (x 0 )∩spt μ σ 0
up to further decrease the value of r. Since σh → σ0 in S ◦ as h → ∞, we have reached a contradiction with (18.49).
18.3 Kantorovich Potentials Are Countably $C^{1,1}$-Regular

We now prove that, for every 1-Lipschitz function $f$, $\nabla f$ has the countable Lipschitz property on the transport set with endpoints $T_c(f)$, and then use this information to discuss the validity of the assumptions of Theorem 18.7 when $G \subset G(f) = \{(x, y) : f(y) = f(x) + |x - y|\}$.

Theorem 18.9 (Countable $C^{1,1}$-regularity of the Kantorovich potentials) If $f : \mathbb{R}^n \to \mathbb{R}$ is a 1-Lipschitz function with set of differentiability points $F$, then
\[
\nabla f \text{ has the countable Lipschitz property on } F \cap T_c(f). \tag{18.51}
\]
Moreover, if $G \subset G(f)$ is such that $T(G)$, $T_c(G)$, and $P_G : T(G) \to S^\circ$ are Borel measurable, then the following properties hold:

(i) $G$ satisfies the non-crossing condition;

(ii) for every $x \in E = F \cap T_c(G)$, there is a unique $\sigma \in \Theta_{\max}(G)$ with $x \in \Phi(\rho_c(\sigma))$, the corresponding extensions of $P_G$ and $\tau_G$ from $T(G)$ to $E$ are both Borel measurable, and $\tau_G = (\nabla f)|_E$ has the countable Lipschitz property on $E$;

(iii) $\mathcal{L}^n(T_c(G) \setminus T(G)) = 0$.

Proof We first prove that $\nabla f$ has the countable Lipschitz property on $T_\ell(f) \cap F$; then, by a symmetric argument, the same will hold on $T_r(f) \cap F$, and thus on $T_c(f) \cap F$. Given $e \in \mathbb{S}^{n-1}$ and $\alpha \in \mathbb{R}$, we consider the open half-space $H_{e,\alpha} = \{x \in \mathbb{R}^n : x \cdot e < \alpha\}$, the set $Y_{e,\alpha}$ of right endpoints of transport rays of $f$ that lie outside of $H_{e,\alpha}$,
\[
Y_{e,\alpha} = \big\{ y \in \mathbb{R}^n : y \cdot e \ge \alpha, \ \exists\, x \ne y \text{ s.t. } (x, y) \in G(f) \big\},
\]
and the set
\[
E_{e,\alpha} = F \cap H_{e,\alpha} \cap \bigcup \big\{ [[x, y[[ \ : (x, y) \in G(f), \ y \in Y_{e,\alpha} \big\},
\]
obtained by intersecting $F \cap H_{e,\alpha}$ with the union of the left-closed/right-open segments defined by $G(f)$ having their right end-point in $Y_{e,\alpha}$. Since $G' = \{(x, y) \in G(f) : y \in Y_{e,\alpha}\}$ is closed in $\mathbb{R}^n \times \mathbb{R}^n$, by Proposition 18.5 we have that $E_{e,\alpha} = F \cap H_{e,\alpha} \cap T_\ell(G')$ is a Borel set for every choice of $e$ and $\alpha$. Moreover, by using countably many choices of $e$ and $\alpha$, we can cover $F \cap T_\ell(f)$ with sets of the form $E_{e,\alpha}$. We have thus reduced to proving that $\nabla f$ has the countable Lipschitz property on $E_{e,\alpha}$ for fixed $e \in \mathbb{S}^{n-1}$ and $\alpha \in \mathbb{R}$. To this end, let us define $g_{e,\alpha} : \mathbb{R}^n \to \mathbb{R}$ by setting
\[
g_{e,\alpha}(z) = \sup \big\{ f(y) - |z - y| : y \in Y_{e,\alpha} \big\}, \qquad z \in \mathbb{R}^n.
\]
Clearly, $g_{e,\alpha}$ is a Lipschitz function with $\mathrm{Lip}(g_{e,\alpha}) \le 1$; in addition to that,
\[
g_{e,\alpha} \le f \text{ on } \mathbb{R}^n \quad \text{with} \quad g_{e,\alpha} = f \text{ on } E_{e,\alpha}, \tag{18.52}
\]
\[
\forall \beta < \alpha \text{ there is } C(\alpha, \beta) > 0 \text{ s.t. } z \mapsto g_{e,\alpha}(z) + C(\alpha, \beta) \, |z|^2 \text{ is convex on } H_{e,\beta}. \tag{18.53}
\]
To prove (18.52): since $f(y) - |z - y| \le f(z)$ for every $z, y \in \mathbb{R}^n$ we obviously have $g_{e,\alpha} \le f$ on $\mathbb{R}^n$; while if $z \in E_{e,\alpha}$, then $z \in [[x, y[[$ for some $(x, y) \in G(f)$ with $x \ne y$ and $y \in Y_{e,\alpha}$, so that $z = y + t(x - y)$ for some $t \in (0, 1]$, and thanks to $(x, y) \in G(f)$ we find
\[
f(z) = f(y + t(x - y)) = f(y) - t \, |x - y| = f(y) - |z - y| \le g_{e,\alpha}(z) \le f(z).
\]
To prove (18.53): Given $y \in Y_{e,\alpha}$, we have that $|z - y| \ge \alpha - \beta$ for every $z \in H_{e,\beta}$: hence, the second derivatives of $z \mapsto |z - y|$ are uniformly bounded by $C(n)/(\alpha - \beta)$ on $z \in H_{e,\beta}$, and therefore
\[
z \in H_{e,\beta} \mapsto f(y) - |z - y| + C \, |z|^2
\]
is convex on $H_{e,\beta}$ provided $C$ is large enough depending on $n$, $\alpha$ and $\beta$ only. We conclude since a supremum of convex functions is a convex function. By Theorem 7.1 (gradients of convex functions have locally bounded variation), by (18.53), and with $C$ large enough depending on $n$, $\alpha$ and $\beta$, we have
\[
\forall \beta < \alpha, \qquad \nabla g_{e,\alpha} + 2 \, C \, \mathrm{Id} \in BV_{\mathrm{loc}}(H_{e,\beta}; \mathbb{R}^n),
\]
so that $\nabla g_{e,\alpha} \in BV_{\mathrm{loc}}(H_{e,\alpha}; \mathbb{R}^n)$. A classical result about maximal functions (see Theorem A.2 in the Appendix) then implies that $\nabla g_{e,\alpha}$ has the countable Lipschitz property on $H_{e,\alpha}$. Since, by (18.52), it holds that $\nabla g_{e,\alpha} = \nabla f$ a.e. on $E_{e,\alpha} \subset H_{e,\alpha}$, we conclude that $\nabla f$ has the countable Lipschitz property on $E_{e,\alpha}$.

We now consider $G \subset G(f)$ such that $T(G)$ and $P_G : T(G) \to S^\circ$ are Borel measurable, and prove conclusions (i), (ii), and (iii).

Proof of (i): By Proposition 18.6, $G(f)$ satisfies the non-crossing condition (18.21), which is thus satisfied by $G$ thanks to $G \subset G(f)$.

Proof of (ii): The inclusion $G \subset G(f)$ implies that $\Theta(G) \subset \Theta(f)$, and thus that $T(G) \subset T(f)$ and $T_c(G) \subset T_c(f)$. Moreover, every $\sigma \in \Theta_{\max}(G)$ is a (possibly strict) sub-segment of some $\sigma' \in \Theta_{\max}(f)$. We conclude by combining these properties with the possibility of extending $P_f$ and $\nabla f$ from $T(f)$ to $F \cap T_c(f)$ (Proposition 18.6) with the fact that $\nabla f$ has the countable Lipschitz property on $F \cap T_c(f)$.

Proof of (iii): By conclusion (ii), $G$ satisfies assumptions (a) and (b) of Theorem 18.7 with the Borel set $E = F \cap T_c(G)$. Let us assume for a moment
that $\mathcal{L}^n(E) < \infty$, and apply conclusion (18.32) in Theorem 18.7 to the disintegration $\{\mu_\sigma\}_\sigma$ of the finite measure $\mu = \mathcal{L}^n \llcorner E$ with respect to $P_G : E \to S^\circ$: we find, in particular, that μσ 0, we let $\mathrm{Join}(\Sigma, r) \subset S^\circ$ denote the family of those open oriented segments $]]x, y[[$ for which there exist $z, w \in \mathbb{R}^n$ such that (i) $]]x, z[[$ and $]]w, y[[$ belong to $\Sigma$; (ii) the intersection between $]x, z[$ and $]w, y[$ is the open segment $]w, z[$; and (iii) $|z - w| \ge r$. In plain terms, segments in $\mathrm{Join}(\Sigma, r)$ are obtained by joining segments in $\Sigma$ whose realizations have an oriented overlapping of length at least $r$: in particular, all the segments in $\Sigma$ of length at least $r$ are contained in $\mathrm{Join}(\Sigma, r)$ (although $\Sigma$ may contain shorter segments, and thus not be contained in $\mathrm{Join}(\Sigma, r)$), and every segment in $\mathrm{Join}(\Sigma, r)$ has length at least $r$. If we extend $\Sigma$ by adding all the segments in $\mathrm{Join}(\Sigma, r)$, and thus consider $\mathrm{Ext}(\Sigma, r) = \Sigma \cup \mathrm{Join}(\Sigma, r)$, then
\[
\left\{ \begin{array}{l} \Sigma \text{ is closed in } S^\circ \\ \Phi(\Sigma) \text{ bounded} \end{array} \right. \quad \Rightarrow \quad \left\{ \begin{array}{l} \mathrm{Ext}(\Sigma, r) \text{ is closed in } S^\circ \\ \text{with } \Phi(\mathrm{Ext}(\Sigma, r)) \text{ bounded in } \mathbb{R}^n. \end{array} \right. \tag{18.64}
\]
We just need to prove that $\mathrm{Join}(\Sigma, r)$ is closed in $S^\circ$. Indeed, let $\sigma_j = \,]]x_j, y_j[[$ be a sequence in $\mathrm{Join}(\Sigma, r)$ associated to points $z_j, w_j$ satisfying (i), (ii) and (iii), and assume that $\sigma_j \to \sigma$ in $S^\circ$. Since $x_j, y_j \in \Phi(\Sigma)$, which is bounded, it must be $\sigma = \,]]x, y[[$ for some $x, y \in \mathbb{R}^n$, $x \ne y$. The fact that $]]x_j, z_j[[$ and $]]w_j, y_j[[$ lie on the same oriented line implies that, up to extracting subsequences, $z_j \to z$ and $w_j \to w$ in $\mathbb{R}^n$, with $|z - w| \ge r$ and with $x \le w < z \le y$ in the orientation of the line generated by $\sigma$, and with $]]x, z[[$ and $]]w, y[[$ in $\Sigma$ thanks to the facts that $]]x_j, z_j[[$ and $]]w_j, y_j[[$ are in $\Sigma$ and that $\Sigma$ is closed. This shows that $\sigma \in \mathrm{Join}(\Sigma, r)$, and thus (18.64) is proved.
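The operations Join and Ext are easy to experiment with when all segments lie on one oriented line. The sketch below is a one-dimensional model only, which is an assumption of this example: an oriented open segment is encoded as a pair `(a, b)` with `a < b`, and conditions (i), (ii), and (iii) are translated into inequalities between endpoints. The function names and the sample family `sigma` are hypothetical.

```python
def join(segments, r):
    """One-dimensional model of Join(Sigma, r) for segments (a, b), a < b, on one oriented line."""
    joined = set()
    for (x, z) in segments:
        for (w, y) in segments:
            # ]x, z[ and ]w, y[ intersect exactly in ]w, z[ when x <= w and z <= y;
            # condition (iii) requires the overlap ]w, z[ to have length at least r.
            if x <= w and z <= y and z - w >= r:
                joined.add((x, y))
    return joined


def ext(segments, r):
    """Ext(Sigma, r) = Sigma together with Join(Sigma, r)."""
    return set(segments) | join(segments, r)


sigma = {(0.0, 2.0), (1.0, 3.0), (2.9, 4.0), (5.0, 5.5)}

print(sorted(join(sigma, 0.5)))
print(sorted(ext(ext(sigma, 0.05), 0.05)))
```

With `r = 0.5` only the first two segments are joined, since the overlap between `(1.0, 3.0)` and `(2.9, 4.0)` is too short; with `r = 0.05` a second application of `ext` produces the segment `(0.0, 4.0)`, which mirrors the iterative construction of the sets $\Theta_{k,h}$ in step three.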
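Step two: We notice that if $\Sigma \subset \mathrm{Sub}(\Theta_{\max}(G))$, then
\[
\mathrm{Sub}(\Sigma) \subset \mathrm{Sub}(\Theta_{\max}(G)), \qquad \mathrm{Ext}(\Sigma, r) \subset \mathrm{Sub}(\Theta_{\max}(G)) \qquad \forall r > 0. \tag{18.65}
\]
The simple proof is omitted.

Step three: We claim that
\[
\mathrm{Sub}\big(\Theta_{\max}(G)\big) = \bigcup_{k=1}^{\infty} \Theta_{k,k}, \tag{18.66}
\]
where $\Theta_{k,k}$ is the sequence of closed subsets of $S^\circ$ defined as follows: we set
\[
\Theta_k = \big\{ \,]]x, y[[ \ : (x, y) \in G, \ x \ne y, \ |x|, |y| \le k \big\} \subset \Theta(G), \qquad k \in \mathbb{N},
\]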
18.5 Some Technical Measure-Theoretic Arguments
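for the set of all the transport rays of $G$ with end-points in $\mathrm{Cl}(B_k)$, and then define
\[
\Theta_{k,0} = \mathrm{Sub}(\Theta_k), \qquad \Theta_{k,h} = \mathrm{Sub}\big( \mathrm{Ext}(\Theta_{k,h-1}, 1/h) \big), \qquad 1 \le h \le k.
\]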
Notice that, at step $h$, we define $\Theta_{k,h}$ by first adding to $\Theta_{k,h-1}$ all the segments obtained by joining segments in $\Theta_{k,h-1}$ with oriented overlapping of length at least $1/h$, and then by taking all the possible sub-segments of the resulting family. For this reason, it is easily seen that $\Theta_{k,h}$ is monotone increasing both in $k$ and in $h$ with respect to set inclusion. The fact that each $\Theta_{k,h}$ is closed in $S^\circ$ is immediate from (18.59) and (18.64), and the inclusion in $\mathrm{Sub}(\Theta_{\max}(G))$ follows from (18.65). We are thus left to prove that if $\sigma$ is a sub-segment of some maximal transport ray $\sigma^* \in \Theta_{\max}(G)$, then $\sigma \in \Theta_{k,k}$ for $k$ large enough. To this end, let us notice that by definition of maximal transport ray we can find countably many transport rays $\sigma_j$ of $G$, such that $\Phi(\sigma^*)$ is covered by $\{\Phi(\sigma_j)\}_{j \in \mathbb{N}}$. Since $\Phi(\sigma)$ is bounded, we can find an integer $N$ such that $\Phi(\sigma)$ is covered by $\{\Phi(\sigma_j)\}_{j=1}^N$; moreover, by connectedness of $\Phi(\sigma)$ and up to rearranging in the index $j$, we find that for each $j = 1, \ldots, N - 1$, $\Phi(\sigma_j)$ has a positive overlapping with $\Phi(\sigma_{j+1})$; we denote by $\varepsilon$ the corresponding infimum overlapping length, which, evidently, is positive. We now take $k_0$ large enough so that $1/k_0 < \varepsilon$ and $\sigma_j \in \Theta_{k_0}$ for every $j = 1, \ldots, N$, and then pick $k = k_0 + N - 1$. Since $\sigma$ is a sub-segment of the segment obtained by joining $(N - 1)$-segments in $\Theta_{k_0}$ with overlapping $1/k_0$, the choice of $k$ ensures that $\sigma \in \Theta_{k,k}$.

Step four: Let $\Sigma$ be a closed subset of $S^\circ$, and notice that, by (18.66),
\[
\Phi(\Sigma \cap \Theta_{\max}(G)) = \Phi\big( \Sigma \cap \mathrm{Sub}(\Theta_{\max}(G)) \big) = \Phi\Big( \Sigma \cap \bigcup_{k=1}^{\infty} \Theta_{k,k} \Big) = \bigcup_{k=1}^{\infty} \Phi\big[ \Sigma \cap \Theta_{k,k} \big].
\]
Since $\Sigma \cap \Theta_{k,k}$ is $S^\circ$-closed, by Proposition 18.3 we see that
\[
\Phi(\Sigma \cap \Theta_{\max}(G)) \in \mathcal{B}(\mathbb{R}^n), \qquad \forall\, \Sigma \subset S^\circ, \ \Sigma \text{ closed}. \tag{18.67}
\]
Taking $\Sigma = S^\circ$ in (18.67), we see that $T(G) = \Phi(\Theta_{\max}(G))$ is a Borel set. When the non-crossing condition (18.21) holds, and thus $P_G$ is well-defined, we have of course
\[
(P_G)^{-1}(\Sigma) = \Phi(\Sigma \cap \Theta_{\max}(G)), \qquad \forall \Sigma \subset S^\circ,
\]
and the Borel measurability of $P_G$ follows by arbitrariness of $\Sigma$ closed in (18.67).

Step five: The Borel measurability of $T_*(G)$ with $* \in \{c, \ell, r\}$ is discussed similarly. Indeed, the definitions of Join and Ext are easily adapted in $S_*$ (compare to what was done with the definition of Sub given in (18.57)). The assertions in steps two, three and four are easily seen to hold with identical arguments if $\Theta_{\max}(G)$ is systematically replaced with $\Theta^*_{\max}(G) = \{\rho_*(\sigma) : \sigma \in \Theta_{\max}(G)\} \subset S_*$. In particular, the argument in step three shows that $\Theta^*_{\max}(G)$ can always be characterized as a countable union of $S_*$-closed sets $\Theta^*_{k,k}$.
19 An Introduction to the Needle Decomposition Method
In this final chapter we relate OMT to the “needle decomposition method” originating in the classical work by Payne and Weinberger [PW60] on sharp forms of the Poincaré inequality on convex domains, and subsequently reintroduced as a “localization theorem” in Convex Geometry by various authors [GM87, LS93, KLS95]. The idea of framing the needle decomposition technique as a disintegration procedure along transport rays [Kla17] and the concurrent development of OMT in (weighted) Riemannian manifolds first and in metric measure spaces with curvature/dimension conditions later have led to realize the (proper formulation and) validity of the localization theorem in those very general settings and opened the way to several striking advancements on geometric and functional inequalities, some of them previously unknown even in the Riemannian setting. Although a reasonably complete account on these developments is evidently beyond the scope of our Euclidean-centric discussion, we can still take advantage of the results presented in Chapter 18 to illustrate some of the key ideas involved in the method, thus offering a particularly gentle introduction to this important topic; and this will indeed be the goal of this final chapter. We start, in Section 19.1, with an account on the original proof of the Payne–Weinberger comparison theorem (Theorem 19.1). This is instructive for at least two reasons: first, it shows that the basic idea behind the needle decomposition method arises from completely elementary and direct considerations (no need to solve the Monge problem to use this method – at least in Rn , and in other highly symmetric spaces); second, it introduces readers to the subject of comparison theorems, a class of statements of fundamental importance in Riemannian geometry, and the typical “target” of proofs based on the method. 
In Section 19.2 we briefly introduce the localization theorem (Theorem 19.4) as the most efficient formalization of the ideas behind the needle decomposition method, and we apply it to deduce, quite immediately, the Payne–Weinberger
comparison theorem. Although the localization proof requires, on aggregate, more work than the original proof, it offers an undeniably simpler approach from the conceptual viewpoint. This simplicity is reflected, on a more pragmatic level, in the fact that the localization theorem can be generalized, mutatis mutandis, to much more general settings than Euclidean spaces, where the simple geometric arguments presented in Section 19.1 are not even conceivable; and, once such generalizations are obtained, they allow the far-reaching extension of Theorem 19.1 to those settings by repeating the simple argument presented in Section 19.2. The chapter is closed by a proof of the localization theorem, which is achieved by first (re-)discussing, in Section 19.3, the C 1,1 differentiability properties of Kantorovich potentials, and by then proving a key concavity property of disintegration densities in Section 19.4.
19.1 The Payne–Weinberger Comparison Theorem

Given a bounded, open, connected set $\Omega \subset \mathbb{R}^n$ with Lipschitz boundary, the Poincaré inequality states the existence of a positive constant $\lambda(\Omega)$ such that
\[
\int_\Omega |\nabla u|^2 \ge \lambda(\Omega) \int_\Omega u^2, \qquad \forall u \in C^1(\Omega), \quad \int_\Omega u = 0. \tag{19.1}
\]
The constant $\lambda(\Omega)$ appearing in (19.1) is understood to be the largest possible constant such that (19.1) holds, i.e., $\lambda(\Omega)$ is by definition
\[
\lambda(\Omega) = \inf \Big\{ \int_\Omega |\nabla u|^2 : u \in C^1(\Omega), \ \int_\Omega u^2 = 1, \ \int_\Omega u = 0 \Big\}.
\]
Notice that $\lambda(\Omega)$ has the dimensions of a squared inverse length and could have been equivalently defined as the first positive eigenvalue of the Laplace operator on $\Omega$ with Neumann boundary condition, i.e., as the smallest number¹ $\lambda$ such that there is a nonconstant function $u$ with the property that $-\Delta u = \lambda \, u$ in $\Omega$ and $\nabla u \cdot \nu_\Omega = 0$ on $\partial\Omega$. We call² $\lambda(\Omega)$ the Poincaré constant of $\Omega$. Since the Poincaré inequality itself does not assert anything but the positivity of $\lambda(\Omega)$, and given that one can hope to explicitly compute $\lambda(\Omega)$ only in very specific situations (e.g., if $n = 1$ and $\Omega$ is an interval of length $d$, then $\lambda(\Omega) = (\pi/d)^2$ by basic Fourier analysis), a natural question is bounding $\lambda(\Omega)$ in terms of other (hopefully more directly accessible) geometric/functional quantities depending on $\Omega$. The Payne–Weinberger comparison theorem takes on this question in the class of convex domains (i.e., bounded open convex sets).

¹ Any such $\lambda$ is necessarily positive, as seen if we integrate by parts $\lambda \, u^2 = -u \, \Delta u$.

² The same term is routinely used also for $1/\lambda(\Omega)$ in the literature.
Theorem 19.1 (Payne–Weinberger comparison theorem) If $\Omega$ is a convex domain in $\mathbb{R}^n$ and $u \in C^1(\Omega)$ with $\int_\Omega u = 0$, then
\[
\int_\Omega |\nabla u|^2 \ge \Big( \frac{\pi}{\mathrm{diam}(\Omega)} \Big)^2 \int_\Omega u^2. \tag{19.2}
\]
In other words, at a fixed diameter, one-dimensional intervals have the lowest possible Poincaré constant among convex domains in any dimension, i.e., $\lambda(\Omega) \ge \lambda\big((0, \mathrm{diam}(\Omega))\big)$ for every convex domain $\Omega$. The original proof of Theorem 19.1 by Payne and Weinberger introduces (ante litteram) the needle decomposition method and provides us with a key insight into the localization theorem (Theorem 19.4). The method allows one to reduce the proof of Theorem 19.1 to the discussion of the family of one-dimensional variational problems addressed in the following statement:

Lemma 19.2 If $d > 0$, $h : (0, d) \to \mathbb{R}$ is concave, and $\ell = e^h$, then
\[
\int_0^d (u')^2 \, \ell \ge \Big( \frac{\pi}{d} \Big)^2 \int_0^d u^2 \, \ell, \tag{19.3}
\]
whenever $u \in W^{1,2}(0, d)$ is such that $\int_0^d u \, \ell = 0$.
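Inequality (19.3) can be probed numerically by evaluating the weighted Rayleigh quotient on a few trial functions. This is a sanity check on samples and not a proof; the weights, the trial functions, the interval length `d`, and the helper names below are all hypothetical choices. Each trial function is recentered by its weighted mean so that the constraint of zero weighted average holds.

```python
import math


def weighted_rayleigh(u, du, weight, d, n=4000):
    """Midpoint-rule approximation of the quotient
    int (u')^2 w / int (u - mean)^2 w on (0, d), where mean is the w-weighted mean of u."""
    h = d / n
    xs = [(k + 0.5) * h for k in range(n)]
    w = [weight(x) for x in xs]
    mean = sum(u(x) * wi for x, wi in zip(xs, w)) / sum(w)
    num = sum(du(x) ** 2 * wi for x, wi in zip(xs, w))
    den = sum((u(x) - mean) ** 2 * wi for x, wi in zip(xs, w))
    return num / den  # the common factor h cancels


d = 2.0

# Log-concave weights: a constant, a Gaussian, and a power of a positive concave function.
weights = {
    "constant": lambda x: 1.0,
    "gaussian": lambda x: math.exp(-x * x),
    "power": lambda x: (x * (d - x)) ** 2,
}


def cosine_mode(j):
    """Trial function cos(j pi x / d) together with its derivative."""
    return (
        lambda x: math.cos(j * math.pi * x / d),
        lambda x: -(j * math.pi / d) * math.sin(j * math.pi * x / d),
    )


trials = [cosine_mode(j) for j in (1, 2, 3)] + [(lambda x: x, lambda x: 1.0)]
bound = (math.pi / d) ** 2
quotients = {
    name: [weighted_rayleigh(u, du, w, d) for (u, du) in trials]
    for name, w in weights.items()
}
for name, values in quotients.items():
    print(name, [round(q, 4) for q in values], "bound:", round(bound, 4))
```

For the constant weight the first cosine mode attains the bound $(\pi/d)^2$, in agreement with the sharpness of (19.3) when $h$ is constant; for the other two weights all sampled quotients stay above it.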
Remark 19.3 Notice that (19.3) is sharp, since in the case $h = \text{constant}$ we have indeed $\lambda((0, d)) = (\pi/d)^2$. It is also useful to notice that Lemma 19.2 applies whenever $\ell = k^m$ for some $m > 0$ and $k : (0, d) \to \mathbb{R}$ is positive and concave, since in this case $\ell = e^h$ with $h = m \log k$, which is (trivially) concave on $(0, d)$. The interest of this case will become apparent in the proof of Theorem 19.1, see, in particular, (19.13) and (19.14).

Proof of Lemma 19.2 For $\varepsilon \in (0, d/2)$, the $\varepsilon$-regularization by convolution of $h$, $h_\varepsilon(x) = \int h(y) \, \rho_\varepsilon(y - x) \, dy$, defines a smooth concave function on $(\varepsilon, d - \varepsilon)$ so that $h_\varepsilon \to h$ locally in $(0, d)$ as $\varepsilon \to 0^+$. Exploiting this approximation, we reduce to the case when $h$ is smooth and concave on $(0, d)$ (and, in particular, $\ell$ is smooth, positive, and bounded from above) and consider the problem
\[
\gamma = \inf \Big\{ \int_0^d (u')^2 \, \ell : \int_0^d u^2 \, \ell = 1, \ \int_0^d u \, \ell = 0 \Big\},
\]
which is easily seen to admit a minimizer $u \in W^{1,2}(0, d)$ by the Direct Method. Given $\zeta \in W^{1,2}(0, d)$ with $\int_0^d \zeta \, \ell = 0$, we notice that
\[
\varphi = \zeta - \Big( \int_0^d \zeta \, u \, \ell \Big) \, u \tag{19.4}
\]
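is such that
\[
\int_0^d \varphi \, \ell = \int_0^d \varphi \, u \, \ell = 0. \tag{19.5}
\]
Picking $\psi \in W^{1,2}(0, d)$ with
\[
\int_0^d \psi \, \ell = 0, \qquad \int_0^d \psi \, u \, \ell = 1, \tag{19.6}
\]
(e.g., pick $\psi = u$), we consider
\[
f(t, s) = -1 + \int_0^d (u + t \, \varphi + s \, \psi)^2 \, \ell.
\]
Clearly, $f(0, 0) = 0$, and (19.5) and (19.6) give
\[
\frac{\partial f}{\partial s}(0, 0) = 2, \qquad \frac{\partial f}{\partial t}(0, 0) = 0,
\]
so that we can find $\varepsilon > 0$ and $s : (-\varepsilon, \varepsilon) \to \mathbb{R}$ such that $s(0) = 0$, $s'(0) = 0$ and $f(t, s(t)) = 0$ for every $|t| < \varepsilon$. Since, by construction, $u + t \, \varphi + s(t) \, \psi$ has zero $\ell$-weighted average on $(0, d)$, we see that $u + t \, \varphi + s(t) \, \psi$ is admissible in $\gamma$ for every $|t| < \varepsilon$, and exploiting the minimality of $u$ in $\gamma$, $s'(0) = 0$ and (19.4), we conclude that
\[
0 = \int_0^d \varphi' \, u' \, \ell = \int_0^d \zeta' \, u' \, \ell - \gamma \int_0^d \zeta \, u \, \ell, \tag{19.7}
\]
for every $\zeta \in W^{1,2}(0, d)$ with $\int_0^d \zeta \, \ell = 0$. Given $\eta \in C^\infty[0, d]$ with $\int_0^d \eta = 0$, and applying (19.7) to $\zeta = \eta/\ell$, we find $0 = \int_0^d \eta' \, u' - \eta \, [(\ell'/\ell) \, u' + \gamma \, u]$, that is (taking into account that $\ell'/\ell = h'$)
\[
\eta \, u' \Big|_0^d = \int_0^d \eta \, (u'' + h' \, u' + \gamma \, u), \qquad \forall\, \eta \in C^\infty[0, d] \text{ with } \int_0^d \eta = 0. \tag{19.8}
\]
By testing (19.8) with $\eta$ compactly supported in $(0, d)$ and by a classical regularity argument, $u \in C^\infty(0, d)$, and there exists $\lambda \in \mathbb{R}$ such that
\[
u'' + h' \, u' + \gamma \, u = \lambda \qquad \text{on } (0, d), \tag{19.9}
\]
which can then be combined again into (19.8) with choices of $\eta$ that are nonzero at $\{0, d\}$ to conclude that $u \in C^\infty[0, d]$ with $u'(0) = u'(d) = 0$. Differentiating (19.9) we get
\[
u''' + h' \, u'' + (h'' + \gamma) \, u' = 0 \qquad \text{on } (0, d); \tag{19.10}
\]
at the same time, setting $v = u' \sqrt{\ell}$, by $(\sqrt{\ell})' = (h'/2) \sqrt{\ell}$ and (19.10), we find that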
\[ v' = \sqrt{\ell}\,\Big( u'' + \frac{h'}{2}\,u' \Big), \]
\[ v'' = \sqrt{\ell}\,\Big( u''' + h'\,u'' + \Big( \frac{h''}{2} + \frac{(h')^2}{4} \Big) u' \Big) = \Big( -\frac{h''}{2} + \frac{(h')^2}{4} - \gamma \Big)\, v, \]
where $-h''/2 + (h')^2/4 \ge 0$ thanks to $h'' \le 0$. Multiplying by $v$ (so that $v\,v'' \ge -\gamma\,v^2$) and integrating by parts with the help of $v(0) = v(d) = 0$, we find $-\int_0^d (v')^2 \ge -\gamma \int_0^d v^2$, which gives
\[ \gamma \ge \frac{\int_0^d (v')^2}{\int_0^d v^2} \ge \inf\Big\{ \frac{\int_0^d (w')^2}{\int_0^d w^2} : w \in W_0^{1,2}(0,d) \Big\} = \Big(\frac{\pi}{d}\Big)^2, \]
if $v$ is not identically zero on $(0,d)$. It is easily seen that it cannot be $v \equiv 0$; indeed, this would give $u$ constant, which combined with $\int_0^d u\,\ell = 0$ and $\ell > 0$ on $(0,d)$ would give $u \equiv 0$, in contradiction with $\int_0^d u^2\,\ell = 1$.

We now present the original proof of Theorem 19.1.

Proof of Theorem 19.1 Step one: Given positive $d$ and $\delta$, let us say that a convex domain $\Omega$ is a needle of length $d$ and waist $\delta$ if, up to rigid motions,
\[ \Omega \subset \{ x : x_1 \in (0,d),\ |x_i| < \delta \ \ \forall i \ne 1 \}. \tag{19.11} \]
(It is implicitly intended that the values of $d$ and $\delta$ cannot be further decreased.) In this step we prove that if $\Omega$ is a needle of length $d$ and waist $\delta$, then
\[ \int_\Omega |\nabla u|^2 \ge \Big(\frac{\pi}{d}\Big)^2 \Big[ \int_\Omega u^2 - C(u)\,\mathcal{L}^n(\Omega)\,\delta \Big] - C(u)\,\mathcal{L}^n(\Omega)\,\delta, \tag{19.12} \]
for every $u \in C^2(\Omega)$ with $\int_\Omega u = 0$, and for a constant $C(u)$ depending only on the $C^2$-norm of $u$ in $\Omega$. Indeed, after a rigid motion that takes $\Omega$ in a position such that (19.11) holds, for $t \in (0,d)$ let us set
\[ \ell(t) = \mathcal{H}^{n-1}(\Omega(t)), \qquad \Omega(t) = \Omega \cap \{x_1 = t\}, \tag{19.13} \]
and notice that, for every Borel function $g : \Omega \to \mathbb{R}$ with $g(x) = G(x_1)$, one has
\[ \int_\Omega g = \int_0^d G(t)\,\ell(t)\,dt, \tag{19.14} \]
thanks to Fubini's theorem. By the Brunn–Minkowski inequality, if $t_1, t_2 \in [0,d]$ and $s \in (0,1)$, then
\[ (1-s)\,\ell(t_1)^{1/(n-1)} + s\,\ell(t_2)^{1/(n-1)} \le \mathcal{H}^{n-1}\big( (1-s)\,\Omega(t_1) + s\,\Omega(t_2) \big)^{1/(n-1)}, \]
Figure 19.1 If Ω is convex, then the section of Ω by {x 1 = (1 − s) t 1 + s t 2 } contains the convex combination (1 − s) Ω(t 1 ) + s Ω(t 2 ) of the sections Ω(t i ) = Ω ∩ {x 1 = t i }, i = 1, 2. These various sections are depicted in gray.
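while, by convexity of $\Omega$,
\[ (1-s)\,\Omega(t_1) + s\,\Omega(t_2) \subset \Omega\big( (1-s)\,t_1 + s\,t_2 \big), \]
see Figure 19.1, so that $\ell^{1/(n-1)}$ is concave on $[0,d]$ (and positive on $(0,d)$ since the value of $d$ cannot be further decreased). We can thus apply Lemma 19.2 (as explained in Remark 19.3 with $m = n-1$) to the function $x_1 \in (0,d) \mapsto u(x_1,0)$ and deduce by (19.14) that
\[ \int_\Omega \Big( \frac{\partial u}{\partial x_1}(x_1,0) \Big)^2 dx \ge \Big(\frac{\pi}{d}\Big)^2 \int_\Omega \Big( u(x_1,0) - \frac{1}{\mathcal{L}^n(\Omega)} \int_\Omega u(y_1,0)\,dy \Big)^2 dx. \]
Since $\int_\Omega u = 0$, for a constant $C(u)$ depending only on the $C^2$-norm of $u$ in $\Omega$ we have
\[ \Big| \int_\Omega \Big(\frac{\partial u}{\partial x_1}\Big)^2 - \int_\Omega \Big( \frac{\partial u}{\partial x_1}(x_1,0) \Big)^2 dx \Big| \le C(u)\,\delta\,\mathcal{L}^n(\Omega), \]
\[ \Big| \int_\Omega \Big( u(x_1,0) - \frac{1}{\mathcal{L}^n(\Omega)} \int_\Omega u(y_1,0)\,dy \Big)^2 dx - \int_\Omega u^2 \Big| \le C(u)\,\delta\,\mathcal{L}^n(\Omega), \]
and therefore we deduce
\[ \int_\Omega |\nabla u|^2 \ge \int_\Omega \Big(\frac{\partial u}{\partial x_1}\Big)^2 \ge -C(u)\,\delta\,\mathcal{L}^n(\Omega) + \Big(\frac{\pi}{d}\Big)^2 \Big[ \int_\Omega u^2 - C(u)\,\delta\,\mathcal{L}^n(\Omega) \Big], \]
which is (19.12).

Step two: We conclude the proof. Let $\Omega$ be a convex domain and let $u \in C^2(\Omega)$ with $\int_\Omega u = 0$: we claim that, for every $\delta > 0$, we can find a finite family of convex domains $\{\Omega_i\}_{i=1}^N$ such that $\mathcal{L}^n(\Omega \setminus \bigcup_{i=1}^N \Omega_i) = 0$, $\Omega_i \cap \Omega_j = \emptyset$ for $i \ne j$, each $\Omega_i$ is a needle with length $d_i \le \operatorname{diam}(\Omega)$ and waist $\delta$, and
\[ \int_{\Omega_i} u = 0, \qquad \forall i = 1, \dots, N. \tag{19.15} \]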
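Notice that the claim implies the theorem: indeed, by (19.15) we can apply step one to $u$ on each $\Omega_i$ (notice that the $C^2$-norm of $u$ on $\Omega_i$ is bounded by the $C^2$-norm of $u$ on $\Omega$) to find
\[ \Big(\frac{d_i}{\pi}\Big)^2 \Big[ \int_{\Omega_i} |\nabla u|^2 + C(u)\,\mathcal{L}^n(\Omega_i)\,\delta \Big] \ge \int_{\Omega_i} u^2 - C(u)\,\mathcal{L}^n(\Omega_i)\,\delta. \]
Exploiting $d_i \le \operatorname{diam}(\Omega)$ and adding up over $i$ we find
\[ \Big(\frac{\operatorname{diam}(\Omega)}{\pi}\Big)^2 \Big[ \int_\Omega |\nabla u|^2 + C(u)\,\mathcal{L}^n(\Omega)\,\delta \Big] \ge \int_\Omega u^2 - C(u)\,\mathcal{L}^n(\Omega)\,\delta. \]
Letting $\delta \to 0^+$ we obtain (19.2).

To prove the claim: We first consider the case $n = 2$. Let $\operatorname{waist}(\Omega)$ denote the infimum over $\nu \in \mathbb{S}^1$ of $\max\{ |(x-y)\cdot\nu| : x, y \in \Omega \}$, so that $\operatorname{waist}(\Omega)$ is the size of the narrowest strip containing $\Omega$, and let us notice that
\[ \operatorname{waist}(\Omega) \le \sqrt{2\,\mathcal{L}^2(\Omega)}. \tag{19.16} \]
(Indeed, $\Omega$ always contains two intersecting orthogonal segments, one of length $a = \operatorname{waist}(\Omega)$ and one of length $b \ge a$; thus, it contains their convex envelope, whose area is at least $a^2/2$.) Next, for every $\nu \in \mathbb{S}^1$ we can find two complementary half-planes $\{H_\nu^+, H_\nu^-\}$, $\nu$ pointing outward $H_\nu^+$ and inward $H_\nu^-$, such that, setting $\Omega_\nu^\pm = \Omega \cap H_\nu^\pm$, we have $\mathcal{L}^2(\Omega_\nu^\pm) = \mathcal{L}^2(\Omega)/2$. Since $\int_{\Omega_\nu^+} u = -\int_{\Omega_\nu^-} u$, by a continuity argument we can find $\nu \in \mathbb{S}^1$ such that $\int_{\Omega_\nu^+} u = -\int_{\Omega_\nu^-} u = 0$. Iterating this construction, after $m$ steps we have defined a family $\{\Omega_i^{(m)}\}_{i=1}^{2^m}$ of mutually disjoint convex domains such that, for every $i = 1, \dots, 2^m$,
\[ \mathcal{L}^2\Big( \Omega \setminus \bigcup_{i=1}^{2^m} \Omega_i^{(m)} \Big) = 0, \qquad \mathcal{L}^2(\Omega_i^{(m)}) = \frac{\mathcal{L}^2(\Omega)}{2^m}, \qquad \int_{\Omega_i^{(m)}} u = 0. \]
Thanks to (19.16), $\Omega_i^{(m)}$ is a needle with length $d \le \operatorname{diam}(\Omega)$ and waist $\delta \le C\,(\mathcal{L}^2(\Omega)/2^m)^{1/2}$, and the claim is proved taking $m$ large with respect to $\delta$.

In the case $n \ge 3$ we argue as follows. Given an $(n-2)$-dimensional plane $J$ in $\mathbb{R}^n$, we denote by $\{v_J, w_J\}$ an orthonormal basis of $J^\perp$. Given $\nu \in \mathbb{S}^2_{J^\perp} = \{ \cos\alpha\, v_J + \sin\alpha\, w_J : \alpha \in [0, 2\pi] \}$, we now define $H_\nu^\pm$ and $\Omega_\nu^\pm$ by requiring
\[ \mathcal{H}^2\big( p_J^\perp(\Omega_\nu^\pm) \big) = \frac{\mathcal{H}^2( p_J^\perp(\Omega) )}{2}, \]
where $p_J^\perp$ denotes the projection of $\mathbb{R}^n$ over $J^\perp$. Again by a continuity argument, a specific $\nu \in \mathbb{S}^2_{J^\perp}$ can be selected to ensure $\int_{\Omega_\nu^+} u = -\int_{\Omega_\nu^-} u = 0$. The iteration procedure is then repeated, this time finding
\[ \mathcal{H}^2\big( p_J^\perp(\Omega_i^{(m)}) \big) = \frac{\mathcal{H}^2( p_J^\perp(\Omega) )}{2^m} \]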
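(in place of $\mathcal{L}^2(\Omega_i^{(m)}) = \mathcal{L}^2(\Omega)/2^m$), thus proving that
\[ \lim_{m \to \infty}\ \sup_{1 \le i \le 2^m} \operatorname{waist}\big( p_J^\perp(\Omega_i^{(m)}) \big) = 0. \]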
We thus obtain a partition of Ω into convex domains Ωi(m) , such that u has zero average on each Ωi(m) and such that each Ωi(m) is δ-thin with respect to a direction contained in J ⊥ . Let us take any one of these pieces, say, Ω1(m) , and assume, as it can be done without loss of generality up to a rotation, that Ω1(m) is δ-thin in the direction en : by repeating the above argument starting from Ω1(m) in place of Ω, and with J ⊥ = span{en−2 , en−1 }, we further subdivide Ω1(m) into convex domains on which u has zero average and which are δ-thin in the direction en and in a direction contained in span{en−2 , en−1 }. By continuing this procedure, we eventually prove the claim and, thus, the theorem.
19.2 The Localization Theorem
We now introduce (the Euclidean version of) the localization theorem. Here and in the rest of this chapter we use the notation $T(f)$, $T_c(f)$, $P_f$, and so on for transport set, transport set with endpoints, maximal transport ray map, and others associated to a 1-Lipschitz function $f$. This notation was introduced in Section 18.1; see, in particular, Proposition 18.6.

Theorem 19.4 (Localization theorem) Let $\Omega$ be a convex domain in $\mathbb{R}^n$, $u : \Omega \to \mathbb{R}$ a Borel measurable function with $\int_\Omega u = 0$ and $u \ne 0$, and let $f : \mathbb{R}^n \to \mathbb{R}$ be a Kantorovich potential from $\mu$ to $\nu$, where
\[ \mu = \frac{u^+\, d\mathcal{L}^n \llcorner \Omega}{\int_\Omega u^+}, \qquad \nu = \frac{u^-\, d\mathcal{L}^n \llcorner \Omega}{\int_\Omega u^-}. \]
Then,$^3$
\[ u = 0 \qquad \mathcal{L}^n\text{-a.e. on } \Omega \setminus T(f), \tag{19.17} \]
and for $(P_f)_\#(\mathcal{L}^n \llcorner (\Omega \cap T(f)))$-a.e. $\sigma \in S^\circ$, the set $I_\sigma := P_f^{-1}(\sigma) \cap \Omega$ is an open segment and there is a smooth function $\ell_\sigma : I_\sigma \to \mathbb{R}$ such that
(i) $\ell_\sigma^{1/(n-1)}$ is concave on $I_\sigma$ and $\int_{I_\sigma} u\,\ell_\sigma\, d\mathcal{H}^1 = 0$;
(ii) $\{ \ell_\sigma\, \mathcal{H}^1 \llcorner I_\sigma \}_\sigma$ is the disintegration of $\mathcal{L}^n \llcorner (\Omega \cap T(f))$ with respect to $P_f$.

$^3$ It could be that $u = 0$ on a set of positive Lebesgue measure inside $T(f)$. For example, if $\Omega = (0,3) \times (0,1) \subset \mathbb{R}^2$ with $u = 1$ on $Q_1 = (0,1) \times (0,1)$, $u = 0$ on $Q_2 = (1,2) \times (0,1)$ and $u = -1$ on $Q_3 = (2,3) \times (0,1)$, then $f(x_1,x_2) = x_1$ is a Kantorovich potential from $\mu = \mathcal{L}^2 \llcorner Q_1$ to $\nu = \mathcal{L}^2 \llcorner Q_3$ with $T(f) = \Omega$ and $P_f(x_1,x_2) = \,]](0,x_2),(3,x_2)[[$ for every $(x_1,x_2) \in \Omega$.
Remark 19.5 (Uniqueness of Kantorovich potentials) As a consequence of (19.17), if $\mu, \nu \in \mathcal{P}_1(\mathbb{R}^n)$, $\mu = \rho\, d\mathcal{L}^n$, $\nu = \sigma\, d\mathcal{L}^n$ and
\[ \{\rho > 0\} \cup \{\sigma > 0\} \text{ is } \mathcal{L}^n\text{-equivalent to a convex domain } \Omega, \tag{19.18} \]
then all Kantorovich potentials from $\mu$ to $\nu$ agree up to an additive constant. Indeed, if $u = \rho - \sigma$, then $u \ne 0$ $\mathcal{L}^n$-a.e. on $\Omega$ by (19.18), and thus (19.17) implies $\mathcal{L}^n(\Omega \setminus T(f)) = 0$. In particular, if $f$ is a Kantorovich potential from $\mu$ to $\nu$, then $|\nabla f| = 1$ $\mathcal{L}^n$-a.e. on $\Omega$. Now, if $f_1$ and $f_2$ are Kantorovich potentials from $\mu$ to $\nu$, then $(f_1 + f_2)/2$ is a Kantorovich potential from $\mu$ to $\nu$ too (since in the dual Kantorovich problem we are maximizing the linear function $g \mapsto \int_{\mathbb{R}^n} g\, d\nu - \int_{\mathbb{R}^n} g\, d\mu$ on the convex set $\{g : \operatorname{Lip}(g) \le 1\}$, see (3.38)). Hence, $|\nabla (f_1 + f_2)/2| = 1$ $\mathcal{L}^n$-a.e. on $\Omega$, and, by strict convexity of the Euclidean norm, $\nabla f_1 = \nabla f_2$ $\mathcal{L}^n$-a.e. on $\Omega$. Since $\Omega$ is, in particular, open and connected, this implies that $f_1 = f_2 + \text{constant}$ in $\Omega$, as claimed.

We now comment on the relation between Theorem 19.4 and the "needle decomposition" argument used to prove Theorem 19.1. There, given $\delta > 0$, we have constructed a partition of $\Omega$ into convex (solid) "needles" $\{\Omega_i^\delta\}_i$ with waist of size $\delta$, so that $\int_{\Omega_i^\delta} u = 0$ for each $i$. Further assuming $u \in C^2(\Omega)$, we have deduced, by Lemma 19.2, that (19.2) holds on each needle $\Omega_i^\delta$ up to an error of size $\|u\|_{C^2(\Omega)}\,\delta\,\mathcal{L}^n(\Omega_i^\delta)$, and then, by adding up the resulting approximate inequalities over $i$ before sending $\delta \to 0^+$, we have deduced (19.2) on $\Omega$. A natural question is thus whether this approximation step is necessary at all; in other words, whether one could directly work, so to say, with $\delta = 0$. Theorem 19.4 answers that question affirmatively. By the properties of disintegration (see Theorem 17.1), the decomposition$^4$ of $\Omega$ into one-dimensional (rather than solid) "needles" $\{I_\sigma\}_\sigma$, or, better, the disintegration of $\mathcal{L}^n \llcorner \Omega$ into measures $\{\mu_\sigma = \ell_\sigma\, \mathcal{H}^1 \llcorner I_\sigma\}_\sigma$, allows us to decompose "bulk" integrals on $\Omega$ as superpositions of one-dimensional integrals such that $u$ still has zero average with respect to each $\mu_\sigma$. Lemma 19.2 applies thanks to the concavity of $\ell_\sigma^{1/(n-1)}$, which descends, in the limit as $\delta \to 0^+$, from the concavity of the slice function (19.13) (indeed, each $\mu_\sigma$ can be seen as a weak-star limit of probability measures of the form $\mathcal{L}^n(\Omega_{i(j)}^{\delta_j})^{-1}\, \mathcal{L}^n \llcorner \Omega_{i(j)}^{\delta_j}$ with $\delta_j \to 0^+$ as $j \to \infty$). We close this section by showing how easily the proof of Theorem 19.1 is reduced to that of Lemma 19.2 by using Theorem 19.4. The proof of Theorem 19.4 will then take the rest of the chapter.

$^4$ In this sentence we are really thinking of the situation where $\mathcal{L}^n(\Omega \setminus T(f)) = 0$, and thus we obtain a foliation by segments of the whole $\Omega$, and not just of $\Omega \setminus T(f)$ (which is a subset of $\{u = 0\}$ modulo $\mathcal{L}^n$-null sets by (19.17), and thus is irrelevant for the typical applications of the localization theorem, see for example the proof of Theorem 19.1 via Theorem 19.4).
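Proof of Theorem 19.1 via Theorem 19.4 Given $u \in C^1(\Omega)$, let $f$ be as in Theorem 19.4. By (19.17) and by Theorem 19.4-(ii), setting for brevity $\lambda = (P_f)_\#(\mathcal{L}^n \llcorner (\Omega \cap T(f)))$, we have
\[ \int_\Omega |\nabla u|^2 = \int_{\Omega \cap T(f)} |\nabla u|^2 = \int_{S^\circ} d\lambda(\sigma) \int_{I_\sigma} |\nabla u|^2\,\ell_\sigma\, d\mathcal{H}^1 \ge \int_{S^\circ} d\lambda(\sigma) \int_{I_\sigma} (u_\sigma')^2\,\ell_\sigma\, d\mathcal{H}^1, \]
where $u_\sigma \in C^1(I_\sigma)$ denotes the restriction of $u$ to the open segment $I_\sigma$ so that, trivially, $|\nabla u| \ge |u_\sigma'|$ on $I_\sigma$. By Theorem 19.4-(i) and Lemma 19.2,
\[ \int_{I_\sigma} (u_\sigma')^2\,\ell_\sigma \ge \Big( \frac{\pi}{\operatorname{diam}(I_\sigma)} \Big)^2 \int_{I_\sigma} u_\sigma^2\,\ell_\sigma, \]
where $\operatorname{diam}(I_\sigma) \le \operatorname{diam}(\Omega)$, and thus, using again Theorem 19.4-(ii), we find
\[ \int_\Omega |\nabla u|^2 \ge \Big( \frac{\pi}{\operatorname{diam}(\Omega)} \Big)^2 \int_{S^\circ} d\lambda(\sigma) \int_{I_\sigma} u_\sigma^2\,\ell_\sigma = \Big( \frac{\pi}{\operatorname{diam}(\Omega)} \Big)^2 \int_{\Omega \cap T(f)} u^2. \]
Using again (19.17), we conclude the proof.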
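The one-dimensional weighted inequality of Lemma 19.2, to which the proof above reduces, can be sanity-checked numerically. The following is only a minimal sketch under stated assumptions: the helper `rayleigh_quotient`, the midpoint quadrature, and the two sample weights are illustrative choices introduced here, not part of the text. It approximates $\int_0^d (u')^2 \ell \,/ \int_0^d (u - \bar u)^2 \ell$, where $\bar u$ is the $\ell$-weighted mean, first for a constant weight (the sharp case of Remark 19.3) and then for $\ell = k^2$ with $k(t) = t(d-t)$ concave.

```python
import math

def rayleigh_quotient(u, du, weight, d, n=20000):
    """Midpoint-rule approximation of
    int_0^d (u')^2 w / int_0^d (u - mean)^2 w,
    where mean is the w-weighted average of u on (0, d)."""
    h = d / n
    ts = [(i + 0.5) * h for i in range(n)]
    w = [weight(t) for t in ts]
    mass = sum(w) * h
    mean = sum(u(t) * wi for t, wi in zip(ts, w)) * h / mass
    num = sum(du(t) ** 2 * wi for t, wi in zip(ts, w)) * h
    den = sum((u(t) - mean) ** 2 * wi for t, wi in zip(ts, w)) * h
    return num / den

d = 2.0
bound = (math.pi / d) ** 2  # the constant (pi/d)^2 of Lemma 19.2

# Constant weight, u = cos(pi t / d): the quotient should match the bound.
sharp = rayleigh_quotient(
    lambda t: math.cos(math.pi * t / d),
    lambda t: -(math.pi / d) * math.sin(math.pi * t / d),
    lambda t: 1.0,
    d,
)

# Weight k^m with k(t) = t (d - t) concave and m = 2, tested on u(t) = t.
needle = rayleigh_quotient(
    lambda t: t,
    lambda t: 1.0,
    lambda t: (t * (d - t)) ** 2,
    d,
)

print(bound, sharp, needle)
```

For the second sample the quotient can be computed by hand (it equals $(16/15)/(16/105) = 7$ when $d = 2$), which is comfortably above $(\pi/2)^2$; a single test function of course only illustrates the inequality and does not verify it.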
19.3 $C^{1,1}$-Extensions of Kantorovich Potentials
In a first step toward the proof of Theorem 19.4 we revisit in more geometric terms the analysis from Section 18.3. There we proved that if $f$ is a Kantorovich potential, then $\nabla f$ has the countable Lipschitz property on $T_c(f)$, the transport set with endpoints of $f$ (see Section 18.1 for all the relevant notation). We now show that $\nabla f$ is actually $O(1/\varepsilon)$-Lipschitz continuous on the transport-type sets $\Sigma_\varepsilon(f)$ of those $x \in T(f)$ that are midpoints of intervals of length $2\varepsilon$ contained in some transport ray of $f$, i.e., on
\[ \Sigma_\varepsilon(f) = \big\{ x \in T(f) : [\, x - \varepsilon\,\nabla f(x),\ x + \varepsilon\,\nabla f(x) \,] \subset \Phi(P_f(x)) \big\}. \tag{19.19} \]
In addition, we construct a $C^{1,1}$-extension of $f$ from $\Sigma_\varepsilon(f)$ to $\mathbb{R}^n$ with $C^{1,1}$-norm of order $O(1/\varepsilon)$ by using the Whitney extension theorem.

Theorem 19.6 If $f : \mathbb{R}^n \to \mathbb{R}$ is 1-Lipschitz and $\varepsilon > 0$, then
\[ |\nabla f(x) - \nabla f(y)| \le \frac{48}{\varepsilon}\,|x - y|, \tag{19.20} \]
\[ |f(y) - f(x) - \nabla f(x) \cdot (y - x)| \le \frac{16}{\varepsilon}\,|x - y|^2, \qquad \forall x, y \in \Sigma_\varepsilon(f). \tag{19.21} \]
In particular, there exists $f_\varepsilon \in C^{1,1}(\mathbb{R}^n)$ such that
\[ f = f_\varepsilon \quad \text{and} \quad \nabla f_\varepsilon = \nabla f \quad \text{on } \Sigma_\varepsilon(f). \tag{19.22} \]
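To get a feeling for the constants in (19.20) and (19.21), one can test them on the model potential $f(x) = |x|$ in the plane. The sketch below is illustrative only and rests on an assumption made here, not in the text: that the maximal transport rays of this $f$ are the open half-lines issuing from the origin, so that $\Sigma_\varepsilon(f) = \{|x| > \varepsilon\}$. The helper names `grad_f` and `check_pair` are hypothetical.

```python
import math
import random

def grad_f(x):
    """Gradient of f(x) = |x| away from the origin: the unit vector x/|x|."""
    r = math.hypot(x[0], x[1])
    return (x[0] / r, x[1] / r)

def check_pair(x, y, eps):
    """Return (LHS/RHS of (19.20), LHS/RHS of (19.21)) at the pair (x, y)."""
    gx, gy = grad_f(x), grad_f(y)
    dist = math.hypot(x[0] - y[0], x[1] - y[1])
    grad_gap = math.hypot(gx[0] - gy[0], gx[1] - gy[1])
    taylor_gap = abs(
        math.hypot(y[0], y[1]) - math.hypot(x[0], x[1])
        - (gx[0] * (y[0] - x[0]) + gx[1] * (y[1] - x[1]))
    )
    return grad_gap / ((48 / eps) * dist), taylor_gap / ((16 / eps) * dist ** 2)

def sample_point(eps, rng):
    """Random point with eps < |x| <= 3 (inside the assumed Sigma_eps)."""
    r = rng.uniform(1.01 * eps, 3.0)
    a = rng.uniform(0.0, 2.0 * math.pi)
    return (r * math.cos(a), r * math.sin(a))

eps = 0.5
rng = random.Random(0)  # fixed seed so the run is reproducible
worst_grad, worst_taylor = 0.0, 0.0
for _ in range(5000):
    x, y = sample_point(eps, rng), sample_point(eps, rng)
    if math.hypot(x[0] - y[0], x[1] - y[1]) < 1e-6:
        continue  # skip near-coincident pairs to avoid round-off in the ratios
    r1, r2 = check_pair(x, y, eps)
    worst_grad, worst_taylor = max(worst_grad, r1), max(worst_taylor, r2)

print(worst_grad, worst_taylor)  # both ratios should stay below 1
```

For this particular $f$ the elementary bounds $|x/|x| - y/|y|| \le (2/\varepsilon)|x - y|$ and $|y| - x \cdot y/|x| \le |x - y|^2/(2\varepsilon)$ hold on $\{|x|, |y| > \varepsilon\}$, so both ratios remain well below 1; the constants 48 and 16 of the theorem are not claimed to be optimal.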
Figure 19.2 The situation in the proof of Theorem 19.6.

Proof Thanks to the Whitney extension theorem for $C^{1,1}$-functions (see Appendix A.16), (19.22) is a consequence of (19.20) and (19.21). Since $|\nabla f| \le 1$, if $|x - y| \ge \varepsilon/8$, then (19.20) and (19.21) hold, respectively, with $(16/\varepsilon)\,|x - y|$ and with $(16/\varepsilon)\,|x - y|^2$ on their right-hand sides. Therefore, we can focus on proving (19.20) and (19.21) when $x, y \in \Sigma_\varepsilon(f)$ are such that
\[ |x - y| \le \frac{\varepsilon}{8}. \tag{19.23} \]
We set
\[ x_1 = x + \big( f(y) - f(x) \big)\,\nabla f(x), \qquad y_1 = y. \]
By $\operatorname{Lip}(f) \le 1$,
\[ |x_1 - x| \le |x - y|, \tag{19.24} \]
so (19.23) and $x \in \Sigma_\varepsilon(f)$ give $x_1 \in \Phi(P_f(x))$. Since $f$ grows linearly with unit slope along $P_f(x)$, we have $f(x_1) = f(x) + f(y) - f(x) = f(y)$, and, setting
\[ x_0 = x_1 - \delta\,\nabla f(x), \quad x_2 = x_1 + \delta\,\nabla f(x), \quad y_0 = y_1 - \delta\,\nabla f(y), \quad y_2 = y_1 + \delta\,\nabla f(y), \quad \delta = \frac{\varepsilon}{8}, \]
we deduce from $x, y \in \Sigma_\varepsilon(f)$ that $x_i \in \Phi(P_f(x))$, $y_i \in \Phi(P_f(y))$ ($i = 0, 1, 2$), and
\[ x_0, y_0 \in \{ f = f(y) - \delta \}, \qquad x_1, y_1 \in \{ f = f(y) \}, \qquad x_2, y_2 \in \{ f = f(y) + \delta \}; \tag{19.25} \]
see Figure 19.2. We now claim that
\[ \max\{ |x_i - x_j|, |y_i - y_j| \} \le |x_i - y_j|, \qquad \forall i, j = 0, 1, 2, \tag{19.26} \]
\[ |x_1 - y_1| \le 2\,|x - y|, \tag{19.27} \]
\[ \max\{ |x_0 - y_0|, |x_2 - y_2| \} \le 2\,|x_1 - y_1|. \tag{19.28} \]
To prove (19.26): Thanks to $x_i, x_j \in \Phi(P_f(x))$ and to (19.25) we have $|x_i - x_j| = |f(x_i) - f(x_j)| = |f(x_i) - f(y_j)| \le |x_i - y_j|$.

To prove (19.27): By (19.24), $|x_1 - y_1| \le |x_1 - x| + |x - y_1| \le 2\,|x - y|$.

To prove (19.28): The symmetries of the problem allow us to assume that $|x_0 - y_0| \le |x_2 - y_2|$, and thus focus on proving $|x_2 - y_2| \le 2\,|x_1 - y_1|$. By (19.26), $|x_0 - x_2| \le |x_0 - y_2| = |(y_2 - x_2) + (x_2 - x_0)|$ so that, taking squares, expanding them, and noticing that $x_0 - x_2 = 2\,(x_1 - x_2)$, we find
\[ |x_2 - y_2|^2 \ge 2\,(y_2 - x_2) \cdot (x_0 - x_2) = 4\,(y_2 - x_2) \cdot (x_1 - x_2). \tag{19.29} \]
Similarly, again by (19.26), $|y_0 - y_2| \le |y_0 - x_2| = |(y_2 - x_2) + (y_0 - y_2)|$, so $y_2 - y_0 = 2\,(y_2 - y_1)$ implies
\[ |x_2 - y_2|^2 \ge 2\,(y_2 - x_2) \cdot (y_2 - y_0) = 4\,(y_2 - x_2) \cdot (y_2 - y_1). \tag{19.30} \]
By (19.29) and (19.30),
\[ |y_2 - x_2|\,|y_1 - x_1| \ge (y_2 - x_2) \cdot (y_1 - x_1) = (y_2 - x_2) \cdot \big[ (y_2 - x_2) - (y_2 - y_1) - (x_1 - x_2) \big] \]
\[ = |y_2 - x_2|^2 - (y_2 - x_2) \cdot (y_2 - y_1) - (y_2 - x_2) \cdot (x_1 - x_2) \ge \frac{|x_2 - y_2|^2}{2}, \]
thus completing the proof of (19.28). Finally:

To prove (19.20): By (19.28) and (19.27),
\[ \delta\,|\nabla f(x) - \nabla f(y)| = |(x_2 - x_1) - (y_2 - y_1)| \le |x_2 - y_2| + |x_1 - y_1| \le 3\,|x_1 - y_1| \le 6\,|x - y|, \]
and (19.20) follows since $\delta = \varepsilon/8$.

To prove (19.21): By $y = y_1$ we have
\[ |f(y) - f(x) - \nabla f(x) \cdot (y - x)| = \big| f(y_1) - \big[ f(x) + \nabla f(x) \cdot (x_1 - x) \big] - \nabla f(x) \cdot (y - x_1) \big| = |\nabla f(x) \cdot (y - x_1)|, \]
where we have used the identity $f(y_1) = f(x_1) = f(x) + \nabla f(x) \cdot (x_1 - x)$. Now, by (19.26), $|x_0 - x_1|^2 \le |x_0 - y_1|^2 = |x_0 - x_1|^2 + |x_1 - y_1|^2 + 2\,(x_0 - x_1) \cdot (x_1 - y_1)$, which combined with $x_0 - x_1 = -\delta\,\nabla f(x)$ and with (19.27) gives $2\,\delta\,\nabla f(x) \cdot (x_1 - y_1) \le |x_1 - y_1|^2 \le 4\,|x - y|^2$. Similarly, $|x_2 - x_1|^2 \le |x_2 - y_1|^2 = |x_2 - x_1|^2 + |x_1 - y_1|^2 + 2\,(x_2 - x_1) \cdot (x_1 - y_1)$, so that $x_2 - x_1 = \delta\,\nabla f(x)$ and (19.27) give $-2\,\delta\,\nabla f(x) \cdot (x_1 - y_1) \le |x_1 - y_1|^2 \le 4\,|x - y|^2$, and hence $|\nabla f(x) \cdot (y - x_1)| \le (2/\delta)\,|x - y|^2 = (16/\varepsilon)\,|x - y|^2$.
19.4 Concave Needles and Proof of the Localization Theorem
In Theorem 19.7, given a 1-Lipschitz function $f$, we consider the disintegration $\{\mu_\sigma\}_\sigma$ of $\mu = \mathcal{L}^n \llcorner T_c(f)$ with respect to $P_f$ and prove a crucial concavity property of the densities of $\mu_\sigma$ with respect to $\mathcal{H}^1 \llcorner P_f^{-1}(\sigma)$. Here, $P_f$ has been extended as usual from $T(f)$ to $T_c(f) \cap F$ by Proposition 18.6, where $F$ is the differentiability set of $f$, and thus $\mathcal{L}^n(\mathbb{R}^n \setminus F) = 0$: in particular, it makes sense to disintegrate $\mu$ with respect to $P_f$. With Theorem 19.7 proved, we will finally be ready to prove Theorem 19.4 and conclude our discussion.

Theorem 19.7 (Concave needles) If $n \ge 2$, $f : \mathbb{R}^n \to \mathbb{R}$ is 1-Lipschitz and $\mu = \mathcal{L}^n \llcorner T_c(f)$, then the disintegration $\{\mu_\sigma\}_\sigma$ of $\mu$ with respect to $P_f$ is such that
\[ \mu_\sigma = \ell_\sigma\, d\mathcal{H}^1 \llcorner P_f^{-1}(\sigma) \qquad \text{for } (P_f)_\#\mu\text{-a.e. } \sigma \in S^\circ, \tag{19.31} \]
where $\ell_\sigma$ is a smooth function on $P_f^{-1}(\sigma)$ such that $(\ell_\sigma)^{1/(n-1)}$ is concave along $P_f^{-1}(\sigma)$.
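A simple model case may help to visualize the statement. The computation below is an illustration added here and not taken from the text; it assumes that for $f(x) = \min\{|x|, R\}$ the maximal transport rays are the open radial segments of the ball $B_R$, and that each fiber measure $\mu_\theta$ is normalized to unit mass.

```latex
% Illustrative example (assumptions of this sketch, not from the source):
% f(x) = \min\{|x|, R\}, with transport rays taken to be the radial segments
% I_\theta = \{ r\theta : 0 < r < R \}, \theta \in \mathbb{S}^{n-1}.
\[
  \int_{B_R} g \, d\mathcal{L}^n
  = \int_{\mathbb{S}^{n-1}} d\mathcal{H}^{n-1}(\theta)
    \int_0^R g(r\theta)\, r^{n-1}\, dr ,
\]
% so that, normalizing each fiber measure to have unit mass,
\[
  \ell_\theta(r) = \frac{n}{R^n}\, r^{n-1},
  \qquad
  \ell_\theta(r)^{1/(n-1)} = \Big( \frac{n}{R^n} \Big)^{1/(n-1)}\, r ,
\]
% which is linear, hence concave, in r \in (0, R).
```

In this example the $(n-1)$-th root of the density is exactly linear, which is the borderline case of the concavity asserted in Theorem 19.7.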
Proof By Theorem 18.9 we can apply Theorem 18.7 to $G = G(f)$. Hence, for $(P_f)_\#\mu$-a.e. $\sigma \in S^\circ$ there is $\ell_\sigma : P_f^{-1}(\sigma) \to \mathbb{R}$ Borel measurable such that
\[ \mu_\sigma = \ell_\sigma\, d\mathcal{H}^1 \llcorner P_f^{-1}(\sigma). \tag{19.32} \]
Therefore, our goal will be proving that $\ell_\sigma$ is smooth and $(\ell_\sigma)^{1/(n-1)}$ is concave on the open segment $P_f^{-1}(\sigma)$. In doing so we shall use the coarea formula for rectifiable sets, which works analogously to the most basic version already used in Remark 17.5, and which is recalled here in Appendix A.14 and A.15.
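Step one: Given $k > 0$ large enough (any $k > 96$ will do), let $\varepsilon > 0$ and set
\[ \delta = \Big( 1 - \frac{1}{k} \Big)\,\varepsilon. \]
Denoting by $f_\delta$ the $C^{1,1}$-extension of $f$ from $\Sigma_\delta(f)$ to $\mathbb{R}^n$ constructed in Theorem 19.6, since $\nabla f_\delta$ is Lipschitz continuous, $\nabla f = \nabla f_\delta$ on $\Sigma_\delta(f) \subset T(f)$ and $|\nabla f| = 1$ on $T(f)$, by the coarea formula, if $G_\delta$ is the set of nondifferentiability points of $\nabla f_\delta$ and $A$ is an arbitrary open set containing $G_\delta$, we have
\[ \mathcal{L}^n(A \cap \Sigma_\delta(f)) = \int_{A \cap \Sigma_\delta(f)} |\nabla f_\delta| = \int_{\mathbb{R}} \mathcal{H}^{n-1}\big( A \cap \Sigma_\delta(f) \cap \{ f_\delta = s \} \big)\, ds. \]
Applying this to $A = A_j$ with $G_\delta \subset A_j \subset A_{j-1}$ and $\mathcal{L}^n(A_j) \to \mathcal{L}^n(G_\delta) = 0$ as $j \to \infty$, we find that
\[ 0 = \lim_{j \to \infty} \int_{\mathbb{R}} \mathcal{H}^{n-1}\big( A_j \cap \Sigma_\delta(f) \cap \{ f_\delta = s \} \big)\, ds \ge \int_{\mathbb{R}} \mathcal{H}^{n-1}\big( G_\delta \cap \Sigma_\delta(f) \cap \{ f_\delta = s \} \big)\, ds \ge 0. \]
In particular, there is $J_0 \subset \mathbb{R}$ such that $\mathcal{L}^1(\mathbb{R} \setminus J_0) = 0$ and
\[ \text{if } s \in J_0, \text{ then } \nabla f_\delta \text{ is differentiable } \mathcal{H}^{n-1}\text{-a.e. on } \{ f_\delta = s \} \cap \Sigma_\delta(f). \tag{19.33} \]
Given $s \in J_0$, we now define
\[ M = \Sigma_\varepsilon(f) \cap \{ f = s \}, \qquad T : M \times I_{\varepsilon/k} \to \mathbb{R}^n, \quad T(y,t) = y + t\,\nabla f(y), \]
\[ E = T\big( M \times I_{\varepsilon/k} \big) = \Big\{ y + t\,\nabla f(y) : y \in M,\ |t| < \frac{\varepsilon}{k} \Big\} \subset T(f); \]
see Figure 19.3, where $I_a = (-a, a)$ and where we have used the facts that $\Sigma_\varepsilon(f) \subset T(f)$ and that $f$ is differentiable on $T(f)$ to define $T$. We notice that, setting $E_{\varepsilon,s}$ in place of $E$ to momentarily stress the dependency of $E$ on the choices of $\varepsilon$ and $s \in J_0$, we have
\[ T(f) = \bigcup_{\varepsilon > 0}\ \bigcup_{s \in J_0} E_{\varepsilon,s}. \tag{19.34} \]
Indeed, if $x \in T(f)$, then $x \in \Sigma_{2\varepsilon}(f)$ for some $\varepsilon > 0$, and, by $\mathcal{L}^1(\mathbb{R} \setminus J_0) = 0$, we can find $s \in J_0$ with $|s - f(x)| < \varepsilon/k$. Then $y = x - (f(x) - s)\,\nabla f(x)$ is such that $f(y) = s$, $\nabla f(y) = \nabla f(x)$, $x = y + t\,\nabla f(y)$ for $t = f(x) - s \in I_{\varepsilon/k}$, and $y \in \Sigma_{2\varepsilon - |t|}(f)$, where $2\varepsilon - |t| > \varepsilon$. We claim that if $k$ is large enough, then $M$ is locally $\mathcal{H}^{n-1}$-rectifiable, $E$ is a Borel set contained in $\Sigma_\delta(f)$, and $T$ is Lipschitz continuous and invertible on $M \times I_{\varepsilon/k}$, with Lipschitz continuous inverse. To prove the claim, we start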
Figure 19.3 Construction of $M$, $T$, and $E$. The dotted segments with open endpoints represent the maximal transport rays making up $T(f)$. The light grey region is $\Sigma_\delta(f)$, which is obtained by cutting off segments of length $\delta$ from each end of every maximal transport ray. In particular, $\Sigma_\varepsilon(f)$ is obtained from $\Sigma_\delta(f)$ by additionally cutting off segments of length $\varepsilon/k$. We define $M$ as the intersection of some level set $\{f = s\}$ of $f$ with $\Sigma_\varepsilon(f)$. In this way, the set $E$ (depicted in dark grey), obtained by taking the union of the segments of length $2\varepsilon/k$ centered on points in $M$, is contained in $\Sigma_\delta(f)$, and thus $f = f_\delta$ on $E$. The map $T$ simply takes $M \times I_{\varepsilon/k}$ into $E$ by the formula $T(y,t) = y + t\,\nabla f(y)$.
noticing that, since $f = f_\varepsilon$ and $|\nabla f_\varepsilon| = |\nabla f| = 1$ on $\Sigma_\varepsilon(f)$, by the implicit function theorem $\{ f_\varepsilon = s \}$ is a $C^1$-hypersurface in a neighborhood of $M$, and since $M \subset \{ f_\varepsilon = s \}$, we conclude that $M$ is locally $\mathcal{H}^{n-1}$-rectifiable. Next, if $k > 48$, then $\operatorname{Lip}(T; M \times I_{\varepsilon/k}) \le C(n)$ thanks to (19.20) and
\[ |T(y_1,t_1) - T(y_2,t_2)| \le |y_1 - y_2| + |t_1 - t_2| + \frac{\varepsilon}{k}\,|\nabla f(y_1) - \nabla f(y_2)| \le 2\,|y_1 - y_2| + |t_1 - t_2|. \]
In particular, $T$ is Lipschitz continuous on $M \times I_{\varepsilon/k}$, and $E$ is a Borel set (since it is the Lipschitz image of a Borel set). To prove that $T$ is invertible with Lipschitz inverse, let $x_1, x_2 \in E$, $x_i = T(y_i,t_i)$: by $y_i \in \Sigma_\varepsilon(f)$ we find $f(x_i) = f(y_i + t_i\,\nabla f(y_i)) = f(y_i) + t_i = s + t_i$, so that $|t_1 - t_2| = |f(x_1) - f(x_2)| \le |x_1 - x_2|$; in particular,
\[ |y_1 - y_2| \le |x_1 - x_2| + |t_1\,\nabla f(y_1) - t_2\,\nabla f(y_2)| \le |x_1 - x_2| + |t_1 - t_2| + |t_1|\,|\nabla f(y_1) - \nabla f(y_2)| \le 2\,|x_1 - x_2| + \frac{\varepsilon}{k}\,\frac{48}{\varepsilon}\,|y_1 - y_2|, \]
so that $(1/2)\,|y_1 - y_2| \le 2\,|x_1 - x_2|$ if $k \ge 96$, and $\operatorname{Lip}(T^{-1}; E) \le C(n)$. Finally, to prove $E \subset \Sigma_\delta(f)$, we notice that if $x \in E$, then $x = y + t\,\nabla f(y)$ for
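$y \in M \subset \Sigma_\varepsilon(f)$ and $|t| < \varepsilon/k$, so that by $\nabla f(x) = \nabla f(y)$ we find
\[ [\, x - \delta\,\nabla f(x),\ x + \delta\,\nabla f(x) \,] \subset [\, y - \varepsilon\,\nabla f(y),\ y + \varepsilon\,\nabla f(y) \,] \subset \Phi(P_f(y)) = \Phi(P_f(x)) \]
and thus $x \in \Sigma_\delta(f)$. Having proved the claim, and for reasons that will become apparent in step two, we set $p : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}^n$, $p(x,t) = x$, and consider the Lipschitz map
\[ Q = p \circ T^{-1} : E \to M, \tag{19.35} \]
and make a second claim: setting, for $\mathcal{L}^n$-a.e. $x \in E$, $CQ(x) = \det\big( \nabla Q(x)\,(\nabla Q(x))^* \big)^{1/2}$ for $\nabla Q(x) \in (T_{Q(x)}M) \otimes \mathbb{R}^n$, there is $M_0 \subset M$ with $\mathcal{H}^{n-1}(M \setminus M_0) = 0$ such that, for every $y \in M_0$,
\[ \text{along each segment } \{ y + t\,\nabla f(y) : t \in I_{\varepsilon/k} \},\ 1/CQ \text{ is } \mathcal{H}^1\text{-a.e. equal to a smooth function whose } (1/(n-1))\text{-power is concave}. \tag{19.36} \]
To prove this second claim, since $\nabla f_\delta$ is Lipschitz continuous on $\Sigma_\delta(f)$ and $M \subset E \subset \Sigma_\delta(f)$, we can find $M_0 \subset M$ with $\mathcal{H}^{n-1}(M \setminus M_0) = 0$ such that
\[ \text{at every } y \in M_0,\ \nabla f_\delta \text{ is tangentially differentiable along } M. \tag{19.37} \]
Since $s \in J_0$ and $M \subset \{ f_\delta = s \}$, thanks to (19.33) and up to shrinking $M_0$, we can still retain $\mathcal{H}^{n-1}(M \setminus M_0) = 0$ while achieving, in addition to (19.37),
\[ \text{at every } y \in M_0,\ \nabla f_\delta \text{ is differentiable}. \tag{19.38} \]
By (19.37) and (19.38) (see (A.31)), we thus find that
\[ \nabla^M \nabla f_\delta(y)[\tau] = \nabla^2 f_\delta(y)[\tau] \qquad \forall \tau \in T_yM,\ \forall y \in M_0. \tag{19.39} \]
Since $\nabla f = \nabla f_\delta$ on $\Sigma_\delta(f)$ and $\nabla f$ is constant on the segment $\{ y + t\,\nabla f(y) : t \in I_{\varepsilon/k} \} \subset \Sigma_\delta(f)$, we see that $\nabla^2 f_\delta(y)[\nabla f(y)] = 0$, where $T_yM = (\nabla f(y))^\perp$. By the symmetry of $\nabla^2 f_\delta(y)$ we thus conclude that for every $y \in M_0$ we can find an orthonormal basis $\{\tau_i(y)\}_{i=1}^{n-1}$ of $T_yM \equiv \nabla f(y)^\perp$ and $\{\kappa_i(y)\}_{i=1}^{n-1}$ such that
\[ \nabla^2 f_\delta(y) = \sum_{i=1}^{n-1} \kappa_i(y)\, \tau_i(y) \otimes \tau_i(y), \]
which combined with (19.39) gives
\[ \nabla^M \nabla f_\delta(y) = \sum_{i=1}^{n-1} \kappa_i(y)\, \tau_i(y) \otimes \tau_i(y), \qquad \forall y \in M_0. \tag{19.40} \]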
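Now, since $M \subset E \subset \Sigma_\delta(f)$, we have $T(y,t) = y + t\,\nabla f_\delta(y)$ for $(y,t) \in M \times I_{\varepsilon/k}$, and thus we deduce from (19.37) and (19.40) that for every $(y,t) \in M_0 \times I_{\varepsilon/k}$, $T$ is tangentially differentiable along $M \times I_{\varepsilon/k}$ with
\[ \nabla^{M \times I} T(y,t) = \sum_{i=1}^{n-1} \big( 1 + t\,\kappa_i(y) \big)\, \tau_i(y) \otimes \tau_i(y) + \nabla f_\delta(y) \otimes e, \tag{19.41} \]
if $e$ is such that $\{\tau_1(y), \dots, \tau_{n-1}(y), e\}$ is an orthonormal basis of $T_yM \times \mathbb{R}$. We now notice that, since $\mathcal{H}^{n-1}(M \setminus M_0) = 0$, we have
\[ \mathcal{L}^n\big( T\big( (M \setminus M_0) \times I_{\varepsilon/k} \big) \big) = \int_{(M \setminus M_0) \times I_{\varepsilon/k}} J^{M \times I_{\varepsilon/k}} T \; d\big( \mathcal{H}^{n-1} \times \mathcal{L}^1 \big) = 0, \]
so that $E$ is $\mathcal{L}^n$-equivalent to $T(M_0 \times I_{\varepsilon/k})$ and there is $E_0 \subset E$ with $\mathcal{L}^n(E \setminus E_0) = 0$, and, for every $x \in E_0$,
\[ Q \text{ is differentiable at } x, \tag{19.42} \]
and $x = T(y,t)$ for some $(y,t) \in M_0 \times I_{\varepsilon/k}$. Now, by construction, $y = Q(y + t\,\nabla f(y))$ for every $(y,t) \in M \times I_{\varepsilon/k}$, and since $M \subset \Sigma_\delta(f) \subset \{ \nabla f = \nabla f_\delta \}$, we have
\[ y = Q\big( y + t\,\nabla f_\delta(y) \big) \qquad \forall (y,t) \in M \times I_{\varepsilon/k}. \tag{19.43} \]
For every $x = T(y,t)$ as in (19.42), we can differentiate (19.43) along $\tau_i(y)$ and exploit (19.41) to find
\[ \tau_i(y) = \nabla Q(x)\big[ (1 + t\,\kappa_i(y))\, \tau_i(y) \big] = (1 + t\,\kappa_i(y))\, \nabla Q(x)[\tau_i(y)]; \]
similarly, we can differentiate along $e$ to find that $0 = \nabla Q(x)[\nabla f(y)]$. We have thus proved that for every $x \in E_0$ there is $(y,t) \in M_0 \times I_{\varepsilon/k}$ such that
\[ \nabla Q(x) = \sum_{i=1}^{n-1} \frac{\tau_i(y) \otimes \tau_i(y)}{1 + t\,\kappa_i(y)}, \qquad x = T(y,t), \tag{19.44} \]
and hence, by definition of $CQ$, such that
\[ \frac{1}{CQ(x)} = \prod_{i=1}^{n-1} \big( 1 + t\,\kappa_i(y) \big). \tag{19.45} \]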
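Now, the function on the right-hand side of this last identity is smooth, with concave $[1/(n-1)]$-power (being a geometric mean of $n-1$ positive affine functions of $t$), along the segment $\{ y + t\,\nabla f(y) : t \in I_{\varepsilon/k} \} \subset E$. To conclude, we notice that for every $y \in M_0$, the set $I_0(y) = \{ t \in I_{\varepsilon/k} : T(y,t) \in E_0 \}$ is such that $\mathcal{L}^1(I_{\varepsilon/k} \setminus I_0(y)) = 0$: indeed, $J^{M \times I} T > 0$ on $M_0 \times I_{\varepsilon/k}$, and, by the area formula,
\[ 0 = \mathcal{L}^n(E \setminus E_0) = \int_{M_0} d\mathcal{H}^{n-1}(y) \int_{I_{\varepsilon/k} \setminus I_0(y)} J^{M \times I} T \, d\mathcal{L}^1. \]
Therefore, if $y \in M_0$, then for $\mathcal{L}^1$-a.e. $t \in I_{\varepsilon/k}$ we have $x = T(y,t) \in E_0$, and thus (19.45) holds for $\mathcal{H}^1$-a.e. $x \in \{ y + t\,\nabla f(y) : t \in I_{\varepsilon/k} \}$. This proves (19.36).

Step two: To conclude the proof, we notice that since $P_f(M) = P_f(E)$ and $(P_f)|_M$ is Borel measurable and injective, it turns out that $\eta = [(P_f)|_M]^{-1} : P_f(E) \to M$ is Borel measurable and satisfies
\[ Q = \eta \circ (P_f)|_E. \tag{19.46} \]
By Remark 17.3, $\{ \mu_\sigma \llcorner E / \mu_\sigma(E) \}_\sigma$ is the disintegration of $\mu \llcorner E = \mathcal{L}^n \llcorner (E \cap T_c(f)) = \mathcal{L}^n \llcorner E$ with respect to $(P_f)|_E$; and thanks to (19.46), we can apply Remark 17.6 to find that if $\{\mu_y^Q\}_y$ is the disintegration of $\mu \llcorner E$ with respect to $Q$ (so that, for $Q_\#\mu$-a.e. $y \in M$, $\mu_y^Q$ is concentrated on the open segment $Q^{-1}(y) = E \cap \Phi(P_f(y))$), then
\[ \frac{\mu_\sigma \llcorner E}{\mu_\sigma(E)} = \mu^Q_{\eta(\sigma)} \qquad \text{for } [(P_f)|_E]_\#\mu\text{-a.e. } \sigma \in P_f(E) \subset S^\circ. \]
In particular, thanks to (19.32),
\[ \ell_\sigma\, d\mathcal{H}^1 \llcorner \big( P_f^{-1}(\sigma) \cap E \big) = \mu_\sigma(E)\, \mu^Q_{\eta(\sigma)} \qquad \text{for } (P_f)_\#\mu\text{-a.e. } \sigma \in P_f(E) \subset S^\circ \tag{19.47} \]
(where also $\mu^Q_{\eta(\sigma)}$ is concentrated on the open segment $P_f^{-1}(\sigma) \cap E$). Notice that $Q$ is a Lipschitz map from a Borel set $E \subset \mathbb{R}^n$ to a locally $\mathcal{H}^{n-1}$-rectifiable set $M \subset \mathbb{R}^n$ so that the coarea formula
\[ \int_F CQ = \int_M \mathcal{H}^1\big( F \cap Q^{-1}(y) \big)\, d\mathcal{H}^{n-1}(y) \qquad \forall F \subset E \text{ Borel} \tag{19.48} \]
holds (see, in particular, Remark A.3 and (A.32)), provided $(CQ)^2 = \det[(\nabla Q)(\nabla Q)^*]$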
and $\nabla Q(x) \in (T_{Q(x)}M) \otimes \mathbb{R}^n$ (as explained in the appendix, for $\mathcal{L}^n$-a.e. $x \in E$, $Q$ is differentiable at $x$, $M$ has an approximate tangent plane at $Q(x)$, and $\nabla Q(x)$ is well defined as an element of $(T_{Q(x)}M) \otimes \mathbb{R}^n$). Based on (19.48), assuming that $C^M Q > 0$ $\mathcal{L}^n$-a.e. on $E$ (as we are going to prove in a moment) we can repeat the argument of Remark 17.5 to find $M^* \subset M$ with
\[ \mu_y^Q = \frac{1}{CQ}\, d\mathcal{H}^1 \llcorner Q^{-1}(y) \quad \forall y \in M^*, \qquad Q_\#\mu(M \setminus M^*) = 0. \]
In particular, if $\sigma \in \eta^{-1}(M^*)$, then $Q^{-1}(\eta(\sigma)) = [(P_f)|_E]^{-1}(\sigma) = E \cap P_f^{-1}(\sigma)$ gives
\[ \mu^Q_{\eta(\sigma)} = \frac{1}{CQ}\, d\mathcal{H}^1 \llcorner \big[ E \cap P_f^{-1}(\sigma) \big], \tag{19.49} \]
where
\[ [(P_f)|_E]_\#\mu\big( \eta^{-1}(M \setminus M^*) \big) = Q_\#\mu(M \setminus M^*) = 0. \]
Having proved that (19.49) holds for $(P_f)_\#\mu$-a.e. $\sigma \in P_f(E)$, and recalling (19.47), we conclude that, for $(P_f)_\#\mu$-a.e. $\sigma \in P_f(E)$,
\[ \ell_\sigma = \frac{\mu_\sigma(E)}{CQ} \qquad \mathcal{H}^1\text{-a.e. on } E \cap P_f^{-1}(\sigma), \tag{19.50} \]
and thus that $\ell_\sigma$ is $\mathcal{H}^1$-equivalent to a smooth function with concave $1/(n-1)$-power thanks to (19.36). Thanks to (19.34), resorting back to the notation where $E = E_{\varepsilon,s}$, each open interval $P_f^{-1}(\sigma)$ is covered by the open intervals $\{ E_{\varepsilon,s} \cap P_f^{-1}(\sigma) \}_{\varepsilon > 0,\, s \in J_0}$, and the proof is complete.

We conclude with the proof of the localization theorem.

Proof of Theorem 19.4 Let $\Omega$, $u$, $\mu$, $\nu$ and $f$ be as in the statement of the theorem. By Theorem 19.7 we only need to prove (19.17) and the second stated property in Theorem 19.4-(i). To this end, we divide the proof into a few steps.

Step one: We claim that if $K$ is compact in $\mathbb{R}^n$, then
\[ \Big| \int_{\mathbb{R}^n \setminus [K \cup \operatorname{Dis}(K;f)]} u \Big| \le \int_K |u|, \tag{19.51} \]
where we have denoted by
\[ \operatorname{Dis}(K;f) = \big\{ x \in \mathbb{R}^n : |f(x) - f(y)| < |x - y| \text{ for every } y \in K \big\} \]
the set of those $x \in \mathbb{R}^n$ that are "not connected to $K$" by the transport ray geometry of $f$. To prove (19.51), let us define $g_\delta : \mathbb{R}^n \to \mathbb{R}$ by setting
\[ g_\delta(x) = \inf_{y \in \mathbb{R}^n} \big\{ f(y) + |x - y| - \delta\, 1_K(y) \big\}, \qquad x \in \mathbb{R}^n. \]
Evidently, $g_\delta$ is 1-Lipschitz (so that
\[ \int_{\mathbb{R}^n} u\,f \ge \int_{\mathbb{R}^n} u\,g_\delta \tag{19.52} \]
by Theorem 3.17), with
\[ 0 \le f - g_\delta \le \delta \ \text{ on } \mathbb{R}^n, \qquad g_\delta + \delta = f \ \text{ on } K. \tag{19.53} \]
Now, if we set $v_\delta = (f - g_\delta)/\delta$ ($\delta > 0$), then by (19.53) we have $v_\delta : \mathbb{R}^n \to [0,1]$; moreover, we easily see that, for every $x \in \mathbb{R}^n$, $\delta \in (0,\infty) \mapsto v_\delta(x)$ is increasing, and, in particular, $v = \lim_{\delta \to 0^+} v_\delta : \mathbb{R}^n \to [0,1]$ defines a Borel function such that (thanks to (19.52))
\[ \int_{\mathbb{R}^n} u\,v \ge 0. \tag{19.54} \]
Now $v = 1$ on $K$ by (19.53), while we claim that $v = 0$ on $\operatorname{Dis}(K;f)$: indeed, if $x \in \operatorname{Dis}(K;f)$, then $|f(x) - f(y)| < |x - y|$ for every $y \in K$ and, in particular,
\[ \inf_{y \in K} \big\{ f(y) + |x - y| \big\} = f(y_x) + |x - y_x| \ge f(x) + \delta_x, \]
for some $\delta_x > 0$; and hence, for every $\delta \in (0, \delta_x)$,
\[ f(y) + |x - y| - \delta\, 1_K(y) - f(x) \ge (\delta_x - \delta)\, 1_K(y) \ge 0 \]
so that $g_\delta \ge f$ on $\operatorname{Dis}(K;f)$, and hence $g_\delta = f$ therein by (19.53). Having proved $v = 1$ on $K$ and $v = 0$ on $\operatorname{Dis}(K;f)$ we conclude from (19.54)
\[ \int_{\mathbb{R}^n \setminus [K \cup \operatorname{Dis}(K;f)]} u \ge -\int_K |u|. \tag{19.55} \]
Considering that $-f$ is a Kantorovich potential from $\nu$ to $\mu$, that $(-u)^+ = u^-$ and $(-u)^- = u^+$, and that $\operatorname{Dis}(K;f) = \operatorname{Dis}(K;-f)$, the above argument also entails
\[ \int_{\mathbb{R}^n \setminus [K \cup \operatorname{Dis}(K;f)]} (-u) \ge -\int_K |u|, \]
which combined with (19.55) gives (19.51).

Step two: We now claim that
\[ \int_A u = 0, \tag{19.56} \]
whenever $A$ is a partial transport set for $f$, that is, whenever $A \subset T(f)$ and for every $x \in A$, it holds that $\Phi(P_f(x)) \subset A$. Indeed, if $K \subset A$ is compact and
\[ x \in \mathbb{R}^n \setminus \big( K \cup \operatorname{Dis}(K;f) \big), \]
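then there is $y \in K$ such that $|f(x) - f(y)| = |x - y|$; since $x \notin K$, it must be $x \ne y$, and $]x,y[ \, \subset \Phi(P_f(y)) \subset A$. In particular, either $x \in \Phi(P_f(y))$, and thus $x \in A$; or $x$ is an end-point of $P_f(y)$, and thus $x \in T_c(f) \setminus T(f)$; this shows that
\[ \mathbb{R}^n \setminus \big( K \cup \operatorname{Dis}(K;f) \big) \subset A \cup \big( T_c(f) \setminus T(f) \big). \]
Since $\mathcal{L}^n(T_c(f) \setminus T(f)) = 0$, we conclude from (19.51)
\[ \Big| \int_{A \setminus K} u \Big| \le \int_K |u|. \tag{19.57} \]
By the arbitrariness of $K \subset A$ and by the fact that $u \in L^1(\mathbb{R}^n)$, we find (19.56).

Step three: We now prove (19.17), that is, $u = 0$ a.e. on $\Omega \setminus T(f)$. Indeed, let us consider the isolated set of $f$,
\[ \operatorname{Iso}(f) = \big\{ x \in \mathbb{R}^n : |f(x) - f(y)| < |x - y| \ \ \forall y \in \mathbb{R}^n \big\} = \operatorname{Dis}(\mathbb{R}^n; f), \tag{19.58} \]
i.e., those points that are "totally isolated" in the transport geometry of $f$, and notice that
\[ \mathbb{R}^n \setminus T(f) = \operatorname{Iso}(f) \cup \big( T_c(f) \setminus T(f) \big). \tag{19.59} \]
If $K \subset \operatorname{Iso}(f)$ is compact, then $\operatorname{Dis}(K;f) \supset \operatorname{Dis}(\operatorname{Iso}(f);f) = \mathbb{R}^n$, so that (19.51) implies $\int_K u = 0$; by arbitrariness of $K$, $u = 0$ $\mathcal{L}^n$-a.e. on $\operatorname{Iso}(f)$ and thus on $\mathbb{R}^n \setminus T(f)$, thanks to $\mathcal{L}^n(T_c(f) \setminus T(f)) = 0$ and (19.59).

Conclusion: Let $\{\lambda_\sigma\}_\sigma$ be the disintegration of $\lambda = \mathcal{L}^n \llcorner (\Omega \cap T(f))$ with respect to $P_f$. For every $\Sigma \subset S^\circ$ we have, thanks to (19.56) applied to $A = P_f^{-1}(\Sigma)$,
\[ \int_\Sigma d[(P_f)_\#\lambda](\sigma) \int_{P_f^{-1}(\sigma)} u\, d\lambda_\sigma = \int_{P_f^{-1}(\Sigma)} u = 0. \]
By the arbitrariness of $\Sigma$ we conclude that
\[ \int_{P_f^{-1}(\sigma)} u\, d\lambda_\sigma = 0 \qquad \text{for } (P_f)_\#\lambda\text{-a.e. } \sigma \in S^\circ, \tag{19.60} \]
which proves the second stated property in Theorem 19.4-(i). This completes the proof of the localization theorem.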
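The conclusion of Theorem 19.4 can be checked by hand on the rectangle example from the footnote to the theorem. The computation below is only a sketch added for illustration; it assumes that the fiber measures are normalized to unit mass (so that $\ell_\sigma \equiv 1/3$ on needles of length 3), a normalization chosen here for the sake of the example.

```latex
% Check of Theorem 19.4 for \Omega = (0,3) \times (0,1), f(x_1,x_2) = x_1 (n = 2).
% The needles are the horizontal segments I_\sigma = (0,3) \times \{x_2\},
% indexed by x_2 \in (0,1); unit-mass normalization is an assumption of this sketch.
\[
  \mathcal{L}^2 \llcorner \Omega
  = \int_0^1 \big( \mathcal{H}^1 \llcorner I_\sigma \big)\, dx_2
  = \int_0^1 3 \Big( \tfrac{1}{3}\, \mathcal{H}^1 \llcorner I_\sigma \Big)\, dx_2 ,
  \qquad \ell_\sigma \equiv \tfrac{1}{3} ,
\]
\[
  \int_{I_\sigma} u\, \ell_\sigma \, d\mathcal{H}^1
  = \tfrac{1}{3} \Big( \int_0^1 1 \, dt + \int_1^2 0 \, dt + \int_2^3 (-1) \, dt \Big) = 0 .
\]
```

Here $\ell_\sigma^{1/(n-1)} = \ell_\sigma$ is constant, hence concave, and $u$ has zero $\ell_\sigma$-average on every needle even though $u$ vanishes on the middle third of each of them.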
Appendix A Radon Measures on Rn and Related Topics
Here we summarize a few aspects of the theory of Radon measures on Rn that provide the basic language for the rest of the notes. We will follow the presentation given in [Mag12, Part I]. We assume readers have a certain familiarity with abstract measure theory, including topics like measurability of functions, integration, limit theorems, and Lebesgue spaces.
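A.1 Borel and Radon Measures, Main Examples
A Borel measure $\mu$ is a measure on the $\sigma$-algebra of Borel sets $\mathcal{B}(\mathbb{R}^n)$. A Radon measure on $\mathbb{R}^n$ is a Borel measure on $\mathbb{R}^n$ which is locally finite (i.e., finite on compact sets). A probability measure (for the purpose of these notes!) is a Radon measure $\mu$ on $\mathbb{R}^n$ such that $\mu(\mathbb{R}^n) = 1$. The Lebesgue measure on $\mathbb{R}^n$ is defined on an arbitrary set $E \subset \mathbb{R}^n$ as
\[ \mathcal{L}^n(E) = |E| = \inf_{\mathcal{F}} \sum_{Q \in \mathcal{F}} r(Q)^n, \]
where $\mathcal{F}$ ranges among the countable coverings of $E$ by cubes $Q$ with sides parallel to the coordinate axes, and $r(Q)$ denotes the side length of $Q$. Notice that $\mathcal{L}^n$ is countably additive on $\mathcal{B}(\mathbb{R}^n)$ (and actually on a larger $\sigma$-algebra than $\mathcal{B}(\mathbb{R}^n)$, but not on the whole $2^{\mathbb{R}^n}$) and that it is locally finite, so $\mathcal{L}^n$ is a Radon measure. Given $k \in \mathbb{N}$ and $\delta \in (0,\infty]$, the $k$-dimensional Hausdorff measure of step $\delta$ is defined on $E \subset \mathbb{R}^n$ by
\[ \mathcal{H}^k_\delta(E) = \inf_{\mathcal{F}} \sum_{F \in \mathcal{F}} \omega_k \Big( \frac{\operatorname{diam}(F)}{2} \Big)^k, \]
where $\omega_k = \mathcal{L}^k(\{ x \in \mathbb{R}^k : |x| < 1 \})$ and $\mathcal{F}$ ranges among all the countable coverings of $E$ by sets $F$ with $\operatorname{diam}(F) < \delta$; the $k$-dimensional Hausdorff measure of $E \subset \mathbb{R}^n$ is then given by
\[ \mathcal{H}^k(E) = \lim_{\delta \to 0^+} \mathcal{H}^k_\delta(E) = \sup_{\delta > 0} \mathcal{H}^k_\delta(E). \]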
It turns out that $\mathcal{H}^k$ defines a Borel measure with the following three important properties: (i) $\mathcal{H}^0$ is the counting measure, that is, it gives the number of elements of subsets of $\mathbb{R}^n$; for this reason we sometimes write $\#(E)$ in place of $\mathcal{H}^0(E)$; (ii) $\mathcal{H}^n = \mathcal{L}^n$ on Borel sets of $\mathbb{R}^n$; (iii) if $1 \le k \le n - 1$, $M$ is a k-dimensional $C^1$-surface in $\mathbb{R}^n$, and $\mathrm{Area}(M)$ denotes the area of $M$ (as defined by using parameterizations and partitions of unity as in basic Differential Geometry), then $\mathcal{H}^k(M) = \mathrm{Area}(M)$. Given a Borel measure $\nu$ on $\mathbb{R}^n$ and $E \in \mathcal{B}(\mathbb{R}^n)$, the restriction $\mu = \nu \llcorner E$ of $\nu$ to $E$, defined by $\mu(F) = \nu(E \cap F)$ for every $F \in \mathcal{B}(\mathbb{R}^n)$, is still a Borel measure; if $\nu(E \cap B_R) < \infty$ for every $R > 0$, then $\mu = \nu \llcorner E$ is a Radon measure on $\mathbb{R}^n$. For example, if $x \in \mathbb{R}^n$, then Dirac’s delta $\delta_x = \mathcal{H}^0 \llcorner \{x\}$ is a Radon measure on $\mathbb{R}^n$; if $M$ is a k-dimensional $C^1$-surface in $\mathbb{R}^n$ with $\mathcal{H}^k(M \cap B_R) < \infty$ for every $R > 0$, then $\mathcal{H}^k \llcorner M$ is a Radon measure on $\mathbb{R}^n$ (so that the convergence of sequences of surfaces can be studied in terms of the convergence of their associated Radon measures).
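The role of the normalizing constant in the definition of the Hausdorff measure of step δ can be checked by hand in the simplest case. The following is only an illustrative sketch for k = 1 and an interval [a, b] (the interval and the number N of covering pieces are choices made here, not taken from the text); it gives the upper bound consistent with property (ii), while the matching lower bound requires a separate argument.

```latex
\begin{align*}
\omega_1 &= \mathcal{L}^1(\{x \in \mathbb{R} : |x| < 1\}) = 2,
\qquad\text{so}\qquad
\omega_1 \, \frac{\operatorname{diam}(F)}{2} = \operatorname{diam}(F), \\
% cover [a,b] by N closed subintervals of equal length (b-a)/N
\mathcal{H}^1_\delta([a,b]) &\le \sum_{i=1}^{N} \frac{b-a}{N} = b - a
\qquad\text{whenever } \frac{b-a}{N} < \delta .
\end{align*}
```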
A.2 Support and Concentration Set of a Measure

A Borel measure $\mu$ is concentrated on a Borel set $E$ if $\mu(\mathbb{R}^n \setminus E) = 0$; in this case, if a property holds at every $x \in E$, we say that it holds $\mu$-a.e. The intersection of all the closed sets of concentration of $\mu$ defines the support of $\mu$, which satisfies
$$\mathrm{spt}\,\mu = \big\{ x \in \mathbb{R}^n : \mu(B_r(x)) > 0 \ \ \forall r > 0 \big\}. \tag{A.1}$$
So if a property holds at every $x \in \mathrm{spt}\,\mu$, then it holds $\mu$-a.e. A measure may be concentrated on a strict subset of its support, e.g., $\mu = \sum_{j=1}^\infty 2^{-j} \delta_{1/j}$ is a Radon measure on $\mathbb{R}$ with $\mathrm{spt}\,\mu = \{1/j : j \ge 1\} \cup \{0\}$, and $\mu$ is concentrated on $\mathrm{spt}\,\mu \setminus \{0\}$.
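The example above can be verified directly from the characterization (A.1) of the support. The computation below is a short sketch of that check for the measure given by the sum of the weights 2^{-j} placed at the points 1/j; no symbols beyond those of the text are introduced.

```latex
\begin{align*}
\mu(B_r(0)) &= \sum_{j \,:\, 1/j < r} 2^{-j} > 0 \quad \forall r > 0
  && \Longrightarrow\quad 0 \in \operatorname{spt}\mu, \\
\mu(\{0\}) &= 0, \qquad
\mu\big(\mathbb{R} \setminus \{1/j : j \ge 1\}\big) = 0
  && \Longrightarrow\quad \mu \text{ is concentrated on } \operatorname{spt}\mu \setminus \{0\}.
\end{align*}
```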
A.3 Fubini’s Theorem

Given that Radon measures on Euclidean spaces are automatically σ-finite, if $\mu$ and $\nu$ are Radon measures, then we can always apply Fubini’s theorem to write integrals in the product measure $\mu \times \nu$ as iterated integrals in $\mu$ and $\nu$. Of particular interest is the layer cake formula
$$\int_{\mathbb{R}^n} g \, d\mu = \int_0^\infty \mu(\{g > t\}) \, dt, \qquad \forall g : \mathbb{R}^n \to [0, \infty] \text{ Borel function}, \tag{A.2}$$
which follows by applying Fubini’s theorem to $\mu$ and $\mathcal{L}^1$,
$$\int_{\mathbb{R}^n} g \, d\mu = \int_{\mathbb{R}^n} d\mu(x) \int_{\mathbb{R}} 1_{[0, g(x))}(t) \, dt = \int_{\mathbb{R}} dt \int_{\mathbb{R}^n} 1_{[0, g(x))}(t) \, d\mu(x) = \int_0^\infty \mu(\{g > t\}) \, dt,$$
where in the last identity one uses $1_{[0, g(x))}(t) = 1_{[0, \infty)}(t) \, 1_{\{g > t\}}(x)$. Notice that, thanks to (A.11), and the fact that the Borel sets $\{g = t\}$, $t > 0$, are disjoint, identity (A.2) also holds with $\{g \ge t\}$ replacing $\{g > t\}$.
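As a sanity check of the layer cake formula (A.2), one can compare both sides on an explicit example. The choices below (the measure given by the restriction of the one-dimensional Lebesgue measure to [0, 1], and the function g(x) = x²) are assumptions made only for illustration.

```latex
\begin{align*}
\mu &= \mathcal{L}^1 \llcorner [0,1], \qquad g(x) = x^2, \\
\int_{\mathbb{R}} g \, d\mu &= \int_0^1 x^2 \, dx = \frac{1}{3}, \\
\mu(\{g > t\}) &=
  \begin{cases}
    1 - \sqrt{t}, & 0 \le t < 1, \\
    0, & t \ge 1,
  \end{cases} \\
\int_0^\infty \mu(\{g > t\}) \, dt &= \int_0^1 \big(1 - \sqrt{t}\,\big) \, dt
  = 1 - \frac{2}{3} = \frac{1}{3}.
\end{align*}
```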
A.4 Push-Forward of a Measure

A Borel function is a function $T : F \to \mathbb{R}^m$ ($F$ a Borel set in $\mathbb{R}^n$) such that $T^{-1}(A) = \{x \in F : T(x) \in A\} \in \mathcal{B}(\mathbb{R}^n)$ for every open set $A \subset \mathbb{R}^m$. If $\mu$ is a Borel measure on $\mathbb{R}^n$, concentrated on $F$, then the push-forward measure $T_\# \mu$ is the Borel measure on $\mathbb{R}^m$ defined by
$$T_\# \mu(E) = \mu(T^{-1}(E)), \qquad \forall E \in \mathcal{B}(\mathbb{R}^m), \tag{A.3}$$
or, alternatively, by
$$\int_{\mathbb{R}^m} \varphi \, d[T_\# \mu] = \int_{\mathbb{R}^n} (\varphi \circ T) \, d\mu, \qquad \forall \varphi \in C_c^0(\mathbb{R}^m). \tag{A.4}$$
(By standard approximation arguments, (A.4) holds also for every Borel function $\varphi : \mathbb{R}^m \to [0, \infty]$, and for every bounded Borel function $\varphi : \mathbb{R}^m \to \mathbb{R}$.) If $T$ is proper (i.e., $T^{-1}(K)$ is compact in $\mathbb{R}^n$ for every compact set $K$ in $\mathbb{R}^m$) or if $\mu$ is finite (i.e., $\mu(\mathbb{R}^n) < \infty$), then $T_\# \mu$ is a Radon measure on $\mathbb{R}^m$. Notice that since $\mu$ is concentrated on $F$, we have that $T_\# \mu$ is concentrated on $T(F)$. Moreover,
$$\mathrm{spt}(T_\# \mu) = T(\mathrm{spt}\,\mu) \tag{A.5}$$
whenever $T$ can be extended to a continuous function on the whole $\mathrm{spt}\,\mu$.
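A one-dimensional example may help to fix the two equivalent definitions (A.3) and (A.4). The map T(x) = 2x and the measure used below are choices made here for illustration only; the computation is a sketch, not part of the original text.

```latex
\begin{align*}
\mu &= \mathcal{L}^1 \llcorner [0,1], \qquad T(x) = 2x, \\
T_\#\mu(E) &= \mathcal{L}^1\big([0,1] \cap T^{-1}(E)\big)
  = \mathcal{L}^1\big(\{x \in [0,1] : 2x \in E\}\big)
  = \tfrac{1}{2}\, \mathcal{L}^1\big(E \cap [0,2]\big), \\
\int_{\mathbb{R}} \varphi \, d[T_\#\mu] &= \int_0^1 \varphi(2x) \, dx
  = \frac{1}{2} \int_0^2 \varphi(y) \, dy, \\
\operatorname{spt}(T_\#\mu) &= [0,2] = T([0,1]) = T(\operatorname{spt}\mu).
\end{align*}
```

Here the first line after the definitions uses (A.3), the second uses (A.4) with the change of variables y = 2x, and the last is consistent with (A.5).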
A.5 Approximation Properties

Countable additivity on Borel sets ties Radon measures to the Euclidean topology, and leads to a crucial approximation property: if $\mu$ is a Radon measure, then for every Borel set $E \subset \mathbb{R}^n$ we have
$$\mu(E) = \inf \big\{ \mu(A) : E \subset A \text{ open} \big\} = \sup \big\{ \mu(K) : K \subset E, \ K \text{ compact} \big\}. \tag{A.6}$$
As a consequence, one has the classical Lusin’s theorem, from which we readily deduce that $C_c^0(\mathbb{R}^n)$ is dense in $L^p(\mu)$ whenever $1 \le p < \infty$ and $\mu$ is a Radon measure on $\mathbb{R}^n$.
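A.6 Weak-Star Convergence

A sequence $\{\mu_j\}_j$ of Radon measures is weakly-star converging as $j \to \infty$ to a Radon measure $\mu$, and we write $\mu_j \overset{*}{\rightharpoonup} \mu$, if
$$\lim_{j \to \infty} \int_{\mathbb{R}^n} \varphi \, d\mu_j = \int_{\mathbb{R}^n} \varphi \, d\mu, \qquad \forall \varphi \in C_c^0(\mathbb{R}^n). \tag{A.7}$$
Recall that (A.7) is equivalent to
$$\mu(A) \le \liminf_{j \to \infty} \mu_j(A) \qquad \text{for all open sets } A \subset \mathbb{R}^n, \tag{A.8}$$
$$\mu(K) \ge \limsup_{j \to \infty} \mu_j(K) \qquad \text{for all compact sets } K \subset \mathbb{R}^n, \tag{A.9}$$
and that in turn (A.8)+(A.9) is equivalent to
$$\mu(E) = \lim_{j \to \infty} \mu_j(E) \qquad \text{for all bounded } E \in \mathcal{B}(\mathbb{R}^n) \text{ s.t. } \mu(\partial E) = 0. \tag{A.10}$$
Condition (A.10) is often used in conjunction with the following fact: If $\{S_t\}_{t \in T}$ is a disjoint family of Borel sets in $\mathbb{R}^n$, indexed over an arbitrary set $T$, and if $\mu$ is a Borel measure on $\mathbb{R}^n$ which is finite on $S = \bigcup_{t \in T} S_t$, there are at most countably many values of $t$ such that $\mu(S_t) > 0$: indeed,
$$\# \big\{ t \in T : \mu(S_t) > \varepsilon \big\} \le \frac{\mu(S)}{\varepsilon}, \tag{A.11}$$
whenever $\varepsilon > 0$. For example, if $\mu$ is a Radon measure on $\mathbb{R}^n$ and $x \in \mathbb{R}^n$, then we have $\mu(\partial B_r(x)) = 0$ for a.e. $r > 0$ (actually, for all but countably many values of $r$), so that if $\mu_j \overset{*}{\rightharpoonup} \mu$ on $\mathbb{R}^n$, then $\mu_j(B_r(x)) \to \mu(B_r(x))$ for a.e. $r > 0$ thanks to (A.10). Finally, we notice that $\mu_j \overset{*}{\rightharpoonup} \mu$ implies that
$$\forall x_0 \in \mathrm{spt}\,\mu \ \text{ there exist } x_j \in \mathrm{spt}\,\mu_j \text{ s.t. } x_j \to x_0 \text{ as } j \to \infty. \tag{A.12}$$
The converse property is in general false, that is to say, one can have $x_j \in \mathrm{spt}\,\mu_j$, $x_j \to x_0$, and $x_0 \notin \mathrm{spt}\,\mu$. For example, consider $\mu_j = (1 - (1/j)) \, \delta_1 + (1/j) \, \delta_{1/j}$ on $\mathbb{R}$ with $x_j = 1/j$.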
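The counterexample just given can be checked in two lines. The sketch below identifies the weak-star limit of the sequence (it is the Dirac mass at 1, a fact not stated explicitly in the text but immediate from the definition (A.7)) and then exhibits the points of the supports that converge outside the support of the limit.

```latex
\begin{align*}
\int_{\mathbb{R}} \varphi \, d\mu_j
  &= \Big(1 - \frac{1}{j}\Big) \varphi(1) + \frac{1}{j}\, \varphi\Big(\frac{1}{j}\Big)
  \;\longrightarrow\; \varphi(1)
  \qquad \forall \varphi \in C_c^0(\mathbb{R}),
  \quad\text{since } \Big|\varphi\Big(\frac{1}{j}\Big)\Big| \le \sup_{\mathbb{R}} |\varphi|, \\
\mu_j &\overset{*}{\rightharpoonup} \delta_1, \qquad
x_j = \frac{1}{j} \in \operatorname{spt}\mu_j, \qquad
x_j \to 0 \notin \{1\} = \operatorname{spt}\delta_1 .
\end{align*}
```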
A.7 Weak-Star Compactness

The paramount importance of weak-star convergence lies in two facts: first, many different mathematical objects (functions, surfaces, etc.) can be seen as Radon measures; second, every sequence of Radon measures with locally bounded mass has a weak-star subsequential limit. This is the Compactness Theorem for Radon measures: if $\{\mu_j\}_j$ is a sequence of Radon measures on $\mathbb{R}^n$ with
$$\sup_j \mu_j(B_R) < \infty \qquad \forall R > 0,$$
then there exists a Radon measure $\mu$ on $\mathbb{R}^n$ such that, up to extracting a subsequence, $\mu_j \overset{*}{\rightharpoonup} \mu$ on $\mathbb{R}^n$.
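A.8 Narrow Convergence

A sequence $\{\mu_h\}_h$ of Radon measures is narrowly converging as $h \to \infty$ to a Radon measure $\mu$, and we write $\mu_h \overset{n}{\rightharpoonup} \mu$, if
$$\lim_{h \to \infty} \int_{\mathbb{R}^n} \varphi \, d\mu_h = \int_{\mathbb{R}^n} \varphi \, d\mu, \qquad \forall \varphi \in C_b^0(\mathbb{R}^n); \tag{A.13}$$
where $C_b^0(\mathbb{R}^n)$ is the space of all bounded continuous functions on $\mathbb{R}^n$. Notice that in general $\mu_h \overset{*}{\rightharpoonup} \mu$ on $\mathbb{R}^n$ does not imply $\mu_h(\mathbb{R}^n) \to \mu(\mathbb{R}^n)$, since some mass contained in $\mu_h$ may be “lost at infinity”: for example, $\mu_h = \delta_{x_h} \overset{*}{\rightharpoonup} \mu = 0$ if the points $x_h \in \mathbb{R}^n$ are such that $|x_h| \to \infty$. The importance of narrow convergence is that it is exactly equivalent to weak-star convergence plus conservation of total mass (notice indeed that $\mu_h = \delta_{x_h}$ does not admit a narrow limit as soon as $|x_h| \to \infty$).

Proposition A.1 (Narrow convergence criterion) Assume that $\mu_h$ and $\mu$ are finite Radon measures on $\mathbb{R}^n$ with $\mu_h \overset{*}{\rightharpoonup} \mu$ on $\mathbb{R}^n$ as $h \to \infty$. Then the following statements are equivalent: (i): $\mu_h(\mathbb{R}^n) \to \mu(\mathbb{R}^n)$ as $h \to \infty$; (ii): $\mu_h$ is narrowly converging to $\mu$ on $\mathbb{R}^n$; (iii): for all $\varepsilon > 0$ there exists $K_\varepsilon \subset \mathbb{R}^n$ compact such that $\mu_h(\mathbb{R}^n \setminus K_\varepsilon) < \varepsilon$ for every $h$.

Proof Proof that (i) is equivalent to (iii): Assume (iii), fix $\varepsilon > 0$ and consider $K_\varepsilon$ such that $\mu_h(\mathbb{R}^n \setminus K_\varepsilon) < \varepsilon$ for every $h$: then, by (A.9), we have
$$\mu(\mathbb{R}^n) \ge \mu(K_\varepsilon) \ge \limsup_{h \to \infty} \mu_h(K_\varepsilon) \ge \limsup_{h \to \infty} \mu_h(\mathbb{R}^n) - \varepsilon \ge \mu(\mathbb{R}^n) - \varepsilon,$$
where in the last inequality we have used (A.8). Vice versa, we just need to prove that (i) implies
$$\lim_{R \to +\infty} \sup_{h \in \mathbb{N}} \mu_h(\mathbb{R}^n \setminus B_R) = 0.$$
Indeed, by $\mu(\mathbb{R}^n) < \infty$, given $\varepsilon > 0$ we can find $R > 0$ such that $\mu(\mathbb{R}^n \setminus \mathrm{Cl}(B_R)) < \varepsilon$. Moreover, for all but countably many values of $r$ we have $\mu(\partial B_r) = 0$ so that, up to increasing the value of $R$, we can also assume that $\mu(\partial B_R) = 0$. By applying (A.10) with $E = \mathrm{Cl}(B_R)$ we find that $\mu_h(\mathrm{Cl}(B_R)) \to \mu(\mathrm{Cl}(B_R))$, and thus, thanks to (i), that $\mu_h(\mathbb{R}^n \setminus \mathrm{Cl}(B_R)) \to \mu(\mathbb{R}^n \setminus \mathrm{Cl}(B_R))$ as $h \to \infty$, which gives $\mu_h(\mathbb{R}^n \setminus \mathrm{Cl}(B_R)) < \varepsilon$ for every $h$ large enough. Since each of the finitely many remaining measures $\mu_h$ is finite, the same bound holds for them after further increasing $R$.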
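Proof that (i) is equivalent to (ii): If (ii) holds, then we can test (A.13) at $\varphi \equiv 1$ and deduce (i). Conversely, let $\varphi \in C_b^0(\mathbb{R}^n)$ and let $\zeta_R \in C_c^0(B_R)$ with $0 \le \zeta_R \le 1$ and $\zeta_R \to 1$ on $\mathbb{R}^n$ as $R \to \infty$. We have that
$$\Big| \int_{\mathbb{R}^n} \varphi \, d\mu_h - \int_{\mathbb{R}^n} \varphi \, \zeta_R \, d\mu_h \Big| \le \|\varphi\|_{C^0(\mathbb{R}^n)} \, \mu_h(\mathbb{R}^n \setminus B_R),$$
$$\Big| \int_{\mathbb{R}^n} \varphi \, d\mu - \int_{\mathbb{R}^n} \varphi \, \zeta_R \, d\mu \Big| \le \|\varphi\|_{C^0(\mathbb{R}^n)} \, \mu(\mathbb{R}^n \setminus B_R),$$
and that, for every $R > 0$, $\int_{\mathbb{R}^n} \varphi \, \zeta_R \, d\mu_h \to \int_{\mathbb{R}^n} \varphi \, \zeta_R \, d\mu$ as $h \to \infty$. Therefore, to show that $\int_{\mathbb{R}^n} \varphi \, d\mu_h \to \int_{\mathbb{R}^n} \varphi \, d\mu$ as $h \to \infty$,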
A.9 Differentiation Theory of Radon Measures Given two Borel measures μ and ν, we say that μ is absolutely continuous with respect to ν, and write μ 0, ε > 0. (A.22) B r (x)
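It is useful to notice that (A.22) can be improved in those (not so frequently met) situations where $r$ is substantially smaller than $\varepsilon$, as expressed in the following estimate:
$$\int_{B_r(x)} |\mu_\varepsilon| \le C(n) \, \frac{\min\{r^n, \varepsilon^n\}}{\varepsilon^n} \, |\mu|(B_{r + \varepsilon}(x)), \qquad \forall x \in \mathbb{R}^n, \ r > 0, \ \varepsilon > 0. \tag{A.23}$$
Indeed, by $\|\rho_\varepsilon\|_{C^0} \le C(n)/\varepsilon^n$ and
$$\int_{B_r(x)} |\mu_\varepsilon| \le \int_{B_r(x)} dy \int_{B_\varepsilon(y)} \rho_\varepsilon(z - y) \, d|\mu|(z) = \int_{B_{r + \varepsilon}(x)} d|\mu|(z) \int_{B_r(x) \cap B_\varepsilon(z)} \rho_\varepsilon(z - y) \, dy,$$
which gives (A.23).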
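The last step in the derivation of (A.23) can be spelled out as follows. This is only a sketch: it assumes, as the domains of integration above suggest, that the mollifier at scale ε vanishes outside the ball of radius ε, and it writes ω_n for the volume of the unit ball as in Section A.1.

```latex
\begin{align*}
\int_{B_r(x) \cap B_\varepsilon(z)} \rho_\varepsilon(z - y) \, dy
  &\le \|\rho_\varepsilon\|_{C^0} \, \big| B_r(x) \cap B_\varepsilon(z) \big| \\
  &\le \frac{C(n)}{\varepsilon^n} \, \omega_n \min\{r^n, \varepsilon^n\}
  \qquad \forall z \in B_{r+\varepsilon}(x),
\end{align*}
% the intersection of two balls is contained in each of them,
% hence its volume is at most that of the smaller ball
```

Integrating this bound in z with respect to the total variation measure over the ball of radius r + ε gives (A.23), with a constant depending on n only.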
A.13 Lipschitz Approximation of Functions of Bounded Variation

As an immediate consequence of (A.15) we see that, if $1 \le p < \infty$ and $f \in L^p_{\mathrm{loc}}(\mathbb{R}^n)$, then, for $\mathcal{L}^n$-a.e. $x \in \mathbb{R}^n$,
$$\lim_{r \to 0^+} \Big( \frac{1}{r^n} \int_{B_r(x)} |f - (f)_{x,r}|^p \Big)^{1/p} = 0, \qquad \text{where } (f)_{x,r} = \frac{1}{|B_r(x)|} \int_{B_r(x)} f. \tag{A.24}$$
For an arbitrary measurable function, the rate of convergence of the limit in (A.24) will drastically change depending on the point $x$. However, it is easily seen that as soon as $f$ agrees $\mathcal{L}^n$-a.e. with an α-Hölder continuous function, $\alpha \in (0, 1]$, then the rate of convergence is $O(r^\alpha)$ as $r \to 0^+$, uniformly in $x$. The converse is true, and this is the content of Campanato’s criterion, see [Mag12, Section 6.1]. This basic result, combined with the Poincaré inequality and a covering argument, is sufficient to conclude that every BV-function is countably Lipschitz.

Theorem A.2 If $\Omega$ is an open set in $\mathbb{R}^n$, $f \in L^1_{\mathrm{loc}}(\Omega)$ and $Df$ is a Radon measure in $\Omega$, then there exists an increasing family of Borel sets $\{E_j\}_j$ such that
$$\mathcal{L}^n \Big( \Omega \setminus \bigcup_j E_j \Big) = 0, \qquad \mathrm{Lip}(f; E_j) \le j.$$
Proof If $\Omega(t) = \{x \in \Omega \cap B_{1/t} : \mathrm{dist}(x, \partial\Omega) > t\}$, then, for a.e. $t > 0$, $\Omega(t)$ is an open bounded set of finite perimeter in $\mathbb{R}^n$ (see [Mag12, Remark 18.2]), and, in particular, $g = 1_{\Omega(t)} f \in L^1(\mathbb{R}^n)$ and $Dg$ is a finite Borel measure on $\mathbb{R}^n$. We can thus reduce to proving the theorem in the case $\Omega = \mathbb{R}^n$. Now, given $\ell > 0$, it turns out that $f$ is $C(n)\,\ell$-Lipschitz on the Borel set
$$E_\ell = \Big\{ x \in \mathbb{R}^n : \frac{|Df|(B_r(x))}{r^n} \le \ell, \ \ \forall r > 0 \Big\}. \tag{A.25}$$
Indeed, by the Poincaré inequality for smooth functions, we have
$$\frac{1}{|B_r|} \int_{B_r(x)} |f - (f)_{x,r}| \le C(n) \, \frac{|Df|(B_r(x))}{r^{n-1}}$$
(indeed, when $f$ is smooth $|Df|(B_r(x)) = \int_{B_r(x)} |\nabla f|$, so that the general case follows by approximation), and thus
$$\frac{1}{|B_r|} \int_{B_r(x)} |f - (f)_{x,r}| \le C(n) \, \ell \, r, \qquad \forall x \in E_\ell.$$
Arguing as in the proof of Campanato’s criterion [Mag12, Section 6.1], we deduce that $|f(x) - f(y)| \le C(n) \, \ell \, |x - y|$ for every $x, y \in E_\ell$, so that $\mathrm{Lip}(f; E_\ell) \le C(n) \, \ell$. At the same time, by Vitali’s covering theorem, we can find disjoint balls $\{B_{r_j}(x_j)\}_{j=1}^\infty$ such that
$$\mathbb{R}^n \setminus E_\ell \subset \bigcup_{j=1}^\infty B_{5 r_j}(x_j), \qquad \frac{|Df|(B_{r_j}(x_j))}{r_j^n} > \ell.$$
Therefore
$$\mathcal{L}^n(\mathbb{R}^n \setminus E_\ell) \le 5^n \omega_n \sum_{j=1}^\infty r_j^n \le \frac{C(n)}{\ell} \sum_{j=1}^\infty |Df|(B_{r_j}(x_j)) \le \frac{C(n)}{\ell} \, |Df|(\mathbb{R}^n),$$
which immediately implies $\mathcal{L}^n(\mathbb{R}^n \setminus E_\ell) \to 0$ as $\ell \to +\infty$.
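The construction in the proof can be made completely explicit on the simplest discontinuous BV-function. The example below is an illustration chosen here (it is not in the text): n = 1, f is the characteristic function of the positive half-line, and the symbol ℓ denotes the positive threshold appearing in the definition (A.25) of the sets on which f is Lipschitz.

```latex
\begin{align*}
f &= 1_{(0,\infty)} \text{ on } \mathbb{R}, \qquad |Df| = \delta_0, \qquad
|Df|(B_r(x)) =
  \begin{cases}
    1, & r > |x|, \\
    0, & r \le |x|,
  \end{cases} \\
\sup_{r > 0} \frac{|Df|(B_r(x))}{r} &= \frac{1}{|x|} \quad (x \ne 0)
  \qquad\Longrightarrow\qquad
  E_\ell = \Big\{ x \in \mathbb{R} : |x| \ge \frac{1}{\ell} \Big\}, \\
\mathcal{L}^1(\mathbb{R} \setminus E_\ell) &= \frac{2}{\ell} \to 0
  \quad \text{as } \ell \to +\infty, \qquad
\operatorname{Lip}(f; E_\ell) = \frac{1}{2/\ell} = \frac{\ell}{2} .
\end{align*}
```

The jump of size 1 is only seen by pairs of points of the set lying on opposite sides of the origin, which are at distance at least 2/ℓ; this is why the Lipschitz constant on the set grows linearly in ℓ while the measure of the discarded region decays like 1/ℓ.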
A.14 Coarea Formula

The coarea formula is a generalization of Fubini’s theorem to “curvilinear coordinates.” In its simplest instance, the coarea formula pertains to Lipschitz functions $f : \mathbb{R}^n \to \mathbb{R}$ and states that
$$\int_E |\nabla f| = \int_{\mathbb{R}} \mathcal{H}^{n-1}(E \cap \{f = t\}) \, dt, \qquad \forall E \in \mathcal{B}(\mathbb{R}^n); \tag{A.26}$$
or, equivalently, $\int_{\mathbb{R}^n} g \, |\nabla f| = \int_{\mathbb{R}} dt \int_{\{f = t\}} g \, d\mathcal{H}^{n-1}$ for every Borel function $g : \mathbb{R}^n \to \mathbb{R}$ which is either bounded or nonnegative; see [Mag12, Theorem 18.1]. The case when $f$ is affine is equivalent to Fubini’s theorem on $\mathbb{R}^n \equiv \mathbb{R}^{n-1} \times \mathbb{R}$. For a Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}^k$ with $1 \le k \le n$, at every differentiability point $x$ of $f$ we can define the coarea factor of $f$ as
$$Cf(x) = \det\big( \nabla f(x) \, \nabla f(x)^* \big)^{1/2},$$
and correspondingly find that
$$\int_E Cf = \int_{\mathbb{R}^k} \mathcal{H}^{n-k}(E \cap \{f = t\}) \, d\mathcal{H}^k(t), \qquad \forall E \in \mathcal{B}(\mathbb{R}^n). \tag{A.27}$$
When $k = n$, this is the area formula (A.18). When $f$ is affine and $k \le n - 1$, (A.27) is equivalent to Fubini’s theorem on $\mathbb{R}^n \equiv \mathbb{R}^{n-k} \times \mathbb{R}^k$. The proof is analogous to the case $k = 1$ (the only differences being found in the underlying linear algebra), see [EG92, Section 3.4] and [AFP00, Section 2.12].

Remark A.3 There is a sometimes confusing case, which is met for example in the proof of Theorem 19.7, and that may require clarification. Consider a situation when
$$f : \mathbb{R}^n \to \mathbb{R}^n \text{ is a smooth map such that } f(\mathbb{R}^n) \subset M, \ M \text{ a k-dimensional surface with } 1 \le k \le n - 1. \tag{A.28}$$
In this situation the area formula for $f$ contains no information: the right-hand side of (A.18) is zero (since $\nabla f(x) \in \mathbb{R}^n \otimes \mathbb{R}^n$ has rank at most $k < n$, and thus $\det \nabla f(x) = 0$), as well as its left-hand side (the integration in $d\mathcal{H}^0(x)$ happens on $\{f = y\}$, which is empty unless $y \in f(\mathbb{R}^n) \subset M$, so that the integration in $dy$ happens on $M$, which is Lebesgue negligible). One expects, however, that nontrivial information may be contained in a variant of the coarea formula where the target lower dimensional Euclidean space $\mathbb{R}^k$ is replaced by (the non-flat, but still k-dimensional) $M$. This is indeed the case if one identifies $\nabla f(x)$, more properly, as an element of $(T_{f(x)} M) \otimes \mathbb{R}^n$ (rather than of $\mathbb{R}^n \otimes \mathbb{R}^n$), and considers the coarea factor of $f$ to be
$$Cf(x) = \det A(x)^{1/2}, \qquad A(x) = \nabla f(x) \, \nabla f(x)^* \in (T_{f(x)} M) \otimes (T_{f(x)} M). \tag{A.29}$$
With this choice of $Cf$ and under (A.28) we indeed have the coarea formula
$$\int_E Cf = \int_M \mathcal{H}^{n-k}(E \cap \{f = t\}) \, d\mathcal{H}^k(t), \qquad \forall E \in \mathcal{B}(\mathbb{R}^n). \tag{A.30}$$
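As a sanity check of the scalar coarea formula (A.26), one can test it on the distance from the origin, whose level sets are spheres. The function f(x) = |x| and the set E (the open ball of radius R centered at the origin) are choices made here for illustration; the only external fact used is the standard value of the surface measure of a sphere of radius t.

```latex
\begin{align*}
f(x) &= |x|, \qquad |\nabla f(x)| = 1 \ \ (x \ne 0), \qquad E = B_R, \\
\int_{B_R} |\nabla f| &= |B_R| = \omega_n R^n, \\
\int_{\mathbb{R}} \mathcal{H}^{n-1}\big(B_R \cap \{f = t\}\big) \, dt
  &= \int_0^R \mathcal{H}^{n-1}(\partial B_t) \, dt
   = \int_0^R n\, \omega_n\, t^{n-1} \, dt
   = \omega_n R^n .
\end{align*}
```

The two sides agree, which is the familiar computation of the volume of a ball by integration in polar coordinates.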
A.15 Rectifiable Sets

Given $k \in \{1, \ldots, n - 1\}$, a Borel set $M \subset \mathbb{R}^n$ is countably $\mathcal{H}^k$-rectifiable if it can be covered, modulo a set of $\mathcal{H}^k$-null measure, by countably many sets of the form $f(E)$, where $E \subset \mathbb{R}^k$ is a Borel set and $f : \mathbb{R}^k \to \mathbb{R}^n$ a Lipschitz map. Countably $\mathcal{H}^k$-rectifiable sets $M$ such that $\mathcal{H}^k \llcorner M$ defines a Radon measure on $\mathbb{R}^n$, i.e., such that $\mathcal{H}^k(M \cap B_R) < \infty$ for every $R > 0$, are called locally $\mathcal{H}^k$-rectifiable sets; or, simply, $\mathcal{H}^k$-rectifiable sets, when $\mathcal{H}^k(M) < \infty$. A Borel set $M \subset \mathbb{R}^n$ is locally $\mathcal{H}^k$-rectifiable in $\mathbb{R}^n$ if and only if for $\mathcal{H}^k$-a.e. $x \in M$ there is a k-dimensional plane $P_x$ in $\mathbb{R}^n$ such that $\mathcal{H}^k \llcorner [(M - x)/r] \overset{*}{\rightharpoonup} \mathcal{H}^k \llcorner P_x$ as $r \to 0^+$; since this property uniquely identifies $P_x$, we set $P_x = T_x M$ and call $T_x M$ the approximate tangent plane to $M$ at $x$. Given $M$ a locally $\mathcal{H}^k$-rectifiable set in $\mathbb{R}^n$ and a Lipschitz map $f : \mathbb{R}^n \to \mathbb{R}^m$, the restriction of $f$ along $x + T_x M$ is differentiable at $x$, and the corresponding differential is a linear map $d^M f_x : T_x M \to \mathbb{R}^m$, to which we can associate a tensor $\nabla^M f(x) \in \mathbb{R}^m \otimes (T_x M)$, called the tangential gradient of $f$ along $M$. A useful fact to keep in mind is that when $f$ is both differentiable and tangentially differentiable along $M$ at some $x \in M$, then
$$\nabla^M f(x) = \nabla f(x) \circ p_x, \qquad p_x \text{ the projection of } \mathbb{R}^n \text{ onto } T_x M. \tag{A.31}$$
Coming to area and coarea formulas: if $k \le m$, then the formula
$$J^M f(x) = \det\big( \nabla^M f(x)^* \, \nabla^M f(x) \big)^{1/2}$$
defines the tangential Jacobian of $f$ at $x$ along $M$, and the area formula holds:
$$\int_M (g \circ f) \, J^M f \, d\mathcal{H}^k = \int_{\mathbb{R}^m} g(z) \, \mathcal{H}^0(\{f = z\}) \, d\mathcal{H}^k(z).$$
In the situation when $f(M)$ is a locally $\mathcal{H}^\ell$-rectifiable set in $\mathbb{R}^m$ with $k \ge \ell$, then, for $\mathcal{H}^k$-a.e. $x \in M$, $T_{f(x)}(f(M))$ exists and the differential $d^M f_x$ takes values in $T_{f(x)}[f(M)]$, i.e., $d^M f_x : T_x M \to T_{f(x)}[f(M)]$; correspondingly, we can identify the tangential gradient as a tensor $\nabla^M f(x) \in (T_{f(x)}[f(M)]) \otimes (T_x M)$, define the tangential coarea factor of $f$ along $M$ as
$$C^M f(x) = \det\big( \nabla^M f(x) \, \nabla^M f(x)^* \big)^{1/2},$$
and have the coarea formula
$$\int_E C^M f = \int_M \mathcal{H}^{n-k}(E \cap \{f = t\}) \, d\mathcal{H}^k(t), \qquad \forall E \in \mathcal{B}(\mathbb{R}^n). \tag{A.32}$$
A.16 The C 1,1 -Version of the Whitney Extension Theorem Given a C 1 -function f : Rn → R, we trivially notice that, on every compact set K ⊂ Rn , the C 0 -map T = ∇ f is such that lim+
sup
δ→0 x, y ∈K 0< |x−y |