Celso Melchiades Doria
Differentiability in Banach Spaces, Differential Forms and Applications
Celso Melchiades Doria Department of Mathematics, CFM Universidade Federal de Santa Catarina Florianópolis, Santa Catarina, Brazil
ISBN 978-3-030-77833-0 ISBN 978-3-030-77834-7 (eBook) https://doi.org/10.1007/978-3-030-77834-7
Mathematics Subject Classification: 46-01, 47-01, 58-01 © Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This textbook introduces the techniques of differentiability on Banach spaces, integration of maps and some applications. In a first course in Calculus, the differentiation and integration of functions of one real variable are the main concern; these techniques are then extended to functions and maps of several real variables. For functions of a real variable, the main results of the theory are (a) the existence of a maximum and a minimum for a differentiable function defined on a compact set, and (b) the Fundamental Theorem of Calculus. When dealing with several variables, the main results are similar. Indeed, they are generalizations of cases (a) and (b), where the generalization of case (b) is known as the Stokes Theorem. For functions of several variables, we should also stress the importance of the Inverse Function Theorem.

The text is divided into seven chapters and three appendices. In Chap. 1, we develop the theory of differentiation of functions and maps defined on the Euclidean space $\mathbb{R}^n$. As an application of the theory of differentiable maps, we give a proof of the Fundamental Theorem of Algebra that does not rely on algebraic properties of the field. The last section contains a description of the Jacobian Conjecture, which has been open since 1939, and some related results. In Chap. 2, linear operators defined on Banach spaces are introduced, and particular classes of operators are explained and illustrated. Chapter 3 is about differentiation of functions and maps defined on Banach spaces, and applications are given. The theory has applications in several areas; we present only a few, because the applications are usually very extensive and require specific background knowledge. Chapter 4 contains an introduction to Vector Fields, the basic operations, the structure of the Lie algebra and its interpretation in terms of Linear Differential Operators. In Chap. 5, we review the formalism of vector calculus, in which we write down the classical theorems of integration and then show their passage to the formalism of differential forms. In Chap. 6, we introduce the Exterior Algebra of Differential Forms to prove the Stokes Theorem; as an application, we define the De Rham Cohomology groups and compute them for the sphere $S^n$ and for closed surfaces of genus $g$. In Chap. 7, some applications of the Stokes Theorem and differential forms are given as an introduction to Harmonic Functions, Maxwell's Equations, and Helmholtz's Theorem.
Fig. 1 Chapter flowchart
By dividing the content into two central themes, (1) Differentiation and (2) Integration, we use the flowchart in Fig. 1 to show the interdependence between chapters. There are also three appendices, intended to make the textbook as self-contained as possible and to support the reader in gaining an understanding of the basic terminology, the notation, the concepts and some basic theorems used in the text. The appendices are not meant to be read straight through; instead, they should serve as a guide for recalling the basics according to the needs of the reader. In Appendix A, we fix some notation, state some theorems used widely throughout the chapters, and develop some elementary content according to our needs. In this way, Appendix A is recommended as a reference source for prerequisites. Appendix B introduces very basic concepts of differentiable manifolds and Lie groups. Appendix C deals with tensor algebra, which is needed for a full understanding of the content of Chap. 6.

Florianópolis, Brazil
Celso Melchiades Doria
Introduction
At the turn of the century, at the end of 1999, there was much speculation about the most important results achieved in the 2nd Millennium. One day, while I was waiting for a medical appointment, I came across a magazine listing some results considered to be of great significance for the development of human knowledge. To my surprise and joy, one of the noted results was the Fundamental Theorem of Calculus. I had not thought of this possibility, but I immediately agreed with its inclusion, not only because I am a mathematician but also because all of Classical Mechanics, Thermodynamics and Electromagnetism were developed using calculus. Consequently, technological advancements achieved in the exact sciences and social sciences depended largely on the development of calculus.

In general, when we refer to calculus, we are discussing the techniques of differentiation and integration. In most textbooks, the study of the derivative precedes that of the integral, but this is not the historical order. The method of exhaustion applied by Archimedes was an infinite sum process, analogous to that used nowadays to define the integral of a function. The concept of the derivative of a function appeared in the seventeenth century, after the emergence of analytic geometry, to calculate the relative rate of change of a quantity. In that period, many ideas in physics were evolving rapidly due to the scientific method, and mechanics was beginning to take shape. The pioneers of Calculus as we know it today were Sir Isaac Newton (1642–1727), who developed the Method of Fluxions, and Gottfried Wilhelm von Leibniz (1646–1716), who developed Calculus, as he named it, and also introduced a good part of the notation used to this day. Newton discovered the basic Laws of Classical Mechanics, then applied them together with the Method of Fluxions to demonstrate Kepler's Laws.
Newton's 2nd Law states that a force acting on a body of mass $m$ produces a rate of change of its velocity with respect to time. More precisely, in our current mathematical language, the second Law states that the force $\vec{F}$ acting on a body of mass $m$ is given by $\vec{F} = m\,\frac{d\vec{v}}{dt}$, where $\vec{v}$ is the velocity vector of the body. Thus, the concept of the derivative is essential for formulating Newton's Law. Understanding the dynamic behavior of a variable from its relative variation proved to be very efficient, in that the most important information follows from the local data of the variable studied. The idea of studying a phenomenon by knowing how it changes with respect to a parameter requires answering the following question: if the variation of a quantity is known, can we determine the quantity itself? This question is partially answered by the Fundamental Theorem of Calculus, which reveals the connection between the concepts of derivative and integral. While the integral gives global information, the derivative gives local information. The local nature of differentiation and the global nature of integration are complementary, and they come together in various applications.

In the nineteenth century, classical mechanics and calculus were maturing, except for questions about the foundations of calculus, such as the understanding of real numbers, the convergence of series and limits of functions, all of which were later treated in Mathematical Analysis. The basic laws of Electromagnetism were formulated mathematically from the outset of the nineteenth century, culminating in Maxwell's Equations, published in 1861. Electromagnetism was one of the main sources of motivation for the development of Vector Calculus, as well as Fluid Mechanics, the Lagrangian formalism of mechanics and the Quaternions discovered by Sir William Rowan Hamilton. Chronologically, the equations of Electromagnetism were first written using local coordinates, then evolved to vector notation, and finally to the formulation using differential forms. Electromagnetism was important because of its applicability, and this boosted both experimental and theoretical knowledge of the theory.
Due to electromagnetic theory, the industrial revolution moved from steam engines to electric motors, which led to unprecedented development in manufacturing and also in communications, where the telegraph became essential. It was due to Maxwell's Equations that the wave behavior of the electric and magnetic fields was discovered, as well as the fact that both travel at the speed of light. The mathematical richness of electromagnetism revealed several structures that contributed to the development of new ideas and new methods, for example, the current format of the integration theorems that we find in calculus textbooks. Differential forms appeared later and provided a more precise and succinct language. The original formulation of Maxwell's Equations, as published by James Clerk Maxwell (1831–1879) in 1861, consisted of 20 equations; years later, Oliver Heaviside (1850–1925) introduced the vector operators $\nabla\times$ (curl) and $\nabla\cdot$ (divergence) and reduced them to four equations. Later, with the use of differential forms, the equations were reduced to only two.

Calculus has its technical limits, and these limits fall within the scope of the issues addressed in Mathematical Analysis. Essentially, the foundations of Calculus depend on the concept of limit, which was formalized only in the nineteenth century. These foundations and their consequences are essential, for example, for correctly extending the concepts and methods of Calculus in Euclidean spaces to the Calculus of Variations and for studying functions and maps in Banach spaces.

The development of Calculus motivated the development of several other areas of mathematics. Today we can claim that there are enough techniques and tools to solve various theoretical problems in Linear Algebra, which is not true when the problem belongs to the non-linear world. Non-linear questions can be divided into those where linear techniques and tools are effective, for example non-linear phenomena whose linear approximation is useful for their study, and those non-linear phenomena for which linear approximation is hopeless. In the latter case, non-linear problems are much more difficult and are rarely embedded in a global theory; that is, each question is a problem in itself. With the development of Quantum Mechanics and the evolution of optimization problems arising in several areas, it became essential to develop calculus on spaces of functions, which in this text will be Banach spaces, much more general and more abstract than the Euclidean space $\mathbb{R}^n$. In many cases, these are Hilbert spaces, which are particular cases of Banach spaces.

As knowledge evolved, questions arose that gave rise to new areas such as Algebraic Topology, Differential Topology and Geometric Topology. It has become evident in various models and theories that topological and geometric properties are fundamental to our understanding. The first proof by Gauss of the Fundamental Theorem of Algebra used the index of a curve, showing that topology plays an important role; the same thing happened later with the development of the Calculus of functions of one complex variable and the Theory of Dynamical Systems. We introduce the De Rham Cohomology Groups, which contain information on the topology of the space and make extensive use of the local and global nature of the derivative and integral operators.
Mathematics is a language for quantifying; as such, mastery of it improves understanding and efficiency in solving problems of a quantitative nature. However, mastery of the language alone does not reveal how to understand the phenomena inherent in a model and its applications. Something similar occurs in other theories and areas of human knowledge where mathematics is present. Of course, mathematics also benefits from this interaction. As the saying goes, "one hand washes the other."
Contents
1 Differentiation in $\mathbb{R}^n$
  1 Differentiability of Functions $f : \mathbb{R}^n \to \mathbb{R}$
    1.1 Directional Derivatives
    1.2 Differentiable Functions
    1.3 Differentials
    1.4 Multiple Derivatives
    1.5 Higher Order Differentials
  2 Taylor's Formula
  3 Critical Points and Local Extremes
    3.1 Morse Functions
  4 The Implicit Function Theorem and Applications
  5 Lagrange Multipliers
    5.1 The Ultraviolet Catastrophe: The Dawn of Quantum Mechanics
  6 Differentiable Maps I
    6.1 Basic Concepts
    6.2 Coordinate Systems
    6.3 The Local Form of an Immersion
    6.4 The Local Form of Submersions
    6.5 Generalization of the Implicit Function Theorem
  7 Fundamental Theorem of Algebra
  8 Jacobian Conjecture
    8.1 Case $n = 1$
    8.2 Case $n \geq 2$
    8.3 Covering Spaces
    8.4 Degree Reduction

2 Linear Operators in Banach Spaces
  1 Bounded Linear Operators on Normed Spaces
  2 Closed Operators and Closed Range Operators
  3 Dual Spaces
  4 The Spectrum of a Bounded Linear Operator
  5 Compact Linear Operators
  6 Fredholm Operators
    6.1 The Spectral Theory of Compact Operators
  7 Linear Operators on Hilbert Spaces
    7.1 Characterization of Compact Operators on Hilbert Spaces
    7.2 Self-adjoint Compact Operators on Hilbert Spaces
    7.3 Fredholm Alternative
    7.4 Hilbert-Schmidt Integral Operators
  8 Closed Unbounded Linear Operators on Hilbert Spaces

3 Differentiation in Banach Spaces
  1 Maps on Banach Spaces
    1.1 Extension by Continuity
  2 Derivation and Integration of Functions $f : [a, b] \to E$
    2.1 Derivation of a Single Variable Function
    2.2 Integration of a Single Variable Function
  3 Differentiable Maps II
  4 Inverse Function Theorem (InFT)
    4.1 Prelude for the Inverse Function Theorem
    4.2 InFT for Functions of a Single Real Variable
    4.3 Proof of the Inverse Function Theorem (InFT)
    4.4 Applications of InFT
  5 Classical Examples in Variational Calculus
    5.1 Euler-Lagrange Equations
    5.2 Examples
  6 Fredholm Maps
    6.1 Final Comments and Examples
  7 An Application of the Inverse Function Theorem to Geometry

4 Vector Fields
  1 Vector Fields in $\mathbb{R}^n$
  2 Conservative Vector Fields
  3 Existence and Uniqueness Theorem for ODE
  4 Flow of a Vector Field
  5 Vector Fields as Differential Operators
  6 Integrability, Frobenius Theorem
  7 Lie Groups and Lie Algebras
  8 Variations over a Flow, Lie Derivative
  9 Gradient, Curl and Divergent Differential Operators

5 Vector Integration, Potential Theory
  1 Vector Calculus
    1.1 Line Integral
    1.2 Surface Integral
  2 Classical Theorems of Integration
    2.1 Interpretation of the Curl and Div Operators
  3 Elementary Aspects of the Theory of Potential

6 Differential Forms, Stokes Theorem
  1 Exterior Algebra
  2 Orientation on $V$ and on the Inner Product on $\Lambda(V)$
    2.1 Orientation
    2.2 Inner Product in $\Lambda(V)$
    2.3 Pseudo-Inner Product, the Lorentz Form
  3 Differential Forms
    3.1 Exterior Derivative
  4 De Rham Cohomology
    4.1 Short Exact Sequence
  5 De Rham Cohomology of Spheres and Surfaces
  6 Stokes Theorem
  7 Orientation, Hodge Star-Operator and Exterior Co-derivative
  8 Differential Forms on Manifolds, Stokes Theorem
    8.1 Orientation
    8.2 Integration on Manifolds
    8.3 Exterior Derivative
    8.4 Stokes Theorem on Manifolds

7 Applications to the Stokes Theorem
  1 Volumes of the $(n + 1)$-Disk and of the $n$-Sphere
  2 Harmonic Functions
    2.1 Laplacian Operator
    2.2 Properties of Harmonic Functions
  3 Poisson Kernel for the $n$-Disk $D^n_R$
  4 Harmonic Differential Forms
    4.1 Hodge Theorem on Manifolds
  5 Geometric Formulation of the Electromagnetic Theory
    5.1 Electromagnetic Potentials
    5.2 Geometric Formulation
    5.3 Variational Formulation
  6 Helmholtz's Decomposition Theorem

Appendix A: Basics of Analysis
Appendix B: Differentiable Manifolds, Lie Groups
Appendix C: Tensor Algebra
References
Index
Chapter 1
Differentiation in $\mathbb{R}^n$
The behavior of a function is analyzed efficiently by studying the way in which the function varies. In this chapter, techniques used in studying functions of one real variable $f : I \to \mathbb{R}$, defined on an open interval $I \subset \mathbb{R}$, are extended to functions of several real variables $f : U \to \mathbb{R}^m$ defined on an open subset $U \subset \mathbb{R}^n$. Several real variables is understood to mean a finite number of variables $(x_1, \dots, x_n) \in \mathbb{R}^n$. The simple topological nature of $\mathbb{R}^n$ makes the theory, with all its concepts and techniques, easier to understand. The same concepts and techniques will be studied in the chapters ahead within the framework of Banach spaces.
1 Differentiability of Functions $f : \mathbb{R}^n \to \mathbb{R}$

When we study the continuity of a function of one real variable in a neighborhood of a given point $x_0 \in \mathbb{R}$, for $\epsilon > 0$ we consider the interval $I_\epsilon(x_0) = (x_0^-, x_0^+)$, $x_0^\pm = x_0 \pm \epsilon$, and analyze the values $f(x_0^-)$ and $f(x_0^+)$ taken by $f$ in the lateral neighborhoods $(x_0^-, x_0)$ (left-hand side) and $(x_0, x_0^+)$ (right-hand side). In this case, the continuity of $f$ at $x_0$ follows by proving that the lateral limits are equal to the value of the function at the point, i.e.,

\[ \lim_{\epsilon \to 0} f(x_0^+) = \lim_{\epsilon \to 0} f(x_0^-) = f(x_0). \]

In the case of functions of several variables, there are infinitely many directions to be analyzed, as shown in the next examples.
© Springer Nature Switzerland AG 2021 C. M. Doria, Differentiability in Banach Spaces, Differential Forms and Applications, https://doi.org/10.1007/978-3-030-77834-7_1
Examples

(1) Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by

\[ f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases} \tag{1} \]

To verify the continuity of $f(x, y)$ at the origin, we analyze the values of $f(x, y)$ when $(x, y)$ approaches $(0, 0)$ along straight lines. For $\alpha \in \mathbb{R}$, considering $\gamma(t) = (t, \alpha t)$, we take the limit

\[ \lim_{t \to 0} f(\gamma(t)) = \lim_{t \to 0} \frac{\alpha t^2}{(1 + \alpha^2) t^2} = \frac{\alpha}{1 + \alpha^2}. \]

Since the limit depends on the slope of the line, it follows that $f(x, y)$ is not continuous at the origin, as shown in Fig. 1.

(2) The behavior of the values of a function in a neighborhood of a point is quite subtle, as the example below shows:

\[ f(x, y) = \begin{cases} \dfrac{x^2 y}{x^4 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases} \]

When $(x, y)$ approaches $(0, 0)$ along the straight lines $\gamma(t) = (t, \alpha t)$, $\alpha \in \mathbb{R}$, we get $\lim_{t \to 0} f(\gamma(t)) = 0$. This suggests that $f$ is continuous at the origin. However, along the parabolas $\gamma(t) = (t, \beta t^2)$ we get $\lim_{t \to 0} f(\gamma(t)) = \frac{\beta}{1 + \beta^2}$. That is, when $(x, y)$ approaches $(0, 0)$ along parabolas, the values of $f$ can approach any $r \in [-1/2, 1/2]$, as shown in Fig. 2. So, the function $f(x, y)$ is not continuous at $(0, 0)$.

The examples above show how complicated it might be to study the behavior of a function in a neighborhood of a point. It becomes clear that the way in which we approach the point matters when studying continuity.
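The two examples can also be explored numerically. The following is a minimal sketch, not part of the original text: the function names `f1`, `f2`, `along_line` and `along_parabola` are hypothetical, and the sample slopes and the step `t = 1e-6` are arbitrary choices made for illustration.

```python
def f1(x, y):
    """Example (1): xy / (x^2 + y^2), with f1(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x**2 + y**2)


def f2(x, y):
    """Example (2): x^2 y / (x^4 + y^2), with f2(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**2 * y / (x**4 + y**2)


def along_line(f, alpha, t):
    """Value of f at the point (t, alpha*t) of the line with slope alpha."""
    return f(t, alpha * t)


def along_parabola(f, beta, t):
    """Value of f at the point (t, beta*t^2) of the parabola y = beta*x^2."""
    return f(t, beta * t**2)


t = 1e-6  # a point close to the origin (illustrative choice)

# Example (1): the value near the origin depends on the slope alpha.
line_values_f1 = [along_line(f1, a, t) for a in (0.0, 1.0, 2.0)]

# Example (2): along every line the values are close to 0 ...
line_values_f2 = [along_line(f2, a, t) for a in (0.0, 1.0, 2.0)]

# ... but along parabolas they stay close to beta / (1 + beta^2).
parabola_values_f2 = [along_parabola(f2, b, t) for b in (-1.0, 0.5, 1.0)]

print(line_values_f1)
print(line_values_f2)
print(parabola_values_f2)
```

Evaluating at a single small `t` only suggests the limits; it does not prove them. The computation is meant as a sanity check of the formulas $\alpha/(1+\alpha^2)$ and $\beta/(1+\beta^2)$ derived above.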
1.1 Directional Derivatives

Let $U \subset \mathbb{R}^n$ be an open subset with coordinates $(x_1, \dots, x_n)$; let $f : U \to \mathbb{R}$ be a function and $\gamma : (-\epsilon, \epsilon) \to U$ a curve such that $\gamma(0) = p \in U$ and $\gamma'(0) = \vec{v}$ is a vector in $\mathbb{R}^n$.

Definition 1 The directional derivative of $f$ at $p \in U$ in the direction of the vector $\vec{v}$ is

\[ \frac{\partial f}{\partial \vec{v}}(p) = \lim_{t \to 0} \frac{f(\gamma(t)) - f(p)}{t}. \tag{2} \]
Fig. 1 Example 1
Fig. 2 Example 2
According to the definition, the directional derivative $\frac{\partial f}{\partial \vec{v}}(p)$ depends on the curve $\gamma$. As discussed in the previous examples, the dependence on $\gamma$ is rather complicated and not easily understood. In what follows, we will examine the question of determining the class of functions such that $\frac{\partial f}{\partial \vec{v}}(p)$ depends only on the point $p$ and on the vector $\vec{v}$. Initially, we use the curve $\gamma(t) = p + t\vec{v}$ to compute the directional derivative $\frac{\partial f}{\partial \vec{v}}(p)$.

Definition 2 Let $\beta = \{e_1, \dots, e_n\}$ be the canonical basis of $\mathbb{R}^n$ and $p = (p_1, \dots, p_n) \in U$. The partial derivative of $f$ with respect to the variable $x_i$ at $p$ is

\[ \frac{\partial f}{\partial x_i}(p) = \lim_{t \to 0} \frac{f(p + t e_i) - f(p)}{t} = \lim_{t \to 0} \frac{f(p_1, \dots, p_i + t, \dots, p_n) - f(p_1, \dots, p_n)}{t}. \]
In developing the concept of a differentiable function of several variables, there are two relevant aspects to be addressed:
(1) The independence of $\frac{\partial f}{\partial x_i}(p)$ from $\gamma$.
(2) The continuity of $f$ at $p$.

The mere existence of partial derivatives at a point $p$ does not imply continuity of the function at $p$. The function $f$ defined in Eq. (1) is not continuous at $(0, 0)$, although the partial derivatives (limits) of $f$ exist at $(0, 0)$:

\[ \frac{\partial f}{\partial x}(0, 0) = \lim_{t \to 0} \frac{f(t, 0) - f(0, 0)}{t} = 0, \qquad \frac{\partial f}{\partial y}(0, 0) = \lim_{t \to 0} \frac{f(0, t) - f(0, 0)}{t} = 0. \]
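As a quick numerical illustration of this remark, the sketch below evaluates the difference quotients of the function of Eq. (1) along the coordinate axes and its values along the diagonal. The names `partial_quotients`, `quotients` and `diagonal`, as well as the sample steps $10^{-k}$, are assumptions made only for this example.

```python
def f(x, y):
    """The function of Eq. (1): xy / (x^2 + y^2), with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x**2 + y**2)


def partial_quotients(t):
    """Difference quotients at the origin along the x-axis and the y-axis."""
    dx = (f(t, 0.0) - f(0.0, 0.0)) / t
    dy = (f(0.0, t) - f(0.0, 0.0)) / t
    return dx, dy


steps = [10.0 ** (-k) for k in range(1, 8)]

# Both quotients vanish for every step, so both partial derivatives are 0.
quotients = [partial_quotients(t) for t in steps]

# Along the diagonal y = x the values stay near 1/2, not near f(0, 0) = 0.
diagonal = [f(t, t) for t in steps]

print(quotients)
print(diagonal)
```

The quotients are exactly zero because $f$ vanishes on both axes, while the diagonal values show that $f$ does not tend to $f(0,0)$.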
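Assuming the existence of the partial derivatives of the function $f : U \to \mathbb{R}$ at each point of $U$, we associate with $f$ the $n$ functions $\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} : U \to \mathbb{R}$.

Definition 3 A function $f : U \to \mathbb{R}$ is differentiable of class $C^1$ in $U$ if all the partial derivatives $\frac{\partial f}{\partial x_i}(x)$, $i = 1, \dots, n$, are continuous at every point $x \in U$. Let $C^1(U)$ denote the set of all functions of class $C^1$ in $U$.

Ignoring the dependence on $\gamma$ and taking $\gamma(t) = p + t\vec{v}$, we work out the following case: $U \subset \mathbb{R}^2$, $f : U \to \mathbb{R}$ and $f \in C^1(U)$. Let $p = (x_0, y_0) \in U$ and $\vec{v} = v_1 e_1 + v_2 e_2 \in \mathbb{R}^2$:

\begin{align*}
\frac{\partial f}{\partial \vec{v}}(p) &= \lim_{t \to 0} \frac{f(x_0 + t v_1, y_0 + t v_2) - f(x_0, y_0)}{t} \\
&= \lim_{t \to 0} \frac{1}{t} \Big[ f(x_0 + t v_1, y_0 + t v_2) - f(x_0, y_0 + t v_2) + f(x_0, y_0 + t v_2) - f(x_0, y_0) \Big] \\
&= \lim_{t \to 0} \frac{1}{t} \Big[ \frac{\partial f}{\partial x}(c_1, y_0 + t v_2).(t v_1) + \frac{\partial f}{\partial y}(x_0, c_2).(t v_2) \Big],
\end{align*}

where the last equality follows from the Mean Value Theorem, which guarantees the existence of $c_1$ between $x_0$ and $x_0 + t v_1$ and of $c_2$ between $y_0$ and $y_0 + t v_2$. Passing to the limit $t \to 0$ and using the continuity of the partial derivatives, we get

\[ \frac{\partial f}{\partial \vec{v}}(p) = \frac{\partial f}{\partial x}(p).v_1 + \frac{\partial f}{\partial y}(p).v_2. \tag{3} \]

The right-hand side of Eq. (3) depends only on $p$ and $\vec{v}$. Using the inner product $\langle \cdot, \cdot \rangle : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$, Eq. (3) is written as

\[ \frac{\partial f}{\partial \vec{v}}(p) = \Big\langle \Big( \frac{\partial f}{\partial x}(p), \frac{\partial f}{\partial y}(p) \Big), (v_1, v_2) \Big\rangle. \]

Similarly, for the general case $f = f(x_1, \dots, x_n)$, we have the identity

\[ \frac{\partial f}{\partial \vec{v}}(p) = \Big\langle \Big( \frac{\partial f}{\partial x_1}(p), \dots, \frac{\partial f}{\partial x_n}(p) \Big), (v_1, \dots, v_n) \Big\rangle. \]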
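Equation (3) can be checked numerically on a concrete function. The sketch below is only an illustration under stated assumptions: the test function $f(x, y) = x^2 y + \sin y$, the point $p = (1, 2)$, the vector $\vec{v} = (3, -1)$ and the step $t = 10^{-6}$ are all hypothetical choices that do not come from the text.

```python
import math


def f(x, y):
    # Hypothetical C^1 test function, chosen only for illustration.
    return x**2 * y + math.sin(y)


def grad_f(x, y):
    # Partial derivatives of f, computed by hand.
    return (2.0 * x * y, x**2 + math.cos(y))


def directional_quotient(p, v, t):
    # Difference quotient of f along the straight line gamma(t) = p + t*v.
    return (f(p[0] + t * v[0], p[1] + t * v[1]) - f(p[0], p[1])) / t


p = (1.0, 2.0)
v = (3.0, -1.0)

g = grad_f(*p)
inner = g[0] * v[0] + g[1] * v[1]            # right-hand side of Eq. (3)
quotient = directional_quotient(p, v, 1e-6)  # approximates the left-hand side

print(inner, quotient)
```

The two printed numbers should agree up to an error of the order of the step $t$, which is what the passage to the limit in Eq. (3) predicts for a function of class $C^1$.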
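Motivated by the above, we consider the following situation: $U \subset \mathbb{R}^3$, $f : U \to \mathbb{R}$ of class $C^1$, and $\gamma : (-\epsilon, \epsilon) \to U$, $\gamma(t) = (\gamma_1(t), \gamma_2(t), \gamma_3(t))$, a $C^1$-curve such that $\gamma(0) = p = (p_1, p_2, p_3)$ and $\gamma'(0) = \vec{v} = (v_1, v_2, v_3)$:

\begin{align*}
\frac{\partial f}{\partial \vec{v}}(p) &= \lim_{t \to 0} \frac{f(\gamma(t)) - f(p)}{t} = \lim_{t \to 0} \frac{f(\gamma_1(t), \gamma_2(t), \gamma_3(t)) - f(p_1, p_2, p_3)}{t} \\
&= \lim_{t \to 0} \frac{1}{t} \Big[ f(\gamma_1(t), \gamma_2(t), \gamma_3(t)) - f(p_1, \gamma_2(t), \gamma_3(t)) \Big] \\
&\quad + \lim_{t \to 0} \frac{1}{t} \Big[ f(p_1, \gamma_2(t), \gamma_3(t)) - f(p_1, p_2, \gamma_3(t)) \Big] + \lim_{t \to 0} \frac{1}{t} \Big[ f(p_1, p_2, \gamma_3(t)) - f(p_1, p_2, p_3) \Big] \\
&= \lim_{t \to 0} \Big[ \frac{\partial f}{\partial x_1}(c_1, \gamma_2(t), \gamma_3(t)).\frac{\gamma_1(t) - p_1}{t} + \frac{\partial f}{\partial x_2}(p_1, c_2, \gamma_3(t)).\frac{\gamma_2(t) - p_2}{t} \\
&\qquad\quad + \frac{\partial f}{\partial x_3}(p_1, p_2, c_3).\frac{\gamma_3(t) - p_3}{t} \Big],
\end{align*}

where the Mean Value Theorem guarantees the existence of $c_1$, $c_2$, $c_3$, each depending on $t$, with $c_i$ between $p_i$ and $\gamma_i(t)$ for $i = 1, 2, 3$. Since $\gamma \in C^1$, there are $d_1$, $d_2$ and $d_3$ in $(0, t)$ such that $\gamma_i(t) - p_i = \gamma_i'(d_i).t$, and therefore

\begin{align*}
\frac{\partial f}{\partial \vec{v}}(p) &= \lim_{t \to 0} \Big[ \frac{\partial f}{\partial x_1}(c_1, \gamma_2(t), \gamma_3(t)).\frac{\gamma_1'(d_1).t}{t} + \frac{\partial f}{\partial x_2}(p_1, c_2, \gamma_3(t)).\frac{\gamma_2'(d_2).t}{t} + \frac{\partial f}{\partial x_3}(p_1, p_2, c_3).\frac{\gamma_3'(d_3).t}{t} \Big] \\
&= \frac{\partial f}{\partial x_1}(p).v_1 + \frac{\partial f}{\partial x_2}(p).v_2 + \frac{\partial f}{\partial x_3}(p).v_3 \\
&= \Big\langle \Big( \frac{\partial f}{\partial x_1}(p), \frac{\partial f}{\partial x_2}(p), \frac{\partial f}{\partial x_3}(p) \Big), (v_1, v_2, v_3) \Big\rangle,
\end{align*}

that is, under the hypothesis above, the directional derivative $\frac{\partial f}{\partial \vec{v}}(p)$ depends only on $p \in U$ and on $\vec{v}$. In the case of a function $f : \mathbb{R}^n \to \mathbb{R}$ of class $C^1$ and $\vec{v} = \sum_i v_i e_i$, we get

\[ \frac{\partial f}{\partial \vec{v}}(p) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(p).v_i. \tag{4} \]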
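The independence from the curve can be illustrated numerically by comparing two different curves through $p$ with the same velocity $\vec{v}$ at $t = 0$. In this minimal sketch the test function $f(x, y, z) = xyz + x^2$, the point $p$, the vectors `v` and `w`, and the step `1e-6` are hypothetical choices made only for the example; the curve `curved` adds an arbitrary second-order term $t^2 w$, which does not change $\gamma'(0)$.

```python
def f(x, y, z):
    # Hypothetical C^1 test function, chosen only for illustration.
    return x * y * z + x**2


def grad_f(x, y, z):
    # Partial derivatives of f, computed by hand.
    return (y * z + 2.0 * x, x * z, x * y)


p = (1.0, 2.0, 3.0)
v = (1.0, -1.0, 2.0)
w = (5.0, -7.0, 2.0)  # arbitrary second-order perturbation of the curve


def straight(t):
    # gamma(t) = p + t*v
    return tuple(pi + t * vi for pi, vi in zip(p, v))


def curved(t):
    # gamma(t) = p + t*v + t^2*w, still with gamma(0) = p and gamma'(0) = v
    return tuple(pi + t * vi + t**2 * wi for pi, vi, wi in zip(p, v, w))


def quotient(curve, t):
    # Difference quotient of Definition 1 along the given curve.
    return (f(*curve(t)) - f(*p)) / t


expected = sum(gi * vi for gi, vi in zip(grad_f(*p), v))  # Eq. (4)
q_straight = quotient(straight, 1e-6)
q_curved = quotient(curved, 1e-6)

print(expected, q_straight, q_curved)
```

Both quotients should be close to the value given by Eq. (4), even though the two curves differ away from $t = 0$.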
1.2 Differentiable Functions

The tangent plane at $p \in U$ is the vector space

\[ T_p U = \{ \vec{v} \in \mathbb{R}^n \mid \exists\, \gamma : (-\epsilon, \epsilon) \to U,\ \gamma(0) = p,\ \gamma'(0) = \vec{v} \}. \]

Definition 4 A function $f : U \to \mathbb{R}$ is differentiable at the point $p \in U$ if there is a linear functional $df_p : T_p U \to \mathbb{R}$ such that

\[ f(p + \vec{v}) - f(p) = df_p.\vec{v} + r(\vec{v}) \tag{5} \]

and $\lim_{\vec{v} \to 0} \frac{r(\vec{v})}{|\vec{v}|} = 0$, for all $\vec{v} \in T_p U$. The linear functional $df_p$ is the differential (or the derivative) of $f$ at $p$.

The uniqueness of $df_p$ follows directly from the definition.

Theorem 1 If $f : U \to \mathbb{R}$ is differentiable at the point $p \in U$, then $f$ is continuous at $p$.

Proof It follows from the definition that

\[ \lim_{\vec{v} \to 0} r(\vec{v}) = \lim_{\vec{v} \to 0} \frac{r(\vec{v})}{|\vec{v}|}.|\vec{v}| = 0. \]

Hence

\[ \lim_{\vec{v} \to 0} \big[ f(p + \vec{v}) - f(p) \big] = \lim_{\vec{v} \to 0} df_p.\vec{v} + \lim_{\vec{v} \to 0} r(\vec{v}) = 0. \]
In what follows, vectors are designated as $v$ instead of $\vec{v}$, unless the context requires otherwise. Considering the unit vector $\hat{v} = \frac{v}{|v|}$, the identity below shows the relation between the differential and the directional derivatives:

\[ \frac{r(v)}{|v|} = \frac{f(p + |v|\hat{v}) - f(p)}{|v|} - df_p.\frac{v}{|v|}. \]

Letting $|v| \to 0$ with $\hat{v}$ fixed, we obtain $df_p.\hat{v} = \frac{\partial f}{\partial \hat{v}}(p)$.

Theorem 2 If $f \in C^1(U)$, then $f$ is differentiable.

Proof Let $\beta = \{e_1, \dots, e_n\}$ be the canonical basis of $\mathbb{R}^n$. Taking $x = (x_1, \dots, x_n) \in U$ and $v = \sum_i v_i e_i \in T_x U$, define

\[ r(v) = r(v_1, \dots, v_n) = \big[ f(x_1 + v_1, \dots, x_n + v_n) - f(x_1, \dots, x_n) \big] - \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x).v_i. \]

So Eq. (4) implies

\begin{align*}
\lim_{v \to 0} \frac{r(v)}{|v|} &= \lim_{v \to 0} \Big[ \frac{f(x_1 + v_1, \dots, x_n + v_n) - f(x_1, \dots, x_n)}{|v|} - \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x).\frac{v_i}{|v|} \Big] \\
&= \frac{\partial f}{\partial \hat{v}} - \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x).\frac{v_i}{|v|} = 0.
\end{align*}

Considering $df_x.\hat{v} = \frac{\partial f}{\partial \hat{v}} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x).\frac{v_i}{|v|}$, which is linear in $v$, the linear functional $df_x : T_x U \to \mathbb{R}$ is defined as

\[ df_x.v = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x).v_i. \]

Therefore $f(x + v) - f(x) = df_x.v + r(v)$ and $\lim_{v \to 0} \frac{r(v)}{|v|} = 0$. Hence $f$ is differentiable.
1 Differentiability of Functions f : Rn → R
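Using the inner product defined on $\mathbb{R}^n$, the differential $df_x : T_xU \to \mathbb{R}$ is given by

$$df_x.v = \Big\langle \Big( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \Big), (v_1, \dots, v_n) \Big\rangle, \quad \forall v \in T_xU. \tag{6}$$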
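Definition 5 Let $f \in C^1(U)$ and $\beta = \{e_1, \dots, e_n\}$ be the canonical basis of $\mathbb{R}^n$. The gradient vector of $f$ at $p \in U$ is

$$\nabla f(p) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(p).e_i = \Big( \frac{\partial f}{\partial x_1}(p), \dots, \frac{\partial f}{\partial x_n}(p) \Big).$$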
Therefore $df_p.v = \langle \nabla f(p), v \rangle$. Taking an orthonormal basis $\beta = \{e_1, \dots, e_n\}$, the inner product induces the isomorphism $P^{-1} : T_p^*U \to T_pU$, $P^{-1}(\phi) = (\phi(e_1), \dots, \phi(e_n))$. Since $df_p(e_i) = \frac{\partial f}{\partial x_i}(p)$, we have $P^{-1}(df_p) = \nabla f(p)$. The dependence of the gradient vector on the inner product is often ignored, since in $\mathbb{R}^n$ the canonical inner product is always the one being used.

The gradient operator $\nabla : C^1(U) \to C^0(U; \mathbb{R}^n)$, $f \mapsto \nabla f$, satisfies the following identities for $a, b \in \mathbb{R}$ and $f, g \in C^1(U)$:
(1) $\nabla(af + bg) = a\nabla f + b\nabla g$ (linearity).
(2) $\nabla(f.g) = \nabla(f).g + f.\nabla(g)$ (Leibniz's rule).

The differential of a function $f \in C^1(U)$ induces the application $x \mapsto df_x$. Let $L(\mathbb{R}^n; \mathbb{R})$ be the space of linear functionals defined on $\mathbb{R}^n$. The condition $f \in C^1(U)$ corresponds to the continuity of the application $df : U \to L(\mathbb{R}^n; \mathbb{R})$, $x \mapsto df_x$. The topological properties of the space of linear operators are studied in Chap. 3.

To study the variation of a function, we start by choosing a curve in the domain, as follows: consider $p \in U$ and let $\gamma : (-\epsilon, \epsilon) \to U$ be a $C^1$-curve, $\gamma(t) = (\gamma_1(t), \dots, \gamma_n(t))$, such that $\gamma(0) = p$ and $\gamma'(0) = v$. The rate of change of the function $h : (-\epsilon, \epsilon) \to \mathbb{R}$, $h(t) = f(\gamma(t))$, is given at each instant $t$ by the derivative

$$h'(t) = \frac{d[f(\gamma(t))]}{dt} = df_{\gamma(t)}.\gamma'(t) = \langle \nabla f(\gamma(t)), \gamma'(t) \rangle = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\gamma(t)).\frac{d[\gamma_i(t)]}{dt}. \tag{7}$$
If $f \in C^1(U)$ is constant, then $\nabla f(x) = 0$, $\forall x \in U$. To verify the converse, it is necessary to assume that $U$ is a connected set; since $U$ is open, this means that for all pairs of points $x, y \in U$ we have a $C^0$-curve $\gamma : [0, 1] \to U$ such that $\gamma(0) = x$ and $\gamma(1) = y$. Initially, we assume that $U$ is convex and take $\gamma(t) = x + t(y - x)$ and $h(t) = f(\gamma(t))$. In this way, the Mean Value Theorem guarantees the existence of $c \in (0, 1)$ such that

$$h(1) - h(0) = h'(c) = \langle \nabla f(\gamma(c)), \gamma'(c) \rangle = \langle \nabla f(\gamma(c)), y - x \rangle.$$
Since the partial derivatives of $f$ are continuous and $[0, 1]$ is compact, the maximum $M = \max_{t\in[0,1]} |\nabla f(\gamma(t))|$ exists. Since $U$ is convex, for every pair $x, y \in U$ we get

$$|f(y) - f(x)| \le M\,|y - x|. \tag{8}$$
Proposition 1 Let $U \subset \mathbb{R}^n$ be a connected open set and $f : U \to \mathbb{R}$ a $C^1$-function. If $\nabla f(x) = 0$ for all $x \in U$, then $f$ is constant.

Proof We first assume $U$ is convex; it follows from Inequality (8) that

$$|f(y) - f(x)| \le \max_{x\in U} |\nabla f(x)| . |y - x| = 0$$

and $f(y) = f(x)$, for all $x, y \in U$. Assuming $U$ is only connected, the conclusion follows from the local convexity. First, we fix the sets $\{p_\lambda \in U \mid \lambda \in \Lambda\}$ and $\{\epsilon_\lambda \in (0, \infty) \mid \lambda \in \Lambda\}$ in such a way that the family of open balls $\{B_\lambda = B_{\epsilon_\lambda}(p_\lambda) \mid \lambda \in \Lambda\}$, each contained in $U$, is an open cover of $U$. Choose $\lambda_0 \in \Lambda$. Since $B_{\lambda_0}$ is convex, we have $c \in \mathbb{R}$ such that $f(x) = c$ for all $x \in B_{\lambda_0}$. The set $U_c = \{x \in U \mid f(x) = c\}$ is non-empty, closed in $U$ because $f$ is continuous, and open because $f$ is constant on each ball $B_\lambda$; since $U$ is connected, $U_c = U$.

Our goal is to study differentiable functions, so we will always assume that $f \in C^1(U)$. To understand the behavior of a function, it is important to know the properties of the subsets of $U$ in which the function is constant. Given a value $c \in \mathrm{Im}(f)$, we define the $c$-level set as $f^{-1}(c) = \{x \in U \mid f(x) = c\}$. Indeed, Sard's theorem [21] affirms that the level sets $f^{-1}(c)$ are hypersurfaces of $\mathbb{R}^n$ for a dense set of values $c \in \mathrm{Im}(f)$; in the case $n = 2$, the set $f^{-1}(c)$ is a level curve and for $n = 3$, it is a level surface. The characteristics of the level sets of a function are important to address optimization questions, for example to find local maxima and local minima of $f$. The following remarks derive from Eq. (7):

(1) Let $\theta(t)$ be the angle formed by the vectors $\nabla f(\gamma(t))$ and $\gamma'(t)$, so

$$h'(t) = \frac{d[f(\gamma(t))]}{dt} = \langle \nabla f(\gamma(t)), \gamma'(t) \rangle = |\nabla f(\gamma(t))| . |\gamma'(t)| \cos(\theta(t)).$$

(2) The function has the highest growth rate in the direction of the vector $\nabla f(\gamma(t))$. If $\gamma'(t) = \nabla f(\gamma(t))$, then the derivative of the function $h(t) = f(\gamma(t))$ is given by $h'(t) = df_{\gamma(t)}.\gamma'(t) = |\nabla f(\gamma(t))|^2 \ge 0$, so $h(t)$ increases wherever $\nabla f(\gamma(t)) \neq 0$. The growth of $f$ is steepest in the direction of the gradient vector because there $\cos(\theta(t)) = 1$. Likewise, $f$ decreases most sharply in the opposite direction of the gradient.

(3) If $\gamma(t) \in f^{-1}(c)$ for all $t \in (-\epsilon, \epsilon)$, then $\nabla f(\gamma(t))$ is orthogonal to the level set $f^{-1}(c)$, since

$$0 = \frac{d[f(\gamma(t))]}{dt} = \langle \nabla f(\gamma(t)), \gamma'(t) \rangle.$$
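The remarks following Eq. (7) can be checked numerically. The sketch below is only an illustration under stated assumptions: the function `f`, the point `p` and the helper names are hypothetical choices that do not come from the text. It scans unit directions, estimates the rate of change of `f` along each one by a central difference, and compares the best direction with the gradient.

```python
import math

def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x**2 + 3.0 * y**2

def grad_f(x, y):
    # Gradient of the sample function, computed by hand.
    return (2.0 * x, 6.0 * y)

def directional_rate(p, theta, h=1e-6):
    # Central-difference estimate of d/dt f(p + t*u) at t = 0,
    # where u = (cos(theta), sin(theta)) is a unit vector.
    ux, uy = math.cos(theta), math.sin(theta)
    forward = f(p[0] + h * ux, p[1] + h * uy)
    backward = f(p[0] - h * ux, p[1] - h * uy)
    return (forward - backward) / (2.0 * h)

p = (1.0, 0.5)
g = grad_f(*p)
norm_g = math.hypot(g[0], g[1])
grad_theta = math.atan2(g[1], g[0]) % (2.0 * math.pi)

# Scan unit directions and keep the one with the largest rate of change.
angles = [2.0 * math.pi * k / 3600 for k in range(3600)]
best_theta = max(angles, key=lambda th: directional_rate(p, th))
best_rate = directional_rate(p, best_theta)

# Rate of change along a direction orthogonal to the gradient
# (tangent to the level curve through p).
tangent_rate = directional_rate(p, grad_theta + math.pi / 2.0)

print(best_theta, grad_theta)   # the two angles nearly coincide
print(best_rate, norm_g)        # maximal rate is close to |grad f(p)|
print(tangent_rate)             # close to zero
```

In this sketch the maximal rate agrees with $|\nabla f(p)|$ up to the resolution of the angular grid, and the rate along the tangent direction of the level curve is essentially zero, in line with remarks (2) and (3).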
Exercises

(1) Sketch the level sets and graphs of the following functions:
(i) $f(x, y) = x^2 + y^2$, (ii) $f(x, y) = x^2 - y^2$, (iii) $f(x, y) = -x^2 - y^2$,
(iv) $f(x, y) = x^2(x - 1)^2 + y^2$, (v) $f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2$.

(2) Consider $0 \le k \le \infty$ and $I_k = (i_1, \dots, i_k) \in \{1, \dots, n\}^k$. Let $C^k(U)$ be the set of functions $f : U \to \mathbb{R}$ with partial derivatives $\frac{\partial^k f}{\partial x_{i_1} \dots \partial x_{i_k}}$ that are continuous for all multi-indices $I_k \in \{1, \dots, n\}^k$, and let $C^\infty(U)$ be the set of $f \in C^k(U)$ for every $k \in \mathbb{N}$. Prove that $C^k(U)$ is a ring for all $k \in \mathbb{N}$.

(3) A homogeneous function of degree $k$ is a function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies the condition $f(tx) = t^k f(x)$ for any $t \in \mathbb{R}$ and $x \in \mathbb{R}^n$. Show that

$$k.f(x) = \sum_{i=1}^n x_i \frac{\partial f}{\partial x_i}.$$

(4) Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function differentiable at the origin. If $f(tx) = t f(x)$ for all $t > 0$ and all $x \in \mathbb{R}^n$, prove that $f$ is linear. Use this result to prove that the function

$$f(x, y) = \begin{cases} \dfrac{x^3}{x^2 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0) \end{cases}$$

is not differentiable at the origin.

(5) Let $f(x, y, z) = e^{x^2 + \sqrt{yx} + \ln(z)}$. Find the partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ and $\frac{\partial f}{\partial z}$. Define $h(t) = f(t, t^2, t^3)$ and compute $\frac{dh}{dt}$.

(6) Consider the function $f(x, y) = x.y$. Find the total derivative $df$ and the increment $\Delta f = f(x + \Delta x, y + \Delta y) - f(x, y)$. Next, compare the results obtained using the fact that $dx \sim \Delta x$, $dy \sim \Delta y$.

(7) Find the approximate value of the variation of the hypotenuse of a right triangle with sides that measure 3 cm and 4 cm when the smaller side is decreased by 0.5 cm and the larger side is increased by 0.75 cm.

(8) Let $L(\mathbb{R}^n; \mathbb{R})$ be the space of linear functionals. Prove that $f \in C^1(U)$ if and only if the map $df : U \to L(\mathbb{R}^n; \mathbb{R})$, $x \mapsto df_x$, is continuous (hint: $L(\mathbb{R}^n; \mathbb{R})$ can be identified with the space $M(n, 1)$ of real $n \times 1$ matrices).

(9) The Laplacian operator $\Delta : C^2(U) \to C^0(U)$ defined on an open subset $U \subset \mathbb{R}^n$ is defined as $\Delta f = \sum_{i=1}^n \frac{\partial^2 f}{\partial x_i^2}$. A function $f$ is harmonic if it satisfies the partial differential equation (PDE) $\Delta f = 0$ in $U$. Consider $U = \mathbb{R}^2$, and prove that $u(x, y) = e^x \sin(y)$ is harmonic in $\mathbb{R}^2$.
(10) The PDE governing the evolution of heat in space is $\Delta v - \frac{\partial v}{\partial t} = 0$. Prove that the function

$$f(x, t) = \frac{1}{8\pi^{3/2}(t_0 - t)^{3/2}}\, e^{-\frac{r^2}{4(t_0 - t)}}, \quad \text{with } r = \sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2},$$

satisfies the heat equation.

(11) Consider $U = \mathbb{R} \times \mathbb{R}$. The one-dimensional wave equation is the PDE

$$\frac{\partial^2 f}{\partial x^2} - \frac{\partial^2 f}{\partial t^2} = 0.$$

Suppose $f \in C^2(U)$ is a solution of the wave equation. Show that $f$ satisfies the equation $\frac{\partial^2 f}{\partial x' \partial t'} = 0$ after changing the coordinates to $x' = x + t$, $t' = x - t$. Conclude that there are functions $\phi, \psi \in C^2(I)$, defined on an interval $I$, such that $f(x, t) = \phi(x + t) + \psi(x - t)$.

(12) Consider $C \subset U$ to be a closed subset and let $f \in C^1(U)$. Extend the concept of differentiability to the function $\bar{f} : C \to \mathbb{R}$.

(13) If $\gamma : \mathbb{R} \to U$ is a non-constant curve such that $\gamma'(t) = \nabla f(\gamma(t))$, prove that $\gamma(t)$ cannot be periodic.

(14) (Maximum descent method) Given $A \in M_n(\mathbb{R})$ and $y_0 \in \mathbb{R}^n$, consider the linear equation $A.x = y_0$. To find a solution, we consider the problem of finding the minimum of the function $f(x) = \frac{1}{2}|A.x - y_0|^2$.
(a) Prove that $\nabla f(x) = A^t(A.x - y_0)$.
(b) Take $x_0 \in \mathbb{R}^n$ and consider the straight line $r_0(t) = x_0 - t\nabla f(x_0)$. Conclude that the function $f_0(t) = \frac{1}{2}|A.(x_0 - t\nabla f(x_0)) - y_0|^2$ attains its minimum at

$$t_0 = \frac{|A^t(A.x_0 - y_0)|^2}{|A A^t(A.x_0 - y_0)|^2}.$$

(c) Define $x_1 = x_0 - t_0\nabla f(x_0)$ and prove that $f(x_1) \le f(x_0)$.
(d) Applying the minimizing procedure of the previous item, define the sequence

$$x_{n+1} = x_n - \frac{|A^t(A.x_n - y_0)|^2}{|A A^t(A.x_n - y_0)|^2}\, A^t(A.x_n - y_0), \quad n \in \mathbb{N},$$

and prove that the sequence $\{x_n\}_{n\in\mathbb{N}}$ converges to a solution of the linear equation $A.x = y_0$.
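The iteration of Exercise (14) can be sketched in a few lines. This is a minimal illustration, not a solution of the exercise: the matrix `A`, the vector `y0`, the starting point and all helper names are assumptions made for the example. The step is the one obtained by minimizing $f$ along the line $x - t\,g$ with $g = A^t(A.x - y_0)$, namely $t = |g|^2 / |A.g|^2$.

```python
def matvec(A, x):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steepest_descent(A, y0, x0, steps=200, tol=1e-24):
    # Minimizes f(x) = 0.5 * |A x - y0|^2 by moving along -grad f(x)
    # with the exact minimizing step on each line.
    At = transpose(A)
    x = list(x0)
    for _ in range(steps):
        residual = [a - b for a, b in zip(matvec(A, x), y0)]
        g = matvec(At, residual)      # gradient A^t (A x - y0)
        Ag = matvec(A, g)
        denom = dot(Ag, Ag)
        if dot(g, g) < tol or denom == 0.0:
            break                      # gradient is numerically zero
        t = dot(g, g) / denom          # exact line-search step
        x = [xi - t * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical invertible system: 2x + y = 3, x + 3y = 5.
A = [[2.0, 1.0], [1.0, 3.0]]
y0 = [3.0, 5.0]
x = steepest_descent(A, y0, [0.0, 0.0])
print(x)  # approaches the solution of A.x = y0
```

For this invertible example the iterates approach the unique solution $(0.8, 1.4)$; how fast this happens depends on the conditioning of $A^tA$, which the sketch does not analyze.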
1.3 Differentials

Taking in $U \subset \mathbb{R}^n$ an orthogonal coordinate system $(x_1, \dots, x_n)$, for $1 \le i \le n$ we get the linear functionals

$$dx_i : T_pU \to \mathbb{R}, \quad v = \sum_i v_i e_i \mapsto dx_i(v) = v_i. \tag{9}$$
In this coordinate system, the differential of $f$ is

$$df = \sum_{i=1}^n \frac{\partial f}{\partial x_i}.dx_i. \tag{10}$$
Shifting the point $(x, y)$ to the point $(x + \Delta x, y + \Delta y)$ yields a variation of $f$: $\Delta_{(x,y)} f = f(x + \Delta x, y + \Delta y) - f(x, y)$. The Mean Value Theorem assures us that there are $\alpha \in (x, x + \Delta x)$ and $\beta \in (y, y + \Delta y)$ such that

$$\Delta_{(x,y)} f = \frac{\partial f}{\partial x}(\alpha, y + \Delta y).\Delta x + \frac{\partial f}{\partial y}(x, \beta).\Delta y.$$

That is, if $\Delta x \sim 0$ and $\Delta y \sim 0$, then $\Delta_{(x,y)} f \sim df_{(x,y)}$ ($\sim$ means "is approximately equal to"); along a curve,

$$\Delta\big(f(\gamma(t))\big) \sim \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\gamma(t)).\gamma_i'(t)\,\Delta t.$$

The total derivative and the partial derivative are often used in the language of physicists. The partial derivative of a function $f$ relative to the variable $x$ is simply $\frac{\partial f}{\partial x}$; the total derivative is the differential of $f$. The difference is clear in the following example: the temperature on a two-dimensional plate defines a function $T = T(x, y, t)$ depending on the coordinates $(x, y)$ of the point where it is being measured and the instant $t$ of the measurement. The partial derivative $\frac{\partial T}{\partial t}$ measures the rate of change of the temperature relative to $t$ at a fixed point $(x, y)$. The total derivative $\frac{dT}{dt}$ measures the rate of change of the temperature along a trajectory $\gamma(t) = (x(t), y(t))$:

$$\frac{dT}{dt} = \frac{\partial T}{\partial x}\frac{dx}{dt} + \frac{\partial T}{\partial y}\frac{dy}{dt} + \frac{\partial T}{\partial t}.$$

The differential allows us to calculate any variation of the function.
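The approximation $\Delta f \sim df$ can be illustrated numerically. The sketch below assumes a hypothetical function $f(x, y) = x^2 y$ and hypothetical increments; none of these choices come from the text. It compares the increment with the differential for progressively smaller steps.

```python
def f(x, y):
    # Hypothetical sample function for the comparison.
    return x * x * y

def differential(x, y, dx, dy):
    # df = (df/dx) dx + (df/dy) dy with the partial derivatives of x^2 * y.
    return 2.0 * x * y * dx + x * x * dy

def increment(x, y, dx, dy):
    # Delta f = f(x + dx, y + dy) - f(x, y).
    return f(x + dx, y + dy) - f(x, y)

x, y = 2.0, 3.0
errors = []
for scale in (1.0, 0.1, 0.01):
    dx, dy = 0.01 * scale, -0.02 * scale
    gap = abs(increment(x, y, dx, dy) - differential(x, y, dx, dy))
    errors.append(gap)
    print(scale, increment(x, y, dx, dy), differential(x, y, dx, dy), gap)
```

Each time the increments shrink by a factor of 10, the gap between $\Delta f$ and $df$ shrinks by roughly a factor of 100, which is the behaviour expected of a remainder that is of second order in $(\Delta x, \Delta y)$.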
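1.4 Multiple Derivatives

In the same way that the functions $\frac{\partial f}{\partial x_i}$ were generated from the function $f \in C^1(U)$, many others can be generated by taking more partial derivatives $\frac{\partial}{\partial x_j}\Big(\frac{\partial f}{\partial x_i}\Big)$. Assuming $\frac{\partial f}{\partial x_i} \in C^1(U)$, $\forall i = 1, \dots, n$, $n^2$ functions can be defined by

$$\frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial}{\partial x_j}\Big(\frac{\partial f}{\partial x_i}\Big) \in C^0(U), \quad (i, j) \in \{1, \dots, n\}^2.$$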
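Consider $k \ge 1$; for each multi-index $(i_1, \dots, i_k) \in \{1, \dots, n\}^k$, the following functions are defined inductively by

$$\frac{\partial^k f}{\partial x_{i_k} \dots \partial x_{i_1}} = \frac{\partial}{\partial x_{i_k}}\Big(\frac{\partial^{k-1} f}{\partial x_{i_{k-1}} \dots \partial x_{i_1}}\Big).$$

A function $f : U \to \mathbb{R}$ has class $C^2(U)$ if the following items are satisfied for all $(i_1, i_2) \in \{1, \dots, n\}^2$:
(i) $\frac{\partial f}{\partial x_i} \in C^1(U)$, $\forall i \in \{1, \dots, n\}$, and
(ii) $\frac{\partial^2 f}{\partial x_{i_1} \partial x_{i_2}} \in C^0(U)$.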
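Successively, the class $C^k(U)$ is defined. If $f \in C^k(U)$ for all $k \in \mathbb{N}$, we say $f$ is smooth and $f \in C^\infty(U)$. The following result, known as Schwarz's Theorem, gives a sufficient condition for the partial derivatives to commute.

Theorem 3 (Schwarz). If $f \in C^2(U)$, then

$$\frac{\partial}{\partial x_j}\Big(\frac{\partial f}{\partial x_i}\Big) = \frac{\partial}{\partial x_i}\Big(\frac{\partial f}{\partial x_j}\Big)$$

for all $i, j \in \{1, \dots, n\}$ and $x \in U$.

The next lemma goes half-way towards proving Schwarz's theorem.

Lemma 1 Let $U \subset \mathbb{R}^n$ be an open subset and let $f \in C^1(U \times [a, b])$. For every $i = 1, \dots, n$, the partial derivative $\frac{\partial \phi}{\partial x_i}$ of the function $\phi : U \to \mathbb{R}$, $\phi(x) = \int_a^b f(x, t)\,dt$, at $x \in U$ is

$$\frac{\partial \phi}{\partial x_i}(x) = \int_a^b \frac{\partial f}{\partial x_i}(x, t)\,dt. \tag{11}$$

Proof Let $e_i$ be an element of the canonical basis of $\mathbb{R}^n$ and assume that $[x, x + se_i] \subset U$, so we have

$$\frac{\phi(x + se_i) - \phi(x)}{s} - \int_a^b \frac{\partial f}{\partial x_i}(x, t)\,dt = \int_a^b \Big[ \frac{f(x + se_i, t) - f(x, t)}{s} - \frac{\partial f}{\partial x_i}(x, t) \Big] dt.$$

By the Mean Value Theorem, for each $t$ we have $r_0 \in (0, 1)$ such that

$$\frac{f(x + se_i, t) - f(x, t)}{s} = \frac{\partial f}{\partial x_i}(x + r_0 se_i, t).$$

Therefore

$$\frac{\phi(x + se_i) - \phi(x)}{s} - \int_a^b \frac{\partial f}{\partial x_i}(x, t)\,dt = \int_a^b \Big[ \frac{\partial f}{\partial x_i}(x + r_0 se_i, t) - \frac{\partial f}{\partial x_i}(x, t) \Big] dt. \tag{12}$$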
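We have to control the integral term on the right-hand side of Eq. (12). To do so, we use the compactness of $[a, b]$ and the continuity of the partial derivatives, so the functions $\frac{\partial f}{\partial x_i}$, $1 \le i \le n$, are uniformly continuous with respect to the variable $t$ (Appendix A, Theorem 4). Taking $\epsilon > 0$, we get $\delta > 0$, not depending on $t$, such that for $|s| < \delta$

$$\Big| \frac{\partial f}{\partial x_i}(x + r_0 se_i, t) - \frac{\partial f}{\partial x_i}(x, t) \Big| < \epsilon.$$

Therefore, taking the limit $s \to 0$ in the identity (12), we get the identity (11).

Proof (Schwarz's Theorem) By the Fundamental Theorem of Calculus,

$$f(x, y) = f(x, b) + \int_b^y \frac{\partial f}{\partial y}(x, t)\,dt.$$

Differentiating the equation above with respect to $x$ and applying Lemma 1, we obtain

$$\frac{\partial f}{\partial x}(x, y) = \frac{\partial f}{\partial x}(x, b) + \frac{\partial}{\partial x}\int_b^y \frac{\partial f}{\partial y}(x, t)\,dt = \frac{\partial f}{\partial x}(x, b) + \int_b^y \frac{\partial}{\partial x}\Big(\frac{\partial f}{\partial y}\Big)(x, t)\,dt.$$

Differentiating the above equation with respect to $y$, it follows that

$$\frac{\partial}{\partial y}\Big(\frac{\partial f}{\partial x}\Big) = \frac{\partial}{\partial x}\Big(\frac{\partial f}{\partial y}\Big).$$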
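As the following function shows, the conclusion of Schwarz's theorem is false in general when the hypothesis $f \in C^2(U)$ is dropped:

$$f(x, y) = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases} \tag{13}$$

In this case, the partial derivatives are $\frac{\partial f}{\partial x}(0, y) = -y$ and $\frac{\partial f}{\partial y}(x, 0) = x$, while $\frac{\partial^2 f}{\partial y \partial x}(0, 0) = -1$ and $\frac{\partial^2 f}{\partial x \partial y}(0, 0) = 1$.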
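The two mixed partial derivatives of the function (13) at the origin can be estimated with nested central differences. This is a rough numerical sketch; the step sizes `h` and `k` and the helper names are assumptions chosen for the illustration.

```python
def f(x, y):
    # The function (13), set to 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def fx(x, y, h=1e-6):
    # Central-difference estimate of df/dx.
    return (f(x + h, y) - f(x - h, y)) / (2.0 * h)

def fy(x, y, h=1e-6):
    # Central-difference estimate of df/dy.
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)

k = 1e-3  # outer step, much larger than the inner step h
d_yx = (fx(0.0, k) - fx(0.0, -k)) / (2.0 * k)  # d/dy (df/dx) at (0, 0)
d_xy = (fy(k, 0.0) - fy(-k, 0.0)) / (2.0 * k)  # d/dx (df/dy) at (0, 0)

print(d_yx)  # close to -1
print(d_xy)  # close to +1
```

The inner step is taken much smaller than the outer one so that the first-order estimates along the axes are accurate before the second difference is formed.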
Exercises

(1) State and prove Schwarz's theorem for the general case: if $f \in C^k(U)$ and $\tau : \{1, \dots, k\} \to \{1, \dots, k\}$ is a bijection, then we have

$$\frac{\partial^k f}{\partial x_{i_{\tau(1)}} \dots \partial x_{i_{\tau(k)}}} = \frac{\partial^k f}{\partial x_{i_1} \dots \partial x_{i_k}}.$$

(2) Prove that the function defined in (13) is not $C^2$.
1.5 Higher Order Differentials

The 2nd-order differential of $f$ is

$$d^2 f = d\Big( \sum_{j=1}^n \frac{\partial f}{\partial x_j}\,dx_j \Big) = \sum_{j=1}^n d\Big(\frac{\partial f}{\partial x_j}\Big) dx_j = \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}\,dx_i\,dx_j.$$

Inductively, we define the differential of order $k$ by $d^k f = d(d^{k-1} f)$. With $I_k = (i_1, \dots, i_k) \in \{1, \dots, n\}^k$, we have

$$d^k f = \sum_{I_k} \frac{\partial^k f}{\partial x_{i_1} \dots \partial x_{i_k}}\,dx_{i_1} \dots dx_{i_k}. \tag{14}$$

Of course, $dx_i\,dx_j = dx_j\,dx_i$ for all pairs $i, j$. For $p \in U$, the differential $d^2 f$ induces a bilinear form $d^2 f_p : T_pU \times T_pU \to \mathbb{R}$; for any vectors $u = \sum_i u_i e_i$ and $v = \sum_j v_j e_j \in T_pU$,

$$(d^2 f_p).(u, v) = \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}\,dx_i\,dx_j(u, v) = \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}\,u_i v_j = u.H(p).v^t.$$

The 2nd-order differential of $f$ defines the symmetric matrix $H(p) = \Big( \frac{\partial^2 f}{\partial x_i \partial x_j}(p) \Big) \in S_n$, called the Hessian matrix of $f$ at $p$. The condition $f \in C^2(U)$ is equivalent to the continuity of the map $d^2 f : U \to S_n$, $p \mapsto H(p)$.
2 Taylor's Formula

The 1st-order differential $df_p$ gives the linear approximation of a function $f \in C^1(U)$ in a small neighborhood of $p \in U$: for $v = (x_1, \dots, x_n)$,

$$f(p + v) \sim f(p) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(p)\,x_i.$$

According to the definition, $f \in C^1$ can be written as $f(p + v) = f(p) + df_p.v + r(v)$, such that $\lim_{v\to 0} \frac{r(v)}{|v|} = 0$. Next, a quadratic approximation of $f(x)$ is described in terms of the 2nd partial derivatives.
Theorem 4 (Taylor's Formula). Let $f \in C^2(U)$, $p \in U$ and $B_\delta$ be a ball centered at the origin with radius $\delta > 0$ such that $p + v \in U$ for all $v \in B_\delta$. Then there is a function $r : B_\delta \to \mathbb{R}$ such that

$$f(p + v) - f(p) = df_p.v + \frac{1}{2}(d^2 f_p).(v, v) + r(v), \tag{15}$$

and $\lim_{v\to 0} \frac{r(v)}{|v|^2} = 0$.
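The decay of the remainder in Taylor's formula (15) can be observed numerically. The sketch below assumes a hypothetical function $f(x, y) = e^x \sin(y)$, a base point `p` and a fixed direction, all chosen only for illustration, and evaluates the quotient $r(v)/|v|^2$ for shrinking $v$.

```python
import math

def f(x, y):
    # Hypothetical sample function for the illustration.
    return math.exp(x) * math.sin(y)

def gradient(x, y):
    return (math.exp(x) * math.sin(y), math.exp(x) * math.cos(y))

def hessian(x, y):
    s, c = math.exp(x) * math.sin(y), math.exp(x) * math.cos(y)
    return ((s, c), (c, -s))

def remainder_ratio(p, v):
    # r(v) / |v|^2 with r(v) = f(p+v) - f(p) - df_p.v - 0.5 * d2f_p(v, v).
    g = gradient(*p)
    H = hessian(*p)
    linear = g[0] * v[0] + g[1] * v[1]
    quadratic = sum(H[i][j] * v[i] * v[j] for i in range(2) for j in range(2))
    r = f(p[0] + v[0], p[1] + v[1]) - f(*p) - linear - 0.5 * quadratic
    return r / (v[0] ** 2 + v[1] ** 2)

p = (0.3, 0.7)
ratios = [remainder_ratio(p, (s, -2.0 * s)) for s in (1e-1, 1e-2, 1e-3)]
print(ratios)  # the quotient shrinks roughly in proportion to |v|
```

For a function with continuous third derivatives, as in this example, the quotient is of order $|v|$, which is a stronger statement than the limit required in Theorem 4.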
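Proof Let $p \in U$ and define

$$r(v) = f(p + v) - f(p) - df_p.v - \frac{1}{2} d^2 f_p(v, v).$$

The function $r = r(v)$ is of class $C^2$. Consider $v = (x_1, \dots, x_n)$, so $r(0) = 0$ and

$$\frac{\partial r}{\partial x_i}(v) = \frac{\partial f}{\partial x_i}(p + v) - \frac{\partial f}{\partial x_i}(p) - \sum_{j=1}^n \frac{\partial^2 f}{\partial x_j \partial x_i}(p)\,x_j \ \Rightarrow\ \frac{\partial r}{\partial x_i}(0) = 0.$$

Differentiating the equation above once more, we get

$$\frac{\partial^2 r}{\partial x_j \partial x_i}(v) = \frac{\partial^2 f}{\partial x_j \partial x_i}(p + v) - \frac{\partial^2 f}{\partial x_j \partial x_i}(p) \ \Rightarrow\ \frac{\partial^2 r}{\partial x_j \partial x_i}(0) = 0. \tag{16}$$

We check the condition $\lim_{v\to 0} \frac{r(v)}{|v|^2} = 0$. Since $r \in C^2(B_\delta)$, we have $R \in C^0(B_\delta)$ such that

$$\frac{\partial r}{\partial x_i}(v) - \frac{\partial r}{\partial x_i}(0) = d\Big(\frac{\partial r}{\partial x_i}\Big)_0.v + R(v), \quad \lim_{v\to 0} \frac{R(v)}{|v|} = 0.$$

Therefore $r(0) = \frac{\partial r}{\partial x_i}(0) = \frac{\partial^2 r}{\partial x_j \partial x_i}(0) = 0$ implies $\frac{\partial r}{\partial x_i}(v) = R(v)$ and

$$\lim_{v\to 0} \frac{\frac{\partial r}{\partial x_i}(v)}{|v|} = \lim_{v\to 0} \frac{R(v)}{|v|} = 0.$$

For $\delta$ sufficiently small, $r$ can therefore be controlled through its first derivatives. Consider the curve $\gamma(t) = tv$, satisfying $\gamma(0) = 0$ and $\gamma(1) = v$, and the function $h(t) = r(\gamma(t)) = r(tv)$. By the Mean Value Theorem there is $c \in (0, 1)$ such that $h(1) - h(0) = h'(c)$, that is, $r(v) = \sum_{i=1}^n \frac{\partial r}{\partial x_i}(cv)\,x_i$. Therefore

$$\frac{r(v)}{|v|^2} = \sum_{i=1}^n \frac{\frac{\partial r}{\partial x_i}(cv)}{|cv|} . \frac{c\,x_i}{|v|}.$$

Since $\lim_{v\to 0} \frac{\frac{\partial r}{\partial x_i}(cv)}{|cv|} = 0$ and $0 \le \big| c.\frac{x_i}{|v|} \big| < 1$, we get $\lim_{v\to 0} \frac{r(v)}{|v|^2} = 0$.

Exercises

(1) Let $B_\delta$ be the ball of radius $\delta > 0$ centered at $p$. Let $f : B_\delta \to \mathbb{R}$ be a $C^{k+1}$-function and, for $x \in B_\delta$, let $F$ be given by $F(t) = f(p + t(x - p))$. Apply Taylor's formula for a one-variable function to

$$F(t) = \sum_{i=0}^k \frac{F^{(i)}(0)}{i!}\,t^i + r_k(t), \quad r_k(t) = \frac{1}{(k+1)!} F^{(k+1)}(ct)\,t^{k+1}, \quad 0 < c < 1,$$

to obtain the Taylor series of $f$.

(2) Generalize the Taylor formula to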
$$f(p + v) - f(p) = \sum_{i=1}^k \frac{1}{i!}(d^i f_p).(v, \overset{i}{\dots}, v) + r(v),$$

such that $\lim_{v\to 0} \frac{r(v)}{|v|^k} = 0$.

(3) Consider $D_i = \frac{\partial}{\partial x_i} : C^k(U) \to C^{k-1}(U)$, $1 \le i \le n$. Define $V$ as the real vector space generated by the formal sums $\sum_{i=1}^n c_i D_i$, such that $c_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$. Define the product operation on $V$ as follows:

$$D_i D_j = \frac{\partial}{\partial x_i}\frac{\partial}{\partial x_j} = \frac{\partial^2}{\partial x_i \partial x_j} = D_{ij}, \quad \dots, \quad D_{i_1 \dots i_k} = \frac{\partial^k}{\partial x_{i_1} \dots \partial x_{i_k}}.$$

Prove the identity below for functions in $C^k(U)$, where the sum runs over all $(i_1, \dots, i_n)$ with $i_j \ge 0$ and $i_1 + \dots + i_n = k$:

$$\big( c_1 D_1 + \dots + c_n D_n \big)^k = \sum_{i_1 + \dots + i_n = k} \frac{k!}{i_1!\, i_2! \dots i_n!}\,(c_1 D_1)^{i_1} (c_2 D_2)^{i_2} \dots (c_n D_n)^{i_n}.$$

(a) In item (3), consider $v = (v_1, \dots, v_n) \in \mathbb{R}^n$ and define the product $v.\nabla = (v_1, \dots, v_n).(D_1, \dots, D_n) = v_1 D_1 + \dots + v_n D_n$. Prove that the derivative of the function $g(t) = f(p + tv)$ is

$$\frac{dg(t)}{dt} = \langle \nabla f(p + tv), v \rangle = (v.\nabla).f(p + tv).$$

(b) Consider $r \in \mathbb{N}$ and $f \in C^r(U)$. If $p \in U$ and $v \in T_pU$, prove that

$$\frac{d^r}{dt^r}\big( f(p + tv) \big) = (v.\nabla)^r f(p + tv).$$

(c) Find an estimate for the term $R(v) = \frac{1}{k!}(v.\nabla)^k f(p + \alpha v)$.
3 Critical Points and Local Extremes

Let $U \subset \mathbb{R}^n$ be an open set with coordinates $(x_1, \dots, x_n)$ and $f \in C^k(U)$, $1 \le k \le \infty$. Whenever the differential $df_p$ is non-null at a point $p \in U$, the local properties of $f$ near $p$ can be extracted from its linear term in Taylor's formula. However, if $df_p = 0$, then the local properties depend upon the Hessian matrix $H(p)$.

Definition 6 The point $p \in U$ is a critical point of $f$ if the linear functional $df_p : T_pU \to \mathbb{R}$ is not surjective. Let $\mathrm{Cr}(f)$ denote the set of critical points of $f$.

It is straightforward from the definition that if $p \in \mathrm{Cr}(f)$, then $df_p = 0$ and, consequently, $\nabla f(p) = 0$. Depending on the context, $p$ is also called a singular point. Taking $p \in \mathrm{Cr}(f)$, the 2nd-order Taylor formula of $f$ centered at $p$ is

$$f(p + v) - f(p) = \frac{1}{2}\langle v, H(p).v \rangle + r(v).$$

Taking the canonical basis of $\mathbb{R}^n$, the operator $H(p) : T_pU \to T_pU$ is represented by the Hessian matrix $H(p) = \Big( \frac{\partial^2 f}{\partial x_i \partial x_j}(p) \Big)$ of $f$ at $p$.
Definition 7 Let p ∈ U , and for any small δ > 0, consider the ball Bδ ( p) ⊂ U : (i) p is a local maximum of f if there is δ > 0 such that f (x) ≤ f ( p) for all x ∈ Bδ ( p).
Fig. 3 $f(x, y) = x^2 + y^2$
(ii) $p$ is a local minimum of $f$ if there is $\delta > 0$ such that $f(x) \ge f(p)$ for all $x \in B_\delta(p)$.

We consider the case in which $p$ is a local maximum, so $f(p + t\hat{v}) - f(p) \le 0$ for all $|t| < \delta$ and every unit vector $\hat{v} \in T_pU$. By the Mean Value Theorem, we have $c$ between $0$ and $t$ such that

$$f(p + t\hat{v}) - f(p) = \frac{\partial f}{\partial \hat{v}}(p + c\hat{v}).t \le 0.$$

Therefore we consider the following cases: (i) if $t > 0$, then $\frac{\partial f}{\partial \hat{v}}(p + c\hat{v}) \le 0$, and (ii) if $t < 0$, then $\frac{\partial f}{\partial \hat{v}}(p + c\hat{v}) \ge 0$. So taking the limit $t \to 0$ yields $\frac{\partial f}{\partial \hat{v}}(p) = 0$ for any $\hat{v}$. Hence $\nabla f(p) = 0$ and $df_p = 0$, so $p$ is a critical point. The same is true if we assume that $p$ is a local minimum. However, $p$ being a critical point does not mean it is a local maximum or a local minimum, as shown in the next examples.

Example 1 From the classification¹ of quadratic forms in $\mathbb{R}^n$, the examples below are canonical models of critical points for a function $f : \mathbb{R}^2 \to \mathbb{R}$ in which the Hessian is a non-singular matrix. Each case is illustrated in Figs. 3, 4 and 5.

(1) $f(x, y) = x^2 + y^2$. Since $\nabla f(x, y) = 2(x, y)$, $p = (0, 0)$ is the only critical point of $f$. Clearly, $p = 0$ is a local minimum and the Hessian $H(0)$ is

$$H(0, 0) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.$$
¹ Theorem of Sylvester, Appendix A.
Fig. 4 $f(x, y) = -x^2 - y^2$

Fig. 5 $f(x, y) = x^2 - y^2$
(2) $f(x, y) = -x^2 - y^2$. In this case, $\nabla f(x, y) = -2(x, y)$ and $p = 0$ is the only critical point and a local maximum. So,

$$H(0, 0) = \begin{pmatrix} -2 & 0 \\ 0 & -2 \end{pmatrix}.$$

(3) $f(x, y) = x^2 - y^2$. Analogously, the identity $\nabla f(x, y) = 2(x, -y)$ yields that $p = 0$ is a critical point; however, it is neither a local maximum nor a local minimum of $f$. In this case, $p = 0$ is called a saddle point of $f$. The Hessian matrix at $p = 0$ is

$$H(0, 0) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}.$$

Since the Hessian matrix $H(p)$ is symmetric, it is diagonalizable, and all eigenvalues are real numbers; that is, there is an orthonormal basis of $T_pU$ in which the elements are eigenvectors of $H(p)$. The spectrum of $H(p)$ is the set $\sigma(H(p)) = \{\lambda \in \mathbb{R} \mid \exists\, v \neq 0 \in \mathbb{R}^n,\ H(p).v = \lambda v\}$ of the eigenvalues of $H(p)$.
Definition 8 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a symmetric operator ($\sigma(T) \subset \mathbb{R}$);
(1) $T$ is non-negative if $\sigma(T) \subset [0, \infty)$, and is positive if $\sigma(T) \subset (0, \infty)$.
(2) $T$ is non-positive if $\sigma(T) \subset (-\infty, 0]$, and is negative if $\sigma(T) \subset (-\infty, 0)$.

Theorem 5 Let $p \in U$ be a critical point of $f \in C^2(U)$;
(i) if $H(p)$ is positive, then $p$ is a local minimum of $f$.
(ii) if $H(p)$ is negative, then $p$ is a local maximum of $f$.

Proof Let $\beta = \{e_1, \dots, e_n\}$ be an orthonormal basis of eigenvectors of $H = H(p)$, so that $H(e_i) = \lambda_i e_i$. The function $g : S^{n-1} \to \mathbb{R}$, $g(\hat{v}) = \langle \hat{v}, H(\hat{v}) \rangle$, attains its maximum at some $\hat{v}_M$ and its minimum at some $\hat{v}_m$. Define the extreme values of the spectrum as $\lambda_m = \min \sigma(H)$ and $\lambda_M = \max \sigma(H)$. For any unit vector $\hat{v} = \sum_i \hat{v}_i e_i$, the function $g$ takes the value $g(\hat{v}) = \sum_i \lambda_i \hat{v}_i^2$, so

$$g(\hat{v}_m) = \lambda_m \le g(\hat{v}) \le \lambda_M = g(\hat{v}_M).$$

To prove item (i), we note that from Taylor's formula we have a function $\rho : B_\delta \to \mathbb{R}$ such that $\lim_{v\to 0} \rho(v) = 0$ and, writing $v = |v|\hat{v}$,

$$f(p + v) - f(p) = \frac{1}{2}\langle v, H(p)v \rangle + \rho(v)|v|^2 = \frac{1}{2}|v|^2 \sum_{i=1}^n \lambda_i \hat{v}_i^2 + |v|^2 \rho(v) \ge |v|^2 \Big[ \frac{\lambda_m}{2} + \rho(v) \Big].$$
Since $\lambda_m > 0$, by taking $\delta$ small enough in the inequality above, we get $f(p + v) \ge f(p)$ for all $v \in B_\delta$. Item (ii) follows by the same argument.

The theorem does not apply if $H(p)$ is an indefinite quadratic form; in that case $p$ is neither a local maximum nor a local minimum of $f$. If the spectrum of $H(p)$ has negative and positive eigenvalues, and $0 \notin \sigma(H(p))$, we say that $p$ is a saddle point of $f$. When $0 \in \sigma(H(p))$, the classification of quadratic forms is rather extended; moreover, the critical point may no longer be isolated, as shown in the following examples:

(1) $f(x, y) = x^2$:

$$\nabla f(p) = 0 \Rightarrow p = (0, y), \quad H(x, y) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \Rightarrow H(0) \text{ is degenerate}, \quad p \text{ is a minimum}.$$

(2) $f(x, y) = -x^2$:

$$\nabla f(p) = 0 \Rightarrow p = (0, y), \quad H(x, y) = \begin{pmatrix} -2 & 0 \\ 0 & 0 \end{pmatrix} \Rightarrow H(0) \text{ is degenerate}, \quad p \text{ is a maximum}.$$

(3) $f(x, y, z) = x^2 - y^2$:

$$\nabla f(p) = 0 \Rightarrow p = (0, 0, z), \quad H(x, y, z) = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 0 \end{pmatrix} \Rightarrow H(0) \text{ is indefinite and degenerate}, \quad p \text{ is a saddle}.$$
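The criterion of Theorem 5, together with the saddle and degenerate cases discussed above, can be sketched for functions of two variables, where the eigenvalues of a symmetric $2 \times 2$ matrix have a closed form. The helper names and the tolerance `tol` are assumptions of this illustration; the matrices are the Hessians of the examples in the text.

```python
import math

def eigenvalues_2x2(a, b, d):
    # Eigenvalues of the symmetric matrix [[a, b], [b, d]].
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def classify(hessian, tol=1e-12):
    # Classifies a critical point from the spectrum of its Hessian.
    (a, b), (_, d) = hessian
    lam_min, lam_max = eigenvalues_2x2(a, b, d)
    if lam_min > tol:
        return "local minimum"
    if lam_max < -tol:
        return "local maximum"
    if lam_min < -tol and lam_max > tol:
        return "saddle point"
    return "degenerate (test inconclusive)"

examples = {
    "x^2 + y^2": ((2.0, 0.0), (0.0, 2.0)),
    "-x^2 - y^2": ((-2.0, 0.0), (0.0, -2.0)),
    "x^2 - y^2": ((2.0, 0.0), (0.0, -2.0)),
    "x^2": ((2.0, 0.0), (0.0, 0.0)),
}
results = {name: classify(H) for name, H in examples.items()}
for name, verdict in results.items():
    print(name, "->", verdict)
```

For $f(x, y) = x^2$ the sketch reports the degenerate case: the spectrum contains $0$, so the second-order test alone does not decide, even though the points $(0, y)$ are minima.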
Exercises

(1) Consider the function $f(x, y) = x^3 - 3xy^2$ (monkey's saddle). Is $p = (0, 0)$ a local minimum, a local maximum or a saddle point of $f$?

(2) For $f : \mathbb{R}^2 \to \mathbb{R}$ defined below, find the critical points and classify them.
(a) $f(x, y) = x^2(x - 1)^2 + y^2$.
(b) $f(x, y) = (x^2 + y - 11)^2 + (x + y - 7)^2$.
(c) $f(x, y) = x^3 - 3xy^2$.

(3) Let $f \in C^2(\mathbb{R}^2)$ and $p \in \mathrm{Cr}(f)$. Show that:
(a) $p$ is a local minimum of $f$ if $\det(H(p)) > 0$ and $\frac{\partial^2 f}{\partial x^2}(p) > 0$.
(b) $p$ is a local maximum of $f$ if $\det(H(p)) > 0$ and $\frac{\partial^2 f}{\partial x^2}(p) < 0$.
(c) $p$ is a saddle point if $\det(H(p)) < 0$.
(d) if $\det(H(p)) = 0$, then it is not possible to decide about its nature.
(e) Suppose $p$ is a strict minimum, that is, if $f(p) < f(x)$ for all $x \in B_\delta$, then $\lambda_m > 0$.
(f) Give a sufficient condition in order for $p$ to be a strict maximum ($f(p) > f(x)$, for all $x \in U$).
(g) Give an example such that $f(p) \le f(x)$, $\forall x \in B_\delta$, and it is false that $\lambda_m > 0$.
3.1 Morse Functions

As before, let $U \subset \mathbb{R}^n$ be an open set and $f \in C^2(U)$. The criteria in the previous section are accurate when the Hessian matrix $H(p)$ is non-degenerate, which by definition means that $\mathrm{Ker}\,H(p) = \{0\}$.

Definition 9 $f \in C^2(U)$ is a Morse function if the Hessian $H(p)$ is non-degenerate for all critical points $p \in \mathrm{Cr}(f)$.

A function $f$ is a Morse function if $0 \notin \sigma(H(p))$ for every $p \in \mathrm{Cr}(f)$. In finite dimension, this is equivalent to the condition $\det(H(p)) \neq 0$. Defining the sets $\sigma^+(H(p)) = \{\lambda \in \sigma(H(p)) \mid \lambda > 0\}$ and $\sigma^-(H(p)) = \{\lambda \in \sigma(H(p)) \mid \lambda < 0\}$, we get $\sigma(H(p)) = \sigma^+(H(p)) \cup \sigma^-(H(p))$. Let $V_\lambda$ be the eigenspace associated to the eigenvalue $\lambda \in \sigma(H(p))$, and define the subspaces

$$V^+(p) = \bigoplus_{\lambda > 0} V_\lambda, \quad V^-(p) = \bigoplus_{\lambda < 0} V_\lambda.$$

Applications of the Morse lemma are beyond the scope of this text. For the proof we recommend Lima [30]. The condition imposed on a Morse function is not as restrictive as it seems because Sard's Theorem, as in Guillemin-Pollack [21], implies that the set of Morse functions is dense in $C^2(U)$. This result can be understood by observing that a slight perturbation in the function can turn a Hessian's null eigenvalue into a non-zero eigenvalue.
Exercises

(1) Let $A \in M_n(\mathbb{R})$ be a self-adjoint matrix and let $f_A : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$ be the function

$$f_A(x) = \frac{\langle x, Ax \rangle}{|x|^2}.$$

Therefore $f_A$ induces a function $\hat{f}_A : S^{n-1} \to \mathbb{R}$.
(a) Find the critical points of $\hat{f}_A$.
(b) Find sufficient conditions to guarantee that $\hat{f}_A$ is a Morse function.
(c) Assume that $\hat{f}_A$ is a Morse function and find the Morse index of each critical point.

(2) The real projective plane is the quotient space $\mathbb{RP}^2 = (\mathbb{R}^3 \setminus \{0\})/\sim$. The equivalence relation is defined as follows: $(x', y', z') \sim (x, y, z)$ if $(x', y', z') = t.(x, y, z)$ for some $t \in \mathbb{R} \setminus \{0\}$. Indeed, $\mathbb{RP}^2$ is the space of lines in $\mathbb{R}^3$ passing through the origin.
(a) Define the quotient topology in $\mathbb{RP}^2$.
(b) Prove that $\mathbb{RP}^2 = (S^2/\sim)$, given that $(x', y', z') \sim (x, y, z)$ if $(x', y', z') = (x, y, z)$ or $(x', y', z') = -(x, y, z)$. So, $\mathbb{RP}^2 = S^2/\mathbb{Z}_2$.
(c) Let $A \in S_3(\mathbb{R})$ be a symmetric matrix. Prove that $\hat{f}_A$ induces a differentiable function $\tilde{f}_A : \mathbb{RP}^2 \to \mathbb{R}$. Use the results obtained for the function $f_A$ to find the Morse indices of the critical points of $\hat{f}_A$.
4 The Implicit Function Theorem and Applications

At the origin of many mathematical ideas is the need to solve equations. Given an equation, the set of solutions can be empty, finite, or infinite. First, let's look at the example of an equation with one variable. In order to solve the equation $e^{-x^2} - \ln(x) = 0$, $x > 0$, the best strategy is to consider the function $f(x) = e^{-x^2} - \ln(x)$ and to study its properties. The graph of $f$ is illustrated in Fig. 6. Since $f$ is continuous, $f(1) = e^{-1} > 0$ and $f(2) < 0$, there is a point $a \in (1, 2)$ such that $f(a) = 0$. In addition, the derivative of the function satisfies $f'(x) < 0$ for all $x \in (0, \infty)$; consequently there is a unique solution to the equation.

The example above shows how studying functions can be useful in establishing the existence of solutions for an equation. The solution set of an equation of one variable in which the associated function $f(x)$ is differentiable is generically discrete, because it is geometrically described as the intersection of the graph of $f(x)$ with the $x$-axis. In most cases a solution to an equation is not explicitly obtained through analytical methods, but once the existence of a solution has been demonstrated, numerical methods are employed to obtain an approximation as close as possible, which demands computational methods.

In the case of an equation with two variables, we must understand what it means to find a solution. To find the solution set $S$ for the equation $x + y = 1$ is equivalent to finding the zero set $f^{-1}(0)$ of the function $f(x, y) = x + y - 1$. In this case, by introducing the function $\phi(x) = 1 - x$, we get $S = f^{-1}(0) = \{(x, y) \in \mathbb{R}^2 \mid x \in \mathbb{R},\ y = \phi(x)\}$. The case of three variables is illustrated by the equation $x^2 + y^2 + z = 1$. Letting $\phi(x, y) = 1 - x^2 - y^2$, the solution set can be defined as $S = \{(x, y, z) \in \mathbb{R}^3 \mid z = \phi(x, y)\}$.
Fig. 6 $f(x) = e^{-x^2} - \ln(x)$
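Once existence and uniqueness of the root of $f(x) = e^{-x^2} - \ln(x)$ in $(1, 2)$ are known, a numerical method can approximate it. The sketch below uses bisection, which is one possible choice and is not prescribed by the text; the function name `bisect` and the tolerance are assumptions of the illustration.

```python
import math

def f(x):
    return math.exp(-x * x) - math.log(x)

def bisect(func, lo, hi, tol=1e-12):
    # Bisection on [lo, hi]; requires a sign change of func on the interval.
    f_lo = func(lo)
    if f_lo * func(hi) > 0.0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid              # the root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # the root lies in [mid, hi]
    return 0.5 * (lo + hi)

a = bisect(f, 1.0, 2.0)
print(a, f(a))  # f(a) is close to zero
```

Bisection only uses the sign change $f(1) > 0 > f(2)$ and the continuity of $f$; the monotonicity $f'(x) < 0$ is what guarantees that the point found is the only solution.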
The last examples allow us to express one of the variables as a function of the others. This procedure is not always possible explicitly, e.g., in the equation $e^{-(x+y)^2} - \ln(x + y) = 0$ neither can $y$ be written explicitly as a function of $x$ nor can $x$ be written explicitly as a function of $y$. Solution sets can be complex; examples with two variables define a curve and with three variables define a surface. However, once the case with one equation is well understood, the case with $m$ equations and $n$ variables can be dealt with. Before going forward, a short step back to get an understanding of linear systems is necessary. Given the linear system

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1 \\
&\ \,\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n &= b_m,
\end{aligned} \tag{17}$$

assume the vectors $n_i = (a_{i1}, \dots, a_{in})$, $1 \le i \le m$, are linearly independent; then the solution set is an affine space with dimension $(n - m)$. The linear case is the simplest case to understand. Each equation allows us to write one variable as a function of the others and replace it in the next equation. After performing this algorithm in every equation, the solution set is described in terms of $(n - m)$ free variables. Geometrically, this system corresponds to a set of $m$ distinct and non-parallel hyperplanes of $\mathbb{R}^n$, and the solution set is the intersection of the hyperplanes. If the intersection is not empty, then generically it defines an $(n - m)$-dimensional affine space. Although the systems of nonlinear equations are far more complex, the linear methods are useful to give us some insight or even to find a solution in many examples. We go through more examples of equations defined by functions $f \in C^\infty(U)$, with $U \subset \mathbb{R}^2$ an open subset.
Example 2

(1) $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = x^2 + y^2 - 1$. The set $f^{-1}(0)$ is the circle of radius 1. The functions $g_\pm(x) = \pm\sqrt{1 - x^2}$ define the sets $U^\pm = \{(x, g_\pm(x)) \mid x \in (-1, 1)\}$, respectively, such that $f^{-1}(0) = U^+ \cup U^- \cup \{(-1, 0), (1, 0)\}$. Restricted to each set $U^\pm$, the solution set $f^{-1}(0)$ is the graph of $g_\pm$; however, it is not possible to extend the functions $g_\pm$ to a larger open interval $(-1 - \epsilon, 1 + \epsilon)$. The reason is that in a neighborhood of the points $(-1, 0)$ and $(1, 0)$, the circle cannot be described as the graph of a function depending on $x$. However, in a neighborhood of each point $Q_\pm = (\pm 1, 0)$, we can express $x = h_\pm(y) = \pm\sqrt{1 - y^2}$, that is, the circle is locally the set of points $(h_\pm(y), y)$. This example is the most elementary; it contains the basic elements to understand the sufficient conditions needed to solve an equation.

(2) $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = x^2 - xy + y^2 - 1$. In this case, neither $x$ nor $y$ can be expressed globally as a function of the other by taking $f(x, y) = 0$. The solution set defines the curve shown in Fig. 7. We notice that the curve is not a graph as a whole, though locally, over some intervals, it is a graph.

(3) $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = y^5 + 2y - x^3 + x$. Figure 8 shows that the curve defined by the solution set of the equation $f(x, y) = 0$ is a graph. This is easily checked since, for each fixed $x_0$, the function $y \mapsto f(x_0, y)$ is injective and increasing as a consequence of

$$\frac{\partial f}{\partial y}(x_0, y) = 5y^4 + 2 > 0, \text{ for all } y.$$

Moreover, $\lim_{y\to -\infty} f(x_0, y) = -\infty$ and $\lim_{y\to \infty} f(x_0, y) = \infty$. Hence $y \mapsto f(x_0, y)$ is a strictly increasing continuous function with $\mathbb{R}$ as the image, so for each $x = x_0$ there is a unique $y_0$ such that $f(x_0, y_0) = 0$. However, there is no explicit formula for the function $g$ such that $y = g(x)$.

Next, the Implicit Function Theorem gives a sufficient condition for solving the equation $f(x_1, \dots, x_{n+1}) = c$ when $f \in C^k(U)$, $k \ge 1$. Consider $\mathbb{R}^{n+1} = \mathbb{R}^n \times \mathbb{R} = \{(x, y) \mid x = (x_1, \dots, x_n) \in \mathbb{R}^n,\ y \in \mathbb{R}\}$ and $U \subset \mathbb{R}^{n+1}$ an open set.

Theorem 6 (ImFT). Let $f \in C^k(U)$, $k \ge 1$. Let $(x_0, y_0) \in U$ be such that $f(x_0, y_0) = c$ and assume that $\frac{\partial f}{\partial y}(x_0, y_0) \neq 0$. Then we have a ball $B = B_\delta(x_0) \subset \mathbb{R}^n$ and an interval $J = (y_0 - \epsilon, y_0 + \epsilon)$ with the following properties:
(1) $B \times J \subset U$ and $\frac{\partial f}{\partial y}(x, y) \neq 0$ for all $(x, y) \in B \times J$.
(2) There is a $C^k$-function $\xi : B \to J$ such that $f(x, \xi(x)) = c$ for all $x \in B$.
(3) The partial derivatives of $\xi$ are

$$\frac{\partial \xi}{\partial x_i}(x) = -\frac{\frac{\partial f}{\partial x_i}(x, \xi(x))}{\frac{\partial f}{\partial y}(x, \xi(x))}. \tag{18}$$
Fig. 7 x 2 − x y + y 2 = 1
Fig. 8 y 5 + 2y − x 3 + x = 0
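The statement of the ImFT can be illustrated numerically on Example 2 (3). The sketch below is only an illustration under stated assumptions: the names `xi`, `dxi_formula` and `dxi_numeric` are hypothetical, the root is found by plain bisection (which works here because $y \mapsto f(x, y)$ is increasing), and the bracket $[-10, 10]$ is assumed to contain the root for the sample point used.

```python
def f(x, y):
    """Left-hand side of the equation in Example 2 (3)."""
    return y**5 + 2*y - x**3 + x

def xi(x, lo=-10.0, hi=10.0, tol=1e-13):
    """Solve f(x, y) = 0 for y by bisection; valid because f is increasing in y."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(x, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dxi_formula(x):
    """Derivative of xi predicted by Eq. (18): -(df/dx) / (df/dy)."""
    y = xi(x)
    return -(-3*x**2 + 1) / (5*y**4 + 2)

def dxi_numeric(x, h=1e-5):
    """Central finite difference of xi, used only as an independent check."""
    return (xi(x + h) - xi(x - h)) / (2*h)

x0 = 1.5                      # sample point, chosen arbitrarily
residual = f(x0, xi(x0))      # should be close to 0
gap = abs(dxi_formula(x0) - dxi_numeric(x0))
print(residual, dxi_formula(x0), dxi_numeric(x0))
```

The two derivative values should agree up to the finite-difference error, which is what identity (18) predicts.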
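Proof Assume $\frac{\partial f}{\partial y}(x_0, y_0) > 0$. Since $\frac{\partial f}{\partial y}$ is continuous, there are $\delta > 0$ and $\epsilon > 0$ such that $\frac{\partial f}{\partial y}(x, y) > 0$ for all $(x, y) \in B_\delta(x_0) \times (y_0 - \epsilon, y_0 + \epsilon)$. For each such $x$, the map $y \mapsto f(x, y)$ defines an increasing function on $J = (y_0 - \epsilon, y_0 + \epsilon)$. Since $f(x_0, y_0) = c$, we have $f(x_0, y_0 - \epsilon) < c$ and $f(x_0, y_0 + \epsilon) > c$; by the continuity of $f$, shrinking $\epsilon$ and $\delta$ if necessary, the same inequalities hold with $x_0$ replaced by any $x \in B_\delta(x_0)$. The Intermediate Value Theorem, together with the monotonicity in $y$, implies the existence of a unique $y \in (y_0 - \epsilon, y_0 + \epsilon)$ related to $x$ by $f(x, y) = c$. Now we define $y = \xi(x)$, such that $f(x, \xi(x)) = c$, and so we obtain the function $\xi : B_\delta(x_0) \to (y_0 - \epsilon, y_0 + \epsilon)$, $x \mapsto \xi(x)$. Assume for now that $\xi$ is continuous. Differentiability of $\xi$ is achieved as shown next; let $h : B_\delta(x_0) \to \mathbb{R}$, $h(x) = f(x, \xi(x))$, which is constant and equal to $c$. By the Mean Value Theorem,
$$
\begin{aligned}
\frac{h(x + te_i) - h(x)}{t} &= \frac{f(x + te_i, \xi(x + te_i)) - f(x, \xi(x))}{t} \\
&= \frac{f(x + te_i, \xi(x + te_i)) - f(x, \xi(x + te_i))}{t} + \frac{f(x, \xi(x + te_i)) - f(x, \xi(x))}{t} \\
&= \frac{\partial f}{\partial x_i}(x + ce_i, \xi(x + te_i)) + \frac{\partial f}{\partial y}(x, d)\,\frac{\xi(x + te_i) - \xi(x)}{t},
\end{aligned} \tag{19}
$$
such that $0 < c < t$ and $d$ lies between $\xi(x)$ and $\xi(x + te_i)$. The left-hand side vanishes because $h$ is constant. Since $\xi$ is continuous, taking the limit $t \to 0$ in Eq. (19), we get
$$0 = \frac{\partial f}{\partial x_i}(x, \xi(x)) + \frac{\partial f}{\partial y}(x, \xi(x)) \lim_{t \to 0} \frac{\xi(x + te_i) - \xi(x)}{t}, \quad \text{that is,} \quad \frac{\partial \xi}{\partial x_i}(x) = \lim_{t \to 0} \frac{\xi(x + te_i) - \xi(x)}{t} = -\frac{\frac{\partial f}{\partial x_i}(x, \xi(x))}{\frac{\partial f}{\partial y}(x, \xi(x))}.$$
Therefore the identity (18) is verified. The continuity of $\xi$ is a consequence of the following argument: given a compact set $K \subset J$, let $\{x_k\}_{k \in \mathbb{N}} \subset B$ be a sequence converging to $\bar{x} \in B$, such that $\xi(x_k) \in K$ for all $k \in \mathbb{N}$. If the sequence $\{y_k = \xi(x_k)\}_{k \in \mathbb{N}}$, or any subsequence of it, converges to $\alpha \in K$, then $f(\bar{x}, \alpha) = \lim_k f(x_k, y_k) = c = f(\bar{x}, \xi(\bar{x}))$. Since $y \mapsto f(\bar{x}, y)$ is injective on $J$, it follows that $\xi(\bar{x}) = \alpha$, i.e., $\xi(\bar{x}) = \lim y_k$. Hence $\xi$ is continuous.

In the above statement, we highlighted the importance of the hypothesis $\frac{\partial f}{\partial y}(x_0, y_0) \neq 0$ for the existence of a differentiable bijection between $B_\delta(x_0)$ and the image of the map $x \mapsto (x, \xi(x))$, in which case the inverse is obviously differentiable. As a by-product, the set $f^{-1}(c)$ is locally a graph. This is an important remark when generalizing the Implicit Function Theorem to differentiable maps (Fig. 9).

Fig. 9 $V = B \times J$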
Definition 11 Let $U \subset \mathbb{R}^{n+1}$ be an open subset and $f \in C^k(U)$, $k \geq 1$.
(i) $c \in \mathbb{R}$ is a regular value of $f$ if there are no critical points in $f^{-1}(c)$; otherwise $c$ is a critical value of $f$.
(ii) A subset $M \subset \mathbb{R}^{n+1}$ is a hypersurface of class $C^k$ if it is locally the graph of a function, i.e., for all $p \in M$, there are an open neighborhood $V(p) \subset \mathbb{R}^{n+1}$ of $p$, an open set $U \subset \mathbb{R}^n$, and a $C^k$-function $\xi : U \to \mathbb{R}$, such that $V(p) \cap M = \{(x, \xi(x)) \mid x \in U\}$.

If $c$ is a regular value of $f$, the ImFT implies that $M(c) = f^{-1}(c)$ is a hypersurface. The function $f : U \to \mathbb{R}$ is constant when restricted to $M(c)$. Hence, given a smooth curve $\gamma : (-\epsilon, \epsilon) \to M(c)$, the function $h = f \circ \gamma$ is constant and we have $h'(t) = df_{\gamma(t)}.\gamma'(t) = \langle \nabla f(\gamma(t)), \gamma'(t) \rangle = 0$. So $\gamma'(t)$ is orthogonal to $\nabla f(\gamma(t))$ for all $t \in (-\epsilon, \epsilon)$. Consequently, the tangent plane to $M(c)$ at $p$ is
$$T_p M(c) = \{v \in \mathbb{R}^{n+1} \mid \exists\, \gamma : (-\epsilon, \epsilon) \to M(c),\ \gamma(0) = p,\ \gamma'(0) = v\} = \mathrm{Ker}(df_p).$$
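Therefore $T_p M(c) = (\nabla f(p))^\perp$ is a subspace of $\mathbb{R}^{n+1}$ of dimension $n$. $M(c)$ is also called an $n$-submanifold of $\mathbb{R}^{n+1}$, or equivalently, a hypersurface of $\mathbb{R}^{n+1}$. Since $M(c)$ is locally defined as the graph of $\xi : U \to \mathbb{R}$, let $V \subset \mathbb{R}^{n+1}$ be an open set such that $M(c) \cap V = \{(x, \xi(x)) \mid x \in U\}$. A curve in $M(c) \cap V$ is described as follows: let $\gamma : (-\epsilon, \epsilon) \to M(c)$, $\gamma(0) = p$ and $\gamma'(0) = v$; then $\gamma(t) = (x_1(t), \dots, x_n(t), \xi(x(t)))$ and
$$
\begin{aligned}
\gamma'(t) &= \big(x_1'(t), \dots, x_n'(t), d\xi_{x(t)}.x'(t)\big) = \Big(x_1'(t), \dots, x_n'(t), \sum_i \frac{\partial \xi}{\partial x_i}(x(t))\, x_i'(t)\Big) \\
&= \sum_{i=1}^n x_i'(t)\Big[e_i + \frac{\partial \xi}{\partial x_i}(x(t))\, e_{n+1}\Big].
\end{aligned}
$$
Defining the linearly independent vectors $\{v_1, \dots, v_n\}$, $v_i = e_i + \frac{\partial \xi}{\partial x_i}(x(0))\, e_{n+1}$, we get at $t = 0$
$$v = \gamma'(0) = \sum_{i=1}^n x_i'(0)\, v_i.$$
Conversely, we claim that every linear combination $v = \sum_i \alpha_i v_i$ is the tangent vector $\gamma'(0)$ of some curve $\gamma : (-\epsilon, \epsilon) \to M(c)$, as shown by the following argument: let $\xi : B \to \mathbb{R}$ be the function given by the ImFT, let $x_0 \in B$ be the point such that $p = (x_0, \xi(x_0))$, and let $a = (\alpha_1, \dots, \alpha_n) \in \mathbb{R}^n$. Choose $\epsilon > 0$ so that $x_0 + ta \in B$ for $|t| < \epsilon$ and set $\gamma(t) = (x_0 + ta, \xi(x_0 + ta))$. Hence $\gamma(0) = (x_0, \xi(x_0)) = p$ and $\gamma'(0) = \sum_i \alpha_i v_i = v$.

The following are necessary conditions for a hypersurface $M \subset \mathbb{R}^{n+1}$ to be a regular level set of a function $f$: (i) there must be a nowhere-vanishing orthogonal field ($\nabla f$) over $M$; (ii) in a neighborhood of any point $(x_0, y_0) \in M$, there are $\delta, \epsilon > 0$ and an open set $V = B_\delta(x_0) \times (y_0 - \epsilon, y_0 + \epsilon) \subset \mathbb{R}^{n+1}$ with the property that $V \cap M = \{(x, \xi(x)) \mid x \in B_\delta(x_0)\}$. This concept of hypersurface can be extended to that of an $n$-submanifold of $\mathbb{R}^{n+k}$, $1 \leq k \leq n$: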
Definition 12 $M^n \subset \mathbb{R}^{n+k}$ is an $n$-submanifold of $\mathbb{R}^{n+k}$ of class $C^l$ if for any point $p \in M^n$, there are open sets $V \subset \mathbb{R}^{n+k}$ ($p \in V$) and $U \subset \mathbb{R}^n$, such that $V \cap M^n$ is the graph of a map $f : U \to \mathbb{R}^k$, $f(x) = (f_1(x), \dots, f_k(x))$, $f_i \in C^l(U)$; i.e.,
$$V \cap M^n = \{(x_1, \dots, x_n, f_1(x_1, \dots, x_n), \dots, f_k(x_1, \dots, x_n)) \mid (x_1, \dots, x_n) \in U\}.$$
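Example 3 We work out some examples:
(1) Let $U \subset \mathbb{R}^n$ be an open set, $f : U \to \mathbb{R}$ and $M = \{(x, f(x)) \in \mathbb{R}^{n+1} \mid x \in U\}$. In this case, $M$ is globally a graph, so it is trivially a hypersurface.
(2) $S^n = \{x \in \mathbb{R}^{n+1};\ |x| = 1\}$. Let $x = (x_1, \dots, x_{n+1})$ be the coordinate system in $\mathbb{R}^{n+1}$ and write $S^n = \cup_i V_i^\pm$, in which $V_i^+ = \{x \in S^n \mid x_i > 0\}$ and $V_i^- = \{x \in S^n \mid x_i < 0\}$, $1 \leq i \leq n + 1$, are the open hemispheres. Let $D_i^n = \{x = (x_1, \dots, \hat{x}_i, \dots, x_{n+1});\ |x| < 1\}$ be the open unit ball contained in $\mathbb{R}^n = \mathbb{R}^{i-1} \times \{0\} \times \mathbb{R}^{n+1-i} \subset \mathbb{R}^{n+1}$, and define the functions $f_i^\pm : D_i^n \to \mathbb{R}$, $f_i^\pm(x) = \pm\sqrt{1 - \sum_{l \neq i} x_l^2}$. So $V_i^\pm = \{(x, f_i^\pm(x)) \mid x \in D_i^n\}$, with the value $f_i^\pm(x)$ placed in the $i$th coordinate. Therefore $S^n$ is a hypersurface in $\mathbb{R}^{n+1}$.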
(3) That $S^n$ is a hypersurface of $\mathbb{R}^{n+1}$ is also an easy consequence of the ImFT. Consider the function $f : \mathbb{R}^{n+1} \to \mathbb{R}$, $f(x_1, \dots, x_{n+1}) = \sum_i x_i^2$. Since any $a \in (0, \infty)$ is a regular value of $f$, it follows that $S^n(a) = f^{-1}(a)$, the sphere of radius $\sqrt{a}$, is a hypersurface. Let $p \in S^n$ and let $\vec{op}$ be the vector defined by $p$. The identity $df_p.v = 2\langle \vec{op}, v \rangle = 0$ yields that the tangent plane at $p \in S^n$ is the subspace orthogonal to the vector $\vec{op}$. Indeed, $T_p S^n = \mathrm{Ker}(df_p)$.
(4) Consider the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = -x^2 - y^2 + z^2 - 1$. The level surface $f^{-1}(0)$ is a non-connected 2-submanifold of $\mathbb{R}^3$.
(5) Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be a self-adjoint operator and let $f_A : \mathbb{R}^n \to \mathbb{R}$ be the quadratic form $f_A(x) = \frac{1}{2}\langle A(x), x \rangle$. To find the gradient $\nabla f_A(x)$ at $x$, let $v \in \mathbb{R}^n$; therefore
$$d(f_A)_x.v = \lim_{t \to 0} \frac{f_A(x + tv) - f_A(x)}{t} = \frac{1}{2}\big[\langle A(x), v \rangle + \langle A(v), x \rangle\big] = \langle A(x), v \rangle.$$
Therefore $\nabla f_A(x) = A(x)$ and $\mathrm{Cr}(f_A) = \mathrm{Ker}(A)$. Since $f_A$ vanishes on $\mathrm{Ker}(A)$, any $c \neq 0$ belonging to the image of $f_A$ is a regular value of $f_A$. Hence $M_A^c = f_A^{-1}(c)$ is a hypersurface. Indeed, $M_A^c$ is a quadric in $\mathbb{R}^n$. The example in the last item is a quadric defined by the matrix
$$A = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
(6) The surface $M$ in Fig. 10 is obtained by rotating the curve $x = z^2 - 4$, contained in the plane $y = 0$, around the $z$-axis; it is parametrized by
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \cos(v) & -\sin(v) & 0 \\ \sin(v) & \cos(v) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} u^2 - 4 \\ 0 \\ u \end{pmatrix}.$$
It is not a hypersurface of $\mathbb{R}^3$ since there are points with no neighborhood in which $M$ is a local graph.
(7) By the same argument as in the last example, Whitney's umbrella, illustrated in Fig. 11 and parametrized by $f(x, y) = (xy, x, y^2)$, is not a 2-submanifold of $\mathbb{R}^3$.
(8) Consider the functions $f(x, y, z) = xy - z$ and $g(x, y, z) = z$ and define the intersection $M = f^{-1}(0) \cap g^{-1}(0)$. The gradients $\nabla f(x, y, z) = (y, x, -1)$ and $\nabla g(x, y, z) = (0, 0, 1)$ are non-zero, so 0 is a regular value of each of the functions. However, the vectors $\nabla f(0, 0, 0)$ and $\nabla g(0, 0, 0)$ are linearly dependent. The set $M = \{(x, y, 0) \in \mathbb{R}^3 \mid xy = 0\}$ is not a 1-submanifold of $\mathbb{R}^3$ since it is not the graph of a function near $(0, 0, 0)$. At $(0, 0, 0)$, $M$ has no tangent plane.
(9) $SL_n(\mathbb{R})$. The set of matrices $M_n(\mathbb{R})$ is a vector space isomorphic to $\mathbb{R}^{n^2}$. The determinant $\det : M_n(\mathbb{R}) \to \mathbb{R}$ is a $C^\infty$ function. Indeed, it is a homogeneous polynomial of degree $n$ in the matrix entries. The subset of special matrices $SL_n(\mathbb{R}) = \{A \in M_n(\mathbb{R}) \mid \det(A) = 1\}$ is a subgroup of the group $GL_n(\mathbb{R})$ of invertible matrices. Considering $x = (x_1, \dots, x_n)$, such that $x_i = (x_{i1}, \dots, x_{in})$ is a column vector, the determinant is an alternating $n$-linear function satisfying the following properties:
(a) $\det(I) = 1$, $I = (e_1, \dots, e_n)$.
(b) (alternating) For all pairs $(i, j)$, $1 \leq i < j \leq n$, $\det(x_1, \dots, x_i, \dots, x_j, \dots, x_n) = -\det(x_1, \dots, x_j, \dots, x_i, \dots, x_n)$.
(c) ($n$-linear) For all $1 \leq i \leq n$ and $a, b \in \mathbb{R}$, $\det(x_1, \dots, ax_i + by_i, \dots, x_n) = a \det(x_1, \dots, x_i, \dots, x_n) + b \det(x_1, \dots, y_i, \dots, x_n)$.

Let's compute the derivative in the case $n = 3$, for an invertible matrix $A$:
$$d(\det)_A.V = \lim_{t \to 0} \frac{\det(A + tV) - \det(A)}{t} = \det(A) \lim_{t \to 0} \frac{\det(I + tA^{-1}V) - 1}{t}.$$
Since $A$ is invertible, we can define $B = A^{-1}V = (b_1, b_2, b_3)$; then
$$
\begin{aligned}
\det(I + tB) &= \det(e_1 + tb_1, e_2 + tb_2, e_3 + tb_3) \\
&= \det(e_1, e_2 + tb_2, e_3 + tb_3) + t \det(b_1, e_2 + tb_2, e_3 + tb_3) \\
&= \det(e_1, e_2, e_3 + tb_3) + t \det(e_1, b_2, e_3 + tb_3) + t \det(b_1, e_2, e_3 + tb_3) + t^2 \det(b_1, b_2, e_3 + tb_3) \\
&= \det(e_1, e_2, e_3) + t[\det(e_1, e_2, b_3) + \det(e_1, b_2, e_3) + \det(b_1, e_2, e_3)] \\
&\quad + t^2[\det(b_1, e_2, b_3) + \det(e_1, b_2, b_3) + \det(b_1, b_2, e_3)] + t^3 \det(b_1, b_2, b_3).
\end{aligned}
$$
Taking the limit $t \to 0$, we get
$$\lim_{t \to 0} \frac{\det(I + tB) - 1}{t} = b_{11} + b_{22} + b_{33},$$
that is,
$$d(\det)_A.V = \det(A)\,\mathrm{Tr}(A^{-1}V).$$
Therefore 1 is a regular value of $\det$, since for every $A$ with $\det(A) = 1$ the linear functional $d(\det)_A$ is non-zero, hence surjective. Consequently, $SL_3(\mathbb{R}) = \det^{-1}(1)$ is a hypersurface of $M_3(\mathbb{R})$. The tangent plane at $A \in SL_3(\mathbb{R})$ is $T_A SL_3(\mathbb{R}) = \{V \in M_3(\mathbb{R}) \mid \mathrm{Tr}(A^{-1}V) = 0\}$. At the point $A = I$, the tangent plane $T_I SL_3(\mathbb{R}) = \{V \in M_3(\mathbb{R}) \mid \mathrm{Tr}(V) = 0\}$ is the subspace $\mathrm{End}_0(\mathbb{R}^3)$ of traceless matrices in $M_3(\mathbb{R})$. The inner product $\langle A, B \rangle = \frac{1}{3}\mathrm{Tr}(AB^t)$ defined on $M_3(\mathbb{R})$ induces the orthogonal decomposition $M_3(\mathbb{R}) = \langle I \rangle \oplus \mathrm{End}_0(\mathbb{R}^3)$.
Fig. 10 Parametrized surface
Fig. 11 Whitney’s umbrella
Fig. 12 Torus T 2
(10) The torus $T^2$ is the surface generated by moving the circumference $(y - a)^2 + z^2 = r^2$, $0 < r < a$, along the circumference $x^2 + y^2 = a^2$, as illustrated in Fig. 12. Now $T^2$ can be defined as the level set $f^{-1}(0)$ of the function $f(x, y, z) = \big(\sqrt{x^2 + y^2} - a\big)^2 + z^2 - r^2$, which is differentiable away from the $z$-axis (and the $z$-axis does not meet $T^2$). Since 0 is a regular value of $f$, $T^2$ is a 2-submanifold of $\mathbb{R}^3$.
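The formula $d(\det)_A.V = \det(A)\,\mathrm{Tr}(A^{-1}V)$ obtained in item (9) can be checked numerically. The following is a minimal sketch, assuming two arbitrarily chosen $3 \times 3$ matrices `A` (invertible) and `V`; all helper names are hypothetical, and the directional derivative is approximated by a central difference rather than computed exactly.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate (assumes det3(m) != 0)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = det3(m)
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[entry / det for entry in row] for row in adj]

def trace_of_product(p, q):
    """Tr(p q) without forming the product matrix."""
    return sum(p[i][k] * q[k][i] for i in range(3) for k in range(3))

def directional_derivative(a, v, t=1e-6):
    """Central-difference approximation of d(det)_a . v."""
    plus = [[a[i][j] + t*v[i][j] for j in range(3)] for i in range(3)]
    minus = [[a[i][j] - t*v[i][j] for j in range(3)] for i in range(3)]
    return (det3(plus) - det3(minus)) / (2*t)

# Sample data, chosen arbitrarily for illustration.
A = [[2.0, 1.0, 0.0], [0.0, 1.0, 3.0], [1.0, 0.0, 1.0]]
V = [[1.0, 2.0, 0.0], [0.0, -1.0, 1.0], [3.0, 0.0, 2.0]]

formula = det3(A) * trace_of_product(inverse3(A), V)
numeric = directional_derivative(A, V)
print(formula, numeric)
```

The two printed numbers should coincide up to the finite-difference error. Taking `V` with $\mathrm{Tr}(A^{-1}V) = 0$ gives a direction tangent to the level set of $\det$ through `A`.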
Exercises
(1) Let $A \in GL_2(\mathbb{R})$ be a positive self-adjoint matrix and $f_A : \mathbb{R}^2 \to \mathbb{R}$, $f_A(x) = \langle A(x), x \rangle$. Find the set of the regular values of $f_A$ and describe the level curves of $f_A$.
(2) Suppose $A \in GL_3(\mathbb{R})$ is a self-adjoint matrix. Find the level surfaces of $f_A : \mathbb{R}^3 \to \mathbb{R}$, $f_A(x) = \langle A(x), x \rangle$.
(3) Find the critical points of the function $f : \mathbb{R}^n \to \mathbb{R}$, $f(x) = |x|^2$.
(4) Prove that the solution set of the non-linear system below is a 1-submanifold of $\mathbb{R}^3$:
$$xy - z = 0, \qquad \ln(xy) + z^2 = 1.$$
(5) Prove that $SL_n(\mathbb{R})$ is a hypersurface of $\mathbb{R}^{n^2}$. Is the level set $(\det)^{-1}(0)$ a hypersurface?
(6) (Non-Linear System) Let $U \subset \mathbb{R}^n$ and $f_1, \dots, f_k \in C^l(U)$. Assume 0 is a regular value of $f_i$ for all $1 \leq i \leq k$, and define $M_i = f_i^{-1}(0)$. Prove that if the set of gradient vectors $\{\nabla f_1(x), \dots, \nabla f_k(x)\}$ is linearly independent for all $x \in M = M_1 \cap \dots \cap M_k$, then $M$ is an $(n - k)$-submanifold of $\mathbb{R}^n$ (suppose $M \neq \emptyset$).
(7) Let $U \subset \mathbb{R}^n$ be an open set and $f_1, \dots, f_k$ functions in $C^1(U)$. Find a condition such that the solution set $S$ of the non-linear system below is a submanifold of $\mathbb{R}^n$ and find $\dim(S)$ (Fig. 15):
$$\begin{cases} f_1(x) = 0, \\ \quad\vdots \\ f_k(x) = 0. \end{cases}$$
(8) Prove that there is no function $f : U \subset \mathbb{R}^3 \to \mathbb{R}$ such that the Möbius band $M$ in Fig. 13 can be defined as a regular level set of $f$.
(9) Let $f, g : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = x^2 + y^2 + z^2 - 1$ and $g(x, y, z) = x^2 + (y - a)^2 - a^2$. For which values of the parameter $a$ is the set $f^{-1}(0) \cap g^{-1}(0)$ a submanifold of $\mathbb{R}^3$? (Fig. 14 shows the cases $a = 1$ and $a = \frac{1}{2}$.)
(10) For which real values $a$ is the intersection of the circumferences $x^2 + y^2 = 1$ and $(x - 2)^2 + y^2 = a^2$ not transversal?
(11) Let $M \subset \mathbb{R}^n$ be a compact, orientable hypersurface. Prove that there are an open subset $V \subset \mathbb{R}^n$ and a function $f : V \to \mathbb{R}$ such that $M = f^{-1}(0)$. (Hint: the orientability allows us to assume the existence of a vector field $X$, normal to $M$, such that $X(p) \neq 0$ for all $p \in M$. Compactness guarantees that a tubular neighborhood of $M$ exists.)
Fig. 13 Möebius band
Fig. 14 Intersection of surfaces
Fig. 15 Intersection of surfaces
5 Lagrange Multipliers
In this section the main issue is to optimize a function $f \in C^k(U)$, $k \geq 1$, restricted to the hypersurface $M = \phi^{-1}(c) \subset U$, in which $c$ is a regular value of the differentiable function $\phi : U \to \mathbb{R}$. An example is to determine the shortest distance from a point $p = (x_0, y_0, z_0)$ to the plane $\pi = \{(x, y, z) \in \mathbb{R}^3 \mid ax + by + cz + d = 0\}$. This is equivalent to finding the minimum value of the function $f : \mathbb{R}^3 \to \mathbb{R}$,
$$f(x, y, z) = \sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2},$$
restricted to the plane $\pi$. Considering the function $\phi : \mathbb{R}^3 \to \mathbb{R}$, $\phi(x, y, z) = ax + by + cz + d$, then $\pi = \phi^{-1}(0)$. To address the general case of optimizing a smooth function $f : \mathbb{R}^n \to \mathbb{R}$ restricted to a hypersurface $M = \phi^{-1}(c) \subset \mathbb{R}^n$, $\phi : \mathbb{R}^n \to \mathbb{R}$, we stress the following items:
(1) $\nabla\phi(p) \perp T_p M$, for all $p \in M$.
(2) If $p$ is a critical point of $f|_M$, then $df_p.v = \langle \nabla f(p), v \rangle = 0$ for all $v \in T_p M$. Hence $\nabla f(p) \perp T_p M$.
Assuming $\nabla f(p) \neq 0$, the above items imply that $\nabla f(p)$ and $\nabla\phi(p)$ are linearly dependent vectors, so there is $\lambda \neq 0$ in $\mathbb{R}$ such that $\nabla f(p) = \lambda\nabla\phi(p)$. In this way, finding a critical point of $f|_M$ corresponds to finding $x \in M$ and $\lambda \neq 0$ satisfying the nonlinear system with $(n + 1)$ variables and $(n + 1)$ equations,
$$\nabla f(x) - \lambda\nabla\phi(x) = 0, \qquad \phi(x) = c. \tag{20}$$
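It could happen that $p \in M$ is a critical point of $f$ itself, so $\nabla f(p) = 0$; in this case the system (20) is satisfied with $\lambda = 0$. Now returning to the initial example, to find the shortest distance from $p$ to $\pi$, we need to solve the system
$$\Big(\frac{x - x_0}{f}, \frac{y - y_0}{f}, \frac{z - z_0}{f}\Big) = \lambda\,(a, b, c) \ \Rightarrow\ \begin{cases} x - x_0 = \lambda a f, \\ y - y_0 = \lambda b f, \\ z - z_0 = \lambda c f, \\ ax + by + cz + d = 0 \end{cases} \ \Rightarrow\ f^2 = \lambda^2 (a^2 + b^2 + c^2) f^2.$$
Assuming $p \notin \pi$, we have $f(x) \neq 0$ for all $x \in \pi$, and then $\lambda = \pm\frac{1}{\sqrt{a^2 + b^2 + c^2}}$. It is possible to solve the equations to find the point $p_m = (x_m, y_m, z_m)$ realizing the shortest distance; however, to find the minimum value of $f$, we take a shortcut:
$$a(x_m - x_0) = \pm\frac{a^2}{\sqrt{a^2 + b^2 + c^2}} f, \quad b(y_m - y_0) = \pm\frac{b^2}{\sqrt{a^2 + b^2 + c^2}} f, \quad c(z_m - z_0) = \pm\frac{c^2}{\sqrt{a^2 + b^2 + c^2}} f.$$
Adding these identities and using $ax_m + by_m + cz_m + d = 0$, we get
$$-(ax_0 + by_0 + cz_0 + d) = a(x_m - x_0) + b(y_m - y_0) + c(z_m - z_0) = \pm\sqrt{a^2 + b^2 + c^2}\, f(p_m) \ \Rightarrow\ f(p_m) = \frac{|ax_0 + by_0 + cz_0 + d|}{\sqrt{a^2 + b^2 + c^2}}.$$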
Of course, $p_m$ is the absolute minimum since $f$ restricted to the plane $\pi$ grows as much as we wish by pushing $(x, y, z) \in \pi$ away from $p_m$. An interesting interpretation of Eq. (20) is the following: the normal vector to the hypersurface $M = \phi^{-1}(c)$ at $x \in M$ is $\nabla\phi(x)$. By projecting the vector $\nabla f(x)$ onto the tangent plane $T_x M = (\nabla\phi(x))^\perp$, we obtain the vector
$$\nabla^M f(x) = \nabla f(x) - \frac{\langle \nabla f(x), \nabla\phi(x) \rangle}{|\nabla\phi(x)|^2}\,\nabla\phi(x). \tag{21}$$
The vector $\nabla^M f(x)$ is the gradient vector of the restriction $f : M \to \mathbb{R}$, that is, the projection of $\nabla f(x)$ onto $T_x M$. If $x_0 \in M$ is a critical point of $f|_M$, then $\nabla^M f(x_0) = 0$; this is equivalent to Eq. (20).
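The plane example and the projection in Eq. (21) can be put together in a short numerical sketch. It is an illustration only: the function names are hypothetical, the sample point and plane are chosen arbitrarily, and the multiplier is written as `mu`, which plays the role of $\lambda f$ in the system above.

```python
import math

def closest_point_on_plane(p, normal, d):
    """Point of the plane <normal, x> + d = 0 closest to p.

    Uses x - p = mu * normal (the multiplier condition), with mu fixed
    by the constraint <normal, x> + d = 0.
    """
    n2 = sum(c*c for c in normal)
    mu = -(sum(c*x for c, x in zip(normal, p)) + d) / n2
    return tuple(x + mu*c for x, c in zip(p, normal))

def distance_formula(p, normal, d):
    """|a x0 + b y0 + c z0 + d| / sqrt(a^2 + b^2 + c^2)."""
    return abs(sum(c*x for c, x in zip(normal, p)) + d) / math.sqrt(
        sum(c*c for c in normal))

def projected_gradient(p, x, normal):
    """Tangential part of grad f at x, as in Eq. (21), for f = |x - p|."""
    dist = math.dist(x, p)
    grad = [(xi - pi) / dist for xi, pi in zip(x, p)]
    coeff = sum(g*c for g, c in zip(grad, normal)) / sum(c*c for c in normal)
    return [g - coeff*c for g, c in zip(grad, normal)]

# Sample data (assumed for illustration): plane 2x - y + 2z + 4 = 0.
p = (1.0, 2.0, 3.0)
normal, d = (2.0, -1.0, 2.0), 4.0

p_m = closest_point_on_plane(p, normal, d)
print(p_m, math.dist(p_m, p), distance_formula(p, normal, d))
print(projected_gradient(p, p_m, normal))
```

At the minimizer `p_m` the projected gradient should vanish up to rounding, which is the content of the equivalence between $\nabla^M f = 0$ and Eq. (20).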
Exercises
(1) Find the point $p_m$ realizing the shortest distance from $p = (x_0, y_0, z_0)$ to the plane $\pi = \{(x, y, z) \in \mathbb{R}^3 \mid ax + by + cz + d = 0\}$.
(2) Find the shortest distance from a point $p = (x_0, y_0, z_0)$ to the sphere of radius $R$ centered at the origin.
(3) Consider the ellipse $E : \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ and the line $L : x + y = 1$. Find a necessary condition so that $E \cap L = \emptyset$ and determine the shortest distance between them.
(4) Let $A \in M_n(\mathbb{R})$ be a positive definite symmetric matrix and let $u_0 \neq 0$ in $\mathbb{R}^n$. Considering in $\mathbb{R}^n$ the ellipsoid $E = \{x \in \mathbb{R}^n \mid \langle Ax, x \rangle = 1\}$ and the hyperplane $L = \{x \in \mathbb{R}^n \mid \langle u_0, x \rangle = 1\}$, prove the following:
(i) $E \cap L = \emptyset$ if and only if $\langle A^{-1}u_0, u_0 \rangle < 1$.
(ii) Find the distance
$$d(E, L) = \inf_{y \in E,\ x \in L} \| y - x \|.$$
(5) Let $Q_1 = \{(x_1, \dots, x_n) \mid x_i > 0,\ \forall i = 1, \dots, n\}$. Find the maximum value of $f(x_1, \dots, x_n) = x_1 x_2 \cdots x_n$ considering $(x_1, \dots, x_n) \in Q_1$ and $x_1 + x_2 + \dots + x_n = c$, $c$ constant. Prove that
$$\sqrt[n]{x_1 x_2 \cdots x_n} \leq \frac{x_1 + x_2 + \dots + x_n}{n}.$$
(6) Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be a linear self-adjoint operator and let $f_A : \mathbb{R}^n \to \mathbb{R}$ be the function $f_A(x) = \langle x, A(x) \rangle$. Prove that the critical point set of $f_A|_{S^{n-1}}$ is equal to the set of unit eigenvectors of $A$. Prove that if $\sigma(A)$ is the spectrum of $A$, $\lambda_m = \min \sigma(A)$ and $\lambda_M = \max \sigma(A)$, then
$$\max_{x \in S^{n-1}} f_A(x) = \lambda_M, \qquad \min_{x \in S^{n-1}} f_A(x) = \lambda_m.$$
(7) Let $S$ be the set defined by the equations $x^2 + y^2 + z^2 = 1$ and $x + y + z = 1$. Find the critical points of the function $f : S \to \mathbb{R}$ given by $f(x, y, z) = x^4 + y^4 + z^4$.
(8) Assume $f$, $\phi_1$ and $\phi_2$ are in $C^1(U)$. Assume that 0 is a regular value of $\phi_1$ and $\phi_2$ and define $M_1 = \phi_1^{-1}(0)$ and $M_2 = \phi_2^{-1}(0)$. Prove that if $p \in M = M_1 \cap M_2$ is a critical point of $f|_M$, then there are constants $\lambda_1, \lambda_2 \in \mathbb{R}$ such that
$$\begin{cases} \nabla f(p) + \lambda_1 \nabla\phi_1(p) + \lambda_2 \nabla\phi_2(p) = 0, \\ \phi_1(p) = 0, \\ \phi_2(p) = 0. \end{cases} \tag{22}$$
5.1 The Ultraviolet Catastrophe: The Dawn of Quantum Mechanics
The genesis of Quantum Mechanics lies in Max Planck's work on the radiation of a blackbody. Let's use Lagrange multipliers to obtain the Maxwell-Boltzmann distribution used by Planck in his seminal work. This distribution is fundamental in the theory of gases and in many other topics in Statistical Physics. The ultraviolet catastrophe was a mathematical prediction with respect to the spectral distribution of blackbody radiation when the temperature $T$ is assumed to be large. The experimental evidence showed that the energy $E(\lambda, T)$ per unit volume of the radiation with wavelength between $\lambda$ and $\lambda + d\lambda$ has the following behavior, as shown in Fig. 16: (1) the short-wavelength cutoff advances toward the origin as the temperature increases, (2) raising the temperature increases the energy of all spectral components, (3) the peak of the curve shifts to a shorter wavelength as the temperature increases. Indeed, the ultraviolet catastrophe is the divergence of the energy density
$$\int_0^\infty E(\lambda, T)\, d\lambda. \tag{23}$$

Fig. 16 Planck's radiation law

Fig. 17 Rayleigh-Jeans radiation law
Considering $c$ as the velocity of light, the energy density is $E(\lambda, T) = \frac{4}{c} I(\lambda, T)$, in which $I(\lambda, T)$ is the energy radiated per unit of area at a given wavelength $\lambda$ and at temperature $T$. The Rayleigh-Jeans law, obtained from the classical theory of equipartition of energy, gives us the equation (plotted in Fig. 17)
$$E_{RJ}(\lambda, T) = \frac{8\pi}{\lambda^4}\, kT, \tag{24}$$
in which $k$ is the Boltzmann constant. Consequently we have that $E_{RJ}(\lambda, T) \to \infty$ when $\lambda \to 0$. This inconsistency with the experimental reality, a failure of the model, implied that there was something fundamentally wrong with either the equipartition theorem or the theory of electromagnetic radiation. Max Planck was aware of the shortcomings of the Rayleigh-Jeans law, so he proposed a condition that later became one of the axioms of Quantum Mechanics. Planck's model for a radiating body is to regard the body as a collection of $N_0 = 6.022 \times 10^{23}$ (the Avogadro constant) linear oscillators performing harmonic motion, which then radiate electromagnetic waves. The energy of each oscillator with wavelength $\lambda$ was discretized into energy levels $\{E_n \mid n \in \mathbb{N}\}$, and Planck applied the Maxwell-Boltzmann distribution function $N(n) = n_0 e^{-E_n/kT}$. The fundamental assumption was that the energy radiated by an oscillator at frequency $\nu$ takes values in the discrete set $h\nu\mathbb{N} = \{E_n = nh\nu \mid n \in \mathbb{N}\}$, in which $\nu = c/\lambda$ and $h = 6.6261 \times 10^{-34}\ \mathrm{m^2\,kg/s}$ is the Planck constant. The number of oscillators per unit of volume at each wavelength $\lambda$, which is known as Jeans' number, is $n(\lambda) = 8\pi/\lambda^4$. Planck was able to obtain the energy density $E(\lambda, T)$ emitted by the oscillators at wavelength $\lambda$ to be (see Fig. 16)
$$E(\lambda, T) = \frac{8\pi hc}{\lambda^5}\, \frac{1}{e^{hc/\lambda kT} - 1}.$$
This model made it possible for Planck to eliminate the ultraviolet catastrophe. Next we apply Lagrange multipliers to show how the Maxwell-Boltzmann distribution function is obtained. Given a volume $V$ divided into $l$ cells of volume $V_i$ ($1 \leq i \leq l$), $N$ identical particles are to be distributed among the $l$ cells. The number $n_i$ of particles occupying the $i$th cell should be proportional to the volume of the cell. Let $g_i = V_i/V$ be the fractional volume. In this way, we have
$$\sum_{i=1}^l g_i = 1, \quad \text{and} \quad \sum_{i=1}^l n_i = N.$$
The probability of a given distribution with $n_1$ particles in $V_1$, $n_2$ particles in $V_2$ and so on is
$$W = \frac{N!}{n_1!\, n_2! \cdots n_l!}\, (g_1)^{n_1} (g_2)^{n_2} \cdots (g_l)^{n_l}. \tag{25}$$
For example, the probability of distributing 12 particles over 4 cells such that $n_1 = 1$, $n_2 = 2$, $n_3 = 3$ and $n_4 = 6$ is
$$W(1, 2, 3, 6) = \frac{12!}{1!\,2!\,3!\,6!}\, (g_1)^1 (g_2)^2 (g_3)^3 (g_4)^6.$$
If we assume that all the cells have the same volume, then we have $g_i = 1/4$ and $W(1, 2, 3, 6) = 0.33045 \times 10^{-2}$. We would like to find the most probable distribution, so we must maximize² the function $W(n_1, \dots, n_l)$. Let's assume that the variables $n_i$, $1 \leq i \leq l$, take values in $\mathbb{R}$. The method of Lagrange multipliers is applied as follows: we wish to find the extreme value of $W(n_1, \dots, n_l)$ subject to the constraint $\phi = N - \sum_{i=1}^l n_i = 0$. It is convenient to take the logarithm in Eq. (25), so
$$\ln(W) = \ln(N!) - \sum_{i=1}^l \ln(n_i!) + \sum_{i=1}^l n_i \ln(g_i).$$
When $N$ and the $n_i$ are large, we can use Stirling's approximation $\ln(n!) \sim n\ln(n) - n$. So we get
$$\ln(W) = N\ln(N) + \sum_{i=1}^l n_i \ln\Big(\frac{g_i}{n_i}\Big).$$

² It is easily seen that a critical point must be a maximum.
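The worked value of $W(1, 2, 3, 6)$ and the quality of the Stirling form of $\ln(W)$ can be checked with a few lines of code. This is only a sketch: the function names are hypothetical, the exact logarithm is computed with `math.lgamma` (using $\ln(n!) = \ln\Gamma(n + 1)$), and the larger occupation numbers are an arbitrary choice made for illustration.

```python
import math

def multinomial_probability(counts, fractions):
    """W from Eq. (25): N!/(n_1! ... n_l!) * g_1^n_1 ... g_l^n_l."""
    coefficient = math.factorial(sum(counts))
    for n in counts:
        coefficient //= math.factorial(n)   # stays an exact integer
    probability = float(coefficient)
    for n, g in zip(counts, fractions):
        probability *= g**n
    return probability

def log_w_exact(counts, fractions):
    """ln(W) computed with ln(n!) = lgamma(n + 1)."""
    total = sum(counts)
    return (math.lgamma(total + 1)
            - sum(math.lgamma(n + 1) for n in counts)
            + sum(n*math.log(g) for n, g in zip(counts, fractions)))

def log_w_stirling(counts, fractions):
    """ln(W) after Stirling: N ln N + sum n_i ln(g_i / n_i)."""
    total = sum(counts)
    return total*math.log(total) + sum(
        n*math.log(g/n) for n, g in zip(counts, fractions) if n > 0)

equal_cells = (0.25, 0.25, 0.25, 0.25)
w_small = multinomial_probability((1, 2, 3, 6), equal_cells)

big = (100, 200, 300, 600)      # assumed sample with larger n_i
exact_big = log_w_exact(big, equal_cells)
stirling_big = log_w_stirling(big, equal_cells)
print(w_small, exact_big, stirling_big)
```

The first number reproduces the value quoted in the text; the last two show how close the Stirling form is to the exact logarithm once the occupation numbers are in the hundreds.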
To apply the method of Lagrange multipliers, we must solve Eq. (20) with $\ln(W)$ in place of $f$, i.e., the system of equations
$$\frac{\partial \ln(W)}{\partial n_i} + \alpha\, \frac{\partial \phi}{\partial n_i} = 0, \quad 1 \leq i \leq l.$$
Since $\frac{\partial \ln(W)}{\partial n_i} = \ln\big(\frac{g_i}{n_i}\big) - 1$ and $\frac{\partial \phi}{\partial n_i} = -1$, it follows that $n_i = g_i e^{-(\alpha + 1)}$, and so
$$g_1 + \dots + g_l = (n_1 + \dots + n_l)\, e^{\alpha + 1} = 1 \ \Rightarrow\ e^{\alpha + 1} = \frac{1}{N}.$$
Consequently we get $n_i = g_i N$. This means that the most probable distribution is the uniform distribution, that is, the one in which the number of particles in a cell is proportional to the size of the cell. Now we consider the problem of distributing $N$ identical particles into $l$ cells with an additional constraint on the energy of the system. The cells are regarded as a discrete set of energy states $E_i$, such that the total energy is $E = \sum_{i=1}^l n_i E_i$. In this case, we have two constraints given by
$$\phi_1(n_1, \dots, n_l) = N - \sum_{i=1}^l n_i = 0, \qquad \phi_2(n_1, \dots, n_l) = E - \sum_{i=1}^l n_i E_i = 0.$$
In Exercise 8 of this section, the reader was left to prove that, with two constraints, we have to solve the system of Eq. (22). Then we get
$$\ln\Big(\frac{g_i}{n_i}\Big) = 1 + \alpha_1 + E_i \alpha_2 \ \Rightarrow\ n_i = g_i\, e^{-(\alpha_1 + 1)}\, e^{-E_i \alpha_2}.$$
Let's define³ $Z = \sum_i g_i e^{-E_i \alpha_2}$, so we then have $N = \sum_{i=1}^l n_i = e^{-(\alpha_1 + 1)} Z$. Therefore the number $n_i$ of particles in the $i$th cell is $n_i = N g_i \frac{e^{-E_i \alpha_2}}{Z}$. The Maxwell-Boltzmann distribution is obtained when we consider that $\alpha_2 = \frac{1}{kT}$ and that $N = N_0$ is the Avogadro constant; then
$$n_i = \frac{N_0\, g_i}{Z}\, e^{-E_i/kT}.$$
Assuming that $g_i = g_0$ for all $1 \leq i \leq l$, and defining $n_0 = \frac{N_0\, g_0}{Z}$, we have
$$n_i = n_0\, e^{-E_i/kT}.$$
Therefore if the energy $E_i$ increases, then the number $n_i$ of particles placed in the $i$th cell becomes small. So the higher energy levels are less likely to contain particles. This is completely different from the classical situation, in which $n_i = g_i N$.

³ $Z$ is the Partition Function.
Now we return to using oscillators instead of cells. Planck's hypothesis was that the energy emitted by an oscillator at frequency $\nu$ can only take values in the discrete set of integer multiples of $h\nu$, $h\nu\mathbb{N} = \{0, h\nu, 2h\nu, 3h\nu, \dots, nh\nu, \dots\}$, and not in a continuous set of values. As shown next, this has a profound effect on the average energy calculated using the Maxwell-Boltzmann distribution. The probability distribution in a system that is in equilibrium at temperature $T$, and partitioned into equal oscillators, is
$$P(E_i) = \frac{n_i}{N} = \frac{e^{-E_i/kT}}{\sum_j e^{-E_j/kT}}.$$
Here the fact that the $i$th energy level at frequency $\nu$ is given by $E_i(\nu) = ih\nu$, $i = 0, 1, 2, \dots$, comes into play. The number of oscillators is large, so we can assume $l \to \infty$. In this way, the average energy $\bar{E}$ per oscillator with frequency $\nu$ (wavelength $\lambda$) is
$$\bar{E}(\nu, T) = \sum_{i=0}^\infty E_i(\nu) P(E_i(\nu)) = \frac{\sum_{i=0}^\infty ih\nu\, n_0 e^{-ih\nu/kT}}{\sum_{i=0}^\infty n_0 e^{-ih\nu/kT}} = \frac{h\nu}{e^{h\nu/kT} - 1}.$$
Therefore Planck's radiation law describing the energy density per wavelength $\lambda$ then reads as
$$E(\lambda, T) = n(\lambda)\,\bar{E}(\lambda, T) = \frac{8\pi hc}{\lambda^5}\, \frac{1}{e^{hc/\lambda kT} - 1}. \tag{26}$$
Equation (26), obtained by Planck, explained the experimental facts about blackbody radiation and overcame the crisis due to the ultraviolet catastrophe. The assumption that the energy emitted by a tiny oscillator, as small as an atom, takes discrete values that are multiples of $E = h\nu$, called quanta, was a fundamental breakthrough in understanding physical systems at the atomic scale. Later the quanta hypothesis became an axiom of Quantum Theory. The ultraviolet catastrophe arose because the classical model considered all energy levels to be equally likely to be occupied by particles.
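The closed form of the average energy, $h\nu/(e^{h\nu/kT} - 1)$, can be compared with a truncated version of the series that defines it. The sketch below works in arbitrary units and treats $h\nu$ and $kT$ as plain numbers; the function names and the truncation length are assumptions made for illustration, and the series is taken over the levels $E_i = ih\nu$ with $i = 0, 1, 2, \dots$.

```python
import math

def mean_energy_series(h_nu, k_t, terms=2000):
    """Truncated sum_i (i h nu) e^{-i h nu / kT} / sum_i e^{-i h nu / kT}, i >= 0."""
    weights = [math.exp(-i*h_nu/k_t) for i in range(terms)]
    return sum(i*h_nu*w for i, w in enumerate(weights)) / sum(weights)

def mean_energy_closed(h_nu, k_t):
    """Closed form h nu / (e^{h nu / kT} - 1)."""
    return h_nu / math.expm1(h_nu/k_t)

# Arbitrary sample values in units where h nu = 1 and kT = 2.
series_value = mean_energy_series(1.0, 2.0)
closed_value = mean_energy_closed(1.0, 2.0)

# When h nu is much smaller than kT, the mean energy approaches kT
# (the classical equipartition value behind the Rayleigh-Jeans law).
classical_limit = mean_energy_closed(1e-6, 1.0)
print(series_value, closed_value, classical_limit)
```

The agreement of the first two numbers reflects the geometric-series computation; the third illustrates why Planck's law reduces to Eq. (24) at long wavelengths.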
Exercises
(1) Prove Eq. (26).
(2) Show that when $\lambda \to \infty$, the behavior of the energy density obtained by Planck in (26) is similar to the classical energy density (24).
(3) Assuming that the energy takes continuous values, show that
$$\bar{E}(\lambda) = \frac{\int_0^\infty E e^{-E/kT}\, dE}{\int_0^\infty e^{-E/kT}\, dE} = kT,$$
and conclude that $E(\lambda, T) \propto \frac{kT}{\lambda^4}$.
6 Differentiable Maps I Let U ⊂ Rn and V ⊂ Rm be open subsets. This section is the first step in studying the differentiability of maps f : U → V . The basic concepts and the Inverse Function Theorem are introduced and some applications are worked out. The proof is postponed until Chap. 3.
6.1 Basic Concepts
Using the usual orthogonal coordinate systems, a point $x \in U$ is represented as $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ and a point $y \in V$ is represented as $y = (y_1, \dots, y_m) \in \mathbb{R}^m$.

Definition 13 A map $f : U \to V$ is differentiable at $p \in U$ if there is a linear transformation $df_p : T_p U \to T_{f(p)} V$ such that
$$f(p + v) - f(p) = df_p.v + r(v),$$
for all $v \in T_p U$, and the map $r : \mathbb{R}^n \to \mathbb{R}^m$ satisfies the condition $\lim_{v \to 0} \frac{r(v)}{|v|} = 0$. $f$ is differentiable in $U$ if it is differentiable at every $p \in U$.

In the local coordinates of $U$ and $V$, a map $f : U \to V$ is given by
$$f(x_1, \dots, x_n) = \big(f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)\big).$$
Now assume that every component $f_i \in C^1(U)$; then there are $m$ functions $r_i$, $1 \leq i \leq m$, satisfying $\lim_{v \to 0} \frac{r_i(v)}{|v|} = 0$. Defining $r(v) = (r_1(v), \dots, r_m(v))$, it follows that $\lim_{v \to 0} \frac{r(v)}{|v|} = 0$. Then at every $p \in U$,
$$
\begin{aligned}
f(p + v) - f(p) &= \big(f_1(p + v) - f_1(p), \dots, f_m(p + v) - f_m(p)\big) \\
&= \big(d(f_1)_p.v + r_1(v), \dots, d(f_m)_p.v + r_m(v)\big) \\
&= \big(d(f_1)_p.v, \dots, d(f_m)_p.v\big) + r(v) \\
&= \big(\langle \nabla f_1(p), v \rangle, \dots, \langle \nabla f_m(p), v \rangle\big) + r(v).
\end{aligned}
$$
Taking $v = (v_1, \dots, v_n)$ in the local coordinate system defined on $U$, $df_p$ is the linear map $df_p : \mathbb{R}^n \to \mathbb{R}^m$ given by
$$df_p.v = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \dots & \frac{\partial f_m}{\partial x_n} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix},$$
in which the partial derivatives are evaluated at $p$.
The column vectors $\{\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\}$ generate the image of $df_p$ in $\mathbb{R}^m$; if $v = \sum_i v_i e_i \in \mathbb{R}^n$, then $df_p.v = \sum_{i=1}^n v_i \frac{\partial f}{\partial x_i}(p)$. The differential of $f$ at $p$ is represented by the matrix $df_p \in L(\mathbb{R}^n, \mathbb{R}^m)$; it is also called the Jacobian matrix of $f$. Let $C^k(U, V)$ be the set of differentiable maps $f : U \to V$ of class $C^k$. The next result follows directly from the previous discussion.
Theorem 7 Let $k \geq 1$. If $f_i \in C^k(U)$ for all $i \in \{1, \dots, m\}$, then the map $f : U \to V$, $f = (f_1, \dots, f_m)$, belongs to $C^k(U, V)$.

Recalling that $L(\mathbb{R}^n, \mathbb{R}^m) = \mathbb{R}^{nm}$, the condition $f \in C^1(U, V)$ is equivalent to the continuity of the map $df : U \to L(\mathbb{R}^n, \mathbb{R}^m) = \mathbb{R}^{nm}$, $x \mapsto df_x$. The derivative of a map satisfies the following properties: let $x \in U$, $f, g \in C^1(U, V)$, $t \in C^1(U)$ and $a, b \in \mathbb{R}$;
(1) $d(af + bg)_x = a\,df_x + b\,dg_x$ ($\mathbb{R}$-linear),
(2) $d(t.f)_x = dt_x.f(x) + t(x).df_x$ (Leibniz's rule).

Definition 14 A map $f : U \to V$ is a diffeomorphism if $f : U \to f(U)$ is bijective, differentiable, and the inverse map $f^{-1} : f(U) \to U$ is also differentiable. $f : U \to V$ is a local diffeomorphism when, for all $p \in U$, there are neighborhoods $U_p \subset U$ of $p$ and $V_{f(p)} \subset V$ of $f(p)$ such that the restriction $f : U_p \to V_{f(p)}$ is a diffeomorphism.

Several references define the Jacobian of $f$ at $p$ as the determinant $\det(df_p)$, denoted by $\det(df_p) = \frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}(p)$. If $f$ is a diffeomorphism, the linear map $df_p : T_p U \to T_{f(p)} V$ is an isomorphism for all $p \in U$, since the inverse $d(f^{-1})_{f(p)} = [df_p]^{-1}$ is well-defined.

Example 4 The next examples reflect the importance of coordinate systems:
(1) The function $f : (0, \infty) \to \mathbb{R}$, $f(x) = \ln(x)$, is bijective and differentiable in $(0, \infty)$. The inverse $f^{-1} : \mathbb{R} \to (0, \infty)$, $f^{-1}(x) = e^x$, is also differentiable. Therefore $f$ is a diffeomorphism.
(2) The function $f : \mathbb{R} \to \mathbb{R}$, $f(x) = x^3$, is bijective and differentiable; its derivative is $f'(x) = 3x^2$. The inverse $f^{-1} : \mathbb{R} \to \mathbb{R}$, $f^{-1}(x) = \sqrt[3]{x}$, has derivative $(f^{-1})'(x) = \frac{1}{3\sqrt[3]{x^2}}$ for $x \neq 0$ and is non-differentiable at $x = 0$, so $f$ is not a diffeomorphism. In general, let $y_0 = f(x_0)$; if $f'(x_0) = 0$, then $(f^{-1})'(y_0) = \frac{1}{f'(f^{-1}(y_0))}$ is not defined at $y_0$.
(3) Let $f : [0, 2\pi) \to S^1$, $f(t) = (\cos(t), \sin(t))$. Then $f'(t) \neq 0$ for all $t \in [0, 2\pi)$. The function $f$ cannot be a diffeomorphism since $S^1$ is compact and $[0, 2\pi)$ is not. Both functions $f$ and $f^{-1}$ are well-defined; however, $f^{-1}$ is not continuous.
(4) The polar coordinate system in $\mathbb{R}^2$ defines a map
$$P : \mathbb{R}^2 \to \mathbb{R}^2, \quad P(r, \theta) = (r\cos(\theta), r\sin(\theta)), \tag{27}$$
with a derivative at $(r, \theta)$ that is
$$dP_{(r,\theta)} = \begin{pmatrix} \cos(\theta) & -r\sin(\theta) \\ \sin(\theta) & r\cos(\theta) \end{pmatrix}.$$
By restricting the map $P$ to $P : (0, \infty) \times (0, 2\pi) \to \mathbb{R}^2 \setminus \{(x, 0) \mid x \geq 0\}$, it becomes a diffeomorphism, since $P$ is then a bijection and $\det(dP_{(r,\theta)}) = r \neq 0$.
(5) The spherical coordinate system in $\mathbb{R}^3$ defines the map
$$F : \mathbb{R}^3 \to \mathbb{R}^3, \quad F(\rho, \theta, \psi) = (\rho\cos(\theta)\sin(\psi), \rho\sin(\theta)\sin(\psi), \rho\cos(\psi)), \tag{28}$$
with derivative at $(\rho, \theta, \psi)$ given by the linear map
$$d(F)_{(\rho,\theta,\psi)} = \begin{pmatrix} \cos(\theta)\sin(\psi) & -\rho\sin(\theta)\sin(\psi) & \rho\cos(\theta)\cos(\psi) \\ \sin(\theta)\sin(\psi) & \rho\cos(\theta)\sin(\psi) & \rho\sin(\theta)\cos(\psi) \\ \cos(\psi) & 0 & -\rho\sin(\psi) \end{pmatrix}.$$
So, $\det(d(F)_{(\rho,\theta,\psi)}) = -\rho^2\sin(\psi)$. By restricting $F$ to the domain $U = (0, \infty) \times (0, 2\pi) \times (0, \pi)$, we get a diffeomorphism $F : U \to F(U)$.
(6) The map $f : \mathbb{R}^n \to B = \{x \in \mathbb{R}^n;\ |x| < 1\}$, $f(x) = \frac{x}{\sqrt{1 + |x|^2}}$, is a diffeomorphism. The inverse is $f^{-1}(y) = \frac{y}{\sqrt{1 - |y|^2}}$ and both are $C^1$.
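(7) (Cauchy-Riemann Equations) Let $\mathbb{C} = \{z = x + iy \mid x, y \in \mathbb{R}\}$ and let $f : \mathbb{C} \to \mathbb{C}$ be a function of one complex variable. Write $f(z) = u(z) + iv(z)$ with $u, v : \mathbb{C} \to \mathbb{R}$. Identifying $\mathbb{C}$ with $\mathbb{R}^2$ through the map $I : \mathbb{C} \to \mathbb{R}^2$, $I(x + iy) = (x, y)$, the function $f$ induces the map $f_R(x, y) = (u(x, y), v(x, y))$ and the following diagram commutes:
\[ \begin{array}{ccc} \mathbb{C} & \xrightarrow{\ f\ } & \mathbb{C} \\ \downarrow I & & \downarrow I \\ \mathbb{R}^2 & \xrightarrow{\ f_R\ } & \mathbb{R}^2. \end{array} \tag{29} \]
The function $f$ is holomorphic (differentiable with respect to $z$) if we have functions $f' : \mathbb{C} \to \mathbb{C}$ and $r : \mathbb{C} \to \mathbb{C}$ such that, for all $v \in \mathbb{C}$,
\[ f(z + v) - f(z) = f'(z).v + r(v), \]
and $\lim_{v\to 0} \frac{r(v)}{v} = 0$. Consider $f'(z) = \alpha + i\beta$, $v = v_1 + iv_2$ and $r(v) = r_1(v) + ir_2(v)$. The equation above implies that
\[ f(z + v) - f(z) = \big[(\alpha v_1 - \beta v_2) + i(\alpha v_2 + \beta v_1)\big] + r_1(v) + ir_2(v). \]
So,
\[ u((x, y) + (v_1, v_2)) - u(x, y) = (\alpha v_1 - \beta v_2) + r_1(v) = \langle(\alpha, -\beta), (v_1, v_2)\rangle + r_1(v), \]
\[ v((x, y) + (v_1, v_2)) - v(x, y) = (\alpha v_2 + \beta v_1) + r_2(v) = \langle(\beta, \alpha), (v_1, v_2)\rangle + r_2(v). \]
Assuming $\lim_{v\to 0}\frac{r_1}{|v|} = \lim_{v\to 0}\frac{r_2}{|v|} = 0$, it follows that $u$, $v$ are differentiable and the gradients are $\nabla u = (\alpha, -\beta)$ and $\nabla v = (\beta, \alpha)$. Consequently, $f$ is holomorphic if and only if $u$ and $v$ satisfy the Cauchy-Riemann equations
\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{30} \]
The Jacobian of $f_R : \mathbb{R}^2 \to \mathbb{R}^2$ is
\[ df_R = \begin{pmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2mm] -\dfrac{\partial u}{\partial y} & \dfrac{\partial u}{\partial x} \end{pmatrix} \ \Rightarrow\ \det(df_R) = \Big(\frac{\partial u}{\partial x}\Big)^2 + \Big(\frac{\partial u}{\partial y}\Big)^2 . \]
Therefore, if $f$ is holomorphic, then $f_R$ is a conformal map (it preserves angles) at every point where $\det(df_R) \neq 0$, since $\langle df_R.a, df_R.b\rangle = \det(df_R)\,\langle a, b\rangle$ for all $a, b \in \mathbb{R}^2$. To finish the argument, note that $\lim_{v\to 0}\frac{r(v)}{v} = 0$ implies that $\lim_{v\to 0}\frac{r_1}{|v|} = \lim_{v\to 0}\frac{r_2}{|v|} = 0$.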
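As an illustration of the Cauchy-Riemann equations (30) and of the conformality identity, the sketch below approximates the Jacobian of $f_R$ for the holomorphic function $f(z) = z^2$ by central differences. The choice of $f$, the sample point and the step size are assumptions made only for this example.

```python
def f_real(x, y):
    """Real form f_R(x, y) = (u, v) of the holomorphic map f(z) = z^2."""
    w = complex(x, y) ** 2
    return w.real, w.imag

def jacobian(x, y, h=1e-6):
    """Central-difference approximation of (u_x, u_y, v_x, v_y) at (x, y)."""
    ux = (f_real(x + h, y)[0] - f_real(x - h, y)[0]) / (2 * h)
    uy = (f_real(x, y + h)[0] - f_real(x, y - h)[0]) / (2 * h)
    vx = (f_real(x + h, y)[1] - f_real(x - h, y)[1]) / (2 * h)
    vy = (f_real(x, y + h)[1] - f_real(x, y - h)[1]) / (2 * h)
    return ux, uy, vx, vy

ux, uy, vx, vy = jacobian(1.5, -0.5)
cr_residual = max(abs(ux - vy), abs(uy + vx))   # u_x = v_y and u_y = -v_x
det = ux * vy - uy * vx                         # equals u_x^2 + u_y^2

# Conformality: <df.a, df.b> = det(df) <a, b> for the vectors a = (1, 0), b = (1, 1).
a, b = (1.0, 0.0), (1.0, 1.0)
dfa = (ux * a[0] + uy * a[1], vx * a[0] + vy * a[1])
dfb = (ux * b[0] + uy * b[1], vx * b[0] + vy * b[1])
conformal_gap = abs(dfa[0] * dfb[0] + dfa[1] * dfb[1] - det * (a[0] * b[0] + a[1] * b[1]))
```

Here $u = x^2 - y^2$ and $v = 2xy$, so at $(1.5, -0.5)$ the exact determinant is $3^2 + 1^2 = 10$.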
Exercises
Prove that the following maps are $C^1$-maps and find their derivatives:
(1) $T : \mathbb{R}^n \to \mathbb{R}^m$, $T(x) = A.x$, with $A \in M(m \times n, \mathbb{R})$.
(2) $S : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$, $S(x, y) = x + y$.
(3) $B : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^p$ a bilinear map.
(4) $f : \mathbb{R}^n \to B = \{x \in \mathbb{R}^n ;\ |x| < 1\}$, given by $f(x) = \dfrac{x}{\sqrt{1+|x|^2}}$.
(5) Let $f, g \in C^1(U, \mathbb{R}^n)$ and consider the function $\langle f, g\rangle : U \to \mathbb{R}$, given by $\langle f, g\rangle(x) = \langle f(x), g(x)\rangle$. Prove that
\[ d(\langle f, g\rangle)_p.v = \langle df_p.v, g(p)\rangle + \langle f(p), dg_p.v\rangle, \quad v \in T_pU. \tag{31} \]
Show that if $|f|$ is constant, then $f(p) \perp df_p.v$ for all $p \in U$ and $v \in T_pU$.
6.2 Coordinate Systems

Coordinate systems are often used to simplify different kinds of problems. For example, using spherical coordinates instead of Cartesian coordinates simplifies problems with spherical symmetry, and similarly for cylindrical coordinates when there is cylindrical symmetry.

To change a coordinate system, it is necessary to be able to reverse the transformation, so it must be a diffeomorphism. For example, the change of variable used to solve an integral has to be a diffeomorphism. By changing the coordinate system, the derivative of a map changes as well, as shown in the Chain Rule below.

Proposition 2 (Chain Rule). Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ be open subsets and let $f : U \to \mathbb{R}^p$ and $T : V \to U$ be $C^1$. Taking $p \in V$ and $q = T(p) \in U$, the composition $f \circ T : V \to \mathbb{R}^p$ is $C^1$ and
\[ d(f \circ T)_p = df_{T(p)}.dT_p : T_p\mathbb{R}^m \to T_{f(q)}\mathbb{R}^p. \tag{32} \]

Proof If $f$ and $T$ are $C^1$, then we have maps $\rho_1$ and $\rho_2$ such that
\[ f(q + w) - f(q) = df_q.w + \rho_1(w)\,|w|, \]
\[ T(p + v) - T(p) = dT_p.v + \rho_2(v)\,|v|, \]
and $\lim_{w\to 0}\rho_1(w) = \lim_{v\to 0}\rho_2(v) = 0$. Therefore, setting $w = dT_p.v + \rho_2(v)\,|v|$,
\[ \begin{aligned} f \circ T(p + v) &= f\big(T(p + v)\big) = f\big(T(p) + w\big) \\ &= f\big(T(p)\big) + df_q.\big(dT_p.v + \rho_2(v)\,|v|\big) + \rho_1(w)\,|w| \\ &= f \circ T(p) + df_q.dT_p.v + R(v)\,|v|, \end{aligned} \]
with
\[ R(v) = df_q.\rho_2(v) + \rho_1\big(dT_p.v + \rho_2(v)\,|v|\big)\,\Big|\, dT_p.\frac{v}{|v|} + \rho_2(v) \Big|. \]
Since the term $\frac{v}{|v|}$ is bounded and $w \to 0$ when $v \to 0$, by passing to the limit $v \to 0$ we get $R(v) \to 0$, verifying the identity (32).
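The identity (32) can be tested numerically. The sketch below takes $T$ to be the polar coordinate map and $f$ a hypothetical test map (both chosen here only for illustration), multiplies their analytic Jacobians, and compares the product with a finite-difference Jacobian of the composition.

```python
import math

def T(r, th):
    """Polar coordinate map T(r, theta) = (r cos(theta), r sin(theta))."""
    return (r * math.cos(th), r * math.sin(th))

def f(x, y):
    """Hypothetical test map f(x, y) = (x y, x + y^2), chosen for illustration."""
    return (x * y, x + y * y)

def dT(r, th):
    """Analytic Jacobian matrix of T."""
    return [[math.cos(th), -r * math.sin(th)],
            [math.sin(th),  r * math.cos(th)]]

def df(x, y):
    """Analytic Jacobian matrix of f."""
    return [[y, x],
            [1.0, 2.0 * y]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def numeric_jacobian(g, a, b, h=1e-6):
    """Central differences; column j holds the derivative in variable j."""
    ga_p, ga_m = g(a + h, b), g(a - h, b)
    gb_p, gb_m = g(a, b + h), g(a, b - h)
    return [[(ga_p[i] - ga_m[i]) / (2 * h), (gb_p[i] - gb_m[i]) / (2 * h)]
            for i in range(2)]

p = (2.0, 0.7)
q = T(*p)
chain = matmul(df(*q), dT(*p))                          # df_{T(p)} . dT_p
direct = numeric_jacobian(lambda r, th: f(*T(r, th)), *p)
chain_rule_error = max(abs(chain[i][j] - direct[i][j])
                       for i in range(2) for j in range(2))
```

The two matrices agree up to the discretization error of the finite differences.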
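Corollary 1 Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ be open subsets, and let $f : U \to \mathbb{R}^p$ and $T : V \to U$ be $C^k$. Then $f \circ T : V \to \mathbb{R}^p$ is $C^k$.

Now, we work out the gradient and Laplacian formulas using polar coordinates. Considering $\beta = \{e_1, e_2\}$ the canonical basis of $\mathbb{R}^2$, let's compute the gradient $\nabla f = \frac{\partial f}{\partial x}e_1 + \frac{\partial f}{\partial y}e_2$ and the Laplacian $\Delta f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$ using the polar coordinates defined in (27). Since $r = \sqrt{x^2 + y^2}$ and $\theta = \operatorname{arctg}\big(\frac{y}{x}\big)$, we have the partial derivatives
\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial r}.\frac{\partial r}{\partial x} + \frac{\partial f}{\partial \theta}.\frac{\partial \theta}{\partial x} = \frac{\partial f}{\partial r}.\cos(\theta) + \frac{\partial f}{\partial \theta}\Big(-\frac{\sin(\theta)}{r}\Big), \]
\[ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial r}.\frac{\partial r}{\partial y} + \frac{\partial f}{\partial \theta}.\frac{\partial \theta}{\partial y} = \frac{\partial f}{\partial r}.\sin(\theta) + \frac{\partial f}{\partial \theta}.\frac{\cos(\theta)}{r}. \]
Taking the orthogonal unit vectors $\hat{r} = (\cos(\theta), \sin(\theta))$ and $\hat{\theta} = (-\sin(\theta), \cos(\theta))$, we get
\[ \nabla f = \frac{\partial f}{\partial r}\hat{r} + \frac{1}{r}\frac{\partial f}{\partial \theta}\hat{\theta}. \tag{33} \]
Analogously,
\[ \frac{\partial^2 f}{\partial x^2} = \frac{\partial r}{\partial x}.\frac{\partial}{\partial r}\Big(\frac{\partial f}{\partial x}\Big) + \frac{\partial \theta}{\partial x}.\frac{\partial}{\partial \theta}\Big(\frac{\partial f}{\partial x}\Big) = \cos(\theta)\,\frac{\partial}{\partial r}\Big[\frac{\partial f}{\partial r}.\cos(\theta) + \frac{\partial f}{\partial \theta}\Big(-\frac{\sin(\theta)}{r}\Big)\Big] + \Big(-\frac{\sin(\theta)}{r}\Big)\frac{\partial}{\partial \theta}\Big[\frac{\partial f}{\partial r}.\cos(\theta) + \frac{\partial f}{\partial \theta}\Big(-\frac{\sin(\theta)}{r}\Big)\Big], \]
\[ \frac{\partial^2 f}{\partial y^2} = \frac{\partial r}{\partial y}.\frac{\partial}{\partial r}\Big(\frac{\partial f}{\partial y}\Big) + \frac{\partial \theta}{\partial y}.\frac{\partial}{\partial \theta}\Big(\frac{\partial f}{\partial y}\Big) = \sin(\theta)\,\frac{\partial}{\partial r}\Big[\frac{\partial f}{\partial r}.\sin(\theta) + \frac{\partial f}{\partial \theta}.\frac{\cos(\theta)}{r}\Big] + \frac{\cos(\theta)}{r}\,\frac{\partial}{\partial \theta}\Big[\frac{\partial f}{\partial r}.\sin(\theta) + \frac{\partial f}{\partial \theta}.\frac{\cos(\theta)}{r}\Big]. \]
Therefore
\[ \Delta f = \frac{\partial^2 f}{\partial r^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2}. \tag{34} \]
Applying the formula (33) to the function $f(x, y) = (x^2 + y^2)^{-3/2}$, we get $\nabla f(r, \theta) = -\frac{3}{r^4}\hat{r}$.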
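Formula (34) can be compared against a direct Cartesian computation. The sketch below does so for the same radial function $f = r^{-3}$, using a five-point finite-difference Laplacian; the sample point and the step size are arbitrary choices for illustration.

```python
import math

def f(x, y):
    """f(x, y) = (x^2 + y^2)^(-3/2), that is, f = r^(-3)."""
    return (x * x + y * y) ** -1.5

def laplacian_cartesian(x, y, h=1e-3):
    """Five-point finite-difference approximation of f_xx + f_yy."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / (h * h)

def laplacian_polar(r):
    """Formula (34) applied to the radial function f = r^(-3)."""
    f_r = -3.0 * r ** -4          # df/dr
    f_rr = 12.0 * r ** -5         # d^2 f / dr^2
    f_thth = 0.0                  # f does not depend on theta
    return f_rr + f_r / r + f_thth / (r * r)

x, y = 1.0, 2.0
r = math.hypot(x, y)
lap_xy = laplacian_cartesian(x, y)
lap_polar = laplacian_polar(r)    # simplifies to 9 r^(-5)
```

Both values agree up to the discretization error of the finite-difference stencil.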
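Exercises
(1) Let $f = f(r, \theta)$ and $g(x, y) = f \circ P^{-1}(x, y)$. Use the chain rule to prove Eq. (33).
(2) The cylindrical coordinates are defined using the map
\[ C : \mathbb{R}^3 \to \mathbb{R}^3, \quad C(r, \theta, z) = \big(r\cos(\theta), r\sin(\theta), z\big). \tag{35} \]
Find an open set $U \subset \mathbb{R}^3$ such that $C : U \to C(U)$ is a diffeomorphism. Using the cylindrical coordinates, prove that the gradient and the Laplacian of a function are given by
\[ \begin{aligned} \nabla f &= \frac{\partial f}{\partial r}\hat{r} + \frac{1}{r}\frac{\partial f}{\partial \theta}\hat{\theta} + \frac{\partial f}{\partial z}\hat{z}, \\ \Delta f &= \frac{\partial^2 f}{\partial r^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2} + \frac{\partial^2 f}{\partial z^2}. \end{aligned} \tag{36} \]
(3) Using spherical coordinates, prove that the gradient and the Laplacian of a function $f \in C^2(\mathbb{R}^3)$ are
\[ \begin{aligned} \nabla f &= \frac{\partial f}{\partial r}\hat{r} + \frac{1}{r\sin(\psi)}\frac{\partial f}{\partial \theta}\hat{\theta} + \frac{1}{r}\frac{\partial f}{\partial \psi}\hat{\psi}, \\ \Delta f &= \frac{\partial^2 f}{\partial r^2} + \frac{2}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \psi^2} + \frac{\cot(\psi)}{r^2}\frac{\partial f}{\partial \psi} + \frac{1}{r^2\sin^2(\psi)}\frac{\partial^2 f}{\partial \theta^2}. \end{aligned} \tag{37} \]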
(4) Using spherical coordinates in $\mathbb{R}^n$, prove that the Laplacian is given by
\[ \Delta f = \frac{\partial^2 f}{\partial \rho^2} + \frac{n-1}{\rho}\frac{\partial f}{\partial \rho} + \frac{1}{\rho^2}\Delta_{S^{n-1}} f, \tag{38} \]
in which $\Delta_{S^{n-1}} f$ involves only the first and second partial derivatives of $f$ with respect to the angles $\theta_1, \dots, \theta_{n-1}$. The operator $\Delta_{S^{n-1}}$ is the Laplace-Beltrami operator defined on $S^{n-1}$.

One of the most important results in the theory of differentiable maps is the Inverse Function Theorem (InFT), our next result.

Theorem 8 (InFT). Let $U, V$ be open subsets of $\mathbb{R}^n$ and let $f : U \to V$ be a $C^1$-map. Take $p \in U$ and assume that $df_p : T_pU \to T_{f(p)}V$ is an isomorphism of vector spaces. Then there is an open neighborhood $W \subset U$ containing $p$, such that $f : W \to f(W)$ is a diffeomorphism.

Thus, to verify whether a map is a diffeomorphism in a neighborhood of a point $p$, it is sufficient to check that the linear transformation $df_p$ is an isomorphism. The proof is postponed to Chap. 4. Next, we will apply the InFT to understand some standard cases of differentiable maps. To this end, we will need the following definitions.

Definition 15 Let $U \subset \mathbb{R}^n$ and let $V \subset \mathbb{R}^m$ be open subsets. Let $f \in C^1(U, V)$ and $p \in U$. Then $f$ is:
(1) an immersion at $p$ if $n < m$ and $\operatorname{rank}(df_p) = n$. $f$ is an immersion on $U$ if it is an immersion at every $p \in U$.
(2) a submersion at $p$ if $n > m$ and $\operatorname{rank}(df_p) = m$. $f$ is a submersion on $U$ if it is a submersion at every $p \in U$.

Example 5 The items below are standard examples of immersions and submersions.
(1) The map $f : (0, 2\pi) \times (0, \pi) \to \mathbb{R}^3$, given by $f(\theta, \psi) = \big(\cos(\theta)\sin(\psi), \sin(\theta)\sin(\psi), \cos(\psi)\big)$, is an immersion.
(2) Let $B_1 = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 < 1\}$. The map $\phi_1 : B_1 \to \mathbb{R}^3$ given by $\phi_1(x, y) = \big(x, y, \sqrt{1 - x^2 - y^2}\big)$ is an immersion.
(3) A map $F : \mathbb{R}^n \to \mathbb{R}^{n+m}$, $F(x) = \big(x, f(x)\big)$, defining the graph of a differentiable map $f : \mathbb{R}^n \to \mathbb{R}^m$, is the standard model of a local immersion. It is proved in the sequel that all immersions can be described locally with the map $F$.
(4) The function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = x^2 + y^2 + z^2$, is a submersion in $\mathbb{R}^3 - \{0\}$.
(5) Let $n > m$; the projection $\pi : \mathbb{R}^n \to \mathbb{R}^m$, $\pi(x_1, \dots, x_n) = (x_1, \dots, x_m)$, is a submersion.
(6) (Whitney's umbrella) The map $f : \mathbb{R}^2 \to \mathbb{R}^3$, $f(x, y) = (xy, x, y^2)$, is not an immersion at $p = (0, 0)$.
(7) $M \subset \mathbb{R}^m$ is a parametrized $n$-dimensional submanifold of $\mathbb{R}^m$ if the tangent plane $T_pM$ is a well-defined $n$-dimensional vector subspace of $\mathbb{R}^m$ for all $p \in M$. If $f : U \subset \mathbb{R}^n \to \mathbb{R}^m$ is an immersion, then $M = f(U)$ is a parametrized $n$-dimensional submanifold of $\mathbb{R}^m$.
(8) (Boy's Surface) David Hilbert asked if there is an immersion of the Projective Plane $\mathbb{RP}^2$ into $\mathbb{R}^3$. In 1901, Werner Boy answered positively, displaying the following immersion (Figs. 18 and 19):
\[ x = \frac{\sqrt{2}\cos^2(v)\cos(2u) + \cos(u)\sin(2v)}{2 - \sqrt{2}\sin(3u)\sin(2v)}, \quad y = \frac{\sqrt{2}\cos^2(v)\sin(2u) - \sin(u)\sin(2v)}{2 - \sqrt{2}\sin(3u)\sin(2v)}, \quad z = \frac{3\cos^2(v)}{2 - \sqrt{2}\sin(3u)\sin(2v)}. \tag{39} \]

Fig. 18 Boy's surface

Fig. 19 Boy's surface
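The rank conditions in Example 5 can be checked pointwise. For a map from $\mathbb{R}^2$ to $\mathbb{R}^3$, the derivative has rank 2 exactly when the cross product of its two columns is nonzero. The sketch below applies this test to items (1) and (6); the helper names and the tolerance are illustrative assumptions.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def has_rank_two(col1, col2, tol=1e-12):
    """A 3x2 matrix has rank 2 iff the cross product of its columns is nonzero."""
    return max(abs(c) for c in cross(col1, col2)) > tol

def sphere_columns(theta, psi):
    """Columns df/dtheta and df/dpsi for item (1) of Example 5."""
    d_theta = (-math.sin(theta) * math.sin(psi), math.cos(theta) * math.sin(psi), 0.0)
    d_psi = (math.cos(theta) * math.cos(psi), math.sin(theta) * math.cos(psi), -math.sin(psi))
    return d_theta, d_psi

def umbrella_columns(x, y):
    """Columns df/dx and df/dy for Whitney's umbrella f(x, y) = (xy, x, y^2)."""
    return (y, 1.0, 0.0), (x, 0.0, 2.0 * y)

sphere_ok = has_rank_two(*sphere_columns(1.0, 0.5))
umbrella_at_origin = has_rank_two(*umbrella_columns(0.0, 0.0))
umbrella_elsewhere = has_rank_two(*umbrella_columns(0.3, 0.2))
```

At the origin the second column of the umbrella's derivative vanishes, so the rank drops to 1 there, while the sample point away from the origin has full rank.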
Definition 16 Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ be open subsets and $f \in C^1(U, V)$. A point $c \in \mathbb{R}^m$ is a regular value of $f$ if the linear map $df_p : T_pU \to T_cV$ has maximum rank for all $p \in f^{-1}(c)$. The values taken by immersions and submersions are always regular. A map $f : \mathbb{R}^n \to \mathbb{R}^m$ is an immersion if $\operatorname{rank}(df_p) = n$ for all $p$, and $f$ is a submersion if $\operatorname{rank}(df_p) = m$ for all $p$.
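6.3 The Local Form of an Immersion

We start by studying the easiest example of an immersion. Let $U \subset \mathbb{R}^2$ be an open subset and let $\phi : U \to \mathbb{R}$ be a differentiable function such that $\nabla\phi(x) \neq 0$ for all $x = (x_1, x_2) \in U$. The map $\iota : U \to \mathbb{R}^3$ given by
\[ \iota(x_1, x_2) = \big(x_1, x_2, \phi(x_1, x_2)\big) \tag{40} \]
is an immersion. Indeed, $\iota(U)$ is the graph of $\phi : U \to \mathbb{R}$ and the rank of the matrix
\[ d\iota_{(x_1,x_2)} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ \frac{\partial\phi}{\partial x_1} & \frac{\partial\phi}{\partial x_2} \end{pmatrix} \]
is equal to 2. We claim $\iota : U \to \iota(U)$ is a diffeomorphism. $\iota$ is obviously bijective onto its image, so it is sufficient to prove that the inverse $\iota^{-1}$ is differentiable. Extending $\iota$ to a map $\Phi : U \times \mathbb{R} \to \mathbb{R}^3$, $\Phi(x_1, x_2, x_3) = \big(x_1, x_2, \phi(x_1, x_2) + x_3\big)$, we have the linear map $d\Phi_p$ at $p = (x_1, x_2, x_3)$ given by
\[ d\Phi_p = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \frac{\partial\phi}{\partial x_1} & \frac{\partial\phi}{\partial x_2} & 1 \end{pmatrix}. \]
So $d\Phi_p : T_p(U \times \mathbb{R}) \to T_{\Phi(p)}\mathbb{R}^3$ is an isomorphism of vector spaces since $\operatorname{rank}(d\Phi_p) = 3$. By the InFT, we have an open set $W \subset U \times \mathbb{R}$ such that $\Phi : W \to \Phi(W)$ is a diffeomorphism. Consequently, the restriction $\Phi : W \cap (U \times \{0\}) \to \Phi\big(W \cap (U \times \{0\})\big)$, $\Phi(x_1, x_2, 0) = \iota(x_1, x_2)$, is a diffeomorphism. The inverse map is given by $\iota^{-1}\big(u, v, \phi(u, v)\big) = \Phi^{-1}\big(u, v, \phi(u, v)\big) = (u, v, 0)$.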
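Theorem 9 (Local Form of Immersions). Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ be open subsets and let $f \in C^1(U, V)$ be an immersion. Then there are open subsets $U' \subset U$ and $W \subset \mathbb{R}^n$, and a diffeomorphism $\Phi : W \to U'$ such that the composition $f \circ \Phi : W \to V$ is given by
\[ f \circ \Phi(x) = \big(x, \xi_1(x), \dots, \xi_{m-n}(x)\big), \quad x = (x_1, \dots, x_n) \in W, \tag{41} \]
and the functions $\xi_1, \dots, \xi_{m-n} : W \to \mathbb{R}$ are all differentiable.

Proof The proof is carried out for the case $n = 2$ and $m = 3$; the general case is analogous. Let $f(x_1, x_2) = \big(f_1(x_1, x_2), f_2(x_1, x_2), f_3(x_1, x_2)\big)$; then
\[ df_{(x_1,x_2)} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \\ \frac{\partial f_3}{\partial x_1} & \frac{\partial f_3}{\partial x_2} \end{pmatrix}. \]
Since $\operatorname{rank}(df_p) = 2$, we assume the matrix
\[ \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \end{pmatrix} \]
defines an isomorphism for all $(x_1, x_2) \in U$. Taking the map $g : U \to \mathbb{R}^2$, $g(x_1, x_2) = (f_1(x_1, x_2), f_2(x_1, x_2))$, from the InFT there are open neighborhoods $U' \subset U$ of $p$ and $W$ of $g(p)$, such that $g : U' \to W$ is a diffeomorphism. Let $\Phi = g^{-1} : W \to U'$ and set $u = f_1(x_1, x_2)$, $v = f_2(x_1, x_2)$ and $\xi_1(u, v) = f_3 \circ \Phi(u, v)$ (Fig. 20), so $f \circ \Phi : W \to V$ is given by
\[ f \circ \Phi(u, v) = (u, v, \xi_1(u, v)). \]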
Fig. 20 Local form of immersions
Fig. 21 Klein Bottle
Figure 21 illustrates an immersion of Klein's Bottle, defined as follows: for $(u, v) \in [0, 2\pi] \times [0, 2\pi)$,
\[ \begin{aligned} x &= 6\cos(u)\big(1 + \sin(u)\big) + 4\Big(1 - \frac{\cos(u)}{2}\Big)\cos(u)\cos(v), \\ y &= 4\Big(1 - \frac{\cos(u)}{2}\Big)\sin(v), \\ z &= 16\sin(u) + 5\Big(1 - \frac{\cos(u)}{2}\Big)\sin(u)\cos(v). \end{aligned} \]
The Projective Plane $\mathbb{RP}^2$ and Klein's Bottle $K^2$ cannot be embedded into $\mathbb{R}^3$ without self-intersections. Since both are non-orientable, they contain an embedded Möbius Band. Suppose that $K^2$ could be embedded into $\mathbb{R}^3$ without self-intersections, so there would be a tubular neighborhood $N = K^2 \times (-\epsilon, \epsilon)$ such that an arbitrary closed curve $C \subset N$ intersecting the surface $K^2 \times \{0\}$ would have two points of intersection with $K^2 \times \{0\}$; however, there is a curve with only one such point, as proved in Hatcher [22].

Definition 17 Consider $n < m$, and let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ be open subsets. Let $f : U \to V$ be a differentiable map.
(1) $f$ is an open map if for all open subsets $W \subset U$, the image $f(W)$ is an open subset of $f(U)$, i.e., there is an open set $V' \subset \mathbb{R}^m$ such that $f(W) = f(U) \cap V'$.
(2) $f$ is a differentiable embedding, or just an embedding, if $f$ is an open immersion.

Theorem 10 Consider $n < m$, $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ open subsets, and assume $f : U \to V$ is an embedding. Then $M = f(U)$ is an $n$-submanifold of $V$.
Fig. 22 Embedding
Proof The proof is carried out for the case $n = 2$ and $m = 3$; the general case is analogous. From the Local Form of an Immersion (Theorem 9), for all $p \in U$ we have an open neighborhood $U_p \subset U$ such that $f : U_p \to f(U_p)$ is a diffeomorphism given by $f(x_1, x_2) = (x_1, x_2, f_1(x_1, x_2))$. Since $f$ is an open map, we have an open set $V' \subset \mathbb{R}^3$ such that $f(U_p) = V' \cap f(U)$ is open in $f(U)$. Indeed, given $\epsilon > 0$, take the open set $V' = f(U_p) \times (-\epsilon, \epsilon)$ (Fig. 22).

Example 6 There are examples of immersions that are not embeddings; this is a subtle issue. The following sheds some light on what may occur. Consider the embedding of the torus $T^2$ given by $\phi : [0, 2\pi] \times [0, 2\pi] \to \mathbb{R}^3$,
\[ \phi(u, v) = \big((6 + 2\cos(u))\cos(v), (6 + 2\cos(u))\sin(v), 2\sin(u)\big), \tag{42} \]
and the curve $\alpha_{a,b} : \mathbb{R} \to \mathbb{R}^2$, $\alpha_{a,b}(t) = (at, bt)$. For $r = \frac{a}{b}$, define the curve $\gamma_r : \mathbb{R} \to T^2$ given by the composition $\gamma_r = \phi \circ \alpha_{a,b}$ and consider the following families of curves:
(i) $r \in \mathbb{Q}$; illustrated in Fig. 23,
(ii) $r \in \mathbb{R} \setminus \mathbb{Q}$; illustrated in Fig. 24.
The two families have distinct topological properties as subsets of $T^2$. For every $r \in \mathbb{Q}$, the curve $\gamma_r$ is closed in $T^2$; so it is an embedding of $\mathbb{R}$ into $T^2$. However, if $r \notin \mathbb{Q}$ then the curve $\gamma_r$ is not closed; indeed it is dense in $T^2$ (see a proof in Lima [30]). Consequently, the case $r \notin \mathbb{Q}$ defines a curve $\gamma_r \subset T^2$ which is not an embedding of $\mathbb{R}$ into $T^2$, even though it is an immersion.
Fig. 23 $\gamma_{2/3}$ is closed

Fig. 24 $\gamma_{\sqrt{3}/2}$ is not closed

6.4 The Local Form of Submersions

Let $V \subset \mathbb{R}^3$ be an open subset and $p \in V$. The projection $\pi_x : V \to \mathbb{R}$, $\pi_x(x, y, z) = x$, is a submersion since the linear functional
\[ d(\pi_x)_p : T_pV \to \mathbb{R}, \quad d(\pi_x)_p.v = \langle(1, 0, 0), (v_1, v_2, v_3)\rangle = v_1 \]
is surjective. For every $c \in \operatorname{Im}(\pi_x)$, the set $\pi_x^{-1}(c)$ is the intersection of the plane $x = c$ with the open set $V$. Another example is the submersion defined by $\pi_{xy} : V \to \mathbb{R}^2$, $\pi_{xy}(x, y, z) = (x, y)$. The projections are standard models of submersions.

Theorem 11 (Local Form of Submersions). Let $U \subset \mathbb{R}^{m+k}$ and $V \subset \mathbb{R}^m$ be open subsets and $p \in U$. If $f \in C^1(U, V)$, $f = (f_1, \dots, f_m)$, is a submersion such that $\frac{\partial(f_1, \dots, f_m)}{\partial(x_1, \dots, x_m)} \neq 0$, then we have an open neighborhood $U' \subset U$ of $p$, an open subset $W \subset \mathbb{R}^{m+k}$ and a diffeomorphism $\Phi : W \to U'$ such that the composition $f \circ \Phi : W \to V$ is the projection
\[ f \circ \Phi(u_1, \dots, u_m, \dots, u_{m+k}) = (u_1, \dots, u_m). \]

Proof The proof is carried out for the case $n = m + k = 3$ and $m = 2$. The general case follows using the same reasoning. Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ be given by $f(x, y, z) = \big(f_1(x, y, z), f_2(x, y, z)\big)$. Therefore
\[ df_{(x,y,z)} = \begin{pmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} & \frac{\partial f_1}{\partial z} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} & \frac{\partial f_2}{\partial z} \end{pmatrix}. \]
Fig. 25 Local form of submersions
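From the hypothesis, $\operatorname{rank}(df_p) = 2$. Without loss of generality, we can assume the matrix
\[ \begin{pmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{pmatrix}(p) \]
is invertible for all $p \in U$. The InFT implies that the map $F : U \to V \times \mathbb{R}$, $F(x, y, z) = (f_1(x, y, z), f_2(x, y, z), z)$, is a local diffeomorphism since
\[ dF_p = \begin{pmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} & \frac{\partial f_1}{\partial z} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} & \frac{\partial f_2}{\partial z} \\ 0 & 0 & 1 \end{pmatrix} \]
is invertible. So we have open neighborhoods $U' \subset U$ containing $p$, and $W \subset V \times \mathbb{R}$ containing $F(p)$, such that $F : U' \to W$ is a diffeomorphism. Let $\Phi = F^{-1} : W \to U'$; considering $u = f_1(x, y, z)$ and $v = f_2(x, y, z)$, we get
\[ f \circ \Phi(u, v, z) = f(x, y, z) = (u, v). \]
The InFT ensures that locally $\Phi(u, v, z) = \big(\phi_1(u, v, z), \phi_2(u, v, z), z\big)$ is a diffeomorphism. In this way, for $c = (c_1, c_2) \in \operatorname{Im}(f)$,
\[ f^{-1}(c) = \big\{\big(\phi_1(c_1, c_2, z), \phi_2(c_1, c_2, z), z\big) \mid (c_1, c_2, z) \in W\big\} \]
is the local graph of $\tilde{\phi} : \mathbb{R} \to \mathbb{R}^2$, $\tilde{\phi} = (\phi_1, \phi_2)$ (Fig. 25).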
Theorem 12 Let $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$ be open subsets, let $f : U \to V$ be a differentiable map and let $c \in \operatorname{Im}(f)$ be a regular value of $f$. Then the set $M^c = f^{-1}(c)$ is an $(n-m)$-submanifold of $\mathbb{R}^n$ and $T_pM^c = \operatorname{Ker}(df_p)$ for all $p \in M^c$.

Proof The proof is carried out for the case $n = 3$ and $m = 2$. Let $c = (c_1, c_2) \in \operatorname{Im}(f)$ and $p = (x, y, z) \in f^{-1}(c)$. From the proof of Theorem 11, there are open neighborhoods $U_p \ni p$ and $W \ni F(p)$, with $W \subset f(U_p) \times \mathbb{R}$, and a diffeomorphism $F : U_p \to W$. Therefore $f^{-1}(c) \cap U_p$ is diffeomorphic to $\{(c_1, c_2, z) \mid z \in \mathbb{R}\} \cap W$ since
\[ f(x, y, z) = c \iff F(x, y, z) = (c_1, c_2, z). \]

Example 7 Examples of submanifolds.
(1) Orthogonal Group $O_n = \{A \in GL_n(\mathbb{R}) \mid A.A^t = A^t.A = I\}$. Consider $S_n = \{A \in M_n(\mathbb{R}) \mid A^t = A\}$ and $A_n = \{A \in M_n(\mathbb{R}) \mid A^t = -A\}$ to be the subspaces of symmetric and skew-symmetric matrices, respectively. Since any matrix $B \in M_n(\mathbb{R})$ decomposes as $B = \frac{B + B^t}{2} + \frac{B - B^t}{2}$, we have the direct sum $M_n(\mathbb{R}) = S_n \oplus A_n$. As vector spaces, $S_n = \mathbb{R}^{\frac{n(n+1)}{2}}$ and $A_n = \mathbb{R}^{\frac{n(n-1)}{2}}$. Defining the map $f : M_n(\mathbb{R}) \to S_n$, $f(A) = A.A^t$, the orthogonal group is given by $O_n = f^{-1}(I)$. To show that $O_n$ is a submanifold, it is sufficient to prove that $I \in S_n$ is a regular value of $f$. Taking $A \in O_n$, the derivative of $f$ at $A$ is given by
\[ df_A.V = \lim_{t\to 0}\frac{(A + tV)(A + tV)^t - AA^t}{t} = \lim_{t\to 0}\frac{t[AV^t + VA^t] + t^2VV^t}{t} = AV^t + VA^t \in S_n. \]
Since the equation $AV^t + VA^t = B$ admits a solution for all $B \in S_n$, the derivative $df_A$ is surjective. Indeed, it is easy to verify that $V = \frac{1}{2}BA$ is a solution. Since $I$ is a regular value of $f$, it follows that $O_n$ is a differentiable submanifold of $\mathbb{R}^{n^2}$. The tangent plane at $A$ is $T_AO_n = \operatorname{Ker}(df_A)$, so $\dim(O_n) = \frac{n(n-1)}{2}$. At $A = I$, the tangent plane is $T_IO_n = A_n$. The determinant of a matrix $A \in O_n$ is either $1$ or $-1$, so the group $O_n$ has two connected components. The connected component containing the identity is the subgroup $SO_n = \{A \in O_n \mid \det(A) = 1\}$ of special orthogonal matrices.
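The surjectivity argument for $df_A$ can be illustrated in the case $n = 2$. The sketch below picks a rotation $A \in O_2$ and an arbitrary symmetric matrix $B$ (both chosen only for the example), forms $V = \frac{1}{2}BA$, and checks that $AV^t + VA^t = B$.

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(row) for row in zip(*X)]

def d_f(A, V):
    """Derivative of f(A) = A.A^t at A in the direction V: A.V^t + V.A^t."""
    left, right = matmul(A, transpose(V)), matmul(V, transpose(A))
    n = len(A)
    return [[left[i][j] + right[i][j] for j in range(n)] for i in range(n)]

t = 0.4
A = [[math.cos(t), -math.sin(t)],      # a rotation, hence an element of O_2
     [math.sin(t),  math.cos(t)]]
B = [[2.0, -1.0],                      # an arbitrary symmetric target matrix
     [-1.0, 5.0]]
V = [[0.5 * entry for entry in row] for row in matmul(B, A)]   # V = (1/2) B A
image = d_f(A, V)
residual = max(abs(image[i][j] - B[i][j]) for i in range(2) for j in range(2))
```

The residual vanishes up to rounding, in agreement with the computation $AV^t = \frac{1}{2}AA^tB = \frac{1}{2}B$ and $VA^t = \frac{1}{2}BAA^t = \frac{1}{2}B$.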
(2) Unitary Group $U_n = \{A \in M_n(\mathbb{C}) \mid A.A^* = I\}$ ($A^* = \overline{A}^t$). Let $M_n(\mathbb{C})$ be the algebra of matrices with complex entries; let $H_n = \{A \in M_n(\mathbb{C}) \mid A^* = A\}$ be the subspace of Hermitian matrices and let $A_n^h = \{A \in M_n(\mathbb{C}) \mid A^* = -A\}$ be the subspace of skew-Hermitian matrices. For any unitary matrix $A \in U_n$, the following hold:
(i) Given two complex vectors $x$ and $y$, multiplication by $A$ preserves their inner product; that is, $\langle Ax, Ay\rangle = \langle x, y\rangle$.
(ii) $A$ is normal.
(iii) $A$ is diagonalizable; that is, $A$ is unitarily similar to a diagonal matrix, as a consequence of the spectral theorem. So $A$ has a decomposition of the form $A = PDP^*$, with $P$ unitary and $D$ diagonal and unitary.
(iv) $|\det(A)| = 1$.
(v) Its eigenspaces are orthogonal.
(vi) $A$ can be written as $A = e^S$, in which $e$ indicates the matrix exponential and $S$ is a skew-Hermitian matrix (equivalently, $S = iH$ with $i$ the imaginary unit and $H$ Hermitian).

The decomposition $A = \frac{A + A^*}{2} + \frac{A - A^*}{2}$ induces the direct sum $M_n(\mathbb{C}) = H_n \oplus A_n^h$. The real dimension of each space is given by $\dim_{\mathbb{R}}(H_n) = n^2$ and $\dim_{\mathbb{R}}(A_n^h) = n^2$. Considering the map $f : M_n(\mathbb{C}) \to H_n$, $f(X) = X.X^*$, the space of unitary matrices is given by $U_n = f^{-1}(I)$. The vector space $M_n(\mathbb{C})$ is isomorphic to $\mathbb{R}^{2n^2}$ and $H_n$ is a vector subspace (over $\mathbb{R}$) of $\mathbb{R}^{2n^2}$. We prove that $I$ is a regular value of $f$. Fix $A \in U_n$; the derivative of $f$ at $A$ is given by
\[ df_A.V = \lim_{t\to 0}\frac{(A + tV)(A + tV)^* - AA^*}{t} = \lim_{t\to 0}\frac{t[AV^* + VA^*] + t^2VV^*}{t} = AV^* + VA^* \in H_n. \]
Since the equation $AV^* + VA^* = B$ admits the solution $V = \frac{1}{2}B^*A$ for every $B \in H_n$, $df_A$ is surjective and $I$ is a regular value of $f$. The tangent plane at $A = I$ is $T_IU_n = A_n^h$. Hence $U_n = f^{-1}(I)$ is a submanifold of $\mathbb{R}^{2n^2}$ with dimension $\dim_{\mathbb{R}}(U_n) = n^2$. To prove that $U_n$ is connected, we use the property (iii) that any unitary matrix $A$, being diagonalizable, can be written as $A = PDP^*$, with $P$ unitary and $D = \operatorname{diag}(e^{i\theta_1}, \dots, e^{i\theta_n})$. So by considering $D(t) = \operatorname{diag}(e^{it\theta_1}, \dots, e^{it\theta_n})$, the path $\gamma : [0, 1] \to U_n$, $\gamma(t) = PD(t)P^*$, connects $I$ to $A$. Hence the group $U_n$ is connected. A second proof that $U_n$ is a submanifold uses the determinant function $\det : M_n(\mathbb{C}) \to \mathbb{C}$. Since the determinant of a unitary matrix is a complex number with norm 1, the determinant gives a group homomorphism $\det : U_n \to U_1$. However, since the determinant restricted to the case of $U_n$ takes values in $U_1$, and not in a Euclidean space, the proof that $U_n = \det^{-1}(U_1)$ is a submanifold requires concepts of transversality beyond the scope of this text.
(3) Unitary Special Group $SU_n = \{A \in U_n \mid \det(A) = 1\}$. To prove that $SU_n$ is a submanifold, we must use the property $\det(A) = 1$ for all $A \in SU_n$. As mentioned before, this situation requires further concepts.
Exercise Show that the groups On , SOn , Un and SUn are compact spaces.
6.5 Generalization of the Implicit Function Theorem

In $\mathbb{R}^{n+k} = \mathbb{R}^n \times \mathbb{R}^k$, consider the coordinates $(x, y)$, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^k$. Let $U \subset \mathbb{R}^{n+k}$ be an open set and assume that the functions $f_1, \dots, f_k : U \to \mathbb{R}$ are differentiable; then consider the non-linear system of equations
\[ \begin{cases} f_1(x, y) = c_1, \\ \quad\vdots \\ f_k(x, y) = c_k. \end{cases} \tag{43} \]
Next, define the map $F : U \to \mathbb{R}^k$, $F(x, y) = (f_1(x, y), \dots, f_k(x, y))$, and suppose that a point $(x_0, y_0) \in F^{-1}(c)$ is known, in which $c = (c_1, \dots, c_k) \in \mathbb{R}^k$. Suppose $\frac{\partial F}{\partial y}(x_0, y_0) = \Big(\frac{\partial f_i}{\partial y_j}(x_0, y_0)\Big)$ is an invertible matrix. Then the InFT can be applied to the map $\tilde{F} : U \to \mathbb{R}^{n+k}$, $\tilde{F}(x, y) = \big(x, F(x, y)\big)$, in a neighborhood of $(x_0, y_0)$. So we have open sets $U', V \subset \mathbb{R}^{n+k}$, $(x_0, y_0) \in U' \subset U$, such that $\tilde{F} : U' \to V$ is a diffeomorphism. Consider the coordinates in $V$ given by $(x, u) \in \mathbb{R}^n \times \mathbb{R}^k$ and let $\tilde{\xi} = \tilde{F}^{-1}$; we get
\[ (x, y_1, \dots, y_k) = \tilde{\xi}(x, u) = \big(x, \xi_1(x, u), \dots, \xi_k(x, u)\big) = (x, \xi(x, u)). \]
Therefore if $u = c \in \mathbb{R}^k$ is a fixed value, then $F^{-1}(c) \cap U' = \{(x, \xi(x, c))\}$; i.e., $F^{-1}(c)$ restricted to $U' \ni (x_0, y_0)$ is the graph of the map $x \mapsto \xi(x, c)$, defined on an open subset of $\mathbb{R}^n$ with values in $\mathbb{R}^k$.
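A small numerical illustration of this construction, assuming the hypothetical case $n = k = 1$ with $F(x, y) = x^2 + y^2$ and $c = 1$: starting from the known solution $(x_0, y_0) = (0, 1)$, where $\partial F/\partial y \neq 0$, the function $\xi(\cdot, c)$ is recovered by solving $F(x, y) = c$ for $y$ with Newton's method. The solver and its parameters are illustrative choices, not part of the text.

```python
import math

def F(x, y):
    """Hypothetical example with n = k = 1: F(x, y) = x^2 + y^2."""
    return x * x + y * y

def dF_dy(x, y):
    return 2.0 * y

def xi(x, c, y_start, steps=50):
    """Solve F(x, y) = c for y by Newton's method, starting near y_start."""
    y = y_start
    for _ in range(steps):
        y -= (F(x, y) - c) / dF_dy(x, y)
    return y

c = 1.0
x0, y0 = 0.0, 1.0                      # known solution, dF/dy(x0, y0) = 2 != 0
samples = [-0.3, -0.1, 0.0, 0.2, 0.4]
graph = [(x, xi(x, c, y0)) for x in samples]
max_residual = max(abs(F(x, y) - c) for x, y in graph)
max_gap = max(abs(y - math.sqrt(1.0 - x * x)) for x, y in graph)
```

Near $(0, 1)$ the level set is the graph of $\xi(x, 1) = \sqrt{1 - x^2}$, which is what the iteration reproduces.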
Exercises
(1) Let $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$ be open sets and $f \in C^1(U, V)$. Prove the following:
(a) The local form of immersions ($n < m$).
(b) The local form of submersions ($n > m$).
(2) Prove that the set of $2 \times 2$ matrices having rank 1 is a 3-submanifold of $\mathbb{R}^4$.
(3) Consider $p : \mathbb{R}^n \to \mathbb{R}$ to be a homogeneous polynomial of degree $k$ and $c \neq 0$. Prove that $p^{-1}(c)$ is an $(n-1)$-submanifold of $\mathbb{R}^n$.
(4) Prove that any submersion is an open map. Conclude that if $K$ is a compact set, then there is no submersion $f : K \to \mathbb{R}^m$.
(5) Let $U_1 \subset \mathbb{R}^n$, $U_2 \subset \mathbb{R}^m$ be open subsets and $U = \{(x, y) \mid x \in U_1, y \in U_2\} = U_1 \times U_2$. Consider $f : U_1 \times U_2 \to \mathbb{R}^m$ as the map $f(x, y) = (f_1(x, y), \dots, f_m(x, y))$, such that $f_1, \dots, f_m : U \to \mathbb{R}$ are $C^1$-functions. Let $df_{(x,y)} = d_1f_{(x,y)} + d_2f_{(x,y)} : \mathbb{R}^n \oplus \mathbb{R}^m \to \mathbb{R}^m$ be the derivative of $f$ and assume that $d_2f_{(x_0,y_0)}$ is invertible. Using this setting, state and prove a generalization of the Implicit Function Theorem, and also prove that the set $M = f^{-1}(0)$ is locally the graph of a map $\xi : \mathbb{R}^n \to \mathbb{R}^m$.
(6) Find the set of critical values of the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = x^2 + y^2 - z^2$. Show that if $c_1$ and $c_2$ are regular values of $f$ having the same sign, then $f^{-1}(c_1)$ and $f^{-1}(c_2)$ are diffeomorphic.
(7) Let $Q : \mathbb{R}^n \to \mathbb{R}$ be a non-degenerate quadratic form. The spectrum $\sigma(Q)$ of $Q$ admits a decomposition $\sigma(Q) = \sigma^+ \cup \sigma^-$, in which $\sigma^+ = \sigma(Q) \cap (0, \infty)$ and $\sigma^- = \sigma(Q) \cap (-\infty, 0)$. Since $Q$ is diagonalizable, $\mathbb{R}^n$ has a basis $\beta = \{u_1^+, \dots, u_p^+, u_1^-, \dots, u_q^-\}$ of eigenvectors of $Q$, with $Q(u_i^\pm) = \lambda_i^\pm u_i^\pm$, $\lambda_i^+ > 0$ and $\lambda_i^- < 0$. By Sylvester's theorem, quadratic forms are classified by their rank $n = p + q$ and by their signature $p - q$. Show that the set $SO(p, q) = \{A \in GL_n(\mathbb{R}) \mid A.Q.A^t = Q\}$ is a group and a submanifold. Find its dimension.
(8) Let $I_n$ be the $n \times n$ identity matrix and $J_0 = \begin{pmatrix} 0 & -I_n \\ I_n & 0 \end{pmatrix}$. The symplectic group is defined as $Sp_{2n}(\mathbb{R}) = \{A \in GL_{2n}(\mathbb{R}) \mid A^tJ_0A = J_0\}$. Show that $Sp_{2n}(\mathbb{R})$ is a submanifold.
(9) Consider the linear transformation $J_0 : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ whose matrix $J_0$ is defined in the last item. The real vector space $\mathbb{R}^{2n}$ and the complex vector space $\mathbb{C}^n$ can be identified by taking the canonical basis $\beta = \{e_1, e_2, \dots, e_i, e_{i+1}, \dots, e_{2n-1}, e_{2n}\}$ and defining the complex orthonormal basis $\beta^{\mathbb{C}} = \{e_1 + J_0e_2, \dots, e_i + J_0e_{i+1}, \dots, e_{2n-1} + J_0e_{2n}\}$. Show that $GL_n(\mathbb{C}) \subset GL_{2n}(\mathbb{R})$ and $U_n \subset Sp(2n)$ are subgroups; indeed, they are submanifolds $\big(A \in GL_n(\mathbb{C}) \Leftrightarrow AJ_0 = J_0A\big)$.
(10) Using the Gram-Schmidt procedure to obtain an orthonormal basis, show that any compact subset of $GL_n(\mathbb{R})$ can be continuously deformed into a compact subset of $O_n$. Indeed, given a compact $K \subset GL_n(\mathbb{R})$, we have a continuous map $F : [0, 1] \times K \to GL_n(\mathbb{R})$ such that $F(0, x) = x$ and $F(1, x) \in O_n$.
(11) Let $(x_0, y_0)$ be a solution to the non-linear system (43). Prove that if the gradient vectors $\nabla f_1(x_0, y_0), \dots, \nabla f_k(x_0, y_0)$ are linearly independent in a neighborhood of $(x_0, y_0)$, then the solution set is an $n$-submanifold of $\mathbb{R}^{n+k}$. Conclude that the solution of the system below is not a 1-submanifold of $\mathbb{R}^3$:
\[ x^3 - y^2 + z = 0, \qquad xy - z = 0. \]
7 Fundamental Theorem of Algebra

This section is devoted to applying the theory of differentiable maps to prove the Fundamental Theorem of Algebra (FTA) on $\mathbb{R}$. The proof relies only on the properties of differentiable maps; the condition that the field be algebraically closed is not necessary. The complex case will be left as an exercise. This proof is due to Pukhlikov [37].

Polynomials of one real variable define the ring $\mathbb{R}[x]$. Let $\widetilde{P}_d \subset \mathbb{R}[x]$ be the vector subspace of polynomials of degree at most $d$. An element $P \in \widetilde{P}_d$ is represented by
\[ P(x) = a_dx^d + a_{d-1}x^{d-1} + \dots + a_1x + a_0, \]
in which $a_d, a_{d-1}, \dots, a_1, a_0$ are real numbers, with $a_d \neq 0$ when $P$ has degree exactly $d$. $\widetilde{P}_d$ is an $\mathbb{R}$-vector space identified with $\mathbb{R}^{d+1}$ by the vector space isomorphism
\[ \widetilde{P}_d \to \mathbb{R}^{d+1}, \quad a_dx^d + \dots + a_0 \mapsto (a_d, \dots, a_0). \]
We introduce in $\widetilde{P}_d$ the equivalence relation "$P(x) \sim Q(x) \Leftrightarrow$ there is $\lambda \in \mathbb{R}^*$ such that $Q(x) = \lambda P(x)$", so every polynomial $P(x) \in \widetilde{P}_d$ of degree $d$ can be identified with a monic polynomial
\[ P(x) = x^d + \frac{a_{d-1}}{a_d}x^{d-1} + \dots + \frac{a_1}{a_d}x + \frac{a_0}{a_d}. \]
The space $P_d$ of monic polynomials in $\widetilde{P}_d$ can be identified with $\mathbb{R}^d \cong \{1\} \times \mathbb{R}^d \subset \mathbb{R}^{d+1}$. The quotient space $(\widetilde{P}_d \setminus \{0\})/\sim$ is exactly the projective space $\mathbb{RP}^d = (\mathbb{R}^{d+1} \setminus \{0\})/\sim$.

Remarks
(1) As described in Appendix B, the real projective spaces $\mathbb{RP}^n$, $n \in \mathbb{N}$, can be assembled with the inductive process
\[ \mathbb{RP}^1 = S^1, \qquad \mathbb{RP}^n = \mathbb{R}^n \cup \mathbb{RP}^{n-1}. \tag{44} \]
(2) The space $P_d$ of monic polynomials with degree $d$ is identified as $\mathbb{R}^d \subset \mathbb{RP}^d$, and the space of polynomials of degree $\le (d - 1)$, up to the relation $\sim$, is identified as $\mathbb{RP}^{d-1} = \{(0 : a_{d-1} : \dots : a_0)\} \subset \mathbb{RP}^d$.

Theorem 13 (FTA). Every non-constant polynomial $P(x) \in \mathbb{R}[x]$ can be factored into a product of linear and quadratic factors.

Example Let $P(x) \in P_4$. Factoring an arbitrary monic polynomial $P(x) = x^4 + Ax^3 + Bx^2 + Cx + D$ as
\[ x^4 + Ax^3 + Bx^2 + Cx + D = (x^2 + ax + b)(x^2 + cx + d), \]
we get a system of four equations in four unknowns:
\[ a + c = A, \quad b + d + ac = B, \quad ad + bc = C, \quad bd = D. \]
Rewriting the system in the variables $x$, $y$, $z$ and $w$, we have
\[ f : \mathbb{R}^4 \to \mathbb{R}^4, \quad (x, y, z, w) \mapsto (x + z,\ y + w + xz,\ xw + yz,\ yw). \]
We would like to know if the map $f$ above is surjective. The main ingredients to understand the topology of the solution set for the equation $f(x, y, z, w) = (A, B, C, D)$ are the Inverse Function Theorem and the fact that $f$ is a proper map.

Let $X$ and $Y$ be non-compact locally compact Hausdorff spaces. Take compact sets $K_X$ and $K_Y$ and let $\widehat{X} = X \cup K_X$ and $\widehat{Y} = Y \cup K_Y$ be compactifications of $X$ and $Y$, respectively. If a map $f : X \to Y$ admits a continuous extension $\hat{f} : \widehat{X} \to \widehat{Y}$ with $\hat{f}(K_X) \subset K_Y$, then $f$ is proper.⁴ There are two important cases for our purposes. The first is used in the next section regarding the Jacobian Conjecture and the second is used to prove the FTA.
(i) Let $\widehat{\mathbb{R}^n} = \mathbb{R}^n \cup \{\infty\}$ be the one point compactification by stereographic projection. We know that $\widehat{\mathbb{R}^n} = S^n$.
(ii) The real projective space $\mathbb{RP}^n = \mathbb{R}^n \cup \mathbb{RP}^{n-1}$ is a compactification of $\mathbb{R}^n$ by adding a copy of $\mathbb{RP}^{n-1}$ at infinity.

The image $f(X)$ of a proper map $f : X \to Y$ between two topological Hausdorff spaces is closed⁵ in $Y$. Furthermore, if $Y$ is connected and $f(X) \subset Y$ is open, then $f$ is surjective. The polynomial ring $\mathbb{R}[x]$ is composed of a chain of linear subspaces, each corresponding to a space of monic polynomials:
\[ \mathbb{R} = P_0 \subset P_1 \subset \dots \subset P_n \subset \dots \subset \mathbb{R}[x]. \]
For $d \ge 3$ and $1 \le k \le d - 1$, we define the multiplication map
\[ \mu_k : P_k \times P_{d-k} \to P_d, \quad (P, Q) \mapsto P(x) \cdot Q(x). \tag{45} \]
The image of $\mu_k$ corresponds to the polynomials that can be factored as a product of a polynomial of degree $k$ with another polynomial of degree $(d - k)$.
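For the quartic example, the map $f$ is the multiplication map $\mu_2$ written in coefficients. The sketch below evaluates $f$ at the point corresponding to the known factorization $x^4 + 1 = (x^2 + \sqrt{2}x + 1)(x^2 - \sqrt{2}x + 1)$ and compares it with a direct polynomial multiplication; the helper `poly_mul` is an illustrative utility, not notation from the text.

```python
import math

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def f(x, y, z, w):
    """The map f : R^4 -> R^4 of the example, sending (a, b, c, d) to (A, B, C, D)."""
    return (x + z, y + w + x * z, x * w + y * z, y * w)

s = math.sqrt(2.0)
a, b, c, d = s, 1.0, -s, 1.0
coeffs = f(a, b, c, d)                              # (A, B, C, D) of x^4 + 1
product = poly_mul([1.0, a, b], [1.0, c, d])        # coefficients of the product
mismatch = max(abs(u - v) for u, v in zip(product[1:], coeffs))
```

So the point $(0, 0, 0, 1)$, which represents $x^4 + 1$, lies in the image of $f$ even though $x^4 + 1$ has no real roots.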
⁴ See Appendix B.
⁵ See Appendix A.
Proposition 3 For $1 \le k \le d - 1$, the mapping $\mu_k : P_k \times P_{d-k} \to P_d$ is proper and differentiable.

Proof $\mu_k$ is a proper map since it extends to a map $\hat{\mu}_k : \mathbb{RP}^k \times \mathbb{RP}^{d-k} \to \mathbb{RP}^d$, with $P_k = \mathbb{R}^k \subset \mathbb{RP}^k$. To differentiate, we consider $\mu_k$ as a map $\mu_k : \mathbb{R}^k \times \mathbb{R}^{d-k} \to \mathbb{R}^d$. For the polynomials $P = (P_1, P_2) \in P_k \times P_{d-k}$ and $T = (f, g) \in P_{k-1} \times P_{d-k-1}$, we have
\[ d(\mu_k)_P.T = \lim_{\epsilon\to 0}\frac{\mu_k(P + \epsilon T) - \mu_k(P)}{\epsilon} = gP_1 + fP_2. \tag{46} \]

Corollary 2 If $d = d_1 + \dots + d_k$, $d_i \in \mathbb{N}$, then the map $\mu : P_{d_1} \times P_{d_2} \times \dots \times P_{d_k} \to P_d$ given by $\mu(P_1, \dots, P_k) = P_1 \cdot P_2 \cdots P_{k-1} \cdot P_k$ is proper.

Next, we prove the FTA.

Proof (FTA) The argument is by induction on $d$. The theorem is true for $d = 1$ and $d = 2$. For $d = 3$, it is also true since every $P(x) \in P_3$ admits at least one real root. Let $W_k = \mu_k(P_k \times P_{d-k}) \subset P_d$ be the subset of reducible polynomials and define
\[ W(d) = \bigcup_{k=1}^{d-1} W_k. \]
The proof is completed if the induction principle leads us to the identity $P_d = W(d)$. The proof is divided into four claims:

1st Claim: $W(d) \subset P_d$ is closed in $P_d$.
Pf: Each map $\mu_k$ is proper, so its image is closed. $W(d)$ is a finite union of closed subsets, and hence is closed.

The subset $W(d)$ would be an open subset in $P_d$ if every polynomial $\mu_k(P)$, $P = (P_1, P_2)$, in $W(d)$ had an open neighborhood $U \subset P_d$ contained in $W(d)$. Here the InFT comes into play. The derivative of $\mu_k : P_k \times P_{d-k} \to P_d$ defined by Eq. (46) is $d(\mu_k)_P(f, g) = gP_1 + fP_2$. Recall that the pair $(P_1, P_2)$ is relatively prime if and only if the greatest common divisor $(P_1, P_2) = \gcd(P_1, P_2) = 1$. In this way, the kernel of $d(\mu_k)_P$ is
\[ \operatorname{Ker} d(\mu_k)_P \ \begin{cases} = 0, & \text{if } (P_1, P_2) = 1, \\ \neq 0, & \text{otherwise.} \end{cases} \]
Let $D = \{P = (P_1, P_2) \mid (P_1, P_2) \neq 1\} \subset P_k \times P_{d-k}$ be the space of degenerate monic polynomials. $D$ is closed in $W(d)$. Consequently, if $P = (P_1, P_2) \in W(d) \setminus D$, then there are open neighborhoods $U \subset P_k \times P_{d-k}$ of $P$ and $V \subset P_d$ of $P_1 \cdot P_2$, such that $\mu_k : U \to V$ is a diffeomorphism. Therefore every element in $V$ can be factored into the product of nonconstant relatively prime polynomials.

2nd Claim: $W(d) \setminus D$ is open in $P_d \setminus D$.
Pf: By the InFT, $W(d) \setminus D$ is open in $P_d$ for all $1 \le k \le d - 1$. In particular, $W(d) \setminus D$ is open in $P_d \setminus D$.

3rd Claim: $W(d) \setminus D$ is closed in $P_d \setminus D$.
Pf: Since $W(d)$ is closed in $P_d$, it follows that $W(d) \setminus D$ is closed in $P_d \setminus D$.

4th Claim: $P_d \setminus D = W(d) \setminus D$.
Pf: The claim follows if $P_d \setminus D$ is connected. Let's describe the set $D$ of monic polynomials that cannot be written as a product of nonconstant relatively prime polynomials. Given an element $P = (P_1, P_2) \in D$, by the induction hypothesis $P_1$ and $P_2$ can be factored into linear or quadratic polynomials and they have a common factor. There are two possibilities:
(1) $d$ is odd. In this case, the only possibility is $P(x) = (x + a)^d$. So $D = \{(x + a)^d \mid a \in \mathbb{R}\}$ is an embedded smooth curve in $\mathbb{R}^d$, i.e.,
\[ (x + a)^d \mapsto \Big(\binom{d}{1}a, \binom{d}{2}a^2, \dots, \binom{d}{d}a^d\Big). \]
(2) $d$ is even. In this case, the possibilities are either $P(x) = (x + a)^d$ or $P(x) = (x^2 + bx + c)^{d/2}$. Therefore
\[ D = \{(x + a)^d \mid a \in \mathbb{R}\} \cup \{(x^2 + bx + c)^{d/2} \mid b, c \in \mathbb{R}\} \]
is the union of an embedded curve in $\mathbb{R}^d$ with an embedded surface parametrized by $(b, c)$. If $d = 6$, the surface is given by
\[ \phi : (b, c) \mapsto (3b,\ 3b^2 + 3c,\ b^3 + 6bc,\ 3b^2c + 3c^2,\ 3bc^2,\ c^3). \]
The map $\phi$ is an immersion since the matrix
\[ d\phi_{(b,c)} = \begin{pmatrix} 3 & 6b & 3b^2 + 6c & 6bc & 3c^2 & 0 \\ 0 & 3 & 6b & 3b^2 + 6c & 6bc & 3c^2 \end{pmatrix} \]
has rank 2. Of course, the embedding and the smoothness of the surface can be checked for all $d \in \mathbb{N}$.
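The six coordinates of $\phi$ for $d = 6$ are the non-leading coefficients of $(x^2 + bx + c)^3$, namely $x^6 + 3bx^5 + (3b^2 + 3c)x^4 + (b^3 + 6bc)x^3 + (3b^2c + 3c^2)x^2 + 3bc^2x + c^3$. The sketch below confirms this expansion by exact integer polynomial multiplication on a small grid of values; the helper names are illustrative.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def phi(b, c):
    """Non-leading coefficients of (x^2 + b x + c)^3, from x^5 down to x^0."""
    return [3 * b, 3 * b**2 + 3 * c, b**3 + 6 * b * c,
            3 * b**2 * c + 3 * c**2, 3 * b * c**2, c**3]

def cube(b, c):
    """Coefficients of (x^2 + b x + c)^3 obtained by repeated multiplication."""
    q = [1, b, c]
    return poly_mul(poly_mul(q, q), q)

checks = [cube(b, c)[1:] == phi(b, c)
          for b in range(-3, 4) for c in range(-3, 4)]
all_match = all(checks)
```

Since the arithmetic is on integers, the comparison is exact rather than approximate.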
If $d \ge 4$, the codimension of $D$ is greater than or equal to 2. It is well-known from topology that a smooth subspace with codimension $\ge 2$ does not disconnect $\mathbb{R}^n$. Hence $P_d \setminus D$ is connected. Since $W(d) \setminus D = P_d \setminus D$ and $D \subset W(d)$, it follows that $P_d = W(d)$. Therefore every polynomial $P(x) \in P_d$ can be factored into the product of linear and quadratic polynomials. Applying the Principle of Induction, we conclude the proof of the Fundamental Theorem of Algebra over the field $\mathbb{R}$.
Exercises (1) Show the Fundamental Theorem of Algebra when the field is C. In this case, prove that every polynomial can be factored into linear irreducible polynomials.
8 Jacobian Conjecture

In the preceding sections, the study of differentiable maps defined on $\mathbb{R}^n$ was carried out in great detail. The main concepts and theorems were carefully introduced. To conclude this chapter, we describe a problem that has been open since 1939, which is very easy to state and basic enough considering the elements addressed in the question. In this section, the field $\mathbb{K}$ is either $\mathbb{R}$ or $\mathbb{C}$. The Jacobian of a differentiable map $f : \mathbb{K}^n \to \mathbb{K}^n$ at $x \in \mathbb{K}^n$ is defined to be the determinant $J(f) = \det(df_x)$. Let $\mathbb{K}[x_1, \dots, x_n]$ be the polynomial ring of $\mathbb{K}^n$. A polynomial mapping $P : \mathbb{K}^n \to \mathbb{K}^n$ is a map given by $P(x_1, \dots, x_n) = \big(p_1(x_1, \dots, x_n), \dots, p_n(x_1, \dots, x_n)\big)$, with $p_i \in \mathbb{K}[x_1, \dots, x_n]$, $1 \le i \le n$. The degree of $P$ is defined as $\deg(P) = \max_i \deg(p_i)$. We may use the bold notation $\mathbf{x} = (x_1, \dots, x_n)$ when referring to a point $\mathbf{x} \in \mathbb{K}^n$.

Jacobian Conjecture $JC_n^0$: Let $P : \mathbb{K}^n \to \mathbb{K}^n$ be a polynomial mapping. Then $P$ is invertible and its inverse is also a polynomial mapping if and only if the Jacobian $J(P)$ is constant and nonzero ($J(P) \in \mathbb{K}^*$).

Remarks
(1) A polynomial automorphism is a polynomial map with an inverse that is well-defined and is also a polynomial map.
(2) Let $P : \mathbb{K}^n \to \mathbb{K}^n$ be a polynomial automorphism and let $Q$ be its inverse. So $Q(P(\mathbf{x})) = \mathbf{x}$ implies $dQ_{P(\mathbf{x})}.dP_{\mathbf{x}} = I$. The Jacobians $J(P)$ and $J(Q)$ are also polynomials. Therefore $J(Q \circ P) = J(Q).J(P) = 1 \Rightarrow \deg(J(P)) = \deg(J(Q)) = 0$. Hence $J(P) \in \mathbb{K}^*$.
(3) The $JC^0_n$-Conjecture on $\mathbb{R}$ is sometimes called the Strong $JC^0_n$-Conjecture. This is because if the conjecture is true for real polynomial maps $P : \mathbb{R}^m \to \mathbb{R}^m$, then it is also true for complex polynomial maps $\tilde P : \mathbb{C}^n \to \mathbb{C}^n$. By identifying $\mathbb{C}^n$ with $\mathbb{R}^{2n}$, such that $z = x + iy \mapsto (x, y)$, a complex polynomial map $\tilde P : \mathbb{C}^n \to \mathbb{C}^n$ induces a real polynomial map $P : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$, and
$$J(P)(x) = \det(dP_x) = |\det(d\tilde P_z)|^2 = |J(\tilde P)(z)|^2. \tag{47}$$
Therefore we have $J(\tilde P) \in \mathbb{C}^*$ if and only if $J(P) \in \mathbb{R}^*$.

The Jacobian Conjecture is attributed to O.H. Keller, who first posed it in [26]. So far, the only case in which the conjecture has been settled is dimension $n = 1$. There are variants of the conjecture, which are important for understanding the issues involved in the question. So, before diving into the question, let's introduce some notation and concepts.

Consider two subsets $X$ and $Y$ of $\mathbb{K}^n$ and a category $\mathcal{F}$ of maps. Let $\mathcal{F}_X$ and $\mathcal{F}_Y$ be the categories attributed to $X$ and $Y$, respectively. A map $f : X \to Y$ induces a ring morphism $f^* : \mathcal{F}_Y \to \mathcal{F}_X$ by $f^*(g) = g \circ f$. Let $\mathbb{K}[X]$ be the class of polynomial mappings on $X$. The $JC^0_n$-Conjecture can be rephrased in this setting.

$JC^0_n$-Conjecture: Let $P : X \to Y$ be a polynomial mapping. The morphism $P^* : \mathbb{K}[Y] \to \mathbb{K}[X]$ is a ring automorphism if and only if the Jacobian $J(P) = \det(dP_x) \in \mathbb{K}^*$.

Originally, the question was raised in Algebraic Geometry; however, it can be extended to larger categories of maps. We can define extensions of the conjecture by considering other categories of maps: given a category $\mathcal{F}$, let $JC^{\mathcal{F}}$ be the corresponding Jacobian Conjecture.
(i) $C^\infty$, the category of smooth maps: $JC^\infty_n$.
(ii) $C^\omega$, the category of analytic maps: $JC^\omega_n$.
(iii) $\mathcal{R}$, the category of rational maps: $JC^{\mathcal{R}}_n$.

It is natural to consider the following weaker form of the Jacobian Conjecture when the field is $\mathbb{K} = \mathbb{R}$.

WJC-Conjecture: Let $F : X \to Y$ be a map in $\mathcal{F}$. The induced morphism $F^* : \mathbb{K}[Y] \to \mathbb{K}[X]$ is a ring automorphism if and only if $J(F) \neq 0$.

We mentioned different extensions of the JC-Conjecture in order to broaden the reader's knowledge of the question, since all of them are addressed in the references. The sub-index 0 is used exclusively for the original conjecture formulated for polynomials. A local diffeomorphism $f : \mathbb{K}^n \to \mathbb{K}^n$ induces a local change of coordinates. If the conjecture were true, $f$ would be a global change of coordinates.
The conjecture can be formulated using a general field K. Our interest is restricted to the cases either K = C or K = R, but it can be addressed for fields with arbitrary characteristic k ≥ 0.
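As a concrete check of identity (47), consider the simplest case $n = 1$ with the complex linear map $\tilde P(z) = (a + ib)z$. The constants $a, b \in \mathbb{R}$ are chosen here only for illustration and do not appear in the text. Writing $z = x + iy$, the induced real map and its Jacobian can be worked out as follows:

```latex
\begin{aligned}
\tilde P(z) &= (a+ib)(x+iy) = (ax - by) + i\,(bx + ay), \\
P(x,y) &= (ax - by,\; bx + ay), \\
dP_{(x,y)} &= \begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \\
J(P)(x,y) &= a^2 + b^2 = |a+ib|^2 = |J(\tilde P)(z)|^2 .
\end{aligned}
```

In particular, $J(P) \neq 0$ exactly when $J(\tilde P) \neq 0$, in agreement with the remark on the Strong Conjecture.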
8.1 Case n = 1

We intend to shed light where there is darkness; that is, we would like to help the reader become sensitive to the difficulties inherent in the problem. So we start by considering the simplest case:

Question 1: ($WJC^\infty_1$) Suppose $f : \mathbb{R} \to \mathbb{R}$ is a local diffeomorphism. Is it true that $f$ is a global diffeomorphism?

Assuming $f$ is a $C^1$-map such that $f'(x) \neq 0$ for all $x \in \mathbb{R}$, the question is whether $f$ is injective and surjective. This case is rather exceptional, since in dimension $n = 1$ we can count on the following theorems.

Theorem 14 (Intermediate Value). Let $f : [a, b] \to \mathbb{R}$ be a continuous function. Then $f$ attains all values between $f(a)$ and $f(b)$.

Theorem 15 (Mean Value). Let $f : \mathbb{R} \to \mathbb{R}$ be a $C^1$-function. For every pair $a, b \in \mathbb{R}$ such that $a < b$, there is $c \in (a, b)$ such that
$$f(b) - f(a) = f'(c)(b - a). \tag{48}$$
Corollary 3 If $n = 1$, then a local diffeomorphism $f : \mathbb{R} \to \mathbb{R}$ is injective.

Proof It follows straightforwardly from the Mean Value Theorem. If there were $a, b \in \mathbb{R}$, $a < b$, such that $f(a) = f(b)$, then by Eq. (48) there would be $c \in (a, b)$ such that $f'(c) = 0$, contradicting the hypothesis. Hence $f$ is injective.

The surjectivity is more subtle, as we can see from the next examples.

Examples
(1) The function $f : \mathbb{R} \to \mathbb{R}$, $f(x) = \arctan(x)$, satisfies the following conditions:
(1i) $f'(x) = \frac{1}{1+x^2} > 0$ for all $x \in \mathbb{R}$,
(1ii) $\lim_{x\to\infty} f'(x) = 0$,
(1iii) $\lim_{x\to-\infty} f'(x) = 0$,
(1iv) $f$ is not surjective since $f(\mathbb{R}) \subset (-\frac{\pi}{2}, \frac{\pi}{2})$.
(2) The function $\tanh : \mathbb{R} \to \mathbb{R}$ defined by
$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \;\Rightarrow\; (\tanh)'(x) = \frac{4}{(e^x + e^{-x})^2} > 0,$$
satisfies conditions (1i)-(1iv) in item (1). It is not surjective since $\tanh(\mathbb{R}) = (-1, 1)$.

Remark From the examples above, we learn that the conditions $\lim_{x\to\pm\infty} f'(x) = 0$ are not desirable when trying to prove the surjectivity of $f$.
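(3) The function $f : \mathbb{R} \to \mathbb{R}$, $f(x) = e^x$, satisfies $f'(x) = e^x > 0$ and defines a diffeomorphism $f : \mathbb{R} \to (0, \infty)$ with inverse $f^{-1} : (0, \infty) \to \mathbb{R}$ given by $f^{-1}(x) = \ln(x)$. Therefore $f$ satisfies both conditions (1i) and (1iii). As a map into $\mathbb{R}$, $f$ is not surjective since $f(\mathbb{R}) = (0, \infty)$.
(4) The function $f : (0, \infty) \to (0, \infty)$, $f(x) = \sqrt{x}$, with derivative $f'(x) = \frac{1}{2\sqrt{x}}$, satisfies conditions (1i) and (1ii). However, $f$ is surjective.

Therefore the Conjecture $WJC^\infty_1$ is false. We need extra conditions on $f$.

Remark Example (4) above suggests we could construct a diffeomorphism $h : \mathbb{R} \to \mathbb{R}$ by composing the following maps:
$$\mathbb{R} \xrightarrow{\;e^x\;} (0, \infty) \xrightarrow{\;\sqrt{x}\;} (0, \infty) \xrightarrow{\;\ln(x)\;} \mathbb{R}, \qquad x \longmapsto h(x) = \ln\big(\sqrt{e^x}\big) = \frac{1}{2}x. \tag{49}$$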
It turns out that $h$ is a polynomial of degree 1.

Without loss of generality, fix $a \in \mathbb{R}$ and let $f : \mathbb{R} \to \mathbb{R}$ be a $C^1$-function such that $f'(x) > 0$ for all $x \in \mathbb{R}$. So $f$ is monotone increasing. Given a sequence $\{x_n\}_{n\in\mathbb{N}} \subset \mathbb{R}$ such that $\lim_{n\to\infty} x_n = \infty$, we have a sequence $\{c_n\}_{n\in\mathbb{N}} \subset \mathbb{R}$ such that $c_n \in (a, x_n)$ and
$$f(x_n) - f(a) = f'(c_n)(x_n - a) > 0. \tag{50}$$
From the examples above, we learn that the right-hand side of Eq. (50) reveals the behavior of the function when $n \to \infty$. The surjectivity does depend on the decay of $f'$, which is determined by $\lim_{n\to\infty} |f'(c_n)|$. There are two possibilities:
(1) $\lim_{n\to\infty} |f'(c_n)(x_n - a)| = L < \infty$, as in examples (1) and (2);
(2) $\lim_{n\to\infty} |f'(c_n)(x_n - a)| = \infty$, as in examples (3) and (4).
Examples (1), (2) and (3) above are not proper functions, while example (4) is proper. Examples (3) and (4) are in the twilight zone, as we can see by assuming the existence of $r > 0$ such that $|f'(x)| > r$ for all $x \in \mathbb{R}$. In this case the limit $\lim_{n\to\infty} |f'(c_n)(x_n - a)| = \infty$ yields $\lim_{n\to\infty} f(x_n) = \infty$. Therefore the surjectivity of $f$ follows from the Intermediate Value Theorem. The following theorem closes Question 1 when $n = 1$.

Theorem 16 Let $f : \mathbb{R} \to \mathbb{R}$ be a local diffeomorphism, so that $|f'(x)| > 0$ for all $x \in \mathbb{R}$. Then $f$ is a global diffeomorphism if and only if, for any fixed $a \in \mathbb{R}$, we get
$$\lim_{x\to\pm\infty} |f'(x)(x - a)| = \infty. \tag{51}$$
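The quantity $|f'(x)(x - a)|$ appearing in the discussion above can be tabulated numerically for the four examples. The following is a minimal sketch, assuming the base point $a = 1$ and a handful of sample points; the helper name `decay_profile` and the sample values are illustrative choices, not taken from the text.

```python
import math

def decay_profile(fprime, a, xs):
    """Return |f'(x) * (x - a)| at each sample point x."""
    return [abs(fprime(x) * (x - a)) for x in xs]

a = 1.0                          # arbitrary base point (assumption)
xs_pos = [1e1, 1e2, 1e3, 1e4]    # samples with x -> +infinity
xs_neg = [-x for x in xs_pos]    # samples with x -> -infinity

# Derivatives of the four examples discussed in the text.
arctan_vals = decay_profile(lambda x: 1.0 / (1.0 + x * x), a, xs_pos)
tanh_vals = decay_profile(lambda x: 1.0 - math.tanh(x) ** 2, a, xs_pos)
exp_vals = decay_profile(math.exp, a, xs_neg)   # e^x is examined as x -> -infinity
sqrt_vals = decay_profile(lambda x: 0.5 / math.sqrt(x), a, xs_pos)

for name, vals in [("arctan", arctan_vals), ("tanh", tanh_vals),
                   ("exp (x -> -inf)", exp_vals), ("sqrt", sqrt_vals)]:
    print(name, ["%.3e" % v for v in vals])
```

In this sample the values stay bounded for arctan, tanh and for $e^x$ on the negative side, while they keep growing for the square root, the only surjective example of the four.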
Finally, for the case $n = 1$, we list the following facts:
(1) The $JC^0_1$ conjecture is true. Let $P(x) = \sum_{i=0}^{k} p_i x^i$ be an invertible polynomial and let $Q(x) = \sum_{j=0}^{l} q_j x^j$ be the inverse polynomial, i.e., $(Q \circ P)(x) = x$. Since $Q'(P(x)) \cdot P'(x) = 1$ for all $x \in \mathbb{R}$, we get $\deg(P') = \deg(Q') = 0$ and $\deg(P) = \deg(Q) = 1$. Therefore $P(x) = p_0 + p_1 x \Rightarrow Q(x) = q_0 + p_1^{-1} x$.
(2) The $WJC^0_1$-Conjecture is false. The polynomial $p(x) = x^3 + x$ satisfies $p'(x) = 3x^2 + 1 > 0$ and $p(\mathbb{R}) = \mathbb{R}$. So $p$ is a global diffeomorphism; nevertheless, $p^{-1}(x)$ is not a polynomial (see footnote 6).
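To see concretely why the inverse of $p(x) = x^3 + x$ is not a polynomial, one can solve the cubic $x^3 + x - y = 0$ in closed form. The sketch below uses Cardano's formula for a depressed cubic with positive discriminant (a standard technique, not spelled out in the text); the function names are hypothetical.

```python
import math

def p(x):
    """The polynomial p(x) = x^3 + x from item (2)."""
    return x ** 3 + x

def real_cbrt(v):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def p_inverse(y):
    """Unique real root of x^3 + x - y = 0 via Cardano's formula.

    Since y^2/4 + 1/27 > 0, the cubic has exactly one real root.
    """
    s = math.sqrt(y * y / 4.0 + 1.0 / 27.0)
    return real_cbrt(y / 2.0 + s) + real_cbrt(y / 2.0 - s)

for y in (-2.0, 0.0, 10.0):
    x = p_inverse(y)
    print(y, x, p(x))   # p(x) should reproduce y up to rounding
```

The nested radicals in `p_inverse` make the non-polynomial nature of the inverse visible, even though $p$ itself is a global diffeomorphism of $\mathbb{R}$.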
Footnote 6: Solve the cubic equation $x^3 + x - \alpha = 0$ to obtain the inverse.

8.2 Case n ≥ 2

As before, $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$. When $n \ge 2$, a new story begins. Even the simplest case $JC^0_2$ remains unproven. These cases are far more difficult since there are no results similar to the Intermediate Value and Mean Value Theorems. Now injectivity becomes a hard issue and surjectivity becomes more difficult to achieve. The main ingredients for surjectivity are the InFT and verifying whether the map is proper.

Examples Below we give some simple examples to improve our understanding of polynomial and differentiable maps.

(1) Let $p : \mathbb{R} \to \mathbb{R}$ be a polynomial. A very simple example of a polynomial automorphism is the polynomial map $P : \mathbb{R}^2 \to \mathbb{R}^2$ given by $P(x, y) = \big(x + p(y), y\big)$, for which
$$dP_{(x,y)} = \begin{pmatrix} 1 & \frac{dp}{dy} \\ 0 & 1 \end{pmatrix} \;\Rightarrow\; J(P)(x, y) = 1.$$
$P$ is a proper map.

(2) Let's consider the following question: "is every quadratic polynomial defined on $\mathbb{K}$ the product of two linear polynomials?" Given a quadratic polynomial $f(x) = x^2 - \alpha x + \beta$, we would like to prove that $x^2 - \alpha x + \beta = (x - a)(x - b)$. The decomposition above is equivalent to finding solutions to the system
$$a + b = \alpha, \qquad ab = \beta, \tag{52}$$
or to solving the quadratic equation $x^2 - \alpha x + \beta = 0$. The existence of a solution to the system (52) is equivalent to proving that the polynomial map $P : \mathbb{K}^2 \to \mathbb{K}^2$, $P(x, y) = (x + y, xy)$, is surjective. If $x_1$ and $x_2$ are the two roots of the quadratic equation, then $P^{-1}(\alpha, \beta) = \{(x_1, x_2), (x_2, x_1)\}$. Two possibilities can occur according to the field $\mathbb{K}$:
(i) If $\mathbb{K} = \mathbb{C}$, then the map $P$ is surjective but it is not injective.
(ii) If $\mathbb{K} = \mathbb{R}$, then $P$ is not surjective, and the equation $P(x, y) = (\alpha, \beta)$ admits a solution if and only if $\alpha^2 \ge 4\beta$.
$P$ is a proper map. This example corresponds to the Fundamental Theorem of Algebra in its humblest case. Observe that
$$dP_{(x,y)} = \begin{pmatrix} 1 & 1 \\ y & x \end{pmatrix} \;\Rightarrow\; J(P)(x, y) = x - y.$$
The singular set corresponds to the cases when the linear factors are identical, i.e., $f(x) = (x - a)^2$.

(3) Let $P : \mathbb{R}^2 \to \mathbb{R}^2$ be the polynomial map $P(x, y) = (xy, -x^2 + y^2)$. Then
$$dP_{(x,y)} = \begin{pmatrix} y & x \\ -2x & 2y \end{pmatrix} \quad\text{and}\quad J(P)(x, y) = 2(x^2 + y^2). \tag{53}$$
The map $P$ has the following properties:
(i) $J(P)(x, y) = 0$ if and only if $(x, y) = (0, 0)$.
(ii) $P$ is surjective.
(iii) For any point $(a, b) \neq (0, 0)$, we have $\# P^{-1}(a, b) = 2$, and $\# P^{-1}(0, 0) = 1$. So $P$ is not injective.
(iv) $P$ is a proper map.
Fix a point $(a, b) \in \mathbb{R}^2$ and consider the equations
$$\begin{cases} xy = a \\ -x^2 + y^2 = b \end{cases} \;\Rightarrow\; \begin{cases} y = \frac{a}{x} \\ x^4 + bx^2 - a^2 = 0. \end{cases}$$
Solving the quadratic equation $z^2 + bz - a^2 = 0$, with discriminant $\Delta = b^2 + 4a^2 \ge 0$, we get
$$z_1 = \frac{-b + \sqrt{\Delta}}{2}, \qquad z_2 = \frac{-b - \sqrt{\Delta}}{2}.$$
So the biquadratic equation admits only two real solutions, namely $\alpha = \sqrt{z_1}$ and $-\alpha$ (if $\mathbb{K} = \mathbb{C}$, then $\# P^{-1}(a, b) = 4$ for all $(a, b) \neq 0$). We get $P(\alpha, \frac{a}{\alpha}) = P(-\alpha, -\frac{a}{\alpha})$. So items (ii) and (iii) are verified. Thus for all $(a, b) \in \mathbb{R}^2$, the solution set $P^{-1}(a, b)$ is the intersection of level curves, i.e.,
$$P^{-1}(a, b) = \{(x, y) \in \mathbb{R}^2 \mid xy = a\} \cap \{(x, y) \in \mathbb{R}^2 \mid -x^2 + y^2 = b\}.$$
The origin is a ramification point. Indeed, the behavior of $P$ near the origin is similar to a rotation. Furthermore, $P\big((0, \infty) \times \mathbb{R}\big) = \mathbb{R}^2$.

For a polynomial map $P : \mathbb{R}^2 \to \mathbb{R}^2$, $P(x, y) = \big(f(x, y), g(x, y)\big)$, to define a diffeomorphism, we need to find polynomials $f$ and $g$ whose level curves define a coordinate system, i.e., for every point $(a, b) \in \mathbb{R}^2$, the solution set $P^{-1}(a, b) = f^{-1}(a) \cap g^{-1}(b)$ must satisfy $\# P^{-1}(a, b) = 1$. It is a difficult task to find two families of algebraic curves defining transversal foliations whose leaves $f^{-1}(a)$ and $g^{-1}(b)$ intersect in at most one point for all $a, b \in \mathbb{R}$. It is even harder to find families in which the infinitesimal parallelograms defined by the level curves have constant area. The tangent vectors to the curves $f^{-1}(a)$ and $g^{-1}(b)$ define the vector fields $X_f$ and $X_g$, respectively, whose Lie bracket is $[X_f, X_g] = 0$.

(4) Let $F : \mathbb{R}^2 \to \mathbb{R}^2$ be the map $F(x, y) = \big(e^x(1 - y^2), e^x y\big)$. So $J(F)(x, y) = e^{2x}(1 + y^2) > 0$. It is an interesting example: $F$ is not surjective since $(-\infty, 0) \times \{y = 0\} \not\subset \mathrm{Im}(F)$; however, $F$ is injective. So it is clear that the $WJC^\infty_2$-Conjecture is false.

(5) Let $B^2$ be the open 2-disk centered at the origin with radius 1 and let $F : \mathbb{R}^2 \to \mathbb{R}^2$ be the $C^\infty$-map
$$F(x, y) = \left(\frac{x}{\sqrt{1 + x^2 + y^2}}, \frac{y}{\sqrt{1 + x^2 + y^2}}\right).$$
Therefore $|F(x, y)| < 1 \Rightarrow \mathrm{Im}(F) = B^2$. We have
$$dF_{(x,y)} = \frac{1}{(1 + x^2 + y^2)^{3/2}} \begin{pmatrix} 1 + y^2 & -xy \\ -xy & 1 + x^2 \end{pmatrix}, \qquad J(F)(x, y) = \frac{1}{(1 + x^2 + y^2)^{2}}.$$
$F : \mathbb{R}^2 \to B^2$ is a diffeomorphism. The map $F : \mathbb{R}^2 \to \mathbb{R}^2$ is injective and not proper.

(6) Let $F : \mathbb{R}^2 \setminus \{y = 0\} \to \mathbb{R}^2$ be the $C^\infty$-map given by $F(x, y) = \big(x^2y^6 + 2xy^2, \; xy^3 + \frac{1}{y}\big)$. So $J(F) = -2$. $F$ is not injective since $F(-1, 1) = F(-1, -1) = (-1, 0)$, and it is not surjective since $(a, 0) \notin \mathrm{Im}(F)$ for all $a > 0$. The images of the horizontal and vertical lines tell us a little about how $F$ transforms the plane:
(i) For every constant $b \neq 0$, the curve $t \mapsto F_b(t) = F(t, b)$ describes the parabola $x = y^2 - \frac{1}{b^2}$. The curve $F(t, -b)$ describes the same parabola as $F(t, b)$.
(ii) For a constant $a$, the curves $t \mapsto F_a(t) = F(a, t)$ show the following behaviors:
(ii.1) If $a < 0$, we have $|a| = -a$ and $F_a\big(|a|^{-1/4}\big) = F_a\big(-|a|^{-1/4}\big) = \big(-\sqrt{|a|}, 0\big)$.
(ii.2) If $a > 0$, then $\mathrm{Im}\, F_a \cap \{y = 0\} = \emptyset$.
This example is not a counterexample since its domain is a disconnected subset of $\mathbb{R}^2$, but it shows that the condition on the Jacobian is far from being sufficient.

(7) The examples above show that the $WJC^\infty_n$-Conjecture fails, and they are all in the $C^\infty$-category. Pinchuk [36] gave a counterexample to the $WJC^0_2$ on $\mathbb{R}$. The outline of the counterexample is as follows:
(a) Define the polynomials
$$t = xy - 1, \qquad h = t(xt + 1), \qquad f = \frac{(xt + 1)^2 (h + 1)}{x} = (xt + 1)^2 (t^2 + y).$$
(b) Define the polynomial $p = h + f$.
(c) Show that
$$J\big(p, -t^2 - 6th(h + 1)\big) = t^2 + \big(t + f(13 + 15h)\big)^2 + f^2 - fv,$$
where $v = v(f, h) = f + f(13 + 15h)^2 + 12h + 12fh$.
(d) Find a polynomial $u(f, h)$ such that $J(p, u) = -fv$. This is equivalent to finding a solution to the PDE
$$\frac{\partial u}{\partial f} - \frac{\partial u}{\partial h} = v.$$
(e) Define the polynomial $q = -t^2 - 6th(h + 1) - u(f, h)$. Then
$$J(p, q) = t^2 + f^2 + \big(t + f(13 + 15h)\big)^2 \ge 0.$$
So $J(p, q) = 0$ if and only if $t = 0$ and $f = 0$. But this is impossible, since if $t = 0$, then $f = \frac{1}{x}$. Therefore $J(p, q)(x, y) > 0$ for all $(x, y)$.

The polynomial map $F : \mathbb{R}^2 \to \mathbb{R}^2$, $F(x, y) = \big(p(x, y), q(x, y)\big)$, cannot be injective because the set $p^{-1}(0)$ is the disconnected algebraic curve $x^2 y - x + 1 = 0$. Indeed, the reasoning follows from the next claim:

Claim: Let $f, g \in C^\infty$ and consider the map $F = (f, g)$ such that $J(F)(p) \neq 0$ for all $p \in \mathbb{R}^2$. If $f^{-1}(c)$ is connected for all $c \in \mathbb{R}$, then $F$ is injective. Moreover, if $F$ is a polynomial map, then the converse is true.

Pf: Let's assume $J(F)(p) > 0$ for all $p \in \mathbb{R}^2$. Suppose we have points $A \neq B$ in $\mathbb{R}^2$ such that $F(A) = F(B)$; in particular $f(A) = f(B) = c$. Solving the ODE (in Chap. 4, we prove the existence and uniqueness of the solution for this IVP)
$$\gamma'(t) = \nabla^{\perp} f\big(\gamma(t)\big) \;\Leftrightarrow\; \begin{cases} \gamma_1'(t) = -\frac{\partial f}{\partial y}\big(\gamma(t)\big), \\ \gamma_2'(t) = \frac{\partial f}{\partial x}\big(\gamma(t)\big), \end{cases}$$
with the initial condition $\gamma(0) = A$, we get a parametrization of the level curve $f^{-1}(c)$ given by $\gamma(t) = \big(\gamma_1(t), \gamma_2(t)\big)$. Since $f^{-1}(c)$ is connected, $B$ lies on this curve. For all $t$, we have $F\big(\gamma(t)\big) = \big(c, g(\gamma(t))\big)$, and so
$$\frac{d}{dt} F\big(\gamma(t)\big) = dF_{\gamma(t)} \cdot \gamma'(t) = \begin{pmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \\ \frac{\partial g}{\partial x} & \frac{\partial g}{\partial y} \end{pmatrix} \cdot \begin{pmatrix} -\frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial x} \end{pmatrix} = \begin{pmatrix} 0 \\ J(F)\big(\gamma(t)\big) \end{pmatrix},$$
$$\frac{d}{dt} F\big(\gamma(t)\big) = \big(0, \nabla g(\gamma(t)) \cdot \gamma'(t)\big).$$
Considering the function $h(t) = g\big(\gamma(t)\big)$, we have $h'(t) = J(F)\big(\gamma(t)\big) > 0$. Therefore $h$ is increasing along the curve $f^{-1}(c)$, contradicting the condition $g(A) = g(B)$. If $F$ is a polynomial map, the proof of the converse is left as an exercise.
Exercises
(1) Let $f : \mathbb{C} \to \mathbb{C}$ be the entire analytic function (see Ref. [16])
$$f(z) = \sum_{n \ge 0} \frac{z^{2n+1}}{n!(2n + 1)}.$$
The real map induced by $f$ is $F : \mathbb{R}^2 \to \mathbb{R}^2$ given by $F(x, y) = \big(\mathrm{Re}(f(z)), \mathrm{Im}(f(z))\big)$.
(i) Show that $f'(z) = e^{z^2}$. Deduce that $J(F)(x, y) > 0$ for all $(x, y) \in \mathbb{R}^2$.
(ii) Show that $f$ is surjective (hint: assume it is not surjective and apply the little Picard theorem).
(iii) Show that $f$ is not injective (hint: use the big Picard theorem).

So far, the best understanding of the relationship between the hypothesis and the assertion of the $JC^0_n$-Conjecture is given by the equivalences proved in Bass-Connell-Wright [2].

Theorem 17 (Bass, Connell, Wright). Let $p : \mathbb{C}^n \to \mathbb{C}^n$ be a polynomial mapping with constant nonzero Jacobian. Then the following statements are equivalent:
(i) $p$ is invertible and $p^{-1}$ is a polynomial mapping.
(ii) $p$ is injective.
(iii) $p$ is proper.

Remark The theorem above considers the complex case. The hypothesis of being a polynomial map is essential.
As $n$ increases, the question becomes much more difficult to tackle using methods based on properties of the determinant, since the cost of computing determinants grows factorially with $n$. We hope for a general property ensuring that the conjecture is true. To hope might be wishful; nevertheless we live for it. Theorem 17 suggests that to solve the $JC^0_n$-Conjecture, we need to prove that a map $F$ satisfying the condition $J(F) \in \mathbb{K}^*$ is proper or injective. Properness might fail for polynomial maps of several variables, since the inverse image of a compact set may not be compact. Consider the example $P(x, y) = (xy, x^2 - xy)$; we have $P^{-1}(0) = \{(0, y) \mid y \in \mathbb{R}\}$, i.e., the inverse image of a compact set is unbounded. If we assume $F : \mathbb{R}^n \to \mathbb{R}^n$ is proper, then, since the image $F(\mathbb{R}^n)$ of a proper map is always closed (see Appendix A), it remains to prove that $F(\mathbb{R}^n)$ is also open to achieve surjectivity. Let's discuss the cases assuming the polynomials are proper.

Proposition 4 Let $P : \mathbb{K}^n \to \mathbb{K}^n$ be a polynomial mapping such that $J(P)(x) \neq 0$ for all $x \in \mathbb{K}^n$. Then for each $y \in \mathbb{K}^n$, $P^{-1}(y)$ is finite.

Proof Since $dP_x$ has maximal rank, the map $P : \mathbb{K}^n \to \mathbb{K}^n$ defines a submersion whose fibers over $y \in \mathrm{Im}(P)$ must be discrete sets of points. Let $P = (p_1, \dots, p_n)$, such that $p_i : \mathbb{K}^n \to \mathbb{K}$ is a polynomial for all $1 \le i \le n$, and $y = (y_1, \dots, y_n)$. So
$$P^{-1}(y) = p_1^{-1}(y_1) \cap \dots \cap p_n^{-1}(y_n).$$
The derivative $dP_x$ being non-singular implies that the set of gradient vectors $\{\nabla p_1(x), \dots, \nabla p_n(x)\}$ is linearly independent for all $x \in \mathbb{K}^n$; therefore the intersection is a 0-dimensional submanifold. Hence $P^{-1}(y)$ is a discrete set of points. The claim follows from the fact that a discrete algebraic set over $\mathbb{K} = \mathbb{C}$ or $\mathbb{K} = \mathbb{R}$ is finite.

Remark Proposition 4 does not extend to smooth maps.

Once $\dim(P^{-1}(y)) = 0$, we can assign to each element $x \in P^{-1}(y)$ the number
$$\mathrm{sgn}\big(J(P)(x)\big) = \begin{cases} +1, & \text{if } J(P)(x) > 0, \\ -1, & \text{if } J(P)(x) < 0, \end{cases} \tag{54}$$
and so we can define the Index of $P$ at $y \in \mathrm{Im}(P)$ to be
$$I(P; y) = \sum_{x \in P^{-1}(y)} \mathrm{sgn}\big(J(P)(x)\big). \tag{55}$$
The map $y \mapsto I(P; y)$ may not be continuous, as shown in Example 3 by the polynomial map $P(x, y) = (xy, -x^2 + y^2)$. In this way, we cannot guarantee that $\# P^{-1}(y)$ is
locally constant. However, if we assume $\# P^{-1}(y) \in \mathbb{Z}$ is locally constant, then we get a well-defined Index since $\mathbb{K}^n$ is connected.

Definition 18 Let $P : \mathbb{K}^n \to \mathbb{K}^n$ be a proper polynomial mapping such that $J(P)(x) \neq 0$ for all $x \in \mathbb{K}^n$. If the Index $I : \mathbb{K}^n \to \mathbb{Z}$ is a locally constant function, the Index of $P$ is
$$I(P) = I(P; y), \tag{56}$$
for all $y \in \mathrm{Im}(P)$.

An interpretation can be given for the Index of $P$.
(1) Let the map $P : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ be induced by $\tilde P : \mathbb{C}^n \to \mathbb{C}^n$. So we have $J(P) = |J(\tilde P)|^2$. Therefore
$$I(P) = \# P^{-1}(y). \tag{57}$$
(2) Fix an orientation on $\mathbb{R}^n$ and let $P : \mathbb{R}^n \to \mathbb{R}^n$ be a polynomial map defining a local diffeomorphism preserving orientation. So $J(P)(x) > 0$. In this case, we also have $I(P) = \# P^{-1}(y)$.
In the presence of a singularity, the Index may change. If the Index changes, the map must have a singularity.

Proposition 5 Let $r \in \mathbb{N}$. Let $P : \mathbb{K}^n \to \mathbb{K}^n$ be a polynomial mapping with $J(P)(x) \neq 0$ for all $x \in \mathbb{K}^n$ and such that $I(P) = r$. Then $P : \mathbb{K}^n \to \mathrm{Im}(P)$ is a proper map onto its image.

Proof Let $K \subset \mathrm{Im}(P)$ be a compact subset; we need to establish the compactness of $P^{-1}(K)$. Let $\{x_n\}_{n\in\mathbb{N}} \subset P^{-1}(K)$ be an infinite sequence. The compactness will be achieved if $\{x_n\}_{n\in\mathbb{N}}$ has a convergent subsequence with limit in $P^{-1}(K)$. Consider the sequence $\{y_n\}_{n\in\mathbb{N}} \subset K$ given by $y_n = P(x_n)$. Passing to a subsequence, we can consider $\{y_n\}_{n\in\mathbb{N}}$ convergent, so let $y_\infty = \lim_{n\to\infty} y_n$ be the limit. Let's consider $P^{-1}(y_\infty) = \{a_1, \dots, a_r\}$. Given the InFT, we have open subsets $U_i \ni a_i$ and $V_i \ni y_\infty$ such that $P : U_i \to V_i$ is a diffeomorphism. Taking $V = \bigcap_i V_i$, let $Q_i : V \to U_i$ be the inverse, i.e., $P \circ Q_i = \mathrm{id}$. Let's assume $\{y_n\}_{n\in\mathbb{N}} \subset V$, so $P\big(Q_i(y_n)\big) = y_n$ for all $n \in \mathbb{N}$. Therefore $Q_i(y_n) \in P^{-1}(y_n)$ and
$$P^{-1}\big(\{y_n \mid n \in \mathbb{N}\}\big) \subset P^{-1}(V) = \bigcup_{i=1}^{r} U_i.$$
By construction, we have $x_n \in P^{-1}(y_n)$, so $x_n = Q_i(y_n)$, in which the index $i \in \{1, \dots, r\}$ may change with $n$. Nevertheless, for some $i \in \{1, \dots, r\}$, the subsequence
$$\{x_{n_i}\} = \{x_n\} \cap \mathrm{Im}(Q_i)$$
has an infinite number of elements. Since $x_{n_i} = Q_i(y_{n_i})$, it follows that
$$\lim_{n_i\to\infty} x_{n_i} = \lim_{n_i\to\infty} Q_i(y_{n_i}) = Q_i(y_\infty) = a_i.$$
Hence $P^{-1}(K)$ is compact.
Remark Proposition 5 extends to non-singular smooth maps.
8.3 Covering Spaces

The polynomial $P$ in Proposition 5 is not injective when $r > 1$ and may not be surjective. To proceed, we will apply the theory of Covering Spaces. For further reading about Covering Spaces we recommend [33].

Definition 19 Let $\tilde X$ and $X$ be path-connected Hausdorff spaces. A map $f : \tilde X \to X$ is a covering map if the following conditions are satisfied:
(i) $f$ is surjective,
(ii) $f$ is locally a homeomorphism.
In this case, $\tilde X$ is a covering space for $X$.

A local homeomorphism may not be a covering map, since it may not be surjective. However, a proper map that is locally a homeomorphism is a covering map. The subset $f^{-1}(x_0)$ is called the fiber over $x_0$. The fibers are discrete subsets of $\tilde X$. When $f$ is a polynomial map, the fiber is a finite set. Once an element $p_0 \in f^{-1}(x_0)$ has been fixed, we can establish an order such that $f^{-1}(x_0) = \{p_0, p_1, \dots, p_r\}$. This can be done for all points in $X$. In this framework, we have a group $G$ and an action $G \times \tilde X \to \tilde X$, $(g, x) \mapsto g \cdot x$, such that:
(i) $G$ acts freely on $\tilde X$.
(ii) $G \cdot f^{-1}(x_0) = f^{-1}(x_0)$.
(iii) $X = \tilde X / G$.
(iv) The order of $G$ is $o(G) = \# f^{-1}(x_0)$.
(v) If $\pi_1(\tilde X) = 0$, then $\pi_1(X) = G$.

Lemma 3 Let $P : \mathbb{K}^n \to \mathbb{K}^n$ be a polynomial mapping with $J(P)(x) \neq 0$ for all $x \in \mathbb{K}^n$ and such that $I(P) = r \in \mathbb{N}$. Then $r = 1$ and $P : \mathbb{K}^n \to P(\mathbb{K}^n)$ is injective.

Proof Let's sketch the proof, since it requires a short stroll along the banks of Algebraic Topology. Let's assume $r > 1$; then we can write $r = k \cdot p$, with $p$ prime. Take an element $g \in G$ such that $g^k \neq e$, and consider the cyclic subgroup $H = \{e, g^k, \dots, (g^k)^{p-1}\}$. So $o(H) = p$. The group $H$ acts freely on $\mathbb{K}^n$; consequently it acts freely on the sphere $S^{n-1} \subset \mathbb{K}^n$. Taking the stereographic projection, we get an action on the $n$-sphere $S^n$ with one fixed point. A corollary of Smith theory [41] claims this cannot happen, since the fixed point set of a finite group action on $S^n$ must be a homology $m$-sphere with $m = -1, 0, 1, \dots, n$. Here, $m = -1$ denotes the empty set and $m = 0$ is $S^0 = \{N, S\}$, where $N$ is the North Pole and $S$ is the South Pole. Therefore for $m \ge 0$, there are at least two fixed points, one of which belongs to $\mathbb{K}^n$. Hence $r = 1$ and $P : \mathbb{K}^n \to P(\mathbb{K}^n)$ is injective.

From now on, the hypothesis of being a polynomial map is essential. The proofs of the next lemmas require algebraic techniques that are not within the scope of this textbook. Therefore we present an outline and leave the reader with the references.

Lemma 4 Let $P : \mathbb{K}^n \to \mathbb{K}^n$ be an injective polynomial mapping with $J(P)(x) \neq 0$ for all $x \in \mathbb{K}^n$. Then $P$ is a global diffeomorphism.

Proof (sketch) By the theorem of Bialynicki-Birula and Rosenlicht [3], an injective polynomial map of affine spaces of the same dimension is surjective, so $P : \mathbb{K}^n \to \mathbb{K}^n$ is a bijection. It follows from the InFT that $P$ is a diffeomorphism.

Lemma 5 Let $P : \mathbb{K}^n \to \mathbb{K}^n$ be a bijective polynomial mapping with $J(P)(x) \neq 0$ for all $x \in \mathbb{K}^n$. Then $P$ is biregular, i.e., $P^* \mathbb{K}[x] = \mathbb{K}[x]$.

Proof Let $\mathbb{K}(x)$ be the field of rational functions on $\mathbb{K}^n$. A bijection $P : \mathbb{K}^n \to \mathbb{K}^n$ induces a monomorphism $P^* : \mathbb{K}(x) \to \mathbb{K}(x)$, and so $\mathbb{K}(x)$ may be regarded as a finite dimensional vector space over the field $P^* \mathbb{K}(x)$ such that $\# P^{-1}(y) = \dim_{P^* \mathbb{K}(x)} \mathbb{K}(x)$. Since $\# P^{-1}(y) = 1$, we have $P^* \mathbb{K}(x) = \mathbb{K}(x)$. $P$ is a regular bijection between two smooth affine varieties, so by Zariski's main theorem $P$ is biregular.

Proof Lemmas 4 and 5 conclude the proof of Theorem 17.
8.4 Degree Reduction

Another strategy to tackle the $JC^0_n$-Conjecture is to analyze the question for polynomials of fixed degree. Stuart Wang proved in [44] that the $JC^0_n$ is true if $\deg(P) = 2$. The very simple proof of Wang's theorem given below is due to Oda-Yoshida [35].

Proposition 6 (Wang [44]). If $\deg(P) \le 2$, then the real $JC^0_n$ is true.

Proof By Theorem 17 it suffices to prove that $P$ is injective. Let's suppose $P(a) = P(b)$ with $a \neq b$. Define the polynomial $Q(x) = P(x + a) - P(a)$ and let $c = b - a \neq 0$. So $Q$ satisfies the following properties:
(i) $Q(c) = Q(0) = 0$,
(ii) $\deg(Q) \le 2$,
(iii) $dQ_x = dP_{x+a} \Rightarrow J(Q)(x) = J(P)(x + a) \in \mathbb{R}^*$.
Now, decomposing $Q = Q_1 + Q_2$ into its homogeneous components $Q_1$ and $Q_2$ of degree 1 and 2, respectively, we get $Q(tc) = tQ_1(c) + t^2 Q_2(c)$. Differentiation gives
$$Q_1(c) + 2tQ_2(c) = \frac{d}{dt} Q(tc) = dQ_{tc} \cdot c \neq 0, \quad \text{for all } t \in \mathbb{R}.$$
Substituting $t = \frac{1}{2}$ gives $Q(c) = Q_1(c) + Q_2(c) \neq 0$, a contradiction. Hence $P$ is injective.
Proposition 6 improves the result for polynomials of degree 1. In $\mathbb{R}^2$, we know from [5] that any local diffeomorphism $F : \mathbb{R}^2 \to \mathbb{R}^2$ with $\deg(F) \le 4$ satisfies the real Jacobian conjecture. The next theorem says we don't have to check the $JC^0_n$-Conjecture for all polynomials, only for the polynomials of degree at most 3. However, the task is not easy since we must check the conjecture for all such polynomials and for all dimensions $n$.

Theorem 18 (Bass-Connell-Wright [2]). If the Jacobian Conjecture $JC^0_n$ holds for all polynomials $P \in \mathbb{C}[x_1, \dots, x_n]$ with $\deg(P) \le 3$ and for all $n \in \mathbb{N}$, then the Jacobian Conjecture $JC^0_n$ holds for all polynomials $P \in \mathbb{C}[x_1, \dots, x_n]$ such that $J(P) \in \mathbb{R}^*$.

Proof (sketch) They proved that it suffices to prove the Jacobian Conjecture for all polynomials $P$ of the form
$$P(x) = \big(x_1 + H_1(x), \dots, x_n + H_n(x)\big), \tag{58}$$
such that each $H_i$ is either zero or homogeneous of degree 3.

Improvements: The following results improved the theorem above.
(1) Drużkowski proved in [11] that the polynomials $H_i$, $1 \le i \le n$, in (58) can be taken as
$$H_i(x_1, \dots, x_n) = \Big(\sum_{j=1}^{n} a_{ji} x_j\Big)^3. \tag{59}$$
(2) If $n \le 7$, Hubbers proved in [25] that the Jacobian Conjecture $JC^0_n$ holds for all polynomials $P$ of the form $P(x) = x + H(x)$, where $H$ is a polynomial given by (59).
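To make the cubic-linear form (58)-(59) concrete, here is a minimal sketch for $n = 2$, assuming the coefficient matrix with $a_{21} = 1$ and all other entries zero (an illustrative choice, not taken from the references). The resulting map is $P(x_1, x_2) = (x_1 + x_2^3, x_2)$; the code estimates its Jacobian by central differences and checks the explicit polynomial inverse $(u, v) \mapsto (u - v^3, v)$. All function names are hypothetical.

```python
def cubic_linear_map(a, x):
    """P(x)_i = x_i + (sum_j a[j][i] * x[j])**3, following the shape of Eqs. (58)-(59)."""
    n = len(x)
    return [x[i] + sum(a[j][i] * x[j] for j in range(n)) ** 3 for i in range(n)]

def jacobian_det_2d(F, x, h=1e-6):
    """Determinant of the 2x2 Jacobian of F at x, estimated by central differences."""
    cols = []
    for k in range(2):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        fp, fm = F(xp), F(xm)
        cols.append([(fp[i] - fm[i]) / (2 * h) for i in range(2)])  # cols[k][i] = dF_i/dx_k
    return cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]

# Assumed coefficients: a[j][i] stores a_{ji}; only a_{21} = 1 is nonzero.
A = [[0.0, 0.0],
     [1.0, 0.0]]

P = lambda x: cubic_linear_map(A, x)
P_inv = lambda u: [u[0] - u[1] ** 3, u[1]]   # explicit polynomial inverse for this A

point = [0.7, -1.3]
print(jacobian_det_2d(P, point))   # close to 1
print(P_inv(P(point)))             # recovers the point up to rounding
```

For this particular choice the inverse is visibly polynomial; the difficulty of the conjecture lies in the general coefficients $a_{ji}$ and arbitrary $n$.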
Chapter 2
Linear Operators in Banach Spaces
In this chapter we present a brief introduction to the basic concepts of Operator Theory, and some relevant classes of operators are introduced for use in what follows. The most explored Banach spaces in the text are the spaces $E = (C^k(K; \mathbb{R}^m), || f ||_{C^k})$, as defined in Appendix A. Occasionally, the spaces $L^p$ are used, but we avoid them since more care is required with the analysis. Our larger goal is to study differentiable maps; for this purpose the spaces $C^k$ are enough.
1 Bounded Linear Operators on Normed Spaces

The concept of differentiable maps defined on a Banach space requires prior knowledge of examples and properties of linear operators between Banach spaces. Due to the complexity of the subject, we will cover only what is needed for our purposes; more details are found in books on Functional Analysis.

Let $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$ and let $E$, $F$ be normed $\mathbb{K}$-vector spaces. A function $T : E \to F$ is a $\mathbb{K}$-linear operator if, for all $u, v \in E$ and $a, b \in \mathbb{K}$, $T(au + bv) = aT(u) + bT(v)$. When $F = \mathbb{K}$, the operator $T$ is a linear functional. The space of linear functionals on $E$ is denoted by $E^*$. We note that the concept is defined for normed spaces; the need for a Banach space will arise according to the context, whenever completeness is necessary.

© Springer Nature Switzerland AG 2021 C. M. Doria, Differentiability in Banach Spaces, Differential Forms and Applications, https://doi.org/10.1007/978-3-030-77834-7_2

Example 1 Examples of linear operators.
(1) Consider $E$, $F$ vector spaces such that $\dim(E) = n$ and $\dim(F) = m$. A $\mathbb{K}$-linear operator $T : E \to F$ can be represented by a matrix with coefficients in $\mathbb{K}$. By fixing the bases $\beta_E = \{e_1, \dots, e_n\}$ and $\beta_F = \{f_1, \dots, f_m\}$ of $E$ and $F$, respectively, we get $T(x) = A \cdot x$. The matrix $A = (a_{ij}) \in M_{m \times n}(\mathbb{K})$ is defined by $T(e_i) = \sum_k a_{ki} f_k$, $1 \le i \le n$.
(2) On the functional spaces $E = C^1([a, b])$ and $F = C^0([a, b])$, we consider the $\mathbb{R}$-linear differential operator $D : E \to F$, $(Df)(x) = f'(x)$.
(3) On the functional spaces $E = C^0([a, b])$ and $F = C^1([a, b])$, we have the integral operator $I : E \to F$, $I(f)(x) = \int_a^x f(t)\,dt$, which is an $\mathbb{R}$-linear operator.
(4) Given a function $f \in E = C^0([a, b]; \mathbb{C})$, we define the linear functional $L_f : E \to \mathbb{R}$, $L_f(g) = \int_a^b (f \cdot g)(x)\,dx$. Consider the Hermitian product on $E$ as
$$\langle f, g \rangle = \int_a^b (f \cdot \bar g)(x)\,dx.$$
The space E, endowed with the induced norm by the Hermitian product, is not necessarily a Banach space. A fundamental question is to determine the continuous linear functionals defined over E. When E has finite dimension, every linear functional f : E → R can be written as f (v) =< u f , v >, with u f ∈ Rn , as proved in Proposition (2 in Appendix A). Similarly, when E is an infinite dimensional Hilbert space, we can apply the Representation Theorem of Riesz (12 in Appendix A) to represent any linear functional f : E → R as f (v) =< u f , v >. (5) Given a continuous function k : [a, b] × [a, b] → R, we define the R-linear integral operator K : C 0 ([a, b]) → C 0 ([a, b]) as
$$(Kf)(x) = \int_a^b k(x, y) f(y)\,dy.$$
The function $k$ is called the kernel of $K$.
(6) Let $E$, $F$ be normed spaces. Consider $a_{ij}, b_i, c \in C^k(E; \mathbb{R})$. The differential operator $D : C^k(E, F) \to C^{k-2}(E, F)$ given by
$$Df = \sum_{i,j=1}^{n} a_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j} + \sum_{i=1}^{n} b_i(x) \frac{\partial f}{\partial x_i} + c(x) f$$
is R-linear. Definition 1 Let (E, || . || E ), (F, || . || F ) be normed spaces. A linear operator T : E → F is bounded if there is M > 0 such that || T (u) || F ≤ M || u || E , for all u ∈ E. Let L(E, F) be the space of bounded linear operators T : E → F; when E = F, let L(E) = L(E, E). Proposition 1 Let (E, || . || E ) and (F, || . || F ) be normed spaces. For any T ∈ L(E, F), the following items are equivalent;
(i) $T$ is continuous.
(ii) $T$ is continuous at the origin.
(iii) $T$ is a bounded operator.

Proof We have $T(0) = 0$. Let's check the following implications:
(i) ⇒ (ii): straightforward.
(ii) ⇒ (iii): taking $\epsilon = 1$, we have $\delta > 0$ such that, if $|| y || \le \delta$, then $|| T(y) || < 1$. For any $x \neq 0$ in $E$, consider $y = \delta \frac{x}{||x||}$. In this way, we get
$$|| T(y) || = \frac{\delta}{|| x ||} || T(x) || < 1 \;\Rightarrow\; || T(x) || < \frac{1}{\delta} || x ||.$$
(iii) ⇒ (i): Let $\{x_n\}_{n\in\mathbb{N}}$ be a sequence such that $x_n \to x \in E$. Therefore
$$|| T(x_n) - T(x) || = || T(x_n - x) || \le M || x_n - x || \to 0.$$
The space $L(E, F)$ is a vector space with the following operations: for $T, S \in L(E, F)$ and $a \in \mathbb{K}$,
(i) $(T + S)(x) = T(x) + S(x)$,
(ii) $(aT)(x) = aT(x)$.
The operator norm of an operator $T \in L(E, F)$ is defined as
$$| T | = \inf\{M \in \mathbb{R};\; || T(x) ||_F \le M || x ||_E \text{ for all } x \in E\}.$$
It is straightforward that $|| T(x) || \le | T | \cdot || x ||$ for all $x \in E$. Moreover,
$$| T | = \sup_{||x||=1} || T(x) ||_F = \sup_{x \neq 0} \frac{|| T(x) ||_F}{|| x ||_E}$$
and $| S \circ T | \le | S | \cdot | T |$. Now we consider the normed space $\big(L(E, F), | \cdot |\big)$.

Proposition 2 Let $E$ and $F$ be Banach spaces and let $| \cdot |$ be the operator norm. Then the normed space $\big(L(E, F), | \cdot |\big)$ is a Banach space.

Proof For all $x \in E$, $|| T_n(x) - T_m(x) ||_F \le | T_n - T_m | \cdot || x ||_E$. Consider $\{T_n\}_{n\in\mathbb{N}}$ a Cauchy sequence in $L(E, F)$. Since $| T_n - T_m | \to 0$ when $m, n \to \infty$, the sequence $\{T_n(x)\}_{n\in\mathbb{N}} \subset F$ is a Cauchy sequence for all $x \in E$. Since $F$ is complete, define $T(x) = \lim_n T_n(x) \in F$. The linearity of $T$ follows from
$$T(ax + by) = \lim_{n\to\infty} T_n(ax + by) = a \lim_{n\to\infty} T_n(x) + b \lim_{n\to\infty} T_n(y) = aT(x) + bT(y).$$
We have to check that $T$ is a bounded operator and that the sequence $\{T_n\}$ converges to $T$ in $\big(L(E, F), | \cdot |\big)$. Since $\{T_n\}_{n\in\mathbb{N}}$ is Cauchy, given $\epsilon > 0$, we have $n(\epsilon)$ such that if $m, n > n(\epsilon)$, then
$$|| T_n(x) - T_m(x) ||_F \le | T_n - T_m | \cdot || x ||_E < \epsilon || x ||_E. \tag{1}$$
Taking the limit $m \to \infty$, it follows that $|| T_n(x) - T(x) ||_F \le \epsilon || x ||_E$ for all $x \in E$. Then, if $n_0 > n(\epsilon)$, we have
$$|| T(x) ||_F \le || T_{n_0}(x) - T(x) ||_F + || T_{n_0}(x) ||_F \le \epsilon || x ||_E + | T_{n_0} | \cdot || x ||_E$$
for all $x \in E$. Therefore $T \in L(E, F)$ since $|| T(x) ||_F \le (\epsilon + | T_{n_0} |) || x ||_E$. From (1) above, we have
$$| T_n - T | = \sup_{x \neq 0} \frac{|| T_n(x) - T(x) ||_F}{|| x ||_E} \le \epsilon, \quad \text{for every } n > n(\epsilon).$$
Hence $\lim_{n\to\infty} T_n = T$.
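The operator norm used above can be estimated numerically in the finite-dimensional case. The sketch below treats $2 \times 2$ real matrices acting on $\mathbb{R}^2$ with the Euclidean norm and approximates $\sup_{||x||=1} ||T(x)||$ by sampling the unit circle; the matrices and the helper names are illustrative assumptions. Sampling only gives a lower estimate of the supremum, so the submultiplicativity check carries a small tolerance.

```python
import math

def apply(m, v):
    """Apply the 2x2 matrix m to the vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def compose(s, t):
    """Matrix of the composition S o T, i.e. the product S * T."""
    return [[sum(s[i][k] * t[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm(m, samples=20000):
    """Lower estimate of sup_{||x|| = 1} ||m x|| by sampling the unit circle."""
    best = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        w = apply(m, (math.cos(theta), math.sin(theta)))
        best = max(best, math.hypot(w[0], w[1]))
    return best

T = [[2.0, 0.0], [0.0, 1.0]]   # assumed example operators
S = [[0.0, 1.0], [1.0, 1.0]]

print(op_norm(T))                                   # the exact value is 2
print(op_norm(compose(S, T)), op_norm(S) * op_norm(T))
```

The second printed pair illustrates the inequality $| S \circ T | \le | S | \cdot | T |$ for this choice of operators.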
Henceforth we will always consider L(E, F) to be endowed with the topology induced by the operator norm | T | = sup_{||x||=1} || T(x) ||. The theory of Linear Algebra for infinite dimensional vector spaces requires much more attention to topological issues. Several results proved for finite dimensional vector spaces are false in the general Banach space setting. Basic questions concerning continuity, the closedness of the graph and the closed range property of a linear operator require special attention. Continuity is addressed in Proposition 1, and the closed range property is treated in the next section.

Example 2 In what follows, we give two examples of bounded linear operators defined on C^0([a, b]), considering different norms. Let k ∈ C^0([a, b] × [a, b]) and consider the integral linear operator
\[
(Kf)(x) = \int_a^b k(x, y) f(y)\, dy.
\]
(1) Let E = (C^0([a, b]), || . ||_0); then K ∈ L(E). First, let’s check that K is well-defined, i.e., if f ∈ E, then Kf ∈ E. By Theorem 4 in Appendix A, k(x, y) is uniformly continuous, so given ε > 0, we have δ(ε) > 0 such that if (x, y), (x̃, ỹ) ∈ [a, b] × [a, b], | x̃ − x | < δ(ε) and | ỹ − y | < δ(ε), then | k(x̃, ỹ) − k(x, y) | < ε. Hence, for | x̃ − x | < δ(ε),
\[
|(Kf)(\tilde{x}) - (Kf)(x)| = \left| \int_a^b \big(k(\tilde{x}, y) - k(x, y)\big) f(y)\, dy \right| \le \int_a^b |k(\tilde{x}, y) - k(x, y)| \cdot |f(y)|\, dy \le \varepsilon \int_a^b |f(y)|\, dy,
\]
and so Kf ∈ C^0([a, b]).
Since k is continuous, we have M > 0 such that | k(x, y) | ≤ M in [a, b] × [a, b]. Then for all x ∈ [a, b],
\[
|(Kf)(x)| \le \int_a^b |k(x, y)| \cdot |f(y)|\, dy \le M \int_a^b |f(y)|\, dy \le M(b - a)\, \|f\|_0.
\]
Therefore | K | ≤ M(b − a) and K ∈ L(E).

(2) Let F = (C^0([a, b]), || . ||_{L^2}); then K ∈ L(F). The norm
\[
\|f\|_{L^2} = \left( \int_a^b |f(x)|^2\, dx \right)^{1/2}
\]
is induced by the inner product
\[
\langle f, g \rangle = \int_a^b f(x) g(x)\, dx,
\]
and satisfies the Cauchy-Schwarz inequality
\[
|\langle f, g \rangle| \le \|f\|_{L^2} \cdot \|g\|_{L^2}.
\]
Applying the inequality to Kf(x), we get
\[
|(Kf)(x)| \le \int_a^b |k(x, y)| \cdot |f(y)|\, dy \le \left( \int_a^b |k(x, y)|^2\, dy \right)^{1/2} \cdot \left( \int_a^b |f(y)|^2\, dy \right)^{1/2}.
\]
Let C^2 = ∫_a^b ∫_a^b | k(x, y) |^2 dy dx, so
\[
\|Kf\|_{L^2}^2 \le \int_a^b \left( \int_a^b |k(x, y)|^2\, dy \right) \|f\|_{L^2}^2\, dx \le \left( \int_a^b \int_a^b |k(x, y)|^2\, dy\, dx \right) \|f\|_{L^2}^2 \le C^2\, \|f\|_{L^2}^2.
\]
Therefore || Kf ||_{L^2} ≤ C . || f ||_{L^2} and K ∈ L(F).
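Both estimates of Example 2 can be observed on a discretized version of K. The following is a minimal sketch, not part of the original text: the integral is replaced by a midpoint rule on [0, 1], and the kernel k(x, y) = cos(xy) and the function f(y) = sin(3y) are hypothetical choices made only for illustration.

```python
import math

def apply_K(k, f, a, b, n=200):
    # Midpoint-rule approximation of (Kf)(x) = integral over [a, b] of k(x, y) f(y) dy,
    # evaluated at the same midpoints used for the quadrature.
    h = (b - a) / n
    ys = [a + (j + 0.5) * h for j in range(n)]
    return ys, [sum(k(x, y) * f(y) for y in ys) * h for x in ys]

a, b = 0.0, 1.0
k = lambda x, y: math.cos(x * y)   # hypothetical continuous kernel
f = lambda y: math.sin(3.0 * y)    # hypothetical continuous function

ys, Kf = apply_K(k, f, a, b)
h = (b - a) / len(ys)

# Sup-norm estimate: ||Kf||_0 <= M (b - a) ||f||_0.
M = max(abs(k(x, y)) for x in ys for y in ys)
sup_f = max(abs(f(y)) for y in ys)
sup_Kf = max(abs(v) for v in Kf)

# L^2 estimate: ||Kf||_{L^2} <= C ||f||_{L^2}, with C^2 the double integral of |k|^2.
l2 = lambda vals: math.sqrt(sum(v * v for v in vals) * h)
C = math.sqrt(sum(k(x, y) ** 2 for x in ys for y in ys) * h * h)
l2_f = l2([f(y) for y in ys])
l2_Kf = l2(Kf)

print(sup_Kf, "<=", M * (b - a) * sup_f)
print(l2_Kf, "<=", C * l2_f)
```

The discrete sums satisfy the same inequalities as the integrals, since the triangle and Cauchy-Schwarz inequalities hold for finite sums as well.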
2 Closed Operators and Closed Range Operators

Let E and F be Banach spaces and let T : E → F be a linear operator. In Appendix A, Proposition 4, we give necessary and sufficient conditions for a closed subspace to have a complementary closed subspace or, equivalently, a topological complement. They are useful to understand when the kernel or the range (image) of a linear operator admits a complementary closed subspace. In infinite dimensions, several technical difficulties make the analysis harder than in finite dimensions. For the remainder of the text, the following notations are used:
• D(T) denotes the domain of T.
• R(T) denotes the range of T. When referring to the range, we may also use the symbols Im(T) or T(E).
• Ker(T) = {x ∈ D(T) | T(x) = 0} denotes the kernel of T.

The most elementary issues are:
(i) A linear operator may not be continuous.
(ii) The graph Gr(T) = {(x, T(x)) | x ∈ D(T)} may not be closed.
(iii) The range R(T) may not be closed.

Definition 2 Let E, F be Banach spaces and let T be a linear operator with domain D(T) ⊂ E and values in F. T is a closed operator if the graph Gr(T) is a closed subset of E × F. Equivalently, T is a closed operator if, given any sequence {x_n}_{n∈N} ⊂ D(T) such that lim x_n = x ∈ E and lim T(x_n) = y ∈ F, we have x ∈ D(T) and T(x) = y. Considering the norm || x ||_{Gr(T)} = || x ||_E + || T(x) ||_F, T is closed if and only if (D(T), || . ||_{Gr(T)}) is a Banach space. T is a closable operator when T admits a closed extension, that is, a closed operator S : D(S) → F such that D(T) ⊂ D(S) and S(x) = T(x) for all x ∈ D(T).

Later, in Section 2.8, when treating unbounded linear operators, we will give an example of a non-closable operator. There we will also show more details concerning the closable operators defined on Hilbert spaces. For a full treatment of the topic, we recommend [20, 39].

The third issue concerns the closed range, which is a real pain in the neck. Given a linear operator T : E → F, let T(E) be the range of T.

Theorem 1 Let T : E → F be a bounded linear operator. If the range T(E) is closed in F, then given ε > 0, there is a constant N ∈ R, N > 0, such that for all x ∈ E there is a point x' ∈ E such that:
(i) || T(x) − T(x') ||_F ≤ ε.
(ii) || x' ||_E ≤ N || T(x) ||_F.

Proof The image-set T(E) is a Banach space, since it is a closed subspace of F. Let B_E = {x ∈ E; || x || < 1} be the open unit ball. We split the proof into some steps; the closure of a set A is written $\overline{A}$.

Step 1: T(E) = ∪_{n=1}^∞ $\overline{nT(B_E)}$. Let x ∈ E and n ∈ N be such that || x ||_E < n. So (1/n) x ∈ B_E and
\[
T(x) = nT\left(\tfrac{1}{n} x\right) \in nT(B_E) \ \Rightarrow\ T(E) \subset \bigcup_{n=1}^{\infty} \overline{nT(B_E)}.
\]
Since T(E) is closed, $\overline{nT(B_E)}$ ⊂ T(E) for every n. Thus, T(E) = ∪_{n=1}^∞ $\overline{nT(B_E)}$. Let B_{T(E)} = {y ∈ T(E); || y || < 1}.
Step 2: We have r > 0 and m ∈ N such that r B_{T(E)} ⊂ $\overline{2mT(B_E)}$. By the Baire Category Theorem applied to the Banach space T(E), there must be an m ∈ N such that $\overline{mT(B_E)}$ has a non-empty interior in T(E). Taking y_0 ∈ int($\overline{mT(B_E)}$), let r > 0 be such that B_r(y_0) ⊂ int($\overline{mT(B_E)}$). Therefore whenever || y_0 − T(x) || < r, we have T(x) ∈ int($\overline{mT(B_E)}$). In particular, if || T(x) || < 1, then y_0 + rT(x) ∈ int($\overline{mT(B_E)}$). Let {α_n}_{n∈N} and {β_n}_{n∈N} be sequences in B_E such that
\[
\lim_{n} mT(\alpha_n) = y_0, \qquad \lim_{n} mT(\beta_n) = y_0 + rT(x).
\]
Such sequences exist because both points lie in the closure of mT(B_E). Define the sequence γ_n = β_n/2 − α_n/2 and note that {γ_n}_{n∈N} is a sequence in B_E since
\[
\|\gamma_n\| \le \tfrac{1}{2} \|\alpha_n\| + \tfrac{1}{2} \|\beta_n\| < 1.
\]
Furthermore, lim 2mT(γ_n) = rT(x). Therefore rT(x) ∈ $\overline{2mT(B_E)}$ for every x ∈ E with || T(x) || < 1. Hence r B_{T(E)} ⊂ $\overline{2mT(B_E)}$.

Step 3: Final. Let x ∈ E be such that T(x) ≠ 0 (if T(x) = 0, take x' = 0). Since $\overline{2mT(B_E)}$ is closed, the last step gives
\[
r \frac{T(x)}{\|T(x)\|} \in r\, \overline{B_{T(E)}} \subset \overline{2mT(B_E)}.
\]
So given ε > 0, there is z ∈ B_E such that
\[
\left\| r \frac{T(x)}{\|T(x)\|} - 2mT(z) \right\| < \frac{r\, \varepsilon}{\|T(x)\|},
\]
and so
\[
\left\| T(x) - \frac{2m \|T(x)\|}{r} T(z) \right\| < \varepsilon.
\]
Taking x' = (2m || T(x) || / r) z, we have || T(x) − T(x') || < ε. Moreover, since z ∈ B_E,
\[
\|x'\| = \frac{2m \|T(x)\|}{r} \|z\| \le \frac{2m}{r} \|T(x)\|.
\]
Defining N = 2m/r, we have || x' || ≤ N || T(x) ||.
Theorem 2 Let T : E → F be a bounded linear operator. The subspace T(E) is closed if and only if there is a finite constant M > 0 such that for every y ∈ T(E) there is x ∈ E with y = T(x) and || x ||_E ≤ M || y ||_F.

Proof (⇒) Let y ∈ T(E). Consider the sequence {u_n}_{n∈N} constructed as follows: given ε_1 = || y ||/2, by Theorem 1, there is u_1 ∈ E such that
\[
\|y - T(u_1)\| < \frac{\|y\|}{2} \quad \text{and} \quad \|u_1\| \le N \|y\|.
\]
Given ε_2 = || y ||/2^2, by the same argument we have u_2 ∈ E such that
\[
\|[y - T(u_1)] - T(u_2)\| < \frac{\|y\|}{2^2} \quad \text{and} \quad \|u_2\| \le N \|y - T(u_1)\| \le N \frac{\|y\|}{2}.
\]
Recursively, we have u_n ∈ E such that
\[
\Big\| y - \sum_{i=1}^{n} T(u_i) \Big\| < \frac{\|y\|}{2^n} \quad \text{and} \quad \|u_n\| \le N \frac{\|y\|}{2^{n-1}}.
\]
Since E is complete and Σ_n || u_n || ≤ 2N || y ||, the series x = Σ_{n=1}^∞ u_n converges in E; it satisfies T(x) = y and || x ||_E ≤ 2N || y ||_F, so we can take M = 2N.

(⇐) Let {y_n}_{n∈N} ⊂ T(E) be a sequence converging to y ∈ F. It is a Cauchy sequence, so given ε > 0, we have n_0 ∈ N such that if n, m > n_0, then || y_n − y_m ||_F < ε. Consider y_n = T(x_n). So the sequence {x_n}_{n∈N} ⊂ E is also Cauchy, since by the hypothesis, || x_n − x_m ||_E ≤ M . || y_n − y_m ||_F < Mε. If x = lim x_n, then by continuity, we have y = T(x). Hence y ∈ T(E).
Corollary 1 If T : E → F is a closed range operator, then the left inverse T^{-1} : T(E) → E, characterized by T^{-1} T = I_E, exists and is continuous.

Proof (⇒) Since T is a closed range operator, we have M > 0 such that || x || ≤ M || T(x) ||. So T is injective, since T(x) = T(y) implies || x − y || = 0. Continuity is straightforward, noticing that
\[
\|T^{-1}(T(x))\| = \|x\| \le M \|T(x)\|.
\]
On the other hand, if T^{-1} is continuous, then
\[
\|x\| = \|T^{-1}(T(x))\| \le \|T^{-1}\| \cdot \|T(x)\|.
\]
The reverse claim follows upon taking M = || T^{-1} ||.
Proposition 3 Let T : E → F be a bounded linear operator. If T(E) has a topological complement, then T(E) is closed.

Proof Let W ⊂ F be the topological complement of T(E), so W is closed and F = T(E) ⊕ W. The product space E × W = {(x, w) | x ∈ E, w ∈ W} is a Banach space endowed with the norm || (x, w) ||_p = || x ||_E + || w ||_F, since W is closed. Define the linear operator S : E × W → F by S(x, w) = T(x) + w. S is bounded since
\[
\|S(x, w)\|_F = \|T(x) + w\|_F \le \|T(x)\|_F + \|w\|_F \le |T| \cdot \|x\|_E + \|w\|_F \le (|T| + 1)(\|x\|_E + \|w\|_F).
\]
S is surjective since F = T(E) ⊕ W; in particular, S(E × W) = F is closed. By Theorem 2, there is N > 0 such that for every y ∈ F we have an (x, w) ∈ E × W such that S(x, w) = T(x) + w = y and || (x, w) ||_p ≤ N || y ||_F. If y ∈ T(E), then, since the sum is direct, the W-component of y is w = 0, i.e., S(x, 0) = T(x) = y. So, || (x, 0) ||_p = || x ||_E ≤ N || y ||_F.
Hence T (E) is closed.
A subspace V ⊂ E is said to be finite co-dimensional if dim(E/V) < ∞.

Corollary 2 Let E and F be Banach spaces and let T : E → F be a bounded linear operator. If T(E) is finite co-dimensional, then T(E) is closed.

Proof Let W ⊂ F be a finite dimensional algebraic complement of T(E). So W is a topological complement of T(E) in F. By the last proposition, T(E) is closed.
It is important to keep in mind that:
(1) bounded operators are closed operators.
(2) bounded operators may not have a closed range. Consider the embedding C^0([0, 1]) → L^1([0, 1]).
(3) there are closed unbounded operators.
(4) there are unbounded operators with closed range.

We could say that the bounded linear operators having closed range are tame; otherwise, they can be wild objects. Examples will be studied in the next sections. The class of differential operators is a source of examples of unbounded linear operators.
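Item (2) can be made concrete with a small numerical sketch, which is not part of the original text. It assumes the illustrative family f_n(x) = x^n on [0, 1]: the sup norm is 1 while the L^1 norm is 1/(n + 1), so no constant M satisfies || f ||_0 ≤ M || f ||_{L^1} for all f. By Theorem 2, the range of the (injective) embedding is therefore not closed.

```python
def sup_norm(f, n_grid=2000):
    # Approximate the sup norm on [0, 1] by sampling a uniform grid.
    return max(abs(f(i / n_grid)) for i in range(n_grid + 1))

def l1_norm(f, n_grid=2000):
    # Midpoint-rule approximation of the L^1 norm on [0, 1].
    h = 1.0 / n_grid
    return sum(abs(f((i + 0.5) * h)) for i in range(n_grid)) * h

def ratio(n):
    # ||f_n||_{L^1} / ||f_n||_0 for the illustrative choice f_n(x) = x^n.
    f = lambda x: x ** n
    return l1_norm(f) / sup_norm(f)

for n in (1, 5, 25, 125):
    print(n, ratio(n))  # close to 1/(n + 1), so the ratio tends to 0
```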
3 Dual Spaces

A linear functional defined on E is a linear operator f : E → K, where K = R or K = C. The dual space E* is the space of bounded linear functionals defined on E. Consider the following example: an integral defines the linear functional I : C^0([a, b]) → R,
\[
I(f) = \int_a^b f(x)\, dx.
\]
In infinite dimensions, the dual space plays a relevant role. In finite dimensions, E* is isomorphic to E. The full description of all linear functionals in E* becomes an important question. When E is a Hilbert space, Riesz’s Representation Theorem answers this question by stating that for any f ∈ E*, we have a vector v_f ∈ E such that f(u) = ⟨v_f, u⟩. An operator T ∈ L(E, F) induces the linear operator T* : F* → E*, (T* f)(u) = f(T(u)), for all u ∈ E. Let E and F be Hilbert spaces; given T ∈ L(E, F), we define the adjoint operator T* ∈ L(F, E) by
\[
\langle T(u), v \rangle = \langle u, T^*(v) \rangle. \tag{2}
\]
Both definitions of the dual operator T* coincide on a Hilbert space. For some purposes which we will not delve into, some concepts of convergence in infinite-dimensional spaces are necessary. Let (E, || . ||) be a Banach space and {x_n}_{n∈N} ⊂ E a sequence;
(1) the convergence x_n → x is strong if || x_n − x ||_E → 0 when n → ∞.
(2) the convergence x_n →ʷ x is weak if, for all f ∈ E*, | f(x_n) − f(x) | → 0 when n → ∞.

Strong convergence implies weak convergence; the reverse is false in general, although it holds if dim(E) < ∞. For all Banach spaces E, and for every u ∈ E, consider that the linear functional
\[
u^*(v) = \begin{cases} 0, & v \text{ and } u \text{ are linearly independent}, \\ a, & \text{if } v = a.u,\ a \in K, \end{cases}
\]
induces the sequence E → E* → (E*)*. The operator J : E → (E*)*, J(x)(f) = f(x), f ∈ E*, is linear and || J(x) || = || x ||. We say that E is a reflexive Banach space if J(E) = (E*)*, which is trivial if dim(E) < ∞. Reflexive spaces carry the following property:

Theorem 3 If E is a reflexive Banach space, then all bounded sets are relatively compact in the weak topology.

Proof See [27].
Exercises
(1) Prove that | T | = sup_{||x||=1} || T(x) ||_F = sup_{||x||≤1} || T(x) ||_F.
(2) Prove that | S.T | ≤ | S | . | T |, for all S, T ∈ L(E).
(3) Consider c > 0. Let E and F be Banach spaces and let T ∈ L(E, F) be an operator such that || T(x) || ≥ c . || x ||, for all x ∈ E. Prove that T^n(E) is a closed subset of F, for all n ∈ N.
(4) Let k : [0, 1] × [0, 1] → R, k(x, y) = y, and E = (C^0([0, 1]), || . ||_0). Find the norm, with respect to the sup norm, of the integral operator K : E → E given by
\[
(Kf)(x) = \int_0^1 k(x, y) f(y)\, dy.
\]
(5) Consider 1 ≤ p < ∞ and let E(p) = (C^0([a, b]), || . ||_{L^p}). Assume k : [a, b] × [a, b] → R is continuous and prove that the operator K : E(p) → E(p), (Kf)(x) = ∫_a^b k(x, y) f(y) dy, is bounded. (Hint: use Hölder’s inequality | ∫_a^b (f.g)(x) dx | ≤ || f ||_{L^p} . || g ||_{L^q}, where 1/p + 1/q = 1.)
(6) Let E be a reflexive Banach space. Prove that a weakly convergent sequence {x_n}_{n∈N} ⊂ E is bounded.
4 The Spectrum of a Bounded Linear Operator

In this section, we consider K = C and E is a Banach space. When E = C^n, the standard procedure to study an operator T : E → E is by decomposing E into T-invariant subspaces. For this purpose, the spectral set σ(T) = {λ ∈ C | T_λ = T − λI is non-invertible} is the main ingredient. To describe σ(T), we decompose the characteristic polynomial p_T(x) of T as the product of irreducible factors:
\[
p_T(x) = \prod_{i=1}^{r} (x - \lambda_i)^{e_i}, \qquad \sum_{i=1}^{r} e_i = n.
\]
The T-invariant subspaces are E_{λ_i} = Ker(T − λ_i I)^{e_i}, where dim(E_{λ_i}) = e_i. The Primary Decomposition Theorem guarantees the decomposition E = ⊕_{i=1}^r E_{λ_i}. If T is a normal operator, i.e., T*.T = T.T*, by the Spectral Theorem, we can take E_{λ_i} = Ker(T − λ_i I), and so T is diagonalizable. When T is not diagonalizable, T can be represented by a block matrix with blocks given by the Jordan form.

When E is an infinite dimensional vector space, the spectrum σ(T) is rather complex as a set since new phenomena arise. For every λ ∈ C, we associate the operator T_λ = T − λI. The spectrum σ(T) of an operator is the set of elements λ ∈ C such that the operator T_λ is not invertible or is not lower bounded; that is, we have a sequence {x_n}_{n∈N} ⊂ E with || x_n || = 1 such that || T_λ(x_n) || → 0. This last condition is avoided if we have c ∈ (0, ∞) such that || T_λ(x) || ≥ c || x ||, which implies that the image-set T_λ(E) is closed. An operator T ∈ L(E) that is not lower bounded may fail to be injective, and its image may fail to be closed.

Examples
(1) Let E = l^2(N) and consider the operator T : l^2(N) → l^2(N) given by T(x_1, x_2, x_3, ...) = (0, x_1, x_2, x_3, ...). Clearly, T is not surjective and therefore cannot be invertible, so 0 ∈ σ(T). Since T is injective, 0 is not an eigenvalue; indeed, T has no eigenvalues.
(2) Let E = l^2(Z) and consider T : l^2(Z) → l^2(Z) given by
\[
T(\dots, x_{-3}, x_{-2}, x_{-1}, x_0, x_1, x_2, x_3, \dots) = (\dots, y_{-3}, y_{-2}, y_{-1}, y_0, y_1, y_2, y_3, \dots), \qquad y_i = x_{i-1}.
\]
If T x = λx with x ≠ 0 and λ ∈ C, then x ∉ l^2(Z), since we would have || x || = ∞. So T has no eigenvalues. However, let us see that for any λ ∈ U_1 (| λ | = 1), there is a sequence {x_n}_{n∈N} such that || x_n || = 1 and || T x_n − λ x_n || → 0 when n → ∞. Fixing λ ∈ U_1, consider
\[
x_n = \frac{1}{\sqrt{n}} \left( \dots, 0, 1, \lambda^{-1}, \lambda^{-2}, \lambda^{-3}, \dots, \lambda^{2-n}, \lambda^{1-n}, 0, \dots \right), \tag{3}
\]
where the entry 1 sits at the coordinate of index 0. The jth coordinate is (x_n)_j = λ^{-j}/√n if 0 ≤ j ≤ n − 1, and (x_n)_j = 0 otherwise. Then || x_n || = 1, for all n ∈ N, and T(x_n) has coordinates
\[
\big( T(x_n) \big)_{j+1} = \frac{\lambda^{-j}}{\sqrt{n}}, \qquad 0 \le j \le n - 1.
\]
In this way, T(x_n) − λ x_n has only two non-zero coordinates, those of index 0 and n, both of absolute value 1/√n, and || T(x_n) − λ x_n || = √2/√n; so,
\[
\lim_{n \to \infty} \| T(x_n) - \lambda x_n \| = 0.
\]
Therefore any λ ∈ U_1 belongs to σ(T).
(3) Let H be a Hilbert space and P : H → H an orthogonal projection; so P^2 = P and P* = P. Therefore | P | = 1 and P^n = P for all n ≥ 2. Moreover, 0 ∈ σ(P) since Ker(P) ≠ 0. Besides, if there is v ≠ 0 ∈ H such that Pv = λv, then λ(λ − 1)v = 0. Therefore σ(P) = {0, 1}. If λ ∈ C and | λ | > 1, then the inverse of (P − λI) is
\[
(P - \lambda I)^{-1} = -\lambda^{-1} [I - \lambda^{-1} P]^{-1} = -\lambda^{-1} \big[ I + \lambda^{-1} P + \lambda^{-2} P^2 + \dots + \lambda^{-n} P^n + \dots \big] = -\lambda^{-1} \big[ I + (\lambda^{-1} + \lambda^{-2} + \dots + \lambda^{-n} + \dots) P \big] = -\frac{1}{\lambda} \Big[ I + \frac{1}{\lambda - 1} P \Big].
\]
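Returning to the bilateral shift of example (2), the approximate eigenvectors can be checked numerically. The sketch below is not part of the original text; it assumes the convention that x_n is supported on the indices 0, ..., n − 1 with (x_n)_j = λ^{-j}/√n, and the helper name `shift_defect` is hypothetical. Finitely supported vectors of l^2(Z) are stored as dictionaries from index to value.

```python
import cmath
import math

def shift_defect(lam, n):
    # Returns || T x_n - lam * x_n || for the shift (T x)_i = x_{i-1} and
    # x_n = n^{-1/2} (1, lam^-1, ..., lam^{1-n}) placed at the indices 0..n-1.
    x = {j: lam ** (-j) / math.sqrt(n) for j in range(n)}
    Tx = {j + 1: v for j, v in x.items()}
    support = set(x) | set(Tx)
    return math.sqrt(sum(abs(Tx.get(j, 0) - lam * x.get(j, 0)) ** 2 for j in support))

lam = cmath.exp(0.7j)  # an arbitrary point of the unit circle
for n in (1, 10, 100, 1000):
    print(n, shift_defect(lam, n), math.sqrt(2.0 / n))
```

The two printed columns agree up to rounding, reflecting that only the coordinates of index 0 and n survive in T(x_n) − λ x_n.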
For any T ∈ L(E), when T_λ = T − λI is invertible, the operator R_λ(T) = T_λ^{-1} is the resolvent of T. A complex number λ is a regular value of T if all of the following conditions are satisfied:
(i) R_λ(T) exists,
(ii) R_λ(T) is bounded,
(iii) R_λ(T) is well-defined on a dense subset of E.

Definition 3 The resolvent set of T is ρ(T) = {λ ∈ C | λ regular value of T}. The spectrum of T is the complementary set σ(T) = C\ρ(T). The spectrum σ(T) decomposes into σ(T) = σ_p(T) ∪ σ_c(T) ∪ σ_r(T);
(i) σ_p(T) is the point spectrum, whose elements λ are such that R_λ(T) is not well-defined. The complex numbers λ ∈ σ_p(T) are the eigenvalues of T.
(ii) σ_c(T) is the continuous spectrum, whose elements λ are those for which R_λ(T) is unbounded.
(iii) σ_r(T) is the residual spectrum, whose elements λ are those for which the domain of R_λ(T) is not dense in E.

Therefore C = ρ(T) ∪ σ(T). If E is a finite dimensional vector space, then σ_c(T) = σ_r(T) = ∅, that is, in finite dimensions the spectral values of T are the eigenvalues of T.

Example 3
(1) E = C^0([a, b]); fix θ ∈ E and consider T : E → E given by T(f) = θ.f. For all λ ∈ Im(θ), the operator R_λ(T) is unbounded, so σ(T) = Im(θ).
(2) Consider H = L^2([0, 1]) and let T : H → H be the bounded operator T(f)(x) = x f(x) (| T | = 1). In this example, there are no eigenvalues since for any λ ∈ C, the only solution in H of the equation T(f) = λf is the function f(x) = 0 almost everywhere in [0, 1]. Let’s analyze the following cases to describe the spectrum:
(i) λ ∉ [0, 1]. The operator T_λ is invertible, since R_λ g = g(x)/(x − λ) ∈ H and, writing d = dist(λ, [0, 1]) > 0,
\[
\| R_\lambda g \|_{L^2}^2 = \int_0^1 \frac{|g(x)|^2}{|x - \lambda|^2}\, dx \le \frac{1}{d^2}\, \|g\|_{L^2}^2.
\]
Hence ρ(T) ⊃ C\[0, 1].
(ii) λ ∈ [0, 1]. The operator T_λ = T − λI is not surjective: if g = T_λ f, then f(x) = g(x)/(x − λ), and for a constant function g(x) = c ≠ 0 this f does not belong to H; this rules out the non-zero constant functions from being in the image of T_λ. Let’s check that the image of T_λ, which is the domain of R_λ, is dense in H; take f ∈ H and consider the sequence {f_n}_{n∈N} ⊂ H given by
\[
f_n(x) = \begin{cases} f(x), & \text{if } |x - \lambda| \ge 1/n, \\ 0, & \text{if } |x - \lambda| < 1/n. \end{cases}
\]
Therefore f = lim f_n in H and f_n ∈ Im(T_λ). Hence σ(T) = σ_c(T) = [0, 1] and ρ(T) = C\[0, 1].
(3) Let A ∈ U_n and E_A = {f ∈ C^0([a, b]; C^n) | f(t + 1) = A.f(t)}. Consider the operator T : E_A → E_A, T(u) = −i du/dt. Then Ker(T_λ) = {e^{iλt} u_0 | u_0 ∈ C^n, t ∈ R}. By imposing the condition f(t + 1) = A.f(t), we get A.u_0 = e^{iλ} u_0, and so u_0 is an eigenvector of A associated to the eigenvalue e^{iλ}. Considering A = exp(iξ), such that ξ ∈ u_n (Lie algebra of U_n), it follows that σ(T) = σ(ξ) + 2πZ.

The following topological properties will be proved next;
(i) the set ρ(T) ⊂ C is an open subset.
(ii) σ(T) ⊂ C is a compact non-empty subset.

Proposition 4 Let T ∈ L(E). If | T | < 1, then (I − T) ∈ L(E) is invertible and
\[
(I - T)^{-1} = \sum_{n=0}^{\infty} T^n.
\]
Proof Assume for a while that the series Σ_{n=0}^∞ T^n converges uniformly. So
\[
(I - T) \sum_{n=0}^{\infty} T^n = \sum_{n=0}^{\infty} (T^n - T^{n+1}) = I.
\]
Convergence follows by the Weierstrass criterion since if | T | ≤ c < 1, then
\[
\Big| \sum_{n=0}^{\infty} T^n \Big| \le \sum_{n=0}^{\infty} |T|^n = \frac{1}{1 - |T|} \le \frac{1}{1 - c}.
\]
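The series of Proposition 4 can be tested on a small matrix. The following sketch is not part of the original text: it assumes E = R^2 with the sup norm, and the matrix T is an arbitrary choice whose maximal absolute row sum is 0.5 < 1. The helper names are hypothetical.

```python
def mat_mul(A, B):
    # Product of two matrices given as lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    # Entrywise sum of two matrices of the same shape.
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def neumann_inverse(T, terms=60):
    # Partial sum I + T + ... + T^(terms-1), approximating (I - T)^(-1) when |T| < 1.
    n = len(T)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(terms - 1):
        power = mat_mul(power, T)
        total = mat_add(total, power)
    return total

T = [[0.2, 0.3], [-0.1, 0.4]]          # maximal absolute row sum 0.5 < 1
I_minus_T = [[0.8, -0.3], [0.1, 0.6]]  # the matrix I - T
S = neumann_inverse(T)
product = mat_mul(I_minus_T, S)
print(product)                          # close to the identity matrix
```

With | T | ≤ 0.5, the tail of the series after 60 terms is of order 0.5^60, far below floating-point precision.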
Corollary 3 If λ ∈ ρ(T) and | T | […] n_0, then | T_n − T | < δ […]. Both inequalities
\[
| T_n^{-1} - T^{-1} | = | -T_n^{-1} (T_n - T) T^{-1} | \le | T_n^{-1} | \cdot | T_n - T | \cdot | T^{-1} |
\]
and
\[
| T_n^{-1} | \le \frac{| T^{-1} |}{1 - | T^{-1} | \cdot | T - T_n |}
\]
imply that
\[
| T_n^{-1} - T^{-1} | \le \frac{| T^{-1} |^2 \cdot | T_n - T |}{1 - | T^{-1} | \cdot | T - T_n |},
\]
and the right-hand side tends to 0 when n → ∞. Therefore lim T_n^{-1} = T^{-1}.
Corollary 4 If T ∈ L(E), then ρ(T) ⊂ C is an open subset and σ(T) = C\ρ(T) is closed.

Proof Let λ ∈ ρ(T) and consider ε > 0 such that | R_λ(T) | < 1/ε. Let’s check that if μ is such that | μ − λ | < ε, then μ ∈ ρ(T). Using the same trick as in the last proposition, we have
\[
R_\mu(T) = \big[ I - (\mu - \lambda).R_\lambda(T) \big]^{-1}.R_\lambda(T).
\]
So T_μ is invertible. Moreover, R_μ(T) is bounded since | T_μ − T_λ | < ε and | R_μ(T) | […]

[…] T(e_n), and so
\[
\| T(x) \|^2 \le \| x \|^2 \sum_{n=1}^{\infty} \| T(e_n) \|^2 < \infty.
\]
5 Compact Linear Operators
We also have T(x) = Σ_n ⟨T(x), e_n⟩ e_n. So given ε > 0, we then have n_0 ∈ N such that
\[
\sum_{i = n_0 + 1}^{\infty} | \langle T(x), e_i \rangle |^2 < \varepsilon, \quad \forall x \in \Omega. \tag{7}
\]
For n > n_0, let V_n be the finite dimensional subspace generated by {e_1, ..., e_n} and P_n : H → V_n the projection
\[
P_n(x) = \sum_{i=1}^{n} \langle x, e_i \rangle e_i, \quad (P_n^2 = P_n).
\]
As for any orthogonal projection, | P_n | = 1. For any sequence {x_k}_{k∈N} ⊂ Ω, we have
\[
\| T(x_k) - T(x_l) \| \le \| P_n (T(x_k) - T(x_l)) \| + \| (I - P_n)(T(x_k) - T(x_l)) \|.
\]
From the estimate obtained in Eq. (7), we have || (I − P_n)(T(x_k) − T(x_l)) || < ε for all k, l. Since V_n is a finite dimensional subspace and Ω is bounded, there is a subsequence {x_{k_i}} of {x_k} such that {P_n.T(x_{k_i})} is a Cauchy sequence in V_n. Therefore {T(x_{k_i})} is also a Cauchy sequence, converging in H. Indeed, this proves that T can be approximated by the sequence {P_n.T}_{n∈N} of finite rank operators. Consequently, finite rank operators are dense in the Hilbert-Schmidt class.
(5) Consider the Hilbert space E = {(x_1, ..., x_n, ...) | x_i ∈ C, Σ_i | x_i |^2 < ∞}. Let {λ_n}_{n∈N} be a sequence in E. It follows from the last example that the diagonal operator D : E → E,
\[
D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 & \cdots \\
0 & \lambda_2 & \cdots & 0 & \cdots \\
\vdots & \vdots & \ddots & \vdots & \\
0 & 0 & \cdots & \lambda_n & \cdots \\
\vdots & \vdots & & \vdots & \ddots
\end{pmatrix},
\]
is compact.

The compact operators are also characterized by transforming weakly convergent sequences into strongly convergent ones.

Theorem 5 A bounded linear operator is compact if and only if it maps weakly convergent sequences into strongly convergent sequences.
Proof Let E, F be Banach spaces and K ∈ K(E, F).
(i) (⇒); take {x_n}_{n∈N} ⊂ E and x ∈ E such that x_n →ʷ x; set y_n = K(x_n) and y = K(x). The first step is to show that y_n →ʷ y, and for this consider f ∈ F* and g = K* f ∈ E*. Since f is arbitrary, it follows from
\[
| f(y_n) - f(y) | = | f(K(x_n)) - f(K(x)) | = | (K^* f)(x_n) - (K^* f)(x) | = | g(x_n) - g(x) | \to 0
\]
that y_n →ʷ y. Now suppose that y_n does not converge strongly to y. Then there is a subsequence {y_{n_k}} such that || y_{n_k} − y ||_F ≥ ε for some ε > 0. The weak convergence x_n →ʷ x implies that {x_n}_{n∈N} is bounded. Due to the compactness of K, the sequence {y_{n_k} = K(x_{n_k})} has a strongly convergent subsequence {y'_n}. Taking y' = lim y'_n, it follows that y'_n →ʷ y', and consequently y' = y; this contradicts the hypothesis that || y_{n_k} − y ||_F ≥ ε.
(ii) (⇐); Assume that for all sequences {x_n}_{n∈N} ⊂ E such that x_n →ʷ x, we have lim K(x_n) = K(x) (strongly). Then K is compact whenever every bounded sequence in E has a weakly convergent subsequence, which is the case when E is reflexive (Theorem 3).
When H is a Hilbert space admitting an orthonormal basis {e_n}_{n∈N} ⊂ H, we have Σ_{n=1}^∞ | ⟨x, e_n⟩ |^2 < ∞ for all x ∈ H. Consequently, lim_{n→∞} ⟨x, e_n⟩ = 0. Hence the sequence {e_n}_{n∈N} is weakly convergent to 0.

Corollary 5 Let H be a Hilbert space, let {e_n}_{n∈N} ⊂ H be an orthonormal basis of H, and let K be a compact operator. Then {K(e_n)}_{n∈N} converges strongly to 0.

Proof Since e_n →ʷ 0, Theorem 5 implies that the sequence {K(e_n)}_{n∈N} converges strongly, say, lim K(e_n) = w. Then
\[
\| w \|^2 = \lim_{n \to \infty} \langle K(e_n), w \rangle = \lim_{n \to \infty} \langle e_n, K^*(w) \rangle = 0.
\]
Therefore {K(e_n)}_{n∈N} converges strongly to 0.
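Corollary 5 can be observed on a truncated model of l^2. The sketch below is not part of the original text: it assumes the diagonal operator with entries λ_n = 1/n (a Hilbert-Schmidt, hence compact, operator as in example (5)) and a fixed test vector x = (1, 1/2, 1/3, ...), both chosen only for illustration.

```python
import math

def e(n, dim):
    # n-th vector (1-based) of the standard orthonormal basis, truncated to dim coordinates.
    return [1.0 if i == n - 1 else 0.0 for i in range(dim)]

def K(x):
    # Diagonal operator with entries lambda_n = 1/n (an illustrative compact operator).
    return [xi / (i + 1) for i, xi in enumerate(x)]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

dim = 1000
x = [1.0 / (i + 1) for i in range(dim)]  # a fixed vector of l^2, truncated

for n in (1, 10, 100, 1000):
    en = e(n, dim)
    inner = sum(a * b for a, b in zip(x, en))  # the inner product of x and e_n equals 1/n
    print(n, norm(en), inner, norm(K(en)))
```

The basis vectors keep norm 1, so they do not converge strongly, while both the inner products with x and the norms of K(e_n) decay like 1/n.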
The spectral set of a compact operator is discrete, as we shall prove next.

Theorem 6 The set of eigenvalues of an operator T ∈ K(E) is countable and can be finite or empty. Moreover, the set of accumulation points A(σ(T)) of σ(T) is either A(σ(T)) = ∅ or A(σ(T)) = {0}.

Proof It is sufficient to prove that the set Λ(c) = {λ ∈ σ(T); | λ | > c} is finite for every real number c > 0. Suppose that Λ(c) is not finite; then there is a sequence {λ_n}_{n∈N} such that λ_n ≠ λ_m for all n ≠ m and | λ_n | ≥ c. Let E_T = {x_1, ..., x_n, ...} be a set of eigenvectors associated to the sequence {λ_n}, that is, T x_n = λ_n x_n. Since E_T is a linearly independent set of vectors, consider V(n) = {Σ_{i=1}^n a_i x_i | a_i ∈ C}. Then for all x = Σ_{i=1}^n a_i x_i ∈ V(n), we have
\[
(T - \lambda_n I) x = a_1 (\lambda_1 - \lambda_n) x_1 + \dots + a_{n-1} (\lambda_{n-1} - \lambda_n) x_{n-1} \in V(n - 1).
\]
Let’s assume E is infinite dimensional; otherwise, the claim is known. From Corollary 2 in Appendix A, it follows that we have a sequence {y_n}_{n∈N} with the following properties: (i) y_n ∈ V(n), (ii) || y_n || = 1 and (iii) || y_n − x || ≥ 1/2, for all x ∈ V(n − 1). The existence of such a sequence will lead us to a contradiction. Assuming m < n, consider the point x̃ = λ_n y_n − (T y_n − T y_m). We have x̃ ∈ V(n − 1), since (λ_n I − T) y_n ∈ V(n − 1) and T y_m ∈ V(m) ⊂ V(n − 1) (m < n). Writing x̃ = λ_n x with x ∈ V(n − 1), we get
\[
\| T y_n - T y_m \| = \| \lambda_n y_n - \tilde{x} \| = | \lambda_n | \cdot \| y_n - x \| \ge \frac{| \lambda_n |}{2} \ge \frac{c}{2}.
\]
Consequently, the sequence {T y_n}_{n∈N} has no convergent subsequence, contradicting the hypothesis that T is compact. The set Λ(c) = {λ ∈ σ(T); | λ | > c} being finite yields that either (i) A(σ(T)) = ∅, when λ_n = λ_0 for all n ≥ n_0, or (ii) A(σ(T)) = {0}, when λ_n → 0.

The theory of compact operators arose from the study of integral equations
\[
x(s) - \lambda \int_a^b k(s, t) x(t)\, dt = y(s), \tag{8}
\]
which can be written in the form (I − λT)(x) = y, where T(x)(s) = ∫_a^b k(s, t) x(t) dt. Fredholm proved that either the integral equation (8) has a single solution or the homogeneous equation
\[
x(s) - \lambda \int_a^b k(s, t) x(t)\, dt = 0
\]
has a finite number of linearly independent solutions. Therefore either (I − λ.T)^{-1} exists or the kernel of (I − λ.T) has finite dimension. The example above motivated the following definition.

Definition 6 The operator T ∈ L(E) satisfies the Fredholm alternative if one of the following conditions is satisfied;
C1 The non-homogeneous equations T(x) = y, T* f = g have unique solutions x and f, respectively, for all y ∈ E and g ∈ E*.
C2 The homogeneous equations T(x) = 0, T* f = 0 have the same number of linearly independent solutions, say x_1, ..., x_n and f_1, ..., f_n, respectively. The non-homogeneous equations T(x) = y, T* f = g have a solution if and only if y and g satisfy f_k(y) = 0, g(x_k) = 0, for all 1 ≤ k ≤ n.

Theorem 7 Let T ∈ K(E) and λ ≠ 0. Then T_λ satisfies the Fredholm alternative.

Proof See [27].
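Condition C2 already has content in finite dimensions, where every operator satisfies the Fredholm alternative. The sketch below is not part of the original text: it assumes E = R^2 and a hypothetical singular matrix A playing the role of T, with the functional f_1 spanning the kernel of the transpose.

```python
A = [[1.0, 2.0], [2.0, 4.0]]  # singular: the second row is twice the first
f1 = [2.0, -1.0]              # spans the kernel of the transpose: f1 . A = 0

def apply(A, x):
    # Matrix-vector product.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def solvable(y, tol=1e-12):
    # Condition C2: A x = y has a solution if and only if f1(y) = 0.
    return abs(f1[0] * y[0] + f1[1] * y[1]) < tol

# y = (1, 2) satisfies f1(y) = 0 and is indeed attained, e.g. by x = (1, 0).
print(solvable([1.0, 2.0]), apply(A, [1.0, 0.0]))
# y = (1, 0) has f1(y) = 2, so it is not in the range of A.
print(solvable([1.0, 0.0]))
```

The range of A is the line spanned by (1, 2), which is exactly the set of vectors annihilated by f_1.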
Exercises
(1) Let T ∈ K(E, F) and assume Ω ⊂ E is a bounded subset. Prove that the image T(Ω) is relatively compact, i.e., the closure of T(Ω) is compact.
(2) Prove that K(E, F) is a closed subset of L(E, F).
(3) Assume that T : E → F is compact. Prove that T* : F* → E* is also compact.
(4) Consider HS(E, F), the space of Hilbert-Schmidt operators defined in Example 4, with the inner product
\[
\langle K_1, K_2 \rangle_{HS} = \sum_{i=1}^{\infty} \langle K_1(e_i), K_2(e_i) \rangle.
\]
Prove that HS(E, F) is a Hilbert space.
(5) Let Ω ⊂ R^n be a bounded set and r < n. Prove that the operator K : L^2(Ω) → L^2(Ω), given by
\[
(Kf)(x) = \int_{\Omega} \frac{1}{| x - y |^r} f(y)\, dy,
\]
is compact.
6 Fredholm Operators

A generalization of the Primary Decomposition Theorem for compact operators will be demonstrated in this section.

Definition 7 Let E, F be Banach spaces. A linear operator T ∈ L(E, F) is Fredholm if it satisfies the following conditions:
(i) Ker(T) is closed and dim(Ker(T)) < ∞.
(ii) T(E) ⊂ F is closed and dim(F/T(E)) is finite (codim(T(E)) < ∞).
If dim(F/T(E)) < ∞, then T(E) ⊂ F is a closed subset (Corollary 2). The space F(E, F) of Fredholm operators is not a vector space, since 0 ∉ F(E, F). However, if T ∈ F(E, F), then tT ∈ F(E, F) for all t ∈ C, t ≠ 0.

Definition 8 Let T ∈ F(E, F). The index of T is
\[
\operatorname{ind}(T) = \dim(\operatorname{Ker}(T)) - \dim\left( \frac{F}{T(E)} \right). \tag{9}
\]
Example 5 Examples of Fredholm operators.
(1) Let E = {(a_1, a_2, ..., a_n, ...) | a_i ∈ C, Σ_i | a_i |^p < ∞}. The operators T, S : E → E,
T(a_1, a_2, ..., a_n, ...) = (a_2, a_3, ..., a_n, ...),
S(a_1, a_2, ..., a_n, ...) = (0, a_1, a_2, ..., a_n, ...)
are Fredholm, with ind(T) = 1 and ind(S) = −1.
(2) Let T : E → E be a compact operator. Then Φ = I − T is a Fredholm operator and ind(Φ) = 0. This example is fundamental in the theory of integral equations and in the spectral theory of compact operators.
(a) dim(Ker(Φ)) < ∞. Taking x ∈ Ker(Φ), we have that T x = x, that is, T|_{Ker(Φ)} = I|_{Ker(Φ)}. It follows from the compactness of T that the identity operator restricted to Ker(Φ) is compact, and so dim(Ker(Φ)) < ∞.
(b) Φ(E) is closed in E. Consider V a closed complement of Ker(Φ) so that E = Ker(Φ) ⊕ V. Consider the continuous operators Φ_V = Φ|_V : V → E and T_V = T|_V : V → E, so Ker(Φ_V) = {0}. We will prove that Φ_V^{-1} : Φ(V) → V is continuous at the origin. Suppose that Φ_V^{-1} is not continuous at the origin; then there is a sequence {x_n} ⊂ V such that lim Φ_V(x_n) = 0 and lim x_n ≠ 0. Passing to a subsequence if necessary, let r > 0 be such that | x_n | ≥ r for all n ∈ N; therefore 1/| x_n | ≤ 1/r and, consequently, lim Φ_V(x_n/| x_n |) = 0. The sequence {x̂_n = x_n/| x_n |}_{n∈N} is bounded, so we have a subsequence {T(x̂_{n_k})} ⊂ {T(x̂_n)}_{n∈N} converging in E. Since Φ(x̂_{n_k}) = x̂_{n_k} − T(x̂_{n_k}), the subsequence {x̂_{n_k}} also converges. The limit z = lim x̂_{n_k} belongs to Ker(Φ), since z − T(z) = 0, and also to V, with | z | = 1. This contradicts the construction of the decomposition E = Ker(Φ) ⊕ V. Hence Φ_V^{-1} is continuous and Φ(E) = Φ(V) is closed.
(c) codim(Φ(E)) < ∞. Suppose that codim(Φ(E)) = ∞. Let Φ(E) = W_0 and consider a sequence of closed subspaces
Φ(E) = W_0 ⊂ W_1 ⊂ W_2 ⊂ ··· ⊂ W_n ⊂ ...
such that dim(W_n/W_{n−1}) = 1. Given ε > 0, Corollary 2 in Appendix A guarantees that we have a sequence {x_n}_{n∈N} ⊂ E such that x_n ∈ W_n, | x_n | = 1 and | x_n − y | > 1 − ε for all y ∈ W_{n−1} and for all n. Taking k < n, we get
\[
| T(x_n) - T(x_k) | = \big| x_n - \underbrace{\big( x_k + \Phi(x_n) - \Phi(x_k) \big)}_{\in W_{n-1}} \big| \ge 1 - \varepsilon.
\]
So {T(x_n)}_{n∈N} has no convergent subsequence. That is, the assumption codim(Φ(E)) = ∞ leads to a contradiction since T is a compact operator.
(d) Since Φ : E → E is Fredholm, we have the decompositions E = Ker(Φ) ⊕ V and E = Φ(V) ⊕ Coker(Φ). The map Φ : V → Φ(V) is an isomorphism, so dim(Ker(Φ)) = dim(Coker(Φ)). Therefore ind(Φ) = 0.
(3) Let E = (C^0([0, 1], R^n), || . ||_0) and let T : E → E be the operator T(f)(x) = ∫_0^x f(t) dt. Since T is linear and compact, the operator Φ : E → E, defined by
\[
\Phi(f)(x) = f(x) - \int_0^x f(t)\, dt
\]
is Fredholm with ind(Φ) = 0.
(4) Let K = [a, b] ⊂ R be a compact interval, V ⊂ R an open subset, and E_j = (C^j(K, V), || . ||_j). Consider the differential operator D : E_j → E_{j−1} given by D(f) = f'.
(a) The operator D : E_1 → E_0 is Fredholm. The kernel is Ker(D) = R and D is surjective (D(E_1) = E_0), so ind(D) = 1. The surjectiveness follows from the Fundamental Theorem of Calculus; given g ∈ E_0, the curve f(t) = ∫_a^t g(s) ds satisfies the equation D(f) = g and f ∈ E_1. However, D is not compact, since a bounded sequence {f_n}_{n∈N} ⊂ E_1 is not enough to guarantee the equicontinuity of the sequence {f_n'}_{n∈N}. Thus, the sequence {f_n'}_{n∈N} ⊂ E_0 may not have a convergent subsequence in E_0.
(b) The operator D : E_2 → E_0, D(f) = f', is compact but is not Fredholm. The kernel is Ker(D) = R. The operator is not surjective, since given a curve g ∈ E_0, the curve f(t) = ∫_a^t g(s) ds satisfies D(f) = g but it may not be C^2. However, in this case, the operator D is compact. Taking a bounded sequence {f_n}_{n∈N} ⊂ E_2, such that || f_n ||_2 < C, we have || f_n'' ||_0 < C. By the Mean Value Theorem,
\[
| f_n'(y) - f_n'(x) | \le \| f_n'' \|_0 \cdot | y - x |.
\]
Consequently, {f_n'}_{n∈N} is equicontinuous. This means that the sequence {f_n'}_{n∈N} ⊂ C^0(K, V) admits a convergent subsequence. The image of D is not closed, since we can take a sequence {f_n}_{n∈N} with the limit lim f_n' = g ∉ C^1.
(5) Let φ ∈ E = C^0(K, R) and define the operator T_φ : C^k → C^k, T_φ(f) = φ.f. So T_φ ∈ L(E) and | T_φ | = || φ ||_0. Let D_φ : C^3(K, V) → C^0(K, V) be the operator D_φ = T_φ ◦ D, D_φ(f) = φ.f'. Hence D_φ is Fredholm.

Proposition 7 The space F(E, F) is an open subset of the space L(E, F) and the function ind : F(E, F) → Z, T → ind(T), is continuous.

Proof Take Φ ∈ F(E, F). The decomposition E = Ker(Φ) ⊕ V is taken in such a way that Φ_V = Φ|_V : V → Φ(V) is invertible. Also, consider the decomposition F = Φ(V) ⊕ W, with dim(W) < ∞. From Proposition 5, for any operator T sitting in the ball B_r(Φ) = {T ∈ L(E, F); | T − Φ | < r}, the restriction T_V : V → T(V) is invertible, by taking r < 1/| Φ_V^{-1} |. In this case we have a subspace V' ⊂ Ker(Φ) such that Ker(Φ) = Ker(T) ⊕ V'. Let m = dim(V') and let W' ⊂ W be the complement of T(V'), that is, F = Φ(V) ⊕ T(V') ⊕ W', and dim(T(V')) = m since T : V' → T(V') is an isomorphism. Therefore,
\[
\operatorname{ind}(T) = \dim(\operatorname{Ker}(T)) - \dim(W') = [\dim(\operatorname{Ker}(T)) + m] - [\dim(W') + m] = \dim(\operatorname{Ker}(\Phi)) - \dim(W) = \operatorname{ind}(\Phi).
\]
An important property of the Fredholm operators is their stability under compact perturbations, as shown in the following corollary;

Corollary 6 If T ∈ F(E, F) and K ∈ K(E, F), then T + K ∈ F(E, F).

Proof Since T|_{Ker(T+K)} = −K|_{Ker(T+K)} is a compact operator with closed range, we have dim(Ker(T + K)) < ∞. The operator tK is also compact for all t ∈ C, and the map t → ind(T + tK) is continuous, so it is constant. Consequently, dim(F/(T + K)(E)) < ∞. Hence (T + K)(E) ⊂ F is closed in F and T + K is Fredholm.
Example 6

(1) On $E = (C^0([a, b]; \mathbb{R}), \|\cdot\|_0)$, define the operator $T : E \to E$, $(Tf)(x) = \int_a^x f(t)\,dt$. The equation $(Tf) - \alpha f = g$ has a solution for all $g \in E$ and $\alpha \neq 0 \in \mathbb{R}$. It is equivalent to $-\alpha[I - \alpha^{-1}T](f) = g$. The operator $\Phi = -\alpha[I - \alpha^{-1}T]$ is Fredholm. We assert that its kernel is trivial, i.e., $\mathrm{Ker}(\Phi) = \{0\}$. Note that if $f \in \mathrm{Ker}(\Phi)$, then $(Tf)(x) = \alpha f(x)$. As $Tf$ is differentiable, it follows that $f \in \mathrm{Ker}(\Phi)$ is also differentiable and $f' = \frac{1}{\alpha} f$, that is, $f(x) = f(a)\,e^{(x-a)/\alpha}$. However, $(Tf)(a) = 0 = \alpha f(a)$ implies $f(x) = 0$ for all $x \in [a, b]$. Therefore $\Phi$ is injective. Since $\mathrm{ind}(\Phi) = 0$, it follows that $\Phi$ is surjective.

(2) Let $K \in \mathcal{K}(E)$ and assume $T = I - K$ is injective, so $T$ is surjective. For all $t \in \mathbb{R}$, $tK$ is compact and the map $t \mapsto \mathrm{ind}(I - tK)$ is continuous, and therefore constant. Since $\mathrm{ind}(I) = 0$ at $t = 0$, taking $t = 1$ gives $\mathrm{ind}(I - K) = 0$. Therefore $\dim(E/T(E)) = \dim(\mathrm{Ker}(T)) = 0$.
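(3) The Laplacian operator defined in a compact region is one of the most important examples. Let $\Omega \subset \mathbb{R}^n$ be a compact subset and $E = L^2(\Omega; \mathbb{R})$. The Laplacian operator $\Delta : L^2(\Omega; \mathbb{R}) \to L^2(\Omega; \mathbb{R})$ is defined as

$$\Delta f = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x_i^2}.$$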
The assertion that $\Delta$ is a Fredholm operator is proved by verifying the Inequality (14) and applying the fact that the inclusion $L^{2,k} \to L^2$ of the Sobolev space is a compact embedding for $k > 2$; both proofs are beyond the scope of this text (see [18]).

Proposition 8 Let $E, F, G$ be Banach spaces and consider $S \in \mathcal{F}(E, F)$ and $T \in \mathcal{F}(F, G)$. So $\mathrm{ind}(TS) = \mathrm{ind}(T) + \mathrm{ind}(S)$.

Proof The composition $T \circ S$ is represented by

$$E \xrightarrow{\ S\ } F \xrightarrow{\ T\ } G.$$

Due to the identity

$$\mathrm{ind}(TS) = \dim(\mathrm{Ker}(TS)) - \dim\left(G/TS(E)\right), \tag{10}$$
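we will calculate each portion of $\mathrm{ind}(TS)$ as a function of $\mathrm{ind}(T)$ and $\mathrm{ind}(S)$. The inclusions $\mathrm{Ker}(S) \subset \mathrm{Ker}(TS)$ and $TS(E) \subset T(F)$ imply

$$\begin{aligned}
\dim(\mathrm{Ker}(TS)) &= \dim(\mathrm{Ker}(S)) + \dim\left(\frac{\mathrm{Ker}(TS)}{\mathrm{Ker}(S)}\right), \\
\dim\left(\frac{G}{TS(E)}\right) &= \dim\left(\frac{T(F)}{TS(E)}\right) + \dim\left(\frac{G}{T(F)}\right).
\end{aligned} \tag{11}$$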
All spaces above have finite dimension. Applying the Rank-Nullity Theorem to the linear operator induced by $T$, $F/S(E) \to T(F)/TS(E)$, we get

$$\dim\left(\frac{F}{S(E)}\right) = \dim\left(\frac{\mathrm{Ker}(T)}{S(E) \cap \mathrm{Ker}(T)}\right) + \dim\left(\frac{T(F)}{TS(E)}\right). \tag{12}$$

Therefore Eqs. (11) and (12) imply that
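$$\begin{aligned}
\mathrm{ind}(TS) - \mathrm{ind}(T) - \mathrm{ind}(S)
&= \dim\left(\frac{\mathrm{Ker}(TS)}{\mathrm{Ker}(S)}\right) - \dim\left(\frac{T(F)}{TS(E)}\right) - \dim(\mathrm{Ker}(T)) + \dim\left(\frac{F}{S(E)}\right) \\
&= \dim\left(\frac{\mathrm{Ker}(TS)}{\mathrm{Ker}(S)}\right) - \dim(\mathrm{Ker}(T)) + \dim\left(\frac{\mathrm{Ker}(T)}{S(E) \cap \mathrm{Ker}(T)}\right) \\
&= \dim\left(\frac{\mathrm{Ker}(TS)}{\mathrm{Ker}(S)}\right) - \dim\left(\frac{\mathrm{Ker}(T)}{S(E) \cap \mathrm{Ker}(T)}\right) - \dim(S(E) \cap \mathrm{Ker}(T)) + \dim\left(\frac{\mathrm{Ker}(T)}{S(E) \cap \mathrm{Ker}(T)}\right) \\
&= \dim\left(\frac{\mathrm{Ker}(TS)}{\mathrm{Ker}(S)}\right) - \dim(S(E) \cap \mathrm{Ker}(T)).
\end{aligned} \tag{13}$$

Considering the decompositions $\mathrm{Ker}(TS) = \mathrm{Ker}(S) \oplus W$ and $E = \mathrm{Ker}(TS) \oplus U = \mathrm{Ker}(S) \oplus W \oplus U$, we have $S(E) = S(W) \oplus S(U)$. We claim $S(E) \cap \mathrm{Ker}(T) = S(W)$; it is clear that $S(W) \subset S(E) \cap \mathrm{Ker}(T)$. To prove the reverse, let $y = S(x) \in S(E)$ be such that $TS(x) = 0$; then $x \in \mathrm{Ker}(TS)$, and by taking $x = x_1 + x_2 \in \mathrm{Ker}(S) \oplus W$, we get $S(x) = S(x_2)$, and so $y \in S(W)$. Consequently $S(E) \cap \mathrm{Ker}(T) \subset S(W)$. Since $S$ is injective on $W$, we have $\dim(S(E) \cap \mathrm{Ker}(T)) = \dim(S(W)) = \dim(W) = \dim(\mathrm{Ker}(TS)/\mathrm{Ker}(S))$, so the right-hand side of (13) vanishes. Hence $\mathrm{ind}(TS) = \mathrm{ind}(T) + \mathrm{ind}(S)$.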
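In finite dimensions the additivity of the index can be checked directly. The sketch below is only an illustration, not part of the text's argument: the matrices `S` and `T` and the helper names `rank`, `index` and `matmul` are hypothetical choices made for this example. The index is computed as nullity minus corank, using exact rational arithmetic.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows), by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def index(matrix):
    """ind(A) = dim Ker(A) - dim(codomain / Im(A)) for A : R^n -> R^m (m rows, n columns)."""
    m, n = len(matrix), len(matrix[0])
    rk = rank(matrix)
    return (n - rk) - (m - rk)


def matmul(a, b):
    """Matrix product a.b, i.e. the composition of the two linear maps."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical operators S : R^4 -> R^3 and T : R^3 -> R^2, neither of full rank.
S = [[1, 0, 2, 0], [0, 1, 1, 0], [1, 1, 3, 0]]
T = [[1, 1, 0], [2, 2, 0]]
TS = matmul(T, S)

print(index(S), index(T), index(TS))
```

Here the kernels and cokernels of `S`, `T` and `TS` are all non-trivial, yet the indices still add up, as Proposition 8 asserts.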
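A sufficient condition often used to check if an operator is Fredholm is the following:

Proposition 9 Let $E, F, G$ be Banach spaces. Let $T : E \to F$ be a bounded operator and let $K : E \to G$ be a compact operator. If there is a constant $c > 0$ such that

$$\|x\|_E \le c\,\left(\|T(x)\|_F + \|K(x)\|_G\right), \tag{14}$$

for all $x \in E$, then the following conditions are satisfied:
(i) $\dim(\mathrm{Ker}(T)) < \infty$.
(ii) $T(E)$ is closed.

Proof (i) Let $\{x_n\}_{n\in\mathbb{N}} \subset \mathrm{Ker}(T)$ be a sequence such that $\|x_n\| \le 1$ for all $n$. Since $\{x_n\}_{n\in\mathbb{N}}$ is bounded, we have a subsequence $\{x_{n_k}\}$ so that $\{K(x_{n_k})\}$ converges. From the Inequality (14) and $T(x_{n_k}) = 0$, $\{x_{n_k}\}$ is a Cauchy sequence, and so it converges in $E$. Therefore the unit ball of $\mathrm{Ker}(T)$ is compact, and $\mathrm{Ker}(T)$ is finite dimensional.

(ii) Let $V \subset E$ be the complementary subspace of $\mathrm{Ker}(T)$ and $y \in \overline{T(V)}$. Now we have a sequence $\{x_n\} \subset V$ such that $y = \lim T(x_n)$. Let's check that $\{x_n\}$ is bounded; by contradiction, suppose there is a subsequence $\{x_{n_k}\} \subset V$ with $\|x_{n_k}\| \to \infty$. Considering the sequence $\hat{x}_{n_k} = \frac{x_{n_k}}{\|x_{n_k}\|}$, we have $\lim T(\hat{x}_{n_k}) = 0$ and, passing to a subsequence, $\{K(\hat{x}_{n_k})\}$ converges. Therefore, by (14), $\{\hat{x}_{n_k}\} \subset V$ is a Cauchy sequence. If $\lim \hat{x}_{n_k} = \hat{x}$, then $\hat{x} \in V$, $\|\hat{x}\| = 1$ and $T(\hat{x}) = 0$. However, this contradicts the decomposition $E = \mathrm{Ker}(T) \oplus V$. Hence $\{x_n\}$ is bounded; passing to a subsequence, $\{K(x_n)\}$ converges, so by (14) $\{x_n\}$ is a Cauchy sequence converging to some $x \in V$, and $y = T(x) \in T(V) = T(E)$.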
The operators $T, P \in \mathcal{L}(E, F)$ are considered congruent, $T \equiv P \mod \mathcal{K}(E, F)$, whenever $T - P \in \mathcal{K}(E, F)$. This is an equivalence relation defined in $\mathcal{L}(E, F)$. When $F = E$, if $T_1 \equiv P_1 \mod \mathcal{K}(E)$ and $T_2 \equiv P_2 \mod \mathcal{K}(E)$, then $T_1 \cdot T_2 \equiv P_1 \cdot P_2 \mod \mathcal{K}(E)$. An operator $T \in \mathcal{L}(E, F)$ is invertible modulo the compact operators if there is $T' \in \mathcal{L}(F, E)$ such that $T' \cdot T \equiv I_E \mod \mathcal{K}(E)$ and $T \cdot T' \equiv I_F \mod \mathcal{K}(F)$. Therefore $T'$ is the inverse of $T$ modulo the compact operators.

Proposition 10 Let $T \in \mathcal{L}(E, F)$. $T$ is a Fredholm operator if and only if $T$ is invertible modulo the compact operators. The inverse $T'$ of $T$ can be chosen so that $\dim(F/T(E)) < \infty$.

Proof ($\Rightarrow$) Consider $E = \mathrm{Ker}(T) \oplus V$ and $F = T(V) \oplus W$, where $V$ and $W$ are closed subspaces. Let $P : F \to E$ be the continuous linear operator $P = i \circ T^{-1} \circ pr_1$ given by

$$F = T(V) \oplus W \xrightarrow{\ pr_1\ } T(V) \xrightarrow{\ T^{-1}\ } V \xrightarrow{\ i\ } E = \mathrm{Ker}(T) \oplus V.$$

Since $TP(x) = 0$ for all $x \in W$, we get two cases: (i) if $x \in W$, then $(I_F - TP)(x) = x$; (ii) if $x \in T(V)$, then $(I_F - TP)(x) = 0$. Therefore $pr_W = I_F - TP$ is the projection over $W$ and a compact operator, since $\dim(W) < \infty$. With the same argument, $pr_{N(T)} = I_E - PT$ is the projection over $\mathrm{Ker}(T)$, so it is also a compact operator.

($\Leftarrow$) Let $K_F = I_F - TP \in \mathcal{K}(F)$ and $K_E = I_E - PT \in \mathcal{K}(E)$. Since $\mathrm{Ker}(T) \subset \mathrm{Ker}(PT)$ and $(I_E - PT)|_{\mathrm{Ker}(PT)} = I_E|_{\mathrm{Ker}(PT)}$ is compact, we have $\dim(\mathrm{Ker}(PT)) < \infty$. So $\dim(N(T)) < \infty$. Once the operators $PT = I_E - K_E$ and $TP = I_F - K_F$ are Fredholm, the inclusion $TP(F) \subset T(E)$ implies $T(E)$ has finite codimension; therefore $T(E) \subset F$ is closed.
The proof above gives a geometric interpretation, showing that the operator $I - TP$ is a projection over the finite-dimensional space $W$. The linear Fredholm operators may not be invertible; however, they are pseudo-invertible. Let $T : E \to F$ be a bounded linear operator. The pseudo-inverse of $T$ is an operator $Q : F \to E$ such that: (i) $QTQ = Q$, (ii) $TQT = T$.

Proposition 11 A linear operator $T : E \to F$ has a pseudo-inverse if and only if the following conditions are satisfied:
(i) $T(E)$ is closed.
(ii) We have a subspace $E_1 \subset E$ such that $E = \mathrm{Ker}(T) \oplus E_1$.
(iii) We have a subspace $F_0 \subset F$ such that $F = F_0 \oplus T(E)$.
In particular, every Fredholm operator has a pseudo-inverse.

Proof ($\Leftarrow$) Let's assume $T$ satisfies items (i)–(iii). Taking $E_0 = \mathrm{Ker}(T)$ and $F_1 = T(E)$, let $E_1 \subset E$ and $F_0 \subset F$ be subspaces such that

$$E = E_0 \oplus E_1, \qquad F = F_0 \oplus F_1.$$

Therefore $T_1 = T|_{E_1} : E_1 \to F_1$ is a bounded isomorphism. For any $v = v_0 + v_1 \in F$, with $v_0 \in F_0$ and $v_1 \in F_1$, define

$$Q : F \to E, \qquad Q(v_0 + v_1) = T_1^{-1}(v_1).$$

Therefore we have (i) $QTQ = Q$ and (ii) $TQT = T$.

($\Rightarrow$) Now we assume $T$ has a pseudo-inverse $Q : F \to E$. Define the subspaces $E_1 = Q(F)$ and $F_0 = \mathrm{Ker}(Q)$. The operator $P = QT : E \to E$ is a projection, since $P^2 = (QT)(QT) = QT = P$, $\mathrm{Ker}(P) = \mathrm{Ker}(T)$ and $P(E) = Q(F)$. Then $P$ projects $E$ over $Q(F)$. Analogously, $P' = TQ : F \to F$ is a projection such that $\mathrm{Ker}(P') = \mathrm{Ker}(Q)$ and $P'(F) = T(E)$, so $P'$ projects $F$ over $T(E)$. Now the following three items are satisfied: (i) the subspace $T(E)$ is closed, since it is the image of the projection $P'$; (ii) the complement of $\mathrm{Ker}(T)$ is $Q(F)$; (iii) the complement of $T(E)$ is $\mathrm{Ker}(Q)$.
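The two identities defining a pseudo-inverse, and the projections $P = QT$ and $P' = TQ$ used in the proof, can be checked on a small singular matrix. This is a minimal sketch under stated assumptions: the matrix `T` and the candidate `Q` (here the transpose of `T` divided by 25, which happens to be its Moore-Penrose inverse) are hypothetical choices for illustration only.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical singular operator on R^2: rank 1, kernel spanned by (2, -1).
T = [[Fraction(1), Fraction(2)],
     [Fraction(2), Fraction(4)]]

# Candidate pseudo-inverse: transpose of T divided by 25.
Q = [[T[j][i] / 25 for j in range(2)] for i in range(2)]

P = matmul(Q, T)        # projection of E over Q(F), with kernel Ker(T)
P_prime = matmul(T, Q)  # projection of F over T(E), with kernel Ker(Q)

print(matmul(matmul(Q, T), Q) == Q)  # identity (i):  QTQ = Q
print(matmul(matmul(T, Q), T) == T)  # identity (ii): TQT = T
```

Exact rational arithmetic is used so that the identities hold with equality rather than up to rounding.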
Exercises
(1) Let $E, F$ be finite-dimensional Banach spaces and $T \in \mathcal{L}(E, F)$. Prove that $T$ is Fredholm and $\mathrm{ind}(T) = \dim(E) - \dim(F)$.
(2) Let $T : E \to F$ be a Fredholm compact operator. Prove that $\dim(E)$ and $\dim(F)$ are finite.
(3) Let $K \subset \mathbb{R}^n$ be a compact subset, $V \subset \mathbb{R}^m$ an open subset and $E^j = (C^j(K, V), \|\cdot\|_j)$. Is the derivative operator $D : E^k \to E^{k-1}$, $D(f) = f'$, a Fredholm operator?
(4) Let $T_1, T_2 \in \mathcal{F}(E)$. Prove that $T_2 \cdot T_1 \in \mathcal{F}(E)$.
(5) Let $T \in \mathcal{F}(E, F)$. Prove that $T$ satisfies the Fredholm alternative, defined in (6).
(6) Let $H$ be a Hilbert space and $T \in \mathcal{F}(H)$. Prove that $\mathrm{ind}(T) = \dim(\mathrm{Ker}(T)) - \dim(\mathrm{Ker}(T^*))$.
(7) Let $P_1, P_2$ be pseudo-inverses of $T \in \mathcal{F}(E, F)$. Prove that $P_1 \equiv P_2 \mod \mathcal{K}(F, E)$.
(8) Assume $T_1, T_2 \in \mathcal{F}(E, F)$. Prove that $T_1 \cdot T_2 \in \mathcal{F}(E, F)$.
(9) Let $E, F$ be Banach spaces and $T \in \mathcal{F}(E, F)$. Prove that there is a Banach space $G$ and a bounded linear operator $S : G \to F$ such that the operator $T \oplus S : E \oplus G \to F$, $(T \oplus S)(u \oplus v) = T(u) + S(v)$, is surjective. Moreover, the projection $P : \mathrm{Ker}(T \oplus S) \to G$ is Fredholm and $\mathrm{ind}(T) = \mathrm{ind}(P)$.
6.1 The Spectral Theory of Compact Operators

When $E$ is a finite dimensional vector space, we associate to every linear operator $T \in \mathcal{L}(E)$ the characteristic polynomial

$$p_T(x) = (x - \lambda_1)^{n_1} (x - \lambda_2)^{n_2} \cdots (x - \lambda_k)^{n_k}, \tag{15}$$

where $\lambda_i$, $1 \le i \le k$, are the eigenvalues of $T$ and $\sum_{i=1}^{k} n_i = \dim(E)$. The Cayley-Hamilton Theorem asserts that $p_T(T) = 0$. For every eigenvalue $\lambda_i$, we associate the operator $T_i = T - \lambda_i I$ and the generalized eigenspace

$$E_i(\lambda_i, n_i) = \mathrm{Ker}(T_i^{n_i}). \tag{16}$$

The Primary Decomposition Theorem asserts that the space $E$ decomposes as

$$E = E_1(\lambda_1, n_1) \oplus E_2(\lambda_2, n_2) \oplus \cdots \oplus E_k(\lambda_k, n_k). \tag{17}$$

Consider $E$ a Banach space. We will extend the Primary Decomposition Theorem to compact operators $K \in \mathcal{K}(E)$. For any $\lambda \neq 0 \in \mathbb{C}$, the linear operator $T_\lambda = K - \lambda I = -\lambda(I - \lambda^{-1}K)$ is Fredholm. The same is true for the powers

$$T_\lambda^p = (K - \lambda I)^p = (-\lambda)^p \left(I - K_1\right),$$

since the operator $K_1 = \sum_{j=1}^{p} (-1)^{j+1} \binom{p}{j} (\lambda^{-1}K)^j$ is compact. Since $\mathrm{ind}(T_\lambda^p) = 0$, we get $\dim(\mathrm{Ker}(T_\lambda^p)) = \dim\left(E/T_\lambda^p(E)\right)$ for all $p \in \mathbb{N}$. Therefore either $T_\lambda$ is invertible or $\lambda \in \sigma(K)$ is an eigenvalue. By Theorem 6, the spectrum $\sigma(K)$ is enumerable and either its only accumulation point is $0$, i.e., $A(\sigma(K)) = \{0\}$, or $A(\sigma(K)) = \emptyset$. We fix an order $\sigma(K) = \{\lambda_1, \lambda_2, \ldots, \lambda_n, \ldots\}$ with $|\lambda_i| \ge |\lambda_{i+1}|$ and $\lim \lambda_n = 0$. For every $i$, define the operator $T_i = K - \lambda_i I$.

Proposition 12 Let $K \in \mathcal{K}(E)$ and $\lambda_i \in \sigma(K)$, $\lambda_i \neq 0$. So for each $i \in \{1, \ldots, n, \ldots\}$ there is an integer $n_i > 0$ such that $\mathrm{Ker}\left((K - \lambda_i I)^n\right) = \mathrm{Ker}\left((K - \lambda_i I)^{n_i}\right)$ for all $n \ge n_i$. The integer $n_i$ plays the role of the exponent of the eigenvalue $\lambda_i$ in the characteristic polynomial defined in (15).

Proof By contradiction, assume that the assertion is false. Then, for some $\lambda = \lambda_i$, there is a strictly ascending chain of subspaces

$$\mathrm{Ker}[K - \lambda I] \subsetneq \mathrm{Ker}[K - \lambda I]^2 \subsetneq \cdots \subsetneq \mathrm{Ker}[K - \lambda I]^n \subsetneq \cdots.$$

However, this is not possible since the operator $(K - \lambda I)^n$ is Fredholm. Therefore we get $\dim(\mathrm{Ker}[K - \lambda I]^n) < \infty$ for all $n \in \mathbb{N}$, and we also have $n_\lambda \in \mathbb{N}$ at which the chain stops.
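Theorem 8 Consider $K \in \mathcal{K}(E)$ and let $\lambda_i \in \sigma(K)$ be a non-null eigenvalue with exponent $n_i$. Then the space $E$ admits a decomposition

$$E = \mathrm{Ker}(T_i^{n_i}) \oplus \mathrm{Im}(T_i^{n_i}). \tag{18}$$

The subspaces of the decomposition are $K$-invariant. Moreover, if $\lambda_j \in \sigma(K)$ is an eigenvalue distinct from $\lambda_i$, with exponent $n_j$, then

$$\mathrm{Ker}(T_j^{n_j}) \subset \mathrm{Im}(T_i^{n_i}). \tag{19}$$

Proof The spaces $\mathrm{Ker}(T_i^{n_i})$ and $\mathrm{Im}(T_i^{n_i})$ are $K$-invariant closed subspaces. Suppose the closed subspace $F = \mathrm{Ker}(T_i^{n_i}) \oplus \mathrm{Im}(T_i^{n_i}) \subsetneq E$. Since $\mathrm{ind}(T_i^{n_i}) = 0$, we have $\dim(E/\mathrm{Im}(T_i^{n_i})) = \dim(\mathrm{Ker}(T_i^{n_i}))$. Once $E = \mathrm{Im}(T_i^{n_i}) \oplus (E/\mathrm{Im}(T_i^{n_i}))$, it follows that $F = E$. So we have $E = \mathrm{Ker}(T_i^{n_i}) + \mathrm{Im}(T_i^{n_i})$. To prove that $\mathrm{Ker}(T_i^{n_i}) \cap \mathrm{Im}(T_i^{n_i}) = \{0\}$, consider $x \in \mathrm{Ker}(T_i^{n_i}) \cap \mathrm{Im}(T_i^{n_i})$ and assume that $x = T_i^{n_i}(y)$, $y \in E$. Since $T_i^{n_i}(x) = 0$, we get $(T_i^{n_i})^2(y) = 0$. Since $\mathrm{Ker}((T_i^{n_i})^2) = \mathrm{Ker}(T_i^{n_i})$, we have $y \in \mathrm{Ker}(T_i^{n_i})$ and $x = 0$. Therefore $E = \mathrm{Ker}(T_i^{n_i}) \oplus \mathrm{Im}(T_i^{n_i})$.

Now, let $\lambda_j \neq \lambda_i$ be another eigenvalue in $\sigma(K)$. The commutativity $T_i^{n_i} \cdot T_j^{n_j} = T_j^{n_j} \cdot T_i^{n_i}$ implies that the subspaces in the decomposition (18) are $T_j^{n_j}$-invariant. Let $z \in \mathrm{Ker}(T_j^{n_j})$ and consider $z = x + y$, where $x \in \mathrm{Ker}(T_i^{n_i})$ and $y \in \mathrm{Im}(T_i^{n_i})$. The identity $T_j^{n_j}(z) = T_j^{n_j}(x) + T_j^{n_j}(y) = 0$ now implies that $T_j^{n_j}(x) = 0$, i.e., $x \in \mathrm{Ker}(T_j^{n_j})$. Using the fact that the polynomials $p(x) = (x - \lambda_i)^{n_i}$ and $q(x) = (x - \lambda_j)^{n_j}$ are relatively prime, we now have polynomials $r(x), s(x)$ such that

$$r(x) \cdot p(x) + s(x) \cdot q(x) = 1. \tag{20}$$

Evaluating the identity (20) on $K$, we get

$$r(K) \cdot p(K) + s(K) \cdot q(K) = I. \tag{21}$$

Then,

$$x = I(x) = r(K) \cdot T_i^{n_i}(x) + s(K) \cdot T_j^{n_j}(x) = 0. \tag{22}$$

Therefore $z = y \in \mathrm{Im}(T_i^{n_i})$.

For every $k \in \mathbb{N}$, let $\mathcal{K}_k = \bigoplus_{i=1}^{k} \mathrm{Ker}(T_i^{n_i})$ and $\mathcal{I}_k = \bigcap_{i=1}^{k} \mathrm{Im}(T_i^{n_i})$. Applying Theorem 8, we have the decomposition $E = \mathcal{K}_k \oplus \mathcal{I}_k$, corresponding to the Primary Decomposition Theorem for compact operators on Banach spaces. If the exponents of all eigenvalues $\lambda_i \in \sigma(K)$ satisfy $n_i = 1$, then $K$ is diagonalizable. The subspaces satisfy the inclusion chains

$$\mathcal{K}_1 \subsetneq \mathcal{K}_2 \subsetneq \cdots \subsetneq \mathcal{K}_n \subsetneq \cdots, \qquad \mathcal{I}_1 \supsetneq \mathcal{I}_2 \supsetneq \cdots \supsetneq \mathcal{I}_n \supsetneq \cdots.$$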
7 Linear Operators on Hilbert Spaces

A sesquilinear form $\langle \cdot, \cdot \rangle : H \times H \to \mathbb{C}$ defined on a Hilbert space imposes a more efficient structure to address the linear algebra questions. As a result of the Riesz Representation Theorem (Theorem 12 in Appendix A), the Hilbert spaces $H$ and $H^*$ are isomorphic.

Definition 9 Let $H$ be a Hilbert space and $T \in \mathcal{L}(H)$. The adjoint operator $T^* : H \to H$ is the linear operator such that $\langle v, T(u) \rangle = \langle T^*(v), u \rangle$ for any $u, v \in H$.

In particular, if $f$ is a linear functional and $f(g) = \langle v_f, g \rangle$ for all $g \in H$, then $T^*(v_f) = v_{T^* f}$. Considering the Riesz isomorphism $R : H^* \to H$, $f \mapsto v_f$, the following diagram commutes:

$$\begin{array}{ccc}
H & \xrightarrow{\ T^*\ } & H \\
\uparrow R & & \uparrow R \\
H^* & \xrightarrow{\ T^*\ } & H^*.
\end{array}$$

Proposition 13 If $T, S \in \mathcal{L}(H)$, then $T^*, S^* \in \mathcal{L}(H)$. The adjoint operation $* : \mathcal{L}(H) \to \mathcal{L}(H)$, $T \mapsto T^*$, satisfies the following items:
(i) $(T + S)^* = T^* + S^*$.
(ii) $(T^*)^* = T$.
(iii) $(a \cdot T)^* = \bar{a} \cdot T^*$.
(iv) $(T \cdot S)^* = S^* \cdot T^*$.
(v) $|T^*| = |T|$ and $|T^* \cdot T| = |T|^2$.
Proof The items (i)–(iv) are straightforward to prove. We now prove item (v). From the inequality

$$|\langle T^*(u), v \rangle| = |\langle u, T(v) \rangle| \le |T| \cdot |u| \cdot |v|,$$

we have $|T^*| \le |T|$. Since $(T^*)^* = T$, the same argument applied twice implies that $|T| \le |T^*|$. Therefore $|T^*| = |T|$ and $|T^* \cdot T| \le |T^*| \cdot |T| = |T|^2$. The reverse inequality follows easily from

$$|T(u)|^2 = \langle T(u), T(u) \rangle = \langle u, T^* \cdot T(u) \rangle \le |T^* \cdot T| \cdot |u|^2 \ \Rightarrow\ |T|^2 \le |T^* \cdot T|.$$

Hence $|T|^2 = |T^* \cdot T|$.
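Definition 10 Let $H$ be a Hilbert space. An operator $T \in \mathcal{L}(H)$ is self-adjoint if $\langle T(u), v \rangle = \langle u, T(v) \rangle$ for any $u, v \in H$; equivalently, $T^* = T$.

Proposition 14 Let $H$ be a Hilbert space and let $T \in \mathcal{L}(H)$ be a self-adjoint operator. So

$$|T| = \sup_{\|x\| \le 1} |\langle T(x), x \rangle|. \tag{23}$$

Proof Taking $c = \sup_{\|x\| \le 1} |\langle T(x), x \rangle|$, we have $|\langle T(x), x \rangle| \le c \cdot \|x\|^2$ for all $x \in H$. The Cauchy-Schwarz inequality implies that $c \le |T|$. From the inequality

$$|T| = \sup_{|x| \le 1} \|T(x)\| \le \sup_{|x| \le 1} \sup_{|y| \le 1} |\langle T(x), y \rangle|, \tag{24}$$

we learn that it is sufficient to prove $|\langle T(x), y \rangle| \le c$ whenever $\|x\| \le 1$ and $\|y\| \le 1$. Applying the polarization identity, we get

$$4 \langle T(x), y \rangle = \big[\langle T(x + y), x + y \rangle - \langle T(x - y), x - y \rangle\big] + i \big[\langle T(x + iy), x + iy \rangle - \langle T(x - iy), x - iy \rangle\big].$$

Since $T$ is self-adjoint, both brackets are real numbers, so taking the squared modulus gives the following result (see also the parallelogram identity, Appendix A (11)):

$$16\, |\langle T(x), y \rangle|^2 = \big[\langle T(x + y), x + y \rangle - \langle T(x - y), x - y \rangle\big]^2 + \big[\langle T(x + iy), x + iy \rangle - \langle T(x - iy), x - iy \rangle\big]^2. \tag{25}$$

The polar representation $\langle T(x), y \rangle = e^{i\theta} |\langle T(x), y \rangle|$ and the vector $x' = e^{-i\theta} x$, applied into the sesquilinear product, give $\langle T(x'), y \rangle = |\langle T(x), y \rangle|$, which shows the product is real. Therefore we can assume that $\langle T(x), y \rangle \in \mathbb{R}$. As a consequence, $\langle T(x + iy), x + iy \rangle - \langle T(x - iy), x - iy \rangle = 0$, since

$$\begin{aligned}
\langle T(x + iy), x + iy \rangle - \langle T(x - iy), x - iy \rangle
&= 2i \big[ -\langle T(x), y \rangle + \langle T(y), x \rangle \big] \\
&= 2i \big[ -\langle T(x), y \rangle + \langle y, T(x) \rangle \big] \\
&= 2i \big[ -\langle T(x), y \rangle + \overline{\langle T(x), y \rangle} \big] = 0.
\end{aligned}$$

Identity (25), together with $|\langle T(z), z \rangle| \le c\,|z|^2$ and the parallelogram identity, implies that

$$\begin{aligned}
4\, |\langle T(x), y \rangle| &\le |\langle T(x + y), x + y \rangle| + |\langle T(x - y), x - y \rangle| \\
&\le c\, |x + y|^2 + c\, |x - y|^2 = c \big[ |x + y|^2 + |x - y|^2 \big] = 2c \big[ |x|^2 + |y|^2 \big].
\end{aligned}$$

Since $|x| \le 1$ and $|y| \le 1$, we have

$$4\, |\langle T(x), y \rangle| \le 2c \big( \|x\|^2 + \|y\|^2 \big) \le 4c \ \Rightarrow\ |\langle T(x), y \rangle| \le c.$$

Hence, by (24), $|T| \le c$.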
Taking into account that $\sup_{\|x\| \le 1} |\langle T(x), x \rangle| = \sup_{\|x\| = 1} |\langle T(x), x \rangle|$, define

$$m(T) = \inf_{\|x\| = 1} \langle T(x), x \rangle, \qquad M(T) = \sup_{\|x\| = 1} \langle T(x), x \rangle. \tag{26}$$
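Theorem 9 If $T \in \mathcal{L}(H)$ is a self-adjoint operator, then the following conditions are satisfied:
(i) $\sigma(T) \subset [m(T), M(T)] \subset \mathbb{R}$.
(ii) Both values $m(T)$ and $M(T)$ belong to $\sigma(T)$.

Proof Let $\lambda \in \sigma(T)$ be an eigenvalue and $x \neq 0 \in N(T - \lambda I)$. From

$$\lambda \|x\|^2 = \langle T(x), x \rangle = \langle x, T(x) \rangle = \bar{\lambda} \|x\|^2 \ \Rightarrow\ (\lambda - \bar{\lambda}) \|x\|^2 = 0, \tag{27}$$

we have $\bar{\lambda} = \lambda$, and so every eigenvalue of $T$ is real; the inclusion $\sigma(T) \subset \mathbb{R}$ for the whole spectrum follows from Item (i). To prove Item (i), we verify that the operator $T - \lambda I \in \mathcal{L}(H)$ is bijective for all $\lambda \notin [m(T), M(T)]$. Consider $\lambda \notin [m(T), M(T)]$.

(ia) $T - \lambda I$ is injective. Consider $x \in H$, $\|x\| = 1$, and $\alpha = \langle T(x), x \rangle \in [m(T), M(T)]$. Therefore $\langle (T - \alpha I)x, x \rangle = 0$ and

$$\begin{aligned}
\|(T - \lambda I)x\|^2 &= \|(T - \alpha I)x + (\alpha - \lambda)x\|^2 \\
&= |\lambda - \alpha|^2 \|x\|^2 + \|(T - \alpha I)x\|^2 + 2\,\mathrm{Re}\big((\alpha - \lambda) \langle (T - \alpha I)x, x \rangle\big) \\
&\ge |\lambda - \alpha|^2 \|x\|^2.
\end{aligned}$$

Since $|\lambda - \alpha|^2 > 0$, $T - \lambda I$ is injective. Moreover, $(T - \lambda I)(H)$ is a closed subset in $H$.

(ib) $T - \lambda I$ is surjective. For all $v \in (\mathrm{Im}(T - \lambda I))^{\perp}$ we have $\langle (T - \lambda I)x, v \rangle = 0$ for all $x \in H$. Then $\langle x, (T - \bar{\lambda} I)v \rangle = 0$ for all $x \in H$, so $(T - \bar{\lambda} I)v = 0$ and, by (ia) applied to $\bar{\lambda}$, $v = 0$. Therefore the image is dense and, being closed, $(T - \lambda I)$ is surjective.

(ic) The resolvent $R_\lambda = (T - \lambda I)^{-1}$ is bounded, since the norm of $y = (T - \lambda I)x$ satisfies $\|y\|^2 \ge d^2 \|x\|^2$, where $d = \mathrm{dist}(\lambda, [m(T), M(T)])$. Therefore $\|R_\lambda y\| \le \frac{1}{d} \|y\|$, i.e., $|R_\lambda| \le \frac{1}{d}$.

(ii) We observe that either $|T| = -m(T)$ or $|T| = M(T)$. Assuming $|T| = M(T) = M$, we have a sequence $\{x_n\}_{n\in\mathbb{N}} \subset H$ of unit vectors such that $\lim \langle T(x_n), x_n \rangle = M$. However, we have $\lim \|(T - M I)x_n\| = 0$, since

$$\|(T - M \cdot I)x_n\|^2 = \|T(x_n)\|^2 + M^2 - 2M \langle T(x_n), x_n \rangle \le 2M^2 - 2M \langle T(x_n), x_n \rangle \to 0.$$

This means that $(T - M \cdot I)$ is not invertible in $\mathcal{L}(H)$. Hence $M \in \sigma(T)$. The proof is similar in the case $|T| = -m(T)$.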
The spectrum of a self-adjoint linear operator is a non-empty subset of $\mathbb{R}$. Moreover, its norm is equal to its spectral radius,

$$r_\sigma(T) = \sup_{\lambda \in \sigma(T)} |\lambda|. \tag{28}$$
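For a symmetric matrix, the quantities $m(T)$, $M(T)$ and $|T|$ can be estimated by sampling the quadratic form on the unit circle. The sketch below is only a numerical illustration: the matrix `A` (with eigenvalues 1 and 3) and the helper names are hypothetical choices for this example, and the values are approximated on a finite grid of directions.

```python
import math

# Hypothetical self-adjoint operator on R^2; its eigenvalues are 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]


def apply(matrix, x):
    """Image of the vector x under the 2x2 matrix."""
    return (matrix[0][0] * x[0] + matrix[0][1] * x[1],
            matrix[1][0] * x[0] + matrix[1][1] * x[1])


def quadratic_form(matrix, x):
    """The value <A x, x>."""
    ax = apply(matrix, x)
    return ax[0] * x[0] + ax[1] * x[1]


# Unit vectors (cos t, sin t); half a turn suffices because the form is even in x.
steps = 3600
unit_vectors = [(math.cos(math.pi * k / steps), math.sin(math.pi * k / steps))
                for k in range(steps)]

m = min(quadratic_form(A, x) for x in unit_vectors)         # approximates m(T)
M = max(quadratic_form(A, x) for x in unit_vectors)         # approximates M(T)
norm = max(math.hypot(*apply(A, x)) for x in unit_vectors)  # approximates |T|

print(m, M, norm)
```

In this example the sampled extremes agree with the smallest and largest eigenvalues, and the operator norm agrees with $\max\{|m|, |M|\}$, as Proposition 14 and Theorem 9 predict.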
Exercises
(a) Prove identity (24).
(b) Prove that $|T^* T| = |T|^2$ for all $T \in \mathcal{L}(H)$.
(c) Let $T \in \mathcal{L}(H)$ be a self-adjoint operator. Taking $M(T) = M$ and $m(T) = m$, prove that

$$|M \cdot I - T| = \sup_{\|x\| = 1} \langle (M \cdot I - T)x, x \rangle = M - m.$$

Apply this identity to prove that $M - m \in \sigma(M \cdot I - T)$. Equivalently, we have that the operator $(M - m)I - (M \cdot I - T) = T - m \cdot I$ is not invertible.
(d) Assuming $T$ is a self-adjoint operator, prove the identity $|T| = r_\sigma(T)$ (the norm is equal to the spectral radius).
(e) Let $H$ be a Hilbert space. Prove that a linear operator $F : H \to H$ is Fredholm if and only if there exist orthogonal decompositions $H = H_1 \oplus H_2$ and $H = H_3 \oplus H_4$ such that
  (a) $H_1$ and $H_3$ are both closed subspaces.
  (b) $H_2$ and $H_4$ are finite dimensional subspaces.
  (c) $F$ can be represented in blocks as

$$F = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix} : H_1 \oplus H_2 \to H_3 \oplus H_4,$$

  where $F_{11} : H_1 \to H_3$ is invertible.
  (d) The index of $F$ is $\mathrm{ind}(F) = \dim(H_2) - \dim(H_4)$.
7.1 Characterization of Compact Operators on Hilbert Spaces

Compact operators behave similarly to finite-dimensional operators. We will give a criterion for deciding whether an operator on a Hilbert space is compact.

Proposition 15 Let $H$ be a Hilbert space and $K \in \mathcal{L}(H)$. So $K$ is compact if and only if we have a sequence of finite rank operators $\{K_n\} \subset \mathcal{L}(H)$ such that $K_n \to K$ in $\mathcal{L}(H)$.

Proof Since the closure of the image by $K$ of the unit ball is compact, it has an enumerable and dense subset, so $K(H)$ is a separable subset of $H$. Let $\{\varphi_n\}_{n\in\mathbb{N}}$ be an orthonormal basis of $\overline{K(H)} \subset H$ and $P_N(y) = \sum_{i=1}^{N} \langle y, \varphi_i \rangle \varphi_i$ the orthogonal projection over the subspace generated by $\{\varphi_1, \ldots, \varphi_N\}$. Now we have $\lim_{N \to \infty} \|P_N(y) - y\| = 0$ for all $y \in \overline{K(H)}$. The operator $K_N = P_N K \in \mathcal{K}(H)$ has finite rank. Since $K$ is compact, taking a bounded sequence $\{x_n\} \subset H$ and passing to a subsequence, let $\lim K(x_n) = y \in H$. In this way,

$$\begin{aligned}
\|(K - K_N)(x_n)\| &= \|(I - P_N)K(x_n)\| \\
&\le \|(I - P_N)(K(x_n) - y)\| + \|(I - P_N)(y)\| \\
&\le \|K(x_n) - y\| + \|(I - P_N)(y)\| \to 0, \quad \text{when } n, N \to \infty.
\end{aligned}$$

The reverse is straightforward. Consider $\{T_n\} \subset \mathcal{L}(H)$ a sequence of finite rank linear operators such that $\lim_{n \to \infty} |T - T_n| = 0$. Let $\{x_k\} \subset H$ be a bounded sequence. Since each $T_n$ has finite rank, passing to a subsequence (by a diagonal argument) we may take $y_n = \lim_{k \to \infty} T_n(x_k)$ for every $n$, and we get

$$|T(x_k) - T(x_l)| \le |T(x_k) - T_n(x_k)| + |T_n(x_k) - T_n(x_l)| + |T_n(x_l) - T(x_l)|.$$

Considering $n$ and then $k, l \in \mathbb{N}$ large enough, the sequence $\{T(x_k)\}$ is a Cauchy sequence. Hence $T$ is compact.
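A minimal sketch of the approximation by finite rank operators, assuming the hypothetical diagonal operator $K(e_n) = \frac{1}{n} e_n$ on $l^2$, cut off at `M` coordinates so that it can be stored. For a diagonal operator the operator norm is the largest absolute value on the diagonal, so the error $|K - K_N|$ of the truncation $K_N = P_N K$ can be read off directly.

```python
def truncation_error(eigenvalues, N):
    """Operator norm of K - K_N, where K = diag(eigenvalues) and K_N keeps the first N entries.

    For a diagonal operator the norm is the largest absolute value on the diagonal.
    """
    tail = eigenvalues[N:]
    return max((abs(v) for v in tail), default=0.0)


# Hypothetical compact operator: K(e_n) = e_n / n, stored up to M coordinates.
M = 1000
eigenvalues = [1.0 / n for n in range(1, M + 1)]

# Errors |K - K_N| for a few ranks N; they equal 1/(N + 1) and tend to zero.
errors = {N: truncation_error(eigenvalues, N) for N in (1, 10, 100)}
print(errors)
```

The identity operator, with all diagonal entries equal to 1, would give an error of 1 for every $N$, which is consistent with the fact that it is not compact in infinite dimension.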
The above theorem is false in the category of Banach spaces. Per Enflo gave a counterexample in [15].
7.2 Self-adjoint Compact Operators on Hilbert Spaces

Let $H$ be a Hilbert space. We will use the acronym $SACO(H)$ to represent the class of self-adjoint compact operators defined on $H$. The main properties of a self-adjoint operator $T$ are (i) $T^* = T$, (ii) the eigenvalues of $T$ are real numbers, (iii) the generalized eigenspaces associated with distinct eigenvalues are orthogonal and invariant by $T$. We also learned from Theorem 9 that $|T| \in \sigma(T)$ or $-|T| \in \sigma(T)$. The proof of the Spectral Theorem for compact self-adjoint operators defined on an infinite dimensional vector space follows the same reasoning as the proof for a finite dimensional vector space.

Theorem 10 Let $H$ be a Hilbert space and $T \in SACO(H)$. So:
(i) the spectrum $\sigma(T) = \{0\} \cup \{\lambda_1, \ldots, \lambda_n, \ldots\}$ is an ordered enumerable set such that $\lambda_n$ is an eigenvalue of $T$, $|\lambda_k| \ge |\lambda_{k+1}| > 0$ for all $k \in \mathbb{N}$, and $\lim \lambda_n = 0$.
(ii) there is an orthonormal set of eigenvectors $\beta = \{\varphi_1, \ldots, \varphi_n, \ldots\}$, which is a basis of a closed subspace $V$ such that $H = \mathrm{Ker}(T) \oplus V$.
(iii) For any $v \in H$,

$$T(v) = \sum_{n=1}^{\infty} \lambda_n \langle v, \varphi_n \rangle \varphi_n.$$

Proof Let $\lambda_1 \in \{\pm |T|\}$ and let $\varphi_1 \in H$ be a unit vector such that $T\varphi_1 = \lambda_1 \varphi_1$. Taking the subspace $V_1 = \langle \varphi_1 \rangle$ generated by $\varphi_1$, we get an orthogonal decomposition $H = V_1 \oplus V_1^{\perp}$, where $T(V_1) \subset V_1$ and $T(V_1^{\perp}) \subset V_1^{\perp}$. The restriction $T_1 = T|_{V_1^{\perp}}$ is also compact and self-adjoint. Suppose $T_1 \neq 0$. Let $\lambda_2 \in \{\pm |T_1|\}$ and let $\varphi_2 \in V_1^{\perp}$ be a unit vector such that $T_1 \varphi_2 = \lambda_2 \varphi_2$. Considering $V_2 = V_1 \oplus \langle \varphi_2 \rangle$ with basis $\beta_2 = \{\varphi_1, \varphi_2\}$, we have the decomposition $H = V_2 \oplus V_2^{\perp}$. Observing that $|T_i| \ge |T_{i+1}|$ for all $i$, the same construction is performed to obtain a decomposition for all $i \in \mathbb{N}$. According to Theorem 6, the set $\{\lambda_1, \ldots, \lambda_n, \ldots\}$ is enumerable, $|\lambda_i| \ge |\lambda_{i+1}|$ and $\lim \lambda_n = 0$. So for any $v \in V$, represented as the linear combination $v = \sum_{n=1}^{\infty} \langle v, \varphi_n \rangle \varphi_n$ of the basis $\beta = \{\varphi_n\}$, we get the convergent series $T(v) = \sum_{n=1}^{\infty} \lambda_n \langle v, \varphi_n \rangle \varphi_n$; the same formula holds for any $v \in H$, since $T$ vanishes on $\mathrm{Ker}(T)$.

The theorem above extends to compact operators.

Theorem 11 Let $E, F$ be Hilbert spaces and $K \in \mathcal{K}(E, F)$. So we have orthonormal bases $\beta_E = \{\varphi_n\}_{n\in\mathbb{N}} \subset E$ and $\beta_F = \{\psi_n\}_{n\in\mathbb{N}} \subset F$, and a sequence $\{\lambda_n\}_{n\in\mathbb{N}} \subset \mathbb{C}$, $\lim \lambda_n = 0$, such that

$$K(f) = \sum_{n=1}^{\infty} \lambda_n \langle f, \varphi_n \rangle \psi_n,$$
for all $f \in E$.

Proof Since the operator $K^* K$ is compact and positive (self-adjoint), the set of eigenvectors $\{\varphi_n\}$ forms an orthonormal basis of $E$. Let $\sigma(K^* K) = \{\mu_1, \ldots, \mu_n, \ldots\} \subset [0, \infty)$ be the set of eigenvalues. For every $n$, define $\lambda_n = \sqrt{\mu_n}$ and let $(K^* K)^{1/2} \in \mathcal{K}(E)$ be the operator given by

$$(K^* K)^{1/2}(f) = \sum_{n=1}^{\infty} \lambda_n \langle f, \varphi_n \rangle \varphi_n. \tag{29}$$

The inverse operator $(K^* K)^{-1/2}$, defined on the eigenvectors with $\lambda_n \neq 0$, is

$$(K^* K)^{-1/2}(f) = \sum_{n=1}^{\infty} \lambda_n^{-1} \langle f, \varphi_n \rangle \varphi_n. \tag{30}$$

Let $U = K (K^* K)^{-1/2}$ and $U(f) = \sum_{n=1}^{\infty} \langle f, \varphi_n \rangle U(\varphi_n)$. From Eq. (30), we have

$$U(f) = \sum_{n=1}^{\infty} \lambda_n^{-1} \langle f, \varphi_n \rangle K(\varphi_n). \tag{31}$$

Therefore we get $U(\varphi_n) = \lambda_n^{-1} K(\varphi_n)$. Since $\|U(f)\| = \|f\|$ for all $f \in E$, $U$ is an isometry and the set $\beta = \{\lambda_n^{-1} K(\varphi_n)\}$ is orthonormal. So taking $\psi_n = \lambda_n^{-1} K(\varphi_n)$, we have

$$K(f) = \sum_{n=1}^{\infty} \lambda_n \langle f, \varphi_n \rangle \psi_n. \tag{32}$$
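It is interesting to remark that a Hilbert space provided with an injective, compact, self-adjoint operator is necessarily separable, since by Theorem 10 the eigenvectors then form an enumerable orthonormal basis of the whole space. This indicates that the properties of operators can reveal the intrinsic properties of the space.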
Exercises
(1) The operator $T : H \to H$ is positive if it is self-adjoint and $\langle T(u), u \rangle \ge 0$ for all $u \in H$. Given a bounded linear operator $T : H \to H$, prove that $T^* T$ and $T T^*$ are positive.
(2) The square root of a positive operator $T$ is the positive self-adjoint operator $S$ such that $S^2 = T$. Prove that any positive operator $T$ has a unique square root $S$.
(3) Find the eigenvalues and eigenvectors of $T : L^2([0, 2\pi]) \to L^2([0, 2\pi])$ given by

$$T(u)(x) = \int_0^{2\pi} \sin(x - t)\, u(t)\, dt.$$

(4) Let $k : [0, 2\pi] \to \mathbb{R}$ be a periodic integrable function. Find the eigenvalues and eigenvectors of $T : L^2([0, 2\pi]) \to L^2([0, 2\pi])$ given by (hint: try $u_n(x) = e^{inx}$)

$$T(u)(x) = \int_0^{2\pi} k(x - t)\, u(t)\, dt.$$
7.3 Fredholm Alternative

Let $H$ be a Hilbert space and let $K \in \mathcal{K}(H)$ be a compact self-adjoint operator. The Spectral Theorem 10 guarantees the existence of a sequence of eigenvalues $\sigma(K) = \{\lambda_n \mid n \in \mathbb{N}\}$ satisfying the conditions (i) $|\lambda_i| \ge |\lambda_{i+1}|$, (ii) if $\#\sigma(K) = \infty$, then $\lim \lambda_n = 0$, and (iii) we have an orthonormal basis $\beta = \{\varphi_n\}$ of $H$ consisting of eigenvectors of $K$. Let's consider the following two cases:

Case 1: If $\lambda \neq 0$ and $\lambda \notin \sigma(K)$, then the equation

$$\lambda \cdot x - K(x) = y \tag{33}$$

admits a unique solution $x$ for each $y \in H$.
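Proof Let's check that the operator $T_\lambda = \lambda \cdot I - K$ has a continuous inverse $T_\lambda^{-1} \in \mathcal{L}(H)$. Any vector $x \in H$ can be written as a linear combination $x = \sum_n x_n \varphi_n$ of the basis $\beta$, where $x_n = \langle x, \varphi_n \rangle$. Then, from Eq. (33), we have

$$\sum_n (\lambda - \lambda_n) x_n \varphi_n = \sum_n y_n \varphi_n \ \Rightarrow\ x_n = \frac{y_n}{\lambda - \lambda_n}.$$

The series $x = \sum_n \frac{y_n}{\lambda - \lambda_n} \varphi_n$ is a candidate to solve the equation $x = (\lambda \cdot I - K)^{-1}(y)$. To verify, first let's check the convergence of the series. Let $\alpha = \sup_n \left| \frac{1}{\lambda - \lambda_n} \right|$, which is finite because $\lambda \neq 0$, $\lambda \notin \sigma(K)$ and $\lim \lambda_n = 0$. Since the basis $\beta$ is orthonormal, we have

$$\left| \sum_n \frac{y_n}{\lambda - \lambda_n} \varphi_n \right|^2 = \sum_n \left| \frac{y_n}{\lambda - \lambda_n} \right|^2 |\varphi_n|^2 \le \alpha^2 \sum_n |y_n|^2 \cdot |\varphi_n|^2 = \alpha^2 \cdot |y|^2.$$

Then $|T_\lambda^{-1}(y)| \le \alpha \cdot |y|$ and $|T_\lambda^{-1}| \le \alpha$. If the equation $T_\lambda(x) = y$ has a solution, then it must be unique; otherwise, having a solution $x'$ that is different from $x$ would give $(\lambda \cdot I - K)(x - x') = 0$, which would imply $\lambda \in \sigma(K)$. This is not allowed, since it contradicts the hypothesis. Hence the solution is unique.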
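The coordinate formula $x_n = y_n/(\lambda - \lambda_n)$ from Case 1 can be tried on a truncated diagonal model. The sketch below is a hedged illustration only: the eigenvalues $\lambda_n = 1/n$, the right-hand side $y_n = 1/n$, the value $\lambda = 2$ and the cut-off `N` are hypothetical choices, and the operator is represented just by its eigenvalues in the basis $\beta$.

```python
import math


def solve_coordinates(lam, eigenvalues, y):
    """Coordinates x_n = y_n / (lam - lambda_n) of the solution of lam*x - K(x) = y."""
    return [yn / (lam - ln) for yn, ln in zip(y, eigenvalues)]


def l2_norm(v):
    """Euclidean norm of a finite coordinate vector."""
    return math.sqrt(sum(t * t for t in v))


# Hypothetical data: K(phi_n) = phi_n / n, y = sum_n phi_n / n, lam = 2 not in sigma(K).
N = 200
eigenvalues = [1.0 / n for n in range(1, N + 1)]
y = [1.0 / n for n in range(1, N + 1)]
lam = 2.0

x = solve_coordinates(lam, eigenvalues, y)

# Residual of equation (33), coordinate by coordinate.
residual = max(abs(lam * xn - ln * xn - yn)
               for xn, ln, yn in zip(x, eigenvalues, y))

# Bound on the norm of the inverse: alpha = sup_n 1 / |lam - lambda_n|.
alpha = max(1.0 / abs(lam - ln) for ln in eigenvalues)

print(residual, l2_norm(x), alpha * l2_norm(y))
```

The printed values show a residual at rounding level and a solution whose norm stays below $\alpha \cdot |y|$, in agreement with the estimate $|T_\lambda^{-1}| \le \alpha$.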
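Case 2: If $\lambda_p \neq 0$ and $\lambda_p \in \sigma(K)$, then the solution set of Eq. (33), when it is non-empty, is a translate of a finite dimensional vector space.

Proof The necessary and sufficient condition for the existence of a solution $x$ to the equation $\lambda_p x - K(x) = y$ is $\langle y, \varphi \rangle = 0$ for all eigenvectors $\varphi$ associated to $\lambda_p$. Let's prove the necessity first. Let $x$ be a solution and $K(\varphi) = \lambda_p \varphi$, so

$$\langle \lambda_p x - K(x), \varphi \rangle = \langle x, \lambda_p \varphi - K(\varphi) \rangle = 0 \ \Rightarrow\ \langle y, \varphi \rangle = 0.$$

To check sufficiency, suppose $\langle y, \varphi \rangle = 0$ for all $\varphi$ such that $K(\varphi) = \lambda_p \varphi$. Expanding in the basis $\beta$, where the sums run over the indices $n$ with $\lambda_n \neq \lambda_p$, we get

$$\lambda_p x - K(x) = y \ \Rightarrow\ \sum_{n \neq p} (\lambda_p - \lambda_n) x_n \varphi_n = \sum_{n \neq p} y_n \varphi_n.$$

It follows that $x_p = \sum_{n \neq p} \frac{y_n}{\lambda_p - \lambda_n} \varphi_n$ is a solution. Now, for all $\varphi$ solving $(\lambda_p I - K)\varphi = 0$, the general solution is $x = x_p + \varphi$, since

$$\lambda_p x - K(x) = \big[\lambda_p x_p - K(x_p)\big] + \big[\lambda_p \varphi - K(\varphi)\big] = y + 0 = y.$$

Since $(\lambda_p I - K)$ is a Fredholm operator, its kernel has finite dimension. Therefore the solution set $\{x_p + \varphi \mid \varphi \in \mathrm{Ker}(\lambda_p I - K)\}$ has finite dimension.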
7.4 Hilbert-Schmidt Integral Operators

An important category of compact operators is the Hilbert-Schmidt (HS) integral operators.

Definition 11 Let $\Omega \subset \mathbb{R}^n$ be a connected open subset. The Hilbert-Schmidt kernel is a function $k : \Omega \times \Omega \to \mathbb{C}$ such that

$$\int_\Omega \int_\Omega |k(x, y)|^2 \, dx\, dy < \infty.$$

The Hilbert-Schmidt operator associated to $k$ is

$$T_k(f)(x) = \int_\Omega k(x, y) f(y)\, dy, \tag{34}$$

and we have $\|T_k\|_{HS} = \|k\|_{L^2}$. The operator $T_k$ is linear, continuous and compact. Assuming the condition $k(x, y) = \overline{k(y, x)}$, $T_k$ becomes a self-adjoint operator. So it is diagonalizable. The kernel $k$ is non-negative definite when, for any finite sequence of points $\{x_1, \ldots, x_n\} \subset [a, b]$ and any choice of real numbers $c_1, \ldots, c_n$, we have

$$\sum_{i=1}^{n} \sum_{j=1}^{n} k(x_i, x_j)\, c_i c_j \ge 0.$$
Theorem 12 (Mercer) Let $\Omega = [a, b]$ and let $k$ be a continuous, self-adjoint and non-negative $HS$ kernel. So we have an orthonormal basis $\beta = \{\varphi_n\}$ of $L^2([a, b])$ consisting of eigenfunctions of the operator $T_k$ defined in (34), with eigenvalues that are non-negative. The eigenfunctions corresponding to an eigenvalue $\lambda_n \neq 0$ are continuous in $[a, b]$, and $k$ is given by the absolutely convergent series

$$k(x, y) = \sum_i \lambda_i \varphi_i(x) \overline{\varphi_i(y)}. \tag{35}$$

Proof The operator $T_k$ defined in (34) is non-negative, compact and self-adjoint in $L^2([a, b])$. Expanding $k(x, \cdot)$ as a linear combination of the basis $\{\overline{\varphi_i}\}$, we get

$$k(x, y) = \sum_i \alpha_i(x) \overline{\varphi_i(y)},$$

where $\alpha_i(x) = \int_a^b k(x, y) \varphi_i(y)\, dy$. However, since $\beta$ is a set of eigenvectors, we have

$$\int_a^b k(x, y) \varphi_i(y)\, dy = \lambda_i \cdot \varphi_i(x).$$

Consequently, $k(x, y) = \sum_i \lambda_i \varphi_i(x) \overline{\varphi_i(y)}$. When $\lambda_i \neq 0$, the eigenfunction $\varphi_i = \lambda_i^{-1} T_k(\varphi_i)$ is continuous in $[a, b]$. From the inequality

$$\sum_i \lambda_i\, |\varphi_i(x) \cdot \varphi_i(y)| \le \sup_{x \in [a, b]} k(x, x),$$

we conclude that the series in (35) uniformly converges to $k$.
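The expansion (35) can be checked numerically on a classical kernel. The sketch below assumes the hypothetical choice $k(x, y) = \min(x, y)$ on $[0, 1]$, whose operator $T_k$ has eigenvalues $\lambda_n = ((n - \tfrac{1}{2})\pi)^{-2}$ and normalized eigenfunctions $\varphi_n(x) = \sqrt{2} \sin((n - \tfrac{1}{2})\pi x)$; the function names and the number of terms are illustrative choices. It compares a partial sum of the series with the kernel and also sums the eigenvalues.

```python
import math


def eigenvalue(n):
    """Eigenvalue lambda_n of T_k for the kernel k(x, y) = min(x, y) on [0, 1]."""
    a = (n - 0.5) * math.pi
    return 1.0 / (a * a)


def eigenfunction(n, x):
    """Normalized eigenfunction phi_n(x) = sqrt(2) * sin((n - 1/2) * pi * x)."""
    return math.sqrt(2.0) * math.sin((n - 0.5) * math.pi * x)


def mercer_partial_sum(x, y, terms):
    """Partial sum of the series (35) with the given number of terms."""
    return sum(eigenvalue(n) * eigenfunction(n, x) * eigenfunction(n, y)
               for n in range(1, terms + 1))


terms = 2000
approx = mercer_partial_sum(0.3, 0.7, terms)
exact = min(0.3, 0.7)

# Sum of the eigenvalues, to be compared with the integral of k(t, t) = t over [0, 1].
eigenvalue_sum = sum(eigenvalue(n) for n in range(1, terms + 1))

print(approx, exact, eigenvalue_sum)
```

The sum of the eigenvalues approaches $\int_0^1 k(t, t)\, dt = \tfrac{1}{2}$, which is the trace identity stated in the exercises of this section.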
Exercises
(1) Prove that a compact operator is bounded.
(2) Given the hypothesis of Theorem 12, prove that

$$\int_a^b k(t, t)\, dt = \sum_n \lambda_n.$$

In this case, the operator $T_k$ belongs to the class of trace operators, and we define

$$\mathrm{tr}(T_k) = \int_a^b k(t, t)\, dt = \sum_n \lambda_n.$$

(3) Given the hypothesis of Theorem 12, prove that (hint: use Bessel's inequality)

$$\sum_n \lambda_n^2 \le \int_a^b \int_a^b |k(s, t)|^2 \, ds\, dt.$$

(4) Let $K : L^2([0, 1]) \to L^2([0, 1])$ be the operator given by

$$K(f)(x) = \int_0^x f(y)\, dy.$$

  (a) Find the adjoint $K^*$.
  (b) Find $|K|$.
  (c) Prove that the spectral radius is $r_\sigma(K) = 0$.
  (d) Prove that 0 belongs to the continuous spectrum of $K$.

(5) Consider $S : l^2(\mathbb{Z}) \to l^2(\mathbb{Z})$ the operator $S(x)_k = x_{k-1}$ for all $k \in \mathbb{Z}$, where $x = (x_k)_{-\infty}^{\infty} \in l^2(\mathbb{Z})$. Prove the following items:
  (a) $S$ has no eigenvalues, i.e., the point spectrum of $S$ is empty.
  (b) $(\lambda I - S)$ is surjective for all $\lambda \in \mathbb{C}$ such that $|\lambda| \neq 1$.
  (c) The spectrum $\sigma(S) = \{\lambda \in \mathbb{C};\ |\lambda| = 1\}$ of $S$ is only continuous.
8 Closed Unbounded Linear Operators on Hilbert Spaces

So far the theory developed deals only with bounded linear operators; however, there are important classes of unbounded linear operators. Let $E, F$ be normed spaces. A linear operator $T : E \to F$ is unbounded if it is not bounded. That is, $T$ is unbounded if we have a sequence $\{x_n\} \subset E$ such that $\lim \frac{|T(x_n)|_F}{|x_n|_E} = \infty$.

Example 7 Among the unbounded operators, we highlight those that are differential operators.

(1) Let $E = (C^0([a, b]), \|\cdot\|_0)$ and let $D = C^1([a, b]) \subset E$ be the domain of $T$. The differential operator $T : D \to E$, $T(f) = f'$, is unbounded. For $n \in \mathbb{N}$, consider the function $f_n : [-\pi, \pi] \to \mathbb{R}$, $f_n(x) = \cos(nx)$, so $f_n'(x) = -n \sin(nx)$. Since

$$\|f_n\|_0 = \sup_{x \in [-\pi, \pi]} |f_n(x)| = 1 \quad \text{and} \quad \|f_n'\|_0 = \sup_{x \in [-\pi, \pi]} |f_n'(x)| = n,$$

we have $\lim \frac{\|T(f_n)\|_0}{\|f_n\|_0} = \infty$.

(2) Let $E = L^2(\mathbb{R})$ and $D = H^2(\mathbb{R})$. Consider $T : D \to L^2(\mathbb{R})$ the operator $T(f) = -f''$. Taking the sequence $\{f_n\}_{n\in\mathbb{N}}$, $f_n(t) = e^{-n|t|}$, $T$ is unbounded since

$$\|f_n\|_{L^2}^2 = \frac{1}{n}, \quad \|T(f_n)\|_{L^2}^2 = n^3 \ \Rightarrow\ \lim_{n \to \infty} \frac{\|T(f_n)\|_{L^2}}{\|f_n\|_{L^2}} = \lim_{n \to \infty} n^2 = \infty.$$

(3) Let $\varphi$ be an unbounded continuous function defined on $\mathbb{R}$ and $D = C_0^\infty(\mathbb{R})$ (set of smooth functions with compact support). Define $T_\varphi^c : D \to L^2(\mathbb{R})$ by $T_\varphi^c(f)(x) = \varphi(x) \cdot f(x)$. Therefore $T_\varphi^c$ is unbounded.

(4) In the last example, consider $D = \{f \in L^2(\mathbb{R}) \mid \varphi \cdot f \in L^2(\mathbb{R})\}$ and $T : D \to L^2(\mathbb{R})$, $T(f) = \varphi \cdot f$. Therefore $T$ is unbounded.

(5) Let $H = l^2(\mathbb{N})$ be the Hilbert space of square-summable sequences, $D = \{(x_1, \ldots, x_n, \ldots) \mid \sum_{j=1}^{\infty} |j x_j|^2 < \infty\}$ and $T : D \to H$ the operator $T(x_1, \ldots, x_n, \ldots) = (x_1, 2x_2, 3x_3, \ldots, n x_n, \ldots)$. Therefore $T$ is unbounded.

The differential operator, as defined in Example (1) above, has the following property: if $\{f_n\}_{n\in\mathbb{N}} \subset C^1([a, b])$ is a convergent sequence such that (i) $f_n \to f \in (C^0([a, b]), \|\cdot\|_0)$ and (ii) $T(f_n) \to g \in (C^0([a, b]), \|\cdot\|_0)$, then $f \in C^1([a, b])$ and $f' = g$ (Theorem 11, Appendix A). This property is explored in [20] for studying the spectral properties of $T$. So in this case the operator is closed, as defined in Appendix A, Definition 2.

Definition 12 Let $E, F$ be Banach spaces and let $T : E \to F$ be a linear operator with the domain $D(T) \subset E$. $T$ is a closed operator if, given any sequence $\{x_n\}_{n\in\mathbb{N}} \subset D(T)$ such that $\lim x_n = x \in E$ and $\lim T(x_n) = y \in F$, then $x \in D(T)$ and $T(x) = y$.
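The growth of the quotient $\|T(f_n)\|_0 / \|f_n\|_0$ in Example 7 (1) can be observed numerically. This is a minimal sketch, assuming the supremum norm is approximated by the maximum over a uniform grid; the helper names `sup_norm` and `derivative_ratio` and the number of sample points are hypothetical choices.

```python
import math


def sup_norm(func, a, b, samples=20001):
    """Approximate sup norm of func on [a, b] by the maximum over a uniform grid."""
    step = (b - a) / (samples - 1)
    return max(abs(func(a + i * step)) for i in range(samples))


def derivative_ratio(n):
    """Approximate ||T(f_n)||_0 / ||f_n||_0 for f_n(x) = cos(n x) and T(f) = f'."""
    f = lambda x: math.cos(n * x)
    df = lambda x: -n * math.sin(n * x)  # exact derivative of f_n
    return sup_norm(df, -math.pi, math.pi) / sup_norm(f, -math.pi, math.pi)


ratios = {n: derivative_ratio(n) for n in (1, 5, 25)}
print(ratios)
```

The quotient grows linearly with $n$, so no constant $C$ satisfies $\|T(f)\|_0 \le C \|f\|_0$ on the domain $D$.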
We emphasize the importance of the domain $D(T)$ and the range $R(T)$ in studying the properties of the operator. For example, the differential operator $T : (C^1(\mathbb{R}), \|\cdot\|_1) \to (C^0(\mathbb{R}), \|\cdot\|_0)$, $T(f) = f'$, is bounded, while the operator $T : C^1(-1,1) \to L^2(-1,1)$, $T(f) = f'$, is unbounded and not closed, as shown in the following case: consider the sequence of functions $\{f_n\}_{n\in\mathbb{N}}$ given by $f_n(x) = \sqrt{x^2 + \frac{1}{n^2}}$. So,

$$f_n \xrightarrow{L^2} f(x) = |x|, \qquad f_n'(x) = \frac{nx}{\sqrt{n^2x^2 + 1}} \xrightarrow{L^2} g(x) = \begin{cases} \dfrac{x}{|x|}, & x \neq 0, \\ 0, & x = 0. \end{cases}$$
Obviously, $g \notin C^1$ and $f'(x) = g(x)$ for $x \neq 0$; hence the limit $f$ does not belong to $C^1(-1,1)$ and $T$ is not closed. A large class of differential operators belongs to the category of unbounded and closed operators. When we perform operations with the operators $T$ and $S$, with domains $D(T)$ and $D(S)$ respectively, it is important to note that:
(i) $D(T + S) = D(T) \cap D(S)$;
(ii) $D(T \circ S) = \{x \in D(S) \mid S(x) \in D(T)\}$.
The operations $T + S$ or $T \circ S$ make sense only if the respective domain is nonempty. From now on, consider $H$ a Hilbert space.

Definition 13 Let $D(T) \subset H$ be the domain of a linear operator $T : D(T) \to H$.
(i) $T$ is well-defined if $T(u) = f$ and $T(u) = g$ imply $f = g$. Equivalently, there is no $f \neq 0$ such that $T(0) = f$, since $T(0) = 0$.
(ii) $T$ is densely defined in $H$ if $D(T)$ is dense in $H$.
(iii) The linear operator $S : D(S) \to H$ is an extension of $T : D(T) \to H$ if $D(T) \subset D(S)$ and $S(u) = T(u)$ for all $u \in D(T)$. In this case, we write $T \subset S$.

Next, we will give a sufficient condition to guarantee the existence of a closed extension of $T : D(T) \to H$.

Definition 14 Let $D(T) \subset H$ and let $T : D(T) \to H$ be an unbounded, densely defined linear operator. Consider

$$D(T^*) = \{v \in H \mid u \mapsto \langle T(u), v\rangle \text{ is continuous on } D(T)\}.$$

The adjoint operator $T^* : D(T^*) \to H$ is defined by the identity $\langle T(u), v\rangle_H = \langle u, T^*(v)\rangle_H$, $\forall u \in D(T)$.

The operator $T^*$ is well-defined; otherwise there would be an element of $D(T^*)$ with two distinct images $v \neq w$, and so $\langle u, v - w\rangle = 0$ for all $u \in D(T)$. Since $D(T)$ is dense in $H$, we take any element $\hat{u} \in H$ and write $\hat{u} = \lim u_n$, where $\{u_n\}_{n\in\mathbb{N}} \subset D(T)$. It follows from the continuity of the inner product that $\langle \hat{u}, v - w\rangle = 0$ for any $\hat{u} \in H$, so $v = w$. In addition, $T^*$ is a linear operator, since for any $a, b \in \mathbb{C}$, $u \in D(T)$ and $v_1, v_2 \in D(T^*)$,

$$\langle T(u), av_1 + bv_2\rangle = a\langle T(u), v_1\rangle + b\langle T(u), v_2\rangle = \langle u, aT^*(v_1) + bT^*(v_2)\rangle,$$
$$\langle T(u), av_1 + bv_2\rangle = \langle u, T^*(av_1 + bv_2)\rangle \;\Rightarrow\; \langle u, T^*(av_1 + bv_2) - aT^*(v_1) - bT^*(v_2)\rangle = 0.$$

Hence $T^*(av_1 + bv_2) = aT^*(v_1) + bT^*(v_2)$.

Example 8 Examples of adjoint operators.
(1) $H = L^2([0,1])$, $T(u) = u'$.
$$D(T) = \{u \in C^1([0,1]) \mid u(0) = u(1) = 0\} \subset H$$

and

$$\langle T(u), v\rangle = \int_0^1 u'(x)v(x)\,dx = -\int_0^1 u(x)v'(x)\,dx = \langle u, -v'\rangle, \quad \forall v \in C^1([0,1]).$$

So we have $T^*(v) = -v'$ and $D(T) \subsetneq D(T^*) = C^1([0,1])$.

(2) $H = L^2([0,1])$, $D(T) = \{u \in C^1([0,1]) \mid u(0) = 0\} \subset H$ and $T(u) = u'$.

$$\langle T(u), v\rangle = \int_0^1 u'(x)v(x)\,dx = u(1)v(1) + \int_0^1 u(x)\,(-v'(x))\,dx,$$

which equals $\langle u, -v'\rangle$ for every $u \in D(T)$ only when $v(1) = 0$. So we have $D(T^*) = \{v \in C^1([0,1]) \mid v(1) = 0\}$. Therefore, $D(T) \not\subset D(T^*)$.

Proposition 16 Let $T : D(T) \subset H \to H$ and $S : D(S) \subset H \to H$ be densely defined operators. Then:
(i) If $T \subset S$, then $S^* \subset T^*$.
(ii) If $D(T^*)$ is dense in $H$, then $T \subset T^{**}$.
(iii) If $T$ is injective and $T^{-1}$ is densely defined, then $T^*$ is injective and $(T^*)^{-1} = (T^{-1})^*$.
(iv) $S^*T^* \subset (TS)^*$.

Definition 15 Let $D(T) \subset H$. A densely defined operator $T : D(T) \to H$ is self-adjoint if $D(T^*) = D(T)$ and $T = T^*$.

Example 9 Let's consider the following classical examples. Consider $H = L^2([0,1])$.
(1) Let $D(T) = \{u \in C^2([0,1]) \mid u(0) = u(1) = 0\}$ and let $T : D(T) \to H$ be the operator $T(u) = u''$. Then

$$\langle T(u), v\rangle = u'(1)v'(1) - u'(0)v'(0) + \langle u, T(v)\rangle.$$

Then $D(T^*) = \{v \in C^2([0,1]) \mid v'(0) = v'(1) = 0\} \neq D(T)$. Restricting to the intersection $D(T^*) \cap D(T)$, we have $T^* = T$. Therefore $T$ is not self-adjoint.
(2) As above, consider the same operator $T(u) = u''$, $T : D(T) \to H$, defined on the space $D(T) = \{u \in C^2([0,1]) \mid u(0) = u(1) = 0 \text{ and } u'(0) = u'(1) = 0\}$. Now we have the identity $\langle T(u), v\rangle = \langle u, T(v)\rangle$ for all $u, v \in D(T)$. Consequently, $D(T^*) = D(T)$, $T^* = T$. Hence $T$ is self-adjoint.
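The integration by parts behind Example 8 can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses real-valued test functions chosen here ($u_1(x) = \sin(\pi x)$, $u_2(x) = x$, $v(x) = e^x$) and a composite Simpson rule; none of these choices comes from the text.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def inner(f, g):
    """Real L^2([0, 1]) inner product: integral of f(x) * g(x) over [0, 1]."""
    return simpson(lambda x: f(x) * g(x), 0.0, 1.0)

v = math.exp           # test function v(x) = e^x (assumption of this sketch)
dv = math.exp          # its derivative

# Case (1): u(0) = u(1) = 0, so no boundary term survives.
u1 = lambda x: math.sin(math.pi * x)
du1 = lambda x: math.pi * math.cos(math.pi * x)
lhs1 = inner(du1, v)                     # <T(u), v>
rhs1 = inner(u1, lambda x: -dv(x))       # <u, -v'>

# Case (2): only u(0) = 0, so the boundary term u(1) v(1) appears.
u2 = lambda x: x
du2 = lambda x: 1.0
lhs2 = inner(du2, v)
rhs2 = inner(u2, lambda x: -dv(x))
boundary2 = u2(1.0) * v(1.0)

print(lhs1 - rhs1, lhs2 - (boundary2 + rhs2), lhs2 - rhs2)
```

In case (2) the identity $\langle T(u), v\rangle = \langle u, -v'\rangle$ fails for this $v$ because $v(1) \neq 0$, which is the reason the condition $v(1) = 0$ enters $D(T^*)$.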
The last examples motivate the next definition.

Definition 16 Let $H$ be a Hilbert space, $D(T) \subset H$, and let $T : D(T) \to H$ be a densely defined linear operator. $T$ is symmetric if $\langle T(u), v\rangle = \langle u, T(v)\rangle$ for all $u, v \in D(T)$.

The classic example of a symmetric operator is the operator $T = i\frac{d}{dt}$ defined on

$$D(T) = \left\{ f \in L^2([a,b]) \;\middle|\; \frac{df}{dt} \in C^0 \text{ and } f(a) = f(b) = 0 \right\} \subset H = L^2([a,b]).$$

In the example above we can easily compute the adjoint, as shown next:

$$\langle T(f), g\rangle = \int_a^b i\frac{df}{dt}\,\bar{g}\,dt = i\left[f(b)\bar{g}(b) - f(a)\bar{g}(a)\right] + \left\langle f, i\frac{dg}{dt}\right\rangle = \langle f, T^*(g)\rangle.$$

So we have $D(T^*) = H$ and $T^*|_{D(T)} = T$. Clearly, $T^*$ is an extension of $T$.

Proposition 17 Let $H$ be a Hilbert space and $D(T) \subset H$. If $T : D(T) \to H$ is a densely defined linear operator, then $T^*$ is closed.

Proof Take a sequence $\{v_n\}_{n\in\mathbb{N}} \subset D(T^*)$ such that $\lim v_n = v$ and $\lim T^*(v_n) = w$. Then, for any $u \in D(T)$,

$$\langle T(u), v\rangle = \lim \langle T(u), v_n\rangle = \lim \langle u, T^*(v_n)\rangle = \langle u, w\rangle.$$

Hence $v \in D(T^*)$ and $\langle u, T^*(v) - w\rangle = 0$ for any $u \in D(T)$. Therefore $T^*(v) = w$ and $T^*$ is closed.
The operator $T : D(T) \to H$ is symmetric if and only if $T^*$ is an extension of $T$. The next two theorems give sufficient conditions to find a closed operator $S$ extending $T$.

Theorem 13 Let $H$ be a Hilbert space. Let $D(T)$ and $D(T^*)$ be dense subsets of $H$ and consider the linear operators $T : D(T) \to H$ and $T^* : D(T^*) \to H$. Then the operator $(T^*)^* = T^{**}$ is a closed extension of $T$.

Proof Since $T^*$ is densely defined, the operator $T^{**}$ exists and, by Proposition 17, is closed. To prove $D(T) \subset D(T^{**})$, recall that $u \in D(T^{**})$ with $w = T^{**}(u)$ means that $\langle w, v\rangle = \langle T^{**}(u), v\rangle = \langle u, T^*(v)\rangle$ for all $v \in D(T^*)$. By definition, $\langle u, T^*(v)\rangle = \langle T(u), v\rangle$ for all $v \in D(T^*)$ and $u \in D(T)$. Therefore, if $u \in D(T)$, then $u \in D(T^{**})$ and $T^{**}(u) = T(u)$. Hence $D(T) \subset D(T^{**})$.

Theorem 14 Let $H$ be a Hilbert space and $D(T) \subset H$. Consider $T : D(T) \to H$ a symmetric, densely defined linear operator. Then there is a closed symmetric operator $S$ such that $S$ is an extension of $T$.
Proof Let's start by defining the domain of $S$. Let $D(S)$ be the set of elements $u \in H$ for which there is a sequence $\{u_n\}_{n\in\mathbb{N}} \subset D(T)$ and an element $v \in H$ such that $u = \lim u_n$ and $\lim T(u_n) = v$. $D(S)$ is a vector space and $D(T) \subset D(S)$. Let $S : D(S) \to H$ be the operator given by $S(u) = \lim T(u_n)$, where $\{u_n\}_{n\in\mathbb{N}} \subset D(T)$ is a sequence such that $\lim u_n = u$. By the definition of $D(S)$, the limit $\lim T(u_n)$ exists.

(i) $S$ is well-defined. Suppose there are sequences $\{u_n\}$ and $\{w_n\}$ in $D(T)$ such that $\lim u_n = \lim w_n = u$, $\lim T(u_n) = v$ and $\lim T(w_n) = w$. Then, for every $z \in D(T)$,

$$\langle z, T(u_n) - T(w_n)\rangle = \langle z, T(u_n - w_n)\rangle = \langle T(z), u_n - w_n\rangle \to 0,$$

so $\langle z, v - w\rangle = 0$. Therefore $(v - w) \perp D(T)$. The density of $D(T)$ in $H$ yields $v = w$.

(ii) $S(u) = T(u)$ for any $u \in D(T)$, and $D(S) \neq \emptyset$. Let $u \in D(T)$ and consider the constant sequence $u_n = u$ for all $n \in \mathbb{N}$. So we have $S(u) = T(u)$ for all $u \in D(T)$. Hence $S$ is an extension of $T$.

(iii) $S$ is symmetric. Let $\{u_n\}$ and $\{v_n\}$ be sequences in $D(T)$ such that $\lim u_n = u$, $\lim v_n = v$, $S(u) = \lim T(u_n)$ and $S(v) = \lim T(v_n)$. Consequently,

$$\lim \langle T(u_n), v_n\rangle = \lim \langle u_n, T(v_n)\rangle \;\Rightarrow\; \langle S(u), v\rangle = \langle u, S(v)\rangle.$$

(iv) $S$ is closed. Let $\{u_n\}_{n\in\mathbb{N}} \subset D(S)$ be a sequence such that $\lim u_n = u$ and $\lim S(u_n) = w$, where $u, w \in H$. For every $n \in \mathbb{N}$, consider a sequence $\{u_{n,k}\}_{k\in\mathbb{N}} \subset D(T)$ such that $u_n = \lim_k u_{n,k}$ and $S(u_n) = \lim_k T(u_{n,k})$. Passing, if necessary, to subsequences in $k$, we may assume $\|u_n - u_{n,n}\| < 1/n$ and $\|S(u_n) - T(u_{n,n})\| < 1/n$. Therefore the sequence $\{u_{n,n}\}_{n\in\mathbb{N}} \subset D(T)$ converges to $u$ and $\lim T(u_{n,n}) = w$. Consequently we have $u \in D(S)$ and $S(u) = \lim T(u_{n,n}) = w$.

An example of an operator that does not admit a closed extension is as follows: let $H$ be a Hilbert space, let $D(U) \subset H$ be a subspace and let $U : D(U) \to \mathbb{C}$ be an unbounded linear functional. We fix $w \neq 0$ belonging to $H \setminus D(U)$ and define the linear operator $T : D(U) \to H$ by $T(u) = U(u)w$. Since $U$ is not continuous at 0, there is a sequence $\{u_n\}_{n\in\mathbb{N}} \subset D(U)$ such that $u_n \to 0$ and $U(u_n) \not\to 0$; that is, passing to a subsequence, there is $c > 0$ such that $|U(u_n)| \geq c > 0$ for all $n \in \mathbb{N}$. The sequence $v_n = \frac{u_n}{U(u_n)}$ has the following properties: (i) $v_n \to 0$, (ii) $T(v_n) = w \neq 0$ for all $n \in \mathbb{N}$. Therefore, if there were a closed extension $S$ of the operator $T$, then we would have $S(0) = \lim T(v_n) = w \neq 0$, contradicting $S(0) = 0$.
Exercises
(1) Prove Proposition 16.
(2) Let $H$ be a Hilbert space, $D(T) \subset H$ a dense subset of $H$ and $T : D(T) \to H$ a linear operator. Consider the sets

$$R(T) = \{v \in H \mid v = T(u) \text{ for some } u \in D(T)\}, \quad \mathrm{Ker}(T) = \{u \in D(T) \mid T(u) = 0\},$$
$$R(T^*) = \{u \in H \mid u = T^*(v) \text{ for some } v \in D(T^*)\}, \quad \mathrm{Ker}(T^*) = \{v \in D(T^*) \mid T^*(v) = 0\}.$$

Prove that (i) $R(T)^\perp = \mathrm{Ker}(T^*)$, (ii) $(R(T)^\perp)^\perp = \overline{R(T)} = \mathrm{Ker}(T^*)^\perp$.
(3) Let $H$ be a Hilbert space, $D \subset H$ a dense subset of $H$ and $T : D \to H$ a self-adjoint linear operator. Suppose there is $C > 0$ such that $\|T(u)\|_H \geq C\|u\|_H$ for all $u \in D$. Prove the following claims: (i) $T(u) = f$ has a unique solution for every $f \in H$. (ii) If $f \in N(T^*)^\perp$, then there is a solution but there is no uniqueness.
(4) Consider $T$ a closed, densely defined linear operator and suppose we have a constant $C > 0$ such that $\langle T(u), u\rangle \geq C\|u\|_H^2$ for all $u \in D(T)$. Prove that $T$ has closed range.
(5) Let $H$ be a Hilbert space and $D \subset H$. Let $T : D \to H$ be an unbounded linear operator. The inverse operator $T^{-1}$ of $T$ is defined as follows: $f \in D(T^{-1})$ and $T^{-1}(f) = u$ $\iff$ $u \in D$ and $T(u) = f$.
(a) Prove that $T^{-1}$ is well-defined if and only if $T$ is injective.
(b) Assume $T$ is densely defined and there is $C > 0$ such that $\langle T(u), u\rangle \geq C\|u\|_H^2$ for all $u \in D(T)$. Prove that $T$ has a bounded inverse.
(c) Suppose $T$ is closed and injective. Prove that the image subspace $R(T)$ is closed if and only if $T^{-1}$ is bounded.
(d) Assume $T$ is closed and densely defined. Prove that: (i) $R(T) = H$ if and only if $(T^*)^{-1}$ is bounded. (ii) $R(T^*) = H$ if and only if $T^{-1}$ is bounded.

To conclude this section, we make a short comment on the spectral properties of unbounded closed operators found in [20]. If $T : D \to H$ is closed and densely defined, then the operator $\lambda I - T$ is closed and $D(\lambda I - T) = D$ for any $\lambda \in \mathbb{C}$. If $T$ is also symmetric, then it follows that:
(i) the eigenvalues of $T$ belong to $\mathbb{R}$;
(ii) the eigenvectors associated with distinct eigenvalues are orthogonal, that is, $\lambda \neq \mu \Rightarrow \mathrm{Ker}(\lambda I - T) \perp \mathrm{Ker}(\mu I - T)$.
Chapter 3
Differentiation in Banach Spaces
In this chapter we will introduce the concept of differentiability of maps defined in Banach spaces. The Inverse Function Theorem (InFT) is the main result; some examples of optimization in Variational Calculus are given, as well as some properties of the Fredholm maps are proved along with some applications of the InFT.
1 Maps on Banach Spaces

Let $E$, $F$ be Banach spaces. The purpose of this section is to give examples of differentiable maps $f : E \to F$ between Banach spaces. In the preceding chapter, we proved that the spectrum $\sigma(T)$ of a linear operator $T \in L(E)$ is a nonempty compact subset of $\mathbb{C}$. Let $K \subset \mathbb{C}$ be a compact subset. By the Stone-Weierstrass Theorem, any function $f \in (C^0(K; \mathbb{C}), \|\cdot\|_0)$ can be approximated by a sequence of polynomials $\{p_n\}_{n\in\mathbb{N}}$. Therefore, assuming $T \in L(E)$ and $|T| \in K$, we define the operator

$$f(T) = \lim_{n\to\infty} p_n(T) : E \to E.$$

For any $T \in L(E)$ and $f = \lim p_n$, let's check that $f(T) \in L(E)$. Any polynomial $p : \mathbb{C} \to \mathbb{C}$, $p(\lambda) = \sum_{i=1}^n a_i\lambda^i$, can be extended to the map $p : L(E) \to L(E)$, $p(T) = \sum_{i=1}^n a_i T^i$, since

$$|p(T)| \leq \sum_{i=1}^n |a_i|\,|T|^i < \sup_{\lambda\in K} |p(\lambda)|.$$
© Springer Nature Switzerland AG 2021 C. M. Doria, Differentiability in Banach Spaces, Differential Forms and Applications, https://doi.org/10.1007/978-3-030-77834-7_3
Let $\{p_n\}_{n\in\mathbb{N}}$ be a sequence of polynomials, $p_n(\lambda) = \sum_{i=1}^n a_{n,i}\lambda^i$, so

$$|p_n(T) - p_m(T)| \leq \sum_{i=1}^n |a_{n,i} - a_{m,i}|\,|T|^i \leq \sup_{\lambda\in\sigma(T)} |p_n(\lambda) - p_m(\lambda)| = \|p_n - p_m\|_0.$$
Assume $\{p_n\}_{n\in\mathbb{N}}$ is a Cauchy sequence converging uniformly in $(C^0([m, M]; \mathbb{R}), \|\cdot\|_0)$, where $m = \inf \sigma(T)$ and $M = \sup \sigma(T)$. So the sequence $\{p_n(T)\}_{n\in\mathbb{N}}$ is also a Cauchy sequence in $L(E)$ since

$$|p_n(T) - p_m(T)| = \sup\{|p_n(\lambda) - p_m(\lambda)|;\ \lambda \in \sigma(T)\} = \|p_n - p_m\|_0.$$

Taking another polynomial sequence $\{q_n\}_{n\in\mathbb{N}}$ converging uniformly to $f$, we can check that the value $f(T)$ does not depend on the approximating sequence. Let $\lim p_n(T) = A$ and $\lim q_n(T) = B$, so

$$|A - B| \leq |A - p_n(T)| + |p_n(T) - q_n(T)| + |q_n(T) - B| \leq |A - p_n(T)| + \|p_n - q_n\|_0 + |q_n(T) - B|.$$

Passing to the limit, we have $A = B = f(T)$. The following properties are preserved in the limit:
(i) $(af + bg)(T) = af(T) + bg(T)$,
(ii) $(f.g)(T) = f(T).g(T)$,
(iii) $\bar{f}(T) = [f(T)]^*$.
The spectrum of $f(T)$ is quite simple to describe, as we show next.

Theorem 1 (Spectral Map Theorem) Let $T \in L(E)$ and let $p \in \mathbb{C}[X]$ be a polynomial. Then $\sigma(p(T)) = p(\sigma(T)) = \{p(\lambda) \mid \lambda \in \sigma(T)\}$.

Proof For any $\lambda \in \mathbb{C}$, the polynomial $\hat{p}(x) = p(x) - p(\lambda)$ can be written as the product $\hat{p}(x) = (x - \lambda)q(x)$. Therefore we have $\hat{p}(T) = p(T) - p(\lambda)I = (T - \lambda I)q(T)$.

• $p(\sigma(T)) \subset \sigma(p(T))$. Suppose $p(\lambda) \in \rho(p(T))$; in this case, the operator $\hat{p}(T)$ has an inverse $S$ such that

$$S.\hat{p}(T) = \hat{p}(T).S = I \;\Rightarrow\; (S.q(T)).(T - \lambda I) = (T - \lambda I).(q(T).S) = I \;\Rightarrow\; \lambda \in \rho(T).$$

Given the above, if $\lambda \in \sigma(T)$, then $p(\lambda) \in \sigma(p(T))$.
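• $\sigma(p(T)) \subset p(\sigma(T))$. Let $k \in \sigma(p(T))$. When factored over $\mathbb{C}$, the polynomial $p(x) - k$ is written, up to a nonzero constant factor, as $p(x) - k = (x - \lambda_1) \dots (x - \lambda_r)$, $1 \leq r \leq n$. If every operator $T - \lambda_i I$ were invertible, then $p(T) - kI = \prod_{i=1}^r (T - \lambda_i I)$ would have the inverse

$$(p(T) - kI)^{-1} = \prod_{i=r}^{1} (T - \lambda_i I)^{-1}.$$

However, since $k \in \sigma(p(T))$, there is at least one $\lambda_i$ for which the operator $T - \lambda_i I$ has no inverse. Hence $\lambda_i \in \sigma(T)$ and $p(\lambda_i) = k \in p(\sigma(T))$.

Corollary 1 If $f \in (C^0(\sigma(T); \mathbb{C}), \|\cdot\|_0)$ and $T \in L(E)$, then $\sigma(f(T)) = f(\sigma(T))$.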
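In finite dimension the Spectral Map Theorem can be verified directly on eigenvalues. The following minimal sketch assumes $E = \mathbb{R}^2$, the polynomial $p(x) = x^2 + 1$ (a choice made here for illustration) and a 2 × 2 matrix $A$; the helper names are hypothetical.

```python
import cmath

def mat_mul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eigenvalues_2x2(M):
    """Roots of the characteristic polynomial x^2 - tr(M) x + det(M)."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr - root) / 2, (tr + root) / 2]

def p(x):
    """Illustrative polynomial p(x) = x^2 + 1."""
    return x * x + 1

A = [[5, 3], [-6, -4]]
A_squared = mat_mul(A, A)
# p(A) = A^2 + I
p_of_A = [[A_squared[i][j] + (1 if i == j else 0) for j in range(2)]
          for i in range(2)]

spectrum_of_pA = sorted(z.real for z in eigenvalues_2x2(p_of_A))
p_of_spectrum = sorted(p(z).real for z in eigenvalues_2x2(A))
print(spectrum_of_pA, p_of_spectrum)
```

Both lists agree: the eigenvalues of $p(A)$ are the images under $p$ of the eigenvalues of $A$.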
Corollary 2 Let $p \in \mathbb{C}[x]$ be a polynomial, and let $H$ be a Hilbert space. If $T \in L(H)$ is a self-adjoint operator, then $|p(T)| = \sup\{|p(\lambda)|;\ \lambda \in \sigma(T)\}$.

Proof Let's first assume that $p \in \mathbb{R}[x]$. The fact that $T$ is self-adjoint implies that $p(T)$ is also self-adjoint, and so the norm is equal to the spectral radius:

$$|p(T)| = \sup\{|\mu|;\ \mu \in \sigma(p(T))\} = \sup\{|p(\lambda)|;\ \lambda \in \sigma(T)\}.$$

To address the case $p \in \mathbb{C}[x]$, we consider the polynomial $\bar{p}.p \in \mathbb{R}[x]$. Since $(\bar{p}.p)(T) = [p(T)]^*.p(T)$, then

$$|p(T)|^2 = \|[p(T)]^*.p(T)\| = \sup\{|(\bar{p}.p)(\lambda)|;\ \lambda \in \sigma(T)\} = \sup\{|p(\lambda)|^2;\ \lambda \in \sigma(T)\}.$$

Examples of maps between Banach spaces can be built from analytic functions. Let $f : \mathbb{C} \to \mathbb{C}$ be an analytic function with radius of convergence $R$. When we fix a point $x_0 \in \mathbb{C}$, there is a unique sequence $\{a_n\}_{n\in\mathbb{N}}$ such that $f(x) = \sum_{n=0}^\infty a_n(x - x_0)^n$ for every $x \in (x_0 - R, x_0 + R)$. Therefore, taking $x_0 = 0$, we define the map $f : L(E) \to L(E)$ as $f(A) = \sum_{n=0}^\infty a_n A^n$. In this way, $f(A)$ is well-defined in $\{A \in L(E);\ |A| < R\}$ since

$$|f(A)| \leq \sum_{n=0}^\infty |a_n|\,|A|^n < \infty.$$
Example 1 Examples of maps.
(1) Exponential of a linear operator. Let $f(x) = e^x = \sum_{k=0}^\infty \frac{x^k}{k!}$. For every $A \in L(E)$, consider the series $\exp(A) = \sum_{k=0}^\infty \frac{A^k}{k!}$. This series is convergent for all $A \in L(E)$ since

$$|\exp(A)| \leq \sum_{k=0}^\infty \frac{|A|^k}{k!} = e^{|A|}. \qquad (1)$$

So $\exp(0) = I$. In general, $\exp(A + B) \neq \exp(A).\exp(B)$. If we assume $AB = BA$, then we have $\exp(A + B) = \exp(A).\exp(B)$ since

$$\exp(A + B) = \sum_{k=0}^\infty \frac{(A + B)^k}{k!} = \sum_{k=0}^\infty \left[\sum_{j=0}^k \frac{A^{k-j}B^j}{(k-j)!\,j!}\right] = \sum_{k=0}^\infty \left[\sum_{i+j=k} \frac{A^i B^j}{i!\,j!}\right] = \left[\sum_{i=0}^\infty \frac{A^i}{i!}\right]\left[\sum_{j=0}^\infty \frac{B^j}{j!}\right] = \exp(A).\exp(B).$$

An immediate consequence is that $\exp(A) \in GL(E)$ for all $A \in L(E)$, since $[\exp(A)]^{-1} = \exp(-A)$. We treat the case $AB \neq BA$ by introducing the commutator $[A, B] = AB - BA$ and the operator $C$ such that $\exp(C) = \exp(A).\exp(B)$. In this way,

$$C = A + B + \frac{1}{2}[A, B] + \frac{1}{12}\left([A, [A, B]] + [[A, B], B]\right) + \dots.$$

When $E = \mathbb{R}^n$ or $E = \mathbb{C}^n$, the exponential of $A \in L(E)$ is explicitly computed using the Jordan canonical form of $A$.

(2) $\cos(x) = \sum_{k=0}^\infty (-1)^k \frac{x^{2k}}{(2k)!}$, $\sin(x) = \sum_{k=0}^\infty (-1)^k \frac{x^{2k+1}}{(2k+1)!}$. Both functions are analytic with radius of convergence $R = \infty$. This allows us to extend the real functions $\cos, \sin : \mathbb{R} \to \mathbb{R}$ to the maps $\cos, \sin : L(E) \to L(E)$.

(3) The function $\tan^{-1}(x) = \sum_{k=0}^\infty \frac{(-1)^k}{2k+1} x^{2k+1}$ is analytic with radius of convergence $R = 1$. Its extension $\tan^{-1} : B_1 \to L(E)$ is given by the series $\tan^{-1}(A) = \sum_{k=0}^\infty \frac{(-1)^k}{2k+1} A^{2k+1}$, defined on $B_1 = \{A \in L(E);\ |A| < 1\}$.
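The role of the hypothesis $AB = BA$ in item (1) can be observed numerically. The sketch below assumes $E = \mathbb{R}^2$ and truncates the exponential series after a fixed number of terms; the matrices and the helper names are choices made here for illustration.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_exp(A, terms=40):
    """Partial sum of the series exp(A) = sum_k A^k / k!."""
    n = len(A)
    result, term = identity(n), identity(n)
    for k in range(1, terms):
        # term becomes A^k / k! from A^(k-1) / (k-1)!
        term = [[entry / k for entry in row] for row in mat_mul(term, A)]
        result = mat_add(result, term)
    return result

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]      # AB != BA
A2 = [[0.0, 2.0], [0.0, 0.0]]     # 2A, which commutes with A

gap_noncommuting = max_abs_diff(mat_exp(mat_add(A, B)),
                                mat_mul(mat_exp(A), mat_exp(B)))
gap_commuting = max_abs_diff(mat_exp(mat_add(A, A2)),
                             mat_mul(mat_exp(A), mat_exp(A2)))
minus_A = [[-x for x in row] for row in A]
gap_inverse = max_abs_diff(mat_mul(mat_exp(A), mat_exp(minus_A)), identity(2))
print(gap_noncommuting, gap_commuting, gap_inverse)
```

For the commuting pair the two sides agree up to rounding, and $\exp(A)\exp(-A) = I$; for the non-commuting pair the difference is of order one.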
Exercises
(1) Find the values of the maps $\exp(A)$, $\sin(A)$, $\cos(A)$ and $\tan^{-1}(A)$ at

$$A = \begin{pmatrix} 5 & 3 \\ -6 & -4 \end{pmatrix}.$$
1.1 Extension by Continuity

Let $E$ be a normed space, $F \subset E$ a vector subspace, and $G$ a Banach space. We will briefly address the issue of extending a linear operator $T : F \to G$ to the closure $\overline{F}$.

Theorem 2 Let $T : F \to G$ be a bounded linear operator. Then there is a unique bounded extension $\overline{T} : \overline{F} \to G$ such that $\overline{T}(x) = T(x)$ for all $x \in F$ and $|\overline{T}| = |T|$.

Proof To define $\overline{T}$, we take $x \in \overline{F}$ and let $\{x_n\}_{n\in\mathbb{N}} \subset F$ be a sequence such that $\lim x_n = x$. Then $\{T(x_n)\}_{n\in\mathbb{N}}$ is a Cauchy sequence since

$$\|T(x_n) - T(x_m)\| \leq |T|\,\|x_n - x_m\|.$$

Since $G$ is a Banach space, $\{T(x_n)\}_{n\in\mathbb{N}}$ is convergent. Let $\overline{T}(x) = \lim_{n\to\infty} T(x_n)$. The limit does not depend on the sequence $\{x_n\}_{n\in\mathbb{N}}$. Let's consider two convergent sequences $\{x_n\}_{n\in\mathbb{N}}$ and $\{y_n\}_{n\in\mathbb{N}}$ in $F$ with $x_n \to x$ and $y_n \to x$. Given $\epsilon > 0$, there is $n(\epsilon) \in \mathbb{N}$ such that, for all $n > n(\epsilon)$,

$$\|T(x_n) - T(y_n)\| \leq |T|\,\|x_n - y_n\| < \epsilon.$$

Hence $\lim_{n\to\infty} T(x_n) = \lim_{n\to\infty} T(y_n)$. We define $\overline{T}(x) = \lim_{n\to\infty} T(x_n)$, with $\{x_n\}_{n\in\mathbb{N}}$ being any sequence in $F$ converging to $x$. The operator $\overline{T}$ is bounded and $|\overline{T}| = |T|$ since

$$\|\overline{T}(x)\| = \lim_{n\to\infty} \|T(x_n)\| \leq |T| \lim_{n\to\infty} \|x_n\| = |T|\,\|x\|.$$

Exercises
(1) Show that if $F \subset E$ is a subspace, then $\overline{F} \subset E$ is also a subspace.
(2) Show the following assertions: (a) If $\dim(F) < \infty$, then $\overline{F} = F$. (b) If $\dim(F) = \infty$, then item (a) is false.
2 Derivation and Integration of Functions $f : [a, b] \to E$

Formally, the theory goes along the same lines as in the case when $\dim(E) < \infty$. However, additional conditions must be assumed for the techniques to have the same efficiency.
2.1 Derivation of a Single Variable Function

A curve in a Banach space $E$ is a function $f : [a, b] \to E$.

Definition 1 A curve $f : [a, b] \to E$ is differentiable at $t \in [a, b]$ if there is $A \in L(\mathbb{R}, E)$ such that

$$f(t + h) - f(t) = Ah + r(h) \quad \text{and} \quad \lim_{h\to 0} \frac{\|r(h)\|}{|h|} = 0.$$

Therefore we have $A = \lim_{h\to 0} \frac{f(t+h) - f(t)}{h}$, justifying the notation $A = f'(t)$. The linear operator $D(f) = f'$ defined on differentiable functions satisfies the following properties:
(i) $(f + g)'(t) = f'(t) + g'(t)$.
(ii) $(f.g)'(t) = f'(t)g(t) + f(t)g'(t)$.
(iii) (chain rule) Let $\alpha : [c, d] \to [a, b]$ and $h(t) = f(\alpha(t))$, so $h'(t) = f'(\alpha(t)).\alpha'(t)$.

The differentiability of $f$ implies its continuity, since the linear operator $f'(t) : \mathbb{R} \to E$, $f'(t)(h) = f'(t).h$, is bounded and $\lim_{h\to 0} r(h) = 0$. Then, passing to the limit, we have

$$\|f(t + h) - f(t)\| = \|f'(t)h + r(h)\| \leq \|f'(t)\|\,|h| + \|r(h)\| \to 0.$$

In the theory of differentiable maps between Banach spaces, proving the Mean Value inequality requires more care than in the case of a map between finite dimensional spaces. When $\dim(E) < \infty$, a function is described in terms of its coordinates $f(t) = (f_1(t), \dots, f_n(t))$, which cannot be done for a function with values in a general Banach space $E$. When $E = \mathbb{R}^n$, considering the linear functional $\pi_i : \mathbb{R}^n \to \mathbb{R}$ given by $\pi_i(x_1, \dots, x_n) = x_i$, $i \in \{1, \dots, n\}$, each coordinate of $f$ is described as $f_i = \pi_i \circ f$. To get around the lack of a coordinate system, we use Corollary 1 in Appendix A, which is a consequence of the Hahn-Banach Theorem (Theorem 14 in Appendix A).

Proposition 1 Let $I \subset \mathbb{R}$ be an interval, let $E$ be a normed space and let $f : I \to E$ be a differentiable curve. If $f'(t) = 0$ for all $t \in I$, then $f$ is constant.

Proof Take $t_0 \in I$ and assume that there is $t_1 \in I$ such that $f(t_1) \neq f(t_0)$; otherwise $f$ is constant. The Hahn-Banach Theorem guarantees the existence of a continuous linear functional $\lambda : E \to \mathbb{R}$ such that $\lambda(f(t_1) - f(t_0)) \neq 0$, and by linearity we get $\lambda(f(t_1)) \neq \lambda(f(t_0))$. Define the function $g = \lambda \circ f : I \to \mathbb{R}$, $g(t) = \lambda(f(t))$. Since $g'(t) = \lambda(f'(t)) = 0$, we have that $g$ is constant, contradicting the fact that $g(t_0) \neq g(t_1)$. Hence $f$ is constant.
2.2 Integration of a Single Variable Function

We will briefly present the theory of integration of functions $f : [a, b] \to E$. The reader can find the complete approach to this topic in Ref. [28].
Let $B([a, b], E) = \{f : [a, b] \to E;\ \|f\|_0 < \infty\}$ be the set of bounded functions, provided with the norm $\|f\|_0 = \sup_{x\in[a,b]} \|f(x)\|_E$, so that $(B([a, b], E), \|\cdot\|_0)$ is a Banach space. A function $f \in B([a, b], E)$ is a step function if there is a partition $P_f = \{a = x_0, x_1, \dots, x_n = b\}$ such that $f$ is constant in the interval $(x_{i-1}, x_i)$, $i = 1, \dots, n$; that is, there is a collection of values $\{k_1, \dots, k_n\} \subset E$ such that $f|_{(x_{i-1}, x_i)} = k_i$. Let $S([a, b], E) \subset B([a, b], E)$ be the subset of step functions.

Definition 2 The integral of $f \in S([a, b], E)$ with respect to the partition $P_f$ is

$$I(f, P) = \sum_{i=1}^n k_i (x_i - x_{i-1}). \qquad (2)$$

The integral of the function $f \in S([a, b], E)$ is independent of the partition used, i.e., $I(f, P') = I(f, P)$ for any partitions $P$ and $P'$ of $[a, b]$ adapted to $f$. Now we consider the map $I_a^b : S([a, b], E) \to E$, $I_a^b(f) = I(f, P)$.

Lemma 1 The map $I_a^b : S([a, b], E) \to E$ is linear and continuous.

Proof Let $f, g \in S([a, b], E)$, and let $P$ be a partition of $[a, b]$ such that both functions are step functions with respect to $P$, with values $k_i$ and $l_i$ respectively. Linearity follows easily, since for scalars $c, d \in \mathbb{R}$

$$I_a^b(cf + dg) = \sum_{i=1}^n (ck_i + dl_i)(x_i - x_{i-1}) = c\,I_a^b(f) + d\,I_a^b(g).$$

Continuity follows from the boundedness of $I_a^b$:

$$\|I_a^b(f)\|_E = \Big\|\sum_i k_i(x_i - x_{i-1})\Big\|_E \leq \sum_i |x_i - x_{i-1}|\,\|k_i\|_E \leq (b - a)\sup_i \|k_i\|_E \leq (b - a)\|f\|_0.$$

If $\{f_n\} \subset S([a, b], E)$ is a sequence converging uniformly to $f$, then the integral of $f \in \overline{S([a, b], E)}$ is obtained by taking the limit

$$\int_a^b f(x)\,dx = \lim_{n\to\infty} \int_a^b f_n(x)\,dx = \lim_{n\to\infty} I_a^b(f_n).$$

It follows from Theorem 2 that the operator $I_a^b : S([a, b], E) \to E$ admits a unique extension $\bar{I}_a^b : \overline{S([a, b], E)} \to E$ to the closure of $S([a, b], E)$. If $f \in \overline{S([a, b], E)}$, then define $\bar{I}_a^b(f) = \int_a^b f(x)\,dx$.

Theorem 3 $C^0([a, b], E) \subset \overline{S([a, b], E)}$.
Proof Since $f$ is continuous and $[a, b]$ is compact, $f$ is uniformly continuous: given $\epsilon > 0$, there is $\delta > 0$ such that if $|x - y| < \delta$, then $\|f(x) - f(y)\| < \epsilon$. Now choose $n \in \mathbb{N}$ such that $\frac{b-a}{n} < \delta$. Consider the partition $P(n) = \{a = x_0, \dots, x_n = b\}$ satisfying $x_i - x_{i-1} = \frac{b-a}{n}$ and define the step function $g_n : [a, b] \to E$,

$$g_n(x) = \begin{cases} f(x_{i-1}), & x \in [x_{i-1}, x_i), \\ f(b), & x = b. \end{cases}$$

Therefore $g_n$ depends on $P(n)$, and so $g_n$ depends on $n$. Taking $n$ sufficiently large, we have $\|f - g_n\|_0 < \epsilon$.

The integral satisfies the following properties: let $f, g \in \overline{S([a, b], E)}$ and $c, d \in \mathbb{R}$;
(1) $\int_a^b (cf + dg)(x)\,dx = c\int_a^b f(x)\,dx + d\int_a^b g(x)\,dx$.
(2) $\int_a^b f(x)\,dx = \int_a^s f(x)\,dx + \int_s^b f(x)\,dx$ for $a \leq s \leq b$.
(3) $\left\|\int_a^b f(t)\,dt\right\| \leq \int_a^b \|f(t)\|\,dt \leq \|f\|_0 (b - a)$.

Next, we will prove a version of the Fundamental Theorem of Calculus (FTC) for functions (curves) $f : [a, b] \to E$.

Theorem 4 (FTC) Let $f \in C^0([a, b], E)$ and consider $F : [a, b] \to E$ the function given by

$$F(x) = k + \int_a^x f(t)\,dt,$$

where $k \in E$ is constant. Then $F'(x) = f(x)$ and $F(a) = k$.

Proof Since

$$\frac{F(x + h) - F(x)}{h} = \frac{1}{h}\int_x^{x+h} f(t)\,dt,$$

we have

$$\left\|\frac{F(x + h) - F(x)}{h} - f(x)\right\|_E \leq \frac{1}{|h|}\left|\int_x^{x+h} \|f(t) - f(x)\|_E\,dt\right|.$$

Since $f$ is uniformly continuous on $[a, b]$, given $\epsilon > 0$, there is $\delta > 0$ such that if $|t - x| < \delta$ for $x, t \in [a, b]$, then $\|f(t) - f(x)\|_E < \epsilon$. If $|h| < \delta$, then

$$\left\|\frac{F(x + h) - F(x)}{h} - f(x)\right\|_E < \epsilon,$$

since $|t - x| \leq |h|$. Hence $F'(x) = \lim_{h\to 0} \frac{F(x+h) - F(x)}{h} = f(x)$.
Corollary 3 Let $f \in C^0([a, b], E)$ and $F \in C^1([a, b], E)$ be such that $F'(x) = f(x)$. Then

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Proof Defining $I(x) = \int_a^x f(t)\,dt$, we have $I(a) = 0$ and, by Theorem 4, $I'(x) = f(x)$. Since $F'(x) = f(x)$ as well, $(F - I)' = 0$ and, by Proposition 1, there is a constant $k \in E$ such that $F(x) - I(x) = k$. In this way, $k = F(a)$ and $I(x) = F(x) - F(a)$; in particular, $I(b) = F(b) - F(a)$.
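The construction of the integral through step functions, together with Corollary 3, can be illustrated with a curve in $E = \mathbb{R}^2$. This is a minimal sketch under assumptions made here: the curve $f(t) = (\cos t, e^t)$ with primitive $F(t) = (\sin t, e^t)$ and the uniform partition $P(n)$ from the proof of Theorem 3; the function name `step_integral` is hypothetical.

```python
import math

def step_integral(f, a, b, n):
    """Integral of the step function g_n equal to f(x_{i-1}) on [x_{i-1}, x_i)."""
    h = (b - a) / n
    dim = len(f(a))
    total = [0.0] * dim
    for i in range(n):
        value = f(a + i * h)            # k_i = f(x_{i-1})
        for j in range(dim):
            total[j] += value[j] * h    # k_i * (x_i - x_{i-1})
    return total

def f(t):
    return (math.cos(t), math.exp(t))

def F(t):
    """A primitive of f: F'(t) = f(t)."""
    return (math.sin(t), math.exp(t))

a, b = 0.0, 1.0
approx = step_integral(f, a, b, 10000)
exact = [F(b)[j] - F(a)[j] for j in range(2)]
error = max(abs(approx[j] - exact[j]) for j in range(2))
print(approx, exact, error)
```

As $n$ grows, $\|f - g_n\|_0 \to 0$ and the step integrals approach $F(b) - F(a)$, in agreement with the bound $\|I_a^b(f - g_n)\| \leq (b - a)\|f - g_n\|_0$.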
3 Differentiable Maps II

We must be aware that the concept of distance in a Banach space depends on the norm; this issue does not arise when the dimension is finite, since all norms are then equivalent.

Definition 3 Let $U \subset E$ be an open subset and $f : U \to \mathbb{R}$ a map. The Gâteaux derivative of $f$ at $p \in U$ in the direction of the vector $v$ is the directional derivative

$$\frac{\partial f}{\partial v}(p) = \lim_{t\to 0} \frac{f(p + tv) - f(p)}{t}.$$

The name Gâteaux derivative is often used in the context of Variational Calculus; it is also called the "functional derivative". Regardless of what it is called, it is the directional derivative as we defined it in Chap. 1. Of course, just as in $\mathbb{R}^n$, the directional derivative may not be linear in the argument $v$, as shown in the following example: let $p = (0, 0)$ and $v = (a, b)$;

$$f(x, y) = \begin{cases} \dfrac{x^3}{x^2 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0), \end{cases} \qquad \frac{\partial f}{\partial v}(p) = \begin{cases} \dfrac{a^3}{a^2 + b^2}, & (a, b) \neq (0, 0), \\ 0, & (a, b) = (0, 0). \end{cases}$$
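The failure of linearity in this example can be checked with a difference quotient. The sketch below is only illustrative; the step size `t` and the function name `gateaux_at_origin` are assumptions of this sketch.

```python
def f(x, y):
    """f(x, y) = x^3 / (x^2 + y^2) away from the origin, and 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** 3 / (x ** 2 + y ** 2)

def gateaux_at_origin(v, t=1e-6):
    """Difference quotient (f(p + t v) - f(p)) / t at p = (0, 0)."""
    a, b = v
    return (f(t * a, t * b) - f(0.0, 0.0)) / t

d_e1 = gateaux_at_origin((1.0, 0.0))    # direction (1, 0): a^3/(a^2+b^2) = 1
d_e2 = gateaux_at_origin((0.0, 1.0))    # direction (0, 1): 0
d_sum = gateaux_at_origin((1.0, 1.0))   # direction (1, 1): 1/2
print(d_e1, d_e2, d_sum)
```

The derivative in the direction $(1, 1)$ is $1/2$, while the sum of the derivatives in the directions $(1, 0)$ and $(0, 1)$ is $1$, so $v \mapsto \frac{\partial f}{\partial v}(p)$ is not additive.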
Nonlinearity in the example above arises because the partial derivatives are not continuous at the origin. The concept of differentiability for maps between Banach spaces is called the Fréchet derivative.

Definition 4 A function $f : E \to \mathbb{R}$ is Fréchet differentiable at $p$ whenever there are a continuous linear functional $df_p : E \to \mathbb{R}$ and a function $r : E \to \mathbb{R}$ such that

$$f(p + v) - f(p) = df_p(v) + r(v) \quad \text{and} \quad \lim_{v\to 0} \frac{|r(v)|}{\|v\|} = 0.$$

Let $C^1(E, \mathbb{R})$ be the set of Fréchet differentiable maps $f : E \to \mathbb{R}$.
The differential $df_p$ is unique and satisfies the following properties:
(i) linearity: $df_p(av + bw) = a\,df_p(v) + b\,df_p(w)$,
(ii) Leibniz's rule: $d(f.g)_p(v) = df_p(v).g(p) + f(p).dg_p(v)$,
(iii) chain rule: if $\varphi : E \to E$ is differentiable and $h = f \circ \varphi : E \to \mathbb{R}$, then $dh_q(v) = df_{\varphi(q)}.d\varphi_q(v)$.
(iv) If $f$ is differentiable at $p$ (Fréchet differentiable), then $f$ is also Gâteaux differentiable at $p$ and $\frac{\partial f}{\partial v}(p) = df_p(v)$. The converse is false, but if the Gâteaux derivative exists, is linear and bounded at every point of a neighborhood of $p$, and depends continuously on the point, then it follows that $f$ is Fréchet differentiable at $p$.

We emphasize the need for the linear operator $df_p$ to be continuous in the definition of differentiability; otherwise, differentiability would not imply the continuity of the map.

Proposition 2 If $f : E \to \mathbb{R}$ is Fréchet differentiable, then $f$ is continuous.

Proof Since we have

$$|f(p + v) - f(p)| \leq |df_p(v)| + |r(v)| \leq |df_p|\,\|v\|_E + |r(v)|,$$

taking $v \to 0$ yields $\lim_{v\to 0} (f(p + v) - f(p)) = 0$.
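As a worked illustration of Definition 4, consider the squared norm on a real Hilbert space $H$; this example is an addition for illustration and is not taken from the text. The decomposition into a continuous linear part and a remainder can be read off directly:

```latex
% Assumption: H is a real Hilbert space and f(u) = \|u\|^2 = \langle u, u \rangle.
\begin{align*}
f(p+v) - f(p) &= \langle p+v,\, p+v \rangle - \langle p,\, p \rangle
               = 2\langle p, v \rangle + \|v\|^2, \\
df_p(v) &= 2\langle p, v \rangle, \qquad r(v) = \|v\|^2, \\
\frac{|r(v)|}{\|v\|} &= \|v\| \longrightarrow 0 \quad (v \to 0), \qquad
|df_p(v)| \le 2\|p\|\,\|v\| \quad \text{(Cauchy--Schwarz)}.
\end{align*}
```

So $df_p$ is a continuous linear functional with $|df_p| \leq 2\|p\|$, and $f$ is Fréchet differentiable at every $p \in H$.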
The Mean Value inequality is fundamental in the theory of differentiability in finite dimensional spaces. It extends to Fréchet differentiable maps as an application of the FTC.

Theorem 5 Let $U \subset E$ be a convex open subset, $f \in C^1(U, E)$ and $x, y \in U$. Assuming the line segment $r : [0, 1] \to E$, $r(t) = x + t(y - x)$, is contained in $U$, then

$$\|f(y) - f(x)\| \leq \sup_{t\in[0,1]} |df_{r(t)}|\,.\,\|y - x\|. \qquad (3)$$

Proof Let the function $g : [0, 1] \to E$ be given by $g(t) = f(r(t))$. Then we have $g'(t) = df_{r(t)}.(y - x)$. By the FTC,

$$\|f(y) - f(x)\| = \|g(1) - g(0)\| = \Big\|\int_0^1 g'(t)\,dt\Big\| = \Big\|\int_0^1 df_{r(t)}.(y - x)\,dt\Big\| \leq \sup_{t\in[0,1]} |df_{r(t)}|\,.\,\|y - x\|.$$

Since $f \in C^1(U, E)$, the map $t \mapsto |df_{r(t)}|$ is continuous, so the supremum is attained in $[0, 1]$.

Proposition 3 Let $U \subset E$ be a convex subset and let $f : U \to F$ be a map that is Fréchet differentiable at all points of the line segment $r : [0, 1] \to U$, $r(t) = p + tv$. Then there is $\tau \in (0, 1)$ such that

$$\|f(p + v) - f(p)\|_F \leq |df_{r(\tau)}|\,.\,\|v\|_E.$$
Proof Consider the function $h : [0, 1] \to F$, $h(t) = f(p + tv)$. Using the chain rule, we have $h'(t) = df_{p+tv}.v$. Applying the inequality from the last theorem, we get $\|h'(t)\|_F \leq M\|v\|_E$, $M = \sup_{t\in[0,1]} |df_{p+tv}|$.

Definition 5 Let $U \subset E$ and $V \subset F$ be open subsets of Banach spaces. A map $f : U \to V$ is Fréchet differentiable at $p$ if there are a continuous linear operator $df_p : E \to F$ and a map $r$ with values in $F$ such that

$$f(p + v) - f(p) = df_p(v) + r(v) \quad \text{and} \quad \lim_{v\to 0} \frac{\|r(v)\|_F}{\|v\|_E} = 0.$$

Let $C^1(U, V)$ be the set of Fréchet differentiable maps $f : U \to V$.

The differential $df_p$ is unique and satisfies the following properties:
(i) linearity: $df_p(av + bw) = a\,df_p(v) + b\,df_p(w)$,
(ii) chain rule: let $E$, $F$ and $G$ be Banach spaces. Let $\varphi : E \to F$ and $f : F \to G$ be Fréchet differentiable maps, so the composite $h = f \circ \varphi : E \to G$ is Fréchet differentiable and $dh_p(v) = df_{\varphi(p)}.d\varphi_p(v)$.
(iii) If $f : U \to V$ is Fréchet differentiable, then $f$ is continuous.
4 Inverse Function Theorem (InFT)

In this section, we state and prove the more general version of the Inverse Function Theorem (InFT) for differentiable maps defined on Banach spaces. In Chap. 2, the InFT was stated and some applications were given in the finite dimensional context.

4.1 Prelude for the Inverse Function Theorem

We now introduce the main ideas to prove the InFT through a simple example. We will consider the case of a function of a single real variable so that the similarity with the Newton-Raphson Method (N-R) becomes apparent. If the reader finds it appropriate, this section can be skipped. The N-R Method is a practical tool for finding the roots of an equation, while the InFT is a theoretical tool with widespread applications. Now let's address the problem of finding a solution to the equation $f(x) = 0$, where we assume $f : (a, b) \to \mathbb{R}$ is a $C^2$-function. The proof of the InFT will follow the same arguments.

Method of Newton–Raphson

Let $x_1 \in (a, b)$ be a point such that $f'(x_1) \neq 0$. The tangent line $L_1$ to the graph of $f$ at $(x_1, f(x_1))$ is given by the equation $L_1 : y - f(x_1) = f'(x_1)(x - x_1)$.
Fig. 1 Sequence {xn+1 }n∈N
The intersection of $L_1$ with the x-axis is the point $x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$. If $f'(x_2) \neq 0$, then we repeat the process to obtain the point $x_3 = x_2 - \frac{f(x_2)}{f'(x_2)}$, as indicated in Fig. 1. After successive steps, if we have $f'(x_n) \neq 0$, then we can repeat the process to obtain

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \qquad (4)$$

If $f'(x_n) = 0$, then the algorithm does not go further since the tangent line $L_n$ will be parallel to the x-axis. The N-R method to solve the equation $f(x) = 0$ consists of applying the following algorithm:
(1) Choose two points $\alpha, \beta \in (a, b)$ such that $f(\alpha).f(\beta) < 0$ and $f'(x) \neq 0$ for all $x \in (\alpha, \beta)$. As a result, we can guarantee the existence of a unique solution in the interval $(\alpha, \beta)$.
(2) Now take an initial point $x_1$ and apply the algorithm induced by the recursive formula (4).

Geometrically, once we choose a point $x_1 \in [\alpha, \beta]$, we can intuit from the figure that $\lim x_n = \xi$ is a solution. If item (2) above fails at step $n$, then the algorithm will not be efficient to find a root. Given an arbitrary point $y$, we can generalize the N-R method to solve the equation $f(x) = y$ by considering the function $f_y(x) = y - f(x)$ and defining the recursive sequence

$$x_{n+1} = x_n + \frac{1}{f'(x_n)}\,(y - f(x_n)). \qquad (5)$$
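The recursion (5) translates directly into a short routine. The following is a minimal sketch, not a robust solver: the function name `newton_solve`, the tolerance, the step limit and the test function $f(x) = x^3 + x$ are assumptions of this illustration.

```python
def newton_solve(f, df, y, x1, tol=1e-12, max_steps=50):
    """Iterate x_{n+1} = x_n + (y - f(x_n)) / f'(x_n), as in formula (5)."""
    x = x1
    for _ in range(max_steps):
        slope = df(x)
        if slope == 0.0:
            # The tangent line is parallel to the x-axis: the method stops.
            raise ZeroDivisionError("f'(x_n) = 0, the N-R step is undefined")
        x_next = x + (y - f(x)) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Solve x^3 + x = 10 (solution x = 2); here f'(x) = 3x^2 + 1 never vanishes.
root = newton_solve(lambda x: x ** 3 + x, lambda x: 3 * x * x + 1, 10.0, 1.0)

# Solve x^2 = 2 starting from x1 = 1 (formula (4) applied to x^2 - 2).
sqrt2 = newton_solve(lambda x: x * x, lambda x: 2 * x, 2.0, 1.0)
print(root, sqrt2)
```

Taking $y = 0$ recovers formula (4). The explicit check on `slope` mirrors the remark that the algorithm cannot proceed when $f'(x_n) = 0$.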
If the sequence {xn }n∈N converges to ξ = lim xn , then y = f (ξ). Let’s consider y = 0. To prove that the method indeed solves the equation, it is crucial to show that {xn }n∈N converges to a solution of the equation f (x) = 0. Consider the function φ : [α, β] → R given by φ(x) = x −
f (x) , f (x)
(6)
we note that ξ ∈ [α, β] is a solution of f (x) = 0 if and only if φ(ξ) = ξ. Its derivative is φ (x) =
f (x). f (x) . [ f (x)]2
(7)
Taking M = supt∈[x0 ,x1 ] | φ (t) |, it follows from the Mean Value Theorem that | φ(x ) − φ(x) | ≤ M. | x − x |, ∀x, x ∈ [α, β]. Theorem 6 Let f : (a, b) → R be a C 2 -function. Let α, β ∈ (a, b) be such that f (α). f (β) < 0 and f (x) = 0 for all x ∈ [α, β]. If | f (x). f (x) | 0 (as small as we wish) be
< δ. Then for all n ∈ N, | xn+1 − ξ | ≤ C. | xn − ξ |2 +δ.
So | xn+1 − ξ | ≤ C. | xn − ξ |2 . We choose r > 0 such that Cr < 1, and we take the starting point x1 so that | x1 − ξ |< r . Therefore we have | xn+1 − ξ | ≤
1 n−1 . (C.r )2 . C
Hence $\lim x_n = \xi$.
Proof of Theorem 7 Let's start by analyzing an example from Ref. [6].
Example 3 To find a solution to the equation $\cos(x) + 3xe^{-x} = 0$ is equivalent to finding the fixed points of the function $f(x) = -\frac{1}{3} e^{x}\cos(x)$; the graph is illustrated in Fig. 6. Let $z_0$ be the only negative fixed point; the others are $\{z_k\}$, $k \ge 1$, all being positive with an approximate value of $z_k \sim (2k + 1)\frac{\pi}{2}$. The derivative of f satisfies $|f'(x)|$ [...] 1. The numeric behavior of the sequence $\{x_n\}_{n\in\mathbb{N}}$ defined in (4) can be understood by taking different initial values $x_0$ close to $z_k$. Only the point $z_0$ is an attractor, that is, choosing $x_0$ near $z_0$, the sequence converges to $z_0$ since f is a contraction there. In the cases $k \neq 0$, when $x_0$ is close to $z_k$, the sequence does not converge to $z_k$; these points are called repulsive points since f is no longer a contraction near them.
We will introduce another strategy to approach the question; we will consider the functions $f(x) = \cos(x) + 3xe^{-x}$ and $g(x) = x - a f(x)$. Finding a fixed point for g is equivalent to finding a zero for f. We can choose a so that the derivative $g'(x) = 1 - a[-\sin(x) + 3e^{-x}(1 - x)]$ satisfies the condition $|g'(x)| < 1$ in a neighborhood of $z_k$, therefore making g a contraction. In this example, the size of the neighborhood depends on $z_k$, so it is not uniform.
We now consider the following case. Let $\hat{f} : \mathbb{R} \to \mathbb{R}$ be a differentiable map. Given $c \in \mathbb{R}$, we will study the question about the existence of solutions for the equation $\hat{f}(x) = c$. First, let's look at some of the ideas that motivate the techniques we will use. To solve the equation $\hat{f}(x) = c$ is equivalent to solving $f(x) = \hat{f}(x) - c = 0$, so we can reduce the problem to finding the zeros (roots) of a function. Given the function $g(x) = x - a f(x)$ with $a \neq 0$, the equation $f(x) = 0$ has a solution if and only if the equation $g(x) = x$ has one. This brings us to the Contraction Lemma 1 in Appendix A to show that g has a fixed point.
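The two strategies of Example 3 can be compared numerically. The Python lines below are a sketch under our own assumptions: the starting points $-0.5$ and $2.3$, the number of steps and the choice $a = 1/F'(x_{\text{start}})$ are hypothetical values picked for illustration, and `F` denotes the left-hand side $\cos(x) + 3xe^{-x}$.

```python
import math


def F(x):
    # Left-hand side of the equation cos(x) + 3 x e^{-x} = 0 from Example 3.
    return math.cos(x) + 3.0 * x * math.exp(-x)


def dF(x):
    # Derivative of F: -sin(x) + 3 e^{-x} (1 - x).
    return -math.sin(x) + 3.0 * math.exp(-x) * (1.0 - x)


def iterate(g, x, steps):
    # Apply the map g repeatedly, starting from x.
    for _ in range(steps):
        x = g(x)
    return x


# Plain iteration x -> -(1/3) e^x cos(x), started near the negative fixed point z_0.
z_neg = iterate(lambda x: -math.exp(x) * math.cos(x) / 3.0, -0.5, 200)

# Relaxed iteration g(x) = x - a F(x) near a positive root, with a = 1 / F'(x_start).
x_start = 2.3
a = 1.0 / dF(x_start)
z_pos = iterate(lambda x: x - a * F(x), x_start, 200)
```

With this choice of a, the derivative $g'(x) = 1 - aF'(x)$ is close to 0 near the starting point, which is why the relaxed iteration settles on a positive root where the plain iteration would not.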
Fig. 6 Fixed points of $f(x) = -\frac{1}{3} e^{x}\cos(x)$
Proposition 4 Let $U \subset E$ be an open subset and let $f : U \to U$ be a differentiable map. If $|df_p| \le \lambda < 1$, then there is a neighborhood $U_0 \subset U$ of p such that $f|_{U_0}$ is a contraction.
Proof Let $x, y \in U$ and let $r : [0, 1] \to E$ be the line $r(t) = x + t(y - x)$. By Theorem 5, there is $t_0 \in (0, 1)$ such that $|f(y) - f(x)| \le |df_{r(t_0)}| \cdot |y - x| < \lambda\,|y - x|$.
The constant a in $g(x) = x - a f(x)$ is chosen so that the root $\alpha$ of $f(x) = 0$ satisfies $|g'(\alpha)| \le 1$. So $\alpha$ is an attractor and a fixed point of g since $|g'(\alpha)| \le |1 - a f'(x)| < 1$. We note that the optimal choice for a is $\frac{1}{f'(\alpha)}$. Let's proceed to prove Theorem 7. The first step is to show that under some restrictions, the function $\phi_y$ defined in Eq. 8 has a fixed point.
Definition 6 Let $a = f'(x_0)$. Consider the open set
$$U_0 = \Bigl\{x \in I \;;\; |f'(x) - f'(x_0)| < \frac{|a|}{2}\Bigr\} \tag{9}$$
and $V_0 = f(U_0)$ its image.
The proof of Theorem 7 is achieved once the next series of five propositions is proved.
Proposition 5 If $x_0 \in U_0$, then $\phi_y$ is a contraction for any $y \in \mathbb{R}$.
Proof The derivative of $\phi_y(x) = x + \frac{1}{a}(y - f(x))$ is $\phi_y'(x) = 1 - \frac{f'(x)}{a}$. If $x \in U_0$, then $|f'(x) - f'(x_0)| = |a| \cdot |\phi_y'(x)| < \frac{|a|}{2}$, and so $|\phi_y'(x)| < \frac{1}{2}$.
Given $r > 0$ and $\alpha = \frac{|a|}{2}$, we consider the open balls $B_r(x_0) = \{x \in U_0 \,;\, |x - x_0| < r\}$ and $B_{\alpha r}(y_0) = \{y \in V_0 \,;\, |y - y_0| < \alpha r\}$.
Proposition 6 If $y \in B_{\alpha r}(y_0)$, then there is $x \in U_0$ such that $y = f(x)$.
Proof We have $\phi_y(B_r(x_0)) \subseteq B_r(x_0)$ since
$$|\phi_y(x) - x_0| \le |\phi_y(x) - \phi_y(x_0)| + |\phi_y(x_0) - x_0| \le \frac{1}{2}\,|x - x_0| + \frac{1}{|a|}\,|y - y_0| \le r.$$
Therefore $\phi_y : B_r(x_0) \to B_r(x_0)$ is a contraction for all $y \in B_{\alpha r}(y_0)$. Consequently, $\phi_y$ has a fixed point $\xi \in B_r(x_0)$. Hence $y = f(\xi)$.
Proposition 7 $V_0$ is an open set.
Proof Given $\varepsilon > 0$, let $y_0 = f(x_0) \in V_0$ and $y \in B_\varepsilon(y_0)$. According to the statement of Proposition 6, we have $\phi_y(B_\varepsilon(x_0)) \subset B_r(x_0)$ for all $y \in B_\varepsilon(y_0)$. So the contraction $\phi_y : B_\varepsilon(x_0) \to B_r(x_0)$ has a unique fixed point $\xi \in B_r(x_0)$, and therefore $y = f(\xi) \in V_0$ and $B_\varepsilon(y_0) \subset V_0$.
Proposition 8 The map $f : U_0 \to V_0$ is a homeomorphism.
Proof First, let's check that $f : U_0 \to V_0$ is injective. To argue by contradiction, assume there are points $x_1 \neq x_2$ such that $y = f(x_1) = f(x_2)$; then $\phi_y(x_1) = x_1$ and $\phi_y(x_2) = x_2$. Consequently, we have $x_1 = x_2$ due to the uniqueness of the fixed point. Since $f^{-1}(V_0) = U_0$ is open, $f^{-1} : V_0 \to U_0$ is continuous.
Proposition 9 The map $f^{-1} : V_0 \to U_0$ is differentiable.
Proof For any $x \in U_0$, we have $f'(x) \neq 0$. Now let's consider $y = f(x)$ and $y + k = f(x + h)$;
$$f^{-1}(y + k) - f^{-1}(y) - \frac{k}{f'(x)} = f^{-1}(f(x + h)) - f^{-1}(f(x)) - \frac{k}{f'(x)} = h - \frac{k}{f'(x)} = -\frac{1}{f'(x)}\bigl(k - f'(x)h\bigr) = -\frac{f(x + h) - f(x) - f'(x)h}{f'(x)}.$$
Passing to the limit $k \to 0$ and taking into account the inequality $|h| < \frac{2}{|f'(x_0)|}\,|k|$, we have $h \to 0$ and
$$\lim_{k\to 0} \frac{\bigl|f^{-1}(y + k) - f^{-1}(y) - \frac{k}{f'(x)}\bigr|}{|k|} \le \lim_{h\to 0} \frac{2}{|f'(x_0)|}\,\frac{|f(x + h) - f(x) - f'(x)h|}{|f'(x)|\,|h|} = 0.$$
Therefore at $y = f(x)$, the derivative of $f^{-1}$ is $(f^{-1})'(y) = \frac{1}{f'(x)}$.
To prove that $|h| < \frac{2}{|f'(x_0)|}\,|k|$, we note that
$$|\phi_y(x + h) - \phi_y(x)| = \Bigl|h - \frac{k}{f'(x_0)}\Bigr| \le \frac{|h|}{2}.$$
Using the triangular inequality $|a| - |b| \le |a - b|$, we get
$$|h| - \Bigl|\frac{k}{f'(x_0)}\Bigr| \le \frac{|h|}{2} \;\Rightarrow\; \frac{|h|}{2} < \frac{|k|}{|f'(x_0)|}.$$
Exercises
Considering $f \in C^0([a, b])$, answer the following:
(1) (Secant Method) Given the points $x_0$ and $x_1$, show that the sequence $\{x_n\}_{n\in\mathbb{N}}$,
$$x_{n+1} = \frac{x_{n-1} f(x_n) - x_n f(x_{n-1})}{f(x_n) - f(x_{n-1})},$$
converges to a solution of $f(x) = 0$.
(2) Let $U \subset \mathbb{R}^n$ be an open subset, let $f \in C^2(U, \mathbb{R}^n)$ and let $p \in U$ be a root of f such that $df_p$ is invertible. In U, consider the sequence $\{x_n\}_{n\in\mathbb{N}}$ defined by Eq. (4). Show that there is a neighborhood $V \subset U$ of p and a constant $C > 0$ such that if $x_0 \in V$, then $\lim x_n = p$.
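Before attempting Exercise (1), it may help to see the secant recursion at work. The following Python lines are only an illustrative sketch: the name `secant`, the stopping rule and the test equation $x^3 - 2 = 0$ are our own hypothetical choices, and a numerical run is of course not a proof of convergence.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Iterate x_{n+1} = (x_{n-1} f(x_n) - x_n f(x_{n-1})) / (f(x_n) - f(x_{n-1}))."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == 0.0:
            return x1  # x1 is already a root
        if f1 == f0:
            raise ZeroDivisionError("f(x_n) = f(x_{n-1}), the secant line is horizontal")
        x2 = (x0 * f1 - x1 * f0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    raise RuntimeError("no convergence within max_iter steps")


# Hypothetical example: the real root of x^3 - 2 = 0, starting from x_0 = 1, x_1 = 2.
root = secant(lambda x: x ** 3 - 2.0, 1.0, 2.0)
```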
4.3 Proof of the Inverse Function Theorem (InFT)
We will prove the general version of InFT for differentiable maps on a Banach space. In Chapter I, the Implicit Function Theorem (ImFT) used to solve equation $f(x) = c$ is a consequence of the InFT.
Theorem 8 (InFT) Let E be a Banach space, let $U \subset E$ be an open subset, $f \in C^1(U, E)$ and $p \in U$. If the differential operator $df_p$ is invertible, then there are open subsets $U_0$, $V_0$ in E such that:
(i) $U_0 \subset U$, $p \in U_0$ and $f(p) \in V_0$.
(ii) $V_0 = f(U_0)$ and $f : U_0 \to V_0$ is a diffeomorphism.
Proof The proof is divided into four steps:
(i) The choice of $U_0$. Let $A = df_p$ and $\lambda = \frac{1}{2|A^{-1}|}$. Define $U_0 = \{x \in U \,;\, |df_x - A| < \lambda\}$. Given the continuity of the map $x \mapsto df_x$, the subset $U_0$ is a well-defined open set containing p. The existence of $df_x^{-1}$ for all $x \in U_0$ is guaranteed due to Proposition 5 in Chap. 2.
(ii) The map $f|_{U_0}$ is injective. Let $y \in E$ and define the map $\phi_y : U \to E$, $\phi_y(x) = x + A^{-1}(y - f(x))$. There are two important remarks to be stressed:
(1a) If we have $x \in U$ such that $y = f(x)$, then $\phi_y(x) = x$.
(2a) $\phi_y|_{U_0}$ is a contraction. Indeed, for all $x, \bar{x} \in U_0$, we claim the inequality
$$\|\phi_y(\bar{x}) - \phi_y(x)\| \le \frac{1}{2}\,\|\bar{x} - x\|.$$
Since
$$|d(\phi_y)_x| = |I - A^{-1} df_x| \le |A^{-1}| \cdot |A - df_x| \le \frac{1}{2}$$
for all $x \in U_0$, we can apply the Mean Value Theorem 5 to obtain a contraction. Suppose there are $x_0, x_1 \in U_0$ such that $y = f(x_0) = f(x_1)$; then $\phi_y$ has two fixed points, contradicting the Contraction Lemma. Therefore $x_0 = x_1$ and $f|_{U_0}$ is injective.
(iii) The set $V_0 = f(U_0)$ is open. Before proving the statement, we note that it implies $f^{-1} : V_0 \to U_0$ is continuous. Set $y_0 \in V_0$ and let $x_0 \in U_0$ be such that $y_0 = f(x_0)$. Let's show that there is an open ball $B_\varepsilon(y_0)$ contained in $V_0$. We choose $r > 0$ so that $B_r(x_0) \subset U_0$ and define the ball $B_{\lambda r}(y_0) = \{y \in E \,;\, |y - y_0| < \lambda r\}$. We will prove that $B_{\lambda r}(y_0) \subset V_0$, i.e., for any $y \in B_{\lambda r}(y_0)$ there is $\xi \in U_0$ such that $y = f(\xi)$. Given $y \in B_{\lambda r}(y_0)$, we consider the map $\phi_y : U_0 \to E$, $\phi_y(x) = x + A^{-1}(y - f(x))$, and we note that $\phi_y(B_r(x_0)) \subseteq B_r(x_0)$, given that for any $x \in B_r(x_0)$
$$|\phi_y(x) - x_0| \le |\phi_y(x) - \phi_y(x_0)| + |\phi_y(x_0) - x_0| \le \frac{1}{2}\,|x - x_0| + |\phi_y(x_0) - x_0| \le \frac{r}{2} + |A^{-1}(y - f(x_0))| \le \frac{r}{2} + |A^{-1}| \cdot |y - y_0| \le \frac{r}{2} + \frac{r}{2} = r.$$
Since $B_r(x_0)$ is a complete metric space, it follows that the map $\phi_y$ when restricted to $B_r(x_0)$ is a contraction having a unique fixed point $\xi \in B_r(x_0)$. Therefore $y = f(\xi)$, and so $B_{\lambda r}(y_0) \subset V_0$.
(iv) The map $g = f^{-1} : V_0 \to U_0$ is differentiable and $dg_y = [df_{g(y)}]^{-1}$. Since the map $f : U_0 \to V_0$ is bijective, we can define $g = f^{-1} : V_0 \to U_0$. Let $y, y + k \in V_0$ and $x, x + h \in U_0$ be such that $y = f(x)$ and $y + k = f(x + h)$. For all $x \in U_0$, the operator $[df_x]^{-1}$ is well-defined. Let $B = [df_x]^{-1}$;
$$g(y + k) - g(y) - B(k) = g(f(x + h)) - g(f(x)) - B(k) = x + h - x - B(k) = h - B(k) = B\bigl(B^{-1}(h) - k\bigr) = -B\bigl(f(x + h) - f(x) - B^{-1}(h)\bigr).$$
In this way, we have
$$\frac{|g(y + k) - g(y) - B(k)|}{|k|} = \frac{|B(f(x + h) - f(x) - B^{-1}(h))|}{|k|} \le |B| \cdot \frac{|f(x + h) - f(x) - B^{-1}(h)|}{|k|}.$$
Passing to the limit $k \to 0$, we use the inequality $|h| \le 2\,|A^{-1}| \cdot |k| = \lambda^{-1}|k|$, with a proof to be given at the end. Now, taking the limit $h \to 0$ on the right-hand side of
$$\frac{|g(y + k) - g(y) - B(k)|}{|k|} \le \frac{|B|}{\lambda} \cdot \frac{|f(x + h) - f(x) - df_x(h)|}{|h|},$$
we get that g is differentiable. Hence $B = dg_y = [df_{g(y)}]^{-1}$. The continuity of $dg_y$ in relation to the argument y is verified by observing that the map $\mathcal{I} : GL(E) \to GL(E)$, $\mathcal{I}(S) = S^{-1}$, is continuous. The maps $df : x \mapsto df_x$ and g are continuous, so the map $dg_y = [df_{g(y)}]^{-1} = \mathcal{I}(df_{g(y)})$ is a composite of continuous maps, and so it is continuous.
The inequality $|h| \le 2\,|A^{-1}| \cdot |k| = \lambda^{-1}|k|$ follows from
$$|\phi_y(x + h) - \phi_y(x)| = |h - A^{-1}k| \le \frac{1}{2}\,|h| \quad\text{and}\quad |h| - |A^{-1}k| \le |h - A^{-1}k| \le \frac{1}{2}\,|h|.$$
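The contraction $\phi_y(x) = x + A^{-1}(y - f(x))$ used in steps (ii) and (iii) also suggests a way to compute $f^{-1}(y)$ numerically for y close to $f(p)$. The Python lines below are a minimal sketch under stated assumptions: the map `f` on $\mathbb{R}^2$ is a hypothetical example with $p = (0, 0)$, the matrix `A_INV` is the inverse of $A = df_p$ computed by hand for this example, and the number of steps is an arbitrary choice.

```python
def f(x):
    # Hypothetical smooth map on R^2 with f(0, 0) = (0, 0).
    x1, x2 = x
    return (2.0 * x1 + 0.1 * x2 ** 2, x1 + 3.0 * x2 + 0.1 * x1 ** 2)


# A = df_p at p = (0, 0) is [[2, 0], [1, 3]]; its inverse, computed by hand:
A_INV = ((0.5, 0.0), (-1.0 / 6.0, 1.0 / 3.0))


def phi(y, x):
    # One application of phi_y(x) = x + A^{-1} (y - f(x)).
    fx = f(x)
    r = (y[0] - fx[0], y[1] - fx[1])
    return (x[0] + A_INV[0][0] * r[0] + A_INV[0][1] * r[1],
            x[1] + A_INV[1][0] * r[0] + A_INV[1][1] * r[1])


def local_inverse(y, x=(0.0, 0.0), steps=60):
    # Iterate the contraction phi_y starting from p; its fixed point solves f(x) = y.
    for _ in range(steps):
        x = phi(y, x)
    return x


y = (0.1, -0.2)
x = local_inverse(y)
```

Note that only $A^{-1}$ is used throughout, and not $[df_x]^{-1}$ at each step as in the N-R method; this is the trade-off made in the proof, where a single linear operator controls the whole neighborhood $U_0$.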
4.4 Applications of InFT
(1) Implicit Function Theorem in finite dimension. Consider $\mathbb{R}^{k+m} = \{(x, y) \mid x \in \mathbb{R}^k, y \in \mathbb{R}^m\}$ and let $f : \mathbb{R}^{k+m} \to \mathbb{R}^m$ be a differentiable map given by $f(x, y) = (f_1(x, y), \dots, f_m(x, y))$. Let $c = (c_1, \dots, c_m) \in \mathbb{R}^m$ be fixed and let $p \in \mathbb{R}^{k+m}$ be a point satisfying $f(p) = c$. We will assume that the differential
$$df_p = \Bigl(\frac{\partial f}{\partial x} \;\vdots\; \frac{\partial f}{\partial y}\Bigr) : \mathbb{R}^{k+m} \to \mathbb{R}^m$$
is surjective and $\frac{\partial f}{\partial y}(p)$ is an isomorphism. The differential of $F(x, y) = (f_1(x, y), \dots, f_m(x, y), x)$ at p is
$$dF_p = \begin{pmatrix} A & B \\ I & 0 \end{pmatrix},$$
where $A = \frac{\partial f}{\partial x}(p)$ is an $(m \times k)$ matrix, $B = \frac{\partial f}{\partial y}(p)$ is an $(m \times m)$ matrix, I is the $(k \times k)$ identity matrix, and 0 is the null $(k \times m)$ matrix. So $dF_p$ is an isomorphism. Due to the InFT, there are open sets $U \ni p$ and $V \ni F(p)$ such that $F : U \to V$ is a diffeomorphism. Letting $\Phi = F^{-1}$, $u = f(x, y)$ and $v = x$, we have $\Phi(u, v) = (x, y)$. Therefore $f \circ \Phi(u, v) = f(x, y) = u$. We see that up to a local diffeomorphism, the map f is a projection. Letting $\Phi(u, v) = (\phi_1(u, v), \phi_2(u, v))$, if $f(x, y) = c$, then $x = v$ and $y = \phi_2(c, x)$. The solution set of the equation $f(x, y) = c$ is locally described in $\mathbb{R}^{k+m}$ by the map $x \mapsto (x, \phi_2(c, x))$.
(2) Let $f : M_n(\mathbb{R}) \times M_n(\mathbb{R}) \to M_n(\mathbb{R})$ be the map $f(X, Y) = XY - I$. The entries of $f(X, Y)$ are polynomials, so f is differentiable. The differential of f at $(X, Y)$ is the linear operator $df_{(X,Y)} : \mathbb{R}^{n^2} \times \mathbb{R}^{n^2} \to \mathbb{R}^{n^2}$, $df_{(X,Y)}.(U, V) = UY + XV$. Then $df_{(X,Y)}$ is surjective if X or Y belongs to $GL_n(\mathbb{R})$. The point $(I, I)$ is a solution to the equation $f(X, Y) = 0$, so the solution set is locally diffeomorphic to $\mathbb{R}^{n^2}$.
(3) Let $U \subset \mathbb{R}^n$ be an open subset and $f : U \to \mathbb{R}^n$ a $C^1$-map. The InFT can be applied to show that the dependence on initial conditions $x_0 = x(0)$ for solutions of the ODE $x'(t) = f(x(t))$ is differentiable. Consider the Banach spaces $E = (C^0([a, b]), \|\cdot\|_0)$, $F = \{g \in E \mid g(x_0) = 0\} \subset E$, and let f be a function such that $\|f'\|_0 < M$. Define the map $\mathcal{F} : E \to F$ by
$$\mathcal{F}(x)(t) = x(t) - x_0 - \int_0^t f(x(s))\,ds. \tag{10}$$
The zeros of $\mathcal{F}$ correspond to solutions of the initial value problem $x'(t) = f(x(t))$, $x(0) = x_0$, which has a unique solution due to the Existence and Uniqueness Theorem.² The operator $d\mathcal{F}_x : F \to F$, $d\mathcal{F}_x.h = h(t) - \int_a^t f'(x(s))h(s)\,ds$, is written as $d\mathcal{F}_x = I - T$, where $T : F \to F$ is given by $T(h) = \int_a^t f'(x(s))h(s)\,ds$ and $|T| \le M\,|t|$. Let's choose a neighborhood $(-\varepsilon, \varepsilon)$ so that $M\,|t| < 1$, and so $|T| < 1$. Therefore the operator $d\mathcal{F}_x = I - T$ is an isomorphism. So there are open subsets $U, V \subset E$ and a diffeomorphism
² Proved in Chap. 4.
$\mathcal{F} : U \to V$. We conclude that the equation $\mathcal{F}(x) = y$ has a solution for all y belonging to the neighborhood $V \subset E$ of 0 and it is defined for all t sufficiently small. Moreover, its dependence on the initial condition $x(0)$ is differentiable.
(4) Let $E_1 = (C^1([0, 1]), \|f\|_1)$ and $E_2 = (C^0([0, 1]), \|f\|_0)$. Let $F : E_1 \to E_2$ be a differentiable map, and let $F(f + h) = F(f) + dF_f.h + r(h)$ be the Taylor formula of degree 1. We want to show that for all $g \in E_2$ satisfying $\|g\|_0 < \varepsilon$, the differential equation
$$\frac{df}{dx} + F(f) = g, \quad f(0) = k,$$
admits a unique solution. The derivative of $\Phi : E_1 \to E_2$ given as
$$\Phi(f) = \frac{df}{dx} + F(f)$$
defines the linear operator $d\Phi_f.h = \frac{dh}{dx} + F'(f).h : E_1 \to E_2$. For $f = 0$, we have $\Phi(0) = 0$ and $d\Phi_0.h = \frac{dh}{dx}$. The kernel of $d\Phi_0$ is the set of constant functions in $E_1$, so it is isomorphic to $\mathbb{R}$. Consider the decomposition $E_1 = \mathrm{Ker}(d\Phi_0) \oplus W$, with $W \subset E_1$ as a closed complement. By the Fundamental Theorem of Calculus, the operator $d\Phi_0 : W \to E_2$ is bijective. We therefore have open subsets $U \subset W$ and $V \subset E_2$ such that $\hat{\Phi} : U \to V$ is a diffeomorphism. Therefore given $\varepsilon > 0$ sufficiently small, the equation $\hat{\Phi}(f) = g$ has a unique solution in $U \subset E_1$.
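The formula $df_{(X,Y)}.(U, V) = UY + XV$ of application (2) can be checked against a finite difference. The Python lines below are a sketch only: the helper names and the particular $2 \times 2$ matrices are hypothetical, chosen for illustration. Since $f(X, Y) = XY - I$ is quadratic, the central difference reproduces the differential up to rounding errors.

```python
def matmul(A, B):
    # Product of two square matrices given as lists of rows.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def combine(A, B, s):
    # Entrywise A + s * B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def f(X, Y):
    # f(X, Y) = X Y - I.
    n = len(X)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return combine(matmul(X, Y), identity, -1.0)


def differential(X, Y, U, V):
    # df_{(X,Y)}.(U, V) = U Y + X V.
    return combine(matmul(U, Y), matmul(X, V), 1.0)


def central_difference(X, Y, U, V, t=1e-3):
    # (f(X + tU, Y + tV) - f(X - tU, Y - tV)) / (2t).
    plus = f(combine(X, U, t), combine(Y, V, t))
    minus = f(combine(X, U, -t), combine(Y, V, -t))
    return [[(p - m) / (2.0 * t) for p, m in zip(rp, rm)] for rp, rm in zip(plus, minus)]


X = [[1.0, 2.0], [0.0, 1.0]]
Y = [[1.0, -2.0], [0.0, 1.0]]   # Y = X^{-1}, so f(X, Y) = 0
U = [[0.3, -0.1], [0.2, 0.5]]
V = [[-0.4, 0.7], [0.1, 0.2]]

exact = differential(X, Y, U, V)
approx = central_difference(X, Y, U, V)
max_error = max(abs(e - a) for re, ra in zip(exact, approx) for e, a in zip(re, ra))
```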
Exercises
(1) Let L be an invertible linear operator and f a map such that $\|f(x)\| < M\,\|x\|^2$. Show that the map $g(x) = L(x) + f(x)$ is a diffeomorphism.
(2) Show that there are open neighborhoods U, V of the identity matrix in which every matrix $A \in V$ admits a square root, so that we have $X \in U$ such that $X^2 = A$. Extend the result to the neighborhood of an arbitrary matrix.
(3) Define the exponential map $\exp : M_n(\mathbb{R}) \to GL_n(\mathbb{R})$ by $\exp(A) = \sum_{n=0}^{\infty} \frac{A^n}{n!}$. Show that:
(a) The map $\exp : M_n(\mathbb{R}) \to GL_n(\mathbb{R})$ is well-defined and $\exp \in C^1$.
(b) The zero matrix has a neighborhood $U \subset M_n(\mathbb{R})$ in which $\exp|_U$ is a diffeomorphism onto a neighborhood of the identity.
(c) If $|A| < \ln(2)$, then $|\exp(A) - I| < 1$.
(4) Define $\ln(A) = \sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{(A - I)^n}{n}$. Show that:
(a) If $|I - A| < 1$, then the series $\ln(A)$ converges absolutely and $|\ln(A)| < 1$.
(b) If $|A| < \ln(2)$, then $|\exp(A) - I| < 1$ and $\ln(\exp(A)) = A$ (hint: first consider A diagonalizable. Next, apply the Jordan Canonical form to the general case).
(5) Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be differentiable and $f(0) = 0$. If 1 is not in the spectrum $\sigma(df_0)$, prove there is a neighborhood $V \subset \mathbb{R}^n$ of 0 such that $f(x) \neq x$ for all $x \in V - \{0\}$.
(6) Let $\mathcal{A} = \{A \in M_n(\mathbb{R}) \mid A^t = -A\}$ be the space of skew-symmetric matrices. Define the exponential map $\exp : \mathcal{A} \to SO_n$ by $\exp(A) = \sum_{n=0}^{\infty} \frac{A^n}{n!}$. Show that:
(a) The map exp is well-defined.
(b) There is a neighborhood U of the zero matrix in $\mathcal{A}$ such that $\exp|_U$ is a diffeomorphism onto a neighborhood of the identity I.
(7) Let $k : [a, b] \times [a, b] \to \mathbb{R}$ be a continuous function such that $\|k\|_0 < C$, $C > 0$, and consider $r \in \mathbb{R}$ a real number satisfying $|r| \le \frac{1}{C(b - a)}$. For $n \in \mathbb{N}$, consider the map
$$\Phi_n(f)(x) = f(x) + r \int_a^b k(x, t)\, f^n(t)\,dt$$
and solve the following items:
(a) Show that $\Phi_n : C^0([a, b]) \to C^0([a, b])$.
(b) Assuming $n = 1$, prove that given an arbitrary function $g \in C^0([a, b])$ there is always a unique function $f \in C^0([a, b])$ such that $g = \Phi_1(f)$.
(c) Assuming $n = 2$ and $\|f\|_0 \le 1$, prove there are neighborhoods $U(f) \ni f$ and $V(f) \ni \Phi_2(f)$ such that, given an arbitrary $h \in V(f)$, we have a unique function $g \in U(f)$ such that $\Phi_2(g) = h$.
(8) For which values of $(r, \theta)$ is the map $P : \mathbb{R}^2 \to \mathbb{R}^2$, $P(r, \theta) = (r\cos(\theta), r\sin(\theta))$, a local diffeomorphism?
(9) Given the functions $f, g \in C^1(\mathbb{R}^2)$, let $\phi : \mathbb{R}^2 \to \mathbb{R}^2$ be the map $\phi(x, y) = (x + x^2 f(x, y),\, y + y^2 g(x, y))$. Show that there is a neighborhood V of the origin and a solution for the equation $p = \phi(x, y)$ for all $p \in V$.
(10) (Implicit Function Theorem) Let $E_1$, $E_2$ and F be Banach spaces, let $U_1 \subset E_1$, $U_2 \subset E_2$ be open subsets and $E = E_1 \times E_2$. Let $U = U_1 \times U_2 \subset E$ and $f \in C^1(U, F)$. For a point $p \in U$, the differential operator $df_p$ admits a decomposition into $d_1 f_p \oplus d_2 f_p : E_1 \oplus E_2 \to F$ with $d_i f = df|_{E_i}$. Assume the differential $d_2 f_p : E_2 \to F$ is in $GL(E_2, F)$ at $p = (p_1, p_2) \in E$, $p_i \in E_i$. Let $q = f(p) \in F$. Then there are open sets $U_1 \subset E_1$ containing $p_1$ and $V \subset E_2$ containing $p_2$ satisfying the following condition: there is a $C^1$-map $\xi : U_1 \to V$ such that $f(x, \xi(x)) = q$. Moreover, the differential of $\xi : U_1 \to V$ at $x \in U_1$ is $d\xi_x = -[d_2 f_{(x, \xi(x))}]^{-1}.\,d_1 f_{(x, \xi(x))}$.
5 Classical Examples in Variational Calculus
The theory of differentiability in Banach spaces is a powerful tool that is often used in several contexts to address questions in Pure and Applied Mathematics. There are a variety of problems where solutions require optimizing a function $F : \Omega \to \mathbb{R}$, defined on a Banach space $\Omega$, and given by
$$F(\gamma) = \int_a^b L\bigl(t, \gamma(t), \dot{\gamma}(t)\bigr)\,dt, \quad \dot{\gamma} = \frac{d\gamma}{dt}, \tag{11}$$
with $\gamma \in \Omega$, $\gamma : [a, b] \to \mathbb{R}^n$, and where
$$L : [a, b] \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}, \quad (t, x, y) \mapsto L(t, x, y) \tag{12}$$
is a $C^2$-function. L is the Lagrangian. The Calculus of Variations deals with this kind of optimization problem. To optimize F using the differential techniques developed so far means finding the critical points of F, which may be local minimums, local maximums or saddle points. The Calculus of Variations has its origins in the famous Brachistochrone problem formulated by Johann Bernoulli in 1696. After the Lagrangian and Hamiltonian formulation of Classical Newtonian Mechanics, the topic had a tremendous advance and technical growth culminating in a variety of applications; e.g., in the formulation of Quantum Mechanics, in addressing global problems in Differential Geometry, as well as in Engineering and many other areas. Nowadays, the Calculus of Variations is one of the most important subjects in mathematics. Often the topic uses its own jargon such as the terms:
– the function $F : \Omega(p, q) \to \mathbb{R}$ is called the functional.
– the Banach space E is the set of admissible functions.
– the directional derivative is called the functional derivative.
The initial formulation of a variational problem is settled when we know the functional F and the space of admissible functions. The examples and the approach we will present are classics, so in most of them the space of admissible functions is a subset of the Banach space $C^1([a, b], \mathbb{R}^n)$ with the norm $\|f\|_1 = \|f\|_0 + \|f'\|_0$. In general, the closure of a bounded subset of the space of admissible functions is not compact, so there is no a priori guarantee that the optimization problem will have a solution. The critical points of a function f are roots of the equation $f'(x) = 0$. The equivalent version of this equation requires finding a critical point of a functional F. Since the critical points are minimal in our examples, a variational problem in our context is well posed when it is given as follows:
$$\begin{cases} \text{minimize: } F(\gamma) = \int_a^b L(t, \gamma(t), \dot{\gamma}(t))\,dt, \\ \text{constrained to: } \gamma \in \Omega. \end{cases} \tag{13}$$
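5.1 Euler-Lagrange Equations
Let's fix the points $p, q \in \mathbb{R}^n$ and consider the space of admissible functions as the affine space $\Omega(p, q) = \{\gamma : [a, b] \to \mathbb{R}^n \mid \gamma \in C^1, \gamma(a) = p, \gamma(b) = q\}$. Let $\mathbb{R}^n = \{(x_1, \dots, x_n) \mid x_i \in \mathbb{R}, 1 \le i \le n\}$ and let $\beta = \{e_1, \dots, e_n\}$ be the canonical basis of $\mathbb{R}^n$. For all $\gamma \in \Omega(p, q)$, $\gamma(t) = (x_1(t), \dots, x_n(t))$, we consider the map $\Gamma : [a, b] \to \mathbb{R} \times \mathbb{R}^n \times \mathbb{R}^n$, $\Gamma(t) = (t, \gamma(t), \dot{\gamma}(t))$. Let $T_\gamma\Omega(p, q)$ be the tangent space to $\Omega(p, q)$ at $\gamma$. So the functional is written as $F(\Gamma) = \int_a^b (L \circ \Gamma)(t)\,dt$. Fixing the origin of $\Omega(p, q)$ at $\gamma$, we get the vector space $V = \{v : [a, b] \to \mathbb{R}^n \mid v \in C^1, v(a) = v(b) = 0\}$. Then $T_\gamma\Omega(p, q) = V$ for all $\gamma \in \Omega(p, q)$. The vector space $T_\gamma\Omega(p, q)$ is endowed with the inner product
$$\langle v, w\rangle = \int_a^b \langle v(t), w(t)\rangle\,dt = \int_a^b \sum_{i=1}^n v_i(t)\,w_i(t)\,dt.$$
If $\gamma$ is a critical point of $F : \Omega(p, q) \to \mathbb{R}$, then $dF_\gamma.v = 0$ for all $v \in T_\gamma\Omega(p, q)$. To describe the differential $dF_\gamma.v$, we proceed the same way as we did to obtain the directional derivative in Chap. 1. For all $v \in T_\gamma\Omega(p, q)$, the curve $s \mapsto \gamma_s$, $s \in (-\varepsilon, \varepsilon)$, given by $\gamma_s(t) = \gamma(t) + s\,v(t)$ is contained in $\Omega(p, q)$ ($\gamma_0 = \gamma$ and $\frac{d\gamma_s}{ds} = v$) and defines a variation of $\gamma$. The Lagrangian $L = L(t, x_1, \dots, x_n, y_1, \dots, y_n)$ is a function of $(2n + 1)$ variables with $y_i = \dot{x}_i$, $1 \le i \le n$. Writing $\Gamma_s(t) = (t, \gamma_s(t), \dot{\gamma}_s(t))$ and considering $L \circ \Gamma : (-\varepsilon, \varepsilon) \times [a, b] \to \mathbb{R}$, $L \circ \Gamma(s, t) = L(t, \gamma_s(t), \dot{\gamma}_s(t))$, we have
$$dF_\gamma.v = \frac{dF(\gamma_s)}{ds}\Big|_{s=0} = \int_a^b \frac{d(L \circ \Gamma)}{ds}\Big|_{s=0}\,dt = \int_a^b dL_{\Gamma(t)}.\frac{d\Gamma_s}{ds}\Big|_{s=0}\,dt = \int_a^b \Bigl\langle \nabla L(\Gamma(t)), \frac{d\Gamma_s}{ds}\Big|_{s=0}\Bigr\rangle\,dt = \int_a^b \langle \nabla L(\Gamma(t)), (0, v, \dot{v})\rangle\,dt$$
$$= \int_a^b \sum_{i=1}^n \Bigl[\frac{\partial L}{\partial x_i}\,v_i + \frac{\partial L}{\partial y_i}\,\dot{v}_i\Bigr]dt = \int_a^b \sum_{i=1}^n \Bigl[\frac{\partial L}{\partial x_i} - \frac{d}{dt}\Bigl(\frac{\partial L}{\partial y_i}\Bigr)\Bigr]v_i\,dt + \underbrace{\sum_{i=1}^n \Bigl[\frac{\partial L}{\partial y_i}(b)\,v_i(b) - \frac{\partial L}{\partial y_i}(a)\,v_i(a)\Bigr]}_{=0}.$$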
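Let the gradient of F be the operator
$$\nabla F = \sum_{i=1}^n \Bigl[\frac{\partial L}{\partial x_i} - \frac{d}{dt}\Bigl(\frac{\partial L}{\partial y_i}\Bigr)\Bigr] e_i.$$
Therefore the expression above can be written as
$$dF_\gamma.v = \langle \nabla F(\gamma), v\rangle = \int_a^b \langle \nabla F(\gamma(t)), v(t)\rangle\,dt.$$
Consequently, $\gamma$ is a critical point of F if
$$\int_a^b \langle \nabla F(\gamma), v\rangle\,dt = \int_a^b \sum_{i=1}^n \Bigl[\frac{\partial L}{\partial x_i} - \frac{d}{dt}\Bigl(\frac{\partial L}{\partial y_i}\Bigr)\Bigr] v_i\,dt = 0, \tag{14}$$
for all $v \in T_\gamma\Omega(p, q)$.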
Lemma 2 (du Bois-Reymond) Let $f \in C^0([a, b])$ be an admissible function. If $\int_a^b f(t)v(t)\,dt = 0$ for all $v \in C^0([a, b])$ satisfying $v(a) = v(b) = 0$, then $f = 0$ in $[a, b]$.
Proof Let $g(t) = \int_a^t f(\xi)\,d\xi$. It follows from the hypothesis that
$$0 = \int_a^b \frac{d(g(t)v(t))}{dt}\,dt = \underbrace{\int_a^b g'(t)v(t)\,dt}_{=0} + \int_a^b g(t)v'(t)\,dt.$$
So $\int_a^b g(t)v'(t)\,dt = 0$ for all v. Consider $v(t) = \int_a^t (f(\xi) - c)\,d\xi$, $t \in [a, b]$, and $c = \frac{1}{b - a}\int_a^b f(\xi)\,d\xi$. By construction, we have $v(a) = 0$ and $v(b) = 0$. Therefore
$$0 \le \int_a^b (f(t) - c)^2\,dt = \int_a^b (f(t) - c)\,v'(t)\,dt = \int_a^b f(t)v'(t)\,dt - c\,(v(b) - v(a)) = 0.$$
Now we have $f(t) = c$, but this yields $f(t) = 0$.
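Taking $v(t) = v_k(t)e_k$, $k \in \{1, \dots, n\}$, in Eq. (14), we get
$$\int_a^b \langle \nabla F(\gamma), v\rangle\,dt = \int_a^b \Bigl[\frac{\partial L}{\partial x_k} - \frac{d}{dt}\Bigl(\frac{\partial L}{\partial y_k}\Bigr)\Bigr](t)\,v_k(t)\,dt = 0.$$
Applying Lemma 2, the Euler-Lagrange equations of the functional F are
$$\frac{\partial L}{\partial x_k} - \frac{d}{dt}\Bigl(\frac{\partial L}{\partial y_k}\Bigr) = 0, \quad k = 1, \dots, n. \tag{15}$$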
Since $\dot{\gamma}(t) = (\dot{x}_1(t), \dots, \dot{x}_n(t))$, the ordinary differential equations (15) above are most often written as
$$\frac{\partial L}{\partial x_k} - \frac{d}{dt}\Bigl(\frac{\partial L}{\partial \dot{x}_k}\Bigr) = 0, \quad k = 1, \dots, n. \tag{16}$$
Therefore the E-L equations are 2nd-order ODEs. For an admissible curve $\gamma(t) = (x_1(t), \dots, x_n(t))$, we have $L(t, \gamma(t), \dot{\gamma}(t)) = L(t, x_1, \dots, x_n, \dot{x}_1, \dots, \dot{x}_n)$. There are particular cases where the E-L equations can be reduced to 1st-order ODEs; this is done by analyzing the dependence of L on the $(2n + 1)$ variables. We denote $L_y = \frac{\partial L}{\partial y}$.
(i) Assume $L = L(t, \dot{x}_1, \dots, \dot{x}_n)$. It follows from the E-L equations that
$$\frac{d}{dt}\Bigl(\frac{\partial L}{\partial \dot{x}_k}\Bigr) = 0.$$
(ii) $L = L(x_1, \dots, x_n, \dot{x}_1, \dots, \dot{x}_n)$. In this case, L does not depend on the parameter t, so
$$\frac{d}{dt} L_{\dot{x}_i} = L_{\dot{x}_i x_i}\,\dot{x}_i + L_{\dot{x}_i \dot{x}_i}\,\ddot{x}_i.$$
Dealing with the E-L equations, we get $L_{x_i} - L_{\dot{x}_i x_i}\,\dot{x}_i - L_{\dot{x}_i \dot{x}_i}\,\ddot{x}_i = 0$. After multiplying by $\dot{x}_i$, we have
$$L_{x_i}\dot{x}_i - L_{\dot{x}_i x_i}(\dot{x}_i)^2 - L_{\dot{x}_i \dot{x}_i}\,\ddot{x}_i\dot{x}_i = L_{x_i}\dot{x}_i + L_{\dot{x}_i}\ddot{x}_i - \bigl[L_{\dot{x}_i}\ddot{x}_i + \dot{x}_i L_{\dot{x}_i x_i}\dot{x}_i + \dot{x}_i L_{\dot{x}_i \dot{x}_i}\ddot{x}_i\bigr] = \frac{dL}{dt} - \frac{d}{dt}\bigl(\dot{x}_i L_{\dot{x}_i}\bigr) = \frac{d}{dt}\bigl[L - \dot{x}_i L_{\dot{x}_i}\bigr] = 0.$$
Therefore we have a constant $c_i \in \mathbb{R}$ such that $L - \dot{x}_i L_{\dot{x}_i} = c_i$, $i = 1, \dots, n$.
(iii) $L = L(t, x_1, \dots, x_n)$. Since $L_{\dot{x}_i} = 0$, $i = 1, \dots, n$, the E-L equations become
$$\frac{\partial L}{\partial x_i} = 0, \quad i = 1, \dots, n. \tag{17}$$
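To see case (ii) at work, we sketch a short worked computation for $n = 1$. The Lagrangian $L(x, \dot{x}) = \frac{1}{2}m\dot{x}^2 - U(x)$, with a hypothetical mass $m > 0$ and a $C^1$ potential U, is our own choice of illustration; it does not depend on t, so the first integral $L - \dot{x}L_{\dot{x}} = c$ should hold.

```latex
\begin{align*}
L_{\dot{x}} &= m\dot{x}, \\
L - \dot{x}\,L_{\dot{x}} &= \tfrac{1}{2}m\dot{x}^2 - U(x) - m\dot{x}^2
   = -\Bigl(\tfrac{1}{2}m\dot{x}^2 + U(x)\Bigr) = c, \\
\frac{d}{dt}\Bigl(\tfrac{1}{2}m\dot{x}^2 + U(x)\Bigr)
   &= \dot{x}\,\bigl(m\ddot{x} + U'(x)\bigr) = 0
   \quad\text{by the E-L equation } m\ddot{x} = -U'(x).
\end{align*}
```

Under these assumptions, the constant of case (ii) is, up to sign, the total energy $\frac{1}{2}m\dot{x}^2 + U(x)$, and the 2nd-order E-L equation is reduced to a 1st-order one.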
5.2 Examples
We will introduce some classical examples. Let $\Omega(p, q)$ be the space of continuous curves in $\mathbb{R}^2$ connecting p to q, i.e., $\Omega(p, q) = \{\gamma : [0, 1] \to \mathbb{R}^2 \mid \gamma \in C^0([0, 1]), \gamma(0) = p, \gamma(1) = q\}$. The space of admissible functions to be considered is a subset of the space $\Omega_{gr}(p, q) = \{\gamma \in \Omega(p, q) \mid \gamma(x) = (x, y(x))\} \subset \Omega(p, q)$, with $y : [0, 1] \to \mathbb{R}$ a $C^1$-function.
(1) Brachistochrone curve (brachystos = minimum, chronos = time). Starting from the point $p = (a, A)$, a heavy body slides to the point $q = (b, B)$. Find the curve $\gamma \in \Omega_g(p, q)$ so that the body reaches the point q in a minimum amount of time.
Variational formulation: Let us consider that the trajectory of the particle is given as $\gamma(x) = (x, y(x))$, $y \in C^1([a, b])$. We need to find the time spent by the body to go through the trajectory described by $\gamma$. The arc-length element of $\gamma$ is $ds = \sqrt{1 + [y'(x)]^2}\,dx$. The velocity of the body at time t is $v = \dot{\gamma}(t)$ and the modulus of the velocity is $v = |v| = |\dot{\gamma}| = \sqrt{1 + [y'(x)]^2}\,\frac{dx}{dt}$. Assuming gravity's acceleration is constant and equal to g, we have $v = \sqrt{2gy}$, and so $dt = \frac{\sqrt{1 + [y'(x)]^2}}{\sqrt{2g\,y(x)}}\,dx$. Therefore
$$T = \frac{1}{\sqrt{2g}} \int_a^b \sqrt{\frac{1 + [y'(x)]^2}{y(x)}}\,dx. \tag{18}$$
The variational problem to be addressed is
$$\begin{cases} \text{minimize } T(\gamma) = \frac{1}{\sqrt{2g}} \int_a^b \sqrt{\frac{1 + [y'(x)]^2}{y(x)}}\,dx, \\ \text{constrained to } \gamma \in \Omega_g(p, q) = \{y : [a, b] \to \mathbb{R} \mid y \in C^1([a, b])\} \subset (C^1([a, b]), \|\cdot\|_1). \end{cases} \tag{19}$$
Solution: The Lagrangian does not depend on x, so it satisfies Eq. 17;
$$L - y' L_{y'} = \frac{1}{\sqrt{2g}}\sqrt{\frac{1 + (y')^2}{y}} - \frac{1}{\sqrt{2g}}\,\frac{(y')^2}{\sqrt{y\,(1 + (y')^2)}} = \frac{1}{\sqrt{2g}}\,\frac{1}{\sqrt{y\,(1 + (y')^2)}} = c.$$
Taking $A = \frac{1}{c\sqrt{2g}}$, we have $y' = \sqrt{\frac{A^2 - y}{y}}$, i.e.,
$$dx = \sqrt{\frac{y}{A^2 - y}}\,dy \;\Rightarrow\; x = \int \sqrt{\frac{y}{A^2 - y}}\,dy.$$
Changing the variable of integration to $y = A^2\sin^2(\theta/2)$, we get
$$x(\theta) = \frac{A^2}{2}\,(\theta - \sin(\theta)) + c, \quad y(\theta) = \frac{A^2}{2}\,(1 - \cos(\theta)).$$
The parametrized curve $\gamma(\theta) = \bigl(\frac{A^2}{2}(\theta - \sin(\theta)) + c,\ \frac{A^2}{2}(1 - \cos(\theta))\bigr)$ is a cycloid.
(2) Geodesics. Let $S \subset \mathbb{R}^3$ be a surface. A Riemannian metric defined in S is a $C^\infty$-map that associates to every $x \in S$ an inner product $g_x : T_xS \times T_xS \to \mathbb{R}$. Taking a frame $\alpha_x = \{e_1(x), e_2(x)\}$ at each point $x \in S$, the metric $g_x$ evaluated on vectors $u, v \in T_xS$ can be written as the product of matrices
$$g_x(u, v) = u^t\,g(x)\,v.$$
The entries of the matrix $g(x)$ are $[g(x)]_{ij} = g_x(e_i, e_j)$. The length of a curve $\gamma : [a, b] \to S$ is
$$L(\gamma) = \int_a^b \sqrt{g(\gamma'(t), \gamma'(t))}\,dt. \tag{20}$$
Since the interval is $[a, b]$, consider $\Omega(p, q) = \{\gamma \in C^0([a, b]; S) \mid \gamma(a) = p, \gamma(b) = q\}$.
Definition 7 Let $p, q \in S$. A geodesic connecting the point p to the point q is a curve $\gamma \in \Omega(p, q)$ such that
$$L(\gamma) = \min_{\delta \in \Omega(p, q)} L(\delta). \tag{21}$$
The distance $d(p, q)$ between p and q is
$$d(p, q) = \min_{\delta \in \Omega(p, q)} L(\delta). \tag{22}$$
Indeed, the distance between p and q is the length of the geodesic connecting p to q. Then to find the distance between two points $p, q \in S$, we need to solve the variational problem
$$\begin{cases} \text{minimize: } L(\gamma) = \int_a^b \sqrt{\gamma'(t)^t\,g(\gamma(t))\,\gamma'(t)}\,dt, \\ \text{constrained to: } \gamma \in \Omega(p, q). \end{cases} \tag{23}$$
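The length functional in (23) is easy to approximate once the curve and the metric are given in a chart. The following Python lines are a minimal sketch under our own assumptions: the helper `curve_length`, the midpoint rule and the number of subintervals are hypothetical choices, and the two metrics used are the Euclidean one and the upper half-plane metric $\mathrm{diag}(1/y^2, 1/y^2)$ studied later in this section.

```python
import math


def curve_length(curve, metric, a, b, n=2000):
    """Midpoint-rule approximation of the length (23) of a curve in a 2D chart."""
    total = 0.0
    h = (b - a) / n
    for i in range(n):
        p0 = curve(a + i * h)
        p1 = curve(a + (i + 1) * h)
        mid = ((p0[0] + p1[0]) / 2.0, (p0[1] + p1[1]) / 2.0)
        d = (p1[0] - p0[0], p1[1] - p0[1])
        g = metric(mid)  # symmetric 2x2 matrix [g(x)]_{ij}
        total += math.sqrt(g[0][0] * d[0] * d[0]
                           + 2.0 * g[0][1] * d[0] * d[1]
                           + g[1][1] * d[1] * d[1])
    return total


def euclidean(point):
    return ((1.0, 0.0), (0.0, 1.0))


def hyperbolic(point):
    y = point[1]
    return ((1.0 / y ** 2, 0.0), (0.0, 1.0 / y ** 2))


# Straight segment from (0, 0) to (1, 2) in the Euclidean plane.
euclidean_length = curve_length(lambda t: (t, 2.0 * t), euclidean, 0.0, 1.0)

# Vertical segment from (0, 1) to (0, e) in the upper half-plane.
hyperbolic_length = curve_length(lambda t: (0.0, 1.0 + t * (math.e - 1.0)),
                                 hyperbolic, 0.0, 1.0)
```

In this sketch the second value should be close to $\ln(e/1) = 1$, in agreement with the formula $\ln\frac{y(b)}{y(a)}$ for vertical segments in the hyperbolic plane.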
Next, we will work out the geodesic variational problem in Euclidean, spherical and hyperbolic geometries.
(a) Euclidean Geometry. In this case, the surface is $S = \mathbb{R}^2$ and
$$g_p(u, v) = u^t \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} v.$$
Let's consider the space of admissible curves as the curves defined by the graph of a function, $\Omega_g(p, q) = \{y : [a, b] \to \mathbb{R} \mid y \in C^1([a, b])\}$. The length of a curve $\gamma(t) = (t, y(t))$ is
$$L(\gamma) = \int_a^b \sqrt{1 + (\dot{y}(t))^2}\,dt.$$
Variational formulation: To find the distance between two points $p, q \in \mathbb{R}^2$, we need to solve the problem
$$\begin{cases} \text{minimize: } \int_a^b \sqrt{1 + (\dot{y}(t))^2}\,dt, \\ \text{constrained to: } \gamma \in \Omega_g(p, q). \end{cases} \tag{24}$$
Solution: We have
$$L(\gamma) = \int_a^b \sqrt{1 + (y')^2}\,dt.$$
The Lagrangian $L(t, y, y') = \sqrt{1 + (y')^2}$ does not depend on t and y, so it satisfies the equation
$$L - y' L_{y'} = \sqrt{1 + (y')^2} - \frac{(y')^2}{\sqrt{1 + (y')^2}} = \frac{1}{\sqrt{1 + (y')^2}} = c.$$
In this way, we have a constant $k \in \mathbb{R}$ such that $y' = k$, and consequently, we get $y(t) = kt + k'$. In Euclidean Geometry, the trace of a geodesic is a straight line.
(b) Spherical Geometry. Let the surface be the sphere $S = S^2 = \{(\cos(\theta)\sin(\psi), \sin(\theta)\sin(\psi), \cos(\psi)) \mid (\theta, \psi) \in (0, 2\pi) \times (0, \pi)\}$. A curve $\gamma : [a, b] \to S^2$ is given by $\gamma(t) = \bigl(\cos\theta(t)\sin\psi(t),\ \sin\theta(t)\sin\psi(t),\ \cos\psi(t)\bigr)$. The Riemannian metric at $p = (\cos(\theta)\sin(\psi), \sin(\theta)\sin(\psi), \cos(\psi))$ is
$$g_p(u, v) = u^t \begin{pmatrix} \sin^2(\psi) & 0 \\ 0 & 1 \end{pmatrix} v, \quad \text{for all } u, v \in T_pS^2.$$
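The length of $\gamma$ is
$$L(\gamma) = \int_a^b \sqrt{\sin^2(\psi(t))\,\dot{\theta}^2 + \dot{\psi}^2}\,dt.$$
Let's consider $\theta = \theta(\psi)$ as a function of $\psi$ and denote $\theta' = \frac{d\theta}{d\psi}$. Now the length of $\gamma$ is
$$L(\gamma) = \int_a^b \sqrt{\sin^2(\psi)\,\theta'^2 + 1}\,d\psi.$$
Variational formulation:
$$\begin{cases} \text{minimize: } \int_a^b \sqrt{\sin^2(\psi)\,\theta'^2 + 1}\,d\psi, \\ \text{constrained to: } \gamma \in \Omega_g(p, q) = \{\theta : [a, b] \to [0, 2\pi] \mid \theta \in C^1([a, b]; U)\}. \end{cases} \tag{25}$$
Solution: The Euler-Lagrange equation is
$$\frac{d}{d\psi}(L_{\theta'}) - \frac{\partial L}{\partial \theta} = 0.$$
Since the Lagrangian L does not depend on $\theta$, we have $\frac{\partial L}{\partial \theta} = 0$, and so there is a constant $c \in \mathbb{R}$ such that
$$\frac{d}{d\psi}(L_{\theta'}) = \frac{d}{d\psi}\Bigl(\frac{\sin^2(\psi)\,\theta'}{\sqrt{\sin^2(\psi)\,\theta'^2 + 1}}\Bigr) = 0 \;\Rightarrow\; \frac{\sin^2(\psi)\,\theta'}{\sqrt{\sin^2(\psi)\,\theta'^2 + 1}} = c.$$
Therefore we have
$$\theta' = \frac{c}{\sin(\psi)\sqrt{\sin^2(\psi) - c^2}} \;\Rightarrow\; \theta = \int \frac{c}{\sin(\psi)\sqrt{\sin^2(\psi) - c^2}}\,d\psi.$$
Changing the variable of integration to $u = \cot(\psi)$, we get
$$\sin(\psi) = \frac{1}{\sqrt{1 + u^2}}, \quad d\psi = -\frac{du}{1 + u^2}, \quad \text{and} \quad \sqrt{\sin^2(\psi) - c^2} = \frac{c}{\sqrt{1 + u^2}}\sqrt{\frac{1 - c^2}{c^2} - u^2}.$$
Taking $k = \sqrt{\frac{1 - c^2}{c^2}}$, the integral is written as
$$\theta = -\int \frac{du}{\sqrt{k^2 - u^2}} = \arccos\Bigl(\frac{u}{k}\Bigr) + \psi_0 \;\Rightarrow\; \cos(\theta - \psi_0) = \frac{\cot(\psi)}{k},$$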
that is,
$$\cos(\psi_0)\cos(\theta) + \sin(\psi_0)\sin(\theta) - \frac{\cos(\psi)}{k\,\sin(\psi)} = 0,$$
$$\cos(\psi_0)\cos(\theta)\sin(\psi) + \sin(\psi_0)\sin(\theta)\sin(\psi) - \frac{\cos(\psi)}{k} = 0.$$
Since the coordinates of the curve are $x = \cos(\theta)\sin(\psi)$, $y = \sin(\theta)\sin(\psi)$ and $z = \cos(\psi)$, the curve is contained in the plane $\cos(\psi_0)\,x + \sin(\psi_0)\,y - \frac{1}{k}\,z = 0$, which passes through the origin. Therefore the trace of a spherical geodesic is a segment of a great circle of the sphere.
(c) Hyperbolic Geometry. Let the surface be $S = \mathbb{H}^2 = \{(x, y) \in \mathbb{R}^2 \mid y > 0\}$. The Riemannian metric at $p = (x, y)$ is
$$g_p(u, v) = u^t \begin{pmatrix} \frac{1}{y^2} & 0 \\ 0 & \frac{1}{y^2} \end{pmatrix} v.$$
The length of $\gamma(t) = (x(t), y(t)) \in \Omega(p, q)$ is
$$L(\gamma) = \int_a^b \frac{\sqrt{(x'(t))^2 + (y'(t))^2}}{y(t)}\,dt.$$
Considering that $\gamma(t) = (t, y(t)) \in \Omega_g(p, q)$, the length is
$$L(\gamma) = \int_a^b \frac{\sqrt{1 + (y'(t))^2}}{y(t)}\,dt.$$
Variational formulation:
$$\begin{cases} \text{minimize: } \int_a^b \frac{\sqrt{1 + (y'(t))^2}}{y(t)}\,dt, \\ \text{constrained to: } \gamma \in \Omega_g(p, q). \end{cases} \tag{26}$$
Solution: As in the last examples, we have
$$L - y' L_{y'} = \frac{\sqrt{1 + (y')^2}}{y} - \frac{(y')^2}{y\sqrt{1 + (y')^2}} = \frac{1}{y\sqrt{1 + (y')^2}} = c.$$
Taking $k = \frac{1}{c}$, we get $y\sqrt{1 + (y')^2} = k$, and so
$$\Bigl(\frac{dy}{dx}\Bigr)^2 = \frac{k^2 - y^2}{y^2} \;\Rightarrow\; \frac{dy}{dx} = \sqrt{\frac{k^2 - y^2}{y^2}}.$$
By integrating the equation $dx = \frac{y}{\sqrt{k^2 - y^2}}\,dy$, we get a constant $c_0 \in \mathbb{R}$ such that
$$(x - c_0)^2 + y^2 = k^2.$$
Then the trace of a hyperbolic geodesic, defined by the graph of a function, is an arc contained in the upper part ($y > 0$) of a circle centered on the x-axis. When the points p, q are on a vertical line, say $p = (x_0, a)$ and $q = (x_0, b)$, the geodesic is a curve $\gamma(t) = (x_0, y(t))$ whose trace is a vertical segment of the line with length
$$L(\gamma) = \int_a^b \frac{y'}{y}\,dt = \ln\frac{y(b)}{y(a)}.$$
(3) Minimum area of Surface of Revolution. Given a curve $\gamma \subset \mathbb{R}^2$ and $\gamma \in \Omega(p, q)$, we would like to find a surface $S \subset \mathbb{R}^3$ with the minimum area among all the surfaces obtained by rotating a curve $\gamma \in \Omega(p, q)$ around some fixed axis. Given the curve $\gamma \in \Omega_g(p, q)$, $\gamma(t) = (t, f(t))$, consider the surface $S_\gamma$ obtained by rotating $\gamma$ around the x-axis. A parametrization of $S_\gamma$ is given by
$$\begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} t \\ f(t) \\ 0 \end{pmatrix} = \begin{pmatrix} t \\ f(t)\cos(\theta) \\ f(t)\sin(\theta) \end{pmatrix}.$$
The area of $S_\gamma$ is
$$A(f) = 2\pi \int_a^b f(t)\sqrt{1 + [f'(t)]^2}\,dt.$$
Variational formulation:
$$\begin{cases} \text{minimize: } A(f) = 2\pi \int_a^b f(t)\sqrt{1 + [f'(t)]^2}\,dt, \\ \text{constrained to: } f \in \Omega_g(p, q). \end{cases} \tag{27}$$
(4) Lagrangian Formulation of Classical Mechanics. Joseph Louis Lagrange observed that the trajectory of a particle, subject to a force field $\vec{F}$, is the one minimizing the action of the motion. Let the kinetic energy of the particle, traversing a trajectory $\gamma(t)$, be $K(\gamma) = \frac{1}{2}m\,|\dot{\gamma}|^2$, and let the potential energy be $U \in C^1(\mathbb{R}^3)$. The Lagrangian function is defined by $L(t, \gamma, \dot{\gamma}(t)) = K(\gamma) - U(\gamma)$ and the Action of the movement is defined by the integral
$$S(\gamma) = \int_a^b \bigl[K(\gamma) - U(\gamma)\bigr]\,dt.$$
Then to determine the trajectory of the particle subject to the field $\vec{F} = -\nabla U$, we have to solve the following variational problem:
Variational formulation:
$$\begin{cases} \text{minimize: } S(\gamma) = \int_a^b (K(\gamma) - U(\gamma))\,dt, \\ \text{constrained to: } \gamma \in \Omega(p, q) = \{\gamma \in C^0([a, b]; \mathbb{R}^n) \mid \gamma(a) = p, \gamma(b) = q\}. \end{cases} \tag{28}$$
In Classical Mechanics, the dynamics of a body of mass m, subject to a force field $\vec{F}$, is obtained by studying the properties of the solutions of Newton's equation $\vec{F} = m\vec{a}$ as time evolves. Lagrange noted that Newton's equation is equivalent to minimizing a functional. Let's assume the force field $\vec{F} = -\nabla U$ is defined by the potential $U : \mathbb{R}^3 \to \mathbb{R}$, $U = U(\gamma)$. Given a curve $\gamma(t) = (\gamma_1(t), \dots, \gamma_n(t))$, the Lagrangian $L : \mathbb{R} \times \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$, $L(t, \gamma(t), \dot{\gamma}(t)) = L(t, x_1(t), \dots, x_n(t), \dot{x}_1(t), \dots, \dot{x}_n(t))$, is
$$L(t, \gamma(t), \dot{\gamma}(t)) = \frac{m}{2}\sum_{i=1}^n |\dot{\gamma}_i(t)|^2 - U(\gamma(t)).$$
The Euler-Lagrange equations define the ODE system
$$m\,\ddot{\gamma}_i = -\frac{\partial U}{\partial x_i}, \quad i = 1, \dots, n.$$
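The relation between the action and Newton's equation can be observed on a discretized action. The Python lines below are a sketch under our own assumptions: a particle of hypothetical mass `M` in the constant gravity potential $U(z) = Mgz$, moving in one dimension with $z(0) = z(T) = 0$, a uniform time grid, and a perturbation $\varepsilon\sin(\pi t/T)$ chosen only for illustration. The Newtonian solution is $z(t) = \frac{g}{2}\,t\,(T - t)$.

```python
import math

M, G, T, N = 1.0, 9.8, 1.0, 200  # hypothetical mass, gravity, duration, grid size


def action(z):
    """Discretized action: sum of (kinetic - potential) * h over the time grid."""
    h = T / N
    total = 0.0
    for i in range(N):
        velocity = (z[i + 1] - z[i]) / h
        height = 0.5 * (z[i] + z[i + 1])
        total += (0.5 * M * velocity ** 2 - M * G * height) * h
    return total


times = [i * T / N for i in range(N + 1)]

# Solution of Newton's equation m z'' = -m g with z(0) = z(T) = 0.
newton_path = [0.5 * G * t * (T - t) for t in times]


def perturbed(eps):
    # Same endpoints (up to rounding), displaced by eps * sin(pi t / T) in between.
    return [z + eps * math.sin(math.pi * t / T) for z, t in zip(newton_path, times)]


s_newton = action(newton_path)
s_up = action(perturbed(0.1))
s_down = action(perturbed(-0.05))
```

In this example the discretized action is a convex function of the interior values of z, so any perturbation with fixed endpoints increases it; for a general potential U, the Newtonian trajectory is only guaranteed to be a critical point of the action.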
The Euler-Lagrange equations turn out to be Newton’s equation F = m γ(t). ¨ (5) Euclidean Geometry II Now we consider the space of admissible functions as the space of parametrized curves in R2 : ( p, q) = {γ : [a, b] → R2 | γ(t) = (x(t), y(t)) ∈ C 1 ([a, b], R2 ), γ(a) = p, γ(b) = q}.
The variational problem is

$$\text{minimize: } L(\gamma) = \int_a^b \sqrt{\dot x^2 + \dot y^2}\,dt, \qquad \text{constrained to: } \gamma \in \Omega(p,q).$$
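Consider the space $V = \{\epsilon : [a,b] \to \mathbb{R}^2 \mid \epsilon \in C^1,\ \epsilon(a) = \epsilon(b) = 0\}$ and let $s \mapsto \gamma_s$, $(-\varepsilon, \varepsilon) \to \Omega(p,q)$, be the curve in the space of admissible functions given by

$$\gamma_s(t) = \gamma(t) + s\,\epsilon(t), \quad \epsilon \in V,$$

and such that $\gamma_0(t) = \gamma(t)$. If $\gamma \in \Omega(p,q)$ and $\epsilon \in V$, then

$$\gamma_0(t) = \gamma(t), \qquad \frac{\partial \gamma_s(t)}{\partial s}\Big|_{s=0} = \epsilon(t).$$

Let $\gamma \in \Omega(p,q)$, $\dot\gamma = \frac{d\gamma}{dt}$ and $\gamma' = \frac{d\gamma}{ds}$.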
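From now on we take $[a,b] = [0,1]$.

(a) $\gamma$ is regular at $t \in [0,1]$ when it is differentiable at $t$ and $\dot\gamma(t) \neq 0$. We say that $\gamma$ is regular if it is regular for all $t \in [0,1]$.

(b) The normal vector to $\gamma$ is

$$N = \frac{\ddot\gamma - \frac{\langle \ddot\gamma, \dot\gamma \rangle}{|\dot\gamma|^2}\dot\gamma}{\left| \ddot\gamma - \frac{\langle \ddot\gamma, \dot\gamma \rangle}{|\dot\gamma|^2}\dot\gamma \right|}.$$

(c) The curvature of $\gamma$ is the function $k_\gamma : [0,1] \to \mathbb{R}$ given by

$$k_\gamma = \frac{|\dot\gamma \wedge \ddot\gamma|}{|\dot\gamma|^3}.$$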
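Let $\gamma \in \Omega(p,q)$ be a regular curve with curvature $k_\gamma$ and normal vector $N$. The length of $\gamma_s$ defines a function $L : (-\varepsilon, \varepsilon) \to \mathbb{R}$,

$$L(s) = L(\gamma_s) = \int_0^1 |\dot\gamma_s|\,dt.$$

Therefore we have

$$L(s) = \int_0^1 |\dot\gamma + s\,\dot\epsilon|\,dt = \int_0^1 \sqrt{|\dot\gamma|^2 + 2s\langle \dot\gamma, \dot\epsilon \rangle + s^2 |\dot\epsilon|^2}\,dt.$$

The variational derivative of $L(s)$ with respect to the parameter $s$ is

$$\frac{dL}{ds}\Big|_{s=0} = \lim_{s \to 0} \frac{L(s) - L(0)}{s} = dL_\gamma.\epsilon.$$

Since

$$\frac{dL}{ds} = \int_0^1 \frac{\langle \dot\gamma, \dot\epsilon \rangle + s|\dot\epsilon|^2}{\sqrt{|\dot\gamma|^2 + 2s\langle \dot\gamma, \dot\epsilon \rangle + s^2|\dot\epsilon|^2}}\,dt,$$

we have

$$\frac{dL}{ds}\Big|_{s=0} = \int_0^1 \left\langle \frac{\dot\gamma}{|\dot\gamma|}, \dot\epsilon \right\rangle dt. \tag{29}$$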
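We integrate the expression (29) by parts, using the following identities:

$$\text{(i)} \quad \frac{d}{dt}\left\langle \frac{\dot\gamma}{|\dot\gamma|}, \epsilon \right\rangle = \left\langle \frac{d}{dt}\Big(\frac{\dot\gamma}{|\dot\gamma|}\Big), \epsilon \right\rangle + \left\langle \frac{\dot\gamma}{|\dot\gamma|}, \dot\epsilon \right\rangle,$$

$$\text{(ii)} \quad \frac{d}{dt}\Big(\frac{\dot\gamma}{|\dot\gamma|}\Big) = \frac{1}{|\dot\gamma|}\left( \ddot\gamma - \frac{\langle \dot\gamma, \ddot\gamma \rangle}{|\dot\gamma|^2}\dot\gamma \right),$$

$$\text{(iii)} \quad \left| \ddot\gamma - \frac{\langle \dot\gamma, \ddot\gamma \rangle}{|\dot\gamma|^2}\dot\gamma \right|^2 = |\ddot\gamma|^2 - \frac{\langle \dot\gamma, \ddot\gamma \rangle^2}{|\dot\gamma|^2} = \frac{|\dot\gamma \wedge \ddot\gamma|^2}{|\dot\gamma|^2}.$$

By (ii), (iii) and the definitions of $N$ and $k_\gamma$, we get $\frac{d}{dt}\big(\frac{\dot\gamma}{|\dot\gamma|}\big) = \frac{|\dot\gamma \wedge \ddot\gamma|}{|\dot\gamma|^2} N = k_\gamma |\dot\gamma| N$. So we have

$$\frac{dL}{ds}\Big|_{s=0} = \left\langle \frac{\dot\gamma(1)}{|\dot\gamma(1)|}, \epsilon(1) \right\rangle - \left\langle \frac{\dot\gamma(0)}{|\dot\gamma(0)|}, \epsilon(0) \right\rangle - \int_0^1 \left\langle \frac{|\dot\gamma \wedge \ddot\gamma|}{|\dot\gamma|^3}\,|\dot\gamma|\,N, \epsilon \right\rangle dt.$$

The space $V$ is endowed with the inner product

$$\langle \epsilon_1, \epsilon_2 \rangle = \int_0^1 \langle \epsilon_1(t), \epsilon_2(t) \rangle\,dt.$$

Since $\epsilon(1) = \epsilon(0) = 0$ and $k_\gamma = \frac{|\dot\gamma \wedge \ddot\gamma|}{|\dot\gamma|^3}$, the equation becomes

$$\frac{dL}{ds}\Big|_{s=0} = \int_0^1 \langle -k_\gamma(t)\,|\dot\gamma(t)|\,N(t), \epsilon(t) \rangle\,dt = \langle -k_\gamma |\dot\gamma| N, \epsilon \rangle.$$

Hence $dL_\gamma.\epsilon = \langle -k_\gamma |\dot\gamma| N, \epsilon \rangle$.

Theorem 9 Let $p, q \in \mathbb{R}^2$. A curve $\gamma : [a,b] \to \mathbb{R}^2$ is a geodesic of Euclidean Geometry if and only if the trace of $\gamma$ is a straight line.

Proof Suppose $\gamma \in \Omega(p,q)$ is a regular critical point. Therefore $dL_\gamma.\epsilon = 0$ for all $\epsilon \in V$. This is equivalent to $\langle \nabla L(\gamma), \epsilon \rangle = 0$ for all $\epsilon \in V$. The du Bois-Reymond Lemma implies that $\nabla L(\gamma) = -k_\gamma |\dot\gamma| N = 0$. Since $|\dot\gamma| \neq 0$ and $N \neq 0$, then $k_\gamma = 0$. Consequently, $\dot\gamma \wedge \ddot\gamma = 0$, and so one of the following cases may occur:

(a) $\ddot\gamma = 0$. In this case, we have $\gamma(t) = t(q - p) + p$. Hence the trace is a straight segment.

(b) We have a function $f : [0,1] \to \mathbb{R}$ such that for all $t \in [0,1]$, $\ddot\gamma(t) = f(t)\,\dot\gamma(t)$. The general solution of the ODE

$$\begin{cases} \ddot\gamma - f\dot\gamma = 0, \\ \gamma(0) = p, \\ \gamma(1) = q, \end{cases}$$

is given as

$$\gamma(t) = \frac{\int_0^t e^{\int_0^\sigma f(\tau)\,d\tau}\,d\sigma}{\int_0^1 e^{\int_0^\sigma f(\tau)\,d\tau}\,d\sigma}\,(q - p) + p.$$

Hence the trace of $\gamma$ is a straight line in $\mathbb{R}^2$.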
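The vanishing of the first variation along a straight segment can be observed numerically. The sketch below is only an illustration under assumed data: the endpoints $p = (0,0)$, $q = (3,4)$ and the variation $\epsilon(t) = (0, \sin(\pi t))$ are hypothetical choices, and the length is approximated by an inscribed polygon.

```python
import math

def length(curve, n=2000):
    """Polygonal approximation of the length of curve: [0, 1] -> R^2."""
    total, prev = 0.0, curve(0.0)
    for i in range(1, n + 1):
        cur = curve(i / n)
        total += math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        prev = cur
    return total

p, q = (0.0, 0.0), (3.0, 4.0)   # assumed endpoints

def gamma_s(s):
    """Segment from p to q plus s times a variation vanishing at t = 0 and t = 1."""
    def curve(t):
        return (p[0] + t * (q[0] - p[0]),
                p[1] + t * (q[1] - p[1]) + s * math.sin(math.pi * t))
    return curve

L0 = length(gamma_s(0.0))                                   # length of the segment
h = 1e-3
dL = (length(gamma_s(h)) - length(gamma_s(-h))) / (2 * h)   # central difference for dL/ds at s = 0
L_perturbed = length(gamma_s(0.5))

print(L0, dL, L_perturbed)
```

The segment has length $|q - p| = 5$, the difference quotient is numerically zero, and the perturbed curve is longer, as the theorem predicts.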
6 Fredholm Maps

A differentiable map $f : E \to F$ between Banach spaces is a Fredholm map if $df_x : E \to F$ is a linear Fredholm operator for all $x \in E$. Since the Fredholm index is invariant under continuous perturbations, the index of $df_x$ does not depend on $x$. Let $\mathrm{ind}(f)$ be the index of $f$. Some theorems for differentiable maps between finite dimensional spaces extend to the case when the map is Fredholm. A point $y \in F$ is a regular value of $f$ if $df_x$ is surjective and has a right inverse for all $x \in f^{-1}(y)$; otherwise $y$ is a critical value. Under these conditions, we will prove that $f^{-1}(y)$ is a differentiable manifold of dimension $n = \mathrm{ind}(f)$.

Theorem 10 Let $E$, $F$ be Banach spaces and let $U \subset E$ be an open subset. If $f : U \to F$ is a Fredholm $C^k$-map and $y \in F$ is a regular value of $f$, then $M = f^{-1}(y) \subset U$ is a $C^k$-differentiable Banach manifold and $T_x M = \mathrm{Ker}(df_x)$ for all $x \in M$. Moreover, $\dim(M) = \mathrm{ind}(f)$ is finite.

The proof of the above theorem follows the arguments outlined in the Preamble of the Inverse Function Theorem. Again, the Contraction Lemma is the backbone of the proof, which is a task requiring a clear goal and also a lot of perseverance. Before proceeding with the proof, we discuss the strategy we will use. We consider $y = 0$ a regular value of $f$ and $M = f^{-1}(0)$. Let $x_0 \in U$ be a point, not necessarily in $M$, such that $df_{x_0}$ is a surjective Fredholm operator, and define $E_0 = \mathrm{Ker}(df_{x_0})$. Then we have a closed subspace $E_1 \subset E$ such that $E = E_0 \oplus E_1$ and $df_{x_0} : E_1 \to F$ is an isomorphism. Let $Q_{x_0} = (df_{x_0}|_{E_1})^{-1} : F \to E_1$ be the inverse of this isomorphism. $Q_{x_0}$ is injective and is a right inverse of $df_{x_0}$. For this reason, for every $x \in E$, we have $\xi \in \mathrm{Ker}(df_{x_0})$ and $\eta \in F$ such that

$$x = (x_0 + \xi) + Q_{x_0}(\eta).$$
For the following, we fix the point $x_0$ so that $f(x_0)$ is very small, and write $df_0 = df_{x_0}$ and $Q = Q_{x_0}$. The goal is to show that we have open sets $V \subset \mathrm{Ker}(df_0)$, $W \subset M$ and a diffeomorphism $\Phi : V \to W$. Consequently $M$ is locally diffeomorphic to $\mathrm{Ker}(df_0)$. Given $x_1 = x_0 + \xi$ with $\xi \in \mathrm{Ker}(df_0)$ and $f(x_1)$ sufficiently small, the strategy is to find the solution of $f(x) = 0$ in the affine subspace $x_1 + \mathrm{Im}(Q)$. There is no need for $x_0$ and $x_1$ to be solutions of $f(x) = 0$; we just need them to be near a solution. For this purpose, let $U \subset E$ be an open subset containing $x_0$ such that $U \cap M \neq \emptyset$, and consider the differentiable maps

$$h : U \to F, \quad h(x) = f(x) - df_0(x - x_1), \qquad \psi : U \to E, \quad \psi(x) = x + Qh(x).$$
Proposition 10 Let $f : U \to F$ be a $C^k$-map and let $x_0$ be such that $df_0 : E \to F$ is surjective. Then $\psi(x) = x_1$ if and only if $f(x) = 0$ and $x - x_1 \in \mathrm{Im}(Q)$.

Proof Let's assume $\psi(x) = x_1$, so

$$Q\big( f(x) - df_0(x - x_1) \big) = x_1 - x \ \Rightarrow\ x - x_1 \in \mathrm{Im}(Q).$$

Let $u \in F$ be such that $Q(u) = x - x_1$;

$$x_1 = \psi(x) = x + Q\big( f(x) - df_0 Q(u) \big) = x + Q\big( f(x) - u \big) = x + Q(f(x)) - x + x_1 \ \Rightarrow\ Q(f(x)) = 0.$$

Therefore we have $f(x) = 0$ since $Q$ is injective. The reverse assertion is trivial (Fig. 7).

Fig. 7 $f : E \to F$ is Fredholm

We need to show that the equation $\psi(x) = x_1$ has a unique solution in a neighborhood of $x_0$. We take $\delta > 0$ and $c > 0$ such that $B_\delta(x_0) \subset U$, $|Q| \le c$ and

$$x \in B_\delta(x_0) \ \Rightarrow\ \|df_x - df_0\| \le \frac{1}{2c}.$$

For every $y \in U$, we consider the map $\phi_y : B_\delta(x_0) \to E$, $\phi_y(x) = x + (y - \psi(x))$. Then

$$\|d(\phi_y)_x\| = \|I - d\psi_x\| = \|Q.(df_x - df_{x_0})\| \le \frac{1}{2},$$

for all $x \in B_\delta(x_0)$. Applying the Mean Value inequality, it follows that $\phi_y$ is a contraction since

$$\|\phi_y(x_2) - \phi_y(x_1)\| \le \frac{1}{2}\|x_2 - x_1\|. \tag{30}$$

The map $\psi : B_\delta(x_0) \to E$ is injective since

$$\|\psi(x_2) - \psi(x_1)\| = \|(x_2 - x_1) - (\phi_y(x_2) - \phi_y(x_1))\| \ge \|x_2 - x_1\| - \|\phi_y(x_2) - \phi_y(x_1)\| \ge \|x_2 - x_1\| - \frac{1}{2}\|x_2 - x_1\| = \frac{1}{2}\|x_2 - x_1\|, \tag{31}$$

for any $x_1, x_2 \in B_\delta(x_0)$. By restricting the domain of $\phi_y$ to $B_\delta = B_\delta(x_0)$, we have the map $\psi : B_\delta \to U_\delta = \psi(B_\delta)$. Therefore we assert that $\psi(x) = y$ has a solution in $B_\delta$ for $y \in U_\delta$.

Lemma 3 Let $y_0 \in U_\delta$ and $y_0 = \psi(x_0)$. Consider $0 < \varepsilon \le \delta$ such that $B_\varepsilon(x_0) \subset B_\delta(x_0)$ and $B_{\varepsilon/2}(y_0) = \{y \in E;\ \|y - y_0\| < \varepsilon/2\}$. Then

(i) $U_\delta \subset E$ is open. Indeed, if $y \in B_{\varepsilon/2}(y_0)$, then $y \in U_\delta$.
(ii) $\psi : B_\delta(x_0) \to U_\delta$ is a diffeomorphism.

Proof (i) For any $x \in B_\varepsilon(x_0)$ and $y \in B_{\varepsilon/2}(y_0)$, we have

$$\|\phi_y(x) - x_0\| \le \|\phi_y(x) - \phi_y(x_0)\| + \|\phi_y(x_0) - x_0\| \le \frac{1}{2}\|x - x_0\| + \|y - y_0\| \le \varepsilon.$$
So $\phi_y(x) \in B_\varepsilon(x_0)$ and $\phi_y(B_\varepsilon(x_0)) \subset B_\varepsilon(x_0)$. Since the closed ball $\overline{B}_\varepsilon(x_0) \subset E$ is a complete metric space, it follows from the Contraction Lemma that $\phi_y : \overline{B}_\varepsilon(x_0) \to \overline{B}_\varepsilon(x_0)$ has a single fixed point. Let $x_y$ be the fixed point of $\phi_y$. Therefore $y \in U_\delta$ since $\psi(x_y) = y$. Moreover, since $B_{\varepsilon/2}(y_0) \subset U_\delta$, the subset $U_\delta$ is open.

(ii) In particular, $\phi_y : B_\delta(x_0) \to B_\delta(x_0)$ is a contraction. Due to the uniqueness of the fixed point, the map $\psi : B_\delta \to U_\delta$ is injective. Also, we have $d\psi_{x_0} = I$. Hence $\psi : B_\delta \to U_\delta$ is a diffeomorphism.

Theorem 11 Let $x_0 \in M = f^{-1}(0)$ and suppose that $df_{x_0}$ is surjective. Let $\delta$ and $c$ be such that $B_\delta(x_0) \subset U$, $|Q| \le c$ and

$$x \in B_\delta(x_0) \ \Rightarrow\ \|df_x - df_{x_0}\| \le \frac{1}{2c}.$$

If $x_1 \in E$ satisfies $\|x_1 - x_0\| < \frac{\delta}{8}$ and $\|f(x_1)\| \le \frac{\delta}{4c}$, then we have a unique $\hat{x}_1 \in U_\delta$ such that

(i) $f(\hat{x}_1) = 0$, and $\hat{x}_1 - x_1 \in \mathrm{Im}(Q)$.
(ii) $\|\hat{x}_1 - x_1\| \le 2c\,\|f(x_1)\|$.
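The fixed-point scheme behind this statement can be tried out in finite dimensions. The sketch below is an illustration only, under assumed data: the map $f(x,y) = x^2 + y^2 - 1$ from $\mathbb{R}^2$ to $\mathbb{R}$, the base point $x_0 = (1, 0)$ and the nearby point $x_1 = (1, 0.05)$ are hypothetical choices, for which $\mathrm{Ker}(df_0)$ is the $y$-axis, $Q(\eta) = (\eta/2, 0)$, $c = 1/2$ and $\delta = 1/2$. The iteration $x \mapsto x + (x_1 - \psi(x))$ is the contraction $\phi_y$ with $y = x_1$.

```python
import math

def f(pt):
    """Assumed example map f: R^2 -> R whose zero set is the unit circle."""
    x, y = pt
    return x * x + y * y - 1.0

x0 = (1.0, 0.0)           # base point on M = f^{-1}(0)

def df0(v):
    """Differential of f at x0 applied to the vector v."""
    return 2.0 * x0[0] * v[0] + 2.0 * x0[1] * v[1]

def Q(eta):
    """Right inverse of df0 with image the x-axis (a complement of Ker(df0))."""
    return (eta / 2.0, 0.0)

def solve_near(x1, steps=30):
    """Iterate x -> x1 - Q(f(x) - df0(x - x1)), i.e. phi_y with y = x1."""
    x = x1
    for _ in range(steps):
        d = (x[0] - x1[0], x[1] - x1[1])
        q = Q(f(x) - df0(d))
        x = (x1[0] - q[0], x1[1] - q[1])
    return x

x1 = (1.0, 0.05)          # x0 + xi with xi in Ker(df0), and f(x1) small
x_hat = solve_near(x1)
dist = math.hypot(x_hat[0] - x1[0], x_hat[1] - x1[1])
print(x_hat, f(x_hat), dist)
```

In this example the limit $\hat{x}_1$ lies on the circle, differs from $x_1$ only in the direction of $\mathrm{Im}(Q)$, and satisfies the estimate $\|\hat{x}_1 - x_1\| \le 2c\,\|f(x_1)\|$.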
Proof (i) We need to check that $x_1 \in U_\delta$. Since $\psi(x_1) = x_1 + Qf(x_1)$, we have

$$\|x_1 - \psi(x_0)\| = \|\psi(x_1) - \psi(x_0) - Qf(x_1)\| \le \|\psi(x_1) - \psi(x_0)\| + |Q|\,.\,\|f(x_1)\| \le 2\|x_1 - x_0\| + c\,.\,\|f(x_1)\|$$
+
1 < X, Y >| v |2 2
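coupling the connection $A$ to the spinor $\phi$. The Seiberg-Witten map is

$$\mathcal{F} : \mathcal{A} \times \Gamma_c^{+} \to \Omega^2_+(\mathfrak{u}_1) \times \Gamma_c^{-}, \qquad \mathcal{F}(A, \phi) = \big( F_A^{+} - \sigma(\phi),\ D_A^{+}\phi \big). \tag{34}$$

The Seiberg-Witten Monopole moduli space is $\mathcal{M}_c = \mathcal{F}^{-1}(0)/\mathcal{G}$. Let $T_{(A,\phi)}\mathcal{G}$ be the tangent plane to the orbit $\mathcal{G}.(A, \phi)$ and $T_{(A,\phi)}\mathcal{C}_c$ the tangent plane to $\mathcal{C}_c$ at the point $(A, \phi)$. The differential of $\mathcal{F}$ at $(A, \phi)$ defines the linear operator $d\mathcal{F}_{(A,\phi)} : \Omega^1(\mathfrak{u}_1) \times \Gamma_c^{+} \to \Omega^2_+(\mathfrak{u}_1) \times \Gamma_c^{-}$. Using a diagram, we get

$$0 \longrightarrow \Omega^0(\mathfrak{u}_1) \xrightarrow{\ d_1\ } \Omega^1(\mathfrak{u}_1) \times \Gamma_c^{+} \xrightarrow{\ d\mathcal{F}_{(A,\phi)}\ } \Omega^2_+(\mathfrak{u}_1) \times \Gamma_c^{-}.$$

As in the Instantons example, we introduce the vector spaces

$$H^0_{(A,\phi)} = \mathrm{Ker}(d_1), \qquad H^1_{(A,\phi)} = \mathrm{Ker}(d\mathcal{F}_{(A,\phi)})/\mathrm{Im}(d_1), \qquad H^+_{(A,\phi)} = \frac{\Omega^2_+(\mathfrak{u}_1) \times \Gamma_c^{-}}{\mathrm{Im}(d\mathcal{F}_{(A,\phi)})}.$$

Analogously,

(i) $(A, \phi)$ is irreducible if and only if $H^0_{(A,\phi)} = 0$.
(ii) The tangent plane to $\mathcal{M}_c$ at $(A, \phi)$ is $H^1_{(A,\phi)}$.
(iii) The differential $d\mathcal{F}_{(A,\phi)}$ is surjective if and only if $H^+_{(A,\phi)} = 0$.

Footnote 3: $Cl_4$ is a Clifford Algebra.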
Theorem 16 If b2+ (M) ≥ 1, then the space Mc is either empty or it is a differentiable manifold for a generic metric g. Moreover, Mc is compact and orientable with dimension dim (Mc ) =
# 1" 2 c1 (L(E)c ) − 2χ(M) − 3σ(M) , 2
(35)
χ(M) is the Euler characteristic of M and σ(M) is the signature of the intersection form of M.
7 An Application of the Inverse Function Theorem to Geometry

In the previous section, we introduced the concept of geodesic, and showed how to find a geodesic connecting two points $p, q$ by solving a variational problem. When we consider an open subset $U \subset \mathbb{R}^n$ endowed with a Riemannian metric $g$, it is very difficult to find a geodesic connecting two points $p, q \in U$. In this section, we will prove the existence of a geodesic if $p, q$ are non-conjugate points, which holds in particular when they are close enough. The proof is an application of the InFT. We remind the reader that the Riemannian metric $g$ induces an inner product $g : T_x U \times T_x U \to \mathbb{R}$ on the tangent plane $T_x U \simeq \mathbb{R}^n$. Once a frame $\beta = \{e_i \mid 1 \le i \le n\}$ is fixed on $U$, the entries of the matrix associated to the metric $g$, with respect to the frame $\beta$, are functions $g_{ij} : U \to \mathbb{R}$ given by $g_{ij}(x) = g_x(e_i, e_j)$ ($g_{ij} = g_{ji}$). The tangent vector to a curve $\gamma : [a,b] \to U$, $\gamma(t) = (\gamma_1(t), \dots, \gamma_n(t))$, at the instant $t$, is $\dot\gamma(t) = (\dot\gamma_1(t), \dots, \dot\gamma_n(t)) = \sum_i \dot\gamma_i e_i$. The length of $\gamma$ is

$$L(\gamma) = \int_a^b \sqrt{g(\dot\gamma(t), \dot\gamma(t))}\,dt = \int_a^b \sqrt{\sum_{i,j} g_{ij}(t)\,\dot\gamma_i(t)\,\dot\gamma_j(t)}\,dt, \qquad g_{ij}(t) = g_{\gamma(t)}(e_i, e_j).$$

The Euler-Lagrange equations are

$$\frac{d^2\gamma_k}{dt^2} + \sum_{i,j} \Gamma^k_{ij}(\gamma(t))\,\frac{d\gamma_i}{dt}\frac{d\gamma_j}{dt} = 0, \quad k = 1, \dots, n.$$

The functions $\Gamma^k_{ij} : U \to \mathbb{R}$ are the Christoffel symbols of the metric $g$. They are given by

$$\Gamma^m_{ij} = \frac{1}{2}\sum_k \left( \frac{\partial g_{jk}}{\partial x_i} + \frac{\partial g_{ki}}{\partial x_j} - \frac{\partial g_{ij}}{\partial x_k} \right) g^{km}; \quad i, j, m \in \{1, \dots, n\}.$$
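The Banach space $C^2[a,b]^n = \{f : [a,b] \to \mathbb{R}^n \mid f \in C^2\}$ is endowed with the norm

$$\|f\|_{C^2} = \max_{0 \le i \le 2}\ \sup_{t \in [a,b]} \left| \frac{d^i f}{dt^i}(t) \right|.$$

On the open subset $\widehat{U} = \{f \in C^2[a,b]^n \mid f([a,b]) \subset U\}$ of $C^2[a,b]^n$, consider the map

$$G : \widehat{U} \to C^0[a,b]^n, \qquad \gamma \mapsto \left( \frac{d^2\gamma_1}{dt^2} + \sum_{i,j} \Gamma^1_{ij}(\gamma(t))\frac{d\gamma_i}{dt}\frac{d\gamma_j}{dt},\ \dots,\ \frac{d^2\gamma_n}{dt^2} + \sum_{i,j} \Gamma^n_{ij}(\gamma(t))\frac{d\gamma_i}{dt}\frac{d\gamma_j}{dt} \right).$$

The differentiable map that matters for our question is the following:

$$P : \widehat{U} \to C^0[a,b]^n \times \mathbb{R}^n \times \mathbb{R}^n, \qquad \gamma \mapsto (G(\gamma), \gamma(a), \gamma(b)).$$

With respect to the frame $\beta$, we have

$$G(\gamma) = \sum_\lambda \left( \frac{d^2\gamma_\lambda}{dt^2} + \sum_{i,j} \Gamma^\lambda_{ij}(\gamma(t))\frac{d\gamma_i}{dt}\frac{d\gamma_j}{dt} \right) e_\lambda = \sum_\lambda G_\lambda e_\lambda.$$

In local coordinates, the differential $dG_\gamma : C^2[a,b]^n \to C^0[a,b]^n$ is given by

$$(dG_\gamma.h)_\lambda = \frac{d^2 h_\lambda}{dt^2} + 2\sum_{i,j} \Gamma^\lambda_{ij}(\gamma)\frac{d\gamma_i}{dt}\frac{dh_j}{dt} + \sum_l \sum_{i,j} \frac{\partial \Gamma^\lambda_{ij}}{\partial x_l}\frac{d\gamma_i}{dt}\frac{d\gamma_j}{dt}\,h_l.$$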
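Let's consider the decomposition

$$(dG_\gamma.h)_\lambda = D_2 h_\lambda + K_1^\lambda(h) + K_2^\lambda(h) \tag{36}$$

and

$$D_2 h_\lambda = \frac{d^2 h_\lambda}{dt^2}, \qquad K_1^\lambda(h) = 2\sum_{i,j} \Gamma^\lambda_{ij}(\gamma)\frac{d\gamma_i}{dt}\frac{dh_j}{dt}, \qquad K_2^\lambda(h) = \sum_l \sum_{i,j} \frac{\partial \Gamma^\lambda_{ij}}{\partial x_l}\frac{d\gamma_i}{dt}\frac{d\gamma_j}{dt}\,h_l.$$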
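The differential of $P : \widehat{U} \to C^0[a,b]^n \times \mathbb{R}^n \times \mathbb{R}^n$ at $\gamma$ is

$$dP_\gamma : C^2[a,b]^n \to C^0[a,b]^n \times \mathbb{R}^n \times \mathbb{R}^n, \qquad dP_\gamma.h = (dG_\gamma.h, h(a), h(b)),$$

so that $dP_\gamma.h = \mathbf{D}_2 h + \mathbf{K}_1(h) + \mathbf{K}_2(h)$, where, componentwise,

$$\mathbf{D}_2 h = (D_2 h_\lambda, h(a), h(b)), \qquad \mathbf{K}_1(h) = \big( K_1^\lambda(h), 0, 0 \big), \qquad \mathbf{K}_2(h) = \big( K_2^\lambda(h), 0, 0 \big).$$

We will show that the operator $\mathbf{D}_2$ is a Fredholm operator with index 0 and that the operators $K_1^\lambda$ and $K_2^\lambda$ are compact. Once this is done, we will conclude that the operator $dP_\gamma$ is a Fredholm operator with index 0.

(i) $\mathbf{D}_2$ is a Fredholm operator with index 0. The assertion follows from the following items:

(a) $\mathrm{Ker}(\mathbf{D}_2) = \{0\}$. Let $h \in \mathrm{Ker}(\mathbf{D}_2)$, so $h''(t) = 0$ and $h(a) = h(b) = 0$. Therefore $h'(t) = u_0 \in \mathbb{R}^n$ and $h(t) = u_0 t + u_1$, with $u_0, u_1 \in \mathbb{R}^n$. Since $h(a) = h(b) = 0$, then $u_0 = u_1 = 0$.

(b) $\mathbf{D}_2 : C^2[a,b]^n \to C^0[a,b]^n \times \mathbb{R}^n \times \mathbb{R}^n$ is surjective, i.e., $\mathrm{CoKer}(\mathbf{D}_2) = \{0\}$. Let $(g, p, q) \in C^0[a,b]^n \times \mathbb{R}^n \times \mathbb{R}^n$ and define

$$h(t) = p + \left( q - p - \int_a^b \int_a^s g(\theta)\,d\theta\,ds \right) \frac{t - a}{b - a} + \int_a^t \int_a^s g(\theta)\,d\theta\,ds.$$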
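Therefore $h(a) = p$, $h(b) = q$ and $h''(t) = g(t)$. Hence the linear operator $\mathbf{D}_2$ is surjective and $\mathrm{ind}(\mathbf{D}_2) = 0$.

Remark: The bounded functions

$$a^\lambda_j = 2\sum_i \Gamma^\lambda_{ij}(\gamma)\frac{d\gamma_i}{dt}, \qquad \vartheta^\lambda_l = \sum_{i,j} \frac{\partial \Gamma^\lambda_{ij}}{\partial x_l}\frac{d\gamma_i}{dt}\frac{d\gamma_j}{dt}$$

allow us to consider the constants $C_a^\lambda = \sup_j |a^\lambda_j|$ and $C_\vartheta^\lambda = \sup_l |\vartheta^\lambda_l|$. So the operators

$$\hat{K}_1, \hat{K}_2 : C^0([a,b])^n \to C^0([a,b]), \qquad \hat{K}_1(v) = \sum_j a^\lambda_j v_j, \qquad \hat{K}_2(v) = \sum_l \vartheta^\lambda_l v_l,$$

are bounded, since $\|\hat{K}_1(v)\| \le C_a^\lambda \|v\|$ and $\|\hat{K}_2(v)\| \le C_\vartheta^\lambda \|v\|$ up to a factor depending only on $n$.

(ii) $K_1^\lambda$, for all $\lambda$, is a linear compact operator. The operator $D_1 : C^2([a,b]; \mathbb{R}) \to C^0([a,b]; \mathbb{R})$, $D_1(h) = h'$, is compact by the Arzelà-Ascoli theorem. For Banach spaces $E$, $F$, $G$, the composition $T \circ K$, with $T \in \mathcal{L}(F, G)$ and $K \in \mathcal{K}(E, F)$, is always a compact operator. So the compactness of $K_1^\lambda$ follows from the composition

$$K_1^\lambda(h) = \hat{K}_1(D_1(h)) = \sum_j a^\lambda_j D_1(h_j).$$

(iii) The compactness of $K_2^\lambda$, for all $\lambda$, follows in the same way from the compactness of the inclusion $C^2([a,b]) \hookrightarrow C^0([a,b])$ and from

$$K_2^\lambda(h) = \sum_l \vartheta^\lambda_l h_l.$$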
The linear operator $dP_\gamma : C^2[a,b]^n \to C^0[a,b]^n \times \mathbb{R}^n \times \mathbb{R}^n$ is a Fredholm operator and $\mathrm{ind}(dP_\gamma) = 0$. It follows from a theorem of Riemannian Geometry that if $p$ and $q$ are not conjugate points, then the kernel of the differential $dP_\gamma$ is trivial, since the kernel of $dP_\gamma$ consists of Jacobi fields along $\gamma$ vanishing at both endpoints (see [7]). Therefore $dP_\gamma$ is injective. Since $\mathrm{ind}(dP_\gamma) = 0$, it is also surjective. Hence $dP_\gamma : C^2[a,b]^n \to C^0[a,b]^n \times \mathbb{R}^n \times \mathbb{R}^n$ is an isomorphism at $\gamma$.

By the InFT, there are open neighborhoods $V_\gamma \subset \widehat{U}$ of $\gamma$ and $W \subset C^0[a,b]^n \times \mathbb{R}^n \times \mathbb{R}^n$ of $P(\gamma)$ such that $P : V_\gamma \to W$ is a diffeomorphism. Take $\gamma$ to be a constant geodesic, let's say $\gamma(t) = x_0 \in U$, so $P(\gamma) = (0, x_0, x_0)$, and shrink $W$ to a product $W = W_0 \times V_0 \times V_0$, with $W_0 \subset C^0[a,b]^n$ a neighborhood of $0$ and $V_0$ a neighborhood of $x_0$ in $U$. Therefore every pair of points $(p, q) \in V_0 \times V_0$ is connected by a geodesic, namely $P^{-1}(0, p, q)$.

Theorem 17 Let $U \subset \mathbb{R}^n$ be an open subset endowed with a Riemannian metric $g$. If the points $p, q \in U$ are non-conjugate, then we have a unique geodesic in $U$ joining them.
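The explicit solution $h$ of $h'' = g$, $h(a) = p$, $h(b) = q$ used to show that the second-derivative part of $dP_\gamma$ is surjective can be checked numerically. The sketch below is a minimal illustration for $n = 1$; the data $g(t) = 6t$ on $[0,1]$ with $p = 1$, $q = 3$ and the helper name `build_h` are assumptions, and the iterated integral is approximated with the trapezoidal rule.

```python
def build_h(g, p, q, a, b, n=2000):
    """Grid values of h(t) = p + (q - p - J(b)) (t - a)/(b - a) + J(t),
    where J(t) is the iterated integral of g from a to t (trapezoidal rule)."""
    dt = (b - a) / n
    ts = [a + i * dt for i in range(n + 1)]
    inner = [0.0]                      # inner[i] approximates the integral of g on [a, t_i]
    for i in range(1, n + 1):
        inner.append(inner[-1] + 0.5 * dt * (g(ts[i - 1]) + g(ts[i])))
    outer = [0.0]                      # outer[i] approximates J(t_i)
    for i in range(1, n + 1):
        outer.append(outer[-1] + 0.5 * dt * (inner[i - 1] + inner[i]))
    hs = [p + (q - p - outer[-1]) * (t - a) / (b - a) + J for t, J in zip(ts, outer)]
    return ts, hs, dt

ts, hs, dt = build_h(lambda t: 6.0 * t, p=1.0, q=3.0, a=0.0, b=1.0)

mid = len(ts) // 2
second_diff = (hs[mid + 1] - 2.0 * hs[mid] + hs[mid - 1]) / dt ** 2   # approximates h''(1/2)
print(hs[0], hs[-1], second_diff)
```

Here the boundary values are reproduced and the second difference at $t = 1/2$ is close to $g(1/2) = 3$, as expected from $h(t) = 1 + t + t^3$.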
Chapter 4
Vector Fields
Vector fields arise naturally in physics, where several variables are of a vectorial nature. We will look at examples in which a physical system is modeled by an ordinary differential equation (ODE). In Classical Mechanics, Newton's 2nd law imposes the differential equation $F = m\frac{dv}{dt}$. An understanding of the analytical, algebraic and geometric properties of vector fields is at the core of understanding the evolution of a system governed by an ODE. We will take an easy approach to cover what is needed in the text. Whenever appropriate, we will also consider vector fields $X : E \to E$ defined on a Banach space $E$. We will denote vector fields by the capital letters $F, X, Y, V$.
1 Vector Fields in $\mathbb{R}^n$

A vector field in $U \subset \mathbb{R}^n$ is a map $F : U \to T\mathbb{R}^n$ that associates to each point $p \in U$ a vector $F(p) \in T_p U = \mathbb{R}^n$. Let $\Gamma(U)$ be the space of the vector fields defined in $U$. In the following examples, it is understood that an orthonormal frame $\beta_c = \{e_1, \dots, e_n\}$ is fixed on $U$. A $C^k$-vector field $F$ defined on an open subset $U \subset \mathbb{R}^n$ is a map $F \in C^k(U; \mathbb{R}^n)$, which we write in local coordinates $x = (x_1, \dots, x_n)$ as

$$F(x) = \sum_{i=1}^n f_i(x)\,e_i, \quad f_i \in C^k(U).$$

On many occasions, we may write just $F(x) = (f_1(x), \dots, f_n(x))$. Unless otherwise stated, we assume that the fields are $C^\infty$.
© Springer Nature Switzerland AG 2021 C. M. Doria, Differentiability in Banach Spaces, Differential Forms and Applications, https://doi.org/10.1007/978-3-030-77834-7_4
Example 1 Examples of vector fields.

1. In $\mathbb{R}^2$:
(a) $F(x, y) = (1, 0)$ (constant vector field).
(b) $F(x, y) = (x, y)$. Using polar coordinates, we have $F(r, \theta) = r\,\hat{r}$ with $\hat{r} = \frac{x}{\sqrt{x^2 + y^2}}\,e_1 + \frac{y}{\sqrt{x^2 + y^2}}\,e_2$.
(c) $F(x, y) = (-y, x)$. Using polar coordinates, we have $F(r, \theta) = r\,\hat\theta$, with $\hat\theta = \frac{-y}{\sqrt{x^2 + y^2}}\,e_1 + \frac{x}{\sqrt{x^2 + y^2}}\,e_2$.
(d) $F(x, y) = \left( x^2 + \cos(xy),\ \frac{1}{y^2} + e^{\sec(x^2.y)} \right)$.

2. In $\mathbb{R}^3$:
(a) $F(x, y, z) = \frac{(x, y, z)}{x^2 + y^2 + z^2}$,
(b) $F(x, y, z) = (xy, yz, zx)$.

3. In $\mathbb{R}^n$. Let $A \in M_n(\mathbb{R})$ be a matrix and define the linear vector field $F : \mathbb{R}^n \to \mathbb{R}^n$, $F(x) = A.x$;

$$F(x) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} . \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$

4. The Euler-Lagrange equations (15) in Chap. 3 associated with the problem

$$\text{Minimize: } F(\gamma) = \int_a^b L(t, \gamma(t), \dot\gamma(t))\,dt, \qquad \text{constrained to: } \gamma \in E \tag{1}$$

define a vector field in $E$.

Integral curves of a vector field $F : U \to \mathbb{R}^n$ are those curves $\gamma : (a, b) \to U$ satisfying the equation

$$\gamma'(t) = F(\gamma(t)), \tag{2}$$

i.e., they are curves whose tangent vector at the point $\gamma(t)$ is $F(\gamma(t))$. Writing $\gamma = (\gamma_1, \dots, \gamma_n)$ and $F(x) = \sum_{i=1}^n f_i(x)\,e_i$, Eq. (2) becomes the system of ODEs

$$\begin{cases} \gamma_1'(t) = f_1(\gamma(t)), \\ \quad\vdots \\ \gamma_n'(t) = f_n(\gamma(t)). \end{cases} \tag{3}$$
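Integral curves can be approximated numerically by discretizing the system of ODEs. The following minimal sketch uses the classical fourth-order Runge-Kutta scheme, which is an assumed choice of integrator and not part of the text; the helper names `rk4_step` and `integral_curve` are hypothetical. It is applied to the field $F(x, y) = (-y, x)$ of Example 1, whose integral curve through $(1, 0)$ is $(\cos t, \sin t)$.

```python
import math

def rk4_step(F, x, dt):
    """One classical Runge-Kutta step for the ODE gamma'(t) = F(gamma(t))."""
    k1 = F(x)
    k2 = F([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = F([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = F([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt * (a + 2 * b + 2 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def integral_curve(F, x0, t_end, n=1000):
    """Approximate gamma(t_end) for the IVP gamma' = F(gamma), gamma(0) = x0."""
    x, dt = list(x0), t_end / n
    for _ in range(n):
        x = rk4_step(F, x, dt)
    return x

def rotation_field(pt):
    """The planar field F(x, y) = (-y, x)."""
    return [-pt[1], pt[0]]

end = integral_curve(rotation_field, [1.0, 0.0], math.pi / 2)
print(end)   # close to (cos(pi/2), sin(pi/2)) = (0, 1)
```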
Once a point $x_0 \in U$ is fixed, the IVP (initial value problem) associated with Eq. (2) is

$$\gamma'(t) = F(\gamma(t)), \qquad \gamma(0) = x_0. \tag{4}$$

Example 2 Examples.

1. The ODE $x'' + x = 0$ governs the simple harmonic oscillator without friction and without external force. If the initial conditions are $x(0) = a$ and $x'(0) = b$, the IVP solution is $x(t) = a\cos(t) + b\sin(t) = \sqrt{a^2 + b^2}\,\cos(t - \theta_0)$, with $\cos(\theta_0) = \frac{a}{\sqrt{a^2 + b^2}}$ and $\sin(\theta_0) = \frac{b}{\sqrt{a^2 + b^2}}$. Let's introduce a vector field associated with the ODE by considering the variable $y = -x'$. The ODE becomes the linear system

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} . \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix},$$

associated with the linear vector field $F(x, y) = (-y, x)$.

2. The nonlinear pendulum is governed by the ODE

$$\theta'' + \frac{k}{m}\theta' + \frac{1}{l}\sin(\theta) = 0. \tag{5}$$

Introducing the variable $\omega = \theta'$, the ODE becomes the first-order (nonlinear) system

$$\theta' = \omega, \qquad \omega' = -\frac{k}{m}\omega - \frac{1}{l}\sin(\theta). \tag{6}$$

In this way the pendulum is governed by the ODE associated with the planar vector field $F(\theta, \omega) = \left( \omega, -\frac{k}{m}\omega - \frac{1}{l}\sin(\theta) \right)$.

Definition 1 Let $U \subset \mathbb{R}^n$ be an open subset and let $F : U \to \mathbb{R}^n$ be a vector field. A point $p \in U$ is a singularity or a critical point of $F$ if $F(p) = 0$.

A singularity is a stationary point of the vector field $F$, since the only solution to the IVP $\gamma'(t) = F(\gamma(t))$, $\gamma(0) = p$ is $\gamma(t) = p$.

Example 3 Examples of singularities of linear vector fields on $\mathbb{R}^2$.

1. $F(x, y) = (x, y)$. The only singularity is at $p = (0, 0)$. For the point $(x_0, y_0)$, the solution to the IVP

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} . \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad \begin{pmatrix} x(0) \\ y(0) \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix},$$

is the integral curve $\gamma(t) = e^t (x_0, y_0)$. Taking the limit $t \to \infty$, we have $|\gamma(t)| \to \infty$ whenever $(x_0, y_0) \neq p$. This is an example of an unstable singularity, also called a source, because any point $x \neq p$ in a neighborhood of $p$ moves away from $p$ as the parameter $t$ evolves to $\infty$, as shown in Fig. 1.

Fig. 1 Unstable singularity

2. $F(x, y) = (-x, -y)$. In this case, $p = (0, 0)$ is the only singularity. For each point $(x_0, y_0)$, the solution to the IVP

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} . \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -x \\ -y \end{pmatrix}, \qquad \begin{pmatrix} x(0) \\ y(0) \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix},$$

is the integral curve $\gamma(t) = e^{-t}(x_0, y_0)$. Taking the limit $t \to \infty$, we have $\gamma(t) \to p$. This is an example of a stable singularity, called a sink, because any point $x \neq p$ in a neighborhood of $p$ moves towards $p$ as the parameter $t$ evolves to $\infty$, as shown in Fig. 2.

Fig. 2 Stable singularity

3. $F(x, y) = (x, -y)$. The only singularity is at $p = (0, 0)$. For each point $(x_0, y_0)$, the solution to the IVP

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ -y \end{pmatrix}, \qquad \begin{pmatrix} x(0) \\ y(0) \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix},$$

is the integral curve $\gamma(t) = (e^t x_0, e^{-t} y_0)$. As we can see in Fig. 3, there are directions in which the trajectory approaches $p$ and also directions moving away from $p$. In this case, $p$ is a saddle point singularity.

4. $F(x, y) = (-y, x)$. The point $p = (0, 0)$ is the only singularity. For each point $(x_0, y_0)$, it follows from item (1) in Example 2 that the solution to the IVP

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} . \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}, \qquad \begin{pmatrix} x(0) \\ y(0) \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix},$$

is the integral curve

$$\gamma(t) = (A\cos(t + \theta_0), A\sin(t + \theta_0)), \qquad A = \sqrt{x_0^2 + y_0^2}.$$

Therefore $\gamma(t)$ is a closed orbit whose trace is the circle $x^2 + y^2 = A^2$. Note in this example that $p = (0, 0)$ is neither a sink nor a source.

5. $F(x, y) = \left( y - \frac{x}{2}, -x - \frac{y}{2} \right)$. The point $p = (0, 0)$ is the only singularity and it is stable, as shown in Fig. 5.

We highlight below some elementary questions in studying the vector field $F : U \to \mathbb{R}^n$, $U \subset \mathbb{R}^n$.
Fig. 3 Saddle singularity

Fig. 4 Periodic orbit

Fig. 5 $F(x, y) = \left( y - \frac{x}{2}, -x - \frac{y}{2} \right)$
Question 1: Let $p \in U \subset E$ and let $F : U \to E$ be a vector field. Does the IVP

$$\gamma'(t) = F(\gamma(t)), \qquad \gamma(0) = p \tag{7}$$

admit a solution? How many?

Question 2: Find all the singularities of a vector field $F$ and classify them in terms of their stability (easy for linear fields, hard for nonlinear).

Question 3: Assuming that the IVP (7) has a unique solution $\gamma_p(t)$ (deterministic problem), is it possible to describe the evolution and the limit of $p$ along its trajectory (integral curve)? In $\mathbb{R}^2$, the Poincaré-Bendixson theorem asserts that if the trajectory remains in a compact subset of $U$, then either $\lim_{t \to \infty} \gamma_p(t)$ is a singularity of $F$ or $\gamma_p$ is asymptotic to a closed orbit (see [31]).

Question 4: Are there periodic orbits?

Example 4 Nonlinear Pendulum. Let's look at the pendulum with a rod of length $l$, as shown in Fig. 6. We will assume that the gravitational constant is $g = 1$, that the mass of the rod is negligible, and that there is a frictional force $F_f$ proportional to the speed at the end of the pendulum. Let $\theta$ be the angle formed by the rod of the pendulum and the vertical line, so the angular velocity is $\theta' = \frac{d\theta}{dt}$ and, consequently, the velocity is $v = l.\theta'$ and $F_f = kl.\theta'$ ($k > 0$). It follows from Newton's Law that the resulting force is $F = -F_f - m.\sin(\theta)$, that is, the ODE governing the system is

$$\theta'' + \frac{k}{m}\theta' + \frac{1}{l}\sin(\theta) = 0. \tag{8}$$

Fig. 6 Pendulum

As seen before, the pendulum is governed by the ODE associated with the planar vector field $F(\theta, \omega) = \left( \omega, -\frac{k}{m}\omega - \frac{1}{l}\sin(\theta) \right)$. Let's study the possibilities:

(i) The free pendulum with small oscillations ($F_f = 0$ and $\sin(\theta) \sim \theta$). The vector field $F(\theta, \omega) = (\omega, -\frac{1}{l}\theta)$ has a unique singularity at $p = (0, 0)$. It behaves like the harmonic oscillator, and so the integral curves are closed curves around the origin, as shown in Fig. 4.

(ii) The pendulum without friction ($F_f = 0$). In this case, the field $F(\theta, \omega) = (\omega, -\frac{1}{l}\sin(\theta))$ has singularities at $p_k = (k\pi, 0)$, $k \in \mathbb{Z}$. The phase space is shown in Fig. 7. We stress the existence of closed orbits and unstable singularities.

(iii) The pendulum with friction. The vector field in this case is $F(\theta, \omega) = \left( \omega, -\frac{k}{m}\omega - \frac{1}{l}\sin(\theta) \right)$. Again, the only singularities are at $p_k = (k\pi, 0)$, $k \in \mathbb{Z}$. There is no closed orbit, as shown in Fig. 8.

Fig. 7 Frictionless pendulum

Fig. 8 Pendulum with friction

The phase portrait of a vector field $F : U \to \mathbb{R}^n$ ($U \subset \mathbb{R}^n$) is a copy of $U$ in which a point represents a possible physical state of a system governed by the ODE $\gamma'(t) = F(\gamma(t))$. The integral curves passing through the points represent the trajectory of each point while the parameter evolves, therefore showing the evolution of the states of the system. Figures 4, 7 and 8 show the phase portrait of each pendulum system under consideration. In describing the phase portrait of a vector field, we can make a qualitative analysis of the evolution of each state, i.e., the evolution of each point along the integral curve $\gamma_p(t)$ when $t \to \infty$. It is also important, though a difficult task, to find the sets that are invariant under the integral curves, i.e., those sets $V \subset U$ such that, for any integral curve $\gamma$, if $\gamma \cap V \neq \emptyset$, then $\gamma(t) \in V$ for all $t \in \mathbb{R}$.
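The qualitative difference between the frictionless pendulum and the pendulum with friction can be observed numerically. The sketch below is an illustration under assumptions: it uses a Runge-Kutta integrator, the values $l = 1$ and $k/m = 0.5$, the initial state $(\theta, \omega) = (1, 0)$, and the normalized quantity $E(\theta, \omega) = \frac{1}{2}\omega^2 - \frac{1}{l}\cos(\theta)$, which satisfies $\frac{dE}{dt} = -\frac{k}{m}\omega^2$ along the integral curves.

```python
import math

def pendulum_field(k_over_m, l):
    """F(theta, omega) = (omega, -(k/m) omega - (1/l) sin(theta))."""
    def F(state):
        theta, omega = state
        return (omega, -k_over_m * omega - math.sin(theta) / l)
    return F

def rk4(F, state, dt, steps):
    """Integrate gamma' = F(gamma) with the classical Runge-Kutta scheme."""
    for _ in range(steps):
        k1 = F(state)
        k2 = F(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = F(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = F(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt * (a + 2 * b + 2 * c + d) / 6.0
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

def energy(state, l):
    """Normalized energy E = omega^2 / 2 - cos(theta) / l (assumed normalization)."""
    theta, omega = state
    return 0.5 * omega ** 2 - math.cos(theta) / l

start, l = (1.0, 0.0), 1.0
E0 = energy(start, l)
E_free = energy(rk4(pendulum_field(0.0, l), start, 1e-3, 10000), l)
E_damped = energy(rk4(pendulum_field(0.5, l), start, 1e-3, 10000), l)
print(E0, E_free, E_damped)
```

Without friction the quantity $E$ stays constant along the orbit, so the orbit lies on a level curve of $E$; with friction $E$ decreases, which is consistent with the absence of closed orbits.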
Exercises

1. Show that the system

$$x' = -y + x(x^2 + y^2 - 4), \qquad y' = x + y(x^2 + y^2 - 4) \tag{9}$$

has a periodic orbit on which all other integral curves accumulate. Conclude that the phase portrait of the system is the one in Fig. 9 (hint: use polar coordinates).

Fig. 9 Attracting orbit

2. Show that the phase portrait of the system below is the one in Fig. 10:

$$x' = 2x - y + 3(x^2 - y^2) + 2xy, \qquad y' = x - 3y - 3(x^2 - y^2) + 3xy. \tag{10}$$
Fig. 10 System (10)
2 Conservative Vector Fields

Let $U \subset \mathbb{R}^n$ be a star-shaped open subset. A vector field $F : U \to \mathbb{R}^n$ is conservative if, for any closed piecewise $C^1$ curve $\gamma : [0,1] \to U$, $\gamma(0) = \gamma(1)$, the path integral over $\gamma$ is null, i.e.,

$$\oint_\gamma F = \int_0^1 \langle F(\gamma(t)), \gamma'(t) \rangle\,dt = 0.$$

As a result, for any pair of points $p, q \in U$ and a piecewise $C^1$ curve $\alpha : [0,1] \to U$ such that $\alpha(0) = p$ and $\alpha(1) = q$, if $F$ is conservative, then the integral $\int_\alpha F$ does not depend on $\alpha$. In the next section, assuming $U$ is star-shaped and $F$ is conservative, we will prove that we have a function $V : U \to \mathbb{R}$ such that $F = -\nabla V$. The function $V$ is the potential function of $F$. Conservative vector fields are also known as gradient vector fields; they are of great importance since they emerge in many applications. In Classical Mechanics, a body of mass $m$ displaced under the effects of a conservative vector field $F = -\nabla V$ has its total energy conserved. Given Newton's 2nd Law, a force field acting on a body of mass $m$ and velocity $v$ satisfies the identity $F = m\frac{dv}{dt}$. In this case, let $\alpha$ be the curve defined by the trajectory along the displacement from $p$ to $q$ ($v = \alpha'$). The work done by $F$ along the displacement is

$$\int_\alpha F = \int_0^1 \left\langle m\frac{dv}{dt}, v \right\rangle dt = m\int_0^1 \frac{d}{dt}\Big( \frac{1}{2}|\alpha'(t)|^2 \Big)\,dt = \frac{1}{2}m|v(q)|^2 - \frac{1}{2}m|v(p)|^2.$$

For a body of mass $m$, the kinetic energy is $K(v) = \frac{1}{2}m|v|^2$. Since $F$ is conservative, we have $F = -\nabla V$, and so

$$\int_\alpha F = -\int_0^1 \langle \nabla V(\alpha(t)), \alpha'(t) \rangle\,dt = -\int_0^1 \frac{dV}{dt}\,dt = -V(q) + V(p).$$

Consequently, $\frac{m}{2}|v(q)|^2 - \frac{m}{2}|v(p)|^2 = -V(q) + V(p)$. Therefore

$$\frac{m}{2}|v(q)|^2 + V(q) = \frac{m}{2}|v(p)|^2 + V(p).$$

The total energy of the system at the point $x \in U$ is given by the function

$$E(x) = \frac{m}{2}|v(x)|^2 + V(x).$$

So we have proved the conservation of the total energy.

Theorem 1 Let $F = -\nabla V$ be a vector field and suppose a body of mass $m$ is under the action of $F$. If $\gamma : (-a, a) \to U$ is a trajectory of the body, i.e., a solution of Newton's equation $m\gamma''(t) = F(\gamma(t))$, then the energy function $E_\gamma = E(\gamma(t)) : (-a, a) \to \mathbb{R}$ is constant along $\gamma$.

A vector field $F : U \to \mathbb{R}^n$ is a central vector field if we have a function $\lambda : U \to \mathbb{R}$ such that $F(x) = \lambda(x).x$.
Exercises 1. Show that a central vector field is conservative. 2. Let k > 0 be a constant. Consider F(x) = −kx and show it is conservative. Apply Newton’s Second Law to find the integral curves of F. 3. Find necessary conditions for a vector field F to be conservative. 4. Find sufficient conditions for a vector field F to be conservative. (hint: read Chap. 5).
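The independence of the path integral from the curve, for a field of the form $F = -\nabla V$, can be checked numerically before attempting the exercises. In the minimal sketch below the potential $V(x, y) = x^2 y + y$, the two curves joining $p = (0, 0)$ to $q = (1, 2)$ and the helper names are all assumptions chosen for illustration; both integrals should equal $V(p) - V(q) = -4$.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson rule for the integral of g over [a, b] (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def F(x, y):
    """F = -grad V for the assumed potential V(x, y) = x^2 y + y."""
    return (-2.0 * x * y, -(x * x) - 1.0)

def path_integral(alpha, dalpha):
    """Integral over [0, 1] of <F(alpha(t)), alpha'(t)>."""
    def integrand(t):
        fx, fy = F(*alpha(t))
        dx, dy = dalpha(t)
        return fx * dx + fy * dy
    return simpson(integrand, 0.0, 1.0)

# Straight segment from (0, 0) to (1, 2).
I_line = path_integral(lambda t: (t, 2.0 * t), lambda t: (1.0, 2.0))

# A curved path with the same endpoints.
I_curve = path_integral(
    lambda t: (t * t, 2.0 * math.sin(math.pi * t / 2.0)),
    lambda t: (2.0 * t, math.pi * math.cos(math.pi * t / 2.0)),
)
print(I_line, I_curve)
```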
3 Existence and Uniqueness Theorem for ODE

The cornerstone of this chapter is the Existence and Uniqueness Theorem, which answers Question 1 above. Let $E$ be a Banach space and let $U \subset E$ be an open subset. We will assume that $X : U \to E$ is a Lipschitz function with constant $K > 0$, that is, for every pair $x, y \in U$, we have

$$|X(y) - X(x)| \le K\,.\,|y - x|.$$

Let $\mathcal{I} : C^0([-a, a]; E) \to C^1([-a, a]; E)$ be the operator given by

$$\mathcal{I}(\alpha)(t) = x_0 + \int_0^t X(\alpha(s))\,ds. \tag{11}$$

If $\gamma$ is an integral curve of $X$ and $\gamma(0) = x_0$ (IVP (4)), then

$$\gamma(t) = x_0 + \int_0^t X(\gamma(s))\,ds. \tag{12}$$

Thus $\mathcal{I}(\gamma) = \gamma$, that is, $\gamma$ is a fixed point of $\mathcal{I}$; conversely, a fixed point of $\mathcal{I}$ is a solution of the IVP. To prove that the IVP $\gamma'(t) = X(\gamma(t))$, $\gamma(0) = x_0$, has a solution, we need the hypothesis that $X$ is Lipschitz. The strategy is to show the existence of a fixed point of $\mathcal{I}$. Let $I_a = (-a, a)$ be the open interval and $\mathcal{I}^n(\alpha) = \mathcal{I}(\mathcal{I}^{n-1}(\alpha))$.

Proposition 1 Let $K > 0$ be the Lipschitz constant of $X : U \to E$. Then

$$|\mathcal{I}^n(\alpha)(t) - \mathcal{I}^n(\beta)(t)| \le \frac{K^n |t|^n}{n!}\,.\,\|\alpha - \beta\|_0, \tag{13}$$

for all $\alpha, \beta \in C^0(I_a, E)$, $n \ge 0$ and $t \in I_a$.

Proof The statement is immediate for $n = 0$. Let $\alpha, \beta \in C^0(I_a, E)$. From the definition of $\mathcal{I}$, we have

$$|\mathcal{I}(\alpha)(t) - \mathcal{I}(\beta)(t)| \le \left| \int_0^t |X(\alpha(s)) - X(\beta(s))|\,ds \right| \le K \left| \int_0^t |\alpha(s) - \beta(s)|\,ds \right| \le K|t|\,.\,\|\alpha - \beta\|_0.$$

The inequality is true for $n = 1$. Let's check the case $n = 2$:

$$|\mathcal{I}^2(\alpha)(t) - \mathcal{I}^2(\beta)(t)| \le K \left| \int_0^t |\mathcal{I}(\alpha)(s) - \mathcal{I}(\beta)(s)|\,ds \right| \le K^2 \|\alpha - \beta\|_0 \left| \int_0^t |s|\,ds \right| = \frac{K^2 |t|^2}{2}\,.\,\|\alpha - \beta\|_0.$$

By induction, we assume the inequality (13) is true for $n$. By an analogous procedure, we obtain the inequality (13) for $n + 1$.

Lemma 1 The operator $\mathcal{I} : C^0(I_a; E) \to C^1(I_a; E)$ has a unique fixed point.

Proof Since the series $e^{Ka} = \sum_n \frac{K^n a^n}{n!}$ is convergent, we have $\lim_{n \to \infty} \frac{K^n a^n}{n!} = 0$. Fixing $c \in (0, 1)$ and taking $n_0$ so that $\frac{K^n a^n}{n!} < c < 1$ for every $n > n_0$, it follows from (13) that

$$\|\mathcal{I}^n(\alpha) - \mathcal{I}^n(\beta)\|_0 \le c\,.\,\|\alpha - \beta\|_0.$$

Therefore for all $n > n_0$, $\mathcal{I}^n$ is a contraction. By the Contraction Lemma, $\mathcal{I}^n$ has a unique fixed point $\gamma$. Since $\mathcal{I}^n(\mathcal{I}(\gamma)) = \mathcal{I}(\mathcal{I}^n(\gamma)) = \mathcal{I}(\gamma)$, the curve $\mathcal{I}(\gamma)$ is also a fixed point of $\mathcal{I}^n$, and so $\mathcal{I}(\gamma) = \gamma$ by uniqueness.
we have that if | t |≤ Mr , then I(α)(t) ∈ B r (x0 ). Taking n 0 as prescribed in Lemma 1 and = min{ c.n! , r }, the operator I n : Pr → Pr is well-defined and is a contraction Kn M for n > n 0 . Therefore from Lemma 1, we now have a unique γ ∈ Pr such that I(γ ) = γ . As previously noted, γ is the only solution to the IVP. Remark 1 The vector field F : R → R, given by F(x) = 3x 2/3 , is continuous but it is not Lipschitz. The IVP with x(0) = 0 admits two solutions; they are x(t) = t 3 and x(t) = 0. Theorem 2 suggests an interaction process to extend the solution γ : (− , ) → E for an interval greater than (− , ), as we show next: let γ0 : (− 0 , 0 ) → E be a solution to the IVP (4) and x1 = γ (t1 ), t1 ∈ (− 0 , 0 ). When we change the initial condition of the IVP to γ (0) = x1 , we obtain a new IVP whose solution we call γ1 : (− 1 , 1 ) → E. Due to the uniqueness, γ0 and γ1 coincide in a neighborhood of x1 . By combining the solutions, we obtained an extension of γ0 . This process can be applied so that either γ is defined for every t ∈ R; in this case we say that the integral curve γ : R → E is complete, or that there is a maximal interval J(x0 ) ⊂ R in which γ is well-defined.
Example 5 Incomplete vector fields.

(1) Let $X(x) = x^2$ be a vector field defined on $\mathbb{R}$. The IVP $\gamma'(t) = X(\gamma(t)) = \gamma^2(t)$, $\gamma(0) = x_0$, has the solution $\gamma(t) = \frac{x_0}{1 - t x_0}$. Therefore the maximal interval is $J(x_0) = (-\infty, \frac{1}{x_0})$ if $x_0 > 0$, $J(x_0) = (\frac{1}{x_0}, \infty)$ if $x_0 < 0$, and $J(0) = \mathbb{R}$.

(2) Let $X(x, y) = (0, 1)$. The integral curve passing through $p = (x_0, y_0)$ is $\gamma(t) = (x_0, y_0 + t)$. If we consider $X : \mathbb{R}^2 \to \mathbb{R}^2$ and $p = (0, -1)$, then $J(p) = \mathbb{R}$. If we consider $X : \mathbb{R}^2 \setminus \{0\} \to \mathbb{R}^2$ and $p = (0, -1)$, then $J(p) = (-\infty, 1)$.

An important fact arising from this uniqueness is that if two integral curves $\gamma_1 : I_1 \to U$ and $\gamma_2 : I_2 \to U$ pass through $p \in U$ at $t = 0$, then they coincide in the intersection of their domains. To prove this, we consider the intersection $I = I_1 \cap I_2$ and the set $C = \{t \in I \mid \gamma_1(t) = \gamma_2(t)\}$. Since $I$ is connected and $C$ is not empty, it is enough to check that $C$ is open and closed in $I$. It is a closed set because the integral curves are continuous. It is open because the local uniqueness in Theorem 2 shows that the two solutions agree in a neighborhood of each $t \in C$.

A vector field $X : U \to E$ is complete if, for all $x \in U$, we have $J(x) = \mathbb{R}$. If $X$ has compact support, then it is complete. The following theorem tells us about the continuous dependence of the solution with respect to the initial condition.

Theorem 3 Let $X : U \to E$ be a vector field satisfying the hypothesis of Theorem 2. Let $\alpha(t)$, $\beta(t)$ be solutions of the ODE $\gamma'(t) = X(\gamma(t))$ on the closed interval $[t_0, t_1]$. Then we have

$$|\alpha(t) - \beta(t)| \le |\alpha(t_0) - \beta(t_0)|\,.\,\exp(K(t - t_0)),$$

for all $t \in [t_0, t_1]$.

Proof See [31].
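The finite-time blow-up of the field $X(x) = x^2$ can be observed directly from the closed-form solution. The sketch below is an illustration only; the initial value $x_0 = 2$, for which the blow-up time is $1/x_0 = 0.5$, and the sample times are assumptions.

```python
def gamma(t, x0):
    """Closed-form solution of gamma' = gamma^2 with gamma(0) = x0."""
    return x0 / (1.0 - t * x0)

x0 = 2.0
t_star = 1.0 / x0          # blow-up time for x0 > 0

# Residual of the ODE at t = 0.3, with the derivative taken by a central difference.
h, t = 1e-6, 0.3
residual = (gamma(t + h, x0) - gamma(t - h, x0)) / (2 * h) - gamma(t, x0) ** 2

# The solution becomes arbitrarily large as t approaches the blow-up time from below.
near_blow_up = gamma(t_star - 1e-9, x0)
print(residual, near_blow_up)
```

The residual is numerically zero while the solution is unbounded near $t = 1/x_0$, so this integral curve cannot be extended beyond that time.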
4 Flow of a Vector Field

In this section, we consider only complete $C^\infty$-vector fields. Let $U \subset \mathbb{R}^n$ be an open subset and let $X : U \to \mathbb{R}^n$ be a vector field. In the previous section, we showed the existence and uniqueness of an integral curve of $X$ passing through each point $x \in U$. In this way, we associate the integral curve $\gamma(t, x) = \gamma_x(t)$ with every $x \in U$. The flow of $X$ is the map $\Phi^X : \mathbb{R} \times U \to E$ given by $\Phi^X(t, x) = \gamma_x(t)$; we also write $\Phi^X_t = \Phi^X(t, \cdot)$. $\Phi^X$ is a $C^\infty$ map (see [31]).
Example 6 (Linear flows) For a fixed matrix $A \in M_n(\mathbb{R})$, consider the linear vector field $F : \mathbb{R}^n \to \mathbb{R}^n$ given by $F(x) = A.x$. These are the simplest non-constant fields to study, since all of their integral curves can be described analytically. We fix a point $x_0 \in \mathbb{R}^n$ and consider the trajectory $\gamma(t) = \exp(tA).x_0$, with $\exp(A)$ the exponential of the operator $A$. We will show that $\gamma(t) = \exp(tA).x_0$ is the solution of the IVP
$$\gamma'(t) = A.\gamma(t), \quad \gamma(0) = x_0. \qquad (14)$$
It is straightforward to verify that $\gamma(0) = x_0$ and $\gamma \in C^\infty$. The trajectory $\gamma(t)$ satisfies the ODE since
$$\frac{d\gamma}{dt}(t) = \sum_{n=1}^{\infty} \frac{A^n}{n!}\, n\, t^{n-1}.x_0 = \sum_{n=1}^{\infty} \frac{A^n}{(n-1)!}\, t^{n-1}.x_0 = A.\sum_{n=1}^{\infty} \frac{A^{n-1}}{(n-1)!}\, t^{n-1}.x_0 = A.\exp(tA).x_0 = A.\gamma(t),$$
and $\gamma(0) = x_0$. $F$ is complete, since $\gamma(t)$ is defined for all $t \in \mathbb{R}$. The flow of $F(x) = A.x$ is the map
$$\Phi^F : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n, \quad \Phi^F(t, x) = \gamma_x(t) = \exp(tA).x.$$
The flow $\gamma_x(t) = \exp(tA).x$ satisfies the following conditions: (i) $\gamma_x(0) = x$ and (ii) $\gamma_{\gamma_x(t)}(s) = \gamma_x(t + s)$. In terms of the flow, these identities mean that $\Phi^F(0, x) = x$ and $\Phi^F(s, \Phi^F(t, x)) = \Phi^F(t + s, x)$.

The flow properties obtained in the example above generalize to linear vector fields $F : E \to E$, $F(x) = A.x$, defined in a Banach space $E$ by an operator $A \in L(E)$. Indeed, the properties are true in general. Any complete $C^\infty$ vector field $X : U \to E$ satisfying the hypothesis of Theorem 2 defines a $C^\infty$ flow
$$\Phi^X : \mathbb{R} \times U \to E, \quad (t, x) \mapsto \gamma_x(t). \qquad (15)$$
Of course, $\Phi^X(0, x) = x$. To prove the identity $\Phi^X(s, \Phi^X(t, x)) = \Phi^X(t + s, x)$, we consider the curves $\alpha(s) = \gamma_u(t + s)$ and $\beta(s) = \gamma_{\gamma_u(t)}(s)$, and note that
$$\alpha(0) = \gamma_u(t), \quad \alpha'(s) = \gamma_u'(t + s) = X(\gamma_u(t + s)) = X(\alpha(s)),$$
$$\beta(0) = \gamma_u(t), \quad \beta'(s) = \gamma_{\gamma_u(t)}'(s) = X(\gamma_{\gamma_u(t)}(s)) = X(\beta(s)).$$
Since $\alpha$ and $\beta$ satisfy the same IVP, they must be equal. Therefore the flow satisfies the conditions
$$\Phi^X(0, x) = x, \quad \Phi^X(t + s, u) = \Phi^X(s, \Phi^X(t, u)). \qquad (16)$$
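The group property in Eq. (16) can be verified numerically for a linear field. The following is a minimal sketch, assuming a particular $2 \times 2$ matrix and a truncated power series for the exponential; the helper names `mat_mul`, `mat_exp` and `flow` are illustrative and not taken from the text.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """exp(A) approximated by the truncated series sum_{k < terms} A^k / k!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term A^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def flow(A, t, x):
    """Flow of the linear field F(x) = A.x, i.e. exp(tA).x."""
    E = mat_exp([[t * a for a in row] for row in A])
    return [sum(E[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

A = [[0.0, 1.0], [-2.0, -0.3]]   # assumed example matrix
x0 = [1.0, 0.5]
s, t = 0.7, 1.1

lhs = flow(A, s, flow(A, t, x0))  # Phi(s, Phi(t, x0))
rhs = flow(A, t + s, x0)          # Phi(t + s, x0)
print(lhs, rhs)
```

The two printed vectors should agree up to rounding, and `flow(A, 0.0, x0)` returns `x0`, matching the first condition in Eq. (16).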
For fixed $t \in \mathbb{R}$, the flow $\Phi^X$ induces the diffeomorphism $\phi_t : U \to U$, $\phi_t(x) = \Phi^X(t, x)$. By varying the parameter $t$, we get the 1-parameter subgroup $\{\phi_t : U \to U \mid t \in \mathbb{R}\}$ of the diffeomorphism group $\mathrm{Diff}(U)$.
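Definition 2 Let $X \in \Gamma(U)$, $Y \in \Gamma(V)$ and let $\phi : U \to V$ be a differentiable map.
(i) The vector fields $X$ and $Y$ are $\phi$-related ($X \overset{\phi}{\sim} Y$) if $d\phi.X = Y \circ \phi$. Equivalently, the relation $X \overset{\phi}{\sim} Y$ means the diagram below commutes:
$$\begin{array}{ccc} TU & \xrightarrow{\ d\phi\ } & TV \\ \big\uparrow X & & \big\uparrow Y \\ U & \xrightarrow{\ \ \phi\ \ } & V. \end{array} \qquad (17)$$
(ii) The flows $\Phi^X$, $\Phi^Y$ are conjugated if $\phi \circ \Phi^X_t = \Phi^Y_t \circ \phi$. If $\phi$ is a diffeomorphism, then $\Phi^Y_t = \phi \circ \Phi^X_t \circ \phi^{-1}$. In particular, $\Phi^X$ is conjugated to itself.

The $\phi$-related vector fields share the following algebraic properties:
(i) If $X_i \overset{\phi}{\sim} Y_i$, $i = 1, 2$, and $a, b \in \mathbb{R}$, then $aX_1 + bX_2 \overset{\phi}{\sim} aY_1 + bY_2$.
(ii) If $X \in \Gamma(U)$, $Y \in \Gamma(V)$ and $Z \in \Gamma(W)$, with $\phi : U \to V$ and $\psi : V \to W$, then
$$X \overset{\phi}{\sim} Y \ \text{ and } \ Y \overset{\psi}{\sim} Z \ \Rightarrow \ X \overset{\psi \circ \phi}{\sim} Z. \qquad (18)$$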
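Proposition 2 Given the vector fields $X \in \Gamma(U)$ and $Y \in \Gamma(V)$, let $\psi : U \to V$ be a differentiable map and let $\Phi^X$, $\Phi^Y$ be their flows, respectively. Then $X \overset{\psi}{\sim} Y$ if and only if their flows are conjugated. In particular, if $\psi$ is a diffeomorphism, then $Y = \psi_*(X)$ if and only if $\Phi^Y_t = \psi \circ \Phi^X_t \circ \psi^{-1}$ (so $(\Phi^X_t)_*(X) = X$).

Proof We prove each direction separately. For $t \in \mathbb{R}$, consider the diffeomorphism $\phi_t = \Phi^X(t, \cdot) : U \to U$ and fix $p \in U$.
(i) ($\Rightarrow$) Let $c(t) = \phi_t(p) = \Phi^X_t(p)$, so $c'(t) = X(\Phi^X_t(p))$. It follows that
$$\frac{d}{dt}\big(\psi(c(t))\big) = d\psi_{c(t)}.c'(t) = d\psi_{\Phi^X_t(p)}.X(\Phi^X_t(p)) = (Y \circ \psi)(\Phi^X_t(p)) = Y\big((\psi \circ c)(t)\big).$$
Therefore $\psi \circ c$ is an integral curve of $Y$ with the initial condition $(\psi \circ c)(0) = \psi(p)$. Uniqueness implies that $\psi \circ \Phi^X_t(p) = (\psi \circ c)(t) = \Phi^Y_t(\psi(p))$.
(ii) ($\Leftarrow$) Differentiating the expression $\psi \circ \Phi^X_t(p) = \Phi^Y_t \circ \psi(p)$ with respect to $t$, we get
$$d\psi_{\Phi^X_t(p)}.\frac{d\Phi^X_t}{dt}(p) = \frac{d\Phi^Y_t}{dt}(\psi(p)) \ \Rightarrow \ d\psi_{\Phi^X_t(p)}.X(\Phi^X_t(p)) = Y\big(\Phi^Y_t(\psi(p))\big),$$
and setting $t = 0$ yields $d\psi_p.X(p) = Y(\psi(p))$.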
Definition 3 Let $G$ be a group with product operation $\cdot : G \times G \to G$ and let $e$ be the identity element. A $G$-action on a topological space $X$ is a map $\bullet : G \times X \to X$ satisfying the following items for all $x \in X$ and $g, g' \in G$: (i) $e \bullet x = x$, (ii) $g' \bullet (g \bullet x) = (g' \cdot g) \bullet x$.

By denoting $\gamma(t, x) = t \bullet x$, it follows from the identities in Eq. (16) that a flow defines an action of the additive group $G = (\mathbb{R}, +)$ on $E$, since it satisfies the following items: (i) $0 \bullet x = x$; (ii) $s \bullet (t \bullet x) = (s + t) \bullet x$, for any $t, s \in \mathbb{R}$ and $x \in E$. The orbit of an element $x \in E$ generated by the $\mathbb{R}$-action is the set $O_x = \{t \bullet x \mid t \in \mathbb{R}\}$, which is the trajectory of the point $x$ under the flow $\Phi^X$. If $x_0$ is a singularity of $X$, i.e., $X(x_0) = 0$, then $x_0$ is a fixed point for the action; that is, its orbit is the constant trajectory $t \bullet x_0 = x_0$. In general, the $\mathbb{R}$-action induced by a vector field $X$ induces the 1-parameter subgroup of diffeomorphisms $\phi_t : U \to U$, $\phi_t(x) = t \bullet x$.

Let $G$ be a group. A dynamical system defined on an open subset $U \subset E$ is the flow spanned by the action $G \times U \to E$, $(g, x) \mapsto g \bullet x$. When $G = \mathbb{R}$ and the action is induced by the flow of a vector field $X \in \Gamma(U)$ (ODE), we have $t \bullet x = \gamma_x(t)$. The dynamical system can be continuous or discrete, depending on whether $G$ is continuous or discrete as a topological space. When $X(x) = A \cdot x$, the 1-parameter subgroup of diffeomorphisms is $\{\exp(tA) \mid t \in \mathbb{R}\}$.
Exercises
1. Show that the system below is $\phi$-related to the system defined in Eq. (9). Hint: $\phi$ is the diffeomorphism $\phi : \mathbb{R}^2 \setminus \{0\} \to \mathbb{R}^2 \setminus \{0\}$, $\phi(\theta, r) = (r \cos\theta, r \sin\theta)$.
$$r' = r(r^2 - 4), \quad \theta' = 1. \qquad (19)$$
2. Show that the solution to the IVP given by the ODE in the item above with the initial condition $(x(0), y(0)) = (a, b)$ is
$$(x(t), y(t)) = \begin{cases} 2\sqrt{\dfrac{k}{k - e^{8\theta}}}\,(\cos\theta, \sin\theta), & \text{if } r > 2, \\ 2\,(\cos\theta, \sin\theta), & \text{if } r = 2, \\ 2\sqrt{\dfrac{k}{e^{8\theta} - k}}\,(\cos\theta, \sin\theta), & \text{if } r < 2. \end{cases}$$
The constants are $k = \dfrac{(a^2 + b^2)\, e^{8\theta_0}}{a^2 + b^2 - 4}$ and $\theta_0 = \tan^{-1}(b/a)$.
5 Vector Fields as Differential Operators

Let $U \subset \mathbb{R}^n$ be an open subset and let $TU = U \times \mathbb{R}^n$ be its tangent bundle. A vector field is an element of the space $\Gamma(U)$, that is, a section of the tangent bundle of $U$ as described in Chap. 2. Consider a fixed orthonormal frame $\beta_c = \{e_i \mid 1 \le i \le n\}$ of $\mathbb{R}^n$; then a $C^\infty$-vector field $X : U \to TU$ can be written as
$$X(x) = \sum_{i=1}^{n} a_i(x) e_i, \quad a_i \in C^\infty(U).$$
The field $X$ induces the map $X : C^\infty(U) \to C^\infty(U)$,
$$X(f)(x) = (df.X)(x) = \sum_{i=1}^{n} a_i(x) \frac{\partial f}{\partial x_i}(x) = \sum_{i=1}^{n} a_i(x) \frac{\partial}{\partial x_i}(f(x)),$$
and so the bijection
$$X = \sum_{i=1}^{n} a_i(x) e_i \ \longleftrightarrow \ \sum_{i=1}^{n} a_i \frac{\partial}{\partial x_i}. \qquad (20)$$
We have identified the vector field with a 1st-order linear differential operator. As differential operators, vector fields have the following properties: let $f, g \in C^\infty(U)$, $a, b \in \mathbb{R}$ and $X \in \Gamma(U)$;
(i) $X(af + bg) = aX(f) + bX(g)$ (linear).
(ii) $X(f.g) = X(f).g + f.X(g)$ (Leibniz's rule).

Definition 4 The Lie bracket $[X, Y]$, or commutator, of two smooth vector fields $X, Y \in \Gamma(U)$ is the differentiable vector field $[X, Y] : C^\infty(U) \to C^\infty(U)$,
$$[X, Y](f) = X(Y(f)) - Y(X(f)). \qquad (21)$$
Proposition 3 Let $X, Y, Z \in \Gamma(U)$ and $f, g \in C^\infty(U)$. Then
1. $[fX, gY] = fg[X, Y] + fX(g)Y - gY(f)X$.
2. $[Y, X] = -[X, Y]$.
3. $[[X, Y], Z] + [[Z, X], Y] + [[Y, Z], X] = 0$ (Jacobi's identity).
Proof It is left as an exercise since it is straightforward from the definition.

We note that for any $X \in \Gamma(U)$ and $f \in C^\infty(V)$, we have
$$[d\phi(X)](f)(x) = df_{\phi(x)}.d\phi_x.X(x) = d_x(f \circ \phi).X(x) = X(f \circ \phi)(x).$$
If $X \overset{\phi}{\sim} Y$, then
$$[d\phi(X)](f)(x) = df_{\phi(x)}.d\phi_x.X(x) = df_{\phi(x)}.Y(\phi(x)) = (Y(f) \circ \phi)(x).$$

Proposition 4 Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ be open subsets and let $\phi : U \to V$ be a $C^\infty$ map. If $X, X' \in \Gamma(U)$ and $Y, Y' \in \Gamma(V)$ are vector fields such that $X \overset{\phi}{\sim} Y$ and $X' \overset{\phi}{\sim} Y'$, then $[X, X'] \overset{\phi}{\sim} [Y, Y']$.
Proof For any $f \in C^\infty(V)$,
$$\begin{aligned} d\phi([X, X'])(f) &= [X, X'](f \circ \phi) = X(X'(f \circ \phi)) - X'(X(f \circ \phi)) \\ &= X(d\phi.X'(f)) - X'(d\phi.X(f)) = X(Y'(f) \circ \phi) - X'(Y(f) \circ \phi) \\ &= d\phi(X)(Y'(f)) - d\phi(X')(Y(f)) = \big(Y(Y'(f)) - Y'(Y(f))\big) \circ \phi = [Y, Y'](f) \circ \phi. \end{aligned}$$
Definition 5 A Lie algebra is a vector space $\mathfrak{g}$ endowed with a bilinear map $[\cdot, \cdot] : \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$ satisfying the following identities for any $X, Y, Z \in \mathfrak{g}$:
(i) $[Y, X] = -[X, Y]$.
(ii) $[[X, Y], Z] + [[Z, X], Y] + [[Y, Z], X] = 0$ (Jacobi's identity).
Therefore the space $\Gamma(U)$ of vector fields defined on an open subset $U \subset \mathbb{R}^n$ is a Lie algebra.

Remark 2 Let's show a useful interpretation for the bracket $[X, Y]$ of $X, Y \in \Gamma(U)$. Consider the points $p_0, p_1, p_2, p_3, p_4 \in U$, as illustrated in Fig. 11, and let $\alpha_0, \alpha_1, \beta_0, \beta_1$ be the integral curves such that:
• $\alpha_0, \alpha_1$ are integral curves of $X$, i.e., $\alpha_i'(t) = X(\alpha_i(t))$. Besides, $\alpha_0$ satisfies $\alpha_0(0) = p_0$ and $\alpha_0(h) = p_2$, and $\alpha_1$ satisfies $\alpha_1(0) = p_1$ and $\alpha_1(h) = p_4$.
• $\beta_0, \beta_1$ are integral curves of $Y$, i.e., $\beta_i'(t) = Y(\beta_i(t))$. Besides, $\beta_0$ satisfies $\beta_0(0) = p_0$ and $\beta_0(h) = p_1$, and $\beta_1$ satisfies $\beta_1(0) = p_2$ and $\beta_1(h) = p_3$.
For any $f \in C^\infty(U)$, let's find the value of $f(p_4) - f(p_3)$. We make use of the Taylor approximations, as follows:
(a) Let $p_i, p_j$ be $\alpha_i$-adjacent vertices in the polygon illustrated in Fig. 12. Then we have
$$\alpha_i(h) = \alpha_i(0) + h\alpha_i'(0) + o(h) = \alpha_i(0) + hX(\alpha_i(0)) + o(h), \quad \lim_{h \to 0} \frac{o(h)}{h} = 0.$$
So, up to $o(h)$, we have $p_2 = p_0 + hX(p_0)$ and $p_4 = p_1 + hX(p_1)$.
(b) Let $p_i, p_j$ be $\beta_i$-adjacent vertices in the polygon illustrated in Fig. 12. Then we have
$$\beta_i(h) = \beta_i(0) + h\beta_i'(0) + o(h) = \beta_i(0) + hY(\beta_i(0)) + o(h), \quad \lim_{h \to 0} \frac{o(h)}{h} = 0.$$
So, up to $o(h)$, we have $p_1 = p_0 + hY(p_0)$ and $p_3 = p_2 + hY(p_2)$.
Fig. 11 Phase portrait of system (19)
Fig. 12 Coordinates curves
(c) For any $f \in C^\infty(U)$, let $H_f$ be the Hessian operator of $f$. So we have the identity $d(df).X.Y = H_f(X, Y)$.
(d) For any $X, Y \in \Gamma(U)$, we have
$$\begin{aligned} [X, Y](f) &= X(Y(f)) - Y(X(f)) = d(df.Y).X - d(df.X).Y \\ &= \big(d(df).Y + df.dY\big).X - \big(d(df).X + df.dX\big).Y \\ &= df.(dY.X - dX.Y) = (dY.X - dX.Y)(f), \end{aligned}$$
where the Hessian terms cancel because $H_f$ is symmetric.
Fig. 13 Coordinate system
Now we can proceed to compute $f(p_4) - f(p_3)$:
$$\begin{aligned} f(p_4) - f(p_3) &= [f(p_4) - f(p_1)] + [f(p_1) - f(p_0)] + [f(p_0) - f(p_2)] + [f(p_2) - f(p_3)] \\ &= (df.X)_{p_1} + (df.Y)_{p_0} - (df.X)_{p_0} - (df.Y)_{p_2} + o(h) \\ &= [(df.X)_{p_1} - (df.X)_{p_0}] - [(df.Y)_{p_2} - (df.Y)_{p_0}] + o(h) \\ &= [d(df.X).Y]_{p_0} - [d(df.Y).X]_{p_0} + o(h) \\ &= [df.(dX.Y) + H_f(X, Y) - df.(dY.X) - H_f(X, Y)]_{p_0} + o(h) \\ &= df.[dX.Y - dY.X] + o(h) = -df.[X, Y] + o(h) = -[X, Y](f) + o(h). \end{aligned}$$
Since the identity obtained holds for any $f$, we conclude that the paths close if and only if $[X, Y] = 0$. If $\{(x_1, \ldots, x_n) \in U\}$ defines a coordinate system in $U$, then $\left[\frac{\partial}{\partial x_i}, \frac{\partial}{\partial x_j}\right] = 0$ for all $i, j$.

Definition 6 Let $U \subset \mathbb{R}^n$. A set of vector fields $\{X_1, \ldots, X_n\} \subset \Gamma(U)$ defines a coordinate system in $U$ if the following conditions are satisfied:
(i) for all $x \in U$, the set $\{X_1(x), \ldots, X_n(x)\}$ is a basis of $T_xU$,
(ii) $[X_i, X_j] = 0$ for all pairs $(i, j)$.
In this case, the integral curves are called coordinate curves, as illustrated in Fig. 13.
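The formula $[X, Y] = dY.X - dX.Y$ from Remark 2 lends itself to a quick numerical check. The sketch below approximates the directional derivatives by central differences in the plane; the step size, the sample point, the test fields, and the helper names `directional_derivative` and `lie_bracket` are assumptions made only for this illustration.

```python
def directional_derivative(V, p, u, h=1e-5):
    """Approximate dV_p.u by a central difference along the direction u."""
    plus = V([pi + h * ui for pi, ui in zip(p, u)])
    minus = V([pi - h * ui for pi, ui in zip(p, u)])
    return [(a - b) / (2 * h) for a, b in zip(plus, minus)]

def lie_bracket(X, Y, p):
    """[X, Y](p) = dY_p.X(p) - dX_p.Y(p), computed numerically."""
    dY_X = directional_derivative(Y, p, X(p))
    dX_Y = directional_derivative(X, p, Y(p))
    return [a - b for a, b in zip(dY_X, dX_Y)]

e1 = lambda p: [1.0, 0.0]    # coordinate field d/dx
e2 = lambda p: [0.0, 1.0]    # coordinate field d/dy
Z = lambda p: [0.0, p[0]]    # the field x d/dy

p = [0.3, -1.2]              # assumed sample point
print(lie_bracket(e1, e2, p))  # coordinate fields: bracket close to (0, 0)
print(lie_bracket(e1, Z, p))   # bracket close to (0, 1), i.e. d/dy
```

The coordinate fields commute, in agreement with Definition 6, while $\partial/\partial x$ and $x\,\partial/\partial y$ do not, so they cannot both belong to a coordinate system.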
Exercise 1. Prove the identities in Proposition 3.
6 Integrability, Frobenius Theorem

Let $U \subset \mathbb{R}^n$ be an open subset and let $D_x \subseteq T_xU$ be a $k$-dimensional vector subspace for all $x \in U$. We say that $D = \cup_{x \in U} D_x$ is a $k$-distribution. $D$ is differentiable if we have a set of $k$ $C^\infty$-vector fields $\beta = \{X_1, \ldots, X_k\} \subset \Gamma(U)$ such that, at every $x \in U$, the set $\beta(x) = \{X_1(x), \ldots, X_k(x)\}$ is a basis of $D_x$. A field $X$ belongs to the distribution $D$ if $X(x) \in D_x$ for all $x \in U$. The set $\beta$ is a frame on $U$ when $\beta(x)$ is a basis of $T_xU$ for all $x \in U$.

Definition 7 A distribution $D$ is involutive if $[X, Y] \in D$ for all pairs of vector fields $X, Y \in D$.

Now we will consider the integrability of a distribution $D$. This generalizes the case $k = 1$, for which Theorem 2 ensures that the answer is affirmative. In the previous section, we considered the case $k = 2$ and showed that if the distribution satisfies $[X_1, X_2] = 0$, then locally these fields define a surface. The Lie bracket plays a central role in answering the question.

Definition 8 A $k$-distribution $D$ is integrable if there is a submanifold $M^k(x) \subset U$ for all $x \in U$ such that (i) $x \in M^k(x)$, (ii) $T_xM^k(x) = D_x$. $M^k(x)$ is a leaf of the foliation defined by the distribution $D$. The distribution $D$ is maximal if $M^k(x)$ is connected and is not contained in a larger leaf of the foliation for all $x \in U$; i.e., if $N^k(x)$ is another leaf containing $x$, then either $N^k(x) \subset M^k(x)$ or $N^k(x) = M^k(x)$.

For the $k$-dimensional case, the integrability of a distribution defines a local foliation, as illustrated in Figs. 14 and 15. The necessary and sufficient condition for the integrability of a distribution is provided by the Frobenius theorem. Before proving the Frobenius theorem, we will prove two lemmas.
Fig. 14 Foliation A
Fig. 15 Foliation B
Lemma 2 Let $D$ be an involutive $k$-distribution in $U$ of class $C^\infty$. Then we have a local frame $\beta' = \{Y_i \mid 1 \le i \le k\}$ of $D$ such that $[Y_i, Y_j] = 0$ for every pair $i, j$.
Proof Let $\beta = \{X_1, \ldots, X_k\} \subset \Gamma(U)$ be a $k$-frame defining the distribution $D$ and let $\beta_c = \{e_i \mid 1 \le i \le n\}$ be the canonical frame composed by the vector fields $e_i = \frac{\partial}{\partial x_i}$. Let $X_i = \sum_{j=1}^{n} t_{ji} e_j$ be the representation given by the coordinates in the basis $\beta_c$. The matrix $T = (t_{ij})$ is equivalent by elementary row operations to
$$\begin{pmatrix} I_{k \times k} & A_{(n-k) \times k} \\ 0_{(n-k) \times k} & B_{(n-k) \times (n-k)} \end{pmatrix}.$$
Therefore we have a unique $k$-frame $\beta' = \{Y_i \mid 1 \le i \le k\}$ given by
$$Y_i = e_i + \sum_{j=k+1}^{n} \alpha_{ji} e_j,$$
also spanning $D$. Let $\hat{D}$ be the distribution generated by the vector fields $\{e_{k+1}, \ldots, e_n\}$. Since $[e_i, e_j] = 0$, this yields that $[Y_i, Y_j] \in \hat{D}$. Since $D$ is involutive, we also have $[Y_i, Y_j] \in D$, and since $D \cap \hat{D} = \{0\}$, we conclude that $[Y_i, Y_j] = 0$, $\forall i, j$.
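Lemma 3 The vector fields $X, Y \in \Gamma(U)$ commute ($[X, Y] = 0$) if and only if the flows $\Phi^X$ and $\Phi^Y$ also commute, i.e., $\Phi^X_t \circ \Phi^Y_s = \Phi^Y_s \circ \Phi^X_t$, $\forall s, t \in \mathbb{R}$.
Proof (i) ($\Rightarrow$) To prove this direction, the procedure is the same as in Remark 2; we have to show that $[X, Y]$ measures the difference $f(p_4) - f(p_3)$. Assuming $t$ and $s$ are small enough such that
$$\Phi^X_t(x) = x + X(x)h + r_1(h), \quad \Phi^Y_s(x) = x + Y(x)h + r_2(h), \quad \lim_{h \to 0} \frac{r_i(h)}{h} = 0, \ i = 1, 2,$$
we have the identity
$$\Phi^X_t(\Phi^Y_s(x)) - \Phi^Y_s(\Phi^X_t(x)) = [X, Y].h + r(h), \quad \lim_{h \to 0} \frac{r(h)}{h} = 0.$$
(ii) ($\Leftarrow$) Consider the surface $\xi(t, s) = \Phi^X_t(\Phi^Y_s) = \Phi^X(t, \Phi^Y_s)$. Then
$$\frac{\partial \xi}{\partial t} = \frac{\partial \Phi^X(t, \Phi^Y_s)}{\partial t} = X(\Phi^Y_s)$$
and
$$\frac{\partial^2 \xi}{\partial s \partial t} = \frac{\partial X(\Phi^Y_s)}{\partial s} = dX_{\Phi^Y_s}.\frac{\partial \Phi^Y_s}{\partial s} = dX_{\Phi^Y_s}.Y(\Phi^Y_s).$$
Analogously, using the fact that $\xi(t, s) = \Phi^Y_s \circ \Phi^X_t$, we get $\frac{\partial^2 \xi}{\partial t \partial s} = dY_{\Phi^X_t}.X(\Phi^X_t)$. From the hypothesis, we have $\frac{\partial^2 \xi}{\partial s \partial t} = \frac{\partial^2 \xi}{\partial t \partial s}$, and therefore $[X, Y] = dY.X - dX.Y = 0$.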
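Lemma 3 can be illustrated with flows that are known in closed form. The sketch below is an illustration under assumptions not in the text: it uses the planar fields $X = \partial/\partial x$, $Y = \partial/\partial y$ and $Z = x\,\partial/\partial y$, whose flows are written by hand, and the hypothetical helper `commutator_gap`. A direct computation gives $[X, Y] = 0$ and $[X, Z] = \partial/\partial y \neq 0$.

```python
def flow_X(t, p):
    """Flow of X = d/dx: translation in the x direction."""
    return (p[0] + t, p[1])

def flow_Y(s, p):
    """Flow of Y = d/dy: translation in the y direction."""
    return (p[0], p[1] + s)

def flow_Z(s, p):
    """Flow of Z = x d/dy: integral curves are s -> (x, y + s x)."""
    return (p[0], p[1] + s * p[0])

def commutator_gap(flow_a, flow_b, t, s, p):
    """Difference between flow_a(t) o flow_b(s) and flow_b(s) o flow_a(t) at p."""
    q1 = flow_a(t, flow_b(s, p))
    q2 = flow_b(s, flow_a(t, p))
    return (q1[0] - q2[0], q1[1] - q2[1])

p = (1.0, 2.0)
t, s = 0.5, 0.25
print(commutator_gap(flow_X, flow_Y, t, s, p))  # commuting fields: zero gap
print(commutator_gap(flow_X, flow_Z, t, s, p))  # non-commuting fields: gap of size s*t
```

For the commuting pair the two compositions land on the same point, while for $X$ and $Z$ the endpoints differ in the $y$ coordinate by $s\,t$ in absolute value.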
Theorem 4 (Frobenius) A $k$-dimensional $C^\infty$-distribution $D$ in $U$ is integrable if and only if it is involutive.
Proof Let $\beta = \{X_1, \ldots, X_k\} \subset \Gamma(U)$ be a $k$-frame defining $D$.
(i) integrable $\Rightarrow$ involutive. Consider $\iota : M \to U$ the embedding of $M$ such that $\iota(m_0) = x_0$. For $m \in M$, $d\iota : T_mM \to D_{\iota(m)} \subset T_{\iota(m)}U$ is an isomorphism, so we have vector fields $\tilde{X}_1, \ldots, \tilde{X}_k$ on $M$ such that $d\iota(\tilde{X}_i) = X_i \circ \iota$ for all $i$. The fields $\tilde{X}_i, \tilde{X}_j$ are $\iota$-related to $X_i, X_j$, respectively; therefore $[X_i, X_j] = d\iota([\tilde{X}_i, \tilde{X}_j])$ for all pairs $i, j$. Hence $D$ is involutive.
(ii) involutive $\Rightarrow$ integrable. Fix a point $x_0 \in U$. Consider $\beta = \{X_i \mid 1 \le i \le k\}$ a $k$-frame generating $D$ in a neighborhood $U_0 \subset U$ of $x_0$ and satisfying the condition $[X_i, X_j] = 0$, $\forall i, j$ (Lemma 2). Let $\epsilon > 0$ be such that the flow $\Phi^{X_i}$ is defined on $(-\epsilon, \epsilon) \times U_0 \subset \mathbb{R} \times U$ for all $i$. Considering the $k$-cube $(-\epsilon, \epsilon)^k$, we define the map
$$\Psi : (-\epsilon, \epsilon)^k \to U, \quad (t_1, \ldots, t_k) \mapsto \Psi(t_1, \ldots, t_k) = (\Phi^{X_1}_{t_1} \circ \cdots \circ \Phi^{X_k}_{t_k})(x_0). \qquad (22)$$
Let $\hat{t} = (t_1, \ldots, t_k)$. Applying Lemma 3, we can move $\Phi^{X_i}_{t_i}$ to the front of the composition, and we get
$$\frac{\partial \Psi}{\partial t_i}\Big|_{t = \hat{t}} = \frac{d}{dt_i}\Big[\Phi^{X_i}_{t_i} \circ \big(\Phi^{X_1}_{t_1} \circ \cdots \circ \widehat{\Phi^{X_i}_{t_i}} \circ \cdots \circ \Phi^{X_k}_{t_k}\big)(x_0)\Big] = X_i\big(\Psi(\hat{t})\big), \qquad (23)$$
where the hat indicates that the factor is omitted. So $\Psi$ is an immersion. By the Local Immersion Theorem, we can reduce $\epsilon$ to $\epsilon_1$, $0 < \epsilon_1 < \epsilon$, so that the image of the map $\Psi : (-\epsilon_1, \epsilon_1)^k \to U$ is a $k$-submanifold $M^k(x_0) \subset U$. Therefore $D$ is integrable.
7 Lie Groups and Lie Algebras

In Appendix B, we introduce the concept of a Lie group $G$; now we will associate with $G$ a Lie algebra $\mathfrak{g}$. As an application of the Frobenius theorem, we will use the example of $SO(n)$ to show that a subgroup $H$ of a group $G$ can be reconstructed when we know its subalgebra $\mathfrak{h} \subset \mathfrak{g}$. Let us first recapitulate some assertions.

Definition 9 Let $g \in G$. The left translation by $g$ is the diffeomorphism $L_g : G \to G$, $L_g(h) = g.h$. The right translation by $g$ is the diffeomorphism $R_g : G \to G$, $R_g(h) = h.g$.

The existence of left and right translations makes Lie groups a very special class of spaces to be studied.

Definition 10 A vector field $X$ is left-invariant if $dL_\sigma.X = X \circ L_\sigma$ ($X \overset{L_\sigma}{\sim} X$) for all $\sigma \in G$.

Let $e \in G$ be the identity. For $g \in G$, the diffeomorphism $L_g : G \to G$, $L_g(h) = g.h$, induces the isomorphism $dL_g : T_eG \to T_gG$, $dL_g(X) = g.X$. For every $X_e \in T_eG$, the vector field $X(g) = dL_g.X_e$ is left-invariant:
$$(dL_\sigma X)(g) = dL_\sigma.X(g) = dL_\sigma.dL_g.X_e = dL_{\sigma.g}.X_e = X(\sigma.g) = (X \circ L_\sigma)(g).$$

Proposition 5 Let $\mathfrak{g}$ be the vector space of left-invariant vector fields on a Lie group $G$. Then (i) $\mathfrak{g}$ is isomorphic to $T_eG$. (ii) $\mathfrak{g}$ is a Lie algebra endowed with the Lie bracket.
Proof (i) Since every field $X \in \mathfrak{g}$ is defined by the relation $X(g) = dL_g.X_e$, we consider the map $\theta : \mathfrak{g} \to T_eG$ given by $\theta(X) = X_e$. So $\theta$ is a vector space isomorphism. (ii) The Lie bracket $[X, Y]$ of two fields $X, Y \in \mathfrak{g}$ also belongs to $\mathfrak{g}$ since
$$dL_g.[X, Y] = [dL_gX, dL_gY] = [X \circ L_g, Y \circ L_g] = [X, Y] \circ L_g.$$

Definition 11 The Lie algebra of a Lie group $G$ is the vector space $\mathfrak{g}$ of left-invariant vector fields defined on $G$. A subspace $\mathfrak{h} \subset \mathfrak{g}$ is a Lie subalgebra of $\mathfrak{g}$ if $[X, Y] \in \mathfrak{h}$ whenever $X, Y \in \mathfrak{h}$.

The group $GL_n(\mathbb{R})$ of invertible matrices is a differentiable manifold locally modelled on the Euclidean space $\mathbb{R}^{n^2}$. Indeed, $GL_n(\mathbb{R})$ is an open subset of $M_n(\mathbb{R})$. The tangent plane at the identity $I \in GL_n(\mathbb{R})$ is $T_I GL_n(\mathbb{R}) = M_n(\mathbb{R})$. Let $\mathfrak{gl}_n(\mathbb{R})$ be the Lie algebra of $GL_n(\mathbb{R})$. The vector space $M_n(\mathbb{R})$ is a Lie algebra endowed with the Lie bracket $[A, B] = A.B - B.A$, so $\mathfrak{gl}_n(\mathbb{R})$ and $M_n(\mathbb{R})$ are isomorphic Lie algebras, since $[X, Y](g) = dL_g.[X(I), Y(I)]$.

In Example 7, Chap. 1, we show that the group of orthogonal matrices
$$SO(n) = \{X \in GL_n(\mathbb{R}) \mid X.X^t = X^t.X = I, \ \det(X) = 1\}$$
is a Lie group of dimension $d(n) = \frac{n(n-1)}{2}$. The tangent plane $T_I SO(n) = \{A \in M_n(\mathbb{R}) \mid A^t = -A\}$ at the identity is exactly the subspace $\mathcal{A}_n(\mathbb{R})$ of the real skew-symmetric matrices. We note that $\mathcal{A}_n(\mathbb{R})$ is a Lie subalgebra of $\mathfrak{gl}_n(\mathbb{R})$ with the Lie bracket $[A_1, A_2] = A_1A_2 - A_2A_1$, since
$$[A_1, A_2]^t = (A_1A_2 - A_2A_1)^t = A_2^tA_1^t - A_1^tA_2^t = A_2A_1 - A_1A_2 = [A_2, A_1] = -[A_1, A_2].$$
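Therefore $[A_1, A_2] \in \mathfrak{so}_n$, and the bracket satisfies the following items: (i) $[A_2, A_1] = -[A_1, A_2]$, (ii) $[[A_1, A_2], A_3] + [[A_3, A_1], A_2] + [[A_2, A_3], A_1] = 0$. Moreover, the Lie algebra $\mathfrak{so}_n$ of $SO(n)$ is isomorphic to the Lie algebra $\mathcal{A}_n(\mathbb{R})$. A basis of $\mathfrak{so}_n$ is given by $\beta = \{E_{ij} = (e_{\alpha\beta}) \mid 1 \le i < j \le n\}$, and the entries of $E_{ij}$ are
$$e_{\alpha\beta} = \begin{cases} 1, & \text{if } \alpha < \beta \text{ and } \alpha = i, \ \beta = j, \\ -1, & \text{if } \alpha > \beta \text{ and } \alpha = j, \ \beta = i, \\ 0, & \text{if } \{\alpha, \beta\} \neq \{i, j\}. \end{cases}$$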
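The closure of the skew-symmetric matrices under the commutator, and the count of basis elements $E_{ij}$, can be checked with exact integer arithmetic. This is a small sketch; the helper names `E`, `mul`, `bracket` and `is_skew`, and the choice $n = 4$, are assumptions made for illustration.

```python
def E(i, j, n):
    """Basis matrix E_ij (1-based indices, i < j): +1 at (i, j), -1 at (j, i)."""
    M = [[0] * n for _ in range(n)]
    M[i - 1][j - 1] = 1
    M[j - 1][i - 1] = -1
    return M

def mul(A, B):
    """Matrix product of square integer matrices."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def bracket(A, B):
    """Commutator [A, B] = AB - BA."""
    AB, BA = mul(A, B), mul(B, A)
    n = len(A)
    return [[AB[r][c] - BA[r][c] for c in range(n)] for r in range(n)]

def is_skew(A):
    """True when A^t = -A."""
    n = len(A)
    return all(A[r][c] == -A[c][r] for r in range(n) for c in range(n))

n = 4
basis = [E(i, j, n) for i in range(1, n + 1) for j in range(i + 1, n + 1)]
print(len(basis), n * (n - 1) // 2)                              # both equal d(n)
print(all(is_skew(bracket(A, B)) for A in basis for B in basis))
```

For $n = 3$ one also finds `bracket(E(1, 2, 3), E(2, 3, 3)) == E(1, 3, 3)`, which mirrors the cross product relations that appear in the exercises of this section.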
The dimension of $\mathfrak{so}_n$ is $d(n) = \frac{n(n-1)}{2}$. Let's construct an involutive $d(n)$-distribution $D$ in $GL_n(\mathbb{R})$. The fields $X_{ij}(g) = dL_g.E_{ij}$ form a basis for $\mathfrak{so}_n$. Set $D(g)$ to be the subspace of $T_gGL_n(\mathbb{R})$ generated by the basis $\beta(g) = \{X_{ij}(g) \mid 1 \le i < j \le n\}$ and $D = \{D(g) \mid g \in GL_n(\mathbb{R})\}$. According to the Frobenius theorem, $I$ is in a maximal leaf $M$ (integral submanifold) of $D$, which is invariant by left translation ($L_{g^{-1}}(M) = M$) since the distribution $D$ is invariant by left translation. We have to show that $M$ is a group, but this follows directly from the fact that $L_{g^{-1}}(M) = M$, since $g^{-1}h \in M$ for all $g, h \in M$. Consequently, $M$ is a subgroup of $GL_n(\mathbb{R})$ with Lie algebra $\mathfrak{so}_n$. $SO(n)$ is also an integral submanifold (leaf) of $D$ and a Lie subgroup of $GL_n(\mathbb{R})$. To verify that $SO(n)$ is maximal, we note that $SO(n)$ has dimension $d(n)$ and is a connected, compact topological space.
Exercises
1. Show that $\mathbb{R}^3$ provided with the Lie bracket $[u, v] = u \times v$, where $\times$ is the vector cross product, is a Lie algebra isomorphic to $\mathcal{A}_3(\mathbb{R})$.
2. Show that there is a bijection between the sets {Lie subalgebras of $\mathfrak{g}$} $\longleftrightarrow$ {subgroups of $G$}.
3. Let $A \in \mathcal{A}_n(\mathbb{R})$. Consider the ODE $\gamma'(t) = A\gamma(t)$ defined in $\mathbb{R}^n$. Show that the 1-parameter subgroup of diffeomorphisms generated by the flow is $\{\exp(tA) \mid t \in \mathbb{R}\}$.
4. Show that $SO(n)$ is compact and connected.
5. Show that the exponential map $\exp : \mathcal{A}_n(\mathbb{R}) \to SO(n)$, $\exp(A) = \sum_{n=0}^{\infty} \frac{A^n}{n!}$, is well-defined and surjective.
6. Show that $\exp : \mathcal{A}_n(\mathbb{R}) \to O(n)$ is not surjective.
7. For all $A \in \mathcal{A}_n(\mathbb{R})$, show that the 1-parameter subgroup of diffeomorphisms of the linear field $F(x) = A.x$ is a subgroup of $SO(n)$.
8. Baker-Campbell-Hausdorff formula: let $A$, $B$ and $C$ be matrices in $M_n(\mathbb{C})$ such that $\exp(C) = \exp(A).\exp(B)$. Show that there are homogeneous polynomials $F_n$ of degree $n$ such that
$$C = \sum_{n=1}^{\infty} F_n(A, B).$$
Moreover, if $[A, B] = 0$, then $F_n(A, B) = 0$ for all $n \ge 2$. Show that
$$C = A + B + \frac{1}{2}[A, B] + \frac{1}{12}\big([A, [A, B]] + [[A, B], B]\big) + \ldots.$$
8 Variations over a Flow, Lie Derivative

Let $\Phi^X$ be the flow of the field $X$ and let $\{\phi_t \mid t \in \mathbb{R}\}$ be the 1-parameter subgroup of diffeomorphisms generated by the flow. We want to study the variations of functions and vector fields along the flow of $X$. We will consider all functions, maps, and vector fields to be $C^\infty$.

Definition 12 Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ be open sets and let $\phi : U \to V$ be a map in $C^\infty(U, V)$.
(i) Functions.
(i.1) The pullback of $f \in C^\infty(V)$ by $\phi$ is $\phi^*f(x) = f(\phi(x)) \in C^\infty(U)$. The pullback induces a map $\phi^* : C^\infty(V) \to C^\infty(U)$.
(i.2) Assuming $\phi$ is a diffeomorphism, the pushforward of $f \in C^\infty(U)$ induced by $\phi$ is $\phi_*f = f \circ \phi^{-1} \in C^\infty(V)$. We have the map $\phi_* : C^\infty(U) \to C^\infty(V)$.
(ii) Vector fields. Let $\phi : U \to V$ be a diffeomorphism.
(ii.1) The pullback of $Y \in \Gamma(V)$ induced by $\phi$ is $\phi^*(Y) = d\phi^{-1} \circ Y \circ \phi \in \Gamma(U)$. The pullback induces the map $\phi^* : \Gamma(V) \to \Gamma(U)$.
(ii.2) The pushforward of $X \in \Gamma(U)$ by $\phi$ is $\phi_*(X) = d\phi \circ X \circ \phi^{-1}$. The pushforward induces the map $\phi_* : \Gamma(U) \to \Gamma(V)$.

A diffeomorphism $\psi : U \to V$ also induces the pushforward of flows. The pushforward $\psi_*\Phi^X = \psi \circ \Phi^X \circ \psi^{-1}$ of $\Phi^X$ by $\psi$ induces the group homomorphism $\mathrm{Diff}(U) \to \mathrm{Diff}(V)$, $\Phi^X_t \mapsto \psi \circ \Phi^X_t \circ \psi^{-1}$. The pushforward of $X$ by the diffeomorphisms $\phi_t = \Phi^X_t$ is $X$ itself.

We fix an orthonormal basis $\beta = \{e_1, \ldots, e_n\}$ of $\mathbb{R}^n$ and let $X = \sum_i x_i e_i$. Consider $\phi : U \to V$ a diffeomorphism and $q = \phi(p)$. Then
$$d\phi_p(X(p)) = \sum_i x_i(p)\, d\phi_p(e_i) = \sum_i x_i(p) \frac{\partial \phi}{\partial x_i}(p) = \sum_i \Big[\sum_j x_j(p) \frac{\partial \phi^i}{\partial x_j}(p)\Big] e_i.$$
Two fields can be related by a diffeomorphism, for example by a coordinate change; therefore it is necessary to describe the action of the diffeomorphisms on functions and maps.

Proposition 6 The pullback and the pushforward of functions enjoy the following algebraic properties:
(i) the pullback and the pushforward of functions are linear operators and
$$\phi^*(f.g) = \phi^*(f).\phi^*(g), \quad \phi_*(f.g) = \phi_*(f).\phi_*(g). \qquad (24)$$
(ii) Let $\phi : U \to V$ and $\psi : V \to W$, so $(\psi \circ \phi)^* = \phi^* \circ \psi^*$ and $(\psi \circ \phi)_* = \psi_* \circ \phi_*$.
Proof The proofs are immediate from the definitions, so they are left as an exercise.

Definition 13 Let $f \in C^\infty(U)$ and $X \in \Gamma(U)$. The Lie derivative of $f$ in the direction of the vector field $X$ is given by the operator
$$L_X : C^\infty(U) \to C^\infty(U), \quad L_Xf(x) = X(f)(x) = df_x.X(x). \qquad (25)$$
Therefore the Lie derivative is a derivation satisfying the following identities: for all $a, b \in \mathbb{R}$ and $f, g \in C^\infty(U)$,
(i) $L_X(af + bg) = aL_X(f) + bL_X(g)$ ($\mathbb{R}$-linear),
(ii) $L_X(f.g) = L_X(f).g + f.L_X(g)$ (Leibniz's rule).

Proposition 7 Let $\psi : U \to V$ be a diffeomorphism. Then $L_X$ is natural with respect to the pushforward by $\psi$, i.e., for all $f \in C^\infty(U)$, we have $L_{\psi_*X}(\psi_*f) = \psi_*(L_X(f))$. Equivalently, the diagram below commutes:
$$\begin{array}{ccc} C^\infty(U) & \xrightarrow{\ \psi_*\ } & C^\infty(V) \\ \big\downarrow L_X & & \big\downarrow L_{\psi_*X} \\ C^\infty(U) & \xrightarrow{\ \psi_*\ } & C^\infty(V). \end{array} \qquad (26)$$
Proof Let $x \in V$. Then
$$\begin{aligned} L_{\psi_*X}(\psi_*f)(x) &= d(f \circ \psi^{-1})_x.(\psi_*X)(x) = d(f \circ \psi^{-1})_x.(d\psi.X.\psi^{-1})(x) \\ &= df_{\psi^{-1}(x)}.d\psi^{-1}_x.d\psi_{\psi^{-1}(x)}.X(\psi^{-1}(x)) = df_{\psi^{-1}(x)}.X(\psi^{-1}(x)) = \psi_*(L_X(f))(x). \end{aligned}$$
For maps, we have the following assertion:

Proposition 8 Let $\psi : U \to V$ be a map, $X \in \Gamma(U)$ and $Y \in \Gamma(V)$. If $X \overset{\psi}{\sim} Y$, then $L_X(\psi^*f) = \psi^*(L_Y(f))$ for all maps $f : V \to F$, where $F$ is a Banach space. Equivalently, the diagram below commutes:
$$\begin{array}{ccc} C^\infty(V, F) & \xrightarrow{\ \psi^*\ } & C^\infty(U, F) \\ \big\downarrow L_Y & & \big\downarrow L_X \\ C^\infty(V, F) & \xrightarrow{\ \psi^*\ } & C^\infty(U, F). \end{array} \qquad (27)$$
Proof For all $x \in U$,
$$L_X(\psi^*f)(x) = d(f \circ \psi)_x.X(x) = df_{\psi(x)}.d\psi_x.X(x) = df_{\psi(x)}.Y(\psi(x)) = \psi^*(L_Yf)(x).$$
Definition 14 Let $X, Y \in \Gamma(U)$. The Lie derivative of $Y$ with respect to the vector field $X$ at $x \in U$ is
$$L_XY = \lim_{t \to 0} \frac{d\Phi^X_{-t}.Y(\Phi^X_t(x)) - Y(x)}{t}. \qquad (28)$$
In the definition above, we need to apply the isomorphism $d\Phi^X_{-t} = (d\Phi^X_t)^{-1} : T_{\Phi^X_t(x)}U \to T_xU$ (Fig. 16) to compute $L_XY$, since the vectors $Y(\Phi^X_t(x))$ and $Y(x)$ do not belong to the same tangent plane (Fig. 17).
Fig. 16 Comparing tX ◦ sY (x) with sY ◦ tX (x)
Fig. 17 Lie derivative
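Proposition 9 Let $X, Y \in \Gamma(U)$, so
$$L_XY = [X, Y]. \qquad (29)$$
Proof According to the definition,
$$(L_XY)(f) = \lim_{t \to 0} \Big[\frac{d\Phi^X_{-t}.Y(\Phi^X_t) - Y(x)}{t}\Big](f) = \frac{d}{dt}\Big[d\Phi^X_{-t}\big(Y_{\Phi^X_t}\big)\Big](f)\Big|_{t=0} = \frac{d}{dt}\Big[Y_{\Phi^X_t}\big(f \circ \Phi^X_{-t}\big)\Big]\Big|_{t=0}. \qquad (30)$$
For any $x \in U$, we consider the function $H(t, u) = f\big(\Phi^X_{-t} \circ \Phi^Y_u \circ \Phi^X_t(x)\big)$. Setting $q = \Phi^X_{-t} \circ \Phi^Y_u \circ \Phi^X_t(x)$, we have
$$\frac{\partial H}{\partial u}\Big|_{(t,0)} = df_q.\frac{\partial}{\partial u}\Phi^X_{-t}\big(\Phi^Y_u(\Phi^X_t(x))\big)\Big|_{(t,0)} = df_q.d\Phi^X_{-t}.Y(\Phi^X_t(x)) = Y_{\Phi^X_t(x)}\big(f \circ \Phi^X_{-t}\big).$$
Therefore we get $\frac{\partial^2 H}{\partial t \partial u}\big|_{(0,0)} = (L_XY)(f)$, and we have to find $\frac{\partial^2 H}{\partial t \partial u}\big|_{(0,0)}$. Consider the function $K(t, u, s) = f\big(\Phi^X_s \circ \Phi^Y_u \circ \Phi^X_t(x)\big)$, so $H(t, u) = K(t, u, -t)$. In this way,
$$\frac{\partial^2 H}{\partial t \partial u}\Big|_{(0,0)} = \frac{\partial^2 K}{\partial t \partial u}\Big|_{(0,0,0)} - \frac{\partial^2 K}{\partial s \partial u}\Big|_{(0,0,0)}.$$
Since $K(t, u, 0) = f\big(\Phi^Y_u \circ \Phi^X_t(x)\big)$, and considering $\hat{q} = \Phi^Y_u \circ \Phi^X_t(x)$, it follows that $\frac{\partial K}{\partial u}\big|_{(t,0,0)} = df_{\hat{q}}.Y(\Phi^X_t(x))$ and, consequently,
$$\begin{aligned} \frac{\partial^2 K}{\partial t \partial u}\Big|_{(0,0,0)} &= \Big[d(df)_{\hat{q}}.\big(d\Phi^Y_u.X(\Phi^X_t)\big).Y(\Phi^X_t(x)) + df_{\hat{q}}.dY_{\Phi^X_t}.X(\Phi^X_t)\Big]_{(0,0,0)} \\ &= d(df)_x.\big(d\Phi^Y_0.X(x)\big).Y(x) + df_x.dY_x.X(x) = X(Y(f))(x). \end{aligned}$$
Analogously, we consider $K(0, u, s) = f\big(\Phi^X_s \circ \Phi^Y_u(x)\big)$, and so we have
$$\frac{\partial K}{\partial s}\Big|_{s=0} = X(f)\big(\Phi^Y_u(x)\big), \quad \frac{\partial^2 K}{\partial u \partial s}\Big|_{(0,0,0)} = Y(X(f))(x).$$
Hence $L_XY = [X, Y]$.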
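For linear fields the identity $L_XY = [X, Y]$ can be tested numerically, because the flow is a matrix exponential. The sketch below assumes $X(x) = A.x$ and $Y(x) = B.x$ for two example matrices, so that $d\Phi^X_{-t}.Y(\Phi^X_t(x)) = e^{-tA} B e^{tA} x$ and $[X, Y](x) = dY.X - dX.Y = (BA - AB)x$; the limit in Definition 14 is replaced by a central difference in $t$. All helper names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(A, x):
    """Matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mat_exp(A, terms=30):
    """exp(A) approximated by a truncated power series."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def transported(A, B, x, t):
    """d(Phi_{-t}).Y(Phi_t(x)) for X = A.x and Y = B.x, i.e. e^{-tA} B e^{tA} x."""
    Et = mat_exp([[t * a for a in row] for row in A])
    Emt = mat_exp([[-t * a for a in row] for row in A])
    return mat_vec(Emt, mat_vec(B, mat_vec(Et, x)))

def lie_derivative(A, B, x, t=1e-4):
    """Central-difference approximation of the limit defining L_X Y at x."""
    plus, minus = transported(A, B, x, t), transported(A, B, x, -t)
    return [(a - b) / (2 * t) for a, b in zip(plus, minus)]

def bracket_field(A, B, x):
    """[X, Y](x) = dY.X - dX.Y = (BA - AB) x for linear fields."""
    BAx = mat_vec(B, mat_vec(A, x))
    ABx = mat_vec(A, mat_vec(B, x))
    return [a - b for a, b in zip(BAx, ABx)]

A = [[0.0, 1.0], [-1.0, 0.5]]   # assumed example matrices
B = [[1.0, 2.0], [0.0, -1.0]]
x = [1.0, 3.0]
print(lie_derivative(A, B, x))
print(bracket_field(A, B, x))
```

The two printed vectors should agree up to the discretization error of the difference quotient.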
Corollary 1 Let $X, Y \in \Gamma(U)$, so
$$\frac{d}{dt}(\Phi^X_t)^*Y = (\Phi^X_t)^*[X, Y]. \qquad (31)$$
In particular, $\frac{d}{dt}(\Phi^X_t)^*Y\big|_{t=0} = [X, Y]$.
Proof To prove the assertion, it is enough to note that $(\Phi^X_t)^*Y = d\Phi^X_{-t} \circ Y \circ \Phi^X_t$ is exactly the term in Eq. (30).
Exercise 1. Show the Proposition 6 statements.
9 Gradient, Curl and Divergence Differential Operators

In this section, we will define the gradient, curl, and divergence operators, which play a prominent role in the theories of differentiability and integrability of maps. Let $U \subset \mathbb{R}^n$ be an open subset and let $\beta = \{e_i \mid 1 \le i \le n\}$ be an orthonormal basis of $\mathbb{R}^n$. The differential operator "nabla" is
$$\nabla = \sum_{i=1}^{n} \frac{\partial}{\partial x_i}.e_i. \qquad (32)$$
Definition 15 The gradient operator $\nabla : C^\infty(U) \to \Gamma(U)$ is
$$\nabla(f) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} e_i. \qquad (33)$$
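Proposition 10 (Gradient properties) Let $a, b \in \mathbb{R}$ and $f, g \in C^\infty(U)$;
(i) $\nabla(af + bg) = a\nabla f + b\nabla g$ ($\mathbb{R}$-linear).
(ii) $\nabla(f.g) = \nabla f.g + f.\nabla g$ (Leibniz's rule).

Definition 16 Let $U \subset \mathbb{R}^3$ be an open subset and let $\beta = \{e_i \mid 1 \le i \le 3\}$ be an orthonormal basis of $\mathbb{R}^3$.
1. The curl operator $\mathrm{curl} : \Gamma(U) \to \Gamma(U)$, $V = \sum_{i=1}^{3} v_i e_i \mapsto \mathrm{curl}(V)$, is
$$\mathrm{curl}(V) = \Big(\frac{\partial v_3}{\partial x_2} - \frac{\partial v_2}{\partial x_3}\Big) e_1 + \Big(\frac{\partial v_1}{\partial x_3} - \frac{\partial v_3}{\partial x_1}\Big) e_2 + \Big(\frac{\partial v_2}{\partial x_1} - \frac{\partial v_1}{\partial x_2}\Big) e_3. \qquad (34)$$
Using the cross product $\times : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}^3$ and $\nabla$, we have $\mathrm{curl}(V) = \nabla \times V$.
2. The divergence operator $\mathrm{div} : \Gamma(U) \to C^\infty(U)$ is
$$V = \sum_{i=1}^{n} v_i e_i \mapsto \mathrm{div}(V) = \sum_{i=1}^{n} \frac{\partial v_i}{\partial x_i}. \qquad (35)$$
The divergence is also defined by $\mathrm{div}(V) = \nabla.V$ (the dot "." corresponds to the inner product);
$$\nabla.V = \langle \nabla, V \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial v_j}{\partial x_i} \langle e_i, e_j \rangle = \sum_{i=1}^{n} \frac{\partial v_i}{\partial x_i}.$$
3. The Laplacian operator on functions is
$$\Delta f = \mathrm{div}(\nabla f) = \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_i \partial x_j} \langle e_i, e_j \rangle = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x_i^2}.$$
We have the following fundamental identities in $\mathbb{R}^3$:
$$\text{(i) } \nabla \times (\nabla f) = 0, \quad \text{(ii) } \nabla.(\nabla \times V) = 0. \qquad (36)$$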
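The identities in Eq. (36) can be tested numerically with central differences. The sketch below is an illustration under assumptions that are not part of the text: the sample function `f`, the sample field `V`, the evaluation point, the step size, and the helper names `partial`, `grad`, `curl` and `div`. Because nested central differences in different coordinates commute, both identities hold here up to rounding error.

```python
import math

H = 1e-3  # assumed finite-difference step

def partial(g, i, p, h=H):
    """Central difference approximating the i-th partial derivative of g at p."""
    q_plus, q_minus = list(p), list(p)
    q_plus[i] += h
    q_minus[i] -= h
    return (g(q_plus) - g(q_minus)) / (2 * h)

def grad(f):
    """Gradient of a scalar function on R^3, following Eq. (33)."""
    return lambda p: [partial(f, i, p) for i in range(3)]

def curl(V):
    """Curl of a vector field on R^3, following Eq. (34)."""
    def c(p):
        d = lambda k, i: partial(lambda q: V(q)[k], i, p)  # dv_k / dx_i
        return [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]
    return c

def div(V):
    """Divergence of a vector field on R^3, following Eq. (35)."""
    return lambda p: sum(partial(lambda q, k=k: V(q)[k], k, p) for k in range(3))

f = lambda p: math.sin(p[0]) * p[1] + p[2] ** 2 * p[0]      # assumed test function
V = lambda p: [p[1] * p[2], p[0] ** 2, math.sin(p[0] * p[1])]  # assumed test field
p = [0.4, -0.7, 1.2]

print(curl(grad(f))(p))   # each component close to 0
print(div(curl(V))(p))    # close to 0
```

For this particular $V$, Eq. (34) gives $\mathrm{curl}(V) = (x\cos(xy),\ y - y\cos(xy),\ 2x - z)$, which can be compared with `curl(V)(p)`.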
Exercises
1. Show that the curl operator satisfies the following identities; let $a, b \in \mathbb{R}$, $f \in C^\infty(U)$ and $V, W \in \Gamma(U)$:
(i) If $V = P\,\mathbf{i} + Q\,\mathbf{j} + R\,\mathbf{k}$, show that $\mathrm{curl}(V)$ corresponds to taking the skew-symmetric part of the Jacobian matrix $dV$.
(ii) $\nabla \times (aV + bW) = a.\nabla \times V + b.\nabla \times W$.
(iii) $\nabla \times (f.V) = \nabla f \times V + f.\nabla \times V$.
(iv) $\nabla \times (V \times W) = [\nabla.W + \langle W, \nabla \rangle](V) - [\nabla.V + \langle V, \nabla \rangle](W)$.
(v) $\nabla \times (\nabla f) = 0$, for all $f \in C^\infty(U)$.
(vi) $\nabla \times (\nabla \times V) = \nabla(\nabla.V) - \Delta V$.
2. Show that the divergence operator satisfies the following identities; let $a, b \in \mathbb{R}$, $f \in C^\infty(U)$ and $V, W \in \Gamma(U)$:
(i) $\nabla.(aV + bW) = a.\nabla.V + b.\nabla.W$.
(ii) $\nabla.(f.V) = \langle \nabla f, V \rangle + f\,\nabla.V$.
(iii) $\nabla.(V \times W) = \langle \nabla \times V, W \rangle - \langle V, \nabla \times W \rangle$.
(iv) $\Delta f = \nabla.(\nabla f)$.
(v) $\nabla.(\nabla \times V) = 0$, for all $V \in \Gamma(U)$.

The symbols $(\nabla)$, $(\nabla \times)$ and $(\nabla \cdot)$ are used interchangeably with the symbols grad, curl and div, respectively.

Remark 3 The grad and div operators depend on the inner product in $\mathbb{R}^n$, hence the Laplacian also depends on the inner product. Consider $\mathbb{R}^n$ endowed with the inner product $\langle \cdot, \cdot \rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ whose matrix relative to the canonical basis $\beta = \{e_1, \ldots, e_n\}$ is $g = (g_{ij})$, with constant coefficients $g_{ij}$; we denote by $(g^{ij})$ the inverse matrix of $g$.
1. grad. Given a function $f$, $\mathrm{grad}(f)$ is the vector field defined by
$$df_x.u = \langle \mathrm{grad}(f)(x), u \rangle. \qquad (37)$$
It follows from Proposition 2, Appendix A, that
$$\mathrm{grad}(f) = \sum_{i=1}^{n} \Big(\sum_{j=1}^{n} g^{ij} \frac{\partial f}{\partial x_j}\Big) e_i. \qquad (38)$$
2. div. The trace of a matrix $A = (a_{ij})$ with respect to the inner product $g = (g_{ij})$ in $\mathbb{R}^n$ is $\mathrm{tr}_g(A) = \sum_{i,j=1}^{n} g^{ij} a_{ij}$. Then the divergence of a vector field $V = \sum_i V_i e_i$ is
$$\mathrm{div}(V) = \mathrm{tr}_g(dV) = \sum_{i,j=1}^{n} g^{ij} \frac{\partial V_i}{\partial x_j}. \qquad (39)$$
3. Laplacian. The Laplacian of a differentiable function is
$$\Delta f = \mathrm{tr}_g(d\nabla f) = \sum_{i,j=1}^{n} g^{ij} \frac{\partial^2 f}{\partial x_i \partial x_j}. \qquad (40)$$
Considering $d^2f = \big(\frac{\partial^2 f}{\partial x_i \partial x_j}\big)$ the Hessian matrix of $f$, we have $\Delta f = \mathrm{tr}_g(d^2f)$.
4. In Riemannian geometry, the inner product is replaced by a Riemannian metric. A Riemannian metric induces on each tangent plane $T_pU$ an inner product $g_p : T_pU \times T_pU \to \mathbb{R}$. The map $p \mapsto g_p$ is $C^\infty$. We recommend [7] for further reading on Riemannian geometry. We remark that the operator $d(\nabla f)$ depends on the covariant derivative, defined by the Riemannian connection, which depends on the Christoffel symbols. As shown in Sect. 7 in Chap. 3, if the entries $g_{ij}$ are constants, then the Christoffel symbols vanish. In the context of Riemannian geometry, the operators have the following form: let $V = \sum_{i=1}^{n} V_i e_i$;
$$\nabla f = \sum_{i=1}^{n} \Big(\sum_{j=1}^{n} g^{ij} \frac{\partial f}{\partial x_j}\Big) e_i,$$
$$\mathrm{div}(V) = \frac{1}{\sqrt{\det(g)}} \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\Big(\sqrt{\det(g)}\, V_i\Big),$$
$$\Delta f = \frac{1}{\sqrt{\det(g)}} \sum_{i,j=1}^{n} \frac{\partial}{\partial x_i}\Big(\sqrt{\det(g)}\, g^{ij} \frac{\partial f}{\partial x_j}\Big).$$
In the next chapter, we will give an interpretation of the operators curl and div.
Chapter 5
Vector Integration, Potential Theory
1 Vector Calculus

We will review some operations in vector calculus that motivate the use of differential forms when integrating vector fields. The formalism of differential forms allows us to generalize the Stokes Theorem, to describe the conditions of integrability (Frobenius Theorem), to write Maxwell's equations succinctly, and to obtain topological invariants using differentiable tools, among many other applications.
1.1 Line Integral To review the line integral, we use the Physicist’s concept of Work. Consider F to be a force field acting on the point of mass m. Assume that the magnitude F =| F | of the force field is constant and the point undergoes a displacement on a straight line that is s-units from the initial point. In this case, the work W done by the force F is W = F · s. When a point of mass m moves on a plane, the displacement describes a C 1 curve γ : [a, b] → R2 parametrized by γ = (γ1 , γ2 ). To determine the Work done, we use a polygonal curve γP to approximate γ . To define the polygonal, we use a partition P = {a = t1 , . . . , ti , . . . , tn = b} of an interval [a, b]. Considering Pi = γ (ti ) the vertices, the polygonal is obtained by joining the vertices with the straight line segments Pi−1 Pi . We fix a set of points {ti∗ ; 1 ≤ i ≤ n, ti∗ ∈ [ti−1 , ti ]}.
© Springer Nature Switzerland AG 2021 C. M. Doria, Differentiability in Banach Spaces, Differential Forms and Applications, https://doi.org/10.1007/978-3-030-77834-7_5
Fig. 1 Work
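The Work done along the straight line segment $P_{i-1}P_i$ is approximately $\Delta W_i = \langle F(t_i^*), T(t_i^*) \rangle \Delta s_i$, with $T(t_i^*) = \frac{\gamma'(t_i^*)}{|\gamma'(t_i^*)|}$ the unit tangent vector to $\gamma$ at time $t_i^*$ and, writing $\Delta t_i = t_i - t_{i-1}$,
\[
\Delta s_i = \sqrt{ \Big( \frac{\gamma_1(t_i) - \gamma_1(t_{i-1})}{t_i - t_{i-1}} \Big)^2 + \Big( \frac{\gamma_2(t_i) - \gamma_2(t_{i-1})}{t_i - t_{i-1}} \Big)^2 }\ \Delta t_i
\approx \sqrt{ [\gamma_1'(t_i^*)]^2 + [\gamma_2'(t_i^*)]^2 }\ \Delta t_i = |\gamma'(t_i^*)|\, \Delta t_i .
\]
So the total Work performed along the polygonal line $\gamma_P$ is approximately
\[
W_P = \sum_{i=1}^n \Delta W_i = \sum_{i=1}^n \langle F(t_i^*), T(t_i^*) \rangle \Delta s_i
= \sum_{i=1}^n \langle F(t_i^*), T(t_i^*) \rangle\, |\gamma'(t_i^*)|\, \Delta t_i
\]
\[
= \sum_{i=1}^n \Big\langle F(t_i^*), \frac{\gamma'(t_i^*)}{|\gamma'(t_i^*)|} \Big\rangle\, |\gamma'(t_i^*)|\, \Delta t_i
= \sum_{i=1}^n \langle F(t_i^*), \gamma'(t_i^*) \rangle \Delta t_i .
\]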
Therefore, when the trajectory of a point of mass $m$ is a curve parametrized by $\gamma$, the Work $W$ performed along $\gamma$ is approximately the Work $W_P$ done on $\gamma_P$. Intuitively, when $n \to \infty$, the value of $\|P\| = \sup_{1 \le i \le n} |t_i - t_{i-1}|$ becomes arbitrarily small and the value of $W_P$ becomes close to the value of $W$. Now we define
\[
W = \lim_{\|P\| \to 0} W_P .
\]
If the limit exists, then
\[
W = \int_a^b \langle F(t), \gamma'(t) \rangle\, dt. \tag{1}
\]
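Taking $F = F_1 e_1 + F_2 e_2$ and $\gamma = (\gamma_1, \gamma_2)$, we have
\[
\langle F(t), \gamma'(t) \rangle\, dt = F_1(t)\gamma_1'(t)\,dt + F_2(t)\gamma_2'(t)\,dt .
\]
The starting point for differential forms is to define the linear functionals $dx_1, dx_2$ as follows: let $\beta = \{e_1, e_2\}$ be the canonical basis of $\mathbb{R}^2$ and set $dx_1(e_i) = \delta_{1i}$, $dx_2(e_i) = \delta_{2i}$. Then for any vectors $u = u_1 e_1 + u_2 e_2$, $v = v_1 e_1 + v_2 e_2 \in T_p\mathbb{R}^2$, $r, s \in \mathbb{R}$ and $p \in \mathbb{R}^2$, we get
\[
dx_1(ru + sv) = r u_1 + s v_1, \qquad dx_2(ru + sv) = r u_2 + s v_2 .
\]
By defining $d\gamma = \frac{d\gamma}{dt}\,dt = (\gamma_1'\,dt, \gamma_2'\,dt)$, we have
\[
dx_1(d\gamma) = \gamma_1'(t)\,dt \quad \text{and} \quad dx_2(d\gamma) = \gamma_2'(t)\,dt .
\]
Therefore
\[
W = \int_a^b \langle F(\gamma(t)), \gamma'(t) \rangle\, dt
= \int_a^b \big[ F_1(\gamma(t))\gamma_1'(t) + F_2(\gamma(t))\gamma_2'(t) \big]\, dt
= \int_a^b \big\{ F_1(\gamma(t))\,dx_1(d\gamma) + F_2(\gamma(t))\,dx_2(d\gamma) \big\}. \tag{2}
\]
At every point $\gamma(t) \in U$, consider the linear functional $w_{\gamma(t)} : T_{\gamma(t)}U \to \mathbb{R}$ given by $w_{\gamma(t)} = F_1(\gamma(t))\,dx_1 + F_2(\gamma(t))\,dx_2$, so that the integrand of Eq. (2) is $w_{\gamma(t)}(d\gamma)$. Using this formalism, Eq. (2) is
\[
W = \int_\gamma w. \tag{3}
\]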
The Work done along $\gamma$ is obtained as follows:
\[
W = \int_\gamma w = \int_\gamma F_1\,dx_1 + F_2\,dx_2 = \int_a^b \{F_1\,dx_1 + F_2\,dx_2\}_{\gamma(t)}
\]
\[
= \int_a^b \big[ F_1(\gamma(t))\,dx_1(\gamma'(t)) + F_2(\gamma(t))\,dx_2(\gamma'(t)) \big]\, dt
= \int_a^b \big[ F_1(\gamma(t))\gamma_1'(t) + F_2(\gamma(t))\gamma_2'(t) \big]\, dt .
\]
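The Riemann-sum construction of the Work can be tried numerically. The following is only a minimal sketch: the force field $F(x, y) = (y, 2x)$, the quarter circle $\gamma(t) = (\cos t, \sin t)$, $t \in [0, \pi/2]$, and the helper name `work` are illustrative assumptions, not taken from the text. The sample points $t_i^*$ are chosen as midpoints of the subintervals.

```python
import math

def work(F, gamma, dgamma, a, b, n=2000):
    """Midpoint Riemann sum for W = int_a^b <F(gamma(t)), gamma'(t)> dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h          # sample point t_i^* in [t_{i-1}, t_i]
        x, y = gamma(t)
        F1, F2 = F(x, y)
        d1, d2 = dgamma(t)
        total += (F1 * d1 + F2 * d2) * h
    return total

# Hypothetical example data (not from the text)
F = lambda x, y: (y, 2.0 * x)
gamma = lambda t: (math.cos(t), math.sin(t))
dgamma = lambda t: (-math.sin(t), math.cos(t))

W = work(F, gamma, dgamma, 0.0, math.pi / 2)
print(W, math.pi / 4)
```

For this choice the integrand is $-\sin^2 t + 2\cos^2 t$, whose integral over $[0, \pi/2]$ is $\pi/4$, so the two printed values should agree to several digits.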
1.2 Surface Integral
We will consider the following physical problem: find the flow of a vector field $V$ through a bounded surface $S \subset \mathbb{R}^3$. The vector field could be an electric field, the velocity of a fluid, an electric current, among many other examples. Let $\phi : U \to \mathbb{R}^3$ be a $C^\infty$-parametrization of $S$ given in coordinates as
\[
\phi(u, v) = \big( \phi_1(u, v), \phi_2(u, v), \phi_3(u, v) \big), \quad (u, v) \in U .
\]
We assume $S$ is an orientable surface, which means we have a nowhere-vanishing normal $C^\infty$-vector field $N$ given by $N = \phi_u \times \phi_v$, in which $\phi_u = \partial_u \phi = \frac{\partial \phi}{\partial u}$, $\phi_v = \partial_v \phi = \frac{\partial \phi}{\partial v}$ and
\[
\phi_u \times \phi_v = \big( \partial_u\phi_2\,\partial_v\phi_3 - \partial_u\phi_3\,\partial_v\phi_2,\ \partial_u\phi_3\,\partial_v\phi_1 - \partial_u\phi_1\,\partial_v\phi_3,\ \partial_u\phi_1\,\partial_v\phi_2 - \partial_u\phi_2\,\partial_v\phi_1 \big).
\]
The unit normal vector is $\hat{n} = \frac{N}{|N|}$. The idea to define the flow through $S$ is to approximate $S$ by a polyhedron $S_P$ with faces that are images of parallelograms by the parametrization of $S$. If the vector field $V$ is tangent to $S$, then the flow of $V$ across the surface is null. Intuitively, we consider the case when people are walking parallel to a wall with a door (orthogonally to the wall's normal direction); in this case, there is no flow of people through the door. This example gives good motivation for understanding the flow through $S$.
To define $S_P$, we consider a partition $P$ of $U$ by taking a subdivision of $U$ into small rectangles $\hat{R}_i$, each with area $\Delta A_i$, $i = 1, \dots, n$. For each $i$, we fix an arbitrary point $P_i^* = (u_i^*, v_i^*) \in \hat{R}_i$. The images of the vertices of the rectangle $\hat{R}_i$ define the vertices of the polyhedron $S_P$. Let's consider $U = [a, b] \times [c, d]$. Taking partitions $P_u = \{a = u_0, \dots, u_n = b\}$ and $P_v = \{c = v_0, \dots, v_m = d\}$, we define $P = P_u \times P_v = \{(u_k, v_l) \mid 0 \le k \le n,\ 0 \le l \le m\}$, i.e., (see Fig. 2)
\[
U = \bigcup_{k=1}^{n} \bigcup_{l=1}^{m} [u_{k-1}, u_k] \times [v_{l-1}, v_l] .
\]
Now we fix an order on the set of rectangles $R_{kl} = [u_{k-1}, u_k] \times [v_{l-1}, v_l]$, so that they are indexed by a single index, and we can write $U = \cup_i R_i$. Each face of the polyhedron $S_P$, denoted by $S_i$, is determined by the image of the vertices of $R_i$. For $S_i$, with area $\Delta S_i$, we approximate the vector field $V$ by the constant value $V(u_i^*, v_i^*)$, given that $(u_i^*, v_i^*) \in \hat{R}_i$. We define the flow through $S_i$ to be
\[
\Delta\Phi_i = \langle V(u_i^*, v_i^*), \hat{n}(u_i^*, v_i^*) \rangle\, \Delta S_i, \qquad \hat{n} = \frac{N}{|N|}. \tag{4}
\]
Fig. 2 Polyhedral surface
The area of $S_i$ is approximately $\Delta S_i = |N(u_i^*, v_i^*)|\, \Delta A_i$. So now the total flow through $S_P$ is approximately
\[
\Phi_{S_P} = \sum_{i=1}^n \Delta\Phi_i .
\]
Let $\|P\| = \sup_{1 \le i \le n} \Delta A_i$ be the measure of the largest mesh. Taking the limit $\|P\| \to 0$, the flow through the polyhedron approximates the flow through $S$. We define the total flow through $S$ as the limit
\[
\Phi_S = \lim_{\|P\| \to 0} \Phi_{S_P}. \tag{5}
\]
The limit above always exists whenever $V$ is a continuous vector field and $S$ is a $C^\infty$-surface.
Definition 1 Let $S$ be an orientable compact surface embedded in $\mathbb{R}^3$ and parametrized by a $C^\infty$-map $\phi : U \to \mathbb{R}^3$. The flow of $V$ through $S$ is
\[
\Phi_S = \int_U \langle V(u, v), N(u, v) \rangle\, dA .
\]
The integral formula is obtained from Eqs. (4) and (5) since
\[
\Phi_S = \int_U \Big\langle V(u, v), \frac{N(u, v)}{|N(u, v)|} \Big\rangle\, |N(u, v)|\, dA = \int_U \langle V(u, v), N(u, v) \rangle\, dA .
\]
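The flow integral of Definition 1 can be approximated by the same kind of Riemann sum over rectangles of $U$. The sketch below is illustrative only: the surface (upper unit hemisphere in spherical coordinates), the field $V(x, y, z) = (x, y, z)$ and the function names are assumptions, not taken from the text. The normal $N = \phi_u \times \phi_v$ is entered by hand for this parametrization.

```python
import math

def flux(V, phi, normal, u_range, v_range, n=200):
    """Midpoint Riemann sum for the flow  int_U <V(phi(u,v)), N(u,v)> dA."""
    (a, b), (c, d) = u_range, v_range
    du, dv = (b - a) / n, (d - c) / n
    total = 0.0
    for k in range(n):
        u = a + (k + 0.5) * du
        for l in range(n):
            v = c + (l + 0.5) * dv
            Vx, Vy, Vz = V(*phi(u, v))
            Nx, Ny, Nz = normal(u, v)
            total += (Vx * Nx + Vy * Ny + Vz * Nz) * du * dv
    return total

# Hypothetical example: upper unit hemisphere
def phi(u, v):
    return (math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u))

def normal(u, v):
    # N = phi_u x phi_v, computed by hand for the parametrization above
    s = math.sin(u)
    return (s * s * math.cos(v), s * s * math.sin(v), s * math.cos(u))

V = lambda x, y, z: (x, y, z)

Phi = flux(V, phi, normal, (0.0, math.pi / 2), (0.0, 2 * math.pi))
print(Phi, 2 * math.pi)
```

Here $\langle V, N \rangle = \sin u$, so the exact flow is $2\pi$, the area of the hemisphere, which is what one expects for a unit-length field normal to the surface.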
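In coordinates, we have
\[
\langle V(u, v), N(u, v) \rangle = V_1(u, v)N_1(u, v) + V_2(u, v)N_2(u, v) + V_3(u, v)N_3(u, v)
\]
and
\[
\begin{aligned}
V_1 N_1 &= V_1 \big( \partial_u\phi_2\,\partial_v\phi_3 - \partial_u\phi_3\,\partial_v\phi_2 \big), \\
V_2 N_2 &= V_2 \big( \partial_u\phi_3\,\partial_v\phi_1 - \partial_u\phi_1\,\partial_v\phi_3 \big), \\
V_3 N_3 &= V_3 \big( \partial_u\phi_1\,\partial_v\phi_2 - \partial_u\phi_2\,\partial_v\phi_1 \big).
\end{aligned} \tag{6}
\]
Similar to the line integral, we fix a basis $\beta = \{e_1, e_2, e_3\}$ of $\mathbb{R}^3$ and define the linear functionals $dx_i : T_p\mathbb{R}^3 \to \mathbb{R}$, $i = 1, 2, 3$. These functionals form the dual basis $\beta^* = \{dx_1, dx_2, dx_3\}$ of $\beta$, i.e., $dx_i(e_j) = \delta_{ij}$. We now consider the following skew-symmetric bilinear functions: taking the vectors $u = \sum_i u_i e_i$ and $v = \sum_j v_j e_j$ in $T_p\mathbb{R}^3$, we consider the product
\[
dx_i \wedge dx_j : T_p\mathbb{R}^3 \times T_p\mathbb{R}^3 \to \mathbb{R}, \qquad
dx_i \wedge dx_j(u, v) = dx_i(u)\,dx_j(v) - dx_i(v)\,dx_j(u) = u_i v_j - v_i u_j .
\]
We note $dx_i \wedge dx_j = -dx_j \wedge dx_i$; in particular $dx_i \wedge dx_i = 0$. Finally, we can write Eq. (6) as
\[
\begin{aligned}
V_1 N_1 &= V_1\, dx_2 \wedge dx_3 (\partial_u\phi, \partial_v\phi), \\
V_2 N_2 &= V_2\, dx_3 \wedge dx_1 (\partial_u\phi, \partial_v\phi), \\
V_3 N_3 &= V_3\, dx_1 \wedge dx_2 (\partial_u\phi, \partial_v\phi).
\end{aligned} \tag{7}
\]
Therefore
\[
\Phi_S = \int_U \big( V_1\, dx_2 \wedge dx_3 + V_2\, dx_3 \wedge dx_1 + V_3\, dx_1 \wedge dx_2 \big)(\partial_u\phi, \partial_v\phi)\, du\, dv. \tag{8}
\]
Considering $w = V_1\, dx_2 \wedge dx_3 + V_2\, dx_3 \wedge dx_1 + V_3\, dx_1 \wedge dx_2$, the flow is written as
\[
\Phi_S = \int_S w. \tag{9}
\]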
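The identification in Eq. (7) says that the three numbers $dx_2 \wedge dx_3$, $dx_3 \wedge dx_1$, $dx_1 \wedge dx_2$ evaluated on a pair of vectors are the components of their cross product. A small sketch checking this on sample vectors follows; the function names `wedge` and `cross` and the test vectors are illustrative assumptions, and indices are 0-based in the code.

```python
def wedge(i, j):
    """dx_i ^ dx_j as a bilinear function on R^3 (0-based indices)."""
    return lambda u, v: u[i] * v[j] - v[i] * u[j]

def cross(u, v):
    """Usual cross product in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Hypothetical sample vectors
u = (1.0, 2.0, 3.0)
v = (-1.0, 0.5, 4.0)

# (dx2^dx3, dx3^dx1, dx1^dx2) in the 1-based notation of the text
components = (wedge(1, 2)(u, v), wedge(2, 0)(u, v), wedge(0, 1)(u, v))
print(components, cross(u, v))
```

The same helper also makes the skew-symmetry visible: `wedge(0, 1)(u, v)` and `wedge(1, 0)(u, v)` differ by a sign, and `wedge(0, 0)` vanishes identically.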
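Exercises
(i) Show that every vector $V$ tangent to $S$ can be written as the linear combination $V = v_1 \partial_u\phi + v_2 \partial_v\phi$.
(ii) The differential of the map $\phi : U \to \mathbb{R}^3$, at $(u, v)$, is $d\phi_{(u,v)} : T_{(u,v)}\mathbb{R}^2 \to T_{\phi(u,v)}\mathbb{R}^3$. The Jacobian matrix is
\[
d\phi_{(u,v)} = \begin{pmatrix} \partial_u\phi_1 & \partial_v\phi_1 \\ \partial_u\phi_2 & \partial_v\phi_2 \\ \partial_u\phi_3 & \partial_v\phi_3 \end{pmatrix}. \tag{10}
\]
Show that the normal vector $N = \phi_u \times \phi_v$ is given in coordinates as
\[
N(u, v) = \Big( \frac{\partial(\phi_2, \phi_3)}{\partial(u, v)},\ \frac{\partial(\phi_3, \phi_1)}{\partial(u, v)},\ \frac{\partial(\phi_1, \phi_2)}{\partial(u, v)} \Big),
\]
and each coordinate is a $2 \times 2$ minor of the matrix (10).
(iii) Let $S$ be a surface parametrized by a $C^\infty$-map $\phi : U \to \mathbb{R}^3$. Consider $w_1, w_2$ tangent to $S$. Find the products $dx_i \wedge dx_j(w_1, w_2)$, for $i, j = 1, 2, 3$. Use the fact that $T_pS = \mathrm{Im}(d\phi)$ and $d\phi = (\partial_u\phi)\,du + (\partial_v\phi)\,dv$.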
2 Classical Theorems of Integration
We will state the classical integration theorems in a convenient way for our purposes. Let's start with the Fundamental Theorem of Calculus. Let $f : [a, b] \to \mathbb{R}$ be a continuous function and
\[
g(x) = \int_a^x f(t)\,dt .
\]
So $g'(x) = f(x)$ and
\[
\int_a^b f(t)\,dt = g(b) - g(a). \tag{11}
\]
The boundary of $[a, b]$ is the set $\partial([a, b]) = \{a, b\}$. By orienting the boundary,¹ we have $\partial[a, b] = b - a$. Therefore we can write the formula (11) as
\[
\int_{[a,b]} g'(t)\,dt = \int_{\partial([a,b])} g. \tag{12}
\]
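The theorems about vector field integration generalize (12). To state them, we will use the differential operators gradient, curl or divergence, as defined in Chap. 4. The curl operator can be defined for planar vector fields $V = v_1 e_1 + v_2 e_2$, considering that $v_3 = 0$.
¹ The orientation will be discussed in Chap. 7.
Theorem 1 (Green's Theorem) Let $\Omega \subset \mathbb{R}^2$ be a compact domain with a boundary being a $C^1$-curve. If $P, Q : \Omega \to \mathbb{R}$ are $C^1$-functions, then
\[
\int_\Omega \Big( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \Big)\,dx\,dy = \int_{\partial\Omega} P\,dx + Q\,dy, \tag{13}
\]
that is, for $F = P e_1 + Q e_2$,
\[
\int_\Omega \mathrm{curl}(F)\,dA = \int_{\partial\Omega} F. \tag{14}
\]
Theorem 2 (Stokes' Theorem) Let $S \subset \mathbb{R}^3$ be a compact surface, with a boundary $\partial S$ that is a $C^1$-curve. If $P, Q, R : \Omega \to \mathbb{R}$ are $C^1$-functions defined on an open set $\Omega$ containing $S$, then
\[
\int_S \Big( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \Big)\,dx\,dy + \Big( \frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z} \Big)\,dy\,dz + \Big( \frac{\partial P}{\partial z} - \frac{\partial R}{\partial x} \Big)\,dz\,dx = \int_{\partial S} P\,dx + Q\,dy + R\,dz,
\]
that is, for $F = P e_1 + Q e_2 + R e_3$,
\[
\int_S \mathrm{curl}(F)\,dA = \int_{\partial S} F. \tag{15}
\]
Theorem 3 (Gauss' Theorem) Let $\Omega \subset \mathbb{R}^3$ be a compact domain with a boundary that is a closed $C^1$-surface. If $P, Q, R : \Omega \to \mathbb{R}$ are $C^1$-functions, then
\[
\int_\Omega \Big( \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z} \Big)\,dx\,dy\,dz = \int_{\partial\Omega} P\,dy\,dz + Q\,dz\,dx + R\,dx\,dy,
\]
that is,
\[
\int_\Omega \mathrm{div}(F)\,dV = \int_{\partial\Omega} F. \tag{16}
\]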
We note that in each of the identities (12), (14), (15) and (16), the integral over a region of a quantity obtained by applying a differential operator (the derivative, curl or div) is determined by an integral over the boundary of the region. Differential forms will allow us to show that the above theorems are all particular cases of the same theorem, the Stokes Theorem, which can be extended to any finite dimension $n$.
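Green's identity (13) can be checked numerically on a simple domain. The sketch below is an illustration under stated assumptions: the domain is the unit square $[0,1]^2$ (whose boundary is only piecewise $C^1$), the functions are $P = -y^3$, $Q = x^3$, and the helper names are hypothetical. Both sides should be close to $2$.

```python
def double_integral(g, n=400):
    """Midpoint rule for the integral of g over the unit square."""
    h = 1.0 / n
    return sum(g((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

def boundary_integral(P, Q, n=4000):
    """Integral of P dx + Q dy along the counterclockwise boundary of the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += P(s, 0.0) * h   # bottom edge, left to right:  dx = +ds
        total += Q(1.0, s) * h   # right edge, bottom to top:   dy = +ds
        total -= P(s, 1.0) * h   # top edge, right to left:     dx = -ds
        total -= Q(0.0, s) * h   # left edge, top to bottom:    dy = -ds
    return total

# Hypothetical example data
P = lambda x, y: -y ** 3
Q = lambda x, y: x ** 3
curl = lambda x, y: 3 * x ** 2 + 3 * y ** 2   # dQ/dx - dP/dy

lhs = double_integral(curl)
rhs = boundary_integral(P, Q)
print(lhs, rhs)
```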
2.1 Interpretation of the Curl and Div Operators
The curl and divergence of a vector field in $\mathbb{R}^3$ arise in several mathematical models, so we give an interpretation for both. Let $F : \mathbb{R}^3 \to \mathbb{R}^3$ be a $C^\infty$ vector field. As maps we have $\mathrm{curl}(F) \in C^\infty(\mathbb{R}^3; \mathbb{R}^3)$ and $\mathrm{div}(F) \in C^\infty(\mathbb{R}^3; \mathbb{R})$.
(1) Curl. Let $S_t(a)$ be a disc of radius $t > 0$ centered at $a$, let $A(t)$ be the area of $S_t(a)$ and let $C(t) = \gamma_t(a) = \partial S_t(a)$ be the circle defined by the boundary. If $n$ is the unit normal vector to $S_t(a)$ and $T$ is the unit tangent vector to $C(t)$ so that $\{n, T, n \times T\}$ is a positive basis of $\mathbb{R}^3$, then
\[
\langle n, \mathrm{curl}(F)(a) \rangle = \lim_{t \to 0} \frac{1}{A(t)} \int_{\gamma_t(a)} \langle F, T \rangle\, ds. \tag{17}
\]
Proof The integral on the right side of the identity (17) measures the circulation of the field $F$ along the curve $C(t)$. The term $\langle \mathrm{curl}(F)(a), n \rangle$ is the density of the circulation of $F$ at $a$ with respect to a plane perpendicular to $n$ at $a$. To obtain (17), we define the continuous function $\phi(x) = \langle \mathrm{curl}(F)(x), n \rangle$. Given $\epsilon > 0$, there is $\delta > 0$ such that if $x \in B_\delta(a)$, then $|\phi(x) - \phi(a)| < \epsilon$. So
\[
\phi(a) \cdot A(t) = \int_{S_t(a)} \phi(x)\,dS + \int_{S_t(a)} [\phi(a) - \phi(x)]\,dS .
\]
When we apply the Stokes Theorem, it follows that, for $t < \delta$,
\[
\Big| \phi(a) \cdot A(t) - \int_{\gamma_t(a)} \langle F, T \rangle\, d\gamma \Big| \le \int_{S_t(a)} |\phi(a) - \phi(x)|\,dS < \epsilon \cdot A(t) .
\]
In Fig. 3, we have curl(F)(A) = 0, curl(F)(B) = 0, curl(F)(C) = 0, curl(F)(D) = 0 and curl(F)(E) = 0.
(2) Divergence. Let $a \in \mathbb{R}^3$, let $B_t(a)$ be a ball of radius $t > 0$ centered at $a$, and let $S_t(a) = \partial B_t(a)$ be the sphere defined by the boundary of $B_t(a)$. If $V(t)$ is the volume of $B_t(a)$ and $n$ is the outward unit normal vector to $S_t(a)$, then
\[
\mathrm{div}(F)(a) = \lim_{t \to 0} \frac{1}{V(t)} \int_{S_t(a)} \langle F, n \rangle\, dS. \tag{18}
\]
Proof The integral on the right-hand side of Identity (18) measures the flow of $F$ across the surface $S_t(a)$, which means the divergence of $F$ at $a$ measures the density of the flow of $F$ at $a$. We recall that the flow measures the amount of $F$ exiting through $S_t(a)$ diminished by the amount of $F$ entering through $S_t(a)$. Indeed, the amount entering yields $\langle F, n \rangle < 0$ while the amount exiting yields $\langle F, n \rangle > 0$. Therefore if $\mathrm{div}(F)(a) = 0$, then there is no net flow; the flow that comes in also goes out. To prove Identity (18), we consider the continuous function $\phi = \mathrm{div}(F)$. Given $\epsilon > 0$, there is $\delta > 0$ such that if $x \in B_\delta(a)$, then $|\phi(x) - \phi(a)| < \epsilon$. So
\[
\phi(a) \cdot V(t) = \int_{B_t(a)} \phi(x)\,dV + \int_{B_t(a)} [\phi(a) - \phi(x)]\,dV .
\]
Fig. 3 Curl
Fig. 4 Divergence
When we apply the Gauss Theorem, we have, for $t < \delta$,
\[
\Big| \phi(a) \cdot V(t) - \int_{S_t(a)} \langle F, n \rangle\, dS \Big| \le \int_{B_t(a)} |\phi(a) - \phi(x)|\,dV < \epsilon \cdot V(t) .
\]
In Fig. 4, we can see div(F) = 0 in the regions where the flow lines enter through the boundary and later leave. For the regions in which the flow lines are either only entering or only leaving, we have div(F) ≠ 0.
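Identity (18) can be explored numerically by computing the flow through a small sphere and dividing by the volume of the ball. The sketch below rests on illustrative assumptions: the field $F(x, y, z) = (xy, y, z^2)$, the point $a = (1, 2, 3)$, the radius $t = 0.1$ and the function name `flux_density` are not from the text. For this field $\mathrm{div}(F) = y + 1 + 2z$, so the expected density at $a$ is $9$.

```python
import math

def flux_density(F, a, t, n=100):
    """(1 / V(t)) * flow of F through the sphere of radius t centered at a.

    Uses a midpoint rule in spherical angles; n is the (hypothetical)
    number of subdivisions in the polar angle.
    """
    dth = math.pi / n
    dph = 2 * math.pi / (2 * n)
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * dth
        for j in range(2 * n):
            ph = (j + 0.5) * dph
            # outward unit normal at the sample point of the sphere
            nx = math.sin(th) * math.cos(ph)
            ny = math.sin(th) * math.sin(ph)
            nz = math.cos(th)
            Fx, Fy, Fz = F(a[0] + t * nx, a[1] + t * ny, a[2] + t * nz)
            # dS = t^2 sin(th) dth dph
            total += (Fx * nx + Fy * ny + Fz * nz) * t * t * math.sin(th) * dth * dph
    volume = 4.0 / 3.0 * math.pi * t ** 3
    return total / volume

# Hypothetical example data
F = lambda x, y, z: (x * y, y, z * z)
a = (1.0, 2.0, 3.0)

density = flux_density(F, a, 0.1)
print(density)
```

Because the divergence of this particular field is an affine function, its average over any ball centered at $a$ equals its value at $a$, so the quotient is close to $9$ even before taking the limit $t \to 0$.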
Exercise
(1) A vector field $F$ is a solenoidal field if $\mathrm{div}(F) = 0$. Show that the radial field $F(r) = \frac{r}{|r|^3}$ is solenoidal. Show there is no field $V$ such that $F = \mathrm{curl}(V)$.
3 Elementary Aspects of the Theory of Potential
In this section, we will address questions arising from the relationship between the linear differential operators grad, curl and div and the integration theorems.
Question 1: Let $f : [a, b] \to \mathbb{R}$ be a continuous function. Is there a function $g : [a, b] \to \mathbb{R}$ such that $g'(x) = f(x)$? If it exists, $g$ is called a primitive of $f$.
Answer 1: Yes, it does exist because of the Fundamental Theorem of Calculus. The function $g$ is
\[
g(x) = \int_a^x f(t)\,dt .
\]
When we formulate the same question for maps, the answer is not always positive. Let $\Omega \subset \mathbb{R}^n$ be an open subset and $f : \Omega \to \mathbb{R}$ a $C^1$-map. We identify the differential $df$ with the gradient vector as follows: at every point $p \in \Omega$, we have $df_p(\cdot) = \langle \nabla f(p), \cdot \rangle$, and $\nabla f = \sum_{i=1}^n \frac{\partial f}{\partial x_i} e_i$.
Question 2: Let $V = \sum_{i=1}^n v_i e_i : \Omega \to \mathbb{R}^n$ be a $C^1$-vector field defined on $\Omega$. Is there a differentiable function $f : \Omega \to \mathbb{R}$ such that $V = \nabla f$? Equivalently, does there exist a function $f$ such that
\[
v_i = \frac{\partial f}{\partial x_i}, \quad \text{for all } i = 1, \dots, n\,? \tag{19}
\]
To answer the 2nd question, there is an obstruction to the existence of $f$ due to the identities
\[
\frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial^2 f}{\partial x_i \partial x_j}, \quad \text{for all } i, j,
\]
because they yield $\frac{\partial v_i}{\partial x_j} = \frac{\partial v_j}{\partial x_i}$ for all $i, j$. Therefore the 2nd question should be stated as follows:
Question 2': Let $V = \sum_i v_i e_i$ be a $C^1$-vector field satisfying the identities $\frac{\partial v_i}{\partial x_j} = \frac{\partial v_j}{\partial x_i}$ for all $i, j$. Is there a function $f : \Omega \to \mathbb{R}$ such that $\nabla f = V$? When such an $f$ does exist, it is called the potential function of $V$.
Let's work out a counterexample to Question 2'.
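Example 1: Consider the vector field $V : \mathbb{R}^2 \setminus \{0\} \to \mathbb{R}^2$ given by
\[
V(x, y) = \Big( \frac{-y}{x^2 + y^2},\ \frac{x}{x^2 + y^2} \Big).
\]
Suppose we have a function $f : \mathbb{R}^2 \setminus \{0\} \to \mathbb{R}$ such that $\nabla f = V$. Let $\gamma : [0, 2\pi] \to \mathbb{R}^2$ be the curve $\gamma(\theta) = (\cos(\theta), \sin(\theta))$ and let $f_\gamma : [0, 2\pi] \to \mathbb{R}$ be the function $f_\gamma(\theta) = f \circ \gamma(\theta)$. By the chain rule,
\[
\frac{df_\gamma}{d\theta}(\theta) = \nabla f(\gamma(\theta)) \cdot \gamma'(\theta) = V(\cos(\theta), \sin(\theta)) \cdot \gamma'(\theta)
= \langle (-\sin(\theta), \cos(\theta)), (-\sin(\theta), \cos(\theta)) \rangle = 1 .
\]
The line integral $\int_\gamma V$ can be calculated by two different methods:
(i) using the definition;
\[
\int_\gamma V = \int_0^{2\pi} \langle V(\gamma(\theta)), \gamma'(\theta) \rangle\, d\theta = 2\pi ;
\]
(ii) using $V = \mathrm{grad}(f)$;
\[
\int_\gamma V = \int_0^{2\pi} \langle V(\gamma(\theta)), \gamma'(\theta) \rangle\, d\theta
= \int_0^{2\pi} \langle \mathrm{grad}(f)(\gamma(\theta)), \gamma'(\theta) \rangle\, d\theta
= \int_0^{2\pi} \frac{df_\gamma}{d\theta}(\theta)\, d\theta = f_\gamma(2\pi) - f_\gamma(0) = 0 .
\]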
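The value obtained in (i), together with the fact that $V$ satisfies the symmetry condition of Question 2', can be checked numerically. This is a minimal sketch; the function names, the sample point and the finite-difference step are illustrative assumptions.

```python
import math

def V(x, y):
    """The field of Example 1, defined away from the origin."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def circulation(radius=1.0, n=1000):
    """Midpoint rule for the line integral of V along the circle of given radius."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * h
        x, y = radius * math.cos(th), radius * math.sin(th)
        vx, vy = V(x, y)
        total += (vx * (-radius * math.sin(th)) + vy * (radius * math.cos(th))) * h
    return total

def closedness_defect(x, y, h=1e-5):
    """Central-difference estimate of dv1/dy - dv2/dx at (x, y)."""
    dv1_dy = (V(x, y + h)[0] - V(x, y - h)[0]) / (2 * h)
    dv2_dx = (V(x + h, y)[1] - V(x - h, y)[1]) / (2 * h)
    return dv1_dy - dv2_dx

print(circulation(), 2 * math.pi)      # circulation around the origin
print(closedness_defect(0.7, -0.4))    # close to zero
```

The circulation does not depend on the radius, which can be seen by calling `circulation(radius=3.0)`.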
As the values found are distinct, we conclude that the vector field $V$ does not admit a potential function. $V$ is defined in the region $\mathbb{R}^2 \setminus \{0\}$, and we cannot extend it to a $C^1$-vector field on $\mathbb{R}^2$. So we have learned that the topology of the region matters in order to answer the question as to whether a potential function exists.
Definition 2 A subset $\Omega \subset \mathbb{R}^n$ is star-shaped with respect to a point $x_0 \in \Omega$ if the segment $\{t x_0 + (1 - t)x \mid t \in [0, 1]\}$ is contained in $\Omega$ for all $x \in \Omega$ (see Fig. 5).
Remark 4 In Chap. 3, vector fields that admit a potential function are called conservative fields. They play a fundamental role in physics.
Theorem 4 Let $\Omega \subset \mathbb{R}^n$ be an open star-shaped subset. If the $C^1$-vector field $V : \Omega \to \mathbb{R}^n$, $V = (v_1, v_2, \dots, v_n)$, satisfies $\frac{\partial v_i}{\partial x_j} = \frac{\partial v_j}{\partial x_i}$ for all $1 \le i, j \le n$, then there is a potential function $f$ such that $V = \nabla f$.
Proof We prove the case $n = 2$; the general case can be proved using the same arguments. Without loss of generality, assume $x_0 = 0 \in \Omega$. Consider the function $f : \Omega \to \mathbb{R}$ given by
Fig. 5 Star-shaped set
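\[
f(x_1, x_2) = \int_0^1 \big[ x_1 v_1(t x_1, t x_2) + x_2 v_2(t x_1, t x_2) \big]\, dt .
\]
Since $\Omega$ is star-shaped with respect to $0$, $f(x_1, x_2)$ is well-defined for all $(x_1, x_2) \in \Omega$. By differentiating $f$, we have
\[
\frac{\partial f}{\partial x_1}(x_1, x_2) = \int_0^1 \Big[ v_1(t x_1, t x_2) + t x_1 \frac{\partial v_1}{\partial x_1}(t x_1, t x_2) + t x_2 \frac{\partial v_2}{\partial x_1}(t x_1, t x_2) \Big]\, dt .
\]
It follows from the identity
\[
\frac{d}{dt}\big[ t\, v_1(t x_1, t x_2) \big] = v_1(t x_1, t x_2) + t x_1 \frac{\partial v_1}{\partial x_1}(t x_1, t x_2) + t x_2 \frac{\partial v_1}{\partial x_2}(t x_1, t x_2)
\]
that
\[
\frac{\partial f}{\partial x_1}(x_1, x_2) = \int_0^1 \frac{d}{dt}\big[ t\, v_1(t x_1, t x_2) \big]\, dt + \int_0^1 t x_2 \Big[ \frac{\partial v_2}{\partial x_1}(t x_1, t x_2) - \frac{\partial v_1}{\partial x_2}(t x_1, t x_2) \Big]\, dt
= t\, v_1(t x_1, t x_2) \Big|_0^1 = v_1(x_1, x_2) .
\]
Analogously, we have $\frac{\partial f}{\partial x_2} = v_2$.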
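The potential constructed in the proof can be evaluated numerically and compared with the field. The sketch below assumes the illustrative field $V = (2x_1x_2, x_1^2)$ on $\mathbb{R}^2$ (which satisfies $\partial v_1/\partial x_2 = \partial v_2/\partial x_1$ and has potential $x_1^2 x_2$); the function names and the evaluation point are hypothetical.

```python
def potential(v1, v2, x1, x2, n=2000):
    """Midpoint rule for f(x) = int_0^1 [x1*v1(t x) + x2*v2(t x)] dt."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += (x1 * v1(t * x1, t * x2) + x2 * v2(t * x1, t * x2)) * h
    return total

def gradient(f, x1, x2, h=1e-4):
    """Central-difference approximation of the gradient of f."""
    return ((f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
            (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h))

# Hypothetical closed field with known potential x1^2 * x2
v1 = lambda x, y: 2 * x * y
v2 = lambda x, y: x * x

f = lambda x, y: potential(v1, v2, x, y)
point = (1.5, -0.5)
print(f(*point), point[0] ** 2 * point[1])          # values of the potential
print(gradient(f, *point), (v1(*point), v2(*point)))  # gradient versus field
```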
The theorem above and Example 1 reveal that the geometric nature of $\Omega$ matters in answering Question 2'. We will introduce some mathematical language by defining vector spaces associated with a region $\Omega \subset \mathbb{R}^3$ that are useful for detecting whether a potential function exists. We consider $\Gamma(\Omega) = C^\infty(\Omega, \mathbb{R}^k)$ the vector space of $C^\infty$-vector fields $V : \Omega \to \mathbb{R}^k$. The grad, curl and div are linear operators satisfying the identities
\[
\text{(i) } \mathrm{curl}(\mathrm{grad}(f)) = 0, \qquad \text{(ii) } \mathrm{div}(\mathrm{curl}(V)) = 0 \tag{20}
\]
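for all $f \in C^\infty(\Omega, \mathbb{R})$ and for all $V \in \Gamma(\Omega)$. For every operator, we define the following vector spaces:
(i) Kernel
\[
\mathrm{Ker}(\mathrm{curl}) = \{V \in \Gamma(\Omega) \mid \mathrm{curl}(V) = 0\}, \qquad \mathrm{Ker}(\mathrm{div}) = \{V \in \Gamma(\Omega) \mid \mathrm{div}(V) = 0\}.
\]
(ii) Image
\[
\mathrm{Im}(\mathrm{grad}) = \{V \in \Gamma(\Omega) \mid V = \mathrm{grad}(f)\}, \qquad \mathrm{Im}(\mathrm{curl}) = \{V \in \Gamma(\Omega) \mid V = \mathrm{curl}(W)\}. \tag{21}
\]
So from the Identities (20) (i) and (ii), we have $\mathrm{Im}(\mathrm{grad}) \subseteq \mathrm{Ker}(\mathrm{curl})$ and $\mathrm{Im}(\mathrm{curl}) \subseteq \mathrm{Ker}(\mathrm{div})$.
Definition 3 Let $\Omega \subset \mathbb{R}^3$. We consider the vector spaces associated with $\Omega$:
\[
V_0(\Omega) = \mathrm{Ker}(\mathrm{grad}), \quad
V_1(\Omega) = \frac{\mathrm{Ker}(\mathrm{curl})}{\mathrm{Im}(\mathrm{grad})}, \quad
V_2(\Omega) = \frac{\mathrm{Ker}(\mathrm{div})}{\mathrm{Im}(\mathrm{curl})}, \quad
V_3(\Omega) = \frac{C^\infty(\Omega)}{\mathrm{Im}(\mathrm{div})}. \tag{22}
\]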
The definition extends to $\mathbb{R}^2 = \mathbb{R}^2 \times \{0\} \subset \mathbb{R}^3$. For an arbitrary $\Omega$, the spaces $V_i(\Omega)$ can be very intricate. The vector spaces $\mathrm{Ker}(\mathrm{curl})$ and $\mathrm{Im}(\mathrm{grad})$ have infinite dimensions, and their quotients are not easy to describe directly. In the following chapter, we will use differential forms to define these spaces, which will allow us to obtain a finite dimensional algebra. With these concepts at hand, we have the following statements:
Theorem 5 Let $\Omega \subset \mathbb{R}^2$ be a star-shaped subset. Then $V_0(\Omega) = \mathbb{R}$, $V_1(\Omega) = V_2(\Omega) = 0$.
Proof Let's check for every dimension.
(1) $V_0(\Omega) = \mathbb{R}$. Suppose that $\mathrm{grad}(f) = 0$. It follows that $f$ is constant in $\Omega$. Therefore the map $V_0(\Omega) \to \mathbb{R}$, $f \mapsto c$, in which $c \in \mathbb{R}$ is the constant value of $f$, defines an isomorphism between vector spaces.
(2) $V_1(\Omega) = 0$. It follows from Theorem 4.
(3) $V_2(\Omega) = 0$. Let $f \in C^\infty(\Omega, \mathbb{R})$ be an arbitrary function and let $V(x, y) = (0, Q(x, y))$ be a $C^\infty$-vector field with $Q(x, y) = \int_0^1 f(tx, y)\,x\,dt$. It is straightforward from
\[
\frac{\partial Q}{\partial x} = \int_0^1 \frac{\partial}{\partial x}\big[ f(tx, y)\,x \big]\, dt = \int_0^1 \frac{d}{dt}\big[ f(tx, y)\,t \big]\, dt = f(x, y)
\]
that $\mathrm{curl}(V) = \frac{\partial Q}{\partial x} = f$. Hence $V_2(\Omega) = 0$.
We call attention to the fact that $V_1(\mathbb{R}^2 \setminus \{0\}) \neq 0$, as shown in Example 1. Next, let's look at the case $\Omega \subset \mathbb{R}^3$.
Theorem 6 If $\Omega \subset \mathbb{R}^3$ is a star-shaped subset, then $V_0(\Omega) = \mathbb{R}$, $V_1(\Omega) = V_2(\Omega) = V_3(\Omega) = 0$.
Proof The reasoning behind proving $V_0(\Omega) = \mathbb{R}$ and $V_1(\Omega) = 0$ is analogous to what was used in the case $\Omega \subset \mathbb{R}^2$. The proofs below show that $V_2(\Omega) = 0$ and $V_3(\Omega) = 0$.
(1) $V_2(\Omega) = 0$. Assume $\Omega$ is star-shaped with respect to the origin. Let $V : \Omega \to \mathbb{R}^3$, $V = \sum_i v_i e_i$, be a vector field such that $\mathrm{div}(V) = 0$ and define the vector field $W : \Omega \to \mathbb{R}^3$ as
\[
W(x) = \int_0^1 V(tx) \times (tx)\, dt,
\]
with $V(tx) \times (tx) = t \cdot (v_2 x_3 - v_3 x_2,\ v_3 x_1 - v_1 x_3,\ v_1 x_2 - v_2 x_1)$, the components $v_i$ being evaluated at $tx$. Since $\mathrm{div}(V) = 0$, we have
\[
\mathrm{curl}\big( V(tx) \times (tx) \big) = \frac{d\big( t^2 V(tx) \big)}{dt} .
\]
As a consequence of the identity
\[
\mathrm{curl}\, W(x) = \int_0^1 \frac{d\big( t^2 V(tx) \big)}{dt}\, dt = V(x),
\]
$V$ is the curl of $W$. Hence $V_2(\Omega) = 0$.
(2) $V_3(\Omega) = 0$. Let $f \in C^\infty(\Omega, \mathbb{R})$ and consider the vector field $V(x, y, z) = (P(x, y, z), 0, 0)$, with $P(x, y, z)$ given by
\[
P(x, y, z) = \int_0^1 f(tx, y, z)\,x\,dt .
\]
Since $\mathrm{div}(V) = f$, then $V_3(\Omega) = 0$.
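The construction of $W$ in part (1) of the proof can be tested numerically on a concrete divergence-free field. The sketch below assumes the illustrative field $V(x, y, z) = (y, z, x)$ on $\mathbb{R}^3$; the function names, the evaluation point, the quadrature size and the finite-difference step are hypothetical choices.

```python
def V(x, y, z):
    """Hypothetical divergence-free field."""
    return (y, z, x)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def W(x, y, z, n=200):
    """Midpoint rule for W(x) = int_0^1 V(t x) x (t x) dt."""
    h = 1.0 / n
    acc = [0.0, 0.0, 0.0]
    for i in range(n):
        t = (i + 0.5) * h
        p = (t * x, t * y, t * z)
        c = cross(V(*p), p)
        for k in range(3):
            acc[k] += c[k] * h
    return tuple(acc)

def curl(G, x, y, z, h=1e-3):
    """Central-difference approximation of curl(G) at (x, y, z)."""
    def d(comp, axis):
        plus, minus = [x, y, z], [x, y, z]
        plus[axis] += h
        minus[axis] -= h
        return (G(*plus)[comp] - G(*minus)[comp]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

point = (0.3, -1.2, 0.8)
print(curl(W, *point), V(*point))
```

For this field one can also compute by hand that $W = \frac{1}{3}(z^2 - xy,\ x^2 - yz,\ y^2 - zx)$, whose curl is $(y, z, x)$.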
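Exercises
(1) If $\Omega \subset \mathbb{R}^3$ has $k$ path connected components, show that $V_0(\Omega) \cong \mathbb{R}^k$.
(2) Let $S = \{(x, y, z) \in \mathbb{R}^3 \mid x^2 + y^2 = 1,\ z = 0\}$ be the planar unit circle in $\mathbb{R}^2 \times \{0\} \subset \mathbb{R}^3$. Consider the vector field $V : \mathbb{R}^3 - S \to \mathbb{R}^3$ given by
\[
V(x, y, z) = \Big( \frac{-2xz}{z^2 + (x^2 + y^2 - 1)^2},\ \frac{-2yz}{z^2 + (x^2 + y^2 - 1)^2},\ \frac{x^2 + y^2 - 1}{z^2 + (x^2 + y^2 - 1)^2} \Big).
\]
(a) Show that $V \in V_1(\mathbb{R}^3 \setminus S)$.
(b) Find the integral $\int_\gamma V$, given that $\gamma : [-\pi, \pi] \to \mathbb{R}^3$ is $\gamma(t) = (1 + \cos(t), 0, \sin(t))$.
(c) Suppose $V = \nabla f$ and find the line integral $\int_\gamma \nabla f$.
(d) Show that $H^1_{DR}(\mathbb{R}^3 \setminus S^1) \neq 0$.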
Chapter 6
Differential Forms, Stokes Theorem
We will introduce the algebra $\Omega^*(U)$ of differential forms on an open subset $U \subset \mathbb{R}^n$, although the formalism to define it on a submanifold of $\mathbb{R}^n$ is the same. The main purpose is to prove the Stokes Theorem and to define De Rham Cohomology. We will start with an introduction to the pointwise concepts, which later will be extended to an open set $U$.
1 Exterior Algebra
Exterior or Grassmannian algebras are the key algebraic structures in studying differential forms. They were introduced in 1844 by Hermann Günther Grassmann. For a vector space $V$, the problem addressed by Grassmann was to find an algebra for which the relation $v^2 = 0$ would be satisfied for all $v \in V$. In 1867, Hermann Hankel gave a geometric interpretation of Grassmann's idea, which was to use the alternating product of vectors. Consider $\Lambda(V)$ the Exterior Algebra associated with a vector space $V$. The construction of $\Lambda(V)$ will be done using the alternating operator $\mathrm{Alt} : T(V) \to T(V)$; more precisely, we will define $\Lambda(V) = \mathrm{Im}(\mathrm{Alt})$. The Exterior Algebra that interests us is $\Lambda(V^*)$, which is associated with the dual vector space $V^*$.
Definition 1 Let $V$ be a vector space over $\mathbb{R}$. A $p$-tensor defined on $V$ is a $p$-linear function $T : V^{\times p} = V \times \overset{p}{\cdots} \times V \to \mathbb{R}$ such that
\[
T(v_1, \dots, \alpha v_j + \beta v_j', \dots, v_p) = \alpha\, T(v_1, \dots, v_j, \dots, v_p) + \beta\, T(v_1, \dots, v_j', \dots, v_p),
\]
© Springer Nature Switzerland AG 2021 C. M. Doria, Differentiability in Banach Spaces, Differential Forms and Applications, https://doi.org/10.1007/978-3-030-77834-7_6
with $j \in \{1, \dots, p\}$, for all $v_i, v_j' \in V$ and $\alpha, \beta \in \mathbb{R}$. Equivalently, a $p$-tensor $T$ is a linear functional $T : V^{\otimes p} \to \mathbb{R}$.¹
Example 1 Examples. (i) $p = 1$, linear functionals on $V$. (ii) $p = 2$, the inner product of $\mathbb{R}^n$. (iii) $p = k$, the determinant of a $k \times k$ matrix.
The $p$-tensors $T$ and $S$ are equal if and only if $T(v_1, \dots, v_p) = S(v_1, \dots, v_p)$ for all $\{v_1, \dots, v_p\} \subset V$. Let $T^p(V)$ be the set of $p$-tensors on a vector space $V$, so that $T^1(V) = V^*$. The tensor product between tensors is defined as follows: let $T \in T^p(V)$ and $S \in T^q(V)$; then
\[
(T \otimes S)(v_1, \dots, v_p, v_{p+1}, \dots, v_{p+q}) = T(v_1, \dots, v_p) \cdot S(v_{p+1}, \dots, v_{p+q}). \tag{1}
\]
The tensor product over $T(V)$ induces an associative and non-commutative algebraic structure. Let $\beta = \{e_1, \dots, e_k\}$ be a basis of $V$ and let $\beta^* = \{e_1^*, \dots, e_k^*\}$ be the corresponding dual basis, i.e., $e_i^*(e_j) = \delta_{ij}$. Consider the sets $C = \{1, 2, \dots, k\} \subset \mathbb{N}$ and $C^p = C \times \overset{p}{\cdots} \times C$, $p \in \mathbb{N}$. For index sequences $I = (i_1, \dots, i_p)$, $J = (j_1, \dots, j_p) \in C^p$, we define the elements
\[
e_I^* = e_{i_1}^* \otimes \cdots \otimes e_{i_p}^* \in (V^*)^{\otimes p}, \qquad e_J = (e_{j_1}, \dots, e_{j_p}) \in V \times \overset{p}{\cdots} \times V .
\]
Therefore we have
\[
e_I^*(e_J) = (e_{i_1}^* \otimes \cdots \otimes e_{i_p}^*)(e_{j_1}, \dots, e_{j_p}) = e_{i_1}^*(e_{j_1}) \cdots e_{i_p}^*(e_{j_p}) = \delta_{IJ}. \tag{2}
\]
Proposition 1 Let $V$ be a $k$-dimensional vector space, let $\beta = \{e_1, \dots, e_k\}$ be a basis of $V$ and let $\beta^* = \{e_1^*, \dots, e_k^*\}$ be the dual basis. Then the set $\{e_{i_1}^* \otimes \cdots \otimes e_{i_p}^* \mid 1 \le i_1, i_2, \dots, i_p \le k\}$ is a basis of $T^p(V)$. Consequently, $\dim(T^p(V)) = k^p$.
Proof Let $\beta = \{e_1, \dots, e_k\}$ be a basis of $V$ and let $\beta^* = \{e_1^*, \dots, e_k^*\}$ be the dual basis.
(i) The set $\{e_{i_1}^* \otimes \cdots \otimes e_{i_p}^* \mid 1 \le i_1, i_2, \dots, i_p \le k\}$ is linearly independent. Suppose there are coefficients $a_I \in \mathbb{R}$ such that $\sum_I a_I e_I^* = 0$; then
\[
\sum_I a_I e_I^*(e_J) = a_J = 0 \ \Rightarrow\ a_J = 0 \quad \forall J .
\]
(ii) The set $\{e_{i_1}^* \otimes \cdots \otimes e_{i_p}^* \mid 1 \le i_1, i_2, \dots, i_p \le k\}$ spans $T^p(V)$. Consider the vectors $w_1, \dots, w_p$ in $V$ and $T \in T^p(V)$. For $i \in \{1, \dots, p\}$, we write $w_i = \sum_{l_i} w_i^{l_i} e_{l_i}$; then
\[
T(w_1, \dots, w_p) = \sum_{l_1} \cdots \sum_{l_p} T(e_{l_1}, \dots, e_{l_p})\, w_1^{l_1} \cdots w_p^{l_p}
= \sum_{l_1} \cdots \sum_{l_p} T(e_{l_1}, \dots, e_{l_p})\, e_{l_1}^*(w_1) \cdots e_{l_p}^*(w_p)
\]
\[
= \sum_{l_1} \cdots \sum_{l_p} T_{l_1 l_2 \dots l_p}\, (e_{l_1}^* \otimes \cdots \otimes e_{l_p}^*)(w_1, \dots, w_p),
\]
in which $T_{l_1 l_2 \dots l_p} = T(e_{l_1}, \dots, e_{l_p})$. Therefore we have
\[
T = \sum_{l_1} \cdots \sum_{l_p} T_{l_1 l_2 \dots l_p}\, e_{l_1}^* \otimes \cdots \otimes e_{l_p}^* .
\]
Hence $\{e_I^* \mid I \in C^p\}$ spans $T^p(V)$.
¹ The tensor product $V^{\otimes p}$ is defined in Appendix C.
∞
T p (V ), T 0 (V ) = R.
(3)
p=0
Indeed, T (V ) is equal to the tensor algebra T (V ∗ ). Definition 2 A p-tensor is alternating if T (v1 , . . . , vi , . . . , v j , . . . , v p ) = −T (v1 , . . . , v j , . . . , vi , . . . , v p ), ∀i, j. By convention, the 1-tensors are alternating. The classical example of an alternating p-tensor is the determinant of a p × p matrix. Let S p be the symmetric group defined by the set of bijections of the set {1, . . . , p}. A permutation π ∈ S p is either even or odd, depending on whether the number of transpositions of indices are even or odd. Consider | σ | equal to +1 if σ is even and −1 if is odd. Then (−1)|π| will be equal to +1 or −1. There is a natural action of S p over T p (V ) given by S p × T p (V ) → T p (V ), (π, T ) → T π , such that T π (v1 , v2 , . . . , v p ) = T (vπ(1) , vπ(2) , . . . , vπ( p) ) and (T + S)π = T π + S π . It follows straightforward that (T π )σ = T σ ◦π . A p-tensor is alternating if T π = (−1)|π| T, for any π ∈ S p . Let
234
6 Differential Forms, Stokes Theorem
p (V ) = {T ∈ T p (V ) | T π = (−1)|π| T, ∀π ∈ S p } ⊂ T p (V ), be the subspace of alternating p-tensors. Considering the operator Alt defined as Alt(T) =
1 (−1)|π| Tπ , p! π∈S
(4)
p
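we define the alternating linear operator $\mathrm{Alt} : T^p(V) \to T^p(V)$ over $T^p(V)$ by extending Alt linearly over $\mathbb{R}$.
Proposition 2 The operator $\mathrm{Alt} : T^p(V) \to T^p(V)$ is a projection over the vector subspace $\Lambda^p(V)$ for all $p$.
Proof Let's check that for all $\sigma \in S_p$ and $T \in T^p(V)$, we have $[\mathrm{Alt}(T)]^\sigma = (-1)^{|\sigma|}\,\mathrm{Alt}(T)$:
\[
[\mathrm{Alt}(T)]^\sigma = \Big[ \frac{1}{p!} \sum_{\pi \in S_p} (-1)^{|\pi|} T^\pi \Big]^\sigma
= \frac{1}{p!} \sum_{\pi \in S_p} (-1)^{|\pi|} [T^\pi]^\sigma
= \frac{1}{p!} \sum_{\pi \in S_p} (-1)^{|\pi|}\, T^{\sigma \circ \pi}
\]
\[
= \frac{1}{p!} \sum_{\pi \in S_p} (-1)^{|\sigma|} (-1)^{|\sigma|} (-1)^{|\pi|}\, T^{\sigma \circ \pi}
= (-1)^{|\sigma|} \frac{1}{p!} \sum_{\pi \in S_p} (-1)^{|\sigma \circ \pi|}\, T^{\sigma \circ \pi}
= (-1)^{|\sigma|} \frac{1}{p!} \sum_{\tau \in S_p} (-1)^{|\tau|}\, T^\tau = (-1)^{|\sigma|}\, \mathrm{Alt}(T).
\]
The last equality above uses the fact that $S_p$ is a group, so for every $\tau \in S_p$ there is a unique $\pi \in S_p$ such that $\tau = \sigma \circ \pi$. Therefore $\mathrm{Alt} : T^p(V) \to \Lambda^p(V)$. To verify that Alt is a projection, i.e., $\mathrm{Alt} \circ \mathrm{Alt} = \mathrm{Alt}$, we note that whenever $T$ is an alternating $p$-tensor, $T^\pi = (-1)^{|\pi|} T$, we have
\[
\mathrm{Alt}(T) = \frac{1}{p!} \sum_{\pi \in S_p} (-1)^{|\pi|}\, T^\pi = \frac{1}{p!} \sum_{\pi \in S_p} (-1)^{|\pi|} (-1)^{|\pi|}\, T = \frac{1}{p!} \sum_{\pi \in S_p} T = T .
\]
Hence $(\mathrm{Alt})^2 = \mathrm{Alt}$.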
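The operator Alt of Eq. (4) is easy to experiment with when tensors are represented as functions of $p$ vector arguments. The following is a small sketch under that assumption; the helper names `sign` and `alt`, the tensor $e_1^* \otimes e_2^* \otimes e_3^*$ on $\mathbb{R}^3$ and the sample vectors are illustrative choices, and the code uses 0-based indices.

```python
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation given as a tuple, via its number of inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alt(T, p):
    """Alt(T) = (1/p!) * sum over permutations of sign(pi) * T^pi."""
    def alt_T(*vs):
        total = sum(sign(perm) * T(*(vs[i] for i in perm))
                    for perm in permutations(range(p)))
        return total / factorial(p)
    return alt_T

# Hypothetical example: T = e1* (x) e2* (x) e3* on R^3
T = lambda u, v, w: u[0] * v[1] * w[2]
A = alt(T, 3)      # Alt(T)
AA = alt(A, 3)     # Alt(Alt(T))

u, v, w = (1.0, 2.0, 0.0), (0.0, 1.0, 3.0), (2.0, 0.0, 1.0)
print(A(u, v, w), A(v, u, w), AA(u, v, w))
```

For this particular $T$, the value $\mathrm{Alt}(T)(u, v, w)$ is the determinant of the matrix with rows $u, v, w$ divided by $3!$; swapping two arguments changes the sign, and applying `alt` a second time does not change the values, in line with Proposition 2.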
The vector space $\Lambda(V)$ carries an algebraic structure, which we show next.
Definition 3 Let $T \in T^p(V)$ and $S \in T^q(V)$ be alternating tensors; the exterior product between $T$ and $S$ is
\[
T \wedge S = \mathrm{Alt}(T \otimes S) \in T^{(p+q)}(V).
\]
The exterior product has the following properties:
(1) (associativity) For all $T \in T^p(V)$, $S \in T^q(V)$ and $R \in T^t(V)$, $(T \wedge S) \wedge R = T \wedge (S \wedge R)$.
(2) Let $k \in \mathbb{R}$; for all $T \in T^p(V)$, $k \wedge T = T \wedge k = kT$. In particular, $1 \in \mathbb{R}$ is the identity element.
(3) (distributivity) For all $T \in T^p(V)$ and $S, R \in T^q(V)$, $T \wedge (S + R) = T \wedge S + T \wedge R$.
The product is not commutative. Associativity requires a proof; for this purpose we use the following lemma.
Lemma 1 If $\mathrm{Alt}(T) = 0$, then $T \wedge S = S \wedge T = 0$ for all $S$.
Proof If $T \in T^p(V)$ and $S \in T^q(V)$, then $T \otimes S \in T^{(p+q)}(V)$. Let $G$ be the subgroup of $S_{p+q}$ with elements that are permutations fixing the indices $p+1, \dots, p+q$, so $G$ is isomorphic to $S_p$. If $\pi \in G$, then $\pi(1, \dots, p, p+1, \dots, p+q) = (\pi(1), \dots, \pi(p), p+1, \dots, p+q)$. For all $\pi \in G$, we have $(T \otimes S)^\pi = T^\pi \otimes S$ since
\[
(T \otimes S)^\pi (v_1, \dots, v_p, v_{p+1}, \dots, v_{p+q}) = T(v_{\pi(1)}, \dots, v_{\pi(p)}) \cdot S(v_{p+1}, \dots, v_{p+q}).
\]
So
\[
\sum_{\pi \in G} (-1)^{|\pi|} (T \otimes S)^\pi = \Big[ \sum_{\pi \in G} (-1)^{|\pi|} T^\pi \Big] \otimes S = p!\, \mathrm{Alt}(T) \otimes S = 0 .
\]
The subgroup $G$ decomposes $S_{p+q}$ into a disjoint union of left cosets, which means that we have a set of elements $\{\sigma_i \mid 1 \le i \le l\} \subset S_{p+q}$ such that for any $\tau \in S_{p+q}$ there are a unique $\pi \in G$ and a unique $i \in \{1, \dots, l\}$ with $\tau = \sigma_i \circ \pi$. Consequently,
\[
(p+q)!\ T \wedge S = \sum_{\tau \in S_{p+q}} (-1)^{|\tau|} (T \otimes S)^\tau
= \sum_{i=1}^{l} \sum_{\pi \in G} (-1)^{|\sigma_i \circ \pi|} (T \otimes S)^{\sigma_i \circ \pi}
= \sum_{i=1}^{l} (-1)^{|\sigma_i|} \Big[ \sum_{\pi \in G} (-1)^{|\pi|} (T \otimes S)^\pi \Big]^{\sigma_i}
\]
\[
= \sum_{i=1}^{l} (-1)^{|\sigma_i|} \Big[ \Big( \sum_{\pi \in G} (-1)^{|\pi|} T^\pi \Big) \otimes S \Big]^{\sigma_i}
= \sum_{i=1}^{l} (-1)^{|\sigma_i|} \big[ p!\, \mathrm{Alt}(T) \otimes S \big]^{\sigma_i} = 0 .
\]
The identity $S \wedge T = 0$ is proved analogously.
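Proposition 3 The exterior product is associative.
Proof Given the linearity of $\mathrm{Alt} : T^p(V) \to \Lambda^p(V)$, we have
\[
(T \wedge S) \wedge R - \mathrm{Alt}(T \otimes S \otimes R) = \mathrm{Alt}[(T \wedge S) \otimes R] - \mathrm{Alt}(T \otimes S \otimes R)
= \mathrm{Alt}[(T \wedge S) \otimes R - T \otimes S \otimes R] = \mathrm{Alt}[(T \wedge S - T \otimes S) \otimes R].
\]
However, $\mathrm{Alt}[T \wedge S - T \otimes S] = 0$. From the last lemma, the above expression is null; consequently,
\[
(T \wedge S) \wedge R = \mathrm{Alt}(T \otimes S \otimes R).
\]
Analogously,
\[
T \wedge (S \wedge R) = \mathrm{Alt}(T \otimes S \otimes R).
\]
Hence the exterior product is associative.
The vector space of alternating tensors $(\Lambda(V), +)$ associated with an $\mathbb{R}$-vector space $V$ is the $\mathbb{R}$-vector space
\[
\Lambda(V) = \bigoplus_p \Lambda^p(V).
\]
The exterior algebra associated to $V$ is $(\Lambda(V), +, \wedge)$.
Proposition 4 Let $\phi, \psi \in \Lambda^1(V)$. Then (1) $\phi \wedge \psi = -\psi \wedge \phi$; (2) $\phi \wedge \phi = 0$.
Proof It follows from the definition, with the factor $\frac{1}{2!}$ coming from Eq. (4), that
\[
(\phi \wedge \psi)(v_1, v_2) = \mathrm{Alt}(\phi \otimes \psi)(v_1, v_2) = \tfrac{1}{2}\big[ \phi \otimes \psi(v_1, v_2) - \phi \otimes \psi(v_2, v_1) \big]
= \tfrac{1}{2}\big[ \phi(v_1)\psi(v_2) - \phi(v_2)\psi(v_1) \big].
\]
Now swapping the positions of $\phi$ and $\psi$, the result follows.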
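Proposition 5 Let $V$ be an $n$-dimensional vector space, let $\{\phi_1, \dots, \phi_n\}$ be a basis of $V^*$, and let $\mathcal{I}_p = \{I = (i_1, \dots, i_p) \mid 1 \le i_1 < i_2 < \cdots < i_p \le n\}$. Associated to the multi-index $I = (i_1, \dots, i_p)$, we consider the alternating $p$-tensor $\phi_I = \phi_{i_1} \wedge \cdots \wedge \phi_{i_p}$. Then the set $\{\phi_I ;\ I \in \mathcal{I}_p\}$ is a basis of $\Lambda^p(V)$. The dimension of $\Lambda^p(V)$ is $\binom{n}{p}$.
Proof To check that $\{\phi_I \mid I \in \mathcal{I}_p\}$ generates $\Lambda^p(V)$, we recall that a $p$-tensor $T$ can be written in a unique way as
\[
T = \sum_{i_1, \dots, i_p} T_{i_1 \dots i_p}\, \phi_{i_1} \otimes \cdots \otimes \phi_{i_p} .
\]
Therefore, since $\mathrm{Alt}(\phi_{i_1} \otimes \cdots \otimes \phi_{i_p}) = \phi_{i_1} \wedge \cdots \wedge \phi_{i_p}$ vanishes when two indices coincide and equals $\pm\phi_I$ for the increasing rearrangement $I$ of $(i_1, \dots, i_p)$ otherwise, we have
\[
\mathrm{Alt}(T) = \sum_{i_1, \dots, i_p} T_{i_1 \dots i_p}\, \mathrm{Alt}(\phi_{i_1} \otimes \cdots \otimes \phi_{i_p}) = \sum_{I \in \mathcal{I}_p} \tilde{T}_I\, \phi_I
\]
for suitable coefficients $\tilde{T}_I$; if $T$ is alternating, then $T = \mathrm{Alt}(T)$. Linear independence is elementary to verify. If $\{v_1, \dots, v_n\}$ is a basis of $V$ such that $\phi_i(v_j) = \delta_{ij}$ and $v_J = (v_{j_1}, \dots, v_{j_p})$, then $\phi_I(v_J) = \frac{1}{p!}\delta_{IJ}$ for $I, J \in \mathcal{I}_p$ and
\[
\sum_{I \in \mathcal{I}_p} T_I \phi_I = 0 \ \Rightarrow\ \sum_{I \in \mathcal{I}_p} T_I \phi_I(v_J) = \frac{1}{p!} T_J = 0 .
\]
The cardinality of the set $\{\phi_I \mid I \in \mathcal{I}_p\}$ is $\binom{n}{p}$.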
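The counting behind the dimension formula can be made concrete by listing the strictly increasing multi-indices. The sketch below uses only the standard library; the function name `basis_multi_indices` and the choice $n = 4$ are illustrative assumptions.

```python
from itertools import combinations
from math import comb

def basis_multi_indices(n, p):
    """Strictly increasing multi-indices (i_1 < ... < i_p) with entries in 1..n."""
    return list(combinations(range(1, n + 1), p))

n = 4
dims = [len(basis_multi_indices(n, p)) for p in range(n + 1)]
print(basis_multi_indices(n, 2))   # labels of the basis of 2-forms
print(dims, sum(dims))             # binomial coefficients and their sum
```

Each multi-index labels one basis element $\phi_I$, so the list `dims` contains the dimensions of the spaces of alternating $p$-tensors for $p = 0, \dots, n$, and their sum is $2^n$.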
If $V$ is a vector space of dimension $n$, then $\Lambda(V) = \oplus_{p=0}^{n} \Lambda^p(V)$ is a $2^n$-dimensional vector space. $\Lambda(V)$ endowed with the exterior product "$\wedge$" becomes an algebra such that $x^2 = x \wedge x = 0$ for every $x \in V$. This relation is a universal relation since $V$ generates $\Lambda(V)$. Furthermore, $\Lambda(V)$ is a graded algebra, that is, $\Lambda^p(V) \wedge \Lambda^q(V) \subset \Lambda^{p+q}(V)$. The exterior algebra $\Lambda(V)$ can also be defined as a quotient algebra. To this end, we consider the ideal $I \subset T(V)$ generated by the elements of the form $\{v \otimes v \mid v \in V\}$. By definition, the degree of elements in $V$ is 1; therefore they belong to $T^1(V)$, and $I \cap T^1(V) = 0$. Given the canonical homomorphism $j : T(V) \to T(V)/I$, the restriction of $j$ to $V$ is injective. So we can identify $V$ with its image $j(V)$ and consider it a subset of $\Lambda(V) = T(V)/I$. Since $V$ generates the algebra $T(V)$, $V$ also generates the algebra $\Lambda(V)$. By construction, $j(v)^2 = 0$ for all $v \in V$.
Proposition 6 (Universal Property) Let $A$ be an algebra over the field $K$ and let $f : V \to A$ be a linear map such that $f(v) \cdot f(v) = 0$, $\forall v \in V$. Then there is a unique homomorphism $\phi : \Lambda(V) \to A$ extending $f$.
Proof Consider the homomorphism $f^* : T(V) \to A$ given on generators by $f^*(v \otimes w) = f(v) \cdot f(w)$. So we have $f^*(v \otimes v) = 0$, $\forall v \in V$. Therefore $I \subset \mathrm{Ker}(f^*)$, and consequently, $f^*$ induces a homomorphism $\phi : T(V)/I \to A$. Uniqueness arises from the fact that $V$ generates $\Lambda(V)$.
Clifford algebras generalize the Exterior Algebra. Let $V$ be an $n$-dimensional real vector space endowed with a symmetric bilinear form $q : V \times V \to \mathbb{R}$, and let $q : V \to \mathbb{R}$, $q(u) = q(u, u)$, be the associated quadratic form. Considering the relation
\[
u \bullet v + v \bullet u = -2 q(u, v), \quad \forall u, v \in V,
\]
we define the vector spaces $C^k$ spanned by the products $\{u_1 \bullet \cdots \bullet u_k \mid u_j \in V,\ 1 \le j \le k\}$ and
\[
Cl(V, q) = \mathbb{R} \oplus C^1 \oplus \cdots \oplus C^n .
\]
The Clifford Algebra $Cl(V, q)$ is associated with the pair $(V, q)$. When the bilinear form is completely degenerate, i.e., $q = 0$, we have $Cl(V, q) = \Lambda(V)$.
Exercises
(1) Consider $\mathbb{R}$ as the vector space generated by $e_1 = 1$, with the quadratic form $q(1) = 1$. Consequently we have $e_1 \bullet e_1 = e_1^2 = -1$. Show that the Clifford algebra $Cl_1 = Cl(\mathbb{R}, q)$ is isomorphic to $\mathbb{C}$.
(2) Let $\beta = \{e_1, e_2\}$ be an orthonormal basis of $\mathbb{R}^2$ and let $\langle \cdot, \cdot \rangle : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ be an inner product with quadratic form $q(u) = \langle u, u \rangle$. Then we have the identity $e_i \bullet e_j + e_j \bullet e_i = -2\delta_{ij}$. Show that the Clifford algebra $Cl_2 = Cl(\mathbb{R}^2, q)$ is isomorphic to the quaternion algebra $\mathbb{H}$.
(3) Show that $T^p(V)$ is a vector space over $\mathbb{R}$.
(4) Show that a $p$-tensor $T : V^{\times p} \to \mathbb{R}$ induces a homomorphism $T : V^{\otimes p} \to \mathbb{R}$.
(5) Let $T \in \Lambda^p(V)$ and $S \in \Lambda^q(V)$. Show that $T \wedge S = (-1)^{pq} S \wedge T \in \Lambda^{p+q}(V)$.
(6) Let $\beta = \{e_1, \dots, e_n\}$ be an orthonormal basis of $\mathbb{R}^n$ with respect to an inner product $\langle \cdot, \cdot \rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. Show that the identity $e_i \bullet e_j + e_j \bullet e_i = -2\delta_{ij}$ defines an algebra $Cl_n = Cl(\mathbb{R}^n, \langle \cdot, \cdot \rangle)$.
(7) Show that $Cl(V, q)$ is a quotient algebra generated by $V$.
2 Orientation on V and on the Inner Product on $\Lambda(V)$

In this section, we deal with two technical concepts that are needed to study differential forms and to prove the Stokes Theorem.
2.1 Orientation

Orientation is a deceptive concept: it is easy to define on vector spaces and differentiable manifolds, but it is not so elementary to verify whether a given manifold is orientable. We do not have to bother with conditions for orientability here; we just need the concept.

The bases $\alpha = \{e_1^\alpha, \dots, e_n^\alpha\}$ and $\beta = \{e_1^\beta, \dots, e_n^\beta\}$ of $\mathbb{R}^n$ are equivalent if we have a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ such that
$$T(e_i^\beta) = \sum_{j=1}^{n} t_{ji}\, e_j^\alpha$$
and the determinant of the matrix $T = (t_{ji})$ is positive. The relation "$\alpha \sim \beta \Leftrightarrow \det(T) > 0$" defines an equivalence relation on the set $B_{\mathbb{R}^n}$ of bases of $\mathbb{R}^n$. When we fix a basis of $\mathbb{R}^n$, e.g., the canonical basis $c = \{e_1, \dots, e_n\}$, we note that there are only 2 distinct equivalence classes: the positive class, corresponding to those bases such that $\det(T) > 0$, and the negative class, when $\det(T) < 0$. We consider the set of equivalence classes $O_{\mathbb{R}^n} = B_{\mathbb{R}^n}/\!\sim\ \cong \{-1, 1\}$, such that $1$ corresponds to the positive class and $-1$ to the negative class. To fix an orientation on $\mathbb{R}^n$ means to choose a class in $O_{\mathbb{R}^n}$, that is, to fix a basis of $\mathbb{R}^n$. For example, once we fix a basis $c = \{e_1, e_2, \dots, e_n\}$, swapping the first two vectors yields a new ordered basis $c' = \{e_2, e_1, \dots, e_n\}$; we set $1 = [c]$, and so $-1 = [c']$.

Let $\beta = \{e_1, \dots, e_n\}$ be a set of $C^\infty$ vector fields on $U \subset \mathbb{R}^n$. We say that $\beta$ is a frame on the open subset $U \subset \mathbb{R}^n$ if the following conditions are verified:
(i) $\beta(p) = \{e_1(p), \dots, e_n(p)\}$ is a basis of $T_pU$ for all $p \in U$.
(ii) the map $U \to GL_n(\mathbb{R})$, $p \mapsto \beta(p)$, is $C^\infty$.

We assume the frame is globally defined on $U$, which is the case when $U$ is simply connected. For all $p \in U$, the ordered basis $\beta(p) = \{e_1(p), \dots, e_n(p)\}$ is assumed to define the positive orientation on $T_pU$. An orientation is assigned to $U$ once the frame $\beta$ is fixed. Let $U, V \subset \mathbb{R}^n$ be oriented open subsets. A differentiable map $f : U \to V$ preserves the orientation if $\det(df_p) > 0$ for all $p \in U$.
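The equivalence relation on bases can be illustrated numerically. The sketch below assumes that a basis is given by its coordinate vectors with respect to the canonical basis, so that the class of the basis is the sign of the determinant of the matrix having these vectors as columns. The helper names `det` and `orientation_class` are hypothetical.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of 0..n-1, computed from its inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def det(matrix):
    """Determinant by the Leibniz formula (fine for small n)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total

def orientation_class(basis):
    """Return +1 or -1: the class of `basis` relative to the canonical basis."""
    n = len(basis)
    # Column j of the change-of-basis matrix is the j-th basis vector.
    matrix = [[basis[j][i] for j in range(n)] for i in range(n)]
    d = det(matrix)
    if d == 0:
        raise ValueError("the vectors do not form a basis")
    return 1 if d > 0 else -1

c = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]      # canonical basis of R^3
c_swapped = [c[1], c[0], c[2]]             # swap the first two vectors
c_cyclic = [c[1], c[2], c[0]]              # cyclic permutation

print(orientation_class(c), orientation_class(c_swapped), orientation_class(c_cyclic))
# 1 -1 1
```

Swapping two vectors changes the class, while a cyclic permutation of three vectors does not, in agreement with the sign of the corresponding permutation.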
2.2 Inner Product in $\Lambda(V)$

When we fix an inner product on $V$, we can induce an inner product on $\Lambda(V)$. Let $(V, \langle \cdot, \cdot \rangle)$ be an $n$-dimensional vector space endowed with an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$. We will define an inner product on $\Lambda(V)$. Let's consider the case $p = 2$: the inner product
$$\langle w_1 \otimes w_2, \eta_1 \otimes \eta_2 \rangle = \langle w_1, \eta_1 \rangle \langle w_2, \eta_2 \rangle$$
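on $V \otimes V$ induces on $\Lambda^2(V)$ an inner product as follows:
$$\langle w_1 \wedge w_2, \eta_1 \wedge \eta_2 \rangle = \tfrac{1}{2} \langle w_1 \otimes w_2 - w_2 \otimes w_1,\ \eta_1 \otimes \eta_2 - \eta_2 \otimes \eta_1 \rangle = \langle w_1, \eta_1 \rangle \langle w_2, \eta_2 \rangle - \langle w_1, \eta_2 \rangle \langle w_2, \eta_1 \rangle = \det \begin{pmatrix} \langle w_1, \eta_1 \rangle & \langle w_1, \eta_2 \rangle \\ \langle w_2, \eta_1 \rangle & \langle w_2, \eta_2 \rangle \end{pmatrix},$$
where the factor $\tfrac{1}{2}$ compensates for the double counting in the antisymmetrized tensors.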
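Definition 4 The inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ induces on each subspace $\Lambda^k(V)$, $1 \le k \le n$, the inner product
$$\langle w_1 \wedge \cdots \wedge w_k, \eta_1 \wedge \cdots \wedge \eta_k \rangle = \det(\langle w_i, \eta_j \rangle), \quad 1 \le i, j \le k. \tag{5}$$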
The subspaces $\Lambda^k(V)$ and $\Lambda^l(V)$ are orthogonal whenever $k \ne l$. If $\beta = \{e_1, \dots, e_n\}$ is an orthonormal basis of $V$, then $\beta_k = \{e_I \mid I = (i_1, \dots, i_k)\}$ is an orthonormal basis of $\Lambda^k(V)$, and consequently $\hat{\beta} = \cup_k \beta_k$ is an orthonormal basis of $\Lambda(V)$. When $\dim(V) = n$, the space $\Lambda^n(V)$ has dimension 1, and we fix $\beta_n = \{\vartheta_\beta = e_1 \wedge \cdots \wedge e_n\}$ to be the basis. The element $e_1 \wedge \cdots \wedge e_n$ is the volume element of $\beta$.

Definition 5 An orientation on $V$ is given by the volume element $e_1 \wedge \cdots \wedge e_n$, associated to an orthonormal basis $\beta$, as the positive basis of $\Lambda^n(V)$.

Definition 6 Let $V$ be an $n$-dimensional oriented vector space with an orientation given by the volume element $e_1 \wedge \cdots \wedge e_n$. The Hodge star-operator $* : \Lambda^k(V) \to \Lambda^{n-k}(V)$ is the linear operator defined by the following identity: for $\omega, \eta \in \Lambda^k(V)$,
$$\omega \wedge *\eta = \langle \omega, \eta \rangle\, e_1 \wedge \cdots \wedge e_n = \langle \omega, \eta \rangle\, \vartheta_\beta. \tag{6}$$
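It follows from the definition that $*(1) = \vartheta_\beta$ and $w \wedge *w = |w|^2 \vartheta_\beta$. Let's consider the case $n = 4$ and $\eta = e_1 \wedge e_3$, which is an element of the basis of $\Lambda^2(V)$. Given $*\eta = \sum_{i,j} a_{ij}\, e_i \wedge e_j$ and $w = \sum_{k,l} b_{kl}\, e_k \wedge e_l$, we have
$$w \wedge *\eta = \sum_{k,l} \sum_{i,j} a_{ij} b_{kl}\, e_k \wedge e_l \wedge e_i \wedge e_j = \sum_{k,l} b_{kl} \langle e_k \wedge e_l, e_1 \wedge e_3 \rangle\, e_1 \wedge e_2 \wedge e_3 \wedge e_4 = b_{13}\, e_1 \wedge e_2 \wedge e_3 \wedge e_4.$$
Let $S_4$ be the symmetric group permuting 4 elements and let $|\sigma|$ denote the parity of an element $\sigma \in S_4$. So we have
$$w \wedge *\eta = \sum_{k,l} \sum_{i,j} a_{ij} b_{kl}\, e_k \wedge e_l \wedge e_i \wedge e_j = \Big[ \sum_{\sigma \in S_4} (-1)^{|\sigma|} a_{\sigma(1)\sigma(2)} b_{\sigma(3)\sigma(4)} \Big] e_1 \wedge e_2 \wedge e_3 \wedge e_4.$$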
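Pairing the expressions we obtained, we have
$$\Big[ \sum_{\sigma \in S_4} (-1)^{|\sigma|} a_{\sigma(1)\sigma(2)} b_{\sigma(3)\sigma(4)} \Big] e_1 \wedge e_2 \wedge e_3 \wedge e_4 = b_{13}\, e_1 \wedge e_2 \wedge e_3 \wedge e_4.$$
Since this identity holds for every $w$, only the coefficient of $e_2 \wedge e_4$ in $*\eta$ is non-zero, and $e_1 \wedge e_3 \wedge e_2 \wedge e_4 = -e_1 \wedge e_2 \wedge e_3 \wedge e_4$. Therefore $*\eta = -e_2 \wedge e_4$. The example shows that the operator $*$ is determined by its values on the elements of the basis, so it is the unique operator satisfying the identity (6).

Proposition 7 The operator $* : \Lambda^k(V) \to \Lambda^{n-k}(V)$ satisfies the following properties:
(i) $*^2 = (-1)^{k(n-k)}$.
(ii) $\langle *w, *\eta \rangle = \langle w, \eta \rangle$.

Proof We will prove the identities using the fact that the operator $*$ is determined by the image of the basis.
(i) Let $\vartheta_\beta = e_1 \wedge \cdots \wedge e_n$ be the volume element, $I = (1, \dots, k)$ a multi-index and $e_I = e_1 \wedge \cdots \wedge e_k$. From the identity (6), we have $e_I \wedge *e_I = \vartheta_\beta$. Note that there is $c \in \mathbb{R}$ such that $*e_I = c\, e_J$, with $J = (k+1, \dots, n)$ and $e_I \wedge e_J = \vartheta_\beta$. Since $e_I \wedge *e_I = c\, \vartheta_\beta = \vartheta_\beta$, we have $c = 1$. Due to the identities $e_J \wedge e_I = (-1)^{k(n-k)} e_I \wedge e_J$ and $*e_I = e_J$, we get $*^2 e_I = *(e_J) = (-1)^{k(n-k)} e_I$. Therefore $*^2 = (-1)^{k(n-k)}$.
(ii) Applying the identity (6) to the $(n-k)$-forms $*w$ and $*\eta$, we get $\langle *w, *\eta \rangle\, \vartheta_\beta = *w \wedge *(*\eta) = (-1)^{k(n-k)}\, {*w} \wedge \eta = (-1)^{2k(n-k)}\, \eta \wedge *w = \langle \eta, w \rangle\, \vartheta_\beta = \langle w, \eta \rangle\, \vartheta_\beta$.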
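The action of $*$ on basis elements can be computed mechanically. The following is a minimal sketch, assuming an orthonormal, positively oriented basis and increasing multi-indices: since $e_I \wedge *e_I = \vartheta_\beta$, we must have $*e_I = \pm e_J$, where $J$ is the complement of $I$ and the sign is the sign of the permutation $(I, J)$. The names `perm_sign` and `hodge_basis` are hypothetical.

```python
from itertools import combinations

def perm_sign(seq):
    """Sign of the permutation that sorts `seq`, from its inversion count."""
    inversions = sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )
    return -1 if inversions % 2 else 1

def hodge_basis(I, n):
    """Return (c, J) such that *e_I = c * e_J in the Euclidean case."""
    J = tuple(i for i in range(1, n + 1) if i not in I)
    # e_I ^ e_J = sign(I, J) * volume element, and e_I ^ *e_I must be the volume element.
    return perm_sign(I + J), J

n = 4
print(hodge_basis((1, 3), n))   # (-1, (2, 4)), i.e. *(e1 ^ e3) = -e2 ^ e4

# Check property (i) of Proposition 7 on every basis element.
for k in range(n + 1):
    for I in combinations(range(1, n + 1), k):
        c1, J = hodge_basis(I, n)
        c2, I_back = hodge_basis(J, n)
        assert I_back == I
        assert c1 * c2 == (-1) ** (k * (n - k))
```

The loop confirms $*^2 = (-1)^{k(n-k)}$ on the basis of each $\Lambda^k(\mathbb{R}^4)$; the same function can be called with other values of `n`.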
Exercises
(1) Show that the bilinear form defined in Eq. (5) is symmetric and positive definite, so it is an inner product.
(2) Let $T : V \to V$ be a linear transformation. Show that $T$ preserves orientation if and only if $\det(T) > 0$, and so there are just two possible orientations on a vector space.
(3) Consider $V = \mathbb{R}^4$ and $\beta = \{e_1, e_2, e_3, e_4\}$ an orthonormal basis. Find $*(e_i \wedge e_j)$ for all $1 \le i, j \le 4$.
(4) When $n = 4$ and $k = 2$, we have $*^2 = 1$. Show that there is a decomposition $\Lambda^2(\mathbb{R}^4) = \Lambda^2_+ \oplus \Lambda^2_-$, with $\Lambda^2_\pm$ being the eigenspaces associated to the eigenvalues $\pm 1$ of $*$, respectively.
2.3 Pseudo-Inner Product, the Lorentz Form

So far we have worked with a real vector space endowed with an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$, whose quadratic form has signature $\tau = \dim(V)$. In Special Relativity we have $V = \mathbb{R}^4$, and we need to work with a quadratic form with signature $\tau = 2$, called the Lorentz form. According to Sylvester's theorem, on
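$\mathbb{R}^4$ the quadratic forms satisfying $\tau = 2$ are equivalent, up to an invertible linear transformation, to
$$L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.$$
A finite dimensional real vector space $V$ is endowed with a pseudo-inner product if, on $V$, we have a non-degenerate symmetric bilinear form such that $\tau < \dim(V)$. Let $\beta = \{e_1, e_2, e_3, e_4\}$ be a basis of $\mathbb{R}^4$ such that
$$L(e_i, e_j) = \delta_{ij} \ \text{ if } 1 \le i, j \le 3, \qquad L(e_i, e_4) = -\delta_{i4} \ \text{ if } 1 \le i \le 4.$$
Consequently, considering the dual basis $\beta^* = \{e_1^*, e_2^*, e_3^*, e_4^*\}$ of $(\mathbb{R}^4)^*$ (with $L^* = L^{-1}$), we have
$$L^*(e_i^*, e_j^*) = \delta_{ij} \ \text{ if } 1 \le i, j \le 3, \qquad L^*(e_i^*, e_4^*) = -\delta_{i4} \ \text{ if } 1 \le i \le 4.$$
Then we get a pseudo-inner product on each $\Lambda^p((\mathbb{R}^4)^*)$, given on basis elements with increasing indices by
$$L^*(e_i^* \wedge e_j^*, e_k^* \wedge e_l^*) = \begin{cases} 1, & \text{if } 1 \le i, j \le 3 \text{ and } i = k,\ j = l, \\ -1, & \text{if } i = k \text{ and } j = l = 4, \\ 0, & \text{if } \{i, j\} \ne \{k, l\}, \end{cases}$$
$$L^*(e_i^* \wedge e_j^* \wedge e_k^*, e_\alpha^* \wedge e_\beta^* \wedge e_\gamma^*) = \begin{cases} 1, & \text{if } 1 \le i, j, k, \alpha, \beta, \gamma \le 3 \text{ and } i = \alpha,\ j = \beta,\ k = \gamma, \\ -1, & \text{if } i = \alpha,\ j = \beta \text{ and } k = \gamma = 4, \\ 0, & \text{if } \{i, j, k\} \ne \{\alpha, \beta, \gamma\}, \end{cases}$$
$$L^*(e_1^* \wedge e_2^* \wedge e_3^* \wedge e_4^*, e_1^* \wedge e_2^* \wedge e_3^* \wedge e_4^*) = -1.$$
Hodge's star-operator is defined using the pseudo-inner product $L^*$. Then
$$\omega \wedge (*\eta) = L^*(\omega, \eta)\, e_1^* \wedge e_2^* \wedge e_3^* \wedge e_4^*, \tag{7}$$
and $*^2 = (-1)^{1 + k(4-k)}$. In this case the eigenvalues of $* : \Lambda^2(\mathbb{R}^4) \to \Lambda^2(\mathbb{R}^4)$ are $\pm i$, so the eigenspaces $\Lambda^2_\pm$ live in the complexification and yield the decomposition $\Lambda^2(\mathbb{R}^4) \otimes \mathbb{C} = \Lambda^2_+ \oplus \Lambda^2_-$.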
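The sign in $*^2 = (-1)^{1+k(4-k)}$ can be verified on basis elements. The sketch below assumes the diagonal Lorentz form above and increasing multi-indices; by identity (7), $*e_I^* = L^*(e_I^*, e_I^*) \cdot \mathrm{sign}(I, J)\, e_J^*$, where $J$ is the complement of $I$. The names `SIGNS` and `lorentz_hodge` are hypothetical.

```python
from itertools import combinations
from math import prod

# Diagonal entries L(e_i, e_i) of the Lorentz form on R^4.
SIGNS = {1: 1, 2: 1, 3: 1, 4: -1}

def perm_sign(seq):
    """Sign of the permutation that sorts `seq`, from its inversion count."""
    inversions = sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )
    return -1 if inversions % 2 else 1

def lorentz_hodge(I):
    """Return (c, J) such that *e_I = c * e_J for the Lorentz form."""
    J = tuple(i for i in range(1, 5) if i not in I)
    norm = prod(SIGNS[i] for i in I)   # L*(e_I, e_I); equals 1 for the empty index
    return norm * perm_sign(I + J), J

print(lorentz_hodge((1, 2)))   # (1, (3, 4))
print(lorentz_hodge((3, 4)))   # (-1, (1, 2))

# Check *^2 = (-1)^(1 + k(4 - k)) on every basis element.
for k in range(5):
    for I in combinations(range(1, 5), k):
        c1, J = lorentz_hodge(I)
        c2, _ = lorentz_hodge(J)
        assert c1 * c2 == (-1) ** (1 + k * (4 - k))
```

For $k = 2$ the check gives $*^2 = -1$, which is why the eigenvalues of $*$ on 2-forms are $\pm i$.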
3 Differential Forms

In this section we will introduce differential forms on $\mathbb{R}^n$ and the exterior derivative operator. $\mathbb{R}^n$ carries very simple topological and linear structures, and every simply connected open subset $U \subset \mathbb{R}^n$ inherits these properties. In $\mathbb{R}^n$, we have a constant orthonormal frame, making many concepts and calculations easy to handle. However, this comfortable situation has to be extended to frames depending on the point.

Definition 7 Let $U \subset \mathbb{R}^n$ be an open subset.
(1) A frame on $U$ is a $C^\infty$ map $E : U \to GL_n(\mathbb{R})$, $E(x) = \{e_1(x), \dots, e_n(x)\}$, such that the set $E(x)$ is a basis of $T_x\mathbb{R}^n$ for all $x \in U$. A basis can be identified with an invertible matrix through the bijection
$$E(x) = \{e_1(x), \dots, e_n(x)\} \longleftrightarrow \big( e_1(x) \ \dots \ e_n(x) \big) \in GL_n(\mathbb{R}),$$
in which $e_i(x)$ is a column vector.
(2) The co-frame associated to $E$ on $U$ is $E^*(x) = \{e_1^*(x), \dots, e_n^*(x)\}$, $e_i^*(e_j) = \delta_{ij}$. Therefore $E^*(x)$ is a basis of $T_x^*U$ for all $x \in U$.

We can fix a constant orthonormal frame in $U \subset \mathbb{R}^n$ so that the dual co-frame is also constant. Classically, the co-vectors are written as $dx_i = e_i^*$. For $x \in U$, using the vector spaces $T_xU$ and $T_x^*U$, we obtain the spaces of the alternating $p$-tensors $\Lambda^p(T_x^*U)$, $0 \le p \le n$.

Definition 8 Let $\Lambda^p(U) = \cup_{x \in U} \Lambda^p(T_x^*U)$ be the vector bundle over $U$ whose fiber at $x$ is the vector space $\Lambda^p(T_x^*U)$. A differential $p$-form in $U$ is a section $w : U \to \Lambda^p(U)$, i.e., for every $x \in U$ we have an alternating $p$-tensor $w_x \in \Lambda^p(T_x^*U)$. A $p$-form $\omega$ is smooth or, equivalently, $C^\infty$ if the map $x \mapsto \omega_x$ is $C^\infty$. Let $\Omega^p(U)$ be the space of the smooth $p$-forms defined on $U$.

Example 2 Some differential forms are well-known, as shown in the next examples:
(1) The 0-forms are the $C^\infty$-functions defined on $U$, so $\Omega^0(U) = C^\infty(U)$.
(2) For any function $f \in \Omega^0(U)$, the differential $df_x \in T_x^*U$ is a linear functional whose dependence on $x$ is $C^\infty$; therefore $df \in \Omega^1(U)$. Let $(x_1, \dots, x_n)$ be the coordinates in $U$, and let $E(x) = \{e_1, \dots, e_n\}$ be the constant frame given by the canonical basis. In this case, we have $e_i^* = dx_i$. Since
$$df_x . u = u_1 \frac{\partial f}{\partial x_1} + \cdots + u_n \frac{\partial f}{\partial x_n} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, e_i^*(u),$$
we have $df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, dx_i$.
Since $E^*(x) = \{e_1^*(x), \dots, e_n^*(x)\}$ is a basis of $T_x^*U$ for all $x \in U$, the set $\{e_I^*(x)\}$, $I = (i_1, \dots, i_p)$, $1 \le i_1 < \cdots < i_p \le n$, is a basis of $\Lambda^p(T_x^*U)$. Every $p$-form in $U$ can be written as
$$w = \sum_I w_I\, e_I^*, \quad w_I \in C^\infty(U; \mathbb{R}).$$
Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ be open subsets. A differentiable map $\phi : U \to V$ induces the homomorphism $\phi^* : \Omega^p(V) \to \Omega^p(U)$ given by
$$[\phi^* w]_x(u_1, \dots, u_p) = w_{\phi(x)}(d\phi_x . u_1, \dots, d\phi_x . u_p). \tag{8}$$
The $p$-form $\phi^* w \in \Omega^p(U)$ is the pullback of $w$ by $\phi$. When $w$ is a 0-form, the pullback of $w$ by $\phi$ is the 0-form $\phi^* w = w \circ \phi$.

Example 3 This example explains an important case that is useful to simplify calculations with forms. Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ be open subsets with coordinates $U = \{(x_1, \dots, x_n) \mid x_i \in \mathbb{R}\}$ and $V = \{(y_1, \dots, y_m) \mid y_j \in \mathbb{R}\}$. Fix the canonical frame $\beta = \{e_1, \dots, e_m\}$ in $\mathbb{R}^m$. Let $\phi : U \to V$ be a $C^\infty$-map defined in coordinates as $\phi = (\phi_1, \dots, \phi_m)$. We will check the identity
$$\phi^* dy_\alpha = d\phi_\alpha. \tag{9}$$
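Indeed, for $u = (u_1, \dots, u_n)$ we have
$$(\phi^* dy_\alpha)(u) = dy_\alpha(d\phi . u) = dy_\alpha\Big( \sum_{i=1}^{n} u_i \frac{\partial \phi}{\partial x_i} \Big) = \sum_{i=1}^{n} u_i\, dy_\alpha\Big( \frac{\partial \phi}{\partial x_i} \Big) = \sum_{i=1}^{n} u_i\, dy_\alpha\Big( \sum_{\beta=1}^{m} \frac{\partial \phi_\beta}{\partial x_i} e_\beta \Big)$$
$$= \sum_{i=1}^{n} \sum_{\beta=1}^{m} u_i \frac{\partial \phi_\beta}{\partial x_i}\, dy_\alpha(e_\beta) = \sum_{i=1}^{n} u_i \frac{\partial \phi_\alpha}{\partial x_i} = \sum_{i=1}^{n} \frac{\partial \phi_\alpha}{\partial x_i}\, dx_i(u) = d\phi_\alpha(u).$$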
Theorem 1 (Change of Variables) Let $U, V$ be open subsets of $\mathbb{R}^n$ and let $\phi : U \to V$ be a diffeomorphism preserving the orientation. If $w$ is an integrable $n$-form in $V$, then
$$\int_{\phi(U)} w = \int_U \phi^* w. \tag{10}$$
Proof Let $w = f\, dy_1 \wedge \cdots \wedge dy_n$; we have
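$$\phi^* w = (f \circ \phi) \det(d\phi_x)\, dx_1 \wedge \cdots \wedge dx_n.$$
Since $\det d\phi_x > 0$, the statement follows from the formula for the change of variables for multiple integrals, Theorem 8 in Appendix A.

Example 4 Let $U \subset \mathbb{R}^2$ be an open subset, $w = f_1\, dx_2 \wedge dx_3 + f_2\, dx_3 \wedge dx_1 + f_3\, dx_1 \wedge dx_2 \in \Omega^2(\mathbb{R}^3)$ and $F = f_1 \vec{i} + f_2 \vec{j} + f_3 \vec{k}$. Let $\phi : U \to \mathbb{R}^3$ be a $C^\infty$-map given by $\phi(u, v) = (u, v, g(u, v))$. The image of $\phi$ is the surface $S \subset \mathbb{R}^3$ given by the graph of $g$. We would like to check the identity
$$\int_S w = \int_S F,$$
where the right-hand side is the flux of $F$ through $S$. By the Change of Variables Theorem, we have $\int_S w = \int_U \phi^* w$. Therefore
$$\int_S w = \int_U \phi^* w = \int_U \sum_{\{ijk\}} (f_i \circ \phi)\, \phi^*(dx_j \wedge dx_k) = \sum_{\{ijk\}} \int_U (f_i \circ \phi)\, (\phi^* dx_j) \wedge (\phi^* dx_k).$$
Since $\phi^*(dx_1) = d\phi_1 = du$, $\phi^*(dx_2) = d\phi_2 = dv$ and $\phi^*(dx_3) = dg = \frac{\partial g}{\partial u} du + \frac{\partial g}{\partial v} dv$, we get the identities
$$\phi^*(dx_1 \wedge dx_2) = du \wedge dv, \qquad \phi^*(dx_3 \wedge dx_1) = -\frac{\partial g}{\partial v}\, du \wedge dv, \qquad \phi^*(dx_2 \wedge dx_3) = -\frac{\partial g}{\partial u}\, du \wedge dv.$$
In terms of the coordinates $(u, v)$ on $U$, the pullback form $\phi^* w$ is written as
$$\phi^* w = \Big[ -f_1 \frac{\partial g}{\partial u} - f_2 \frac{\partial g}{\partial v} + f_3 \Big] du \wedge dv = \Big\langle (f_1, f_2, f_3), \Big( -\frac{\partial g}{\partial u}, -\frac{\partial g}{\partial v}, 1 \Big) \Big\rangle\, du \wedge dv.$$
We note that the normal vector to $S$ is $N = \big( -\frac{\partial g}{\partial u}, -\frac{\partial g}{\partial v}, 1 \big)$, so
$$\int_U \phi^* w = \int_U \Big\langle (f_1, f_2, f_3), \Big( -\frac{\partial g}{\partial u}, -\frac{\partial g}{\partial v}, 1 \Big) \Big\rangle\, du\, dv = \int_U \langle F, N \rangle\, du\, dv = \int_S F.$$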
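The computation in Example 4 can be checked numerically. The following is a minimal sketch under illustrative assumptions: the functions $g(u, v) = u^2 + v$ and $F(x, y, z) = (x, y, z)$ and the domain $U = (0, 1)^2$ are hypothetical choices, the partial derivatives of $\phi$ are approximated by central differences, and the integral is approximated by a midpoint rule. The coefficient of $du \wedge dv$ in $\phi^* w$ is computed from the $2 \times 2$ minors of $d\phi$ and compared with $\langle F, N \rangle$.

```python
def g(u, v):
    return u * u + v            # hypothetical height function

def phi(u, v):
    return (u, v, g(u, v))      # parametrization of the graph of g

def F(x, y, z):
    return (x, y, z)            # hypothetical vector field (f1, f2, f3)

def partials(u, v, h=1e-6):
    """Central-difference approximations of d(phi)/du and d(phi)/dv."""
    pu = tuple((a - b) / (2 * h) for a, b in zip(phi(u + h, v), phi(u - h, v)))
    pv = tuple((a - b) / (2 * h) for a, b in zip(phi(u, v + h), phi(u, v - h)))
    return pu, pv

def pullback_coefficient(u, v):
    """Coefficient of du^dv in phi^* w, with w = f1 dx2^dx3 + f2 dx3^dx1 + f3 dx1^dx2."""
    f1, f2, f3 = F(*phi(u, v))
    pu, pv = partials(u, v)
    m23 = pu[1] * pv[2] - pu[2] * pv[1]   # (dx2 ^ dx3)(phi_u, phi_v)
    m31 = pu[2] * pv[0] - pu[0] * pv[2]   # (dx3 ^ dx1)(phi_u, phi_v)
    m12 = pu[0] * pv[1] - pu[1] * pv[0]   # (dx1 ^ dx2)(phi_u, phi_v)
    return f1 * m23 + f2 * m31 + f3 * m12

def flux_integrand(u, v):
    """<F, N> with N = (-g_u, -g_v, 1); here g_u = 2u and g_v = 1."""
    f1, f2, f3 = F(*phi(u, v))
    return -f1 * (2 * u) - f2 * 1.0 + f3

def integrate(func, n=100):
    """Midpoint rule on the unit square."""
    h = 1.0 / n
    return sum(func((i + 0.5) * h, (j + 0.5) * h) for i in range(n) for j in range(n)) * h * h

integral = integrate(pullback_coefficient)
assert abs(pullback_coefficient(0.3, 0.7) - flux_integrand(0.3, 0.7)) < 1e-6
assert abs(integral - integrate(flux_integrand)) < 1e-6
```

For these choices the integrand reduces to $-u^2$, so both integrals approximate $-1/3$.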
Exercises
(1) Let $\phi$ and $\psi$ be differentiable maps. Show the following identities:
(a) $\phi^*(w_1 + w_2) = \phi^* w_1 + \phi^* w_2$;
(b) $\phi^*(w_1 \wedge w_2) = \phi^* w_1 \wedge \phi^* w_2$;
(c) $(\phi \circ \psi)^* w = \psi^*(\phi^* w)$.
(2) Let $\phi : U \to V$ be a diffeomorphism between open sets in $\mathbb{R}^n$. Let $(x_1, \dots, x_n)$ and $(y_1, \dots, y_n)$ be coordinate systems in $U$ and $V$, respectively. Show the identity $\phi^*(dy_1 \wedge \cdots \wedge dy_n) = \det[d\phi_x]\, (dx_1 \wedge \cdots \wedge dx_n)$, for all $y = \phi(x)$.
(3) Find $\int_S w$ given that $w = f_1\, dx_2 \wedge dx_3 + f_2\, dx_3 \wedge dx_1 + f_3\, dx_1 \wedge dx_2 \in \Omega^2(\mathbb{R}^3)$ and $S$ is a surface parametrized by $\phi : U \to \mathbb{R}^3$, $\phi(u, v) = (\phi_1(u, v), \phi_2(u, v), \phi_3(u, v))$.
3.1 Exterior Derivative

The exterior derivative generalizes the operators grad, curl and div appearing in the classical integration theorems.

Definition 9 Let $U \subset \mathbb{R}^n$ be an open subset. For $p \in \{0, \dots, n\}$, the exterior derivative operator $d_p : \Omega^p(U) \to \Omega^{p+1}(U)$ is defined as follows:
(1) If $p = 0$ and $w \in \Omega^0(U)$, then
$$d_0 w = dw = \sum_{i=1}^{n} \frac{\partial w}{\partial x_i}\, dx_i. \tag{11}$$
(2) If $p > 0$ and $w = \sum_I w_I\, dx_I \in \Omega^p(U)$, $w_I \in \Omega^0(U)$, then
$$d_p w = \sum_I dw_I \wedge dx_I. \tag{12}$$

Examples: Let $U \subset \mathbb{R}^n$ be an open subset and $F$ a $C^1$-vector field defined on $U$.
(1) If $f \in \Omega^0(U)$, then $df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, dx_i$. Therefore $d_0 f$ coincides with the differential $df$. It follows that $d_0 f = 0$ if and only if $\mathrm{grad}(f) = 0$.
(2) Considering $n = 2$, given $\omega = f_1\, dx_1 + f_2\, dx_2 \in \Omega^1(U)$ and $F = f_1 \vec{i} + f_2 \vec{j}$, we get
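$$d\omega = df_1 \wedge dx_1 + df_2 \wedge dx_2 = \Big( \frac{\partial f_1}{\partial x_1} dx_1 + \frac{\partial f_1}{\partial x_2} dx_2 \Big) \wedge dx_1 + \Big( \frac{\partial f_2}{\partial x_1} dx_1 + \frac{\partial f_2}{\partial x_2} dx_2 \Big) \wedge dx_2 = \Big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \Big) dx_1 \wedge dx_2 = \mathrm{curl}(F)\, dx_1 \wedge dx_2.$$
(3) Considering $n = 3$, given $\omega = f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3 \in \Omega^1(U)$ and $F = f_1 \vec{i} + f_2 \vec{j} + f_3 \vec{k}$, we get
$$d\omega = \Big( \sum_{i=1}^{3} \frac{\partial f_1}{\partial x_i} dx_i \Big) \wedge dx_1 + \Big( \sum_{i=1}^{3} \frac{\partial f_2}{\partial x_i} dx_i \Big) \wedge dx_2 + \Big( \sum_{i=1}^{3} \frac{\partial f_3}{\partial x_i} dx_i \Big) \wedge dx_3$$
$$= \frac{\partial f_1}{\partial x_2} dx_2 \wedge dx_1 + \frac{\partial f_1}{\partial x_3} dx_3 \wedge dx_1 + \frac{\partial f_2}{\partial x_1} dx_1 \wedge dx_2 + \frac{\partial f_2}{\partial x_3} dx_3 \wedge dx_2 + \frac{\partial f_3}{\partial x_1} dx_1 \wedge dx_3 + \frac{\partial f_3}{\partial x_2} dx_2 \wedge dx_3$$
$$= \Big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \Big) dx_1 \wedge dx_2 + \Big( \frac{\partial f_3}{\partial x_1} - \frac{\partial f_1}{\partial x_3} \Big) dx_1 \wedge dx_3 + \Big( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \Big) dx_2 \wedge dx_3.$$
Therefore $d\omega = 0$ if and only if $\mathrm{curl}(F) = 0$.
(4) Considering $n = 3$, given $w = f_3\, dx_1 \wedge dx_2 + f_2\, dx_3 \wedge dx_1 + f_1\, dx_2 \wedge dx_3 \in \Omega^2(U)$ and $F = f_1 \vec{i} + f_2 \vec{j} + f_3 \vec{k}$, we get
$$dw = \frac{\partial f_3}{\partial x_3}\, dx_1 \wedge dx_2 \wedge dx_3 + \frac{\partial f_2}{\partial x_2}\, dx_1 \wedge dx_2 \wedge dx_3 + \frac{\partial f_1}{\partial x_1}\, dx_1 \wedge dx_2 \wedge dx_3 = \mathrm{div}(F)\, dx_1 \wedge dx_2 \wedge dx_3.$$
Therefore $dw = 0$ if and only if $\mathrm{div}(F) = 0$.
(5) If $w \in \Omega^n(U)$, then $dw = 0$.

Theorem 2 For all $p \in \{0, \dots, n\}$, the exterior derivative operator $d_p : \Omega^p(U) \to \Omega^{p+1}(U)$ satisfies the following properties:
(1) (linearity) For any $a_1, a_2 \in \mathbb{R}$ and $\omega_1, \omega_2 \in \Omega^p(U)$, we have $d_p(a_1 \omega_1 + a_2 \omega_2) = a_1 d_p \omega_1 + a_2 d_p \omega_2$.
(2) (multiplication law) If $\omega \in \Omega^p(U)$ and $\theta \in \Omega^q(U)$, then
$$d(\omega \wedge \theta) = d\omega \wedge \theta + (-1)^p \omega \wedge d\theta. \tag{13}$$
(3) (cocycle condition) $d_{p+1} \circ d_p = 0$ ($d^2 = 0$).

In addition, the set of operators $\{d_p : \Omega^p \to \Omega^{p+1} \mid 0 \le p \le n\}$ is the only one that satisfies the properties above and $d_0 = d$, i.e., the exterior derivative on 0-forms coincides with the usual derivative on functions.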
Proof The above properties follow directly from the definitions. To verify the uniqueness, suppose that there is a family of operators $\{D_p : \Omega^p(U) \to \Omega^{p+1}(U)\}$ satisfying the same properties and such that $D_0 = d$. Then $D(dx_I) = 0$, since
$$D(dx_{i_1} \wedge \cdots \wedge dx_{i_p}) = \sum_j (-1)^{j-1}\, dx_{i_1} \wedge \cdots \wedge D(dx_{i_j}) \wedge \cdots \wedge dx_{i_p}$$
and $D(dx_{i_j}) = D(Dx_{i_j}) = 0$. In this way, if $w = \sum_I a_I\, dx_I$, then
$$Dw = \sum_I \big[ D(a_I) \wedge dx_I + a_I\, D(dx_I) \big] = \sum_I d(a_I) \wedge dx_I = dw.$$
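Formula (12) and the cocycle condition can be tested on forms with polynomial coefficients. The sketch below is an illustration under stated assumptions, not part of the text: a polynomial is stored as a dictionary mapping exponent tuples to integer coefficients, a form as a dictionary mapping increasing index tuples (0-based) to polynomials, and the names `poly_diff`, `poly_add` and `d` are hypothetical.

```python
def poly_diff(poly, i):
    """Partial derivative of a polynomial with respect to the i-th variable."""
    out = {}
    for exps, coef in poly.items():
        if exps[i] > 0:
            lowered = list(exps)
            lowered[i] -= 1
            key = tuple(lowered)
            out[key] = out.get(key, 0) + coef * exps[i]
    return {k: c for k, c in out.items() if c != 0}

def poly_add(p, q, sign=1):
    """Return p + sign * q, dropping zero coefficients."""
    out = dict(p)
    for key, coef in q.items():
        out[key] = out.get(key, 0) + sign * coef
    return {k: c for k, c in out.items() if c != 0}

def d(form, n):
    """Exterior derivative: d(sum_I w_I dx_I) = sum_I sum_i (dw_I/dx_i) dx_i ^ dx_I."""
    out = {}
    for I, poly in form.items():
        for i in range(n):
            if i in I:
                continue                      # dx_i ^ dx_I = 0
            dp = poly_diff(poly, i)
            if not dp:
                continue
            # Moving dx_i into increasing position costs one sign per smaller index.
            position = sum(1 for j in I if j < i)
            J = tuple(sorted(I + (i,)))
            out[J] = poly_add(out.get(J, {}), dp, (-1) ** position)
    return {J: p for J, p in out.items() if p}

# omega = (y z) dx + (x^2) dy + (x y z) dz on R^3, variables ordered (x, y, z).
omega = {
    (0,): {(0, 1, 1): 1},
    (1,): {(2, 0, 0): 1},
    (2,): {(1, 1, 1): 1},
}

d_omega = d(omega, 3)
print(d_omega[(0, 1)])   # coefficient of dx^dy: 2x - z
assert d(d_omega, 3) == {}   # cocycle condition d^2 = 0 for this omega
```

The coefficients of `d_omega` agree with the curl formula of Example (3): $(2x - z)\, dx \wedge dy + (yz - y)\, dx \wedge dz + xz\, dy \wedge dz$.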
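Corollary 1 Let $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$ be open subsets and let $\phi : U \to V$ be a $C^\infty$ map. Then $d_p(\phi^* \omega) = \phi^* d_p \omega$, for all $\omega \in \Omega^p(V)$.

Proof Let's first consider $\omega \in \Omega^0(V)$. By the chain rule,
$$[\phi^* d\omega]_x(u) = d\omega_{\phi(x)} . d\phi_x(u) = d(\omega \circ \phi)_x(u) = d(\phi^* \omega)_x(u).$$
For the general case, let $\omega = \sum_I f_I\, dy_I \in \Omega^p(V)$, so that $d\omega = \sum_I df_I \wedge dy_I$. On one hand,
$$\phi^* d\omega = \phi^* \Big( \sum_I df_I \wedge dy_I \Big) = \sum_I \phi^*(df_I \wedge dy_I) = \sum_I (\phi^* df_I) \wedge (\phi^* dy_I).$$
On the other hand, since $\phi^* dy_I = d\phi_{i_1} \wedge \cdots \wedge d\phi_{i_p}$ by the identity (9), we have $d(\phi^* dy_I) = 0$, and so
$$d\phi^* \omega = d\Big( \phi^* \sum_I f_I\, dy_I \Big) = d\Big( \sum_I (\phi^* f_I)\, \phi^* dy_I \Big) = \sum_I d(\phi^* f_I) \wedge (\phi^* dy_I).$$
By the case of 0-forms, $\phi^* df_I = d(\phi^* f_I)$. Hence $d_p(\phi^* \omega) = \phi^* d_p \omega$.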
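The above corollary means that the following diagram is commutative:
$$\begin{array}{ccc} \Omega^p(V) & \xrightarrow{\ d_p\ } & \Omega^{p+1}(V) \\ \downarrow \phi^* & & \downarrow \phi^* \\ \Omega^p(U) & \xrightarrow{\ d_p\ } & \Omega^{p+1}(U). \end{array}$$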
Exercises
The Frobenius Integrability Theorem has the following version using differential forms (see Ref. [45]):
(1) Let $U \subset \mathbb{R}^n$ be an open subset and let $D$ be a $p$-dimensional $C^\infty$-distribution on $U$. A $q$-form $\omega$ annihilates $D$ if, for each $x \in U$, $\omega_x(v_1, \dots, v_q) = 0$ for all $v_1, \dots, v_q \in D_x$. We let $\mathcal{I}(D) = \{\omega \in \Omega^*(U) \mid \omega \text{ annihilates } D\}$. Show the following items:
(a) $\mathcal{I}(D)$ is an ideal.
(b) $\mathcal{I}(D)$ is locally generated by $(n - p)$ independent 1-forms $\omega_1, \dots, \omega_{n-p}$ on $U$.
(c) If $\mathcal{I} \subset \Omega^*(U)$ is an ideal locally generated by $(n - p)$ independent 1-forms, then there is a unique $C^\infty$-distribution $D$ of dimension $p$ on $U$ for which $\mathcal{I} = \mathcal{I}(D)$.
(2) An ideal $\mathcal{I} \subset \Omega^*(U)$ is a differential ideal if it is closed under the exterior derivative $d$, i.e., $d(\mathcal{I}) \subset \mathcal{I}$. Show that the claims in the following items are true:
(a) A $C^\infty$-distribution $D$ on $U$ is involutive if and only if the ideal $\mathcal{I}(D)$ is a differential ideal.
(b) Let $i : M \to U$ be the inclusion of a submanifold. $M$ is an integral manifold of an ideal $\mathcal{I} \subset \Omega^*(U)$ if for every $\omega \in \mathcal{I}$ we have $i^* \omega \equiv 0$ (the pullback form). $M$ is maximal if its image is not a proper subset of the image of any other connected integral manifold of the ideal. Assume $\mathcal{I} \subset \Omega^*(U)$ is a differential ideal locally generated by $(n - p)$ independent 1-forms. For any $x \in U$, show that there is a unique maximal, connected, integral manifold of $\mathcal{I}$ through $x$ whose dimension is equal to $p$.
(3) What can we claim from the results above when $\mathcal{I}$ is generated by a closed 1-form $\omega$?
4 De Rham Cohomology

Differential forms are a powerful tool; they are more than just a convenient language to prove the Stokes theorem because they allow us to define and compute topological invariants of a differentiable manifold. In this section, we compute the De Rham groups of the sphere $S^n$ and of the closed surfaces $\Sigma_g$ of genus $g$. The formalism we are developing can be used to define the De Rham cohomology of any differentiable manifold and not just of an open subset $U \subset \mathbb{R}^n$. We recommend reading the section
on differential forms on manifolds at the end of this chapter. The exterior derivative defines the sequence
$$\Omega^0(U) \xrightarrow{\ d\ } \Omega^1(U) \xrightarrow{\ d\ } \cdots \xrightarrow{\ d\ } \Omega^p(U) \xrightarrow{\ d\ } \Omega^{p+1}(U) \xrightarrow{\ d\ } \cdots \xrightarrow{\ d\ } \Omega^n(U). \tag{14}$$
The cocycle condition $d^2 = 0$ defines the differentiable complex $(\Omega^*(U), d)$, given by
$$\Omega^*(U) = \bigoplus_{p=0}^{n} \Omega^p(U), \quad d : \Omega^p(U) \to \Omega^{p+1}(U).$$
Since $d^2 = 0$, we have the required condition to be a complex, i.e., $\mathrm{Im}(d) \subset \mathrm{Ker}(d)$.

Definition 10 Let $(\Omega^*(U), d)$ be a differentiable complex.
(1) $\omega \in \Omega^p(U)$ is a closed $p$-form if $d\omega = 0$. The space of closed $p$-forms is $Z^p(U) = \{\omega \in \Omega^p(U) \mid d\omega = 0\}$.
(2) $\omega \in \Omega^p(U)$ is an exact $p$-form if we have a $(p-1)$-form $\eta \in \Omega^{p-1}(U)$ such that $\omega = d\eta$. The space of exact $p$-forms is $B^p(U) = \{\omega \in \Omega^p(U) \mid \omega = d\eta,\ \eta \in \Omega^{p-1}(U)\}$.

From the definition, we have $B^p(U) \subset Z^p(U)$ for all $p = 0, \dots, n$. Consider in $Z^p(U)$ the following equivalence relation:
$$\omega' \sim \omega \iff \omega' - \omega \in B^p(U), \qquad [\omega] = \omega + d\Omega^{p-1}(U).$$

Definition 11 Let $U \subset \mathbb{R}^n$ be an open subset.
(i) The $p$th De Rham cohomology group is the quotient vector space
$$H^p_{DR}(U) = \frac{\mathrm{Ker}(d_p)}{\mathrm{Im}(d_{p-1})} = \frac{Z^p(U)}{B^p(U)}. \tag{15}$$
By convention, $H^p_{DR}(U) = 0$ if $p < 0$, and
$$H^0_{DR}(U) = \mathrm{Ker}(d) = \{f \in C^\infty(U, \mathbb{R}) \mid df = 0\}, \qquad H^n_{DR}(U) = \frac{\Omega^n(U)}{d\Omega^{n-1}(U)}.$$
(ii) The $p$-forms $\omega$ and $\omega'$ are cohomologous if $\omega \sim \omega'$. In this case, they define the same class $[\omega'] = [\omega] \in H^p_{DR}(U)$.
The vector space $\Omega^p(U)$ is infinite dimensional. We cannot guarantee that the quotient spaces $H^p_{DR}(U)$ are closed spaces. They are closed for a large class of sets, e.g., for closed differentiable manifolds, because the exterior derivative operator has a closed image.² In the following, we will restrict ourselves to subsets $U \subset \mathbb{R}^n$ that are differentiable manifolds. De Rham's theorem ensures that if $U$ is a compact differentiable manifold, then the vector spaces $H^p_{DR}(U)$, $p = 0, \dots, n$, are isomorphic to the singular cohomology groups $H^p(U, \mathbb{R})$ with real coefficients, and so they are finite dimensional and $\dim(H^p_{DR}(U)) = b_p(U)$ for all $p = 0, \dots, \dim(U)$. To answer the question about when a vector field is conservative requires a deep insight into the underlying topological space.

Proposition 8 The vector space
$$H^*_{DR}(U) = \bigoplus_{p=0}^{n} H^p_{DR}(U)$$
is an algebra endowed with the multiplication
$$H^p_{DR}(U) \times H^q_{DR}(U) \to H^{p+q}_{DR}(U), \qquad ([w_1], [w_2]) \mapsto [w_1] . [w_2] = [w_1 \wedge w_2]. \tag{16}$$
Proof If $w_1$ and $w_2$ are closed forms, i.e., $dw_1 = dw_2 = 0$, then the form $w_1 \wedge w_2$ is also closed, since $d(w_1 \wedge w_2) = dw_1 \wedge w_2 + (-1)^{|w_1|} w_1 \wedge dw_2 = 0$. It is enough to show that the product is well-defined, since the associativity is inherited from the exterior product. Suppose that $[w_1] = [w_1']$ and $[w_2] = [w_2']$; we need to check that $[w_1] . [w_2] = [w_1'] . [w_2']$, that is, $[w_1 \wedge w_2] = [w_1' \wedge w_2']$. Given $w_1' = w_1 + d\eta_1$ and $w_2' = w_2 + d\eta_2$, with $w_1$ of degree $p$, we have
$$w_1' \wedge w_2' = (w_1 + d\eta_1) \wedge (w_2 + d\eta_2) = w_1 \wedge w_2 + w_1 \wedge d\eta_2 + d\eta_1 \wedge w_2 + d\eta_1 \wedge d\eta_2 = w_1 \wedge w_2 + d\big[ (-1)^p w_1 \wedge \eta_2 + \eta_1 \wedge w_2 + \eta_1 \wedge d\eta_2 \big].$$
Therefore $[w_1] . [w_2] = [w_1'] . [w_2']$.

² For more details on the De Rham theory, see Ref. [45].
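Since $\phi^* d = d \phi^*$, a differentiable map $\phi : U \to V$ between the open subsets $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ induces a homomorphism $\phi^* : H^p_{DR}(V) \to H^p_{DR}(U)$. So the composition of maps
$$U \xrightarrow{\ \psi\ } V \xrightarrow{\ \phi\ } W$$
induces the homomorphisms
$$H^p_{DR}(W) \xrightarrow{\ \phi^*\ } H^p_{DR}(V) \xrightarrow{\ \psi^*\ } H^p_{DR}(U), \qquad (\phi \circ \psi)^* = \psi^* \circ \phi^*.$$
Next, the results obtained in Chap. 4 will be extended to star-shaped sets.

Lemma 2 (Poincaré Lemma) If $U \subset \mathbb{R}^n$ is a star-shaped open subset, then $H^p_{DR}(U) = 0$ for all $p > 0$ and $H^0_{DR}(U) \cong \mathbb{R}$.

Proof We assume that $U$ is star-shaped with respect to the origin. The proof relies on the existence of an operator $S_p : \Omega^p(U) \to \Omega^{p-1}(U)$ such that
$$d_{p-1} S_p + S_{p+1} d_p = \mathrm{id}, \ \text{ if } p > 0, \qquad S_1 d = \mathrm{id} - e, \ \text{ if } p = 0,$$
where $e(\omega) = \omega(0)$. The existence of this operator immediately implies the lemma, since given a closed $p$-form $\omega$, we have $\omega = d_{p-1} S_p \omega$ if $p > 0$, and $\omega = \omega(0)$ if $p = 0$.

(1) First, we construct a homomorphism $\hat{S}_p : \Omega^p(U \times \mathbb{R}) \to \Omega^{p-1}(U)$. Every form $\omega \in \Omega^p(U \times \mathbb{R})$ can be written as
$$\omega = \sum_I f_I(x, t)\, dx_I + \sum_J g_J(x, t)\, dt \wedge dx_J,$$
where $I = (i_1, \dots, i_p)$ and $J = (j_1, \dots, j_{p-1})$ are multi-indices. Consider
$$\hat{S}_p(\omega) = \sum_J \Big( \int_0^1 g_J(x, t)\, dt \Big) dx_J.$$
Since
$$d\omega = \sum_{I,i} \frac{\partial f_I}{\partial x_i}\, dx_i \wedge dx_I + \sum_I \frac{\partial f_I}{\partial t}\, dt \wedge dx_I + \sum_{J,i} \frac{\partial g_J}{\partial x_i}\, dx_i \wedge dt \wedge dx_J,$$
we get
$$d\hat{S}_p(\omega) + \hat{S}_{p+1} d\omega = \sum_{J,i} \Big( \int_0^1 \frac{\partial g_J}{\partial x_i} dt \Big) dx_i \wedge dx_J + \sum_I \Big( \int_0^1 \frac{\partial f_I}{\partial t} dt \Big) dx_I - \sum_{J,i} \Big( \int_0^1 \frac{\partial g_J}{\partial x_i} dt \Big) dx_i \wedge dx_J$$
$$= \sum_I \Big( \int_0^1 \frac{\partial f_I}{\partial t} dt \Big) dx_I = \sum_I \big[ f_I(x, 1) - f_I(x, 0) \big] dx_I.$$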
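(2) Let $\psi : \mathbb{R} \to \mathbb{R}$ be a $C^\infty$-function such that $0 \le \psi \le 1$ and
$$\psi(t) = \begin{cases} 0, & t \le 0; \\ 1, & t \ge 1; \\ 0 \le \psi(t) \le 1, & t \in (0, 1). \end{cases}$$
Let $\phi : U \times \mathbb{R} \to U$ be the map $\phi(x, t) = \psi(t) x$. Consider $S_p(w) = \hat{S}_p(\phi^* w)$. Given $w = \sum_I h_I(x)\, dx_I$, since $\phi^*(dx_i) = x_i \psi'(t)\, dt + \psi(t)\, dx_i$, we have
$$\phi^*(w) = \sum_I h_I(\psi(t) x)\, \big( x_{i_1} \psi'(t)\, dt + \psi(t)\, dx_{i_1} \big) \wedge \cdots \wedge \big( x_{i_p} \psi'(t)\, dt + \psi(t)\, dx_{i_p} \big).$$
In the above expression, the term independent of $dt$ is
$$\sum_I h_I(\psi(t) x)\, \psi(t)^p\, dx_I.$$
Since $\phi^*$ commutes with $d$, step (1) applied to $\phi^* w$, with $f_I(x, t) = h_I(\psi(t) x)\, \psi(t)^p$, $\psi(1) = 1$ and $\psi(0) = 0$, gives
$$dS_p(w) + S_{p+1} dw = \begin{cases} \sum_I h_I(x)\, dx_I = w, & p > 0; \\ w(x) - w(0), & p = 0. \end{cases}$$

Poincaré's Lemma implies that the De Rham cohomology groups of a star-shaped open subset $U$ are trivial.

Proposition 9 Let $n > 1$. The De Rham cohomology groups of $\mathbb{R}^n$ are
$$H^i_{DR}(\mathbb{R}^n) = \begin{cases} \mathbb{R}, & \text{if } i = 0, \\ 0, & \text{if } i \ne 0. \end{cases}$$

The De Rham cohomology $H^*_{DR}$ carries a ring structure, and indeed, it is a topological invariant. We list below some fundamental properties of this ring; the proofs can be read in Refs. [4, 22, 45].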
(1) If $X$ is connected and contractible, then $H^i_{DR}(X) = 0$ for all $i \ne 0$ and $H^0_{DR}(X) = \mathbb{R}$.
(2) From the De Rham theorem, if $X$ is a paracompact space admitting a differentiable structure, then $H^i_{DR}(X)$ is isomorphic to $H^i(X, \mathbb{R})$ (the $i$th singular cohomology group) for all $i \ge 0$. If $X$ is a compact differentiable manifold, $H^i_{DR}(X)$ is finitely generated for all $i = 0, \dots, \dim(X)$. Since $H^i(X, \mathbb{R})$ is a topological invariant, so also is the group $H^i_{DR}(X)$. Indeed, $H^i(X, \mathbb{R})$ is a vector space over $\mathbb{R}$ whose dimension $b_i(X) = \dim(H^i_{DR}(X))$ is called the $i$th Betti number of $X$.
(3) If $X$ and $Y$ are homotopy equivalent topological spaces, then $H^*_{DR}(X) \cong H^*_{DR}(Y)$. In particular, if $Y$ is contractible, then $H^i_{DR}(X \times Y) \cong H^i_{DR}(X)$.
Exercise
(1) If $U$ has $k$ connected components, show that $H^0_{DR}(U) \cong \mathbb{R}^k$.
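4.1 Short Exact Sequence

Let $(\Omega_A, d_A)$ and $(\Omega_B, d_B)$ be two differentiable complexes, such that $\Omega_A = \oplus_{p \ge 0} \Omega_A^p$ and $\Omega_B = \oplus_{p \ge 0} \Omega_B^p$, with cohomologies $H^*_{DR}(A)$ and $H^*_{DR}(B)$.

Definition 12 A map $f : \Omega_A \to \Omega_B$ between the differentiable complexes is a chain map if $f = \oplus_{p \ge 0} f_p$ is the direct sum of maps $f_p : \Omega_A^p \to \Omega_B^p$ such that $f_{p+1} \circ d_A = d_B \circ f_p$. Consequently the diagram below commutes for all $p \ge 0$:
$$\begin{array}{ccc} \Omega_A^p & \xrightarrow{\ d_{A,p}\ } & \Omega_A^{p+1} \\ \downarrow f_p & & \downarrow f_{p+1} \\ \Omega_B^p & \xrightarrow{\ d_{B,p}\ } & \Omega_B^{p+1}. \end{array}$$
The differentiable complexes $(\Omega_A, d_A)$, $(\Omega_B, d_B)$ and $(\Omega_C, d_C)$ define a Short Exact Sequence if there are chain maps $f : \Omega_A \to \Omega_B$ and $g : \Omega_B \to \Omega_C$ such that the sequence below is an exact sequence:
$$0 \longrightarrow \Omega_A \xrightarrow{\ f\ } \Omega_B \xrightarrow{\ g\ } \Omega_C \longrightarrow 0. \tag{17}$$
It means that $f$ is injective, $\mathrm{Ker}(g) = \mathrm{Im}(f)$ and $g$ is surjective.

Proposition 10 For all $p \ge 0$, the Short Exact Sequence (17) induces homomorphisms $\delta : H^p_{DR}(C) \to H^{p+1}_{DR}(A)$ and a long exact sequence
$$\cdots \to H^p_{DR}(A) \xrightarrow{\ f_p\ } H^p_{DR}(B) \xrightarrow{\ g_p\ } H^p_{DR}(C) \xrightarrow{\ \delta\ } H^{p+1}_{DR}(A) \to \cdots.$$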
Proof The homomorphisms $f_p : H^p_{DR}(A) \to H^p_{DR}(B)$ and $g_p : H^p_{DR}(B) \to H^p_{DR}(C)$ are well-defined and $\mathrm{Im}(f_p) = \mathrm{Ker}(g_p)$. To define the homomorphism $\delta : H^p_{DR}(C) \to H^{p+1}_{DR}(A)$, we call upon the diagram below:
$$\begin{array}{ccccccccc} 0 & \longrightarrow & \Omega_A^{p+1} & \xrightarrow{\ f_{p+1}\ } & \Omega_B^{p+1} & \xrightarrow{\ g_{p+1}\ } & \Omega_C^{p+1} & \longrightarrow & 0 \\ & & \uparrow d_{A,p} & & \uparrow d_{B,p} & & \uparrow d_{C,p} & & \\ 0 & \longrightarrow & \Omega_A^{p} & \xrightarrow{\ f_{p}\ } & \Omega_B^{p} & \xrightarrow{\ g_{p}\ } & \Omega_C^{p} & \longrightarrow & 0. \end{array} \tag{18}$$
Let $c \in \Omega_C^p$ be a closed form, i.e., $d_{C,p}(c) = 0$. Since $g_p$ is surjective, there is $b \in \Omega_B^p$ such that $g_p(b) = c$. The map $g$ is a chain map, so $g_{p+1}(d_{B,p}(b)) = d_{C,p}(g_p(b)) = 0$. Since the sequence is exact, there is $a \in \Omega_A^{p+1}$ such that $d_{B,p}(b) = f_{p+1}(a)$. Defining $\delta(c) = a$, we get that $\delta : \Omega_C^p \to \Omega_A^{p+1}$ induces the well-defined homomorphism $\delta : H^p_{DR}(C) \to H^{p+1}_{DR}(A)$, as shown in the following items:
(i) There is a unique $a \in \Omega_A^{p+1}$ such that $d_{B,p}(b) = f_{p+1}(a)$. Suppose there are $a, a' \in \Omega_A^{p+1}$ such that $d_{B,p}(b) = f_{p+1}(a) = f_{p+1}(a')$, so we get $a - a'$ belonging to $\mathrm{Ker}(f_{p+1}) = \{0\}$. Hence $a' = a$.
(ii) If $[c'] = [c] \in H^p_{DR}(C)$, then we have $c' = c + d\eta$, $\eta \in \Omega_C^{p-1}$, and therefore $\delta(c')$ and $\delta(c)$ define the same class.
(iii) $d_{A,p+1}(a) = 0$. The cocycle condition yields $f_{p+2}\, d_{A,p+1}(a) = d_{B,p+1} f_{p+1}(a) = d_{B,p+1} d_{B,p}(b) = 0$. Since $f_{p+2}$ is injective, we have $d_{A,p+1}(a) = 0$. Consequently, $a$ defines a class in $H^{p+1}_{DR}(A)$.
5 De Rham Cohomology of Spheres and Surfaces

Computing the De Rham cohomology groups is, in general, a difficult task. However, there are several examples in which the groups can be found with relatively simple techniques. We will work out two cases: the sphere $S^n$ and the closed surface $\Sigma_g$ of genus $g$. To find the groups, we will make use of partitions of unity and the Mayer-Vietoris exact sequence.

Partitions of Unity

Partitions of unity are useful tools in situations in which we need to define global objects from local objects.
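Given a function $f : \Omega \to \mathbb{R}$, let $\mathrm{supp}(f) = \overline{\{x \in \Omega \mid f(x) \ne 0\}}$ be the support of $f$.

Definition 13 A family of subsets $\mathcal{C} = \{C_\lambda\}_{\lambda \in \Lambda}$ contained in $\Omega \subset \mathbb{R}^n$ is locally finite if every point $x \in \Omega$ has a neighborhood $V_x \subset \Omega$ intersecting only a finite number of sets $C_\lambda$. In this way, if $\Omega \subset \bigcup_{\lambda \in \Lambda} C_\lambda$, then the family $\mathcal{C} = \{C_\lambda \mid \lambda \in \Lambda\}$ defines a locally finite cover of $\Omega$.

Definition 14 A partition of unity defined in $\Omega$ is a family of $C^k$-functions $\{\phi_\lambda\}_{\lambda \in \Lambda} \subset C^k(\Omega, \mathbb{R})$ such that:
(1) For all $\lambda \in \Lambda$ and $x \in \Omega$, we have $\phi_\lambda(x) \ge 0$.
(2) The family of sets $\mathcal{F} = \{\mathrm{supp}(\phi_\lambda)\}_{\lambda \in \Lambda}$ is locally finite in $\Omega$.
(3) For all $x \in \Omega$, we have $\sum_{\lambda \in \Lambda} \phi_\lambda(x) = 1$.

The functions $\phi_\lambda$ are called cutoff functions. From the definition, we get:
(1) $0 \le \phi_\lambda(x) \le 1$ for all $x \in \Omega$ and $\lambda \in \Lambda$.
(2) For all $x \in \Omega$, there is $\lambda \in \Lambda$ such that $\phi_\lambda(x) > 0$. The supports define the locally finite cover $\Omega \subset \bigcup_{\lambda \in \Lambda} \mathrm{supp}(\phi_\lambda)$.

Definition 15 A partition of unity $\{\phi_\lambda \mid \lambda \in \Lambda\}$ is subordinate to the cover $\mathcal{C} = \{U_\theta \mid \theta \in \Theta\}$ of $\Omega$ if, for each $\lambda \in \Lambda$, there is $\theta \in \Theta$ such that $\mathrm{supp}(\phi_\lambda) \subset U_\theta$. In this case, the cover $\{\mathrm{supp}(\phi_\lambda) \mid \lambda \in \Lambda\}$ is a refinement of the cover $\{U_\theta \mid \theta \in \Theta\}$.

Theorem 3 Let $\Omega \subset \mathbb{R}^n$ be an open subset. For any open cover $\mathcal{C} = \{U_\lambda \mid \lambda \in \Lambda\}$ of $\Omega$, there is a partition of unity $\{\phi_\lambda\}_{\lambda \in \Lambda}$ subordinate to $\mathcal{C}$.

The proof of the theorem requires a list of results left as Exercises; some are elementary.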
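The three conditions of Definition 14 can be observed numerically in dimension one. The following is a minimal sketch under illustrative assumptions: the bump function $e^{-1/(1-t^2)}$, the centers $k/2$ and the radius $1$ are hypothetical choices, and the cutoff functions are obtained by dividing each bump by the (locally finite) sum of all bumps.

```python
import math

def bump(t):
    """Smooth function, positive on (-1, 1) and zero elsewhere."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

# Hypothetical cover of the interval [0, 4] by the intervals (c - RADIUS, c + RADIUS).
CENTERS = [0.5 * k for k in range(-2, 11)]
RADIUS = 1.0

def psi(k, x):
    """Bump supported in the closed interval of radius RADIUS around CENTERS[k]."""
    return bump((x - CENTERS[k]) / RADIUS)

def partition(x):
    """Values phi_k(x) = psi_k(x) / sum_n psi_n(x) for every index k."""
    values = [psi(k, x) for k in range(len(CENTERS))]
    total = sum(values)          # strictly positive for x in [0, 4]
    return [v / total for v in values]

samples = [0.01 * i for i in range(401)]   # points of [0, 4]
for x in samples:
    phis = partition(x)
    assert all(0.0 <= p <= 1.0 for p in phis)      # condition (1)
    assert sum(1 for p in phis if p > 0.0) <= 4    # only finitely many are non-zero
    assert abs(sum(phis) - 1.0) < 1e-12            # condition (3)
```

Each cutoff function is supported in one interval of the cover, so this partition of unity is subordinate to the cover by the intervals $(c - 1, c + 1)$.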
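Exercises
(1) (Lindelöf's Theorem) Let $\Omega \subset \mathbb{R}^n$ be an arbitrary set. Show that any open cover of $\Omega$ admits a countable subcover.³
(2) Show that any locally finite family of sets contained in an open set $\Omega \subset \mathbb{R}^n$ is countable.
(3) If $\mathcal{C} = \{C_\lambda \mid \lambda \in \Lambda\}$ is a locally finite family contained in an open subset $\Omega$, show that a compact set $K \subset \Omega$ intersects at most a finite number of sets $C_\lambda$.
(4) If $\mathcal{C} = \{C_\lambda \mid \lambda \in \Lambda\}$ is a locally finite family of closed sets contained in $\Omega$, show that the set $C = \bigcup_{\lambda \in \Lambda} C_\lambda$ is closed in $\Omega$.
(5) Let $0 < a < b$ and consider the function (Fig. 1)
$$f(t) = \begin{cases} e^{\frac{1}{(t+a)(t+b)}}, & t \in (-b, -a); \\ 0, & t \notin (-b, -a). \end{cases}$$
(a) Sketch the graph of
$$g(t) = \begin{cases} \dfrac{f(t)}{\int_{-\infty}^{\infty} f(t)\, dt}, & t \le 0, \\[2ex] -\dfrac{f(-t)}{\int_{-\infty}^{\infty} f(t)\, dt}, & t > 0. \end{cases}$$
(b) Consider the function $h : \mathbb{R} \to \mathbb{R}$ given by $h(s) = \int_{-\infty}^{s} g(t)\, dt$ and sketch the graph.
(c) Let $\psi : \mathbb{R}^n \to \mathbb{R}$ be the function $\psi(x) = h(|x|)$. Show that $\psi \in C^\infty(\mathbb{R}^n, \mathbb{R})$ and $\mathrm{supp}(\psi) \subset \overline{B_b(0)}$.

³ The proofs can be read in [29].

[Fig. 1: Graph of f(t)]

Lemma 3 Any open subset $\Omega \subset \mathbb{R}^n$ admits a countable cover $\Omega = \bigcup_{n \in \mathbb{N}} K_n$ by compact sets $K_n$ such that $K_n \subset \mathrm{int}(K_{n+1})$ for all $n \in \mathbb{N}$.

Proof Every point $a \in \Omega$ belongs to a ball $B_\delta(a)$ whose closure is contained in $\Omega$, so $\Omega \subset \bigcup_{a \in \Omega} B_\delta(a)$. By Lindelöf's theorem, there is a countable subcover such that $\Omega \subset \bigcup_{n \in \mathbb{N}} B_\delta(a_n)$ and $\Omega \subset \bigcup_{n \in \mathbb{N}} \overline{B_\delta(a_n)}$, with $\overline{B_\delta(a_n)}$ being compact sets. The compact sets $K_n$ are defined by induction, as follows: let $K_1 = \overline{B_\delta(a_1)}$ and suppose that
(1) $\Omega \subset K_1 \cup K_2 \cup \cdots \cup K_p \cup B_\delta(a_{p+1}) \cup \cdots$,
(2) $K_i \subset \mathrm{int}(K_{i+1})$, $1 \le i < p$,
(3) $\overline{B_\delta(a_1)} \cup \cdots \cup \overline{B_\delta(a_p)} \subset K_p$.
Take a cover of the compact set $K_p \cup \overline{B_\delta(a_{p+1})}$ by a finite number of balls $B_\delta(a)$, with $a \in K_p \cup \overline{B_\delta(a_{p+1})}$, and let $K_{p+1}$ be the union of the closures of these balls, so that $K_p \cup \overline{B_\delta(a_{p+1})} \subset \mathrm{int}(K_{p+1})$. Repeating this process, we obtain the cover $\{K_n \mid n \in \mathbb{N}\}$.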
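Proof (Theorem 3) Let $\{x_\lambda \mid \lambda \in \Lambda\}$ be a family of points in $\Omega$ and let $B_\delta(x_\lambda)$ be open balls such that
\[
\Omega \subset \bigcup_{\lambda\in\Lambda} B_\delta(x_\lambda).
\]
For $\lambda \in \Lambda$, consider the function $\psi_\lambda(x)$ such that $\operatorname{supp}(\psi_\lambda) \subset B_\delta(x_\lambda)$. By Lemma 3, there is a countable family of compact sets $\{K_k \mid k \in \mathbb{N}\}$ such that $K_k \subset K_{k+1}$ and $\Omega \subset \bigcup_{k\in\mathbb{N}} K_k$. Let $A_1 = \bigcup_{i=1}^{p_1} B_\delta(x_i)$ be a finite cover of $K_1$. Since $K_2 \setminus \operatorname{int}(K_1)$ is compact, consider $A_2 = \bigcup_{i=1}^{p_2} B_\delta(x_i)$ a finite cover of $K_2 \setminus \operatorname{int}(K_1)$. Analogously, $K_k \setminus \operatorname{int}(K_{k-1})$ is compact, so
\[
A_k = \bigcup_{i=1}^{p_k} B_\delta(x_i)
\]
is a finite cover of $K_k \setminus \operatorname{int}(K_{k-1})$. Therefore $\Omega \subset \bigcup_{k\in\mathbb{N}} A_k$. After ordering the set of balls, we have $\Omega \subset \bigcup_{n\in\mathbb{N}} B_\delta(x_n)$. For $p \in \mathbb{N}$, consider the function $\varphi_p : \Omega \to \mathbb{R}$,
\[
\varphi_p(x) = \frac{\psi_p(x)}{\sum_{n\in\mathbb{N}} \psi_n(x)}.
\]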
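The collection of functions $\{\varphi_p \mid p \in \mathbb{N}\}$ satisfies the following properties:
(1) $0 \leq \varphi_p(x) \leq 1$ for all $x \in \Omega$.
(2) For any $x \in \Omega$, we have $\varphi_p(x) = 0$ except for a finite number of $p \in \mathbb{N}$, and $\operatorname{supp}(\varphi_p) \subset B_\delta(x_p)$. Consequently, the family $\mathcal{F} = \{\operatorname{supp}(\varphi_p) \mid p \in \mathbb{N}\}$ is locally finite in $\Omega$ and is also subordinated to the cover $\{B_\delta(x_p) \mid p \in \mathbb{N}\}$.
(3) $\sum_{p\in\mathbb{N}} \varphi_p = 1$.

The Mayer-Vietoris Sequence

Let $M$ be a set such that $M = U \cup V$, with $U, V$ open sets. We have the inclusions
\[
M \xleftarrow{\ i\ } U \sqcup V \xleftarrow{\ i_U,\ i_V\ } U \cap V,
\]
where $U \sqcup V$ is the disjoint union. The map $i : U \sqcup V \to M$ is the inclusion, and the maps $i_U : U \cap V \to U$ and $i_V : U \cap V \to V$ are also inclusions. These maps define the homomorphisms
\[
\Omega^*(M) \xrightarrow{\ i^*\ } \Omega^*(U) \oplus \Omega^*(V) \xrightarrow{\ i_U^* \oplus i_V^*\ } \Omega^*(U \cap V). \tag{19}
\]
Consider the homomorphism $j^* : \Omega^*(U) \oplus \Omega^*(V) \to \Omega^*(U \cap V)$ given by $j^*(\alpha, \beta) = i_U^*(\alpha) - i_V^*(\beta)$.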
Lemma 4 (Mayer-Vietoris Sequence) The sequence
\[
0 \to \Omega^*(M) \xrightarrow{\ i^*\ } \Omega^*(U) \oplus \Omega^*(V) \xrightarrow{\ j^*\ } \Omega^*(U \cap V) \to 0, \qquad (\alpha, \beta) \mapsto i_U^*(\alpha) - i_V^*(\beta),
\]
is exact.
Proof Let $w \in \Omega^*(M)$ be a differential form on $M$. The restrictions $w_U = w|_U$, $w_V = w|_V$ are also forms on $U$ and $V$, respectively, and they agree on $U \cap V$; hence $j^* \circ i^* = 0$. Let's check that $j^*$ is surjective. Let $\eta \in \Omega^*(U \cap V)$ and let $\{\varphi_U, \varphi_V\}$ be a partition of unity subordinate to the cover $\{U, V\}$ of $M$. The forms $\varphi_U.\eta \in \Omega^*(U)$ and $\varphi_V.\eta \in \Omega^*(V)$ satisfy $\eta = \varphi_U.\eta - (-\varphi_V.\eta)$. Therefore $j^*$ is surjective and $j^*(\varphi_U.\eta, -\varphi_V.\eta) = \eta$.

Proposition 10 and Lemma 4 yield the Mayer-Vietoris Exact Sequence (MVS),
\[
\cdots \to H^p_{DR}(M) \xrightarrow{\ i^p\ } H^p_{DR}(U) \oplus H^p_{DR}(V) \xrightarrow{\ j^p\ } H^p_{DR}(U \cap V) \xrightarrow{\ \delta\ } H^{p+1}_{DR}(M) \to \cdots.
\]
The MVS is useful to compute De Rham cohomology once the space splits into subsets in which the groups are known. Given the closed form $w \in \Omega^p(U \cap V)$, we can find $\delta(w) \in H^{p+1}_{DR}(M)$ explicitly. The pair $\xi = (\varphi_U.w, -\varphi_V.w) \in \Omega^*(U) \oplus \Omega^*(V)$ satisfies $j^p(\xi) = w$. Since $d_p w = 0$, we have $0 = d_p j^p \xi = j^{p+1} d_p \xi$, so $d_p \xi = i^{p+1}(a)$ with $a = \delta(w)$. Here $d_p \xi = (d_p(\varphi_U.w), -d_p(\varphi_V.w)) \in \Omega^*(U) \oplus \Omega^*(V)$, and since $j^{p+1} d_p \xi = d_p(\varphi_U.w) + d_p(\varphi_V.w) = 0$, we have $d_p(\varphi_U.w) = -d_p(\varphi_V.w)$ in $U \cap V$. Then we define
\[
\delta(w) = \begin{cases} [d(\varphi_U.w)], & \text{in } U, \\ -[d(\varphi_V.w)], & \text{in } V, \end{cases} \quad \in H^{p+1}_{DR}(M).
\]
Although the groups $H_{DR}$ have been introduced in open subsets of $\mathbb{R}^n$, they can be defined for differentiable submanifolds of $\mathbb{R}^n$ by using the same formalism we already developed. We will ignore this formality to describe their De Rham cohomology groups and we will use the fact that the sphere $S^n$ and the closed surface $\Sigma_g$ of genus $g$ are both compact submanifolds of some Euclidean space $\mathbb{R}^N$. The tools so far developed are enough; we just need some topological properties for each space.

1-Sphere $S^n$

Using the stereographic projection, we have $S^n = U \cup V$, where $U, V$ are open subsets homeomorphic to the open unit ball $B^n = \{x \in \mathbb{R}^n ; |x| < 1\}$ and $U \cap V$ is homeomorphic to $S^{n-1} \times \mathbb{R}$, as illustrated in Fig. 2. So we have $H^p_{DR}(U) = H^p_{DR}(V) = 0$ and $H^p_{DR}(U \cap V) = H^p_{DR}(S^{n-1})$ for all $p > 0$. From the Mayer-Vietoris exact sequence,
\[
\cdots \to H^{p-1}_{DR}(S^n) \xrightarrow{\ i^{p-1}\ } 0 \oplus 0 \xrightarrow{\ j^{p-1}\ } H^{p-1}_{DR}(S^{n-1}) \xrightarrow{\ \delta\ } H^p_{DR}(S^n) \to 0 \cdots
\]
Fig. 2 S n = U ∪ V
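yields an isomorphism $\delta : H^{p-1}_{DR}(S^{n-1}) \to H^p_{DR}(S^n)$ for all $p > 1$. When $n > 1$ and $p = 1$, $j^0$ is surjective in
\[
0 \to H^0_{DR}(S^n) = \mathbb{R} \xrightarrow{\ i^0\ } \mathbb{R} \oplus \mathbb{R} \xrightarrow{\ j^0\ } H^0_{DR}(S^{n-1}) = \mathbb{R} \xrightarrow{\ \delta\ } H^1_{DR}(S^n) \to 0 \cdots
\]
breaking into three exact sequences:
(i) $0 \to \mathbb{R} \xrightarrow{\ i^0\ } \mathbb{R} \oplus \mathbb{R} \xrightarrow{\ j^0\ } \mathbb{R} \to 0$,
(ii) $0 \to H^1_{DR}(S^n) \to 0$,
(iii) $0 \to H^{p-1}_{DR}(S^{n-1}) \xrightarrow{\ \delta\ } H^p_{DR}(S^n) \to 0$.
Consequently, $H^1_{DR}(S^n) = 0$ for all $n > 1$. When $n = 1$, the sequence becomes
\[
0 \to \mathbb{R} \xrightarrow{\ i^0\ } \mathbb{R} \oplus \mathbb{R} \xrightarrow{\ j^0\ } \mathbb{R} \oplus \mathbb{R} \xrightarrow{\ \delta\ } H^1_{DR}(S^1) \to 0 \cdots \tag{20}
\]

Proposition 11 Let $n > 1$. The isomorphisms $H^{p-1}_{DR}(S^{n-1}) \simeq H^p_{DR}(S^n)$ and $H^1_{DR}(S^n) \simeq 0$ determine $H^*_{DR}(S^n)$; indeed,
\[
H^i_{DR}(S^n) = \begin{cases} \mathbb{R}, & \text{if } i = 0, \\ 0, & \text{if } 0 < i < n, \\ H^1_{DR}(S^1), & \text{if } i = n. \end{cases}
\]

Proof By recurrence, we get that $H^p_{DR}(S^n) = H^{p-k}_{DR}(S^{n-k})$. If $0 < p < n$ and $k = p - 1$, then we have $H^p_{DR}(S^n) = H^1_{DR}(S^{n-p+1}) = 0$. If $p = n$, the same recurrence gives us $H^n_{DR}(S^n) \simeq H^1_{DR}(S^1)$.

To find $H^1_{DR}(S^1)$, we need to check on the sequence (20),
\[
0 \to \mathbb{R} \xrightarrow{\ i^0\ } \mathbb{R} \oplus \mathbb{R} \xrightarrow{\ j^0\ } \mathbb{R} \oplus \mathbb{R} \xrightarrow{\ \delta\ } H^1_{DR}(S^1) \to 0 \cdots \tag{21}
\]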
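The homomorphism $i^0$ is injective, so we have $\operatorname{Im}(j^0) \simeq \mathbb{R}$. Since $\delta$ is surjective, we have $H^1_{DR}(S^1) = (\mathbb{R} \oplus \mathbb{R})/\operatorname{Im}(j^0) = \mathbb{R}$.

Theorem 4 Let $n > 0$. The cohomology groups of the sphere $S^n$ are
\[
H^i_{DR}(S^n) = \begin{cases} \mathbb{R}, & \text{if } i = 0, \\ 0, & \text{if } 1 \leq i \leq n - 1, \\ \mathbb{R}, & \text{if } i = n. \end{cases}
\]
If $n = 0$, then $H^0_{DR}(S^0) = \mathbb{R} \oplus \mathbb{R}$.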
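The recurrence used in the proof of Proposition 11 can be turned into a small computation of the dimensions of the groups in Theorem 4. The sketch below is only an illustration; the function name `betti_sphere` is an assumption of this example and the base case for the circle is taken from sequence (20).

```python
def betti_sphere(n):
    """Dimensions of H^i_DR(S^n) for i = 0..n via the Mayer-Vietoris recurrence."""
    if n == 0:
        return [2]          # S^0 consists of two points
    if n == 1:
        return [1, 1]       # base case read off from sequence (20)
    prev = betti_sphere(n - 1)
    # H^0 = R (connected), H^1 = 0 for n > 1,
    # and H^p(S^n) = H^{p-1}(S^{n-1}) for p > 1.
    return [1, 0] + prev[1:]

def euler_characteristic(betti):
    """Alternating sum of the dimensions."""
    return sum((-1) ** i * b for i, b in enumerate(betti))
```

For instance, `betti_sphere(2)` returns `[1, 0, 1]`, and the alternating sum is 2 for even n and 0 for odd n.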
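2-Closed Surface $\Sigma_g$

The De Rham cohomology of a genus $g$ closed surface $\Sigma_g$ is
\[
H^i_{DR}(\Sigma_g) = \begin{cases} \mathbb{R}, & \text{if } i = 0, \\ \mathbb{R}^{2g}, & \text{if } i = 1, \\ \mathbb{R}, & \text{if } i = 2. \end{cases}
\]
Let's find these groups using forms. Since $\Sigma_g$ is path connected, we get $H^0_{DR}(\Sigma_g) = \mathbb{R}$. To find the 1st and 2nd cohomology groups, we need some lemmas. Let $B = \{x \in \mathbb{R}^2 ; |x| < 1\}$ be the open unit disc in $\mathbb{R}^2$.

Lemma 5 Let $\rho \in \Omega^2(B)$ be a 2-form with compact support in $B$ such that $\int_B \rho = 0$. Then there is $\alpha \in \Omega^1(B)$ with compact support in $B$ such that $\rho = d\alpha$.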
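Proof Let $\rho = R(u, v)\,du\,dv$ and consider $\psi : \mathbb{R} \to \mathbb{R}$ a cutoff function with support in $B$ and $\int_{-\infty}^{\infty} \psi(t)\,dt = 1$. Define the functions
\[
r(u) = \int_{-\infty}^{\infty} R(u, t)\,dt, \qquad \tilde{R}(u, v) = R(u, v) - r(u)\psi(v).
\]
Then $\tilde{R}$ has compact support in $B$ and, for each $u$, we have $\int_{-\infty}^{\infty} \tilde{R}(u, v)\,dv = r(u) - r(u)\int_{-\infty}^{\infty}\psi(v)\,dv = 0$; in particular,
\[
\int_B \tilde{R}(u, v)\,du\,dv = \int_B R(u, v)\,du\,dv - \int_{-\infty}^{\infty} r(u)\,du \int_{-\infty}^{\infty} \psi(v)\,dv = 0.
\]
The function
\[
P(u, v) = \int_{-\infty}^{v} \tilde{R}(u, t)\,dt
\]
has compact support in $B$ and $\frac{\partial P}{\partial v} = \tilde{R}(u, v)$. The function
\[
Q(u, v) = \psi(v) \int_{-\infty}^{u} r(t)\,dt
\]
also has compact support in $B$, because $\int_{-\infty}^{\infty} r(t)\,dt = \int_B \rho = 0$, and
\[
\frac{\partial Q}{\partial u} = \psi(v) r(u) = R(u, v) - \tilde{R}(u, v).
\]
Therefore we have $R = \frac{\partial P}{\partial v} + \frac{\partial Q}{\partial u}$. Considering the 1-form $\alpha = -P\,du + Q\,dv$, we have $\rho = d\alpha$.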
The lemma extends over $\Sigma_g$.

Corollary 2 Let $\rho \in \Omega^2(\Sigma_g)$ be a 2-form such that $\int_{\Sigma_g} \rho = 0$. Then there is $\alpha \in \Omega^1(\Sigma_g)$ such that $\rho = d\alpha$.

Proof Consider a cover $\Sigma_g = \bigcup_{i=1}^{n} U_i$ such that each $U_i$ is an open subset diffeomorphic to $B$. We use induction on $n$. If $n = 1$, then we are reduced to the last lemma. Suppose $n > 1$ and consider $U = U_1$, $V = U_2 \cup \cdots \cup U_n$, so $\Sigma_g = U \cup V$. If either $\Sigma_g \cap U = \emptyset$ or $\Sigma_g \cap V = \emptyset$, we are done, given the lemma. Since $\Sigma_g$ is connected, there is a point in $\Sigma_g \cap U \cap V$. Choose a 2-form $\tau$ with compact support in $U \cap V$ and $\int_{\Sigma_g} \tau = 1$. Let $\{f_1, f_2\}$ be a partition of unity that is subordinate to the cover $\{U, V\}$. Then $\rho = f_1\rho + f_2\rho$, $\operatorname{supp}(f_1\rho) \subset U$ and $\operatorname{supp}(f_2\rho) \subset V$. Let
\[
I = \int_{\Sigma_g} f_1\rho = -\int_{\Sigma_g} f_2\rho.
\]
Both 2-forms $f_1\rho - I\tau$ and $f_2\rho + I\tau$ have compact support in $U$ and $V$, respectively, and
\[
\int_{\Sigma_g} (f_1\rho - I\tau) = \int_{\Sigma_g} (f_2\rho + I\tau) = 0.
\]
Using the inductive hypothesis, we can find a 1-form $\alpha$ such that $\operatorname{supp}(\alpha) \subset U$ and $d\alpha = f_1\rho - I\tau$, and a 1-form $\beta$ such that $\operatorname{supp}(\beta) \subset V$ and $d\beta = f_2\rho + I\tau$. Hence $\rho = d(\alpha + \beta)$.

A by-product of the corollary is the isomorphism $H^2_{DR}(\Sigma_g) = \mathbb{R}$. To find the group $H^1_{DR}(\Sigma_g)$, let's consider the following functionals:
(i) Let $\gamma : [0, 1] \to \Sigma_g$ be a closed curve, i.e., $\gamma(0) = \gamma(1)$. Integration around $\gamma$ yields a linear map
\[
I_\gamma : H^1_{DR}(\Sigma_g) \to \mathbb{R}, \qquad I_\gamma(\omega) = \int_\gamma \omega. \tag{22}
\]
(ii) Given a 1-form $\theta \in \Omega^1(\Sigma_g)$, we get a linear map
\[
J_\theta : H^1_{DR}(\Sigma_g) \to \mathbb{R}, \qquad J_\theta(\omega) = \int_{\Sigma_g} \theta \wedge \omega. \tag{23}
\]
Proposition 12 For any loop $\gamma$, we have a 1-form $\theta \in \Omega^1(\Sigma_g)$ with compact support in a collar neighborhood of $\gamma$ such that $J_\theta = I_\gamma$.

Proof Let $P = \{0 = t_0, t_1, \dots, t_N = 1\}$ be a partition of the interval $[0, 1]$. Consider the image set $\{\gamma(t_0), \dots, \gamma(t_{i-1}), \gamma(t_i), \dots, \gamma(t_N)\}$, with $\gamma(t_0) = \gamma(t_N)$. We take a cover $\gamma([0, 1]) \subset \bigcup_{i=1}^{N} U_i$ of $\gamma([0, 1])$ with open subsets $U_i$ such that:
(i) for all $i$, $U_i$ is diffeomorphic to $B$;
(ii) $\gamma([t_{i-1}, t_i]) \subset U_i$;
(iii) for any pairwise distinct $i, j, k$, $U_i \cap U_j \cap U_k = \emptyset$.
Choose small open discs $D_i$, $0 \leq i \leq N$, such that:
(i) $D_i \subset U_i \cap U_{i+1}$ and $D_0 = D_N \subset U_1 \cap U_N$;
(ii) $\gamma(t_i) \in D_i$;
(iii) $D_i \cap D_j = \emptyset$ unless $\{i, j\} = \{0, N\}$.
Let $\{\rho_i \mid 0 \leq i \leq N\}$, with $\rho_0 = \rho_N$, be a family of 2-forms such that (i) $\operatorname{supp}(\rho_i) \subset D_i$; (ii) $\int_{D_i} \rho_i = 1$. The 2-form $\rho_i - \rho_{i-1}$ has compact support in $U_i$ and $\int_{U_i} (\rho_i - \rho_{i-1}) = 0$. By Lemma 5, there are 1-forms $\theta_i$ with compact support in $U_i$ such that
\[
d\theta_1 = \rho_1 - \rho_0, \ \dots, \ d\theta_i = \rho_i - \rho_{i-1}, \ \dots, \ d\theta_N = \rho_N - \rho_{N-1}.
\]
Considering the 1-form $\theta = \theta_1 + \cdots + \theta_N$, we get
\[
d\theta = (\rho_1 - \rho_0) + (\rho_2 - \rho_1) + \cdots + (\rho_N - \rho_{N-1}) = \rho_N - \rho_0 = 0.
\]
Let $\zeta \in H^1_{DR}(\Sigma_g)$ and find a representative $\omega$ such that $[\omega] = \zeta$, vanishing when restricted to each disk $D_k$, $0 \leq k \leq N$. Since $H^1_{DR}(U_i) = 0$, we have a function $f_i$ on $U_i$ such that $\omega = df_i$ in $U_i$. So we have
\[
\int_{\Sigma_g} \theta_i \wedge \omega = \int_{U_i} \theta_i \wedge \omega = \int_{U_i} \theta_i \wedge df_i = \int_{U_i} f_i(\rho_i - \rho_{i-1}).
\]
Since $\omega$ vanishes on all the discs $D_k$, the function $f_i$ must be constant on $D_i \supset \operatorname{supp}(\rho_i)$ and on $D_{i-1} \supset \operatorname{supp}(\rho_{i-1})$. So
\[
\int_{\Sigma_g} \theta_i \wedge \omega = f_i(\gamma(t_i)) - f_i(\gamma(t_{i-1})).
\]

Fig. 3 16-polygon

On the other hand, since $\omega = df_i$ over the image of the interval $[t_{i-1}, t_i]$, we have
\[
\int_\gamma \omega = \sum_{i=1}^{N} \int_{\gamma([t_{i-1}, t_i])} df_i = \sum_{i=1}^{N} \big[ f_i(\gamma(t_i)) - f_i(\gamma(t_{i-1})) \big].
\]
Hence $J_\theta = I_\gamma$.
The result from the proposition above is a bilinear pairing
\[
H^1_{DR}(\Sigma_g) \times H^1_{DR}(\Sigma_g) \to \mathbb{R},
\]
which is skew-symmetric and non-degenerate. Therefore $H^1_{DR}(\Sigma_g)$ must be even dimensional, say isomorphic to $\mathbb{R}^{2N}$. To find $N$, let's use the fact that $\Sigma_g = U \cup V$, where $U$ is diffeomorphic to $B$ and $V$ is homotopic to a bouquet $\vee_{i=1}^{2g} S^1_i$ of 1-spheres. Indeed, $\Sigma_g$ is obtained by identifying the sides of a 4g-polygon as shown in Figs. 3, 4, 5, 6, 7, 8, 9 and 10 (the figures were kindly provided by Jos Leys; see Ref. [47] (iv)). Let $V$ be the closure of the neighborhood of the closed curves obtained by identifying the sides of the 4g-polygon over $\Sigma_g$, as shown in Fig. 10, and let $U$ be the complement of $V$ in $\Sigma_g$. Indeed, $V$ is a neighborhood of the sides of the 4g-polygon and $U$ is the disk inside. Using the De Rham cohomology of the 1-sphere, we can compute the De Rham groups of $V$, and the De Rham cohomology of $U$ is isomorphic to the cohomology of $\mathbb{R}^2$. The Mayer-Vietoris exact sequence associated to the decomposition $\Sigma_g = U \cup V$, with $U \cap V$ homotopic to $S^1 \times I$,
Fig. 4 1st-step
Fig. 5 2nd-step
Fig. 6 3rd-step
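breaks into two exact sequences: (s1), which is related to the 0-cohomology groups, and (s2), which is related to the higher order groups, as follows:
\[
\begin{aligned}
&\text{(s1)} \quad 0 \to \mathbb{R} \xrightarrow{\ i^0\ } \mathbb{R} \oplus \mathbb{R} \xrightarrow{\ j^0\ } \mathbb{R} \to 0, \\
&\text{(s2)} \quad 0 \to H^1_{DR}(\Sigma_g) \xrightarrow{\ i^1\ } 0 \oplus H^1_{DR}(\vee_{i=1}^{2g} S^1_i) \to H^1_{DR}(S^1 \times I) \xrightarrow{\ \delta\ } H^2_{DR}(\Sigma_g) \to 0.
\end{aligned} \tag{24}
\]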
Fig. 7 4th-step
Fig. 8 5th-step
Fig. 9 4
Fig. 10 4
In the sequence (s2), $\delta$ is an isomorphism, so (s2) breaks into (s3) and (s4):
\[
\begin{aligned}
&\text{(s3)} \quad 0 \to H^1_{DR}(\Sigma_g) \xrightarrow{\ i^1\ } 0 \oplus H^1_{DR}(\vee_{i=1}^{2g} S^1_i) \to 0, \\
&\text{(s4)} \quad 0 \to H^1_{DR}(S^1 \times I) \xrightarrow{\ \delta\ } H^2_{DR}(\Sigma_g) \to 0.
\end{aligned} \tag{25}
\]
Therefore $i^1$ is an isomorphism, and so $H^1_{DR}(\Sigma_g) = \mathbb{R}^{2g}$.

To describe the ring structure of $H^*_{DR}$, we need further concepts beyond the scope of this text. References [22, 34] fully approach the singular homology and cohomology theories.
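As a quick consistency check of the groups just computed, the alternating sum of their dimensions can be compared with the count of cells in the 4g-polygon model. In the sketch below, the function names are hypothetical, and the cell count (one vertex, 2g edges, one face after the identifications) is a standard fact assumed here rather than derived in the text.

```python
def betti_surface(g):
    """Dimensions of H^0, H^1, H^2 of the closed genus-g surface."""
    return [1, 2 * g, 1]

def euler_characteristic(betti):
    """Alternating sum of the dimensions."""
    return sum((-1) ** i * b for i, b in enumerate(betti))

def euler_from_polygon(g):
    """V - E + F for the 4g-gon model, g >= 1 (assumed cell structure):
    one vertex, 2g edges and one face after identifying the sides."""
    vertices, edges, faces = 1, 2 * g, 1
    return vertices - edges + faces
```

Both computations give 2 - 2g; for example, the torus (g = 1) yields 0.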
Exercises
(1) Show that $H^2_{DR}(\Sigma_g)$ is isomorphic to $\mathbb{R}$.
(2) Show that the linear functional $I_\gamma : H^1_{DR}(\Sigma_g) \to \mathbb{R}$ given in Eq. (22) is well-defined.
(3) Show that the linear functional $J_\theta : H^1_{DR}(\Sigma_g) \to \mathbb{R}$ given in Eq. (23) is well-defined.
(4) Let $\{D_i \mid 1 \leq i \leq N\}$ be a family of discs in $\Sigma_g$ such that $D_i \cap D_j = \emptyset$ if $i \neq j$. Given a class $\zeta \in H^1_{DR}(\Sigma_g)$, show that we have a representative $\omega \in Z^1(\Sigma_g)$ such that $\omega|_{D_i} = 0$ (hint: use the cutoff functions $\{\psi_i\}$ associated with the family of discs $\{D_i \mid 1 \leq i \leq N\}$ to obtain $\omega = \omega' - d(\psi_1 f_1 + \cdots + \psi_N f_N)$, where $\omega'$ is any representative with $[\omega'] = \zeta$ and $\omega'|_{D_i} = df_i$).
(5) Show that the cohomology of the bouquet $\vee_{i=1}^{2g} S^1_i$ is
\[
H^0_{DR}(\vee_{i=1}^{2g} S^1_i) = \mathbb{R}, \qquad H^1_{DR}(\vee_{i=1}^{2g} S^1_i) = \mathbb{R}^{2g}, \qquad H^2_{DR}(\vee_{i=1}^{2g} S^1_i) = 0.
\]
(6) Show that $\delta : H^1_{DR}(S^1 \times I) \to H^2_{DR}(\Sigma_g)$ is an isomorphism.
6 Stokes Theorem

The Stokes Theorem generalizes the classical integration theorems seen before. Differential forms are the fundamental tool to prove the theorem in full generality. Throughout the exposition, we consider $I = [0, 1] \subset \mathbb{R}$ to be the closed interval and $I_n = [0, 1] \times \cdots \times [0, 1] = [0, 1]^n$ the n-cube, with $[0, 1]^0 = \{0\} = \mathbb{R}^0$.

Definition 16 A singular n-cube in $U \subset \mathbb{R}^n$ is a continuous function $c : [0, 1]^n \to U$.

The simplest cases we have are the following:
(i) $c : [0, 1]^0 \to U$ corresponds to a point $c(0) \in U$;
(ii) $c : [0, 1]^1 \to U$ corresponds to a curve in $U$;
(iii) the n-cube $I_n$ is also a singular n-cube considering the identity map $I_n : [0, 1]^n \to \mathbb{R}^n$, $I_n(x) = x$.

Let $C_n(U)$ be the real vector space generated by the singular n-cubes, so that each element is written as a linear combination of a finite number of singular n-cubes. Any $c \in C_n(U)$ is a sum
\[
c = \sum_{i=1}^{k} a_i c_i, \qquad a_i \in \mathbb{R}, \ c_i : I_n \to U \text{ singular cube}.
\]
The elements of $C_n(U)$ are called n-chains. For each n-cube $I_n$, and for each $i \in \{1, \dots, n\}$, we define the $(n-1)$-faces of $I_n$ as follows:
\[
I_n^{(i,0)}(x) = I_n(x_1, \dots, x_{i-1}, 0, x_{i+1}, \dots, x_n), \qquad I_n^{(i,1)}(x) = I_n(x_1, \dots, x_{i-1}, 1, x_{i+1}, \dots, x_n).
\]
The n-cube $I_n$ has $2n$ faces of dimension $(n-1)$. The $(n-1)$-cube $I_n^{(i,0)}$ is the $(i, 0)$ face of $I_n$, and $I_n^{(i,1)}$ is the $(i, 1)$ face of $I_n$. An n-cube $I_n$ has 0-faces (vertices), 1-faces (edges), 2-faces, ..., $(n-1)$-faces. Now we consider the vector spaces $C_i(U)$ for $0 \leq i \leq n$, in which the elements are the singular i-chains.

Remark 5 Any k-dimensional differentiable submanifold $S^k$ of $\mathbb{R}^n$ admits a decomposition with singular k-cubes.

Definition 17 Let $c : I_n \to U$ be a singular n-cube and $\alpha \in \{0, 1\}$ (see Fig. 11).
(1) A face $(i, \alpha)$ of $c$ is $c_{(i,\alpha)} = c \circ I_n^{(i,\alpha)}$.
Fig. 11 Boundary of the cube I 3
(2) The boundary of a singular n-cube is the singular $(n-1)$-chain
\[
\partial c = \sum_{i=1}^{n} \sum_{\alpha=0}^{1} (-1)^{i+\alpha} c_{(i,\alpha)}.
\]
(3) The boundary of an n-chain $c = \sum_{i=1}^{k} n_i c_i$ is
\[
\partial c = \sum_{i=1}^{k} n_i\, \partial c_i.
\]
Example 5 The notation is crucial to the exposition.
(1) $n = 2$ and $x = (x_1, x_2)$:
\[
I_2^{(1,0)}(x) = I_2(0, x_2), \quad I_2^{(1,1)}(x) = I_2(1, x_2), \quad I_2^{(2,0)}(x) = I_2(x_1, 0), \quad I_2^{(2,1)}(x) = I_2(x_1, 1),
\]
\[
\partial I_2 = -I_2^{(1,0)} + I_2^{(2,0)} - I_2^{(2,1)} + I_2^{(1,1)}.
\]
(2) $n = 3$ and $x = (x_1, x_2, x_3)$:
\[
\begin{aligned}
I_3^{(1,0)}(x) &= I_3(0, x_2, x_3), & I_3^{(1,1)}(x) &= I_3(1, x_2, x_3), \\
I_3^{(2,0)}(x) &= I_3(x_1, 0, x_3), & I_3^{(2,1)}(x) &= I_3(x_1, 1, x_3), \\
I_3^{(3,0)}(x) &= I_3(x_1, x_2, 0), & I_3^{(3,1)}(x) &= I_3(x_1, x_2, 1),
\end{aligned}
\]
\[
\partial I_3 = -I_3^{(1,0)} + I_3^{(1,1)} + I_3^{(2,0)} - I_3^{(2,1)} - I_3^{(3,0)} + I_3^{(3,1)}.
\]

Lemma 6 For $0 \leq i \leq n$, the boundary defines the homomorphism $\partial_i : C_i(U) \to C_{i-1}(U)$, such that $\partial_0 = 0$ and $\partial_{i-1} \circ \partial_i = 0$ ($\partial^2 = 0$).

Proof See Ref. [22].

Definition 18 Let $S \subset \mathbb{R}^n$ be a k-dimensional submanifold. For $0 \leq i \leq k$, a singular i-chain $\sigma \in C_i(S)$:
(i) is an i-cycle if $\partial\sigma = 0$. The subspace of i-cycles is $Z_i(S) = \{\sigma \in C_i(S) \mid \partial\sigma = 0\}$.
(ii) is an i-boundary if we have $\tau \in C_{i+1}(S)$ such that $\sigma = \partial\tau$. The subspace of boundaries is $B_i(S) = \{\sigma \in C_i(S) \mid \sigma = \partial\tau\}$.
(iii) The ith singular homology group with real coefficients of $S$ is
\[
H_i(S; \mathbb{R}) = \frac{Z_i(S)}{B_i(S)}. \tag{26}
\]
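The sign rule of Definition 17 and the identity $\partial^2 = 0$ of Lemma 6 can be checked combinatorially on the faces of the standard cube. The following is a minimal sketch, assuming a face is encoded as a tuple with entries 0, 1 or a marker for a free coordinate; this encoding and the function names are choices made for the example only.

```python
from collections import defaultdict

FREE = "*"  # marks a coordinate that still ranges over [0, 1]

def boundary_of_face(face):
    """Boundary of one face of the standard cube as a dict {face: coefficient}.

    The i-th free slot (counted from 1) fixed to alpha contributes
    the sign (-1)**(i + alpha), as in Definition 17.
    """
    chain = defaultdict(int)
    free_slots = [k for k, entry in enumerate(face) if entry == FREE]
    for i, slot in enumerate(free_slots, start=1):
        for alpha in (0, 1):
            sub_face = face[:slot] + (alpha,) + face[slot + 1:]
            chain[sub_face] += (-1) ** (i + alpha)
    return dict(chain)

def boundary(chain):
    """Extend boundary_of_face linearly to a chain {face: coefficient}."""
    result = defaultdict(int)
    for face, coeff in chain.items():
        for sub_face, sign in boundary_of_face(face).items():
            result[sub_face] += coeff * sign
    # Drop faces whose coefficients cancelled.
    return {face: c for face, c in result.items() if c != 0}

square = {(FREE, FREE): 1}
cube = {(FREE, FREE, FREE): 1}
```

Here `boundary(square)` reproduces the four signed edges of Example 5 (1), and applying `boundary` twice to `cube` returns the empty chain.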
The operators $d$ and $\partial$ share the similar properties $d^2 = 0$ and $\partial^2 = 0$, respectively; this suggests the existence of some relation between chains and forms, or between singular homology groups and De Rham cohomology groups. This relationship is established when integrating the forms on chains. For our purposes, we will consider only the singular i-cubes whose map $c : I_i \to U$ is differentiable. A k-form $\omega$ on $I_k$ is written as $\omega = f\,dx_1 \wedge \cdots \wedge dx_k$. We define
\[
\int_{I_k} \omega = \int_{I_k} f,
\]
or equivalently,
\[
\int_{I_k} f\,dx_1 \wedge \cdots \wedge dx_k = \int_{I_k} f(x_1, \dots, x_k)\,dx_1 \dots dx_k = \int_0^1 \cdots \int_0^1 f(x_1, \dots, x_k)\,dc_1 \dots dc_k
\]
($dc_i = dx_i \circ dc$). Let $\omega$ be a k-form on $U$ and $c$ a singular k-cube in $U$. We define
\[
\int_c \omega = \int_{I_k} c^*\omega.
\]
In local coordinates, the integral above is equal to
\[
\int_c f\,dx_1 \wedge \cdots \wedge dx_k = \int_{I_k} c^*(f\,dx_1 \wedge \cdots \wedge dx_k) = \int_{I_k} (f \circ c)\,dx_1 \dots dx_k.
\]
When $\omega$ is a 0-form, i.e., a function, and $c : \{0\} \to U$ is a singular 0-cube in $U$, we set $\int_c \omega = \omega(c(0))$. The integral of a k-form $\omega$ along a k-chain $c = \sum_{i=1}^{l} a_i c_i$ is
\[
\int_c \omega = \sum_{i=1}^{l} a_i \int_{c_i} \omega. \tag{27}
\]
The integral of a 1-form on a 1-chain is a line integral, and the integration of 2-forms on a 2-chain is a surface integral.

Definition 19 A k-chain $c$ in $U \subset \mathbb{R}^n$ is orientable if the orientation on $\mathbb{R}^n$ induces a non-null k-form on $c$.

Theorem 5 (Stokes Theorem) Let $\omega$ be a $(k-1)$-form on an open subset $U \subset \mathbb{R}^n$, let $c$ be an orientable k-chain on $U$ and $d : \Omega^{k-1}(U) \to \Omega^k(U)$ the exterior derivative. Then
\[
\int_c d\omega = \int_{\partial c} \omega. \tag{28}
\]
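Proof Consider $c = \mathrm{id}_{I_k} : I_k \to U$ and $\omega$ a $(k-1)$-form on $I_k$. The boundary of $c$ is $\partial c = \sum (-1)^{i+\alpha} c_{(i,\alpha)}$ (here $c_{(i,\alpha)} = c \circ I_k^{(i,\alpha)} : I_{k-1} \to I_k$). An orthonormal basis of $\mathbb{R}^n$ gives an orientation on $\mathbb{R}^n$. A $(k-1)$-form $\omega$ is written as
\[
\omega = \sum_{i=1}^{k} f_i\,dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_k.
\]
Noting that
\[
c_{(j,\alpha)}^*(dx_i) = \begin{cases} 0, & i = j; \\ dx_i, & i \neq j, \end{cases}
\]
we get
\[
\int_{I_{k-1}} (I_k^{(j,\alpha)})^*(f_i\,dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_k) = \begin{cases} 0, & i \neq j; \\ \displaystyle\int_{I_{k-1}} f_i(x_1, \dots, \alpha, \dots, x_k)\,dx_1 \dots \widehat{dx_i} \dots dx_k, & i = j. \end{cases}
\]
The right-hand side of Eq. (28) is equal to
\[
\begin{aligned}
\int_{\partial I_k} f_i\,dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_k &= \sum_{j=1}^{k} \sum_{\alpha=0}^{1} (-1)^{j+\alpha} \int_{I_{k-1}} (I_k^{(j,\alpha)})^*(f_i\,dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_k) \\
&= (-1)^{i+1} \int_{I_{k-1}} f_i(x_1, \dots, 1, \dots, x_k)\,dx_1 \dots \widehat{dx_i} \dots dx_k + (-1)^{i} \int_{I_{k-1}} f_i(x_1, \dots, 0, \dots, x_k)\,dx_1 \dots \widehat{dx_i} \dots dx_k.
\end{aligned}
\]
The left-hand side of Eq. (28) gives us
\[
\int_{I_k} d(f_i\,dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_k) = \int_{I_k} \frac{\partial f_i}{\partial x_i}\,dx_i \wedge dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_k = (-1)^{i-1} \int_{I_k} \frac{\partial f_i}{\partial x_i}.
\]
By Fubini's theorem and the Fundamental Theorem of Calculus, we have
\[
\begin{aligned}
\int_{I_k} d(f_i\,dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_k) &= (-1)^{i-1} \int_{I_{k-1}} \left( \int_0^1 \frac{\partial f_i}{\partial x_i}\,dx_i \right) dx_1 \dots \widehat{dx_i} \dots dx_k \\
&= (-1)^{i-1} \int_{I_{k-1}} \big[ f_i(x_1, \dots, 1, \dots, x_k) - f_i(x_1, \dots, 0, \dots, x_k) \big]\,dx_1 \dots \widehat{dx_i} \dots dx_k \\
&= (-1)^{i-1} \int_{I_{k-1}} f_i(x_1, \dots, 1, \dots, x_k) + (-1)^{i} \int_{I_{k-1}} f_i(x_1, \dots, 0, \dots, x_k).
\end{aligned} \tag{29}
\]
Matching both sides of Eq. (28), we get
\[
\int_{\partial I_k} \omega = \int_{I_k} d\omega.
\]
Let $c_i$ be any singular k-cube. Using the linear properties of the $\int$ and $d$ operators, we get
\[
\int_{c_i} d\omega = \int_{I_k} c_i^*(d\omega) = \int_{I_k} d(c_i^*\omega) = \int_{\partial I_k} c_i^*\omega = \int_{\partial c_i} \omega.
\]
Therefore if $c$ is an orientable singular k-chain $c = \sum_{i=1}^{l} a_i c_i$, then
\[
\int_c d\omega = \sum_{i=1}^{l} a_i \int_{c_i} d\omega = \sum_{i=1}^{l} a_i \int_{\partial c_i} \omega = \int_{\partial c} \omega.
\]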
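The identity (28) can be checked numerically on the unit square with the face signs of Example 5. The sketch below is only an illustration: the 1-form $\omega = -y^3\,dx + x^3\,dy$, the midpoint quadrature and all function names are assumptions chosen for this example.

```python
def integrate(f, n=400):
    """Midpoint rule for the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))

# omega = P dx + Q dy, so d(omega) = (dQ/dx - dP/dy) dx ^ dy.
def P(x, y):
    return -y ** 3

def Q(x, y):
    return x ** 3

def d_omega(x, y):
    return 3 * x ** 2 + 3 * y ** 2

# Left-hand side of (28): integral of d(omega) over the square I_2.
lhs = integrate(lambda x: integrate(lambda y: d_omega(x, y)))

# Right-hand side: the four faces of I_2 with signs (-1)**(i + alpha).
face_10 = integrate(lambda t: Q(0.0, t))  # face (1,0): t -> (0, t), pullback Q dt
face_11 = integrate(lambda t: Q(1.0, t))  # face (1,1): t -> (1, t)
face_20 = integrate(lambda t: P(t, 0.0))  # face (2,0): t -> (t, 0), pullback P dt
face_21 = integrate(lambda t: P(t, 1.0))  # face (2,1): t -> (t, 1)
rhs = -face_10 + face_11 + face_20 - face_21
```

Both `lhs` and `rhs` approximate the exact value 2, with `lhs` carrying the small error of the midpoint rule.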
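Remark 6 The Stokes theorem yields cohomological implications, which we address next.
(1) Let $S$ be a k-dimensional submanifold of $U \subset \mathbb{R}^n$. If $\partial S = \emptyset$, then $\int_S d\eta = 0$ for all $\eta \in \Omega^{k-1}(U)$. Therefore if $[\omega'] = [\omega] \in H^k_{DR}(U)$, that is, $\omega' = \omega + d\eta$, then
\[
\int_S \omega' = \int_S \omega.
\]
(2) Let $S$ be a k-dimensional submanifold of $U \subset \mathbb{R}^n$ and $\omega \in \Omega^k(U)$ a closed form ($d\omega = 0$). If $c$ is a $(k+1)$-chain, then
\[
\int_{S + \partial c} \omega = \int_S \omega + \int_{\partial c} \omega = \int_S \omega + \int_c d\omega = \int_S \omega.
\]
Given a class $[\omega] \in H^k_{DR}(U)$, we get a linear functional
\[
\xi : H_k(U; \mathbb{R}) \to \mathbb{R}, \qquad \xi([S]) = \int_S \omega. \tag{30}
\]
(3) De Rham isomorphism. Gathering the results above, we have a homomorphism $R^k : H^k_{DR}(U) \to H^k(U, \mathbb{R})$ given by
\[
R^k(\omega)(\sigma) = I_\omega(\sigma) = \int_\sigma \omega,
\]
with $I_\omega : H_k(U, \mathbb{R}) \to \mathbb{R}$ being the linear functional $I_\omega(\sigma) = \int_\sigma \omega$. The De Rham theorem in [45] asserts that $R^k$ is an isomorphism for all $k > 0$.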
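To see the functional $I_\omega$ at work, one may integrate closed 1-forms around a loop numerically. The sketch below takes, as an assumed example not discussed in the text, $U = \mathbb{R}^2 \setminus \{0\}$, the angular form $(-y\,dx + x\,dy)/(x^2 + y^2)$ and the exact form $d(x^2 y)$; the function names are hypothetical.

```python
import math

def line_integral(P, Q, curve, velocity, n=2000):
    """Midpoint-rule integral of P dx + Q dy along curve(t), t in [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = curve(t)
        dx, dy = velocity(t)
        total += (P(x, y) * dx + Q(x, y) * dy) * h
    return total

def circle(t):
    """Unit circle traversed once counterclockwise."""
    return math.cos(2 * math.pi * t), math.sin(2 * math.pi * t)

def circle_velocity(t):
    return (-2 * math.pi * math.sin(2 * math.pi * t),
            2 * math.pi * math.cos(2 * math.pi * t))

# Closed but not exact on the punctured plane: the angular form.
def angular_P(x, y):
    return -y / (x * x + y * y)

def angular_Q(x, y):
    return x / (x * x + y * y)

# Exact form df with f = x^2 * y.
def exact_P(x, y):
    return 2 * x * y

def exact_Q(x, y):
    return x * x

period_angular = line_integral(angular_P, angular_Q, circle, circle_velocity)
period_exact = line_integral(exact_P, exact_Q, circle, circle_velocity)
```

The first period is close to $2\pi$, so the angular form defines a nonzero class, while the period of the exact form is close to 0, in agreement with item (1) of the remark.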
7 Orientation, Hodge Star-Operator and Exterior Co-derivative

By defining an inner product on $\Omega^*(U)$, we can define the $L^2$-dual operator $d^*$ of $d$. Considering $\omega, \eta \in \Omega^k(U)$, for $x \in U$, we have $\omega_x, \eta_x \in \Lambda^k(T_x^* U)$. The inner product defined in Definition 4 induces a function $\langle \omega, \eta \rangle_x : U \to \mathbb{R}$ given as $\langle \omega, \eta \rangle_x = \langle \omega_x, \eta_x \rangle$. Letting $d\vartheta = dx_1 \wedge \cdots \wedge dx_n$ be the volume form, the inner product $\langle \cdot, \cdot \rangle : \Omega^k(U) \times \Omega^k(U) \to \mathbb{R}$ is given by
\[
\langle \omega, \eta \rangle = \int_U \langle \omega_x, \eta_x \rangle\,d\vartheta. \tag{31}
\]
Definition 20 An open subset $U \subset \mathbb{R}^n$ is orientable if, in $\Omega^n(U)$, there is an n-form $\omega$ such that $\omega(x) \neq 0$ for all $x \in U$. If $|\omega| = 1$, $\omega$ is named the unit volume form.

Definition 21 Let $U \subset \mathbb{R}^n$ be an open subset and $d\vartheta = e_1 \wedge \cdots \wedge e_n$ the volume form. The Hodge star-operator $* : \Omega^k(U) \to \Omega^{n-k}(U)$ is the linear operator defined by the following: for $w, \eta \in \Omega^k(U)$,
\[
w \wedge *\eta = \langle w, \eta \rangle\,d\vartheta. \tag{32}
\]
Hodge's star-operator inherits all properties shown in Proposition 7:
(i) $*^2 = (-1)^{k(n-k)}$.
(ii) $\langle *w, *\eta \rangle = \langle w, \eta \rangle$.
(iii) $d* = *d$.
Property (i) induces an isomorphism $* : H^k_{DR}(U) \to H^{n-k}_{DR}(U)$ related to the well-known Poincaré duality in singular cohomology. In what follows, we will consider the space of the forms $\Omega^*_c(U)$ with compact support in $U$, so they vanish on the boundary when there is one.

Definition 22 The exterior co-derivative $d_k^* : \Omega^k_c(U) \to \Omega^{k-1}_c(U)$ is the operator given by
\[
d_k^* = (-1)^{nk+1} * d_{n-k}\, *. \tag{33}
\]
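Property (i) of the star-operator can be verified on basis forms of Euclidean $\mathbb{R}^n$, where the star of $e_{i_1} \wedge \cdots \wedge e_{i_k}$ is the complementary basis form up to the sign of a permutation. The sketch below assumes this Euclidean, orthonormal setting and encodes a basis form by its increasing index tuple; the function names are hypothetical.

```python
from itertools import combinations

def permutation_sign(perm):
    """Sign of a permutation given as a sequence of distinct integers."""
    sign = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                sign = -sign
    return sign

def hodge_star(index, n):
    """Star of the basis k-form e_index in Euclidean R^n.

    Returns (sign, complement), meaning *e_index = sign * e_complement,
    so that e_index ^ *e_index equals the volume form.
    """
    complement = tuple(i for i in range(1, n + 1) if i not in index)
    return permutation_sign(index + complement), complement

def star_twice_sign(index, n):
    """Scalar c with **e_index = c * e_index."""
    s1, comp = hodge_star(index, n)
    s2, back = hodge_star(comp, n)
    assert back == index
    return s1 * s2
```

In $\mathbb{R}^3$, `hodge_star((2,), 3)` returns `(-1, (1, 3))`, that is $*dx_2 = -dx_1 \wedge dx_3 = dx_3 \wedge dx_1$, and `star_twice_sign` agrees with $(-1)^{k(n-k)}$ on every basis form.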
The co-derivative is the $L^2$-dual operator of the exterior derivative operator with respect to the inner product in $\Omega^k(U)$, as shown by the following identities: let $\omega \in \Omega^k_c(U)$ and $\eta \in \Omega^{k+1}_c(U)$.
(i)
\[
d(\omega \wedge *\eta) = d\omega \wedge *\eta + (-1)^k \omega \wedge d(*\eta). \tag{34}
\]
(ii) Integrating the identity above, we get
\[
\int_U d(\omega \wedge *\eta) = \int_U d\omega \wedge *\eta + (-1)^k \int_U \omega \wedge d(*\eta) = \int_U \langle d\omega, \eta \rangle\,d\vartheta + \int_U \langle \omega, (-1)^{k(n-k)+k} * d * \eta \rangle\,d\vartheta. \tag{35}
\]
Applying the Stokes theorem, the left-hand side in Eq. (35) is equal to 0. Consequently, we get
\[
\int_U \langle d\omega, \eta \rangle\,d\vartheta = \int_U \langle \omega, (-1)^{k(n-k)+k+1} * d * \eta \rangle\,d\vartheta = \int_U \langle \omega, (-1)^{nk+1} * d * \eta \rangle\,d\vartheta.
\]
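Hence $\langle d\omega, \eta \rangle = \langle \omega, d^*\eta \rangle$, for any $\omega \in \Omega^k_c(U)$ and $\eta \in \Omega^{k+1}_c(U)$. The following proposition is straightforward from the definition:

Proposition 13 The exterior co-derivative $d^* : \Omega^k_c(U) \to \Omega^{k-1}_c(U)$ satisfies the co-cycle condition
\[
d_{k-1}^* \circ d_k^* = 0. \tag{36}
\]
In this way, defining $d_0^* = 0$, we have the co-complex
\[
0 \xleftarrow{\ d^*\ } \Omega^0(U) \xleftarrow{\ d^*\ } \Omega^1(U) \xleftarrow{\ d^*\ } \cdots \xleftarrow{\ d^*\ } \Omega^p(U) \xleftarrow{\ d^*\ } \Omega^{p+1}(U) \xleftarrow{\ d^*\ } \cdots \xleftarrow{\ d^*\ } \Omega^n(U). \tag{37}
\]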
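Example 6 Considering $U \subset \mathbb{R}^3$, we have $d^* = (-1)^{3k+1} * d\,*$ and
\[
*(dx_1) = dx_2 \wedge dx_3, \qquad *(dx_2) = dx_3 \wedge dx_1, \qquad *(dx_3) = dx_1 \wedge dx_2.
\]
(1) $d^* : \Omega^1(U) \to \Omega^0(U)$, $d^* = * d\,*$. Let $\omega = F_1\,dx_1 + F_2\,dx_2 + F_3\,dx_3$ and $\vec{F} = F_1\vec{i} + F_2\vec{j} + F_3\vec{k}$, so $d^*\omega = \operatorname{div}(\vec{F})$.
(2) $d^* : \Omega^2(U) \to \Omega^1(U)$, $d^* = -* d\,*$, and $\omega = F_1\,dx_2 \wedge dx_3 + F_3\,dx_1 \wedge dx_2 + F_2\,dx_3 \wedge dx_1$. So we get
\[
\begin{aligned}
d^*\omega &= \left( \frac{\partial F_3}{\partial x_2} - \frac{\partial F_2}{\partial x_3} \right) dx_1 + \left( \frac{\partial F_1}{\partial x_3} - \frac{\partial F_3}{\partial x_1} \right) dx_2 + \left( \frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2} \right) dx_3 \\
&= [\operatorname{curl}\vec{F}]_1\,dx_1 + [\operatorname{curl}\vec{F}]_2\,dx_2 + [\operatorname{curl}\vec{F}]_3\,dx_3.
\end{aligned}
\]
(3) $d^* : \Omega^3(U) \to \Omega^2(U)$, $d^* = * d\,*$, and $\omega = f\,dx_1 \wedge dx_2 \wedge dx_3$:
\[
\begin{aligned}
d^*\omega &= \frac{\partial f}{\partial x_1}\,dx_2 \wedge dx_3 + \frac{\partial f}{\partial x_2}\,dx_3 \wedge dx_1 + \frac{\partial f}{\partial x_3}\,dx_1 \wedge dx_2 \\
&= [\nabla f]_1\,dx_2 \wedge dx_3 + [\nabla f]_2\,dx_3 \wedge dx_1 + [\nabla f]_3\,dx_1 \wedge dx_2.
\end{aligned}
\]
What is shown above allows us to identify the operators $d_3^* \simeq \operatorname{grad}$, $d_2^* \simeq \operatorname{curl}$ and $d_1^* = \operatorname{div}$.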
Exercises
(1) Consider the Hodge star-operator induced by the Lorentz form. Show that
\[
d^* = (-1)^{nk} * d\,*. \tag{38}
\]
(2) Consider the operator $\Delta_k : \Omega^k(U) \to \Omega^k(U)$ given by
\[
\Delta_k = d_{k+1}^* d_k + d_{k-1} d_k^*. \tag{39}
\]
(a) Show that $\Delta_k$ is a Laplacian operator for all $k \in \mathbb{N}$ (hint: write it in coordinates).
(b) Assuming $\omega$ is compactly supported, show that $\Delta_k \omega = 0$ if and only if $d\omega = d^*\omega = 0$.
(c) Show $\operatorname{Ker}(d) \perp \operatorname{Im}(d^*)$.
(d) Show $\Delta_k$ is an elliptic operator, and indeed, a Fredholm operator (hard; hint: consult Ref. [45]).
8 Differential Forms on Manifolds, Stokes Theorem Let M be an n-dimensional differentiable manifold and let A M = {(Uα , φα ) | α ∈ } be a differentiable structure carried by M with Uα being a contractible open subset for all α.
8.1 Orientation

To give $M$ an orientation, the idea is to extend the orientation on an open subset $U_\alpha \subset M$. There are differentiable manifolds that do not admit a globally defined frame, so we must work locally. For $\alpha \in \Lambda$, we fix a frame $\beta_\alpha = \{e_1^\alpha, \dots, e_n^\alpha\}$, or equivalently, an orientation on $U_\alpha$.

Definition 23 An orientation over $M$ is given provided each open subset $U_\alpha$ carries an orientation and, whenever $U_\alpha \cap U_\beta \neq \emptyset$, the derivative of $\varphi_{\alpha\beta} : \varphi_\alpha(U_\alpha \cap U_\beta) \to \varphi_\beta(U_\alpha \cap U_\beta)$ preserves the orientation. Therefore the differential $d\varphi_{\alpha\beta}(x) : T_{\varphi_\alpha(x)} U_\alpha \to T_{\varphi_\beta(x)} U_\beta$ sends positive frames to positive frames.

There are manifolds that do not admit an orientation; they are the non-orientable manifolds. Classic examples are the Möbius band, the real projective plane $\mathbb{RP}^2$, and the Klein bottle $K^2$.
The frames $\beta_\alpha$ and $\beta_\beta$, defined on $U_\alpha$ and $U_\beta$ respectively, are compatible if $\det(d\varphi_{\beta\alpha}(x)) > 0$ for all $x \in U_\alpha \cap U_\beta$. An orientation is given by extending over $M$ a frame $\beta_\alpha$ defined on $U_\alpha$ under the compatibility condition. Another way of defining an orientation is through differential forms. Since we have the bijective relation
\[
\beta = \{e_1, \dots, e_n\} \quad \leftrightarrow \quad \vartheta_\beta = e_1 \wedge \cdots \wedge e_n,
\]
we can set a positive basis for the one-dimensional vector space $\Lambda^n(T_p M)$. We define the equivalence relation
\[
\vartheta' \sim \vartheta \quad \Leftrightarrow \quad \exists\, \lambda > 0 \text{ such that } \vartheta' = \lambda.\vartheta.
\]
If $T$ is the matrix sending frame $\beta$ to frame $\beta'$, then $\vartheta_{\beta'} = \det(T).\vartheta_\beta$. Consequently, $\beta' \sim \beta$ if and only if $\vartheta_{\beta'} \sim \vartheta_\beta$. An orientation is assigned to an open subset $U \subset \mathbb{R}^n$ by a volume n-form $\vartheta \in \Omega^n(U)$ that never vanishes in $U$. The same occurs on a manifold $M$, when we set a volume n-form.

Exercises
(1) Show that the Möbius band is non-orientable.
(2) Show that the sphere $S^n$ is orientable.
(3) Show that the Klein bottle $K^2$ is non-orientable.
(4) Show that the projective space $\mathbb{RP}^{2n}$ is non-orientable and the projective space $\mathbb{RP}^{2n+1}$ is orientable for all $n \in \mathbb{N}$.
(5) Show that the fundamental group of a non-orientable manifold $M$ has a normal subgroup isomorphic to $\mathbb{Z}_2$ (hint: show there is a Möbius band embedded in $M$ non-homotopic to a point).
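8.2 Integration on Manifolds

For $0 \leq p \leq n = \dim(M)$, consider the vector bundles
\[
\Lambda^0(M) = \bigcup_{x\in M} \Lambda^0(T_x M), \qquad \Lambda^p(M) = \bigcup_{x\in M} \Lambda^p(T_x M). \tag{40}
\]
A smooth p-form on $M$ is a $C^\infty$-section of the vector bundle $\Lambda^p(M)$, that is, $\omega : M \to \Lambda^p(M)$ such that $\omega_x \in \Lambda^p(T_x^* M)$ for $x \in M$. Let $\Omega^p(M)$ be the space of p-forms and let $\Omega^*(M) = \oplus_{p\geq 0} \Omega^p(M)$ be the Exterior Algebra of the differential forms on $M$. For $x \in M$ and $\omega \in \Omega^p(M)$, $\omega_x$ is an alternating p-tensor on $T_x M$. Let $f : U \to \mathbb{R}$ be a differentiable function. If $\varphi : V \to U$ is a diffeomorphism between open sets preserving orientation, then
\[
\int_U f\,dx_1 \dots dx_n = \int_V (f \circ \varphi)\,|\det(d\varphi)|\,dy_1 \dots dy_n.
\]
Let $(M, \mathcal{A}_M)$ be a differentiable manifold, with a boundary that may not be empty, in which the differentiable structure is given by the atlas $\mathcal{A}_M = \{(U_\alpha, \varphi_\alpha) \mid \alpha \in \Lambda\}$. Let $\omega \in \Omega^n(M)$ be an n-form with compact support, i.e., $\operatorname{supp}(\omega) = \overline{\{x \in M \mid \omega(x) \neq 0\}}$ is compact. Initially, we assume the support of $\omega$ is contained in a local chart $U_\alpha$. Once we have fixed the orientation on $U_\alpha$, we assume the diffeomorphism $\varphi_\alpha : U_\alpha \to \tilde{U}_\alpha$ preserves the orientation. The support of $(\varphi_\alpha^{-1})^*\omega$ is a compact set contained in $\tilde{U}_\alpha$. Define
\[
\int_M \omega = \int_{\tilde{U}_\alpha} (\varphi_\alpha^{-1})^*\omega. \tag{41}
\]
Let $\beta \in \Lambda$ be such that $U_\alpha \cap U_\beta \neq \emptyset$, and let $\varphi_{\beta\alpha} = \varphi_\beta \circ \varphi_\alpha^{-1} : \varphi_\alpha(U_\alpha \cap U_\beta) \to \varphi_\beta(U_\alpha \cap U_\beta)$ be an orientation preserving diffeomorphism. The integral defined by (41) is well-defined; indeed, it does not depend on the chart used because of the identity
\[
\int_{\tilde{U}_\beta} (\varphi_\beta^{-1})^*\omega = \int_{\tilde{U}_\alpha} \varphi_{\beta\alpha}^* (\varphi_\beta^{-1})^*\omega = \int_{\tilde{U}_\alpha} (\varphi_\beta^{-1} \circ \varphi_\beta \circ \varphi_\alpha^{-1})^*\omega = \int_{\tilde{U}_\alpha} (\varphi_\alpha^{-1})^*\omega.
\]
Let's now consider the general case when the support set $\operatorname{supp}(\omega)$ is not contained in a local chart. The recipe is to use partitions of unity subordinated to the cover $M = \cup U_\alpha$, so we can decompose a form into a sum of forms with supports that are contained in a chart. Consider $\{\rho_\alpha\}_{\alpha\in\Lambda}$ a partition of unity subordinated to the cover $\{U_\alpha\}_{\alpha\in\Lambda}$, so
\[
\int_M \omega = \sum_i \int_M \rho_i\,\omega.
\]
We need to check that $\int_M \omega$ does not depend on the partition of unity used; let $\{\rho'_j\}$ be another partition, with
\[
\int_M \omega = \sum_j \int_M \rho'_j\,\omega \quad \text{and} \quad \int_M \rho'_j\,\omega = \sum_i \int_M \rho_i \rho'_j\,\omega.
\]
Analogously,
\[
\int_M \rho_i\,\omega = \sum_j \int_M \rho'_j \rho_i\,\omega.
\]
Therefore
\[
\sum_i \int_M \rho_i\,\omega = \sum_i \sum_j \int_M \rho'_j \rho_i\,\omega = \sum_j \sum_i \int_M \rho_i \rho'_j\,\omega = \sum_j \int_M \rho'_j\,\omega.
\]
For all $a, b \in \mathbb{R}$ and $\omega_1, \omega_2 \in \Omega^n(M)$, we have
\[
\int_M \{a\omega_1 + b\omega_2\} = a \int_M \omega_1 + b \int_M \omega_2.
\]

Theorem 6 Let $M$ and $N$ be differentiable manifolds and let $f : M \to N$ be a diffeomorphism preserving orientation; then
\[
\int_N \omega = \int_M f^*\omega.
\]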
8.3 Exterior Derivative

The differential operator $d : C^\infty(M) \to C^\infty(M)$ induces the operator $d : \Omega^0(M) \to \Omega^1(M)$, $f \mapsto df$. Locally, we fix a chart $(U_\alpha, \varphi_\alpha)$ and a frame $\{e_1, \dots, e_n\}$ over $\tilde{U}_\alpha$, so the co-frame $\{e_1^*, \dots, e_n^*\}$ allows us to write any k-form $\omega$ on $U_\alpha$ as
\[
\omega = \sum_I \omega_I^\alpha\,dx_I, \qquad \omega_I^\alpha \in C^\infty(U_\alpha).
\]
We can apply the exterior derivative on open subsets of $\mathbb{R}^n$ as follows:
\[
d^\alpha \omega = \sum_I d\omega_I^\alpha \wedge dx_I.
\]
Let's check that the definition given does not depend on the local chart we have used. Let $\alpha, \beta \in \Lambda$ be such that $U_\alpha \cap U_\beta \neq \emptyset$; the compatibility condition is
\[
\varphi_{\beta\alpha}^* d^\beta = d^\alpha \varphi_{\beta\alpha}^*.
\]
For $p \in \{0, \dots, n\}$, we get the exterior derivative operator $d_p : \Omega^p(M) \to \Omega^{p+1}(M)$ satisfying the co-cycle condition $d_{p+1} \circ d_p = 0$. Therefore we have on $M$ the complex
\[
0 \to \Omega^0(M) \xrightarrow{\ d\ } \Omega^1(M) \xrightarrow{\ d\ } \cdots \xrightarrow{\ d\ } \Omega^p(M) \xrightarrow{\ d\ } \Omega^{p+1}(M) \xrightarrow{\ d\ } \cdots \xrightarrow{\ d\ } \Omega^n(M). \tag{42}
\]
For $p \geq 0$, the vector spaces
\[
H^p_{DR}(M) = \frac{\operatorname{Ker}(d_p)}{\operatorname{Im}(d_{p-1})}
\]
define the pth De Rham Cohomology group of $M$.
8.4 Stokes Theorem on Manifolds

The Stokes Theorem extends over a differentiable manifold. The details of the proof are similar to the case in $\mathbb{R}^n$ (see Ref. [42]); they only require partitions of unity and checking that the concepts are well-defined independently of the local chart.

Theorem 7 Let $M$ be a differentiable manifold and $S \subset M$ a $(p+1)$-dimensional differentiable submanifold of $M$ with a boundary $\partial S$. For any p-form $\omega \in \Omega^p(M)$, we have the identity
\[
\int_S d\omega = \int_{\partial S} \omega. \tag{43}
\]
For a complete approach regarding cohomology groups of differentiable manifolds using differential forms, we recommend Ref. [4].
Chapter 7
Applications to the Stokes Theorem
Applications are widespread in many topics of Pure and Applied Mathematics. To apply the formalism of differential forms and the Stokes Theorem, we will discuss the topics on Harmonic Functions and the geometric formulation of Electromagnetism without delving into the contents.
1 Volumes of the (n + 1)-Disk and of the n-Sphere

Let $B^{n+1}(R) = \{x \in \mathbb{R}^{n+1} ; |x| < R\}$ be the $(n+1)$-dimensional open ball with radius $R$ centered at the origin; let $D^{n+1}(R) = \overline{B^{n+1}(R)}$ be the closed ball, and let $S^n(R) = \partial B^{n+1}(R)$ be the n-dimensional sphere of radius $R$. We denote the $(n+1)$-volume of $D^{n+1}(R)$ by $V_{n+1}(R)$ and the n-volume of $S^n(R)$ by $A_n(R)$. When $R = 1$, we simply denote $V_{n+1}$ and $A_n$, respectively. Let's check that the following identities are true:
\[
V_{n+1}(R) = R^{n+1} V_{n+1}, \qquad A_n(R) = R^n A_n. \tag{1}
\]
To prove the identities, we consider the diffeomorphism $f : D^{n+1}(1) \to D^{n+1}(R)$ given by $f(x) = R.x$; the differential is $df_x = R.I$, and so $\det(df) = R^{n+1}$. It follows from the change of variables theorem for multiple integrals that the elements of volume and area are $dV_R = |\det(df)|.dV$ and $dA_R = |\det(df^\partial)|.dA$; therefore
\[
V_{n+1}(R) = \int_{D^{n+1}(R)} dV_R = \int_{D^{n+1}(1)} |\det(df_x)|\,dV = R^{n+1} \int_{D^{n+1}(1)} dV = R^{n+1} V_{n+1}.
\]
© Springer Nature Switzerland AG 2021 C. M. Doria, Differentiability in Banach Spaces, Differential Forms and Applications, https://doi.org/10.1007/978-3-030-77834-7_7
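Similarly, since $f^\partial : S^n(1) \to S^n(R)$ has differential $df^\partial = R.I$ on the n-dimensional tangent spaces, so that $\det(df^\partial) = R^n$, we get
\[
A_n(R) = \int_{S^n(R)} dA_R = \int_{S^n(1)} |\det(d(f^\partial)_x)|\,dA = R^n \int_{S^n(1)} dA = R^n A_n.
\]

Proposition 1 For any $n \geq 0$, we have the identity
\[
V_{n+1}(R) = \frac{R}{n+1} A_n(R). \tag{2}
\]

Proof First, we fix a partition $P = \{0 = r_0, r_1, \dots, r_m = R\}$ for the interval $[0, R]$ and consider $D^{n+1}(R) = \cup_i\, S^n(r_i^*) \times [r_{i-1}, r_i]$, with $r_i^* = \frac{r_{i-1} + r_i}{2}$ and $\Delta r = r_i - r_{i-1}$. So we have
\[
V_{n+1}(R) = \lim_{\Delta r \to 0} \sum_i A_n(r_i^*)\,\Delta r = \int_0^R A_n(r)\,dr = \int_0^R r^n A_n\,dr = \frac{R^{n+1}}{n+1} A_n = \frac{R}{n+1} A_n(R).
\]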
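The same identity follows from the Stokes Theorem. The volume form of $\mathbb{R}^{n+1}$ is $dV = dx_1 \wedge \dots \wedge dx_{n+1}$, so we have

$$V_{n+1}(R) = \int_{B^{n+1}(R)} dx_1 \wedge \dots \wedge dx_{n+1}.$$

The exterior derivative of the differential $n$-form

$$\omega_n = \sum_{i=1}^{n+1} (-1)^{i-1} x_i\, dx_1 \wedge \dots \wedge dx_{i-1} \wedge dx_{i+1} \wedge \dots \wedge dx_{n+1} \in \Omega^n(D^{n+1}(R))$$

is $d\omega_n = (n+1)\, dx_1 \wedge \dots \wedge dx_{n+1}$. Since the volume of the $n$-sphere is

$$A_n(R) = \frac{1}{R} \int_{S^n(R)} \omega_n, \tag{3}$$

from the Stokes Theorem we get that

$$V_{n+1}(R) = \int_{D^{n+1}(R)} dx_1 \wedge \dots \wedge dx_{n+1} = \frac{1}{n+1} \int_{D^{n+1}(R)} d\omega_n = \frac{1}{n+1} \int_{S^n(R)} \omega_n = \frac{R}{n+1} A_n(R).$$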
Let’s give $S^n(1)$ a parametrization using spherical coordinates, by induction on the dimension:

$$\psi_1(\theta_1) = (\cos\theta_1, \sin\theta_1),$$
$$\psi_2(\theta_1, \theta_2) = (\psi_1(\theta_1)\sin\theta_2, \cos\theta_2) = (\cos\theta_1 \sin\theta_2, \sin\theta_1 \sin\theta_2, \cos\theta_2),$$
$$\vdots$$
$$\psi_n(\theta_1, \dots, \theta_{n-1}, \theta_n) = (\psi_{n-1}(\theta_1, \dots, \theta_{n-1})\sin\theta_n, \cos\theta_n).$$

The parametrization of $S^n(1)$ induces a parametrization $\psi_n^R : [0, 2\pi] \times \underbrace{[0, \pi] \times \dots \times [0, \pi]}_{n-1} \to S^n(R)$ of $S^n(R)$:

$$\psi_n^R(\theta_1, \dots, \theta_{n-1}, \theta_n) = R\,(\psi_{n-1}(\theta_1, \dots, \theta_{n-1})\sin\theta_n, \cos\theta_n), \quad n \ge 2.$$

By induction, the volume element $dA_R$ of $S^n(R)$, as a function of the angles $\theta_1, \dots, \theta_n$, is

$$dA_R = \frac{1}{R}\,\omega_n = R^n \sin^{n-1}(\theta_n) \sin^{n-2}(\theta_{n-1}) \cdots \sin(\theta_2)\, d\theta_1\, d\theta_2 \cdots d\theta_n.$$
So the volume of $S^n(R)$ is

$$A_n(R) = R^n \int_0^{2\pi} d\theta_1 \cdot \int_0^{\pi} \sin(\theta_2)\, d\theta_2 \cdots \int_0^{\pi} \sin^{n-1}(\theta_n)\, d\theta_n.$$
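We will not use the above formula to find $A_n$; instead we will use a trick based on the value of the Gaussian integral

$$\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}.$$

By integrating the function $f(x_1, \dots, x_{n+1}) = e^{-(x_1^2 + \dots + x_{n+1}^2)}$ over $\mathbb{R}^{n+1}$, we get

$$\int_{\mathbb{R}^{n+1}} e^{-(x_1^2 + \dots + x_{n+1}^2)}\, dx_1 \dots dx_{n+1} = \left( \int_{-\infty}^{\infty} e^{-x_1^2}\, dx_1 \right) \cdots \left( \int_{-\infty}^{\infty} e^{-x_{n+1}^2}\, dx_{n+1} \right) = \pi^{\frac{n+1}{2}}.$$

Using spherical coordinates, the volume $(n+1)$-form in $\mathbb{R}^{n+1}$ is $dV = dx_1 \dots dx_{n+1} = \rho^n\, d\rho\, dA$, where $dA$ is the volume element of the unit sphere $S^n$ (the restriction of $\omega_n$), so the integral becomes

$$\int_{\mathbb{R}^{n+1}} e^{-(x_1^2 + \dots + x_{n+1}^2)}\, dx_1 \dots dx_{n+1} = \int_0^{\infty} \rho^n e^{-\rho^2}\, d\rho \cdot \int_{S^n} \omega_n = \frac{1}{2}\, \Gamma\!\left(\frac{n+1}{2}\right) A_n.$$

Hence

$$A_n(R) = \frac{2\,\pi^{\frac{n+1}{2}}}{\Gamma\!\left(\frac{n+1}{2}\right)}\, R^n, \qquad V_{n+1}(R) = \frac{\pi^{\frac{n+1}{2}}}{\Gamma\!\left(\frac{n}{2} + \frac{3}{2}\right)}\, R^{n+1}. \tag{4}$$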
Remark The Gamma function $\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\, dx$ satisfies the following properties:
(i) $\Gamma(1) = 1$,
(ii) $\Gamma(t + 1) = t\,\Gamma(t)$ (hint: integrate by parts),
(iii) $\int_0^{\infty} x^{n-1} e^{-x^2}\, dx = \frac{1}{2}\,\Gamma\!\left(\frac{n}{2}\right)$ (hint: change the variable to $u = x^2$ and solve the integral).
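The closed formulas in Eq. (4) are easy to evaluate numerically. The following is a minimal sketch, assuming only Python's standard `math.gamma`; the function names are illustrative and not part of the text. It can be used to confirm the familiar values $A_1 = 2\pi$, $A_2 = 4\pi$, $V_3 = 4\pi/3$ and the identity of Proposition 1.

```python
import math

def unit_sphere_area(n):
    """A_n: n-volume of the unit sphere S^n in R^(n+1), as in Eq. (4)."""
    return 2.0 * math.pi ** ((n + 1) / 2) / math.gamma((n + 1) / 2)

def unit_ball_volume(m):
    """V_m: volume of the unit ball in R^m, i.e. Eq. (4) with m = n + 1."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)

def sphere_area(n, R):
    """A_n(R) = R^n A_n, identity (1)."""
    return R ** n * unit_sphere_area(n)

def ball_volume(m, R):
    """V_m(R) = R^m V_m, identity (1)."""
    return R ** m * unit_ball_volume(m)

if __name__ == "__main__":
    for n in range(0, 6):
        R = 1.5
        lhs = ball_volume(n + 1, R)
        rhs = R / (n + 1) * sphere_area(n, R)  # Proposition 1
        print(n, lhs, rhs)
```

Since the unit-ball volumes eventually decrease with the dimension, printing `unit_ball_volume(m)` for growing `m` also gives a feeling for the limit asked about in the exercises.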
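Exercises

1. Show the following recurrence identities:
$$V_n = \frac{2\pi}{n}\, V_{n-2}, \qquad A_{n-1} = \frac{2\pi}{n-2}\, A_{n-3}.$$
2. Show that $V_{2n} = \frac{\pi^n}{n!}$ and $V_{2n+1} = \frac{2\, n!\, (4\pi)^n}{(2n+1)!}$ (hint: $\Gamma\!\left(n + \frac{1}{2}\right) = \frac{(2n)!}{4^n\, n!}\sqrt{\pi}$).
3. Show that $\lim_{n\to\infty} V_n = 0$ and $\lim_{n\to\infty} A_n = 0$. Give an intuitive reason why the limits are 0.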
2 Harmonic Functions

Harmonic functions play a key role in physics and in several topics in Applied and Pure Mathematics.

2.1 Laplacian Operator

Let $U \subset \mathbb{R}^n$ be an open subset. The Laplacian operator on $U$ is the linear differential operator $\Delta : C^2(U) \to C^0(U)$ given by

$$\Delta f = \operatorname{div}(\nabla f). \tag{5}$$

In Cartesian coordinates, the Laplacian is given by

$$\Delta f = \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \dots + \frac{\partial^2 f}{\partial x_n^2}. \tag{6}$$
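Formula (6) can be approximated by central differences, which gives a quick way to test numerically whether a given function is (approximately) harmonic at a point. The sketch below is only an illustration: the helper names `laplacian` and `quadratic` and the step size `h` are assumptions made here, not notation from the text. For a quadratic polynomial $\sum_i a_i x_i^2$ the exact value is $2\sum_i a_i$.

```python
def laplacian(f, x, h=1e-3):
    """Central-difference approximation of Eq. (6) at the point x (a list of floats)."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        # second difference quotient in the i-th coordinate direction
        total += (f(xp) - 2.0 * fx + f(xm)) / (h * h)
    return total

def quadratic(a):
    """Return the polynomial p(x) = sum_i a_i * x_i^2 as a Python function."""
    return lambda x: sum(ai * xi * xi for ai, xi in zip(a, x))

if __name__ == "__main__":
    point = [0.3, -0.7, 1.1]
    print(laplacian(quadratic([1.0, 2.0, -3.0]), point))  # coefficients sum to 0
    print(laplacian(quadratic([1.0, 1.0, 1.0]), point))   # exact Laplacian is 6
```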
In the case of a vector quantity $F = (f_1, \dots, f_n)$, the Laplacian of $F$ is defined as $\Delta F = (\Delta f_1, \dots, \Delta f_n)$.

Example 1 In physics, the Laplacian is an operator of fundamental importance since it arises in several models, for example, in models of heat conduction, wave models, electromagnetic models, Quantum Mechanics and many others. Let’s look at two examples:

1. Heat conduction. Heat is the transfer of thermal energy from a warmer body to a colder body. According to Fourier’s law of heat conduction, the amount of heat $Q$ transferred through a closed surface $S$ per unit of time is proportional to the flow through $S$ of the temperature gradient $\nabla T$, that is,

$$\frac{dQ}{dt} = -k \int_S \nabla T,$$

where $\int_S \nabla T$ denotes the flux $\int_S \langle \nabla T, n \rangle\, dS$ and $k$ is the thermal conductivity of the material, which we assume to be constant. Thermal conductivity is the property that a material has to conduct heat. When in a region $\Omega$ there is no heat source, or something absorbing heat, the amount of heat remains constant inside $\Omega$. Therefore for every closed surface $S \subset \Omega$ bounding a region $\Omega_S$, we have $\int_S \nabla T = 0$. Applying the Stokes Theorem, we conclude that

$$\int_{\Omega_S} \operatorname{div}(\nabla T)\, dV = 0.$$

Since $\Omega_S$ is arbitrary, $\Delta T = 0$ in $\Omega$.

2. Gauss’ Law. Also known as the Gaussian Flow Theorem, it states that the flow $\Phi_E$ of the electric field $\vec{E}$ through a closed connected surface $S = \partial\Omega$ (boundaryless) is proportional to the total charge within $S$ (or $\Omega$). Let $\epsilon_0$ be the electric constant and $Q_S$ the total electric charge within $S$; then

$$\Phi_E = \int_S \vec{E} \cdot d\vec{S} = \frac{Q_S}{\epsilon_0}.$$

Let $\rho$ be the charge density in $\Omega$. Applying the Stokes Theorem, we obtain

$$\Phi_E = \int_{\Omega} \operatorname{div}(\vec{E})\, dV = \frac{1}{\epsilon_0} \int_{\Omega} \rho\, dV.$$

Therefore it follows that $\operatorname{div}(\vec{E}) = \frac{\rho}{\epsilon_0}$. Assuming there is no electric charge in the region $\Omega$, we have $\operatorname{div}(\vec{E}) = 0$, and if we also assume there is no magnetic field, from Faraday’s law we can derive that $\operatorname{curl}(\vec{E}) = 0$. Therefore

$$\operatorname{curl}\,\operatorname{curl}(\vec{E}) = \nabla\big(\operatorname{div}(\vec{E})\big) - \Delta(\vec{E}) = 0 \ \Rightarrow\ \Delta(\vec{E}) = 0.$$

Consequently, in the absence of sources in a region $\Omega$, we have $\Delta \vec{E} = 0$ in $\Omega$.

Many mathematical models have some symmetry allowing us to simplify the equations to understand the questions addressed. Once a symmetry is detected, we should use appropriate coordinates. Next, we will address the question of the existence of a solution for the equation $\Delta f = g$ on the disk $D^n_R$, which has spherical symmetry. For our purposes, it will be extremely convenient to use spherical coordinates in $\mathbb{R}^n$. Using the spherical coordinates $(\rho, \theta_1, \dots, \theta_{n-1})$ in $\mathbb{R}^n$, the Laplacian is

$$\Delta f = \frac{\partial^2 f}{\partial \rho^2} + \frac{n-1}{\rho} \frac{\partial f}{\partial \rho} + \frac{1}{\rho^2}\, \Delta_{S^{n-1}} f, \tag{7}$$
in which $\Delta_{S^{n-1}} f$ is an operator involving only the 1st and 2nd derivatives with respect to the angles $\theta_1, \dots, \theta_{n-1}$. The operator $\Delta_{S^{n-1}}$ is known as the Laplace-Beltrami operator on $S^{n-1}$. Using the exterior derivative, we have $\Delta f = d^* d f$, up to the sign convention adopted for $d^*$.
2.2 Properties of Harmonic Functions

Definition 1 A function $f \in C^2(U)$ is harmonic if it satisfies the equation

$$\Delta f = 0. \tag{8}$$

There are many examples of harmonic functions in $\mathbb{R}^n$; below we show some of them:

(i) Every polynomial $p(x_1, \dots, x_n) = \sum_{i=1}^n a_i x_i^2$ such that $\sum_{i=1}^n a_i = 0$.
(ii) If $n = 2$, then $\nu_2(x) = \ln(|x|)$ is harmonic in $\mathbb{R}^2 \setminus \{0\}$.
(iii) If $n \ge 3$, then $\nu_n(x) = |x|^{2-n}$ is harmonic in $\mathbb{R}^n \setminus \{0\}$. Due to the importance of this example for what follows, we will prove that $\Delta \nu_n = 0$ in $\mathbb{R}^n \setminus \{0\}$. We call attention to the fact that it is not harmonic in $\mathbb{R}^n$. In spherical coordinates, we have $\nu_n(\rho) = \rho^{2-n}$. Applying Eq. (7), we get
$$\Delta \nu_n = \frac{(2-n)(1-n)}{\rho^n} + \frac{(n-1)}{\rho} \cdot \frac{(2-n)}{\rho^{n-1}} = 0, \quad \rho \ne 0.$$
However, computing the integral through the Stokes Theorem as the flux of $\nabla \nu_n$ across $S^{n-1}_R$, we get $\int_{B^n_R} \Delta(\nu_n)\, dV = (2-n) A_{n-1}$, where $A_{n-1}$ is the $(n-1)$-volume of the unit sphere $S^{n-1}$.
(iv) If $f(x)$ is harmonic and of class $C^3$, then $\frac{\partial f}{\partial x_i}$ is harmonic for all $1 \le i \le n$.

As a result there is an infinite number of harmonic functions in $\mathbb{R}^n$. The harmonic functions in $\mathbb{R}^2$ enjoy many properties also satisfied by the holomorphic functions in $\mathbb{C}$. We recall that the real and imaginary parts of a holomorphic function are real harmonic functions, and every harmonic function defined on a simply connected open subset $U \subset \mathbb{R}^2$ is the real or imaginary part of a holomorphic function. In this way, the results we will present next are similar to those known in the function theory of a single complex variable. We assume the following conditions throughout the rest of the exposition:

(i) the region of integration $\Omega \subset U$ is closed, bounded, and has a non-empty interior, with $S = \partial\Omega$ such that the Stokes Theorem is valid;
(ii) the functions are in $C^2(U)$.

Let $dV$ be the volume element in $\Omega$ and let $dS$ be the volume element of the boundary $S$. It follows from the Stokes Theorem that

$$\int_{\Omega} \Delta f\, dV = \int_{\Omega} \operatorname{div}(\nabla f)\, dV = \int_S \langle \nabla f, n \rangle\, dS = \int_S \frac{\partial f}{\partial n}\, dS,$$
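where $n$ is the unit normal vector to $S$. Consequently, if $f$ is harmonic, then $\int_S \frac{\partial f}{\partial n}\, dS = 0$. This imposes a strong constraint on $f$ since the flow of the field $\nabla f$ must be 0 through all of the surfaces $S = \partial\Omega$. Green’s Identities are fundamental for exploring the properties of the harmonic functions.

Proposition 2 If $f, g \in C^{\infty}(U)$, then:

(i) Green’s 1st identity:
$$\int_{\Omega} \big( g\,\Delta f + \langle \nabla f, \nabla g \rangle \big)\, dV = \int_S g\, \frac{\partial f}{\partial n}\, dS. \tag{9}$$

(ii) Green’s 2nd identity:
$$\int_{\Omega} \big( f\,\Delta g - g\,\Delta f \big)\, dV = \int_S \left( f\, \frac{\partial g}{\partial n} - g\, \frac{\partial f}{\partial n} \right) dS. \tag{10}$$

Proof We recall that $\operatorname{div}(g\,\nabla f) = \langle \nabla f, \nabla g \rangle + g\,\Delta f$.
(i) Considering the vector field $h = g\,\nabla f$, we get the identity (9) by applying the Stokes Theorem.
(ii) Let $h = f\,\nabla g - g\,\nabla f$; then $\operatorname{div}(f\,\nabla g - g\,\nabla f) = f\,\Delta g - g\,\Delta f$. It follows from the Stokes Theorem that

$$\int_{\Omega} \operatorname{div}\big( f\,\nabla g - g\,\nabla f \big)\, dV = \int_{\Omega} \big( f\,\Delta g - g\,\Delta f \big)\, dV = \int_S \big( f \langle \nabla g, n \rangle - g \langle \nabla f, n \rangle \big)\, dS = \int_S \left( f\, \frac{\partial g}{\partial n} - g\, \frac{\partial f}{\partial n} \right) dS.$$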
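If $f$ is harmonic, then it satisfies the following properties:
1. $\int_S \frac{\partial f}{\partial n}\, dS = 0$.
2. $\int_{\Omega} |\nabla f|^2\, dV = \int_S f\, \frac{\partial f}{\partial n}\, dS$ (take $f = g$ in (9)).

These identities are enough to prove uniqueness.

Theorem 1 Let $\Omega \subset U$ be a region where the Stokes Theorem is valid. If $f \in C^2(\operatorname{int}(\Omega))$ and $g \in C^0(\partial\Omega)$, then there is at most one function $u \in C^2(\operatorname{int}(\Omega))$ such that
(i) $\Delta u = f$ on $\operatorname{int}(\Omega)$,
(ii) $u|_{\partial\Omega} = g$.

Proof Suppose that there are two solutions $u, v \in C^2(\Omega)$, so the function $w = u - v$ solves the equation

$$\Delta w = 0 \ \text{on}\ \Omega, \qquad w|_{\partial\Omega} = 0.$$

Since

$$\int_{\Omega} |\nabla w|^2\, dV = \int_{\partial\Omega} w\, \frac{\partial w}{\partial n}\, dS = 0,$$

we have $\nabla w = 0$ in $\operatorname{int}(\Omega)$, so $w$ is locally constant; since $w$ vanishes on $\partial\Omega$, we get $w = 0$ in $\operatorname{int}(\Omega)$. Hence $u = v$.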
Although we have proved the uniqueness for a bounded region $\Omega$, we can’t yet claim the existence of solutions. If $\Omega$ is unbounded, then the argument above breaks down and uniqueness may fail, as shown in the example in which $\Omega = \{x = (x_1, \dots, x_n) \in \mathbb{R}^n \mid x_n \ge 0\}$, $f(x) = 0$ and $g(x) = x_n$, which vanishes on $\partial\Omega$: both $u(x) = 0$ and $u(x) = x_n$ are solutions. Similar to the holomorphic functions on $\mathbb{C}$, the harmonic functions satisfy the Mean Value Property.

Theorem 2 (Mean Value Property) Consider $n \ge 2$. Let $R > 0$ be such that $D^n_R(a) \subset U \subset \mathbb{R}^n$. If $f \in C^2(U)$ is harmonic in $U$, then

$$f(a) = \frac{1}{A_{n-1}(R)} \int_{S^{n-1}_R(a)} f\, dS, \tag{11}$$

where $A_{n-1}(R) = \int_{S^{n-1}_R} dS$.
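Proof Given $\epsilon > 0$, we consider $\Omega_\epsilon = \{x \in U ;\ \epsilon \le |x - a| \le R\}$ and $\nu_{n,a}(x) = |x - a|^{2-n}$. Using spherical coordinates in $\mathbb{R}^n$ and letting $\rho = |x - a|$, we get $\Omega_\epsilon = \{x \in U ;\ \epsilon \le \rho \le R\}$ and $\nu_{n,a}(\rho) = \rho^{2-n}$, with the gradient being $\nabla \nu_{n,a}(\rho) = \frac{2-n}{\rho^{n-1}}\, \hat{\rho}$. Using the fact that $\nu_{n,a}(x)$ is harmonic in $\Omega_\epsilon$ and that $\partial\Omega_\epsilon = S^{n-1}_R \cup (-S^{n-1}_\epsilon)$ is an oriented boundary, Green’s 2nd identity yields

$$0 = \int_{\Omega_\epsilon} \big( f\,\Delta\nu_{n,a} - \nu_{n,a}\,\Delta f \big)\, dV = \int_{S^{n-1}_R(a)} \big( f\,\partial_n \nu_{n,a} - \nu_{n,a}\,\partial_n f \big)\, dS - \int_{S^{n-1}_\epsilon(a)} \big( f\,\partial_n \nu_{n,a} - \nu_{n,a}\,\partial_n f \big)\, dS$$
$$= \frac{(2-n)}{R^{n-1}} \int_{S^{n-1}_R(a)} f\, dS - R^{2-n} \int_{S^{n-1}_R(a)} \partial_n f\, dS - \frac{(2-n)}{\epsilon^{n-1}} \int_{S^{n-1}_\epsilon(a)} f\, dS + \epsilon^{2-n} \int_{S^{n-1}_\epsilon(a)} \partial_n f\, dS$$
$$= \frac{(2-n)}{R^{n-1}} \int_{S^{n-1}_R(a)} f\, dS - \frac{(2-n)}{\epsilon^{n-1}} \int_{S^{n-1}_\epsilon(a)} f\, dS,$$

where the integrals of $\partial_n f$ vanish because $f$ is harmonic. So

$$\frac{1}{R^{n-1}} \int_{S^{n-1}_R(a)} f\, dS = \frac{1}{\epsilon^{n-1}} \int_{S^{n-1}_\epsilon(a)} f\, dS.$$

Multiplying both sides of the above identity by $\frac{1}{A_{n-1}}$ and using $A_{n-1}(r) = r^{n-1} A_{n-1}$, we get

$$\frac{1}{A_{n-1}(R)} \int_{S^{n-1}_R(a)} f\, dS = \frac{1}{A_{n-1}(\epsilon)} \int_{S^{n-1}_\epsilon(a)} f\, dS.$$

Passing to the limit $\epsilon \to 0$, we have

$$f(a) = \frac{1}{A_{n-1}(R)} \int_{S^{n-1}_R(a)} f\, dS.$$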
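The Mean Value Property can be observed numerically in the plane. The sketch below is an illustration under stated assumptions: the sample functions and the helper `circle_average` are chosen here and do not come from the text. It averages a function over a circle using equally spaced angles and compares the result with the value at the center; the harmonic polynomial $x^2 - y^2 + 3x$ reproduces its central value, whereas the non-harmonic $x^2 + y^2$ does not.

```python
import math

def circle_average(f, a, R, samples=360):
    """Average of f over the circle of radius R centered at a = (a0, a1)."""
    total = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        total += f(a[0] + R * math.cos(t), a[1] + R * math.sin(t))
    return total / samples

def harmonic_example(x, y):
    """x^2 - y^2 + 3x is harmonic in the plane."""
    return x * x - y * y + 3.0 * x

def non_harmonic_example(x, y):
    """x^2 + y^2 has Laplacian 4, so it is not harmonic."""
    return x * x + y * y

if __name__ == "__main__":
    a, R = (0.5, -0.2), 0.7
    print(circle_average(harmonic_example, a, R), harmonic_example(*a))
    print(circle_average(non_harmonic_example, a, R), non_harmonic_example(*a))
```

For the non-harmonic example the circle average exceeds the central value by $R^2$, which quantifies the failure of the property.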
For the case $n = 2$, the method of proof goes along the same lines; however, the auxiliary harmonic function is $\nu_2(x) = \ln(|x|)$. The Mean Value Property plays an important role in understanding the properties of harmonic functions.

Theorem 3 (Maximum Principle) Let $\Omega \subset U$ be a connected region. Assume $f \in C^2(\operatorname{int}(\Omega))$ is a harmonic function in $\operatorname{int}(\Omega)$. If $f$ reaches the maximum value or the minimum value in $\operatorname{int}(\Omega)$, then $f$ is constant in $\Omega$.

Proof Let $M = \max\{f(x) \mid x \in \Omega\}$. Consider the set $C_M = \{x \in \operatorname{int}(\Omega) \mid f(x) = M\}$. Since $f$ is continuous, $C_M$ is closed in $\operatorname{int}(\Omega)$. Let’s check that $C_M$ is also open. Given $a \in C_M$, let $\epsilon > 0$ be such that $B_\epsilon(a) \subset \operatorname{int}(\Omega)$. By the Mean Value Property, in its version for balls, we have

$$M = f(a) = \frac{1}{V_n(\epsilon)} \int_{B^n_\epsilon(a)} f\, dV.$$

Consequently $f$ must be constant and equal to $M$ on $B_\epsilon(a)$; otherwise, since $f \le M$, the value $f(a)$ could not be the average value of $f$ on $B^n_\epsilon(a)$. Hence $B_\epsilon(a) \subset C_M$, and so $C_M$ is open. Since $\operatorname{int}(\Omega)$ is connected, $C_M = \operatorname{int}(\Omega)$, and by continuity $f = M$ in $\Omega$. For the case of the minimum value, we consider the harmonic function $(-f)$, and the above reasoning applies.
Exercises

1. Using spherical coordinates in $\mathbb{R}^n$, prove that the Laplacian is given by Eq. (7).
2. Show that $f(x) = \ln|x|$ is harmonic in $\mathbb{R}^2 \setminus \{0\}$.
3. Show that
$$\int_{B^n_R} \Delta |x|^{2-n}\, dV = (2-n) A_{n-1}, \qquad \int_{B^2_R} \Delta \ln|x|\, dV = 2\pi,$$
and conclude that $\nu_2(x) = \ln(|x|)$ and $\nu_n(x) = |x|^{2-n}$ are not harmonic in $B^n_R$.
4. If $f$ is harmonic, show that $|f|$ is also harmonic away from the zeros of $f$. Indeed, $\Delta|f| = \frac{f}{|f|}\, \Delta f$ wherever $f \ne 0$.
5. Let $D^2 = \{u \in \mathbb{R}^2 ;\ |u| \le 1\}$ be the unit disk and let $f : \partial D^2 \to \mathbb{R}$ be a $C^2$ function (so $f$ has a uniformly convergent Fourier series since it is $C^2$-differentiable). Using polar coordinates show that a solution to the PDE
$$\Delta u = 0 \ \text{in}\ D^2, \qquad u|_{\partial D^2} = f$$
is given by
$$u(r, \theta) = \frac{a_0}{2} + \sum_{n=1}^{\infty} r^n \big( a_n \cos(n\theta) + b_n \sin(n\theta) \big), \quad r \in [0, 1),\ \theta \in [0, 2\pi).$$
The coefficients $a_n$ and $b_n$ are given as
$$a_n = \frac{1}{\pi} \int_0^{2\pi} f(t) \cos(nt)\, dt, \qquad b_n = \frac{1}{\pi} \int_0^{2\pi} f(t) \sin(nt)\, dt.$$
The solution $u : D^2 \to \mathbb{R}$ is a harmonic extension of $f$ over the disk $D^2$.
6. Show that the zeros of a continuous harmonic function $f \not\equiv 0$ are not isolated.
7. Consider $R > 0$ such that $D^n_R(a) \subset \Omega$. If $f$ is harmonic, show that the value of $f$ at $a \in \Omega$ is given by
$$f(a) = \frac{1}{V(R)} \int_{D^n_R(a)} f\, dV,$$
where $V(R)$ is the volume of $D^n_R(a)$.
3 Poisson Kernel for the n-Disk $D^n_R$

For any $n > 2$, let’s consider $D^n_R = \{x \in \mathbb{R}^n ;\ |x| \le R\}$ the $n$-disk, $B^n_R = \{x \in \mathbb{R}^n ;\ |x| < R\}$ the open $n$-ball, and $S^{n-1}_R = \{x \in \mathbb{R}^n ;\ |x| = R\}$ the $(n-1)$-sphere.

Dirichlet Problem in $D^n_R$: Given a function $\phi \in C^0(S^{n-1}_R)$, show the existence of a function $f \in C^2(B^n_R)$ such that

$$\Delta f(x) = 0,\ x \in B^n_R, \qquad f(x) = \phi(x),\ x \in S^{n-1}_R. \tag{12}$$

The first step is to use Green’s 2nd identity:

$$\int_{B^n_R} f\,\Delta g\, dV = \int_{B^n_R} g\,\Delta f\, dV + \int_S \big( f\,\partial_n g - g\,\partial_n f \big)\, dS, \qquad \left( \partial_n f = \frac{\partial f}{\partial n} \right). \tag{13}$$
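Let $\xi \in \operatorname{int}(\Omega)$ and consider $\Omega_\epsilon = D^n_R \setminus B^n_\epsilon(\xi)$ and$^2$ $\nu(\xi, x) = \nu_\xi(x) = |x - \xi|^{2-n}$. As in the proof of the Mean Value Property, we have (all the derivatives taken with respect to the $x$ variable)

$$\int_{\Omega_\epsilon} \big( f\,\Delta\nu_\xi - \nu_\xi\,\Delta f \big)\, dV = \int_{\partial\Omega} \left( f\, \frac{\partial \nu_\xi}{\partial n} - \nu_\xi\, \frac{\partial f}{\partial n} \right) dS - \int_{S^{n-1}_\epsilon(\xi)} \left( f\, \frac{\partial \nu_\xi}{\partial n} - \nu_\xi\, \frac{\partial f}{\partial n} \right) dS$$
$$= \int_{\partial\Omega} \left( f\, \frac{\partial \nu_\xi}{\partial n} - \nu_\xi\, \frac{\partial f}{\partial n} \right) dS - \frac{(2-n)}{\epsilon^{n-1}} \int_{S^{n-1}_\epsilon(\xi)} f\, dS + \epsilon^{2-n} \int_{S^{n-1}_\epsilon(\xi)} \frac{\partial f}{\partial n}\, dS.$$

Since $\nu_\xi$ is harmonic in $\operatorname{int}(\Omega_\epsilon)$ with respect to the $x$ variable, letting $\epsilon \to 0$, we get the 3rd Green identity:

$$f(\xi) = \frac{1}{(2-n) A_{n-1}} \int_{\Omega} \nu_\xi\,\Delta f\, dV + \frac{1}{(2-n) A_{n-1}} \int_{\partial\Omega} \left( f\, \frac{\partial \nu_\xi}{\partial n} - \nu_\xi\, \frac{\partial f}{\partial n} \right) dS. \tag{14}$$

If $f$ is harmonic, then $f(\xi)$ is determined by the values of $f$ and $\frac{\partial f}{\partial n}$ on $\partial\Omega$ since

$$f(\xi) = \frac{1}{(2-n) A_{n-1}} \int_{\partial\Omega} \left( f\, \frac{\partial \nu_\xi}{\partial n} - \nu_\xi\, \frac{\partial f}{\partial n} \right) dS. \tag{15}$$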
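Once we have proved the existence of a function $G_\xi(x)$ that differs from $\nu_\xi$ by a function harmonic in $\operatorname{int}(\Omega)$ and satisfies the boundary condition $G_\xi|_{\partial\Omega} = 0$, the solution to the problem (12) is

$$f(\xi) = \frac{1}{(2-n) A_{n-1}} \int_{\partial\Omega} \frac{\partial G_\xi}{\partial n}\, \phi\, dS. \tag{16}$$

The function $G_\xi$ is Green’s function for the Dirichlet problem (12) when $\Omega = D^n_R$. In order to find $G_\xi$, we will consider some special transformations. The inversion map in the sphere $S^{n-1}_R$ is

$$\mathcal{I} : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}^n \setminus \{0\}, \qquad x \mapsto \mathcal{I}(x) = R^2\, \frac{x}{|x|^2}. \tag{17}$$

Consider the image $\Omega^{\mathcal{I}} = \mathcal{I}(\Omega)$. Once we take into account the decomposition $\mathbb{R}^n = B^n_R \cup S^{n-1}_R \cup (\mathbb{R}^n \setminus D^n_R)$, where $B^n_R$ is the open ball, the map $\mathcal{I} : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}^n \setminus \{0\}$ satisfies the following properties:
(i) $\mathcal{I} \circ \mathcal{I} = \mathrm{Id}$.
(ii) $|\mathcal{I}(x)| \cdot |x| = R^2$. Therefore if $x \in S^{n-1}_R$, then $\mathcal{I}(x) = x$; moreover $\mathcal{I}(B^n_R \setminus \{0\}) = \mathbb{R}^n \setminus D^n_R$ and $\mathcal{I}(\mathbb{R}^n \setminus D^n_R) = B^n_R \setminus \{0\}$.
(iii) $\mathcal{I}$ is a conformal map; that is, we have $\lambda : \mathbb{R}^n \setminus \{0\} \to (0, \infty)$ such that
$$\langle d\mathcal{I}_x . u,\ d\mathcal{I}_x . v \rangle = \lambda(x)\, \langle u, v \rangle, \quad \forall u, v \in T_x(\mathbb{R}^n \setminus \{0\}).$$

$^2$ For the case $n = 2$, Green’s function is $\nu_\xi(x) = \ln(|x - \xi|)$.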
Definition 2 Let $f \in C^0(\mathbb{R}^n \setminus \{0\})$. Kelvin’s transform of the function $f$ is the function

$$K[f](x) = \frac{|x|^{2-n}}{R^{2-n}}\, f\big(\mathcal{I}(x)\big). \tag{18}$$

Therefore $K : C^0(\mathbb{R}^n \setminus \{0\}) \to C^0(\mathbb{R}^n \setminus \{0\})$. Kelvin’s transform is linear and $K[K[f]] = f$, that is, $K^{-1} = K$. It was Lord Kelvin, famous for his contributions to thermodynamics, who discovered this transformation. The most relevant property for our purposes is that it preserves harmonicity. We will need some preliminary results regarding Kelvin’s transform.
Lemma 1 Let $p : \mathbb{R}^n \to \mathbb{R}$ be a homogeneous polynomial of degree $m$. Then

$$\Delta\big( |x|^{2-n-2m}\, p \big) = |x|^{2-n-2m}\, \Delta p.$$

Proof Since $p(tx) = t^m p(x)$, we have $\langle x, \nabla p(x) \rangle = m\,p(x)$. Considering $f(x) = |x|^k\, p(x)$, we have

$$\Delta f = |x|^k\, \Delta p + 2\, \langle \nabla |x|^k, \nabla p(x) \rangle + p(x)\, \Delta(|x|^k).$$

Applying the identities

$$\nabla(|x|^k) = k\, |x|^{k-2}\, x, \qquad \Delta(|x|^k) = \operatorname{div}\big(\nabla(|x|^k)\big) = \big( k(k-2) + nk \big)\, |x|^{k-2},$$

we get

$$\Delta f(x) = |x|^k\, \Delta p + 2k\, |x|^{k-2}\, \langle x, \nabla p(x) \rangle + \big( k(k-2) + nk \big)\, |x|^{k-2}\, p(x)$$
$$= |x|^k\, \Delta p + 2k\, |x|^{k-2}\, m\,p(x) + \big( k(k-2) + nk \big)\, |x|^{k-2}\, p(x)$$
$$= |x|^k\, \Delta p + k\,(2m + n + k - 2)\, |x|^{k-2}\, p(x).$$

The statement follows by taking $k = 2 - 2m - n$.
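Lemma 2 The Kelvin transform preserves uniform convergence on compact sets.

Proof Let $\{f_n\} \subset C^2(\mathbb{R}^n \setminus \{0\})$ be a sequence converging uniformly on a compact set $K \subset \mathbb{R}^n \setminus \{0\}$. The image set $K^{\mathcal{I}} = \mathcal{I}(K)$ is also compact and does not contain the origin, so there is $M > 0$ such that $\frac{|y|^{2-n}}{R^{2-n}} \le M$ for all $y \in K^{\mathcal{I}}$. Given $\epsilon > 0$, we have $n_0 \in \mathbb{N}$ such that if $n, m > n_0$, then $\| f_n - f_m \|_2 < \frac{\epsilon}{M}$ in $K$. Given that $x = \mathcal{I}(y)$, we get

$$\| K[f_n] - K[f_m] \|_2 = \sup_{y \in K^{\mathcal{I}}} \big| K[f_n](y) - K[f_m](y) \big| \le \sup_{x \in K} \frac{|y|^{2-n}}{R^{2-n}}\, | f_n(x) - f_m(x) | \le M\, \| f_n - f_m \|_2 < \epsilon.$$

Therefore $\{K[f_n]\}_{n \in \mathbb{N}}$ converges uniformly in the Banach space $\big( C^2(\mathbb{R}^n \setminus \{0\}), \| \cdot \|_2 \big)$.

Corollary 1 Let $f \in C^2(\mathbb{R}^n \setminus \{0\})$; then

$$\Delta K[f] = \left( \frac{R}{|x|} \right)^4 K[\Delta f]. \tag{19}$$

Proof First, let $p$ be a homogeneous polynomial of degree $m$; therefore

$$K[p](x) = \frac{|x|^{2-n}}{R^{2-n}}\, p\!\left( \frac{R^2}{|x|^2}\, x \right) = R^{2m+n-2}\, |x|^{2-n-2m}\, p(x).$$

From Lemma 1, we have $\Delta K[p](x) = R^{2m+n-2}\, |x|^{2-n-2m}\, \Delta p(x)$. When $p$ is homogeneous of degree $m$, the function $\Delta p$ is homogeneous of degree $(m - 2)$. Therefore

$$K[\Delta p](x) = \frac{|x|^{2-n}}{R^{2-n}}\, \Delta p\!\left( \frac{R^2}{|x|^2}\, x \right) = \frac{|x|^{2-n}}{R^{2-n}} \left( \frac{R^2}{|x|^2} \right)^{m-2} \Delta p(x) = \left( \frac{|x|}{R} \right)^4 R^{2m+n-2}\, |x|^{2-n-2m}\, \Delta p(x) = \left( \frac{|x|}{R} \right)^4 \Delta K[p](x),$$

which is the identity (19) for $p$. Consider $\{p_n\}$ a polynomial sequence converging uniformly on compact sets to $f$ in $C^2(\mathbb{R}^n \setminus \{0\})$. Writing each polynomial as a finite sum $p_n(x) = \sum_i p_{n,i}(x)$, with $p_{n,i}(x)$ being a homogeneous polynomial of degree $i$, it follows from the linearity of both operators, Kelvin’s transform and the Laplacian ($\Delta p_n = \sum_i \Delta p_{n,i}$), that

$$\Delta K[p_n] = \sum_i \Delta K[p_{n,i}] = \left( \frac{R}{|x|} \right)^4 \sum_i K[\Delta p_{n,i}] = \left( \frac{R}{|x|} \right)^4 K[\Delta p_n].$$

Since $p_n$ converges to $f$ in $C^2$, we have

$$\Delta K[f] = \lim_{n \to \infty} \Delta K[p_n] = \lim_{n \to \infty} \left( \frac{R}{|x|} \right)^4 K[\Delta p_n] = \left( \frac{R}{|x|} \right)^4 K[\Delta f].$$

Theorem 4 A function $f \in C^2(\mathbb{R}^n \setminus \{0\})$ is harmonic if and only if $K[f]$ is harmonic.

Proof The statement follows from the identity $\Delta K[f] = \left( \frac{R}{|x|} \right)^4 K[\Delta f]$ together with $K[K[f]] = f$: the factor $\left( \frac{R}{|x|} \right)^4$ never vanishes and $K$ is injective, so $\Delta K[f] = 0$ if and only if $\Delta f = 0$.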
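Theorem 4 can be probed numerically. The sketch below works in $\mathbb{R}^3$ under stated assumptions: the helper names `inversion`, `kelvin` and `laplacian`, the sample function, the radius $R = 2$ and the finite-difference step are all choices made here for illustration. It applies Kelvin's transform (18) to a harmonic polynomial and evaluates a central-difference Laplacian of the result, which should be close to zero; it also checks the involution property $K[K[f]] = f$.

```python
import math

def inversion(x, R):
    """Inversion in the sphere of radius R, Eq. (17): x -> R^2 x / |x|^2 (x != 0)."""
    scale = R * R / sum(c * c for c in x)
    return [scale * c for c in x]

def kelvin(f, R):
    """Kelvin's transform, Eq. (18): K[f](x) = (|x|/R)^(2-n) f(I(x))."""
    def transformed(x):
        n = len(x)
        r = math.sqrt(sum(c * c for c in x))
        return (r / R) ** (2 - n) * f(inversion(x, R))
    return transformed

def laplacian(f, x, h=1e-4):
    """Central-difference approximation of the Cartesian Laplacian at x."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        total += (f(xp) - 2.0 * fx + f(xm)) / (h * h)
    return total

def harmonic_poly(x):
    """x0*x1 + x2 is harmonic in R^3."""
    return x[0] * x[1] + x[2]

if __name__ == "__main__":
    R, point = 2.0, [0.8, -0.5, 1.1]
    Kf = kelvin(harmonic_poly, R)
    print("Laplacian of K[f]:", laplacian(Kf, point))
    print("K[K[f]] vs f:", kelvin(Kf, R)(point), harmonic_poly(point))
```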
Let $D = \{(x, x) \in B^n_R \times B^n_R\}$ be the diagonal and let $G : B^n_R \times B^n_R \setminus D \to \mathbb{R}$ be the function given by ($G_\xi(x) = G(\xi, x)$):

$$G(\xi, x) = |x - \xi|^{2-n} - K\big[ |x - \xi|^{2-n} \big] = |x - \xi|^{2-n} - \frac{|x|^{2-n}}{R^{2-n}} \left| \frac{R^2}{|x|^2}\, x - \xi \right|^{2-n}, \tag{20}$$

and we have our Green’s function.

Definition 3 Green’s function for the Dirichlet problem (12) is

$$G(\xi, x) = |x - \xi|^{2-n} - \frac{|x|^{2-n}}{R^{2-n}} \left| \frac{R^2}{|x|^2}\, x - \xi \right|^{2-n}. \tag{21}$$

To check that $G(\xi, x) = G_\xi(x)$ is the Green function, we need to verify the conditions: (i) $\Delta G_\xi = 0$ in $B^n_R \setminus \{\xi\}$; (ii) $G_\xi(x) = 0$ if $x \in S^{n-1}_R$.

(1) If $\xi$ or $x$ belongs to $S^{n-1}_R$, then $G(\xi, x) = 0$.
(2) $G(\xi, x) = G(x, \xi)$. It is enough to check the second term, so we use the identity (easily checked by squaring both sides):
$$\left| \frac{R}{|x|}\, x - \frac{|x|}{R}\, \xi \right| = \left| \frac{R}{|\xi|}\, \xi - \frac{|\xi|}{R}\, x \right|.$$
So we have
$$\frac{|x|^{2-n}}{R^{2-n}} \left| \frac{R^2}{|x|^2}\, x - \xi \right|^{2-n} = \left| \frac{R}{|x|}\, x - \frac{|x|}{R}\, \xi \right|^{2-n} = \left| \frac{R}{|\xi|}\, \xi - \frac{|\xi|}{R}\, x \right|^{2-n} = \frac{|\xi|^{2-n}}{R^{2-n}} \left| \frac{R^2}{|\xi|^2}\, \xi - x \right|^{2-n},$$
and $G(\xi, x)$ is symmetric with respect to the variables $x$ and $\xi$.
(3) $G(\xi, x)$ is harmonic in $B^n_R \times B^n_R \setminus D$. It is straightforward from the definition of $G$ and Theorem 4.
(4) Since the unit normal vector to $S^{n-1}_R$ at $x$ is $n_x = \frac{x}{R}$, the derivative in the direction of the vector $n_x$ is
$$\frac{\partial G}{\partial n_x}(\xi, x) = \frac{1}{R}\, \langle \nabla G(\xi, x), x \rangle = \frac{(2-n)}{R} \cdot \frac{R^2 - |\xi|^2}{|x - \xi|^n}. \tag{22}$$
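Properties (1) and (2) of Green's function can be checked numerically from formula (21). The sketch below is an illustration in $\mathbb{R}^3$; the function name `green`, the radius and the sample points are assumptions made here. Both arguments must be nonzero, because each evaluation in the symmetry test divides by the norm of its second argument.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))

def green(xi, x, R):
    """Green's function of Eq. (21) for the ball of radius R (requires x != 0, x != xi)."""
    n = len(x)
    r = norm(x)
    direct = norm([a - b for a, b in zip(x, xi)]) ** (2 - n)
    # reflected point R^2 x / |x|^2 minus xi, as in the second term of Eq. (21)
    reflected = [R * R * a / (r * r) - b for a, b in zip(x, xi)]
    return direct - (r / R) ** (2 - n) * norm(reflected) ** (2 - n)

if __name__ == "__main__":
    R = 2.0
    xi = [0.3, -0.4, 0.5]
    x = [1.0, 0.2, -0.7]
    print("symmetry:", green(xi, x, R), green(x, xi, R))
    on_sphere = [1.2, 0.0, 1.6]  # |on_sphere| = 2 = R
    print("boundary value:", green(xi, on_sphere, R))
```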
By swapping the variables $\xi \to x$ and $x \to \xi$ in Eq. (22) (just for an aesthetic reason), the solution (16) is now

$$f(x) = \frac{1}{R\, A_{n-1}} \int_{S^{n-1}_R} \frac{R^2 - |x|^2}{|x - \xi|^n}\, \phi(\xi)\, dS_\xi. \tag{23}$$

Definition 4 The Poisson kernel for the Dirichlet problem (12) in the $n$-disk $D^n_R$ is

$$P(x, \xi) = \frac{1}{R\, A_{n-1}} \cdot \frac{R^2 - |x|^2}{|x - \xi|^n}. \tag{24}$$

Let $H : C^0(S^{n-1}_R) \to C^0(B^n_R)$ be the map given by

$$H[g](x) = \frac{1}{R\, A_{n-1}} \int_{S^{n-1}_R} \frac{R^2 - |x|^2}{|x - \xi|^n}\, g(\xi)\, dS_\xi. \tag{25}$$
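The function $H[g](x)$ is the only solution to the Dirichlet Problem (12). To prove this statement, we need some previous results.

Theorem 5 If $\xi \in S^{n-1}_R$, then $P(x, \xi)$ is harmonic in $\mathbb{R}^n \setminus \{\xi\}$, in particular in $B^n_R$.

Proof Up to a constant factor, the Poisson kernel is $\frac{R^2 - |x|^2}{|x - \xi|^n}$, which is $C^2$-differentiable in $\mathbb{R}^n \setminus \{\xi\}$. Considering the functions $u = R^2 - |x|^2$ and $v = |x - \xi|^{-n}$, we have

$$\nabla u = -2x, \quad \Delta u = -2n, \qquad \nabla v = -\frac{n}{|x - \xi|^{n+2}}\, (x - \xi), \quad \Delta v = \frac{2n}{|x - \xi|^{n+2}}.$$

Applying the identity $\Delta(u . v) = u . \Delta v + 2 \langle \nabla u, \nabla v \rangle + v . \Delta u$, we get $\Delta(u . v) = \frac{2n\,(R^2 - |\xi|^2)}{|x - \xi|^{n+2}}$, which vanishes since $|\xi| = R$. It follows that $P(x, \xi)$ is harmonic in $\mathbb{R}^n \setminus \{\xi\}$.

Proposition 3 The Poisson kernel satisfies the following properties:
(1) $P(x, \xi) > 0$ for all $x \in B^n_R$ and $\xi \in S^{n-1}_R$.
(2) $\int_{S^{n-1}_R} P(x, \xi)\, dS_\xi = 1$ for all $x \in B^n_R$.
(3) For any $\eta \in S^{n-1}_R$ and $\delta > 0$,
$$\lim_{x \to \eta} \int_{|\xi - \eta| > \delta} P(x, \xi)\, dS_\xi = 0.$$

Proof The proof is straightforward from the definition and properties, as shown next.
(1) It is immediate from the definition of $P(x, \xi)$.
(2) Consider first $R = 1$ and $x \ne 0$; since $|\xi| = 1$, we have $|x - \xi| = \big|\, |x|\,\xi - \frac{x}{|x|} \big|$, so
$$\int_{S^{n-1}} P(x, \xi)\, dS_\xi = \frac{1}{A_{n-1}} \int_{S^{n-1}} \frac{1 - |\xi|^2 |x|^2}{\big|\, |x|\,\xi - \frac{x}{|x|} \big|^n}\, dS_\xi = \int_{S^{n-1}} P\!\left( |x|\,\xi, \frac{x}{|x|} \right) dS_\xi = A_{n-1}\, P\!\left( 0, \frac{x}{|x|} \right) = 1.$$
The next to last equality follows from the fact that $P(\,\cdot\,, \frac{x}{|x|})$ is harmonic, so it satisfies the Mean Value Property on the sphere of radius $|x|$ centered at the origin. Now we apply the coordinate change $\xi = R\,\xi'$ to get
$$\frac{1}{R\, A_{n-1}} \int_{S^{n-1}_R} \frac{R^2 - |x|^2}{|x - \xi|^n}\, dS_\xi = \frac{1}{A_{n-1}} \int_{S^{n-1}} \frac{1 - \big| \frac{x}{R} \big|^2}{\big| \frac{x}{R} - \xi' \big|^n}\, dS_{\xi'} = 1.$$
(3) Consider a sequence $\{x_k\}_{k \in \mathbb{N}} \subset B^n_R$ such that $\lim x_k = \eta \in S^{n-1}_R$. For $|\xi - \eta| > \delta$ and $k$ large enough so that $|x_k - \eta| < \frac{\delta}{2}$, we have $|x_k - \xi| > \frac{\delta}{2}$, and we get
$$\lim_{k \to \infty} \int_{|\xi - \eta| > \delta} \frac{R^2 - |x_k|^2}{|x_k - \xi|^n}\, dS_\xi \le \lim_{k \to \infty} \frac{R^2 - |x_k|^2}{(\delta/2)^n} \int_{S^{n-1}_R} dS_\xi = 0.$$

Theorem 6 Let $\phi \in C^0(S^{n-1}_R)$ and consider

$$f(x) = \begin{cases} H[\phi](x), & \text{if } x \in B^n_R, \\ \phi(x), & \text{if } x \in S^{n-1}_R. \end{cases} \tag{26}$$

So $f$ is the unique solution to Problem (12).

Proof The Laplacian operator commutes with the integral since $P(x, \xi)$ is smooth in $x \in B^n_R$. In this way, the Laplacian of $f(x)$ in $B^n_R$ is

$$\Delta f(x) = \int_{S^{n-1}_R} \Delta_x P(x, \xi)\, \phi(\xi)\, dS_\xi = 0.$$

To prove the continuity of $f$ in $D^n_R$, let’s fix $\eta \in S^{n-1}_R$ and $\epsilon > 0$. Since $\phi$ is continuous, there is $\delta > 0$ such that if $|\xi - \eta| \le \delta$, then $|\phi(\xi) - \phi(\eta)| < \epsilon$. Using item (2) of Proposition 3, we have

$$\lim_{x \to \eta} | f(x) - f(\eta) | = \lim_{x \to \eta} \left| \int_{S^{n-1}_R} P(x, \xi)\, \big( \phi(\xi) - \phi(\eta) \big)\, dS_\xi \right| \le \lim_{x \to \eta} \int_{S^{n-1}_R} P(x, \xi)\, | \phi(\xi) - \phi(\eta) |\, dS_\xi$$
$$\le \lim_{x \to \eta} \int_{|\xi - \eta| \le \delta} P(x, \xi)\, | \phi(\xi) - \phi(\eta) |\, dS_\xi + \lim_{x \to \eta} \int_{|\xi - \eta| > \delta} P(x, \xi)\, | \phi(\xi) - \phi(\eta) |\, dS_\xi$$
$$\le \epsilon + 2\, \| \phi \|_0 \lim_{x \to \eta} \int_{|\xi - \eta| > \delta} P(x, \xi)\, dS_\xi = \epsilon.$$

In the last equality, we use item (3) of Proposition 3. Since $\epsilon$ is arbitrary, the limit is 0. Uniqueness follows from Theorem 1.

Indeed the function $f(x)$ defined in (26) is $C^{\infty}(B^n_R)$ with respect to the variable $x \in B^n_R$. We can check this as follows: let $\alpha = (\alpha_1, \dots, \alpha_n)$ be a multi-index in which $0 \le \alpha_i$ and $|\alpha| = \sum_{i=1}^n \alpha_i$; the partial derivative $\partial^{|\alpha|} f = \frac{\partial^{|\alpha|} f}{\partial^{\alpha_1} x_1 \dots \partial^{\alpha_n} x_n}$ is

$$\partial^{|\alpha|} f(x) = \int_{S^{n-1}_R} \partial^{|\alpha|} P(x, \xi)\, \phi(\xi)\, dS_\xi.$$

Since $P(x, \xi) \in C^{\infty}(B^n_R)$ for all $\xi \in S^{n-1}_R$, it is straightforward that $f \in C^{\infty}(B^n_R)$.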
Exercises

(1) Prove that Eq. (22) can be obtained for the Poisson kernel. Hint: show the identities
(i) $\dfrac{\partial\, |x - p|^{2-n}}{\partial x_i} = \dfrac{(2-n)}{|x - p|^n}\, (x_i - p_i)$,
(ii) $\dfrac{\partial \big( \big| \frac{|p|}{R}\, x - \frac{R}{|p|}\, p \big|^{2-n} \big)}{\partial x_i} = \dfrac{(2-n)}{\big| \frac{|p|}{R}\, x - \frac{R}{|p|}\, p \big|^n} \cdot \dfrac{|p|}{R} \left( \dfrac{|p|}{R}\, x_i - \dfrac{R}{|p|}\, p_i \right)$,
and prove that if $|x| = R$, then $\big| \frac{|p|}{R}\, x - \frac{R}{|p|}\, p \big| = |x - p|$.
(2) Consider $\Omega \subset \mathbb{R}^n$. Let $\{f_n\}_{n \in \mathbb{N}} \subset C^{\infty}(\Omega)$ be a sequence of harmonic functions converging uniformly to $f$ in each compact subset $K \subset \Omega$. Show that $f$ is harmonic and that, for every multi-index $\alpha$, the sequence $\{\partial^{|\alpha|} f_n\}_{n \in \mathbb{N}}$ converges uniformly in each compact subset $K \subset \Omega$ to $\partial^{|\alpha|} f$.
(3) Prove the converse of the Mean Value Property: let $f \in C^0(\Omega)$. If
$$f(a) = \frac{1}{A_{n-1}(R)} \int_{S^{n-1}_R(a)} f(\xi)\, dS_\xi,$$
for all $a \in \Omega$ and all $R > 0$ such that $B^n_R(a) \subset \Omega$, then $f$ is harmonic. To prove continuity, it is necessary to assume $\Omega$ is bounded; otherwise, assuming $\Omega = \mathbb{R}^n$, the function below is a counterexample:
$$f(x) = \begin{cases} 1, & x_n > 0, \\ 0, & x_n = 0, \\ -1, & x_n < 0. \end{cases}$$
(4) Let $f$ be an integrable function in $\Omega$. Let $a \in \operatorname{int}(\Omega)$ and $R > 0$. If for all $B^n_R(a) \subset \Omega$, we have
$$f(a) = \frac{1}{V_n(R)} \int_{B^n_R(a)} f(\xi)\, dV_\xi,$$
then $f$ is harmonic.
(5) If $f$ is harmonic in $\Omega \subset \mathbb{R}^n$, then $f$ is analytic in $\Omega$.
(6) Solve the Dirichlet problem for the case $n = 2$.
(7) Let $\Omega \subset \mathbb{R}^n$ be a region, that is, a bounded set with non-empty interior and with a boundary that is $C^0$-regular, and let
$$\| df \|^2 = \sum_{i=1}^n \left( \frac{\partial f}{\partial x_i} \right)^2.$$
The Dirichlet energy functional $E : C^2(U) \to \mathbb{R}$ is given by
$$E(f) = \int_{\Omega} \| df \|^2\, dV.$$
Given an open subset $U \subset \operatorname{int}(\Omega)$, show the identity $dE_f . u = \int_U \langle \Delta f, u \rangle\, dV$. Then conclude that $\nabla E(f) = \Delta f$.
(8) Let $M$ be a compact manifold. If $f \in C^2(M)$ is harmonic, show that it must be constant.
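4 Harmonic Differential Forms

Let $U \subset \mathbb{R}^n$ be a bounded open subset endowed with a Riemannian metric $g$. We consider $\{(x_1, \dots, x_n) \mid x_i \in \mathbb{R}\}$ a local system of coordinates on $U$. Let $\beta = \{e_1, \dots, e_n\}$ be a local frame, in which $e_i = \frac{\partial}{\partial x_i}$ satisfies the conditions:
(i) $\partial_{e_i} e_j = \frac{\partial e_j}{\partial x_i} = 0$ for all $i, j$;
(ii) $[e_i, e_j] = 0$ for all $i, j$.

Let $\Omega^p_{cs}(U)$ be the space of $p$-forms in $U$ with compact support; we use compact supports because we wish to avoid the boundary terms. The exterior derivative $d_p : \Omega^p_{cs}(U) \to \Omega^{p+1}_{cs}(U)$ is

$$d_p = \sum_j e_j \wedge \partial_{e_j}. \tag{27}$$

Let $f \in \Omega^0_{cs}(U)$, so

$$d_0 f = \sum_j e_j \wedge \partial_{e_j} f = \sum_j e_j \wedge \frac{\partial f}{\partial x_j} = \sum_j \frac{\partial f}{\partial x_j}\, e_j = \nabla f.$$

Given a $p$-form $\omega = f\, e_{i_1} \wedge e_{i_2} \wedge \dots \wedge e_{i_p}$, we get

$$d_p \omega = \sum_j e_j \wedge \partial_{e_j} \big( f\, e_{i_1} \wedge e_{i_2} \wedge \dots \wedge e_{i_p} \big) = \sum_j \frac{\partial f}{\partial x_j}\, e_j \wedge e_{i_1} \wedge e_{i_2} \wedge \dots \wedge e_{i_p}.$$

In $\Omega^p_{cs}(U)$, we have the inner product

$$\langle \omega, \eta \rangle = \int_U \omega \wedge *\eta$$

and the norm is defined as

$$| \omega |_{L^2} = \left( \int_U | \omega |^2\, d\mathrm{vol}_g \right)^{\frac{1}{2}}.$$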
So far we have considered differential forms, i.e., $p$-forms $\omega : U \to \Lambda^p(U)$ differentiable with respect to the variable $x \in U$ for all $0 \le p \le n$. In this section, we need to consider the space $L^2(\Omega^p_{cs}(U))$ of $L^2$-forms (square integrable):

$$L^2\big(\Omega^p_{cs}(U)\big) = \{\omega \in \Omega^p_{cs}(U) ;\ | \omega |_{L^2} < \infty\}, \qquad L^2\big(\Omega^*_{cs}(U)\big) = \bigoplus_{p=0}^{n} L^2\big(\Omega^p_{cs}(U)\big). \tag{28}$$

For $0 \le p \le n$, the spaces $L^2(\Omega^p_{cs}(U))$ are Hilbert spaces. The space of $C^r$ differential forms $C^r(\Omega^p_{cs}(U))$ is a subspace of $L^2(\Omega^p_{cs}(U))$. The space of $L^2$-forms has a weaker topology than the space of $C^r$-forms; it is more appropriate for proving the existence of a solution of a PDE. We call the reader’s attention to the spaces of forms we use throughout this section: we write $L^2(\Omega^p_{cs}(U))$ for the square integrable $p$-forms, $C^r(\Omega^p_{cs}(U))$ for the $C^r$-differentiable $p$-forms, and $\Omega^p_{cs}(U)$ when the topology defined on the space of $p$-forms is not relevant.

The operator $d^* : L^2(\Omega^{p+1}_{cs}(U)) \to L^2(\Omega^p_{cs}(U))$ is the dual operator of $d$ with respect to the inner product in $L^2(\Omega^p_{cs}(U))$, i.e.,

$$\langle d\omega, \eta \rangle = \langle \omega, d^*\eta \rangle.$$

The Laplacian operator $\Delta_p : L^2(\Omega^p_{cs}(U)) \to L^2(\Omega^p_{cs}(U))$ is

$$\Delta_p = d^*_{p+1}\, d_p + d_{p-1}\, d^*_p. \tag{29}$$

Given $\eta \in C^r(\Omega^p_{cs}(U))$, our intent is to solve the equation

$$\Delta_p\, \omega = \eta, \tag{30}$$

with $\omega \in C^r(\Omega^p_{cs}(U))$. A $p$-form $\omega$ is harmonic if $\Delta_p \omega = 0$. Therefore $\omega$ is harmonic in $L^2(\Omega^p_{cs}(U))$ if and only if $d_p \omega = d^*_p \omega = 0$. A harmonic $p$-form $\omega$ represents a cohomology class in De Rham Cohomology. Let $\mathcal{H}^p$ be the space of harmonic $p$-forms. We introduce the operator $\partial\!\!\!/_p : \Omega^p_{cs}(U) \to \Omega^{p-1}_{cs}(U) \oplus \Omega^{p+1}_{cs}(U)$,

$$\partial\!\!\!/_p = d^*_p + d_p. \tag{31}$$

So the Laplacian is given by $\partial\!\!\!/^2_p = \Delta_p$. Furthermore,

$$\text{(i)}\ \partial\!\!\!/^*_p = \partial\!\!\!/_p, \qquad \text{(ii)}\ \operatorname{Ker}(\partial\!\!\!/_p) = \operatorname{Ker}(\partial\!\!\!/^2_p), \qquad \text{(iii)}\ \langle \omega, \Delta_p \omega \rangle = | \partial\!\!\!/_p\, \omega |^2_{L^2} = | d^*_p \omega |^2_{L^2} + | d_p \omega |^2_{L^2}. \tag{32}$$
The subspaces $\operatorname{Im}(d_{p-1}) = d_{p-1}(\Omega^{p-1}_{cs})$, $\operatorname{Im}(d^*_{p+1}) = d^*_{p+1}(\Omega^{p+1}_{cs})$ and $\mathcal{H}^p$ are orthogonal since for any $\theta \in \Omega^{p-1}_{cs}$, $\omega \in \mathcal{H}^p$ and $\eta \in \Omega^{p+1}_{cs}$, we have

$$\langle d\theta, d^*\eta \rangle = \langle d^2\theta, \eta \rangle = 0; \qquad \langle d\theta, \omega \rangle = \langle \theta, d^*\omega \rangle = 0; \qquad \langle \omega, d^*\eta \rangle = \langle d\omega, \eta \rangle = 0.$$

In order to solve Eq. (30), we stress the following items:
(1) We will prove the existence of a solution $\omega \in L^2(\Omega^p_{cs}(U))$, that is, a weak solution.
(2) For the purposes of Differential Topology, we need to prove $\omega \in C^{\infty}(\Omega^p_{cs}(U))$.

To prove the existence of a weak solution, we will open the toolbox of Functional Analysis. Assuming $\eta \ne 0$ and the existence of a solution $\omega \in L^2(\Omega^p_{cs}(U))$ for $\Delta\omega = \eta$, we consider the functional $F : L^2(\Omega^p_{cs}(U)) \to \mathbb{R}$ given by

$$F(\phi) = \langle \eta, \phi \rangle.$$

The definition yields the following remarks:
(i) $F(\theta) = \langle \omega, \Delta\theta \rangle$ for all $\theta \in L^2(\Omega^p_{cs}(U))$, since $\langle \eta, \theta \rangle = \langle \Delta\omega, \theta \rangle = \langle \omega, \Delta\theta \rangle$.
(ii) $\eta \in (\mathcal{H}^p)^{\perp} \subset L^2(\Omega^p_{cs}(U))$ ($\eta \ne 0$) since $\langle \eta, h \rangle = \langle \Delta\omega, h \rangle = \langle \omega, \Delta h \rangle = 0$, $\forall h \in \mathcal{H}^p$.

The strategy to obtain a weak solution for Eq. (30) is to prove the orthogonal decomposition

$$L^2\big(\Omega^p_{cs}(U)\big) = \mathcal{H}^p \oplus (\mathcal{H}^p)^{\perp}, \tag{33}$$

and the fact that we have a weak solution if and only if $\eta \in (\mathcal{H}^p)^{\perp}$. The orthogonal decomposition is achieved by proving that the space $\mathcal{H}^p$ is finite dimensional. At this point, the ellipticity of the Laplacian operators $\Delta_p$, $0 \le p \le n$, is fundamental. Indeed, ellipticity is a very powerful property whose consequences are the following:
(i) $\mathcal{H}^p$ is finite dimensional.
(ii) Given $\eta \in C^{\infty}(\Omega^p_{cs}(U))$, if we have an $L^2$-weak solution $\omega$, then it is in $C^{\infty}(\Omega^p_{cs}(U))$.

A full treatment would require a more in-depth approach to analytical issues about Sobolev Spaces, which are beyond the scope of this text. For a full treatment we recommend the Refs. [13, 45].
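Lemma 3 Let $c > 0$ and consider $\{\omega_n\} \subset L^2(\Omega^p_{cs}(U))$ a sequence of $p$-forms on $U$ such that

$$| \omega_n |_{L^2} \le c \quad \text{and} \quad | \Delta\omega_n |_{L^2} \le c, \quad \forall n \in \mathbb{N}.$$

Then a subsequence of $\{\omega_n\}_{n \in \mathbb{N}}$ is a Cauchy sequence in $L^2(\Omega^p_{cs}(U))$.

Proof See Ref. [45].

Corollary 2 $\mathcal{H}^p$ is finite dimensional. Furthermore, the space $L^2(\Omega^p_{cs}(U))$ admits an orthogonal decomposition

$$L^2\big(\Omega^p_{cs}(U)\big) = \operatorname{Im}(d_{p-1}) \oplus \operatorname{Im}(d^*_{p+1}) \oplus \mathcal{H}^p. \tag{34}$$

Proof By Lemma 3, the unit ball of $\mathcal{H}^p$ is compact, so $\mathcal{H}^p$ is a locally compact set. Therefore $\mathcal{H}^p$ is finite dimensional by Theorem 15 in Appendix A. In this case, $\mathcal{H}^p$ admits a closed orthogonal complement $(\mathcal{H}^p)^{\perp}$ as proved in Proposition 5 in Appendix A. Since $\Delta_p = \partial\!\!\!/^2_p$ for all $0 \le p \le n$, we get the desired decomposition.

Corollary 3 There is a constant $k > 0$ such that

$$| \omega |_{L^2} \le k\, | \Delta\omega |_{L^2}, \tag{35}$$

for all $\omega \in (\mathcal{H}^p)^{\perp}$.

Proof Let us assume (35) is false, so there would be a sequence $\{\omega_n\}_{n \in \mathbb{N}} \subset (\mathcal{H}^p)^{\perp}$ with $| \omega_n |_{L^2} = 1$ and $| \Delta\omega_n |_{L^2} \to 0$. By Lemma 3, this would imply the existence of a Cauchy subsequence, which we can assume to be $\{\omega_n\}_{n \in \mathbb{N}}$. Let $\omega = \lim_n \omega_n$, so

$$\lim_{n \to \infty} \langle \omega_n, \theta \rangle = \langle \omega, \theta \rangle, \quad \forall \theta.$$

In this case, we have $| \Delta\omega |_{L^2} = 0$ in the weak sense, so $\omega \in \mathcal{H}^p$. But this is a contradiction: since the orthogonal complement is closed, $\omega \in (\mathcal{H}^p)^{\perp}$ with $| \omega |_{L^2} = 1$, and so $\omega \notin \mathcal{H}^p$.

Theorem 7 (Hodge, $L^2$-version) The space $L^2(\Omega^p_{cs}(U))$ admits an orthogonal decomposition into

$$L^2\big(\Omega^p_{cs}(U)\big) = \operatorname{Im}(d_{p-1}) \oplus \operatorname{Im}(d^*_{p+1}) \oplus \mathcal{H}^p \tag{36}$$

for all $0 \le p \le n$. Furthermore, the equation $\Delta_p \omega = \eta$ admits a weak solution if and only if $\eta \perp \mathcal{H}^p(U)$.

Proof Since $L^2(\Omega^p_{cs}(U)) = \mathcal{H}^p \oplus (\mathcal{H}^p)^{\perp}$, letting $\beta^h = \{\omega_1, \dots, \omega_{b(p)}\}$ be an orthonormal basis of $\mathcal{H}^p$, an arbitrary form $\omega \in L^2(\Omega^p_{cs}(U))$ can be uniquely written as

$$\omega = \sum_{i=1}^{b(p)} \langle \omega, \omega_i \rangle\, \omega_i + \omega^{\perp},$$

where $\omega^{\perp} \in (\mathcal{H}^p)^{\perp}$. Define the projection $P^h : L^2(\Omega^p_{cs}(U)) \to \mathcal{H}^p$ by

$$P^h(\omega) = \sum_{i=1}^{b(p)} \langle \omega, \omega_i \rangle\, \omega_i.$$

Let’s prove the image set $\operatorname{Im}(\Delta) = \Delta\big(\Omega^p_{cs}(U)\big)$ is equal to $(\mathcal{H}^p)^{\perp}$. We have already seen that the decomposition is orthogonal and $\Delta\big(\Omega^p_{cs}(U)\big) \subseteq (\mathcal{H}^p)^{\perp}$. Now we need to check the inclusion $(\mathcal{H}^p)^{\perp} \subseteq \Delta\big(\Omega^p_{cs}(U)\big)$. Let $\eta \in (\mathcal{H}^p)^{\perp}$ and define the linear functional $F : \Delta\big(\Omega^p_{cs}(U)\big) \to \mathbb{R}$,

$$F(\Delta\theta) = \langle \eta, \theta \rangle.$$

We note the following items:
(i) $F$ is well-defined. Assume $\Delta\theta_1 = \Delta\theta_2$; so $\Delta(\theta_1 - \theta_2) = 0$ implies $(\theta_1 - \theta_2) \in \mathcal{H}^p$; therefore $\langle \eta, (\theta_1 - \theta_2) \rangle = 0$ and $F(\Delta\theta_1) = F(\Delta\theta_2)$.
(ii) $F$, restricted to $\Delta\big(\Omega^p_{cs}(U)\big)$, is a bounded linear functional. Taking $\theta \in (\mathcal{H}^p)^{\perp}$ and applying Corollary 3, we have
$$| F(\Delta\theta) | = | \langle \eta, \theta \rangle | \le | \eta | . | \theta | \le k\, | \eta | . | \Delta\theta |.$$

Due to the Hahn-Banach Theorem 14 in Appendix A, $F$ extends to a bounded linear functional on $L^2(\Omega^p_{cs}(U))$. By Riesz’s Representation Theorem 12 in Appendix A, there is $\omega \in L^2(\Omega^p_{cs}(U))$ such that $F(\Delta\theta) = \langle \omega, \Delta\theta \rangle$. So we get

$$F(\Delta\theta) = \langle \omega, \Delta\theta \rangle = \langle \Delta\omega, \theta \rangle = \langle \eta, \theta \rangle.$$

Therefore $\Delta\omega = \eta$ and $\omega \in L^2(\Omega^p_{cs}(U))$ is a weak solution.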
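A finite-dimensional analogue may help to see what the decomposition (36) says. The sketch below is not the $L^2$ setting of the text but a toy model chosen here as an assumption: a cycle graph with $N$ vertices and $N$ edges, where 0-cochains are functions on vertices, 1-cochains are functions on edges, `d` is the difference operator and `d_star` its adjoint for the standard dot product. There are no 2-cells, so the term $\operatorname{Im}(d^*_{p+1})$ is trivial for $p = 1$; the harmonic 1-cochains are the constant ones, and every 1-cochain splits as an exact part plus a harmonic part.

```python
def d(f):
    """Coboundary: (d f)(edge i) = f(v_{i+1}) - f(v_i) on a cycle with len(f) vertices."""
    N = len(f)
    return [f[(i + 1) % N] - f[i] for i in range(N)]

def d_star(g):
    """Adjoint of d for the dot product: (d* g)(v_i) = g(edge i-1) - g(edge i)."""
    N = len(g)
    return [g[i - 1] - g[i] for i in range(N)]

def dot(u, v):
    """Standard inner product of two cochains of the same length."""
    return sum(a * b for a, b in zip(u, v))

def hodge_decompose(g):
    """Split a 1-cochain g as d(f) + h with h harmonic (constant on the edges)."""
    N = len(g)
    mean = sum(g) / N
    h = [mean] * N            # harmonic part: d*(h) = 0
    f = [0.0]
    for i in range(N - 1):    # integrate the zero-mean part along the cycle
        f.append(f[-1] + g[i] - mean)
    return f, h

if __name__ == "__main__":
    g = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
    f, h = hodge_decompose(g)
    print("exact part:", d(f))
    print("harmonic part:", h)
    print("orthogonality:", dot(d(f), h))
```

In this model the equation $d^* d\,u = \eta$ on vertices is solvable exactly when $\eta$ sums to zero, that is, when $\eta$ is orthogonal to the constant (harmonic) 0-cochains, which mirrors the solvability condition of Theorem 7.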
The version of the Hodge Theorem requires proving that ω ∈ p differentiable C ∞ cs (U ) ; neverthless, this requires using regularity theory which relies on Sobolev’s embedding theorem and the ellipticity of the Laplacian operator. Theorem 8 (Hodge) The space cs (U ) of C ∞ -differential forms admits the orthogonal decomposition p
p (U ) = Im(d p−1 ) ⊕ Im(d ∗p+1 ) ⊕ H p cs
(37) p
for all 0 ≤ p ≤ n. Moreover, the equation p ω = η admits a solution in cs (U ) if and only if η ⊥ H p (U ).
4.1 Hodge Theorem on Manifolds
The Hodge theorem extends over a closed⁴ differentiable manifold $M$. A byproduct on an orientable manifold is the finiteness of the De Rham cohomology groups. The orientability allows us to define the bilinear map $h : H^p_{DR}(M) \times H^{n-p}_{DR}(M) \to \mathbb{R}$ given by

$$h\big([\phi], [\psi]\big) = \int_M \phi \wedge \psi. \tag{38}$$

Indeed, we get the following useful theorem;

Theorem 9 (Poincaré Duality) Let $M$ be an $n$-dimensional compact orientable manifold. The bilinear pairing (38) determines isomorphisms

$$H^{n-p}_{DR}(M) \simeq \big(H^p_{DR}(M)\big)^*.$$

Proof Assume $H^p_{DR}(M) \neq 0$. Let $\omega \in \mathcal{H}^p$, $\omega \neq 0$. Since $*\Delta = \Delta *$, the $(n-p)$-form $*\omega$ is harmonic, and we have

$$h\big([\omega], [*\omega]\big) = \int_M \omega \wedge *\omega = |\omega|_2^2 \neq 0.$$

The pairing being non-singular implies the isomorphism.
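Exercises
(1) Let $\omega = f\, e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_p}$ be a $p$-form and let $\eta$ be a $q$-form. Using Eq. 27, show that
(i) $d(d\omega) = \sum_{i,j} [e_i, e_j]\, e_i \wedge e_j \wedge e_1 \wedge \cdots \wedge e_p = 0$.
(ii) $d(\omega \wedge \eta) = d\omega \wedge \eta + (-1)^p\, \omega \wedge d\eta$.
(2) Considering the contraction operator $\lrcorner : \Omega^p(U) \to \Omega^{p-1}(U)$ given by

$$v \,\lrcorner\, (v_1 \wedge \cdots \wedge v_p) = \sum_{i=1}^{p} (-1)^{i+1} \langle v, v_i \rangle\, v_1 \wedge \cdots \wedge \hat{v}_i \wedge \cdots \wedge v_p, \tag{39}$$

where $(\hat{\ })$ indicates deletion, show the following identities: let $\omega \in \Omega^p(U)$ and $\eta \in \Omega^q(U)$;
(i) $v \,\lrcorner\, (\omega \wedge \eta) = (v \,\lrcorner\, \omega) \wedge \eta + (-1)^p\, \omega \wedge (v \,\lrcorner\, \eta)$. (40)
(ii) $v \,\lrcorner\, (v \,\lrcorner\, \omega) = 0$, $\forall \omega$. (41)

⁴ Compact and $\partial M = \emptyset$.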
(3) Let $* : \Omega^p(U) \to \Omega^{n-p}(U)$ be the Hodge star-operator and $d^* = (-1)^{np+n+1} * d\, *$. Show that

$$d^* = -\sum_{j=1}^{n} e_j \,\lrcorner\, \partial_{e_j}. \tag{42}$$

(4) Let $M$ be an $n$-dimensional differentiable manifold. Explain the following:
(i) The Hodge Theorem on $M$.
(ii) The bilinear pairing (38) is well-defined.
(iii) $H^n_{DR}(M) \simeq \mathbb{R}$.
(iv) If $\omega \in \mathcal{H}^p$ and $\theta \in \mathcal{H}^q$, then not necessarily $\omega \wedge \theta \in \mathcal{H}^{p+q}$.
(5) Show that the Poincaré duality theorem implies $H^p_{DR}(M) \simeq H^{n-p}_{DR}(M)$.
5 Geometric Formulation of the Electromagnetic Theory
Electromagnetic Theory is governed by Maxwell's equations, named in honor of the Scottish physicist James Clerk Maxwell (1831–1879), who discovered and published them in the early 1860s. Maxwell's equations are partial differential equations. Electricity and Magnetism were considered distinct subjects until the Danish physicist Hans Christian Ørsted (1777–1851) discovered around 1819 that electrical currents create magnetic fields. Soon after, in 1821, André-Marie Ampère (1775–1836), a French physicist and mathematician, explored Ørsted's discovery and formulated Ampère's Law. This began the unification of Electricity and Magnetism, giving rise to Electromagnetic Theory. The history of electromagnetism is rich and very important for the development of Vector Calculus and Differential Forms.
The fundamentals of Electromagnetic Theory are described in terms of the concepts of the Electric Field $\vec{E}$ and the Magnetic Field $\vec{B}$. The notion of a "Field" was a revolutionary idea introduced by the English physicist Michael Faraday (1791–1867) in 1831; Faraday was one of the main contributors to the development of the theory and was also responsible for important applications of Electromagnetic Theory, such as the concept of the electric motor. Carl Friedrich Gauss (1777–1855), Jean-Baptiste Biot (1774–1862) and Félix Savart (1791–1841) also contributed to laying the fundamentals of the electromagnetic theory as we know it today. From the physics point of view, Electromagnetism is governed by four laws (described below using the Gaussian system of units);
(1) Gauss' Law (electric charge). The flow of the electric field through a closed surface is proportional to the charge contained within the surface. Let $S \subset \mathbb{R}^3$ be a surface such that $S = \partial\Omega$, where $\Omega$ is a region in which the Stokes Theorem is valid. According to Gauss' Law, we have
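$$\oint_S \langle \vec{E}, \vec{n} \rangle\, dS = 4\pi Q, \tag{43}$$

where $Q$ is the total charge contained in $\Omega$ (in the Gaussian system the electric constant $\epsilon_0$ does not appear explicitly). If the charge density inside the region $\Omega$ is $\rho$, then

$$\int_\Omega \mathrm{div}(\vec{E})\, dV = 4\pi \int_\Omega \rho\, dV.$$

Considering that the law applies to any closed surface, we have

$$\mathrm{div}(\vec{E}) = 4\pi\rho \quad \text{(1st Maxwell's eq.)}. \tag{44}$$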
(2) Gauss’ Law (magnetic charge). Since a magnetic monopole has never been observed in nature, what we observe is that the flux of the field B through a closed surface S is zero. This Law is motivated by the previous one. Consequently,
$$\mathrm{div}(\vec{B}) = 0 \quad \text{(2nd Maxwell's eq.)}. \tag{45}$$
(3) Faraday's Law. The induced electromotive force in any closed circuit is equal to the negative of the time rate of change of the magnetic flux enclosed by the circuit. The electromotive force is the line integral of the electric field along a closed curve. Let $S \subset \mathbb{R}^3$ be a surface such that $\partial S = \gamma$ and the Stokes Theorem is valid. The Law states that the electric current induced in $\gamma$ by a magnetic field $\vec{B}$ is proportional to the variation per unit of time of the flow $\Phi_B$ of the magnetic field through the surface $S$. That is,

$$\oint_\gamma \vec{E} = -\frac{1}{c}\frac{d\Phi_B}{dt} = -\frac{1}{c}\int_S \frac{\partial \vec{B}}{\partial t}\, dS. \tag{46}$$

Applying the Stokes Theorem, we obtain

$$\nabla \times \vec{E} = -\frac{1}{c}\frac{\partial \vec{B}}{\partial t} \quad \text{(3rd Maxwell's eq.)}. \tag{47}$$
(4) Ampère's Law. The integral around a closed path of the component of the magnetic field tangent to the direction of the path equals μ0 times the current through the area within the path. To apply Ampère's Law, all currents have to be steady (i.e., they do not change with time). The currents have to be taken with their algebraic signs (those going "out" of the surface are positive, those going "in" are negative). Let $S$ be a surface with a boundary that is a closed curve $\gamma = \partial S$ on which the Stokes theorem is valid. Then

$$\oint_\gamma \vec{B}\, d\gamma = \frac{1}{c}\, 4\pi I. \tag{48}$$

Setting $\vec{J}$ as a current density in $S$, we get

$$\nabla \times \vec{B} = \frac{1}{c}\, 4\pi \vec{J}.$$

Maxwell found that there was a need to add the term $\frac{\partial \vec{E}}{\partial t}$ to Ampère's law, which became known as Ampère-Maxwell's Law as follows:

$$\oint_\gamma \vec{B}\, d\gamma = \frac{1}{c}\left( \int_S 4\pi \vec{J}\, dS + \int_S \frac{\partial \vec{E}}{\partial t}\, dS \right). \tag{49}$$

Therefore

$$\nabla \times \vec{B} = \frac{1}{c}\left( 4\pi \vec{J} + \frac{\partial \vec{E}}{\partial t} \right) \quad \text{(4th Maxwell's eq.)}. \tag{50}$$

Putting all the equations together, we get Maxwell's equations

$$\begin{aligned} &(1)\ \nabla . \vec{B} = 0, & &(3)\ \nabla . \vec{E} = 4\pi\rho, \\ &(2)\ \nabla \times \vec{E} = -\frac{1}{c}\frac{\partial \vec{B}}{\partial t}, & &(4)\ \nabla \times \vec{B} = \frac{1}{c}\left( 4\pi \vec{J} + \frac{\partial \vec{E}}{\partial t} \right). \end{aligned} \tag{51}$$
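As a sanity check of the system (51) in the source-free case ($\rho = 0$, $\vec{J} = 0$), the following minimal Python sketch tests a linearly polarised plane wave against all four equations by central finite differences. The wave itself, the values of `C` and `K`, and all function names are illustrative assumptions, not part of the text.

```python
import math

C = 1.0          # speed of light in arbitrary units (assumed for the demo)
K = 2.0          # wave number (hypothetical value)
OMEGA = C * K    # dispersion relation of a vacuum plane wave

def E(x, y, z, t):
    """Electric field of a wave travelling along x, polarised along y."""
    return (0.0, math.cos(K * x - OMEGA * t), 0.0)

def B(x, y, z, t):
    """Magnetic field of the same wave, polarised along z."""
    return (0.0, 0.0, math.cos(K * x - OMEGA * t))

def partial(field, comp, var, p, h=1e-5):
    """Central difference of component `comp` w.r.t. coordinate `var` (0..3 = x, y, z, t)."""
    lo, hi = list(p), list(p)
    lo[var] -= h
    hi[var] += h
    return (field(*hi)[comp] - field(*lo)[comp]) / (2 * h)

def curl(field, p):
    return (
        partial(field, 2, 1, p) - partial(field, 1, 2, p),
        partial(field, 0, 2, p) - partial(field, 2, 0, p),
        partial(field, 1, 0, p) - partial(field, 0, 1, p),
    )

def div(field, p):
    return sum(partial(field, i, i, p) for i in range(3))

def dt(field, p):
    return tuple(partial(field, i, 3, p) for i in range(3))

def maxwell_residuals(p):
    """Absolute residuals of equations (1)-(4) in (51) with rho = 0 and J = 0."""
    curl_e, curl_b = curl(E, p), curl(B, p)
    dte, dtb = dt(E, p), dt(B, p)
    faraday = max(abs(curl_e[i] + dtb[i] / C) for i in range(3))
    ampere = max(abs(curl_b[i] - dte[i] / C) for i in range(3))
    return abs(div(B, p)), faraday, abs(div(E, p)), ampere
```

Calling `maxwell_residuals((x, y, z, t))` at any point should return four numbers close to zero, up to the finite-difference error.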
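5.1 Electromagnetic Potentials
Since we are considering the equations defined in $\mathbb{R}^3$, and noting that De Rham cohomology is trivial there, we consider a region $\Omega \subset \mathbb{R}^3$, diffeomorphic to $\mathbb{R}^3$, where the Poincaré Lemma and the Stokes Theorem are both valid. It follows from the equation $\nabla . \vec{B} = 0$ that there is a vector field $\vec{A}_B = A_1 \vec{i} + A_2 \vec{j} + A_3 \vec{k}$ such that $\vec{B} = \nabla \times \vec{A}_B$. Consequently, Eq. (2) in (51) becomes

$$\nabla \times \left( \vec{E} + \frac{1}{c}\frac{\partial \vec{A}_B}{\partial t} \right) = 0.$$

Applying the Poincaré Lemma, there is a function $\phi$ such that $\vec{E} + \frac{1}{c}\frac{\partial \vec{A}_B}{\partial t} = -\nabla\phi$. Consequently

$$\vec{E} = -\nabla\phi - \frac{1}{c}\frac{\partial \vec{A}_B}{\partial t}.$$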
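The vector $\vec{A}_B$ is the magnetic potential, the scalar $\phi$ is the electric potential, and the pair $(\vec{A}_B, \phi)$ is the electromagnetic potential. Introducing these relations into Eq. (4) in (51), we get

$$\nabla \times \vec{B} = \nabla \times (\nabla \times \vec{A}_B) = \nabla(\nabla . \vec{A}_B) - \Delta \vec{A}_B = \frac{4\pi}{c}\vec{J} - \frac{1}{c}\nabla\frac{\partial\phi}{\partial t} - \frac{1}{c^2}\frac{\partial^2 \vec{A}_B}{\partial t^2}.$$

Therefore

$$\nabla\left( \nabla . \vec{A}_B + \frac{1}{c}\frac{\partial\phi}{\partial t} \right) = \frac{4\pi}{c}\vec{J} + \Delta \vec{A}_B - \frac{1}{c^2}\frac{\partial^2 \vec{A}_B}{\partial t^2}.$$

Definition 5 The D'Alembertian operator $\Box : C^2(\Omega) \to C^0(\Omega)$ is

$$\Box = -\Delta + \frac{1}{c^2}\frac{\partial^2}{\partial t^2}. \tag{52}$$

Therefore

$$\Box \vec{A}_B = \frac{4\pi}{c}\vec{J} - \nabla\left( \nabla . \vec{A}_B + \frac{1}{c}\frac{\partial\phi}{\partial t} \right). \tag{53}$$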
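Let $f = f(x, t) : \mathbb{R}^3 \times \mathbb{R} \to \mathbb{R}$ be a $C^1$-function. Given $\vec{B} = \nabla \times \vec{A}_B$ and $\vec{E} = -\nabla\phi - \frac{1}{c}\frac{\partial \vec{A}_B}{\partial t}$, we note that the potentials

$$\vec{A}'_B = \vec{A}_B + \nabla f \quad \text{and} \quad \phi' = \phi - \frac{1}{c}\frac{\partial f}{\partial t}$$

satisfy the same equations. This invariance is called gauge invariance. As a result we have some freedom to choose the electromagnetic potential $(\vec{A}_B, \phi)$. We select the Lorenz gauge, which will be further explained in the next section, and we have

$$\nabla . \vec{A}_B + \frac{1}{c}\frac{\partial\phi}{\partial t} = 0. \tag{54}$$

The equations for the potentials are now

$$\Box \vec{A}_B = \frac{4\pi}{c}\vec{J}, \qquad \Box\phi = 4\pi\rho. \tag{55}$$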
5.2 Geometric Formulation Let’s consider Maxwell’s equation in the absence of the sources of magnetic ( J = 0) and electric fields (ρ = 0). We will not worry about the unit systems or the physical constants. In classical electromagnetism, ruled by Maxwell’s equations, we consider R4 = R3 × R (Space×Time) provided with the Lorentz pseudo-product (Hendrik Lorentz, 1853–1928),
$$L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.$$
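We will use the apparatus of differential forms to obtain Maxwell's equations. We start by considering the 1-form

$$A = A_1(p, t)\,dx + A_2(p, t)\,dy + A_3(p, t)\,dz - \phi(p, t)\,dt,$$

where $p = (x, y, z) \in \mathbb{R}^3$ is the spatial variable and $t$ is the time. Write $\vec{A}_B = (A_1, A_2, A_3)$ and consider the functions

$$B_x = \frac{\partial A_3}{\partial y} - \frac{\partial A_2}{\partial z}, \quad B_y = \frac{\partial A_1}{\partial z} - \frac{\partial A_3}{\partial x}, \quad B_z = \frac{\partial A_2}{\partial x} - \frac{\partial A_1}{\partial y},$$

$$E_x = -\frac{\partial\phi}{\partial x} - \frac{\partial A_1}{\partial t}, \quad E_y = -\frac{\partial\phi}{\partial y} - \frac{\partial A_2}{\partial t}, \quad E_z = -\frac{\partial\phi}{\partial z} - \frac{\partial A_3}{\partial t}.$$

Associated to the magnetic field $\vec{B} = \nabla \times \vec{A}_B = (B_x, B_y, B_z)$, we have the differential 2-form

$$B = B_z\, dx \wedge dy + B_x\, dy \wedge dz + B_y\, dz \wedge dx,$$

and associated to the electric field $\vec{E} = (E_x, E_y, E_z) = -\nabla\phi - \frac{\partial \vec{A}_B}{\partial t}$ (the time-derivative term is the one found in Sect. 5.1, with the constant $c$ ignored; it is needed for Eq. (56) to hold), we have the differential 1-form

$$E = E_x\, dx + E_y\, dy + E_z\, dz.$$

Definition 6 The electromagnetic tensor associated to the electromagnetic potential $A$ is the differential 2-form $F_A = dA$.

Now we have

$$F_A = B + E \wedge dt. \tag{56}$$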
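Since $d^2 = 0$, we get

$$0 = dF_A = dB + dE \wedge dt = \mathrm{div}(\vec{B})\, dx \wedge dy \wedge dz + \frac{\partial B}{\partial t} \wedge dt + \mathrm{curl}(\vec{E}) \wedge dt = \mathrm{div}(\vec{B})\, dx \wedge dy \wedge dz + \left( \frac{\partial B}{\partial t} + \mathrm{curl}(\vec{E}) \right) \wedge dt,$$

where $\frac{\partial B}{\partial t}$ and $\mathrm{curl}(\vec{E})$ are regarded as 2-forms in the basis $dx_i \wedge dx_j$, $i < j$. Therefore we have derived Maxwell's equations (1) and (2) in (51):

$$(1)\ \mathrm{div}(\vec{B}) = 0, \qquad (2)\ \mathrm{curl}(\vec{E}) + \frac{\partial \vec{B}}{\partial t} = 0.$$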
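For the spatial part, the identity $d^2 = 0$ used above reduces to $\mathrm{div}(\mathrm{curl}\, \vec{A}) = 0$. The short Python sketch below checks this numerically with central differences; the vector potential `A` is a hypothetical smooth field chosen only for illustration, and the helper names are assumptions as well.

```python
import math

def A(x, y, z):
    """A hypothetical smooth vector potential, chosen only for illustration."""
    return (math.sin(y * z), x * x * z, math.cos(x * y))

def partial(field, comp, var, p, h=1e-3):
    """Central difference of component `comp` of `field` w.r.t. coordinate `var`."""
    lo, hi = list(p), list(p)
    lo[var] -= h
    hi[var] += h
    return (field(*hi)[comp] - field(*lo)[comp]) / (2 * h)

def curl_of(field):
    """Return a new vector field approximating curl(field)."""
    def result(x, y, z):
        p = (x, y, z)
        return (
            partial(field, 2, 1, p) - partial(field, 1, 2, p),
            partial(field, 0, 2, p) - partial(field, 2, 0, p),
            partial(field, 1, 0, p) - partial(field, 0, 1, p),
        )
    return result

def div(field, p):
    return sum(partial(field, i, i, p) for i in range(3))

B = curl_of(A)   # discrete analogue of B = curl(A)
```

Because central differences in different coordinates commute, `div(B, p)` vanishes up to round-off, mirroring the way $d(dA) = 0$ holds identically.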
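5.3 Variational Formulation
To obtain the other pair of equations, a variational formulation is required since at some point nature optimizes its costs. The admissible space for the variational problem is the Banach space of 1-forms $\Omega^1(\Omega)$ provided with the standard $L^{1,2}$ Sobolev norm $\| \cdot \|_{L^{1,2}}$, that is,

$$\mathcal{A} = \left\{ A \in \Omega^1(\Omega);\ \| A \|_{L^{1,2}} < \infty \right\}.$$

Definition 7 The energy functional $E : \mathcal{A} \to \mathbb{R}$ of the electromagnetic field is

$$E(A) = \frac{1}{2}\int_\Omega \| F_A \|^2\, dV. \tag{57}$$

To derive the Euler-Lagrange equations, let $\omega \in \Omega^1(\Omega)$ and $F_{A+t\omega} = F_A + t\, d\omega$, so

$$dE_A . \omega = \lim_{t \to 0} \frac{E(A + t\omega) - E(A)}{t} = \frac{1}{2}\lim_{t \to 0} \frac{1}{t}\Big( \langle F_{A+t\omega}, F_{A+t\omega} \rangle - \langle F_A, F_A \rangle \Big) = \frac{1}{2}\Big( \langle F_A, d\omega \rangle + \langle d\omega, F_A \rangle \Big) = \langle d^* F_A, \omega \rangle.$$

The Fréchet derivative of the energy functional at $A \in \mathcal{A}$ is

$$dE_A . \omega = \int_\Omega \langle d^* F_A, \omega \rangle\, dV.$$

Consequently the critical points satisfy the EL-equation

$$d^* F_A = 0. \tag{58}$$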
Since $d^* = *d*$ (Eq. 38), the EL-equation is equivalent to $d(*F_A) = 0$. The Hodge star-operator induced by the Lorentz form $L$ satisfies $*^2 = (-1)^{1 + k(n-k)}$ on $k$-forms, so $* : \Omega^2(\mathbb{R}^4) \to \Omega^2(\mathbb{R}^4)$ satisfies $*^2 = -1$. In this case, the eigenvalues are $\pm i$ and the respective eigenspaces are $\Omega^2_+(\mathbb{R}^4)$ (self-dual) and $\Omega^2_-(\mathbb{R}^4)$ (anti-self-dual). The space $\Omega^2(\mathbb{R}^4)$ decomposes as

$$\Omega^2(\mathbb{R}^4) = \Omega^2_+(\mathbb{R}^4) \oplus \Omega^2_-(\mathbb{R}^4). \tag{59}$$
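The sign of $*^2$ for the Lorentz form can be checked on basis forms. The sketch below is a minimal Python illustration under an assumed convention: coordinates ordered $(x, y, z, t)$, orientation $dx \wedge dy \wedge dz \wedge dt$, and $*$ normalised by $\alpha \wedge *\alpha = \langle \alpha, \alpha \rangle\, dx \wedge dy \wedge dz \wedge dt$. The function names are hypothetical.

```python
from itertools import combinations

ETA = (1, 1, 1, -1)   # diagonal of the Lorentz form L, coordinates (x, y, z, t)
N = len(ETA)

def perm_sign(seq):
    """Sign of the permutation `seq` of 0..len(seq)-1, by counting inversions."""
    inversions = sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )
    return -1 if inversions % 2 else 1

def star_basis(index):
    """Hodge star of the basis form dx^I, with I an increasing tuple of indices.

    Returns (coefficient, J), meaning *dx^I = coefficient * dx^J,
    where J is the increasing complement of I.
    """
    comp = tuple(i for i in range(N) if i not in index)
    coeff = perm_sign(index + comp)
    for i in index:
        coeff *= ETA[i]
    return coeff, comp

def star_squared_sign(index):
    """Sign s such that *(*dx^I) = s * dx^I."""
    c1, comp = star_basis(index)
    c2, back = star_basis(comp)
    assert back == index
    return c1 * c2
```

With this convention `star_basis((1, 2))` returns `(1, (0, 3))`, i.e. $*(dy \wedge dz) = dx \wedge dt$, and `star_squared_sign` gives $-1$ on every basis 2-form.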
The electromagnetic field $F_A$ decomposes as $F_A = F_A^+ + F_A^-$, where $*F_A^\pm = \pm i F_A^\pm$. Let's analyze the case when the solutions to Eq. (58) are self-dual or anti-self-dual;
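(1) $A$ is self-dual: $*F_A = i F_A$. In this case, we get the equation $dF_A = 0$ which is equivalent to Maxwell's 1st and 2nd equations.
(2) $A$ is anti-self-dual: $*F_A = -i F_A$. Analogous to the self-dual case.
(3) $d(*F_A) = 0$. Let's find each component of $d(*F_A) = d(*B) + d\big(*(E \wedge dt)\big)$. Since $*B = B_x\, dx \wedge dt + B_y\, dy \wedge dt + B_z\, dz \wedge dt$, the contribution due to the 2-form $B$ is

$$d(*B) = \left[ \left( \frac{\partial B_y}{\partial x} - \frac{\partial B_x}{\partial y} \right) dx \wedge dy + \left( \frac{\partial B_z}{\partial y} - \frac{\partial B_y}{\partial z} \right) dy \wedge dz + \left( \frac{\partial B_x}{\partial z} - \frac{\partial B_z}{\partial x} \right) dz \wedge dx \right] \wedge dt.$$

We set $d(*B) = (\nabla \times \vec{B}) \wedge dt$. The contribution from the Electric Field component is

$$d\big(*(E \wedge dt)\big) = \mathrm{div}(\vec{E})\, dx \wedge dy \wedge dz + \left( \frac{\partial E_x}{\partial t}\, dy \wedge dz + \frac{\partial E_y}{\partial t}\, dz \wedge dx + \frac{\partial E_z}{\partial t}\, dx \wedge dy \right) \wedge dt.$$

By setting $d\big(*(E \wedge dt)\big) = \mathrm{div}(\vec{E})\, dx \wedge dy \wedge dz + \frac{\partial E}{\partial t} \wedge dt$, the equation $d * F_A = 0$ becomes

$$\mathrm{div}(\vec{E})\, dx \wedge dy \wedge dz + \left( \mathrm{curl}(\vec{B}) + \frac{\partial E}{\partial t} \right) \wedge dt = 0.$$

Hence

$$\mathrm{div}(\vec{E}) = 0, \qquad \mathrm{curl}(\vec{B}) + \frac{\partial \vec{E}}{\partial t} = 0.$$

In this case, in the absence of electromagnetic sources, Maxwell's equations (51) are written using the formalism of Differential Forms as

$$dF_A = 0, \qquad d^* F_A = 0. \tag{60}$$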
If $F_A$ is a solution to Maxwell's equation above, then $F_A$ is a harmonic 2-form since $\Delta F_A = (dd^* + d^*d)(F_A)$. As we have mentioned before, electromagnetic theory has a gauge invariance. Given a differentiable function $f$, the 1-form $A' = A + df$ satisfies $F_{A'} = F_A$. Indeed, $B' = B$ and $E' = E$. The component $df$ belongs to the image of $d : \Omega^0 \to \Omega^1$. Since $\Omega^1(\Omega)$ decomposes (orthogonally) as

$$\Omega^1(\Omega) = \mathrm{Im}\big( d : \Omega^0(\Omega) \to \Omega^1(\Omega) \big) \oplus \mathrm{Ker}\big( d^* : \Omega^1(\Omega) \to \Omega^0(\Omega) \big),$$

we consider a representative in $\mathrm{Ker}(d^*)$ of the gauge class of a 1-form $A$. Given a 1-form $A = A_1(x, t)\,dx + A_2(x, t)\,dy + A_3(x, t)\,dz - \phi(x, t)\,dt$, we would like to find a 1-form $A'$ gauge equivalent to $A$, i.e., to find a 0-form $f$ such that

$$A' = A + df, \qquad d^* A' = 0.$$

This is equivalent to solving the equation $\Delta f = -d^* A$. Thanks to the Hodge Theorem there is such an $f$. Since $d^*(A) = \mathrm{div}(A)$, where the divergence operator is written with respect to the Lorentz form, we get

$$d^* A = 0 \iff \mathrm{div}(A) = \nabla . \vec{A}_B + \frac{\partial\phi}{\partial t} = 0. \tag{61}$$
The Lorenz gauge is defined by the element $A$ satisfying the equation $d^* A = 0$ among all the 1-forms gauge equivalent to $A$. This explains the choice taken in Eq. (54). Gauge invariance is a fundamental principle in the physical models of the Elementary Interactions of nature. Electromagnetism is the oldest example of a Gauge Theory, and in Mathematics it corresponds to the Differential Geometry of Vector Fiber Bundles. In Geometry, one of the most fundamental concepts is that of Curvature, which measures how much the space deviates from a plane. The electromagnetic tensor is an example of a Curvature tensor, and the electromagnetic potential $A$ is the connection 1-form whose curvature is the 2-form $F_A = dA$. After reading this section, we invite the reader to make a comparison with Yang-Mills Theory, which was briefly introduced in Chap. 3, Example (4), when we introduced the Moduli Spaces of Instantons.
Exercises
(1) Show that the matrix representation of the Electromagnetic Field is given in terms of the fields $\vec{B} = (B_x, B_y, B_z)$ and $\vec{E} = (E_x, E_y, E_z)$, by

$$F_A = \begin{pmatrix} 0 & B_z & -B_y & E_x \\ -B_z & 0 & B_x & E_y \\ B_y & -B_x & 0 & E_z \\ -E_x & -E_y & -E_z & 0 \end{pmatrix}.$$

(2) Show that the Laplacian $\Delta_L$ with respect to the Lorentz form $L$ is the D'Alembertian $-\Box$, i.e., $\Delta_L f = -\Box f$.
(3) Show that the electric field and the magnetic field satisfy the equations $\Box \vec{E} = 0$, $\Box \vec{B} = 0$.
(4) Show that the electric potential and the magnetic potential satisfy the equations $\Box \vec{A}_B = 0$, $\Box\phi = 0$.
(5) Show that Maxwell's equations in the presence of sources are

$$dF_A = 0, \qquad d^* F_A = 4\pi J,$$

where $J$ is a 1-form depending on the charge density $\rho$ and the current density $\vec{J}$.
(6) The Biot-Savart law, named after the French physicists Jean-Baptiste Biot and Félix Savart, specifies the magnetic field generated by a steady distributed current $\vec{j}$ in a region $\Omega \subset \mathbb{R}^3$ as

$$\vec{B}(r) = \frac{1}{4\pi}\int_\Omega \frac{\vec{j}(r') \times (\vec{r} - \vec{r}\,')}{|r - r'|^3}\, dV'. \tag{62}$$

To show the formula above, follow these steps;
(a) Use the 2nd Green identity and the harmonic function $\nu(r) = \frac{1}{r}$ in $\mathbb{R}^3 \setminus \{0\}$ to prove that the solution to the equation

$$\Delta \vec{A}_B = 4\pi \vec{j}$$

is $\vec{A}(r) = \frac{1}{4\pi}\int_\Omega \frac{\vec{j}}{|r - r'|}\, dV'$.
(b) Use the identity $\vec{B} = \nabla \times \vec{A}_B$.
(c) Using the same arguments, show that

$$\vec{E}(r) = \frac{1}{4\pi}\int_\Omega \rho(r')\, \frac{\vec{r} - \vec{r}\,'}{|r - r'|^3}\, dV'.$$

(7) Consider a current being carried along a straight wire directed along the $z$-axis. Show the magnetic field generated by the current is

$$\vec{B} = \frac{I}{2\pi}\big( -y\, \vec{i} + x\, \vec{j} \big).$$

Show that the vector potential is

$$\vec{A}_B = -\frac{I}{2\pi}\ln(x^2 + y^2)\, \vec{k}.$$
6 Helmholtz's Decomposition Theorem
In this section, we set aside the analytic formalism and techniques required for full rigor; we assume, as is customary in Advanced Calculus, that our arguments are acceptable, as they are for physicists. The Helmholtz theorem is an interesting application, and very useful in electromagnetism. It certainly precedes the Hodge Theorem in time. Let $\vec{F}$ be a vector field and let $\Omega \subset \mathbb{R}^3$ be a domain with $S = \partial\Omega$ in which the Stokes Theorem and Poincaré's Lemma are both valid. Below, $r'$ denotes the integration variable and $\nabla'$ the derivative with respect to $r'$.

Theorem 10 (Helmholtz) Let $\vec{F} \in C^2(\Omega; \mathbb{R}^3)$ be a vector field. Then there are $\phi \in C^1(\Omega, \mathbb{R})$ and $\vec{A} \in C^1(\Omega; \mathbb{R}^3)$ such that $\vec{F} = -\nabla(\phi) + \nabla \times \vec{A}$, where

$$\begin{aligned} \phi(r) &= \frac{1}{4\pi}\int_\Omega \frac{\nabla' . \vec{F}(r')}{|r - r'|}\, dV' - \frac{1}{4\pi}\oint_S \frac{\vec{F}(r') . \hat{n}}{|r - r'|}\, dS', \\ \vec{A}(r) &= \frac{1}{4\pi}\int_\Omega \frac{\nabla' \times \vec{F}(r')}{|r - r'|}\, dV' - \frac{1}{4\pi}\oint_S \frac{\hat{n} \times \vec{F}(r')}{|r - r'|}\, dS'. \end{aligned} \tag{63}$$

That is, $\vec{F}$ is the sum of one component with null curl and another component with null divergence. The Helmholtz theorem can be understood using differential forms. Let $\vec{F} = (F_x, F_y, F_z)$ and $F = F_x\, dx + F_y\, dy + F_z\, dz$. In light of Functional Analysis, given the operators $d : \Omega^0(\mathbb{R}^3) \to \Omega^1(\mathbb{R}^3)$ and $d^* : \Omega^1(\mathbb{R}^3) \to \Omega^0(\mathbb{R}^3)$, the space $\Omega^1(\mathbb{R}^3)$ admits the orthogonal decomposition

$$\Omega^1(\mathbb{R}^3) = \mathrm{Im}(d) \oplus \mathrm{Ker}(d^*).$$
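As seen in Chap. 6, we have $df = \nabla f$ and $d^* F = \mathrm{div}(\vec{F})$. Theorem 10 is a consequence of the Hodge Theorem. To prove Helmholtz's theorem using simple calculus tools, we will use Dirac's δ-function, which is quite familiar to physicists and unfamiliar to mathematicians.⁵ We will leave some exercises to help the reader become familiar with Dirac's δ-function.

⁵ It is an example in the context of Distribution Theory.

Proof Since $\Delta\left( \frac{1}{|r - r'|} \right) = -4\pi\, \delta^3(r - r')$, we have

$$\vec{F}(r) = \int_\Omega \vec{F}(r')\, \delta^3(r - r')\, dV' = -\frac{1}{4\pi}\int_\Omega \vec{F}(r')\, \Delta\left( \frac{1}{|r - r'|} \right) dV' = -\frac{1}{4\pi}\, \Delta \int_\Omega \frac{\vec{F}(r')}{|r - r'|}\, dV'.$$

Applying the identities
(i1) $\Delta(\vec{v}) = \nabla(\nabla . \vec{v}) - \nabla \times (\nabla \times \vec{v})$,
(i2) $\nabla \times (f \vec{v}) = -\vec{v} \times \nabla f + f\, \nabla \times \vec{v}$,
(i3) $\nabla . (f \vec{v}) = \langle \nabla f, \vec{v} \rangle + f\, \nabla . \vec{v}$,
we get

$$\begin{aligned} \vec{F}(r) &\overset{(i1)}{=} -\frac{1}{4\pi}\left[ \nabla\left( \nabla . \int_\Omega \frac{\vec{F}(r')}{|r - r'|}\, dV' \right) - \nabla \times \left( \nabla \times \int_\Omega \frac{\vec{F}(r')}{|r - r'|}\, dV' \right) \right] \\ &\overset{(i2)+(i3)}{=} -\frac{1}{4\pi}\nabla\left[ \int_\Omega \Big\langle \vec{F}(r'), \nabla\frac{1}{|r - r'|} \Big\rangle\, dV' \right] - \frac{1}{4\pi}\nabla \times \left[ \int_\Omega \vec{F}(r') \times \nabla\frac{1}{|r - r'|}\, dV' \right]. \end{aligned}$$

Noting that $\nabla\left( \frac{1}{|r - r'|} \right) = -\nabla'\left( \frac{1}{|r - r'|} \right)$, then

$$\vec{F}(r) = \frac{1}{4\pi}\nabla\left[ \int_\Omega \Big\langle \vec{F}(r'), \nabla'\frac{1}{|r - r'|} \Big\rangle\, dV' \right] + \frac{1}{4\pi}\nabla \times \left[ \int_\Omega \vec{F}(r') \times \nabla'\frac{1}{|r - r'|}\, dV' \right].$$

By resorting to identities (i2) and (i3), now applied in the variable $r'$, we have the equation

$$\begin{aligned} \vec{F}(r) = &-\nabla\left[ \frac{1}{4\pi}\int_\Omega \frac{\nabla' . \vec{F}(r')}{|r - r'|}\, dV' - \frac{1}{4\pi}\int_\Omega \nabla' . \left( \frac{\vec{F}(r')}{|r - r'|} \right) dV' \right] \\ &+ \nabla \times \left[ \frac{1}{4\pi}\int_\Omega \frac{\nabla' \times \vec{F}(r')}{|r - r'|}\, dV' - \frac{1}{4\pi}\int_\Omega \nabla' \times \left( \frac{\vec{F}(r')}{|r - r'|} \right) dV' \right]. \end{aligned}$$

Applying the Stokes Theorem, we obtain

$$\begin{aligned} \vec{F}(r) = &-\nabla\left[ \frac{1}{4\pi}\int_\Omega \frac{\nabla' . \vec{F}(r')}{|r - r'|}\, dV' - \frac{1}{4\pi}\oint_S \frac{\langle \vec{F}(r'), \hat{n} \rangle}{|r - r'|}\, dS' \right] \\ &+ \nabla \times \left[ \frac{1}{4\pi}\int_\Omega \frac{\nabla' \times \vec{F}(r')}{|r - r'|}\, dV' - \frac{1}{4\pi}\oint_S \frac{\hat{n} \times \vec{F}(r')}{|r - r'|}\, dS' \right], \end{aligned}$$

where $\hat{n}$ is the unit normal vector to $S = \partial\Omega$. Considering $\Omega = \mathbb{R}^3$ and assuming $\lim_{r \to \infty} \frac{\vec{F}(r)}{|r - r'|^2} = 0$, the decomposition is simplified to

$$\vec{F}(r) = -\nabla\left[ \frac{1}{4\pi}\int_{\mathbb{R}^3} \frac{\nabla' . \vec{F}(r')}{|r - r'|}\, dV' \right] + \nabla \times \left[ \frac{1}{4\pi}\int_{\mathbb{R}^3} \frac{\nabla' \times \vec{F}(r')}{|r - r'|}\, dV' \right].$$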
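Exercises
(1) Extend the Helmholtz Theorem to a region $\Omega \subset \mathbb{R}^n$.
(2) (Dirac's δ-function) Consider the sequence of functions $\{f_k\}_{k \in \mathbb{N}}$, where $f_k : \mathbb{R}^n \to \mathbb{R}$ is given by

$$f_k(x) = \left( \frac{k}{\pi} \right)^{n/2} e^{-k|x|^2}$$

(for $n = 1$ this is $\sqrt{k/\pi}\; e^{-k|x|^2}$; the exponent $n/2$ is what makes the integral over $\mathbb{R}^n$ equal to 1).
(a) Show $\sup_{x \in \mathbb{R}^n} |f_k(x)| = \left( \frac{k}{\pi} \right)^{n/2}$ and get to the conclusion

$$\lim_{k \to \infty} f_k(x) = \begin{cases} 0, & \text{if } x \neq 0, \\ \infty, & \text{if } x = 0. \end{cases}$$

(b) Fix $x_0 \in \mathbb{R}^n$ and consider $\delta^n(x - x_0) = \lim_{k \to \infty} f_k(x - x_0)$. Show that

$$\int_{\mathbb{R}^n} \delta^n(x)\, dx = 1, \qquad \int_{\mathbb{R}^n} f(x)\, \delta^n(x - x_0)\, dx = f(x_0).$$

We denote $\delta(x) = \delta^1(x)$.
(c) In $\mathbb{R}^n$, $n \ge 3$, prove the identity

$$\Delta\left( \frac{1}{|x - x_0|^{n-2}} \right) = -(n-2)\, A_{n-1}\, \delta^n(x - x_0),$$

where $A_{n-1}$ is the volume of the sphere $S^{n-1}$ (for $n = 3$ this is the identity $\Delta\frac{1}{|x - x_0|} = -4\pi\, \delta^3(x - x_0)$ used in the proof above).
(d) Fix a point $a = (a_1, \dots, a_n)$ in $\mathbb{R}^n$ and let $x = (x_1, \dots, x_n)$ be an arbitrary point. Show $\delta^n(x - a) = \delta(x_1 - a_1) \cdots \delta(x_n - a_n)$.
(e) Prove the following identities: assume $n = 1$ and $a, \lambda \in \mathbb{R}$, $\lambda \neq 0$, $a \neq 0$.
(i) $\delta(-x) = \delta(x)$.
(ii) $\delta(\lambda x) = \frac{\delta(x)}{|\lambda|}$.
(iii) $\delta(\lambda x - a) = \frac{\delta\left( x - \frac{a}{\lambda} \right)}{|\lambda|}$.
(iv) $\delta(x^2 - a^2) = \frac{\delta(x - a) + \delta(x + a)}{2|a|}$.
(v) Let $\{x_n \mid f(x_n) = 0\}$ be the set of zeros of $f : \mathbb{R} \to \mathbb{R}$. Assuming $f'(x_n) \neq 0$, then

$$\delta\big( f(x) \big) = \sum_n \frac{\delta(x - x_n)}{|f'(x_n)|}.$$

(3) Let $\nu(r) = \frac{1}{r}$. Show that

$$\Delta\nu = -4\pi\, \delta(r) = \begin{cases} 0, & \text{if } r \neq 0, \\ \infty, & \text{if } r = 0. \end{cases}$$

Extend the result to the case in which $\nu(r) = \frac{1}{r^{n-2}}$ is a harmonic function in $\mathbb{R}^n \setminus \{0\}$.

Remark Due to its value at $x = 0$, Dirac's δ-function is not a function; it is a distribution. Distributions are generalizations of functions; they are naturally treated in the theory of distributions or in the context of Sobolev spaces.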
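The limiting behaviour in Exercise (2)(b) can be explored numerically for $n = 1$. The following Python sketch is only an illustration: it uses the Gaussian $f_k(x) = \sqrt{k/\pi}\, e^{-kx^2}$, a plain trapezoidal rule, and an integration window `half_width` that is an assumption standing in for the whole real line.

```python
import math

def f_k(x, k):
    """Gaussian approximation of the delta function on the real line (n = 1)."""
    return math.sqrt(k / math.pi) * math.exp(-k * x * x)

def integrate(g, a, b, steps=200_000):
    """Composite trapezoidal rule for g on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for i in range(1, steps):
        total += g(a + i * h)
    return total * h

def smeared_value(f, x0, k, half_width=10.0):
    """Approximate the integral of f(x) * f_k(x - x0) over the real line.

    The Gaussian is negligible outside [x0 - half_width, x0 + half_width]
    for the values of k used here, so the window replaces the full line.
    """
    return integrate(lambda x: f(x) * f_k(x - x0, k),
                     x0 - half_width, x0 + half_width)
```

For large `k`, `integrate(lambda x: f_k(x, k), -10.0, 10.0)` stays close to 1, while `smeared_value(math.cos, x0, k)` approaches `math.cos(x0)`, which is the sifting property in (b).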
Appendix A
Basics of Analysis
Appendix A contains basic definitions and theorems to support the reading of Chaps. 1–7. Appendix A should be used only as a reference source along with the References.
1 Sets
Let $C$ be a set, and consider $C^k = C \times \cdots \times C$ the Cartesian product of $k$ copies of $C$. Consider $C_n = \{1, 2, 3, \dots, n\}$ and $C_n^k = C_n \times \overset{k}{\cdots} \times C_n$. $C_n^k$ is a multi-index set of length $k$. For example, an element $a_I$, $I = (i_1, i_2, \dots, i_k) \in C_n^k$, means $a_I = a_{i_1 i_2 \dots i_k}$. The Kronecker delta is

$$\delta_{IJ} = \begin{cases} 0, & \text{if } I \neq J, \\ 1, & \text{if } I = J. \end{cases}$$
2 Finite-dimensional Linear Algebra: $V = \mathbb{R}^n$
Let $V$ be a real vector space. We say that $V$ is a vector space of dimension $n$, and denote $\dim(V) = n$, whenever $V$ admits a finite basis $\beta = \{e_1, \dots, e_n\}$. A theorem ensures that any finitely generated vector space admits a finite basis, i.e., for every vector $u \in V$, we have coefficients $u_1, \dots, u_n$ such that $u$ is the linear combination $u = \sum_{i=1}^{n} u_i e_i$, $u_i \in \mathbb{R}$.
© Springer Nature Switzerland AG 2021 C. M. Doria, Differentiability in Banach Spaces, Differential Forms and Applications, https://doi.org/10.1007/978-3-030-77834-7
2.1 Matrix Spaces
The set of $n \times m$ matrices with real entries is denoted by $M_{n,m}(\mathbb{R})$, and by $M_n(\mathbb{R})$ whenever $n = m$. The following sets deserve to be highlighted:
(i) The group of invertible matrices $GL_n(\mathbb{R}) = \{A \in M_n(\mathbb{R}) \mid A \text{ is invertible}\}$.
(ii) The special linear group $SL_n(\mathbb{R}) = \{A \in GL_n(\mathbb{R}) \mid \det(A) = 1\}$.
(iii) The orthogonal group $O_n = \{A \in GL_n(\mathbb{R}) \mid A^t = A^{-1}\}$ and the special orthogonal group $SO_n = \{A \in O_n \mid \det(A) = 1\}$.
(iv) The vector space $S_n = \{A \in M_n(\mathbb{R}) \mid A^t = A\}$ of symmetric matrices.
(v) The vector space $A_n = \{A \in M_n(\mathbb{R}) \mid A^t = -A\}$ of skew-symmetric matrices.
The vector space $M_n(\mathbb{R})$ splits as $M_n(\mathbb{R}) = S_n \oplus A_n$ since any matrix $A \in M_n(\mathbb{R})$ can be decomposed into

$$A = \frac{A + A^t}{2} + \frac{A - A^t}{2}.$$

When the coefficient field is the complex numbers $\mathbb{C}$, we have the following sets of matrices: $M_{n,m}(\mathbb{C})$, $M_n(\mathbb{C})$, $GL_n(\mathbb{C})$, $SL_n(\mathbb{C})$. Instead of symmetric or skew-symmetric matrices, now we have the subspace of Hermitian matrices $H_n = \{A \in M_n(\mathbb{C}) \mid A^* = A\}$ and the subspace of anti-Hermitian matrices $AH_n = \{A \in M_n(\mathbb{C}) \mid A^* = -A\}$, where $A^* = \bar{A}^t$. Similar to the real case, we have the decomposition $M_n(\mathbb{C}) = H_n \oplus AH_n$.
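The splitting of a real square matrix into its symmetric and skew-symmetric parts can be written out directly. The Python sketch below is a minimal illustration using exact rational arithmetic; the function names are hypothetical, and matrices are represented as nested lists.

```python
from fractions import Fraction

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def split_symmetric_skew(a):
    """Return (S, K) with S = (A + A^t)/2 symmetric and K = (A - A^t)/2 skew-symmetric."""
    at = transpose(a)
    n = len(a)
    half = Fraction(1, 2)
    sym = [[half * (a[i][j] + at[i][j]) for j in range(n)] for i in range(n)]
    skew = [[half * (a[i][j] - at[i][j]) for j in range(n)] for i in range(n)]
    return sym, skew
```

For example, `split_symmetric_skew([[1, 2], [5, 3]])` yields a symmetric part with off-diagonal entries 7/2 and a skew-symmetric part with entries ∓3/2, and the two parts add up to the original matrix.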
2.2 Linear Transformations
Let $V$ and $W$ be vector spaces over a field $K = \mathbb{R}$ or $\mathbb{C}$. A linear transformation $T : V \to W$ is a map such that for any $a, b \in K$ and $u, v \in V$, we have $T(au + bv) = aT(u) + bT(v)$.
(1) Matrix Representation
By fixing a basis $\alpha = \{e_1, \dots, e_n\}$ of $V$, we can associate to a linear transformation $T : V \to V$ a matrix $A = [T]_\alpha$, as shown next;

$$\begin{aligned} T(e_1) &= a_{11} e_1 + \cdots + a_{n1} e_n = \sum_{l=1}^{n} a_{l1} e_l, \\ T(e_2) &= a_{12} e_1 + \cdots + a_{n2} e_n = \sum_{l=1}^{n} a_{l2} e_l, \\ &\ \ \vdots \\ T(e_n) &= a_{1n} e_1 + \cdots + a_{nn} e_n = \sum_{l=1}^{n} a_{ln} e_l, \end{aligned} \quad \Rightarrow \quad T = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix} = [T]_\alpha.$$
(2) Change of Basis
Let $V$ be a vector space of dimension $n$ and $T : V \to V$ a linear transformation. Given the bases $\alpha = \{e_1, \dots, e_n\}$ and $\beta = \{u_1, \dots, u_n\}$ of $V$ and a vector $v \in V$, the matrix taking the representation $[v]_\beta$ to the representation $[v]_\alpha$ is the matrix $P = (p_{ij})$ with entries defined as follows;

$$\begin{aligned} u_1 &= p_{11} e_1 + \cdots + p_{n1} e_n = \sum_{l=1}^{n} p_{l1} e_l, \\ u_2 &= p_{12} e_1 + \cdots + p_{n2} e_n = \sum_{l=1}^{n} p_{l2} e_l, \\ &\ \ \vdots \\ u_n &= p_{1n} e_1 + \cdots + p_{nn} e_n = \sum_{l=1}^{n} p_{ln} e_l, \end{aligned} \quad \Rightarrow \quad P = \begin{pmatrix} p_{11} & p_{12} & \dots & p_{1n} \\ p_{21} & p_{22} & \dots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \dots & p_{nn} \end{pmatrix} = [P]^\alpha_\beta.$$

In this case, we say that the matrix $[P]^\alpha_\beta$ takes the basis $\beta$ to the basis $\alpha$ since

$$[u_i]_\alpha = \begin{pmatrix} p_{1i} \\ p_{2i} \\ \vdots \\ p_{ni} \end{pmatrix} = P . \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = P^\alpha_\beta . [u_i]_\beta,$$

where the entry 1 is in the $i$-th position. Assume $\alpha$ is the canonical basis and let $\beta$ be a basis in which the vectors are eigenvectors of $T$. In this way, the matrix $P$ is the matrix whose column vectors are exactly the vectors in $\beta$; $[v]_\alpha = P . [v]_\beta$; that is, $P$ takes a vector represented in $\beta$ to its representation in the canonical basis $\alpha$. Let's find the relationship between the matrices $A = [T]_\alpha$ and $B = [T]_\beta$ representing $T$ in the bases $\alpha$ and $\beta$, respectively. So
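$$T(e_i) = \sum_{l=1}^{n} a_{li}\, e_l, \qquad T(u_i) = \sum_{l=1}^{n} b_{li}\, u_l.$$

However,

$$T(u_i) = T\Big( \sum_l p_{li}\, e_l \Big) = \sum_l p_{li}\, T(e_l) = \sum_l p_{li} \sum_k a_{kl}\, e_k = \sum_k \Big( \sum_l a_{kl}\, p_{li} \Big) e_k, \tag{1}$$

while, expanding $u_l$ in the basis $\alpha$,

$$T(u_i) = \sum_l b_{li}\, u_l = \sum_l b_{li} \sum_k p_{kl}\, e_k = \sum_k \Big( \sum_l p_{kl}\, b_{li} \Big) e_k. \tag{2}$$

Therefore

$$AP = PB \iff B = P^{-1} A P.$$

Consider $L(\mathbb{R}^n; \mathbb{R}^m)$ the space of linear transformations $T : \mathbb{R}^n \to \mathbb{R}^m$.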
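The relation $B = P^{-1} A P$ can be illustrated on a small example. In the Python sketch below, the matrix `A` and the eigenvector basis stored in the columns of `P` are hypothetical choices made only for illustration, and the helper functions are not taken from the text.

```python
from fractions import Fraction

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]

def inverse_2x2(p):
    """Inverse of a 2x2 matrix, using exact rational arithmetic."""
    (a, b), (c, d) = p
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("P is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [1, 2]]    # [T] in the canonical basis alpha
P = [[1, 1], [1, -1]]   # columns: eigenvectors (1, 1) and (1, -1) of A
B = matmul(inverse_2x2(P), matmul(A, P))   # [T] in the eigenvector basis beta
```

Here `B` comes out as the diagonal matrix with entries 3 and 1, the eigenvalues of `A`, and `matmul(A, P)` equals `matmul(P, B)`, as in the identity $AP = PB$.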
2.3 Primary Decomposition Theorem
For simplicity, let's assume $V$ is a $\mathbb{C}$-vector space and let $T : V \to V$ be a linear map. Consider the characteristic polynomial of $T$ to be given by the product

$$p_c(t) = \prod_{i=1}^{k} (t - \lambda_i)^{n_i},$$

with $\lambda_1, \dots, \lambda_k$ being the distinct eigenvalues of $T$, and $n_i$ positive integers such that $n_1 + \cdots + n_k = \dim(V)$. We define the generalized eigenspace of $T$ belonging to $\lambda_i$ to be the subspace

$$E(T, \lambda_i) = \mathrm{Ker}\big( T - \lambda_i I \big)^{n_i} \subset V.$$

The following theorem¹ shows that the space $V$ can be decomposed into $T$-invariant subspaces.

Theorem 1 (Primary Decomposition) Let $V$ be a $\mathbb{C}$-vector space and let $T : V \to V$ be a linear map; otherwise $V$ is real and $T$ has real eigenvalues. Then $V$ admits the direct sum decomposition

$$V = \bigoplus_{i=1}^{k} E(T, \lambda_i).$$

Indeed, $n_i = \dim E(T, \lambda_i)$.

¹ If $V$ is an $\mathbb{R}$-vector space, there are more cases to be considered.
2.4 Inner Product and Sesquilinear Forms
An inner product on a real vector space $V$ is a bilinear map $\langle . , . \rangle : V \times V \to \mathbb{R}$ satisfying the following properties: for any $u, v \in V$,
(i) $\langle u, v \rangle = \langle v, u \rangle$.
(ii) $\langle u, u \rangle \ge 0$, and if $\langle u, u \rangle = 0$, then $u = 0$.
(iii) $|\langle u, v \rangle| \le |u| \cdot |v|$.
Given a basis $\beta = \{e_1, \dots, e_n\}$ of $V$, we get the matrix $G^\beta = (g^\beta_{ij})$ associated to $\langle . , . \rangle$ with entries $g^\beta_{ij} = \langle e_i, e_j \rangle$, and so we have $\langle u, v \rangle = u^t . G^\beta . v$ for all $u, v$. Condition (i) implies that $G^\beta$ is a symmetric matrix ($G^t = G$), and so it is diagonalizable. Let $u \in V$ be an eigenvector of $G^\beta$ associated with the eigenvalue $\lambda$; it follows that $\langle u, u \rangle = u^t \cdot G^\beta \cdot u = \lambda\, u^t \cdot u$. Therefore we have $\lambda > 0$ and consequently $\det(G^\beta) > 0$. An inner product on $V$ induces an inner product on every vector subspace of $V$. Then, restricting to the subspace $V_{ij}$, generated by the vectors $\beta_{ij} = \{e_i, e_j\}$, the inner product is defined as

$$\langle u, v \rangle = u^t \begin{pmatrix} g^\beta_{ii} & g^\beta_{ij} \\ g^\beta_{ji} & g^\beta_{jj} \end{pmatrix} v = u^t . G^\beta(i, j) . v.$$

Consequently, $\det G^\beta(i, j) = g^\beta_{ii}\, g^\beta_{jj} - (g^\beta_{ij})^2 > 0$. From now on, the index $\beta$ will be ignored unless it is necessary to convey information.

Definition 1 A basis $\beta = \{e_1, \dots, e_n\}$ of $V$ is orthonormal if the matrix of $\langle . , . \rangle$ associated to $\beta$ is the identity matrix, that is, $g_{ij} = \delta_{ij}$.

Theorem 2 Let $(V, \langle . , . \rangle)$ be a finite dimensional vector space endowed with an inner product. Then there is a basis $\beta = \{e_1, \dots, e_n\}$ such that $\langle e_i, e_j \rangle = \delta_{ij}$.

An orthonormal basis is useful to perform computations. In some cases, as in Differential Geometry or General Relativity, there is a need to use a non-orthogonal basis. The concept of a vector space with an inner product extends to complex vector spaces, as we define next.
Definition 2 A form $\langle . , . \rangle : V \times V \to \mathbb{C}$ is symmetric sesquilinear, or Hermitian, if it satisfies the following conditions; for all $u, v \in V$
(i) $\langle v, u \rangle = \overline{\langle u, v \rangle}$.
(ii) $\langle u, u \rangle \ge 0$; if $\langle u, u \rangle = 0$, then $u = 0$.
(iii) $|\langle u, v \rangle| \le |u| \cdot |v|$.
From (i), for any $a, b \in \mathbb{C}$ and $u, v \in V$, $\langle au, bv \rangle = a \bar{b}\, \langle u, v \rangle$. Let $V$ be a finite dimensional vector space over $\mathbb{C}$ and fix a basis $\beta = \{e_1, \dots, e_n\}$, so there is a complex matrix $H$ such that

$$\langle u, v \rangle = u^* \cdot H \cdot v, \qquad u^* = (\bar{u})^t.$$

Also from (i), we get $H^* = H$, that is, $\bar{H}^t = H$.
2.5 The Sylvester Theorem
Let $V$ be a real vector space. A bilinear form $B : V \times V \to \mathbb{R}$ is symmetric if $B(u, v) = B(v, u)$, for all $u, v \in V$. In this case, associated with $B$, there is the quadratic form $Q : V \to \mathbb{R}$, $Q(u) = B(u, u)$. Reciprocally, given a quadratic form $Q$, we define the bilinear form $B$ as

$$B(u, v) = \frac{1}{2}\Big( Q(u + v) - Q(u) - Q(v) \Big). \tag{3}$$

Next, we will be working with the quadratic form instead of the bilinear form. Given a basis $\beta = \{e_1, \dots, e_n\}$ of $V$, $Q$ is represented by the matrix $Q_\beta = (q_{ij})$, $q_{ij} = B(e_i, e_j)$. We get $B(u, v) = v^t \cdot Q_\beta \cdot u$. The matrix $Q_\beta$ being symmetric allows us to consider its diagonal form. Assume $Q$ is non-degenerate, that is, $B(u, v) = 0$ for all $v \in V$ only if $u = 0$. So the eigenvalues of $Q$ are all non-null. The spectrum of $Q$ is, by definition, the set $\sigma(Q) = \{\lambda \in \mathbb{C} \mid Q . u = \lambda . u \text{ for some } u \neq 0\}$. Since $Q$ is symmetric, it follows that $\sigma(Q) \subset \mathbb{R}$. Denoting by $V_\lambda$ the eigenspace associated with $\lambda \in \sigma(Q)$ and considering the $Q$-invariant subspaces

$$V_Q^+ = \bigoplus_{\lambda \in \sigma(Q),\ \lambda > 0} V_\lambda, \qquad V_Q^- = \bigoplus_{\lambda \in \sigma(Q),\ \lambda < 0} V_\lambda, \tag{4}$$

we get the decomposition $V = V_Q^+ \oplus V_Q^-$.

Definition 3 Let $V$ be a real vector space of dimension $n$ and let $Q : V \to \mathbb{R}$ be a non-degenerate quadratic form. Let $p_Q = \dim(V_Q^+)$ and $q_Q = \dim(V_Q^-)$. The rank of $Q$ is $r(Q) = p_Q + q_Q = n$ and its signature is $\tau(Q) = p_Q - q_Q$.

If $\tau(Q)$ is known, then the numbers $p_Q$ and $q_Q$ are determined in terms of $n$ and $\tau(Q)$. By performing a change of basis, the matrix of $Q$ changes. Considering the orthonormal bases $\beta$, $\beta'$, in which the matrices representing $Q$ are $Q_\beta$ and $Q_{\beta'}$, respectively, we have a matrix $P$ such that $Q_{\beta'} = P^t . Q_\beta . P$. Since $\beta$ and $\beta'$ are orthonormal bases, $P$ is an orthogonal matrix, and so $P^t = P^{-1}$. Therefore $Q_\beta$ and $Q_{\beta'}$ are similar matrices ($Q_\beta \sim Q_{\beta'}$), which means that there is a matrix $P \in O(n)$ such that $Q_{\beta'} = P^{-1} \cdot Q_\beta \cdot P$. Similar matrices have the same characteristic polynomial; therefore their spectra are equal.

Theorem 3 (Sylvester) Let $V$ be a real vector space and $Q : V \to \mathbb{R}$ a quadratic form. Then $Q$ is similar to a quadratic form with the matrix given as

$$Q = \begin{pmatrix} I_{p_Q} & 0 \\ 0 & -I_{q_Q} \end{pmatrix},$$

in which $I_{p_Q}$ and $I_{q_Q}$ are identity matrices with rank $p_Q$ and $q_Q$, respectively. Sylvester's Theorem 3 generalizes Theorem 2.
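The numbers $p_Q$ and $q_Q$ can be computed without finding eigenvalues. The Python sketch below is one possible approach, not the method of the text: it counts the signs of the pivots produced by Gaussian elimination, which by Sylvester's law of inertia agree with the signs of the eigenvalues. It assumes, as a simplification, that all leading principal minors are non-zero, and the function names are hypothetical.

```python
from fractions import Fraction

def inertia(q):
    """Return (p, q_neg): the numbers of positive and negative pivots of the
    symmetric matrix `q` under Gaussian elimination without row exchanges.

    Simplifying assumption: every leading principal minor is non-zero,
    otherwise a ValueError is raised.
    """
    n = len(q)
    m = [[Fraction(x) for x in row] for row in q]
    pos = neg = 0
    for i in range(n):
        pivot = m[i][i]
        if pivot == 0:
            raise ValueError("zero pivot: this sketch needs non-zero leading minors")
        if pivot > 0:
            pos += 1
        else:
            neg += 1
        # Eliminate the entries below the pivot.
        for j in range(i + 1, n):
            factor = m[j][i] / pivot
            for k in range(i, n):
                m[j][k] -= factor * m[i][k]
    return pos, neg

def signature(q):
    """Signature tau(Q) = p_Q - q_Q."""
    pos, neg = inertia(q)
    return pos - neg
```

For the Lorentz form `[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]` this gives `inertia` equal to `(3, 1)` and signature 2. The matrices `[[2, 1], [1, -3]]` and `[[2, 5], [5, 9]]`, which are related by $P^t Q P$ with `P = [[1, 2], [0, 1]]`, both give `(1, 1)`.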
2.6 Dual Vector Spaces
Let $V$ be a real vector space of dimension $n$.

Definition 4 A linear functional defined on $V$ is a function $f : V \to \mathbb{R}$ satisfying the following condition: for any $a, b \in \mathbb{R}$ and $u, v \in V$, $f(au + bv) = a f(u) + b f(v)$. The dual space of $V$ is the space $V^* = \{f : V \to \mathbb{R} \mid f \text{ is linear}\}$ of linear functionals on $V$.

Proposition 1 $V^*$ is a vector space of dimension $n$.

Proof The vector space structure on $V^*$ is induced from the vector space structure on $V$ in the following manner;
(i) if $f_1, f_2 \in V^*$, then $f_1 + f_2 \in V^*$ since $(f_1 + f_2)(u) = f_1(u) + f_2(u)$.
(ii) if $a \in \mathbb{R}$, $f \in V^*$, then $af \in V^*$ since $(af)(u) = a . f(u)$.
To calculate the dimension of $V^*$, we fix a basis $\beta = \{e_1, \dots, e_n\}$ of $V$ and consider the set $\beta^* = \{e_1^*, \dots, e_n^*\} \subset V^*$, with $e_i^* : V \to \mathbb{R}$ the functional given as the linear extension of the identity $e_i^*(e_j) = \delta_{ij}$;

$$e_i^*(u) = e_i^*\Big( \sum_k u_k e_k \Big) = \sum_k u_k \delta_{ik} = u_i.$$

We claim $\beta^*$ is a basis of $V^*$.
(i) $\beta^*$ is linearly independent.
Suppose we have $a_i \in \mathbb{R}$, $1 \le i \le n$, such that $a_1 e_1^* + \cdots + a_n e_n^* = 0$. Then, for all $i$,

$$0 = \big( a_1 e_1^* + \cdots + a_n e_n^* \big)(e_i) = a_i.$$

(ii) $\beta^*$ spans $V^*$.
Let $u \in V$ be the vector $u = u_1 e_1 + \cdots + u_n e_n$. The identity $u_i = e_i^*(u)$ allows us to write $u = \sum_i e_i^*(u)\, e_i$. Let $f \in V^*$, so we get $f(u) = \sum_i f(e_i)\, e_i^*(u)$, and so

$$f = \sum_{i=1}^{n} f(e_i)\, e_i^*. \tag{5}$$
Therefore it follows from (i) and (ii) that β ∗ is a basis of V ∗ and dim(V ∗ ) = n. Definition 5 Given a basis β = {e1 , . . . , en } of V , the dual basis of β in V ∗ is β ∗ = {e1∗ , . . . , en∗ }, ei∗ (e j ) = δi j . x Now let’s consider (V, < ., . >) a vector space endowed with an inner product and a matrix defined by the basis β = {e1 , . . . , en } is G = (gi j ). The inner product induces the linear map P : V → V ∗ , P(u) =< u, . >. Proposition 2 P : V → V ∗ is an isomorphism of vector spaces. Proof We claim P is injective and surjective. (i) Injectivity. Let u 1 , u 2 ∈ V be such that P(u 1 ) = P(u 2 ), so < u 1 − u 2 , u >= 0 for every u ∈ V . Given u = u 1 − u 2 , it follows that | u 1 − u 2 |= 0. Hence u 1 = u 2 . (ii) surjectivity. Let β # = { f 1 , . . . , f n } be the set of linear functionals given byf i (v) =< ei , v > for i = 1, . . . , n. β # is a basis of V ∗ , and to prove this, take u = nj=1 e∗j (u)e j , f i (u) =
n
f i (e j )e∗j (u) =
i=1
n
gi j e∗j (u) ⇒ f i =
i=1
n
g ji e∗j , (gi j = g ji ).
j=1
In this way, the matrix G = (gi j ) takes β ∗ onto β # . Considering the inverse matrix G = (g i j ), it follows that n ei∗ = g ji f j . −1
j=1
Any functional f ∈ V ∗ can be written as f =
n i=1
f (ei )ei∗ =
n i, j=1
g ji f (ei ) f j .
Appendix A: Basics of Analysis
325
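Consequently
\[ f(u) = \sum_{i,j=1}^{n} g^{ji} f(e_i) \langle e_j, u \rangle = \Big\langle \sum_{i,j=1}^{n} g^{ji} f(e_i)\, e_j ,\; u \Big\rangle . \]
Defining the vector
\[ u_f = \sum_{i,j=1}^{n} g^{ji} f(e_i)\, e_j , \tag{6} \]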
we get $P(u_f) = f$. Therefore $P$ is linear, injective and surjective, so it is an isomorphism of vector spaces.

The last proposition allows us to define an inner product on $V^*$ induced by the one on $V$ as follows: let $f = P(u_f)$ and $h = P(u_h) \in V^*$, and set
\[
\begin{aligned}
\langle f, h \rangle = \langle u_f, u_h \rangle
&= \sum_{i,j,k,l} g^{ji} g^{lk} f(e_i) h(e_k)\, g_{jl}
 = \sum_{i,j,k} g^{ji} \Big( \sum_{l} g_{jl}\, g^{lk} \Big) f(e_i) h(e_k) \\
&= \sum_{i,j,k} g^{ji} \delta_{jk}\, f(e_i) h(e_k)
 = \sum_{i,j} g^{ij} f(e_i) h(e_j) .
\end{aligned}
\]
The matrix of the inner product $\langle \cdot, \cdot \rangle : V^* \times V^* \to \mathbb{R}$ is $G^{-1} = (g^{ij})$, $g^{ij} = \langle e_i^*, e_j^* \rangle$. Therefore it is possible to extend the inner product on $V$ and $V^*$ to many vector spaces that are constructed using the direct sum and tensor product operations, as we discussed in Chap. 6.
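As a small worked illustration of the dual basis, of the vector $u_f$ in (6), and of the induced inner product, consider the following sketch. The space $V = \mathbb{R}^2$ with the standard inner product, the basis $e_1 = (1,0)$, $e_2 = (1,1)$, and the functional $f = e_1^*$ are chosen here only for the example; they do not come from the text.

```latex
% Gram matrix of the basis e_1 = (1,0), e_2 = (1,1), its inverse, and the dual basis
\[
G = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
G^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}, \qquad
e_1^*(x,y) = x - y, \quad e_2^*(x,y) = y .
\]
% Vector representing f = e_1^*, from u_f = \sum_{i,j} g^{ji} f(e_i) e_j with f(e_1) = 1, f(e_2) = 0
\[
u_f = g^{11} e_1 + g^{21} e_2 = 2(1,0) - (1,1) = (1,-1), \qquad
\langle u_f, (x,y) \rangle = x - y = e_1^*(x,y) .
\]
% The induced inner product on V^* reproduces the entry g^{11} of G^{-1}
\[
\langle e_1^*, e_1^* \rangle = \langle u_f, u_f \rangle = 2 = g^{11} .
\]
```

The same computation with $f = e_2^*$ gives $u_f = -e_1 + e_2 = (0,1)$ and $\langle e_2^*, e_2^* \rangle = 1 = g^{22}$.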
3 Metric and Banach Spaces

Let $X$ be a set. A metric defined on $X$, also called a distance function, is a function $d : X \times X \to \mathbb{R}$ which has the following properties: for all $x, y, z \in X$,
(1) $d(x, y) \ge 0$, and $d(x, y) = 0$ if, and only if, $x = y$;
(2) $d(x, y) = d(y, x)$;
(3) $d(x, y) \le d(x, z) + d(z, y)$.
A metric space is a pair $(X, d)$. The notion of distance allows us to define the limit of a sequence $\{x_n\}_{n \in \mathbb{N}}$ in $X$.

Definition 6 A sequence $\{x_n\}_{n \in \mathbb{N}}$ converges to a point $a \in X$, which is indicated by $\lim_{n \to \infty} x_n = a$, if for any $\epsilon > 0$ there is $n_0 \in \mathbb{N}$ such that $d(x_n, a) < \epsilon$ for all $n > n_0$.
In the space $(X, d)$, the open ball of radius $r$ centered at $p \in X$ is the set $B_r(p) = \{x \in X \mid d(x, p) < r\}$. The closed ball is the set $\overline{B}_r(p) = \{x \in X \mid d(x, p) \le r\}$. The topology of the space $(X, d)$ is generated by the open balls. In most places, the letter $U$ is used to denote an open set, unless otherwise stated. Consider metric spaces $(X, d_X)$ and $(Y, d_Y)$ and a function $f : X \to Y$.

Definition 7 Let $x_0 \in X$ and $y_0 \in Y$. The limit of $f$ is $y_0$ when $x$ tends to $x_0$, which is denoted by $\lim_{x \to x_0} f(x) = y_0$, if for a given $\epsilon > 0$ there is $\delta > 0$ such that if $0 < d_X(x, x_0) < \delta$, then $d_Y(f(x), y_0) < \epsilon$.

Definition 8 A function $f : (X, d_X) \to (Y, d_Y)$ is continuous at $x_0 \in X$ if $\lim_{x \to x_0} f(x) = f(x_0)$. That is, $f$ is defined at $x_0$ and the limit is $f(x_0)$. Considering $U \subset X$, $f$ is continuous on $U$ if it is continuous at every point $x \in U$.

In the definition of continuity, the value of $\delta$ may depend on the values of $\epsilon$ and $x_0$. In some cases, the proofs require that $\delta$ does not depend on $x_0$; this motivates the concept of uniform continuity.

Definition 9 A function $f : (X, d_X) \to (Y, d_Y)$ is uniformly continuous in $X$ if, for all $\epsilon > 0$, there is $\delta > 0$ such that $d_Y(f(p), f(q)) < \epsilon$ for every pair of points $p, q \in X$ such that $d_X(p, q) < \delta$ ($\delta$ is independent of $p$ and $q$).

The following theorem has several applications throughout the text.

Theorem 4 Let $(X, d_X)$ be a compact metric space and consider $f : (X, d_X) \to (Y, d_Y)$ a continuous function. Then $f$ is uniformly continuous.

Proof See Ref. [38].
Consider $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$. A norm on a $\mathbb{K}$-vector space $V$ is a function $\| \cdot \| : V \to \mathbb{R}$ such that for every $x, y, z \in V$ the following properties are satisfied:
(1) $\| x \| \ge 0$, and $\| x \| = 0$ if, and only if, $x = 0$;
(2) $\| kx \| = |k| \, \| x \|$, for all $k \in \mathbb{K}$;
(3) $\| y - x \| \le \| y - z \| + \| z - x \|$ (triangle inequality).
A vector space endowed with a norm $\| \cdot \|$ is denoted by $(V, \| \cdot \|)$, or simply $V$ when it is clear which norm is defined. Every normed vector space is a metric space, since $d(x, y) = \| y - x \|$ defines a metric on $V$. However, the converse is false: not every metric on a vector space is induced by a norm.

A $\mathbb{K}$-vector space has finite dimension equal to $n$ if it admits a finite basis with $n$ elements; otherwise, we say that it has infinite dimension. There are infinite dimensional vector spaces that admit a countable basis, and there are also examples that do not admit countable bases. The impossibility of knowing a priori the precise value of $a = \lim_{n \to \infty} x_n$ makes the next concept very useful.
Definition 10 A sequence $\{x_n\}_{n \in \mathbb{N}} \subset X$ is a Cauchy sequence if for any $\epsilon > 0$ there is $n_0 \in \mathbb{N}$ such that if $n, m > n_0$, then we have $d(x_n, x_m) < \epsilon$.

In $\mathbb{R}^n$, a sequence converges if and only if it is a Cauchy sequence. This is not true for all metric spaces. It is impossible to develop the concepts of calculus in a space where there are non-convergent Cauchy sequences, for example over $\mathbb{Q}$. This defect is not only a matter of the field; it may be a defect of the metric space itself.

Definition 11 A metric space $(X, d)$ is complete if every Cauchy sequence in $X$ converges to a point in $X$. A Banach space is a complete normed space $(V, \| \cdot \|)$.

Among the normed spaces, we have a special category of spaces whose norm is induced by an inner product, in the case of an $\mathbb{R}$-vector space, or by a sesquilinear form, in the case of a $\mathbb{C}$-vector space.

Definition 12 Let $H = (V, \| \cdot \|)$ be a Banach space.
(i) $H$ is a complex Hilbert space when $V$ is a $\mathbb{C}$-vector space equipped with a sesquilinear form $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}$, with norm $\| x \| = \sqrt{\langle x, x \rangle}$.
(ii) $H$ is a real Hilbert space when $V$ is an $\mathbb{R}$-vector space endowed with an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ and norm $\| x \| = \sqrt{\langle x, x \rangle}$.

The concept of distance is basic to studying issues related to approximation or convergence. As we have seen, there are different types of structures; the techniques to study convergence depend on the structure defined on the space. In fact, we have the following inclusions:
\[ \text{Hilbert Spaces} \subset \text{Normed Spaces} \subset \text{Metric Spaces} \subset \text{Topological Spaces}. \tag{7} \]
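The failure of completeness over $\mathbb{Q}$ mentioned above can be observed with exact rational arithmetic. The following is a minimal sketch, not taken from the text: the function name `newton_sqrt2` and the choice of Newton's iteration for $x^2 = 2$ are assumptions made for illustration. Every term is rational and consecutive terms get closer, as expected of a Cauchy sequence, yet no term (and no rational limit) satisfies $x^2 = 2$.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates x_{k+1} = (x_k + 2/x_k) / 2, starting at x_0 = 1."""
    x = Fraction(1)
    seq = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2  # only field operations, so x stays in Q
        seq.append(x)
    return seq

seq = newton_sqrt2(6)
# Distances between consecutive terms, computed exactly in Q.
gaps = [abs(seq[k + 1] - seq[k]) for k in range(len(seq) - 1)]
# Exact defect x_k^2 - 2; it is never zero because 2 has no rational square root.
defects = [x * x - 2 for x in seq]

for k, x in enumerate(seq):
    print(k, x, float(defects[k]))
```

Viewed inside $\mathbb{R}$ the same sequence converges to $\sqrt{2}$, which is the sense in which the completion repairs the defect of $(\mathbb{Q}, d)$.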
Example 1 The examples below play an important role in the previous chapters.
(1) The spaces $(\mathbb{Q}, d)$, $(\mathbb{R}, d)$ and $(\mathbb{C}, d)$ are normed spaces in which the norm is given by the absolute value (modulus) $\| x \| = |x|$.
(2) The vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ have finite dimension. They are Hilbert spaces with the norm $\| x \| = \sqrt{\langle x, x \rangle}$ induced by $\langle \cdot, \cdot \rangle$, which is an inner product if $\mathbb{K} = \mathbb{R}$ and a sesquilinear form if $\mathbb{K} = \mathbb{C}$.
(3) Consider the vector space $C^0([a, b]) = \{ f : [a, b] \to \mathbb{R} \mid f \text{ continuous} \}$ endowed with the norm $\| f \|_0 = \sup_{x \in [a,b]} |f(x)|$ and the induced metric $d(f, g) = \| g - f \|_0$. Therefore $(C^0([a, b]), d)$ is a metric space. Indeed, it is a Banach space.
(4) Let $1 \le p < \infty$. The norm $\| f \|_p = \big( \int_a^b |f(x)|^p \, dx \big)^{1/p}$ induces a metric structure on $C^0([a, b])$. For $p \ne q$, the metrics induced by $\| \cdot \|_p$ and $\| \cdot \|_q$ are non-equivalent. The space $(C^0([a, b]), \| \cdot \|_p)$ is a normed space, but it is not a Banach space.
(5) $L^p$ spaces, $1 < p < \infty$: Let $X \subset \mathbb{R}^n$ be a closed subset with non-empty interior and finite volume, and let $E$ be a Banach space. Define
\[ L^p(X; E) = \Big\{ f : X \to E \;\Big|\; \int_X |f|^p \, dv < \infty \Big\}, \tag{8} \]
where $dv = dx_1 \dots dx_n$ is the volume element of $\mathbb{R}^n$. Consider the norm
\[ \| f \|_{L^p} = \Big( \int_X |f|^p \, dv \Big)^{1/p} . \tag{9} \]
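The space $(L^p(X; E), \| \cdot \|_{L^p})$ is a Banach space [28]. For $p = 2$, $L^2(X; E)$ is a Hilbert space.
(6) Consider the space of sequences $l^p(\mathbb{Z}) = \{ \{a_n\}_{n \in \mathbb{Z}} \mid a_n \in \mathbb{C}, \ \sum_{n=-\infty}^{\infty} |a_n|^p < \infty \}$ endowed with the norm $\| \{a_n\} \|_p = \big( \sum_{n=-\infty}^{\infty} |a_n|^p \big)^{1/p}$, $1 \le p < \infty$. Therefore $(l^p(\mathbb{Z}), \| \cdot \|_p)$ is a normed space.
(7) In the last example, $\| \{a_n\} \|_\infty = \sup_n |a_n|$ defines a norm on the space of sequences $l^\infty(\mathbb{Z}) = \{ \{a_n\}_{n \in \mathbb{Z}} \mid a_n \in \mathbb{C}, \ \| \{a_n\} \|_\infty < \infty \}$.
(8) Consider $\rho_{k,m}(f) = \sup_{x \in \mathbb{R}} \big| x^k \frac{d^m f}{dx^m} \big|$. The Schwartz space is the set of functions $S(\mathbb{R}) = \{ f \in C^\infty(\mathbb{R}) \mid \rho_{k,m}(f) < \infty, \ \forall k, m \in \mathbb{N} \}$. Writing $\rho_{k,m}(f, g) = \rho_{k,m}(f - g)$, the formula
\[ d(f, g) = \sum_{k,m=0}^{\infty} \frac{1}{2^{k+m}} \, \frac{\rho_{k,m}(f, g)}{1 + \rho_{k,m}(f, g)} \]
defines a metric on $S(\mathbb{R})$.
(9) Let $K \subset \mathbb{R}^n$ be a compact subset and $V \subset \mathbb{R}^m$ an open subset.
(a) Let $C^0(K; \mathbb{R}^m) = \{ f : K \to V \mid f \text{ continuous} \}$. The space $E_0 = (C^0(K; \mathbb{R}^m), \| \cdot \|_0)$ endowed with the norm $\| f \|_0 = \sup_{x \in K} |f(x)|$ is a Banach space.
(b) Let $C^r(K; \mathbb{R}^m) = \{ f : K \to V \mid d^i f \text{ continuous, for all } 0 \le i \le r \}$ (that is, $f$ is a $C^r$-map). The space $E_r = (C^r(K; \mathbb{R}^m), \| \cdot \|_r)$ is a normed space with norm
\[ \| f \|_{C^r} = \sum_{i=0}^{r} \| d^i f \|_0 . \tag{10} \]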
It follows from the Ascoli-Arzelà theorem that $E_r = (C^r(K; \mathbb{R}^m), \| \cdot \|_r)$ is a Banach space. Moreover, the inclusion $F_{i,i-1} = (C^i(K; \mathbb{R}^m), \| \cdot \|_{i-1}) \hookrightarrow E_{i-1}$ is compact, that is, any sequence $\{f_n\}_{n \in \mathbb{N}} \subset F_{i,i-1}$ that is bounded in the norm $\| \cdot \|_i$ admits a subsequence $\{f_{n_k}\}$ converging in $E_{i-1}$. The spaces $C^r(U; V)$, $r \ge 0$, are the most important ones for the purposes of this book. The norm is defined according to the context.
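To make the non-equivalence of the sup norm of item (3) and the integral norms of item (4) concrete, the following worked computation may help. It is only a sketch: the sequence $f_n(x) = x^n$ on $[0, 1]$ is chosen here for illustration and is not part of the text.

```latex
% Sup norm and L^p-type norm of f_n(x) = x^n on [a,b] = [0,1]
\[
\| f_n \|_0 = \sup_{x \in [0,1]} x^n = 1,
\qquad
\| f_n \|_p = \Big( \int_0^1 x^{np} \, dx \Big)^{1/p} = (np + 1)^{-1/p} \longrightarrow 0
\quad (n \to \infty).
\]
% Consequence: there is no constant C > 0 with \|f\|_0 \le C \|f\|_p for all f in C^0([0,1]),
% since applying such a bound to f_n would give 1 \le C (np+1)^{-1/p} \to 0.
```

In particular $f_n \to 0$ in $\| \cdot \|_p$ while $\| f_n \|_0 = 1$ for every $n$, so convergence in the integral norm does not imply uniform convergence.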
All metric spaces admit a completion [28]. More precisely, if $(X, d)$ is a metric space, then there is a metric space $(\widetilde{X}, \widetilde{d})$ and an inclusion $\iota : X \to \widetilde{X}$ such that:
(i) $\widetilde{d}(\iota(x), \iota(y)) = d(x, y)$ for every pair $x, y \in X$,
(ii) $\iota(X)$ is dense in $\widetilde{X}$,
(iii) $(\widetilde{X}, \widetilde{d})$ is complete,
(iv) $(\widetilde{X}, \widetilde{d})$ is unique up to isomorphism of metric spaces.
Exercises
(1) Let $V$ be a real Banach space. Verify that a norm defined by an inner product satisfies the parallelogram identity
\[ \| x + y \|^2 + \| x - y \|^2 = 2 \big( \| x \|^2 + \| y \|^2 \big) . \tag{11} \]
(2) Show that a norm is induced by an inner product if, and only if, it satisfies the parallelogram identity.
(3) In items (1) and (2) above, study the case when $V$ is a complex vector space.
(4) For each inclusion shown in the sequence of spaces (7), give an example to show that the inclusion cannot be an equality.
(5) Show that $C^1([a, b])$ is not a Banach space when fitted with the norm $\| \cdot \|_0$. Extend the result to $C^r([a, b])$, $1 \le r < \infty$.
(6) Let $K \subset \mathbb{R}^n$ be a compact set. Show that $E_r = (C^r(K; \mathbb{R}^m), \| \cdot \|_{C^r})$ is complete and that the inclusion $F_{i,i-1} = (C^i(K; \mathbb{R}^m), \| \cdot \|_{i-1}) \hookrightarrow E_{i-1}$ is compact.

Definition 13 Let $U$ be a subset of a Banach space $E$. $U$ is locally compact if every point has a compact neighborhood. That is, given $r > 0$, if $U$ is open and bounded, then its closure $\overline{U}$ admits a finite cover by balls of radius $r$.

It follows from the definition that if $E$ is locally compact and $U \subset E$ is bounded, then $\overline{U}$ is compact.

Definition 14 Let $V$ be a vector space. A subset $U \subset V$ is convex if the segment $r : [0, 1] \to V$, $r(t) = x + t(y - x)$, is contained in $U$ for every pair of points $x, y \in U$.

Definition 15 The set $U$ is connected if, for every pair of points $x, y \in U$, there is a continuous curve $\gamma : [0, 1] \to U$ such that $\gamma(0) = x$, $\gamma(1) = y$ and $\gamma([0, 1]) \subset U$.
4 Calculus Theorems In this section, some statements about calculus are established to fix the notation.
4.1 One Real Variable Functions

Theorem 5 (Intermediate Value) Let $f$ be a continuous real-valued function on a closed and bounded interval $[a, b]$. Then $f$ attains all values between $f(a)$ and $f(b)$.

Theorem 6 (Mean Value) Let $f : [a, b] \to \mathbb{R}$ be a real-valued function of class $C^1$. Then there is a point $c \in (a, b)$ such that
\[ f(b) - f(a) = f'(c)(b - a) . \tag{12} \]
The Fundamental Theorem of Calculus (FTC) is a milestone in Calculus.

Theorem 7 (FTC) Let $f : [a, b] \to \mathbb{R}$ be a function of class $C^0$ and $F : [a, b] \to \mathbb{R}$ the function given by
\[ F(x) = \int_a^x f(t) \, dt . \]
Then $F$ is differentiable and $F'(x) = f(x)$, for all $x \in [a, b]$.
4.2 Functions of Several Real Variables

Let $U \subset \mathbb{R}^n$ be an open subset.

Definition 16 The tangent plane at a point $p \in U$ is the set
\[ T_p U = \{ v \in \mathbb{R}^n \mid \exists \, \gamma : (-\epsilon, \epsilon) \to U, \ \gamma(0) = p \text{ and } \gamma'(0) = v \} . \]
Taking any vector $v \in \mathbb{R}^n$ and $\gamma(t) = p + t v$, it follows that $v \in T_p U$, hence $T_p U = \mathbb{R}^n$. Therefore $T_p U$ is a vector space isomorphic to $\mathbb{R}^n$.

Consider the map $f = (f_1, \dots, f_m) \in C^k(U; \mathbb{R}^m)$, $k \ge 1$, and a curve $\gamma : [a, b] \to U$ of class $C^1$. The composition defines a curve $h : [a, b] \to \mathbb{R}^m$, $h = f \circ \gamma = (h_1, \dots, h_m)$, and each $h_i : [a, b] \to \mathbb{R}$ is $C^1$. The Mean Value Theorem applied to each coordinate gives numbers $c_1, \dots, c_m \in (a, b)$ such that $h(b) - h(a) = \big( h_1'(c_1), \dots, h_m'(c_m) \big)(b - a)$. In this case, with $| \cdot |$ denoting the maximum norm on $\mathbb{R}^m$, we get the mean value inequality
\[ | h(b) - h(a) | \le \sup_i | h_i'(c_i) | \, | b - a | . \tag{13} \]
Changing the coordinate system in Rn changes the volume element and, consequently also changes the expression of the integral. The change of the variable in multiple integrals is performed according to the following.
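Theorem 8 Let $U \subset \mathbb{R}^n$ be an open subset, $\Phi : U \to \Phi(U)$ a $C^1$-diffeomorphism and let $f : \Phi(U) \to \mathbb{R}$ be a continuous function. Then the function $(f \circ \Phi) \, | \det(d\Phi) |$ is integrable in $U$ and
\[ \int_{\Phi(U)} f(y) \, dy = \int_U f \circ \Phi(x) \, | \det(d\Phi_x) | \, dx, \qquad y = \Phi(x) . \tag{14} \]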
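A short worked check of formula (14) may clarify the role of the Jacobian factor. The data below (the linear map $\Phi(x_1, x_2) = (2x_1, 3x_2)$, the square $U = (0,1)^2$ and the integrand $f(y) = y_1 y_2$) are assumptions chosen only to keep the arithmetic simple.

```latex
% Phi(x_1, x_2) = (2 x_1, 3 x_2) on U = (0,1)^2, so Phi(U) = (0,2) x (0,3) and |det(d Phi_x)| = 6
\[
\int_{\Phi(U)} y_1 y_2 \, dy
= \Big( \int_0^2 y_1 \, dy_1 \Big) \Big( \int_0^3 y_2 \, dy_2 \Big)
= 2 \cdot \tfrac{9}{2} = 9 ,
\]
\[
\int_U (f \circ \Phi)(x) \, | \det(d\Phi_x) | \, dx
= \int_0^1 \!\! \int_0^1 (2 x_1)(3 x_2) \cdot 6 \, dx_1 \, dx_2
= 36 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = 9 .
\]
```

Both sides agree, and dropping the factor $| \det(d\Phi_x) | = 6$ would give $3/2$ instead of $9$.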
5 Proper Maps

This section is rather important for studying polynomial maps. Let $X$ and $Y$ be two topological spaces and $f : X \to Y$ a map. For an arbitrary subset $V \subset Y$, define $f^{-1}(V) = \{ x \in X \mid f(x) \in V \}$.

Definition 17 Let $X$ and $Y$ be two topological spaces. A map $f : X \to Y$ is called proper if the inverse image $f^{-1}(K) \subset X$ is compact for all compact subsets $K \subset Y$.

Examples:
(1) The restriction of a proper map to a closed subset is always proper.
(2) Every non-constant polynomial $P(x) \in \mathbb{R}[x]$ is proper. Take a compact set $K \subset \mathbb{R}$; then $P^{-1}(K)$ is closed since $K$ is closed. We have to check that $P^{-1}(K)$ is bounded. Suppose it is not bounded; then there is a sequence $\{x_n\} \subset P^{-1}(K)$ with $|x_n| \to \infty$ and, passing to a subsequence, $P(x_n) \to y_0 \in K$. This cannot happen, since $\lim_{|x| \to \infty} |P(x)| = \infty$. Therefore $P^{-1}(K)$ is compact, since it is closed and bounded.
(3) Let $P : \mathbb{R}^2 \to \mathbb{R}$ be the polynomial $P(x, y) = xy$. Then $P$ is not proper, since $P^{-1}(0)$ is the union of the two coordinate axes, which is not compact.
(4) The immersion $\gamma : \mathbb{R} \to T^2$ defined by Eq. (42) (see Chap. 1) is not proper if $r \in \mathbb{R} \setminus \mathbb{Q}$.
(5) For $n \in \mathbb{N}$, $n \ge 1$, consider the function $f(x) = \sin(nx)$. Since $f^{-1}(0) = \{ k\pi/n \mid k \in \mathbb{Z} \}$ is unbounded, $f$ is not proper.
(6) If $X$ is compact, $Y$ is Hausdorff and $f : X \to Y$ is continuous, then $f$ is proper.

Given two locally compact Hausdorff spaces $X$ and $Y$, the failure of a $C^0$-map $f : X \to Y$ to be proper means that there is a convergent sequence $\{y_n\}_{n \in \mathbb{N}} \subset f(X)$ such that the set $f^{-1}(\{y_n\})$ contains a sequence $\{x_n\}_{n \in \mathbb{N}} \subset X$ with $\lim |x_n| = \infty$, that is, a sequence leaving every compact subset of $X$.

Proposition 3 If $X$ and $Y$ are locally compact Hausdorff spaces, then any proper map $f : X \to Y$ has closed range.

Proof Let $\{x_n\}_{n \in \mathbb{N}} \subset X$ be a sequence whose image $\{y_n = f(x_n)\}_{n \in \mathbb{N}} \subset f(X)$ converges, and let $y = \lim y_n$. Let $K_0 \subset Y$ be a compact neighborhood of $y$, so that $y_n \in K_0$ for all $n$ large enough. Now consider $K_f = K_0 \cap f(X)$ and $f^{-1}(K_f) = f^{-1}(K_0)$. Since $f$ is proper, $f^{-1}(K_f)$ is compact, so the sequence $\{x_n\}_{n \in \mathbb{N}} \subset X$ admits a convergent subsequence $\{x_{n_k}\}$. Let $x = \lim x_{n_k}$. By continuity, we have $y = \lim f(x_{n_k}) = f(x) \in f(X)$. Hence $f(X)$ is closed.

An interesting case is when $Y$ is connected, $f : X \to Y$ is proper and $f(X)$ is open and non-empty; then $f(X)$ is both open and closed, so $f$ must be surjective.
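Two concrete preimages illustrate Definition 17. They are only a sketch; the polynomial $P(x) = x^2 - 1$ and the bounded function $g(x) = x/(1 + x^2)$ are chosen here for the example and are not taken from the text.

```latex
% A non-constant polynomial: the preimage of a compact interval is compact
\[
P(x) = x^2 - 1, \quad K = [0, 3]: \qquad
P^{-1}(K) = \{ x \in \mathbb{R} \mid 1 \le x^2 \le 4 \} = [-2, -1] \cup [1, 2] .
\]
% A bounded continuous map R -> R: the preimage of a compact interval is unbounded
\[
g(x) = \frac{x}{1 + x^2}, \quad K = [0, \tfrac{1}{2}]: \qquad
g^{-1}(K) = [0, \infty),
\quad \text{since } 0 \le g(x) \le \tfrac{1}{2} \text{ exactly when } x \ge 0 .
\]
```

So $P$ behaves as a proper map on this compact set, while $g$ is not proper; note also that $g(\mathbb{R}) = [-\tfrac{1}{2}, \tfrac{1}{2}]$ is closed, so having closed range does not imply properness.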
Exercises
Let $X$ and $Y$ be locally compact Hausdorff spaces.
(1) Show that if a map $f : X \to Y$ is a local homeomorphism and is proper, then $f$ is surjective.
(2) Assume $X$ and $Y$ are non-compact. Take compact subsets $K_X \subset X$ and $K_Y \subset Y$ and let $\widehat{X} = X \cup K_X$ and $\widehat{Y} = Y \cup K_Y$ be the compactifications, respectively. Show that if a map $f : X \to Y$ admits a continuous extension $\hat{f} : \widehat{X} \to \widehat{Y}$, then $f$ is proper.
(3) Show that the immersion $\gamma : \mathbb{R} \to T^2$ given by the Example 42 in Chap. 1 is not proper if $r \in \mathbb{R} \setminus \mathbb{Q}$.
6 Equicontinuity and the Ascoli-Arzelà Theorem

Consider $(X, d)$ a complete metric space.

Definition 18 Let $F$ be a Banach space. A subset $C \subset C^0(X; F)$ is equicontinuous at $x_0 \in X$ if, given $\epsilon > 0$, there is $\delta > 0$ such that, for all $f \in C$ and all $x \in X$, if $d(x, x_0) < \delta$, then $\| f(x) - f(x_0) \|_F < \epsilon$. $C$ is equicontinuous in $X$ if it is equicontinuous at every $x \in X$.

The concept of equicontinuity is a subtle one; it must be observed in the definition that the value of $\delta$ is independent of $f \in C$. Indeed, if the set $C$ is equicontinuous on a compact set, then every $f \in C$ is uniformly continuous, since given $\epsilon > 0$ there is a $\delta > 0$, independent of the point, that serves all functions belonging to $C$. The sequence $f_n : [0, 1] \to \mathbb{R}$, $f_n(x) = x^n$, is not equicontinuous, since near $x = 1$ there is no such $\delta$.

Let $K \subset X$ be a compact subset and let $\{f_n\}_{n \in \mathbb{N}} \subset C^0(K, \mathbb{C})$ be a uniformly bounded sequence; that is, there is $M > 0$ such that $|f_n(x)| < M$ for all $x \in K$ and $n \in \mathbb{N}$. If $\{f_n\}_{n \in \mathbb{N}}$ converges uniformly to $f \in C^0(K, \mathbb{C})$, then $\{f_n\}_{n \in \mathbb{N}}$ is equicontinuous, as we can see as follows: given $\epsilon > 0$, consider $n_0 \in \mathbb{N}$ such that $\| f_n - f_m \|_0 < \epsilon$ for all $n, m \ge n_0$. The equicontinuity then follows from the inequality
\[ | f_n(y) - f_n(x) | \le | f_n(y) - f_{n_0}(y) | + | f_{n_0}(y) - f_{n_0}(x) | + | f_{n_0}(x) - f_n(x) | . \]
According to Theorem 4, every continuous function defined on a compact set is uniformly continuous. Taking $\delta > 0$ such that $| f_{n_0}(y) - f_{n_0}(x) | < \epsilon$ for all $x, y \in K$ with $d(x, y) < \delta$, we get $| f_n(y) - f_n(x) | < 3\epsilon$ for all $n \ge n_0$. The finitely many functions $f_1, \dots, f_{n_0 - 1}$ are also uniformly continuous, so $\delta$ can be reduced to serve them as well.
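The failure of equicontinuity of $f_n(x) = x^n$ at $x = 1$ can be observed numerically. The following is a minimal sketch; the helper names `gap_at_one` and `sup_gap` and the sampled values of $\delta$ are assumptions made for illustration. For a fixed $\delta$, the quantity $|f_n(1) - f_n(1 - \delta)| = 1 - (1 - \delta)^n$ approaches $1$ as $n$ grows, so no single $\delta$ keeps it below a given $\epsilon < 1$ for every $n$.

```python
def gap_at_one(n, delta):
    """Return |f_n(1) - f_n(1 - delta)| for f_n(x) = x**n."""
    return 1.0 - (1.0 - delta) ** n

def sup_gap(delta, n_max):
    """Largest gap over n = 1, ..., n_max (the gap increases with n)."""
    return max(gap_at_one(n, delta) for n in range(1, n_max + 1))

# For each fixed delta the supremum over n stays close to 1,
# even though each individual f_n is continuous at x = 1.
for delta in (0.1, 0.01, 0.001):
    print(delta, gap_at_one(1, delta), sup_gap(delta, 20000))
```

Each single function satisfies the continuity estimate with its own $\delta$ (the middle column is small), but the supremum over the family does not shrink with $\delta$.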
Theorem 9 (Ascoli-Arzelà) Let $K$ be a compact subset of a metric space $X$ and let $F$ be a Banach space. Let $E = (C^0(K, F), \| \cdot \|_0)$ and let $C \subset E$ be a subset. Then $C$ is relatively compact in $E$ if and only if the following conditions are verified:
(i) $C$ is equicontinuous.
(ii) For every $x \in K$, the set $C(x) = \{ f(x) \mid f \in C \}$ is relatively compact in $F$.

Proof The conditions (i) and (ii) are necessary and sufficient given the following arguments.
(i) Necessity ($\Rightarrow$). Since $C$ is relatively compact in $C^0(K; F)$, given $\epsilon > 0$ there is a finite subset $\{f_1, \dots, f_p\} \subset C$ such that $C \subset \bigcup_{i=1}^{p} B_i$, where $B_i = B_\epsilon(f_i) = \{ f \in C^0(K; F) \mid \| f - f_i \|_0 < \epsilon \}$. Since $K$ is compact, we can take a finite set $\{x_1, \dots, x_p\} \subset K$ and a finite cover $\bigcup_{i=1}^{p} V_i$ of $K$, in which the $V_i$ are open neighborhoods of $x_i$. Given $\epsilon > 0$, there are $\delta_1, \dots, \delta_p$ such that we can choose $V_i = \{ x \in K \mid d(x, x_i) < \delta_i \}$. Let $\delta = \min_{1 \le i \le p} \{\delta_i\}$. It follows that $C$ is equicontinuous and $C(x)$ is relatively compact.
(ii) Sufficiency ($\Leftarrow$). Given $\epsilon > 0$, consider the finite cover $\bigcup_{i=1}^{p} V_i$ of $K$ defined in the last item. Since each set $C(x_i)$ is relatively compact, it follows that the finite union $U = C(x_1) \cup \dots \cup C(x_p)$ is relatively compact. Consider the open balls $B_\epsilon(a_1), \dots, B_\epsilon(a_t)$ of radius $\epsilon$ centered at the points $a_1, \dots, a_t$, respectively, such that $U \subset \bigcup_{i=1}^{t} B_\epsilon(a_i)$. We assume that $f(x_i) \in B_\epsilon(a_i)$, or equivalently, $\| f(x_i) - a_i \| < \epsilon$. Defining $C_i = \{ f \in C ; \ \| f(x_i) - a_i \| < \epsilon \}$, it is straightforward that $C$ admits a finite cover $\bigcup_{i=1}^{p} C_i$. In order to prove that $C_i$ is bounded for every $i$, it will be enough to verify that its diameter is bounded above by $4\epsilon$. Now if $f, g \in C_i$ and $x \in V_i$, then
\[ \| f(x) - g(x) \| \le \| f(x) - f(x_i) \| + \| f(x_i) - a_i \| + \| a_i - g(x_i) \| + \| g(x_i) - g(x) \| \le 4\epsilon . \]
The Ascoli-Arzelà theorem is useful to prove several results. In most cases studied in this book either $F = \mathbb{R}^n$ or $F = \mathbb{C}^n$, so condition (ii) can be replaced by condition (ii') in which we assume that $C$ is a bounded set: $F$ being finite dimensional, every bounded subset of $F$ is relatively compact. An important application is to prove the compactness of the embedding $F_{i,i-1} \hookrightarrow E_{i-1}$ mentioned earlier.

Theorem 10 Let $K = [a, b] \subset \mathbb{R}$ be a compact interval and $\{f_n\}_{n \in \mathbb{N}} \subset C^0(K, \mathbb{C})$. If $\{f_n\}_{n \in \mathbb{N}}$ converges uniformly to $f$, then
\[ \int_a^b f(x) \, dx = \lim_{n \to \infty} \int_a^b f_n(x) \, dx . \]
Proof See Ref. [38].
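The uniform convergence hypothesis in Theorem 10 cannot be weakened to pointwise convergence. The following worked example is a sketch of a standard counterexample; the sequence $f_n(x) = n^2 x (1 - x)^n$ on $[0, 1]$ is chosen here for illustration and is not taken from the text.

```latex
% Pointwise limit: f_n(0) = 0 and, for 0 < x <= 1, n^2 (1 - x)^n -> 0, so f_n -> 0 pointwise on [0,1]
\[
f_n(x) = n^2 x (1 - x)^n \longrightarrow 0 \quad \text{for every } x \in [0, 1] .
\]
% Integrals: \int_0^1 x (1 - x)^n dx = 1 / ((n+1)(n+2))
\[
\int_0^1 f_n(x) \, dx = \frac{n^2}{(n + 1)(n + 2)} \longrightarrow 1
\;\ne\; 0 = \int_0^1 \lim_{n \to \infty} f_n(x) \, dx .
\]
```

The convergence is not uniform here: the maximum of $f_n$ is attained at $x = 1/(n+1)$ and grows without bound as $n \to \infty$.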
Theorem 11 Let $K \subset \mathbb{R}$ be a compact interval and let $\{f_n\}_{n \in \mathbb{N}} \subset C^1(K, \mathbb{C})$ be a sequence such that $\{f_n(x_0)\}$ converges for some $x_0 \in K$. If $\{f_n'\}_{n \in \mathbb{N}}$ converges uniformly in $K$, then $\{f_n\}$ converges uniformly to a function $f \in C^1(K, \mathbb{C})$. Moreover, $f'(x) = \lim_{n \to \infty} f_n'(x)$ for all $x \in K$.

Proof Essentially, the argument follows from the fact that each $f_n'$ is continuous, so the claim follows from the Fundamental Theorem of Calculus. See Ref. [38].
7 Functional Analysis Theorems

Functional analysis starts off when we observe that certain topological properties of finite dimensional spaces, and also some properties of the linear operators between them, are no longer true when we consider vector spaces of infinite dimension. For example, convergence becomes a very relevant concept, and many topological properties are sensitive to the norm defined on the space. Topological properties are very different in infinite dimension, e.g., the open ball is no longer relatively compact, and the closed ball, as well as the spheres, are not compact. Not every vector subspace is closed, so a topological complementary subspace may not exist and the decomposition as a direct sum may fail. In general, the Rank and Nullity Theorem is false. In finite dimension all norms are equivalent, which is false in infinite dimension. Next, we outline some results used throughout the text.
7.1 Riesz and Hahn-Banach Theorems

For the purposes of our applications in this book, the following versions of Riesz's Representation Theorem (RRT) are stated; they can be found in several analysis textbooks. A more general version of the theorem is in [28].

Theorem 12 Let $E$ be a Hilbert space and $f : E \to \mathbb{R}$ a bounded linear functional. Then there is some $v_f \in E$ such that, for all $g \in E$, $f(g) = \langle v_f, g \rangle$. Moreover, $| f | = | v_f |_E$.

There is no Riesz representation theorem for an arbitrary Banach space.

Theorem 13 Let $X$ be a Hausdorff and locally compact topological space. Consider $C^0(X) = \{ f : X \to \mathbb{R} \mid f \text{ continuous} \}$ and $T : C^0(X) \to \mathbb{R}$ a bounded linear functional such that $T(f) \ge 0$ whenever $f \ge 0$. Then there is a Radon measure $\mu$ defined on the $\sigma$-algebra of Borel subsets $\mathcal{B}(X)$ such that
\[ T(f) = \int_X f \, d\mu, \qquad f \in C^0(X) . \]
Theorem 14 (Hahn-Banach) Let E be a normed vector space and let F ⊂ E be a subspace. If θ : F → R is a continuous linear functional such that | θ |< C, then there is an extension θ˜ : E → R of θ such that | θ˜ |< C.
Proof See Ref. [28].
Corollary 1 Let $E$ be a normed vector space and $v \in E$ a non-null vector. Then there is a continuous functional $\theta : E \to \mathbb{R}$ such that $\theta(v) \ne 0$.

Proof Let $F$ be the subspace generated by $v$. Consider $\theta_v : F \to \mathbb{R}$, given by $\theta_v(v) = 1$ and extended linearly to $F$, and extend it (use the Hahn-Banach Theorem) to the functional $\theta : E \to \mathbb{R}$.

In finite dimension, the Hahn-Banach theorem is trivial: if $v \ne 0$, then one of its coordinates must be non-null, say $v_i \ne 0$. Then take $\theta = \pi_i$, the projection onto the $i$-th coordinate.

The following theorem gives a necessary and sufficient topological condition for the dimension of a normed space to be finite.

Theorem 15 Let $E$ be a normed space. $E$ has finite dimension if, and only if, it is locally compact.

Proof Necessity follows from the Heine-Borel theorem, which states that any bounded and closed set of a finite-dimensional normed space is compact. We will prove sufficiency. Assume $E$ is locally compact and consider $B = \{ x \in E ; \ |x| \le 1 \}$. Since $B$ is compact, for each $r > 0$ there is a finite set of points $\{p_1, \dots, p_m\} \subset B$ such that $B \subset \bigcup_{i=1}^{m} B_r(p_i)$. Consider $F$ the vector subspace generated by $p_1, \dots, p_m$. The claim is proved if $F = E$. By contradiction, suppose that $F \subsetneq E$. Let $x \notin F$; since $F$ is a closed subspace of $E$, there is $a \in F$ such that $\mathrm{dist}(x, F) = \min_{y \in F} | x - y | = | x - a | > 0$. Assume the point $\frac{x - a}{|x - a|} \in B$ is contained in the ball $B_r(p_l)$:
\[ \Big| \frac{x - a}{| x - a |} - p_l \Big| \le r . \]
In this way, $| x - (a + | x - a | \, p_l) | \le r \, | x - a |$. The vector $a_l = a + | x - a | \, p_l$ belongs to $F$. Now taking $r = 1/2$, we get
\[ | x - a_l | \le \frac{| x - a |}{2} , \]
and so $\mathrm{dist}(x, a_l) < \mathrm{dist}(x, F)$, which is a contradiction. Hence $F = E$.
Corollary 2 Let $E$ be a normed space. Let $\{x_n\}_{n \in \mathbb{N}} \subset E$ be an arbitrary sequence and let $F(n)$ be the subspace generated by $\{x_1, \dots, x_n\}$. $E$ has infinite dimension if, and only if, there is a sequence $\{y_n\}$ with the following properties:
(i) $y_n \in F(n)$,
(ii) $| y_n | = 1$,
(iii) $| y_n - x | \ge 1/2$ for all $x \in F(n - 1)$.
7.2 Topological Complementary Subspace

Let $E$ and $F$ be Banach spaces and let $V \subset E$ be a linear subspace.

Definition 19 A linear subspace $W \subset E$ is:
(i) an algebraic complement of $V$ if $V \cap W = \{0\}$ and $E = V + W$; that is, $E = V \oplus W$;
(ii) a topological complement of $V$ if $W$ is an algebraic complement of $V$ and is also a closed subset of $E$.

Proposition 4 A closed linear subspace $V \subset E$ has a topological complement if and only if there is a continuous projection $P : E \to E$ with $P(E) = V$.

Proof (i) ($\Rightarrow$) Suppose $V$ has a topological complement $W \subset E$. Since $E = V \oplus W$, every $x \in E$ can be written uniquely as $x = v_x + w_x$, with $v_x \in V$ and $w_x \in W$. Define $P : E \to E$ by $P(x) = v_x$. Then $P$ is linear and $P^2 = P$, since $P^2(x) = P(P(x)) = P(v_x) = v_x$. Clearly, $P(E) = V$. Since $V$ and $W$ are both closed, the continuity of $P$ follows from the Closed Graph Theorem.
(ii) ($\Leftarrow$) Suppose there is a continuous idempotent linear operator $P : E \to E$ with $P(E) = V$. Let $W = \mathrm{Ker}(P)$. Then $W = \mathrm{Ker}(P) = (I - P)(E)$ is closed, since $W = P^{-1}(0)$ and $P$ is a continuous map. Furthermore, $E = P(E) \oplus (I - P)(E) = V \oplus W$. Hence $W$ is a topological complement of $V$.
Proposition 5 If $V \subset E$ is finite dimensional, then $V$ has a topological complement.

Proof Assume that $\dim(V) = n$ and let $\beta = \{e_1, \dots, e_n\}$ be a basis for $V$. Every $v \in V$ can be written as a linear combination $v = v_1 e_1 + \dots + v_n e_n$, with the coefficients $v_i \in \mathbb{C}$ (or $v_i \in \mathbb{R}$). The linear functionals $\theta_i : V \to \mathbb{C}$ given by $\theta_i(v) = v_i$, $1 \le i \le n$, are continuous since $V$ is finite dimensional. By the Hahn-Banach Theorem, each functional can be extended to a functional $\Theta_i : E \to \mathbb{C}$ such that:
(a) $\Theta_i(v) = \theta_i(v)$, for all $v \in V$;
(b) $| \Theta_i | = | \theta_i |$.
Let $W \subset E$ be defined by
\[ W = \bigcap_{i=1}^{n} \mathrm{Ker}(\Theta_i) . \]
Let us check that $W$ is a topological complement of $V$.
- $W$ is closed, since each $\mathrm{Ker}(\Theta_i)$ is closed.
- $V \cap W = \{0\}$. Assume $x \in V \cap W$, so $x = \sum_{i=1}^{n} a_i e_i \in V$ and $\Theta_i(x) = a_i = 0$ for all $1 \le i \le n$, since $x \in W$. Therefore $V \cap W = \{0\}$.
- $E = V + W$. From the decomposition
\[ x = \underbrace{\sum_{i=1}^{n} \Theta_i(x) e_i}_{\in V} + \underbrace{\Big( x - \sum_{i=1}^{n} \Theta_i(x) e_i \Big)}_{\in W} , \]
we have $E = V + W$. Now we can project $E$ onto $W$ by $P : E \to W$, $P(x) = x - \sum_{i=1}^{n} \Theta_i(x) e_i$. Therefore $W$ is a topological complement of $V$.
8 The Contraction Lemma

Let $(X, d)$ be a complete metric space. A map $\phi : X \to X$ is a contraction if there is $\lambda \in \mathbb{R}$ such that $0 < \lambda < 1$ and $d(\phi(x'), \phi(x)) \le \lambda \, d(x', x)$, for all $x', x \in X$. Fix $x \in X$ and consider the sequence $\{\phi^n(x)\}_{n \in \mathbb{N}}$ given by $\phi^n(x) = (\underbrace{\phi \circ \dots \circ \phi}_{n})(x)$. For every pair $x', x \in X$, we have the inequalities
(i) $d(\phi^n(x'), \phi^n(x)) \le \lambda^n \, d(x', x)$,
(ii) $d(\phi^{n+1}(x), \phi^n(x)) = d(\phi^n(\phi(x)), \phi^n(x)) \le \lambda^n \, d(\phi(x), x)$.

Proposition 6 If $\phi : X \to X$ is a contraction, then the sequence $\{\phi^n(x)\}_{n \in \mathbb{N}}$ converges for all $x \in X$.

Proof It is sufficient to prove that $\{\phi^n(x)\}_{n \in \mathbb{N}}$ is a Cauchy sequence, which is achieved by proving the estimate
\[ d(\phi^n(x), \phi^m(x)) \le \lambda^m \, \frac{d(\phi(x), x)}{1 - \lambda}, \qquad n > m . \]
Taking $n = m + p$, we get
\[
\begin{aligned}
d(\phi^n(x), \phi^m(x))
&\le d(\phi^{m+p}(x), \phi^{m+p-1}(x)) + d(\phi^{m+p-1}(x), \phi^{m+p-2}(x)) + \dots + d(\phi^{m+1}(x), \phi^m(x)) \\
&= d(\phi^{m+p-1}(\phi(x)), \phi^{m+p-1}(x)) + \dots + d(\phi^m(\phi(x)), \phi^m(x)) \\
&\le (\lambda^{m+p-1} + \lambda^{m+p-2} + \dots + \lambda^m) \, d(\phi(x), x) \le \lambda^m \, \frac{d(\phi(x), x)}{1 - \lambda} .
\end{aligned}
\]
Define $C = \frac{d(\phi(x), x)}{1 - \lambda}$. Since $0 < \lambda < 1$, given $\epsilon > 0$ there is $n_0 \in \mathbb{N}$ such that $C \lambda^m < \epsilon$ for all $m > n_0$. Therefore $d(\phi^n(x), \phi^m(x)) \le \epsilon$ for all $n, m > n_0$. Since $X$ is complete, the sequence $\{\phi^n(x)\}$ converges for all $x \in X$.
Due to the pointwise convergence of the sequence $\{\phi^n(x)\}$ for all $x \in X$, define
\[ \xi(x) = \lim_{n \to \infty} \phi^n(x) . \tag{15} \]
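Proposition 7 The point $\xi(x)$ does not depend on $x$; moreover, it is the only fixed point of $\phi$.

Proof The fact that $\xi(x)$ is a fixed point of $\phi$ is an immediate consequence of the continuity of $\phi$:
\[ \xi(x) = \lim \phi^{n+1}(x) = \lim \phi(\phi^n(x)) = \phi(\lim \phi^n(x)) = \phi(\xi(x)) . \]
Evaluating the limit $n \to \infty$ on both sides of the inequality $d(\phi^n(x'), \phi^n(x)) \le \lambda^n \, d(x', x)$, it follows that $\xi(x') = \xi(x) = \xi$. Finally, if $\eta$ is any fixed point of $\phi$, then $\phi^n(\eta) = \eta$ for all $n$, so $\eta = \xi(\eta) = \xi$.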
Combining the claims proved so far, we have a very useful consequence.

Lemma 1 (Contraction Lemma) Let $(X, d)$ be a complete metric space and let $\phi : X \to X$ be a contraction. Then $\phi$ has a unique fixed point in $X$.

Corollary 3 If there is $n_0 \in \mathbb{N}$ such that $\phi^{n_0} : X \to X$ is a contraction, then $\phi : X \to X$ has a unique fixed point.

Proof Let $\xi$ be the unique fixed point of $\phi^{n_0}$, so $\phi(\xi) = \phi(\phi^{n_0}(\xi)) = \phi^{1 + n_0}(\xi) = \phi^{n_0}(\phi(\xi))$. Therefore $\phi(\xi)$ is a fixed point of $\phi^{n_0}$, hence $\phi(\xi) = \xi$. Uniqueness holds because every fixed point of $\phi$ is also a fixed point of $\phi^{n_0}$.
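The iteration behind the Contraction Lemma and the a priori estimate $d(\phi^m(x), \xi) \le \lambda^m \, d(\phi(x), x)/(1 - \lambda)$, obtained by letting $n \to \infty$ in the estimate of Proposition 6, can be tried numerically. The following is a minimal sketch under stated assumptions: the map $\phi = \cos$ on $X = [0, 1]$, the constant $\lambda = \sin(1)$ (a bound for $|\cos'|$ on $[0, 1]$), and the helper name `iterate` are illustrative choices, not taken from the text.

```python
import math

def iterate(phi, x, n):
    """Return the orbit [x, phi(x), ..., phi^n(x)]."""
    orbit = [x]
    for _ in range(n):
        x = phi(x)
        orbit.append(x)
    return orbit

# cos maps [0, 1] into itself and |cos'(x)| = |sin(x)| <= sin(1) < 1 there,
# so it is a contraction of the complete metric space [0, 1] with lambda = sin(1).
lam = math.sin(1.0)
orbit = iterate(math.cos, 0.0, 200)
xi = orbit[-1]                                # numerical approximation of the fixed point
C = abs(orbit[1] - orbit[0]) / (1.0 - lam)    # C = d(phi(x), x) / (1 - lambda)

# Compare the actual error with the a priori bound C * lambda**m.
for m in (0, 5, 10, 20):
    print(m, abs(orbit[m] - xi), C * lam ** m)

# A different starting point leads to the same limit, as in Proposition 7.
xi_other = iterate(math.cos, 1.0, 200)[-1]
```

The bound is usually pessimistic, since it only uses the worst-case Lipschitz constant on the whole space.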
Remark 1 The condition $0 < \lambda < 1$ cannot be extended to include the case $\lambda = 1$, as the following example shows: let $X = [1, \infty)$, $d(x, y) = | x - y |$ and $\phi(x) = x + \frac{1}{x}$. (i) | φ(x) − φ(y) |