252 26 3MB
English Pages 544 Year 2021
Calculus of Real and Complex Variables Kuttler [email protected] January 11, 2021
2
CONTENTS I
Preliminary Topics
1
1 Basic Notions 1.1 Sets and Set Notation . . . . . . . . . . 1.2 The Schroder Bernstein Theorem . . . . 1.3 Equivalence Relations . . . . . . . . . . 1.4 The Hausdorff Maximal Theorem . . . . 1.4.1 The Hamel Basis . . . . . . . . . 1.5 Analysis of Real and Complex Numbers 1.5.1 Roots Of Complex Numbers . . . 1.5.2 The Complex Exponential . . . . 1.5.3 The Cauchy Schwarz Inequality . 1.6 Polynomials and Algebra . . . . . . . . 1.7 The Fundamental Theorem of Algebra . 1.8 Some Topics from Analysis . . . . . . . 1.8.1 lim sup and lim inf . . . . . . . . 1.8.2 Nested Interval Lemma . . . . . 1.9 Exercises . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
3 3 4 8 8 11 11 12 13 14 15 20 21 24 26 27
2 Basic Topology and Algebra 2.1 Some Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Closed and Open Sets . . . . . . . . . . . . . . . . . . . . . . 2.4 Sequences and Cauchy Sequences . . . . . . . . . . . . . . . . 2.5 Separability and Complete Separability . . . . . . . . . . . . 2.6 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6.1 Continuous Functions . . . . . . . . . . . . . . . . . . 2.6.2 Limits of Vector Valued Functions . . . . . . . . . . . 2.6.3 The Extreme Value Theorem and Uniform Continuity 2.6.4 Convergence of Functions . . . . . . . . . . . . . . . . 2.6.5 Multiplication of Series . . . . . . . . . . . . . . . . . 2.7 Tietze Extension Theorem . . . . . . . . . . . . . . . . . . . . 2.8 Root Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.9 Equivalence of Norms . . . . . . . . . . . . . . . . . . . . . . 2.10 Norms on Linear Maps . . . . . . . . . . . . . . . . . . . . . . 2.11 Connected Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 2.12 Stone Weierstrass Approximation Theorem . . . . . . . . . . 2.12.1 The Bernstein Polynomials . . . . . . . . . . . . . . . 2.12.2 The Case of Compact Sets . . . . . . . . . . . . . . . . 2.12.3 The Case of a Closed Set in Rp . . . . . . . . . . . . . 2.12.4 The Case Of Complex Valued Functions . . . . . . . . 2.13 Brouwer Fixed Point Theorem∗ . . . . . . . . . . . . . . . . . 2.14 Simplices and Triangulations . . . . . . . . . . . . . . . . . . 2.15 Labeling Vertices . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
31 31 34 35 39 40 42 47 51 52 53 58 59 61 62 63 67 71 71 73 74 77 78 78 81
3
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
CONTENTS
4
2.16 The Brouwer Fixed Point Theorem . . . . . . . . . . . . . . . . . . . . . . . . 83 2.17 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
II
Real Analysis
3 The 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10
91
Derivative, a Linear Transformation Basic Definitions . . . . . . . . . . . . . . The Chain Rule . . . . . . . . . . . . . . . The Matrix of the Derivative . . . . . . . Existence of the Derivative, C 1 Functions Mixed Partial Derivatives . . . . . . . . . A Cofactor Identity . . . . . . . . . . . . . Implicit Function Theorem . . . . . . . . More Continuous Partial Derivatives . . . Invariance of Domain . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
93 93 95 96 97 98 99 101 105 106 108
4 Line Integrals 4.1 Existence and Definition . . . . . . . . . . 4.1.1 Change of Parameter . . . . . . . . 4.1.2 Existence . . . . . . . . . . . . . . 4.1.3 The Riemann Integral . . . . . . . 4.2 Estimates and Approximations . . . . . . 4.2.1 Finding the Length of a C 1 Curve 4.2.2 Curves Defined in Pieces . . . . . . 4.2.3 A Physical Application, Work . . . 4.3 Conservative Vector Fields . . . . . . . . . 4.4 Orientation . . . . . . . . . . . . . . . . . 4.5 Exercises . . . . . . . . . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
111 111 112 113 114 116 121 122 123 124 127 132
5 Measures And Measurable Functions 5.1 Measurable Functions . . . . . . . . 5.2 Measures and Their Properties . . . 5.3 Dynkin’s Lemma . . . . . . . . . . . 5.4 Measures and Outer Measures . . . . 5.5 An Outer Measure on P (R) . . . . . 5.6 Measures from Outer Measures . . . 5.7 When is a Measure a Borel Measure? 5.8 One Dimensional Lebesgue Measure 5.9 Exercises . . . . . . . . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
137 137 141 142 144 144 146 150 151 153
6 The Abstract Lebesgue Integral 6.1 Definition For Nonnegative Measurable Functions . . . 6.1.1 Riemann Integrals For Decreasing Functions . . . . 6.1.2 The Lebesgue Integral For Nonnegative Functions 6.2 The Lebesgue Integral for Nonnegative Simple Functions . 6.3 The Monotone Convergence Theorem . . . . . . . . . . . 6.4 Other Definitions . . . . . . . . . . . . . . . . . . . . . . . 6.5 Fatou’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . 6.6 The Integral’s Righteous Algebraic Desires . . . . . . . . . 6.7 The Lebesgue Integral, L1 . . . . . . . . . . . . . . . . . . 6.8 The Dominated Convergence Theorem . . . . . . . . . . . 6.9 Product Measures . . . . . . . . . . . . . . . . . . . . . . 6.10 Some Important General Theorems . . . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
157 157 157 158 159 160 161 162 162 163 167 169 172
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
CONTENTS 6.10.1 Eggoroff’s Theorem . . . . . . . 6.10.2 The Vitali Convergence Theorem 6.10.3 Radon Nikodym Theorem . . . . 6.11 Exercises . . . . . . . . . . . . . . . . .
5 . . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
7 Positive Linear Functionals 7.1 Partitions of Unity . . . . . . . . . . . . . . . . . . . 7.2 Positive Linear Functionals and Measures . . . . . . 7.3 Lebesgue Measure . . . . . . . . . . . . . . . . . . . 7.4 Computation with Iterated Integrals . . . . . . . . . 7.5 Approximation with Gδ and Fσ Sets and Translation 7.6 The Vitali Covering Theorems . . . . . . . . . . . . 7.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
172 173 176 179
. . . . . . . . . . . . . . . . . . . . . . . . . . . . Invariance . . . . . . . . . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
183 183 185 190 192 194 195 199
8 Basic Function Spaces 8.1 Bounded Continuous Functions . . . . . . . . . . 8.2 Compactness in C (K, Rn ) . . . . . . . . . . . . . 8.3 The Lp Spaces . . . . . . . . . . . . . . . . . . . 8.4 Approximation Theorems . . . . . . . . . . . . . 8.5 Maximal Functions and Fundamental Theorem of 8.6 A Useful Inequality . . . . . . . . . . . . . . . . . 8.7 Exercises . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . Calculus . . . . . . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
207 207 212 214 218 219 222 223
9 Change of Variables 9.1 Lebesgue Measure and Linear Transformations 9.2 Change of Variables Nonlinear Maps . . . . . . 9.3 Mappings Which are Not One to One . . . . . 9.4 Spherical Coordinates In p Dimensions . . . . . 9.5 Approximation with Smooth Functions . . . . . 9.6 Continuity Of Translation . . . . . . . . . . . . 9.7 Separability . . . . . . . . . . . . . . . . . . . . 9.8 Exercises . . . . . . . . . . . . . . . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
227 228 231 235 236 239 242 243 245
10 Some Fundamental Functions and Transforms 10.1 Gamma Function . . . . . . . . . . . . . . . . . . . 10.2 Laplace Transform . . . . . . . . . . . . . . . . . . 10.3 Fourier Transform . . . . . . . . . . . . . . . . . . 10.4 Fourier Transforms in Rn . . . . . . . . . . . . . . 10.5 Fourier Transforms Of Just About Anything . . . . 10.5.1 Fourier Transforms in G ∗ . . . . . . . . . . 10.5.2 Fourier Transforms of Functions In L1 (Rn ) 10.5.3 Fourier Transforms of Functions In L2 (Rn ) 10.5.4 The Schwartz Class . . . . . . . . . . . . . 10.5.5 Convolution . . . . . . . . . . . . . . . . . . 10.6 Exercises . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
249 249 250 253 258 261 261 265 267 271 272 274
11 Degree Theory, an Introduction 11.1 Sard’s Lemma and Approximation . . . . . . . 11.2 Properties of the Degree . . . . . . . . . . . . . 11.3 Borsuk’s Theorem . . . . . . . . . . . . . . . . 11.4 Applications . . . . . . . . . . . . . . . . . . . . 11.5 Product Formula, Jordan Separation Theorem 11.6 The Jordan Separation Theorem . . . . . . . . 11.7 Exercises . . . . . . . . . . . . . . . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
277 279 285 287 291 294 297 301
. . . . . . . .
. . . . . . .
. . . . . . .
CONTENTS
6 12 Green’s Theorem 12.1 An Elementary Form of Green’s Theorem . . . . 12.2 Stoke’s Theorem . . . . . . . . . . . . . . . . . . 12.3 A General Green’s Theorem . . . . . . . . . . . . 12.4 The Jordan Curve Theorem∗ . . . . . . . . . . . 12.5 Green’s Theorem for a Rectifiable Jordan Curve∗ 12.6 Orientation of a Jordan Curve . . . . . . . . . . .
III
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
Abstract Analysis
. . . . . .
305 305 308 311 312 321 329
333
13 Banach Spaces 13.1 Theorems Based On Baire Category . . . . . . . . . . . . . . 13.1.1 Baire Category Theorem . . . . . . . . . . . . . . . . . 13.1.2 Uniform Boundedness Theorem . . . . . . . . . . . . . 13.1.3 Open Mapping Theorem . . . . . . . . . . . . . . . . . 13.1.4 Closed Graph Theorem . . . . . . . . . . . . . . . . . 13.2 Basic Theory of Hilbert Spaces . . . . . . . . . . . . . . . . . 13.3 Hahn Banach Theorem . . . . . . . . . . . . . . . . . . . . . . 13.3.1 Partially Ordered Sets . . . . . . . . . . . . . . . . . . 13.3.2 Gauge Functions And Hahn Banach Theorem . . . . . 13.3.3 The Complex Version Of The Hahn Banach Theorem 13.3.4 The Dual Space And Adjoint Operators . . . . . . . . 13.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
335 335 335 339 340 342 343 348 348 349 350 352 355
14 Representation Theorems 14.1 Radon Nikodym Theorem . 14.2 Vector Measures . . . . . . 14.3 The Dual Space of Lp (Ω) . 14.4 The Dual Space Of L∞ (Ω) 14.5 The Dual Space Of C0 (Rp ) 14.6 Exercises . . . . . . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
361 361 361 366 370 373 378
IV
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
Complex Analysis
15 Fundamentals 15.1 Banach Spaces . . . . . . . . . . . . . . . 15.2 The Cauchy Riemann Equations . . . . . 15.3 Contour Integrals . . . . . . . . . . . . . . 15.4 Primitives and Cauchy Goursat Theorem 15.5 Functions Differentiable on a Disk, Zeros . 15.6 The General Cauchy Integral Formula . . 15.7 Riemann sphere . . . . . . . . . . . . . . . 15.8 Exercises . . . . . . . . . . . . . . . . . .
383 . . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
385 385 386 388 395 401 407 408 410
16 Isolated Singularities and Analytic Functions 16.1 Open Mapping Theorem for Complex Valued Functions 16.2 Functions Analytic on an Annulus . . . . . . . . . . . . 16.3 The Complex Exponential and Winding Number . . . . 16.4 Cauchy Integral Formula for a Cycle . . . . . . . . . . . 16.5 An Example of a Cycle . . . . . . . . . . . . . . . . . . . 16.6 Isolated Singularities . . . . . . . . . . . . . . . . . . . . 16.7 The Residue Theorem . . . . . . . . . . . . . . . . . . . 16.8 Evaluation of Improper Integrals . . . . . . . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
417 417 420 423 425 429 433 435 437
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
CONTENTS
7
16.9 The Inversion of Laplace Transforms . . . . . . . . . . . . . . . . . . . . . . . 446 16.10Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450 17 Mapping Theorems 17.1 Meromorphic Functions . . . . . . . . . . . 17.2 Meromorphic on Extended Complex Plane . 17.3 Rouche’s Theorem . . . . . . . . . . . . . . 17.4 Fractional Linear Transformations . . . . . 17.5 Some Examples . . . . . . . . . . . . . . . . 17.6 Riemann Mapping Theorem . . . . . . . . . 17.6.1 Montel’s Theorem . . . . . . . . . . 17.6.2 Regions with Square Root Property 17.7 Exercises . . . . . . . . . . . . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
455 455 457 458 459 460 463 464 465 468
18 Spectral Theory of Linear Maps ∗ 471 18.1 The Resolvent and Spectral Radius . . . . . . . . . . . . . . . . . . . . . . . . 471 18.2 Functions of Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . 476 18.3 Invariant Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480 A Review of Linear Algebra A.1 Systems of Equations . . . . . . . . . . . . . . . . . A.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . A.3 Subspaces and Spans . . . . . . . . . . . . . . . . . A.4 Application to Matrices . . . . . . . . . . . . . . . A.5 Mathematical Theory of Determinants . . . . . . . A.5.1 The Function sgn . . . . . . . . . . . . . . . A.5.2 Determinants . . . . . . . . . . . . . . . . . A.5.3 Definition of Determinants . . . . . . . . . A.5.4 Permuting Rows or Columns . . . . . . . . A.5.5 A Symmetric Definition . . . . . . . . . . . A.5.6 Alternating Property of the Determinant . A.5.7 Linear Combinations and Determinants . . A.5.8 Determinant of a Product . . . . . . . . . . A.5.9 Cofactor Expansions . . . . . . . . . . . . . A.5.10 Formula for the Inverse . . . . . . . . . . . A.5.11 Cramer’s Rule . . . . . . . . . . . . . . . . A.5.12 Upper Triangular Matrices . . . . . . . . . A.6 Cayley-Hamilton Theorem . . . . . . . . . . . . . . A.7 Eigenvalues and Eigenvectors of a Matrix . . . . . A.7.1 Definition of Eigenvectors and Eigenvalues . A.7.2 Triangular Matrices . . . . . . . . . . . . . A.7.3 Defective and Nondefective Matrices . . . . A.7.4 Diagonalization . . . . . . . . . . . . . . . . A.8 Schur’s Theorem . . . . . . . . . . . . . . . . . . . A.9 Hermitian Matrices . . . . . . . . . . . . . . . . . . A.10 Right Polar Factorization . . . . . . . . . . . . . . A.11 Direct Sums . . . . . . . . . . . . . . . . . . . . . . A.12 Block Diagonal Matrices . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
485 485 486 489 492 493 493 495 495 495 496 497 497 498 499 500 501 502 502 504 504 505 505 508 510 512 513 515 519
8
CONTENTS
CONTENTS
i
c 2018, You are welcome to use this, including copying it for use in classes Copyright or referring to it on line but not to publish it for money. I do constantly upgrade it when I find things which could be improved.
ii
CONTENTS
Preface This book is on multi-variable advanced calculus with an introduction to complex analysis. I assume the reader has had a course in calculus which does an honest job of presenting the Riemann integral of a function of one variable and knows the usual things about completeness of R and its algebraic properties although these things are reviewed. This usually implies having had a reasonably good course in advanced calculus for functions of one variable, but my book Calculus of one and many variables would suffice. Also, it is expected that the reader knows the usual elementary things found in an undergraduate linear algebra course, such as row operations and linear transformations, and linear independence, although there is a review of these kinds of topics in the appendix. I also assume the reader has knowledge of math induction and well ordering and the topics in a typical pre-calculus course. If not, read the first part of my calculus book. I also assume the reader is familiar with the pre-calculus topics involving C, the complex numbers. If not, see my single variable advanced calculus book or my pre-calculus book http://www.centerofmath.org/textbooks/pre calc/index.html (2012). I have included the Lebesgue integral instead of the more usual Riemann integral because it seems to me that the Riemann integral is too technically difficult. If you include a rigorous treatment of this integral, you end up fussing over things like Jordan content of boundary, things which are not an issue in the Lebesgue theory. You end up working much harder for an inferior result. I think a strong argument can be made to feature the generalized Riemann integral. However, I decided on the Lebesgue integral because of its central role in probability. This book has a modern approach to real analysis and an introduction to complex analysis which includes those things which are of most interest to me. The main direction in the complex analysis part is toward classical nineteenth century analysis although it does include an introduction to methods of complex analysis in spectral theory of operators on a Banach space. I am presenting some very interesting theorems more than once. I think it is good to see different ways of proving them. Often these theorems appear for the first time in the exercises. Sometimes it seems like complex analysis is unrelated to real analysis because of the lack of pathology. I am trying to merge the two and point out similarities as well as differences. I hope that by doing so, better understanding of both subjects will be acquired. I have chosen to present the theory in a way that includes the case that the function is only continuous on the contour and analytic on the inside. I have also only assumed that the contours have finite total variation. I realize that one can get all of the main theorems in this subject by considering only piecewise C 1 curves which lie in an open set on which the function is analytic, but I think the extra effort is justified because it is less fussy, and it leads to formulations of major theorems in the manner which was originally conceived by those who invented the subject. This is hard to do and still keep the presentation rigorous because it requires things like the Jordan curve theorem, but I think it is worth the extra effort and helps to tie the material on complex analysis to important topological theorems as well as the theorems of real analysis which is the purpose for the book. However, the more usual treatment will be clear. A more elementary introduction is in my single variable advanced calculus book.
iii
iv
CONTENTS
Part I
Preliminary Topics
1
Chapter 1
Basic Notions The reader should be familiar with most of the topics in this chapter. However, it is often the case that set notation is not familiar and so a short discussion of this is included first. Complex numbers are then considered in somewhat more detail. This book is on analysis of functions of real variables or a complex variable. The basic arithmetic of complex numbers needs to be well understood at the outset.
1.1
Sets and Set Notation
A set is just a collection of things called elements. Often these are also referred to as points in calculus. For example {1, 2, 3, 8} would be a set consisting of the elements 1,2,3, and 8. To indicate that 3 is an element of {1, 2, 3, 8} , it is customary to write 3 ∈ {1, 2, 3, 8} . 9 ∈ / {1, 2, 3, 8} means 9 is not an element of {1, 2, 3, 8} . Sometimes a rule specifies a set. For example you could specify a set as all integers larger than 2. This would be written as S = {x ∈ Z : x > 2} . This notation says: the set of all integers, x, such that x > 2. If A and B are sets with the property that every element of A is an element of B, then A is a subset of B. For example, {1, 2, 3, 8} is a subset of {1, 2, 3, 4, 5, 8} , in symbols, {1, 2, 3, 8} ⊆ {1, 2, 3, 4, 5, 8} . It is sometimes said that “A is contained in B” or even “B contains A”. The same statement about the two sets may also be written as {1, 2, 3, 4, 5, 8} ⊇ {1, 2, 3, 8}. The union of two sets is the set consisting of everything which is an element of at least one of the sets, A or B. As an example of the union of two sets {1, 2, 3, 8} ∪ {3, 4, 7, 8} = {1, 2, 3, 4, 7, 8} because these numbers are those which are in at least one of the two sets. In general A ∪ B ≡ {x : x ∈ A or x ∈ B} . Be sure you understand that something which is in both A and B is in the union. It is not an exclusive or. The intersection of two sets, A and B consists of everything which is in both of the sets. Thus {1, 2, 3, 8} ∩ {3, 4, 7, 8} = {3, 8} because 3 and 8 are those elements the two sets have in common. In general, A ∩ B ≡ {x : x ∈ A and x ∈ B} . The symbol [a, b] where a and b are real numbers, denotes the set of real numbers x, such that a ≤ x ≤ b and [a, b) denotes the set of real numbers such that a ≤ x < b. (a, b) consists of the set of real numbers x such that a < x < b and (a, b] indicates the set of numbers x such that a < x ≤ b. [a, ∞) means the set of all numbers x such that x ≥ a and (−∞, a] means the set of all real numbers which are less than or equal to a. These sorts of sets of real numbers are called intervals. The two points a and b are called endpoints of the interval. Other intervals such as (−∞, b) are defined by analogy to what was just explained. In general, the curved parenthesis indicates the end point it sits next to is not included while the square parenthesis indicates this end point is included. The reason that there 3
4
CHAPTER 1. BASIC NOTIONS
will always be a curved parenthesis next to ∞ or −∞ is that these are not real numbers. Therefore, they cannot be included in any set of real numbers. A special set which needs to be given a name is the empty set also called the null set, denoted by ∅. Thus ∅ is defined as the set which has no elements in it. Mathematicians like to say the empty set is a subset of every set. The reason they say this is that if it were not so, there would have to exist a set A, such that ∅ has something in it which is not in A. However, ∅ has nothing in it and so the least intellectual discomfort is achieved by saying ∅ ⊆ A. If A and B are two sets, A \ B denotes the set of things which are in A but not in B. Thus A \ B ≡ {x ∈ A : x ∈ / B} . Set notation is used whenever convenient. To illustrate the use of this notation relative to intervals consider three examples of inequalities. Their solutions will be written in the notation just described. Example 1.1.1 Solve the inequality 2x + 4 ≤ x − 8 x ≤ −12 is the answer. This is written in terms of an interval as (−∞, −12]. Example 1.1.2 Solve the inequality (x + 1) (2x − 3) ≥ 0. The solution is x ≤ −1 or x ≥ 3 [ , ∞). 2
3 . In terms of set notation this is denoted by (−∞, −1] ∪ 2
Example 1.1.3 Solve the inequality x (x + 2) ≥ −4. This is true for any value of x. It is written as R or (−∞, ∞) . Something is in the Cartesian product of a set whose elements are sets if it consists of a single thing taken from each set in the family. Thus (1, 2, 3) ∈ {1, 4, .2}×{1, 2, 7}×{4, 3, 7, 9} because it consists of exactly one element from each of the sets which are separated by ×. Also, this is the notation for the Cartesian product of finitely many sets. If S is a set whose elements are sets, Y A A∈S
signifies the Cartesian product. The Cartesian product is the set of choice functions, a choice function being a function which selects exactly one element of each set of S. You may think the axiom of choice, stating that the Cartesian product of a nonempty family of nonempty sets is nonempty, is innocuous but there was a time when many mathematicians were ready to throw it out because it implies things which are very hard to believe, things which never happen without the axiom of choice.
1.2
The Schroder Bernstein Theorem
It is very important to be able to compare the size of sets in a rational way. The most useful theorem in this context is the Schroder Bernstein theorem which is the main result to be presented in this section. The Cartesian product is discussed above. The next definition reviews this and defines the concept of a function.
Definition 1.2.1
Let X and Y be sets. X × Y ≡ {(x, y) : x ∈ X and y ∈ Y }
1.2. THE SCHRODER BERNSTEIN THEOREM
5
A relation is defined to be a subset of X × Y . A function f, also called a mapping, is a relation which has the property that if (x, y) and (x, y1 ) are both elements of the f , then y = y1 . The domain of f is defined as D (f ) ≡ {x : (x, y) ∈ f } , written as f : D (f ) → Y . Another notation which is used is the following f −1 (y) ≡ {x ∈ D (f ) : f (x) = y} This is called the inverse image. It is probably safe to say that most people do not think of functions as a type of relation which is a subset of the Cartesian product of two sets. A function is like a machine which takes inputs, x and makes them into a unique output, f (x). Of course, that is what the above definition says with more precision. An ordered pair, (x, y) which is an element of the function or mapping has an input, x and a unique output y,denoted as f (x) while the name of the function is f . “mapping” is often a noun meaning function. However, it also is a verb as in “f is mapping A to B ”. That which a function is thought of as doing is also referred to using the word “maps” as in: f maps X to Y . However, a set of functions may be called a set of maps so this word might also be used as the plural of a noun. There is no help for it. You just have to suffer with this nonsense. The following theorem which is interesting for its own sake will be used to prove the Schroder Bernstein theorem, proved by Dedekind in 1887. The proof given here is like the version in Hewitt and Stromberg [34].
Theorem 1.2.2
Let f : X → Y and g : Y → X be two functions. Then there exist sets A, B, C, D, such that A ∪ B = X, C ∪ D = Y, A ∩ B = ∅, C ∩ D = ∅, f (A) = C, g (D) = B. The following picture illustrates the conclusion of this theorem. X
Y f A
B = g(D)
C = f (A)
g D
Proof:Consider the empty set, ∅ ⊆ X. If y ∈ Y \ f (∅), then g (y) ∈ / ∅ because ∅ has no elements. Also, if A, B, C, and D are as described above, A also would have this same property that the empty set has. However, A is probably larger. Therefore, say A0 ⊆ X satisfies P if whenever y ∈ Y \ f (A0 ) , g (y) ∈ / A0 . A ≡ {A0 ⊆ X : A0 satisfies P}. Let A = ∪A. If y ∈ Y \ f (A), then for each A0 ∈ A, y ∈ Y \ f (A0 ) and so g (y) ∈ / A0 . Since g (y) ∈ / A0 for all A0 ∈ A, it follows g (y) ∈ / A. Hence A satisfies P and is the largest subset of X which does so. Now define C ≡ f (A) , D ≡ Y \ C, B ≡ X \ A.
6
CHAPTER 1. BASIC NOTIONS
It only remains to verify that g (D) = B. It was just shown that g (D) ⊆ B. Suppose x ∈ B = X \ A. Then A ∪ {x} does not satisfy P and so there exists y ∈ Y \ f (A ∪ {x}) ⊆ D such that g (y) ∈ A ∪ {x} . But y ∈ / f (A) and so since A satisfies P, it follows g (y) ∈ / A. Hence g (y) = x and so x ∈ g (D). Hence g (D) = B.
Theorem 1.2.3
(Schroder Bernstein) If f : X → Y and g : Y → X are one to one, then there exists h : X → Y which is one to one and onto. Proof:Let A, B, C, D be the sets of Theorem1.2.2 and define f (x) if x ∈ A h (x) ≡ g −1 (x) if x ∈ B Then h is the desired one to one and onto mapping. Recall that the Cartesian product may be considered as the collection of choice functions.
Definition 1.2.4
Let I be a set and let Xi be a set for each i ∈ I. f is a choice
function written as f∈
Y
Xi
i∈I
if f (i) ∈ Xi for each i ∈ I. The axiom of choice says that if Xi 6= ∅ for each i ∈ I, for I a set, then Y Xi 6= ∅. i∈I
Sometimes the two functions, f and g are onto but not one to one. It turns out that with the axiom of choice, a similar conclusion to the above may be obtained.
Corollary 1.2.5 If f : X → Y is onto and g : Y → X is onto, then there exists h : X → Y which is one to one and onto. Proof: For each y ∈ Y Q , f −1 (y) ≡ {x ∈ X : f (x) = y} 6= ∅. Therefore, by the axiom of −1 choice, there exists f0 ∈ y∈Y f −1 (y) which is the same as saying that for each y ∈ Y , f0−1 (y) ∈ f −1 (y). Similarly, there exists g0−1 (x) ∈ g −1 (x) for all x ∈ X. Then f0−1 is one to one because if f0−1 (y1 ) = f0−1 (y2 ), then y1 = f f0−1 (y1 ) = f f0−1 (y2 ) = y2 . Similarly g0−1 is one to one. Therefore, by the Schroder Bernstein theorem, there exists h : X → Y which is one to one and onto.
Definition 1.2.6 A set S, is finite if there exists a natural number n and a map θ which maps {1, · · · , n} one to one and onto S. S is infinite if it is not finite. A set S, is called countable if there exists a map θ mapping N one to one and onto S.(When θ maps a set A to a set B, this will be written as θ : A → B in the future.) Here N ≡ {1, 2, · · · }, the natural numbers. S is at most countable if there exists a map θ : N →S which is onto. The property of being at most countable is often referred to as being countable because the question of interest is normally whether one can list all elements of the set, designating a first, second, third etc. in such a way as to give each element of the set a natural number. The possibility that a single element of the set may be counted more than once is often not important.
Theorem 1.2.7
If X and Y are both at most countable, then X × Y is also at most countable. If either X or Y is countable, then X × Y is also countable.
1.2. THE SCHRODER BERNSTEIN THEOREM
7
Proof:It is given that there exists a mapping η : N → X which is onto. Define η (i) ≡ xi and consider X as the set {x1 , x2 , x3 , · · · }. Similarly, consider Y as the set {y1 , y2 , y3 , · · · }. It follows the elements of X × Y are included in the following rectangular array. (x1 , y1 ) (x1 , y2 ) (x1 , y3 ) · · · (x2 , y1 ) (x2 , y2 ) (x2 , y3 ) · · · (x3 , y1 ) (x3 , y2 ) (x3 , y3 ) · · · .. .. .. . . .
← Those which have x1 in first slot. ← Those which have x2 in first slot. ← Those which have x3 in first slot. . .. .
Follow a path through this array as follows. (x1 , y1 ) → (x1 , y2 ) . % (x2 , y1 ) (x2 , y2 ) ↓ % (x3 , y1 )
(x1 , y3 ) →
Thus the first element of X × Y is (x1 , y1 ), the second element of X × Y is (x1 , y2 ), the third element of X × Y is (x2 , y1 ) etc. This assigns a number from N to each element of X × Y. Thus X × Y is at most countable. It remains to show the last claim. Suppose without loss of generality that X is countable. Then there exists α : N → X which is one to one and onto. Let β : X × Y → N be defined by β ((x, y)) ≡ α−1 (x). Thus β is onto N. By the first part there exists a function from N onto X × Y . Therefore, by Corollary 1.2.5, there exists a one to one and onto mapping from X × Y to N. Qn Note that by induction, i=1 Xi is at most countable if each Xi is.
Theorem 1.2.8 If X and Y are at most countable, then X ∪ Y is at most countable. If either X or Y are countable, then X ∪ Y is countable. Proof:As in the preceding theorem, X = {x1 , x2 , x3 , · · · } and Y = {y1 , y2 , y3 , · · · } . Consider the following array consisting of X ∪ Y and path through it. x1 y1
→ x2 . → y2
x3
→
%
Thus the first element of X ∪ Y is x1 , the second is x2 the third is y1 the fourth is y2 etc. Consider the second claim. By the first part, there is a map from N onto X ×Y . Suppose without loss of generality that X is countable and α : N → X is one to one and onto. Then define β (y) ≡ 1, for all y ∈ Y ,and β (x) ≡ α−1 (x). Thus, β maps X × Y onto N and this shows there exist two onto maps, one mapping X ∪ Y onto N and the other mapping N onto X ∪ Y . Then Corollary 1.2.5 yields the conclusion. In fact, the countable union of countable sets is also at most countable. ∞ Let Ai be a countable set. Thus Ai = rji j=1 . Then ∪∞ i=1 Ai is also at most a countable set. If it is an infinite set, then it is countable.
Theorem 1.2.9
8
CHAPTER 1. BASIC NOTIONS Proof: This is proved like Theorem 1.2.7 arrange ∪∞ i=1 Ai as follows. r11 r12 r13 .. .
r21 r22 r23 .. .
r31 r32 r33 .. .
··· ··· ···
Now take a route through this rectangular array as in Theorem 1.2.7, identifying an enumeration in the order in which the displayed elements are encountered as done in that theorem. ∞ Thus there is an onto mapping from N to ∪∞ i=1 Ai and so ∪i=1 Ai is at most countable, meaning its elements can be enumerated. However, if any of the Ai is infinite or if the union is, then there is an onto map from ∪∞ i=1 Ai onto N and so from Corollary 1.2.5, there would be a one to one and onto map between N and ∪∞ i=1 Ai . Note that by induction this shows that if you have any finite set whose elements are countable sets, then the union of these is countable.
1.3
Equivalence Relations
There are many ways to compare elements of a set other than to say two elements are equal or the same. For example, in the set of people let two people be equivalent if they have the same weight. This would not be saying they were the same person, just that they weighed the same. Often such relations involve considering one characteristic of the elements of a set and then saying the two elements are equivalent if they are the same as far as the given characteristic is concerned.
Definition 1.3.1
Let S be a set. ∼ is an equivalence relation on S if it satisfies the
following axioms. 1. x ∼ x
for all x ∈ S. (Reflexive)
2. If x ∼ y then y ∼ x. (Symmetric) 3. If x ∼ y and y ∼ z, then x ∼ z. (Transitive)
Definition 1.3.2 [x] denotes the set of all elements of S which are equivalent to x and [x] is called the equivalence class determined by x or just the equivalence class of x. With the above definition one can prove the following simple theorem.
Theorem 1.3.3
Let ∼ be an equivalence relation defined on a set, S and let H denote the set of equivalence classes. Then if [x] and [y] are two of these equivalence classes, either x ∼ y and [x] = [y] or it is not true that x ∼ y and [x] ∩ [y] = ∅. Proof: If x ∼ y, then if z ∈ [y] , you have x ∼ y and y ∼ z so x ∼ z which shows that [y] ⊆ [x]. Similarly, [x] ⊆ [y]. If it is not the case that x ∼ y, then there can be no intersection of [x] and [y] because if z were in this intersection, then x ∼ z, z ∼ y so x ∼ y.
1.4
The Hausdorff Maximal Theorem
The Hausdorff maximal theorem or something like it is often very useful. I will use it whenever convenient and its use typically makes a much longer and involved argument shorter. However, sometimes its use is absolutely essential. First is the definition of what is meant by a partial order.
1.4. THE HAUSDORFF MAXIMAL THEOREM
9
Definition 1.4.1 A nonempty set F is called a partially ordered set if it has a partial order denoted by ≺. This means it satisfies the following. If x ≺ y and y ≺ z, then x ≺ z. Also x ≺ x. It is like ⊆ on the set of all subsets of a given set. It is not the case that given two elements of F that they are related. In other words, you cannot conclude that either x ≺ y or y ≺ x. A chain, denoted by C ⊆ F has the property that it is totally ordered meaning that if x, y ∈ C, either x ≺ y or y ≺ x. A maximal chain is a chain C which has the property that there is no strictly larger chain. In other words, if x ∈ F\ ∪ C, then C∪ {x} is no longer a chain so x fails to be related to something in C. Here is the Hausdorff maximal theorem. The proof is a proof by contradiction. We assume there is no maximal chain and then show this cannot happen. The axiom of choice is used in choosing the xC right at the beginning of the argument.
Theorem 1.4.2
Let F be a nonempty partially ordered set with order ≺. Then there
exists a maximal chain. Proof: Suppose no chain is maximal. Then for each chain C there exists xC ∈ F\ ∪ C such that C ∪ {xC } is a chain. Call two chains comparable if one is a subset of the other. Let X be the set of all chains. Thus the elements of X are chains and X is a subset of P (F). For C a chain, let θC denote C ∪ {xC } . Pick x0 ∈ F. A subset Y of X will be called a “tower” if θC ∈ Y whenever C ∈ Y, x0 ∈ ∪C for all C ∈ Y, and if S is a subset of Y such that all pairs of chains in S are comparable, then ∪S is in Y. Note that X is a tower. Let Y0 be the intersection of all towers. Then Y0 is also a tower, the smallest one. This follows from the definition. Claim 1: If C0 ∈ Y0 is comparable to every chain C ∈ Y0 , then if C0 ( C, it must be the case that θC0 ⊆ C. The symbol ( indicates proper subset. Proof of Claim 1: Consider B ≡ {D ∈ Y0 : D ⊆ C 0 or xC0 ∈ ∪D}. Let Y1 ≡ Y0 ∩ B. I want to argue that Y1 is a tower. Obviously all chains of Y1 contain x0 in their unions. If D ∈ Y1 , is θD ∈ Y1 ? case 1: D ! C0 . Then xC0 ∈ ∪D and so xC0 ∈ ∪θD so θD ∈ Y1 . case 2: D ⊆ C 0 . Then if θD ! C0 , it follows that D ⊆ C0 D ∪ {xD } . If x ∈ C0 \ D then x = xD . But then C0 D ∪ {xD } ⊆ C0 ∪ {xD } = C0 which is nonsense, and so C0 = D so xD = xC0 ∈ ∪θC0 = ∪θD and so θD ∈ Y1 . If θD ⊆ C0 then right away θD ∈ B. Thus B = Y0 because Y1 cannot be smaller than Y0 . In particular, if D ! C0 , then xC0 ∈ ∪D or in other words, θC0 ⊆ D. Claim 2: Any two chains in Y0 are comparable so if C ( D, then θC ⊆ D. Proof of Claim 2: Let Y1 consist of all chains of Y0 which are comparable to every chain of Y0 . (Like C0 above.) I want to show that Y1 is a tower. Let C ∈ Y1 and D ∈ Y0 . Since C is comparable to all chains in Y0 , either C ( D or C ⊇ D. I need to show that θC is comparable with D. The second case is obvious so consider the first that C ( D. By Claim 1, θC ⊆ D. Since D is arbitrary, this shows that Y1 is a tower. Hence Y1 = Y0 because Y0 is as small as possible. It follows that every two chains in Y0 are comparable and so if C ( D, then θC ⊆ D. Since every pair of chains in Y0 are comparable and Y0 is a tower, it follows that ∪Y0 ∈ Y0 so ∪Y0 is a chain. However, θ ∪ Y0 is a chain which properly contains ∪Y0 and since Y0 is a tower, θ ∪ Y0 ∈ Y0 . Thus ∪ (θ ∪ Y0 ) ! ∪ (∪Y0 ) ⊇ ∪ (θ ∪ Y0 ) which is a contradiction. Therefore, it is impossible to obtain the xC described above for some chain C and so, this C is a maximal chain. If X is a nonempty set, ≤ is an order on X if x ≤ x, and if x, y ∈ X, then either x ≤ y or y ≤ x
10
CHAPTER 1. BASIC NOTIONS
and if x ≤ y and y ≤ z then x ≤ z. ≤ is a well order and say that (X, ≤) is a well-ordered set if every nonempty subset of X has a smallest element. More precisely, if S 6= ∅ and S ⊆ X then there exists an x ∈ S such that x ≤ y for all y ∈ S. A familiar example of a well-ordered set is the natural numbers.
Lemma 1.4.3 The Hausdorff maximal principle implies every nonempty set can be wellordered. Proof: Let X be a nonempty set and let a ∈ X. Then {a} is a well-ordered subset of X. Let F = {S ⊆ X : there exists a well order for S}. Thus F = 6 ∅. For S1 , S2 ∈ F, define S1 ≺ S2 if S1 ⊆ S2 and there exists a well order for S2 , ≤2 such that (S2 , ≤2 ) is well-ordered and if y ∈ S2 \ S1 then x ≤2 y for all x ∈ S1 , and if ≤1 is the well order of S1 then the two orders are consistent on S1 . Then observe that ≺ is a partial order on F. By the Hausdorff maximal principle, let C be a maximal chain in F and let X∞ ≡ ∪C. Define an order, ≤, on X∞ as follows. If x, y are elements of X∞ , pick S ∈ C such that x, y are both in S. Then if ≤S is the order on S, let x ≤ y if and only if x ≤S y. This definition is well defined because of the definition of the order, ≺. Now let U be any nonempty subset of X∞ . Then S ∩ U 6= ∅ for some S ∈ C. Because of the definition of ≤, if y ∈ S2 \ S1 , Si ∈ C, then x ≤ y for all x ∈ S1 . Thus, if y ∈ X∞ \ S then x ≤ y for all x ∈ S and so the smallest element of S ∩ U exists and is the smallest element in U . Therefore X∞ is well-ordered. Now suppose there exists z ∈ X \ X∞ . Define the following order, ≤1 , on X∞ ∪ {z}. x ≤1 y if and only if x ≤ y whenever x, y ∈ X∞ x ≤1 z whenever x ∈ X∞ . Then let Ce = {S ∈ C or X∞ ∪ {z}}. Then Ce is a strictly larger chain than C contradicting maximality of C. Thus X \ X∞ = ∅ and this shows X is well-ordered by ≤. This proves the lemma. With these two lemmas the main result follows.
Theorem 1.4.4
The following are equivalent. The axiom of choice The Hausdorff maximal principle The well-ordering principle.
Proof: It only remains to prove that the well-ordering principle implies the axiom of choice. Let I be a nonempty set and let Xi be a nonempty set for each i ∈ I. Let X =Q∪{Xi : i ∈ I} and well order X. Let f (i) be the smallest element of Xi . Then f ∈ i∈I Xi . There are some other equivalences to the axiom of choice proved in the book by Hewitt and Stromberg [34].
1.5. ANALYSIS OF REAL AND COMPLEX NUMBERS
1.4.1
11
The Hamel Basis
A Hamel basis is nothing more than the correct generalization of the notion of a basis for a finite dimensional vector space to vector spaces which are possibly not of finite dimension.
Definition 1.4.5
Let X be a vector space. A Hamel basis is a subset of X, Λ such that every vector of X can be written as a finite linear combination of vectors P of Λ and the n vectors of Λ are linearly independent in the sense that if {x1 , · · · , xn } ⊆ Λ and k=1 ck xk = 0. Then each ck = 0. The main result is the following theorem.
Theorem 1.4.6
Let X be a nonzero vector space. Then it has a Hamel basis.
Proof: Let x1 ∈ X and x1 6= 0. Let F denote the collection of subsets of X, Λ containing x1 with the property that the vectors of Λ are linearly independent as described in Definition 1.4.5 partially ordered by set inclusion. By the Hausdorff maximal theorem, there exists a maximal chain, C Let Λ = ∪C. Since C is a chain, it follows that if {x1 , P · · · , xn } ⊆ C then n there exists a single Λ0 ∈ C containing all these vectors. Therefore, if k=1 ck xk = 0 it follows each ck = 0. Thus the vectors of Λ are linearly independent. Is every vector of X a finite linear combination of vectors of Λ? Suppose not. Then there exists z which Pm is not equal to a finite linear combination of vectors of Λ. Consider Λ ∪ {z} . If cz + k=1 ck xk = 0 where the xk are vectors of Λ, then if c 6= 0 this contradicts the condition that z is not a finite linear combination of vectors of Λ. Therefore, c = 0 and now all the ck must equal zero because it was just shown Λ is linearly independent. It follows C∪ {Λ ∪ {z}} is a strictly larger chain than C and this is a contradiction. Therefore, Λ is a Hamel basis as claimed.
1.5
Analysis of Real and Complex Numbers
I am assuming the reader is familiar with the complex numbers which can be considered as points in the plane, the complex number x + iy being the point obtained by graphing the ordered pair (x, y) . I assume the reader knows about the complex conjugate x + iy ≡ x − iy and all its properties such ¯ and zw = z¯ w. ¯ Also recall p as, for z, w ∈ C, (z + w) = z¯ + w 2 2 that for z ∈ C, |z| ≡ x + y where z = x + iy and that the triangle inequalities hold: 1/2 |z + w| ≤ |z| + |w| and |z − w| ≥ ||z| − |w|| and |z| = (z z¯) . This is the time to review these things. If you have not seen them, read my single variable advanced calculus book or the first part of my calculus book. Any good pre-calculus book has these topics. Also recall that complex numbers, are often written in the so called polar form which is described next. Suppose z = x + iy is a complex number. Then ! p x y x + iy = x2 + y 2 p + ip . x2 + y 2 x2 + y 2 Now note that
!2
x p
+
x2 + y 2
=1
p x2 + y 2
and so x p
!2
y
y
x2 + y 2
!
,p x2 + y 2
is a point on the unit circle. Therefore, there exists a unique angle θ ∈ [0, 2π) such that cos θ = p
x x2
+
y2
, sin θ = p
y x2
+ y2
.
12
CHAPTER 1. BASIC NOTIONS
The polar form of p the complex number is then r (cos θ + i sin θ) where θ is this angle just described and r = x2 + y 2 ≡ |z|. r=
p
x2 + y 2
x + iy = r(cos(θ) + i sin(θ))
r θ
1.5.1
Roots Of Complex Numbers
A fundamental identity is the formula of De Moivre which follows.
Theorem 1.5.1
Let r > 0 be given. Then if n is a positive integer, n
[r (cos t + i sin t)] = rn (cos nt + i sin nt) . Proof: It is clear the formula holds if n = 1. Suppose it is true for n. n+1
n
[r (cos t + i sin t)]
= [r (cos t + i sin t)] [r (cos t + i sin t)]
which by induction equals = rn+1 (cos nt + i sin nt) (cos t + i sin t) = rn+1 ((cos nt cos t − sin nt sin t) + i (sin nt cos t + cos nt sin t)) = rn+1 (cos (n + 1) t + i sin (n + 1) t) by the formulas for the cosine and sine of the sum of two angles.
Corollary 1.5.2 Let z be a non zero complex number. Then for k ∈ N, there are always exactly k k th roots of z in C. Proof: Let z = x + iy and let z = |z| (cos t + i sin t) be the polar form of the complex number. By De Moivre’s theorem, a complex number r (cos α + i sin α) , is a k th root of z if and only if rk (cos kα + i sin kα) = |z| (cos t + i sin t) . 1/k
This requires rk = |z| and so r = |z| This can only happen if
and also both cos (kα) = cos t and sin (kα) = sin t.
kα = t + 2lπ for l an integer. Thus α=
t + 2lπ ,l ∈ Z k
and so the k th roots of z are of the form t + 2lπ t + 2lπ 1/k |z| cos + i sin , l ∈ Z. k k Since the cosine and sine are periodic of period 2π, there are exactly k distinct numbers which result from this formula. Example 1.5.3 Find the three cube roots of i.
1.5. ANALYSIS OF REAL AND COMPLEX NUMBERS
13
First note that i = 1 cos π2 + i sin π2 . Using the formula in the proof of the above corollary, the cube roots of i are (π/2) + 2lπ (π/2) + 2lπ + i sin 1 cos 3 3 where l = 0, 1, 2. Therefore, the roots are π π 5 5 3 3 + i sin , cos cos π + i sin π , cos π + i sin π . 6 6 6 6 2 2 √ √ 3 1 − 3 1 Thus the cube roots of i are +i , +i , and −i. 2 2 2 2 The ability to find k th roots can also be used to factor some polynomials. Example 1.5.4 Factor the polynomial x3 − 27. First find the cube roots of 27. ! By the above procedure using De Moivre’s theorem, √ √ ! 3 −1 3 −1 +i −i , and 3 . Therefore, x3 − 27 = these cube roots are 3, 3 2 2 2 2 (x − 3) x − 3
√ !! 3 −1 +i x−3 2 2
√ !! −1 3 −i . 2 2
√ √ 3 3 Note also x − 3 −1 x − 3 −1 = x2 + 3x + 9 and so 2 +i 2 2 −i 2 x3 − 27 = (x − 3) x2 + 3x + 9
where the quadratic polynomial x2 + 3x + 9 cannot be factored without using complex numbers. Note that even though the polynomial x3 − 27 has all real coefficients, it has some √ √ −1 −1 3 3 +i and −i . These zeros are complex conjugates of each other. complex zeros, 2 2 2 2 It is always this way. You should show this is the case. To see how to do this, see Problems 17 and 18 below. Another fact for your information is the fundamental theorem of algebra. This theorem says that any polynomial of degree at least 1 having any complex coefficients always has a root in C. This is sometimes referred to by saying C is algebraically complete. Gauss is usually credited with giving a proof of this theorem in 1797 but many others worked on it and the first completely correct proof was due to Argand in 1806. For more on this theorem, you can google fundamental theorem of algebra and look at the interesting Wikipedia article on it. Proofs of this theorem usually involve the use of techniques from calculus even though it is really a result in algebra. A proof and plausibility explanation is given later. Recall also the quadratic formula which gives solutions for solutions to ax2 + bx + c = 0 which holds for any a, b, c ∈ C with a 6= 0. This is also good to review from any good precalculus book. My book published with http://www.centerofmath.org/textbooks/pre calc/index.html (2012) has all these elementary considerations. Most are in my on line calculus text or Volume 1 of the one published by World Scientific. This is a good time to review the quadratic formula also.
1.5.2
The Complex Exponential
Here is a short review of the complex exponential. It was shown above that every complex number can be written in the form r (cos θ + i sin θ) where r ≥ 0. Laying aside the zero complex number, this shows that every non zero complex
14
CHAPTER 1. BASIC NOTIONS
number is of the form eα (cos β + i sin β) . We write this in the form eα+iβ . Having done so, does it follow that the expression preserves the most important property of the function t → e(α+iβ)t for t real, that 0 e(α+iβ)t = (α + iβ) e(α+iβ)t ? By the definition just given which does not contradict the usual definition in case β = 0 and the usual rules of differentiation in calculus, 0 0 = eαt (cos (βt) + i sin (βt)) e(α+iβ)t = eαt [α (cos (βt) + i sin (βt)) + (−β sin (βt) + iβ cos (βt))] Now consider the other side. From the definition it equals (α + iβ) eαt (cos (βt) + i sin (βt)) = eαt [(α + iβ) (cos (βt) + i sin (βt))] = eαt [α (cos (βt) + i sin (βt)) + (−β sin (βt) + iβ cos (βt))] which is the same thing. This is of fundamental importance in differential equations. It shows that there is no change in going from real to complex numbers for ω in the consideration of the problem y 0 = ωy, y (0) = 1. The solution is always eωt . The formula just discussed, that eα (cos β + i sin β) = eα+iβ is Euler’s formula. He originally conceived of this formula by considering power series of cos and sin and re arranging the order of the infinite sums.
1.5.3
The Cauchy Schwarz Inequality
This fundamental inequality takes several forms. I will present the version first given by Cauchy although I am not sure if the proof is the same.
Proposition 1.5.5 Let zj , wj be complex numbers. Then 1/2 1/2 X p p X X p 2 2 zj wj ≤ |wj | |zj | j=1 j=1 j=1 Proof: First note that
p X j=1
zj zj =
p X
2
|zj | ≥ 0
j=1
Next, if a + ib is a complex number, consider θ = 1 if both a, b are zero and θ = √a−ib a2 +b2 if the complex number is not zero. Thus, in√either case, there exists a complex number θ Pp such that |θ| = 1 and θ (a + ib) = |a + ib| ≡ a2 + b2 . Now let |θ| = 1 and θ j=1 zj wj = P p j=1 zj wj . Then for t a real number, a2
b2
z }| { z }| { p p p p p X X X X X zj zj + zj tθwj + tθwj zj + t2 wj wj 0 ≤ p (t) ≡ (zj + tθwj ) zj + tθwj = j=1
j=1
j=1
j=1
p p X X ≡ a2 + 2t Re θ wj zj + t2 b2 = a2 + 2t wj zj + t2 b2 j=1 j=1
j=1
1.6.
POLYNOMIALS AND ALGEBRA
15
Since this is always nonnegative for all real t, it follows from the quadratic formula that 2 p X wj zj − 4a2 b2 ≤ 0 4 j=1 It follows that a2 =
Pp
j=1
2
|zj | , b2 =
Pp
j=1
2
|wj | and so
1/2 1/2 X p p X X p 2 2 wj zj ≤ |zj | |wj | . j=1 j=1 j=1
1.6
Polynomials and Algebra
Polynomials are of fundamental significance in calculus and linear algebra. Therefore, an introduction is given here. In elementary calculus, they are regarded as functions. It is better to begin with a consideration of them as expressions of a certain kind. The two ways of thinking about them turn out to be different when you ask which polynomials equal 0. This will not matter in this book but it does matter very much when your field of scalars is something like the residue classes of integers.
Definition 1.6.1
A polynomial is an expression of the form an λn +an−1 λn−1 +· · ·+ a1 λ + a0 , an 6= 0 where the ai come from a field of scalars like R or C. Two polynomials are equal means that the coefficients match for each power of λ. The degree of a polynomial is the largest power of λ. Thus the degree of the above polynomial is n. Addition of polynomials is defined in the usual way as is multiplication of two polynomials. The leading term in the above polynomial is an λn . The coefficient of the leading term is called the leading coefficient. It is called a monic polynomial if an = 1. Note that the degree of the zero polynomial is not defined in the above.
Lemma 1.6.2 Let f (λ) and g (λ) 6= 0 be polynomials. Then there exist polynomials, q (λ) and r (λ) such that f (λ) = q (λ) g (λ) + r (λ) where the degree of r (λ) is less than the degree of g (λ) or r (λ) = 0. These polynomials q (λ) and r (λ) are unique. Proof: Suppose that f (λ) − q (λ) g (λ) is never equal to 0 for any q (λ). If it is, then the conclusion follows. Now suppose r (λ) = f (λ) − q (λ) g (λ) and the degree of r (λ) is m ≥ n where n is the degree of g (λ). Say the leading term of r (λ) is bλm while the leading term of g (λ) is ˆbλn . Then letting a = b/ˆb , aλm−n g (λ) has the same leading term as r (λ). Thus the degree of r1 (λ) ≡ r (λ) − aλm−n g (λ) is no more than m − 1. Then q1 (λ) z }| { r1 (λ) = f (λ) − q (λ) g (λ) + aλm−n g (λ) = f (λ) − q (λ) + aλm−n g (λ) Denote by S the set of polynomials f (λ) − g (λ) l (λ) . Out of all these polynomials, there exists one which has smallest degree r (λ). Let this take place when l (λ) = q (λ). Then by the above argument, the degree of r (λ) is less than the degree of g (λ). Otherwise, there is one which has smaller degree. Thus f (λ) = g (λ) q (λ) + r (λ).
16
CHAPTER 1. BASIC NOTIONS As to uniqueness, if you have r (λ) , rˆ (λ) , q (λ) , qˆ (λ) which work, then you would have (ˆ q (λ) − q (λ)) g (λ) = r (λ) − rˆ (λ)
Now if the polynomial on the right is not zero, then neither is the one on the left. Hence this would involve two polynomials which are equal although their degrees are different. This is impossible. Hence r (λ) = rˆ (λ) and so, matching coefficients implies that qˆ (λ) = q (λ).
Definition 1.6.3 A polynomial f is said to divide a polynomial g if g (λ) = f (λ) r (λ) for some polynomial r (λ). Let {φi (λ)} be a finite set of polynomials. The greatest common divisor will be the monic polynomial q (λ) such that q (λ) divides each φi (λ) and if p (λ) divides each φi (λ) , then p (λ) divides q (λ) . The finite set of polynomials {φi } is said to be relatively prime if their greatest common divisor is 1. A polynomial f (λ) is irreducible if there is no polynomial with coefficients in F which divides it except nonzero scalar multiples of f (λ) and constants. In other words, it is not possible to write f (λ) = a (λ) b (λ) where each of a (λ) , b (λ) have degree less than the degree of f (λ). Proposition 1.6.4 The greatest common divisor is unique. Proof: Suppose both q (λ) and q 0 (λ) work. Then q (λ) divides q 0 (λ) and the other way around and so q 0 (λ) = q (λ) l (λ) , q (λ) = l0 (λ) q 0 (λ) Therefore, the two must have the same degree. Hence l0 (λ) , l (λ) are both constants. However, this constant must be 1 because both q (λ) and q 0 (λ) are monic.
Theorem 1.6.5
Let {φi (λ)} be polynomials, not all of which are zero polynomials. Then there exists a greatest common divisor and it equals the monic polynomial ψ (λ) of smallest degree such that there exist polynomials ri (λ) satisfying ψ (λ) =
p X
ri (λ) φi (λ) .
i=1
Proof: Let S denote the set of monic polynomials which are of the form p X
ri (λ) φi (λ)
i=1
where ri (λ) is a polynomial. Then S 6=P∅ because some φi (λ) 6= 0. Then let the ri be chosen p such that the degree of the expression i=1 ri (λ) φi (λ) is as small as possible. Letting ψ (λ) equal this sum, it remains to verify it is the greatest common divisor. First, does it divide each φi (λ)? Suppose it fails to divide φ1 (λ) . Then by Lemma 1.6.2 φ1 (λ) = ψ (λ) l (λ) + r (λ) where degree of r (λ) is less than that of ψ (λ). Then dividing r (λ) by the leading coefficient if necessary and denoting the result by ψ 1 (λ) , it follows the degree of ψ 1 (λ) is less than the degree of ψ (λ) and ψ 1 (λ) equals for some a ∈ F ψ 1 (λ) = (φ1 (λ) − ψ (λ) l (λ)) a
=
φ1 (λ) −
p X
! ri (λ) φi (λ) l (λ) a
i=1
=
(1 − r1 (λ)) φ1 (λ) +
p X i=2
! (−ri (λ) l (λ)) φi (λ) a
1.6.
POLYNOMIALS AND ALGEBRA
17
This is one of the polynomials in S. Therefore, ψ (λ) does not have the smallest degree after all because the degree of ψ 1 (λ) is smaller. This is a contradiction. Therefore, ψ (λ) divides φ1 (λ) . Similarly it divides all the other φi (λ). If p (λ) Pp divides all the φi (λ) , then it divides ψ (λ) because of the formula for ψ (λ) which equals i=1 ri (λ) φi (λ) . Thus ψ (λ) satisfies the condition to be the greatest common divisor. This shows the greatest common divisor exists and equals the above description of it.
Lemma 1.6.6 Suppose φ (λ) and ψ (λ) are monic polynomials which are irreducible and not equal. Then they are relatively prime. Proof: Suppose η (λ) is a nonconstant polynomial. If η (λ) divides φ (λ) , then since φ (λ) is irreducible, η (λ) equals aφ (λ) for some a ∈ F. If η (λ) divides ψ (λ) then it must be of the form bψ (λ) for some b ∈ F and so it follows ψ (λ) = ab φ (λ) but both ψ (λ) and φ (λ) are monic polynomials which implies a = b and so ψ (λ) = φ (λ). This is assumed not to happen. It follows the only polynomials which divide both ψ (λ) and φ (λ) are constants and so the two polynomials are relatively prime. Thus a polynomial which divides them both must be a constant, and if it is monic, then it must be 1. Thus 1 is the greatest common divisor.
Lemma 1.6.7 Let ψ (λ) be an irreducible monic polynomial not equal to 1 which divides p Y
k
φi (λ) i , ki a positive integer,
i=1
where each φi (λ) is an irreducible monic polynomial not equal to 1. Then ψ (λ) equals some φi (λ) . Qp k Proof : Say ψ (λ) l (λ) = i=1 φi (λ) i . Suppose ψ (λ) 6= φi (λ) for all i. Then by Lemma 1.6.6, there exist polynomials mi (λ) , ni (λ) such that 1 φi (λ) ni (λ) Hence, (φi (λ) ni (λ))
ki
= ψ (λ) mi (λ) + φi (λ) ni (λ) =
1 − ψ (λ) mi (λ) ki
= (1 − ψ (λ) mi (λ))
n (λ) l (λ) ψ (λ) =
p Y
and so letting n (λ) = ki
(ni (λ) φi (λ))
=
i=1
p Y
Q
i
k
ni (λ) i ,
(1 − ψ (λ) mi (λ))
ki
i=1
= 1 + g (λ) ψ (λ) for a suitable polynomial g (λ) . You just separate out the term 1ki = 1 in that product and then all terms that are left have a ψ (λ) as a factor. Hence (n (λ) l (λ) − g (λ)) ψ (λ) = 1 which is impossible because ψ (λ) is not equal to 1. Of course, since coefficients are in a field, you can drop the stipulation that the polynomials are monic and replace the conclusion with: ψ (λ) is a multiple of some φi (λ) . Now here is a simple lemma about canceling monic polynomials.
Lemma 1.6.8 Suppose p (λ) is a monic polynomial and q (λ) is a polynomial such that p (λ) q (λ) = 0. Then q (λ) = 0. Also if p (λ) q1 (λ) = p (λ) q2 (λ) then q1 (λ) = q2 (λ) .
18
CHAPTER 1. BASIC NOTIONS Proof: Let p (λ) =
k X
pj λj , q (λ) =
j=1
n X
qi λi , pk = 1.
i=1
Then the product equals k X n X
pj qi λi+j .
j=1 i=1
If not all qi = 0, let qm be the last coefficient which is nonzero. Then the above is of the form k X m X pj qi λi+j = 0 j=1 i=1 m+k
Consider the λ term. There is only one and it is pk qm λm+k . Since pk = 1, qm = 0 after all. The second part follows from p (λ) (q1 (λ) − q2 (λ)) = 0. The following is the analog of the fundamental theorem of arithmetic for polynomials.
Theorem 1.6.9
Let f (λ) be a Q nonconstant polynomial with coefficients in F. Then n there is some a ∈ F such that f (λ) = a i=1 φi (λ) where φi (λ) is an irreducible nonconstant monic polynomial and repeats are allowed. Furthermore, this factorization is unique in the sense that any two of these factorizations have the same nonconstant factors in the product, possibly in different order and the same constant a. Proof: That such a factorization exists is obvious. If f (λ) is irreducible, you are done. Factor out the leading coefficient. If not, then f (λ) = aφ1 (λ) φ2 (λ) where these are monic polynomials. Continue doing this with the φi and eventually arrive at a factorization of the desired form. It remains to argue the factorization is unique except for order of the factors. Suppose a
n Y
φi (λ) = b
i=1
m Y
ψ i (λ)
i=1
where the φi (λ) and the ψ i (λ) are all irreducible monic nonconstant polynomials and a, b ∈ F. If n > m, then by Lemma 1.6.7, each ψ i (λ) equals one of the φj (λ) . By the above cancellation lemma, Lemma 1.6.8, you can cancel all these ψ i (λ) with appropriate φj (λ) and obtain a contradiction because the resulting polynomials on either side would have different degrees. Similarly, it cannot happen that n < m. It follows n = m and the two products consist of the same polynomials. Then it follows a = b. The following corollary seems rather believable but does require a proof. Q Corollary 1.6.10 Let q (λ) = pi=1 φi (λ)ki where the ki are positive integers and the φi (λ) are irreducible distinct monic polynomials. Suppose also that p (λ) is a monic polynomial which divides q (λ) . Then p (λ) =
p Y
φi (λ)
ri
i=1
where ri is a nonnegative integer no larger than ki . Qs r Proof: Using Theorem 1.6.9, let p (λ) = b i=1 ψ i (λ) i where the ψ i (λ) are each irreducible and monic and b ∈ F. Since p (λ) is monic, b = 1. Then there exists a polynomial g (λ) such that p s Y Y r k p (λ) g (λ) = g (λ) ψ i (λ) i = φi (λ) i i=1
i=1
1.6.
POLYNOMIALS AND ALGEBRA
19
Hence g (λ) must be monic. Therefore, p(λ)
z }| { p s l Y Y Y r k p (λ) g (λ) = ψ i (λ) i η j (λ) = φi (λ) i i=1
j=1
i=1
for η j monic and irreducible. By uniqueness, each ψ i equals one of the φj (λ) and the same holding true of the η i (λ). Therefore, p (λ) is of the desired form because you can cancel the η j (λ) from both sides. A very useful result is the partial fractions expansion theorem.
Proposition 1.6.11 Suppose r (λ) =
a(λ) p(λ)m
where a (λ) is a polynomial and p (λ) is an irreducible polynomial meaning that the only polynomials dividing p (λ) are numbers and scalar multiples of p (λ). That is, you can’t factor it any further. Then r (λ) = q (λ) +
m X bk (λ) k
k=1
p (λ)
, where degree of bk (λ) < degree of p (λ)
Proof: By the division algorithm, a (λ) = p (λ) l (λ) + b (λ) where the degree of b (λ) is less than the degree of p (λ) or else is 0. Then r (λ) = Now do the same process for of the claimed form.
b (λ) l (λ) p (λ) l (λ) + b (λ) = m m m−1 + p (λ) p (λ) p (λ)
l(λ) p(λ)m−1
and continue doing this till you end up with something
Lemma 1.6.12 Suppose you have a rational function
a(λ) b(λ) .
Then this is of the form
n (λ) mi i=1 pi (λ)
p (λ) + Qm
where theQ {p1 (λ) , · · · , pm (λ)} are relatively prime and the degree of n (λ) is less than the m m degree of i=1 pi (λ) i . Proof: If the degree of a (λ) is greater than the degree of b (λ) , do division to obtain a (λ) n (λ) = p (λ) + b (λ) b (λ) where the degree of n (λ) is less than the degree of b (λ). Now use the theory in the chapter to factor b (λ) as a product of powers of relatively prime polynomials pi (λ) . This gives what is desired. Next is the partial fractions theorem. This is a result in algebra but it is very useful in calculus.
Theorem 1.6.13
Let r (λ) =
a(λ) b(λ)
pˆ (λ) +
be a rational function. Then it is of the form mi m X X nki (λ) k
i=1 k=1
pi (λ)
where the degree of nki (λ) is less than the degree of pi (λ) and pˆ (λ) is some polynomial.
20
CHAPTER 1. BASIC NOTIONS Proof: First use the lemma and write n (λ) mi i=1 pi (λ)
r (λ) = p (λ) + Qm
where the pi (λ) are relatively prime. Pm Next multiply by i=1 ai (λ) pi (λ) = 1. Then you get Pm n (λ) i=1 ai (λ) pi (λ) Qm r (λ) = p (λ) + mi i=1 pi (λ) m X n (λ) ai (λ) = p (λ) + mi −1 Q mj j6=i pj (λ) i=1 pi (λ) Do long division if necessary to obtain p˜ (λ) +
m X i=1
Now do for each
ni (λ) Q
mi −1
ni (λ) Q pi (λ)mi −1 j6=i pj (λ)mj
pi (λ)
j6=i
mj
pj (λ)
what was just done and keep doing till you get an
expression of the desired form in which every term has only powers of some single pj (λ) in the denominator. The degree of the bottom is decreasing with each step so eventually, the process must end. Then use the above proposition on each of these terms. This is the partial fractions expansion of the rational function. Actually carrying out this computation may be impossible, but this shows the existence of such a partial fractions expansion. From the fundamental theorem of algebra discussed next, every polynomial has a factorization into linear factors if you allow complex coefficients. If all coefficients are real, then all roots come in conjugate pairs and so every real polynomial can be factored into a product of irreducible quadratics and linear polynomials. Indeed, (z − a)(z − a) is a quadratic polynomial with real coefficients. Thus, in the case of real coefficients, the degree of each pi (λ) can always be taken no more than 2. This was used a lot in calculus.
1.7
The Fundamental Theorem of Algebra
The fundamental theorem of algebra states that every non constant polynomial having coefficients in C has a zero in C. If C is replaced by R, this is not true because of the example, x2 + 1 = 0. This theorem is a very remarkable result and notwithstanding its title, all the proofs depend on either analysis or topology in some way. It was first mostly proved by Gauss in 1797. The first complete proof was given by Argand in 1806. The proof given later in the book follows Rudin [60]. See also Hardy [32] for a similar proof, more discussion and references. The shortest proofs are in the theory of complex analysis and are presented later. Here is an informal explanation of this theorem which shows why it is reasonable to believe in the fundamental theorem of algebra.
Theorem 1.7.1
Let p (z) = an z n + an−1 z n−1 + · · · + a1 z + a0 where each ak is a complex number and an 6= 0, n ≥ 1. Then there exists w ∈ C such that p (w) = 0. To begin with, here is the informal explanation. Dividing by the leading coefficient an , there is no loss of generality in assuming that the polynomial is of the form p (z) = z n + an−1 z n−1 + · · · + a1 z + a0 If a0 = 0, there is nothing to prove because p (0) = 0. Therefore, assume a0 6= 0. From the polar form of a complex number z, it can be written as |z| (cos θ + i sin θ). Thus, by DeMoivre’s theorem, n z n = |z| (cos (nθ) + i sin (nθ))
1.8.
SOME TOPICS FROM ANALYSIS
21 n
It follows that z n is some point on the circle of radius |z| Denote by Cr the circle of radius r in the complex plane which is centered at 0. Then if r is sufficiently large and |z| = r, the term z n is far larger than the rest of the polynon mial. It is on the circle of radius |z| while the other terms are on circles of fixed mulk tiples of |z| for k ≤ n − 1. Thus, for r large enough, Ar = {p (z) : z ∈ Cr } describes a closed curve which misses the inside of some circle having 0 as its center. It won’t be as simple as suggested in the following picture, but it will be a closed curve thanks to De Moivre’s theorem and the observation that the cosine and sine are periodic. Now shrink r. Eventually, for r small enough, the non constant terms are negligible and so Ar is a curve which is contained in some circle centered at a0 which has 0 on the outside. Thus it is reasonable to believe that for some r durAr • a0 Ar r large ing this shrinking process, the set Ar must hit 0. It follows that p (z) = 0 for some z. •0 For example, consider the polynomial x3 +x+1+i. r small It has no real zeros. However, you could let z = r (cos t + i sin t) and insert this into the polynomial. Thus you would want to find a point where 3 (r (cos t + i sin t)) + r (cos t + i sin t) + 1 + i = 0 + 0i Expanding this expression on the left to write it in terms of real and imaginary parts, you get on the left r3 cos3 t − 3r3 cos t sin2 t + r cos t + 1 + i 3r3 cos2 t sin t − r3 sin3 t + r sin t + 1 Thus you need to have both the real and imaginary parts equal to 0. In other words, you need to have (0, 0) = r3 cos3 t − 3r3 cos t sin2 t + r cos t + 1, 3r3 cos2 t sin t − r3 sin3 t + r sin t + 1 for some value of r and t. First here is a graph of this parametric function of t for t ∈ [0, 2π] on the left, when r = 4. Note how the graph misses the origin 0 + i0. In fact, the closed curve is in the exterior of a circle which has the point 0 + i0 on its inside. r too big r too small r just right
y
50
2
0
y 0
-50
-2 -50
0
x
50
4
y
2 0 -2
-2
0
2
x
-4
-2
0
2
4
6
x
Next is the graph when r = .5. Note how the closed curve is included in a circle which has 0 + i0 on its outside. As you shrink r you get closed curves. At first, these closed curves enclose 0 + i0 and later, they exclude 0 + i0. Thus one of them should pass through this point. In fact, consider the curve which results when r = 1. 386 which is the graph on the right. Note how for this value of r the curve passes through the point 0 + i0. Thus for some t, 1.386 (cos t + i sin t) is a solution of the equation p (z) = 0 or very close to one.
1.8
Some Topics from Analysis
Recall from calculus that if A is a nonempty set, supa∈A f (a) denotes the least upper bound of f (A) or if this set is not bounded above, it equals ∞. Also inf a∈A f (a) denotes the greatest lower bound of f (A) if this set is bounded below and it equals −∞ if f (A) is not bounded below. Thus to say supa∈A f (a) = ∞ is just a way to say that A is not bounded above. The existence of these quantities is what we mean when we say that R is complete.
22
CHAPTER 1. BASIC NOTIONS
Definition 1.8.1
Let f (a, b) ∈ [−∞, ∞] for a ∈ A and b ∈ B where A, B are sets which means that f (a, b) is either a number, ∞, or −∞. The symbol, +∞ is interpreted as a point out at the end of the number line which is larger than every real number. Of course there is no such number. That is why it is called ∞. The symbol, −∞ is interpreted similarly. Then supa∈A f (a, b) means sup (Sb ) where Sb ≡ {f (a, b) : a ∈ A} . Unlike limits, you can take the sup in different orders.
Lemma 1.8.2 Let f (a, b) ∈ [−∞, ∞] for a ∈ A and b ∈ B where A, B are sets. Then sup sup f (a, b) = sup sup f (a, b) . a∈A b∈B
b∈B a∈A
Proof: Note that for all a, b, f (a, b) ≤ sup sup f (a, b) b∈B a∈A
and therefore, for all a, supb∈B f (a, b) ≤ supb∈B supa∈A f (a, b). Therefore, sup sup f (a, b) ≤ sup sup f (a, b) . a∈A b∈B
b∈B a∈A
Repeat the same argument interchanging a and b, to get the conclusion of the lemma.
Theorem 1.8.3
Let aij ≥ 0. Then ∞ X ∞ X
aij =
i=1 j=1
∞ X ∞ X
aij .
j=1 i=1
Proof: First note there is no trouble in defining these sums because the aij are all nonnegative. If a sum diverges, it only diverges to ∞ and so ∞ is the value of the sum. Next note that ∞ X ∞ ∞ X n X X aij ≥ sup aij n
j=r i=r
because for all j,
∞ X
aij ≥
i=r
Therefore,
∞ ∞ X X
aij ≥ sup n
j=r i=r
=
=
sup lim
n
n X
aij .
i=r
∞ X n X
aij = sup lim
n X m X
aij = sup
aij = lim
i=r j=r
n X
n
i=r j=r n X ∞ X
m X n X
n m→∞
j=r i=r
n m→∞
sup
j=r i=r
i=r n X ∞ X
n→∞
lim
m X
aij j=r ∞ X ∞ X
m→∞
aij =
i=r j=r
aij
j=r i=r
aij
i=r j=r
Interchanging the i and j in the above argument proves the theorem.
Corollary 1.8.4 If aij ≥ 0, then ∞ X ∞ X j=r i=r
aij =
X
aij
i,j
the last symbol meaning for Nr the integers larger than or equal to r, X sup aij where S is a finite subset of Nr × Nr (i,j)∈S
1.8.
SOME TOPICS FROM ANALYSIS
23
P P∞ P∞ P P∞ P∞ Proof: Obviously (i,j)∈S aij ≤ j=r i=r aij and so i,j aij ≤ j=r i=r aij . Now P∞ P∞ let λ < j=r i=r aij . Then there exists M such that M X ∞ X
aij =
j=r i=r
Now there is N such that
aij > λ
i=r j=r
N X M X
λ
p, then X X ∞ ∞ X ∞ ∞ X X q X ∞ |aij | b ≤ b ≤ ij ij i=p j=r i=p j=r i=p j=r P∞ P∞ which is small if p is large enough because i=r j=r |aij | < ∞. Thus the partial sums form a Cauchy sequence and so ∞ X ∞ X bij exists. i=r j=r
Similarly
P∞ P∞ i=r
j=r bij
exists. Now |aij | + Re aij ≥ 0. Thus
∞ X ∞ X
|aij | + Re aij =
i=r j=r
∞ X ∞ X
|aij | + Re aij
j=r i=r
Thus, from the definition of the infinite sums, ∞ X ∞ X i=r j=r
|aij | +
∞ X ∞ X
Re aij =
i=r j=r
∞ X ∞ X
|aij | +
j=r i=r
∞ X ∞ X
Re aij
j=r i=r
Subtracting that which is known to be equal from both sides, ∞ X ∞ X i=r j=r
Re aij =
∞ X ∞ X
Re aij
j=r i=r
A similar equation by the P∞ Pholds P∞same P∞reasoning for Im aij in place of Re aij . Then this ∞ implies that i=r j=r aij = j=r i=r aij .
24
1.8.1
CHAPTER 1. BASIC NOTIONS
lim sup and lim inf n
Sometimes the limit of a sequence does not exist. For example, if an = (−1) , then limn→∞ an does not exist. This is because the terms of the sequence are a distance of 1 apart. Therefore there can’t exist a single number such that all the terms of the sequence are ultimately within 1/4 of that number. The nice thing about lim sup and lim inf is that they always exist. First here is a simple lemma and definition.
Definition 1.8.6 Denote by [−∞, ∞] the real line along with symbols ∞ and −∞. It is understood that ∞ is larger than every real number and −∞ is smaller than every real number. Then if {An } is an increasing sequence of points of [−∞, ∞] , limn→∞ An equals ∞ if the only upper bound of the set {An } is ∞. If {An } is bounded above by a real number, then limn→∞ An is defined in the usual way and equals the least upper bound of {An }. If {An } is a decreasing sequence of points of [−∞, ∞] , limn→∞ An equals −∞ if the only lower bound of the sequence {An } is −∞. If {An } is bounded below by a real number, then limn→∞ An is defined in the usual way and equals the greatest lower bound of {An }. More simply, if {An } is increasing, lim An ≡ sup {An } n→∞
and if {An } is decreasing then lim An ≡ inf {An } .
n→∞
Lemma 1.8.7 Let {an } be a sequence of real numbers and let Un ≡ sup {ak : k ≥ n} . Then {Un } is a decreasing sequence. Also if Ln ≡ inf {ak : k ≥ n} , then {Ln } is an increasing sequence. Therefore, limn→∞ Ln and limn→∞ Un both exist. Proof: Let Wn be an upper bound for {ak : k ≥ n} . Then since these sets are getting smaller, it follows that for m < n, Wm is an upper bound for {ak : k ≥ n} . In particular if Wm = Um , then Um is an upper bound for {ak : k ≥ n} and so Um is at least as large as Un , the least upper bound for {ak : k ≥ n} . The claim that {Ln } is decreasing is similar. From the lemma, the following definition makes sense.
Definition 1.8.8
Let {an } be any sequence of points of [−∞, ∞] lim sup an ≡ lim sup {ak : k ≥ n} n→∞
n→∞
lim inf an ≡ lim inf {ak : k ≥ n} . n→∞
Theorem 1.8.9
n→∞
Suppose {an } is a sequence of real numbers and that lim sup an n→∞
and lim inf an n→∞
are both real numbers. Then limn→∞ an exists if and only if lim inf an = lim sup an n→∞
n→∞
and in this case, lim an = lim inf an = lim sup an .
n→∞
n→∞
n→∞
1.8.
SOME TOPICS FROM ANALYSIS
25
Proof: First note that sup {ak : k ≥ n} ≥ inf {ak : k ≥ n} and so, lim sup an
≡
n→∞
≥
lim sup {ak : k ≥ n}
n→∞
lim inf {ak : k ≥ n}
n→∞
≡ lim inf an . n→∞
Suppose first that limn→∞ an exists and is a real number a. Then from the definition of a limit, there exists N corresponding to ε/6 in the definition. Hence, if m, n ≥ N, then |an − am | ≤ |an − a| + |a − an |
0, there exists N such that sup {ak : k ≥ N } − inf {ak : k ≥ N } < ε, and for every N, inf {ak : k ≥ N } ≤ a ≤ sup {ak : k ≥ N } Thus if n ≥ N, |a − an | < ε which implies that limn→∞ an = a. In case a = ∞ = lim sup {ak : k ≥ N } = lim inf {ak : k ≥ N } N →∞
N →∞
then if r ∈ R is given, there exists N such that inf {ak : k ≥ N } > r which is to say that limn→∞ an = ∞. The case where a = −∞ is similar except you use sup {ak : k ≥ N }. The significance of lim sup and lim inf, in addition to what was just discussed, is contained in the following theorem which follows quickly from the definition.
26
CHAPTER 1. BASIC NOTIONS
Theorem 1.8.10
Suppose {an } is a sequence of points of [−∞, ∞] . Let λ = lim sup an . n→∞
Then if b > λ, it follows there exists N such that whenever n ≥ N, an ≤ b. If c < λ, then an > c for infinitely many values of n. Let γ = lim inf an . n→∞
Then if d < γ, it follows there exists N such that whenever n ≥ N, an ≥ d. If e > γ, it follows an < e for infinitely many values of n. The proof of this theorem is left as an exercise for you. It follows directly from the definition and it is the sort of thing you must do yourself. Here is one other simple proposition.
Proposition 1.8.11 Let limn→∞ an = a > 0. Then lim sup an bn = a lim sup bn . n→∞
n→∞
Proof: This follows from the definition. Let λn = sup {ak bk : k ≥ n} . For all n large enough, an > a − ε where ε is small enough that a − ε > 0. Therefore, λn ≥ sup {bk : k ≥ n} (a − ε) for all n large enough. Then lim sup an bn
=
lim λn
n→∞
n→∞
≥
lim (sup {bk : k ≥ n} (a − ε))
n→∞
(a − ε) lim sup bn
=
n→∞
Similar reasoning shows lim sup an bn ≤ (a + ε) lim sup bn n→∞
n→∞
Now since ε > 0 is arbitrary, the conclusion follows. A fundamental existence theorem is the nested interval lemma.
1.8.2
Nested Interval Lemma
A fundamental existence theorem is the nested interval lemma.
Lemma 1.8.12 Let Ik = [ak , bk ] and suppose Ik ⊇ Ik+1 for all k. Then there is a point x ∈ ∩∞ k=1 Ik . Proof: Suppose k ≤ l. Then ak ≤ al ≤ bl . On the other hand, suppose k > l. Then ak ≤ bk ≤ bl . Let a ≡ supk ak . Then from what was just observed, a ≤ bl for each l. Therefore, also a ≤ inf bl , so al ≤ a ≤ bl l
for every l showing that a is a point in all these intervals.
1.9. EXERCISES
1.9
27
Exercises
1 4 1 3 1 2 n + n + n . 4 2 4 √ Pn 1 2. Prove by induction that whenever n ≥ 2, k=1 √ > n. k Pn 3. Prove by induction that 1 + i=1 i (i!) = (n + 1)!. Pn n 4. The binomial theorem states (x + y) = k=0 nk xn−k y k where n+1 n n n n = + if k ∈ [1, n] , ≡1≡ k k k−1 0 n
1. Prove by induction that
Pn
k=1
k3 =
Prove the binomial theorem by induction. Next show that n n! , 0! ≡ 1 = (n − k)!k! k 5. Let z = 5 + i9. Find z −1 . 6. Let z = 2 + i7 and let w = 3 − i8. Find zw, z + w, z 2 , and w/z. 7. Give the complete solution to x4 + 16 = 0. 8. Graph the complex cube roots of 8 in the complex plane. Do the same for the four fourth roots of 16. 9. If z is a complex number, show there exists ω a complex number with |ω| = 1 and ωz = |z| . n
10. De Moivre’s theorem says [r (cos t + i sin t)] = rn (cos nt + i sin nt) for n a positive integer. Does this formula continue to hold for all integers n, even negative integers? Explain. 11. You already know formulas for cos (x + y) and sin (x + y) and these were used to prove De Moivre’s theorem. Now using De Moivre’s theorem, derive a formula for sin (5x) and one for cos (5x). 12. If z and w are two complex numbers and the polar form of z involves the angle θ while the polar form of w involves the angle φ, show that in the polar form for zw the angle involved is θ + φ. Also, show that in the polar form of a complex number z, r = |z| . 13. Factor x3 + 8 as a product of linear factors. 14. Write x3 + 27 in the form (x + 3) x2 + ax + b where x2 + ax + b cannot be factored any more using only real numbers. 15. Completely factor x4 + 16 as a product of linear factors. 16. Factor x4 + 16 as the product of two quadratic polynomials each of which cannot be factored further without using complex numbers. Qn 17. If z, w are complex numbers prove zw = zw and then show by induction that j=1 zj = Qn Pm Pm j=1 zj . Also verify that k=1 zk . In words this says the conjugate of a k=1 zk = product equals the product of the conjugates and the conjugate of a sum equals the sum of the conjugates. 18. Suppose p (x) = an xn + an−1 xn−1 + · · · + a1 x + a0 where all the ak are real numbers. Suppose also that p (z) = 0 for some z ∈ C. Show it follows that p (z) = 0 also.
28
CHAPTER 1. BASIC NOTIONS
19. Show that 1 + i, 2 + i are the only two zeros to p (x) = x2 − (3 + 2i) x + (1 + 3i) so the zeros do not necessarily come in conjugate pairs if the coefficients are not real. 20. I claim that 1 = −1. Here is why. −1 = i2 =
√
q √ √ 2 −1 −1 = (−1) = 1 = 1.
This is clearly a remarkable result but is there something wrong with it? If so, what is wrong? 21. De Moivre’s theorem is really a grand thing. I plan to use it now for rational exponents, not just integers. 1/4
1 = 1(1/4) = (cos 2π + i sin 2π)
= cos (π/2) + i sin (π/2) = i.
Therefore, squaring both sides it follows 1 = −1 as in the previous problem. What does this tell you about De Moivre’s theorem? Is there a profound difference between raising numbers to integer powers and raising numbers to non integer powers? 22. Review Problem 10 at this point. Now here is another question: If n is an integer, is n it always true that (cos θ − i sin θ) = cos (nθ) − i sin (nθ)? Explain. 23. Suppose you Pmhave Pnany polynomial in cos θ and sin θ. By this I mean an expression of the form α=0 β=0 aαβ cosα θ sinβ θ where aαβ ∈ C. Can this always be written in Pm+n Pn+m the form γ=−(n+m) bγ cos γθ + τ =−(n+m) cτ sin τ θ? Explain. 24. Show that C cannot be considered an ordered field. Hint: Consider i2 = −1. 25. Suppose p (x) = an xn + an−1 xn−1 + · · · + a1 x + a0 is a polynomial and it has n zeros, z1 , z2 , · · · , zn listed according to multiplicity. (z is a root of multiplicity m if the polynomial f (x) = m (x − z) divides p (x) but (x − z) f (x) does not.) Show that p (x) = an (x − z1 ) (x − z2 ) · · · (x − zn ) . 26. Give the solutions to the following quadratic equations having real coefficients. (a) x2 − 2x + 2 = 0 (b) 3x2 + x + 3 = 0 (c) x2 − 6x + 13 = 0 (d) x2 + 4x + 9 = 0 (e) 4x2 + 4x + 5 = 0 27. Give the solutions to the following quadratic equations having complex coefficients. Note how the solutions do not come in conjugate pairs as they do when the equation has real coefficients. (a) x2 + 2x + 1 + i = 0 (b) 4x2 + 4ix − 5 = 0 (c) 4x2 + (4 + 4i) x + 1 + 2i = 0 (d) x2 − 4ix − 5 = 0
1.9. EXERCISES
29
(e) 3x2 + (1 − i) x + 3i = 0 28. Prove the fundamental theorem of algebra for quadratic polynomials having coefficients in C. That is, show that an equation of the form ax2 + bx + c = 0 where a, b, c are complex numbers, a 6= 0 has a complex solution. Hint: Consider the fact, noted earlier that the expressions given from the quadratic formula do in fact serve as solutions. 29. Prove the Euclidean algorithm: If m, n are positive integers, then there exist integers q, r ≥ 0 such that r < m and n = qm + r Hint: You might try considering S ≡ {n − km : k ∈ N and n − km < 0} and picking the smallest integer in S or something like this. It was done in the chapter, but go through it yourself. 30. Verify DeMorgan’s laws, C (∪ {A : A ∈ C}) = ∩ AC : A ∈ C C (∩ {A : A ∈ C}) = ∪ AC : A ∈ C where C consists of a set whose elements are subsets of a given set S. Hint: This says the complement of a union is the intersection of the complements and the complement of an intersection is the union of the complements. You need to show each set on either side of the equation is a subset of the other side.
30
CHAPTER 1. BASIC NOTIONS
Chapter 2
Basic Topology and Algebra 2.1
Some Algebra
It is often the case that one desires to consider the complex numbers denoted as C. However, most of the time in the first part of the book, we have in mind R. Nevertheless, it is convenient to have the theory pertain just as well to C or Cp . Therefore, we will denote by F that which could be either R or C. The symbol |x| will refer to the norm of x ∈ Fp , defined as v uX p u p 2 t |xj | = |x| where x ≡ x1 · · · xp , |a + ib| ≡ a2 + b2 j=1
but we could also use the norm k·k = k·k∞ defined as kxk∞ = max {|xi | , i = 1, · · · , p} I assume the reader has seen most of this before in multivariable calculus. From linear algebra or calculus, k·k is called a norm if the following conditions are satisfied. kxk ≥ 0 and kxk = 0 if and only if x = 0 (2.1) For α a scalar, kαxk = |α| kxk
(2.2)
kx + yk ≤ kxk + kyk
(2.3)
First note that if z, w ∈ C. Then |z + w| ≤ |z| + |w|. To see this, let z = a + ib, w = c + id. Then |z + w|
2
=
2
2
2
2
(a + c) + (b + d) = |z| + |w| + 2 (ac + bd) 2
2
2
2
2
= |z| + |w| + 2 Re (zw) ≤ |z| + |w| + 2 |z| |w| = (|z| + |w|) This also follows from the first part of the following proposition.
Proposition 2.1.1 Each of k·k∞ and |·| satisfy the axioms of a norm, 2.1 - 2.3. Pp Proof : Consider first |·| , the usual norm. Define (x, y) ≡ j=1 xj yj . Then it is obvious 2 that (x, x) = |x| . Also the following come directly from the definition. (x, y) = (y, x)
(2.4)
(ax + by, z) = a (x, z) + b (y, z)
(2.5)
For a, b ∈ F ,
31
32
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA (x, x) ≥ 0 and equals 0 if and only if x = 0
(2.6)
Note that (z,ax + by) = (ax + by, z) = a (x, z) + b (y, z) = a (z, x) + b (z, y) Thus the comma goes across + and you can factor out scalars in first argument and conjugates of the scalars in the second. The properties 2.4-2.6 are called the inner product axioms. The Cauchy Schwarz inequality, Proposition 1.5.5 says that |(x, y)| ≤ |x| |y|. Therefore, 2
|x + y|
=
(x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y)
=
|x| + 2 Re (x, y) + |y| ≤ |x| + 2 |(x, y)| + |y|
≤
|x| + 2 |x| |y| + |y| = (|x| + |y|)
2
2
2
2
2
2
2
which shows 2.3 for |·|. The other two axioms are obvious. As to k·k∞ , only 2.3 is not obvious. kx + yk∞
=
max {|xi + yi | , i ≤ p} ≤ max {|xi | + |yi | , i ≤ p}
≤
max {|xi | , i ≤ p} + max {|yi | , i ≤ p} = kxk∞ + kyk∞
It is very useful to observe that the Cauchy Schwarz inequality comes from the list of axioms 2.4 - 2.6. Thus the following is also called the Cauchy Schwarz inequality.
Proposition 2.1.2 Suppose the axioms 2.4 - 2.6. Then |(x, y)| ≤ |x| |y| 1/2
where |x| ≡ (x, x)
. Equality holds if and only if one vector is a real multiple of the other.
Proof: There is θ ∈ C such that |θ| = 1 and ¯θ (x, y) = |(x, y)|. Then consider p (t) ≡ (x + tθy, x + tθy) . Thus p (t) ≥ 0 for all t ∈ R. From the axioms, p (t)
=
(x, x) + 2 Re t¯θ (x, y) + t2 (y, y)
=
(x, x) + 2t |(x, y)| + t2 (y, y) ≥ 0
Assume (y, y) > 0. Thus, the polynomial p (t) has no real roots or only one. By the 2 quadratic formula, 4 |(x, y)| − 4 (x, x) (y, y) ≤ 0 which is a restatement of the Cauchy Schwarz inequality. If (y, y) = 0, then p (t) cannot be nonnegative for all t ∈ R unless (x, y) = 0 and so the inequality holds. For the last claim, note that if one vector is a real multiple of the other equality holds from application of the axioms and definitions of |·|. To go the other direction, equality holds 2 if and only if the 4 |(x, y)| − 4 (x, x) (y, y) = 0 if and only if the polynomial has exactly one real root if and only if for some t real, p (t) = 0 which implies for that t, x + tθy = 0 and so one vector is a multiple of the other. Note that the axiom which says (x, x) = 0 only if x = 0 can be removed in the first part of this proposition. The only thing used was that (x, x) ≥ 0. Note that the above two norms are equivalent in the sense that √ (*) kxk∞ ≤ |x| ≤ p kxk∞ Thus in what follows, it will not matter which norm is being used. Actually, any two norms are equivalent, which will be shown later. The significance of the Euclidean norm |·| is geometrical. See the problems. The fundamental result pertaining to the inner product just discussed is the Gram Schmidt process presented next.
2.1. SOME ALGEBRA
Definition 2.1.3
33 A set of vectors {v1 , · · · , vk } is called orthonormal if 1 if i = j (vi , vj ) = δ ij ≡ 0 if i 6= j
Then there is a very easy proposition which follows this.
Proposition 2.1.4 Suppose {v1 , · · · , vk } is an orthonormal set of vectors. Then it is linearly independent. Pk Proof: Suppose i=1 ci vi = 0. Then taking inner products with vj , X X 0 = (0, vj ) = ci (vi , vj ) = ci δ ij = cj . i
i
Since j is arbitrary, this shows the set is linearly independent as claimed. It turns out that if X is any subspace of Fm , then there exists an orthonormal basis for X. This follows from the use of the next lemma applied to a basis for X. Recall first that from linear algebra, every subspace of Fm has a basis. If this is not familiar, see the appendix on linear algebra.
Lemma 2.1.5 Let {x1 , · · · , xn } be a linearly independent subset of Fp , p ≥ n. Then there exist orthonormal vectors {u1 , · · · , un } which have the property that for each k ≤ n, span (x1 , · · · , xk ) = span (u1 , · · · , uk ) . Proof: Let u1 ≡ x1 / |x1 | . Thus for k = 1, span (u1 ) = span (x1 ) and {u1 } is an orthonormal set. Now suppose for some k < n, u1 , · · · , uk have been chosen such that (uj , ul ) = δ jl and span (x1 , · · · , xk ) = span (u1 , · · · , uk ). Then define uk+1
Pk xk+1 − j=1 (xk+1 , uj ) uj , ≡ Pk xk+1 − j=1 (xk+1 , uj ) uj
(2.7)
where the denominator is not equal to zero because the xj form a basis, and so xk+1 ∈ / span (x1 , · · · , xk ) = span (u1 , · · · , uk ) Thus by induction, uk+1 ∈ span (u1 , · · · , uk , xk+1 ) = span (x1 , · · · , xk , xk+1 ) . Also, xk+1 ∈ span (u1 , · · · , uk , uk+1 ) which is seen easily by solving 2.7 for xk+1 and it follows span (x1 , · · · , xk , xk+1 ) = span (u1 , · · · , uk , uk+1 ) . If l ≤ k, (uk+1 , ul ) = C (xk+1 , ul ) −
k X
(xk+1 , uj ) (uj , ul ) =
j=1
C (xk+1 , ul ) −
k X
(xk+1 , uj ) δ lj = C ((xk+1 , ul ) − (xk+1 , ul )) = 0.
j=1 n
The vectors, {uj }j=1 , generated in this way are therefore orthonormal because each vector has unit length. The following lemma is a fairly simple observation about the Gram Schmidt process which says that if you start with orthonormal vectors, the process will not undo what you already have.
34
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
Lemma 2.1.6 Suppose {w1 , · · · , wr , vr+1 , · · · , vp } is a linearly independent set of vectors such that {w1 , · · · , wr } is an orthonormal set of vectors. Then when the Gram Schmidt process is applied to the vectors in the given order, it will not change any of the w1 , · · · , wr . Proof: Let {u1 , · · · , up } be the orthonormal set delivered by the Gram Schmidt process. Then u1 = w1 because by definition, u1 ≡ w1 / |w1 | = w1 . Now suppose uj = wj for all j ≤ k ≤ r. Then if k < r, consider the definition of uk+1 . Pk+1 wk+1 − j=1 (wk+1 , uj ) uj ≡ Pk+1 wk+1 − j=1 (wk+1 , uj ) uj
uk+1
By induction, uj = wj and so this reduces to wk+1 / |wk+1 | = wk+1 . Note how the argument depended only on the axioms of the inner product. Thus it will hold with no change in any inner product space, not just Fn . However, our main interest is the special case. An inner produce space is just a vector space which has an inner product, that which satisfies 2.4 - 2.6.
Lemma 2.1.7 Suppose V, W are subspaces of Fn and they have orthonormal bases, {v1 , · · · , vr } , {w1 , · · · , wr } respectively. Let A map V to W be defined by ! r r X X A ck vk ≡ ck wk k=1
k=1
Then |Av| = |v| . That is, A preserves Euclidean norms. Proof: This follows right away from a computation. If {u1 , · · · , ur } is orthonormal, then 2 ! r r r X X X X X X 2 ck uk = ck uk , ck uk = ck cj (uk , uj ) = ck ck = |ck | k=1
k=1
k=1
j,k
k
k
Pr Pr P Pr 2 2 2 2 Therefore,|A ( k=1 ck vk )| = | k=1 ck wk | = k |ck | = | k=1 ck vk | .
2.2
Metric Spaces
It was shown above that kx + yk ≤ kxk + kyk where k·k = |·| or k·k∞ . This was called the triangle inequality. Thus, in particular, kx − yk + ky − zk ≥ kx − zk A metric space is a nonempty set X along with a distance function d : X × X → [0, ∞) which satisfies the following axioms. 1. d (x, y) = d (y, x) 2. d (x, y) + d (y, z) ≥ d (x, z) 3. d (x, x) = 0 and d (x, y) = 0 if and only if x = y
2.3. CLOSED AND OPEN SETS
35
Thus Fn with either of the norms discussed is an example of a metric space if we define d (x, y) ≡ kx − yk A metric space is significantly more general than a normed vector space like Rn because it does not have any algebraic vector space properties associated with it. The only thing of importance is the distance function. There are many things which are metric spaces which are of interest and are not vector spaces. For example, you could consider the surface of the earth. It is not a subspace of R3 but it is very meaningful to ask for the distance between points on the earth. Because of this, I am going to use the language of metric spaces when referring to things which only involve topological considerations. It is customary to not bother to make the symbol for something in the space bold face so I will follow this simplified notation when referring to metric space which does not necessarily have any vector space attributes. A useful result is in the following lemma.
Lemma 2.2.1 Suppose xn → x and yn → y. Then d (xn , yn ) → d (x, y). Proof: Consider the following. d (x, y) ≤ d (x, xn ) + d (xn , y) ≤ d (x, xn ) + d (xn , yn ) + d (yn , y) so d (x, y) − d (xn , yn ) ≤ d (x, xn ) + d (yn , y). Similarly d (xn , yn ) − d (x, y) ≤ d (x, xn ) + d (yn , y) and so |d (xn , yn ) − d (x, y)| ≤ d (x, xn ) + d (yn , y) and the right side converges to 0 as n → ∞.
2.3
Closed and Open Sets
The definition of open and closed sets is next. This is in an arbitrary metric space.
Definition 2.3.1
An open ball, denoted as B (x, r) is defined as follows. B (x, r) ≡ {y : d (x, y) < r}
A set U is said to be open if whenever x ∈ U, it follows that there is r > 0 such that B (x, r) ⊆ U . More generally, a point x is said to be an interior point of U if there exists such a ball. In words, an open set is one for which every point is an interior point. For example, you could have X be a subset of R and d (x, y) = |x − y|. Then the first thing to show is the following.
Proposition 2.3.2 An open ball is an open set. Proof: Suppose y ∈ B (x, r) . We need to verify that y is an interior point of B (x, r). Let δ = r − d (x, y) . Then if z ∈ B (y, δ) , it follows that d (z, x) ≤ d (z, y) + d (y, x) < δ + d (y, x) = r − d (x, y) + d (y, x) = r Thus y ∈ B (y, δ) ⊆ B (x, r).
Definition 2.3.3 Let S be a nonempty subset of a metric space. Then p is a limit point (accumulation point) of S if for every r > 0 there exists a point different than p in B (p, r) ∩ S. Sometimes people denote the set of limit points as S 0 .
36
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA p A
The following proposition is fairly obvious from the above definition and will be used whenever convenient. It is equivalent to the above definition and so it can take the place of the above definition if desired.
Proposition 2.3.4 A point x is a limit point of the nonempty set A if and only if every B (x, r) contains infinitely many points of A. Proof: ⇐ is obvious. Consider ⇒ . Let x be a limit point. Let r1 = 1. Then B (x, r1 ) contains a1 6= x. If {a1 , · · · , an } have been chosen none equal to x and with no repeats in the list, let 0 < rn < min n1 , min {d (ai , x) , i = 1, 2, · · · n} . Then let an+1 ∈ B (x, rn ) . Thus every B (x, r) contains B (x, rn ) for all n large enough and hence it contains ak for k ≥ n where the ak are distinct, none equal to x. A related idea is the notion of the limit of a sequence. Recall that a sequence is really ∞ just a mapping from N to X. We write them as {xn } or {xn }n=1 if we want to emphasize the values of n. Then the following definition is what it means for a sequence to converge.
Definition 2.3.5
We say that x = limn→∞ xn when for every ε > 0 there exists N
such that if n ≥ N, then d (x, xn ) < ε Often we write xn → x for short. This is equivalent to saying lim d (x, xn ) = 0.
n→∞
Proposition 2.3.6 The limit is well defined. That is, if x, x0 are both limits of a sequence, then x = x0 . Proof: From the definition, there exist N, N 0 such that if n ≥ N, then d (x, xn ) < ε/2 and if n ≥ N 0 , then d (x, xn ) < ε/2. Then let M ≥ max (N, N 0 ) . Let n > M. Then d (x, x0 ) ≤ d (x, xn ) + d (xn , x0 )
0, B (p, r) contains xn ∈ S for all n large enough. Hence, p is a limit point because none of these xn are equal to p.
Definition 2.3.8
A set H is closed means H C is open.
Note that this says that the complement of an open set is closed. If V is open, then the C complement of its complement is itself. Thus V C = V an open set. Hence V C is closed. Thus, open sets are complements of closed sets and closed sets are complements of open sets. Then the following theorem gives the relationship between closed sets and limit points.
2.3. CLOSED AND OPEN SETS
Theorem 2.3.9
37
A set H is closed if and only if it contains all of its limit points.
Proof: =⇒ Let H be closed and let p be a limit point. We need to verify that p ∈ H. If it is not, then since H is closed, its complement is open and so there exists δ > 0 such that B (p, δ) ∩ H = ∅. However, this prevents p from being a limit point. ⇐= Next suppose H has all of its limit points. Why is H C open? If p ∈ H C then it is not a limit point and so there exists δ > 0 such that B (p, δ) has no points of H. In other words, H C is open. Hence H is closed.
Corollary 2.3.10 A set H is closed if and only if whenever {hn } is a sequence of points of H which converges to a point x, it follows that x ∈ H. Proof: =⇒ Suppose H is closed and hn → x. If x ∈ H there is nothing left to show. If x∈ / H, then from the definition of limit, it is a limit point of H because none of the hn are equal to x. Hence x ∈ H after all. ⇐= Suppose the limit condition holds, why is H closed? Let x ∈ H 0 the set of limit points of H. By Theorem 2.3.7 there exists a sequence of points of H, {hn } such that hn → x. Then by assumption, x ∈ H. Thus H contains all of its limit points and so it is closed by Theorem 2.3.9. Next is the important concept of a subsequence.
Definition 2.3.11
∞
Let {xn }n=1 be a sequence. Then if n1 < n2 < · · · is a strictly ∞ ∞ increasing sequence of indices, we say {xnk }k=1 is a subsequence of {xn }n=1 . The really important thing about subsequences is that they preserve convergence.
Theorem 2.3.12
Let {xnk } be a subsequence of a convergent sequence {xn } where xn → x. Then limk→∞ xnk = x also.
Proof: Let ε > 0 be given. Then there exists N such that d (xn , x) < ε if n ≥ N. It follows that if k ≥ N, then nk ≥ N and so d (xnk , x) < ε if k ≥ N. This is what it means to say limk→∞ xnk = x. In the case of Fp , if you have two norms which are equivalent in the sense that δ kxk ≤ kxk1 ≤ ∆ kxk then a set U is open with respect to one norm if and only if it is open with respect to the other. Indeed, if x ∈ U which is open with respect to k·k , then there is Bk·k (x, δ) ⊆ U but from the above inequality, 1 Bk·k1 x, δ ⊆ Bk·k (x, δ) ⊆ U ∆ because if kx − yk1
0 such that B (p, rk ) ⊆ Uk . Let 0 < r ≤ min (r1 , r2 , · · · , rm ) . Then B (p, r) ⊆ ∩m k=1 Uk and so the finite intersection is open. Note that if the finite intersection is empty, there is nothing to prove because it is certainly true in this case that every point in the intersection is an interior point because there aren’t any such points. Suppose {H1 , · · · , Hm } is a finite set of closed sets. Then ∪m k=1 Hk is closed if its complement is open. However, from DeMorgan’s laws, Problem 30 on Page 29, C
m C (∪m k=1 Hk ) = ∩k=1 Hk ,
a finite intersection of open sets which is open by what was just shown. Next let C be a set consisting of closed sets. Then C (∩C) = ∪ H C : H ∈ C , a union of open sets which is therefore open by the first part of the proof. Thus ∩C is closed. This proves the theorem. Now back to Fp we can conclude that every point in an open set is a limit point of the open set. Example 2.3.14 Consider A = B (x, δ) , an open ball in Fp . Then every point of B (x, δ) is a limit point of A. If z ∈ B (x, δ) , consider z+ k1 (x − z) ≡ wk for k ∈ N. Then
1
kwk − xk = z+ (x − z) − x
k
k−1 1 1 = 1 − z− 1 − x
= k kz − xk < δ k k and also
1 kx − zk < δ/k k so wk → z. Furthermore, the wk are distinct. Thus z is a limit point of A as claimed. This is because every ball containing z contains infinitely many of the wk and since they are all distinct, they can’t all be equal to z. In a general metric space, peculiar things can occur. In particular, you can have a nonempty open set which has no limit points at all. kwk − zk ≤
Example 2.3.15 Let Ω 6= ∅ and define for x, y ∈ Ω, d (x, y) = 0 if x = y and d (x, y) = 1 if x 6= y. Then you can show that this is a perfectly good metric space on Ω. However, every set is both open and closed. There are also no limit points for any nonempty set since B (x, 1/2) = {x}. You should consider why every set is both open and closed. Next is the definition of what is meant by the closure of a set.
Definition 2.3.16 Let A be a nonempty subset of X for X a metric space. Then A is defined to be the intersection of all closed sets which contain A. This is called the closure of A. Note X is one such closed set which contains A. Lemma 2.3.17 Let A be a nonempty set in X. Then A is a closed set and A = A ∪ A0 where A0 denotes the set of limit points of A.
2.4. SEQUENCES AND CAUCHY SEQUENCES
39
Proof: First of all, denote by C the set of closed sets which contain A. Then A ≡ ∩C C and this will be closed if its complement is open. However, A = ∪ H C : H ∈ C . Each H C is open and so the union of all these open sets must also be open. This is because if x is in this union, then it is in at least one of them. Hence it is an interior point of that one. But this implies it is an interior point of the union of them all which is an even larger set. Thus A is closed. The interesting part is the next claim. First note that from the definition, A ⊆ A so if / A. If y ∈ / A, a closed set, then there exists x ∈ A, then x ∈ A. Now consider y ∈ A0 but y ∈ C B (y, r) ⊆ A . Thus y cannot be a limit point of A, a contradiction. Therefore, A ∪ A0 ⊆ A Next suppose x ∈ A and suppose x ∈ / A. Then if B (x, r) contains no points of A different than x, since x itself is not in A, it would follow that B (x, r) ∩ A = ∅ and so recalling that C open balls are open, B (x, r) is a closed set containing A so from the definition, it also / A, then x ∈ A0 and contains A which is contrary to the assertion that x ∈ A. Hence if x ∈ 0 so A ∪ A ⊇ A
2.4
Sequences and Cauchy Sequences
It was discussed above what is meant by convergence of a sequence in a metric space. It was shown that if a sequence converges, then so does every subsequence. The context in this section will be a metric space (X, d). k Of course the converse does not hold. Consider ak = (−1) it has a subsequence converging to 1 but the sequence does not converge. However, if you have a Cauchy sequence, defined next, then convergence of a subsequence does imply convergence of the Cauchy sequence. This is a very important observation.
Definition 2.4.1 {xk } is a Cauchy sequence if and only if the following holds. For every ε > 0, there exists nε such that if k, l ≥ nε , then d (xk , xl ) < ε All Cauchy sequences are bounded.
Theorem 2.4.2 The set of terms in a Cauchy sequence {xn } in X is bounded in the sense that there exists M such that for all n, d (xn , x1 ) < M. Also, if any sequence converges, then it is Cauchy. Proof: Let ε = 1 in the definition of a Cauchy sequence and let n > n1 . Then from the definition, d (xn , xn1 ) < 1. It follows that if n ≥ n1 , d (xn , x1 ) < d (xn , xn1 ) + d (xn1 , x1 ) < 1 + d (xn1 , x1 ) and if n < n1 , d (xn , x1 ) ≤ max {d (xk , x1 ) : k ≤ n1 } and so for any n, d (xn , x1 ) < 1 + d (xn1 , x1 ) + max {d (xk , x1 ) : k ≤ n1 } If limn→∞ xn = x, then if ε > 0 is given, there is N such that if n > N, then d (xn , x) < ε/2. Thus, if n, m > N, then d (xn , xm ) ≤ d (xn , x) + d (x, xm ) < 2ε + 2ε = ε and so the sequence is Cauchy whenever it converges. Here is the theorem which says that if a subsequence of a Cauchy sequence converges, then so does the Cauchy sequence.
Theorem 2.4.3 Let {xn } be a Cauchy sequence. Then it converges to x if and only if some subsequence converges to x. Proof: =⇒ This was just done above. Indeed, if the sequence converges, then every subsequence converges to the same thing. ⇐= Suppose now that {xn } is a Cauchy sequence and limk→∞ xnk = x. Then there exists N1 such that if k > N1 , then d (xnk , x) < ε/2. From the definition of what it means
40
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
to be Cauchy, there exists N2 such that if m, n ≥ N2 , then d (xm , xn ) < ε/2. Let N ≥ max (N1 , N2 ). Then if k ≥ N, then nk ≥ N and so d (x, xk ) ≤ d (x, xnk ) + d (xnk , xk )
0, then B (x, r) ∩ D 6= ∅. A metric space is called completely separable if there exists a countable collection of nonempty open sets B such that every open set is the union of some subset of B. This collection of open sets is called a countable basis. For those who like to fuss about empty sets, the empty set is open and it is indeed the union of a subset of B namely the empty subset.
2.5. SEPARABILITY AND COMPLETE SEPARABILITY
Theorem 2.5.2
41
A metric space is separable if and only if it is completely separable.
Also Fp is separable. Proof: ⇐= Let B be the special countable collection of open sets and for each B ∈ B, let pB be a point of B. Then let P ≡ {pB : B ∈ B}. To be specific, let pB be the center of B. If B (x, r) is any ball, then it is the union of sets of B and so there is a point of P in it. Since B is countable, so is P. =⇒ Let D be the countable dense set and let B ≡ {B (d, r) : d ∈ D, r ∈ Q ∩ [0, ∞)}. Then B is countable because the Cartesian product of countable sets is countable. It suffices to show that every ball is the union of these sets. Let B (x, R) be a ball. Let y ∈ B (y, δ) ⊆ δ δ . Let ε ∈ Q and 10 < ε < 5δ . Then y ∈ B (d, ε) ∈ B. B (x, R) . Then there exists d ∈ B y, 10 Is B (d, ε) ⊆ B (x, R)? If so, then the desired result follows because this would show that every y ∈ B (x, R) is contained in one of these sets of B which is contained in B (x, R) showing that B (x, R) is the union of sets of B. Let z ∈ B (d, ε) ⊆ B d, 5δ . Then d (y, z) ≤ d (y, d) + d (d, z)
0 such that B (x, r) ⊆ U ∈ C. Let k be large enough that n1k < 2r and also large enough that xnk ∈ B x, 2r . Then B xnk , n1k ⊆ B xnk , 2r ⊆ B (x, r) contrary to the requirement that B xnk , n1k is not contained in any set of C. In any metric space, these two definitions of compactness are equivalent.
Theorem 2.6.5
Let K be a nonempty subset of a metric space (X, d). Then it is compact if and only if it is sequentially compact.
2.6. COMPACTNESS
43
Proof: ⇐ Suppose K is sequentially compact. Let C be an open cover of K. By Proposition 2.6.4 there is a Lebesgue number δ > 0. Let x1 ∈ K. If B (x1 , δ) covers K, then pick a set of C containing this ball and this set will be a finite subset of C which covers K. If B (x1 , δ) does not cover K, let x2 ∈ / B (x1 , δ). Continue this way obtaining xk such that n d (xk , xj ) ≥ δ whenever k 6= j. Thus eventually {B (xi , δ)}i=1 must cover K because if not, you could get a sequence {xk } which has every pair of points further apart than δ and hence it has no Cauchy subsequence. Therefore, by Lemma 2.4.2, it would have no convergent subsequence. This would contradict K is sequentially compact. ⇒ Now suppose K is compact. If it is not sequentially compact, then there exists a sequence {xn } which has no convergent subsequence to a point of K. In particular, no point of this sequence is repeated infinitely often. The set of points ∪n {xn } has no limit point in K. If it did, you would have a subsequence converging to this point since every ball containing this point would contain infinitely many points of ∪n {xn }. Now consider the sets Hn ≡ ∪k≥n {xk } ∪ H 0 where H 0 denotes all limit points of ∪n {xn } in X which is the same as the limit points of ∪k≥n {xk }. Therefore, each Hn is closed thanks to Theorem 2.3.9. Now let Un ≡ HnC . This is an increasing sequence of open sets whose union contains K thanks to the fact that there is no constant subsequence. However, none of these open sets covers K because Un is missing xn , violating the definition of compactness. ⇒Another proof of the the second part of the above is as follows. Suppose K is not sequentially compact. Then there is {xn } such that no x ∈ K is the limit of a convergent subsequence. Hence if x ∈ K, there is rx > 0 such that B (x, rx ) contains xn for only finitely many n and there would exist many n. Otherwise, B x, k1 would contain xn for infinitely ∞ 1 {xnk }k=1 with nk < nk+1 for all k and xnk ∈ B x, k so this subsequence would converge to x. By compactness, there are finitely many of these balls B (x, rx ) which cover K. Now this is a contradiction because one of these balls must now contain xn for infinitely many n.
Definition 2.6.6
Let X be a metric space. Then a finite set of points {x1 , · · · , xn }
is called an ε net if X ⊆ ∪nk=1 B (xk , ε) If, for every ε > 0 a metric space has an ε net, then we say that the metric space is totally bounded.
Lemma 2.6.7 If a metric space (K, d) is sequentially compact, then it is separable and totally bounded. Proof: Pick x1 ∈ K. If B (x1 , ε) ⊇ K, then stop. Otherwise, pick x2 ∈ / B (x1 , ε) . Continue this way. If {x1 , · · · , xn } have been chosen, either K ⊆ ∪nk=1 B (xk , ε) in which case, you have found an ε net or this does not happen in which case, you can pick xn+1 ∈ / ∪nk=1 B (xk , ε). The process must terminate since otherwise, the sequence would need to have a convergent subsequence which is not possible because every pair of terms is farther apart than ε. See Lemma 2.4.2. Thus for every ε > 0, there is an ε net. Thus the metric space is totally bounded. Let Nε denote an ε net. Let D = ∪∞ k=1 N1/2k . Then this is a countable dense set. It is countable because it is the countable union of finite sets and it is dense because given a point, there is a point of D within 1/2k of it. Also recall that a complete metric space is one for which every Cauchy sequence converges to a point in the metric space. The following is the main theorem which relates these concepts. Note that if (X, d) is a metric space, then so is (S, d) whenever S ⊆ X. You simply use the metric on S.
Theorem 2.6.8
For (X, d) a metric space, the following are equivalent.
44
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA 1. (X, d) is compact. 2. (X, d) is sequentially compact. 3. (X, d) is complete and totally bounded.
Proof: By Theorem 2.6.5, the first two conditions are equivalent. 2.=⇒ 3. If (X, d) is sequentially compact, then by Lemma 2.6.7, it is totally bounded. If {xn } is a Cauchy sequence, then there is a subsequence which converges to x ∈ X by assumption. However, from Theorem 2.4.3 this requires the original Cauchy sequence to converge. Thus (X, d) is complete and totally bounded. 3.=⇒ 1. Since (X, d) is totally bounded, there must be a countable dense subset of X. Just take the union of 1/2k nets for each k ∈ N. Thus (X, d) is completely separable by Theorem 2.5.5 has the Lindeloff property. Hence, if X is not compact, there is a countable ∞ set of open sets {Ui }i=1 which covers X but no finite subset does. Consider the nonempty closed sets Fn and pick xn ∈ Fn where C
X \ ∪ni=1 Ui ≡ X ∩ (∪ni=1 Ui ) ≡ Fn Mk be a 1/2k net for X. We have for some m, B xkmk , 1/2k contains xn Let xkm m=1 for infinitely many values of n because there are only finitely many balls and infinitely k+1 k+1 k+1 many indices. Then out of the finitely many x for which B x , 1/2 intersects m m k k k+1 k+1 k+1 B xmk , 1/2 , pick one xmk+1 such that B xmk+1 , 1/2 contains xn for infinitely many k ∞ n. Then obviously xmk k=1 is a Cauchy sequence because 1 1 1 d xkmk , xk+1 mk+1 ≤ k + k+1 ≤ k−1 2 2 2 Hence for p < q, q−1 ∞ X X d xpmp , xqmq ≤ d xkmk , xk+1 < mk+1 k=p
Now take a subsequence xnk
k=p
1 1 = p−2 2k−1 2
∈ B xkmk , 2−k and it follows that lim xnk = lim xkmk = x ∈ X
k→∞
k→∞
However, x ∈ Fn for each n since each Fn is closed and these sets are nested. Thus x ∈ ∩n Fn ∞ contrary to the claim that {Ui }i=1 covers X. For the sake of another point of view, here is another argument, this time that 3.)⇒ 2.). This will illustrate something called the Cantor diagonalization process. 3.⇒ 2. Suppose {xk } is a sequence in X. By assumption there are finitely many open balls of radius 1/n covering X. This for each n ∈ N. Therefore, for n = 1, there is one of the balls, having radius 1 which contains xk for infinitely many k. Therefore, there is a subsequence with every term contained in this ball of radius 1. Now do for this subsequence what was just done for {xk } . There is a further subsequence contained in a ball of radius ∞ 1/2. Continue this way. Denote the ith subsequence as {xki }k=1 . Arrange them as shown x11 , x21 , x31 , x41 · · · x12 , x22 , x32 , x42 · · · x13 , x23 , x33 , x43 · · · .. . ∞
Thus all terms of {xki }k=1 are contained in a ball of radius 1/i. Consider now the diagonal sequence defined as yk ≡ xkk . Given n, each yk is contained in a ball of radius 1/n whenever k ≥ n. Thus {yk } is a subsequence of the original sequence and {yk } is a Cauchy sequence. By completeness of X, this converges to some x ∈ X which shows that every sequence in X has a convergent subsequence. This shows 3.)⇒ 2.).
2.6. COMPACTNESS
45
Lemma 2.6.9 The closed interval [a, b] in R is compact and every Cauchy sequence in R converges. Proof: To show this, suppose it is not. Then there is an admits no a+b C which open cover , , b . One of these, finite subcover for [a, b] ≡ I0 . Consider the two intervals a, a+b 2 2 maybe both cannot be covered with finitely many sets of C since otherwise, there would be a finite collection of sets from C covering [a, b] . Let I1 be the interval which has no finite subcover. Now do for it what was done for I0 . Split it in half and pick the half which has no finite covering of sets of C. Thus there is a “nested” sequence of closed intervals I0 ⊇ I1 ⊇ I2 · · · , each being half of the preceding interval. Say In = [an , bn ] . By the nested interval Lemma, Lemma 1.8.12, there is a point x in all these intervals. The point is unique because the lengths of the intervals converge to 0. This point is in some O ∈ C. Thus for some δ > 0, [x − δ, x + δ] , having length 2δ, is contained in O. For k large enough, the interval [ak , bk ] has length less than δ but contains x. Therefore, it is contained in [x − δ, x + δ] and so must be contained in a single set of C contrary to the construction. This contradiction shows that in fact [a, b] is compact. The second claim was proved earlier, but here it is again. If {xn } is a Cauchy sequence, then it is contained in some interval [a, b] which is compact. Hence there is a subsequence which converges to some x ∈ [a, b]. By Theorem 2.4.3 the original Cauchy sequence converges to x. Now the next corollary pertains more specifically to Rp .
Corollary 2.6.10 For each r > 0, Q ≡ [−r, r]p ≡
Qp
i=1
[−r, r] is compact in Rp .
∞ ∞ Proof: Let xk k=1 be a sequence in Q. Then for each i, xki k=1 is contained in [−r, r] . Therefore, taking a succession of p subsequences, one obtains a subsequence of xk denoted as {xnk } such that for each i ≤ p, {xni k } converges to some xi ∈ [−r, r]. It follows that limk→∞ xnk = x where the ith component of x is xi . Since Rp is a metric space, Theorem 2.6.8 implies the following theorem.
Theorem 2.6.11
A nonempty set K contained in Rp is compact if and only if it is
sequentially compact. Now the general result called the Heine Borel theorem comes right away.
Theorem 2.6.12
For K ⊆ Rp a nonempty set, the following are equivalent.
1. K is compact. 2. K is sequentially compact. 3. K is closed and bounded. Proof: The first two are equivalent from Theorem 2.6.5. It remains to show that these are equivalent to closed and bounded. Suppose the first two hold. Why is K bounded? If not, there is kn ∈ K \ B (0, n) . Then {kn } cannot have a Cauchy subsequence and so no subsequence can converge thanks to Theorem 2.4.3. Why is it closed? Using Corollary 2.3.10, it suffices to show that if kn → k, then k ∈ K. We know that {kn } is a Cauchy sequence by Theorem 2.4.2. Since K is sequentially compact, a subsequence converges to some l ∈ K. However, from Theorem 2.4.3, the original sequence also converges to l and so l = k. Thus k ∈ K. The following is another proof that K is closed given K is compact. ⇒If it is not closed, then it has a limit point k not in the set. Thus, in particular, K is not a finite set. For each x ∈ K, there is B (x, rx ) such that B (x, rx ) ∩ B (k, rx ) = ∅. Since K is compact, finitely many of the balls B xi , 21 rxi i = 1, · · · , m must cover K. But then
46
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
one could consider 0 < r < min 21 rxi , i = 1 · · · m > 0 and B (k, r) would contain points of K different than k. Let x be such a point. It is in some B xi , 21 rxi . Hence r
0, there exists δ > 0 such that if d (ˆ x, x) < δ, then ρ (f (ˆ x) , f (x)) < ε. If f is continuous at every x ∈ X we say that f is continuous on X. The notation f −1 (S) means {x ∈ X : f (x) ∈ S} . It is called the inverse image of S. For example, you could have a real valued function f (x) defined on an interval [0, 1] . In this case you would have X = [0, 1] and Y = R with the distance given by d (x, y) = |x − y|. Then the following theorem is the main result. Recall that if (X, d) is a metric space and S ⊆ X is a nonempty subset, then (S, d) is also a metric space so this latter case is included in what follows.
Theorem 2.6.19 Let f : X → Y where (X, d) and (Y, ρ) are metric spaces. Then the following two are equivalent. a f is continuous at x. b Whenever xn → x, it follows that f (xn ) → f (x) . Also, the following are equivalent. c f is continuous on X. d Whenever V is open in Y, it follows that f −1 (V ) ≡ {x ∈ X : f (x) ∈ V } is open in X. e Whenever H is closed in Y, it follows that f −1 (H) is closed in X. Proof: a =⇒ b: Let f be continuous at x and suppose xn → x. Then let ε > 0 be given. By continuity, there exists δ > 0 such that if d (ˆ x, x) < δ, then ρ (f (ˆ x) , f (x)) < ε. Since xn → x, it follows that there exists N such that if n ≥ N, then d (xn , x) < δ and
48
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
so, if n ≥ N, it follows that ρ (f (xn ) , f (x)) < ε. Since ε > 0 is arbitrary, it follows that f (xn ) → f (x). b =⇒ a: Suppose b holds but f fails to be continuous at x. Then there exists ε > 0 such that for all δ > 0, there exists x ˆ such that d (ˆ x, x) < δ but ρ (f (ˆ x) , f (x)) ≥ ε. Letting δ = 1/n, there exists xn such that d (xn , x) < 1/n but ρ (f (xn ) , f (x)) ≥ ε. Now this is a contradiction because by assumption, the fact that xn → x implies that f (xn ) → f (x). In particular, for large enough n, ρ (f (xn ) , f (x)) < ε contrary to the construction. c =⇒ d: Let V be open in Y . Let x ∈ f −1 (V ) so that f (x) ∈ V. Since V is open, there exists ε > 0 such that B (f (x) , ε) ⊆ V . Since f is continuous at x, it follows that there exists δ > 0 such that if x ˆ ∈ B (x, δ) , then f (ˆ x) ∈ B (f (x) , ε) ⊆ V. (f (B (x, δ)) ⊆ B (f (x) , ε)) In other words, B (x, δ) ⊆ f −1 (B (f (x) , ε)) ⊆ f −1 (V ) which shows that, since x was an arbitrary point of f −1 (V ) , every point of f −1 (V ) is an interior point which implies f −1 (V ) is open. C d =⇒ e: Let H be closed in Y . Then f −1 (H) = f −1 H C which is open by assumption. Hence f −1 (H) is closed because its complement is open. C e =⇒ d: Let V be open in Y. Then f −1 (V ) = f −1 V C which is assumed to be closed. This is because the complement of an open set is a closed set. Thus f −1 (V ) is open because its complement is closed. d =⇒ c: Let x ∈ X be arbitrary. Is it the case that f is continuous at x? Let ε > 0 be given. Then B (f (x) , ε) is an open set in V (Recall that open balls are open.) and so x ∈ f −1 (B (f (x) , ε)) which is given to be open. Hence there exists δ > 0 such that x ∈ B (x, δ) ⊆ f −1 (B (f (x) , ε)) . Thus, f (B (x, δ)) ⊆ B (f (x) , ε) so if d (x, x ˆ) < δ meaning that x ˆ ∈ B (x, δ) , then ρ (f (ˆ x) , f (x)) < ε meaning that f (ˆ x) ∈ B (f (x) , ε). Thus f is continuous at x for every x. In the case where f : D (f ) ⊆ Rp → Rq , the above definition takes the following more familiar form.
Definition 2.6.20
A function f : D (f ) ⊆ Rp → Rq is continuous at x ∈ D (f ) if for each ε > 0 there exists δ > 0 such that whenever y ∈ D (f ) and |y − x| < δ it follows that |f (x) − f (y)| < ε. f is continuous if it is continuous at every point of D (f ). By equivalence of norms, this is equivalent to the same statement with k·k∞ in place of |·| or any other norm equivalent to |·| and it will be shown that this includes all of them. Also, the above major theorem implies the following corollary when the functions have values in Rq .
Corollary 2.6.21 f : Rp → Rq is continuous if and only if f −1 (V ) is open in Rp whenever V is open in Rq and f −1 (C) is closed in Rp whenever C is closed in Rq .
Recall how the function x → dist (x, S) was continuous. Theorem 2.6.19 implies 1 1 is open, x : dist (x, S) ≥ is closed x : dist (x, S) > k k and so forth. Now here are some basic properties of continuous functions which have values in Rp or R so that it makes sense to add and multiply by scalars. However, no context is specified for property 3. which holds for f , g having values and domains in metric space.
Theorem 2.6.22
The following assertions are valid.
2.6. COMPACTNESS
49
1. The function af + bg is continuous at x when f , g are continuous at x ∈ D (f ) ∩ D (g) and a, b ∈ R. 2. If and f and g are each real valued functions continuous at x, then f g is continuous at x. If, in addition to this, g (x) 6= 0, then f /g is continuous at x. 3. If f is continuous at x, f (x) ∈ D (g), and g is continuous at f (x) , then g ◦ f is continuous at x. 4. If f = (f1 , · · · , fq ) : D (f ) → Rq , then f is continuous if and only if each fk is a continuous real valued function. 5. The function f : Rp → R, given by f (x) = |x| is continuous. Proof: Begin with (1). Let ε > 0 be given. By assumption, there exist δ 1 > 0 such ε that whenever |x − y| < δ 1 , it follows |f (x) − f (y)| < 2(|a|+|b|+1) and there exists δ 2 > 0 ε . Then let such that whenever |x − y| < δ 2 , it follows that |g (x) − g (y)| < 2(|a|+|b|+1) 0 < δ ≤ min (δ 1 , δ 2 ). If |x − y| < δ, then everything happens at once. Therefore, using the triangle inequality |af (x) + bf (x) − (ag (y) + bg (y))| ≤ |a| |f (x) − f (y)| + |b| |g (x) − g (y)| ε ε < |a| + |b| < ε. 2 (|a| + |b| + 1) 2 (|a| + |b| + 1) Now begin on (2). There exists δ 1 > 0 such that if |y − x| < δ 1 , then |f (x) − f (y)| < 1. Therefore, for such y, |f (y)| < 1 + |f (x)| . It follows that for such y, |f g (x) − f g (y)| ≤ |f (x) g (x) − g (x) f (y)| + |g (x) f (y) − f (y) g (y)| ≤ |g (x)| |f (x) − f (y)| + |f (y)| |g (x) − g (y)| ≤ (1 + |g (x)| + |f (y)|) [|g (x) − g (y)| + |f (x) − f (y)|] . Now let ε > 0 be given. There exists δ 2 such that if |x − y| < δ 2 , then |g (x) − g (y)|
0 such that if d (x, z) < δ and z ∈ D (f ), then d (f (z) , f (x)) < η. Then if d (x, z) < δ and z ∈ D (g ◦ f ) ⊆ D (f ), all the above hold and so d (g (f (z)) , g (f (x))) < ε.
2.6. COMPACTNESS
51
This proves part (3). In terms of sequences, if xn → x, then f (xn ) → f (x) and so g (f (xn )) → g (f (x)). Thus g ◦ f is continuous at x. Part (4) says: If f = (f1 , · · · , fq ) : D (f ) → Rq , then f is continuous if and only if each fk is a continuous real valued function. Then |fk (x) − fk (y)| ≤ |f (x) − f (y)| ≡
q X
!1/2 2
|fi (x) − fi (y)|
i=1
≤
q X
|fi (x) − fi (y)| .
(2.9)
i=1
Suppose first that f is continuous at x. Then there exists δ > 0 such that if |x − y| < δ, then |f (x) − f (y)| < ε. The first part of the above inequality then shows that for each k = 1, · · · , q, |fk (x) − fk (y)| < ε. This shows the only if part. Now suppose each function fk is continuous. Then if ε > 0 is given, there exists δ k > 0 such that whenever |x − y| < δ k , |fk (x) − fk (y)| < ε/q.Now let 0 < δ ≤ min (δ 1 , · · · , δ q ). For |x − y| < δ, the above inequality holds for all k and so the last part of (2.9) implies |f (x) − f (y)| ≤
q X
|fi (x) − fi (y)|
0 be given and let δ = ε. Then if |x − y| < δ, the triangle inequality implies |f (x) − f (y)| = ||x| − |y|| ≤ |x − y| < δ = ε. This proves part (5) and completes the proof of the theorem.
2.6.2
Limits of Vector Valued Functions
I will feature limits of functions which have values in some Rp . First of all, you can only consider limits at limit points of the domain as explained below. It isn’t any harder to formulate this in terms of metric spaces, so this is what I will do. You can let the metric space be Rp if you like.
Definition 2.6.23
Let f : D (f ) ⊆ X → Y where (X, d) and (Y, ρ) are metric spaces. For x a limit point of D (f ) , meaning that B (x, r) contains points of D (f ) other than x for each r > 0, limy→x f (y) = z ∈ Y means the following. For every ε > 0, there exists δ > 0 such that if 0 < d (x, y) < δ and y ∈ D (f ) , then ρ (f (y) , z) < ε. Note that x must be a limit point of D (f ) in order to take the limit at x. This will be clear from the next proposition which says that the limit, if it exists, is well defined.
Proposition 2.6.24 Let x be a limit point of D (f ) where f : D (f ) ⊆ X → Y as in the above definition. If limy→x f (x) = z and limy→x f (y) = zˆ, then z = zˆ. Proof: Let δ be small enough to go with ε/3 in the case of both z, zˆ. Then, since x is a limit point, there exists y ∈ B (x, δ) ∩ D (f ) , y 6= x. Then ρ (z, zˆ) ≤ ρ (z, f (y)) + ρ (f (y) , zˆ) 0 such that whenever x, x ˆ are two points of X with d (x, x ˆ) < δ, it follows that ρ (f (x) , f (ˆ x)) < ε. Note the difference between this and continuity. With continuity, the δ could depend on x but here it works for any pair of points in X. There is a remarkable result concerning compactness and uniform continuity.
Theorem 2.6.28 Let f : (K, d) → (Y, ρ) be a continuous function where K is a compact metric space. Then f is uniformly continuous. Proof: Suppose f fails to be uniformly continuous. Then there exists ε > 0 and pairs of points xn , x ˆn such that d (xn , x ˆn ) < 1/n but ρ (f (xn ) , f (ˆ xn )) ≥ ε. Since K is compact, it is sequentially compact and so there exists a subsequence, still denoted as {xn } such that xn → x ∈ K. Then also x ˆn → x also and so by Lemma 2.2.1, 0 = ρ (f (x) , f (x)) = limn→∞ ρ (f (xn ) , f (ˆ xn )) ≥ ε which is a contradiction. Later in the book, I will consider the fundamental theorem of algebra. However, here is a fairly short proof based on the extreme value theorem. You mayhave to fill in a few details however. In particular, note that (C, |·|) is the same as R2 , |·| where |·| is the standard norm on R2 . Thus closed and bounded sets are compact in (C, |·|). Also, the above theorems apply for Rp and so they also apply for Cp because it is the same as R2p .
Proposition 2.6.29 Let p (z) = a0 + a1 z + · · · + an−1 z n−1 + z n be a nonconstant polynomial where each ai ∈ C. Then there is a root to this polynomial. Proof: Suppose the nonconstant polynomial p (z) = a0 + a1 z + · · · + z n , has no zero in C. Since lim|z|→∞ |p (z)| = ∞, there is a z0 with |p (z0 )| = minz∈C |p (z)| > 0 Why? (The growth condition shows that you can restrict attention to a closed and bounded set and 0) then apply the extreme value theorem.) Then let q (z) = p(z+z p(z0 ) . This is also a polynomial
2.6. COMPACTNESS
53
which has no zeros and the minimum of |q (z)| is 1 and occurs at z = 0. Since q (0) = 1, it follows q (z) = 1 + ak z k + r (z) where r (z) consists of higher order terms. Here ak is the first coefficient which is nonzero. Choose a sequence, zn → 0, such that ak znk < 0. For example, let −ak znk = (1/n). Then |q (zn )| = 1 + ak z k + r (z) ≤ 1 − 1/n + |r (zn )| = 1 + ak znk + |r (zn )| < 1 for all n large enough because |r (zn )| is small compared with ak znk since it involves higher order terms. This is a contradiction. The idea is that if k < m, then for z sufficiently small, z m is very small relative to z k . Here is another very interesting theorem about continuity and compactness. It says that if you have a continuous function defined on a compact set K then f (K) is also compact. If f is one to one, then its inverse is also continuous.
Theorem 2.6.30
Let K be a compact set in some metric space and let f : K → f (K) be continuous. Then f (K) is compact. If f is one to one, then f −1 is also continuous. Proof: The first part is in Theorem 2.6.25. However, I will give a different proof. As explained above, compactness and sequential compactness are the same in the setting of ∞ metric space. Suppose then that {f (xk )}k=1 is a sequence in f (K). Since K is compact, ∞ there is a subsequence, still denoted as {xk }k=1 such that xk → x ∈ K. Then by continuity, f (xk ) → f (x) and so f (K) is compact as claimed. Next suppose f is one to one. If you have f (xk ) → f (x) , does it follow that xk → x? ∞ If not, then by compactness, there is a subsequence, still denoted as {xk }k=1 such that xk → x ˆ ∈ K, x 6= x ˆ. Then by continuity, it also happens that f (xk ) → f (ˆ x) and so f (x) = f (ˆ x) which is a contradiction. Therefore, xk → x as desired, showing that f −1 is continuous.
2.6.4
Convergence of Functions
Here the discussion is specialized to vector valued functions having values in some Rp . Most if not all of it will work for general metric spaces. However, I want to move towards Banach spaces which are complete normed vector spaces. There are two kinds of convergence for a sequence of functions described in the next definition, pointwise convergence and uniform convergence. Of the two, uniform convergence is far better and tends to be the kind of convergence most encountered in complex analysis. Pointwise convergence is more often encounted in real analysis and necessitates much more difficult theorems.
Definition 2.6.31
Let S ⊆ Cp and let fn : S → Cq for n = 1, 2, · · · . Then {fn } is said to converge pointwise to f on S if for all x ∈ S, fn (x) → f (x) for each x. The sequence is said to converge uniformly to f on S if lim sup |fn (x) − f (x)| = 0 n→∞
x∈S
supx∈S |fn (x) − f (x)| is denoted askfn − f k∞ or just kfn − f k for short.k·k is called the uniform norm. To illustrate the difference in the two types of convergence, here is a standard example. Example 2.6.32 Let f (x) ≡
0 if x ∈ [0, 1) 1 if x = 1
Also let fn (x) ≡ xn for x ∈ [0, 1] . Then fn converges pointwise to f on [0, 1] but does not converge uniformly to f on [0, 1].
54
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA The following picture illustrates the above definition. f
Note how the target function is not continuous although each function in the sequence is. The next theorem shows that this kind of loss of continuity never occurs when you have uniform convergence. The theorem holds generally when S ⊆ X a metric space and f , f n have values in Y another metric space. Note that you could simply refer to S as the metric space if you want. You might adapt the following to include this more general case.
Theorem 2.6.33 Let fn : S → Cq be continuous and let fn converge uniformly to f on S. Then if fn is continuous at x ∈ S, it follows that f is also continuous at x. Proof: Let ε > 0 be given. Let N be such that if n ≥ N, then sup |fn (y) − f (y)| ≡ kfn − f k∞
0 such that if |y − x| < δ, then |fn (y) − fn (x)| < 3ε . Then if |y − x| < δ, y ∈ S, then |f (x) − f (y)|
≤
0, there exists N such that if m, n > ε, then if m, n ≥ N, then kfn − fm k ≡ sup |fn (x) − fm (x)| < ε x∈S
Observation 2.6.35 k·k satisfies the axioms of a norm. Consider the triangle inequality. |f (x) + g (x)| ≤ |f (x)| + |g (x)| ≤ kf k + kgk and so kf + gk ≡ sup |f (x) + g (x)| ≤ sup |f (x)| + sup |g (x)| = kf k + kgk x∈S
x∈S
x∈S
As to the axiom about scalars, kαf k ≡ sup |αf (x)| = sup |α| |f (x)| = |α| sup |f (x)| ≡ |α| kf k x∈S
x∈S
x∈S
It is clear that kf k ≥ 0. If kf k = 0, then clearly f (x) = 0 for each x and so f = 0.
Theorem 2.6.36
Let fn : S → Cq be bounded functions: supx∈S |fn (x)| = Cn < ∞. Then there exists bounded f : S → Cq such that limn→∞ kf − fn k = 0 if and only if {fn } is uniformly Cauchy.
2.6. COMPACTNESS
55
Proof: ⇐First suppose {fn } is uniformly Cauchy. Then for each x, |fn (x) − fm (x)| ≤ kfn − fm k
(2.10)
and it follows that for each x, {fn (x)} is a Cauchy sequence. By completeness of Rq , this converges. Let f (x) be that to which it converges. Now pick N such that for m, n ≥ N, kfn − fm k < ε/2 Then in 2.10, |fn (x)| ≤ |fN (x)| + ε/2 ≤ CN + 1 so, since x is arbitrary, it follows each fn is bounded since |fn (x)| is bounded above by max {CN + ε/2, Ck , k ≤ N } . Also, for n ≥ N, |fn (x) − f (x)| ≤ |fn (x) − fm (x)| + |fm (x) − f (x)| < ε/2 + |fm (x) − f (x)| Now, take the limit as m → ∞ on the right to obtain that for all x, |fn (x) − f (x)| ≤ ε/2 Therefore, kfn − f k < ε if n ≥ N so, since ε is arbitrary, this shows that limn→∞ kf − fn k = 0. ⇒Conversely, if there exists bounded f : S → Rq to which {fn } converges uniformly, why is {fn } uniformly Cauchy? |fn (x) − fm (x)|
≤
|fn (x) − f (x)| + |f (x) − fm (x)|
≤
kfn − f k + kf − f m k
By assumption, there is N such that if n ≥ N, then kfn − f k < ε/3. Then if m, n ≥ N, it follows that for any x ∈ S, |fn (x) − fm (x)| < ε/3 + ε/3 = 2ε/3 Then sup |fn (x) − fm (x)| ≡ kfn − fm k ≤ 2ε/3 < ε x∈S
Hence {fn } is uniformly Cauchy. Now here is an example of an infinite dimensional space which is also complete. Example 2.6.37 Denote by BC (S; Cq ) the bounded continuous functions defined on S with values in Cq . Then this is a complex vector space with norm k·k defined above. It is also complete. It is obvious that this is a complex vector space. Indeed, it is a subspace of the set of functions having values in Cq and it is clear that the given set of functions is closed with respect to the vector space operations. It was explained above in Observation 2.6.35 that this uniform norm really is a norm. It remains to verify completeness. Suppose then that {fn } is a Cauchy sequence. By Theorem 2.6.36, there exists a bounded function f such that fn converges to f uniformly. Then by Theorem 2.6.33 it follows that f is also continuous. Since every Cauchy sequence converges, this says that BC (S; Cq ) is complete. Of course this all works if Cq is replaced with Y a complete normed linear space. Incidentally, such complete normed linear spaces are called Banach spaces.
Definition 2.6.38
A Banach space is a complete normed linear space. That is, it is a normed vector space in which every Cauchy sequence converges.
56
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
Example 2.6.39 Let K be a compact set and consider C (K; Cq ) with the uniform norm. This is a complete normed linear space. If you have a Cauchy sequence, then the functions are automatically bounded thanks to the extreme value theorem. Thus, by the above example, the Cauchy sequence converges uniformly to a continuous function. One can consider convergence of infinite series the same way as done in calculus. Pn P Definition 2.6.40 The symbol ∞ k=1 fk (x) provided k=1 fk (x) means limn→∞ this limit exists. This is called pointwise convergence of the infinite sum. Thus the infinite sum means the limit of the sequence of partial sums. The infinite sum is said to converge uniformly if the sequence of paritial sums converges uniformly. P∞ Note how this theorem includes the case of k=1 ak as a special case. Here the ak don’t depend on x. The following theorem is very useful. It tells how to recognize that an infinite sum is converging or converging uniformly. First is a little lemma which reviews standard calculus. P Lemma 2.6.41 Suppose Mk ≥ 0 and ∞ k=1 Mk converges. Then lim
∞ X
m→∞
Mk = 0
k=m
Proof: By assumption, there is N such that if m ≥ N, then if n > m, n m n X X X Mk − Mk = Mk < ε/2 k=1
k=1
k=m+1
P∞ Then letting n →P ∞, one can pass to a limit and conclude that k=m+1 Mk < ε. It follows ∞ that for m P > N, k=m Mk < ε. The part about passing to a limit follows from fact Pthe n ∞ that n → M is an increasing sequence which is bounded above by M k k. k=m+1 k=1 Therefore, it converges by completeness of R. P Theorem 2.6.42 Let fk : S → Cp . For x ∈ S, if ∞ k=1 |fk (x)| < ∞, then P∞ fk (x) converges pointwise. If there exists Mk such that Mk ≥ |fk (x)| for all x ∈ S, k=1P ∞ then k=1 fk (x) converges uniformly. Pn Pm P∞ Proof: Let m < n. Then | k=1 fk (x) − k=1 P∞fk (x)| ≤ k=m |fk (x)| < ε/2 whenever m is large enough due to the assumption that k=1 |fk (x)| < ∞. Thus the partial sums are a Cauchy sequence and so the series converges pointwise. If Mk ≥ |fk (x)| for all x ∈ S, then for M large enough, n m ∞ ∞ X X X X fk (x) − fk (x) ≤ |fk (x)| ≤ Mk < ε/2 k=1
k=1
k=m
k=m
Thus, taking sup,
n m
X
X
fk (·) − fk (·) ≤ ε/2 < ε
k=1
k=1
and so the partial sums P∞ are uniformly Cauchy sequence. Hence they converge uniformly to what is defined as k=1 fk (x) for x ∈ S. The latter part of this theorem is called the Weierstrass M test. As a very interesting application, consider the question of nowhere differentiable functions. Consider the following description of a function. The following is the graph of the function on [0, 1] .
2.6. COMPACTNESS
57
1 The height of the function is 1/2 and the slope of the rising line is 1 while the slope of the falling line is −1. Now extend this function to the whole real line to make it periodic of period 1. This means f (x + n) = f (x) for all x ∈ R and n ∈ Z, the integers. In other words to find the graph of f on [1, 2] you simply slide the graph of f on [0, 1] a distance of 1 to get the same tent shaped thing on [1, 2] . Continue this way. The following picture illustrates what a piece of the graph of this function looks like. Some might call it an infinite sawtooth.
Now define g (x) ≡
∞ k X 3
4
k=0
f 4k x .
−k
Letting Mk = (3/4) , an application of the Weierstrass M test, Theorem 2.6.42 shows g is everywhere continuous. This is because each function in the sum is continuous and the series converges uniformly on R. However, this function is nowhere differentiable. This is shown next. Let δ m = ± 41 (4−m ) where we assume m > 2. That of interest will be m → ∞. g (x + δ m ) − g (x) = δm
P∞
k=0
3 k 4
f 4k (x + δ m ) − f 4k x δm
If you take k > m, f 4k (x + δ m ) − f 4k x
1 −m 4 4k x ± − f 4k x 4 integer z }| { k 1 k = f 4 x ± 4k−m −f 4 x =0 4 = f
Therefore, m k g (x + δ m ) − g (x) 1 X 3 = f 4k (x + δ m ) − f 4k x δm δm 4 k=0
The absolute value of the last term in the sum is m 3 m m (f (4 (x + δ )) − f (4 x)) m 4 and we choose the sign of δ m such that both 4m (x + δ m ) and 4m x are in some interval [k/2, (k + 1) /2) which is certainly possible because the distance between these two points is 1/4 and such half open intervals include all of R. Thus, since f has slope ±1 on the interval just mentioned, m m 3 3 m m (f (4 (x + δ m )) − f (4 x)) = 4m |δ m | = 3m |δ m | 4 4
58
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
As to the other terms, 0 ≤ f (x) ≤ 1/2 and so m−1 m−1 m m X 3 k X 3 k 1 − (3/4) 3 k k f 4 (x + δ m ) − f 4 x ≤ = =4−4 4 4 1/4 4 k=0
k=0
Thus
m g (x + δ m ) − g (x) ≥ 3m − 4 − 4 3 ≥ 3m − 4 δm 4
Since δ m → 0 as m → ∞, g 0 (x) does not exist because the difference quotients are not bounded. This proves the following theorem.
Theorem 2.6.43 There exists a function defined on R which is continuous and bounded but fails to have a derivative at any point. Proof: It only remains to verify that the function just constructed is bounded. However, ∞
0 ≤ g (x) ≤
1X 2
k=0
k 3 =2 4
Note that you could consider (ε/2) g (x) to get a function which is continuous, has values between 0 and ε which has no derivative.
2.6.5
Multiplication of Series
The following is a major result about multiplying series. It is Mertens theorem.
Theorem 2.6.44
Suppose
P∞
i=r
∞ X
ai and
P∞
j=r bj
both converge absolutely1 . Then
! ∞ ∞ X X bj = cn ai n=r
j=r
i=r
where cn =
n X
ak bn−k+r .
k=r
Proof: Let pnk = 1 if r ≤ k ≤ n and pnk = 0 if k > n. Then cn =
∞ X
pnk ak bn−k+r .
k=r
Also, ∞ X ∞ X
pnk |ak | |bn−k+r | =
k=r n=r
=
∞ X k=r
|ak |
∞ X n=k
|bn−k+r | =
∞ X k=r
∞ X k=r
|ak |
|ak |
∞ X
pnk |bn−k+r |
n=r
∞ ∞ ∞ X X X bn−(k−r) = |ak | |bm | < ∞. n=k
k=r
m=r
1 Actually, it is only necessary to assume one of the series converges and the other converges absolutely. This is known as Merten’s theorem and may be read in the 1974 book by Apostol listed in the bibliography.
2.7. TIETZE EXTENSION THEOREM
59
Therefore, from Corollary 1.8.5, ∞ X
cn
=
n=r
= =
∞ X n X
ak bn−k+r =
n=r k=r ∞ ∞ X X
ak
k=r ∞ X
ak
∞ X ∞ X n=r k=r ∞ X
pnk bn−k+r =
n=r ∞ X
pnk ak bn−k+r ak
k=r
∞ X
bn−k+r
n=k
bm
m=r
k=r
P∞ It follows that n=r cn converges absolutely. Also, you can see by induction that you can multiply any number of absolutely convergent series together and obtain a series which is absolutely convergent. Next, here are some similar results related to Merten’s theorem. In this theorem, z is a variable in some set called K. P P∞ Lemma 2.6.45 Let ∞ n=0 an (z) and n=0 bn (z) be two convergent series for z ∈ K which satisfy the conditions of the Weierstrass M test. Thus there exist positive conP∞ stants, A and B such that |a (z)| ≤ A , |b (z)| ≤ B for all z ∈ K and A n n n n n n n < n=0 P∞ ∞, n=0 Bn < ∞. Then defining the Cauchy product, cn (z) ≡
n X
an−k (z) bk (z) ,
k−0
P∞ it follows n=0 cn (z) also converges absolutely and uniformly on K because cn (z) satisfies the conditions of the Weierstrass M test. Therefore, ! ∞ ! ∞ ∞ X X X cn (z) = ak (z) bn (z) . (2.11) n=0
Proof: |cn (z)| ≤
k=0 n X
n=0
|an−k (z)| |bk (z)| ≤
n X
An−k Bk .
k=0
k=0
Also, from Theorem 1.8.3, n ∞ X X n=0 k=0
An−k Bk =
∞ ∞ X X
An−k Bk =
k=0 n=k
∞ X k=0
Bk
∞ X
An < ∞.
n=0
The claim of 2.11 follows from Merten’s theorem, Theorem 2.6.44. P Corollary 2.6.46 Let P be a polynomial and let ∞ n=0 an (z) converge uniformly and absolutely on K such that P the an satisfy P the conditions of the Weierstrass M test. Then ∞ ∞ there exists a series for P ( n=0 an (z)) , n=0 cn (z) , which also converges absolutely and uniformly for z ∈ K because cn (z) also satisfies the conditions of the Weierstrass M test.
2.7
Tietze Extension Theorem
This is an interesting theorem which holds in arbitrary normal topological spaces. However, I am specializing to Rp to keep the emphasis on that which is most familiar. The presentation depends on Lemma 2.5.8.
Lemma 2.7.1 Let H, K be two nonempty disjoint closed subsets of Rp . Then there exists a continuous function, g : Rp → [−1/3, 1/3] such that g (H) = −1/3, g (K) = 1/3, g (Rp ) ⊆ [−1/3, 1/3] .
60
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA Proof: Let f (x) ≡
dist (x, H) . dist (x, H) + dist (x, K)
The denominator is never equal to zero because if dist (x, H) = 0, then x ∈ H because H is closed. (To see this, pick hk ∈ B (x, 1/k) ∩ H. Then hk → x and since H is closed, x ∈ H.) Similarly, if dist (x, K) = 0, then x ∈ K and so the denominator is never zero as claimed because it is not possible for a point to be in both H and K. Hence f is continuous and from its definition, f = 0 on H and f = 1 on K. Now let g (x) ≡ 32 f (x) − 12 . Then g has the desired properties.
Definition 2.7.2
For f : M ⊆ Rp → R, define kf kM as sup {|f (x)| : x ∈ M } . Right now, this is just notation.
Lemma 2.7.3 Suppose M is a closed set in Rp and suppose f : M → [−1, 1] is continuous at every point of M. Then there exists a function g which is defined and continuous on all of Rp such that kf − gkM < 32 , g (Rp ) ⊆ [−1/3, 1/3] . Proof: Let H = f −1 ([−1, −1/3]) , K = f −1 ([1/3, 1]) . Thus H and K are disjoint closed subsets of M. Suppose first H, K are both nonempty. Then by Lemma 2.7.1 there exists g such that g is a continuous function defined on all of Rp and g (H) = −1/3, g (K) = 1/3, and g (Rp ) ⊆ [−1/3, 1/3] . It follows kf − gkM < 2/3. If H = ∅, then f has all its values in [−1/3, 1] and so letting g ≡ 1/3, the desired condition is obtained. If K = ∅, let g ≡ −1/3. If both H, K = ∅, there isn’t much to show. Just let g (x) = 0 for all x.
Lemma 2.7.4 Suppose M is a closed set in Rp and suppose f : M → [−1, 1] is continuous at every point of M. Then there exists a function g which is defined and continuous on all of Rp such that g = f on M and g has its values in [−1, 1] . Proof: Using Lemma 2.7.3, let g1 be such that g1 (Rp ) ⊆ [−1/3, 1/3] and kf − g1 kM ≤
2 . 3
Suppose g1 , · · · , gm have been chosen such that gj (Rp ) ⊆ [−1/3, 1/3] and
m m i−1
X 2 2
gi < .
f −
3 3 i=1
(2.12)
M
This has been done for m = 1. Then
! m i−1
3 m
X 2
f− gi
2
3 i=1
≤1
M
m i−1 Pm and so 23 f − i=1 23 gi can play the role of f in the first step of the proof. Therefore, there exists gm+1 defined and continuous on all of Rp such that its values are in [−1/3, 1/3] and
! m i−1
3 m X 2 2
f− gi − gm+1 ≤ .
2
3 3 i=1 M
Hence
! m i−1 m
X 2 2
gi − gm+1
f−
3 3 i=1
M
m+1 2 . ≤ 3
It follows there exists a sequence, {gi } such that each has its values in [−1/3, 1/3] and for every m 2.12 holds. Then let ∞ i−1 X 2 g (x) ≡ gi (x) . 3 i=1
2.8. ROOT TEST It follows
61
∞ m i−1 X 2 i−1 X 2 1 |g (x)| ≤ gi (x) ≤ ≤1 3 3 3 i=1 i=1
and
i−1 2 i−1 2 1 gi (x) ≤ 3 3 3
so the Weierstrass M test applies and shows convergence is uniform. Therefore g must be continuous by Theorem 2.6.33. The estimate 2.12 implies f = g on M . The following is the Tietze extension theorem.
Theorem 2.7.5
Let M be a closed nonempty subset of Rp and let f : M → [a, b] be continuous at every point of M. Then there exists a function, g continuous on all of Rp which coincides with f on M such that g (Rp ) ⊆ [a, b] . 2 (f (x) − b) . Then f1 satisfies the conditions of Lemma 2.7.4 Proof: Let f1 (x) = 1 + b−a p and so there exists g1 : R → [−1, 1] such that g is continuous on Rp and equals f1 on M . Let g (x) = (g1 (x) − 1) b−a + b. This works. 2 For x ∈ M, 2 b−a g (x) = 1+ (f (x) − b) − 1 +b b−a 2 b−a 2 (f (x) − b) +b = b−a 2 = (f (x) − b) + b = f (x) 2 Also 1+ b−a (f (x) − b) ∈ [−1, 1] so [a, b].
2.8
2 b−a
(f (x) − b) ∈ [−2, 0] and (f (x) − b) ∈ [−b + a, 0] , f (x) ∈
Root Test
The root test has to do with when a series of complex numbers converges. I am assuming the reader has been exposed to infinite series. However, this that I am about to explain is a little more general than what is usually seen in calculus. P Theorem 2.8.1 Let ak ∈ Fp and consider ∞ k=1 ak . Then this series converges absolutely if 1/k lim sup |ak | = r < 1. k→∞ 1/k
The series diverges spectacularly if lim supk→∞ |ak | the test fails.
1/k
> 1 and if lim supk→∞ |ak |
= 1,
1/k
Proof: Suppose first that lim supk→∞ |ak | = r < 1. Then letting R ∈ (r, 1) , it follows 1/k from the definition of lim sup that for all k large enough, |ak | ≤ R. Hence there exists N such that if k ≥ N, then |ak | ≤ Rk . Let Mk = |ak | for k < N and let Mk = Rk for k ≥ N . Then ∞ N −1 X X RN 1,
62
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
then letting r > R > 1, it follows that for infinitely many k, |ak | > Rk and so there is a subsequence which is unbounded. In particular, the series cannot and in fact Pconverge ∞ 1 which diverges diverges spectacularly. In case that the lim sup = 1, you can consider n=1 n P∞ by calculus and n=1 n12 which converges, also from calculus. However, the lim sup equals 1 for both of these. This is a major theorem because the lim sup always exists. As an important application, here is a corollary which emphasizes one aspect of the above theorem. P Corollary 2.8.2 If k ak converges, then lim supk→∞ |ak |1/k ≤ 1. If the sequence has values in X a complete normed linear space, there is no change in the conclusion or proof of the above theorem. You just replace |·| with k·k the symbol for the norm.
2.9
Equivalence of Norms
As mentioned above, it makes absolutely no difference which norm you decide to use on Rp . This holds in general finite dimensional normed spaces and is shown here. Of course the main interest here is where the normed linear space is (Rp , k·k) but it is no harder to present the more general result where you have a finite dimensional vector space V which has a norm. If you have not seen such things, just let V be Rp in what follows or consider the problems at end of the chapter.
Definition 2.9.1
Let (V, k·k) be a normed linear space and let aP basis be {v1 , · · · , vn }. For x ∈ V, let its component vector in Fp be (α1 , · · · , αn ) so that x = i αi vi . Then define T θx ≡ α = α1 · · · αn Thus θ is well defined, one to one and onto from V to Fp . It is also linear and its inverse θ−1 satisfies all the same algebraic properties as θ. In particular, (V, k·k) could be (Rp , k·k) where k·k is some norm on Rp . The following fundamental lemma comes from the extreme value theorem for continuous functions defined on a compact set. Let
X
f (α) ≡ αi vi ≡ θ−1 α
i P Then it is clear that f is a continuous function. This is because α → i αi vi is a continuous map into V and from the triangle inequality x → kxk is continuous as a map from V to R.
Lemma 2.9.2 There exists δ > 0 and ∆ ≥ δ such that δ = min {f (α) : |α| = 1} , ∆ = max {f (α) : |α| = 1} Also, δ |α|
≤
−1
θ α ≤ ∆ |α|
(2.13)
δ |θv|
≤
kvk ≤ ∆ |θv|
(2.14)
Proof: These numbers exist thanks to the extreme value theorem, PnTheorem 2.6.26. It cannot be that δ = 0 because if it were, you would have |α| = 1 but j=1 αk vj = 0 which is impossible since {v1 , · · · , vn } is linearly independent. The first of the above inequalities follows from
−1 α
=f α ≤∆ δ≤ θ
|α| |α| the second follows from observing that θ−1 α is a generic vector v in V . Now we can draw several conclusions about (V, k·k) for V finite dimensional.
2.10. NORMS ON LINEAR MAPS
63
Theorem 2.9.3 Let (V, k·k) be a finite dimensional normed linear space. Then the compact sets are exactly those which are closed and bounded. Also (V, k·k) is complete. If K is a closed and bounded set in (V, k·k) and f : K → R, then f achieves its maximum and minimum on K. Proof: First note that the inequalities 2.13 and 2.14 show that both θ−1 and θ are continuous. Thus these take convergent sequences to convergent sequences. ∞ ∞ Let {wk }k=1 be a Cauchy sequence. Then from 2.14, {θwk }k=1 is a Cauchy sequence. p Thanks to Theorem 2.6.26, it converges to some β ∈ F . It follows that limk→∞ θ−1 θwk = limk→∞ wk = θ−1 β ∈ V . This shows completeness. Next let K be a closed and bounded set. Let {wk } ⊆ K. Then {θwk } ⊆ θK which is also a closed and bounded set thanks to the inequalities 2.13 and 2.14. Thus there is a subsequence still denoted with k such that θwk → β ∈ Fp . Then as just done, wk → θ−1 β. Since K is closed, it follows that θ−1 β ∈ K. Finally, why are the only compact sets those which are closed and bounded? Let K be ∞ compact. If it is not bounded, then there is a sequence of points of K, {km }m=1 such that m kk k ≥ m. It follows that it cannot have a convergent subsequence because the points are further apart from each other than 1/2. Hence K is not sequentially compact and consequently it is not compact. It follows that K is bounded. It follows from Proposition 2.6.16 that K is closed. The last part is identical to the proof in Theorem 2.6.26. You just take a convergent subsequence of a minimizing (maximizing) sequence and exploit continuity. Next is the theorem which states that any two norms on a finite dimensional vector space are equivalent. In particular, any two norms on Rp are equivalent.
Theorem 2.9.4
Let k·k , k|·k| be two norms on V a finite dimensional vector space. Then they are equivalent, which means there are constants 0 < a < b such that for all v, a kvk ≤ k|vk| ≤ b kvk ˆ go with k|·k|. Then using the Proof: In Lemma 2.9.2, let δ, ∆ go with k·k and ˆδ, ∆ inequalities of this lemma, kvk ≤ ∆ |θv| ≤
ˆ ˆ ∆ ∆∆ ∆∆ k|vk| ≤ |θv| ≤ kvk ˆδ ˆδ δ ˆδ
and so ˆδ ˆ ∆ kvk ≤ k|vk| ≤ kvk ∆ δ Thus the norms are equivalent.
2.10
Norms on Linear Maps
To begin with, the notion of a linear map is just a function which is linear. Such a function, denoted by L, and mapping Rn to Rm is linear means ! m m X X L xi v i = xi Lvi i=1
i=1
In other words, it distributes across additions and allows one to factor out scalars. Hopefully this is familiar from linear algebra. If not, there is a short review of linear algebra in the appendix.
64
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
Definition 2.10.1
We use the symbol L (Rn , Rm ) to denote the space of linear transformations, also called linear operators, which map Rn to Rm . For L ∈ L (Rn , Rm ) one can always consider it as an m × n matrix A as follows. Let A = Le1 Le2 · · · Len where in the above Lei is the ith column. Define the sum and scalar multiple of linear transformations in the natural manner. That is, for L, M linear transformations and α, β scalars, (αL + βM ) (x) ≡ αL (x) + βM (x)
Observation 2.10.2 With the above definition of sums and scalar multiples of linear transformations, the result of such a linear combination of linear transformations is itself linear. Indeed, for x, y vectors and a, b scalars, (αL + βM ) (ax + by) ≡ αL (ax + by) + βM (ax + by) = αaL (x) + αbL (y) + βaM (x) + βbM (y) = a (αL (x) + βM (x)) + b (αL (y) + βM (y)) = a (αL + βM ) (x) + b (αL + βM ) (y) Also, a linear combination of linear transformations corresponds to the linear combination of the corresponding matrices in which addition is defined in the usual manner as addition of corresponding entries. To see this, note that if A is the matrix of L and B the matrix of M, (αL + βM ) ei ≡ (αA + βB) ei = αAei + βBei by the usual rules of matrix multiplication. Thus the ith column of (αA + βB) is the linear combination of the ith columns of A and B according to usual rules of matrix multiplication.
Proposition 2.10.3 For L ∈ L (Rn , Rm ) , the matrix defined above satisfies Ax = Lx, x ∈ Rn and if any m × n matrix A does satisfy Ax = Lx, then A is given in the above definition. Proof: Ax = Lx for all x if and only if for x =
Pn
i=1
xi ei
Ax = L
n X
! xi e i
=
i=1
n X
xi L (ei ) ≡
Le1
···
Le2
i=1
Len
x1 x2 .. .
xn if and only if for every x ∈ Rn , Ax = which happens if and only if A =
Definition 2.10.4
Le1 Le1
Le2 Le2
··· ···
Len Len
x
.
The norm of a linear transformation of A ∈ L (Rn , Rm ) is de-
fined as kAk ≡ sup {kAxkRm : kxkRn ≤ 1} < ∞. Then kAk is referred to as the operator norm of the linear transformation A.
2.10. NORMS ON LINEAR MAPS
65
It is an easy exercise to verify that k·k is a norm on L (Rn , Rm ) and it is always the case that kAxkRm ≤ kAk kxkRn . This is shown next. Furthermore, you should verify that you can replace ≤ 1 with = 1 in the definition. Thus kAk ≡ sup {kAxkRm : kxkRn = 1} . It is necessary to verify that this norm is actually well defined.
Lemma 2.10.5 The operator norm is well defined. Let A ∈ L (Rn , Rm ). Proof: We can use the matrix of the linear transformation with matrix multiplication interchangeably with the linear transformation. This follows from the above considerations. Suppose limk→∞ vk = v in Rn . Does it follow that Avk → Av? This is indeed the case with the usual Euclidean norm and therefore, it is also true with respect to any other norm by the equivalence of norms (Theorem 2.9.4). To see this,
k Av − Av ≡
m X
Avk − (Av) 2 i i
i=1
!1/2
2 1/2 n m X X k Aij vj − vj ≤ i=1 j=1
2 1/2 2 1/2 m n m n X X X X |Aij | |Aij | vjk − vj ≤ v k − v ≤
i=1
i=1
j=1
j=1
Thus A is continuous. Then also v → kAvkRm is a continuous function by the triangle inequality. Indeed, |kAvk − kAuk| ≤ kAv − AukRm Now let D be the closed ball of radius 1 in V . By Theorem 2.9.3, this set D is compact and so max {kAvkRm : kvkRn ≤ 1} ≡ kAk < ∞. Then we have the following theorem.
Theorem 2.10.6
Let Rn and Rm be finite dimensional normed linear spaces of dimension n and m respectively and denote by k·k the norm on either Rn or Rm . Then if A is any linear function mapping Rn to Rm , then A ∈ L (Rn , Rm ) and (L (Rn , Rm ) , k·k) is a complete normed linear space of dimension nm with kAxk ≤ kAk kxk . Also if A ∈ L (Rn , Rm ) and B ∈ L (Rm , Rp ) where Rn , Rm , Rp are normed linear spaces, kBAk ≤ kBk kAk Proof: It is necessary to show the norm defined on linear transformations really is a norm. Again the triangle inequality is the only property which is not obvious. It remains to show this and verify kAk < ∞. This last follows from the above Lemma 2.10.5. Thus the norm is at least well defined. It remains to verify its properties. kA + Bk ≡ sup{k(A + B) (x)k : kxk ≤ 1} ≤ sup{kAxk : kxk ≤ 1} + sup{kBxk : kxk ≤ 1} ≡ kAk + kBk . Next consider the assertion about the dimension of L (Rn , Rm ) . This is fairly obvious because a basis for the space of m × n matrices is clearly the matrices Eij which has a 1 in
66
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
the ij th position and a 0 everywhere else. By Theorem 2.9.4 (L (Rn , Rm ) , k·k) is complete. If x 6=0,
x 1
≤ kAk kAxk = A kxk kxk Thus kAxk ≤ kAk kxk. Consider the last claim. kBAk ≡ sup kB (A (x))k ≤ kBk sup kAxk = kBk kAk kxk≤1
kxk≤1
What does it mean to say that Ak → A in terms of this operator norm? In words, this happens if and only if the ij th entry of Ak converges to the ij th entry of A for each ij.
Proposition 2.10.7 limk→∞ Ak − A = 0 if and only if for every i, j lim Akij − Aij = 0 k→∞
Proof: If A is an m × n matrix, then Aij = eTi Aej . Suppose now that Ak − A → 0. Then in terms of the usual Euclidean norm and using the Cauchy Schwarz inequality, k Aij − Aij = eTi Ak − A ej =
ei , Ak − A ej ≤ |ei | Ak − A ej ≤ Ak − A
(2.15)
If the operator norm is taken with respect to k·k , some other norm than the Euclidean norm, then the right side of the above after ≤ k
A − A ej ≤ ∆ Ak − A ej ≤ ∆ Ak − A kej k Thus convergence in operator norm implies pointwise convergence of the entries of Ak to the corresponding entries of A. Next suppose the entries of Ak converge to the corresponding entries of A. If kvk ≤ 1, and to save notation, let B k = Ak − A. Then k P k T P k P A − A v = B v B v · · · B v j j mj j j 1j j 2j j 2 1/2 2 1/2 X X X X k k Bij Bij |vj | ≤ |v| = i
j
i
j
By equivalence of norms, 2 1/2 X X
k k Bij δ B v ≤ ∆ kvk i
j
and so if kvk ≤ 1, 2 1/2 X X
k Bij δ B k v ≤ ∆ i
and so
j
2 1/2 X X
k ∆ k
B ≤ Bij δ i j
2.11. CONNECTED SETS
67
the quantity on the right converging to 0. More generally, you might want to consider linear transformations L (V, W ) where V, W are finite dimensional normed linear spaces. In this case, the operator norm defined above is the same and it is well defined.
Proposition 2.10.8 Let L ∈ L (V, W ) where V, W are normed linear spaces with V finite dimensional. Then the operator norm defined above as kLk ≡ sup kLvkW kvk≤1
is finite and satisfies all the axioms of a norm. Also kLvk ≤ kLk kvk. Proof: I won’t bother with the subscript on the norms and allow this to bePdetermined n by context in what follows. Let (v1 , ..., vn ) be a basis for V . For v ∈ V, let v = j=1 aj vj so Pn the aj are the coordinates of v in F. Let h (v) ≡ a where a = (a1 , ..., an ) with v = j=1 aj vj . Thus h (v)i = ai . Obviously h is linear, one to one, and onto. Define k|vk| ≡ |h (v)| where the last is the usual Euclidean norm on Fn . Then this is obviously a norm because k|vk| = 0 if and only if h (v) = 0 if and only if v = 0. It is also clear that k|αvk| ≡ |h (αv)| = |α| |h (v)| whenever α is a scalar. It only remains to show the triangle inequality. However, this is also easy because we know it for |·| . k|v + wk| ≡ |h (v + w)| ≤ |h (v)| + |h (w)| ≡ k|vk| + k|wk| . It follows the two norms are equivalent. Thus k|vk| ≤ ∆ kvk for some ∆. Hence
X
h (v)i vi sup kLvkW ≤ sup kLvkW = sup L
kvk≤1 k|vk|≤∆ |h(v)|≤∆ i X ≤ sup |h (vi )| kLvi kW |h(v)|≤∆
i
!1/2
!1/2 sup
X
|h(v)|≤∆
i
≤
2
|h (v)i |
X
kLvi k
2
i
!1/2 ≤
∆
X
2
kLvi k
0, (l, l + δ) ∩ B = ∅ contradicting the definition of l as an upper bound for S. Therefore, l ∈ B which implies l∈ / A after all, a contradiction. It follows I must be connected. This yields a generalization of the intermediate value theorem from one variable calculus.
Corollary 2.11.7 Let E be a connected set in Rp and suppose f : E → R and that y ∈ (f (e1 ) , f (e2 )) where ei ∈ E. Then there exists e ∈ E such that f (e) = y. Proof: From Theorem 2.11.3, f (E) is a connected subset of R. By Theorem 2.11.6 f (E) must be an interval. In particular, it must contain y. This proves the corollary. The following theorem is a very useful description of the open sets in R.
Theorem 2.11.8 Let U be an open set in R. Then there exist countably many ∞ disjoint open sets {(ai , bi )}i=1 such that U = ∪∞ i=1 (ai , bi ) . Proof: Let p ∈ U and let z ∈ Cp , the connected component determined by p. Since U is open, there exists, δ > 0 such that (z − δ, z + δ) ⊆ U. It follows from Theorem 2.11.2 that (z − δ, z + δ) ⊆ Cp . This shows Cp is open. By Theorem 2.11.6, this shows Cp is an open interval, (a, b) where a, b ∈ [−∞, ∞] . There are therefore at most countably many of these connected components because each must contain a rational number and the rational numbers are countable. ∞ Denote by {(ai , bi )}i=1 the set of these connected components.
70
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
Definition 2.11.9
A set E in Rp is arcwise connected if for any two points, p, q ∈ E, there exists a closed interval, [a, b] and a continuous function, γ : [a, b] → E such that γ (a) = p and γ (b) = q. An example of an arcwise connected space would be any subset of Rp which is the continuous image of an interval. Arcwise connected is not the same as connected. A well known example is the following. 1 : x ∈ (0, 1] ∪ {(0, y) : y ∈ [−1, 1]} (2.16) x, sin x
You can verify that this set of points in R2 is not arcwise connected but is connected.
Lemma 2.11.10 In Rp , B (z,r) is arcwise connected. Proof: This is easy from the convexity of the set. If x, y ∈ B (z, r) , then let γ (t) = x + t (y − x) for t ∈ [0, 1] . kx + t (y − x) − zk
= k(1 − t) (x − z) + t (y − z)k ≤ (1 − t) kx − zk + t ky − zk < (1 − t) r + tr = r
showing γ (t) stays in B (z, r).
Proposition 2.11.11 If X 6= ∅ is arcwise connected, then it is connected. Proof: Let p ∈ X. Then by assumption, for any x ∈ X, there is an arc joining p and x. This arc is connected because it is the continuous image of an interval which is connected. Since x is arbitrary, every x is in a connected subset of X which contains p. Hence Cp = X and so X is connected.
Theorem 2.11.12
Let U be an open subset of Rp . Then U is arcwise connected if and only if U is connected. Also the connected components of an open set are open sets. Proof: By Proposition 2.11.11 it is only necessary to verify that if U is connected and open, then U is arcwise connected. Pick p ∈ U . Say x ∈ U satisfies P if there exists a continuous function, γ : [a, b] → U such that γ (a) = p and γ (b) = x. A ≡ {x ∈ U such that x satisfies P.} If x ∈ A, then Lemma 2.11.10 implies B (x, r) ⊆ U is arcwise connected for small enough r. Thus letting y ∈ B (x, r) , there exist intervals, [a, b] and [c, d] and continuous functions having values in U , γ, η such that γ (a) = p, γ (b) = x, η (c) = x, and η (d) = y. Then let γ 1 : [a, b + d − c] → U be defined as γ (t) if t ∈ [a, b] γ 1 (t) ≡ η (t + c − b) if t ∈ [b, b + d − c]
Then it is clear that γ 1 is a continuous function mapping p to y and showing that B (x, r) ⊆ A. Therefore, A is open. A 6= ∅ because since U is open there is an open set, B (p, δ) containing p which is contained in U and is arcwise connected. Now consider B ≡ U \ A. I claim this is also open. If B is not open, there exists a point z ∈ B such that every open set containing z is not contained in B. Therefore, letting B (z, δ) be such that z ∈ B (z, δ) ⊆ U, there exist points of A contained in B (z, δ) . But then, a repeat of the above argument shows z ∈ A also. Hence B is open and so if B 6= ∅, then U = B ∪ A and so U is separated by the two sets B and A contradicting the assumption
2.12. STONE WEIERSTRASS APPROXIMATION THEOREM
71
that U is connected. Note that, since B is open, it contains no limit points of A and since A is open, it contains no limit points of B. It remains to verify the connected components are open. Let z ∈ Cp where Cp is the connected component determined by p. Then picking B (z, δ) ⊆ U, Cp ∪B (z, δ) is connected and contained in U and so it must also be contained in Cp . Thus z is an interior point of Cp . As an application, consider the following corollary.
Corollary 2.11.13 Let f : Ω → Z be continuous where Ω is a connected nonempty open set in Rp . Then f must be a constant. Proof: Suppose not. Then it achieves two different values, k and l 6= k. Then Ω = f −1 (l) ∪ f −1 ({m ∈ Z : m 6= l}) and these are disjoint nonempty open sets which separate Ω. To see they are open, note 1 1 −1 −1 f ({m ∈ Z : m 6= l}) = f ∪m6=l m − , m + 6 6 which is the inverse image of an open set while f −1 (l) = f −1 l − 16 , l + 16 also an open set.
2.12
Stone Weierstrass Approximation Theorem
2.12.1
The Bernstein Polynomials
These polynomials give an explicit description of a sequence of polynomials which converge uniformly to a continuous function. Recall that if you have a bounded function defined on some set S with values in Cq . kf k∞ ≡ sup {kf (x)k : x ∈ S} This is one way to measure distance between functions.
Lemma 2.12.1 The following estimate holds for x ∈ [0, 1] and m ≥ 2. m X m k=0
k
2
m−k
(k − mx) xk (1 − x)
≤
1 m 4
Proof: First of all, from the binomial theorem m X m k=0
k
m X m m−k m−k et(k−mx) xk (1 − x) = e−tmx etk xk (1 − x) k k=0
= e−tmx 1 − x + xet
m
m
≡ e−tmx g (t) , g (0) = 1, g 0 (0) = g 00 (0) = x
Take a derivative with respect to t twice. m X m k=0
=
2
k
2
m−k
(k − mx) et(k−mx) xk (1 − x)
m
m−1
(mx) e−tmx g (t) + 2 (−mx) e−tmx mg (t) g 0 (t) h i m−2 0 2 m−1 00 +e−tmx m (m − 1) g (t) g (t) + mg (t) g (t)
72
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
Now let t = 0 and note that the right side is m(x − x2 ) ≤ m/4 for x ∈ [0, 1] . Thus m X m k=0
k
2
(k − mx) xk (1 − x)
m−k
= mx − mx2 ≤ m/4
With this preparation, here is the first version of the Weierstrass approximation theorem. I will allow f to have values in a complete normed linear space. Thus, f ∈ C ([0, 1] ; X) where X is a Banach space, Definition 2.6.38. Thus this is a function which is continuous with values in X as discussed earlier with metric spaces.
Theorem 2.12.2
Let f ∈ C ([0, 1] ; X) and let the norm be denoted by k·k .
pm (x) ≡
m X k m m−k . xk (1 − x) f k m
k=0
Then these polynomials converge uniformly to f on [0, 1]. Proof: Let kf k∞ denote the largest value of kf (x)k. By uniform continuity of f , there exists a δ > 0 such that if |x − x0 | < δ, then kf (x) − f (x0 )k < ε/2. By the binomial theorem,
m X
k m m−k k
kpm (x) − f (x)k ≤ x (1 − x)
f m − f (x) k k=0
X m
m−k
f k − f (x) + ≤ xk (1 − x)
k m k −x| 0 is given, there exists a polynomial p such that for all x ∈ [0, 1] , kp (x) − f ◦ l (x)k < ε Therefore, letting y = l (x) , it follows that for all y ∈ [a, b] ,
−1
p l (y) − f (y) < ε.
2.12. STONE WEIERSTRASS APPROXIMATION THEOREM
73
The exact form of the polynomial is as follows. m X k m m−k k p (x) = x (1 − x) f l k m k=0
p l
−1
(y) =
m X k=0
2.12.2
m k
l
−1
k m−k f (y) 1 − l−1 (y)
k l m
(2.17)
The Case of Compact Sets
There is a profound generalization of the Weierstrass approximation theorem due to Stone. It has to be one of the most elegant and insightful theorems in mathematics.
Definition 2.12.4
A is an algebra of real valued functions if A is a real vector space and if whenever f, g ∈ A then f g ∈ A. There is a generalization of the Weierstrass theorem due to Stone in which an interval will be replaced by a compact or locally compact set and polynomials will be replaced with elements of an algebra satisfying certain axioms.
Corollary 2.12.5 On the interval [−M, M ], there exist polynomials pn such that pn (0) = 0 and lim kpn − |·k|∞ = 0.
n→∞
recall that kf k∞ ≡ supt∈[−M,M ] |f (t)|. Proof: By Corollary 2.12.3 there exists a sequence of polynomials, {˜ pn } such that p˜n → |·| uniformly. Then let pn (t) ≡ p˜n (t) − p˜n (0) . In what follows, x will be a point in Rp . However, this could be generalized. Note that p C can be considered as R2p .
Definition 2.12.6
An algebra of functions A defined on A, annihilates no point of A if for all x ∈ A, there exists g ∈ A such that g (x) 6= 0. The algebra separates points if whenever x1 6= x2 , then there exists g ∈ A such that g (x1 ) 6= g (x2 ). The following generalization is known as the Stone Weierstrass approximation theorem.
Theorem 2.12.7
Let A be a compact set in Rp and let A ⊆ C (A; R) be an algebra of functions which separates points and annihilates no point. Then A is dense in C (A; R). Proof: First here is a lemma.
Lemma 2.12.8 Let c1 and c2 be two real numbers and let x1 6= x2 be two points of A. Then there exists a function fx1 x2 such that fx1 x2 (x1 ) = c1 , fx1 x2 (x2 ) = c2 . Proof of the lemma: Let g ∈ A satisfy g (x1 ) 6= g (x2 ). Such a g exists because the algebra separates points. Since the algebra annihilates no point, there exist functions h and k such that h (x1 ) 6= 0, k (x2 ) 6= 0.
74
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
Then let u ≡ gh − g (x2 ) h, v ≡ gk − g (x1 ) k. It follows that u (x1 ) 6= 0 and u (x2 ) = 0 while v (x2 ) 6= 0 and v (x1 ) = 0. Let fx1 x2 ≡
c2 v c1 u + . u (x1 ) v (x2 )
This proves the lemma. Now continue the proof of Theorem 2.12.7. First note that A satisfies the same axioms as A but in addition to these axioms, A is closed. The closure of A is taken with respect to the usual norm on C (A), kf k∞ ≡ max {|f (x)| : x ∈ A} . Thus A consists, by definition, of all functions in A along with all uniform limits of these functions. Suppose f ∈ A and suppose M is large enough that kf k∞ < M. Using Corollary 2.12.5, let pn be a sequence of polynomials such that kpn − |·k|∞ → 0, pn (0) = 0. It follows that pn ◦ f ∈ A and so |f | ∈ A whenever f ∈ A. Also note that max (f, g) =
(f + g) − |f − g| |f − g| + (f + g) , min (f, g) = . 2 2
Therefore, this shows that if f, g ∈ A then max (f, g) , min (f, g) ∈ A. By induction, if fi , i = 1, 2, · · · , m are in A then max (fi , i = 1, 2, · · · , m) , min (fi , i = 1, 2, · · · , m) ∈ A. Now let h ∈ C (A; R) and let x ∈ A. Use Lemma 2.12.8 to obtain fxy , a function of A which agrees with h at x and y. Letting ε > 0, there exists an open set U (y) containing y such that fxy (z) > h (z) − ε if z ∈ U (y). Since A is compact, let U (y1 ) , · · · , U (yl ) cover A. Let fx ≡ max (fxy1 , fxy2 , · · · , fxyl ). Then fx ∈ A and fx (z) > h (z)−ε for all z ∈ A and fx (x) = h (x). This implies that for each x ∈ A there exists an open set V (x) containing x such that for z ∈ V (x), fx (z) < h (z) + ε. Let V (x1 ) , · · · , V (xm ) cover A and let f ≡ min (fx1 , · · · , fxm ).Therefore, f (z) < h (z) + ε for all z ∈ A and since fx (z) > h (z) − ε for all z ∈ A, it follows f (z) > h (z) − ε also and so |f (z) − h (z)| < ε for all z. Since ε is arbitrary, this shows h ∈ A and proves A = C (A; R).
2.12.3
The Case of a Closed Set in Rp
You can extend this theory to the case where A = X a closed set. More generally, this is done with a locally compact Hausdorff space but this kind of space has not been considered here.
Definition 2.12.9
Let X be a closed set in Rp . C0 (X) denotes the space of real or complex valued continuous functions defined on X with the property that if f ∈ C0 (X) , then for each ε > 0 there exists a compact set K such that |f (x)| < ε for all x ∈ X \ K. Define kf k∞ = sup {|f (x)| : x ∈ X}. These functions are said to vanish at infinity.
2.12. STONE WEIERSTRASS APPROXIMATION THEOREM
75
Lemma 2.12.10 For X a closed set in Rp with the above norm, C0 (X) is a complete space, meaning that every Cauchy sequence converges. Proof: Let {fn } be a Cauchy sequence of functions in C0 (X). Then in particular, ∞ {fn (x)}n=1 is a Cauchy sequence in F. Let f (x) ≡ limn→∞ fn (x) .Let ε > 0 be given. Then there exists N such that for any x ∈ X, |fm (x) − fn (x)| ≤ kfm − fn k∞ < ε/3 for m, n ≥ N . Thus, picking n ≥ N and taking a limit as m → ∞, |f (x) − fn (x)| ≤ ε/3 since x was arbitrary, sup |f (x) − fn (x)| ≤ ε/3 (2.18) x∈X
By assumption, there exists a compact set K such that if x ∈ / K then kfN k∞ < ε/3. Thus, from 2.18, sup |f (x)| ≤ 2ε/3 < ε x∈K /
It remains to verify that f is continuous. Letting N be as the above, let x, y ∈ X. Then |f (x) − f (y)| ≤ |f (x) − fN (x)| + |fN (x) − fN (y)| + |fN (y) − f (y)| By continuity of fN at x, there exists δ > 0 such that if |x − y| < δ for y ∈ X, it follows that |fN (x) − fN (y)| < ε/3. Then for |y − x| < δ, |f (x) − f (y)|
≤
sup |f (x) − fN (x)| + x∈X
ε + sup |f (y) − fN (y)| 3 y∈X
ε ε ε + + =ε 3 3 3
0 and ∞ if tj = 0. Then p (yi ) will be the label placed on yi . To determine this label, let rk be the smallest of these ratios. Then the label placed on yi will be pk where rk is the smallest of all these extended nonnegative real numbers just described. If there is duplication, pick pk where k is smallest. The value of the simplex will be the product of the labels. What does it mean for the value of the simplex to be Pn ≡ p0 p1 · · · pn ? It means that each of the first n + 1 primes is assigned to exactly one of the n + 1 vertices of the simplex so each rj > 0 and there are no repeats in the rj . For the vertices which are on [xi1 , · · · , xim ] , these will be labeled from the list {pi1 , · · · , pim } because tk = 0 for each of these and so rk = ∞ unless k ∈ {i1 , · · · , im } . In particular, this scheme labels xi as pi . By the Sperner’s lemma procedure described above, there are an odd number of simplices Q having value i6=k pi on the k th face and an odd number of simplices in the triangulation of S for which the value of the simplex is p0 pQ 1 · · · pn ≡ Pn .QThus if [y0 , · · · , yn ] is one of n n these simplices, and p (yi ) is the label for yi , i=0 p (yi ) = j=0 pj ≡ Pn What is rk , the smallest of those ratios in determining a label? Could it be larger than 1? rk is certainly finite because at least some tj 6= 0 since they sum to 1. Thus, if rk > 1, you would have sk > tk . The sj sum to 1 and so some sj < tj since otherwise, the sum of the tj equalling 1 would require the sum of the sj to be larger than 1. Hence rk was not really the smallest after all and so rk ≤ 1. Hence sk ≤ tk . Thus if the value of a simplex is Pn , then for each vertex of the simplex, the smallest ratio associated with it is of the form sj /tj ≤ 1 and each j gets used exactly once. Let S ≡ {S1 , · · · , Sm } denote those simplices whose value is PP n . In other words, if n {y0 , · · · , yn } are the vertices of one of these simplices in S, and ys = i=0 tsi xi , rks ≤ rj for all j P 6= ks and {k0 , · · · , kn } = {0, · · · , n}. Let b denote the barycenter of Sk = [y0 , · · · , yn ]. n 1 yi b ≡ i=0 n+1 Do the same system of labeling for each n simplex in a sequence of triangulations where the diameters of the simplices in the k th triangulation are no more than 2−k . Thus each of these triangulations has a n simplex having diameter no more than 2−k which has value Pn . Let bk be the barycenter of one of these n simplices having value Pn . By compactness, there is a subsequence, still denoted with the index k such that bk → x. This x is a fixed point. Pn Pn Consider this last claim. x = i=0 ti xi and after applying f , the result is i=0 si xi . Then bk is the barycenter of some σ k having diameter no more than 2−k which has value
2.17. EXERCISES
85
Pn . Say σ k is a simplex having vertices y0k , · · · , ynk and the value of y0k , · · · , ynk is Pn . Thus also limk→∞ yik = x. Re ordering these vertices if necessary, we can assume that the label for yik is pi which implies that the smallest ratio rk is when k = i and as noted above, this ratio is no larger than 1. Thus for each i = 0, · · · , n, si ≤ 1, si ≤ ti ti the ith coordinate of f yik with respect to the original vertices of S decreases and each i th of is represented for i = {0, 1, · · · , n} . As noted above, yik → x and so the i coordinate k k th k yi , ti must converge to ti . Hence if the i coordinate of f yi is denoted by ski , ski ≤ tki . By continuity of f , it follows that ski → si . Thus the above inequality is preserved on taking k → ∞ and so 0 ≤ si ≤ ti this for each i and these si , ti pertain to the single point x. But these si add to 1 as do the ti and so in fact, si = ti for each i and so f (x) = x. This proves the following theorem which is the Brouwer fixed point theorem.
Theorem 2.16.1
n
Let S be a simplex [x0 , · · · , xn ] such that {xi − x0 }i=1 are independent. Also let f : S → S be continuous. Then there exists x ∈ S such that f (x) = x.
Corollary 2.16.2 Let K be a closed convex bounded subset of Rn . Let f : K → K be continuous. Then there exists x ∈ K such that f (x) = x. Proof: Let S be a large simplex containing K and let P be the projection map onto K. See Problem 31 on Page 90 for the necessary properties of this projection map. Consider g (x) ≡ f (P x) . Then g satisfies the necessary conditions for Theorem 2.16.1 and so there exists x ∈ S such that g (x) = x. But this says x ∈ K and so g (x) = f (x).
Definition 2.16.3
A set B has the fixed point property if whenever f : B → B for f continuous, it follows that f has a fixed point. The proof of this corollary is pretty significant. By a homework problem, a closed convex set is a retract of Rn . This is what it means when you say there is this continuous projection map which maps onto the closed convex set but does not change any point in the closed convex set. When you have a set A which is a subset of a set B which has the property that continuous functions f : B → B have fixed points, and there is a continuous map P from B to A which leaves points of A unchanged, then it follows that A will have the same “fixed point property”. You can probably imagine all sorts of sets which are retracts of closed convex bounded sets. Also, if you have a compact set B which has the fixed point property and h : B → h (B) with h one to one and continuous, it will follow that h−1 is continuous and that h (B) will also have the fixed point property. This is very easy to show. This will allow further extensions of this theorem. This says that the fixed point property is topological.
2.17
Exercises
1. If you have two norms on Fn or more generally on X a finite dimensional vector space (You can’t escape these. They are encountered as subspaces of Fn for example.) and these are equivalent norms in the sense that there are scalars δ, ∆ not depending on x such that for all x, δ kxk ≤ kxk1 ≤ ∆ kxk (2.24) then the open sets defined in terms of these norms are the same. 2. Explain carefully why in Rn , B∞ (p, r) =
n Y i=1
(pi − r, pi + r)
86
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA 3. Explain carefully why if x is a limit point of a set A with respect to k·k then it is also a limit point with respect to k·k1 if the two norms are equivalent as in 2.24. Also show that if xn → x with respect to k·k , then the same is true with respect to k·k1 an equivalent norm satisfying 2.24. 4. A vector space X with field of scalars either R or C is called a normed linear space if it has a norm k·k . That is, the following assumptions are satisfied. kxk ≥ 0 and kxk = 0 if and only if x = 0
(2.25)
For α a scalar, kαxk = |α| kxk
(2.26)
kx + yk ≤ kxk + kyk
(2.27)
A normed linear space is called a Banach space if, in addition to having a norm, it is complete. This means that if {xn } is a Cauchy sequence, then it must converge. Recall that a sequence is a Cauchy sequence if for every ε > 0 there exists N such that if n, m ≥ N, then kxn − xm k < ε The space of bounded linear transformations L (X, Y ) which are linear transformations mapping X to Y consists of the functions L : X → Y such that L is linear, L (αx + βy) = αL (x) + βL (y) and kLk ≡ sup kLxkY < ∞ kxk≤1
Show that k·k , called the operator norm is a norm on L (X, Y ) and that kLxk ≤ kLk kxkX Also show that if you have L ∈ L (X, Y ) and M ∈ L (Y, Z) for X, Y, Z normed linear spaces, then kM ◦ Lk ≤ kM k kLk 5. If you have X a normed linear space and Y is a Banach space, show that L (X, Y ) is a Banach space with respect to the operator norm. 6. If X, Y are normed linear spaces, verify that A : X → Y is in L (X, Y ) if and only if A is continuous at each x ∈ X if and only if A is continuous at 0. 7. Generalize the root test, Theorem 2.8.1 to the situation where the sk are in a complete normed linear space. 8. Suppose X is a Banach space and {Bn } is a sequence of closed sets in X such that Bn ⊇ Bn+1 for all n and no Bn is empty. Also suppose that the diameter of Bn converges to 0. Recall the diameter is given by diam (B) ≡ sup {kx − yk : x, y ∈ B} Thus these sets Bn are nested and diam (Bn ) → 0. Verify that there is a unique point in the intersection of all these sets. 9. If X is a Banach space, and Y is the span of finitely many vectors in X, show that Y is closed. ∞
10. If X is an infinite dimensional Banach space, show that there exists a sequence {xn }n=1 such that kxn k ≤ 1 but for any m 6= n, kxn − xm k ≥ 1/4. Thus in infinite dimensional Banach spaces, closed and bounded sets are no longer compact as they are in Fn .
2.17. EXERCISES
87
11. In the proof of the fundamental theorem of algebra, explain why there exists z0 such that for p (z) a polynomial with complex coefficients, |p (z0 )| = min |p (z)| > 0 z∈C
12. Explain why a compact set in R has a largest point and a smallest point. Now if f : K → R for K compact and f continuous, give another proof of the extreme value theorem from using that f (K) is compact. 13. Generalize Theorem 2.6.33 to the case where fn : S → T where S, T are metric spaces. Give an appropriate definition for uniform convergence which will imply uniform convergence transfers continuity from fn to the target function f . 14. A function f : X → R for X a normed linear space is lower semicontinuous if, whenever xn → x, f (x) ≤ lim inf f (xn ) n→∞
It is upper semicontinuous if, whenever xn → x, f (x) ≥ lim sup f (xn ) n→∞
Explain why, if K is compact and f is upper semicontinuous then f achieves its maximum and if K is compact and f is lower semicontinuous, then f achieves its minimum on K. 15. Suppose fn : S → Y where S is a nonempty subset of X a normed linear space and suppose that Y is a Banach space (complete normed linear space). Generalize the theorem in the chapter to this case: Let fn : S → Y be bounded functions: supx∈S |fn (x)| = Cn < ∞. Then there exists bounded f : S → Y such that limn→∞ kf − fn k = 0 if and only if {fn } is uniformly Cauchy. Also show that BC (S; Y ) is a Banach space. 16. Show that no interval [a, b] ⊆ R can be countable. Hint: First show [0, 1] is not countable. You might do this by noting that every point in this interval can be written P∞ as k=1 2−k ak where ak is either 0 or 1. Let F be ∪n P ({1, 2, · · · , n}) . Explain why F is countable. Then Pmlet S ≡ P (N) \ F. Explain why S is uncountable. Let C be all points of the form k=1 2−k ak where ak is 0 or 1. Explain why C is countable. Let P J = [0, 1] \ C. Now let θ : S → J be given by θ (S) = k∈S 2−k . Explain why θ is one to one onto J. If [0, 1] is countable, show there are onto mappings as indicated N → [0, 1] → J → S showing that S is countable after all. 17. Using the above problem as needed, let B be a countable set of real numbers. Say ∞ B = {bn }n=1 . Let 1 if t ∈ {b1 , · · · , bn } fn (t) ≡ 0 otherwise P∞ −k Let g (t) ≡ k=1 2 fk (t) . Explain why g is continuous on R \ B and discontinuous on B. Note that B could be the rational numbers. 18. Let φn (x) = 1 − x2
n
for |x| ≤ 1.
For f a continuous function defined on [−1, 1] , extend it to have f (x) = f (1) for x > 1 and f (x) = f (−1) for x < −1. Consider Z x+1 Z 1 pn (x) ≡ φn (x − y) f (y) dy = φn (y) f (x − y) dy x−1
−1
88
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA This involves elementary calculus and change of variables. Show that pn (x) is a polynomial and that pn converges uniformly to f on [−1, 1]. This is the way Weierstrass originally proved the famous approximation theorem.
19. Generalize the Tietze extension theorem to metric spaces. Thus, f : K → [a, b] where K is a closed subset of a metric space (X, d). Show it can be extended as in this theorem. 20. In fact the Bernstein polynomials apply for, f having values in a normed linear space and a similar result will hold. Give such a generalization. 21. Consider the following pm (x) ≡
m1 X k1 =0
mn X m1 m2 mn k1 m −k m −k ··· ··· x1 (1 − x1 ) 1 1 xk22 (1 − x2 ) 2 2 k1 k2 kn kn =0
· · · xknn
mn −kn
(1 − xn )
f
kn k1 ,··· , m1 mn
.
(2.28)
n
where f : [0, 1] → X a normed linear space. Show kpm − f k → 0 as min (m1 , · · · , mn ) → ∞. Hint: Do consider Lemma 2.12.1 first and you may see how to do this. 22. Theorem 2.6.43 gave an example of a function which is everywhere continuous and nowhere differentiable. The first examples of this sort were given by Weierstrass in 1872 who gave an example involving an infinite series in which each term had all derivatives everywere and yet the uniform limit had no derivative anywhere. Using the example of Theorem 2.6.43, give an exampleP of an infinite series of functions, ∞ each term being a polynomial defined on [0, 1], k=1 pk (x) = f (x) for which it P ∞ makes absolutely no sense to write f 0 (x) = k=1 p0k (x) because f 0 fails to exist at any point. In other words, you cannot differentiate an infinite series term by term. The derivative of a sum is not the sum of the derivatives when dealing with an infinite “sum”. Also show that if you have any differentiable function g and ε > 0, there exists a nowhere differentiable function h such that kg − hk < ε. This is in stark contrast with what will be presented in complex analysis in which, thanks to the Cauchy integral formula, uniform convergence of differentiable functions does lead to a differentiable function. Hint: Use Weierstrass approximation theorem and telescoping series to get the example of a series which can’t be differentiated term by term. 23. Consider R \ {0} . Show this is not connected. 24. Show S ≡ x, sin x1 if x > 0 ∪ {(0, y) : |y| ≤ 1} is connected but not arcwise connected. 25. Let A be an m × n matrix. Then A∗ , called the adjoint matrix, is obtained from A by taking the transpose and then the conjugate. For example, ∗ i 1 −i 1 − i 3 1+i 2 = 1 2 1+i 3 1−i ∗ Formally, (A∗ )ij = Aji . Show that (Ax, y) = (x, A∗ y) and P(x,By) = (B x, y). The inner product is described in the chapter. Recall (x, y) ≡ j xj yj .
26. Let X be a subspace of Fm having dimension d and let y ∈ Fm . Show that x ∈ X is closest to y in the Euclidean norm |·| out of all vectors in X if and only if (y − x, u) = 0 Pd for all u ∈ X. Next show there exists such a closest point and it equals j=1 y, uj uj d
for {uj }j=1 an orthonormal basis for X.
2.17. EXERCISES
89
27. Let A : Fn → Fm be an m × n matrix. (Note how it is being considered as a linear transformation.) Show Im (A) ≡ {Ax : x ∈ Fn } is a subspace of Fm . If y ∈ Fm is given, show that there exists x such that y − Ax is as small as possible (Ax is the point of Im (A) closest to y) and it is a solution to the least squares equation A∗ Ax = A∗ y Hint: You might want to use Problem 25. 28. Show that the usual norm in Fn given by |x| = (x, x)
1/2
satisfies the following identities, the first of them being the parallelogram identity and the second being the polarization identity. 2
|x + y| + |x − y|
2
Re (x, y)
2
2
2 |x| + 2 |y| 1 2 2 = |x + y| − |x − y| 4
=
Show that these identities hold in any inner product space, not just Fn . By definition, an inner product space is just a vector space which has an inner product. 29. Let K be a nonempty closed and convex set in an inner product space (X, |·|) which is complete. For example, Fn or any other finite dimensional inner product space. Let y∈ / K and let λ = inf {|y − x| : x ∈ K} Let {xn } be a minimizing sequence. That is λ = lim |y − xn | n→∞
Explain why such a minimizing sequence exists. Next explain the following using the parallelogram identity in the above problem as follows. 2 2 y − xn + xm = y − xn + y − xm 2 2 2 2 2 y x y x 2 1 1 n m 2 2 = − − − − + |y − xn | + |y − xm | 2 2 2 2 2 2 Hence xm − xn 2 2
2 1 xn + xm 1 2 2 = − y − + 2 |y − xn | + 2 |y − xm | 2 1 1 2 2 ≤ −λ2 + |y − xn | + |y − xm | 2 2
Next explain why the right hand side converges to 0 as m, n → ∞. Thus {xn } is a Cauchy sequence and converges to some x ∈ X. Explain why x ∈ K and |x − y| = λ. Thus there exists a closest point in K to y. Next show that there is only one closest 2 point. Hint: To do this, suppose there are two x1 , x2 and consider x1 +x using the 2 parallelogram law to show that this average works better than either of the two points which is a contradiction unless they are really the same point. This theorem is of enormous significance.
90
CHAPTER 2. BASIC TOPOLOGY AND ALGEBRA
30. Let K be a closed convex nonempty set in a complete inner product space (H, |·|) (Hilbert space) and let y ∈ H. Denote the closest point to y by P x. Show that P x is characterized as being the solution to the following variational inequality Re (z − P x, y − P x) ≤ 0 for all z ∈ K. Hint: Let x ∈ K. Then, due to convexity, a generic thing in K is of the form x + t (z − x) , t ∈ [0, 1] for every z ∈ K. Then 2
2
2
|x + t (z − x) − y| = |x − y| + t2 |z − x| − t2Re (z − x, y − x) If x = P y, then the minimum value of this on the left occurs when t = 0. Function defined on [0, 1] has its minimum at t = 0. What does it say about the derivative of this function at t = 0? Next consider the case that for some x the inequality Re (z − x, y − x) ≤ 0. Explain why this shows x = P y. 31. Using Problem 30 and Problem 29 show the projection map, P onto a closed convex subset is Lipschitz continuous with Lipschitz constant 1. That is |P x − P y| ≤ |x − y| 32. Let A : Rn → Rn be continuous and let f ∈ Rn . Also let (·, ·) denote the standard inner product in Rn . Letting K be a closed and bounded and convex set, show that there exists x ∈ K such that for all y ∈ K, (f − Ax, y − x) ≤ 0 Hint: Show that this is the same as saying P (f − Ax + x) = x for some x ∈ K where here P is the projection map discussed above. Now use the Brouwer fixed point theorem. This little observation is called Browder’s lemma. It is a fundamental result in nonlinear analysis. 33. ↑In the above problem, suppose that you have a coercivity result which is (Ax, x) =∞ kxk kxk→∞ lim
Show that if you have this, then you don’t need to assume the convex closed set is bounded. In case K = Rn , and this coercivity holds, show that A maps onto Rn . 34. Let σ be an r simplex. Then [σ, b] will consist of all (1 − λ) σ + λb where λ ∈ [0, 1]. If σ = [x1 , · · · , xr ] , show that [σ, b] = [x1 , · · · , xr , b]. Now if σ 1 , σ 2 ⊆ σ where [σ, b] is an r + 1 simplex and each σ i is an r simplex, show that [σ 1 , b] ∩ [σ 2 , b] = [σ 2 ∩ σ 1 , b] .
Part II
Real Analysis
91
Chapter 3
The Derivative, a Linear Transformation The derivative is a linear transformation. In this chapter are the principal results about the derivative.
3.1
Basic Definitions
The derivative is a linear transformation. This may not be entirely clear from a beginning calculus course because they like to say it is a slope which is a number. As observed by Deudonne, “...In the classical teaching of Calculus, this idea (that the derivative is a linear transformation) is immediately obscured by the accidental fact that, on a one-dimensional vector space, there is a one-to-one correspondence between linear forms and numbers, and therefore the derivative at a point is defined as a number instead of a linear form. This slavish subservience to the shibboleth1 of numerical interpretation at any cost becomes much worse when dealing with functions of several variables...” He is absolutely right. The concept of derivative generalizes right away to functions of many variables but only if you regard a number which is identified as the derivative in single variable calculus as a linear transformation on R. However, no attempt will be made to consider derivatives from one side or another. This is because when you consider functions of many variables, there isn’t a well defined side. However, it is certainly the case that there are more general notions which include such things. I will present a fairly general notion of the derivative of a function which is defined on a normed vector space which has values in a normed vector space. The case of most interest is that of a function which maps an open set in Fn to Fm but it is no more trouble to consider the extra generality and it is sometimes useful to have this extra generality because sometimes you want to consider functions defined, for example on subspaces of Fn and it is nice to not have to trouble with ad hoc considerations. Also, you might want to consider Fn with some norm other than the usual one. For most of what follows, it is not important for the vector spaces to be finite dimensional provided you make the following definition of what is meant by L (X, Y ) which is automatic if X is finite dimensional. See Proposition 2.10.8.
Definition 3.1.1
Let (X, k·kX ) and (Y, k·kY ) be two normed linear spaces. Then L (X, Y ) denotes the set of linear maps from X to Y which also satisfy the following condi1 In
the Bible, there was a battle between Ephraimites and Gilleadites during the time of Jepthah, the judge who sacrificed his daughter to Jehovah, one of several instances of human sacrifice in the Bible. The cause of this battle was very strange. However, the Ephramites lost and when they tried to cross a river to get back home, they had to say shibboleth. If they said “sibboleth” they were killed because their inability to pronounce the “sh” sound identified them as Ephramites. They usually don’t tell this story in Sunday school. The word has come to denote something which is arbitrary and no longer important.
93
94
CHAPTER 3. THE DERIVATIVE, A LINEAR TRANSFORMATION
tion. For L ∈ L (X, Y ) , lim kLxkY ≡ kLk < ∞
kxkX ≤1
To save notation, I will use k·k as a norm on either X, Y or L (X, Y ) and allow the context to determine which it is. Let U be an open set in X, and let f : U → Y be a function.
Definition 3.1.2
A function g is o (v) if g (v) =0 kvk→0 kvk lim
(3.1)
A function f : U → Y is differentiable at x ∈ U if there exists a linear transformation L ∈ L (X, Y ) such that f (x + v) = f (x) + Lv + o (v) This linear transformation L is the definition of Df (x). This derivative is often called the Frechet derivative. Note that from Theorem 2.9.4 the question whether a given function is differentiable is independent of the norm used on the finite dimensional vector space. That is, a function is differentiable with one norm if and only if it is differentiable with another norm. In infinite dimensions, this is not clearly so and in this case, simply regard the norm as part of the definition of the normed linear space which incidentally will also typically be assumed to be a complete normed linear space. The definition 3.1 means the error, f (x + v) − f (x) − Lv converges to 0 faster than kvk. Thus the above definition is equivalent to saying lim
kvk→0
kf (x + v) − (f (x) + Lv)k =0 kvk
(3.2)
or equivalently, lim
y→x
kf (y) − (f (x) + Df (x) (y − x))k = 0. ky − xk
(3.3)
The symbol, o (v) should be thought of as an adjective. Thus, if t and k are constants, o (v) = o (v) + o (v) , o (tv) = o (v) , ko (v) = o (v) and other similar observations hold.
Theorem 3.1.3
The derivative is well defined.
Proof: First note that for a fixed nonzero vector v, o (tv) = o (t). This is because lim
t→0
o (tv) o (tv) = lim kvk =0 t→0 |t| ktvk
Now suppose both L1 and L2 work in the above definition. Then let v be any vector and let t be a real scalar which is chosen small enough that tv + x ∈ U . Then f (x + tv) = f (x) + L1 tv + o (tv) , f (x + tv) = f (x) + L2 tv + o (tv) . Therefore, subtracting these two yields (L2 − L1 ) (tv) = o (tv) = o (t). Therefore, dividing by t yields (L2 − L1 ) (v) = o(t) t . Now let t → 0 to conclude that (L2 − L1 ) (v) = 0. Since this is true for all v, it follows L2 = L1 .
3.2. THE CHAIN RULE
95
Lemma 3.1.4 Let f be differentiable at x. Then f is continuous at x and in fact, there exists K > 0 such that whenever kvk is small enough,kf (x + v) − f (x)k ≤ K kvk . Also if f is differentiable at x, then o (kf (x + v) − f (x)k) = o (v) Proof: From the definition of the derivative, f (x + v) − f (x) = Df (x) v + o (v) . Let kvk be small enough that o(kvk) kvk < 1 so that ko (v)k ≤ kvk. Then for such v, kf (x + v) − f (x)k ≤ kDf (x) vk + kvk ≤ (kDf (x)k + 1) kvk This proves the lemma with K = kDf (x)k + 1. Recall the operator norm discussed in Definitions 2.10.4, 3.1.1. The last assertion is implied by the first as follows. Define ( o(kf (x+v)−f (x)k) 6 0 kf (x+v)−f (x)k if kf (x + v) − f (x)k = h (v) ≡ 0 if kf (x + v) − f (x)k = 0 Then limkvk→0 h (v) = 0 from continuity of f at x which is implied by the first part. Also from the above estimate,
o (kf (x + v) − f (x)k)
= kh (v)k kf (x + v) − f (x)k ≤ kh (v)k (kDf (x)k + 1)
kvk kvk This establishes the second claim. Here kDf (x)k is the operator norm of the linear transformation, Df (x). This will always be the case unless specified to be otherwise.
3.2
The Chain Rule
With the above lemma, it is easy to prove the chain rule.
Theorem 3.2.1 (The chain rule) Let U and V be open sets U ⊆ X and V ⊆ Y . Suppose f : U → V is differentiable at x ∈ U and suppose g : V → Z is differentiable at f (x) ∈ V where Z is a normed linear space. Then g ◦ f is differentiable at x and D (g ◦ f ) (x) = Dg (f (x)) Df (x) . Proof: This follows from a computation. Let B (x,r) ⊆ U and let r also be small enough that for kvk ≤ r, it follows that f (x + v) ∈ V . Such an r exists because f is continuous at x. For kvk < r, the definition of differentiability of g and f implies g (f (x + v)) − g (f (x)) =
Dg (f (x)) (f (x + v) − f (x)) + o (f (x + v) − f (x)) = Dg (f (x)) [Df (x) v + o (v)] + o (f (x + v) − f (x)) = D (g (f (x))) D (f (x)) v + o (v) + o (f (x + v) − f (x))
(3.4)
= D (g (f (x))) D (f (x)) v + o (v) By Lemma 3.1.4. From the definition of the derivative D (g ◦ f ) (x) exists and equals D (g (f (x))) D (f (x)).
96
3.3
CHAPTER 3. THE DERIVATIVE, A LINEAR TRANSFORMATION
The Matrix of the Derivative
The case of interest here is where X = Rn and Y = Rm , the function being defined on an open subset of Rn . Of course this all generalizes to arbitrary vector spaces and one considers the matrix taken with respect to various bases. As above, f will be defined and differentiable on an open set U ⊆ Rn . The matrix of Df (x) is the matrix having the ith column equal to Df (x) ei and so it is only necessary to compute this. Let t be a small real number. Then o (t) f (x + tei ) − f (x) − Df (x) (tei ) = t t Therefore, f (x + tei ) − f (x) o (t) = Df (x) (ei ) + t t The limit exists on the right and so it exists on the left also. Thus ∂f (x) f (x + tei ) − f (x) ≡ lim = Df (x) (ei ) t→0 ∂xi t and so the matrix of the derivative is just the matrix which has the ith column equal to the ith partial derivative of f . Note that this shows that whenever f is differentiable, it follows that the partial derivatives all exist. It does not go the other way however as discussed later.
Theorem 3.3.1 the partial derivatives
Let f : U ⊆ Fn → Fm and suppose f is differentiable at x. Then all exist and if Jf (x) is the matrix of the linear transformation,
∂fi (x) ∂xj
Df (x) with respect to the standard basis vectors, then the ij th entry is given by denoted as fi,j or fi,xj . It is the matrix whose ith column is
∂fi ∂xj
(x) also
∂f (x) f (x + tei ) − f (x) . ≡ lim t→0 ∂xi t ∂ ∂f . This might If you take another partial derivative, it can be written as fxi xj ≡ ∂x j ∂xi be written as f,ij also. I assume the reader has seen partial derivatives in calculus. What if all the partial derivatives of f exist? Does it follow that f is differentiable? Consider the following function, f : R2 → R, xy x2 +y 2 if (x, y) 6= (0, 0) . f (x, y) = 0 if (x, y) = (0, 0)
Then from the definition of partial derivatives, f (h, 0) − f (0, 0) 0−0 = lim =0 h→0 h→0 h h lim
and lim
h→0
f (0, h) − f (0, 0) 0−0 = lim =0 h→0 h h
However f is not even continuous at (0, 0) which may be seen by considering the behavior of the function along the line y = x and along the line x = 0. By Lemma 3.1.4 this implies f is not differentiable. Therefore, it is necessary to consider the correct definition of the derivative given above if you want to get a notion which generalizes the concept of the derivative of a function of one variable in such a way as to preserve continuity whenever the function is differentiable.
3.4. EXISTENCE OF THE DERIVATIVE, C 1 FUNCTIONS
97
Existence of the Derivative, C 1 Functions
3.4
There is a way to get the differentiability of a function from the existence and continuity of the partial derivatives. This is very convenient because these partial derivatives are taken with respect to a one dimensional variable. Of course, the determination of continuity is again a multivariable consideration. The following theorem is the main result.
Definition 3.4.1
When f : U → Rp for U an open subset of Rn and the vector ∂fi ∂f are all continuous, (equivalently each ∂x is continuous), the function valued functions, ∂x i j 1 is said to be C (U ) . If all the partial derivatives up to order k exist and are continuous, then the function is said to be C k . It turns out that for a C 1 function, all you have to do is write the matrix described in Theorem 3.3.1 and this will be the derivative. There is no question of existence for the derivative for such functions. This is the importance of the next theorem.
Theorem 3.4.2
Suppose f : U → Rp where U is an open set in Rn . Suppose also that all partial derivatives of f exist on U and are continuous. Then f is differentiable at every point of U . Proof: If you fix all the variables but one, you can apply the fundamental theorem of calculus as follows. 1
Z
∂f (x + tvk ek ) vk dt. ∂xk
f (x+vk ek ) − f (x) = 0
(3.5)
Here is why. Let h (t) = f (x + tvk ek ) . Then f (x + tvk ek + hvk ek ) − f (x + tvk ek ) h (t + h) − h (t) = vk h hvk ∂f and so, taking the limit as h → 0 yields h0 (t) = ∂x (x + tvk ek ) vk . You can also see this is k ∂f 0 true by using the chain rule. h (t) = Df (x + tvk ek ) vk ek = ∂x (x + tvk ek ) vk Therefore, k 1
Z
h0 (t) dt =
f (x+vk ek ) − f (x) = h (1) − h (0) = 0
1
Z 0
∂f (x + tvk ek ) vk dt. ∂xk
Now I will use this observation Pnto prove the theorem. Let v = (v1 , · · · , vn ) with |v| sufficiently small. Thus v = k=1 vk ek . For the purposes of this argument, define Pn v e ≡ 0. Then with this convention, f (x + v) − f (x) = k=n+1 k k n X
f
x+
i=1
n X
! vk ek
−f
x+
k=i
=
n Z X i=1
0
1
n X
!! vk ek
=
i=1
k=i+1
∂f ∂xi
x+
n X
! vk ek + tvi ei
k=i+1
n Z X 0
1
∂f ∂xi
∂f vi − (x) vi ∂xi
n X ∂f = (x) vi + o (v) ∂x i i=1
and this shows f is differentiable at x.
!
n X
x+
vk ek + tvi ei
k=i+1
! dt +
n Z X i=1
0
1
∂f (x) vi dt ∂xi
vi dt
98
CHAPTER 3. THE DERIVATIVE, A LINEAR TRANSFORMATION
Some explanation of the step to the last line is in order. The messy term is o (v) because of the continuity of the partial derivatives. In fact, from the Cauchy Schwarz inequality, ! ! Z 1X n n X ∂f ∂f x+ (x) vi dt vk ek + tvi ei − ∂xi ∂xi 0 i=1 k=i+1 2 1/2 !1/2 ! Z 1 X n n n X X ∂f ∂f 2 x+ dt vi ≤ vk ek + tvi ei − (x) ∂xi ∂xi 0 i=1
i=1
k=i+1
Z 1 X n ∂f = ∂xi 0
x+
i=1
!
n X
vk ek + tvi ei
k=i+1
2 1/2 ∂f (x) dt |v| − ∂xi
Thus, dividing by |v| and taking a limit as |v| → 0, the quotient is nothing but Z 0
1
n X ∂f ∂xi i=1
n X
x+
! vk ek + tvi ei
k=i+1
2 1/2 ∂f (x) dt − ∂xi
which converges to 0 as v → 0 due to continuity of the partial derivatives of f . Indeed 2 ! n ∂f X ∂f v → x+ vk ek + tvi ei − (x) ∂xi ∂xi k=i+1
is uniformly continuous for |v| ≤ 1 by Theorem 2.6.28.
3.5
Mixed Partial Derivatives
Under certain conditions the mixed partial derivatives will always be equal. The simple condition is that if they exist and are continuous, then they are equal. This astonishing fact is due to Euler in 1734 and was proved by Clairaut although not very well. The first satisfactory proof was by Hermann Schwarz in 1873. For reasons I cannot understand, calculus books seldom include a proof of this important result. It is not all that hard. It is based on the mean value theorem for derivatives. Here it is.
Theorem 3.5.1 fxy and fyx
Suppose f : U ⊆ R2 → R where U is an open set on which fx , fy , exist. Then if fxy and fyx are continuous at the point (x, y) ∈ U , it follows fxy (x, y) = fyx (x, y) .
Proof: Since U is open, there exists r > 0 such that B ((x, y) , r) ⊆ U. Now let |t| , |s| < r/2 and consider h(t)
h(0)
}| { z }| { 1 z ∆ (s, t) ≡ {f (x + t, y + s) − f (x + t, y) − (f (x, y + s) − f (x, y))}. st
(3.6)
Note that (x + t, y + s) ∈ U because 2
2 1/2
|(x + t, y + s) − (x, y)| = |(t, s)| = t + s
≤
r2 r2 + 4 4
1/2
r = √ < r. 2
As implied above, h (t) ≡ f (x + t, y + s)−f (x + t, y). Therefore, by the mean value theorem from calculus and the (one variable) chain rule, ∆ (s, t) =
1 1 1 (h (t) − h (0)) = h0 (αt) t = (fx (x + αt, y + s) − fx (x + αt, y)) st st s
3.6. A COFACTOR IDENTITY
99
for some α ∈ (0, 1) . Applying the mean value theorem again, ∆ (s, t) = fxy (x + αt, y + βs) where α, β ∈ (0, 1). If the terms f (x + t, y) and f (x, y + s) are interchanged in 3.6, ∆ (s, t) is also unchanged and the above argument shows there exist γ, δ ∈ (0, 1) such that ∆ (s, t) = fyx (x + γt, y + δs) . Letting (s, t) → (0, 0) and using the continuity of fxy and fyx at (x, y) , lim (s,t)→(0,0)
∆ (s, t) = fxy (x, y) = fyx (x, y) .
The following is obtained from the above by simply fixing all the variables except for the two of interest.
Corollary 3.5.2 Suppose U is an open subset of Rn and f : U → R has the property that for two indices, k, l, fxk , fxl , fxl xk , and fxk xl exist on U and fxk xl and fxl xk are both continuous at x ∈ U. Then fxk xl (x) = fxl xk (x) . It is necessary to assume the mixed partial derivatives are continuous in order to assert they are equal. The following is a well known example [5]. Example 3.5.3 Let ( f (x, y) =
xy (x2 −y 2 ) x2 +y 2
if (x, y) 6= (0, 0) 0 if (x, y) = (0, 0)
From the definition of partial derivatives it follows immediately that fx (0, 0) = fy (0, 0) = 0. Using the standard rules of differentiation, for (x, y) 6= (0, 0) , fx = y Now
x4 − y 4 + 4x2 y 2 2
(x2 + y 2 )
, fy = x
x4 − y 4 − 4x2 y 2 (x2 + y 2 )
2
−y 4 fx (0, y) − fx (0, 0) = lim = −1 y→0 (y 2 )2 y→0 y
fxy (0, 0) ≡ lim while
fy (x, 0) − fy (0, 0) x4 = lim =1 x→0 x→0 (x2 )2 x
fyx (0, 0) ≡ lim
showing that although the mixed partial derivatives do exist at (0, 0) , they are not equal there. Here is a picture of the graph of this function. It looks innocuous but isn’t.
3.6
A Cofactor Identity
Lemma 3.6.1 Suppose det (A) = 0. Then for all sufficiently small nonzero ε, det (A + εI) 6= 0. Proof: First suppose A is a p × p matrix. Suppose also that det (A) = 0. Thus, the constant term of det (λI − A) is 0. Consider εI + A ≡ Aε for small real ε. The characteristic polynomial of Aε is det (λI − Aε ) = det ((λ − ε) I − A)
100
CHAPTER 3. THE DERIVATIVE, A LINEAR TRANSFORMATION
This is of the form p
(λ − ε) + ap−1 (λ − ε)
p−1
m
+ · · · + (λ − ε) am
where the aj are the coefficients in the characteristic equation for A and m is the largest such that am 6= 0. The constant term of this characteristic polynomial for Aε must be nonzero for all ε small enough because it is of the form m
(−1) εm am + (higher order terms in ε) which shows that εI + A is invertible for all ε small enough but nonzero. Recall that for A an p×p matrix, cof (A)ij is the determinant of the matrix which results i+j
from deleting the ith row and the j th column and multiplying by (−1) . In the proof and in what follows, I am using Dg to equal the matrix of the linear transformation Dg taken P with respect to the usual basis on Rp . Thus (Dg)ij = ∂gi /∂xj where g = i gi ei for the ei the standard basis vectors.
Lemma 3.6.2 Let g : U → Rp be C 2 where U is an open subset of Rp . Then p X
cof (Dg)ij,j = 0,
j=1
where here (Dg)ij ≡ gi,j ≡
∂gi ∂xj .
Also, cof (Dg)ij =
∂ det(Dg) . ∂gi,j
Proof: From the cofactor expansion theorem, δ kj det (Dg) =
p X
gi,k cof (Dg)ij
(3.7)
i=1
This is because if k 6= j, that on the right is the cofactor expansion of a determinant with two equal columns while if k = j, it is just the cofactor expansion of the determinant. In particular, ∂ det (Dg) = cof (Dg)ij (3.8) ∂gi,j which shows the last claim of the lemma. Assume that Dg (x) is invertible to begin with. Differentiate 3.7 with respect to xj and sum on j. This yields X r,s,j
δ kj
X X ∂ (det Dg) gr,sj = gi,kj (cof (Dg))ij + gi,k cof (Dg)ij,j . ∂gr,s ij ij
Hence, using δ kj = 0 if j 6= k and 3.8, X X X (cof (Dg))rs gr,sk = gr,ks (cof (Dg))rs + gi,k cof (Dg)ij,j . rs
rs
ij
Subtracting the first sum on the right from both sides and using the equality of mixed partials, X X gi,k (cof (Dg))ij,j = 0. i
j
P Since it is assumed Dg is invertible, this shows j (cof (Dg))ij,j = 0. If det (Dg) = 0, use Lemma 3.6.1 to let gk (x) = g (x) + εk x where εk → 0 and det (Dg + εk I) ≡ det (Dgk ) 6= 0. Then X X (cof (Dg))ij,j = lim (cof (Dgk ))ij,j = 0 j
k→∞
j
3.7. IMPLICIT FUNCTION THEOREM
3.7
101
Implicit Function Theorem
The implicit function theorem is one of the greatest theorems in mathematics. There are many versions of this theorem which are of far greater generality than the one given here. The proof given here is like one found in one of Caratheodory’s books on the calculus of variations. It is not as elegant as some of the others which are based on a contraction mapping principle but it may be more accessible. This proof is based on a mean value theorem in the following lemma.
Lemma 3.7.1 Let U be an open set in Rp which contains the line segment t → y + t (z − y) for t ∈ [0, 1] and let f : U → R be differentiable at y + t (z − y) for t ∈ (0, 1) and continuous for t ∈ [0, 1]. Then there exists x on this line segment such that f (z) − f (y) = Df (x) (z − y) . Proof: Let h (t) ≡ f (y + t (z − y)) for t ∈ [0, 1] . Then h is continuous on [0, 1] and has a derivative, h0 (t) = Df (y + t (z − y)) (z − y), this by the chain rule. Then by the mean value theorem of one variable calculus, there exists t ∈ (0, 1) such that h (1) − h (0) = f (z) − f (y) = Df (y + t (z − y)) (z − y) and we let x = y + t (z − y) for this t. Also of use is the following lemma.
Lemma 3.7.2 Let A√be an m × n matrix and suppose that |Aij | ≤ C. Then the operator norm satisfies kAk ≤ C mn. Proof: Let kxk ≤ 1. Then 2
|Ax|
=
≤
2 2 m n m X X X X n Aij xj ≤ C 2 |xj | i=1 j=1 i=1 j=1 1/2 2 1/2 n n m X X X 2 |xj | 1 C2 i=1
=
C2
m X
j=1
j=1 2
n kxk ≤ C 2 nm
i=1
√ and so kAk ≡ supkxk≤1 |Ax| ≤ C mn which shows that kAk ≤ C nm. 2
2
2
Definition 3.7.3
Suppose U is an open set in Rn × Rm and (x, y) will denote a typical point of R × R with x ∈ Rn and y ∈ Rm . Let f : U → Rp be in C 1 (U ) meaning that all partial derivatives exist and are continuous. Then define f1,x1 (x, y) · · · f1,xn (x, y) .. .. D1 f (x, y) ≡ , . . n
m
fp,x1 (x, y) · · ·
f1,y1 (x, y) · · · .. D2 f (x, y) ≡ . fp,y1 (x, y) · · ·
fp,xn (x, y) f1,ym (x, y) .. . . fp,ym (x, y)
Definition 3.7.4 Let δ, η > 0 satisfy: B (x0 , δ) × B (y0 , η) ⊆ U where f : U ⊆ Rn × Rm → Rp is given as f1 (x, y) f2 (x, y) f (x, y) = .. . fp (x, y)
102 and for
CHAPTER 3. THE DERIVATIVE, A LINEAR TRANSFORMATION x1
Thus, its ith
···
xn
n
∈ B (x0 , δ) and y ∈ B (y0 , ηˆ) define f1,x1 x1 , y · · · f1,xn x1 , y .. .. J x1 , · · · , xn , y ≡ . . . n n fp,x1 (x , y) · · · fp,xn (x , y) row is D1 fi xi , y . Let K, r be constants.
(*)
By Lemma 3.7.1, and (x, y) ∈ B (x0 , δ) × B (y0 , η) ⊆ U, and h, k sufficiently small, there are xi on the line segment between x and x + h such that f (x + h, y + k) − f (x, y) = f (x + h, y + k) − f (x, y + k) + f (x, y + k) − f (x, y) = J x1 , · · · , xn , y + k h + D2 f (x, y) k + o (k) (3.9) 1 n = D1 f (x, y) h + D2 f (x, y) k + o (k) + J x , · · · , x , y + k − D1 f (x, y) h (3.10) q 2 2 Now by continuity of the partial derivatives, if |h| + |k| is sufficiently small,
J x1 , · · · , xn , y + k − D1 f (x, y) < ε and so
J x1 , · · · , xn , y + k − D1 f (x, y) h ε |h| q ≤q ≤ε 2 2 2 2 |h| + |k| |h| + |k|
and so the last term in 3.10 is o ((h, k)) . Thus f (x + h, y + k) − f (x, y) is of the form D1 f (x, y) h + D2 f (x, y) k + o (h, k) which shows that f is differentiable and its derivative is the p × (n + m) matrix, D1 f (x, y) D2 f (x, y) .
Proposition 3.7.5 Suppose g : B (x0 , δ)×B (y0 , η0 ) → [0, ∞) is continuous and g (x0 , y0 ) = 0 and if x 6= x0 , g (x, y0 ) > 0. Then there exists η < η 0 such that if y ∈ B (y0 , η) , then the function x → g (x, y) achieves its minimum on the open set B (x0 , δ). Proof: If not, then there is a sequence yk → y0 but the minimum of x → g (x, yk ) for x ∈ B (x0 , δ) happens on ∂B (x0 , δ) ≡ ∂B ≡ {x : |x − x0 | = δ} at xk . Now ∂B is closed and bounded and so compact. Hence there is a subsequence, still denoted with subscript k such that xk → x ∈ ∂B and yk → y0 . By uniform continuity on the compact set and assumption, if k is large enough, for all x ˆ ∈ B (x0 , δ) n o max |g (ˆ x, y0 ) − g (ˆ x, yk )| : x ˆ ∈ B (x0 , δ) ≤ Then
1 1 min {|g (ˆ x, y0 )| : x ˆ ∈ ∂B} ≤ g (x, y0 ) 2 2
1 g (x0 , yk ) ≥ g (xk , yk ) > g (xk , y0 ) − g (x, y0 ) 2
ˆ → g (ˆ x, y0 ) is only 0 at x0 . Letting k → ∞, 0 ≥ 21 g (x, y0 ) contrary to x Here is the implicit function theorem. It is based on the mean value theorem from one variable calculus, the extreme value theorem from calculus, and the formula for the inverse of a matrix in terms of the transpose of the cofactor matrix divided by the determinant. In the above, n = p.
3.7. IMPLICIT FUNCTION THEOREM
103
Theorem 3.7.6 (implicit function theorem) Suppose U is an open set in Rn × Rm . Let f : U → Rn be in C 1 (U ) and suppose −1
f (x0 , y0 ) = 0, D1 f (x0 , y0 )
exists.
(3.11)
Then there exist positive constants δ, η, such that for every y ∈ B (y0 , η) there exists a unique x (y) ∈ B (x0 , δ) such that f (x (y) , y) = 0.
(3.12)
Furthermore, the mapping, y → x (y) is in C 1 (B (y0 , η)). Proof: Let B (x0 , δ) × B (y0 , ηˆ) ⊆ U . By the assumption of continuity of all the partial derivatives and the extreme value theorem, there exists r > 0 and δ, ηˆ > 0 possibly smaller, n such that if x1 , · · · , xn ∈ B (x0 , δ) and y ∈ B (y0 , ηˆ), det J x1 , · · · , xn , y > r > 0.
(3.13)
By continuity of all the partial derivatives and the extreme value theorem, Lemma 3.7.2 implies there exists a constant, K such that for all (x, y) ∈ B (x0 , δ)× B (y0 , ηˆ) kD2 f (x, y)k < K,
(3.14)
n and by enlarging K if necessary, for all x1 , · · · , xn ∈ B (x0 , δ) and y ∈ B (y0 , ηˆ),
−1
J x1 , · · · , xn , y
1 assuming that f is C k . From 3.22 ∂x −1 ∂f = −D1 f (x, y) . ∂y l ∂y l Thus the following formula holds for q = 1 and |α| = q. X Dα x (y) = Mβ (x, y) Dβ f (x, y)
(3.23)
|β|≤q
where Mβ is a matrix whose entries are differentiable functions of Dγ x for |γ| < q and −1 Dτ f (x, y) for |τ | ≤ q. This follows easily from the description of D1 f (x, y) in terms of the cofactor matrix and the determinant of D1 f (x, y). Suppose 3.23 holds for |α| = q < k. Then by induction, this yields x is C q . Then X ∂Mβ (x, y) ∂Dβ f (x, y) ∂Dα x (y) = Dβ f (x, y) + Mβ (x, y) . p p ∂y ∂y ∂y p |β|≤|α|
∂M (x,y)
β By the chain rule ∂y is a matrix whose entries are differentiable functions of Dτ f (x, y) p for |τ | ≤ q + 1 and Dγ x for |γ| < q + 1. It follows, since y p was arbitrary, that for any |α| = q + 1, a formula like 3.23 holds with q being replaced by q + 1. By induction, x is C k . As a simple corollary, this yields the inverse function theorem. You just let F (x, y) = y − f (x) and apply the implicit function theorem.
Theorem 3.8.2
(inverse function theorem) Let x0 ∈ U ⊆ Fn and let f : U → Fn . Suppose for k a positive integer, f is C k (U ) , and Df (x0 )−1 ∈ L(Fn , Fn ).
(3.24)
Then there exist open sets W , and V such that x0 ∈ W ⊆ U,
(3.25)
f : W → V is one to one and onto,
(3.26)
f
−1
k
is C .
(3.27)
106
3.9
CHAPTER 3. THE DERIVATIVE, A LINEAR TRANSFORMATION
Invariance of Domain
As an application of the inverse function theorem is a simple proof of the important invariance of domain theorem which says that continuous and one to one functions defined on an open set in Rn with values in Rn take open sets to open sets. You know that this is true for functions of one variable because a one to one continuous function must be either strictly increasing or strictly decreasing. This will be used when considering orientations of curves later. However, the n dimensional version isn’t at all obvious but is just as important if you want to consider manifolds with boundary for example. The need for this theorem occurs in many other places as well in addition to being extremely interesting for its own sake. The inverse function theorem gives conditions under which a differentiable function maps open sets to open sets. The following lemma, depending on the Brouwer fixed point theorem is the thing which will allow this to be extended to continuous one to one functions. It says roughly that if a continuous function does not move points near p very far, then the image of a ball centered at p contains an open set.
Lemma 3.9.1 Let f be continuous and map B (p, r) ⊆ Rn to Rn . Suppose that for all x ∈ B (p, r), |f (x) − x| < εr Then it follows that f B (p, r) ⊇ B (p, (1 − ε) r) Proof: This is from the Brouwer fixed point theorem, Corollary 2.16.2. Consider for y ∈ B (p, (1 − ε) r) h (x) ≡ x − f (x) + y Then h is continuous and for x ∈ B (p, r), |h (x) − p| = |x − f (x) + y − p| < εr + |y − p| < εr + (1 − ε) r = r Hence h : B (p, r) → B (p, r) and so it has a fixed point x by Corollary 2.16.2. Thus x − f (x) + y = x so f (x) = y. The notation kf kK means supx∈K |f (x)|. If you have a continuous function h defined on a compact set K, then the Stone Weierstrass theorem implies you can uniformly approximate it with a polynomial g. That is kh − gkK is small. The following lemma says that you can −1 also have g (z) = h (z) and Dg (z) exists so that near z, the function g will map open sets to open sets as claimed by the inverse function theorem. First is a little observation about approximating.
Lemma 3.9.2 Let K be a compact set in Rn and let h : K → Rn be continuous, z ∈ K is fixed. Let δ > 0. Then there exists a polynomial g (each component a polynomial) such that −1 kg − hkK < δ, g (z) = h (z) , Dg (z) exists Proof: By the Weierstrass approximation theorem, Theorem 2.12.7, there exists a polynomial g ˆ such that δ kˆ g − hkK < 3 Then define for y ∈ K g (y) ≡ g ˆ (y) + h (z) − g ˆ (z) Then g (z) = g ˆ (z) + h (z) − g ˆ (z) = h (z)
3.9. INVARIANCE OF DOMAIN
107
Also |g (y) − h (y)|
≤
|(ˆ g (y) + h (z) − g ˆ (z)) − h (y)|
≤
|ˆ g (y) − h (y)| + |h (z) − g ˆ (z)|
0 there γ exists a δ > 0 such that if kPk ≤ δ, then Z f · dγ − S (P) < ε. γ
Sometimes this is written as
Z f · dγ ≡ lim S (P) . γ
kPk→0
∗ The set of points in the Pcurve, γ ([a, b]) will be denoted sometimes by γ . Also, when convenient, I will write P to denote a Riemann sum.
111
112
CHAPTER 4. LINE INTEGRALS
Then γ ∗ is a set of points in Rp and as t moves from a to b, γ (t) moves from γ (a) to γ (b) . Thus γ ∗ has a first point and a last point. Note that from the above definition, it is obvious that the line integral Simply is linear. let Pn refer to a uniform parition of [a, b] and let τ nj be the midpoint of tnj−1 , tnj . Then for a, b scalars and f , g vector valued functions, P Z Z limn→∞ a Pn f γ τ nj · γ tnj − γ tnj−1 P a f · dγ + b g · dγ = + limn→∞ b Pn g γ τ nj · γ tnj − γ tnj−1 γ γ X = lim af γ τ nj + bg γ τ nj · γ tnj − γ tnj−1 n→∞
Pn
Z ≡
(af + bg) · dγ γ
Another issue is whether the integral depends on the parametrization associated with γ ∗ or only on γ ∗ and the direction of motion over γ ∗ . If φ : [c, d] → [a, b] is a continuous nondecreasing function, then γ ◦ φ : [c, d] → Rp is also of bounded variation and yields the same set of points in Rp with the same first and last points. The next theorem explains that one can use either γ or γ ◦ φ and get the same integral.
4.1.1
Change of Parameter
Theorem 4.1.3 Let φ be continuous and non-decreasing and γ is continuous and bounded variation. Then assuming that Z f · dγ γ
exists, so does Z f d (γ ◦ φ) γ◦φ
and
Z
Z f · dγ =
γ
f d (γ ◦ φ) .
(4.1)
γ◦φ
Proof: There exists δ > 0 such that if P is a partition of [a, b] such that kPk < δ, then Z f · dγ − S (P) < ε. γ
By Theorem 2.6.28, φ is uniformly continuous so there exists σ > 0 such that if Q is a partition of [c, d] with kQk < σ, Q = {s0 , · · · , sn } , then |φ (sj ) − φ (sj−1 )| < δ. Thus letting P denote the points in [a, b] given by φ (sj ) for sj ∈ Q, it follows that kPk < δ and so Z n X f · dγ − f (γ (φ (τ j ))) · (γ (φ (sj )) − γ (φ (sj−1 ))) < ε γ j=1 where τ j ∈ [sj−1 , sj ] . Therefore, from the definition 4.1 holds and Z f · d (γ ◦ φ) γ◦φ
R exists and equals γ f · dγ. R This theorem shows that γ f · dγ is independent of the particular parametrization γ used in its computation in the sense that if φ is any nondecreasing continuous function from
4.1. EXISTENCE AND DEFINITION
113
another interval, [c, d] , mapping onto [a, b] , then the same value is obtained by replacing γ with γ ◦ φ. Given a parametrization γ of a curve and an interval [a, b] on which γ is defined, there is a natural orientation corresponding to increasing t ∈ [a, b]. Sometimes people write −γ to denote the opposite orientation. To obtain this, you could write a parametrization for it as −γ (t) ≡ γ (b − t) for t ∈ [0, b − a]. Of course this encounters the points of γ ∗ in the opposite order. Therefore, in the above definition, Z Z − f ·dγ = f ·d (−γ) γ
4.1.2
γ
Existence
The fundamental result in this subject is the following theorem.
Theorem 4.1.4
Let f : γR∗ → Rp be continuous and let γ : [a, b] → Rp be continuous and of bounded variation. Then γ f ·dγ exists. Also letting δ m > 0 be such that |t − s| < δ m 1 , implies kf (γ (t)) − f (γ (s))k < m Z f · dγ − S (P) ≤ 2V (γ, [a, b]) (4.2) m γ
whenever kPk < δ m . Proof: The function, f ◦ γ , is uniformly continuous because it is defined on a compact set. Therefore, there exists a decreasing sequence of positive numbers {δ m } such that if |s − t| < δ m , then 1 |f (γ (t)) − f (γ (s))| < . m Let Fm ≡ {S (P) : kPk < δ m }. Thus Fm is a closed set. (The symbol, S (P) in the above definition, means to include all sums corresponding to P for any choice of τ j .) It is shown that diam (Fm ) ≤
2V (γ, [a, b]) m
(4.3)
and then it will follow there exists Ra unique point, I ∈ ∩∞ This is because Rp is m=1 Fm . complete. It will then follow I = γ f (t) · dγ (t) . To verify 4.3, it suffices to verify that whenever P and Q are partitions satisfying kPk < δ m and kQk < δ m , |S (P) − S (Q)| ≤
2 V (γ, [a, b]) . m
(4.4)
Suppose kPk < δ m and Q ⊇ P. Then also kQk < δ m . To begin with, suppose that P ≡ {t0 , · · · , tp , · · · , tn } and Q ≡ {t0 , · · · , tp−1 , t∗ , tp , · · · , tn } . Thus Q contains only one more point than P. Letting S (Q) and S (P) be Riemann Steiltjes sums, S (Q) ≡
p−1 X
f (γ (σ j )) · (γ (tj ) − γ (tj−1 )) + f (γ (σ ∗ )) · (γ (t∗ ) − γ (tp−1 ))
j=1
+f (γ (σ ∗ )) · (γ (tp ) − γ (t∗ )) +
n X
f (γ (σ j )) · (γ (tj ) − γ (tj−1 )) ,
j=p+1
S (P) ≡
p−1 X j=1
f (γ (τ j )) · (γ (tj ) − γ (tj−1 )) +
114
CHAPTER 4. LINE INTEGRALS =f (γ(τ p ))·(γ(tp )−γ(tp−1 ))
z }| { f (γ (τ p )) · (γ (t∗ ) − γ (tp−1 )) + f (γ (τ p )) · (γ (tp ) − γ (t∗ )) n X
+
f (γ (τ j )) · (γ (tj ) − γ (tj−1 )) .
j=p+1
Therefore, using the Cauchy Schwarz inequality, |S (P) − S (Q)| ≤
p−1 X 1 1 |γ (tj ) − γ (tj−1 )| + |γ (t∗ ) − γ (tp−1 )| + m m j=1
n X 1 1 1 |γ (tp ) − γ (t∗ )| + |γ (tj ) − γ (tj−1 )| ≤ V (γ, [a, b]) . m m m j=p+1
(4.5)
Clearly the extreme inequalities would be valid in 4.5 if Q had more than one extra point. You simply do the above trick more than one time. Let S (P) and S (Q) be Riemann Steiltjes sums for which kPk and kQk are less than δ m and let R ≡ P ∪ Q. Then from what was just observed, |S (P) − S (Q)| ≤ |S (P) − S (R)| + |S (R) − S (Q)| ≤
2 V (γ, [a, b]) . m
and this shows 4.4 which proves 4.3. Therefore, there exists a unique complex number, R I ∈ ∩∞ F which satisfies the definition of f · dγ. m=1 m γ In case the existence and uniqueness of I is not clear, note that each Fm is closed and if you pick a point from each, you get a Cauchy sequence. Thus it converges to a point of Fm for each m. Hence there is a point in all these Fm and since their diameters converge to 0, there can be no more than one point.
4.1.3
The Riemann Integral
The reader is assumed to be familiar with the Riemann integral, but if not, the above is more general and gives the principal results for the Riemann integral of continuous functions very easily. Therefore, here is a slight digression to show this. In the special case that f : [a, b] → R, and γ (t) = t, the Rieman Stieltjes sums reduce to ordinary Riemann sums n X f (τ i ) (ti − ti−1 ) i=1
and the above argument reduces to showing the existence of the Riemann integral for a continuous function. Note that it is obvious that if f (t) ≥ 0 for all t, then Z
Rb a
f (t) dt
b
f (t) dt ≥ 0 a
and so Z
b
Z (|f | − f ) dx
a
Z
b
≥ 0⇒ a
b
Z (|f | + f ) dx
≥
a
b
Z |f | dx ≥
0⇒
f dx a
b
Z |f | dx ≥ −
a
b
f dx a
which together imply that Z a
b
Z b f dx |f | dx ≥ a
(4.6)
4.1. EXISTENCE AND DEFINITION
115
the triangle inequality. As to vector valued functions and Riemann integration, if f : [a, b] → Rp is continuous, Rb what is meant by a f (s) ds? We mean the obvious thing: ! Z Z b
b
≡
f (s) ds a
i
fi (s) ds
(4.7)
a
More generally, we say f is Riemann integrable if each component function fi is and the Riemann integrable is defined by 4.7.
Proposition 4.1.5 Let f be continuous on [a, b] with values in Rp . Then the triangle inequality Z Z b b |f (x)| dx f (x) dx ≤ a a holds for |·| the usual Euclidean norm. P Proof: Let a ∈ Rp be any nonzero vector. Then for (a, b) ≡ i ai , bi , a ≤ sup |a · b| ≤ sup |a| |b| = |a| |a| = a, |a| |b|=1 |b|=1 Therefore, using the Cauchy Schwarz inequality as needed, Z Z p X Z b bX b fi (x) dxbi = sup fi (x) bi dx f (x) dx = sup a |b|=1 |b|=1 a a i=1 i !1/2 !1/2 Z b X p p X 2 2 ≤ sup |fi (x)| |bi | dx |b|=1 a i=1 i=1 Z b Z b = sup |b| |f (x)| dx = |f (x)| dx |b|=1
a
a
The assumption that f is continuous makes the existence of the integrals obvious. However, the same result holds with the same proof if it is only known that f is integrable in some sense. Here the main interest is in continuous functions although later, this will be spectacularly generalized far beyond Riemann integrable. Also one obtains easily the fundamental theorem of calculus.
Theorem 4.1.6
Let f : [a, b] → R be continuous and real valued. Suppose F 0 (x) = f (x) for all x ∈ (a, b) where F is a continuous function on [a, b]. Then Z b f (x) dx = F (b) − F (a) a
Proof: Let ε > 0. There is a δ > 0 such that if kPk < δ, then for P = {x0 , · · · , xn } , Z n b X f (x) dx − f (τ i ) (xi − xi−1 ) < ε a i=1
Now
Z Z n b b X f (x) dx − (F (b) − F (a)) = f (x) dx − (F (xi ) − F (xi−1 )) a a i=1
By intermediate value theorem from calculus, there is τ i ∈ (xi−1 , xi ) such that F (xi ) − F (xi−1 ) = F 0 (τ i ) (xi − xi−1 ) , F 0 = f
116
CHAPTER 4. LINE INTEGRALS
Thus Z Z n b b X f (x) dx − (F (b) − F (a)) < ε f (x) dx − = f (τ i ) (xi − xi−1 ) = a a i=1
Rb because kPk < δ. Since ε > 0, it follows that a f (x) dx − (F (b) − F (a)) = 0. In case of vector valued continuous functions with f (s) = F0 (s) so Fi0 (s) = fi (s) , Z
b
f (s) ds ≡
R b a
a
a
F1 (b) − F1 (a) · · ·
= =
Rb
f1 (s) ds · · ·
fp (s) ds
T
Fp (b) − Fp (a)
T
F (b) − F (a)
The other form of the fundamental theorem of calculus is also obtained. R Theorem 4.1.7 Let f : [a, b] → Rp be continuous and let F (t) ≡ at f (s) ds. Then F0 (t) = f (t) for all t ∈ (a, b). Proof: Let t ∈ (a, b) and let |h| be small enough that everything of interest is in [a, b]. First suppose h > 0. Then Z t+h Z F (t + h) − F (t) 1 1 t+h = − f (t) f (s) ds − f (t) ds h h h t t 1 ≤ h
t+h
Z t
1 |f (s) − f (t)| ds ≤ h
Z
t+h
εds = ε t
provided that ε is small enough due to continuity of f at t. A similar inequality is obtained if h < 0 except in the argument, you will have t + h < t so you have to switch the order of integration in going to the second line and replace 1/h with 1/ (−h). Thus lim
h→0
4.2
F (t + h) − F (t) = f (t) . h
Estimates and Approximations
The following theorem follows easily from the above definitions and theorem.
Theorem 4.2.1
Let f ∈ C (γ ∗ ) and let γ : [a, b] → Rp be of bounded variation and continuous. Let M be at least as large as the maximum of |f | on γ ∗ . That is, M ≥ max {|f ◦ γ (t)| : t ∈ [a, b]} .
(4.8)
Z f · dγ ≤ M V (γ, [a, b]) .
(4.9)
Then
γ
Also if {fn } is a sequence of functions continuous on γ ∗ which is converging uniformly to the function f on γ ∗ , then Z Z lim fn · dγ = f · dγ. (4.10) n→∞
γ
γ
Proof: Let 4.8 hold. From the proof of Theorem 4.1.4 on existence, when kPk < δ m , Z f · dγ − S (P) ≤ 2 V (γ, [a, b]) m γ
4.2. ESTIMATES AND APPROXIMATIONS and so
117
Z f · dγ ≤ |S (P)| + 2 V (γ, [a, b]) m γ
Then by the triangle inequality and Cauchy Schwarz inequality, ≤
n X
M |γ (tj ) − γ (tj−1 )| +
j=1
≤
2 V (γ, [a, b]) m
2 V (γ, [a, b]) . m
M V (γ, [a, b]) +
This proves 4.9 since m is arbitrary. To verify 4.10 use the above inequality to write Z Z Z f · dγ − (f − f ) · dγ (t) = f · dγ n n γ
γ
γ
≤ max {|f ◦ γ (t) − fn ◦ γ (t)| : t ∈ [a, b]} V (γ, [a, b]) . Since the convergence is assumed to be uniform, this proves 4.10. As an easy example of a curve of bounded variation, here is an easy lemma.
Lemma 4.2.2 Let γ : [a, b] → Rp be in C 1 ([a, b]) . Then V (γ, [a, b]) < ∞ so γ is of bounded variation. Proof: This follows from the following. You can use Proposition 4.1.5 in the first inequality. n n Z tj n Z tj X X X 0 |γ (tj ) − γ (tj−1 )| = γ (s) ds ≤ |γ 0 (s)| ds tj−1 t j−1 j=1 j=1 j=1 n Z tj X ≤ kγ 0 k∞ ds = kγ 0 k∞ (b − a) . j=1
tj−1
Therefore it follows V (γ, [a, b]) ≤ kγ 0 k∞ (b − a) . Here kγk∞ = max {|γ (t)| : t ∈ [a, b]} which exists by Theorem 2.6.26.
Theorem 4.2.3
Let γ : [a, b] → Rp be continuous and of bounded variation. Let Ω be an open set containing γ ∗ and let f : Ω × K → Rp be continuous for K a compact set in Rp , and let ε > 0 be given. Then there exists η : [a, b] → Rp such that η (a) = γ (a) , γ (b) = η (b) , η ∈ C 1 ([a, b]) , and kγ − ηk < ε, (4.11) Z Z f (·, z) · dγ − f (·, z) dη < ε, all z ∈ K (4.12) γ
η
V (η, [a, b]) ≤ V (γ, [a, b]) ,
(4.13)
where kγ − ηk ≡ max {|γ (t) − η (t)| : t ∈ [a, b]} . Proof: Extend γ to be defined on all R according to γ (t) = γ (a) if t < a and γ (t) = γ (b) if t > b. Now define for 0 ≤ h ≤ 1. 1 γ h (t) ≡ 2h
Z
2h t+ (b−a) (t−a)
2h −2h+t+ (b−a) (t−a)
γ (s) ds, γ 0 (t) ≡ γ (t) ,
where the integral is defined in the obvious way. That is, the j th component of γ h (t) will be 2h Z t+ (b−a) (t−a) 1 γ j (s) ds 2h 2h −2h+t+ (b−a) (t−a)
118
CHAPTER 4. LINE INTEGRALS
Note that (t, h) → γ h (t) is clearly continuous on the compact set [a, b] × [0, 1] if γ 0 (0) ≡ γ (t). This is from the fundamental theorem of calculus, Theorem 4.1.7 or its proof, and the observation that 2h 2h (t − a) − −2h + t + (t − a) = 2h t+ (b − a) (b − a) Thus this function of two variables is uniformly continuous on this set by Theorem 2.6.28. Also, the definition implies b+2h
1 γ h (b) = 2h
Z
1 2h
Z
γ h (a) =
γ (s) ds = γ (b) , b a
γ (s) ds = γ (a) . a−2h
By continuity of γ, the chain rule from beginning calculus, and the fundamental theorem of calculus, Theorem 4.1.7, 1 2h 2h 0 γ h (t) = (t − a) γ t+ 1+ − 2h b−a b−a 2h 2h (t − a) 1+ γ −2h + t + b−a b−a and so γ h ∈ C 1 ([a, b]) . Next is a fundamental estimate.
Lemma 4.2.4 V (γ h , [a, b]) ≤ V (γ, [a, b]) . Proof: Let a = t0 < t1 < · · · < tn = b. Then using the definition of γ h and changing the variables to make all integrals over [0, 2h] , n X
|γ h (tj ) − γ h (tj−1 )| =
j=1
Z n 2h X 2h 1 (t − a) − γ s − 2h + t + j j 2h 0 b−a j=1 2h γ s − 2h + tj−1 + (tj−1 − a) b−a Z 2h X n 1 2h ≤ (tj − a) − γ s − 2h + tj + 2h 0 j=1 b−a 2h γ s − 2h + tj−1 + (tj−1 − a) ds. b−a 2h (tj − a) for j = 1, · · · , n form an For a given s ∈ [0, 2h] , the points, s − 2h + tj + b−a increasing list of points in the interval [a − 2h, b + 2h] and so the integrand is bounded above by V (γ, [a − 2h, b + 2h]) = V (γ, [a, b]) . It follows n X
|γ h (tj ) − γ h (tj−1 )| ≤ V (γ, [a, b])
j=1
With this lemma the proof of Theorem 4.2.3 can be completed without too much trouble. Let H be an open set containing γ ∗ such that H is a compact subset of Ω. This is easily done as follows. For each x ∈ γ ∗ let B (x, rx ) be a ball centered at x which is contained
4.2. ESTIMATES AND APPROXIMATIONS
119
in Ω. Since γ ∗ is the continuous image of a compact set, it is also compact and so there are many of the balls {B (x, rx /2)}whose union contains γ ∗ . Denote these balls by finitely r n n B xi , 2xi i=1 . Then let H be the union of the balls {B (xi , rxi )}i=1 Thus H is the union of the closures of these balls each of which is compact, so H is also compact and contained in Ω. Let 0 < ε < 21 min {rxi : i ≤ n}. Then there exists δ 1 such that if h < δ 1 , then for all t, |γ (t) − γ h (t)|
≤
0 such that for all z ∈ K and h ∈ [0, 1] , |f (γ h (t) , z) − f (γ h (s) , z)| < if |t − s| < δ 2 . Therefore, from 4.2, Z f (·, z) · dγ h (t) − S (P) ≤ 2V (γ h , [a, b]) ≤ 2V (γ, [a, b]) (4.16) m m γ
so for large enough choice of m, the two inequalities in 4.15 are obtained. Here S (P) is a Riemann Steiltjes sum of the form n X
f (γ (τ i ) , z) · (γ (ti ) − γ (ti−1 ))
i=1
and Sh (P) is a similar Riemann Steiltjes sum taken with respect to γ h instead of γ. Because of 4.14, γ h (t) has values in H ⊆ Ω. Therefore, fix the partition P, and choose h small enough that in addition to this, the following inequality is valid for all z ∈ K. |S (P) − Sh (P)|
0 such that if ε |y − x|. From Lemma 4.2.6 above, there exists x, y ∈ (s − δ s , s + δ s ) , then |o (y − x)| < b−a δ > 0 such that any interval of length less than δ is contained on one of the intervals (s − δ s , s + δ s ) . Now pick a sum which is close to V (γ) . V (γ) − ε ≤
n X
|γ (ti ) − γ (ti−1 )| =
k=1
n X
|γ 0 (ti−1 ) (ti − ti−1 ) + o (ti − ti−1 )| ≤ V (γ)
k=1
Then by adding in points to the partition, one only makes the approximating sum larger. Hence, by making kP k sufficiently small, smaller than the Lebesgue number just mentioned, we can assume that successive partition points are contained in a single interval ε (s − δ s , s + δ s ) described above. Thus one can assume that |o (ti − ti−1 )| < b−a (ti − ti−1 ) for all i. It follows that if kP k is small enough, where P is a partition of [a, b], V (γ) − ε ≤ =
n X k=1 n X
0
|γ (ti−1 )| (ti − ti−1 ) +
n X i=1
ε (ti − ti−1 ) b−a
|γ 0 (ti−1 )| (ti − ti−1 ) + ε
k=1
and V (γ) ≥ =
n X k=1 n X
|γ 0 (ti−1 )| (ti − ti−1 ) −
n X i=1
ε (ti − ti−1 ) b−a
|γ 0 (ti−1 )| (ti − ti−1 ) − ε
k=1
Thus V (γ) − 2ε ≤
n X
|γ 0 (ti−1 )| (ti − ti−1 ) ≤ V (γ) + ε
k=1
Now picking more points to make kP k → 0, one obtains Z b V (γ) − 2ε ≤ |γ 0 (t)| dt ≤ V (γ) + ε a
Then, since ε is arbitrary, Z a
b
|γ 0 (t)| dt = V (γ)
122
CHAPTER 4. LINE INTEGRALS
Example 4.2.7 Let γ (t) ≡ t, t2 , t for t ∈ [0, 1]. Find the length of the curve determined by this parametrization. √ R1√ |γ 0 (t)| = 2 + 4t2 and so the length of the curve is 0 2 + 4t2 dt. You can use the standard to find this integral or you could find it numerically. It equals √ calculus √ 1gimmicks √ √ 1 ln 2 + 3 + 2 3. 2 2
4.2.2
Curves Defined in Pieces
Definition 4.2.8
If γ k : [ak , bk ] → Rp is continuous, one to one on [ak , bk ), and of bounded variation, for k = 1, · · · , m and γ k (bk ) = γ k+1 (ak ) , define Z m Z X f · dγ ≡ f · dγ k . (4.19) P m k=1
γk
k=1
γk
In addition to this, for γ : [a, b] → Rp , define −γ : [a, b] → Rp by −γ (t) ≡ γ (b + a − t) . Thus γ simply traces out the points of γ ∗ in the opposite order. The following lemma is useful and follows quickly from Theorem 4.1.3. It shows that when you string these finite variation curves together end to end, you could just as well save trouble on the details and consider a single finite variation vector valued function.
Lemma 4.2.9 In the above definition where γ k (bk ) = γ k+1 (ak ), there exists a con-
tinuous bounded variation function, γ one to one on γ −1 (γ k [ak , bk )) which is defined on some closed interval, [c, d] , such that γ ([c, d]) = ∪m k=1 γ k ([ak , bk ]) and γ (c) = γ 1 (a1 ) while γ (d) = γ m (bm ) . Furthermore, Z m Z X f · dγ k . (4.20) f · dγ = γ
k=1
γk
If γ : [a, b] → Rp is of bounded variation and continuous, then Z Z f · dγ = − f · dγ. γ
(4.21)
−γ
Proof: Consider the first claim about the intervals. It is obvious if m = 1. Supposehtheni that it holds for m−1 and you have m intervals and curves. By induction, there exists cˆ, dˆ h i and γ ˆ such that γ ˆ cˆ, dˆ = ∪m−1 γ ([a , b ]) with γ ˆ (ˆ c ) = γ (a ) , γ ˆ dˆ = γ (bm−1 ) , k k 1 k 1 k=1 and γ k (bk ) = γ k+1 (ak ) for k ≤ m − 1. I will describe an extension ofhγ ˆ toi h by assumption, i cˆ, dˆ + bm − am which will be defined as [c, d]. Thus the new interval coming after cˆ, dˆ h i ˆ dˆ + bm − am will be d, h i γ ˆ (t) if t ∈ cˆ, dˆ h i γ (t) ≡ ˆ dˆ + bm − am γ m t − dˆ + am if t ∈ d, Now consider the claim in 4.20. In writing the Riemann sums, it can always be assumed that the end points of the intervals γ −1 (γ k ([ak , bk ])) are in the partition since including these points makes kP k no larger and the integral is defined in terms of smallness of kP k. Therefore, 4.20 follows from Theorem 4.1.3 applied to two different parametrizations of γ ∗k , the one coming from γ and the one coming from γ k . Finally consider the last claim. Say γ : [a, b] → Rn . Then −γ (t) ≡ γ (a + b − t) and so a typical Riemann sum for −γ would be n X i=1
f (γ (a + b − τ i )) · (γ (a + b − ti ) − γ (a + b − ti−1 )) , τ i ∈ [ti−1 , ti ]
4.2. ESTIMATES AND APPROXIMATIONS
=−
n X
123
f (γ (a + b − τ i )) · (γ (a + b − ti−1 ) − γ (a + b − ti ))
i=1
Now is just a Riemann sum for R a + b − τ i ∈ [a + b − ti , a + b − ti−1 ] and so the above R − γ f ·dγ. Thus, in the limit as kP k → 0, one obtains − γ f ·dγ.
4.2.3
A Physical Application, Work
One of the most important applications of line integrals is to the concept of work. I will specialize the discussion to the case of a smooth curve. However, the general case is obtained by writing dγ in place of γ 0 (t) dt. It is being assumed that the parametrizations considered are one to one on the interior of the parameter domain. You have a force field field F which acts on an object as it moves along the curve C, in the direction determined by the given orientation of the curve which comes here from the given parametrization. From beginning physics or calculus, you defined work as the force in a given direction times the distance the object moves in that direction. Work is really a measure of the extent to which a force contributes to the motion but here the object moves over a curve so the direction of motion is constantly changing. In order to define what is meant by the work, consider the following picture. F(x(t))
x(t) x(t + h) In this picture, the work done by a constant force F on an object which moves from the point γ (t) to the point γ (t + h) along the straight line shown would equal F·(γ (t + h) − γ (t)). It is reasonable to assume this would be a good approximation to the work done in moving along the curve joining γ (t) and γ (t + h) provided h is small enough. Also, provided h is small, γ (t + h) − γ (t) ≈ γ 0 (t) h where the wriggly equal sign indicates the two quantities are close. In the notation of Leibniz, one writes dt for h and dW = F (γ (t)) · γ 0 (t) dt or in other words, dW = F (γ (t)) · γ 0 (t) . dt Defining the total work done by the force at t = 0, corresponding to the first endpoint of the curve, to equal zero, the work would satisfy the following initial value problem. dW = F (γ (t)) · γ 0 (t) , W (a) = 0. dt This motivates the following definition of work.
Definition 4.2.10 Let F (γ) be given above. Then the work done by this force field on an object moving over the curve C in the direction determined by the specified orientation is defined as Z b Z 0 F (γ (t)) · γ (t) dt = F · dγ a
C
where the function γ is one of the allowed parameterizations of C in the given orientation of C. In other words, there is an interval [a, b] and as t goes from a to b, γ (t) moves from one endpoint to the other.
124
CHAPTER 4. LINE INTEGRALS
When you have a curve and γ : [a, b] → C and η : [c, d] → C are two parametrizations such that γ −1 ◦η is increasing, the two parametrizations give the same value for the work. This follows from Theorem 4.1.3. You consider γ and γ ◦ γ −1 ◦ η in that theorem. This says essentially that two different parametrizations give the same value for the work if they go over the curve in the same direction. Example 4.2.11 Say F ≡ (x, x + y, z − x) . Find the work done on an object which moves over the curve parametrized by γ (t) = 1, 2t, t2 for t ∈ [0, 1]. From the above, it equals 7 2.
4.3
R1 0
R1 1, 1 + 2t, t2 − 1 · (0, 2, 2t) dt = 0 4t − 2t + 2t3 + 2 dt =
Conservative Vector Fields
Recall the gradient of a scalar function x → F (x) , x ∈ Rp . It was the vector Fx1 .. . Fxp the transpose of the matrix of the derivative of F . (It is best not to worry too much about this distinction between the gradient and the derivative at this point.)
Theorem 4.3.1
Let γ : [a, b] → Rp be continuous and of bounded variation. Also suppose ∇F = f on Ω, an open set containing γ ∗ and f is continuous on Ω. Then Z f · dγ = F (γ (b)) − F (γ (a)) . γ
Proof: By Theorem 4.2.3 there exists η ∈ C 1 ([a, b]) such that γ (a) = η (a) , and γ (b) = η (b) such that Z Z < ε. f · dγ − f · dη γ
η
1
Then from Theorem 4.2.5, since η is in C ([a, b]) , it follows from the chain rule and the fundamental theorem of calculus that Z Z b Z b d 0 F (η (t)) dt f · dη = f (η (t)) η (t) dt = η a a dt = F (η (b)) − F (η (a)) = F (γ (b)) − F (γ (a)) . Therefore, Z (F (γ (b)) − F (γ (a))) − f · dγ < ε γ
and since ε > 0 is arbitrary, this proves the theorem. You can prove this theorem another way without using that approximation result.
Theorem 4.3.2
Suppose γ is continuous and bounded variation, a parametrization of Γ where t ∈ [a, b] . Suppose γ ∗ = Γ ⊆ Ω an open set and that f : Ω → Rp is continuous and has a potential F . Thus ∇F (x) = f (x) for x ∈ Ω. Then Z f ·dγ = F (γ (b)) − F (γ (a)) γ
4.3. CONSERVATIVE VECTOR FIELDS Proof: Define the following function ( h (s, t) ≡
F (γ(t))−F (γ(s))−f (γ(s))·(γ(t)−γ(s)) |γ(t)−γ(s)|
125
if γ (t) − γ (s) 6= 0
0 if γ (t) − γ (s) = 0
Then h is continuous at points (s, s). To see this note that the above reduces to o (γ (t) − γ (s)) |γ (t) − γ (s)| thus if (sn , tn ) → (s, s) , the continuity of γ requires this to also converge to 0. Claim: Let ε > 0 be given. There exists δ > 0 such that if |t − s| < δ, then kh (s, t)k < ε. Proof of claim: If not, then for some ε > 0, there exists (sn , tn ) where |tn − sn | < 1/n but kh (sn , tn )k ≥ ε. Then by compactness, there is a subsequence, still called {sn } such that sn → s. It follows that tn → s also. Therefore, 0 = lim |h (sn , tn )| ≥ ε n→∞
a contradiction. Thus whenever |t − s| < δ, F (γ (t)) − F (γ (s)) − f (γ (s)) · (γ (t) − γ (s)) 0. If for some other pair of points, x2 < y2 with x2 , y2 ∈ (a, b) , the above inequality does not hold, then since φ is 1 − 1, (φ (y2 ) − φ (x2 )) (y2 − x2 ) < 0. Let xt ≡ tx1 + (1 − t) x2 and yt ≡ ty1 + (1 − t) y2 . Then xt < yt for all t ∈ [0, 1] because tx1 ≤ ty1 and (1 − t) x2 ≤ (1 − t) y2 with strict inequality holding for at least one of these inequalities since not both t and (1 − t) can equal zero. Now define h (t) ≡ (φ (yt ) − φ (xt )) (yt − xt ) . Since h is continuous and h (0) < 0, while h (1) > 0, there exists t ∈ (0, 1) such that h (t) = 0. Therefore, both xt and yt are points of (a, b) and φ (yt ) − φ (xt ) = 0 contradicting the assumption that φ is one to one. It follows φ is either strictly increasing or strictly decreasing on (a, b) . This property of being either strictly increasing or strictly decreasing on (a, b) carries over to [a, b] by the continuity of φ. Suppose φ is strictly increasing on (a, b) , a similar argument holding for φ strictly decreasing on (a, b) . If x > a, then pick y ∈ (a, x) and from the above, φ (y) < φ (x) . Now by continuity of φ at a, φ (a) = lim φ (z) ≤ φ (y) < φ (x) . x→a+
Therefore, φ (a) < φ (x) whenever x ∈ (a, b) . Similarly φ (b) > φ (x) for all x ∈ (a, b). It only remains to verify φ−1 is continuous. Suppose then that sn → s where sn and s are points of φ ([a, b]) . It is desired to verify that φ−1 (sn ) → φ−1 (s) . If this does not happen, there exists ε > 0 and a subsequence, still denoted by sn such that φ−1 (sn ) − φ−1 (s) ≥ ε. Using the sequential compactness of [a, b] there exists a further subsequence, still denoted by n, such that φ−1 (sn ) → t1 ∈ [a, b] , t1 6= φ−1 (s) . Then by continuity of φ, it follows sn → φ (t1 ) and so s = φ (t1 ) . Therefore, t1 = φ−1 (s) after all.
128
CHAPTER 4. LINE INTEGRALS
Definition 4.4.2 Let η, γ be continuous one to one parametrizations for a simple curve. That is, γ ([a, b]) = η ([c, d]) and γ, η are one to one on [a, b] and [c, d] respectively. Thus a simple curve is the one to one onto image of a closed interval. Earlier this was called a Jordan arc. If η −1 ◦γ is increasing, then γ and η are said to be equivalent parametrizations and this is written as γ ∼ η. It is also said that the two parametrizations give the same orientation for the curve when γ ∼ η. A simple closed curve is one which is the continuous one to one image of S 1 , the unit circle. First is a discussion of orientation of simple curves and after that, orientation of a simple closed curve is considered. It is always about specifing a direction of motion along the curve. In the case of a simple curve, there will be exactly two orientations because there is a single point in γ −1 (p) for p a point on the curve. Thus γ −1 is well defined and continuous by Theorem 2.6.30. By Lemma 4.4.1, either η −1 ◦ γ is increasing or it is decreasing. In simple language, the message is that there are exactly two directions of motion along a simple curve. q
p
q
p
Lemma 4.4.3 The following hold for ∼. γ ∼ γ,
(4.22)
If γ ∼ η then η ∼ γ,
(4.23)
If γ ∼ η and η ∼ θ, then γ ∼ θ.
(4.24)
Proof: Formula 4.22 is obvious because γ −1 ◦ γ (t) = t so it is clearly an increasing function. If γ ∼ η then γ −1 ◦ η is increasing. Now η −1 ◦ γ must also be increasing because it is the inverse of γ −1 ◦ η. This verifies 4.23. To see 4.24, γ −1 ◦ θ = γ −1 ◦ η ◦ η −1 ◦ θ and so since both of these functions are increasing, it follows γ −1 ◦ θ is also increasing. This proves the lemma.
Definition 4.4.4
Let Γ be a simple curve and let γ be a parametrization for Γ. Denoting by [γ] the equivalence class of parameterizations determined by the above equivalence relation, the following pair will be called an oriented curve. (Γ, [γ]) In simple language, an oriented curve is one which has a direction of motion specified. Actually, people usually just write Γ and there is understood a direction of motion or orientation on Γ. How can you identify which orientation is being considered? Is there a simple condition which shows that the orientation defined by two parametrizations is the same for a curve? For a simple curve, we have the following lemma.
Lemma 4.4.5 Suppose γ ∗ = η ∗ , γ : [a, b] → Rp , η : [c, d] → Rp are both one to one. Then η ∼ γ if and only if there is a strictly increasing continuous function φ : [a, b] → [c, d] such that η ◦ φ = γ.
4.4. ORIENTATION
129
Proof: ⇒ In this case, it is given that η −1 ◦ γ is increasing. Let this function be φ. Then γ = η ◦ φ. ⇐ In this case, the increasing function φ exists such that η ◦ φ = γ. Thus η −1 ◦ γ is increasing. Note that, the end points of this curve consist of {γ (a) , γ (b)} = {η (c) , η (d)}. Otherwise, if an end point were γ (c) not equal to either {γ (a) , γ (b)} , you could remove γ (c) from γ ∗ and the result would be connected while [a, c) ∪ (c, b] is not connected. Now γ −1 : γ ∗ → [a, b] is continuous so the image of γ ∗ \ {γ (c)} would need to be connected which would not be so. With a simple closed curve, the image of [a, b) this may not be so. You could get the curve from two different parametrizations and the two could yield different beginning and ending points. You remove a point from one of these and the result is still connected. This is the difficulty of trying to use this approach on a simple closed curve. When the parametrizations are equivalent, this says they preserve the direction of motion along the curve. Recall Theorem 4.1.3. Thus Lemma 4.4.5 shows that the line integral depends only on the set of points in the curve and the orientation or direction of motion along the curve. The orientation of a simple curve is determined by two points on the curve. This is the idea of the following proposition.
Proposition 4.4.6 Let (Γ, [γ]) be an oriented simple curve, γ ([a, b]) = γ ∗ , γ one to one on [a, b] and let p, q be any two distinct points of Γ. Then [γ] is determined by the order of γ −1 (p) and γ −1 (q). This means that η ∈ [γ] if and only if η −1 (p) and η −1 (q) occur in the same order as γ −1 (p) and γ −1 (q). In other words, if and only if p, q are encountered in the same order with both parametrizations as the parameter increases. Proof: Suppose γ −1 (p) < γ −1 (q) and let η ∈ [γ] . In this case, there is a strictly increasing function φ such that γ = η ◦ φ and so γ −1 (p) = φ−1 η −1 (p) < γ −1 (q) = φ−1 η −1 (q) But φ is increasing and so η −1 (p) < η −1 (q) . The situation is similar if γ −1 (p) < γ −1 (q) . Thus if the two γ, η are equivalent, then they preserve order of points as just described. Conversely suppose the two parametrizations preserve the order for some pair of points. That is, for some p, q on the curve Γ, γ −1 (p) < γ −1 (q) , η −1 (p) < η −1 (q) By Lemma 4.4.1, either γ −1 ◦ η is strictly increasing or it is strictly decreasing. However, the above shows γ −1 η η −1 (p) < γ −1 η η −1 (q) , γ −1 ◦ η η −1 (p) < γ −1 ◦ η η −1 (p) where η −1 (p) < η −1 (q) is given. Therefore, γ −1 ◦ η can only be increasing. Thus η ∼ γ. This shows that the direction of motion on the simple curve is determined by any two points and the determination of which is encountered first by any parametrization in the equivalence class of parameterizations which determines the orientation. Sometimes people indicate this direction of motion by drawing an arrow.
Definition 4.4.7
Let Γ be a simple closed curve. This means there is γ : S 1 → Γ which is one to one and onto. Recall also from Lemma 12.4.12 this is equivalent to a parametrization γ : [a, b] → Γ such that γ is continuous and one to one on [a, b) but γ (a) = γ (b). Here S 1 is the unit circle in the plane x2 + y 2 = 1
130
CHAPTER 4. LINE INTEGRALS
Simple closed curves are the continuous one to one image of the unit circle. However, one can take any point on the simple closed curve and regard that point as the beginning and ending point. This differs from a simple closed curve in which you have only two points which can be considered the beginning or the end.
Proposition 4.4.8 Let Γ be a simple closed curve. Then for any point p on Γ, there is a closed interval [a, b] and a continuous map γ : [a, b] → Γ and γ one to one on [a, b) such that γ (a) = γ (b) = p. That is, p will be the beginning and ending point. Proof: Let ξ : S 1 → Γ be one to one, onto and continuous. Then ξ −1 (p) ≡ (cos α, sin α) for some α ∈ [0, 2π). Then let θ : [α, α + 2π] → S 1 be defined by θ (t) ≡ (cos t, sin t) . Then θ is one to one on [α, α + 2π) and continuous on [α, α + 2π] and θ : [α,α + 2π] → S 1 is onto and θ (α) = θ (α + 2π). Consider ξ ◦ θ ≡ γ. γ (α) ≡ ξ ξ −1 (p) = p, γ is also continuous on [α, α + 2π] and is one to one on [α, α + 2π) and maps onto Γ. γ (α + 2π) = ξ (θ (α + 2π)) = p. If Γ = γ ([a, b]) where γ is one to one on [a, b), continuous on [a, b] and γ (a) = γ (b) , then if c ∈ (a, b) , you cannot have γ ([a, c]) = Γ or γ ([c, b]) = Γ because this would violate γ being one to one on [a, b). Thus γ (c) is neither a beginning nor an ending point for this particular parametrization.
Lemma 4.4.9 Suppose Γi , i = 1, 2 are two simple curves which intersect only at the points p, q and that p, q are also the endpoints of each Γi . Then the union of these two curves, denoted by Γ is a simple closed curve. Conversely, if Γ is a simple closed curve, it can be expressed as the union of two such simple curves.
Γ1
•
• Γ2
Proof:⇒ By assumption there exists γ i : [ai , bi ] → Γi such that the end points γ i (a) , γ i (b) consist of the points p, q. Without loss of generality, one can change the parameter if necessary to have γ 1 (a1 ) = p, γ 1 (b1 ) = q, γ 2 (a2 ) = q, γ 2 (b2 ) = p Now change the parameter again to have [a1 , b1 ] = [0, 1] and [a2 , b2 ] = [1, 2]. Let γ (t) ≡
γ 1 (t) if t ∈ [0, 1] , γ 1 (1) = q γ 2 (t) if t ∈ [1, 2] , γ 2 (2) = p
Then γ is one to one on [0, 2) and has γ (0) = γ (2). Thus if Γ ≡ γ ([0, 2]) , then Γ is a simple closed curve. ⇐If Γ is a simple closed curve, pick p 6= q. By Proposition 4.4.8, there is γ : [a, b] → Γ which is onto, one to one on [a, b) continuous, and γ (a) = γ (b) = p. Thus there is c ∈ (a, b) such that γ (c) = q. Then the two simple curves are γ ([a, c]) and γ ([c, b]). What is meant by an oriented simple closed curve? As in the case of a simple curve, it needs to be defined in such a way as to specify a direction around the curve. What is easily understood is that for the unit circle, there are exactly two ways to go around it, counter clockwise and clockwise.
4.4. ORIENTATION
131
Definition 4.4.10
Let Γ be a simple closed curve. Two parametrizations γ : S 1 → Γ and η : S 1 → Γ are equivalent if and only if η −1 ◦ γ preserves the direction of motion around S 1 . That is, if for increasing t ∈ R, x (t) is a point on S 1 , then if x (t) moves clockwise for increasing t, so does η −1 ◦ γ (x (t)) and if x (t) moves counter clockwise for increasing t, then so does η −1 ◦ γ (x (t)) .
Lemma 4.4.11 The above definition yields an equivalence relation. Proof: The proof is just like it was earlier in case of simple curves. It is obvious that γ ∼ γ. If γ ∼ η so η −1 ◦ γ preserves direction of motion. How about γ −1 ◦ η? If x (t) −1 −1 is moving counter clockwise then x (t) = η ◦ γ γ ◦ η (x (t)) . This cannot be true unless t → γ −1 ◦ η (x (t)) moves counter clockwise as well since you cannot have equality in t of two points on S 1 which move in opposite directions. Similarly, clockwise motion must also be preserved. The transitive law is fairly obvious also. Say γ ∼ η and η ∼ ζ. Then ζ −1 ◦γ = ζ −1 ◦η η −1 ◦γ and the two mappings on the right preserve motion on S 1 . Since there are exactly two directions of motion on S 1 , this shows there are exactly two equivalence classes of parametrizations of a simple closed curve. However, we will mainly use orientations on simple curves to specify the direction of motion on a simple closed curve as in the following Proposition. The situation is illustrated by the following picture in which there are two simple closed curves which share a simple curve denoted by l. The case we have in mind is in the plane but it would work as well if it were only the case that the two simple closed curves have the two points p, q in common.
• p l
Γ2
Γ1
q •
Proposition 4.4.12 Let Γ1 and Γ2 be two simple closed curves and let their intersection be l. Suppose also that l is itself a simple curve. Also suppose the orientation of l when considered a part of Γ1 is opposite its orientation when considered a part of Γ2 . Then if the open segment (l except for its endpoints) of l is removed, the result is a simple closed −1 curve Γ. This curve has a parametrization γ with the property that on γ −1 γj j (Γ ∩ Γj ) , γ is increasing. Furthermore, if Γ has such parametrization γ, then the orientations of l as part of the two simple closed curves, Γ1 and Γ2 are opposite. • Γ2
Γ1 •
Proof: By Proposition 4.4.8 there is a parametrization γ 1 : [a, b] → Γ1 which is one to one on [a, b) and γ 1 (a) = γ 1 (b) ≡ q. Suppose the motion along this curve l is from p to q.
132
CHAPTER 4. LINE INTEGRALS
Thus γ 1 (c) = p for some c ∈ (a, b). Changing the parametrization, we can have c = 1 and [a, b] = [0, 2]. Thus motion is from q to p along Γ1 for the parameter in [0, 1] and then from p to q for the parameter in [1, 2] . As to Γ2 , change the parameter such that the point goes from p to q along Γ2 with the parameter in [1, 2] and then from q back to p along [2, 3]. Call this parametrization γ 2 . So let γ be defined as follows. γ 1 (t) for t ∈ [0, 1] γ (t) ≡ γ 2 (t) for t ∈ [1, 2] Then γ encounters p and then q along Γ1 and q then p along Γ2 . By Proposition 4.4.6, γ −1 γ j is increasing for j = 1, 2 because the order in which the two points are obtained is the same for both γ and γ j . Then γ starts at q goes to p then back to q and is one to one on [0, 2) while γ (0) = γ (2). Thus γ ([0, 2]) is a simple closed curve and a direction or orientation over this curve has been established. Now suppose the orientation of γ is consistent with each of γ j and along Γ1 , motion of γ is from q to p. Then this requires that γ must go from p to q along Γ2 , because γ is given to be a parametrization for a simple closed curve. Also γ 1 must go from p to q along l because Γ1 is a simple closed curve and so q must be the end point. Similarly along l as part of Γ2 , motion is from q to p because Γ2 is a simple closed curve. Thus γ 1 , γ 2 must have opposite orientations along l in order to have γ be consistently oriented with each γ j . There is a slightly different aspect of the above proposition which is interesting. It involves using the shared segment to orient the simple closed curve Γ.
Corollary 4.4.13 Let the intersection of simple closed curves, Γ1 and Γ2 consist of the simple curve l. Then place opposite orientations on l, and use these two different orientations to specify orientations of Γ1 and Γ2 . Then letting Γ denote the simple closed curve which is obtained from deleting the open segment of l, there exists an orientation for Γ which is consistent with the orientations of Γ1 and Γ2 obtained from the given specification of opposite orientations on l. Also note that if you have specified a direction for one of the simple curves having endpoints p and q on a simple closed curve Γ, then you have only one choice for the direction on the remaining curve. If on this curve, the direction is from p to q, then the direction on the remaining simple curve must be from q to p.
4.5
Exercises
2 √ 1. Let r (t) = ln (t) , t2 , 2t for t ∈ [1, 2]. Find the length of this curve. 2. Let r (t) =
2 3/2 , t, t 3t
for t ∈ [0, 1]. Find the length of this curve.
3. Let r (t) = (t, cos (3t) , sin (3t)) for t ∈ [0, 1]. Find the length of this curve. 4. Suppose for t ∈ [0, π] the position of an object is given by r (t) = ti + cos (2t) j + 3 2 sin (2t) k. Also suppose R there is a force field defined on R , F (x, y, z) ≡ 2xyi+ x + 2zy j+ 2 y k. Find the work C F · dR where C is the curve traced out by this object having the orientation determined by the direction of increasing t. 5. In the following, a force field is specified followed by the parametrization of a curve. Find the work. (a) F = (x, y, z) , r (t) = t, t2 , t + 1 , t ∈ [0, 1] (b) F = (x − y, y + z, z) , r (t) = (cos (t) , t, sin (t)) , t ∈ [0, π] (c) F = x2 , y 2 , z + x , r (t) = t, 2t, t + t2 , t ∈ [0, 1]
4.5. EXERCISES
133
(d) F = (z, y, x) , r (t) = t2 , 2t, t , t ∈ [0, 1] 6. The curve consists of straight line segments which go from (0, 0, 0) to (1, 1, 1) and finally to (1, 2, 3). Find the work done if the force field is (a) F = 2xy, x2 + 2y, 1 (b) F = yz 2 , xz 2 , 2xyz + 1 (c) F = (cos x, − sin y, 1) (d) F = 2x sin y, x2 cos y, 1 7. Show the mean value theorem for integrals. Suppose f ∈ C ([a, b]) . Then there exists x ∈ [a, b] , in fact x can be taken in (a, b) , such that Z b f (x) (b − a) = f (t) dt a
You will need to recall simple theorems about the integral from one variable calculus. 8. In this problem is a short argument showing a version of what has become known as Fubini’s theorem. Suppose f ∈ C ([a, b] × [c, d]) . Then Z bZ d Z dZ b f (x, y) dydx = f (x, y) dxdy a
c
c
a
First explain why the two iterated integrals make sense. Hint: To prove the two iterated integrals are equal, let a = x0 < x1 < · · · < xn = b and c = y0 < y1 < · · · < ym = d be two partitions of [a, b] and [c, d] respectively. Then explain why Z bZ d n X m Z xi Z yj X f (s, t) dtds f (x, y) dydx = a
Z
c dZ
b
f (x, y) dxdy c
=
a
i=1 j=1 xi−1 n Z yj m X X j=1 i=1
yj−1
yj−1
Z
xi
f (s, t) dsdt xi−1
Now use the mean value theorem for integrals to write Z xi Z yj f (s, t) dtds = f sˆi , tˆj (xi − xi−1 ) (yi − yi−1 ) xi−1
yj−1
do something similar for Z
yj
Z
xi
f (s, t) dsdt yj−1
xi−1
and then observe that the difference between the sums can be made as small as desired by simply taking suitable partitions. 9. This chapter is on line integrals. It was almost exclusively oriented toward having γ continuous. There is a similar thing called a Riemann Stieltjes integral, written as Z b f (t) dg (t) a
A function f (assume here it is scalar valued for simplicity although this is not necessary) is said to be Riemann Stieltjes integrable if there is a number, I such that for all ε > 0 there exists δ such that if kP k < δ, then n X f (τ i ) (g (ti ) − g (ti−1 )) − I < ε i=1
134
CHAPTER 4. LINE INTEGRALS
for any Riemann Stieltjes sum defined as the above in which τ i ∈ [ti−1 , ti ]. This I is Rb denoted as a f (t) dg (t) and we will say that f ∈ R ([a, b] , g). Show that if g is of Rb bounded variation and f is continuous, then a f (t) dg (t) exists. Note the difference Rb between this and a f (g (t)) dg (t) which is a case of line integrals considered in this Rb chapter and how either includes the ordinary Riemann integral a f (t) dt. Rb 10. Suppose a f dg exists. Explain the following: Let P ≡ {x0 , · · · , xn } and let ti ∈ [xi−1 , xi ]. n X f g (b) − f g (a) − g (ti ) (f (xi ) − f (xi−1 )) i=1
=
n X
f (xi ) g (xi ) − f (xi−1 ) g (xi−1 ) −
i=1
=
n X
g (ti ) (f (xi ) − f (xi−1 ))
i=1 n X
n X
f (xi ) (g (xi ) − g (ti )) +
i=1
f (xi−1 ) (g (ti ) − g (xi−1 ))
i=1
Rb Rb and if kP k is small enough, this is a Riemann sum for a f dg which is closer to a f dg Rb Rb Rb Rb than ε. Use to explain why if a f dg exists, then so does a gdf and a f dg + a gdf = f g (b) − f g (a). Note how this says roughly that d (f g) = f dg + gdf . As an example, suppose g (t) = t and t → f (t) is decreasing. In particular, it is of bounded variation. Rb Rb Rb Thus a gdf exists. It follows then that a f dg = a f (t) dt exists. 11. Let f be increasing and g continuous on [a, b]. Then there exists c ∈ [a, b] such that Z b gdf = g (c) (f (b) − f (c)) . a
Hint: First note g Riemann Stieltjes integrable because it is continuous. Since g is continuous, you can let m = min {g (x) : x ∈ [a, b]} and M = max {g (x) : x ∈ [a, b]} Then Z
b
Z
b
df ≤
m a
b
Z gdf ≤ M
a
df a
Now if f (b) − f (a) 6= 0, you could divide by it and conclude Rb gdf a ≤ M. m≤ f (b) − f (a) Rb You need to explain why a df = f (b) − f (a). Next use the intermediate value theorem to get the term in the middle equal to g (c) for some c. What happens if f (b) − f (a) = 0? Modify the argument and fill in the details to show the conclusion still follows. 12. Suppose g is increasing and f is continuous and of bounded variation. g ∈ R ([a, b] , f ) . Show there exists c ∈ [a, b] such that Z b Z gdf = g (a) a
a
c
Z df + g (b)
b
df c
4.5. EXERCISES
135
This is called the second mean value theorem for integrals. Hint: Use integration by parts. Z b Z b f dg + f (b) g (b) − f (a) g (a) gdf = − a
a
Now use the first mean value theorem, the result of Problem 11 to substitute something Rb for a f dg and then simplify. 13. Let U be an open subset of Rn and suppose that f : [a, b] × U → R satisfies (x, y) →
∂f (x, y) , (x, y) → f (x, y) ∂yi
are all continuous. Show that Z b
b
Z f (x, y) dx, a
a
∂f (x, y) dx ∂yi
all make sense and that in fact ! Z Z b b ∂f ∂ f (x, y) dx = (x, y) dx ∂yi ∂y i a a Also explain why Z
b
∂f (x, y) dx a ∂yi is continuous. Hint: You will need to use the theorems from one variable calculus about the existence of the integral for a continuous function. You may also want to use theorems about uniform continuity of continuous functions defined on compact sets. y→
14. I found this problem in Apostol’s book [3]. This is a very important result and is obtained very simply. Read it and fill in any missing details. Let Z
2 2 e−x (1+t ) dt 1 + t2
1
g (x) ≡ 0
and Z
x
f (x) ≡
e
−t2
2 dt .
0
Note ∂ ∂x
2 2 e−x (1+t ) 1 + t2
! 2 2 = −2xe−x (1+t )
Explain why this is so. Also show the conditions of Problem 13 are satisfied so that Z 1 2 2 g 0 (x) = −2xe−x (1+t ) dt. 0
Now use the chain rule and the fundamental theorem of calculus to find f 0 (x) . Then change the variable in the formula for f 0 (x) to make it an integral from 0 to 1 and show f 0 (x) + g 0 (x) = 0. Now this shows f (x) + g (x) is a constant. Show the constant is π/4 by letting x → 0. Next take a limit as x → ∞ to obtain the following formula for the improper integral, R ∞ −t 2 e dt, 0 Z ∞ 2 −t2 e dt = π/4. 0
136
CHAPTER 4. LINE INTEGRALS In passing to the limit in the integral for g as x → ∞ you need to justify why that integral converges to 0. To do this, argue the integrand converges uniformly to 0 for t ∈ [0, 1] and then explain why this gives convergence of the integral. Thus Z ∞ √ 2 e−t dt = π/2. 0
15. The gamma function is defined for x > 0 as Z Γ (x) ≡
∞
e−t tx−1 dt ≡ lim
R→∞
0
Z
R
e−t tx−1 dt
0
Show this limit exists. Note you might have to give a meaning to Z
R
e−t tx−1 dt
0
if x < 1. Also show that Γ (x + 1) = xΓ (x) , Γ (1) = 1. How does Γ (n) for n an integer compare with (n − 1)!?
Chapter 5
Measures And Measurable Functions The Lebesgue integral is much better than the Rieman integral. This has been known for over 100 years. It is much easier to generalize to many dimensions and it is much easier to use in applications. That is why I am going to use it rather than struggle with an inferior integral. It is also this integral which is most important in probability. However, this integral is more abstract. This chapter will develop the abstract machinery necessary for this integral. Complex analysis does not really require the use of this superior integral however, but the approach we take here will involve integration of a function of more than one variable and when you do this, the Riemann integral becomes totally insufferable, forcing one to consider things like the Jordan content of the boundary and so forth. It is much better to go ahead and deal with the better integral even though it is more abstract.
Definition 5.0.1 Let Ω be a nonempty set. F ⊆ P (Ω) , the set of all subsets of Ω, is called a σ algebra if it contains ∅, Ω, and is closed with respect to countable unions and ∞ complements. That is, if {An }n=1 is countable and each An ∈ F, then ∪∞ n=1 Am ∈ F also C and if A ∈ F, then Ω \ A ≡ A ∈ F. It is clear that any intersection of σ algebras is a σ algebra. If K ⊆ P (Ω) , σ (K) is the smallest σ algebra which contains K. Observation 5.0.2 If you have sets Ai ∈ F a σ algebra, then ∞ C ∩∞ i=1 Ai = ∪i=1 Ai
C
∈F
Thus countable unions, countable intersections, and complements of sets of F stay in F.
5.1
Measurable Functions
Then for functions which have values in (−∞, ∞] we have the following Lemma which includes a definition of what it means for a function to be measurable. Notation 5.1.1 In whatever context, f −1 (S) ≡ {ω ∈ Ω : f (ω) ∈ S}. It is called the inverse image of S and everything in the theory of the Lebesgue integral is formulated in terms of this.
Lemma 5.1.2 Let f : Ω → (−∞, ∞] where F is a σ algebra of subsets of Ω. The following are equivalent. f −1 ((d, ∞]) ∈ F for all finite d, f −1 ((−∞, d)) ∈ F for all finite d, f −1 ([d, ∞]) ∈ F for all finite d, 137
138
CHAPTER 5. MEASURES AND MEASURABLE FUNCTIONS f −1 ((−∞, d]) ∈ F for all finite d, f −1 ((a, b)) ∈ F for all a < b, −∞ < a < b < ∞.
Any of these equivalent conditions is equivalent to the function being measurable. Proof: First note that the first and the third are equivalent. To see this, observe −1 f −1 ([d, ∞]) = ∩∞ ((d − 1/n, ∞]), n=1 f
and so if the first condition holds, then so does the third. −1 f −1 ((d, ∞]) = ∪∞ ([d + 1/n, ∞]), n=1 f
and so if the third condition holds, so does the first. Similarly, the second and fourth conditions are equivalent. Now f −1 ((−∞, d]) = (f −1 ((d, ∞]))C so the first and fourth conditions are equivalent. Thus the first four conditions are equivalent and if any of them hold, then for −∞ < a < b < ∞, f −1 ((a, b)) = f −1 ((−∞, b)) ∩ f −1 ((a, ∞]) ∈ F. Finally, if the last condition holds, −1 f −1 ([d, ∞]) = ∪∞ ((−k + d, d)) k=1 f
C
∈F
and so the third condition holds. Therefore, all five conditions are equivalent. From this, it is easy to verify that pointwise limits of a sequence of measurable functions are measurable.
Corollary 5.1.3 If fn (ω) → f (ω) where all functions have values in (−∞, ∞], then if each fn is measurable, so is f . Proof: Note the following: 1 1 1 −1 −1 ∩ f , ∞] ⊆ f b + , ∞ (b + f −1 (b + , ∞] = ∪∞ k=1 n≥k n l l l This follows from the definition of the limit. Therefore, 1 1 −1 ∞ −1 ∞ ∞ −1 f ((b, ∞]) = ∪l=1 f (b + , ∞] = ∪l=1 ∪k=1 ∩n≥k fn (b + , ∞] l l 1 −1 ⊆ ∪∞ b + ,∞ = f −1 ((b, ∞]) l=1 f l The messy term on the middle is measurable because it consists of countable unions and intersections of measurable sets. It equals f −1 ((b, ∞]) and so this last set is also measurable. By Lemma 5.1.2, f is measurable.
Observation 5.1.4 If f : Ω → R then the above definition of measurability holds with no change. In this case, f never achieves the value ∞. This is actually the case of most interest. The following theorem is of major significance. I will use this whenever it is convenient.
Theorem 5.1.5
Suppose f : R → R is measurable and g : R → R is continuous. Then g ◦ f is measurable. Also, if f, g are measurable real valued functions, then their sum is also measurable and real valued as are all linear combinations of these functions.
5.1. MEASURABLE FUNCTIONS
139
−1 Proof: (g ◦ f ) ((a, ∞)) = f −1 g −1 ((a, ∞)) and by continuity of g, it follows that g −1 ((a, ∞)) is an open set. Thus it is the disjoint union of countably many open intervals by Theorem 2.11.8. It follows that f −1 g −1 ((a, ∞)) is the countable union of measurable sets and is therefore measurable. Why is f + g measurable when f, g are real valued measurable functions? This is a little ∞ trickier. Let the rational numbers be {rn }n=1 . −1
(f + g)
−1 ((a, ∞)) = ∪∞ (rn , ∞) ∩ f −1 (a − rn , ∞) n=1 g −1
It is clear that the expression on the right is contained in (f + g) (a, ∞) . Why are they −1 actually equal? Suppose ω ∈ (f + g) (a, ∞) . Then f (ω) + g (ω) > a and there exists rn a rational number smaller than g (ω) such that f (ω) + rn > a. Therefore, ω ∈ g −1 (rn , ∞) ∩ f −1 (a − rn ) and so the two sets are actually equal as claimed. Now by the first part, if f is measurable and a is a real number, then af is measurable also. Thus linear combinations of measurable functions are measurable. Although the above will usually be enough in this book, it is useful to have a theorem about measurability of a continuous combination of measurable functions. First note the following.
Lemma 5.1.6 Let (Ω, F) be a measurable space. Then f : Ω → R is measurable if and only if f −1 (U ) ∈ F whenever U is an open set.
Proof: See Problem 17 on Page 155. Alternatively, U = ∪∞ i=1 (ai , bi ) where the open intervals are disjoint. Hence f −1 (U ) = ∪i f −1 (ai , bi ) . If f is measurable, then the right side of the preceding equation is in F and so it follows that the left side is as well. Now if f −1 (U ) ∈ F for every open U then this holds in particular for open intervals and so f is measurable by definition in Lemma 5.1.2.
Proposition 5.1.7 Let fi : Ω → R be measurable, (Ω, F) a measurable space, and let T g : Rn → R be continuous. If f (ω) = f1 (ω) · · · fn (ω) , then g ◦ f is measurable. Proof: From the above lemma, it suffices to verify that (g ◦ f ) whenever U is open. However, −1 (g ◦ f ) (U ) = f −1 g −1 (U )
−1
(U ) is measurable
Since g is continuous, it follows from Proposition 2.6.19 that g −1 (U ) is an open set in Rn . By Proposition 2.5.5 there are countably many open sets Bi = B (xi , ri ) whose union is g −1 (U ). We will use the norm k·k∞ so that these Bi are of the form Bi =
n Y
aik , bik
k=1
Thus −1 ∞ −1 n f −1 g −1 (U ) = f −1 (∪∞ (Bi ) = ∪∞ i=1 Bi ) = ∪i=1 f i=1 ∩k=1 fk
aik , bik
∈F
Note that this includes all of Theorem 5.1.5 as a special case. There is a fundamental theorem about the relationship of simple functions to measurable functions given in the next theorem.
Definition 5.1.8
Let E ∈ F for F a σ algebra. Then 1 if ω ∈ E XE (ω) ≡ 0 if ω ∈ /E
140
CHAPTER 5. MEASURES AND MEASURABLE FUNCTIONS
This is called the indicator function of the set E. Let s : (Ω, F) → R. Then s is a simple function if it is of the form n X s (ω) = ci XEi (ω) i=1
where Ei ∈ F and ci ∈ R, the Ei being disjoint. Thus simple functions have finitely many values and are measurable. In the next theorem, it will also be assumed that each ci ≥ 0. Each simple function is measurable. This is easily seen as follows. First of all, you can assume the ci are distinct because if not, you could just replace those Ei which correspond to a single value with their union. Then if you have any open interval (a, b) , s−1 ((a, b)) = ∪ {Ei : ci ∈ (a, b)} and this is measurable because it is the finite union of measurable sets.
Theorem 5.1.9
Let f ≥ 0 be measurable. Then there exists a sequence of nonnegative simple functions {sn } satisfying 0 ≤ sn (ω)
(5.1)
· · · sn (ω) ≤ sn+1 (ω) · · · f (ω) = lim sn (ω) for all ω ∈ Ω. n→∞
(5.2)
If f is bounded, the convergence is actually uniform. Conversely, if f is nonnegative and is the pointwise limit of such simple functions, then f is measurable. Proof : Letting I ≡ {ω : f (ω) = ∞} , define n
tn (ω) =
2 X k X −1 k k+1 (ω) + 2n XI (ω). n f ([ n , n ))
k=0
Then tn (ω) ≤ f (ω) for all ω and limn→∞ tn (ω) = f (ω) for all ω. This is because tn (ω) = 2n n for ω ∈ I and if f (ω) ∈ [0, 2 n+1 ), then 0 ≤ f (ω) − tn (ω) ≤
1 . n
(5.3)
Thus whenever ω ∈ / I, the above inequality will hold for all n large enough. Let s1 = t1 , s2 = max (t1 , t2 ) , s3 = max (t1 , t2 , t3 ) , · · · . Then the sequence {sn } satisfies 5.1-5.2. Also each sn has finitely many values and is measurable. To see this, note that −1 n s−1 n ((a, ∞]) = ∪k=1 tk ((a, ∞]) ∈ F
To verify the last claim, note that in this case the term 2n XI (ω) is not present and for n large enough, 2n /n is larger than all values of f . Therefore, for all n large enough, 5.3 holds for all ω. Thus the convergence is uniform. Now consider the converse assertion. Why is f measurable if it is the pointwise limit of an increasing sequence simple functions? −1 f −1 ((a, ∞]) = ∪∞ n=1 sn ((a, ∞])
because ω ∈ f −1 ((a, ∞]) if and only if ω ∈ s−1 n ((a, ∞]) for all n sufficiently large.
5.2. MEASURES AND THEIR PROPERTIES
5.2
141
Measures and Their Properties
First, what is meant by a measure? These are defined on measurable spaces as follows.
Definition 5.2.1 Let (Ω, F) be a measurable space. Here F is a σ algebra of sets ∞ of Ω. Then µ : F → [0, ∞] is called a measure if whenever {Fi }i=1 is a sequence of disjoint sets of F, it follows that ∞ X µ (∪∞ F ) = µ (Ei ) i=1 i i=1
Note that the series could equal ∞. If µ (Ω) < ∞, then µ is called a finite measure. An important case is when µ (Ω) = 1 when it is called a probability measure. Note that µ (∅) = µ (∅ ∪ ∅) = µ (∅) + µ (∅) and so µ (∅) = 0. Example 5.2.2 You could have P (N) = F and you could define µ (S) to be the number of elements of S. This is called counting measure. It is left as an exercise to show that this is a measure. Example 5.2.3 Here is a pathological example. Let Ω be uncountable and F will be those sets which have the property that either the set is countable or its complement is countable. Let µ (E) = 0 if E is countable and µ (E) = 1 if E is uncountable. It is left as an exercise to show that this is a measure. Of course the most important measure in this book will be Lebesgue measure which gives the “volume” of a subset of Rn . However, this requires a lot more work to get this. First is a fundamental result about general measures. P∞ Lemma 5.2.4 If µ is a measure and Fi ∈ F, then µ (∪∞ i=1 Fi ) ≤ i=1 µ (Fi ). Also if Fn ∈ F and Fn ⊆ Fn+1 for all n, then if F = ∪n Fn , µ (F ) = lim µ (Fn ) n→∞
Symbolically, if Fn ↑ F, then µ (Fn ) ↑ µ (F ). If Fn ⊇ Fn+1 for all n, then if µ (F1 ) < ∞ and F = ∩n Fn , then µ (F ) = lim µ (Fn ) n→∞
Symbolically, if µ (F1 ) < ∞ and Fn ↓ F, then µ (Fn ) ↓ µ (F ). Proof: Let G1 = F1 and if G1 , · · · , Gn have been chosen disjoint, let Gn+1 ≡ Fn+1 \ ∪ni=1 Gi Thus the Gi are disjoint. In addition, these are all measurable sets. Now µ (Gn+1 ) + µ (Fn+1 ∩ (∪ni=1 Gi )) = µ (Fn+1 ) and so µ (Gn ) ≤ µ (Fn ). Therefore, µ (∪∞ i=1 Gi ) =
X i
µ (Gi ) ≤
X
µ (Fi ) .
i
Now consider the increasing sequence of Fn ∈ F. If F ⊆ G and these are sets of F µ (G) = µ (F ) + µ (G \ F ) so µ (G) ≥ µ (F ). Also F = ∪∞ i=1 (Fi+1 \ Fi ) + F1
142
CHAPTER 5. MEASURES AND MEASURABLE FUNCTIONS
Then µ (F ) =
∞ X
µ (Fi+1 \ Fi ) + µ (F1 )
i=1
Now µ (Fi+1 \ Fi ) + µ (Fi ) = µ (Fi+1 ). If any µ (Fi ) = ∞, there is nothing to prove. Assume then that these are all finite. Then µ (Fi+1 \ Fi ) = µ (Fi+1 ) − µ (Fi ) and so µ (F )
=
∞ X
µ (Fi+1 ) − µ (Fi ) + µ (F1 )
i=1
=
lim
n→∞
n X
µ (Fi+1 ) − µ (Fi ) + µ (F1 ) = lim µ (Fn+1 ) n→∞
i=1
Next suppose µ (F1 ) < ∞ and {Fn } is a decreasing sequence. Then F1 \ Fn is increasing to F1 \ F and so by the first part, µ (F1 ) − µ (F ) = µ (F1 \ F ) = lim µ (F1 \ Fn ) = lim (µ (F1 ) − µ (Fn )) n→∞
n→∞
This is justified because µ (F1 \ Fn ) + µ (Fn ) = µ (F1 ) and all numbers are finite by assumption. Hence µ (F ) = lim µ (Fn ) . n→∞
5.3
Dynkin’s Lemma
Dynkin’s lemma is a very useful result. It is used quite a bit in books on probability. It will be used here to obtain n dimensional Lebesgue measure without any ugly technicalities.
Definition 5.3.1
Let Ω be a set and let K be a collection of subsets of Ω. Then K is called a π system if ∅, Ω ∈ K and whenever A, B ∈ K, it follows A ∩ B ∈ K. The following is the fundamental lemma which shows these π systems are useful. This is due to Dynkin.
Lemma 5.3.2 Let K be a π system of subsets of Ω, a set. Also let G be a collection of subsets of Ω which satisfies the following three properties. 1. K ⊆ G 2. If A ∈ G, then AC ∈ G ∞
3. If {Ai }i=1 is a sequence of disjoint sets from G then ∪∞ i=1 Ai ∈ G. Then G ⊇ σ (K) , where σ (K) is the smallest σ algebra which contains K. Proof: First note that if H ≡ {G : 1 - 3 all hold} then ∩H yields a collection of sets which also satisfies 1 - 3. Therefore, I will assume in the argument that G is the smallest collection satisfying 1 - 3. Let A ∈ K and define GA ≡ {B ∈ G : A ∩ B ∈ G} .
5.3. DYNKIN’S LEMMA
143
I want to show GA satisfies 1 - 3 because then it must equal G since G is the smallest collection of subsets of Ω which satisfies 1 - 3. This will give the conclusion that for A ∈ K and B ∈ G, A ∩ B ∈ G. This information will then be used to show that if A, B ∈ G then A ∩ B ∈ G. From this it will follow very easily that G is a σ algebra which will imply it contains σ (K). Now here are the details of the argument. Since K is given to be a π system, K ⊆ G A . Property 3 is obvious because if {Bi } is a sequence of disjoint sets in GA , then ∞ A ∩ ∪∞ i=1 Bi = ∪i=1 A ∩ Bi ∈ G
because A ∩ Bi ∈ G and the property 3 of G. It remains to verify Property 2 so let B ∈ GA . I need to verify that B C ∈ GA . In other words, I need to show that A ∩ B C ∈ G. However, C A ∩ B C = AC ∪ (A ∩ B) ∈ G Here is why. Since B ∈ GA , A∩B ∈ G and since A ∈ K ⊆ G it follows AC ∈ G by assumption 2. It follows from assumption 3 the union of the disjoint sets, AC and (A ∩ B) is in G and then from 2 the complement of their union is in G. Thus GA satisfies 1 - 3 and this implies since G is the smallest such, that GA ⊇ G. However, GA is constructed as a subset of G. This proves that for every B ∈ G and A ∈ K, A ∩ B ∈ G. Now pick B ∈ G and consider GB ≡ {A ∈ G : A ∩ B ∈ G} . I just proved K ⊆ GB . The other arguments are identical to show GB satisfies 1 - 3 and is therefore equal to G. This shows that whenever A, B ∈ G it follows A ∩ B ∈ G. This implies G is a σ algebra. To show this, all that is left is to verify G is closed under countable unions because then it follows G is a σ algebra. Let {Ai } ⊆ G. Then let A01 = A1 and A0n+1
≡
An+1 \ (∪ni=1 Ai )
=
An+1 ∩ ∩ni=1 AC i
=
∩ni=1 An+1 ∩ AC ∈G i
because finite intersections of sets of G are in G. Since the A0i are disjoint, it follows ∞ 0 ∪∞ i=1 Ai = ∪i=1 Ai ∈ G
Therefore, G ⊇ σ (K). Example 5.3.3 Suppose you have (U, F) and (V, S) , two measurable spaces. Let K ⊆ U ×V consist of all sets of the form A × B where A ∈ F and B ∈ S. This is easily seen to be a π system. When this is done, σ (K) is denoted as F × S. An important example of a σ algebra is the Borel sets.
Definition 5.3.4
The Borel sets on Rp , denoted by B (Rp ) consist of the smallest σ algebra containing the open sets. Don’t ever try to describe a generic Borel set. Always work with the definition that it is the smallest σ algebra containing the open sets. Attempts to give an explicit description of a “typical” Borel set tend to lead nowhere because there are so many things which can be done.You can take countable unions and complements and then countable intersections of what you get and then another countable union followed by complements and on and on. You just can’t get a good useable description in this way. However, it is easy to see that something like C ∞ ∩∞ i=1 ∪j=i Ej
is a Borel set if the Ej are. This is useful. An important example of the above is the case of a random vector and its distribution measure.
144
CHAPTER 5. MEASURES AND MEASURABLE FUNCTIONS
Definition 5.3.5
A measurable function X : (Ω, F, µ) → Rp is called a random variable when µ (Ω) = 1. For such a random variable, one can define a distribution measure λX on the Borel sets of Rp as follows. λX (G) ≡ µ X−1 (G) This is a well defined measure on the Borel sets of Z because it makes sense for every G open and G ≡ G ⊆ Rp : X−1 (G) ∈ F is a σ algebra which contains the open sets, hence the Borel sets. Such a random variable is also called a random vector.
5.4
Measures and Outer Measures
There is also something called an outer measure which is defined on the set of all subsets.
Definition 5.4.1
Let Ω be a nonempty set and let λ : P (Ω) → [0, ∞) satisfy the
following: 1. λ (∅) = 0 2. If A ⊆ B, then λ (A) ≤ λ (B) P∞ 3. λ (∪∞ i=1 Ei ) ≤ i=1 λ (Ei ) Then λ is called an outer measure. Every measure determines an outer measure. For example, suppose that µ is a measure on F a σ algebra of subsets of Ω. Then define µ ˆ (S) ≡ inf {µ (E) : E ⊇ S, E ∈ F} This is easily seen to be an outer measure. Also, we have the following Proposition.
Proposition 5.4.2 Let µ be a measure as just described. Then µˆ as defined above, is an outer measure and also, if E ∈ F, then µ ˆ (E) = µ (E). Proof: The first two properties of an outer measure are obvious. What of the third? If any µ ˆ (Ei ) = ∞, then there is nothing to show so suppose each of these is finite. Let Fi ⊇ Ei such that Fi ∈ F and µ ˆ (Ei ) + 2εi > µ (Fi ) . Then ∞ µ ˆ (∪∞ i=1 Ei ) ≤ µ (∪i=1 Fi ) ≤
∞ X
µ (Fi )
i=1
0. Letting δ → 0, it follows that m (∅) = 0. Consider 2.). If any m (Ai ) = ∞, there is nothing to prove. The assertion simply is ∞ ≤ ∞. Assume then that m (Ai ) < ∞ for all i. Then for each m ∈ N there exists a m ∞ countable set of open intervals, {(am i , bi )}i=1 such that ∞
m (Am ) +
X ε m (bm > i − ai ) . m 2 i=1
Then using Theorem 1.8.3 on Page 22, m (∪∞ m=1 Am ) ≤ ≤
X
m (bm i − ai ) =
i,m ∞ X
m (Am ) +
m=1
∞ ∞ X X
m (bm i − ai )
m=1 i=1 ∞ X
ε m (Am ) + ε, = 2m m=1
and since ε is arbitrary, this establishes 2.). ∞ Next consider 3.). By definition, there exists a sequence of open intervals, {(ai , bi )}i=1 whose union contains [a, b] such that m ([a, b]) + ε ≥
∞ X
(bi − ai )
i=1
Since [a, b] is compact, finitely many of these intervals also cover [a, b]. It follows there n exist finitely many of these intervals, denoted as {(ai , bi )}i=1 , which overlap, such that a ∈ (a1 , b1 ) , b1 ∈ (a2 , b2 ) , · · · , b ∈ (an , bn ) . Therefore, m ([a, b]) ≤
n X
(bi − ai ) .
i=1
It follows n X
(bi − ai ) ≥ m ([a, b])
i=1
≥
n X
(bi − ai ) − ε
i=1
≥ (b − a) − ε
146
CHAPTER 5. MEASURES AND MEASURABLE FUNCTIONS
Therefore, since a − 2ε , b +
ε 2
⊇ [a, b] ,
(b − a) + ε ≥ m ([a, b]) ≥ (b − a) − ε Since ε is arbitrary, (b − a) = m ([a, b]) Also [a + δ, b − δ] . From what was just shown, (b − a) − 2δ = m ([a + δ, b − δ]) ≤ m ((a, b)) and so, since this holds for every δ, (b − a) ≤ m ((a, b)) ≤ m ([a, b]) = (b − a). This shows 3.)
5.6
Measures from Outer Measures
Earlier in Theorem 5.5.1, an outer measure on P (R) was constructed. This can be used to obtain a measure defined on R. However, the procedure for doing so is a special case of a general approach due to Caratheodory in about 1918.
Definition 5.6.1
Let Ω be a nonempty set and let µ : P(Ω) → [0, ∞] be an outer measure. For E ⊆ Ω, E is µ measurable if for all S ⊆ Ω, µ(S) = µ(S \ E) + µ(S ∩ E).
(5.4)
To help in remembering 5.4, think of a measurable set E, as a process which divides a given set into two pieces, the part in E and the part not in E as in 5.4. In the Bible, there are several incidents recorded in which a process of division resulted in more stuff than was originally present.1 Measurable sets are exactly those which are incapable of such a miracle. You might think of the measurable sets as the non-miraculous sets. The idea is to show that they form a σ algebra on which the outer measure µ is a measure. First here is a definition and a lemma.
Definition 5.6.2 (µbS)(A) ≡ µ(S ∩ A) for all A ⊆ Ω. Thus µbS is the name of a new outer measure, called µ restricted to S. The next lemma indicates that the property of measurability is not lost by considering this restricted measure.
Lemma 5.6.3 If A is µ measurable, then A is µbS measurable. Proof: Suppose A is µ measurable. It is desired to to show that for all T ⊆ Ω, (µbS)(T ) = (µbS)(T ∩ A) + (µbS)(T \ A). Thus it is desired to show µ(S ∩ T ) = µ(T ∩ A ∩ S) + µ(T ∩ S ∩ AC ).
(5.5)
But 5.5 holds because A is µ measurable. Apply Definition 5.6.1 to S ∩ T instead of S. If A is µbS measurable, it does not follow that A is µ measurable. Indeed, if you believe in the existence of non measurable sets, you could let A = S for such a µ non measurable set and verify that S is µbS measurable. The next theorem is the main result on outer measures which shows that starting with an outer measure you can obtain a measure. 11
Kings 17, 2 Kings 4, Mathew 14, and Mathew 15 all contain such descriptions. The stuff involved was either oil, bread, flour or fish. In mathematics such things have also been done with sets. In the book by Bruckner Bruckner and Thompson there is an interesting discussion of the Banach Tarski paradox which says it is possible to divide a ball in R3 into five disjoint pieces and assemble the pieces to form two disjoint balls of the same size as the first. The details can be found in: The Banach Tarski Paradox by Wagon, Cambridge University press. 1985. It is known that all such examples must involve the axiom of choice.
5.6. MEASURES FROM OUTER MEASURES
147
Theorem 5.6.4
Let Ω be a set and let µ be an outer measure on P (Ω). The collection of µ measurable sets S, forms a σ algebra and If Fi ∈ S, Fi ∩ Fj = ∅, then µ(∪∞ i=1 Fi ) =
∞ X
µ(Fi ).
(5.6)
i=1
If · · · Fn ⊆ Fn+1 ⊆ · · · , then if F = ∪∞ n=1 Fn and Fn ∈ S, it follows that µ(F ) = lim µ(Fn ).
(5.7)
n→∞
If · · · Fn ⊇ Fn+1 ⊇ · · · , and if F = ∩∞ n=1 Fn for Fn ∈ S then if µ(F1 ) < ∞, µ(F ) = lim µ(Fn ).
(5.8)
n→∞
This measure space is also complete which means that if µ (F ) = 0 for some F ∈ S then if G ⊆ F, it follows G ∈ S also. Proof: First note that ∅ and Ω are obviously in S. Now suppose A, B ∈ S. I will show A \ B ≡ A ∩ B C is in S. To do so, consider the following picture.
S S
T
T C AC B
S S
T
A
T
T
T AC B
BC S
T
A
T
B
B
A
It is required to show that µ (S) = µ (S \ (A \ B)) + µ (S ∩ (A \ B)) First consider S \ (A \ B) . From the picture, it equals S ∩ AC ∩ B C ∪ (S ∩ A ∩ B) ∪ S ∩ AC ∩ B Therefore, µ (S) ≤ µ (S \ (A \ B)) + µ (S ∩ (A \ B)) ≤ µ S ∩ AC ∩ B C + µ (S ∩ A ∩ B) + µ S ∩ AC ∩ B + µ (S ∩ (A \ B)) = µ S ∩ AC ∩ B C + µ (S ∩ A ∩ B) + µ S ∩ AC ∩ B + µ S ∩ A ∩ B C = µ S ∩ AC ∩ B C + µ S ∩ A ∩ B C + µ (S ∩ A ∩ B) + µ S ∩ AC ∩ B = µ S ∩ B C + µ (S ∩ B) = µ (S) and so this shows that A \ B ∈ S whenever A, B ∈ S. Since Ω ∈ S, this shows that A ∈ S if and only if AC ∈ S. Now if A, B ∈ S, A ∪ B = C (A ∩ B C )C = (AC \ B)C ∈ S. By induction, if A1 , · · · , An ∈ S, then so is ∪ni=1 Ai . If A, B ∈ S, with A ∩ B = ∅, µ(A ∪ B) = µ((A ∪ B) ∩ A) + µ((A ∪ B) \ A) = µ(A) + µ(B).
148
CHAPTER 5. MEASURES AND MEASURABLE FUNCTIONS
By induction, if Ai ∩ Aj = ∅ and Ai ∈ S, µ(∪ni=1 Ai ) =
n X
µ(Ai ).
(5.9)
i=1
Now let A = ∪∞ i=1 Ai where Ai ∩ Aj = ∅ for i 6= j. ∞ X
µ(Ai ) ≥ µ(A) ≥ µ(∪ni=1 Ai ) =
i=1
n X
µ(Ai ).
i=1
Since this holds for all n, you can take the limit as n → ∞ and conclude, ∞ X
µ(Ai ) = µ(A)
i=1
which establishes 5.6. Consider part 5.7. Without loss of generality µ (Fk ) < ∞ for all k since otherwise there is nothing to show. Suppose {Fk } is an increasing sequence of sets of S. Then letting ∞ F0 ≡ ∅, {Fk+1 \ Fk }k=0 is a sequence of disjoint sets of S since it was shown above that the difference of two sets of S is in S. Also note that from 5.9 µ (Fk+1 \ Fk ) + µ (Fk ) = µ (Fk+1 ) and so if µ (Fk ) < ∞, then µ (Fk+1 \ Fk ) = µ (Fk+1 ) − µ (Fk ) . Therefore, letting F ≡ ∪∞ k=1 Fk which also equals ∪∞ k=1 (Fk+1 \ Fk ) , it follows from part 5.6 just shown that µ (F )
=
∞ X
µ (Fk+1 \ Fk ) = lim
n→∞
k=0
=
lim
n→∞
n X
n X
µ (Fk+1 \ Fk )
k=0
µ (Fk+1 ) − µ (Fk ) = lim µ (Fn+1 ) . n→∞
k=0
In order to establish 5.8, let the Fn be as given there. Then, since (F1 \ Fn ) increases to (F1 \ F ), 5.7 implies lim (µ (F1 ) − µ (Fn )) = µ (F1 \ F ) . n→∞
The problem is, I don’t know F ∈ S and so it is not clear that µ (F1 \ F ) = µ (F1 ) − µ (F ). However, µ (F1 \ F ) + µ (F ) ≥ µ (F1 ) and so µ (F1 \ F ) ≥ µ (F1 ) − µ (F ). Hence lim (µ (F1 ) − µ (Fn )) = µ (F1 \ F ) ≥ µ (F1 ) − µ (F )
n→∞
which implies lim µ (Fn ) ≤ µ (F ) .
n→∞
But since F ⊆ Fn , µ (F ) ≤ lim µ (Fn ) n→∞
and this establishes 5.8. Note that it was assumed µ (F1 ) < ∞ because µ (F1 ) was subtracted from both sides.
5.6. MEASURES FROM OUTER MEASURES
149
It remains to show S is closed under countable unions. Recall that if A ∈ S, then AC ∈ S n and S is closed under finite unions. Let Ai ∈ S, A = ∪∞ i=1 Ai , Bn = ∪i=1 Ai . Then µ(S)
= =
µ(S ∩ Bn ) + µ(S \ Bn ) (µbS)(Bn ) +
(5.10)
(µbS)(BnC ).
By Lemma 5.6.3 Bn is (µbS) measurable and so is BnC . I want to show µ(S) ≥ µ(S \ A) + µ(S ∩ A). If µ(S) = ∞, there is nothing to prove. Assume µ(S) < ∞. Then apply Parts 5.8 and 5.7 to the outer measure µbS in 5.10 and let n → ∞. Thus Bn ↑ A, BnC ↓ AC and this yields µ(S) = (µbS)(A) + (µbS)(AC ) = µ(S ∩ A) + µ(S \ A). Therefore A ∈ S and this proves Parts 5.6, 5.7, and 5.8. It only remains to verify the assertion about completeness. Letting G and F be as described above, let S ⊆ Ω. I need to verify µ (S) ≥ µ (S ∩ G) + µ (S \ G) However, µ (S ∩ G) + µ (S \ G) ≤ µ (S ∩ F ) + µ (S \ F ) + µ (F \ G) = µ (S ∩ F ) + µ (S \ F ) = µ (S) because by assumption, µ (F \ G) ≤ µ (F ) = 0. The measure m which results from the outer measure of Theorem 5.5.1 is called Lebesgue measure. The following is a general result about completion of a measure space.
Proposition 5.6.5 Let (Ω, F, µ) be a measure space. Also let µˆ be the outer measure defined by µ ˆ (F ) ≡ inf {µ (E) : E ⊇ F and E ∈ F} ˆ the set of µ Then µ ˆ is an outer measure which is a measure on F, ˆ measurable sets. Also ˆ If (Ω, F, µ) is already complete, then no new sets are µ ˆ (E) = µ (E) for E ∈ F and F ⊆ F. ˆ obtained from this process and F = F. Proof: The first part of this follows from Proposition 5.4.2. It only remains to verify ˆ Let S be a set and let E ∈ F, ES ⊇ S, ES ∈ F. Then that F ⊆ F. µ (ES ) = µ (ES \ E) + µ (ES ∩ E) due to the fact that µ is a measure. As usual, if µ ˆ (S) = ∞, it is obvious that µ ˆ (S) ≥ µ ˆ (S \ E) + µ ˆ (S ∩ E) . Therefore, assume this is not ∞. Then let µ ˆ (S) > µ (ES ) − ε. Then from the above, ε+µ ˆ (S) ≥ µ (ES \ E) + µ (ES ∩ E) ≥ µ (S \ E) + µ (S ∩ E) ˆ Thus F ⊆ F. ˆ Since ε is arbitrary, this shows that E ∈ F. Why are these two σ algebras equal if (Ω, F, µ) is complete? Suppose now that (Ω, F, µ) ˆ Then there exists E ⊇ F such that µ (E) = µ is complete. Let F ∈ F. ˆ (F ) . This is obvious if µ ˆ (F ) = ∞. Otherwise, let En ⊇ F, µ ˆ (F ) + n1 > µ (En ) . Just let E = ∩n En . Now µ ˆ (E \ F ) = 0. Now also, there exists a set of F called W such that µ (W ) = 0 and W ⊇ E \ F. Thus E \ F ⊆ W, a set of measure zero. Hence by completeness of (Ω, F, µ) , it must be the case that E \ F = E ∩ F C = G ∈ F. Then taking complements of both sides, E C ∪ F = GC ∈ F. Now take intersections with E. F ∈ E ∩ GC ∈ F.
150
5.7
CHAPTER 5. MEASURES AND MEASURABLE FUNCTIONS
When is a Measure a Borel Measure?
You have an outer measure defined on the set of all subsets of Rp . How can you tell that the σ algebra of measurable sets includes the Borel sets? This is what is discussed here. This is a very important idea because, from the above, you can then assert regularity of the measure if, for example it is finite on any ball.
Definition 5.7.1
For two sets, A, B we define
dist (A, B) ≡ inf {d (x, y) : x ∈ A, y ∈ B} .
Theorem 5.7.2
Let µ be an outer measure on the subsets of (X, d), a metric space.
If µ(A ∪ B) = µ(A) + µ(B) whenever dist(A, B) > 0, then the σ algebra of measurable sets S contains the Borel sets. Proof: It suffices to show that closed sets are in S, the σ-algebra of measurable sets, because then the open sets are also in S and consequently S contains the Borel sets. Let K be closed and let S be a subset of Ω. Is µ(S) ≥ µ(S ∩ K) + µ(S \ K)? It suffices to assume µ(S) < ∞. Let 1 Kn ≡ {x : dist(x, K) ≤ } n By Lemma 2.5.8 on Page 42, x → dist (x, K) is continuous and so Kn is closed. By the assumption of the theorem, µ(S) ≥ µ((S ∩ K) ∪ (S \ Kn )) = µ(S ∩ K) + µ(S \ Kn )
(5.11)
since S ∩ K and S \ Kn are a positive distance apart. Now µ(S \ Kn ) ≤ µ(S \ K) ≤ µ(S \ Kn ) + µ((Kn \ K) ∩ S).
(5.12)
If limn→∞ µ((Kn \ K) ∩ S) = 0 then the theorem will be proved because this limit along with 5.12 implies limn→∞ µ (S \ Kn ) = µ (S \ K) and then taking a limit in 5.11, µ(S) ≥ µ(S ∩ K) + µ(S \ K) as desired. Therefore, it suffices to establish this limit. Since K is closed, a point, x ∈ / K must be at a positive distance from K and so Kn \ K = ∪∞ k=n Kk \ Kk+1 . Therefore µ(S ∩ (Kn \ K)) ≤
∞ X
µ(S ∩ (Kk \ Kk+1 )).
(5.13)
k=n
If
∞ X
µ(S ∩ (Kk \ Kk+1 )) < ∞,
(5.14)
k=1
then µ(S ∩ (Kn \ K)) → 0 because it is dominated by the tail of a convergent series so it suffices to show 5.14. M X µ(S ∩ (Kk \ Kk+1 )) = k=1
X k even, k≤M
µ(S ∩ (Kk \ Kk+1 )) +
X
µ(S ∩ (Kk \ Kk+1 )).
(5.15)
k odd, k≤M
By the construction, the distance between any pair of sets, S ∩ (Kk \ Kk+1 ) for different even values of k is positive and the distance between any pair of sets, S ∩ (Kk \ Kk+1 ) for different odd values of k is positive. Therefore, X X µ(S ∩ (Kk \ Kk+1 )) + µ(S ∩ (Kk \ Kk+1 )) ≤ k even, k≤M
k odd, k≤M
5.8. ONE DIMENSIONAL LEBESGUE MEASURE µ
[
151
[
(S ∩ (Kk \ Kk+1 )) + µ
k even, k≤M
(S ∩ (Kk \ Kk+1 ))
k odd, k≤M
≤ µ (S) + µ (S) = 2µ (S) and so for all M,
5.8
PM
k=1
µ(S ∩ (Kk \ Kk+1 )) ≤ 2µ (S) showing 5.14.
One Dimensional Lebesgue Measure
Now with these major results about measures, it is time to specialize to the outer measure of Theorem 5.5.1. The next theorem describes some fundamental properties of Lebesgue measure on R. The conditions 5.16 and 5.17 given below are known respectively as inner and outer regularity.
Theorem 5.8.1
Let F denote the σ algebra of Theorem 5.6.4, associated with the outer measure µ in Theorem 5.5.1, on which µ is a measure. Then every open interval is in F. So are all open and closed sets. Furthermore, if E is any set in F µ (E) = sup {µ (K) : K compact, K ⊆ E}
(5.16)
µ (E) = inf {µ (V ) : V is an open set V ⊇ E}
(5.17)
Proof: The first task is to show (a, b) ∈ F. I need to show that for every S ⊆ R, C µ (S) ≥ µ (S ∩ (a, b)) + µ S ∩ (a, b) (5.18) Suppose first S is an open interval, (c, d) . If (c, d) has empty intersection with (a, b) or is contained in (a, b) there is nothing to prove. The above expression reduces to nothing more than µ (S) = µ (S). Suppose next that (c, d) ⊇ (a, b) . In this case, the right side of the above reduces to µ ((a, b)) + µ ((c, a] ∪ [b, d)) ≤
b−a+a−c+d−b
=
d − c = µ ((c, d))
The only other cases are c ≤ a < d ≤ b or a ≤ c < d ≤ b. Consider the first of these cases. Then the right side of 5.18 for S = (c, d) is µ ((a, d)) + µ ((c, a]) = d − a + a − c = µ ((c, d)) The last case is entirely similar. Thus 5.18 holds whenever S is an open interval. Now it is clear 5.18 also holds if µ (S) = ∞. Suppose then that µ (S) < ∞ and let S ⊆ ∪∞ k=1 (ak , bk ) such that µ (S) + ε >
∞ X
(bk − ak ) =
k=1
∞ X
µ ((ak , bk )) .
k=1
Then since µ is an outer measure, and using what was just shown, C µ (S ∩ (a, b)) + µ S ∩ (a, b) C ∞ ≤ µ (∪∞ (a , b ) ∩ (a, b)) + µ ∪ (a , b ) ∩ (a, b) k k k k k=1 k=1 ≤ ≤
∞ X k=1 ∞ X k=1
C µ ((ak , bk ) ∩ (a, b)) + µ (ak , bk ) ∩ (a, b) µ ((ak , bk )) ≤ µ (S) + ε.
152
CHAPTER 5. MEASURES AND MEASURABLE FUNCTIONS
Since ε is arbitrary, this shows 5.18 holds for any S and so any open interval is in F. It follows any open set is in F. This follows from Theorem 2.11.8 which implies that if U is open, it is the countable union of disjoint open intervals. Since each of these open intervals is in F and F is a σ algebra, their union is also in F. It follows every closed set is in F also. This is because F is a σ algebra and if a set is in F then so is its complement. The closed sets are those which are complements of open sets. The assertion of outer regularity is not hard to get. Letting E be any set µ (E) < ∞, ∞ there exist open intervals covering E denoted by {(ai , bi )}i=1 such that µ (E) + ε >
∞ X i=1
bi − ai =
∞ X
µ (ai , bi ) ≥ µ (V )
i=1
where V is the union of the open intervals just mentioned. Thus µ (E) ≤ µ (V ) ≤ µ (E) + ε. This shows outer regularity. If µ (E) = ∞, there is nothing to show. Now consider the assertion of inner regularity 5.16. Suppose I is a closed and bounded interval and E ⊆ I with E ∈ F. By outer regularity, there exists open V containing I ∩ E C such that µ I ∩ E C + ε > µ (V ) Then since µ is additive on F, it follows that µ V \ I ∩ E C < ε. Then K ≡ V C ∩ I is a compact subset of E. This is because V ⊇ I ∩ E C so V C ⊆ I C ∪ E and so V C ∩ I ⊆ I C ∪ E ∩ I = E ∩ I = E. Also, E \ V C ∩ I = E ∩ V = V \ EC ⊆ V \ I ∩ EC , a set of measure less than ε. Therefore, µ V C ∩ I + ε ≥ µ V C ∩ I + µ E \ V C ∩ I = µ (E) , so the desired conclusion holds in the case where E is contained in a compact interval. Now suppose E is arbitrary and let l < µ (E) . Then choosing ε small enough, l+ε < µ (E) also. Letting En ≡ E∩[−n, n] , it follows from Lemma 5.2.4 that for n large enough, µ (En ) > l + ε. Now from what was just shown, there exists K ⊆ En such that µ (K) + ε > µ (En ). Hence µ (K) > l. This shows 5.16.
Proposition 5.8.2 For m Lebesgue measure, m ([a, b]) = m ((a, b)) = b − a. Also m is translation invariant in the sense that if E is any Lebesgue measurable set, then m (x + E) = m (E). Proof: The formula for the measure of an interval comes right away from Theorem 5.5.1. From this, it follows right away that whenever E is an interval, m (x + E) = m (E). Every open set is the countable disjoint union of open intervals, so if E is an open set, then m (x + E) = m (E). What about closed sets? First suppose H is a closed and bounded set. Then letting (−n, n) ⊇ H, µ (((−n, n) \ H) + x) + µ (H + x) = µ ((−n, n) + x) Hence, from what was just shown about open sets, µ (H)
= µ ((−n, n)) − µ ((−n, n) \ H) = µ ((−n, n) + x) − µ (((−n, n) \ H) + x) = µ (H + x)
5.9. EXERCISES
153
Therefore, the translation invariance holds for closed and bounded sets. If H is an arbitrary closed set, then µ (H + x) = lim µ (H ∩ [−n, n] + x) = lim µ (H ∩ [−n, n]) = µ (H) . n→∞
n→∞
It follows right away that if G is the countable intersection of open sets, (Gδ set, pronounced g delta set ) then m (G ∩ (−n, n) + x) = m (G ∩ (−n, n)) Now taking n → ∞, m (G + x) = m (G) .Similarly, if F is the countable union of compact sets, (Fσ set, pronounced F sigma set) then m (F + x) = m (F ) . Now using Theorem 5.8.1, if E is an arbitrary measurable set, there exist an Fσ set F and a Gδ set G such that F ⊆ E ⊆ G and m (F ) = m (G) = m (E). Then m (F ) = m (x + F ) ≤ m (x + E) ≤ m (x + G) = m (G) = m (E) = m (F ) .
Definition 5.8.3
There is an important idea which is often seen in the context of measures. Something happens a.e. (almost everywhere) means that it happens off a set of measure zero.
5.9
Exercises
1. Show carefully that if S is a set whose elements are σ algebras which are subsets of P (Ω) , then ∩S is also a σ algebra. Now let G ⊆ P (Ω) satisfy property P if G is closed with respect to complements and countable disjoint unions as in Dynkin’s lemma, and contains ∅ and Ω. If H ⊆ G is any set whose elements are subsets of P (Ω) which satisfies property P, then ∩H also satisfies property P. Thus there is a smallest subset of G satisfying P . 2. Show Q B (Rp ) = σ (P) where P consists of the half open rectangles which are of the p form i=1 [ai , bi ). 3. Recall that f : (Ω, F) → R is measurable means f −1 (open) ∈ F. Show that if E is any set in B (R) , then f −1 (E) ∈ F. Thus, inverse images of Borel sets are measurable. Next consider f : (Ω, F) → R being measurable and g : R → R is Borel measurable, meaning that g −1 (open) ∈ B (R). Explain why g ◦ f is measurable. Hint: You −1 know that (g ◦ f ) (U ) = f −1 g −1 (U ) . For your information, it does not work the other way around. That is, measurable composed with Borel measurable is not necessarily measurable. In fact examples exist which show that if g is measurable and f is continuous, then g ◦ f may fail to be measurable. Qn 4. Let Xi ≡ Rni and let X = i=1 Xi and let the distance between two points in X be given by kx − yk ≡ max {kxi − yi k , i = 1, 2, · · · , n} Show that any set of the form n Y
Ei , Ei ∈ B (Xi )
i=1
is a Borel set. That is, the product Qn of Borel sets is Borel. Hint: You might consider the continuous functions π i : j=1 Xj → Xi which are the projection maps. Thus π i (x) ≡ xi . Then π −1 i (Ei ) would have to be Borel measurable whenever Ei ∈ B (Xi ). Explain why. You know π i Q is continuous. Why would π −1 i (Borel) be a Borel set? n Then you might argue that i=1 Ei = ∩ni=1 π −1 (E ) . i i
154
CHAPTER 5. MEASURES AND MEASURABLE FUNCTIONS
5. You have two finite measures defined on B (X) µ, ν. Suppose these are equal on every open set. Show that these must be equal on every Borel set. Hint: You should use Dynkin’s lemma to show this very easily. 6. Show that (N, P (N) , µ) is a measure space where µ (S) equals the numberPof elements ∞ of S. You need to verify that if the sets Ei are disjoint, then µ (∪∞ i=1 Ei ) = i=1 µ (Ei ) . 7. Let Ω be an uncountable set and let F denote those subsets of Ω, F such that either F or F C is countable. Show that this is a σ algebra. Next define the following measure. µ (A) = 1 if A is uncountable and µ (A) = 0 if A is countable. Show that µ is a measure. 8. Let µ (E) = 1 if 0 ∈ E and µ (E) = 0 if 0 ∈ / E. Show this is a measure on P (R). 9. Give an example of a measure µ and a measure space and a decreasing sequence of measurable sets {Ei } such that limn→∞ µ (En ) 6= µ (∩∞ i=1 Ei ). 10. If you have a finite measure µ on B (Rp ), and if F ∈ B (Rp ) , show that there exist sets E, G such that G is a countable intersection of open sets and E is a countable union of closed sets such that E ⊆ F ⊆ G and µ (G \ E) = 0. 11. You have a measure space (Ω, F, P ) where P is a probability measure on F. Then you also have a measurable function X : Ω → Rn . Thus X −1 (U ) ∈ F whenever U is open. Now define a measure on B (Rn ) denoted by λX and defined by λX (E) = P ({ω : X (ω) ∈ E}) . Explain why this yields a well defined probability measure on B (Rn ). This is called the distribution measure. 12. Let K ⊆ V where K is closed and V is open. Consider the following function. dist x, V C f (x) = dist (x, K) + dist (x, V C ) Explain why this function is continuous, equals 0 off V and equals 1 on K. 13. Let (Ω, F) be a measurable space and let f : Ω → Rn be a measurable function. Then σ (f ) denotes the smallest σ algebra such that f is measurable with respect to this σ algebra. Show that σ (f ) = f −1 (E) : E ∈ B (Rn ) . More generally, you have a whole set of measurable functions S and σ (S) denotes the smallest σ algebra such that each function in S is measurable. If you have an increasing list St for t ∈ [0, ∞), then σ (St ) will be what is called a filtration. You have a σ algebra for each t ∈ [0, ∞) and as t increases, these σ algebras get larger. This is an essential part of the construction which is used to show that Wiener process is a martingale. In fact the whole subject of martingales has to do with filtrations. 14. There is a monumentally important theorem called the Borel Cantelli lemma. It says the following. If you have a measure space (Ω, F, µ) and if {Ei } ⊆ F is such that P ∞ i=1 µ (Ei ) < ∞, then there exists a set N of measure 0 (µ (N ) = 0) such that if ω∈ / N, then ω is in only finitely many of the Ei . Hint: You might look at the set of all ω which are in infinitely many of the Ei . First explain why this set is of the form ∩∞ n=1 ∪k≥n Ek . 15. Let (Ω, F, µ) be a measure space. A sequence of functions {fn } is said to converge in measure to a measurable function f if and only if for each ε > 0, lim µ ({ω : |fn (ω) − f (ω)| > ε}) = 0
n→∞
Show that if this happens, then there exists a subsequence {fnk } and a set of measure N such that if ω ∈ / N, then lim fnk (ω) = f (ω) . k→∞
5.9. EXERCISES
155
Also show that if µ is finite and limn→∞ fn (ω) = f (ω) , then fn converges in measure to f . 16. Let N be the positive integers and let F denote the set of all subsets of N. Explain why N is a σ algebra. You could let µ (S) be the number of elements of S. This is called counting measure. Explain why µ is a measure. 17. Show f : Ω → R is measurable if and only if f −1 (U ) is measurable whenever U is an open set. Hint: This is pretty easy if you recall that every open set is the disjoint union of countably many connected components. 18. The smallest σ algebra on R which contains the open intervals, denoted by B is called the Borel sets. Show that B contains all open sets and is also the smallest σ algebra which contains all open sets. Show that all continuous functions g : R → R are B measurable. A word of advice pertaining to Borel sets: Don’t try to describe a typical Borel set. Instead, use the definition that it is a set in the smallest σ algebra containing the open sets. 19. Show that f : Ω → R is measurable if and only if f −1 (B) is measurable for every Borel B. Recall B is the smallest σ algebra which contains the open sets. Hint: Let G be those sets B such that f −1 (B) is measurable. Argue it is a σ algebra. 20. Now suppose f : Ω → R where (Ω, F) is a measureable space. Suppose g : R → R is B measurable. Explain why g ◦ f is F measurable. 21. The open sets in Rn are defined to be all sets U which are unions of open rectangles of the form n Y (ai , bi ) R= i=1 n
Show that all open sets in R are a countable union of such open rectangles. If a pi system K consists of products of open intervals like the above, show that σ (K) is B the Borel sets. Hint: There are countably many open rectangles of the form n Y
(p, q) , q, p ∈ Q
i=1
Show that an arbitrary open rectangle is the union of open rectangles of this sort having rational end points. Qn 22. ↑Show that a set of the form i=1 Bi is a Borel set in Rn if each Bi is a Borel set in R. The Borel sets in Rn are the smallest σ algebra which contains the open sets. Hint: You might let fi : Rn → R be the projection map. Explain why fi−1 (B) is a Borel set when B is a Borel set in R. You know fi is continuous and that it follows that it is Borel measurable. Now consider intersections of sets like this. 23. Lebesgue measure was discussed. Recall that m ((a, b)) = b − a and it is defined on a σ algebra which contains the Borel sets. It comes from an outer measure defined on P (R). Also recall that m is translation invariant. Let x ∼ y if and only if x − y ∈ Q. Show this is an equivalence relation. Now let W be a set of positive measure which is contained in (0, 1). For x ∈ W, let [x] denote those y ∈ W such that x ∼ y. Thus the equivalence classes partition W . Use axiom of choice to obtain a set S ⊆ W such that S consists of exactly one element from each equivalence class. Let T denote the rational numbers in [−1, 1]. Consider T + S ⊆ [−1, 2]. Explain why T + S ⊇ W . For T ≡ {rj } , explain why the sets {rj + S}j are disjoint. Now suppose S is measurable. Then show that you have a contradiction if m (S) = 0 since m (W ) > 0 and you also
156
CHAPTER 5. MEASURES AND MEASURABLE FUNCTIONS have a contradiction if m (S) > 0 because T + S consists of countably many disjoint sets. Explain why S cannot be measurable. Thus there exists T ⊆ R such that m (T ) < m (T ∩ S) + m T ∩ S C . Is there an open interval (a, b) such that if T = (a, b) , then the above inequality holds?
24. Consider the following nested sequence of compact sets, {Pn }.Let P1 = [0, 1], P2 = 0, 31 ∪ 32 , 1 , etc. To go from Pn to Pn+1 , delete the open interval which is the middle third of each closed interval in Pn . Let P = ∩∞ n=1 Pn . By the finite intersection property of compact sets, P 6= ∅. Show m(P ) = 0. If you feel ambitious also show there is a one to one onto mapping of [0, 1] to P . The set P is called the Cantor set. Thus, although P has measure zero, it has the same number of points in it as [0, 1] in the sense that there is a one to one and onto mapping from one to the other. Hint: There are various ways of doing this last part but the most enlightenment is obtained by exploiting the topological properties of the Cantor set rather than some silly representation in terms of sums of powers of two and three. All you need to do is use the Schroder Bernstein theorem and show there is an onto map from the Cantor set to [0, 1]. 25. Consider the sequence of functions defined in the following way. Let f1 (x) = x on [0, 1]. To get from fn to fn+1 , let fn+1 = fn on all intervals where fn is constant. If fn is nonconstant on [a, b], let fn+1 (a) = fn (a), fn+1 (b) = fn (b), fn+1 is piecewise linear and equal to 21 (fn (a) + fn (b)) on the middle third of [a, b]. Sketch a few of these and you will see the pattern. The process of modifying a nonconstant section of the graph of this function is illustrated in the following picture.
Show {fn } converges uniformly on [0, 1]. If f (x) = limn→∞ fn (x), show that f (0) = 0, f (1) = 1, f is continuous, and f 0 (x) = 0 for all x ∈ / P where P is the Cantor set of Problem 24. This function is called the Cantor function.It is a very important example to remember. Note it has derivative equal to zero a.e. and yet it succeeds in climbing from 0 to 1. Explain why this interesting function is not absolutely continuous although it is continuous. Hint: This isn’t too hard if you focus on getting a careful estimate on the difference between two successive functions in the list considering only a typical small interval in which the change takes place. The above picture should be helpful. 26. ↑ This problem gives a very interesting example found in the book by McShane [53]. Let g(x) = x + f (x) where f is the strange function of Problem 25. Let P be the Cantor set of Problem 24. Let [0, 1] \ P = ∪∞ j=1 Ij where Ij is open and Ij ∩ Ik = ∅ if j 6= k. These intervals are the connected components of the complement of the Cantor set. Show m(g(Ij )) = m(Ij ) so m(g(∪∞ j=1 Ij )) =
∞ X j=1
m(g(Ij )) =
∞ X
m(Ij ) = 1.
j=1
Thus m(g(P )) = 1 because g([0, 1]) = [0, 2]. By Problem 23 there exists a set, A ⊆ g (P ) which is non measurable. Define φ(x) = XA (g(x)). Thus φ(x) = 0 unless x ∈ P . Tell why φ is measurable. (Recall m(P ) = 0 and Lebesgue measure is complete.) Now show that XA (y) = φ(g −1 (y)) for y ∈ [0, 2]. Tell why g −1 is continuous but φ ◦ g −1 is not measurable. (This is an example of measurable ◦ continuous 6= measurable.) Show there exist Lebesgue measurable sets which are not Borel measurable. Hint: The function, φ is Lebesgue measurable. Now recall that Borel ◦ measurable = measurable.
Chapter 6
The Abstract Lebesgue Integral The general Lebesgue integral requires a measure space, (Ω, F, µ) and, to begin with, a nonnegative measurable function. I will use Lemma 1.8.2 about interchanging two supremums frequently. Also, I will use the observation that if {an } is an increasing sequence of points of [0, ∞] , then supn an = limn→∞ an which is obvious from the definition of sup.
6.1 6.1.1
Definition For Nonnegative Measurable Functions Riemann Integrals For Decreasing Functions
First of all, the notation [g < f ] is short for {ω ∈ Ω : g (ω) < f (ω)} with other variants of this notation being similar. Also, the convention, 0 · ∞ = 0 will be used to simplify the presentation whenever it is convenient to do so. The notation a ∧ b means the minimum of a and b.
Definition 6.1.1
Let f : [a, b] → [0, ∞] be decreasing. Define
b
Z
Z f (λ) dλ ≡ lim
M →∞
a
b
Z M ∧ f (λ) dλ = sup M
a
b
M ∧ f (λ) dλ a
where a ∧ b means the minimum of a and b. Note that for f bounded, b
Z M
Z M ∧ f (λ) dλ =
sup a
b
f (λ) dλ a
where the integral on the right is the usual Riemann integral because eventually M > f . For f a nonnegative decreasing function defined on [0, ∞), Z
∞
Z f dλ ≡ lim
0
R→∞
R
Z f dλ = sup
0
R>1
R
Z
0
R M >0
R
f ∧ M dλ
f dλ = sup sup 0
Since decreasing bounded functions are Riemann integrable, the above definition is well defined. This claim should have been seen in calculus, but if not, see Problem 10 on Page 134. Now here is an obvious property. 157
158
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
Lemma 6.1.2 Let f be a decreasing nonnegative function defined on an interval [a, b] . Then if [a, b] = ∪m k=1 Ik where Ik ≡ [ak , bk ] and the intervals Ik are non overlapping, it follows Z b m Z bk X f dλ = f dλ. a
k=1
ak
Proof: This follows from the computation, Z b Z b f dλ ≡ lim f ∧ M dλ M →∞
a
=
lim
M →∞
a m Z X
bk
f ∧ M dλ =
ak
k=1
m Z X k=1
bk
f dλ
ak
Note both sides could equal +∞.
6.1.2
The Lebesgue Integral For Nonnegative Functions
Here is the definition of the Lebesgue integral of a function which is measurable and has values in [0, ∞].
Definition 6.1.3
Let (Ω, F, µ) be a measure space and suppose f : Ω → [0, ∞] is
measurable. Then define Z
Z f dµ ≡
∞
µ ([f > λ]) dλ 0
which makes sense because λ → µ ([f > λ]) is nonnegative and decreasing. R R Note that if f ≤ g, then f dµ ≤ gdµ because µ ([f > λ]) ≤ µ ([g > λ]) . P0 For convenience i=1 ai ≡ 0.
Lemma 6.1.4 In the situation of the above definition, Z f dµ = sup
∞ X
µ ([f > hi]) h
h>0 i=1
Proof: Let m (h, R) ∈ N satisfy R − h < hm (h, R) ≤ R. Then limR→∞ m (h, R) = ∞ and so Z Z ∞ Z R f dµ ≡ µ ([f > λ]) dλ = sup sup µ ([f > λ]) ∧ M dλ M
0
R
0
m(h,R)
=
X
sup sup sup M R>0 h>0
(µ ([f > kh]) ∧ M ) h + (µ ([f > R]) ∧ M ) (R − hm (h, R))(6.1)
k=1 m(h,R)
=
X
sup sup sup M R>0 h>0
(µ ([f > kh]) ∧ M ) h
k=1
RR because the sum in 6.1 is just a lower sum for the integral 0 µ ([f > λ]) ∧ M dλ, these lower sums are increasing, and the last term is smaller than M h. Hence, switching the order of the sups, this equals m(h,R)
sup sup sup R>0 h>0 M
X
m(h,R)
(µ ([f > kh]) ∧ M ) h = sup sup lim
R>0 h>0 M →∞
k=1 m(R,h)
= sup sup h>0 R
X k=1
(µ ([f > kh])) h = sup h>0
∞ X k=1
X
(µ ([f > kh]) ∧ M ) h
k=1
(µ ([f > kh])) h.
6.2. THE LEBESGUE INTEGRAL FOR NONNEGATIVE SIMPLE FUNCTIONS
6.2
159
The Lebesgue Integral for Nonnegative Simple Functions
To begin with, here is a useful lemma.
Lemma 6.2.1 If f (λ) = 0 for all λ > a, where f is a decreasing nonnegative function, then
Z
∞
Z
a
f (λ) dλ =
f (λ) dλ.
0
0
Proof: From the definition, Z ∞ f (λ) dλ =
R
Z
Z
lim
R→∞
0
f (λ) ∧ M dλ 0 R
Z =
0
R
sup sup R>1 M
f (λ) dλ
R>1
0
Z =
R
f (λ) dλ = sup
f (λ) ∧ M dλ
sup sup M R>1
0 a
Z
sup sup f (λ) ∧ M dλ M R>1 0 Z a Z = sup f (λ) ∧ M dλ ≡
=
M
0
a
f (λ) dλ.
0
Now the Lebesgue integral for a nonnegative function has been defined, what does it do to a nonnegative simple function? Recall a nonnegative simple function is one which has finitely many nonnegative real values which it assumes on measurable sets. Thus a simple function can be written in the form s (ω) =
n X
ci XEi (ω)
i=1
where the ci are each nonnegative, the distinct values of s. P Lemma 6.2.2 Let s (ω) = pi=1 ai XEi (ω) be a nonnegative simple function where the Ei are distinct but the ai might not be. Then Z sdµ =
p X
ai µ (Ei ) .
(6.2)
i=1
Proof: Without loss of generality, assume 0 ≡ a0 < a1 ≤ a2 ≤ · · · ≤ ap and that µ (Ei ) < ∞, i > 0. Here is why. If µ (Ei ) = ∞, then letting a ∈ (ai−1 , ai ) , by Lemma 6.2.1, the left side would be Z ap Z ai µ ([s > λ]) dλ ≥ µ ([s > λ]) dλ 0 a0 Z ai ≡ sup µ ([s > λ]) ∧ M dλ M
0
≥ sup M ai = ∞ M
and so both sides are equal to ∞. Thus it can be assumed for each i, µ (Ei ) < ∞. Then it follows from Lemma 6.2.1 and Lemma 6.1.2, Z
∞
Z µ ([s > λ]) dλ =
0
ap
µ ([s > λ]) dλ = 0
p Z X k=1
ak
ak−1
µ ([s > λ]) dλ
160
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
=
p X
(ak − ak−1 )
k=1
p X
µ (Ei ) =
p X
µ (Ei )
i=1
i=k
i X
(ak − ak−1 ) =
p X
ai µ (Ei )
i=1
k=1
Lemma 6.2.3 If a, b ≥ 0 and if s and t are nonnegative simple functions, then Z
Z as + btdµ = a
Proof: Let s(ω) =
n X
Z sdµ + b
αi XAi (ω), t(ω) =
i=1
m X
tdµ.
β j XBj (ω)
i=1
where αi are the distinct values of s and the β j are the distinct values of t. Clearly as + bt is a nonnegative simple function because it has finitely many values on measurable sets. In fact, m X n X (as + bt)(ω) = (aαi + bβ j )XAi ∩Bj (ω) j=1 i=1
where the sets Ai ∩ Bj are disjoint and measurable. By Lemma 6.2.2, Z as + btdµ =
=
n m X X
(aαi + bβ j )µ(Ai ∩ Bj )
j=1 i=1 n m X X
a
αi µ(Ai ∩ Bj ) + b
i=1 j=1 n X
= a
i=1
m X
β j µ(Bj )
j=1
Z
6.3
β j µ(Ai ∩ Bj )
j=1 i=1
αi µ(Ai ) + b
= a
m X n X
Z sdµ + b
tdµ.
The Monotone Convergence Theorem
The following is called the monotone convergence theorem. This theorem and related convergence theorems are the reason for using the Lebesgue integral. If limn→∞ fn (ω) = f (ω) and fn is increasing, then clearly f is also measurable because −1 f −1 ((a, ∞]) = ∪∞ k=1 fk ((a, ∞]) ∈ F
Theorem 6.3.1 (Monotone Convergence theorem) Let f have values in [0, ∞] and suppose {fn } is a sequence of nonnegative measurable functions having values in [0, ∞] and satisfying lim fn (ω) = f (ω) for each ω. n→∞
· · · fn (ω) ≤ fn+1 (ω) · · · Then f is measurable and Z
Z f dµ = lim
fn dµ.
Z
Z
n→∞
Proof: By Lemma 6.1.4 lim
n→∞
fn dµ = sup n
fn dµ
6.4. OTHER DEFINITIONS
=
sup sup n h>0
=
sup sup h>0 N
∞ X
161
µ ([fn > kh]) h = sup sup sup h>0 N
k=1 N X
µ ([f > kh]) h = sup h>0
k=1
∞ X
n
N X
µ ([fn > kh]) h
k=1
Z µ ([f > kh]) h =
f dµ.
k=1
To illustrate what goes wrong without the Lebesgue integral, consider the following example. Example 6.3.2 Let {rn } denote the rational numbers in [0, 1] and let fn (t) ≡
1 if t ∈ / {r1 , · · · , rn } 0 otherwise
Then fn (t) ↑ f (t) where f is the function which is one on the rationals and zero on the irrationals. Each fn is Riemann integrable (why?) but f is not Riemann integrable because it is everywhere discontinuous. Also, there isR a gap between all upper sums and lower sums. R Therefore, you can’t write f dx = limn→∞ fn dx. A meta-mathematical observation related to this type of example is this. If you can choose your functions, you don’t need the Lebesgue integral. The Riemann Darboux integral is just fine. It is when you can’t choose your functions and they come to you as pointwise limits that you really need the superior Lebesgue integral or at least something more general than the Riemann integral. The Riemann integral is entirely adequate for evaluating the seemingly endless lists of boring problems found in calculus books. It is shown later that the two integrals coincide when the Lebesgue integral is taken with respect to Lebesgue measure and the function being integrated is continuous.
6.4
Other Definitions
To review and summarize the above, if f ≥ 0 is measurable, Z Z ∞ f dµ ≡ µ ([f > λ]) dλ
(6.3)
0
R another way to get the same thing for f dµ is to take an increasing sequence of nonnegative simple functions, {sn } with sn (ω) → f (ω) and then by monotone convergence theorem, Z Z f dµ = lim sn n→∞
where if sn (ω) =
Pm
j=1 ci XEi
(ω) , Z sn dµ =
m X
ci µ (Ei ) .
i=1
Similarly this also shows that for such a nonnegative measurable function, Z Z f dµ = sup s : 0 ≤ s ≤ f, s simple Here is an equivalent definition of the integral of a nonnegative measurable function. The fact it is well defined has been discussed above.
162
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
Definition 6.4.1
For s a nonnegative simple function,
s (ω) =
n X
Z ck XEk (ω) ,
s=
k=1
n X
ck µ (Ek ) .
k=1
For f a nonnegative measurable function, Z Z s : 0 ≤ s ≤ f, s simple . f dµ = sup
6.5
Fatou’s Lemma
The next theorem, known as Fatou’s lemma is another important theorem which justifies the use of the Lebesgue integral.
Theorem 6.5.1 (Fatou’s lemma) Let fn be a nonnegative measurable function. Let g(ω) = lim inf n→∞ fn (ω). Then g is measurable and Z Z fn dµ. gdµ ≤ lim inf n→∞
In other words, Z
Z
lim inf fn dµ ≤ lim inf n→∞
n→∞
fn dµ
Proof: Let gn (ω) = inf{fk (ω) : k ≥ n}. Then −1 gn−1 ([a, ∞]) = ∩∞ k=n fk ([a, ∞]) ∈ F
Thus gn is measurable by Lemma 5.1.2. Now the functions gn form an increasing sequence −1 of nonnegative measurable functions. Thus g −1 ((a, ∞)) = ∪∞ n=1 gn ((a, ∞)) ∈ F so g is measurable also. By monotone convergence theorem, Z Z Z gdµ = lim gn dµ ≤ lim inf fn dµ. n→∞
n→∞
The last inequality holding because Z
Z gn dµ ≤
(Note that it is not known whether limn→∞
6.6
R
fn dµ. fn dµ exists.)
The Integral’s Righteous Algebraic Desires
The monotone convergence theorem shows the integral wants to be linear. This is the essential content of the next theorem.
Theorem 6.6.1 Let f, g be nonnegative measurable functions and let a, b be nonnegative numbers. Then af + bg is measurable and Z Z Z (af + bg) dµ = a f dµ + b gdµ. (6.4) Proof: By Theorem 5.1.9 on Page 140 there exist increasing sequences of nonnegative simple functions, sn → f and tn → g. Then af + bg, being the pointwise limit of the simple
6.7. THE LEBESGUE INTEGRAL, L1
163
functions asn + btn , is measurable. Now by the monotone convergence theorem and Lemma 6.2.3, Z Z asn + btn dµ (af + bg) dµ = lim n→∞ Z Z = lim a sn dµ + b tn dµ n→∞ Z Z = a f dµ + b gdµ. As long as you are allowing functions to take the value +∞, you cannot consider something like f + (−g) and so you can’t very well expect a satisfactory statement about the integral being linear until you restrict yourself to functions which have values in a vector space. To be linear, a function must be defined on a vector space. This is discussed next.
6.7
The Lebesgue Integral, L1
The functions considered here have values in C, which is a vector space. A function f with values in C is of the form f = Re f + i Im f where Re f and Im f are real valued functions. In fact f +f f −f Re f = , Im f = . 2 2i We first define the integral of real valued functions and then the integral of a complex valued function will be of the form Z Z Z f dµ = Re (f ) dµ + i Im (f ) dµ
Definition 6.7.1 Let (Ω, S, µ) be a measure space and suppose f : Ω → C. Then f is said to be measurable if both Re f and Im f are measurable real valued functions. 2
2
2
As is always the case for complex numbers, |z| = (Re z) + (Im z) . Also, for g a real valued function, one can consider its positive and negative parts defined respectively as g + (x) ≡
|g (x)| − g (x) g (x) + |g (x)| − , g (x) = . 2 2
Thus |g| = g + + g − and g = g + − g − and both g + and g − are measurable nonnegative functions if g is measurable. This follows because of Theorem 5.1.5. The mappings x → x+ , x → x− are clearly continuous. Thus g + is the composition of a continuous function with a measurable function. Then the following is the definition of what it means for a complex valued function f to be in L1 (Ω).
Definition 6.7.2
Let (Ω, F, µ) be a measure space. Then a complex valued measurable function f is in L1 (Ω) if Z |f | dµ < ∞.
For a function in L1 (Ω) , the integral is defined as follows. Z Z Z Z Z + − + − f dµ ≡ (Re f ) dµ − (Re f ) dµ + i (Im f ) dµ − (Im f ) dµ I will show that with this definition, the integral is linear and well defined. First note that it is clearly well defined because all the above integrals are of nonnegative functions and are each equal to a nonnegative real number because for h equal to any of the functions, R |h| ≤ |f | and |f | dµ < ∞. Here is a lemma which will make it possible to show the integral is linear.
164
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
Lemma 6.7.3 Let g, h, g 0 , h0 be nonnegative measurable functions in L1 (Ω) and suppose that g − h = g 0 − h0 . Then
Z
Z gdµ −
Z hdµ =
g 0 dµ −
Z
h0 dµ.
Proof: By assumption, g + h0 = g 0 + h. Then from the Lebesgue integral’s righteous algebraic desires, Theorem 6.6.1, Z Z Z Z gdµ + h0 dµ = g 0 dµ + hdµ which implies the claimed result. 1 Lemma 6.7.4 Let Re L1 (Ω) denote the vectorR space of real valued functions in L (Ω) 1 where the field of scalars is the real numbers. Then dµ is linear on Re L (Ω) , the scalars being real numbers. Proof: First observe that from the definition of the positive and negative parts of a function, + − (f + g) − (f + g) = f + + g + − f − + g − because both sides equal f + g. Therefore from Lemma 6.7.3 and the definition, it follows from Theorem 6.6.1 that Z Z Z Z + − + + f + gdµ ≡ (f + g) − (f + g) dµ = f + g dµ − f − + g − dµ Z Z Z Z Z Z + + − − = f dµ + g dµ − f dµ + g dµ = f dµ + gdµ. +
what about taking out scalars? First note that if a is real and nonnegative, then (af ) = − + − af + and (af ) = af − while if a < 0, then (af ) = −af − and (af ) = −af + . These claims follow immediately from the above definitions of positive and negative parts of a function. Thus if a < 0 and f ∈ L1 (Ω) , it follows from Theorem 6.6.1 that Z Z Z Z Z + − af dµ ≡ (af ) dµ − (af ) dµ = (−a) f − dµ − (−a) f + dµ Z Z Z Z Z − + + − = −a f dµ + a f dµ = a f dµ − f dµ ≡ a f dµ. The case where a ≥ 0 works out similarly but easier. Now here is the main result. R Theorem 6.7.5 dµ is linear on L1 (Ω) and L1 (Ω) is a complex vector space. If f ∈ L1 (Ω) , then Re f, Im f, and |f | are all in L1 (Ω) . Furthermore, for f ∈ L1 (Ω) , Z Z Z Z Z + − + − f dµ ≡ (Re f ) dµ − (Re f ) dµ + i (Im f ) dµ − (Im f ) dµ Z Z ≡ Re f dµ + i Im f dµ and the triangle inequality holds, Z Z f dµ ≤ |f | dµ.
(6.5)
Also, for every f ∈ L1 (Ω) it follows that for every ε > 0 there exists a simple function s such that |s| ≤ |f | and Z |f − s| dµ < ε.
6.7. THE LEBESGUE INTEGRAL, L1
165
Proof: First consider the claim that the integral is linear. It was shown above that the integral is linear on Re L1 (Ω) . Then letting a + ib, c + id be scalars and f, g functions in L1 (Ω) , (a + ib) f + (c + id) g = (a + ib) (Re f + i Im f ) + (c + id) (Re g + i Im g) = c Re (g) − b Im (f ) − d Im (g) + a Re (f ) + i (b Re (f ) + c Im (g) + a Im (f ) + d Re (g)) It follows from the definition that Z Z (a + ib) f + (c + id) gdµ = (c Re (g) − b Im (f ) − d Im (g) + a Re (f )) dµ Z +i
(b Re (f ) + c Im (g) + a Im (f ) + d Re (g))
Also, from the definition, Z Z (a + ib) f dµ + (c + id) gdµ =
(6.6)
Z
Z Re f dµ + i Im f dµ Z Z + (c + id) Re gdµ + i Im gdµ
(a + ib)
which equals Z
=
Z Z Z Re f dµ − b Im f dµ + ib Re f dµ + ia Im f dµ Z Z Z Z +c Re gdµ − d Im gdµ + id Re gdµ − d Im gdµ. a
Using Lemma 6.7.4 and collecting terms, it follows that this reduces to 6.6. Thus the integral is linear as claimed. Consider the claim about approximation with a simple function. Letting h equal any of + − + − (Re f ) , (Re f ) , (Im f ) , (Im f ) , (6.7) It follows from the monotone convergence theorem and Theorem 5.1.9 on Page 140 there exists a nonnegative simple function s ≤ h such that Z ε |h − s| dµ < . 4 Therefore, letting s1 , s2 , s3 , s4 be such simple functions, approximating respectively the functions listed in 6.7, and s ≡ s1 − s2 + i (s3 − s4 ) , Z Z Z + − |f − s| dµ ≤ (Re f ) − s1 dµ + (Re f ) − s2 dµ Z Z + − + (Im f ) − s3 dµ + (Im f ) − s4 dµ < ε It is clear from the construction that |s| ≤ |f |. R R What about 6.5? Let θ ∈ C be such that |θ| = 1 and θ f dµ = f dµ . Then from what was shown above about the integral being linear, Z Z Z Z Z f dµ = θ f dµ = θf dµ = Re (θf ) dµ ≤ |f | dµ. It is routine to verify that for f, g measurable, meaning real and imaginary parts are measurable, then any complex linear combination is also measurable. This follows right
166
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
away from Theorem 5.1.5 and looking at the real and imaginary parts of this complex linear combination. Also Z Z |af + bg| dµ ≤ |a| |f | + |b| |g| dµ < ∞. The following corollary follows from this. The conditions of this corollary are sometimes taken as a definition of what it means for a function f to be in L1 (Ω).
Corollary 6.7.6 f ∈ L1 (Ω) if and only if there exists a sequence of complex simple functions, {sn } such that sn (ω) → f R(ω) for all ω ∈ Ω limm,n→∞ (|sn − sm |) = 0
(6.8)
When f ∈ L1 (Ω) , Z
Z f dµ ≡ lim
n→∞
sn .
(6.9)
Proof: From the above theorem, if f ∈ L1 there exists a sequence of simple functions {sn } such that Z |f − sn | dµ < 1/n, sn (ω) → f (ω) for all ω Then
Z
Z
Z
1 1 + . n m Next suppose the existence of the approximating sequence of simple functions. Then f is measurable because its real and imaginary parts are the limit of measurable functions. By Fatou’s lemma, Z Z |f | dµ ≤ lim inf |sn | dµ < ∞ |sn − sm | dµ ≤
|sn − f | dµ +
|f − sm | dµ ≤
n→∞
because
Z Z Z |sn | dµ − |sm | dµ ≤ |sn − sm | dµ R |sn | dµ is a Cauchy sequence and is therefore, which is given to converge to 0. Hence bounded. In case f ∈ L1 (Ω) , letting {sn } be the approximating sequence, Fatou’s lemma implies Z Z Z Z f dµ − sn dµ ≤ |f − sn | dµ ≤ lim inf |sm − sn | dµ < ε m→∞
provided n is large enough. Hence 6.9 follows. This is a good time to observe the following fundamental observation which follows from a repeat of the above arguments.
Theorem 6.7.7
Suppose Λ (f ) ∈ [0, ∞] for all nonnegative measurable functions and suppose that for a, b ≥ 0 and f, g nonnegative measurable functions, Λ (af + bg) = aΛ (f ) + bΛ (g) .
In other words, Λ wants to be linear. Then Λ has a unique linear extension to the set of measurable functions {f measurable : Λ (|f |) < ∞} , this set being a vector space. Notation 6.7.8 If E is a measurable set and f is a measurable nonnegative function or one in L1 , the integral Z XE f dµ is often denoted as Z f dµ E
6.8. THE DOMINATED CONVERGENCE THEOREM
6.8
167
The Dominated Convergence Theorem
One of the major theorems in this theory is the dominated convergence theorem. Before presenting it, here is a technical lemma about lim sup and lim inf which is really pretty obvious from the definition.
Lemma 6.8.1 Let {an } be a sequence in [−∞, ∞] . Then limn→∞ an exists if and only if lim inf an = lim sup an n→∞
n→∞
and in this case, the limit equals the common value of these two numbers. Proof: Suppose first limn→∞ an = a ∈ R. Then, letting ε > 0 be given, an ∈ (a − ε, a + ε) for all n large enough, say n ≥ N. Therefore, both inf {ak : k ≥ n} and sup {ak : k ≥ n} are contained in [a − ε, a + ε] whenever n ≥ N. It follows lim supn→∞ an and lim inf n→∞ an are both in [a − ε, a + ε] , showing lim inf an − lim sup an < 2ε. n→∞
n→∞
Since ε is arbitrary, the two must be equal and they both must equal a. Next suppose limn→∞ an = ∞. Then if l ∈ R, there exists N such that for n ≥ N, l ≤ an and therefore, for such n, l ≤ inf {ak : k ≥ n} ≤ sup {ak : k ≥ n} and this shows, since l is arbitrary that lim inf an = lim sup an = ∞. n→∞
n→∞
The case for −∞ is similar. Conversely, suppose lim inf n→∞ an = lim supn→∞ an = a. Suppose first that a ∈ R. Then, letting ε > 0 be given, there exists N such that if n ≥ N, sup {ak : k ≥ n} − inf {ak : k ≥ n} < ε therefore, if k, m > N, and ak > am , |ak − am | = ak − am ≤ sup {ak : k ≥ n} − inf {ak : k ≥ n} < ε showing that {an } is a Cauchy sequence. Therefore, it converges to a ∈ R, and as in the first part, the lim inf and lim sup both equal a. If lim inf n→∞ an = lim supn→∞ an = ∞, then given l ∈ R, there exists N such that for n ≥ N, inf an > l.
n>N
Therefore, limn→∞ an = ∞. The case for −∞ is similar. Here is the dominated convergence theorem.
Theorem 6.8.2
(Dominated Convergence theorem) Let fn ∈ L1 (Ω) and suppose f (ω) = lim fn (ω), n→∞
168
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
and there exists a measurable function g, with values in [0, ∞],1 such that Z |fn (ω)| ≤ g(ω) and g(ω)dµ < ∞. Then f ∈ L1 (Ω) and Z 0 = lim
n→∞
Z Z |fn − f | dµ = lim f dµ − fn dµ n→∞
Proof: f is measurable by Corollary 5.1.3 applied to real and imaginary parts. Since |f | ≤ g, it follows that f ∈ L1 (Ω) and |f − fn | ≤ 2g. By Fatou’s lemma (Theorem 6.5.1), Z Z 2g − |f − fn |dµ 2gdµ ≤ lim inf n→∞ Z Z = 2gdµ − lim sup |f − fn |dµ. n→∞
Subtracting
R
2gdµ, Z 0 ≤ − lim sup
|f − fn |dµ.
n→∞
Hence Z
0
≥ lim sup |f − fn |dµ n→∞ Z Z Z ≥ lim inf |f − fn |dµ ≥ lim inf f dµ − fn dµ ≥ 0. n→∞ n→∞
This proves the theorem by Lemma 6.8.1 because the lim sup and lim inf are equal.
Corollary 6.8.3 Suppose fn ∈ L1 (Ω) and f (ω) = limn→∞ fn (ω) . Suppose alsoR there R exist measurable functions, gn , Rg with values R in [0, ∞] such that limn→∞ gn dµ = gdµ, gn (ω) → g (ω) µ a.e. and both gn dµ and gdµ are finite. Also suppose |fn (ω)| ≤ gn (ω) . Then Z |f − fn | dµ = 0. lim n→∞
Proof: It is just like the above. This time g + gn − |f − fn | ≥ 0 and so by Fatou’s lemma, Z Z Z Z 2gdµ − lim sup |f − fn | dµ = lim (gn + g) dµ − lim sup |f − fn | dµ n→∞
n→∞
n→∞
Z = = and so − lim supn→∞ 0
Z
|f − fn | dµ Z Z lim inf ((gn + g) − |f − fn |) dµ ≥ 2gdµ lim inf
n→∞
(gn + g) dµ − lim sup
n→∞
n→∞
R
|f − fn | dµ ≥ 0. Thus Z ≥ lim sup |f − fn |dµ n→∞ Z Z Z |f − fn |dµ ≥ f dµ − fn dµ ≥ 0. ≥ lim inf n→∞
1 Note
that, since g is allowed to have the value ∞, it is not known that g ∈ L1 (Ω) .
6.9. PRODUCT MEASURES
Definition 6.8.4
169
Let E be a measurable subset of Ω. Z Z f dµ ≡ f XE dµ. E
If L1 (E) is written, the σ algebra is defined as {E ∩ A : A ∈ F} and the measure is µ restricted to this smaller σ algebra. Clearly, if f ∈ L1 (Ω), then f XE ∈ L1 (E) and if f ∈ L1 (E), then letting f˜ be the 0 extension of f off of E, it follows f˜ ∈ L1 (Ω). What about something ordinary, the integral of a continuous function?
Theorem 6.8.5
Let f be continuous on [a, b]. Then b
Z
Z f (x) dx =
a
f dm [a,b]
where the integral on the left is the usual Riemann integral and the integral on the right is the Lebesgue integral. Proof: From Theorems 5.5.1 and 5.8.1 f X[a,b] is Lebesgue measurable. Assume for the sake of simplicity that f (x) ≥ 0. If not, apply what is about to be shown to f + and f − . Let sn (x) be a step function and let this converge uniformly to f (x) on [a, b] with sn (x) = 0 for x ∈ / [a, b]. For example, let sn (x) ≡
n X
f (xj−1 ) X[xj−1 ,xj ) (x)
j=1
Then Z
b
Z sn (x) dx =
a
sn dm [a,b]
thanks to Theorem 5.5.1 which gives the measure of intervals. Then one can apply the Rb definition of the Riemann integral to obtain the left side converging to a f (x) dx because f Rb is continuous and a sn (x) dx is nothing more than a left sum. Then apply the dominated convergence theorem on the right to obtain the claim of the theorem. Indeed f is bounded and so there exists M ≥ f (x) ≥ 0 for all x ∈ [a, b]. This shows that for reasonable functions, there is nothing new in the Lebesgue integral. The big difference is that now you have limit theorems which may be applied and you can integrate more functions. In fact, every Riemann integrable function on an interval is Lebesgue integrable.
6.9
Product Measures
First of all is a definition.
Definition 6.9.1
Let (X, F, µ) be a measure space. Then it is called σ finite if there exists an increasing sequence of sets Rn ∈ F such that µ (Rn ) < ∞ for all n and also X = ∪∞ n=1 Rn .
170
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
Qp Now I will show how to define a measure on i=1 Xi given that (Xi , Fi , µi ) is a σ finite measure space. Qp Qp Let K denote all subsets of X ≡ i=1 Xi which are the form i=1 Ei where Ei ∈ Fi . ∞ These are called measurable rectangles. Let {Rin }n=1 be the sequence of sets in FQi whose Qp p n+1 n n union is all of Xi , Ri ⊆ Ri , and µi (Ri ) < ∞. Thus if Rn ≡ i=1 Rin , and E ≡ i=1 Ei , then p Y Rn ∩ E = Rin ∩ Ei i=1
Let I ≡ (i1 , · · · , ip ) where (i1 , · · · , ip ) is a permutation of {1, · · · , p}. Also, to save on space, denote the iterated integral Z
Z ···
X11
Xip
XF (x1 , · · · , xp ) dµi1 · · · dµip
as Z XF (x1 , · · · , xp ) dµI I
G≡
Z F ⊆ X such that for all n,
X
F∩Rn
(x1 , · · · , xp ) dµI makes sense and is independent of I
I
The iterated integral means exactly what the symbols indicate. First you integrate XF (x1 , · · · , xp ) with respect to dµi1 and then you have a function of the other variables other than xi1 . You then integrate what is left with respect to xi2 and so forth. This is just like what was with iterated integrals in calculus. In order for this to make sense, every function encountered must be measurableRwith respect to the appropriate Qp σ algebra. Now obviously K ⊆ G. In fact, if F ∈ K, then I XF∩Rn (x1 , · · · , xp ) dµI = i=1 µi (Fi ∩ Rin ) for any choice of n.
Proposition 6.9.2 Let K and G be as just defined, thenRG ⊇ σ (K) . We define σ (K) as
F p , better denoted as F1 ×· · ·×Fp . Then if ~µ (F) ≡ limn→∞ I XF∩Rn (x1 , · · · , xp ) dµI , then ~µ is a measure which does not depend on I the particular permutation chosen for the order of integration. ~µ often denoted as µ1 × · · · × µp is called product measure. f : X → [0, ∞) is measurable with respect to F p then for any permutation (i1 , · · · , ip ) of {1, · · · , p} it follows Z
Z f d~µ =
Z ···
f (x1 , · · · , xp ) dµi1 · · · dµip
(6.10)
Proof: I will show that G is closed with respect and countable disjoint to complements ∞ unions. Then the result will follow. Now suppose Fk k=1 are disjoint, each in G. Then if k F ≡ ∪∞ k=1 F , k n F ∩ Rn = ∪∞ k=1 F ∩ R
and since these sets are disjoint, XF∩Rn =
∞ X
XFk ∩Rn
k=1
Therefore, applying the monotone convergence theorem repeatedly for the iterated integrals and using the fact that measurability is not lost on taking limits, then for (i1 , · · · , ip ) , (j1 , · · · , jp )
6.9. PRODUCT MEASURES
171
two permutations, Z
Z XF∩Rn (x1 , · · · , xp ) dµi1 · · · dµip
··· Z
Z ···
=
lim
N →∞
Z =
=
=
···
lim
N →∞
lim
N →∞
lim
N Z X
XFk ∩Rn (x1 , · · · , xp ) dµj1 · · · dµjp
··· Z X N
Z
Z
XFk ∩Rn (x1 , · · · , xp ) dµi1 · · · dµip
···
XFk ∩Rn (x1 , · · · , xp ) dµj1 · · · dµjp
k=1
···
=
Z
Z
···
lim
Z
XFk ∩Rn (x1 , · · · , xp ) dµi1 · · · dµip
k=1
N →∞
=
Z X N
k=1
Z =
XFk ∩Rn (x1 , · · · , xp ) dµi1 · · · dµip
k=1
k=1
N Z X
N →∞
N X
lim
N →∞
N X
XFk ∩Rn (x1 , · · · , xp ) dµj1 · · · dµjp
k=1
Z ···
XF∩Rn (x1 , · · · , xp ) dµj1 · · · dµjp
Thus G is closed with respect to countable disjoint unions. So suppose F ∈ G. Then RXFC ∩Rn = XRn − XF∩Rn . Everything works for both terms on the right and in addition, X n dµI is finite and independent of I. Therefore, everything works as it should for I R the function on the left using similar arguments to the above. You simply verify that all makes sense for each integral at a time and apply monotone convergence theorem as needed. Therefore, G is indeed closed with respect to complements. It follows that G ⊇ σ (K) by Dynkin’s lemma, Lemma 5.3.2. Now define for F ∈ σ (K) , Z ~µ (F) ≡ lim XF∩Rn (x1 , · · · , xp ) dµI n→∞
I
By definition of G this definition of ~µ does not depend on I. If you have sequence of disjoint sets in G, then if F is their union, Z X ∞ ~µ (F) ≡ lim XFk ∩Rn (x1 , · · · , xp ) dµI n→∞
k ∞ F k=1 is a
I k=1
and one can apply the monotone convergence theorem one integral at a time and obtain that this is ∞ Z X lim XFk ∩Rn (x1 , · · · , xp ) dµI n→∞
k=1
I
Now applying the monotone convergence theorem again, this time for the Lebesgue integral given by a sum with counting measure, the above is Z ∞ ∞ X X lim XFk ∩Rn (x1 , · · · , xp ) dµI ≡ ~µ Fk k=1
n→∞
I
k=1
which shows that ~µ is indeed a measure. Also from the construction, it follows that this measure does not depend on the particular permutation of the iterated integrals used to compute it.
172
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
The claim about the integral 6.10 follows right away from the monotone convergence theorem applied in the right side one iterated integral at a time and approximation with simple functions as in Theorem 5.1.9. The result holds for each of an increasing sequence simple functions from linearity of integrals and the definition of ~µ. Then you apply the monotone convergence theorem to obtain the claim of the theorem.
6.10
Some Important General Theorems
6.10.1
Eggoroff ’s Theorem
Eggoroff’s theorem says that if a sequence converges pointwise, then it almost converges uniformly in a certain sense.
Theorem 6.10.1
(Egoroff ) Let (Ω, F, µ) be a finite measure space, (µ(Ω) < ∞)
and let fn , f be complex valued functions such that Re fn , Im fn are all measurable and lim fn (ω) = f (ω)
n→∞
for all ω ∈ / E where µ(E) = 0. Then for every ε > 0, there exists a set, F ⊇ E, µ(F ) < ε, such that fn converges uniformly to f on F C . Proof: First suppose E = ∅ so that convergence is pointwise everywhere. It follows then that Re f and Im f are pointwise limits of measurable functions and are therefore measurable. Let Ekm = {ω ∈ Ω : |fn (ω) − f (ω)| ≥ 1/m for some n > k}. Note that q 2 2 |fn (ω) − f (ω)| = (Re fn (ω) − Re f (ω)) + (Im fn (ω) − Im f (ω)) and so, 1 |fn − f | ≥ m is measurable. Hence Ekm is measurable because 1 ∞ Ekm = ∪n=k+1 |fn − f | ≥ . m For fixed m, ∩∞ k=1 Ekm = ∅ because fn converges to f . Therefore, if ω ∈ Ω there exists k 1 such that if n > k, |fn (ω) − f (ω)| < m which means ω ∈ / Ekm . Note also that Ekm ⊇ E(k+1)m . Since µ(E1m ) < ∞, Theorem 5.2.4 on Page 141 implies 0 = µ(∩∞ k=1 Ekm ) = lim µ(Ekm ). k→∞
Let k(m) be chosen such that µ(Ek(m)m ) < ε2−m and let F =
∞ [ m=1
Ek(m)m .
6.10. SOME IMPORTANT GENERAL THEOREMS
173
Then µ(F ) < ε because µ (F ) ≤
∞ X
∞ X µ Ek(m)m < ε2−m = ε
m=1
m=1
C Now let η > 0 be given and pick m0 such that m−1 0 < η. If ω ∈ F , then
ω∈
∞ \
C Ek(m)m .
m=1 C Hence ω ∈ Ek(m so 0 )m0
|fn (ω) − f (ω)| < 1/m0 < η for all n > k(m0 ). This holds for all ω ∈ F C and so fn converges uniformly to f on F C . ∞ Now if E 6= ∅, consider {XE C fn }n=1 . Each XE C fn has real and imaginary parts measurable and the sequence converges pointwise to XE f everywhere. Therefore, from the first part, there exists a set of measure less than ε, F such that on F C , {XE C fn } converges C uniformly to XE C f. Therefore, on (E ∪ F ) , {fn } converges uniformly to f .
6.10.2
The Vitali Convergence Theorem
The Vitali convergence theorem is a convergence theorem which in the case of a finite measure space is superior to the dominated convergence theorem.
Definition 6.10.2
Let (Ω, F, µ) be a measure space and let S ⊆ L1 (Ω). S is uniformly integrable if for every ε > 0 there exists δ > 0 such that for all f ∈ S Z | f dµ| < ε whenever µ(E) < δ. E
Lemma 6.10.3 If S is uniformly integrable, then |S| ≡ {|f | : f ∈ S} is uniformly integrable. Also S is uniformly integrable if S is finite. Proof: Let ε > 0 be given and suppose S is uniformly integrable. First suppose the functions are real valued. Let δ be such that if µ (E) < δ, then Z f dµ < ε 2 E for all f ∈ S. Let µ (E) < δ. Then if f ∈ S, Z Z Z |f | dµ ≤ (−f ) dµ + f dµ E E∩[f ≤0] E∩[f >0] Z Z = f dµ + f dµ E∩[f ≤0] E∩[f >0] ε ε + = ε. < 2 2 In general, if S is a uniformly integrable set of complex valued functions, the inequalities, Z Z Z Z Re f dµ ≤ f dµ , Im f dµ ≤ f dµ , E
E
E
E
imply Re S ≡ {Re f : f ∈ S} and Im S ≡ {Im f : f ∈ S} are also uniformly integrable. Therefore, applying the above result for real valued functions to these sets of functions, it follows |S| is uniformly integrable also.
174
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
For the last part, is suffices to verify a single function in L1 (Ω) is uniformly integrable. To do so, note that from the dominated convergence theorem, Z lim |f | dµ = 0. R→∞
[|f |>R]
R Let ε > 0 be given and choose R large enough that [|f |>R] |f | dµ < 2ε . Now let µ (E) < Then Z Z Z |f | dµ = |f | dµ + |f | dµ E∩[|f |≤R]
E
ε 2R .
E∩[|f |>R]
ε ε ε Rµ (E) + < + = ε. 2 2 2
0 such that if µ (E) < δ 1 , then |g| dµ < ε/3 for all g ∈ S ∪ {f }. E By Egoroff’s theorem, there exists a set, F with µ (F ) < δ 1 such that fn converges uniformly to f on F C . Therefore, there exists N such that if n > N , then Z |f − fn | dµ < FC
ε . 3
It follows that for n > N , Z
Z |f − fn | dµ ≤ FC
Ω
< which verifies 6.11.
Z |f − fn | dµ +
ε ε ε + + = ε, 3 3 3
Z |f | dµ +
F
|fn | dµ F
176
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
6.10.3
Radon Nikodym Theorem
Let µ, ν be two finite measures on the measurable space (Ω, F) and let αP ≥ 0. Let λ ≡ ν−αµ. ∞ ∞ Then it is clear that if {Ei }i=1 are disjoint sets of F, then λ (∪i Ei ) = i=1 λ (Ei ) and that the series converges. The next proposition is fairly obvious.
Proposition 6.10.6 Let (Ω, F, λ) be a measure space and let λ : F → [0, ∞). Then λ is a finite measure. Proof: Either λ (Ω) < ∞ and λ is a finite measure or λ (Ω) = ∞ and it isn’t.
Definition 6.10.7 ∞
Let (Ω, F) be a measurable space and let λ : F → R satisfy: If P∞ {Ei }i=1 are disjoint sets of F, then λ (∪i Ei ) = i=1 λ (Ei ) and the series converges. Such a real valued function is called a signed measure. In this context, a set E ∈ F is called positive if whenever F is a measurable subset of E, it follows λ (F ) ≥ 0. A negative set is defined similarly. Note that this requires λ (Ω) ∈ R.
Lemma 6.10.8 The countable union of positive sets is positive. Proof: Let Ei be positive and consider E ≡ ∪∞ i=1 PEi . If A ⊆ E with A measurable, then A ∩ Ei ⊆ Ei and so λ (A ∩ Ei ) ≥ 0. Hence λ (A) = i λ (A ∩ Ei ) ≥ 0.
Lemma 6.10.9 Let λ be a signed measure on (Ω, F). If E ∈ F with 0 < λ (E), then E has a measurable subset which is positive. Proof: If every measurable subset F of E has λ (F ) ≥ 0, then E is positive and we are done. Otherwise there exists F ⊆ E with λ (F ) < 0. Let U ∈ F mean U is a set consisting of disjoint measurable sets F for which λ (F ) < 0. Partially order F by set inclusion. By the Hausdorff maximal theorem, there is a maximal chain C. Then ∪C is a set consisting of disjoint measurable sets F ∈ F such that λ (F ) < 0. Since each set in ∪C has measure ∞ strictly less than 0, it follows that ∪C is a countable set, {Fi }i=1 . Otherwise, there would exist an infinite subset of ∪C with each set having measure less than n1 for some n ∈ N so λ would not be real valued. Letting F = ∪i Fi , then E \ F has no measurable subsets S for which λ (S) < 0 since, if it did, C would not have been maximal. Thus E \ F is positive. A major result is the following, called a Hahn decomposition.
Theorem 6.10.10
Let λ be a signed measure on a measurable space (Ω, F) . Then there are disjoint measurable sets P, N such that P is a positive set, N is a negative set, and P ∪ N = Ω.
Proof: If Ω is either positive or negative, there is nothing to show, so suppose Ω is neither positive nor negative. Let U ∈ F mean U is a set consisting of disjoint positive measurable sets F for which λ (F ) > 0. Partially order F by set inclusion and use Hausdorff maximal theorem to get C a maximal chain. Then, as in the above lemma, ∪C is countable, ∞ say {Pi }i=1 because λ (F ) > 0 for each F ∈ ∪C. Letting P ≡ ∪i Pi , and N = P C , it follows from Lemma 6.10.8 that P is positive. It is also the case that N must be negative because otherwise, C would not be maximal. Clearly a Hahn decomposition is not unique. For example, you could have obtained a different Hahn decomposition if you had considered disjoint negative sets F for which λ (F ) < 0 in the above argument . ∞ Let {αn }n=0 be equally spaced points αn = 2−k n. Also let (Pn , Nn ) be a Hahn decomposition for ν − αn µ. Now from the definition, Nn+1 \ Nn = Nn+1 ∩ Pn . Also, Nn ⊆ Nn+1 ∞ for each n and we can take N0 = ∅. then {Nn+1 \ Nn }n=0 covers all of Ω except possibly for a set of µ measure 0.
Lemma 6.10.11 Let S ≡ Ω \ (∪n Nn ) . Then µ (S) = 0.
6.10. SOME IMPORTANT GENERAL THEOREMS
177
Proof: Let S ≡ Ω \ (∪n Nn ) . If µ (S) > 0, then for n large enough, ν (S) − αn µ (S) < 0 and so from Lemma 6.10.9 applied to αn µ − ν, there exists F ⊆ S with F negative with respect to ν − αn µ. Thus F ⊆ Nn for all n large enough contrary to F ⊆ S = Ω \ (∪n Nn ) . It follows that µ (S) = 0. As just noted, if E ⊆ Nn+1 \ Nn , then ν (E) − αn µ (E) ≥ 0 ≥ ν (E) − αn+1 µ (E) αn+1 µ (E) ≥ ν (E) ≥ αn µ (E) Now restore k in the definition of αn so αkn replaces αn define f k (ω) ≡
∞ X
αkn X(N k
k n+1 \Nn
) (ω)
n=0 k Say ω ∈ Nn+1 \ Nnk so that f k (ω) = αkn ≡ 2−k n. How does f k (ω) compare with f k+1 (ω)? k+1 k+1 k+1 k Let m be such that αk+1 = αkn = f k (ω) so Nm = Nnk . Then Nn+1 \Nnk = Nm+2 \ Nm+1 ∪ m k+1 k+1 Nm+1 \ Nm and so ω is in one of these two disjoint sets. If ω is in the second, then k+1 −(k+1) f k (ω) = f k+1 (ω) and if in the first, then (ω) = (m + ≥ f k (ω) = m2−(k+1) . kf 1) 2−k k k+1 Thus k → f (ω) is increasing and f (ω) − f (ω) ≤ 2 and so there is a measurable, real valued function f (ω) which is the limit of these f k (ω). k As explained above, if E ⊆ Nn+1 \Nnk , αkn+1 µ (E) ≥ ν (E) ≥ αkn µ (E) . Thus for arbitrary E ⊆ ∪n Nn , Z Z X ∞ ∞ X k k XE (ω) f (ω) dµ = αkn XE∩(N k \N k ) (ω) dµ = αkn µ E ∩ Nn+1 \ Nnk n+1
n
n=0
≤ ≤
∞ X n=0 ∞ X
n=0
k ν E ∩ Nn+1 \ Nnk
= ν (E) ≤
≤
k αkn+1 µ E ∩ Nn+1 \ Nnk
n=0
k
k αkn µ E ∩ Nn+1 \ Nn
+ 2−k
n=0
Z
∞ X
∞ X
k µ E ∩ Nn+1 \ Nnk
n=0
XE (ω) f k (ω) dµ + 2−k µ (E)
R Letting k → ∞ and using the monotone convergence theorem, ν (E) = E f dµ. This proves most of the following theorem which is the Radon Nikodym theorem. First is a definition.
Definition 6.10.12
Let µ, ν be finite measures on (Ω, F). Then ν µ means that whenever µ (E) = 0, it follows that ν (E) = 0.
Theorem 6.10.13
Let ν and µ be finite measures defined on a measurable space (Ω, F). Then there exists a set of µ measure zero S and R a real valued, measurable R function ω → f (ω) such that if E ⊆ S C , E ∈ F, then ν (E) = E f dµ. If ν µ, ν (E) = E f dµ. In R any case, ν (E) ≥ E f dµ. This function f ∈ L1 (Ω). If f, fˆ both work, then f = fˆ a.e. Proof: Let S be defined in Lemma 6.10.11 so S ≡ Ω \ (∪n Nn ) and µ (S) = 0. If E ∈ F, and f as described above, Z Z C ν (E) = ν E ∩ S + ν (E ∩ S) = f dµ + ν (E ∩ S) = f dµ + ν (E ∩ S) E∩S C C
R
E
Thus R if E ⊆ S ,R we have ν (E) = E f dµ. If ν µ, Rthen in the above, ν (E ∩ S) = 0 so E∩S C f dµ = E f dµ = ν (E). In any case, ν (E) ≥ E f dµ, strict inequality holding if ν (E ∩ S) > 0. The last claim is left as an exercise. dλ , in the case ν µ and this is called the Radon Nikodym Sometimes people write f = dµ derivative.
178
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
Definition 6.10.14
Let S be in the above theorem. Then Z Z C ν || (E) ≡ ν E ∩ S = f dµ = f dµ E∩S C
E
while ν ⊥ (E) ≡ ν (E ∩ S) . Thus ν || µ and ν ⊥ is nonzero only on sets which are contained in S which has µ measure 0. This decomposition of a measure ν into the sum of two measures, one absolutely continuous with respect to µ and the other supported on a set of µ measure zero is called the Lebesgue decomposition.
Definition 6.10.15
A measure space (Ω, F, µ) is σ finite if there are countably many measurable sets {Ωn } such that µ is finite on measurable subsets of Ωn . There is a routine corollary of the above theorem.
Corollary 6.10.16 Suppose µ, ν are both σ finite measures defined on (Ω, F). Then the same conclusion in the above theorem can be obtained. Z ν (E) = ν E ∩ S C + ν (E ∩ S) = f dµ + ν (E ∩ S) , µ (S) = 0
(6.12)
E
In particular, if ν µ then there is a real valued function f such that ν (E) = 1 ˆ ,µ Ω ˆ < ∞, then f ∈ L Ω ˆ . all E ∈ F. Also, if ν Ω
R E
f dµ for
n o∞ ˜k ˜k , µ Ω ˜ k are Proof: Since both µ, ν are σ finite, there are Ω such that ν Ω k=1 ˜ ˜ k \ ∪k−1 Ω so that µ, ν are finite on Ωk and the Ωk finite. Letting Ω0 = ∅ and Ωk ≡ Ω j=0 j are disjoint. Let Fk be the measurable subsets of Ωk , equivalently the intersections with Ωk with sets of F. Now let ν k (E) ≡ ν (E ∩ Ωk ) , similar for µk . By Theorem 6.10.13, there exists Sk ⊆ Ωk , and fk as described there. Thus µk (Sk ) = 0 and Z ν k (E) = ν k E ∩ SkC + ν k (E ∩ Sk ) = fk dµk + ν k (E ∩ Sk ) E∩Ωk
Now let f (ω) ≡ fk (ω) for ω ∈ Ωk . Thus Z ν (E ∩ Ωk ) = ν (E ∩ (Ωk \ Sk )) + ν (E ∩ Sk ) =
f dµ + ν (E ∩ Sk )
(6.13)
E∩Ωk
Summing over all k, and letting S ≡ ∪k Sk , it follows µ (S) = 0 and that for Sk as above, a subset of Ωk where the Ωk are disjoint, R Ω \ S = ∪k (Ωk \ Sk ) . Thus, summing on k in 6.13, ν (E) = ν E ∩ S C + ν R(E ∩ S) = E f dµ + ν (E ∩ S) . In particular, if ν µ, then ν (E ∩ S) = 0 and so ν (E) = E f dµ. The last claim is obvious from 6.12.
Corollary 6.10.17 In the above situation, let λ be a signed measure and let λ µ meaning that if µ (E) = 0 ⇒ λ (E) R = 0. Here assume that µ is a finite measure. Then there exists h ∈ L1 such that λ (E) = E hdµ. Proof: Let P ∪ N be the Hahn decomposition of λ. Let λ+ (E) ≡ λ (E ∩ P ) , λ− (E) ≡ −λ (E ∩ N ) . Then both λ+ and λ− areR absolutely continuous measures and so there are nonnegative h+ and h− with λ− (E) = E h− dµ and a similar equation for λ+ . Then both 1 of these measures are necessarily finite. Hence R both h− and h+ are in L so h ≡ h+ − h− 1 is also in L and λ (E) = λ+ (E) − λ− (E) = E (h+ − h− ) dµ.
6.11. EXERCISES
6.11
179
Exercises
1. Let Ω = N ={1, 2, · · · }. Let F = P(N), the set of all subsets of N, and let µ(S) = number of elements in S. Thus µ({1}) = 1 = µ({2}), µ({1, 2}) = 2, etc. In this case, all functions are measurable. For a nonnegative function, f defined on N, show Z ∞ X f dµ = f (k) N
k=1
What do the monotone convergence and dominated convergence theorems say about this example? 2. For the measure space of Problem 1, give an example of a sequence of nonnegative measurable functions {fn } converging pointwise to a function f , such that inequality is obtained in Fatou’s lemma. 3. If (Ω, F, µ) isRa measure show that if g (ω) = f (ω) R space and f, g ≥ 0 is measurable, 1 a.e. ω, then gdµ = f dµ. Show that if f, g ∈ L (Ω) and g (ω) = f (ω) a.e. then R R gdµ = f dµ. 4. Let {fn } , f be measurable functions with values in C. {fn } converges in measure if lim µ(x ∈ Ω : |f (x) − fn (x)| ≥ ε) = 0
n→∞
for each fixed ε > 0. Prove the theorem of F. Riesz. If fn converges to f in measure, then there exists a subsequence {fnk } which converges to f a.e. In case µ is a probability measure, this is called convergence in probability. It does not imply pointwise convergence but does imply that there is a subsequence which converges pointwise off a set of measure zero. Hint: Choose n1 such that µ(x : |f (x) − fn1 (x)| ≥ 1) < 1/2. Choose n2 > n1 such that µ(x : |f (x) − fn2 (x)| ≥ 1/2) < 1/22, n3 > n2 such that µ(x : |f (x) − fn3 (x)| ≥ 1/3) < 1/23, etc. Now consider what it means for fnk (x) to fail to converge to f (x). Use the Borel Cantelli lemma of Problem 14 on Page 154. 5. Suppose (Ω, µ) is a finite measure space (µ (Ω) < ∞) and S ⊆ L1 (Ω). Then S is said to be uniformly integrable if for every ε > 0 there exists δ > 0 such that if E is a measurable set satisfying µ (E) < δ, then Z |f | dµ < ε E
for all f ∈ S. Show S is uniformly integrable and bounded in L1 (Ω) if there exists an increasing function h which satisfies Z h (t) = ∞, sup h (|f |) dµ : f ∈ S < ∞. lim t→∞ t Ω S is bounded if there is some number, M such that Z |f | dµ ≤ M for all f ∈ S. This is in the chapter but write it down in your own words.
180
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
6. A collection S ⊆ L1 (Ω) , (Ω, F, µ) a finite measure space, is called equiintegrable if for every ε > 0 there exists λ > 0 such that Z |f | dµ < ε [|f |≥λ]
for all f ∈ S. Show that S is equiintegrable, if and only if it is uniformly integrable and bounded. The equiintegrable condition is pretty popular in probability. 7. Product measure is described in the chapter. Go through the construction in detail for two measure spaces as follows. (X, F, µ) , (Y, G, ν) Let K be the π system of measurable rectangles A × B where A ∈ F and B ∈ G. Explain why this is really a π system. Now let F × G denote the smallest σ algebra which contains K. Let Z Z Z Z P≡ A∈F ×G : XA dνdµ = XA dµdν X
Y
Y
X
where both integrals make sense and are equal. Then show that P is closed with respect to complements and countable disjoint unions. By Dynkin’s lemma, P = F × G. Then define a measure µ × ν as follows. For A ∈ F × G Z Z µ × ν (A) ≡ XA dνdµ X
Y
Explain why this is a measure and why if f is F × G measurable and nonnegative, then Z Z Z Z Z f d (µ × ν) = XA dνdµ = XA dµdν X×Y
X
Y
Y
X
Hint: Pay special attention to the way the monotone convergence theorem is used. 8. Let (Ω, F, µ) be a measure space and suppose f, g : Ω → (−∞, ∞] are measurable. Prove the sets {ω : f (ω) < g(ω)} and {ω : f (ω) = g(ω)} are measurable. Hint: The easy way to do this is to write {ω : f (ω) < g(ω)} = ∪r∈Q [f < r] ∩ [g > r] . Note that l (x, y) = x − y is not continuous on (−∞, ∞] so the obvious idea doesn’t work. Here [g > r] signifies {ω : g (ω) > r}. 9. Let {fn } be a sequence of real or complex valued measurable functions. Let S = {ω : {fn (ω)} converges}. Show S is measurable. Hint: You might try to exhibit the set where fn converges in terms of countable unions and intersections using the definition of a Cauchy sequence. 10. Suppose un (t) is a differentiable function for t ∈ (a, b) and suppose that for t ∈ (a, b), |un (t)|, |u0n (t)| < Kn where
P∞
n=1
Kn < ∞. Show (
∞ X n=1
un (t))0 =
∞ X
u0n (t).
n=1
Hint: This is an exercise in the use of the dominated convergence theorem and the mean value theorem.
6.11. EXERCISES
181
11. Suppose {fn } is a sequence of nonnegative measurable functions defined on a measure space, (Ω, S, µ). Show that Z X ∞ k=1
fk dµ =
∞ Z X
fk dµ.
k=1
Hint: Use the monotone convergence theorem along with the fact the integral is linear. Pn k 12. Show limn→∞ 2nn k=1 2k = 2. This problem was shown to me by Shane Tang, a former student. It is a nice exercise in dominated convergence theorem if you massage it a little. Hint: n−1 n n n−1 n−1 X X X X n X 2k l k−n n −l n −l = = = ≤ 2 2 2 2−l (1 + l) 1 + n 2 k k n−l n−l k=1
k=1
l=0
l=0
l
13. Show that if f is continuous on [a, b] . Show that the traditional Riemann integral equals the Lebesgue integral. Hint: Assume first that f ≥ 0. You can reduce to this case later by looking at positive and negative parts. Show that there is an increasing sequence of step functions fn which converges uniformly to f on [a, b] . Then from single variable calculus, or Theorem 4.2.1, and the monotone convergence theorem, Z
b
Z f (t) dt = lim
a
n→∞
b
Z
a
n→∞
Z fn (t) dm1 =
fn (t) dt = lim
[a,b]
f (t) dm1 [a,b]
Note how this uses the obvious fact that the Riemann integral and Lebesgue integral coincide on nonnegative step function since the Lebesgue meaure of an interval is just the length of the interval. 14. Show the Vitali Convergence theorem implies the Dominated Convergence theorem for finite measure spaces but there exist examples where the Vitali convergence theorem works and the dominated convergence theorem does not.
182
CHAPTER 6. THE ABSTRACT LEBESGUE INTEGRAL
Chapter 7
Positive Linear Functionals In this chapter is a standard way to obtain many examples of measures from extending positive linear functionals. I will consider positive linear functionals defined on Cc (X) where (X, d) is a metric space. This can all be generalized to X a locally compact Hausdorff space, but I don’t have many examples which need this level of generality. Also, we will always assume that the closure of balls in (X, d) are compact. Thus you see that the main example is Rp or some closed subset of Rp like a m dimensional surface in Rp where m < p. This approach will not work for finding measures on infinite dimensional Banach spaces for example, which appears to limit its applications to probability but it is a very general approach which gives outstanding results very quickly. To see more generality including the locally compact Hausdorff spaces, see Rudin [60] which is where I first saw this, actually in an earlier version of this book. Another source is [34]. This is based on extending functionals. The most obvious functional is as follows: Z ∞ Z ∞ Lf ≡ ··· f (x1 , · · · , xp ) dxp dxp−1 · · · dx1 −∞
−∞
the iterated integral in which f ∈ Cc (Rp ). You do exactly what the notation says. First integrate with respect to xp then with respect to xp−1 and so forth. I leave as an exercise the verification that each iterate makes perfect sense whenever f ∈ Cc (Rp ).
7.1
Partitions of Unity
Definition 7.1.1
Define Cc (X) to be the functions which have complex values and compact support. This means spt (f ) ≡ {x ∈ X : f (x) 6= 0} is a compact set. Then L : Cc (X) → C is called a positive linear functional if it is linear and if, whenever f ≥ 0, then L (f ) ≥ 0 also. When f is a continuous function and spt (f ) ⊆ V an open set, we say f ∈ Cc (V ). Here X is some metric space. The following definition gives some notation.
Definition 7.1.2
If K is a compact subset of an open set, V , then K ≺ φ ≺ V if φ ∈ Cc (V ), φ(K) = {1}, φ(X) ⊆ [0, 1],
where X denotes the whole metric space. Also for φ ∈ Cc (X), K ≺ φ if φ(X) ⊆ [0, 1] and φ(K) = 1. and φ ≺ V if φ(X) ⊆ [0, 1] and spt(φ) ⊆ V. 183
184
CHAPTER 7. POSITIVE LINEAR FUNCTIONALS Next is a useful theorem.
Theorem 7.1.3
Let H be a compact subset of an open set U in X where (X, d) is a metric space in which the closures of balls are compact. Then there exists an open set V such that H ⊆ V ⊆ V¯ ⊆ U with V¯ compact. There also exists ψ such that H ≺ ψ ≺ V , meaning that ψ = 1 on H and spt (ψ) ⊆ V¯ . Proof: Consider h → dist h, U C . This continuous function achieves its minimum at some h0 ∈ H because H is compact. Let δ ≡ dist h0 , U C . The distance is positive because U C is closed. Now H ⊆ ∪h∈H B (h, δ) . Since H is compact, there are finitely many of these balls which cover H. Say H ⊆ ∪ki=1 B (hi , δ) ≡ V. Then, since there are finitely many of these balls, let V ≡ ∪ki=1 B (hi , δ), V ≡ ∪ki=1 B (hi , δ)
V is a compact set since it is a finite union of compact sets. To obtain ψ, let dist x, V C ψ (x) ≡ dist (x, V C ) + dist (x, H) Then ψ (x) ≤ 1 and if x ∈ H, its distance to V C is positive and dist (x, H) = 0 so ψ (x) = 1. If x ∈ V C , then its distance to H is positive and so ψ (x) = 0. It is obviously continuous because the denominator is a continuous function and never vanishes since both V C and H C are closed so if either or dist (x, H) is 0, then x is in either V C or H. Thus, if dist x, V C one of dist x, V , dist (x, H) is 0, the other isn’t. Thus H ≺ ψ ≺ V .
Theorem 7.1.4
(Partition of unity) Let K be a compact subset of X and suppose K ⊆ V = ∪ni=1 Vi , Vi open.
Then there exist ψ i ≺ Vi with
n X
ψ i (x) = 1
i=1
for all x ∈ K. If H is a compact subset of Vi for some Vi , there exists a partition of unity such that ψ i (x) = 1 for all x ∈ H Proof: Let K1 = K \ ∪ni=2 Vi . Thus K1 is compact and K1 ⊆ V1 . Let K1 ⊆ W1 ⊆ W 1 ⊆ V1 with W 1 compact. To obtain W1 , use Theorem 7.1.3 to get f such that K1 ≺ f ≺ V1 and let W1 ≡ {x : f (x) 6= 0} . Thus W1 , V2 , · · · Vn covers K and W 1 ⊆ V1 . Let K2 = K \ (∪ni=3 Vi ∪ W1 ). Then K2 is compact and K2 ⊆ V2 . Let K2 ⊆ W2 ⊆ W 2 ⊆ V2 W 2 compact. Continue this way finally obtaining W1 , · · · , Wn , K ⊆ W1 ∪ · · · ∪ Wn , and W i ⊆ Vi W i compact. Now let W i ⊆ Ui ⊆ U i ⊆ Vi , U i compact. By Theorem 7.1.3, let U i ≺ φi ≺ Vi , ∪ni=1 W i ≺ γ ≺ ∪ni=1 Ui . Define Pn Pn γ(x)φi (x)/ j=1 φj (x) if j=1 φj (x) 6= 0, Pn ψ i (x) = Wi Ui Vi 0 if j=1 φj (x) = 0. Pn If x is such that j=1 φj (x) = 0, then x ∈ / ∪ni=1 U i . Consequently γ(y) P = 0 for all y near x and so ψ i (y) = 0 for all y near x. Hence ψ i is continuous n at such x. If j=1 φj (x) 6= 0, this situation persists near x and so ψ i is continuous at such Pn points. Therefore ψ i is continuous. If x ∈ K, then γ(x) = 1 and so j=1 ψ j (x) = 1. Clearly 0 ≤ ψ i (x) ≤ 1 and spt(ψ j ) ⊆ Vj . As to the last claim, keep Vi the same but replace Vj fj ≡ Vj \ H. Now in the proof above, applied to this modified collection of open sets, with V if j 6= i, φj (x) = 0 whenever x ∈ H. Therefore, ψ i (x) = 1 on H.
7.2. POSITIVE LINEAR FUNCTIONALS AND MEASURES
7.2
185
Positive Linear Functionals and Measures
Now with this preparation, here is the main result called the Riesz representation theorem for positive linear functionals. I am presenting this for a metric space, but in this book, we will typically have X = Rp .
Theorem 7.2.1
(Riesz representation theorem) Let L be a positive linear functional on Cc (X) where (X, d) is a metric space having closed balls compact. Thus Lf ∈ C if f ∈ Cc (X). Then there exists a σ algebra F containing the Borel sets and a unique measure µ, defined on F, such that
µ(K)
µ(Vi ). µ(E) ≤ µ(∪∞ i=1 Vi ) ≤
∞ X i=1
Since ε was arbitrary, µ(E) ≤
P∞
i=1
µ(Vi ) ≤ ε +
∞ X
µ(Ei ).
i=1
µ(Ei ) which proves the lemma.
186
CHAPTER 7. POSITIVE LINEAR FUNCTIONALS
Lemma 7.2.4 Let K be compact, g ≥ 0, g ∈ Cc (X), and g = 1 on K. Then µ(K) ≤ Lg. Also µ(K) < ∞ whenever K is compact. Proof: Let Vα ≡ {x : g(x) > 1 − α} where α is small. I want to compare µ (Vα ) with µ (K). Thus let h ≺ Vα .
K
Vα
g >1−α −1
−1
Then h ≤ 1 on Vα while g (1 − α) ≥ 1 on Vα and so g (1 − α) −1 L(g (1 − α) ) ≥ Lh and that therefore, since L is linear,
≥ h which implies
Lg ≥ (1 − α) Lh. Taking sup for all such h, it follows that Lg ≥ (1 − α) µ (Vα ) ≥ (1 − α) µ (K). Letting α → 0 yields Lg ≥ µ(K). This proves the first part of the lemma. The second assertion follows from this and Theorem 7.1.3. If K is given, let K ≺ g ≺ X and so from what was just shown, µ (K) ≤ Lg < ∞. For two sets A, B, let dist (A, B) ≡ inf {|a − b| : a ∈ A, b ∈ B} .
Lemma 7.2.5 If A and B are disjoint subsets of X, with dist (A, B) > 0 then µ(A∪B) = µ(A) + µ(B). Proof: There is nothing to show if µ (A ∪ B) = ∞ so assume µ (A ∪ B) < ∞. Let δ ≡ dist (A, B) > 0. Then let U1 ≡ ∪a∈A B a, 3δ , V1 ≡ ∪b∈B B b, 3δ . It follows that these two open sets have empty intersection. Also, there exists W ⊇ A ∪ B such that µ (W ) − ε < µ (A ∪ B). let U ≡ U1 ∩ W, V ≡ V1 ∩ W. Then µ (A ∪ B) + ε > µ (W ) ≥ µ (U ∪ V ) Now let f ≺ U, g ≺ V such that Lf + ε > µ (U ) , Lg + ε > µ (V ) . Then µ (U ∪ V ) ≥
L (f + g) = L (f ) + L (g)
>
µ (U ) − ε + (µ (V ) − ε)
≥
µ (A) + µ (B) − 2ε
It follows that µ (A ∪ B) + ε > µ (A) + µ (B) − 2ε and since ε is arbitrary, µ (A ∪ B) ≥ µ (A) + µ (B) ≥ µ (A ∪ B). It follows from Theorem 5.7.2 that the measurable sets F determined by this outer measure µ contains the Borel σ algebra B (X). Since closures of balls are compact, it follows from Lemma 7.2.4 that µ is finite on every ball. From the definition, for any E ∈ F, µ (E) = inf {µ (V ) : V ⊇ E, V open} It remains to verify that for F ∈ F, µ (F ) = sup {µ (K) : K ⊆ F, K compact} , In doing so, first assume that F is contained in a closed ball B. Let V ⊇ (B \ F ) such that µ (V ) < µ (B \ F ) + ε
7.2. POSITIVE LINEAR FUNCTIONALS AND MEASURES
187
Then µ (V \ (B \ F )) + µ (B \ F ) = µ (V ) < µ (B \ F ) + ε and so µ (V \ (B \ F )) < ε. Now observe that B ∩ V C is a compact set and is contained in F . Indeed, if x ∈ B ∩ V C yet x∈ / F, then x would be in B \ F so x ∈ V which is not the case. Now F \ VC ∩B = F ∩ V ∪ BC = F ∩ V V \ (B \ F ) = V ∩ F ∪ B C = (V ∩ F ) ∪ V ∩ B C and so, µ F \ V C ∩ B
≤ µ (V \ (B \ F )) < ε.
µ (F ) = µ V C ∩ B + µ F \ V C ∩ B < µ V C ∩ B + ε This shows the inner regularity in case F is contained in a closed ball. If F is not necessarily contained in a closed ball, let Bn be a sequence of closed balls having increasing radii and let Fn = Bn ∩ F . Then if l < µ (F ) , it follows that µ (Fn ) > l for all large enough n. Then picking one of these, it follows from what was just shown that there is a compact set K ⊆ Fn such that also µ (K) > l. Thus F contains the Borel sets and µ is inner regular on all sets of F, outer regular by definition. It remains to show µ satisfies 7.5. R Lemma 7.2.6 f dµ = Lf for all f ∈ Cc (X). Proof: Let f ∈ Cc (X), f real-valued, and suppose f (X) ⊆ [a, b]. Choose t0 < a and let t0 < t1 < · · · < tn = b, ti − ti−1 < ε. Let Ei = f −1 ((ti−1 , ti ]) ∩ spt(f ).
(7.6)
Note that ∪ni=1 Ei is a closed set equal to spt(f ). ∪ni=1 Ei = spt(f )
(7.7)
Since X = ∪ni=1 f −1 ((ti−1 , ti ]). Let Vi ⊇ Ei , Vi is open and let Vi satisfy f (x) < ti + ε for all x ∈ Vi , µ(Vi \ Ei ) < ε/n.
(7.8)
By Theorem 7.1.4, there exists hi ∈ Cc (X) such that hi ≺ Vi ,
n X
hi (x) = 1 on spt(f ).
i=1
Now note that for each i, f (x)hi (x) ≤ hi (x)(ti + ε). If x ∈ / Vi both sides equal 0.) Therefore, Lf
n n n X X X = L( f hi ) ≤ L( hi (ti + ε)) = (ti + ε)L(hi ) i=1
=
i=1
i=1
! n n X X (|t0 | + ti + ε)L(hi ) − |t0 |L hi . i=1
i=1
Now note that |t0 | + ti + ε ≥ 0 and so from the definition of µ and Lemma 7.2.4, this is no larger than n X (|t0 | + ti + ε)µ(Vi ) − |t0 |µ(spt(f )) i=1
188
CHAPTER 7. POSITIVE LINEAR FUNCTIONALS
≤
n X
(|t0 | + ti + ε) (µ(Ei ) + ε/n) − |t0 |µ(spt(f ))
i=1
µ(spt(f ))
≤
z }| { n X X ε |t0 | ti µ (Ei ) µ(Ei ) + n |t0 | + n i i=1 +
X i
ti
ε2 ε X + εµ (Ei ) + − |t0 | µ(spt(f )) n n i
≤ ε |t0 | + ε (|t0 | + |b|) + εµ(spt(f )) + ε2 +
X
ti µ (Ei )
i 2
≤ ε |t0 | + ε (|t0 | + |b|) + 2εµ(spt(f )) + ε +
n X
ti−1 µ(Ei )
i=1
Z ≤ ε (2 |t0 | + |b| + 2µ(spt(f )) + ε) +
f dµ
R Since ε > 0 is f real. Hence equality holds because R arbitrary, Lf ≤ R f dµ for all f ∈ Cc (X), R L(−f ) ≤ − f dµ so L(f ) ≥ f dµ. Thus Lf = f dµ for all f ∈ Cc (X). Just apply the result for real functions to the real and imaginary parts of f . In fact the two outer measures are equal on all sets. Thus the measurable sets are exactly the same and so they have the same σ algebra of measurable sets and are equal on this σ algebra. Of course, if you were willing to consider σ algebras for which the measures are not complete, then you might have different σ algebras, but note that if you define the measurable sets in terms of Caratheodory as done above, the σ algebras are also unique. As a special case of the above,
Corollary 7.2.7 Let µ be a Borel measure. Also let µ (B) < ∞ for every ball B contained in a metric space X which has closed balls compact. Then µ must be regular. Proof: This follows right away from using the Riesz representation theorem above on the functional Z Lf ≡ f dµ for all f ∈ Cc (X). Here is another interesting result.
Corollary 7.2.8 Let X be a random variable with values in Rp . Then λX is an inner and outer regular measure defined on B (Rp ). This is obvious when you recall the definition of the distribution measure for the random vector X. λX (E) ≡ P (ω : X (ω) ∈ E) where this means the probability that X is in E a Borel set of Rp . It is a finite Borel measure and so it is regular. There is an interesting application of regularity to approximation of a measurable function with one that is continuous.
Lemma 7.2.9 Suppose f : Rp → [0, ∞) is measurable where µ is a regular measure, both inner and outer regular, as in the Riesz representation theorem for positive linear functionals, finite on balls. Then there is a set of measure zero N and a sequence of functions {hn } , hn : Rp → [0, ∞) each in Cc (Rp ) such that for all x ∈ Rp \ N, hn (x) → f (x) . Also, for x ∈ / N, hn (x) ≤ f (x) for all n large enough.
7.2. POSITIVE LINEAR FUNCTIONALS AND MEASURES
189
Proof: Consider fn (x) ≡ XBn (x) min (f (x) , n) where Bn is a ball centered at x0 which has radius n. Thus fn (x) is an increasing sequence and converges to f (x) for each x. Also by Corollary 5.1.9, there exists a simple function sn such that sn (x) ≤ fn (x) , sup |fn (x) − sn (x)| < x∈Rp
Let sn (x) =
mn X
1 2n
cnk XEkn (x) , cnk > 0
k=1
R Then it must be the case that µ (Ekn ) < ∞ because fn dµ < ∞. By regularity, there exists a compact set Kkn and an open set Vkn such that Kkn ⊆ Ekn ⊆ Vkn ,
mn X
µ (Vkn \ Kkn )
M ≡ sup{r : B(p, r) ∈ F} > 0 and let k ∈ (0, ∞) . Then there exists G ⊆ F such that If B(p, r) ∈ G then r > k,
(7.11)
If B1 , B2 ∈ G then B1 ∩ B2 = ∅,
(7.12)
G is maximal with respect to 7.11 and 7.12.
(7.13)
By this is meant that if H is a collection of balls satisfying 7.11 and 7.12, then H cannot properly contain G. Proof: Let S denote a subset of F such that 7.11 and 7.12 are satisfied. Then if S = ∅, it means there is no ball having radius larger than k. Otherwise, S 6= ∅. Partially order S with respect to set inclusion. Thus A ≺ B for A, B in S means that A ⊆ B. By the Hausdorff maximal theorem, there is a maximal chain in S denoted by C. Then let G be ∪C. If B1 , B2 are in C, then since C is a chain, both B1 , B2 are in some element of C and so B1 ∩ B2 = ∅. The maximality of C is violated if there is any other element of S which properly contains G.
Proposition 7.6.3 Let F be a collection of balls, and let A ≡ ∪ {B : B ∈ F} . Suppose ∞ > M ≡ sup {r : B(p, r) ∈ F} > 0. Then there exists G ⊆ F such that G consists of balls whose closures are disjoint and b : B ∈ G} A ⊆ ∪{B b denotes the open ball B (x, 5r). where for B = B (x, r) a ball, B Proof: Let G1 satisfy 7.11 and 7.12 for k = 2M 3 . Suppose G1 , · · · , Gm−1 have been chosen for m ≥ 2. Let Gi denote the collection of closures of the balls of Gi . Then let Fm be those balls of F, such that if B is one of these balls, B has empty intersection with every closed ball of Gi for each i ≤ m − 1. Then using Lemma 7.6.2, let Gm be a maximal collection of balls from Fm with the property that each
7.6. THE VITALI COVERING THEOREMS
197
m ball has radius larger than 23 M and their closures are disjoint. Let G ≡ ∪∞ k=1 Gk . Thus the closures of balls in G are disjoint. Let x ∈ B (p, r) ∈ F \ G. Choose m such that m−1 m 2 2 M M. 3 Consider the picture, in which w ∈ B (p0 , r0 ) ∩ B (p, r). r0
x • w ••p • p0 r
Then for x ∈ B (p, r), ≤r0
z }| { | kx − p0 k ≤ kx − pk + kp − wk + kw − p0 k < 23 r0
0 and x ∈ S, there exists a set B ∈ C containing x, the diameter of B is less than ε, and there exists an upper bound to the set of diameters of sets of C. The following corollary is a consequence of the above Vitali covering theorem.
Corollary 7.6.5 Let F be a bounded set and let C be a Vitali covering of F consisting of closed balls. Let r (B) denote the radius of one of these balls. Then assume also that sup {r (B) : B ∈ C} = M < ∞. Then there is a countable subset of C denoted by {Bi } such that m ¯ p F \ ∪N B i=1 i = 0 for N ≤ ∞, and Bi ∩ Bj = ∅ whenever i 6= j. Proof: Let U be a bounded open set containing F such that U approximates F so well that −1 mp (U ) ≤ 1 − 10−p m ¯ p (F ) . Since this is a Viali covering, for each x ∈ F, there is one of these balls B containing x such ˆ ⊆ U . Let Cb denote those balls of C such that B ˆ ⊆ U also. Thus, this is also a cover that B of F . By the Vitali covering theorem above, there are disjoint balls from C {Bi } such that
198
CHAPTER 7. POSITIVE LINEAR FUNCTIONALS
n o ˆi covers F . Thus B m ¯ p F \ ∪∞ j=1 Bj
≤
θp > ˆθp , since the sum on the right is the tail of a convergent series, ∞ X
m (Bk ) ≤ θp − ˆθp m ¯ (F ) if n large enough
k=n
Thus there exists n1 such that 1 m ¯ F \ ∪nj=1 Bj < θ p m ¯ (F ) 1 1 If m ¯ F \ ∪nj=1 Bj = 0, stop. Otherwise, do for F \ ∪nj=1 Bj exactly the same thing that was n1 done for F. Since ∪j=1 Bj is closed, you can arrange to have the approximating open set C 1 be contained in the open set ∪nj=1 Bj . It follows there exist disjoint closed balls from C called Bn1 +1 , · · · , Bn2 such that 1 2 1 m ¯ F \ ∪nj=1 Bj \ ∪nj=n Bj < θ p m ¯ F \ ∪nj=1 Bj < θ2p m ¯ (F ) 1+1 continuing this way and noting that limn→∞ θnp = 0 while m ¯ (F ) < ∞, this shows the nk desired result. Either the process stops because m ¯ F \ ∪j=1 Bj = 0 or else you obtain m ¯ F \ ∪∞ j=1 Bj = 0. The conclusion holds for arbitrary balls, open or closed or neither. This follows from observing that the measure of the boundary of a ball is 0. Indeed, let S (x,r) ≡ {y : |y − x| = r}. Then for each ε < r, mp (S (x, r)) ⊆ = = Hence mp (S (x, r)) = 0.
mp (B (x, r + ε)) − mp (B (x, r − ε)) mp (B (0, r + ε)) − mp (B (0, r − ε)) p p r−ε r+ε − (mp (B (0, r))) r r
7.7. EXERCISES
199
Thus you can simply omit the boundaries or part of the boundary of the closed balls and there is no change in the conclusion. Just first apply the above corollary to the Vitali cover consisting of closures of the balls before omitting part or all of the boundaries. The following theorem is obtained.
Theorem 7.6.6
Let E be a bounded set and let C be a Vitali covering of E consisting of balls, open, closed, or neither. Let r (B) denote the radius of one of these balls. Then assume also that sup {r (B) : B ∈ C} = M < ∞. Then there is a countable subset of C denoted by {Bi } such that m ¯ p E \ ∪N i=1 Bi = 0, N ≤ ∞, and Bi ∩ Bj = ∅ whenever i 6= j. Here m ¯ p denotes the outer measure determined by mp . The same conclusion follows if you omit the assumption that E is bounded. Proof: It remains to consider the last claim. Consider the balls B (0, 1) , B (0, 2) , B (0,3) , · · · . If E is some set, let Er denote that part of E which is between B (0, r − 1) and B (0, r) but not on the boundary of either of these balls, where B (0, −1) ≡ ∅. Then ∪∞ r=0 Er differs from E by a set of measure zero and so you can apply the first part of the theorem to each Er keeping all balls between B (0, r − 1) and B (0, r) allowing for no intersection with any of the boundaries. Then the union of the disjoint balls associated with Er gives the desired cover.
7.7
Exercises
1. Let (X, F, µ) be a regular measure space. For example, it could be R with Lebesgue measure. Why do we care about a measure space being regular? This problem will show why. Suppose that closures of balls are compact as in the case of R. (a) Let µ (E) < ∞. By regularity, there exists K ⊆ E ⊆ V where K is compact and V is open such that µ (V \ K) < ε. Show there exists W open such that ¯ ⊆ V and W ¯ is compact. Now show there exists a function h such that K ⊆W h has values in [0, 1] , h (x) = 1 for x ∈ K, and h (x) equals 0 off W . Hint: You might consider Problem 12 on Page 154. (b) Show that Z |XE − h| dµ < ε Pn (c) Next suppose s = i=1 ci XEi is a nonnegative simple function where each µ (Ei ) < ∞. Show there exists a continuous nonnegative function h which equals zero off some compact set such that Z |s − h| dµ < ε (d) Now suppose f ≥ 0 and f ∈ L1 (Ω) . Show that there exists h ≥ 0 which is continuous and equals zero off a compact set such that Z |f − h| dµ < ε (e) If f ∈ L1 (Ω) with complex values, show the conclusion in the above part of this problem is the same. That is, Cc (Rp ) is dense in L1 (Rp ). 2. Let F be an increasing function defined on R. For f a continuous function having compact support in R, consider the functional Z Lf ≡ f dF
200
CHAPTER 7. POSITIVE LINEAR FUNCTIONALS Rb where here the integral signifies a f dF where spt (f ) ⊆ [a, b] and the integral is the ordinary Rieman Stieltjes integral. For a discussion of these, see my single variable advanced calculus book. If µ is the measure which results, show that µ ((α, β)) = F (β−) − F (α+) and µ ((α, β]) = F (β+) − F (α+) , µ ([α, β]) = F (β+) − F (α−) . Here F (x+) ≡ lim F (y) , F (x−) ≡ lim F (y) y→x,y>x
y→x,y M ≡ sup {r (B) : B ∈ B} . The Vitali covering theorem says that there is a countable collection of these balls of B ˆ called B˜ such that no two intersect n and for B o the open ball having the same center as ˆ : B ∈ B˜ . This is a really useful covering theorem B but 5 times the radius, S ⊆ ∪ B when dealing with Lebesgue measure. Prove this theorem by following the following steps. You fill in details. (a) If F is a nonempty collection of balls having K = sup {r (B) : B ∈ F} . Show that there exists F1 such that 1.) No two different balls of F1 have nonempty intersection. 2.) For each B ∈ F1 , r (B) > K/2. 3.) F1 is maximal with respect to 1.),2.). That is, if B has r (B) > K/2, then B must intersect some ball of F1 . Hint: To do this, define a partial order consisting of all subsets of F, S ⊆ F. Two of these are related if one is a subset of the other. Now use the Hausdorff maximal theorem in the appendix to obtain a maximal chain of these subsets of F called C. Let F1 = ∪C. Verify this works. (b) Beginning with B, use a.) to conclude there exists B1 ⊆ B such that the balls of B1 are disjoint, each has radius larger than M/2 and B1 is maximal with respect to these properties. If B1 , · · · , Bk have been chosen such that no ball of Bk intersects any ball of Bi and no pair of balls in any Bj have nonempty intersection, consider F to be all balls of B which have empty intersection with all balls of ∪ki=1 Bk . If there are none, then process stops. Get Bk+1 , a subset of F consisting of disjoint balls each having radius larger than M/2k−1 and maximal with respect to these conditions.
7.7. EXERCISES
201
ˆ (c) Now show that ∪∞ i=1 Bi covers S. Hint: Let x ∈ S so x ∈ B ∈ B. Let k be such that M M < r (B) ≤ k−1 k 2 2 ˆ1 covers B. Explain why there is B1 ∈ Bk such that B1 ∩ B 6= ∅. Now show that B To do this, let y be the center of B and let x be the center of B1 . Then explain why |x − z| ≤ |z − y| + |y − x|
0. There is V open such that mp (V ) < ε and V ⊇ N . ˆx contained in V . Go (b) For each x ∈ N, there is a ball Bx centered at x with B ahead and let the ball be taken with respect to the norm kxk ≡ max {|xi | , i ≤ p}. Thus these Bx are open cubes. (c) You know from Problem 9 that h (Bx ) is measurable. Use n o Problem 8 to obtain ∞ ˆ countably many disjoint balls {Bxi }i=1 such that Bxi covers N . n o ˆx (d) Explain why h (N ) is covered by h B . Now fill in the details of the foli lowing estimate. mp (h (N )) ≤
∞ X i=1
=
∞ X
∞ X ˆx ˆx mp h B ≤ K p mp B i i i=1 p
K p 5p mp (Bxi ) = (5K) ε
i=1
(e) Now explain why this shows that mp (h (N )) = 0. Thus Lipschitz mappings take sets of measure zero to sets of measure zero. 11. Use this and Proposition 7.5.2 to show that if h is a Lipschitz function, then if E is Lebesgue measurable, so is h (E). Hint: This will involve completeness of the measure and Problem 10. You could first show that it suffices to assume that E is contained in some ball to begin with if this would make it any easier. 12. Show that the continuous functions with compact support are dense in L1 (Rp ) with respect to Lebesgue measure. Will this work for a general Radon measure? Hint: You should show that the simple functions are dense in L1 (Rp ) using the norm in L1 (Rp ) and then consider regularity of the measure.
202
CHAPTER 7. POSITIVE LINEAR FUNCTIONALS
13. Suppose A ⊆ Rp is covered by a finite collection of Balls, F. Show that then there m b b exists a disjoint collection of these balls, {Bi }i=1 , such that A ⊆ ∪m i=1 Bi where Bi has the same center as Bi but 3 times the radius. Hint: Since the collection of balls is finite, they can be arranged in order of decreasing radius. 14. This problem will help to understand that a certain kind of function exists. f (x) =
2
e−1/x if x 6= 0 0 if x = 0
show that f is infinitely differentiable. Note that you only need to be concerned with what happens at 0. There is no question elsewhere. This is a little fussy but is not too hard. 15. ↑Let f (x) be as given above. Now let f (x) if x ≤ 0 fˆ (x) ≡ 0 if x > 0 Show that fˆ (x) is also infinitely differentiable. Now let r > 0 and define g (x) ≡ fˆ (− (x − r)) fˆ (x + r) Qn show that g is infinitely differentiable and vanishes for |x| ≥ r. Let ψ (x) = k=1 g (xk ). For U = B (0, 2r) with the norm given by kxk = max {|xk | , k ≤ n} , show that ψ ∈ Cc∞ (U ). 16. ↑Using the above problem, let ψ ∈ Cc∞ (B (0, 1)) . Also let ψ ≥ 0R as in the above problem. Show there exists ψ ≥ 0 such that ψ ∈ Cc∞ (B (0, 1)) and ψdmn = 1. Now define ψ k (x) ≡ k n ψ (kx) R Show that ψ k equals zero off a compact subset of B 0, k1 and ψ k dmn = 1. We say that spt (ψ k ) ⊆ B 0, k1 . spt (f ) is defined as the closure of the set Ron which f is not equal to 0. Such a sequence offunctions as just defined {ψ k } where ψ k dmn = 1 and ψ k ≥ 0 and spt (ψ k ) ⊆ B 0, k1 is called a mollifier. 17. ↑It is important to be able to approximate functions with those which are infinitely differentiable. Suppose f ∈ L1 (Rp ) and let {ψ k } be a mollifier as above. We define the convolution as follows. Z f ∗ ψ k (x) ≡ f (x − y) ψ k (y) dmn (y) Here the notation means that the variable of integration is y. Show that f ∗ ψ k (x) exists and equals Z ψ k (x − y) f (y) dmn (y) Now show using the dominated convergence theorem that f ∗ ψ k is infinitely differentiable. Next show that Z lim |f (x) − f ∗ ψ k (x)| dmn = 0 k→∞
Thus, in terms of being close in L1 (Rp ) , every function in L1 (Rp ) is close to one which is infinitely differentiable.
7.7. EXERCISES
203
18. ↑From Problem 1 above and f ∈ L1 (Rp ), there exists h ∈ Cc (Rp ) , continuous and spt (h) a compact set, such that Z |f − h| dmn < ε Now consider h ∗ ψ k . Show that this function is in Cc∞ spt (h) + B 0, k2 . The notation means you start with the compact set spt (h) and fatten it up by adding the set B 0, k1 . It means x + y such that x ∈ spt (h) and y ∈ B 0, k1 . Show the following. For all k large enough, Z |f − h ∗ ψ k | dmn < ε so one can approximate with a function which is infinitely differentiable and also has compact support. Also show that h ∗ ψ k converges uniformly to h. If h is a function in C k Rk in addition to being continuous with compact support, show that for each |α| ≤ k, Dα (h ∗ ψ k ) → Dα h uniformly. Hint: If you do this for a single partial derivative, you will see how it works in general. 19. ↑Let f ∈ L1 (R). Show that Z lim
k→∞
f (x) sin (kx) dm = 0
Hint: Use the result of the above problem to obtain g ∈ Cc∞ (R) , continuous and zero off a compact set, such that Z |f − g| dm < ε Then show that
Z lim
k→∞
You can do this by integration by Z f (x) sin (kx) dm =
Z ≤
g (x) sin (kx) dm (x) = 0 parts. Then consider this. Z Z f (x) sin (kx) dm − g (x) sin (kx) dm Z + g (x) sin (kx) dm
Z |f − g| dm + g (x) sin (kx) dm
This is the celebrated Riemann Lebesgue lemma which is the basis for all theorems about pointwise convergence of Fourier series. 20. As another application, here is a very important result. Suppose f ∈ L1 (Rp ) and for every ψ ∈ Cc∞ (Rp ) Z f ψdmn = 0 Show that then it follows that f (x) = 0 for a.e.x. That is, there is a set of measure zero such that off this set f equals 0. Hint: What you can do is to let E be a measurable which is bounded and let Kk ⊆ E ⊆ Vk where mn (Vk \ Kk ) < 2−k . Here Kk is compact and Vk is open. By an earlier exercise, Problem 12 on Page 154, there exists a function φk which is continuous, has values in [0, 1] equals 1 on Kk and spt (φk ) ⊆ V. To get this last part, show there exists Wk open such that Wk ⊆ Vk and Wk contains Kk . Then you use the problem to get spt (φk ) ⊆ Wk . Now you form η k = φk ∗ ψ l where {ψ l } is a mollifier. Show that for l large enough, η k has values in
204
CHAPTER 7. POSITIVE LINEAR FUNCTIONALS [0, 1] , spt (η k ) ⊆ Vk and η k ∈ Cc∞ (Vk ). Now explain why η k → XE off a set of measure zero. Then Z Z Z f XE dmn = f (XE − η k ) dmn + f η k dmn Z = f (XE − η k ) dmn Now explain why this converges to 0 Ron the right. This will involve the dominated convergence theorem. Conclude that f XE dmn = 0 for every bounded measurable R set E. Show that this implies that f XE dmn = 0 for every measurable E. Explain why this requires f = 0 a.e. The result which gets used over and over in all of this is the dominated convergence theorem.
21. This is from the advanced calculus book by Apostol. Justify the following argument using convergence theorems. Z
1
F (x) ≡ 0
Then you can take the derivative
2 2 Z x 2 e−x (1+t ) −t2 e dt dt + 1 + t2 0
d dx
and obtain
2 2 Z x e−x (1+t ) (−2x) 1 + t2 2 −t2 dt + 2 e dt e−x F (x) = 2 1 + t 0 0 Z x Z 1 2 2 2 2 −x (1+t ) −t = e (−2x) dt + 2 e dt e−x Z
0
1
0
0
Why can you take the derivative on the inside of the integral? Change the variable in the second integral, letting t = sx. Thus dt = xds. Then the above reduces to Z 1 Z 1 2 2 2 2 2 = e−x (1+t ) (−2x) dt + 2x e−s x ds e−x 0
Z =
(−2x) 0
1
2 2 e−x (1+t ) dt + 2x
Z
0 1
2
e−(s
+1)x2
ds = 0
0
and so this function of x is constant. However, Z 1 1 dt = π F (0) = 2 1 + t 4 0 Now you let x → ∞. What happens to that first integral? It equals Z 1 Z 1 −x2 t2 e 1 −x2 −x2 e dt ≤ e dt 2 2 0 1+t 0 1+t and so it obviously converges to 0 as x → ∞. Therefore, taking a limit yields Z ∞ 2 √ Z ∞ 2 2 π π e−t dt = , e−t dt = 4 2 0 0 22. The Dini derivates are as follows. In these formulas, f is a real valued function defined on R. D+ f (x) ≡ D− f (x) ≡
f (x + h) − f (x) f (x + h) − f (x) , D+ f (x) ≡ lim inf h→0+ h h h→0+ f (x) − f (x − h) f (x) − f (x − h) lim sup , D− f (x) ≡ lim inf h→0+ h h h→0+
lim sup
7.7. EXERCISES
205
Thus when these are all equal, the function has a derivative. Now suppose f is an increasing function. Let Nab = x : D+ f (x) > b > a > D+ f (x) , a ≥ 0 r Let V be an open set which contains Nab ∩ (−r, r) ≡ Nab such that
m (V \ (Nab ∩ (−r, r))) < ε Then explain why there exist disjoint intervals [ai , bi ] such that r r m (Nab \ ∪i [ai , bi ]) = m (Nab \ ∪i (ai , bi )) = 0
and f (bi ) − f (ai ) ≤ am (ai , bi ) each interval being contained in V ∩ (−r, r). Thus you have r r m (Nab ) = m (∪i Nab ∩ (ai , bi )) .
Next show there exist disjoint intervals (aj , bj ) such that each of these in Pis contained r some (ai , bi ), the (aj , bj ) are disjoint, f (bj )−f (aj ) ≥ bm (aj , bj ) , and j m (Nab ∩ (aj , bj )) = r m (Nab ). Then you have the following thanks to the fact that f is increasing. r a (m (Nab ) + ε) >
am (V ) ≥ a
X
(bi − ai ) >
i
≥
X
f (bj ) − f (aj ) ≥ b
j
≥ b
X
f (bi ) − f (ai )
i
X
bj − aj
j
X
r r m (Nab ∩ (aj , bj )) = bm (Nab )
j
and since ε > 0, r r am (Nab ) ≥ bm (Nab ) r showing that m (Nab ) = 0. This is for any r and so m (Nab ) = 0. Thus the derivative from the right exists for a.e. x by taking the complement of the union of the Nab for a, b nonnegative rational numbers. Now do the same thing to show that the derivative from the left exists a.e. and finally, show that D− f (x) = D+ f (x) for almost a.e. x. Off the union of these three exceptional sets of measure zero all the derivates are the same and so the derivative of f exists a.e. In other words, an increasing function has a derivative a.e.
23. This problem is on Eggoroff’s theorem. This was presented earlier in the book. The idea is for you to review this by going through a proof. Suppose you have a measure space (Ω, F, µ) where µ (Ω) < ∞. Also suppose that {fk } is a sequence of measurable, complex valued functions which converge to f pointwise. Then Eggoroff’s theorem says that for any ε > 0 there is a set N with µ (N ) < ε and convergence is uniform on NC. 1 (a) Define Emk ≡ ∪∞ r=m ω : |f (ω) − fr (ω)| > k . Show that Emk ⊇ E(m+1)k for all m and that ∩m Emk = ∅ (b) Show that there exists m (k) such that µ Em(k)k < ε2−k . (c) Let N ≡ ∪∞ / N C , if k=1 Em(k)k . Explain why µ (N ) < ε and that for all ω ∈ 1 r > m (k) , then |f (ω) − fr (ω)| ≤ k . Thus uniform convergence takes place on NC.
206
CHAPTER 7. POSITIVE LINEAR FUNCTIONALS
24. Suppose you have a sequence {fn } which converges uniformly on each of sets A1 , · · · , An . Why does the sequence converge uniformly on ∪ni=1 Ai ? 25. ↑Now suppose you have µ is a finite Radon measure on Rp . For example, you could have Lebesgue measure. Suppose you have f has nonnegative real values for all x and is measurable. Then Lusin’s theorem says that for every ε > 0, there exists an open set V with measure less than ε and a continuous function defined on Rp such that f (x) = g (x) for all x ∈ / V. That is, off an open set of small measure, the function is equal to a continuous function. (a) By Lemma 7.2.9, there exists a sequence {fn } ⊆ Cc (Ω) which converges to f off ˆ such that a set N of measure zero. Use Eggoroff’s theorem to enlarge N to N ε ˆ. ˆ < and convergence is uniform off N µ N 2 ˆ having measure less than ε. (b) Next use outer regularity to obtain open V ⊇ N C Thus {fn } converges uniformly on V . Therefore, that which it converges to is continuous on V C a closed set. Now use the Tietze extension theorem. 26. Say you have a change of variables formula which says that Z Z f (y) dmp = f (h (x)) |det Dh (x)| dmp h(U )
U
and that this holds for all f ≥ 0 and Lebesgue measurable. Show that the formula continues to hold if f ∈ L1 (h (U )) . Hint: Apply what is known to the positive and negative parts of the real and imaginary parts.
Chapter 8
Basic Function Spaces In this chapter is an introduction to some of the most important vector spaces of functions. First of all, recall from linear algebra that if you have any nonempty set S and V is the set of all functions defined on S having values in F or more generally some vector space, then defining (f + g) (x) ≡ f (x) + g (x) (αg) (x) ≡ αg (x) this defines vector addition and scalar multiplication of functions. You should check that all the axioms of a vector space hold for this situation. Note also that the usual situation in linear algebra Fn where vectors are ordered lists of numbers is a special case. There you are considering functions mapping {1, · · · , n} to F so the set S consists of the first n natural numbers. This was a finite dimensional vector space, but if S is the unit interval and V consists of functions defined on S, then this will not be finite dimensional because for each x ∈ S, you could consider fx (x) ≡ 1 and fx (y) = 0 for y 6= x and you would have infinitely many vectors such that every finite subset of them is linearly independent. There are two kinds of function spaces discussed here, the space of bounded continuous functions and the Lp spaces. First I will consider the space of bounded continuous functions.
8.1
Bounded Continuous Functions
As before, F will denote either R or C.
Definition 8.1.1
Let T be a subset of some Fm , possibly all of Fm . Let BC (T ; Fn ) denote the bounded continuous functions defined on T .1 Then this is a vector space (linear space) with respect to the usual operations of addition and scalar multiplication of functions. Also, define a norm as follows: kf k ≡ sup |f (t)| < ∞. t∈T
This is a norm because it satisfies the axioms of a norm which are as follows: kf + gk ≤ kf k + kgk kαf k = |α| kf k kf k ≥ 0 and equals 0 if and only if f = 0 1 In fact, they will be automatically bounded if the set T is a closed interval like [0,T], but the considerations presented here will work even when a compact set is not being considered.
207
208
CHAPTER 8. BASIC FUNCTION SPACES
A sequence {fn } in BC (T ; Fn ) is a Cauchy sequence if for every ε > 0 there exists Mε such that if m, n ≥ Mε , then kfn − fm k < ε Such a normed linear space is called complete if every Cauchy sequence converges. Such a complete normed linear space is called a Banach space. This norm is often denoted as k·k∞ . I am letting T be a subset of Fn just to keep things in familiar territory. T can be an arbitrary metric space or even a general topological space. Now there is another norm which works just as well in the case where T ≡ [a, b] , an interval. This is described in the following definition.
Definition 8.1.2
For f ∈ BC ([a, b] ; Fn ) , let c ∈ [a, b] , γ a real number. Then kf kγ ≡ sup f (t) e−|γ(t−c)| t∈[a,b]
Then this is a norm. The above Definition 8.1.1 corresponds to γ = 0.
Lemma 8.1.3 k·kγ is a norm for BC ([a, b] ; Fn ) and BC ([a, b] ; Fn ) is a complete normed linear space. Also, a sequence is Cauchy in k·kγ if and only if it is Cauchy in k·k. Proof: First consider the claim about k·kγ being a norm. To simplify notation, let T = [a, b]. It is clear that kf kγ = 0 if and only if f = 0 and kf kγ ≥ 0. Also, kαf kγ ≡ sup αf (t) e−|γ(t−c)| = |α| sup f (t) e−|γ(t−c)| = |α| kf kγ t∈T
t∈T
so it does what is should for scalar multiplication. Next consider the triangle inequality. kf + gkγ = sup (f (t) + g (t)) e−|γ(t−c)| t∈T ≤ sup f (t) e−|γ(t−c)| + g (t) e−|γ(t−c)| t∈T ≤ sup f (t) e−|γ(t−c)| + sup g (t) e−|γ(t−c)| t∈T
t∈T
= kf kγ + kgkγ The rest follows from the next inequalities. kf k ≡ sup |f (t)| = sup f (t) e−|γ(t−c)| e|γ(t−c)| ≤ e|γ(b−a)| kf kγ t∈T t∈T |γ(b−a)| ≡ e sup f (t) e−|γ(t−c)| ≤ e|γ|(b−a) sup |f (t)| = e|γ|(b−a) kf k t∈T
t∈T
Now consider the general case where T is just some set.
Lemma 8.1.4 The collection of functions BC (T ; Fn ) is a normed linear space (vector space) and it is also complete which means by definition that every Cauchy sequence converges. Proof: Showing that this is a normed linear space is entirely similar to the argument in the above for γ = 0 and T = [a, b]. Let {fn } be a Cauchy sequence. Then for each t ∈ T, {fn (t)} is a Cauchy sequence in Fn . By completeness of Fn this converges to some g (t) ∈ Fn . We need to verify that kg − fn k → 0 and that g ∈ BC (T ; Fn ). Let ε > 0 be given. There exists Mε such that if
8.1. BOUNDED CONTINUOUS FUNCTIONS
209
m, n ≥ Mε , then kfn − fm k < 4ε . Let n > Mε . By Lemma 1.8.2 which says you can switch supremums, sup |g (t) − fn (t)|
≤
t∈T
sup sup |fk (t) − fn (t)| t∈T k≥Mε
sup sup |fk (t) − fn (t)| = sup kfk − fn k ≤
=
k≥Mε t∈T
k≥Mε
Therefore, sup (|g (t)| − |fn (t)|) ≤ sup |g (t) − fn (t)| ≤ t∈T
t∈T
ε 4
(*)
ε 4
Hence ε ≥ sup (|g (t)| − |fn (t)|) = sup |g (t)| − inf |fn (t)| ≥ sup |g (t)| − kfn k t∈T 4 t∈T t∈T t∈T sup |g (t)| ≤ t∈T
ε + kfn k < ∞ 4
so in fact g is bounded. Now by the fact that fn is continuous, there exists δ > 0 such that if |t − s| < δ, then ε |fn (t) − fn (s)| < 3 It follows that |g (t) − g (s)| ≤ |g (t) − fn (t)| + |fn (t) − fn (s)| + |fn (s) − g (s)| ≤
ε ε ε + + m and consider the following which comes from the triangle inequality for the norm, kx + yk ≤ kxk + kyk. kF n x0 − F m x0 k ≤
n−1 X k=m
k+1
F x0 − F k x0
210
CHAPTER 8. BASIC FUNCTION SPACES
Now F k+1 x0 − F k x0 ≤
r F k x0 − F k−1 x0 ≤ r2 F k−1 x0 − F k−2 x0 · · · ≤ rk kF x0 − x0 k . Using this in the above, kF n x0 − F m x0 k ≤ n−1 X k=m
X
k+1
n−1 rm k
F x0 − F x0 ≤ rk kF x0 − x0 k ≤ kF x0 − x0 k 1−r
(8.1)
k=m
since r < 1, this is a Cauchy sequence. Hence it converges to some x. Therefore, x = lim F n x0 = lim F n+1 x0 = F lim F n x0 = F x. n→∞
n→∞
n→∞
The third equality is a consequence of the following consideration. If zn → z, then kF zn − F zk ≤ r kzn − zk so also F zn → F z. In the above, F n x0 plays the role of zn and its limit plays the role of z. The fixed point is unique because if you had two of them, x, x ˆ, then kx − x ˆk = kF x − F x ˆk ≤ r kx − x ˆk and so x = x ˆ. In the second case, let m = 0 in 8.1 and you get the estimate kF n x0 − x0 k ≤
1 kF x0 − x0 k < R 1−r
It is still the case that the sequence {F n x0 } is a Cauchy sequence and must therefore converge to some x ∈ B (x0 , R) which is a fixed point as before. The fixed point is unique because of the same argument as before. Why do we care about complete normed linear spaces? The following is a fundamental existence theorem for ordinary differential equations. It is one of those things which, incredibly, is not presented in ordinary differential equations courses. However, this is the mathematically interesting thing. The initial value problem is to find t → x (t) on [a, b] such that x0 (t) = f (t, x (t)) x (c) = x0 Assuming (t, x) → f (t, x) is continuous, this is obviously equivalent to the single integral equation Z t x (t) = x0 + f (s, x (s)) ds c
Indeed, if x (·) is a solution to the initial value problem, then you can integrate and obtain the above. Conversely, if you find a solution to the above, integral equation, then you can use the fundamental theorem of calculus to differentiate and find that it is a solution to the initial value problem.
Theorem 8.1.7
Let f satisfy the Lipschitz condition |f (t, x) − f (t, y)| ≤ K |x − y|
(8.2)
(t, x) → f (t, x) is continuous.
(8.3)
and the continuity condition
Then there exists a unique solution to the initial value problem, Z t x (t) = x0 + f (s, x (s)) ds, c ∈ [a, b] c
on [a, b] .
(8.4)
8.1. BOUNDED CONTINUOUS FUNCTIONS
211
Proof: It is necessary to find a solution to the integral equation Z
t
f (s, x (s)) ds, t ∈ [a, b]
x (t) = x0 + c
Let a, b be finite but given and completely arbitrary, c ∈ [a, b]. Rt f (s, x (s)) ds Thus c F : BC ([a, b] , Fn ) → BC ([a, b] , Fn )
Let F x (t) ≡ x0 +
Let k·kγ be the new norm on BC ([a, b] , Fn ) . kf kγ ≡ sup f (t) e−|γ(t−c)| t∈[a,b]
Note that |x (s) − y (s)| = e|γ(s−c)| e−|γ(s−c)| |x (s) − y (s)| ≤ e|γ(s−c)| kx − ykγ Then for t ∈ [a, b] , Z t Z t K |x (s) − y (s)| ds |f (s, x (s)) − f (s, y (s))| ds ≤ |F x (t) − F y (t)| ≤ c
c
Z t Z t e|γ(s−c)| ds ≤ K e|γ(s−c)| kx − ykγ ds = K kx − ykγ
(*)
c
c
Now consider the case that t ≥ c. The above reduces to |γ|(s−c) Z t e = K kx − ykγ e|γ|(s−c) ds = K kx − ykγ |tc |γ| c |γ|(t−c) |γ(t−c)| e 1 e 1 = K kx − ykγ − = K kx − ykγ − |γ| |γ| |γ| |γ| Next consider ∗ when t < c. In this case, ∗ becomes Z K kx − ykγ
c
e
|γ|(c−s)
t
ds = K kx − ykγ
e|γ|(c−s) c | − |γ| t
e|γ|(c−t) 1 + − |γ| |γ| |γ(t−c)| 1 e ≤ K kx − ykγ − |γ| |γ|
= K kx − ykγ
It follows that for any t ∈ [a, b] , e
−|γ(t−c)|
|F x (t) − F y (t)|
≤
K kx − ykγ e
≤
K kx − ykγ
−|γ(t−c)|
1 e|γ(t−c)| − |γ| |γ|
1 |γ|
and so kF x − F ykγ ≤ K kx − ykγ
1 |γ|
so let |γ| > 2K and this shows that F is a contraction map on BC ([a, b] ; Fn ) . Thus there is a unique solution to the above integral equation and hence a unique solution to the initial value problem 8.4 on [a, b] .
212
CHAPTER 8. BASIC FUNCTION SPACES
Definition 8.1.8
For the integral equation, Z t x (t) = x0 + f (s, x (s)) ds c
one considers the Picard iterates. These are given as follows. x0 (t) ≡ x0 and Z t xn+1 (t) ≡ x0 + f (s, xn (s)) ds c
Thus letting F x (t) ≡ x0 +
Rt c
f (s, x (s)) ds, the Picard iterates are of the form F xn = xn+1 .
By Theorem 8.1.6, the Picard iterates converge in BC ([a, b] , Fn ) with respect to k·kγ and so they also converge in BC ([a, b] , Fn ) with respect to the usual norm k·k by Lemma 8.1.3.
8.2
Compactness in C (K, Rn )
Let K be a nonempty compact set in Rm and consider all the continuous functions defined on this set having values in Rn . It is desired to give conditions which will show that a subset of C (K, Rn ) is compact. First is an important observation about compact sets.
Proposition 8.2.1 Let K be a nonempty compact subset of Rm . Then for each ε > 0 r
there is a finite set of points {xi }i=1 such that K ⊆ ∪i B (xi , ε) . This finite set of points is called an ε net. If D1/k is this finite set of points corresponding to ε = 1/k, then ∪k D1/k is a dense countable subset of K. Proof: The last claim is obvious. Indeed, if B (x, r) ≡ {y ∈ K : |y − x| < r} , then consider D1/k where k1 < 13 r. Then the given ball must contain a point of D1/k since its center is within 1/k of some point of Dk . Now consider the first claime about the ε net. Pick x1 ∈ K. If B (x1 , ε) ⊇ K, stop. You have your ε net. Otherwise pick x2 ∈ / B (x1 , ε) . If K ⊆ B (x1 , ε) ∪ B (x2 , ε) , stop. You have found your ε net. Continue this way. Eventually, the process must stop since otherwise, you would have an infinite sequence of points with not limit point because they are all ε apart. This contradicts the compactness of K. Recall Lemma 8.1.4. If you consider C (K, Rn ) it is automatically equal to BC (K, Rn ) because of the extreme value theorem applied to x → |f (x)| for x ∈ K. Therefore, the space C (K, Rn ) is complete with respect to the norm defined there.
Definition 8.2.2
Let A be a set of functions in C (K, Rn ) . It is called equicontinuous if for every ε > 0 there exists δ > 0 such that if |x − y| < δ, then |f (x) − f (y)| < ε for all f ∈ C (K, Rn ). In words, the functions in A are uniformly continuous for all f at once. A set A ⊆ C (K, Rn ) is uniformly bounded if there is a number M such that max {|f (x)| : x ∈ K, f ∈ A} < M. The following is the Arzela Ascoli theorem . Actually, the converse is also true but I will only give the direction of most use in applications.
Theorem 8.2.3
Let A ⊆ C (K, Rn ) be equicontinuous and uniformly bounded. Then every sequence in A has a convergent subsequence converging to some g ∈ C (K, Rn ), the convergence taking place with respect to k·k∞ , the uniform norm. ∞
Proof: Let {fj }j=1 be a sequence of functions in A. Let D be a countable dense subset ∞ ∞ of K. Say D ≡ {dk }k=1 . Then {fj (d1 )}j=1 is a bounded set of points in Rn . By the Heine ∞ Borel theorem, there is a subsequence, denoted by f(j,1) (d1 ) j=1 which converges. Now ∞ apply what was just done with {fj } to f(j,1) and feature d2 instead of d1 . Thus f(j,2) j=1
8.2. COMPACTNESS IN C K, RN
213
is a subsequence of f(j,1) which converges at d2 . This new subsequence still converges at d1 thanks to Theorem 2.3.12. Continue this way. Thus we get the following f(1,1) f(1,2) f(1,3) .. .
f(2,1) f(2,2) f(2,3) .. .
f(3,1) f(3,2) f(3,3) .. .
··· ··· ···
converges at d1 converges at d1 , d2 converges at d1 , d2 , d3 .. .
f(1,l) .. .
f(2,l) .. .
f(3,l) .. .
···
converges at dj , j ≤ l .. .
∞ ∞ Each subsequence f(j,r) j=1 is a subsequence of the one above it f(j,r−1) j=1 and converges ∞ at dj for j ≤ r. Consider the subsequence f(r,r) r=1 the diagonal subsequence. Then ∞ ∞ ∞ f(r,r) r=j is a subsequence of f(i,j) i=1 and so f(r,r) (di ) r=j converges for each i ≤ j. ∞ Since j is arbitrary, this shows that f(r,r) r=1 converges at every point of D as r → ∞. ∞ From now on, denote this subsequence of the original sequence as {gk }k=1 . It has the property that it converges at every point of D. ∞ Claim: {gk }k=1 converges at every x ∈ K. Proof of claim: Let ε > 0 be given. Let δ go with ε/4 in the definition of equicontinuity. Then pick d ∈ D such that |d − x| < δ. Then |gk (x) − gl (x)|
≤
b. Thus f has a minimum at the point where ap−1 = b. But p − 1 = p/q and so, at this point ap = bq . Therefore, at this point, p p f (a) = ap + aq − aap−1 = ap − ap = 0. Therefore, f (a) ≥ 0 for all a ≥ 0 and it equals 0 exactly when ap = bq . This implies the following major result, Holder’s inequality.
Theorem 8.3.3
Let f, g be measurable and nonnegative functions. Then Z
Z f gdµ ≤ Ω
p
1/p Z
f dµ Ω
q
1/q
g dµ Ω
1/p 1/q R R Proof: If either Ω f p dµ or Ω g q dµ is 0, then there is nothing to show because R p 1/p R p if ΩR f dµ f dµ = 0 and you could let An ≡ {ω : f p (ω) ≥ 1/n} . Then Ω R= 0, then p p 0 = Ω f dµ ≥ An f dµ ≥ (1/n) µ (An ) and so µ (An ) = 0. Therefore, {ω : f (ω) 6= 0} = ∪∞ n=1 An and each of these setsRin the union has measure zero. It follows that {ω : f (ω) 6= 0} has measure zero. Therefore, Ω f gdµ = 0 and so indeed, there is nothing left to show. The 1/q R situation is the same if Ω g q dµ = 0. Thus assume both of the factors on the right 1/p 1/q R R in the inequality are nonzero. Then let A ≡ Ω f p dµ , B ≡ Ω g q dµ . Proposition 8.3.2, Z Z Z fp gq f g dµ ≤ dµ + dµ p q Ω A p Ω B q Ω AB R R 1 Ω f p dµ 1 Ω g q dµ = + p Ap q Bq 1 1 = + =1 p q
8.3. THE LP SPACES
215
Therefore, 1/p Z 1/q q f dµ g dµ
Z
Z
p
f gdµ ≤ AB = Ω
Ω
Ω
This makes it easy to prove the Minkowski inequality for the sum of two functions.
Theorem 8.3.4
Let f, g be two measurable functions with values in F. Then
Z
p
1/p
Z
|f + g| dµ
1/p
p
≤
|f | dµ
Ω
Z
p
1/p
|g| dµ
+
Ω
(8.6)
Ω
Proof: First of all, p
p−1
|f + g| = |f + g|
p−1
|f + g| ≤ |f + g|
(|f | + |g|)
Recall that p − 1 = p/q. Then, using Theorem 8.3.3, Z Z Z p p/q p/q |f + g| dµ ≤ |f + g| |f | dµ + |f + g| |g| dµ Ω
Ω
Ω
1/q Z 1/p Z 1/q Z 1/p p p p p |f + g| dµ |f | dµ + |f + g| dµ |g| dµ Ω Ω Ω Z Z Z ! Z
≤
1/q
1/p
p
p
|f + g| dµ
=
|f | dµ
1/p
p
|g| dµ
+
Ω
Ω
1/q R R p p If Ω |f + g| dµ = 0, then 8.6 is obvious so assume this is positive. Then divide Ω |f + g| dµ on both sides to obtain 8.6. By induction, you have p !1/p 1/p Z X n n Z X p f dµ ≤ |f | dµ i i Ω Ω k=1
k=1
Observation 8.3.5 If f, g are in Lp and α, β are scalars, then αf + βg ∈ Lp also. p
To see this, note that αf + βg is measurable thanks to Proposition 5.1.7. Is |αf + βg| in L1 ? Z 1/p Z 1/p Z 1/p p p p |αf + βg| dµ ≤ |αf | dµ + |βg| dµ Ω
Ω
Ω
Z =
|α| Ω
1/p Z 1/p p p |f | dµ + |β| |g| dµ 2−k . Then Z p fn 2−k µ (Bk ) ≤ (ω) − fnk (ω) dµ < 4−k k+1 Bk
and so µ (Bk ) < 2−k . Now if fnk (ω) fails to be a Cauchy sequence, then ω ∈ Bk for infinitely many k. In other words, ω ∈ ∩∞ n=1 ∪k≥n Bk ≡ B This measurable set B has measure zero because µ (B) ≤ µ (∪k≥n Bk ) ≤
∞ X
µ (Bk )
kf k∞ }) = 0 and if λ < kf k∞ , then µ ({ω : |f (ω)| > λ}) > 0. Proof: The second claim follows right away from the definition of kf k∞ . If it is not so, then µ ({ω : |f (ω)| > λ}) = 0 and so λ would be one of those constants M in the description of kf k∞ and kf k∞ would not really be the infimum of these numbers. Consider the other claim. By definition, 1 =0 µ ω : |f (ω)| > + kf k∞ n This is because there must be some essential upper bound M between kf k∞ and kf k∞ + n1 since otherwise, kf k∞ would not be the infimum. But 1 µ ({ω : |f (ω)| > kf k∞ }) = ∪∞ µ ω : |f (ω)| > + kf k n=1 ∞ n and each of the sets in the union has measure zero. Note that this implies that if kf k∞ = 0, then f = 0 a.e. so f is regarded as 0. Thus, to say that kf − gk∞ = 0 is to say that the two functions inside the norm are equal except for a set of measure zero, and the convention is that when this happens, we regard them as the same function. If α = 0 then kαf k∞ = 0 = 0 kf k∞ . If α 6= 0, then M ≥ |f (ω)| implies |α| M ≥ |αf (ω)| . In particular, |α| kf k∞ + n1 ≥ |αf (ω)| for all ω not in the union of the sets of measure zero corresponding to each kf k∞ + n1 . Thus there is a set of measure zero N such that for ω ∈ / N, 1 |α| kf k∞ + ≥ |αf (ω)| for all n n
1 Therefore, for ω ∈ / N, |α| kf k∞ ≥ kαf k∞ . This implies that kf k∞ = α1 αf ∞ ≤ |α| kαf k∞ and so kαf k∞ ≥ |α| kf k∞ also. This shows that this acts like a norm relative to multiplication by scalars. What of the triangle inequality? Let Mn ↓ kf k∞ and Nn ↓ kgk∞ . Thus for each n, there is an exceptional set of measure zero such that off this set Mn ≥ |f (ω)| and a similar condition holding for g and Nn . Let N be the union of all the exceptional sets for f and g for each n. Then for ω ∈ / N, the following holds for all ω ∈ /N Mn + Nn ≥ |f (ω)| + |g (ω)| ≥ |f (ω) + g (ω)| So take a limit of both sides and find that kf k∞ + kgk∞ ≥ |f (ω) + g (ω)| for all ω off a set of measure zero. Therefore, kf k∞ + kgk∞ ≥ kf + gk∞
218
CHAPTER 8. BASIC FUNCTION SPACES
Theorem 8.3.10
L∞ (Ω) is complete.
Proof: Let {fn } be a Cauchy sequence. Let N be the union of all sets where it is not the case that |fn (ω) − fm (ω)| ≤ kfn − fm k∞ . By Proposition 8.3.9, there is such an exceptional set Mmn for each choice of m, n. Thus N is the countable union of these sets ∞ of measure zero. Therefore, for ω ∈ / N, {fn (ω)}n=1 is a Cauchy sequence and so we let g (ω) = 0 if ω ∈ / N and g (ω) ≡ limn→∞ fn (ω) . Thus g is measurable. Also, for ω ∈ / N, |g (ω) − fn (ω)| = lim |fm (ω) − fn (ω)| ≤ lim sup kfm − fn k∞ < ε m→∞
m→∞
provided n is sufficiently large. This shows |g (ω)| ≤ kfn k∞ + ε, ω ∈ / N so kgk∞ < ∞. Also it shows that there is a set of measure zero N such that for all ω ∈ / N, for any ε > 0, |g (ω) − fn (ω)| < ε which means kg − fn k∞ ≤ ε. Since ε is arbitrary, this shows that limn→∞ kg − fn k∞ = 0.
8.4
Approximation Theorems
First is a significant result on approximating with simple functions in Lp .
Theorem 8.4.1
Let f ∈ Lp (Ω) for p ≥ 1. Then for each ε > 0 there is a simple
function s such that kf − skp ≤ ε Proof: It suffices to consider the case where f ≥ 0 because you can then apply what is shown to the positive and negative parts of the real and imaginary parts of f to get the general case. Thus, suppose f ≥ 0 and in Lp (Ω) . By Theorem 5.1.9, there exists a p p sequence of simple functions increasing to f. Then |f (ω) − sn (ω)| ≤ |f (ω)| . This is a suitable dominating function. Then by the dominated convergence theorem, Z p 0 = lim |f (ω) − sn (ω)| dµ n→∞
Ω
which establishes the desired conclusion unless p = ∞. Use Proposition 8.3.9 to get a set of measure zero N such that off this set, |f (ω)| ≤ kf k∞ . Then consider f XN C . It is a measurable and bounded function so by Theorem 5.1.9, there is an increasing sequence of simple functions {sn } converging uniformly to this function. Hence, for n large enough, kf − sn k∞ < ε.
Theorem 8.4.2
Let µ be a regular Borel measure on Rn and f ∈ Lp (Rn ). Then for each p ≥ 1, p 6= ∞, there exists g a continuous function which is zero off a compact set such that kf − gkp < ε.
Proof: Without loss of generality, assume f ≥ 0. First suppose that f is 0 off some ball R p p B (0, R). There exists a simple function 0 ≤ s ≤ f such that |f − s| dµ < (ε/2) . Thus it suffices to show the existence of a continuous function h which is zero off a compact set 1/p R p which satisfies |h − s| dµ < ε/2. Let s (x) =
m X i=1
ci XEi (x) , Ei ⊆ B (0, R)
8.5. MAXIMAL FUNCTIONS AND FUNDAMENTAL THEOREM OF CALCULUS 219 where Ei is in Fp . Thus each Ei is bounded. By regularity, there exist compact sets Ki and Pm 1/p open sets Vi such that Ki ⊆ Ei ⊆ Vi ⊆ B (0, R) and i=1 (cpi µ (Vi \ Ki )) < ε/2. Now define dist x, V C hi (x) ≡ dist (x, K) + dist (x, V C ) Pm Thus hi equals zero off a compact set and it equals 1 on Ki and 0 off Vi . Let h ≡ i=1 ci hi . Thus 0 ≤ h ≤ max {ci , i = 1, · · · , m} . Then Z p |ci hi − ci XEi | dµ ≤ cpi XVi −Ki ≤ cpi µ (Vi \ Ki ) It follows that, from the Minkowski inequality, p !1/p Z X ci hi dµ f −
Z
1/p
p
|f − s| dµ
≤
+
i
ε + 2
≤
Z
X
p !1/p Z X (8.7) ci hi dµ s − i !p !1/p
|ci XEi − ci hi |
dµ
Z 1/p ε X p + |ci XEi − ci hi | dµ 2 i ε X p 1/p + (ci µ (Vi \ Ki )) 2 i ε ε + =ε 2 2
≤ ≤
0. Next, for each x ∈ [M f > ε] there exists a ball Bx = B (x,rx ) with rx ≤ 1 and Z −1 mp (Bx ) |f | dmp > ε. (8.12) B(x,rx )
Let F be this collection of balls. By the Vitali covering theorem, there is a collection of disjoint balls G such that if each ball in G is enlarged making the center the same but the radius 5 times as large, then the corresponding collection of enlarged balls covers [M f > ε] . ∞ ˆi . By separability, G is countable, say {Bi }i=1 and the enlarged balls will be denoted as B Then from 8.12, X X ˆi ≤ 5p mp ([M f > ε]) ≤ mp B mp (Bi ) i
i
Z 5p X |f | dmp ≤ 5p ε−1 kf k1 ε i Bi
≤ This proves claim 1. Claim 2: If g ∈ Cc (Rp ), then 1 r→0 mp (B (x, r))
Z |g (y) − g (x)| dmp (y) = 0
lim
and
B(x,r)
1 r→0 mp (B (x,r))
Z
lim
g (y) dmp (y) = g (x).
(8.13)
B(x,r)
Proof: Since g is continuous at x, whenever r is small enough, Z Z 1 1 |g (y) − g (x)| dmp (y) ≤ ε dmp (y) = ε. mp (B (x, r)) B(x,r) mp (B (x,r)) B(x,r) 8.13 follows from the above and the triangle inequality. This proves the claim. Now let g ∈ Cc (Rp ). Then from the above observations about continuous functions in Claim 2, " #! Z 1 mp x : lim sup |f (y) − f (x)| dmp (y) > ε (8.14) r→0 mp (B (x, r)) B(x,r)
8.5. MAXIMAL FUNCTIONS AND FUNDAMENTAL THEOREM OF CALCULUS 221 #! Z ε 1 mp |f (y) − g (y)| dmp (y) > x : lim sup 2 r→0 mp (B (x, r)) B(x,r) #! " Z ε 1 +mp |g (y) − g (y)| dmp (y) > x : lim sup 2 r→0 mp (B (x, r)) B(x,r) h ε i +mp x : |g (x) − f (x)| > . 2 h h ε i ε i ≤ mp M (f − g) > + mp |f − g| > (8.15) 2 2 Z h ε ε i kf − gk1 ≥ |f − g| dmp ≥ mp |f − g| > 2 2 [|f −g|> 2ε ] "
≤
Now
and so using Claim 1, 8.15 and hence 8.14 is dominated by Z 2 5p |f − g| dmp . + ε ε But by Theorem 8.4.2, g can be chosen to make the above as small as desired. Hence 8.14 is 0. " #! Z 1 mp lim sup |f (y) − f (x)| dmp (y) > 0 r→0 mp (B (x, r)) B(x,r) " #! Z ∞ X 1 1 ≤ mp lim sup |f (y) − f (x)| dmp (y) > =0 k r→0 mp (B (x, r)) B(x,r) k=1
By completeness of mp this implies " # Z 1 lim sup |f (y) − f (x)| dmp (y) > 0 r→0 mp (B (x, r)) B(x,r) is a set of mp measure zero. The following corollary is the main result referred to as the Lebesgue Differentiation theorem.
Definition 8.5.2
f ∈ L1loc (Rp , mp ) means f XB is in L1 (Rn , mp ) whenever B is a
ball.
Corollary 8.5.3 If f ∈ L1loc (Rp , mp ), then for a.e.x, 1 lim r→0 mp (B (x, r))
Z |f (y) − f (x)| dmp (y) = 0 .
(8.16)
B(x,r)
In particular, for a.e.x, 1 r→0 mp (B (x, r))
Z
lim
f (y) dmp (y) = f (x) B(x,r)
Proof: If f is replaced by f XB(0,k) then the conclusion 8.16 holds for all x ∈ / Fk where Fk is a set of mp measure 0. Letting k = 1, 2, · · · , and F ≡ ∪∞ k=1 Fk , it follows that F is a set of measure zero and for any x ∈ / F , and k ∈ {1, 2, · · · }, 8.16 holds if f is replaced by f XB(0,k) . Picking any such x, and letting k > |x| + 1, this shows Z 1 lim |f (y) − f (x)| dmp (y) r→0 mp (B (x, r)) B(x,r)
222
CHAPTER 8. BASIC FUNCTION SPACES 1 r→0 mp (B (x, r))
Z
f XB(0,k) (y) − f XB(0,k) (x) dmp (y) = 0.
= lim
B(x,r)
The last claim holds because Z 1 f (y) dmp (y) f (x) − mp (B (x, r)) B(x,r) Z 1 ≤ |f (y) − f (x)| dmp (y) mp (B (x, r)) B(x,r)
Definition 8.5.4
Let E be a measurable set. Then x ∈ E is called a point of density
if lim
r→0
mp (B (x, r) ∩ E) =1 mp (B (x, r))
Proposition 8.5.5 Let E be a measurable set. Then mp a.e. x ∈ E is a point of density. Proof: This follows from letting f (x) = XE (x) in Corollary 8.5.3.
8.6
A Useful Inequality
There is an extremely useful inequality. To prove this theorem first consider a special case of it in which technical considerations which shed no light on the proof are excluded.
Lemma 8.6.1 Let (X, S, µ) and (Y, F, λ) be finite complete measure spaces and let f be µ × λ measurable and uniformly bounded. Then the following inequality is valid for p ≥ 1. Z Z
p1
p
X
Z
Z
dµ ≥
|f (x, y)| dλ
( Y
Y
|f (x, y)| dµ)p dλ
p1 .
(8.17)
X
Proof: Since f is bounded and µ (X) , λ (Y ) < ∞, p1 Z ( |f (x, y)|dµ)p dλ < ∞.
Z Y
X
Let
Z |f (x, y)|dµ.
J(y) = X
Note there is no problem in writing this for a.e. y because f is product measurable. Then by Fubini’s theorem, p Z Z Z Z p−1 |f (x, y)|dµ dλ = J(y) |f (x, y)|dµ dλ Y X X ZY Z = J(y)p−1 |f (x, y)|dλ dµ X
Y
Now apply Holder’s inequality in the last integral above and recall p − 1 = pq . This yields p |f (x, y)|dµ dλ
Z Z Y
X
Z Z ≤
p
q1 Z
p
|f (x, y)| dλ
J(y) dλ X
Y
Z =
dµ
Y p
q1 Z Z
p
|f (x, y)| dλ
J(y) dλ Y
p1
X
Y
p1 dµ
8.7. EXERCISES
223 q1 Z Z p1 Z ( |f (x, y)|dµ)p dλ dµ. |f (x, y)|p dλ
Z = Y
X
X
(8.18)
Y
Therefore, dividing both sides by the first factor in the above expression, Z Z
p p1 p1 Z Z p ≤ dµ. |f (x, y)|dµ dλ |f (x, y)| dλ
Y
X
X
(8.19)
Y
Note that 8.19 holds even if the first factor of 8.18 equals zero. Now consider the case where f is not assumed to be bounded and where the measure spaces are σ finite.
Theorem 8.6.2 Let (X, S, µ) and (Y, F, λ) be σ-finite measure spaces and let f be product measurable. Then the following inequality is valid for p ≥ 1. Z Z
p1
p
dµ ≥
|f (x, y)| dλ X
p1 Z p ( |f (x, y)| dµ) dλ .
Z Y
Y
(8.20)
X
Proof: Since the two measure spaces are σ finite, there exist measurable sets, Xm and Yk such that Xm ⊆ Xm+1 for all m, Yk ⊆ Yk+1 for all k, and µ (Xm ) , λ (Yk ) < ∞. Now define f (x, y) if |f (x, y)| ≤ n fn (x, y) ≡ n if |f (x, y)| > n. Thus fn is uniformly bounded and product measurable. By the above lemma, Z
Z
p
p1
|fn (x, y)| dλ Xm
Z
Z
dµ ≥
p
|fn (x, y)| dµ) dλ
(
Yk
Yk
p1 .
(8.21)
Xm
Now observe that |fn (x, y)| increases in n and the pointwise limit is |f (x, y)|. Therefore, using the monotone convergence theorem in 8.21 yields the same inequality with f replacing fn . Next let k → ∞ and use the monotone convergence theorem again to replace Yk with Y . Finally let m → ∞ in what is left to obtain 8.20. Note that the proof of this theorem depends on two manipulations, the interchange of the order of integration and Holder’s inequality. Note that there is nothing to check in the case of double sums. Thus if aij ≥ 0, it is always the case that !p 1/p
X X j
aij
1/p
≤
X
i
X
i
apij
j
because the integrals in this case are just sums and (i, j) → aij is measurable.
8.7
Exercises
1. Establish the inequality kf gkr ≤ kf kp kgkq whenever
1 r
=
1 p
+ 1q .
2. Let (Ω, S, µ) be counting measure on N. Thus Ω = N and S = P (N) with µ (S) = number of things in S. Let 1 ≤ p ≤ q. Show that in this case, L1 (N) ⊆ Lp (N) ⊆ Lq (N) . R Hint: This is real easy if you consider what Ω f dµ equals. How are the norms related? 3. Consider the function, f (x, y) =
xp−1 py
q−1
+ yqx for x, y > 0 and
that f (x, y) ≥ 1 for all such x, y and show this implies xy ≤
1 1 p + q = 1. p yq x p + q .
Show directly
224
CHAPTER 8. BASIC FUNCTION SPACES
4. Give an example of a sequence of functions in Lp (R) which converges to zero in Lp but does not converge pointwise to 0. Does this contradict the proof of the theorem that Lp is complete? 5. Let φ : R → R be convex. This means φ(λx + (1 − λ)y) ≤ λφ(x) + (1 − λ)φ(y) whenever λ ∈ [0, 1]. Verify that if x < y < z, then φ(z)−φ(x) z−x
φ(y)−φ(x) y−x
≤
φ(z)−φ(y) z−y
and that
φ(z)−φ(y) . z−y
≤ Show if s ∈ R there exists λ such that φ(s) ≤ φ(t) + λ(s − t) for all t. Show that if φ is convex, then φ is continuous. 6. ↑ Prove Jensen’s R inequality. R If φ : R → R is convex,Rµ(Ω) = 1, and f : Ω → R is in L1 (Ω), then φ( Ω f du) ≤ Ω φ(f )dµ. Hint: Let s = Ω f dµ and use Problem 5. R1 R∞ 7. B(p, q) = 0 xp−1 (1 − x)q−1 dx, Γ(p) = 0 e−t tp−1 dt for p, q > 0. The first of these is called the beta function, while the second is the gamma function. Show a.) Γ(p + 1) = pΓ(p); b.) Γ(p)Γ(q) = B(p, q)Γ(p + q). Rx 8. Let f ∈ Cc (0, ∞) and define F (x) = x1 0 f (t)dt. Show kF kLp (0,∞) ≤
p kf kLp (0,∞) whenever p > 1. p−1
Hint: Argue Rthere is no loss of generality in assuming f ≥ 0 and then assume this is ∞ so. Integrate 0 |F (x)|p dx by parts as follows: Z
∞
show = 0
Z z }| { p ∞ F dx = xF |0 − p p
0
∞
xF p−1 F 0 dx.
0
Now show xF 0 = f − F and use this in the last integral. Complete the argument by using Holder’s inequality and p − 1 = p/q. The measure is one dimensional Lebesgue measure in this problem. 9. ↑ Now suppose f ∈ Lp (0, ∞), p > 1, and f not necessarily in Cc (0, ∞). Show that Rx F (x) = x1 0 f (t)dt still makes sense for each x > 0. Show the inequality of Problem 8 is still valid. This inequality is called Hardy’s inequality. Hint: To show this, use the above inequality along with the density of Cc (0, ∞) in Lp (0, ∞). 10. Suppose f, g ≥ 0. When does equality hold in Holder’s inequality? 11. Let α ∈ (0, 1]. We define, for X a compact subset of Rp , C α (X; Rn ) ≡ {f ∈ C (X; Rn ) : ρα (f ) + kf k ≡ kf kα < ∞} where kf k ≡ sup{|f (x)| : x ∈ X} and
|f (x) − f (y)| : x, y ∈ X, x 6= y}. α |x − y| Show that (C α (X; Rn ) , k·kα ) is a complete normed linear space. This is called a Holder space. What would this space consist of if α > 1? ρα (f ) ≡ sup{
α n p 12. Let {fn }∞ n=1 ⊆ C (X; R ) where X is a compact subset of R and suppose
kfn kα ≤ M for all n. Show there exists a subsequence, nk , such that fnk converges in C (X; Rn ). We say the given sequence is precompact when this happens. (This also shows the embedding of C α (X; Rn ) into C (X; Rn ) is a compact embedding.) Hint: You might want to use the Ascoli Arzela theorem.
8.7. EXERCISES
225
13. Let f :R × Rn → Rn be continuous and bounded and let x0 ∈ Rn . If x : [0, T ] → Rn and h > 0, let
x0 if s ≤ h, x (s − h) , if s > h.
τ h x (s) ≡ For t ∈ [0, T ], let
t
Z xh (t) = x0 +
f (s, τ h xh (s)) ds. 0
Show using the Ascoli Arzela theorem that there exists a sequence h → 0 such that xh → x n
in C ([0, T ] ; R ). Next argue Z x (t) = x0 +
t
f (s, x (s)) ds 0
and conclude the following theorem. If f :R × Rn → Rn is continuous and bounded, and if x0 ∈ Rn is given, there exists a solution to the following initial value problem. x0 x (0)
= f (t, x) , t ∈ [0, T ] = x0 .
This is the Peano existence theorem for ordinary differential equations. 14. Suppose f ∈ L∞ ∩ L1 . Show limp→∞ kf kLp = kf k∞ . Hint: Z p p (kf k∞ − ε) µ ([|f | > kf k∞ − ε]) ≤ |f | dµ ≤ [|f |>kf k∞ −ε] Z Z Z p p−1 p−1 |f | dµ = |f | |f | dµ ≤ kf k∞ |f | dµ. Now raise both ends to the 1/p power and take lim inf and lim sup as p → ∞. You should get kf k∞ − ε ≤ lim inf kf kp ≤ lim sup kf kp ≤ kf k∞ 15. Suppose µ(Ω) < ∞. Show that if 1 ≤ p < q, then Lq (Ω) ⊆ Lp (Ω). Hint Use Holder’s inequality. 1 2 2 1 16. Show √ L (R) * L (R) and L (R) * L (R) if Lebesgue measure is used. Hint: Consider 1/ x and 1/x.
17. Suppose that θ ∈ [0, 1] and r, s, q > 0 with 1 θ 1−θ = + . q r s show that
Z (
|f |q dµ)1/q ≤ ((
Z
|f |r dµ)1/r )θ ((
Z
|f |s dµ)1/s )1−θ.
If q, r, s ≥ 1 this says that kf kq ≤ kf kθr kf k1−θ . s Using this, show that ln kf kq ≤ θ ln (kf kr ) + (1 − θ) ln (kf ks ) . Hint:
Now note that 1 =
Z θq r
+
|f |q dµ =
q(1−θ) s
Z
|f |qθ |f |q(1−θ) dµ.
and use Holder’s inequality.
226
CHAPTER 8. BASIC FUNCTION SPACES
18. Suppose f is a function in L1 (R) and f is infinitely differentiable. Is f 0 ∈ L1 (R)? Hint: What if φ ∈ Cc∞ (0, 1) and f (x) = φ (2n (x − n)) for x ∈ (n, n + 1) , f (x) = 0 if x < 0?
Chapter 9
Change of Variables Lemma 9.0.1 Every open set in Rp is the countable disjoint union of half open boxes of the form p Y
(ai , ai + 2−k ]
i=1 −k
where ai = l2 for some integers, l, k where k ≥ m. If Bm denotes this collection of half open boxes, then every box of Bm+1 is contained in a box of Bm or equals a box of Bm . Proof: Let m ∈ N be given and let k ≥ m. Ck = {All half open boxes
p Y
(ai , ai + 2−k ] where
i=1 −k
ai = l2
for some integer l.}
Thus Ck consists of a countable disjoint collection of boxes whose union is Rp . This is sometimes called a tiling of Rp . Think of tiles on the floor of a bathroom and you will get √ the idea. Note that each box has diameter no larger than 2−k p. This is because if we have two points, p Y x, y ∈ (ai , ai + 2−k ], i=1 −k
then |xi − yi | ≤ 2
. Therefore, |x − y| ≤
p X
−k 2
!1/2
2
√ = 2−k p.
i=1
Also, a box of Ck+1 is either contained in a box of Ck or it has empty intersection with this box of Ck . Let U be open and let B1 ≡ all sets of C1 which are contained in U . If B1 , · · · , Bk have been chosen, Bk+1 ≡ all sets of Ck+1 contained in U \ ∪ ∪ki=1 Bi . Let B∞ = ∪∞ i=1 Bi . We claim ∪B∞ = U . Clearly ∪B∞ ⊆ U because every box of every Bi is contained in U . If p ∈ U , let k be the smallest integer such that p is contained in a box from Ck which is also a subset of U . Thus p ∈ ∪Bk ⊆ ∪B∞ . Hence B∞ is the desired countable disjoint collection of half open boxes whose union is U . The last claim follows from the construction. 227
228
9.1
CHAPTER 9. CHANGE OF VARIABLES
Lebesgue Measure and Linear Transformations
Lemma 9.1.1 Let A : Rp → Rp be linear and invertible. Then A maps open sets to open sets. Proof: This follows from the observation that if B is any linear transformation, then B is continuous. Indeed, it is realized by matrix multiplication and so it is clear that if xn → x, then Bxn → Bx. Now it follows that A−1 is continuous. Let U be open. Let y ∈ A (U ) . Then is y an interior point of A (U )? if not, there exists yn → y where yn ∈ / A (U ). But then A−1 yn → A−1 y ∈ U. Since U is open, A−1 yn ∈ U for all n large enough and so yn ∈ A (U ) after all. Thus y is an interior point of A (U ) showing that A (U ) is open. First is a general result.
Proposition 9.1.2 Let h : U → Rp is continuous where U is an open subset of Rp . Also suppose h is differentiable on H ⊆ U where H is Lebesgue measurable. Then if E is a Lebesgue measurable set contained in H, then h (E) is also Lebesgue measurable. Also if N ⊆ H is a set of measure zero, then h (N ) is a set of measure zero. In particular, a linear function A maps measurable sets to measurable sets. Proof: Consider the second claim first. Let N be a set of measure zero contained in H and let Nk ≡ {x ∈ N : kDh (x)k ≤ k} There is an open set V ⊇ Nk such that mp (V ) < ε. For each x ∈ Nk , there is a ball Bx ˆx ⊆ V, where Bx = B (x, rx ) , B ˆx = B (x, 5rx ) centered at x with radius 5rx < 1 such that B ˆ and for y ∈ Bx , h (y) ∈ h (x) + Dh (x) B (0, 5rx ) + B (0, ε5rx ) Thus for kvk ∈ B (0, 5rx ) and kuk < εrx , Dh (x) v + u
∈
B (0, kDh (x)k 5rx ) + B (0, εrx )
⊆ B (0, (kDh (x)k + ε) 5rx ) It follows that h (Bx ) ⊆ h (x) + B (0, (kDh (x)k + ε) 5rx ) ⊆ B (h (x) , (k + ε) 5rx ) p mp (h (Bx )) ≤ (k + ε) mp (B (x, 5rx )) Also let rx be so small that B (x,5rx ) ⊆ V . Then, the balls B (x,rx ) cover Nk and so by ˆi the the Vitali covering theorem, there are disjoint balls Bi = B (xi , rxi ) such that for B ˆk . Thus ball with same center and 5 times the radius as Bi , Nk ⊆ ∪k B X ˆk ˆk mp (h (Nk )) ⊆ mp ∪k h B ≤ mp h B k
≤
X k p
p
ˆk = (k + ε) mp B
X
p
(k + ε) 5p mp (Bk )
k p
p
p
≤ 5 (k + ε) mp (V ) < ε5 (k + ε)
Since ε > 0 is arbitrary, it follows that mp (h (Nk )) = 0 and so h (Nk ) is measurable and has measure zero. Now let k → ∞ to conclude that mp (h (N )) = 0. Now the other claim is shown as follows. By Proposition 7.5.2, if E is Lebesgue measurable, E ⊆ H, there is an Fσ set F contained in E such that mp (E \ F ) = 0. Then h (F ) is clearly measurable because h is continuous and F is a countable union of compact sets. Thus h (E) = h (F ) ∪ h (E \ F ) and the second was just shown measurable while the first is an Fσ set so it is actually a Borel set.
9.1. LEBESGUE MEASURE AND LINEAR TRANSFORMATIONS
229
From Linear algebra, if A is an invertible linear transformation , it is the composition of finitely many invertible linear transformations which are of the following form. T T x1 · · · xr · · · xs · · · xp → x1 · · · xr · · · xs · · · xp x1 x1
···
xr
··· ···
xr
···
xs
···
T
xp xp
→
T
···
x1
→
x1
···
T
cxr
···
xp
xr
···
xs + xr
, c 6= 0 ···
xp
T
where these are the actions obtained by multiplication by elementary matrices. Denote these specialQ linear transformations by E (r ↔ s) , E (cr) , E (s → s + r) . p Let R = i=1 (ai , bi ) . Then it is easily seen that mp (E (r ↔ s) (R)) = mp (R) = |det (E (r ↔ s))| mp (R) since this transformation just switches two sides of R. mp (E (cr) (R)) = |c| mp (R) = |det (E (cr))| mp (R) since this transformation just magnifies one side, multiplying it by c. The other linear transformation which represents a sheer is a little harder. However, Z mp (E (s → s + r) (R)) = dmp E(s→s+r)(R) Z Z Z Z = ··· XE(s→s+r)(R) dxs dxr dxp1 · · · dxpp−2 R
R
R
R
Now recall Theorem 6.8.5 which says you can integrate using the usual Riemann integral when the function involved is Borel. Thus the above becomes Z bp Z bp1 Z br Z bs +xr p−2 ··· dxs dxr dxp1 · · · dxpp−2 app−2
ap1
ar
as +xr
= mp (R) = |det (E (s → s + r))| mp (R) Recall that when a row (column) is added to another row (column), the determinant of the resulting matrix is unchanged.
Lemma 9.1.3 Let L be any of the above elementary linear transformations. Then mp (L (F )) = |det (L)| mp (F ) for any Borel set F . Also L (F ) is Lebesgue measurable if F is Lebesgue measurable. If F is Borel, then so is L (F ). Qp Proof: Let Rk = i=1 (−k, k) . Let G be those Borel sets F such that L (F ) is Borel and mp (L (F ∩ Rk )) = |det (L)| mp (F ∩ Rk ) (9.1) Letting K be the open rectangles, it follows from the above discussion that the pi system K is in G. It is also obvious that if Fi ∈ G the Fi being disjoint, then since L is one to one, mp (L (∪∞ i=1 Fi ∩ Rk ))
=
∞ X
mp (L (Fi ∩ Rk )) = |det (L)|
i=1
=
∞ X
mp (Fi ∩ Rk )
i=1
|det (L)| mp (∪∞ i=1 Fi ∩ Rk )
Thus G is closed with respect to countable disjoint unions. If F ∈ G then mp L F C ∩ Rk + mp (L (F ∩ Rk )) = mp (L (Rk ))
230
CHAPTER 9. CHANGE OF VARIABLES mp L F C ∩ Rk mp L F C ∩ Rk
+ |det (L)| mp (F ∩ Rk ) = |det (L)| mp (Rk ) = |det (L)| mp (Rk ) − |det (L)| mp (F ∩ Rk ) = |det (L)| mp F C ∩ Rk
It follows that G is closed with respect to complements also. Therefore, G = σ (K) = B (Rp ). Now let k → ∞ in 9.1 to obtain the desired conclusion.
Theorem 9.1.4
Let L be a linear transformation which is invertible. Then for any Borel F, L (F ) is Borel and mp (L (F )) = |det (L)| mp (F ) More generally, if L is an arbitrary linear transformation, then for any F ∈ Fp , L (F ) ∈ Fp and the above formula holds. Proof: From linear algebra, there are Li each elementary such that L = L1 ◦L2 ◦· · ·◦Ls . By Proposition 9.1.2, each Li maps Borel sets to Borel sets. Hence, using Lemma 9.1.3, mp (L (F ))
= |det (L1 )| mp (L2 ◦ · · · ◦ Ls (F )) = |det (L1 )| |det (L2 )| mp (L3 ◦ · · · ◦ Ls (F )) s Y = ··· = |det (Li )| mp (F ) = |det (L)| mp (F ) i=1
the last claim from properties of the determinant. Next consider the general case. First I clam that if N has measure 0 then so does L (N ) and if F ∈ Fp , then so is L (F ) ∈ Fp for any linear L. This follows from Proposition 9.1.2 since L is differentiable. By Proposition 7.5.2, if E ∈ Fp , then for L invertible, there is an Fσ set F and a Gδ set G such that mp (G \ F ) = 0 and F ⊆ E ⊆ G. Then for L invertible, mp (L (F )) ≤ mp (L (E)) ≤ mp (L (G)) and so, since F, G are Borel, |det (L)| mp (F ) ≤ mp (L (E)) ≤ |det (L)| mp (G) = |det (L)| mp (F ) = |det (L)| mp (E) and so all the inequalities are equal signs. Hence, mp (L (E)) = |det (L)| mp (E). If L−1 does not exist and E ∈ Fp , then there are elementary matrices Lk such that L1 ◦L2 ◦· · ·◦Lm ◦L maps Rp to {x ∈ Rp : xp = 0} , a set of mp measure zero. By completeness of Lebesgue measure, L1 ◦ L2 ◦ · · · ◦ Lm ◦ L (E) and L (E) are both measurable and m Y
|det (Li )| mp (L (E)) = mp (L1 ◦ L2 ◦ · · · ◦ Lm ◦ L (E)) = 0
i=1
so in this case, mp (L (E)) = 0 = |det (L)| mp (0). Thus the formula holds regardless. For A, B nonempty sets in Rp , A + B denotes all vectors of the form a + b where a ∈ A and b ∈ B. Thus if Q is a linear transformation, Q (A + B) = QA + QB The following proposition uses standard linear algebra to obtain an interesting estimate on the measure of a set. It is illustrated by the following picture.
9.2. CHANGE OF VARIABLES NONLINEAR MAPS
231
D
QD
In the above picture, the slanted set is of the form B + D where B is a ball and the un-slanted version is obtained by doing the linear transformation Q to the slanted set. The reason the two look the same is that the Q used will preserve all distances. It will be an orthogonal linear transformation.
Proposition 9.1.5 Let the norm be the standard Euclidean norm and let V be a k dimensional subspace of Rp where k < p. Suppose D is a Fσ subset of V which has diameter d. Then p−1 mp (D + B (0, r)) ≤ 2p (d + r) r Proof: Let {v1 , · · · , vk } be an orthonormal basis for V . Enlarge to an orthonormal basis of all of Rp using the Gram Schmidt process to obtain {v1 , · · · , vk , vk+1 , · · · , vp } . Now define an orthogonal transformation Q by Qvi = ei . Thus QT Q = I and Q preserves all lengths. Thus also det (Q) = 1. Then Q (D + B (0, r)) = QD + B (0, r) where the diameter of QD is the same as the diameter of D and QB (0, r) = B (0,r) because Q preserves lengths in the Euclidean norm. This is why we use this norm rather than some other. Therefore, from the definition of the Lebesgue measure and the above result on the magnification factor, mp (D + B (0, r)) = det (Q) mp (D + B (0, r)) = mp (QD + B (0, r)) and this last is no larger than (2d + 2r)
9.2
p−1
p−1
2r = 2p (d + r)
r.
Change of Variables Nonlinear Maps
The very interesting approach given here follows Rudin [61]. Now recall Lemma 3.9.1 which is stated here for convenience.
Lemma 9.2.1 Let g be continuous and map B (p, r) ⊆ Rn to Rn . Suppose that for all x ∈ B (p, r), |g (x) − x| < εr Then it follows that g B (p, r) ⊇ B (p, (1 − ε) r) Now suppose that U ⊆ Rp is open, h : U → Rp is continuous, and mp (h (U \ H)) = 0 where H ⊆ U and H is Lebesgue measurable. Suppose also that h is one to one and differentiable on H. Define for Lebesgue measurable E ⊆ U λ (E) ≡ mp (h (E ∩ H)) , Then it is clear that λ is indeed a measure on the σ algebra of Lebesgue measurable subsets of U . Note that mp (h (E)) − mp (h (E ∩ H)) ≤ ≤
mp (h (E ∩ H)) + mp (h (E \ H)) − mp (h (E ∩ H)) mp (h (U \ H)) = 0
232
CHAPTER 9. CHANGE OF VARIABLES
Thus one could just as well let λ (E) ≡ mp (h (E)). Since h is one to one on H, this along with Proposition 9.1.2 implies that λ mp since if mp (E) = 0, then h (E ∩ H) also has measure zero. Also λ and mp are finite on closed balls so both are σ finite. Therefore, for measurable E ⊆ U, it follows from the Radon Nikodym theorem Corollary 6.10.16 that there is a real valued, nonnegative, measurable function f in L1 (K) for any compact set K such that Z Z λ (E) = mp (h (E ∩ H)) = XE f (x) dmp = XE f (x) dmp (9.2) U
So what is f (x)? To begin with, assume Dh (x) −1 Dh (x) exists as needed,
−1
exists. By differentiability, and using
h (B (x, r)) − h (x) ⊆ Dh (x) B (0,r) + Dh (x) B (0, εr) ⊆ Dh (x) (B (0, r (1 + ε))) for all r small enough. Therefore, by Theorem 9.1.4 and translation invariance of Lebesgue measure, mp (h (B (x, r))) ≤ mp (Dh (x) (B (0, r (1 + ε)))) = |det (Dh (x))| mp (B (0, r (1 + ε))) Also, for |v| < r h (x + v) − h (x) −1
Dh (x)
(h (x + v) − h (x))
=
Dh (x) v + o (v)
= v + o (v)
−1
and so if g (v) ≡ Dh (x) (h (x + v) − h (x)) , |g (v) − v| < εr provided r is small enough. Therefore, from Lemma 9.2.1, for small enough r, Dh (x)
−1
(h (x + B (0,r)) − h (x)) ⊇ B (0, (1 − ε) r)
Thus h (B (x, r)) ⊇ h (x) + Dh (x) B (0, (1 − ε) r) and so, using Theorem 9.1.4 again, mp (h (B (x, r))) ≥ |det (Dh (x))| mp (B (0, r (1 − ε))) for r small enough. Thus, since Z mp (h (B (x, r))) = mp (h (B (x, r) ∩ H)) = λ (B (x, r)) =
f (x) dmp , B(x,r)
it follows that Z |det (Dh (x))| mp (B (0, r (1 − ε))) ≤
f (x) dmp ≤ |det (Dh (x))| mp (B (0, r (1 + ε))) B(x,r)
for r small enough. Now divide by mp (B (x, r)) and use the fundamental theorem of calculus Corollary 8.5.3 to find that for ε small ε > 0, p
p
|det (Dh (x))| (1 − ε) ≤ f (x) ≤ |det (Dh (x))| (1 + ε) a.e. Letting εk → 0 and picking a set of measure zero for each εk , it follows that off a set of measure zero f (x) = |det (Dh (x))| . −1 If Dh (x) does not exist, then you have h (B (x, r)) − h (x) ⊆ Dh (x) B (0,r) + B (0,εr)
9.2. CHANGE OF VARIABLES NONLINEAR MAPS
233
and Dh (x) maps into a bounded subset of a p − 1 dimensional subspace. Therefore, using Proposition 9.1.5, the right side has measure no more than an expression of the form Crp ε, C depending on Dh (x). Therefore, in this case, 1 lim r→0 mp (B (x, r))
mp (h (B (x, r))) Crp ≤ε r→0 mp (B (x, r)) αp r p
Z
f (x) dmp = lim B(x,r)
−1
and since ε is arbitrary, this shows that for a.e. x, such that Dh (x) does not exist, f (x) = 0 = |det (Dh (x))| in this case also. Therefore, whenever E is a Lebesgue measurable set E ⊆ H, Z Z mp (h (E)) = Xh(E) (y) dmp = XE (x) |det (Dh (x))| dmp h(H) U Z = XE (x) |det (Dh (x))| dmp H
The difficulty here is that the inverse image of a measurable set might not be measurable. Let F be a Borel subset of the measurable set h (H) . Then h−1 (F ) is Borel measurable. Then from what was just shown, Z Z Xh(h−1 (F )) (y) dmp = Xh−1 (F ) (x) |det (Dh (x))| dmp h(H)
H
and rewriting this gives Z
Z XF (y) dmp =
h(H)
XF (h (x)) |det (Dh (x))| dmp H
and all needed measurability holds. Now for E an arbitrary measurable subset of H let F ⊆ E ⊆ G where F is Fσ and G is Gδ and mp (G \ F ) = 0. Without loss of generality, assume G ⊆ U . Then Z (XG (h (x)) |det (Dh (x))| − XF (h (x)) |det (Dh (x))|) dmp = 0 H
because the integral of each function in the difference equals Z Z Z XE (y) dmp = XF (y) dmp = h(H)
h(H)
XG (y) dmp
h(H)
and so the two functions are equal off a set of measure zero. Furthermore, XE (h (x)) |det (Dh (x))| ∈ [XF (h (x)) |det (Dh (x))| , XG (h (x)) |det (Dh (x))|] and so XE (h (x)) |det (Dh (x))| equals a measurable function a.e. By completeness of the measure mp it follows that x →XE (h (x)) |det (Dh (x))| is also measurable. I am not saying that x →XE (h (x)) is measurable. In fact this might not be so because it is Xh−1 (E) (x) and the inverse image of a measurable set is not necessarily measurable. For a well known example, see Problem 26 on Page 156. The thing which is measurable is the product in the integrand. Therefore, for E a measurable subset of H Z Z XE (y) dmp = XE (h (x)) |det (Dh (x))| dmp (9.3) h(H)
H
The following theorem gives a change of variables formula.
234
CHAPTER 9. CHANGE OF VARIABLES
Theorem 9.2.2
Let U ⊆ Rp be open, h : U → Rp continuous, and mp (h (U \ H)) = 0 where H ⊆ U and H is Lebesgue measurable. Suppose also that h is differentiable on H and is one to one on H. Then h (H) is Lebesgue measurable and if f ≥ 0 is Lebesgue measurable, then Z Z f (y) dmp = f (h (x)) |det (Dh (x))| dmp (9.4) h(H)
H
and all needed measurability holds. Proof: Formula 9.3 implies that 9.4 holds for any nonnegative simple function s. Then for f nonnegative and measurable, it is the pointwise increasing limit of such simple functions. Therefore, 9.4 follows from the monotone convergence theorem. Note that the above theorem holds if H = U . One might wonder why the fuss over having a separate H on which h is differentiable. One reason for this is Rademacher’s theorem which states that every Lipshitz continuous function is differentiable a.e. Thus if you have a Lipshitz function defined on U an open set, then if you let H be the set where this function is differentiable, it will follow that U \ H has measure zero and so, by Problem 10 on Page 201, you also have h (U \ H) has measure zero. Thus the above theorem is at least as good as what is needed to give a change of variables formula for transformations which are only Lipschitz continuous. Next is a significant result called Sard’s lemma. In the proof, it does not matter which norm you use in defining balls but it may be easiest to consider the norm kxk ≡ max {|xi | , i = 1, · · · , p}.
Lemma 9.2.3 (Sard) Let U be an open set in Rp and let h : U → Rp be differentiable. Let Z ≡ {x ∈ U : det Dh (x) = 0} . Then mp (h (Z)) = 0. Proof: For convenience, assume the balls in the following argument come from k·k∞ . First note that Z is a Borel set because h is continuous and so the component functions of the Jacobian matrix are each Borel measurable. Hence the determinant is also Borel measurable. Suppose that U is a bounded open set. Let ε > 0 be given. Also let V ⊇ Z with V ⊆ U open, and mp (Z) + ε > mp (V ) . Now let x ∈ Z. Then since h is differentiable at x, there exists δ x > 0 such that if r < δ x , then B (x, r) ⊆ V and also, h (B (x,r)) ⊆ h (x) + Dh (x) (B (0,r)) + B (0,rη) , η < 1. Regard Dh (x) as an n × n matrix, the matrix of the linear transformation Dh (x) with respect to the usual coordinates. Since x ∈ Z, it follows that there exists an invertible matrix A such that ADh (x) is in row reduced echelon form with a row of zeros on the bottom. Therefore, mp (A (h (B (x,r)))) ≤ mp (ADh (x) (B (0,r)) + AB (0,rη))
(9.5)
The diameter of ADh (x) (B (0,r)) is no larger than kAk kDh (x)k 2r and it lies in Rp−1 × {0} . The diameter of AB (0,rη) is no more than kAk (2rη) .Therefore, the measure of the right side in 9.5 is no more than p−1
[(kAk kDh (x)k 2r + kAk (2η)) r] p
≤ C (kAk , kDh (x)k) (2r) η
(rη)
9.3. MAPPINGS WHICH ARE NOT ONE TO ONE
235
Hence from the change of variables formula for linear maps, C (kAk , kDh (x)k) mp (B (x, r)) |det (A)|
mp (h (B (x,r))) ≤ η
Then letting δ x be still smaller if necessary, corresponding to sufficiently small η, mp (h (B (x,r))) ≤ εmp (B (x, r)) The balls of this form constitute a Vitali cover of Z. Hence, by the Vitali covering theorem ∞ Theorem 7.6.6, there exists {Bi }i=1 , Bi = Bi (xi , ri ) , a collection of disjoint balls, each of which is contained in V, such that mp (h (Bi )) ≤ εmp (Bi ) and mp (Z \ ∪i Bi ) = 0. Hence from Lemma 9.1.2, mp (h (Z) \ ∪i h (Bi )) ≤ mp (h (Z \ ∪i Bi )) = 0 Therefore, mp (h (Z)) ≤
X
mp (h (Bi )) ≤ ε
i
≤
X
mp (Bi )
i
ε (mp (V )) ≤ ε (mp (Z) + ε) .
Since ε is arbitrary, this shows mp (h (Z)) = 0. What if U is not bounded? Then consider Zn = Z ∩ B (0, n) . From what was just shown, h (Zn ) has measure 0 and so it follows that h (Z) also does, being the countable union of sets of measure zero.
9.3
Mappings Which are Not One to One
Now suppose h : U → V = h (U ) and h is only C 1 , not necessarily one to one. Note that I am using C 1 , not just differentiable. This makes it convenient to use the inverse function theorem. You can get more generality if you work harder. For U+ ≡ {x ∈ U : |det Dh (x)| > 0} and Z the set where |det Dh (x)| = 0, Lemma 9.2.3 implies mp (h(Z)) = 0. For x ∈ U+ , the inverse function theorem implies there exists an open set Bx ⊆ U+ , such that h is one to one on Bx . Let {Bi } be a countable subset of {Bx }x∈U+ such that U+ = ∪∞ i=1 Bi . Let E1 = B1 . If E1 , · · · , Ek have been chosen, Ek+1 = Bk+1 \ ∪ki=1 Ei . Thus ∪∞ i=1 Ei = U+ , h is one to one on Ei , Ei ∩ Ej = ∅, and each Ei is a Borel set contained in the open set Bi . Now define n(y) ≡
∞ X
Xh(Ei ) (y) + Xh(Z) (y).
i=1
The sets h (Ei ) , h (Z) are measurable by Proposition 9.1.2. Thus n (·) is measurable.
Lemma 9.3.1 Let F ⊆ h(U ) be measurable. Then Z
Z XF (h(x))| det Dh(x)|dmp .
n(y)XF (y)dmp = h(U )
U
236
CHAPTER 9. CHANGE OF VARIABLES Proof: Using Lemma 9.2.3 and the Monotone Convergence Theorem mp (h(Z))=0 Z Z ∞ }| { z X n(y)XF (y)dmp = Xh(Ei ) (y) + Xh(Z) (y) XF (y)dmp h(U )
h(U )
= = = =
Xh(Ei ) (y)XF (y)dmp
i=1 h(U ) ∞ Z X
Xh(Ei ) (y)XF (y)dmp
i=1 h(Bi ) ∞ Z X
XEi (x)XF (h(x))| det Dh(x)|dmp
i=1 Bi ∞ Z X
XEi (x)XF (h(x))| det Dh(x)|dmp
i=1
=
i=1
∞ Z X
U
Z X ∞
XEi (x)XF (h(x))| det Dh(x)|dmp
U i=1
Z
Z XF (h(x))| det Dh(x)|dmp =
= U+
XF (h(x))| det Dh(x)|dmp . U
Definition 9.3.2
For y ∈ h(U ), define a function, #, according to the formula #(y) ≡ number of elements in h−1 (y).
Observe that #(y) = n(y) a.e.
(9.6)
because n(y) = #(y) if y ∈ / h(Z), a set of measure 0. Therefore, # is a measurable function because of completeness of Lebesgue measure.
Theorem 9.3.3 Z
Let g ≥ 0, g measurable, and let h be C 1 (U ). Then Z #(y)g(y)dmp = g(h(x))| det Dh(x)|dmp .
h(U )
(9.7)
U
In fact, you can have E some Borel measurable subset of U and conclude that Z Z #(y)g(y)dmp = g(h(x))| det Dh(x)|dmp h(E)
E
Proof: From 9.6 and Lemma 9.3.1, 9.7 holds for all g, a nonnegative simple function. Approximating an arbitrary measurable nonnegative function g, with an increasing pointwise convergent sequence of simple functions and using the monotone convergence theorem, yields 9.7 for an arbitrary nonnegative measurable function g. To get the last claim, simply replace g with gXh(E) in the first formula.
9.4
Spherical Coordinates In p Dimensions
Sometimes there is a need to deal with spherical coordinates in more than three dimensions. In this section, this concept is defined and formulas are derived for these coordinate systems. Recall polar coordinates are of the form y1 = ρ cos θ y2 = ρ sin θ
9.4. SPHERICAL COORDINATES IN P DIMENSIONS
237
where ρ > 0 and θ ∈ R. Thus these transformation equations are not one to one but they are one to one on (0, ∞) × [0, 2π). Here I am writing ρ in place of r to emphasize a pattern which is about to emerge. I will consider polar coordinates as spherical coordinates in two dimensions. I will also simply refer to such coordinate systems as polar coordinates regardless of the dimension. This is also the reason I am writing y1 and y2 instead of the more usual x and y. Now consider what happens when you go to three dimensions. The situation is depicted in the following picture. •(y1 , y2 , y3 )
R φ1 ρ
R2 From this picture, you see that y3 = ρ cos φ1 . Also the distance between (y1 , y2 ) and (0, 0) is ρ sin (φ1 ) . Therefore, using polar coordinates to write (y1 , y2 ) in terms of θ and this distance, y1 = ρ sin φ1 cos θ, y2 = ρ sin φ1 sin θ, y3 = ρ cos φ1 . where φ1 ∈ R and the transformations are one to one if φ1 is restricted to be in [0, π] . What was done is to replace ρ with ρ sin φ1 and then to add in y3 = ρ cos φ1 . Having done this, there is no reason to stop with three dimensions. Consider the following picture: •(y1 , y2 , y3 , y4 )
R φ2 ρ
R3 From this picture, you see that y4 = ρ cos φ2 . Also the distance between (y1 , y2 , y3 ) and (0, 0, 0) is ρ sin (φ2 ) . Therefore, using polar coordinates to write (y1 , y2 , y3 ) in terms of θ, φ1 , and this distance, y1 = ρ sin φ2 sin φ1 cos θ, y2 = ρ sin φ2 sin φ1 sin θ, y3 = ρ sin φ2 cos φ1 , y4 = ρ cos φ2 where φ2 ∈ R and the transformations will be one to one if φ2 , φ1 ∈ (0, π) , θ ∈ (0, 2π) , ρ ∈ (0, ∞) . Continuing this way, given spherical coordinates in Rp , to get the spherical coordinates in Rp+1 , you let yp+1 = ρ cos φp−1 and then replace every occurance of ρ with ρ sin φp−1 to obtain y1 , · · · , yp in terms of φ1 , φ2 , · · · , φp−1 ,θ, and ρ. It is always the case that ρ measures the distance from the point in Rp to the origin in p R , 0. Each φi ∈ R and the transformations will be one to one if each φi ∈ (0, π) , and θ ∈ (0, 2π) . Denote by hp ρ, ~φ, θ the above transformation. It Q can be shown using math induction and geometric reasoning that these coordinates p−2 map i=1 (0, π) × (0, 2π) × (0, ∞) one to one onto an open subset of Rp which is everything except for the set of measure zero Ψp (N ) where N results from having some φi equal to 0 or π or for ρ = 0 or for θ equal to either 2π or 0. Each of these are sets of Lebesgue measure zero and so their union is also a set of measure zero. You can see that
238
CHAPTER 9. CHANGE OF VARIABLES
Q p−2 hp (0, π) × (0, 2π) × (0, ∞) omits the union of the coordinate axes except for maybe i=1 one of them. This is not important to the integral because it is just a set of measure zero. Theorem 9.4.1 Let y = hp ~φ, θ, ρ be the spherical coordinate transformations in Qp−2 Rp . Then letting A = i=1 (0, π) × (0, 2π) , it follows h maps A × (0, ∞) one to one onto all of Rp except a set of measure zero given by hp (N ) where N is the set of measure zero A¯ × [0, ∞) \ (A × (0, ∞)) Also det Dhp ~φ, θ, ρ will always be of the form det Dhp ~φ, θ, ρ = ρp−1 Φ ~φ, θ .
(9.8)
where Φ is a continuous function of ~φ and θ.1 Then if f is nonnegative and Lebesgue measurable, Z Z Z (9.9) f (y) dmp = f (y) dmp = f hp ~φ, θ, ρ ρp−1 Φ ~φ, θ dmp Rp
hp (A)
A
Furthermore whenever f is Borel measurable and nonnegative, one can apply Fubini’s theorem and write Z Z ∞ Z p−1 f (y) dy = ρ f h ~φ, θ, ρ Φ ~φ, θ d~φdθdρ (9.10) Rp
0
A
where here d~φdθ denotes dmp−1 on A. The same formulas hold if f ∈ L1 (Rp ) . Proof: Formula 9.8 is obvious from the definition of the spherical coordinates because in the matrix of the derivative, there will be a ρ in p − 1 columns. The first claim is also clear from the definition and math induction or from the geometry of the above description. It remains to verify 9.9 and 9.10. It is clear hp maps A¯ × [0, ∞) onto Rp . Since hp is differentiable, it maps sets of measure zero to sets of measure zero. Then Rp = hp (N ∪ A × (0, ∞)) = hp (N ) ∪ hp (A × (0, ∞)) , the union of a set of measure zero with hp (A × (0, ∞)) . Therefore, from the change of variables formula, Z Z Z f (y) dmp = f (y) dmp = f hp ~φ, θ, ρ ρp−1 Φ ~φ, θ dmp Rp
hp (A×(0,∞))
A×(0,∞)
which proves 9.9. This formula continues to hold if f is in L1 (Rp ). Finally, if f ≥ 0 or in L1 (Rn ) and is Borel measurable, the Borel sets denoted as B (Rp ) then one can write the following. From the definition of mp Z f hp ~φ, θ, ρ ρp−1 Φ ~φ, θ dmp A×(0,∞)
Z
Z
f hp ~φ, θ, ρ ρp−1 Φ ~φ, θ dmp−1 dm (0,∞) A Z Z = ρp−1 f hp ~φ, θ, ρ Φ ~φ, θ dmp−1 dm
=
(0,∞) 1 Actually
A
it is only a function of the first but this is not important in what follows.
9.5. APPROXIMATION WITH SMOOTH FUNCTIONS
239
Now the claim about f ∈ L1 follows routinely from considering the positive and negative parts of the real and imaginary parts of f in the usual way. Note that the above equals Z f hp ~φ, θ, ρ ρp−1 Φ ~φ, θ dmp ¯ A×[0,∞)
and the iterated integral is also equal to Z Z ρp−1 f hp ~φ, θ, ρ Φ ~φ, θ dmp−1 dm ¯ A
[0,∞)
because the difference is just a set of measure zero. Notation 9.4.2 Oftenthis is written differently. Note that from the spherical coordinate formulas, f h ~φ, θ, ρ = f (ρω) where |ω| = 1. Letting S p−1 denote the unit sphere, {ω ∈ Rp : |ω| = 1} , the inside integral in the above formula is sometimes written as Z f (ρω) dσ S p−1
where σ is a measure on S p−1 . See [46] for another description of this measure. It isn’t an important issue here. Either 9.10 or the formula Z Z ∞ ρp−1 f (ρω) dσ dρ S p−1
0
will be referred to as polar coordinates and is very useful in establishing estimates. Here R p−1 σ S ≡ A Φ ~φ, θ dmp−1 . Example 9.4.3 For what values of s is the integral
R B(0,R)
2
1 + |x|
s
dy bounded inde-
p
pendent of R? Here B (0, R) is the ball, {x ∈ R : |x| ≤ R} . I think you can see immediately that s must be negative but exactly how negative? It turns out it depends on p and using polar coordinates, you can find just exactly what is needed. From the polar coordinates formula above, Z
2
1 + |x|
s
Z dy
R
Z
1 + ρ2
=
B(0,R)
ρp−1 dσdρ
S p−1
0
Z =
s
Cp
R
1 + ρ2
s
ρp−1 dρ
0
Now the very hard problem has been reduced to considering an easy one variable problem of finding when Z R s ρp−1 1 + ρ2 dρ 0
is bounded independent of R. You need 2s + (p − 1) < −1 so you need s < −p/2.
9.5
Approximation with Smooth Functions
It is very important to be able to approximate measurable and integrable functions with continuous functions having compact support. Recall Theorem 8.4.2. This implies the following.
240
CHAPTER 9. CHANGE OF VARIABLES
Theorem 9.5.1
R Let f ≥ 0 be Fn measurable and let f dmn < ∞. Then there exists a Rsequence of continuous functions {hn } which are zero off a compact set such that p limn→∞ |f − hn | dmn = 0.
Definition 9.5.2
Let U be an open subset of Rn . Cc∞ (U ) is the vector space of all infinitely differentiable functions which equal zero for all x outside of some compact set contained in U . Similarly, Ccm (U ) is the vector space of all functions which are m times continuously differentiable and whose support is a compact subset of U .
Corollary 9.5.3 Let U be a nonempty open set in Rn and let f ∈ Lp (U, mn ) . Then there exists g ∈ Cc (U ) such that Z
p
|f − g| dmn < ε Proof: If f ≥ 0 then extend it to be 0 off U . In the above argument, let all functions involved, the simple functions and the continuous functions be zero off U . Simply intersect all Vi with U and no harm is done. Now to extend to Lp (U ) , simply apply Theorem 9.5.1 to the positive and negative parts of real and imaginary parts of f . Example 9.5.4 Let U = B (z, 2r) −1 2 2 exp |x − z| − r if |x − z| < r, ψ (x) = 0 if |x − z| ≥ r. Then a little work shows ψ ∈ Cc∞ (U ). The following also is easily obtained.
Lemma 9.5.5 Let U be any open set. Then Cc∞ (U ) 6= ∅. Proof: Pick z ∈ U and let r be small enough that B (z, 2r) ⊆ U . Then let ψ ∈ Cc∞ (B (z, 2r)) ⊆ Cc∞ (U ) be the function of the above example. For a different approach see Problem 14 on Page 202. This leads to a really remarkable result about approximation with smooth functions.
Definition 9.5.6
Let U = {x ∈ Rn : |x| < 1}. A sequence {ψ m } ⊆ Cc∞ (U ) is called a mollifier (This is sometimes called an approximate identity if the differentiability is not included.) if 1 ψ m (x) ≥ 0, ψ m (x) = 0, if |x| ≥ , m R and ψ m (x) = 1. Sometimes it may be written as {ψ ε } where ψ ε satisfies the above conditions except ψ ε (x) = 0 if |x| ≥ ε. In other words, εRtakes the place of 1/m. There certainly exist mollifiers. Let ψ ∈ Cc∞ (B (0,1)) , ψ (x) ≥ 0, ψ (x) dmn = 1. Then let ψ m (x) ≡ cm ψ (mx) R 1 where cm is chosen to make cm ψ (mx) dmn = 1. Thus ψ m is 0 off B 0, m . R The notation f (x, y)dµ(y) will mean x is fixed and the function y → f (x, y) is being integrated. To make the notation more familiar, dx is written instead of dmn (x). R Lemma 9.5.7 Let g ∈ Cc (U ) then there exists h ∈ Cc∞ (U ) such that |g − h|p dmn < ε. Proof: Let ψ m be a mollifier. Consider Z hm (x) ≡ g (x − y) ψ m (y) dmn (y)
9.5. APPROXIMATION WITH SMOOTH FUNCTIONS
241
Then since the integral of ψ m is 1, it follows that Z hm (x) − g (x) = (g (x − y) − g (x)) ψ m (y) dmn (y) Since g is zero off a compact set, it follows that g is uniformly continuous and so there is δ > 0 such that if |x − x ˆ| < δ, then |g (x) − g (ˆ x)| < ε. Choose m such that 1/m < δ. Then Z |hm (x) − g (x)| = (g (x − y) − g (x)) ψ m (y) dmn Z ≤ |g (x − y) − g (x)| ψ m (y) dmn (y) B(0,1/m) Z < εψ m (y) dmn (y) = ε. B(0,1/m)
This is true for all x. Now note that hm is only nonzero if x ∈ K + B (0, 1/m) where K is defined as the compact set off which g equals 0. Since K is contained in U, it follows that K + B (0, 1/m) ⊆ U for all m small enough. In fact, K + B (0, 1/m) is a compact set contained in U off which hm is zero for all m large enough because B (0, 1/ (m + 1)) ⊆ B (0, 1/m) . Thus hm is zero off a compact subset of U. In addition to this, hm is infinitely differentiable. To see this last claim, note that Z Z hm (x) = g (x − y) ψ m (y) dmn (y) = g (y) ψ m (x − y) dmn (y) This follows from the change of variables formulas presented above. To see the function is differentiable, Z hm (x+hei ) − h (x) ψ (x+hei −y) − ψ m (x − y) = g (y) m dmn (y) h h and now, since ψ m is zero off a compact set, it and its partial derivatives of all order are uniformly continuous. Hence, one can pass to a limit and obtain Z ∂ψ m (x) hxi (x) = g (y) dmn (y) ∂xi Repeat the same argument using the partial derivative of ψ m in place of ψ m . Continuing this way, one obtains the existence of all partial derivatives at any x. Thus hm ∈ Cc∞ (U ) R p for all m large enough and |hm − g| dmn < ε for all m large enough. Note that this would have worked for µ an arbitrary regular measure. Now it is obvious that the functions in Cc∞ (U ) are dense in Lp (U ) , p ≥ 1. Pick f ∈ p L (U ) . Then there exists g ∈ Cc (U ) such that Z
1/p
p
|f − g| dmn
< ε/2
U
and there is h ∈ Cc∞ (U ) such that Z
p
1/p
|h − g| dmn
< ε/2.
U
Then Z
p
|f − h| dmn U
1/p
Z ≤
p
|f − g| dmn U
1/p
Z
p
|h − g| dmn
+ U
1/p
0 be given and let g ∈ Cc (Rn ) with kg − f kp < measure is translation invariant (mn (w + E) = mn (E)), kgw − fw kp = kg − f kp
0 isgiven there exists δ < δ 1 such that if |w| < δ, it follows that 1/p
|g (x − w) − g (x)| < ε/3 1 + mn (B)
. Therefore,
Z kg − gw kp
p
1/p
|g (x) − g (x − w)| dmn
= B
1/p
mn (B) ε < . ≤ ε 1/p 3 3 1 + mn (B) Therefore, whenever |w| < δ, it follows kg − gw kp
0, there exists g ∈ D such that kf − gkp < ε. Proof: Let Q be all functions of the form cX[a,b) where [a, b) ≡ [a1 , b1 ) × [a2 , b2 ) × · · · × [an , bn ), and both ai , bi are rational, while c has rational real and imaginary parts. Let D be the set of all finite sums of functions in Q. Thus, D is countable. In fact D is dense in Lp (Rn , µ). To prove this it is necessary to show that for every f ∈ Lp (Rn , µ), there exists an element of D, s such that ks − f kp < ε. If it can be shown that for every g ∈ Cc (Rn ) there exists h ∈ D such that kg − hkp < ε, then this will suffice because if f ∈ Lp (Rn ) is arbitrary, Theorem 8.4.2 implies there exists g ∈ Cc (Rn ) such that kf − gkp ≤ 2ε and then there would exist h ∈ Cc (Rn ) such that kh − gkp < 2ε . By the triangle inequality, kf − hkp ≤ kh − gkp + kg − f kp < ε. Therefore, assume at the outset that f ∈ Cc (Rn ). Q n Let Pm consist of all sets of the form [a, b) ≡ i=1 [ai , bi ) where ai = j2−m and bi = (j+1)2−m for j an integer. Thus Pm consists of a tiling of Rn into half open rectangles having 1 diameters 2−m n 2 . There are countably many of these rectangles; so, letPm = {[ai , bi )} for m i ≥ 1, and Rn = ∪∞ i=1 [ai , bi ). Let ci be complex numbers with rational real and imaginary parts satisfying −m |f (ai ) − cm , i | 0 there exists δ > 0 such that if |x − y| < δ, |f (x) − f (y)| < ε/2. Now let m be large enough that every box in Pm has diameter less than δ and also that 2−m < ε/2. Then if [ai , bi ) is one of these boxes of Pm , and x ∈ [ai , bi ), |f (x) − f (ai )| < ε/2 and −m |f (ai ) − cm < ε/2. i | 0, there exists g ∈ D such that kf − gkp < ε. Proof: Let P denote the set of all polynomials which have rational coefficients. Then P is n n n countable. Let τ k ∈ Cc ((− (k + 1) , (k + 1)) ) such that [−k, k] ≺ τ k ≺ (− (k + 1) , (k + 1)) . n The notation means that τ k is one on [−k, k] , has values between 0 and 1 and vanishes off a n compact subset of (− (k + 1) , (k + 1)) . Let Dk denote the functions which are of the form, pτ k where p ∈ P. Thus Dk is also countable. Let D ≡ ∪∞ k=1 Dk . It follows each function in D is in Cc (Rn ) and so it in Lp (Rn , µ). Let f ∈ Lp (Rn , µ). By regularity of µ there exists n g ∈ Cc (Rn ) such that kf − gkLp (Rn ,µ) < 3ε . Let k be such that spt (g) ⊆ (−k, k) . Now by the Weierstrass approximation theorem there exists a polynomial q such that kg − qk[−(k+1),k+1]n
n
≡ sup {|g (x) − q (x)| : x ∈ [− (k + 1) , (k + 1)] } ε < n . 3µ ((− (k + 1) , k + 1) )
It follows kg − τ k qk[−(k+1),k+1]n
= kτ k g − τ k qk[−(k+1),k+1]n ε < n . 3µ ((− (k + 1) , k + 1) )
Without loss of generality, it can be assumed this polynomial has all rational coefficients. Therefore, τ k q ∈ D. Z p p kg − τ k qkLp (Rn ) = |g (x) − τ k (x) q (x)| dµ (−(k+1),k+1)n
≤
0, x → e−tx is a function in L1 (R) and Z ∞ 1 e−tx dx = . t 0 Thus Z 0
R
sin (t) dt = t
R
Z
Z
∞
sin (t) e−tx dxdt
0
0
Now explain why you can change the order of integration in the above iterated integral. Then compute what you get. Next pass to a limit as R → ∞ and show Z ∞ sin (t) 1 dt = π t 2 0 This is a very important integral. Note that the thing on the left is an improper integral. sin (t) /t is not Lebesgue integrable because it is not absolutely integrable. That is Z ∞ sin t t dm = ∞ 0
It is important to understand that the Lebesgue theory of integration only applies to nonnegative functions and those which are absolutely integrable. 2. As mentioned above, polar coordinates are x =
r cos (θ)
y
r sin (θ)
=
and this transformation maps [0, 2π) × [0, ∞) onto R2 . It is the restriction to this set of the same transformation defined on the open set (−1, 2π) × (−1, ∞) and so from Theorem 9.3.3, if g is Lebesgue measurable and zero off [0, 2π) × [0, ∞), Z
Z
2π
Z
# (x) g (x) dm2 = R2
∞
g (r cos θ, r sin θ) rdrdθ 0
0
where # (x) is 1 except for a set of measure zero consisting of either θ = 0 or r = 0. This set, has its image also of measure zero. Hence one can simply write Z Z 2π Z ∞ g (x) dm2 = g (r cos θ, r sin θ) rdrdθ R2
0
0
Similar considerations apply to the general case of spherical coordinates asR explained 2 ∞ above. Now use this change of variables for polar coordinates. To show that −∞ e−x dx = R∞ R∞ R∞ 2 2 √ 2 π. Hint: Let I = −∞ e−x dx and explain why I 2 = −∞ −∞ e−(x +y ) dxdy. Now use polar coordinates. 3. Let E be a Lebesgue measurable set in R. Suppose m(E) > 0. Consider the set E − E = {x − y : x ∈ E, y ∈ E}. Show that E − E contains an interval. Hint: Let Z f (x) = XE (t)XE (x + t)dt. Show f is continuous at 0 and f (0) > 0 and use continuity of translation in Lp .
246
CHAPTER 9. CHANGE OF VARIABLES
4. Let K be a bounded subset of Lp (Rn ) and suppose that there exists G such that G is compact with Z p |u (x)| dx < εp Rn \G
and for all ε > 0, there exist a δ > 0 and such that if |h| < δ, then Z p |u (x + h) − u (x)| dx < εp for all u ∈ K. Show that K is precompact in Lp (Rn ). Hint: Let φk be a mollifier and consider Kk ≡ {u ∗ φk : u ∈ K} . The notation means the following: Z u ∗ φk (x) ≡
u (x − y) φk (y) dmn (y) Z
=
u (y) φk (x − y) dmn (y)
It is called the convolution. Verify the conditions of the Ascoli Arzela theorem for these functions defined on G and show there is an ε net for each ε > 0. Can you modify this to let an arbitrary open set take the place of Rn ? R 5. Let φm ∈ Cc∞ (Rn ), φm (x) ≥ 0, and Rn φm (y)dy = 1 with lim sup {|x| : x ∈ spt (φm )} = 0.
m→∞
Show if f ∈ Lp (Rn ), limm→∞ f ∗φm = f in Lp (Rn ). Hint: Use Minkowski’s inequality for integrals to get a short proof of this fact. This is a little different than what is in the chapter. 0
6. Let p1 + p10 = 1, p > 1, let f ∈ Lp (Rn ), g ∈ Lp (Rn ). Show f ∗ g is uniformly continuous on R and |(f ∗ g)(x)| ≤ kf kLp kgkLp0 . Z f ∗ g (x) ≡ f (x − y) g (y) dmn Rn
Hint: You need to consider why f ∗ g exists and then this follows from the definition of convolution and continuity of translation of Lebesgue measure. 7. Suppose f is a strictly decreasing nonnegative function defined on [0, ∞). Let f −1 (y) ≡ {x : such that y ∈ [f (x+) , f (x−)]}. Show that Z
∞
Z
f (0)
f (t) dt = 0
f −1 (y) dy
0
Hint: Try to show that f −1 (y) = m ([f > y]) . −1/2
8. Let f (y) = g (y) = |y| if y ∈ (−1, 0) ∪ (0, 1) and f (y) = g (y) R = 0 outside of this set. For which values of x does it make sense to write the integral R f (x − y) g (y) dy ≡ f ∗ g (x)? This is asking for you to find where the convolution of f and g makes sense. 9. Let f ∈ L1 (Rp ) and let g ∈ L1 (Rp ) . Define the convolution of f and g as follows. It equals Z f ∗ g (x) ≡
f (x − y) g (y) dmp
9.8. EXERCISES
247
Show that the above integral makes sense for a.e. x that is, for all x off a set of measure zero. If f ∗ g is defined to equal 0 at points where the above integral does not make sense, show that kf ∗ gk1 ≤ kf k1 kgk1 R where khk1 ≡ |h| dmp . 10. Consider D ≡ {p (e−αt )} where p (t) is some real polynomial having zero constant term and α is some positive number at least as large as a given α0 > 0. Show that D is an algebra and is dense in C0 ([0, ∞)) with respect to the norm kf k∞ ≡ max {|f (x)| : x ∈ [0, ∞)}. R∞ 11. ↑Suppose f ∈ L1 ([0, ∞)) where the measure is Lebesgue measure and that 0 f (t) g (t) dm = 0 for all g ∈ D in the above problem. Explain why f (t) = 0 a.e. t. Hint: You can assume f is real since if not, you could look at the real and imaginary parts. You can also assume that f is nonnegative. Show density of Cc ([0, ∞)) in L1 ([0, ∞)). Then show there is a sequence of things in D which converges in L1 to f . Finally, go over why you can get a further subsequence which converges to f a.e. Then use Fatou’s lemma. 12. A measurable function f defined on [0, ∞) has exponential growth if f (t) ≤ Cert for some Rreal r. Suppose you have f measurable with exponential growth. Show ∞ Lf (s) ≡ 0 e−st f (t) dt the Laplace transform, is well defined for all s large enough. Now show that if Lf (s) = 0 for all s large enough, then f (t) = 0 for a.e. t. This shows that if two measurable functions with exponential growth have the same Laplace transform for large s, then they are a.e. the same function.
248
CHAPTER 9. CHANGE OF VARIABLES
Chapter 10
Some Fundamental Functions and Transforms 10.1
Gamma Function
With the Lebesgue integral, it becomes easy to consider the Gamma function and the theory of Laplace transforms. I will use the standard notation for the integral used in calculus, but remember that all integrals will be Lebesgue integrals taken with respect to one dimensional Lebesgue measure. First is a very important function defined in terms of an integral. Problem 13 on Page 181 shows that in the case of a continuous function, the Riemann integral and the Lebesgue integral are exactly the same. Thus all the standard calculus manipulations are valid for the Lebesgue integral provided the functions integrated are continuous. This also implies immediately that the two integrals coincide whenever the function is piecewise continuous on a finite interval. Recall that the value of the Riemann integral does not depend on the value of the function at single points and the same is true of the Lebesgue integral because single points have zero measure.
Definition 10.1.1
The gamma function is defined by Z ∞ Γ (α) ≡ e−t tα−1 dt 0
whenever α > 0.
Lemma 10.1.2 The integral is finite for each α > 0. Proof: By the monotone convergence theorem, for n ∈ N Z Γ (α)
= ≤
n
lim
n→∞
−t α−1
e t 1/n
Z
1
≤ lim sup n→∞
t
α−1
1/n
Z dt +
!
n −t/2
Ce 1
1 1 1 + lim −2Ce− 2 n + 2Ce− 2 < ∞ α n→∞
The explanation for the constant is as follows. For t ≥ 1 and m a positive integer larger than α, tα−1 tn−1 < et/2 et/2 which converges to 0 as t → ∞ which is easily shown by an appeal to L’Hospital’s rule. Hence tα−1 e−t ≤ Cet/2 e−t = Ce−t/2 .
249
250
CHAPTER 10. SOME FUNDAMENTAL FUNCTIONS AND TRANSFORMS
Proposition 10.1.3 For n a positive integer, n! = Γ (n + 1). In general, Γ (1) = 1, Γ (α + 1) = αΓ (α) Proof: First of all, Γ (1) = limδ→0 α > 0, Z
= lim
δ→0
e
δ
−1 e−t dt = limδ→0 e−δ − e−(δ ) = 1. Next, for
δ −1
Γ (α + 1) = lim
δ→0
R δ−1
" −t α
e t dt = lim
δ→0
δ
−δ α
δ −e
−(δ −1 ) −α
δ
−1 −e−t tα |δδ
Z
δ −1
# −t α−1
+α
e t
dt
δ δ −1
Z
! −t α−1
+α
e t
dt
= αΓ (α)
δ
Now it is defined that 0! = 1 and so Γ (1) = 0!. Suppose that Γ (n + 1) = n!, what of Γ (n + 2)? Is it (n + 1)!? if so, then by induction, the proposition is established. From what was just shown, Γ (n + 2) = Γ (n + 1) (n + 1) = n! (n + 1) = (n + 1)! and so this proves the proposition.
10.2
Laplace Transform
Here is the definition of a Laplace transform.
Definition 10.2.1
A function φ has exponential growth on [0, ∞) if there are positive constants λ, C such that |φ (t)| ≤ Ceλt for all t. Then for s > λ, one defines the Laplace R∞ transform Lφ (s) ≡ 0 φ (t) e−st dt. R∞ Let f (s) = 0 e−st φ (t) dt where t → φ (t) e−st is in L1 ([0, ∞)) for all s large enough and φ has exponential growth. Then for s large enough, f (k) (s) exists R∞ k and equals 0 (−t) e−st φ (t) dt. In fact if s is a complex number and Re s > λ where |φ (t)| ≤ Ceλt , then for Re s > λ, Z ∞ f (s + h) − f (s) ≡ f 0 (s) = (−t) e−st φ (t) dt lim h→0 h 0
Theorem 10.2.2
Proof: First consider the real case. Suppose true for some k ≥ 0. By definition it is so for k = 0. Then always assuming s > λ, |h| < s − λ, where |φ (t)| ≤ Ceλt , λ ≥ 0, f (k) (s + h) − f (k) (s) = h
Z
∞
k
(−t) 0
e−(s+h)t − e−st φ (t) dt h
Using the mean value theorem, the integrand satisfies −(s+h)t − e−st (−t)k e ≤ tk |φ (t)| −te−ˆst φ (t) h where sˆ ∈ (s, s + h) or (s + h, s) if h < 0. In case h > 0, the integrand is less than tk+1 |φ (t)| e−st ≤ tk+1 Ce(λ−s)t , a function in L1 since s > λ. In the other case, for |h| small enough, the integrand is dominated by tk+1 Ce(λ−(s+|h|))t . Letting |h| < ε where s − λ < ε, the integrand is dominated by tk+1 Ce(λ−(s+ε))t < Ctk+1 e−εt , also a function in L1 . By the dominated convergence theorem, one can pass to the limit and obtain Z ∞ k+1 −st f (k+1) (s) = (−t) e φ (t) dt 0
10.2. LAPLACE TRANSFORM
251
Now consider the final claim about Re s > λ. The difference quotient looks the same. However, s will now be complex as will h. From the properties of the complex exponential, Section 1.5.2, Z ∞ −(s+h)t Z ∞ Z ∞ e − e−st e−ht − 1 e−ht − 1 φ (t) dt = −te−st φ (t) dt = e−st φ (t) dt h −th h 0 0 0 Now it follows from that section that the integrand converges to −te−st φ (t). Therefore, it only is required to obtain an estimate and use the dominated convergence theorem again. Say h = reiθ . Then the integrand is e−(s+h)t
1 − eht φ (t) h
Thus its magnitude satisfies ht ht −(s+h)t 1 − eht e ≤ Ceλt 1 − e e−(s+h) ≤ Cte−(s+h−λ)t 1 − e φ (t) h th h Now e−(s+h−λ)t = e−(Re(s+h)−λ)t ≤ e−εt whenever |h| is small enough because of the asRt sumption that Re s > λ. eht − 1 = 0 hehu du and so the last expression is R t hehu du 1 Z t 0 hu e du ≤ th t 0 Rt R t There is ω, |ω| = 1, ω = eiα and ω 0 ehu du = 0 ehu du and so the above is no larger than 1 ω t
Z
t
ehu du =
0
≤ ≤
Z Z 1 t iα hu 1 t e e du = Re eiα ehu du t 0 t 0 Z t Z t 1 ehu du = 1 eu Re h du t 0 t 0 Z 1 t u|h| e du ≤ et|h| t 0 ε
and so for |h| small enough, say smaller than ε/2, the integrand is no larger than Cte−εt e 2 t = ε L1 and so the dominated convergence theorem applies and it follows that Cte− 2 t which R ∞ is in−st 0 f (s) = 0 (−t) e φ (t) dt whenever Re (s) > λ. In fact all the derivatives will exist, by the same argument, but this will follow more easily as a special case of more general results when we get around to using this. The whole approach for Laplace transforms in differential equations is based on the assertion that if L (f ) = L (g) , then f = g. However, this is not even true because if you change the function on a set of measure zero, you don’t change the transform. However, if f, g are continuous, then it will be true. Actually, it is shown here that if L (f ) = 0, and f is continuous, then f = 0. The approach here is based on the Weierstrass approximation theorem or rather a case of it.
Lemma 10.2.3 Suppose q is a continuous function defined on [0, 1] . Also suppose that for all n = 0, 1, 2, · · · , Z
1
q (x) xn dx = 0
0
Then it follows that q = 0. R1 Proof: By assumption, for p (x) any polynomial, 0 q (x) p (x) dx = 0. Now let {pn (x)} be a sequence of polynomials which converge uniformly to q (x) by Theorem 2.12.2. Say max |q (x) − pn (x)|
0 and all t > 0 and also that f is R |f (t)| ≤ Ce
continuous. Suppose that
∞ −st e f 0
(t) dt = 0 for all s > 0. Then f = 0.
Proof: First note that limt→∞ |f (t)| = 0. Next change the variable letting x = e−t and R1 so x ∈ [0, 1]. Then this reduces to 0 xs−1 f (− ln (x)) dx. Now if you let q (x) = f (− ln (x)) , it is not defined when x = 0, but x = 0 corresponds to t → ∞. Thus limx→0+ q (x) = 0. R1 Defining q (0) ≡ 0, it follows that it is continuous and for all n = 0, 1, 2, · · · , 0 xn q (x) dx = 0 and so q (x) = 0 for all x from Lemma 10.2.3. Thus f (− ln (x)) = 0 for all x ∈ (0, 1] and so f (t) = 0 for all t ≥ 0. Now suppose only that |f (t)| ≤ Cert so f has exponential growth and that for all s sufficiently large, L (f ) = 0. Does it follow that f = 0? Say this holds for all s ≥ s0 where also s0 > r. Then consider fˆ (t) ≡ e−s0 t f (t) . Then if s > 0, Z ∞ Z ∞ Z ∞ e−st fˆ (t) dt = e−st e−s0 t f (t) dt = e−(s+s0 )t f (t) dt = 0 0
0
0
because s + s0 is large enough for this to happen. It follows from Lemma 10.2.4 that fˆ = 0. But this implies that f = 0 also. This proves the following fundamental theorem.
Theorem 10.2.5
Suppose f has exponential growth and is continuous on [0, ∞). Suppose also that for all s large enough, L (f ) (s) = 0. Then f = 0. Now this will be extended to more general functions.
Corollary 10.2.6 Suppose |f (t)| , |g (t)| have exponential growth and are continuous except for countably many jumps. Thus f, g are certainly measurable. Then if L (f ) = L (g) it follows that f = g except for jumps. Proof: Say |f (t)| , |g (t)| ≤ Ceλt and let s > λ. Then by definition and Fubini’s theorem, for h = f − g Z ∞ Z t Z ∞ Z ∞ −st h (u) du e dt = h (u) e−st dtdu 0 0 0 u Z ∞ 1 1 = h (u) e−us = L (h) (s) = 0 s s 0 Rt Thus from Theorem 10.2.5, 0 h (u) du = 0. By fundamental theorem of calculus, at every t not a jump, h (t) = 0. That is, f (t) = g (t) other than for t a point of discontinuity of either f or g. With the Lebesgue fundamental theorem of calculus which has not been presented, the above argument will show that it is only necessary to assume the functions are in L1 and conclude they are equal a.e. if their Laplace transforms are equal.
lim pn ◦ l−1 − f ∞ = 0
n→∞
and it is clear that pn ◦ l−1 is a polynomial in x ∈ [a, b] because l−1 is just a linear function like l.
10.3. FOURIER TRANSFORM
10.3
253
Fourier Transform
Definition 10.3.1
The Fourier transform is defined as follows for f ∈ L1 (R) . Z ∞ 1 e−itx f (x) dx F f (t) ≡ √ 2π −∞
where here I am using the usual notation from calculus to denote the Lebesgue integral in which, to be more precise, you would put dm1 in place of dx. The inverse Fourier transform is defined the same way except you delete the minus sign in the complex exponential. Z ∞ 1 F −1 f (t) ≡ √ eitx f (x) dx 2π −∞ Does it deserve to be called the “inverse” Fourier transform? This question will be explored somewhat below. In studying the Fourier transform, I will use some improper integrals. R R Definition 10.3.2 Define a∞ f (t) dt ≡ limr→∞ ar f (t) dt. This coincides with the Lebesgue integral when f ∈ L1 (a, ∞). With this convention, there is a very important improper integral involving sin (x) /x. You can show with a little estimating that x → sin (x) /x is not in L1 (0, ∞) . Nevertheless, a lot can be said about improper integrals involving this function.
Theorem 10.3.3 1.
R∞ 0
sin u u du
2. limr→∞
=
R∞
The following hold
π 2
sin(ru) du u
= 0 whenever δ > 0. R 3. If f ∈ L1 (R) , then limr→∞ R sin (ru) f (u) du = 0. This is called the Riemann Lebesgue lemma. R∞ Proof: You know u1 = 0 e−ut dt. Therefore, using Fubini’s theorem, δ
Z 0
r
sin u du = u
Z
r
Z sin (u)
0
∞
e
−ut
Z
∞
Z
dtdu =
0
0
r
e−ut sin (u) dudt
0
Now you integrate that inside integral by parts. Z ∞ 1 −tr cos (r) + t sin (r) −e dt = t2 + 1 1 + t2 0 That integrand converges to t21+1 as r → ∞ for each t > 0. I would like to use the dominated convergence theorem. That second term is of the form √ 2 −tr 1 + t cos (r − φ (t, r)) e 1 + t2 Also
√ 1 −tr 1 + t2 cos (r − φ (t, r)) e−tr e ≤ 2 (1 + t2 )1/2 1+t
For r > 1, this is no larger than 1 (1 +
1/2 t2 )
e−t
254
CHAPTER 10. SOME FUNDAMENTAL FUNCTIONS AND TRANSFORMS
which is obviously in L1 and so one can apply the dominated convergence theorem and conclude that Z r Z ∞ sin u 1 π dt = . lim du = 2 r→∞ 0 u 1 + t 2 0 This shows part 1. R∞ R∞ Rδ Now consider δ sin(ru) du. It equals 0 sin(ru) du − 0 sin(ru) du which can be seen from u u u the definition of what the improper integral means. Also, you can change the variable. Let ru = t so rdu = dt and the above reduces to Z ∞ Z rδ Z ∞ sin (t) 1 sin (t) sin (ru) r dt − dt = du t r t u 0 0 δ Thus π − 2 R∞
rδ
Z 0
sin (t) dt = t
Z
∞
δ
sin (ru) du u
sin(ru) du u
= 0 from the first part. and so limr→∞ δ Now consider the third claim, the Riemann Lebesgue lemma. Let h ∈ Cc∞ (R) such that R |f − h| dmp < ε. Then R Z Z ≤ sin (ru) (f (u) − h (u)) du + sin (ru) h (u) R R Z ≤ ε + sin (ru) h (u)
Z sin (ru) f (u) du R
R
with this last integral, do an integration by parts. Since h vanishes off some interval, Z Z 1 sin (ru) h (u) dmp = − cos (ru) h0 (u) dmp r R R Thus this last integral is dominated by Cr so it converges to 0. For r large enough, Z sin (ru) f (u) du ≤ 2ε R
and since ε is arbitrary, this establishes the claim.
Definition 10.3.4
The following notation will be used assuming the limits exist.
lim g (x + r) ≡ g (x+) , lim g (x − r) ≡ g (x−)
r→0+
r→0+
Theorem 10.3.5
Suppose that g ∈ L1 (R) and that at some x, g is locally Holder continuous from the right and from the left. This means there exist constants K, δ > 0 and r ∈ (0, 1] such that for |x − y| < δ, |g (x+) − g (y)| < K |x − y|
r
(10.1)
|g (x−) − g (y)| < K |x − y|
r
(10.2)
for y > x and for y < x. Then 2 lim r→∞ π
Z 0
∞
sin (ur) u =
g (x − u) + g (x + u) 2
g (x+) + g (x−) . 2
du
10.3. FOURIER TRANSFORM
255
R∞ Proof: As in the proof of Theorem 10.3.3, changing variables shows that π2 0 1.Therefore, Z 2 ∞ sin (ur) g (x − u) + g (x + u) g (x+) + g (x−) du − π 0 u 2 2 Z ∞ sin (ur) g (x − u) − g (x−) + g (x + u) − g (x+) 2 = du π 0 u 2 δ
=
Z
=
g (x − u) − g (x−) g (x + u) − g (x+) sin (ur) + du 2u 2u 0 Z ∞ sin (ur) g (x − u) − g (x−) g (x + u) − g (x+) 2 + + du π δ u 2 2 2 π
sin(ru) du u
(10.3)
Second Integral: It equals Z 2 ∞ sin (ur) g (x − u) + g (x + u) g (x−) + g (x+) − du π δ u 2 2 ∞
sin (ur) g (x − u) + g (x + u) = u 2 δ Z ∞ 2 sin (ur) g (x−) + g (x+) − π δ u 2 2 π
Z
(10.4)
From part 2 of Theorem 10.3.3, Z 2 ∞ sin (ur) g (x−) + g (x+) lim du = 0 r→∞ π δ u 2 Thus consider the first integral in 10.3. Z 2 ∞ sin (ur) g (x − u) + g (x + u) du π δ u 2 Z ∞ Z ∞ 1 sin (ur) 1 sin (ur) = g (x − u) du + g (x + u) du π δ u π δ u ! Z −δ Z ∞ 1 sin (ur) sin (ur) = g (x + u) du + g (x + u) du π u u −∞ δ Now Z
−δ
−∞
and g(x+u) ≤ u
1 δ
sin (ur) g (x + u) du = u
Z
−δ
sin (ur) −∞
|g (x + u)| for u < −δ. Thus u →
g(x+u) u
g (x + u) du u
is in L1 ((−∞, −δ)) . Indeed,
−δ
Z Z g (x + u) 1 du ≤ 1 |g (x + u)| du = |g (y)| dy < ∞ u δ R δ R −∞
Z
It follows from the Riemann Lebesgue lemma Z −δ Z ∞ g (x + u) g (x + u) lim sin (ur) du = lim sin (ur) du = 0 r→∞ −∞ r→∞ δ u u First Integral in 10.3: This converges to 0 as r → ∞ because of the Riemann Lebesgue lemma. Indeed, for 0 ≤ u ≤ δ, g (x − u) − g (x−) ≤K 1 2u u1−r
256
CHAPTER 10. SOME FUNDAMENTAL FUNCTIONS AND TRANSFORMS
which is integrable on [0, δ]. The other quotient also is integrable by similar reasoning. The next theorem justifies the terminology above which defines F −1 and calls it the inverse Fourier transform. Roughly it says that the inverse Fourier transform of the Fourier transform equals the mid point of the jump. Thus if the original function is continuous, it restores the original value of this function. Surely this is what you would want by calling something the inverse Fourier transform. However, note that in this theorem, it is defined in terms of an improper integral. This is because there is no guarentee that the Fourier R∞ RR transform will end up being in L1 . Thus instead of −∞ we write limR→∞ −R . Of course, IF the Fourier transform ends up being in L1 , then this amounts to the same thing. The interesting thing is that even if this is not the case, the formula still works provided you consider an improper integral. Now for certain special kinds of functions, the Fourier transform is indeed in L1 and one can show that it maps this special kind of function to another function of the same sort. This can be used as the basis for a general theory of Fourier transforms. However, the following does indeed give adequate justification for the terminology that F −1 is called the inverse Fourier transform.
Theorem 10.3.6
Let g ∈ L1 (R) and suppose g is locally Holder continuous from the right and from the left at x as in 10.1 and 10.2. Then 1 lim R→∞ 2π
R
Z
e
ixt
Z
−R
∞
−∞
Proof: Consider the following manipulations. 1 2π
Z
∞
Z
R
e
ixt −ity
e
−∞
−R
Z
=
1 2π 1 2π
Z
= 1 = π
∞
1 g (y) dtdy = 2π
∞
Z g (y)
−∞
= =
RR
Z
∞
−R
e
i(x−y)t
Z
Z
−∞
eixt
R∞ −∞
e−ity g (y) dydt =
R
ei(x−y)t g (y) dtdy
−R
!
R
Z dt +
0
∞
e
−i(x−y)t
dt dy
0
!
R
2 cos ((x − y) t) dt dy
g (y) −∞
1 2π
R
0
1 sin R (x − y) dy = g (y) x − y π −∞
Z
g (x+) + g (x−) . 2
e−ity g (y) dydt =
Z
∞
g (x − y) −∞
sin Ry dy y
Z 1 ∞ sin Ry (g (x − y) + g (x + y)) dy π 0 y Z 2 ∞ g (x − y) + g (x + y) sin Ry dy π 0 2 y
From Theorem 10.3.5, Z R Z ∞ 1 ixt lim e e−ity g (y) dydt R→∞ 2π −R −∞ Z 2 ∞ g (x − y) + g (x + y) sin Ry = lim dy R→∞ π 0 2 y g (x+) + g (x−) = . 2
10.3. FOURIER TRANSFORM
257
R∞
e−ity g (y) dy is itself in L1 (R) , then you don’t need to RR R∞ do the inversion in terms of a principal value integral as above in which limR→∞ −R eixt −∞ e−ity g (y) dydt was considered. Instead, you simply get Z ∞ Z ∞ g (x+) + g (x−) 1 e−ity g (y) dydt = eixt 2π −∞ 2 −∞
Observation 10.3.7 If t →
−∞
Does this situation ever occur? Yes, it does. This is discussed a little later. How does the Fourier transform relate to the Laplace transform? This is considered next. Recall that from Theorem 10.2.2 if g has exponential growth |g (t)| ≤ Ceηt , then if Re (s) > η, one can define Lg (s) as Z ∞ e−su g (u) du Lg (s) ≡ 0
and also s → Lg (s) is differentiable on Re (s) > η in the sense that if h ∈ C and G (s) ≡ Lg (s) , then Z ∞ G (s + h) − G (s) = G0 (s) = − ue−su g (u) du lim h→0 h 0 This is an example of an analytic function of the complex variable s. It will follow from later theorems that the existence of one derivative implies the existence of all derivatives. This is of course very different than in real analysis where you may have only finitely many derivatives. The difference here is that h ∈ C. You are saying much more by allowing h to be complex. Then the next theorem shows how to invert the Laplace transform. It is one of those results which says that you get the mid point of the jump when you do a certain process. It is like what happens in Fourier series where the Fourier series converges to the midpoint of the jump under suitable conditions. For a fairly elementary discussion of this kind of thing, see the single variable advanced calculus book on my web page.
Theorem 10.3.8
Let g be a measurable function defined on (0, ∞) which has expo-
nential growth |g (t)| ≤ Ceηt for some real η and is Holder continuous from the right and left as in 10.1 and 10.2. For Re (s) > η Z ∞ Lg (s) ≡ e−su g (u) du 0
Then for any γ > η, 1 R→∞ 2π
Z
R
e(γ+iy)t Lg (γ + iy) dy =
lim
−R
g (t+) + g (t−) 2
(10.5)
Proof: This follows from plugging in the formula for the Laplace transform of g and then using the above. Thus 1 2π 1 2π
Z
R
−R
e(γ+iy)t
Z
R
Z
e(γ+iy)t Lg (γ + iy) dy =
−R
∞
e−(γ+iy)u g (u) dudy =
−∞
1 =e 2π γt
Z
R
e −R
iyt
Z
1 2π
Z
R
−R
eγt eiyt
Z
∞
−∞
∞
−∞
e−iyu e−γu g (u) dudy
e−(γ+iy)u g (u) dudy
258
CHAPTER 10. SOME FUNDAMENTAL FUNCTIONS AND TRANSFORMS
Now apply Theorem 10.3.6 to conclude that 1 lim R→∞ 2π
Z
R
e(γ+iy)t Lg (γ + iy) dy
−R
1 R→∞ 2π
= eγt lim = eγt
g (t+) e
Z
R
eiyt
−R −γt+
Z
∞
e−iyu e−γu g (u) dudy
−∞
+ g (t−) e−γt− g (t+) + g (t−) = . 2 2
In particular, this shows that if Lg (s) = Lh (s) for all s large enough, both g, h having exponential growth, then these must be equal except for jumps and in fact, at any point where they are both Holder continuous from right and left, the mid point of their jumps is the same. This gives an alternate proof of Corollary 10.2.6.
10.4
Fourier Transforms in Rn
In this section is a general treatment of Fourier transforms. It turns out you can take the Fourier transform of almost anything you like. First is a definition of a very specialized set of functions. Here the measure space will be (Rn , mn , Fn ) , mn Lebesgue measure on Rn . First is the definition of a polynomial in many variables.
Definition 10.4.1
α = (α1 , · · · , αn ) for α1 · · · αn nonnegative integers is called a multi-index. For α a multi-index, |α| ≡ α1 + · · · + αn and if x ∈ Rn , x = (x1 , · · · , xn ) , and f a function, define αn 1 α2 xα ≡ xα 1 x2 · · · xn .
A polynomial in n variables of degree m is a function of the form X p (x) = aα xα . |α|≤m
Here α is a multi-index as just described and aα ∈ C. Also define for α = (α1 , · · · , αn ) a multi-index ∂ |α| f Dα f (x) ≡ α1 αn . 2 ∂x1 ∂xα 2 · · · ∂xn
Definition 10.4.2
2
Define G1 to be the functions of the form p (x) e−a|x| where a > 0 is rational and p (x) is a polynomial having all rational coefficients, aα being “rational” if it is of the form a + ib for a, b ∈ Q. Let G be all finite sums of functions in G1 . Thus G is an algebra of functions which has the property that if f ∈ G then f ∈ G.
Thus there are countably many functions in G1 . This is because, for each m, there are countably many choices for aα for |α| ≤ m since there are finitely many α for |α| ≤ m and for each such α, there are countably many choices for aα since Q+iQ is countable. (Why?) Thus there are countably many polynomials having degree no more than m. This is true for each m and so the number of different polynomials is a countable union of countable 2 sets which is countable. Now there are countably many choices of e−α|x| and so there are countably many in G1 because the Cartesian product of countable sets is countable. Now G consists of finite sums of functions in G1 . Therefore, it is countable because for each m ∈ N, there are countably many such sums which are possible. I will show now that G is dense in Lp (Rn ) but first, here is a lemma which follows from the Stone Weierstrass theorem.
10.4. FOURIER TRANSFORMS IN RN
259
Lemma 10.4.3 G is dense in C0 (Rn ) with respect to the norm, kf k∞ ≡ sup {|f (x)| : x ∈ Rn } Proof: By the Weierstrass approximation theorem, it suffices to show G separates the points and annihilates no point. It was already observed in the above definition that f ∈ G whenever f ∈ G. If y1 6= y2 suppose first that |y1 | = 6 |y2 | . Then in this case, you can let −|x|2 f (x) ≡ e . Then f ∈ G and f (y1 ) 6= f (y2 ). If |y1 | = |y2 | , then suppose y1k 6= y2k . 2 This must happen for some k because y1 6= y2 . Then let f (x) ≡ xk e−|x| . Thus G separates 2 points. Now e−|x| is never equal to zero and so G annihilates no point of Rn . These functions are clearly quite specialized. Therefore, the following theorem is somewhat surprising.
Theorem 10.4.4
For each p ≥ 1, p < ∞, G is dense in Lp (Rn ). Since G is countable, this shows that L (Rn ) is separable. p
Proof: Let f ∈ Lp (Rn ) . Then there exists g ∈ Cc (Rn ) such that kf − gkp < ε. Now let b > 0 be large enough that Z 2 p e−b|x| dx < εp . Rn 2
Then x → g (x) eb|x| is in Cc (Rn ) ⊆ C0 (Rn ) . Therefore, from Lemma 10.4.3 there exists ψ ∈ G such that
b|·|2
− ψ < 1
ge ∞
2
Therefore, letting φ (x) ≡ e−b|x| ψ (x) it follows that φ ∈ G and for all x ∈ Rn , 2
|g (x) − φ (x)| < e−b|x| Therefore, Z Rn
1/p Z |g (x) − φ (x)| dx ≤ p
e
−b|x|2
p
1/p dx < ε.
Rn
It follows kf − φkp ≤ kf − gkp + kg − φkp < 2ε. From now on, drop the restriction that the coefficients of the polynomials in G are rational. Also drop the restriction that a is rational. Thus G will be finite sums of functions 2 which are of the form p (x) e−a|x| where the coefficients of p are complex and a > 0. The following lemma is also interesting even if it is obvious.
Lemma 10.4.5 For ψ ∈ G , p a polynomial, and α, β multi-indices, Dα ψ ∈ G and pψ ∈ G. Also sup{|xβ Dα ψ(x)| : x ∈ Rn } < ∞ Thus these special functions are infinitely differentiable (smooth). They also have the property that they and all their partial derivatives vanish as |x| → ∞. Let G be the functions of Definition 10.4.2 except, for the sake of convenience, remove all references to rational numbers. Thus G consists of finite sums of polynomials having 2 coefficients in C times e−a|x| for some a > 0. The idea is to first understand the Fourier transform on these very specialized functions.
260
CHAPTER 10. SOME FUNDAMENTAL FUNCTIONS AND TRANSFORMS
Definition 10.4.6
For ψ ∈ G Define the Fourier transform, F and the inverse
Fourier transform, F −1 by F ψ(t) ≡ (2π)
−n/2
Z
e−it·x ψ(x)dx,
Rn
F −1 ψ(t) ≡ (2π)−n/2
Z
eit·x ψ(x)dx.
Rn
Pn where t · x ≡ i=1 ti xi . Note there is no problem with this definition because ψ is in L1 (Rn ) and therefore, it·x e ψ(x) ≤ |ψ(x)| , an integrable function. One reason for using the functions G is that it is very easy to compute the Fourier transform of these functions. The first thing to do is to verify F and F −1 map G to G and that F −1 ◦ F (ψ) = ψ.
Lemma 10.4.7 The following hold. (c > 0)
1 2π
n/2 Z
2
e−c|t| e−is·t dt =
Rn
n/2
1 2π
=
e
−
|s|2 4c
1 2π
n/2 Z
2
e−c|t| eis·t dt
Rn
√ n n/2 2 1 π 1 √ = e− 4c |s| . 2c c
(10.6)
Proof: Consider first the case of one dimension. Let H (s) be given by Z Z 2 −ct2 −ist H (s) ≡ e e dt = e−ct cos (st) dt R
R
Then using the dominated convergence theorem to differentiate, H 0 (s)
Z
=
2
2
−e−ct
s e−ct sin (st) |∞ −∞ − 2c 2c
t sin (st) dt =
R
= − Also H (0) =
R R
Z
! 2
e−ct cos (st) dt
R
s H (s) . 2c 2
e−ct dt. Thus H (0) = 2
Z
I =
e
−c(x2 +y 2 )
2
R R
e−cx dx ≡ I and so Z
∞
Z
dxdy =
R2
0
2π
2
e−cr rdθdr =
0
π . c
For another proof of this which does not use change of variables and polar coordinates, see Problem 10 below. Hence r s π 0 H (s) + H (s) = 0, H (0) = . 2c c s2
It follows that H (s) = e− 4c 1 √ 2π
Z R
√ √π c 2
. Hence
e−ct e−ist dt =
r
π 1 − s2 √ e 4c = c 2π
1 2c
1/2
s2
e− 4c .
This proves the formula in the case of one dimension. The case of the inverse Fourier transform is similar. The n dimensional formula follows from Fubini’s theorem. With these formulas, it is easy to verify F, F −1 map G to G and F ◦ F −1 = F −1 ◦ F = id.
10.5. FOURIER TRANSFORMS OF JUST ABOUT ANYTHING
Theorem 10.4.8
261
Each of F and F −1 map G to G. Also F −1 ◦ F (ψ) = ψ and
F ◦ F −1 (ψ) = ψ. Proof: To make the notation simpler, e
−b|x|2
. Then from the above
R
will symbolize
−n/2
F fb = (2b)
1 (2π)n/2
R Rn
. Also, fb (x) ≡
f(4b)−1 2
The first claim will be shown if it is shown that F ψ ∈ G for ψ (x) = xα e−b|x| because an arbitrary function of G is a finite sum of scalar multiples of functions such as ψ. Using Lemma 10.4.7, Z 2 F ψ (t) ≡ e−it·x xα e−b|x| dx Z −|α| α −it·x −b|x|2 = (−i) Dt e e dx , (Differentiating under integral) √ n |t|2 π −|α| √ = (−i) Dtα e− 4b b |t|2
and this is clearly in G because it equals a polynomial times e− 4b . Similarly, F −1 : G → G. Now consider F −1 ◦ F (ψ) (s) where ψ was just used. From the above, and integrating by parts, Z Z 2 −|α| eis·t Dtα e−it·x e−b|x| dx dt F −1 ◦ F (ψ) (s) = (−i) Z Z 2 −|α| |α| = (−i) (−i) sα eis·t e−it·x e−b|x| dx dt = F −1 (F (fb )) (s)
= =
sα F −1 (F (fb )) (s)
−n/2 −n/2 −1 F −1 (2b) f(4b)−1 (s) = (2b) F f(4b)−1 (s) −n/2 −n/2 −1 (2b) 2 (4b) f(4(4b)−1 )−1 (s) = fb (s)
Hence F −1 ◦ F (ψ) (s) = sα fb (s) = ψ (s). Another way to see this is to use Observation 10.3.7. If n = 1 then F −1 ◦F (ψ) (s) = ψ (s) from Observation 10.3.7 and Theorem 10.3.6. So suppose F −1 ◦ F (ψ) (s) = ψ (s) on Rn−1 . From the definition and the observation and Fubini’s theorem, if ψ ∈ G on Rn , then from the special form of ψ, F −1 ◦ F (ψ) (s) ≡ Z Z Z Z ˆ isn tn −itn xn iˆ sn ·ˆ tn e e e e−itn ·ˆxn ψ (ˆ xn , xn ) dˆ xn dtˆn dxn dtn R
Rn−1
R
Rn−1
Now by induction and Theorem 10.3.6, this is Z Z isn tn e e−itn xn ψ (ˆ sn , xn ) dxn dtn = ψ (ˆ sn , sn ) = ψ (s) R
R
10.5
Fourier Transforms Of Just About Anything
10.5.1
Fourier Transforms in G ∗
It turns out you can make sense of the Fourier transform of any linear map defined on G. This is a very abstract way to look at things but if you want ultimate generality, you must do something like this. Part of the problem is that it is desired to take Fourier transforms of
262
CHAPTER 10. SOME FUNDAMENTAL FUNCTIONS AND TRANSFORMS
functions which are not in L1 (Rn ). Thus the integral which defines the Fourier transform in the above will not make sense. You run into this problem as soon as you try to take the Fourier transform of a function in L2 because such functions might not be in L1 if they are defined on R or Rn . However, it was realized long ago that if a function is in L1 ∩ L2 , then the L2 norm of the function is equal to the L2 norm of the Fourier transform of the function. Thus there is an obvious question about whether you can get a definition which will allow you to directly deal with the Fourier transform on L2 . If you solve this, perhaps by using density of L1 ∩ L2 in L2 , you are still faced with the problem of taking the Fourier transform of an arbitrary function in Lp . The method developed here removes all these difficulties at once.
Definition 10.5.1
Let G ∗ denote the vector space of linear functions defined on G which have values in C. Thus T ∈ G ∗ means T : G → C and T is linear, T (aψ + bφ) = aT (ψ) + bT (φ) for all a, b ∈ C, ψ, φ ∈ G Let ψ ∈ G. Then we can regard ψ as an element of G ∗ by defining Z ψ (φ) ≡ ψ (x) φ (x) dx. Rn
This implies the following important lemma.
Lemma 10.5.2 The following is obtained for all φ, ψ ∈ G. F ψ (φ) = ψ (F φ) , F −1 ψ (φ) = ψ F −1 φ
Also if ψ ∈ G and ψ = 0 in G ∗ so that ψ (φ) = 0 for all φ ∈ G, then ψ = 0 as a function. Proof: Z F ψ (φ) ≡
F ψ (t) φ (t) dt Rn
Z = Rn
1 2π
n/2 Z
Z =
ψ(x) Rn
1 2π
e−it·x ψ(x)dxφ (t) dt
Rn n/2
Z
e−it·x φ (t) dtdx
Rn
Z ψ(x)F φ (x) dx ≡ ψ (F φ)
= Rn
The other claim is similar. Suppose now ψ (φ) = 0 for all φ ∈ G. Then Z ψφdx = 0 Rn
for all φ ∈ G. Therefore, this is true for φ = ψ and so ψ = 0. This lemma suggests a way to define the Fourier transform of something in G ∗ .
Definition 10.5.3
For T ∈ G ∗ , define F T, F −1 T ∈ G ∗ by
F T (φ) ≡ T (F φ) , F −1 T (φ) ≡ T F −1 φ
Lemma 10.5.4 F and F −1 are both one to one, onto, and are inverses of each other.
10.5. FOURIER TRANSFORMS OF JUST ABOUT ANYTHING
263
Proof: First note F and F −1 are both linear. This follows directly from the definition. Suppose now F T = 0. Then F T (φ) ≡ T (F φ) = 0 for all φ ∈ G. But F and F −1 map G onto G because if ψ ∈ G, then as shown above, ψ = F F −1 (ψ) . Therefore, T = 0 and so F is one to one. Similarly F −1 is one to one. Now F −1 (F T ) (φ) ≡ (F T ) F −1 φ ≡ T F F −1 (φ) = T φ. Therefore, F −1 ◦ F (T ) = T. Similarly, F ◦ F −1 (T ) = T. Thus both F and F −1 are one to one and onto and are inverses of each other as suggested by the notation. Probably the most interesting things in G ∗ are functions of various kinds. The following lemma will be useful in considering this situation.
Definition 10.5.5
A function f defined on Rn is in L1loc (Rn ) if f XB ∈ L1 (Rn ) for every ball B. Such functions are termed locally integrable. R Lemma 10.5.6 If f ∈ L1loc (Rn ) and Rn f φdx = 0 for all φ ∈ Cc (Rn ), then f = 0 a.e.
Proof: Let E be bounded and Lebesgue measurable. By regularity, there exists a compact set Kk ⊆ E and an open set Vk ⊇ E such that mn (Vk \ Kk ) < 2−k . Let hk equal 1 on Kk , vanish on VkC , and take values between 0 and 1. Then hk converges to XE off ∞ ∩∞ k=1 ∪l=k (Vl \ Kl ) , a set of measure zero. Hence, by the dominated convergence theorem, Z Z f XE dmn = lim f hk dmn = 0. k→∞
It follows that for E an arbitrary Lebesgue measurable set, Z f XB(0,R) XE dmn = 0. Let
( sgn f =
f |f |
if |f | = 6 0 0 if |f | = 0
By Corollary 6.7.6, there exists {sk }, a sequence of simple functions converging pointwise to sgn f XB(0,R) such that |sk | ≤ 1. Then by the dominated convergence theorem again, Z Z f XB(0,R) sk dmn = 0. |f | XB(0,R) dmn = lim k→∞
Since R is arbitrary, |f | = 0 a.e.
Corollary 10.5.7 Let f ∈ L1 (Rn ) and suppose Z f (x) φ (x) dx = 0 Rn
for all φ ∈ G. Then f = 0 a.e. Proof: Let ψ ∈ Cc (Rn ) . Then by the Stone Weierstrass approximation theorem, there exists a sequence of functions, {φk } ⊆ G such that φk → ψ uniformly. Then by the dominated convergence theorem, Z Z f ψdx = lim f φk dx = 0. k→∞
By Lemma 10.5.6 f = 0. The next theorem is the main result of this sort.
264
CHAPTER 10. SOME FUNDAMENTAL FUNCTIONS AND TRANSFORMS
Theorem 10.5.8
Let f ∈ Lp (Rn ) , p ≥ 1, or suppose f is measurable and has
polynomial growth,
m 2 |f (x)| ≤ K 1 + |x|
for some m ∈ N. Then if
Z f ψdx = 0
for all ψ ∈ G, then it follows f = 0. Proof: First noteR that if f ∈ Lp (Rn ) or has polynomial growth, then it makes sense to write the integral f ψdx described above. This is obvious in the case of polynomial growth. In the case where f ∈ Lp (Rn ) it also makes sense because 1/p0 1/p Z Z Z p0 p |ψ| dx 1. Then 0 1 1 p−2 |f | f ∈ Lp (Rn ) , p0 = q, + = 1 p q 0
and by density of G in Lp (Rn ) (Theorem 10.4.4), there exists a sequence {gk } ⊆ G such that
p−2 f 0 → 0.
gk − |f | p
Then Z
p
Z
|f | dx
f |f |
=
Rn
p−2
Z
f − gk dx +
n
= ≤
f gk dx Rn
ZR
p−2 f |f | f − gk dx Rn
p−2 f kf kLp gk − |f | 0 p
which converges to 0. Hence f = 0. 2 It remains to consider the case where f has polynomial growth. Thus x → f (x) e−|x| ∈ 1 n L (R ) . Therefore, for all ψ ∈ G, Z 2 0 = f (x) e−|x| ψ (x) dx 2
2
because e−|x| ψ (x) ∈ G. Therefore, by the first part, f (x) e−|x| = 0 a.e. Note that “polynomial growth” could be replaced with a condition of the form m α 2 |f (x)| ≤ K 1 + |x| ek|x| , α < 2 and the same proof would yield that these functions are in G ∗ . The main thing to observe is that almost all functions of interest are in G ∗ .
Theorem 10.5.9
Let f be a measurable function with polynomial growth, N 2 |f (x)| ≤ C 1 + |x| for some N,
or let f ∈ Lp (Rn ) for some p ∈ [1, ∞]. Then f ∈ G ∗ if Z f (φ) ≡ f φdx.
10.5. FOURIER TRANSFORMS OF JUST ABOUT ANYTHING
265
Proof: Let f have polynomial growth first. Then the above integral is clearly well defined and so in this case, f ∈ G ∗ . Next suppose f ∈ Lp (Rn ) with ∞ > p ≥ 1. Then it is clear again that the above integral is well defined because of the fact that φ is a sum of polynomials times exponentials of the 2 0 form e−c|x| and these are in Lp (Rn ). Also φ → f (φ) is clearly linear in both cases. This has shown that for nearly any reasonable function, you can define its Fourier transform as described above. You could also define the Fourier transform of a finite Borel measure µ because for such a measure Z ψ→ ψdµ Rn
is a linear functional on G. This includes the very important case of probability distribution measures.
10.5.2
Fourier Transforms of Functions In L1 (Rn )
First suppose f ∈ L1 (Rn ) . As mentioned, you can think of it as being in G ∗ and so one can take its Fourier transform as described above. However, since it is in L1 (Rn ) , there is a natural way to define its Fourier transform. Do the two give the same thing?
Theorem 10.5.10
Let f ∈ L1 (Rn ) . Then F f (φ) = g (t) =
and F −1 f (φ) =
R Rn
1 2π
n/2 Z
F f (t) ≡ (2π)
Rn
gφdt where
e−it·x f (x) dx
Rn
R 1 n/2 2π Rn
gφdt where g (t) =
R
−n/2
Z
eit·x f (x) dx. In short,
e−it·x f (x)dx,
Rn
F −1 f (t) ≡ (2π)−n/2
Z
eit·x f (x)dx.
Rn
Proof: From the definition and Fubini’s theorem, Z
Z
F f (φ) ≡
f (t) F φ (t) dt = Rn
Rn
Z = Rn
f (t)
1 2π
n/2 Z
1 2π !
n/2 Z
e−it·x φ (x) dxdt
Rn
f (t) e−it·x dt φ (x) dx.
Rn
Since φ ∈ G is arbitrary, it follows from Theorem 10.5.8 that F f (x) is given by the claimed formula. The case of F −1 is identical. Here are interesting properties of these Fourier transforms of functions in L1 .
Theorem 10.5.11
If f ∈ L1 (Rn ) and kfk −f k1 → 0, then F fk and F −1 fk converge uniformly to F f and F f respectively. If f ∈ L1 (Rn ), then F −1 f and F f are both continuous and bounded. Also, −1
lim F −1 f (x) = lim F f (x) = 0.
|x|→∞
|x|→∞
Furthermore, for f ∈ L1 (Rn ) both F f and F −1 f are uniformly continuous.
(10.7)
266
CHAPTER 10. SOME FUNDAMENTAL FUNCTIONS AND TRANSFORMS Proof: The first claim follows from the following inequality. Z −it·x −n/2 e fk (x) − e−it·x f (x) dx |F fk (t) − F f (t)| ≤ (2π) n ZR −n/2 = (2π) |fk (x) − f (x)| dx Rn
=
(2π)
−n/2
kf − fk k1 .
which a similar argument holding for F −1 . Now consider the second claim of the theorem. Z 0 −it·x 0 −n/2 − e−it ·x |f (x)| dx |F f (t) − F f (t )| ≤ (2π) e Rn
The integrand is bounded by 2 |f (x)|, a function in L1 (Rn ) and converges to 0 as t0 → t and so the dominated convergence theorem implies F f is continuous. To see F f (t) is uniformly bounded, Z −n/2 |f (x)| dx < ∞. |F f (t)| ≤ (2π) Rn
A similar argument gives the same conclusions for F −1 . It remains to verify 10.7 and the claim that F f and F −1 f are uniformly continuous. Z −n/2 −it·x e f (x)dx |F f (t)| ≤ (2π) Rn
−n/2
kg − f k1 < ε/2. Then Now let ε > 0 be given and let g ∈ Cc∞ (Rn ) such that (2π) Z |F f (t)| ≤ (2π)−n/2 |f (x) − g (x)| dx Rn Z −n/2 −it·x e g(x)dx + (2π) Rn Z −n/2 −it·x e g(x)dx . ≤ ε/2 + (2π) Rn
Now integrating by parts, it follows that for ktk∞ ≡ max {|tj | : j = 1, · · · , n} > 0 Z X n 1 ∂g (x) dx |F f (t)| ≤ ε/2 + (2π)−n/2 ktk∞ Rn j=1 ∂xj
(10.8)
and this last expression converges to zero as ktk∞ → ∞. The reason for this is that if tj 6= 0, integration by parts with respect to xj gives Z Z 1 ∂g (x) (2π)−n/2 e−it·x g(x)dx = (2π)−n/2 e−it·x dx. −itj Rn ∂xj Rn Therefore, choose the j for which ktk∞ = |tj | and the result of 10.8 holds. Therefore, from 10.8, if ktk∞ is large enough, |F f (t)| < ε. Similarly, limktk→∞ F −1 (t) = 0. Consider the claim about uniform continuity. Let ε > 0 be given. Then there exists R such that if ktk∞ > R, then |F f (t)| < 2ε . Since F f is continuous, it is uniformly continuous on the n compact set [−R − 1, R + 1] . Therefore, there exists δ 1 such that if kt − t0 k∞ < δ 1 for n 0 t , t ∈ [−R − 1, R + 1] , then |F f (t) − F f (t0 )| < ε/2.
(10.9)
Now let 0 < δ < min (δ 1 , 1) and suppose kt − t0 k∞ < δ. If both t, t0 are contained in n n n [−R, R] , then 10.9 holds. If t ∈ [−R, R] and t0 ∈ / [−R, R] , then both are contained in
10.5. FOURIER TRANSFORMS OF JUST ABOUT ANYTHING
267
n
[−R − 1, R + 1] and so this verifies 10.9 in this case. The other case is that neither point n is in [−R, R] and in this case, |F f (t) − F f (t0 )|
≤
n, show that if f ∈ H k (Rn ), then f equals a bounded continuous function a.e. Hint: Show that for k this large, F f ∈ L1 (Rn ), and then use Problem 1. To do this, write −k k |F f (x)| = |F f (x)|(1 + |x|2 ) 2 (1 + |x|2 ) 2 , So
Z
Z |F f (x)|dx =
k
|F f (x)|(1 + |x|2 ) 2 (1 + |x|2 )
−k 2
dx.
Use the Cauchy Schwarz inequality. This is an example of a Sobolev imbedding Theorem. 9. Let u ∈ G. Then F u ∈ G and so, in particular, it makes sense to form the integral, Z F u (x0 , xn ) dxn R
where (x0 , xn ) = x ∈ Rn . For u ∈ G, define γu (x0 ) ≡ u (x0 , 0). Find a constant such that F (γu) (x0 ) equals this constant times the above integral. Hint: By the dominated convergence theorem Z Z 2 0 F u (x , xn ) dxn = lim e−(εxn ) F u (x0 , xn ) dxn . R
ε→0
R
Now use the definition of the Fourier transform and Fubini’s theorem as required in order to obtain the desired relationship. R 2 R −x2 (1+t2 ) x −t2 1 10. Let h (x) = 0 e dt + 0 e 1+t2 dt . Show that h0 (x) = 0 and h (0) = π/4. R∞ R∞ √ √ 2 2 Then let x → ∞ to conclude that 0 e−t dt = π/2. Show that −∞ e−t dt = π √ R∞ 2 and that −∞ e−ct dt = √πc . 11. Recall that for f a function, fy (x) = f (x − y) . Find a relationship between F fy (t) and F f (t) given that f ∈ L1 (Rn ). 12. For f ∈ L1 (Rn ) , simplify F f (t + y) . 13. For f ∈ L1 (Rn ) and c a nonzero real number, show F f (ct) = F g (t) where g (x) = f xc . R 14. Suppose that f ∈ L1 (R) and R that |x| |f (x)| dx < ∞. Find a way to use the Fourier transform of f to compute xf (x) dx.
276
CHAPTER 10. SOME FUNDAMENTAL FUNCTIONS AND TRANSFORMS
Chapter 11
Degree Theory, an Introduction This chapter is on the Brouwer degree, a very useful concept with numerous and important applications. The degree can be used to prove some difficult theorems in topology such as the Brouwer fixed point theorem, the Jordan separation theorem, and the invariance of domain theorem. A couple of these big theorems have been presented earlier, but when you have degree theory, they get much easier. Degree theory is also used in bifurcation theory and many other areas in which it is an essential tool. The degree will be developed for Rp first. When this is understood, it is not too difficult to extend to versions of the degree which hold in Banach space. There is more on degree theory in the book by Deimling [20] and much of the presentation here follows this reference. Another more recent book which is really good is [23]. This is a whole book on degree theory. The original reference for the approach given here, based on analysis, is [33] and dates from 1959. The degree was developed earlier by Brouwer and others using different methods. To give you an idea what the degree is about, consider a real valued C 1 function defined on an interval I, and let y ∈ f (I) be such that f 0 (x) 6= 0 for all x ∈ f −1 (y). In this case the degree is the sum of the signs of f 0 (x) for x ∈ f −1 (y), written as d (f, I, y).
y
In the above picture, d (f, I, y) is 0 because there are two places where the sign is 1 and two where it is −1. The amazing thing about this is the number you obtain in this simple manner is a specialization of something which is defined for continuous functions and which has nothing to do with differentiability. An outline of the presentation is as follows. First define the degree for smooth functions at regular values and then extend to arbitrary values and finally to continuous functions. The reason this is possible is an integral expression for the degree which is insensitive to homotopy. It is very similar to the winding number of complex analysis. The difference between the two is that with the degree, the integral which ties it all together is taken over the open set while the winding number is taken over the boundary, although proofs of in the case of the winding number sometimes involve Green’s theorem which involves an integral over the open set. 277
278
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION In this chapter Ω will refer to a bounded open set.
Definition 11.0.1
For Ω a bounded open set, denote by C Ω the set of functions which are restrictions of functions in Cc (Rp ) , equivalently C (Rp ) to Ω and by C m Ω , m ≤ ∞ the space of restrictions of functions in Ccm (Rp ) , equivalently C m (Rp ) to Ω. If f ∈ C Ω the symbol f will also be used to denote a function defined on Rp equalling f on Ω when convenient. The subscript c indicates that the functions have compact support. The norm in C Ω is defined as follows. kf k∞,Ω = kf k∞ ≡ sup |f (x)| : x ∈ Ω . If the functions take values in Rp write C m Ω; Rp or C Ω; Rp for these functions if there is no differentiability assumed. The norm on C Ω; Rp is defined in the same way as above, kf k∞,Ω = kf k∞ ≡ sup |f (x)| : x ∈ Ω . Of course if m = ∞, the notation means that there are infinitely many derivatives. Also, C (Ω; Rp ) consists of functions which are continuous on Ω that have values in Rp and C m (Ω; Rp ) denotes the functions which have m continuous derivatives defined on Ω. Also let P consist of functions f (x) such that fk (x) is a polynomial, meaning an element of the algebra of functions generated by {1, x1 , · · · , xp } . Thus a typical polynomial is of the form P i1 ip a where the ij are nonnegative integers and a (i1 · · · ip ) is a real i1 ···ip (i1 · · · ip ) x · · · x number. Some of the theorems are simpler if you base them on the Weierstrass approximation theorem. Note that, by applying the Tietze extension theorem to the components of the function, one can always extend a function continuous on Ω to all of Rp so there is no loss of generality in simply regarding functions continuous on Ω as restrictions of functions continuous on Rp . Next is the idea of a regular value.
Definition 11.0.2
For W an open set in Rp and g ∈ C 1 (W ; Rp ) y is called a regular value of g if whenever x ∈ g−1 (y), det (Dg (x)) 6= 0. Note that if g−1 (y) = ∅, it follows that y is a regular value from this definition. That is, y is a regular value if and only if y∈ / g ({x ∈ W : det Dg (x) = 0}) Denote by Sg the set of singular values of g, those y such that det (Dg (x)) = 0 for some x ∈ g−1 (y). Also, ∂Ω will often be referred to. It is those points with the property that every open set (or open ball) containing the point contains points not in Ω and points in Ω. Then the following simple lemma will be used frequently.
Lemma 11.0.3 Define ∂U to be those points x with the property that for every r > 0, B (x, r) contains points of U and points of U C . Then for U an open set, ∂U = U \ U
(11.1)
Let C be a closed subset of Rp and let K denote the set of components of Rp \ C. Then if K is one of these components, it is open and ∂K ⊆ C Proof: First consider claim 11.1. Let x ∈ U \ U. If B (x, r) contains no points of U, then x ∈ / U . If B (x, r) contains no points of U C , then x ∈ U and so x ∈ / U \ U . Therefore, U \ U ⊆ ∂U . Now let x ∈ ∂U . If x ∈ U, then since U is open there is a ball containing
11.1. SARD’S LEMMA AND APPROXIMATION
279
x which is contained in U contrary to x ∈ ∂U . Therefore, x ∈ / U. If x is not a limit point of U, then some ball containing x contains no points of U contrary to x ∈ ∂U . Therefore, x ∈ U \ U which shows the two sets are equal. Why is K open for K a component of Rp \ C? This follows from Theorem 2.11.12 and results from open balls being connected. Thus if k ∈ K,letting B (k, r) ⊆ C C , it follows K ∪ B (k, r) is connected and contained in C C and therefore is contained in K because K is maximal with respect to being connected and contained in C C . Now for K a component of Rp \ C, why is ∂K ⊆ C? Let x ∈ ∂K. If x ∈ / C, then x ∈ K1 , some component of Rp \ C. If K1 6= K then x cannot be a limit point of K and so it cannot be in ∂K. Therefore, K = K1 but this also is a contradiction because if x ∈ ∂K then x ∈ /K thanks to 11.1. Note that for an open set U ⊆ Rp , and h :U → Rp , dist (h (∂U ) , y) ≥ dist h U , y because U ⊇ ∂U . The following lemma will be nice to keep in mind. Lemma 11.0.4 f ∈ C Ω × [a, b] ; Rp if and only if t → f (·, t) is in C [a, b] ; C Ω; Rp . Also kf k∞,Ω×[a,b] = max kf (·,t)k∞,Ω t∈[a,b]
Proof: ⇒By uniform continuity, if ε > 0 there is δ > 0 such that if |t − s| < δ, then for all x ∈ Ω, kf (x,t) − f (x,s)k < 2ε . It follows that kf (·, t) − f (·, s)k∞ ≤ 2ε < ε. ⇐Say (xn , tn ) → (x,t) . Does it follow that f (xn , tn ) → f (x,t)? kf (xn , tn ) − f (x,t)k
≤
kf (xn , tn ) − f (xn , t)k + kf (xn , t) − f (x, t)k
≤
kf (·, tn ) − f (·, t)k∞ + kf (xn , t) − f (x, t)k both terms converge to 0, the first because f is continuous into C Ω; Rp and the second because x → f (x, t) is continuous. The claim about the norms is next. Let (x, t) be such that kf k∞,Ω×[a,b] < kf (x, t)k + ε. Then kf k∞,Ω×[a,b] < kf (x, t)k + ε ≤ max kf (·, t)k∞,Ω + ε t∈[a,b]
and so kf k∞,Ω×[a,b] ≤ maxt∈[a,b] max kf (·,t)k∞,Ω
because ε is arbitrary. However, the
same argument works in the other direction. There exists t such that kf (·, t)k∞,Ω = maxt∈[a,b] kf (·, t)k∞,Ω by compactness of the interval. Then by compactness of Ω, there is x such that kf (·,t)k∞,Ω = kf (x, t)k ≤ kf k∞,Ω×[a,b] and so the two norms are the same.
11.1
Sard’s Lemma and Approximation
First are easy assertions about approximation of continuous functions with smooth ones. The following is the Weierstrass approximation theorem. It is Corollary 2.12.3 presented earlier.
Corollary 11.1.1 If f ∈ C ([a, b] ; X) where X is a normed linear space, then there exists a sequence of polynomials which converge uniformly to f on [a, b]. The polynomials are of the form m X k m−k k m −1 −1 l (t) (11.2) 1 − l (t) f l k m k=0
where l is a linear one to one and onto map from [0, 1] to [a, b]. Applying the Weierstrass approximation theorem, Theorem 2.12.11 or Theorem 2.12.7 to the components of a vector valued function yields the following corollary
280
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
Theorem 11.1.2 there exists g ∈ C ∞
If f ∈ C Ω; Rp for Ω a bounded subset of Rp , then for all ε > 0, Ω; Rp such that kg − f k∞,Ω < ε.
Recall Sard’s lemma, shown earlier. It is Lemma 9.2.3. I am stating it here for convenience.
Lemma 11.1.3 (Sard) Let U be an open set in Rp and let h : U → Rp be differentiable. Let S ≡ {x ∈ U : det Dh (x) = 0} . Then mp (h (S)) = 0. First note that if y ∈ / g (Ω) , then we are calling it a regular value because whenever x ∈ g−1 (y) the desired conclusion follows vacuously. Thus, for g ∈ C ∞ Ω, Rp , y is a regular value if and only if y ∈ / g x ∈ Ω : det Dg (x) = 0 . Observe that any uncountable set in Rp has a limit point. To see this, tile Rp with countably many congruent boxes. One of them has uncountably many points. Now subdivide this into 2p congruent boxes. One has uncountably many points. Continue subdividing this way to obtain a limit point as the unique point in the intersection of a nested sequence of compact sets whose diameters converge to 0. p Lemma 11.1.4 Let g ∈ C ∞ (Rp ; Rp ) and let {yi }∞ i=1 be points of R and let η > 0.
Then there exists e with kek < η and yi + e is a regular value for g. Proof: Let S = {x ∈ Rp : det Dg (x) = 0}. By Sard’s lemma, g (S) has measure zero. Let N ≡ ∪∞ i=1 (g (S) − yi ) . Thus N has measure 0. Pick e ∈ B (0, η) \ N . Then for each i, yi + e ∈ / g (S) . Lemma 11.1.5 Let f ∈ C Ω;Rp and let {yi }∞ i=1 be points not in f (∂Ω) and let δ > 0. Then there exists g ∈ C ∞ Ω; Rp such that kg − f k∞,Ω < δ and yi is a regular value for −1 g for each i. That is, if g (x) = yi , then Dg (x) exists. Also, if δ < dist (f (∂Ω) , y) for ∞ p some y a regular value of g ∈ C Ω; R , then g−1 (y) is a finite set of points in Ω. Also, ∞ if y is a regular value of g ∈ C (Rp , Rp ) , then g−1 (y) is countable. Proof: Pick g ˜ ∈ C ∞ Ω; Rp , k˜ g − f k∞,Ω < δ. g ≡ g ˜ − e where e is from the above Lemma 11.1.4 and η so small that kg − f k∞,Ω < δ. Then if g (x) = yi , you get g ˜ (x) = yi +e a regular value of g ˜ and so det (Dg (x)) = det (D˜ g (x)) 6= 0 so this shows the first part. It remains to verify the last claims. Since kg − f kΩ,∞ < δ, if x ∈ ∂Ω, then kg (x) − yk ≥ kf (x) − yk − kf (x) − g (x)k ≥ dist (f (∂Ω) , y) − δ > δ − δ = 0 and so y ∈ / g (∂Ω), so if g (x) = y, then x ∈ Ω. If there are infinitely many points in g−1 (y) , then there would be a subsequence converging to a point x ∈ Ω. Thus g (x) = y so x ∈ / ∂Ω. However, this would violate the inverse function theorem because g would fail to be one to one on an open ball containing x and contained in Ω. Therefore, there are only finitely many points in g−1 (y) and at each point, the determinant of the derivative of g is nonzero. For y a regular value, g−1 (y) is countable since otherwise, there would be a limit point x ∈ g−1 (y) and g would fail to be one to one near x contradicting the inverse function theorem. Now with this, here is a definition of the degree.
Definition 11.1.6
Let Ω be a bounded open set in Rp and let f : Ω → Rp be continuous. Let y ∈ / f (∂Ω) . Then the degree is defined as follows: Let g be infinitely differentiable, kf − gk∞,Ω < dist (f (∂Ω) , y) , and y is a regular value of g.Then X d (f ,Ω, y) ≡ sgn (det (Dg (x))) : x ∈ g−1 (y)
11.1. SARD’S LEMMA AND APPROXIMATION
281
From Lemma 11.1.5 the definition at least makes sense because the sum is finite and such a g exists. The problem is whether the definition is well defined in the sense that we get the same answer if a different g is used. Suppose it is shown that the definition is well defined. If y ∈ / f (Ω) , then you could pick g such that kg − f kΩ < dist y, f Ω . However, this requires that g Ω does not contain y because if x ∈ Ω, then kg (x) − yk
= k(y − f (x)) − (g (x) − f (x))k ≥ kf (x) − yk − kg (x) − f (x)k > dist y, f Ω − dist y, f Ω = 0
Therefore, y is a regular value of g because every point in g−1 (y) is such that the determinant of the derivative at this point is non zero since there are no such points. Thus if f −1 (y) = ∅, d (f ,Ω, y) = 0. If f (z) = y, then there is g having y a regular value and g (z) = y by the above lemma.
Lemma 11.1.7 Suppose g, gˆ both satisfy the above definition, dist (f (∂Ω) , y) > δ > kf − gk∞,Ω , dist (f (∂Ω) , y) > δ > kf − g ˆk∞,Ω Then for t ∈ [0, 1] so does tg + (1 − t) g ˆ. In particular, y ∈ / (tg + (1 − t) g ˆ) (∂Ω). More generally, if kh − f k < dist (f (∂Ω) , y) then 0 < dist (h (∂Ω) , y) . Also d (f − y,Ω, 0) = d (f ,Ω, y). Proof: From the triangle inequality, if t ∈ [0, 1] , kf − (tg + (1 − t) g ˆ)k∞ ≤ t kf − gk∞ + (1 − t) kf − g ˆk∞ < tδ + (1 − t) δ = δ. If kh − f k∞ < δ < dist (f (∂Ω) , y) , as was just shown for h ≡tg + (1 − t) g ˆ, then if x ∈ ∂Ω, ky − h (x)k ≥ ky − f (x)k − kh (x) − f (x)k > dist (f (∂Ω) , y) − δ ≥ δ − δ = 0 Now consider the last claim. This follows because kg − f k∞ small is the same as −1 kg − y− (f − y)k∞ being small. They are the same. Also, (g − y) (0) = g−1 (y) and Dg (x) = D (g − y) (x). First is an identity. It was Lemma 3.6.2 on Page 100.
Lemma 11.1.8 Let g : U → Rp be C 2 where U is an open subset of Rp . Then p X
cof (Dg)ij,j = 0,
j=1
where here (Dg)ij ≡ gi,j ≡
∂gi ∂xj .
Also, cof (Dg)ij =
Next is an integral representation of little lemma about disjoint sets.
P
∂ det(Dg) . ∂gi,j
sgn (det (Dg (x))) : x ∈ g−1 (y) but first is a
Lemma 11.1.9 Let K be a compact set and C a closed set in Rp such that K ∩ C = ∅. Then dist (K, C) ≡ inf {kk − ck : k ∈ K, c ∈ C} > 0. Proof: Let d ≡ inf {kk − ck : k ∈ K, c ∈ C} Let {ki } , {ci } be such that d+
1 > kki − ci k . i
282
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
Since K is compact, there is a subsequence still denoted by {ki } such that ki → k ∈ K. Then also kci − cm k ≤ kci − ki k + kki − km k + kcm − km k If d = 0, then as m, i → ∞ it follows kci − cm k → 0 and so {ci } is a Cauchy sequence which must converge to some c ∈ C. But then kc − kk = limi→∞ kci − ki k = 0 and so c = k ∈ C ∩ K, a contradiction to these sets being disjoint. In particular the distance between a point and a closed set is always positive if the point is not in the closed set. Of course this is obvious even without the above lemma. Definition 11.1.10 Let g ∈ C ∞ Ω; Rp where Ω is a bounded open set. Also let φε be a mollifier. Z φε ∈ Cc∞ (B (0, ε)) , φε ≥ 0,
φε dx = 1.
The idea is that ε will converge to 0 to get suitable approximations. First, here is a technical lemma which will be used to identify the degree with an integral. Lemma 11.1.11 Let y ∈/ g (∂Ω) for g ∈ C ∞ Ω; Rp . Also suppose y is a regular value of g. Then for all positive ε small enough, Z X φε (g (x) − y) det Dg (x) dx = sgn (det Dg (x)) : x ∈ g−1 (y) Ω
Proof: First note that the sum is finite from Lemma 11.1.5. It only remains to verify the equation. I need to show the left side of this equation is constant for ε small enough and equals the m right side. By what was just shown, there are finitely many points, {xi }i=1 = g−1 (y). By the inverse function theorem, there exist disjoint open sets Ui with xi ∈ Ui , such that g is one to one on Ui with det (Dg (x)) having constant sign on Ui and g (Ui ) is an open set containing y. Then let ε be small enough that B (y, ε) ⊆ ∩m / g Ω \ (∪ni=1 Ui ) , a i=1 g (Ui ) . Also, y ∈ compact set. Let ε be still smaller, if necessary, so that B (y, ε) ∩ g Ω \ (∪ni=1 Ui ) = ∅ and let Vi ≡ g−1 (B (y, ε)) ∩ Ui . g(U3 ) x3 •
V1
ε•y
V3 V2
g(U2 )
• x1 g(U1 )
• x2
Therefore, for any ε this small, Z m Z X φε (g (x) − y) det Dg (x) dx = Ω
i=1
φε (g (x) − y) det Dg (x) dx
Vi
The reason for this is as follows. The integrand on the left is nonzero only if g (x) − y ∈ B (0, ε) which occurs only if g (x) ∈ B (y, ε) which is the same as x ∈ g−1 (B (y, ε)). Therefore, the integrand is nonzero only if x is contained in exactly one of the disjoint sets, Vi . Now using the change of variables theorem, (z = g (x) − y, g−1 (y + z) = x.) m Z X = φε (z) det Dg g−1 (y + z) det Dg−1 (y + z) dz i=1
g(Vi )−y
11.1. SARD’S LEMMA AND APPROXIMATION
283
By the chain rule, I = Dg g−1 (y + z) Dg−1 (y + z) and so det Dg g−1 (y + z) det Dg−1 (y + z) = sgn det Dg g−1 (y + z) det Dg g−1 (y + z) det Dg−1 (y + z) = sgn det Dg g−1 (y + z) = sgn (det Dg (x)) = sgn (det Dg (xi )) . Therefore, this reduces to m X
Z sgn (det Dg (xi ))
m X
φε (z) dz = g(Vi )−y
i=1
Z sgn (det Dg (xi ))
φε (z) dz = B(0,ε)
i=1
m X
sgn (det Dg (xi )) .
i=1
In case g−1 (y) = ∅, there exists ε > 0 such that g Ω ∩ B (y, ε) = ∅ and so for ε this small, Z φε (g (x) − y) det Dg (x) dx = 0. Ω
As noted above, this will end up being d (g, Ω, y) in this last case where g−1 (y) = ∅. Next is an important result on homotopy. Lemma 11.1.12 If h is in C ∞ Ω × [a, b] , Rp , and 0 ∈/ h (∂Ω × [a, b]) then for 0 < ε < dist (0, h (∂Ω × [a, b])) , Z t→ φε (h (x, t)) det D1 h (x, t) dx Ω
is constant for t ∈ [a, b]. As a special case, d (f , Ω, y) is well defined. Also, if y ∈ / f Ω , then d (f ,Ω, y) = 0. Proof: By continuity of h, h (∂Ω × [a, b]) is compact and so is at a positive distance from 0. Let ε > 0 be such that for all t ∈ [a, b] , B (0, ε) ∩ h (∂Ω × [a, b]) = ∅
(11.3)
Define for t ∈ (a, b), Z H (t) ≡
φε (h (x, t)) det D1 h (x, t) dx Ω
I will show that H 0 (t) = 0 on (a, b) . Then, since H is continuous on [a, b] , it will follow from the mean value theorem that H (t) is constant on [a, b]. If t ∈ (a, b), Z X H 0 (t) = φε,α (h (x, t)) hα,t (x, t) det D1 h (x, t) dx Ω α
Z +
φε (h (x, t)) Ω
X
det D1 (h (x, t)),αj hα,jt dx ≡ A + B.
(11.4)
α,j
In this formula, the function det is considered as a function of the n2 entries in the n × n matrix and the , αj represents the derivative with respect to the αj th entry hα,j . Now as in the proof of Lemma 3.6.2 on Page 100, det D1 (h (x, t)),αj = (cof D1 (h (x, t)))αj
284
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
and so B=
Z XX Ω α
φε (h (x, t)) (cof D1 (h (x, t)))αj hα,jt dx.
j
By hypothesis x → φε (h (x, t)) (cof D1 (h (x, t)))αj for x ∈ Ω is in Cc∞ (Ω) because if x ∈ ∂Ω, it follows that for all t ∈ [a, b] , h (x, t) ∈ / B (0, ε) and so φε (h (x, t)) = 0 off some compact set contained in Ω. Therefore, integrate by parts and write Z XX ∂ (φε (h (x, t))) (cof D1 (h (x, t)))αj hα,t dx+ B=− ∂x j Ω α j −
Z XX Ω α
φε (h (x, t)) (cof D (h (x, t)))αj,j hα,t dx
j
The second term equals zero by Lemma 11.1.8. Simplifying the first term yields B =
−
Z XXX Ω α
=
−
j
Z XX Ω α
φε,β (h (x, t)) hβ,j hα,t (cof D1 (h (x, t)))αj dx
β
φε,β (h (x, t)) hα,t
X
hβ,j (cof D1 (h (x, t)))αj dx
j
β
Now the sum on j is the dot product of the β th row with the αth row of the cofactor matrix which equals zero unless β = α because it would be a cofactor expansion of a matrix with two equal rows. When β = α, the sum on j reduces to det (D1 (h (x, t))) . Thus B reduces to Z X =− φε,α (h (x, t)) hα,t det (D1 (h (x, t))) dx Ω α
Which is the same thing as A, but with the opposite sign. Hence A + B in 11.4 is 0 and H 0 (t) = 0 and so H is a constant on [a, b]. Finally consider the last claim. If g, g ˆ both work in the definition for the degree, then consider h (x, t) ≡ tg (x) + (1 − t) g ˆ (x) − y for t ∈ [0, 1] . From Lemma 11.1.7, h satisfies what is needed for the first part of this lemma. Then from Lemma 11.1.11 and the first part of this lemma, if 0 < ε < dist (0, h (∂Ω × [0, 1])) is sufficiently small that the second and last equations hold in what follows, d (f , Ω, y)
=
X
sgn (det (Dg (x))) : x ∈ g−1 (y)
Z =
φε (h (x, 1)) det D1 h (x, 1) dx ZΩ
=
φε (h (x, 0)) det D1 h (x, 0) dx X = sgn (det (Dˆ g (x))) : x ∈ g−1 (y) Ω
The last claim was noted earlier. If y ∈ / f Ω , then letting g be smooth with kf − gk∞,Ω < δ < dist (f (∂Ω) , y) , it follows that if x ∈ ∂Ω, kg (x) − yk ≥ ky − f (x)k − kg (x) − f (x)k > dist (f (∂Ω) , y) − δ ≥ 0 Thus, from the definition, d (f , Ω, y) = 0.
11.2. PROPERTIES OF THE DEGREE
11.2
285
Properties of the Degree
Now that the degree for a continuous function has been defined, it is time to consider properties of the degree. In particular, it is desirable to prove a theorem about homotopy invariance which depends only on continuity considerations. Theorem 11.2.1 If h is in C Ω × [a, b] , Rp , and 0 ∈/ h (∂Ω × [a, b]) then t → d (h (·, t) , Ω, 0) is constant for t ∈ [a, b]. Proof: Let 0 < δ < mint∈[a,b] dist (h (∂Ω × [a, b]) , 0) . By Corollary 11.1.1, there exists hm (·, t) =
m X
pk (t) h (·, tk )
k=0
for pk (t) some polynomial in t of degree m such that max khm (·, t) − h (·, t)k∞,Ω < δ
(11.5)
t∈[a,b]
Letting ψ n be a mollifier, Z 1 Cc∞ B 0, , ψ n (u) du = 1 n Rp let gn (·, t) ≡ hm ∗ ψ n (·, t) Thus, Z gn (x, t) ≡ Rp
=
m X
hm (x − u, t) ψ n (u) du =
m X
Z pk (t)
k=0
Z pk (t) Rp
k=0
h (u, tk ) ψ n (x − u) du ≡
Rp m X
h (x − u, tk ) ψ n (u) du
pk (t) h (·, tk ) ∗ ψ n (x) (11.6)
k=0
so x → gn (x, t) is in C ∞ Ω; Rp . Also, Z kgn (x, t) − hm (x,t)k ≤
kh (x − u, t) − h (z, t)k du < ε
1 B (0, n )
provided n is large enough. This follows from uniform continuity of h on the compact set Ω × [a, b]. It follows that if n is large enough, one can replace hm in 11.5 with gn and obtain for large enough n max kgn (·, t) − h (·, t)k∞,Ω < δ (11.7) t∈[a,b]
Now gn ∈ C ∞ Ω × [a, b] ; R are continuous. Here
p
because all partial derivatives with respect to either t or x
gn (x,t) =
m X
pk (t) h (·, tk ) ∗ ψ n (x)
k=0
Let τ ∈ (a, b]. Let gaτ (x, t) ≡ gn (x, t) −
τ −t τ −a ya
+ yτ τt−a −a
where ya is a regular value
of gn (·, a), yτ is a regular value of gn (x, τ ) and both ya , yτ are so small that max kgaτ (·, t) − h (·, t)k∞,Ω < δ.
t∈[a,b]
(11.8)
286
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
This uses Lemma 11.1.3. Thus if gaτ (x, τ ) = 0, then gn (x, τ ) − yτ = 0 so 0 is a regular value for gaτ (·, τ ) . If gaτ (x, a) = 0, then gn (x, t) = ya so 0 is also a regular value for gaτ (·, a) . From 11.8, dist (gaτ (∂Ω × [a, b]) , 0) > 0 as in Lemma 11.1.5. Choosing ε < dist (gaτ (∂Ω × [a, b]) , 0), it follows from Lemma 11.1.12, the definition of the degree, and Lemma 11.1.11 in the first and last equations that for ε small enough, Z d (h (·, a) , Ω, 0) = φε (gaτ (x, a)) det D1 gaτ (x, a) dx ZΩ = φε (gaτ (x, τ )) det D1 gaτ (x, τ ) dx = d (h (·, τ ) , Ω, 0) Ω
Since τ is arbitrary, this proves the theorem. Now the following theorem is a summary of the main result on properties of the degree.
Theorem 11.2.2
Definition 11.1.6 is well defined and the degree satisfies the fol-
lowing properties. / h (∂Ω, t) for all t ∈ [0, 1] 1. (homotopy invariance) If h ∈ C Ω × [0, 1] , Rp and y (t) ∈ where y is continuous, then t → d (h (·, t) , Ω, y (t)) is constant for t ∈ [0, 1] . 2. If Ω ⊇ Ω1 ∪ Ω2 where Ω1 ∩ Ω2 = ∅, for Ωi an open set, then if y ∈ / f Ω \ (Ω1 ∪ Ω2 ) , then d (f , Ω1 , y) + d (f , Ω2 , y) = d (f , Ω, y) 3. d (I, Ω, y) = 1 if y ∈ Ω. 4. d (f , Ω, ·) is continuous and constant on every connected component of Rp \ f (∂Ω). 5. d (g,Ω, y) = d (f , Ω, y) if g|∂Ω = f |∂Ω . 6. If y ∈ / f (∂Ω), and if d (f ,Ω, y) 6= 0, then there exists x ∈ Ω such that f (x) = y. Proof: That the degree is well defined follows from Lemma 11.1.12. Consider 1., the first property about homotopy. This follows from Theorem 11.2.1 applied to H (x, t) ≡ h (x, t) − y (t). Consider 2. where y ∈ / f Ω \ (Ω1 ∪ Ω2 ) . Note that dist y, f Ω \ (Ω1 ∪ Ω2 ) ≤ dist (y, f (∂Ω)). Then let g be in C Ω; Rp and kg − f k∞
δ − δ = 0 Then by 1 of Theorem 11.2.2, d (f , Ω, (1 − t) y + tz) is constant. When t = 0 you get d (f , Ω, y) and when t = 1 you get d (f , Ω, z) . Another simple observation is that if you have y1 , · · · , yr in Rp \ f (∂Ω) , then if ˜ f has
˜
the property that f − f < mini≤r dist (yi , f (∂Ω)) , then ∞
d (f , Ω, yi ) = d ˜ f , Ω, yi for each yi . This follows right away from the above arguments and thehomotopy invariance ˜ applied to each of the finitely many yi . Just consider d f + t f − f , Ω, yi , t ∈ [0, 1] . If x ∈ ∂Ω, f + t ˜ f − f (x) 6= yi and so d f + t ˜ f − f , Ω, yi is constant on [0, 1] , this for each i.
11.3
Borsuk’s Theorem
In this section is an important theorem which can be used to verify that d (f , Ω, y) 6= 0. This is significant because when this is known, it follows from Theorem 11.2.2 that f −1 (y) 6= ∅. In other words there exists x ∈ Ω such that f (x) = y.
288
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
Definition 11.3.1
A bounded open set, Ω is symmetric if −Ω = Ω. A continuous function, f : Ω → Rp is odd if f (−x) = −f (x). Suppose Ω is symmetric and g ∈ C ∞ Ω; Rp is an odd map for which 0 is a regular value. Then the chain rule implies Dg (−x) = Dg (x) and so d (g, Ω, 0) must equal an odd integer because if x ∈ g−1 (0), it follows that −x ∈ g−1 (0) also and since Dg (−x) = Dg (x), it follows the overall contribution to the degree from x and −x must be an even integer. Also 0 ∈ g−1 (0) and so the degree equals an even integer added to sgn (det Dg (0)), an odd integer, either −1 or 1. It seems reasonable to expect that something like this would hold for an arbitrary continuous odd function defined on symmetric Ω. In fact this is the case and this is next. The following lemma is the key result used. This approach is due to Gromes [30]. See also Deimling [20] which is where I found this argument. The idea is to start with a smooth odd map and approximate it with a smooth odd map which also has 0 a regular value. Note that 0 is achieved because g (0) = −g (0) . Lemma 11.3.2 Let g ∈ C ∞ Ω; Rp be an odd map. Then for every ε > 0, there exists h ∈ C ∞ Ω; Rp such that h is also an odd map, kh − gk∞ < ε, and 0 is a regular value of h, 0 ∈ / g (∂Ω) . Here Ω is a symmetric bounded open set. In addition, d (g, Ω, 0) is an odd integer. Proof: In this argument η > 0 will be a small positive number. Let h0 (x) = g (x) + ηx where η is chosen such that det Dh0 (0) 6= 0. Just let −η not be an eigenvalue of Dg (0) also see Lemma 3.6.1. Note that h0 is odd and 0 is a value of h0 thanks to h0 (0) = 0. This has taken care of 0. However, it is not known whether 0 is a regular value of h0 because there may be other x where h0 (x) = 0. These other points must be accounted for. The eventual function will be of the form h (x) ≡ h0 (x) −
p X
yj x3j ,
j=1
0 X
yj x3j ≡ 0.
j=1
Note that h (0) = 0 and det (Dh (0)) = det (Dh0 (0)) 6= 0. This is because when you take the matrix of the derivative, those terms involving x3j will vanish when x = 0 because of the exponent 3.
The idea is to choose small yj , yj < η in such a way that 0 is a regular value for h for each x 6= 0 such that x ∈ h−1 (0). As just noted, the case where x = 0 creates no problems. Let Ωi ≡ {x ∈ Ω : xi 6= 0} , so ∪pj=1 Ωj = {x ∈ Rp : x 6= 0} , Ω0 ≡ {0} . Each Ωi is a symmetric open set while Ω0 is the single point 0. Then let h1 (x) ≡ h0 (x) − y1 x31 on Ω1 . If h1 (x) = 0, then h0 (x) = y1 x31 so y1 = The set of singular values of x →
h0 (x) x31
(11.9)
h0 (x) x31
for x ∈ Ω1 contains no open sets. Hence there exists
a regular value y for x → such that y1 < η. Then if y1 = h0x(x) 3 , abusing notation 1 a little, the matrix of the derivative at this x is h0 (x) x31
1
3
Dh1 (x)rs
= =
h0r,s (x) x31 − (x1 ),xs h0r (x) x61 h0r,s (x)
3 − (x1 ),xs x31
3
=
h0r,s (x) x31 − (x1 ),xs yr1 x31 x61
yr1
and so, choosing y1 this way in 11.9, this derivative is non-singular if and only if 3 det h0r,s (x) − (x1 ),xs yr1 = det D (h1 (x)) 6= 0
11.3. BORSUK’S THEOREM
289
which of course is exactly what is wanted. Thus there is a small y1 , y1 < η such that if h1 (x) = 0, for x ∈ Ω1 , then det D (h1 (x)) 6= 0. That is, 0 is a regular value of h1 . Thus, 11.9 is how to choose y1 for y1 < η. Then the theorem will be proved by doing the same process, going from Ω1 to Ω1 ∪ Ω2 and so forth. If for each k ≤ p, there exist yi , yi < η, i ≤ k and 0 is a regular value of hk (x) ≡ h0 (x) −
k X
yj x3j for x ∈ ∪ki=0 Ωi
j=1
Then the theorem will be proved by considering hp ≡ h. For k = 0, 1 this is done. Suppose then that this holds for k − 1, k ≥ 2. Thus 0 is a regular value for hk−1 (x) ≡ h0 (x) −
k−1 X
yj x3j on ∪k−1 i=0 Ωi
j=1
What should yk be? Keeping y1 , · · · , yk−1 , it follows that if xk = 0, hk (x) = hk−1 (x) . For x ∈ Ωk where xk 6= 0, hk (x) ≡ h0 (x) −
k X
yj x3j = 0
j=1
if and only if h0 (x) −
Pk−1 j=1
yj x3j
x3k
= yk
P j 3
h0 (x)− k−1 j=1 y xj So let yk be a regular value of on Ωk , (xk 6= 0) and also yk < η. This x3k is the same reasoning as before, the set of singular values does not contain any open set. Then for such x satisfying
h0 (x) −
Pk−1 j=1
x3k
yj x3j
= yk on Ωk ,
(11.10)
and using the quotient rule as before, Pk−1 Pk−1 h0r,s − j=1 yrj ∂xs x3j x3k − h0r (x) − j=1 yrj x3j ∂xs x3k 0 6= det x6k and h0r (x) −
Pk−1 j=1
yrj x3j = yrk x3k from 11.10 and so
h0r,s −
Pk−1
0 6= det
j=1
yrj ∂xs x3j
x3k − yrk x3k ∂xs x3k
x6k
h0r,s −
Pk−1
0 6= det
j 3 j=1 yr ∂xs xj
x3k
− yrk ∂xs x3k
which implies 0 6= det h0r,s −
k−1 X j=1
yrj ∂xs
x3j − yrk ∂xs x3k = det (D (hk (x)))
290
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
If x ∈ Ωk , (xk 6= 0) this has shown that det (Dhk (x)) 6= 0 whenever hk (x) = 0 and xk ∈ Ωk . Pk If x ∈ / Ωk , then xk = 0 and so, since hk (x) = h0 (x) − j=1 yj x3j , because of the power of 3 in x3k , Dhk (x) = Dhk−1 (x) which has nonzero determinant by induction if hk−1 (x) = 0. Thus for each k ≤ p, there is a function hk of the form described above, such that if x ∈ ∪kj=1 Ωk , with hk (x) = 0, it follows that det (Dhk (x)) 6= 0. Thus 0 is a regular value for hk on ∪kj=1 Ωj . Let h ≡ hp . Then ) p X
k
y kxk max kηxk + (
kh − gk∞,
Ω
≤
x∈Ω
k=1
≤ η ((p + 1) diam (Ω)) < ε < dist (g (∂Ω) , 0) provided η was chosen sufficiently small to begin with. So what is d (h, Ω, 0)? Since 0 is a regular value and h is odd, h−1 (0) = {x1 , · · · , xr , −x1 , · · · , −xr , 0} . So consider Dh (x) and Dh (−x). Dh (−x) u + o (u)
h (−x + u) − h (−x)
=
= −h (x+ (−u)) + h (x) =
− (Dh (x) (−u)) + o (−u)
=
Dh (x) (u) + o (u)
Hence Dh (x) = Dh (−x) and so the determinants of these two are the same. It follows from the definition that d (g, Ω, 0) = d (h, Ω, 0)
=
r X
sgn (det (Dh (xi ))) +
i=1
r X
sgn (det (Dh (−xi )))
i=1
+ sgn (det (Dh (0))) 2m ± 1 some integer m Theorem 11.3.3 (Borsuk) Let f ∈ C Ω; Rp be odd and let Ω be symmetric with 0∈ / f (∂Ω). Then d (f , Ω, 0) equals an odd integer. =
Proof: Let ψ n be a mollifier which is symmetric, ψ (−x) = ψ (x). Also recall that f is the restriction to Ω of a continuous function, still denoted as f which is defined on all of Rp . Let g be the odd part of this function. That is, g (x) ≡
1 (f (x) − f (−x)) 2
Since f is odd, g = f on Ω. Then Z gn (−x) ≡ g ∗ ψ n (−x) =
g (−x − y) ψ n (y) dy Ω
Z =−
Z g (x + y) ψ n (y) dy = −
Ω
g (x− (−y)) ψ n (−y) dy = −gn (x) Ω
Thus gn is odd and is infinitely differentiable. Let n be large enough that kgn − gk∞,Ω < δ < dist (f (∂Ω) , 0). Then by definition of the degree, d (f ,Ω, 0) = d (g, Ω, 0) = d (gn , Ω, 0) and by Lemma 11.3.2 this is an odd integer.
11.4. APPLICATIONS
11.4
291
Applications
With these theorems it is possible to give easy proofs of some very important and difficult theorems.
Definition 11.4.1
If f : U ⊆ Rp → Rp where U is an open set. Then f is locally one to one if for every x ∈ U , there exists δ > 0 such that f is one to one on B (x, δ). As a first application, consider the invariance of domain theorem. This result says that a one to one continuous map takes open sets to open sets. It is an amazing result which is essential to understand if you wish to study manifolds. In fact, the following theorem only requires f to be locally one to one. First here is a lemma which has the main idea.
Lemma 11.4.2 Let g : B (0,r) → Rp be one to one and continuous where here B (0,r) is the ball centered at 0 of radius r in Rp . Then there exists δ > 0 such that g (0) + B (0, δ) ⊆ g (B (0,r)) . The symbol on the left means: {g (0) + x : x ∈ B (0, δ)} . Proof: For t ∈ [0, 1] , let h (x, t) ≡ g
x 1+t
−g
−tx 1+t
Then for x ∈ ∂B (0, r) , h (x, t) 6= 0 because if this were so, the fact g is one to one implies x −tx = 1+t 1+t and this requires x = 0 which is not the case since kxk = r. Then d (h (·, t) , B (0, r) , 0) is constant. Hence it is an odd integer for all t thanks to Borsuk’s theorem, because h (·, 1) is odd. Now let B (0,δ) be such that B (0,δ) ∩ h (∂Ω, 0) = ∅. Then d (h (·, 0) , B (0, r) , 0) = d (h (·, 0) , B (0, r) , z) for z ∈ B (0, δ) because the degree is constant on connected components of Rp \ h (∂Ω, 0) . Hence z = h (x, 0) = g (x) − g (0) for some x ∈ B (0, r). Thus g (B (0,r)) ⊇ g (0) + B (0,δ) Now with this lemma, it is easy to prove the very important invariance of domain theorem. A function f is locally one to one on an open set Ω if for every x0 ∈ Ω, there exists B (x0 , r) ⊆ Ω such that f is one to one on B (x0 , r).
Theorem 11.4.3
(invariance of domain)Let Ω be any open subset of Rp and let f : Ω → R be continuous and locally one to one. Then f maps open subsets of Ω to open sets in Rp . p
Proof: Let B (x0 , r) ⊆ Ω where f is one to one on B (x0 , r). Let g be defined on B (0, r) given by g (x) ≡ f (x + x0 ) Then g satisfies the conditions of Lemma 11.4.2, being one to one and continuous. It follows from that lemma there exists δ > 0 such that f (Ω)
⊇ f (B (x0 , r)) = f (x0 + B (0, r)) = g (B (0,r)) ⊇ g (0) + B (0, δ) = f (x0 ) + B (0,δ) = B (f (x0 ) , δ)
This shows that for any x0 ∈ Ω, f (x0 ) is an interior point of f (Ω) which shows f (Ω) is open. With the above, one gets easily the following amazing result. It is something which is clear for linear maps but this is a statement about continuous maps.
292
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
Corollary 11.4.4 If p > m there does not exist a continuous one to one map from Rp to Rm . Proof: Suppose not and let f be such a continuous map, T
f (x) ≡ (f1 (x) , · · · , fm (x)) . T
Then let g (x) ≡ (f1 (x) , · · · , fm (x) , 0, · · · , 0) where there are p − m zeros added in. Then g is a one to one continuous map from Rp to Rp and so g (Rp ) would have to be open from the invariance of domain theorem and this is not the case.
Corollary 11.4.5 If f is locally one to one and continuous, f : Rp → Rp , and lim |f (x)| = ∞,
|x|→∞
then f maps Rp onto Rp . Proof: By the invariance of domain theorem, f (Rp ) is an open set. It is also true that f (Rp ) is a closed set. Here is why. If f (xk ) → y, the growth condition ensures that {xk } is a bounded sequence. Taking a subsequence which converges to x ∈ Rp and using the continuity of f , it follows f (x) = y. Thus f (Rp ) is both open and closed which implies f must be an onto map since otherwise, Rp would not be connected. The next theorem is the famous Brouwer fixed point theorem.
Theorem 11.4.6
(Brouwer fixed point) Let B = B (0, r) ⊆ Rp and let f : B → B be continuous. Then there exists a point x ∈ B, such that f (x) = x. Proof: Assume there is no fixed point. Consider h (x, t) ≡ tf (x) − x for t ∈ [0, 1] . Then for kxk = r, 0∈ / tf (x) − x,t ∈ [0, 1] By homotopy invariance, t → d (tf − I, B, 0) n
is constant. But when t = 0, this is d (−I, B, 0) = (−1) 6= 0. Hence d (f − I, B, 0) 6= 0 so there exists x such that f (x) − x = 0. You can use standard stuff from Hilbert space to get this the fixed point theorem for a compact convex set. Let K be a closed bounded convex set and let f : K → K be continuous. Let P be the projection map onto K as in Problem 31 on Page 90. Then P is continuous because |P x − P y| ≤ |x − y|. Recall why this is. From the characterization of the projection map P, (x−P x, y−P x) ≤ 0 for all y ∈ K. Therefore, (x−P x,P y−P x) ≤ 0, (y−P y, P x−P y) ≤ 0 so (y−P y, P y−P x) ≥ 0 Hence, subtracting the first from the last, (y−P y− (x−P x) , P y−P x) ≥ 0 consequently, 2
|x − y| |P y−P x| ≥ (y − x, P y−P x) ≥ |P y−P x|
and so |P y − P x| ≤ |y − x| as claimed. Now let r be so large that K ⊆ B (0,r) . Then consider f ◦ P. This map takes B (0,r) → B (0, r). In fact it maps B (0,r) to K. Therefore, being the composition of continuous functions, it is continuous and so has a fixed point in B (0, r) denoted as x. Hence f (P (x)) = x. Now, since f maps into K, it follows that x ∈ K. Hence P x = x and so f (x) = x. This has proved the following general Brouwer fixed point theorem.
11.4. APPLICATIONS
293
Theorem 11.4.7
Let f : K → K be continuous where K is compact and convex and nonempty, K ⊆ Rp . Then f has a fixed point.
Definition 11.4.8
f is a retract of B (0, r) onto ∂B (0, r) if f is continuous, f B (0, r) ⊆ ∂B (0,r)
and f (x) = x for all x ∈ ∂B (0,r).
Theorem 11.4.9
There does not exist a retract of B (0, r)
onto its boundary,
∂B (0, r). Proof: Suppose f were such a retract. Then for all x ∈ ∂B (0, r), f (x) = x and so from the properties of the degree, the one which says if two functions agree on ∂Ω, then they have the same degree, 1 = d (I, B (0, r) , 0) = d (f , B (0, r) , 0) which is clearly impossible because f −1 (0) = ∅ which implies d (f , B (0, r) , 0) = 0. You should now use this theorem to give another proof of the Brouwer fixed point theorem. The proofs of the next two theorems make use of the Tietze extension theorem, Theorem 2.7.5.
Theorem 11.4.10
Let Ω be a symmetric open set in Rp such that 0 ∈ Ω and let f : ∂Ω → V be continuous where V is an m dimensional subspace of Rp , m < p. Then f (−x) = f (x) for some x ∈ ∂Ω. Proof: Suppose not. Using the Tietze extension theorem on components of the function, extend f to all of Rp , f Ω ⊆ V . (Here the extended function is also denoted by f .) Let g (x) = f (x) − f (−x). Then 0 ∈ / g (∂Ω) and so for some r > 0, B (0,r) ⊆ Rp \ g (∂Ω). For z ∈ B (0,r), d (g, Ω, z) = d (g, Ω, 0) 6= 0 because B (0,r) is contained in a component of Rp \ g (∂Ω) and Borsuk’s theorem implies that d (g, Ω, 0) 6= 0 since g is odd. Hence V ⊇ g (Ω) ⊇ B (0,r) and this is a contradiction because V is m dimensional. This theorem is called the Borsuk Ulam theorem. Note that it implies there exist two points on opposite sides of the surface of the earth which have the same atmospheric pressure and temperature, assuming the earth is symmetric and that pressure and temperature are continuous functions. The next theorem is an amusing result which is like combing hair. It gives the existence of a “cowlick”.
Theorem 11.4.11
Let n be odd and let Ω be an open bounded set in Rp with 0 ∈ Ω. Suppose f : ∂Ω → R \ {0} is continuous. Then for some x ∈ ∂Ω and λ 6= 0, f (x) = λx. p
Proof: Using the Tietze extension theorem, extend f to all of Rp . Also denote the extended function by f . Suppose for all x ∈ ∂Ω, f (x) 6= λx for all λ ∈ R. Then 0∈ / tf (x) + (1 − t) x (x, t) ∈ ∂Ω × [0, 1] 0∈ / tf (x) − (1 − t) x, (x, t) ∈ ∂Ω × [0, 1] . Thus there exists a homotopy of f and I and a homotopy of f and −I. Then by the homotopy invariance of degree, d (f , Ω, 0) = d (I, Ω, 0) , d (f , Ω, 0) = d (−I, Ω, 0) . n
But this is impossible because d (I, Ω, 0) = 1 but d (−I, Ω, 0) = (−1) = −1.
294
11.5
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
Product Formula, Jordan Separation Theorem
This section is on the product formula for the degree which is used to prove the Jordan separation theorem. To begin with is a significant observation which is used without comment below. Recall that the connected components of an open set are open. The formula is all about the composition of continuous functions. g
f
Ω → f (Ω) ⊆ Rp → Rp p Lemma 11.5.1 Let {Ki }∞ i=1 be the connected components of R \ C where C is a closed
set. Then ∂Ki ⊆ C. Proof: Since Ki is a connected component of an open set, it is itself open. See Theorem 2.11.12. Thus ∂Ki consists of all limit points of Ki which are not in Ki . Let p be such a point. If it is not in C then it must be in some other Kj which is impossible because these are disjoint open sets. Thus if x is a point in U it cannot be a limit point of V for V disjoint from U .
Definition 11.5.2
Let the connected components of Rp \ f (∂Ω) be denoted by Ki . From the properties of the degree listed in Theorem 11.2.2, d (f , Ω, ·) is constant on each of these components. Denote by d (f , Ω, Ki ) the constant value on the component Ki . The following is the product formula. Note that if K is an unbounded component of C f (∂Ω) , then d (f , Ω, y) = 0 for all y ∈ K by homotopy invariance and the fact that for large enough kyk , f −1 (y) = ∅ since f Ω is compact. ∞
Theorem 11.5.3
(product formula)Let {Ki }i=1 be the bounded components of Rp \ f (∂Ω) for f ∈ C Ω; R , let g ∈ C (Rp , Rp ), and suppose that y ∈ / g (f (∂Ω)) or in other words, g−1 (y) ∩ f (∂Ω) = ∅. Then p
d (g ◦ f , Ω, y) =
∞ X
d (f , Ω, Ki ) d (g, Ki , y) .
(11.11)
i=1
All but finitely many terms in the sum are zero. If there are no bounded components of C f (∂Ω) , then d (g ◦ f , Ω, y) = 0. Proof: The compact set f Ω ∩g−1 (y) is contained in Rp \f (∂Ω) and so, f Ω ∩g−1 (y) is covered by finitely many of the components Kj one of which may be the unbounded component. Since these components are disjoint, the other components fail to intersect f Ω ∩ g−1 (y). Thus, if Ki is one of these others, either it fails to intersect g−1 (y) or Ki fails to intersect f Ω . Thus either d (f , Ω, Ki ) = 0 because Ki fails to intersect f Ω or d (g, Ki , y) = 0 if Ki fails to intersect g−1 (y). Thus the sum is always a finite sum. I am using Theorem 11.2.2, the part which says that if y ∈ / h Ω , then d (h,Ω, y) = 0. Note that ∂Ki ⊆ f (∂Ω) so y ∈ / g (∂Ki ). p p Let g ˜ be in C ∞ (R g − gk∞ be so small that for each of the finitely many , R−1) and let k˜ Ki intersecting f Ω ∩ g (y) , d (g ◦ f , Ω, y) d (g, Ki , y)
= d (˜ g ◦ f , Ω, y) = d (˜ g, Ki , y)
(11.12)
Since ∂Ki ⊆ f (∂Ω), both conditions are obtained by letting kg − g ˜k∞,f (Ω) < dist (y, g (f (∂Ω))) By Lemma 11.1.5, there exists g ˜ such that y is a regular value of g ˜ in addition to 11.12 and g ˜−1 (y) ∩ f (∂Ω) = ∅. Then g ˜−1 (y) is contained in the union of the Ki along with the
11.5. PRODUCT FORMULA, JORDAN SEPARATION THEOREM
295
unbounded component(s) and by Lemma 11.1.5 g ˜−1 (y) is countable. mi As discussed there, g ˜−1 (y) ∩ Ki is finite if Ki is bounded. Let g ˜−1 (y) ∩ Ki = xij j=1 , mi ≤ ∞. mi could only be ∞ on the unbounded component. Now use Lemma 11.1.5 again to get ˜ f in C ∞ Ω; Rp such that each xij is a regular value
f − f is very small, so small that of ˜ f on Ω and also ˜ ∞
d g ˜ ◦˜ f , Ω, y = d (˜ g ◦ f , Ω, y) = d (g ◦ f , Ω, y) and
d ˜ f , Ω, xij = d f , Ω, xij
for each i, j. Thus, from the above, d (g ◦ f , Ω, y) = d g ˜ ◦˜ f , Ω, y = d f , Ω, xij = d (f , Ω, Ki ) d ˜ f , Ω, xij d (˜ g, Ki , y)
= d (g, Ki , y)
Is y a regular value for g ˜ ◦˜ f on Ω? Suppose z ∈ Ω and y = ˜ ◦˜ f(z) so ˜ f (z) ∈ g ˜−1 (y) . g −1 Then ˜ f (z) = xi for some i, j and D˜ f (z) exists. Hence D g ˜ ◦˜ f (z) = D˜ g xi D˜ f (z) , j
j
both linear transformations invertible. Thus y is a regular value of g ˜ ◦˜ f on Ω. i What of xj in Ki where Ki is unbounded? Then as observed above, the sum of f ,Ω, xi and is 0 because the degree is constant sgn det D˜ f (z) for z ∈ ˜ f −1 xi is d ˜ j
j
on Ki which is unbounded. From the definition of the degree, the left side of 11.11 d (g ◦ f , Ω, y) equals Xn o f (z) sgn det D˜ f (z) : z ∈ ˜ f −1 g ˜−1 (y) sgn det D˜ g ˜ The g ˜−1 (y) are the xij . Thus the above is of the form XX X = sgn det D˜ g xij sgn det D˜ f (z) i j z∈˜ f −1 (xij ) As mentioned, if xij ∈ Ki an unbounded component, then X sgn det D˜ g xij sgn det D˜ f (z) = 0 z∈˜ f −1 (xij ) and so, it suffices to only consider bounded components in what follows and the sum makes sense because there are finitely many xij in bounded Ki . This also shows that if there are C no bounded components of f (∂Ω) , then d (g ◦ f , Ω, y) = 0. Thus d (g ◦ f , Ω, y) equals XX X = sgn det D˜ g xij sgn det D˜ f (z) i j z∈˜ f −1 (xij ) X = d (˜ g,Ki , y) d ˜ f , Ω, Ki i
sgn det D˜ f (z) ≡ d ˜ f , Ω, xij = d ˜ f , Ω, Ki . This proves the product formula because g ˜ and ˜ f were chosen close enough to f , g respectively that X X d ˜ f , Ω, Ki d (˜ g,Ki , y) = d (f , Ω, Ki ) d (g,Ki , y)
To explain the last step,
i
P
z∈˜ f −1 (xij )
i
296
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
Before the general Jordan separation theorem, I want to first consider the examples of most interest. Recall that if a function f is continuous and one to one on a compact set K, then f is a homeomorphism of K and f (K). Also recall that if U is a nonempty open set, the boundary of U , denoted as ∂U and meaning those points x with the property that for all r > 0 B (x, r) intersects both U and U C , is U \ U .
Proposition 11.5.4 Let H be a compact set and let f : H → Rp , p ≥ 2 be one to one and continuous so that H and f (H) ≡ C are homeomorphic. Suppose H C has only one connected component so H C is connected. Then C C also has only one component. Proof: I want to show that C C has no bounded components so suppose it has a bounded component K. Extend f to all of Rp and let g be an extension of f −1 to all of Rp . Then, by the above Lemma 11.5.1, ∂K ⊆ C. Since f ◦ g (x) = x on ∂K ⊆ C, if z ∈ K, it follows that C 1 = d (f ◦ g, K, z) . Now g (∂K) ⊆ g (C) = H. Thus g (∂K) ⊇ H C and H C is unbounded C and connected. If a component of g (∂K) is bounded, then it cannot contain the unbounded C H C which must be contained in the component of g (∂K) which it intersects. Thus, the C only bounded components of g (∂K) must be contained in H. Let the set of such bounded components be denoted by Q. By the product formula, X d (f ◦ g, K, z) = d (g, K, Q) d (f , Q, z) Q∈Q
However, f Q ⊆ f (H) = C and z is contained in a component of C C so for each Q ∈ Q, d (f , Q, z) = 0. Hence, by the product formula, d (f ◦ g, K, z) = 0 which is a contradiction to 1 = d (f ◦ g, K, z). Thus there is no bounded component of C C . It is obvious that the unit sphere S p−1 divides Rp into two disjoint open sets, the inside and the outside. The following shows that this also holds for any homeomorphic image of S p−1 . p−1 Proposition 11.5.5 its boundary. Suppose f : Let B be the ball B (0,1) with S
S p−1 → C ≡ f S p−1 ⊆ Rp is a homeomorphism. Then C C also has exactly two components, a bounded and an unbounded. Proof: I need to show there is only one bounded component of C C . From Proposition 11.5.4, there is at least one. Otherwise, S p−1 would have no bounded components which is obviously false. Let f be a continuous extension of f off S p−1 and let g be a continuous extension of g off C. Let {Ki } be the bounded components of C C . Now ∂Ki ⊆ C and so g (∂Ki ) ⊆ g (C) = S p−1 . Let G be those points x with |x| > 1. C Let a bounded component of g (∂Ki ) be H. If H intersects B then H must contain C C B because B is connected and contained in S p−1 ⊆ g (∂Ki ) which means that B is C contained in the union of components of g (∂Ki ) . The same is true if H intersects G. However, H cannot contain the unbounded G and so H cannot intersect G. Since H is open and cannot intersect G, it cannot intersect S p−1 either. Thus H = B and so there is only C one bounded component of g (∂Ki ) and it is B. Now let z ∈ Ki . f ◦ g = I on ∂Ki ⊆ C and so if z ∈ Ki , since the degree is determined by values on the boundary, the product formula implies 1 = d (f ◦ g,Ki , z) = d (g, Ki , B) d (f , B, z) = d (g, Ki , B) d (f , B, Ki ) P If there are n of these Ki , it follows i d (g, Ki , B) d (f , B, Ki ) = n. Now pick y ∈ B. Then, since g ◦ f (x) = x on ∂B, the product formula shows X X 1 = d (g ◦ f , B, y) = d (f , B, Ki ) d (g, Ki , y) = d (f , B, Ki ) d (g, Ki , B) = n i
i
11.6. THE JORDAN SEPARATION THEOREM
297
Thus n = 1 and there is only one bounded component of C C . It remains to show that, in the above f S p−1 is the boundary of both components, the bounded one and the unbounded one.
Theorem 11.5.6
Let S p−1 be the unit sphere in Rp . Suppose γ : S p−1 → Γ ⊆ Rp is one to one onto and continuous. Then Rp \ Γ consists of two components, a bounded component (called the inside) Ui and an unbounded component (called the outside), Uo . Also the boundary of each of these two components of Rp \ Γ is Γ and Γ has empty interior. Proof: γ −1 is continuous since S p−1 is compact and γ is one to one. By the Jordan separation theorem, Rp \Γ = Uo ∪Ui where these on the right are the connected components of the set on the left, both open sets. Only one of them is bounded, Ui . Thus Γ∪Ui ∪Uo = Rp . Since both Ui , Uo are open, ∂U ≡ U \ U for U either Uo or Ui . If x ∈ Γ, and is not a limit point of Ui , then there is B (x, r) which contains no points of Ui . Let S be those points x of ˆ Γ for which B (x, r) contains no points of Ui for some r > 0. This S is open in Γ. Let Γ be −1 ˆ p−1 ˆ ˆ Γ \ S. Then if C = γ Γ , it follows that C is a closed set in S and is a proper subset of S p−1 . It is obvious that taking a relatively open set from S p−1 results in a compact set ˆ is also an open whose complement is an open connected set. By Proposition 11.5.4, Rp \ Γ connected set. Start with x ∈ Ui and consider a continuous curve which goes from x to ˆ . Thus the curve contains no points of Γ. ˆ However, it y ∈ Uo which is contained in Rp \ Γ must contain points of Γ which can only be in S. The first point of Γ intersected by this curve is a point in Ui and so this point of intersection is not in S after all because every ball containing it must contain points of Ui . Thus S = ∅ and every point of Γ is in Ui . Similarly, every point of Γ is in Uo . Thus Γ ⊆ Ui \ Ui and Γ ⊆ Uo \ Uo . However, if x ∈ Ui \ Ui , then x∈ / Uo because it is a limit point of Ui and so x ∈ Γ. It is similar with Uo . Thus Γ = Ui \ Ui and Γ = Uo \ Uo . This could not happen if Γ had an interior point. Such a point would be in Γ but would fail to be in either ∂Ui or ∂Uo . When p = 2, this theorem is called the Jordan curve theorem. ¯ to Rp instead of just S p−1 ? Obviously, one should be able to say a What if γ maps B little more.
Corollary 11.5.7 Let B be an open ball and let γ : B¯ → Rp be one to one and continuous. Let Ui , Uo be as in the above theorem, the bounded and unbounded components of C γ (∂B) . Then Ui = γ (B). Proof: By connectedness and the observation that γ (B) contains no points of C ≡ γ (∂B) , it follows that γ (B) ⊆ Ui or Uo . Suppose γ (B) ⊆ Ui . I want to show that γ (B) is not a proper subset of Ui . If γ (B) is a proper subset of Ui , then if x ∈ Ui \ γ (B) , it follows ¯ because x ∈ ¯ also that x ∈ Ui \ γ B / C = γ (∂B). Let H be the component of Ui \ γ B determined by x. In particular H and Uo are separated since they are disjoint open sets. ¯ C and B ¯ C each have only one component. This is true for B ¯ C and by the Jordan Now γ B C C ¯ . However, H is in γ B ¯ ¯ C separation theorem, also true for γ B as is Uo and so γ B has at least two components after all. If γ (B) ⊆ Uo the argument works exactly the same. Thus either γ (B) = Ui or γ (B) = Uo . The second alternative cannot take place because γ (B) is bounded. Hence γ (B) = Ui . Note that this essentially gives the invariance of domain theorem.
11.6
The Jordan Separation Theorem
What follows is the general Jordan separation theorem.
Lemma 11.6.1 Let Ω be a bounded open set in Rp , f ∈ C Ω; Rp , and suppose {Ωi }∞ i=1
are disjoint open sets contained in Ω such that y∈ / f Ω \ ∪∞ j=1 Ωj
298
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
Then d (f , Ω, y) =
∞ X
d (f , Ωj , y)
j=1
where the sum has all but finitely many terms equal to 0. Proof: By assumption, the compact set f −1 (y) ≡ x ∈ Ω : f (x) = y has empty intersection with Ω \ ∪∞ j=1 Ωj and so this compact set is covered by finitely many of the Ωj , say {Ω1 , · · · , Ωn−1 } and y∈ / f ∪∞ j=n Ωj . By Theorem 11.2.2 and letting O = ∪∞ j=n Ωj , d (f , Ω, y) =
n−1 X
d (f , Ωj , y) + d (f ,O, y) =
j=1
∞ X
d (f , Ωj , y)
j=1
because d (f ,O, y) = 0 as is d (f , Ωj , y) for every j ≥ n. To help remember some symbols, here is a short diagram. I have a little trouble remembering what is what when I read the proof. bounded components Rp \ C K
bounded components Rp \ f (C) L
For K ∈ K, K a bounded component of Rp \ C, bounded components of Rp \ f (∂K) H LH sets of L contained in H ∈ H H1 those sets of H which intersect a set of L Note that the sets of H will tend to be larger than the sets of L. The following lemma relating to the above definitions is interesting.
Lemma 11.6.2 If a component L of Rp \ f (C) intersects a component H of Rp \ f (∂K), K a component of Rp \ C, then L ⊆ H. Proof: First note that by Lemma 11.5.1 for K ∈ K, ∂K ⊆ C. Next suppose L ∈ L and L ∩ H 6= ∅ where H is as described above. Suppose y ∈ L \ H so L is not contained in H. Since y ∈ L, it follows that y ∈ / f (C) and so y ∈ / f (∂K) either, ˜ for some other H ˜ a component of Rp \ f (∂K) . Now by Lemma 11.5.1. It follows that y ∈ H Rp \ f (∂K) ⊇ Rp \ f (C) and so ∪H ⊇ L and so L = ∪ {L ∩ H : H ∈ H} and these are disjoint open sets with two of them nonempty. Hence L would not be connected which is a contradiction. Hence if L ∩ H 6= ∅, then L ⊆ H. The following is the Jordan separation theorem. It is based on a use of the product formula and splitting up the sets of L into LH for various H ∈ H, those in LH being the ones which intersect H thanks to the above lemma.
Theorem 11.6.3
(Jordan separation theorem) Let f be a homeomorphism of C and f (C) where C is a compact set in Rp . Then Rp \ C and Rp \ f (C) have the same number of connected components.
11.6. THE JORDAN SEPARATION THEOREM
299
Proof: Denote by K the bounded components of Rp \ C and denote by L, the bounded components of Rp \ f (C). Also, using the Tietze extension theorem on components of a vector valued function, there exists ¯ f an extension of f to all of Rp and let f −1 be an extension of f −1 to all of Rp . Pick K ∈ K and take y ∈ K. Then ∂K ⊆ C and so y∈ / f −1 ¯ f (∂K) f equals the identity I on ∂K, it follows from the properties of the degree that Since f −1 ◦ ¯ 1 = d (I, K, y) = d f −1 ◦ ¯ f , K, y . Recall that if two functions agree on the boundary, then they have the same degree. Let H denote the set of bounded components of Rp \ f (∂K). Then ∪H = Rp \ f (∂K) ⊇ Rp \ f (C) Thus if L ∈ L, then L ⊆ ∪H and so it must intersect some set H of H. By the above Lemma 11.6.2, L is contained in this set of H so it is in LH . By the product formula, X f , K, y = d ¯ f , K, H d f −1 , H, y , (11.13) 1 = d f −1 ◦ ¯ H∈H
the sum being a finite sum. That is, there are finitely many H involved in the sum, the other terms being zero, this by the general result in the product formula. What about those sets of H which contain no set of L ? These sets also have empty intersection with all sets of L and empty intersection with the unbounded component(s) of Rp \ f (C) by Lemma 11.6.2. Therefore, for H one of these, H ⊆ f (C) because H has no points of Rp \ f (C) which equals ∪L ∪ {unbounded component(s) of Rp \ f (C) } . Therefore, d f −1 , H, y = d f −1 , H, y = 0 because y ∈ K a bounded componentof Rp \ C, but for u ∈ H ⊆ f (C) , f −1 (u) ∈ C so f −1 (u) 6= y implying that d f −1 , H, y = 0. Thus in 11.13, all such terms are zero. Then letting H1 be those sets of H which contain (intersect) some sets of L, the above sum reduces to X 1= d ¯ f , K, H d f −1 , H, y H∈H1 p Note also that for H ∈ H1 , H \ ∪LH = H \ ∪L and it has no points of R \ f (C) = ∪L so −1 H \ ∪LH is contained in f (C). Thus y ∈ /f H \ ∪LH ⊆ C because y ∈ / C. It follows from Lemma 11.6.1 which comes from Theorem 11.2.2, the part about having disjoint open sets in Ω and getting a sum of degrees over these being equal to d (f , Ω, y) that X X X d ¯ f , K, H d f −1 , H, y = d ¯ f , K, H d f −1 , L, y H∈H1
H∈H1
=
X
L∈LH
X
d ¯ f , K, H d f −1 , L, y
H∈H1 L∈LH
where LH are those in H. sets of L contained Now d ¯ f , K, H = d ¯ f , K, L where L ∈ LH . This is because L is an open connected subset of H and z → d ¯ f , K, z is constant on H. Therefore, X X X X d ¯ f , K, H d f −1 , L, y = d ¯ f , K, L d f −1 , L, y H∈H1 L∈LH
H∈H1 L∈LH
300
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
As noted above, there are finitely many H ∈ H which are involved. Rp \ f (C) ⊆ Rp \ f (∂K) and so every L must be contained in some H ∈ H1 . It follows that the above reduces to X f , K, L d f −1 , L, y d ¯ L∈L
Where this is a finite sum because all but finitely many terms are 0. Thus from 11.13, X X f , K, L d f −1 , L, y = f , K, L d f −1 , L, K 1= d ¯ d ¯ L∈L
(11.14)
L∈L
Let |K| denote the number of components in K and similarly, |L| denotes the number of components in L. Thus X X X f , K, L d f −1 , L, K |K| = 1= d ¯ K∈K
K∈K L∈L
Similarly, the argument taken another direction yields X X X |L| = 1= d ¯ f , K, L d f −1 , L, K L∈L
L∈L K∈K 1
If |K| < ∞, then
P
z X
K∈K
}| { −1 ¯ d f , K, L d f , L, K < ∞. The summation which equals 1
L∈L
is a finite sum and so is the outside sum. Hence we can switch the order of summation and get X X |K| = d ¯ f , K, L d f −1 , L, K = |L| L∈L K∈K
A similar argument applies if |L| < ∞. Thus if one of these numbers |K| , |L| is finite, so is the other and they are equal. This proves the theorem because if p > 1 there is exactly one unbounded component to both Rp \ C and Rp \ f (C) and if p = 1 there are exactly two unbounded components. As an application, here is a very interesting little result. It has to do with d (f , Ω, f (x)) in the case where f is one to one and Ω is connected. You might imagine this should equal 1 or −1 based on one dimensional analogies. Recall a one to one map defined on an interval is either increasing or decreasing. It either preserves or reverses orientation. It is similar in n dimensions and it is a nice application of the Jordan separation theorem and the product formula.
Proposition 11.6.4 Let Ω be an open connected bounded set in Rp , p ≥ 1 such that p p
R \ ∂Ω consists of two, three if n = 1, connected components. Let f ∈ C Ω; R be continuous and one to one. Then f (Ω) is the bounded component of Rp \ f (∂Ω) and for y ∈ f (Ω) , d (f , Ω, y) either equals 1 or −1. Proof: First suppose n ≥ 2. By the Jordan separation theorem, Rp \ f (∂Ω) consists of two components, a bounded component B and an unbounded component U . Using the Tietze extention theorem, there exists g defined on Rp such that g = f −1 on f Ω . Thus on ∂Ω, g ◦ f = id. It follows from this and the product formula that 1
= d (id, Ω, g (y)) = d (g ◦ f , Ω, g (y)) = d (g, B, g (y)) d (f , Ω, B) + d (f , Ω, U ) d (g, U, g (y)) = d (g, B, g (y)) d (f , Ω, B)
11.7. EXERCISES
301
The reduction happens because d (f , Ω, U ) = 0 as explained above. Since U is unbounded, ¯ . For such, the degree is 0 there are points in U which cannot be in the compact set f Ω but the degree is constant on U, one of the components of f (∂Ω). Therefore, d (f , Ω, B) 6= 0 and so for every z ∈ B, it follows z ∈ f (Ω) . Thus B ⊆ f (Ω) . On the other hand, f (Ω) cannot have points in both U and B because it is a connected set. Therefore f (Ω) ⊆ B and this shows B = f (Ω). Thus d (f , Ω, B) = d (f , Ω, y) for each y ∈ B and the above formula shows this equals either 1 or −1 because the degree is an integer. In the case where n = 1, the argument is similar but here you have 3 components in R1 \ f (∂Ω) so there are more terms in the above sum although two of them give 0. Here is another nice application of the Jordan separation theorem to the Jordan curve theorem and generalizations to higher dimensions.
11.7
Exercises
1. Show the Brouwer fixed point theorem is equivalent to the nonexistence of a continuous retraction onto the boundary of B (0, r). That is, a function f : B (0,r) → ∂B (0, r) such that f is the identity on ∂B and continuous. 2. Using the Jordan separation theorem, prove the invariance of domain theorem. Hint: You might consider B (x, r) and show f maps the inside to one of two components of Rn \ f (∂B (x, r)) . Thus an open ball goes to some open set. 3. Give a version of Proposition 11.6.4 which is valid for the case where n = 1. 4. It was shown that if f is locally one to one and continuous, f : Rn → Rn , and lim |f (x)| = ∞,
|x|→∞
then f maps Rn onto Rn . Suppose you have f : Rm → Rn where f is one to one and lim|x|→∞ |f (x)| = ∞. Show that f cannot be onto. 5. Can there exist a one to one onto continuous map, f which takes the unit interval to the unit disk? 6. Let m < n and let Bm (0,r) be the ball in Rm and Bn (0, r) be the ball in Rn . Show that there is no one to one continuous map from Bm (0,r) to Bn (0,r). Hint: It is like the above problem. 7. Consider the unit disk, (x, y) : x2 + y 2 ≤ 1 ≡ D and the annulus
1 2 2 (x, y) : ≤ x + y ≤ 1 ≡ A 2
Is it possible there exists a one to one onto continuous map f such that f (D) = A? Thus D has no holes and A is really like D but with one hole punched out. Can you generalize to different numbers of holes? Hint: Consider the invariance of domain theorem. The interior of D would need to be mapped to the interior of A. Where do the points of the boundary of A come from? Consider Theorem 2.11.3. 8. Suppose C is a compact set in Rn which has empty interior and f : C → Γ ⊆ Rn is one to one onto and continuous with continuous inverse. Could Γ have nonempty interior? Show also that if f is one to one and onto Γ then if it is continuous, so is f −1 .
302
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION
9. Let K be a nonempty closed and convex subset of Rn . Recall K is convex means that if x, y ∈ K, then for all t ∈ [0, 1] , tx + (1 − t) y ∈ K. Show that if x ∈ Rn there exists a unique z ∈ K such that |x − z| = min {|x − y| : y ∈ K} . This z will be denoted as P x. Hint: First note you do not know K is compact. Establish the parallelogram identity if you have not already done so, 2
2
2
2
|u − v| + |u + v| = 2 |u| + 2 |v| . Then let {zk } be a minimizing sequence, 2
lim |zk − x| = inf {|x − y| : y ∈ K} ≡ λ.
k→∞
Now using convexity, explain why 2 2 2 zk − zm 2 + x− zk + zm = 2 x − zk + 2 x − zm 2 2 2 2 and then use this to argue {zk } is a Cauchy sequence. Then if zi works for i = 1, 2, consider (z1 + z2 ) /2 to get a contradiction. 10. In Problem 9 show that P x satisfies the following variational inequality. (x−P x) · (y−P x) ≤ 0 for all y ∈ K. Then show that |P x1 − P x2 | ≤ |x1 − x2 |. Hint: For the first part note 2 that if y ∈ K, the function t → |x− (P x + t (y−P x))| achieves its minimum on [0, 1] at t = 0. For the second part, (x1 −P x1 ) · (P x2 −P x1 ) ≤ 0, (x2 −P x2 ) · (P x1 −P x2 ) ≤ 0. Explain why (x2 −P x2 − (x1 −P x1 )) · (P x2 −P x1 ) ≥ 0 and then use a some manipulations and the Cauchy Schwarz inequality to get the desired inequality. 11. Establish the Brouwer fixed point theorem for any convex compact set in Rn . Hint: If K is a compact and convex set, let R be large enough that the closed ball, D (0, R) ⊇ K. Let P be the projection onto K as in Problem 10 above. If f is a continuous map from K to K, consider f ◦P . You want to show f has a fixed point in K. 12. Suppose D is a set which is homeomorphic to B (0, 1). This means there exists a continuous one to one map, h such that h B (0, 1) = D such that h−1 is also one to one. Show that if f is a continuous function which maps D to D then f has a fixed point. Now show that it suffices to say that h is one to one and continuous. In this case the continuity of h−1 is automatic. Sets which have the property that continuous functions taking the set to itself have at least one fixed point are said to have the fixed point property. Work Problem 7 using this notion of fixed point property. What about a solid ball and a donut? Could these be homeomorphic? 13. Suppose Ω is any open bounded subset of Rn which contains 0 and that f : Ω → Rn is continuous with the property that f (x) · x ≥ 0 for all x ∈ ∂Ω. Show that then there exists x ∈ Ω such that f (x) = 0. Give a similar result in the case where the above inequality is replaced with ≤. Hint: You might consider the function h (t, x) ≡ tf (x) + (1 − t) x.
11.7. EXERCISES
303
14. Suppose Ω is an open set in Rn containing 0 and suppose that f : Ω → Rn is continuous and |f (x)| ≤ |x| for all x ∈ ∂Ω. Show f has a fixed point in Ω. Hint: Consider h (t, x) ≡ t (x − f (x)) + (1 − t) x for t ∈ [0, 1] . If t = 1 and some x ∈ ∂Ω is sent to 0, then you are done. Suppose therefore, that no fixed point exists on ∂Ω. Consider t < 1 and use the given inequality. 15. Let Ω be an open bounded subset of Rn and let f , g : Ω → Rn both be continuous such that |f (x)| − |g (x)| > 0 for all x ∈ ∂Ω. Show that then d (f − g, Ω, 0) = d (f , Ω, 0) −1
Show that if there exists x ∈ f −1 (0) , then there exists x ∈ (f − g) (0). Hint: You might consider h (t, x) ≡ (1 − t) f (x) + t (f (x) − g (x)) and argue 0 ∈ / h (t, ∂Ω) for t ∈ [0, 1]. 16. Let f : C → C where C is the field of complex numbers. Thus f has a real and imaginary part. Letting z = x + iy, f (z) = u (x, y) + iv (x, y) p Recall that the norm in C is given by |x + iy| = x2 + y 2 and this is the usual norm in R2 for the ordered pair (x, y) . Thus complex valued functions defined on C can be considered as R2 valued functions defined on some subset of R2 . Such a complex function is said to be analytic if the usual definition holds. That is f 0 (z) = lim
h→0
f (z + h) − f (z) . h
In other words, f (z + h) = f (z) + f 0 (z) h + o (h)
(11.15)
n
at a point z where the derivative exists. Let f (z) = z where n is a positive integer. Thus z n = p (x, y) + iq (x, y) for p, q suitable polynomials in x and y. Show this function is analytic. Next show that for an analytic function and u and v the real and imaginary parts, the Cauchy Riemann equations hold. ux = vy , uy = −vx . In terms of mappings show 11.15 has the form u (x + h1 , y + h2 ) v (x + h1 , y + h2 )
h1 + o (h) h2 u (x, y) ux (x, y) −vx (x, y) h1 = + + o (h) v (x, y) vx (x, y) ux (x, y) h2 =
u (x, y) v (x, y)
+
ux (x, y) uy (x, y) vx (x, y) vy (x, y)
T
where h = (h1 , h2 ) and h is given by h1 + ih2 . Thus the determinant of the above matrix is always nonnegative. Letting Br denote the ball B (0, r) = B ((0, 0) , r) show d (f, Br , 0) = n. where f (z) = z n . In terms of mappings on R2 , u (x, y) f (x, y) = . v (x, y)
304
CHAPTER 11. DEGREE THEORY, AN INTRODUCTION Thus show d (f , Br , 0) = n. Hint: You might consider g (z) ≡
n Y
(z − aj )
j=1
where the aj are small real distinct numbers and argue that both this function and f are analytic but that 0 is a regular value for g although it is not so for f . However, for each aj small but distinct d (f , Br , 0) = d (g, Br , 0). 17. Using Problem 16, prove the fundamental theorem of algebra as follows. Let p (z) be a nonconstant polynomial of degree n, p (z) = an z n + an−1 z n−1 + · · · Show that for large enough r, |p (z)| > |p (z) − an z n | for all z ∈ ∂B (0, r). Now from Problem 15 you can conclude d (p, Br , 0) = d (f, Br , 0) = n where f (z) = an z n . 18. Suppose f : Rn → Rn satisfies |f (x) − f (y)| ≥ α |x − y| , α > 0, Show that f must map Rn onto Rn . Hint: First show f is one to one. Then use invariance of domain. Next show, using the inequality, that the points not in f (Rn ) must form an open set because if y is such a point, then there can be no sequence {f (xn )} converging to it. Finally recall that Rn is connected. 19. Suppose f : Rn → Rn is locally one to one and continuous and satisfies a coercivity condition lim |f (x)| = ∞ kxk→∞
Show that then f must map onto Rn . Hint: By invariance of domain, you have f (Rn ) is open. Is this also closed? Now use connectedness considerations. 20. Suppose f : R2 → R2 is given by f (x, y) ≡
ex cos (y) ex sin (y)
Show that det (Df (x, y)) 6= 0. By the inverse function theorem, this function is locally one to one. However, it misses the point (0, 0) because |f (x, y)| = ex . Does this contradict the above problem? 21. Suppose f : Rn → Rn satisfies |f (x) − f (y)| ≥ α |x − y| , α > 0, Show that f must map Rn onto Rn . Hint: First show f is one to one. Then use invariance of domain. Next show, using the inequality, that the points not in f (Rn ) must form an open set because if y is such a point, then there can be no sequence {f (xn )} converging to it. Finally recall that Rn is connected.
Chapter 12
Green’s Theorem 12.1
An Elementary Form of Green’s Theorem
Definition 12.1.1
Here and elsewhere, an open connected set will be called a region unless another use for this term is specified. Green’s theorem is an important theorem which relates line integrals to integrals over a surface in the plane. It can be used to establish Stoke’s theorem but is interesting for it’s own sake. Historically, something like it was important in the development of complex analysis. I will first establish Green’s theorem for regions of a particular sort and then show that the theorem holds for many other regions also. Suppose a region is of the form indicated in the following picture in which U
=
{(x, y) : x ∈ (a, b) and y ∈ (b (x) , t (x))}
=
{(x, y) : y ∈ (c, d) and x ∈ (l (y) , r (y))} . y = t(x)
d
x = l(y) c
U
x = r(y)
y = b(x)
a b I will refer to such a region as being convex in both the x and y directions. For sufficiently simple regions like those just described, it is easy to see what is meant by counter clockwise motion over the curve. It will always be assumed in this section that the bounding curves are piecewise C 1 meaning that there is a parametrization t → R (t) ≡ (x (t) , y (y)) for t ∈ [a, b] and a partition of [a, b] , {x0 , · · · , xn } such that R has derivatives which are continuous on each [xk−1 , xk ] and R is continuous on [a, b]. Thus these curves are of bounded variation thanks to Lemma 4.2.2. One can then compute the line integrals by adding together the integrals over the sub-intervals thanks to Lemma 4.2.9. In particular, one writes for one of these integrals Z Z xk
F (R (t)) · R0 (t) dt =
xk−1
F·dR R([xk−1 ,xk ])
Orientation is assumed to come from increasing t.
Lemma 12.1.2 Let F (x, y) ≡ (P (x, y) , Q (x, y)) be a C 1 vector field defined near U where U is a region of the sort indicated in the above picture which is convex in both the x 305
306
CHAPTER 12. GREEN’S THEOREM
and y directions. Suppose also that the functions, r, l, t, and b in the above picture are all C 1 functions and denote by ∂U the boundary of U oriented such that the direction of motion is counter clockwise. (As you walk around U on ∂U , the points of U are on your left.) Then Z P dx + Qdy ≡ ∂U
Z
Z F·dR = ∂U
U
∂Q ∂P − ∂x ∂y
dA.
(12.1)
Proof: First consider the right side of 12.1. Z ∂Q ∂P − dA ∂x ∂y U Z
d
Z
r(y)
∂Q dxdy − ∂x
Z
b
Z
t(x)
∂P dydx c l(y) a b(x) ∂y Z d Z b = (Q (r (y) , y) − Q (l (y) , y)) dy + (P (x, b (x))) − P (x, t (x)) dx. (12.2) =
c
a
Now consider the left side of 12.1. Denote by V the vertical parts of ∂U and by H the horizontal parts. Z F·dR = ∂U
Z ((0, Q) + (P, 0)) · dR
= ∂U Z d
=
(0, Q (r (s) , s)) · (r0 (s) , 1) ds +
Z (0, Q (r (s) , s)) · (±1, 0) ds
c
H d
Z
Z
0
− c
Z (P (s, b (s)) , 0) · (0, ±1) ds −
V
Z
(P (s, b (s)) , 0) · (1, b0 (s)) ds
a
Z + d
Z
c
b
(P (s, t (s)) , 0) · (1, t0 (s)) ds
a
Q (r (s) , s) ds −
=
b
(0, Q (l (s) , s)) · (l (s) , 1) ds +
d
Z
c
b
Z P (s, b (s)) ds −
Q (l (s) , s) ds + a
b
P (s, t (s)) ds a
which coincides with 12.2.
Corollary 12.1.3 Let everything be the same as in Lemma 12.1.2 but only assume the functions r, l, t, and b are continuous and piecewise C 1 functions. Then the conclusion this lemma is still valid. Proof: The details are left for you. All you have to do is to break up the various line integrals into the sum of integrals over sub intervals on which the function of interest is C 1 . From this corollary, it follows 12.1 is valid for any triangle for example. Now suppose 12.1 holds for U1 , U2 , · · · , Um and the open sets, Uk have the property that no two have nonempty intersection and their boundaries intersect only in a finite number of piecewise smooth curves. Then 12.1 must hold for U ≡ ∪m i=1 Ui , the union of these sets. This is because Z ∂Q ∂P − dm2 = ∂x ∂y U
12.1. AN ELEMENTARY FORM OF GREEN’S THEOREM
= =
m Z X
∂Q ∂P − ∂x ∂y Z F · dR =
k=1 Uk m Z X k=1
∂Uk
307
dm2 F · dR
∂U
because if Γ = ∂Uk ∩ ∂Uj , then its orientation as a part of ∂Uk is opposite to its orientation as a part of ∂Uj and consequently the line integrals over Γ will cancel, points of Γ also not being in ∂U. It is obvious from the definition of Lebesgue measure given earlier that the intersection of two of these in a smooth curve has measure zero. Thus adding an integral with respect to m2 over such a curve yields 0. I am not trying to be completely general here. I am just noting that when you paste together simple shapes like triangles and rectangles, this kind of cancelation will take place. As part of the development of a general Green’s theorem given later, it is shown that whenever you have a curve of bounded variation, it will have two dimensional Lebesgue measure zero. As an illustration, consider the following picture for two such Uk .
U1 U2
You can see that you could paste together many such simple regions composed of triangles or rectangles and obtain a region on which Green’s theorem will hold even though it is not convex in each direction. This approach is developed much more in the book by Spivak [65] who pastes together boxes as part of his treatment of a general Stokes theorem. Similarly, if U ⊆ V and if also ∂U ⊆ V and both U and V are open sets for which 12.1 holds, then the open set, V \ (U ∪ ∂U ) consisting of what is left in V after deleting U along with its boundary also satisfies 12.1. Roughly speaking, you can drill holes in a region for which 12.1 holds and get another region for which this continues to hold provided 12.1 holds for the holes. To see why this is so, consider the following picture which typifies the situation just described.
V
U
Then
Z
Z F·dR = ∂V
Z
∂Q ∂P − ∂x ∂y Z F·dR +
Z =
V \U
Z
dA
∂U
U
Z
∂Q ∂P − ∂x ∂y V \U ∂Q ∂P − dA ∂x ∂y V \U
=
and so
V
∂Q ∂P − ∂x ∂y
∂Q ∂P − ∂x ∂y
dA +
Z dA =
dA
Z F·dR−
∂V
F·dR ∂U
308
CHAPTER 12. GREEN’S THEOREM
which equals Z F · dR ∂(V \U )
where ∂V is oriented as shown in the picture. (If you walk around the region, V \ U with the area on the left, you get the indicated orientation for this curve.) You can see that 12.1 is valid quite generally.
12.2
Stoke’s Theorem
When you have a region for which Green’s theorem holds, you can quickly get a version of Stoke’s theorem. This approach follows Davis and Snyder [19]. Thus the difficult questions concerning Stoke’s theorem on a two dimensional surface in R3 can be reduced completely to Green’s theorem in the plane. I am assuming you have seen some discussion of Stoke’s theorem in multivariable calculus. Recall from calculus the following definition of the curl of a vector field. Recall also how the cross product of two vectors defined as i j k u × v ≡ u1 u2 u3 v1 v2 v3 is perpendicular to both vectors u, v and how u, v, u × v forms a right handed system. In this formula, it is assumed that the vectors i, j, k pointing respectively in the direction of the positive x axis, y axis and z axis also form a right handed system. Also recall from calculus that when you have U a bounded region in R2 , you can define a parametric surface S as R (U ) where x (u, v) R (u, v) ≡ y (u, v) , (u, v) ∈ U z (u, v) R : U → R3 has continuous partial derivatives and satisfies |Ru × Rv | = 6 0 Then when you write
R S
f dS, you mean Z f (R (u, v)) |Ru × Rv | dm2 U
Stoke’s theorem is really just a version of Green’s theorem for a surface in three dimensions rather than a surface in the plane. It relates a line integral around the boundary of the surface to a surface integral like what was just described.
Definition 12.2.1
Let
F (x, y, z) = (F1 (x, y, z) , F2 (x, y, z) , F3 (x, y, z)) be a C 1 vector field defined on an open set, V in R3 . Then i j k ∂ ∂F3 ∂F2 ∂F1 ∂F3 ∂F2 ∂F1 ∂ ∂ ≡ − i+ − j+ − k. ∇ × F ≡ ∂x ∂y ∂z ∂y ∂z ∂z ∂x ∂x ∂y F1 F2 F3 This is also called curl (F) and written as indicated, ∇ × F. In symbols, the ith component of ∇ × F is εijk ∂j Fk
12.2. STOKE’S THEOREM
309
where εijk is the permutation symbol and summation is over repeated indices. This symbol is defined as ε123 = 1, εijk = −εjik = −εikj = −εkji In words, when you switch two subscripts, the result is −1 times what you had. Note that the above implies that if two indices are equal, then the symbol equals 0. A fundamental identity is available. For 1 if i = j δ ij ≡ 0 if i 6= j it follows that εijk εirs = δ jr δ ks − δ js δ kr
(12.3)
Note that δ ij xj = xi where it is always understood that summation over repeated indices is taking place. The following lemma gives the fundamental identity which will be used in the proof of Stoke’s theorem.
Lemma 12.2.2 Let R : U → V ⊆ R3 where U is an open subset of R2 and V is an open subset of R3 . Suppose R is C 2 and let F be a C 1 vector field defined in V. (Ru × Rv ) · (∇ × F) (R (u, v)) = ((F ◦ R)u · Rv − (F ◦ R)v · Ru ) (u, v) .
(12.4)
Proof: Start with the left side and let xi = Ri (u, v) for short, then using 12.3 when needed, (Ru × Rv ) · (∇ × F) (R (u, v))
= εijk xju xkv εirs =
∂Fs ∂xr
(δ jr δ ks − δ js δ kr ) xju xkv
∂Fs ∂xr
∂Fj ∂Fk − xju xkv ∂xj ∂xk ∂ (F ◦ R) ∂ (F ◦ R) = Rv · − Ru · ∂u ∂v
=
xju xkv
which proves 12.4. The proof of Stoke’s theorem given next follows [19]. First, it is convenient to give a definition.
Definition 12.2.3
A vector valued function, R : U ⊆ Rm → Rn is said to be in C U , R if it is the restriction to U of a vector valued function which is defined on Rm and is C k . That is, this function has continuous partial derivatives up to order k. k
n
Let U be a region in R2 for which Green’s theorem holds. Thus Green’s theorem says that if P, Q are in C 1 U , and the oriented boundary ∂U is taken in the counter clockwise direction, then Z Z (Qu − Pv ) dA = U
f · dR ∂U
Here the u and v axes are in the same relation as the x and y axes. That is, the following picture holds. The positive x and u axes both point to the right and the positive y and v axes point up. y
v x
u
310
CHAPTER 12. GREEN’S THEOREM
Theorem 12.2.4
(Stoke’s Theorem) Let U be any region in R2 for which the conclusion of Green’s theorem holds and let R ∈ C 2 U , R3 be a one to one function satisfying |(Ru × Rv ) (u, v)| = 6 0 for all (u, v) ∈ U and let S denote the surface, S
≡
{R (u, v) : (u, v) ∈ U } ,
∂S
≡
{R (u, v) : (u, v) ∈ ∂U }
where the orientation on ∂S is consistent with the counter clockwise orientation on ∂U. (U is on the left as you walk around ∂U as described above. For reasonably simple regions this notion of counter clockwise motion is obvious.) Then for F a C 1 vector field defined near S, Z Z F · dR = curl (F) · ndS ∂S ZS ≡ curl (F (R (u, v))) · n (u, v) |Ru × Rv | dm2 U
where n is the normal to S defined by n≡
Ru × Rv . |Ru × Rv |
and dS is the increment of surface area defined as |Ru × Rv | dudv = |Ru × Rv | dm2 for dm2 indicating 2 dimensional Lebesgue measure discussed earlier. Proof: Letting C be an oriented part of ∂U having parametrization, r (t) ≡ (u (t) , v (t)) for t ∈ [α, β] and letting R (C) denote the oriented part of ∂S corresponding to C, Z F · dR = R(C)
Z
β
F (R (u (t) , v (t))) · (Ru u0 (t) + Rv v 0 (t)) dt
= α β
Z
F (R (u (t) , v (t))) Ru (u (t) , v (t)) u0 (t) dt
= α
Z + Z
β
F (R (u (t) , v (t))) Rv (u (t) , v (t)) v 0 (t) dt
α
((F ◦ R) · Ru , (F ◦ R) · Rv ) · dr.
= C
Since this holds for each such piece of ∂U, it follows Z Z F · dR = ((F ◦ R) · Ru , (F ◦ R) · Rv ) · dr. ∂S
∂U
By the assumption that the conclusion of Green’s theorem holds for U , this equals Z [((F ◦ R) · Rv )u − ((F ◦ R) · Ru )v ] dm2 ZU = [(F ◦ R)u · Rv + (F ◦ R) · Rvu − (F ◦ R) · Ruv − (F ◦ R)v · Ru ] dm2 U Z = [(F ◦ R)u · Rv − (F ◦ R)v · Ru ] dm2 U
12.3. A GENERAL GREEN’S THEOREM
311
the last step holding by equality of mixed partial derivatives, a result of the assumption that R is C 2 . Now by Lemma 12.2.2, this equals Z (Ru × Rv ) · (∇ × F) dA ZU = ∇ × F· (Ru × Rv ) dA ZU = ∇ × F · ndS S (Ru ×Rv ) . Thus because dS = |(Ru × Rv )| dm2 and n = |(R u ×Rv )|
(Ru × Rv ) dm2
(Ru × Rv ) |(Ru × Rv )| dm2 |(Ru × Rv )| = ndS.
=
Note that there is no mention made in the final result that R is C 2 . Therefore, it is not surprising that versions of this theorem are valid in which this assumption is not present. It is possible to obtain extremely general versions of Stoke’s theorem if you use the Lebesgue integral.
12.3
A General Green’s Theorem
Now suppose U is a region in the uv plane for which Green’s theorem holds and that V ≡ R (U ) 2
where R is C 2 U , R and is one to one, Ru × Rv 6= 0. Here, to be specific, the u, v axes are oriented as the x, y axes respectively. y
v x
u
Also let F (x, y, z) = (P (x, y) , Q (x, y) , 0) be a C 1 vector field defined near V . Note that F does not depend on z. Therefore, ∇ × F (x, y) = (Qx (x, y) − Py (x, y)) k. You can check this from the definition. Also R (u, v) = and so Ru × Rv , the normal vector to V
x (u, v) y (u, v)
is xu yu xu yu
xv yv xv yv
k
Suppose xu yu
xv >0 yv
312
CHAPTER 12. GREEN’S THEOREM
so the unit normal is then just k. Then Stoke’s theorem applied to Z Z F·dR= (Qx (x (u, v) , y (u, v)) − Py (x (u, v) , y (u, v))) k · k ∂V
U
this special case yields xu xv yu yv dm2 (u, v)
Now by the change of variables formula, this equals Z = (Qx (x, y) − Py (x, y)) dm2 V
This is just Green’s theorem for V . Thus if U is a region for which Green’s theorem holds and if V is another region, V = R (U ) , where |Ru × Rv | 6= 0, R is one to one, and twice continuously differentiable with Ru × Rv in the direction of k, then Green’s theorem holds for V also. This verifies the following theorem.
Theorem 12.3.1
(Green’s Theorem) Let V be an open set in the plane and let ∂V be piecewise smooth and let F (x, y) = (P (x, y) , Q (x, y)) be a C 1 vector field defined near V. Then if ∂V is oriented counter clockwise, it is often1 the case that Z Z ∂P ∂Q (x, y) − (x, y) dm2 . (12.5) F · dR = ∂x ∂y ∂V V In particular, if there exists U such as the simple convex in both directions case considered earlier for which Green’s theorem holds, and V = R (U ) where R : U → V is C 2 U , R2 such that |Rx × Ry | = 6 0 and Rx × Ry is in the direction of k, then 12.5 is valid where the orientation around ∂V is consistent with the orientation around ∂U . This is a very general version of Green’s theorem which will include most, if not all of what will be of interest in the part of the book on complex analysis. However, it suffers from several shortcomings. First the region V is defined as R (U ) where R ∈ C 2 U , R2 and ∂U is piecewise C 1 . What if you just have a bounded variation function R : S 1 → R2 where S 1 denotes the unit circle? Can you give Green’s theorem for the inside of this curve? Of course the first step is to verify that there is an inside and an outside such that R S 1 is the boundary of both. This is the Jordan curve theorem in the next section and presented in the chapter on degree theory. Next you need to deal with the general case of line integrals with respect to bounded variation curves which is not done above. The more general result presented in Section 12.5 will allow for all of these considerations and will also allow for much more general vector fields F. It is not necessary to assume F ∈ C 2 U , R2 for example. In fact, it suffices to have F be C 1 on the open set U and the partial derivatives of F be in L1 (U ) , and you only need F continuous on U . This is significantly more general, and the presentation in Section 12.5 is sufficient to include these things. In fact, it is possible to do all of this for even more general situations.
12.4
The Jordan Curve Theorem∗
This section is devoted to giving an elementary proof of the Jordan curve theorem other than the one in the chapter on degree theory. I am following lecture notes from a topology course given by Fernley at BYU in the 1970s. Another more general approach to this important result is in the chapter on degree theory. The reason for this theorem is to enable a very general proof of Green’s theorem for a simple closed curve having finite length. This will be used to give a treatment of the Cauchy 1 For a general version see the advanced calculus book by Apostol. This is presented in the next section also. The general versions involve the concept of a rectifiable Jordan curve. You need to be able to take the area integral and to take the line integral around the boundary.
12.4. THE JORDAN CURVE THEOREM∗
313
integral theorem which is like what was originally proposed by Cauchy. However, I have given another treatment of Green’s theorem which is sufficiently general to include most of what is of interest later. Rather than simply citing the result, I am trying to give a proof for those who need to see why the theorem is true. The theorem itself is pretty easy to believe. A unit circle divides the plane into two connected components, a bounded inside and an unbounded outside with the circle itself being the complete boundary of both of these connected open sets. This is obvious. The Jordan curve theorem says that if h is a one to one continuous function defined on such a circle C, then the wriggly circle h (C) also divides the plane into two components, an inside and an outside just like the circle does. Proving this turns out to be surprisingly difficult. It was first done by Veblun in 1917.
Definition 12.4.1
A grating G is a finite set of horizontal and vertical lines, each of which separates the plane. The grating divides the plane into two dimensional domains the closures of which are called 2 cells of G. The 1 cells of G are the edges of the 2 cells and the 0 cells of G are the end points of the 1 cells. 0 cell •
2 cell
•
•
•
•
•
•
•
•
2 cell
1 cell 2 cell n
For k = 0, 1, 2, one speaks of k chains. For {aj }j=1 a set of k cells, the k chain is denoted as a formal sum C = a1 + a2 + · · · + an where the sum is taken modulo 2. The sums are just formal expressions like the above. Thus for a a k cell, a + a = 0, 0 + a = a, the summation sign is commutative. In other words, if a k cell is repeated an even number of times in the formal sum, it disappears resulting in 0 defined by 0 + a = a + 0 = a. The symbol 0 indicates there is no k cell. For a a k cell, |a| denotes the points of the plane which are contained in a. For a k chain C as above, |C| ≡ {x : x ∈ |aj | for some aj } so |C| is the union of the k cells in the sum remembering that when a k cell occurs twice, it is gone and does not contribute to |C|. The following picture illustrates the above definition. The following is a picture of the 2 cells in a 2 chain. The dotted lines indicate the lines in the grating.
Now the following is a picture of the 1 chain consisting of the sum of the 1 cells which are the edges of the above 2 cells. Remember when a 1 cell is added to itself, it disappears from the chain. Thus if you add up the 1 cells which are the edges of the above 2 cells, lots of them cancel off. In fact all the edges which are shared between two 2 cells disappear. The following is what results.
314
CHAPTER 12. GREEN’S THEOREM
Definition 12.4.2
Next, the boundary operator is defined. This is denoted by ∂. ∂ takes k cells to k − 1 chains. If a is a 2 cell, then ∂a consists of the edges of a. If a is a 1 cell, then ∂a consists of the ends of the 1 cell. If a is a 0 cell, then ∂a ≡ 0. This extends in a natural way to k chains. For C = a1 + a2 + · · · + an , ∂C ≡ ∂a1 + ∂a2 + · · · + ∂an A k chain C is called a cycle if ∂C = 0. In the second of the above pictures, you have a 1 cycle. Here is a picture of another one in which the boundary of another 2 cell has been included over on the right.
This 1 cycle shown above is the boundary of exactly two 2 chains. What are they? C1 consists of the 2 cells in the first picture above along with the 2 cell whose boundary is the 1 cycle over on the right. C2 is all the other 2 cells of the grating. You see this clearly works. Could you make that 2 cell on the right be in C2 ? No, you couldn’t do it. This is because the 1 cells which are shown would disappear, being listed twice. This illustrates the fundamental lemma of the plane which comes next.
Lemma 12.4.3 If C is a bounded 1 cycle (∂C = 0), then there are exactly two disjoint 2 chains D1 , D2 such that C = ∂D1 = ∂D2 . Proof : First consider the unbounded 2 cells. These must all be in a single 2 chain. To see this, imagine moving around a very large rectangle which contains the bounded 1 cycle in its interior. (The Jordan curve theorem is obvious for a rectangle.) If these 2 cells are assigned to more than one 2 chain, then an unbounded 1 cell is in the boundary of two adjacent 2 cells and would appear in C contrary to C being bounded. Therefore, one is forced to assign all unbounded 2 cells to a single 2 chain if the lemma is to be true. Label each of these with an A to indicate they are all in the same 2 chain. Now starting from the left of the bounded 1 cycle and moving toward the right, toggle between A and B every time you hit a vertical 1 cell of C. This will label every 2 cell with either A or B. Next, starting above the bounded 1 cycle, toggle between A and B every time you encounter a horizontal 1 cell of C. This also labels every 2 cell as either A or B. Then let one of the 2 chains be those labeled with A and the other be the ones labeled with B. If the lemma is to be true, then these are the two 2 chains D1 , D2 which satisfy ∂Di = C since this accounts for all edges of C exhibiting each as part of the boundary of Di . The difficulty is whether this is well defined. We have not yet used the fact that this is a 1 cycle having zero boundary. Consider the following picture.
12.4. THE JORDAN CURVE THEOREM∗
315
Suppose you have a 2 cell which is labeled A from left to right and B from top to bottom. Pick the contradictory 2 cell which is farthest to the left and as high as possible out of all contradictory 2 cells. Be• ginning with the label B move over the circle counterAB clockwise changing the label whenever you encounter an edge of C. Since you end up with A, you must have encountered an odd number of 1 cells of C and so C was not a 1 cycle after all. The point at the center in the above picture must be in ∂C. Thus D1 and D2 are uniquely determined assuming there are only two such 2 chains having boundary C. If you had a third, D3 , then let e be an edge of d ∈ ∂D3 . It would then need to be an edge of two 2 cells, one in D1 and one in D2 and so some 2 cell is in two different Di and they are not disjoint after all. The next lemma is interesting because it gives the existence of a continuous curve joining two points. y •
• x
Lemma 12.4.4 Let C be a bounded 1 chain such that ∂C = x + y. Then both x, y are contained in a continuous curve which is a subset of |C|. Proof : There are an odd number of 1 cells of C which have x at one end. Otherwise ∂C 6= x + y. Begin at x and move along an edge leading away from x. Continue till there is no new edge to travel along. You must be at y since otherwise, you would have found another boundary point of C. This point would be in either one or three one cells of C. It can’t be x because x is contained in either one or three one cells of C. Thus, there is always a way to leave x if the process returns to it. It follows that there is a continuous curve in |C| joining x to y. The next lemma gives conditions under which you can go around a couple of closed sets. It is called Alexander’s lemma. The following picture is a rough illustration of the situation. Roughly, it says that if you can miss F1 and you can miss F2 in going from x to y, then you can miss both F1 and F2 by climbing around F1 . x C1 F1
F2 C2 y
Lemma 12.4.5 Let F1 be compact and F2 closed. Suppose C1 , C2 are two bounded 1 chains in a grating. Suppose also that this grating has no unbounded 2 cells intersecting F1 . Suppose ∂Ci = x+y where x, y ∈ / F1 ∪F2 . Suppose C2 does not intersect F2 and C1 does not intersect F1 . Also suppose the 1 cycle C1 +C2 is the boundary of a 2 chain D for which |D| ∩ F1 ∩ F2 = ∅. Then there exists a 1 chain C such that ∂C = x + y and |∂C| ∩ (F1 ∪ F2 ) = ∅. In particular x, y cannot be in different components of the complement of F1 ∪ F2 . Proof: Let a1 , a2 , · · · , am be the 2 cells of D which intersect the compact set F1 . Con-
316
CHAPTER 12. GREEN’S THEOREM
sider C ≡ C2 +
X
∂ak .
k
This is a 1 chain and ∂C = x + y because ∂∂ak = 0. Then |ak | ∩ F2 = ∅. This is because |ak |∩F1 6= ∅ and none of the 2 cells of D intersect both F1 and F2 by assumption. Therefore, C is a bounded 1 chain which avoids intersecting F2 . Does it also avoid F1 ? Suppose to the contrary that l is a 1 cell of C which does intersect F1 . If |l| ⊆ |C1 + C2 | , then it would be an edge of some 2 cell of D and would have to be a 1 cell P of C2 since it intersects F1 so it would have been added twice, once from C2 and once from k ∂ak and therefore could not be a summand in C. Therefore, |l| is not in |C1 + C2 |. It follows l must be an edge of some ak ∈ D and it is not a 1 cell of C1 + C2 . Therefore, if b is the 2 cell adjacent to ak which shares l as an edge, it must follow b ∈ /D since otherwise l would not appear in the sum because it would be added out. But now, it follows that l ∈ ∂D = C1 + C2 which was ruled out above. Therefore C misses F1 also.
Lemma 12.4.6 Let C be a bounded 1 cycle such that |C|∩H = ∅ where H is a connected set. Also let D, E be the 2 chains with ∂D = C = ∂E. Then either |H| ⊆ |E| or |H| ⊆ |D| . Proof: If p is a limit point of |E| and p ∈ |D| , then p must be contained in an edge of some 2 cell of D since otherwise it could not be a limit point, being contained in an open set whose intersection with |E| is empty. If p is a point of an even number of edges of 2 cells of D, then it is likewise an interior point of |D| which cannot be a limit point of |E| . Therefore, if p is a limit point of |E| , it must be the case that p ∈ |C| . A similar observation holds for the case where p ∈ |E| and is a limit point of |D|. Thus if H ∩ |D| and H ∩ |E| are both nonempty, then they separate the connected set H and so H must be a subset of one of |D| or |E|. If you have a limit point of either one, then it is in |C| and so not in the other since H has no points of |C|.
Definition 12.4.7
A Jordan arc is a set of points of the form Γ ≡ r ([a, b]) where r is a one to one map from [a, b] to the plane. For p, q ∈ Γ, say p < q if p = r (t1 ) , q = r (t2 ) for t1 < t2 . Also let pq denote the arc r ([t1 , t2 ]).
Theorem 12.4.8
Let Γ be a Jordan arc. Then its complement is connected.
Proof: Suppose this is not so. Then there exists x, y points in ΓC which are in different components of ΓC . Let G be a grating having x, y as points of intersection of a horizontal line and a vertical line of G and let p, q be the points at the ends of the Jordan arc. Also let G be such that no unbounded two cell has nonempty intersection with Γ. Let p = r (a) and consider the two arcs pz and zq. If ∂C = x + y and q = r (b) . Now let z = r a+b 2 then it is required |C| ∩ Γ 6= ∅ since otherwise these two points would not be in different components. Suppose there exists C1 , ∂C1 = x + y and |C1 | ∩ zq = ∅ and C2 , ∂C2 = x + y but |C2 | ∩ pz = ∅. Then C1 + C2 is a 1 cycle and so by Lemma 12.4.3 there are exactly two 2 chains whose boundary is C1 + C2 . Since z ∈ / |Ci | , it follows z = pz ∩ zq can only be in one of these 2 chains because it is a single point. Then by Lemma 12.4.5, Alexander’s lemma, there exists C a 1 chain with ∂C = x + y and |C| ∩ (pz ∪ zq) = ∅ so by Lemma 12.4.4 x, y are not in different components of ΓC contrary to the assumption they are in different components. Hence one of pz, zq has the property that every 1 chain C with ∂C = x + y intersects it. Say every such 1 chain goes through zq. Then let zq play the role of pq and conclude every 1 chain C such that ∂C = x + y goes through either zw or wq there 1 a+b +b w=r 2 2
12.4. THE JORDAN CURVE THEOREM∗
317
Thus, continuing this way, there is a sequence of Jordan arcs pk qk where r (tk ) = qk and , [sk , tk ] ⊆ [a, b] such that every C with ∂C = x + y r (sk ) = pk with |tk − sk | < b−a 2k has nonempty intersection with pk qk . The intersection of these arcs is r (s) where s = ∩∞ k=1 [sk , tk ]. Then all such C must go through r (s) because such C with ∂C = x + y must intersect pk qk for each k and their intersection is r (s). But now there is an obvious contradiction to having every 1 chain whose boundary is x + y intersecting r (s). Pick a 1 chain C whose boundary is x+y. Let D be the two •q chain of at most four 2 cells consisting of those two cells which have r (s) on some edge. Then ∂ (C + ∂D) = ∂C = x + y but •y r (s) ∈ / |C + ∂D| . Therefore, this contradiction shows ΓC must be connected after all. • r(s) x The other important observation about a Jordan arc is • that it has no interior points. This will also follow later from p• a harder result but it is also easy to prove.
Lemma 12.4.9 Let Γ = r ([a, b]) be a Jordan arc where r is as above, one to one, onto and continuous. Then Γ has no interior points. Proof : Suppose to the contrary that Γ has an interior point p. Then for some r > 0, B (p, r) ⊆ Γ. Consider the circles of radius δ < r centered at p. Denoting as Cδ one of these, it follows the Cδ are disjoint. Therefore, since r is one to one, the sets r−1 (Cδ ) are also disjoint. Now r is continuous and one to one mapping to a compact set. Therefore, by Theorem 2.6.30, r−1 is also continuous. It follows r−1 (Cδ ) is connected and compact. Thus by Theorem 2.11.6 each of these sets is a closed interval of positive length since r is one to one. It follows there exist disjoint open nonempty intervals consisting of the interiors of r−1 (Cδ ) , {Iδ }δ 2, but we are primarily concerned with simple closed curves in the plane. Note that we could have assumed r is continuous on [a, b] and one to one on (a, b] with r (a) = r (b) and proved that this is also equivalent to J being the continuous one to one image of S 1 . The proof would be essentially the same as what was given above.
Corollary 12.4.13 The following are equivalent. 1. J is the continous one to one image of S 1 2. J is the continuous one to one image of [a, b) where if r is the mapping, r (a) = r (b) . 3. J is the continuous one to one image of (a, b] where if r is the mapping, r (a) = r (b) . Before the proof of the Jordan curve theorem, recall Theorem 2.11.12 which says that the connected components of an open sets are open and that an open connected set is arcwise connected. If J is a Jordan curve then it is the continuous image of the compact set S 1 and so J is also compact. Therefore, its complement is open and the connected components of J C are connected. The following lemma is a fairly obvious conclusion of this. A square curve is a continuous curve which consists entirely of line segments which are either horizontal or vertical. Thus, given a square curve, you can always consider it as part of a grating.
Lemma 12.4.14 Let U be a connected open set and let x, y be points of U . Then there is a square curve which joins x and y. Proof: Let V denote those points of U which can be joined to x by a square curve. Then if z ∈ V, there exists B (z, r) ⊆ U . It is clear that every point of B (z, r) can be joined to z by a square curve. Also V C must be open since if z ∈ V C , B (z, r) ⊆ U for some r. Then if any w ∈ B (z, r) is in V, one could join w to z by a square curve and conclude that z ∈ V after all. The fact that both V, V C are both open would result in a contradiction unless both x, y ∈ V since otherwise, U is separated by V, V C .
12.4. THE JORDAN CURVE THEOREM∗
319
Theorem 12.4.15
Let J be a Jordan curve in the plane. Then J C consists of exactly two components, a bounded component, and an unbounded component, and J is the frontier of both of these components. Furthermore, J has empty interior. Proof : To begin with consider the claim there are no more than two components. Suppose this is not so. Then there exist x, y, z each of which is in a different component of J C . Let J = H ∪ K where H and K are two Jordan arcs joined at the points a and b. If the Jordan d]) where r (c) = r (d) as described above, you could take curve is r ([c, c+d and K = r , d . Thus the points on the Jordan curve illustrated in H = r c, c+d 2 2 the following picture could be c+d a = r (c) , b = r 2 •a H
K •b
First we show that there are at most two components in J C . By the Jordan arc theorem above, and the above lemma about square curves, there exists a square curve CxyH such that ∂CxyH = x + y and |Cx,yH | ∩ H = ∅. Using the same notation in relation to the other points, there exist •y square curves in the following list. If it has K in the subscript, then it •a misses K. H •x CxyH , ∂CxyH = x + y, CyzH , ∂CyzH = y + z •b K C , ∂C = x + y, C , ∂C =y+z xyK
xyK
yzK
yzK
Let these square curves be part of a grating which includes all vertices of all these square curves and contains the compact set J in the bounded two cells. First note that CxyH + CxyK is a one cycle because ∂CxyH + ∂CxyK = x + y + x + y = 0, and that |CxyH + CxyK | ∩ (H ∩ K) = ∅ This is because if a point is in H ∩K, then it is not in |CxyK | and it is also not in |CxyH |. Thus it cannot be in the possibly smaller set |CxyH + CxyK | . Also note that H ∩ K = {a, b} since r is one to one on [c, d) and (c, d]. Therefore, there exist unique two chains D, E such that ∂D = ∂E = CxyH + CxyK . Now if one of these 2 chains contains both a, b then the other 2 chain contains neither a nor b (single points are connected sets). Then by Alexander’s lemma, Lemma 12.4.5, there would exist a square curve C such that |C| ∩ (H ∪ K) = |C| ∩ J = ∅ and ∂C = x + y which is assumed not to happen since x, y are in different components of J C . Therefore, one of the two chains contains a and the other contains b. Say a ∈ |D| and b ∈ |E|. Similarly there exist unique 2 chains P, Q such that ∂P = ∂Q = CyzH + CyzK where a ∈ |P | and b ∈ |Q|. Now consider ∂ (D + Q) = CxyH + CxyK + CyzH + CyzK = (CxyH + CyzH ) + (CxyK + CyzK ) This is a one cycle because its boundary is x + y + y + z + x + y + y + z = 0. By Lemma 12.4.3, the fundamental lemma of the plane, there are exactly two 2 chains whose
320
CHAPTER 12. GREEN’S THEOREM
boundaries equal this one cycle. Therefore, D + Q must be one of them because, as just noted, ∂ (D + Q) = CxyH + CxyK + CyzH + CyzK . Also b ∈ |Q| and is not in |D| . Hence b ∈ |D + Q| because no 2 cell containing b is added twice. Similarly a ∈ |D + Q| since a ∈ |D| and a ∈ / |Q| . It follows that the other two chain whose boundary equals the above one cycle contains neither a nor b. In addition to this, CxyH + CyzH misses H and CxyK + CyzK misses K. Both of these one chains have boundary equal to x + z. By Alexander’s lemma, there exists a one chain C which misses both H and K (all of J) such that ∂C = x + z which contradicts the assertion that x, z are in different components. This proves the assertion that there are no more than two components to J C . Next, why are there at least two components in J C ? Suppose there is only one and let a, b be the points of J described above and H, K also as above. Let Q be a small square 1 cycle which encloses a on its inside such that b is not inside Q. Thus a is on the inside of Q and b is on the outside of Q as shown in the picture. •a
Q
H
K
•b Now let G be a grating which has the corners of Q as points of intersection of horizontal and vertical lines and also has all the 2 cells so small that none of them can intersect both of the disjoint compact sets H ∩ |Q| and |Q| ∩ K. These are indeed disjoint because the only possible point of intersection would be a and it is right at the center of Q so is not in |Q|. Let P be the 1 cells contained in the 1 cycle Q which have nonempty intersection with H. Some of them must have nonempty intersection with H because if not, then H would fail to be connected, having points inside Q, a, and points outside Q, b, but no points on Q. Similarly some of these one cells in Q have nonempty intersection with K. Let ∂P = x1 + · · · + xm . Then it follows each xk ∈ / H. This is because if an end of a 1 cell of P is on H, then its adjacent 1 cell having the same end is also in P and so when the boundary of P is taken, these two 0 cells will cancel. Could ∂P = 0? Moving counterclockwise around Q, it would follow that all the one cells contained in Q would intersect H. However, at least one must intersect K because if not, a is a point of K inside the square Q while b is a point of K outside Q thus separating K which is a connected set. However, this contradicts the choice of the grating. It violates the assumption that no 2 cell of G can intersect both of those disjoint compact sets H ∩ |Q| and |Q| ∩ K. Therefore, ∂P 6= 0. Starting with a one cell of Q which does not intersect H, move counter clockwise till you obtain the first one which intersects H. This will produce a point of ∂P . Then the next point of ∂P will occur when the first one cell of P which does not intersect H is encountered. Thus a pair of points in ∂P are obtained. Now you are in the same position as before, continue moving counter clockwise and obtaining pairs of points of ∂P till there are no more one cells of Q which intersect H. You must have encountered an even number of points for ∂P . Since it is assumed there is only one component of J C , it follows upon refining G if necessary, there exist 1 chains Bk contained in J C such that ∂Bk = x1 + xk and it is the existence of these Bk which will give the desired contradiction. Let B = B2 + · · · + Bm . Then P + B is a 1 cycle which misses K. It is a one cycle because m is even. m=2l 2l 2l 2l X X X X ∂ (P + B) = x1 + xk + xk = xk + xk + xk = 0 k=2
k=1
k=1
k=2
It misses K because B misses J and all the 1 cells of P in the original grating G intersect
12.5. GREEN’S THEOREM FOR A RECTIFIABLE JORDAN CURVE∗
321
H ∩ |Q| so they cannot intersect K. Also P + Q + B is a 1 cycle which misses H. This is because B misses J and every 1 cell of P which intersects H disappears because P + Q causes them to be added twice. Also, Q is a 1 cycle. Since H and K are connected, it follows from Lemma 12.4.6, 12.4.3 that P + B bounds a 2 chain D which is contained entirely in K C (the one which does not contain K). Similarly P + Q + B bounds a 2 chain E which is contained in H C (the 2 chain which does not contain H). Thus D + E is a 2 chain which does contains neither a nor b. (D misses K and E misses H and {a, b} = H ∩ K) However, ∂ (D + E) = P + B + P + Q + B = Q and so D + E is one of the 2 chains of Lemma 12.4.3 which have Q as boundary. However, Q is the boundary of the 2 chain of 2 cells which are inside Q which contains a and the 2 chain of 2 cells which are outside Q which contains b. This is a contradiction because neither of these 2 chains miss both a and b and this shows there are two components of J C . In the above argument, if each pair {x1 , xi } can be joined by a square curve Bi which lies in J C , then the contradiction was obtained. Therefore, there must exist a pair {x1 , xi } which can’t be joined by any square curve in J C and this requires these points to be in different components by Lemma 12.4.14 above. Since they are both on Q and Q could be as small as desired, this shows a is in the frontier of both components of J C . Furthermore, a was arbitrary so every point of J is a frontier point of both the components of J C . These are the only frontier points because the components of J C are open. By Lemma 12.4.12, J is the continuous image of the compact set S 1 so it follows J is bounded. The unbounded component of J C is the one which contains the connected set C B (0, R) where J ⊆ B (0, R). Thus there are two components for J C , the unbounded one C which contains B (0, R) and the bounded one which must be contained in B (0, R) . This proves the theorem.
12.5
Green’s Theorem for a Rectifiable Jordan Curve∗
Green’s theorem is one of the most profound theorems in mathematics. The version here appeared in 1951 and involves a rectifiable simple closed curve. It will always be assumed that the x, y axes are oriented in the usual way and that everything is taking place in the plane. First recall the Jordan curve theorem. Theorem 12.5.1 Let J be a Jordan curve in the plane. That is J is γ S 1 where γ is one to one, continuous, and onto and S 1 is the unit circle. Then J C consists of exactly two components, a bounded component called the inside Ui and an unbounded component called the outside Uo and J is the frontier of both of these components, meaning that every open set containing a point of J also contains infinitely many points of both Uo and Ui . The following lemma will be of importance in what follows. To say two sets are homeomorphic is to say there is a one to one continuous onto mapping from one to the other which also has continuous inverse. Clearly the statement that two sets are homeomorphic defines an equivalence relation. Then from Lemma 12.4.12, a Jordan curve is just a curve which is homeomorphic to a circle. From now on, Uo is the outside component of a simple closed curve Γ which is unbounded and Ui is the inside component. Also I will write γ ∗ or Γ to emphasize that the set of points in the curve is being considered rather than some parametrization. The following lemma is about cutting in two pieces a simple closed curve with its inside.
Lemma 12.5.2 In the situation of Theorem 12.5.1, let Γ be a simple closed curve and let γ ∗ be a straight line segment such that the open segment, γ ∗o , γ ∗ without its endpoints, is contained in Ui and the intersection of γ ∗ with Γ equals {p, q} . Then this line segment
322
CHAPTER 12. GREEN’S THEOREM
divides Ui into two connected open sets U1i , U2i which are the insides of two simple closed curves such that Ui = U1i ∪ γ ∗o ∪ U2i Furthermore, U1i ∩ U2i = ∅. Proof: To aid in the discussion, for α a real number, denote by α ˆ the point on the unit circle S 1 which is determined in the usual way. That is, if α is a real number, α ˆ is that point on S 1 whose radian measure in [0, 2π) is congruent to α mod 2π. We usually think of this as taking a string of length |α| and wrapping it around the unit circle counter clockwise if α > 0 and clockwise if α < 0 and α ˆ is where we end up. 1 Since Γ is a simple closed curve, there is γ : S → Γ which is one to one, onto and ˆ = q where 0 ≤ α < β < 2π. Change the parameter continuous. Say γ (ˆ α) = p and γ β from t to s where t = s + α. Then consider γ ˆ (ˆ s) ≡ γ s\ + α . Then we can consider the two curves γ ˆ (ˆ s) , s ∈ [0, β − α] , γ ˆ (ˆ s) , s ∈ [β − α, 2π] 0 ,q = γ ˆ β\ − α . To save on notation, write γ instead of γ ˆ. where p = γ ˆ ˆ ˆ 1 and Γ ˆ 2 joined only at the points p = γ ˆ0 = γ 2π c Thus there are two simple curves Γ − α . Orient the first to go from q to p and the second to go from p to q. and q = γ β\ Recall that the order of two points suffices to specify the orientation of one of these simple curves. Now define ( γ (β − α) + s\ (2π − (β − α)) , s ∈ [0, 1] γ 1 (s) ≡ p + (s − 1) (q − p) , s ∈ [1, 2] Since the line segment is in Ui , this specifies a simple closed curve γ 1 ([0, 2]) which starts at q goes to p and then back to q as desired. One can describe γ 2 (s) similarly. Refer to these simple closed curves as Γ1 , Γ2 . They intersect in the line segment γ ∗ . Thus, by the Jordan curve theorem, these simple closed curves have insides U1i and U2i respectively. Are these insides contained in Ui , the inside of the original Jordan curve? p Γ • γ∗ U2i Γ2
• q
U1i
Γ1
Claim 1: U1i , U2i ⊆ Ui , U1i , U2i contain no points of Γ2 ∪ Γ1 . Proof of claim 1: Since U1i is connected, it cannot have points which are in Ui and points which are in Uo , the insides and outsides of the original Jordan curve Γ. However, it does contain points of Ui because γ ∗ ⊆ Ui and every point of γ ∗ is a limit of points of U1i . ˆ1 ∪ Γ ˆ2 Therefore, U1i ⊆ Ui . Similarly U2i ⊆ Ui . Thus no point of U1i has any point of Γ ∗ ˆ because it is the inside of Γ1 which consists of γ ∪ Γ1 . Nor does it contain any point of γ ∗ , this because of the Jordan curve theorem and the construction above which shows γ ∗ as part of the simple closed curve having U1i as its inside. Similar considerations pertain to U2i . This shows the claim. Now consider a point x ∈ Ui . It could be a point of γ ∗o so suppose it is not. x is also not ˆ 1 , nor a point of Γ ˆ 2 because the union of these simple curves equals the original a point of Γ Jordan curve Γ. It follows x ∈ γ ∗o ∪ U1i ∪ U2i and so Ui ⊆ γ ∗o ∪ U1i ∪ U2i . However, it
12.5. GREEN’S THEOREM FOR A RECTIFIABLE JORDAN CURVE∗
323
was already argued that U1i ⊆ Ui and U2i ⊆ Ui and it was given that γ ∗o ⊆ Ui . Therefore, equality holds and Ui = γ ∗o ∪ U1i ∪ U2i . To see that U1i ∩ U2i = ∅, note that these are open conected sets. Also, it was argued above that U2i is contained in Ui so it has no points of Γ. Neither does it have points of γ ∗ . Therefore, U2i = (U1i ∩ U2i ) ∪ (U1o ∩ U2i ) , two disjoint open sets. Since U2i is connected, it follows that one of these disjoint open sets is empty. However, consideration of the midpoint of γ ∗ and a small ball centered at this point shows that the second set is not empty because there is a point not in U1i in this ball. This point is in Ui so it must be in U2i because it is in neither γ ∗o nor in U1i . Hence U1i ∩ U2i = ∅. The following lemma has to do with decomposing the inside and boundary of a simple closed rectifiable curve into small pieces. In doing this I will refer to a region as the union of a connected open set with its boundary. Also, two regions will be said to be non overlapping if they either have empty intersection or the intersection is contained in the intersection of their boundaries.The height of a set A equals sup {|y1 − y2 | : (x1 , y1 ) , (x2 , y2 ) ∈ A} . The width of A will be defined similarly. We have a rough idea of what it means for things like circles and squares and triangles to be oriented in counter clockwise direction. You move from left to right on the bottom and from right to left on top. This is motivation for the lemma which follows in which a line segment will split the simple closed curve into two pieces, one having the top point and the other having the bottom point. Add in the line segment to obtain two simple closed curves. Then along the line, it will be oriented to move from left to right as part of the simple closed curve which has the point at the top and from right to left as part of the simple closed curve containing the point at the bottom. This produces an orientation on the original simple closed curve as well as an orientation on the resulting smaller simple closed curves. This lemma is about splitting things into simple regions as just described and in estimating the number of regions which intersect Γ. The main idea consists of showing that you can divide a simple closed curve into two simple closed curves, one of which has the top point and the other having the bottom point and then doing the same thing relative to the left and right points of the simple closed curve. The following argument will be used without explicit reference in the argument. Here γ is a parametrization and γ 2 its second component. Let γ 2 (c) > r and γ 2 (a) = r. Let S ≡ {t ∈ [a, c] : γ 2 (t) = r} . This equals {t ∈ [a, c) : γ 2 (t) = r}. Then S has a last point. Let s = sup S. Then s < c. Also there exists tn ↑ s. Therefore, by continuity γ 2 (s) = r. In other words, there exists a last point s before c such that γ 2 (s) = r. Similar considerations apply if γ 2 (c) < r. In the following lemma, the term “region” will be used to denote a simple closed curve along with its inside.
Lemma 12.5.3 Let Γ be a simple closed rectifiable curve. Also let δ > 0 be given such that 2δ is smaller than both the height and width of Γ. Then there exist finitely many non n overlapping regions {Rk }k=1 consisting of simple closed rectifiable curves along with their insides whose union equals Ui ∪ Γ. These regions consist of two kinds, those contained in Ui and those with nonempty intersection with Γ. These latter regions are called “border” regions. The boundary of a border region consists of straight line segments parallel to the coordinate axes of the form x = m 4δ or y = k 4δ for m, k integers along with arcs from Γ. The non border regions consist of rectangles. Thus all of these regions have boundaries which are rectifiable simple closed curves. Also each region is contained in a square having sides of length no more than δ. There are at most 4V (Γ) 4 +1 δ
324
CHAPTER 12. GREEN’S THEOREM
border regions. The construction also yields an orientation for Γ and for all these regions, the orientations for any segment shared by two regions are opposite. Proof: Let x ∈ Ui and let l be the horizontal x, y0 ) for line through x where x = (ˆ y0 = m 4δ where m ∈ N is chosen to make m 4δ as close as possible to the average of the second components of the top and bottom points of Γ. Let a top point be (x1 , y1 ) and a bottom point (x2 , y2 ). Let γ ∗0 be the connected component of l ∩ Ui which contains x and let γ ∗ result from including the endpoints which are in Γ. (x1 , y1 ) •
• • α β
• (x2 , y2 ) The problem is, we might have picked the wrong segment. This is illustrated in the picture. The segment shown does separate Γ into two simple curves but one of them contains both the top and bottom points. The other is very short. It is desired to get two of these such that one of them contains a top point and excludes at least one bottom point. This prevents one of the new curves being very short. Let γ be an orientation of Γ such that γ : [a, b] → Γ, γ (a) = γ (b) = α, y2 < α2 < y1 , γ one to one on [a, b) and assume that for c < d, c, d ∈ (a, b) , γ (c) = (x1 , y1 ) , γ (d) = (x2 , y2 ) , a < c < d < b If c is not less than d, you could simply change the orientation or modify the argument. This orientation is used only to describe the special line segment. The final orientation will be obtained later after the line segment has been found. The connected components of l ∩ Ui are the segments (γ (τ k ) , γ (σ k )) where τ k < σ k . Choose τ k as large as possible but less than c. τ k is the last number less than c such that γ 2 (τ k ) = mδ/4. Thus c ∈ (τ k , σ k ) because otherwise, τ k was not as large as possible because σ k would work better. Hence γ (c) = (x1 , y1 ) ∈ γ ([τ k , σ k ]). What about d? Could it happen that d ∈ (τ k , σ k )? If so, there would be t ∈ (c, d) such that γ 2 (t) = m 4δ . Letting t be either the smallest such point larger than c or the largest such point smaller than d, (this is to account for the fact that a segment of points of Γ could lie on l) it must be an endpoint of some component of l ∩ Ui . However, these components are either disjoint or they coincide. Thus if γ (t) is in the segment (γ (τ k ) , γ (σ k )) , then t must be either τ k or σ k which is impossible because τ k < c < d < σ k . It follows that γ (t) must be in some other component (γ (τ j ) , γ (σ j )) for j 6= k and t = τ j or t = σ j . Consider the two simple curves γ ([τ k , σ k ]) , γ ([τ j , σ j ]) . Since γ is one to one, there is no intersection. However, γ (t) ∈ γ ([τ k , σ k ]) and since t = τ j or σ j , it follows that γ (t) ∈ γ ([τ j , σ j ]). This is a contradiction. Hence d ∈ / (τ k , σ k ) and so (x2 , y2 ) ∈ / γ ([τ k , σ k ]). This orients Γ as follows: α then (x1 , y1 ) , then (x2 , y2 ) then α. From now on, this will be the orientation of Γ. All other orientations will come from this one. Now here is the illustration of the correct segment. γ ∗ is the segment shown. It divides Γ into two simple curves, one containing (x1 , y1 ) and the other (x2 , y2 ).
12.5. GREEN’S THEOREM FOR A RECTIFIABLE JORDAN CURVE∗
325
(x1 , y1 ) •
• σk
• τk
• (x2 , y2 ) By Lemma 12.5.2, this divides Γ into two simple closed curves having γ ∗ as their intersection. Do the same process to each of these two simple closed curves. Each time you do this process, you get two simple closed curves in place of one. If the height of the curve is h > δ, then both new simple closed curves have length at least 87 δ. (This is where it was important to have a top point in one curve and a bottom point in the other.) Repeat the process until all regions have height no more than δ. This must eventually take place because if not, then there would be a sequence of simple closed curves each with height larger than 7δ 8 . This would violate the assumption that Γ has finite length. Using Proposition 4.4.12, use the orientation on Γ obtained earlier to orient the resulting horizontal line segments. Thus, from this proposition, each segment has opposite orientation as part of the two simple closed curves resulting from its inclusion. Now follow the same process just described on each of the non overlapping “short” regions just obtained using vertical rather than horizontal lines, letting the orientation of the vertical edges be determined from the orientation already obtained, but this time feature width instead of height and let the lines be vertical of the form x = k 4δ where k is an integer. How many border regions are there? Denote by V (Γ) the length of Γ. Now decompose Γ into N arcs of length 4δ with maybe one having length less than 4δ . Thus N − 1 ≤ V (Γ) ( δ4 ) and so 4V (Γ) N≤ +1 δ The resulting arcs are each contained in a box having sides of length no more than δ in length. Each of these N arcs can’t intersect any more than four of the rectangles of the above construction because of their short length. Therefore, at most 4N boxes of the construction can intersect Γ. Thus there are no more than 4V (Γ) 4 +1 δ border regions. This proves the lemma. Note that this shows that the part of Ui which is included by the boundary boxes has area at most equal to 4V (Γ) 4 + 1 δ2 (12.6) δ which converges to 0 as δ → 0. Also, this has shown that Γ is contained in the union of boxes having sides of length δ whose total Lebesgue measure is no more than 12.6. Since δ is arbitrary, this shows that the m2 (Γ) = 0.
Proposition 12.5.4 The two dimensional Lebesgue measure of a simple closed curve having finite total variation is 0.
326
CHAPTER 12. GREEN’S THEOREM
The following is a proof of Green’s theorem based on the above. Included in this formula is a way to analytically define the orientation of a simple closed curve in the plane. Recall that there were two orientations of a simple closed curve depending on the fact that there are two orientations for a circle in the plane. Green’s theorem can distinguish between these two orientations.
Lemma 12.5.5 Let R = [a, b] × [c, d] be a rectangle and let P, Q be functions which are C 1 in some open set containing R. Orient the boundary of R as shown in the following picture. This is called the counter clockwise direction or the positive orientation
Then letting γ denote the oriented boundary of R as shown, Z Z (Qx (x, y) − Py (x, y)) dm2 = f ·dγ R
γ
where f (x, y) ≡ (P (x, y) , Q (x, y)) . In this context the line integral is usually written using the notation Z P dx + Qdy. ∂R
If the bounding curve were oriented in the opposite direction, then the area integral would be Z (Py (x, y) − Qx (x, y)) dm2 R
Proof: This follows from direct computation. A parametrization for the bottom line of R is γ B (t) = (a + t (b − a) , c) , t ∈ [0, 1] A parametrization for the top line of R with the given orientation is γ T (t) = (b + t (a − b) , d) , t ∈ [0, 1] A parametrization for the line on the right side is γ R (t) = (b, c + t (d − c)) , t ∈ [0, 1] and a parametrization for the line on the left side is γ L (t) = (a, d + t (c − d)) , t ∈ [0, 1] Now it is time to do the computations using Theorem 4.2.5. Z Z 1 f · dγ = P (a + t (b − a) , c) (b − a) dt γ
0
Z
1
P (b + t (a − b) , d) (a − b) dt
+ 0
12.5. GREEN’S THEOREM FOR A RECTIFIABLE JORDAN CURVE∗ 1
Z
1
Z
Q (a, d + t (c − d)) (c − d) dt
Q (b, c + t (d − c)) (d − c) dt +
+
327
0
0
Changing the variables and combining the integrals, this equals Z b Z b Z d Z = P (x, c) dx − P (x, d) dx + Q (b, y) dy − a
a b
Z =
− Z
Z
c
d
Q (a, y) dy
c d
Z
Z
b
Qx (x, y) dxdy
Py (x, y) dydx + a
d
c
c
a
(Qx − Py ) dm2
= R
By Fubini’s theorem, Proposition 6.9.2 on Page 170. (To use this theorem you can extend the functions to equal 0 off R.) This proves the lemma. With this lemma, it is possible to prove Green’s theorem and also give an analytic criterion which will distinguish between different orientations of a simple closed rectifiable curve. First here is a discussion which amounts to a computation. nδ Let Γ be a rectifiable simple closed curve with inside Ui and outside Uo . Let {Rk }k=1 denote the non overlapping regions of Lemma 12.5.3 all oriented as explained there and let Γ also be oriented as explained there. Since the shared edges of the horizontal and vertical lines have opposite orientations, all these regions which are on the inside of Γ are rectangles and have the same orientation. Assume it is counter clockwise. Let Bδ be the set of border regions and let Iδ be the rectangles contained in Ui . Thus in taking the sum of the line integrals over the boundaries of the interior rectangles, the integrals over the “interior edges” cancel out and you are left with a line integral over the exterior edges of a polygon which is composed of the union of the squares in Iδ . Now let f (x, y) = (P (x, y) , Q (x, y)) be a vector field which is C 1 on Ui , and suppose also that both Py and Qx are in L1 (Ui ) (Absolutely integrable) and that P, Q are continuous on Ui ∪Γ. (An easy way to get all this to happen is to let P, Q be in C 1 (Ui ∪ Γ) , restrictions to Ui ∪ Γ of functions which are C 1 on some open set containing Ui ∪ Γ, but what is assumed here is a lot more general.) Note that ∪δ>0 {R : R ∈ Iδ } = Ui and that for Iδ ≡ ∪ {R : R ∈ Iδ } , the following pointwise convergence holds for δ denoting a sequence converging to 0. lim XIδ (x) = XUi (x) .
δ→0
By the dominated convergence theorem, Z Z lim (Qx − Py ) dm2 = δ→0
Iδ
(Qx − Py ) dm2
Ui
where m2 denotes two dimensional Lebesgue measure discussed earlier. Let ∂R denote the 2 boundary of R for R one of these regions of Lemma 12.5.3 oriented as described. Let wδ (R) denote (max {Q (x) : x ∈ ∂R} − min {Q (x) : x ∈ ∂R})
2
+ (max {P (x) : x ∈ ∂R} − min {P (x) : x ∈ ∂R})
2
By uniform continuity of P, Q on the compact set Ui ∪ Γ, if δ is small enough, wδ (R) < ε for all R ∈ Bδ . Then for R ∈ Bδ , it follows from Theorem 4.2.1, for V (∂R) denoting the length of ∂R, Z 1 f · dγ ≤ wδ (R) (V (∂R)) < ε (V (∂R)) (12.7) 2 ∂R
328
CHAPTER 12. GREEN’S THEOREM
whenever δ is small enough. Always let δ be this small. Also since the line integrals cancel on shared edges Z X Z X Z f · dγ + f · dγ = f · dγ R∈Iδ
∂R
R∈Bδ
∂R
(12.8)
Γ
Consider the second sum on the left. From 12.7, X Z X X Z (V (∂R)) f · dγ ≤ ε f · dγ ≤ ∂R ∂R R∈Bδ
R∈Bδ
R∈Bδ
Denote by ΓR the part of Γ which is contained in R ∈ Bδ and V (ΓR ) is its length. Then the above sum equals ! ! X X ε (V (ΓR ) + |∂Rδ |) = εV (Γ) + ε |∂Rδ | R∈Bδ
R∈Bδ
where |∂Rδ | is the sum of the lengths of the straight edges of Rδ . This last sum is easy to estimate. Recall from 12.5.3 there are no more than 4V (Γ) 4 +1 δ of these border regions. Furthermore, the sum of the lengths of all four edges of one of these is no more than 4δ and so X V (Γ) |∂Rδ | ≤ 4 + 1 4δ = 16V (Γ) + 16δ. δ R∈Bδ
P R Thus R∈Bδ ∂R f · dγ is dominated by ε (16V (Γ) + 16δ) + εV (Γ) ≤ 17εV (Γ) + ε (16δ) Let εn → 0 and let δ n be the corresponding sequence of δ such that δ n → 0 also. Hence X Z lim f · dγ = 0. n→∞ R∈Bδn ∂R Then using Green’s theorem proved above for squares, Z X Z f · dγ = lim f · dγ + lim n→∞
Γ
= lim
n→∞
X Z R∈Iδn
∂R
R∈Iδn
n→∞
X Z
Z f · dγ = lim
∂R
n→∞
f · dγ
∂R
R∈Bδn
Z ± (Qx − Py ) dm2 =
Iδ n
± (Qx − Py ) dm2 Ui
where the ± adjusts for whether the interior rectangles are all oriented positively (counter clockwise) or all oriented negatively (clockwise). It was assumed these rectangles are oriented counter clockwise and so the + sign would be used. This has proved the general form of Green’s theorem which is stated in the following theorem.
Theorem 12.5.6
Let Γ be a rectifiable simple closed curve in R2 having inside Ui and outside Uo . Let P, Q be functions with the property that Qx , Py ∈ L1 (Ui )
12.6. ORIENTATION OF A JORDAN CURVE
329
and P, Q are C 1 on Ui . Assume also P, Q are continuous on Γ ∪ Ui . Then there exists an orientation for Γ (Remember there are only two.) such that for f (x, y) = (P (x, y) , Q (x, y)) , Z Z f · dγ = (Qx − Py ) dm2 . Γ
Ui
Proof: In the construction of the regions, an orientation was imparted to Γ. The above computation shows Z Z f · dγ = (Qx − Py ) dm2 Γ
12.6
Ui
Orientation of a Jordan Curve
With this wonderful theorem, it is possible to give an analytic description of the two different orientations of a rectifiable simple closed curve. The positive orientation is the one for which Greens theorem holds and the other one, called the negative orientation is the one for which Z Z f · dγ = (Py − Qx ) dm2 . Γ
Ui
Definition 12.6.1
Let γ be a parametrization of a simple closed curve Γ of finite length. We say this curve is oriented positively if Green’s theorem holds. That is, whenever (P (x, y) , Q (x, y)) is C 1 on Ui , and in L1 (Ui ) , the inside of Γ and continuous on Ui then Z Z (P, Q) · dr = (Qx − Py ) dm2 γ
Ui
A careful examination of the proof also provides a geometric description of the orientation of a simple closed curve Γ. This method is described in the following procedure. For simple situations, it gives an easy to verify description of positive and negative orientation.
Procedure 12.6.2
Let p be a point of Γ with largest second component and let q be a point of smallest second component, respectively the top and bottom of Γ. There exists a horizontal line segment γ ∗ having left endpoint x and right endpoint y, part of a horizontal line l such that this segment is a component of l ∩ Ui for Ui the inside of Γ and of the two simple curves Γp , Γq each having endpoints x, y, one contains p and the other contains q. Denote these by Γp and Γq respectively. The positive orientation takes the points in the following order: y, p, x, q, y. The negative orientation takes the points in the order x, p, y, q, x.
p •
γ∗ • x
• y
q
•
330
CHAPTER 12. GREEN’S THEOREM
For most situations, one can use the above method to identify whether a simple closed oriented curve is positively oriented. There are other regions for which Green’s theorem holds besides just the inside and boundary of a simple closed curve. For Γ a simple closed curve and Ui its inside, lets refer to Ui ∪ Γ as a Jordan region. When you have two non overlapping Jordan regions which intersect in a finite number of simple curves, you can delete the interiors of these simple curves and what results will also be a region for which Green’s theorem holds. This is illustrated in the following picture. Ui2
U1i There are two Jordan regions (inside a simple closed curve) here with insides U1i and U2i and these regions intersect in three simple curves. As indicated in the picture, opposite orientations are given to each of these three simple curves. Then the line integrals over these cancel. The area integrals add. Recall the two dimensional area of a bounded variation curve equals 0. Denote by Γ the curve on the outside of the whole thing and Γ1 and Γ2 the oriented boundaries of the two holes which result when the curves of intersection are removed, the orientations as shown. Then letting f (x, y) = (P (x, y) , Q (x, y)) , and U = U1i ∪ U2i ∪ {Open segments of intersection} as shown in the following picture,
Γ2
U
Γ1
Γ
it follows from applying Green’s theorem to both of the Jordan regions, Z Z Z Z f · dγ+ f · dγ 1 + f · dγ 2 = (Qx − Py ) dm2 Γ Γ1 Γ2 U ∪U Z 1i 2i = (Qx − Py ) dm2 U
To make this simpler, just write it in the form Z Z f · dγ = (Qx − Py ) dm2 ∂U
U
where ∂U is oriented as indicated in the picture and involves the three oriented curves Γ, Γ1 , Γ2 . As a particular case of Green’s theorem applied to a simple closed curve, let Q (x, y) ≡
x−a 2
(x − a) + (y − b)
2,
P (x, y) ≡
− (y − b) 2
(x − a) + (y − b)
2,
(12.9)
12.6. ORIENTATION OF A JORDAN CURVE
331
f (x, y) ≡ (P (x, y) , Q (x, y)). Thus if (a, b) ∈ Ui , the function is not C 1 so you can’t use Green’s theorem due to the obvious fact that the denominators vanish when x = a, y = b. Then some computations show that for (x, y) 6= (a, b) , Qx − Py = 0. However, the functions are obviously not differentiable at (a, b). Therefore, let Cr be a small circle centered at (a, b) ∈ Ui the inside of Γ. Orient Cr clockwise and orient Γ as described above in which the orientation on Γ comes from counter clockwise orientation of the rectangles. Then by Green’s theorem, Z Z f ·dγ+ f ·dγ = 0 Γ
and so
Cr
Z
Z f ·dγ =
f ·dγ −Cr
Γ
where −Cr is the counterclockwise orientation of the small circle. That second integral is routine to evaluate. A parametrization is x = a + r cos t, y = (b + r sin t) Then the integral is Z 2π 0
−r sin t r cos t , r2 r2
Z · (−r sin t, r cos t) dt =
2π
1dt = 2π 0
If the orientation of Γ is opposite, then the same reasoning shows that Z f ·dγ = −2π Γ
This shows that if (a, b) ∈ / Ui , then 1 2π
Z f ·dγ = 0 Γ
One can apply Green’s theorem directly in this case. However, if (a, b) ∈ Ui , then if orientation is counter clockwise, Z 1 f ·dγ = 1 2π Γ and if orientation is clockwise, then 1 2π
Z f ·dγ = −1 Γ
This proves the following interesting result which is related to the winding number in complex analysis.
Proposition 12.6.3 Let P and Q be given in 12.9. Let f (x, y) = (P (x, y) , Q (x, y)) . Then
1 2π
Z f ·dγ = ±1 Γ
depending on the orientation of Γ. Here f is as above where (a, b) ∈ Ui .
332
CHAPTER 12. GREEN’S THEOREM
Part III
Abstract Analysis
333
Chapter 13
Banach Spaces 13.1
Theorems Based On Baire Category
13.1.1
Baire Category Theorem
Banach spaces are complete normed linear spaces. These have been mentioned throughout the book so far. In this chapter are the most significant theorems relative to Banach spaces and more generally normed linear spaces. The main theorems to be presented here are the uniform boundedness theorem, the open mapping theorem, the closed graph theorem, and the Hahn Banach Theorem. The first three of these theorems come from the Baire category theorem which is about to be presented. They are topological in nature. The Hahn Banach theorem has nothing to do with topology. First some definitions:
Definition 13.1.1 As earlier, a set U is open if whenever x ∈ U, there exists r > 0 such that B (x, r) ⊆ U, where B (x, r) ≡ {z: kx − zk < r} Closed sets are complements of open sets just as earlier. k·k denotes a norm satisfying the axioms of a norm, 2.1 - 2.3 mentioned much earlier in the context of Fp , kxk ≥ 0 and kxk = 0 if and only if x = 0
(13.1)
For α a scalar, kαxk = |α| kxk
(13.2)
kx + yk ≤ kxk + kyk
(13.3)
All of the theorems presented earlier for Fp hold with no change except for those which pertain to compactness. Note that a normed linear space is always a metric space if d (x, y) ≡ kx − yk.
Proposition 13.1.2 The open ball B (x, r) is an open set. Proof: If y ∈ B (x, r) , consider B (y, δ) where δ ≡ r − kx − yk . Then if z ∈ B (y, δ) , it follows that kz − xk ≤ kz − yk + ky − xk < r − kx − yk + ky − xk = r showing that B (y, δ) ⊆ B (x, r). For convenience, define the closed disk as D (z, r) ≡ {x: kx − zk ≤ r} 335
336
CHAPTER 13. BANACH SPACES
is a closed set. Indeed by continuity of the norm, if y is a limit point of D (z, r) so that yn → y for yn ∈ D (z, r) , then kz − yk = lim kz − yn k ≤ r. n→∞
Thus it contains limits of converging sequences of points of D (z, r) and is therefore closed. See Theorems 2.3.7 and 2.3.9. To see that 15.4 holds, note kzk = kz − w + wk ≤ kz − wk + kwk which implies kzk − kwk ≤ kz − wk and now switching z and w, yields kwk − kzk ≤ kz − wk which implies 15.4.
Definition 13.1.3
{xn } is called a Cauchy sequence if for every ε > 0 there exists N such that if m, n ≥ N, then kxn − xm k < ε
Definition 13.1.4
A complex vector space Y is called a Banach space if it is a normed linear space for which every Cauchy sequence converges. This means that if {xn } is a Cauchy sequence, there exists x ∈ Y such that limn→∞ kx − xn k = 0.
Definition 13.1.5
A complete normed linear space is called a Banach space.
The following remarkable result is called the Baire category theorem. To get an idea of its meaning, imagine you draw a line in the plane. The complement of this line is an open set and is dense because every point, even those on the line, are limit points of this open set. Now draw another line. The complement of the two lines is still open and dense. Keep drawing lines and looking at the complements of the union of these lines. You always have an open set which is dense. Now what if there were countably many lines? The Baire category theorem implies the complement of the union of these lines is dense. In particular it is nonempty. Thus you cannot write the plane as a countable union of lines. This is a rather rough description of this very important theorem. The precise statement and proof follow.
Theorem 13.1.6
Let X be a Banach space and let {Un }∞ n=1 be a sequence of open U subsets of X satisfying Un = X (Un is dense). Then D ≡ ∩∞ n=1 n is a dense subset of X.
Proof: Let p ∈ X and let r0 > 0. I need to show D ∩ B(p, r0 ) 6= ∅. Since U1 is dense, there exists p1 ∈ U1 ∩ B(p, r0 ), an open set. Let p1 ∈ B(p1 , r1 ) ⊆ B(p1 , r1 ) ⊆ U1 ∩ B(p, r0 ) and r1 < 2−1 . This is possible because U1 ∩ B (p, r0 ) is an open set and so there exists r1 such that B (p1 , 2r1 ) ⊆ U1 ∩ B (p, r0 ). But B (p1 , r1 ) ⊆ B (p1 , r1 ) ⊆ B (p1 , 2r1 ) because B (p1 , r1 ) = {x ∈ X : kx − p1 k ≤ r1 } (Why?)
13.1. THEOREMS BASED ON BAIRE CATEGORY r0 p p· 1
337
There exists p2 ∈ U2 ∩ B(p1 , r1 ) because U2 is dense. Let p2 ∈ B(p2 , r2 ) ⊆ B(p2 , r2 ) ⊆ U2 ∩ B(p1 , r1 ) ⊆ U1 ∩ U2 ∩ B(p, r0 ).
and let r2 < 2−2 . Continue in this way. Thus rn < 2−n, B(pn , rn ) ⊆ U1 ∩ U2 ∩ ... ∩ Un ∩ B(p, r0 ), B(pn , rn ) ⊆ B(pn−1 , rn−1 ). The sequence, {pn } is a Cauchy sequence because all terms of {pk } for k ≥ n are contained in B (pn , rn ), a set whose diameter is no larger than 2−n . Since X is complete, there exists p∞ such that lim pn = p∞ . n→∞
Since all but finitely many terms of {pn } are in B(pm , rm ), it follows that p∞ ∈ B(pm , rm ) for each m. Therefore, ∞ p∞ ∈ ∩∞ m=1 B(pm , rm ) ⊆ ∩i=1 Ui ∩ B(p, r0 ).
The following corollary is also called the Baire category theorem.
Corollary 13.1.7 Let X be a Banach space and suppose X = ∪∞ i=1 Fi where each Fi is a closed set. Then for some i, interior Fi 6= ∅. Proof: If all Fi has empty interior, then FiC would be a dense open set. Therefore, from Theorem 13.1.6, it would follow that C
∞ C ∅ = (∪∞ i=1 Fi ) = ∩i=1 Fi 6= ∅.
The set D of Theorem 13.1.6 is called a Gδ set because it is the countable intersection of open sets. Thus D is a dense Gδ set. Recall that a norm satisfies: a.) kxk ≥ 0, kxk = 0 if and only if x = 0. b.) kx + yk ≤ kxk + kyk. c.) kcxk = |c| kxk if c is a scalar and x ∈ X. From the definition of continuity, a function is continuous if lim xn = x
n→∞
implies lim f (xn ) = f (x).
n→∞
Theorem 13.1.8 Let X and Y be two normed linear spaces and let L : X → Y be linear (L(ax + by) = aL(x) + bL(y) for a, b scalars and x, y ∈ X). The following are equivalent a.) L is continuous at 0 b.) L is continuous c.) There exists K > 0 such that kLxkY ≤ K kxkX for all x ∈ X (L is bounded). Proof: a.)⇒b.) Let xn → x. It is necessary to show that Lxn → Lx. But (xn − x) → 0 and so from continuity at 0, it follows L (xn − x) = Lxn − Lx → 0 so Lxn → Lx. This shows a.) implies b.).
338
CHAPTER 13. BANACH SPACES
b.)⇒c.) Since L is continuous, L is continuous at 0. Hence kLxkY < 1 whenever kxkX ≤ δ for some δ. Therefore, suppressing the subscript on the k·k ,
L δx ≤ 1.
kxk Hence kLxk ≤
1 kxk. δ
c.)⇒a.) follows from the inequality given in c.).
Definition 13.1.9 Let L : X → Y be linear and continuous where X and Y are normed linear spaces. Denote the set of all such continuous linear maps by L(X, Y ) and define kLk = sup{kLxk : kxk ≤ 1}. (13.4) This is called the operator norm. Note that from Theorem 13.1.8 kLk is well defined because of part c.) of that Theorem. The next lemma follows immediately from the definition of the norm and the assumption that L is linear.
Lemma 13.1.10 With kLk defined in 13.4, L(X, Y ) is a normed linear space. Also kLxk ≤ kLk kxk. Proof: Let x 6= 0 then x/ kxk has norm equal to 1 and so
x
L
≤ kLk .
kxk Therefore, multiplying both sides by kxk, kLxk ≤ kLk kxk. This is obviously a linear space. It remains to verify the operator norm really is a norm. of all, if kLk = 0, then Lx = 0 First
for all kxk ≤ 1. It follows that for any x 6= 0, 0 = L Also, if c is a scalar,
x kxk
and so Lx = 0. Therefore, L = 0.
kcLk = sup kcL (x)k = |c| sup kLxk = |c| kLk . kxk≤1
kxk≤1
It remains to verify the triangle inequality. Let L, M ∈ L (X, Y ) . kL + M k
≡
sup k(L + M ) (x)k ≤ sup (kLxk + kM xk) kxk≤1
≤
kxk≤1
sup kLxk + sup kM xk = kLk + kM k . kxk≤1
kxk≤1
This shows the operator norm is really a norm as hoped. For example, consider the space of linear transformations defined on Rn having values in Rm . The fact the transformation is linear automatically imparts continuity to it. You should give a proof of this fact. To show this, you might recall that every such linear transformation can be realized in terms of matrix multiplication. Thus, in finite dimensions the algebraic condition that an operator is linear is sufficient to imply the topological condition that the operator is continuous. The situation is not so simple in infinite dimensional spaces such as C (X; Rn ). This explains the imposition of the topological condition of continuity as a criterion for membership in L (X, Y ) in addition to the algebraic condition of linearity. Here is a simple example.
13.1. THEOREMS BASED ON BAIRE CATEGORY
339 2
Let V denote all linear combinations of functions form e−αx for α > 0. Thus a Pn of the−α 2 kx typical element of V is an expression of the form β e . Let L : V → C be given k=1 k R by Lf ≡ R f (x) dx and for a norm on V, kf k ≡ max {|f (x)| : x ∈ R} Of course V is not complete, but it is a normed linear space. Recall that Z ∞ Z ∞ 2 1 −(x2 /n2 ) √ e−x dx = e = π −∞ n −∞ 2 2 where here n ∈ N. Consider the sequence of functions fn (x) ≡ n1 e−(x /n ) . Its maximum value is 1/n and so kfn k → 0 but Lfn fails to converge to the 0 function. Thus L is not continuous although it is linear.
Theorem 13.1.11
If Y is a Banach space, then L(X, Y ) is also a Banach space.
Proof: Let {Ln } be a Cauchy sequence in L(X, Y ) and let x ∈ X. kLn x − Lm xk ≤ kxk kLn − Lm k. Thus {Ln x} is a Cauchy sequence. Let Lx = lim Ln x. n→∞
Then, clearly, L is linear because if x1 , x2 are in X, and a, b are scalars, then L (ax1 + bx2 )
= =
lim Ln (ax1 + bx2 )
n→∞
lim (aLn x1 + bLn x2 )
n→∞
= aLx1 + bLx2 . Also L is continuous. To see this, note that {kLn k} is a Cauchy sequence of real numbers because |kLn k − kLm k| ≤ kLn − Lm k. Hence there exists K > sup{kLn k : n ∈ N}. Thus, if x ∈ X, kLxk = lim kLn xk ≤ Kkxk. n→∞
13.1.2
Uniform Boundedness Theorem
The next big result is sometimes called the Uniform Boundedness theorem, or the BanachSteinhaus theorem. This is a very surprising theorem which implies that for a collection of bounded linear operators, if they are bounded pointwise, then they are also bounded uniformly. As an example of a situation in which pointwise bounded does not imply uniformly bounded, consider the functions fα (x) ≡ X(α,1) (x) x−1 for α ∈ (0, 1). Clearly each function is bounded and the collection of functions is bounded at each point of (0, 1), but there is no bound for all these functions taken together. One problem is that (0, 1) is not a Banach space. Therefore, the functions cannot be linear.
Theorem 13.1.12
Let X be a Banach space and let Y be a normed linear space. Let {Lα }α∈Λ be a collection of elements of L(X, Y ). Then one of the following happens. a.) sup{kLα k : α ∈ Λ} < ∞ b.) There exists a dense Gδ set, D, such that for all x ∈ D, sup{kLα xk α ∈ Λ} = ∞.
340
CHAPTER 13. BANACH SPACES Proof: For each n ∈ N, define Un = {x ∈ X : sup{kLα xk : α ∈ Λ} > n}.
Then Un is an open set because if x ∈ Un , then there exists α ∈ Λ such that kLα xk > n But then, since Lα is continuous, this situation persists for all y sufficiently close to x, say for all y ∈ B (x, δ). Then B (x, δ) ⊆ Un which shows Un is open. Case b.) is obtained from Theorem 13.1.6 if each Un is dense. The other case is that for some n, Un is not dense. If this occurs, there exists x0 and r > 0 such that for all x ∈ B(x0 , r), kLα xk ≤ n for all α. Now if y ∈ B(0, r), x0 + y ∈ B(x0 , r). Consequently, for all such y, kLα (x0 + y)k ≤ n. This implies that for all α ∈ Λ and kyk < r, kLα yk ≤ n + kLα (x0 )k ≤ 2n.
r Therefore, if kyk ≤ 1, 2 y < r and so for all α, r y k ≤ 2n. kLα 2 Now multiplying by r/2 it follows that whenever kyk ≤ 1, kLα (y)k ≤ 4n/r. Hence case a.) holds.
13.1.3
Open Mapping Theorem
Another remarkable theorem which depends on the Baire category theorem is the open mapping theorem. Unlike Theorem 13.1.12 it requires both X and Y to be Banach spaces.
Theorem 13.1.13 Let X and Y be Banach spaces, let L ∈ L(X, Y ), and suppose L is onto. Then L maps open sets onto open sets. To aid in the proof, here is a lemma.
Lemma 13.1.14 Let a and b be positive constants and suppose B(0, a) ⊆ L(B(0, b)). Then L(B(0, b)) ⊆ L(B(0, 2b)). Proof of Lemma 13.1.14: Let y ∈ L(B(0, b)). There exists x1 ∈ B(0, b) such that ky − Lx1 k < a2 . Now this implies 2y − 2Lx1 ∈ B(0, a) ⊆ L(B(0, b)). Thus 2y − 2Lx1 ∈ L(B(0, b)) just like y was. Therefore, there exists x2 ∈ B(0, b) such that k2y−2Lx1 −Lx2 k < a/2. Hence k4y − 4Lx1 − 2Lx2 k < a, and there exists x3 ∈ B (0, b) such that k4y − 4Lx1 − 2Lx2 − Lx3 k < a/2. Continuing in this way, there exist x1 , x2 , x3 , x4 , ... in B(0, b) such that n X k2n y − 2n−(i−1) L(xi )k < a i=1
which implies ky −
n X i=1
−(i−1)
2
L(xi )k = ky − L
n X i=1
! −(i−1)
2
(xi ) k < 2−n a
(13.5)
13.1. THEOREMS BASED ON BAIRE CATEGORY Now consider the partial sums of the series, k
n X
2−(i−1) xi k ≤ b
i=m
P∞
i=1
∞ X
341
2−(i−1) xi .
2−(i−1) = b 2−m+2 .
i=m
Therefore, P these partial sums form a Cauchy sequence and so since X is complete, there ∞ exists x = i=1 2−(i−1) xi . Letting n → ∞ in 13.5 yields ky − Lxk = 0. Now kxk = lim k n→∞
≤ lim
n→∞
n X
n X
2−(i−1) xi k
i=1
2−(i−1) kxi k < lim
n→∞
i=1
n X
2−(i−1) b = 2b.
i=1
This proves the lemma. By Corollary 13.1.7, the set, Proof of Theorem 13.1.13: Y = ∪∞ n=1 L(B(0, n)). L(B(0, n0 )) has nonempty interior for some n0 . Thus B(y, r) ⊆ L(B(0, n0 )) for some y and some r > 0. Since L is linear B(−y, r) ⊆ L(B(0, n0 )) also. Here is why. If z ∈ B(−y, r), then −z ∈ B(y, r) and so there exists xn ∈ B (0, n0 ) such that Lxn → −z. Therefore, L (−xn ) → z and −xn ∈ B (0, n0 ) also. Therefore z ∈ L(B(0, n0 )). Then it follows that B(0, r) ⊆ B(y, r) + B(−y, r) ≡
{y1 + y2 : y1 ∈ B (y, r) and y2 ∈ B (−y, r)}
⊆ L(B(0, 2n0 )) The reason for the last inclusion is that from the above, if y1 ∈ B (y, r) and y2 ∈ B (−y, r), there exists xn , zn ∈ B (0, n0 ) such that Lxn → y1 , Lzn → y2 . Therefore, kxn + zn k ≤ 2n0 and so (y1 + y2 ) ∈ L(B(0, 2n0 )). By Lemma 13.1.14, L(B(0, 2n0 )) ⊆ L(B(0, 4n0 )) which shows B(0, r) ⊆ L(B(0, 4n0 )). Letting a = r(4n0 )−1 , it follows, since L is linear, that B(0, a) ⊆ L(B(0, 1)). It follows since L is linear, L(B(0, r)) ⊇ B(0, ar). (13.6) Now let U be open in X and let x + B(0, r) = B(x, r) ⊆ U . Using 13.6, L(U ) ⊇ L(x + B(0, r)) = Lx + L(B(0, r)) ⊇ Lx + B(0, ar) = B(Lx, ar). Hence Lx ∈ B(Lx, ar) ⊆ L(U ). which shows that every point, Lx ∈ LU , is an interior point of LU and so LU is open. This theorem is surprising because it implies that if |·| and k·k are two norms with respect to which a vector space X is a Banach space such that |·| ≤ K k·k, then there exists a constant k, such that k·k ≤ k |·| . This can be useful because sometimes it is not clear how to compute k when all that is needed is its existence. To see the open mapping theorem implies this, consider the identity map id x = x. Then id : (X, k·k) → (X, |·|) is continuous and onto. Hence id is an open map which implies id−1 is continuous. Theorem 13.1.8 gives the existence of the constant k.
342
13.1.4
CHAPTER 13. BANACH SPACES
Closed Graph Theorem
Definition 13.1.15
Let f : D → E. The set of all ordered pairs of the form {(x, f (x)) : x ∈ D} is called the graph of f .
Definition 13.1.16
If X and Y are normed linear spaces, make X × Y into a normed linear space by using the norm k(x, y)k = max (kxk, kyk) along with componentwise addition and scalar multiplication. Thus a(x, y) + b(z, w) ≡ (ax + bz, ay + bw). There are other ways to give a norm for X × Y . For example, you could define k(x, y)k = kxk + kyk
Lemma 13.1.17 The norm defined in Definition 13.1.16 on X × Y along with the definition of addition and scalar multiplication given there make X × Y into a normed linear space. Proof: The only axiom for a norm which is not obvious is the triangle inequality. Therefore, consider k(x1 , y1 ) + (x2 , y2 )k
= k(x1 + x2 , y1 + y2 )k =
max (kx1 + x2 k , ky1 + y2 k)
≤ max (kx1 k + kx2 k , ky1 k + ky2 k) ≤
max (kx1 k , ky1 k) + max (kx2 k , ky2 k)
=
k(x1 , y1 )k + k(x2 , y2 )k .
It is obvious X × Y is a vector space from the above definition.
Lemma 13.1.18 If X and Y are Banach spaces, then X × Y with the norm and vector space operations defined in Definition 13.1.16 is also a Banach space. Proof: The only thing left to check is that the space is complete. But this follows from the simple observation that {(xn , yn )} is a Cauchy sequence in X × Y if and only if {xn } and {yn } are Cauchy sequences in X and Y respectively. Thus if {(xn , yn )} is a Cauchy sequence in X × Y , it follows there exist x and y such that xn → x and yn → y. But then from the definition of the norm, (xn , yn ) → (x, y).
Lemma 13.1.19 Every closed subspace of a Banach space is a Banach space. Proof: If F ⊆ X where X is a Banach space and {xn } is a Cauchy sequence in F , then since X is complete, there exists a unique x ∈ X such that xn → x. However this means x ∈ F = F since F is closed.
Definition 13.1.20 Let X and Y be Banach spaces and let D ⊆ X be a subspace. A linear map L : D → Y is said to be closed if its graph is a closed subspace of X × Y . Equivalently, L is closed if xn → x and Lxn → y implies x ∈ D and y = Lx. Note the distinction between closed and continuous. If the operator is closed the assertion that y = Lx only follows if it is known that the sequence {Lxn } converges. In the case of a continuous operator, the convergence of {Lxn } follows from the assumption that xn → x. It is not always the case that a mapping which is closed is necessarily continuous. Consider the function f (x) = tan (x) if x is not an odd multiple of π2 and f (x) ≡ 0 at every odd multiple of π2 . Then the graph is closed and the function is defined on R but it clearly fails to be continuous. Of course this function is not linear. You could also consider the map, d : y ∈ C 1 ([0, 1]) : y (0) = 0 ≡ D → C ([0, 1]) . dx
13.2. BASIC THEORY OF HILBERT SPACES
343
where the norm is the uniform norm on C ([0, 1]) , kyk∞ . If y ∈ D, then Z x y (x) = y 0 (t) dt. 0
Therefore, if
dyn dx
→ f ∈ C ([0, 1]) and if yn → y in C ([0, 1]) it follows that yn (x) = ↓ y (x) =
Rx 0
dyn (t) dx dt
Rx ↓ f (t) dt 0
and so by the fundamental theorem of calculus f (x) = y 0 (x) and so the mapping is closed. It is obviously not continuous because it takes y (x) and y (x) + n1 sin (nx) to two functions which are far from each other even though these two functions are very close in C ([0, 1]). Furthermore, it is not defined on the whole space, C ([0, 1]). The next theorem, the closed graph theorem, gives conditions under which closed implies continuous.
Theorem 13.1.21
Let X and Y be Banach spaces and suppose L : X → Y is closed and linear. Then L is continuous. Proof: Let G be the graph of L. G = {(x, Lx) : x ∈ X}. By Lemma 13.1.19 it follows that G is a Banach space. Define P : G → X by P (x, Lx) = x. P maps the Banach space G onto the Banach space X and is continuous and linear. By the open mapping theorem, P −1 maps open sets is continuous.
onto open sets. Since P is also one to one, this says that P −1
Thus P x ≤ K kxk. Hence kLxk ≤ max (kxk, kLxk) ≤ K kxk By Theorem 13.1.8 on Page 337, this shows L is continuous. The following corollary is quite useful. It shows how to obtain a new norm on the domain of a closed operator such that the domain with this new norm becomes a Banach space.
Corollary 13.1.22 Let L : D ⊆ X → Y where X, Y are a Banach spaces, and L is a closed operator. Then define a new norm on D by kxkD ≡ kxkX + kLxkY . Then D with this new norm is a Banach space. Proof: If {xn } is a Cauchy sequence in D with this new norm, it follows both {xn } and {Lxn } are Cauchy sequences and therefore, they converge. Since L is closed, xn → x and Lxn → Lx for some x ∈ D. Thus kxn − xkD → 0.
13.2
Basic Theory of Hilbert Spaces
A Hilbert space is a special case of a Banach space. In a Hilbert space, the norm comes from an inner product. This is described next. It is really a generalization of the familiar dot product from calculus.
Definition 13.2.1
Let X be a vector space. An inner product is a mapping from X × X to C if X is complex and from X × X to R if X is real, denoted by (x, y) which satisfies the following. (x, x) ≥ 0, (x, x) = 0 if and only if x = 0,
(13.7)
(x, y) = (y, x).
(13.8)
344
CHAPTER 13. BANACH SPACES
For a, b ∈ C and x, y, z ∈ X, (ax + by, z) = a(x, z) + b(y, z).
(13.9)
Note that 13.8 and 13.9 imply (x, ay + bz) = a(x, y) + b(x, z). Such a vector space is called an inner product space. The Cauchy Schwarz inequality is fundamental for the study of inner product spaces.
Theorem 13.2.2
(Cauchy Schwarz) In any inner product space |(x, y)| ≤ kxk kyk.
Proof: Let ω ∈ C, |ω| = 1, and ω(x, y) = |(x, y)| = Re(x, yω). Let F (t) = (x + tyω, x + tωy). If y = 0 there is nothing to prove because (x, 0) = (x, 0 + 0) = (x, 0) + (x, 0) and so (x, 0) = 0. Thus, it can be assumed y 6= 0. Then from the axioms of the inner product, F (t) = kxk2 + 2t Re(x, ωy) + t2 kyk2 ≥ 0. This yields kxk2 + 2t|(x, y)| + t2 kyk2 ≥ 0. Since this inequality holds for all t ∈ R, it follows from the quadratic formula that 4|(x, y)|2 − 4kxk2 kyk2 ≤ 0. This yields the conclusion and proves the theorem.
Proposition 13.2.3 For an inner product space, kxk ≡ (x, x)1/2 does specify a norm. Proof: All the axioms are obvious except the triangle inequality. To verify this, 2
kx + yk
2
2
≡
(x + y, x + y) ≡ kxk + kyk + 2 Re (x, y)
≤
kxk + kyk + 2 |(x, y)|
≤
kxk + kyk + 2 kxk kyk = (kxk + kyk) .
2
2
2
2
2
The following lemma is called the parallelogram identity.
Lemma 13.2.4 In an inner product space, kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2. The proof, a straightforward application of the inner product axioms, is left to the reader.
Lemma 13.2.5 For x ∈ H, an inner product space, kxk = sup |(x, y)| kyk≤1
Proof: By the Cauchy Schwarz inequality, if x 6= 0, x = kxk . kxk ≥ sup |(x, y)| ≥ x, kxk kyk≤1 It is obvious that 13.10 holds in the case that x = 0.
(13.10)
13.2. BASIC THEORY OF HILBERT SPACES
345
Definition 13.2.6
A Hilbert space is an inner product space which is complete. Thus a Hilbert space is a Banach space in which the norm comes from an inner product as described above. In Hilbert space, one can define a projection map onto closed convex nonempty sets.
Definition 13.2.7
A set, K, is convex if whenever λ ∈ [0, 1] and x, y ∈ K, λx +
(1 − λ)y ∈ K.
Theorem 13.2.8
Let K be a closed convex nonempty subset of a Hilbert space, H, and let x ∈ H. Then there exists a unique point P x ∈ K such that kP x − xk ≤ ky − xk for all y ∈ K. Proof: Consider uniqueness. Suppose that z1 and z2 are two elements of K such that for i = 1, 2, kzi − xk ≤ ky − xk (13.11) for all y ∈ K. Also, note that since K is convex, z1 + z2 ∈ K. 2 Therefore, by the parallelogram identity, z1 + z2 z1 − x z2 − x 2 − xk2 = k + k 2 2 2 z1 − x 2 z2 − x 2 z1 − z2 2 = 2(k k +k k )−k k 2 2 2 1 1 z1 − z2 2 2 2 = kz1 − xk + kz2 − xk − k k 2 2 2 z1 − z2 2 k, ≤ kz1 − xk2 − k 2 where the last inequality holds because of 13.11 letting zi = z2 and y = z1 . Hence z1 = z2 and this shows uniqueness. Now let λ = inf{kx − yk : y ∈ K} and let yn be a minimizing sequence. This means {yn } ⊆ K satisfies limn→∞ kx − yn k = λ. Now the following follows from properties of the norm. yn + ym 2 − xk2 ) kyn − x + ym − xk = 4(k 2 m Then by the parallelogram identity, and convexity of K, yn +y ∈ K, and so 2 kz1 − xk2
≤
k
=kyn −x+ym −xk2
z
2
k (yn − x) − (ym − x) k
}| { yn + ym 2 − xk ) = 2(kyn − xk + kym − xk ) − 4(k 2 2 ≤ 2(kyn − xk2 + kym − xk2 ) − 4λ . 2
2
Since kx − yn k → λ, this shows {yn − x} is a Cauchy sequence. Thus also {yn } is a Cauchy sequence. Since H is complete, yn → y for some y ∈ H which must be in K because K is closed. Therefore kx − yk = lim kx − yn k = λ. n→∞
Let P x = y.
Corollary 13.2.9 Let K be a closed, convex, nonempty subset of a Hilbert space, H, and let x ∈ H. Then for z ∈ K, z = P x if and only if Re(x − z, y − z) ≤ 0
(13.12)
for all y ∈ K. Before proving this, consider what it says in the case where the Hilbert space is Rn.
346
CHAPTER 13. BANACH SPACES
y
θ
K
Condition 13.12 says the angle, θ, shown in the diagram is always obtuse. Remember from calculus, the sign of x · y is the same as the sign of the cosine of the included angle between x and y. Thus, in finite dimensions, the conclusion of this corollary says that z = P x exactly when the angle of the indicated angle is obtuse. Surely the picture suggests this is
x
z
reasonable. The inequality 13.12 is an example of a variational inequality and this corollary characterizes the projection of x onto K as the solution of this variational inequality. Proof of Corollary: Let z ∈ K and let y ∈ K also. Since K is convex, it follows that if t ∈ [0, 1], z + t(y − z) = (1 − t) z + ty ∈ K. Furthermore, every point of K can be written in this way. (Let t = 1 and y ∈ K.) Therefore, z = P x if and only if for all y ∈ K and t ∈ [0, 1], kx − (z + t(y − z))k2 = k(x − z) − t(y − z)k2 ≥ kx − zk2 for all t ∈ [0, 1] and y ∈ K if and only if for all t ∈ [0, 1] and y ∈ K 2
2
2
kx − zk + t2 ky − zk − 2t Re (x − z, y − z) ≥ kx − zk If and only if for all t ∈ [0, 1], 2
t2 ky − zk − 2t Re (x − z, y − z) ≥ 0.
(13.13)
Now this is equivalent to 13.13 holding for all t ∈ (0, 1). Therefore, dividing by t ∈ (0, 1) , 13.13 is equivalent to 2 t ky − zk − 2 Re (x − z, y − z) ≥ 0 for all t ∈ (0, 1) which is equivalent to 13.12. This proves the corollary.
Corollary 13.2.10 Let K be a nonempty convex closed subset of a Hilbert space, H. Then the projection map, P is continuous. In fact, |P x − P y| ≤ |x − y| . Proof: Let x, x0 ∈ H. Then by Corollary 13.2.9, Re (x0 − P x0 , P x − P x0 ) ≤ 0, Re (x − P x, P x0 − P x) ≤ 0 Hence 0
≤ Re (x − P x, P x − P x0 ) − Re (x0 − P x0 , P x − P x0 ) =
Re (x − x0 , P x − P x0 ) − |P x − P x0 |
2
and so 2
|P x − P x0 | ≤ |x − x0 | |P x − P x0 | . This proves the corollary. The next corollary is a more general form for the Brouwer fixed point theorem. This was discussed in exercises and elsewhere earlier. However, here is a complete proof.
Corollary 13.2.11 Let f : K → K where K is a convex compact subset of Rn . Then f has a fixed point.
13.2. BASIC THEORY OF HILBERT SPACES
347
Proof: Let K ⊆ B (0, R) and let P be the projection map onto K. Then consider the map f ◦ P which maps B (0, R) to B (0, R) and is continuous. By the Brouwer fixed point theorem for balls, this map has a fixed point. Thus there exists x such that f ◦ P (x) = x Now the equation also requires x ∈ K and so P (x) = x. Hence f (x) = x.
Definition 13.2.12
Let H be a vector space and let U and V be subspaces. U ⊕V = H if every element of H can be written as a sum of an element of U and an element of V in a unique way. The case where the closed convex set is a closed subspace is of special importance and in this case the above corollary implies the following.
Corollary 13.2.13 Let K be a closed subspace of a Hilbert space, H, and let x ∈ H. Then for z ∈ K, z = P x if and only if (x − z, y) = 0
(13.14)
for all y ∈ K. Furthermore, H = K ⊕ K ⊥ where K ⊥ ≡ {x ∈ H : (x, k) = 0 for all k ∈ K} and 2
2
2
kxk = kx − P xk + kP xk .
(13.15)
Proof: Since K is a subspace, the condition 13.12 implies Re(x − z, y) ≤ 0 for all y ∈ K. Replacing y with −y, it follows Re(x − z, −y) ≤ 0 which implies Re(x − z, y) ≥ 0 for all y. Therefore, Re(x − z, y) = 0 for all y ∈ K. Now let |α| = 1 and α (x − z, y) = |(x − z, y)|. Since K is a subspace, it follows αy ∈ K for all y ∈ K. Therefore, 0 = Re(x − z, αy) = (x − z, αy) = α (x − z, y) = |(x − z, y)|. This shows that z = P x, if and only if 13.14. For x ∈ H, x = x − P x + P x and from what was just shown, x − P x ∈ K ⊥ and P x ∈ K. This shows that K ⊥ + K = H. Is there only one way to write a given element of H as a sum of a vector in K with a vector in K ⊥ ? Suppose y + z = y1 + z1 where z, z1 ∈ K ⊥ and y, y1 ∈ K. Then (y − y1 ) = (z1 − z) and so from what was just shown, (y − y1 , y − y1 ) = (y − y1 , z1 − z) = 0 which shows y1 = y and consequently z1 = z. Finally, letting z = P x, 2
kxk
2
=
(x − z + z, x − z + z) = kx − zk + (x − z, z) + (z, x − z) + kzk
=
kx − zk + kzk
2
2
2
This proves the corollary. The following theorem is called the Riesz representation theorem for the dual of a Hilbert space. If z ∈ H then define an element f ∈ H 0 by the rule (x, z) ≡ f (x). It follows from the Cauchy Schwarz inequality and the properties of the inner product that f ∈ H 0 . The Riesz representation theorem says that all elements of H 0 are of this form.
Theorem 13.2.14
Let H be a Hilbert space and let f ∈ H 0 . Then there exists a
unique z ∈ H such that f (x) = (x, z) for all x ∈ H.
(13.16)
348
CHAPTER 13. BANACH SPACES Proof: Letting y, w ∈ H the assumption that f is linear implies f (yf (w) − f (y)w) = f (w) f (y) − f (y) f (w) = 0
which shows that yf (w) − f (y)w ∈ f −1 (0), which is a closed subspace of H since f is continuous. If f −1 (0) = H, then f is the zero map and z = 0 is the unique element of H which satisfies 13.16. If f −1 (0) 6= H, pick u ∈ / f −1 (0) and let w ≡ u − P u 6= 0. Thus −1 Corollary 13.2.13 implies (y, w) = 0 for all y ∈ f (0). In particular, let y = xf (w) − f (x)w where x ∈ H is arbitrary. Therefore, 0 = (f (w)x − f (x)w, w) = f (w)(x, w) − f (x)kwk2. Thus, solving for f (x) and using the properties of the inner product, f (x) = (x,
f (w)w ) kwk2
Let z = f (w)w/kwk2 . This proves the existence of z. If f (x) = (x, zi ) i = 1, 2, for all x ∈ H, then for all x ∈ H, then (x, z1 − z2 ) = 0 which implies, upon taking x = z1 − z2 that z1 = z2 . This proves the theorem. If R : H → H 0 is defined by Rx (y) ≡ (y, x) , the Riesz representation theorem above states this map is onto. This map is called the Riesz map. It is routine to show R is conjugate linear and kRxk = kxk. In fact, ¯ (u, y) R (αx + βy) (u) ≡ (u, αx + βy) = α ¯ (u, x) + β ¯ ¯ ≡ α ¯ Rx (u) + βRy (u) = α ¯ Rx + βRy (u) so it is conjugate linear meaning it goes across plus signs and you factor out conjugates. kRxk ≡ sup |Rx (y)| ≡ sup |(y, x)| = kxk kyk≤1
13.3
kyk≤1
Hahn Banach Theorem
The closed graph, open mapping, and uniform boundedness theorems are the three major topological theorems in functional analysis. The other major theorem is the Hahn-Banach theorem which has nothing to do with topology. Before presenting this theorem, here are some preliminaries about partially ordered sets.
13.3.1
Partially Ordered Sets
Definition 13.3.1 Let F be a nonempty set. F is called a partially ordered set if there is a relation, denoted here by ≤, such that x ≤ x for all x ∈ F. If x ≤ y and y ≤ z then x ≤ z. C ⊆ F is said to be a chain if every two elements of C are related. This means that if x, y ∈ C, then either x ≤ y or y ≤ x. Sometimes a chain is called a totally ordered set. C is said to be a maximal chain if whenever D is a chain containing C, D = C. The most common example of a partially ordered set is the power set of a given set with ⊆ being the relation. It is also helpful to visualize partially ordered sets as trees. Two points on the tree are related if they are on the same branch of the tree and one is higher than the other. Thus two points on different branches would not be related although they might both be larger than some point on the trunk. You might think of many other things
13.3. HAHN BANACH THEOREM
349
which are best considered as partially ordered sets. Think of food for example. You might find it difficult to determine which of two favorite pies you like better although you may be able to say very easily that you would prefer either pie to a dish of lard topped with whipped cream and mustard. The following theorem is equivalent to the axiom of choice. For a discussion of this, see the appendix on the subject.
Theorem 13.3.2
(Hausdorff Maximal Principle) Let F be a nonempty partially ordered set. Then there exists a maximal chain.
13.3.2
Gauge Functions And Hahn Banach Theorem
Definition 13.3.3
Let X be a real vector space ρ : X → R is called a gauge function
if ρ(x + y) ≤ ρ(x) + ρ(y), ρ(ax) = aρ(x) if a ≥ 0.
(13.17)
Suppose M is a subspace of X and z ∈ / M . Suppose also that f is a linear real-valued function having the property that f (x) ≤ ρ(x) for all x ∈ M . Consider the problem of extending f to M ⊕ Rz such that if F is the extended function, F (y) ≤ ρ(y) for all y ∈ M ⊕ Rz and F is linear. Since F is to be linear, it suffices to determine how to define F (z). Letting a > 0, it is required to define F (z) such that the following hold for all x, y ∈ M . f (x)
z }| { F (x) + aF (z) = F (x + az) ≤ ρ(x + az), f (y)
z }| { F (y) − aF (z) = F (y − az) ≤ ρ(y − az).
(13.18)
Now if these inequalities hold for all y/a, they hold for all y because M is given to be a subspace. Therefore, multiplying by a−1 13.17 implies that what is needed is to choose F (z) such that for all x, y ∈ M , f (x) + F (z) ≤ ρ(x + z), f (y) − ρ(y − z) ≤ F (z) and that if F (z) can be chosen in this way, this will satisfy 13.18 for all x, y and the problem of extending f will be solved. Hence it is necessary to choose F (z) such that for all x, y ∈ M f (y) − ρ(y − z) ≤ F (z) ≤ ρ(x + z) − f (x).
(13.19)
Is there any such number between f (y)−ρ(y −z) and ρ(x+z)−f (x) for every pair x, y ∈ M ? This is where f (x) ≤ ρ(x) on M and that f is linear is used. For x, y ∈ M , ρ(x + z) − f (x) − [f (y) − ρ(y − z)] = ρ(x + z) + ρ(y − z) − (f (x) + f (y)) ≥ ρ(x + y) − f (x + y) ≥ 0. Therefore there exists a number between sup {f (y) − ρ(y − z) : y ∈ M } and inf {ρ(x + z) − f (x) : x ∈ M } Choose F (z) to satisfy 13.19. This has proved the following lemma.
350
CHAPTER 13. BANACH SPACES
Lemma 13.3.4 Let M be a subspace of X, a real linear space, and let ρ be a gauge function on X. Suppose f : M → R is linear, z ∈ / M , and f (x) ≤ ρ (x) for all x ∈ M . Then f can be extended to M ⊕ Rz such that, if F is the extended function, F is linear and F (x) ≤ ρ(x) for all x ∈ M ⊕ Rz. With this lemma, the Hahn Banach theorem can be proved.
Theorem 13.3.5 (Hahn Banach theorem) Let X be a real vector space, let M be a subspace of X, let f : M → R be linear, let ρ be a gauge function on X, and suppose f (x) ≤ ρ(x) for all x ∈ M . Then there exists a linear function, F : X → R, such that a.) F (x) = f (x) for all x ∈ M b.) F (x) ≤ ρ(x) for all x ∈ X. Proof: Let F = {(V, g) : V ⊇ M, V is a subspace of X, g : V → R is linear, g(x) = f (x) for all x ∈ M , and g(x) ≤ ρ(x) for x ∈ V }. Then (M, f ) ∈ F so F 6= ∅. Define a partial order by the following rule. (V, g) ≤ (W, h) means V ⊆ W and h(x) = g(x) if x ∈ V. By Theorem 13.3.2, there exists a maximal chain, C ⊆ F. Let Y = ∪{V : (V, g) ∈ C} and let h : Y → R be defined by h(x) = g(x) where x ∈ V and (V, g) ∈ C. This is well defined because if x ∈ V1 and V2 where (V1 , g1 ) and (V2 , g2 ) are both in the chain, then since C is a chain, the two element related. Therefore, g1 (x) = g2 (x). Also h is linear because if ax+by ∈ Y , then x ∈ V1 and y ∈ V2 where (V1 , g1 ) and (V2 , g2 ) are elements of C. Therefore, letting V denote the larger of the two Vi , and g be the function that goes with V , it follows ax + by ∈ V where (V, g) ∈ C. Therefore, h (ax + by)
= g (ax + by) = ag (x) + bg (y) = ah (x) + bh (y) .
Also, h(x) = g (x) ≤ ρ(x) for any x ∈ Y because for such x, x ∈ V where (V, g) ∈ C. Is Y = X? If not, there exists z ∈ X \ Y and there exists an extension of h to Y ⊕ Rz using Lemma 13.3.4. Letting h denote this extended function, contradicts the maximality of C. Indeed, C ∪ { Y ⊕ Rz, h } would be a longer chain. This is the original version of the theorem. There is also a version of this theorem for complex vector spaces which is based on a trick.
13.3.3
The Complex Version Of The Hahn Banach Theorem
Corollary 13.3.6 (Hahn Banach) Let M be a subspace of a complex normed linear space, X, and suppose f : M → C is linear and satisfies |f (x)| ≤ Kkxk for all x ∈ M . Then there exists a linear function, F , defined on all of X such that F (x) = f (x) for all x ∈ M and |F (x)| ≤ Kkxk for all x. Proof: First note f (x) = Re f (x) + i Im f (x) and so Re f (ix) + i Im f (ix) = f (ix) = if (x) = i Re f (x) − Im f (x). Therefore, Im f (x) = − Re f (ix), and f (x) = Re f (x) − i Re f (ix). This is important because it shows it is only necessary to consider Re f in understanding f . Now it happens that Re f is linear with respect to real scalars so the above version of the Hahn Banach theorem applies. This is shown next.
13.3. HAHN BANACH THEOREM
351
If c is a real scalar Re f (cx) − i Re f (icx) = cf (x) = c Re f (x) − ic Re f (ix). Thus Re f (cx) = c Re f (x). Also, Re f (x + y) − i Re f (i (x + y))
= f (x + y) = f (x) + f (y)
= Re f (x) − i Re f (ix) + Re f (y) − i Re f (iy). Equating real parts, Re f (x + y) = Re f (x) + Re f (y). Thus Re f is linear with respect to real scalars as hoped. Consider X as a real vector space and let ρ(x) ≡ Kkxk. Then for all x ∈ M , | Re f (x)| ≤ |f (x)| ≤ Kkxk = ρ(x). From Theorem 13.3.5, Re f may be extended to a function, h which satisfies h(ax + by)
=
h(x) ≤
ah(x) + bh(y) if a, b ∈ R Kkxk for all x ∈ X.
Actually, |h (x)| ≤ K kxk . The reason for this is that h (−x) = −h (x) ≤ K k−xk = K kxk and therefore, h (x) ≥ −K kxk. Let F (x) ≡ h(x) − ih(ix). By arguments similar to the above, F is linear. F (ix)
= h (ix) − ih (−x) = ih (x) + h (ix) = i (h (x) − ih (ix)) = iF (x) .
If c is a real scalar, F (cx)
=
h(cx) − ih(icx)
=
ch (x) − cih (ix) = cF (x)
Now F (x + y)
= h(x + y) − ih(i (x + y)) = h (x) + h (y) − ih (ix) − ih (iy) = F (x) + F (y) .
Thus F ((a + ib) x)
=
F (ax) + F (ibx)
= aF (x) + ibF (x) =
(a + ib) F (x) .
This shows F is linear as claimed. Now wF (x) = |F (x)| for some |w| = 1. Therefore must equal zero
|F (x)| = wF (x) = h(wx) −
z }| { ih(iwx)
= |h(wx)| ≤ Kkwxk = K kxk .
= h(wx)
352
CHAPTER 13. BANACH SPACES
13.3.4
The Dual Space And Adjoint Operators
Definition 13.3.7
Let X be a Banach space. Denote by X 0 the space of continuous linear functions which map X to the field of scalars. Thus X 0 = L(X, F). By Theorem 13.1.11 on Page 339, X 0 is a Banach space. Remember with the norm defined on L (X, F), kf k = sup{|f (x)| : kxk ≤ 1} X 0 is called the dual space.
Definition 13.3.8
Let X and Y be Banach spaces and suppose L ∈ L(X, Y ). Then define the adjoint map in L(Y 0 , X 0 ), denoted by L∗ , by L∗ y ∗ (x) ≡ y ∗ (Lx) for all y ∗ ∈ Y 0 . The following diagram is a good one to help remember this definition. X
L∗ ← → L
0
X
Y0 Y
This is a generalization of the adjoint of a linear transformation on an inner product space. Recall (Ax, y) = (x, A∗ y) What is being done here is to generalize this algebraic concept to arbitrary Banach spaces. There are some issues which need to be discussed relative to the above definition. First of all, it must be shown that L∗ y ∗ ∈ X 0 . Also, it will be useful to have the following lemma which is a useful application of the Hahn Banach theorem.
Lemma 13.3.9 Let X be a normed linear space and let x ∈ X \ V where V is a closed subspace of X. Then there exists x∗ ∈ X 0 such that x∗ (x) = kxk, x∗ (V ) = {0}, and kx∗ k ≤
1 dist (x, V )
In the case that V = {0} , kx∗ k = 1. Proof: Let f : Fx+V → F be defined by f (αx+v) = αkxk. First it is necessary to show f is well defined and continuous. If α1 x + v1 = α2 x + v2 then if α1 6= α2 , then x ∈ V which is assumed not to happen so f is well defined. It remains to show f is continuous. Suppose then that αn x + vn → 0. It is necessary to show αn → 0. If this does not happen, then there exists a subsequence, still denoted by αn such that |αn | ≥ δ > 0. Then x + (1/αn ) vn → 0 contradicting the assumption that x ∈ / V and V is a closed subspace. Hence f is continuous on Fx + V. Being a little more careful, kf k =
sup
|f (αx + v)| =
kαx+vk≤1
|α| kxk =
sup |α|kx+(v/α)k≤1
1 kxk dist (x, V )
By the Hahn Banach theorem, there exists x∗ ∈ X 0 such that x∗ = f on Fx + V. Thus x∗ (x) = kxk and also 1 kx∗ k ≤ kf k = dist (x, V ) In case V = {0} , the result follows from the above or alternatively, kf k ≡ sup |f (αx)| = kαxk≤1
sup |α|≤1/kxk
|α| kxk = 1
13.3. HAHN BANACH THEOREM
353
and so, in this case, kx∗ k ≤ kf k = 1. Since x∗ (x) = kxk it follows x kxk ∗ = kx k ≥ x∗ = 1. kxk kxk Thus kx∗ k = 1.
Theorem 13.3.10
Let L ∈ L(X, Y ) where X and Y are Banach spaces. Then a.) L∗ ∈ L(Y 0 , X 0 ) as claimed and kL∗ k = kLk. b.) If L maps one to one onto a closed subspace of Y , then L∗ is onto. c.) If L maps onto a dense subset of Y , then L∗ is one to one.
Proof: It is routine to verify L∗ y ∗ and L∗ are both linear. This follows immediately from the definition. As usual, the interesting thing concerns continuity. kL∗ y ∗ k = sup |L∗ y ∗ (x)| = sup |y ∗ (Lx)| ≤ ky ∗ k kLk . kxk≤1
kxk≤1
Thus L∗ is continuous as claimed and kL∗ k ≤ kLk . By Lemma 13.3.9, there exists yx∗ ∈ Y 0 such that kyx∗ k = 1 and yx∗ (Lx) = kLxk .Therefore, kL∗ k
=
sup kL∗ y ∗ k = sup ky ∗ k≤1
= ≥
sup
sup |L∗ y ∗ (x)|
ky ∗ k≤1 kxk≤1
sup |y ∗ (Lx)| = sup
ky ∗ k≤1 kxk≤1 sup |yx∗ (Lx)| kxk≤1
sup |y ∗ (Lx)|
kxk≤1 ky ∗ k≤1
= sup kLxk = kLk kxk≤1
showing that kL∗ k ≥ kLk and this shows part a.). If L is one to one and onto a closed subset of Y , then L (X) being a closed subspace of a Banach space, is itself a Banach space and so the open mapping theorem implies L−1 : L(X) → X is continuous. Hence
kxk = kL−1 Lxk ≤ L−1 kLxk Now let x∗ ∈ X 0 be given. Define f ∈ L(L(X), C) by f (Lx) = x∗ (x). The function, f is well defined because if Lx1 = Lx2 , then since L is one to one, it follows x1 = x2 and so f (L (x1 )) = x∗ (x1 ) = x∗ (x2 ) = f (L (x1 )). Also, f is linear because f (aL (x1 ) + bL (x2 ))
= f (L (ax1 + bx2 )) ≡ x∗ (ax1 + bx2 ) = ax∗ (x1 ) + bx∗ (x2 ) = af (L (x1 )) + bf (L (x2 )) .
In addition to this,
|f (Lx)| = |x∗ (x)| ≤ kx∗ k kxk ≤ kx∗ k L−1 kLxk
and so the norm of f on L (X) is no larger than kx∗ k L−1 . By the Hahn Banach
theorem,
there exists an extension of f to an element y ∗ ∈ Y 0 such that ky ∗ k ≤ kx∗ k L−1 . Then L∗ y ∗ (x) = y ∗ (Lx) = f (Lx) = x∗ (x) so L∗ y ∗ = x∗ because this holds for all x. Since x∗ was arbitrary, this shows L∗ is onto and proves b.). Consider the last assertion. Suppose L∗ y ∗ = 0. Is y ∗ = 0? In other words is y ∗ (y) = 0 for all y ∈ Y ? Pick y ∈ Y . Since L (X) is dense in Y, there exists a sequence, {Lxn } such that Lxn → y. But then by continuity of y ∗ , y ∗ (y) = limn→∞ y ∗ (Lxn ) = limn→∞ L∗ y ∗ (xn ) = 0. Since y ∗ (y) = 0 for all y, this implies y ∗ = 0 and so L∗ is one to one.
354
CHAPTER 13. BANACH SPACES
Corollary 13.3.11 Suppose X and Y are Banach spaces, L ∈ L(X, Y ), and L is one to one and onto. Then L∗ is also one to one and onto. There exists a natural mapping, called the James map from a normed linear space, X, to the dual of the dual space which is described in the following definition.
Definition 13.3.12
Define J : X → X 00 by J(x)(x∗ ) = x∗ (x).
Theorem 13.3.13
The map, J, has the following properties. a.) J is one to one and linear. b.) kJxk = kxk and kJk = 1. c.) J(X) is a closed subspace of X 00 if X is complete. Also if x∗ ∈ X 0 , kx∗ k = sup {|x∗∗ (x∗ )| : kx∗∗ k ≤ 1, x∗∗ ∈ X 00 } . Proof: J (ax + by) (x∗ ) ≡ x∗ (ax + by) = ax∗ (x) + bx∗ (y) =
(aJ (x) + bJ (y)) (x∗ ) .
Since this holds for all x∗ ∈ X 0 , it follows that J (ax + by) = aJ (x) + bJ (y) and so J is linear. If Jx = 0, then by Lemma 13.3.9 there exists x∗ such that x∗ (x) = kxk and kx∗ k = 1. Then 0 = J(x)(x∗ ) = x∗ (x) = kxk. This shows a.). To show b.), let x ∈ X and use Lemma 13.3.9 to obtain x∗ ∈ X 0 such that x∗ (x) = kxk with kx∗ k = 1. Then kxk
≥ sup{|y ∗ (x)| : ky ∗ k ≤ 1} =
sup{|J(x)(y ∗ )| : ky ∗ k ≤ 1} = kJxk
≥
|J(x)(x∗ )| = |x∗ (x)| = kxk
Therefore, kJxk = kxk as claimed. Therefore, kJk = sup{kJxk : kxk ≤ 1} = sup{kxk : kxk ≤ 1} = 1. This shows b.). To verify c.), use b.). If Jxn → y ∗∗ ∈ X 00 then by b.), xn is a Cauchy sequence converging to some x ∈ X because kxn − xm k = kJxn − Jxm k and {Jxn } is a Cauchy sequence. Then Jx = limn→∞ Jxn = y ∗∗ . Finally, to show the assertion about the norm of x∗ , use what was just shown applied to the James map from X 0 to X 000 still referred to as J. kx∗ k = sup {|x∗ (x)| : kxk ≤ 1} = sup {|J (x) (x∗ )| : kJxk ≤ 1} ≤ sup {|x∗∗ (x∗ )| : kx∗∗ k ≤ 1} = sup {|J (x∗ ) (x∗∗ )| : kx∗∗ k ≤ 1} ≡ kJx∗ k = kx∗ k.
Definition 13.3.14
When J maps X onto X 00 , X is called reflexive.
13.4. EXERCISES
13.4
355
Exercises
1. Is N a Gδ set? What about Q? 2. ↑ Let f : R → C be a function. Define the oscillation of a function in B (x, r) by ω r f (x) = sup{|f (z) − f (y)| : y, z ∈ B(x, r)}. Define the oscillation of the function at the point, x by ωf (x) = limr→0 ω r f (x). Show f is continuous at x if and only if ωf (x) = 0. Then show the set of points where f is continuous is a Gδ set (try Un = {x : ωf (x) < n1 }). Does there exist a function continuous at only the rational numbers? Does there exist a function continuous at every irrational and discontinuous ∞ elsewhere? Hint: Suppose D is any countable set, D = {di }i=1 , and define the −n function, fn (x) to equal zero / {d1 , · · · , dn } and 2 for x in this finite P∞for every x ∈ set. Then consider g (x) ≡ n=1 fn (x). Show that this series converges uniformly. 3. Let f ∈ C([0, 1]) and suppose f 0 (x) exists. Show there exists a constant, K, such that |f (x) − f (y)| ≤ K|x − y| for all y ∈ [0, 1]. Let Un = {f ∈ C([0, 1]) such that for each x ∈ [0, 1] there exists y ∈ [0, 1] such that |f (x) − f (y)| > n|x − y|}. Show that Un is open and dense in C([0, 1]) where for f ∈ C ([0, 1]), kf k ≡ sup {|f (x)| : x ∈ [0, 1]} . Show that ∩n Un is a dense Gδ set of nowhere differentiable continuous functions. Thus every continuous function is uniformly close to one which is nowhere differentiable. P∞ 4. ↑ Suppose f (x) = k=1 uk (x) where the convergence is uniform and each uk is a P∞ polynomial. Is it reasonable to conclude that f 0 (x) = k=1 u0k (x)? The answer is no. Use Problem 3 and the Weierstrass approximation theorem to show this. 5. Let X be a normed linear space. A ⊆ X is “weakly bounded” if for each x∗ ∈ X 0 , sup{|x∗ (x)| : x ∈ A} < ∞, while A is bounded if sup{kxk : x ∈ A} < ∞. Show A is weakly bounded if and only if it is bounded. 6. Let f be a 2π periodic locally integrable function on R. The Fourier series for f is given by n ∞ X X ak eikx ≡ lim Sn f (x) ak eikx ≡ lim n→∞
k=−∞
k=−n
where Z
1 ak = 2π
n→∞
π
e−ikx f (x) dx.
−π
Show Z
π
Dn (x − y) f (y) dy
Sn f (x) = −π
where Dn (t) = Verify that
Rπ −π
sin((n + 21 )t) . 2π sin( 2t )
Dn (t) dt = 1. Also show that if g ∈ L1 (R) , then Z lim
a→∞
g (x) sin (ax) dx = 0. R
This last is called the Riemann Lebesgue lemma. Hint: For the last part, assume first that g ∈ Cc∞ (R) and integrate by parts. Then exploit density of Cc∞ (R) in L1 (R).
356
CHAPTER 13. BANACH SPACES
7. ↑It turns out that the Fourier series sometimes converges to the function pointwise. Suppose f is 2π periodic and Holder continuous. That is |f (x) − f (y)| ≤ θ K |x − y| where θ ∈ (0, 1]. Show that if f is like this, then the Fourier series converges to f at every point. Next modify your argument to show that if at every θ point, x, |f (x+) − f (y)| ≤ K |x − y| for y close enough to x and larger than x and θ |f (x−) − f (y)| ≤ K |x − y| for every y close enough to x and smaller than x, then (x−) , the midpoint of the jump of the function. Hint: Use Problem Sn f (x) → f (x+)+f 2 6. 8. ↑ Let Y = {f such that f is continuous, defined on R, and 2π periodic}. Define kf kY = sup{|f (x)| : x ∈ [−π, π]}. Show that (Y, k kY ) is a Banach space. Let x ∈ R and define Ln (f ) = Sn f (x). Show Ln ∈ Y 0 but limn→∞ kLn k = ∞. Show that for each x ∈ R, there exists a dense Gδ subset of Y such that for f in this set, |Sn f (x)| is unbounded. Finally, show there is a dense Gδ subset of Y having the property that |Sn f (x)| is unbounded on the rational numbers. Hint: To do the first part, let f (y) approximate sgn(Dn (x − y)). Here sgn r = 1 if r > 0, −1 if r < 0 and 0 if r = 0. This rules out one possibility of the uniform boundedness principle. After this, show the countable intersection of dense Gδ sets must also be a dense Gδ set. 9. Let α ∈ (0, 1]. Define, for X a compact subset of Rp , C α (X; Rn ) ≡ {f ∈ C (X; Rn ) : ρα (f ) + kf k ≡ kf kα < ∞} where kf k ≡ sup{|f (x)| : x ∈ X} and ρα (f ) ≡ sup{
|f (x) − f (y)| : x, y ∈ X, x 6= y}. α |x − y|
Show that (C α (X; Rn ) , k·kα ) is a complete normed linear space. This is called a Holder space. What would this space consist of if α > 1? 10. ↑Let X be the Holder functions which are periodic of period 2π. Define Ln f (x) = Sn f (x) where Ln : X → Y for Y given in Problem 8. Show kLn k is bounded independent of n. Conclude that Ln f → f in Y for all f ∈ X. In other words, for the Holder continuous and 2π periodic functions, the Fourier series converges to the function uniformly. Hint: Ln f (x) is given by Z π Ln f (x) = Dn (y) f (x − y) dy −π α
where f (x − y) = f (x) + g (x, y) where |g (x, y)| ≤ C |y| . Use the fact the Dirichlet kernel integrates to one to write =|f (x)|
Z
π
−π
zZ Dn (y) f (x − y) dy ≤
π
−π
Z +C
π
−π
sin
}|
{ Dn (y) f (x) dy
1 n+ y (g (x, y) / sin (y/2)) dy 2
Show the functions, y → g (x, y) / sin (y/2) are bounded in L1 independent of x and get a uniform bound on kLn k. Now use a similar argument to show {Ln f } is equicontinuous in addition to being uniformly bounded. In doing this you might proceed as follows. Show Z π 0 0 |Ln f (x) − Ln f (x )| ≤ Dn (y) (f (x − y) − f (x − y)) dy −π
13.4. EXERCISES ≤
357 α
kf kα |x − x0 | Z π 1 sin n+ + y −π 2
f (x − y) − f (x) − (f (x0 − y) − f (x0 )) sin y2
!
dy
Then split this last integral into two cases, one for |y| < η and one where |y| ≥ η. If Ln f fails to converge to f uniformly, then there exists ε > 0 and a subsequence, nk such that kLnk f − f k∞ ≥ ε where this is the norm in Y or equivalently the sup norm on [−π, π]. By the Arzela Ascoli theorem, there is a further subsequence, Lnkl f which converges uniformly on [−π, π]. But by Problem 7 Ln f (x) → f (x). 11. Let X be a normed linear space and let M be a convex open set containing 0. Define ρ(x) = inf{t > 0 :
x ∈ M }. t
Show ρ is a gauge function defined on X. This particular example is called a Minkowski functional. It is of fundamental importance in the study of locally convex topological vector spaces. A set M , is convex if λx + (1 − λ)y ∈ M whenever λ ∈ [0, 1] and x, y ∈ M . 12. ↑The Hahn Banach theorem can be used to establish separation theorems. Let M be an open convex set containing 0. Let x ∈ / M . Show there exists x∗ ∈ X 0 such that ∗ ∗ Re x (x) ≥ 1 > Re x (y) for all y ∈ M . Hint: If y ∈ M, ρ(y) < 1. Show this. If x∈ / M, ρ(x) ≥ 1. Try f (αx) = αρ(x) for α ∈ R. Then extend f to the whole space using the Hahn Banach theorem and call the result F , show F is continuous, then fix it so F is the real part of x∗ ∈ X 0 . 13. A Banach space is said to be strictly convex if whenever kxk = kyk and x 6= y, then
x + y
2 < kxk . F : X → X 0 is said to be a duality map if it satisfies the following: a.) kF (x)k = kxk. b.) F (x)(x) = kxk2 . Show that if X 0 is strictly convex, then such a duality map exists. The duality map is an attempt to duplicate some of the features of the Riesz map in Hilbert space. This Riesz map R is the map which takes a Hilbert space to its dual defined as follows. R (x) (y) = (y, x) The Riesz representation theorem for Hilbert space says this map is onto. Hint: For an arbitrary Banach space, let n o 2 F (x) ≡ x∗ : kx∗ k ≤ kxk and x∗ (x) = kxk Show F (x) 6= ∅ by using the Hahn Banach theorem on f (αx) = αkxk2 . Next show F (x) is closed and convex. Finally show that you can replace the inequality in the definition of F (x) with an equal sign. Now use strict convexity to show there is only one element in F (x). 14. Prove the following theorem which is an improved version of the open mapping theorem, [22]. Let X and Y be Banach spaces and let A ∈ L (X, Y ). Then the following are equivalent. AX = Y, A is an open map. Note this gives the equivalence between A being onto and A being an open map. The open mapping theorem says that if A is onto then it is open.
358
CHAPTER 13. BANACH SPACES
15. Suppose D ⊆ X and D is dense in X. Suppose L : D → Y is linear and kLxk ≤ Kkxk e defined on all of X with for all x ∈ D. Show there is a unique extension of L, L, e e kLxk ≤ Kkxk and L is linear. You do not get uniqueness when you use the Hahn Banach theorem. Therefore, in the situation of this problem, it is better to use this result. 16. ↑ A Banach space is uniformly convex if whenever kxn k, kyn k ≤ 1 and kxn + yn k → 2, it follows that kxn − yn k → 0. Show uniform convexity implies strict convexity (See Problem 13). Hint: Suppose it is not strictly convex. Then there exist kxk and
n = 1 consider xn ≡ x and yn ≡ y, and use the kyk both equal to 1 and xn +y 2 conditions for uniform convexity to get a contradiction. It was shown earlier that Lp is uniformly convex whenever ∞ > p > 1. See Hewitt and Stromberg [34] or Ray [57]. This involved the Clarkson inequalities. Recall that the space Lp (Ω) consists of all functions f : Ω → C which are measurable and which satisfy Z p p |f (ω)| dµ ≡ kf kLp < ∞ Ω
17. Show that a closed subspace of a reflexive Banach space is reflexive. 18. xn converges weakly to x if for every x∗ ∈ X 0 , x∗ (xn ) → x∗ (x). xn * x denotes weak convergence. Show that if kxn − xk → 0, then xn * x. 19. ↑ Show that if X is uniformly convex, then if xn * x and kxn k → kxk, it follows kxn − xk → 0. Hint: Use Lemma 13.3.9 to obtain f ∈ X 0 with kf k = 1 and f (x) = kxk. See Problem 16 for the definition of uniform convexity. Now by the weak convergence, you can argue that if x 6= 0, f (xn / kxn k) → f (x/ kxk). You also might try to show this in the special case where kxn k = kxk = 1. ∗
20. Suppose L ∈ L (X, Y ) and M ∈ L (Y, Z). Show M L ∈ L (X, Z) and that (M L) = L∗ M ∗ . 21. This problem presents the Radon Nikodym theorem. Suppose (Ω, F) is a measurable space and that µ and λ are two finite measures defined on F. Suppose also that λ µ which means that if µ (E) = 0 then λ (E) = 0. Now define Λ ∈ L2 (Ω, µ + λ) as follows. Z Λ (f ) ≡ f dλ Ω
Verify that this is really a bounded linear transformation on L2 (Ω, µ + λ) . Then by the Riesz representation theorem, Theorem 13.2.14, there exists h ∈ L2 (Ω, µ + λ) such that for all f ∈ L2 (Ω, µ + λ) , Z Z f dλ = hf d (λ + µ) Ω
Ω
Verify that h has almost all values real and contained in [0, 1). This will use λ µ. Then note the following: Z Z f (1 − h) dλ = hf dµ Ω
Ω
Pn−1 Now for E ∈ F, let fn = XE k=0 hk . Thus Z Z X n (1 − hn ) dλ = hk dµ E
E k=1
Use monotone convergence to show that Z ∞ X λ (E) = gdµ, g = hk E
k=1
13.4. EXERCISES
359
Show that g ∈ L1 (Ω, µ). Formulate this as a theorem. It is called the Radon Nikodym theorem. This elegant approach is due to Von Neumann.
360
CHAPTER 13. BANACH SPACES
Chapter 14
Representation Theorems 14.1
Radon Nikodym Theorem
This chapter is on various representation theorems. The first theorem, the Radon Nikodym Theorem, is a representation theorem for one measure in terms of another. This important theorem represents one measure in terms of another. It is Theorem 6.10.13 on Page 177. This section is a short review of this major theorem. For a very different approach, see [61] where a nice proof due to Von Neumann is used.
Definition 14.1.1
Let µ and λ be two measures defined on a σ-algebra S, of subsets of a set, Ω. λ is absolutely continuous with respect to µ,written as λ µ, if λ(E) = 0 whenever µ(E) = 0. A complex measure λ defined on a σ-algebra SPis one which has the property that if the Ei are distinct and measurable, then λ (∪i Ei ) = i λ (Ei ). Recall Corollary6.10.17 on Page 178. I am stating it next for convenience.
Corollary 14.1.2 Let λ be a signed measure and let λ µ meaning that if µ (E) = 1 0 ⇒ λ (E) R = 0. Here assume that µ is a finite measure. Then there exists h ∈ L such that λ (E) = E hdµ.
There is an easy corollary to this.
Corollary 14.1.3 Let λ be a complex R measure and λ µ for µ a finite measure. Then there exists h ∈ L1 such that λ (E) =
E
hdµ.
Proof: Let (Re λ) (E) = Re (λ (E)) with Im λ defined similarly. Then R these are signed 1 measures and so there are functions f , f in L such that Re λ (E) = f dµ, Im λ (E) = 1 2 E 1 R f dµ. Then h ≡ f + if satisfies the necessary condition. 2 1 2 E More general versions are available. To see one of these, one can read the treatment in Hewitt and Stromberg [34]. This involves the notion of decomposable measure spaces, a generalization of σ finite.
14.2
Vector Measures
The next topic will use the Radon Nikodym theorem. It is the topic of vector and complex measures. The main interest is in complex measures although a vector measure can have values in any topological vector space. Whole books have been written on this subject. See for example the book by Diestal and Uhl [21] titled Vector measures. 361
362
CHAPTER 14. REPRESENTATION THEOREMS
Definition 14.2.1 Let (V, k·k) be a normed linear space and let (Ω, S) be a measure space. A function µ : S → V is a vector measure if µ is countably additive. That is, if {Ei }∞ i=1 is a sequence of disjoint sets of S, µ(∪∞ i=1 Ei ) =
∞ X
µ(Ei ).
i=1
Note that it makes sense to take finite sums because it is given that µ has values in a vector space in which vectors can be summed. In the above, µ (Ei ) is a vector. It might be a point in Rn or in any other vector space. In many of the most important applications, it is a vector in some sort of function space P∞ which may be infinite Pn dimensional. The infinite sum has the usual meaning. That is i=1 µ(Ei ) = limn→∞ i=1 µ(Ei ) where the limit takes place relative to the norm on V .
Definition 14.2.2 Let (Ω, S) be a measure space and let µ be a vector measure defined on S. A subset, π(E), of S is called a partition of E if π(E) consists of finitely many disjoint sets of S and ∪π(E) = E. Let X |µ|(E) = sup{ kµ(F )k : π(E) is a partition of E}. F ∈π(E)
|µ| is called the total variation of µ. The next theorem may seem a little surprising. It states that, if finite, the total variation is a nonnegative measure.
Theorem 14.2.3 If |µ|(Ω) < ∞, then |µ| is a measure on S. Even if |µ| (Ω) = ∞, P∞ |µ| (∪∞ E ) ≤ |µ| (E i i ) . That is |µ| is subadditive and |µ| (A) ≤ |µ| (B) whenever i=1 i=1 A, B ∈ S with A ⊆ B. Proof: Consider the last claim. Let a < |µ| (A) and let π (A) be a partition of A such that X a< kµ (F )k . F ∈π(A)
Then π (A) ∪ {B \ A} is a partition of B and X |µ| (B) ≥ kµ (F )k + kµ (B \ A)k > a. F ∈π(A)
Since this is true for all such a, it follows |µ| (B) ≥ |µ| (A) as claimed. ∞ Let {Ej }j=1 be a sequence of disjoint sets of S and let E∞ = ∪∞ j=1 Ej . Then letting a < |µ| (E∞ ) , it follows from the definition of total variation there exists a partition of E∞ , Pn π(E∞ ) = {A1 , · · · , An } such that a < i=1 kµ(Ai )k. Also, Ai = ∪∞ j=1 Ai ∩ Ej P∞ and so by the triangle inequality, kµ(Ai )k ≤ j=1 kµ(Ai ∩ Ej )k. Therefore, by the above, and either Fubini’s theorem or Lemma 1.8.3 on Page 22 ≥kµ(Ai )k
a
0 was arbitrary, |µ|(E1 ∪ E2 ) ≥ |µ|(E1 ) + |µ|(E2 ).
(14.1)
Then 14.1 implies that whenever the Ei are disjoint, |µ|(∪nj=1 Ej ) ≥ ∞ X
|µ|(Ej ) ≥
|µ|(∪∞ j=1 Ej )
≥
|µ|(∪nj=1 Ej )
j=1
≥
n X
Pn
j=1
|µ|(Ej ). Therefore,
|µ|(Ej ).
j=1
P∞ Since n is arbitrary, |µ|(∪∞ j=1 Ej ) = j=1 |µ|(Ej ) which shows that |µ| is a measure as claimed. In the case that µ is a complex measure, it is always the case that |µ| (Ω) < ∞ this is shown soon. However, first is an interesting corollary. It concerns the case that µ is only finitely additive.
Corollary 14.2.4 Suppose (Ω, F) is a set with a σ algebra of subsets F and suppose Pn n
µ : F → C is only finitely additive. That is, µ (∪i=1 Ei ) = i=1 µ (Ei ) whenever the Ei are disjoint. Then |µ| , defined in the same way as above, is also finitely additive provided |µ| is finite. Proof: Say E ∩ F = ∅ for E, F ∈ F. Let π (E) , π (F ) suitable partitions for which the following holds. X X |µ| (E ∪ F ) ≥ |µ (A)| + |µ (B)| ≥ |µ| (E) + |µ| (F ) − 2ε. A∈π(E)
B∈π(F )
Since ε is arbitrary, |µ| (E ∩ F ) ≥ |µ| (E)+|µ| (F ) . Similar considerations apply Pn to any finite union of disjoint sets. That is, if the Ei are disjoint, then |µ| (∪ni=1 Ei ) ≥ i=1 |µ| (Ei ) . Now let E = ∪ni=1 Ei where the Ei are disjoint. Then letting π (E) be a suitable partition of E, X |µ| (E) − ε ≤ |µ (F )| , F ∈π(E)
it follows that |µ| (E) ≤ ε +
X F ∈π(E)
n X X |µ (F )| = ε + µ (F ∩ Ei ) F ∈π(E) i=1
364
CHAPTER 14. REPRESENTATION THEOREMS
≤ε+
n X X
|µ (F ∩ Ei )| ≤ ε +
i=1 F ∈π(E)
n X
|µ| (Ei )
i=1
Pn Since ε is arbitrary, this shows |µ| (∪ni=1 Ei ) ≤ i=1 |µ| (Ei ) . Thus |µ| is finitely additive. In the case that µ is a complex measure, it is always the case that |µ| (Ω) < ∞. First is a lemma.
Lemma 14.2.5 Suppose µ is a real valued measure (signed measure by Definition 6.10.7). Then |µ| is a finite measure. Proof: Suppose µ : F → R is a vector measure (signed measure by Definition 6.10.7). By the Hahn decomposition, Theorem 6.10.10 on Page 176, Ω = P ∪ N where P is a positive set and N is a negative one. Then on N , −µ acts like a measure in the sense that if A ⊆ B and A, B measurable subsets of N, then −µ (A) ≤ −µ (B). Similarly µ is a measure on P . X X |µ (F )| ≤ (|µ (F ∩ P )| + |µ (F ∩ N )|) F ∈π(Ω)
=
F ∈π(Ω)
X
µ (F ∩ P ) +
F ∈π(Ω)
−µ (F ∩ N )
F ∈π(Ω)
∪F ∈π(Ω) F ∩ P + −µ
=µ
X
∪F ∈π(Ω) F ∩ N ≤ µ (P ) + |µ (N )|
It follows that |µ| (Ω) < µ (P ) + |µ (N )| and so |µ| has finite total variation.
Theorem 14.2.6 Suppose µ is a complex measure on (Ω, S) where S is a σ algebra of subsets of Ω. That is, whenever {Ei } is a sequence of disjoint sets of S, µ (∪∞ i=1 Ei ) =
∞ X
µ (Ei ) .
i=1
Then |µ| (Ω) < ∞. Proof: If µ is a vector measure with values in C, Re µ and Im µ have values in R. Then X X |µ (F )| ≤ |Re µ (F )| + |Im µ (F )| F ∈π(Ω)
F ∈π(Ω)
=
X F ∈π(Ω)
≤
|Re µ (F )| +
X
|Im µ (F )|
F ∈π(Ω)
|Re µ| (Ω) + |Im µ| (Ω) < ∞
thanks to Lemma 14.2.5.
Theorem 14.2.7
Let (Ω, S) be a measure space and let λ : S → C be a complex vector measure. Thus |λ|(Ω) < ∞. Let µ : S → [0, µ(Ω)] be a finite measure such that λ µ. Then there exists a unique f ∈ L1 (Ω) such that for all E ∈ S, Z f dµ = λ(E). E
Proof: It is clear that Re λ and Im λ are real-valued vector measures on S. Since |λ|(Ω) < ∞, it follows easily that | Re λ|(Ω) and | Im λ|(Ω) < ∞. This is clear because |λ (E)| ≥ |Re λ (E)| , |Im λ (E)| . Therefore, each of | Im λ| − Im(λ) | Re λ| + Re λ | Re λ| − Re(λ) | Im λ| + Im λ , , , and 2 2 2 2
14.2. VECTOR MEASURES
365
are finite measures on S. It is also clear that each of these finite measures are absolutely continuous with respect to µ and so there exist unique nonnegative functions in L1 (Ω), f1, f2 , g1 , g2 such that for all E ∈ S, Z 1 (| Re λ| + Re λ)(E) = f1 dµ, 2 ZE 1 (| Re λ| − Re λ)(E) = f2 dµ, 2 ZE 1 (| Im λ| + Im λ)(E) = g1 dµ, 2 E Z 1 (| Im λ| − Im λ)(E) = g2 dµ. 2 E Now let f = f1 − f2 + i(g1 − g2 ). Next is the notion of representing the vector measure in terms of |µ|. It is like representing a complex number in the form reiθ . The proof requires the following lemma.
Lemma 14.2.8 Suppose (Ω, S, µ) is a measure space and f is a function in L1 (Ω, µ) with the property that Z f dµ ≤ µ(E) E
for all E ∈ S. Then |f | ≤ 1 a.e. Proof of the lemma: Consider the following picture where B(p, r) ∩ B(0, 1) = ∅. Let E = f −1 (B(p, r)). In fact µ (E) = 0. If µ(E) 6= 0 then Z Z Z 1 1 1 f dµ − p = (f − p)dµ ≤ |f −p|dµ < r µ(E) E µ(E) E B(p, r) µ(E) E R (0, 0) 1 f dµ is closer because on E, |f (ω) − p| < r. Hence µ(E) p E 1 1 R to p than r and so µ(E) E f dµ > 1. Refer to the picture. However, this contradicts the assumption of the lemma. It follows µ(E) = 0. Since the set of complex numbers z such ∞ that |z| > 1 is an open set, it equals the union of countably many balls, {Bi }i=1 . Therefore, ∞ X −1 µ f −1 ({z ∈ C : |z| > 1} = µ ∪∞ f (B ) ≤ µ f −1 (Bk ) = 0. k k=1 k=1
Thus |f (ω)| ≤ 1 a.e. as claimed.
Corollary 14.2.9 Let λ be a complex vector measure with |λ|(Ω) < ∞.1 Then there R 1 exists a unique f ∈ L (Ω) such that λ(E) = is called the polar decomposition of λ.
E
f d|λ|. Furthermore, |f | = 1 for |λ| a.e. This
Proof: First note that λ |λ| and so such an L1 function exists and is unique by Theorem 14.2.7. It is required to show |f | = 1 a.e. If |λ|(E) 6= 0, Z λ(E) 1 = ≤ 1. f d|λ| |λ|(E) |λ|(E) E Therefore by Lemma 14.2.8, |f | ≤ 1, |λ| a.e. Now let 1 En = |f | ≤ 1 − . n 1 As
proved above, the assumption that |λ| (Ω) < ∞ is redundant.
366
CHAPTER 14. REPRESENTATION THEOREMS
Let {F1 , · · · , Fm } be a partition of En . Then X m m Z m Z X X |λ (Fi )| = f d |λ| ≤ i=1
≤
m Z X i=1
1−
Fi
Fi
i=1
1 n
d |λ| =
m X
i=1
1 n
1−
i=1
|f | d |λ|
Fi
1 |λ| (Fi ) = |λ| (En ) 1 − n
Then taking the supremum over all partitions, 1 |λ| (En ) ≤ 1 − |λ| (En ) n which shows |λ| (En ) = 0. Hence |λ| ([|f | < 1]) = 0 because [|f | < 1] = ∪∞ n=1 En .
Corollary 14.2.10 Suppose (Ω, S) is a measure space and µ is a finite nonnegative measure on S. Then for h ∈ L1 (µ) , define a complex measure, λ by Z hdµ. λ (E) ≡ E
Then
Z |λ| (E) =
|h| dµ. E
Furthermore, |h| = gh where gd |λ| is the polar decomposition of λ, Z λ (E) = gd |λ| E
Proof: From Corollary 14.2.9 there exists g such that |g| = 1, |λ| a.e. and for all E ∈ S Z Z λ (E) = gd |λ| = hdµ. E
E
Let sn be a sequence of simple functions converging pointwise to g. Then from the above, Z Z gsn d |λ| = sn hdµ. E
E
Passing to the limit using the dominated convergence theorem, Z Z d |λ| = ghdµ. E
E
It follows gh ≥ 0 a.e. and |g| = 1. Therefore, |h| = |gh| = gh. It follows from the above, that Z Z Z Z |λ| (E) = d |λ| = ghdµ = d |λ| = |h| dµ E
14.3
E
E
E
The Dual Space of Lp (Ω)
This is on representation of the dual space of Lp (Ω).
Theorem 14.3.1
(Riesz representation theorem) Let ∞ > p > 1 and let (Ω, S, µ) be a finite measure space. If Λ ∈ (Lp (Ω))0 , then there exists a unique h ∈ Lq (Ω) ( p1 + 1q = 1) such that Z Λf = hf dµ. Ω
This function satisfies khkq = kΛk where kΛk is the operator norm of Λ.
14.3. THE DUAL SPACE OF LP (Ω)
367
Proof: (Uniqueness) If h1 and h2 both represent Λ, consider f = |h1 − h2 |q−2 (h1 − h2 ), where h denotes complex conjugation. By Holder’s inequality, it is easy to see that f ∈ R p L (Ω). Thus 0 = Λf − Λf = h |h − h2 |q−2 (h1 − h2 ) − h2 |h1 − h2 |q−2 (h1 − h2 )dµ = 1 1 R q |h1 − h2 | dµ. Therefore h1 = h2 and this proves uniqueness. Now let λ(E) = Λ(XE ). Since this is a finite measure space, XE is an element of Lp (Ω) and so it makes sense to write Λ (XE ). Is λ a complex measure? ∞ n If {Ei }∞ i=1 is a sequence of disjoint sets of S, let Fn = ∪i=1 Ei , F = ∪i=1 Ei . Then by the Dominated Convergence theorem, kXFn − XF kp → 0. Therefore, by continuity of Λ, n X
λ(F ) ≡ Λ(XF ) = lim Λ(XFn ) = lim n→∞
n→∞
Λ(XEk ) =
k=1
∞ X
λ(Ek ).
k=1
This shows λ is a complex measure. It is also clear from the definition of λ that R λ µ. Therefore, by the Radon Nikodym theorem, there exists h ∈ L1 (Ω) with λ(E) = E hdµ = Λ(XE ). Actually h ∈ Lq and satisfies the other conditions above. This is shown next. Pm Let s = i=1 ci XEi be a simple function. Then since Λ is linear, Λ(s) =
m X
ci Λ(XEi ) =
i=1
m X
Z ci
i=1
Z hdµ =
hsdµ.
(14.2)
Ei
Claim: If f is uniformly bounded and measurable, then Z Λ (f ) = hf dµ. Proof of claim: Since f is bounded and measurable, there exists a sequence of simple functions, {sn } which converges to f pointwise and in Lp (Ω) , |sn | ≤ |f |. This follows from Theorem 5.1.9 on Page 140 upon breaking f up into positive and negative parts of real and complex parts. In fact this theorem gives uniform convergence. Then Z Z Λ (f ) = lim Λ (sn ) = lim hsn dµ = hf dµ, n→∞
n→∞
the first equality holding because of continuity of Λ, the second following from 14.2 and the third holding by the dominated convergence theorem. This is a very nice formula but it still has not been shown that h ∈ Lq (Ω). Let En = {x : |h(x)| ≤ n}. Thus |hXEn | ≤ n. Then |hXEn |q−2 (hXEn ) ∈ Lp (Ω). By the claim, it follows that Z q khXEn kq = h|hXEn |q−2 (hXEn )dµ = Λ(|hXEn |q−2 (hXEn ))
≤ kΛk |hXEn |q−2 (hXEn ) p =
Z
q
|hXEn | dµ
1/p
q
= kΛk khXEn kqp,
because q − 1 = q/p and so it follows that khXEn kq ≤ kΛk. Letting n → ∞, the monotone convergence theorem implies khkq ≤ kΛk. (14.3) Now that h is in Lq (Ω), it follows from 14.2 and the density of the simple functions, R Theorem 8.4.1 on Page 218, that Λf = hf dµ for all f ∈ Lp (Ω). It only remains to verify
368
CHAPTER 14. REPRESENTATION THEOREMS
the last claim that khkq = kΛkRnot just 14.3. However, from the definition and Holder’s inequality and 14.3, kΛk ≡ sup{ hf : kf kp ≤ 1} ≤ khkq ≤ kΛk. Next consider the case of L1 (Ω) . What is its dual space? I will assume here that (Ω, µ) is a finite measure space. The argument will be a little different than the one given above for Lp with p > 1.
Theorem 14.3.2 (Riesz representation theorem) Let (Ω, S, µ) be a finite measure space. If Λ ∈ (L1 (Ω))0 , then there exists a unique h ∈ L∞ (Ω) such that Z Λ(f ) = hf dµ Ω ∞
for all f ∈ L (Ω). If h is the function in L (Ω) representing Λ ∈ (L1 (Ω))0 , then khk∞ = kΛk. 1
Proof: For measurable E, it follows that XE ∈ L1 (Ω, µ). Define a measure λ (E) ≡ Λ (XE ) . This is a complex measure as in the proof of Theorem 14.3.1. Then it follows from Corollary 14.1.3 that there exists a unique h ∈ L1 (Ω, µ) such that Z λ (E) ≡ Λ (XE ) = hdµ (14.4) E
R I will show that h ∈ L∞ (Ω, µ) and that Λ (f ) = R hf dµ for all f ∈ L1 (Ω) . First of all, 14.4 implies that for all simple functions s, Λ (s) = shdµ. Let {sn } be a sequence of simple functions which satisfies the following. ( h |h| if h (ω) 6= 0 ≡ sgn (h) lim sn (ω) = n→∞ 0 if h (ω) = 0 Since this is a finite measure space, sgn (h) is in L1 and sn → sgn (h) in L1 (Ω). Also for E a measurable set, XE sn → XE sgn (h) pointwise and in L1 . Then, using the dominated convergence theorem and continuity of Λ, Z Z |h| dµ = Λ (XE sgn (h)) ≤ kΛk XE |sgn h| dµ ≤ kΛk µ (E) E
Thus, whenever E is measurable, Z Z 1 |h| |h| dµ ≤ kΛk , dµ ≤ µ (E) µ (E) E kΛk E By Lemma 14.2.8 |h| ≤ 1 a.e. and so |h (ω)| ≤ kΛk a.e. ω kΛk
(14.5)
This shows h ∈R L∞ and the density of the simple functions in L1 implies that for any f ∈ L1 , Λ (f ) = hf dµ. It remains to verify that in fact khk∞ = kΛk . Z Z |Λ (f )| = hf dµ ≤ khk∞ |f | dµ = khk∞ kf k1 and so kΛk ≤ khk∞ . With 14.5, this shows the two are equal. A more geometric treatment of the case where ∞ > p > 1 is in Hewitt and Stromberg [34]. It is also included in my Real and Abstract Analysis book on my web site. I have been assured that this other way is the right way to look at it because of its link to geometry and I think that those who say this are right. However, I have never needed this representation theorem for any measure space which is not σ finite and it is shorter to do what is being done here. Next these results are extended to the σ finite case through the use of a trick.
14.3. THE DUAL SPACE OF LP (Ω)
369
Lemma 14.3.3 Let (Ω, S, µ) be a measure space and suppose there exists a measurable function, r such that r (x) > 0 for all x, there exists M such that |r (x)| < M for all x, R and rdµ < ∞. ThenR for Λ ∈ (Lp (Ω, µ))0 , p ≥ 1, there exists h ∈ Lq (Ω, µ), L∞ (Ω, µ) if p = 1 such that Λf = hf dµ. Also khk = kΛk. (khk = khkq if p > 1, khk∞ if p = 1). Here 1 1 p + q = 1. Proof: Define a new measure µ e, according to the rule Z µ e (E) ≡ rdµ.
(14.6)
E
0
Thus µ e is a finite measure on S. For Λ ∈ (Lp (µ)) , Λ (f ) = Λ r1/p r−1/p f 0 e (g) ≡ Λ r1/p g . Now Λ ˜ is in Lp (˜ where Λ µ) because 1/p Z 1/p p e 1/p r g dµ Λ (g) ≡ Λ r g ≤ kΛk
˜ r−1/p f =Λ
Ω
d˜ µ
Z = kΛk
1/p
z}|{ p |g| rdµ
Ω
= kΛk kgkLp (eµ)
Therefore, by Theorems 14.3.2 and 14.3.1 there exists a unique h ∈ Lq (e µ) which represents e Here q = ∞ if p = 1 and satisfies 1/q + 1/p = 1 otherwise. Then Λ. Z Z −1/p −1/p ˜ Λ (f ) = Λ r f = hf r rdµ = f hr1/q dµ Ω
Ω
˜ ∈ Lq (µ) since h ∈ Lq (˜ Now hr ≡h µ). In case p = 1, Lq (e µ) and Lq (µ) are exactly the same. In this case you have Z Z ˜ r−1 f = Λ (f ) = Λ hf r−1 rdµ = f hdµ 1/q
Ω
Ω
˜ Thus the desired representation holds. Then in any case,|Λ (f )| ≤ h
q kf kLp so kΛk ≤ L
˜
h q . Also, as before, L Z
q q−2
˜ ˜ ˜ q−2 ˜ ˜ ˜ = h h hdµ = Λ h h
h Lq (µ)
Ω
Z 1/p ˜ q−2 ˜ p h| h dµ |
≤
kΛk
=
Z p 1/p ˜ q/p q/p = kΛk khk kΛk h
Ω
Ω
˜ and so h
Lq (µ)
˜ ≤ kΛk ≤ h
Lq (µ)
˜ . It works the same for p = 1. Thus h
Lq (µ)
= kΛk .
A situation in which the conditions of the lemma are satisfied is the case where the measure space is σ finite. In fact, you should show this is the only case in which the conditions of the above lemma hold.
Theorem 14.3.4 (Riesz representation theorem) Let (Ω, S, µ) be σ finite and let Λ ∈ (Lp (Ω,R µ))0 , p ≥ 1. Then there exists a unique h ∈ Lq (Ω, µ), L∞ (Ω, µ) if p = 1 such that Λf = hf dµ. Also khk = kΛk. (khk = khkq if p > 1, khk∞ if p = 1). Here p1 + 1q = 1. Proof: Without loss of generality, assume µ (Ω) = ∞. By Proposition 6.10.6, either µ is a finite measure or µ (Ω) = ∞. These are the only two cases. Then let {Ωn } be a sequence ∞ of disjoint of S having the property n = Ω. Define R that 1 < µ(Ω R n ) < ∞, ∪n=1 ΩP P∞elements ∞ 1 −1 r(x) = n=1 n2 XΩn (x) µ(Ωn ) , µ e(E) = E rdµ. Thus Ω rdµ = µ e(Ω) = n=1 n12 < ∞ so µ e is a finite measure. The above lemma gives the existence part of the conclusion of the theorem. Uniqueness is done as before.
370
14.4
CHAPTER 14. REPRESENTATION THEOREMS
The Dual Space Of L∞ (Ω)
What about the dual space of L∞ (Ω)? This will involve the following Lemma. Also recall the notion of total variation defined in Definition 14.2.2.
Lemma 14.4.1 Let (Ω, F) be a measure space. Denote by BV (Ω) the space of finitely additive complex measures ν such that |ν| (Ω) < ∞. Then defining kνk ≡ |ν| (Ω) , it follows that BV (Ω) is a Banach space. Proof: It is obvious that BV (Ω) is a vector space with the obvious conventions involving scalar multiplication. Why is k·k a norm? All the axioms are obvious except for the triangle inequality. However, this is not too hard either.
kµ + νk
≡
|µ + ν| (Ω) = sup
X
π(Ω)
≤
sup
X
π(Ω)
≡
|µ (A)|
A∈π(Ω)
+ sup
X
π(Ω)
A∈π(Ω)
|µ (A) + ν (A)|
|ν (A)|
A∈π(Ω)
|µ| (Ω) + |ν| (Ω) = kνk + kµk .
Suppose now that {ν n } is a Cauchy sequence. For each E ∈ F, |ν n (E) − ν m (E)| ≤ kν n − ν m k and so the sequence of complex numbers ν n (E) converges. That to which it converges is called ν (E) . Then it is obvious that ν (E) is finitely additive. Why is |ν| finite? Since k·k is a norm, it follows that there exists a constant C such that for all n, |ν n | (Ω) < C Let π (Ω) be any partition. Then X |ν (A)| = lim
X
|ν n (A)| ≤ C.
n→∞ A∈π(Ω)
A∈π(Ω)
Hence ν ∈ BV (Ω). Let ε > 0 be given and let N be such that if n, m > N, then kν n − ν m k < ε/2. Pick any such n. Then choose π (Ω) such that |ν − ν n | (Ω) − ε/2
0 there exists a compact set K such that |f (x)| < ε whenever x ∈ / K. Recall the norm on this space is kf k∞ ≡ kf k ≡ sup {|f (x)| : x ∈ Rp } The next lemma has to do with extending functionals which are defined on nonnegative functions to complex valued functions in such a way that the extended function is linear. This exact process was used earlier with the abstract Lebesgue integral. Basically, you can do it when the functional “desires to be linear”. Let L ∈ C0 (Rp )0 . Also denote by C0+ (Rp ) the set of nonnegative continuous functions defined on Rp .
Definition 14.5.2
0
Letting L ∈ C0 (Rp ) , define for f ∈ C0+ (Rp )
λ(f ) = sup{|Lg| : |g| ≤ f, g ∈ C0+ (Rp )}. Note that λ(f ) < ∞ because |Lg| ≤ kLk kgk ≤ kLk kf k for |g| ≤ f . Isn’t this a lot like the total variation of a vector measure? Indeed it is, and the proof that λ wants to be linear is also similar to the proof that the total variation is a measure. This is the content of the following lemma.
Lemma 14.5.3 If c ≥ 0, λ(cf ) = cλ(f ), f1 ≤ f2 implies λ (f1 ) ≤ λ (f2 ), and λ(f1 + f2 ) = λ(f1 ) + λ(f2 ). Also 0 ≤ λ (f ) ≤ kLk kf k∞ Proof: The first two assertions are easy to see so consider the third. For i = 1, 2 and fi ∈ C0+ (Rp ) , let |gi | ≤ fi , λ (fi ) ≤ |Lgi | + ε Then let |ω i | = 1 and ω i L (gi ) = |L (gi )| so that |L (g1 )| + |L (g2 )| = ω 1 L (g1 ) + ω 2 L (g2 ) = L (ω 1 g1 + ω 2 g2 ) = |L (ω 1 g1 + ω 2 g2 )| Then λ (f1 ) + λ (f2 ) ≤
|L (g1 )| + |L (g2 )| + 2ε
= ω 1 L (g1 ) + ω 2 L (g2 ) + 2ε = L (ω 1 g1 ) + L (ω 2 g2 ) + 2ε = L (ω 1 g1 + ω 2 g2 ) + 2ε = |L (ω 1 g1 + ω 2 g2 )| + 2ε where |gi | ≤ fi and Now |ω 1 g1 + ω 2 g2 | ≤ |g1 | + |g2 | ≤ f1 + f2 and so the above shows λ (f1 ) + λ (f2 ) ≤ λ (f1 + f2 ) + 2ε. Since ε is arbitrary, λ (f1 ) + λ (f2 ) ≤ λ (f1 + f2 ) . It remains to verify the other inequality. Let |g| ≤ f1 + f2 , |Lg| ≥ λ(f1 + f2 ) − ε.
14.5. THE DUAL SPACE OF C0 RP Let
( hi (x) =
375
fi (x)g(x) f1 (x)+f2 (x)
if f1 (x) + f2 (x) > 0, 0 if f1 (x) + f2 (x) = 0.
Then hi is continuous and h1 (x) + h2 (x) = g(x), |hi | ≤ fi . The function hi is clearly continuous at points x where f1 (x) + f2 (x) > 0. The reason it is continuous at a point where f1 (x)+f2 (x) = 0 is that at every point y where f1 (y)+f2 (y) > 0, the top description of the function gives fi (y) g (y) ≤ |g (y)| ≤ f1 (y) + f2 (y) |hi (y)| = f1 (y) + f2 (y) so if |y − x| is small enough, |hi (y)| is also small. Then it follows from this definition of the hi that −ε + λ(f1 + f2 ) ≤
|Lg| = |Lh1 + Lh2 | ≤ |Lh1 | + |Lh2 |
≤ λ(f1 ) + λ(f2 ). Since ε > 0 is arbitrary, this shows that λ(f1 + f2 ) ≤ λ(f1 ) + λ(f2 ) ≤ λ(f1 + f2 ) The last assertion follows from λ(f ) = sup{|Lg| : |g| ≤ f } ≤
sup kgk∞ ≤kf k∞
kLk kgk∞ ≤ kLk kf k∞
Let Λ be the unique linear extension of Theorem 6.7.7. It is just like defining the integral for functions when you understand it for nonnegative functions. Then from the above lemma, |Λf | = ωΛf = Λ (ωf ) ≤ λ (|f |) ≤ kLk kf k∞ .
(14.8)
Also, if f ≥ 0, Λf = λ (f ) ≥ 0. Therefore, Λ is a positive linear functional on C0 (Rp ). In particular, it is a positive linear functional on Cc (Rp ). Thus there are now two linear continuous mappings L, Λ which are defined on C0 (Rp ) . The above shows that in fact kΛk ≤ kLk. Also, from the definition of Λ |Lg| ≤ λ (|g|) = Λ (|g|) ≤ kΛk kgk∞ so in fact, kLk ≤ kΛk showing that these two have the same operator norms. kLk = kΛk
(14.9)
By Theorem 7.2.1 on Page 185, since Λ isR a positive linear functional on Cc (Rp ), there exists a unique measure µ such that Λf = Rp f dµ for all f ∈ Cc (Rp ). This measure is regular. In fact, it is actually a finite measure.
Lemma 14.5.4 Let L ∈ C0 (Rp )0 as above. Then letting µ be the Radon measure just described, it follows µ is finite and µ (Rp ) = kΛk = kLk Proof: First of all, it was observed above in 14.9 that kΛk = kLk . Now by regularity, µ (Rp ) = sup {µ (K) : K ⊆ Rp }
376
CHAPTER 14. REPRESENTATION THEOREMS
and so letting K ≺ f ≺ Rp for one of these K, it also follows µ (Rp ) = sup {Λf : f ≺ Rp } However, for such nonnegative f ≺ Rp , λ (f ) ≡ sup {|Lg| : |g| ≤ f } ≤ sup {kLk kgk∞ : |g| ≤ f } ≤
kLk kf k∞ = kLk
and so 0 ≤ Λ (f ) = λ (f ) ≤ kLk. It follows that (Recall the construction of µ in the Riesz representation theorem being referred to here.) µ (Rp ) = sup {Λf : f ≺ Rp } ≤ kLk . Now since Cc (Rp ) is dense in C0 (Rp ) , there exists f ∈ Cc (Rp ) such that kf k∞ ≤ 1 and |Λf | + ε > kΛk = kLk Thus, kLk − ε < |Λf | ≤ λ (|f |) = Λ |f | ≤ µ (Rp ). Since ε is arbitrary, this shows kLk = µ (Rp ). What follows is the Riesz representation theorem for C0 (Rp )0 .
Theorem 14.5.5
Let L ∈ (C0 (Rp ))0 . Then there exists a finite Radon measure µ and a function σ ∈ L (R , µ) such that for all f ∈ C0 (Rp ) , Z L(f ) = f σdµ. ∞
p
Rp
Furthermore, µ (Rp ) = kLk , |σ| = 1 a.e. and if Z ν (E) ≡
σdµ E
then µ = |ν| . Proof: From the above there exists a unique Radon measure µ such that for all f ∈ Cc (Rp ) , Z Λf = f dµ Rp p
Then for f ∈ Cc (R ) , Z |Lf | ≤ λ (|f |) = Λ(|f |) =
|f |dµ = kf kL1 (µ) . Rp
Since µ is both inner and outer regular, Cc (Rp ) is dense in L1 (Rp , µ). (See Theorem 8.4.2 e for more than is needed.) Therefore L extends uniquely to an element of (L1 (Rp , µ))0 , L. By the Riesz representation theorem for L1 for finite measure spaces, there exists a unique σ ∈ L∞ (Rp , µ) such that for all f ∈ L1 (Rp , µ) , Z e = Lf f σdµ Rp
In particular, for all f ∈ C0 (Rp ) , Z Lf =
f σdµ Rp
and it follows from Lemma 14.5.4, µ (Rp ) = kLk.
14.5. THE DUAL SPACE OF C0 RP
377
It remains to verify |σ| = 1 a.e. For any continuous f ≥ 0, Z Z Λf ≡ f σdµ f dµ ≥ |Lf | = Rp
Rp
Now if E is measurable, the regularity of µ implies there exists a sequence of nonnegative bounded functions fn ∈ Cc (Rp ) such that fn (x) → XE (x) a.e. and in L1 (µ) . Then using the dominated convergence theorem in the above, Z Z fn dµ = lim Λ (fn ) ≥ lim |Lfn | dµ = lim n→∞ n→∞ n→∞ Rp E Z Z fn σdµ = σdµ = lim n→∞ p E
R
and so if µ (E) > 0, 1 ≥
Z
1 µ (E)
E
σdµ
which shows from Lemma 14.2.8 that |σ| ≤ 1 a.e. But also, choosing f1 appropriately, kf1 k∞ ≤ 1, |Lf1 | + ε > kLk = µ (Rp ) Letting ω (Lf1 ) = |Lf1 | , |ω| = 1, µ (Rp )
kLk =
=
sup |Lf | ≤ |Lf1 | + ε kf k∞ ≤1
Z ωLf1 + ε = f1 ωσdµ + ε Rp Z Re (f1 ωσ) dµ + ε p ZR |σ| dµ + ε
= = ≤
Rp
and since ε is arbitrary, µ (Rp ) ≤
Z
|σ| dµ ≤ µ (Rp )
Rp
which requires |σ| = 1 a.e. since it was shown to be no larger than 1 and if it is smaller than 1 on a set of positive measure, then the above could not hold. It only remains to verify µ = |ν|. By Corollary 14.2.10, Z Z |ν| (E) = |σ| dµ = 1dµ = µ (E) E
E
and so µ = |ν| . Sometimes people write Z
Z f dν ≡
Rp
f σd |ν| Rp
where σd |ν| is the polar decomposition of the complex measure ν. Then with this convention, the above representation is Z L (f ) = f dν, |ν| (Rp ) = kLk . Rp
Also note that at most one ν can represent L. If there were two of them ν i , i = 1, 2, then ν 1 − ν 2 would represent 0 and so |ν 1 − ν 2 | (Rp ) = 0. Hence ν 1 = ν 2 . All of this works with virtually no change if Rp is replaced with a Polish space, a complete separable metric space in which closed balls are compact, or even more generally for a locally
378
CHAPTER 14. REPRESENTATION THEOREMS
compact Hausdorff space. In the first case, you simply replace Rp with X, the name of the Polish space. In the second case there are only a few technical details. However, I am trying to emphasize the case which is of most interest and all of the principal ideas are present in this more specific example. The following is a rather important application of the above theory along with the material on Fourier transforms presented earlier. It has to do with the fact that the characteristic function of a probability measure is unique so if two such measures have the same one, then they are the same measure. This can actually be shown rather easily. You don’t have to accept this kind of thing on faith and speculations based on special cases. Example 14.5.6 Let µ and ν be two Radon probability measures on the Borel sets of Rp . The typical situation is that these are probability distribution functions for two random varip/2 ables. Then the characteristic function of µ is (2π) times the inverse Fourier transform of µ Z φµ (t) ≡ eit·x dµ (x) Rp
then a very important theorem from probability says that if φµ (t) = φν (t) , then the two measures are equal. This is very easy at this point, but not so easy if you don’t have the general treatment of Fourier transforms presented above. R We have F −1 (µ) = F −1 (ν) in G ∗ . Therefore, µ = ν in G ∗ and by definition, Rp ψdµ = ψdν for all ψ ∈ G. But by the Stone Weierstrass theorem, G is dense in C0 (Rp ) and so Rp the equation holds for all ψ ∈ C0 (Rp ). Now Cc (Rp ) ⊆ C0 (Rp ) and so the equation also holds for all ψ ∈ Cc (Rp ). By uniqueness in the Riesz representation theorem, it follows that µ = ν. You could also use Theorem 14.5.5.
R
14.6
Exercises
1. Suppose µ is a vector measure having values in Rn or Cn . Can you show that |µ| must be finite? Hint: You might define for each ei , one of the standard basis vectors, the real or complex measure, µei given by µei (E) ≡ ei · µ (E) . Why would this approach not yield anything for an infinite dimensional normed linear space in place of Rn ? 2. The Riesz representation theorem of the Lp spaces can be used to prove a very interesting inequality. Let r, p, q ∈ (1, ∞) satisfy 1 1 1 = + − 1. r p q Then 1 1 1 1 =1+ − > q r p r and so r > q. Let θ ∈ (0, 1) be chosen so that θr = q. Then also 1/p+1/p0 =1
z }| { 1 1 = 1− 0 r p
1 1 1 + −1= − 0 q p q
and so 1 1 θ = − 0 q q p
14.6. EXERCISES
379
which implies p0 (1 − θ) = q. Now let f ∈ Lp (Rn ) , g ∈ Lq (Rn ) , f, g ≥ 0. Justify the steps in the following argument using what was just shown that θr = q and p0 (1 − θ) = q. Let 1 1 r0 n h ∈ L (R ) . + 0 =1 r r Z Z Z . f ∗ g (x) h (x) dx = f (y) g (x − y) h (x) dxdy Z Z θ 1−θ ≤ |f (y)| |g (x − y)| |g (x − y)| |h (x)| dydx Z Z
≤
Z
≤
"Z Z
|g (x − y)|
"Z Z
"Z
1−θ
|g (x − y)|
r
|h (x)|
=
|g (x − y)|
(1−θ)p
|g (x − y)| q/r
= kgkq
q/p0
kgkq
r
|h (x)|
Z
Z
r
1/r dy dx
r0 p0 /r0 |h (x)| dx dy
|f (y)| |g (x − y)|
p
0
θ
θ
|f (y)| "Z
r0 1/r0 |h (x)| dx ·
|f (y)| |g (x − y)|
1−θ
"Z Z
≤
|g (x − y)|
1−θ
θr
dx
dy #1/r0
r0 /p0 dy
·
dx
p/r #1/p dx dy
r0 /p0
0
·
#1/p
p/r
p0
#1/p0
dy
#1/r0 dx
q/r
kgkq
kf kp khkr0 = kgkq kf kp khkr0 .
kf kp (14.10)
Young’s inequality says that kf ∗ gkr ≤ kgkq kf kp .
(14.11)
Therefore kf ∗ gkr ≤ kgkq kf kp . How does this inequality follow from the above computation? Does 14.10 continue to hold if r, p, q are only assumed to be in [1, ∞]? Explain. Does 14.11 hold even if r, p, and q are only assumed to lie in [1, ∞]? 3. Suppose (Ω, µ, S) is a finite measure space and that {fn } is a sequence of functions which converge weakly to 0 in Lp (Ω). This means that Z fn gdµ → 0 Ω 0
for every g ∈ Lp (Ω). Suppose also that fn (x) → 0 a.e. Show that then fn → 0 in Lp−ε (Ω) for every p > ε > 0. 4. Give an example of a sequence of functions in L∞ (−π, π) which converges weak ∗ to zero but which does not converge pointwise a.e. to zero. Convergence weak ∗ to Rπ 0 means that for every g ∈ L1 (−π, π) , −π g (t) fn (t) dt → 0. Hint: First consider g ∈ Cc∞ (−π, π) and maybe try something like fn (t) = sin (nt). Do integration by parts.
380
CHAPTER 14. REPRESENTATION THEOREMS
5. Let λ be a real vector measure on the measure space (Ω, F). That is λ has values in R. The Hahn decomposition says there exist measurable sets P, N such that P ∪ N = Ω, P ∩ N = ∅, and for each F ⊆ P, λ (F ) ≥ 0 and for each F ⊆ N, λ (F ) ≤ 0. These sets P, N are called the positive set and the negative set respectively. Show the existence of the Hahn decomposition. Also explain how this decomposition is unique in the sense that if P 0 , N 0 is another Hahn decomposition, then (P \ P 0 ) ∪ (P 0 \ P ) has measure zero, a similar formula holding for N, N 0 . When you have the Hahn decomposition, as just described, you define λ+ (E) ≡ λ (E ∩ P ) , λ− (E) ≡ λ (E ∩ N ). This is sometimes called the Hahn Jordan decomposition. Hint: This is pretty easy if you use the polar decomposition above. 6. The Hahn decomposition holds for measures which have values in (−∞, ∞]. Let λ be such a measure which is defined on a σ algebra of sets F. This is not a vector measure because the set on which it has values is not a vector space. Thus this case is not included in the above discussion. N ∈ F is called a negative set if λ (B) ≤ 0 for all B ⊆ N. P ∈ F is called a positive set if for all F ⊆ P, λ (F ) ≥ 0. (Here it is always assumed you are only considering sets of F.) Show that if λ (A) ≤ 0, then there exists N ⊆ A such that N is a negative set and λ (N ) ≤ λ (A). Hint: This is done by subtracting off disjoint sets having positive meaure. Let A ≡ N0 and suppose Nn ⊆ A has been obtained. Tell why tn ≡ sup {λ (E) : E ⊆ Nn } ≥ 0. Let Bn ⊆ Nn such that λ (Bn ) >
tn 2
Then Nn+1 ≡ Nn \ Bn . Thus the Nn are decreasing in n and the Bn are disjoint. Explain why λ (Nn ) ≤ λ (N0 ). Let N = ∩Nn . Argue tn must converge to 0 since otherwise λ (N ) = −∞. Explain why this requires N to be a negative set in A which has measure no larger than that of A. 7. Using Problem 6 complete the Hahn decomposition for λ having values in (−∞, ∞]. Now the Hahn Jordan decomposition for the measure λ is λ+ (E) ≡ λ (E ∩ P ) , λ− (E) ≡ −λ (E ∩ N ) . Explain why λ− is a finite measure. Hint: Let N0 = ∅. For Nn a given negative set, let tn ≡ inf {λ (E) : E ∩ Nn = ∅} Explain why you can assume that for all n, tn < 0. Let En ⊆ NnC such that λ (En ) < tn /2 < 0 and from Problem 6 let An ⊆ En be a negative set such that λ (An ) ≤ λ (En ). Then Nn+1 ≡ Nn ∪ An . If tn does not converge to 0 explain why there exists a set having measure −∞ which is not allowed. Thus tn → 0. Let N = ∪∞ n=1 Nn and explain why P ≡ N C must be positive due to tn → 0. 8. What if λ has values in [−∞, ∞). Prove there exists a Hahn decomposition for λ as in the above problem. Why do we not allow λ to have values in [−∞, ∞]? Hint: You might want to consider −λ.
14.6. EXERCISES
381 ∞
9. Suppose X is a Banach space and let X 0 denote its dual space. A sequence {x∗n }n=1 in X 0 is said to converge weak ∗ to x∗ ∈ X 0 if for every x ∈ X, lim x∗n (x) = x∗ (x) .
n→∞
Let {φn } be a mollifier. Also let δ be the measure defined by δ (E) = 1 if 0 ∈ E and 0 if 1 ∈ / E. Explain how φn → δ weak ∗. 10. Let (Ω, F, P ) be a probability space and let X : Ω → Rn be a random variable. This means X −1 (open set) ∈ F. Define a measure λX on the Borel sets of Rn as follows. For E a Borel set, λX (E) ≡ P X −1 (E) Explain why this is well defined. Next explain why λX can be considered a Radon probability measure by completion. Explain why λX ∈ G ∗ if Z λX (ψ) ≡ ψdλX Rn
where G is the collection of functions used to define the Fourier transform. 11. Using the above problem, the characteristic function of this measure (random variable) is Z φX (y) ≡ eix·y dλX Rn
Show this always exists for any such random variable and is continuous. Next show that for two random variables X, Y, λX = λY if and only if φX (y) = φY (y) for all y. In other words, show the distribution measures are the same if and only if the characteristic functions are the same. A lot more can be concluded by looking at characteristic functions of this sort. The important thing about these characteristic functions is that they always exist, unlike moment generating functions. Note that this is a specific version of Example 14.5.6. n o 12. Let B be the ball f ∈ Lp (Ω) : kf kp ≤ M where Ω is a measurable subset of Rp and the measure is a Radon measure, and suppose you have a sequence {fk } ⊆ B. Show that there exists a subsequence, still denoted as {fk } and f ∈ B such that for all g ∈ Lq (Ω) , Z Z lim
k→∞
fk gdµ =
f gdµ
That is, show that B is weakly sequentially compact. This is the term for what you will show and it is the most important case of the Eberlein Smulian theorem on weak compactness. It serves as the basis for may existence theorems in non linear analysis. Hint: First use the fact that Lq (Ω) is separable, 1q + p1 = 1. See Corollary 9.7.3. Then use the Cantor diagonalization process which was used earlier in the proof of the Arzela Ascoli theorem to obtain a subsequence, still denoted as {fk } such that for each R g ∈ D for D the countable dense subset of Lq (Ω) , the sequence of complex numbers fk gdµ converges. Now show that this converges for every g ∈ Lq (Ω). Let R 0 F (q) ≡ limk→∞ fk gdµ. Show that F ∈ (Lq (Ω)) . Now use the Riesz representation p theorem to get f ∈ L (Ω) representing F . Observe that this works.
382
CHAPTER 14. REPRESENTATION THEOREMS
Part IV
Complex Analysis
383
Chapter 15
Fundamentals 15.1
Banach Spaces
I am going to present the most basic theorems in complex analysis for the case where the functions have values in a Banach space. See Chapter 13 above for a short discussion of the main properties of these spaces. There are good reasons for allowing functions to have values in a complex Banach space. In particular, when X is a Banach space, so is −1 L (X, X). The presentation of fundamental topics will include the case of λ → (λI − A) for A ∈ L (X, X). This will make possible an easy discussion of some very important theorems. Thus this extra generality is not just generalization for the sake of generalization. It is here for a real reason, to make the transition to spectral theory of linear operators believable. However, if the reader is convinced that these functional analysis topics will never be of use to them, then replace X with C and a standard treatment will be obtained. After presenting the fundamental concepts of this subject, in this general setting, I will specialize to the usual case in which the functions have complex values. In fact, this is the main thrust of this book. I am just trying not to neglect the other application. I assume the reader knows about vector spaces. In this book, all vector spaces are with respect to the field of complex numbers. A norm k·k is defined on a vector space X if kzk ≥ 0 and kzk = 0 if and only if z = 0
(15.1)
If α is a scalar, kαzk = |α| kzk
(15.2)
kz + wk ≤ kzk + kwk .
(15.3)
The last inequality above is called the triangle inequality. Another version of this is |kzk − kwk| ≤ kz − wk
(15.4)
Note this shows that x → kxk is a continuous function. Here is a very interesting result which is usually obtained in calculus from a use of the mean value theorem which of course does not apply to the case of a function having values in a Banach space.
Lemma 15.1.1 Let Y be a normed vector space and suppose h : [a, b] → Y is differentiable on (a, b) continuous on [a, b] and satisfies h0 (t) = 0. Then kh (b) − h (a)k = 0. 385
386
CHAPTER 15. FUNDAMENTALS Proof: Let ε > 0 be given and let S ≡ {t ∈ [a, b] : for all s ∈ [a, t] , kh (s) − h (a)k ≤ ε (s − a)}
Then a ∈ S. Let t = sup S. Then by continuity of h it follows kh (t) − h (a)k ≤ ε (t − a)
(15.5)
Suppose t < b. If strict inequality holds, then by continuity of h strict inequality will persist for s near t and t 6= sup S after all. Thus it must be the case that kh (t) − h (a)k = ε (t − a) Since t < b, there exist positive numbers, hk decreasing to 0 such that kh (t + hk ) − h (a)k > ε (t − a + hk ) Then ε (t − a + hk ) < kh (t + hk ) − h (a)k ≤ kh (t + hk ) − h (t)k + kh (t) − h (a)k = kh (t + hk ) − h (t)k + ε (t − a) and so εhk < kh (t + hk ) − h (t)k Now divide by hk and take a limit. This shows ε ≤ kh0 (t)k which does not happen. Hence t = b and so kh (b) − h (a)k ≤ ε (b − a) since ε > 0 is arbitrary, it follows that h (b) = h (a). Also there is a fundamental theorem about intersections. As before, a set F is closed if its complement is open or equivalently if it contains all of its limit points. The proof is exactly as done earlier.
Theorem 15.1.2 Let Fn ⊇ Fn+1 · · · where each Fn is a closed set in X a Banach space and suppose that the diameter of Fn converges to 0 as n → ∞. Then there exists a unique point in ∩∞ n=1 Fn . Proof: Obviously there can be no more than one point in the intersection because if x, y are two points, then kx − yk > δ > 0 for some δ but eventually both points would be in Fn where n is so large that the diameter of Fn is less than δ. As to existence of the point in the intersection, pick pn ∈ Fn . It is a Cauchy sequence and so it converges to p. Now the pk for k ≥ n is in Fn , a closed set so by Corollary 2.3.10 applied to X rather than Fp , it follows that p ∈ Fn , this for each n. Hence p ∈ ∩∞ n=1 Fn .
15.2
The Cauchy Riemann Equations
These fundamental equations pertain to a complex valued function of a complex variable. Recall the complex numbers should be considered as points in the plane. Thus a complex number is of the form x + iy where i2 = −1. The complex conjugate is defined by x + iy ≡ x − iy
15.2. THE CAUCHY RIEMANN EQUATIONS
387
and for z a complex number, 1/2
|z| ≡ (zz)
=
p
x2 + y 2 .
Thus when x + iy is considered an ordered pair (x, y) ∈ R2 the magnitude of a complex number is nothing more than the usual norm of the ordered pair. Also for z = x + iy, w = u + iv, q |z − w| =
2
2
(x − u) + (y − v)
so in terms of all topological considerations, R2 is the same as C. Thus to say z → f (z) is continuous, is the same as saying (x, y) → u (x, y) , (x, y) → v (x, y) are continuous where f (z) ≡ u (x, y) + iv (x, y) with u and v being called the real and imaginary parts of f . The only new thing is that writing an ordered pair (x, y) as x + iy with the convention i2 = −1 makes C into a field. Now here is the definition of what it means for a function to be analytic.
Definition 15.2.1
Let U be an open subset of C (R2 ) and let f : U → C be a function. Then f is said to be analytic on U if for every z ∈ U, lim
∆z→0
f (z + ∆z) − f (z) ≡ f 0 (z) ∆z
exists and is a continuous function of z ∈ U . For a function having values in C denote by u (x, y) the real part of f and v (x, y) the imaginary part. Both u and v have real values and f (x + iy) ≡ f (z) ≡ u (x, y) + iv (x, y)
Definition 15.2.2
More generally, suppose f has values in X, a complex normed complete vector space. Then define f to be analytic on an open subset U of C if f (z + h) − f (z) ≡ f 0 (z) h→0 h lim
exists and f 0 is a continuous function of z ∈ U . This time, the limit takes place in terms of the norm on X. Meaning the usual thing: For every ε > 0 there exists δ > 0 such that
0 f (z+h)−f (z) < ε whenever 0 < |h| < δ. Here there are no real and imaginary
f (z) −
h parts to consider. First are some simple results in the case that f has values in C.
Proposition 15.2.3 Let U be an open subset of C . Then f : U → C is analytic if and only if for f (x + iy) ≡ u (x, y) + iv (x, y) u (x, y) , v (x, y) being the real and imaginary parts of f , it follows ux (x, y) = vy (x, y) , uy (x, y) = −vx (x, y) and all these partial derivatives, ux , uy , vx , vy are continuous on U . (The above equations are called the Cauchy Riemann equations.) Proof: First suppose f is analytic. First let ∆z = ih and take the limit of the difference quotient as h → 0 in the definition. Thus from the definition, f (z + ih) − f (z) ih u (x, y + h) + iv (x, y + h) − (u (x, y) + iv (x, y)) = lim h→0 ih 1 = lim (uy (x, y) + ivy (x, y)) = −iuy (x, y) + vy (x, y) h→0 i
f 0 (z) ≡
lim
h→0
388
CHAPTER 15. FUNDAMENTALS
Next let ∆z = h and take the limit of the difference quotient as h → 0. f (z + h) − f (z) h→0 h u (x + h, y) + iv (x + h, y) − (u (x, y) + iv (x, y)) = lim h→0 h = ux (x, y) + ivx (x, y) .
f 0 (z) ≡
lim
Therefore, equating real and imaginary parts, ux = vy , vx = −uy
(15.6)
and this yields the Cauchy Riemann equations. Since z → f 0 (z) is continuous, it follows the real and imaginary parts of this function must also be continuous. Thus from the above formulas for f 0 (z) , it follows from the continuity of z → f 0 (z) all the partial derivatives of the real and imaginary parts are continuous. Next suppose the Cauchy Riemann equations hold and these partial derivatives are all continuous. For ∆z = h + ik, f (z + ∆z) − f (z) = u (x + h, y + k) + iv (x + h, y + k) − (u (x, y) + iv (x, y)) = ux (x, y) h + uy (x, y) k + i (vx (x, y) h + vy (x, y) k) + o ((h, k)) = ux (x, y) h + uy (x, y) k + i (vx (x, y) h + vy (x, y) k) + o (∆z) This follows since C 1 implies differentiable along with the definition of the norm (absolute value) in C. By the Cauchy Riemann equations this equals =
ux (x, y) h − vx (x, y) k + i (vx (x, y) h + ux (x, y) k) + o (∆z)
=
ux (x, y) (h + ik) + ivx (x, y) (h + ik) + o (∆z)
=
ux (x, y) ∆z + ivx (x, y) ∆z + o (∆z)
Dividing by ∆z and taking a limit yields f 0 (z) exists and equals ux (x, y) + ivx (x, y) which are assumed to be continuous. For functions of a real variable, it is perfectly possible for the derivative to exist and not be continuous. For example, consider 2 x sin x1 if x 6= 0 f (x) ≡ 0 if x = 0 You can verify that f 0 (x) exists for all x but at 0 this derivative is not continuous. This will NEVER happen with functions of a complex variable. This will be shown later when it is more convenient. For now make continuity of f 0 part of the requirement for f to be analytic.
15.3
Contour Integrals
In the theory of functions of a complex variable, the most important results are those involving contour integration. I will use contour integration on curves of bounded variation as in [16], [51], [35] and referred to in [24]. This is more general than piecewise C 1 curves but most results can be obtained from only considering the special case. The most important tools in complex analysis are Cauchy’s theorem in some form and Cauchy’s formula for an analytic function. This section will give one of the very best versions of these theorems. They all involve something called a contour integral. Now a contour integral is just a sort of line integral. As earlier, γ ∗ will denote the set of points and γ will denote a parametrization. Here is the definition. It should look familiar and resemble a corresponding definition for line integrals presented earlier.
15.3. CONTOUR INTEGRALS
389
Definition 15.3.1
Let γ : [a, b] → C be continuous and of bounded variation and let f : γ ∗ → X where X is a complete complex normed linear space, usually C. Letting P ≡ {t0 , · · · , tn } where a = t0 < t1 < · · · < tn = b, define kP k ≡ max {|tj − tj−1 | : j = 1, · · · , n} and the Riemann Stieltjes sum by S (P ) ≡
n X
f (γ (τ j )) (γ (tj ) − γ (tj−1 ))
j=1
where τ j ∈ [tj−1 , tj ] . (Note this notation is a little sloppy because it does notR identify the specific point, τ j used. It is understood that this point is arbitrary.) Define γ f (z) dz as the unique number which satisfies the following condition. For all ε > 0 there exists a δ > 0 such that if kP k ≤ δ, then Z f (z) dz − S (P ) < ε. γ
Sometimes this is written as Z f (z) dz ≡ lim S (P ) . kP k→0
γ
You note that this is essentially the same definition given earlier for the line integral only this time the function has values in C (more generally X) rather than Rn and there is no dot product involved. Instead, you multiply by the complex number γ (tj ) − γ (tj−1 ) in the Riemann Stieltjes sum. Since the contour integral is defined in terms of limits of sums, it follows that the contour integral is linear because sums are linear. This is just like what was done earlier for line integrals. The fundamental result in this subject is the following theorem.
Theorem 15.3.2
Let f :Rγ ∗ → X be continuous and let γ : [a, b] → C be continuous and of bounded variation. Then γ f dz exists. Also letting δ m > 0 be such that |t − s| < δ m 1 implies kf (γ (t)) − f (γ (s))k < m ,
Z
f dz − S (P ) ≤ 2V (γ, [a, b])
m γ
whenever kP k < δ m . In addition to this, if γ has a continuous derivative on [a, b] , the derivative taken from left or right at the endpoints, then Z Z b f dz = f (γ (t)) γ 0 (t) dt (15.7) γ
a
Proof: The function, f ◦ γ , is uniformly continuous because it is defined on a compact set. Therefore, there exists a decreasing sequence of positive numbers, {δ m } such that if |s − t| < δ m , then 1 kf (γ (t)) − f (γ (s))k < . m Let Fm ≡ {S (P ) : kP k < δ m }. Thus Fm is a closed set. (The symbol, S (P ) in the above definition, means to include all sums corresponding to P for any choice of τ j .) It is shown that diam (Fm ) ≤
2V (γ, [a, b]) m
(15.8)
390
CHAPTER 15. FUNDAMENTALS
and then it will follow there exists a unique point, I ∈ ∩∞ m=1 Fm . This is because RX, the space where f has its values is complete and Theorem 15.1.2. It will then follow I = γ f dz. To verify 15.8, it suffices to verify that whenever P and Q are partitions satisfying kP k < δ m and kQk < δ m , 2 (15.9) kS (P ) − S (Q)k ≤ V (γ, [a, b]) . m Suppose kP k < δ m and Q ⊇ P . Then also kQk < δ m . To begin with, suppose that P ≡ {t0 , · · · , tp , · · · , tn } and Q ≡ {t0 , · · · , tp−1 , t∗ , tp , · · · , tn } . Thus Q contains only one more point than P . Letting S (Q) and S (P ) be Riemann Steiltjes sums, S (Q) ≡
p−1 X
f (γ (σ j )) (γ (tj ) − γ (tj−1 )) + f (γ (σ ∗ )) (γ (t∗ ) − γ (tp−1 ))
j=1
∗
∗
+f (γ (σ )) (γ (tp ) − γ (t )) +
n X
f (γ (σ j )) (γ (tj ) − γ (tj−1 )) ,
j=p+1
S (P ) ≡
p−1 X
f (γ (τ j )) (γ (tj ) − γ (tj−1 )) +
j=1 =f (γ(τ p ))(γ(tp )−γ(tp−1 ))
}| { z f (γ (τ p )) (γ (t∗ ) − γ (tp−1 )) + f (γ (τ p )) (γ (tp ) − γ (t∗ )) n X
+
f (γ (τ j )) (γ (tj ) − γ (tj−1 )) .
j=p+1
Therefore, kS (P ) − S (Q)k ≤
p−1 X 1 1 |γ (tj ) − γ (tj−1 )| + |γ (t∗ ) − γ (tp−1 )| + m m j=1
n X 1 1 1 |γ (tp ) − γ (t∗ )| + |γ (tj ) − γ (tj−1 )| ≤ V (γ, [a, b]) . m m m j=p+1
(15.10)
Clearly the extreme inequalities would be valid in 15.10 if Q had more than one extra point. You simply do the above trick more than one time. Let S (P ) and S (Q) be Riemann Steiltjes sums for which kP k and kQk are less than δ m and let R ≡ P ∪ Q. Then from what was just observed, kS (P ) − S (Q)k ≤ kS (P ) − S (R)k + kS (R) − S (Q)k ≤
2 V (γ, [a, b]) . m
∞ and this shows 15.9 which proves R 15.8. Therefore, there exists a unique point, I ∈ ∩m=1 Fm which satisfies the definition of γ f dz. Now consider the case where γ is C 1 . By uniform continuity, there is δ > 0 such that 0 γ (s) − γ (t) − γ (s) ≡ f (t, s) < ε (15.11) t−s
if |t − s| < δ. To see this, note that f : [a, b] × [a, b] → [0, ∞) is continuous if the difference quotient is defined to equal γ 0 (s) when t = s. Since this product is a compact set, it t − tˆ, s − sˆ < δ, then follows that for each ε > 0 there exists a δ > 0 such that if f (t, s) − f tˆ, sˆ < ε. So pick tˆ = sˆ = s. Then if |t − s| < δ, |f (t, s)| < ε. Thus |γ 0 (s) (t − s) − (γ (t) − γ (s))| < ε |t − s|
(15.12)
15.3. CONTOUR INTEGRALS
391
Then in the above argument, let kPm k < min (η m , δ m ) where η m corresponds in the 1 above to ε = m . From the above fundamental estimate,
Z
f dz − S (Pm ) ≤ 2V (γ, [a, b])
m γ for any sum kPm k < δ m . In particular, for such a partition t0 , · · · , tn , dependence of n on m being suppressed,
Z
n X
2V (γ, [a, b])
f dz − f (γ (tj−1 )) (γ (tj ) − γ (tj−1 ))
≤ m
γ
j=1 Using 15.12 and the triangle inequality,
Z
n X
0
f dz − f (γ (tj−1 )) γ (tj−1 ) (tj − tj−1 )
γ
j=1
X
n
0
f (γ (tj−1 )) (γ (tj−1 ) (tj − tj−1 ) − (γ (tj ) − γ (tj−1 ))) −
j=1
≤
2V (γ, [a, b]) m
Therefore,
Z
n X
0
f dz −
f (γ (t )) γ (t ) (t − t ) j−1 j−1 j j−1
γ
j=1 n
≤
X 1 2V (γ, [a, b]) +M (tj − tj−1 ) m m j=1
(15.13)
=
2V (γ, [a, b]) M + (b − a) m m
(15.14)
where M is an upper bound for kf (z)k for z ∈ γ ∗ , M existing because of the extreme value theorem. However, the first part shows as a special case that, since t → f (γ (t)) γ 0 (t) is Rb continuous, the integral a f (γ (t)) γ 0 (t) dt also exists. Let γ (t) = t for example. It clearly has bounded variation. Thus, in particular,
n
Z b
X
0 0
lim f (γ (tj−1 )) γ (tj−1 ) (tj − tj−1 ) − f (γ (t)) γ (t) dt
=0 m→∞ a
j=1
Therefore, passing to the limit in both sides of 15.14 yields the desired conclusion in 15.7.
Observation 15.3.3 In the case that γ ∗ = [a, b] an interval on the real line, 15.7 shows R R b
that if γ is oriented from a to b, then γ f (z) dz = a f (z) dz and if γ is oriented from b to R Ra a, then γ f (z) dz = b f (z) dz where the notation on the right signifies the usual Riemann integral. In the case that f has complex values, it is reasonable to ask for the real and imaginary parts of the contour integral. It turns out these are just line integrals. Let z = x + iy and let f (z) ≡ u (x, y) + iv (x, y). Also let the parametrization be γ (t) ≡ x (t) + iy (t). Then a term in the approximating sum is of the form (u + iv) (x (ti ) + iy (ti ) − (x (ti−1 ) + iy (ti−1 ))) =
(u + iv) ((x (ti ) − x (ti−1 )) + i (y (ti ) − y (ti−1 )))
392
CHAPTER 15. FUNDAMENTALS u (x (ti ) − x (ti−1 )) − v (y (ti ) − y (ti−1 ))
=
+i [v (x (ti ) − x (ti−1 )) + u (y (ti ) − y (ti−1 ))] Thus the contour integral is really the sum of two line integrals Z Z (u (x, y) , −v (x, y)) · dr + i (v (x, y) , u (x, y)) · dr C
C
where r (t) ≡ (x (t) , y (t)). This shows the following simple observation.
Observation 15.3.4 When f : γ ∗ → C is continuous for γ a bounded variation curve in C, it follows that Z Z Z f dz = (u (x, y) , −v (x, y)) · dr + i (v (x, y) , u (x, y)) · dr γ
C
C
where r (t) ≡ (Re γ (t) , Im γ (t)) , u (x, y) ≡ Re f (x, y) , and v (x, y) ≡ Im f (x, y) and C is the oriented curve whose parametrization is r (t). As in the case of line integrals, these contour integrals are independent of parametrization in the sense that R if γ (t)R = η (s) where t = t (s) with s → t (s) an increasing continuous function, then γ f dz = η f dw.
Definition 15.3.5
If one reverses the order in which points of γ ∗ are encountered, then one replaces γ with −γ in which, for γ : [a, b] → C, −γ (t) encounters the points of γ ∗ in the opposite order, the definition of the contour integral shows that Z Z − f dz = f dz γ
−γ
You could get a parametrization for −γ as −γ (t) ≡ γ (b − t) for t ∈ [0, b − a] or if you wanted to use the same interval, define −γ : [a, b] → C by −γ (t) ≡ γ (b + a − t) . Now this yields an easy way to check orientation of a rectifiable simple closed curve. Recall that these simple closed curves are homeomorphisms of S 1 and there are exactly two orientations.
Theorem 15.3.6
Let Γ be a simple closed curve in C and let z ∈ Ui , the inside component of ΓC . Then for γ a parametrization of Γ, Z 1 1 dw = ±1 n (γ, z) ≡ 2πi γ w − z depending on the orientation of Γ. If z ∈ / Ui ∪ Γ, the integral equals 0. Proof: Let the parametrization be γ (t) = x (t) + iy (t) , t ∈ [0, 2π] . Then from Obser1 vation 15.3.4, and writing w−z in real and imaginary parts with z = a + ib, the contour integral is ! Z x−a y−b · dr 2 2, 2 2 (x − a) + (y − b) (x − a) + (y − b) γ ! Z − (y − b) x−a +i · dr 2 2, 2 2 (x − a) + (y − b) (x − a) + (y − b) γ Using Green’s theorem as in Proposition 12.6.3, we can reduce the integral over Γ to an integral over Cr where Cr is a small circle centered at (a + ib). The orientation of the small
15.3. CONTOUR INTEGRALS
393
circle depends on the orientation of Γ. If this circle has radius r, and if the circle is oriented counter clockwise, x = a + r cos t, y = b + r sin t, these two integrals taken over Cr reduce to 2π
r cos t r sin t , · (−r sin t, r cos t) dt r2 r2 0 Z 2π −r sin t r cos t , · (−r sin t, r cos t) dt +i r2 r2 0
Z
The first is 0 and the second is 2πi. If one takes the opposite orientation of Γ, then the opposite orientation must also occur for Cr and this would involve replacing t with −t in the parameterization. The first integral would be 0 the same way and the second is −2πi by a simple computation. Thus, when the point is inside the curve Γ the above integral yields ±1 depending on which way Γ is oriented. If a + ib is not on the inside of Γ,nor on Γ. Then you can directly apply Green’s theorem and get 0 in both integrals because the vector field is C 1 on Ui ∪ Γ and in each case, Qx − Py = 0. R 1 1 The expression 2πi dw ≡ n (γ, z) is called the winding number. It has been γ w−z discussed here for a simple closed curve, but it applies to a general curve also. This can be considered later from a different point of view.
Definition 15.3.7 Given Γ a simple closed curve, the orientation is said to be positive if the winding number is 1 and negative if the winding number is −1. The following theorem follows easily from the above definitions and theorem. One can also see from the definition that something like the triangle inequality will hold. This is contained in the next theorem.
Theorem 15.3.8
Let f be continuous on γ ∗ having values in a complex Banach space X, writen as f ∈ C (γ ∗ , X) and let γ : [a, b] → C be of bounded variation and continuous. Let M ≥ max {kf ◦ γ (t)k : t ∈ [a, b]} . (15.15) Then
Z
f dz ≤ M V (γ, [a, b]) .
(15.16)
γ
Also if {fn } is a sequence of functions of C (γ ∗ , X) which is converging uniformly to the function f on γ ∗ , then Z Z lim fn dz = f dz. (15.17) n→∞
γ
γ
Proof: Let 15.15 hold. From the proof of Theorem 15.3.2, when kP k < δ m ,
Z
f dz − S (P ) ≤ 2 V (γ, [a, b])
m γ and so
Z
f dz ≤ kS (P )k + 2 V (γ, [a, b])
m γ
≤
n X
M |γ (tj ) − γ (tj−1 )| +
j=1
≤
M V (γ, [a, b]) +
2 V (γ, [a, b]) m
2 V (γ, [a, b]) . m
394
CHAPTER 15. FUNDAMENTALS
This proves 15.16 since m is arbitrary. To verify 15.17 use the above inequality to write
Z
Z Z
f dz − fn dz = (f − fn ) dz
γ
γ
γ
≤ max {kf ◦ γ (t) − fn ◦ γ (t)k : t ∈ [a, b]} V (γ, [a, b]) . Since the convergence is assumed to be uniform, this proves 15.17. Now suppose γ is continuous and bounded variation and f : γ ∗ × K → C is continuous where K is compact. Then consider the function Z F (w) ≡ f (z, w) dz γ
Lemma 15.3.9 The function F just defined is continuous. Proof: The function f is uniformly continuous since it is continuous on a compact set. Therefore, there exists δ > 0 such that if |(z1 , w1 ) − (z2 , w2 )| < δ, then kf (z1 , w1 ) − f (z2 , w2 )k < ε. It follows that if |w1 − w2 | < δ, then from Theorem 15.3.8
Z
≤ εV (γ, [a, b]) (f (z, w ) − f (z, w )) dz kF (w1 ) − F (w2 )k = 1 2
γ
Since ε is arbitrary, this proves the lemma. With this lemma, it becomes easy to give a version of Fubini’s theorem.
Theorem 15.3.10
Let γ i be continuous and bounded variation. Let f be continuous on γ ∗1 × γ ∗2 having values in X a complex complete normed linear space. Then Z Z Z Z f (z, w) dwdz = f (z, w) dzdw γ1
γ2
γ2
γ1
Proof: This follows quickly from the above lemma and the definition of the contour integral. Say γ i is defined on [ai , bi ]. Let a partition of [a1 , b1 ] be denoted by {t0 , t1 , · · · , tn } = P1 and a partition of [a2 , b2 ] be denoted by {s0 , s1 , · · · , sm } = P2 . Z
Z f (z, w) dwdz =
γ1
γ2
=
n Z X
i=1 j=1
γ 1 ([ti−1 ,ti ])
f (z, w) dwdz
γ 1 ([ti−1 ,ti ])
i=1 n X m Z X
Z γ2
Z f (z, w) dwdz γ 2 ([sj−1 ,sj ])
To save room, denote γ 1 ([ti−1 , ti ]) by γ 1i and γ 2 ([sj−1 , sj ]) by γ 2j Then if kPi k , i = 1, 2 is small enough, Theorem 15.3.8 implies
Z Z
Z Z
f (z, w) dwdz − f (γ 1 (ti ) , γ 2 (sj )) dwdz
γ 1i γ 2j
γ 1i γ 2j
Z Z
= (f (z, w) − f (γ 1 (ti ) , γ 2 (sj ))) dwdz ≤
γ 1i γ 2j
Z
!
max (f (z, w) − f (γ 1 (ti ) , γ 2 (sj ))) dw V (γ 1 , [ti−1 , ti ])
γ 2j
≤ εV (γ 2 , [sj−1 , sj ]) V (γ 1 , [ti−1 , ti ])
(15.18)
15.4. PRIMITIVES AND CAUCHY GOURSAT THEOREM
395
Also from this theorem,
Z Z
Z Z
f (z, w) dzdw − f (γ 1 (ti ) , γ 2 (sj )) dzdw
γ 2j γ 1i
γ 2j γ 1i
!
Z
(f (z, w) − f (γ 1 (ti ) , γ 2 (sj ))) dz V (γ 2 , [sj−1 , sj ]) ≤ max
γ 1i ≤ εV (γ 2 , [sj−1 , sj ]) V (γ 1 , [ti−1 , ti ]) (15.19) R Now approximating with sums and using the definition, γ dz = γ 1 (tj ) − γ 1 (tj−1 ) and so 1i
Z
Z
Z
Z
f (γ 1 (ti ) , γ 2 (sj )) dzdw = f (γ 1 (ti ) , γ 2 (sj )) γ 2j
γ 1i
dzdw γ 2j
Z
Z
= f (γ 1 (ti ) , γ 2 (sj ))
Z
Z
dwdz = γ 1i
γ 2j
γ 1i
f (γ 1 (ti ) , γ 2 (sj )) dwdz (15.20) γ 1i
γ 2j
Therefore,
Z Z
Z Z
f (z, w) dwdz − f (z, w) dzdw ≤
γ1 γ2
γ2 γ1
Pn Pm R R
i=1 j=1 γ 1i γ 2j f (z, w) dwdz
P R
n Pm R
− i=1 j=1 γ 1i γ 2j f (γ 1 (ti ) , γ 2 (sj )) dwdz
Pn Pm R R
f (γ 1 (ti ) , γ 2 (sj )) dwdz i=1 j=1 γ
γ + Pn Pm R1i R2j
− i=1 j=1 γ 2j γ 1i f (γ 1 (ti ) , γ 2 (sj )) dzdw
Pn Pm R R
i=1 j=1 γ 2j γ 1i f (γ 1 (ti ) , γ 2 (sj )) dzdw
Pn Pm R R +
− i=1 j=1 γ γ f (z, w) dzdw
2j 1i
From 15.20 the middle term is 0. Thus, from the estimates 15.19 and 15.18,
Z Z Z Z
f (z, w) dzdw f (z, w) dwdz −
γ1 γ2 γ2 γ1 ≤
2εV (γ 2 , [a2 , b2 ]) V (γ 1 , [a1 , b1 ])
Since ε is arbitrary, the two integrals are equal.
15.4
Primitives and Cauchy Goursat Theorem
In beginning calculus, the notion of an antiderivative was very important. It is similar for functions of complex variables. The role of a primitive is also a lot like a potential in computing line integrals, which should be familiar from calculus.
Definition 15.4.1
A function F such that F 0 = f is called a primitive of f.
See how it acts a lot like a potential, the difference being that a primitive has complex, not real values. In calculus, in the context of a function of one real variable, this is often called an antiderivative and every continuous function has one thanks to the fundamental theorem of calculus. However, it will be shown below that the situation is not at all the same for functions of a complex variable. So what if a function has a primitive? Say F 0 (z) = f (z) where f is continuous with values in X a complex Banach space.
396
CHAPTER 15. FUNDAMENTALS
Theorem 15.4.2
Suppose γ is continuous and bounded variation, a parametrization of Γ where t ∈ [a, b] . Suppose f : γ ∗ → X is continuous and has a primitive F . Thus F 0 (z) = f (z) for some Ω ⊇ γ ∗ . Then Z f (z) dz = F (γ (b)) − F (γ (a)) γ
Proof: Define the following function ( h (s, t) ≡
F (γ(t))−F (γ(s)) γ(t)−γ(s)
− f (γ (s)) if γ (t) 6= γ (s) 0 if γ (t) = γ (s)
Then h is continuous at points (s, s). To see this note that the top line is
=
f (γ (s)) (γ (t) − γ (s)) + o (γ (t) − γ (s)) − f (γ (s)) γ (t) − γ (s) o (γ (t) − γ (s)) γ (t) − γ (s)
thus if (sn , tn ) → (s, s) , the continuity of γ requires this to also converge to 0. That is, lim
n→∞
o (γ (tn ) − γ (sn )) =0 γ (tn ) − γ (sn )
In case γ (sn ) = γ (tn ) , then there is nothing to show because then h (sn , tn ) = h (s, s) = 0. Claim: Let ε > 0 be given. There exists δ > 0 such that if |t − s| < δ, then kh (s, t)k < ε. Proof of claim: If not, then for some ε > 0, there exists (sn , tn ) where |tn − sn | < 1/n but kh (sn , tn )k ≥ ε. Then by compactness, there is a subsequence, still called {sn } such that sn → s. It follows that tn → s also. Therefore, 0 = lim kh (sn , tn )k ≥ ε n→∞
a contradiction. Thus whenever |t − s| < δ,
F (γ (t)) − F (γ (s))
0 be given such that 2δ is smaller than both the height and width of Γ. Then there exist finitely many non n overlapping regions {Rk }k=1 consisting of simple closed rectifiable curves along with their interiors whose union equals Ui ∪ Γ. These regions consist of two kinds, those contained in Ui and those with nonempty intersection with Γ. These latter regions are called “border” regions. The boundary of a border region consists of straight line segments parallel to the coordinate axes of the form x = m 4δ or y = k 4δ for m, k integers along with arcs from Γ. The non border regions consist of rectangles. Thus all of these regions have boundaries which are rectifiable simple closed curves. Also each region is contained in a rectangle having sides of length no more than δ. There are at most 4V (Γ) 4 +1 δ
400
CHAPTER 15. FUNDAMENTALS
border regions. The construction also yields an orientation for Γ and for all these regions, the orientations for any line segment shared by two regions are opposite.
Theorem 15.4.7
Let Ui be the inside of Γ a simple closed rectifiable curve having parameterization γ. Also let f 0 (z) exist in Ui and f is continuous on Ui ∪ Γ. Here we assume only that f has values in a complex Banach space X. Then Z f (z) dz = 0. γ
Proof: Let Bδ , Iδ be those regions of Lemma 12.5.3 where as earlier Iδ are those which have empty intersection with Γ and Bδ are the border regions. Without loss of generality, assume Γ is positively oriented. Then since the line integrals cancel on adjacent sides of boundaries of regions, Z X Z X Z f (z) dz + f (z) dz = f (z) dz R∈Iδ
∂R
R∈Bδ
∂R
γ
In this case the first sum on the left in the above formula equals 0 from Corollary 15.4.5 for any δ > 0. Recall that there were at most 4V (Γ) +1 4 δ border regions where each of these border regions is contained in a box having sides of length no more than δ. Letting ε > 0 be given, suppose δ < 1 is so small that for |z − w| < 8δ with z, w ∈ Ui , |f (z) − f (w)| < ε this by uniform continuity of f on this compact set. Let R be a border region. Then picking z1 a point in R, Z Z Z f (z) dz = (f (z) − f (z1 )) dz + f (z1 ) dz ∂R
∂R
∂R
The last contour integral equals 0 because f (z1 ) has a primitive, namely F (z) = f (z1 ) z. It follows that
Z
Z
(f (z) − f (z1 )) dz f (z) dz =
≤ εV (∂R) ≤ ε4δ + εlR
∂R
∂R
where lR is the length of the part of Γ which is part of ∂R. It follows that whenever δ is small enough,
X Z
X Z X X
≤ f (z) dz ≤ f (z) dz ε4δ + εlR
∂R R∈Bδ ∂R R∈Bδ R∈Bδ R∈Bδ 4V (Γ) ≤ ε4δ 4 +1 + εV (Γ) δ ≤ 64εV (Γ) + 16ε + εV (Γ) So let δ n → 0 and let δ n correspond to εn = 1/n. Then the above estimate yields
X Z
≤ lim 65 V (Γ) + 16 = 0 lim f (z) dz
n→∞ n n→∞ n
R∈Bδn ∂R
Hence
Z
X
=0 f (z) dz f (z) dz = 0 + lim
n→∞ γ
R∈Bδn ∂R
Z
15.5. FUNCTIONS DIFFERENTIABLE ON A DISK, ZEROS
15.5
401
Functions Differentiable on a Disk, Zeros
It turns out that if a function has a derivative, then it has all of them, in contrast to functions of a real variable.
Theorem 15.5.1
(Morera1 ) Let Ω be an open set and let f 0 (z) exist for all z ∈ Ω. Let D ≡ B (z0 , r) ⊆ Ω. Then there exists ε > 0 such that f has a primitive on B (z0 , r + ε). Proof: Choose ε > 0 small enough that B (z0 , r + ε) ⊆ Ω. Then for w ∈ B (z0 , r + ε) , define Z F (w) ≡ f (u) du. γ(z0 ,w)
Then by the Cauchy Goursat theorem, and w ∈ B (z0 , r + ε) , it follows that for |h| small enough, Z F (w + h) − F (w) 1 = f (u) du h h γ(w,w+h) Z 1 Z 1 1 f (w + th) dt f (w + th) hdt = = h 0 0 which converges to f (w) due to the continuity of f at w. The winding number of a circle oriented counter clockwise is 1. This follows from Procedure 12.6.2. You could also simply parametrize the circle in the counter clockwise direction and compute the contour integral. Consider the following picture where you have a large circle of radius R and a small circle of radius r centered at z, a point on the inside of γ R . The Cauchy integral formula gives f (z) in terms of the values of f on the large circle. γR •z Γ2
•z0
γr
Γ1
Theorem 15.5.2 Let γ R be a positively oriented circle of radius R and let Ui be its inside. Suppose f : Ui ∪ γ ∗R → X is continuous and f has a derivative on Ui . Then if z ∈ Ui , Z f (w) 1 dw f (z) = 2πi γ R w − z Proof: Use −γ r for the orientation of the smaller circle. Then you have two simple closed curves joined at those two straight lines oriented as shown so that motion around both simple closed curves is counter clockwise. (This makes perfect sense for such a simple region.) The contour integrals cancel on the two straight lines. Also note that the winding number is 1 on each of these simple closed curves, labeled as Γ1 and Γ2 . See Procedure 1 Giancinto
Morera 1856-1909. This theorem or one like it dates from around 1886
402
CHAPTER 15. FUNDAMENTALS
12.6.2. Therefore, if g (z) has a derivative on A, the region between the two circles and continuous on γ ∗R ∪ γ ∗r ∪ A, then from the Cauchy integral theorem above, Z Z Z Z 0= g (w) dw + g (w) dw = g (w) dw − g (w) dw −γ r
γR
γR
γr
This results from using −γ r on those parts of Γi which lie on the small circle so these circular segments have the opposite orientation indicated in the picture. Now let g (w) =
f (w) . w−z
This is continuous on each Γi and has a derivative on the open set between the two circles. Thus Z Z f (w) f (w) dw = dw w − z γR γr w − z Now, since γ r is oriented positively, 1 Z f (w) − f (z) 1 Z f (w) dw − f (z) = dw 2πi γ r 2πi γ r w − z w−z
(15.22)
Since f 0 (z) exists, f (w) − f (z) w−z
f (z) + f 0 (z) (w − z) + o (w − z) − f (z) w−z o (w − z) = f 0 (z) + w−z =
Now f 0 (z) is a constant and so it has a primitive, namely w → f 0 (z) w. Thus 0. It follows that if r is sufficiently small, then 1 Z f (w) − f (z) 1 1 dw ≤ 2πrε = ε 2πi γ r w−z 2π r
R γr
f 0 (z) dw =
Thus, as r → 0, the right term in 15.22 converges to 0. It follows that Z Z f (w) 1 f (w) − f (z) 1 dw = lim dw + f (z) = f (z) r→0 2πi γ 2πi γ R w − z w−z r This is the Cauchy integral formula for a disk. This remarkable formula is sufficient to show that if a function has a derivative, then it has infinitely many and in fact, the function can be represented as a power series. When this is shown, it will be easy to give the general Cauchy integral formula for an arbitrary rectifiable simple closed curve. Let z0 be the center of the large circle. In the situation of Theorem 15.5.2, Z Z 1 f (w) 1 1 f (w) f (z) = dw = z−z0 dw 2πi γ R w − z0 − (z − z0 ) 2πi γ R w − z0 1 − w−z 0 z−z0 Now w−z = 0 1 2πi
|z−z0 | R
Z
< 1 for all w ∈ γ ∗R . Therefore, the above equals
∞ k X f (w) (z − z0 )
γ R k=0
k+1
(w − z0 )
1 dw = 2πi
Z γR
∞ k X (z − z0 ) k+1
k=0
(w − z0 )
! f (w) dw
15.5. FUNCTIONS DIFFERENTIABLE ON A DISK, ZEROS
403
If the partial sums of the above series converge uniformly on γ ∗R then by Theorem 15.3.8, ! Z ∞ k X 1 (z − z0 ) f (w) dw k+1 2πi γ R k=0 (w − z0 ) ! Z p k X 1 (z − z0 ) = lim f (w) dw k+1 p→∞ 2πi γ R k=0 (w − z0 ) p
1 X = lim p→∞ 2πi
k
(z − z0 )
Z
k+1
(w − z0 )
γR
k=0
f (w) dw
(15.23)
Which by definition is ∞ X k=0
1 2πi
Z
!
1
γR
(w − z0 )
k+1
f (w) dw (z − z0 )
k
It is assumed that f is continuous on Ui ∪ γ ∗R . Thus there is an upper bound for kf (w)kX , called M . Then for w ∈ γ ∗R ,
k+1 k
f (w) (z − z )k 1 |z − z0 | M |z − z0 |
0 ≤ M <
(w − z0 )k+1 |z − z0 | R R R and |z − z0 | /R < 1 so the right side is summable. Therefore, by Theorem 2.6.42, convergence is indeed uniform on γ ∗R and so ! Z ∞ ∞ X X 1 1 k k f (w) dw (z − z ) ≡ ak (z − z0 ) f (z) = 0 2πi γ R (w − z0 )k+1 k=0
k=0
This proves part of the next theorem which says, among other things, that when f has one derivative on the interior of a circle, then it must have all derivatives.
Theorem 15.5.3
Suppose z0 ∈ U, an open set in C and f : U → X has a derivative for each z ∈ U . Then if B (z0 , R) ⊆ U, then for each z ∈ B (z0 , R) , f (z) =
∞ X
n
an (z − z0 ) .
(15.24)
n=0
where
1 an ≡ 2πi
Z γR
1
n+1 f
(w) dw
(w − z0 )
and γ R is a positively oriented parametrization for the circle bounding B (z0 , R). Then f (k) (z0 ) = k!ak, 1/n
lim sup kan k
(15.25)
|z − z0 | < 1,
(15.26)
n→∞
f (k) (z) =
∞ X
n (n − 1) · · · (n − k + 1) an (z − z0 )
n−k
,
(15.27)
n=k
Proof: 15.24 follows from the above argument. Now consider 15.26. The above argument based on the Cauchy integral formula for a disk shows that if R > |ˆ z − z0 | > |z − z0 | , then f (ˆ z) =
∞ X n=0
n
an (ˆ z − z0 )
404
CHAPTER 15. FUNDAMENTALS
and so, by the root test, Theorem 2.8.1, 1 ≥ lim sup kan k
1/n
1/n
|ˆ z − z0 | > lim sup kan k
n→∞
|z − z0 |
n→∞
Consider 15.27 which involves identifying the an in terms of the derivatives of f . This is obvious if k = 0. Suppose it is true for k. Then for small h ∈ C, ∞ 1X 1 (k) n−k n−k f (z + h) − f (k) (z) = n (n − 1) · · · (n − k + 1) an (z + h − z0 ) − (z − z0 ) h h n=k
=
=
∞ n−k X n−k 1 X (n−k)−j n−k n (n − 1) · · · (n − k + 1) an hj (z − z0 ) − (z − z0 ) j h j=0 n=k+1 ∞ n−k X X n−k (n−k)−j n (n − 1) · · · (n − k + 1) an hj−1 (z − z0 ) j j=1
n=k+1 ∞ X
=
(n−k)−1
n (n − 1) · · · (n − k + 1) (n − k) an (z − z0 )
n=k+1
∞ X
+h
n−k X
n−k j
n (n − 1) · · · (n − k + 1) an
j=2
n=k+1
hj−2 (z − z0 )
(n−k)−j
(15.28)
By what was shown earlier, lim sup
n (n − 1) · · · (n − k + 1) kan k |z − z0 |
n−k
1/n
1/n
= lim sup kan k
n→∞
|z − z0 | < 1
n→∞
(15.29) Consider the part of 15.28 which multiplies h. Does the infinite series converge? Yes it does. In fact it converges absolutely. ∞ n−k X X n−k (n−k)−j j−2 n (n − 1) · · · (n − k + 1) an h (z − z ) 0 j j=2 n=k+1 ≤
∞ X
(n−2)−k
n (n − 1) · · · (n − k + 1) kan k |z − z0 |
j=2
n=k+1
≤
∞ X
n−k X
(n−2)−k
n (n − 1) · · · (n − k + 1) kan k |z − z0 |
n=k+1
1+
j−2
n−k j
|h|
j−2
|z − z0 |
|h| |z − z0 |
n−k
For all h small enough, this series converges, the infinite sum being decreasing in |h|. Indeed, lim sup
(n−2)−k
n (n − 1) · · · (n − k + 1) kan k |z − z0 |
n→∞
=
lim sup kan k
1/n
|z − z0 | 1 +
n→∞
|h| |z − z0 |
1+
|h| |z − z0 |
n−k !1/n
h > |h| 1/n ˆ and by the root test, Theorem 2.8.1, lim supn→∞ kan k h ≤ 1 so 1/n
lim sup kan k n→∞
|h| < 1
406
CHAPTER 15. FUNDAMENTALS
Now applying this to the sum in question, lim sup kan k
1/n
|h|
n−(k+1) n
= lim sup kan k
n→∞
1/n
|h| < 1
n→∞
Also the sum decreases in |h| and so
∞ ∞
X
X
n−(k+1) n−(k+1) ≤ lim |h| lim h an h kan k |h| =0
h→0 h→0 n=k+1
n=k+1
The tail of the series just described is sometimes referred to as “higher order terms”. The following is a remarkable result about the zeros of an analytic function on a connected open set. It turns out that if the set of zeros have a limit point, then the function ends up being constant.
Definition 15.5.8
Suppose f is an analytic function defined near a point, α where m f (α) = 0. Thus α is a zero of the function, f. The zero is of order m if f (z) = (z − α) g (z) where g is an analytic function which is not equal to zero at α.
Theorem 15.5.9 Let Ω be a connected open set (region) and let f : Ω → X be analytic. Then the following are equivalent. 1. f (z) = 0 for all z ∈ Ω 2. There exists z0 ∈ Ω such that f (n) (z0 ) = 0 for all n. 3. There exists z0 ∈ Ω which is a limit point of the set, Z ≡ {z ∈ Ω : f (z) = 0} . Proof: It is clear the first condition implies the second two. Suppose the third holds. Then for z near z0 ∞ X f (n) (z0 ) n (z − z0 ) f (z) = n! n=k
where k ≥ 1 since z0 is a zero of f. Suppose k < ∞. Then, k
f (z) = (z − z0 ) g (z) where g (z0 ) 6= 0. Letting zn → z0 where zn ∈ Z, zn 6= z0 , it follows k
0 = (zn − z0 ) g (zn ) which implies g (zn ) = 0. Then by continuity of g, we see that g (z0 ) = 0 also, contrary to the choice of k. Therefore, k cannot be less than ∞ and so z0 is a point satisfying the second condition. Now suppose the second condition and let n o S ≡ z ∈ Ω : f (n) (z) = 0 for all n . It is clear that S is a closed set which by assumption is nonempty. However, this set is also open. To see this, let z ∈ S. Then for all w close enough to z, f (w) =
∞ X f (k) (z) k=0
k!
k
(w − z) = 0.
15.6. THE GENERAL CAUCHY INTEGRAL FORMULA
407
Thus f is identically equal to zero near z ∈ S. Therefore, all points near z are contained in S also, showing that S is an open set. Now Ω = S ∪ (Ω \ S) , the union of two disjoint open sets, S being nonempty. It follows the other open set, Ω \ S, must be empty because Ω is connected. Therefore, the first condition is verified. (See the following diagram.) 1.) .%
& ←−
2.)
3.)
Note how radically different this is from the theory of functions of a real variable. Consider, for example the function 2 x sin x1 if x 6= 0 f (x) ≡ 0 if x = 0 which has a derivative for all x ∈ R and for which 0 is a limit point of the set, Z, even though f is not identically equal to zero. Here is a very important application called Euler’s formula. Recall that ez ≡ ex (cos (y) + i sin (y)) Is it also true that ez =
(15.32)
zk k=0 k! ?
P∞
Theorem 15.5.10
(Euler’s Formula) Let z = x + iy. Then ez =
∞ X zk k=0
k!
.
Proof: The Cauchy Riemann equations show that ez given by 15.32 is analytic. So is P∞ k exp (z) ≡ k=0 zk! . In fact the power series converges for all z ∈ C. Furthermore the two functions, ez and exp (z) agree on the real line which is a set which contains a limit point. Therefore, they agree for all values of z ∈ C. This formula shows the famous two identities, eiπ = −1 and e2πi = 1.
15.6
The General Cauchy Integral Formula
Now the following is the general Cauchy integral formula.
Theorem 15.6.1 Let Γ be a simple closed rectifiable curve and let γ be an oriented parametrization for Γ which has inside Ui . Also let f be continuous on Ui ∪ Γ and differentiable in Ui . Then if z ∈ Ui , Z f (w) 1 dw n (γ, z) f (z) = 2πi γ w − z Proof: Consider the function g (w) ≡
f (w)−f (z) w−z 0
if w 6= z f (z) if w = z
(15.33)
This is clearly continuous on Ui ∪ Γ. It is also clear that g 0 (w) exists if w 6= z. It remains to consider whether g 0 (z) exists. Then from the Theorem 15.5.3, we can write f (z + h) as
408
CHAPTER 15. FUNDAMENTALS
a power series in h whenever h is suitably small. f (z+h)−f (z) h
− f 0 (z)
h
= = =
1 1 1 00 1 000 0 2 3 0 f (z) h + f (z) h + f (z) h + · · · − f (z) h h 2! 3! 1 1 1 0 00 000 2 0 f (z) + f (z) h + f (z) h + · · · − f (z) h 2! 3! 1 00 1 000 f (z) + f (z) h + higher order terms 2! 3!
1 00 f (z). It follows that Thus the limit of the difference quotient exists and is 2! Z Z Z 1 1 f (w) 1 f (z) 0 = g (w) dw = dw − dw 2πi γ 2πi γ w − z 2πi γ w − z Z 1 f (w) = dw − f (z) n (γ, z) 2πi γ w − z
The following is a spectacular application. It is Liouville’s theorem.
Theorem 15.6.2
Suppose f is analytic on C and that kf (z)k is bounded for z ∈ C.
Then f is constant. Proof: It was shown above that if γ r is a counter clockwise oriented parametrization for the circle of radius r, then Z f (w) 1 dw if |z| < r f 0 (z) = 2πi γ r (w − z)2 and so
1 1 C2πr 2 2π r where |f (z)| < C for all z and this is true for any r so let r → ∞ and you can conclude that f 0 (z) = 0 for all z ∈ C. However, this shows that f (k) (z) = 0 for all z and for each k ≥ 1. Thus the power series for f (z) , which exists by Theorem 15.5.3, is |f 0 (z)| ≤
f (z) = f (0) +
∞ X f (k) (0) k=1
k!
z k = f (0) .
This leads right away to the shortest proof of the fundamental theorem of algebra.
Theorem 15.6.3
Let p (z) be a non constant polynomial with complex coefficients. Then p (z) = 0 for some z ∈ C. That is, p (z) has a root in C.
Proof: Suppose not. Then 1/p (z) is analytic on C. Also, the leading order term dominates the others and so 1/p (z) must be bounded. Indeed, lim|z|→∞ (1/ |p (z)|) = 0 and the continuous function z → 1/ |p (z)| achieves a maximum on any bounded ball centered at 0. By Liouville’s theorem, this quotient must be constant. However, by assumption, this does not take place. Hence there is a root of p (z).
15.7
Riemann sphere
I do not wish to emphasize the Riemann sphere in this book but some mention of it is 2 appropriate. Consider the unit sphere, S 2 given by (z − 1) + y 2 + x2 = 1. Define a map from the complex plane to the surface of this sphere as follows. Extend a line from the point, p in the complex plane to the point (0, 0, 2) on the top of this sphere and let θ (p) denote the point of this sphere which the line intersects. Define θ (∞) ≡ (0, 0, 2)
15.7. RIEMANN SPHERE
409
(0, 0, 2) • • (0, 0, 1)
• θ(p) p •
C
−1
Then θ is sometimes called sterographic projection. The mapping θ is clearly continuous because it takes converging sequences, to converging sequences. Furthermore, it is b consisting of C clear that θ−1 is also continuous. In terms of the extended complex plane C, along with a point called ∞, a sequence, zn converges to ∞ if and only if θzn converges to (0, 0, 2) if and only if |zn | converges to ∞ in the usual manner from calculus and a sequence, zn converges to z ∈ C if and only if θ (zn ) → θ (z) . This is interesting because of this last part. It gives a meaning for a sequence of complex numbers to converge to something called b and word everything in terms of ∞. To do this properly, we should define a metric on C this metric. However, it amounts to the same thing as saying what it means for sequences to converge. Then, with this definition of what it means for a sequence of complex numbers to converge to ∞, the usual definition of connected sets and separated sets is identical with what was given earlier.
Definition 15.7.1
b the extended complex plane in which this extra point Let S ⊆ C ∞ has been included as just described. Then S is separated if there exist A, B not both empty such that S = A ∪ B, A ∩ B = ∅ and no point of A is a limit of any sequence of points of B while no point of B is the limit of any sequence of points of A. If S is not separated, then it is called connected. Example 15.7.2 Consider the open set S ≡ {z ∈ C such that Im (z) > 0} . Then S ∪ b . {∞} ≡ Sb is connected in C It is obvious that S is connected in C because it is arcwise connected. Suppose Sb b . Then one of them, say B must contain = A ∪ B where these two new sets separate Sb in C ∞. Therefore, A is bounded since otherwise there would be a sequence of points of A converging to ∞ which is assumed not to happen. Then S = A ∪ (B \ {∞}) and A, B \ {∞} would separate S unless one is empty. If B \ {∞} = ∅, then S would be bounded which is not the case. Hence A = ∅. Thus Sb is connected.
Definition 15.7.3
Let S ⊆ C. It is said to be simply connected if the set is conb . Written more compactly, S is simply connected nected and C \ S ∪ {∞} is connected in C b b means S is connected and also C \S is connected in C.
When looking at a set S in C, how do you determine whether it is simply connected? You consider θ S C in S 2 and ask whether it is connected with the convention that if S C is unbounded, you must include (0, 0, 2) in the image of θ. Example 15.7.4 Consider the set S ≡ {z ∈ C such that |z| > 1} . This is a connected set, b \S is not connected. On S 2 it consists of a piece but it is not simply connected because C near the bottom of the sphere and the point (0, 0, 2) at the top. Example 15.7.5 Consider S ≡ {z ∈ C such that |z| ≤ 1} . This connected set is simply b \S corresponds to a connected set on S 2 . connected because C
410
CHAPTER 15. FUNDAMENTALS
15.8
Exercises
1. Suppose you have U ⊆ C an open set and f : U → C is analytic but has only real values. Find all possible f with these properties. 2. Suppose f : C → C is analytic. Let g (z) ≡ f (¯ z ) . Show that g 0 (z) does not exist 0 unless f (¯ z ) = 0. 3. Suppose f is an entire function (analytic on C) and suppose Re f is never 0. Show that f must be constant. Hint: Consider U = {(x, y) : Re f (x, y) > 0} , V = {(x, y) : Re f (x, y) < 0} . These are open and disjoint so one must be empty. If V is empty, consider 1/ef (z) . Use Liouville’s theorem. 4. Suppose f : C → C is analytic. Suppose also there is an estimate α
|f (z)| ≤ M (1 + |z| ) , α > 0 Show that f must be a polynomial. Hint: Consider the formula for the derivative in which γ r is positively oriented and a circle or radius r for r very large centered at 0, Z f (w) n! f (n) (z) = 2πi γ r (w − z)n+1 and pick large n. Then let r → ∞. P∞ n z 2n+1 5. Define for z ∈ C sin z ≡ k=0 (−1) (2n+1)! . That is, you just replace x with z. Give a similar definition for cos z, and ez . Show that the series converges for sin z and that a corresponding series converges for cos z. Then show that sin z =
eiz − e−iz eiz + e−iz , cos z = 2i 2
Show that it is not longer true that the functions sin z, cos z must be bounded in absolute value by 1. Hint: This is a very easy problem if you use the theorem about the zeros of an analytic function, Theorem 15.5.9. 6. Verify the identities cos (z − w) = cos z cos w + sin z sin w and similar identities. Hint: This is a very easy problem if you use the theorem about the zeros of an analytic function, Theorem 15.5.9. 7. Consider the following contour in which the large semicircle has radius R and the small one has radius r ≡ 1/R. y
x iz
The function z → ez is analytic on the curve and on its inside. Therefore, the contour integral with respect to the given R ∞ orientation is 0. Use this contour and the Cauchy integral theorem to verify that 0 sinz z dz = π/2 where this improper integral is defined as Z R sin z lim dz R→∞ −1/R z The function is actually not absolutely integrable and so the precise description of its meaning just given is important. You can use dominated convergence theorem
15.8. EXERCISES
411
to simplify some of the limits if you like but it is possible to establish the needed estimates through elementary means. I recommend using the dominated convergence R −z theorem. To do this, show that the integral over the large circle of CR e z dz → 0 as R → ∞ and verify that you get something else like −π for the integral over the small integral as r → 0. 8. A set U is star shaped if there exists a single point z0 ∈ U such that every segment from z0 to z is contained in U . Now suppose that U is star shaped and f : U → X is analytic. Show that f has a primitive on U . 9. Let U be what remains of C after (−∞, 0] is deleted. Explain why U is star R shaped. Letting γ (1, z) be the straight line segment from 1 to z, let f (z) = γ(1,z) w1 dw. Explain why f 0 (z) = z1 , f (1) = 0. Now explain why f is analytic and why ef (z) = z for all z ∈ U. Also formulate an assertion which says f (ez ) = z for suitable z. This f is the principal logarithm, denoted log (z). 10. Explain why one could delete any ray starting at 0 and obtain a function f (z) which is a primitive of 1/z. 11. For z ∈ C \ (−∞, 0], let arg (z) ≡ θ ∈ (−π, π) such that z = |z| eiθ . Show that log (z) = ln |z| + i arg (z) . 12. Suppose f (z) = u (x, y) + iv (x, u) is analytic. Show that both u, v satisfy Laplace’s equation, uxx + uyy = 0. 13. Suppose you have two complex numbers z = a + ib and w = x + iy. Show that the dot product of the two vectors (a, b) · (x, y) is Re ((a + ib) (x − iy)) = Re (z w) ¯ . 14. ↑Suppose you have two curves t → z (t) and s → w (s) which intersect at some point z0 corresponding to t = t0 and s = s0 . Show that the cosine of the angle θ between these two curves at this point is Re z 0 (t0 ) w0 (s0 ) cos (θ) = |z 0 (t0 )| |w0 (s0 )| Now suppose z → f (z) is analytic. Thus there are two curves t → f (z (t)) and s → f (w (s)) which intersect when t = t0 and s = s0 . Show that the angle between these two new curves at their point of intersection is also θ. This shows that analytic mappings preserve the angles between curves. 15. Suppose z = x + iy and f (z) = u (x, y) + iv (x, y) where f is analytic. Explain how level curves of u and v intersect in right angles. 16. Suppose Γ is a simple closed rectifiable curve and that γ is an oriented parametrization ˆ for Γ which is oriented positively, n (γ, z) = 1 for all z on the inside of Γ. Now let Γ be a simple closed rectifiable curve on the inside of Γ and let γˆ be an orientation of ˆ also oriented positively. Explain why, if z is on the inside of Γ ˆ and f is analytic on Γ R f (w) R f (w) the inside Ui of Γ, continuous on Ui ∪ Γ, then γˆ w−z dw = γ w−z dw and if z is on R f (w) R (w) ˆ Then the inside of Γ but outside of Γ, dw = 0 while γ fw−z dw = f (z) and if z γ ˆ w−z is outside of Γ then both integrals are 0. 17. Let Γ be a simple closed rectifiable curve in C. Let γ be a parametrization of Γ which has positive orientation. Thus n (γ, z) = 1 for all z inside Γ. Also suppose f is an analytic function on a connected open set containing Γ and its inside Ui . Suppose f is not identically zero and has no zeros on Γ. Explain why f has finitely many zeros on
412
CHAPTER 15. FUNDAMENTALS m
the inside of Γ. A zero a has multiplicity m if f (z) = (z − a) g (z) where g (z) 6= 0 on Ui . Let the zeros of f in Ui be {a1 , · · · , am } where there might be repeated numbers in this list, zeros of multiplicity higher than 1. Show that Z 0 1 f (z) m= dz (15.34) 2πi γ f (z) Thus you can count the zeros of an analytic function inside a simple closed curve by doing an integral! Hint: First of all, m is finite since if not, Theorem 15.5.9 implies that f (z) = 0 for all z since there Qm would be a limit point or else a zero of infinite order. Now argue that f (z) = k=1 (z − ak ) g (z) where g (z) is analytic and nonzero 0 (z) . Then use the fact that n (γ, z) = 1. on Ui . Use the product rule to simplify ff (z) 18. Give another very short proof of the fundamental theorem of algebra using the result of Problem 17 above. In fact, show directly that if p (z) is a polynomial of degree n then it has n roots counted according to multiplicity. Hint: Let p (z) be a polynomial. Then by the Euclidean algorithm, Lemma 1.6.2, you can see that there can be no more than n roots of the polynomial p (z) having complex coefficients. Otherwise the polynomial could not have degree n. You should show this. Now there must exist ΓR , a circle centered at 0 of radius R which encloses all roots of p (z). Letting m be the number of roots, Z p0 (z) 1 dz m= 2πi γ R p (z) Now write down in terms of an integral on [0, 2π] and let R → ∞ to get n in the limit on the right. Hence n = m. 19. Suppose now you have a rectifiable simple closed curve Γ and on Γ∗ , |f (z)| > |g (z)| where f, g are analytic on an open set containing Γ∗ . Suppose also that f has no zeros on Γ∗ . In particular, f is not identically 0. Let λ ∈ [0, 1]. (a) Verify that for λ ∈ [0, 1] , f + λg has no zeros on Γ∗ . 0 0 f 0 (z)+µg 0 (z) ∗ (f (z)+λg (z)) (b) Verify that on Γ , f (z)+λg(z) − f (z)+µg(z) ≤ C |µ − λ| . (c) Use Theorem 15.3.8 to show that for γ a positively oriented parametrization of Γ, Z 0 1 f (z) + λg 0 (z) λ→ dz 2πi γ f (z) + λg (z) is continuous. (d) Now explain why this shows that the number of zeros of f + λg on the inside of Γ is the same as the number of zeros of f on the inside of Γ. This is a version of Rouche’s theorem. 20. Give an extremely easy proof of the fundamental theorem of algebra as follows. Let γ R be a parametrization of the circle centered at 0 having radius R which has positive orientation so n (γ, z) = 1. Let p (z) be a polynomial an z n + an−1 z n−1 + · · · + a1 z + a0 . Now explain why you can choose R so large that |an z n | > an−1 z n−1 + · · · + a1 z + a0 for all |z| ≥ R. Using Problem 19 above explain why all zeros of p (z) are inside γ ∗R and why there are exactly n of them counted according to multiplicity. 21. The polynomial z 5 + z 4 − z 3 − 3z 2 − 5z + 1 = p (z) has no rational roots. You can check this by applying the rational root theorem from algebra. However, it has five complex roots. Also 4 z − z 3 − 3z 2 − 5z + 1 ≤ |z|4 + |z|3 + 3 |z|2 + 5 |z| + 1
15.8. EXERCISES
413
By graphing, observe that x5 − x4 + x3 + 3x2 + 5x + 1 > 0 for all x ≥ 2.4. Explain why the roots of p (z) are inside the circle |z| = 2.4. 22. This problem will feature the situation where the radius of the simple closed curve is sufficiently small. The zero counting integral can be used to prove an open mapping m theorem for analytic functions. Suppose you 0 have f (z) = f (z0 ) + φ (z) for z ∈ V an open set containing z0 and φ (z0 ) = 0, φ (z0 ) = 2r 6= 0, and m ∈ N. Let C (a, ρ) denote the positively oriented circle centered at a which has radius ρ. (a) Explain why there exists δ > 0 such that if |z − z0 | = δ, then B (z0 , δ) ⊆ V and φ (z) z − z0 ≥ r, |φ (z)| ≥ r |z − z0 | = rδ Therefore, if |w| < rδ, then if |z − z0 | = δ, |φ (z) − w| = 6 0. R 0 φ (z) 1 dz for |w| < δ 2r and Problem 17 to (b) Use continuity of w → 2πi C(z0 ,δ) φ(z)−w conclude that there exists ε < rδ such that if |w| < ε there is one zero for φ (z)−w in B (z0 , δ). In other words, φ (B (z0 , δ)) ⊇ B (0, ε) . Then also φm (B (z0 , δ)) ⊇ B (0, εm ). Hint: If you have w ∈ B (0, εm ) , then there are m mth roots of w 1/m
equally spaced around B 0, |w|
. Thus these roots are on a circle of radius
less than ε. Pick one. Call it w. ˆ Then there exists z ∈ B (z0 , δ) such that φ (z) = w. ˆ Then φm (z) = w. Fill in details. (c) Explain why f (B (z0 , δ)) ⊇ f (z0 ) + B (0, εm ) and why for w ∈ f (z0 ) + B (0, εm ) there are m different points in B (z0 , δ) , z1 , · · · , zm such that f (zj ) = w. 23. ↑Let Ω be an open connected set. Let f : Ω → C be analytic. Suppose f (Ω) is not m a single point. Then pick z0 ∈ Ω. Explain why f (z) = f (z0 ) + (z − z0 ) g (z) for all z ∈ V an open ball contained in Ω which contains z0 and g (z) 6= 0 in V, g (z) analytic. If this were not so, then z0 would be a zero of infinite order and by the theorem on zeros, Theorem 15.5.9, f (z) = f (z0 ) for all z ∈ Ω which is assumed not to happen. m Thus, every z0 in Ω has this property that near z0 , f (z) = f (z0 ) + (z − z0 ) g (z) for m nonzero g (z). Now explain why f (z) = f (z0 )+φ (z) where φ (z0 ) = 0 but φ0 (z0 ) 6= 0 and φ (z) is some analytic function. Thus from Problem 22 above, there is δ such that f (Ω) ⊇ f (z0 ) + B (0, εm ). Hence f (Ω) is open since each f (z0 ) is an interior point of m f (Ω). You only need to show that there is G (z) such that G (z) = g (z) and then φ (z) ≡ (z − z0 ) G (z) will work fine. When you have done this, Problem 22 will yield a proof of the open mapping theorem which says that if f is analytic on Ω a connected open set, then f (Ω) is either an open set or a single point. So here are some steps for doing this. (a) Consider z →
g 0 (z) g(z) .
It is analytic on the open ball V and so it has a primitive R 0 (w) on V . In fact, you could take h (z) ≡ γ(z0 ,z) gg(w) dw. 0 (b) Let the primitive be h (z) . Then consider g (z) e−h(z) . Show this equals 0. Then explain why this requires it to be constant. Explain why there is a + ib such that g (z) = eh(z)+a+ib . Then use the primitive h (z) + a + ib instead of the original one. Call it h (z). Then g (z) = eh(z) You can then complete the argument by letting g (z) 1/m
G (z) ≡ (z − z0 ) g (z)
1/m
≡ eh(z)/m and
414
CHAPTER 15. FUNDAMENTALS
24. If you have an open set U in C show that for all z ∈ U, |z| < sup {|w| : w ∈ U }. In other words, z → |z| never achieves its maximum on any open set U ∈ C. 25. Let f be analytic on U and let B (z, r) ⊆ U . Let γ r be the positively oriented boundary of B (z, r). Explain, using the Cauchy integral formula why |f (z)| ≤ max {|f (w)| : w ∈ γ ∗r } ≡ mr Show that if equality is achieved, then |f (w)| must be constantly equal to mr on γ ∗r . 26. The maximum modulus theorem says that if Ω is a bounded connected open set and f : Ω → C is analytic and f : Ω → C is continuous, then if |f | achieves its maximum at any point of Ω then f is equal to a constant on Ω. Thus |f | achieves its maximum on the boundary of Ω in every case. Hint: Suppose the maximum is achieved at a point of Ω, z0 . Then let B (z0 , r) ⊆ Ω. Show that if f is constant on B (z0 , r) , then it equals this constant on all of Ω using Theorem 15.5.9. However, if it is not constant, then from the open mapping theorem of Problem 23, f (B (z0 , r)) is an open set. Then use Problem 24 above to obtain a contradiction. 0 27. Let f : C → C be analytic with f (z) 6=0 for all z. Say f (x + iy) = u (x, y) + iv (x, y). u (x, y) is a C 1 mapping of R2 to R2 . Show that at Thus the mapping (x, y) → v (x, y) any point ux uy vx vy 6= 0
Therefore, by the inverse function theorem, Theorem 3.7.7, this mapping is locally one to one. However, the function does not need to be globally one to one. Give an easy example which shows this to be the case. 28. Let Γ be a simple closed rectifiable curve and let {fn } be a sequence of functions which are analytic on Ui , the inside of Γ and continuous on Γ∗ . Then if γ is a parametrization of Γ with n (γ, z) = 1 for z ∈ Ui , then Z 1 fn (w) fn (z) = dw 2πi γ w − z This is by the Cauchy integral formula presented above. Suppose fn converges uniformly on Γ∗ to a continuous function f . Show that then, for z ∈ Ui , and f (z) defined as Z 1 f (w) f (z) ≡ dw 2πi γ w − z It follows that fn (z) → f (z) for each z ∈ Ui and also f is analytic on Ui . Hint: You might use Theorem 15.3.8. This is very different than what happens with functions of a real variable in which uniform convergence of polynomials pn to f does not necessarily confer differentiability on f . For example, to approximate f , a continuous function having no derivatives or even a very easy function like f (x) = |x − (1/2)| for x ∈ [0, 1]. 29. The Schwarz lemma is as follows: Suppose F : B (0, 1) → B (0, 1) , F is analytic, and F (0) = 0. Then for all z ∈ B (0, 1) , |F (z)| ≤ |z| ,
(15.35)
|F 0 (0)| ≤ 1.
(15.36)
and If equality holds in 15.36 then there exists λ ∈ C with |λ| = 1 and F (z) = λz.
(15.37)
15.8. EXERCISES
415
P∞ Prove the Schwarz lemma. Hint: Since F has a power series of the form k=1 ak z k , it follows that F (z) /z equals an analytic function g (z) for all z ∈ B (0, 1). By the maximum modulus theorem, Problem 26 above, applied to g (z) , if |z| < r < 1, it F (z) 1 ≤ max F re ≤ . z t∈[0,2π] r r Explain why this implies F (z) ≤1 |g (z)| = z Now explain why limz→0 F (z) = F 0 (0) = g (0) and so |F 0 (0)| ≤ 1. It only remains to z 0 verify that if |F (0)| = 1, then F (z) is just a rotation as described. If |F 0 (0)| = 1, then the analytic function g (z) has the property that it achieves its maximum at an interior point. Apply 26 to conclude that g (z) must be a constant. Explain Problem = 1 for all z. Use this to conclude the proof. why this requires F (z) z 30. Sketch an example of two differentiable functions defined on [0, 1] such that their product is 0 but neither function is 0. Explain why this never happens for the set of analytic functions defined on an open connected set. In other words, if you have f g = 0 where f, g are analytic on D an open connected set, then either f = 0 or g = 0. For those who like to classify algebraically, this says that the set of analytic functions defined on an open connected set is an integral domain. It is clear that this set of functions is a ring with the usual operations. The extra ingredient is this observation that there are no nonzero zero divisors. Hint: To show this, consider D \ f −1 (0) an open set. If f −1 (0) = D, then you are done. Otherwise, you have g is 0 on an open set. Now use Theorem 15.5.9. 1 31. For D ≡ {z ∈ C : |z| < 1} , consider the function sin 1−z . Show that this function has infinitely many zeros in D. Thus there is a limit point to the set of zeros, but its limit point is not in D. It is good to keep this example in mind when considering Theorem 15.5.9.
416
CHAPTER 15. FUNDAMENTALS
Chapter 16
Isolated Singularities and Analytic Functions 16.1
Open Mapping Theorem for Complex Valued Functions
The open mapping theorem is for an analytic function with values in C. It is even more surprising result than the theorem about the zeros of an analytic function. The following proof of this important theorem uses an interesting local representation of the analytic function.
Theorem 16.1.1
(Open mapping theorem) Let Ω be a region in C and suppose f : Ω → C is analytic. Then f (Ω) is either a point or a region. In the case where f (Ω) is a region, it follows that for each z0 ∈ Ω, there exists an open set V containing z0 and m ∈ N such that for all z ∈ V, m f (z) = f (z0 ) + φ (z) (16.1) where φ : V → B (0, δ) is one to one, analytic and onto, φ (z0 ) = 0, φ0 (z) 6= 0 on V and φ−1 analytic on B (0, δ) . If f is one to one then m = 1 for each z0 and f −1 : f (Ω) → Ω is analytic. Proof: Suppose f (Ω) is not a point. Then if z0 ∈ Ω it follows there exists r > 0 such that f (z) 6= f (z0 ) for all z ∈ B (z0 , r) \ {z0 } . Otherwise, z0 would be a limit point of the set, {z ∈ Ω : f (z) − f (z0 ) = 0} which would imply from Theorem 15.5.9 that f (z) = f (z0 ) for all z ∈ Ω. Therefore, making r smaller if necessary and using the power series of f, m ? m 1/m ) f (z) = f (z0 ) + (z − z0 ) g (z) (= f (z0 ) + (z − z0 ) g (z)
for all z ∈ B (z0 , r) , where g (z) 6= 0 on B (z0 , r) . As implied in the above formula, one wonders if you can take the mth root of g (z) . g0 g is an analytic function on B (z0 , r) and so by Morera’s theorem, Theorem 15.5.1, it has a primitive on B (z0 , r) called h. Therefore by the product rule and the chain rule, ge−h
0
= g 0 e−h + g −e−h h0 g0 = g 0 e−h + g −e−h =0 g 417
418
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
and so there exists a constant, C = ea+ib such that on B (z0 , r) , ge−h = ea+ib . Therefore, g (z) = eh(z)+a+ib and so, modifying h by adding in the constant, a + ib it is still a primitive of g 0 /g and now 0 (z) g (z) = eh(z) where h0 (z) = gg(z) on B (z0 , r) . Letting φ (z) = (z − z0 ) e
h(z) m
implies formula 16.1 is valid on B (z0 , r) . Now φ (z0 ) = 0 but φ0 (z0 ) = e
h(z0 ) m
6= 0.
Shrinking r if necessary you can assume φ0 (z) 6= 0 on B (z0 , r). Is there an open set V contained in B (z0 , r) such that φ maps V onto B (0, δ) for some δ > 0? Let φ (z) = u (x, y) + iv (x, y) where z = x + iy. Consider the mapping x u (x, y) → y v (x, y) where u, v are C 1 because φ is given to be analytic. The Jacobian of this map at (x, y) ∈ B (z0 , r) is ux (x, y) uy (x, y) ux (x, y) −vx (x, y) = vx (x, y) vy (x, y) vx (x, y) ux (x, y) 2 2 2 = ux (x, y) + vx (x, y) = φ0 (z) 6= 0. This follows from a use of the Cauchy Riemann equations. Also u (x0 , y0 ) 0 = v (x0 , y0 ) 0 Therefore, by the inverse function theorem there exists an open set V, containing z0 and T δ > 0 such that (u, v) maps V one to one onto B (0, δ) . Thus φ is one to one onto B (0, δ) as claimed. Applying the same argument to other points z of V and using the fact that φ0 (z) 6= 0 at these points, it follows φ maps open sets to open sets. In other words, φ−1 is continuous. It also follows that φm maps V onto B (0, δ m ) . Indeed, m
|φ (z)|
m
= |φ (z) | .
Therefore, the formula 16.1 implies that f maps the open set V, containing z0 to an open set. This shows f (Ω) is an open set because z0 was arbitrary. It is connected because f is continuous and Ω is connected. Thus f (Ω) is a region (open and connected). It remains to verify that φ−1 is analytic on B (0, δ) . Since φ−1 is continuous, φ−1 (φ (z1 )) − φ−1 (φ (z)) 1 z1 − z = lim = 0 . z →z φ (z1 ) − φ (z) φ (z1 ) − φ (z) φ(z1 )→φ(z) 1 φ (z) lim
Therefore, φ−1 is analytic as claimed. It only remains to verify the assertion about the case where f is one to one. If m > 1, 2πi then e m 6= 1 and so for z1 ∈ V, e
2πi m
φ (z1 ) 6= φ (z1 ) .
(16.2)
16.1. OPEN MAPPING THEOREM FOR COMPLEX VALUED FUNCTIONS
419
2πi
But e m φ (z1 ) ∈ B (0, δ) and so there exists z2 6= z1 (since φ is one to one) such that 2πi φ (z2 ) = e m φ (z1 ) . But then 2πi m m m m φ (z2 ) = e m φ (z1 ) = e2πi φ (z1 ) = φ (z1 ) implying f (z2 ) = f (z1 ) contradicting the assumption that f is one to one. Thus m = 1 and f 0 (z) = φ0 (z) 6= 0 on V. Since f maps open sets to open sets, it follows that f −1 is continuous and so 0 f −1 (f (z1 )) − f −1 (f (z)) lim f −1 (f (z)) = f (z1 ) − f (z) f (z1 )→f (z) z1 − z 1 = lim = 0 . z1 →z f (z1 ) − f (z) f (z) You can dispense with the appeal to the inverse function theorem by using Problem 22 on Page 413. One does not have to look very far to find that this sort of thing does not hold for functions mapping R to R. Take for example, the function f (x) = x2 . Then f (R) is neither a point nor a region. In fact f (R) fails to be open.
Corollary 16.1.2 Suppose in the situation of Theorem 16.1.1 m > 1 for the local representation of f given in this theorem. Then there exists δ > 0 such that if w ∈ B (f (z0 ) , δ) = f (V ) for V an open set containing z0 , then f −1 (w) consists of m distinct points in V. (f is m to one on V ) m
Proof: Let w ∈ B (f (z0 ) , δ) . n Then w = fo(b z ) where zb ∈ V. Thus f (b z ) = f (z0 )+φ (b z) . m 2kπi m φ (b z) . Then each of these numbers is in B (0, δ) Consider the m distinct numbers, e k=1
m
and so since φ maps V one to one onto B (0, δ) , there are m distinct numbers in V , {zk }k=1 2kπi z ). Then such that φ (zk ) = e m φ (b 2kπi m m f (zk ) = f (z0 ) + φ (zk ) = f (z0 ) + e m φ (b z) =
m
f (z0 ) + e2kπi φ (b z)
= f (z0 ) + φ (b z)
m
= f (b z) = w
Example 16.1.3 Consider the open connected set D ≡ R + i (a − π, a + π) . Then z → ez is one to one and analytic on D. It maps D onto C \ l where l is the ray starting from 0 whose angle is a. Therefore, it has an analytic inverse defined on C \ l. This is a branch of the logarithm. It is of the form log (z) = ln |z| + i arga (z) where arga (z) is the angle in (a − π, a + π) with the property that eln|z|+i arga (z) = z We usually let a = 0 and then the inverse is what is usually called the logarithm and is denoted by log . As in Problem 11 this is ln (|z|) + i arg (z) where arg (z) is the angle between −π and π corresponding to z ∈ C \ (−∞, 0]. With the open mapping theorem, the maximum modulus theorem is fairly easy.
Theorem 16.1.4 Let Ω be an open connected, bounded set in C and let f : Ω → C ¯ \ Ω. Then be analytic. Let ∂Ω ≡ Ω ¯ = max {|f (z)| : z ∈ ∂Ω} max |f (z)| : z ∈ Ω and if the maximum of |f (z)| is achieved at a point of Ω, then f is a constant. Proof: Suppose f (Ω) is not a single point. That is, f is not constant. Then by the open mapping theorem, f (Ω) is an open connected subset of C and so z → |f (z)| has no ¯ is on ∂Ω. If f (Ω) is a single point, maximum. Therefore, the maximum of |f (z)| for z ∈ Ω then the equation still holds.
420
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
16.2
Functions Analytic on an Annulus
First consider the definition of an annulus.
Definition 16.2.1
Define ann (a, r, R) ≡ {z : r < |z − a| < R} .
Thus ann (a, 0, R) would denote the punctured ball, B (a, R) \ {a} and when r > 0, the annulus looks like the following.
•a
The annulus consists of the points between the two circles. In the following picture, let there be two parametrizations γ R for the large circle and γˆ r for the small one with orientation as shown. There are also two line segments oriented as shown which miss z ∈ ann (z0 , r, R) and constitute the intersection of the two simple closed curves Γ1 , Γ2 . These two simple closed curves are oriented as shown. Thus each of Γi is positively oriented. Let f be continuous on ann (z0 , r, R) and be analytic on ann (z0 , r, R). γR
•z
γˆ r Γ2
•z0
Γ1
It follows from Theorem 15.6.1, that for z in the annulus, 1 2πi
Z γR
f (w) 1 dw + w−z 2πi
Z γ ˆr
f (w) dw = f (z) w−z
This is because the contributions to the line integrals along those straight lines is 0 since they cancel off because of opposite orientations. Let γ r be the opposite orientation from γˆ r . Then this reduces to Z Z f (w) f (w) dw − dw = 2πif (z) w − z w −z γR γr Thus 1 f (z) = 2πi
"Z
1 = 2πi
"Z
γR
γR
Z
f (w) dw + w − z0 − (z − z0 ) 1 f (w) z−z0 dw + w − z0 1 − w−z 0
γr
Z γr
f (w) dw (z − z0 ) − (w − z0 )
1 f (w) dw 0 z − z0 1 − w−z z−z0
#
#
16.2. FUNCTIONS ANALYTIC ON AN ANNULUS
421
z−z0 Now note that for z in the annulus between the two circles and w ∈ γ ∗R , w−z < 1, and 0 0 for w ∈ γ ∗r , w−z z−z0 < 1. In fact, in each case, there is b < 1 such that z − z0 < b < 1, w ∈ γ ∗r , w − z0 < b < 1 w ∈ γ ∗R , w − z0 z − z0
(16.3)
Thus you can use the formula for the sum of an infinite geometric series and conclude R P∞ z−z0 n 1 f (w) w−z dw 1 n=0 w−z0 γR 0 f (z) = R P∞ w−z0 n 2πi + dw f (w) 1 (z−z0 )
γr
n=0
z−z0
Then from the uniform estimates of 16.3, one can conclude uniform convergence of the partial sums for w ∈ γ ∗R or γ ∗r , and so by the Weierstrass M test, Theorem 2.6.42, one can interchange the summation with the integral and write ! Z ∞ X 1 1 n f (w) f (z) = n+1 dw (z − z0 ) 2πi (w − z ) γR 0 n=0 ! Z ∞ X 1 1 n f (w) (w − z0 ) dw + n+1 , 2πi γ r (z − z0 ) n=0 both series converging absolutely. Thus there are an , bn ∈ X such that f (z) =
∞ X
n
an (z − z0 ) +
∞ X
bn (z − z0 )
−n
n=1
n=0
This proves most of the following theorem.
Theorem 16.2.2
Let z ∈ ann (z0 , r, R) and let f : ann (z0 , r, R) → X be analytic on ann (z0 , r, R) and continuous on ann (z0 , r, R). Then for any z ∈ ann (z0 , r, R) , f (z) =
∞ X
n
an (z − z0 ) +
n=0
where
bn (z − z0 )
−n
(16.4)
n=1
1 2πi
Z
1 bn = 2πi
Z
an =
∞ X
f (w) γR
1
n+1 dw
(w − z0 )
f (w) (w − z0 )
n−1
dw
γr
ˆ < R, then convergence of and both of these series in 16.4 converge absolutely. If r< rˆ < R ˆ both series is absolute and uniform for z ∈ ann z0 , rˆ, R . Proof: Consider the sum with the negative exponents. The other is similar. Let |f (w)| ≤ M on the closure of the annulus. ! Z ∞ X 1 n −n f (w) (w − z0 ) dw bn (z − z0 ) , bn = 2πi γ r n=1 Therefore, kbn k ≤ 2πrM rn and |z − z0 | ≥ rˆ > r. Thus q X n=p
−n
kbn k |z − z0 |
≤
q X n=p
2πˆ rM
rn r, it must be the case that z ∈ H and so for such z, the bottom description of g (z) found in 16.10 is valid. Therefore, it follows lim kg (z)k = 0
|z|→∞
and so g is bounded and analytic on all of C. By Liouville’s theorem, g is a constant. Hence, from the above equation, the constant can only equal zero. ∗ For z ∈ Ω \ ∪m k=1 γ k , since it was just shown that h (z) = g (z) = 0 on Ω m Z m Z 1 X f (w) − f (z) 1 X φ (z, w) dw = dw = 0 = h (z) = 2πi 2πi w−z γk γk k=1
1 2πi
k=1
m Z X k=1
γk
f (w) dw − f (z) w−z
m X
n (γ k , z) .
k=1
In case it is not obvious why g is analytic on H, use the formula. It reduces to showing that m Z X f (w) dw z→ w −z γk k=1
is analytic. You can show this by taking a limit of a difference quotient and argue that the limit can be taken inside the integral. Taking a difference quotient and simplifying a little, one obtains R R f (w) (w) dw − γ fw−z dw Z f (w) γ k w−(z+h) k = dw h (w − z) (w − (z + h)) γk
428
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
considering only small h, the denominator is bounded below by some δ > 0 and also f (w) is bounded on the compact set γ ∗k , |f (w)| ≤ M. Then for such small h, f (w) f (w) − (w − z) (w − (z + h)) (w − z)2 1 1 1 = − f (w) w − z (w − (z + h)) (w − z) 1 1 hM ≤ w − z δ it follows that one obtains uniform convergence as h → 0 of the integrand to sequence h → 0 and by Theorem 15.3.8, the integral converges to Z f (w) 2 dw γ k (w − z)
f (w) for (w−z)2
any
The following is an interesting and fairly easy corollary.
Corollary 16.4.3 Let Ω be an open set (note that Ω might not be simply connected) and let γ k : [ak , bk ] → Ω, k = 1, · · · , m, be closed, continuous and of bounded variation. Suppose also that m X n (γ k , z) = 0 k=1
for all z ∈ / Ω and
Pm
k=1
n (γ k , z) is an integer for z ∈ Ω. Then if f : Ω → X is analytic, m Z X
f (w) dw = 0.
γk
k=1
Proof: This follows from Theorem 16.4.2 as follows. Let g (w) = f (w) (w − z) where z ∈ Ω \ ∪m k=1 γ k ([ak , bk ]) . Then by this theorem, 0=0
m X
n (γ k , z) = g (z)
k=1
m X
n (γ k , z) =
k=1
Z m m Z X 1 X 1 g (w) dw = f (w) dw. 2πi γ k w − z 2πi γk k=1
k=1
What if Ω is simply connected? Can one assert something interesting in this case beyond what is said above?
Corollary 16.4.4 Let γ : [a, b] → Ω be a continuous closed curve of bounded variation where Ω is a simply connected region in C and let f : Ω → X be analytic. Then Z f (w) dw = 0. γ
b ∗ . Thus ∞ ∈ C\γ b ∗ . Then the Proof: Let D denote the unbounded component of C\γ b \ Ω is contained in D since every point of C b \ Ω must be in some component connected set, C ∗ b b of C\γ and ∞ is contained in both C\Ω and D. Thus D must be the component that
16.5. AN EXAMPLE OF A CYCLE
429
b \ Ω. It follows that n (γ, ·) must be constant on C b \ Ω, its value being its value contains C on D. However, for z ∈ D, Z 1 1 dw n (γ, z) = 2πi γ w − z and so lim|z|→∞ n (γ, z) = 0 showing n (γ, z) = 0 on D. Therefore this verifies the hypothesis of Theorem 16.4.2. Let z ∈ Ω ∩ D and define g (w) ≡ f (w) (w − z) . Thus g is analytic on Ω and by Theorem 16.4.2, Z Z 1 g (w) 1 0 = n (z, γ) g (z) = dw = f (w) dw. 2πi γ w − z 2πi γ The following is a very significant result which will be used later.
Corollary 16.4.5 Suppose Ω is a simply connected open set and f : Ω → X is analytic. Then f has a primitive F, on Ω. Recall this means there exists F such that F 0 (z) = f (z) for all z ∈ Ω. Proof: Pick a point, z0 ∈ Ω and let V denote those points, z of Ω for which there exists a curve, γ : [a, b] → Ω such that γ is continuous, of bounded variation, γ (a) = z0 , and γ (b) = z. Then it is easy to verify that V is both open and closed in Ω and therefore, V = Ω because Ω is connected. Denote by γ z0 ,z such a curve from z0 to z and define Z F (z) ≡
f (w) dw. γ z0 ,z
Then F is well defined because if γ j , j = 1, 2 are two such curves, it follows from Corollary 16.4.4 that Z Z f (w) dw = 0, f (w) dw + −γ 2
γ1
implying that Z
Z f (w) dw =
γ1
f (w) dw. γ2
Now this function, F is a primitive because, thanks to Corollary 16.4.4 Z 1 −1 (F (z + h) − F (z)) h = f (w) dw h γ z,z+h Z 1 1 = f (z + th) hdt h 0 and so, taking the limit as h → 0, F 0 (z) = f (z) . Here is a technical result about finding suitable cycles which seems to often be taken for granted, but it is not at all obvious. It is about getting closed curves which enclose each of finitely many compact sets such that there is no intersection between curves, and each goes around the corresponding compact set in the positive direction.
16.5
An Example of a Cycle
The next theorem deals with the existence of a cycle with nice properties. Basically, you go around the compact subset of an open set with suitable contours while staying in the open
430
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
set. The method involves the following simple concept. If a cycle Γ consists of oriented curves {γ 1 , · · · , γ r } , define for p ∈ / Γ∗
n (Γ, p) ≡
r X
n (γ i , p)
i=1
Also, for convenience define Z f (λ) dλ ≡ Γ
r Z X i=1
f (λ) dλ
γi
Definition 16.5.1
A tiling of R2 = C is the union of infinitely many equally spaced vertical and horizontal lines. You can think of the small squares which result as tiles. To tile the plane or R2 = C means to consider such a union of horizontal and vertical lines. It is like graph paper. See the picture below for a representation of part of a tiling of C.
Theorem 16.5.2 Let K1 , K2 , · · · , Km be disjoint compact subsets of an open set Ω m in C. Then there exist continuous, closed, bounded cycles {Γj }j=1 for which Γ∗j ∩ Kk = ∅ for each k, j, Γ∗j ∩ Γ∗k = ∅, Γ∗j ⊆ Ω. Also, if p ∈ Kk and j 6= k, n (Γk , p) = 1, n (Γj , p) = 0 so if p is in some Kk , m X
n (Γj , p) = 1
j=1
each Γj being the union of oriented simple closed curves, while for all z ∈ / Ω m X
n (Γk , z) = 0.
k=1
Also, if p ∈ Γ∗j , then for i 6= j, n (Γi , p) = 0. Proof: Consider z → dist z, ∪k6=j Kk ∪ ΩC . From Lemma 2.5.8 this is a continuous function. Thus it has a minimum on Kj . Let this be δ j . Then δ j > 0 because the Kj are disjoint. Let 0 < δ < min {δ j , j = 1, · · · , m} . Now tile the plane with squares, each of which has diameter less than δ/4.
16.5. AN EXAMPLE OF A CYCLE
431
Ω
K1
K2
Let Sj denote the set of all the closed squares in this tiling which have nonempty intersection with Kj .Thus, ∪Sj ⊇ Kj , all the squares of Sj are contained in Ω, and none have nonempty intersection with any Kk for k 6= j. First suppose p is a point of Kj which is in the interior of one of these squares in the tiling. Denote by ∂Qk the boundary of Qk one of the squares in Sj , oriented in the counter clockwise direction and Qm denote the square of Sj which contains the point, p in its interior. Thus n (∂Qm , p) = 1 but n (∂Qk , p) = 0 for all k 6= m. Also, if p ∈ / Ω, then if Q is any square from any Sj , then n (∂Q, p) = 0. This along with the construction described next will pertain to Sj . However, note that if Q is any square of Si , i 6= j and if z is any point on ∂Q, then n (∂Qk , z) = 0 for i 6= j.
If the two squares of Sj have a common edge, then this edge is oriented in opposite directions and so this edge n othe two squares have in common can be deleted without changing P j n γ , z where γ jk are the edges of squares in Sj and z not on any of the edges of j,k k squares in Sj . In particular if z is on the edge of some square of Si , i 6= j then this sum is unchanged also and is 0. For example, see the picture,
From the construction, if any of the γ j∗ k contains a point of Kj then this point is on one of the four edges of some Qs ∈ Sj and at this point, there is at least one edge of some Qr ∈ Sj , r 6= s which also contains this point. As just discussed, this shared edge can P be deleted without changing k,j n z, γ jk . You simply delete the edges of the Qr which intersect Kj but not the endpoints of these edges. That is, delete the open edges. Do so for every instance of intersection of an edge of some Q ∈ Sj with Kj . When this is done, delete all isolated points, if any. Do this process for each Sj for each j. Let the resulting m oriented curves be denoted by {γ k }k=1 . Let those which pertain to Kj be denoted by Γj . The construction is illustrated in the following picture.
432
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
Ω
K1
K2
Then as explained above, if p is not on any of the lines in the original tiling but in some Pm j Kj then k=1 n γ k , p ≡ n (Γj , p) = 1. If p is in Γ∗i for i 6= j, then n (Γj , p) = 0. It remains to prove the claim about the closed curves. Each orientation on an edge corresponds to a direction of motion over that edge. Call such a motion over the edge a route. Initially, every vertex, (corner of a square in Sj ) has the property there are the same number of routes to and from that vertex. When an open edge whose closure contains a point of K is deleted, every vertex either remains unchanged as to the number of routes to and from that vertex or it loses both a route away and a route to. Thus the property of having the same number of routes to and from each vertex is preserved by deleting these open edges. The isolated points which result lose all routes to and from. It follows that upon removing the isolated points you can begin at any of the remaining vertices and follow the routes leading out from this and successive vertices according to orientation and eventually return to that end. Otherwise, there would be a vertex which would have only one route leading to it which does not happen. Now if you have used all the routes out of this vertex, pick another vertex and do the same process. Otherwise, pick an unused route out of the vertex and follow it to return. Continue this way till all routes are used exactly once, resulting in closed oriented curves, Γk for Sj . Then if p ∈ Kj , X n (Γk , p) = 1 k
because, as discussed above, the sum P of the winding numbers started out as 1 and remains unchanged. If p ∈ Kk , k 6= j, then k n (Γk , p) = 0 and if p ∈ Γ∗i for i 6= j, then n (Γj , p) = 0. In case p ∈ Kj is on some line of the original tiling, it is not on any of the Γk because Γ∗k ∩ Kj = ∅ and so the continuity of z → n (Γk , z) yields the desired result in this case also. Now note that Kk and Γ∗k , Γ∗j , Kj , j 6= k are disjoint compact sets in Ω. Thus the same ˆ k and ∆k such that these are contained in Ω and for process just used will provide a cycle Γ ˆ k , p = 1, n Γ ˆ k , z = 0 for z ∈ Γ∗k p ∈ Kk , n Γ ˆ k , z = 0 if z ∈ Also n Γ / Ω. Incidentally ∆k has various properties relative to Γ∗k which will also be obtained from the above process but which are not important here. Essentially, ˆ what this does is to change each Γk a little bit getting Γk with the same properties but also ˆ k , p = 0 if p ∈ Γ∗ . n Γ k
16.6. ISOLATED SINGULARITIES
433
Corollary 16.5.3 n o In the context of Theorem 16.5.2, there exist continuous, closed,
bounded cycles
ˆj Γ
m
j=1
ˆ ∗ = ∅, Γ ˆ ∗ ⊆ Ω. ˆ∗ ∩ Γ ˆ ∗ ∩ Kk = ∅ for each k 6= j, Γ for which Γ j j j k
Also, if p ∈ Kk and j 6= k, ˆ k , p = 1, n Γ so if p is in some Kk ,
ˆj , p = 0 n Γ
m X ˆj , p = 1 n Γ j=1
ˆ j being the union of oriented simple closed curves, while for all z ∈ each Γ / Ω m X ˆ k , z = 0. n Γ k=1
ˆ ∗ , then for i 6= j, n Γ ˆ i , p = 0. Additionally, n Γ ˆ k , p = 0 if p ∈ Γ∗ . Also, if p ∈ Γ j k
16.6
Isolated Singularities
This is about the situation where the Laurent series of f has nonzero principal part. When this occurs, we say that z0 is a singularity. The singularities are isolated if each is the center of a ball such that f is analytic except for the center of the ball.
Definition 16.6.1
Let B 0 (a, r) ≡ {z ∈ C such that 0 < |z − a| < r}. Thus this is the usual ball without the center. A function is said to have an isolated singularity at the point a ∈ C if f is analytic on B 0 (a, r) for some r > 0.
It turns out isolated singularities can be neatly classified into three types, removable singularities, poles, and essential singularities. The next theorem deals with the case of a removable singularity.
Definition 16.6.2
An isolated singularity of f is said to be removable if there exists an analytic function g analytic at a and near a such that f = g at all points near a.
Theorem 16.6.3
Let f : B 0 (a, r) → X be analytic. Thus f has an isolated singularity at a. Then a is a removable singularity if and only if lim f (z) (z − a) = 0.
z→a
Thus the above limit occurs if and only if there exists a unique analytic function, g : B (a, r) → X such that g = f on B 0 (a, r) . In other words, you can re define f at a so that the resulting function is analytic. 2
Proof: ⇒Let h (z) ≡ (z − a) f (z) , h (a) ≡ 0. Then h is analytic on B (a, r) because it is easy to see that h0 (a) = 0. It follows h is given by a power series, h (z) =
∞ X
k
ak (z − a)
k=2
where a0 = a1 = 0 because of the observation above that h0 (a) = h (a) = 0. It follows that for |z − a| > 0 ∞ X k−2 f (z) = ak (z − a) ≡ g (z) . k=2
434
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
⇐The converse is obvious. What of the other case where the singularity is not removable? This situation is dealt with by the amazing Casorati Weierstrass theorem. This theorem pertains to the case where f has values in C. Most of what has been presented pertains to functions which have values in X for X a Banach space but this one is a specialization.
Theorem 16.6.4
(Casorati Weierstrass) Let a be an isolated singularity and suppose for some r > 0, f (B 0 (a, r)) is not dense in C. Then either a is a removable singularity or there exist finitely many b1 , · · · , bM for some finite number, M such that for z near a, f (z) = g (z) +
M X k=1
bk (z − a)
k
(16.11)
where g (z) is analytic near a. Proof: Suppose B (z0 , δ) has no points of f (B 0 (a, r)) . Such a ball must exist if f (B 0 (a, r)) is not dense. Then for z ∈ B 0 (a, r) , |f (z) − z0 | ≥ δ > 0. It follows from Theorem 16.6.3 1 has a removable singularity at a. Hence, there exists h an analytic function that f (z)−z 0 such that for z near a, 1 h (z) = . (16.12) f (z) − z0 P∞ k 1 for z There are two cases. First suppose h (a) = 0. Then k=1 ak (z − a) = f (z)−z 0 near a. If all the ak = 0, this would be a contradiction because then the left side would equal zero for z near a but the right side could not equal zero. Therefore, there is a first m such that am 6= 0. Hence there exists an analytic function, k (z) which is not equal to zero in some ball, B (a, ε) such that k (z) (z − a)
m
=
1 . f (z) − z0
Hence, taking both sides to the −1 power, f (z) − z0 =
∞ X 1 k bk (z − a) m (z − a) k=0
and so 16.11 holds. The other case is that h (a) 6= 0. In this case, raise both sides of 16.12 to the −1 power and obtain −1 f (z) − z0 = h (z) , a function analytic near a. Therefore, the singularity is removable. This theorem is the basis for the following definition which describes isolated singularities.
Definition 16.6.5
Let a be an isolated singularity of a X valued function f . When 16.11 holds for z near a, then a is called a pole. The order of the pole in 16.11 is M. Essential singularities are those which have infinitely many nonzero terms in the principal part of the Laurent series. When a function f is analytic except for isolated singularities and the isolated singularities are all poles, and there are finitely many of these poles in every compact set, the function is called meromorphic. Actually, if you insist only that the singularities are isolated and poles, then you can prove that there are finitely many in any compact set so part of the above definition is actually redundant, but this will be shown later. What follows is the definition of something called a residue. This pertains to a singularity which has a pole at an isolated singularity.
16.7. THE RESIDUE THEOREM
435
Definition 16.6.6
The residue of f at an isolated singularity α which is a pole, −1 written res (f, α) is the coefficient of (z − α) where f (z) = g (z) +
m X
bk k
k=1
(z − α)
.
Thus res (f, α) = b1 in the above. Here it suffices to assume that f has values in X a Banach space.
16.7
The Residue Theorem
The following is the residue theorem.
Theorem 16.7.1 Let Ω be an open set and let γ k : [ak , bk ] → Ω, k = 1, · · · , m, be a parametrizaton for a simple closed curve, and be continuous and of bounded variation. Suppose also that m X n (γ k , z) = 0 k=1
for all z ∈ / Ω. Then if f : Ω → X has every singularity removable or a pole and such that no γ ∗k contains any poles of f , m m Z X X 1 X n (γ k , α) (16.13) f (w) dw = res (f, α) 2πi γk α∈A
k=1
k=1
where here P denotes the set of poles of f in Ω. The sum on the right is a finite sum. Proof: First note that there are at most finitely many singularities α which are not in ∗ the unbounded component of C \ ∪m exists a finite set, {α1 , · · · , αN } ⊆ P k=1 γ k . Thus thereP n such that these are the only possibilities for which k=1 n (γ k , α) might not equal zero. Therefore, 16.13 reduces to m Z N n X X 1 X f (w) dw = res (f, αj ) n (γ k , αj ) 2πi γk j=1 k=1
k=1
and it is this last equation which is established. For z near αj , f (z) = gj (z) +
mj X r=1
bjr r ≡ gj (z) + Qj (z) . (z − αj )
where gj is analytic at and near αj . Now define G (z) ≡ f (z) −
N X
Qj (z) .
j=1
It follows that G (z) has a removable singularity at each αj . Therefore, by Corollary 16.4.3, 0=
m Z X k=1
G (z) dz =
γk
m Z X k=1
f (z) dz −
γk
N X m Z X j=1 k=1
Qj (z) dz.
γk
Now m Z X k=1
Qj (z) dz
=
γk
=
m Z X k=1 γ k m Z X k=1
γk
mj
X bj1 bjr + (z − αj ) r=2 (z − αj )r m
! dz
X bj1 dz ≡ n (γ k , αj ) res (f, αj ) (2πi) . (z − αj ) k=1
436
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
Therefore, m Z X k=1
f (z) dz
=
γk
N X m Z X
=
N X m X
Qj (z) dz
γk
j=1 k=1
n (γ k , αj ) res (f, αj ) (2πi)
j=1 k=1
=
2πi
N X
res (f, αj )
j=1
=
(2πi)
X
m X
n (γ k , αj )
k=1 m X
res (f, α)
α∈A
n (γ k , α)
k=1
The above is certainly a grand and glorious result but we typically have in mind something much more ordinary. As a review of the main ideas consider this.
•a1
•a2
•a3
•a4
•a5
You have a simple closed curve oriented such that the winding number is 1. Say γ is a n parametrization for this curve. Then inside there are finitely many singularities {ak }k=1 . Enclose each with a circle oriented in the clockwise direction, parameterized by γˆ k and connect them with straight lines as shown. Then you have two simple closed curves which intersect in these finitely many straight line segments. Orient them oppositely so that line integrals over the straight line segments cancel and each of the two simple closed curves is oriented positively. Then if f is analytic on the inside of γ except for the ak and is continuous on γ ∗ , the Cauchy integral theorem implies Z f (z) dz + γ
n Z X k=1
Letting γ k ≡ −ˆ γk, Z f (z) dz = γ
f (z) dz = 0
γ ˆk
n Z X k=1
f (z) dz
γk
Now on the inside of γ k , f (z) = gk (z) + Thus has
R
Mk X
bn n (z − ak ) n=1
(16.14)
f (z) dz = b1 because all the other terms have primitives. Indeed, if n 6= −1, (z − a)
γ1 (z−a)n+1 n+1
as a primitive. However, Z γk
Similarly,
R γk
f (z) dz = res (f, ak ) 2πi.
b1 dz = 2πib1 z − a1
n
16.8. EVALUATION OF IMPROPER INTEGRALS Thus Z f (z) dz = 2πi γ
n X
437
res (f, ak )
k=1
In words, the contour integral is 2πi times the sum of the residues. This formulation is in a sense more general because it is only required that the function be continuous on γ ∗ and analytic on the inside of γ ∗ except for the exceptional points indicated. So is there a way to find the residues? The answer is yes.
Procedure 16.7.2
Say you want to find res (f, a) = b1 in
f (z) = g (z) +
M X
bn n , g analytic (z − a) n=1 M
This is the case where you have a pole of order M at a. You would multiply by (z − a) This would give M
f (z) (z − a)
M
= g (z) (z − a)
+
M X
.
M −n
bn (z − a)
n=1
Then you would take M − 1 derivatives and then take the limit as z → a. This would give (M − 1)!b1 . You can see from the formula that this will work and so there is no question that the limit exists. Because of this, you could use L’Hospitals rule to formally find this limit. This rule pertains only to real functions of a real variable. However, since you know the limit exists in this case, you can pick a one dimensional direction and apply L’Hospital to the real and imaginary parts to identify the limit which is typically what needs to be done.
16.8
Evaluation of Improper Integrals
Letting p (x) , q (x) be polynomials, you can use the above method of residues to evaluate obnoxious integrals of the form Z R Z ∞ p (x) p (x) dx ≡ lim dx R→∞ q (x) −R q (x) −∞ provided the degree of p (x) is two less than the degree of q (x) and the zeros of q (z) involve Im (z) > 0. Of course if the degree of p (x) is larger than that of q (x) , you would do long division. The contour to use for such problems is γ R which goes from (−R, 0) to (R, 0) along the real line and then on the semicircle of radius R from (R, 0) to (−R, 0). y
x Letting CR be the circular part of this contour, for large R, Z p (z) CRk dz ≤ πR k+2 R CR q (z) which converges to 0 as R → ∞. Therefore, it is only a matter of taking large enough R to enclose all the roots of q (z) which are in the upper half plane, finding the residues at these points and then computing the contour integral. Then you would let R → ∞ and the part
438
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
of the contour on the semicircle will disappear leaving the Cauchy principal value integral which is desired. There are other situations which will work just as well. You simply need to have the case where the integral over the curved part of the contour converges to 0 as R → ∞. Here is an easy example. R∞ Example 16.8.1 Find −∞ x21+1 dx You know from calculus that the answer is π. Lets use the method of residues to find this. The function z21+1 has poles at i and −i. We don’t need to consider −i. It seems clear that the pole at i is of order 1 and so all we have to do is take 1 1 x−i = (x − i) = 2 z→i 1 + x (x − i) (x + i) 2i 1 Then the integral equals 2πi 2i = π. That one is easy. Now here is a genuinely obnoxious integral. R∞ 1 Example 16.8.2 Find −∞ 1+x 4 dx lim
It will have poles at the roots of 1 + x4 . These roots are 1 1 √ 1 1 √ 1 1 √ 1 1 √ 2, − 2, − 2, 2 − i + i − i + i 2 2 2 2 2 2 2 2 Using the above contour, we only need consider 1 1 √ 1 1 √ − − i 2, + i 2 2 2 2 2 Since they are all distinct, the poles at these two will be of order 1. To find the residues at these points, you would need to have √ z − − 12 − 21 i 2 lim √ 1 + z4 z→−( 21 + 12 i) 2 factoring 1 + x4 and computing the limit, you could get the answer. L’Hospital’s rule to identify the limit you know is there, 1 1 √ 1 = − i 2 lim √ 3 8 8 z→−( 21 + 12 i) 2 4z
Then applying
L’Hospital’s rule is a strictly one dimensional phenomenon. However, here you know there is a limit and so it is all right to obtain it from one dimensional considerations. The Laurent series shows that the limits you consider will exist. To apply L’Hospital’s rule, you simply specialize to a line of the form t → z0 + t and consider real and imaginary parts. The end result will be the limit sought. More formally, you would use the partial fractions expansion to find limits. In the current example, you can deal only with linear factors and their powers in the denominator because the denominator factors completely in the complex plane. The theory of partial fractions is discussed in Theorem 1.6.13 on Page 19. √ √ Similarly, the residue at 12 + 21 i 2 is − 18 + 81 i 2. Then the contour integral is 1 1 √ 1 1 √ 1√ 2πi − i 2 + 2πi − + i 2 = 2π 8 8 8 8 2 You might observe that this is a lot easier than doing the usual partial fractions and trig substitutions etc. Now here is another tedious example.
16.8. EVALUATION OF IMPROPER INTEGRALS Example 16.8.3 Find
439
R∞
x+2 dx −∞ (x2 +1)(x2 +4)2
The poles of interest are located at i, 2i. The pole at 2i is of order 2 and the one at i is of order 1. In this case, the partial fractions expansion is 1 2 9x + 9 x2 + 1
−
1 2 3x + 3 2 (x2 + 4)
−
1 2 9x + 9 x2 + 4
The pole at i would be 2 1 + 29 (z − i) 1 1 9i + 9 lim = = − i z→i (z + i) (z − i) (i + i) 18 9 1 9z
Now consider the pole at 2i by consideration of the next term in the partial fractions expansion. From the theory of partial fractions, you know that 2 1 3x + 3 2 (x2 + 4)
=
A C B D + + + (x − 2i) (x − 2i)2 (x + 2i) (x + 2i)2
The residue at this point is A since −2i is in the lower half plane. To identify A, you can 2 multiply by (x − 2i) , take the derivative and then take a limit as x → 2i. This would result in A. Multiplying and taking the derivative yields ! 1 2 1 3x + 3 Dx =− 2 3 (x + 4 − 2i) (x + 2i) 3 (x + 2i) Then you have to take a limit as x → 2i which is −
1 i 48
Finally, consider the last term which has a pole of order 1. 1 2 1 1 9 x + 9 (x − 2i) lim = − i x→2i (x − 2i) (x + 2i) 18 18 Then adding in the minus sign, we have the following for the integral. 1 1 1 1 1 5 2πi − i + 2πi − − i − 2πi − i = π 18 9 18 18 48 72 Sometimes you don’t blow up the curves and take limits. Sometimes the problem of interest reduces directly to a complex integral over a closed curve. Here is an example of this. Example 16.8.4 The integral is Z 0
π
cos θ dθ 2 + cos θ
This integrand is even and so it equals Z 1 π cos θ dθ. 2 −π 2 + cos θ For z on the unit circle, z = eiθ , z = z1 and therefore, cos θ = 12 z + z1 . Thus dz = ieiθ dθ and so dθ = dz iz . Note that this is done in order to get a complex integral which reduces
440
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
to the one of interest. It follows that a complex integral which reduces to the integral of interest is Z Z 1 1 1 z2 + 1 1 2 z + z dz = dz 2i γ 2 + 21 z + z1 z 2i γ z (4z + z 2 + 1) where γ is the unit circle oriented counter clockwise. Now the integrand has poles of order 1 at those points where z 4z + z 2 + 1 = 0. These points are 0, −2 +
√
3, −2 −
√
3.
Only the first two are inside the unit circle. It is also clear the function has simple poles at these points. Therefore, z2 + 1 = 1. res (f, 0) = lim z z→0 z (4z + z 2 + 1) √ res f, −2 + 3 = lim √
√ z − −2 + 3
z→−2+ 3
z2 + 1 2√ =− 3. 2 z (4z + z + 1) 3
It follows Z 0
π
z2 + 1 dz 2 γ z (4z + z + 1) 1 2√ = 2πi 1 − 3 2i 3 2√ = π 1− 3 . 3
cos θ dθ 2 + cos θ
=
1 2i
Z
Other rational functions of the trig functions will work out by this method also. Sometimes we have to be clever about which version of an analytic function that reduces to a real function we should use. The following is such an example. Example 16.8.5 The integral here is Z 0
∞
ln x dx. 1 + x4
It is natural to try and use the contour in the following picture in which the small circle has radius r and the large one has radius R. y
x However, this will create problems with the log since the usual version of the log is not defined on the negative real axis. This difficulty may be eliminated by simply using another branch of the logarithm as in Example 16.1.3. Leave out the ray from 0 along the negative y axis and use this example to define L (z) on this set. Thus L (z) = ln |z| + i arg1 (z) where iθ arg1 (z) will be the angle θ, between − π2 and 3π 2 such that z = |z| e . Then the function L(z) used is f (z) ≡ 1+z4 . Now the only singularities contained in this contour are 1√ 1 √ 1√ 1 √ 2 + i 2, − 2+ i 2 2 2 2 2
16.8. EVALUATION OF IMPROPER INTEGRALS
441
√ √ and the integrand f has simple poles at these points. Thus res f, 12 2 + 12 i 2 = √ √ z − 21 2 + 12 i 2 (ln |z| + i arg1 (z)) lim √ √ 1 + z4 z→ 12 2+ 12 i 2 √ √ (ln |z| + i arg1 (z)) + z − 21 2 + 12 i 2 (1/z) = lim √ √ 4z 3 z→ 21 2+ 12 i 2 q π 1 1 ln + √ 2 2 + i4 1 1 = = 2π − i √ √ 3 32 32 4 1 2 + 1i 2 2
2
Similarly
−1 √ 1 √ res f, 2+ i 2 = 2 2 √ 3 √ 3 2π + i 2π. 32 32 Of course it is necessary to consider the integral along the small semicircle of radius r. This reduces to Z 0 ln |r| + it it dt 4 rie it π 1 + (re ) which clearly converges to zero as r → 0 because r ln r → 0. Therefore, taking the limit as r → 0, Z Z −r L (z) ln (−t) + iπ dz + lim dt+ 4 r→0+ −R 1 + t4 large semicircle 1 + z Z R 3√ 3 √ 1√ 1 √ ln t dt = 2πi 2π + i 2π + 2π − i 2π . lim r→0+ r 1 + t4 32 32 32 32 R L(z) Observing that large semicircle 1+z 4 dz → 0 as R → ∞, Z e (R) + 2 lim
r→0+
r
R
ln t dt + iπ 1 + t4
Z
0
−∞
1 dt = 1 + t4
√ 1 1 − + i π2 2 8 4
where e (R) → 0 as R → ∞. From an earlier example this becomes √ ! Z R √ 2 1 1 ln t dt + iπ π = − + i π 2 2. e (R) + 2 lim 4 r→0+ r 1 + t 4 8 4 Now letting r → 0+ and R → ∞, Z 2 0
and so
∞
ln t dt 1 + t4
Z 0
√ 1 1 = − + i π 2 2 − iπ 8 4 √ 1 = − 2π 2 , 8
∞
√
2 π 4
!
ln t 1√ 2 2π , dt = − 4 1+t 16
which is probably not the first thing you would thing of. You might try to imagine how this could be obtained using elementary techniques. Example 16.8.6 The Fresnel integrals are Z ∞ Z cos x2 dx, 0
0
∞
sin x2 dx.
442
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS 2
To evaluate these integrals we will consider f (z) = eiz on the curve which goes from 1+i the origin to the point r on the x axis and from this point to the point r √2 along a circle of radius r, and from there back to the origin as illustrated in the following picture. y
x Thus the curve is shaped like a slice of pie. The angle is 45◦ . Denote by γ r the curved part. Since f is analytic, Z 0
=
e
iz 2
Z
r
dz +
e
γr
ix2
r
Z dx −
e
0
2 √ i t 1+i 2
0 r
r
1+i √ 2
dt
1+i √ dt 2 γr 0 0 √ Z Z r π 1+i iz 2 ix2 √ = e dz + e dx − + e (r) 2 2 γr 0 Z
=
2
eiz dz +
Z
R∞
where e (r) → 0 as r → ∞. This used integrals. Z 2 eiz dz γr
Z
2
eix dx −
0
2
e−t
2
e−t dt =
√
π 2 .
Now examine the first of these
Z π 4 it 2 = ei(re ) rieit dt 0 Z π4 2 ≤ r e−r sin 2t dt 0
=
r = 2
Z
r −(3/2)
0
r 2
1
Z
r 1 √ du + 2 2 1−u
0
2
e−r u √ du 1 − u2
Z 0
1
1 √ 1 − u2
e−(r
1/2
)
which converges to zero as r → ∞. Therefore, taking the limit as r → ∞, Z ∞ √ 2 π 1+i √ = eix dx 2 2 0 and so the Fresnel integrals are given by Z 0
∞
√ Z ∞ π cos x2 dx. sin x dx = √ = 2 2 0 2
The following example is one of the most interesting. By an auspicious choice of the contour it is possible to obtain a very interesting formula for cot πz known as the Mittag Leffler expansion of cot πz. Example 16.8.7 Let γ N be the contour which goes from −N − 21 − N i horizontally to N + 21 − N i and from there, vertically to N + 21 + N i and then horizontally to −N − 12 + N i and finally vertically to −N − 12 − N i. Thus the contour is a large rectangle and the direction of integration is in the counter clockwise direction.
16.8. EVALUATION OF IMPROPER INTEGRALS
443
(−N − 12 ) + N i •
(N + 21 ) + N i •
• (−N − 12 ) − N i
• (N + 21 ) − N i
Consider the following integral. Z IN ≡ γN
π cos πz dz (α2 − z 2 ) sin πz
where α is not an integer. This will be used to verify the formula of Mittag Leffler, ∞
X 2 π cot πα 1 + = . 2 2 − n2 α α α n=1
(16.15)
It is left as an exercise to verify that cot πz is bounded on this contour and that therefore, IN → 0 as N → ∞. Now compute the residues of the integrand at ±α and at n where |n| < N + 21 for n an integer. These are the only singularities of the integrand in this contour and therefore, IN can be obtained by using these. First consider the residue at ±α. These are obviously poles of order 1 and so to get the one at α, you take −π cos πα −π cos πz (z − α) π cos πz = lim = 2 2 z→α (α + z) sin πz z→α (α − z ) sin πz 2α sin πα lim
You get the same thing at −α. Next consider the residue at n. If you consider the power series, you will see that this should also be a pole of order 1. Thus it is (z − n) π cos πz z→n (α2 − z 2 ) sin πz lim
= =
π cos πz − (z − n) π 2 sin (πz) z→n −2z sin πz + (α2 − z 2 ) π cos (πz) n π (−1) 1 n = 2 α − n2 (α2 − n2 ) π (−1) lim
Therefore, " 0 = lim IN = lim 2πi N →∞
N →∞
N X n=−N
1 π cot πα − 2 2 α −n α
#
which establishes the following formula of Mittag Leffler. lim
N →∞
N X n=−N
1 π cot πα . = α 2 − n2 α
Writing this in a slightly nicer form, we obtain 16.15. The next example illustrates the technique of a branch cut. R ∞ p−1 Example 16.8.8 For p ∈ (0, 1) , find 0 x1+x dx. This example illustrates the use of something called a branch cut. The idea is you need to pick a single determination of z p−1 which converges to xp−1 for x real and z getting close to x. It will make use of the following contour. In this contour, the radius of the large circle is R and the radius of the small one is r. The angle between the straight lines and the x axis is ε. Denote this contour by γ R,r,ε .
444
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
Choose a branch of the logarithm of the form log (z) = ln |z| + iA (z) where A (z) is the angle of z in (0, 2π). Thus z p−1 = e(p−1)(ln|z|+iA(z)) The straight lines, the one on top. reiε + t Reiε = z, t ∈ [0, 1]. Contour integral: iε re + t Reiε p−1 e(p−1)iε Reiε dt iε + t (Reiε ) f s 1 + re 0 The one on the bottom: rei(2π−ε) + t Rei(2π−ε) = z, t ∈ [0, 1] Contour integral: Z
i(2π−ε) p−1 (p−1)i(2π−ε) re + t Rei(2π−ε) e Rei(2π−ε) dt 1 + rei(2π−ε) + t Rei(2π−ε)
1
Z
1
− 0
The integral over the small circle: z = reit , t ∈ [ε, 2π − ε] Contour integral: 2π−ε
Z − ε
rp−1 e(p−1)it it rie dt 1 + reit
The integral over the large circle: z = Reit , t ∈ [ε, 2π − ε] Contour integral: Z
2π−ε
ε
Rp−1 e(p−1)it Rieit dt 1 + Reit
iπ(p−1)
2πie
z p−1 dz 1+z
Z = γ R,r,ε
The residue at −1 of the function is eiπ(p−1) and so the contour integral on the right equals the sum of those other integrals above. Now let ε → 0. This yields 2πieiπ(p−1) =
Z γ R,r
z p−1 dz 1+z
where the integral on the right equals the sum 1
Z 0
Z + 0
p−1
(r + tR) Rdt + 1 + r + tR 2π
r
Z −
E1 (r) p−1 (p−1)it
e 1 + reit
! p−1 (p−1)i(2π) (r + tR) e Rdt 1 + r + tR
1
0
E2 (R)
rieit dt +
Z 0
2π
Rp−1 e(p−1)it Rieit dt 1 + Reit
16.8. EVALUATION OF IMPROPER INTEGRALS
445
The last two integrals converge to 0 as r → 0 and R → ∞. This follows easily from the form of the integrands. You can change the variable in the first two to write them as R
Z
xp−1 dx, −e(p−1)i(2π) 1+x
r
R
Z r
xp−1 dx 1+x
Thus Z
2πieiπ(p−1) =
R
r
xp−1 dx 1 − e(p−1)i(2π) + E1 (r) + E2 (R) 1+x
where E (r) , E (R) converges to 0 as r → 0 and R → ∞. There is no hope of taking a limit as R → ∞ while keeping r > 0 fixed, but if we let both variables converge at the same time, then we could get something. Let r → 0+ and let R = 1/r. iπ(p−1)
2πie
Z
1/r
= r
xp−1 dx 1 − e(p−1)i(2π) + E (r, 1/r) 1+x
and now, as r → 0,which equals the sum of the two integrals over the straight lines added to the integrals over the small circles which converge to 0 as r → 0 and R → ∞. Top straight line converges as r → 0 to Z 1 p−1 (Rt) Rdt 1 + tR 0 Bottom integral converges as r → 0 to 1
Z −
|tR|
0
p−1 (p−1)i(2π)
e 1 + tR
Rdt
Of course change variables letting x = tR and the two integrals which must be summed are Z 0
R
xp−1 dx, −e(p−1)i(2π) 1+x
Z 0
R
xp−1 dx 1+x
Thus you get the following as R → ∞. Z 0
∞
xp−1 dx 1 − e2π(p−1)i = 2πieiπ(p−1) 1+x
Then what you get is Z 0
= =
∞
xp−1 2πieiπ(p−1) dx = 1+x 1 − e2π(p−1)i
−2πieiπp −2πi −2πi = = −iπp 1 − e2πpi (1 − e2πpi ) e−iπp e − eπip π −2πi = (cos πp − i sin (πp)) − (cos πp + i sin (πp)) sin (pπ)
I think this is quite an amazing result. Actually, people typically are a little more informal in the consideration of such integrals. They regard the bottom side of the line x ≥ 0 as being associated with θ = 2π and the top side being associated with θ = 0 and leave out the fuss with taking limits as ε → 0 and so forth.
446
16.9
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
The Inversion of Laplace Transforms
Recall Theorem 10.3.8 about the inversion of the Laplace transform.
Theorem 16.9.1
Let g be a measurable function defined on (0, ∞) which has expo-
nential growth |g (t)| ≤ Ceηt for some real η and is Holder continuous from the right and left as in 10.1 and 10.2. For Re (s) > η Z ∞ e−su g (u) du Lg (s) ≡ 0
Then for any c > η, 1 R→∞ 2π
Z
R
e(c+iy)t Lg (c + iy) dy =
lim
−R
g (t+) + g (t−) 2
(16.16)
First write the integral on the left as a contour integral. Thus z = c + iy and dz = idy and this is just the contour integral 1 2πi
Z
c+iR
eut Lg (u) du
c−iR
where the contour is the straight line from c − iR to c + iR. Indeed, if you parametrize this contour as z = c + iy and use the procedures for evaluation of contour integrals, you get the integral in 16.16. Then taking the limit as R → ∞ it is customary to write this limit as 1 2πi
Z
c+i∞
eut Lg (u) du
c−i∞
This is called the Bromwich integral and as shown earlier it recovers the mid point of the jump of g at t for every point t where g is Holder continuous from the right and from the left. Remember t ≥ 0. Now u → eut Lg (u) is analytic for Re (u) > η and in particular for Re (u) ≥ c therefore, all of the poles of u → Lg (u) are contained in the set Re (u) < c. Indeed, in practice, u → Lg (u) ends up being represented by a formula which is clearly a meromorphic function, one which is analytic except for isolated poles. So how do you compute this Bromwich integral? This is where the method of residues is very useful. Consider the following contour. Let γ R be the above contour oriented as shown. The radius of the y x=c circular part is R. Let CR be the curved part. Then one can show that under suitable assumptions x Z 1 lim eut F (u) du = 0 (16.17) R→∞ 2πi C R A sufficient condition is that for all |z| large enough, |F (z)| ≤
C α , some α > 0. |z|
(16.18)
Note that this assumption implies there are finitely many poles for F (z) because if w is a pole, you have limz→w |F (z)| = ∞. In the above contour, if c < 0, the following Lemma is still valid with minor modifications in the argument. Alternatively, you could consider the resulting contour for c < 0 as a part of the contour about to be discussed and get the same result as a corollary of what is about to be shown.
16.9. THE INVERSION OF LAPLACE TRANSFORMS
447
Lemma 16.9.2 Let the contour be as shown and assume 16.18. Then the above limit in 16.17 exists. Proof: Assume c ≥ 0 as shown and let θ be the angle between the positive x axis and a point on CR . Let 0 < β < α. Then the contour integral over CR will be broken up into three pieces, two pieces around the y axis c π c i hπ , + arcsin , − arcsin θ ∈ R 2 R1−β 2 3π 3π c c , , − arcsin + arcsin 2 R1−β 2 R and the third having θ∈
c 3π c π , + arcsin − arcsin 2 R1−β 2 R1−β
Then, Z
etz F (z) dz =
3π 2 −arcsin
Z
c ( R1−β )
e(R cos θ+iR sin θ)t F Reiθ Rieiθ dθ+
(16.19)
c ( R1−β )
π 2 +arcsin
CR
Z
c ( R1−β )
π 2 +arcsin
+ π 2 −arcsin
Z
( ) ( Rc )
e(R cos θ+iR sin θ)t F Reiθ Rieiθ dθ
c R
3π 2 +arcsin
+
e(R cos θ+iR sin θ)t F Reiθ Rieiθ dθ
c ) ( R1−β
3π 2 −arcsin
∗ Consider the last two integrals first. For large |z| , with z ∈ CR , the sum of the absolute values of these is no more than Z π Z 3π c 2 +arcsin( R1−β 2 +arcsin( Rc ) ) C C eR(cos θ)t α Rdθ + eR(cos θ)t α Rdθ c π −arcsin( c ) 3π −arcsin( 1−β R R ) 2 R 2 R π c c c ≤ CeR(cos( 2 −arcsin( R )))t arcsin R1−α + arcsin 1−β R R 3π c c c +CeR(cos( 2 +arcsin( R )))t arcsin + arcsin R1−α R1−β R Now from trig. identities, cos π2 − arcsin (θ) = θ, cos 3π 2 + arcsin (θ) = θ, and so the above reduces to c c 2Cect arcsin + arcsin R1−α R1−β R
which converges to 0 as R → ∞. Recall 0 < β < α. It remains to consider the integral in 16.19. For large |z| , the absolute value of this integral is no more than Z
3π 2 −arcsin π 2 +arcsin
( Rc )
( Rc )
eR(cos θ)t
π c β C Rdθ ≤ CπeRt cos( 2 +arcsin( R1−β )) R1−α = CπR1−α e−cR α R
which converges to 0 as R → ∞.
Corollary 16.9.3 Let the contour be as shown and assume 16.18 for meromorphic F (s). Then the above limit in 16.17 exists. Also f (t) , given by the Bromwich integral, is continuous on (0, ∞) and its Laplace transform is F (s).
448
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
Proof: It only remains to verify continuity. Let R be so large that the above contour γ ∗R encloses all poles of F . Then for such large R, the contour integrals are not changing because all the poles are enclosed. Thus Z Z 1 1 ˆ ˆ f tˆ = lim eut F (u) du = eut F (u) du R→∞ 2πi γ 2πi γ R R Z utˆ f tˆ − f (t) ≤ f tˆ − 1 e F (u) du 2πi γ R Z 1 Z 1 utˆ ut + e F (u) du − e F (u) du 2πi γ R 2πi γ R 1 Z ut + e F (u) du − f (t) 2πi γ R Z 1 Z 1 ˆ = eut F (u) du − eut F (u) du 2πi γ R 2πi γ R is fixed, it follows that if tˆ − t is small enough, then f tˆ − f (t) is also small.
Since γ R It follows from Lemma 16.9.2 that 1 f (t) ≡ lim R→∞ 2πi =
Z
c+iR ut
e F (u) du = lim c−iR
R→∞
1 2πi
Z
c+iR
1 e F (u) du + 2πi ut
c−iR
Z
! ut
e F (u) du CR
1 2πi sum of residues of the poles of ezt F (z) = sum of residues. 2πi
The following procedure shows how the Bromwich integral can be computed to obtain an actual formula for a function. However, the integral itself will make sense and could be numerically computed to solve for the inverse Laplace transform.
Procedure 16.9.4 Suppose F (s) is a Laplace transform and is meromorphic on C and satisfies 16.18. (This situation is quite typical) Then to compute the function of t, f (t) whose Laplace transform gives F (s) , do the following. Find the sum of the residues of ezt F (z) for Re z < c. This yields the midpoint of the jump of f (t) at each t where f is Holder continuous from the left and right. (Note there are no jumps by Corollary 16.9.3 so if f is Holder continuous at every point, then f (t) is recovered.) Example 16.9.5 Suppose F (s) = form of f (t).
s . (s2 +1)2
Find f (t) such that F (s) is the Laplace trans-
There are two residues of this function, one at i and one at −i. At both of these points the poles are of order two and so we find the residue at i by ! 2 d ets s (s − i) −iteit res (f, i) = lim = 2 s→i ds 4 (s2 + 1) and the residue at −i is d res (f, −i) = lim s→−i ds
2
ets s (s + i) (s2 + 1)
2
! =
ite−it 4
16.9. THE INVERSION OF LAPLACE TRANSFORMS
449
From the above procedure, the function f (t) is the sum of these. ite−it −iteit + 4 4
1 it e−it − eit 4 1 it (cos (t) − i sin t − (cos t + i sin t)) 4 1 t sin t 2
= = =
s . (s2 +1)2
You should verify that this actually works giving L (f ) =
Example 16.9.6 Find f (t) if F (s) , the Laplace transform is e−s /s. k k k P∞ st −s s You need to compute the residues of e se . The function equals 1s k=0 (−1) (t−1) . k! Thus the residue is 1. However, this fails to be the function whose Laplace transform is F (s) . What is wrong? The problem with this is the failure of the estimate on F (s) to hold for large s. Indeed, if s = −n, you would have en /n but it would need to be less than C/nα which is not possible. The estimate requires F (s) → 0 as |s| → ∞ and this does not happen here. You can verify directly that the function which works is u1 (t) which is 0 for t < 1 and 1 for t ≥ 1. Thus if the estimate does not hold, the procedure does not necessarily hold either. If Re p < c for all p a pole of F (s) and if F (s) is meromorphic and satisfies the growth condition 16.18, and if f (t) is defined by that Bromwich integral, is it true that F (s) is the Laplace transform of f (t) for large s? Thus
1 f (t) ≡ lim R→∞ 2π
Z
R
e
(c+iy)t
−R
Z
1 F (c + iy) dy = 2πi
c+i∞
ezt F (z) dz
c−i∞
The limit must exist because, as discussed above, Z
1 lim R→∞ 2πi
c+iR
!
Z
zt
zt
e F (z) dz +
e F (z) dz
c−iR
CR
is eventually constant because the contour will have enclosed all poles of F (z), but as R continues to increase, the integral over the curved part, CR converges to 0. Let Re s be larger than c. One needs to consider ∞
Z
−st
L (f ) (s) =
e 0
1 2πi
Z
c+i∞
1 e F (z) dzdt = 2πi zt
c−i∞
Z
∞
e 0
Z
−st
lim
R→∞
ezt F (z) dzdt
γR
This equals 1 r→∞ 2πi
Z
lim
0
r
e−st lim
R→∞
Z
ezt F (z) dzdt
γR
Eventually, for all R large enough, the contour includes all of the finitely many poles of F (z). There are only finitely many poles because of the estimate on F (z). Thus we can pick R large enough that the limit on the inside equals the contour integral. Thus Z ∞ Z Z r Z 1 1 −st zt −st L (f ) (s) = e e F (z) dzdt = lim e ezt F (z) dzdt r→∞ 2πi 0 2πi 0 γR γR Now interchanging the two contour integrals 1 = lim r→∞ 2πi
Z
Z
r −(s−z)t
e γR
0
1 F (z) dzdt = lim r→∞ 2πi
Z γR
1 e−(s−z)r − s−z s−z
F (z) dz
450
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS 1 = 2πi
Z γR
F (z) dz s−z
Now this contour integral is not zero because F (z) is not analytic on the inside of γ ∗R . Let the orientation of γ R be switched and call the new contour γˆ R . Then Z 1 F (z) L ({) (s) = dz 2πi γˆ R z − s Is this equal to F (s)? Consider a large circular contour of radius M where M > |s| and orient it counter clockwise about s as shown in the following picture. Denote this oriented curve as γ M . Then from the estimate assumed on F, y Z F (z) C 1 x=c dz ≤ α 2πM γ M z − s M M − |s| x Now as M → ∞, this converges to 0. Therefore, from the usual Cauchy s integral formula, ! Z Z F (z) F (z) 1 dz + dz F (s) = 2πi γ ˆR z − s γM z − s Now take a limit of both sides as M → ∞ and you obtain Z Z 1 F (z) 1 F (z) F (s) = dz = dz 2πi γˆ R z − s 2πi γ R s − z Thus this shows the following interesting proposition. This proposition shows conditions under which a meromorphic function is the Laplace transform of a function which happens to be given by the Bromwich integral and they are the conditions used earlier. This proves the following proposition.
Proposition 16.9.7 If Re p < c for all p a pole of F (s) and if F (s) is meromorphic and satisfies the growth condition 16.18, and if f (t) is defined by the Bromwich integral, then F (s) is the Laplace transform of f (t) for large s.
16.10
Exercises
R ∞ cos x 1. Find the following improper integral. −∞ 1+x 4 dx Hint: Use upper semicircle contour R ∞ eix and consider instead −∞ 1+x4 dx. This is because the integral over the semicircle will converge to 0 as R → ∞ if you have eiz but this won’t happen if you use cos z because cos z will be unbounded. Just write down and check and you will see why this happens. eiz Thus you should use 1+z 4 and take real part. I think the standard calculus techniques will not work for this horrible integral. R ∞ cos(x) ix . 2. Find −∞ (1+x 2 )2 dx. Hint: Do the same as above replacing cos x with e 3. Consider the following contour.
x
16.10. EXERCISES
451
The small semicircle has radius r and is centered at (1, 0). The large semicircle has radius R and is centered at (0, 0). Use the method of residues to compute ! Z r Z R x x dx + dx lim lim 3 r→0 R→∞ r 1 − x3 −R 1 − x R∞ x This is called the Cauchy principal value for −∞ 1−x 3 dx. The integral makes no sense in terms of a real honest integral. The function has Ra pole on the x axis. Another ∞ instance of this was in Problem 7 on Page 410 where 0 sin (x) /xdx was determined similarly. However, you can define such a Cauchy principal value. Rather than belabor this issue, I will illustrate with this example. These principal value integrals occur because of cancelation. They depend on a particular way of taking a limit. They are not mathematically respectable but are certainly interesting. They are in that general area of finding something by taking a certain kind of symmetric limit. Such problems include the Lebesgue fundamental theorem of calculus with the symmetric derivative. R 2π cos(θ) 4. Find 0 1+sin 2 (θ) dθ. R 2π dθ 5. Find 0 2−sin θ. R π/2 dθ 6. Find −π/2 2−sin θ. 7. Suppose you have a function f (z) which is the quotient of two polynomials in which the degree of the top is two less than the degree of the bottom and you consider the contour.
x Then define
Z
f (z) eisz dz
γR
in which s is real and positive. Explain why the integral makes sense and why the part of it on the semicircle converges to 0 as R → ∞. Use this to find Z ∞ eisx dx, k > 0. 2 2 −∞ k + x 8. Show using methods from real analysis that for b ≥ 0, √ Z ∞ π −b2 −x2 e cos (2bx) dx = e 2 0 √ R∞ 2 2 Hint: Let F (b) ≡ 0 e−x cos (2bx) dx − 2π e−b . Then from Problem 2 on Page 245, F (0) = 0. Using the mean value theorem on difference quotients and the dominated convergence theorem, explain why √ Z ∞ 2 π −b2 F 0 (b) = −2xe−x sin (2bx) dx + 2b e 2 0 Z ∞ √ π −b2 0 −x2 F (b) = 2b e cos (2bx) dx + e 2 0 √ √ π −b2 π −b2 = 2b F (b) + e + e 2 2 √ −b2 = 2bF (b) + π2be
452
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS Now use the integrating factor method for solving linear differential equations from beginning differential equations to solve the ordinary differential equation. √ 2 d −b2 e F (b) = π2be−2b db Then
2 2√ 1 1√ e−b F (b) − 0 = − e−2b π + π 2 2 2 1 1 √ −b2 F (b) = − e−b + πe =0 2 2 You fill in the details. This is meant to be a review of real variable techniques.
9. You can do the same problem as above using contour integration. For b = 0 it follows from Problem 2 on Page 245. For b > 0, use the contour which goes from −a to a 2 to a + ib to −a + ib to −a. Then let a → ∞ and show that the integral of e−z over the vertical parts of this contour converge to 0. Hint: You know from the earlier 2 problem what happens on the bottom part of the contour. Also for z = x + ib, e−z = 2 2 2 2 e−(x −b +2ixb) = eb e−x (cos (2xb) + i sin (2xb)) . 10. Consider the circle of radius 1 oriented counter clockwise. Evaluate Z z −6 cos (z) dz γ
11. Consider the circle of radius 1 oriented counter clockwise. Evaluate Z z −7 cos (z) dz γ
12. Find
R∞
13. Find
R∞
0
0
2+x2 1+x4 dx. x1/3 1+x2 dx
14. Suppose f is an entire function and that it has no zeros. Show there must exist an entire function g such that f (z) = eg(z) . Hint: Letting γ (0, z) be the line segment R 0 0 (w) (z) dw. Then show that gˆ0 (z) = ff (z) . which goes from 0 to z, let gˆ (z) ≡ γ(0,z) ff (w) 0 0 (z) −ˆ g (z) 0 Then e−ˆg(z) f (z) = e−ˆg(z) −f f (t) = 0. Now when you have an f (z) f (z) + e entire function whose derivative is 0, it must be a constant. Modify gˆ (z) to make f (z) = eg(z) . 15. Let f be an entire function with zeros {α1 , · · · , αn } listed according to multiplicity. Thus you might have repeats in this list. Show that there is an analytic function g (z) such that for all z ∈ C, n Y f (z) = (z − αk ) eg(z) k=1
Qn
Hint: You know f (z) = k=1 (z − αk ) h (z) where h (z) has no zeros. To see this, 2 note that near α1 , f (z) = a1 (z − α1 )+a2 (z − α1 ) +· · · and so f (z) = (z − α1 ) f1 (z) where f1 (z) 6= 0 at α1 . Now do the same for f1 and continue till fn = h. Now use the above problem. 2 16. Let F (s) = (s−1) so it is the Laplace transform of some f (t). Use the method of 2 +4 residues to determine f (t).
16.10. EXERCISES
453
17. This problem is about finding the fundamental matrix for a system of ordinary differential equations Φ0 (t) = AΦ (t) , Φ (0) = I having constant coefficients. Here A is an n × n matrix and I is the identity matrix. A matrix, Φ (t) satisfying the above is called a fundamental matrix for A. In the −1 following, s will be large, larger than all poles of (sI − A) . R (·) (a) Show that L 0 f (u) du (s) = 1s F (s) where F (s) ≡ L (f ) (s) (b) Show that L (I) = 1s I where I is the identity matrix. −1
(c) Show that there exists an n × n matrix Φ (t) such that L (Φ) (s) = (sI − A) Hint: From linear algebra
−1
(sI − A)
= ij
.
cof (sI − A)ji det (sI − A)
−1
Show that the ij th entry of (sI − A) satisfies the conditions of Proposition −1 16.9.7 and so there exists Φ (t) such that L (Φ) (s) = (sI − A) . By Corollary 16.9.3, this t → Φ (t) is continuous. (d) Thus (sI − A) L (Φ) (s) = I. Then explain why I − 1s A L (Φ) (s) = 1s I = L (I) and 1 L (Φ) (s) − L (AΦ) (s) = s ! Z (·) L (Φ) − L AΦ (u) du =
L (I) L (I)
0
so
t
Z Φ (t) −
AΦ (u) du = I 0
and so Φ is a fundamental matrix. (e) Next explain why Φ must be unique by showing that if Φ (t) is a fundamental −1 matrix, then its Laplace transform must be (sI − A) and use the theorem which says that if the two continuous functions have the same Laplace transform, then they are the same function. 18. In the situation of the above problem, show that there is one and only one solution to the initial value problem x0 (t) = Ax (t) +f (t) , x (0) = x0 , t ≥ 0 and it is given by Z
t
Φ (t − u) f (u) du
x (t) = Φ (t) x0 + 0
R (·) Hint: Verify that L 0 Φ (t − u) f (u) du (s) = L (Φ) (s) L (f ) (s) . Thus if x is given by the variation of constants formula just listed, then L (x) (s)
=
(sI − A) L (x) (s)
=
−1
(sI − A)
−1
x0 + (sI − A)
L (f ) (s)
x0 + L (f ) Rt Rt Now divide by s and verify x (t) = x0 + 0 Ax (u) du + 0 f (u) du. You could also simply differentiate the variation of constants formula using chain rule and verify it works.
454
CHAPTER 16. ISOLATED SINGULARITIES AND ANALYTIC FUNCTIONS
Chapter 17
Mapping Theorems 17.1
Meromorphic Functions
First is the definition of a meromorphic function. It is one which is analytic except for poles. We assume here that all functions have complex values.
Definition 17.1.1 Let f : Ω → C where Ω is an open subset of C. Then f : Ω → C is called meromorphic if all singularities are either poles or removable. Recall that α is a pole for f if for all z near α, f (z) =
∞ X
n X
k
ak (x − α) +
bk
k=1 (z − α)
k=0
k
Thus f is analytic in some deleted ball B 0 (α, δ) meaning for all z such that 0 < |z − α| < δ. The complex valued meromorphic functions defined on Ω will be denoted as M (Ω) . When f is analytic in some deleted ball around α and f (z) =
∞ X
∞ X
k
ak (x − α) +
k=1
k=0
bk (z − α)
k
where infinitely many of the bk are nonzero, we say that α is an essential singularity. It is shown that the poles cannot have a limit point in Ω just as a nonzero analytic function cannot have a limit point in Ω. Here is a lemma.
Lemma 17.1.2 If f has a pole at a, then limz→a |f (z)| = ∞. Also if f ∈ M (Ω) then the poles cannot have a limit point in Ω and there are at most countably many poles. For f ∈ M (Ω) , α is a pole if and only if limz→α |f (z)| = ∞. Also α is a zero if and only if limz→α |f (z)| = 0. Proof: We know by definition that f (z) = g (z) +
n X
bk k
k=1
(z − a)
where bn 6= 0. Thus by the triangle inequality, |f (z)|
≥
|bn | n − |z − a|
=
|bn | n |z − a|
≥
|bn | 1 n |z − a| 2
|g (z)| +
1−
n−1 X
!
|bk | k
k=1
|z − a|
n−1 1 X n−k |g (z)| |z − a| / |bn | + |bk | |z − a| |bn | n
k=1
455
!!
456
CHAPTER 17. MAPPING THEOREMS
for |z − a| small enough. Thus the claim is verified. Consider the second claim. Suppose αm is a pole and limm→∞ αm = α ∈ Ω and f is analytic at α. Then from the first part, there exists β m such that |αm − β m | < 1/m but |f (β m )| > m. Then β m → α but limm→∞ |f (β m )| does not exist. In particular, |f (β m )| fails to converge to f (α) showing that f cannot be analytic at α. Thus it must be the case that α is a pole. But now this is not allowed either because at a pole the function is analytic on a deleted ball centered at the pole and it is assume here that limm→∞ αm = α. Thus there are finitely many poles in every compact subset of Ω. However, 1 ∞ C ∩ B (0, k) ≡ ∪∞ Ω = ∪k=1 z : dist z, Ω ≤ k=1 Kk k where Kk is compact. If Ω = C, let Kk = B (0, k). Then by what was just shown, there are finitely many poles in Kk and so the number of poles is at most countable. Finally, consider the last claim. It is obvious that α is a zero if and only if limz→α f (z) = 0. It was shown above that at poles limz→α |f (z)| = ∞. Then suppose the limit condition holds. Why is α a pole? This happens because of the Casorati Weierstrass theorem, Theorem 16.6.4. Every singularity is isolated for a meromorphic function by definition. Thus there is a Laurent expansion for f near α. If the principal part is an infinite series, then by this theorem, the values of f near α are dense in C and so limz→α |f (z)| does not even exist. Therefore, this principal part must be a finite sum and so α is a pole. Here is an alternate definition of meromorphic.
Definition 17.1.3
f ∈ M (Ω) for Ω an open set means f is analytic in B 0 (α, r) ≡ {z : 0 < |z − α| < r}
for some r > 0 for every α ∈ Ω. At α ∈ Ω either limz→α (z − α) f (z) = 0 so α is removable or limz→α |f (z)| = ∞ which corresponds to α a pole. The fact that the poles can have no limit point in Ω is fairly significant. It shows that if P is the set of poles and if Ω is connected, then Ω \ P is an open connected set if Ω is.
Lemma 17.1.4 Suppose Ω is an open connected set and suppose P is a set of isolated points contained in Ω such that P has no limit point in Ω. Then Ω \ P is a connected open set. Proof: Suppose Ω \ P = A ∪ B where A, B have empty intersection and A has no limit points of B while B has no limit points of A and neither is empty. For p ∈ P, B (p, r) contains points of A ∪ B because this cannot consist entirely of points of P due to what was just shown, that p is not a limit point of P . For r sufficiently small, all points in B 0 (p, r) are in A ∪ B because p is not a limit point of P but, since Ω is open, B (p, r) ⊆ Ω for r small enough. Also, each A, B is an open set since Ω \ P is open. Now it is obvious that B 0 (p, r) is a connected open set contained in A ∪ B and so B 0 (p, r) must be contained in either A or in B. Let PA be those points of P such that B 0 (p, r) ⊆ A for all r small enough and let PB be defined similarly. Then Ω = (A ∪ PA ) ∪ (B ∪ PB ) . If p ∈ PA , then B 0 (p, r) ⊆ A and so p is not a limit point of B. From what was shown above, p is not a limit point of PB either. If x ∈ A, then x is not a limit point of B. Neither is it a limit point of PB because PB has no limit points in Ω as shown in Lemma 17.1.2. Similarly, B ∪ PB has no limit points of A ∪ PA . This separates Ω and is a contradiction to Ω being connected. The following is a useful lemma.
Lemma 17.1.5 Let f ∈ M (Ω) and suppose Pf the set of poles consist of {α1 , · · · , αn } . Then there exists a function g analytic on Ω such that for all z ∈ / Pf , f (z) −
n X i=1
Si (z) = g (z)
17.2. MEROMORPHIC ON EXTENDED COMPLEX PLANE
457
where Si (z) is the singular part corresponding to αi . That is, for z near αi , f (z) = hi (z) +
mi X
bk k
k=1
(z − αi )
, hi analytic near αi
(17.1)
Pn Proof: Note that f (z) − i=1 Si (z) is meromorphic on Ω. However, it has no poles. Indeed, if α is a pole, then it must be one of the αi since all the Si would be analytic at α if this were not the case. But n n X X Sj (z) Sj (z) = lim hi (z) + Si (z) − lim f (z) − z→αi z→αi j=1 j=1 X Sj (z) = lim hi (z) − z→αi j6=i which is not ∞. Thus the function can be redefined at each of the finitely many poles so that the result, called g (z) is analytic on Ω. PnBecause of this lemma, it is all right to be a little sloppy and simply write f (z) − i=1 Si (z) equals an analytic function.
Proposition 17.1.6 Let Ω be a connected open set. Then M (Ω) is a field with the usual conventions about summation and multiplication of functions. Proof: It is tedious but routine to verify that M (Ω) is a ring. The part of this which is not entirely obvious is whether the product of two meromorphic functions is meromorphic. It is clear that f g is analytic on B 0 (α, r) for small enough r. The only difficulty arises when P∞ Pm k bk α is a zero for f but a pole for g. Thus f (z) = k=r ak (z − α) , g (z) = h (z)+ k=1 (z−α) k for h analytic. But this means f (z) g (z) = f (z) h (z) + f (z)
m X
bk k
k=1
(z − α)
If r is as large as m, then f g has a removable singularity. Otherwise, it will have a pole. Thus the product of meromorphic functions is indeed meromorphic. There are various cases to consider for the sum of two of these meromorphic functions also but it is routine to verify that the sum is meromorphic. So suppose f ∈ M (Ω) and f 6= 0. Then it is analytic on Ω \ P where P is the set of poles. Thus the set of zeros has no limit point. Consider 1/f . 1 It is analytic in B 0 (α, r) for r small enough. If α is a zero, then limz→α |f (z)| = ∞ and 1/f 1 is analytic near α. If α is not a zero of f , then limz→α f (z) (z − α) = 0. Thus 1/f ∈ M (Ω).
17.2
Meromorphic on Extended Complex Plane
b if it is meromorphic on C and either We say f ∈ M C 1 limz→0 zf z = 0 in which it is said that f has a removable singularity at ∞ or lim|z|→0 f z1 = ∞ in which case we say f has a pole at ∞. That is, f z1 has a pole at 0. b , then it has only finitely many poles. Observation 17.2.2 If f ∈ M C
Definition 17.2.1
If f has a pole at ∞, this implies there cannot be a sequence of poles converging to ∞. In terms of what this means, this says that there exists r > 0 such that all poles of f are in B (0, r). Since there is no limit point for the poles, there are only finitely many.
458
CHAPTER 17. MAPPING THEOREMS
P∞ If f has a removable singularity at ∞, then f z1 = k=0 ak z k for small z. Hence if there exists a sequence of poles {αn } , αn → ∞, then one could get β n within 1/n of αn and |f (β n )| > n. Thus, |β n | → ∞ and for large n, f
1 (1/β n )
= f (β n ) =
∞ X
ak
k=0
1 βn
k
for large n, but this is not possible because the right converges to a0 while the left is unbounded.
17.3
Rouche’s Theorem
If f is meromorphic on C then if you have any rectifiable simple closed curve Γ, f is either identically 0 or it can have only finitely many poles and zeros on its inside. Recall why this is so. It is because poles must be isolated. Also, if there are infinitely many zeros of f inside Γ, then the set of zeros will have a limit in C, maybe on the boundary. By Lemma 17.1.4, deletion of the poles results in a connected open set on which f is analytic and in which the zeros have a limit point. This would require f to equal zero at every point other than the poles. Hence it would be forced to have no poles either and would be the zero function.
Theorem 17.3.1
Let f ∈ M (C). Also suppose γ ∗ is a closed bounded variation curve containing none of the poles or zeros of f and let γ be positively oriented so that for z on the inside of γ ∗ , n (γ, z) = 1. Now let {p1 , · · · , pm } and {z1 , · · · , zn } be respectively the poles and zeros of f which are on the inside of γ ∗ . Let zk be a zero of order rk and let pk be a pole of order lk . Then Z 0 n m X X 1 f (z) dz = rk − lk 2πi γ f (z) k=1
k=1
Proof: This theorem follows from computing the residues of f 0 /f which has residues only at poles and zeros of f . I will do this now. First suppose f has a pole of order p at α. Then f has the form given in 17.1. Therefore, Pp kbk h0 (z) − k=1 (z−α) k+1 f 0 (z) = Pp b k f (z) h (z) + k=1 (z−α)k Pp−1 pbp p −k−1+p h0 (z) (z − α) − k=1 kbk (z − α) − (z−α) = Pp−1 p p−k + bp h (z) (z − α) + k=1 bk (z − α) r (z) − =
pbp (z−α)
s (z) + bp
where s (α) = 0, limz→α (z − α) r (α) = 0. It has a simple pole at α and so the residue is res
f0 ,α f
= lim (z − α) z→α
r (z) −
pbp (z−α)
s (z) + bp
= −p
the order of the pole. Next suppose f has a zero of order p at α. Then P∞ k−1 f 0 (z) k k k=p ak (z − α) = = P k−1 f (z) z−α ∞ z − α k=p ak (z − α) and from this it is clear res (f 0 /f ) = p, the order of the zero. The conclusion of this theorem now follows from the residue theorem Theorem 16.7.1.
17.4. FRACTIONAL LINEAR TRANSFORMATIONS
17.4
459
Fractional Linear Transformations
These mappings map lines and circles to either lines or circles.
Definition 17.4.1
A fractional linear transformation is a function of the form f (z) =
az + b cz + d
(17.2)
where ad − bc 6= 0. Note that if c = 0, this reduces to a linear transformation (a/d) z + (b/d) . Special cases of these are defined as follows. dilations: z → δz, δ 6= 0, inversions: z →
1 , z
translations: z → z + ρ. The next lemma is the key to understanding fractional linear transformations.
Lemma 17.4.2 The fractional linear transformation, 17.2 can be written as a finite composition of dilations, inversions, and translations. Proof: Let
1 (bc − ad) d z S1 (z) = z + , S2 (z) = , S3 (z) = c z c2
and
a c in the case where c 6= 0. Then f (z) given in 17.2 is of the form S4 (z) = z +
f (z) = S4 ◦ S3 ◦ S2 ◦ S1 . Here is why. 1 d ≡ S2 (S1 (z)) = S2 z + c z+ Now consider
S3
c zc + d
≡
(bc − ad) c2
c zc + d
d c
=
=
c . zc + d bc − ad . c (zc + d)
Finally, consider S4
bc − ad c (zc + d)
≡
bc − ad a b + az + = . c (zc + d) c zc + d
In case that c = 0, f (z) = ad z + db which is a translation composed with a dilation. Because of the assumption that ad − bc 6= 0, it follows that since c = 0, both a and d 6= 0. This lemma implies the following corollary.
Corollary 17.4.3 Fractional linear transformations map circles and lines to circles or lines. Proof: It is obvious that dilations and translations map circles to circles and lines to lines. What of inversions? If inversions have this property, the above lemma implies a general fractional linear transformation has this property as well. Note that all circles and lines may be put in the form α x2 + y 2 − 2ax − 2by = r2 − a2 + b2
460
CHAPTER 17. MAPPING THEOREMS
where α = 1 gives a circle centered at (a, b) with radius r and α = 0 gives a line. In terms of complex variables you may therefore consider all possible circles and lines in the form αzz + βz + βz + γ = 0,
(17.3)
To see this let β = β 1 + iβ 2 where β 1 ≡ −a and β 2 ≡ b. Note that even if α is not 0 or 1 the expression still corresponds to either a circle or a line because you can divide by α if α 6= 0. Now I verify that replacing z with z1 results in an expression of the form in 17.3. Thus, let w = z1 where z satisfies 17.3. Then 1 α + βw + βw + γww = αzz + βz + βz + γ = 0 zz and so w also satisfies a relation like 17.3. One simply switches α with γ and β with β. Note the situation is slightly different than with dilations and translations. In the case of an inversion, a circle becomes either a line or a circle and similarly, a line becomes either a circle or a line. The next example is quite important. Example 17.4.4 Consider the fractional linear transformation, w =
z−i z+i .
First consider what this mapping does to the points of the form z = x + i0. Substituting into the expression for w, x2 − 1 − 2xi x−i = , w= x+i x2 + 1 a point on the unit circle. Thus this transformation maps the real axis to the unit circle. The upper half plane is composed of points of the form x + iy where y > 0. Substituting in to the transformation, x + i (y − 1) w= , x + i (y + 1) which is seen to be a point on the interior of the unit disk because |y − 1| < |y + 1| which implies |x + i (y + 1)| > |x + i (y − 1)|. Therefore, this transformation maps the upper half plane to the interior of the unit disk. One might wonder whether the mapping is one to one and onto. The mapping is clearly w+1 one to one because it has an inverse, z = −i w−1 for all w in the interior of the unit disk. Also, a short computation verifies that z so defined is in the upper half plane. Therefore, this transformation maps {z ∈ C such that Im z > 0} one to one and onto the unit disk {z ∈ C such that |z| < 1} .
17.5
Some Examples
There is a simple procedure for determining fractional linear transformations which map a given set of three points to another set of three points. The problem is as follows: There are three distinct points in the complex plane, z1 , z2 , and z3 and it is desired to find a fractional linear transformation such that zi → wi for i = 1, 2, 3 where here w1 , w2 , and w3 are three distinct points in the complex plane. Then the procedure says that to find the desired fractional linear transformation solve the following equation for w. z − z1 z2 − z3 w − w1 w2 − w3 · = · w − w3 w2 − w1 z − z3 z2 − z1 The result will be a fractional linear transformation with the desired properties. Why should this procedure work? Here is a heuristic argument to indicate why you would expect this to happen rather than a rigorous proof. The reader may want to tighten the argument to give a proof. First suppose z = z1 . Then the right side equals zero and
17.5. SOME EXAMPLES
461
so the left side also must equal zero. However, this requires w = w1 . Next suppose z = z2 . Then the right side equals 1. To get a 1 on the left, you need w = w2 . Finally suppose z = z3 . Then the right side involves division by 0. To get the same bad behavior, on the left, you need w = w3 . Example 17.5.1 Let z1 = 0, z2 = 1, and z3 = 2 and let w1 = 0, w2 = i, and w3 = 2i. Then the equation to solve is −i z −1 w · = · w − 2i i z−2 1 Solving this yields w = iz which clearly works. Example 17.5.2 Let ξ ∈ C and suppose Im (ξ) > 0. Then define f (z) ≡
z−ξ z − ¯ξ
Let U ≡ C \ ¯ξ . This is clearly an open connected set. Also let V ≡ C \ {1} . Then f maps U one to one and onto V. Indeed, if w ∈ V, then solve w=
z−ξ z − ¯ξ
for z. This yields z=
¯ξw − ξ w−1
As long as w 6= 1, this gives a solution because z 6= ¯ξ. (¯ξ (w − 1) 6= ¯ξw − ξ). Thus f is onto. If you have z−ξ = zzˆˆ−ξ , then you would have z−¯ ξ −¯ ξ z zˆ − ξ zˆ − ¯ξz − ξ ¯ξ = z zˆ − ξz − ¯ξ zˆ − ξ ¯ξ and so you would need to have ξ zˆ + ¯ξz = ξz + ¯ξ zˆ ξ − ¯ξ zˆ = ξ − ¯ξ z which requires zˆ = z. The function is one to one and analytic on U . Therefore, f −1 is also continuous by the open mapping theorem. f (R) ⊆ S 1 , the unit circle {z : |z| = 1} but it is missing the point 1. Therefore, f (R) = S 1 \ {1} since otherwise, f (R) would not be connected. In other words, f (R) cannot miss any other points. Let B be the unit open ball. Let U + be the upper half plane in which Re z > 0 and let U − be the lower half plane in which Re z < 0 except for the point ¯ξ. Then B ⊆ f (U + ) ∪ f (U − ) the union of two disjoint open sets. It follows that B being a connected set must be in one or the other of these. However, f (ξ) = 0 so it must be the case that B ⊆ f (U + ) . Is there anything in f (U + ) which is not in B? If so, there would be a contradiction because part of f (U + ) would be in B and the other part would have |z| > 1. Thus f (U + ) would fail to be connected which is impossible because f is continuous and U + is connected. Therefore, f (U + ) = B. Thus this is an example of an analytic function defined on a connected open set, in this case, the upper half plane such that the image of this analytic function is the unit ball. This begs the question of which connected open sets can be mapped one to one by an analytic function onto the unit ball. It turns out that every simply connected open set will have this property. Simply connected means that if you have any simple closed curve contained in the region, then the inside of the simple closed curve is in the region.
462
CHAPTER 17. MAPPING THEOREMS
Lemma 17.5.3 For α ∈ B (0, 1) , let z−α . 1 − αz
φα (z) ≡
Then φα maps B (0, 1) one to one and onto B (0, 1), φ−1 α = φ−α , and φ0α (α) =
1
2.
1 − |α|
Proof: First of all, for |z| < 1/ |α| , φα ◦ φ−α (z) ≡
z+α 1+αz
1−α
−α =z
z+α 1+αz
after a few computations. If I show that φα maps B (0, 1) to B (0, 1) for all |α| < 1, this will have shown that φα is one to one and onto B (0, 1). Consider φα eiθ . This yields iθ e − α 1 − αe−iθ = 1 − αeiθ 1 − αeiθ = 1 where the first equality is obtained by multiplying by e−iθ = 1. Therefore, φα maps ∂B (0, 1) one to one and onto ∂B (0, 1) . Now notice that φα is analytic on B (0, 1) because the only singularity, a pole is at z = 1/α. By the maximum modulus theorem, Problem 26 on Page 414 or Theorem 16.1.4, it follows |φα (z)| < 1 whenever |z| < 1. The same is true of φ−α . It only remains to verify the assertions about the derivatives. Long division gives φα (z) = −1
(−α)
+
−α+(α)−1 1−αz
and so
φ0α (z)
−2 −1 (−1) (1 − αz) −α + (α) (−α) −2 −1 = α (1 − αz) −α + (α) −2 2 = (1 − αz) − |α| + 1 =
Hence the two formulas follow. The next lemma is called the Schwarz lemma. It was presented earlier in the exercises.
Lemma 17.5.4 Suppose F : B (0, 1) → B (0, 1) , F is analytic, and F (0) = 0. Then for all z ∈ B (0, 1) , |F (z)| ≤ |z| ,
(17.4)
|F 0 (0)| ≤ 1.
(17.5)
and If equality holds in 15.36 then there exists λ ∈ C with |λ| = 1 and F (z) = λz.
(17.6)
Proof: First note that by assumption, F (z) /z has a removable singularity at 0 if its value at 0 is defined to be F 0 (0) . By the maximum modulus theorem, if |z| < r < 1, it F (z) 1 ≤ max F re ≤ . z t∈[0,2π] r r
17.6. RIEMANN MAPPING THEOREM
463
Then letting r → 1, F (z) z ≤1 this shows 17.4 and it also verifies 17.5 on taking the limit as z → 0. If equality holds in 17.5, then |F (z) /z| achieves a maximum at an interior point so F (z) /z equals a constant, λ by the maximum modulus theorem. Since F (z) = λz, it follows F 0 (0) = λ and so |λ| = 1. Rudin [61] gives a memorable description of what this lemma says. It says that if an analytic function maps the unit ball to itself, keeping 0 fixed, then it must do one of two things, either be a rotation or move all points closer to 0. (This second part follows in case |F 0 (0)| < 1 because in this case, you must have |F (z)| = 6 |z| and so by 15.35, |F (z)| < |z|) In the next section, the problem of considering which regions can be mapped onto the unit ball by a one to one analytic function will be considered. Some can and some can’t. It turns out that the main result in this subject is the Riemann mapping theorem which says that any simply connected connected open set can be so mapped onto the unit ball. A key result in showing this is the fact that such regions have something called the square root property.
Definition 17.5.5
A region, Ω has the square root property if whenever f, f1 : Ω → C are both analytic, it follows there exists φ : Ω → C such that φ is analytic and f (z) = φ2 (z) . The following lemma says that every simply connected region which is not all of C has the square root property.
Lemma 17.5.6 Let Ω be a simply connected region with Ω 6= C. Then Ω has the square root property. Proof: Let f and
1 f
both be analytic on Ω. Then
16.4.5, there exists Fe, analytic on Ω such that Fe0 =
f0 f 0
f f
is analytic on Ω so by Corollary 0 e on Ω. Then f e−F = 0 and so
e e f (z) = CeF = ea+ib eF . Now let F = Fe + a + ib. Then F is still a primitive of f 0 /f and 1 f (z) = eF (z) . Now let φ (z) ≡ e 2 F (z) . Then φ is the desired square root and so Ω has the square root property.
17.6
Riemann Mapping Theorem
From the open mapping theorem analytic functions map regions to other regions or else to single points. The Riemann mapping theorem states that for every simply connected region, Ω which is not equal to all of C there exists an analytic function, f such that f (Ω) = B (0, 1) and in addition to this, f is one to one. The proof involves several ideas which have been developed up to now. The proof is based on the following important theorem, a case of Montel’s theorem. Before beginning, note that the Riemann mapping theorem is a classic example of a major existence theorem. In mathematics there are two sorts of questions, those related to whether something exists and those involving methods for finding it. I am afraid that the latter is typically all that gets studied by undergraduate students but the real questions are related to existence. There is a long and involved history for proofs of this theorem. The first proofs were based on the Dirichlet principle and turned out to be incorrect, thanks to Weierstrass who pointed out the errors. For more on the history of this theorem, see Hille [35]. The following theorem is really wonderful. It is about the existence of a subsequence having certain salubrious properties. It is this wonderful result which will give the existence of the mapping desired. The other parts of the argument are technical details to set things up and use this theorem.
464
17.6.1
CHAPTER 17. MAPPING THEOREMS
Montel’s Theorem
Theorem 17.6.1
Let Ω be an open set in C and let F denote a set of analytic functions mapping Ω to B (0, M ) ⊆ C. Then there exists a sequence of functions from F, (k) ∞ {fn }n=1 and an analytic function f such that for each k ∈ N, fn converges uniformly to f (k) on every compact subset of Ω. Here f (k) denotes the k th derivative. Proof: First note there exists a sequence of compact sets, Kn such that Kn ⊆ int Kn+1 ⊆ Ω for all n where here int K denotes the interior of the set K, the union of all open sets conC ≤ n1 tained in K and ∪∞ n=1 Kn = Ω. In fact, you can verify that B (0, n)∩ z ∈ Ω : dist z, Ω works for Kn . Then there exist positive numbers, δ n such that if z ∈ Kn , then B (z, δ n ) ⊆ int Kn+1 . Now denote by Fn the set of restrictions of functions of F to Kn . Then let z ∈ Kn and let γ (t) ≡ z + δ n eit , t ∈ [0, 2π] . It follows that for z1 ∈ B (z, δ n ) , and f ∈ F, Z 1 1 1 |f (z) − f (z1 )| = f (w) − dw 2πi γ w−z w − z1 Z z − z1 1 f (w) ≤ dw 2π γ (w − z) (w − z1 ) Letting |z1 − z|
0. (Why? In general, two disjoint compact sets are at a positive distance from each other. Show this is so.) Then for z ∈ K, m Z X f (w) − f (w) k! (k) n (k) dw f (z) − fn (z) = k+1 2πi (w − z) j=1 γ j ≤
m X k! 1 kfk − f kKn (length of γ k ) k+1 . 2π η j=1
17.6. RIEMANN MAPPING THEOREM
465
where here kfk − f kKn ≡ max {|fk (z) − f (z)| : z ∈ Kn } . Thus you get uniform convergence of the derivatives. Another surprising consequence of this theorem is that the property of being one to one is preserved.
Lemma 17.6.2 Suppose hn is one to one, analytic on Ω, and converges uniformly to h on compact subsets of Ω along with all derivatives. Then h is also one to one. Proof: Pick z1 ∈ Ω and suppose z2 is another point of Ω. Since the zeros of h − h (z1 ) have no limit point, there exists a circular contour bounding a circle which has z2 on the inside of this circle but not z1 such that γ ∗ contains no zeros of h − h (z1 ).
γ • z1
• z2
Using the theorem on counting zeros, Theorem 17.3.1, and the fact that hn is one to one, Z 1 h0n (w) 0 = lim dw n→∞ 2πi γ hn (w) − hn (z1 ) Z h0 (w) 1 = dw, 2πi γ h (w) − h (z1 ) which shows that h − h (z1 ) has no zeros in B (z2 , r) . In particular z2 is not a zero of h − h (z1 ) . This shows that h is one to one since z2 6= z1 was arbitrary. Since the family, F satisfies the conclusion of Theorem 17.6.1 it is known as a normal family of functions. More generally,
Definition 17.6.3
Let F denote a collection of functions which are analytic on Ω, a region. Then F is normal if every sequence contained in F has a subsequence which converges uniformly on compact subsets of Ω.
17.6.2
Regions with Square Root Property
Now with Montel’s theorem the following will imply the Riemann mapping theorem. This approach is in Rudin [61].
Theorem 17.6.4
Let Ω 6= C for Ω a region and suppose Ω has the square root property. Then for z0 ∈ Ω there exists h : Ω → B (0, 1) such that h is one to one, onto, analytic, and h (z0 ) = 0.
Proof: Define F to be the set of functions f such that f : Ω → B (0, 1) is one to one and analytic. The first task is to show F is nonempty. Then, using Montel’s theorem it will be shown there is a function in F, h, such that |h0 (z0 )| ≥ ψ 0 (z0 ) for all ψ ∈ F. When this has been done it will be shown that h is actually onto. This will prove the theorem. Claim 1: F is nonempty. Proof of Claim 1: Since Ω 6= C it follows there exists ξ ∈ / Ω. Then it follows z − ξ and 1 are both analytic on Ω. Since Ω has the square root property, there exists an analytic z−ξ √ 2 function, φ : Ω → C such that φ (z) = z − ξ for all z ∈ Ω, φ (z) = z − ξ. Since z − ξ
466
CHAPTER 17. MAPPING THEOREMS
is not constant, neither is φ and it follows from the open mapping theorem that φ (Ω) is a region. Note also that φ is one to one because if φ (z1 ) = φ (z2 ) , then you can square both sides and conclude z1 − ξ = z2 √ − ξ implying z1 = z2 . za − ξ = a. I claim there exists a positive lower bound to Now pick a ∈ φ (Ω) . Thus √ z − ξ + a for z ∈ Ω. If not, there exists a sequence, {zn } ⊆ Ω such that p p p zn − ξ + a = zn − ξ + za − ξ ≡ εn → 0. Then
p p zn − ξ = εn − za − ξ
(17.7)
and squaring both sides, p zn − ξ = ε2n + za − ξ − 2εn za − ξ. √ 2 Consequently, √ (zn − za ) = εn − 2εn za − ξ which converges to 0. Taking the limit in 17.7, it follows 2 √ za − ξ = 0 and so ξ = za , a contradiction to ξ ∈ / Ω. Choose r > 0 such that for all z ∈ Ω, z − ξ + a > r > 0. Then consider r r ≡ . (17.8) φ (z) + a z−ξ+a √ This is one to one, analytic, and maps Ω into B (0, 1) ( z − ξ + a > r). Thus F is not empty and this proves the claim. Claim 2: Let z0 ∈ Ω. There exists a finite positive real number, η, defined by η ≡ sup ψ 0 (z0 ) : ψ ∈ F (17.9) ψ (z) ≡ √
and an analytic function, h ∈ F such that |h0 (z0 )| = η. Furthermore, h (z0 ) = 0. Proof of Claim 2: First you show η < ∞. Let γ (t) = z0 + reit for t ∈ [0, 2π] and r is small enough that B (z0 , r) ⊆ Ω. Then for ψ ∈ F, the Cauchy integral formula for the derivative implies Z 1 ψ (w) 0 ψ (z0 ) = dw 2πi γ (w − z0 )2 and so ψ 0 (z0 ) ≤ (1/2π) 2πr 1/r2 = 1/r. Therefore, η < ∞ as desired. For ψ defined above in 17.8 −1 √ −r (1/2) z0 − ξ −rφ0 (z0 ) 0 ψ (z0 ) = 6= 0. 2 = 2 (φ (z0 ) + a) (φ (z0 ) + a) Therefore, η > 0. It remains to verify the existence of the function, h. By Theorem 17.6.1, there exists a sequence, {ψ n }, of functions in F and an analytic function, h, such that 0 ψ n (z0 ) → η (17.10) and ψ n → h, ψ 0n → h0 , uniformly on all compact subsets of Ω. It follows |h0 (z0 )| = lim ψ 0n (z0 ) = η n→∞
(17.11)
(17.12)
and for all z ∈ Ω, |h (z)| = lim |ψ n (z)| ≤ 1. n→∞
(17.13)
By 17.12, h is not a constant. Therefore, in fact, |h (z)| < 1 for all z ∈ Ω in 17.13 by the open mapping theorem. It follows from Lemma 17.6.2 that h is one to one. It only remains to verify that h (z0 ) = 0.
17.6. RIEMANN MAPPING THEOREM
467
If h (z0 ) 6= 0,consider φh(z0 ) ◦ h where φα is the fractional linear transformation defined in Lemma 17.5.3. By this lemma it follows φh(z0 ) ◦ h ∈ F. Now using the chain rule, 0 φh(z ) ◦ h (z0 ) = φ0h(z ) (h (z0 )) |h0 (z0 )| 0 0 1 0 = |h (z0 )| 1 − |h (z0 )|2 1 = η > η 1 − |h (z0 )|2 Contradicting the definition of η. This proves Claim 2. Claim 3: The function, h just obtained maps Ω onto B (0, 1). Proof of Claim 3: To show h is onto, use the fractional linear transformation of Lemma 17.5.3. Suppose h is not onto. Then there exists α ∈ B (0, 1) \ h (Ω) . Then 0 6= φα ◦ h (z) for all z ∈ Ω because h (z) − α φα ◦ h (z) = 1 − αh (z) and it is assumed α ∈ / h (Ω) . Therefore, since Ω has the square root property, you can p consider an analytic function z → φα ◦ h (z). This function is one to one because both φα and h are. Also, the values of this function are in B (0, 1) by Lemma 17.5.3 so it is in F. Now let p (17.14) ψ ≡ φ√φ ◦h(z ) ◦ φα ◦ h. 0
α
Thus
ψ (z0 ) = φ√φ
α ◦h(z0
◦ )
p
φα ◦ h (z0 ) = 0
and ψ is a one to one mapping of Ω into B (0, 1) so ψ is also in F. Therefore, 0 p 0 ψ (z0 ) ≤ η, φ ◦ h (z ) 0 ≤ η. α
(17.15)
Define s (w) ≡ w2 . Then using Lemma 17.5.3, in particular, the description of φ−1 α = φ−α , you can solve 17.14 for h to obtain h (z)
= φ−α ◦ s ◦ φ−√φ ◦h(z ) ◦ ψ 0 α ≡F }| { z = φ−α ◦ s ◦ φ−√φ ◦h(z ) ◦ ψ (z) α
= Now
0
(F ◦ ψ) (z)
F (0) = φ−α ◦ s ◦ φ−√φ
α ◦h(z0 )
(17.16)
(0) = φ−1 α (φα ◦ h (z0 )) = h (z0 ) = 0
and F maps B (0, 1) into B (0, 1). Also, F is not one to one because it maps B (0, 1) to B (0, 1) and has s in its definition. Thus there exists z1 ∈ B (0, 1) such that φ−√φ ◦h(z ) (z1 ) = 0 α − 1 and another point z2 ∈ B (0, 1) such that φ √ (z2 ) = 1 . However, thanks to 2
−
φα ◦h(z0 )
2
s, F (z1 ) = F (z2 ). Since F (0) = h (z0 ) = 0, you can apply the Schwarz lemma to F . Since F is not one to one, it can’t be true that F (z) = λz for |λ| = 1 and so by the Schwarz lemma it must be the case that |F 0 (0)| < 1. But this implies from 17.16 and 17.15 that η = |h0 (z0 )| = |F 0 (ψ (z0 ))| ψ 0 (z0 ) = |F 0 (0)| ψ 0 (z0 ) < ψ 0 (z0 ) ≤ η,
468
CHAPTER 17. MAPPING THEOREMS
a contradiction. Then this yields the Riemann mapping theorem.
Corollary 17.6.5 (Riemann mapping theorem) Let Ω be a simply connected region with Ω 6= C and let z0 ∈ Ω. Then there exists a function, f : Ω → B (0, 1) such that f is one to one, analytic, and onto with f (z0 ) = 0. Furthermore, f −1 is also analytic. Proof: From Lemma 17.5.6, Ω has the square root property. By Theorem 17.6.4, there exists a function, f : Ω → B (0, 1) which is one to one, onto, and analytic such that f (z0 ) = 0. The assertion that f −1 is analytic follows from the open mapping theorem.
17.7
Exercises
b . Then f is a rational function. 1. Suppose f ∈ M C (a) First note that as shown above, there are finitely many poles of f {α1 , · · · , αn } so f is analytic for |z| > r (b) Suppose f has a removable singularity at ∞. First of all, let Si (z) be the singular part of the Laurent series expanded about αi . Explain why f (z) −
n X
Si (z) ≡ fn (z) , fn an entire function
i=1
P∞ 1 Then limz→0 zfn (1/z) = 0. Explain why f = k=0 ak z k for small z. Hence, n z P∞ 1 for large |z| , fn (z) = k=0 ak zk and so fn (z) is bounded for large |z|. Now explain why fn is bounded and use Liouville’s theorem. Pn (c) Next case is when f has a pole at ∞. Then f (z) − i=1 Si (z) ≡ fn (z) , fn an entire function. If f has a pole at ∞, then fn also has a pole at ∞ and Pr 1 so fn z1 = + g (z) , for small |z|. where g (z) is analytic. Thus k=1 bkP zk r for largeP|z| , fn (z) − k=1 bk z k = g z1 and since g is analytic, itPfollows that r n k fP n (z)− k=1 bk z is bounded for large |z|. Then explain why f (z)− i=1 Si (z)− r k is a rational k=1 bk z is a bounded entire function, hence a constant sof (z) b function. Note that this shows that any f which is in M C has a partial fractions expansion. b . Thus, with the preceeding problem, 2. Explain why any rational function is in M C b equals the rational functions. M C b . Show that if f has finitely many zeros 3. Suppose you have f ∈ M (C) , not M C {α1 , · · · , αn } and poles {β 1 , · · · , β m } , then there is an entire function g (z) such that r
r
(z − α1 ) 1 · · · (z − αn ) n g(z) = f (z) p p e (z − β 1 ) 1 · · · (z − β m ) m where pi is the of the pole at β i and ri is the order of the zero at αi . Hint: First Qorder n r show f (z) = k=1 (z − αk ) k h (z) where h (z) is meromorphic but has Pmno zeros. Then h (z) has the same poles with the same orders as f (z). Then h (z)− i=1 Si (z) = l (z) where l (z) is entire and the Si (z) are the principal parts h (z) corresponding to β i . Argue now that Qn rk k=1 (z − αk ) f (z) = Qm pk (q (z)) k=1 (z − β k ) where q (z) is entire and can’t have any zeros. Next use Problem 14.
17.7. EXERCISES
469
4. Let w1 , w2 , w3 be independent periods for a meromorphic function f (z). This means P3 that if i=1 ai wi = 0 for each ai an integer, then each ai = 0. Hint: At some point you may want to use Lemma 17.1.4. P3 (a) Show that if ai is an integer, then i=1 ai wi is also a period of f (z). P3 (b) Let PN be periods of the form i=1 ai wi for ai an integer with |ai | ≤ N . Show 3 there are (2N + 1) such periods. h P P i2 3 3 (c) Show PN ⊆ −N ≡ QN . i=1 |wi | , N i=1 |wi | (d) Between the cubes of any two successive positive integers, there is the square of 3 3 a positive integer. Thus (2N ) < M 2 < (2N + 1) . Show this is so. It is easy 3/2 3/2 to verify if you show that (x + 1) − x > 2 for all x ≥ 2 showing that there 3/2 3/2 is an integer m between (n + 1) and n . Then squaring things, you get the result. (e) Partition QN into M 2 small squares. If Q is one of these, show its sides are no longer than
N
2 1/2 2 1/2 P 3 N |w | |wi | i i=1 C ≤ ≤ 1/2 3 M2 N (2N )
P
3 i=1
3
(f) You have (2N + 1) points which are contained in M 2 squares where M 2 is 2 smaller than (2N + 1) . Explain why one of these squares must contain two different periods of PN . P3 P3 (g) Suppose the two periods are i=1 ai wi and i=1 a ˆi wi , both in Q which has sides 1/2 of length no more than √ C/N√ . Thus the distance between these two periods has length no more than 2C/ N . Explain why this shows that there is a sequence of periods of f which converges to 0. Explain why this requires f to be a constant. This result, that there are at most two independent periods is due to Jacobi from around 1835. In fact, there are nonconstant functions which have two independent periods but they can’t be bounded. 5. Suppose you have f is analytic and has two independent periods. Show that f is a constant. Hint: Consider a parallelogram determined by the two periods and apply Liouville’s theorem. Functions having two independent periods which are analytic except for something called a pole are known as elliptic functions. Much can be said about these functions but not so much in this book. 6. Suppose f is an entire function, analytic on C, and that it has two periods w1 , w2 . That is f (z + w1 ) = f (z) and f (z + w2 ) = f (z). Suppose also that the ratio of these two periods is not a real number so vectors, w1 and w2 are not parallel. Show, using Liouville’s theorem, that f (z) equals a constant. Hint: Consider the parallelogram determined by the two vectors w1 , w2 and tile C with similar parallelograms. Elliptic functions are those which have two periods like this and are analytic except for poles. Roughly, these are points where |f (z)| becomes unbounded. Thus the only analytic elliptic functions are constants. 7. You can show that if r is a real irrational number the expressions of the form m+nr for m, n integers are dense in R. See my single variable advanced calculus book or modify the argument in Problem 4. (Let PN be everything of the form m+nr where |m| , |n| ≤ 2 N . Thus there are (2N + 1) such numbers contained in [−N (1 + |r|) , N (1 + |r|)] ≡
470
CHAPTER 17. MAPPING THEOREMS 2
2
I. Let M be an integer, (2N ) < M < (2N + 1) and partition I into M equal intervals. Now argue some interval has two of these numbers in PN etc.) In particular, |m + nr| can be made as small as desired. Now suppose f is a non constant meromorphic function and it is periodic having periods w1 , w2 where if, for m, n integers, mw1 + nw2 = 0 then both m, n are zero. Show that w1 /w2 cannot be real. This was also done by Jacobi. 8. Suppose you have a nonconstant meromorphic function f which has two periods w1 , w2 such that if mw1 + nw2 = 0 for m, n integers, then m = n = 0. Let Pa be a parallelogram with lower left vertex at a and sides determined by w1 and w2 such that no pole of f is on any of the sides. Show that the sum of the residues of f found inside Pa must be zero. a1 z+b1 9. Let f (z) = az+b cz+d and let g (z) = c1 z+d1 . Show that f ◦ g (z) equals the quotient of two expressions, the numerator being the top entry in the vector a b a1 b1 z c d c1 d1 1
and the denominator being the bottom entry. Show that if you define az + b a b , φ ≡ c d cz + d then φ (AB) = φ (A) ◦ φ (B) . Find an easy way to find the inverse of f (z) = give a condition on the a, b, c, d which insures this function has an inverse.
az+b cz+d
and
10. The modular group1 is the set of fractional linear transformations, az+b cz+d such that a, b, c, d are integers and ad − bc = 1. Using Problem 9 or brute force show this modular group is really a group with the group operation being composition. Also dz−b show the inverse of az+b cz+d is −cz+a .
1 This
is the terminology used in Rudin’s book Real and Complex Analysis.
Chapter 18
Spectral Theory of Linear Maps ∗
This chapter provides a short introduction to the spectral theory of linear maps defined on a Banach space. It is only an introduction. You should see [24] for a complete treatment of these topics.
18.1
The Resolvent and Spectral Radius
The idea is that you have A ∈ L (X, X) where X is a complex Banch space. We eliminate from consideration the stupid case that X is only the 0 vector. To begin with, here is a fundamental lemma which will be used whenever convenient. It is about taking a continuous linear transformation through the integral sign.
Lemma 18.1.1 Let f : γ ∗ → L (X, X) be continuous where γ : [a, b] → C has finite total variation and X is a Banach space. Let A ∈ L (X, X) . Then Z
Z
A
f (z) dz f (z) dz A
=
γ
Z
Af (z) dz γ
Z =
γ
f (z) Adz γ
When we write AB for A, B ∈ L (X, X) , it means A ◦ B. That is, A ◦ B (x) = A (B (x)) Proof: This follows from the definition of the integral. Let P denote a sequence of partitions such that kP k → 0. Z f (z) dz ≡ lim γ
kP k→0
X
f (γ (τ i )) (γ (ti ) − γ (ti−1 ))
P
Now multiplication of an element of L (X, X) by A ∈ L (X, X) is continuous because kAB1 − AB2 k ≤ kAk kB1 − B2 k kB1 A − B2 Ak ≤ kB1 − B2 k kAk 471
472
CHAPTER 18. SPECTRAL THEORY OF LINEAR MAPS
∗
Therefore, Z A
f (z) dz
X
≡ A lim
kP k→0
γ
= =
lim A
X
kP k→0
lim
kP k→0
f (γ (τ i )) (γ (ti ) − γ (ti−1 ))
P
f (γ (τ i )) (γ (ti ) − γ (ti−1 ))
P
X
Af (γ (τ i )) (γ (ti ) − γ (ti−1 ))
P
Z ≡
Af (z) dz γ
There are no issues regarding existence of the various quantities because the functions are continuous and the curve is of bounded variation. The other claim is completely similar. Corresponding to A there are two sets defined next.
Definition 18.1.2
The resolvent set, denoted as ρ (A) is defined as n o −1 λ ∈ C : (λI − A) ∈ L (X, X) −1
The spectrum of A, denoted as σ (A) is C \ ρ (A). When λ ∈ ρ (A) , we call (λI − A) resolvent. Thus, in particular, when λ is in ρ (A) , λI − A is one to one and onto.
the
There is a fundamental identity involving the resolvent which should be noted first. In order to remember this identity, simply write −1
(λI − A)
≈
1 λ−A
and proceed formally. 1 1 µ−λ − = λ−A µ−A (λ − A) (µ − A) This suggests that −1
(λI − A)
−1
− (µI − A)
−1
(µI − A)
−1
(λI − A)
=
(µ − λ) (λI − A)
=
(µ − λ) (µI − A)
−1
−1
(18.1)
It is easy to verify that this is in fact the case. Multiply on the left by (λI − A) and on the right by (µI − A) in the top line. This yields µI − A − (λI − A) on the left and it yields (µ − λ) I on the right which is seen to be the same thing. Thus we have two quantities in 18.1, x on the left and y on the right and we have (λI − A) x (µI − A) = (λI − A) y (µI − A) Since both µI − A and λI − A are one to one and onto, it follows that x = y. You can do a similar thing to the expression on the second row. Multiply on the left by (µI − A) and on the right by (λI − A) . This is called the resolvent identity.
Proposition 18.1.3 For A ∈ L (X, X) and λ, µ ∈ ρ (A) , −1
(λI − A)
−1
− (µI − A)
Next is a useful lemma.
−1
(µI − A)
−1
(λI − A)
=
(µ − λ) (λI − A)
=
(µ − λ) (µI − A)
−1
−1
18.1. THE RESOLVENT AND SPECTRAL RADIUS
473
Lemma 18.1.4 Let B ∈ L (X, X) and suppose kBk < 1. Then (I − B)−1 exists and is given by the series −1
(I − B)
=
∞ X
Bk
k=0
Also, 1 1 − kBk
−1
(I − B) ≤
Proof: The series converges by the root test, Theorem 2.8.1. Indeed, 1/n
kB n k
n 1/n
≤ (kBk )
= kBk < 1
so lim sup kB n k
1/n
≤ kBk < 1
n→∞
Now also (I − B)
n X
B k = I − B n+1
k=0
n+1 and converges to 0. Thus, where B n+1 ≤ kBk (I − B)
∞ X
n X
B k = lim (I − B)
k=0
n→∞
B k = lim I − B n+1 = I n→∞
k=0
Similarly the infinite sum is the left inverse of I − B. To see this, note that if kAn − Ak → 0, then An C → AC because kAn C − ACk ≡ sup k(An − A) Cxk ≤ kAn − Ak kCk kxk≤1
Therefore, ∞ X
n X
B k (I − B) = lim
n→∞
k=0
B k (I − B) = lim I − B n+1 = I n→∞
k=0
As to the estimate, if kxk ≤ 1,
! ∞ ∞
X
X
k k B (x) = B (x)
−1
(I − B) x =
k=0
k=0
∞ ∞ X
k X k
B ≤ kBk =
≤
k=0
k=0
1 1 − kBk
The major result about resolvents is the following proposition. Note that the resolvent has values in L (X, X) which is a Banach space.
Proposition 18.1.5 Let A ∈ L (X, X) for X a Banach space. Then the following hold. 1. ρ (A) is open 2. λ → (λI − A)
−1
3. λ → (λI − A)
−1
is continuous
is analytic
−1 4. For |λ| > kAk , (λI − A) ≤
1 |λ|−kAk
and −1
(λI − A)
=
∞ X Ak λk+1 k=0
474
CHAPTER 18. SPECTRAL THEORY OF LINEAR MAPS
∗
−1
−1 Proof: Let λ ∈ ρ (A) . Let |µ − λ| < (λI − A) . Then h i −1 µI − A = (µ − λ) I + λI − A = (λI − A) I − (λ − µ) (λI − A)
−1 −1 Now (λ − µ) (λI − A) = |λ − µ| (λI − A) < 1 from the assumed estimate. Therefore, ∞ h i−1 X k −1 k −1 I − (λ − µ) (λI − A) = (λ − µ) (λI − A) k=0
and so −1
(µI − A)
= =
h
I − (λ − µ) (λI − A)
∞ X
k
(λ − µ)
−1
i−1
−1
(λI − A)
(λI − A)
−1
(18.2)
k+1
k=0
This shows ρ (A) is open. Next consider continuity. From 18.2, and Lemma 18.1.4,
−1 −1
(µI − A) − (λI − A)
∞
X k+1
−1 k −1 − (λI − A) (λ − µ) (λI − A) =
k=0
=
∞
∞
X k+1 k+1 X
k −1 k −1
|λ − µ| (λI − A) (λ − µ) (λI − A)
≤
≤
∞
X k
k −1 −1 |λ − µ| (λI − A)
(λI − A)
k=1
k=1
k=1
=
−1
(λI − A)
−1 |λ − µ| (λI − A)
−1 1 − |λ − µ| (λI − A)
which converges to 0 as µ → λ. Thus the resolvent is continuous at each λ ∈ ρ (A). Now that this is known, it is easy to verify that this function is also analytic. This follows from the continuity and the resolvent equation. −1
(λI − A)
−1
− (µI − A) λ−µ
−1
=
−1
(µ − λ) (µI − A) (λI − A) λ−µ −1
= − (µI − A)
−1
(λI − A)
so taking the limit as λ → µ one obtains the derivative at µ is 2 −1 − (µI − A) . Finally consider the estimate. For |λ| > kAk , (λI − A) = λ I − 18.1.4, −1 X ∞ A Ak 1 −1 I− = (λI − A) = λ λ λk+1
A λ
k=0
and
1 1 1
−1 =
(λI − A) ≤ |λ| 1 − kAk |λ| − kAk |λ|
and so from Lemma
18.1. THE RESOLVENT AND SPECTRAL RADIUS
475
The notion of spectrum is just a generalization of the concept of eigenvalues in linear algebra. In linear algebra, everything is finite dimensional and the spectrum is just the set of eigenvalues. You can get them by looking for solutions to the characteristic equation which involves a determinant and they exist because of the fundamental theorem of algebra. No such thing is available in a general Banach space. However, many of the same results still hold.
Proposition 18.1.6 For A ∈ L (X, X) , σ (A) 6= ∅ and σ (A) is a compact set. If λ ∈ σ (A) , then for all n ∈ N, it follows that λn ∈ σ (An ). −1
Proof: Suppose first that σ (A) = ∅. Then you would have λ → (λI − A) and in fact from estimate 4 above,
−1 lim (λI − A) = 0 |λ|→∞
is analytic
(18.3)
−1 −1 Thus there exists r such that if |λ| < r, then (λI − A) < 1. However, (λI − A) is −1
bounded for |λ| ≤ r and so λ → (λI − A) is analytic on all of C (entire) and is bounded. Therefore, it is constant thanks to Liouville’s theorem, Theorem 15.6.2. But the constant −1 can only be 0 thanks to 18.3. Therefore, (λI − A) = 0 which is nonsense since we are not considering X = {0}. σ (A) is the complement of the open set ρ (A) and so it is closed. It is bounded thanks to 4 of Proposition 18.1.5. Therefore, it is compact by the Heine Borel theorem. It remains to verify the last claim. (λI − A)
n−1 X
λk A(n−1)−k
=
n−1 X
λk+1 A(n−1)−k −
=
n X
λk An−k −
k=1
=
n−1 X
λk An−k
k=0
k=0
k=0
n−1 X
n−1 X
λk An−k = λn I − An
(18.4)
k=0
λk A(n−1)−k (λI − A)
k=0 −1
If λ ∈ σ (A) , then this means (λI − A) does not exist. Thanks to the open mapping theorem, this is equivalent to either λI − A not being one to one or not onto because by this theorem, if it is both, then it is continuous and has continuous inverse because it maps open sets to open sets. Now consider the identity 18.4. If λI − A is not one to one, then the bottom line and the middle line shows that λn I − A is not one to one. If (λI − A) is not onto, then the top line and the middle line shows that λn I − An is not onto. Thus λn ∈ σ (An ).
Definition 18.1.7
σ (A) is closed and bounded. Let r (A) denote the spectral radius
defined as max {|λ| : λ ∈ σ (A)} It is very important to have a description of this spectral radius. The following famous result is due to Gelfand.
Theorem 18.1.8
Let A ∈ L (X, X). Then r (A) = limn→∞ kAn k
1/n
.
Proof: Let λ ∈ σ (A). By Proposition 18.1.6 λn ∈ σ (An ) and so by Proposition 18.1.5, n
|λn | = |λ| ≤ kAn k
476
CHAPTER 18. SPECTRAL THEORY OF LINEAR MAPS
∗
Thus for λ ∈ σ (A) , |λ| ≤ lim inf kAn k
1/n
1/n
so r (A) ≤ lim inf kAn k
n→∞
n→∞
(18.5)
But also, from Proposition 18.1.5, for |λ| > kAk , −1
(λI − A)
∞ X Ak λk+1 k=0
=
By Corollary 16.2.3 about uniqueness of the Laurent series, the above formula for the Laurent series must hold for all |λ| > r (A). By the root test, Theorem 2.8.1, it must be the case that if |λ| > r (A) , then
n 1/n
A 1 1/n
= lim sup lim sup kAn k ≤1 n+1
|λ| n→∞ λ n→∞ Therefore, if |λ| > r (A) , it follows that 1/n
lim sup kAn k
≤ |λ|
n→∞
This being true for all such λ implies with 18.5 that r (A) ≥ lim sup kAn k
1/n
18.2
1/n
≥ lim inf kAn k n→∞
n→∞
≥ r (A) .
Functions of Linear Transformations
Let A ∈ L (X, X) and suppose σ (A) = ∪nk=1 Kk where the Kk are disjoint compact sets. Let δ be small enough that no point of any Kk is within δ of any other Kj as in the proof of Theorem 16.5.2. Let the open set containing Kj be given by Uj ≡ Kj + B (0, δ/2) , defined as all numbers in C of the form k + d where |d| < 2δ and k ∈ Kj . Then by the construction of Theorem 16.5.2, Kj ⊆ Uj and Γ∗j ⊆ Uj also. In addition to this, the open sets just described are disjoint. Now let fk (z) = 1 on Uk and let it be 0 everywhere else. Thus fk is not analytic on C but it is analytic on ∪j Uj ≡ Ω and Ω contains σ (A). −1 Now fk (λ) (λI − A) is not analytic on Ω because Ω contains σ (A). Recall Theorem 16.5.2 about the existence of cycles in this context.
Theorem 18.2.1
Let K1 , K2 , · · · , Km be disjoint compact subsets of an open set Ω m in C. Then there exist continuous, closed, bounded cycles {Γj }j=1 for which Γ∗j ∩ Kk = ∅ for each k, j, Γ∗j ∩ Γ∗k = ∅, Γ∗j ⊆ Ω. Also, if p ∈ Kk and j 6= k, n (Γk , p) = 1, n (Γj , p) = 0 so if p is in some Kk ,
m X
n (Γj , p) = 1
j=1
each Γj being the union of oriented simple closed curves, while for all z ∈ / Ω m X
n (Γk , z) = 0.
k=1
Also, if p ∈ Γ∗j , then for i 6= j, n (Γi , p) = 0.
18.2. FUNCTIONS OF LINEAR TRANSFORMATIONS
477
Also recall the corollary to this theorem, Corollary 16.5.3.
Corollary 18.2.2 n o In the context of Theorem 16.5.2, there exist continuous, closed, bounded cycles
ˆj Γ
m
j=1
ˆ ∗ = ∅, Γ ˆ ∗ ⊆ Ω. ˆ∗ ∩ Γ ˆ ∗ ∩ Kk = ∅ for each k 6= j, Γ for which Γ j j j k
Also, if p ∈ Kk and j 6= k, ˆ k , p = 1, n Γ so if p is in some Kk ,
ˆj , p = 0 n Γ
m X ˆj , p = 1 n Γ j=1
ˆ j being the union of oriented simple closed curves, while for all z ∈ each Γ / Ω m X ˆ k , z = 0. n Γ k=1
ˆ ∗ , then for i 6= j, n Γ ˆ i , p = 0. Additionally, n Γ ˆ k , p = 0 if p ∈ Γ∗ . Also, if p ∈ Γ j k There is a general notion of finding functions of linear operators.
Definition 18.2.3
Let A ∈ L (X, X) and let Ω be an open set which contains σ (A) . C Let Γ be a cycle which has Γ∗ ⊆ Ω ∩ σ (A) , and suppose that 1. n (Γ, z) = 1 if z ∈ σ (A). 2. n (Γ, z) is an integer if z ∈ Ω. 3. n (Γ, z) = 0 if z ∈ / Ω. Then if f is analytic on Ω, f (A) ≡
1 2πi
−1
R Γ
f (λ) (λI − A)
dλ.
First of all, why does this make sense for things which have another meaning? In particular, why does it make sense if λn = f (λ)? Is An correctly given by this formula? If not, then this isn’t a very good way to define f (A). This involves the following theorem which says that if you look at f (A) g (A) defined above, it gives the same thing as the above applied to f (λ) g (λ).
Theorem 18.2.4 Suppose Ω ⊇ σ (A) where Ω is an open set. Let Γ be an oriented cycle, the union of oriented simple closed curves such that n (Γ, z) = 1 if z ∈ σ (A) , n (Γ, z) is an integer if z ∈ Ω, and n (Γ, z) = 0 if z ∈ / Ω. Then for f, g analytic on Ω, Z 1 −1 f (A) ≡ f (λ) (λI − A) dλ 2πi Γ Z 1 −1 g (A) ≡ g (λ) (λI − A) dλ 2πi Γ and f (A) g (A) =
1 2πi
Z
−1
f (λ) g (λ) (λI − A)
dλ
Γ
ˆ be such that for λ ∈ Γ, n Γ, ˆ λ = 0 but the Proof: From Corollary 18.2.2, let Γ ˆ Then using Lemma 18.1.1 as needed, definition of g (A) works as well for Γ. Z Z −1 −1 −4π 2 f (A) g (A) = f (λ) (λI − A) dλ g (µ) (µI − A) dµ ˆ Γ Γ Z Z −1 −1 = f (λ) g (µ) (λI − A) (µI − A) dµdλ Γ
ˆ Γ
478
CHAPTER 18. SPECTRAL THEORY OF LINEAR MAPS
∗
Using the resolvent identity 18.1, this equals Z Z i 1 h −1 −1 f (λ) g (µ) (λI − A) − (µI − A) dµdλ µ−λ ˆ Γ Γ Now using the Fubini theorem, Theorem 15.3.10, Z Z 1 −1 = g (µ) f (λ) (λI − A) dµdλ µ−λ ˆ Γ Γ Z Z 1 −1 dλdµ + g (µ) (µI − A) f (λ) λ − µ ˆ Γ Γ The first term is 0 from Cauchy’s integral formula, Theorem 15.6.1 applied to the individual ˆ simple closed curves whose union is Γ because the winding number n Γ, λ = 0. Thus the inside integral vanishes. From the Cauchy integral formula, the second term is Z −1 2πi g (µ) (µI − A) f (µ) dµ ˆ Γ
and so f (A) g (A)
= =
Z 1 −1 f (µ) g (µ) (µI − A) dµ 2πi Γˆ Z 1 −1 f (λ) g (λ) (λI − A) dλ 2πi Γ
Now consider the case that f (λ) = λ. Is f (A), defined in terms of integrals as above, equal to A? If so, then from the theorem just shown, λn used in the integral formula does lead to An . Thus one considers Z 1 −1 λ (λI − A) dλ 2πi Γ where Γ is a cycle such that n (Γ, z) = 1 if z ∈ σ (A), n (Γ, z) is an integer if z ∈ Ω, and n (Γ, z) = 0 if z ∈ / Ω, Γ∗ ∩ σ (A) = ∅. Recall the following corollary of the Cauchy integral formula.
Corollary 18.2.5 Let Ω be an open set (note that Ω might not be simply connected) and let γ k : [ak , bk ] → Ω, k = 1, · · · , m, be closed, continuous and of bounded variation. Suppose also that m X n (γ k , z) = 0 k=1
for all z ∈ / Ω and
Pm
k=1 n (γ k , z) is an integer for z ∈ Ω. Then if f : Ω → X is analytic, m Z X
f (w) dw = 0.
γk
k=1
Thus if Γ is the union of these oriented curves, Z f (w) dw = 0. Γ C
ˆ ≡ σ (A) and let γˆ R be a large circle of radius R > kAk oriented clockwise, Now let Ω which includes σ (A) ∪ Γ∗ on its inside. Consider the following picture in which σ (A) is the union of the two compact sets, K1 , K2 which are contained in the closed curves shown and Γ is the union of the oriented cycles Γi . A similar picture would apply if there were more than two Ki . All that is of
18.2. FUNCTIONS OF LINEAR TRANSFORMATIONS
479
interest here is that there is a cycle Γ oriented such that for all z ∈ σ (A) , n (Γ, z) = 1, ˆ and γˆ R is a large circle oriented clockwise as shown. n (Γ, z) is an integer if z ∈ Ω,
K1
K2
Γ1 Γ2
γˆ R
C
ˆ ≡ σ (A) . Let γ R ≡ −ˆ Then in this case, f (λ) = λ is analytic everywhere and Ω γR. Thus, by Corollary 18.2.5 Z Z 1 1 −1 −1 f (λ) (λI − A) dλ + f (λ) (λI − A) dλ = 0 2πi Γ 2πi γˆ R and so, f (A) ≡
1 2πi
Z
−1
f (λ) (λI − A)
dλ =
Γ
The integrand is λ
1 2πi
Z
−1
f (λ) (λI − A)
dλ
γR
∞ X Ak λk+1 k=0
for λ ∈ γ ∗R and convergence is uniform on γ ∗R . Then all terms vanish except the one when k = 1 because all the other terms have primitives. The uniform convergence implies that the integral of the sum is the sum of the integrals and there is only one which survives. Therefore, Z 1 A f (A) ≡ dλ = A 2πi γ R λ It follows from Theorem 18.2.4 that if f (λ) = λn , then f (A) = An . This shows that it is not unreasonable to make this definition. Similar reasoning yields f (A) = I if f (λ) = λ0 = 1.
(18.6)
We may not be able to explicitly evaluate f (A) in case f (λ) is not analytic off σ (A) but at least the definition is reasonable and seems to work as it should in the special case when f (A) is known by another method, say a power series. These details about a power series are left to the reader. With Theorem 18.2.4, one can prove the spectral mapping theorem.
Theorem 18.2.6
Suppose Ω ⊇ σ (A) where Ω is an open set and let f be analytic on Ω. Let Γ be an oriented cycle, the union of oriented simple closed curves such that n (Γ, z) = 1 if z ∈ σ (A) , n (Γ, z) is an integer if z ∈ Ω, and n (Γ, z) = 0 if z ∈ / Ω. Then for Z 1 −1 f (A) ≡ f (λ) (λI − A) dλ, 2πi Γ it follows that σ (f (A)) = f (σ (A)) (µ) has a removable singularity at λ so it is equal to an Proof: Let µ ∈ Ω.µ → f (λ)−f λ−µ analytic function of µ called g (µ). Recall why this is. Since f is analytic at λ, we know that for µ near λ ∞ X k f (µ) = ak (µ − λ) k=0
480
CHAPTER 18. SPECTRAL THEORY OF LINEAR MAPS
∗
and so, when this is substituted into the difference quotient, one obtains the power series for an analytic function called g (µ). Then f (λ) − f (µ) = g (µ) (λ − µ) From Theorem 18.2.4, f (A) − f (λ) I
= = =
Z 1 −1 (f (µ) − f (λ)) (µI − A) dµ 2πi Γ Z 1 −1 g (µ) (µ − λ) (I − A) dµ 2πi Γ g (A) (A − λI) = (A − λI) g (A)
If λ ∈ σ (A) then either (A − λI) fails to be one to one or it fails to be onto. The above shows that the same must hold for f (A) − f (λ) I. Thus f (λ) ∈ σ (f (A)). In other words, f (σ (A)) ⊆ σ (f (A)). −1 Now suppose ν ∈ σ (f (A)) . Is ν = f (λ) for some λ ∈ σ (A)? If not, then (f (λ) − ν) ˆ which is analytic function of λ ∈ σ (A) and by continuity, this must hold in an open set Ω contains σ (A). Therefore, using this open set in the above considerations, it follows from Theorem 18.2.4 that for Γ pertaining to this new open set as above, Z 1 −1 −1 −1 (f (µ) − ν) (f (µ) − ν) (µI − A) dµ = I (f (A) − ν) (f (A) − ν) = 2πi Γ and so ν was not really in σ (f (A)) after all. Hence the equality holds.
18.3
Invariant Subspaces
Let σ (A) = ∪ni=1 Ki where the Ki are disjoint and compact. Let δ i = min {dist (z, ∪j6=i Kj ) : z ∈ Ki } > 0 and let 0 < δ < min {δ i , i = 1, · · · , n} Then letting Ui ≡ Ki + B (0, δ) , it follows that the Ui are disjoint open sets. Let Γj be a oriented cycle such that for z ∈ Kj , n (Γj , z) = 1 and for z ∈ Ki , i 6= j, n (Γj , z) = 0. and if z ∈ Γ∗j , then n (Γi , z) = 0. (18.7) Let Γ be the union of these oriented cycles. Thus n (Γ, z) = 0 if z ∈ / Ω ≡ ∪ni=1 Ui and ∗ Ui ⊇ Γi . Define 1 on Ui fi (λ) ≡ 0 on Uj for j 6= i Thus fi is analytic on Ω. Using the above definition, Z 1 −1 fi (λ) (λI − A) dλ fi (A) ≡ Pi ≡ 2πi Γ Then by the spectral mapping theorem, Theorem 18.2.6, σ (fi (A)) = fi (σ (A)) = {0, 1} −1
−1
(18.8)
Note that for λ ∈ ρ (A) , A (λI − A) = (λI − A) A as can be seen by multiplying both sides by (λI − A) and observing that the result is A on both sides. Then since (λI − A) is one to one,R the identity follows. Now let Pk ∈ L (X, X) be the linear transformation given −1 1 by Pk = 2πi (λI − A) dλ. Γk
18.3. INVARIANT SUBSPACES
481
From Lemma 18.1.1, APk
Z 1 −1 (λI − A) dλ = 2πi Γk Z 1 −1 (λI − A) Adλ = 2πi Γk
= A =
Z 1 −1 A (λI − A) dλ 2πi Γk Z 1 −1 (λI − A) dλA = Pk A 2πi Γk
(18.9)
With these introductory observations, the following is the main result about invariant subspaces. First is some notation.
Definition 18.3.1 Let X be a vector space and let Xk be P a subspace. Then X = P n n X means that every x ∈ X can be written in the form x = k=1 xk , xk ∈ Xk . We k=1 k write n M X= Xk k=1
P
if whenever 0 = k xk , it follows that each xk = 0. In other words, we use the new notation when there is a unique way to write each vector in X as a sum of vectors in the Xk . When this uniqueness holds, the sum is called a direct sum. In case AXk ⊆ Xk , we say that Xk is A invariant and Xk is an invariant subspace.
Theorem 18.3.2
Let σ (A) = ∪nk=1 Kk where Kj ∩ Ki = ∅, each Kj being compact. There exist Pk ∈ L (X, X) for each k = 1, · · · , n such that Pn 1. I = k=1 Pk 2. Pi Pj = 0 if i 6= j 3. Pi2 = Pi for each i 4. X =
n M
Xk where Xk = Pk X and each Xk is a Banach space.
k=1
5. AXk ⊆ Xk which says that Xk is A invariant. 6. Pk x = x if x ∈ Xk . If x ∈ Xj , then Pk x = 0 if k 6= j. Proof: Consider 1. n X
Pk
=
k=1
=
1 2πi
Z X n
1 2πi
Z
−1
fk (λ) (λI − A)
dλ
Γ k=1 −1
1 (λ − I)
dλ = I
Γ
from 18.6. Consider 2. Let λ be on Γi and µ on Γj . Then from Theorem 18.2.4 Z Z 1 1 −1 fi (λ) fj (λ) (λI − A) dλ = 0dλ = 0 Pi Pj = 2πi Γ 2πi Γ Pn Now from 1., I = i=1 Pi and so, multiplying both sides by Pi , Pi = Pi2 .This shows 3. Pn Consider 4. Note Pthat from 1. X = k=1 Xk where Xk ≡ Pk X. However, this is a direct sum because if 0 = k Pk xk , then doing Pj to both sides and using part 2., 0 = Pj2 xj = Pj xj and so the summands are all 0 since j is arbitrary. As to Xk being a Banach space, suppose Pk xn → y. Is y ∈ Xk ? Pk xn = Pk (Pk xn ) and so, by continuity of Pk , this converges to Pk y ∈ Xk . Thus Pk xn → y and Pk xn → Pk y so y = Pk y and y ∈ Xk . Thus Xk is a closed subspace of a Banach space and must therefore be a Banach space itself. 5. follows from 18.9. If Pk x ∈ Xk , then APk x = Pk Ax ∈ Xk . Hence A : Xk → Xk .
482
CHAPTER 18. SPECTRAL THEORY OF LINEAR MAPS
∗
Finally, suppose x ∈ Xk . Then x = Pk y. Then Pk x = Pk2 y = Pk y = x so for x ∈ Xk , Pk x = x and Pk restricted to Xk is just the identity. If Pj x is a vector of Xj , and k 6= j, then Pk Pj x = 0x = 0. From the spectral mapping theorem, Theorem 18.2.6, σ (Pk ) = σ (fk (A)) = {0, 1} because fk (λ) has only these two values. Then the following is also obtained
Theorem 18.3.3
Let n > 1, σ (A) = ∪nk=1 Kk
where the Kk are compact and disjoint. Let Pk be the projection map defined above and Xk ≡ Pk X. Then define Ak ≡ APk . The following hold 1. Ak : Xk → Xk , Ak x = Ax for all x ∈ Xk so Ak is just the restriction of A to Xk . 2. σ (Ak ) = {0, Kk } . Pn 3. A = k=1 Ak . 4. If we regard Ak as a mapping A : Xk → Xk , then σ (Ak ) = Kk . Proof: Letting fk (λ) be the function in the above theorem, g (λ) = λ, Z 1 −1 fk (λ) g (λ) (λI − A) dλ Ak = 2πi Γ and so, by the spectral mapping theorem, σ (Ak ) = σ (fk (A) g (A)) = fk (σ (A)) g (σ (A)) = {0, Kk } because the possible values for fk (λ) g (λ) for λ ∈ σ (A) are {0, Kk }. This shows 2. Part 1. is obvious from Theorem 18.3.2. So is Part 3. Consider the last claim about Ak . If µ ∈ / Kk then in all of the above, Uk could have excluded µ. Assume this is the case. 1 Thus λ → µ−λ is analytic on Uk . Therefore, using Theorem 18.2.4 applied to the Banach space Xk , Z 1 1 −1 −1 (µI − A) = (λI − A) dλ 2πi Γk µ − λ and so µ ∈ / σ (Ak ) . Therefore, Kk ⊇ σ (Ak ). If µI − A fails to be onto, then this must be the case for some Ak . Here P is why. If y ∈ / (µI − A) (X) , then there is no x such that (µI − A) x = y. However, y = k Pk y. If each Ak is onto, then there would be xk ∈ Xk such that (µI − Ak ) xk = yk . Recall that Pk xk = xk . Therefore, (µPk − APk ) xk = Pk y, (µI − A) (Pk xk ) = Pk y. P Summing this on k, (µI − A) k xk = k Pk y = y and it would be the case that (µI − A) is onto. Thus if (µI − A) is not onto, then µI − Ak : Xk → Xk is not onto for some k. If µI −P A fails to be one to one, then there exists x 6= 0 such that (µI − A) x = 0. However, x = k xk where xk ∈P Xk . Then, since Ak is just the restriction of A to Xk and Pk is the restriction of I to Xk , k (µPk − Ak ) xk = 0 where I refers to Xk . Now recall that this is a direct sum. Hence (µPk − Ak ) xk = (µI − Ak ) xk = 0 P
If each µI − Ak is one to one, then each xk = 0 and so it follows that x = 0 also, it being the sum of the xk . It follows that σ (A) ⊆ ∪k σ (Ak ) and so σ (A) ⊆ ∪k σ (Ak ) ⊆ ∪k Kk = σ (A) , σ (Ak ) ⊆ Kk , and so you cannot have σ (Ak ) could not hold.
Kk , proper inclusion, for any k since otherwise, the above
18.3. INVARIANT SUBSPACES
483
It might be interesting to compare this with the algebraic approach to the same problem in the appendix, Section A.11. That approach has the advantage of dealing with arbitrary fields of scalars and is based on polynomials, the division algorithm, and the minimal polynomial where this is limited to the field of complex numbers. However, the approach in this chapter based on complex analysis applies to arbitrary Banach spaces whereas the algebraic methods only apply to finite dimensional spaces. Isn’t it interesting how two totally different points of view lead to essentially the same result about a direct sum of invariant subspaces?
484
CHAPTER 18. SPECTRAL THEORY OF LINEAR MAPS
∗
Appendix A
Review of Linear Algebra A.1
Systems of Equations
A system of linear equations is of the form X aij xj = bi , i = 1, · · · , m
(1.1)
j
Your want to find the solutions which are values for the xj which make the equations true. Note that the set of solutions is unchanged if the order of any two of these equations is reversed. Also the set of solutions is unchanged if any equation is multiplied by a nonzero number. The most interesting observation is that the solution set is unchanged when one equation is replaced by a multiple of another equation added to it. Why exactly does the replacement of one equation with a multiple of another added to it not change the solution set? Suppose you have two of these equations, E1 = f1 , E2 = f2
(1.2)
where E1 and E2 are expressions involving the variables. The claim is that if a is a number, then 1.2 has the same solution set as E1 = f1 , E2 + aE1 = f2 + af1 .
(1.3)
Why is this? If (x, y) solves 1.2 then it solves the first equation in 1.3. Also, it satisfies aE1 = af1 and so, since it also solves E2 = f2 it must solve the second equation in 1.3. If (x, y) solves 1.3 then it solves the first equation of 1.2. Also aE1 = af1 and it is given that the second equation of 1.3 is verified. Therefore, E2 = f2 and it follows (x, y) is a solution of the second equation in 1.2. This shows the solutions to 1.2 and 1.3 are exactly the same which means they have the same solution set. It is foolish to write the variables every time you do these operations. It is easier to write the system 1.1 as the following augmented matrix. a11 · · · a1n b1 .. .. .. . . . am1
···
amn
bm
Now the operations described above where you switch the order of the equations, multiply an equation by a nonzero number, or replace an equation by a multiple of another equation added to it are equivalent to the following row operations which are done on this augmented matrix. 485
486
APPENDIX A. REVIEW OF LINEAR ALGEBRA
Definition A.1.1
The row operations consist of the following
1. Switch two rows. 2. Multiply a row by a nonzero number. 3. Replace a row by a multiple of another row added to it. It is important to observe that any row operation can be “undone” by another inverse row operation. For example, if r1 , r2 are two rows, and r2 is replaced with r02 = αr1 + r2 using row operation 3, then you could get back to where you started by replacing the row r02 with −α times r1 and adding to r02 . In the case of operation 2, you would simply multiply the row that was changed by the inverse of the scalar which multiplied it in the first place, and in the case of row operation 1, you would just make the same switch again and you would be back to where you started. In each case, the row operation which undoes what was done is called the inverse row operation. Also, you have a solution x = (x1 , · · · , xn ) to system of equations given above if and only if a11 a1n b1 x1 ... + · · · + xn ... = ... am1
amn
bm
Recall from linear algebra and the way we multiply matrices, this is the same as saying that x satisfies a11 · · · a1n x1 b1 .. .. .. = .. , Ax = b . . . . am1
···
amn
xn
bm
The important thing to observe is that when a row operation is done to the augmented matrix (A|b) you get a new augmented matrix which corresponds to a system of equations which has the same solution set as the original system.
A.2
Matrices
Recall the following definition of matrix multiplication and addition.
Definition A.2.1
Letting Mij denote the entry of M in the ith row and j th column, let A be an m × n matrix and let B be an n × p matrix. Then AB is an m × p matrix and X (AB)ij = Aik Bkj k
If A, B are both m × n, then (A + B) is also m × n and (A + B)ij = Aij + Bij . If α is a scalar, then (αA)ij = αAij . Recall the following properties of matrix arithmetic which follow right away from the above. A + B = B + A,
(1.4)
(A + B) + C = A + (B + C) ,
(1.5)
the commutative law of addition,
the associative law for addition, A + 0 = A,
(1.6)
A.2. MATRICES
487
the existence of an additive identity, A + (−A) = 0,
(1.7)
the existence of an additive inverse. Also, for α, β scalars, the following also hold. α (A + B) = αA + αB,
(1.8)
(α + β) A = αA + βA,
(1.9)
α (βA) = αβ (A) ,
(1.10)
1A = A.
(1.11)
The above properties, 1.4-1.11 are known as the vector space axioms and the fact that the m × n matrices satisfy these axioms is what is meant by saying this set of matrices with addition and scalar multiplication as defined above forms a vector space. There are also properties which are related to matrix multiplication.
Proposition A.2.2 If all multiplications and additions make sense, the following hold for matrices, A, B, C and a, b scalars. A (aB + bC) = a (AB) + b (AC)
(1.12)
(B + C) A = BA + CA
(1.13)
A (BC) = (AB) C
(1.14)
Proof: Using the above definition of matrix multiplication, X X (A (aB + bC))ij = Aik (aB + bC)kj = Aik (aBkj + bCkj ) k
=a
X
Aik Bkj + b
X
k
k
Aik Ckj = a (AB)ij + b (AC)ij = (a (AB) + b (AC))ij
k
showing that A (B + C) = AB + AC as claimed. Formula 1.13 is entirely similar. Consider 1.14, the associative law of multiplication. Before reading this, review the definition of matrix multiplication in terms of entries of the matrices. X X X (A (BC))ij = Aik (BC)kj = Aik Bkl Clj k
=
X
k
l
(AB)il Clj = ((AB) C)ij .
l
Recall also that AB is sometimes not equal to BA. For example, 1 2 0 1 0 1 1 2 6= 3 4 1 0 1 0 3 4
Definition A.2.3
Let A be an m × n matrix. Then AT denotes the n × m matrix which is defined as follows. AT ij = Aji The transpose of a matrix has the following important property.
Lemma A.2.4 Let A be an m × n matrix and let B be a n × p matrix. Then T
(1.15)
T
(1.16)
(AB) = B T AT and if α and β are scalars, (αA + βB) = αAT + βB T
488
APPENDIX A. REVIEW OF LINEAR ALGEBRA Proof: From the definition, X X T (AB) = (AB)ji = Ajk Bki = B T ik AT kj = B T AT ij ij
k
k
1.16 is left as an exercise.
Definition A.2.5
A real n × n matrix A is said to be symmetric if A = AT . It is said to be skew symmetric if AT = −A.There is a special matrix called I and defined by Iij = δ ij where δ ij is the Kronecker symbol defined by 1 if i = j δ ij = 0 if i 6= j It is called the identity matrix because it is a multiplicative identity in the following sense. The following lemma follows from the above definition.
Lemma A.2.6 Suppose A is an m × n matrix and In is the n × n identity matrix. Then AIn = A. If Im is the m × m identity matrix, it also follows that Im A = A.
Definition A.2.7 An n × n matrix A has an inverse A−1 if and only if there exists a matrix, denoted as A−1 such that AA−1 = A−1 A = I where I = (δ ij ) for 1 if i = j δ ij ≡ 0 if i 6= j Such a matrix is called invertible. If it acts like an inverse, then it is the inverse. This is the message of the following proposition.
Proposition A.2.8 Suppose AB = BA = I. Then B = A−1 . Proof: From the definition B is an inverse for A. Could there be another one B 0 ? B 0 = B 0 I = B 0 (AB) = (B 0 A) B = IB = B. Thus, the inverse, if it exists, is unique. Recall the definition of the special vectors ek . ek =
0
···
0
1
0
···
0
T
where the 1 is in the k th position from the left.
Definition A.2.9
Let A be a matrix. N (A) = {x : Ax = 0} . This may also be
referred to as ker (A). There is a fundamental result in the case where m < n. In this case, the matrix A looks like the following.
A.3. SUBSPACES AND SPANS
Theorem A.2.10
489
Let A be an m × n matrix where m < n. Then N (A) contains
nonzero vectors. Proof: It is clear that the theorem is true if A is 1 × n with n > 1. Suppose it is true whenever A is m − 1 × k for m − 1 < k. Say A is m × n with m < n, m > 1, A = a1 · · · an , ai the ith column If a1 = 0, consider the vector x = e1 . Ae1 = 0. If a1 6= 0, do row operations to obtain a ˆ = 0, and matrix Aˆ such that the solutions of Ay = 0 are the same as the solutions of Ay ˆ A= 1 aT 0 B Now B is m − 1 × n − 1 where m − 1 < n − 1 and so by there is a nonzero vector induction, 0 ˆ = 0 and so Az = 0. x such that Bx = 0. Consider the nonzero vector z = . Then Az x
A.3
Subspaces and Spans
To save on notation, F will denote a field. In this course, it will be either R or C, the real or the complex numbers.
Definition A.3.1
Let {x1 , · · · , xp } be vectors in Fn . A linear combination is any
expression of the form p X
ci xi
i=1
where the ci are scalars. The set of all linear combinations of these vectors is called span (x1 , · · · , xn ) . If V ⊆ Fn , then V is called a subspace if whenever α, β are scalars and u and v are vectors of V, it follows αu + βv ∈ V . That is, it is “closed under the algebraic operations of vector addition and scalar multiplication”. A linear combination of vectors is said to be trivial if all the scalars in the linear combination equal zero. A set of vectors is said to be linearly independent if the only linear combination of these vectors which equals the zero vector is the trivial linear combination. Thus {x1 , · · · , xn } is called linearly independent if whenever p X ck xk = 0 k=1
it follows that all the scalars ck equal zero. A set of vectors, {x1 , · · · , xp } , is called linearly dependent if it is not linearly independent. Thus the setP of vectors is linearly dependent if p there exist scalars ci , i = 1, · · · , n, not all zero such that k=1 ck xk = 0.
Proposition A.3.2 Let V ⊆ Fn . Then V is a subspace if and only if it is a vector space itself with respect to the same operations of scalar multiplication and vector addition. Proof: Suppose first that V is a subspace. All algebraic properties involving scalar multiplication and vector addition hold for V because these things hold for Fn . Is 0 ∈ V ? Yes it is. This is because 0v ∈ V and 0v = 0. By assumption, for α a scalar and v ∈ V, αv ∈ V. Therefore, −v = (−1) v ∈ V . Thus V has the additive identity and additive inverse. By assumption, V is closed with respect to the two operations. Thus V is a vector space. If V ⊆ Fn is a vector space, then by definition, if α, β are scalars and u, v vectors in V, it follows that αv + βu ∈ V . Thus, from the above, subspaces of Fn are just subsets of Fn which are themselves vector spaces.
490
APPENDIX A. REVIEW OF LINEAR ALGEBRA
Lemma A.3.3 A set of vectors {x1 , · · · , xp } is linearly independent if and only if none of the vectors can be obtained as a linear combination of the others. Proof: Suppose first that {x1 , · · · , xp } is linearly independent. If xk = 0 = 1xk +
X
P
j6=k cj xj ,
then
(−cj ) xj ,
j6=k
a nontrivial linear combination, contrary to assumption. This shows that if the set is linearly independent, then none of the vectors is a linear combination of the others. Now suppose no vector is a linear combination of the others. Is {x1 , · · · , xp } linearly independent? If it is not, there exist scalars ci , not all zero such that p X
ci xi = 0.
i=1
Say ck 6= 0. Then you can solve for xk as X xk = (−cj ) /ck xj j6=k
contrary to assumption. The following is called the exchange theorem.
Theorem A.3.4 (Exchange Theorem) Let {x1 , · · · , xr } be a linearly independent set of vectors such that each xi is in span(y1 , · · · , ys ) . Then r ≤ s. Proof : Suppose not. Then r > s. By assumption, there exist scalars aji such that xi =
s X
aji yj
j=1
The matrix whose jith entry is aji has more columns than rows. Therefore, by Theorem A.2.10 there exists a nonzero vector b ∈ Fr such that Ab = 0. Thus 0=
r X
aji bi , each j.
i=1
Then r X
bi xi =
i=1
r X i=1
bi
s X
aji yj =
j=1
s r X X j=1
! aji bi
yj = 0
i=1
contradicting the assumption that {x1 , · · · , xr } is linearly independent.
Definition A.3.5
A finite set of vectors, {x1 , · · · , xr } is a basis for Fn if span (x1 , · · · , xr ) = Fn
and {x1 , · · · , xr } is linearly independent.
Corollary A.3.6 Let {x1 , · · · , xr } and {y1 , · · · , ys } be two bases1 of Fn . Then r = s = n. 1 This is the plural form of basis. We could say basiss but it would involve an inordinate amount of hissing as in “The sixth shiek’s sixth sheep is sick.” This is the reason that bases is used instead of basiss.
A.3. SUBSPACES AND SPANS
491
Proof: From the exchange theorem, r ≤ s and s ≤ r. Now note the vectors, 1 is in the ith slot
z }| { ei = (0, · · · , 0, 1, 0 · · · , 0) for i = 1, 2, · · · , n are a basis for Fn .
Lemma A.3.7 Let {v1 , · · · , vr } be a set of vectors. Then V ≡ span (v1 , · · · , vr ) is a subspace. Pr Pr Proof: Suppose α, β are two scalars and let k=1 ck vk and k=1 dk vk are two elements of V. What about r r X X α ck vk + β dk vk ? k=1
Is it also in V ? α
r X
ck vk + β
k=1
r X k=1
k=1
dk vk =
r X
(αck + βdk ) vk ∈ V
k=1
so the answer is yes.
Definition A.3.8
A finite set of vectors, {x1 , · · · , xr } is a basis for a subspace V of Fn if span (x1 , · · · , xr ) = V and {x1 , · · · , xr } is linearly independent.
Corollary A.3.9 Let {x1 , · · · , xr } and {y1 , · · · , ys } be two bases for V . Then r = s. Proof: From the exchange theorem, r ≤ s and s ≤ r.
Definition A.3.10
Let V be a subspace of Fn . Then dim (V ) read as the dimension of V is the number of vectors in a basis. Of course you should wonder right now whether an arbitrary subspace even has a basis. In fact it does and this is in the next theorem. First, here is an interesting lemma.
Lemma A.3.11 Suppose v ∈/ span (u1 , · · · , uk ) and {u1 , · · · , uk } is linearly independent. Then {u1 , · · · , uk , v} is also linearly independent. Pk Proof: Suppose i=1 ci ui + dv = 0. It is required to verify that each ci = 0 and that d = 0. But if d = 6 0, then you can solve for v as a linear combination of the vectors, {u1 , · · · , uk }, k X ci ui v=− d i=1 Pk contrary to assumption. Therefore, d = 0. But then i=1 ci ui = 0 and the linear independence of {u1 , · · · , uk } implies each ci = 0 also.
Theorem A.3.12
Let V be a nonzero subspace of Fn . Then V has a basis.
Proof: Let v1 ∈ V where v1 6= 0. If span {v1 } = V, stop. {v1 } is a basis for V . Otherwise, there exists v2 ∈ V which is not in span {v1 } . By Lemma A.3.11 {v1 , v2 } is a linearly independent set of vectors. If span {v1 , v2 } = V stop, {v1 , v2 } is a basis for V. If span {v1 , v2 } 6= V, then there exists v3 ∈ / span {v1 , v2 } and {v1 , v2 , v3 } is a larger linearly independent set of vectors. Continuing this way, the process must stop before n + 1 steps because if not, it would be possible to obtain n + 1 linearly independent vectors, contrary to the exchange theorem. In words the following corollary states that any linearly independent set of vectors can be enlarged to form a basis.
492
APPENDIX A. REVIEW OF LINEAR ALGEBRA
Corollary A.3.13 Let V be a subspace of Fn and let {v1 , · · · , vr } be a linearly independent set of vectors in V . Then either it is a basis for V or there exist vectors, vr+1 , · · · , vs such that {v1 , · · · , vr , vr+1 , · · · , vs } is a basis for V. Proof: This follows immediately from the proof of Theorem A.3.12. You do exactly the same argument except you start with {v1 , · · · , vr } rather than {v1 }. It is also true that any spanning set of vectors can be restricted to obtain a basis.
Theorem A.3.14
Let V be a subspace of Fn and suppose span (u1 · · · , up ) = V where the ui are nonzero vectors. Then there exist vectors {v1 · · · , vr } such that {v1 · · · , vr } ⊆ {u1 · · · , up } and {v1 · · · , vr } is a basis for V . Proof: Let r be the smallest positive integer with the property that for some set {v1 · · · , vr } ⊆ {u1 · · · , up } , span (v1 · · · , vr ) = V. Then r ≤ p and it must be the case that {v1 · · · , vr } is linearly independent because if it were not so, one of the vectors, say vk would be a linear combination of the others. But then you could delete this vector from {v1 · · · , vr } and the resulting list of r − 1 vectors would still span V contrary to the definition of r.
A.4
Application to Matrices
The following is a theorem of major significance.
Theorem A.4.1
Suppose A is an n × n matrix. Then A is one to one (injective) if and only if A is onto (surjective). Also, if B is an n × n matrix and AB = I, then it follows BA = I.
Proof: First suppose A is one to one. Consider the vectors, {Ae1 , · · · , Aen } where ek is the column vector which is all zeros except for a 1 in the k th position. This set of vectors is linearly independent because, since multiplication by A is linear, if ! n n X X ck Aek = 0, then A ck ek = 0 k=1
k=1
and so, since A is one to one, n X
ck ek = 0
k=1
which implies each ck = 0 because the ek are clearly linearly independent. Therefore, {Ae1 , · · · , Aen } must be a basis for Fn because if not there would exist a vector, y ∈ / span (Ae1 , · · · , Aen ) and then by Lemma A.3.11, {Ae1 , · · · , Aen , y} would be an independent set of vectors having n + 1 vectors in it, contrary to the exchange theorem. It follows that for y ∈ Fn there exist constants, ci such that ! n n X X y= ck Aek = A ck ek k=1
k=1
showing that, since y was arbitrary, A is onto. Next suppose A is onto. Say Ax = 0 where x 6= 0. But this would say that the columns of A are linearly dependent. If A were onto, then the columns of A would be a spanning set of Fn . However, one column is a linear combination of the others and so there would exist a spanning set of fewer than n vectors contradicting the above exchange theorem. Now suppose AB = I. Why is BA = I? Since AB = I it follows B is one to one since otherwise, there would exist, x 6= 0 such that Bx = 0 and then ABx = A0 = 0 6= Ix.
A.5. MATHEMATICAL THEORY OF DETERMINANTS
493
Therefore, from what was just shown, B is also onto. In addition to this, A must be one to one because if Ay = 0, then y = Bx for some x and then x = ABx = Ay = 0 showing y = 0. Now from what is given to be so, it follows (AB) A = A and so using the associative law for matrix multiplication, A (BA) − A = A (BA − I) = 0. But this means (BA − I) x = 0 for all x since otherwise, A would not be one to one. Hence BA = I as claimed. This theorem shows that if an n × n matrix B acts like an inverse when multiplied on one side of A, it follows that B = A−1 and it will act like an inverse on both sides of A.
A.5 A.5.1
Mathematical Theory of Determinants The Function sgn
The following lemma will be essential in the definition of the determinant.
Lemma A.5.1 There exists a function, sgnn which maps each ordered list of numbers from {1, · · · , n} to one of the three numbers, 0, 1, or −1 which also has the following properties. sgnn (1, · · · , n) = 1 (1.17) sgnn (i1 , · · · , p, · · · , q, · · · , in ) = − sgnn (i1 , · · · , q, · · · , p, · · · , in )
(1.18)
In words, the second property states that if two of the numbers are switched, the value of the function is multiplied by −1. Also, in the case where n > 1 and {i1 , · · · , in } = {1, · · · , n} so that every number from {1, · · · , n} appears in the ordered list, (i1 , · · · , in ) , sgnn (i1 , · · · , iθ−1 , n, iθ+1 , · · · , in ) ≡ (−1)
n−θ
sgnn−1 (i1 , · · · , iθ−1 , iθ+1 , · · · , in )
(1.19)
where n = iθ in the ordered list, (i1 , · · · , in ) . Proof: Define sign (x) = 1 if x > 0, −1 if x < 0 and 0 if x = 0. If n = 1, there is only one list and it is just the number 1. Thus one can define sgn1 (1) ≡ 1. For the general case where n > 1, simply define ! Y sgnn (i1 , · · · , in ) ≡ sign (is − ir ) r