English Pages 650 Year 2024
Mathematical Analysis Volume II
Teo Lee Peng
Mathematical Analysis Volume II Teo Lee Peng January 1, 2024
Contents

Preface

Chapter 1 Euclidean Spaces
1.1 The Euclidean Space $\mathbb{R}^n$ as a Vector Space
1.2 Convergence of Sequences in $\mathbb{R}^n$
1.3 Open Sets and Closed Sets
1.4 Interior, Exterior, Boundary and Closure
1.5 Limit Points and Isolated Points

Chapter 2 Limits of Multivariable Functions and Continuity
2.1 Multivariable Functions
2.1.1 Polynomials and Rational Functions
2.1.2 Component Functions of a Mapping
2.1.3 Invertible Mappings
2.1.4 Linear Transformations
2.1.5 Quadratic Forms
2.2 Limits of Functions
2.3 Continuity
2.4 Uniform Continuity
2.5 Contraction Mapping Theorem

Chapter 3 Continuous Functions on Connected Sets and Compact Sets
3.1 Path-Connectedness and Intermediate Value Theorem
3.2 Connectedness and Intermediate Value Property
3.3 Sequential Compactness and Compactness
3.4 Applications of Compactness
3.4.1 The Extreme Value Theorem
3.4.2 Distance Between Sets
3.4.3 Uniform Continuity
3.4.4 Linear Transformations and Quadratic Forms
3.4.5 Lebesgue Number Lemma

Chapter 4 Differentiating Functions of Several Variables
4.1 Partial Derivatives
4.2 Differentiability and First Order Approximation
4.2.1 Differentiability
4.2.2 First Order Approximations
4.2.3 Tangent Planes
4.2.4 Directional Derivatives
4.3 The Chain Rule and the Mean Value Theorem
4.4 Second Order Approximations
4.5 Local Extrema

Chapter 5 The Inverse and Implicit Function Theorems
5.1 The Inverse Function Theorem
5.2 The Proof of the Inverse Function Theorem
5.3 The Implicit Function Theorem
5.4 Extrema Problems and the Method of Lagrange Multipliers

Chapter 6 Multiple Integrals
6.1 Riemann Integrals
6.2 Properties of Riemann Integrals
6.3 Jordan Measurable Sets and Riemann Integrable Functions
6.4 Iterated Integrals and Fubini's Theorem
6.5 Change of Variables Theorem
6.5.1 Translations and Linear Transformations
6.5.2 Polar Coordinates
6.5.3 Spherical Coordinates
6.5.4 Other Examples
6.6 Proof of the Change of Variables Theorem
6.7 Some Important Integrals and Their Applications

Chapter 7 Fourier Series and Fourier Transforms
7.1 Orthogonal Systems of Functions and Fourier Series
7.2 The Pointwise Convergence of a Fourier Series
7.3 The $L^2$ Convergence of a Fourier Series
7.4 The Uniform Convergence of a Trigonometric Series
7.5 Fourier Transforms

Appendix A Sylvester's Criterion
Appendix B Volumes of Parallelepipeds
Appendix C Riemann Integrability
References
Preface

Mathematical analysis is a standard course which introduces students to rigorous reasoning in mathematics, as well as the theories needed for advanced analysis courses. It is a compulsory course for all mathematics majors. It is also strongly recommended for students who major in computer science, physics, data science, financial analysis, and other areas that require strong analytical skills.

Some standard textbooks in mathematical analysis include the classical ones by Apostol [Apo74] and Rudin [Rud76], and the modern ones by Bartle [BS92], Fitzpatrick [Fit09], Abbott [Abb15], Tao [Tao16, Tao14] and Zorich [Zor15, Zor16].

This book is the second volume of a set of textbooks intended for a one-year course in mathematical analysis. We introduce the fundamental concepts in a pedagogical way. Many examples are given to illustrate the theories. We assume that students are familiar with the material of calculus, such as that in the book [SCW20]. Thus, we do not emphasize computational techniques. Emphasis is put on building up analytical skills through rigorous reasoning.

Besides calculus, it is also assumed that students have taken introductory courses in discrete mathematics and linear algebra, which cover topics such as logic, sets, functions, vector spaces, inner products, and quadratic forms. Whenever needed, these concepts are briefly reviewed.

In this book, we have carefully defined all the mathematical terms we use. While most of the terms have standard definitions, some terms have definitions that differ from author to author. Readers are advised to check the definitions of the terms used in this book when they encounter them. This can be done easily using the search function provided by any PDF viewer. Readers are also encouraged to make full use of the hyper-referencing provided.
Teo Lee Peng
Chapter 1. Euclidean Spaces

In this second volume of mathematical analysis, we study functions defined on subsets of $\mathbb{R}^n$. For this, we need to study the structure and topology of $\mathbb{R}^n$ first. We start with a review of $\mathbb{R}^n$ as a vector space. In the sequel, $n$ is a fixed positive integer reserved for use in $\mathbb{R}^n$.
1.1 The Euclidean Space $\mathbb{R}^n$ as a Vector Space
If $S_1, S_2, \ldots, S_n$ are sets, the cartesian product of these $n$ sets is defined as the set
$$S = S_1 \times \cdots \times S_n = \prod_{i=1}^{n} S_i = \{(a_1, \ldots, a_n) \mid a_i \in S_i,\ 1 \le i \le n\}$$
that contains all $n$-tuples $(a_1, \ldots, a_n)$, where $a_i \in S_i$ for all $1 \le i \le n$.

The set $\mathbb{R}^n$ is the cartesian product of $n$ copies of $\mathbb{R}$. Namely,
$$\mathbb{R}^n = \{(x_1, x_2, \ldots, x_n) \mid x_1, x_2, \ldots, x_n \in \mathbb{R}\}.$$
The point $(x_1, x_2, \ldots, x_n)$ is denoted by $\mathbf{x}$, whereas $x_1, x_2, \ldots, x_n$ are called the components of the point $\mathbf{x}$.

We can define an addition and a scalar multiplication on $\mathbb{R}^n$. If $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ are in $\mathbb{R}^n$, the addition of $\mathbf{x}$ and $\mathbf{y}$ is defined as
$$\mathbf{x} + \mathbf{y} = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n).$$
In other words, it is a componentwise addition. Given a real number $\alpha$, the scalar multiplication of $\alpha$ with $\mathbf{x}$ is given by the componentwise multiplication
$$\alpha\mathbf{x} = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n).$$
The set $\mathbb{R}^n$ with the addition and scalar multiplication operations is a vector space. It satisfies the 10 axioms for a real vector space $V$.
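The componentwise operations can be sketched in a few lines of Python. This is only an illustration: representing a point of $\mathbb{R}^n$ as a tuple of floats, and the function names `add` and `scale`, are assumptions made here and are not part of the text.

```python
from typing import Tuple

# Assumption for illustration: a point of R^n is a tuple of n floats.
Vector = Tuple[float, ...]


def add(x: Vector, y: Vector) -> Vector:
    """Componentwise sum x + y of two points of R^n."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of components")
    return tuple(a + b for a, b in zip(x, y))


def scale(alpha: float, x: Vector) -> Vector:
    """Componentwise scalar multiple alpha * x."""
    return tuple(alpha * a for a in x)


x = (1.0, -2.0, 0.5)
y = (3.0, 4.0, 1.5)
print(add(x, y))               # componentwise sum of x and y
print(scale(2.0, x))           # each component of x doubled
print(add(x, scale(-1.0, x)))  # x + (-1)x is the zero vector
```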
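The 10 Axioms for a Real Vector Space $V$

Let $V$ be a set that is equipped with two operations: the addition and the scalar multiplication. For any two vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$, their addition is denoted by $\mathbf{u} + \mathbf{v}$. For a vector $\mathbf{v}$ in $V$ and a scalar $\alpha \in \mathbb{R}$, the scalar multiplication of $\mathbf{v}$ by $\alpha$ is denoted by $\alpha\mathbf{v}$. We say that $V$ with the addition and scalar multiplication is a real vector space provided that the following 10 axioms are satisfied for any $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ in $V$, and any $\alpha$ and $\beta$ in $\mathbb{R}$.

Axiom 1. If $\mathbf{u}$ and $\mathbf{v}$ are in $V$, then $\mathbf{u} + \mathbf{v}$ is in $V$.

Axiom 2. $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.

Axiom 3. $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$.

Axiom 4. There is a zero vector $\mathbf{0}$ in $V$ such that $\mathbf{0} + \mathbf{v} = \mathbf{v} = \mathbf{v} + \mathbf{0}$ for all $\mathbf{v} \in V$.

Axiom 5. For any $\mathbf{v}$ in $V$, there is a vector $\mathbf{w}$ in $V$ such that $\mathbf{v} + \mathbf{w} = \mathbf{0} = \mathbf{w} + \mathbf{v}$. The vector $\mathbf{w}$ satisfying this equation is called the negative of $\mathbf{v}$, and is denoted by $-\mathbf{v}$.

Axiom 6. For any $\mathbf{v}$ in $V$, and any $\alpha \in \mathbb{R}$, $\alpha\mathbf{v}$ is in $V$.

Axiom 7. $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$.

Axiom 8. $(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$.

Axiom 9. $\alpha(\beta\mathbf{v}) = (\alpha\beta)\mathbf{v}$.

Axiom 10. $1\mathbf{v} = \mathbf{v}$.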
$\mathbb{R}^n$ is a real vector space. The zero vector is the point $\mathbf{0} = (0, 0, \ldots, 0)$ with all components equal to 0. Sometimes we also call a point $\mathbf{x} = (x_1, \ldots, x_n)$ in $\mathbb{R}^n$ a vector, and identify it with the vector from the origin $\mathbf{0}$ to the point $\mathbf{x}$.

Definition 1.1 Standard Unit Vectors
In $\mathbb{R}^n$, there are $n$ standard unit vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ given by
$$\mathbf{e}_1 = (1, 0, \ldots, 0), \quad \mathbf{e}_2 = (0, 1, \ldots, 0), \quad \cdots, \quad \mathbf{e}_n = (0, \ldots, 0, 1).$$

Let us review some concepts from linear algebra which will be useful later. Given that $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are vectors in a vector space $V$, a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_k$ is a vector $\mathbf{v}$ in $V$ of the form
$$\mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$$
for some scalars $c_1, \ldots, c_k$, which are known as the coefficients of the linear combination.

A subspace of a vector space $V$ is a subset of $V$ that is itself a vector space. There is a simple way to construct subspaces.

Proposition 1.1
Let $V$ be a vector space, and let $\mathbf{v}_1, \ldots, \mathbf{v}_k$ be vectors in $V$. The subset
$$W = \{c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k \mid c_1, \ldots, c_k \in \mathbb{R}\}$$
of $V$ that contains all linear combinations of $\mathbf{v}_1, \ldots, \mathbf{v}_k$ is itself a vector space. It is called the subspace of $V$ spanned by $\mathbf{v}_1, \ldots, \mathbf{v}_k$.

Example 1.1
In $\mathbb{R}^3$, the subspace spanned by the vectors $\mathbf{e}_1 = (1, 0, 0)$ and $\mathbf{e}_3 = (0, 0, 1)$ is the set $W$ that contains all points of the form
$$x(1, 0, 0) + z(0, 0, 1) = (x, 0, z),$$
which is the $xz$-plane.

Next, we recall the concept of linear independence.
Definition 1.2 Linear Independence
Let $V$ be a vector space, and let $\mathbf{v}_1, \ldots, \mathbf{v}_k$ be vectors in $V$. We say that the set $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is a linearly independent set of vectors, or the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are linearly independent, if the only $k$-tuple of real numbers $(c_1, \ldots, c_k)$ which satisfies
$$c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$$
is the trivial $k$-tuple $(c_1, \ldots, c_k) = (0, \ldots, 0)$.

Example 1.2
In $\mathbb{R}^n$, the standard unit vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ are linearly independent.

Example 1.3
If $V$ is a vector space, a vector $\mathbf{v}$ in $V$ is linearly independent if and only if $\mathbf{v} \neq \mathbf{0}$.

Example 1.4
Let $V$ be a vector space. Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ are linearly independent if and only if $\mathbf{u} \neq \mathbf{0}$, $\mathbf{v} \neq \mathbf{0}$, and there does not exist a constant $\alpha$ such that $\mathbf{v} = \alpha\mathbf{u}$.

Let us recall the following definition for two vectors to be parallel.

Definition 1.3 Parallel Vectors
Let $V$ be a vector space. Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ are parallel if either $\mathbf{u} = \mathbf{0}$ or there exists a constant $\alpha$ such that $\mathbf{v} = \alpha\mathbf{u}$.

In other words, two vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ are linearly independent if and only if they are not parallel.
Example 1.5
If $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is a linearly independent set of vectors, then for any $S' \subset S$, $S'$ is also a linearly independent set of vectors.

Now we discuss the concepts of dimension and basis.

Definition 1.4 Dimension and Basis
Let $V$ be a vector space, and let $W$ be a subspace of $V$. If $W$ can be spanned by $k$ linearly independent vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ in $V$, we say that $W$ has dimension $k$. The set $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is called a basis of $W$.

Example 1.6
In $\mathbb{R}^n$, the $n$ standard unit vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ are linearly independent and they span $\mathbb{R}^n$. Hence, the dimension of $\mathbb{R}^n$ is $n$.

Example 1.7
In $\mathbb{R}^3$, the subspace spanned by the two linearly independent vectors $\mathbf{e}_1 = (1, 0, 0)$ and $\mathbf{e}_3 = (0, 0, 1)$ has dimension 2.

Next, we introduce the translate of a set.

Definition 1.5 Translate of a Set
If $A$ is a subset of $\mathbb{R}^n$ and $\mathbf{u}$ is a point in $\mathbb{R}^n$, the translate of the set $A$ by the vector $\mathbf{u}$ is the set
$$A + \mathbf{u} = \{\mathbf{a} + \mathbf{u} \mid \mathbf{a} \in A\}.$$

Example 1.8
In $\mathbb{R}^3$, the translate of the set $A = \{(x, y, 0) \mid x, y \in \mathbb{R}\}$ by the vector $\mathbf{u} = (0, 0, -2)$ is the set
$$B = A + \mathbf{u} = \{(x, y, -2) \mid x, y \in \mathbb{R}\}.$$

In $\mathbb{R}^n$, the lines and the planes are of particular interest. They are closely related to the concept of subspaces.

Definition 1.6 Lines in $\mathbb{R}^n$
A line $L$ in $\mathbb{R}^n$ is a translate of a subspace of $\mathbb{R}^n$ that has dimension 1. As a set, it contains all the points $\mathbf{x}$ of the form
$$\mathbf{x} = \mathbf{x}_0 + t\mathbf{v}, \quad t \in \mathbb{R},$$
where $\mathbf{x}_0$ is a fixed point in $\mathbb{R}^n$, and $\mathbf{v}$ is a nonzero vector in $\mathbb{R}^n$. The equation $\mathbf{x} = \mathbf{x}_0 + t\mathbf{v}$, $t \in \mathbb{R}$, is known as the parametric equation of the line.

A line is determined by two points.

Example 1.9
Given two distinct points $\mathbf{x}_1$ and $\mathbf{x}_2$ in $\mathbb{R}^n$, the line $L$ that passes through these two points has parametric equation given by
$$\mathbf{x} = \mathbf{x}_1 + t(\mathbf{x}_2 - \mathbf{x}_1), \quad t \in \mathbb{R}.$$
When $0 \le t \le 1$, $\mathbf{x} = \mathbf{x}_1 + t(\mathbf{x}_2 - \mathbf{x}_1)$ describes all the points on the line segment with $\mathbf{x}_1$ and $\mathbf{x}_2$ as endpoints.
Figure 1.1: A Line between two points.
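The parametric equation of Example 1.9 is easy to evaluate numerically. The following is a minimal sketch, assuming points are stored as tuples of floats; the function name `point_on_line` and the sample points are hypothetical choices made for illustration.

```python
def point_on_line(x1, x2, t):
    """Return x1 + t (x2 - x1), the point of the line through x1 and x2 with parameter t."""
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must lie in the same R^n")
    return tuple(a + t * (b - a) for a, b in zip(x1, x2))


# Sample points in R^3, chosen only for illustration.
x1 = (1.0, 2.0, 3.0)
x2 = (3.0, 6.0, -1.0)

print(point_on_line(x1, x2, 0.0))  # t = 0 gives the endpoint x1
print(point_on_line(x1, x2, 0.5))  # t = 1/2 gives the midpoint of the segment
print(point_on_line(x1, x2, 1.0))  # t = 1 gives the endpoint x2
print(point_on_line(x1, x2, 2.0))  # t > 1 lies on the line but outside the segment
```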
Definition 1.7 Planes in $\mathbb{R}^n$
A plane $W$ in $\mathbb{R}^n$ is a translate of a subspace of dimension 2. As a set, it contains all the points $\mathbf{x}$ of the form
$$\mathbf{x} = \mathbf{x}_0 + t_1\mathbf{v}_1 + t_2\mathbf{v}_2, \quad t_1, t_2 \in \mathbb{R},$$
where $\mathbf{x}_0$ is a fixed point in $\mathbb{R}^n$, and $\mathbf{v}_1$ and $\mathbf{v}_2$ are two linearly independent vectors in $\mathbb{R}^n$.

Besides being a real vector space, $\mathbb{R}^n$ has an additional structure. Its definition is motivated as follows. Let $P(x_1, x_2, x_3)$ and $Q(y_1, y_2, y_3)$ be two points in $\mathbb{R}^3$. By Pythagoras' theorem, the distance between $P$ and $Q$ is given by
$$PQ = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}.$$

Figure 1.2: Distance between two points in $\mathbb{R}^2$.

Consider the triangle $OPQ$ with vertices $O$, $P$, $Q$, where $O$ is the origin. Then
$$OP = \sqrt{x_1^2 + x_2^2 + x_3^2}, \quad OQ = \sqrt{y_1^2 + y_2^2 + y_3^2}.$$
Let $\theta$ be the minor angle between $OP$ and $OQ$. By the cosine rule,
$$PQ^2 = OP^2 + OQ^2 - 2 \times OP \times OQ \times \cos\theta.$$
A straightforward computation gives
$$OP^2 + OQ^2 - PQ^2 = 2(x_1y_1 + x_2y_2 + x_3y_3).$$
Figure 1.3: Cosine rule.

Hence,
$$\cos\theta = \frac{x_1y_1 + x_2y_2 + x_3y_3}{\sqrt{x_1^2 + x_2^2 + x_3^2}\,\sqrt{y_1^2 + y_2^2 + y_3^2}}. \tag{1.1}$$
It is the quotient of $x_1y_1 + x_2y_2 + x_3y_3$ by the product of the lengths of $OP$ and $OQ$. Generalizing the expression $x_1y_1 + x_2y_2 + x_3y_3$ from $\mathbb{R}^3$ to $\mathbb{R}^n$ defines the dot product. For any two vectors $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ in $\mathbb{R}^n$, the dot product of $\mathbf{x}$ and $\mathbf{y}$ is defined as
$$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_iy_i = x_1y_1 + x_2y_2 + \cdots + x_ny_n.$$
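This is a special case of an inner product.

Definition 1.8 Inner Product Space
A real vector space $V$ is an inner product space if for any two vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$, an inner product $\langle\mathbf{u}, \mathbf{v}\rangle$ of $\mathbf{u}$ and $\mathbf{v}$ is defined, and the following conditions are satisfied for any $\mathbf{u}, \mathbf{v}, \mathbf{w}$ in $V$ and $\alpha, \beta \in \mathbb{R}$.
1. $\langle\mathbf{u}, \mathbf{v}\rangle = \langle\mathbf{v}, \mathbf{u}\rangle$.
2. $\langle\alpha\mathbf{u} + \beta\mathbf{v}, \mathbf{w}\rangle = \alpha\langle\mathbf{u}, \mathbf{w}\rangle + \beta\langle\mathbf{v}, \mathbf{w}\rangle$.
3. $\langle\mathbf{v}, \mathbf{v}\rangle \ge 0$, and $\langle\mathbf{v}, \mathbf{v}\rangle = 0$ if and only if $\mathbf{v} = \mathbf{0}$.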
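As a sanity check, the three conditions of Definition 1.8 can be tested for the dot product on sample vectors. This is a sketch only: a numerical check on a few vectors is not a proof, and the function name `dot` and the integer sample data are assumptions made for illustration.

```python
def dot(x, y):
    """Dot product x . y = x_1 y_1 + ... + x_n y_n."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of components")
    return sum(a * b for a, b in zip(x, y))


# Integer sample vectors in R^3, so that all arithmetic below is exact.
u, v, w = (1, 2, 3), (4, -5, 6), (-2, 0, 7)
alpha, beta = 2, -3

# Condition 1: symmetry.
print(dot(u, v) == dot(v, u))

# Condition 2: linearity in the first argument.
combo = tuple(alpha * a + beta * b for a, b in zip(u, v))
print(dot(combo, w) == alpha * dot(u, w) + beta * dot(v, w))

# Condition 3: <v, v> >= 0, with value 0 for the zero vector.
print(dot(v, v) >= 0, dot((0, 0, 0), (0, 0, 0)) == 0)
```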
Proposition 1.2 Euclidean Inner Product on $\mathbb{R}^n$
On $\mathbb{R}^n$,
$$\langle\mathbf{x}, \mathbf{y}\rangle = \mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_iy_i = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$
defines an inner product, called the standard inner product or the Euclidean inner product.

Definition 1.9 Euclidean Space
The vector space $\mathbb{R}^n$ with the Euclidean inner product is called the Euclidean $n$-space.

In the future, when we do not specify, $\mathbb{R}^n$ always means the Euclidean $n$-space. One can deduce some useful identities from the three axioms of an inner product space.

Proposition 1.3
If $V$ is an inner product space, then the following hold.
(a) For any $\mathbf{v} \in V$, $\langle\mathbf{0}, \mathbf{v}\rangle = 0 = \langle\mathbf{v}, \mathbf{0}\rangle$.
(b) For any vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k, \mathbf{w}_1, \ldots, \mathbf{w}_l$ in $V$, and for any real numbers $\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_l$,
$$\left\langle \sum_{i=1}^{k} \alpha_i\mathbf{v}_i, \sum_{j=1}^{l} \beta_j\mathbf{w}_j \right\rangle = \sum_{i=1}^{k}\sum_{j=1}^{l} \alpha_i\beta_j\langle\mathbf{v}_i, \mathbf{w}_j\rangle.$$
Given that $V$ is an inner product space, $\langle\mathbf{v}, \mathbf{v}\rangle \ge 0$ for any $\mathbf{v}$ in $V$. For example, for any $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^n$, under the Euclidean inner product,
$$\langle\mathbf{x}, \mathbf{x}\rangle = \sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2 \ge 0.$$
When $n = 3$, the length of the vector $OP$ from the point $O(0, 0, 0)$ to the point $P(x_1, x_2, x_3)$ is
$$OP = \sqrt{x_1^2 + x_2^2 + x_3^2} = \sqrt{\langle\mathbf{x}, \mathbf{x}\rangle}, \quad \text{where } \mathbf{x} = (x_1, x_2, x_3).$$
This motivates us to define the norm of a vector in an inner product space as follows.

Definition 1.10 Norm of a Vector
Given that $V$ is an inner product space, the norm of a vector $\mathbf{v}$ is defined as
$$\|\mathbf{v}\| = \sqrt{\langle\mathbf{v}, \mathbf{v}\rangle}.$$

The norm of a vector in an inner product space satisfies some properties, which follow from the axioms for an inner product space.

Proposition 1.4
Let $V$ be an inner product space.
1. For any $\mathbf{v}$ in $V$, $\|\mathbf{v}\| \ge 0$, and $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$.
2. For any $\alpha \in \mathbb{R}$ and $\mathbf{v} \in V$, $\|\alpha\mathbf{v}\| = |\alpha|\,\|\mathbf{v}\|$.

Motivated by the distance between two points in $\mathbb{R}^3$, we make the following definition.

Definition 1.11 Distance Between Two Points
Given that $V$ is an inner product space, the distance between $\mathbf{u}$ and $\mathbf{v}$ in $V$ is defined as
$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{v} - \mathbf{u}\| = \sqrt{\langle\mathbf{v} - \mathbf{u}, \mathbf{v} - \mathbf{u}\rangle}.$$

For example, the distance between the points $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$ in the Euclidean space $\mathbb{R}^n$ is
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}.$$
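The Euclidean norm and distance translate directly into code. The sketch below uses only the standard `math` module; the names `dot`, `norm` and `distance` and the sample points are illustrative assumptions.

```python
import math


def dot(x, y):
    """Euclidean inner product <x, y> on R^n."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Norm ||x|| = sqrt(<x, x>)."""
    return math.sqrt(dot(x, x))


def distance(x, y):
    """Distance d(x, y) = ||y - x||."""
    return norm(tuple(b - a for a, b in zip(x, y)))


print(norm((3.0, 4.0)))                             # length of the vector (3, 4)
print(distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0)))   # distance between two points of R^3
print(norm((-6.0, -8.0)) == 2.0 * norm((3.0, 4.0))) # ||alpha v|| = |alpha| ||v|| with alpha = -2
```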
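For analysis in $\mathbb{R}$, an important inequality is the triangle inequality, which says that $|x + y| \le |x| + |y|$ for any $x$ and $y$ in $\mathbb{R}$. To generalize this inequality to $\mathbb{R}^n$, we need the celebrated Cauchy-Schwarz inequality. It holds on any inner product space.

Proposition 1.5 Cauchy-Schwarz Inequality
Given that $V$ is an inner product space, for any $\mathbf{u}$ and $\mathbf{v}$ in $V$,
$$|\langle\mathbf{u}, \mathbf{v}\rangle| \le \|\mathbf{u}\|\,\|\mathbf{v}\|.$$
The equality holds if and only if $\mathbf{u}$ and $\mathbf{v}$ are parallel.

Proof
It is obvious that if either $\mathbf{u} = \mathbf{0}$ or $\mathbf{v} = \mathbf{0}$, then
$$|\langle\mathbf{u}, \mathbf{v}\rangle| = 0 = \|\mathbf{u}\|\,\|\mathbf{v}\|,$$
and so the equality holds. Now assume that both $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors. Consider the quadratic function $f : \mathbb{R} \to \mathbb{R}$ defined by
$$f(t) = \|t\mathbf{u} - \mathbf{v}\|^2 = \langle t\mathbf{u} - \mathbf{v}, t\mathbf{u} - \mathbf{v}\rangle.$$
Notice that $f(t) = at^2 + bt + c$, where
$$a = \langle\mathbf{u}, \mathbf{u}\rangle = \|\mathbf{u}\|^2, \quad b = -2\langle\mathbf{u}, \mathbf{v}\rangle, \quad c = \langle\mathbf{v}, \mathbf{v}\rangle = \|\mathbf{v}\|^2.$$
The third axiom of an inner product says that $f(t) \ge 0$ for all $t \in \mathbb{R}$. Since $a > 0$, we must have $b^2 - 4ac \le 0$. This gives
$$\langle\mathbf{u}, \mathbf{v}\rangle^2 \le \|\mathbf{u}\|^2\|\mathbf{v}\|^2.$$
Thus, we obtain the Cauchy-Schwarz inequality
$$|\langle\mathbf{u}, \mathbf{v}\rangle| \le \|\mathbf{u}\|\,\|\mathbf{v}\|.$$
The equality holds if and only if $b^2 - 4ac = 0$. The latter means that $f(t) = 0$ for some $t = \alpha$, which can happen if and only if $\alpha\mathbf{u} - \mathbf{v} = \mathbf{0}$, or equivalently, $\mathbf{v} = \alpha\mathbf{u}$.

Now we can prove the triangle inequality.

Proposition 1.6 Triangle Inequality
Let $V$ be an inner product space. For any vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ in $V$,
$$\|\mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_k\| \le \|\mathbf{v}_1\| + \|\mathbf{v}_2\| + \cdots + \|\mathbf{v}_k\|.$$

Proof
It is sufficient to prove the statement when $k = 2$. The general case follows by induction. Given $\mathbf{v}_1$ and $\mathbf{v}_2$ in $V$,
$$\begin{aligned}
\|\mathbf{v}_1 + \mathbf{v}_2\|^2 &= \langle\mathbf{v}_1 + \mathbf{v}_2, \mathbf{v}_1 + \mathbf{v}_2\rangle \\
&= \langle\mathbf{v}_1, \mathbf{v}_1\rangle + 2\langle\mathbf{v}_1, \mathbf{v}_2\rangle + \langle\mathbf{v}_2, \mathbf{v}_2\rangle \\
&\le \|\mathbf{v}_1\|^2 + 2\|\mathbf{v}_1\|\,\|\mathbf{v}_2\| + \|\mathbf{v}_2\|^2 \\
&= (\|\mathbf{v}_1\| + \|\mathbf{v}_2\|)^2,
\end{aligned}$$
where the inequality follows from the Cauchy-Schwarz inequality. This proves that
$$\|\mathbf{v}_1 + \mathbf{v}_2\| \le \|\mathbf{v}_1\| + \|\mathbf{v}_2\|.$$

From the triangle inequality, we can deduce the following.

Corollary 1.7
Let $V$ be an inner product space. For any vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$,
$$\|\mathbf{u}\| - \|\mathbf{v}\| \le \|\mathbf{u} - \mathbf{v}\|.$$

Expressed in terms of distance, the triangle inequality takes the form stated in Proposition 1.8.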
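The Cauchy-Schwarz and triangle inequalities can be checked numerically on random vectors of $\mathbb{R}^5$. This is a sketch, not a proof: the helper names, the choice of dimension, the seed and the floating-point tolerance `1e-12` are all assumptions made for illustration.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def cauchy_schwarz_gap(u, v):
    """||u|| ||v|| - |<u, v>|, which should be nonnegative."""
    return norm(u) * norm(v) - abs(dot(u, v))


def triangle_gap(u, v):
    """||u|| + ||v|| - ||u + v||, which should be nonnegative."""
    s = tuple(a + b for a, b in zip(u, v))
    return norm(u) + norm(v) - norm(s)


rng = random.Random(0)  # fixed seed so that the run is reproducible
for _ in range(1000):
    u = tuple(rng.uniform(-1.0, 1.0) for _ in range(5))
    v = tuple(rng.uniform(-1.0, 1.0) for _ in range(5))
    # Allow a tiny negative value caused by floating-point rounding.
    assert cauchy_schwarz_gap(u, v) >= -1e-12
    assert triangle_gap(u, v) >= -1e-12

# Equality case of Cauchy-Schwarz: v = 3u is parallel to u.
print(cauchy_schwarz_gap((1.0, 2.0, 2.0), (3.0, 6.0, 6.0)))
```

For the parallel pair in the last line the gap vanishes, in agreement with the equality statement of Proposition 1.5.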
Proposition 1.8 Triangle Inequality
Let $V$ be an inner product space. For any three points $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ in $V$,
$$d(\mathbf{v}_1, \mathbf{v}_2) \le d(\mathbf{v}_1, \mathbf{v}_3) + d(\mathbf{v}_2, \mathbf{v}_3).$$
More generally, if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are $k$ vectors in $V$, then
$$d(\mathbf{v}_1, \mathbf{v}_k) \le \sum_{i=2}^{k} d(\mathbf{v}_{i-1}, \mathbf{v}_i) = d(\mathbf{v}_1, \mathbf{v}_2) + \cdots + d(\mathbf{v}_{k-1}, \mathbf{v}_k).$$

Since we can define a distance function on an inner product space, an inner product space is a special case of a metric space.

Definition 1.12 Metric Space
Let $X$ be a set, and let $d : X \times X \to \mathbb{R}$ be a function defined on $X \times X$. We say that $d$ is a metric on $X$ provided that the following conditions are satisfied.
1. For any $x$ and $y$ in $X$, $d(x, y) \ge 0$, and $d(x, y) = 0$ if and only if $x = y$.
2. $d(x, y) = d(y, x)$ for any $x$ and $y$ in $X$.
3. For any $x$, $y$ and $z$ in $X$, $d(x, y) \le d(x, z) + d(y, z)$.
If $d$ is a metric on $X$, we say that $(X, d)$ is a metric space.

Metric spaces play important roles in advanced analysis. If $V$ is an inner product space, it is a metric space with metric
$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{v} - \mathbf{u}\|.$$

Using the Cauchy-Schwarz inequality, one can generalize the concept of angles to any two vectors in a real inner product space. If $\mathbf{u}$ and $\mathbf{v}$ are two nonzero vectors in a real inner product space $V$, the Cauchy-Schwarz inequality implies that
$$\frac{\langle\mathbf{u}, \mathbf{v}\rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$$
is a real number between $-1$ and $1$. Generalizing the formula (1.1), we define the angle $\theta$ between $\mathbf{u}$ and $\mathbf{v}$ as
$$\theta = \cos^{-1}\frac{\langle\mathbf{u}, \mathbf{v}\rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|}.$$
This is an angle between $0^\circ$ and $180^\circ$. A necessary and sufficient condition for two vectors $\mathbf{u}$ and $\mathbf{v}$ to make a $90^\circ$ angle is $\langle\mathbf{u}, \mathbf{v}\rangle = 0$.

Definition 1.13 Orthogonality
Let $V$ be a real inner product space. We say that the two vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ are orthogonal if $\langle\mathbf{u}, \mathbf{v}\rangle = 0$.

Lemma 1.9 Generalized Pythagoras Theorem
Let $V$ be an inner product space. If $\mathbf{u}$ and $\mathbf{v}$ are orthogonal vectors in $V$, then
$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.$$

Now we discuss the projection theorem.

Theorem 1.10 Projection Theorem
Let $V$ be an inner product space, and let $\mathbf{w}$ be a nonzero vector in $V$. If $\mathbf{v}$ is a vector in $V$, there is a unique way to write $\mathbf{v}$ as a sum of two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, such that $\mathbf{v}_1$ is parallel to $\mathbf{w}$ and $\mathbf{v}_2$ is orthogonal to $\mathbf{w}$. Moreover, for any real number $\alpha$,
$$\|\mathbf{v} - \alpha\mathbf{w}\| \ge \|\mathbf{v} - \mathbf{v}_1\|,$$
and the equality holds if and only if $\alpha$ is equal to the unique real number $\beta$ such that $\mathbf{v}_1 = \beta\mathbf{w}$.
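In $\mathbb{R}^n$ the decomposition of Theorem 1.10 can be computed with the coefficient $\beta = \langle\mathbf{v}, \mathbf{w}\rangle / \langle\mathbf{w}, \mathbf{w}\rangle$ that appears in the proof of the theorem. The following is a minimal sketch; the function name `project` and the sample vectors are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def project(v, w):
    """Split v = v1 + v2 with v1 = beta * w parallel to w and v2 orthogonal to w."""
    ww = dot(w, w)
    if ww == 0:
        raise ValueError("w must be a nonzero vector")
    beta = dot(v, w) / ww
    v1 = tuple(beta * c for c in w)
    v2 = tuple(a - b for a, b in zip(v, v1))
    return beta, v1, v2


v, w = (3.0, 4.0), (2.0, 0.0)
beta, v1, v2 = project(v, w)
print(beta, v1, v2)
print(dot(v2, w))  # v2 is orthogonal to w

# ||v - alpha w|| is never smaller than ||v - v1|| = ||v2||.
for alpha in (-1.0, 0.0, beta, 3.0):
    residual = tuple(a - alpha * c for a, c in zip(v, w))
    print(alpha, norm(residual) >= norm(v2) - 1e-12)
```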
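Figure 1.4: The projection theorem.

Proof
Assume that $\mathbf{v}$ can be written as a sum of two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, such that $\mathbf{v}_1$ is parallel to $\mathbf{w}$ and $\mathbf{v}_2$ is orthogonal to $\mathbf{w}$. Since $\mathbf{w}$ is nonzero, there is a real number $\beta$ such that $\mathbf{v}_1 = \beta\mathbf{w}$. Since $\mathbf{v}_2 = \mathbf{v} - \mathbf{v}_1 = \mathbf{v} - \beta\mathbf{w}$ is orthogonal to $\mathbf{w}$, we have
$$0 = \langle\mathbf{v} - \beta\mathbf{w}, \mathbf{w}\rangle = \langle\mathbf{v}, \mathbf{w}\rangle - \beta\langle\mathbf{w}, \mathbf{w}\rangle.$$
This implies that we must have
$$\beta = \frac{\langle\mathbf{v}, \mathbf{w}\rangle}{\langle\mathbf{w}, \mathbf{w}\rangle},$$
and
$$\mathbf{v}_1 = \frac{\langle\mathbf{v}, \mathbf{w}\rangle}{\langle\mathbf{w}, \mathbf{w}\rangle}\mathbf{w}, \quad \mathbf{v}_2 = \mathbf{v} - \frac{\langle\mathbf{v}, \mathbf{w}\rangle}{\langle\mathbf{w}, \mathbf{w}\rangle}\mathbf{w}.$$
It is easy to check that $\mathbf{v}_1$ and $\mathbf{v}_2$ given by these formulas indeed satisfy the requirements that $\mathbf{v}_1$ is parallel to $\mathbf{w}$ and $\mathbf{v}_2$ is orthogonal to $\mathbf{w}$. This establishes the existence and uniqueness of $\mathbf{v}_1$ and $\mathbf{v}_2$.

Now for any real number $\alpha$,
$$\mathbf{v} - \alpha\mathbf{w} = \mathbf{v} - \mathbf{v}_1 + (\beta - \alpha)\mathbf{w}.$$
Since $\mathbf{v} - \mathbf{v}_1 = \mathbf{v}_2$ is orthogonal to $(\beta - \alpha)\mathbf{w}$, the generalized Pythagoras theorem implies that
$$\|\mathbf{v} - \alpha\mathbf{w}\|^2 = \|\mathbf{v} - \mathbf{v}_1\|^2 + \|(\beta - \alpha)\mathbf{w}\|^2 \ge \|\mathbf{v} - \mathbf{v}_1\|^2.$$
This proves that
$$\|\mathbf{v} - \alpha\mathbf{w}\| \ge \|\mathbf{v} - \mathbf{v}_1\|.$$
The equality holds if and only if
$$\|(\beta - \alpha)\mathbf{w}\| = |\alpha - \beta|\,\|\mathbf{w}\| = 0.$$
Since $\|\mathbf{w}\| \neq 0$, we must have $\alpha = \beta$.

The vector $\mathbf{v}_1$ in this theorem is called the projection of $\mathbf{v}$ onto the subspace spanned by $\mathbf{w}$. There is a more general projection theorem where the subspace $W$ spanned by $\mathbf{w}$ is replaced by a general subspace. We say that a vector $\mathbf{v}$ is orthogonal to the subspace $W$ if it is orthogonal to each vector $\mathbf{w}$ in $W$.

Theorem 1.11 General Projection Theorem
Let $V$ be an inner product space, and let $W$ be a finite dimensional subspace of $V$. If $\mathbf{v}$ is a vector in $V$, there is a unique way to write $\mathbf{v}$ as a sum of two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, such that $\mathbf{v}_1$ is in $W$ and $\mathbf{v}_2$ is orthogonal to $W$. The vector $\mathbf{v}_1$ is denoted by $\text{proj}_W\mathbf{v}$. For any $\mathbf{w} \in W$,
$$\|\mathbf{v} - \mathbf{w}\| \ge \|\mathbf{v} - \text{proj}_W\mathbf{v}\|,$$
and the equality holds if and only if $\mathbf{w} = \text{proj}_W\mathbf{v}$.

Sketch of Proof
If $W$ is a $k$-dimensional vector space, it has a basis consisting of $k$ linearly independent vectors $\mathbf{w}_1, \ldots, \mathbf{w}_k$. Since the vector $\mathbf{v}_1$ is in $W$, there are constants $c_1, \ldots, c_k$ such that
$$\mathbf{v}_1 = c_1\mathbf{w}_1 + \cdots + c_k\mathbf{w}_k.$$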
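The condition that $\mathbf{v}_2 = \mathbf{v} - \mathbf{v}_1$ is orthogonal to $W$ gives rise to the $k$ equations
$$\begin{aligned}
c_1\langle\mathbf{w}_1, \mathbf{w}_1\rangle + \cdots + c_k\langle\mathbf{w}_k, \mathbf{w}_1\rangle &= \langle\mathbf{v}, \mathbf{w}_1\rangle, \\
&\;\;\vdots \\
c_1\langle\mathbf{w}_1, \mathbf{w}_k\rangle + \cdots + c_k\langle\mathbf{w}_k, \mathbf{w}_k\rangle &= \langle\mathbf{v}, \mathbf{w}_k\rangle.
\end{aligned} \tag{1.2}$$
Using the fact that $\mathbf{w}_1, \ldots, \mathbf{w}_k$ are linearly independent, one can show that the $k \times k$ matrix
$$A = \begin{bmatrix}
\langle\mathbf{w}_1, \mathbf{w}_1\rangle & \cdots & \langle\mathbf{w}_k, \mathbf{w}_1\rangle \\
\vdots & \ddots & \vdots \\
\langle\mathbf{w}_1, \mathbf{w}_k\rangle & \cdots & \langle\mathbf{w}_k, \mathbf{w}_k\rangle
\end{bmatrix}$$
is invertible. This shows that there is a unique $\mathbf{c} = (c_1, \ldots, c_k)$ satisfying the linear system (1.2).

If $V$ is an inner product space, a basis that consists of mutually orthogonal vectors is of special interest.

Definition 1.14 Orthogonal Set and Orthonormal Set
Let $V$ be an inner product space. A subset of vectors $S = \{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ is called an orthogonal set if any two distinct vectors $\mathbf{u}_i$ and $\mathbf{u}_j$ in $S$ are orthogonal. Namely,
$$\langle\mathbf{u}_i, \mathbf{u}_j\rangle = 0 \quad \text{if } i \neq j.$$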
$S$ is called an orthonormal set if it is an orthogonal set of unit vectors. Namely,
$$\langle\mathbf{u}_i, \mathbf{u}_j\rangle = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases}$$

If $S = \{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ is an orthogonal set of nonzero vectors, it is a linearly independent set of vectors. One can construct an orthonormal set by normalizing each vector in the set. There is a standard algorithm, known as the Gram-Schmidt process, which can turn any linearly independent set of vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ into an orthogonal set $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ of nonzero vectors. We start with the following lemma.

Lemma 1.12
Let $V$ be an inner product space, and let $S = \{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ be an orthogonal set of nonzero vectors in $V$ that spans the subspace $W$. Given any vector $\mathbf{v}$ in $V$,
$$\text{proj}_W\mathbf{v} = \sum_{i=1}^{k} \frac{\langle\mathbf{v}, \mathbf{u}_i\rangle}{\langle\mathbf{u}_i, \mathbf{u}_i\rangle}\mathbf{u}_i.$$

Proof
By the general projection theorem, $\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2$, where $\mathbf{v}_1 = \text{proj}_W\mathbf{v}$ is in $W$ and $\mathbf{v}_2$ is orthogonal to $W$. Since $S$ is a basis for $W$, there exist scalars $c_1, c_2, \ldots, c_k$ such that $\mathbf{v}_1 = c_1\mathbf{u}_1 + \cdots + c_k\mathbf{u}_k$. Therefore,
$$\mathbf{v} = c_1\mathbf{u}_1 + \cdots + c_k\mathbf{u}_k + \mathbf{v}_2.$$
Since $S$ is an orthogonal set of vectors and $\mathbf{v}_2$ is orthogonal to each $\mathbf{u}_i$, we find that for $1 \le i \le k$,
$$\langle\mathbf{v}, \mathbf{u}_i\rangle = c_i\langle\mathbf{u}_i, \mathbf{u}_i\rangle.$$
This proves the lemma.

Theorem 1.13 Gram-Schmidt Process
Let $V$ be an inner product space, and assume that $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is a linearly independent set of vectors in $V$. Define the vectors $\mathbf{u}_1, \ldots, \mathbf{u}_k$ inductively by $\mathbf{u}_1 = \mathbf{v}_1$, and for $2 \le j \le k$,
$$\mathbf{u}_j = \mathbf{v}_j - \sum_{i=1}^{j-1} \frac{\langle\mathbf{v}_j, \mathbf{u}_i\rangle}{\langle\mathbf{u}_i, \mathbf{u}_i\rangle}\mathbf{u}_i.$$
Then $S' = \{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ is an orthogonal set of nonzero vectors. Moreover, for each $1 \le j \le k$, the set $\{\mathbf{u}_i \mid 1 \le i \le j\}$ spans the same subspace as the set $\{\mathbf{v}_i \mid 1 \le i \le j\}$.
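The inductive formula of Theorem 1.13 can be sketched in Python for vectors in $\mathbb{R}^n$ with the Euclidean inner product. To keep the arithmetic exact, the example below uses `fractions.Fraction` from the standard library; this choice, the function name `gram_schmidt` and the sample vectors are assumptions made for illustration.

```python
from fractions import Fraction


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(vectors):
    """Return u_1, ..., u_k with u_j = v_j - sum_{i<j} (<v_j, u_i> / <u_i, u_i>) u_i."""
    us = []
    for v in vectors:
        u = list(v)
        for prev in us:
            coeff = dot(v, prev) / dot(prev, prev)
            u = [a - coeff * b for a, b in zip(u, prev)]
        # The exact zero test is meaningful for exact arithmetic such as Fraction;
        # with floats a tolerance would be needed instead.
        if all(c == 0 for c in u):
            raise ValueError("the input vectors are linearly dependent")
        us.append(tuple(u))
    return us


F = Fraction
vs = [(F(1), F(1), F(0)), (F(1), F(0), F(1)), (F(0), F(1), F(1))]
us = gram_schmidt(vs)
for u in us:
    print(tuple(str(c) for c in u))

# All pairwise inner products of distinct u_i vanish.
print(all(dot(us[i], us[j]) == 0 for i in range(3) for j in range(i)))
```

Dividing each resulting vector by its norm would give an orthonormal set, at the cost of leaving exact rational arithmetic.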
Sketch of Proof
For $1 \le j \le k$, let $W_j$ be the subspace spanned by the set $\{\mathbf{v}_i \mid 1 \le i \le j\}$. The vectors $\mathbf{u}_1, \ldots, \mathbf{u}_k$ are constructed by letting $\mathbf{u}_1 = \mathbf{v}_1$, and for $2 \le j \le k$,
$$\mathbf{u}_j = \mathbf{v}_j - \text{proj}_{W_{j-1}}\mathbf{v}_j.$$
Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_j\}$ is a linearly independent set, $\mathbf{u}_j \neq \mathbf{0}$. Using induction, one can show that $\text{span}\{\mathbf{u}_1, \ldots, \mathbf{u}_j\} = \text{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_j\}$. By the projection theorem, $\mathbf{u}_j$ is orthogonal to $W_{j-1}$. Hence, it is orthogonal to $\mathbf{u}_1, \ldots, \mathbf{u}_{j-1}$. This proves the theorem.

A mapping between two vector spaces that respects the linear structures is called a linear transformation.

Definition 1.15 Linear Transformation
Let $V$ and $W$ be real vector spaces. A mapping $T : V \to W$ is called a linear transformation provided that for any $\mathbf{v}_1, \ldots, \mathbf{v}_k$ in $V$, and for any real numbers $c_1, \ldots, c_k$,
$$T(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k) = c_1T(\mathbf{v}_1) + \cdots + c_kT(\mathbf{v}_k).$$

Linear transformations play important roles in multivariable analysis. In the following, we first define a special class of linear transformations associated with special projections.

For $1 \le i \le n$, let $L_i$ be the subspace of $\mathbb{R}^n$ spanned by the unit vector $\mathbf{e}_i$. For the point $\mathbf{x} = (x_1, \ldots, x_n)$,
$$\text{proj}_{L_i}\mathbf{x} = x_i\mathbf{e}_i.$$
The number $x_i$ is the $i$th component of $\mathbf{x}$. It will play an important role later. The mapping from $\mathbf{x}$ to $x_i$ is a function from $\mathbb{R}^n$ to $\mathbb{R}$.

Definition 1.16 Projection Functions
For $1 \le i \le n$, the $i$th projection function on $\mathbb{R}^n$ is the function $\pi_i : \mathbb{R}^n \to \mathbb{R}$ defined by
$$\pi_i(\mathbf{x}) = \pi_i(x_1, \ldots, x_n) = x_i.$$
Figure 1.5: The projection functions.

The following is obvious.

Proposition 1.14
For $1 \le i \le n$, the $i$th projection function on $\mathbb{R}^n$ is a linear transformation. Namely, for any $\mathbf{x}_1, \ldots, \mathbf{x}_k$ in $\mathbb{R}^n$, and any real numbers $c_1, \ldots, c_k$,
$$\pi_i(c_1\mathbf{x}_1 + \cdots + c_k\mathbf{x}_k) = c_1\pi_i(\mathbf{x}_1) + \cdots + c_k\pi_i(\mathbf{x}_k).$$

The following is a useful inequality.

Proposition 1.15
Let $\mathbf{x}$ be a vector in $\mathbb{R}^n$. Then
$$|\pi_i(\mathbf{x})| \le \|\mathbf{x}\|.$$

At the end of this section, let us introduce the concept of hyperplanes.

Definition 1.17 Hyperplanes
In $\mathbb{R}^n$, a hyperplane is a translate of a subspace of dimension $n - 1$. In other words, $H$ is a hyperplane if there is a point $\mathbf{x}_0$ in $\mathbb{R}^n$, and $n - 1$ linearly independent vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{n-1}$ such that $H$ contains all points $\mathbf{x}$ of the form
$$\mathbf{x} = \mathbf{x}_0 + t_1\mathbf{v}_1 + \cdots + t_{n-1}\mathbf{v}_{n-1}, \quad (t_1, \ldots, t_{n-1}) \in \mathbb{R}^{n-1}.$$
A hyperplane in $\mathbb{R}^1$ is a point. A hyperplane in $\mathbb{R}^2$ is a line. A hyperplane in $\mathbb{R}^3$ is a plane.

Definition 1.18 Normal Vectors
Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{n-1}$ be linearly independent vectors in $\mathbb{R}^n$, and let $H$ be the hyperplane
$$H = \left\{\mathbf{x}_0 + t_1\mathbf{v}_1 + \cdots + t_{n-1}\mathbf{v}_{n-1} \mid (t_1, \ldots, t_{n-1}) \in \mathbb{R}^{n-1}\right\}.$$
A nonzero vector $\mathbf{n}$ that is orthogonal to all the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_{n-1}$ is called a normal vector to the hyperplane.

If $\mathbf{x}_1$ and $\mathbf{x}_2$ are two points on $H$, then $\mathbf{n}$ is orthogonal to the vector $\mathbf{v} = \mathbf{x}_2 - \mathbf{x}_1$. Any two normal vectors of a hyperplane are scalar multiples of each other.

Proposition 1.16
If $H$ is a hyperplane with normal vector $\mathbf{n} = (a_1, a_2, \ldots, a_n)$, and $\mathbf{x}_0 = (u_1, u_2, \ldots, u_n)$ is a point on $H$, then the equation of $H$ is given by
$$a_1(x_1 - u_1) + a_2(x_2 - u_2) + \cdots + a_n(x_n - u_n) = \mathbf{n} \cdot (\mathbf{x} - \mathbf{x}_0) = 0.$$
Conversely, any equation of the form
$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$$
with $(a_1, a_2, \ldots, a_n) \neq \mathbf{0}$ is the equation of a hyperplane with normal vector $\mathbf{n} = (a_1, a_2, \ldots, a_n)$.

Example 1.10
Given $1 \le i \le n$, the equation $x_i = c$ defines a hyperplane with normal vector $\mathbf{e}_i$. It is a hyperplane parallel to the coordinate plane $x_i = 0$, and perpendicular to the $x_i$-axis.
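The two forms of the hyperplane equation in Proposition 1.16 can be compared on a small example. This is a sketch under stated assumptions: the function names `hyperplane_offset` and `on_hyperplane`, the tolerance, and the sample normal vector and points are all hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def hyperplane_offset(n, x0):
    """Return b = n . x0, so that n . (x - x0) = 0 reads a_1 x_1 + ... + a_n x_n = b."""
    if all(a == 0 for a in n):
        raise ValueError("a normal vector must be nonzero")
    return dot(n, x0)


def on_hyperplane(n, x0, x, tol=1e-12):
    """Check whether x satisfies n . (x - x0) = 0 up to the tolerance tol."""
    diff = tuple(a - b for a, b in zip(x, x0))
    return abs(dot(n, diff)) <= tol


n = (1.0, 2.0, 3.0)   # sample normal vector in R^3
x0 = (1.0, 1.0, 1.0)  # sample point on the hyperplane
print(hyperplane_offset(n, x0))               # right-hand side b of x_1 + 2 x_2 + 3 x_3 = b
print(on_hyperplane(n, x0, (6.0, 0.0, 0.0)))  # satisfies the equation
print(on_hyperplane(n, x0, (0.0, 0.0, 0.0)))  # does not satisfy the equation
```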
Exercises 1.1

Question 1
Let $V$ be an inner product space. If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $V$, show that
$$\|\mathbf{u}\| - \|\mathbf{v}\| \le \|\mathbf{u} - \mathbf{v}\|.$$

Question 2
Let $V$ be an inner product space. If $\mathbf{u}$ and $\mathbf{v}$ are orthogonal vectors in $V$, show that
$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.$$

Question 3
Let $V$ be an inner product space, and let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $V$. Show that
$$\langle\mathbf{u}, \mathbf{v}\rangle = \frac{\|\mathbf{u} + \mathbf{v}\|^2 - \|\mathbf{u} - \mathbf{v}\|^2}{4}.$$

Question 4
Let $V$ be an inner product space, and let $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ be an orthonormal set of vectors in $V$. For any real numbers $\alpha_1, \ldots, \alpha_k$, show that
$$\|\alpha_1\mathbf{u}_1 + \cdots + \alpha_k\mathbf{u}_k\|^2 = \alpha_1^2 + \cdots + \alpha_k^2.$$

Question 5
Let $x_1, x_2, \ldots, x_n$ be real numbers. Show that
(a) $\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} \le |x_1| + |x_2| + \cdots + |x_n|$;
(b) $|x_1 + x_2 + \cdots + x_n| \le \sqrt{n}\,\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$.
1.2 Convergence of Sequences in $\mathbb{R}^n$
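A point in the Euclidean space $\mathbb{R}^n$ is denoted by $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. When $n = 1$, we just denote it by $x$. When $n = 2$ and $n = 3$, it is customary to denote a point in $\mathbb{R}^2$ and $\mathbb{R}^3$ by $(x, y)$ and $(x, y, z)$ respectively.

The Euclidean inner product between the vectors $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ is
$$\langle\mathbf{x}, \mathbf{y}\rangle = \mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_iy_i.$$
The norm of $\mathbf{x}$ is
$$\|\mathbf{x}\| = \sqrt{\langle\mathbf{x}, \mathbf{x}\rangle} = \sqrt{\sum_{i=1}^{n} x_i^2},$$
while the distance between $\mathbf{x}$ and $\mathbf{y}$ is
$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.$$

A sequence in $\mathbb{R}^n$ is a function $f : \mathbb{Z}^+ \to \mathbb{R}^n$. For $k \in \mathbb{Z}^+$, let $\mathbf{a}_k = f(k)$. Then we can also denote the sequence by $\{\mathbf{a}_k\}_{k=1}^{\infty}$, or simply as $\{\mathbf{a}_k\}$.

Example 1.11
The sequence $\left\{\left(\dfrac{k}{k+1}, \dfrac{2k+3}{k}\right)\right\}$ is a sequence in $\mathbb{R}^2$ with
$$\mathbf{a}_k = \left(\frac{k}{k+1}, \frac{2k+3}{k}\right).$$

In volume I, we have seen that a sequence of real numbers $\{a_k\}_{k=1}^{\infty}$ is said to converge to a real number $a$ provided that for any $\varepsilon > 0$, there is a positive integer $K$ such that
$$|a_k - a| < \varepsilon \quad \text{for all } k \ge K.$$
Notice that $|a_k - a|$ is the distance between $a_k$ and $a$. To define the convergence of a sequence in $\mathbb{R}^n$, we use the Euclidean distance.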
Definition 1.19 Convergence of Sequences
A sequence $\{\mathbf{a}_k\}$ in $\mathbb{R}^n$ is said to converge to the point $\mathbf{a}$ in $\mathbb{R}^n$ provided that for any $\varepsilon > 0$, there is a positive integer $K$ so that for all $k \ge K$,
$$\|\mathbf{a}_k - \mathbf{a}\| = d(\mathbf{a}_k, \mathbf{a}) < \varepsilon.$$
If $\{\mathbf{a}_k\}$ is a sequence that converges to a point $\mathbf{a}$, we say that the sequence $\{\mathbf{a}_k\}$ is convergent. A sequence that does not converge to any point in $\mathbb{R}^n$ is said to be divergent.

Figure 1.6: The convergence of a sequence.

As in the $n = 1$ case, we have the following.

Proposition 1.17
A sequence in $\mathbb{R}^n$ cannot converge to two different points.

Definition 1.20 Limit of a Sequence
If $\{\mathbf{a}_k\}$ is a sequence in $\mathbb{R}^n$ that converges to the point $\mathbf{a}$, we call $\mathbf{a}$ the limit of the sequence. This can be expressed as
$$\lim_{k\to\infty} \mathbf{a}_k = \mathbf{a}.$$
The following is easy to establish.
Proposition 1.18 Let {ak} be a sequence in Rn. Then {ak} converges to a if and only if
\[ \lim_{k\to\infty} \|a_k - a\| = 0. \]
Proof By definition, the sequence {ak} converges to a if and only if for any ε > 0, there is a positive integer K so that for all k ≥ K, ∥ak − a∥ < ε. This is the definition of $\lim_{k\to\infty} \|a_k - a\| = 0$.
As in the n = 1 case, $\{a_{k_j}\}_{j=1}^{\infty}$ is a subsequence of {ak} if k1, k2, k3, . . . is a strictly increasing sequence of positive integers.
Corollary 1.19 If {ak} is a sequence in Rn that converges to the point a, then any subsequence of {ak} also converges to a.
Example 1.12 Let us investigate the convergence of the sequence {ak} in R2 with
\[ a_k = \left( \frac{k}{k+1}, \frac{2k+3}{k} \right) \]
that is defined in Example 1.11. Notice that
\[ \lim_{k\to\infty} \pi_1(a_k) = \lim_{k\to\infty} \frac{k}{k+1} = 1, \qquad \lim_{k\to\infty} \pi_2(a_k) = \lim_{k\to\infty} \frac{2k+3}{k} = 2. \]
It is natural for us to speculate that the sequence {ak} converges to the point a = (1, 2). For k ∈ Z+,
\[ a_k - a = \left( -\frac{1}{k+1}, \frac{3}{k} \right). \]
Thus,
\[ \|a_k - a\| = \sqrt{\frac{1}{(k+1)^2} + \frac{9}{k^2}}. \]
By squeeze theorem,
\[ \lim_{k\to\infty} \|a_k - a\| = 0. \]
This proves that the sequence {ak} indeed converges to the point a = (1, 2).
In the example above, we guess the limit of the sequence by looking at each component of the sequence. This in fact works for any sequence.
Theorem 1.20 Componentwise Convergence of Sequences A sequence {ak} in Rn converges to the point a if and only if for each 1 ≤ i ≤ n, the sequence {πi(ak)} converges to πi(a).
Proof Given 1 ≤ i ≤ n, πi(ak) − πi(a) = πi(ak − a). Thus, |πi(ak) − πi(a)| = |πi(ak − a)| ≤ ∥ak − a∥. If the sequence {ak} converges to the point a, then
\[ \lim_{k\to\infty} \|a_k - a\| = 0. \]
By squeeze theorem,
\[ \lim_{k\to\infty} |\pi_i(a_k) - \pi_i(a)| = 0. \]
This proves that the sequence {πi(ak)} converges to πi(a).
Conversely, assume that for each 1 ≤ i ≤ n, the sequence {πi(ak)} converges to πi(a). Then
\[ \lim_{k\to\infty} |\pi_i(a_k) - \pi_i(a)| = 0 \quad \text{for } 1 \le i \le n. \]
Since
\[ \|a_k - a\| \le \sum_{i=1}^{n} |\pi_i(a_k - a)|, \]
squeeze theorem implies that
\[ \lim_{k\to\infty} \|a_k - a\| = 0. \]
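This proves that the sequence {ak} converges to the point a.
Theorem 1.20 reduces the investigation of convergence of sequences in Rn to sequences in R. Let us look at a few examples.
Example 1.13 Find the following limit.
\[ \lim_{k\to\infty} \left( \frac{2^k + 1}{3^k}, \left(1 + \frac{1}{k}\right)^k, \frac{k}{\sqrt{k^2 + 1}} \right). \]
Solution We compute the limit componentwise.
\[ \lim_{k\to\infty} \frac{2^k + 1}{3^k} = \lim_{k\to\infty} \left[ \left(\frac{2}{3}\right)^k + \left(\frac{1}{3}\right)^k \right] = 0 + 0 = 0, \]
\[ \lim_{k\to\infty} \left(1 + \frac{1}{k}\right)^k = e, \]
\[ \lim_{k\to\infty} \frac{k}{\sqrt{k^2 + 1}} = \lim_{k\to\infty} \frac{k}{k\sqrt{1 + \dfrac{1}{k^2}}} = 1. \]
Hence,
\[ \lim_{k\to\infty} \left( \frac{2^k + 1}{3^k}, \left(1 + \frac{1}{k}\right)^k, \frac{k}{\sqrt{k^2 + 1}} \right) = (0, e, 1). \]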
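As a numerical sanity check of Example 1.13 (an illustration only, not a proof), the sketch below evaluates the terms of the sequence for a few values of k and prints the Euclidean distance to the claimed limit (0, e, 1). The helper names `a` and `error` and the chosen values of k are assumptions of this sketch.

```python
import math

def a(k):
    """k-th term of the sequence in Example 1.13, computed componentwise."""
    first = (2.0 / 3.0) ** k + (1.0 / 3.0) ** k   # equals (2^k + 1) / 3^k
    second = (1.0 + 1.0 / k) ** k
    third = k / math.sqrt(k * k + 1.0)
    return (first, second, third)

limit = (0.0, math.e, 1.0)

def error(k):
    """Euclidean distance ||a_k - (0, e, 1)||."""
    return math.dist(a(k), limit)

for k in (10, 100, 1000):
    print(k, error(k))
```

The printed distances decrease as k grows, which is consistent with the componentwise computation. The second component converges slowest, at a rate of roughly e/(2k).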
Example 1.14 Let {ak} be the sequence with
\[ a_k = \left( (-1)^k, \frac{(-1)^k}{k} \right). \]
Is the sequence convergent? Justify your answer.
Solution The sequence {π1(ak)} is the sequence {(−1)^k}, which is divergent. Hence, the sequence {ak} is divergent.
Using the componentwise convergence theorem, it is easy to establish the following.
Proposition 1.21 Linearity Let {ak} and {bk} be sequences in Rn that converge to a and b respectively. For any real numbers α and β, the sequence {αak + βbk} converges to αa + βb. Namely,
\[ \lim_{k\to\infty} (\alpha a_k + \beta b_k) = \alpha a + \beta b. \]
Example 1.15 If {ak} is a sequence in Rn that converges to a, show that
\[ \lim_{k\to\infty} \|a_k\| = \|a\|. \]
Solution Notice that
\[ \|a_k\| = \sqrt{\pi_1(a_k)^2 + \cdots + \pi_n(a_k)^2}. \]
For 1 ≤ i ≤ n,
\[ \lim_{k\to\infty} \pi_i(a_k) = \pi_i(a). \]
Using limit laws for sequences in R, we have
\[ \lim_{k\to\infty} \left( \pi_1(a_k)^2 + \cdots + \pi_n(a_k)^2 \right) = \pi_1(a)^2 + \cdots + \pi_n(a)^2. \]
Using the fact that the square root function is continuous, we find that
\[ \lim_{k\to\infty} \|a_k\| = \lim_{k\to\infty} \sqrt{\pi_1(a_k)^2 + \cdots + \pi_n(a_k)^2} = \sqrt{\pi_1(a)^2 + \cdots + \pi_n(a)^2} = \|a\|. \]
There is also a Cauchy criterion for convergence of sequences in Rn.
Definition 1.21 Cauchy Sequences A sequence {ak} in Rn is a Cauchy sequence if for every ε > 0, there is a positive integer K such that for all l ≥ k ≥ K, ∥al − ak∥ < ε.
Theorem 1.22 Cauchy Criterion A sequence {ak} in Rn is convergent if and only if it is a Cauchy sequence.
Similar to the n = 1 case, the Cauchy criterion allows us to determine whether a sequence in Rn is convergent without having to guess what the limit is first.
Proof
Assume that the sequence {ak } converges to a. Given ε > 0, there is a positive integer K such that for all k ≥ K, ∥ak − a∥ < ε/2. Then for all l ≥ k ≥ K, ∥al − ak ∥ ≤ ∥al − a∥ + ∥ak − a∥ < ε. This proves that {ak } is a Cauchy sequence. Conversely, assume that {ak } is a Cauchy sequence. Given ε > 0, there is a positive integer K such that for all l ≥ k ≥ K, ∥al − ak ∥ < ε. For each 1 ≤ i ≤ n, |πi (al ) − πi (ak )| = |πi (al − ak )| ≤ ∥al − ak ∥. Hence, {πi (ak )} is a Cauchy sequence in R. Therefore, it is convergent. By componentwise convergence theorem, the sequence {ak } is convergent.
Exercises 1.2
Question 1 Show that a sequence in Rn cannot converge to two different points.
Question 2 Find the limit of the sequence {ak}, where
\[ a_k = \left( \frac{2k+1}{k+3}, \frac{\sqrt{2k^2 + k}}{k}, \left(1 + \frac{2}{k}\right)^k \right). \]
Question 3 Let {ak} be the sequence with
\[ a_k = \left( \frac{1 + (-1)^{k-1} k}{1 + k}, \frac{1}{2^k} \right). \]
Determine whether the sequence is convergent.
Question 4 Let {ak} be the sequence with
\[ a_k = \left( \frac{k}{1+k}, \frac{k}{\sqrt{k+1}} \right). \]
Determine whether the sequence is convergent.
Question 5 Let {ak} and {bk} be sequences in Rn that converge to a and b respectively. Show that
\[ \lim_{k\to\infty} \langle a_k, b_k\rangle = \langle a, b\rangle. \]
Here ⟨x, y⟩ = x · y is the standard inner product on Rn.
Question 6 Suppose that {ak} is a sequence in Rn that converges to a, and {ck} is a sequence of real numbers that converges to c. Show that
\[ \lim_{k\to\infty} c_k a_k = c a. \]
Question 7 Suppose that {ak} is a sequence of nonzero vectors in Rn that converges to a and a ≠ 0. Show that
\[ \lim_{k\to\infty} \frac{a_k}{\|a_k\|} = \frac{a}{\|a\|}. \]
Question 8 Let {ak} and {bk} be sequences in Rn. If {ak} is convergent and {bk} is divergent, show that the sequence {ak + bk} is divergent.
Question 9 Suppose that {ak} is a sequence in Rn that converges to a. If r = ∥a∥ ≠ 0, show that there is a positive integer K such that
\[ \|a_k\| > \frac{r}{2} \quad \text{for all } k \ge K. \]
Question 10 Let {ak} be a sequence in Rn and let b be a point in Rn. Assume that the sequence {ak} does not converge to b. Show that there is an ε > 0 and a subsequence $\{a_{k_j}\}$ of {ak} such that
\[ \|a_{k_j} - b\| \ge \varepsilon \quad \text{for all } j \in \mathbb{Z}^+. \]
1.3 Open Sets and Closed Sets
In volume I, we call an interval of the form (a, b) an open interval. Given a point x in R, a neighbourhood of x is an open interval (a, b) that contains x. Given a subset S of R, we say that x is an interior point of S if there is a neighbourhood of x that is contained in S. We say that S is closed in R provided that if {ak} is a sequence of points in S that converges to a, then a is also in S. These describe the topology of R. It is relatively simple. For n ≥ 2, the topological features of Rn are much more complicated. An open interval (a, b) in R can be described as a set of the form B = {x ∈ R | |x − x0| < r}, where
\[ x_0 = \frac{a+b}{2} \quad \text{and} \quad r = \frac{b-a}{2}. \]
Figure 1.7: An open interval.
Generalizing this, we define open balls in Rn.
Definition 1.22 Open Balls Given x0 in Rn and r > 0, an open ball B(x0, r) of radius r with center at x0 is a subset of Rn of the form B(x0, r) = {x ∈ Rn | ∥x − x0∥ < r}. It consists of all points of Rn whose distance to the center x0 is less than r.
Obviously, if 0 < r1 ≤ r2, then B(x0, r1) ⊂ B(x0, r2). The following is a useful lemma for balls with different centers.
Figure 1.8: An open ball. Lemma 1.23 Let x1 be a point in the open ball B(x0 , r). Then ∥x1 − x0 ∥ < r. If r1 is a positive number satisfying r1 ≤ r − ∥x1 − x0 ∥, then the open ball B(x1 , r1 ) is contained in the open ball B(x0 , r).
Figure 1.9: An open ball containing another open ball with different center. Proof Let x be a point in B(x1 , r1 ). Then ∥x − x1 ∥ < r1 ≤ r − ∥x1 − x0 ∥.
By triangle inequality, ∥x − x0∥ ≤ ∥x − x1∥ + ∥x1 − x0∥ < r. Therefore, x is a point in B(x0, r). This proves the assertion.
Now we define open sets in Rn.
Definition 1.23 Open Sets Let S be a subset of Rn. We say that S is an open set if for each x ∈ S, there is a ball B(x, r) centered at x that is contained in S.
The following example justifies that an open interval of the form (a, b) is an open set.
Example 1.16 Let S be the open interval S = (a, b) in R. If x ∈ S, then a < x < b. Hence, x − a and b − x are positive. Let r = min{x − a, b − x}. Then r > 0, r ≤ x − a and r ≤ b − x. These imply that a ≤ x − r < x + r ≤ b. Hence, B(x, r) = (x − r, x + r) ⊂ (a, b) = S. This shows that the interval (a, b) is an open set.
Figure 1.10: The interval (a, b) is an open set. The following example justifies that an open ball is indeed an open set. Example 1.17 Let S = B(x0 , r) be the open ball with center at x0 and radius r > 0 in Rn . Show that S is an open set.
Solution Given x ∈ S, d = ∥x − x0∥ < r. Let r1 = r − d. Then r1 > 0. Lemma 1.23 implies that the ball B(x, r1) is inside S. Hence, S is an open set.
Example 1.18 As subsets of Rn, ∅ and Rn are open sets.
Example 1.19 A one-point set S = {a} in Rn cannot be open, for there is no r > 0 such that B(a, r) is contained in S.
Let us look at some other examples of open sets.
Definition 1.24 Open Rectangles A set of the form
\[ U = \prod_{i=1}^{n} (a_i, b_i) = (a_1, b_1) \times \cdots \times (a_n, b_n) \]
in Rn, which is a cartesian product of open bounded intervals, is called an open rectangle.
Figure 1.11: A rectangle in R2 .
Example 1.20 Let $U = \prod_{i=1}^{n} (a_i, b_i)$ be an open rectangle in Rn. Show that U is an open set.
Solution Let x = (x1, . . . , xn) be a point in U. Then for 1 ≤ i ≤ n, ri = min{xi − ai, bi − xi} > 0 and (xi − ri, xi + ri) ⊂ (ai, bi). Let r = min{r1, . . . , rn}. Then r > 0. We claim that B(x, r) is contained in U. If y ∈ B(x, r), then ∥y − x∥ < r. This implies that
\[ |y_i - x_i| \le \|y - x\| < r \le r_i \quad \text{for all } 1 \le i \le n. \]
Hence,
\[ y_i \in (x_i - r_i, x_i + r_i) \subset (a_i, b_i) \quad \text{for all } 1 \le i \le n. \]
This proves that y ∈ U, and thus, completes the proof that B(x, r) is contained in U. Therefore, U is an open set.
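The choice of radius in Example 1.20 can be tested empirically. The following Python sketch computes r = min{r1, . . . , rn} for a sample point of a sample open rectangle and checks that randomly drawn points of B(x, r) all lie in the rectangle. The rectangle, the point, the sample size and the helper names are assumptions made for illustration, and a random test of this kind supports the claim but does not replace the proof.

```python
import math
import random

def ball_radius_in_rectangle(x, a, b):
    """Return r = min_i min(x_i - a_i, b_i - x_i) for x in prod (a_i, b_i)."""
    return min(min(xi - ai, bi - xi) for xi, ai, bi in zip(x, a, b))

def in_rectangle(y, a, b):
    """Check a_i < y_i < b_i for every coordinate."""
    return all(ai < yi < bi for yi, ai, bi in zip(y, a, b))

def random_point_in_ball(x, r, rng):
    """Sample y with ||y - x|| < r by rejection from the surrounding cube."""
    while True:
        y = [xi + rng.uniform(-r, r) for xi in x]
        if math.dist(x, y) < r:
            return y

# Hypothetical open rectangle (0, 3) x (-1, 1) x (2, 5) and a point inside it.
a, b = (0.0, -1.0, 2.0), (3.0, 1.0, 5.0)
x = (0.5, 0.25, 4.0)

r = ball_radius_in_rectangle(x, a, b)   # min(0.5, 0.75, 1.0)
rng = random.Random(0)                  # fixed seed so the run is repeatable
samples = [random_point_in_ball(x, r, rng) for _ in range(1000)]
print(r, all(in_rectangle(y, a, b) for y in samples))
```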
Figure 1.12: An open rectangle is an open set.
Next, we define closed sets. The definition is a straightforward generalization of the n = 1 case.
Definition 1.25 Closed Sets Let S be a subset of Rn. We say that S is closed in Rn provided that if {ak} is a sequence of points in S that converges to the point a, the point a is also in S.
Example 1.21 As subsets of Rn, ∅ and Rn are closed sets. Since ∅ and Rn are also open, a subset S of Rn can be both open and closed.
Example 1.22 Let S = {a} be a one-point set in Rn. A sequence {ak} in S is just the constant sequence where ak = a for all k ∈ Z+. Hence, it converges to a which is in S. Thus, a one-point set S is a closed set.
In volume I, we have proved the following.
Proposition 1.24 Let I be an interval of the form (−∞, a], [a, ∞) or [a, b]. Then I is a closed subset of R.
Definition 1.26 Closed Rectangles A set of the form
\[ R = \prod_{i=1}^{n} [a_i, b_i] = [a_1, b_1] \times \cdots \times [a_n, b_n] \]
in Rn, which is a cartesian product of closed and bounded intervals, is called a closed rectangle.
The following justifies that a closed rectangle is indeed a closed set.
Example 1.23 Let
\[ R = \prod_{i=1}^{n} [a_i, b_i] = [a_1, b_1] \times \cdots \times [a_n, b_n] \]
be a closed rectangle in Rn. Show that R is a closed set.
Solution Let {ak} be a sequence in R that converges to a point a. For each 1 ≤ i ≤ n, {πi(ak)} is a sequence in [ai, bi] that converges to πi(a). Since [ai, bi] is a closed set in R, πi(a) ∈ [ai, bi]. Hence, a is in R. This proves that R is a closed set.
It is not true that a set that is not open is closed.
Example 1.24 Show that an interval of the form I = (a, b] in R is neither open nor closed.
Solution If I is open, since b is in I, there is an r > 0 such that (b − r, b + r) = B(b, r) ⊂ I. But then b + r/2 is a point in (b − r, b + r) but not in I = (a, b], which gives a contradiction. Hence, I is not open. For k ∈ Z+, let
\[ a_k = a + \frac{b - a}{k}. \]
Then {ak} is a sequence in I that converges to a, but a is not in I. Hence, I is not closed.
Thus, we have seen that a subset S of Rn can be both open and closed, and it can also be neither open nor closed. Let us look at some other examples of closed sets.
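The sequence used in Example 1.24 to show that (a, b] is not closed can be inspected numerically. The sketch below takes the illustrative endpoints a = 0 and b = 1 (an assumption of this sketch) and confirms that the listed terms lie in (a, b] and approach a, while a itself is not in the interval.

```python
def a_k(a, b, k):
    """k-th term a + (b - a)/k of the sequence in Example 1.24."""
    return a + (b - a) / k

a, b = 0.0, 1.0   # illustrative endpoints
terms = [a_k(a, b, k) for k in (1, 10, 100, 1000)]

print(terms)                             # terms approach a = 0
print(all(a < t <= b for t in terms))    # every listed term lies in (a, b]
print(a < a <= b)                        # the limit a does not lie in (a, b]
```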
Definition 1.27 Closed Balls Given x0 in Rn and r > 0, a closed ball of radius r with center at x0 is a subset of Rn of the form CB(x0, r) = {x ∈ Rn | ∥x − x0∥ ≤ r}. It consists of all points of Rn whose distance to the center x0 is less than or equal to r.
The following justifies that a closed ball is indeed a closed set.
Example 1.25 Given x0 ∈ Rn and r > 0, show that the closed ball CB(x0, r) = {x ∈ Rn | ∥x − x0∥ ≤ r} is a closed set.
Solution Let {ak} be a sequence in CB(x0, r) that converges to the point a. Then
\[ \lim_{k\to\infty} \|a_k - a\| = 0. \]
For each k ∈ Z+ , ∥ak − x0 ∥ ≤ r. By triangle inequality, ∥a − x0 ∥ ≤ ∥ak − x0 ∥ + ∥ak − a∥ ≤ r + ∥ak − a∥. Taking the k → ∞ limit, we find that ∥a − x0 ∥ ≤ r. Hence, a is in CB(x0 , r). This proves that CB(x0 , r) is a closed set. The following theorem gives the relation between open and closed sets.
Theorem 1.25 Let S be a subset of Rn and let A = Rn \ S be its complement in Rn. Then S is open if and only if A is closed.
Proof Assume that S is open. Let {ak} be a sequence in A that converges to the point a. We want to show that a is in A. Assume to the contrary that a is not in A. Then a is in S. Since S is open, there is an r > 0 such that B(a, r) is contained in S. Since the sequence {ak} converges to a, there is a positive integer K such that for all k ≥ K, ∥ak − a∥ < r. But then this implies that aK ∈ B(a, r) ⊂ S. This contradicts the fact that aK is in A = Rn \ S. Hence, a must be in A, which proves that A is closed. Conversely, assume that A is closed. We want to show that S is open. Assume to the contrary that S is not open. Then there is a point a in S such that for every r > 0, B(a, r) is not contained in S. For every k ∈ Z+, since B(a, 1/k) is not contained in S, there is a point ak in B(a, 1/k) such that ak is not in S. Thus, {ak} is a sequence in A and ∥ak − a∥ < 1/k for all k ∈ Z+. Hence, {ak} converges to a. Since A is closed, a is in A, which contradicts the fact that a is in S. Therefore, S is open.
Theorem 1.26 1. Arbitrary union of open sets is open. Namely, if {Uα | α ∈ J} is a collection of open sets in Rn, then their union $U = \bigcup_{\alpha \in J} U_\alpha$ is also an open set. 2. Finite intersection of open sets is open. Namely, if V1, . . . , Vk are open sets in Rn, then their intersection $V = \bigcap_{i=1}^{k} V_i$ is also an open set.
Proof For the first statement, let x be a point in U. Then there is an α ∈ J such that x is in Uα. Since Uα is open, there is an r > 0 such that B(x, r) ⊂ Uα ⊂ U. Hence, U is open. For the second statement, let x be a point in $V = \bigcap_{i=1}^{k} V_i$. Then for each 1 ≤ i ≤ k, x is in the open set Vi. Hence, there is an ri > 0 such that B(x, ri) ⊂ Vi. Let r = min{r1, . . . , rk}. Then for 1 ≤ i ≤ k, r ≤ ri and so B(x, r) ⊂ B(x, ri) ⊂ Vi. Hence, B(x, r) ⊂ V. This proves that V is open.
As an application of this theorem, let us show that any open interval in R is indeed an open set.
Proposition 1.27 Let I be an interval of the form (−∞, a), (a, ∞) or (a, b). Then I is an open subset of R.
Proof We have shown in Example 1.16 that if I is an interval of the form (a, b), then I is an open subset of R. Now
\[ (a, \infty) = \bigcup_{k=1}^{\infty} (a, a + k) \]
is a union of open sets. Hence, (a, ∞) is open. In the same way, one can show that an interval of the form (−∞, a) is open.
The next example shows that an arbitrary intersection of open sets is not necessarily open.
Example 1.26 For k ∈ Z+, let Uk be the open set in R given by
\[ U_k = \left( -\frac{1}{k}, \frac{1}{k} \right). \]
Notice that the set
\[ U = \bigcap_{k=1}^{\infty} U_k = \{0\} \]
is a one-point set. Hence, it is not open in R.
De Morgan's law in set theory says that if {Uα | α ∈ J} is a collection of sets in Rn, then
\[ \mathbb{R}^n \setminus \bigcup_{\alpha \in J} U_\alpha = \bigcap_{\alpha \in J} \left( \mathbb{R}^n \setminus U_\alpha \right), \qquad \mathbb{R}^n \setminus \bigcap_{\alpha \in J} U_\alpha = \bigcup_{\alpha \in J} \left( \mathbb{R}^n \setminus U_\alpha \right). \]
Thus, we obtain the counterpart of Theorem 1.26 for closed sets.
Theorem 1.28 1. Arbitrary intersection of closed sets is closed. Namely, if {Aα | α ∈ J} is a collection of closed sets in Rn, then their intersection $A = \bigcap_{\alpha \in J} A_\alpha$ is also a closed set. 2. Finite union of closed sets is closed. Namely, if C1, . . . , Ck are closed sets in Rn, then their union $C = \bigcup_{i=1}^{k} C_i$ is also a closed set.
Proof We prove the first statement. The proof of the second statement is similar. Given that {Aα | α ∈ J} is a collection of closed sets in Rn, for each α ∈ J, let Uα = Rn \ Aα. Then {Uα | α ∈ J} is a collection of open sets in Rn. By Theorem 1.26, the set $\bigcup_{\alpha \in J} U_\alpha$ is open. By Theorem 1.25, $\mathbb{R}^n \setminus \bigcup_{\alpha \in J} U_\alpha$ is closed. By De Morgan's law,
\[ \mathbb{R}^n \setminus \bigcup_{\alpha \in J} U_\alpha = \bigcap_{\alpha \in J} \left( \mathbb{R}^n \setminus U_\alpha \right) = \bigcap_{\alpha \in J} A_\alpha. \]
This proves that $\bigcap_{\alpha \in J} A_\alpha$ is a closed set.
The following example says that any finite point set is a closed set.
Example 1.27 Let S = {x1, . . . , xk} be a finite point set in Rn. Then $S = \bigcup_{i=1}^{k} \{x_i\}$ is a finite union of one-point sets. Since a one-point set is closed, S is closed.
Exercises 1.3 Question 1 Let A be the subset of R2 given by A = {(x, y) | x > 0, y > 0} . Show that A is an open set. Question 2 Let A be the subset of R2 given by A = {(x, y) | x ≥ 0, y ≥ 0} . Show that A is a closed set. Question 3 Let A be the subset of R2 given by A = {(x, y) | x > 0, y ≥ 0} . Is A open? Is A closed? Justify your answers. Question 4 Let C and U be subsets of Rn . Assume that C is closed and U is open, show that U \ C is open and C \ U is closed. Question 5 Let A be a subset of Rn , and let B = A + u be the translate of A by the vector u. (a) Show that A is open if and only if B is open. (b) Show that A is closed if and only if B is closed.
1.4 Interior, Exterior, Boundary and Closure
First, we introduce the interior of a set. Definition 1.28 Interior Let S be a subset of Rn . We say that x ∈ Rn is an interior point of S if there exists r > 0 such that B(x, r) ⊂ S. The interior of S, denoted by int S, is defined to be the collection of all the interior points of S.
Figure 1.14: The interior point of a set.
The following gives a characterization of the interior of a set.
Theorem 1.29 Let S be a subset of Rn. Then we have the following.
1. int S is a subset of S.
2. int S is an open set.
3. S is an open set if and only if S = int S.
4. If U is an open set that is contained in S, then U ⊂ int S.
These imply that int S is the largest open set that is contained in S.
Proof Let x be a point in int S. By definition, there exists r > 0 such that B(x, r) ⊂ S. Since x ∈ B(x, r) and B(x, r) ⊂ S, x is a point in S. Since we have shown that every point in int S is in S, int S is a subset of S. If y ∈ B(x, r), Lemma 1.23 says that there is an r′ > 0 such that B(y, r′ ) ⊂ B(x, r) ⊂ S. Hence, y is also in int S. This proves that B(x, r) is contained in int S. Since we have shown that for any x ∈ int S, there is an r > 0 such that B(x, r) is contained in int S, this shows that int S is open. If S = int S, S is open. Conversely, if S is open, for every x in S, there is an r > 0 such that B(x, r) ⊂ S. Then x is in int S. Hence, S ⊂ int S. Since we have shown that int S ⊂ S is always true, we conclude that if S is open, S = int S. If U is a subset of S and U is open, for every x in U , there is an r > 0 such that B(x, r) ⊂ U . But then B(x, r) ⊂ S. This shows that x is in int S. Since every point of U is in int S, this proves that U ⊂ int S. Example 1.28 Find the interior of each of the following subsets of R. (a) A = (a, b)
(b) B = (a, b]
(c) C = [a, b]
(d) Q Solution
(a) Since A is an open set, int A = A = (a, b). (b) Since A is an open set that is contained in B, A = (a, b) is contained in int B. Since int B ⊂ B, we are only left to determine whether b is in int B. The same argument as given in Example 1.24 shows that b is not an interior point of B. Hence, int B = A = (a, b).
(c) Similar arguments as given in (b) show that A ⊂ int C, and both a and b are not interior points of C. Hence, int C = A = (a, b). (d) For any x ∈ R and any r > 0, B(x, r) = (x − r, x + r) contains an irrational number. Hence, B(x, r) is not contained in Q. This shows that Q does not have interior points. Hence, int Q = ∅.
Definition 1.29 Neighbourhoods Let x be a point in Rn and let U be a subset of Rn. We say that U is a neighbourhood of x if U is an open set that contains x.
Notice that this definition is slightly different from the one we use in volume I for the n = 1 case.
Neighbourhoods By definition, if U is a neighbourhood of x, then x is an interior point of U, and there is an r > 0 such that B(x, r) ⊂ U.
Example 1.29 Consider the point x = (1, 2) and the sets
\[ U = \left\{ (x_1, x_2) \mid x_1^2 + x_2^2 < 9 \right\}, \qquad V = \left\{ (x_1, x_2) \mid 0 < x_1 < 2,\ -1 < x_2 < 3 \right\} \]
in R2. The sets U and V are neighbourhoods of x.
Next, we introduce the exterior and boundary of a set.
Definition 1.30 Exterior Let S be a subset of Rn. We say that x ∈ Rn is an exterior point of S if there exists r > 0 such that B(x, r) ⊂ Rn \ S. The exterior of S, denoted by ext S, is defined to be the collection of all the exterior points of S.
Figure 1.15: The sets U and V are neighbourhoods of the point x. Definition 1.31 Boundary Let S be a subset of Rn . We say that x ∈ Rn is a boundary point of S if for every r > 0, the ball B(x, r) intersects both S and Rn \ S. The boundary of S, denoted by bd S or ∂S, is defined to be the collection of all the boundary points of S.
Figure 1.16: P is an interior point, Q is an exterior point, E is a boundary point.
Theorem 1.30 Let S be a subset of Rn. We have the following.
(a) ext (S) = int (Rn \ S).
(b) bd (S) = bd (Rn \ S).
(c) int S, ext S and bd S are mutually disjoint sets.
(d) Rn = int S ∪ ext S ∪ bd S.
Proof (a) and (b) are obvious from definitions. For parts (c) and (d), we notice that for a point x ∈ Rn, exactly one of the following three statements holds. (i) There exists r > 0 such that B(x, r) ⊂ S. (ii) There exists r > 0 such that B(x, r) ⊂ Rn \ S. (iii) For every r > 0, B(x, r) intersects both S and Rn \ S. Thus, int S, ext S and bd S are mutually disjoint sets, and their union is Rn.
Example 1.30 Find the exterior and boundary of each of the following subsets of R. (a) A = (a, b)
(b) B = (a, b]
(c) C = [a, b]
(d) Q
Solution We have seen in Example 1.28 that int A = int B = int C = (a, b).
For any r > 0, the ball B(a, r) = (a − r, a + r) contains a point less than a, and a point larger than a. Hence, a is a boundary point of the sets A, B and C. Similarly, b is a boundary point of the sets A, B and C. For every point x which satisfies x < a, let r = a − x. Then r > 0. Since x+r = a, the ball B(x, r) = (x−r, x+r) is contained in (−∞, a). Hence, x is an exterior point of the sets A, B and C. Similarly every point x such that x > b is an exterior point of the sets A, B and C. Since the interior, exterior and boundary of a set in R are three mutually disjoint sets whose union is R, we conclude that bd A = bd B = bd C = {a, b}, ext A = ext B = ext C = (−∞, a) ∪ (b, ∞). For every x ∈ R and every r > 0, the ball B(x, r) = (x − r, x + r) contains a point in Q and a point not in Q. Therefore, x is a boundary point of Q. This shows that bd Q = R, and thus, ext Q = ∅. Example 1.31 Let A = B(x0 , r), where x0 is a point in Rn , and r is a positive number. Find the interior, exterior and boundary of A. Solution We have shown that A is open. Hence, int A = A. Let U = {x ∈ Rn | ∥x − x0 ∥ > r} ,
C = {x ∈ Rn | ∥x − x0 ∥ = r} .
Notice that A, U and C are mutually disjoint sets whose union is Rn. If x is in U, d = ∥x − x0∥ > r. Let r′ = d − r. Then r′ > 0. If y ∈ B(x, r′), then ∥y − x∥ < r′. It follows that ∥y − x0∥ ≥ ∥x − x0∥ − ∥y − x∥ > d − r′ = r. This proves that y ∈ U. Hence, B(x, r′) ⊂ U ⊂ Rn \ A, which shows that x is an exterior point of A. Thus, U ⊂ ext A.
Now if x ∈ C, ∥x − x0∥ = r. For every r′ > 0, let
\[ a = \frac{1}{2} \min\{r'/r, 1\}. \]
Then $a \le \frac{1}{2}$ and $a \le \frac{r'}{2r}$. Consider the point
\[ v = x - a(x - x_0). \]
Notice that
\[ \|v - x\| = ar \le \frac{r'}{2} < r'. \]
Thus, v is in B(x, r′). On the other hand, ∥v − x0∥ = (1 − a)r < r. Thus, v is in A. This shows that B(x, r′) intersects A. Since x is in B(x, r′) but not in A, we find that B(x, r′) intersects Rn \ A. Hence, x is a boundary point of A. This shows that C ⊂ bd A. Since int A, ext A and bd A are mutually disjoint sets whose union is Rn, we conclude that int A = A, ext A = U and bd A = C.
Now we introduce the closure of a set.
Definition 1.32 Closure Let S be a subset of Rn. The closure of S, denoted by $\overline{S}$, is defined as $\overline{S} = \text{int}\, S \cup \text{bd}\, S$.
Example 1.32 Example 1.31 shows that the closure of the open ball B(x0, r) is the closed ball CB(x0, r).
Example 1.33 Consider the sets A = (a, b), B = (a, b] and C = [a, b] in Example 1.28 and Example 1.30. We have shown that int A = int B = int C = (a, b), and bd A = bd B = bd C = {a, b}. Therefore, $\overline{A} = \overline{B} = \overline{C} = [a, b]$.
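Returning to Example 1.31, the construction of the point v = x − a(x − x0) that witnesses a sphere point being a boundary point can be checked numerically. In the Python sketch below, the center, the radius, the sphere point and the list of test radii r′ are all illustrative assumptions; the function name is likewise hypothetical.

```python
import math

def witness_inside_ball(x, x0, r, r_prime):
    """Return v = x - a (x - x0) with a = (1/2) min(r'/r, 1), as in Example 1.31."""
    a = 0.5 * min(r_prime / r, 1.0)
    return [xi - a * (xi - ci) for xi, ci in zip(x, x0)]

# Illustrative ball B(x0, r) in R^2 and a point x with ||x - x0|| = r.
x0, r = (1.0, -2.0), 2.0
x = (x0[0] + r * math.cos(0.7), x0[1] + r * math.sin(0.7))

for r_prime in (5.0, 1.0, 1e-3, 1e-6):
    v = witness_inside_ball(x, x0, r, r_prime)
    # v should lie in B(x, r') and in the open ball B(x0, r).
    print(r_prime, math.dist(v, x) < r_prime, math.dist(v, x0) < r)
```

For each r′ tried, v lies in both B(x, r′) and B(x0, r), so B(x, r′) meets the open ball, as the argument in the example asserts.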
Since Rn is a disjoint union of int S, bd S and ext S, we obtain the following immediately from the definition.
Theorem 1.31 Let S be a subset of Rn. Then $\overline{S}$ and ext S are complements of each other in Rn.
The following theorem gives a characterization of the closure of a set.
Theorem 1.32 Let S be a subset of Rn, and let x be a point in Rn. The following statements are equivalent.
(a) $x \in \overline{S}$.
(b) For every r > 0, B(x, r) intersects S.
(c) There is a sequence {xk} in S that converges to x.
Proof If x is in $\overline{S}$, x is not in ext S = int (Rn \ S). Thus, for every r > 0, B(x, r) is not contained in Rn \ S. Then it must intersect S. This proves (a) implies (b). If (b) holds, for every k ∈ Z+, take r = 1/k. The ball B(x, 1/k) intersects S at some point xk. This gives a sequence {xk} in S satisfying ∥xk − x∥ < 1/k for all k ∈ Z+. Hence, {xk} converges to x. This proves (b) implies (c). If (c) holds, there is a sequence {xk} in S that converges to x. For every r > 0, there is a positive integer K such that for all k ≥ K, ∥xk − x∥ < r, and thus xk ∈ B(x, r). This shows that B(x, r) is not contained in Rn \ S. Hence, x ∉ ext S, and thus we must have $x \in \overline{S}$. This proves (c) implies (a).
The following theorem gives further properties of the closure of a set.
Theorem 1.33 Let S be a subset of Rn.
1. $\overline{S}$ is a closed set that contains S.
2. S is closed if and only if $S = \overline{S}$.
3. If C is a closed subset of Rn and S ⊂ C, then $\overline{S} \subset C$.
These imply that $\overline{S}$ is the smallest closed set that contains S.
Proof These statements are counterparts of the statements in Theorem 1.29. Since ext S = int (Rn \ S), and the interior of a set is open, ext S is open. Since $\overline{S} = \mathbb{R}^n \setminus \text{ext}\, S$, $\overline{S}$ is a closed set. Since ext S ⊂ Rn \ S, we find that $\overline{S} = \mathbb{R}^n \setminus \text{ext}\, S \supset S$. If $S = \overline{S}$, then S must be closed since $\overline{S}$ is closed. Conversely, if S is closed, Rn \ S is open, and so ext S = int (Rn \ S) = Rn \ S. It follows that $\overline{S} = \mathbb{R}^n \setminus \text{ext}\, S = S$. If C is a closed set that contains S, then Rn \ C is an open set that is contained in Rn \ S. Thus, Rn \ C ⊂ int (Rn \ S) = ext S. This shows that $C \supset \mathbb{R}^n \setminus \text{ext}\, S = \overline{S}$.
Corollary 1.34 If S is a subset of Rn, $\overline{S} = S \cup \text{bd}\, S$.
Proof Since int S ⊂ S, $\overline{S} = \text{int}\, S \cup \text{bd}\, S \subset S \cup \text{bd}\, S$. Since S and bd S are both subsets of $\overline{S}$, $S \cup \text{bd}\, S \subset \overline{S}$. This proves that $\overline{S} = S \cup \text{bd}\, S$.
Example 1.34 Let U be the open rectangle $U = \prod_{i=1}^{n} (a_i, b_i)$ in Rn. Show that the closure of U is the closed rectangle $R = \prod_{i=1}^{n} [a_i, b_i]$.
Solution Since R is a closed set that contains U, $\overline{U} \subset R$. If x = (x1, . . . , xn) is a point in R, then xi ∈ [ai, bi] for each 1 ≤ i ≤ n. Since [ai, bi] is the closure of (ai, bi) in R, there is a sequence $\{x_{i,k}\}_{k=1}^{\infty}$ in (ai, bi) that converges to xi. For k ∈ Z+, let $x_k = (x_{1,k}, \ldots, x_{n,k})$. Then {xk} is a sequence in U that converges to x. This shows that $x \in \overline{U}$, and thus completes the proof that $\overline{U} = R$.
The proof of the following theorem shows the usefulness of the characterization of int S as the largest open set that is contained in S, and of $\overline{S}$ as the smallest closed set that contains S.
Theorem 1.35 If A and B are subsets of Rn such that A ⊂ B, then (a) int A ⊂ int B; and (b) $\overline{A} \subset \overline{B}$.
Proof Since int A is an open set that is contained in A, it is an open set that is contained in B. By the fourth statement in Theorem 1.29, int A ⊂ int B. Since $\overline{B}$ is a closed set that contains B, it is a closed set that contains A. By the third statement in Theorem 1.33, $\overline{A} \subset \overline{B}$.
Notice that as subsets of R, (a, b) ⊂ (a, b] ⊂ [a, b]. We have shown that $\overline{(a, b)} = \overline{(a, b]} = [a, b]$. In general, we have the following.
Theorem 1.36 If A and B are subsets of Rn such that $A \subset B \subset \overline{A}$, then $\overline{A} = \overline{B}$.
Proof By Theorem 1.35, A ⊂ B implies that $\overline{A} \subset \overline{B}$, while $B \subset \overline{A}$ implies that $\overline{B}$ is contained in $\overline{\overline{A}} = \overline{A}$. Thus, we have $\overline{A} \subset \overline{B} \subset \overline{A}$, which proves that $\overline{B} = \overline{A}$.
Example 1.35 In general, if S is a subset of Rn, it is not necessarily true that $\text{int}\, \overline{S} = \text{int}\, S$, even when S is an open set. For example, take S = (−1, 0) ∪ (0, 1) in R. Then S is an open set and $\overline{S} = [-1, 1]$. Notice that int S = S = (−1, 0) ∪ (0, 1), but $\text{int}\, \overline{S} = (-1, 1)$.
Exercises 1.4
Question 1 Let S be a subset of Rn. Show that bd S is a closed set.
Question 2 Let A be the subset of R2 given by A = {(x, y) | x < 0, y ≥ 0}. Find the interior, exterior, boundary and closure of A.
Question 3 Let x0 be a point in Rn, and let r be a positive number. Consider the subset of Rn given by A = {x ∈ Rn | 0 < ∥x − x0∥ ≤ r}. Find the interior, exterior, boundary and closure of A.
Question 4 Let A be the subset of R2 given by A = {(x, y) | 1 ≤ x < 3, −2 < y ≤ 5} ∪ {(0, 0), (2, −3)}. Find the interior, exterior, boundary and closure of A.
Question 5 Let S be a subset of Rn. Show that $\text{bd}\, S = \overline{S} \cap \overline{\mathbb{R}^n \setminus S}$.
Question 6 Let S be a subset of Rn. Show that $\text{bd}\, \overline{S} \subset \text{bd}\, S$. Give an example where $\text{bd}\, \overline{S} \ne \text{bd}\, S$.
Question 7 Let S be a subset of Rn. (a) Show that S is open if and only if S does not contain any of its boundary points. (b) Show that S is closed if and only if S contains all its boundary points.
Question 8 Let S be a subset of Rn, and let x be a point in Rn. (a) Show that x is an interior point of S if and only if there is a neighbourhood of x that is contained in S. (b) Show that $x \in \overline{S}$ if and only if every neighbourhood of x intersects S. (c) Show that x is a boundary point of S if and only if every neighbourhood of x contains a point in S and a point not in S.
Question 9 Let S be a subset of Rn, and let x = (x1, . . . , xn) be a point in the interior of S.
(a) Show that there is an r1 > 0 such that CB(x, r1) ⊂ S.
(b) Show that there is an r2 > 0 such that $\prod_{i=1}^{n} (x_i - r_2, x_i + r_2) \subset S$.
(c) Show that there is an r3 > 0 such that $\prod_{i=1}^{n} [x_i - r_3, x_i + r_3] \subset S$.
1.5 Limit Points and Isolated Points
In this section, we generalize the concepts of limit points and isolated points to subsets of Rn.
Definition 1.33 Limit Points Let S be a subset of Rn. A point x in Rn is a limit point of S provided that there is a sequence {xk} in S \ {x} that converges to x. The set of limit points of S is denoted by S′.
By Theorem 1.32, we obtain the following immediately.
Theorem 1.37 Let S be a subset of Rn, and let x be a point in Rn. The following are equivalent.
(a) x is a limit point of S.
(b) x is in $\overline{S \setminus \{x\}}$.
(c) For every r > 0, B(x, r) intersects S at a point other than x.
Corollary 1.38 If S is a subset of Rn, then $S' \subset \overline{S}$.
Proof If x ∈ S′, $x \in \overline{S \setminus \{x\}}$. Since S \ {x} ⊂ S, we have $\overline{S \setminus \{x\}} \subset \overline{S}$. Therefore, $x \in \overline{S}$.
The following theorem says that the closure of a set is the union of the set with all its limit points.
Theorem 1.39 If S is a subset of Rn, then $\overline{S} = S \cup S'$.
Proof By Corollary 1.38, $S' \subset \overline{S}$. Since we also have $S \subset \overline{S}$, we find that $S \cup S' \subset \overline{S}$. Conversely, if $x \in \overline{S}$, then by Theorem 1.32, there is a sequence {xk} in S that converges to x. If x is not in S, then the sequence {xk} is in S \ {x}. In this case, x is a limit point of S. This shows that $\overline{S} \setminus S \subset S'$, and hence, $\overline{S} \subset S \cup S'$.
In the proof above, we have shown the following.
Corollary 1.40 Let S be a subset of Rn. Every point in $\overline{S}$ that is not in S is a limit point of S. Namely, $\overline{S} \setminus S \subset S'$.
Now we introduce the definition of isolated points.
Definition 1.34 Isolated Points Let S be a subset of Rn. A point x in Rn is an isolated point of S if (a) x is in S; (b) x is not a limit point of S.
Remark 1.1 By definition, a point x in S is either an isolated point of S or a limit point of S.
Theorem 1.37 gives the following immediately.
Theorem 1.41 Let S be a subset of Rn and let x be a point in S. Then x is an isolated point of S if and only if there is an r > 0 such that the ball B(x, r) does not contain other points of S except the point x.
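Example 1.36
Find the set of limit points and isolated points of the set $A = \mathbb{Z}^2$ as a subset of $\mathbb{R}^2$.

Solution
If $\{\mathbf{x}_k\}$ is a sequence in $A$ that converges to a point $\mathbf{x}$, then there is a positive integer $K$ such that for all $l \geq k \geq K$, $\|\mathbf{x}_l - \mathbf{x}_k\| < 1$. Since two distinct points of $\mathbb{Z}^2$ are at distance at least 1 from each other, this implies that $\mathbf{x}_k = \mathbf{x}_K$ for all $k \geq K$. Hence, $\mathbf{x} = \mathbf{x}_K \in A$. This shows that $A$ is closed. Hence, $\overline{A} = A$. Therefore, $A' \subset A$. For every $\mathbf{x} = (k, l) \in \mathbb{Z}^2$, $B(\mathbf{x}, 1)$ intersects $A$ only at the point $\mathbf{x}$ itself. Hence, $\mathbf{x}$ is an isolated point of $A$. This shows that every point of $A$ is an isolated point. Since $A' \subset A$, we must have $A' = \emptyset$.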
Figure 1.17: The set Z2 does not have limit points. Let us prove the following useful fact. Theorem 1.42 If S is a subset of Rn , every interior point of S is a limit point of S.
Proof
If $\mathbf{x}$ is an interior point of $S$, there exists $r_0 > 0$ such that $B(\mathbf{x}, r_0) \subset S$. Given $r > 0$, let $r' = \frac{1}{2}\min\{r, r_0\}$. Then $r' > 0$. Since $r' < r$ and $r' < r_0$, the point $\mathbf{x}' = \mathbf{x} + r'\mathbf{e}_1$ is in $B(\mathbf{x}, r)$ and $S$. Obviously, $\mathbf{x}' \neq \mathbf{x}$. Therefore, for every $r > 0$, $B(\mathbf{x}, r)$ intersects $S$ at a point other than $\mathbf{x}$. This proves that $\mathbf{x}$ is a limit point of $S$.

Since $S \subset \text{int}\, S \cup \text{bd}\, S$, and $\text{int}\, S$ and $\text{bd}\, S$ are disjoint, we deduce the following.

Corollary 1.43
Let $S$ be a subset of $\mathbb{R}^n$. An isolated point of $S$ must be a boundary point.

Since every point in an open set $S$ is an interior point of $S$, we obtain the following.

Corollary 1.44
If $S$ is an open subset of $\mathbb{R}^n$, every point of $S$ is a limit point. Namely, $S \subset S'$.

Example 1.37
If $I$ is an interval of the form $(a, b)$, $(a, b]$, $[a, b)$ or $[a, b]$ in $\mathbb{R}$, then $\text{bd}\, I = \{a, b\}$. It is easy to check that $a$ and $b$ are not isolated points of $I$. Hence, $I$ has no isolated points. Since $\overline{I} = I \cup I'$ and $I \subset I'$, we find that $I' = \overline{I} = [a, b]$.

In fact, we can prove a general theorem.

Theorem 1.45
Let $A$ and $B$ be subsets of $\mathbb{R}^n$ such that $A$ is open and $A \subset B \subset \overline{A}$. Then $B' = \overline{A}$. In particular, the set of limit points of $A$ is $\overline{A}$.
Proof
By Theorem 1.36, $\overline{A} = \overline{B}$. Since $A$ is open, $A \subset A'$. Since $\overline{A} = A \cup A'$, we find that $\overline{A} = A'$. In the exercises, one is asked to show that $A \subset B$ implies $A' \subset B'$. Therefore, $\overline{A} = A' \subset B' \subset \overline{B}$. Since $\overline{A} = \overline{B}$, we must have $B' = \overline{B} = \overline{A}$.

Example 1.38
Let $A$ be the subset of $\mathbb{R}^2$ given by
$$A = [-1, 1] \times (-2, 2] = \{(x, y) \mid -1 \leq x \leq 1,\ -2 < y \leq 2\}.$$
Since $U = (-1, 1) \times (-2, 2)$ is open, $\overline{U} = [-1, 1] \times [-2, 2]$, and $U \subset A \subset \overline{U}$, the set of limit points of $A$ is $\overline{U} = [-1, 1] \times [-2, 2]$.
Exercises 1.5

Question 1
Let A and B be subsets of Rn such that A ⊂ B. Show that A′ ⊂ B′.

Question 2
Let x0 be a point in Rn and let r be a positive number. Find the set of limit points of the open ball B(x0, r).

Question 3
Let A be the subset of R2 given by A = {(x, y) | x < 0, y ≥ 0}. Find the set of limit points of A.

Question 4
Let x0 be a point in Rn, and let r be a positive number. Consider the subset of Rn given by A = {x ∈ Rn | 0 < ∥x − x0∥ ≤ r}.
(a) Find the set of limit points of A.
(b) Find the set of isolated points of the set S = Rn \ A.

Question 5
Let A be the subset of R2 given by A = {(x, y) | 1 ≤ x < 3, −2 < y ≤ 5} ∪ {(0, 0), (2, −3)}. Determine the set of isolated points and the set of limit points of A.

Question 6
Let A = Q2 as a subset of R2.
(a) Find the interior, exterior, boundary and closure of A.
(b) Determine the set of isolated points and the set of limit points of A.

Question 7
Let S be a subset of Rn. Show that S is closed if and only if it contains all its limit points.

Question 8
Let S be a subset of Rn, and let x be a point in Rn. Show that x is a limit point of S if and only if every neighbourhood of x intersects S at a point other than x itself.

Question 9
Let x1, x2, . . ., xk be points in Rn and let A = Rn \ {x1, x2, . . . , xk}. Find the set of limit points of A.
Chapter 2. Limits of Multivariable Functions and Continuity
Chapter 2 Limits of Multivariable Functions and Continuity We are interested in functions F : D → Rm that are defined on subsets D of Rn , taking values in Rm . When n ≥ 2, these are called multivariable functions. When m ≥ 2, they are called vector-valued functions. When m = 1, we usually write the function as f : D → R.
2.1 Multivariable Functions
In this section, let us define some special classes of multivariable functions.

2.1.1 Polynomials and Rational Functions

A special class of functions is the set of polynomials in $n$ variables.

Definition 2.1 Polynomials
Let $\mathbf{k} = (k_1, \ldots, k_n)$ be an $n$-tuple of nonnegative integers. Associated to this $n$-tuple $\mathbf{k}$, there is a monomial $p_{\mathbf{k}} : \mathbb{R}^n \to \mathbb{R}$ of degree $|\mathbf{k}| = k_1 + \cdots + k_n$ of the form
$$p_{\mathbf{k}}(\mathbf{x}) = x_1^{k_1} \cdots x_n^{k_n}.$$
A polynomial in $n$ variables is a function $p : \mathbb{R}^n \to \mathbb{R}$ that is a finite linear combination of monomials in $n$ variables. It takes the form
$$p(\mathbf{x}) = \sum_{j=1}^{m} c_{\mathbf{k}_j}\, p_{\mathbf{k}_j}(\mathbf{x}),$$
where $\mathbf{k}_1, \mathbf{k}_2, \ldots, \mathbf{k}_m$ are distinct $n$-tuples of nonnegative integers, and $c_{\mathbf{k}_1}, c_{\mathbf{k}_2}, \ldots, c_{\mathbf{k}_m}$ are nonzero real numbers. The degree of the polynomial $p(\mathbf{x})$ is $\max\{|\mathbf{k}_1|, |\mathbf{k}_2|, \ldots, |\mathbf{k}_m|\}$.
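As an illustration of the definition, here is a minimal Python sketch that stores a polynomial as a table of exponent tuples and coefficients. The dictionary encoding, the function names `monomial`, `evaluate` and `degree`, and the sample polynomial are assumptions made for this example only.

```python
from math import prod

# Hypothetical encoding: a polynomial in n variables is a dict that maps each
# exponent tuple k = (k1, ..., kn) to its nonzero coefficient c_k.
# Sample polynomial in three variables: 4*x1^2*x2 - 3*x1*x3 + x1*x2*x3.
p = {(2, 1, 0): 4, (1, 0, 1): -3, (1, 1, 1): 1}

def monomial(k, x):
    """Value of the monomial x1^k1 * ... * xn^kn at the point x."""
    return prod(xi ** ki for xi, ki in zip(x, k))

def evaluate(poly, x):
    """Value of the finite linear combination of monomials at x."""
    return sum(c * monomial(k, x) for k, c in poly.items())

def degree(poly):
    """Largest |k| = k1 + ... + kn among the monomials that appear."""
    return max(sum(k) for k in poly)
```

With this encoding, the requirement that the exponent tuples be distinct is automatic, because dictionary keys are unique.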
Example 2.1
The following are examples of polynomials in three variables.
(a) $p(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2$
(b) $p(x_1, x_2, x_3) = 4x_1^2 x_2 - 3x_1 x_3 + x_1 x_2 x_3$

Example 2.2
The function $f : \mathbb{R}^n \to \mathbb{R}$,
$$f(\mathbf{x}) = \|\mathbf{x}\| = \sqrt{x_1^2 + \cdots + x_n^2}$$
is not a polynomial.

When the domain of a function is not specified, we always assume that the domain is the largest set on which the function can be defined.

Definition 2.2 Rational Functions
A rational function $f : D \to \mathbb{R}$ is the quotient of two polynomials $p : \mathbb{R}^n \to \mathbb{R}$ and $q : \mathbb{R}^n \to \mathbb{R}$. Namely,
$$f(\mathbf{x}) = \frac{p(\mathbf{x})}{q(\mathbf{x})}.$$
Its domain $D$ is the set
$$D = \{\mathbf{x} \in \mathbb{R}^n \mid q(\mathbf{x}) \neq 0\}.$$

Example 2.3
The function
$$f(x_1, x_2) = \frac{x_1 x_2 + 3x_1^2}{x_1 - x_2}$$
is a rational function defined on the set
$$D = \{(x_1, x_2) \in \mathbb{R}^2 \mid x_1 \neq x_2\}.$$
2.1.2 Component Functions of a Mapping
If the codomain $\mathbb{R}^m$ of the function $\mathbf{F} : D \to \mathbb{R}^m$ has dimension $m \geq 2$, we usually call the function a mapping. In this case, it is useful to consider the component functions. For $1 \leq j \leq m$, the projection function $\pi_j : \mathbb{R}^m \to \mathbb{R}$ is the function $\pi_j(x_1, \ldots, x_m) = x_j$.

Definition 2.3 Component Functions
Let $\mathbf{F} : D \to \mathbb{R}^m$ be a function defined on $D \subset \mathbb{R}^n$. For $1 \leq j \leq m$, the $j$th component function of $\mathbf{F}$ is the function $F_j : D \to \mathbb{R}$ defined as $F_j = (\pi_j \circ \mathbf{F}) : D \to \mathbb{R}$. For each $\mathbf{x} \in D$, $\mathbf{F}(\mathbf{x}) = (F_1(\mathbf{x}), \ldots, F_m(\mathbf{x}))$.

Example 2.4
For the function $\mathbf{F} : \mathbb{R}^3 \to \mathbb{R}^3$, $\mathbf{F}(\mathbf{x}) = -3\mathbf{x}$, the component functions are
$$F_1(x_1, x_2, x_3) = -3x_1, \quad F_2(x_1, x_2, x_3) = -3x_2, \quad F_3(x_1, x_2, x_3) = -3x_3.$$

For convenience, we also define the notion of polynomial mappings.

Definition 2.4 Polynomial Mappings
We call a function $\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^m$ a polynomial mapping if each of its components $F_j : \mathbb{R}^n \to \mathbb{R}$, $1 \leq j \leq m$, is a polynomial function. The degree of the polynomial mapping $\mathbf{F}$ is the maximum of the degrees of the polynomials $F_1, F_2, \ldots, F_m$.

Example 2.5
The mapping $\mathbf{F} : \mathbb{R}^3 \to \mathbb{R}^2$,
$$\mathbf{F}(x, y, z) = (x^2 y + 3xz,\ 8yz^3 - 7x)$$
is a polynomial mapping of degree 4.
2.1.3 Invertible Mappings
The invertibility of a function $\mathbf{F} : D \to \mathbb{R}^m$ is defined in the following way.

Definition 2.5 Inverse Functions
Let $D$ be a subset of $\mathbb{R}^n$, and let $\mathbf{F} : D \to \mathbb{R}^m$ be a function defined on $D$. We say that $\mathbf{F}$ is invertible if $\mathbf{F}$ is one-to-one. In this case, the inverse function $\mathbf{F}^{-1} : \mathbf{F}(D) \to D$ is defined so that for each $\mathbf{y} \in \mathbf{F}(D)$,
$$\mathbf{F}^{-1}(\mathbf{y}) = \mathbf{x} \quad \text{if and only if} \quad \mathbf{F}(\mathbf{x}) = \mathbf{y}.$$

Example 2.6
Let $D = \{(x, y) \mid x > 0,\ y > 0\}$ and let $\mathbf{F} : D \to \mathbb{R}^2$ be the function defined as $\mathbf{F}(x, y) = (x - y, x + y)$. Show that $\mathbf{F}$ is invertible and find its inverse.

Solution
Let $u = x - y$ and $v = x + y$. Then
$$x = \frac{u + v}{2}, \qquad y = \frac{v - u}{2}.$$
This shows that for any $(u, v) \in \mathbb{R}^2$, there is at most one pair $(x, y)$ such that $\mathbf{F}(x, y) = (u, v)$. Thus, $\mathbf{F}$ is one-to-one, and hence, it is invertible. Observe that
$$\mathbf{F}(D) = \{(u, v) \mid v > 0,\ -v < u < v\}.$$
The inverse mapping is given by $\mathbf{F}^{-1} : \mathbf{F}(D) \to \mathbb{R}^2$,
$$\mathbf{F}^{-1}(u, v) = \left(\frac{u + v}{2}, \frac{v - u}{2}\right).$$
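The inverse found in Example 2.6 can be spot-checked numerically. The following is a small Python sketch using exact rational arithmetic; the helper names `in_domain` and `in_image` are introduced here for illustration only, and a check at a few sample points is of course not a proof.

```python
from fractions import Fraction

def F(x, y):
    """The mapping F(x, y) = (x - y, x + y) of Example 2.6."""
    return (x - y, x + y)

def F_inv(u, v):
    """Candidate inverse ((u + v)/2, (v - u)/2), computed exactly."""
    u, v = Fraction(u), Fraction(v)
    return ((u + v) / 2, (v - u) / 2)

def in_domain(x, y):
    """Membership test for D = {(x, y) | x > 0, y > 0}."""
    return x > 0 and y > 0

def in_image(u, v):
    """Membership test for F(D) = {(u, v) | v > 0, -v < u < v}."""
    return v > 0 and -v < u < v
```

For instance, `F(3, 5)` gives `(-2, 8)`, which passes `in_image`, and `F_inv(-2, 8)` returns the original point `(3, 5)`.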
2.1.4 Linear Transformations
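Another special class of functions consists of linear transformations. A function $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if for any $\mathbf{x}_1, \ldots, \mathbf{x}_k$ in $\mathbb{R}^n$, and for any $c_1, \ldots, c_k$ in $\mathbb{R}$,
$$\mathbf{T}(c_1\mathbf{x}_1 + \cdots + c_k\mathbf{x}_k) = c_1\mathbf{T}(\mathbf{x}_1) + \cdots + c_k\mathbf{T}(\mathbf{x}_k).$$
Linear transformations are closely related to matrices. An $m \times n$ matrix $A$ is an array of real numbers with $m$ rows and $n$ columns. It has the form
$$A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$
If $A = [a_{ij}]$ and $B = [b_{ij}]$ are $m \times n$ matrices, and $\alpha$ and $\beta$ are real numbers, $\alpha A + \beta B$ is defined to be the $m \times n$ matrix $C = \alpha A + \beta B = [c_{ij}]$ with
$$c_{ij} = \alpha a_{ij} + \beta b_{ij}.$$
If $A = [a_{il}]$ is an $m \times k$ matrix and $B = [b_{lj}]$ is a $k \times n$ matrix, the product $AB$ is defined to be the $m \times n$ matrix $C = AB = [c_{ij}]$, where
$$c_{ij} = \sum_{l=1}^{k} a_{il} b_{lj}.$$
It is easy to verify that matrix multiplication is associative. Given $\mathbf{x} = (x_1, \ldots, x_n)$ in $\mathbb{R}^n$, we identify it with the column vector
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$
which is an $n \times 1$ matrix. If $A$ is an $m \times n$ matrix, and $\mathbf{x}$ is a vector in $\mathbb{R}^n$, then $\mathbf{y} = A\mathbf{x}$ is the vector in $\mathbb{R}^m$ given by
$$\mathbf{y} = A\mathbf{x} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix}.$$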
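The following is a standard result in linear algebra.

Theorem 2.1
A function $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if and only if there exists an $m \times n$ matrix $A = [a_{ij}]$ such that $\mathbf{T}(\mathbf{x}) = A\mathbf{x}$. In this case, $A$ is called the matrix associated to the linear transformation $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^m$.

Sketch of Proof
It is easy to verify that the mapping $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^m$, $\mathbf{T}(\mathbf{x}) = A\mathbf{x}$ is a linear transformation if $A$ is an $m \times n$ matrix. Conversely, if $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation, then for any $\mathbf{x} \in \mathbb{R}^n$,
$$\mathbf{T}(\mathbf{x}) = \mathbf{T}(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n) = x_1\mathbf{T}(\mathbf{e}_1) + x_2\mathbf{T}(\mathbf{e}_2) + \cdots + x_n\mathbf{T}(\mathbf{e}_n).$$
Define the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ in $\mathbb{R}^m$ by
$$\mathbf{a}_1 = \mathbf{T}(\mathbf{e}_1), \quad \mathbf{a}_2 = \mathbf{T}(\mathbf{e}_2), \quad \ldots, \quad \mathbf{a}_n = \mathbf{T}(\mathbf{e}_n).$$
Let $A$ be the $m \times n$ matrix with column vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$. Namely,
$$A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix}.$$
Then we have $\mathbf{T}(\mathbf{x}) = A\mathbf{x}$.

Example 2.7
Let $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ be the function defined as $\mathbf{F}(x, y) = (x - y, x + y)$. Then $\mathbf{F}$ is a linear transformation with matrix
$$A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}.$$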
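The construction in the sketch of proof, where the columns of the matrix are the images of the standard basis vectors, can be written out in a few lines. The Python sketch below assumes a matrix is stored as a list of rows; the helper names `matrix_of` and `apply` are hypothetical and chosen only for this illustration.

```python
def matrix_of(T, n):
    """Matrix (list of rows) whose j-th column is T(e_j), for T defined on R^n."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]  # standard basis vector
        columns.append(list(T(e_j)))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def F(v):
    """The linear transformation F(x, y) = (x - y, x + y)."""
    x, y = v
    return (x - y, x + y)

A = matrix_of(F, 2)  # [[1, -1], [1, 1]]
```

Applying `A` to a vector then reproduces `F`, in line with the statement of Theorem 2.1. Note that `matrix_of` produces a meaningful matrix only when `T` is actually linear.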
For the linear transformation $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^m$, $\mathbf{T}(\mathbf{x}) = A\mathbf{x}$, the component functions are
$$\begin{aligned} T_1(\mathbf{x}) &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n, \\ T_2(\mathbf{x}) &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n, \\ &\;\;\vdots \\ T_m(\mathbf{x}) &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n. \end{aligned}$$
Each of them is a polynomial of degree at most one. Thus, a linear transformation is a polynomial mapping of degree at most one. It is easy to deduce the following.

Corollary 2.2
A mapping $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if and only if each component function is a linear transformation.

The following are some standard results about linear transformations.

Theorem 2.3
If $\mathbf{S} : \mathbb{R}^n \to \mathbb{R}^m$ and $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^m$ are linear transformations with matrices $A$ and $B$ respectively, then for any real numbers $\alpha$ and $\beta$, $\alpha\mathbf{S} + \beta\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation with matrix $\alpha A + \beta B$.

Theorem 2.4
If $\mathbf{S} : \mathbb{R}^n \to \mathbb{R}^m$ and $\mathbf{T} : \mathbb{R}^m \to \mathbb{R}^k$ are linear transformations with matrices $A$ and $B$ respectively, then $\mathbf{T} \circ \mathbf{S} : \mathbb{R}^n \to \mathbb{R}^k$ is a linear transformation with matrix $BA$.

Sketch of Proof
This follows from
$$(\mathbf{T} \circ \mathbf{S})(\mathbf{x}) = \mathbf{T}(\mathbf{S}(\mathbf{x})) = B(A\mathbf{x}) = (BA)\mathbf{x}.$$

In the particular case when $m = n$, we have the following.
Theorem 2.5
Let $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation represented by the matrix $A$. The following are equivalent.
(a) The mapping $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one.
(b) The mapping $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$ is onto.
(c) The matrix $A$ is invertible.
(d) $\det A \neq 0$.

In other words, if the linear transformation $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one or onto, then it is bijective. In this case, the linear transformation is invertible, and we can define the inverse function $\mathbf{T}^{-1} : \mathbb{R}^n \to \mathbb{R}^n$.

Theorem 2.6
Let $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$ be an invertible linear transformation represented by the matrix $A$. Then the inverse mapping $\mathbf{T}^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ is also a linear transformation and $\mathbf{T}^{-1}(\mathbf{x}) = A^{-1}\mathbf{x}$.

Example 2.8
Let $\mathbf{T} : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation $\mathbf{T}(x, y) = (x - y, x + y)$. The matrix associated with $\mathbf{T}$ is
$$A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}.$$
Since $\det A = 2 \neq 0$, $\mathbf{T}$ is invertible. Since
$$A^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix},$$
we have
$$\mathbf{T}^{-1}(x, y) = \left(\frac{x + y}{2}, \frac{-x + y}{2}\right).$$
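Since the matrix of a composition is the product of the matrices, the inverse in Example 2.8 can be verified by checking that both products of $A$ and $A^{-1}$ equal the identity. The following Python sketch does this with exact fractions; the helper names `matmul` and `det2` and the list-of-rows representation are assumptions of this illustration.

```python
from fractions import Fraction

def matmul(A, B):
    """Product AB of an m x k matrix A and a k x n matrix B (lists of rows)."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

A = [[1, -1], [1, 1]]
half = Fraction(1, 2)
A_inv = [[half, half], [-half, half]]  # (1/2) * [[1, 1], [-1, 1]]
I2 = [[1, 0], [0, 1]]
```

Here `det2(A)` is 2, and both `matmul(A, A_inv)` and `matmul(A_inv, A)` equal `I2`, which corresponds to $\mathbf{T} \circ \mathbf{T}^{-1}$ and $\mathbf{T}^{-1} \circ \mathbf{T}$ being the identity mapping.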
2.1.5 Quadratic Forms
Given an $m \times n$ matrix $A = [a_{ij}]$, its transpose is the $n \times m$ matrix $A^T = [b_{ij}]$, where $b_{ij} = a_{ji}$ for all $1 \leq i \leq n$, $1 \leq j \leq m$. An $n \times n$ matrix $A$ is symmetric if $A = A^T$. An $n \times n$ matrix $P$ is orthogonal if $P^T P = P P^T = I$. If the column vectors of $P$ are $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, so that
$$P = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix}, \tag{2.1}$$
then $P$ is orthogonal if and only if $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is an orthonormal set of vectors in $\mathbb{R}^n$.

If $A$ is an $n \times n$ symmetric matrix, its characteristic polynomial $p(\lambda) = \det(\lambda I_n - A)$ is a monic polynomial of degree $n$ with $n$ real roots $\lambda_1, \lambda_2, \ldots, \lambda_n$, counted with multiplicity. These roots are called the eigenvalues of $A$. There is an orthonormal set of vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ in $\mathbb{R}^n$ such that
$$A\mathbf{v}_i = \lambda_i \mathbf{v}_i \quad \text{for all } 1 \leq i \leq n. \tag{2.2}$$
Let $D$ be the diagonal matrix
$$D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}, \tag{2.3}$$
and let $P$ be the orthogonal matrix (2.1). Then (2.2) is equivalent to $AP = PD$, or equivalently,
$$A = PDP^T = PDP^{-1}.$$
This is known as the orthogonal diagonalization of the real symmetric matrix $A$.

A quadratic form in $\mathbb{R}^n$ is a polynomial function $Q : \mathbb{R}^n \to \mathbb{R}$ of the form
$$Q(\mathbf{x}) = \sum_{1 \leq i \,\ldots} c_{ij} x_i x_j$$

[…]

… $r > 0$ such that if $B$ is an $n \times n$ matrix with $\|B - A\| < r$, then $B$ is also invertible.

Sketch of Proof
This is simply a rephrasing of the statement that if $A$ is a point in the open set $GL(n, \mathbb{R})$, then there is a ball $B(A, r)$ with center at $A$ that is contained in $GL(n, \mathbb{R})$.

Let $A$ be an $n \times n$ matrix. For $1 \leq i, j \leq n$, the $(i, j)$-minor of $A$, denoted by $M_{i,j}$, is the determinant of the $(n - 1) \times (n - 1)$ matrix obtained by deleting the $i$th row and $j$th column of $A$. Using the same reasoning as above, we find that the function $M_{i,j} : \mathcal{M}_n \to \mathbb{R}$ is a continuous function. The $(i, j)$-cofactor $C_{i,j}$ of $A$ is given by $C_{i,j} = (-1)^{i+j} M_{i,j}$. The cofactor matrix of $A$ is $C_A = [C_{ij}]$. Since each of the components is continuous, the function $C : \mathcal{M}_n \to \mathcal{M}_n$ taking $A$ to $C_A$ is a continuous function. If $A$ is invertible,
$$A^{-1} = \frac{1}{\det A}\, C_A^T.$$
Since both $C : \mathcal{M}_n \to \mathcal{M}_n$ and $\det : \mathcal{M}_n \to \mathbb{R}$ are continuous functions, and $\det : GL(n, \mathbb{R}) \to \mathbb{R}$ is a function that is never equal to 0, we obtain the following.

Theorem 2.37
The map $\mathcal{I} : GL(n, \mathbb{R}) \to GL(n, \mathbb{R})$ that takes $A$ to $A^{-1}$ is continuous.
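The cofactor formula for the inverse used above, namely that the inverse is the transpose of the cofactor matrix divided by the determinant, can be checked on a concrete matrix. The following Python sketch uses exact rational arithmetic and a cofactor expansion along the first row; all function names and the sample matrix are assumptions made for this illustration, and the recursive determinant is meant for small matrices only.

```python
from fractions import Fraction

def minor_matrix(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-indexed)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1  # determinant of the empty matrix, so that 1 x 1 works
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(len(A)))

def cofactor_matrix(A):
    """Matrix of cofactors C_ij = (-1)^(i+j) * M_ij."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor_matrix(A, i, j)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """Transpose of the cofactor matrix divided by det A."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    C = cofactor_matrix(A)
    n = len(A)
    return [[Fraction(C[j][i], d) for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Matrix product AB for matrices stored as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]  # sample matrix with det A = 7
```

Each entry of `inverse(A)` is a polynomial in the entries of `A` divided by `det(A)`, which is the structural reason behind the continuity of the inversion map on invertible matrices.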
Exercises 2.3

Question 1
Let $\mathbf{x}_0$ be a point in $\mathbb{R}^n$. Define the function $f : \mathbb{R}^n \to \mathbb{R}$ by $f(\mathbf{x}) = \|\mathbf{x} - \mathbf{x}_0\|$. Show that $f$ is a continuous function.

Question 2
Let $O = \mathbb{R}^3 \setminus \{(0, 0, 0)\}$ and define the function $\mathbf{F} : O \to \mathbb{R}^2$ by
$$\mathbf{F}(x, y, z) = \left(\frac{y}{x^2 + y^2 + z^2}, \frac{z}{x^2 + y^2 + z^2}\right).$$
Show that $\mathbf{F}$ is a continuous function.

Question 3
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the function defined as
$$f(\mathbf{x}) = \begin{cases} 1, & \text{if at least one of the } x_i \text{ is rational}, \\ 0, & \text{otherwise}. \end{cases}$$
At which points of $\mathbb{R}^n$ is the function $f$ continuous?

Question 4
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the function defined as
$$f(\mathbf{x}) = \begin{cases} x_1^2 + \cdots + x_n^2, & \text{if at least one of the } x_i \text{ is rational}, \\ 0, & \text{otherwise}. \end{cases}$$
At which points of $\mathbb{R}^n$ is the function $f$ continuous?

Question 5
Let $f : \mathbb{R}^3 \to \mathbb{R}$ be the function defined by
$$f(x, y, z) = \begin{cases} \dfrac{\sin(x^2 + 4y^2 + z^2)}{x^2 + 4y^2 + z^2}, & \text{if } (x, y, z) \neq (0, 0, 0), \\ a, & \text{if } (x, y, z) = (0, 0, 0). \end{cases}$$
Show that there exists a value $a$ such that $f$ is a continuous function, and find this value of $a$.

Question 6
Let $a$ and $b$ be positive numbers, and let $O$ be the subset of $\mathbb{R}^n$ defined as
$$O = \{\mathbf{x} \in \mathbb{R}^n \mid a < \|\mathbf{x}\| < b\}.$$
Show that $O$ is open.

Question 7
Let $A$ be the subset of $\mathbb{R}^2$ given by
$$A = \{(x, y) \mid \sin(x + y) + xy > 1\}.$$
Show that $A$ is an open set.

Question 8
Let $A$ be the subset of $\mathbb{R}^3$ given by
$$A = \{(x, y, z) \mid x \geq 0,\ y \leq 1,\ e^{xy} \leq z\}.$$
Show that $A$ is a closed set.
Question 9
A plane in $\mathbb{R}^3$ is the set of all points $(x, y, z)$ satisfying an equation of the form
$$ax + by + cz = d,$$
where $(a, b, c) \neq (0, 0, 0)$. Show that a plane is a closed subset of $\mathbb{R}^3$.

Question 10
Define the sets $A$, $B$, $C$ and $D$ as follows.
(a) $A = \{(x, y, z) \mid x^2 + 4y^2 + 9z^2 < 36\}$
(b) $B = \{(x, y, z) \mid x^2 + 4y^2 + 9z^2 \leq 36\}$
(c) $C = \{(x, y, z) \mid 0 < x^2 + 4y^2 + 9z^2 < 36\}$
(d) $D = \{(x, y, z) \mid 0 < x^2 + 4y^2 + 9z^2 \leq 36\}$
For each of these sets, find its interior, exterior and boundary.

Question 11
Let $a$ and $b$ be real numbers, and assume that $f : \mathbb{R}^n \to \mathbb{R}$ is a continuous function. Consider the following subsets of $\mathbb{R}^n$.
(a) $A = \{\mathbf{x} \in \mathbb{R}^n \mid f(\mathbf{x}) > a\}$
(b) $B = \{\mathbf{x} \in \mathbb{R}^n \mid f(\mathbf{x}) \geq a\}$
(c) $C = \{\mathbf{x} \in \mathbb{R}^n \mid f(\mathbf{x}) < a\}$
(d) $D = \{\mathbf{x} \in \mathbb{R}^n \mid f(\mathbf{x}) \leq a\}$
(e) $E = \{\mathbf{x} \in \mathbb{R}^n \mid a < f(\mathbf{x}) < b\}$
(f) $F = \{\mathbf{x} \in \mathbb{R}^n \mid a \leq f(\mathbf{x}) \leq b\}$
Show that $A$, $C$ and $E$ are open sets, while $B$, $D$ and $F$ are closed sets.
Question 12
Let $f : \mathbb{R}^2 \to \mathbb{R}$ be the function defined as
$$f(x, y) = \begin{cases} x^2 + y^2, & \text{if } x^2 + y^2 < 4, \\ 8 - x^2 - y^2, & \text{if } x^2 + y^2 \geq 4. \end{cases}$$
Show that $f$ is a continuous function.

Question 13
Show that the distance function on $\mathbb{R}^n$, $d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, $d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$, is continuous in the following sense. If $\{\mathbf{u}_k\}$ and $\{\mathbf{v}_k\}$ are sequences in $\mathbb{R}^n$ that converge to $\mathbf{u}$ and $\mathbf{v}$ respectively, then the sequence $\{d(\mathbf{u}_k, \mathbf{v}_k)\}$ converges to $d(\mathbf{u}, \mathbf{v})$.

Question 14
Let $\mathbf{T} : \mathbb{R}^2 \to \mathbb{R}^3$ be the mapping
$$\mathbf{T}(x, y) = (x + y,\ 3x - y,\ 6x + 5y).$$
Show that $\mathbf{T} : \mathbb{R}^2 \to \mathbb{R}^3$ is a Lipschitz mapping, and find the smallest Lipschitz constant for this mapping.

Question 15
Given that $A$ is a subset of $\mathbb{R}^m$ and $B$ is a subset of $\mathbb{R}^n$, let $C = A \times B$. Then $C$ is a subset of $\mathbb{R}^{m+n}$.
(a) If $A$ is open in $\mathbb{R}^m$ and $B$ is open in $\mathbb{R}^n$, show that $A \times B$ is open in $\mathbb{R}^{m+n}$.
(b) If $A$ is closed in $\mathbb{R}^m$ and $B$ is closed in $\mathbb{R}^n$, show that $A \times B$ is closed in $\mathbb{R}^{m+n}$.
Question 16
Let $D$ be a subset of $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ be a continuous function defined on $D$. Let $A = D \times \mathbb{R}$ and define the function $g : A \to \mathbb{R}$ by
$$g(\mathbf{x}, y) = y - f(\mathbf{x}).$$
Show that $g : A \to \mathbb{R}$ is continuous.

Question 17
Let $U$ be an open subset of $\mathbb{R}^n$, and let $f : U \to \mathbb{R}$ be a continuous function defined on $U$. Show that the sets
$$O_1 = \{(\mathbf{x}, y) \mid \mathbf{x} \in U,\ y < f(\mathbf{x})\}, \qquad O_2 = \{(\mathbf{x}, y) \mid \mathbf{x} \in U,\ y > f(\mathbf{x})\}$$
are open subsets of $\mathbb{R}^{n+1}$.

Question 18
Let $C$ be a closed subset of $\mathbb{R}^n$, and let $f : C \to \mathbb{R}$ be a continuous function defined on $C$. Show that the sets
$$A_1 = \{(\mathbf{x}, y) \mid \mathbf{x} \in C,\ y \leq f(\mathbf{x})\}, \qquad A_2 = \{(\mathbf{x}, y) \mid \mathbf{x} \in C,\ y \geq f(\mathbf{x})\}$$
are closed subsets of $\mathbb{R}^{n+1}$.
2.4 Uniform Continuity
In volume I, we have seen that uniform continuity plays an important role in single variable analysis. In this section, we extend this concept to multivariable functions.

Definition 2.10 Uniform Continuity
Let D be a subset of Rn, and let F : D → Rm be a function defined on D. We say that the function F is uniformly continuous provided that for any ε > 0, there exists δ > 0 such that if u and v are points in D and ∥u − v∥ < δ, then ∥F(u) − F(v)∥ < ε.

The following two propositions are obvious.

Proposition 2.38
A uniformly continuous function is continuous.

Proposition 2.39
Given that D is a subset of Rn, and D′ is a subset of D, if the function F : D → Rm is uniformly continuous, then the function F : D′ → Rm is also uniformly continuous.

A special class of uniformly continuous functions is the class of Lipschitz functions.

Theorem 2.40
Let D be a subset of Rn, and let F : D → Rm be a function defined on D. If F : D → Rm is Lipschitz, then it is uniformly continuous.

The proof is straightforward.

Remark 2.3
Theorem 2.34 and Theorem 2.40 imply that a linear transformation is uniformly continuous.
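There is an equivalent definition for uniform continuity in terms of sequences.

Theorem 2.41
Let $D$ be a subset of $\mathbb{R}^n$, and let $\mathbf{F} : D \to \mathbb{R}^m$ be a function defined on $D$. Then the following are equivalent.
(i) $\mathbf{F} : D \to \mathbb{R}^m$ is uniformly continuous. Namely, given $\varepsilon > 0$, there exists $\delta > 0$ such that if $\mathbf{u}$ and $\mathbf{v}$ are points in $D$ and $\|\mathbf{u} - \mathbf{v}\| < \delta$, then $\|\mathbf{F}(\mathbf{u}) - \mathbf{F}(\mathbf{v})\| < \varepsilon$.
(ii) If $\{\mathbf{u}_k\}$ and $\{\mathbf{v}_k\}$ are two sequences in $D$ such that
$$\lim_{k \to \infty} (\mathbf{u}_k - \mathbf{v}_k) = \mathbf{0},$$
then
$$\lim_{k \to \infty} (\mathbf{F}(\mathbf{u}_k) - \mathbf{F}(\mathbf{v}_k)) = \mathbf{0}.$$

Let us give a proof of this theorem here.

Proof
Assume that (i) holds, and $\{\mathbf{u}_k\}$ and $\{\mathbf{v}_k\}$ are two sequences in $D$ such that
$$\lim_{k \to \infty} (\mathbf{u}_k - \mathbf{v}_k) = \mathbf{0}.$$
Given $\varepsilon > 0$, (i) implies that there exists $\delta > 0$ such that if $\mathbf{u}$ and $\mathbf{v}$ are points in $D$ and $\|\mathbf{u} - \mathbf{v}\| < \delta$, then $\|\mathbf{F}(\mathbf{u}) - \mathbf{F}(\mathbf{v})\| < \varepsilon$. Since $\displaystyle\lim_{k \to \infty} (\mathbf{u}_k - \mathbf{v}_k) = \mathbf{0}$, there is a positive integer $K$ such that for all $k \geq K$, $\|\mathbf{u}_k - \mathbf{v}_k\| < \delta$. It follows that
$$\|\mathbf{F}(\mathbf{u}_k) - \mathbf{F}(\mathbf{v}_k)\| < \varepsilon \quad \text{for all } k \geq K.$$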
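This shows that
$$\lim_{k \to \infty} (\mathbf{F}(\mathbf{u}_k) - \mathbf{F}(\mathbf{v}_k)) = \mathbf{0},$$
and thus completes the proof that (i) implies (ii).

Conversely, assume that (i) does not hold. This means there exists an $\varepsilon > 0$ such that for all $\delta > 0$, there exist points $\mathbf{u}$ and $\mathbf{v}$ in $D$ such that $\|\mathbf{u} - \mathbf{v}\| < \delta$ and $\|\mathbf{F}(\mathbf{u}) - \mathbf{F}(\mathbf{v})\| \geq \varepsilon$. Thus, for every $k \in \mathbb{Z}^+$, there exist $\mathbf{u}_k$ and $\mathbf{v}_k$ in $D$ such that
$$\|\mathbf{u}_k - \mathbf{v}_k\| < \frac{1}{k}, \tag{2.5}$$
and $\|\mathbf{F}(\mathbf{u}_k) - \mathbf{F}(\mathbf{v}_k)\| \geq \varepsilon$. Notice that $\{\mathbf{u}_k\}$ and $\{\mathbf{v}_k\}$ are sequences in $D$. Eq. (2.5) implies that $\displaystyle\lim_{k \to \infty} (\mathbf{u}_k - \mathbf{v}_k) = \mathbf{0}$. Since $\|\mathbf{F}(\mathbf{u}_k) - \mathbf{F}(\mathbf{v}_k)\| \geq \varepsilon$ for all $k$, the sequence $\{\mathbf{F}(\mathbf{u}_k) - \mathbf{F}(\mathbf{v}_k)\}$ does not converge to $\mathbf{0}$. This shows that if (i) does not hold, then (ii) does not hold.

From Theorem 2.41, we can deduce the following.

Proposition 2.42
Let $D$ be a subset of $\mathbb{R}^n$, and let $\mathbf{F} : D \to \mathbb{R}^m$ be a function defined on $D$. Then $\mathbf{F} : D \to \mathbb{R}^m$ is uniformly continuous if and only if each of the component functions $F_j = (\pi_j \circ \mathbf{F}) : D \to \mathbb{R}$, $1 \leq j \leq m$, is uniformly continuous.

Let us look at some more examples.

Example 2.34
Let $D$ be the open rectangle $D = (0, 5) \times (0, 7)$, and consider the function $f : D \to \mathbb{R}$ defined by $f(x, y) = xy$. Determine whether $f : D \to \mathbb{R}$ is uniformly continuous.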
Solution
For any two points $\mathbf{u}_1 = (x_1, y_1)$ and $\mathbf{u}_2 = (x_2, y_2)$ in $D$, $0 < x_1, x_2 < 5$ and $0 < y_1, y_2 < 7$. Since
$$f(\mathbf{u}_1) - f(\mathbf{u}_2) = x_1 y_1 - x_2 y_2 = x_1(y_1 - y_2) + y_2(x_1 - x_2),$$
we find that
$$|f(\mathbf{u}_1) - f(\mathbf{u}_2)| \leq |x_1||y_1 - y_2| + |y_2||x_1 - x_2| \leq 5\|\mathbf{u}_1 - \mathbf{u}_2\| + 7\|\mathbf{u}_1 - \mathbf{u}_2\| = 12\|\mathbf{u}_1 - \mathbf{u}_2\|.$$
This shows that $f : D \to \mathbb{R}$ is a Lipschitz function. Hence, it is uniformly continuous.

Example 2.35
Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) = xy$. Determine whether $f : \mathbb{R}^2 \to \mathbb{R}$ is uniformly continuous.

Solution
For $k \in \mathbb{Z}^+$, let
$$\mathbf{u}_k = \left(k + \frac{1}{k},\ k\right), \qquad \mathbf{v}_k = (k, k).$$
Then $\{\mathbf{u}_k\}$ and $\{\mathbf{v}_k\}$ are sequences of points in $\mathbb{R}^2$ and
$$\lim_{k \to \infty} (\mathbf{u}_k - \mathbf{v}_k) = \lim_{k \to \infty} \left(\frac{1}{k}, 0\right) = (0, 0).$$
However,
$$f(\mathbf{u}_k) - f(\mathbf{v}_k) = k\left(k + \frac{1}{k}\right) - k^2 = 1.$$
Thus,
$$\lim_{k \to \infty} (f(\mathbf{u}_k) - f(\mathbf{v}_k)) = 1 \neq 0.$$
Therefore, the function $f : \mathbb{R}^2 \to \mathbb{R}$ is not uniformly continuous.

Examples 2.34 and 2.35 show that whether a function is uniformly continuous depends on the domain of the function.
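The sequences used in Example 2.35 can be tabulated directly. The following Python sketch uses exact rational arithmetic to confirm, for a few values of $k$, that the squared distance between $\mathbf{u}_k$ and $\mathbf{v}_k$ is $1/k^2$ while the difference of the function values stays equal to 1. The function name `gap` is an assumption of this illustration.

```python
from fractions import Fraction

def f(x, y):
    """The function f(x, y) = xy."""
    return x * y

def gap(k):
    """Return (squared distance between u_k and v_k, f(u_k) - f(v_k))."""
    k = Fraction(k)
    u = (k + 1 / k, k)  # u_k = (k + 1/k, k)
    v = (k, k)          # v_k = (k, k)
    dist_sq = (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2
    return dist_sq, f(*u) - f(*v)
```

For example, `gap(1000)` returns a squared distance of one millionth together with a difference of exactly 1, so no single $\delta$ can work for $\varepsilon = 1$ on all of $\mathbb{R}^2$.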
Exercises 2.4

Question 1
Let $\mathbf{F} : \mathbb{R}^3 \to \mathbb{R}^2$ be the function defined as
$$\mathbf{F}(x, y, z) = (3x - 2z + 7,\ x + y + z - 4).$$
Show that $\mathbf{F} : \mathbb{R}^3 \to \mathbb{R}^2$ is uniformly continuous.

Question 2
Let $D = (0, 1) \times (0, 2)$. Consider the function $f : D \to \mathbb{R}$ defined as
$$f(x, y) = x^2 + 3y.$$
Determine whether $f$ is uniformly continuous.

Question 3
Let $D = (1, \infty) \times (1, \infty)$. Consider the function $f : D \to \mathbb{R}$ defined as
$$f(x, y) = \sqrt{x + y}.$$
Determine whether $f$ is uniformly continuous.

Question 4
Let $D = (0, 1) \times (0, 2)$. Consider the function $f : D \to \mathbb{R}$ defined as
$$f(x, y) = \frac{1}{\sqrt{x + y}}.$$
Determine whether $f$ is uniformly continuous.
2.5 Contraction Mapping Theorem
Among the Lipschitz functions, there is a subset called contractions.

Definition 2.11 Contractions
Let $D$ be a subset of $\mathbb{R}^n$. A function $\mathbf{F} : D \to \mathbb{R}^m$ is called a contraction if there exists a constant $0 \leq c < 1$ such that
$$\|\mathbf{F}(\mathbf{u}) - \mathbf{F}(\mathbf{v})\| \leq c\|\mathbf{u} - \mathbf{v}\| \quad \text{for all } \mathbf{u}, \mathbf{v} \in D.$$
In other words, a contraction is a Lipschitz function which has a Lipschitz constant that is less than 1.

Example 2.36
Let $c$ be a real number, let $\mathbf{b}$ be a point in $\mathbb{R}^n$, and let $\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^n$ be the function defined as
$$\mathbf{F}(\mathbf{x}) = c\mathbf{x} + \mathbf{b}.$$
The mapping $\mathbf{F}$ is a contraction if and only if $|c| < 1$.

The contraction mapping theorem is an important result in analysis. Extended to metric spaces, it is an important tool to prove the existence and uniqueness of solutions of ordinary differential equations.

Theorem 2.43 Contraction Mapping Theorem
Let $D$ be a closed subset of $\mathbb{R}^n$, and let $\mathbf{F} : D \to D$ be a contraction. Then $\mathbf{F}$ has a unique fixed point. Namely, there is a unique $\mathbf{u}$ in $D$ such that $\mathbf{F}(\mathbf{u}) = \mathbf{u}$.

Proof
By definition, there is a constant $c \in [0, 1)$ such that
$$\|\mathbf{F}(\mathbf{u}) - \mathbf{F}(\mathbf{v})\| \leq c\|\mathbf{u} - \mathbf{v}\| \quad \text{for all } \mathbf{u}, \mathbf{v} \in D.$$
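We start with any point $\mathbf{x}_0$ in $D$ and construct the sequence $\{\mathbf{x}_k\}$ inductively by
$$\mathbf{x}_{k+1} = \mathbf{F}(\mathbf{x}_k) \quad \text{for all } k \geq 0.$$
Notice that for all $k \in \mathbb{Z}^+$,
$$\|\mathbf{x}_{k+1} - \mathbf{x}_k\| = \|\mathbf{F}(\mathbf{x}_k) - \mathbf{F}(\mathbf{x}_{k-1})\| \leq c\|\mathbf{x}_k - \mathbf{x}_{k-1}\|.$$
By iterating, we find that
$$\|\mathbf{x}_{k+1} - \mathbf{x}_k\| \leq c^k\|\mathbf{x}_1 - \mathbf{x}_0\|.$$
Therefore, if $l > k \geq 0$, the triangle inequality implies that
$$\|\mathbf{x}_l - \mathbf{x}_k\| \leq \|\mathbf{x}_l - \mathbf{x}_{l-1}\| + \cdots + \|\mathbf{x}_{k+1} - \mathbf{x}_k\| \leq (c^{l-1} + \cdots + c^k)\|\mathbf{x}_1 - \mathbf{x}_0\|.$$
Since $c \in [0, 1)$,
$$c^{l-1} + \cdots + c^k = c^k(1 + c + \cdots + c^{l-k-1}) \leq \frac{c^k}{1 - c}.$$
Therefore, for all $l > k \geq 0$,
$$\|\mathbf{x}_l - \mathbf{x}_k\| \leq \frac{c^k}{1 - c}\|\mathbf{x}_1 - \mathbf{x}_0\|.$$
Since $c^k \to 0$ as $k \to \infty$, given $\varepsilon > 0$, there exists a positive integer $K$ such that for all $k \geq K$,
$$\frac{c^k}{1 - c}\|\mathbf{x}_1 - \mathbf{x}_0\| < \varepsilon.$$
This implies that for all $l > k \geq K$,
$$\|\mathbf{x}_l - \mathbf{x}_k\| < \varepsilon.$$
In other words, we have shown that $\{\mathbf{x}_k\}$ is a Cauchy sequence. Therefore, it converges to a point $\mathbf{u}$ in $\mathbb{R}^n$. Since $D$ is closed, $\mathbf{u}$ is in $D$. Since $\mathbf{F}$ is continuous, the sequence $\{\mathbf{F}(\mathbf{x}_k)\}$ converges to $\mathbf{F}(\mathbf{u})$. But $\mathbf{F}(\mathbf{x}_k) = \mathbf{x}_{k+1}$. Being a subsequence of $\{\mathbf{x}_k\}$, the sequence $\{\mathbf{x}_{k+1}\}$ converges to $\mathbf{u}$ as well. This shows that
$$\mathbf{F}(\mathbf{u}) = \mathbf{u},$$
which says that $\mathbf{u}$ is a fixed point of $\mathbf{F}$. Now if $\mathbf{v}$ is another point in $D$ such that $\mathbf{F}(\mathbf{v}) = \mathbf{v}$, then
$$\|\mathbf{u} - \mathbf{v}\| = \|\mathbf{F}(\mathbf{u}) - \mathbf{F}(\mathbf{v})\| \leq c\|\mathbf{u} - \mathbf{v}\|.$$
Since $c \in [0, 1)$, this can only be true if $\|\mathbf{u} - \mathbf{v}\| = 0$, which implies that $\mathbf{v} = \mathbf{u}$. Hence, the fixed point of $\mathbf{F}$ is unique.

As an application of the contraction mapping theorem, we prove the following.

Theorem 2.44
Let $r$ be a positive number and let $\mathbf{G} : B(\mathbf{0}, r) \to \mathbb{R}^n$ be a mapping such that $\mathbf{G}(\mathbf{0}) = \mathbf{0}$, and
$$\|\mathbf{G}(\mathbf{u}) - \mathbf{G}(\mathbf{v})\| \leq \frac{1}{2}\|\mathbf{u} - \mathbf{v}\| \quad \text{for all } \mathbf{u}, \mathbf{v} \in B(\mathbf{0}, r).$$
If $\mathbf{F} : B(\mathbf{0}, r) \to \mathbb{R}^n$ is the function defined as
$$\mathbf{F}(\mathbf{x}) = \mathbf{x} + \mathbf{G}(\mathbf{x}),$$
then $\mathbf{F}$ is a one-to-one continuous mapping whose image contains the open ball $B(\mathbf{0}, r/2)$.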
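Proof
By definition, $\mathbf{G}$ is a contraction. Hence, it is continuous. Therefore, $\mathbf{F} : B(\mathbf{0}, r) \to \mathbb{R}^n$ is also continuous. If $\mathbf{F}(\mathbf{u}) = \mathbf{F}(\mathbf{v})$, then
$$\mathbf{u} - \mathbf{v} = \mathbf{G}(\mathbf{v}) - \mathbf{G}(\mathbf{u}).$$
Therefore,
$$\|\mathbf{u} - \mathbf{v}\| = \|\mathbf{G}(\mathbf{v}) - \mathbf{G}(\mathbf{u})\| \leq \frac{1}{2}\|\mathbf{u} - \mathbf{v}\|.$$
This implies that $\|\mathbf{u} - \mathbf{v}\| = 0$, and thus, $\mathbf{u} = \mathbf{v}$. Hence, $\mathbf{F}$ is one-to-one.

Given $\mathbf{y} \in B(\mathbf{0}, r/2)$, let $r_1 = 2\|\mathbf{y}\|$. Then $r_1 < r$. Consider the map $\mathbf{H} : CB(\mathbf{0}, r_1) \to \mathbb{R}^n$ defined as
$$\mathbf{H}(\mathbf{x}) = \mathbf{y} - \mathbf{G}(\mathbf{x}).$$
For any $\mathbf{u}$ and $\mathbf{v}$ in $CB(\mathbf{0}, r_1)$,
$$\|\mathbf{H}(\mathbf{u}) - \mathbf{H}(\mathbf{v})\| = \|\mathbf{G}(\mathbf{u}) - \mathbf{G}(\mathbf{v})\| \leq \frac{1}{2}\|\mathbf{u} - \mathbf{v}\|.$$
Therefore, $\mathbf{H}$ is also a contraction. Notice that if $\mathbf{x} \in CB(\mathbf{0}, r_1)$,
$$\|\mathbf{H}(\mathbf{x})\| \leq \|\mathbf{y}\| + \|\mathbf{G}(\mathbf{x}) - \mathbf{G}(\mathbf{0})\| \leq \frac{r_1}{2} + \frac{1}{2}\|\mathbf{x}\| \leq \frac{r_1}{2} + \frac{r_1}{2} = r_1.$$
Therefore, $\mathbf{H}$ is a contraction that maps the closed set $CB(\mathbf{0}, r_1)$ into itself. By the contraction mapping theorem, there exists $\mathbf{u}$ in $CB(\mathbf{0}, r_1)$ such that $\mathbf{H}(\mathbf{u}) = \mathbf{u}$. This gives
$$\mathbf{y} - \mathbf{G}(\mathbf{u}) = \mathbf{u},$$
or equivalently,
$$\mathbf{y} = \mathbf{u} + \mathbf{G}(\mathbf{u}) = \mathbf{F}(\mathbf{u}).$$
In other words, we have shown that there exists $\mathbf{u} \in CB(\mathbf{0}, r_1) \subset B(\mathbf{0}, r)$ such that $\mathbf{F}(\mathbf{u}) = \mathbf{y}$. This proves that the image of the map $\mathbf{F} : B(\mathbf{0}, r) \to \mathbb{R}^n$ contains the open ball $B(\mathbf{0}, r/2)$.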
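The proof is constructive: a solution of $\mathbf{F}(\mathbf{x}) = \mathbf{y}$ is obtained by iterating the contraction $\mathbf{H}(\mathbf{x}) = \mathbf{y} - \mathbf{G}(\mathbf{x})$ starting from $\mathbf{0}$. The Python sketch below carries this out in $\mathbb{R}^2$ for one particular choice of $\mathbf{G}$, which is an assumption of this example: $\mathbf{G}(x_1, x_2) = (\tfrac{1}{2}\sin x_2, \tfrac{1}{2}\sin x_1)$ satisfies $\mathbf{G}(\mathbf{0}) = \mathbf{0}$ and has Lipschitz constant at most $\tfrac{1}{2}$ because $|\sin s - \sin t| \leq |s - t|$. The tolerance and iteration cap are likewise illustrative.

```python
import math

def G(x):
    """Sample mapping with G(0) = 0 and Lipschitz constant at most 1/2."""
    return (0.5 * math.sin(x[1]), 0.5 * math.sin(x[0]))

def F(x):
    """F(x) = x + G(x)."""
    g = G(x)
    return (x[0] + g[0], x[1] + g[1])

def norm(x):
    """Euclidean norm in R^2."""
    return math.hypot(x[0], x[1])

def solve(y, tol=1e-14, max_iter=200):
    """Approximate the solution of F(x) = y by iterating H(x) = y - G(x) from 0."""
    x = (0.0, 0.0)
    for _ in range(max_iter):
        g = G(x)
        x_next = (y[0] - g[0], y[1] - g[1])
        # Successive differences shrink by a factor of at most 1/2 per step.
        if norm((x_next[0] - x[0], x_next[1] - x[1])) < tol:
            return x_next
        x = x_next
    return x
```

With $r = 1$ and $\mathbf{y} = (0.3, -0.2)$, which lies in $B(\mathbf{0}, 1/2)$, the iterates stay inside the closed ball of radius $2\|\mathbf{y}\|$, as in the proof, and `solve` returns a point whose image under `F` agrees with $\mathbf{y}$ up to rounding error.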
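Exercises 2.5

Question 1
Let
$$S^n = \{(x_1, \ldots, x_n, x_{n+1}) \in \mathbb{R}^{n+1} \mid x_1^2 + \cdots + x_n^2 + x_{n+1}^2 = 1\}$$
be the $n$-sphere, and let $\mathbf{F} : S^n \to S^n$ be a mapping such that
$$\|\mathbf{F}(\mathbf{u}) - \mathbf{F}(\mathbf{v})\| \leq \frac{2}{3}\|\mathbf{u} - \mathbf{v}\| \quad \text{for all } \mathbf{u}, \mathbf{v} \in S^n.$$
Show that there is a unique $\mathbf{w} \in S^n$ such that $\mathbf{F}(\mathbf{w}) = \mathbf{w}$.

Question 2
Let $r$ be a positive number, and let $c$ be a positive number less than 1. Assume that $\mathbf{G} : B(\mathbf{0}, r) \to \mathbb{R}^n$ is a mapping such that $\mathbf{G}(\mathbf{0}) = \mathbf{0}$, and
$$\|\mathbf{G}(\mathbf{u}) - \mathbf{G}(\mathbf{v})\| \leq c\|\mathbf{u} - \mathbf{v}\| \quad \text{for all } \mathbf{u}, \mathbf{v} \in B(\mathbf{0}, r).$$
If $\mathbf{F} : B(\mathbf{0}, r) \to \mathbb{R}^n$ is the function defined as
$$\mathbf{F}(\mathbf{x}) = \mathbf{x} + \mathbf{G}(\mathbf{x}),$$
show that $\mathbf{F}$ is a one-to-one continuous mapping whose image contains the open ball $B(\mathbf{0}, ar)$, where $a = 1 - c$.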
Chapter 3. Continuous Functions on Connected Sets and Compact Sets
Chapter 3 Continuous Functions on Connected Sets and Compact Sets

In volume I, we have seen that the intermediate value theorem and the extreme value theorem play important roles in analysis. In order to extend these two theorems to multivariable functions, we need to consider two topological properties of sets: connectedness and compactness.

3.1 Path-Connectedness and Intermediate Value Theorem

We want to extend the intermediate value theorem to multivariable functions. For this, we need to consider a topological property called connectedness. In this section, we will first discuss the topological property called path-connectedness, which is a more natural concept.

Definition 3.1 Path
Let $S$ be a subset of $\mathbb{R}^n$, and let $\mathbf{u}$ and $\mathbf{v}$ be two points in $S$. A path in $S$ joining $\mathbf{u}$ to $\mathbf{v}$ is a continuous function $\boldsymbol{\gamma} : [a, b] \to S$ such that $\boldsymbol{\gamma}(a) = \mathbf{u}$ and $\boldsymbol{\gamma}(b) = \mathbf{v}$.

For any real numbers $a$ and $b$ with $a < b$, the map $u : [0, 1] \to [a, b]$ defined by
$$u(t) = a + t(b - a)$$
is a continuous bijection. Its inverse $u^{-1} : [a, b] \to [0, 1]$ is
$$u^{-1}(t) = \frac{t - a}{b - a},$$
which is also continuous. Hence, in the definition of a path, we can let the domain be any $[a, b]$ with $a < b$.
Figure 3.1: A path in S joining u to v. Example 3.1 Given a set S and a point x0 in S, the constant function γ : [a, b] → S, γ(t) = x0 , is a path in S. If γ : [a, b] → S is a path in S ⊂ Rn , and S ′ is any other subset of Rn that contains the image of γ, then γ is also a path in S ′ . Example 3.2 Let R be the rectangle R = [−2, 2] × [−2, 2]. The function γ : [0, 1] → R2 , γ(t) = (cos(πt), sin(πt)) is a path in R joining u = (1, 0) to v = (−1, 0).
Figure 3.2: The path in Example 3.2.
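As a numerical sanity check of Example 3.2, the following Python sketch samples the path at 101 equally spaced parameter values and tests that each sample lies in the rectangle and that the endpoints are the stated ones. The function names `gamma` and `in_R` and the sampling grid are assumptions of this illustration, and sampling finitely many points does not replace the argument that $|\cos(\pi t)| \leq 1$ and $|\sin(\pi t)| \leq 1$ for every $t$.

```python
import math

def gamma(t):
    """The path gamma(t) = (cos(pi t), sin(pi t)) on [0, 1]."""
    return (math.cos(math.pi * t), math.sin(math.pi * t))

def in_R(p):
    """Membership test for the rectangle R = [-2, 2] x [-2, 2]."""
    return -2 <= p[0] <= 2 and -2 <= p[1] <= 2

# Sample the path at t = 0, 0.01, ..., 1.
samples = [gamma(i / 100) for i in range(101)]
```

The first sample is $(1, 0)$ exactly, and the last sample equals $(-1, 0)$ up to floating-point rounding in $\sin(\pi)$.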
Example 3.3
Let $S$ be a subset of $\mathbb{R}^n$. If $\boldsymbol{\gamma} : [a, b] \to S$ is a path in $S$ joining $\mathbf{u}$ to $\mathbf{v}$, then $\widetilde{\boldsymbol{\gamma}} : [-b, -a] \to S$, $\widetilde{\boldsymbol{\gamma}}(t) = \boldsymbol{\gamma}(-t)$, is a path in $S$ joining $\mathbf{v}$ to $\mathbf{u}$.

Now we define path-connectedness.

Definition 3.2 Path-Connected
Let $S$ be a subset of $\mathbb{R}^n$. We say that $S$ is path-connected if any two points $\mathbf{u}$ and $\mathbf{v}$ in $S$ can be joined by a path in $S$.

It is easy to characterize a path-connected subset of $\mathbb{R}$. In volume I, we have defined the concept of convex sets. A subset $S$ of $\mathbb{R}$ is a convex set provided that for any $u$ and $v$ in $S$ and any $t \in [0, 1]$, $(1 - t)u + tv$ is also in $S$. This is equivalent to the following: if $u$ and $v$ are points in $S$ with $u < v$, then all the points $w$ satisfying $u < w < v$ are also in $S$. We have shown that a subset $S$ of $\mathbb{R}$ is a convex set if and only if it is an interval. The following theorem characterizes the path-connected subsets of $\mathbb{R}$.

Theorem 3.1
Let $S$ be a subset of $\mathbb{R}$. Then $S$ is path-connected if and only if $S$ is an interval.

Proof
If $S$ is an interval, then for any $u$ and $v$ in $S$, and for any $t \in [0, 1]$, $(1 - t)u + tv$ is in $S$. Hence, the function $\gamma : [0, 1] \to S$, $\gamma(t) = (1 - t)u + tv$ is a path in $S$ that joins $u$ to $v$. Conversely, assume that $S$ is a path-connected subset of $\mathbb{R}$. To show that $S$ is an interval, we need to show that for any $u$ and $v$ in $S$ with $u < v$, any $w$ that is in the interval $[u, v]$ is also in $S$. Since $S$ is path-connected, there is a path $\gamma : [0, 1] \to S$ such that $\gamma(0) = u$ and $\gamma(1) = v$. Since $\gamma$ is continuous, and $w$ is in between $\gamma(0)$ and $\gamma(1)$, the intermediate value theorem implies that there is a $c \in [0, 1]$ so that $\gamma(c) = w$. Thus, $w$ is in $S$.

To explore path-connected subsets of $\mathbb{R}^n$ with $n \geq 2$, we first extend the
concept of convex sets to Rⁿ. Given two points u and v in Rⁿ, when t runs through all the points in the interval [0, 1], (1 − t)u + tv describes all the points on the line segment between u and v.

Definition 3.3 Convex Sets Let S be a subset of Rⁿ. We say that S is convex if for any two points u and v in S, the line segment between u and v lies entirely in S. Equivalently, S is convex provided that for any two points u and v in S, the point (1 − t)u + tv is in S for any t ∈ [0, 1].
Figure 3.3: A is a convex set, B is not.

If u = (u1, . . . , un) and v = (v1, . . . , vn) are two points in Rⁿ, the map γ : [0, 1] → Rⁿ,
γ(t) = (1 − t)u + tv = ((1 − t)u1 + tv1, . . . , (1 − t)un + tvn)
is a continuous function, since each of its components is continuous. Thus, we have the following.

Theorem 3.2 Let S be a subset of Rⁿ. If S is convex, then it is path-connected.

Let us look at some examples of convex sets.

Example 3.4 Let I1, . . ., In be intervals in R. Show that the set S = I1 × · · · × In is path-connected.
Solution We claim that S is convex. Then Theorem 3.2 implies that S is path-connected. Given that u = (u1, . . . , un) and v = (v1, . . . , vn) are two points in S, for each 1 ≤ i ≤ n, ui and vi are in Ii. Since Ii is an interval, for any t ∈ [0, 1], (1 − t)ui + tvi is in Ii. Hence, (1 − t)u + tv = ((1 − t)u1 + tv1, . . . , (1 − t)un + tvn) is in S. This shows that S is convex.

Special cases of sets of the form S = I1 × · · · × In are open and closed rectangles.

Example 3.5 An open rectangle U = (a1, b1) × · · · × (an, bn) and its closure R = [a1, b1] × · · · × [an, bn] are convex sets. Hence, they are path-connected.

Example 3.6 Let x0 be a point in Rⁿ, and let r be a positive number. Show that the open ball B(x0, r) and the closed ball CB(x0, r) are path-connected sets.

Solution Let u and v be two points in B(x0, r). Then ∥u − x0∥ < r and ∥v − x0∥ < r. For any t ∈ [0, 1], t ≥ 0 and 1 − t ≥ 0. Since (1 − t)u + tv − x0 = (1 − t)(u − x0) + t(v − x0), the triangle inequality gives
∥(1 − t)u + tv − x0∥ ≤ ∥(1 − t)(u − x0)∥ + ∥t(v − x0)∥ = (1 − t)∥u − x0∥ + t∥v − x0∥ < (1 − t)r + tr = r.
This shows that (1 − t)u + tv is in B(x0, r). Hence, B(x0, r) is convex. Replacing < by ≤, one can show that CB(x0, r) is convex. By Theorem 3.2, the open ball B(x0, r) and the closed ball CB(x0, r) are path-connected sets.

Not all path-connected sets are convex. Before we give an example, let us first prove the following useful lemma.

Lemma 3.3 Let A and B be path-connected subsets of Rⁿ. If A ∩ B is nonempty, then S = A ∪ B is path-connected.

Proof Let u and v be two points in S. If both u and v are in the set A, then they can be joined by a path in A, which is also in S. Similarly, if both u and v are in the set B, then they can be joined by a path in B, which is also in S. If u is in A and v is in B, let x0 be any point in A ∩ B. Then u and x0 are both in the path-connected set A, and v and x0 are both in the path-connected set B. Therefore, there exist continuous functions γ1 : [0, 1] → A and γ2 : [1, 2] → B such that γ1(0) = u, γ1(1) = x0, γ2(1) = x0 and γ2(2) = v. Define the function γ : [0, 2] → A ∪ B by
\[
\gamma(t) = \begin{cases} \gamma_1(t), & \text{if } 0 \le t \le 1, \\ \gamma_2(t), & \text{if } 1 \le t \le 2. \end{cases}
\]
Since γ1(1) = γ2(1) = x0, the function γ is well defined. Since [0, 1] and [1, 2] are closed subsets of R and γ is continuous on each of them, the function γ : [0, 2] → S is continuous. Thus, γ is a path in S from u to v. The case where u is in B and v is in A is symmetric. This proves that S is path-connected.

Now we can give an example of a path-connected set that is not convex.
Figure 3.4: If two sets A and B are path-connected and A ∩ B is nonempty, then A ∪ B is also path-connected.

Example 3.7 Show that the set
\[
S = \{(x, y) \mid 0 \le x \le 1,\ -2 \le y \le 2\} \cup \{(x, y) \mid (x - 2)^2 + y^2 \le 1\}
\]
is path-connected, but not convex.

Solution The set A = {(x, y) | 0 ≤ x ≤ 1, −2 ≤ y ≤ 2} = [0, 1] × [−2, 2] is a closed rectangle. Therefore, it is path-connected. The set B = {(x, y) | (x − 2)² + y² ≤ 1} is a closed ball with center at (2, 0) and radius 1. Hence, it is also path-connected. Since the point x0 = (1, 0) is in both A and B, Lemma 3.3 implies that S = A ∪ B is path-connected. The points u = (1, 2) and v = (2, 1) are in S. Consider the point
\[
w = \frac{1}{2}u + \frac{1}{2}v = \left(\frac{3}{2}, \frac{3}{2}\right).
\]
It is not in S: its first coordinate is larger than 1, so w is not in A, and (3/2 − 2)² + (3/2)² = 5/2 > 1, so w is not in B. This shows that S is not convex.

Figure 3.5: The set A ∪ B is path-connected but not convex.

Let us now prove the following important theorem which says that continuous functions preserve path-connectedness.

Theorem 3.4 Let D be a path-connected subset of Rⁿ. If F : D → Rᵐ is a continuous function, then F(D) is path-connected.

Proof Let v1 and v2 be two points in F(D). Then there exist u1 and u2 in D such that F(u1) = v1 and F(u2) = v2. Since D is path-connected, there is a continuous function γ : [0, 1] → D such that γ(0) = u1 and γ(1) = u2. The map α = (F ◦ γ) : [0, 1] → F(D) is then a continuous map with α(0) = v1 and α(1) = v2. This shows that F(D) is path-connected.

From Theorem 3.4, we obtain the following.

Theorem 3.5 Intermediate Value Theorem for Path-Connected Sets Let D be a path-connected subset of Rⁿ, and let f : D → R be a function defined on D. If f is continuous, then f(D) is an interval.
Proof By Theorem 3.4, f(D) is a path-connected subset of R. By Theorem 3.1, f(D) is an interval.

We can also use Theorem 3.4 to establish more examples of path-connected sets. Let us first look at an example.

Example 3.8 Show that the circle S¹ = {(x, y) | x² + y² = 1} is path-connected.

Solution Define the function f : [0, 2π] → R² by f(t) = (cos t, sin t). Notice that S¹ = f([0, 2π]). Since each component of f is a continuous function, f is a continuous function. Since [0, 2π] is an interval, it is path-connected. By Theorem 3.4, S¹ = f([0, 2π]) is path-connected.

A more general theorem is as follows.

Theorem 3.6 Let D be a path-connected subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. If F : D → Rᵐ is continuous, then the graph of F,
G_F = {(x, y) | x ∈ D, y = F(x)},
is a path-connected subset of Rⁿ⁺ᵐ.
Proof By Corollary 2.32, the function H : D → Rⁿ⁺ᵐ, H(x) = (x, F(x)), is continuous. Since H(D) = G_F, Theorem 3.4 implies that G_F is a path-connected subset of Rⁿ⁺ᵐ.

Now let us consider spheres, which are boundaries of balls.

Definition 3.4 The Standard Unit n-Sphere Sⁿ The standard unit n-sphere Sⁿ is the subset of Rⁿ⁺¹ consisting of all points x = (x1, . . . , xn, xn+1) in Rⁿ⁺¹ satisfying the equation ∥x∥ = 1, namely,
\[
x_1^2 + \cdots + x_n^2 + x_{n+1}^2 = 1.
\]
The n-sphere Sⁿ is the boundary of the (n + 1)-dimensional open ball Bⁿ⁺¹ = B(0, 1) with center at the origin and radius 1.
Figure 3.6: A sphere.

Example 3.9 Show that the standard unit n-sphere Sⁿ is path-connected.

Solution Notice that Sⁿ = Sⁿ₊ ∪ Sⁿ₋, where Sⁿ₊ and Sⁿ₋ are respectively the upper and lower hemispheres with xₙ₊₁ ≥ 0 and xₙ₊₁ ≤ 0 respectively. If x ∈ Sⁿ₊, then
\[
x_{n+1} = \sqrt{1 - x_1^2 - \cdots - x_n^2};
\]
whereas if x ∈ Sⁿ₋,
\[
x_{n+1} = -\sqrt{1 - x_1^2 - \cdots - x_n^2}.
\]
Let
\[
CB^n = \{(x_1, \ldots, x_n) \mid x_1^2 + \cdots + x_n^2 \le 1\}
\]
be the closed ball in Rⁿ with center at the origin and radius 1. Define the functions f± : CBⁿ → R by
\[
f_{\pm}(x_1, \ldots, x_n) = \pm\sqrt{1 - x_1^2 - \cdots - x_n^2}.
\]
Notice that Sⁿ₊ and Sⁿ₋ are respectively the graphs of f₊ and f₋. Since they are compositions of the square root function and a polynomial function, which are both continuous, f₊ and f₋ are continuous functions. The closed ball CBⁿ is path-connected. Theorem 3.6 then implies that Sⁿ₊ and Sⁿ₋ are path-connected. Since both Sⁿ₊ and Sⁿ₋ contain the unit vector e1 in Rⁿ⁺¹, the set Sⁿ₊ ∩ Sⁿ₋ is nonempty. By Lemma 3.3, Sⁿ = Sⁿ₊ ∪ Sⁿ₋ is path-connected.

Remark 3.1 There is an alternative way to prove that the n-sphere Sⁿ is path-connected. Given two distinct points u and v in Sⁿ, they are unit vectors in Rⁿ⁺¹. We want to show that there is a path in Sⁿ joining u to v. Notice that the line segment L = {(1 − t)u + tv | 0 ≤ t ≤ 1} in Rⁿ⁺¹ contains the origin if and only if u and v are parallel, if and only if v = −u. Thus, we discuss two cases.
Case 1: v ≠ −u. In this case, let γ : [0, 1] → Rⁿ⁺¹ be the function defined as
\[
\gamma(t) = \frac{(1 - t)u + tv}{\|(1 - t)u + tv\|}.
\]
Since (1 − t)u + tv ≠ 0 for all 0 ≤ t ≤ 1, γ is a continuous function. It is easy to check that its image lies in Sⁿ, and that γ(0) = u and γ(1) = v. Hence, γ is a path in Sⁿ joining u to v.

Case 2: v = −u. In this case, let w be a unit vector orthogonal to u, and let γ : [0, π] → Rⁿ⁺¹ be the function defined as γ(t) = (cos t)u + (sin t)w. Since sin t and cos t are continuous functions, γ is a continuous function. Since u and w are orthogonal, the generalized Pythagoras theorem implies that
∥γ(t)∥² = cos² t ∥u∥² + sin² t ∥w∥² = cos² t + sin² t = 1.
Therefore, the image of γ lies in Sⁿ. It is easy to see that γ(0) = u and γ(π) = −u = v. Hence, γ is a path in Sⁿ joining u to v.

Example 3.10 Let f : Sⁿ → R be a continuous function. Show that there is a point u0 on Sⁿ such that f(u0) = f(−u0).

Solution The function g : Rⁿ⁺¹ → Rⁿ⁺¹, g(u) = −u is a linear transformation. Hence, it is continuous. Restricted to Sⁿ, g(Sⁿ) = Sⁿ. Thus, the function f1 : Sⁿ → R, f1(u) = f(−u), is also continuous.
It follows that the function h : Sⁿ → R defined by h(u) = f(u) − f(−u) is continuous. Notice that h(−u) = f(−u) − f(u) = −h(u). This implies that if the number a is in the range of h, then so is the number −a. Since the number 0 is in between a and −a for any a, and Sⁿ is path-connected, the intermediate value theorem implies that the number 0 is also in the range of h. This means that there is a point u0 on Sⁿ such that h(u0) = 0. Equivalently, f(u0) = f(−u0).

Theorem 3.5 says that a continuous function defined on a path-connected set satisfies the intermediate value theorem. We make the following definition.

Definition 3.5 Intermediate Value Property Let S be a subset of Rⁿ. We say that S has the intermediate value property provided that whenever f : S → R is a continuous function, f(S) is an interval.

Theorem 3.5 says that if S is a path-connected set, then it has the intermediate value property. It is natural to ask whether it is true that any set S that has the intermediate value property must be path-connected. Unfortunately, it turns out that the answer is yes only when S is a subset of R. If S is a subset of Rⁿ with n ≥ 2, this is not true. This leads us to define a new property of sets called connectedness in the next section.
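The argument of Example 3.10 can be illustrated numerically on the circle S¹. The sketch below is only an illustration under stated assumptions: the function `f` is a hypothetical continuous function chosen here, points of S¹ are parametrized by the angle θ, and the point u0 is located by bisection on [0, π], which works because h(π) = −h(0) for h(u) = f(u) − f(−u).

```python
import math

def f(x, y):
    # Hypothetical continuous function on the circle, chosen only for illustration.
    return x**3 + 2.0 * y + x * y

def h(theta):
    """h(u) = f(u) - f(-u) for the point u = (cos theta, sin theta) of the circle."""
    x, y = math.cos(theta), math.sin(theta)
    return f(x, y) - f(-x, -y)

def antipodal_root(tol=1e-12):
    """Bisection on [0, pi]. Since h(pi) = -h(0), the endpoint signs differ unless h(0) = 0."""
    lo, hi = 0.0, math.pi
    h_lo = h(lo)
    if h_lo == 0.0:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if h_mid == 0.0:
            return mid
        if (h_lo > 0) == (h_mid > 0):
            lo, h_lo = mid, h_mid   # the sign change is in [mid, hi]
        else:
            hi = mid                # the sign change is in [lo, mid]
    return 0.5 * (lo + hi)

theta0 = antipodal_root()
u0 = (math.cos(theta0), math.sin(theta0))
gap = abs(f(*u0) - f(-u0[0], -u0[1]))  # close to 0 when f(u0) = f(-u0)
```

Bisection is used here only because it mirrors the intermediate value theorem on an interval; the example itself does not prescribe any method for finding u0.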
Exercises 3.1

Question 1 Is the set A = (−1, 2) ∪ (2, 5] path-connected? Justify your answer.

Question 2 Let a and b be positive numbers, and let A be the subset of R² given by
\[
A = \left\{ (x, y) \,\middle|\, \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1 \right\}.
\]
Show that A is convex, and deduce that it is path-connected.

Question 3 Let (a, b, c) be a nonzero vector, and let P be the plane in R³ given by P = {(x, y, z) | ax + by + cz = d}, where d is a constant. Show that P is convex, and deduce that it is path-connected.

Question 4 Let S be the subset of R³ given by S = {(x, y, z) | x > 0, y ≤ 1, 2 ≤ z < 7}. Show that S is path-connected.

Question 5 Let a, b and c be positive numbers, and let S be the subset of R³ given by
\[
S = \left\{ (x, y, z) \,\middle|\, \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1 \right\}.
\]
Show that S is path-connected.
Question 6 Let u = (3, 0) and let A be the subset of R² given by A = {(x, y) | x² + y² ≤ 1}. Define the function f : A → R by f(x) = d(x, u).
(a) Find f(x1) and f(x2), where x1 = (1, 0) and x2 = (−1, 0).
(b) Use the intermediate value theorem to justify that there is a point x0 in A such that d(x0, u) = π.

Question 7 Let A and B be subsets of Rⁿ. If A and B are convex, show that A ∩ B is also convex.
3.2 Connectedness and Intermediate Value Property
In this section, we study a property of sets which is known as connectedness. Let us first look at the path-connected subsets of R from a different perspective. We have shown in the previous section that a subset of R is path-connected if and only if it is an interval. A set of the form A = (−2, 2] \ {0} = (−2, 0) ∪ (0, 2] is not path-connected, since it contains the points −1 and 1, but it does not contain the point 0 that is in between. Intuitively, there is no way to go from the point −1 to 1 continuously without leaving the set A. Let U = (−∞, 0) and V = (0, ∞). Notice that U and V are open subsets of R which both intersect the set A. Moreover, A = (A ∩ U) ∪ (A ∩ V), or equivalently, A ⊂ U ∪ V. We say that A is separated by the open sets U and V.

Definition 3.6 Separation of a Set Let A be a subset of Rⁿ. A separation of A is a pair (U, V) of subsets of Rⁿ which satisfies the following conditions.
(a) U and V are open sets.
(b) A ∩ U ≠ ∅ and A ∩ V ≠ ∅.
(c) A ⊂ U ∪ V, or equivalently, A is the union of A ∩ U and A ∩ V.
(d) A is disjoint from U ∩ V, or equivalently, A ∩ U and A ∩ V are disjoint.
If (U, V) is a separation of A, we say that A is separated by the open sets U and V, or the open sets U and V separate A.
Example 3.11 Let A = (−2, 0) ∪ (0, 2], and let U = (−∞, 0) and V = (0, ∞). Then the open sets U and V separate A. Let U1 = (−3, 0) and V1 = (0, 3). The open sets U1 and V1 also separate A.

Now we define connectedness.

Definition 3.7 Connected Sets Let A be a subset of Rⁿ. We say that A is connected if there does not exist a pair of open sets U and V that separate A.

Example 3.12 Determine whether the set
\[
A = \{(x, y) \mid y = 0\} \cup \left\{ (x, y) \,\middle|\, y = \frac{2}{1 + x^2} \right\}
\]
is connected.

Solution Let f : R² → R be the function defined as f(x, y) = y(x² + 1). Since f is a polynomial function, it is continuous. The intervals V1 = (−1, 1) and V2 = (1, 3) are open sets in R. Hence, the sets U1 = f⁻¹(V1) and U2 = f⁻¹(V2) are disjoint and they are open in R². Notice that f(x, y) = 0 on the line y = 0 and f(x, y) = 2 on the curve y = 2/(1 + x²), so that
\[
A \cap U_1 = \{(x, y) \mid y = 0\}, \qquad A \cap U_2 = \left\{ (x, y) \,\middle|\, y = \frac{2}{1 + x^2} \right\}.
\]
Thus, A ∩ U1 and A ∩ U2 are nonempty, A ∩ U1 and A ∩ U2 are disjoint, and A is a union of A ∩ U1 and A ∩ U2. This shows that the open sets U1 and U2 separate A. Hence, A is not connected.

Figure 3.7: The set A defined in Example 3.12 is not connected.

Now let us explore the relation between path-connected and connected. We first prove the following.

Theorem 3.7 Let A be a subset of Rⁿ, and assume that the open sets U and V separate A. Define the function f : A → R by
\[
f(x) = \begin{cases} 0, & \text{if } x \in A \cap U, \\ 1, & \text{if } x \in A \cap V. \end{cases}
\]
Then f is continuous.

Notice that the function f is well defined since A ∩ U and A ∩ V are disjoint.

Proof Let x0 be a point in A. We want to prove that f is continuous at x0. Since A is contained in U ∪ V, x0 is in U or in V. It suffices to consider the case where x0 is in U. The case where x0 is in V is similar. If x0 is in U, since U is open, there is an r > 0 such that B(x0, r) ⊂ U. If {xk} is a sequence in A that converges to x0, there exists a positive integer K such that for all k ≥ K, ∥xk − x0∥ < r. Thus, for all k ≥ K, xk ∈ B(x0, r) ⊂ U, and hence, f(xk) = 0. This proves that the sequence {f(xk)} converges to 0, which is f(x0). Therefore, f is continuous at x0.

Now we can prove the theorem which says that a path-connected set is connected.
Theorem 3.8 Let A be a subset of Rⁿ. If A is path-connected, then it is connected.

Proof We prove the contrapositive, which says that if A is not connected, then it is not path-connected. If A is not connected, there is a pair of open sets U and V that separate A. By Theorem 3.7, the function f : A → R defined by
\[
f(x) = \begin{cases} 0, & \text{if } x \in A \cap U, \\ 1, & \text{if } x \in A \cap V \end{cases}
\]
is continuous. Since f(A) = {0, 1} is not an interval, by the contrapositive of the intermediate value theorem for path-connected sets, A is not path-connected.

Theorem 3.8 provides us a large library of connected sets.

Example 3.13 The following sets are path-connected. Hence, they are also connected.
1. A set S in Rⁿ of the form S = I1 × · · · × In, where I1, . . . , In are intervals in R.
2. Open rectangles and closed rectangles.
3. Open balls and closed balls.
4. The n-sphere Sⁿ.

The following theorem says that path-connectedness and connectedness are equivalent in R.
Theorem 3.9 Let S be a subset of R. Then the following are equivalent.
(a) S is an interval.
(b) S is path-connected.
(c) S is connected.

Proof We have proved (a) ⇐⇒ (b) in the previous section. In particular, (a) implies (b). Theorem 3.8 says that (b) implies (c). Now we only need to prove that (c) implies (a). Assume that (a) is not true. Namely, S is not an interval. Then there are points u and v in S with u < v, such that there is a w ∈ (u, v) that is not in S. Let U = (−∞, w) and V = (w, ∞). Then U and V are disjoint open subsets of R. Since w ∉ S, S ⊂ U ∪ V. Since u ∈ S ∩ U and v ∈ S ∩ V, S ∩ U and S ∩ V are nonempty. Hence, U and V are open sets that separate S. This shows that S is not connected. Thus, we have proved that if (a) is not true, then (c) is not true. This is equivalent to (c) implies (a).

Connectedness is also preserved by continuous functions.

Theorem 3.10 Let D be a connected subset of Rⁿ. If F : D → Rᵐ is a continuous function, then F(D) is connected.

Proof We prove the contrapositive. Assume that F(D) is not connected. Then there are open sets V1 and V2 in Rᵐ that separate F(D). Let
D1 = {x ∈ D | F(x) ∈ V1}, D2 = {x ∈ D | F(x) ∈ V2}.
Since F(D) ∩ V1 and F(D) ∩ V2 are nonempty, D1 and D2 are nonempty. Since F(D) ⊂ V1 ∪ V2, D = D1 ∪ D2. Since V1 ∩ V2 is disjoint from F(D), D1 and D2 are disjoint. However, D1 and D2 are not necessarily open sets. We will define two open sets U1 and U2 in Rⁿ such that D1 = D ∩ U1 and D2 = D ∩ U2. Then U1 and U2 are open sets that separate D.

For each x0 in D1, F(x0) ∈ V1. Since V1 is open, there exists $\varepsilon_{x_0} > 0$ such that the ball $B(F(x_0), \varepsilon_{x_0})$ is contained in V1. By the continuity of F at x0, there exists $\delta_{x_0} > 0$ such that for all x in D, if $x \in B(x_0, \delta_{x_0})$, then $F(x) \in B(F(x_0), \varepsilon_{x_0}) \subset V_1$. In other words,
\[
D \cap B(x_0, \delta_{x_0}) \subset F^{-1}(V_1) = D_1.
\]
Notice that $B(x_0, \delta_{x_0})$ is an open set. Define
\[
U_1 = \bigcup_{x_0 \in D_1} B(x_0, \delta_{x_0}).
\]
Being a union of open sets, U1 is open. Since
\[
D \cap U_1 = \bigcup_{x_0 \in D_1} \left( D \cap B(x_0, \delta_{x_0}) \right) \subset D_1,
\]
and
\[
D_1 = \bigcup_{x_0 \in D_1} \{x_0\} \subset \bigcup_{x_0 \in D_1} \left( D \cap B(x_0, \delta_{x_0}) \right) = D \cap U_1,
\]
we find that D ∩ U1 = D1. Similarly, define
\[
U_2 = \bigcup_{x_0 \in D_2} B(x_0, \delta_{x_0}).
\]
Then U2 is an open set and D ∩ U2 = D2 . This completes the construction of the open sets U1 and U2 that separate D. Thus, D is not connected. From Theorem 3.9 and Theorem 3.10, we also have an intermediate value theorem for connected sets.
Theorem 3.11 Intermediate Value Theorem for Connected Sets Let D be a connected subset of Rⁿ, and let f : D → R be a function defined on D. If f is continuous, then f(D) is an interval.

Proof By Theorem 3.10, f(D) is a connected subset of R. By Theorem 3.9, f(D) is an interval.

Now we can prove the following.

Theorem 3.12 Let S be a subset of Rⁿ. Then S is connected if and only if it has the intermediate value property.

Proof If S is connected and f : S → R is continuous, Theorem 3.11 implies that f(S) is an interval. Hence, S has the intermediate value property. If S is not connected, Theorem 3.7 gives a continuous function f : S → R such that f(S) = {0, 1} is not an interval. Thus, S does not have the intermediate value property.

To give an example of a connected set that is not path-connected, we need a lemma.

Lemma 3.13 Let A be a subset of Rⁿ that is separated by the open sets U and V. If C is a connected subset of A, then C ∩ U = ∅ or C ∩ V = ∅.

Proof Since C ⊂ A, C ⊂ U ∪ V, and C is disjoint from U ∩ V. If C ∩ U ≠ ∅ and C ∩ V ≠ ∅, then the open sets U and V also separate C. This contradicts the connectedness of C. Thus, we must have C ∩ U = ∅ or C ∩ V = ∅.
Theorem 3.14 Let A be a connected subset of Rⁿ. If B is a subset of Rⁿ such that A ⊂ B ⊂ $\overline{A}$, then B is also connected.

Proof If B is not connected, there exist open sets U and V in Rⁿ that separate B. Since A is a connected subset of B, Lemma 3.13 says that A ∩ U = ∅ or A ∩ V = ∅. Without loss of generality, assume that A ∩ V = ∅. Then A ⊂ Rⁿ \ V. Thus, Rⁿ \ V is a closed set that contains A. This implies that $\overline{A}$ ⊂ Rⁿ \ V. Hence, we also have B ⊂ Rⁿ \ V, which contradicts the fact that the set B ∩ V is not empty.

Example 3.14 The Topologist’s Sine Curve Let S be the subset of R² given by S = A ∪ L, where
\[
A = \left\{ (x, y) \,\middle|\, 0 < x \le 1,\ y = \sin\frac{1}{x} \right\}
\]
and L = {(x, y) | x = 0, −1 ≤ y ≤ 1}.
(a) Show that S ⊂ $\overline{A}$.
(b) Show that S is connected.
(c) Show that S is not path-connected.

Solution (a) Since A ⊂ $\overline{A}$, it suffices to show that L ⊂ $\overline{A}$. Given (0, u) ∈ L, −1 ≤ u ≤ 1. Thus, a = sin⁻¹ u ∈ [−π/2, π/2]. Let
\[
x_k = \frac{1}{a + 2\pi k} \quad \text{for } k \in \mathbb{Z}^+.
\]
Notice that xk ∈ (0, 1] and
\[
\sin\frac{1}{x_k} = \sin(a + 2\pi k) = \sin a = u.
\]
Since xk → 0 as k → ∞, {(xk, sin(1/xk))} is a sequence of points in A that converges to (0, u). This proves that (0, u) ∈ $\overline{A}$. Hence, L ⊂ $\overline{A}$.

(b) The interval (0, 1] is path-connected and the function f : (0, 1] → R, f(x) = sin(1/x), is continuous. Thus, by Theorem 3.6, A = G_f is path-connected, and hence it is connected. Since A ⊂ S ⊂ $\overline{A}$, Theorem 3.14 implies that S is connected.

(c) If S is path-connected, there is a path γ : [0, 1] → S such that γ(0) = (0, 0) and γ(1) = (1, sin 1). Let γ(t) = (γ1(t), γ2(t)). Then γ1 : [0, 1] → R and γ2 : [0, 1] → R are continuous functions. Consider the sequence {xk} with
\[
x_k = \frac{1}{\frac{\pi}{2} + \pi k}, \quad k \in \mathbb{Z}^+.
\]
Notice that {xk} is a decreasing sequence of points in [0, 1] that converges to 0. For each k ∈ Z⁺, (xk, yk) ∈ S if and only if yk = sin(1/xk). Since γ1 : [0, 1] → R is continuous, γ1(0) = 0 and γ1(1) = 1, the intermediate value theorem implies that there exists t1 ∈ [0, 1] such that γ1(t1) = x1. Similarly, there exists t2 ∈ [0, t1] such that γ1(t2) = x2. Continuing the argument gives a decreasing sequence {tk} in [0, 1] such that γ1(tk) = xk for all k ∈ Z⁺. Since the sequence {tk} is decreasing and bounded below, it converges to some t0 in [0, 1]. Since γ2 : [0, 1] → R is also continuous, the sequence {γ2(tk)} should converge to γ2(t0). Since γ(tk) ∈ S and γ1(tk) = xk, we must have γ2(tk) = yk = (−1)ᵏ. But then the sequence {γ2(tk)} is not convergent. This gives a contradiction. Hence, there does not exist a path in S that joins the point (0, 0) to the point (1, sin 1). This proves that S is not path-connected.
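The two sequences used in Example 3.14 can be tabulated numerically. The following is a small sketch for illustration only; the helper names `x_part_a` and `x_part_c` and the sample height u = 0.3 are assumptions made here, not part of the example.

```python
import math

def x_part_c(k):
    """x_k = 1 / (pi/2 + pi k), the sequence used in part (c)."""
    return 1.0 / (math.pi / 2 + math.pi * k)

def x_part_a(u, k):
    """x_k = 1 / (a + 2 pi k) with a = arcsin(u), the sequence used in part (a)."""
    return 1.0 / (math.asin(u) + 2 * math.pi * k)

# Part (c): the heights sin(1/x_k) alternate between -1 and 1 while x_k decreases to 0,
# so they cannot converge.
heights = [math.sin(1.0 / x_part_c(k)) for k in range(1, 9)]

# Part (a): for a fixed height u in [-1, 1], the points (x_k, sin(1/x_k)) of A
# approach the point (0, u) of L.
u = 0.3  # arbitrary sample value
points = [(x_part_a(u, k), math.sin(1.0 / x_part_a(u, k))) for k in range(1, 9)]
```

Up to floating-point rounding, `heights` alternates between −1 and 1, and every second coordinate in `points` equals u while the first coordinates shrink toward 0.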
Figure 3.8: The topologist’s sine curve.

Remark 3.2 Example 3.14 gives a set that is connected but not path-connected.
1. One can in fact show that S = $\overline{A}$.
2. To show that A is connected, we can also use the fact that if D is a connected subset of Rⁿ, and F : D → Rᵐ is a continuous function, then the graph of F is connected. The proof of this fact is left as an exercise.

At the end of this section, we want to give a sufficient condition for a connected subset of Rⁿ to be path-connected. First we define the meaning of a polygonal path.

Definition 3.8 Polygonal Path Let S be a subset of Rⁿ, and let u and v be two points in S. A path γ : [a, b] → S in S that joins u to v is a polygonal path provided that there is a partition P = {t0, t1, . . . , tk} of [a, b] such that, writing xi = γ(ti) for 0 ≤ i ≤ k, we have for 1 ≤ i ≤ k,
\[
\gamma(t) = x_{i-1} + \frac{t - t_{i-1}}{t_i - t_{i-1}} (x_i - x_{i-1}) \quad \text{when } t_{i-1} \le t \le t_i.
\]
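The formula in Definition 3.8 translates directly into code. The sketch below is one possible rendering, with the function name `polygonal_path` and the sample vertices chosen here for illustration; it builds γ from the points x0, . . . , xk and the partition t0 < · · · < tk, and it does not check that the segments stay inside a given set S.

```python
from bisect import bisect_right

def polygonal_path(vertices, partition):
    """Return gamma on [t_0, t_k], linear on each [t_{i-1}, t_i], with gamma(t_i) = x_i."""
    if len(vertices) != len(partition) or len(vertices) < 2:
        raise ValueError("need k + 1 vertices and k + 1 partition points, k >= 1")
    if any(s >= t for s, t in zip(partition, partition[1:])):
        raise ValueError("partition must be strictly increasing")

    def gamma(t):
        if not partition[0] <= t <= partition[-1]:
            raise ValueError("t outside [a, b]")
        # Index i with t_{i-1} <= t <= t_i; the last piece is used at t = b.
        i = min(bisect_right(partition, t), len(partition) - 1)
        t0, t1 = partition[i - 1], partition[i]
        x0, x1 = vertices[i - 1], vertices[i]
        s = (t - t0) / (t1 - t0)
        return tuple(p + s * (q - p) for p, q in zip(x0, x1))

    return gamma

# Sample polygonal path in the plane joining (0, 0) to (1, 2) through (1, 0).
gamma = polygonal_path([(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)], [0.0, 1.0, 3.0])
```

At a partition point ti the two adjacent linear pieces agree, since both give xi, which is why γ is continuous.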
Obviously, we have the following.

Proposition 3.15 If S is a convex subset of Rⁿ, then any two points in S can be joined by a polygonal path in S.
Figure 3.9: A polygonal path.

If γ1 : [a, c] → A is a polygonal path in A that joins u to w, and γ2 : [c, b] → B is a polygonal path in B that joins w to v, then the path γ : [a, b] → A ∪ B,
\[
\gamma(t) = \begin{cases} \gamma_1(t), & \text{if } a \le t \le c, \\ \gamma_2(t), & \text{if } c \le t \le b, \end{cases}
\]
is a polygonal path in A ∪ B that joins u to v. Using this, we can prove the following useful theorem.

Theorem 3.16 Let S be a connected subset of Rⁿ. If S is an open set, then any two points in S can be joined by a polygonal path in S. In particular, S is path-connected.

Proof We use proof by contradiction. Suppose that S is open but there are two points u and v in S that cannot be joined by a polygonal path in S. Consider the sets
U = {x ∈ S | there is a polygonal path in S that joins u to x},
V = {x ∈ S | there is no polygonal path in S that joins u to x}.
Obviously u is in U and v is in V, and S = U ∪ V. We claim that both U and V are open sets.
If x is in the open set S, there is an r > 0 such that B(x, r) ⊂ S. Since B(x, r) is convex, any point w in B(x, r) can be joined by a polygonal path in B(x, r) to x. Hence, if x is in U, w is in U. If x is in V, w is in V. This shows that if x is in U, then B(x, r) ⊂ U. If x is in V, then B(x, r) ⊂ V. Hence, U and V are open sets. Since U and V are disjoint nonempty open sets and S = U ∪ V, they form a separation of S. This contradicts the connectedness of S. Hence, any two points in S can be joined by a polygonal path in S.
Exercises 3.2

Question 1 Determine whether the set
\[
A = \{(x, y) \mid y = 0\} \cup \left\{ (x, y) \,\middle|\, x > 0,\ y = \frac{2}{x} \right\}
\]
is connected.

Question 2 Let D be a connected subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. If F : D → Rᵐ is continuous, show that the graph of F, G_F = {(x, y) | x ∈ D, y = F(x)}, is also connected.

Question 3 Determine whether the set A = {(x, y) | 0 ≤ x < 1, −1 < y ≤ 1} ∪ {(1, 0), (1, 1)} is connected.

Question 4 Assume that A is a connected subset of R³ that contains the points u = (0, 2, 0) and v = (2, −6, 3).
(a) Show that there is a point x = (x, y, z) in A that lies in the plane y = 0.
(b) Show that there exists a point x = (x, y, z) in A that lies on the sphere x² + y² + z² = 25.
Question 5 Let A and B be connected subsets of Rⁿ. If A ∩ B is nonempty, show that S = A ∪ B is connected.
3.3 Sequential Compactness and Compactness
In volume I, we have seen that sequential compactness plays an important role in the extreme value theorem. In this section, we extend the definition of sequential compactness to subsets of Rⁿ. We will also consider another concept called compactness. Let us start with the definition of bounded sets.

Definition 3.9 Bounded Sets Let S be a subset of Rⁿ. We say that S is bounded if there exists a positive number M such that
∥x∥ ≤ M for all x ∈ S.

Remark 3.3 Let S be a subset of Rⁿ. If S is bounded and S′ is a subset of S, then it is obvious that S′ is also bounded.

Example 3.15 Show that a ball B(x0, r) in Rⁿ is bounded.

Solution Given x ∈ B(x0, r), ∥x − x0∥ < r. Thus, ∥x∥ ≤ ∥x0∥ + ∥x − x0∥ < ∥x0∥ + r. Since M = ∥x0∥ + r is a constant independent of the points in the ball B(x0, r), the ball B(x0, r) is bounded.

Notice that if x1 and x2 are points in Rⁿ, and S is a set in Rⁿ such that
∥x − x1∥ < r1 for all x ∈ S,
then
∥x − x2∥ < r1 + ∥x2 − x1∥ for all x ∈ S.
Thus, we have the following.
Proposition 3.17 Let S be a subset in Rⁿ. The following are equivalent.
(a) S is bounded.
(b) There is a point x0 in Rⁿ and a positive constant M such that ∥x − x0∥ ≤ M for all x ∈ S.
(c) For any x0 in Rⁿ, there is a positive constant M such that ∥x − x0∥ ≤ M for all x ∈ S.
Figure 3.10: The set S is bounded.

We say that a sequence {xk} is bounded if the set {xk | k ∈ Z⁺} is bounded. The following is a standard theorem about convergent sequences.

Proposition 3.18 If {xk} is a sequence in Rⁿ that is convergent, then it is bounded.
Proof Assume that the sequence {xk} converges to the point x0. Then there is a positive integer K such that
∥xk − x0∥ < 1 for all k ≥ K.
Let
M = max{∥xk − x0∥ | 1 ≤ k ≤ K − 1} + 1,
where the maximum is taken to be 0 if K = 1. Then M is finite and
∥xk − x0∥ ≤ M for all k ∈ Z⁺.
Hence, by Proposition 3.17, the sequence {xk} is bounded.
Figure 3.11: A convergent sequence is bounded.

Let us now define the diameter of a bounded set. If S is a subset of Rⁿ that is bounded, there is a positive number M such that
∥x∥ ≤ M for all x ∈ S.
It follows from the triangle inequality that for any u and v in S,
∥u − v∥ ≤ ∥u∥ + ∥v∥ ≤ 2M.
Thus, the set
\[
D_S = \{d(u, v) \mid u, v \in S\} = \{\|u - v\| \mid u, v \in S\} \tag{3.1}
\]
is a set of nonnegative real numbers that is bounded above. In fact, for any subset S of Rⁿ, one can define the set of real numbers D_S by (3.1). Then S is a bounded set if and only if the set D_S is bounded above.

Definition 3.10 Diameter of a Bounded Set Let S be a bounded subset of Rⁿ. The diameter of S, denoted by diam S, is defined as
diam S = sup {d(u, v) | u, v ∈ S} = sup {∥u − v∥ | u, v ∈ S}.

Example 3.16 Consider the rectangle R = [a1, b1] × · · · × [an, bn]. If u and v are two points in R, for each 1 ≤ i ≤ n, ui, vi ∈ [ai, bi]. Thus, |ui − vi| ≤ bi − ai. It follows that
\[
\|u - v\| \le \sqrt{(b_1 - a_1)^2 + \cdots + (b_n - a_n)^2}.
\]
If u0 = a = (a1, . . . , an) and v0 = b = (b1, . . . , bn), then u0 and v0 are in R, and
\[
\|u_0 - v_0\| = \sqrt{(b_1 - a_1)^2 + \cdots + (b_n - a_n)^2}.
\]
This shows that the diameter of R is
\[
\operatorname{diam} R = \|b - a\| = \sqrt{(b_1 - a_1)^2 + \cdots + (b_n - a_n)^2}.
\]
Figure 3.12: The diameter of a rectangle.
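The formula of Example 3.16 can be sanity-checked on a sample rectangle. The sketch below compares the closed form ∥b − a∥ with the largest distance between two corners, computed by brute force; the function names and the sample rectangle are assumptions made here, and restricting to corners is only a plausibility check, not a replacement for the argument in the example.

```python
import itertools
import math

def rectangle_diameter(a, b):
    """Closed-form diameter ||b - a|| of the rectangle [a_1, b_1] x ... x [a_n, b_n]."""
    return math.dist(a, b)

def max_corner_distance(a, b):
    """Largest distance between two corners of the rectangle, by brute force."""
    corners = list(itertools.product(*zip(a, b)))  # each coordinate is a_i or b_i
    return max(math.dist(p, q) for p in corners for q in corners)

a, b = (0.0, -1.0, 2.0), (3.0, 1.0, 6.0)  # sample rectangle in R^3
formula = rectangle_diameter(a, b)
brute = max_corner_distance(a, b)
```

For this sample the side lengths are 3, 2 and 4, so both quantities equal √29 up to rounding.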
Intuitively, the diameter of the open rectangle U = (a1, b1) × · · · × (an, bn) is also equal to
\[
d = \sqrt{(b_1 - a_1)^2 + \cdots + (b_n - a_n)^2}.
\]
However, the points a = (a1, . . . , an) and b = (b1, . . . , bn) are not in U. There do not exist two points in U whose distance is d, but there are sequences of points {uk} and {vk} in U such that their distances {∥uk − vk∥} approach d as k → ∞. We will formulate this as a more general theorem.

Theorem 3.19 Let S be a subset of Rⁿ. If S is bounded, then its closure $\overline{S}$ is also bounded. Moreover, diam $\overline{S}$ = diam S.

Proof If u and v are two points in $\overline{S}$, there exist sequences {uk} and {vk} in S that converge respectively to u and v. Then
\[
d(u, v) = \lim_{k \to \infty} d(u_k, v_k). \tag{3.2}
\]
For each k ∈ Z⁺, since uk and vk are in S, d(uk, vk) ≤ diam S. Eq. (3.2) implies that d(u, v) ≤ diam S. Since this is true for any u and v in $\overline{S}$, $\overline{S}$ is bounded and diam $\overline{S}$ ≤ diam S. Since S ⊂ $\overline{S}$, we also have diam S ≤ diam $\overline{S}$. We conclude that diam $\overline{S}$ = diam S.

The following example justifies that the diameter of a ball of radius r is indeed 2r.
Example 3.17 Find the diameter of the open ball B(x0, r) in Rⁿ.

Solution By Theorem 3.19, the diameter of the open ball B(x0, r) is the same as the diameter of its closure, the closed ball CB(x0, r). Given u and v in CB(x0, r), ∥u − x0∥ ≤ r and ∥v − x0∥ ≤ r. Therefore, ∥u − v∥ ≤ ∥u − x0∥ + ∥v − x0∥ ≤ 2r. This shows that diam CB(x0, r) ≤ 2r. The points u0 = x0 + re1 and v0 = x0 − re1 are in the closed ball CB(x0, r). Since ∥u0 − v0∥ = ∥2re1∥ = 2r, diam CB(x0, r) ≥ 2r. Therefore, the diameter of the closed ball CB(x0, r) is exactly 2r. By Theorem 3.19, the diameter of the open ball B(x0, r) is also 2r.
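The conclusion of Example 3.17 can be illustrated numerically: distances between points of the open ball stay below 2r, while suitable pairs of points come arbitrarily close to 2r. The sketch below is an illustration under assumptions made here (the sample center, radius, seed, and helper names are hypothetical); random sampling cannot prove the bound.

```python
import math
import random

def sample_open_ball(center, r, rng):
    """Rejection-sample a point of the open ball B(center, r)."""
    while True:
        offset = [rng.uniform(-r, r) for _ in center]
        if math.hypot(*offset) < r:
            return tuple(c + o for c, o in zip(center, offset))

def shrunk_antipodes(center, r, k):
    """The points center +/- (1 - 1/k) r e_1, which lie in the open ball."""
    s = (1 - 1 / k) * r
    u = (center[0] + s,) + tuple(center[1:])
    v = (center[0] - s,) + tuple(center[1:])
    return u, v

rng = random.Random(0)          # fixed seed so the run is repeatable
x0, r = (1.0, -2.0, 0.5), 1.5   # sample center and radius
pts = [sample_open_ball(x0, r, rng) for _ in range(200)]
largest = max(math.dist(p, q) for p in pts for q in pts)

# Distances 2 (1 - 1/k) r increase toward 2r as k grows.
near = [math.dist(*shrunk_antipodes(x0, r, k)) for k in (10, 100, 1000)]
```

The list `near` shows why the supremum 2r is approached but not attained inside the open ball.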
Figure 3.13: The diameter of a ball.

In volume I, we have shown that a bounded sequence in $\mathbb{R}$ has a convergent subsequence. This is achieved by using the monotone convergence theorem, which says that a bounded monotone sequence in $\mathbb{R}$ is convergent. For points in $\mathbb{R}^n$ with $n \geq 2$, we cannot apply the monotone convergence theorem, as we cannot define a simple order on the points in $\mathbb{R}^n$ when $n \geq 2$. Nevertheless, we can use the result for $n = 1$ and the componentwise convergence theorem to show that a bounded sequence in $\mathbb{R}^n$ has a convergent subsequence.
Theorem 3.20
Let $\{u_k\}$ be a sequence in $\mathbb{R}^n$. If $\{u_k\}$ is bounded, then there is a subsequence that is convergent.

Sketch of Proof
The $n = 1$ case is already established in volume I. Here we prove the $n = 2$ case. The $n \geq 3$ case can be proved by induction using the same reasoning.
For $k \in \mathbb{Z}^+$, let $u_k = (x_k, y_k)$. Since
$$|x_k| \leq \|u_k\| \quad \text{and} \quad |y_k| \leq \|u_k\|,$$
the sequences $\{x_k\}$ and $\{y_k\}$ are bounded sequences. Thus, there is a subsequence $\{x_{k_j}\}_{j=1}^{\infty}$ of $\{x_k\}_{k=1}^{\infty}$ that converges to a point $x_0$ in $\mathbb{R}$. Consider the subsequence $\{y_{k_j}\}_{j=1}^{\infty}$ of the sequence $\{y_k\}_{k=1}^{\infty}$. It is also bounded. Hence, there is a subsequence $\{y_{k_{j_l}}\}_{l=1}^{\infty}$ that converges to a point $y_0$ in $\mathbb{R}$. Notice that the subsequence $\{x_{k_{j_l}}\}_{l=1}^{\infty}$ of $\{x_k\}_{k=1}^{\infty}$ is also a subsequence of $\{x_{k_j}\}_{j=1}^{\infty}$. Hence, it also converges to $x_0$. By the componentwise convergence theorem, $\{u_{k_{j_l}}\}_{l=1}^{\infty}$ is a subsequence of $\{u_k\}_{k=1}^{\infty}$ that converges to $(x_0, y_0)$. This proves the theorem when $n = 2$.

Now we study the concept of sequential compactness. It is the same as the $n = 1$ case.

Definition 3.11 Sequentially Compact
Let $S$ be a subset of $\mathbb{R}^n$. We say that $S$ is sequentially compact provided that every sequence in $S$ has a subsequence that converges to a point in $S$.

In volume I, we proved the Bolzano-Weierstrass theorem, which says that a subset of $\mathbb{R}$ is sequentially compact if and only if it is closed and bounded. In fact, the same is true for the $n \geq 2$ case. Let us first look at some examples.

Example 3.18
Show that the set $A = \{(x, y) \mid x^2 + y^2 < 1\}$ is not sequentially compact.
Solution
For $k \in \mathbb{Z}^+$, let
$$u_k = \left(\frac{k}{k+1}, 0\right).$$
Then $\{u_k\}$ is a sequence in $A$ that converges to the point $u_0 = (1, 0)$ that is not in $A$. Thus, every subsequence of $\{u_k\}$ converges to the point $u_0$, which is not in $A$. This means the sequence $\{u_k\}$ in $A$ does not have a subsequence that converges to a point in $A$. Hence, $A$ is not sequentially compact.

Note that the set $A$ in Example 3.18 is not closed.

Example 3.19
Show that the set $C = \{(x, y) \mid 1 \leq x \leq 3,\ y \geq 0\}$ is not sequentially compact.

Solution
For $k \in \mathbb{Z}^+$, let $u_k = (2, k)$. Then $\{u_k\}$ is a sequence in $C$. If $\{u_{k_j}\}_{j=1}^{\infty}$ is a subsequence of $\{u_k\}$, then $k_1, k_2, k_3, \ldots$ is a strictly increasing sequence of positive integers. Therefore $k_j \geq j$ for all $j \in \mathbb{Z}^+$. It follows that
$$\|u_{k_j}\| = \|(2, k_j)\| \geq k_j \geq j \quad \text{for all } j \in \mathbb{Z}^+.$$
Hence, the subsequence $\{u_{k_j}\}$ is not bounded. Therefore, it is not convergent. This means that the sequence $\{u_k\}$ in $C$ does not have a convergent subsequence. Therefore, $C$ is not sequentially compact.

Note that the set $C$ in Example 3.19 is not bounded. Now we prove the main theorem.
Theorem 3.21 Bolzano-Weierstrass Theorem
Let $S$ be a subset of $\mathbb{R}^n$. The following are equivalent.
(a) $S$ is closed and bounded.
(b) $S$ is sequentially compact.

Proof
First assume that $S$ is closed and bounded. Let $\{x_k\}$ be a sequence in $S$. Then $\{x_k\}$ is also bounded. By Theorem 3.20, there is a subsequence $\{x_{k_j}\}$ that converges to some $x_0$. Since $S$ is closed, $x_0$ must be in $S$. This proves that every sequence in $S$ has a subsequence that converges to a point in $S$. Hence, $S$ is sequentially compact. This completes the proof that (a) implies (b).
To prove that (b) implies (a), it suffices to show that if $S$ is not closed or $S$ is not bounded, then $S$ is not sequentially compact.
If $S$ is not closed, there is a sequence $\{x_k\}$ in $S$ that converges to a point $x_0$, but $x_0$ is not in $S$. Then every subsequence of $\{x_k\}$ converges to the point $x_0$, which is not in $S$. Thus, $\{x_k\}$ is a sequence in $S$ that does not have any subsequence that converges to a point in $S$. This shows that $S$ is not sequentially compact.
If $S$ is not bounded, for each positive integer $k$, there is a point $x_k$ in $S$ such that $\|x_k\| \geq k$. If $\{x_{k_j}\}_{j=1}^{\infty}$ is a subsequence of $\{x_k\}$, then $k_1, k_2, k_3, \ldots$ is a strictly increasing sequence of positive integers. Therefore $k_j \geq j$ for all $j \in \mathbb{Z}^+$. It follows that $\|x_{k_j}\| \geq k_j \geq j$ for all $j \in \mathbb{Z}^+$. Hence, the subsequence $\{x_{k_j}\}$ is not bounded. Therefore, it is not convergent. This means that the sequence $\{x_k\}$ in $S$ does not have a convergent subsequence. Therefore, $S$ is not sequentially compact.

Corollary 3.22
A closed rectangle $R = [a_1, b_1] \times \cdots \times [a_n, b_n]$ in $\mathbb{R}^n$ is sequentially compact.
Proof
We have shown in Chapter 1 that $R$ is closed. Example 3.16 shows that $R$ is bounded. Thus, $R$ is sequentially compact.

An interesting consequence of Theorem 3.19 is the following.

Corollary 3.23
If $S$ is a bounded subset of $\mathbb{R}^n$, then its closure $\overline{S}$ is sequentially compact.

Example 3.20
Determine whether the following subsets of $\mathbb{R}^3$ are sequentially compact.
(a) $A = \{(x, y, z) \mid xyz = 1\}$.
(b) $B = \{(x, y, z) \mid x^2 + 4y^2 + 9z^2 \leq 36\}$.
(c) $C = \{(x, y, z) \mid 1 \leq x \leq 2,\ 1 \leq y \leq 3,\ 0 < xyz \leq 4\}$.

Solution
(a) For any $k \in \mathbb{Z}^+$, let
$$u_k = \left(k, \frac{1}{k}, 1\right).$$
Then $\{u_k\}$ is a sequence in $A$, and $\|u_k\| \geq k$. Therefore, $A$ is not bounded. Hence, $A$ is not sequentially compact.
(b) For any $u = (x, y, z) \in B$,
$$\|u\|^2 = x^2 + y^2 + z^2 \leq x^2 + 4y^2 + 9z^2 \leq 36.$$
Hence, $B$ is bounded. The function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = x^2 + 4y^2 + 9z^2$ is a polynomial. Hence, it is continuous. Since the set $I = (-\infty, 36]$ is closed in $\mathbb{R}$, and $B = f^{-1}(I)$, $B$ is closed in $\mathbb{R}^3$. Since $B$ is closed and bounded, it is sequentially compact.
(c) For any $k \in \mathbb{Z}^+$, let
$$u_k = \left(1, 1, \frac{1}{k}\right).$$
Then $\{u_k\}$ is a sequence of points in $C$ that converges to the point $u_0 = (1, 1, 0)$, which is not in $C$. Thus, $C$ is not closed, and so $C$ is not sequentially compact.

The following theorem asserts that continuous functions preserve sequential compactness.

Theorem 3.24
Let $D$ be a sequentially compact subset of $\mathbb{R}^n$. If the function $F : D \to \mathbb{R}^m$ is continuous, then $F(D)$ is a sequentially compact subset of $\mathbb{R}^m$.

The proof of this theorem is identical to the $n = 1$ case.

Proof
Let $\{y_k\}$ be a sequence in $F(D)$. For each $k \in \mathbb{Z}^+$, there exists $x_k \in D$ such that $F(x_k) = y_k$. Since $D$ is sequentially compact, there is a subsequence $\{x_{k_j}\}$ of $\{x_k\}$ that converges to a point $x_0$ in $D$. Since $F$ is continuous, the sequence $\{F(x_{k_j})\}$ converges to $F(x_0)$. Note that $F(x_0)$ is in $F(D)$. In other words, $\{y_{k_j}\}$ is a subsequence of the sequence $\{y_k\}$ that converges to the point $F(x_0)$ of $F(D)$. This shows that every sequence in $F(D)$ has a subsequence that converges to a point in $F(D)$. Thus, $F(D)$ is a sequentially compact subset of $\mathbb{R}^m$.

We are going to discuss important consequences of Theorem 3.24 in the coming section. For the rest of this section, we introduce the concept of compactness, which plays a central role in modern analysis. We start with the definition of an open covering.
Definition 3.12 Open Covering
Let $S$ be a subset of $\mathbb{R}^n$, and let $\mathcal{A} = \{U_\alpha \mid \alpha \in J\}$ be a collection of open sets in $\mathbb{R}^n$ indexed by the set $J$. We say that $\mathcal{A}$ is an open covering of $S$ provided that
$$S \subset \bigcup_{\alpha \in J} U_\alpha.$$

Example 3.21
For each $k \in \mathbb{Z}^+$, let $U_k = (1/k, 1)$. Then $U_k$ is an open set in $\mathbb{R}$ and
$$\bigcup_{k=1}^{\infty} U_k = (0, 1).$$
Hence, $\mathcal{A} = \{U_k \mid k \in \mathbb{Z}^+\}$ is an open covering of the set $S = (0, 1)$.

Remark 3.4
If $\mathcal{A} = \{U_\alpha \mid \alpha \in J\}$ is an open covering of $S$ and $S'$ is a subset of $S$, then $\mathcal{A}$ is also an open covering of $S'$.

Example 3.22
For each $k \in \mathbb{Z}^+$, let $U_k = B(0, k)$ be the ball in $\mathbb{R}^n$ centered at the origin and having radius $k$. Then
$$\bigcup_{k=1}^{\infty} U_k = \mathbb{R}^n.$$
Thus, $\mathcal{A} = \{U_k \mid k \in \mathbb{Z}^+\}$ is an open covering of any subset $S$ of $\mathbb{R}^n$.

Definition 3.13 Subcover
Let $S$ be a subset of $\mathbb{R}^n$, and let $\mathcal{A} = \{U_\alpha \mid \alpha \in J\}$ be an open covering of $S$. A subcover is a subcollection of $\mathcal{A}$ which is also a covering of $S$. A finite subcover is a subcover that contains only finitely many elements.
Example 3.23
For each $k \in \mathbb{Z}$, let $U_k = (k, k + 2)$. Then
$$\bigcup_{k=-\infty}^{\infty} U_k = \mathbb{R}.$$
Thus, $\mathcal{A} = \{U_k \mid k \in \mathbb{Z}\}$ is an open covering of the set $S = [-3, 4)$. There is a finite subcover of $S$ given by $\mathcal{B} = \{U_{-4}, U_{-3}, U_{-2}, U_{-1}, U_0, U_1, U_2\}$.

Definition 3.14 Compact Sets
Let $S$ be a subset of $\mathbb{R}^n$. We say that $S$ is compact provided that every open covering of $S$ has a finite subcover. Namely, if $\mathcal{A} = \{U_\alpha \mid \alpha \in J\}$ is an open covering of $S$, then there exist $\alpha_1, \ldots, \alpha_k \in J$ such that
$$S \subset \bigcup_{j=1}^{k} U_{\alpha_j}.$$
Example 3.24
The subset $S = (0, 1)$ of $\mathbb{R}$ is not compact. For $k \in \mathbb{Z}^+$, let $U_k = (1/k, 1)$. Example 3.21 says that $\mathcal{A} = \{U_k \mid k \in \mathbb{Z}^+\}$ is an open covering of the set $S$. We claim that there is no finite subcollection of $\mathcal{A}$ that covers $S$.
Assume to the contrary that there exists a finite subcollection of $\mathcal{A}$ that covers $S$. Then there are positive integers $k_1, \ldots, k_m$ such that
$$(0, 1) \subset \bigcup_{j=1}^{m} U_{k_j} = \bigcup_{j=1}^{m} \left(\frac{1}{k_j}, 1\right).$$
Notice that if $k_i \leq k_j$, then $U_{k_i} \subset U_{k_j}$. Thus, if $K = \max\{k_1, \ldots, k_m\}$, then
$$\bigcup_{j=1}^{m} U_{k_j} = U_K = \left(\frac{1}{K}, 1\right),$$
and so $S = (0, 1)$ is not contained in $U_K$. This gives a contradiction. Hence, $S$ is not compact.
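The argument in Example 3.24 can be made concrete: for any finite list of indices, a point of $(0, 1)$ such as $1/(2K)$ with $K$ the largest index lies outside every chosen $U_k$. The following sketch checks this with exact rational arithmetic; the helper names and the sample indices are assumptions made for illustration.

```python
from fractions import Fraction

def in_U(x, k):
    """Membership in U_k = (1/k, 1)."""
    return Fraction(1, k) < x < 1

def uncovered_point(indices):
    """A point of S = (0, 1) missed by every U_k with k in `indices`."""
    K = max(indices)
    return Fraction(1, 2 * K)

indices = [3, 7, 20, 5]            # a hypothetical finite subcollection
x = uncovered_point(indices)
print(x, any(in_U(x, k) for k in indices))
```

The point printed lies in $(0, 1)$, yet it belongs to none of the chosen sets, so the finite subcollection does not cover $S$.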
Example 3.25
As a subset of itself, $\mathbb{R}^n$ is not compact. For $k \in \mathbb{Z}^+$, let $U_k = B(0, k)$ be the ball in $\mathbb{R}^n$ centered at the origin and having radius $k$. Example 3.22 says that $\mathcal{A} = \{U_k \mid k \in \mathbb{Z}^+\}$ is an open covering of $\mathbb{R}^n$. We claim that there is no finite subcover.
Assume to the contrary that there is a finite subcover. Then there exist positive integers $k_1, \ldots, k_m$ such that
$$\mathbb{R}^n = \bigcup_{j=1}^{m} U_{k_j}.$$
Notice that if $k_i \leq k_j$, then $U_{k_i} \subset U_{k_j}$. Thus, if $K = \max\{k_1, \ldots, k_m\}$, then
$$\bigcup_{j=1}^{m} U_{k_j} = U_K = B(0, K).$$
Obviously, $B(0, K)$ is not equal to $\mathbb{R}^n$. This gives a contradiction. Hence, $\mathbb{R}^n$ is not compact.

Our goal is to prove the Heine-Borel theorem, which says that a subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded. We first prove the easier direction.

Theorem 3.25
Let $S$ be a subset of $\mathbb{R}^n$. If $S$ is compact, then it is closed and bounded.

Proof
We show that if $S$ is compact, then it is bounded; and if $S$ is compact, then it is closed.
First we prove that if $S$ is compact, then it is bounded. For $k \in \mathbb{Z}^+$, let $U_k = B(0, k)$ be the ball in $\mathbb{R}^n$ centered at the origin and having radius $k$. Example 3.22 says that $\mathcal{A} = \{U_k \mid k \in \mathbb{Z}^+\}$ is an open covering of $S$. Since $S$ is compact, there exist positive integers $k_1, \ldots, k_m$ such that
$$S \subset \bigcup_{j=1}^{m} U_{k_j} = U_K = B(0, K),$$
where $K = \max\{k_1, \ldots, k_m\}$. This shows that
$$\|x\| \leq K \quad \text{for all } x \in S.$$
Hence, $S$ is bounded.
Now we prove that if $S$ is compact, then it is closed. For this, it suffices to show that $\overline{S} \subset S$, or equivalently, any point that is not in $S$ is not in $\overline{S}$. Assume that $x_0$ is not in $S$. For each $k \in \mathbb{Z}^+$, let
$$V_k = \operatorname{ext} B(x_0, 1/k) = \left\{ x \in \mathbb{R}^n \,\middle|\, \|x - x_0\| > \frac{1}{k} \right\}.$$
Then $V_k$ is open in $\mathbb{R}^n$. If $x$ is a point in $\mathbb{R}^n$ and $x \neq x_0$, then $r = \|x - x_0\| > 0$. There is a $k \in \mathbb{Z}^+$ such that $1/k < r$. Then $x$ is in $V_k$. This shows that
$$\bigcup_{k=1}^{\infty} V_k = \mathbb{R}^n \setminus \{x_0\}.$$
Since $x_0$ is not in $S$, $\mathcal{A} = \{V_k \mid k \in \mathbb{Z}^+\}$ is an open covering of $S$. Since $S$ is compact, there is a finite subcover. Namely, there exist positive integers $k_1, \ldots, k_m$ such that
$$S \subset \bigcup_{j=1}^{m} V_{k_j} = V_K,$$
where $K = \max\{k_1, \ldots, k_m\}$. Since $B(x_0, 1/K)$ is disjoint from $V_K$, it does not contain any point of $S$. This shows that $x_0$ is not in $\overline{S}$, and thus the proof is completed.

Example 3.26
The set $A = \{(x, y, z) \mid xyz = 1\}$ in Example 3.20 is not compact because it is not bounded. The set $C = \{(x, y, z) \mid 1 \leq x \leq 2,\ 1 \leq y \leq 3,\ 0 < xyz \leq 4\}$ is not compact because it is not closed.
We are now left to show that a closed and bounded subset of $\mathbb{R}^n$ is compact. We start by proving a special case.

Theorem 3.26
A closed rectangle $R = [a_1, b_1] \times \cdots \times [a_n, b_n]$ in $\mathbb{R}^n$ is compact.

Proof
We prove this by contradiction, using the bisection method. Assume that $R$ is not compact. Then there is an open covering $\mathcal{A} = \{U_\alpha \mid \alpha \in J\}$ of $R$ which does not have a finite subcover. Let $R_1 = R$, and let $d_1 = \operatorname{diam} R_1$. For $1 \leq i \leq n$, let $a_{i,1} = a_i$ and $b_{i,1} = b_i$, and let $m_{i,1}$ be the midpoint of the interval $[a_{i,1}, b_{i,1}]$. The hyperplanes $x_i = m_{i,1}$, $1 \leq i \leq n$, divide the rectangle $R_1$ into $2^n$ subrectangles. Notice that $\mathcal{A}$ is also an open covering of each of these subrectangles. If each of these subrectangles can be covered by a finite subcollection of open sets in $\mathcal{A}$, then $R$ can also be covered by a finite subcollection of open sets in $\mathcal{A}$. Since we assume that $R$ cannot be covered by any finite subcollection of open sets in $\mathcal{A}$, at least one of the $2^n$ subrectangles cannot be covered by any finite subcollection of open sets in $\mathcal{A}$. Choose one of these, and denote it by $R_2$. Define $a_{i,2}, b_{i,2}$ for $1 \leq i \leq n$ so that
$$R_2 = [a_{1,2}, b_{1,2}] \times \cdots \times [a_{n,2}, b_{n,2}].$$
Note that
$$b_{i,2} - a_{i,2} = \frac{b_{i,1} - a_{i,1}}{2} \quad \text{for } 1 \leq i \leq n.$$
Therefore, $d_2 = \operatorname{diam} R_2 = d_1/2$. We continue this bisection process to obtain the rectangles $R_1, R_2, \ldots$, so that $R_{k+1} \subset R_k$ for all $k \in \mathbb{Z}^+$, and $R_k$ cannot be covered by any finite subcollection of $\mathcal{A}$.
Figure 3.14: Bisection method.

Define $a_{i,k}, b_{i,k}$ for $1 \leq i \leq n$ so that
$$R_k = [a_{1,k}, b_{1,k}] \times \cdots \times [a_{n,k}, b_{n,k}].$$
Then for all $k \in \mathbb{Z}^+$,
$$b_{i,k+1} - a_{i,k+1} = \frac{b_{i,k} - a_{i,k}}{2} \quad \text{for } 1 \leq i \leq n.$$
It follows that $d_{k+1} = \operatorname{diam} R_{k+1} = d_k/2$.
For any $1 \leq i \leq n$, $\{a_{i,k}\}_{k=1}^{\infty}$ is an increasing sequence that is bounded above by $b_i$, and $\{b_{i,k}\}_{k=1}^{\infty}$ is a decreasing sequence that is bounded below by $a_i$. By the monotone convergence theorem, the sequence $\{a_{i,k}\}_{k=1}^{\infty}$ converges to $a_{i,0} = \sup_{k \in \mathbb{Z}^+} a_{i,k}$, while the sequence $\{b_{i,k}\}_{k=1}^{\infty}$ converges to $b_{i,0} = \inf_{k \in \mathbb{Z}^+} b_{i,k}$. Since
$$b_{i,k} - a_{i,k} = \frac{b_i - a_i}{2^{k-1}} \quad \text{for all } k \in \mathbb{Z}^+,$$
we find that $a_{i,0} = b_{i,0}$. Let $c_i = a_{i,0} = b_{i,0}$. Then $a_{i,k} \leq c_i \leq b_{i,k}$ for all $1 \leq i \leq n$ and all $k \in \mathbb{Z}^+$. Thus, $c = (c_1, \ldots, c_n)$ is a point in $R_k$ for all $k \in \mathbb{Z}^+$. Since $\mathcal{A}$ is an open covering of $R = R_1$, there exists $\beta \in J$ such that $c \in U_\beta$. Since $U_\beta$ is an open set, there is an $r > 0$ such that $B(c, r) \subset U_\beta$. Since
$$d_k = \operatorname{diam} R_k = \frac{d_1}{2^{k-1}} \quad \text{for all } k \in \mathbb{Z}^+,$$
we find that $\lim_{k \to \infty} d_k = 0$. Hence, there is a positive integer $K$ such that $d_K < r$. If $x \in R_K$, then
$$\|x - c\| \leq \operatorname{diam} R_K = d_K < r.$$
This implies that $x$ is in $B(c, r)$. Thus, we have shown that $R_K \subset B(c, r)$. Therefore, $R_K$ is contained in the single element $U_\beta$ of $\mathcal{A}$, which contradicts the fact that $R_K$ cannot be covered by any finite subcollection of open sets in $\mathcal{A}$. We conclude that $R$ must be compact.

Now we can prove the Heine-Borel theorem.

Theorem 3.27 Heine-Borel Theorem
Let $S$ be a subset of $\mathbb{R}^n$. Then $S$ is compact if and only if it is closed and bounded.

Proof
We have shown in Theorem 3.25 that if $S$ is compact, then it must be closed and bounded. Now assume that $S$ is closed and bounded, and let $\mathcal{A} = \{U_\alpha \mid \alpha \in J\}$ be an open covering of $S$. Since $S$ is bounded, there exists a positive number $M$ such that $\|x\| \leq M$ for all $x \in S$. Thus, if $x = (x_1, \ldots, x_n)$ is in $S$, then for all $1 \leq i \leq n$, $|x_i| \leq \|x\| \leq M$. This implies that $S$ is contained in the closed rectangle $R = [-M, M] \times \cdots \times [-M, M]$. Let $V = \mathbb{R}^n \setminus S$. Since $S$ is closed, $V$ is an open set. Then $\widetilde{\mathcal{A}} = \mathcal{A} \cup \{V\}$ is an open covering of $\mathbb{R}^n$, and hence it is an open covering of $R$. By Theorem 3.26, $R$ is compact. Thus, there exists $\widetilde{\mathcal{B}} \subset \widetilde{\mathcal{A}}$ which is a finite subcover of $R$. Since $S \subset R$ and $V$ is disjoint from $S$, $\mathcal{B} = \widetilde{\mathcal{B}} \setminus \{V\}$ is a finite subcollection of $\mathcal{A}$ that covers $S$. This proves that $S$ is compact.
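The geometric part of the bisection argument in the proof of Theorem 3.26 can be sketched numerically. In the proof, the subrectangle kept at each step is one that admits no finite subcover, which cannot be decided by a program; as a stand-in, the sketch below keeps the half containing a fixed point $c$. The rectangle, the point $c$ and the radius $r$ are assumptions made only for illustration.

```python
import math

def diameter(rect):
    """Diameter of a closed rectangle given as a list of (lo, hi) intervals."""
    return math.sqrt(sum((hi - lo) ** 2 for lo, hi in rect))

def bisect_towards(rect, c):
    """Halve every side, keeping the half that contains the point c.

    Stand-in rule: the proof keeps a subrectangle with no finite subcover.
    """
    new = []
    for (lo, hi), ci in zip(rect, c):
        mid = (lo + hi) / 2
        new.append((lo, mid) if ci <= mid else (mid, hi))
    return new

rect = [(0.0, 4.0), (-1.0, 2.0)]   # R_1, with diameter d_1 = 5
c = (1.3, 0.7)                     # hypothetical common point of all R_k
r = 0.01                           # hypothetical radius with B(c, r) inside some U_beta
d1, k = diameter(rect), 1
while diameter(rect) >= r:
    rect = bisect_towards(rect, c)
    k += 1
print(k, diameter(rect), d1 / 2 ** (k - 1))
```

The loop stops at the first index $K$ with $d_K < r$, at which point $R_K$ contains $c$ and has diameter $d_1/2^{K-1}$, so $R_K \subset B(c, r)$.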
Example 3.27
We have shown in Example 3.20 that the set
$$B = \{(x, y, z) \mid x^2 + 4y^2 + 9z^2 \leq 36\}$$
is closed and bounded. Hence, it is compact.

Now we can conclude our main theorem from the Bolzano-Weierstrass theorem and the Heine-Borel theorem.

Theorem 3.28
Let $S$ be a subset of $\mathbb{R}^n$. Then the following are equivalent.
(a) $S$ is sequentially compact.
(b) $S$ is closed and bounded.
(c) $S$ is compact.

Remark 3.5
Henceforth, when we say a subset $S$ of $\mathbb{R}^n$ is compact, we mean it is a closed and bounded set, and it is sequentially compact. By Theorem 3.19, a subset $S$ of $\mathbb{R}^n$ has compact closure if and only if it is a bounded set.

Finally, we can conclude the following, which says that continuous functions preserve compactness.

Theorem 3.29
Let $D$ be a compact subset of $\mathbb{R}^n$. If the function $F : D \to \mathbb{R}^m$ is continuous, then $F(D)$ is a compact subset of $\mathbb{R}^m$.

Proof
Since $D$ is compact, it is sequentially compact. By Theorem 3.24, $F(D)$ is a sequentially compact subset of $\mathbb{R}^m$. Hence, $F(D)$ is a compact subset of $\mathbb{R}^m$.
Exercises 3.3

Question 1
Determine whether the following subsets of $\mathbb{R}^2$ are sequentially compact.
(a) $A = \{(x, y) \mid x^2 + y^2 = 9\}$.
(b) $B = \{(x, y) \mid 0 < x^2 + 4y^2 \leq 36\}$.
(c) $C = \{(x, y) \mid x \geq 0,\ 0 \leq y \leq x^2\}$.

Question 2
Determine whether the following subsets of $\mathbb{R}^3$ are compact.
(a) $A = \{(x, y, z) \mid 1 \leq x \leq 2\}$.
(b) $B = \{(x, y, z) \mid |x| + |y| + |z| \leq 10\}$.
(c) $C = \{(x, y, z) \mid 4 \leq x^2 + y^2 + z^2 \leq 9\}$.

Question 3
Given that $A$ is a compact subset of $\mathbb{R}^n$ and $B$ is a subset of $A$, show that $B$ is compact if and only if it is closed.

Question 4
If $S_1, \ldots, S_k$ are compact subsets of $\mathbb{R}^n$, show that $S = S_1 \cup \cdots \cup S_k$ is also compact.

Question 5
If $A$ is a compact subset of $\mathbb{R}^m$ and $B$ is a compact subset of $\mathbb{R}^n$, show that $A \times B$ is a compact subset of $\mathbb{R}^{m+n}$.
3.4 Applications of Compactness
In this section, we consider the applications of compactness. We are going to use repeatedly the fact that a subset $S$ of $\mathbb{R}^n$ is compact if and only if it is closed and bounded, if and only if it is sequentially compact.

3.4.1 The Extreme Value Theorem
First we define bounded functions.

Definition 3.15 Bounded Functions
Let $D$ be a subset of $\mathbb{R}^n$, and let $F : D \to \mathbb{R}^m$ be a function defined on $D$. We say that the function $F$ is bounded if the set $F(D)$ is a bounded subset of $\mathbb{R}^m$. In other words, the function $F : D \to \mathbb{R}^m$ is bounded if there is a positive number $M$ such that
$$\|F(x)\| \leq M \quad \text{for all } x \in D.$$

Example 3.28
Let $D = \{(x, y, z) \mid 0 < x^2 + y^2 + z^2 < 4\}$, and let $F : D \to \mathbb{R}^2$ be the function defined as
$$F(x, y, z) = \left(\frac{1}{x^2 + y^2 + z^2},\ x + y + z\right).$$
For $k \in \mathbb{Z}^+$, the point $u_k = (1/k, 0, 0)$ is in $D$ and
$$F(u_k) = \left(k^2, \frac{1}{k}\right).$$
Thus, $\|F(u_k)\| \geq k^2$. This shows that $F$ is not bounded, even though $D$ is a bounded set.

Theorem 3.24 gives the following.
Theorem 3.30
Let $D$ be a compact subset of $\mathbb{R}^n$. If the function $F : D \to \mathbb{R}^m$ is continuous, then it is bounded.

Proof
By Theorem 3.29, $F(D)$ is compact. Hence, it is bounded.

Example 3.29
Let $D = \{(x, y, z) \mid 1 < x^2 + y^2 + z^2 < 4\}$, and let $F : D \to \mathbb{R}^2$ be the function defined as
$$F(x, y, z) = \left(\frac{1}{x^2 + y^2 + z^2},\ x + y + z\right).$$
Show that $F : D \to \mathbb{R}^2$ is a bounded function.

Solution
Notice that the set $D$ is not closed. Therefore, we cannot apply Theorem 3.30 directly. Consider the set
$$U = \{(x, y, z) \mid 1 \leq x^2 + y^2 + z^2 \leq 4\}.$$
For any $u = (x, y, z)$ in $U$, $\|u\| \leq 2$. Hence, $U$ is bounded. The function $f : \mathbb{R}^3 \to \mathbb{R}$ defined as $f(x, y, z) = x^2 + y^2 + z^2$ is continuous, and $U = f^{-1}([1, 4])$. Since $[1, 4]$ is closed in $\mathbb{R}$, $U$ is closed in $\mathbb{R}^3$. Since $f(x, y, z) \neq 0$ on $U$,
$$F_1(x, y, z) = \frac{1}{x^2 + y^2 + z^2}$$
is continuous on $U$. Being a polynomial function, $F_2(x, y, z) = x + y + z$ is continuous. Thus, $F : U \to \mathbb{R}^2$ is continuous. Since $U$ is closed and bounded, Theorem 3.30 implies that $F : U \to \mathbb{R}^2$ is bounded. Since $D \subset U$, $F : D \to \mathbb{R}^2$ is also a bounded function.

Recall that if $S$ is a subset of $\mathbb{R}$, $S$ has a maximum value if and only if $S$ is bounded above and $\sup S$ is in $S$; while $S$ has a minimum value if and only if $S$ is bounded below and $\inf S$ is in $S$.
Definition 3.16 Extremizer and Extreme Values
Let $D$ be a subset of $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ be a function defined on $D$.
1. The function $f$ has a maximum value if there is a point $x_0$ in $D$ such that
$$f(x_0) \geq f(x) \quad \text{for all } x \in D.$$
The point $x_0$ is called a maximizer of $f$; and $f(x_0)$ is the maximum value of $f$.
2. The function $f$ has a minimum value if there is a point $x_0$ in $D$ such that
$$f(x_0) \leq f(x) \quad \text{for all } x \in D.$$
The point $x_0$ is called a minimizer of $f$; and $f(x_0)$ is the minimum value of $f$.

We have proved in volume I that a sequentially compact subset of $\mathbb{R}$ has a maximum value and a minimum value. This gives us the extreme value theorem.

Theorem 3.31 Extreme Value Theorem
Let $D$ be a compact subset of $\mathbb{R}^n$. If the function $f : D \to \mathbb{R}$ is continuous, then it has a maximum value and a minimum value.

Proof
By Theorem 3.24, $f(D)$ is a sequentially compact subset of $\mathbb{R}$. Therefore, $f$ has a maximum value and a minimum value.

Example 3.30
Let $D = \{(x, y) \mid x^2 + 2x + y^2 \leq 3\}$, and let $f : D \to \mathbb{R}$ be the function defined by
$$f(x, y) = x^2 + xy^3 + e^{x - y}.$$
Show that $f$ has a maximum value and a minimum value.
Solution
Notice that
$$D = \{(x, y) \mid x^2 + 2x + y^2 \leq 3\} = \{(x, y) \mid (x + 1)^2 + y^2 \leq 4\}$$
is a closed ball. Thus, it is closed and bounded. The function $f_1(x, y) = x^2 + xy^3$ and the function $g(x, y) = x - y$ are polynomial functions. Hence, they are continuous. The exponential function $h(x) = e^x$ is continuous. Hence, the function $f_2(x, y) = (h \circ g)(x, y) = e^{x - y}$ is continuous. Since $f = f_1 + f_2$, the function $f : D \to \mathbb{R}$ is continuous. Since $D$ is compact, the function $f : D \to \mathbb{R}$ has a maximum value and a minimum value.

Remark 3.6 Extreme Value Property
Let $S$ be a subset of $\mathbb{R}^n$. We say that $S$ has the extreme value property provided that whenever $f : S \to \mathbb{R}$ is a continuous function, then $f$ has maximum and minimum values.
The extreme value theorem says that if $S$ is compact, then it has the extreme value property. Now let us show the converse. Namely, if $S$ has the extreme value property, then it is compact, or equivalently, it is closed and bounded.
If $S$ is not bounded, the function $f : S \to \mathbb{R}$, $f(x) = \|x\|$ is continuous, but it does not have a maximum value. If $S$ is not closed, there is a sequence $\{x_k\}$ in $S$ that converges to a point $x_0$ that is not in $S$. The function $g : S \to \mathbb{R}$, $g(x) = \|x - x_0\|$ is continuous and $g(x) \geq 0$ for all $x \in S$. Since $\lim_{k \to \infty} g(x_k) = 0$, we find that $\inf g(S) = 0$. Since $x_0$ is not in $S$, there is no point $x$ in $S$ such that $g(x) = 0$. Hence, $g$ does not have a minimum value. This shows that for $S$ to have the extreme value property, it is necessary that $S$ is closed and bounded.
Therefore, a subset $S$ of $\mathbb{R}^n$ has the extreme value property if and only if it is compact.
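The extreme value theorem guarantees that the function of Example 3.30 attains its extreme values on the closed disc $D$, but it does not locate them. As a rough numerical illustration only, the sketch below samples $f$ on a grid of points of $D$ and reports the smallest and largest sampled values; the grid size and helper names are assumptions, and the sampled values merely bracket the true extreme values from inside.

```python
import math

def f(x, y):
    """The function of Example 3.30."""
    return x**2 + x * y**3 + math.exp(x - y)

def in_D(x, y):
    """D is the closed disc (x + 1)^2 + y^2 <= 4."""
    return (x + 1) ** 2 + y**2 <= 4

def sampled_extremes(n=400):
    """Min and max of f over grid points lying in D (an approximation only)."""
    values = []
    for i in range(n + 1):
        for j in range(n + 1):
            x = -3 + 4 * i / n      # x ranges over [-3, 1]
            y = -2 + 4 * j / n      # y ranges over [-2, 2]
            if in_D(x, y):
                values.append(f(x, y))
    return min(values), max(values)

lo, hi = sampled_extremes()
print(lo, hi)
```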
3.4.2 Distance Between Sets
The distance between two sets is defined in the following way.
Definition 3.17 Distance Between Two Sets
Let $A$ and $B$ be two subsets of $\mathbb{R}^n$. The distance between $A$ and $B$ is defined as
$$d(A, B) = \inf \{ d(a, b) \mid a \in A,\ b \in B \}.$$

The distance between two sets is always well-defined and nonnegative. If $A$ and $B$ are not disjoint, then their distance is 0.

Example 3.31
Let $A = \{(x, y) \mid x^2 + y^2 < 1\}$ and let $B = [1, 3] \times [-1, 1]$. Find the distance between the two sets $A$ and $B$.

Solution
For $k \in \mathbb{Z}^+$, let $a_k$ be the point in $A$ given by
$$a_k = \left(1 - \frac{1}{k}, 0\right).$$
Let $b = (1, 0)$. Then $b$ is in $B$. Notice that
$$d(a_k, b) = \|a_k - b\| = \frac{1}{k}.$$
Hence, $d(A, B) \leq \dfrac{1}{k}$ for all $k \in \mathbb{Z}^+$. This shows that the distance between $A$ and $B$ is 0.
Figure 3.15: The sets A and B in Example 3.31.
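The computation in Example 3.31 can be checked numerically. The sketch below, with helper names chosen here for illustration, verifies that each $a_k$ lies in the open disc $A$ and prints the distances $d(a_k, b)$, which shrink like $1/k$.

```python
import math

def a(k):
    """The point a_k = (1 - 1/k, 0) of the open unit disc A."""
    return (1 - 1 / k, 0.0)

b = (1.0, 0.0)   # a point of B = [1, 3] x [-1, 1]

for k in (1, 10, 100, 1000):
    ak = a(k)
    assert ak[0] ** 2 + ak[1] ** 2 < 1     # a_k really lies in A
    print(k, math.dist(ak, b))             # approximately 1/k
```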
In Example 3.31, we find that the distance between two disjoint sets can be 0, even though they are both bounded.

Example 3.32
Let $A = \{(x, y) \mid y = 0\}$ and let $B = \{(x, y) \mid xy = 1\}$. Find the distance between the two sets $A$ and $B$.

Solution
For $k \in \mathbb{Z}^+$, let $a_k = (k, 0)$ and $b_k = (k, 1/k)$. Then $a_k$ is in $A$ and $b_k$ is in $B$. Notice that
$$d(a_k, b_k) = \|a_k - b_k\| = \frac{1}{k}.$$
Hence, $d(A, B) \leq \dfrac{1}{k}$ for all $k \in \mathbb{Z}^+$. This shows that the distance between $A$ and $B$ is 0.
Figure 3.16: The sets A and B in Example 3.32.

In Example 3.32, we find that the distance between two disjoint sets can be 0, even though both of them are closed.
When $B$ is the one-point set $\{x_0\}$, the distance between $A$ and $B$ is the distance from the point $x_0$ to the set $A$. We denote this distance as $d(x_0, A)$. In other words,
$$d(x_0, A) = \inf \{ d(a, x_0) \mid a \in A \}.$$
If $x_0$ is a point in $A$, then $d(x_0, A) = 0$. However, the distance from a point $x_0$ to a set $A$ can be 0 even though $x_0$ is not in $A$. For example, the distance between the point $x_0 = (1, 0)$ and the set $A = \{(x, y) \mid x^2 + y^2 < 1\}$ is 0, even though $x_0$ is not in $A$. The following proposition says that this cannot happen if $A$ is closed.

Proposition 3.32
Let $A$ be a closed subset of $\mathbb{R}^n$ and let $x_0$ be a point in $\mathbb{R}^n$. Then $d(x_0, A) = 0$ if and only if $x_0$ is in $A$.

Proof
If $x_0$ is in $A$, it is obvious that $d(x_0, A) = 0$. Conversely, if $x_0$ is not in $A$, $x_0$ is in the open set $\mathbb{R}^n \setminus A$. Therefore, there is an $r > 0$ such that $B(x_0, r) \subset \mathbb{R}^n \setminus A$. For any $a \in A$, $a \notin B(x_0, r)$. Therefore, $\|x_0 - a\| \geq r$. Taking the infimum over $a \in A$, we find that $d(x_0, A) \geq r$. Hence, $d(x_0, A) \neq 0$.
Figure 3.17: A point outside a closed set has positive distance from the set. Proposition 3.33 Given a subset A of Rn , define the function f : Rn → R by f (x) = d(x, A). Then f is a continuous function.
Proof We prove something stronger. For any u and v in Rn , we claim that |f (u) − f (v)| ≤ ∥u − v∥. This means that f is a Lipschitz function with Lipschitz constant 1, which implies that it is continuous. Given u and v in Rn , if a is in A, then d(u, A) ≤ ∥u − a∥ ≤ ∥v − a∥ + ∥u − v∥. This shows that ∥v − a∥ ≥ d(u, A) − ∥u − v∥. Taking infimum over a ∈ A, we find that d(v, A) ≥ d(u, A) − ∥u − v∥. Therefore, f (u) − f (v) ≤ ∥u − v∥. Interchanging u and v, we obtain f (v) − f (u) ≤ ∥u − v∥. This proves that |f (u) − f (v)| ≤ ∥u − v∥. Now we can prove the following. Theorem 3.34 Let A and C be disjoint subsets of Rn . If A is compact and C is closed, then the distance between A and C is positive.
Proof
By Proposition 3.33, the function $f : A \to \mathbb{R}$, $f(a) = d(a, C)$ is continuous. Since $A$ is compact, $f$ has a minimum value. Namely, there is a point $a_0$ in $A$ such that
$$d(a_0, C) \leq d(a, C) \quad \text{for all } a \in A.$$
For any a in A and c ∈ C, d(a, c) ≥ d(a, C) ≥ d(a0 , C). Taking infimum over all a ∈ A and c ∈ C, we find that d(A, C) ≥ d(a0 , C). By definition, we also have d(A, C) ≤ d(a0 , C). Thus, d(A, C) = d(a0 , C). Since A and C are disjoint and C is closed, Proposition 3.32 implies that d(A, C) = d(a0 , C) > 0. An equivalent form of Theorem 3.34 is the following important theorem. Theorem 3.35 Let A be a compact subset of Rn , and let U be an open subset of Rn that contains A. Then there is a positive number δ such that if x is a point in Rn that has a distance less than δ from the set A, then x is in U .
Figure 3.18: A compact set has a positive distance from the boundary of the open set that contains it.
Proof
Let $C = \mathbb{R}^n \setminus U$. Then $C$ is a closed subset of $\mathbb{R}^n$ that is disjoint from $A$. By Theorem 3.34, $\delta = d(A, C) > 0$. If $x$ is in $\mathbb{R}^n$ and $d(x, A) < \delta$, then $x$ cannot be in $C$. Therefore, $x$ is in $U$.

As a corollary, we have the following.

Corollary 3.36
Let $A$ be a compact subset of $\mathbb{R}^n$, and let $U$ be an open subset of $\mathbb{R}^n$ that contains $A$. Then there is a positive number $r$ and a compact set $K$ such that $A \subset K \subset U$, and if $x$ is a point in $\mathbb{R}^n$ that has a distance less than $r$ from the set $A$, then $x$ is in $K$.

Proof
By Theorem 3.35, there is a positive number $\delta$ such that if $x$ is a point in $\mathbb{R}^n$ that has a distance less than $\delta$ from the set $A$, then $x$ is in $U$. Take $r = \delta/2$, and let
$$K = \overline{V}, \quad \text{where } V = \bigcup_{u \in A} B(u, r).$$
Since $A$ is compact, it is bounded. There is a positive number $M$ such that $\|u\| \leq M$ for all $u \in A$. If $x \in V$, then there is a $u \in A$ such that $\|x - u\| < r$. This implies that $\|x\| \leq M + r$. Hence, the set $V$ is also bounded. Since $K$ is the closure of a bounded set, $K$ is compact. Since $A \subset V$, $A \subset K$. If $w \in K$, since $K$ is the closure of $V$, there is a point $v$ in $V$ that lies in $B(w, r)$. By the definition of $V$, there is a point $u$ in $A$ such that $v \in B(u, r)$. Thus,
$$\|w - u\| \leq \|w - v\| + \|v - u\| < r + r = \delta.$$
This implies that $w$ has a distance less than $\delta$ from $A$. Hence, $w$ is in $U$. This shows that $K \subset U$. Now if $x$ is a point that has distance less than $r$ from the set $A$, there is a point $u$ in $A$ such that $\|x - u\| < r$. This implies that $x \in B(u, r) \subset V \subset K$.
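Theorem 3.35 can be illustrated on a case where every quantity is explicit. In the sketch below, which is an assumed example and not one from the text, $A$ is the closed unit disc in $\mathbb{R}^2$, $U = \{(x, y) \mid x < 2\}$, and $C = \mathbb{R}^2 \setminus U$, so that $\delta = d(A, C) = 1$. Random sample points within distance $\delta$ of $A$ are then checked to lie in $U$.

```python
import math
import random

def dist_to_disc(p):
    """d(p, A) for A the closed unit disc: max(0, ||p|| - 1)."""
    return max(0.0, math.hypot(*p) - 1.0)

def in_U(p):
    """U = {(x, y) : x < 2}, an open set containing A; its complement C is closed."""
    return p[0] < 2.0

delta = 1.0   # d(A, C), attained at a0 = (1, 0) in A and c = (2, 0) in C

random.seed(0)
samples = [(random.uniform(-4, 4), random.uniform(-4, 4)) for _ in range(10_000)]
near = [p for p in samples if dist_to_disc(p) < delta]
print(len(near), all(in_U(p) for p in near))
```

A random test of this kind cannot replace the proof; it only shows the statement behaving as expected on one explicit pair of sets.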
3.4.3 Uniform Continuity
In Section 2.4, we have discussed uniform continuity. Let $D$ be a subset of $\mathbb{R}^n$ and let $F : D \to \mathbb{R}^m$ be a function defined on $D$. We say that $F : D \to \mathbb{R}^m$ is uniformly continuous provided that for any $\varepsilon > 0$, there exists $\delta > 0$ such that for any points $u$ and $v$ in $D$, if $\|u - v\| < \delta$, then $\|F(u) - F(v)\| < \varepsilon$. If a function is uniformly continuous, it is continuous. The converse is not true. However, a continuous function that is defined on a compact subset of $\mathbb{R}^n$ is uniformly continuous. This is an important theorem in analysis.

Theorem 3.37
Let $D$ be a subset of $\mathbb{R}^n$, and let $F : D \to \mathbb{R}^m$ be a continuous function defined on $D$. If $D$ is compact, then $F : D \to \mathbb{R}^m$ is uniformly continuous.

Proof
Assume to the contrary that $F : D \to \mathbb{R}^m$ is not uniformly continuous. Then there exists an $\varepsilon > 0$ such that for any $\delta > 0$, there exist points $u$ and $v$ in $D$ such that $\|u - v\| < \delta$ and $\|F(u) - F(v)\| \geq \varepsilon$. This implies that for any $k \in \mathbb{Z}^+$, there exist $u_k$ and $v_k$ in $D$ such that $\|u_k - v_k\| < 1/k$ and $\|F(u_k) - F(v_k)\| \geq \varepsilon$. Since $D$ is sequentially compact, there is a subsequence $\{u_{k_j}\}$ of $\{u_k\}$ that converges to a point $u_0$ in $D$. Consider the sequence $\{v_{k_j}\}$ in $D$. It has a subsequence $\{v_{k_{j_l}}\}$ that converges to a point $v_0$ in $D$. Being a subsequence of $\{u_{k_j}\}$, the sequence $\{u_{k_{j_l}}\}$ also converges to $u_0$. Since $F : D \to \mathbb{R}^m$ is continuous, the sequences $\{F(u_{k_{j_l}})\}$ and $\{F(v_{k_{j_l}})\}$ converge to $F(u_0)$ and $F(v_0)$ respectively. Notice that by construction,
$$\|F(u_{k_{j_l}}) - F(v_{k_{j_l}})\| \geq \varepsilon \quad \text{for all } l \in \mathbb{Z}^+.$$
Thus, $\|F(u_0) - F(v_0)\| \geq \varepsilon$. This implies that $F(u_0) \neq F(v_0)$, and so $u_0 \neq v_0$. Since $k_{j_1}, k_{j_2}, \ldots$ is a strictly increasing sequence of positive integers, $k_{j_l} \geq l$. Thus,
$$\|u_{k_{j_l}} - v_{k_{j_l}}\| < \frac{1}{k_{j_l}} \leq \frac{1}{l}.$$
Taking $l \to \infty$ implies that $u_0 = v_0$. This gives a contradiction. Thus, $F : D \to \mathbb{R}^m$ must be uniformly continuous.

Example 3.33
Let $D = (-1, 4) \times (-7, 5]$ and let $F : D \to \mathbb{R}^3$ be the function defined as
$$F(x, y) = \left(\sin(x + y),\ \sqrt{x + y + 8},\ e^{xy}\right).$$
Show that $F$ is uniformly continuous.

Solution
Let $U = [-1, 4] \times [-7, 5]$. Then $U$ is a closed and bounded subset of $\mathbb{R}^2$ that contains $D$. The functions $f_1(x, y) = x + y$, $f_2(x, y) = x + y + 8$ and $f_3(x, y) = xy$ are polynomial functions. Hence, they are continuous. If $(x, y) \in U$, $x \geq -1$, $y \geq -7$ and so $f_2(x, y) = x + y + 8 \geq 0$. Thus, $f_2(U)$ is contained in the domain of the square root function. Since the square root function, the sine function and the exponential function are continuous on their domains, we find that the functions
$$F_1(x, y) = \sin(x + y), \quad F_2(x, y) = \sqrt{x + y + 8}, \quad F_3(x, y) = e^{xy}$$
are continuous on $U$. Since $U$ is closed and bounded, $F : U \to \mathbb{R}^3$ is uniformly continuous. Since $D \subset U$, $F : D \to \mathbb{R}^3$ is uniformly continuous.

3.4.4 Linear Transformations and Quadratic Forms
In Chapter 2, we have seen that a linear transformation $\mathbf{T}:\mathbb{R}^n\to\mathbb{R}^m$ is a matrix transformation. Namely, there exists an $m\times n$ matrix $A$ such that
\[\mathbf{T}(\mathbf{x})=A\mathbf{x}\quad\text{for all }\mathbf{x}\in\mathbb{R}^n.\]
A linear transformation is continuous. Theorem 2.34 says that a linear transformation is Lipschitz. More precisely, there exists a positive constant $c$ such that
\[\|\mathbf{T}(\mathbf{x})\|\le c\|\mathbf{x}\|\quad\text{for all }\mathbf{x}\in\mathbb{R}^n.\]
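Theorem 2.5 says that when $m=n$, a linear transformation $\mathbf{T}:\mathbb{R}^n\to\mathbb{R}^n$ is invertible if and only if it is one-to-one, if and only if the matrix $A$ is invertible, if and only if $\det A\ne0$. Here we want to give a stronger characterization of a linear transformation $\mathbf{T}:\mathbb{R}^n\to\mathbb{R}^n$ that is invertible. Recall that to show that a linear transformation $\mathbf{T}:\mathbb{R}^n\to\mathbb{R}^m$ is one-to-one, it is sufficient to show that $\mathbf{T}(\mathbf{x})=\mathbf{0}$ implies that $\mathbf{x}=\mathbf{0}$.

Theorem 3.38
Let $\mathbf{T}:\mathbb{R}^n\to\mathbb{R}^n$ be a linear transformation. The following are equivalent.
(a) $\mathbf{T}$ is invertible.
(b) There is a positive constant $a$ such that
\[\|\mathbf{T}(\mathbf{x})\|\ge a\|\mathbf{x}\|\quad\text{for all }\mathbf{x}\in\mathbb{R}^n.\]

Proof
(b) implies (a) is easy. Notice that (b) says that
\[\|\mathbf{x}\|\le\frac{1}{a}\|\mathbf{T}(\mathbf{x})\|\quad\text{for all }\mathbf{x}\in\mathbb{R}^n.\tag{3.3}\]
If $\mathbf{T}(\mathbf{x})=\mathbf{0}$, then $\|\mathbf{T}(\mathbf{x})\|=0$. Eq. (3.3) implies that $\|\mathbf{x}\|=0$. Thus, $\mathbf{x}=\mathbf{0}$. This proves that $\mathbf{T}$ is one-to-one. Hence, it is invertible.
Conversely, assume that $\mathbf{T}:\mathbb{R}^n\to\mathbb{R}^n$ is invertible. Let
\[S^{n-1}=\left\{(x_1,\ldots,x_n)\mid x_1^2+\cdots+x_n^2=1\right\}\]
be the standard unit $(n-1)$-sphere in $\mathbb{R}^n$. We have seen that $S^{n-1}$ is compact. For any $\mathbf{u}\in S^{n-1}$, $\mathbf{u}\ne\mathbf{0}$. Since $\mathbf{T}$ is one-to-one, $\mathbf{T}(\mathbf{u})\ne\mathbf{0}$ and so $\|\mathbf{T}(\mathbf{u})\|>0$. The function $f:S^{n-1}\to\mathbb{R}$, $f(\mathbf{u})=\|\mathbf{T}(\mathbf{u})\|$ is continuous. Hence, it has a minimum value at some $\mathbf{u}_0$ on $S^{n-1}$. Let $a=\|\mathbf{T}(\mathbf{u}_0)\|$. Then $a>0$. Since $a$ is the minimum value of $f$,
\[\|\mathbf{T}(\mathbf{u})\|\ge a\quad\text{for all }\mathbf{u}\in S^{n-1}.\]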
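Notice that if $\mathbf{x}=\mathbf{0}$, $\|\mathbf{T}(\mathbf{x})\|\ge a\|\mathbf{x}\|$ holds trivially. If $\mathbf{x}$ is in $\mathbb{R}^n$ and $\mathbf{x}\ne\mathbf{0}$, let $\mathbf{u}=\alpha\mathbf{x}$, where $\alpha=1/\|\mathbf{x}\|$. Then $\mathbf{u}$ is in $S^{n-1}$. Therefore, $\|\mathbf{T}(\mathbf{u})\|\ge a$. Since $\mathbf{T}(\mathbf{u})=\alpha\mathbf{T}(\mathbf{x})$ and $\alpha>0$, we find that $\|\mathbf{T}(\mathbf{u})\|=\alpha\|\mathbf{T}(\mathbf{x})\|$. Hence, $\alpha\|\mathbf{T}(\mathbf{x})\|\ge a$. This gives
\[\|\mathbf{T}(\mathbf{x})\|\ge\frac{a}{\alpha}=a\|\mathbf{x}\|.\]

In Section 2.1.5, we have reviewed some theory of quadratic forms from linear algebra. In Theorem 2.7, we stated that for a quadratic form $Q_A:\mathbb{R}^n\to\mathbb{R}$, $Q_A(\mathbf{x})=\mathbf{x}^TA\mathbf{x}$ defined by the symmetric matrix $A$, we have
\[\lambda_n\|\mathbf{x}\|^2\le Q_A(\mathbf{x})\le\lambda_1\|\mathbf{x}\|^2\quad\text{for all }\mathbf{x}\in\mathbb{R}^n.\]
Here $\lambda_n$ is the smallest eigenvalue of $A$, and $\lambda_1$ is the largest eigenvalue of $A$. We have used Theorem 2.7 to prove that a linear transformation is Lipschitz in Theorem 2.34. It boils down to the fact that if $\mathbf{T}(\mathbf{x})=A\mathbf{x}$, then $\|\mathbf{T}(\mathbf{x})\|^2=\mathbf{x}^T(A^TA)\mathbf{x}$, and $A^TA$ is a positive semi-definite symmetric matrix. In fact, we can also use Theorem 2.7 to prove Theorem 3.38, using the fact that if $A$ is invertible, then $A^TA$ is positive definite. Let us prove a weaker version of Theorem 2.7 here, which is sufficient to establish Theorem 3.38 and the theorem which says that a linear transformation is Lipschitz.

Theorem 3.39
Let $A$ be an $n\times n$ symmetric matrix, and let $Q_A:\mathbb{R}^n\to\mathbb{R}$ be the quadratic form $Q_A(\mathbf{x})=\mathbf{x}^TA\mathbf{x}$ defined by $A$. There exist constants $a$ and $b$ such that
\[a\|\mathbf{x}\|^2\le Q_A(\mathbf{x})\le b\|\mathbf{x}\|^2\quad\text{for all }\mathbf{x}\in\mathbb{R}^n,\]
and $Q_A(\mathbf{u})=a\|\mathbf{u}\|^2$ and $Q_A(\mathbf{v})=b\|\mathbf{v}\|^2$ for some nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$. Therefore,
(i) if $A$ is positive semi-definite, $b\ge a\ge0$;
(ii) if $A$ is positive definite, $b\ge a>0$.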
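Proof
As in the proof of Theorem 3.38, consider the continuous function $Q_A:S^{n-1}\to\mathbb{R}$. Since $S^{n-1}$ is compact, there exist $\mathbf{u}$ and $\mathbf{v}$ in $S^{n-1}$ such that
\[Q_A(\mathbf{u})\le Q_A(\mathbf{w})\le Q_A(\mathbf{v})\quad\text{for all }\mathbf{w}\in S^{n-1}.\]
Let $a=Q_A(\mathbf{u})$ and $b=Q_A(\mathbf{v})$. If $\mathbf{x}=\mathbf{0}$, $a\|\mathbf{x}\|^2\le Q_A(\mathbf{x})\le b\|\mathbf{x}\|^2$ holds trivially. Now if $\mathbf{x}$ is in $\mathbb{R}^n$ and $\mathbf{x}\ne\mathbf{0}$, let $\mathbf{w}=\alpha\mathbf{x}$, where $\alpha=1/\|\mathbf{x}\|$. Then $\mathbf{w}$ is on $S^{n-1}$. Notice that $Q_A(\mathbf{w})=\alpha^2Q_A(\mathbf{x})$. Hence,
\[a\le\frac{1}{\|\mathbf{x}\|^2}Q_A(\mathbf{x})\le b.\]
This proves that $a\|\mathbf{x}\|^2\le Q_A(\mathbf{x})\le b\|\mathbf{x}\|^2$.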
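The argument in the proof of Theorem 3.39 can be checked numerically in the plane. The following is only a sketch under stated assumptions: the matrix `A` is an arbitrary illustrative choice, the helper names are hypothetical, and the minimum and maximum on the unit circle are approximated by sampling rather than computed exactly. For a $2\times2$ symmetric matrix the eigenvalues have a closed form, which gives a value to compare the sampled constants $a$ and $b$ against.

```python
import math

def quadratic_form(A, x):
    """Q_A(x) = x^T A x for a 2x2 symmetric matrix A given as nested lists."""
    (p, q), (_, r) = A
    return p * x[0] ** 2 + 2 * q * x[0] * x[1] + r * x[1] ** 2

def circle_bounds(A, samples=3600):
    """Approximate the minimum and maximum of Q_A on the unit circle by sampling."""
    values = [
        quadratic_form(A, (math.cos(2 * math.pi * k / samples),
                           math.sin(2 * math.pi * k / samples)))
        for k in range(samples)
    ]
    return min(values), max(values)

def eigenvalues_2x2(A):
    """Exact (smallest, largest) eigenvalues of a 2x2 symmetric matrix."""
    (p, q), (_, r) = A
    mean = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    return mean - radius, mean + radius

A = [[2.0, 1.0], [1.0, 3.0]]      # illustrative positive definite matrix (assumption)
a, b = circle_bounds(A)           # sampled constants, as in the proof
lam_min, lam_max = eigenvalues_2x2(A)
```

In this sketch the sampled values `a` and `b` agree with the smallest and largest eigenvalues up to the sampling error, which is consistent with the sharper statement of Theorem 2.7.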
3.4.5 Lebesgue Number Lemma
Now let us prove the following important theorem.

Theorem 3.40 Lebesgue Number Lemma
Let $A$ be a subset of $\mathbb{R}^n$, and let $\mathcal{A}=\{U_\alpha\mid\alpha\in J\}$ be an open covering of $A$. If $A$ is compact, there exists a positive number $\delta$ such that if $S$ is a subset of $A$ and $\operatorname{diam}S<\delta$, then $S$ is contained in one of the elements of $\mathcal{A}$. Such a positive number $\delta$ is called a Lebesgue number of the covering $\mathcal{A}$.

We give two proofs of this theorem.
First Proof of the Lebesgue Number Lemma
We use proof by contradiction. Assume that there does not exist a positive number $\delta$ such that any subset $S$ of $A$ that has diameter less than $\delta$ lies inside an open set in $\mathcal{A}$. Then for any $k\in\mathbb{Z}^+$, there is a subset $S_k$ of $A$ whose diameter is less than $1/k$, but $S_k$ is not contained in any element of $\mathcal{A}$. For each $k\in\mathbb{Z}^+$, the set $S_k$ cannot be empty. Let $\mathbf{x}_k$ be any point in $S_k$. Then $\{\mathbf{x}_k\}$ is a sequence of points in $A$. Since $A$ is sequentially compact, there is a subsequence $\{\mathbf{x}_{k_m}\}$ that converges to a point $\mathbf{x}_0$ in $A$. Since $\mathcal{A}$ is an open covering of $A$, there exists $\beta\in J$ such that $\mathbf{x}_0\in U_\beta$. Since $U_\beta$ is open, there exists $r>0$ such that $B(\mathbf{x}_0,r)\subset U_\beta$. Since the sequence $\{\mathbf{x}_{k_m}\}$ converges to $\mathbf{x}_0$, there is a positive integer $M$ such that for all $m\ge M$, $\mathbf{x}_{k_m}\in B(\mathbf{x}_0,r/2)$. There exists an integer $j\ge M$ such that $1/k_j<r/2$. If $\mathbf{x}\in S_{k_j}$, then
\[\|\mathbf{x}-\mathbf{x}_{k_j}\|\le\operatorname{diam}S_{k_j}<\frac{1}{k_j}<\frac{r}{2}.\]
Since $\mathbf{x}_{k_j}\in B(\mathbf{x}_0,r/2)$, $\|\mathbf{x}_{k_j}-\mathbf{x}_0\|<r/2$. Therefore, $\|\mathbf{x}-\mathbf{x}_0\|<r$. This proves that $\mathbf{x}\in B(\mathbf{x}_0,r)\subset U_\beta$. Thus, we have shown that $S_{k_j}\subset U_\beta$. But this contradicts the fact that $S_{k_j}$ does not lie in any element of $\mathcal{A}$.

Second Proof of the Lebesgue Number Lemma
Since $A$ is compact, there are finitely many indices $\alpha_1,\ldots,\alpha_m$ in $J$ such that
\[A\subset\bigcup_{j=1}^mU_{\alpha_j}.\]
For $1\le j\le m$, let $C_j=\mathbb{R}^n\setminus U_{\alpha_j}$. Then $C_j$ is a closed set and $\bigcap_{j=1}^mC_j$ is disjoint from $A$. By Theorem 3.33, the function $f_j:A\to\mathbb{R}$, $f_j(\mathbf{x})=d(\mathbf{x},C_j)$ is continuous. Define $f:A\to\mathbb{R}$ by
\[f(\mathbf{x})=\frac{f_1(\mathbf{x})+\cdots+f_m(\mathbf{x})}{m}.\]
Then $f$ is also a continuous function. Since $A$ is compact, there is a point $\mathbf{a}_0$ in $A$ such that
\[f(\mathbf{a}_0)\le f(\mathbf{a})\quad\text{for all }\mathbf{a}\in A.\]
Notice that $f_j(\mathbf{a}_0)\ge0$ for all $1\le j\le m$. Since $\bigcap_{j=1}^mC_j$ is disjoint from $A$, there is a $1\le k\le m$ such that $\mathbf{a}_0\notin C_k$. Proposition 3.32 says that $f_k(\mathbf{a}_0)=d(\mathbf{a}_0,C_k)>0$. Hence, $f(\mathbf{a}_0)>0$. Let $\delta=f(\mathbf{a}_0)$. It is the minimum value of the function $f:A\to\mathbb{R}$. Now let $S$ be a nonempty subset of $A$ such that $\operatorname{diam}S<\delta$. Take a point $\mathbf{x}_0$ in $S$. Let $1\le l\le m$ be an integer such that
\[f_l(\mathbf{x}_0)\ge f_j(\mathbf{x}_0)\quad\text{for all }1\le j\le m.\]
Then
\[\delta\le f(\mathbf{x}_0)\le f_l(\mathbf{x}_0)=d(\mathbf{x}_0,C_l).\]
For any $\mathbf{u}\in C_l$, $d(\mathbf{x}_0,\mathbf{u})\ge d(\mathbf{x}_0,C_l)\ge\delta$. If $\mathbf{x}\in S$, then $d(\mathbf{x},\mathbf{x}_0)\le\operatorname{diam}S<\delta$. This implies that $\mathbf{x}$ is not in $C_l$. Hence, it must be in $U_{\alpha_l}$. This shows that $S$ is contained in $U_{\alpha_l}$, which is an element of $\mathcal{A}$. This completes the proof of the theorem.

The Lebesgue number lemma can be used to give an alternative proof of Theorem 3.37, which says that a continuous function defined on a compact subset of $\mathbb{R}^n$ is uniformly continuous.
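Alternative Proof of Theorem 3.37
Fix $\varepsilon>0$. We want to show that there exists $\delta>0$ such that if $\mathbf{u}$ and $\mathbf{v}$ are in $D$ and $\|\mathbf{u}-\mathbf{v}\|<\delta$, then $\|\mathbf{F}(\mathbf{u})-\mathbf{F}(\mathbf{v})\|<\varepsilon$. We will construct an open covering of $D$ indexed by $J=D$. Since $\mathbf{F}:D\to\mathbb{R}^m$ is continuous, for each $\mathbf{x}\in D$, there is a positive number $\delta_{\mathbf{x}}$ (depending on $\mathbf{x}$), such that if $\mathbf{u}$ is in $D$ and $\|\mathbf{u}-\mathbf{x}\|<\delta_{\mathbf{x}}$, then $\|\mathbf{F}(\mathbf{u})-\mathbf{F}(\mathbf{x})\|<\varepsilon/2$. Let $U_{\mathbf{x}}=B(\mathbf{x},\delta_{\mathbf{x}})$. Then $U_{\mathbf{x}}$ is an open set. If $\mathbf{u}$ and $\mathbf{v}$ are points of $D$ that lie in $U_{\mathbf{x}}$, then $\|\mathbf{F}(\mathbf{u})-\mathbf{F}(\mathbf{x})\|<\varepsilon/2$ and $\|\mathbf{F}(\mathbf{v})-\mathbf{F}(\mathbf{x})\|<\varepsilon/2$. Thus, $\|\mathbf{F}(\mathbf{u})-\mathbf{F}(\mathbf{v})\|<\varepsilon$. Now $\mathcal{A}=\{U_{\mathbf{x}}\mid\mathbf{x}\in D\}$ is an open covering of $D$. Since $D$ is compact, the Lebesgue number lemma implies that there exists a number $\delta>0$ such that if $S$ is a subset of $D$ that has diameter less than $\delta$, then $S$ is contained in $U_{\mathbf{x}}$ for some $\mathbf{x}\in D$. We claim that this is the $\delta$ that we need. If $\mathbf{u}$ and $\mathbf{v}$ are two points in $D$ and $\|\mathbf{u}-\mathbf{v}\|<\delta$, then $S=\{\mathbf{u},\mathbf{v}\}$ is a set with diameter less than $\delta$. Hence, there is an $\mathbf{x}\in D$ such that $S\subset U_{\mathbf{x}}$. This implies that $\mathbf{u}$ and $\mathbf{v}$ are in $U_{\mathbf{x}}$. Hence, $\|\mathbf{F}(\mathbf{u})-\mathbf{F}(\mathbf{v})\|<\varepsilon$. This completes the proof.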
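The second proof of the Lebesgue number lemma is constructive, and its recipe can be tried on a small one-dimensional case. The sketch below is an illustration only: the set $A=[0,1]$, the two-interval covering, the grid resolution and all function names are assumptions made for this example, and the minimum of $f$ is approximated on a grid (for this particular covering it happens to be attained at a grid point).

```python
def dist_to_complement(x, interval):
    """d(x, C), where C is the complement in R of the open interval (lo, hi)."""
    lo, hi = interval
    return max(0.0, min(x - lo, hi - x))

def averaged_distance(x, cover):
    """f(x) = (f_1(x) + ... + f_m(x)) / m with f_j(x) = d(x, C_j)."""
    return sum(dist_to_complement(x, U) for U in cover) / len(cover)

def contained_in_some_member(lo, hi, cover):
    """True if the closed interval [lo, hi] lies inside one member of the cover."""
    return any(u_lo < lo and hi < u_hi for (u_lo, u_hi) in cover)

cover = [(-0.1, 0.6), (0.4, 1.1)]          # open covering of A = [0, 1] (assumption)
grid = [k / 1000 for k in range(1001)]     # sample points of A
delta = min(averaged_distance(x, cover) for x in grid)
```

Here `delta` comes out as $0.05$, attained at the endpoints of $[0,1]$, and every subinterval of $[0,1]$ of length less than `delta` lies in one of the two covering intervals. The value produced by the proof is a valid Lebesgue number but need not be the largest one.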
Exercises 3.4

Question 1
Let $D=\{(x,y)\mid 2<x^2+4y^2<10\}$, and let $\mathbf{F}:D\to\mathbb{R}^3$ be the function defined as
\[\mathbf{F}(x,y)=\left(\frac{x}{x^2+y^2},\ \frac{y}{x^2+y^2},\ \frac{x^2-y^2}{x^2+y^2}\right).\]
Show that the function $\mathbf{F}:D\to\mathbb{R}^3$ is bounded.

Question 2
Let $D=\{(x,y,z)\mid 1\le x^2+4y^2\le10,\ 0\le z\le5\}$, and let $f:D\to\mathbb{R}$ be the function defined as
\[f(x,y,z)=\frac{x^2-y^2}{x^2+y^2+z^2}.\]
Show that the function $f:D\to\mathbb{R}$ has a maximum value and a minimum value.

Question 3
Let $A=\{(x,y)\mid x^2+4y^2\le16\}$ and $B=\{(x,y)\mid x+y\ge10\}$. Show that the distance between the sets $A$ and $B$ is positive.

Question 4
Let $D=\{(x,y,z)\mid x^2+y^2+z^2\le20\}$ and let $f:D\to\mathbb{R}$ be the function defined as
\[f(x,y,z)=e^{x^2+4z^2}.\]
Show that $f:D\to\mathbb{R}$ is uniformly continuous.

Question 5
Let $D=(-1,2)\times(-6,0)$ and let $f:D\to\mathbb{R}$ be the function defined as
\[f(x,y)=\sqrt{x+y+7}+\ln(x^2+y^2+1).\]
Show that $f:D\to\mathbb{R}$ is uniformly continuous.
Chapter 4 Differentiating Functions of Several Variables

In this chapter, we study differential calculus of functions of several variables.

4.1 Partial Derivatives
When $f:(a,b)\to\mathbb{R}$ is a function defined on an open interval $(a,b)$, the derivative of the function at a point $x_0$ in $(a,b)$ is defined as
\[f'(x_0)=\lim_{h\to0}\frac{f(x_0+h)-f(x_0)}{h},\]
provided that the limit exists. The derivative gives the instantaneous rate of change of the function at the point $x_0$. Geometrically, it is the slope of the tangent line to the graph of the function $f:(a,b)\to\mathbb{R}$ at the point $(x_0,f(x_0))$.

Figure 4.1: Derivative as slope of tangent line.

Now consider a function $f:O\to\mathbb{R}$ that is defined on an open subset $O$ of $\mathbb{R}^n$, where $n\ge2$. What is the natural way to extend the concept of derivatives to this function?
From the perspective of rate of change, we need to consider the change of $f$ in various different directions. This leads us to consider directional derivatives. Another perspective is to regard existence of derivatives as differentiability and first-order approximation. Later we will see that all these are closely related. First let us consider the rates of change of the function $f:O\to\mathbb{R}$ at a point $\mathbf{x}_0$ in $O$ along the directions of the coordinate axes. These are called partial derivatives.

Definition 4.1 Partial Derivatives
Let $O$ be an open subset of $\mathbb{R}^n$ that contains the point $\mathbf{x}_0$, and let $f:O\to\mathbb{R}$ be a function defined on $O$. For $1\le i\le n$, we say that the function $f:O\to\mathbb{R}$ has a partial derivative with respect to its $i$th component at the point $\mathbf{x}_0$ if the limit
\[\lim_{h\to0}\frac{f(\mathbf{x}_0+h\mathbf{e}_i)-f(\mathbf{x}_0)}{h}\]
exists. In this case, we denote the limit by $\dfrac{\partial f}{\partial x_i}(\mathbf{x}_0)$, and call it the partial derivative of $f:O\to\mathbb{R}$ with respect to $x_i$ at $\mathbf{x}_0$. We say that the function $f:O\to\mathbb{R}$ has partial derivatives at $\mathbf{x}_0$ if $\dfrac{\partial f}{\partial x_i}(\mathbf{x}_0)$ exists for all $1\le i\le n$.

Remark 4.1
When we consider partial derivatives of a function, we always assume that the domain of the function is an open set $O$, so that each point $\mathbf{x}_0$ in the domain is an interior point of $O$, and a limit point of $O\setminus\{\mathbf{x}_0\}$. By definition of open sets, there exists $r>0$ such that $B(\mathbf{x}_0,r)$ is contained in $O$. This allows us to compare the function values of $f$ in a neighbourhood of $\mathbf{x}_0$ from various different directions.

By definition, $\dfrac{\partial f}{\partial x_i}(\mathbf{x}_0)$ measures the rate of change of $f$ at $\mathbf{x}_0$ in the direction of $\mathbf{e}_i$. It can also be interpreted as the slope of a curve at the point $(\mathbf{x}_0,f(\mathbf{x}_0))$ on the surface $x_{n+1}=f(\mathbf{x})$, as shown in Figure 4.2.
Notations for Partial Derivatives
An alternative notation for $\dfrac{\partial f}{\partial x_i}(\mathbf{x}_0)$ is $f_{x_i}(\mathbf{x}_0)$.

Figure 4.2: Partial derivative.

Remark 4.2 Partial Derivatives
Let $\mathbf{x}_0=(a_1,a_2,\ldots,a_n)$ and define the function $g:(-r,r)\to\mathbb{R}$ by
\[g(h)=f(\mathbf{x}_0+h\mathbf{e}_i)=f(a_1,\ldots,a_{i-1},a_i+h,a_{i+1},\ldots,a_n).\]
Then
\[\lim_{h\to0}\frac{f(\mathbf{x}_0+h\mathbf{e}_i)-f(\mathbf{x}_0)}{h}=\lim_{h\to0}\frac{g(h)-g(0)}{h}=g'(0).\]
Thus, $f_{x_i}(\mathbf{x}_0)$ exists if and only if $g(h)$ is differentiable at $h=0$. Moreover, to find $f_{x_i}(\mathbf{x}_0)$, we regard the variables $x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n$ as constants, and differentiate with respect to $x_i$. Hence, the derivative rules such as the sum rule, product rule and quotient rule still work for partial derivatives, as long as one is clear about which variable is being differentiated and which variables are regarded as constants.
Example 4.1
Let $f:\mathbb{R}^2\to\mathbb{R}$ be the function defined as $f(x,y)=x^2y$. Find $f_x(1,2)$ and $f_y(1,2)$.

Solution
\[\frac{\partial f}{\partial x}=2xy,\qquad\frac{\partial f}{\partial y}=x^2.\]
Therefore,
\[f_x(1,2)=4,\qquad f_y(1,2)=1.\]
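The values found in Example 4.1 can be sanity-checked with difference quotients. The following is a minimal sketch, not part of the text's method: the helper `partial_derivative` and the step size `h` are hypothetical choices, and a central difference is used in place of the one-sided quotient in Definition 4.1 because it is more accurate numerically.

```python
def partial_derivative(f, point, i, h=1e-6):
    """Central-difference approximation of the i-th partial derivative of f at point."""
    forward = list(point)
    backward = list(point)
    forward[i] += h      # move only the i-th coordinate, keep the others fixed
    backward[i] -= h
    return (f(*forward) - f(*backward)) / (2 * h)

def f(x, y):
    return x ** 2 * y

fx = partial_derivative(f, (1.0, 2.0), 0)   # compare with f_x(1, 2) = 4
fy = partial_derivative(f, (1.0, 2.0), 1)   # compare with f_y(1, 2) = 1
```

Holding the other coordinate fixed while perturbing one is exactly the single-variable function $g(h)$ of Remark 4.2.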
Example 4.2
Let $f:\mathbb{R}^2\to\mathbb{R}$ be the function defined as $f(x,y)=|x+y|$. Determine whether $f_x(0,0)$ exists.

Solution
By definition, $f_x(0,0)$ is given by the limit
\[\lim_{h\to0}\frac{f(h,0)-f(0,0)}{h}\]
if it exists. Since
\[\lim_{h\to0}\frac{f(h,0)-f(0,0)}{h}=\lim_{h\to0}\frac{|h|}{h},\]
and
\[\lim_{h\to0^-}\frac{|h|}{h}=-1\quad\text{and}\quad\lim_{h\to0^+}\frac{|h|}{h}=1,\]
the limit
\[\lim_{h\to0}\frac{f(h,0)-f(0,0)}{h}\]
does not exist. Hence, $f_x(0,0)$ does not exist.
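The two one-sided limits in Example 4.2 can be seen directly by evaluating the difference quotient for small positive and negative $h$. This is an illustrative sketch only; the function name `difference_quotient` and the chosen step sizes are assumptions.

```python
def f(x, y):
    return abs(x + y)

def difference_quotient(h):
    """(f(h, 0) - f(0, 0)) / h, the quotient whose limit would define f_x(0, 0)."""
    return (f(h, 0.0) - f(0.0, 0.0)) / h

# Approach 0 from the right and from the left with h = 10^-1, ..., 10^-7.
right = [difference_quotient(10.0 ** -k) for k in range(1, 8)]
left = [difference_quotient(-(10.0 ** -k)) for k in range(1, 8)]
```

Every entry of `right` equals $1$ and every entry of `left` equals $-1$, so the quotients cannot approach a common value as $h\to0$.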
Definition 4.2
Let $O$ be an open subset of $\mathbb{R}^n$, and let $f:O\to\mathbb{R}$ be a function defined on $O$. If the function $f:O\to\mathbb{R}$ has a partial derivative with respect to $x_i$ at every point of $O$, this defines the function $f_{x_i}:O\to\mathbb{R}$. In this case, we say that the partial derivative of $f$ with respect to $x_i$ exists. If $f_{x_i}:O\to\mathbb{R}$ exists for all $1\le i\le n$, we say that the function $f:O\to\mathbb{R}$ has partial derivatives.

Example 4.3
Find the partial derivatives of the function $f:\mathbb{R}^3\to\mathbb{R}$ defined as
\[f(x,y,z)=\sin(xy+z)+\frac{3x}{y^2+z^2+1}.\]

Solution
\begin{align*}
\frac{\partial f}{\partial x}(x,y,z)&=y\cos(xy+z)+\frac{3}{y^2+z^2+1},\\
\frac{\partial f}{\partial y}(x,y,z)&=x\cos(xy+z)-\frac{6xy}{(y^2+z^2+1)^2},\\
\frac{\partial f}{\partial z}(x,y,z)&=\cos(xy+z)-\frac{6xz}{(y^2+z^2+1)^2}.
\end{align*}

For a function defined on an open subset of $\mathbb{R}^n$, there are $n$ partial derivatives with respect to the $n$ directions defined by the coordinate axes. These define a vector in $\mathbb{R}^n$.

Definition 4.3 Gradient
Let $O$ be an open subset of $\mathbb{R}^n$, and let $\mathbf{x}_0$ be a point in $O$. If the function $f:O\to\mathbb{R}$ has partial derivatives at $\mathbf{x}_0$, we define the gradient of the function $f$ at $\mathbf{x}_0$ as the vector in $\mathbb{R}^n$ given by
\[\nabla f(\mathbf{x}_0)=\left(\frac{\partial f}{\partial x_1}(\mathbf{x}_0),\frac{\partial f}{\partial x_2}(\mathbf{x}_0),\cdots,\frac{\partial f}{\partial x_n}(\mathbf{x}_0)\right).\]

Let us revisit Example 4.3.
Example 4.4
The gradient of the function $f:\mathbb{R}^3\to\mathbb{R}$ defined as
\[f(x,y,z)=\sin(xy+z)+\frac{3x}{y^2+z^2+1}\]
in Example 4.3 is the function $\nabla f:\mathbb{R}^3\to\mathbb{R}^3$,
\[\nabla f(x,y,z)=\begin{bmatrix}y\cos(xy+z)+\dfrac{3}{y^2+z^2+1}\\[2ex] x\cos(xy+z)-\dfrac{6xy}{(y^2+z^2+1)^2}\\[2ex] \cos(xy+z)-\dfrac{6xz}{(y^2+z^2+1)^2}\end{bmatrix}.\]
In particular,
\[\nabla f(1,-1,1)=\left(0,\frac{5}{3},\frac{1}{3}\right).\]

It is straightforward to extend the definition of partial derivative to a function $\mathbf{F}:O\to\mathbb{R}^m$ whose codomain is $\mathbb{R}^m$ with $m\ge2$.

Definition 4.4
Let $O$ be an open subset of $\mathbb{R}^n$, and let $\mathbf{F}:O\to\mathbb{R}^m$ be a function defined on $O$. Given $\mathbf{x}_0$ in $O$ and $1\le i\le n$, we say that $\mathbf{F}:O\to\mathbb{R}^m$ has a partial derivative with respect to $x_i$ at the point $\mathbf{x}_0$ if the limit
\[\frac{\partial\mathbf{F}}{\partial x_i}(\mathbf{x}_0)=\lim_{h\to0}\frac{\mathbf{F}(\mathbf{x}_0+h\mathbf{e}_i)-\mathbf{F}(\mathbf{x}_0)}{h}\]
exists. We say that $\mathbf{F}:O\to\mathbb{R}^m$ has partial derivatives at the point $\mathbf{x}_0$ if $\dfrac{\partial\mathbf{F}}{\partial x_i}(\mathbf{x}_0)$ exists for each $1\le i\le n$. We say that $\mathbf{F}:O\to\mathbb{R}^m$ has partial derivatives if it has partial derivatives at each point of $O$.

Since the limit of a function $\mathbf{G}:(-r,r)\to\mathbb{R}^m$ when $h\to0$ exists if and only if the limit of each component function $G_j:(-r,r)\to\mathbb{R}$, $1\le j\le m$, when $h\to0$ exists, we have the following.
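Proposition 4.1
Let $O$ be an open subset of $\mathbb{R}^n$, and let $\mathbf{F}:O\to\mathbb{R}^m$ be a function defined on $O$. Given $\mathbf{x}_0$ in $O$ and $1\le i\le n$, $\mathbf{F}:O\to\mathbb{R}^m$ has a partial derivative with respect to $x_i$ at the point $\mathbf{x}_0$ if and only if each component function $F_j:O\to\mathbb{R}$, $1\le j\le m$, has a partial derivative with respect to $x_i$ at the point $\mathbf{x}_0$. In this case, we have
\[\frac{\partial\mathbf{F}}{\partial x_i}(\mathbf{x}_0)=\left(\frac{\partial F_1}{\partial x_i}(\mathbf{x}_0),\ldots,\frac{\partial F_m}{\partial x_i}(\mathbf{x}_0)\right).\]

To capture all the partial derivatives, we define a derivative matrix.

Definition 4.5 The Derivative Matrix
Let $O$ be an open subset of $\mathbb{R}^n$ that contains the point $\mathbf{x}_0$, and let $\mathbf{F}:O\to\mathbb{R}^m$ be a function defined on $O$. If $\mathbf{F}:O\to\mathbb{R}^m$ has partial derivatives at the point $\mathbf{x}_0$, the derivative matrix of $\mathbf{F}:O\to\mathbb{R}^m$ at $\mathbf{x}_0$ is the $m\times n$ matrix
\[\mathbf{DF}(\mathbf{x}_0)=\begin{bmatrix}\nabla F_1(\mathbf{x}_0)\\ \nabla F_2(\mathbf{x}_0)\\ \vdots\\ \nabla F_m(\mathbf{x}_0)\end{bmatrix}=\begin{bmatrix}\dfrac{\partial F_1}{\partial x_1}(\mathbf{x}_0)&\dfrac{\partial F_1}{\partial x_2}(\mathbf{x}_0)&\cdots&\dfrac{\partial F_1}{\partial x_n}(\mathbf{x}_0)\\[2ex] \dfrac{\partial F_2}{\partial x_1}(\mathbf{x}_0)&\dfrac{\partial F_2}{\partial x_2}(\mathbf{x}_0)&\cdots&\dfrac{\partial F_2}{\partial x_n}(\mathbf{x}_0)\\ \vdots&\vdots&\ddots&\vdots\\ \dfrac{\partial F_m}{\partial x_1}(\mathbf{x}_0)&\dfrac{\partial F_m}{\partial x_2}(\mathbf{x}_0)&\cdots&\dfrac{\partial F_m}{\partial x_n}(\mathbf{x}_0)\end{bmatrix}.\]
When $m=1$, the derivative matrix is just the gradient of the function as a row matrix.

Example 4.5
Let $\mathbf{F}:\mathbb{R}^3\to\mathbb{R}^2$ be the function defined as
\[\mathbf{F}(x,y,z)=\left(xy^2z^3,\ x+3y-7z\right).\]
Find the derivative matrix of $\mathbf{F}$ at the point $(1,-1,2)$.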
Solution
\[\mathbf{DF}(x,y,z)=\begin{bmatrix}y^2z^3&2xyz^3&3xy^2z^2\\ 1&3&-7\end{bmatrix}.\]
Thus, the derivative matrix of $\mathbf{F}$ at the point $(1,-1,2)$ is
\[\mathbf{DF}(1,-1,2)=\begin{bmatrix}8&-16&12\\ 1&3&-7\end{bmatrix}.\]

Since the partial derivatives of a function are defined componentwise, we can focus on functions $f:O\to\mathbb{R}$ whose codomain is $\mathbb{R}$. One might wonder why we have not mentioned the word "differentiable" so far. For single variable functions, we have seen in volume I that if a function is differentiable at a point, then it is continuous at that point. For multivariable functions, the existence of partial derivatives is not enough to guarantee continuity, as is shown in the next example.

Example 4.6
Let $f:\mathbb{R}^2\to\mathbb{R}$ be the function defined as
\[f(x,y)=\begin{cases}\dfrac{xy}{x^2+y^2},&\text{if }(x,y)\ne(0,0),\\[1.5ex] 0,&\text{if }(x,y)=(0,0).\end{cases}\]
Show that $f$ is not continuous at $(0,0)$, but it has partial derivatives at $(0,0)$.

Solution
Consider the sequence $\{\mathbf{u}_k\}$ with
\[\mathbf{u}_k=\left(\frac{1}{k},\frac{1}{k}\right).\]
It is a sequence in $\mathbb{R}^2$ that converges to $(0,0)$. Since
\[f(\mathbf{u}_k)=\frac{1}{2}\quad\text{for all }k\in\mathbb{Z}^+,\]
the sequence $\{f(\mathbf{u}_k)\}$ converges to $1/2$. But $f(0,0)=0\ne1/2$. Since there is a sequence $\{\mathbf{u}_k\}$ that converges to $(0,0)$, but the sequence $\{f(\mathbf{u}_k)\}$ does not converge to $f(0,0)$, $f$ is not continuous at $(0,0)$.
To find the partial derivatives at $(0,0)$, we use the definitions.
\begin{align*}
f_x(0,0)&=\lim_{h\to0}\frac{f(h,0)-f(0,0)}{h}=\lim_{h\to0}\frac{0-0}{h}=0,\\
f_y(0,0)&=\lim_{h\to0}\frac{f(0,h)-f(0,0)}{h}=\lim_{h\to0}\frac{0-0}{h}=0.
\end{align*}
These show that $f$ has partial derivatives at $(0,0)$, and $f_x(0,0)=f_y(0,0)=0$.

Figure 4.3: The function $f(x,y)$ defined in Example 4.6.

The function defined in Example 4.6 has partial derivatives at all points. In fact, when $(x,y)\ne(0,0)$, we can apply derivative rules directly and find that
\[\frac{\partial f}{\partial x}(x,y)=\frac{(x^2+y^2)y-2x^2y}{(x^2+y^2)^2}=\frac{y(y^2-x^2)}{(x^2+y^2)^2}.\]
Similarly,
\[\frac{\partial f}{\partial y}(x,y)=\frac{x(x^2-y^2)}{(x^2+y^2)^2}.\]
Let us highlight again our conclusion.
Partial Derivative vs Continuity
The existence of partial derivatives does not imply continuity.

This prompts us to find a better definition of differentiability, which can imply continuity. This will be considered in a later section. When the function $f:O\to\mathbb{R}$ has a partial derivative with respect to $x_i$, we obtain the function $f_{x_i}:O\to\mathbb{R}$. Then we can discuss whether the function $f_{x_i}$ has a partial derivative at a point in $O$.

Definition 4.6 Second Order Partial Derivatives
Let $O$ be an open subset of $\mathbb{R}^n$ that contains the point $\mathbf{x}_0$, and let $f:O\to\mathbb{R}$ be a function defined on $O$. Given that $1\le i\le n$, $1\le j\le n$, we say that the second order partial derivative $\dfrac{\partial^2f}{\partial x_j\partial x_i}$ exists at $\mathbf{x}_0$ provided that there exists an open ball $B(\mathbf{x}_0,r)$ that is contained in $O$ such that $\dfrac{\partial f}{\partial x_i}:B(\mathbf{x}_0,r)\to\mathbb{R}$ exists, and it has a partial derivative with respect to $x_j$ at the point $\mathbf{x}_0$. In this case, we define the second order partial derivative $\dfrac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)$ of $f$ at $\mathbf{x}_0$ as
\[\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)=\frac{\partial f_{x_i}}{\partial x_j}(\mathbf{x}_0)=\lim_{h\to0}\frac{f_{x_i}(\mathbf{x}_0+h\mathbf{e}_j)-f_{x_i}(\mathbf{x}_0)}{h}.\]
We say that the function $f:O\to\mathbb{R}$ has second order partial derivatives at $\mathbf{x}_0$ provided that $\dfrac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)$ exists for all $1\le i\le n$, $1\le j\le n$.

In the same way, one can also define second order partial derivatives for a function $\mathbf{F}:O\to\mathbb{R}^m$ with codomain $\mathbb{R}^m$ when $m\ge2$.
Remark 4.3
In the definition of the second order partial derivative $\dfrac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)$, instead of assuming $f_{x_i}(\mathbf{x})$ exists for all $\mathbf{x}$ in a ball of radius $r$ centered at $\mathbf{x}_0$, it is sufficient to assume that there exists $r>0$ such that $f_{x_i}(\mathbf{x}_0+h\mathbf{e}_j)$ exists for all $|h|<r$.

Definition 4.7
Given $1\le i\le n$, $1\le j\le n$, we say that the function $f:O\to\mathbb{R}$ has the second order partial derivative $\dfrac{\partial^2f}{\partial x_j\partial x_i}$ provided that $\dfrac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)$ exists for all $\mathbf{x}_0$ in $O$. We say that the function $f:O\to\mathbb{R}$ has second order partial derivatives provided that $\dfrac{\partial^2f}{\partial x_j\partial x_i}$ exists for all $1\le i\le n$, $1\le j\le n$.

Notations for Second Order Partial Derivatives
Alternative notations for second order partial derivatives are
\[\frac{\partial^2f}{\partial x_j\partial x_i}=(f_{x_i})_{x_j}=f_{x_ix_j}.\]
Notice that the orders of $x_i$ and $x_j$ are different in different notations.

Remark 4.4
Given $1\le i\le n$, $1\le j\le n$, the function $f:O\to\mathbb{R}$ has the second order partial derivative $\dfrac{\partial^2f}{\partial x_j\partial x_i}$ provided that $f_{x_i}:O\to\mathbb{R}$ exists, and $f_{x_i}$ has a partial derivative with respect to $x_j$.

Example 4.7
Find the second order partial derivatives of the function $f:\mathbb{R}^2\to\mathbb{R}$ defined as $f(x,y)=xe^{2x+3y}$.
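Solution
We find the first order partial derivatives first.
\begin{align*}
\frac{\partial f}{\partial x}(x,y)&=e^{2x+3y}+2xe^{2x+3y}=(1+2x)e^{2x+3y},\\
\frac{\partial f}{\partial y}(x,y)&=3xe^{2x+3y}.
\end{align*}
Then we compute the second order partial derivatives.
\begin{align*}
\frac{\partial^2f}{\partial x^2}(x,y)&=2e^{2x+3y}+2(1+2x)e^{2x+3y}=(4+4x)e^{2x+3y},\\
\frac{\partial^2f}{\partial y\partial x}(x,y)&=3(1+2x)e^{2x+3y}=(3+6x)e^{2x+3y},\\
\frac{\partial^2f}{\partial x\partial y}(x,y)&=3e^{2x+3y}+6xe^{2x+3y}=(3+6x)e^{2x+3y},\\
\frac{\partial^2f}{\partial y^2}(x,y)&=9xe^{2x+3y}.
\end{align*}

Definition 4.8 The Hessian Matrix
Let $O$ be an open subset of $\mathbb{R}^n$ that contains the point $\mathbf{x}_0$. If $f:O\to\mathbb{R}$ is a function that has second order partial derivatives at $\mathbf{x}_0$, the Hessian matrix of $f$ at $\mathbf{x}_0$ is the $n\times n$ matrix defined as
\[H_f(\mathbf{x}_0)=\left[\frac{\partial^2f}{\partial x_i\partial x_j}(\mathbf{x}_0)\right]=\begin{bmatrix}\dfrac{\partial^2f}{\partial x_1^2}(\mathbf{x}_0)&\dfrac{\partial^2f}{\partial x_1\partial x_2}(\mathbf{x}_0)&\cdots&\dfrac{\partial^2f}{\partial x_1\partial x_n}(\mathbf{x}_0)\\[2ex] \dfrac{\partial^2f}{\partial x_2\partial x_1}(\mathbf{x}_0)&\dfrac{\partial^2f}{\partial x_2^2}(\mathbf{x}_0)&\cdots&\dfrac{\partial^2f}{\partial x_2\partial x_n}(\mathbf{x}_0)\\ \vdots&\vdots&\ddots&\vdots\\ \dfrac{\partial^2f}{\partial x_n\partial x_1}(\mathbf{x}_0)&\dfrac{\partial^2f}{\partial x_n\partial x_2}(\mathbf{x}_0)&\cdots&\dfrac{\partial^2f}{\partial x_n^2}(\mathbf{x}_0)\end{bmatrix}.\]
We do not define the Hessian matrix for a function $\mathbf{F}:O\to\mathbb{R}^m$ with codomain $\mathbb{R}^m$ when $m\ge2$.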
Example 4.8
For the function $f:\mathbb{R}^2\to\mathbb{R}$ defined as $f(x,y)=xe^{2x+3y}$ in Example 4.7,
\[H_f(x,y)=\begin{bmatrix}(4+4x)e^{2x+3y}&(3+6x)e^{2x+3y}\\ (3+6x)e^{2x+3y}&9xe^{2x+3y}\end{bmatrix}.\]

In Example 4.7, we notice that
\[\frac{\partial^2f}{\partial y\partial x}(x,y)=\frac{\partial^2f}{\partial x\partial y}(x,y)\]
for all $(x,y)\in\mathbb{R}^2$. The following example shows that this is not always true.

Example 4.9
Consider the function $f:\mathbb{R}^2\to\mathbb{R}$ defined as
\[f(x,y)=\begin{cases}\dfrac{xy(x^2-y^2)}{x^2+y^2},&\text{if }(x,y)\ne(0,0),\\[1.5ex] 0,&\text{if }(x,y)=(0,0).\end{cases}\]
Find $f_{xy}(0,0)$ and $f_{yx}(0,0)$.

Figure 4.4: The function $f(x,y)$ defined in Example 4.9.
Solution
To compute $f_{xy}(0,0)$, we need to compute $f_x(0,h)$ for all $h$ in a neighbourhood of $0$. To compute $f_{yx}(0,0)$, we need to compute $f_y(h,0)$ for all $h$ in a neighbourhood of $0$. Notice that for any $h\in\mathbb{R}$, $f(0,h)=f(h,0)=0$. By considering $h=0$ and $h\ne0$ separately, we find that
\begin{align*}
f_x(0,h)&=\lim_{t\to0}\frac{f(t,h)-f(0,h)}{t}=\lim_{t\to0}\frac{h(t^2-h^2)}{t^2+h^2}=-h,\\
f_y(h,0)&=\lim_{t\to0}\frac{f(h,t)-f(h,0)}{t}=\lim_{t\to0}\frac{h(h^2-t^2)}{h^2+t^2}=h.
\end{align*}
It follows that
\begin{align*}
f_{xy}(0,0)&=\lim_{h\to0}\frac{f_x(0,h)-f_x(0,0)}{h}=\lim_{h\to0}\frac{-h}{h}=-1,\\
f_{yx}(0,0)&=\lim_{h\to0}\frac{f_y(h,0)-f_y(0,0)}{h}=\lim_{h\to0}\frac{h}{h}=1.
\end{align*}
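The two mixed partial derivatives at the origin can also be approximated by nested difference quotients. The sketch below is only a numerical illustration: the step sizes `t` and `h` are assumptions, chosen with `t` much smaller than `h` so that the inner quotient approximates $f_x(0,\pm h)$ or $f_y(\pm h,0)$ well before the outer quotient is taken, and central differences replace the one-sided limits used in the solution.

```python
def f(x, y):
    """The function of Example 4.9."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def fx(x, y, t=1e-7):
    """Central-difference approximation of f_x(x, y)."""
    return (f(x + t, y) - f(x - t, y)) / (2 * t)

def fy(x, y, t=1e-7):
    """Central-difference approximation of f_y(x, y)."""
    return (f(x, y + t) - f(x, y - t)) / (2 * t)

h = 1e-3
fxy_00 = (fx(0.0, h) - fx(0.0, -h)) / (2 * h)   # differentiate f_x in y at (0, 0)
fyx_00 = (fy(h, 0.0) - fy(-h, 0.0)) / (2 * h)   # differentiate f_y in x at (0, 0)
```

The approximations come out close to $-1$ and $1$ respectively, matching the values of $f_{xy}(0,0)$ and $f_{yx}(0,0)$ computed above.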
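Example 4.9 shows that there exists a function $f:\mathbb{R}^2\to\mathbb{R}$ which has second order partial derivatives at $(0,0)$ but
\[\frac{\partial^2f}{\partial x\partial y}(0,0)\ne\frac{\partial^2f}{\partial y\partial x}(0,0).\]

Remark 4.5
If $O$ is an open subset of $\mathbb{R}^n$ that contains the point $\mathbf{x}_0$, there exists $r>0$ such that $B(\mathbf{x}_0,r)\subset O$. Given that $f:O\to\mathbb{R}$ is a function defined on $O$, and $1\le i<j\le n$, let $D$ be the ball with center at $(0,0)$ and radius $r$ in $\mathbb{R}^2$. Define the function $g:D\to\mathbb{R}$ by
\[g(u,v)=f(\mathbf{x}_0+u\mathbf{e}_i+v\mathbf{e}_j).\]
Then $\dfrac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)$ exists if and only if $\dfrac{\partial^2g}{\partial v\partial u}(0,0)$ exists. In such case, we have
\[\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)=\frac{\partial^2g}{\partial v\partial u}(0,0).\]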
The following gives a sufficient condition to interchange the order of taking partial derivatives.

Theorem 4.2 Clairaut's Theorem or Schwarz's Theorem
Let $O$ be an open subset of $\mathbb{R}^n$ that contains the point $\mathbf{x}_0$, and let $f:O\to\mathbb{R}$ be a function defined on $O$. Assume that $1\le i<j\le n$, and the second order partial derivatives $\dfrac{\partial^2f}{\partial x_j\partial x_i}:O\to\mathbb{R}$ and $\dfrac{\partial^2f}{\partial x_i\partial x_j}:O\to\mathbb{R}$ exist. If the functions $\dfrac{\partial^2f}{\partial x_j\partial x_i}$ and $\dfrac{\partial^2f}{\partial x_i\partial x_j}:O\to\mathbb{R}$ are continuous at $\mathbf{x}_0$, then
\[\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)=\frac{\partial^2f}{\partial x_i\partial x_j}(\mathbf{x}_0).\]

Proof
Since $O$ is an open set that contains the point $\mathbf{x}_0$, there exists $r>0$ such that $B(\mathbf{x}_0,r)\subset O$. Let
\[D=\left\{(u,v)\mid u^2+v^2<r^2\right\},\]
and define the function $g:D\to\mathbb{R}$ by
\[g(u,v)=f(\mathbf{x}_0+u\mathbf{e}_i+v\mathbf{e}_j).\]
By Remark 4.5, $g$ has second order partial derivatives, and $\dfrac{\partial^2g}{\partial v\partial u}$ and $\dfrac{\partial^2g}{\partial u\partial v}$ are continuous at $(0,0)$. We need to show that
\[\frac{\partial^2g}{\partial v\partial u}(0,0)=\frac{\partial^2g}{\partial u\partial v}(0,0).\]
Consider the function
\[G(u,v)=g(u,v)-g(u,0)-g(0,v)+g(0,0).\]
Notice that
\[G(u,v)=H_v(u)-H_v(0)=S_u(v)-S_u(0),\]
where
\[H_v(u)=g(u,v)-g(u,0),\qquad S_u(v)=g(u,v)-g(0,v).\]
For fixed $v$ with $|v|<r$, the function $H_v(u)$ is defined for those $u$ with $|u|<\sqrt{r^2-v^2}$, such that $(u,v)$ is in $D$. It is differentiable with
\[H_v'(u)=\frac{\partial g}{\partial u}(u,v)-\frac{\partial g}{\partial u}(u,0).\]
Hence, if $(u,v)$ is in $D$, the mean value theorem for single variable functions implies that there exists $c_{u,v}\in(0,1)$ such that
\[G(u,v)=H_v(u)-H_v(0)=uH_v'(c_{u,v}u)=u\left(\frac{\partial g}{\partial u}(c_{u,v}u,v)-\frac{\partial g}{\partial u}(c_{u,v}u,0)\right).\]
Regarding this now as a function of $v$, the mean value theorem for single variable functions implies that there exists $d_{u,v}\in(0,1)$ such that
\[G(u,v)=uv\frac{\partial^2g}{\partial v\partial u}(c_{u,v}u,d_{u,v}v).\tag{4.1}\]
Using the same reasoning, we find that for $(u,v)\in D$, there exists $\tilde d_{u,v}\in(0,1)$ such that
\[G(u,v)=vS_u'(\tilde d_{u,v}v)=v\left(\frac{\partial g}{\partial v}(u,\tilde d_{u,v}v)-\frac{\partial g}{\partial v}(0,\tilde d_{u,v}v)\right).\]
Regarding this as a function of $u$, the mean value theorem implies that there exists $\tilde c_{u,v}\in(0,1)$ such that
\[G(u,v)=uv\frac{\partial^2g}{\partial u\partial v}(\tilde c_{u,v}u,\tilde d_{u,v}v).\tag{4.2}\]
Comparing (4.1) and (4.2), we find that when $u\ne0$ and $v\ne0$,
\[\frac{\partial^2g}{\partial v\partial u}(c_{u,v}u,d_{u,v}v)=\frac{\partial^2g}{\partial u\partial v}(\tilde c_{u,v}u,\tilde d_{u,v}v).\]
When $(u,v)\to(0,0)$, $(c_{u,v}u,d_{u,v}v)\to(0,0)$ and $(\tilde c_{u,v}u,\tilde d_{u,v}v)\to(0,0)$. The continuities of $g_{uv}$ and $g_{vu}$ at $(0,0)$ then imply that
\[\frac{\partial^2g}{\partial v\partial u}(0,0)=\frac{\partial^2g}{\partial u\partial v}(0,0).\]
This completes the proof.

Example 4.10
Consider the function $f:\mathbb{R}^2\to\mathbb{R}$ in Example 4.9 defined as
\[f(x,y)=\begin{cases}\dfrac{xy(x^2-y^2)}{x^2+y^2},&\text{if }(x,y)\ne(0,0),\\[1.5ex] 0,&\text{if }(x,y)=(0,0).\end{cases}\]
When $(x,y)\ne(0,0)$, we find that
\begin{align*}
\frac{\partial f}{\partial x}(x,y)&=\frac{y(x^4+4x^2y^2-y^4)}{(x^2+y^2)^2},\\
\frac{\partial f}{\partial y}(x,y)&=\frac{x(x^4-4x^2y^2-y^4)}{(x^2+y^2)^2}.
\end{align*}
It follows that
\[\frac{\partial^2f}{\partial y\partial x}(x,y)=\frac{x^6+9x^4y^2-9x^2y^4-y^6}{(x^2+y^2)^3}=\frac{\partial^2f}{\partial x\partial y}(x,y).\]
Indeed, both $f_{xy}$ and $f_{yx}$ are continuous on $\mathbb{R}^2\setminus\{(0,0)\}$.

Corollary 4.3
Let $O$ be an open subset of $\mathbb{R}^n$ that contains the point $\mathbf{x}_0$, and let $f:O\to\mathbb{R}$ be a function defined on $O$. If all the second order partial derivatives of the function $f:O\to\mathbb{R}$ at $\mathbf{x}_0$ are continuous, then the Hessian matrix $H_f(\mathbf{x}_0)$ of $f$ at $\mathbf{x}_0$ is a symmetric matrix.
Remark 4.6 One can define partial derivatives of higher orders following the same rationale as we define the second order partial derivatives. Extension of Clairaut’s theorem to higher order partial derivatives is straightforward. The key point is the continuity of the partial derivatives involved.
Exercises 4.1

Question 1
Let $f:\mathbb{R}^3\to\mathbb{R}$ be the function defined as
\[f(x,y,z)=\frac{xz}{e^y+1}.\]
Find $\nabla f(1,0,-1)$, the gradient of $f$ at the point $(1,0,-1)$.

Question 2
Let $\mathbf{F}:\mathbb{R}^2\to\mathbb{R}^3$ be the function defined as
\[\mathbf{F}(x,y)=\left(x^2y,\ xy^2,\ 3x^2+4y^2\right).\]
Find $\mathbf{DF}(2,-1)$, the derivative matrix of $\mathbf{F}$ at the point $(2,-1)$.

Question 3
Let $f:\mathbb{R}^3\to\mathbb{R}$ be the function defined as
\[f(x,y,z)=x^2+3xyz+2y^2z^3.\]
Find $H_f(1,-1,2)$, the Hessian matrix of $f$ at the point $(1,-1,2)$.

Question 4
Let $f:\mathbb{R}^2\to\mathbb{R}$ be the function defined as
\[f(x,y)=\begin{cases}\dfrac{3xy}{x^2+4y^2},&\text{if }(x,y)\ne(0,0),\\[1.5ex] 0,&\text{if }(x,y)=(0,0).\end{cases}\]
Show that $f$ is not continuous at $(0,0)$, but it has partial derivatives at $(0,0)$.

Question 5
Let $f:\mathbb{R}^2\to\mathbb{R}$ be the function defined as $f(x,y)=|x^2+y|$. Determine whether $f_y(1,-1)$ exists.
Question 6 Let f : R2 → R be the function defined as 2 xy , if (x, y) ̸= (0, 0), f (x, y) = x2 + y 2 0, if (x, y) = (0, 0). Show that f is continuous, it has partial derivatives, but the partial derivatives are not continuous. Question 7 Consider the function f : R2 → R defined as 2 2 xy(x + 9y ) , if (x, y) ̸= (0, 0), 4x2 + y 2 f (x, y) = 0, if (x, y) = (0, 0). Find the Hessian matrix Hf (0, 0) of f at (0, 0).
4.2 Differentiability and First Order Approximation
Let $O$ be an open subset of $\mathbb{R}^n$ that contains the point $\mathbf{x}_0$, and let $\mathbf{F}:O\to\mathbb{R}^m$ be a function defined on $O$. As we have seen in the previous section, even if $\mathbf{F}$ has partial derivatives at $\mathbf{x}_0$, it does not imply that $\mathbf{F}$ is continuous at $\mathbf{x}_0$. Heuristically, this is because the partial derivatives only consider the change of the function along the $n$ directions defined by the coordinate axes, while continuity of $\mathbf{F}$ requires us to consider the change of $\mathbf{F}$ along all directions.

4.2.1 Differentiability

In this section, we will give a suitable definition of differentiability to ensure that we can capture the change of $\mathbf{F}$ in all directions. Let us first revisit an alternative perspective of differentiability for a single variable function $f:(a,b)\to\mathbb{R}$, which we have discussed in volume I. If $x_0$ is a point in $(a,b)$, then the function $f:(a,b)\to\mathbb{R}$ is differentiable at $x_0$ if and only if there is a number $c$ such that
\[\lim_{h\to0}\frac{f(x_0+h)-f(x_0)-ch}{h}=0.\tag{4.3}\]
In fact, if $f$ is differentiable at $x_0$, then this number $c$ has to be equal to $f'(x_0)$. Now for a function $\mathbf{F}:O\to\mathbb{R}^m$ defined on an open subset $O$ of $\mathbb{R}^n$, to consider the differentiability of $\mathbf{F}$ at $\mathbf{x}_0\in O$, we should compare $\mathbf{F}(\mathbf{x}_0)$ to $\mathbf{F}(\mathbf{x}_0+\mathbf{h})$ for all $\mathbf{h}$ in a neighbourhood of $\mathbf{0}$. But then a reasonable substitute of the number $c$ should be a linear transformation $\mathbf{T}:\mathbb{R}^n\to\mathbb{R}^m$, so that for each $\mathbf{h}$ in a neighbourhood of $\mathbf{0}$, it gives a vector $\mathbf{T}(\mathbf{h})$ in $\mathbb{R}^m$. As now $\mathbf{h}$ is a vector in $\mathbb{R}^n$, we cannot divide by $\mathbf{h}$ in (4.3). It should be replaced with $\|\mathbf{h}\|$, the norm of $\mathbf{h}$.

Definition 4.9 Differentiability
Let $O$ be an open subset of $\mathbb{R}^n$ that contains the point $\mathbf{x}_0$, and let $\mathbf{F}:O\to\mathbb{R}^m$ be a function defined on $O$. The function $\mathbf{F}:O\to\mathbb{R}^m$ is differentiable at $\mathbf{x}_0$ provided that there exists a linear transformation $\mathbf{T}:\mathbb{R}^n\to\mathbb{R}^m$ so that
\[\lim_{\mathbf{h}\to\mathbf{0}}\frac{\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)-\mathbf{T}(\mathbf{h})}{\|\mathbf{h}\|}=\mathbf{0}.\]
$\mathbf{F}:O\to\mathbb{R}^m$ is differentiable if it is differentiable at each point of $O$.
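Remark 4.7
The differentiability of $\mathbf{F}:O\to\mathbb{R}^m$ at $\mathbf{x}_0$ amounts to the existence of a linear transformation $\mathbf{T}:\mathbb{R}^n\to\mathbb{R}^m$ so that
\[\mathbf{F}(\mathbf{x}_0+\mathbf{h})=\mathbf{F}(\mathbf{x}_0)+\mathbf{T}(\mathbf{h})+\boldsymbol{\varepsilon}(\mathbf{h})\|\mathbf{h}\|,\]
where $\boldsymbol{\varepsilon}(\mathbf{h})\to\mathbf{0}$ as $\mathbf{h}\to\mathbf{0}$.

The following is obvious from the definition.

Proposition 4.4
Let $O$ be an open subset of $\mathbb{R}^n$ that contains the point $\mathbf{x}_0$, and let $\mathbf{F}:O\to\mathbb{R}^m$ be a function defined on $O$. The function $\mathbf{F}:O\to\mathbb{R}^m$ is differentiable at $\mathbf{x}_0$ if and only if each of its component functions $F_j:O\to\mathbb{R}$, $1\le j\le m$, is differentiable at $\mathbf{x}_0$.

Proof
Let the components of the function
\[\boldsymbol{\varepsilon}(\mathbf{h})=\frac{\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)-\mathbf{T}(\mathbf{h})}{\|\mathbf{h}\|}\]
be $\varepsilon_1(\mathbf{h}),\varepsilon_2(\mathbf{h}),\ldots,\varepsilon_m(\mathbf{h})$. Then for $1\le j\le m$,
\[\varepsilon_j(\mathbf{h})=\frac{F_j(\mathbf{x}_0+\mathbf{h})-F_j(\mathbf{x}_0)-T_j(\mathbf{h})}{\|\mathbf{h}\|},\]
where $T_j$ is the $j$th component function of $\mathbf{T}$. The assertion of the proposition follows from the fact that
\[\lim_{\mathbf{h}\to\mathbf{0}}\boldsymbol{\varepsilon}(\mathbf{h})=\mathbf{0}\quad\text{if and only if}\quad\lim_{\mathbf{h}\to\mathbf{0}}\varepsilon_j(\mathbf{h})=0\ \text{for all }1\le j\le m,\]
while $\displaystyle\lim_{\mathbf{h}\to\mathbf{0}}\varepsilon_j(\mathbf{h})=0$ if and only if $F_j:O\to\mathbb{R}$ is differentiable at $\mathbf{x}_0$.

Let us look at a simple example of differentiable functions.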
Example 4.11
Let A be an m × n matrix, and let b be a point in Rm. Define the function F : Rn → Rm by F(x) = Ax + b. Show that F : Rn → Rm is differentiable.

Solution
Given x0 and h in Rn, notice that
\[
\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)=A(\mathbf{x}_0+\mathbf{h})+\mathbf{b}-A\mathbf{x}_0-\mathbf{b}=A\mathbf{h}. \tag{4.4}
\]
The map T : Rn → Rm defined as T(h) = Ah is a linear transformation. Eq. (4.4) says that F(x0 + h) − F(x0) − T(h) = 0. Thus,
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)-\mathbf{T}(\mathbf{h})}{\|\mathbf{h}\|}=\mathbf{0}.
\]
Therefore, F is differentiable at x0. Since the point x0 is arbitrary, the function F : Rn → Rm is differentiable.

The next theorem says that differentiability implies continuity.

Theorem 4.5 Differentiability Implies Continuity
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If the function F : O → Rm is differentiable at x0, then it is continuous at x0.

Proof
Since F : O → Rm is differentiable at x0, there exists a linear transformation T : Rn → Rm such that
\[
\boldsymbol{\varepsilon}(\mathbf{h})=\frac{\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)-\mathbf{T}(\mathbf{h})}{\|\mathbf{h}\|}\xrightarrow{\ \mathbf{h}\to\mathbf{0}\ }\mathbf{0}.
\]
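By Theorem 2.34, there is a positive constant c such that
\[
\|\mathbf{T}(\mathbf{h})\|\le c\|\mathbf{h}\|\qquad\text{for all } \mathbf{h}\in\mathbb{R}^n.
\]
Therefore,
\[
\|\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)\|\le\|\mathbf{T}(\mathbf{h})\|+\|\mathbf{h}\|\,\|\boldsymbol{\varepsilon}(\mathbf{h})\|\le\|\mathbf{h}\|\left(c+\|\boldsymbol{\varepsilon}(\mathbf{h})\|\right).
\]
This implies that
\[
\lim_{\mathbf{h}\to\mathbf{0}}\mathbf{F}(\mathbf{x}_0+\mathbf{h})=\mathbf{F}(\mathbf{x}_0).
\]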
Thus, F : O → Rm is continuous at x0.

Example 4.12
The function f : R2 → R defined as
\[
f(x,y)=\begin{cases}\dfrac{xy}{x^2+y^2}, & \text{if } (x,y)\neq(0,0),\\[1ex] 0, & \text{if } (x,y)=(0,0)\end{cases}
\]
in Example 4.6 is not differentiable at (0, 0) since it is not continuous at (0, 0). However, we have shown that it has partial derivatives at (0, 0).

Let us study the function F : Rn → Rm, F(x) = Ax + b that is defined in Example 4.11. The component functions of F are
\[
\begin{aligned}
F_1(x_1,x_2,\ldots,x_n)&=a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n+b_1,\\
F_2(x_1,x_2,\ldots,x_n)&=a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n+b_2,\\
&\ \,\vdots\\
F_m(x_1,x_2,\ldots,x_n)&=a_{m1}x_1+a_{m2}x_2+\cdots+a_{mn}x_n+b_m.
\end{aligned}
\]
Notice that
\[
\begin{aligned}
\nabla F_1(\mathbf{x})&=\mathbf{a}_1=(a_{11},a_{12},\ldots,a_{1n}),\\
\nabla F_2(\mathbf{x})&=\mathbf{a}_2=(a_{21},a_{22},\ldots,a_{2n}),\\
&\ \,\vdots\\
\nabla F_m(\mathbf{x})&=\mathbf{a}_m=(a_{m1},a_{m2},\ldots,a_{mn})
\end{aligned}
\]
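are the row vectors of A. Hence, the derivative matrix of F is given by
\[
D\mathbf{F}(\mathbf{x})=\begin{bmatrix}\nabla F_1(\mathbf{x})\\ \nabla F_2(\mathbf{x})\\ \vdots\\ \nabla F_m(\mathbf{x})\end{bmatrix}
=\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}&a_{m2}&\cdots&a_{mn}\end{bmatrix},
\]
which is the matrix A itself. Observe that
\[
D\mathbf{F}(\mathbf{x})\mathbf{h}
=\begin{bmatrix}a_{11}h_1+a_{12}h_2+\cdots+a_{1n}h_n\\ a_{21}h_1+a_{22}h_2+\cdots+a_{2n}h_n\\ \vdots\\ a_{m1}h_1+a_{m2}h_2+\cdots+a_{mn}h_n\end{bmatrix}
=\begin{bmatrix}\langle\nabla F_1(\mathbf{x}),\mathbf{h}\rangle\\ \langle\nabla F_2(\mathbf{x}),\mathbf{h}\rangle\\ \vdots\\ \langle\nabla F_m(\mathbf{x}),\mathbf{h}\rangle\end{bmatrix}.
\]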
From Example 4.11, we suspect that the linear transformation T : Rn → Rm that appears in the definition of differentiability of a function should be the linear transformation defined by the derivative matrix. In fact, this is the case.

Theorem 4.6
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. The following are equivalent.
(a) The function F : O → Rm is differentiable at x0.
(b) The function F : O → Rm has partial derivatives at x0, and
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)-D\mathbf{F}(\mathbf{x}_0)\mathbf{h}}{\|\mathbf{h}\|}=\mathbf{0}. \tag{4.5}
\]
(c) For each 1 ≤ j ≤ m, the component function Fj : O → R has partial derivatives at x0, and
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{F_j(\mathbf{x}_0+\mathbf{h})-F_j(\mathbf{x}_0)-\langle\nabla F_j(\mathbf{x}_0),\mathbf{h}\rangle}{\|\mathbf{h}\|}=0.
\]
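Proof
The equivalence of (b) and (c) is Proposition 4.4, the componentwise differentiability. Thus, we are left to prove the equivalence of (a) and (b). First, we prove that (b) implies (a). If (b) holds, let T : Rn → Rm be the linear transformation defined by the derivative matrix DF(x0). Then (4.5) says that F : O → Rm is differentiable at x0.

Conversely, assume that F : O → Rm is differentiable at x0. Then there exists a linear transformation T : Rn → Rm such that
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)-\mathbf{T}(\mathbf{h})}{\|\mathbf{h}\|}=\mathbf{0}. \tag{4.6}
\]
Let A be an m × n matrix so that T(h) = Ah. For 1 ≤ i ≤ n, eq. (4.6) implies that
\[
\lim_{h\to 0}\frac{\mathbf{F}(\mathbf{x}_0+h\mathbf{e}_i)-\mathbf{F}(\mathbf{x}_0)-A(h\mathbf{e}_i)}{h}=\mathbf{0}.
\]
This gives
\[
A\mathbf{e}_i=\lim_{h\to 0}\frac{\mathbf{F}(\mathbf{x}_0+h\mathbf{e}_i)-\mathbf{F}(\mathbf{x}_0)}{h}.
\]
This shows that $\dfrac{\partial\mathbf{F}}{\partial x_i}(\mathbf{x}_0)$ exists and
\[
\frac{\partial\mathbf{F}}{\partial x_i}(\mathbf{x}_0)=A\mathbf{e}_i.
\]
Therefore, F : O → Rm has partial derivatives at x0. Since
\[
A=\begin{bmatrix}A\mathbf{e}_1 & A\mathbf{e}_2 & \cdots & A\mathbf{e}_n\end{bmatrix}
=\begin{bmatrix}\dfrac{\partial\mathbf{F}}{\partial x_1}(\mathbf{x}_0) & \dfrac{\partial\mathbf{F}}{\partial x_2}(\mathbf{x}_0) & \cdots & \dfrac{\partial\mathbf{F}}{\partial x_n}(\mathbf{x}_0)\end{bmatrix}
=D\mathbf{F}(\mathbf{x}_0),
\]
eq. (4.6) says that
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)-D\mathbf{F}(\mathbf{x}_0)\mathbf{h}}{\|\mathbf{h}\|}=\mathbf{0}.
\]
This proves that (a) implies (b).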
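The identity Ae_i = ∂F/∂x_i(x0) from the proof can be illustrated numerically. The following is a minimal sketch, not part of the original text: it approximates each partial derivative of the affine map of Example 4.11 by a central difference (an assumption made here for illustration) and assembles the results column by column. The names `affine_map` and `numerical_jacobian`, the sample matrix and the step size are all hypothetical choices.

```python
def affine_map(A, b):
    """Return the map F(x) = A x + b for a matrix A (list of rows) and a vector b."""
    def F(x):
        return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
                for row, b_i in zip(A, b)]
    return F


def numerical_jacobian(F, x0, step=1e-6):
    """Approximate DF(x0) column by column using central differences."""
    m, n = len(F(x0)), len(x0)
    J = [[0.0] * n for _ in range(m)]
    for i in range(n):
        forward = list(x0)
        backward = list(x0)
        forward[i] += step      # x0 + step * e_i
        backward[i] -= step     # x0 - step * e_i
        F_plus, F_minus = F(forward), F(backward)
        for j in range(m):
            # i-th column approximates the partial derivative of F with respect to x_i
            J[j][i] = (F_plus[j] - F_minus[j]) / (2 * step)
    return J


# Illustrative data (not from the text): a 2 x 3 matrix and a shift vector.
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [7.0, -2.0]
J = numerical_jacobian(affine_map(A, b), [0.5, -1.0, 2.0])
for row in J:
    print(row)
```

Up to rounding error, the printed rows should agree with the rows of A, in line with DF(x) = A for F(x) = Ax + b.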
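Corollary 4.7
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If the partial derivatives of F : O → Rm exist at x0, but
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)-D\mathbf{F}(\mathbf{x}_0)\mathbf{h}}{\|\mathbf{h}\|}\neq\mathbf{0},
\]
then F is not differentiable at x0.

Proof
If F is differentiable at x0, Theorem 4.6 says that we must have
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)-D\mathbf{F}(\mathbf{x}_0)\mathbf{h}}{\|\mathbf{h}\|}=\mathbf{0}.
\]
By contrapositive, since
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{\mathbf{F}(\mathbf{x}_0+\mathbf{h})-\mathbf{F}(\mathbf{x}_0)-D\mathbf{F}(\mathbf{x}_0)\mathbf{h}}{\|\mathbf{h}\|}\neq\mathbf{0},
\]
we find that F is not differentiable at x0.

Example 4.13
Let f : R2 → R be the function defined as
\[
f(x,y)=\begin{cases}\dfrac{x^3}{x^2+y^2}, & \text{if } (x,y)\neq(0,0),\\[1ex] 0, & \text{if } (x,y)=(0,0).\end{cases}
\]
Determine whether f is differentiable at (0, 0).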
Solution
One can show that f is continuous at 0 = (0, 0). Hence, we cannot use continuity to determine whether f is differentiable at 0. Notice that
\[
f_x(0,0)=\lim_{h\to 0}\frac{f(h,0)-f(0,0)}{h}=\lim_{h\to 0}\frac{h-0}{h}=1,
\]
Figure 4.5: The function f (x, y) defined in Example 4.13.
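\[
f_y(0,0)=\lim_{h\to 0}\frac{f(0,h)-f(0,0)}{h}=\lim_{h\to 0}\frac{0-0}{h}=0.
\]
Therefore, f has partial derivatives at 0, and ∇f(0) = (1, 0). Now we consider the function
\[
\varepsilon(\mathbf{h})=\frac{f(\mathbf{h})-f(\mathbf{0})-\langle\nabla f(\mathbf{0}),\mathbf{h}\rangle}{\|\mathbf{h}\|}=-\frac{h_1h_2^2}{(h_1^2+h_2^2)^{3/2}}.
\]
Let {hk} be the sequence with $\mathbf{h}_k=\left(\dfrac{1}{k},\dfrac{1}{k}\right)$. It converges to 0. Since
\[
\varepsilon(\mathbf{h}_k)=-\frac{1}{2\sqrt{2}}\qquad\text{for all } k\in\mathbb{Z}^+,
\]
the sequence {ε(hk)} does not converge to 0. Hence,
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{f(\mathbf{h})-f(\mathbf{0})-\langle\nabla f(\mathbf{0}),\mathbf{h}\rangle}{\|\mathbf{h}\|}\neq 0.
\]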
Therefore, f is not differentiable at (0, 0). Example 4.13 gives a function which is continuous and has partial derivatives at a point, yet it fails to be differentiable at that point. In the following, we are going to give a sufficient condition for differentiability. We begin with a lemma.
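Before turning to the lemma, the computation in Example 4.13 can be checked numerically. The sketch below is only an illustration under the formulas of that example (with ∇f(0) = (1, 0)); the function names `f` and `epsilon` and the sampled values of k are choices made here, not part of the text.

```python
import math


def f(x, y):
    """The function of Example 4.13: x^3 / (x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**3 / (x**2 + y**2)


def epsilon(h1, h2):
    """Error quotient (f(h) - f(0) - <grad f(0), h>) / ||h||, using grad f(0) = (1, 0)."""
    norm = math.hypot(h1, h2)
    return (f(h1, h2) - f(0.0, 0.0) - (1.0 * h1 + 0.0 * h2)) / norm


# Along h_k = (1/k, 1/k) the quotient stays at -1/(2*sqrt(2)) instead of tending to 0.
diagonal_values = [epsilon(1.0 / k, 1.0 / k) for k in (1, 10, 100, 1000)]
print(diagonal_values, -1.0 / (2.0 * math.sqrt(2.0)))

# Along the x-axis the quotient is essentially 0, so the behaviour depends on the direction.
axis_values = [epsilon(1.0 / k, 0.0) for k in (1, 10, 100, 1000)]
print(axis_values)
```

The contrast between the two directions is what prevents ε(h) from having the limit 0 as h → 0.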
Lemma 4.8
Let x0 be a point in Rn and let f : B(x0, r) → R be a function defined on an open ball centered at x0. Assume that f : B(x0, r) → R has first order partial derivatives. For each h in Rn with ∥h∥ < r, there exist z1, . . . , zn in B(x0, r) such that
\[
f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)=\sum_{i=1}^{n}h_i\frac{\partial f}{\partial x_i}(\mathbf{z}_i),
\]
and
\[
\|\mathbf{z}_i-\mathbf{x}_0\|<\|\mathbf{h}\|\qquad\text{for all } 1\le i\le n.
\]
Proof
We will take a zigzag path from x0 to x0 + h, which is a union of paths parallel to the coordinate axes. For 1 ≤ i ≤ n, let
\[
\mathbf{x}_i=\mathbf{x}_0+\sum_{k=1}^{i}h_k\mathbf{e}_k=\mathbf{x}_0+h_1\mathbf{e}_1+\cdots+h_i\mathbf{e}_i.
\]
Then xi is in B(x0, r). Notice that B(x0, r) is a convex set. Therefore, for any 1 ≤ i ≤ n, the line segment between xi−1 and xi = xi−1 + hi ei lies entirely inside B(x0, r). Since f : B(x0, r) → R has first order partial derivative with respect to xi, the function gi : [0, 1] → R,
\[
g_i(t)=f(\mathbf{x}_{i-1}+th_i\mathbf{e}_i)
\]
is differentiable and
\[
g_i'(t)=h_i\frac{\partial f}{\partial x_i}(\mathbf{x}_{i-1}+th_i\mathbf{e}_i).
\]
By the mean value theorem, there exists ci ∈ (0, 1) such that
\[
f(\mathbf{x}_i)-f(\mathbf{x}_{i-1})=g_i(1)-g_i(0)=g_i'(c_i)=h_i\frac{\partial f}{\partial x_i}(\mathbf{x}_{i-1}+c_ih_i\mathbf{e}_i).
\]
Let
\[
\mathbf{z}_i=\mathbf{x}_{i-1}+c_ih_i\mathbf{e}_i=\mathbf{x}_0+\sum_{k=1}^{i-1}h_k\mathbf{e}_k+c_ih_i\mathbf{e}_i.
\]
Then zi is a point in B(x0, r). Moreover,
\[
f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)=\sum_{i=1}^{n}\left(f(\mathbf{x}_i)-f(\mathbf{x}_{i-1})\right)=\sum_{i=1}^{n}h_i\frac{\partial f}{\partial x_i}(\mathbf{z}_i).
\]
For 1 ≤ i ≤ n, since ci ∈ (0, 1), we have
\[
\|\mathbf{z}_i-\mathbf{x}_0\|=\sqrt{h_1^2+\cdots+h_{i-1}^2+c_i^2h_i^2}<\sqrt{h_1^2+\cdots+h_{i-1}^2+h_i^2}\le\|\mathbf{h}\|.
\]
This completes the proof.
Figure 4.6: A zigzag path from x0 to x0 + h.

Theorem 4.9
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If the partial derivatives of F : O → Rm exist and are continuous at x0, then F is differentiable at x0.
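Proof
By Proposition 4.4, it suffices to prove the theorem for a function f : O → R with codomain R. Since O is an open set that contains the point x0, there exists r > 0 such that B(x0, r) ⊂ O. By Lemma 4.8, for each h that satisfies 0 < ∥h∥ < r, there exist z1, z2, . . . , zn such that
\[
f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)=\sum_{i=1}^{n}h_i\frac{\partial f}{\partial x_i}(\mathbf{z}_i),
\]
and
\[
\|\mathbf{z}_i-\mathbf{x}_0\|<\|\mathbf{h}\|\qquad\text{for all } 1\le i\le n.
\]
Therefore,
\[
\frac{f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\langle\nabla f(\mathbf{x}_0),\mathbf{h}\rangle}{\|\mathbf{h}\|}=\sum_{i=1}^{n}\frac{h_i}{\|\mathbf{h}\|}\left(\frac{\partial f}{\partial x_i}(\mathbf{z}_i)-\frac{\partial f}{\partial x_i}(\mathbf{x}_0)\right).
\]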
Fix ε > 0. For 1 ≤ i ≤ n, since fxi : B(x0, r) → R is continuous at x0, there exists 0 < δi ≤ r such that if 0 < ∥z − x0∥ < δi, then |fxi(z) − fxi(x0)| < ε/n. Take δ = min{δ1, . . . , δn}. Then δ > 0. If ∥h∥ < δ, then for 1 ≤ i ≤ n, ∥zi − x0∥ < ∥h∥ < δ ≤ δi. Thus, |fxi (zi ) − fxi (x0 )|
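0 such that B(x0, r) ⊂ O. For each h in Rn with ∥h∥ < r, Lemma 4.18 says that there is a ch ∈ (0, 1) such that
\[
f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\langle\nabla f(\mathbf{x}_0),\mathbf{h}\rangle=\frac{1}{2}\mathbf{h}^TH_f(\mathbf{x}_0+c_{\mathbf{h}}\mathbf{h})\mathbf{h}.
\]
Therefore, if 0 < ∥h∥ < r,
\[
\begin{aligned}
&\left|\frac{f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\langle\nabla f(\mathbf{x}_0),\mathbf{h}\rangle-\frac{1}{2}\mathbf{h}^TH_f(\mathbf{x}_0)\mathbf{h}}{\|\mathbf{h}\|^2}\right|\\
&\qquad=\left|\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{h_ih_j}{\|\mathbf{h}\|^2}\left(\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0+c_{\mathbf{h}}\mathbf{h})-\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)\right)\right|\\
&\qquad\le\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{|h_i||h_j|}{\|\mathbf{h}\|^2}\left|\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0+c_{\mathbf{h}}\mathbf{h})-\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)\right|\\
&\qquad\le\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left|\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0+c_{\mathbf{h}}\mathbf{h})-\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0)\right|.
\end{aligned}
\]
Since ch ∈ (0, 1), $\lim_{\mathbf{h}\to\mathbf{0}}(\mathbf{x}_0+c_{\mathbf{h}}\mathbf{h})=\mathbf{x}_0$. For all 1 ≤ i ≤ n, 1 ≤ j ≤ n, fxj xi is continuous. Hence,
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0+c_{\mathbf{h}}\mathbf{h})=\frac{\partial^2f}{\partial x_j\partial x_i}(\mathbf{x}_0).
\]
This proves that
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\langle\nabla f(\mathbf{x}_0),\mathbf{h}\rangle-\frac{1}{2}\mathbf{h}^TH_f(\mathbf{x}_0)\mathbf{h}}{\|\mathbf{h}\|^2}=0.
\]
To prove part (b), let
\[
P(\mathbf{x})=f(\mathbf{x}_0)+\langle\nabla f(\mathbf{x}_0),\mathbf{x}-\mathbf{x}_0\rangle+\frac{1}{2}(\mathbf{x}-\mathbf{x}_0)^TH_f(\mathbf{x}_0)(\mathbf{x}-\mathbf{x}_0).
\]
Part (a) says that
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{f(\mathbf{x}_0+\mathbf{h})-P(\mathbf{x}_0+\mathbf{h})}{\|\mathbf{h}\|^2}=0. \tag{4.9}
\]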
Since Q(x) is a polynomial of degree at most two in x, Q(x0 + h) is a polynomial of degree at most two in h. Therefore, we can write Q(x0 + h) as n n X X 1X aii h2i + aij hi hj . Q(x0 + h) = c + bi hi + 2 i=1 i=1 1≤i 0 such that B(x0 , δ) ⊂ O and for all x ∈ B(x0 , δ), f (x) ≥ f (x0 ). The value f (x0 ) is called a local minimum value of f . 3. The point x0 is called a local extremizer if it is either a local maximizer or a local minimizer. The value f (x0 ) is called a local extreme value if it is either a local maximum value or a local minimum value. From the definition, it is obvious that x0 is a local minimizer of the function f : O → R if and only if it is a local maximizer of the function −f : O → R. Example 4.28 (a) For the function f : R2 → R, f (x, y) = x2 + y 2 , (0, 0) is a local minimizer. (b) For the function g : R2 → R, g(x, y) = −x2 − y 2 , (0, 0) is a local maximizer.
(c) For the function h : R2 → R, $h(x,y)=x^2-y^2$, 0 = (0, 0) is neither a local maximizer nor a local minimizer. For any δ > 0, let r = δ/2. The points u = (r, 0) and v = (0, r) are in B(0, δ), but
\[
h(\mathbf{u})=r^2>0=h(\mathbf{0}),\qquad h(\mathbf{v})=-r^2<0=h(\mathbf{0}).
\]
Figure 4.10: The functions f(x, y), g(x, y) and h(x, y) defined in Example 4.28.

The following theorem gives a necessary condition for a point to be a local extremizer if the function has partial derivatives at that point.

Theorem 4.20
Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. If x0 is a local extremizer and f has partial derivatives at x0, then the gradient of f at x0 is the zero vector, namely, ∇f(x0) = 0.

Proof
Without loss of generality, assume that x0 is a local minimizer. Then there is a δ > 0 such that B(x0, δ) ⊂ O and
\[
f(\mathbf{x})\ge f(\mathbf{x}_0)\qquad\text{for all } \mathbf{x}\in B(\mathbf{x}_0,\delta). \tag{4.14}
\]
For 1 ≤ i ≤ n, consider the function gi : (−δ, δ) → R defined by
\[
g_i(t)=f(\mathbf{x}_0+t\mathbf{e}_i).
\]
By the definition of partial derivatives, gi is differentiable at t = 0 and
\[
g_i'(0)=\frac{\partial f}{\partial x_i}(\mathbf{x}_0).
\]
Eq. (4.14) implies that
\[
g_i(t)\ge g_i(0)\qquad\text{for all } t\in(-\delta,\delta).
\]
In other words, t = 0 is a local minimizer of the function gi : (−δ, δ) → R. From the theory of single variable analysis, we must have gi′ (0) = 0. Hence, fxi (x0 ) = 0 for all 1 ≤ i ≤ n. This proves that ∇f (x0 ) = 0. Theorem 4.20 prompts us to make the following definition. Definition 4.18 Stationary Points Let O be an open subset of Rn that contains the point x0 , and let f : O → R be a function defined on O. If f has partial derivatives at x0 and ∇f (x0 ) = 0, we call x0 a stationary point of f . Theorem 4.20 says that if f : O → R has partial derivatives at x0 , a necessary condition for x0 to be a local extremizer is that it is a stationary point. Example 4.29 For all the three functions f , g and h defined in Example 4.28, the point 0 = (0, 0) is a stationary point. However, 0 is local minimizer of f , a local maximizer of g, but neither a local maximizer nor a local minimizer of h. The behavior of the function h(x, y) = x2 − y 2 in Example 4.28 prompts us to make the following definition.
Definition 4.19 Saddle Points Let O be an open subset of Rn that contains the point x0 , and let f : O → R be a function defined on O. The point x0 is a saddle point of the function f if it is a stationary point of f , but it is not a local extremizer. In other words, ∇f (x0 ) = 0, but for any δ > 0, there exist x1 and x2 in B(x0 , δ) ∩ O such that f (x1 ) > f (x0 ) and f (x2 ) < f (x0 ). Example 4.30 (0, 0) is a saddle point of the function h : R2 → R, h(x, y) = x2 − y 2 . By definition, if x0 is a stationary point of the function f : O → R, then it is either a local maximizer, a local minimizer, or a saddle point. If f : O → R has continuous second order partial derivatives at x0 , we can use the second derivative test to partially determine whether x0 is a local maximizer, a local minimizer, or a saddle point. When n = 1, we have seen that a stationary point x0 of a function f is a local minimum if f ′′ (x0 ) > 0. It is a local maximum if f ′′ (x0 ) < 0. For multivariable functions, it is natural to expect that whether x0 is a local extremizer depends on the definiteness of the Hessian matrix Hf (x0 ). In Section 2.1, we have discussed the classification of a symmetric matrix. It is either positive semi-definite, negative semi-definite or indefinite. Among the positive semi-definite ones, there are those that are positive definite. Among the negative semi-definite matrices, there are those which are negative definite. Theorem 4.21 Second Derivative Test Let O be an open subset of Rn , and let f : O → R be a twice continuously differentiable function defined on O. Assume that x0 is a stationary point of f : O → R. (i) If Hf (x0 ) is positive definite, then x0 is a local minimizer of f . (ii) If Hf (x0 ) is negative definite, then x0 is a local maximizer of f . (iii) If Hf (x0 ) is indefinite, then x0 is a saddle point.
The cases that are not covered in the second derivative test are the cases where Hf(x0) is positive semi-definite but not positive definite, or Hf(x0) is negative semi-definite but not negative definite. These are the inconclusive cases.

Proof of the Second Derivative Test
Notice that (i) and (ii) are equivalent since x0 is a local minimizer of f if and only if it is a local maximizer of −f, and H−f = −Hf. A symmetric matrix A is positive definite if and only if −A is negative definite. Thus, we only need to prove (i) and (iii). Since x0 is a stationary point, ∇f(x0) = 0. It follows from the second order approximation theorem that
\[
\lim_{\mathbf{h}\to\mathbf{0}}\frac{f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\frac{1}{2}\mathbf{h}^TH_f(\mathbf{x}_0)\mathbf{h}}{\|\mathbf{h}\|^2}=0. \tag{4.15}
\]
To prove (i), assume that Hf(x0) is positive definite. By Theorem 2.9, there is a positive number c such that
\[
\mathbf{h}^TH_f(\mathbf{x}_0)\mathbf{h}\ge c\|\mathbf{h}\|^2\qquad\text{for all } \mathbf{h}\in\mathbb{R}^n.
\]
Eq. (4.15) implies that there is a δ > 0 such that B(x0, δ) ⊂ O and for all h with 0 < ∥h∥ < δ,
\[
\left|\frac{f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\frac{1}{2}\mathbf{h}^TH_f(\mathbf{x}_0)\mathbf{h}}{\|\mathbf{h}\|^2}\right|<\frac{c}{3}.
\]
Therefore,
\[
\left|f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\frac{1}{2}\mathbf{h}^TH_f(\mathbf{x}_0)\mathbf{h}\right|\le\frac{c}{3}\|\mathbf{h}\|^2\qquad\text{for all } \|\mathbf{h}\|<\delta.
\]
This implies that for all h with ∥h∥ < δ,
\[
f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)\ge\frac{1}{2}\mathbf{h}^TH_f(\mathbf{x}_0)\mathbf{h}-\frac{c}{3}\|\mathbf{h}\|^2\ge\frac{c}{6}\|\mathbf{h}\|^2\ge 0.
\]
Thus, f(x) ≥ f(x0) for all x ∈ B(x0, δ). This shows that x0 is a local minimizer of f.
Now to prove (iii), assume that Hf(x0) is indefinite. Then there exist unit vectors u1 and u2 so that
\[
\varepsilon_1=\mathbf{u}_1^TH_f(\mathbf{x}_0)\mathbf{u}_1<0,\qquad \varepsilon_2=\mathbf{u}_2^TH_f(\mathbf{x}_0)\mathbf{u}_2>0.
\]
Let $\varepsilon=\frac{1}{2}\min\{|\varepsilon_1|,\varepsilon_2\}$. Eq. (4.15) implies that there is a δ0 > 0 such that B(x0, δ0) ⊂ O and for all h with 0 < ∥h∥ < δ0,
\[
\left|f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\frac{1}{2}\mathbf{h}^TH_f(\mathbf{x}_0)\mathbf{h}\right|<\varepsilon\|\mathbf{h}\|^2. \tag{4.16}
\]
For any δ > 0, let $r=\frac{1}{2}\min\{\delta,\delta_0\}$. Then the points x1 = x0 + ru1 and x2 = x0 + ru2 are in the ball B(x0, δ) and the ball B(x0, δ0). Eq. (4.16) implies that for i = 1, 2,
\[
-r^2\varepsilon\le f(\mathbf{x}_0+r\mathbf{u}_i)-f(\mathbf{x}_0)-\frac{r^2}{2}\mathbf{u}_i^TH_f(\mathbf{x}_0)\mathbf{u}_i<r^2\varepsilon.
\]
Therefore,
\[
f(\mathbf{x}_0+r\mathbf{u}_1)-f(\mathbf{x}_0)<r^2\left(\frac{1}{2}\mathbf{u}_1^TH_f(\mathbf{x}_0)\mathbf{u}_1+\varepsilon\right)=r^2\left(\frac{1}{2}\varepsilon_1+\varepsilon\right)\le 0
\]
since $\varepsilon\le-\frac{1}{2}\varepsilon_1$; while
\[
f(\mathbf{x}_0+r\mathbf{u}_2)-f(\mathbf{x}_0)>r^2\left(\frac{1}{2}\mathbf{u}_2^TH_f(\mathbf{x}_0)\mathbf{u}_2-\varepsilon\right)=r^2\left(\frac{1}{2}\varepsilon_2-\varepsilon\right)\ge 0
\]
since $\varepsilon\le\frac{1}{2}\varepsilon_2$. Thus, x1 and x2 are points in B(x0, δ), but f(x1) < f(x0) while f(x2) > f(x0). These show that x0 is a saddle point.

A symmetric matrix is positive definite if and only if all its eigenvalues are positive. It is negative definite if and only if all its eigenvalues are negative. It is indefinite if it has at least one positive eigenvalue, and at least one negative eigenvalue. For a diagonal matrix, its eigenvalues are the entries on the diagonal. Let us revisit Example 4.28.
Example 4.31
For the functions considered in Example 4.28, we have seen that (0, 0) is a stationary point of each of them. Notice that
\[
H_f(0,0)=\begin{bmatrix}2&0\\0&2\end{bmatrix}
\]
is positive definite,
\[
H_g(0,0)=\begin{bmatrix}-2&0\\0&-2\end{bmatrix}
\]
is negative definite, and
\[
H_h(0,0)=\begin{bmatrix}2&0\\0&-2\end{bmatrix}
\]
is indefinite. Therefore, (0, 0) is a local minimizer of f, a local maximizer of g, and a saddle point of h.

Now let us look at an example which shows that when the Hessian matrix is positive semi-definite but not positive definite, we cannot make any conclusion about the nature of a stationary point.

Example 4.32
Consider the functions f : R2 → R and g : R2 → R given respectively by
\[
f(x,y)=x^2+y^4,\qquad g(x,y)=x^2-y^4.
\]
These are infinitely differentiable functions. It is easy to check that (0, 0) is a stationary point of both of them. Now,
\[
H_f(0,0)=H_g(0,0)=\begin{bmatrix}2&0\\0&0\end{bmatrix}
\]
is a positive semi-definite matrix. However, (0, 0) is a local minimizer of f, but a saddle point of g.

To determine the definiteness of an n × n symmetric matrix by looking at the sign of its eigenvalues is ineffective when n ≥ 3. There is an easier way to determine whether a symmetric matrix is positive definite. Let us first introduce the definition of principal submatrices.
Definition 4.20 Principal Submatrices
Let A be an n × n matrix. For 1 ≤ k ≤ n, the k-th principal submatrix Mk of A is the k × k matrix consisting of the first k rows and first k columns of A.

Example 4.33
For the matrix
\[
A=\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix},
\]
the first, second and third principal submatrices are
\[
M_1=\begin{bmatrix}1\end{bmatrix},\qquad M_2=\begin{bmatrix}1&2\\4&5\end{bmatrix},\qquad M_3=\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}
\]
respectively.

Theorem 4.22 Sylvester's Criterion for Positive Definiteness
An n × n symmetric matrix A is positive definite if and only if det Mk > 0 for all 1 ≤ k ≤ n, where Mk is its k-th principal submatrix.

The proof of this theorem is given in Appendix A. Using the fact that a symmetric matrix A is negative definite if and only if −A is positive definite, it is easy to obtain a criterion for a symmetric matrix to be negative definite in terms of the determinants of its principal submatrices.

Theorem 4.23 Sylvester's Criterion for Negative Definiteness
An n × n symmetric matrix A is negative definite if and only if $(-1)^k\det M_k>0$ for all 1 ≤ k ≤ n, where Mk is its k-th principal submatrix.
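The two criteria above can be sketched in code as follows. This is a minimal illustration rather than anything from the text: the helper names (`determinant`, `principal_minors`, `is_positive_definite`, `is_negative_definite`) are hypothetical, determinants are computed by Gaussian elimination in exact rational arithmetic, and the definiteness tests assume, without checking, that the input matrix is symmetric.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination with exact fractions."""
    a = [[Fraction(entry) for entry in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det              # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def principal_minors(matrix):
    """Return [det M_1, ..., det M_n], where M_k is the first k rows and columns."""
    n = len(matrix)
    return [determinant([row[:k] for row in matrix[:k]]) for k in range(1, n + 1)]


def is_positive_definite(symmetric_matrix):
    """Theorem 4.22: det M_k > 0 for all k (the input is assumed symmetric)."""
    return all(d > 0 for d in principal_minors(symmetric_matrix))


def is_negative_definite(symmetric_matrix):
    """Theorem 4.23: (-1)^k det M_k > 0 for all k (the input is assumed symmetric)."""
    minors = principal_minors(symmetric_matrix)
    return all((-1) ** k * d > 0 for k, d in enumerate(minors, start=1))


# Determinants of the principal submatrices of the matrix in Example 4.33.
print(principal_minors([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# The Hessians from Example 4.31.
print(is_positive_definite([[2, 0], [0, 2]]), is_negative_definite([[-2, 0], [0, -2]]))
```

Exact fractions are used so that the sign tests are not affected by floating point rounding; for large matrices a numerical factorization would be the more usual choice.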
Example 4.34
Consider the matrix
\[
A=\begin{bmatrix}1&2&-3\\-1&4&2\\-3&5&8\end{bmatrix}.
\]
Since det M1 = 1, det M2 = 6, det M3 = det A = 5 are all positive, A is positive definite.

For a function f : O → R defined on an open subset O of R2, we have the following.

Theorem 4.24
Let O be an open subset of R2. Suppose that (x0, y0) is a stationary point of the twice continuously differentiable function f : O → R. Let
\[
D(x_0,y_0)=\frac{\partial^2f}{\partial x^2}(x_0,y_0)\,\frac{\partial^2f}{\partial y^2}(x_0,y_0)-\left(\frac{\partial^2f}{\partial x\partial y}(x_0,y_0)\right)^2.
\]
(i) If $\dfrac{\partial^2f}{\partial x^2}(x_0,y_0)>0$ and D(x0, y0) > 0, then the point (x0, y0) is a local minimizer of f.
(ii) If $\dfrac{\partial^2f}{\partial x^2}(x_0,y_0)<0$ and D(x0, y0) > 0, then the point (x0, y0) is a local maximizer of f.
(iii) If D(x0, y0) < 0, the point (x0, y0) is a saddle point of f.

Proof
We notice that
\[
H_f(x_0,y_0)=\begin{bmatrix}\dfrac{\partial^2f}{\partial x^2}(x_0,y_0)&\dfrac{\partial^2f}{\partial x\partial y}(x_0,y_0)\\[2ex]\dfrac{\partial^2f}{\partial x\partial y}(x_0,y_0)&\dfrac{\partial^2f}{\partial y^2}(x_0,y_0)\end{bmatrix}.
\]
Hence, $\dfrac{\partial^2f}{\partial x^2}(x_0,y_0)$ is the determinant of the first principal submatrix of Hf(x0, y0), while D(x0, y0) is the determinant of Hf(x0, y0), the second principal submatrix of Hf(x0, y0). Thus, (i) and (ii) follow from the Sylvester criteria as well as the second derivative test. For (iii), we notice that the 2 × 2 matrix Hf(x0, y0) is indefinite if and only if it has one positive eigenvalue and one negative eigenvalue, if and only if D(x0, y0) = det Hf(x0, y0) < 0.

Now we look at some examples of the applications of the second derivative test.

Example 4.35
Let f : R2 → R be the function defined as
\[
f(x,y)=x^4+y^4+4xy.
\]
Find the stationary points of f and classify them.

Solution
Since f is a polynomial function, it is infinitely differentiable, and
\[
\nabla f(x,y)=(4x^3+4y,\ 4y^3+4x).
\]
To find the stationary points, we need to solve the system of equations
\[
\begin{cases}x^3+y=0,\\ y^3+x=0.\end{cases}
\]
From the first equation, we have $y=-x^3$. Substituting into the second equation gives
\[
-x^9+x=0,
\]
or equivalently,
\[
x(x^8-1)=0.
\]
Thus, x = 0 or x = ±1. When x = 0, y = 0. When x = ±1, y = ∓1. Therefore, the stationary points of f are u1 = (0, 0), u2 = (1, −1) and u3 = (−1, 1). Now,
\[
H_f(x,y)=\begin{bmatrix}12x^2&4\\4&12y^2\end{bmatrix}.
\]
Therefore,
\[
H_f(\mathbf{u}_1)=\begin{bmatrix}0&4\\4&0\end{bmatrix},\qquad H_f(\mathbf{u}_2)=H_f(\mathbf{u}_3)=\begin{bmatrix}12&4\\4&12\end{bmatrix}.
\]
It follows that
\[
D(\mathbf{u}_1)=-16<0,\qquad D(\mathbf{u}_2)=D(\mathbf{u}_3)=128>0.
\]
Since fxx(u2) = fxx(u3) = 12 > 0, we conclude that u1 is a saddle point, and u2 and u3 are local minimizers.
Figure 4.11: The function f (x, y) = x4 + y 4 + 4xy.
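The classification in Example 4.35 can be reproduced with a short script. This is a sketch under the formulas of that example only: the gradient and the second partial derivatives are hard-coded from the solution above, and the function names (`gradient`, `second_partials`, `classify`) are hypothetical.

```python
def gradient(x, y):
    """Gradient of f(x, y) = x^4 + y^4 + 4xy."""
    return (4 * x**3 + 4 * y, 4 * y**3 + 4 * x)


def second_partials(x, y):
    """Return (f_xx, f_xy, f_yy) for f(x, y) = x^4 + y^4 + 4xy."""
    return (12 * x**2, 4, 12 * y**2)


def classify(x, y):
    """Apply Theorem 4.24 at a stationary point (x, y)."""
    f_xx, f_xy, f_yy = second_partials(x, y)
    D = f_xx * f_yy - f_xy**2
    if D > 0 and f_xx > 0:
        return "local minimizer"
    if D > 0 and f_xx < 0:
        return "local maximizer"
    if D < 0:
        return "saddle point"
    return "inconclusive"   # D = 0 is not covered by Theorem 4.24


stationary_points = [(0, 0), (1, -1), (-1, 1)]
for point in stationary_points:
    # Each point should have zero gradient before the test is applied.
    print(point, gradient(*point), classify(*point))
```

The `"inconclusive"` branch corresponds to the cases the second derivative test does not decide, such as the functions of Example 4.32.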
Example 4.36
Consider the function f : R3 → R defined as
\[
f(x,y,z)=x^3-xy^2+5x^2-4xy-2xz+y^2+6yz+37z^2.
\]
Show that (0, 0, 0) is a local minimizer of f.

Solution
Since f is a polynomial function, it is infinitely differentiable. Since
\[
\nabla f(x,y,z)=(3x^2-y^2+10x-4y-2z,\ -2xy-4x+2y+6z,\ -2x+6y+74z),
\]
we find that ∇f(0, 0, 0) = (0, 0, 0). Hence, (0, 0, 0) is a stationary point. Now,
\[
H_f(x,y,z)=\begin{bmatrix}6x+10&-2y-4&-2\\-2y-4&-2x+2&6\\-2&6&74\end{bmatrix}.
\]
Therefore,
\[
H_f(0,0,0)=\begin{bmatrix}10&-4&-2\\-4&2&6\\-2&6&74\end{bmatrix}.
\]
The determinants of the three principal submatrices of Hf(0, 0, 0) are
\[
\det M_1=10,\qquad \det M_2=\begin{vmatrix}10&-4\\-4&2\end{vmatrix}=4,\qquad \det M_3=\begin{vmatrix}10&-4&-2\\-4&2&6\\-2&6&74\end{vmatrix}=24.
\]
This shows that Hf(0, 0, 0) is positive definite. Hence, (0, 0, 0) is a local minimizer of f.
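For readers who wish to verify the value of det M3 in Example 4.36, one possible route (a cofactor expansion along the first row, chosen here for illustration) is

```latex
\[
\begin{aligned}
\det M_3
&= 10\begin{vmatrix}2&6\\6&74\end{vmatrix}
 -(-4)\begin{vmatrix}-4&6\\-2&74\end{vmatrix}
 +(-2)\begin{vmatrix}-4&2\\-2&6\end{vmatrix}\\
&= 10(148-36)+4(-296+12)-2(-24+4)\\
&= 1120-1136+40 = 24.
\end{aligned}
\]
```

Together with det M1 = 10 > 0 and det M2 = 20 − 16 = 4 > 0, this confirms the hypotheses of Theorem 4.22.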
Exercises 4.5

Question 1
Let f : R2 → R be the function defined as
\[
f(x,y)=x^2+4y^2+5xy-8x-11y+7.
\]
Find the stationary points of f and classify them.

Question 2
Let f : R2 → R be the function defined as
\[
f(x,y)=x^2+4y^2+3xy-5x-18y+1.
\]
Find the stationary points of f and classify them.

Question 3
Let f : R2 → R be the function defined as
\[
f(x,y)=x^3+y^3+12xy.
\]
Find the stationary points of f and classify them.

Question 4
Consider the function f : R3 → R defined as
\[
f(x,y,z)=z^3-2z^2-x^2-y^2-xy+x-y.
\]
Show that (1, −1, 0) is a stationary point of f and determine the nature of this stationary point.

Question 5
Consider the function f : R3 → R defined as
\[
f(x,y,z)=z^3+2z^2-x^2-y^2-xy+x-y.
\]
Show that (1, −1, 0) is a stationary point of f and determine the nature of this stationary point.
Chapter 5. The Inverse and Implicit Function Theorems
Chapter 5 The Inverse and Implicit Function Theorems In this chapter, we discuss the inverse function theorem and implicit function theorem, which are two important theorems in multivariable analysis. Given a function that maps a subset of Rn to Rn , the inverse function theorem gives sufficient conditions for the existence of a local inverse and its differentiability. Given a system of m equations with n+m variables, the implicit function theorem gives sufficient conditions to solve m of the variables in terms of the other n variables locally such that the solutions are differentiable functions. We want to emphasize that these theorems are local, in the sense that each of them asserts the existence of a function defined in a neighbourhood of a point. In some sense, the two theorems are equivalent, which means one can deduce one from the other. In this book, we will prove the inverse function theorem first, and use it to deduce the implicit function theorem.
5.1 The Inverse Function Theorem
Let D be a subset of Rn . If the function F : D → Rn is one-to-one, we can define the inverse function F−1 : F(D) → Rn . The question we want to study here is the following. If D is an open set and F is differentiable at the point x0 in D, is the inverse function F−1 differentiable at y0 = F(x0 )? For this, we also want the point y0 to be an interior point of F(D). More precisely, is there a neighbourhood U of x0 that is mapped bijectively by F to a neighbourhood V of y0 ? If the answer is yes, and F−1 is differentiable at y0 , then the chain rule would imply that DF−1 (y0 )DF(x0 ) = In . Hence, a necessary condition for F−1 to be differentiable at y0 is that the derivative matrix DF(x0 ) has to be invertible.
Let us study the map f : R → R given by $f(x)=x^2$. The range of the function is [0, ∞). Notice that if x0 > 0, then I = (0, ∞) is a neighbourhood of x0 that is mapped bijectively by f to the neighbourhood J = (0, ∞) of f(x0). If x0 < 0, then I = (−∞, 0) is a neighbourhood of x0 that is mapped bijectively by f to the neighbourhood J = (0, ∞) of f(x0). However, if x0 = 0, the point f(x0) = 0 is not an interior point of f(R) = [0, ∞). Notice that f′(x) = 2x. Therefore, x = 0 is the point at which f′(x) = 0.

If x0 > 0, take I = (0, ∞) and J = (0, ∞). Then f : I → J has an inverse given by f−1 : J → I, $f^{-1}(x)=\sqrt{x}$. It is a differentiable function with
\[
(f^{-1})'(x)=\frac{1}{2\sqrt{x}}.
\]
In particular, at $y_0=f(x_0)=x_0^2$,
\[
(f^{-1})'(y_0)=\frac{1}{2\sqrt{y_0}}=\frac{1}{2x_0}=\frac{1}{f'(x_0)}.
\]
Similarly, if x0 < 0, take I = (−∞, 0) and J = (0, ∞). Then f : I → J has an inverse given by f−1 : J → I, $f^{-1}(x)=-\sqrt{x}$. It is a differentiable function with
\[
(f^{-1})'(x)=-\frac{1}{2\sqrt{x}}.
\]
In particular, at $y_0=f(x_0)=x_0^2$,
\[
(f^{-1})'(y_0)=-\frac{1}{2\sqrt{y_0}}=\frac{1}{2x_0}=\frac{1}{f'(x_0)}.
\]
For a single variable function, the inverse function theorem takes the following form.

Theorem 5.1 (Single Variable) Inverse Function Theorem
Let O be an open subset of R that contains the point x0, and let f : O → R be a continuously differentiable function defined on O. Suppose that f′(x0) ≠ 0. Then there exists an open interval I containing x0 such that f maps I bijectively onto the open interval J = f(I). The inverse function f−1 : J → I is continuously differentiable. For any y ∈ J, if x is the point in I such that f(x) = y, then
\[
(f^{-1})'(y)=\frac{1}{f'(x)}.
\]
Figure 5.1: The function f : R → R, $f(x)=x^2$.

Proof
Without loss of generality, assume that f′(x0) > 0. Since O is an open set and f′ is continuous at x0, there is an r1 > 0 such that (x0 − r1, x0 + r1) ⊂ O and for all x ∈ (x0 − r1, x0 + r1),
\[
|f'(x)-f'(x_0)|<\frac{f'(x_0)}{2}.
\]
This implies that
\[
f'(x)>\frac{f'(x_0)}{2}>0\qquad\text{for all } x\in(x_0-r_1,x_0+r_1).
\]
Therefore, f is strictly increasing on (x0 − r1, x0 + r1). Take any r > 0 that is less than r1. Then [x0 − r, x0 + r] ⊂ (x0 − r1, x0 + r1). By the intermediate value theorem, the function f maps [x0 − r, x0 + r] bijectively onto [f(x0 − r), f(x0 + r)]. Let I = (x0 − r, x0 + r) and J = (f(x0 − r), f(x0 + r)). Then f : I → J is a bijection and f−1 : J → I exists. In volume I, we have proved that f−1 is differentiable, and
\[
(f^{-1})'(y)=\frac{1}{f'(f^{-1}(y))}\qquad\text{for all } y\in J.
\]
This formula shows that (f−1)′ : J → R is continuous.
Remark 5.1
In the inverse function theorem, we determine the invertibility of the function in a neighbourhood of a point x0. The theorem says that if f is continuously differentiable and f′(x0) ≠ 0, then f is locally invertible at x0. Here the assumption that f′ is continuous is essential. In volume I, we have seen that for a continuous function f : I → R defined on an open interval I to be one-to-one, it is necessary that it is strictly monotonic. The function f : R → R,
\[
f(x)=\begin{cases}x+x^2\sin\dfrac{1}{x}, & \text{if } x\neq 0,\\[1ex] 0, & \text{if } x=0,\end{cases}
\]
is an example of a differentiable function where f′(0) = 1 ≠ 0, but f fails to be strictly monotonic in any neighbourhood of the point x = 0. This annoying behavior can be removed if we assume that f′ is continuous. If f′(x0) ≠ 0 and f′ is continuous, there is a neighbourhood I of x0 such that f′(x) has the same sign as f′(x0) for all x ∈ I. This implies that f is strictly monotonic on I.

Example 5.1
Let f : R → R be the function defined as
\[
f(x)=2x+4\cos x.
\]
Show that there is an open interval I containing 0 such that f : I → R is one-to-one, and f−1 : f(I) → R is continuously differentiable. Determine (f−1)′(f(0)).
Solution
The function f is infinitely differentiable and f′(x) = 2 − 4 sin x. Since f′(0) = 2 ≠ 0, the inverse function theorem says that there is an open interval I containing 0 such that f : I → R is one-to-one, and f−1 : f(I) → R is continuously differentiable. Moreover,
\[
(f^{-1})'(f(0))=\frac{1}{f'(0)}=\frac{1}{2}.
\]
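The value obtained in Example 5.1 can be checked numerically. The sketch below is an illustration only: it inverts f on the interval [−0.5, 0.5] by bisection and estimates the derivative of the local inverse at f(0) = 4 with a central difference. The interval, the step size and the name `local_inverse` are assumptions made here; the interval works because f′(x) = 2 − 4 sin x > 0 whenever sin x < 1/2, which holds on [−0.5, 0.5].

```python
import math


def f(x):
    """The function of Example 5.1."""
    return 2 * x + 4 * math.cos(x)


def local_inverse(y, lo=-0.5, hi=0.5, iterations=80):
    """Solve f(x) = y for x in [lo, hi] by bisection; f is strictly increasing there."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


y0 = f(0.0)          # equals 4
step = 1e-4
# Central difference approximation of the derivative of the local inverse at y0.
slope = (local_inverse(y0 + step) - local_inverse(y0 - step)) / (2 * step)
print(slope, 1 / (2 - 4 * math.sin(0.0)))
```

The two printed numbers should agree to several decimal places, consistent with (f−1)′(f(0)) = 1/f′(0).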
Now let us consider functions defined on open subsets of Rn, where n ≥ 2. We first consider a linear transformation T : Rn → Rn. There is an n × n matrix A such that T(x) = Ax. The mapping T : Rn → Rn is one-to-one if and only if A is invertible, if and only if det A ≠ 0. In this case, T is a bijection and T−1 : Rn → Rn is the linear transformation given by $\mathbf{T}^{-1}(\mathbf{x})=A^{-1}\mathbf{x}$. Notice that for any x and y in Rn,
\[
D\mathbf{T}(\mathbf{x})=A,\qquad D\mathbf{T}^{-1}(\mathbf{y})=A^{-1}.
\]
The content of the inverse function theorem is to extend this to nonlinear mappings.

Theorem 5.2 Inverse Function Theorem
Let O be an open subset of Rn that contains the point x0, and let F : O → Rn be a continuously differentiable function defined on O. If det DF(x0) ≠ 0, then the following hold.
(i) There exists a neighbourhood U of x0 such that F maps U bijectively onto the open set V = F(U).
(ii) The inverse function F−1 : V → U is continuously differentiable.
(iii) For any y ∈ V, if x is the point in U such that F(x) = y, then
\[
D\mathbf{F}^{-1}(\mathbf{y})=D\mathbf{F}(\mathbf{F}^{-1}(\mathbf{y}))^{-1}=D\mathbf{F}(\mathbf{x})^{-1}.
\]
Figure 5.2: The inverse function theorem.

For a linear transformation, which is a degree one polynomial mapping, the inverse function theorem holds globally. For a general continuously differentiable mapping, the inverse function theorem says that the first order approximation of the function at a point can determine the local invertibility of the function at that point. When n ≥ 2, the proof of the inverse function theorem is substantially more complicated than the n = 1 case, as we do not have the monotonicity argument used in the n = 1 case. The proof will be presented in Section 5.2. We will discuss the examples and applications in this section.

Example 5.2
Let F : R2 → R2 be the mapping defined by
\[
\mathbf{F}(x,y)=(3x-2y+7,\ 4x+5y-2).
\]
Show that F is a bijection, and find F−1(x, y) and DF−1(x, y).

Solution
The mapping F : R2 → R2 can be written as F(x) = T(x) + b, where T : R2 → R2 is the linear transformation
\[
\mathbf{T}(x,y)=(3x-2y,\ 4x+5y),
\]
Chapter 5. The Inverse and Implicit Function Theorems
291
"
# 3 −2 and b = (7, −2). For u = (x, y), T(u) = Au, where A = . 4 5 Since det A = 23 ̸= 0, the linear transformation T : R2 → R2 is oneto-one. Hence, F : R2 → R2 is also one-to-one. Given v ∈ R2 , let u = A−1 (v − b). Then F(u) = v. Hence, F is also onto. The inverse F−1 : R2 → R2 is given by F−1 (v) = A−1 (v − b). Since A−1
" # 1 5 2 = , 23 −4 3
we find that
5(x − 7) + 2(y + 2) −4(x − 7) + 3(y + 2) , F (x, y) = 23 23 5x + 2y − 31 −4x + 3y + 34 , , = 23 23 −1
and
" # 5 2 1 DF−1 (x, y) = . 23 −4 3
Example 5.3 Determine the values of a such that the mapping F : R3 → R3 defined by F(x, y, z) = (2x + y + az, x − y + 3z, 3x + 2y + z + 7) is invertible. Solution The mapping F : R3 → R3 can be written as F(x) = T(x) + b, where T : R3 → R3 is the linear transformation T(x, y, z) = (2x + y + az, x − y + 3z, 3x + 2y + z),
and $\mathbf{b} = (0, 0, 7)$. Thus, $\mathbf{F}$ is a degree one polynomial mapping with
\[ D\mathbf{F}(\mathbf{x}) = \begin{bmatrix} 2 & 1 & a \\ 1 & -1 & 3 \\ 3 & 2 & 1 \end{bmatrix}. \]
The mapping $\mathbf{F}$ is invertible if and only if it is one-to-one, if and only if $\mathbf{T}$ is one-to-one, if and only if $\det D\mathbf{F}(\mathbf{x}) \neq 0$. Since $\det D\mathbf{F}(\mathbf{x}) = 5a - 6$, the mapping $\mathbf{F}$ is invertible if and only if $a \neq 6/5$.

Example 5.4
Let $\Phi : \mathbb{R}^2 \to \mathbb{R}^2$ be the mapping defined as
\[ \Phi(r, \theta) = (r\cos\theta, \; r\sin\theta). \]
Determine the points $(r, \theta) \in \mathbb{R}^2$ where the inverse function theorem can be applied to this mapping. Explain the significance of this result.

Solution
Since $\sin\theta$ and $\cos\theta$ are infinitely differentiable functions, the mapping $\Phi$ is infinitely differentiable with
\[ D\Phi(r, \theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}. \]
Since
\[ \det D\Phi(r, \theta) = r\cos^2\theta + r\sin^2\theta = r, \]
the inverse function theorem is not applicable at the point $(r, \theta)$ if $r = 0$. The mapping $\Phi$ is a change from polar coordinates to rectangular coordinates. The result above shows that the change of coordinates is locally one-to-one away from the origin of the $xy$-plane.
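The conclusion of Example 5.4 can be probed numerically. The following is a minimal sketch, not part of the original text: the function names `phi` and `jacobian_det` and the sample points are assumptions chosen for illustration. It evaluates the determinant of the derivative matrix written in the solution, and it also shows two facts that the local statement does not contradict: every point with $r = 0$ is sent to the origin, and $\Phi(r, \theta) = \Phi(r, \theta + 2\pi)$, so $\Phi$ is not globally one-to-one even where $r \neq 0$.

```python
import math

def phi(r, theta):
    """Polar-to-rectangular mapping of Example 5.4."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_det(r, theta):
    """Determinant of D Phi(r, theta), using the matrix from Example 5.4."""
    a, b = math.cos(theta), -r * math.sin(theta)
    c, d = math.sin(theta), r * math.cos(theta)
    return a * d - b * c

if __name__ == "__main__":
    # The determinant should agree with r at every sample point.
    for r, theta in [(2.0, 0.7), (0.5, -1.2), (0.0, 1.0)]:
        print(r, theta, jacobian_det(r, theta))
    # All points with r = 0 collapse to the origin.
    print(phi(0.0, 0.3), phi(0.0, 2.0))
    # Local invertibility does not give global injectivity.
    print(phi(1.5, 0.3), phi(1.5, 0.3 + 2 * math.pi))
```

The floating-point comparison in the last line is only approximate, since $0.3 + 2\pi$ is rounded.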
Example 5.5
Consider the mapping $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ given by
\[ \mathbf{F}(x, y) = (x^2 - y^2, \; xy). \]
Show that there is a neighbourhood $U$ of the point $\mathbf{u}_0 = (1, 1)$ such that $\mathbf{F} : U \to \mathbb{R}^2$ is one-to-one, $V = \mathbf{F}(U)$ is an open set, and $\mathbf{G} = \mathbf{F}^{-1} : V \to U$ is continuously differentiable. Then find $\dfrac{\partial G_1}{\partial y}(0, 1)$.

Solution
The mapping $\mathbf{F}$ is a polynomial mapping. Thus, it is continuously differentiable. Notice that $\mathbf{F}(\mathbf{u}_0) = (0, 1)$ and
\[ D\mathbf{F}(x, y) = \begin{bmatrix} 2x & -2y \\ y & x \end{bmatrix}, \qquad D\mathbf{F}(\mathbf{u}_0) = \begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}. \]
Since $\det D\mathbf{F}(\mathbf{u}_0) = 4 \neq 0$, the inverse function theorem implies that there is a neighbourhood $U$ of the point $\mathbf{u}_0$ such that $\mathbf{F} : U \to \mathbb{R}^2$ is one-to-one, $V = \mathbf{F}(U)$ is an open set, and $\mathbf{G} = \mathbf{F}^{-1} : V \to U$ is continuously differentiable. Moreover,
\[ D\mathbf{G}(0, 1) = D\mathbf{F}(1, 1)^{-1} = \frac{1}{4}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix}. \]
From here, we find that
\[ \frac{\partial G_1}{\partial y}(0, 1) = \frac{2}{4} = \frac{1}{2}. \]

Example 5.6
Consider the system of equations
\[ \sin(x + y) + x^2 y + 3xy^2 = 2, \qquad 2xy + 5x^2 - 2y^2 = 1. \]
Observe that $(x, y) = (1, -1)$ is a solution of this system. Show that there is a neighbourhood $U$ of $\mathbf{u}_0 = (1, -1)$ and an $r > 0$ such that for all $(a, b)$ satisfying $(a - 2)^2 + (b - 1)^2 < r^2$, the system
\[ \sin(x + y) + x^2 y + 3xy^2 = a, \qquad 2xy + 5x^2 - 2y^2 = b \]
has a unique solution $(x, y)$ that lies in $U$.

Solution
Let $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ be the function defined by
\[ \mathbf{F}(x, y) = \left( \sin(x + y) + x^2 y + 3xy^2, \; 2xy + 5x^2 - 2y^2 \right). \]
Since the sine function is infinitely differentiable, $\sin(x + y)$ is infinitely differentiable. The functions $g(x, y) = x^2 y + 3xy^2$ and $F_2(x, y) = 2xy + 5x^2 - 2y^2$ are polynomial functions. Hence, they are also infinitely differentiable. This shows that $\mathbf{F}$ is infinitely differentiable. Since
\[ D\mathbf{F}(x, y) = \begin{bmatrix} \cos(x + y) + 2xy + 3y^2 & \cos(x + y) + x^2 + 6xy \\ 2y + 10x & 2x - 4y \end{bmatrix}, \]
we find that
\[ D\mathbf{F}(1, -1) = \begin{bmatrix} 2 & -4 \\ 8 & 6 \end{bmatrix}. \]
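It follows that $\det D\mathbf{F}(1, -1) = 44 \neq 0$. By the inverse function theorem, there exists a neighbourhood $U_1$ of $\mathbf{u}_0$ such that $\mathbf{F} : U_1 \to \mathbb{R}^2$ is one-to-one and $V = \mathbf{F}(U_1)$ is an open set. Since $\mathbf{F}(\mathbf{u}_0) = (2, 1)$, the point $\mathbf{v}_0 = (2, 1)$ is a point in the open set $V$. Hence, there exists $r > 0$ such that $B(\mathbf{v}_0, r) \subset V$. Since $B(\mathbf{v}_0, r)$ is open and $\mathbf{F}$ is continuous, the set $U = \{\mathbf{u} \in U_1 \mid \mathbf{F}(\mathbf{u}) \in B(\mathbf{v}_0, r)\}$, which is the preimage of $B(\mathbf{v}_0, r)$ under the restriction of $\mathbf{F}$ to $U_1$, is an open subset of $\mathbb{R}^2$. The map $\mathbf{F} : U \to B(\mathbf{v}_0, r)$ is a bijection. For all $(a, b)$ satisfying $(a - 2)^2 + (b - 1)^2 < r^2$, $(a, b)$ is in $B(\mathbf{v}_0, r)$. Hence, there is a unique $(x, y)$ in $U$ such that $\mathbf{F}(x, y) = (a, b)$. This means that the system
\[ \sin(x + y) + x^2 y + 3xy^2 = a, \qquad 2xy + 5x^2 - 2y^2 = b \]
has a unique solution $(x, y)$ that lies in $U$.

At the end of this section, let us prove the following theorem.

Theorem 5.3
Let $A$ be an $n \times n$ matrix, and let $\mathbf{x}_0$ and $\mathbf{y}_0$ be two points in $\mathbb{R}^n$. Define the mapping $\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^n$ by
\[ \mathbf{F}(\mathbf{x}) = \mathbf{y}_0 + A(\mathbf{x} - \mathbf{x}_0). \]
Then $\mathbf{F}$ is infinitely differentiable with $D\mathbf{F}(\mathbf{x}) = A$. It is one-to-one and onto if and only if $\det A \neq 0$. In this case,
\[ \mathbf{F}^{-1}(\mathbf{y}) = \mathbf{x}_0 + A^{-1}(\mathbf{y} - \mathbf{y}_0), \qquad D\mathbf{F}^{-1}(\mathbf{y}) = A^{-1}. \]
In particular, $\mathbf{F}^{-1}$ is also infinitely differentiable.

Proof
Obviously, $\mathbf{F}$ is a polynomial mapping. Hence, $\mathbf{F}$ is infinitely differentiable. By a straightforward computation, we find that $D\mathbf{F} = A$. Notice that $\mathbf{F} = \mathbf{F}_2 \circ \mathbf{T} \circ \mathbf{F}_1$, where $\mathbf{F}_1 : \mathbb{R}^n \to \mathbb{R}^n$ is the translation $\mathbf{F}_1(\mathbf{x}) = \mathbf{x} - \mathbf{x}_0$, $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$ is the linear transformation $\mathbf{T}(\mathbf{x}) = A\mathbf{x}$, and $\mathbf{F}_2 : \mathbb{R}^n \to \mathbb{R}^n$ is the translation $\mathbf{F}_2(\mathbf{y}) = \mathbf{y} + \mathbf{y}_0$. Since translations are bijective mappings, $\mathbf{F}$ is one-to-one and onto if and only if $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one and onto, if and only if $\det A \neq 0$. If $\mathbf{y} = \mathbf{y}_0 + A(\mathbf{x} - \mathbf{x}_0)$, then $\mathbf{x} = \mathbf{x}_0 + A^{-1}(\mathbf{y} - \mathbf{y}_0)$. This gives the formula for $\mathbf{F}^{-1}(\mathbf{y})$. The formula for $D\mathbf{F}^{-1}(\mathbf{y})$ follows.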
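The formulas of Theorem 5.3 can be checked in exact arithmetic on the data of Example 5.2, where $\mathbf{x}_0 = \mathbf{0}$ and $\mathbf{y}_0 = (7, -2)$. The sketch below is only an illustration for the case $n = 2$: the helper names `det2`, `inverse2`, `mat_vec`, `F` and `F_inv` are hypothetical and do not come from the text.

```python
from fractions import Fraction

def det2(M):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inverse2(M):
    """Inverse of a 2 x 2 matrix; raises if the determinant is zero."""
    d = det2(M)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def mat_vec(M, v):
    """Product of a 2 x 2 matrix with a vector of length 2."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

# Data of Example 5.2 written in the form F(x) = y0 + A (x - x0).
A = [[Fraction(3), Fraction(-2)], [Fraction(4), Fraction(5)]]
x0 = [Fraction(0), Fraction(0)]
y0 = [Fraction(7), Fraction(-2)]
A_inv = inverse2(A)

def F(x):
    Ax = mat_vec(A, [x[0] - x0[0], x[1] - x0[1]])
    return [y0[0] + Ax[0], y0[1] + Ax[1]]

def F_inv(y):
    # Formula of Theorem 5.3: x0 + A^{-1} (y - y0).
    w = mat_vec(A_inv, [y[0] - y0[0], y[1] - y0[1]])
    return [x0[0] + w[0], x0[1] + w[1]]

if __name__ == "__main__":
    print(det2(A), A_inv)
    print(F_inv(F([Fraction(2), Fraction(-3)])))
```

Using `Fraction` avoids rounding, so the round trip `F_inv(F(x))` returns `x` exactly whenever $\det A \neq 0$.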
Exercises 5.1

Question 1
Let $f : \mathbb{R} \to \mathbb{R}$ be the function defined as
\[ f(x) = e^{2x} + 4x\sin x + 2\cos x. \]
Show that there is an open interval $I$ containing 0 such that $f : I \to \mathbb{R}$ is one-to-one, and $f^{-1} : f(I) \to \mathbb{R}$ is continuously differentiable. Determine $(f^{-1})'(f(0))$.

Question 2
Let $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ be the mapping defined by
\[ \mathbf{F}(x, y) = (3x + 2y - 5, \; 7x + 4y - 3). \]
Show that $\mathbf{F}$ is a bijection, and find $\mathbf{F}^{-1}(x, y)$ and $D\mathbf{F}^{-1}(x, y)$.

Question 3
Consider the mapping $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ given by
\[ \mathbf{F}(x, y) = (x^2 + y^2, \; xy). \]
Show that there is a neighbourhood $U$ of the point $\mathbf{u}_0 = (2, 1)$ such that $\mathbf{F} : U \to \mathbb{R}^2$ is one-to-one, $V = \mathbf{F}(U)$ is an open set, and $\mathbf{G} = \mathbf{F}^{-1} : V \to U$ is continuously differentiable. Then find $\dfrac{\partial G_2}{\partial x}(5, 2)$.

Question 4
Let $\Phi : \mathbb{R}^3 \to \mathbb{R}^3$ be the mapping defined as
\[ \Phi(\rho, \phi, \theta) = (\rho\sin\phi\cos\theta, \; \rho\sin\phi\sin\theta, \; \rho\cos\phi). \]
Determine the points $(\rho, \phi, \theta) \in \mathbb{R}^3$ where the inverse function theorem can be applied to this mapping. Explain the significance of this result.

Question 5
Consider the system of equations
\[ 4x + y - 5xy = 2, \qquad x^2 + y^2 - 3xy^2 = 5. \]
Observe that $(x, y) = (-1, 1)$ is a solution of this system. Show that there is a neighbourhood $U$ of $\mathbf{u}_0 = (-1, 1)$ and an $r > 0$ such that for all $(a, b)$ satisfying $(a - 2)^2 + (b - 5)^2 < r^2$, the system
\[ 4x + y - 5xy = a, \qquad x^2 + y^2 - 3xy^2 = b \]
has a unique solution $(x, y)$ that lies in $U$.
5.2 The Proof of the Inverse Function Theorem
In this section, we prove the inverse function theorem stated in Theorem 5.2. The hardest part of the proof is the first statement, which asserts that there is a neighbourhood $U$ of $\mathbf{x}_0$ such that, restricted to $U$, $\mathbf{F}$ is one-to-one, and the image of $U$ under $\mathbf{F}$ is open in $\mathbb{R}^n$. In the statement of the inverse function theorem, we assume that the derivative matrix of the continuously differentiable mapping $\mathbf{F} : O \to \mathbb{R}^n$ is invertible at the point $\mathbf{x}_0$. The continuity of the partial derivatives of $\mathbf{F}$ then implies that there is a neighbourhood $N$ of $\mathbf{x}_0$ such that the derivative matrix of $\mathbf{F}$ at any $\mathbf{x}$ in $N$ is also invertible. Theorem 3.38 asserts that a linear transformation $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$ is invertible if and only if there is a positive constant $c$ such that
\[ \|\mathbf{T}(\mathbf{u}) - \mathbf{T}(\mathbf{v})\| \geq c\|\mathbf{u} - \mathbf{v}\| \quad \text{for all } \mathbf{u}, \mathbf{v} \in \mathbb{R}^n. \]
Definition 5.1 Stable Mappings A mapping F : D → Rn is stable if there is a positive constant c such that ∥F(u) − F(v)∥ ≥ c∥u − v∥
for all u, v ∈ D.
In other words, a linear transformation T : Rn → Rn is invertible if and only if it is stable. Remark 5.2 Stable Mappings vs Lipschitz Mappings Let D be a subset of Rn . Observe that if F : D → Rn is a stable mapping, there is a constant c > 0 such that ∥F(u1 ) − F(u2 )∥ ≥ c∥u1 − u2 ∥
for all u1 , u2 ∈ D.
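This implies that $\mathbf{F}$ is one-to-one, and thus the inverse $\mathbf{F}^{-1} : \mathbf{F}(D) \to \mathbb{R}^n$ exists. Notice that for any $\mathbf{v}_1$ and $\mathbf{v}_2$ in $\mathbf{F}(D)$,
\[ \|\mathbf{F}^{-1}(\mathbf{v}_1) - \mathbf{F}^{-1}(\mathbf{v}_2)\| \leq \frac{1}{c}\|\mathbf{v}_1 - \mathbf{v}_2\|. \]
This means that $\mathbf{F}^{-1} : \mathbf{F}(D) \to \mathbb{R}^n$ is a Lipschitz mapping.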
For a mapping F : D → Rn that satisfies the assumptions in the statement of the inverse function theorem, it is stable in a neighbourhood of x0 . Theorem 5.4 Let O be an open subset of Rn that contains the point x0 , and let F : O → Rn be a continuously differentiable function defined on O. If det DF(x0 ) ̸= 0, then there exists a neighbourhood U of x0 such that DF(x) is invertible for all x ∈ U , F maps U bijectively onto the open set V = F(U ), and the map F : U → V is stable. Recall that when A is a subset of Rn , u is a point in Rn , A + u = {a + u | a ∈ A} is the translate of the set A by the vector u. The set A is open if and only if A + u is open, A is closed if and only if A + u is closed. Lemma 5.5 It is sufficient to prove Theorem 5.4 when x0 = 0, F(x0 ) = 0 and DF(x0 ) = In . Proof of Lemma 5.5 Assume that Theorem 5.4 holds when x0 = 0, F(x0 ) = 0 and DF(x0 ) = In . Now given that F : O → Rn is a continuously differentiable mapping with det DF(x0 ) ̸= 0, let y0 = F(x0 ) and A = DF(x0 ). Then A is invertible. Define the open set D as D = O − x0 . It is a neighbourhood of the point 0. Let G : D → Rn be the mapping G(x) = A−1 (F(x + x0 ) − y0 ) . Then G(0) = 0. Using the same reasoning as the proof of Theorem 5.3, we find that G is continuously differentiable and DG(x) = A−1 DF(x + x0 ).
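This gives
\[ D\mathbf{G}(\mathbf{0}) = A^{-1} D\mathbf{F}(\mathbf{x}_0) = I_n. \]
By assumption, Theorem 5.4 holds for the mapping $\mathbf{G}$. Namely, there exist neighbourhoods $\mathcal{U}$ and $\mathcal{V}$ of $\mathbf{0}$ such that $\mathbf{G} : \mathcal{U} \to \mathcal{V}$ is a bijection and $D\mathbf{G}(\mathbf{x})$ is invertible for all $\mathbf{x} \in \mathcal{U}$. Moreover, there is a positive constant $a$ such that
\[ \|\mathbf{G}(\mathbf{u}_1) - \mathbf{G}(\mathbf{u}_2)\| \geq a\|\mathbf{u}_1 - \mathbf{u}_2\| \quad \text{for all } \mathbf{u}_1, \mathbf{u}_2 \in \mathcal{U}. \]
Let $U$ be the neighbourhood of $\mathbf{x}_0$ given by $U = \mathcal{U} + \mathbf{x}_0$. By Theorem 5.3, the mapping $\mathbf{H} : \mathbb{R}^n \to \mathbb{R}^n$, $\mathbf{H}(\mathbf{y}) = A^{-1}(\mathbf{y} - \mathbf{y}_0)$, is a continuous bijection. Therefore, $V = \mathbf{H}^{-1}(\mathcal{V})$ is an open subset of $\mathbb{R}^n$ that contains $\mathbf{y}_0$. By definition, $\mathbf{F}$ maps $U$ bijectively to $V$. Since
\[ \mathbf{F}(\mathbf{x}) = \mathbf{y}_0 + A\,\mathbf{G}(\mathbf{x} - \mathbf{x}_0), \]
we find that
\[ D\mathbf{F}(\mathbf{x}) = A\, D\mathbf{G}(\mathbf{x} - \mathbf{x}_0). \]
Since $A$ and $D\mathbf{G}(\mathbf{x} - \mathbf{x}_0)$ are invertible, $D\mathbf{F}(\mathbf{x})$ is invertible for all $\mathbf{x} \in U$. Theorem 3.38 says that there is a positive constant $\alpha$ such that
\[ \|A\mathbf{x}\| \geq \alpha\|\mathbf{x}\| \quad \text{for all } \mathbf{x} \in \mathbb{R}^n. \]
Therefore, for any $\mathbf{u}_1$ and $\mathbf{u}_2$ in $U$,
\[ \|\mathbf{F}(\mathbf{u}_1) - \mathbf{F}(\mathbf{u}_2)\| = \|A\left(\mathbf{G}(\mathbf{u}_1 - \mathbf{x}_0) - \mathbf{G}(\mathbf{u}_2 - \mathbf{x}_0)\right)\| \geq \alpha\|\mathbf{G}(\mathbf{u}_1 - \mathbf{x}_0) - \mathbf{G}(\mathbf{u}_2 - \mathbf{x}_0)\| \geq a\alpha\|\mathbf{u}_1 - \mathbf{u}_2\|. \]
This shows that $\mathbf{F} : U \to V$ is stable, and thus completes the proof of the lemma.

Now we prove Theorem 5.4.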
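Proof of Theorem 5.4
By Lemma 5.5, we only need to consider the case where $\mathbf{x}_0 = \mathbf{0}$, $\mathbf{F}(\mathbf{x}_0) = \mathbf{0}$ and $D\mathbf{F}(\mathbf{x}_0) = I_n$. Since $\mathbf{F} : O \to \mathbb{R}^n$ is continuously differentiable, the map $D\mathbf{F} : O \to M_n$ is continuous. Since $\det : M_n \to \mathbb{R}$ is also continuous, and $\det D\mathbf{F}(\mathbf{0}) = 1$, there is an $r_0 > 0$ such that $B(\mathbf{0}, r_0) \subset O$ and for all $\mathbf{x} \in B(\mathbf{0}, r_0)$, $\det D\mathbf{F}(\mathbf{x}) > \frac{1}{2}$. In particular, $D\mathbf{F}(\mathbf{x})$ is invertible for all $\mathbf{x} \in B(\mathbf{0}, r_0)$.

Let $\mathbf{G} : O \to \mathbb{R}^n$ be the mapping defined as $\mathbf{G}(\mathbf{x}) = \mathbf{F}(\mathbf{x}) - \mathbf{x}$, so that $\mathbf{F}(\mathbf{x}) = \mathbf{x} + \mathbf{G}(\mathbf{x})$. The mapping $\mathbf{G}$ is continuously differentiable. It satisfies $\mathbf{G}(\mathbf{0}) = \mathbf{0}$ and $D\mathbf{G}(\mathbf{0}) = D\mathbf{F}(\mathbf{0}) - I_n = 0$. Since $\mathbf{G}$ is continuously differentiable, for any $1 \leq i \leq n$, $1 \leq j \leq n$, there exists $r_{i,j} > 0$ such that $B(\mathbf{0}, r_{i,j}) \subset O$ and for all $\mathbf{x} \in B(\mathbf{0}, r_{i,j})$,
\[ \left| \frac{\partial G_i}{\partial x_j}(\mathbf{x}) \right| = \left| \frac{\partial G_i}{\partial x_j}(\mathbf{x}) - \frac{\partial G_i}{\partial x_j}(\mathbf{0}) \right| < \frac{1}{2n}. \]
Let
\[ r = \min\left( \{ r_{i,j} \mid 1 \leq i \leq n, \; 1 \leq j \leq n \} \cup \{ r_0 \} \right). \]
Then $r > 0$, $B(\mathbf{0}, r) \subset B(\mathbf{0}, r_0)$ and $B(\mathbf{0}, r) \subset B(\mathbf{0}, r_{i,j})$ for all $1 \leq i \leq n$, $1 \leq j \leq n$. The ball $B(\mathbf{0}, r)$ is a convex set. If $\mathbf{u}$ and $\mathbf{v}$ are two points in $B(\mathbf{0}, r)$, the mean value theorem implies that for $1 \leq i \leq n$, there exists $\mathbf{z}_i \in B(\mathbf{0}, r)$ such that
\[ G_i(\mathbf{u}) - G_i(\mathbf{v}) = \sum_{j=1}^{n} (u_j - v_j) \frac{\partial G_i}{\partial x_j}(\mathbf{z}_i). \]
It follows that
\[ |G_i(\mathbf{u}) - G_i(\mathbf{v})| \leq \sum_{j=1}^{n} |u_j - v_j| \left| \frac{\partial G_i}{\partial x_j}(\mathbf{z}_i) \right| \leq \frac{1}{2n} \sum_{j=1}^{n} |u_j - v_j| \leq \frac{1}{2\sqrt{n}} \|\mathbf{u} - \mathbf{v}\|. \]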
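Therefore,
\[ \|\mathbf{G}(\mathbf{u}) - \mathbf{G}(\mathbf{v})\| = \sqrt{\sum_{i=1}^{n} \left( G_i(\mathbf{u}) - G_i(\mathbf{v}) \right)^2} \leq \frac{1}{2}\|\mathbf{u} - \mathbf{v}\|. \]
This shows that $\mathbf{G} : B(\mathbf{0}, r) \to \mathbb{R}^n$ is a map satisfying $\mathbf{G}(\mathbf{0}) = \mathbf{0}$, and
\[ \|\mathbf{G}(\mathbf{u}) - \mathbf{G}(\mathbf{v})\| \leq \frac{1}{2}\|\mathbf{u} - \mathbf{v}\| \quad \text{for all } \mathbf{u}, \mathbf{v} \in B(\mathbf{0}, r). \]
By Theorem 2.44, the map $\mathbf{F} : B(\mathbf{0}, r) \to \mathbb{R}^n$ is one-to-one, and its image contains the open ball $B(\mathbf{0}, r/2)$. Let $V = B(\mathbf{0}, r/2)$. Then $V$ is an open subset of $\mathbb{R}^n$ that is contained in the image of $\mathbf{F}$. Since $\mathbf{F} : B(\mathbf{0}, r) \to \mathbb{R}^n$ is continuous, $U = \left( \mathbf{F}|_{B(\mathbf{0}, r)} \right)^{-1}(V)$ is an open set. By definition, $\mathbf{F} : U \to V$ is a bijection. Since $U$ is contained in $B(\mathbf{0}, r_0)$, $D\mathbf{F}(\mathbf{x})$ is invertible for all $\mathbf{x}$ in $U$. Finally, for any $\mathbf{u}$ and $\mathbf{v}$ in $U$,
\[ \|\mathbf{F}(\mathbf{u}) - \mathbf{F}(\mathbf{v})\| \geq \|\mathbf{u} - \mathbf{v}\| - \|\mathbf{G}(\mathbf{u}) - \mathbf{G}(\mathbf{v})\| \geq \frac{1}{2}\|\mathbf{u} - \mathbf{v}\|. \]
This completes the proof of the theorem.

To complete the proof of the inverse function theorem, it remains to prove that $\mathbf{F}^{-1} : V \to U$ is continuously differentiable, and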
DF−1 (y) = DF(F−1 (y))−1 . Theorem 5.6 Let O be an open subset of Rn that contains the point x0 , and let F : O → Rn be a continuously differentiable function defined on O. If det DF(x0 ) ̸= 0, then there exists a neighbourhood U of x0 such that F maps U bijectively onto the open set V = F(U ), the inverse function F−1 : V → U is continuously differentiable, and for any y ∈ V , if x is the point in U such that F(x) = y, then DF−1 (y) = DF(x)−1 .
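The decomposition $\mathbf{F}(\mathbf{x}) = \mathbf{x} + \mathbf{G}(\mathbf{x})$ used in the proof of Theorem 5.4 also suggests a way to compute preimages numerically: $\mathbf{F}(\mathbf{x}) = \mathbf{y}$ is the same as $\mathbf{x} = \mathbf{y} - \mathbf{G}(\mathbf{x})$, and the right-hand side is a contraction in $\mathbf{x}$ when $\mathbf{G}$ has Lipschitz constant $\frac{1}{2}$. The sketch below is an illustration under stated assumptions and is not taken from the text: the mapping `G` is a hypothetical example with $\mathbf{G}(\mathbf{0}) = \mathbf{0}$ and $D\mathbf{G}(\mathbf{0}) = 0$, whose partial derivatives are smaller than $\frac{1}{2n} = \frac{1}{4}$ in absolute value on $B(\mathbf{0}, 0.25)$, and the fixed number of iterations is an arbitrary choice. Whether this iteration is the argument behind Theorem 2.44 is not claimed here.

```python
import math

def G(x):
    """Hypothetical nonlinear part with G(0) = 0 and DG(0) = 0.

    DG(x) = [[0, x2], [0.5 * x2, 0.5 * x1]], so every entry has absolute
    value below 1/4 on the ball of radius 0.25 centred at the origin.
    """
    x1, x2 = x
    return (0.5 * x2 * x2, 0.5 * x1 * x2)

def F(x):
    """F(x) = x + G(x), a perturbation of the identity mapping."""
    g = G(x)
    return (x[0] + g[0], x[1] + g[1])

def solve(y, iterations=60):
    """Approximate the x near 0 with F(x) = y by iterating x <- y - G(x)."""
    x = (0.0, 0.0)
    for _ in range(iterations):
        g = G(x)
        x = (y[0] - g[0], y[1] - g[1])
    return x

if __name__ == "__main__":
    y = (0.05, -0.03)          # a target inside B(0, r/2) with r = 0.25
    x = solve(y)
    print(x, F(x))
    # Sample check of the stability estimate |F(u) - F(v)| >= |u - v| / 2.
    u, v = (0.1, 0.2), (-0.15, 0.05)
    lhs = math.dist(F(u), F(v))
    print(lhs >= 0.5 * math.dist(u, v))
```

The starting point $\mathbf{0}$ and every iterate stay in $B(\mathbf{0}, r)$ when $\|\mathbf{y}\| < r/2$, because $\|\mathbf{y} - \mathbf{G}(\mathbf{x})\| \leq \|\mathbf{y}\| + \frac{1}{2}\|\mathbf{x}\| < r$.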
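Proof
Theorem 5.4 asserts that there exists a neighbourhood $U$ of $\mathbf{x}_0$ such that $\mathbf{F}$ maps $U$ bijectively onto the open set $V = \mathbf{F}(U)$, $D\mathbf{F}(\mathbf{x})$ is invertible for all $\mathbf{x}$ in $U$, and there is a positive constant $c$ such that
\[ \|\mathbf{F}(\mathbf{u}_1) - \mathbf{F}(\mathbf{u}_2)\| \geq c\|\mathbf{u}_1 - \mathbf{u}_2\| \quad \text{for all } \mathbf{u}_1, \mathbf{u}_2 \in U. \tag{5.1} \]
Now given $\mathbf{y}$ in $V$, we want to show that $\mathbf{F}^{-1}$ is differentiable at $\mathbf{y}$ and $D\mathbf{F}^{-1}(\mathbf{y}) = D\mathbf{F}(\mathbf{x})^{-1}$, where $\mathbf{x} = \mathbf{F}^{-1}(\mathbf{y})$. Since $V$ is open, there is an $r > 0$ such that $B(\mathbf{y}, r) \subset V$. For $\mathbf{k} \in \mathbb{R}^n$ such that $\|\mathbf{k}\| < r$, let
\[ \mathbf{h} = \mathbf{h}(\mathbf{k}) = \mathbf{F}^{-1}(\mathbf{y} + \mathbf{k}) - \mathbf{F}^{-1}(\mathbf{y}). \]
Then
\[ \mathbf{F}(\mathbf{x}) = \mathbf{y} \quad \text{and} \quad \mathbf{F}(\mathbf{x} + \mathbf{h}) = \mathbf{y} + \mathbf{k}. \]
Eq. (5.1) implies that
\[ \|\mathbf{h}\| \leq \frac{1}{c}\|\mathbf{k}\|. \tag{5.2} \]
Let $A = D\mathbf{F}(\mathbf{x})$. By assumption, $A$ is invertible. Notice that
\[ \mathbf{F}^{-1}(\mathbf{y} + \mathbf{k}) - \mathbf{F}^{-1}(\mathbf{y}) - A^{-1}\mathbf{k} = \mathbf{h} - A^{-1}\mathbf{k} = -A^{-1}(\mathbf{k} - A\mathbf{h}) = -A^{-1}\left( \mathbf{F}(\mathbf{x} + \mathbf{h}) - \mathbf{F}(\mathbf{x}) - A\mathbf{h} \right). \]
There is a positive constant $\beta$ such that
\[ \|A^{-1}\mathbf{w}\| \leq \beta\|\mathbf{w}\| \quad \text{for all } \mathbf{w} \in \mathbb{R}^n. \]
Therefore, for $0 < \|\mathbf{k}\| < r$ (so that $\mathbf{h} \neq \mathbf{0}$, since $\mathbf{F}^{-1}$ is one-to-one),
\[ \frac{\left\| \mathbf{F}^{-1}(\mathbf{y} + \mathbf{k}) - \mathbf{F}^{-1}(\mathbf{y}) - A^{-1}\mathbf{k} \right\|}{\|\mathbf{k}\|} \leq \frac{\beta}{\|\mathbf{k}\|} \left\| \mathbf{F}(\mathbf{x} + \mathbf{h}) - \mathbf{F}(\mathbf{x}) - A\mathbf{h} \right\| \leq \frac{\beta}{c} \, \frac{\left\| \mathbf{F}(\mathbf{x} + \mathbf{h}) - \mathbf{F}(\mathbf{x}) - A\mathbf{h} \right\|}{\|\mathbf{h}\|}. \]
Since $\mathbf{F}$ is differentiable at $\mathbf{x}$,
\[ \lim_{\mathbf{h} \to \mathbf{0}} \frac{\mathbf{F}(\mathbf{x} + \mathbf{h}) - \mathbf{F}(\mathbf{x}) - A\mathbf{h}}{\|\mathbf{h}\|} = \mathbf{0}. \tag{5.3} \]
Eq. (5.2) implies that $\lim_{\mathbf{k} \to \mathbf{0}} \mathbf{h} = \mathbf{0}$. Eq. (5.3) then implies that
\[ \lim_{\mathbf{k} \to \mathbf{0}} \frac{\mathbf{F}^{-1}(\mathbf{y} + \mathbf{k}) - \mathbf{F}^{-1}(\mathbf{y}) - A^{-1}\mathbf{k}}{\|\mathbf{k}\|} = \mathbf{0}. \]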
This proves that $\mathbf{F}^{-1}$ is differentiable at $\mathbf{y}$ and
\[ D\mathbf{F}^{-1}(\mathbf{y}) = A^{-1} = D\mathbf{F}(\mathbf{x})^{-1}. \]
Now the map $D\mathbf{F}^{-1} : V \to GL(n, \mathbb{R})$ is the composition of the maps $\mathbf{F}^{-1} : V \to U$, $D\mathbf{F} : U \to GL(n, \mathbb{R})$ and $\mathscr{I} : GL(n, \mathbb{R}) \to GL(n, \mathbb{R})$ which takes $A$ to $A^{-1}$. Since each of these maps is continuous, the map $D\mathbf{F}^{-1} : V \to GL(n, \mathbb{R})$ is continuous. This completes the proof that $\mathbf{F}^{-1} : V \to U$ is continuously differentiable.

At the end of this section, let us give a brief discussion about the concepts of homeomorphism and diffeomorphism.

Definition 5.2 Homeomorphism
Let $A$ be a subset of $\mathbb{R}^m$ and let $B$ be a subset of $\mathbb{R}^n$. We say that $A$ and $B$ are homeomorphic if there exists a continuous bijective function $\mathbf{F} : A \to B$ whose inverse $\mathbf{F}^{-1} : B \to A$ is also continuous. Such a function $\mathbf{F}$ is called a homeomorphism between $A$ and $B$.

Definition 5.3 Diffeomorphism
Let $O$ and $U$ be open subsets of $\mathbb{R}^n$. We say that $U$ and $O$ are diffeomorphic if there exists a homeomorphism $\mathbf{F} : O \to U$ between $O$ and $U$ such that $\mathbf{F}$ and $\mathbf{F}^{-1}$ are differentiable.

Example 5.7
Let $A = \{(x, y) \mid x^2 + y^2 < 1\}$ and $B = \{(x, y) \mid 4x^2 + 9y^2 < 36\}$. Define the map $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ by
\[ \mathbf{F}(x, y) = (3x, 2y). \]
Then $\mathbf{F}$ is an invertible linear transformation with
\[ \mathbf{F}^{-1}(x, y) = \left( \frac{x}{3}, \frac{y}{2} \right). \]
The mappings $\mathbf{F}$ and $\mathbf{F}^{-1}$ are continuously differentiable. It is easy to show that $\mathbf{F}$ maps $A$ bijectively onto $B$. Hence, $\mathbf{F} : A \to B$ is a diffeomorphism between $A$ and $B$.

Figure 5.3: $A = \{(x, y) \mid x^2 + y^2 < 1\}$ and $B = \{(x, y) \mid 4x^2 + 9y^2 < 36\}$ are diffeomorphic.

Theorem 5.3 gives the following.

Theorem 5.7
Let $A$ be an invertible $n \times n$ matrix, and let $\mathbf{x}_0$ and $\mathbf{y}_0$ be two points in $\mathbb{R}^n$. Define the mapping $\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^n$ by
\[ \mathbf{F}(\mathbf{x}) = \mathbf{y}_0 + A(\mathbf{x} - \mathbf{x}_0). \]
If $O$ is an open subset of $\mathbb{R}^n$, then $\mathbf{F} : O \to \mathbf{F}(O)$ is a diffeomorphism.

The inverse function theorem gives the following.
Theorem 5.8 Let O be an open subset of Rn , and let F : O → Rn be a continuously differentiable mapping such that DF(x) is invertible for all x ∈ O. If U is an open subset contained in O such that F : U → Rn is one-to-one, then F : U → F(U) is a diffeomorphism. The proof of this theorem is left as an exercise.
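As a numerical illustration of the formula $D\mathbf{F}^{-1}(\mathbf{y}) = D\mathbf{F}(\mathbf{x})^{-1}$, consider again the mapping $\mathbf{F}(x, y) = (x^2 - y^2, xy)$ of Example 5.5. The sketch below rests on an observation that is not in the text: with $z = x + iy$, one has $z^2 = (x^2 - y^2) + 2ixy$, so a local inverse near $(0, 1)$ can be written as $\mathbf{G}(s, t) = \sqrt{s + 2it}$ with the principal square root, which sends $(0, 1)$ to $(1, 1)$. The function names and the step size of the central differences are assumptions made for illustration.

```python
import cmath

def F(x, y):
    """The mapping of Example 5.5."""
    return (x * x - y * y, x * y)

def G(s, t):
    """Local inverse of F near (0, 1), via the principal complex square root."""
    z = cmath.sqrt(complex(s, 2.0 * t))
    return (z.real, z.imag)

def jacobian(func, s, t, h=1e-6):
    """Central-difference approximation of the 2 x 2 derivative matrix."""
    sp, sm = func(s + h, t), func(s - h, t)
    tp, tm = func(s, t + h), func(s, t - h)
    return [
        [(sp[0] - sm[0]) / (2 * h), (tp[0] - tm[0]) / (2 * h)],
        [(sp[1] - sm[1]) / (2 * h), (tp[1] - tm[1]) / (2 * h)],
    ]

def inverse2(M):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

# DF(1, 1) as computed in Example 5.5.
DF_at_u0 = [[2.0, -2.0], [1.0, 1.0]]

if __name__ == "__main__":
    print(G(0.0, 1.0))                 # approximately (1, 1)
    print(jacobian(G, 0.0, 1.0))       # finite-difference DG(0, 1)
    print(inverse2(DF_at_u0))          # DF(1, 1)^{-1}
```

The two printed matrices should agree up to the finite-difference error, and the entry in the first row and second column corresponds to $\partial G_1/\partial y\,(0, 1)$ from Example 5.5.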
Exercises 5.2

Question 1
Let $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ be the mapping given by
\[ \mathbf{F}(x, y) = (xe^y + xy, \; 2x^2 + 3y^2). \]
Show that there is a neighbourhood $U$ of $(-1, 0)$ such that the mapping $\mathbf{F} : U \to \mathbb{R}^2$ is stable.

Question 2
Let $O$ be an open subset of $\mathbb{R}^n$, and let $\mathbf{F} : O \to \mathbb{R}^n$ be a continuously differentiable mapping such that $\det D\mathbf{F}(\mathbf{x}) \neq 0$ for all $\mathbf{x} \in O$. Show that $\mathbf{F}(O)$ is an open set.

Question 3
Let $O$ be an open subset of $\mathbb{R}^n$, and let $\mathbf{F} : O \to \mathbb{R}^n$ be a continuously differentiable mapping such that $D\mathbf{F}(\mathbf{x})$ is invertible for all $\mathbf{x} \in O$. If $U$ is an open subset contained in $O$ such that $\mathbf{F} : U \to \mathbb{R}^n$ is one-to-one, show that $\mathbf{F} : U \to \mathbf{F}(U)$ is a diffeomorphism.

Question 4
Let $O$ be an open subset of $\mathbb{R}^n$, and let $\mathbf{F} : O \to \mathbb{R}^n$ be a differentiable mapping. Assume that there is a positive constant $c$ such that
\[ \|\mathbf{F}(\mathbf{u}) - \mathbf{F}(\mathbf{v})\| \geq c\|\mathbf{u} - \mathbf{v}\| \quad \text{for all } \mathbf{u}, \mathbf{v} \in O. \]
Use the first order approximation theorem to show that for any $\mathbf{x} \in O$ and any $\mathbf{h} \in \mathbb{R}^n$,
\[ \|D\mathbf{F}(\mathbf{x})\mathbf{h}\| \geq c\|\mathbf{h}\|. \]

Question 5
Let $O$ be an open subset of $\mathbb{R}^n$, and let $\mathbf{F} : O \to \mathbb{R}^n$ be a continuously differentiable mapping.
(a) If $\mathbf{F} : O \to \mathbb{R}^n$ is stable, show that the derivative matrix $D\mathbf{F}(\mathbf{x})$ is invertible at every $\mathbf{x}$ in $O$.
(b) Assume that the derivative matrix $D\mathbf{F}(\mathbf{x})$ is invertible at every $\mathbf{x}$ in $O$. If $C$ is a compact subset of $O$, show that the mapping $\mathbf{F} : C \to \mathbb{R}^n$ is stable.
5.3 The Implicit Function Theorem

The implicit function theorem is about the possibility of solving $m$ variables from a system of $m$ equations with $n + m$ variables. Let us study some special cases. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, y) = x^2 + y^2 - 1$. For a point $(x_0, y_0)$ that satisfies $f(x_0, y_0) = 0$, we want to ask whether there is a neighbourhood $I$ of $x_0$, a neighbourhood $J$ of $y_0$, and a function $g : I \to \mathbb{R}$ such that for $(x, y) \in I \times J$, $f(x, y) = 0$ if and only if $y = g(x)$.
Figure 5.4: The points in the $(x, y)$ plane satisfying $x^2 + y^2 - 1 = 0$.

If $(x_0, y_0)$ is a point with $y_0 > 0$ and $f(x_0, y_0) = 0$, then we can take the neighbourhoods $I = (-1, 1)$ and $J = (0, \infty)$ of $x_0$ and $y_0$ respectively, and define the function $g : I \to \mathbb{R}$ by
\[ g(x) = \sqrt{1 - x^2}. \]
We then find that for $(x, y) \in I \times J$, $f(x, y) = 0$ if and only if $y = \sqrt{1 - x^2} = g(x)$.

If $(x_0, y_0)$ is a point with $y_0 < 0$ and $f(x_0, y_0) = 0$, then we can take the neighbourhoods $I = (-1, 1)$ and $J = (-\infty, 0)$ of $x_0$ and $y_0$ respectively, and define the function $g : I \to \mathbb{R}$ by
\[ g(x) = -\sqrt{1 - x^2}. \]
We then find that for $(x, y) \in I \times J$, $f(x, y) = 0$ if and only if $y = -\sqrt{1 - x^2} = g(x)$.

However, if $(x_0, y_0) = (1, 0)$, any neighbourhood $J$ of $y_0$ must contain an interval of the form $(-r, r)$. If $I$ is a neighbourhood of 1 and $(x, y)$ is a point in $I \times (-r, r)$ with $y \neq 0$ such that $f(x, y) = 0$, then $(x, -y)$ is another point in $I \times (-r, r)$ satisfying $f(x, -y) = 0$. This shows that there does not exist any function $g : I \to \mathbb{R}$ such that when $(x, y) \in I \times J$, $f(x, y) = 0$ if and only if $y = g(x)$. We say that we cannot solve $y$ as a function of $x$ in a neighbourhood of the point $(1, 0)$. Similarly, we cannot solve $y$ as a function of $x$ in a neighbourhood of the point $(-1, 0)$. However, in a neighbourhood of each of the points $(1, 0)$ and $(-1, 0)$, we can solve $x$ as a function of $y$.

For a function $f : O \to \mathbb{R}$ defined on an open subset $O$ of $\mathbb{R}^2$, the implicit function theorem takes the following form.

Theorem 5.9 Dini’s Theorem
Let $O$ be an open subset of $\mathbb{R}^2$ that contains the point $(x_0, y_0)$, and let $f : O \to \mathbb{R}$ be a continuously differentiable function defined on $O$ such that $f(x_0, y_0) = 0$. If $\dfrac{\partial f}{\partial y}(x_0, y_0) \neq 0$, then there is a neighbourhood $I$ of $x_0$, a neighbourhood $J$ of $y_0$, and a continuously differentiable function $g : I \to J$ such that for any $(x, y) \in I \times J$, $f(x, y) = 0$ if and only if $y = g(x)$. Moreover, for any $x \in I$,
\[ \frac{\partial f}{\partial x}(x, g(x)) + \frac{\partial f}{\partial y}(x, g(x))\, g'(x) = 0. \]

Dini’s theorem says that to be able to solve $y$ as a function of $x$, a sufficient condition is that the function $f$ has continuous partial derivatives, and $f_y$ does not vanish. By interchanging the roles of $x$ and $y$, we see that if $f_x$ does not vanish, we can solve $x$ as a function of $y$. For the function $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = x^2 + y^2 - 1$, the points on the set $x^2 + y^2 = 1$ at which $f_y(x, y) = 2y$ vanishes are the points $(1, 0)$ and $(-1, 0)$. In fact, we have seen that we cannot solve $y$ as a function of $x$ in neighbourhoods of these two points.
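For the circle $f(x, y) = x^2 + y^2 - 1$, the identity in Dini’s theorem gives $g'(x) = -f_x/f_y = -x/y$ wherever $y \neq 0$. The sketch below compares this with a central-difference derivative of the explicit branch $g(x) = \sqrt{1 - x^2}$. It is only an illustration: the function names, the sample point $x = 0.6$ and the step size are assumptions, and the exception raised at $(1, 0)$ merely signals that the formula is unavailable there.

```python
import math

def f_x(x, y):
    """Partial derivative of f(x, y) = x^2 + y^2 - 1 with respect to x."""
    return 2 * x

def f_y(x, y):
    """Partial derivative of f(x, y) = x^2 + y^2 - 1 with respect to y."""
    return 2 * y

def g(x):
    """Upper branch of the circle, valid for -1 < x < 1."""
    return math.sqrt(1 - x * x)

def implicit_derivative(x, y):
    """g'(x) = -f_x / f_y at a point (x, y) of the circle with f_y != 0."""
    if f_y(x, y) == 0:
        raise ZeroDivisionError("f_y vanishes; Dini's theorem does not apply")
    return -f_x(x, y) / f_y(x, y)

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

if __name__ == "__main__":
    x = 0.6
    print(implicit_derivative(x, g(x)), central_difference(g, x))
    try:
        implicit_derivative(1.0, 0.0)
    except ZeroDivisionError as err:
        print("at (1, 0):", err)
```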
Proof of Dini’s Theorem
Without loss of generality, assume that $f_y(x_0, y_0) > 0$. Let $\mathbf{u}_0 = (x_0, y_0)$. Since $f_y : O \to \mathbb{R}$ is continuous, there is an $r_1 > 0$ such that the closed rectangle $R = [x_0 - r_1, x_0 + r_1] \times [y_0 - r_1, y_0 + r_1]$ lies in $O$, and for all $(x, y) \in R$, $f_y(x, y) > f_y(x_0, y_0)/2 > 0$. For any $x \in [x_0 - r_1, x_0 + r_1]$, the function $h_x : [y_0 - r_1, y_0 + r_1] \to \mathbb{R}$ defined by $h_x(y) = f(x, y)$ has derivative $h_x'(y) = f_y(x, y)$ that is positive. Hence, $h_x(y) = f(x, y)$ is strictly increasing in $y$. This implies that
\[ f(x, y_0 - r_1) < f(x, y_0) < f(x, y_0 + r_1). \]
When $x = x_0$, we find that
\[ f(x_0, y_0 - r_1) < 0 < f(x_0, y_0 + r_1). \]
Since $f$ is continuously differentiable, it is continuous. Hence, there is an $r_2 > 0$ such that $r_2 \leq r_1$, and for all $x \in [x_0 - r_2, x_0 + r_2]$,
\[ f(x, y_0 - r_1) < 0 \quad \text{and} \quad f(x, y_0 + r_1) > 0. \]
Let $I = (x_0 - r_2, x_0 + r_2)$. For $x \in I$, since $h_x : [y_0 - r_1, y_0 + r_1] \to \mathbb{R}$ is continuous, and $h_x(y_0 - r_1) < 0 < h_x(y_0 + r_1)$, the intermediate value theorem implies that there is a $y \in (y_0 - r_1, y_0 + r_1)$ such that $h_x(y) = 0$. Since $h_x$ is strictly increasing, this $y$ is unique, and we denote it by $g(x)$. This defines the function $g : I \to \mathbb{R}$. Let $J = (y_0 - r_1, y_0 + r_1)$. By our argument, for each $x \in I$, $y = g(x)$ is the unique $y \in J$ such that $f(x, y) = 0$. Thus, for any $(x, y) \in I \times J$, $f(x, y) = 0$ if and only if $y = g(x)$.

It remains to prove that $g : I \to \mathbb{R}$ is continuously differentiable. By our choice of the rectangle $R$, which contains $I \times J$, the positive constant $c = f_y(x_0, y_0)/2$ satisfies
\[ \frac{\partial f}{\partial y}(x, y) \geq c \quad \text{for all } (x, y) \in I \times J. \]
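Fix $x \in I$. There exists an $r > 0$ such that $(x - r, x + r) \subset I$. For $h$ satisfying $0 < |h| < r$, $x + h$ is in $I$. By the mean value theorem, there is a $c_h \in (0, 1)$ such that
\[ f(x + h, g(x + h)) - f(x, g(x)) = h \frac{\partial f}{\partial x}(\mathbf{u}_h) + \left( g(x + h) - g(x) \right) \frac{\partial f}{\partial y}(\mathbf{u}_h), \]
where
\[ \mathbf{u}_h = (x, g(x)) + c_h \left( h, \; g(x + h) - g(x) \right). \tag{5.4} \]
Since $f(x + h, g(x + h)) = 0 = f(x, g(x))$, we find that
\[ \frac{g(x + h) - g(x)}{h} = -\frac{f_x(\mathbf{u}_h)}{f_y(\mathbf{u}_h)}. \tag{5.5} \]
Since $f_x$ is continuous on the compact set $R$, it is bounded. Namely, there exists a constant $M$ such that
\[ |f_x(x, y)| \leq M \quad \text{for all } (x, y) \in R. \]
Eq. (5.5) then implies that
\[ |g(x + h) - g(x)| \leq \frac{M}{c}|h|. \]
Taking $h \to 0$ proves that $g$ is continuous at $x$. From (5.4), we find that
\[ \lim_{h \to 0} \mathbf{u}_h = (x, g(x)). \]
Since $f_x$ and $f_y$ are continuous at $(x, g(x))$, eq. (5.5) gives
\[ \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} = -\lim_{h \to 0} \frac{f_x(\mathbf{u}_h)}{f_y(\mathbf{u}_h)} = -\frac{f_x(x, g(x))}{f_y(x, g(x))}. \]
This proves that $g$ is differentiable at $x$ and
\[ \frac{\partial f}{\partial x}(x, g(x)) + \frac{\partial f}{\partial y}(x, g(x))\, g'(x) = 0. \]
Since $g$, $f_x$ and $f_y$ are continuous and $f_y$ does not vanish on $I \times J$, the derivative $g'(x) = -f_x(x, g(x))/f_y(x, g(x))$ is continuous on $I$. Hence, $g$ is continuously differentiable.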
Figure 5.5: Proof of Dini’s Theorem.

Example 5.8
Consider the equation
\[ xy^3 + \sin(x + y) + 4x^2 y = 3. \]
Show that in a neighbourhood of $(-1, 1)$, this equation defines $y$ as a function of $x$. If this function is denoted as $y = g(x)$, find $g'(-1)$.

Solution
Let $f : \mathbb{R}^2 \to \mathbb{R}$ be the function defined as
\[ f(x, y) = xy^3 + \sin(x + y) + 4x^2 y - 3. \]
Since the sine function and polynomial functions are infinitely differentiable, $f$ is infinitely differentiable. Notice that $f(-1, 1) = 0$, and
\[ \frac{\partial f}{\partial y}(x, y) = 3xy^2 + \cos(x + y) + 4x^2, \qquad \frac{\partial f}{\partial y}(-1, 1) = 2 \neq 0. \]
By Dini’s theorem, there is a neighbourhood of $(-1, 1)$ such that $y$ can be solved as a function of $x$. Now,
\[ \frac{\partial f}{\partial x}(x, y) = y^3 + \cos(x + y) + 8xy, \qquad \frac{\partial f}{\partial x}(-1, 1) = -6. \]
Hence,
\[ g'(-1) = -\frac{-6}{2} = 3. \]
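The proof of Dini’s theorem is essentially constructive: for fixed $x$, the value $g(x)$ is the unique zero of a strictly increasing function of $y$, so it can be approximated by bisection. The sketch below applies this to the function $f$ of Example 5.8. Several choices are assumptions and not part of the text: the bracketing interval $[0.8, 1.2]$ for $y$ was picked by hand (a rough evaluation suggests $f_y > 0$ there for $x$ close to $-1$, but this is not proved here), and the number of bisection steps and the difference step `h` are arbitrary.

```python
import math

def f(x, y):
    """The function of Example 5.8."""
    return x * y**3 + math.sin(x + y) + 4 * x**2 * y - 3

def g(x, y_low=0.8, y_high=1.2, iterations=60):
    """Approximate the zero of y -> f(x, y) in [y_low, y_high] by bisection.

    Assumes f(x, y_low) < 0 < f(x, y_high); the interval is a hand-picked
    assumption suitable only for x near -1.
    """
    if not (f(x, y_low) < 0 < f(x, y_high)):
        raise ValueError("no sign change on the chosen interval")
    lo, hi = y_low, y_high
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(x, mid) < 0:
            lo = mid      # the zero lies above mid
        else:
            hi = mid      # the zero lies at or below mid
    return (lo + hi) / 2

def derivative_of_g(x, h=1e-5):
    """Central-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

if __name__ == "__main__":
    print(g(-1.0))                # close to 1, since f(-1, 1) = 0
    print(derivative_of_g(-1.0))  # close to -f_x / f_y at (-1, 1)
```

The second printed value can be compared with the quotient $-f_x(-1, 1)/f_y(-1, 1)$ computed in the solution.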
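Now we turn to the general case. First we consider polynomial mappings of degree at most one. Let $A = [a_{ij}]$ be an $m \times n$ matrix, and let $B = [b_{ij}]$ be an $m \times m$ matrix. Given $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{y} \in \mathbb{R}^m$, $\mathbf{c} \in \mathbb{R}^m$, the system of equations
\[ A\mathbf{x} + B\mathbf{y} = \mathbf{c} \]
is the following $m$ equations in $m + n$ variables $x_1, \ldots, x_n, y_1, \ldots, y_m$.
\begin{align*}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n + b_{11}y_1 + b_{12}y_2 + \cdots + b_{1m}y_m &= c_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n + b_{21}y_1 + b_{22}y_2 + \cdots + b_{2m}y_m &= c_2, \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n + b_{m1}y_1 + b_{m2}y_2 + \cdots + b_{mm}y_m &= c_m.
\end{align*}
Let us look at an example.

Example 5.9
Consider the linear system
\begin{align*}
2x_1 + 3x_2 - 5x_3 + 2y_1 - y_2 &= 1 \\
3x_1 - x_2 + 2x_3 - 3y_1 + y_2 &= 0
\end{align*}
Show that $\mathbf{y} = (y_1, y_2)$ can be solved as a function of $\mathbf{x} = (x_1, x_2, x_3)$. Write down the function $\mathbf{G} : \mathbb{R}^3 \to \mathbb{R}^2$ such that the solution is given by $\mathbf{y} = \mathbf{G}(\mathbf{x})$, and find $D\mathbf{G}(\mathbf{x})$.

Solution
Let
\[ A = \begin{bmatrix} 2 & 3 & -5 \\ 3 & -1 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & -1 \\ -3 & 1 \end{bmatrix}. \]
Then the system can be written as
\[ A\mathbf{x} + B\mathbf{y} = \mathbf{c}, \quad \text{where } \mathbf{c} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}. \]
This implies that
\[ B\mathbf{y} = \mathbf{c} - A\mathbf{x}. \tag{5.6} \]
For every $\mathbf{x} \in \mathbb{R}^3$, $\mathbf{c} - A\mathbf{x}$ is a vector in $\mathbb{R}^2$. Since $\det B = -1 \neq 0$, $B$ is invertible. Therefore, there is a unique $\mathbf{y}$ satisfying (5.6). It is given by
\begin{align*}
\mathbf{G}(\mathbf{x}) = \mathbf{y} &= B^{-1}(\mathbf{c} - A\mathbf{x}) \\
&= -\begin{bmatrix} 1 & 1 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} 2 & 3 & -5 \\ 3 & -1 & 2 \end{bmatrix} \mathbf{x} \\
&= -\begin{bmatrix} 1 \\ 3 \end{bmatrix} + \begin{bmatrix} 5 & 2 & -3 \\ 12 & 7 & -11 \end{bmatrix} \mathbf{x} \\
&= \begin{bmatrix} 5x_1 + 2x_2 - 3x_3 - 1 \\ 12x_1 + 7x_2 - 11x_3 - 3 \end{bmatrix}.
\end{align*}
It follows that
\[ D\mathbf{G} = \begin{bmatrix} 5 & 2 & -3 \\ 12 & 7 & -11 \end{bmatrix}. \]

The following theorem gives a general scenario.

Theorem 5.10
Let $A = [a_{ij}]$ be an $m \times n$ matrix, and let $B = [b_{ij}]$ be an $m \times m$ matrix. Define the function $\mathbf{F} : \mathbb{R}^{m+n} \to \mathbb{R}^m$ by
\[ \mathbf{F}(\mathbf{x}, \mathbf{y}) = A\mathbf{x} + B\mathbf{y} - \mathbf{c}, \]
where $\mathbf{c}$ is a constant vector in $\mathbb{R}^m$. The equation $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ defines the variable $\mathbf{y} = (y_1, \ldots, y_m)$ as a function of $\mathbf{x} = (x_1, \ldots, x_n)$ if and only if the matrix $B$ is invertible. If we denote this function as $\mathbf{G} : \mathbb{R}^n \to \mathbb{R}^m$, then
\[ \mathbf{G}(\mathbf{x}) = B^{-1}(\mathbf{c} - A\mathbf{x}), \]
and
\[ D\mathbf{G}(\mathbf{x}) = -B^{-1}A. \]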
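The formulas $\mathbf{G}(\mathbf{x}) = B^{-1}(\mathbf{c} - A\mathbf{x})$ and $D\mathbf{G}(\mathbf{x}) = -B^{-1}A$ can be checked in exact arithmetic on the data of Example 5.9. The following sketch is an illustration for $m = 2$ only: the helper names `mat_mul`, `mat_vec` and `inverse2` are hypothetical, and the test point in the usage lines is arbitrary.

```python
from fractions import Fraction

def mat_mul(P, Q):
    """Product of two matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def mat_vec(P, v):
    """Product of a matrix with a vector."""
    return [sum(P[i][k] * v[k] for k in range(len(v))) for i in range(len(P))]

def inverse2(M):
    """Inverse of a 2 x 2 matrix; raises if the determinant is zero."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if d == 0:
        raise ValueError("B is not invertible, so y is not a function of x")
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

# Data of Example 5.9.
A = [[Fraction(2), Fraction(3), Fraction(-5)],
     [Fraction(3), Fraction(-1), Fraction(2)]]
B = [[Fraction(2), Fraction(-1)], [Fraction(-3), Fraction(1)]]
c = [Fraction(1), Fraction(0)]

B_inv = inverse2(B)
# DG = -B^{-1} A, as in Theorem 5.10.
DG = [[-entry for entry in row] for row in mat_mul(B_inv, A)]

def G(x):
    """G(x) = B^{-1} (c - A x)."""
    Ax = mat_vec(A, x)
    return mat_vec(B_inv, [c[i] - Ax[i] for i in range(len(c))])

if __name__ == "__main__":
    print(DG)
    x = [1, 2, 3]
    y = G(x)
    # A x + B y should reproduce c.
    print([mat_vec(A, x)[i] + mat_vec(B, y)[i] for i in range(2)])
```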
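Proof
The equation $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ defines the variables $\mathbf{y}$ as a function of $\mathbf{x}$ if and only if for each $\mathbf{x} \in \mathbb{R}^n$, there is a unique $\mathbf{y} \in \mathbb{R}^m$ satisfying
\[ B\mathbf{y} = \mathbf{c} - A\mathbf{x}. \]
This is a linear system for the variable $\mathbf{y}$. By the theory of linear algebra, a unique solution $\mathbf{y}$ exists if and only if $B$ is invertible. In this case, the solution is given by
\[ \mathbf{y} = B^{-1}(\mathbf{c} - A\mathbf{x}). \]
The rest of the assertion follows.

Write a point in $\mathbb{R}^{m+n}$ as $(\mathbf{x}, \mathbf{y})$, where $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{y} \in \mathbb{R}^m$. If $\mathbf{F} : \mathbb{R}^{m+n} \to \mathbb{R}^m$ is a function that is differentiable at the point $(\mathbf{x}, \mathbf{y})$, the $m \times (m + n)$ derivative matrix $D\mathbf{F}(\mathbf{x}, \mathbf{y})$ can be written as
\[ D\mathbf{F}(\mathbf{x}, \mathbf{y}) = \begin{bmatrix} D_{\mathbf{x}}\mathbf{F}(\mathbf{x}, \mathbf{y}) & D_{\mathbf{y}}\mathbf{F}(\mathbf{x}, \mathbf{y}) \end{bmatrix}, \]
where
\[ D_{\mathbf{x}}\mathbf{F}(\mathbf{x}, \mathbf{y}) = \begin{bmatrix}
\dfrac{\partial F_1}{\partial x_1}(\mathbf{x}, \mathbf{y}) & \dfrac{\partial F_1}{\partial x_2}(\mathbf{x}, \mathbf{y}) & \cdots & \dfrac{\partial F_1}{\partial x_n}(\mathbf{x}, \mathbf{y}) \\
\dfrac{\partial F_2}{\partial x_1}(\mathbf{x}, \mathbf{y}) & \dfrac{\partial F_2}{\partial x_2}(\mathbf{x}, \mathbf{y}) & \cdots & \dfrac{\partial F_2}{\partial x_n}(\mathbf{x}, \mathbf{y}) \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial F_m}{\partial x_1}(\mathbf{x}, \mathbf{y}) & \dfrac{\partial F_m}{\partial x_2}(\mathbf{x}, \mathbf{y}) & \cdots & \dfrac{\partial F_m}{\partial x_n}(\mathbf{x}, \mathbf{y})
\end{bmatrix}, \]
\[ D_{\mathbf{y}}\mathbf{F}(\mathbf{x}, \mathbf{y}) = \begin{bmatrix}
\dfrac{\partial F_1}{\partial y_1}(\mathbf{x}, \mathbf{y}) & \dfrac{\partial F_1}{\partial y_2}(\mathbf{x}, \mathbf{y}) & \cdots & \dfrac{\partial F_1}{\partial y_m}(\mathbf{x}, \mathbf{y}) \\
\dfrac{\partial F_2}{\partial y_1}(\mathbf{x}, \mathbf{y}) & \dfrac{\partial F_2}{\partial y_2}(\mathbf{x}, \mathbf{y}) & \cdots & \dfrac{\partial F_2}{\partial y_m}(\mathbf{x}, \mathbf{y}) \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial F_m}{\partial y_1}(\mathbf{x}, \mathbf{y}) & \dfrac{\partial F_m}{\partial y_2}(\mathbf{x}, \mathbf{y}) & \cdots & \dfrac{\partial F_m}{\partial y_m}(\mathbf{x}, \mathbf{y})
\end{bmatrix}. \]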
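Notice that $D_{\mathbf{y}}\mathbf{F}(\mathbf{x}, \mathbf{y})$ is a square matrix. When $A = [a_{ij}]$ is an $m \times n$ matrix, $B = [b_{ij}]$ is an $m \times m$ matrix, $\mathbf{c}$ is a vector in $\mathbb{R}^m$, and $\mathbf{F} : \mathbb{R}^{m+n} \to \mathbb{R}^m$ is the function defined as
\[ \mathbf{F}(\mathbf{x}, \mathbf{y}) = A\mathbf{x} + B\mathbf{y} - \mathbf{c}, \]
it is easy to compute that
\[ D_{\mathbf{x}}\mathbf{F}(\mathbf{x}, \mathbf{y}) = A, \qquad D_{\mathbf{y}}\mathbf{F}(\mathbf{x}, \mathbf{y}) = B. \]
Theorem 5.10 says that we can solve $\mathbf{y}$ as a function of $\mathbf{x}$ from the system of $m$ equations $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ if and only if $B = D_{\mathbf{y}}\mathbf{F}(\mathbf{x}, \mathbf{y})$ is invertible. In this case, if $\mathbf{G} : \mathbb{R}^n \to \mathbb{R}^m$ is the function so that $\mathbf{y} = \mathbf{G}(\mathbf{x})$ is the solution, then
\[ D\mathbf{G}(\mathbf{x}) = -B^{-1}A = -D_{\mathbf{y}}\mathbf{F}(\mathbf{x}, \mathbf{y})^{-1} D_{\mathbf{x}}\mathbf{F}(\mathbf{x}, \mathbf{y}). \]
In fact, the latter formula follows from $\mathbf{F}(\mathbf{x}, \mathbf{G}(\mathbf{x})) = \mathbf{0}$ and the chain rule.

The special case of degree one polynomial mappings gives us sufficient insight into the general implicit function theorem. However, for nonlinear mappings, the conclusions can only be made locally.

Theorem 5.11 Implicit Function Theorem
Let $O$ be an open subset of $\mathbb{R}^{m+n}$, and let $\mathbf{F} : O \to \mathbb{R}^m$ be a continuously differentiable function defined on $O$. Assume that $\mathbf{x}_0$ is a point in $\mathbb{R}^n$ and $\mathbf{y}_0$ is a point in $\mathbb{R}^m$ such that the point $(\mathbf{x}_0, \mathbf{y}_0)$ is in $O$ and $\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) = \mathbf{0}$. If $\det D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0$, then we have the following.
(i) There is a neighbourhood $U$ of $\mathbf{x}_0$, a neighbourhood $V$ of $\mathbf{y}_0$, and a continuously differentiable function $\mathbf{G} : U \to \mathbb{R}^m$ such that for any $(\mathbf{x}, \mathbf{y}) \in U \times V$, $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ if and only if $\mathbf{y} = \mathbf{G}(\mathbf{x})$.
(ii) For any $\mathbf{x} \in U$,
\[ D_{\mathbf{x}}\mathbf{F}(\mathbf{x}, \mathbf{G}(\mathbf{x})) + D_{\mathbf{y}}\mathbf{F}(\mathbf{x}, \mathbf{G}(\mathbf{x}))\, D\mathbf{G}(\mathbf{x}) = 0. \]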
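Here we will give a proof of the implicit function theorem using the inverse function theorem. The idea of the proof is to construct a mapping to which one can apply the inverse function theorem. Let us look at an example first.

Example 5.10
Let $\mathbf{F} : \mathbb{R}^5 \to \mathbb{R}^2$ be the function defined as
\[ \mathbf{F}(x_1, x_2, x_3, y_1, y_2) = (x_1 y_2^2, \; x_2 x_3 y_1^2 + x_1 y_2). \]
Define the mapping $\mathbf{H} : \mathbb{R}^5 \to \mathbb{R}^5$ as
\[ \mathbf{H}(\mathbf{x}, \mathbf{y}) = (\mathbf{x}, \mathbf{F}(\mathbf{x}, \mathbf{y})) = (x_1, x_2, x_3, x_1 y_2^2, x_2 x_3 y_1^2 + x_1 y_2). \]
Then we find that
\[ D\mathbf{H}(\mathbf{x}, \mathbf{y}) = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
y_2^2 & 0 & 0 & 0 & 2x_1 y_2 \\
y_2 & x_3 y_1^2 & x_2 y_1^2 & 2x_2 x_3 y_1 & x_1
\end{bmatrix}. \]
Notice that
\[ D\mathbf{H}(\mathbf{x}, \mathbf{y}) = \begin{bmatrix} I_3 & 0 \\ D_{\mathbf{x}}\mathbf{F}(\mathbf{x}, \mathbf{y}) & D_{\mathbf{y}}\mathbf{F}(\mathbf{x}, \mathbf{y}) \end{bmatrix}. \]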
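Proof of the Implicit Function Theorem
Let $\mathbf{H} : O \to \mathbb{R}^{m+n}$ be the mapping defined as
\[ \mathbf{H}(\mathbf{x}, \mathbf{y}) = (\mathbf{x}, \mathbf{F}(\mathbf{x}, \mathbf{y})). \]
Notice that $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ if and only if $\mathbf{H}(\mathbf{x}, \mathbf{y}) = (\mathbf{x}, \mathbf{0})$. Since the first $n$ components of $\mathbf{H}$ are infinitely differentiable functions and $\mathbf{F}$ is continuously differentiable, the mapping $\mathbf{H} : O \to \mathbb{R}^{m+n}$ is continuously differentiable. Now,
\[ D\mathbf{H}(\mathbf{x}, \mathbf{y}) = \begin{bmatrix} I_n & 0 \\ D_{\mathbf{x}}\mathbf{F}(\mathbf{x}, \mathbf{y}) & D_{\mathbf{y}}\mathbf{F}(\mathbf{x}, \mathbf{y}) \end{bmatrix}. \]
Therefore,
\[ \det D\mathbf{H}(\mathbf{x}_0, \mathbf{y}_0) = \det D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0. \]
By the inverse function theorem, there is a neighbourhood $W$ of $(\mathbf{x}_0, \mathbf{y}_0)$ and a neighbourhood $Z$ of $\mathbf{H}(\mathbf{x}_0, \mathbf{y}_0) = (\mathbf{x}_0, \mathbf{0})$ such that $\mathbf{H} : W \to Z$ is a bijection and $\mathbf{H}^{-1} : Z \to W$ is continuously differentiable. For $\mathbf{u} \in \mathbb{R}^n$, $\mathbf{v} \in \mathbb{R}^m$ so that $(\mathbf{u}, \mathbf{v}) \in Z$, let
\[ \mathbf{H}^{-1}(\mathbf{u}, \mathbf{v}) = (\Phi(\mathbf{u}, \mathbf{v}), \Psi(\mathbf{u}, \mathbf{v})), \]
where $\Phi$ is a map from $Z$ to $\mathbb{R}^n$ and $\Psi$ is a map from $Z$ to $\mathbb{R}^m$. Since $\mathbf{H}^{-1}$ is continuously differentiable, $\Phi$ and $\Psi$ are continuously differentiable.

Given $r > 0$, let $D_r$ be the open cube $D_r = \prod_{i=1}^{m+n} (-r, r)$. Since $W$ and $Z$ are open sets that contain $(\mathbf{x}_0, \mathbf{y}_0)$ and $(\mathbf{x}_0, \mathbf{0})$ respectively, there exists $r > 0$ such that
\[ (\mathbf{x}_0, \mathbf{y}_0) + D_r \subset W, \qquad (\mathbf{x}_0, \mathbf{0}) + D_r \subset Z. \]
If $A_r = \prod_{i=1}^{n} (-r, r)$, $B_r = \prod_{i=1}^{m} (-r, r)$, $U = \mathbf{x}_0 + A_r$, $V = \mathbf{y}_0 + B_r$, then
\[ (\mathbf{x}_0, \mathbf{y}_0) + D_r = U \times V, \qquad (\mathbf{x}_0, \mathbf{0}) + D_r = U \times B_r. \]
Hence, $U \times V \subset W$ and $U \times B_r \subset Z$. Define $\mathbf{G} : U \to \mathbb{R}^m$ by
\[ \mathbf{G}(\mathbf{x}) = \Psi(\mathbf{x}, \mathbf{0}). \]
Since $\Psi$ is continuously differentiable, $\mathbf{G}$ is continuously differentiable. If $\mathbf{x} \in U$, $\mathbf{y} \in V$, then $(\mathbf{x}, \mathbf{y}) \in W$. For such $(\mathbf{x}, \mathbf{y})$, $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ implies $\mathbf{H}(\mathbf{x}, \mathbf{y}) = (\mathbf{x}, \mathbf{0})$. Since $\mathbf{H} : W \to Z$ is a bijection, $(\mathbf{x}, \mathbf{0}) \in Z$ and $\mathbf{H}^{-1}(\mathbf{x}, \mathbf{0}) = (\mathbf{x}, \mathbf{y})$. Comparing the last $m$ components gives
\[ \mathbf{y} = \Psi(\mathbf{x}, \mathbf{0}) = \mathbf{G}(\mathbf{x}). \]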
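Conversely, since $\mathbf{H}(\mathbf{H}^{-1}(\mathbf{u}, \mathbf{v})) = (\mathbf{u}, \mathbf{v})$ for all $(\mathbf{u}, \mathbf{v}) \in Z$, we find that
\[ \left( \Phi(\mathbf{u}, \mathbf{v}), \; \mathbf{F}(\Phi(\mathbf{u}, \mathbf{v}), \Psi(\mathbf{u}, \mathbf{v})) \right) = (\mathbf{u}, \mathbf{v}) \]
for all $(\mathbf{u}, \mathbf{v}) \in Z$. For all $\mathbf{u} \in U$, $(\mathbf{u}, \mathbf{0})$ is in $Z$. Therefore,
\[ \Phi(\mathbf{u}, \mathbf{0}) = \mathbf{u}, \qquad \mathbf{F}(\Phi(\mathbf{u}, \mathbf{0}), \Psi(\mathbf{u}, \mathbf{0})) = \mathbf{0}. \]
This implies that if $\mathbf{u} \in U$, then $\mathbf{F}(\mathbf{u}, \mathbf{G}(\mathbf{u})) = \mathbf{0}$. In other words, if $(\mathbf{x}, \mathbf{y})$ is in $U \times V$ and $\mathbf{y} = \mathbf{G}(\mathbf{x})$, we must have $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$.

Since we have shown that $\mathbf{G} : U \to \mathbb{R}^m$ is continuously differentiable, the formula
\[ D_{\mathbf{x}}\mathbf{F}(\mathbf{x}, \mathbf{G}(\mathbf{x})) + D_{\mathbf{y}}\mathbf{F}(\mathbf{x}, \mathbf{G}(\mathbf{x}))\, D\mathbf{G}(\mathbf{x}) = 0 \]
follows from $\mathbf{F}(\mathbf{x}, \mathbf{G}(\mathbf{x})) = \mathbf{0}$ and the chain rule.

Example 5.11
Consider the system of equations
\begin{align*}
2x^2 y + 3xy^2 u + xyv + uv &= 7 \\
4xu - 5yv + u^2 y + v^2 x &= 1
\end{align*} \tag{5.7}
Notice that when $(x, y) = (1, 1)$, $(u, v) = (1, 1)$ is a solution of this system. Show that there are neighbourhoods $U$ and $V$ of $(1, 1)$, and a continuously differentiable function $\mathbf{G} : U \to \mathbb{R}^2$ such that if $(x, y, u, v) \in U \times V$, then $(x, y, u, v)$ is a solution of the system of equations above if and only if $u = G_1(x, y)$ and $v = G_2(x, y)$. Also, find the values of $\dfrac{\partial G_1}{\partial x}(1, 1)$, $\dfrac{\partial G_1}{\partial y}(1, 1)$, $\dfrac{\partial G_2}{\partial x}(1, 1)$ and $\dfrac{\partial G_2}{\partial y}(1, 1)$.

Solution
Define the function $\mathbf{F} : \mathbb{R}^4 \to \mathbb{R}^2$ by
\[ \mathbf{F}(x, y, u, v) = (2x^2 y + 3xy^2 u + xyv + uv - 7, \; 4xu - 5yv + u^2 y + v^2 x - 1). \]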
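This is a polynomial mapping. Hence, it is continuously differentiable. It is easy to check that $\mathbf{F}(1, 1, 1, 1) = \mathbf{0}$. Now,
\[ D_{(u,v)}\mathbf{F}(x, y, u, v) = \begin{bmatrix} 3xy^2 + v & xy + u \\ 4x + 2uy & -5y + 2vx \end{bmatrix}. \]
Thus,
\[ \det D_{(u,v)}\mathbf{F}(1, 1, 1, 1) = \det \begin{bmatrix} 4 & 2 \\ 6 & -3 \end{bmatrix} = -24 \neq 0. \]
By the implicit function theorem, there are neighbourhoods $U$ and $V$ of $(1, 1)$, and a continuously differentiable function $\mathbf{G} : U \to \mathbb{R}^2$ such that, if $(x, y, u, v) \in U \times V$, then $(x, y, u, v)$ is a solution of the system of equations (5.7) if and only if $u = G_1(x, y)$ and $v = G_2(x, y)$. Finally,
\[ D_{(x,y)}\mathbf{F}(x, y, u, v) = \begin{bmatrix} 4xy + 3y^2 u + yv & 2x^2 + 6xyu + xv \\ 4u + v^2 & -5v + u^2 \end{bmatrix}, \qquad D_{(x,y)}\mathbf{F}(1, 1, 1, 1) = \begin{bmatrix} 8 & 9 \\ 5 & -4 \end{bmatrix}. \]
The chain rule gives
\[ D\mathbf{G}(1, 1) = -D_{(u,v)}\mathbf{F}(1, 1, 1, 1)^{-1} D_{(x,y)}\mathbf{F}(1, 1, 1, 1) = \frac{1}{24} \begin{bmatrix} -3 & -2 \\ -6 & 4 \end{bmatrix} \begin{bmatrix} 8 & 9 \\ 5 & -4 \end{bmatrix} = \frac{1}{24} \begin{bmatrix} -34 & -19 \\ -28 & -70 \end{bmatrix}. \]
Therefore,
\[ \frac{\partial G_1}{\partial x}(1, 1) = -\frac{17}{12}, \quad \frac{\partial G_1}{\partial y}(1, 1) = -\frac{19}{24}, \quad \frac{\partial G_2}{\partial x}(1, 1) = -\frac{7}{6}, \quad \frac{\partial G_2}{\partial y}(1, 1) = -\frac{35}{12}. \]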
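The matrix arithmetic in Example 5.11 is easy to get wrong by hand, so a quick check may be useful. The following is a minimal sketch in Python using exact rational arithmetic. The helper names `inverse2`, `matmul2` and `implicit_jacobian` are illustrative choices and do not come from the text. The sketch takes the two matrices of partial derivatives at $(1, 1, 1, 1)$ as given and evaluates $-D_{(u,v)}\mathbf{F}^{-1} D_{(x,y)}\mathbf{F}$.

```python
from fractions import Fraction


def inverse2(A):
    """Inverse of a 2x2 matrix with Fraction entries (assumes det != 0)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def implicit_jacobian(D_dep, D_indep):
    """Return -D_dep^{-1} D_indep, the derivative of the implicit function."""
    product = matmul2(inverse2(D_dep), D_indep)
    return [[-entry for entry in row] for row in product]


# Values of the partial derivative matrices of F at (1, 1, 1, 1).
D_uv = [[Fraction(4), Fraction(2)], [Fraction(6), Fraction(-3)]]
D_xy = [[Fraction(8), Fraction(9)], [Fraction(5), Fraction(-4)]]

DG = implicit_jacobian(D_uv, D_xy)
for row in DG:
    print([str(entry) for entry in row])
# prints ['-17/12', '-19/24'] and then ['-7/6', '-35/12']
```

Swapping the roles of the two matrices, `implicit_jacobian(D_xy, D_uv)`, gives the derivative at $(1, 1)$ of the function that expresses $(x, y)$ in terms of $(u, v)$, and the product of the two results is the identity matrix.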
Remark 5.3 The Rank of a Matrix
In the formulation of the implicit function theorem, the assumption that $\det D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0$ can be replaced by the assumption that there are $m$ variables $u_1, \ldots, u_m$ among the $n + m$ variables $x_1, \ldots, x_n, y_1, \ldots, y_m$ such that $\det D_{(u_1, \ldots, u_m)}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0$.
Recall that the rank $r$ of an $m \times k$ matrix $A$ is the dimension of its row space, which is equal to the dimension of its column space. Thus, the rank $r$ of an $m \times k$ matrix $A$ is the maximum number of column vectors of $A$ that are linearly independent, and also the maximum number of row vectors of $A$ that are linearly independent. Hence, the maximum possible value of $r$ is $\min\{m, k\}$. If $r = \min\{m, k\}$, we say that the matrix $A$ has maximal rank. An $m \times k$ matrix with $m \leq k$ has maximal rank if $r = m$. In this case, there is an $m \times m$ submatrix of $A$ whose columns are $m$ linearly independent vectors in $\mathbb{R}^m$. The determinant of this submatrix is nonzero.
Thus, the condition $\det D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0$ in the formulation of the implicit function theorem can be replaced by the condition that the $m \times (m + n)$ matrix $D\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0)$ has maximal rank.

Example 5.12
Consider the system
\[ \begin{aligned} 2x^2 y + 3xy^2 u + xyv + uv &= 7 \\ 4xu - 5yv + u^2 y + v^2 x &= 1 \end{aligned} \tag{5.8} \]
defined in Example 5.11. Show that there are neighbourhoods $U$ and $V$ of $(1, 1)$, and a continuously differentiable function $\mathbf{H} : V \to \mathbb{R}^2$ such that if $(x, y, u, v) \in U \times V$, then $(x, y, u, v)$ is a solution of the system of equations if and only if $x = H_1(u, v)$ and $y = H_2(u, v)$. Find $D\mathbf{H}(1, 1)$.
Solution
Define the function $\mathbf{F} : \mathbb{R}^4 \to \mathbb{R}^2$ as in the solution of Example 5.11. Since
\[ \det D_{(x,y)}\mathbf{F}(1, 1, 1, 1) = \det \begin{bmatrix} 8 & 9 \\ 5 & -4 \end{bmatrix} = -77 \neq 0, \]
the implicit function theorem implies that there are neighbourhoods $U$ and $V$ of $(1, 1)$, and a continuously differentiable function $\mathbf{H} : V \to \mathbb{R}^2$ such that if $(x, y, u, v) \in U \times V$, then $(x, y, u, v)$ is a solution of the system of equations (5.8) if and only if $x = H_1(u, v)$ and $y = H_2(u, v)$. Moreover,
\[ D\mathbf{H}(1, 1) = -D_{(x,y)}\mathbf{F}(1, 1, 1, 1)^{-1} D_{(u,v)}\mathbf{F}(1, 1, 1, 1) = \frac{1}{77} \begin{bmatrix} -4 & -9 \\ -5 & 8 \end{bmatrix} \begin{bmatrix} 4 & 2 \\ 6 & -3 \end{bmatrix} = \frac{1}{77} \begin{bmatrix} -70 & 19 \\ 28 & -34 \end{bmatrix}. \]

Remark 5.4
The function $\mathbf{G} : U \to \mathbb{R}^2$ in Example 5.11 and the function $\mathbf{H} : V \to \mathbb{R}^2$ in Example 5.12 are in fact inverses of each other. Notice that $D\mathbf{G}(1, 1)$ is invertible. By the inverse function theorem, there is a neighbourhood $U'$ of $(1, 1)$ such that $V' = \mathbf{G}(U')$ is open, and $\mathbf{G} : U' \to V'$ is a bijection with continuously differentiable inverse. By shrinking the sets $U$ and $V$, we can assume that $U = U'$ and $V = V'$. If $(x, y) \in U$ and $(u, v) \in V$, then $\mathbf{F}(x, y, u, v) = \mathbf{0}$ if and only if $(u, v) = \mathbf{G}(x, y)$, if and only if $(x, y) = \mathbf{H}(u, v)$. This implies that $\mathbf{G} : U \to V$ and $\mathbf{H} : V \to U$ are inverses of each other.

At the end of this section, let us consider a geometric application of the implicit function theorem. First let us revisit the example where $f(x, y) = x^2 + y^2 - 1$. At each point $(x_0, y_0)$ such that $f(x_0, y_0) = 0$, $x_0^2 + y_0^2 = 1$. Hence, $\nabla f(x_0, y_0) = (2x_0, 2y_0) \neq \mathbf{0}$. Notice that the vector $\nabla f(x_0, y_0) = (2x_0, 2y_0)$ is normal to the circle $x^2 + y^2 = 1$ at the point $(x_0, y_0)$.
Figure 5.6: The tangent vector and normal vector at a point on the circle $x^2 + y^2 - 1 = 0$.

If $y_0 > 0$, let $U = (-1, 1) \times (0, \infty)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph of the function $g : (-1, 1) \to \mathbb{R}$, $g(x) = \sqrt{1 - x^2}$. If $y_0 < 0$, let $U = (-1, 1) \times (-\infty, 0)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph of the function $g : (-1, 1) \to \mathbb{R}$, $g(x) = -\sqrt{1 - x^2}$.
If $y_0 = 0$, then $x_0 = 1$ or $-1$. In fact, we can consider more generally the cases where $x_0 > 0$ and $x_0 < 0$. If $x_0 > 0$, let $U = (0, \infty) \times (-1, 1)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph $x = g(y)$ of the function $g : (-1, 1) \to \mathbb{R}$, $g(y) = \sqrt{1 - y^2}$. If $x_0 < 0$, let $U = (-\infty, 0) \times (-1, 1)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph $x = g(y)$ of the function $g : (-1, 1) \to \mathbb{R}$, $g(y) = -\sqrt{1 - y^2}$.

Definition 5.4 Surfaces
Let $S$ be a subset of $\mathbb{R}^k$ for some positive integer $k$. We say that $S$ is an $n$-dimensional surface if for each $\mathbf{x}_0$ on $S$, there is an open subset $D$ of $\mathbb{R}^n$, an open neighbourhood $U$ of $\mathbf{x}_0$ in $\mathbb{R}^k$, and a one-to-one differentiable mapping $\mathbf{G} : D \to \mathbb{R}^k$ such that $\mathbf{G}(D) \subset S$, $\mathbf{G}(D) \cap U = S \cap U$, and $D\mathbf{G}(\mathbf{u})$ has rank $n$ at each $\mathbf{u} \in D$.
Example 5.13
We claim that the $n$-sphere
\[ S^n = \left\{ (x_1, \ldots, x_n, x_{n+1}) \mid x_1^2 + \cdots + x_n^2 + x_{n+1}^2 = 1 \right\} \]
is an $n$-dimensional surface. Let $(a_1, \ldots, a_n, a_{n+1})$ be a point on $S^n$. Then at least one of the components $a_1, \ldots, a_n, a_{n+1}$ is nonzero. Without loss of generality, assume that $a_{n+1} > 0$. Let
\[ D = \left\{ (x_1, \ldots, x_n) \mid x_1^2 + \cdots + x_n^2 < 1 \right\}, \qquad U = \left\{ (x_1, \ldots, x_n, x_{n+1}) \mid x_{n+1} > 0 \right\}, \]
and define the mapping $\mathbf{G} : D \to U$ by
\[ \mathbf{G}(x_1, \ldots, x_n) = \left( x_1, \ldots, x_n, \sqrt{1 - x_1^2 - \cdots - x_n^2} \right). \]
Then $\mathbf{G}$ is a differentiable mapping, $\mathbf{G}(D) \subset S^n$ and $\mathbf{G}(D) \cap U = S^n \cap U$. Now,
\[ D\mathbf{G}(x_1, \ldots, x_n) = \begin{bmatrix} I_n \\ \mathbf{v} \end{bmatrix}, \]
where $\mathbf{v} = \nabla G_{n+1}(x_1, \ldots, x_n)$. Since the first $n$ rows of $D\mathbf{G}(x_1, \ldots, x_n)$ form the $n \times n$ identity matrix, it has rank $n$. Thus, $S^n$ is an $n$-dimensional surface.

Generalizing Example 5.13, we find that a large class of surfaces is provided by graphs of differentiable functions.

Theorem 5.12
Let $D$ be an open subset of $\mathbb{R}^n$, and let $g : D \to \mathbb{R}$ be a differentiable mapping. Then the graph of $g$ given by
\[ G_g = \left\{ (x_1, \ldots, x_n, x_{n+1}) \mid (x_1, \ldots, x_n) \in D,\ x_{n+1} = g(x_1, \ldots, x_n) \right\} \]
is an $n$-dimensional surface.

A hyperplane in $\mathbb{R}^{n+1}$ is the set of points in $\mathbb{R}^{n+1}$ which satisfy an equation of the form
\[ a_1 x_1 + \cdots + a_n x_n + a_{n+1} x_{n+1} = b, \]
where $\mathbf{a} = (a_1, \ldots, a_n, a_{n+1})$ is a nonzero vector in $\mathbb{R}^{n+1}$. By definition, if $\mathbf{u}$ and $\mathbf{v}$ are two points on the plane, then $\langle \mathbf{a}, \mathbf{u} - \mathbf{v} \rangle = 0$. This shows that $\mathbf{a}$ is a vector normal to the plane.
When $D$ is an open subset of $\mathbb{R}^n$, and $g : D \to \mathbb{R}$ is a differentiable mapping, the graph $G_g$ of $g$ is an $n$-dimensional surface. If $\mathbf{u} = (u_1, \ldots, u_n)$ is a point in $D$, then $(\mathbf{u}, g(\mathbf{u}))$ is a point on $G_g$, and we have seen that the equation of the tangent plane at the point $(\mathbf{u}, g(\mathbf{u}))$ is given by
\[ x_{n+1} = g(\mathbf{u}) + \sum_{i=1}^{n} \frac{\partial g}{\partial x_i}(\mathbf{u})(x_i - u_i). \]
The implicit function theorem gives the following.

Theorem 5.13
Let $O$ be an open subset of $\mathbb{R}^{n+1}$, and let $f : O \to \mathbb{R}$ be a continuously differentiable function. If $\mathbf{x}_0$ is a point in $O$ such that $f(\mathbf{x}_0) = 0$ and $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$, then there is a neighbourhood $U$ of $\mathbf{x}_0$ contained in $O$ such that, restricted to $U$, the set $f(\mathbf{x}) = 0$ is the graph of a continuously differentiable function $g : D \to \mathbb{R}$, and $\nabla f(\mathbf{x})$ is a vector normal to the tangent plane of the graph at the point $\mathbf{x}$.
Proof
Assume that $\mathbf{x}_0 = (a_1, \ldots, a_n, a_{n+1})$. Since $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$, there is a $1 \leq k \leq n + 1$ such that $\dfrac{\partial f}{\partial x_k}(\mathbf{x}_0) \neq 0$. Without loss of generality, assume that $k = n + 1$.
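Given a point $\mathbf{x} = (x_1, \ldots, x_n, x_{n+1})$ in $\mathbb{R}^{n+1}$, let $\mathbf{u} = (x_1, \ldots, x_n)$ so that $\mathbf{x} = (\mathbf{u}, x_{n+1})$. By the implicit function theorem, there is a neighbourhood $D$ of $\mathbf{u}_0 = (a_1, \ldots, a_n)$, an $r > 0$, and a continuously differentiable function $g : D \to \mathbb{R}$ such that if $U = D \times (a_{n+1} - r, a_{n+1} + r)$ and $(\mathbf{u}, u_{n+1}) \in U$, then $f(\mathbf{u}, u_{n+1}) = 0$ if and only if $u_{n+1} = g(\mathbf{u})$. In other words, in the neighbourhood $U$ of $\mathbf{x}_0 = (\mathbf{u}_0, a_{n+1})$, $f(\mathbf{u}, u_{n+1}) = 0$ if and only if $(\mathbf{u}, u_{n+1})$ is a point on the graph of the function $g$. The equation of the tangent plane at the point $(\mathbf{u}, u_{n+1})$ is
\[ x_{n+1} - u_{n+1} = \sum_{i=1}^{n} \frac{\partial g}{\partial x_i}(\mathbf{u})(x_i - u_i). \]
By the chain rule,
\[ \frac{\partial g}{\partial x_i}(\mathbf{u}) = -\frac{\dfrac{\partial f}{\partial x_i}(\mathbf{u}, u_{n+1})}{\dfrac{\partial f}{\partial x_{n+1}}(\mathbf{u}, u_{n+1})}. \]
Hence, the equation of the tangent plane can be rewritten as
\[ \sum_{i=1}^{n+1} (x_i - u_i) \frac{\partial f}{\partial x_i}(\mathbf{u}, u_{n+1}) = 0. \]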
This shows that $\nabla f(\mathbf{u}, u_{n+1})$ is a vector normal to the tangent plane.

Example 5.14
Find the equation of the tangent plane to the surface $x^2 + 4y^2 + 9z^2 = 49$ at the point $(6, 1, -1)$.
Solution
Let $f(x, y, z) = x^2 + 4y^2 + 9z^2$. Then $\nabla f(x, y, z) = (2x, 8y, 18z)$. It follows that $\nabla f(6, 1, -1) = 2(6, 4, -9)$. Hence, the equation of the tangent plane to the surface at $(6, 1, -1)$ is
\[ 6x + 4y - 9z = 36 + 4 + 9 = 49. \]
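The recipe used in Example 5.14 (take the gradient as the normal vector, then fix the constant using the given point) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `grad_f` and `tangent_plane` are hypothetical, the gradient is hard-coded for $f(x, y, z) = x^2 + 4y^2 + 9z^2$, and the point is assumed to have integer coordinates with nonzero gradient so that a common integer factor can be divided out.

```python
from functools import reduce
from math import gcd


def grad_f(x, y, z):
    """Gradient of f(x, y, z) = x^2 + 4y^2 + 9z^2."""
    return (2 * x, 8 * y, 18 * z)


def tangent_plane(point):
    """Return (normal, c) so that the plane is normal . (x, y, z) = c."""
    normal = grad_f(*point)
    c = sum(n * p for n, p in zip(normal, point))
    # Divide out the common integer factor to get the simplest equation.
    g = reduce(gcd, normal + (c,))
    return tuple(n // g for n in normal), c // g


normal, c = tangent_plane((6, 1, -1))
print(normal, c)  # prints (6, 4, -9) 49
```

The output corresponds to the plane $6x + 4y - 9z = 49$.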
Exercises 5.3
Question 1
Consider the equation
\[ 4yz^2 + 3xz^3 - 11xyz = 14. \]
Show that in a neighbourhood of $(-1, 1, 2)$, this equation defines $z$ as a function of $(x, y)$. If this function is denoted as $z = g(x, y)$, find $\nabla g(-1, 1)$.
Question 2
Consider the system of equations
\[ \begin{aligned} 2xu^2 + vyz + 3uv &= 2 \\ 5x + 7yzu - v^2 &= 1 \end{aligned} \]
(a) Show that when $(x, y, z) = (-1, 1, 1)$, $(u, v) = (1, 1)$ is a solution of this system.
(b) Show that there are neighbourhoods $U$ and $V$ of $(-1, 1, 1)$ and $(1, 1)$, and a continuously differentiable function $\mathbf{G} : U \to \mathbb{R}^2$ such that, if $(x, y, z, u, v) \in U \times V$, then $(x, y, z, u, v)$ is a solution of the system of equations above if and only if $u = G_1(x, y, z)$ and $v = G_2(x, y, z)$.
(c) Find the values of $\dfrac{\partial G_1}{\partial x}(-1, 1, 1)$, $\dfrac{\partial G_2}{\partial x}(-1, 1, 1)$ and $\dfrac{\partial G_2}{\partial z}(-1, 1, 1)$.
Question 3
Let $O$ be an open subset of $\mathbb{R}^{2n}$, and let $\mathbf{F} : O \to \mathbb{R}^n$ be a continuously differentiable function. Assume that $\mathbf{x}_0$ and $\mathbf{y}_0$ are points in $\mathbb{R}^n$ such that $(\mathbf{x}_0, \mathbf{y}_0)$ is a point in $O$, $\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) = \mathbf{0}$, and $D_{\mathbf{x}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0)$ and $D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0)$ are invertible. Show that there exist neighbourhoods $U$ and $V$ of $\mathbf{x}_0$ and $\mathbf{y}_0$, and a continuously differentiable bijective function $\mathbf{G} : U \to V$ such that, if $(\mathbf{x}, \mathbf{y})$ is in $U \times V$, $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ if and only if $\mathbf{y} = \mathbf{G}(\mathbf{x})$.
5.4 Extrema Problems and the Method of Lagrange Multipliers
Optimization problems are very important in our daily life and in mathematical sciences. Given a function $f : D \to \mathbb{R}$, we would like to know whether it has a maximum value or a minimum value. In Chapter 3, we discussed the extreme value theorem, which asserts that a continuous function that is defined on a compact set must have maximum and minimum values. In Chapter 4, we showed that if a function $f : D \to \mathbb{R}$ has a (local) extremum at an interior point $\mathbf{x}_0$ of its domain $D$ and it is differentiable at $\mathbf{x}_0$, then $\mathbf{x}_0$ must be a stationary point. Namely, $\nabla f(\mathbf{x}_0) = \mathbf{0}$. Combining these various results, we can formulate a strategy for solving a special type of optimization problem. Let us first consider the following example.

Example 5.15
Let $K = \left\{ (x, y) \mid x^2 + 4y^2 \leq 100 \right\}$, and let $f : K \to \mathbb{R}$ be the function defined as $f(x, y) = x^2 + y^2$. Find the maximum and minimum values of $f : K \to \mathbb{R}$, and the points where these values appear.
Solution
Let $g : \mathbb{R}^2 \to \mathbb{R}$ be the function defined as $g(x, y) = x^2 + 4y^2 - 100$. It is a polynomial function. Hence, it is continuous. Since $K = g^{-1}((-\infty, 0])$ and $(-\infty, 0]$ is closed in $\mathbb{R}$, $K$ is a closed set. By a previous exercise,
\[ O = \operatorname{int} K = \left\{ (x, y) \mid x^2 + 4y^2 < 100 \right\} \quad \text{and} \quad C = \operatorname{bd} K = \left\{ (x, y) \mid x^2 + 4y^2 = 100 \right\}. \]
For any $(x, y) \in K$,
\[ \|(x, y)\|^2 = x^2 + y^2 \leq x^2 + 4y^2 \leq 100. \]
Therefore, $K$ is bounded. Since $K$ is closed and bounded, and the function $f : K \to \mathbb{R}$, $f(x, y) = x^2 + y^2$ is continuous, the extreme value theorem says that $f$ has maximum and minimum values. These values appear either in $O$ or on $C$.
Since $f : O \to \mathbb{R}$ is differentiable, if $(x_0, y_0)$ is an extremizer of $f : O \to \mathbb{R}$, we must have $\nabla f(x_0, y_0) = (0, 0)$, which gives $(x_0, y_0) = (0, 0)$.
The other candidates for extremizers are on $C$. Therefore, we need to find the maximum and minimum values of $f(x, y) = x^2 + y^2$ subject to the constraint $x^2 + 4y^2 = 100$. From $x^2 + 4y^2 = 100$, we find that $x^2 = 100 - 4y^2$, and $y$ can only take values in the interval $[-5, 5]$. Hence, we want to find the maximum and minimum values of $h : [-5, 5] \to \mathbb{R}$,
\[ h(y) = 100 - 4y^2 + y^2 = 100 - 3y^2. \]
When $y = 0$, $h$ has maximum value 100, and when $y = \pm 5$, it has minimum value $100 - 3 \times 25 = 25$. Notice that when $y = 0$, $x = \pm 10$; while when $y = \pm 5$, $x = 0$. Hence, we have five candidates for the extremizers of $f$. Namely, $\mathbf{u}_1 = (0, 0)$, $\mathbf{u}_2 = (10, 0)$, $\mathbf{u}_3 = (-10, 0)$, $\mathbf{u}_4 = (0, 5)$ and $\mathbf{u}_5 = (0, -5)$. The function values at these 5 points are
\[ f(\mathbf{u}_1) = 0, \qquad f(\mathbf{u}_2) = f(\mathbf{u}_3) = 100, \qquad f(\mathbf{u}_4) = f(\mathbf{u}_5) = 25. \]
Therefore, the minimum value of $f : K \to \mathbb{R}$ is 0, and the maximum value is 100. The minimum value appears at the point $(0, 0) \in \operatorname{int} K$, while the maximum value appears at $(\pm 10, 0) \in \operatorname{bd} K$.

Example 5.15 gives a typical scenario of the optimization problems that we want to study in this section.
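The final comparison step of Example 5.15 can be reproduced numerically. The sketch below, written in Python as an illustration only, evaluates $f$ at the five candidates and then samples the boundary ellipse through the parametrization $x = 10\cos t$, $y = 5\sin t$ as a rough independent check. The parametrization and the sample size `N` are choices made here, not part of the solution above.

```python
import math


def f(x, y):
    return x * x + y * y


# Candidates found in Example 5.15: the stationary point in the interior
# and the extremizers of f on the boundary ellipse.
candidates = [(0, 0), (10, 0), (-10, 0), (0, 5), (0, -5)]
values = {p: f(*p) for p in candidates}
f_min, f_max = min(values.values()), max(values.values())
print(f_min, f_max)  # prints 0 100

# Rough sanity check: sample the boundary x = 10 cos t, y = 5 sin t.
N = 3600
boundary = [f(10 * math.cos(2 * math.pi * k / N),
              5 * math.sin(2 * math.pi * k / N))
            for k in range(N)]
print(min(boundary), max(boundary))
```

Up to floating point error, the sampled boundary values stay between 25 and 100, in agreement with the values of $h$ found above.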
Figure 5.7: The extreme values of $f(x, y) = x^2 + y^2$ on the sets $K = \{(x, y) \mid x^2 + 4y^2 \leq 100\}$ and $C = \{(x, y) \mid x^2 + 4y^2 = 100\}$.

Optimization Problem
Let $K$ be a compact subset of $\mathbb{R}^n$ with interior $O$, and let $f : K \to \mathbb{R}$ be a function continuous on $K$, differentiable on $O$. We want to find the maximum and minimum values of $f : K \to \mathbb{R}$.
(i) By the extreme value theorem, $f : K \to \mathbb{R}$ has maximum and minimum values.
(ii) Since $K$ is closed, $K$ is a disjoint union of its interior $O$ and its boundary $C$. Since $C$ is a subset of $K$, it is bounded. On the other hand, being the boundary of a set, $C$ is closed. Therefore, $C$ is compact.
(iii) The extreme values of $f$ can appear in $O$ or on $C$.
(iv) If $\mathbf{x}_0$ is an extremizer of $f : K \to \mathbb{R}$ and it is in $O$, we must have $\nabla f(\mathbf{x}_0) = \mathbf{0}$. Namely, $\mathbf{x}_0$ is a stationary point of $f : O \to \mathbb{R}$.
(v) If $\mathbf{x}_0$ is an extremizer of $f : K \to \mathbb{R}$ and it is not in $O$, it is an extremizer of $f : C \to \mathbb{R}$.
(vi) Since $C$ is compact, $f : C \to \mathbb{R}$ has maximum and minimum values.
Therefore, the steps to find the maximum and minimum values of $f : K \to \mathbb{R}$ are as follows.
Step 1 Find the stationary points of $f : O \to \mathbb{R}$.
Step 2 Find the extremizers of $f : C \to \mathbb{R}$.
Step 3 Compare the values of $f$ at the stationary points of $f : O \to \mathbb{R}$ and the extremizers of $f : C \to \mathbb{R}$ to determine the extreme values of $f : K \to \mathbb{R}$.
Of particular interest is the case where the boundary of $K$ can be expressed as $g(\mathbf{x}) = 0$, where $g : D \to \mathbb{R}$ is a continuously differentiable function defined on an open subset $D$ of $\mathbb{R}^n$. If $f$ is also defined and differentiable on $D$, the problem of finding the extreme values of $f : C \to \mathbb{R}$ becomes finding the extreme values of $f : D \to \mathbb{R}$ subject to the constraint $g(\mathbf{x}) = 0$. In Example 5.15, we used $g(\mathbf{x}) = 0$ to solve for one of the variables in terms of the others, and substituted the result into $f$ to transform the optimization problem into a problem with fewer variables. However, this strategy can be quite complicated because it is often not possible to solve for one variable in terms of the others explicitly from the constraint $g(\mathbf{x}) = 0$. The method of Lagrange multipliers provides a way to solve constraint optimization problems without having to explicitly solve for some variables in terms of the others. The validity of this method is justified by the implicit function theorem.

Theorem 5.14 The Method of Lagrange Multiplier (One Constraint)
Let $O$ be an open subset of $\mathbb{R}^{n+1}$ and let $f : O \to \mathbb{R}$ and $g : O \to \mathbb{R}$ be continuously differentiable functions defined on $O$. Consider the subset of $O$ defined as
\[ C = \{ \mathbf{x} \in O \mid g(\mathbf{x}) = 0 \}. \]
If $\mathbf{x}_0$ is an extremizer of the function $f : C \to \mathbb{R}$ and $\nabla g(\mathbf{x}_0) \neq \mathbf{0}$, then there is a constant $\lambda$, known as the Lagrange multiplier, such that
\[ \nabla f(\mathbf{x}_0) = \lambda \nabla g(\mathbf{x}_0). \]
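Proof
Without loss of generality, assume that $\mathbf{x}_0$ is a maximizer of $f : C \to \mathbb{R}$. Namely,
\[ f(\mathbf{x}) \leq f(\mathbf{x}_0) \quad \text{for all } \mathbf{x} \in C. \tag{5.9} \]
Given that $\nabla g(\mathbf{x}_0) \neq \mathbf{0}$, there exists a $1 \leq k \leq n + 1$ such that $\dfrac{\partial g}{\partial x_k}(\mathbf{x}_0) \neq 0$. Without loss of generality, assume that $k = n + 1$.
Let $\mathbf{x}_0 = (a_1, \ldots, a_n, a_{n+1})$. Given a point $\mathbf{x} = (x_1, \ldots, x_n, x_{n+1})$ in $\mathbb{R}^{n+1}$, let $\mathbf{u} = (x_1, \ldots, x_n)$ so that $\mathbf{x} = (\mathbf{u}, x_{n+1})$. By the implicit function theorem, there is a neighbourhood $D$ of $\mathbf{u}_0 = (a_1, \ldots, a_n)$, an $r > 0$, and a continuously differentiable function $h : D \to \mathbb{R}$ such that for $(\mathbf{u}, x_{n+1}) \in D \times (a_{n+1} - r, a_{n+1} + r)$, $g(\mathbf{u}, x_{n+1}) = 0$ if and only if $x_{n+1} = h(\mathbf{u})$. Consider the function $F : D \to \mathbb{R}$ defined as
\[ F(\mathbf{u}) = f(\mathbf{u}, h(\mathbf{u})). \]
By (5.9), we find that
\[ F(\mathbf{u}_0) \geq F(\mathbf{u}) \quad \text{for all } \mathbf{u} \in D. \]
In other words, $\mathbf{u}_0$ is a maximizer of the function $F : D \to \mathbb{R}$. Since $\mathbf{u}_0$ is an interior point of $D$ and $F : D \to \mathbb{R}$ is continuously differentiable, $\nabla F(\mathbf{u}_0) = \mathbf{0}$. Since $F(\mathbf{u}) = f(\mathbf{u}, h(\mathbf{u}))$, we find that for $1 \leq i \leq n$,
\[ \frac{\partial F}{\partial x_i}(\mathbf{u}_0) = \frac{\partial f}{\partial x_i}(\mathbf{u}_0, a_{n+1}) + \frac{\partial f}{\partial x_{n+1}}(\mathbf{u}_0, a_{n+1}) \frac{\partial h}{\partial x_i}(\mathbf{u}_0) = 0. \tag{5.10} \]
On the other hand, applying the chain rule to $g(\mathbf{u}, h(\mathbf{u})) = 0$ and setting $\mathbf{u} = \mathbf{u}_0$, we find that
\[ \frac{\partial g}{\partial x_i}(\mathbf{u}_0, a_{n+1}) + \frac{\partial g}{\partial x_{n+1}}(\mathbf{u}_0, a_{n+1}) \frac{\partial h}{\partial x_i}(\mathbf{u}_0) = 0 \quad \text{for } 1 \leq i \leq n. \tag{5.11} \]
By assumption, $\dfrac{\partial g}{\partial x_{n+1}}(\mathbf{x}_0) \neq 0$. Let
\[ \lambda = \frac{\dfrac{\partial f}{\partial x_{n+1}}(\mathbf{x}_0)}{\dfrac{\partial g}{\partial x_{n+1}}(\mathbf{x}_0)}. \]
Then
\[ \frac{\partial f}{\partial x_{n+1}}(\mathbf{x}_0) = \lambda \frac{\partial g}{\partial x_{n+1}}(\mathbf{x}_0). \tag{5.12} \]
Eqs. (5.10) and (5.11) show that for $1 \leq i \leq n$,
\[ \frac{\partial f}{\partial x_i}(\mathbf{x}_0) = -\lambda \frac{\partial g}{\partial x_{n+1}}(\mathbf{x}_0) \frac{\partial h}{\partial x_i}(\mathbf{u}_0) = \lambda \frac{\partial g}{\partial x_i}(\mathbf{x}_0). \tag{5.13} \]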
Eqs. (5.12) and (5.13) together imply that $\nabla f(\mathbf{x}_0) = \lambda \nabla g(\mathbf{x}_0)$. This completes the proof of the theorem.

Remark 5.5
Theorem 5.14 says that if $\mathbf{x}_0$ is an extremizer of the constraint optimization problem
\[ \max / \min f(\mathbf{x}) \quad \text{subject to } g(\mathbf{x}) = 0, \]
then the gradient of $f$ at $\mathbf{x}_0$ should be parallel to the gradient of $g$ at $\mathbf{x}_0$ if the latter is nonzero. One can refer to Figure 5.7 for an illustration. Recall that the gradient of $f$ gives the direction in which $f$ changes most rapidly, while the gradient of $g$ here represents the normal vector to the curve $g(\mathbf{x}) = 0$.
Using the method of Lagrange multiplier, there are $n + 2$ variables $x_1, \ldots, x_{n+1}$ and $\lambda$ to be solved. The equation $\nabla f(\mathbf{x}) = \lambda \nabla g(\mathbf{x})$ gives $n + 1$ equations, while the equation $g(\mathbf{x}) = 0$ gives one. Therefore, we need to solve for $n + 2$ variables from $n + 2$ equations.

Example 5.16
Let us solve the constraint optimization problem that appears in Example 5.15 using the Lagrange multiplier method. Let $f : \mathbb{R}^2 \to \mathbb{R}$ and $g : \mathbb{R}^2 \to \mathbb{R}$ be respectively the functions $f(x, y) = x^2 + y^2$ and $g(x, y) = x^2 + 4y^2 - 100$. They are both continuously differentiable. We want to find the maximum and minimum values of the function $f(x, y)$ subject to the constraint $g(x, y) = 0$. Notice that $\nabla g(x, y) = (2x, 8y)$ is the zero vector if and only if $(x, y) = (0, 0)$, but $(0, 0)$ is not on the curve $g(x, y) = 0$. Hence, for any $(x, y)$ satisfying $g(x, y) = 0$, $\nabla g(x, y) \neq \mathbf{0}$.
By the method of Lagrange multiplier, we need to find $(x, y)$ satisfying
\[ \nabla f(x, y) = \lambda \nabla g(x, y) \quad \text{and} \quad g(x, y) = 0. \]
Therefore,
\[ 2x = 2\lambda x, \qquad 2y = 8\lambda y. \]
This gives
\[ x(1 - \lambda) = 0, \qquad y(1 - 4\lambda) = 0. \]
The first equation says that either $x = 0$ or $\lambda = 1$. If $x = 0$, from $x^2 + 4y^2 = 100$, we must have $y = \pm 5$. If $\lambda = 1$, then $y(1 - 4\lambda) = 0$ implies that $y = 0$. From $x^2 + 4y^2 = 100$, we then obtain $x = \pm 10$. Hence, we find that the candidates for the extremizers are $(\pm 10, 0)$ and $(0, \pm 5)$. Since $f(\pm 10, 0) = 100$ and $f(0, \pm 5) = 25$, we conclude that subject to $x^2 + 4y^2 = 100$, the maximum value of $f(x, y) = x^2 + y^2$ is 100, and the minimum value of $f(x, y) = x^2 + y^2$ is 25.

Example 5.17
Use the Lagrange multiplier method to find the maximum and minimum values of the function $f(x, y, z) = 8x + 24y + 27z$ on the set
\[ S = \left\{ (x, y, z) \mid x^2 + 4y^2 + 9z^2 = 289 \right\}, \]
and the points where each of them appears.
Solution
Let $g : \mathbb{R}^3 \to \mathbb{R}$ be the function $g(x, y, z) = x^2 + 4y^2 + 9z^2 - 289$. The functions $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = 8x + 24y + 27z$ and $g : \mathbb{R}^3 \to \mathbb{R}$ are both continuously differentiable.
Notice that $\nabla g(x, y, z) = (2x, 8y, 18z) = \mathbf{0}$ if and only if $(x, y, z) = \mathbf{0}$, and $\mathbf{0}$ does not lie on $S$. By the Lagrange multiplier method, to find the maximum and minimum values of $f : S \to \mathbb{R}$, we need to solve the equations
\[ \nabla f(x, y, z) = \lambda \nabla g(x, y, z) \quad \text{and} \quad g(x, y, z) = 0. \]
These give
\[ 8 = 2\lambda x, \qquad 24 = 8\lambda y, \qquad 27 = 18\lambda z, \qquad x^2 + 4y^2 + 9z^2 = 289. \]
To satisfy the first three equations, none of $\lambda$, $x$, $y$ and $z$ can be zero. We find that
\[ x = \frac{4}{\lambda}, \qquad y = \frac{3}{\lambda}, \qquad z = \frac{3}{2\lambda}. \]
Substituting into the last equation, we have
\[ \frac{64 + 144 + 81}{4\lambda^2} = 289. \]
This gives $4\lambda^2 = 1$. Hence, $\lambda = \pm \dfrac{1}{2}$. When $\lambda = \dfrac{1}{2}$, $(x, y, z) = (8, 6, 3)$. When $\lambda = -\dfrac{1}{2}$, $(x, y, z) = (-8, -6, -3)$. These are the two candidates for the extremizers of $f : S \to \mathbb{R}$. Since $f(8, 6, 3) = 289$ and $f(-8, -6, -3) = -289$, we find that the maximum and minimum values of $f : S \to \mathbb{R}$ are 289 and $-289$ respectively. The maximum value appears at $(8, 6, 3)$, and the minimum value appears at $(-8, -6, -3)$.

Now we consider more general constraint optimization problems, which can have more than one constraint.
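Before turning to several constraints, it may help to see how the candidates of Example 5.17 can be checked mechanically against the two conditions $\nabla f = \lambda \nabla g$ and $g = 0$. The following Python sketch does this with exact rational arithmetic. The function `satisfies_lagrange` is a hypothetical helper written for this illustration, and it only verifies given candidates; it does not solve the system.

```python
from fractions import Fraction


def satisfies_lagrange(grad_f, grad_g, g, point, lam):
    """Check g(point) = 0 and grad f(point) = lam * grad g(point) exactly."""
    if g(*point) != 0:
        return False
    return all(df == lam * dg
               for df, dg in zip(grad_f(*point), grad_g(*point)))


# Example 5.17: f = 8x + 24y + 27z, g = x^2 + 4y^2 + 9z^2 - 289.
def f(x, y, z):
    return 8 * x + 24 * y + 27 * z


def grad_f(x, y, z):
    return (8, 24, 27)


def g(x, y, z):
    return x * x + 4 * y * y + 9 * z * z - 289


def grad_g(x, y, z):
    return (2 * x, 8 * y, 18 * z)


half = Fraction(1, 2)
for point, lam in [((8, 6, 3), half), ((-8, -6, -3), -half)]:
    print(point, satisfies_lagrange(grad_f, grad_g, g, point, lam), f(*point))
# prints (8, 6, 3) True 289 and then (-8, -6, -3) True -289
```

The same helper can be reused for Example 5.16 by swapping in the two-variable functions, for instance with the candidate $(0, 5)$ and $\lambda = \tfrac{1}{4}$.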
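Theorem 5.15 The Method of Lagrange Multiplier (General)
Let $O$ be an open subset of $\mathbb{R}^{m+n}$ and let $f : O \to \mathbb{R}$ and $\mathbf{G} : O \to \mathbb{R}^m$ be continuously differentiable functions defined on $O$. Consider the subset of $O$ defined as
\[ C = \{ \mathbf{x} \in O \mid \mathbf{G}(\mathbf{x}) = \mathbf{0} \}. \]
If $\mathbf{x}_0$ is an extremizer of the function $f : C \to \mathbb{R}$ and the matrix $D\mathbf{G}(\mathbf{x}_0)$ has (maximal) rank $m$, then there are constants $\lambda_1, \ldots, \lambda_m$, known as the Lagrange multipliers, such that
\[ \nabla f(\mathbf{x}_0) = \sum_{i=1}^{m} \lambda_i \nabla G_i(\mathbf{x}_0). \]
Proof
Without loss of generality, assume that $\mathbf{x}_0$ is a maximizer of $f : C \to \mathbb{R}$. Namely,
\[ f(\mathbf{x}) \leq f(\mathbf{x}_0) \quad \text{for all } \mathbf{x} \in C. \tag{5.14} \]
Given that the matrix $D\mathbf{G}(\mathbf{x}_0)$ has rank $m$, $m$ of its column vectors are linearly independent. Without loss of generality, assume that the column vectors in the last $m$ columns are linearly independent. Write a point $\mathbf{x}$ in $\mathbb{R}^{m+n}$ as $\mathbf{x} = (\mathbf{u}, \mathbf{v})$, where $\mathbf{u} = (u_1, \ldots, u_n)$ is in $\mathbb{R}^n$ and $\mathbf{v} = (v_1, \ldots, v_m)$ is in $\mathbb{R}^m$, and write $\mathbf{x}_0 = (\mathbf{u}_0, \mathbf{v}_0)$. By our assumption, $D_{\mathbf{v}}\mathbf{G}(\mathbf{u}_0, \mathbf{v}_0)$ is invertible. By the implicit function theorem, there is a neighbourhood $D$ of $\mathbf{u}_0$, a neighbourhood $V$ of $\mathbf{v}_0$, and a continuously differentiable function $\mathbf{H} : D \to \mathbb{R}^m$ such that for $(\mathbf{u}, \mathbf{v}) \in D \times V$, $\mathbf{G}(\mathbf{u}, \mathbf{v}) = \mathbf{0}$ if and only if $\mathbf{v} = \mathbf{H}(\mathbf{u})$. Consider the function $F : D \to \mathbb{R}$ defined as
\[ F(\mathbf{u}) = f(\mathbf{u}, \mathbf{H}(\mathbf{u})). \]
By (5.14), we find that
\[ F(\mathbf{u}_0) \geq F(\mathbf{u}) \quad \text{for all } \mathbf{u} \in D. \]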
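In other words, $\mathbf{u}_0$ is a maximizer of the function $F : D \to \mathbb{R}$. Since $\mathbf{u}_0$ is an interior point of $D$ and $F : D \to \mathbb{R}$ is continuously differentiable, $\nabla F(\mathbf{u}_0) = \mathbf{0}$. Since $F(\mathbf{u}) = f(\mathbf{u}, \mathbf{H}(\mathbf{u}))$, we find that
\[ \nabla F(\mathbf{u}_0) = D_{\mathbf{u}} f(\mathbf{u}_0, \mathbf{v}_0) + D_{\mathbf{v}} f(\mathbf{u}_0, \mathbf{v}_0)\, D\mathbf{H}(\mathbf{u}_0) = \mathbf{0}. \tag{5.15} \]
On the other hand, applying the chain rule to $\mathbf{G}(\mathbf{u}, \mathbf{H}(\mathbf{u})) = \mathbf{0}$ and setting $\mathbf{u} = \mathbf{u}_0$, we find that
\[ D_{\mathbf{u}}\mathbf{G}(\mathbf{u}_0, \mathbf{v}_0) + D_{\mathbf{v}}\mathbf{G}(\mathbf{u}_0, \mathbf{v}_0)\, D\mathbf{H}(\mathbf{u}_0) = 0. \tag{5.16} \]
Take
\[ \begin{bmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_m \end{bmatrix} = \boldsymbol{\lambda} = D_{\mathbf{v}} f(\mathbf{x}_0)\, D_{\mathbf{v}}\mathbf{G}(\mathbf{x}_0)^{-1}. \]
Then
\[ D_{\mathbf{v}} f(\mathbf{x}_0) = \boldsymbol{\lambda} D_{\mathbf{v}}\mathbf{G}(\mathbf{x}_0). \tag{5.17} \]
Eqs. (5.15) and (5.16) show that
\[ D_{\mathbf{u}} f(\mathbf{x}_0) = -\boldsymbol{\lambda} D_{\mathbf{v}}\mathbf{G}(\mathbf{x}_0)\, D\mathbf{H}(\mathbf{u}_0) = \boldsymbol{\lambda} D_{\mathbf{u}}\mathbf{G}(\mathbf{x}_0). \tag{5.18} \]
Eqs. (5.17) and (5.18) together imply that
\[ \nabla f(\mathbf{x}_0) = \boldsymbol{\lambda} D\mathbf{G}(\mathbf{x}_0) = \sum_{i=1}^{m} \lambda_i \nabla G_i(\mathbf{x}_0). \]
This completes the proof of the theorem.

In the general constraint optimization problem proposed in Theorem 5.15, there are $n + 2m$ variables $u_1, \ldots, u_n$, $v_1, \ldots, v_m$ and $\lambda_1, \ldots, \lambda_m$ to be solved. The components of
\[ \nabla f(\mathbf{x}) = \sum_{i=1}^{m} \lambda_i \nabla G_i(\mathbf{x}) \]
give $n + m$ equations, while the components of $\mathbf{G}(\mathbf{x}) = \mathbf{0}$ give $m$ equations. Hence, we have to solve for $n + 2m$ variables from $n + 2m$ equations. Let us look at an example.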
Example 5.18
Let $K$ be the subset of $\mathbb{R}^3$ given by
\[ K = \left\{ (x, y, z) \mid x^2 + y^2 \leq 4,\ x + y + z = 1 \right\}. \]
Find the maximum and minimum values of the function $f : K \to \mathbb{R}$, $f(x, y, z) = x + 3y + z$.
Solution
Notice that $K$ is the intersection of the two closed sets $K_1 = \{(x, y, z) \mid x^2 + y^2 \leq 4\}$ and $K_2 = \{(x, y, z) \mid x + y + z = 1\}$. Hence, $K$ is a closed set. If $(x, y, z)$ is in $K$, $x^2 + y^2 \leq 4$. Thus, $|x| \leq 2$, $|y| \leq 2$ and hence $|z| \leq 1 + |x| + |y| \leq 5$. This shows that $K$ is bounded. Since $K$ is closed and bounded, and $f : K \to \mathbb{R}$ is continuous, $f : K \to \mathbb{R}$ has maximum and minimum values.
Let
\[ D = \left\{ (x, y, z) \mid x^2 + y^2 < 4,\ x + y + z = 1 \right\}, \qquad C = \left\{ (x, y, z) \mid x^2 + y^2 = 4,\ x + y + z = 1 \right\}. \]
Then $K = C \cup D$. We can consider the extremizers of $f : D \to \mathbb{R}$ and $f : C \to \mathbb{R}$ separately. To find the extremizers of $f : D \to \mathbb{R}$, we can regard this as a constraint optimization problem where we want to find the extreme values of $f : O \to \mathbb{R}$, $f(x, y, z) = x + 3y + z$ on
\[ O = \left\{ (x, y, z) \mid x^2 + y^2 < 4 \right\}, \]
subject to the constraint $g(x, y, z) = 0$, where $g : O \to \mathbb{R}$ is the function $g(x, y, z) = x + y + z - 1$. Now $\nabla g(x, y, z) = (1, 1, 1) \neq \mathbf{0}$. Hence, at an extremizer, we must have $\nabla f(x, y, z) = \lambda \nabla g(x, y, z)$, which gives
\[ (1, 3, 1) = \lambda (1, 1, 1). \]
This says that the two vectors $(1, 3, 1)$ and $(1, 1, 1)$ must be parallel, which is a contradiction. Hence, $f : D \to \mathbb{R}$ does not have extremizers.
Now, to find the extremizers of $f : C \to \mathbb{R}$, we can consider it as finding the extreme values of $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = x + 3y + z$, subject to $\mathbf{G}(x, y, z) = \mathbf{0}$, where
\[ \mathbf{G}(x, y, z) = (x^2 + y^2 - 4,\ x + y + z - 1). \]
Now
\[ D\mathbf{G}(x, y, z) = \begin{bmatrix} 2x & 2y & 0 \\ 1 & 1 & 1 \end{bmatrix}. \]
This matrix has rank less than 2 if and only if $(2x, 2y, 0)$ is parallel to $(1, 1, 1)$, which gives $x = y = 0$. But no point with $x = y = 0$ is on $C$, since $x^2 + y^2 = 4$ on $C$. Therefore, $D\mathbf{G}(x, y, z)$ has maximal rank for every $(x, y, z) \in C$. Using the Lagrange multiplier method, to solve for the extremizers of $f : C \to \mathbb{R}$, we need to solve the system
\[ \nabla f(x, y, z) = \lambda \nabla G_1(x, y, z) + \mu \nabla G_2(x, y, z), \qquad \mathbf{G}(x, y, z) = \mathbf{0}. \]
These give
\[ 1 = 2\lambda x + \mu, \qquad 3 = 2\lambda y + \mu, \qquad 1 = \mu, \qquad x^2 + y^2 = 4, \qquad x + y + z = 1. \]
From $\mu = 1$, we have $2\lambda x = 0$ and $2\lambda y = 2$. The latter implies that $\lambda \neq 0$. Hence, we must have $x = 0$. Then $x^2 + y^2 = 4$ gives $y = \pm 2$. When $(x, y) = (0, 2)$, $z = -1$. When $(x, y) = (0, -2)$, $z = 3$. Hence, we only have two candidates for extremizers, which are $(0, 2, -1)$ and $(0, -2, 3)$. Since
\[ f(0, 2, -1) = 5, \qquad f(0, -2, 3) = -3, \]
we find that $f : K \to \mathbb{R}$ has maximum value 5 at the point $(0, 2, -1)$, and minimum value $-3$ at the point $(0, -2, 3)$.
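The two candidates of Example 5.18 can be checked against the two-constraint Lagrange condition $\nabla f = \lambda \nabla G_1 + \mu \nabla G_2$ with a short Python sketch. The helper names `on_C` and `lagrange_holds` are illustrative only, and the multiplier values are read off from $2\lambda y = 2$ and $\mu = 1$, which gives $\lambda = 1/y$ at each candidate.

```python
from fractions import Fraction


def f(x, y, z):
    return x + 3 * y + z


def on_C(x, y, z):
    """Both constraints G1 = x^2 + y^2 - 4 and G2 = x + y + z - 1 vanish."""
    return x * x + y * y - 4 == 0 and x + y + z - 1 == 0


def lagrange_holds(point, lam, mu):
    """Check grad f = lam * grad G1 + mu * grad G2 at a point of C."""
    x, y, z = point
    grad_f = (1, 3, 1)
    grad_G1 = (2 * x, 2 * y, 0)
    grad_G2 = (1, 1, 1)
    return on_C(x, y, z) and all(
        a == lam * b + mu * c for a, b, c in zip(grad_f, grad_G1, grad_G2))


# Candidates with their multipliers (lam = 1/y, mu = 1).
candidates = [((0, 2, -1), Fraction(1, 2), 1),
              ((0, -2, 3), Fraction(-1, 2), 1)]
for point, lam, mu in candidates:
    print(point, lagrange_holds(point, lam, mu), f(*point))
# prints (0, 2, -1) True 5 and then (0, -2, 3) True -3
```

As a cross-check that does not use multipliers, on the plane $x + y + z = 1$ one has $f = 1 + 2y$, and $|y| \leq 2$ on $K$, which gives the same extreme values 5 and $-3$.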
Exercises 5.4
Question 1
Find the extreme values of the function $f(x, y, z) = 4x^2 + y^2 + yz + z^2$ on the set
\[ S = \left\{ (x, y, z) \mid 2x^2 + y^2 + z^2 \leq 8 \right\}. \]
Question 2
Find the points in the set $S = \left\{ (x, y) \mid 4x^2 + y^2 \leq 36,\ x^2 + 4y^2 \geq 4 \right\}$ that are closest to and farthest from the point $(1, 0)$.
Question 3
Use the Lagrange multiplier method to find the maximum and minimum values of the function $f(x, y, z) = x + 2y - z$ on the set
\[ S = \left\{ (x, y, z) \mid x^2 + y^2 + 4z^2 \leq 84 \right\}, \]
and the points where each of them appears.
Question 4
Find the extreme values of the function $f(x, y, z) = x$ on the set
\[ S = \left\{ (x, y, z) \mid x^2 = y^2 + z^2,\ 7x + 3y + 4z = 60 \right\}. \]
Question 5
Let $K$ be the subset of $\mathbb{R}^3$ given by
\[ K = \left\{ (x, y, z) \mid 4x^2 + z^2 \leq 68,\ y + z = 12 \right\}. \]
Find the maximum and minimum values of the function $f : K \to \mathbb{R}$, $f(x, y, z) = x + 2y$.
Question 6
Let $A$ be an $n \times n$ symmetric matrix, and let $Q_A : \mathbb{R}^n \to \mathbb{R}$ be the quadratic form $Q_A(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$ defined by $A$. Show that the minimum and maximum values of $Q_A : S^{n-1} \to \mathbb{R}$ on the unit sphere $S^{n-1}$ are the smallest and largest eigenvalues of $A$.
Chapter 6. Multiple Integrals
343
Chapter 6 Multiple Integrals
For single variable functions, we have discussed the Riemann integrability of a function $f : [a, b] \to \mathbb{R}$ defined on a compact interval $[a, b]$. In this chapter, we consider the theory of Riemann integrals for multivariable functions. For a function $\mathbf{F} : D \to \mathbb{R}^m$ that takes values in $\mathbb{R}^m$ with $m \geq 2$, we define the integral componentwise. Namely, we say that the function $\mathbf{F} : D \to \mathbb{R}^m$ is Riemann integrable if and only if each of the component functions $F_j : D \to \mathbb{R}$, $1 \leq j \leq m$, is Riemann integrable, and we define
\[ \int_D \mathbf{F} = \left( \int_D F_1, \int_D F_2, \ldots, \int_D F_m \right). \]
Thus, in this chapter, we will only discuss the theory of integration for functions $f : D \to \mathbb{R}$ that take values in $\mathbb{R}$.
A direct generalization of a compact interval $[a, b]$ to $\mathbb{R}^n$ is a product of compact intervals $I = \prod_{i=1}^{n} [a_i, b_i]$, which is a closed rectangle. In this chapter, when we say $I$ is a rectangle, it means $I$ can be written as $\prod_{i=1}^{n} [a_i, b_i]$ with $a_i < b_i$ for all $1 \leq i \leq n$. The edges of $I = \prod_{i=1}^{n} [a_i, b_i]$ are $[a_1, b_1]$, $[a_2, b_2]$, $\ldots$, $[a_n, b_n]$.
We first discuss the integration theory of functions defined on closed rectangles of the form $\prod_{i=1}^{n} [a_i, b_i]$. For applications, we need to consider functions defined on other subsets $D$ of $\mathbb{R}^n$.
One of the most useful theoretical tools for evaluating single integrals is the fundamental theorem of calculus. To apply this tool to multiple integrals, we need to consider iterated integrals. Another useful tool is the change of variables formula. For multivariable functions, the change of variables theorem is much more complicated. Nevertheless, we will discuss these in this chapter.
6.1 Riemann Integrals
In this section, we define the Riemann integral of a function $f : D \to \mathbb{R}$ defined on a subset $D$ of $\mathbb{R}^n$. We first consider the case where $D = \prod_{i=1}^{n} [a_i, b_i]$.
Let us first consider partitions. We say that $P = \{x_0, x_1, \ldots, x_k\}$ is a partition of the interval $[a, b]$ if $a = x_0 < x_1 < \cdots < x_{k-1} < x_k = b$. It divides $[a, b]$ into $k$ subintervals $J_1, \ldots, J_k$, where $J_i = [x_{i-1}, x_i]$.

Definition 6.1 Partitions
A partition $\mathbf{P}$ of a closed rectangle $I = \prod_{i=1}^{n} [a_i, b_i]$ is achieved by having a partition $P_i$ of $[a_i, b_i]$ for each $1 \leq i \leq n$. We write $\mathbf{P} = (P_1, P_2, \ldots, P_n)$ for such a partition. The partition $\mathbf{P}$ divides the rectangle $I$ into a collection $\mathcal{J}_{\mathbf{P}}$ of rectangles, any two of which have disjoint interiors. A closed rectangle $J$ in $\mathcal{J}_{\mathbf{P}}$ can be written as
\[ J = J_1 \times J_2 \times \cdots \times J_n, \]
where $J_i$, $1 \leq i \leq n$, is a subinterval in the partition $P_i$.

If the partition $P_i$ divides $[a_i, b_i]$ into $k_i$ subintervals, then the partition $\mathbf{P} = (P_1, \ldots, P_n)$ divides the rectangle $I = \prod_{i=1}^{n} [a_i, b_i]$ into $|\mathcal{J}_{\mathbf{P}}| = k_1 k_2 \cdots k_n$ rectangles.

Example 6.1
Consider the rectangle $I = [-2, 9] \times [1, 6]$. Let $P_1 = \{-2, 0, 4, 9\}$ and $P_2 = \{1, 3, 6\}$. The partition $P_1$ divides the interval $I_1 = [-2, 9]$ into the three subintervals $[-2, 0]$, $[0, 4]$ and $[4, 9]$. The partition $P_2$ divides the interval $I_2 = [1, 6]$ into the two subintervals $[1, 3]$ and $[3, 6]$. Therefore, the partition $\mathbf{P} = (P_1, P_2)$ divides the rectangle $I$ into the following six rectangles.
\[ [-2, 0] \times [1, 3], \quad [0, 4] \times [1, 3], \quad [4, 9] \times [1, 3], \quad [-2, 0] \times [3, 6], \quad [0, 4] \times [3, 6], \quad [4, 9] \times [3, 6]. \]
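The construction in Definition 6.1 is a Cartesian product of the subintervals of each $P_i$, which can be sketched directly in Python. This is an illustration under simple assumptions: each $P_i$ is given as a sorted list of numbers, and the names `subintervals` and `rectangles` are chosen here and do not come from the text.

```python
from itertools import product


def subintervals(points):
    """Consecutive pairs [x_{i-1}, x_i] of a partition given as a sorted list."""
    return list(zip(points[:-1], points[1:]))


def rectangles(partition):
    """All rectangles J_1 x ... x J_n determined by P = (P_1, ..., P_n)."""
    return list(product(*(subintervals(P_i) for P_i in partition)))


# The partition of [-2, 9] x [1, 6] from Example 6.1.
P = ([-2, 0, 4, 9], [1, 3, 6])
for J in rectangles(P):
    print(" x ".join(f"[{a}, {b}]" for a, b in J))
print(len(rectangles(P)))  # 3 * 2 = 6 rectangles
```

The rectangles are listed in a different order from Example 6.1, since `itertools.product` varies the last factor fastest, but the collection is the same.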
Figure 6.1: A partition of the rectangle $[-2, 9] \times [1, 6]$ given in Example 6.1.

Definition 6.2 Regular and Uniformly Regular Partitions
Let $I = \prod_{i=1}^{n} [a_i, b_i]$ be a rectangle in $\mathbb{R}^n$. We say that $\mathbf{P} = (P_1, \ldots, P_n)$ is a regular partition of $I$ if for each $1 \leq i \leq n$, $P_i$ is a regular partition of $[a_i, b_i]$ into $k_i$ intervals. We say that $\mathbf{P}$ is a uniformly regular partition of $I$ into $k^n$ rectangles if for each $1 \leq i \leq n$, $P_i$ is a regular partition of $[a_i, b_i]$ into $k$ intervals.

Example 6.2
Consider the rectangle $I = [-2, 7] \times [-4, 8]$.
(a) The partition $\mathbf{P} = (P_1, P_2)$ where $P_1 = \{-2, 1, 4, 7\}$ and $P_2 = \{-4, -1, 2, 5, 8\}$ is a regular partition of $I$.
(b) The partition $\mathbf{P} = (P_1, P_2)$ where $P_1 = \{-2, 1, 4, 7\}$ and $P_2 = \{-4, 0, 4, 8\}$ is a uniformly regular partition of $I$ into $3^2 = 9$ rectangles.

The length of an interval $[a, b]$ is $b - a$. The area of a rectangle $[a, b] \times [c, d]$ is $(b - a) \times (d - c)$. In general, we define the volume of a closed rectangle of the form $I = \prod_{i=1}^{n} [a_i, b_i]$ in $\mathbb{R}^n$ as follows.
Figure 6.2: A regular and a uniformly regular partition of $[-2, 7] \times [-4, 8]$ discussed in Example 6.2.

Definition 6.3 Volume of a Rectangle
The volume of the closed rectangle $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ is defined as the product of the lengths of all its edges. Namely,
$$\operatorname{vol}(\mathbf{I}) = \prod_{i=1}^{n} (b_i - a_i).$$
Example 6.3
The volume of the rectangle $\mathbf{I} = [-2, 9] \times [1, 6]$ is $\operatorname{vol}(\mathbf{I}) = 11 \times 5 = 55$.

When $P = \{x_0, x_1, \ldots, x_k\}$ is a partition of $[a, b]$, it divides $[a, b]$ into $k$ subintervals $J_1, \ldots, J_k$, where $J_i = [x_{i-1}, x_i]$. Notice that
$$\sum_{i=1}^{k} \operatorname{vol}(J_i) = \sum_{i=1}^{k} (x_i - x_{i-1}) = b - a.$$
Assume that $\mathbf{P} = (P_1, \ldots, P_n)$ is a partition of the rectangle $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ in $\mathbb{R}^n$. Then for $1 \le i \le n$, $P_i$ is a partition of $[a_i, b_i]$. If $P_i$ divides $[a_i, b_i]$ into the $k_i$
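subintervals $J_{i,1}, J_{i,2}, \ldots, J_{i,k_i}$, then the collection of rectangles in the partition $\mathbf{P}$ is
$$\mathcal{J}_{\mathbf{P}} = \left\{ J_{1,m_1} \times \cdots \times J_{n,m_n} \mid 1 \le m_i \le k_i \text{ for } 1 \le i \le n \right\}.$$
Notice that
$$\operatorname{vol}(J_{1,m_1} \times \cdots \times J_{n,m_n}) = \operatorname{vol}(J_{1,m_1}) \times \cdots \times \operatorname{vol}(J_{n,m_n}).$$
From this, we obtain the sum of volumes formula:
$$\begin{aligned}
\sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} \operatorname{vol}(\mathbf{J}) &= \sum_{m_n=1}^{k_n} \cdots \sum_{m_1=1}^{k_1} \operatorname{vol}(J_{1,m_1}) \times \cdots \times \operatorname{vol}(J_{n,m_n}) \\
&= \left[ \sum_{m_1=1}^{k_1} \operatorname{vol}(J_{1,m_1}) \right] \times \cdots \times \left[ \sum_{m_n=1}^{k_n} \operatorname{vol}(J_{n,m_n}) \right] \\
&= (b_1 - a_1) \times \cdots \times (b_n - a_n) = \operatorname{vol}(\mathbf{I}).
\end{aligned}$$

Proposition 6.1
Let $\mathbf{P}$ be a partition of $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$. Then the sum of the volumes of the rectangles $\mathbf{J}$ in the partition $\mathbf{P}$ is equal to the volume of the rectangle $\mathbf{I}$.

One of the motivations to define the integral $\int_{\mathbf{I}} f$ for a nonnegative function $f : \mathbf{I} \to \mathbb{R}$ is to find the volume bounded between the graph of $f$ and the rectangle $\mathbf{I}$ in $\mathbb{R}^{n+1}$. To find the volume, we partition $\mathbf{I}$ into small rectangles, pick a point $\boldsymbol{\xi}_{\mathbf{J}}$ in each of these rectangles $\mathbf{J}$, and approximate the function on $\mathbf{J}$ as a constant given by the value $f(\boldsymbol{\xi}_{\mathbf{J}})$. The volume between the rectangle $\mathbf{J}$ and the graph of $f$ over $\mathbf{J}$ is then approximated by $f(\boldsymbol{\xi}_{\mathbf{J}}) \operatorname{vol}(\mathbf{J})$. This leads us to the concept of Riemann sums.

If $\mathbf{P}$ is a partition of $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, we say that $A$ is a set of intermediate points for the partition $\mathbf{P}$ if $A = \{\boldsymbol{\xi}_{\mathbf{J}} \mid \mathbf{J} \in \mathcal{J}_{\mathbf{P}}\}$ is a subset of $\mathbf{I}$ indexed by $\mathcal{J}_{\mathbf{P}}$, such that $\boldsymbol{\xi}_{\mathbf{J}} \in \mathbf{J}$ for each $\mathbf{J} \in \mathcal{J}_{\mathbf{P}}$.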
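Definition 6.4 Riemann Sums
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a function defined on $\mathbf{I}$. Given a partition $\mathbf{P}$ of $\mathbf{I}$, a set $A = \{\boldsymbol{\xi}_{\mathbf{J}} \mid \mathbf{J} \in \mathcal{J}_{\mathbf{P}}\}$ of intermediate points for the partition $\mathbf{P}$, the Riemann sum of $f$ with respect to the partition $\mathbf{P}$ and the set of intermediate points $A = \{\boldsymbol{\xi}_{\mathbf{J}}\}$ is the sum
$$R(f, \mathbf{P}, A) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} f(\boldsymbol{\xi}_{\mathbf{J}}) \operatorname{vol}(\mathbf{J}).$$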
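Example 6.4
Let $\mathbf{I} = [-2, 9] \times [1, 6]$, and let $\mathbf{P} = (P_1, P_2)$ be the partition of $\mathbf{I}$ with $P_1 = \{-2, 0, 4, 9\}$ and $P_2 = \{1, 3, 6\}$. Let $f : \mathbf{I} \to \mathbb{R}$ be the function defined as $f(x, y) = x^2 + y$. Consider a set of intermediate points $A$ as follows.

| $\mathbf{J}$ | $\boldsymbol{\xi}_{\mathbf{J}}$ | $f(\boldsymbol{\xi}_{\mathbf{J}})$ | $\operatorname{vol}(\mathbf{J})$ |
|---|---|---|---|
| $[-2, 0] \times [1, 3]$ | $(-1, 1)$ | 2 | 4 |
| $[-2, 0] \times [3, 6]$ | $(0, 3)$ | 3 | 6 |
| $[0, 4] \times [1, 3]$ | $(1, 1)$ | 2 | 8 |
| $[0, 4] \times [3, 6]$ | $(2, 4)$ | 8 | 12 |
| $[4, 9] \times [1, 3]$ | $(4, 2)$ | 18 | 10 |
| $[4, 9] \times [3, 6]$ | $(9, 3)$ | 84 | 15 |

The Riemann sum $R(f, \mathbf{P}, A)$ is equal to
$$2 \times 4 + 3 \times 6 + 2 \times 8 + 8 \times 12 + 18 \times 10 + 84 \times 15 = 1578.$$

Example 6.5
If $f : \mathbf{I} \to \mathbb{R}$ is the constant function $f(\mathbf{x}) = c$, then for any partition $\mathbf{P}$ of $\mathbf{I}$ and any set of intermediate points $A = \{\boldsymbol{\xi}_{\mathbf{J}}\}$,
$$R(f, \mathbf{P}, A) = c \operatorname{vol}(\mathbf{I}).$$
When $c > 0$, this is the volume of the rectangle $\mathbf{I} \times [0, c]$ in $\mathbb{R}^{n+1}$.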
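The arithmetic of Example 6.4 can be checked with a short computation. This is a sketch under the assumption that each rectangle is stored as a tuple of `(left, right)` pairs and that the intermediate points are supplied in a dictionary; the names `vol`, `tagged` and `riemann_sum` are illustrative only.

```python
from math import prod


def vol(J):
    """Volume of a rectangle given as a tuple of (left, right) pairs."""
    return prod(b - a for a, b in J)


def f(x, y):
    return x**2 + y


# Rectangles of the partition in Example 6.4, each with its intermediate point.
tagged = {
    ((-2, 0), (1, 3)): (-1, 1),
    ((-2, 0), (3, 6)): (0, 3),
    ((0, 4), (1, 3)): (1, 1),
    ((0, 4), (3, 6)): (2, 4),
    ((4, 9), (1, 3)): (4, 2),
    ((4, 9), (3, 6)): (9, 3),
}

# Every intermediate point must lie in its own rectangle.
for J, xi in tagged.items():
    assert all(a <= t <= b for (a, b), t in zip(J, xi))

riemann_sum = sum(f(*xi) * vol(J) for J, xi in tagged.items())
print(riemann_sum)
```

Changing the intermediate points in `tagged` changes the value of the sum, which is why the definition of a Riemann sum records the set $A$ explicitly.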
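As in the single variable case, Darboux sums provide bounds for Riemann sums.

Definition 6.5 Darboux Sums
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. Given a partition $\mathbf{P}$ of $\mathbf{I}$, let $\mathcal{J}_{\mathbf{P}}$ be the collection of rectangles in the partition $\mathbf{P}$. For each $\mathbf{J}$ in $\mathcal{J}_{\mathbf{P}}$, let
$$m_{\mathbf{J}} = \inf \{ f(\mathbf{x}) \mid \mathbf{x} \in \mathbf{J} \} \quad \text{and} \quad M_{\mathbf{J}} = \sup \{ f(\mathbf{x}) \mid \mathbf{x} \in \mathbf{J} \}.$$
The Darboux lower sum $L(f, \mathbf{P})$ and the Darboux upper sum $U(f, \mathbf{P})$ are defined as
$$L(f, \mathbf{P}) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} m_{\mathbf{J}} \operatorname{vol}(\mathbf{J}) \quad \text{and} \quad U(f, \mathbf{P}) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} M_{\mathbf{J}} \operatorname{vol}(\mathbf{J}).$$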
Example 6.6 If f : I → R is the constant function f (x) = c, then L(f, P) = c vol (I) = U (f, P)
for any partition P of I.
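Example 6.7
Consider the function $f : \mathbf{I} \to \mathbb{R}$, $f(x, y) = x^2 + y$ defined in Example 6.4, where $\mathbf{I} = [-2, 9] \times [1, 6]$. For the partition $\mathbf{P} = (P_1, P_2)$ with $P_1 = \{-2, 0, 4, 9\}$ and $P_2 = \{1, 3, 6\}$, we have the following.

| $\mathbf{J}$ | $m_{\mathbf{J}}$ | $M_{\mathbf{J}}$ | $\operatorname{vol}(\mathbf{J})$ |
|---|---|---|---|
| $[-2, 0] \times [1, 3]$ | $0^2 + 1 = 1$ | $(-2)^2 + 3 = 7$ | 4 |
| $[-2, 0] \times [3, 6]$ | $0^2 + 3 = 3$ | $(-2)^2 + 6 = 10$ | 6 |
| $[0, 4] \times [1, 3]$ | $0^2 + 1 = 1$ | $4^2 + 3 = 19$ | 8 |
| $[0, 4] \times [3, 6]$ | $0^2 + 3 = 3$ | $4^2 + 6 = 22$ | 12 |
| $[4, 9] \times [1, 3]$ | $4^2 + 1 = 17$ | $9^2 + 3 = 84$ | 10 |
| $[4, 9] \times [3, 6]$ | $4^2 + 3 = 19$ | $9^2 + 6 = 87$ | 15 |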
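Therefore, the Darboux lower sum is
$$L(f, \mathbf{P}) = 1 \times 4 + 3 \times 6 + 1 \times 8 + 3 \times 12 + 17 \times 10 + 19 \times 15 = 521,$$
while the Darboux upper sum is
$$U(f, \mathbf{P}) = 7 \times 4 + 10 \times 6 + 19 \times 8 + 22 \times 12 + 84 \times 10 + 87 \times 15 = 2649.$$

Notice that we can only define Darboux sums if the function $f : \mathbf{I} \to \mathbb{R}$ is bounded. This means that there are constants $m$ and $M$ such that
$$m \le f(\mathbf{x}) \le M \quad \text{for all } \mathbf{x} \in \mathbf{I}.$$
If $\mathbf{P}$ is a partition of the rectangle $\mathbf{I}$, $\mathbf{J}$ is a rectangle in the partition $\mathbf{P}$, and $\boldsymbol{\xi}_{\mathbf{J}}$ is a point in $\mathbf{J}$, then
$$m \le m_{\mathbf{J}} \le f(\boldsymbol{\xi}_{\mathbf{J}}) \le M_{\mathbf{J}} \le M.$$
Multiplying throughout by $\operatorname{vol}(\mathbf{J})$ and summing over $\mathbf{J} \in \mathcal{J}_{\mathbf{P}}$, we obtain the following.

Proposition 6.2
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. If
$$m \le f(\mathbf{x}) \le M \quad \text{for all } \mathbf{x} \in \mathbf{I},$$
then for any partition $\mathbf{P}$ of $\mathbf{I}$, and for any choice of intermediate points $A = \{\boldsymbol{\xi}_{\mathbf{J}}\}$ for the partition $\mathbf{P}$, we have
$$m \operatorname{vol}(\mathbf{I}) \le L(f, \mathbf{P}) \le R(f, \mathbf{P}, A) \le U(f, \mathbf{P}) \le M \operatorname{vol}(\mathbf{I}).$$

To study the behaviour of the Darboux sums when we modify the partitions, we first extend the concept of refinement of a partition to rectangles in $\mathbb{R}^n$. Recall that if $P$ and $P^*$ are partitions of the interval $[a, b]$, $P^*$ is a refinement of $P$ if each partition point of $P$ is also a partition point of $P^*$.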
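As a numerical check of Proposition 6.2 on the data of Examples 6.4 and 6.7, the Darboux sums of $f(x, y) = x^2 + y$ can be computed directly. The sketch below is specific to this function: it assumes that the infimum and supremum over a rectangle split into those of $x^2$ on the first edge and of $y$ on the second edge. The helper names are hypothetical.

```python
from itertools import product


def subintervals(points):
    """Consecutive subintervals cut out by sorted partition points."""
    return list(zip(points[:-1], points[1:]))


def inf_sup_square(a, b):
    """Infimum and supremum of x**2 on the interval [a, b]."""
    lo = 0 if a <= 0 <= b else min(a * a, b * b)
    hi = max(a * a, b * b)
    return lo, hi


P1, P2 = [-2, 0, 4, 9], [1, 3, 6]

lower = upper = 0
for (a, b), (c, d) in product(subintervals(P1), subintervals(P2)):
    lo, hi = inf_sup_square(a, b)
    m_J = lo + c  # y is increasing, so its infimum on [c, d] is c
    M_J = hi + d  # and its supremum on [c, d] is d
    v = (b - a) * (d - c)
    lower += m_J * v
    upper += M_J * v

# Global bounds m <= f <= M on I = [-2, 9] x [1, 6], and vol(I).
m, M, vol_I = 0 + 1, 81 + 6, 11 * 5
riemann = 1578  # the Riemann sum found in Example 6.4

print(lower, upper)
print(m * vol_I <= lower <= riemann <= upper <= M * vol_I)
```

The last line prints whether the chain of inequalities in Proposition 6.2 holds for this partition and this choice of intermediate points.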
Definition 6.6 Refinement of a Partition
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $\mathbf{P} = (P_1, \ldots, P_n)$ and $\mathbf{P}^* = (P_1^*, \ldots, P_n^*)$ be partitions of $\mathbf{I}$. We say that $\mathbf{P}^*$ is a refinement of $\mathbf{P}$ if for each $1 \le i \le n$, $P_i^*$ is a refinement of $P_i$.

Figure 6.3: A refinement of the partition of the rectangle $[-2, 9] \times [1, 6]$ given in Figure 6.1.

Example 6.8
Let us consider the partition $\mathbf{P} = (P_1, P_2)$ of the rectangle $\mathbf{I} = [-2, 9] \times [1, 6]$ given in Example 6.1, with $P_1 = \{-2, 0, 4, 9\}$ and $P_2 = \{1, 3, 6\}$. Let $P_1^* = \{-2, 0, 1, 4, 6, 9\}$ and $P_2^* = \{1, 3, 4, 6\}$. Then $\mathbf{P}^* = (P_1^*, P_2^*)$ is a refinement of $\mathbf{P}$.

If the partition $\mathbf{P}^*$ is a refinement of the partition $\mathbf{P}$, then for each $\mathbf{J}$ in $\mathcal{J}_{\mathbf{P}}$, $\mathbf{P}^*$ induces a partition of $\mathbf{J}$, which we denote by $\mathbf{P}^*(\mathbf{J})$.

Example 6.9
The partition $\mathbf{P}^*$ in Example 6.8 induces the partition $\mathbf{P}^*(\mathbf{J}) = (P_1^*(\mathbf{J}), P_2^*(\mathbf{J}))$ of the rectangle $\mathbf{J} = [0, 4] \times [3, 6]$, where $P_1^*(\mathbf{J}) = \{0, 1, 4\}$ and $P_2^*(\mathbf{J}) = \{3, 4, 6\}$. The partition $\mathbf{P}^*(\mathbf{J})$ divides the rectangle $\mathbf{J}$ into 4 rectangles, as shown in Figure 6.3.
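Examples 6.8 and 6.9 can be verified with a few lines of code. This is only an illustrative sketch: the function names `is_refinement` and `induced_partition` are not from the text, and partitions are assumed to be tuples of sorted lists of partition points.

```python
def is_refinement(P_star, P):
    """True if every partition point of P_i also appears in P_i*, for each i."""
    return all(set(Pi) <= set(Pi_star) for Pi, Pi_star in zip(P, P_star))


def induced_partition(P_star, J):
    """Partition P*(J) induced on the rectangle J = [a_1, b_1] x ... x [a_n, b_n]."""
    return tuple(
        [x for x in Pi_star if a <= x <= b]
        for Pi_star, (a, b) in zip(P_star, J)
    )


P = ([-2, 0, 4, 9], [1, 3, 6])
P_star = ([-2, 0, 1, 4, 6, 9], [1, 3, 4, 6])
J = ((0, 4), (3, 6))

print(is_refinement(P_star, P))
induced = induced_partition(P_star, J)
print(induced)

# Number of rectangles into which P*(J) divides J.
count = 1
for points in induced:
    count *= len(points) - 1
print(count)
```

Note that `induced_partition` is only meaningful when `J` is a rectangle of a partition that `P_star` refines, so that the endpoints of each edge of `J` are themselves partition points of `P_star`.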
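If the partition $\mathbf{P}^*$ is a refinement of the partition $\mathbf{P}$, then the collection of rectangles in $\mathbf{P}^*$ is the union of the collections of rectangles in $\mathbf{P}^*(\mathbf{J})$ when $\mathbf{J}$ ranges over the collection of rectangles in $\mathbf{P}$. Namely,
$$\mathcal{J}_{\mathbf{P}^*} = \bigcup_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} \mathcal{J}_{\mathbf{P}^*(\mathbf{J})}.$$
Using this, we can deduce the following.

Proposition 6.3
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. If $\mathbf{P}$ and $\mathbf{P}^*$ are partitions of $\mathbf{I}$ and $\mathbf{P}^*$ is a refinement of $\mathbf{P}$, then
$$L(f, \mathbf{P}^*) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} L(f, \mathbf{P}^*(\mathbf{J})), \qquad U(f, \mathbf{P}^*) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} U(f, \mathbf{P}^*(\mathbf{J})).$$

From this, we can show that a refinement improves the Darboux sums, in the sense that a lower sum increases, and an upper sum decreases.

Theorem 6.4
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. If $\mathbf{P}$ and $\mathbf{P}^*$ are partitions of $\mathbf{I}$ and $\mathbf{P}^*$ is a refinement of $\mathbf{P}$, then
$$L(f, \mathbf{P}) \le L(f, \mathbf{P}^*) \le U(f, \mathbf{P}^*) \le U(f, \mathbf{P}).$$

Proof
For each rectangle $\mathbf{J}$ in the partition $\mathbf{P}$,
$$m_{\mathbf{J}} \le f(\mathbf{x}) \le M_{\mathbf{J}} \quad \text{for all } \mathbf{x} \in \mathbf{J}.$$
Applying Proposition 6.2 to the function $f : \mathbf{J} \to \mathbb{R}$ and the partition $\mathbf{P}^*(\mathbf{J})$, we find that
$$m_{\mathbf{J}} \operatorname{vol}(\mathbf{J}) \le L(f, \mathbf{P}^*(\mathbf{J})) \le U(f, \mathbf{P}^*(\mathbf{J})) \le M_{\mathbf{J}} \operatorname{vol}(\mathbf{J}).$$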
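Summing over $\mathbf{J} \in \mathcal{J}_{\mathbf{P}}$, we find that
$$L(f, \mathbf{P}) \le \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} L(f, \mathbf{P}^*(\mathbf{J})) \le \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} U(f, \mathbf{P}^*(\mathbf{J})) \le U(f, \mathbf{P}).$$
The assertion follows from Proposition 6.3.

It is difficult to visualize the Darboux sums of a multivariable function. Hence, we illustrate how refinements improve Darboux sums using single variable functions, as shown in Figure 6.4 and Figure 6.5.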
Figure 6.4: A refinement of the partition increases the Darboux lower sum.
Figure 6.5: A refinement of the partition decreases the Darboux upper sum. As a consequence of Theorem 6.4, we can prove the following.
Corollary 6.5
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. For any two partitions $\mathbf{P}_1$ and $\mathbf{P}_2$ of $\mathbf{I}$,
$$L(f, \mathbf{P}_1) \le U(f, \mathbf{P}_2).$$

Proof
Let $\mathbf{P}_1 = (P_{1,1}, P_{1,2}, \ldots, P_{1,n})$ and $\mathbf{P}_2 = (P_{2,1}, P_{2,2}, \ldots, P_{2,n})$. For $1 \le i \le n$, let $P_i^*$ be the common refinement of $P_{1,i}$ and $P_{2,i}$ obtained by taking the union of the partition points in $P_{1,i}$ and $P_{2,i}$. Then $\mathbf{P}^* = (P_1^*, \ldots, P_n^*)$ is a common refinement of the partitions $\mathbf{P}_1$ and $\mathbf{P}_2$. By Theorem 6.4,
$$L(f, \mathbf{P}_1) \le L(f, \mathbf{P}^*) \le U(f, \mathbf{P}^*) \le U(f, \mathbf{P}_2).$$

Now we define lower and upper integrals of a bounded function $f : \mathbf{I} \to \mathbb{R}$.

Definition 6.7 Lower Integrals and Upper Integrals
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. Let $S_L(f)$ be the set of Darboux lower sums of $f$, and let $S_U(f)$ be the set of Darboux upper sums of $f$.

1. The lower integral of $f$, denoted by $\underline{\int}_{\mathbf{I}} f$, is defined as the least upper bound of the Darboux lower sums.
$$\underline{\int}_{\mathbf{I}} f = \sup S_L(f) = \sup \{ L(f, \mathbf{P}) \mid \mathbf{P} \text{ is a partition of } \mathbf{I} \}.$$

2. The upper integral of $f$, denoted by $\overline{\int}_{\mathbf{I}} f$, is defined as the greatest lower bound of the Darboux upper sums.
$$\overline{\int}_{\mathbf{I}} f = \inf S_U(f) = \inf \{ U(f, \mathbf{P}) \mid \mathbf{P} \text{ is a partition of } \mathbf{I} \}.$$
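Example 6.10
If $f : \mathbf{I} \to \mathbb{R}$ is the constant function $f(\mathbf{x}) = c$, then for any partition $\mathbf{P}$ of $\mathbf{I}$,
$$L(f, \mathbf{P}) = c \operatorname{vol}(\mathbf{I}) = U(f, \mathbf{P}).$$
Therefore, both $S_L(f)$ and $S_U(f)$ are the one-element set $\{c \operatorname{vol}(\mathbf{I})\}$. This shows that
$$\underline{\int}_{\mathbf{I}} f = \overline{\int}_{\mathbf{I}} f = c \operatorname{vol}(\mathbf{I}).$$

For a constant function, the lower integral and the upper integral are the same. For a general bounded function, we have the following.

Theorem 6.6
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. Then we have
$$\underline{\int}_{\mathbf{I}} f \le \overline{\int}_{\mathbf{I}} f.$$

Proof
By Corollary 6.5, every element of $S_L(f)$ is less than or equal to any element of $S_U(f)$. This implies that
$$\underline{\int}_{\mathbf{I}} f = \sup S_L(f) \le \inf S_U(f) = \overline{\int}_{\mathbf{I}} f.$$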
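Example 6.11 The Dirichlet's Function
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be the function defined as
$$f(\mathbf{x}) = \begin{cases} 1, & \text{if all components of } \mathbf{x} \text{ are rational}, \\ 0, & \text{otherwise}. \end{cases}$$
This is known as the Dirichlet's function. Find the lower integral and the upper integral of $f : \mathbf{I} \to \mathbb{R}$.

Solution
Let $\mathbf{P} = (P_1, \ldots, P_n)$ be a partition of $\mathbf{I}$. A rectangle $\mathbf{J}$ in the partition $\mathbf{P}$ can be written in the form $\mathbf{J} = \prod_{i=1}^{n} [u_i, v_i]$. By denseness of rational numbers and irrational numbers, there exist a rational number $\alpha_i$ and an irrational number $\beta_i$ in $(u_i, v_i)$. Let $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)$ and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_n)$. Then $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are points in $\mathbf{J}$, and
$$0 = f(\boldsymbol{\beta}) \le f(\mathbf{x}) \le f(\boldsymbol{\alpha}) = 1 \quad \text{for all } \mathbf{x} \in \mathbf{J}.$$
Therefore,
$$m_{\mathbf{J}} = \inf_{\mathbf{x} \in \mathbf{J}} f(\mathbf{x}) = 0, \qquad M_{\mathbf{J}} = \sup_{\mathbf{x} \in \mathbf{J}} f(\mathbf{x}) = 1.$$
It follows that
$$L(f, \mathbf{P}) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} m_{\mathbf{J}} \operatorname{vol}(\mathbf{J}) = 0, \qquad U(f, \mathbf{P}) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} M_{\mathbf{J}} \operatorname{vol}(\mathbf{J}) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} \operatorname{vol}(\mathbf{J}) = \operatorname{vol}(\mathbf{I}).$$
Therefore,
$$S_L(f) = \{0\}, \quad \text{while} \quad S_U(f) = \{\operatorname{vol}(\mathbf{I})\}.$$
This shows that the lower integral and the upper integral of $f : \mathbf{I} \to \mathbb{R}$ are given respectively by
$$\underline{\int}_{\mathbf{I}} f = 0 \quad \text{and} \quad \overline{\int}_{\mathbf{I}} f = \operatorname{vol}(\mathbf{I}).$$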
As we mentioned before, one of the motivations to define the integral $\int_{\mathbf{I}} f$ of a function $f : \mathbf{I} \to \mathbb{R}$ is to calculate volumes. Given that $f : \mathbf{I} \to \mathbb{R}$ is a nonnegative continuous function defined on the rectangle $\mathbf{I}$ in $\mathbb{R}^n$, let
$$S = \{ (\mathbf{x}, y) \mid \mathbf{x} \in \mathbf{I},\ 0 \le y \le f(\mathbf{x}) \},$$
which is the solid bounded between $\mathbf{I}$ and the graph of $f$. It is reasonable to expect that $S$ has a volume, which we denote by $\operatorname{vol}(S)$. We want to define the integral $\int_{\mathbf{I}} f$ so that it gives $\operatorname{vol}(S)$. Notice that if $\mathbf{P}$ is a partition of $\mathbf{I}$, then the Darboux lower sum
$$L(f, \mathbf{P}) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} m_{\mathbf{J}} \operatorname{vol}(\mathbf{J})$$
is the sum of volumes of the collection of rectangles
$$\{ \mathbf{J} \times [0, m_{\mathbf{J}}] \mid \mathbf{J} \in \mathcal{J}_{\mathbf{P}} \}$$
in $\mathbb{R}^{n+1}$, each of which is contained in $S$. Since any two of these rectangles can only intersect on the boundaries, it is reasonable to expect that
$$L(f, \mathbf{P}) \le \operatorname{vol}(S).$$
Similarly, the Darboux upper sum
$$U(f, \mathbf{P}) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} M_{\mathbf{J}} \operatorname{vol}(\mathbf{J})$$
is the sum of volumes of the collection of rectangles
$$\{ \mathbf{J} \times [0, M_{\mathbf{J}}] \mid \mathbf{J} \in \mathcal{J}_{\mathbf{P}} \}$$
in $\mathbb{R}^{n+1}$, the union of which contains $S$. Therefore, it is reasonable to expect that
$$\operatorname{vol}(S) \le U(f, \mathbf{P}).$$
Hence, the volume of $S$ should be a number between $L(f, \mathbf{P})$ and $U(f, \mathbf{P})$ for any partition $\mathbf{P}$. To make the volume well-defined, there should be only one number between $L(f, \mathbf{P})$ and $U(f, \mathbf{P})$ for all partitions $\mathbf{P}$. By definition, any number between the lower integral and the upper integral is in between $L(f, \mathbf{P})$ and $U(f, \mathbf{P})$ for any partition $\mathbf{P}$. Hence, to have the volume well-defined, we must require the lower integral and the upper integral to be the same. This motivates the following definition of integrability for a general bounded function.
Definition 6.8 Riemann Integrability
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. We say that $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable, or simply integrable, if
$$\underline{\int}_{\mathbf{I}} f = \overline{\int}_{\mathbf{I}} f.$$
In this case, we define the integral of $f$ over the rectangle $\mathbf{I}$ as
$$\int_{\mathbf{I}} f = \underline{\int}_{\mathbf{I}} f = \overline{\int}_{\mathbf{I}} f.$$
It is the unique number larger than or equal to all Darboux lower sums, and smaller than or equal to all Darboux upper sums.

Example 6.12
Example 6.10 says that a constant function $f : \mathbf{I} \to \mathbb{R}$, $f(\mathbf{x}) = c$ is integrable and
$$\int_{\mathbf{I}} f = c \operatorname{vol}(\mathbf{I}).$$

Example 6.13
The Dirichlet's function defined in Example 6.11 is not Riemann integrable since the lower integral and the upper integral are not equal.

Leibniz Notation for Riemann Integrals
The Leibniz notation of the Riemann integral of $f : \mathbf{I} \to \mathbb{R}$ is
$$\int_{\mathbf{I}} f(\mathbf{x})\, d\mathbf{x}, \quad \text{or equivalently,} \quad \int_{\mathbf{I}} f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n.$$

As in the single variable case, there are some criteria for Riemann integrability which follow directly from the criterion that the lower integral and the upper integral are the same.
Theorem 6.7
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. The following are equivalent.
(a) The function $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable.
(b) For every $\varepsilon > 0$, there is a partition $\mathbf{P}$ of the rectangle $\mathbf{I}$ such that
$$U(f, \mathbf{P}) - L(f, \mathbf{P}) < \varepsilon.$$

We define an Archimedes sequence of partitions exactly the same as in the single variable case.

Definition 6.9 Archimedes Sequence of Partitions
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. If $\{\mathbf{P}_k\}$ is a sequence of partitions of the rectangle $\mathbf{I}$ such that
$$\lim_{k \to \infty} \left( U(f, \mathbf{P}_k) - L(f, \mathbf{P}_k) \right) = 0,$$
we call $\{\mathbf{P}_k\}$ an Archimedes sequence of partitions for the function $f$.

Then we have the following theorem.

Theorem 6.8 The Archimedes-Riemann Theorem
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. The function $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable if and only if $f$ has an Archimedes sequence of partitions $\{\mathbf{P}_k\}$. In this case, the integral $\int_{\mathbf{I}} f$ can be computed by
$$\int_{\mathbf{I}} f = \lim_{k \to \infty} L(f, \mathbf{P}_k) = \lim_{k \to \infty} U(f, \mathbf{P}_k).$$
A candidate for an Archimedes sequence of partitions is the sequence {Pk },
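where $\mathbf{P}_k$ is the uniformly regular partition of $\mathbf{I}$ into $k^n$ rectangles.

Example 6.14
Let $\mathbf{I} = [0, 1] \times [0, 1]$. Consider the function $f : \mathbf{I} \to \mathbb{R}$ defined as
$$f(x, y) = \begin{cases} 1, & \text{if } x \ge y, \\ 0, & \text{if } x < y. \end{cases}$$
For $k \in \mathbb{Z}^+$, let $\mathbf{P}_k$ be the uniformly regular partition of $\mathbf{I}$ into $k^2$ rectangles.
(a) For each $k \in \mathbb{Z}^+$, compute the Darboux lower sum $L(f, \mathbf{P}_k)$ and the Darboux upper sum $U(f, \mathbf{P}_k)$.
(b) Show that $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable and find the integral $\int_{\mathbf{I}} f$.

Solution
Fix $k \in \mathbb{Z}^+$, and let $P_k = \{u_0, u_1, \ldots, u_k\}$, where $u_i = \dfrac{i}{k}$ for $0 \le i \le k$. Then $\mathbf{P}_k = (P_k, P_k)$, and it divides $\mathbf{I} = [0, 1] \times [0, 1]$ into the $k^2$ rectangles $\mathbf{J}_{i,j}$, $1 \le i \le k$, $1 \le j \le k$, where $\mathbf{J}_{i,j} = [u_{i-1}, u_i] \times [u_{j-1}, u_j]$. We have
$$\operatorname{vol}(\mathbf{J}_{i,j}) = \frac{1}{k^2}.$$
Let
$$m_{i,j} = \inf_{(x,y) \in \mathbf{J}_{i,j}} f(x, y) \quad \text{and} \quad M_{i,j} = \sup_{(x,y) \in \mathbf{J}_{i,j}} f(x, y).$$
Notice that if $i < j - 1$, then
$$x \le u_i < u_{j-1} \le y \quad \text{for all } (x, y) \in \mathbf{J}_{i,j}.$$
Hence,
$$f(x, y) = 0 \quad \text{for all } (x, y) \in \mathbf{J}_{i,j}.$$
This implies that
$$m_{i,j} = M_{i,j} = 0 \quad \text{when } i < j - 1.$$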
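If $i \ge j + 1$, then
$$x \ge u_{i-1} \ge u_j \ge y \quad \text{for all } (x, y) \in \mathbf{J}_{i,j}.$$
Hence,
$$f(x, y) = 1 \quad \text{for all } (x, y) \in \mathbf{J}_{i,j}.$$
This implies that
$$m_{i,j} = M_{i,j} = 1 \quad \text{when } i \ge j + 1.$$
When $i = j - 1$, if $(x, y)$ is in $\mathbf{J}_{i,j}$, then $x \le u_i = u_{j-1} \le y$, and $x = y$ if and only if $(x, y)$ is the point $(u_i, u_{j-1})$. Hence, $f(x, y) = 0$ for all $(x, y) \in \mathbf{J}_{i,j}$, except for $(x, y) = (u_i, u_{j-1})$, where $f(u_i, u_{j-1}) = 1$. Hence,
$$m_{i,j} = 0, \quad M_{i,j} = 1 \quad \text{when } i = j - 1.$$
When $i = j$, $0 \le f(x, y) \le 1$ for all $(x, y) \in \mathbf{J}_{i,j}$. Since $(u_{i-1}, u_j)$ and $(u_i, u_j)$ are in $\mathbf{J}_{i,j}$, and $f(u_{i-1}, u_j) = 0$ while $f(u_i, u_j) = 1$, we find that
$$m_{i,j} = 0, \quad M_{i,j} = 1 \quad \text{when } i = j.$$
In summary, $m_{i,j} = 1$ exactly when $j \le i - 1$, and $M_{i,j} = 1$ exactly when $j \le i + 1$. It follows that
$$L(f, \mathbf{P}_k) = \sum_{i=1}^{k} \sum_{j=1}^{k} m_{i,j} \operatorname{vol}(\mathbf{J}_{i,j}) = \sum_{i=2}^{k} \sum_{j=1}^{i-1} \frac{1}{k^2} = \frac{1}{k^2} \sum_{i=2}^{k} (i - 1) = \frac{1}{k^2} \sum_{i=1}^{k-1} i = \frac{k(k-1)}{2k^2},$$
$$\begin{aligned}
U(f, \mathbf{P}_k) &= \sum_{i=1}^{k} \sum_{j=1}^{k} M_{i,j} \operatorname{vol}(\mathbf{J}_{i,j}) = \sum_{j=1}^{k} \frac{1}{k^2} + \sum_{i=1}^{k-1} \sum_{j=1}^{i+1} \frac{1}{k^2} \\
&= \frac{1}{k} + \frac{1}{k^2} \sum_{i=1}^{k-1} (i + 1) = \frac{1}{k^2} \left[ \frac{k(k+1)}{2} - 1 + k \right] = \frac{k^2 + 3k - 2}{2k^2}.
\end{aligned}$$
Since
$$U(f, \mathbf{P}_k) - L(f, \mathbf{P}_k) = \frac{2k - 1}{k^2} \quad \text{for all } k \in \mathbb{Z}^+,$$
we find that
$$\lim_{k \to \infty} \left( U(f, \mathbf{P}_k) - L(f, \mathbf{P}_k) \right) = 0.$$
Hence, $\{\mathbf{P}_k\}$ is an Archimedes sequence of partitions for $f$. By the Archimedes-Riemann theorem, $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable, and
$$\int_{\mathbf{I}} f = \lim_{k \to \infty} L(f, \mathbf{P}_k) = \lim_{k \to \infty} \frac{k(k-1)}{2k^2} = \frac{1}{2}.$$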
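The closed forms obtained in Example 6.14 can be cross-checked by summing over all $k^2$ rectangles. The sketch below uses exact rational arithmetic; it encodes the case analysis of the solution (the infimum is $1$ exactly when $i - 1 \ge j$, the supremum is $1$ exactly when $i \ge j - 1$) rather than sampling the function, and the name `darboux_sums` is hypothetical.

```python
from fractions import Fraction


def darboux_sums(k):
    """Lower and upper Darboux sums of f(x, y) = 1 if x >= y else 0
    for the uniformly regular partition of [0, 1] x [0, 1] into k*k squares."""
    vol = Fraction(1, k * k)
    lower = upper = Fraction(0)
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            # J_{i,j} = [u_{i-1}, u_i] x [u_{j-1}, u_j] with u_i = i/k.
            m = 1 if i - 1 >= j else 0  # f = 1 on all of J iff u_{i-1} >= u_j
            M = 1 if i >= j - 1 else 0  # f = 1 somewhere on J iff u_i >= u_{j-1}
            lower += m * vol
            upper += M * vol
    return lower, upper


for k in (1, 2, 8, 100):
    lower, upper = darboux_sums(k)
    print(k, lower, upper, upper - lower)
```

For each `k` the difference `upper - lower` agrees with $(2k-1)/k^2$, which shrinks to $0$ as `k` grows, in line with the Archimedes sequence argument.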
Figure 6.6: This figure illustrates the different cases considered in Example 6.14 when $k = 8$.

As in the single variable case, there is an equivalent definition for Riemann integrability using Riemann sums. For a partition $P = \{x_0, x_1, \ldots, x_k\}$ of an interval $[a, b]$, we define the gap of the partition $P$ as
$$|P| = \max \{ x_i - x_{i-1} \mid 1 \le i \le k \}.$$
For a closed rectangle $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, we replace the length $x_i - x_{i-1}$ of an interval in the partition by the diameter of a rectangle in the partition. Recall that the diameter of a rectangle $\mathbf{J} = \prod_{i=1}^{n} [u_i, v_i]$ is
$$\operatorname{diam} \mathbf{J} = \sqrt{(v_1 - u_1)^2 + \cdots + (v_n - u_n)^2}.$$
Definition 6.10 Gap of a Partition
Let $\mathbf{P}$ be a partition of the rectangle $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$. Then the gap of the partition $\mathbf{P}$ is defined as
$$|\mathbf{P}| = \max \{ \operatorname{diam} \mathbf{J} \mid \mathbf{J} \in \mathcal{J}_{\mathbf{P}} \}.$$

Example 6.15
Find the gap of the partition $\mathbf{P} = (P_1, P_2)$ of the rectangle $\mathbf{I} = [-2, 9] \times [1, 6]$ defined in Example 6.1, where $P_1 = \{-2, 0, 4, 9\}$ and $P_2 = \{1, 3, 6\}$.

Solution
The lengths of the three intervals in the partition $P_1 = \{-2, 0, 4, 9\}$ of the interval $[-2, 9]$ are 2, 4 and 5 respectively. The lengths of the two intervals in the partition $P_2 = \{1, 3, 6\}$ of the interval $[1, 6]$ are 2 and 3 respectively. Therefore, the diameters of the 6 rectangles in the partition $\mathbf{P}$ are
$$\sqrt{2^2 + 2^2}, \quad \sqrt{4^2 + 2^2}, \quad \sqrt{5^2 + 2^2}, \quad \sqrt{2^2 + 3^2}, \quad \sqrt{4^2 + 3^2}, \quad \sqrt{5^2 + 3^2}.$$
From this, we see that the gap of $\mathbf{P}$ is $\sqrt{5^2 + 3^2} = \sqrt{34}$.

In the example above, notice that $|P_1| = 5$ and $|P_2| = 3$. In general, it is not difficult to see the following.

Proposition 6.9
Let $\mathbf{P} = (P_1, \ldots, P_n)$ be a partition of the closed rectangle $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$. Then
$$|\mathbf{P}| = \sqrt{|P_1|^2 + \cdots + |P_n|^2}.$$
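Proposition 6.9 can be illustrated on Example 6.15 by computing the gap in two ways: directly from Definition 6.10, and from the gaps of the component partitions. This is a sketch with hypothetical helper names, assuming partitions are stored as tuples of sorted lists of partition points.

```python
from itertools import product
from math import sqrt, isclose


def lengths(points):
    """Lengths of the subintervals cut out by sorted partition points."""
    return [b - a for a, b in zip(points[:-1], points[1:])]


def gap(partition):
    """Gap of P = (P_1, ..., P_n): the largest diameter among its rectangles."""
    return max(
        sqrt(sum(edge * edge for edge in edges))
        for edges in product(*(lengths(p) for p in partition))
    )


def gap_via_components(partition):
    """The right-hand side of Proposition 6.9."""
    return sqrt(sum(max(lengths(p)) ** 2 for p in partition))


P = ([-2, 0, 4, 9], [1, 3, 6])
print(gap(P), gap_via_components(P), sqrt(34))
```

The second function is much cheaper, since it never enumerates the rectangles; the first is kept only to confirm that the two agree.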
The following theorem gives equivalent definitions of Riemann integrability of a bounded function.

Theorem 6.10 Equivalent Definitions for Riemann Integrability
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. The following three statements are equivalent for saying that $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable.
(a) The lower integral and the upper integral are the same. Namely,
$$\underline{\int}_{\mathbf{I}} f = \overline{\int}_{\mathbf{I}} f.$$
(b) There exists a number $I$ that satisfies the following. For any $\varepsilon > 0$, there exists a $\delta > 0$ such that if $\mathbf{P}$ is a partition of the rectangle $\mathbf{I}$ with $|\mathbf{P}| < \delta$, then
$$|R(f, \mathbf{P}, A) - I| < \varepsilon$$
for any choice of intermediate points $A = \{\boldsymbol{\xi}_{\mathbf{J}}\}$ for the partition $\mathbf{P}$.
(c) For any $\varepsilon > 0$, there exists a $\delta > 0$ such that if $\mathbf{P}$ is a partition of the rectangle $\mathbf{I}$ with $|\mathbf{P}| < \delta$, then
$$U(f, \mathbf{P}) - L(f, \mathbf{P}) < \varepsilon.$$

The most useful definition is in fact the second one in terms of Riemann sums. It says that a bounded function $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable if the limit
$$\lim_{|\mathbf{P}| \to 0} R(f, \mathbf{P}, A)$$
exists. As a consequence of Theorem 6.10, we have the following.
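Theorem 6.11
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. If $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable, then for any sequence $\{\mathbf{P}_k\}$ of partitions of $\mathbf{I}$ satisfying
$$\lim_{k \to \infty} |\mathbf{P}_k| = 0,$$
we have
(i) $\displaystyle \int_{\mathbf{I}} f = \lim_{k \to \infty} L(f, \mathbf{P}_k) = \lim_{k \to \infty} U(f, \mathbf{P}_k)$.
(ii) $\displaystyle \int_{\mathbf{I}} f = \lim_{k \to \infty} R(f, \mathbf{P}_k, A_k)$, where for each $k \in \mathbb{Z}^+$, $A_k$ is a choice of intermediate points for the partition $\mathbf{P}_k$.

The proof is exactly the same as the single variable case. The contrapositive of Theorem 6.11 gives the following.

Theorem 6.12
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. Assume that $\{\mathbf{P}_k\}$ is a sequence of partitions of $\mathbf{I}$ such that
$$\lim_{k \to \infty} |\mathbf{P}_k| = 0.$$
(a) If for each $k \in \mathbb{Z}^+$, there exists a choice of intermediate points $A_k$ for the partition $\mathbf{P}_k$ such that the limit $\lim_{k \to \infty} R(f, \mathbf{P}_k, A_k)$ does not exist, then $f : \mathbf{I} \to \mathbb{R}$ is not Riemann integrable.
(b) If for each $k \in \mathbb{Z}^+$, there exist two choices of intermediate points $A_k$ and $B_k$ for the partition $\mathbf{P}_k$ so that the two limits $\lim_{k \to \infty} R(f, \mathbf{P}_k, A_k)$ and $\lim_{k \to \infty} R(f, \mathbf{P}_k, B_k)$ are not the same, then $f : \mathbf{I} \to \mathbb{R}$ is not Riemann integrable.

Theorem 6.12 is useful for justifying that a bounded function is not Riemann integrable, without having to compute the lower integral or the upper integral. To apply this theorem, we usually consider the sequence of partitions $\{\mathbf{P}_k\}$, where $\mathbf{P}_k$ is the uniformly regular partition of $\mathbf{I}$ into $k^n$ rectangles.

Example 6.16
Let $\mathbf{I} = [0, 1] \times [0, 1]$, and let $f : \mathbf{I} \to \mathbb{R}$ be the function defined as
$$f(x, y) = \begin{cases} 0, & \text{if } x \text{ is rational}, \\ y, & \text{if } x \text{ is irrational}. \end{cases}$$
Show that $f : \mathbf{I} \to \mathbb{R}$ is not Riemann integrable.

Solution
For $k \in \mathbb{Z}^+$, let $\mathbf{P}_k$ be the uniformly regular partition of $\mathbf{I}$ into $k^2$ rectangles. Then $\mathbf{P}_k = (P_k, P_k)$, where $P_k = \{u_0, u_1, \ldots, u_k\}$ with $u_i = \dfrac{i}{k}$ when $0 \le i \le k$. Notice that $|\mathbf{P}_k| = \dfrac{\sqrt{2}}{k}$, and so $\lim_{k \to \infty} |\mathbf{P}_k| = 0$.

The partition $\mathbf{P}_k$ divides the square $\mathbf{I}$ into $k^2$ squares $\mathbf{J}_{i,j}$, $1 \le i \le k$, $1 \le j \le k$, where $\mathbf{J}_{i,j} = [u_{i-1}, u_i] \times [u_{j-1}, u_j]$. For $1 \le i \le k$, since irrational numbers are dense, there is an irrational number $c_i$ in the interval $(u_{i-1}, u_i)$. For $1 \le i \le k$, $1 \le j \le k$, let $\boldsymbol{\alpha}_{i,j}$ and $\boldsymbol{\beta}_{i,j}$ be the points in $\mathbf{J}_{i,j}$ given respectively by
$$\boldsymbol{\alpha}_{i,j} = (u_i, u_j), \qquad \boldsymbol{\beta}_{i,j} = (c_i, u_j).$$
Then
$$f(\boldsymbol{\alpha}_{i,j}) = 0, \qquad f(\boldsymbol{\beta}_{i,j}) = u_j.$$
Let $A_k = \{\boldsymbol{\alpha}_{i,j}\}$ and $B_k = \{\boldsymbol{\beta}_{i,j}\}$. Then the Riemann sums $R(f, \mathbf{P}_k, A_k)$ and $R(f, \mathbf{P}_k, B_k)$ are given respectively by
$$R(f, \mathbf{P}_k, A_k) = \sum_{i=1}^{k} \sum_{j=1}^{k} f(\boldsymbol{\alpha}_{i,j}) \operatorname{vol}(\mathbf{J}_{i,j}) = 0,$$
and
$$R(f, \mathbf{P}_k, B_k) = \sum_{i=1}^{k} \sum_{j=1}^{k} f(\boldsymbol{\beta}_{i,j}) \operatorname{vol}(\mathbf{J}_{i,j}) = \sum_{i=1}^{k} \sum_{j=1}^{k} \frac{j}{k} \times \frac{1}{k^2} = \frac{k \times k(k+1)}{2k^3} = \frac{k+1}{2k}.$$
Therefore, we find that
$$\lim_{k \to \infty} R(f, \mathbf{P}_k, A_k) = 0, \qquad \lim_{k \to \infty} R(f, \mathbf{P}_k, B_k) = \frac{1}{2}.$$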
Since the two limits are not the same, we conclude that f : I → R is not Riemann integrable. Now we return to the proof of Theorem 6.10. To prove this theorem, it is easier to show that (a) is equivalent to (c), and (b) is equivalent to (c). We will prove the equivalence of (a) and (c). The proof of the equivalence of (b) and (c) is left to the exercises. It is a consequence of the inequality L(f, P) ≤ R(f, P, A) ≤ U (f, P), which holds for any partition P of the rectangle I, and any choice of intermediate points A for the partition P. By Theorem 6.7, (a) is equivalent to (a′ ) For every ε > 0, there is a partition P of I such that U (f, P) − L(f, P) < ε. Thus, to prove the equivalence of (a) and (c), it is sufficient to show the equivalence of (a′ ) and (c). But then (c) implies (a′ ) is obvious. Hence, we are left with the most technical part, which is the proof of (a′ ) implies (c). We formulate this as a standalone theorem.
Theorem 6.13
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $\mathbf{P}_0$ be a fixed partition of $\mathbf{I}$. Given that $f : \mathbf{I} \to \mathbb{R}$ is a bounded function defined on $\mathbf{I}$, for any $\varepsilon > 0$, there is a $\delta > 0$ such that for all partitions $\mathbf{P}$ of $\mathbf{I}$, if $|\mathbf{P}| < \delta$, then
$$U(f, \mathbf{P}) - L(f, \mathbf{P}) < U(f, \mathbf{P}_0) - L(f, \mathbf{P}_0) + \varepsilon. \tag{6.1}$$

If Theorem 6.13 is proved, we can show that (a′) implies (c) in Theorem 6.10 as follows. Given $\varepsilon > 0$, (a′) implies that we can choose a $\mathbf{P}_0$ such that
$$U(f, \mathbf{P}_0) - L(f, \mathbf{P}_0) < \frac{\varepsilon}{2}.$$
By Theorem 6.13, applied with $\varepsilon/2$ in place of $\varepsilon$, there is a $\delta > 0$ such that for all partitions $\mathbf{P}$ of $\mathbf{I}$, if $|\mathbf{P}| < \delta$, then
$$U(f, \mathbf{P}) - L(f, \mathbf{P}) < U(f, \mathbf{P}_0) - L(f, \mathbf{P}_0) + \frac{\varepsilon}{2} < \varepsilon.$$
This proves that (a′) implies (c).

Hence, it remains for us to prove Theorem 6.13. Let us introduce some additional notation. Given the rectangle $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, for $1 \le i \le n$, let
$$S_i = \frac{\operatorname{vol}(\mathbf{I})}{b_i - a_i} = (b_1 - a_1) \times \cdots \times (b_{i-1} - a_{i-1})(b_{i+1} - a_{i+1}) \times \cdots \times (b_n - a_n). \tag{6.2}$$
This is the area of the part of the boundary of $\mathbf{I}$ that is contained in the hyperplane $x_i = a_i$ or $x_i = b_i$. For example, when $n = 2$, $\mathbf{I} = [a_1, b_1] \times [a_2, b_2]$, $S_1 = b_2 - a_2$ is the length of the vertical side, while $S_2 = b_1 - a_1$ is the length of the horizontal side of the rectangle $\mathbf{I}$.
Proof of Theorem 6.13
Since $f : \mathbf{I} \to \mathbb{R}$ is bounded, there is a positive number $M$ such that
$$|f(\mathbf{x})| \le M \quad \text{for all } \mathbf{x} \in \mathbf{I}.$$
Assume that $\mathbf{P}_0 = (\widetilde{P}_1, \ldots, \widetilde{P}_n)$. For $1 \le i \le n$, let $k_i$ be the number of intervals in the partition $\widetilde{P}_i$. Let
$$K = \max\{k_1, \ldots, k_n\}, \quad \text{and} \quad S = S_1 + \cdots + S_n,$$
where $S_i$, $1 \le i \le n$, are defined by (6.2). Given $\varepsilon > 0$, let
$$\delta = \frac{\varepsilon}{4MKS}.$$
Then $\delta > 0$. If $\mathbf{P} = (P_1, \ldots, P_n)$ is a partition of $\mathbf{I}$ with $|\mathbf{P}| < \delta$, we want to show that (6.1) holds.

Let $\mathbf{P}^* = (P_1^*, \ldots, P_n^*)$ be the common refinement of $\mathbf{P}_0$ and $\mathbf{P}$ such that $P_i^*$ is the partition of $[a_i, b_i]$ that contains all the partition points of $\widetilde{P}_i$ and $P_i$. For $1 \le i \le n$, let $U_i$ be the collection of intervals in $P_i$ which contain partition points of $\widetilde{P}_i$, and let $V_i$ be the collection of the intervals of $P_i$ that are not in $U_i$. Each interval in $V_i$ must be in the interior of one of the intervals in $\widetilde{P}_i$. Thus, each interval in $V_i$ is an interval in the partition $P_i^*$. Since each partition point of $\widetilde{P}_i$ can be contained in at most two intervals of $P_i$, but the first and last partition points of $P_i$ and $\widetilde{P}_i$ are the same, we find that
$$|U_i| \le 2k_i.$$
Since $|P_i| \le |\mathbf{P}| < \delta$, each interval in $P_i$ has length less than $\delta$. Therefore, the sum of the lengths of the intervals in $U_i$ is less than $2k_i\delta$. Let
$$Q_i = \left\{ \mathbf{J} \in \mathcal{J}_{\mathbf{P}} \mid \text{the } i^{\text{th}}\text{-edge of } \mathbf{J} \text{ is from } U_i \right\}.$$
Then
$$\sum_{\mathbf{J} \in Q_i} \operatorname{vol}(\mathbf{J}) < 2k_i \delta S_i \le 2K\delta S_i.$$
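Figure 6.7: The partitions $\mathbf{P}_0$ and $\mathbf{P}$ in the proof of Theorem 6.13. $\mathbf{P}_0$ is the partition with red grids, while $\mathbf{P}$ is the partition with blue grids. The shaded rectangles are the rectangles in $\mathbf{P}$ that contain partition points of $\mathbf{P}_0$.

Now let
$$Q = \bigcup_{i=1}^{n} Q_i.$$
Then
$$\sum_{\mathbf{J} \in Q} \operatorname{vol}(\mathbf{J}) < 2K\delta \sum_{i=1}^{n} S_i = 2K\delta S.$$
For each of the rectangles $\mathbf{J}$ that is in $Q$, we do a simple estimate
$$M_{\mathbf{J}} - m_{\mathbf{J}} \le 2M.$$
Therefore,
$$\sum_{\mathbf{J} \in Q} (M_{\mathbf{J}} - m_{\mathbf{J}}) \operatorname{vol}(\mathbf{J}) < 4MK\delta S \le \varepsilon.$$
For the rectangles $\mathbf{J}$ that are in $\mathcal{J}_{\mathbf{P}} \setminus Q$, each of them is a rectangle in the partition $\mathbf{P}^*$. Therefore,
$$\sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}} \setminus Q} (M_{\mathbf{J}} - m_{\mathbf{J}}) \operatorname{vol}(\mathbf{J}) \le U(f, \mathbf{P}^*) - L(f, \mathbf{P}^*) \le U(f, \mathbf{P}_0) - L(f, \mathbf{P}_0).$$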
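Hence,
$$\begin{aligned}
U(f, \mathbf{P}) - L(f, \mathbf{P}) &= \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} (M_{\mathbf{J}} - m_{\mathbf{J}}) \operatorname{vol}(\mathbf{J}) \\
&= \sum_{\mathbf{J} \in Q} (M_{\mathbf{J}} - m_{\mathbf{J}}) \operatorname{vol}(\mathbf{J}) + \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}} \setminus Q} (M_{\mathbf{J}} - m_{\mathbf{J}}) \operatorname{vol}(\mathbf{J}) \\
&< U(f, \mathbf{P}_0) - L(f, \mathbf{P}_0) + \varepsilon.
\end{aligned}$$
This completes the proof.

Finally we extend Riemann integrals to functions $f : D \to \mathbb{R}$ that are defined on bounded subsets $D$ of $\mathbb{R}^n$. If $D$ is bounded, there is a positive number $L$ such that
$$\|\mathbf{x}\| \le L \quad \text{for all } \mathbf{x} \in D.$$
This implies that $D$ is contained in the closed rectangle $\mathbf{I}_L = \prod_{i=1}^{n} [-L, L]$. To define the Riemann integral of $f : D \to \mathbb{R}$, we need to extend the domain of $f$ from $D$ to $\mathbf{I}_L$. To avoid affecting the integral, we should extend by zero.

Definition 6.11 Zero Extension
Let $D$ be a subset of $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ be a function defined on $D$. The zero extension of $f : D \to \mathbb{R}$ is the function $\check{f} : \mathbb{R}^n \to \mathbb{R}$ which is defined as
$$\check{f}(\mathbf{x}) = \begin{cases} f(\mathbf{x}), & \text{if } \mathbf{x} \in D, \\ 0, & \text{if } \mathbf{x} \notin D. \end{cases}$$
If $U$ is any subset of $\mathbb{R}^n$ that contains $D$, then the zero extension of $f$ to $U$ is the function $\check{f} : U \to \mathbb{R}$.

Obviously, if $f : D \to \mathbb{R}$ is a bounded function, its zero extension $\check{f} : \mathbb{R}^n \to \mathbb{R}$ is also bounded. Since we have defined Riemann integrability for a bounded function $g : \mathbf{I} \to \mathbb{R}$ that is defined on a closed rectangle $\mathbf{I}$, it is natural to say that a function $f : D \to \mathbb{R}$ is Riemann integrable if its zero extension $\check{f} : \mathbf{I} \to \mathbb{R}$ to a closed rectangle $\mathbf{I}$ is Riemann integrable, and define
$$\int_D f = \int_{\mathbf{I}} \check{f}.$$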
For this to be unambiguous, we have to check that if $\mathbf{I}_1$ and $\mathbf{I}_2$ are closed rectangles that contain the bounded set $D$, the zero extension $\check{f} : \mathbf{I}_1 \to \mathbb{R}$ is Riemann integrable if and only if the zero extension $\check{f} : \mathbf{I}_2 \to \mathbb{R}$ is Riemann integrable. Moreover,
$$\int_{\mathbf{I}_1} \check{f} = \int_{\mathbf{I}_2} \check{f}.$$
This small technicality will be proved in Section 6.2. Assuming this, we can give the following formal definition for Riemann integrability of a bounded function defined on a bounded domain.

Definition 6.12 Riemann Integrals of General Functions
Let $D$ be a bounded subset of $\mathbb{R}^n$, and let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ be a closed rectangle in $\mathbb{R}^n$ that contains $D$. Given that $f : D \to \mathbb{R}$ is a bounded function defined on $D$, we say that $f : D \to \mathbb{R}$ is Riemann integrable if its zero extension $\check{f} : \mathbf{I} \to \mathbb{R}$ is Riemann integrable. If this is the case, we define the integral of $f$ over $D$ as
$$\int_D f = \int_{\mathbf{I}} \check{f}.$$

Example 6.17
Let $\mathbf{I} = [0, 1] \times [0, 1]$, and let $f : \mathbf{I} \to \mathbb{R}$ be the function defined as
$$f(x, y) = \begin{cases} 1, & \text{if } x \ge y, \\ 0, & \text{if } x < y, \end{cases}$$
which is considered in Example 6.14. Let
$$D = \{ (x, y) \in \mathbf{I} \mid y \le x \},$$
and let $g : D \to \mathbb{R}$ be the constant function $g(\mathbf{x}) = 1$. Then $f : \mathbf{I} \to \mathbb{R}$ is the zero extension of $g$ to the square $\mathbf{I}$ that contains $D$. In Example 6.14, we have shown that $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable and
$$\int_{\mathbf{I}} f(\mathbf{x})\, d\mathbf{x} = \frac{1}{2}.$$
Therefore, $g : D \to \mathbb{R}$ is Riemann integrable and
$$\int_D g(\mathbf{x})\, d\mathbf{x} = \frac{1}{2}.$$

Remark 6.1
Here we make two remarks about the Riemann integrals.
1. When $f : D \to \mathbb{R}$ is the constant function $f(\mathbf{x}) = 1$, we should expect that it is Riemann integrable if and only if $D$ has a volume, which should be defined as
$$\operatorname{vol}(D) = \int_D d\mathbf{x}.$$
2. If $f : D \to \mathbb{R}$ is a nonnegative continuous function defined on the bounded set $D$ that has a volume, we would expect that $f : D \to \mathbb{R}$ is Riemann integrable, and the integral $\int_D f(\mathbf{x})\, d\mathbf{x}$ gives the volume of the solid bounded between $D$ and the graph of $f$.

In Section 6.3, we will give a characterization of sets $D$ that have volumes. We will also prove that if $f : D \to \mathbb{R}$ is a continuous function defined on a set $D$ that has volume, then $f : D \to \mathbb{R}$ is Riemann integrable.
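To see the zero extension of Example 6.17 at work, one can evaluate Riemann sums of $\check{g}$ over uniformly regular partitions of the unit square. The sketch below makes one particular, assumed choice of intermediate points (the lower-left corner of each square) and uses exact rational arithmetic; the function names are illustrative only.

```python
from fractions import Fraction


def g_zero_extension(x, y):
    """Zero extension to [0, 1] x [0, 1] of g = 1 on D = {(x, y) : y <= x}."""
    return 1 if y <= x else 0


def riemann_sum(k):
    """Riemann sum over the uniformly regular partition into k*k squares,
    with the lower-left corner of each square as its intermediate point."""
    vol = Fraction(1, k * k)
    total = Fraction(0)
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            xi = (Fraction(i - 1, k), Fraction(j - 1, k))
            total += g_zero_extension(*xi) * vol
    return total


for k in (1, 10, 100):
    print(k, riemann_sum(k), float(riemann_sum(k)))
```

With this choice of intermediate points the sum counts the pairs with $j \le i$, so it equals $(k+1)/(2k)$, which tends to $\tfrac{1}{2}$; this agrees with Theorem 6.11 and the value of the integral found in Example 6.17.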
Exercises 6.1

Question 1
Let $\mathbf{I} = [-5, 8] \times [2, 5]$, and let $\mathbf{P} = (P_1, P_2)$ be the partition of $\mathbf{I}$ with $P_1 = \{-5, -1, 2, 7, 8\}$ and $P_2 = \{2, 4, 5\}$. Find the gap of the partition $\mathbf{P}$.

Question 2
Let $\mathbf{I} = [-5, 8] \times [2, 5]$, and let $f : \mathbf{I} \to \mathbb{R}$ be the function defined as $f(x, y) = x^2 + 2y$. Consider the partition $\mathbf{P} = (P_1, P_2)$ of $\mathbf{I}$ with $P_1 = \{-5, -1, 2, 7, 8\}$ and $P_2 = \{2, 4, 5\}$. Find the Darboux lower sum $L(f, \mathbf{P})$ and the Darboux upper sum $U(f, \mathbf{P})$.

Question 3
Let $\mathbf{I} = [-5, 8] \times [2, 5]$, and let $f : \mathbf{I} \to \mathbb{R}$ be the function defined as $f(x, y) = x^2 + 2y$. Consider the partition $\mathbf{P} = (P_1, P_2)$ of $\mathbf{I}$ with $P_1 = \{-5, -1, 2, 7, 8\}$ and $P_2 = \{2, 4, 5\}$. For each rectangle $\mathbf{J} = [a, b] \times [c, d]$ in the partition $\mathbf{P}$, let $\boldsymbol{\alpha}_{\mathbf{J}} = (a, c)$ and $\boldsymbol{\beta}_{\mathbf{J}} = (b, d)$. Find the Riemann sums $R(f, \mathbf{P}, A)$ and $R(f, \mathbf{P}, B)$, where $A = \{\boldsymbol{\alpha}_{\mathbf{J}}\}$ and $B = \{\boldsymbol{\beta}_{\mathbf{J}}\}$.

Question 4
Let $\mathbf{I} = [-1, 1] \times [2, 5]$, and let $f : \mathbf{I} \to \mathbb{R}$ be the function defined as
$$f(x, y) = \begin{cases} 1, & \text{if } x \text{ and } y \text{ are rational}, \\ 0, & \text{otherwise}. \end{cases}$$
(a) Given that $\mathbf{P}$ is a partition of $\mathbf{I}$, find the Darboux lower sum $L(f, \mathbf{P})$ and the Darboux upper sum $U(f, \mathbf{P})$.
(b) Find the lower integral $\underline{\int}_{\mathbf{I}} f$ and the upper integral $\overline{\int}_{\mathbf{I}} f$.
(c) Explain why $f : \mathbf{I} \to \mathbb{R}$ is not Riemann integrable.

Question 5
Let $\mathbf{I} = [0, 4] \times [0, 2]$. Consider the function $f : \mathbf{I} \to \mathbb{R}$ defined as $f(x, y) = 2x + 3y + 1$. For $k \in \mathbb{Z}^+$, let $\mathbf{P}_k$ be the uniformly regular partition of $\mathbf{I} = [0, 4] \times [0, 2]$ into $k^2$ rectangles.
(a) For each $k \in \mathbb{Z}^+$, compute the Darboux lower sum $L(f, \mathbf{P}_k)$ and the Darboux upper sum $U(f, \mathbf{P}_k)$.
(b) Show that $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable and find the integral $\int_{\mathbf{I}} f$.

Question 6
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a function defined on $\mathbf{I}$. Show that the following are equivalent.
(a) There exists a number $I$ that satisfies the following. For any $\varepsilon > 0$, there exists a $\delta > 0$ such that if $\mathbf{P}$ is a partition of the rectangle $\mathbf{I}$ with $|\mathbf{P}| < \delta$, then
$$|R(f, \mathbf{P}, A) - I| < \varepsilon$$
for any choice of intermediate points $A = \{\boldsymbol{\xi}_{\mathbf{J}}\}$ for the partition $\mathbf{P}$.
(b) For any $\varepsilon > 0$, there exists a $\delta > 0$ such that if $\mathbf{P}$ is a partition of the rectangle $\mathbf{I}$ with $|\mathbf{P}| < \delta$, then
$$U(f, \mathbf{P}) - L(f, \mathbf{P}) < \varepsilon.$$
6.2 Properties of Riemann Integrals
In this section, we discuss properties of Riemann integrals. Let us first consider Riemann integrals of functions $f : \mathbf{I} \to \mathbb{R}$ defined on closed rectangles of the form $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$. Using some of these properties, we prove that the definition of Riemann integrability for functions $f : D \to \mathbb{R}$ defined on general bounded sets, as given in Section 6.1, is unambiguous. Finally, we will extend the properties of Riemann integrals to functions $f : D \to \mathbb{R}$ defined on bounded sets.

Linearity is one of the most important properties. For functions defined on closed rectangles of the form $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, the proof is straightforward using the Riemann sum definition of Riemann integrability, as in the single variable case.

Theorem 6.14 Linearity
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ and $g : \mathbf{I} \to \mathbb{R}$ be Riemann integrable functions. For any real numbers $\alpha$ and $\beta$, $(\alpha f + \beta g) : \mathbf{I} \to \mathbb{R}$ is also Riemann integrable, and
$$\int_{\mathbf{I}} (\alpha f + \beta g) = \alpha \int_{\mathbf{I}} f + \beta \int_{\mathbf{I}} g.$$
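Sketch of Proof
If $\mathbf{P}$ is a partition of $\mathbf{I}$ and $A$ is a set of intermediate points for $\mathbf{P}$, then
$$R(\alpha f + \beta g, \mathbf{P}, A) = \alpha R(f, \mathbf{P}, A) + \beta R(g, \mathbf{P}, A).$$
The result follows by taking the limit $|\mathbf{P}| \to 0$.

Example 6.18
Let $\mathbf{I} = [0, 2] \times [0, 2]$, and let $f : \mathbf{I} \to \mathbb{R}$ and $g : \mathbf{I} \to \mathbb{R}$ be Riemann integrable functions. Find the integrals $\int_{\mathbf{I}} f$ and $\int_{\mathbf{I}} g$ if
$$f(x, y) = g(y, x) \quad \text{and} \quad (f + g)(x, y) = 6 \quad \text{for all } (x, y) \in \mathbf{I}.$$

Solution
Since $\mathbf{I}$ is symmetric with respect to the line $y = x$ and $f(x, y) = g(y, x)$ for all $(x, y) \in \mathbf{I}$, we have $\int_{\mathbf{I}} f = \int_{\mathbf{I}} g$. By linearity,
$$\int_{\mathbf{I}} f + \int_{\mathbf{I}} g = \int_{\mathbf{I}} (f + g) = 6 \times \operatorname{vol}(\mathbf{I}) = 24.$$
Hence,
$$\int_{\mathbf{I}} f = \int_{\mathbf{I}} g = 12.$$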
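As a numerical sanity check of Example 6.18, the following sketch evaluates Riemann sums for one concrete pair of functions that satisfies the hypotheses. The pair $f(x, y) = 3 + x - y$, $g(x, y) = f(y, x)$, the helper name `midpoint_riemann_sum`, and the use of midpoints as intermediate points are illustrative assumptions and are not part of the example.

```python
def midpoint_riemann_sum(f, a, b, c, d, k):
    """Riemann sum of f over [a, b] x [c, d] for the uniformly regular
    partition into k*k rectangles, with midpoints as intermediate points."""
    hx, hy = (b - a) / k, (d - c) / k
    total = 0.0
    for i in range(k):
        x = a + (i + 0.5) * hx
        for j in range(k):
            y = c + (j + 0.5) * hy
            total += f(x, y) * hx * hy
    return total


# Hypothetical pair with f(x, y) = g(y, x) and f + g = 6 on I = [0, 2] x [0, 2].
def f(x, y):
    return 3 + x - y


def g(x, y):
    return f(y, x)


int_f = midpoint_riemann_sum(f, 0.0, 2.0, 0.0, 2.0, 50)
int_g = midpoint_riemann_sum(g, 0.0, 2.0, 0.0, 2.0, 50)
print(int_f, int_g, int_f + int_g)
```

Because both functions are linear, the midpoint sums agree with the value 12 found in the solution up to rounding error.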
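The following theorem is about the integral of a nonnegative function.

Theorem 6.15
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. Assume that $f(\mathbf{x}) \ge 0$ for all $\mathbf{x}$ in $\mathbf{I}$. If $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable, then
$$\int_{\mathbf{I}} f \ge 0.$$

Proof
For any partition $\mathbf{P}$ of $\mathbf{I}$, $L(f, \mathbf{P}) \ge 0$. Therefore,
$$\int_{\mathbf{I}} f = \underline{\int_{\mathbf{I}}} f \ge L(f, \mathbf{P}) \ge 0.$$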
The monotonicity theorem then follows from linearity and Theorem 6.15.

Theorem 6.16 Monotonicity
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ and $g : \mathbf{I} \to \mathbb{R}$ be Riemann integrable functions. If $f(\mathbf{x}) \ge g(\mathbf{x})$ for all $\mathbf{x}$ in $\mathbf{I}$, then
$$\int_{\mathbf{I}} f \ge \int_{\mathbf{I}} g.$$

Proof
By linearity, the function $(f - g) : \mathbf{I} \to \mathbb{R}$ is integrable, and
$$\int_{\mathbf{I}} (f - g) = \int_{\mathbf{I}} f - \int_{\mathbf{I}} g.$$
By Theorem 6.15, $\int_{\mathbf{I}} (f - g) \ge 0$, and the assertion follows.
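The next important property is the additivity of Riemann integrals.

Theorem 6.17 Additivity
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $\mathbf{P}_0$ be a partition of $\mathbf{I}$. If $f : \mathbf{I} \to \mathbb{R}$ is a bounded function defined on $\mathbf{I}$, then $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable if and only if for each $\mathbf{J} \in \mathcal{J}_{\mathbf{P}_0}$, $f : \mathbf{J} \to \mathbb{R}$ is Riemann integrable. In such case, we also have
$$\int_{\mathbf{I}} f = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}_0}} \int_{\mathbf{J}} f.$$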
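Proof
It is sufficient to consider the case that $\mathbf{P}_0 = (P_1, \ldots, P_n)$ divides $\mathbf{I}$ into two rectangles $\mathbf{I}_1$ and $\mathbf{I}_2$ by having a partition point $c$ inside the $j$-th edge $[a_j, b_j]$ for some $1 \le j \le n$. Namely, $P_j = \{a_j, c, b_j\}$, and for $i \ne j$, $P_i = \{a_i, b_i\}$. The general case can be proved by induction, adding one partition point at a time.

Assume that $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable. Given $\varepsilon > 0$, there is a partition $\mathbf{P}$ of $\mathbf{I}$ such that
$$U(f, \mathbf{P}) - L(f, \mathbf{P}) < \varepsilon.$$
Let $\mathbf{P}^*$ be a common refinement of $\mathbf{P}$ and $\mathbf{P}_0$. Then
$$U(f, \mathbf{P}^*) - L(f, \mathbf{P}^*) \le U(f, \mathbf{P}) - L(f, \mathbf{P}) < \varepsilon.$$
But $\mathbf{P}^*$ induces partitions $\mathbf{P}^*(\mathbf{I}_1)$ and $\mathbf{P}^*(\mathbf{I}_2)$ of $\mathbf{I}_1$ and $\mathbf{I}_2$, and we have
$$U(f, \mathbf{P}^*) = U(f, \mathbf{P}^*(\mathbf{I}_1)) + U(f, \mathbf{P}^*(\mathbf{I}_2)), \qquad L(f, \mathbf{P}^*) = L(f, \mathbf{P}^*(\mathbf{I}_1)) + L(f, \mathbf{P}^*(\mathbf{I}_2)).$$
Therefore,
$$U(f, \mathbf{P}^*(\mathbf{I}_1)) - L(f, \mathbf{P}^*(\mathbf{I}_1)) + U(f, \mathbf{P}^*(\mathbf{I}_2)) - L(f, \mathbf{P}^*(\mathbf{I}_2)) < \varepsilon.$$
Since each of the two differences is nonnegative, this implies that
$$U(f, \mathbf{P}^*(\mathbf{I}_j)) - L(f, \mathbf{P}^*(\mathbf{I}_j)) < \varepsilon \quad \text{for } j = 1, 2.$$
Hence, $f : \mathbf{I}_1 \to \mathbb{R}$ and $f : \mathbf{I}_2 \to \mathbb{R}$ are Riemann integrable.

Conversely, assume that $f : \mathbf{I}_1 \to \mathbb{R}$ and $f : \mathbf{I}_2 \to \mathbb{R}$ are Riemann integrable. Let $\{\mathbf{P}_{1,k}\}$ and $\{\mathbf{P}_{2,k}\}$ be Archimedes sequences of partitions for $f : \mathbf{I}_1 \to \mathbb{R}$ and $f : \mathbf{I}_2 \to \mathbb{R}$ respectively. Then
$$\int_{\mathbf{I}_j} f = \lim_{k \to \infty} U(f, \mathbf{P}_{j,k}) = \lim_{k \to \infty} L(f, \mathbf{P}_{j,k}) \quad \text{for } j = 1, 2.$$
For $k \in \mathbb{Z}^+$, let $\mathbf{P}^*_k$ be the partition of $\mathbf{I}$ obtained by taking unions of partition points in $\mathbf{P}_{1,k}$ and $\mathbf{P}_{2,k}$. Then $\mathbf{P}_{1,k} = \mathbf{P}^*_k(\mathbf{I}_1)$ and $\mathbf{P}_{2,k} = \mathbf{P}^*_k(\mathbf{I}_2)$. It follows that
$$U(f, \mathbf{P}^*_k) = U(f, \mathbf{P}_{1,k}) + U(f, \mathbf{P}_{2,k}), \qquad L(f, \mathbf{P}^*_k) = L(f, \mathbf{P}_{1,k}) + L(f, \mathbf{P}_{2,k}).$$
Therefore,
$$\lim_{k \to \infty} \left( U(f, \mathbf{P}^*_k) - L(f, \mathbf{P}^*_k) \right) = 0.$$
Hence, $\{\mathbf{P}^*_k\}$ is an Archimedes sequence of partitions for $f : \mathbf{I} \to \mathbb{R}$. This shows that $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable, and
$$\int_{\mathbf{I}} f = \lim_{k \to \infty} U(f, \mathbf{P}^*_k) = \lim_{k \to \infty} \left( U(f, \mathbf{P}_{1,k}) + U(f, \mathbf{P}_{2,k}) \right) = \int_{\mathbf{I}_1} f + \int_{\mathbf{I}_2} f.$$

Next we state a useful lemma.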
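Figure 6.8: A partition $\mathbf{P}$ of $\mathbf{I}$ and the refined partition $\mathbf{P}^*$ that induces partitions on $\mathbf{I}_1$ and $\mathbf{I}_2$.

Lemma 6.18
Given that $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ is a closed rectangle in $\mathbb{R}^n$, let
$$\omega = \frac{1}{2} \min\{b_i - a_i \mid 1 \le i \le n\}.$$
(i) Given $\eta > 0$, let $\mathbf{I}_\eta$ be the closed rectangle $\mathbf{I}_\eta = \prod_{i=1}^{n} [a_i - \eta, b_i + \eta]$. For any $\varepsilon > 0$, there exists $\delta > 0$ such that for any $0 < \eta < \delta$,
$$0 < \operatorname{vol}(\mathbf{I}_\eta) - \operatorname{vol}(\mathbf{I}) < \varepsilon.$$
(ii) Given $0 < \kappa < \omega$, let $\check{\mathbf{I}}_\kappa$ be the closed rectangle $\check{\mathbf{I}}_\kappa = \prod_{i=1}^{n} [a_i + \kappa, b_i - \kappa]$. For any $\varepsilon > 0$, there exists $0 < \delta \le \omega$ such that for any $0 < \kappa < \delta$,
$$0 < \operatorname{vol}(\mathbf{I}) - \operatorname{vol}(\check{\mathbf{I}}_\kappa) < \varepsilon.$$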
Figure 6.9: Enlarging or shrinking a rectangle by an arbitrary amount.
This lemma says that one can enlarge or shrink a rectangle by an arbitrarily small amount. It can be proved by elementary means, but here we give a short proof using the continuity of a polynomial.

Proof
We prove part (i). The argument for part (ii) is the same. Consider the function $h : [0, \infty) \to \mathbb{R}$ defined by
$$h(\eta) = \operatorname{vol}(\mathbf{I}_\eta) = \prod_{i=1}^{n} (b_i - a_i + 2\eta).$$
As a function of $\eta$, $h(\eta)$ is a polynomial, and it is a strictly increasing continuous function. Strict monotonicity gives $h(\eta) - h(0) > 0$ for $\eta > 0$, and the upper bound is basically the definition of the limit
$$\lim_{\eta \to 0^+} h(\eta) = h(0).$$
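The proof of Lemma 6.18 (i) is not constructive, but for a concrete rectangle a suitable $\delta$ can be found by simple search. The sketch below halves a trial value until $h(\delta) - h(0) < \varepsilon$; since $h$ is strictly increasing, every $0 < \eta < \delta$ then satisfies the same bound. The function names and the sample rectangle are illustrative assumptions.

```python
from math import prod


def vol(sides):
    """Volume of the rectangle given as a list of (a_i, b_i) pairs."""
    return prod(b - a for a, b in sides)


def enlarge(sides, eta):
    """The rectangle I_eta obtained by extending every edge by eta on both ends."""
    return [(a - eta, b + eta) for a, b in sides]


def find_delta(sides, eps):
    """Halve delta until vol(I_delta) - vol(I) < eps.
    The loop terminates because h(eta) = vol(I_eta) is continuous at 0."""
    delta = 1.0
    while vol(enlarge(sides, delta)) - vol(sides) >= eps:
        delta /= 2
    return delta


I = [(0.0, 1.0), (0.0, 2.0), (-1.0, 3.0)]  # hypothetical rectangle in R^3
eps = 1e-3
delta = find_delta(I, eps)
print(delta, vol(enlarge(I, delta)) - vol(I))
```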
The following theorem says that a bounded function $f : \mathbf{I} \to \mathbb{R}$ which is identically zero on the interior of $\mathbf{I}$ is Riemann integrable with integral 0. This is something we would have expected.

Theorem 6.19
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function such that
$$f(\mathbf{x}) = 0 \quad \text{for all } \mathbf{x} \in \operatorname{int}(\mathbf{I}).$$
Then $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable and
$$\int_{\mathbf{I}} f = 0.$$
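Proof
Let $\omega = \frac{1}{2} \min\{b_i - a_i \mid 1 \le i \le n\}$. Since $f : \mathbf{I} \to \mathbb{R}$ is bounded, there is a positive number $M$ such that
$$|f(\mathbf{x})| \le M \quad \text{for all } \mathbf{x} \in \mathbf{I}.$$
Given $\varepsilon > 0$, by Lemma 6.18, there is a $\kappa \in (0, \omega)$ such that
$$\operatorname{vol}(\mathbf{I}) - \operatorname{vol}(\check{\mathbf{I}}_\kappa) < \frac{\varepsilon}{M}.$$
Let $\mathbf{P} = (P_1, \ldots, P_n)$ be the partition of $\mathbf{I}$ with $P_i = \{a_i, a_i + \kappa, b_i - \kappa, b_i\}$. The rectangle $\check{\mathbf{I}}_\kappa$ is one of the rectangles in $\mathbf{P}$. It is contained in $\operatorname{int}(\mathbf{I})$, so $f$ vanishes on it. The remaining rectangles $\mathbf{J}$ in $\mathbf{P}$ have total volume $\operatorname{vol}(\mathbf{I}) - \operatorname{vol}(\check{\mathbf{I}}_\kappa)$, and on each of them $m_{\mathbf{J}} \ge -M$. Therefore,
$$L(f, \mathbf{P}) \ge -M \left( \operatorname{vol}(\mathbf{I}) - \operatorname{vol}(\check{\mathbf{I}}_\kappa) \right) > -\varepsilon.$$
In the same way, we find that
$$U(f, \mathbf{P}) < \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, we find that
$$\underline{\int_{\mathbf{I}}} f \ge 0 \quad \text{and} \quad \overline{\int_{\mathbf{I}}} f \le 0.$$
Since $\underline{\int_{\mathbf{I}}} f \le \overline{\int_{\mathbf{I}}} f$, we conclude that
$$\underline{\int_{\mathbf{I}}} f = \overline{\int_{\mathbf{I}}} f = 0.$$
This proves that $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable and
$$\int_{\mathbf{I}} f = 0.$$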
Now let us give a proof that the definition given in Section 6.1 for a bounded function f : D → R defined on a bounded subset of Rn to be Riemann integrable is unambiguous. The crucial point is the following.
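Lemma 6.20
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ and $\check{\mathbf{I}} = \prod_{i=1}^{n} [\check{a}_i, \check{b}_i]$ be closed rectangles in $\mathbb{R}^n$ such that $\mathbf{I} \subset \check{\mathbf{I}}$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. Then $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable if and only if its zero extension $\check{f} : \check{\mathbf{I}} \to \mathbb{R}$ is Riemann integrable. In such case, we also have
$$\int_{\mathbf{I}} f = \int_{\check{\mathbf{I}}} \check{f}.$$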
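Proof
Let $\check{\mathbf{P}} = (\check{P}_1, \ldots, \check{P}_n)$ be the partition of $\check{\mathbf{I}}$ such that $\check{P}_i$ is the set that contains $\check{a}_i, a_i, b_i, \check{b}_i$. Each rectangle $\mathbf{J}$ in $\mathcal{J}_{\check{\mathbf{P}}} \setminus \{\mathbf{I}\}$ is disjoint from the interior of $\mathbf{I}$. Hence, $\check{f}$ vanishes in the interior of $\mathbf{J}$. By Theorem 6.19, $\check{f} : \mathbf{J} \to \mathbb{R}$ is Riemann integrable and $\int_{\mathbf{J}} \check{f} = 0$. It follows from the additivity theorem that $\check{f} : \check{\mathbf{I}} \to \mathbb{R}$ is Riemann integrable if and only if $\check{f} : \mathbf{I} \to \mathbb{R}$ is Riemann integrable, and
$$\int_{\check{\mathbf{I}}} \check{f} = \int_{\mathbf{I}} \check{f}.$$
However, restricted to $\mathbf{I}$, $\check{f}(\mathbf{x}) = f(\mathbf{x})$. Hence, $\check{f} : \check{\mathbf{I}} \to \mathbb{R}$ is Riemann integrable if and only if $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable. In such case, we have
$$\int_{\check{\mathbf{I}}} \check{f} = \int_{\mathbf{I}} f.$$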
Figure 6.10: The rectangle I is contained in the rectangle ˇI.
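Finally we can prove the main result.

Theorem 6.21
Let $D$ be a bounded set in $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ be a bounded function defined on $D$. The definition for Riemann integrability of $f : D \to \mathbb{R}$ is unambiguous. Namely, if $\mathbf{I}_1 = \prod_{i=1}^{n} [a'_i, b'_i]$ and $\mathbf{I}_2 = \prod_{i=1}^{n} [a''_i, b''_i]$ contain $D$, the zero extension $\check{f} : \mathbf{I}_1 \to \mathbb{R}$ is Riemann integrable if and only if the zero extension $\check{f} : \mathbf{I}_2 \to \mathbb{R}$ is Riemann integrable. In the latter case,
$$\int_{\mathbf{I}_1} \check{f} = \int_{\mathbf{I}_2} \check{f},$$
and so we can define unambiguously
$$\int_D f = \int_{\mathbf{I}} \check{f},$$
where $\mathbf{I}$ is any rectangle of the form $\prod_{i=1}^{n} [a_i, b_i]$ that contains $D$.

Proof
Let $\mathbf{I} = \mathbf{I}_1 \cap \mathbf{I}_2$. Then $\mathbf{I}$ is a rectangle that contains $D$ and is contained in $\mathbf{I}_1$ and $\mathbf{I}_2$. Lemma 6.20 then says that $\check{f} : \mathbf{I}_1 \to \mathbb{R}$ is Riemann integrable if and only if $\check{f} : \mathbf{I} \to \mathbb{R}$ is Riemann integrable, if and only if $\check{f} : \mathbf{I}_2 \to \mathbb{R}$ is Riemann integrable. In the latter case,
$$\int_{\mathbf{I}_1} \check{f} = \int_{\mathbf{I}} \check{f} = \int_{\mathbf{I}_2} \check{f}.$$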
Figure 6.11: The set D is contained in the rectangles I1 and I2 .
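Now we can extend linearity and monotonicity to Riemann integrals over any bounded domains.

Theorem 6.22 Linearity
Let $D$ be a bounded subset of $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ and $g : D \to \mathbb{R}$ be Riemann integrable functions. For any real numbers $\alpha$ and $\beta$, $(\alpha f + \beta g) : D \to \mathbb{R}$ is also Riemann integrable, and
$$\int_D (\alpha f + \beta g) = \alpha \int_D f + \beta \int_D g.$$

Proof
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ be a closed rectangle that contains $D$, and let $\check{f} : \mathbf{I} \to \mathbb{R}$ and $\check{g} : \mathbf{I} \to \mathbb{R}$ be the zero extensions of $f : D \to \mathbb{R}$ and $g : D \to \mathbb{R}$ to $\mathbf{I}$. It is easy to check that $(\alpha \check{f} + \beta \check{g}) : \mathbf{I} \to \mathbb{R}$ is the zero extension of $(\alpha f + \beta g) : D \to \mathbb{R}$ to $\mathbf{I}$. Since $f : D \to \mathbb{R}$ and $g : D \to \mathbb{R}$ are Riemann integrable, $\check{f} : \mathbf{I} \to \mathbb{R}$ and $\check{g} : \mathbf{I} \to \mathbb{R}$ are Riemann integrable and
$$\int_{\mathbf{I}} \check{f} = \int_D f, \qquad \int_{\mathbf{I}} \check{g} = \int_D g.$$
By Theorem 6.14, $(\alpha \check{f} + \beta \check{g}) : \mathbf{I} \to \mathbb{R}$ is Riemann integrable, and
$$\int_{\mathbf{I}} (\alpha \check{f} + \beta \check{g}) = \alpha \int_{\mathbf{I}} \check{f} + \beta \int_{\mathbf{I}} \check{g} = \alpha \int_D f + \beta \int_D g.$$
It follows that $(\alpha f + \beta g) : D \to \mathbb{R}$ is also Riemann integrable, and
$$\int_D (\alpha f + \beta g) = \int_{\mathbf{I}} (\alpha \check{f} + \beta \check{g}) = \alpha \int_D f + \beta \int_D g.$$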
Theorem 6.23
Let $D$ be a bounded subset of $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ be a bounded function defined on $D$. Assume that $f(\mathbf{x}) \ge 0$ for all $\mathbf{x}$ in $D$. If $f : D \to \mathbb{R}$ is Riemann integrable, then
$$\int_D f \ge 0.$$

Proof
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ be a closed rectangle that contains $D$, and let $\check{f} : \mathbf{I} \to \mathbb{R}$ be the zero extension of $f : D \to \mathbb{R}$ to $\mathbf{I}$. Since $f : D \to \mathbb{R}$ is Riemann integrable, $\check{f} : \mathbf{I} \to \mathbb{R}$ is also Riemann integrable. It is easy to check that $\check{f}(\mathbf{x}) \ge 0$ for all $\mathbf{x}$ in $\mathbf{I}$. Therefore, by Theorem 6.15,
$$\int_D f = \int_{\mathbf{I}} \check{f} \ge 0.$$

As before, monotonicity is a consequence of linearity and Theorem 6.23.

Theorem 6.24 Monotonicity
Let $D$ be a bounded subset of $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ and $g : D \to \mathbb{R}$ be Riemann integrable functions. If $f(\mathbf{x}) \ge g(\mathbf{x})$ for all $\mathbf{x}$ in $D$, then
$$\int_D f \ge \int_D g.$$
At the end of this section, we want to present two theorems whose proofs are almost verbatim those for the $n = 1$ case. The first theorem says that if a function is Riemann integrable, so is its absolute value.

Theorem 6.25 Absolute Value of Riemann Integrable Functions
Let $D$ be a bounded subset of $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ be a bounded function defined on $D$. If the function $f : D \to \mathbb{R}$ is Riemann integrable, then the function $|f| : D \to \mathbb{R}$ is also Riemann integrable.

Sketch of Proof
If $\check{f} : \mathbf{I} \to \mathbb{R}$ is the zero extension of $f : D \to \mathbb{R}$ to the closed rectangle $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ that contains $D$, then $|\check{f}| : \mathbf{I} \to \mathbb{R}$ is the zero extension of $|f| : D \to \mathbb{R}$. Hence, it is sufficient to consider the case where $D$ is a closed rectangle of the form $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$. The proof is almost the same as the $n = 1$ case. The key to the proof is the fact that for any subset $A$ of $\mathbf{I}$,
$$\sup_{\mathbf{x} \in A} |f(\mathbf{x})| - \inf_{\mathbf{x} \in A} |f(\mathbf{x})| \le \sup_{\mathbf{x} \in A} f(\mathbf{x}) - \inf_{\mathbf{x} \in A} f(\mathbf{x}).$$

The second theorem says that products of Riemann integrable functions are Riemann integrable.

Theorem 6.26 Products of Riemann Integrable Functions
Let $D$ be a bounded subset of $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ and $g : D \to \mathbb{R}$ be bounded functions defined on $D$. If the functions $f : D \to \mathbb{R}$ and $g : D \to \mathbb{R}$ are Riemann integrable, then the function $(fg) : D \to \mathbb{R}$ is also Riemann integrable.

Sketch of Proof
It is sufficient to consider the case where $D$ is a closed rectangle of the form $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$. The proof is almost the same as the $n = 1$ case. The key to the proof is the fact that if $M$ is a positive number such that
$$|f(\mathbf{x})| \le M \quad \text{and} \quad |g(\mathbf{x})| \le M \quad \text{for all } \mathbf{x} \in \mathbf{I},$$
then for any subset $A$ of $\mathbf{I}$,
$$\sup_{\mathbf{x} \in A} (fg)(\mathbf{x}) - \inf_{\mathbf{x} \in A} (fg)(\mathbf{x}) \le M \left( \sup_{\mathbf{x} \in A} f(\mathbf{x}) - \inf_{\mathbf{x} \in A} f(\mathbf{x}) + \sup_{\mathbf{x} \in A} g(\mathbf{x}) - \inf_{\mathbf{x} \in A} g(\mathbf{x}) \right).$$
Exercises 6.2

Question 1
Let $\mathbf{I} = [0, 3] \times [0, 3]$, and let $f : \mathbf{I} \to \mathbb{R}$ and $g : \mathbf{I} \to \mathbb{R}$ be Riemann integrable functions. Suppose that
$$f(x, y) = g(y, x) \quad \text{and} \quad (3f + 2g)(x, y) = 10 \quad \text{for all } (x, y) \in \mathbf{I},$$
find $\int_{\mathbf{I}} f$ and $\int_{\mathbf{I}} g$.

Question 2
Complete the details in the proof of Theorem 6.25.

Question 3
Complete the details in the proof of Theorem 6.26.
6.3 Jordan Measurable Sets and Riemann Integrable Functions
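In this section, we will give some sufficient conditions for a bounded function $f : D \to \mathbb{R}$ to be Riemann integrable. We start with the following theorem.

Theorem 6.27
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a continuous function defined on $\mathbf{I}$. Then $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable.

Proof
Since $f : \mathbf{I} \to \mathbb{R}$ is continuous and $\mathbf{I}$ is compact, $f : \mathbf{I} \to \mathbb{R}$ is uniformly continuous. Given $\varepsilon > 0$, there exists $\delta > 0$ such that if $\mathbf{u}$ and $\mathbf{v}$ are points in $\mathbf{I}$ and $\|\mathbf{u} - \mathbf{v}\| < \delta$, then
$$|f(\mathbf{u}) - f(\mathbf{v})| < \frac{\varepsilon}{\operatorname{vol}(\mathbf{I})}.$$
Let $\mathbf{P}$ be any partition of $\mathbf{I}$ with $|\mathbf{P}| < \delta$. A rectangle $\mathbf{J}$ in $\mathcal{J}_{\mathbf{P}}$ is a compact set. Since $f : \mathbf{J} \to \mathbb{R}$ is continuous, the extreme value theorem says that there exist points $\mathbf{u}_{\mathbf{J}}$ and $\mathbf{v}_{\mathbf{J}}$ in $\mathbf{J}$ such that
$$f(\mathbf{u}_{\mathbf{J}}) \le f(\mathbf{x}) \le f(\mathbf{v}_{\mathbf{J}}) \quad \text{for all } \mathbf{x} \in \mathbf{J}.$$
Therefore,
$$m_{\mathbf{J}} = \inf_{\mathbf{x} \in \mathbf{J}} f(\mathbf{x}) = f(\mathbf{u}_{\mathbf{J}}) \quad \text{and} \quad M_{\mathbf{J}} = \sup_{\mathbf{x} \in \mathbf{J}} f(\mathbf{x}) = f(\mathbf{v}_{\mathbf{J}}).$$
Since $|\mathbf{P}| < \delta$, $\|\mathbf{u}_{\mathbf{J}} - \mathbf{v}_{\mathbf{J}}\| \le \operatorname{diam} \mathbf{J} \le |\mathbf{P}| < \delta$. Therefore,
$$M_{\mathbf{J}} - m_{\mathbf{J}} = f(\mathbf{v}_{\mathbf{J}}) - f(\mathbf{u}_{\mathbf{J}}) < \frac{\varepsilon}{\operatorname{vol}(\mathbf{I})}.$$
It follows that
$$U(f, \mathbf{P}) - L(f, \mathbf{P}) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} (M_{\mathbf{J}} - m_{\mathbf{J}}) \operatorname{vol}(\mathbf{J}) < \frac{\varepsilon}{\operatorname{vol}(\mathbf{I})} \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} \operatorname{vol}(\mathbf{J}) = \varepsilon.$$
Hence, $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable.

[…]

The characteristic function $\chi_A : \mathbb{R}^n \to \mathbb{R}$ of a subset $A$ of $\mathbb{R}^n$ takes the value 1 at the points of $A$ and the value 0 at the points not in $A$. For the set $A = \{(x, y) \mid x > 0\}$, the characteristic function is
$$\chi_A(x, y) = \begin{cases} 1, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases}$$
It is continuous at $(x, y)$ if and only if $x > 0$ or $x < 0$.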
Figure 6.12: The set A = {(x, y) | x > 0}. Interior, Exterior and Boundary of a Set In Chapter 1, we have seen that if A is a subset of Rn , then Rn is a disjoint union of int A, ext A and ∂A. If x0 is a point in Rn , x0 ∈ int A if and only if there is an r > 0 such that B(x0 , r) ⊂ A; x0 ∈ ext A if and only if there is an r > 0 such that B(x0 , r) ⊂ Rn \ A; and x0 ∈ ∂A if for every r > 0, B(x0 , r) contains a point in A and a point not in A. Theorem 6.28 Let A be a subset of Rn , and let χA : Rn → R be the characteristic function of A. Then the set of discontinuities of the function χA is the set ∂A.
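Proof
Write $f = \chi_A$. Since $\mathbb{R}^n$ is a disjoint union of $\operatorname{int} A$, $\operatorname{ext} A$ and $\partial A$, we will show that $f$ is continuous on $\operatorname{int} A$ and $\operatorname{ext} A$, and discontinuous at every point in $\partial A$.

The sets $\operatorname{int} A$ and $\operatorname{ext} A$ are open sets, and $f$ is equal to 1 on $\operatorname{int} A$ and 0 on $\operatorname{ext} A$. For every $\mathbf{x}_0$ in $\operatorname{int} A$, there is an $r > 0$ such that $B(\mathbf{x}_0, r) \subset A$. Therefore, for any $\varepsilon > 0$, if $\mathbf{x}$ is such that $\|\mathbf{x} - \mathbf{x}_0\| < r$, then
$$|f(\mathbf{x}) - f(\mathbf{x}_0)| = 0 < \varepsilon.$$
This shows that $f$ is continuous at $\mathbf{x}_0$. Similarly, if $\mathbf{x}_0$ is in $\operatorname{ext} A$, there is an $r > 0$ such that $B(\mathbf{x}_0, r) \subset \mathbb{R}^n \setminus A$. The same reasoning shows that $f$ is continuous at $\mathbf{x}_0$.

Now consider a point $\mathbf{x}_0$ that is in $\partial A$. For any $k \in \mathbb{Z}^+$, there is a point $\mathbf{u}_k \in A$ and a point $\mathbf{v}_k \notin A$ such that $\mathbf{u}_k$ and $\mathbf{v}_k$ are in the neighbourhood $B(\mathbf{x}_0, 1/k)$ of $\mathbf{x}_0$. The two sequences $\{\mathbf{u}_k\}$ and $\{\mathbf{v}_k\}$ both converge to $\mathbf{x}_0$, but the sequence $\{f(\mathbf{u}_k)\}$ converges to 1, while the sequence $\{f(\mathbf{v}_k)\}$ converges to 0. This shows that $f$ is not continuous at $\mathbf{x}_0$.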
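The two-sequence argument in the proof can be traced on the half-plane $A = \{(x, y) \mid x > 0\}$ of Figure 6.12. The sketch below takes the boundary point $\mathbf{x}_0 = (0, 0)$ and the sequences $\mathbf{u}_k = (1/k, 0)$ and $\mathbf{v}_k = (-1/k, 0)$; these particular sequences and the name `chi_A` are illustrative choices.

```python
def chi_A(x, y):
    """Characteristic function of the half-plane A = {(x, y) | x > 0}."""
    return 1 if x > 0 else 0


x0 = (0.0, 0.0)                              # a boundary point of A
u = [(1.0 / k, 0.0) for k in range(1, 6)]    # points of A converging to x0
v = [(-1.0 / k, 0.0) for k in range(1, 6)]   # points outside A converging to x0

values_u = [chi_A(*p) for p in u]
values_v = [chi_A(*p) for p in v]
print(values_u, values_v)
```

The values along the first sequence are constantly 1 and along the second constantly 0, so no single limit exists at $\mathbf{x}_0$.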
Figure 6.13: The characteristic function of a set $A$ is not continuous at $\mathbf{x}_0$ if and only if $\mathbf{x}_0 \in \partial A$.

By definition, restricted to the set $A$, $\chi_A : A \to \mathbb{R}$ is the constant function with value 1. Now we define Jordan measurable sets and their volumes.

Definition 6.14 Jordan Measurable Sets and Volume
Let $D$ be a bounded subset of $\mathbb{R}^n$. We say that $D$ is Jordan measurable if the constant function $\chi_D : D \to \mathbb{R}$ is Riemann integrable. In this case, we define the volume of $D$ as
$$\operatorname{vol}(D) = \int_D \chi_D = \int_D d\mathbf{x}.$$

Example 6.21
The closed rectangle $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ is Jordan measurable, and its volume is
$$\operatorname{vol}(\mathbf{I}) = \int_{\mathbf{I}} d\mathbf{x} = \prod_{i=1}^{n} (b_i - a_i),$$
in agreement with the volume defined earlier.

Example 6.22
Example 6.14 says that the set
$$D = \{(x, y) \mid 0 \le y \le x \le 1\}$$
is Jordan measurable and $\operatorname{vol}(D) = \frac{1}{2}$.
Figure 6.14: The set D = {(x, y) | 0 ≤ y ≤ x ≤ 1} is Jordan measurable.
One might think that all bounded subsets of $\mathbb{R}^n$ have volumes. This is not true. An example is given below.

Example 6.23
Let $\mathbf{I} = [0, 1]^n$ and let
$$D = \{\mathbf{x} \in \mathbf{I} \mid \mathbf{x} \in \mathbb{Q}^n\}.$$
Notice that $D$ is a subset of the rectangle $\mathbf{I}$, and the zero extension of $\chi_D : D \to \mathbb{R}$ is the function $\chi_D : \mathbf{I} \to \mathbb{R}$,
$$\chi_D(\mathbf{x}) = \begin{cases} 1, & \text{if all components of } \mathbf{x} \text{ are rational}, \\ 0, & \text{otherwise}, \end{cases}$$
which is the Dirichlet function. We have seen in Example 6.13 that the function $\chi_D : \mathbf{I} \to \mathbb{R}$ is not Riemann integrable. Hence, $\chi_D : D \to \mathbb{R}$ is not Riemann integrable. This means the set $D$ is not Jordan measurable and so it does not have a volume.

This example also shows that if $B$ is a subset of $A$, and the function $f : A \to \mathbb{R}$ is Riemann integrable, the function $f : B \to \mathbb{R}$ is not necessarily Riemann integrable.

The next example says that the boundary of a rectangle has volume 0.

Example 6.24
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $D = \partial \mathbf{I}$. Notice that $D$ is contained in $\mathbf{I}$. The zero extension of $\chi_D : D \to \mathbb{R}$ is the function $\chi_D : \mathbf{I} \to \mathbb{R}$ which vanishes on the interior of $\mathbf{I}$. By Theorem 6.19, $\chi_D : \mathbf{I} \to \mathbb{R}$ is Riemann integrable and $\int_{\mathbf{I}} \chi_D = 0$. Therefore, $D = \partial \mathbf{I}$ has zero volume.
Remark 6.2 Darboux Sums for a Characteristic Function
Given a bounded set $D$ that is contained in the rectangle $\mathbf{I}$, if $\mathbf{P}$ is a partition of $\mathbf{I}$, $L(\chi_D, \mathbf{P})$ is the sum of the volumes of the rectangles in $\mathbf{P}$ that are contained in $D$, while $U(\chi_D, \mathbf{P})$ is the sum of the volumes of the rectangles in $\mathbf{P}$ that intersect $D$. See Figure 6.15. Thus, for $D$ to have volume, the two numbers $L(\chi_D, \mathbf{P})$ and $U(\chi_D, \mathbf{P})$ should get closer and closer as the partition $\mathbf{P}$ gets finer.
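The description in Remark 6.2 can be turned into a direct computation. The following sketch takes $D$ to be the closed unit disk centered at the origin and $\mathbf{I} = [-1, 1]^2$ with uniformly regular partitions; the specific disk, the rectangle, and the function name `darboux_sums_disk` are assumptions made for illustration. A square is contained in $D$ exactly when its farthest corner from the origin lies in $D$, and it intersects $D$ exactly when its nearest point to the origin lies in $D$.

```python
import math


def darboux_sums_disk(k):
    """Return (L, U) for chi_D, where D is the closed unit disk and P is the
    uniformly regular partition of [-1, 1]^2 into k*k squares."""
    pts = [-1 + 2 * i / k for i in range(k + 1)]
    lower = upper = 0.0
    for i in range(k):
        for j in range(k):
            a, b, c, d = pts[i], pts[i + 1], pts[j], pts[j + 1]
            area = (b - a) * (d - c)
            # Distance from the origin to the farthest corner of the square.
            far = math.hypot(max(abs(a), abs(b)), max(abs(c), abs(d)))
            # Distance from the origin to the nearest point of the square.
            near = math.hypot(min(max(a, 0.0), b), min(max(c, 0.0), d))
            if far <= 1:     # square contained in D, so m_J = 1
                lower += area
            if near <= 1:    # square intersects D, so M_J = 1
                upper += area
    return lower, upper


for k in (4, 16, 64):
    print(k, *darboux_sums_disk(k))
```

As $k$ grows, the two sums squeeze the area of the disk between them, which is the behaviour the remark asks for in a set that has volume.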
Figure 6.15: The geometric quantities represented by $L(\chi_D, \mathbf{P})$ and $U(\chi_D, \mathbf{P})$ when $D$ is the region bounded inside the circle.

Our goal is to give a characterization of sets that are Jordan measurable. We will consider those that have zero volumes first. The following is a useful lemma.

Lemma 6.29
Let $\mathbf{I}$ be a closed rectangle in $\mathbb{R}^n$ that contains the closed rectangles $\mathbf{I}_1, \ldots, \mathbf{I}_k$. There is a partition $\mathbf{P}$ of $\mathbf{I}$ such that if $\mathbf{J}$ is a rectangle in the partition $\mathbf{P}$, then $\mathbf{J}$ is either contained in an $\mathbf{I}_j$ for some $1 \le j \le k$, or $\mathbf{J}$ is disjoint from the interiors of $\mathbf{I}_j$ for all $1 \le j \le k$.

Sketch of Proof
We construct the partition $\mathbf{P} = (P_1, \ldots, P_n)$ in the following way. For each $1 \le i \le n$, the set of partition points $P_i$ consists of the end points of the $i$-th edges of $\mathbf{I}, \mathbf{I}_1, \ldots, \mathbf{I}_k$. One can check that this partition satisfies the requirement. See Figure 6.16 for an illustration.
Figure 6.16: A partition of the rectangle $\mathbf{I}$ that satisfies the conditions in Lemma 6.29.

Let us introduce the definition of a cube.

Definition 6.15 Cubes
A rectangle of the form $\prod_{i=1}^{n} [a_i, b_i]$ such that
$$b_1 - a_1 = b_2 - a_2 = \cdots = b_n - a_n = \ell = 2r$$
is called a (closed) cube with side length $\ell = 2r$. The center of the cube is
$$\mathbf{c} = (c_1, c_2, \ldots, c_n) = \left( \frac{a_1 + b_1}{2}, \frac{a_2 + b_2}{2}, \ldots, \frac{a_n + b_n}{2} \right).$$
We will denote such a cube by $Q_{\mathbf{c}, r}$.

There are also cubes whose edges are not parallel to the coordinate axes. In this chapter, when we say a cube, we always mean a cube as defined above. Now we can give a characterization of sets with zero volume.
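Theorem 6.30
Let $D$ be a bounded subset of $\mathbb{R}^n$. The following are equivalent.
(a) The set $D$ is Jordan measurable and it has zero volume.
(b) For any $\varepsilon > 0$, there are finitely many closed cubes $Q_1, \ldots, Q_k$ such that
$$D \subset \bigcup_{j=1}^{k} Q_j \quad \text{and} \quad \sum_{j=1}^{k} \operatorname{vol}(Q_j) < \varepsilon.$$
(c) For any $\varepsilon > 0$, there are finitely many closed rectangles $\mathbf{I}_1, \ldots, \mathbf{I}_k$ such that
$$D \subset \bigcup_{j=1}^{k} \mathbf{I}_j \quad \text{and} \quad \sum_{j=1}^{k} \operatorname{vol}(\mathbf{I}_j) < \varepsilon.$$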
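Proof
First assume that $D$ is a Jordan measurable set with zero volume. There is a positive number $R$ such that the closed cube $Q_{\mathbf{0}, R} = [-R, R]^n$ contains the set $D$. Let $\mathbf{I} = Q_{\mathbf{0}, R}$. Then the function $\chi_D : \mathbf{I} \to \mathbb{R}$ is Riemann integrable and $\int_{\mathbf{I}} \chi_D = 0$. Given $m \in \mathbb{Z}^+$, let $\mathbf{P}_m$ be the uniformly regular partition of $\mathbf{I}$ into $m^n$ rectangles. Notice that each rectangle in the partition $\mathbf{P}_m$ is a cube. Since $\lim_{m \to \infty} |\mathbf{P}_m| = 0$, we have
$$\lim_{m \to \infty} U(\chi_D, \mathbf{P}_m) = \int_{\mathbf{I}} \chi_D = 0.$$
Given $\varepsilon > 0$, there is a positive integer $M$ such that for all $m \ge M$,
$$U(\chi_D, \mathbf{P}_m) < \varepsilon.$$
Consider the partition $\mathbf{P}_M$. Notice that for $\mathbf{J} \in \mathcal{J}_{\mathbf{P}_M}$,
$$M_{\mathbf{J}}(\chi_D) = \begin{cases} 1, & \text{if } \mathbf{J} \cap D \ne \emptyset, \\ 0, & \text{if } \mathbf{J} \cap D = \emptyset. \end{cases}$$
Let
$$\mathcal{A} = \{\mathbf{J} \in \mathcal{J}_{\mathbf{P}_M} \mid \mathbf{J} \cap D \ne \emptyset\}.$$
Then
$$U(\chi_D, \mathbf{P}_M) = \sum_{\mathbf{J} \in \mathcal{A}} \operatorname{vol}(\mathbf{J}).$$
$\mathcal{A}$ is a finite collection of cubes. Hence, we can name the cubes in $\mathcal{A}$ as $Q_1, \ldots, Q_k$. By construction,
$$D \subset \bigcup_{j=1}^{k} Q_j \quad \text{and} \quad \sum_{j=1}^{k} \operatorname{vol}(Q_j) < \varepsilon.$$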
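This proves that (a) implies (b).

(b) implies (c) is obvious since a cube is a rectangle.

Now assume that (c) holds. Given $\varepsilon > 0$, (c) says that there are closed rectangles $\mathbf{I}_1, \ldots, \mathbf{I}_k$ such that
$$D \subset \bigcup_{j=1}^{k} \mathbf{I}_j \quad \text{and} \quad \sum_{j=1}^{k} \operatorname{vol}(\mathbf{I}_j) < \frac{\varepsilon}{2}.$$
By Lemma 6.18, for each $1 \le j \le k$, there is a closed rectangle $\check{\mathbf{I}}_j$ such that $\mathbf{I}_j \subset \operatorname{int} \check{\mathbf{I}}_j$ and
$$\operatorname{vol}(\check{\mathbf{I}}_j) - \operatorname{vol}(\mathbf{I}_j) < \frac{\varepsilon}{2k}.$$
It follows that
$$D \subset \bigcup_{j=1}^{k} \operatorname{int} \check{\mathbf{I}}_j \quad \text{and} \quad \sum_{j=1}^{k} \operatorname{vol}(\check{\mathbf{I}}_j) < \varepsilon.$$
Let $\mathbf{I}$ be a closed rectangle whose interior contains the bounded set $\bigcup_{j=1}^{k} \check{\mathbf{I}}_j$.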
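By Lemma 6.29, there is a partition $\mathbf{P}$ of $\mathbf{I}$ such that each rectangle $\mathbf{J}$ in the partition $\mathbf{P}$ is either contained in an $\check{\mathbf{I}}_j$ for some $1 \le j \le k$, or is disjoint from the interiors of $\check{\mathbf{I}}_j$ for all $1 \le j \le k$. Let
$$\mathcal{B} = \{\mathbf{J} \in \mathcal{J}_{\mathbf{P}} \mid \mathbf{J} \subset \check{\mathbf{I}}_j \text{ for some } 1 \le j \le k\}.$$
If $\mathbf{J} \notin \mathcal{B}$, then $\mathbf{J} \cap \operatorname{int} \check{\mathbf{I}}_j = \emptyset$ for all $1 \le j \le k$. Therefore, $\mathbf{J} \cap D = \emptyset$. For these $\mathbf{J}$, $M_{\mathbf{J}}(\chi_D) = m_{\mathbf{J}}(\chi_D) = 0$. If $\mathbf{J}$ is in $\mathcal{B}$, we use the simple estimate $M_{\mathbf{J}} \le 1$. Thus,
$$U(\chi_D, \mathbf{P}) = \sum_{\mathbf{J} \in \mathcal{B}} M_{\mathbf{J}} \operatorname{vol}(\mathbf{J}) \le \sum_{\mathbf{J} \in \mathcal{B}} \operatorname{vol}(\mathbf{J}) \le \sum_{j=1}^{k} \operatorname{vol}(\check{\mathbf{I}}_j) < \varepsilon.$$
Since $L(\chi_D, \mathbf{P}) \ge 0$, we find that
$$U(\chi_D, \mathbf{P}) - L(\chi_D, \mathbf{P}) < \varepsilon.$$
This proves that $\chi_D : \mathbf{I} \to \mathbb{R}$ is Riemann integrable. Since we have shown that there exists a partition $\mathbf{P}$ such that $U(\chi_D, \mathbf{P}) < \varepsilon$, we have
$$\operatorname{vol}(D) = \int_{\mathbf{I}} \chi_D \le U(\chi_D, \mathbf{P}) < \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, we find that $\operatorname{vol}(D) = 0$. This completes the proof of (c) implies (a).

Motivated by Theorem 6.30, we make the following definition.

Definition 6.16 Jordan Content Zero
Let $D$ be a bounded subset of $\mathbb{R}^n$. We say that $D$ has Jordan content zero provided that for any $\varepsilon > 0$, there are finitely many closed rectangles $\mathbf{I}_1, \ldots, \mathbf{I}_k$ such that
$$D \subset \bigcup_{j=1}^{k} \mathbf{I}_j \quad \text{and} \quad \sum_{j=1}^{k} \operatorname{vol}(\mathbf{I}_j) < \varepsilon.$$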
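To see Definition 6.16 at work on a set that is not contained in a face of a rectangle, the sketch below covers the diagonal segment $D = \{(t, t) \mid 0 \le t \le 1\}$ in $\mathbb{R}^2$ by $k$ closed squares of side $1/k$, whose total area is $1/k$. The choice of this set and all function names are illustrative assumptions.

```python
def cover_diagonal(eps):
    """Cover D = {(t, t) | 0 <= t <= 1} by k closed squares of side 1/k,
    with k chosen so that the total area 1/k is less than eps."""
    k = 1
    while 1 / k >= eps:
        k += 1
    return [((j / k, (j + 1) / k), (j / k, (j + 1) / k)) for j in range(k)]


def total_area(squares):
    """Sum of the areas of rectangles given as ((a, b), (c, d))."""
    return sum((b - a) * (d - c) for (a, b), (c, d) in squares)


def is_covered(point, squares):
    """True if the point lies in at least one of the closed rectangles."""
    x, y = point
    return any(a <= x <= b and c <= y <= d for (a, b), (c, d) in squares)


squares = cover_diagonal(0.05)
print(len(squares), total_area(squares))
```

Since the construction works for every $\varepsilon > 0$, the diagonal segment has Jordan content zero.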
Sets that have Jordan Content Zero
Let $D$ be a bounded subset of $\mathbb{R}^n$. Theorem 6.30 says that $D$ is Jordan measurable with volume zero if and only if it has Jordan content zero.

The characterization of sets with zero volume given in Theorem 6.30 facilitates the proofs of properties of such sets.

Theorem 6.31
Let $D_1$ and $D_2$ be bounded subsets of $\mathbb{R}^n$. If $D_1$ has Jordan content zero and $D_2 \subset D_1$, then $D_2$ also has Jordan content zero.

Proof
Given $\varepsilon > 0$, since $D_1$ has Jordan content zero, there are closed rectangles $\mathbf{I}_1, \ldots, \mathbf{I}_k$ such that
$$D_1 \subset \bigcup_{j=1}^{k} \mathbf{I}_j \quad \text{and} \quad \sum_{j=1}^{k} \operatorname{vol}(\mathbf{I}_j) < \varepsilon.$$
Since $D_2 \subset D_1$, we find that
$$D_2 \subset \bigcup_{j=1}^{k} \mathbf{I}_j \quad \text{and} \quad \sum_{j=1}^{k} \operatorname{vol}(\mathbf{I}_j) < \varepsilon.$$
Therefore, $D_2$ also has Jordan content zero.

Example 6.25
Let $D$ be the subset of $\mathbb{R}^3$ given by
$$D = \{(x, y, 2) \mid -2 \le x \le 3, \ -5 \le y \le 7\}.$$
Show that $D$ is a Jordan measurable set with zero volume.

Solution
Let $\mathbf{I} = [-2, 3] \times [-5, 7] \times [2, 3]$. Then $\mathbf{I}$ is a closed rectangle in $\mathbb{R}^3$. Example 6.24 says that $\partial \mathbf{I}$ has Jordan content zero. Since $D \subset \partial \mathbf{I}$, Theorem 6.31 says that $D$ has Jordan content zero. Hence, $D$ is a Jordan measurable set with zero volume.

The next theorem concerns unions and intersections of sets of Jordan content zero.

Theorem 6.32
(a) If $\mathcal{A} = \{D_\alpha \mid \alpha \in J\}$ is a collection of sets that have Jordan content zero, then their intersection $U = \bigcap_{\alpha \in J} D_\alpha$ also has Jordan content zero.
(b) If $D_1, \ldots, D_m$ are finitely many sets that have Jordan content zero, then their union $D = \bigcup_{j=1}^{m} D_j$ is also a set that has Jordan content zero.

Proof
(a) is obvious since $U \subset D_\alpha$ for any $\alpha \in J$.
(b) is basically a consequence of the fact that a finite union of finite sets is finite. Given $\varepsilon > 0$, for each $1 \le j \le m$, since $D_j$ has Jordan content zero, there is a finite collection $\mathcal{B}_j = \{\mathbf{I}_{\beta_j} \mid \beta_j \in J_j\}$ of closed rectangles such that
$$D_j \subset \bigcup_{\beta_j \in J_j} \mathbf{I}_{\beta_j}, \qquad \sum_{\beta_j \in J_j} \operatorname{vol}(\mathbf{I}_{\beta_j}) < \frac{\varepsilon}{m}.$$
Let
$$\mathcal{B} = \bigcup_{j=1}^{m} \mathcal{B}_j.$$
Since each $\mathcal{B}_j$, $1 \le j \le m$, is finite, $\mathcal{B}$ is also a finite collection of closed rectangles. Moreover,
$$D = \bigcup_{j=1}^{m} D_j \subset \bigcup_{j=1}^{m} \bigcup_{\beta_j \in J_j} \mathbf{I}_{\beta_j} = \bigcup_{\mathbf{I}_\beta \in \mathcal{B}} \mathbf{I}_\beta,$$
and
$$\sum_{\mathbf{I}_\beta \in \mathcal{B}} \operatorname{vol}(\mathbf{I}_\beta) \le \sum_{j=1}^{m} \sum_{\beta_j \in J_j} \operatorname{vol}(\mathbf{I}_{\beta_j}) < \varepsilon.$$
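This shows that $D$ has Jordan content zero.

Example 6.26
It is obvious that a one-point subset of $\mathbb{R}^n$ has Jordan content zero. It follows that any finite subset of $\mathbb{R}^n$ has Jordan content zero.

Now we want to consider general Jordan measurable sets. We first prove the following two theorems, giving more examples of Riemann integrable functions. The first one is a special case of the second one, but we need to prove it first to prove the second theorem.

Theorem 6.33
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let $f : \mathbf{I} \to \mathbb{R}$ be a bounded function defined on $\mathbf{I}$. If $f : \mathbf{I} \to \mathbb{R}$ is continuous on the interior of $\mathbf{I}$, then $f : \mathbf{I} \to \mathbb{R}$ is Riemann integrable.

Proof
We will show that for any $\varepsilon > 0$, there is a partition $\mathbf{P}$ of $\mathbf{I}$ such that
$$U(f, \mathbf{P}) - L(f, \mathbf{P}) < \varepsilon.$$
Since $f : \mathbf{I} \to \mathbb{R}$ is a bounded function, there is a positive number $M$ such that
$$|f(\mathbf{x})| \le M \quad \text{for all } \mathbf{x} \in \mathbf{I}.$$
By Lemma 6.18, there is a closed rectangle $\check{\mathbf{I}} = \prod_{i=1}^{n} [u_i, v_i]$ contained in the interior of $\mathbf{I}$, such that $\operatorname{vol}(\mathbf{I}) - \operatorname{vol}(\check{\mathbf{I}}) <$ […]

[…] We will show that for any $\varepsilon > 0$, there is a partition $\mathbf{P}$ of $\mathbf{I}$ such that
$$U(f, \mathbf{P}) - L(f, \mathbf{P}) < \varepsilon.$$
Since $f : \mathbf{I} \to \mathbb{R}$ is a bounded function, there is a positive number $M$ such that
$$|f(\mathbf{x})| \le M \quad \text{for all } \mathbf{x} \in \mathbf{I}.$$
Since $\mathcal{N}_f$ is a set of Jordan content zero that is contained in $\mathbf{I}$, there are closed rectangles $\mathbf{I}_1, \ldots, \mathbf{I}_k$ such that
$$\mathcal{N}_f \subset \bigcup_{j=1}^{k} \mathbf{I}_j \quad \text{and} \quad \sum_{j=1}^{k} \operatorname{vol}(\mathbf{I}_j) <$$ […]

[…] $\varepsilon > 0$. Since $f : D \to \mathbb{R}$ is Riemann integrable, there is a partition $\mathbf{P}$ of $\mathbf{I}$ such that
$$U(\check{f}, \mathbf{P}) - L(\check{f}, \mathbf{P}) < \frac{\varepsilon}{2}.$$
Let
$$\eta = \frac{\varepsilon}{4 \operatorname{vol}(\mathbf{I})}.$$
Then $\eta > 0$. Let
$$\mathcal{A} = \{\mathbf{J} \times [m_{\mathbf{J}} - \eta, M_{\mathbf{J}} + \eta] \mid \mathbf{J} \in \mathcal{J}_{\mathbf{P}}\}.$$
Then $\mathcal{A}$ is a finite collection of closed rectangles in $\mathbb{R}^{n+1}$. If $(\mathbf{x}, f(\mathbf{x}))$ is in $G_f$, there is a $\mathbf{J} \in \mathcal{J}_{\mathbf{P}}$ such that $\mathbf{x} \in \mathbf{J}$. Then $m_{\mathbf{J}} \le f(\mathbf{x}) \le M_{\mathbf{J}}$ implies that $(\mathbf{x}, f(\mathbf{x}))$ is in $\mathbf{J} \times [m_{\mathbf{J}} - \eta, M_{\mathbf{J}} + \eta]$. This proves that
$$G_f \subset \bigcup_{\mathbf{K} \in \mathcal{A}} \mathbf{K}.$$
Now,
$$\sum_{\mathbf{K} \in \mathcal{A}} \operatorname{vol}(\mathbf{K}) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} (M_{\mathbf{J}} - m_{\mathbf{J}} + 2\eta) \operatorname{vol}(\mathbf{J}) = U(\check{f}, \mathbf{P}) - L(\check{f}, \mathbf{P}) + 2\eta \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} \operatorname{vol}(\mathbf{J}) < \frac{\varepsilon}{2} + 2\eta \operatorname{vol}(\mathbf{I}) = \varepsilon.$$

[…]

[…] $r > 0$ such that $B(\mathbf{u}, r) \subset D$. There exists $\delta > 0$ such that the rectangle $\mathbf{I}_\delta = \prod_{i=1}^{n} [u_i - \delta, u_i + \delta]$ is contained in $B(\mathbf{u}, r)$, and hence in $D$. Since $D$ has Jordan content zero, we find that $\mathbf{I}_\delta$ also has Jordan content zero. But $\operatorname{vol}(\mathbf{I}_\delta) = (2\delta)^n > 0$. This gives a contradiction.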
The next one is the mean value theorem for integrals.

Theorem 6.52 Mean Value Theorem for Integrals
Let $D$ be a closed and bounded Jordan measurable set in $\mathbb{R}^n$, and let $f : D \to \mathbb{R}$ be a continuous function. If $D$ is connected or path-connected, then there is a point $\mathbf{x}_0$ in $D$ such that
$$\int_D f(\mathbf{x})\,d\mathbf{x} = f(\mathbf{x}_0) \operatorname{vol}(D).$$

Proof
Since $D$ is compact and $f : D \to \mathbb{R}$ is continuous, the extreme value theorem asserts that there exist points $\mathbf{u}$ and $\mathbf{v}$ in $D$ such that
$$f(\mathbf{u}) \le f(\mathbf{x}) \le f(\mathbf{v}) \quad \text{for all } \mathbf{x} \in D.$$
Since $D$ is a Jordan measurable set and $f : D \to \mathbb{R}$ is continuous, $f : D \to \mathbb{R}$ is Riemann integrable. The monotonicity theorem implies that
$$f(\mathbf{u}) \operatorname{vol}(D) \le \int_D f(\mathbf{x})\,d\mathbf{x} \le f(\mathbf{v}) \operatorname{vol}(D).$$
If $\operatorname{vol}(D) = 0$, we can take $\mathbf{x}_0$ to be any point in $D$. If $\operatorname{vol}(D) \ne 0$, notice that
$$c = \frac{1}{\operatorname{vol}(D)} \int_D f(\mathbf{x})\,d\mathbf{x}$$
satisfies
$$f(\mathbf{u}) \le c \le f(\mathbf{v}).$$
Since $D$ is connected or path-connected, and $f : D \to \mathbb{R}$ is continuous, the intermediate value theorem asserts that $f(D)$ must be an interval. Since $f(\mathbf{u})$ and $f(\mathbf{v})$ are in $f(D)$ and $c$ is in between them, $c$ must also be in $f(D)$. This means that there is an $\mathbf{x}_0$ in $D$ such that
$$\frac{1}{\operatorname{vol}(D)} \int_D f(\mathbf{x})\,d\mathbf{x} = c = f(\mathbf{x}_0).$$
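The proof can be followed numerically on a simple case. The sketch below takes $D = [0, 1] \times [0, 2]$ and $f(x, y) = x + y$, approximates the integral by a midpoint Riemann sum, and then locates a point $\mathbf{x}_0$ by bisection along the segment joining a minimizer $\mathbf{u} = (0, 0)$ to a maximizer $\mathbf{v} = (1, 2)$, which mirrors the use of the intermediate value theorem along a path. The domain, the function, and the helper names are illustrative assumptions, and the integral is only approximated.

```python
def midpoint_riemann_sum(f, a, b, c, d, k):
    """Riemann sum of f over [a, b] x [c, d] for the uniformly regular
    partition into k*k rectangles, with midpoints as intermediate points."""
    hx, hy = (b - a) / k, (d - c) / k
    return sum(
        f(a + (i + 0.5) * hx, c + (j + 0.5) * hy) * hx * hy
        for i in range(k)
        for j in range(k)
    )


def f(x, y):
    return x + y


a, b, c, d = 0.0, 1.0, 0.0, 2.0
vol_D = (b - a) * (d - c)
integral = midpoint_riemann_sum(f, a, b, c, d, 100)
mean = integral / vol_D          # the number c in the proof

# f is smallest at u = (a, c) and largest at v = (b, d).
# Bisect along the segment from u to v for a point where f equals the mean.
lo, hi = 0.0, 1.0
for _ in range(60):
    t = (lo + hi) / 2
    if f(a + t * (b - a), c + t * (d - c)) < mean:
        lo = t
    else:
        hi = t
x0 = (a + hi * (b - a), c + hi * (d - c))
print(x0, f(*x0) * vol_D, integral)
```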
Exercises 6.3

Question 1
Explain why the set
$$D = \{(x, y, z) \mid 4x^2 + y^2 + 9z^2 < 36\}$$
is Jordan measurable.

Question 2
Explain why the set
$$D = \{(x, y, z) \mid x \ge 0, \ y \ge 0, \ z \ge 0, \ 3x + 4y + 6z \le 12\}$$
is Jordan measurable.

Question 3
Let $D = \{(x, y, z) \mid x^2 + y^2 + z^2 = 25\}$, and let $f : D \to \mathbb{R}$ be the function defined as
$$f(x, y, z) = \begin{cases} 1, & \text{if } x, y \text{ and } z \text{ are rational}, \\ 0, & \text{otherwise}. \end{cases}$$
Explain why $f : D \to \mathbb{R}$ is Riemann integrable and find $\int_D f$.

Question 4
Let $D = \{(x, y, z) \mid x^2 + y^2 + z^2 \le 25\}$, and let $f : D \to \mathbb{R}$ be the function defined as
$$f(x, y, z) = z e^{|xy|}.$$
Explain why $f : D \to \mathbb{R}$ is Riemann integrable.

Question 5
Let $D = [0, 2] \times (-2, 5) \times (1, 7]$, and let $f : D \to \mathbb{R}$ be the function defined as
$$f(x, y, z) = \begin{cases} x + y, & \text{if } x < y + z, \\ 2x - y, & \text{if } x \ge y + z. \end{cases}$$
Explain why $f : D \to \mathbb{R}$ is Riemann integrable.

Question 6
Let $D = \{(x, y, z) \mid 4x^2 + 9y^2 \le 36, \ 0 \le z \le x^2 + y^2\}$, and let $f : D \to \mathbb{R}$ be the function defined as
$$f(x, y, z) = \begin{cases} x, & \text{if } x < y + z, \\ y + z, & \text{if } x \ge y + z. \end{cases}$$
Explain why $f : D \to \mathbb{R}$ is Riemann integrable.

Question 7
If $D$ is a Jordan measurable set that is contained in the closed rectangle $\mathbf{I}$, show that $\mathbf{I} \setminus D$ is also Jordan measurable. Moreover,
$$\operatorname{vol}(\mathbf{I} \setminus D) = \operatorname{vol}(\mathbf{I}) - \operatorname{vol}(D).$$

Question 8
If $D_1$ and $D_2$ are Jordan measurable sets and $D_2$ is contained in $D_1$, show that $D_1 \setminus D_2$ is also Jordan measurable. Moreover,
$$\operatorname{vol}(D_1 \setminus D_2) = \operatorname{vol}(D_1) - \operatorname{vol}(D_2).$$

Question 9
If $D$ is a Jordan measurable set, show that $\operatorname{int} D$ is also Jordan measurable. Moreover,
$$\operatorname{vol}(\operatorname{int} D) = \operatorname{vol}(D).$$

Question 10
Let $D_1$ and $D_2$ be Jordan measurable sets in $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively. Assume that $D_1$ has Jordan content zero. Show that the set $D = D_1 \times D_2$ in $\mathbb{R}^{m+n}$ also has Jordan content zero.

Question 11
Let $D = \{(x, y) \mid x^2 + y^2 \le 9\}$. Show that the integral $\int_D x\,dx\,dy$ exists and is equal to 0.

Question 12
Let $D$ be a Jordan measurable set, and let $f : D \to \mathbb{R}$ be a Riemann integrable function. If $g : D \to \mathbb{R}$ is a bounded function such that $g(\mathbf{x}) = f(\mathbf{x})$ for all $\mathbf{x}$ in $D$, show that $g : D \to \mathbb{R}$ is Riemann integrable and
$$\int_D g = \int_D f.$$
6.4 Iterated Integrals and Fubini's Theorem
In Section 6.3, we have given a sufficient condition for a function $f : D \to \mathbb{R}$ to be Riemann integrable.

Riemann Integrable Functions
If $D$ is a subset of $\mathbb{R}^n$ such that a constant function on $D$ is Riemann integrable, then any bounded function on $D$ whose set of discontinuities is a set that has Jordan content zero is Riemann integrable.

However, we have not discussed any strategy to compute a Riemann integral, except by using a sequence of partitions $\{\mathbf{P}_k\}$ with $\lim_{k \to \infty} |\mathbf{P}_k| = 0$. This is a practical approach if one has a computer, but it is not feasible for hand calculations. Besides, it might also be difficult for us to understand the dependence of the integral on the parameters in the integrand.

When $n = 1$, we have seen that the fundamental theorem of calculus gives us a powerful tool to calculate a Riemann integral when the integrand is a continuous function that has explicit antiderivatives. To be able to apply this powerful tool in the multivariable context, we need to relate multiple integrals with iterated integrals. This is the topic that is studied in this section.

As a motivation, consider a continuous function $f : [a, b] \times [c, d] \to \mathbb{R}$ defined on the closed rectangle $\mathbf{I} = [a, b] \times [c, d]$ in $\mathbb{R}^2$. If $\mathbf{P} = (P_1, P_2)$ is a partition of $\mathbf{I}$ with $P_1 = \{x_0, x_1, \ldots, x_k\}$ and $P_2 = \{y_0, y_1, \ldots, y_l\}$, there are $kl$ rectangles in the partition $\mathbf{P}$. Denote the rectangles by $\mathbf{J}_{i,j}$ with $1 \le i \le k$ and $1 \le j \le l$, where $\mathbf{J}_{i,j} = [x_{i-1}, x_i] \times [y_{j-1}, y_j]$. Choose a set of intermediate points $A = \{\alpha_i\}$ for the partition $P_1$, and a set of intermediate points $B = \{\beta_j\}$ for the partition $P_2$. Let
$$\boldsymbol{\xi}_{i,j} = (\alpha_i, \beta_j) \quad \text{for } 1 \le i \le k, \ 1 \le j \le l.$$
Chapter 6. Multiple Integrals
432
Then C = ξ i,j | 1 ≤ i ≤ k, 1 ≤ j ≤ l is a choice of intermediate points for the partition P. The Riemann sum R(f, P, C) is given by R(f, P, C) =
k X l X
f (αi , βj )(xi − xi−1 )(yj − yj−1 ).
(6.4)
i=1 j=1
Since it is a finite sum, it does not matter in which order we perform the summation. For fixed x ∈ [a, b], let $g_x : [c, d] \to \mathbb{R}$ be the function
\[ g_x(y) = f(x, y), \quad y \in [c, d]. \]
If we perform the sum over j in (6.4) first, we find that
\[ R(f, \mathbf{P}, C) = \sum_{i=1}^{k} \left( \sum_{j=1}^{l} g_{\alpha_i}(\beta_j)(y_j - y_{j-1}) \right) (x_i - x_{i-1}) = \sum_{i=1}^{k} R(g_{\alpha_i}, P_2, B)(x_i - x_{i-1}). \]
Since $g_{\alpha_i} : [c, d] \to \mathbb{R}$ is continuous, it is Riemann integrable. Therefore,
\[ \lim_{|P_2| \to 0} R(g_{\alpha_i}, P_2, B) = \int_c^d g_{\alpha_i}(y)\,dy. \]
This prompts us to define the function F : [a, b] → R by
\[ F(x) = \int_c^d g_x(y)\,dy = \int_c^d f(x, y)\,dy. \]
Then
\[ \lim_{|P_2| \to 0} R(f, \mathbf{P}, C) = \sum_{i=1}^{k} \lim_{|P_2| \to 0} R(g_{\alpha_i}, P_2, B)(x_i - x_{i-1}) = \sum_{i=1}^{k} F(\alpha_i)(x_i - x_{i-1}) = R(F, P_1, A). \]
If F : [a, b] → R is also Riemann integrable, we would have
\[ \lim_{|P_1| \to 0} \lim_{|P_2| \to 0} R(f, \mathbf{P}, C) = \lim_{|P_1| \to 0} R(F, P_1, A) = \int_a^b F(x)\,dx = \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx. \]
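The reordering of the finite sum in (6.4) can be checked numerically. The following Python sketch is only an illustration and is not part of the original text: the integrand, the rectangle [0, 1] × [0, 2], the uniform partitions and the midpoint choice of intermediate points are all assumptions made for the example.

```python
import math

def uniform_partition(a, b, n):
    """Partition points a = p_0 < p_1 < ... < p_n = b with equal spacing."""
    return [a + (b - a) * t / n for t in range(n + 1)]

def midpoints(points):
    """One intermediate point per subinterval (here: its midpoint)."""
    return [(p + q) / 2 for p, q in zip(points, points[1:])]

def sum_j_then_i(f, xs, ys, alphas, betas):
    """Riemann sum (6.4), summing over j first and then over i."""
    total = 0.0
    for i in range(1, len(xs)):
        inner = sum(f(alphas[i - 1], betas[j - 1]) * (ys[j] - ys[j - 1])
                    for j in range(1, len(ys)))
        total += inner * (xs[i] - xs[i - 1])
    return total

def sum_i_then_j(f, xs, ys, alphas, betas):
    """The same Riemann sum, summing over i first and then over j."""
    total = 0.0
    for j in range(1, len(ys)):
        inner = sum(f(alphas[i - 1], betas[j - 1]) * (xs[i] - xs[i - 1])
                    for i in range(1, len(xs)))
        total += inner * (ys[j] - ys[j - 1])
    return total

f = lambda x, y: math.exp(x) * y          # sample integrand (an assumption)
xs = uniform_partition(0.0, 1.0, 200)     # plays the role of P1
ys = uniform_partition(0.0, 2.0, 300)     # plays the role of P2
A, B = midpoints(xs), midpoints(ys)

s1 = sum_j_then_i(f, xs, ys, A, B)
s2 = sum_i_then_j(f, xs, ys, A, B)
iterated = 2.0 * (math.e - 1.0)           # value of the iterated integral of e^x * y
```

For this sample, both orders of summation agree up to rounding, and both are close to the iterated integral.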
Interchanging the roles of x and y, or equivalently, summing over i first in (6.4), we find that
\[ \lim_{|P_2| \to 0} \lim_{|P_1| \to 0} R(f, \mathbf{P}, C) = \int_c^d \left( \int_a^b f(x, y)\,dx \right) dy. \]
Figure 6.25: Given a partition P of a rectangle, one can sum over the rectangles row by row, or column by column.

Since
\[ |\mathbf{P}| = \sqrt{|P_1|^2 + |P_2|^2}, \]
$|\mathbf{P}| \to 0$ if and only if $(|P_1|, |P_2|) \to (0, 0)$. The question becomes whether the two limits
\[ \lim_{|P_1| \to 0} \lim_{|P_2| \to 0} R(f, \mathbf{P}, C) \quad \text{and} \quad \lim_{|P_2| \to 0} \lim_{|P_1| \to 0} R(f, \mathbf{P}, C) \]
are equal; and whether they are equal to the limit
\[ \lim_{(|P_1|, |P_2|) \to (0, 0)} R(f, \mathbf{P}, C). \]
Remark 6.6
Let $f : \mathbb{R}^2 \setminus \{(0, 0)\} \to \mathbb{R}$ be the function defined as
\[ f(x, y) = \frac{x^2}{x^2 + y^2}, \quad (x, y) \in \mathbb{R}^2 \setminus \{(0, 0)\}. \]
We find that
\[ \lim_{y \to 0} \lim_{x \to 0} f(x, y) = \lim_{y \to 0} 0 = 0, \qquad \lim_{x \to 0} \lim_{y \to 0} f(x, y) = \lim_{x \to 0} 1 = 1. \]
Hence,
\[ \lim_{y \to 0} \lim_{x \to 0} f(x, y) \ne \lim_{x \to 0} \lim_{y \to 0} f(x, y). \]
This example shows that we cannot simply interchange the order of limits. In fact, the limit
\[ \lim_{(x, y) \to (0, 0)} f(x, y) \]
does not exist.

The integrals
\[ \int_c^d \left( \int_a^b f(x, y)\,dx \right) dy \quad \text{and} \quad \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx \]
are called iterated integrals.

Definition 6.17 Iterated Integrals
Let n be a positive integer larger than 1, and let k be a positive integer less than n. Denote a point in Rn by (x, y), where x ∈ Rk and y ∈ Rn−k. Given that $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ is a closed rectangle in Rn, let
\[ \mathbf{I}_x = \prod_{i=1}^{k} [a_i, b_i] \quad \text{and} \quad \mathbf{I}_y = \prod_{i=k+1}^{n} [a_i, b_i]. \]
If f : I → R is a bounded function defined on I, an iterated integral is an integral of the form
\[ \int_{\mathbf{I}_y} \int_{\mathbf{I}_x} f(\mathbf{x}, \mathbf{y})\,d\mathbf{x}\,d\mathbf{y} \quad \text{or} \quad \int_{\mathbf{I}_x} \int_{\mathbf{I}_y} f(\mathbf{x}, \mathbf{y})\,d\mathbf{y}\,d\mathbf{x}, \]
whenever they exist.

Let us consider the following example.
Example 6.39
Let g : [a, b] → R and h : [a, b] → R be continuous functions defined on [a, b] such that g(x) ≤ h(x) for all x ∈ [a, b]. Consider the set D defined as
\[ D = \{(x, y) \mid a \le x \le b,\ g(x) \le y \le h(x)\}. \]
Let $\chi_D : \mathbb{R}^2 \to \mathbb{R}$ be the corresponding characteristic function. If
\[ c \le g(x) \le h(x) \le d \quad \text{for all } x \in [a, b], \]
then I = [a, b] × [c, d] is a closed rectangle that contains D. We have seen that D is a Jordan measurable set. Hence, $\chi_D : \mathbf{I} \to \mathbb{R}$ is a Riemann integrable function. For any x ∈ [a, b],
\[ \int_c^d \chi_D(x, y)\,dy = \int_{g(x)}^{h(x)} dy = h(x) - g(x). \]
Therefore, the iterated integral $\int_a^b \int_c^d \chi_D(x, y)\,dy\,dx$ is equal to
\[ \int_a^b \int_c^d \chi_D(x, y)\,dy\,dx = \int_a^b (h(x) - g(x))\,dx. \]
In single variable calculus, we have learned that the integral $\int_a^b (h(x) - g(x))\,dx$ gives the area of D. Thus, in this case, we have
\[ \int_a^b \int_c^d \chi_D(x, y)\,dy\,dx = \operatorname{vol}(D) = \int_{[a,b] \times [c,d]} \chi_D(x, y)\,dxdy. \]
Namely, the iterated integral is equal to the double integral. The following theorem is the general case when n = 2.
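The equality in Example 6.39 can be illustrated numerically. The Python sketch below is an illustration only: the curves g(x) = x², h(x) = x + 1 and the rectangle [0, 1] × [0, 2] are hypothetical choices, and the double integral of the characteristic function is approximated by a Riemann sum on a uniform grid with midpoints as intermediate points.

```python
def chi_D(x, y, g, h):
    """Characteristic function of D = {(x, y) : g(x) <= y <= h(x)}."""
    return 1.0 if g(x) <= y <= h(x) else 0.0

g = lambda x: x * x        # hypothetical lower curve
h = lambda x: x + 1.0      # hypothetical upper curve
a, b, c, d = 0.0, 1.0, 0.0, 2.0   # c <= g <= h <= d holds on [a, b]

n = 400
dx, dy = (b - a) / n, (d - c) / n

# Riemann sum of chi_D over the rectangle [a, b] x [c, d]
double_sum = sum(chi_D(a + (i + 0.5) * dx, c + (j + 0.5) * dy, g, h)
                 for i in range(n) for j in range(n)) * dx * dy

# Riemann sum of h - g over [a, b]
single_sum = sum(h(a + (i + 0.5) * dx) - g(a + (i + 0.5) * dx)
                 for i in range(n)) * dx

exact_area = 7.0 / 6.0     # integral of (x + 1 - x^2) over [0, 1]
```

The grid sum of the characteristic function converges slowly, since the function jumps across the boundary of D, but both sums approach the same area.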
Figure 6.26: The domain D = {(x, y) | a ≤ x ≤ b, g(x) ≤ y ≤ h(x)}.

Theorem 6.53 Fubini's Theorem in the Plane
Let I = [a, b] × [c, d], and let f : I → R be a Riemann integrable function. For each x ∈ [a, b], define the function $g_x : [c, d] \to \mathbb{R}$ by
\[ g_x(y) = f(x, y), \quad y \in [c, d]. \]
If $g_x : [c, d] \to \mathbb{R}$ is Riemann integrable for each x ∈ [a, b], let F : [a, b] → R be the function defined as
\[ F(x) = \int_c^d g_x(y)\,dy = \int_c^d f(x, y)\,dy. \]
Then we have the following.
(a) The function F : [a, b] → R is Riemann integrable.
(b) The integral of f : [a, b] × [c, d] → R is equal to the integral of F : [a, b] → R. Namely,
\[ \int_{[a,b] \times [c,d]} f = \int_a^b F. \]
Equivalently,
\[ \int_{[a,b] \times [c,d]} f(x, y)\,dxdy = \int_a^b \int_c^d f(x, y)\,dy\,dx. \]
Proof
Let
\[ I = \int_{[a,b] \times [c,d]} f(x, y)\,dxdy. \]
We will show that for any ε > 0, there exists δ > 0 such that if P is a partition of [a, b] with |P| < δ, and $A = \{\alpha_i\}$ is any set of intermediate points for P, then
\[ |R(F, P, A) - I| < \varepsilon. \]
This will prove both (a) and (b).

Fix ε > 0. Since f : [a, b] × [c, d] → R is Riemann integrable with integral I, there exists $\delta_0 > 0$ such that if $\mathbf{P} = (P_1, P_2)$ is a partition of [a, b] × [c, d] with $|\mathbf{P}| < \delta_0$, then
\[ U(f, \mathbf{P}) - L(f, \mathbf{P}) < \varepsilon. \]
Take $\delta = \delta_0 / 2$. Let $P = \{x_0, x_1, \ldots, x_k\}$ be a partition of [a, b] with |P| < δ. Take any partition $P_2 = \{y_0, y_1, \ldots, y_l\}$ of [c, d] such that $|P_2| < \delta$. Let $\mathbf{P} = (P_1, P_2)$, where $P_1 = P$. Then $|\mathbf{P}| < \sqrt{2}\,\delta < \delta_0$. For 1 ≤ i ≤ k, 1 ≤ j ≤ l, let
\[ m_{i,j} = \inf_{(x,y) \in [x_{i-1}, x_i] \times [y_{j-1}, y_j]} f(x, y), \qquad M_{i,j} = \sup_{(x,y) \in [x_{i-1}, x_i] \times [y_{j-1}, y_j]} f(x, y). \]
Then
\[ L(f, \mathbf{P}) = \sum_{i=1}^{k} \sum_{j=1}^{l} m_{i,j}(x_i - x_{i-1})(y_j - y_{j-1}), \qquad U(f, \mathbf{P}) = \sum_{i=1}^{k} \sum_{j=1}^{l} M_{i,j}(x_i - x_{i-1})(y_j - y_{j-1}). \]
Now let $A = \{\alpha_i\}$ be any choice of intermediate points for the partition $P = P_1$. Notice that for any 1 ≤ i ≤ k, the additivity theorem says that
\[ F(\alpha_i) = \sum_{j=1}^{l} \int_{y_{j-1}}^{y_j} g_{\alpha_i}(y)\,dy = \sum_{j=1}^{l} \int_{y_{j-1}}^{y_j} f(\alpha_i, y)\,dy. \]
Since
\[ m_{i,j} \le f(\alpha_i, y) \le M_{i,j} \quad \text{for all } y \in [y_{j-1}, y_j], \]
we find that for 1 ≤ j ≤ l,
\[ m_{i,j}(y_j - y_{j-1}) \le \int_{y_{j-1}}^{y_j} f(\alpha_i, y)\,dy \le M_{i,j}(y_j - y_{j-1}). \]
It follows that
\[ \sum_{j=1}^{l} m_{i,j}(y_j - y_{j-1}) \le F(\alpha_i) \le \sum_{j=1}^{l} M_{i,j}(y_j - y_{j-1}). \]
Multiplying by $(x_i - x_{i-1})$ and summing over i from 1 to k, we find that
\[ L(f, \mathbf{P}) \le \sum_{i=1}^{k} F(\alpha_i)(x_i - x_{i-1}) \le U(f, \mathbf{P}). \]
In other words,
\[ L(f, \mathbf{P}) \le R(F, P, A) \le U(f, \mathbf{P}). \]
Since we also have
\[ L(f, \mathbf{P}) \le I \le U(f, \mathbf{P}), \]
we find that
\[ |R(F, P, A) - I| \le U(f, \mathbf{P}) - L(f, \mathbf{P}) < \varepsilon. \]
This completes the proof.

Example 6.40
Evaluate the integral $\int_{[0,1] \times [0,1]} x \sin(xy)\,dxdy$.

Solution
The function f : [0, 1] × [0, 1] → R, f(x, y) = x sin(xy) is a continuous function. Hence, it is Riemann integrable.
For each x ∈ [0, 1], the function $g_x : [0, 1] \to \mathbb{R}$, $g_x(y) = x \sin(xy)$ is also continuous. Hence, $g_x : [0, 1] \to \mathbb{R}$ is Riemann integrable. By Fubini's theorem,
\[
\begin{aligned}
\int_{[0,1] \times [0,1]} x \sin(xy)\,dxdy &= \int_0^1 \int_0^1 x \sin(xy)\,dy\,dx \\
&= \int_0^1 \big[ -\cos(xy) \big]_{y=0}^{y=1}\,dx \\
&= \int_0^1 (1 - \cos x)\,dx \\
&= 1 - [\sin x]_0^1 = 1 - \sin 1.
\end{aligned}
\]
The roles of x and y in Fubini's theorem can be interchanged, and we obtain the following.

Corollary 6.54
Assume that f : [a, b] × [c, d] → R is a Riemann integrable function such that for each x ∈ [a, b], the function $g_x : [c, d] \to \mathbb{R}$, $g_x(y) = f(x, y)$ is Riemann integrable; and for each y ∈ [c, d], the function $h_y : [a, b] \to \mathbb{R}$, $h_y(x) = f(x, y)$ is Riemann integrable. Then we can interchange the order of integration. Namely,
\[ \int_c^d \int_a^b f(x, y)\,dx\,dy = \int_a^b \int_c^d f(x, y)\,dy\,dx. \]
Example 6.41
If we evaluate the iterated integral $\int_0^1 \int_0^1 x \sin(xy)\,dx\,dy$ directly, it would be quite tedious as we need to apply integration by parts to evaluate the integral $\int_0^1 x \sin(xy)\,dx$. Using Corollary 6.54, we can interchange the order of integration and obtain
\[ \int_0^1 \int_0^1 x \sin(xy)\,dx\,dy = \int_0^1 \int_0^1 x \sin(xy)\,dy\,dx = 1 - \sin 1. \]
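As a numerical sanity check of Examples 6.40 and 6.41, the two iterated integrals can be approximated directly. The sketch below is an illustration, not part of the original text; the helper `midpoint` is a hypothetical name for a composite midpoint rule, and the number of subintervals is an arbitrary choice.

```python
import math

def midpoint(f, a, b, n=400):
    """Composite midpoint rule for a single-variable integral over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

f = lambda x, y: x * math.sin(x * y)

# Integrate over y first, then over x
dy_then_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, 1.0), 0.0, 1.0)

# Integrate over x first, then over y
dx_then_dy = midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, 1.0), 0.0, 1.0)

exact = 1.0 - math.sin(1.0)
```

Both approximations agree with 1 − sin 1 to within the accuracy of the quadrature rule, even though only one of the two orders is convenient by hand.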
Remark 6.7
The assumption that f : [a, b] × [c, d] → R is Riemann integrable is essential in Fubini's theorem. It does not follow from the fact that for each x ∈ [a, b], the function $g_x : [c, d] \to \mathbb{R}$ is Riemann integrable, and the function F : [a, b] → R,
\[ F(x) = \int_c^d g_x(y)\,dy \]
is Riemann integrable. For example, let g : [−1, 1] → R and h : [−1, 1] → R be the functions defined as
\[ g(x) = \begin{cases} 1, & \text{if } x \text{ is rational}, \\ -1, & \text{if } x \text{ is irrational}, \end{cases} \qquad h(y) = \begin{cases} 1, & \text{if } y \ge 0, \\ -1, & \text{if } y < 0. \end{cases} \]
Then define the function f : [−1, 1] × [−1, 1] → R by
\[ f(x, y) = g(x) h(y). \]
Since h : [−1, 1] → R is a step function, it is Riemann integrable and
\[ \int_{-1}^{1} h(y)\,dy = \int_{-1}^{0} h(y)\,dy + \int_0^1 h(y)\,dy = 0. \]
Hence, for fixed x ∈ [−1, 1],
\[ \int_{-1}^{1} f(x, y)\,dy = 0. \]
Thus, the function F : [−1, 1] → R,
\[ F(x) = \int_{-1}^{1} f(x, y)\,dy, \]
being a function that is always zero, is Riemann integrable with integral 0. It follows that
\[ \int_{-1}^{1} \int_{-1}^{1} f(x, y)\,dy\,dx = 0. \]
However, one can prove that the function f : [−1, 1] × [−1, 1] → R is not Riemann integrable, in the same way that we show that the Dirichlet function is not Riemann integrable.
The fact that for each x ∈ [a, b], the function $g_x : [c, d] \to \mathbb{R}$, $g_x(y) = f(x, y)$ is Riemann integrable also does not follow from the fact that f : [a, b] × [c, d] → R is Riemann integrable. Consider for example the function f : [−1, 1] × [0, 1] → R,
\[ f(x, y) = \begin{cases} 1, & \text{if } x = 0 \text{ and } y \text{ is rational}, \\ 0, & \text{otherwise}. \end{cases} \]
Then the set of discontinuities N of f : [−1, 1] × [0, 1] → R is the line segment between the point (0, 0) and the point (0, 1). Hence, N has Jordan content 0. Therefore, f : [−1, 1] × [0, 1] → R is Riemann integrable. For x = 0, $g_0 : [0, 1] \to \mathbb{R}$ is the Dirichlet function. Hence, $g_0 : [0, 1] \to \mathbb{R}$ is not Riemann integrable.

Now we consider the case depicted in Example 6.39 for more general functions.

Theorem 6.55
Let g : [a, b] → R and h : [a, b] → R be continuous functions defined on [a, b] such that g(x) ≤ h(x) for all x ∈ [a, b], and let D be the set
\[ D = \{(x, y) \mid a \le x \le b,\ g(x) \le y \le h(x)\}. \]
If f : D → R is a continuous function, then
\[ \int_D f(x, y)\,dxdy = \int_a^b \int_{g(x)}^{h(x)} f(x, y)\,dy\,dx. \]

Proof
Since g and h are continuous functions, there exist numbers c and d such that c ≤ g(x) ≤ h(x) ≤ d for all x ∈ [a, b]. Then I = [a, b] × [c, d] is a closed rectangle that contains D. We have shown before that D is a Jordan measurable set and f : D → R is Riemann integrable. Therefore, the zero extension $\check{f} : \mathbf{I} \to \mathbb{R}$ is Riemann integrable.
On the other hand, for each x ∈ [a, b], the function $g_x : [c, d] \to \mathbb{R}$ is a piecewise continuous function given by
\[ g_x(y) = \begin{cases} 0, & \text{if } c \le y < g(x), \\ f(x, y), & \text{if } g(x) \le y \le h(x), \\ 0, & \text{if } h(x) < y \le d. \end{cases} \]
Hence, $g_x : [c, d] \to \mathbb{R}$ is Riemann integrable and for x ∈ [a, b],
\[ F(x) = \int_c^d g_x(y)\,dy = \int_{g(x)}^{h(x)} f(x, y)\,dy. \]
By Fubini's theorem in the plane, the function F : [a, b] → R is Riemann integrable, and
\[ \int_a^b \int_{g(x)}^{h(x)} f(x, y)\,dy\,dx = \int_a^b F(x)\,dx = \int_{\mathbf{I}} \check{f}(x, y)\,dxdy = \int_D f(x, y)\,dxdy. \]
Again, the roles of x and y in Theorem 6.55 can be interchanged. Let us look at the following example.

Example 6.42
Let D be the region in the plane bounded between the curve $y^2 = x$ and the line L between the points (1, 1) and (4, −2). Evaluate the integral $\int_D y\,dxdy$.

Solution
The equation of the line L is x + y = 2. Hence,
\[ D = \{(x, y) \mid -2 \le y \le 1,\ y^2 \le x \le 2 - y\}. \]
Using Fubini's theorem, we find that
\[
\begin{aligned}
\int_D y\,dxdy &= \int_{-2}^{1} \int_{y^2}^{2-y} y\,dx\,dy = \int_{-2}^{1} y(2 - y - y^2)\,dy \\
&= \int_{-2}^{1} (2y - y^2 - y^3)\,dy = \left[ y^2 - \frac{y^3}{3} - \frac{y^4}{4} \right]_{-2}^{1} = -\frac{9}{4}.
\end{aligned}
\]
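The arithmetic in Example 6.42 can be double-checked with a short script. This is only an illustrative sketch: it evaluates the antiderivative exactly with rational arithmetic, and separately approximates the iterated integral with a hypothetical midpoint-rule helper.

```python
from fractions import Fraction

def antiderivative(y):
    """Antiderivative of 2y - y^2 - y^3, the integrand left after integrating over x."""
    y = Fraction(y)
    return y**2 - y**3 / 3 - y**4 / 4

exact = antiderivative(1) - antiderivative(-2)

def midpoint(f, a, b, n):
    """Composite midpoint rule for a single-variable integral over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Iterated integral: x runs from y^2 to 2 - y, then y runs from -2 to 1
numeric = midpoint(lambda y: midpoint(lambda x: y, y * y, 2.0 - y, 10), -2.0, 1.0, 300)
```

The exact value is the fraction −9/4, and the numerical approximation is close to −2.25.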
Figure 6.27: The domain D = {(x, y) | −2 ≤ y ≤ 1, y² ≤ x ≤ 2 − y}.

In Example 6.42, the computation is harder if one prefers to integrate over y first.

Example 6.43
Let $D = \{(x, y) \mid x \ge 0,\ y \ge 0,\ 4x^2 + 9y^2 \le 36\}$. Evaluate the integral $\int_D x\,dxdy$.

Solution
The set D can be expressed in two different ways:
\[ D = \left\{ (x, y) \,\middle|\, 0 \le x \le 3,\ 0 \le y \le \frac{1}{3}\sqrt{36 - 4x^2} \right\}, \]
\[ D = \left\{ (x, y) \,\middle|\, 0 \le y \le 2,\ 0 \le x \le \frac{1}{2}\sqrt{36 - 9y^2} \right\}. \]
The function f : D → R, f(x, y) = x is continuous. Hence, the integral $\int_D x\,dxdy$ is equal to either iterated integral: we can integrate with respect to x first, or with respect to y first. If we integrate with respect to y first, we find that
\[ \int_D x\,dxdy = \int_0^3 \int_0^{\frac{1}{3}\sqrt{36 - 4x^2}} x\,dy\,dx = \frac{1}{3} \int_0^3 x \sqrt{36 - 4x^2}\,dx. \]
This integral needs to be computed using integration by substitution. If we integrate over x first, we find that
\[ \int_D x\,dxdy = \int_0^2 \int_0^{\frac{1}{2}\sqrt{36 - 9y^2}} x\,dx\,dy = \int_0^2 \left[ \frac{x^2}{2} \right]_{x=0}^{x=\frac{1}{2}\sqrt{36 - 9y^2}} dy = \frac{1}{8} \int_0^2 (36 - 9y^2)\,dy. \]
This integral can be easily evaluated to give
\[ \int_D x\,dxdy = \frac{1}{8} \left[ 36y - 3y^3 \right]_0^2 = 6. \]
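Both orders of integration in Example 6.43 can be compared numerically. The sketch below is an illustration under stated assumptions: `midpoint` is a hypothetical helper implementing the composite midpoint rule, and the subinterval counts are arbitrary. More subintervals are used for the outer integral in the y-first order, because the square root makes that integrand less smooth near x = 3.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule for a single-variable integral over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

f = lambda x, y: x

# y first: 0 <= y <= sqrt(36 - 4x^2) / 3, then 0 <= x <= 3
y_first = midpoint(
    lambda x: midpoint(lambda y: f(x, y), 0.0, math.sqrt(36.0 - 4.0 * x * x) / 3.0, 4),
    0.0, 3.0, 20000)

# x first: 0 <= x <= sqrt(36 - 9y^2) / 2, then 0 <= y <= 2
x_first = midpoint(
    lambda y: midpoint(lambda x: f(x, y), 0.0, math.sqrt(36.0 - 9.0 * y * y) / 2.0, 4),
    0.0, 2.0, 2000)
```

Both values are close to 6, the value obtained by hand above.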
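Figure 6.28: The domain D = {(x, y) | x ≥ 0, y ≥ 0, 4x² + 9y² ≤ 36}.

Now let us generalize Fubini's theorem to an arbitrary positive integer n that is larger than 1.

Theorem 6.56 Fubini's Theorem
Let n be a positive integer larger than 1, and let k be a positive integer less than n. Given that $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$ is a closed rectangle in Rn, let
\[ \mathbf{I}_x = \prod_{i=1}^{k} [a_i, b_i] \quad \text{and} \quad \mathbf{I}_y = \prod_{i=k+1}^{n} [a_i, b_i]. \]
Assume that f : I → R is a Riemann integrable function such that for each x in $\mathbf{I}_x$, the function $g_{\mathbf{x}} : \mathbf{I}_y \to \mathbb{R}$ defined by
\[ g_{\mathbf{x}}(\mathbf{y}) = f(\mathbf{x}, \mathbf{y}), \quad \mathbf{y} \in \mathbf{I}_y \]
is Riemann integrable. Let $F : \mathbf{I}_x \to \mathbb{R}$ be the function defined as
\[ F(\mathbf{x}) = \int_{\mathbf{I}_y} g_{\mathbf{x}}(\mathbf{y})\,d\mathbf{y} = \int_{\mathbf{I}_y} f(\mathbf{x}, \mathbf{y})\,d\mathbf{y}. \]
Then we have the following.
(a) The function $F : \mathbf{I}_x \to \mathbb{R}$ is Riemann integrable.
(b) The integral of f : I → R is equal to the integral of $F : \mathbf{I}_x \to \mathbb{R}$. Namely,
\[ \int_{\mathbf{I}} f(\mathbf{x}, \mathbf{y})\,d\mathbf{x}\,d\mathbf{y} = \int_{\mathbf{I}_x} \int_{\mathbf{I}_y} f(\mathbf{x}, \mathbf{y})\,d\mathbf{y}\,d\mathbf{x}. \]
(c) For each y in $\mathbf{I}_y$, define the function $h_{\mathbf{y}} : \mathbf{I}_x \to \mathbb{R}$ by
\[ h_{\mathbf{y}}(\mathbf{x}) = f(\mathbf{x}, \mathbf{y}), \quad \mathbf{x} \in \mathbf{I}_x. \]
If the function $h_{\mathbf{y}} : \mathbf{I}_x \to \mathbb{R}$ is Riemann integrable for each $\mathbf{y} \in \mathbf{I}_y$, then we can interchange the order of integration. Namely,
\[ \int_{\mathbf{I}_x} \int_{\mathbf{I}_y} f(\mathbf{x}, \mathbf{y})\,d\mathbf{y}\,d\mathbf{x} = \int_{\mathbf{I}_y} \int_{\mathbf{I}_x} f(\mathbf{x}, \mathbf{y})\,d\mathbf{x}\,d\mathbf{y}. \]

The proof is similar to the n = 2 case and we leave it to the readers. A useful case is the following, which generalizes Theorem 6.55.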
Theorem 6.57
Let U be a Jordan measurable set in Rn−1, and let g : U → R and h : U → R be bounded continuous functions on U satisfying g(x) ≤ h(x) for all x ∈ U. Consider the subset D of Rn defined as
\[ D = \{(\mathbf{x}, y) \mid \mathbf{x} \in U,\ g(\mathbf{x}) \le y \le h(\mathbf{x})\}. \]
If f : D → R is a bounded continuous function, then it is Riemann integrable, and
\[ \int_D f(\mathbf{x}, y)\,d\mathbf{x}\,dy = \int_U \int_{g(\mathbf{x})}^{h(\mathbf{x})} f(\mathbf{x}, y)\,dy\,d\mathbf{x}. \]
Let us look at an example.

Example 6.44
Evaluate the integral $\int_S x\,dxdydz$, where S is the solid bounded between the plane x + y + z = 1 and the three coordinate planes. Then find the integral $\int_S (x + 5y + 3z)\,dxdydz$.

Solution
The solid S can be expressed as
\[ S = \{(x, y, z) \mid (x, y) \in D,\ 0 \le z \le 1 - x - y\}, \]
where
\[ D = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1 - x\}. \]
Since D is a triangle, it is a Jordan measurable set. The function f(x, y, z) = x is continuous.
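Hence, we can apply Theorem 6.57.
\[
\begin{aligned}
\int_S x\,dxdydz &= \int_D \left( \int_0^{1-x-y} x\,dz \right) dxdy \\
&= \int_0^1 \int_0^{1-x} x(1 - x - y)\,dy\,dx \\
&= \int_0^1 x \left[ (1 - x)y - \frac{y^2}{2} \right]_0^{1-x} dx \\
&= \frac{1}{2} \int_0^1 x(1 - x)^2\,dx = \frac{1}{2} \int_0^1 x^2(1 - x)\,dx \\
&= \frac{1}{2} \left( \frac{1}{3} - \frac{1}{4} \right) = \frac{1}{24}.
\end{aligned}
\]
Since the solid S is symmetric in x, y and z, we have
\[ \int_S x\,dxdydz = \int_S y\,dxdydz = \int_S z\,dxdydz. \]
Therefore,
\[ \int_S (x + 5y + 3z)\,dxdydz = 9 \int_S x\,dxdydz = \frac{3}{8}. \]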
Figure 6.29: The solid S bounded between the plane x + y + z = 1 and the three coordinate planes.
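The two values in Example 6.44 can be checked by approximating the triple integral as three nested single integrals. The following Python sketch is an illustration only; `midpoint` and `integral_over_S` are hypothetical helper names, and the grid size is an arbitrary choice.

```python
def midpoint(f, a, b, n):
    """Composite midpoint rule for a single-variable integral over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def integral_over_S(f, n=40):
    """Iterated integral over S: 0 <= z <= 1 - x - y, 0 <= y <= 1 - x, 0 <= x <= 1."""
    return midpoint(
        lambda x: midpoint(
            lambda y: midpoint(lambda z: f(x, y, z), 0.0, 1.0 - x - y, n),
            0.0, 1.0 - x, n),
        0.0, 1.0, n)

int_x = integral_over_S(lambda x, y, z: x)
int_combo = integral_over_S(lambda x, y, z: x + 5.0 * y + 3.0 * z)
```

The first value is close to 1/24 and the second is close to 3/8, in agreement with the symmetry argument.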
Exercises 6.4

Question 1
Let I = [0, 2] × [0, 2], and let f : I → R be the function defined as $f(x, y) = x^7 y^3$. For a positive integer k, let $\mathbf{P}_k$ be the uniformly regular partition of I into $k^2$ rectangles. Write down the summation formula for the Darboux upper sum $U(f, \mathbf{P}_k)$. Show that the limit $\lim_{k \to \infty} U(f, \mathbf{P}_k)$ exists and find the limit.

Question 2
Let D be the triangle with vertices (0, 0), (1, 0) and (1, 1). Evaluate the integral $\int_D e^{x^2}\,dxdy$.

Question 3
Let D be the region in the plane bounded between the curve $y = x^2$ and the line y = 2x + 3. Evaluate the integral $\int_D (x + 2y)\,dxdy$.

Question 4
Let D be the region in the plane bounded between the curve $y^2 = 4x$ and the line y = 2x − 4. Evaluate the integral $\int_D (x + y)\,dxdy$.

Question 5
Evaluate the integral $\int_0^1 \int_y^1 \sqrt{9x^2 + 16}\,dx\,dy$.

Question 6
Evaluate the integral $\int_S xy\,dxdydz$, where S is the solid bounded between the plane x + y + z = 4 and the three coordinate planes. Then find the integral $\int_S (4xy + 5yz + 6xz)\,dxdydz$.
Question 7
Let f : [a, b] → R be a continuous function, and let G be the set
\[ G = \{(x, y, 0) \mid a \le x \le b,\ 0 \le y \le |f(x)|\} \]
in R3 that lies in the plane z = 0. Rotating the set G about the x-axis, we obtain a solid of revolution S, which can be described as
\[ S = \{(x, y, z) \mid a \le x \le b,\ y^2 + z^2 \le f(x)^2\}. \]
Show that the volume of S is
\[ \operatorname{vol}(S) = \pi \int_a^b f(x)^2\,dx. \]

Question 8
Let D1 and D2 be Jordan measurable sets in Rm and Rn respectively. Show that the set D = D1 × D2 is a Jordan measurable set in Rm+n and
\[ \operatorname{vol}(D_1 \times D_2) = \operatorname{vol}(D_1) \times \operatorname{vol}(D_2). \]
6.5 Change of Variables Theorem

Consider the problem of evaluating an integral of the form $\int_D f(x, y)\,dxdy$ when D is the disc $D = \{(x, y) \mid x^2 + y^2 \le r^2\}$. When f : D → R is a continuous function, Fubini's theorem says that we can write the integral as
\[ \int_D f(x, y)\,dxdy = \int_{-r}^{r} \int_{-\sqrt{r^2 - x^2}}^{\sqrt{r^2 - x^2}} f(x, y)\,dy\,dx. \]
However, it is usually quite complicated to evaluate this integral due to the square roots. In some sense, we have not fully utilized the circular symmetry of the region of integration D. For regions that have circular symmetry, it might be easier if we use polar coordinates (r, θ) instead of rectangular coordinates (x, y).

The goal of this section is to discuss the change of variables formula for multiple integrals. For single variable functions, the change of variables formula is usually known as integration by substitution. We have proved the following theorem in volume I.

Theorem 6.58 Integration by Substitution
Let ψ : [a, b] → R be a function that satisfies the following conditions:
(i) ψ is continuous and one-to-one on [a, b];
(ii) ψ is continuously differentiable on (a, b);
(iii) ψ′(x) is bounded on (a, b).
If ψ([a, b]) = [c, d], and f : [c, d] → R is a bounded function that is continuous on (c, d), then the function h : [a, b] → R, h(x) = f(ψ(x))|ψ′(x)| is Riemann integrable and
\[ \int_c^d f(u)\,du = \int_a^b f(\psi(x))\,|\psi'(x)|\,dx. \]
The function ψ : [a, b] → R that satisfies all the three conditions (i)–(iii) in Theorem 6.58 defines a smooth change of variables u = ψ(x) from x to u.
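The formula in Theorem 6.58 can be illustrated numerically. In the sketch below, the choices ψ(x) = x² on [0, 2] and f(u) = cos u are assumptions made for the example, as is the helper `midpoint`; with these choices ψ([0, 2]) = [0, 4] and ψ satisfies conditions (i)–(iii).

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for a single-variable integral over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

psi = lambda x: x * x            # sample change of variables u = psi(x) on [0, 2]
dpsi = lambda x: 2.0 * x         # its derivative
f = lambda u: math.cos(u)        # sample integrand

lhs = midpoint(f, 0.0, 4.0)                                  # integral of f(u) du over [c, d]
rhs = midpoint(lambda x: f(psi(x)) * abs(dpsi(x)), 0.0, 2.0) # integral of f(psi(x)) |psi'(x)| dx
exact = math.sin(4.0)
```

Both sides agree with sin 4 up to the quadrature error.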
Definition 6.18 Smooth Change of Variables
Let O be an open subset of Rn. A mapping Ψ : O → Rn from O to Rn is called a smooth change of variables provided that it satisfies the following conditions.
(i) Ψ : O → Rn is one-to-one.
(ii) Ψ : O → Rn is continuously differentiable.
(iii) For each x ∈ O, the derivative matrix DΨ(x) is invertible.

Remark 6.8
If the mapping Ψ : O → Rn is continuously differentiable, and the derivative matrix DΨ(x) is invertible for each x ∈ O, the inverse function theorem implies that the mapping Ψ : O → Rn is locally one-to-one. However, it might not be globally one-to-one. For Ψ : O → Rn to be a smooth change of variables, we need to impose the additional condition that it is globally one-to-one.

Example 6.45
Let x0 be a point in Rn. The mapping Ψ : Rn → Rn, Ψ(x) = x + x0 is a smooth change of variables. It is one-to-one, continuously differentiable, and the derivative matrix is DΨ(x) = In, which is invertible.

Example 6.46
Let x0 and y0 be points in Rn, and let A be an invertible n × n matrix. The mapping Ψ : Rn → Rn defined by
\[ \Psi(\mathbf{x}) = \mathbf{y}_0 + A(\mathbf{x} - \mathbf{x}_0) \]
is a one-to-one continuously differentiable mapping. Its derivative matrix is DΨ(x) = A, which is invertible for all x in Rn. This shows that Ψ is a smooth change of variables.

The mapping Ψ : Rn → Rn in Example 6.46 is a composition of translations and an invertible linear transformation.

Example 6.47
Let O = {(x, y) | x > 0, y > 0}, and let Ψ : O → R2 be the mapping defined as
\[ \Psi(x, y) = (x^2 - y^2,\ 2xy). \]
Show that Ψ : O → R2 is a smooth change of variables.

Solution
First we show that Ψ is one-to-one. If $\Psi(x_1, y_1) = \Psi(x_2, y_2)$, then
\[ x_1^2 - y_1^2 = x_2^2 - y_2^2, \qquad 2x_1 y_1 = 2x_2 y_2. \]
Let $z_1 = x_1 + iy_1$ and $z_2 = x_2 + iy_2$. Then we find that
\[ z_1^2 = x_1^2 - y_1^2 + 2ix_1 y_1 = x_2^2 - y_2^2 + 2ix_2 y_2 = z_2^2. \]
Hence, we must have $z_2 = \pm z_1$. Restricted to O, $x_1, x_2, y_1, y_2 > 0$. Hence, we must have $z_1 = z_2$, or equivalently, $(x_1, y_1) = (x_2, y_2)$. This shows that Ψ : O → R2 is one-to-one. Now
\[ D\Psi(x, y) = \begin{bmatrix} 2x & -2y \\ 2y & 2x \end{bmatrix} \]
is continuous, and
\[ \det D\Psi(x, y) = 4x^2 + 4y^2 \ne 0 \quad \text{for all } (x, y) \in O. \]
This proves that Ψ is continuously differentiable and the derivative matrix DΨ(x, y) is invertible for all (x, y) ∈ O. Hence, Ψ : O → R2 is a smooth change of variables.

In this section, we will state the change of variables theorem, and give some discussion of why this theorem holds. We will also look at examples of how this theorem is applied, especially for polar and spherical coordinates. The proof of the theorem is quite technical and will be given in the next section.
Theorem 6.59 The Change of Variables Theorem
Let O be an open subset of Rn, and let Ψ : O → Rn be a smooth change of variables. If D is a Jordan measurable set such that its closure $\overline{D}$ is contained in O, then Ψ(D) is also Jordan measurable. If f : Ψ(D) → R is a bounded continuous function, then the function g : D → R defined as
\[ g(\mathbf{x}) = f(\Psi(\mathbf{x}))\,|\det D\Psi(\mathbf{x})| \]
is Riemann integrable, and
\[ \int_{\Psi(D)} f(\mathbf{x})\,d\mathbf{x} = \int_D g(\mathbf{x})\,d\mathbf{x} = \int_D f(\Psi(\mathbf{x}))\,|\det D\Psi(\mathbf{x})|\,d\mathbf{x}. \tag{6.5} \]
Notice that the two vertical lines around det DΨ(x) in (6.5) mean the absolute value, not the determinant.

Remark 6.9 Jacobian
For a mapping Ψ : O → Rn from a subset of Rn to Rn, the derivative matrix DΨ(x) is also called the Jacobian matrix of the mapping Ψ. The determinant of the Jacobian matrix is denoted by
\[ \frac{\partial(\Psi_1, \ldots, \Psi_n)}{\partial(x_1, \ldots, x_n)}. \]
It is known as the Jacobian determinant, or simply as the Jacobian. In practice, we will often denote a change of variables Ψ : O → Rn by u = Ψ(x). Then the Jacobian can be written as
\[ \frac{\partial(u_1, \ldots, u_n)}{\partial(x_1, \ldots, x_n)}. \]
Using this notation, the change of variables formula (6.5) reads as
\[ \int_{\Psi(D)} f(u_1, \ldots, u_n)\,du_1 \cdots du_n = \int_D f(u_1(\mathbf{x}), \ldots, u_n(\mathbf{x})) \left| \frac{\partial(u_1, \ldots, u_n)}{\partial(x_1, \ldots, x_n)} \right| dx_1 \cdots dx_n. \]
6.5.1 Translations and Linear Transformations

In the single variable case, a translation is a map T : R → R, T(x) = x + c. If f : [a, b] → R is a Riemann integrable function, then
\[ \int_a^b f(x)\,dx = \int_{a-c}^{b-c} f(x + c)\,dx. \]
For n ≥ 2, we have the following theorem, which is a stronger version of Theorem 6.59.

Theorem 6.60
Let x0 be a fixed point in Rn, and let Ψ : Rn → Rn be the translation Ψ(x) = x + x0. If D is a Jordan measurable subset of Rn, then Ψ(D) is Jordan measurable. If f : Ψ(D) → R is a Riemann integrable function, then g = (f ∘ Ψ) : D → R is also Riemann integrable, and
\[ \int_{D + \mathbf{x}_0} f(\mathbf{x})\,d\mathbf{x} = \int_{\Psi(D)} f(\mathbf{x})\,d\mathbf{x} = \int_D g(\mathbf{x})\,d\mathbf{x} = \int_D f(\mathbf{x} + \mathbf{x}_0)\,d\mathbf{x}. \tag{6.6} \]
Proof
Obviously, a translation maps a rectangle to a rectangle with the same volume. Hence, it maps sets that have Jordan content zero to sets that have Jordan content zero. It is also obvious that Ψ maps the boundary of D to the boundary of Ψ(D). This shows that Ψ(D) is Jordan measurable. If I is a closed rectangle that contains D, then I′ = I + x0 is a closed rectangle that contains Ψ(D). Let $\check{f} : \mathbf{I}' \to \mathbb{R}$ and $\check{g} : \mathbf{I} \to \mathbb{R}$ be the zero extensions of f : Ψ(D) → R and g = (f ∘ Ψ) : D → R respectively. Then $\check{g} = \check{f} \circ \Psi$. Since f : Ψ(D) → R is Riemann integrable, the lower and upper integrals of $\check{f}$ over I′ coincide:
\[ \underline{\int_{\mathbf{I}'}} \check{f} = \overline{\int_{\mathbf{I}'}} \check{f} = \int_{\mathbf{I}'} \check{f} = \int_{\Psi(D)} f. \]
Given a partition $\mathbf{P}' = (P_1', \ldots, P_n')$ of I′, let $\mathbf{P} = (P_1, \ldots, P_n)$ be the partition of I induced by the translation Ψ. Namely, P is the partition such that the rectangle J is in $\mathcal{J}_{\mathbf{P}}$ if and only if J′ = Ψ(J) = J + x0 is in $\mathcal{J}_{\mathbf{P}'}$.
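Then
\[ m_{\mathbf{J}}(\check{g}) = \inf \{ \check{g}(\mathbf{x}) \mid \mathbf{x} \in \mathbf{J} \} = \inf \{ \check{f}(\mathbf{x} + \mathbf{x}_0) \mid \mathbf{x} \in \mathbf{J} \} = \inf \{ \check{f}(\mathbf{x}') \mid \mathbf{x}' \in \mathbf{J} + \mathbf{x}_0 \} = m_{\mathbf{J}'}(\check{f}). \]
It follows that
\[ L(\check{g}, \mathbf{P}) = \sum_{\mathbf{J} \in \mathcal{J}_{\mathbf{P}}} m_{\mathbf{J}}(\check{g}) \operatorname{vol}(\mathbf{J}) = \sum_{\mathbf{J}' \in \mathcal{J}_{\mathbf{P}'}} m_{\mathbf{J}'}(\check{f}) \operatorname{vol}(\mathbf{J}') = L(\check{f}, \mathbf{P}'). \]
Similarly, we have
\[ U(\check{g}, \mathbf{P}) = U(\check{f}, \mathbf{P}'). \]
Thus, the sets $S_L(\check{g})$ and $S_L(\check{f})$ of lower sums of $\check{g}$ and $\check{f}$ are the same, and the sets $S_U(\check{g})$ and $S_U(\check{f})$ of upper sums of $\check{g}$ and $\check{f}$ are also the same. These imply that
\[ \underline{\int_{\mathbf{I}}} \check{g} = \sup S_L(\check{g}) = \sup S_L(\check{f}) = \underline{\int_{\mathbf{I}'}} \check{f} = \int_{\Psi(D)} f, \]
\[ \overline{\int_{\mathbf{I}}} \check{g} = \inf S_U(\check{g}) = \inf S_U(\check{f}) = \overline{\int_{\mathbf{I}'}} \check{f} = \int_{\Psi(D)} f. \]
Hence, g : D → R is Riemann integrable and
\[ \int_D g = \int_{\mathbf{I}} \check{g} = \int_{\Psi(D)} f. \]

Remark 6.10
It is easy to check that for the translation Ψ : Rn → Rn, Ψ(x) = x + x0, the change of variables formula (6.6) is precisely the formula (6.5), since DΨ(x) = In in this case.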
Figure 6.30: A translation in the plane.

Example 6.48
Let $D = \{(x, y) \mid (x - 2)^2 + (y + 3)^2 \le 16\}$. Evaluate the integral $\int_D (4x + y)\,dxdy$.

Solution
Make the change of variables u = x − 2, v = y + 3, which is a translation. Then x = u + 2, y = v − 3, and we have
\[ \int_D (4x + y)\,dxdy = \int_{u^2 + v^2 \le 16} (4u + 8 + v - 3)\,dudv = \int_{u^2 + v^2 \le 16} (4u + v + 5)\,dudv. \]
Since the disc $B = \{(u, v) \mid u^2 + v^2 \le 16\}$ is invariant when we change u to −u, or change v to −v, the integrals
\[ \int_{u^2 + v^2 \le 16} u\,dudv \quad \text{and} \quad \int_{u^2 + v^2 \le 16} v\,dudv \]
are equal to 0. Therefore,
\[ \int_D (4x + y)\,dxdy = 5 \int_{u^2 + v^2 \le 16} dudv. \]
In single variable analysis, we have shown that the area of a disc of radius r is $\pi r^2$. Hence,
\[ \int_D (4x + y)\,dxdy = 5 \times \operatorname{area}(B) = 5 \times 16\pi = 80\pi. \]
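The value 80π in Example 6.48 can be cross-checked without any change of variables, by writing the integral over the original disc as an iterated integral. The sketch below is an illustration only; `midpoint` is a hypothetical helper, and a large number of subintervals is used because the slice length has a square-root behaviour near x = −2 and x = 6.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule for a single-variable integral over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def inner(x):
    """Integral of 4x + y over the vertical slice of the disc D at abscissa x."""
    half = math.sqrt(16.0 - (x - 2.0) ** 2)   # half-length of the slice
    return midpoint(lambda y: 4.0 * x + y, -3.0 - half, -3.0 + half, 2)

value = midpoint(inner, -2.0, 6.0, 20000)     # x ranges over [2 - 4, 2 + 4]
exact = 80.0 * math.pi
```

The iterated integral over the untranslated disc is close to 80π, as the translation argument predicts.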
Now we consider a linear transformation T : Rn → Rn, T(x) = Ax defined by an invertible matrix A. Since DT(x) = A, the change of variables theorem says that for any function f : T(D) → R that is bounded and continuous on T(D),
\[ \int_{\mathbf{T}(D)} f(\mathbf{x})\,d\mathbf{x} = \int_D f(\mathbf{T}(\mathbf{x}))\,|\det A|\,d\mathbf{x} = |\det A| \int_D f(\mathbf{T}(\mathbf{x}))\,d\mathbf{x}. \tag{6.7} \]
In the special case where f is a constant function, we have
\[ \operatorname{vol}(\mathbf{T}(D)) = |\det A| \operatorname{vol}(D). \]
A very crucial fact in the proof of the change of variables theorem is the special case of this formula when D is a rectangle.

Theorem 6.61
Let $\mathbf{I} = \prod_{i=1}^{n} [a_i, b_i]$, and let T : Rn → Rn, T(x) = Ax be an invertible linear transformation. Then
\[ \operatorname{vol}(\mathbf{T}(\mathbf{I})) = |\det A| \operatorname{vol}(\mathbf{I}). \tag{6.8} \]

Linear transformations map linear objects to linear objects. However, the image of a rectangle under a linear transformation is not necessarily a rectangle, but is always a parallelepiped. If I is the closed rectangle $\prod_{i=1}^{n} [a_i, b_i]$, a point x in I can be written as
\[ \mathbf{x} = \mathbf{a} + t_1(b_1 - a_1)\mathbf{e}_1 + \cdots + t_n(b_n - a_n)\mathbf{e}_n, \]
where $\mathbf{a} = (a_1, \ldots, a_n)$, and $\mathbf{t} = (t_1, \ldots, t_n) \in [0, 1]^n$. Hence, we say that the rectangle I is a parallelepiped based at the point a and spanned by the n linearly independent vectors $\mathbf{v}_i = (b_i - a_i)\mathbf{e}_i$, 1 ≤ i ≤ n.

Definition 6.19 Parallelepipeds
A (closed) parallelepiped in Rn is a solid P in Rn based at a point a and spanned by n linearly independent vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$. It can be described as
\[ \mathcal{P} = \{ \mathbf{a} + t_1 \mathbf{v}_1 + \cdots + t_n \mathbf{v}_n \mid \mathbf{t} = (t_1, \ldots, t_n) \in [0, 1]^n \}. \]
Figure 6.31: Parallelepipeds in R2 and R3.

The boundary of a parallelepiped is a union of 2n bounded subsets, each of which is contained in a hyperplane. Thus, the boundary of a parallelepiped has Jordan content zero. Therefore, a parallelepiped is a Jordan measurable set. If P is a parallelepiped in Rn based at the point a and spanned by the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$, and T : Rn → Rn is an invertible linear transformation, then T(P) is the parallelepiped in Rn based at the point T(a) and spanned by the vectors $\mathbf{T}(\mathbf{v}_1), \ldots, \mathbf{T}(\mathbf{v}_n)$.

The cube $[0, 1]^n$ is called the standard unit cube and it is often denoted by $Q_n$. If P is a parallelepiped in Rn based at the point a and spanned by the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$, then $\mathcal{P} = \Psi(Q_n)$, where Ψ : Rn → Rn is the mapping
\[ \Psi(\mathbf{x}) = A\mathbf{x} + \mathbf{a}, \qquad A = \begin{bmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{bmatrix}, \]
which is a composition of an invertible linear transformation and a translation. Theorem 6.61 says that
\[ \operatorname{vol}(\mathcal{P}) = |\det A|, \tag{6.9} \]
where A is the matrix whose column vectors are $\mathbf{v}_1, \ldots, \mathbf{v}_n$. For example, for a parallelogram in R2 which is spanned by the vectors
\[ \mathbf{v}_1 = \begin{bmatrix} a_1 \\ b_1 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} a_2 \\ b_2 \end{bmatrix}, \]
the area of the parallelogram is
\[ \left| \det \begin{bmatrix} a_1 & a_2 \\ b_1 & b_2 \end{bmatrix} \right|. \]
For a parallelepiped in R3 which is spanned by the vectors
\[ \mathbf{v}_1 = \begin{bmatrix} a_1 \\ b_1 \\ c_1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} a_2 \\ b_2 \\ c_2 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_3 = \begin{bmatrix} a_3 \\ b_3 \\ c_3 \end{bmatrix}, \]
the volume of the parallelepiped is
\[ \left| \det \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix} \right|. \]
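For n = 2, formula (6.9) can be compared with an independent area computation. The Python sketch below is an illustration only: the base point and spanning vectors are hypothetical, and the polygon area is computed with the shoelace formula, which is a standard fact not taken from this text.

```python
def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order around the boundary."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1]
                     for i in range(n))
    return abs(twice_area) / 2.0

def det2(v1, v2):
    """Determinant of the 2 x 2 matrix whose columns are v1 and v2."""
    return v1[0] * v2[1] - v2[0] * v1[1]

a = (1.0, -2.0)                    # hypothetical base point
v1, v2 = (3.0, 1.0), (1.0, 2.0)    # hypothetical spanning vectors

# Corners a, a + v1, a + v1 + v2, a + v2, listed around the boundary
corners = [
    a,
    (a[0] + v1[0], a[1] + v1[1]),
    (a[0] + v1[0] + v2[0], a[1] + v1[1] + v2[1]),
    (a[0] + v2[0], a[1] + v2[1]),
]

area_polygon = shoelace_area(corners)
area_det = abs(det2(v1, v2))
```

For these vectors both computations give area 5, and the result does not depend on the base point a.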
These formulas have been derived in an elementary course. For general n, we will prove (6.9) in Appendix B using geometric arguments. This will then imply Theorem 6.61.

From the theory of linear algebra, we know that an invertible matrix is a product of elementary matrices. Hence, an invertible linear transformation T : Rn → Rn can be written as
\[ \mathbf{T} = \mathbf{T}_m \circ \cdots \circ \mathbf{T}_2 \circ \mathbf{T}_1, \]
where each of $\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_m$ is one of the three types of elementary transformations, corresponding to the three types of elementary matrices.

I. When E is the elementary matrix obtained from the identity matrix In by interchanging two distinct rows i and j, the linear transformation T : Rn → Rn, T(x) = Ex interchanges $x_i$ and $x_j$, and fixes the other variables. In this case, det E = −1 and |det E| = 1.

II. When E is the elementary matrix obtained from the identity matrix In by multiplying row i by a nonzero constant c, the linear transformation T : Rn → Rn, T(x) = Ex maps the point $\mathbf{x} = (x_1, \ldots, x_n)$ to
\[ \mathbf{T}(\mathbf{x}) = (x_1, \ldots, x_{i-1}, c x_i, x_{i+1}, \ldots, x_n). \]
In this case, det E = c, and |det E| = |c|.

III. When E is the elementary matrix obtained from the identity matrix In by adding a constant c times row j to another row i, the linear transformation T : Rn → Rn, T(x) = Ex maps the point $\mathbf{x} = (x_1, \ldots, x_n)$ to
\[ \mathbf{T}(\mathbf{x}) = (x_1, \ldots, x_{i-1}, x_i + c x_j, x_{i+1}, \ldots, x_n). \]
In this case, det E = 1, and |det E| = 1.

Since each of the elementary transformations involves changes in at most two variables, it is sufficient to consider these transformations when n = 2.

Example 6.49
Let T : R2 → R2 be the linear transformation
\[ \mathbf{T}(x, y) = (y, x). \]
The matrix E corresponding to this transformation is
\[ E = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \]
Under this transformation, the rectangle I = [a, b] × [c, d] is mapped to the rectangle I′ = [c, d] × [a, b]. It is easy to see that
\[ \operatorname{vol}(\mathbf{I}') = \operatorname{vol}(\mathbf{I}). \]
Figure 6.32: The linear transformation T(x, y) = (y, x).

Figure 6.33: The linear transformation T(x, y) = (x, 2y).

Example 6.50
Let T : R2 → R2 be the linear transformation
\[ \mathbf{T}(x, y) = (x, ky), \quad k \ne 0. \]
The matrix E corresponding to this transformation is
\[ E = \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}. \]
Under this transformation, the rectangle I = [a, b] × [c, d] is mapped to the rectangle I′ = [a, b] × [kc, kd] if k > 0; and to the rectangle I′ = [a, b] × [kd, kc] if k < 0. In any case, we find that
\[ \operatorname{vol}(\mathbf{I}') = |k| \operatorname{vol}(\mathbf{I}). \]

Example 6.51
Let T : R2 → R2 be the linear transformation
\[ \mathbf{T}(x, y) = (x + ky, y). \]
The matrix E corresponding to this transformation is
\[ E = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}. \]
Under this transformation, the rectangle I = [a, b] × [c, d] is mapped to the parallelepiped P with vertices (a + kc, c), (a + kd, d), (b + kc, c) and (b + kd, d). Using an elementary geometric argument, one can show that
\[ \operatorname{vol}(\mathcal{P}) = \operatorname{vol}(\mathbf{I}). \]

Combining Example 6.49, Example 6.50 and Example 6.51, we conclude that (6.8) holds when T : Rn → Rn is an elementary transformation. The type II elementary transformations map rectangles to rectangles, and so do their compositions. Therefore, (6.8) also holds if the linear transformation T : Rn → Rn is a composition of type II elementary transformations. This gives the following.

Theorem 6.62
Let $\mathbf{x}_0 = (u_1, \ldots, u_n)$ be a fixed point in Rn, and let Ψ : Rn → Rn be the mapping
\[ \Psi_i(\mathbf{x}) = \alpha_i x_i + u_i. \]
Equivalently,
\[ \Psi(\mathbf{x}) = A\mathbf{x} + \mathbf{x}_0, \]
where A is a diagonal matrix with diagonal entries $\alpha_1, \ldots, \alpha_n$. If D is a Jordan measurable subset of Rn, then Ψ(D) is Jordan measurable. If f : Ψ(D) → R is a Riemann integrable function, then h = (f ∘ Ψ) : D → R is also Riemann integrable, and
\[ \int_{\Psi(D)} f(\mathbf{x})\,d\mathbf{x} = |\det A| \int_D h(\mathbf{x})\,d\mathbf{x} = |\det A| \int_D f(A\mathbf{x} + \mathbf{x}_0)\,d\mathbf{x}. \tag{6.10} \]
Figure 6.34: The linear transformation T(x, y) = (x + y, y).

Figure 6.35: The linear transformation T(x, y) = (x + 2y, y).

Figure 6.36: The linear transformation T(x, y) = (x − y, y).

Notice that $\det A = \alpha_1 \cdots \alpha_n$. If y = Ψ(x), then
\[ y_i = \alpha_i x_i + u_i, \quad 1 \le i \le n. \]
The proof of Theorem 6.62 is similar to the proof of Theorem 6.60, by establishing a one-to-one correspondence between the partitions, and using the fact that for any rectangle J,
\[ \operatorname{vol}(\Psi(\mathbf{J})) = |\alpha_1 \cdots \alpha_n| \operatorname{vol}(\mathbf{J}). \]
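Example 6.52 Find the area of the ellipse
\[ E = \left\{ (x, y) \mid 4(x + 1)^2 + 9(y - 5)^2 \le 49 \right\}. \]
Solution Make a change of variables u = 2(x + 1) and v = 3(y − 5). The Jacobian is
\[ \frac{\partial(u, v)}{\partial(x, y)} = \det \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = 6, \]
and so
\[ \frac{\partial(x, y)}{\partial(u, v)} = \frac{1}{6}. \]
Therefore,
\[ \operatorname{area}(E) = \int_{4(x+1)^2 + 9(y-5)^2 \le 49} dxdy = \int_{u^2 + v^2 \le 49} \left| \frac{\partial(x, y)}{\partial(u, v)} \right| dudv = \frac{1}{6} \int_{u^2 + v^2 \le 49} dudv = \frac{49}{6}\pi. \]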
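As a sanity check on the value 49π/6, the area can also be approximated directly in the (x, y)-plane. The sketch below is only an illustration under stated assumptions: the function name `ellipse_area_riemann` and the number of subintervals are arbitrary choices, and the method is a plain midpoint Riemann sum over x of the vertical width of the ellipse.

```python
import math


def ellipse_area_riemann(n=200_000):
    """Midpoint Riemann sum for the area of 4(x+1)^2 + 9(y-5)^2 <= 49."""
    # The ellipse projects onto the interval |x + 1| <= 7/2 on the x-axis.
    x_min, x_max = -1.0 - 3.5, -1.0 + 3.5
    h = (x_max - x_min) / n
    total = 0.0
    for i in range(n):
        x = x_min + (i + 0.5) * h
        s = 49.0 - 4.0 * (x + 1.0) ** 2
        # For fixed x, y ranges over an interval of length 2*sqrt(s)/3.
        total += 2.0 * math.sqrt(max(s, 0.0)) / 3.0 * h
    return total


print(ellipse_area_riemann(), 49.0 * math.pi / 6.0)
```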
Finally, let us consider an example of applying a general linear transformation.

Example 6.53 Evaluate the integral
\[ \int_{D} \frac{2x + 3y + 3}{2x - 3y + 8}\, dxdy, \]
where D = {(x, y) | 2|x| + 3|y| ≤ 6}.

Solution Notice that for any (x, y) ∈ D, |2x − 3y + 8| ≥ 8 − 2|x| − 3|y| ≥ 2.
Figure 6.37: The transformation u = 2x − 3y and v = 2x + 3y.

Hence, the function
\[ h(x, y) = \frac{2x + 3y + 3}{2x - 3y + 8} \]
is continuous on D. The region D is enclosed by the 4 lines 2x + 3y = 6, 2x + 3y = −6, 2x − 3y = 6 and 2x − 3y = −6. This prompts us to define a change of variables by u = 2x − 3y and v = 2x + 3y. This is a linear transformation with Jacobian
\[ \frac{\partial(u, v)}{\partial(x, y)} = \det \begin{bmatrix} 2 & -3 \\ 2 & 3 \end{bmatrix} = 12. \]
Therefore,
\[ \frac{\partial(x, y)}{\partial(u, v)} = \frac{1}{12}. \]
The region D in the (x, y)-plane is mapped to the rectangle R = {(u, v) | −6 ≤ u ≤ 6, −6 ≤ v ≤ 6} in the (u, v)-plane.
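Thus,
\[
\begin{aligned}
\int_{D} \frac{2x + 3y + 3}{2x - 3y + 8}\, dxdy
&= \int_{R} \frac{v + 3}{u + 8} \left| \frac{\partial(x, y)}{\partial(u, v)} \right| dudv \\
&= \frac{1}{12} \int_{-6}^{6} \int_{-6}^{6} \frac{v + 3}{u + 8}\, dudv \\
&= \frac{1}{12} \left[ \frac{v^2}{2} + 3v \right]_{-6}^{6} \Big[ \ln(u + 8) \Big]_{-6}^{6} \\
&= 3 \ln 7.
\end{aligned}
\]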
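The value 3 ln 7 can be cross-checked without the change of variables, by integrating directly over D in the (x, y)-plane. The following is a rough sketch only: the names `integrand` and `integral_over_D` and the grid size are assumptions made for illustration. It writes D as |x| ≤ 3, |y| ≤ (6 − 2|x|)/3 and applies the midpoint rule to the iterated integral.

```python
import math


def integrand(x, y):
    # The denominator is at least 2 on D, as noted in the solution.
    return (2 * x + 3 * y + 3) / (2 * x - 3 * y + 8)


def integral_over_D(n=400):
    """Midpoint rule on D = {2|x| + 3|y| <= 6}, written as an iterated integral."""
    hx = 6.0 / n  # x runs over [-3, 3]
    total = 0.0
    for i in range(n):
        x = -3.0 + (i + 0.5) * hx
        y_max = (6.0 - 2.0 * abs(x)) / 3.0  # y runs over [-y_max, y_max]
        hy = 2.0 * y_max / n
        inner = 0.0
        for j in range(n):
            y = -y_max + (j + 0.5) * hy
            inner += integrand(x, y)
        total += inner * hy * hx
    return total


print(integral_over_D(), 3.0 * math.log(7.0))
```

With an even number of subintervals the kink of the boundary at x = 0 falls on a grid node, which keeps the midpoint rule accurate.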
6.5.2 Polar Coordinates

Given a point (x, y) in the plane R2, if r is a nonnegative number and θ is a real number such that x = r cos θ, y = r sin θ, then (r, θ) are called the polar coordinates of the point (x, y). Notice that
\[ r = \sqrt{x^2 + y^2}. \]
Restricted to V = {(r, θ) | r > 0, 0 ≤ θ < 2π}, the map Φ : V → R2, Φ(r, θ) = (r cos θ, r sin θ) is one-to-one, and its range is R2 \ {(0, 0)}. However, the inverse of Φ fails to be continuous. We can extend the map Φ to R2 continuously. Namely, given (r, θ) ∈ R2, let (x, y) = Φ(r, θ), where x = r cos θ, y = r sin θ. Then Φ : R2 → R2 is continuously differentiable, but it fails to be one-to-one. Nevertheless, for any real number α, the map is continuous and one-to-one on the open set Oα = {(r, θ) | r > 0, α < θ < α + 2π}.
The derivative matrix of Φ : R2 → R2 is given by
\[ D\Phi(r, \theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}. \]
Since
\[ \det D\Phi(r, \theta) = r\cos^2\theta + r\sin^2\theta = r, \]
we find that for any (r, θ) ∈ Oα, DΦ(r, θ) is invertible. Hence, Φ : Oα → R2 is a smooth change of variables.
Figure 6.38: The mapping Φ : O → R2, Φ(r, θ) = (r cos θ, r sin θ) maps O = {(r, θ) | r > 0, 0 < θ < 2π} to R2 \ L, where L is the positive x-axis.

Let us consider the special case where α = 0. In this case, let O = O0. The map Φ : O → R2, Φ(r, θ) = (r cos θ, r sin θ) is a smooth change of variables from polar coordinates to rectangular coordinates. Under this change of variables, Φ(O) = R2 \ {(x, 0) | x ≥ 0} is an open set in R2. If D is the open rectangle (r1, r2) × (θ1, θ2), with 0 < r1 < r2 and 0 < θ1 < θ2 < 2π, then Φ(D) is the open set bounded between the two circles $x^2 + y^2 = r_1^2$ and $x^2 + y^2 = r_2^2$, and the two rays y = x tan θ1, x ≥ 0 and y = x tan θ2, x ≥ 0.
Figure 6.39: The region D = {(r cos θ, r sin θ) | r1 < r < r2, θ1 < θ < θ2} in the (x, y)-plane. (a) r1 = 0, r2 = ∞, θ1 = 0, θ2 = π. (b) r1 = 0, r2 = ∞, θ1 = −π/2, θ2 = π/2.

Figure 6.40: The region D = {(r cos θ, r sin θ) | r1 < r < r2, θ1 < θ < θ2} in the (x, y)-plane. (a) r1 = 0, r2 = ∞, θ1 = 0, θ2 = π/2. (b) r1 = 2, r2 = 5, θ1 = π/6, θ2 = 4π/7.

To apply the change of variables theorem, we notice that the Jacobian is
\[ \frac{\partial(x, y)}{\partial(r, \theta)} = r. \]
Thus,
\[ dxdy = \left| \frac{\partial(x, y)}{\partial(r, \theta)} \right| drd\theta = r\,drd\theta. \]
The change of variables theorem says the following.
Theorem 6.63 Let [α, β] be a closed interval such that β ≤ α + 2π. Assume that g : [α, β] → R and h : [α, β] → R are continuous functions satisfying 0 ≤ g(θ) ≤ h(θ) for all α ≤ θ ≤ β. Let D be the region in the (x, y)-plane given by D = {(r cos θ, r sin θ) | α ≤ θ ≤ β, g(θ) ≤ r ≤ h(θ)}. If f : D → R is a continuous function, then
\[ \int_{D} f(x, y)\,dxdy = \int_{\alpha}^{\beta} \int_{g(\theta)}^{h(\theta)} f(r\cos\theta, r\sin\theta)\, r\,drd\theta. \tag{6.11} \]
Proof Let U = {(r, θ) | α ≤ θ ≤ β, g(θ) ≤ r ≤ h(θ)}. Then U is a compact Jordan measurable set in R2. If β < α + 2π, take any α0 such that α0 < α < β < α0 + 2π. For example, we can take
\[ \alpha_0 = \alpha - \frac{2\pi - (\beta - \alpha)}{2}. \]
If we also have g(θ) > 0 for all θ ∈ [α, β], then U is contained in the set Oα0 = {(r, θ) | r > 0, α0 < θ < α0 + 2π}, and Φ(U) = D. Applying the change of variables theorem to the mapping Φ : Oα0 → R2 gives the desired formula (6.11) immediately.
If g(θ) = 0 for some θ ∈ [α, β], then we consider the set Uε = {(r, θ) | α ≤ θ ≤ β, g(θ) + ε ≤ r ≤ h(θ) + ε}, where ε > 0. It is contained in Oα0. Using boundedness of the continuous functions g : [α, β] → R, h : [α, β] → R and f : D → R, it is easy to show that
\[ \int_{D} f(x, y)\,dxdy = \lim_{\varepsilon \to 0^+} \int_{\Phi(U_\varepsilon)} f(x, y)\,dxdy. \]
By the change of variables formula, we have
\[ \int_{\Phi(U_\varepsilon)} f(x, y)\,dxdy = \int_{\alpha}^{\beta} \int_{g(\theta)+\varepsilon}^{h(\theta)+\varepsilon} f(r\cos\theta, r\sin\theta)\, r\,drd\theta. \]
Taking the ε → 0+ limit yields again the desired formula (6.11).

The last case we have to consider is when β = α + 2π. The technicality is that U is not contained in any of the Oα0 restricted to which the mapping Φ(r, θ) = (r cos θ, r sin θ) is a smooth change of variables. Instead of taking limits, there is an alternative way to resolve the problem. We write U = U1 ∪ U2, where
U1 = {(r, θ) | α ≤ θ ≤ α + π, g(θ) ≤ r ≤ h(θ)},
U2 = {(r, θ) | α + π ≤ θ ≤ α + 2π, g(θ) ≤ r ≤ h(θ)}.
We have shown that the change of variables formula (6.11) is valid for U1 and U2. Applying the additivity theorem, we find that
\[
\begin{aligned}
\int_{D} f(x, y)\,dxdy
&= \int_{\Phi(U_1)} f(x, y)\,dxdy + \int_{\Phi(U_2)} f(x, y)\,dxdy \\
&= \int_{\alpha}^{\alpha+\pi} \int_{g(\theta)}^{h(\theta)} f(r\cos\theta, r\sin\theta)\, r\,drd\theta + \int_{\alpha+\pi}^{\alpha+2\pi} \int_{g(\theta)}^{h(\theta)} f(r\cos\theta, r\sin\theta)\, r\,drd\theta \\
&= \int_{\alpha}^{\beta} \int_{g(\theta)}^{h(\theta)} f(r\cos\theta, r\sin\theta)\, r\,drd\theta.
\end{aligned}
\]
Namely, the formula (6.11) is still valid when β = α + 2π.
Let us give a geometric explanation for the Jacobian
\[ \frac{\partial(x, y)}{\partial(r, \theta)} = r, \]
where x = r cos θ, y = r sin θ. Assume that θ1 < θ2 < θ1 + 2π and 0 < r1 < r2. Let D = {(r cos θ, r sin θ) | r1 ≤ r ≤ r2, θ1 ≤ θ ≤ θ2}. The area bounded between the circles $x^2 + y^2 = r_1^2$ and $x^2 + y^2 = r_2^2$ is $\pi(r_2^2 - r_1^2)$. By rotational symmetry of the circle, the area of D is
\[ \Delta A = \pi (r_2^2 - r_1^2) \times \frac{\theta_2 - \theta_1}{2\pi} = \bar{r}\, \Delta r\, \Delta\theta, \]
where
\[ \bar{r} = \frac{r_1 + r_2}{2}, \qquad \Delta r = r_2 - r_1 \qquad \text{and} \qquad \Delta\theta = \theta_2 - \theta_1. \]
When ∆r → 0, then ∆A ∼ r∆r∆θ, where r ∼ r1 ∼ r2.
Figure 6.41: The rectangle [r1 , r2 ] × [θ1 , θ2 ] in the (r, θ)-plane is mapped to the region {(r cos θ, r sin θ) | r1 ≤ r ≤ r2 , θ1 ≤ θ ≤ θ2 } in the (x, y)-plane. Now let us look at some examples.
Example 6.54 Evaluate the integral
\[ \int_{D} (2x^2 + y^2)\,dxdy, \]
where D = {(x, y) | x ≥ 0, y ≥ 0, $4x^2 + 9y^2 \le 36$}.

Solution Making a change of variables x = 3u, y = 2v, the Jacobian is
\[ \frac{\partial(x, y)}{\partial(u, v)} = \det \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} = 6. \]
Then
\[ \int_{D} (2x^2 + y^2)\,dxdy = \int_{U} \left( 18u^2 + 4v^2 \right) \left| \frac{\partial(x, y)}{\partial(u, v)} \right| dudv = 6 \int_{U} \left( 18u^2 + 4v^2 \right) dudv, \]
where U is the region U = {(u, v) | u ≥ 0, v ≥ 0, $u^2 + v^2 \le 1$}. Since U is symmetric if we interchange u and v, we find that
\[ \int_{U} u^2\,dudv = \int_{U} v^2\,dudv = \frac{1}{2} \int_{U} (u^2 + v^2)\,dudv. \]
Using polar coordinates u = r cos θ, v = r sin θ, we find that
\[ \int_{U} (u^2 + v^2)\,dudv = \int_{0}^{\pi/2} \int_{0}^{1} r^2 \times r\,drd\theta = \frac{\pi}{2} \int_{0}^{1} r^3\,dr = \frac{\pi}{8}. \]
Therefore,
\[ \int_{D} (2x^2 + y^2)\,dxdy = \frac{6 \times (18 + 4)}{2} \int_{U} (u^2 + v^2)\,dudv = \frac{33\pi}{4}. \]
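Formula (6.11) can also be applied to this example directly, without first rescaling the ellipse to a disc. The sketch below is illustrative only: `polar_integral` and `h_ellipse` are hypothetical helper names, and the midpoint rule with 400 subintervals in each variable is an arbitrary choice. In polar coordinates the quarter ellipse is 0 ≤ θ ≤ π/2, 0 ≤ r ≤ 6/√(4cos²θ + 9sin²θ).

```python
import math


def polar_integral(f, alpha, beta, g, h, n_theta=400, n_r=400):
    """Midpoint-rule approximation of the right-hand side of (6.11)."""
    d_theta = (beta - alpha) / n_theta
    total = 0.0
    for i in range(n_theta):
        theta = alpha + (i + 0.5) * d_theta
        r_lo, r_hi = g(theta), h(theta)
        d_r = (r_hi - r_lo) / n_r
        inner = 0.0
        for j in range(n_r):
            r = r_lo + (j + 0.5) * d_r
            # The extra factor r is the Jacobian of the polar map.
            inner += f(r * math.cos(theta), r * math.sin(theta)) * r
        total += inner * d_r * d_theta
    return total


def h_ellipse(theta):
    # Outer radius of 4x^2 + 9y^2 <= 36 in the direction theta.
    return 6.0 / math.sqrt(4.0 * math.cos(theta) ** 2 + 9.0 * math.sin(theta) ** 2)


value = polar_integral(
    lambda x, y: 2 * x * x + y * y, 0.0, math.pi / 2, lambda t: 0.0, h_ellipse
)
print(value, 33.0 * math.pi / 4.0)
```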
Figure 6.42: The regions D = {(x, y) | x ≥ 0, y ≥ 0, $4x^2 + 9y^2 \le 36$} and U = {(u, v) | u ≥ 0, v ≥ 0, $u^2 + v^2 \le 1$}.

Example 6.55 Find the volume of the solid bounded between the surface $z = x^2 + y^2$ and the plane z = 9.

Solution The solid S bounded between the surface $z = x^2 + y^2$ and the plane z = 9 can be expressed as
\[ S = \left\{ (x, y, z) \mid (x, y) \in D,\ x^2 + y^2 \le z \le 9 \right\}, \]
where
\[ D = \left\{ (x, y) \mid x^2 + y^2 \le 9 \right\}. \]
Since D is a closed ball, it is a Jordan measurable set. The volume of S is the integral of the constant function χS : S → R. It is a continuous function. By Fubini's theorem,
\[ \operatorname{vol}(S) = \int_{D} \left( \int_{x^2+y^2}^{9} dz \right) dxdy = \int_{D} (9 - x^2 - y^2)\,dxdy. \]
Using polar coordinates, we have
\[ \operatorname{vol}(S) = \int_{0}^{2\pi} \int_{0}^{3} (9 - r^2)\, r\,drd\theta = 2\pi \left[ \frac{9r^2}{2} - \frac{r^4}{4} \right]_{0}^{3} = \frac{81\pi}{2}. \]
Figure 6.43: The solid bounded between the surface $z = x^2 + y^2$ and the plane z = 9.

Example 6.56 Let a be a positive number. Find the volume of the ball
\[ B = \left\{ (x, y, z) \mid x^2 + y^2 + z^2 \le a^2 \right\}. \]
Solution Let
\[ D = \left\{ (x, y) \mid x^2 + y^2 \le a^2 \right\}. \]
Then the ball B can be described as
\[ B = \left\{ (x, y, z) \mid (x, y) \in D,\ -\sqrt{a^2 - x^2 - y^2} \le z \le \sqrt{a^2 - x^2 - y^2} \right\}. \]
Thus, by Fubini's theorem, its volume is
\[ \operatorname{vol}(B) = \int_{B} dxdydz = \int_{D} \left( \int_{-\sqrt{a^2 - x^2 - y^2}}^{\sqrt{a^2 - x^2 - y^2}} dz \right) dxdy = \int_{D} 2\sqrt{a^2 - x^2 - y^2}\,dxdy. \]
Using polar coordinates,
\[ \operatorname{vol}(B) = 2 \int_{0}^{2\pi} \int_{0}^{a} \sqrt{a^2 - r^2}\, r\,drd\theta = 4\pi \int_{0}^{a} r \sqrt{a^2 - r^2}\,dr. \]
Let $u = a^2 - r^2$. Then du = −2r dr. When r = 0, $u = a^2$. When r = a, u = 0. Therefore,
\[ \operatorname{vol}(B) = 2\pi \int_{0}^{a^2} u^{1/2}\,du = 2\pi \left[ \frac{2}{3} u^{3/2} \right]_{0}^{a^2} = \frac{4\pi}{3} a^3. \]
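Example 6.57 Let a be a positive number, and let α be a number in the interval (0, π/2). Find the volume of the solid E bounded between the sphere
\[ S = \left\{ (x, y, z) \mid x^2 + y^2 + z^2 = a^2 \right\} \]
and the cone
\[ C = \left\{ (x, y, z) \mid z = \cot\alpha \sqrt{x^2 + y^2} \right\}. \]
Solution The surfaces S and C intersect at the points (x, y, z) satisfying
\[ (x^2 + y^2)(1 + \cot^2\alpha) = a^2. \]
Namely, $x^2 + y^2 = a^2 \sin^2\alpha$. Therefore,
\[ E = \left\{ (x, y, z) \mid (x, y) \in D,\ \cot\alpha \sqrt{x^2 + y^2} \le z \le \sqrt{a^2 - x^2 - y^2} \right\}, \]
where
\[ D = \left\{ (x, y) \mid x^2 + y^2 \le a^2 \sin^2\alpha \right\}. \]
Using Fubini's theorem and polar coordinates, we find that
\[
\begin{aligned}
\operatorname{vol}(E) = \int_{E} dxdydz
&= \int_{D} \left( \int_{\cot\alpha \sqrt{x^2+y^2}}^{\sqrt{a^2 - x^2 - y^2}} dz \right) dxdy \\
&= \int_{0}^{2\pi} \int_{0}^{a\sin\alpha} \left( \sqrt{a^2 - r^2} - r\cot\alpha \right) r\,drd\theta.
\end{aligned}
\]
Using a change of variables $u = a^2 - r^2$, we find that
\[ \int_{0}^{a\sin\alpha} r \sqrt{a^2 - r^2}\,dr = \frac{1}{2} \int_{a^2\cos^2\alpha}^{a^2} u^{1/2}\,du = \frac{1}{3} \left[ u^{3/2} \right]_{a^2\cos^2\alpha}^{a^2} = \frac{a^3}{3} \left( 1 - \cos^3\alpha \right). \]
On the other hand,
\[ \int_{0}^{a\sin\alpha} r^2 \cot\alpha\,dr = \cot\alpha \left[ \frac{r^3}{3} \right]_{0}^{a\sin\alpha} = \frac{a^3}{3} \frac{\cos\alpha}{\sin\alpha} \sin^3\alpha = \frac{a^3}{3} (\cos\alpha - \cos^3\alpha). \]
Therefore, the volume of E is
\[ \operatorname{vol}(E) = \frac{2\pi a^3}{3} (1 - \cos\alpha). \]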
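The closed form 2πa³(1 − cos α)/3 can be compared with a direct numerical evaluation of the radial integral above. This is a minimal sketch under stated assumptions: the function name `cone_sphere_volume`, the sample values a = 2 and α = π/3, and the midpoint rule are illustrative choices, not part of the original solution.

```python
import math


def cone_sphere_volume(a, alpha, n=20_000):
    """Midpoint rule for 2*pi * int_0^{a sin(alpha)} (sqrt(a^2 - r^2) - r cot(alpha)) r dr."""
    r_max = a * math.sin(alpha)
    dr = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        # Height of the solid above the point at distance r from the z-axis.
        height = math.sqrt(a * a - r * r) - r / math.tan(alpha)
        total += height * r * dr
    return 2.0 * math.pi * total


a, alpha = 2.0, math.pi / 3
print(cone_sphere_volume(a, alpha), 2.0 * math.pi * a ** 3 * (1.0 - math.cos(alpha)) / 3.0)
```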
Figure 6.44: The solid bounded between a sphere and a cone.
6.5.3 Spherical Coordinates

Now we consider spherical coordinates, which give an alternative coordinate system for R3. Consider the mapping Ψ : R3 → R3 given by
Ψ(ρ, ϕ, θ) = (x, y, z) = (ρ sin ϕ cos θ, ρ sin ϕ sin θ, ρ cos ϕ).
Namely, x = ρ sin ϕ cos θ, y = ρ sin ϕ sin θ, z = ρ cos ϕ. Let V be the set V = {(ρ, ϕ, θ) | ρ > 0, 0 ≤ ϕ ≤ π, 0 ≤ θ < 2π}. Given u = (x, y, z) ∈ R3 \ {(0, 0, 0)}, we claim that there is a unique (ρ, ϕ, θ) ∈ V such that Ψ(ρ, ϕ, θ) = (x, y, z). This triple (ρ, ϕ, θ) is called the spherical coordinates of the point u = (x, y, z). It is easy to see that
\[ \rho = \sqrt{x^2 + y^2 + z^2} = \|u\| \]
is the distance from the point u = (x, y, z) to the origin. If we let
\[ \phi = \cos^{-1} \frac{z}{\rho}, \]
then ϕ satisfies 0 ≤ ϕ ≤ π, and ⟨u, e3⟩ = z = ρ cos ϕ = ∥u∥ cos ϕ. Thus, geometrically, ϕ is the angle the vector from 0 to u makes with the positive z-axis. Let W be the (x, y)-plane in R3. Then (x, y, 0) = projW u. Let r = ρ sin ϕ. Then
\[ r = \sqrt{\rho^2 - \rho^2\cos^2\phi} = \sqrt{\rho^2 - z^2} = \sqrt{x^2 + y^2}. \]
Figure 6.45: Spherical Coordinates.

Thus, θ ∈ [0, 2π) is uniquely determined so that x = r cos θ, y = r sin θ. Equivalently, (r, θ) is the polar coordinates of the point (x, y) in R2 \ {(0, 0)}. Hence, we find that the map Ψ : V → R3,
Ψ(ρ, ϕ, θ) = (x, y, z) = (ρ sin ϕ cos θ, ρ sin ϕ sin θ, ρ cos ϕ)
is one-to-one on the set V, and the range is R3 \ {(0, 0, 0)}. However, the inverse map is not continuous. Let us calculate the derivative matrix of Ψ : R3 → R3. We find that
\[ D\Psi(\rho, \phi, \theta) = \begin{bmatrix} \sin\phi\cos\theta & \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \sin\phi\sin\theta & \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta \\ \cos\phi & -\rho\sin\phi & 0 \end{bmatrix}. \]
Therefore, the Jacobian det DΨ(ρ, ϕ, θ) is
\[
\begin{aligned}
\frac{\partial(x, y, z)}{\partial(\rho, \phi, \theta)}
&= \cos\phi \times \det \begin{bmatrix} \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta \end{bmatrix}
+ \rho\sin\phi \times \det \begin{bmatrix} \sin\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \sin\phi\sin\theta & \rho\sin\phi\cos\theta \end{bmatrix} \\
&= \rho^2\cos^2\phi\sin\phi + \rho^2\sin^3\phi = \rho^2\sin\phi.
\end{aligned}
\]
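The cofactor expansion above can be double-checked numerically. The following sketch is an illustration, not the author's method: the names `spherical_map` and `numerical_jacobian_det`, the step size, and the sample point are all assumptions. It approximates the derivative matrix of Ψ by central differences and compares its determinant with ρ² sin ϕ.

```python
import math


def spherical_map(rho, phi, theta):
    """The map Psi(rho, phi, theta) = (x, y, z)."""
    return (
        rho * math.sin(phi) * math.cos(theta),
        rho * math.sin(phi) * math.sin(theta),
        rho * math.cos(phi),
    )


def numerical_jacobian_det(rho, phi, theta, eps=1e-6):
    """Determinant of the central-difference approximation of D Psi."""
    point = [rho, phi, theta]
    columns = []
    for k in range(3):
        plus, minus = list(point), list(point)
        plus[k] += eps
        minus[k] -= eps
        fp, fm = spherical_map(*plus), spherical_map(*minus)
        # Column k holds the partial derivatives with respect to variable k.
        columns.append([(fp[i] - fm[i]) / (2.0 * eps) for i in range(3)])
    m = [[columns[k][i] for k in range(3)] for i in range(3)]
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


rho, phi, theta = 2.0, 0.7, 1.3
print(numerical_jacobian_det(rho, phi, theta), rho ** 2 * math.sin(phi))
```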
This shows that DΨ(ρ, ϕ, θ) is invertible if and only if ρ ≠ 0 and sin ϕ ≠ 0, if and only if (x, y, z) does not lie on the z-axis. Thus, for any real number α, if Oα is the open set Oα = {(ρ, ϕ, θ) | ρ > 0, 0 < ϕ < π, α < θ < α + 2π}, then Ψ : Oα → R3 is a smooth change of variables. The change of variables theorem gives the following.

Theorem 6.64 Let [α, β] and [δ, η] be closed intervals such that β ≤ α + 2π and 0 ≤ δ < η ≤ π, and let I = [δ, η] × [α, β]. Assume that the functions g : I → R and h : I → R satisfy 0 ≤ g(ϕ, θ) ≤ h(ϕ, θ) for all (ϕ, θ) ∈ I. Let D be the region in R3 defined by
D = {(ρ sin ϕ cos θ, ρ sin ϕ sin θ, ρ cos ϕ) | (ϕ, θ) ∈ I, g(ϕ, θ) ≤ ρ ≤ h(ϕ, θ)}.
If f : D → R is a continuous function, then
\[ \int_{D} f(x, y, z)\,dxdydz = \int_{\alpha}^{\beta} \int_{\delta}^{\eta} \int_{g(\phi,\theta)}^{h(\phi,\theta)} f(\rho\sin\phi\cos\theta, \rho\sin\phi\sin\theta, \rho\cos\phi)\, \rho^2 \sin\phi\, d\rho d\phi d\theta. \]
Again, if β < α + 2π, δ > 0, η < π and g(ϕ, θ) > 0 for all (ϕ, θ) ∈ I, this is just a direct consequence of the general change of variables theorem. The rest can be argued by taking limits.

The results of Example 6.57 can be used to give some insight into the Jacobian
\[ \frac{\partial(x, y, z)}{\partial(\rho, \phi, \theta)} = \rho^2 \sin\phi \]
that appears in the change from spherical coordinates to rectangular coordinates. Consider the rectangle I = [ρ1, ρ2] × [ϕ1, ϕ2] × [θ1, θ2] in the (ρ, ϕ, θ) space, where θ2 < θ1 + 2π, and for simplicity, assume that 0 < ϕ1 < ϕ2 < π/2. Under the
mapping Ψ(ρ, ϕ, θ) = (ρ sin ϕ cos θ, ρ sin ϕ sin θ, ρ cos ϕ), Ψ(I) is a wedge in the solid E in R3 bounded between the spheres $x^2 + y^2 + z^2 = \rho_1^2$, $x^2 + y^2 + z^2 = \rho_2^2$, and the cones $z = \cot\phi_1 \sqrt{x^2 + y^2}$, $z = \cot\phi_2 \sqrt{x^2 + y^2}$. Since E has a rotational symmetry with respect to θ,
\[ \Delta V = \operatorname{vol}(\Psi(I)) = \frac{\theta_2 - \theta_1}{2\pi} \operatorname{vol}(E). \]
Using the inclusion and exclusion principle, the result of Example 6.57 gives
\[
\begin{aligned}
\operatorname{vol}(E)
&= \frac{2\pi}{3} \rho_2^3 (1 - \cos\phi_2) - \frac{2\pi}{3} \rho_1^3 (1 - \cos\phi_2) - \frac{2\pi}{3} \rho_2^3 (1 - \cos\phi_1) + \frac{2\pi}{3} \rho_1^3 (1 - \cos\phi_1) \\
&= \frac{2\pi}{3} (\rho_2^3 - \rho_1^3)(\cos\phi_1 - \cos\phi_2).
\end{aligned}
\]
Figure 6.46: Volume change under spherical coordinates.

By the mean value theorem, there is a ρ ∈ (ρ1, ρ2) and a ϕ ∈ (ϕ1, ϕ2) such that
\[ \rho_2^3 - \rho_1^3 = 3\rho^2 \Delta\rho \qquad \text{and} \qquad \cos\phi_1 - \cos\phi_2 = \sin\phi\, \Delta\phi, \]
where ∆ρ = ρ2 − ρ1, ∆ϕ = ϕ2 − ϕ1. Let ∆θ = θ2 − θ1. Then we find that
\[ \Delta V = \rho^2 \sin\phi\, \Delta\rho\, \Delta\phi\, \Delta\theta. \]
This gives an interpretation of the Jacobian $\rho^2 \sin\phi$. Let us look at an example of applying spherical coordinates.

Example 6.58 Compute the integral
\[ \int_{E} (x^2 + 4z)\,dxdydz, \]
where E = {(x, y, z) | $x^2 + y^2 + z^2 \le 9$, z ≥ 0}.

Solution Let B be the ball B = {(x, y, z) | $x^2 + y^2 + z^2 \le 9$}. By symmetry,
\[ \int_{E} x^2\,dxdydz = \frac{1}{2} \int_{B} x^2\,dxdydz = \frac{1}{6} \int_{B} (x^2 + y^2 + z^2)\,dxdydz. \]
Using spherical coordinates, we have
\[ \int_{E} x^2\,dxdydz = \frac{1}{6} \int_{0}^{2\pi} \int_{0}^{\pi} \int_{0}^{3} \rho^2 \times \rho^2 \sin\phi\, d\rho d\phi d\theta = \frac{\pi}{3} \Big[ -\cos\phi \Big]_{0}^{\pi} \left[ \frac{\rho^5}{5} \right]_{0}^{3} = \frac{162\pi}{5}. \]
On the other hand,
\[ \int_{E} z\,dxdydz = \int_{0}^{2\pi} \int_{0}^{\pi/2} \int_{0}^{3} \rho\cos\phi \times \rho^2 \sin\phi\, d\rho d\phi d\theta = 2\pi \left[ -\frac{\cos^2\phi}{2} \right]_{0}^{\pi/2} \left[ \frac{\rho^4}{4} \right]_{0}^{3} = \frac{81\pi}{4}. \]
Therefore,
\[ \int_{E} (x^2 + 4z)\,dxdydz = \frac{162\pi}{5} + 81\pi = \frac{567\pi}{5}. \]
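The value 567π/5 was obtained with the help of a symmetry argument, so it is reassuring to confirm it by brute force. The sketch below is only an illustration: `spherical_integral` is a hypothetical helper, and the midpoint rule with 60 subintervals per variable is an arbitrary choice. It evaluates the triple integral in spherical coordinates over 0 ≤ ρ ≤ 3, 0 ≤ ϕ ≤ π/2, 0 ≤ θ ≤ 2π, including the Jacobian factor ρ² sin ϕ.

```python
import math


def spherical_integral(f, rho_max, phi_max, theta_max, n=60):
    """Midpoint rule over [0, rho_max] x [0, phi_max] x [0, theta_max]."""
    d_rho, d_phi, d_theta = rho_max / n, phi_max / n, theta_max / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * d_rho
        for j in range(n):
            phi = (j + 0.5) * d_phi
            weight = rho * rho * math.sin(phi)  # Jacobian rho^2 sin(phi)
            for k in range(n):
                theta = (k + 0.5) * d_theta
                x = rho * math.sin(phi) * math.cos(theta)
                y = rho * math.sin(phi) * math.sin(theta)
                z = rho * math.cos(phi)
                total += f(x, y, z) * weight
    return total * d_rho * d_phi * d_theta


# Upper half of the ball of radius 3, as in Example 6.58.
value = spherical_integral(lambda x, y, z: x * x + 4 * z, 3.0, math.pi / 2, 2 * math.pi)
print(value, 567.0 * math.pi / 5.0)
```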
In the example above, we have used the symmetry of the region E to avoid some complicated computations. Another example is the following.

Example 6.59 Let a be a positive number. Evaluate the integral
\[ \int_{E} x^4\,dxdydz, \]
where E = {(x, y, z) | x ≥ 0, y ≥ 0, z ≥ 0, $x^2 + y^2 + z^2 \le a^2$}.

Solution The expression of the z variable in terms of the spherical coordinates is considerably simpler than that of the x and y variables. By symmetry, we have
\[ \int_{E} x^4\,dxdydz = \int_{E} z^4\,dxdydz. \]
Thus, using spherical coordinates, we find that
\[ \int_{E} x^4\,dxdydz = \int_{0}^{\pi/2} \int_{0}^{\pi/2} \int_{0}^{a} \rho^4 \cos^4\phi \times \rho^2 \sin\phi\, d\rho d\phi d\theta = \frac{\pi}{2} \left[ -\frac{\cos^5\phi}{5} \right]_{0}^{\pi/2} \left[ \frac{\rho^7}{7} \right]_{0}^{a} = \frac{\pi a^7}{70}. \]

6.5.4 Other Examples
Example 6.60 Let D be the region D = {(x, y) | y > 0, $4 \le x^2 - y^2 \le 9$, 3 ≤ xy ≤ 7}. Compute the integral
\[ \int_{D} (x^3 y + x y^3)\,dxdy. \]
Solution Let O = {(x, y) | y > 0}, and let Ψ : O → R2 be the mapping $\Psi(x, y) = (x^2 - y^2, xy)$. If (x1, y1) and (x2, y2) are points in O such that Ψ(x1, y1) = Ψ(x2, y2), then
\[ x_1^2 - y_1^2 = x_2^2 - y_2^2 \qquad \text{and} \qquad x_1 y_1 = x_2 y_2. \]
Let z1 = x1 + iy1 and z2 = x2 + iy2. Then
\[ z_1^2 = (x_1 + i y_1)^2 = (x_2 + i y_2)^2 = z_2^2. \]
This implies that z2 = ±z1. Thus, y2 = ±y1. Since y1 and y2 are positive, we find that y1 = y2. Since x1 y1 = x2 y2, we then deduce that x1 = x2. Hence, Ψ : O → R2 is one-to-one. Since it is a polynomial mapping, it is continuously differentiable. Since
\[ D\Psi(x, y) = \begin{bmatrix} 2x & -2y \\ y & x \end{bmatrix}, \qquad \det D\Psi(x, y) = 2(x^2 + y^2), \]
we find that det DΨ(x, y) ≠ 0 for all (x, y) ∈ O. This implies that Ψ : O → R2 is a smooth change of variables. Let $u = x^2 - y^2$, v = xy. The Jacobian is
\[ \frac{\partial(u, v)}{\partial(x, y)} = 2(x^2 + y^2). \]
Notice that Ψ(D) = {(u, v) | 4 ≤ u ≤ 9, 3 ≤ v ≤ 7}. Therefore,
\[ \int_{D} xy(x^2 + y^2)\,dxdy = \frac{1}{2} \int_{D} xy \left| \frac{\partial(u, v)}{\partial(x, y)} \right| dxdy = \frac{1}{2} \int_{\Psi(D)} v\,dudv = \frac{1}{2} \int_{3}^{7} \int_{4}^{9} v\,dudv = 50. \]
Figure 6.47: The region D = {(x, y) | y > 0, $4 \le x^2 - y^2 \le 9$, 3 ≤ xy ≤ 7}.

Remark 6.11 Hyperspherical Coordinates
For any n ≥ 4, the hyperspherical coordinates in Rn are the coordinates (r, θ1, . . . , θn−1) such that
\[
\begin{aligned}
x_1 &= r\cos\theta_1, \\
x_2 &= r\sin\theta_1\cos\theta_2, \\
x_3 &= r\sin\theta_1\sin\theta_2\cos\theta_3, \\
&\ \ \vdots \\
x_{n-1} &= r\sin\theta_1 \cdots \sin\theta_{n-2}\cos\theta_{n-1}, \\
x_n &= r\sin\theta_1 \cdots \sin\theta_{n-2}\sin\theta_{n-1}.
\end{aligned}
\tag{6.12}
\]
Here
\[ r = \sqrt{x_1^2 + \cdots + x_n^2}. \]
If $V = (0, \infty) \times [0, \pi)^{n-2} \times [0, 2\pi)$, there is a one-to-one correspondence between (r, θ1, . . . , θn−1) in V and (x1, . . . , xn) in Rn \ {0} given by (6.12). One can show that the Jacobian of this transformation is
\[ \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(r, \theta_1, \ldots, \theta_{n-1})} = r^{n-1} \sin^{n-2}\theta_1 \cdots \sin\theta_{n-2}. \]
Exercises 6.5

Question 1 Let D = {(x, y) | $4(x + 1)^2 + 9(y - 2)^2 \le 144$}. Evaluate the integral
\[ \int_{D} (2x + 3y)\,dxdy. \]

Question 2 Evaluate the integral
\[ \int_{D} \frac{x + y}{(x - 2y + 8)^2}\,dxdy, \]
where D = {(x, y) | |x| + 2|y| ≤ 7}.

Question 3 Evaluate the integral
\[ \int_{D} (x^2 - xy + y^2)\,dxdy, \]
where D = {(x, y) | x ≥ 0, $x^2 + 9y^2 \le 36$}.

Question 4 Find the volume of the solid bounded between the surface $z = x^2 + y^2$ and the surface $x^2 + y^2 + z^2 = 20$.

Question 5 Let a, b and c be positive numbers, and let E be the solid
\[ E = \left\{ (x, y, z) \,\middle|\, \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \le 1 \right\}. \]
Evaluate
\[ \int_{E} x^2\,dxdydz. \]
Question 6 Let a, b and c be positive numbers, and let E be the solid
\[ E = \left\{ (x, y, z) \,\middle|\, x \ge 0,\ \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \le 1 \right\}. \]
Evaluate
\[ \int_{E} x^6\,dxdydz. \]

Question 7 Let D be the region D = {(x, y) | x > 0, $1 \le x^2 - y^2 \le 25$, 1 ≤ xy ≤ 6}. Compute the integral
\[ \int_{D} \frac{x^4 - y^4}{xy}\,dxdy. \]

Question 8 Let D be the region D = {(x, y) | $5x^2 - 2xy + 10y^2 \le 9$}, and let Ψ : R2 → R2 be the mapping defined by Ψ(x, y) = (2x + y, x − 3y).
(a) Explain why Ψ : R2 → R2 is a smooth change of variables.
(b) Find Ψ(D).
(c) Compute the integral
\[ \int_{D} \frac{8}{5x^2 - 2xy + 10y^2 + 16}\,dxdy. \]
6.6 Proof of the Change of Variables Theorem
In this section, we give a complete proof of the change of variables theorem, which we restate here.

Theorem 6.65 The Change of Variables Theorem
Let O be an open subset of Rn, and let Ψ : O → Rn be a smooth change of variables. If D is a Jordan measurable set such that its closure $\overline{D}$ is contained in O, then Ψ(D) is Jordan measurable, and for any function f : Ψ(D) → R that is bounded and continuous, we have
\[ \int_{\Psi(D)} f(x)\,dx = \int_{D} f(\Psi(x)) \left| \det D\Psi(x) \right| dx. \]
Among the assertions in the theorem, we will first establish the following.

Theorem 6.66 Let O be an open subset of Rn, and let Ψ : O → Rn be a smooth change of variables. If D is a Jordan measurable set such that its closure $\overline{D}$ is contained in O, then Ψ(D) is also Jordan measurable.

A special case of the change of variables theorem is when f : Ψ(D) → R is the characteristic function of Ψ(D). This gives the change of volume theorem.

Theorem 6.67 The Change of Volume Theorem
Let O be an open subset of Rn, and let Ψ : O → Rn be a smooth change of variables. If D is a Jordan measurable set such that its closure $\overline{D}$ is contained in O, then
\[ \operatorname{vol}(\Psi(D)) = \int_{D} \left| \det D\Psi(x) \right| dx. \]
In the following, let us give some remarks about the statements in the theorem, and outline the plan of the proof.
The Change of Variables Theorem

1. The first step is to prove Theorem 6.66, which asserts that Ψ(D) is Jordan measurable. To do this, we first show that a smooth change of variables sets up a one-to-one correspondence between the open sets in the domain and the range. This basically follows from the inverse function theorem.

2. Since f : Ψ(D) → R is continuous and bounded, if Ψ(D) is Jordan measurable, f : Ψ(D) → R is Riemann integrable.

3. Let g : O → R be the function g(x) = | det DΨ(x)|. Since Ψ : O → Rn is continuously differentiable, the function $D\Psi : O \to \mathbb{R}^{n^2}$ is continuous. Since determinant and absolute value are continuous functions, g : O → R is a continuous function. Since $\overline{D}$ is a compact set contained in O, g : D → R is bounded.

4. Since Ψ : O → Rn is continuous, and the functions f : Ψ(D) → R and g : D → R are continuous and bounded, the function h : D → R, h(x) = f(Ψ(x)) g(x) = f(Ψ(x)) |det DΨ(x)| is continuous and bounded. Hence, it is Riemann integrable.

5. To prove the change of variables theorem, we will first prove the change of volume theorem. This is the most technical part of the proof.

6. To prove the change of volume theorem, we first consider the case where Ψ : Rn → Rn is an invertible linear transformation. In this case, the theorem says that if D is a Jordan measurable set, and T : Rn → Rn, T(x) = Ax is an invertible linear transformation, then
\[ \operatorname{vol}(T(D)) = |\det A| \operatorname{vol}(D). \tag{6.13} \]

7. To prove (6.13), we first consider the case where D = I is a closed rectangle. This is an easy consequence of the fact that the volume of a parallelepiped spanned by the vectors v1, . . . , vn is equal to | det A|, where A is the matrix with v1, . . . , vn as column vectors. This was proved in Appendix B.

8. After proving the change of volume theorem, we will prove the change of variables theorem for the special case where D = I is a closed rectangle first. The general theorem then follows by some simple analysis argument.

We begin with the following proposition, which says that a smooth change of variables maps open sets to open sets.

Proposition 6.68 Let O be an open subset of Rn, and let Ψ : O → Rn be a smooth change of variables. Then for any open set D that is contained in O, Ψ(D) is open in Rn. In particular, Ψ(O) is an open subset of Rn.

Proof Given that D is an open subset of Rn, let W = Ψ(D). We want to show that W is an open set. If y0 is a point in W, there is an x0 in D such that y0 = Ψ(x0). Since Ψ : O → Rn is continuously differentiable and DΨ(x0) is invertible, we can apply the inverse function theorem to conclude that there is an open set U0 containing x0 such that Ψ(U0) is also open, and $\Psi^{-1} : \Psi(U_0) \to U_0$ is continuously differentiable. Let U = U0 ∩ D. Then U is an open subset of D and U0. It follows that $V = \Psi(U) = (\Psi^{-1})^{-1}(U)$ is an open subset of Rn that is contained in W = Ψ(D). Notice that V is an open set that contains y0. Thus, we have shown that every point in W has a neighbourhood that lies in W. This proves that W is an open set.

The following proposition says that the inverse of a smooth change of variables is also a smooth change of variables.
Proposition 6.69 Let O be an open subset of Rn, and let Ψ : O → Rn be a smooth change of variables. Then $\Psi^{-1} : \Psi(O) \to \mathbb{R}^n$ is also a smooth change of variables.

Proof By Proposition 6.68, Ψ(O) is an open set. By default, $\Psi^{-1} : \Psi(O) \to \mathbb{R}^n$ is one-to-one. As in the proof of Proposition 6.68, the inverse function theorem implies that it is continuously differentiable. If $x_0 = \Psi^{-1}(y_0)$, the inverse function theorem says that $D\Psi^{-1}(y_0) = D\Psi(x_0)^{-1}$. The inverse of an invertible matrix is invertible. Hence, for any y0 in Ψ(O), $D\Psi^{-1}(y_0)$ is invertible. This proves that $\Psi^{-1} : \Psi(O) \to \mathbb{R}^n$ is a smooth change of variables.

Remark 6.12 Homeomorphisms and Diffeomorphisms
Let O be an open subset of Rn, and let Ψ : O → Rn be a continuous injective map such that Ψ(O) is open, and the inverse map $\Psi^{-1} : \Psi(O) \to O$ is continuous. Then we say that Ψ : O → Ψ(O) is a homeomorphism. A homeomorphism sets up a one-to-one correspondence between open sets in O and open sets in Ψ(O). If Ψ : O → Ψ(O) is a homeomorphism and both the maps Ψ : O → Ψ(O) and $\Psi^{-1} : \Psi(O) \to O$ are continuously differentiable, then we say that Ψ : O → Ψ(O) is a diffeomorphism. Proposition 6.68 and Proposition 6.69 imply that a smooth change of variables is a diffeomorphism. A map of the form Ψ : Rn → Rn, Ψ(x) = y0 + A(x − x0), where x0 and y0 are points in Rn and A is an invertible matrix, is a diffeomorphism.

Now we can prove the following which is essential for the proof of Theorem
6.66.

Theorem 6.70 Assume that O and U are open subsets of Rn, and Ψ : O → U is a homeomorphism. If D is a subset of O such that $\overline{D}$ is also contained in O, then
\[ \operatorname{int} \Psi(D) = \Psi(\operatorname{int} D), \qquad \overline{\Psi(D)} = \Psi(\overline{D}). \]
Thus, ∂Ψ(D) = Ψ(∂D).

Proof The interior of a set A is an open set that contains all the open sets that are contained in A. By Remark 6.12, there is a one-to-one correspondence between the open sets that are contained in D and the open sets that are contained in Ψ(D). Therefore, int Ψ(D) = Ψ(int D). Since $\overline{D}$ is a compact set and Ψ : O → U is continuous, $\Psi(\overline{D})$ is a compact set. Therefore, $\Psi(\overline{D})$ is a closed set that contains Ψ(D). This implies that
\[ \overline{\Psi(D)} \subset \Psi(\overline{D}). \tag{6.14} \]
Since $\Psi^{-1} : U \to O$ is also continuous, the same argument gives
\[ \overline{D} = \overline{\Psi^{-1}(\Psi(D))} \subset \Psi^{-1}\left( \overline{\Psi(D)} \right). \]
This implies that
\[ \Psi(\overline{D}) \subset \overline{\Psi(D)}. \tag{6.15} \]
Eq. (6.14) and (6.15) give $\overline{\Psi(D)} = \Psi(\overline{D})$. The last assertion follows from the fact that for any set A, $\overline{A}$ is a disjoint union of int A and ∂A.

Recall that a set D in Rn has Jordan content zero if and only if for every ε > 0,
D can be covered by finitely many cubes Q1, . . ., Qk, such that
\[ \sum_{j=1}^{k} \operatorname{vol}(Q_j) < \varepsilon. \]
The next proposition gives a control of the size of the cube under a smooth change of variables.

Proposition 6.71 Let O be an open subset of Rn, and let Ψ : O → Rn be a smooth change of variables. If $Q_{c,r}$ is a cube with center at c and side length 2r, then $\Psi(Q_{c,r})$ is contained in the cube $Q_{\Psi(c),\lambda r}$, where
\[ \lambda = \max_{1 \le i \le n} \max_{x \in Q_{c,r}} \sum_{j=1}^{n} \left| \frac{\partial \Psi_i}{\partial x_j}(x) \right|. \]
Therefore, $\operatorname{vol}(\Psi(Q_{c,r})) \le \lambda^n \operatorname{vol}(Q_{c,r})$.

Remark 6.13 Note that since $Q_{c,r}$ is a compact set and $\dfrac{\partial \Psi_i}{\partial x_j}(x)$ is continuous for all 1 ≤ i, j ≤ n,
\[ \max_{x \in Q_{c,r}} \sum_{j=1}^{n} \left| \frac{\partial \Psi_i}{\partial x_j}(x) \right| \]
exists.

Proof of Proposition 6.71 Notice that $u \in Q_{c,r}$ if and only if |ui − ci| ≤ r for each 1 ≤ i ≤ n. Let d = Ψ(c). Given v = Ψ(u) with $u \in Q_{c,r}$, we want to show that v is in $Q_{d,\lambda r}$, or equivalently, |vi − di| ≤ λr for each 1 ≤ i ≤ n.
This is basically an application of the mean value theorem. The set $Q_{c,r}$ is convex and the map Ψi : O → R is continuously differentiable. The mean value theorem says that there is a point x in $Q_{c,r}$ such that
\[ v_i - d_i = \Psi_i(u) - \Psi_i(c) = \sum_{j=1}^{n} \frac{\partial \Psi_i}{\partial x_j}(x)(u_j - c_j). \]
Therefore,
\[ |v_i - d_i| \le \sum_{j=1}^{n} \left| \frac{\partial \Psi_i}{\partial x_j}(x) \right| |u_j - c_j| \le r \sum_{j=1}^{n} \left| \frac{\partial \Psi_i}{\partial x_j}(x) \right| \le r \max_{y \in Q_{c,r}} \sum_{j=1}^{n} \left| \frac{\partial \Psi_i}{\partial x_j}(y) \right| \le \lambda r. \]
This proves that $\Psi(Q_{c,r})$ is contained in $Q_{\Psi(c),\lambda r}$. The last assertion in the proposition about the volumes is obvious.

Now we prove Theorem 6.66.

Proof of Theorem 6.66 Since $\overline{D}$ is a compact set that is contained in the open set O, Theorem 3.36 says that there is a positive number d and a compact set C such that D ⊂ C ⊂ O, and any point in Rn that has a distance less than d from a point in D lies in C. Since Ψ : O → Rn is continuously differentiable, for all 1 ≤ i, j ≤ n, $\dfrac{\partial \Psi_i}{\partial x_j} : C \to \mathbb{R}$ is a continuous function. Since C is a compact set, for each 1 ≤ i ≤ n, the function
\[ \sum_{j=1}^{n} \left| \frac{\partial \Psi_i}{\partial x_j}(x) \right| \]
has a maximum on C. Hence,
\[ \lambda = \max_{1 \le i \le n} \max_{x \in C} \sum_{j=1}^{n} \left| \frac{\partial \Psi_i}{\partial x_j}(x) \right| \]
exists.
Since D is Jordan measurable, ∂D has Jordan content zero. Since C contains D, it contains ∂D. Given ε > 0, there exist cubes Q1 , Q2 , . . . , Qk , each of which intersects D, and such that ∂D ⊂
n [
Qj
j=1
and
k X
vol (Qj )
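0, there is a partition P of I such that
\[ U(\chi_D, P) - \int_{I} \chi_D < \frac{\varepsilon}{|\det A|}. \]
Hence,
\[ U(\chi_D, P) < \operatorname{vol}(D) + \frac{\varepsilon}{|\det A|}. \]
Let $\mathscr{A} = \{ J \in \mathcal{J}_P \mid J \cap D \ne \emptyset \}$. Then
\[ \sum_{J \in \mathscr{A}} \operatorname{vol}(J) = U(\chi_D, P) < \operatorname{vol}(D) + \frac{\varepsilon}{|\det A|}. \]
Notice that
\[ D \subset \bigcup_{J \in \mathscr{A}} J. \]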
Therefore,
\[ T(D) \subset \bigcup_{J \in \mathscr{A}} T(J). \]
For each rectangle J, T(J) is a parallelepiped. Any two distinct rectangles in $\mathscr{A}$ are either disjoint or intersect in a set that has Jordan content zero. Therefore, the additivity theorem implies that the set K defined as
\[ K = \bigcup_{J \in \mathscr{A}} T(J) \]
is Jordan measurable, and
\[ \operatorname{vol}(K) = \sum_{J \in \mathscr{A}} \operatorname{vol}(T(J)) = |\det A| \sum_{J \in \mathscr{A}} \operatorname{vol}(J) < |\det A| \operatorname{vol}(D) + \varepsilon. \]
Since T(D) ⊂ K, we find that
\[ \operatorname{vol}(T(D)) \le \operatorname{vol}(K) < |\det A| \operatorname{vol}(D) + \varepsilon. \]
Since ε > 0 is arbitrary, we conclude that
\[ \operatorname{vol}(T(D)) \le |\det A| \operatorname{vol}(D). \tag{6.16} \]
Since $T^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ is also an invertible linear transformation, we find that
\[ \operatorname{vol}(D) = \operatorname{vol}\left( T^{-1}(T(D)) \right) \le |\det A^{-1}| \operatorname{vol}(T(D)) = \frac{1}{|\det A|} \operatorname{vol}(T(D)). \tag{6.17} \]
Eq. (6.16) and (6.17) together give vol (T(D)) = | det A| vol (D).

Recall that by identifying an n × n matrix A = [aij] as a point in $\mathbb{R}^{n^2}$, we have defined the norm of A as
\[ \|A\| = \sqrt{ \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^2 }. \]
Besides the triangle inequality, this norm also satisfies the following inequality.

Lemma 6.75 If A = [aij] and B = [bij] are n × n matrices, then ∥AB∥ ≤ ∥A∥ ∥B∥.

Proof Let [cij] = C = AB. Then for any 1 ≤ i, j ≤ n,
\[ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}. \]
By the Cauchy-Schwarz inequality,
\[ c_{ij}^2 \le \left( \sum_{k=1}^{n} a_{ik}^2 \right) \left( \sum_{l=1}^{n} b_{lj}^2 \right). \]
Therefore,
\[ \|C\|^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij}^2 \le \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \sum_{k=1}^{n} a_{ik}^2 \right) \left( \sum_{l=1}^{n} b_{lj}^2 \right) = \|A\|^2 \|B\|^2. \]
This proves that ∥AB∥ = ∥C∥ ≤ ∥A∥ ∥B∥.

Now we prove the change of volume formula, which is the most technical part.

Proof of Theorem 6.67 Given the smooth change of variables Ψ : O → Rn, let g : O → R be the continuous function g(x) = | det DΨ(x)|. We want to show that if D is a Jordan measurable set such that its closure $\overline{D}$ is contained in O, then
\[ \operatorname{vol}(\Psi(D)) = \int_{D} |\det D\Psi(x)|\,dx = \int_{D} g(x)\,dx = I. \]
We will first prove that vol (Ψ(D)) ≤ I. By Theorem 6.66, Ψ(D) is Jordan measurable. By Theorem 6.49, its closure $\overline{\Psi(D)} = \Psi(\overline{D})$ is also Jordan measurable, and
\[ \operatorname{vol}(\Psi(D)) = \operatorname{vol}\left( \overline{\Psi(D)} \right) = \operatorname{vol}\left( \Psi(\overline{D}) \right). \]
On the other hand, since $\overline{D} \setminus D$ has Jordan content zero,
\[ \int_{\overline{D}} g(x)\,dx = \int_{D} g(x)\,dx. \]
Hence, we can assume from the beginning that $D = \overline{D}$, or equivalently, D is closed. As in the proof of Theorem 6.66, Theorem 3.36 says that there is a positive number d and a compact set C such that D ⊂ C ⊂ O, and any point in Rn that has a distance less than d from a point in D lies in C. On the compact set C, the function g : C → R is continuous. By the extreme value theorem, there are points u and v in C such that g(u) ≤ g(x) ≤ g(v) for all x ∈ C. Let $m_g = g(u)$ and $M_g = g(v)$. Then $m_g > 0$ and
\[ m_g \le g(x) \le M_g \qquad \text{for all } x \in C. \]
On the other hand, the function $D\Psi^{-1} : C \to \mathbb{R}^{n^2}$ is continuous on the compact set C. Hence, it is bounded. Namely, there is a positive number $M_h$ such that
\[ \| D\Psi^{-1}(x) \| \le M_h \qquad \text{for all } x \in C. \]
Let L be a positive number such that C is contained in the cube $I = [-L, L]^n$. Let $\check{g} : I \to \mathbb{R}$ be the zero extension of g : D → R. For each positive integer k, let $P_k$ be the uniformly regular partition of I into $k^n$ rectangles. Then $\lim_{k \to \infty} |P_k| = 0$. Therefore,
\[ \lim_{k \to \infty} U(\check{g}, P_k) = \int_{I} \check{g}(x)\,dx = \int_{D} g(x)\,dx = I, \]
and
\[ \lim_{k \to \infty} \left( U(\chi_D, P_k) - L(\chi_D, P_k) \right) = 0. \]
The compactness of C implies that the continuous functions $D\Psi : C \to \mathbb{R}^{n^2}$ and g : C → R are uniformly continuous. Given ε > 0, there exists a δ1 > 0 such that if u and v are points in C with ∥u − v∥ < δ1, then ∥DΨ(u) − DΨ(v)∥
0, there is a δ > 0 such that if u and v are points in I, |f (Ψ(u)) − f (Ψ(v))|
vol (D) −
ε . 2M
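Let
\[ A = \{ J \in \mathcal{J}_P \mid J \cap D \neq \emptyset \}, \qquad B = \{ J \in \mathcal{J}_P \mid J \subset D \}. \]
Since $|P| < d$, each $J$ in $A$ is contained in $C$. Moreover,
\[ L(\chi_D, P) = \sum_{J \in B} \operatorname{vol}(J). \]
Denote by $Q$ the set
\[ Q = \bigcup_{J \in B} J. \]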
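Then $Q$ is a compact subset of $D$, $S = D \setminus Q$ is Jordan measurable, and
\[ \operatorname{vol}(D \setminus Q) = \operatorname{vol}(D) - \sum_{J \in B} \operatorname{vol}(J) < \frac{\varepsilon}{2M}. \]
By the additivity theorem,
\[ \int_{\Psi(D)} f(x)\,dx = \sum_{J \in B} \int_{\Psi(J)} f(x)\,dx + \int_{\Psi(D \setminus Q)} f(x)\,dx, \]
\[ \int_D f(\Psi(x))\,|\det D\Psi(x)|\,dx = \sum_{J \in B} \int_J f(\Psi(x))\,|\det D\Psi(x)|\,dx + \int_{D \setminus Q} f(\Psi(x))\,|\det D\Psi(x)|\,dx. \]
Theorem 6.77 says that for each $J$ in $B$,
\[ \int_{\Psi(J)} f(x)\,dx = \int_J f(\Psi(x))\,|\det D\Psi(x)|\,dx. \]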
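Therefore,
\[ \left| \int_{\Psi(D)} f(x)\,dx - \int_D f(\Psi(x))\,|\det D\Psi(x)|\,dx \right| \le \left| \int_{\Psi(D \setminus Q)} f(x)\,dx \right| + \left| \int_{D \setminus Q} f(\Psi(x))\,|\det D\Psi(x)|\,dx \right|. \]
For the term $\int_{\Psi(D \setminus Q)} f(x)\,dx$, we have
\[ \left| \int_{\Psi(D \setminus Q)} f(x)\,dx \right| \le M_f \operatorname{vol}(\Psi(D \setminus Q)). \]
By the change of volume theorem,
\[ \operatorname{vol}(\Psi(D \setminus Q)) = \int_{D \setminus Q} |\det D\Psi(x)|\,dx \le M_g \operatorname{vol}(D \setminus Q). \]
Therefore,
\[ \left| \int_{\Psi(D \setminus Q)} f(x)\,dx \right| \le M_f M_g \operatorname{vol}(D \setminus Q) = M \operatorname{vol}(D \setminus Q) < \frac{\varepsilon}{2}. \]
Similarly,
\[ \left| \int_{D \setminus Q} f(\Psi(x))\,|\det D\Psi(x)|\,dx \right| \le M \operatorname{vol}(D \setminus Q) < \frac{\varepsilon}{2}. \]
This gives
\[ \left| \int_{\Psi(D)} f(x)\,dx - \int_D f(\Psi(x))\,|\det D\Psi(x)|\,dx \right| < \varepsilon, \]
which completes the proof.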
6.7 Some Important Integrals and Their Applications
Up to now we have only discussed multiple integrals for bounded functions f : D → R defined on bounded domains. For practical applications, we need to consider improper integrals where the function is not bounded or the domain is not bounded. As in the single variable case, we need to take limits. In the multi-variable case, things become considerably more complicated. Interested readers can read the corresponding sections in the book [Zor16].

In this section, we use the theory of multiple integrals to derive some explicit formulas for improper integrals of single-variable functions, without introducing the definition of improper multiple integrals. We then give some applications of these formulas.

Proposition 6.78
For any positive number $a$,
\[ \int_{-\infty}^{\infty} e^{-ax^2}\,dx = \sqrt{\frac{\pi}{a}}. \]
Proof
Since the function $f : \mathbb{R} \to \mathbb{R}$, $f(x) = e^{-ax^2}$ is positive for all $x \in \mathbb{R}$,
\[ \int_{-\infty}^{\infty} e^{-ax^2}\,dx = \lim_{L\to\infty} \int_{-L}^{L} e^{-ax^2}\,dx. \]
Given a positive number $R$, we consider the double integral
\[ I_R = \int_{B(0,R)} e^{-a(x^2+y^2)}\,dx\,dy. \]
For any positive number $L$,
\[ B(0, L) \subset [-L, L] \times [-L, L] \subset B(0, \sqrt{2}L). \]
Since the function $g : \mathbb{R}^2 \to \mathbb{R}$, $g(x, y) = e^{-a(x^2+y^2)}$ is positive,
\[ I_L \le \int_{[-L,L]\times[-L,L]} e^{-a(x^2+y^2)}\,dx\,dy \le I_{\sqrt{2}L}. \tag{6.19} \]
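Using polar coordinates, we find that
\[ I_R = \int_0^{2\pi} \int_0^R e^{-ar^2}\, r\,dr\,d\theta = 2\pi \left[ -\frac{e^{-ar^2}}{2a} \right]_0^R = \frac{\pi}{a}\left( 1 - e^{-aR^2} \right). \]
Thus,
\[ \lim_{R\to\infty} I_R = \frac{\pi}{a}. \]
Eq. (6.19) then implies that
\[ \lim_{L\to\infty} \int_{[-L,L]\times[-L,L]} e^{-a(x^2+y^2)}\,dx\,dy = \frac{\pi}{a}. \]
By Fubini's theorem,
\[ \int_{[-L,L]\times[-L,L]} e^{-a(x^2+y^2)}\,dx\,dy = \left( \int_{-L}^{L} e^{-ax^2}\,dx \right)^2. \]
Thus, we conclude that
\[ \int_{-\infty}^{\infty} e^{-ax^2}\,dx = \lim_{L\to\infty} \int_{-L}^{L} e^{-ax^2}\,dx = \sqrt{\frac{\pi}{a}}. \]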
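As a numerical sanity check of Proposition 6.78, the following sketch approximates the truncated integral over [−L, L] by the midpoint rule and compares it with √(π/a). The function name `gaussian_integral`, the truncation length `L = 10` and the number of subintervals are illustrative assumptions, not part of the text.

```python
import math

def gaussian_integral(a, L=10.0, n=20_000):
    """Midpoint-rule approximation of the integral of exp(-a x^2) over [-L, L].

    L and n are arbitrary choices for this sketch; the tail beyond |x| = L
    is negligible for the values of a used below.
    """
    h = 2.0 * L / n
    return h * sum(math.exp(-a * (-L + (i + 0.5) * h) ** 2) for i in range(n))

for a in (0.5, 1.0, 2.0):
    approx = gaussian_integral(a)
    exact = math.sqrt(math.pi / a)
    print(f"a = {a}: midpoint rule {approx:.8f}, sqrt(pi/a) {exact:.8f}")
```

The two columns should agree to many decimal places, since the midpoint rule is very accurate for smooth, rapidly decaying integrands.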
Figure 6.48: $B(0, L) \subset [-L, L] \times [-L, L] \subset B(0, \sqrt{2}L)$.

The improper integral $\int_{-\infty}^{\infty} e^{-ax^2}\,dx$ with $a > 0$ plays an important role in various areas of mathematics. For example, in probability theory, the probability density function of a normal random variable with mean $\mu$ and standard deviation $\sigma$ is given by
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right). \]
The normalization factor $1/(\sqrt{2\pi}\,\sigma)$ is required so that
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1, \]
which ensures that the total probability is 1.

Recall that we have defined the gamma function $\Gamma(s)$ for a real number $s > 0$ by the improper integral
\[ \Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t}\,dt. \]
The value of $\Gamma(1)$ is easy to compute:
\[ \Gamma(1) = \int_0^{\infty} e^{-t}\,dt = 1. \]
Using integration by parts, one can show that
\[ \Gamma(s+1) = s\,\Gamma(s) \quad \text{when } s > 0. \tag{6.20} \]
From this, we find that
\[ \Gamma(n+1) = n! \quad \text{for all } n \in \mathbb{Z}^+. \]
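The value of $\Gamma(s)$ when $s = 1/2$ is also of particular interest.

Theorem 6.79
The value of the gamma function $\Gamma(s)$ at $s = 1/2$ is
\[ \Gamma\left( \frac{1}{2} \right) = \sqrt{\pi}. \]

Proof
By Proposition 6.78,
\[ \sqrt{\pi} = 2 \int_0^{\infty} e^{-x^2}\,dx = 2 \lim_{a\to 0^+} \lim_{L\to\infty} \int_a^L e^{-x^2}\,dx. \]
Making a change of variables $t = x^2$, we find that
\[ 2 \int_a^L e^{-x^2}\,dx = \int_{a^2}^{L^2} t^{-\frac{1}{2}} e^{-t}\,dt. \]
Therefore,
\[ \Gamma\left( \frac{1}{2} \right) = \int_0^{\infty} t^{-\frac{1}{2}} e^{-t}\,dt = \lim_{a\to 0^+} \lim_{L\to\infty} \int_{a^2}^{L^2} t^{-\frac{1}{2}} e^{-t}\,dt = 2 \lim_{a\to 0^+} \lim_{L\to\infty} \int_a^L e^{-x^2}\,dx = \sqrt{\pi}. \]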
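The identity Γ(1/2) = √π and the recursion (6.20) can be checked numerically. The sketch below compares Python's built-in `math.gamma` with √π and with a midpoint-rule approximation of 2∫₀^L e^{−x²} dx, which is the integral obtained after the substitution t = x². The helper name `gamma_half_by_substitution` and the truncation length `L` are assumptions made for illustration.

```python
import math

def gamma_half_by_substitution(L=10.0, n=20_000):
    """Approximate Gamma(1/2) as 2 * integral of exp(-x^2) over [0, L].

    This is the integral obtained from t = x^2; L and n are arbitrary
    choices for this sketch.
    """
    h = L / n
    return 2.0 * h * sum(math.exp(-((i + 0.5) * h) ** 2) for i in range(n))

print(math.gamma(0.5), math.sqrt(math.pi), gamma_half_by_substitution())

# Recursion (6.20): Gamma(s + 1) = s * Gamma(s), so Gamma(n + 1) = n!.
for n in range(1, 8):
    print(n, math.gamma(n + 1), math.factorial(n))
```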
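Another useful formula we have mentioned in volume I is the formula for the beta function $B(\alpha, \beta)$ defined as
\[ B(\alpha, \beta) = \int_0^1 t^{\alpha-1} (1-t)^{\beta-1}\,dt \quad \text{when } \alpha > 0,\ \beta > 0. \]
It is easy to show that the integral is indeed convergent when $\alpha$ and $\beta$ are positive. We have the following recursive formula.

Lemma 6.80
For $\alpha > 0$ and $\beta > 0$, we have
\[ B(\alpha, \beta) = \frac{(\alpha+\beta+1)(\alpha+\beta)}{\alpha\beta}\, B(\alpha+1, \beta+1). \]

Proof
First notice that for $\alpha > 0$ and $\beta > 0$,
\[ \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}(1-t+t)\,dt = \int_0^1 t^{\alpha-1}(1-t)^{\beta}\,dt + \int_0^1 t^{\alpha}(1-t)^{\beta-1}\,dt. \]
This gives
\[ B(\alpha, \beta) = B(\alpha+1, \beta) + B(\alpha, \beta+1). \]
Applying this formula again to the two terms on the right, we find that
\[ B(\alpha, \beta) = B(\alpha+2, \beta) + 2B(\alpha+1, \beta+1) + B(\alpha, \beta+2). \]
Using integration by parts, one can show that when $\alpha > 0$ and $\beta > 0$,
\[ \int_0^1 t^{\alpha}(1-t)^{\beta-1}\,dt = \frac{\alpha}{\beta} \int_0^1 t^{\alpha-1}(1-t)^{\beta}\,dt. \]
This gives
\[ B(\alpha+1, \beta) = \frac{\alpha}{\beta}\, B(\alpha, \beta+1). \]
Replacing $(\alpha, \beta)$ by $(\alpha+1, \beta+1)$ and by $(\alpha, \beta+1)$ respectively, we obtain $B(\alpha+2, \beta) = \frac{\alpha+1}{\beta} B(\alpha+1, \beta+1)$ and $B(\alpha, \beta+2) = \frac{\beta+1}{\alpha} B(\alpha+1, \beta+1)$. Therefore,
\[ B(\alpha, \beta) = \left( \frac{\alpha+1}{\beta} + 2 + \frac{\beta+1}{\alpha} \right) B(\alpha+1, \beta+1) = \frac{(\alpha+\beta+1)(\alpha+\beta)}{\alpha\beta}\, B(\alpha+1, \beta+1). \]

Now we can derive the explicit formula for the beta function.

Theorem 6.81 The Beta Function
For any positive real numbers $\alpha$ and $\beta$,
\[ B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}. \]

Proof
We first consider the case where $\alpha > 1$ and $\beta > 1$. Let $g : [0, \infty) \times [0, \infty) \to \mathbb{R}$ be the function defined as
\[ g(u, v) = u^{\alpha-1} v^{\beta-1} e^{-u-v}. \]
This is a continuous function. For $L > 0$, let
\[ U_L = \{ (t, w) \mid 0 \le t \le 1,\ 0 \le w \le L \}, \qquad D_L = \{ (u, v) \mid u \ge 0,\ v \ge 0,\ u + v \le L \}. \]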
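Consider the map $\Psi : \mathbb{R}^2 \to \mathbb{R}^2$ defined as
\[ (u, v) = \Psi(t, w) = (tw, (1-t)w). \]
Notice that $\Psi$ maps the interior of $U_L$ one-to-one onto the interior of $D_L$. The Jacobian of this map is
\[ \frac{\partial(u, v)}{\partial(t, w)} = \det \begin{bmatrix} w & t \\ -w & 1-t \end{bmatrix} = w. \]
Thus, $\Psi : (0, 1) \times (0, \infty) \to \mathbb{R}^2$ is a smooth change of variables. By taking limits, the change of variables theorem implies that
\[ \int_{U_L} (g \circ \Psi)(t, w) \left| \frac{\partial(u, v)}{\partial(t, w)} \right| dt\,dw = \int_{D_L} g(u, v)\,du\,dv. \tag{6.21} \]
Now Fubini's theorem says that
\[ \int_{U_L} (g \circ \Psi)(t, w) \left| \frac{\partial(u, v)}{\partial(t, w)} \right| dt\,dw = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt \int_0^L w^{\alpha+\beta-1} e^{-w}\,dw. \]
Thus,
\[ \lim_{L\to\infty} \int_{U_L} (g \circ \Psi)(t, w) \left| \frac{\partial(u, v)}{\partial(t, w)} \right| dt\,dw = \Gamma(\alpha+\beta)\, B(\alpha, \beta). \]
On the other hand, we notice that
\[ \left[ 0, \frac{L}{2} \right]^2 \subset D_L \subset [0, L]^2. \]
Since the function $g : [0, \infty) \times [0, \infty) \to \mathbb{R}$ is nonnegative, this implies that
\[ I_{L/2} \le \int_{D_L} g(u, v)\,du\,dv \le I_L, \tag{6.22} \]
where
\[ I_L = \int_{[0,L]^2} g(u, v)\,du\,dv = \int_0^L u^{\alpha-1} e^{-u}\,du \int_0^L v^{\beta-1} e^{-v}\,dv. \]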
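By the definition of the gamma function,
\[ \lim_{L\to\infty} I_L = \Gamma(\alpha)\Gamma(\beta). \]
Eq. (6.22) then implies that
\[ \lim_{L\to\infty} \int_{D_L} g(u, v)\,du\,dv = \Gamma(\alpha)\Gamma(\beta). \]
It follows from (6.21) that
\[ \Gamma(\alpha+\beta)\, B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta), \]
which gives the desired formula when $\alpha > 1$ and $\beta > 1$. For the general case where $\alpha > 0$ and $\beta > 0$, Lemma 6.80 and (6.20) give
\[ B(\alpha, \beta) = \frac{(\alpha+\beta+1)(\alpha+\beta)}{\alpha\beta} \cdot \frac{\Gamma(\alpha+1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)} = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}. \]
This completes the proof.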
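The beta function formula and the recursion of Lemma 6.80 lend themselves to a quick numerical check. In the sketch below, `beta_numeric` approximates the defining integral by the midpoint rule (used here only for exponents larger than 1, where the integrand is bounded), and `beta_gamma` evaluates Γ(α)Γ(β)/Γ(α+β) with `math.gamma`. Both function names and the sample exponents are illustrative assumptions.

```python
import math

def beta_numeric(alpha, beta, n=100_000):
    """Midpoint-rule approximation of the integral of t^(alpha-1) (1-t)^(beta-1) on [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** (alpha - 1) * (1.0 - t) ** (beta - 1)
    return h * total

def beta_gamma(alpha, beta):
    """Right-hand side of the beta function formula."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

# Defining integral versus the gamma-function formula.
print(beta_numeric(2.5, 3.5), beta_gamma(2.5, 3.5))

# Recursion of Lemma 6.80, checked through the gamma-function formula.
alpha, beta = 0.5, 0.5
lhs = beta_gamma(alpha, beta)
rhs = (alpha + beta + 1) * (alpha + beta) / (alpha * beta) * beta_gamma(alpha + 1, beta + 1)
print(lhs, rhs, math.pi)  # B(1/2, 1/2) = Gamma(1/2)^2 / Gamma(1) = pi
```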
Figure 6.49: $\left[ 0, \frac{L}{2} \right]^2 \subset \{ (u, v) \mid u \ge 0,\ v \ge 0,\ u + v \le L \} \subset [0, L]^2$.
Now we give an interesting application of the formula of the beta function.
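Theorem 6.82
For $n \ge 1$, the volume of the $n$-ball of radius $a$,
\[ B_n(a) = \left\{ (x_1, \ldots, x_n) \in \mathbb{R}^n \mid x_1^2 + \cdots + x_n^2 \le a^2 \right\}, \]
is equal to
\[ V_n(a) = \frac{\pi^{\frac{n}{2}}}{\Gamma\left( \frac{n+2}{2} \right)}\, a^n. \]

Proof
It is easy to see that for any $a > 0$,
\[ V_n(a) = V_n a^n, \quad \text{where } V_n = V_n(1). \]
Since $B_1(1) = [-1, 1]$, we find that $V_1 = 2$. For $n \ge 2$, notice that for fixed $-1 \le y \le 1$, the ball $B_n(1)$ intersects the plane $x_n = y$ on the set
\[ S_n(y) = \left\{ (x_1, \ldots, x_{n-1}, y) \mid x_1^2 + \cdots + x_{n-1}^2 \le 1 - y^2 \right\}. \]
Fubini's theorem implies that
\[ V_n = \int_{-1}^{1} \left( \int_{B_{n-1}\left(\sqrt{1-y^2}\right)} dx_1 \cdots dx_{n-1} \right) dy = \int_{-1}^{1} (1-y^2)^{\frac{n-1}{2}}\, V_{n-1}\,dy \]
\[ = 2 V_{n-1} \int_0^1 (1-y^2)^{\frac{n-1}{2}}\,dy = V_{n-1} \int_0^1 t^{-\frac{1}{2}} (1-t)^{\frac{n-1}{2}}\,dt \]
\[ = V_{n-1}\, B\left( \frac{1}{2}, \frac{n+1}{2} \right) = \sqrt{\pi}\, V_{n-1}\, \frac{\Gamma\left( \frac{n+1}{2} \right)}{\Gamma\left( \frac{n+2}{2} \right)}. \]
This formula is still correct when $n = 1$ if we define $V_0 = 1$. Therefore,
\[ V_n = \prod_{k=1}^{n} \frac{V_k}{V_{k-1}} = \left( \sqrt{\pi} \right)^n \prod_{k=1}^{n} \frac{\Gamma\left( \frac{k+1}{2} \right)}{\Gamma\left( \frac{k+2}{2} \right)} = \frac{\pi^{\frac{n}{2}}}{\Gamma\left( \frac{n+2}{2} \right)}. \]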
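To see the formula in action, the following sketch evaluates the closed form for the volume of the unit n-ball and, separately, the telescoping product from the proof, starting from V₀ = 1. The function names are hypothetical and chosen only for this illustration.

```python
import math

def unit_ball_volume(n):
    """Closed form: V_n = pi^(n/2) / Gamma((n + 2) / 2)."""
    return math.pi ** (n / 2) / math.gamma((n + 2) / 2)

def unit_ball_volume_recursive(n):
    """Product of the ratios V_k / V_{k-1} from the proof, starting from V_0 = 1."""
    v = 1.0
    for k in range(1, n + 1):
        v *= math.sqrt(math.pi) * math.gamma((k + 1) / 2) / math.gamma((k + 2) / 2)
    return v

for n in range(1, 7):
    print(n, unit_ball_volume(n), unit_ball_volume_recursive(n))
```

For n = 1, 2, 3 the values should reproduce the familiar length 2, area π and volume 4π/3.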
Chapter 7. Fourier Series and Fourier Transforms
517
Chapter 7
Fourier Series and Fourier Transforms

In this chapter, we shift our attention to the theory of Fourier series and Fourier transforms. In volume I, we have considered expansions of functions as power series, which are limits of polynomials. In this chapter, we consider expansions of functions in another class of infinitely differentiable functions, namely the trigonometric functions sin x and cos x.

The reason to consider sin x and cos x is that they are representative of periodic functions. Recall that a function f : R → R is said to be periodic if there is a positive number p such that
\[ f(x + p) = f(x) \quad \text{for all } x \in \mathbb{R}. \]
Such a number p is called a period of the function f. If p is a period of f, then for any positive integer n, np is also a period of f. The functions sin x and cos x are periodic functions of period 2π. If f : R → R is a periodic function of period p = 2L, then the function g : R → R defined as
\[ g(x) = f\left( \frac{Lx}{\pi} \right) \]
is periodic of period 2π, since $g(x + 2\pi) = f\left( \frac{Lx}{\pi} + 2L \right) = g(x)$. Hence, we can concentrate on functions that are periodic of period 2π.

The celebrated Euler formula
\[ e^{ix} = \cos x + i \sin x \]
connects the trigonometric functions sin x, cos x with the exponential function with imaginary arguments. Hence, in this chapter, we shift our paradigm and consider complex-valued functions f : D → C defined on a subset D of R. Since a complex number z = x + iy with real part x and imaginary part y can be identified with the point (x, y) in $\mathbb{R}^2$, such a function can be regarded as a function $f : D \to \mathbb{R}^2$, so that derivatives and integrals are defined componentwise. More precisely, given x ∈ D, we write f(x) = u(x) + iv(x), where u(x) and v(x) are respectively the real part and imaginary part of f(x). If $x_0$ is an interior point of D, we say that f is differentiable at $x_0$ if the limit
\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \]
exists. This holds if and only if both u : D → R and v : D → R are differentiable at $x_0$, and we have
\[ f'(x_0) = u'(x_0) + i v'(x_0). \]
Similarly, if [a, b] is a closed interval that is contained in D, we say that f is Riemann integrable over [a, b] if and only if both u and v are Riemann integrable over [a, b], and we have
\[ \int_a^b f(x)\,dx = \int_a^b u(x)\,dx + i \int_a^b v(x)\,dx. \]
If F : [a, b] → C is a continuously differentiable function, the fundamental theorem of calculus implies that
\[ F(b) - F(a) = \int_a^b F'(x)\,dx. \]
7.1 Orthogonal Systems of Functions and Fourier Series

In the following, let I = [a, b] be a compact interval in R unless otherwise specified. Denote by R(I, C) the set of all complex-valued functions f : I → C that are Riemann integrable. Given two functions f and g in R(I, C), their sum f + g is the function (f + g) : I → C, (f + g)(x) = f(x) + g(x). If α is a complex number, the scalar product of α with f is the function (αf) : I → C, where (αf)(x) = αf(x). With the addition and scalar multiplication thus defined, R(I, C) is a complex vector space. From the theory of integration, we know that the set of complex-valued continuous functions on I, denoted by C(I, C), is a subspace of R(I, C).
If f : I → C is Riemann integrable, so is its complex conjugate $\overline{f} : I \to \mathbb{C}$ defined as $\overline{f}(x) = \overline{f(x)}$. In volume I, we have proved that if two real-valued functions f : I → R and g : I → R are Riemann integrable, so is their product (fg) : I → R. Using this, it is easy to check that if f : I → C and g : I → C are Riemann integrable complex-valued functions, (fg) : I → C is also Riemann integrable.

Proposition 7.1
Given f and g in R(I, C), define
\[ \langle f, g \rangle = \int_a^b f(x)\,\overline{g(x)}\,dx. \]
For any f, g, h in R(I, C), and any complex numbers α and β, we have the following.
(a) $\langle g, f \rangle = \overline{\langle f, g \rangle}$.
(b) $\langle \alpha f + \beta g, h \rangle = \alpha \langle f, h \rangle + \beta \langle g, h \rangle$.
(c) $\langle f, f \rangle \ge 0$.

We call ⟨ · , · ⟩ a positive semi-definite inner product on R(I, C). It follows from (a) and (b) that
\[ \langle f, \alpha g + \beta h \rangle = \overline{\alpha} \langle f, g \rangle + \overline{\beta} \langle f, h \rangle. \]
More generally, we have
\[ \left\langle \sum_{i=1}^{m} \alpha_i f_i, \sum_{j=1}^{n} \beta_j g_j \right\rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} \alpha_i \overline{\beta_j} \langle f_i, g_j \rangle. \]
If f(x) = u(x) + iv(x), where u(x) = Re f(x) and v(x) = Im f(x), then
\[ \langle f, f \rangle = \int_a^b \left( u(x)^2 + v(x)^2 \right) dx. \tag{7.1} \]
Notice that ⟨f, f⟩ = 0 does not imply that f = 0. For example, take any c in [a, b], and define the function f : I → C by
\[ f(x) = \begin{cases} 1, & \text{if } x = c, \\ 0, & \text{otherwise}. \end{cases} \]
Then
\[ \langle f, f \rangle = \int_a^b f(x)\,\overline{f(x)}\,dx = 0 \]
even though f is not the zero function. This is why we call ⟨ · , · ⟩ a positive semi-definite inner product. Restricted to the subspace of continuous functions C(I, C), ⟨ · , · ⟩ is a positive definite inner product, or simply an inner product in the usual sense.

Using the positive semi-definite inner product, we can define a semi-norm on R(I, C).

Definition 7.1 The L² Semi-Norm
Given f : I → C in R(I, C), the semi-norm of f is defined as
\[ \|f\| = \sqrt{\langle f, f \rangle} = \sqrt{\int_a^b |f(x)|^2\,dx}. \]

It has the following properties.

Proposition 7.2
Given f in R(I, C) and α ∈ C, we have the following.
(a) ∥f∥ ≥ 0.
(b) ∥αf∥ = |α|∥f∥.

The Cauchy-Schwarz inequality still holds for the positive semi-definite inner product on R(I, C).
Proposition 7.3 Cauchy-Schwarz Inequality
Given f and g in R(I, C),
\[ |\langle f, g \rangle| \le \|f\| \|g\|. \]

The proof is exactly the same as for an inner product on a real vector space. An immediate consequence of the Cauchy-Schwarz inequality is the triangle inequality.

Proposition 7.4 Triangle Inequality
Let $f_1, \ldots, f_n$ be functions in R(I, C) and let $\alpha_1, \ldots, \alpha_n$ be complex numbers. We have
\[ \|\alpha_1 f_1 + \cdots + \alpha_n f_n\| \le |\alpha_1| \|f_1\| + \cdots + |\alpha_n| \|f_n\|. \]

The proof is also the same as for a real inner product. One considers the case n = 2 first and then proves the general case by induction on n.

We can define orthogonality on R(I, C) the same way as for a real inner product space.

Definition 7.2 Orthogonality
Given two functions f and g in R(I, C), we say that they are orthogonal if ⟨f, g⟩ = 0.

Example 7.1
Let I = [0, 2π]. For n ∈ Z, define $\phi_n : I \to \mathbb{C}$ by $\phi_n(x) = e^{inx}$. Show that if m and n are distinct integers, then $\phi_m$ and $\phi_n$ are orthogonal.

Solution
Notice that
\[ \langle \phi_m, \phi_n \rangle = \int_0^{2\pi} \phi_m(x)\,\overline{\phi_n(x)}\,dx = \int_0^{2\pi} e^{i(m-n)x}\,dx. \]
Since m ≠ n, and
\[ \frac{d}{dx} e^{i(m-n)x} = i(m-n)\, e^{i(m-n)x}, \]
the fundamental theorem of calculus implies that
\[ \langle \phi_m, \phi_n \rangle = \left[ \frac{e^{i(m-n)x}}{i(m-n)} \right]_0^{2\pi} = 0. \]
Hence, $\phi_m$ and $\phi_n$ are orthogonal.

Definition 7.3 Orthogonal System and Orthonormal System
Let $S = \{\phi_\alpha \mid \alpha \in J\}$ be a set of functions in R(I, C) indexed by the set J. We say that S is an orthogonal system of functions if
\[ \langle \phi_\alpha, \phi_\beta \rangle = 0 \quad \text{whenever } \alpha \neq \beta, \]
and
\[ \|\phi_\alpha\| \neq 0 \quad \text{for all } \alpha \in J. \]
We say that S is an orthonormal system of functions if it is an orthogonal system and $\|\phi_\alpha\| = 1$ for all α ∈ J.

Notice that in our definition of orthogonal system, we have an additional condition that each element in the set S cannot have zero norm. By definition, it is obvious that if S is an orthogonal system, then any subset of S is also an orthogonal system. The same holds for orthonormal systems.

Example 7.2
Let I = [0, 2π]. For n ∈ Z, define $\phi_n : I \to \mathbb{C}$ by $\phi_n(x) = e^{inx}$. Then
\[ \|\phi_n\|^2 = \int_0^{2\pi} e^{inx} e^{-inx}\,dx = 2\pi. \]
Example 7.1 implies that $S = \{\phi_n \mid n \in \mathbb{Z}\}$ is an orthogonal system.
If we let $\varphi_n : I \to \mathbb{C}$, n ∈ Z be the function
\[ \varphi_n(x) = \frac{\phi_n(x)}{\|\phi_n\|} = \frac{e^{inx}}{\sqrt{2\pi}}, \]
then $\widetilde{S} = \{\varphi_n \mid n \in \mathbb{Z}\}$ is an orthonormal system.

Using the semi-norm, we can define a relation ∼ on R(I, C) in the following way. We say that f ∼ g if and only if ∥f − g∥ = 0. It is easy to check that this is an equivalence relation. Reflexivity and symmetry are obvious. For transitivity, we note that if f ∼ g and g ∼ h, then ∥f − g∥ = 0 and ∥g − h∥ = 0. It follows from the triangle inequality that
\[ \|f - h\| \le \|f - g\| + \|g - h\| = 0. \]
This implies that ∥f − h∥ = 0, and thus f ∼ h. Hence, ∼ is an equivalence relation on R(I, C), which we call L²-equivalence.

Definition 7.4 L²-Equivalent Functions
Two Riemann integrable functions f : I → C and g : I → C are L²-equivalent if ∥f − g∥ = 0.

Example 7.3
Let I = [a, b], and let S be a finite subset of I. If f : I → C and g : I → C are two Riemann integrable functions and
\[ f(x) = g(x) \quad \text{for all } x \in [a, b] \setminus S, \]
then f and g are L²-equivalent.

Regarding R(I, C) as an additive group, the subset K(I, C) that contains all the functions in R(I, C) that have zero norm is a normal subgroup. They are the functions that are L²-equivalent to the zero function. Denote by
\[ \widehat{R}(I, \mathbb{C}) = R(I, \mathbb{C}) / K(I, \mathbb{C}) = R(I, \mathbb{C}) / \sim \]
the quotient group. Then each element of $\widehat{R}(I, \mathbb{C})$ is an L²-equivalence class of functions. If u is in K(I, C) and g is in R(I, C), then the Cauchy-Schwarz inequality implies that
\[ |\langle u, g \rangle| \le \|u\| \|g\| = 0. \]
Thus, ⟨u, g⟩ = 0. If f is L²-equivalent to $f_1$ and g is L²-equivalent to $g_1$, there exist u and v in K(I, C) such that $f_1 = f + u$ and $g_1 = g + v$. Therefore,
\[ \langle f_1, g_1 \rangle = \langle f + u, g + v \rangle = \langle f, g \rangle + \langle u, g \rangle + \langle f, v \rangle + \langle u, v \rangle = \langle f, g \rangle. \]
Hence, the positive semi-definite inner product ⟨ · , · ⟩ on R(I, C) induces an inner product on $\widehat{R}(I, \mathbb{C})$ by
\[ \langle [f], [g] \rangle = \langle f, g \rangle. \]
If $[f] \in \widehat{R}(I, \mathbb{C})$ is such that
\[ \langle [f], [f] \rangle = \langle f, f \rangle = 0, \]
then f is in K(I, C), and thus [f] = [0]. This says that the inner product ⟨ · , · ⟩ on $\widehat{R}(I, \mathbb{C})$ is positive definite. The additional condition we impose on a subset S of R(I, C) to be an orthogonal system just means that none of the elements in S is L²-equivalent to the zero function.

For an orthogonal system, we have the following from (7.1).

Theorem 7.5 Generalized Pythagoras Theorem
Let $S = \{\phi_k \mid 1 \le k \le n\}$ be an orthogonal system of functions in R(I, C). For any complex numbers $\alpha_1, \ldots, \alpha_n$,
\[ \|\alpha_1 \phi_1 + \cdots + \alpha_n \phi_n\|^2 = |\alpha_1|^2 \|\phi_1\|^2 + \cdots + |\alpha_n|^2 \|\phi_n\|^2. \]

The functions $\phi_n : [0, 2\pi] \to \mathbb{C}$, $\phi_n(x) = e^{inx}$, n ∈ Z are easy to deal with because $\frac{d}{dx} e^{ax} = a e^{ax}$ for any complex number a. The drawback is that they are complex-valued functions. Since
\[ e^{inx} = \cos nx + i \sin nx, \]
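if one wants to work with real-valued functions, one should consider the functions cos nx and sin nx.

Proposition 7.6
Let I = [0, 2π], and define the functions $C_n : I \to \mathbb{R}$, n ≥ 0, and $S_n : I \to \mathbb{R}$, n ≥ 1 by
\[ C_n(x) = \cos nx, \qquad S_n(x) = \sin nx. \]
Then $\mathcal{B} = \{C_n \mid n \ge 0\} \cup \{S_n \mid n \ge 1\}$ is an orthogonal system, and
\[ \|C_0\| = \sqrt{2\pi}, \qquad \|C_n\| = \|S_n\| = \sqrt{\pi} \quad \text{when } n \ge 1. \]

Proof
For n ∈ Z, let $\phi_n : [0, 2\pi] \to \mathbb{C}$ be the function $\phi_n(x) = e^{inx}$. Then $C_0 = \phi_0$, and when $n \in \mathbb{Z}^+$,
\[ C_n = \frac{\phi_n + \phi_{-n}}{2}, \qquad S_n = \frac{\phi_n - \phi_{-n}}{2i}. \]
Since $\{\phi_n \mid n \in \mathbb{Z}\}$ is an orthogonal system, we find that for $n \in \mathbb{Z}^+$,
\[ \langle C_0, C_n \rangle = \frac{1}{2} \langle \phi_0, \phi_n \rangle + \frac{1}{2} \langle \phi_0, \phi_{-n} \rangle = 0, \]
\[ \langle C_0, S_n \rangle = \frac{i}{2} \langle \phi_0, \phi_n \rangle - \frac{i}{2} \langle \phi_0, \phi_{-n} \rangle = 0. \]
For $m, n \in \mathbb{Z}^+$ such that m ≠ n,
\[ \langle C_m, C_n \rangle = \frac{1}{4} \left( \langle \phi_m, \phi_n \rangle + \langle \phi_m, \phi_{-n} \rangle + \langle \phi_{-m}, \phi_n \rangle + \langle \phi_{-m}, \phi_{-n} \rangle \right) = 0, \]
\[ \langle S_m, S_n \rangle = \frac{1}{4} \left( \langle \phi_m, \phi_n \rangle - \langle \phi_m, \phi_{-n} \rangle - \langle \phi_{-m}, \phi_n \rangle + \langle \phi_{-m}, \phi_{-n} \rangle \right) = 0. \]
For $m, n \in \mathbb{Z}^+$, considering the cases m = n and m ≠ n separately, we find that
\[ \langle S_m, C_n \rangle = \frac{1}{4i} \left( \langle \phi_m, \phi_n \rangle + \langle \phi_m, \phi_{-n} \rangle - \langle \phi_{-m}, \phi_n \rangle - \langle \phi_{-m}, \phi_{-n} \rangle \right) = 0. \]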
These show that $\mathcal{B}$ is an orthogonal system. For $n \in \mathbb{Z}^+$, since $\|\phi_n\| = \|\phi_{-n}\| = \sqrt{2\pi}$, and $\phi_n$ and $\phi_{-n}$ are orthogonal, we have
\[ \|C_n\|^2 = \left\| \frac{1}{2} \phi_n + \frac{1}{2} \phi_{-n} \right\|^2 = \frac{1}{4} \|\phi_n\|^2 + \frac{1}{4} \|\phi_{-n}\|^2 = \pi, \]
\[ \|S_n\|^2 = \left\| \frac{1}{2i} \phi_n - \frac{1}{2i} \phi_{-n} \right\|^2 = \frac{1}{4} \|\phi_n\|^2 + \frac{1}{4} \|\phi_{-n}\|^2 = \pi. \]
These complete the proof.

Given a finite subset $S = \{f_1, \ldots, f_n\}$ of R(I, C), let
\[ W_S = \operatorname{span} S = \{ c_1 f_1 + \cdots + c_n f_n \mid c_1, \ldots, c_n \in \mathbb{C} \} \]
be the subspace of R(I, C) spanned by S. We say that an element g of R(I, C) is orthogonal to $W_S$ if it is orthogonal to each $f \in W_S$. This holds if and only if g is orthogonal to $f_k$ for all 1 ≤ k ≤ n. The projection theorem says the following.

Theorem 7.7 Projection Theorem
Let $S = \{\phi_1, \ldots, \phi_n\}$ be an orthogonal system of functions in R(I, C), and let $W_S$ be the subspace of R(I, C) spanned by S. Given f in R(I, C), there is a unique $g \in W_S$ such that f − g is orthogonal to $W_S$. It is called the projection of the function f onto the subspace $W_S$, denoted by $\operatorname{proj}_{W_S} f$, and it is given by
\[ \operatorname{proj}_{W_S} f = \sum_{k=1}^{n} \frac{\langle f, \phi_k \rangle}{\langle \phi_k, \phi_k \rangle} \phi_k = \frac{\langle f, \phi_1 \rangle}{\langle \phi_1, \phi_1 \rangle} \phi_1 + \cdots + \frac{\langle f, \phi_n \rangle}{\langle \phi_n, \phi_n \rangle} \phi_n. \]
For any $h \in W_S$,
\[ \|f - h\| \ge \|f - \operatorname{proj}_{W_S} f\|. \]

Proof
Assume that g is a function in $W_S$ such that f − g is orthogonal to $W_S$. Then there exist complex numbers $\alpha_1, \ldots, \alpha_n$ such that
\[ g = \alpha_1 \phi_1 + \cdots + \alpha_n \phi_n. \]
Since $\langle \phi_k, \phi_l \rangle = 0$ if k ≠ l, we find that
\[ \langle g, \phi_k \rangle = \alpha_k \langle \phi_k, \phi_k \rangle \quad \text{for } 1 \le k \le n. \]
Since f − g is orthogonal to $W_S$, $\langle f - g, \phi_k \rangle = 0$ for all 1 ≤ k ≤ n. This gives $\langle f, \phi_k \rangle = \langle g, \phi_k \rangle$, and thus
\[ \alpha_k \langle \phi_k, \phi_k \rangle = \langle f, \phi_k \rangle \quad \text{for } 1 \le k \le n. \]
Hence, we must have
\[ \alpha_k = \frac{\langle f, \phi_k \rangle}{\langle \phi_k, \phi_k \rangle}. \]
This implies the uniqueness of g if it exists. It is easy to check that the function
\[ g = \sum_{k=1}^{n} \frac{\langle f, \phi_k \rangle}{\langle \phi_k, \phi_k \rangle} \phi_k \]
is indeed a function in $W_S$ such that f − g is orthogonal to $W_S$. Finally, for any h in $W_S$, g − h is also in $W_S$. Hence, g − h is orthogonal to f − g. By the generalized Pythagoras theorem,
\[ \|f - h\|^2 = \|(f - g) + (g - h)\|^2 = \|f - g\|^2 + \|g - h\|^2 \ge \|f - g\|^2. \]
This proves that
\[ \|f - h\| \ge \|f - g\| \quad \text{for all } h \in W_S. \]
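The projection theorem can be illustrated numerically. The sketch below approximates the inner product on [−π, π] by the midpoint rule, projects f(x) = x onto the span of e^{ikx} for |k| ≤ 3, and checks that the residual f − g is orthogonal to each e^{ikx} and that perturbing g within the subspace does not decrease the distance to f. The names `inner`, `phi`, `project`, `norm`, the node count `N` and the choice of f are all assumptions made for this illustration.

```python
import cmath
import math

N = 4000  # number of midpoint nodes; an arbitrary choice for this sketch
H = 2 * math.pi / N
XS = [-math.pi + (j + 0.5) * H for j in range(N)]

def inner(f, g):
    """Midpoint approximation of <f, g> = integral of f * conj(g) over [-pi, pi]."""
    return H * sum(f(x) * complex(g(x)).conjugate() for x in XS)

def norm(f):
    return math.sqrt(inner(f, f).real)

def phi(k):
    return lambda x: cmath.exp(1j * k * x)

def project(f, n):
    """Projection of f onto span{e^{ikx} : -n <= k <= n}, as in the projection theorem."""
    coeffs = {k: inner(f, phi(k)) / inner(phi(k), phi(k)) for k in range(-n, n + 1)}
    g = lambda x: sum(c * cmath.exp(1j * k * x) for k, c in coeffs.items())
    return coeffs, g

f = lambda x: x
coeffs, g = project(f, 3)

# The residual f - g should be (numerically) orthogonal to every basis function.
residual = lambda x: f(x) - g(x)
max_defect = max(abs(inner(residual, phi(k))) for k in range(-3, 4))
print("largest |<f - g, phi_k>|:", max_defect)

# Any other element h of the subspace is at least as far from f as g is.
h = lambda x: g(x) + 0.1 * cmath.exp(1j * x)
print("||f - g|| =", norm(residual), " ||f - h|| =", norm(lambda x: f(x) - h(x)))
```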
Now we restrict our consideration to functions f that are periodic of period 2π. In this case, the function is uniquely determined by its values on an interval [a, b] of length 2π. We often take I = [0, 2π] or I = [−π, π]. Notice that if f : R → C is a function of period 2π, then for any α ∈ R,
\[ \int_{\alpha}^{\alpha+2\pi} f(x)\,dx = \int_0^{2\pi} f(x)\,dx = \int_{-\pi}^{\pi} f(x)\,dx. \]
Any function f : [α, α + 2π] → C defined on an interval of length 2π can be extended to be a 2π-periodic function.

Definition 7.5 Extension of Functions
Let I = [α, α + 2π] be an interval of length 2π, and let f : I → C be a function defined on I. We can extend f to be a 2π-periodic function $\widetilde{f} : \mathbb{R} \to \mathbb{C}$ in the following way.
(i) For x ∈ (α, α + 2π), define
\[ \widetilde{f}(x + 2n\pi) = f(x) \quad \text{for all } n \in \mathbb{Z}. \]
(ii) For x = α, define
\[ \widetilde{f}(\alpha + 2n\pi) = \frac{f(\alpha) + f(\alpha + 2\pi)}{2} \quad \text{for all } n \in \mathbb{Z}. \]

Examples are shown in Figures 7.1 and 7.2.

Figure 7.1: Extending a function f : [−π, π] → R periodically.
Figure 7.2: Extending a function f : [−π, π] → R periodically.

Now let us define Fourier series. Example 7.2 asserts that the set
\[ S = \{\phi_n \mid n \in \mathbb{Z}\}, \quad \text{where } \phi_n(x) = e^{inx}, \]
is an orthogonal system of functions in R(I, C), where I = [−π, π]. For n ≥ 0, let $W_n$ be the subspace of R(I, C) spanned by
\[ S_n = \left\{ e^{ikx} \mid -n \le k \le n \right\}. \]
It is a vector space of dimension 2n + 1 with basis $S_n$. Moreover,
\[ W_0 \subset W_1 \subset W_2 \subset \cdots. \]
A real basis of $W_n$ is given by
\[ B_n = \{ \sin kx \mid k = 1, \ldots, n \} \cup \{ \cos kx \mid k = 0, 1, \ldots, n \}. \]
Given f ∈ R(I, C), let $s_n = \operatorname{proj}_{W_n} f$ be the projection of f onto $W_n$. The projection theorem says that
\[ s_n(x) = \operatorname{proj}_{W_n} f(x) = \sum_{k=-n}^{n} c_k e^{ikx}, \]
where
\[ c_k = \frac{\langle f, \phi_k \rangle}{\langle \phi_k, \phi_k \rangle} = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx. \]
By Proposition 7.6 and the projection theorem, $s_n(x)$ can also be written as
\[ s_n(x) = \operatorname{proj}_{W_n} f(x) = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx), \]
where
\[ a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\,dx \quad \text{for } 0 \le k \le n, \]
\[ b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx\,dx \quad \text{for } 1 \le k \le n. \]
By definition, we find that
\[ c_0 = \frac{a_0}{2}, \]
and when k ≥ 1,
\[ c_{-k} = \frac{a_k + i b_k}{2}, \qquad c_k = \frac{a_k - i b_k}{2}. \]
If f is a real-valued function, $a_k$ and $b_k$ are real and $c_{-k} = \overline{c_k}$.

Definition 7.6 Trigonometric Series
A trigonometric series is a series of the form
\[ \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx). \]
Since a trigonometric series can be expressed in the form $\sum_{k=-\infty}^{\infty} c_k e^{ikx}$, we also call a series of the form $\sum_{k=-\infty}^{\infty} c_k e^{ikx}$ a trigonometric series. The Fourier series of a function is a trigonometric series.

Definition 7.7 Fourier Series and its nth Partial Sums
Let I = [−π, π]. The Fourier series of a function f in R(I, C) is the infinite series
\[ \sum_{k=-\infty}^{\infty} c_k e^{ikx} \quad \text{or} \quad \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx), \]
where
\[ c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx, \quad k \in \mathbb{Z}, \]
\[ a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\,dx, \quad k \ge 0, \]
\[ b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx\,dx, \quad k \ge 1. \]
The nth partial sum of the Fourier series is
\[ s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx} = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx). \]
It is the projection of f onto the subspace of R(I, C) spanned by $S_n = \left\{ e^{ikx} \mid -n \le k \le n \right\}$.

Remark 7.1
If f : R → C is a 2π-periodic function which is Riemann integrable over a closed interval of length 2π, the Fourier series of f is the Fourier series of f : [−π, π] → C.
Remark 7.2
If I = [−L, L], the Fourier series of a function f ∈ R(I, C) is the series
\[ \sum_{k=-\infty}^{\infty} c_k \exp\left( \frac{i\pi k x}{L} \right), \]
where
\[ c_k = \frac{1}{2L} \int_{-L}^{L} f(x) \exp\left( -\frac{i\pi k x}{L} \right) dx. \]
Henceforth, we only consider the case where I is a closed interval of length 2π. Let us look at some examples.

Example 7.4
Find the Fourier series of the function f : [−π, π] → R defined as f(x) = x.

Solution
When k = 0,
\[ c_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} x\,dx = 0. \]
When k ≠ 0,
\[ c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} x e^{-ikx}\,dx = \frac{1}{2\pi} \left\{ \left[ -\frac{1}{ik} x e^{-ikx} \right]_{-\pi}^{\pi} + \frac{1}{ik} \int_{-\pi}^{\pi} e^{-ikx}\,dx \right\} = \frac{(-1)^{k-1}}{ik}. \]
Therefore, the Fourier series of f is
\[ \sum_{k=1}^{\infty} (-1)^{k-1}\, \frac{e^{ikx} - e^{-ikx}}{ik} = \sum_{k=1}^{\infty} 2(-1)^{k-1}\, \frac{\sin kx}{k}. \]
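The coefficients found in Example 7.4 can be compared with a direct numerical evaluation of the defining integrals. The sketch below uses the midpoint rule on [−π, π]; the helper name `fourier_ab` and the number of nodes are illustrative assumptions.

```python
import math

def fourier_ab(f, k, n=20_000):
    """Midpoint-rule approximation of the Fourier coefficients a_k and b_k of f on [-pi, pi]."""
    h = 2 * math.pi / n
    a = b = 0.0
    for j in range(n):
        x = -math.pi + (j + 0.5) * h
        a += f(x) * math.cos(k * x)
        b += f(x) * math.sin(k * x)
    return a * h / math.pi, b * h / math.pi

f = lambda x: x
for k in range(1, 6):
    a_k, b_k = fourier_ab(f, k)
    expected = 2 * (-1) ** (k - 1) / k  # value of b_k derived in Example 7.4
    print(f"k = {k}: a_k = {a_k:+.6f}, b_k = {b_k:+.6f}, expected b_k = {expected:+.6f}")
```

Since f is odd, the computed a_k should vanish up to rounding, in line with the sine-only series obtained above.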
Figure 7.3: The function f(x) = x, −π < x < π and $s_n(x)$, 1 ≤ n ≤ 5.

Remark 7.3
Let I = [−π, π]. Given f in R(I, C), we call each
\[ c_k = \frac{1}{2\pi} \int_I f(x) e^{-ikx}\,dx, \quad k \in \mathbb{Z}, \]
a Fourier coefficient of f. The mapping $F_k$ from R(I, C) to C which takes a function f to $c_k$ is a linear transformation between vector spaces. When f : I → R is a real-valued function, we usually prefer to work with the Fourier coefficients $a_k$ and $b_k$. One can show that if f : [−π, π] → R is an odd function, then $a_k = 0$ for all k ≥ 0, so that the Fourier series of f only has sine terms. If f : [−π, π] → R is an even function, then $b_k = 0$ for all k ≥ 1, so that the Fourier series of f only has the constant and the cosine terms.

Remark 7.4
If f : I → C is a function of the form
\[ f(x) = \sum_{k \in J} c_k e^{ikx}, \]
where J is a finite subset of integers, then the Fourier series of f is equal to itself.
Example 7.5
The Fourier series of a constant function f : I → C, f(x) = c is just c itself.

Example 7.6
Let f : [0, 2π] → R be the function defined as f(x) = x(2π − x). Find its Fourier series, and express it in terms of trigonometric functions.

Figure 7.4: The function f : [0, 2π] → R, f(x) = x(2π − x) and its extension.

Solution
When we extend f periodically to the function $\widetilde{f} : \mathbb{R} \to \mathbb{R}$, we find that when x ∈ [0, 2π],
\[ \widetilde{f}(-x) = f(-x + 2\pi) = (2\pi - x)x = f(x) = \widetilde{f}(x). \]
Hence, $\widetilde{f}(x)$ is an even function. This implies that the Fourier series of f : [0, 2π] → R only has cosine terms.
\[ a_0 = \frac{1}{\pi} \int_0^{2\pi} f(x)\,dx = \frac{2}{\pi} \int_0^{\pi} (2\pi x - x^2)\,dx = \frac{2}{\pi} \left[ \pi x^2 - \frac{x^3}{3} \right]_0^{\pi} = \frac{4}{3}\pi^2. \]
For k ≥ 1,
\[ a_k = \frac{2}{\pi} \int_0^{\pi} f(x) \cos kx\,dx = \frac{2}{\pi} \int_0^{\pi} (2\pi x - x^2) \cos kx\,dx \]
\[ = \frac{2}{\pi} \left\{ \left[ \frac{(2\pi x - x^2) \sin kx}{k} \right]_0^{\pi} - \frac{1}{k} \int_0^{\pi} (2\pi - 2x) \sin kx\,dx \right\} \]
\[ = -\frac{2}{\pi k} \left\{ \left[ -\frac{(2\pi - 2x) \cos kx}{k} \right]_0^{\pi} - \frac{2}{k} \int_0^{\pi} \cos kx\,dx \right\} \]
\[ = -\frac{4}{k^2} + \frac{4}{\pi k^3} \left[ \sin kx \right]_0^{\pi} = -\frac{4}{k^2}. \]
Therefore, the Fourier series of f is
\[ \frac{2}{3}\pi^2 - 4 \sum_{k=1}^{\infty} \frac{\cos kx}{k^2}. \]
Example 7.7
Let a and b be two numbers satisfying −π ≤ a < b ≤ π, and let g : [−π, π] → R be the function defined as
\[ g(x) = \begin{cases} 1, & \text{if } a \le x \le b, \\ 0, & \text{otherwise}. \end{cases} \]
Find the Fourier series of g in exponential form.

Figure 7.5: The function g : [0, 2π] → R defined in Example 7.7 and its extension.

Solution
Since g is piecewise continuous, it is Riemann integrable.
\[ c_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} g(x)\,dx = \frac{1}{2\pi} \int_a^b dx = \frac{b-a}{2\pi}. \]
For k ≥ 1,
\[ c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} g(x) e^{-ikx}\,dx = \frac{1}{2\pi} \int_a^b e^{-ikx}\,dx = \frac{e^{-ikb} - e^{-ika}}{-2\pi i k}, \]
\[ c_{-k} = \overline{c_k} = \frac{e^{ikb} - e^{ika}}{2\pi i k}. \]
Therefore, the Fourier series of g is
\[ \frac{b-a}{2\pi} + \frac{i}{2\pi} \sum_{k=1}^{\infty} \frac{(e^{-ikb} - e^{-ika}) e^{ikx} - (e^{ikb} - e^{ika}) e^{-ikx}}{k}. \]
Example 7.8
Find the Fourier series of the function f : [−π, π] → R, f(x) = x sin x.

Solution
Since f is a real-valued even function, we only need to compute the Fourier coefficients
\[ a_k(f) = \frac{1}{\pi} \int_{-\pi}^{\pi} x \sin x \cos kx\,dx \quad \text{when } k \ge 0. \]
Let g : [−π, π] → R be the function g(x) = x. We have seen in Example 7.4 that the Fourier series of g is given by
\[ G(x) = \sum_{k=1}^{\infty} 2(-1)^{k-1}\, \frac{\sin kx}{k}. \]
Therefore, for k ≥ 1,
\[ b_k(g) = \frac{1}{\pi} \int_{-\pi}^{\pi} x \sin kx\,dx = \frac{2(-1)^{k-1}}{k}. \]
From this, we find that
\[ a_0(f) = b_1(g) = 2, \]
\[ a_1(f) = \frac{1}{2\pi} \int_{-\pi}^{\pi} x \sin 2x\,dx = \frac{1}{2} b_2(g) = -\frac{1}{2}; \]
and when k ≥ 2,
\[ a_k(f) = \frac{1}{2\pi} \int_{-\pi}^{\pi} x \left( \sin(k+1)x - \sin(k-1)x \right) dx = \frac{1}{2} \left( b_{k+1}(g) - b_{k-1}(g) \right) \]
\[ = (-1)^k \left( \frac{1}{k+1} - \frac{1}{k-1} \right) = \frac{2(-1)^{k-1}}{k^2 - 1}. \]
Hence, the Fourier series of the function f : [−π, π] → R, f(x) = x sin x is
\[ 1 - \frac{1}{2} \cos x + \sum_{k=2}^{\infty} \frac{2(-1)^{k-1}}{k^2 - 1} \cos kx. \]
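The product-to-sum step in Example 7.8 can be double-checked by comparing the closed form for a_k(f) with a numerical evaluation of the defining integral. In this sketch, `a_coeff` is a hypothetical midpoint-rule helper and `a_closed` encodes the values derived in the example.

```python
import math

def a_coeff(f, k, n=20_000):
    """Midpoint-rule approximation of a_k = (1/pi) * integral of f(x) cos(kx) over [-pi, pi]."""
    h = 2 * math.pi / n
    total = sum(f(-math.pi + (j + 0.5) * h) * math.cos(k * (-math.pi + (j + 0.5) * h))
                for j in range(n))
    return total * h / math.pi

def a_closed(k):
    """Coefficients of x sin x as derived in Example 7.8."""
    if k == 0:
        return 2.0
    if k == 1:
        return -0.5
    return 2 * (-1) ** (k - 1) / (k * k - 1)

f = lambda x: x * math.sin(x)
for k in range(0, 6):
    print(f"k = {k}: numerical {a_coeff(f, k):+.6f}, closed form {a_closed(k):+.6f}")
```

Note that the constant term of the series is a_0/2 = 1, which matches the leading term in the example.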
Figure 7.6: The function f : [−π, π] → R, f(x) = x sin x and its periodic extension.

At the end of this section, let us make an additional remark.

Remark 7.5 Semi-Norms
A semi-norm on a complex vector space V is a function ∥ · ∥ : V → R which defines the norm ∥v∥ for each v in V such that the following hold.
(a) For any v ∈ V, ∥v∥ ≥ 0.
(b) For any α ∈ C, and any v ∈ V, ∥αv∥ = |α|∥v∥.
(c) For any u and v in V, ∥u + v∥ ≤ ∥u∥ + ∥v∥.
If in addition, we have
(d) ∥v∥ = 0 if and only if v = 0,
then ∥ · ∥ is called a norm on the vector space V.

Proposition 7.2 and Proposition 7.4 justify that the L²-norm
\[ \|f\|_2 = \sqrt{\int_I |f(x)|^2\,dx} \]
is indeed a semi-norm on the vector space R(I, C). There are other semi-norms on R(I, C). One of them which will also be useful later is the L¹-norm defined as
\[ \|f\|_1 = \int_I |f(x)|\,dx. \]
The fact that this is a semi-norm is quite easy to establish.
Exercises 7.1

Question 1
Let f : [−π, π] → R be a real-valued Riemann integrable function.
(a) If f : [−π, π] → R is an odd function, show that the Fourier series of f has the form
\[ \sum_{k=1}^{\infty} b_k \sin kx, \quad \text{where } b_k = \frac{2}{\pi} \int_0^{\pi} f(x) \sin kx\,dx. \]
(b) If f : [−π, π] → R is an even function, show that the Fourier series of f has the form
\[ \frac{a_0}{2} + \sum_{k=1}^{\infty} a_k \cos kx, \quad \text{where } a_k = \frac{2}{\pi} \int_0^{\pi} f(x) \cos kx\,dx. \]

Question 2
Find the Fourier series of the function f : [−π, π] → R, f(x) = |x|, and express it in terms of trigonometric functions.

Question 3
Find the Fourier series of the function f : [−π, π] → R, f(x) = x², and express it in terms of trigonometric functions.

Question 4
Find the Fourier series of the function f : [0, 2π] → R, f(x) = x², and express it in terms of trigonometric functions.

Question 5
Find the Fourier series of the function f : [−π, π] → R, f(x) = sin 2x, and express it in terms of trigonometric functions.

Question 6
Find the Fourier series of the function f : [−π, π] → R,
\[ f(x) = \begin{cases} 0, & \text{if } -\pi \le x < 0, \\ \sin x, & \text{if } 0 \le x \le \pi, \end{cases} \]
and express it in terms of trigonometric functions.

Question 7
Find the Fourier series of the function f : [−π, π] → R, f(x) = x cos x from the Fourier series of the function g : [−π, π] → R, g(x) = x.

Question 8
Let $x_0$ be a point in the interval [a, b], and let f : [a, b] → C and g : [a, b] → C be L²-equivalent Riemann integrable functions. Assume that both f and g are continuous at the point $x_0$. Show that $f(x_0) = g(x_0)$.
7.2 The Pointwise Convergence of a Fourier Series

Let I = [−π, π] and let R(I, C) be the vector space that consists of all Riemann integrable functions f : I → C that are defined on I. Each of these functions can be extended to a periodic function $\widetilde{f} : \mathbb{R} \to \mathbb{C}$ so that $\widetilde{f}(x) = f(x)$ for all x in the interior of I. Given f ∈ R(I, C), we define the Fourier series of f as the infinite series
\[ \sum_{k=-\infty}^{\infty} c_k e^{ikx} = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx), \]
where
\[ c_k = \frac{1}{2\pi} \int_I f(x) e^{-ikx}\,dx, \quad k \in \mathbb{Z}, \]
\[ a_k = \frac{1}{\pi} \int_I f(x) \cos kx\,dx, \quad k \ge 0, \]
\[ b_k = \frac{1}{\pi} \int_I f(x) \sin kx\,dx, \quad k \ge 1. \]
The problem of interest to us is the convergence of the Fourier series. Given n ≥ 0, the nth partial sum of the Fourier series of f is
\[ s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx}. \]
We say that the Fourier series converges pointwise if the sequence of partial sum functions $\{s_n : I \to \mathbb{C}\}$ converges pointwise.

Let us first give an integral expression for the partial sums $s_n(x)$ of the Fourier series. By definition,
\[ s_n(x) = \frac{1}{2\pi} \sum_{k=-n}^{n} \left( \int_{-\pi}^{\pi} f(t) e^{-ikt}\,dt \right) e^{ikx} = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \sum_{k=-n}^{n} e^{ik(x-t)}\,dt. \]
If x ∈ 2πZ, $e^{ikx} = 1$ for all −n ≤ k ≤ n. Therefore,
\[ \sum_{k=-n}^{n} e^{ikx} = 2n + 1. \]
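If $x \notin 2\pi\mathbb{Z}$, then $e^{ix} \ne 1$. Using the sum formula for a geometric sequence, we have
\[ \sum_{k=-n}^{n} e^{ikx} = e^{-inx} \left( 1 + e^{ix} + \cdots + e^{2inx} \right) = \frac{e^{-i\left(n+\frac{1}{2}\right)x}}{e^{-\frac{ix}{2}}} \times \frac{e^{i(2n+1)x} - 1}{e^{ix} - 1} = \frac{e^{i\left(n+\frac{1}{2}\right)x} - e^{-i\left(n+\frac{1}{2}\right)x}}{e^{\frac{ix}{2}} - e^{-\frac{ix}{2}}} = \frac{\sin\left(n + \frac{1}{2}\right)x}{\sin\frac{x}{2}}. \]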
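Definition 7.8 Dirichlet Kernel
Given a nonnegative integer $n$, the Dirichlet kernel $D_n : \mathbb{R} \to \mathbb{R}$ is
\[ D_n(x) = \sum_{k=-n}^{n} e^{ikx} = \begin{cases} 2n + 1, & \text{if } x \in 2\pi\mathbb{Z}, \\[4pt] \dfrac{\sin\left(n + \frac{1}{2}\right)x}{\sin\frac{x}{2}}, & \text{otherwise}. \end{cases} \]

Our derivation above gives the following.

Proposition 7.8
Let $I = [-\pi, \pi]$, and let $f : I \to \mathbb{C}$ be a Riemann integrable function. The $n$th partial sum $s_n(x)$ of the Fourier series of $f$ has an integral representation given by
\[ s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) D_n(x - t)\,dt, \]
where $D_n : \mathbb{R} \to \mathbb{R}$ is the Dirichlet kernel given by
\[ D_n(x) = \begin{cases} 2n + 1, & \text{if } x \in 2\pi\mathbb{Z}, \\[4pt] \dfrac{\sin\left(n + \frac{1}{2}\right)x}{\sin\frac{x}{2}}, & \text{otherwise}. \end{cases} \]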
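As a quick numerical sanity check of the closed form of the Dirichlet kernel, the following sketch compares the defining sum of exponentials with the quotient of sines at a few sample points. The function names and the tolerance used to detect multiples of $2\pi$ are illustrative choices, not part of the text.

```python
import cmath
import math

def dirichlet_sum(n, x):
    """D_n(x) computed directly as the sum of e^{ikx} for k = -n, ..., n."""
    return sum(cmath.exp(1j * k * x) for k in range(-n, n + 1)).real

def dirichlet_closed(n, x):
    """D_n(x) from the closed form: 2n + 1 on 2*pi*Z, a quotient of sines elsewhere."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:  # x is (numerically) a multiple of 2*pi
        return 2 * n + 1
    return math.sin((n + 0.5) * x) / s

# Compare the two expressions at a few sample points.
for n in (1, 3, 5):
    for x in (0.0, 0.7, -2.0, math.pi):
        print(n, x, dirichlet_sum(n, x), dirichlet_closed(n, x))
```

The two columns should agree up to floating-point rounding, including at $x = 0$ where the closed form is defined separately.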
Figure 7.7: The Dirichlet kernels $D_n : [-\pi, \pi] \to \mathbb{R}$ for $1 \le n \le 5$.

Remark 7.6
By definition, the Dirichlet kernel $D_n(x)$ is equal to
\[ D_n(x) = \sum_{k=-n}^{n} e^{ikx}. \]
From this, one can see that $D_n(x)$ is an infinitely differentiable $2\pi$-periodic function, and it is an even function.

Recall that $g : [a, b] \to \mathbb{C}$ is a step function if there is a partition $P = \{x_0, x_1, \ldots, x_l\}$ of $[a, b]$ such that for each $1 \le j \le l$, the restriction of $g$ to $(x_{j-1}, x_j)$ is a constant function. It is easy to see that $g : [a, b] \to \mathbb{C}$ is a step function if and only if both its real and imaginary parts are step functions. The following theorem asserts that a Riemann integrable function $f : [a, b] \to \mathbb{C}$ can be approximated in $L^1$ by step functions.

Theorem 7.9
Let $f : [a, b] \to \mathbb{C}$ be a Riemann integrable function. For every $\varepsilon > 0$, there is a step function $g : [a, b] \to \mathbb{C}$ such that
\[ \int_a^b |f(x) - g(x)|\,dx < \varepsilon. \]
Proof
Let $f(x) = u(x) + iv(x)$, where $u : [a, b] \to \mathbb{R}$ and $v : [a, b] \to \mathbb{R}$ are the real and imaginary parts of $f$. Assume that $u_1 : [a, b] \to \mathbb{R}$ and $v_1 : [a, b] \to \mathbb{R}$ are step functions such that
\[ \int_a^b |u(x) - u_1(x)|\,dx < \frac{\varepsilon}{2}, \qquad \int_a^b |v(x) - v_1(x)|\,dx < \frac{\varepsilon}{2}. \]
Let $g : [a, b] \to \mathbb{C}$ be the function $g = u_1 + iv_1$. Then $g$ is a step function. By the triangle inequality, we have
\[ \int_a^b |f(x) - g(x)|\,dx \le \int_a^b |u(x) - u_1(x)|\,dx + \int_a^b |v(x) - v_1(x)|\,dx < \varepsilon. \]
Therefore, it is sufficient to prove the theorem when $f$ is a real-valued function. Given $\varepsilon > 0$, since $f : [a, b] \to \mathbb{R}$ is Riemann integrable, there is a partition $P = \{x_0, x_1, \ldots, x_l\}$ of $[a, b]$ such that
\[ U(f, P) - L(f, P) < \varepsilon, \]
where
\[ U(f, P) = \sum_{j=1}^{l} M_j (x_j - x_{j-1}), \qquad M_j = \sup_{x_{j-1} \le x \le x_j} f(x), \]
and
\[ L(f, P) = \sum_{j=1}^{l} m_j (x_j - x_{j-1}), \qquad m_j = \inf_{x_{j-1} \le x \le x_j} f(x), \]
are respectively the Darboux upper sum and Darboux lower sum of $f$ with respect to the partition $P$. Define the function $g : [a, b] \to \mathbb{R}$ by
\[ g(x) = m_j \quad \text{when } x_{j-1} \le x < x_j, \]
and $g(b) = f(b)$. Then $g$ is a step function, and
\[ f(x) \ge g(x) \quad \text{for all } x \in [a, b]. \]
It follows that
\[ \int_a^b |f(x) - g(x)|\,dx = \int_a^b (f(x) - g(x))\,dx = \int_a^b f(x)\,dx - \sum_{j=1}^{l} \int_{x_{j-1}}^{x_j} g(x)\,dx \le U(f, P) - \sum_{j=1}^{l} m_j (x_j - x_{j-1}) = U(f, P) - L(f, P) < \varepsilon. \]
This completes the proof.
Figure 7.8: Approximating a Riemann integrable function by a step function.

An important tool in the proof of pointwise convergence of Fourier series is the Riemann-Lebesgue lemma. This lemma is important in its own right. Hence, we state it in the most general setting.

Theorem 7.10 The Riemann-Lebesgue Lemma
Let $I = [a, b]$. If $f : I \to \mathbb{C}$ is a Riemann integrable function, then
\[ \lim_{\beta \to \infty} \int_a^b f(x) e^{i\beta x}\,dx = 0. \]
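Proof
Given $\varepsilon > 0$, Theorem 7.9 says that there is a step function $g : [a, b] \to \mathbb{C}$ such that
\[ \int_a^b |f(x) - g(x)|\,dx < \frac{\varepsilon}{2}. \]
This implies that
\[ \left| \int_a^b (f(x) - g(x)) e^{i\beta x}\,dx \right| \le \int_a^b |f(x) - g(x)|\,dx < \frac{\varepsilon}{2}. \]
Let $P = \{x_0, x_1, \ldots, x_l\}$ be a partition of $[a, b]$ such that for $1 \le j \le l$, $g(x) = m_j$ for all $x$ in $(x_{j-1}, x_j)$. Let $M = \max\{|m_1|, \ldots, |m_l|\}$. Then for $\beta > 0$,
\[ \left| \int_{x_{j-1}}^{x_j} g(x) e^{i\beta x}\,dx \right| = \left| \frac{m_j}{i\beta} \left( e^{i\beta x_j} - e^{i\beta x_{j-1}} \right) \right| \le \frac{2M}{\beta}. \]
It follows that if $\beta > \dfrac{4Ml}{\varepsilon}$, then
\[ \left| \int_a^b g(x) e^{i\beta x}\,dx \right| \le \sum_{j=1}^{l} \left| \int_{x_{j-1}}^{x_j} g(x) e^{i\beta x}\,dx \right| \le \frac{2Ml}{\beta} < \frac{\varepsilon}{2}. \]
Therefore,
\[ \left| \int_a^b f(x) e^{i\beta x}\,dx \right| \le \left| \int_a^b (f(x) - g(x)) e^{i\beta x}\,dx \right| + \left| \int_a^b g(x) e^{i\beta x}\,dx \right| < \varepsilon. \]
This proves the assertion.

The same argument applies with $e^{-i\beta x}$ in place of $e^{i\beta x}$. Since
\[ \sin \beta x = \frac{e^{i\beta x} - e^{-i\beta x}}{2i}, \]
we obtain the following.

Corollary 7.11
Let $I = [a, b]$. If $f : I \to \mathbb{C}$ is a Riemann integrable function, then
\[ \lim_{\beta \to \infty} \int_a^b f(x) \sin \beta x\,dx = 0. \]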
Recall that a function $f : [a, b] \to \mathbb{C}$ is piecewise continuous if there is a partition $P = \{x_0, x_1, \ldots, x_l\}$ of $[a, b]$ such that for each $1 \le j \le l$, $f : (x_{j-1}, x_j) \to \mathbb{C}$ is a continuous function. It is piecewise differentiable if there is a partition $P = \{x_0, x_1, \ldots, x_l\}$ of $[a, b]$ such that for each $1 \le j \le l$, $f : (x_{j-1}, x_j) \to \mathbb{C}$ is a differentiable function. Obviously, if $f : [a, b] \to \mathbb{C}$ is piecewise differentiable, it is piecewise continuous. Piecewise continuity and piecewise differentiability do not impose any conditions at the partition points. In the following, we introduce a class of functions which satisfy stronger conditions at the partition points.

Definition 7.9 Strongly Piecewise Differentiable Functions
A function $f : [a, b] \to \mathbb{C}$ is strongly piecewise continuous if there is a partition $P = \{x_0, x_1, \ldots, x_l\}$ of $[a, b]$ such that for each $1 \le j \le l$, the limits
\[ f_+(x_{j-1}) = \lim_{x \to x_{j-1}^+} f(x) \quad \text{and} \quad f_-(x_j) = \lim_{x \to x_j^-} f(x) \]
exist, and the function $g_j : [x_{j-1}, x_j] \to \mathbb{C}$ defined as
\[ g_j(x) = \begin{cases} f_+(x_{j-1}), & \text{if } x = x_{j-1}, \\ f(x), & \text{if } x_{j-1} < x < x_j, \\ f_-(x_j), & \text{if } x = x_j, \end{cases} \]
is continuous. If $f$ is also differentiable on $(x_{j-1}, x_j)$, and the limits
\[ f_+'(x_{j-1}) = \lim_{h \to 0^+} \frac{f(x_{j-1} + h) - f_+(x_{j-1})}{h} \quad \text{and} \quad f_-'(x_j) = \lim_{h \to 0^+} \frac{f_-(x_j) - f(x_j - h)}{h} \]
exist, we say that $f : [a, b] \to \mathbb{C}$ is strongly piecewise differentiable.
Notice that a strongly piecewise differentiable function is strongly piecewise continuous and bounded. Therefore, it is Riemann integrable. We have abused notation above and denoted the limit
\[ \lim_{h \to 0^+} \frac{f(c + h) - f_+(c)}{h} \]
as $f_+'(c)$. Strictly speaking, $f_+'(c)$ is the right derivative of $f$ at $c$, which is defined as
\[ \lim_{h \to 0^+} \frac{f(c + h) - f(c)}{h}. \]
The two expressions are equivalent if $f(c) = f_+(c)$, meaning that $f$ is right continuous at $c$. In fact, for the limit
\[ \lim_{h \to 0^+} \frac{f(c + h) - f(c)}{h} \]
to exist, a necessary condition is that $f_+(c)$ exists and is equal to $f(c)$. However, here we do not require the function $f$ to be continuous at the partition points $x_j$, $0 \le j \le l$. We only require the function to have left and right limits at these points. Since $f(c) = f_+(c) = f_-(c)$ only when $f$ is continuous at $c$, we modify the definitions of $f_+'(c)$ and $f_-'(c)$ for functions that can have a discontinuity at the point $c$.

If $x$ is an interior point of $I$ and $f : I \to \mathbb{C}$ is differentiable at $x$, then
\[ \lim_{h \to 0^+} \frac{f(x + h) - f(x)}{h} = f_+'(x) = f'(x) = f_-'(x) = \lim_{h \to 0^+} \frac{f(x) - f(x - h)}{h}. \]
Thus, if the function $f : [a, b] \to \mathbb{C}$ is strongly piecewise differentiable, then for any $x \in (a, b)$,
\[ \lim_{h \to 0^+} \frac{f(x + h) + f(x - h) - f_+(x) - f_-(x)}{h} = f_+'(x) - f_-'(x). \tag{7.2} \]

Example 7.9
The function $f : [-\pi, \pi] \to \mathbb{R}$ defined as
\[ f(x) = \begin{cases} \pi - x, & \text{if } -\pi \le x < 0, \\ x^2, & \text{if } 0 \le x \le \pi, \end{cases} \]
is strongly piecewise differentiable.
Figure 7.9: The strongly piecewise differentiable function defined in Example 7.9.

Lemma 7.12
Let $f : [-\pi, \pi] \to \mathbb{C}$ be a strongly piecewise differentiable function, and let $\tilde{f} : \mathbb{R} \to \mathbb{C}$ be the $2\pi$-periodic extension of $f$. Given $x \in \mathbb{R}$, define the function $h : [0, \pi] \to \mathbb{C}$ by
\[ h(t) = \frac{\tilde{f}(x + t) + \tilde{f}(x - t) - \tilde{f}_+(x) - \tilde{f}_-(x)}{\sin\frac{t}{2}}, \quad t \in (0, \pi], \tag{7.3} \]
and $h(0)$ can be any value. Then $h : [0, \pi] \to \mathbb{C}$ is a piecewise continuous bounded function. Hence, $h : [0, \pi] \to \mathbb{C}$ is Riemann integrable.

Proof
When $t \in [0, \pi]$, $\sin\dfrac{t}{2} = 0$ only when $t = 0$. Notice that $\tilde{f}$ is a bounded piecewise continuous function. Hence, $h : [0, \pi] \to \mathbb{C}$ is piecewise continuous. For any positive number $r$ that is less than $\pi$, $h$ is bounded on $[r, \pi]$. To show that $h : [0, \pi] \to \mathbb{C}$ is bounded, it is sufficient to show that $h(t)$ has a limit when $t \to 0^+$. Now
\[ \lim_{t \to 0^+} h(t) = \lim_{t \to 0^+} \frac{\tilde{f}(x + t) + \tilde{f}(x - t) - \tilde{f}_+(x) - \tilde{f}_-(x)}{t} \cdot \lim_{t \to 0^+} \frac{t}{\sin\frac{t}{2}}. \]
Since
\[ \lim_{t \to 0^+} \frac{t}{\sin\frac{t}{2}} = 2, \]
eq. (7.2) gives
\[ \lim_{t \to 0^+} h(t) = 2\left( \tilde{f}_+'(x) - \tilde{f}_-'(x) \right). \]
This completes the proof.

Now we can prove Dirichlet's theorem.

Theorem 7.13 Dirichlet's Theorem
Let $f : [-\pi, \pi] \to \mathbb{C}$ be a strongly piecewise differentiable function, and let $\tilde{f} : \mathbb{R} \to \mathbb{C}$ be the $2\pi$-periodic extension of $f$. For every $x \in \mathbb{R}$, the Fourier series of $f$ converges at the point $x$ to
\[ \frac{\tilde{f}_-(x) + \tilde{f}_+(x)}{2}. \]

Proof
Notice that the function $\tilde{f}$ is also strongly piecewise differentiable on any compact interval. The $n$th partial sum of the Fourier series of $f$ is
\[ s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx}, \quad \text{where } c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx. \]
For a fixed real number $x$, we want to show that $s_n(x)$ converges to the number
\[ u = \frac{\tilde{f}_-(x) + \tilde{f}_+(x)}{2}. \]
By Proposition 7.8,
\[ s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) D_n(x - t)\,dt = \frac{1}{2\pi} \int_{x-\pi}^{x+\pi} \tilde{f}(x - t) D_n(t)\,dt. \]
Since $\tilde{f}$ and $D_n$ are $2\pi$-periodic functions, we find that
\[ s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x - t) D_n(t)\,dt. \]
Notice that
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} D_n(t)\,dt = \frac{1}{2\pi} \sum_{k=-n}^{n} \int_{-\pi}^{\pi} e^{ikt}\,dt = 1. \]
Therefore,
\[ s_n(x) - u = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( \tilde{f}(x - t) - u \right) D_n(t)\,dt. \]
Using the fact that $D_n : \mathbb{R} \to \mathbb{R}$ is an even function, we find that
\[ s_n(x) - u = \frac{1}{2\pi} \int_0^{\pi} \left( \tilde{f}(x + t) + \tilde{f}(x - t) - 2u \right) D_n(t)\,dt = \frac{1}{2\pi} \int_0^{\pi} \left( \tilde{f}(x + t) + \tilde{f}(x - t) - 2u \right) \frac{\sin\left(n + \frac{1}{2}\right)t}{\sin\frac{t}{2}}\,dt = \frac{1}{2\pi} \int_0^{\pi} h(t) \sin\left(n + \frac{1}{2}\right)t\,dt, \]
where $h : [0, \pi] \to \mathbb{C}$ is the function defined by (7.3). By Lemma 7.12, $h : [0, \pi] \to \mathbb{C}$ is Riemann integrable. By the Riemann-Lebesgue lemma,
\[ \lim_{n \to \infty} \int_0^{\pi} h(t) \sin\left(n + \frac{1}{2}\right)t\,dt = 0. \]
This proves that
\[ \lim_{n \to \infty} s_n(x) = u. \]

From Dirichlet's theorem, we can spell out the following explicitly.
Corollary 7.14
If $f : [-\pi, \pi] \to \mathbb{C}$ is a strongly piecewise differentiable function, then its Fourier series converges pointwise. Denote by $F : \mathbb{R} \to \mathbb{C}$ the sum of the Fourier series of $f$. Then for any $x \in \mathbb{R}$,
\[ F(x) = \frac{\tilde{f}_+(x) + \tilde{f}_-(x)}{2}, \]
where $\tilde{f} : \mathbb{R} \to \mathbb{C}$ is the $2\pi$-periodic extension of $f$. This implies that
(a) If $x \in (-\pi, \pi)$ and $f$ is continuous at $x$, then $F(x) = f(x)$.
(b) If $f$ is right continuous at $-\pi$, left continuous at $\pi$, and $f(-\pi) = f(\pi)$, then $F(-\pi) = F(\pi) = f(-\pi) = f(\pi)$.

Let us look at a few examples.

Example 7.10
Let $f : [-\pi, \pi] \to \mathbb{R}$ be the function $f(x) = x$ considered in Example 7.4. Since $f$ is a strongly piecewise differentiable function, the Fourier series of $f$ converges pointwise. We have shown that the Fourier series is given by
\[ F(x) = \sum_{k=1}^{\infty} 2(-1)^{k-1} \frac{\sin kx}{k}. \]
For any $x \in (-\pi, \pi)$, this series converges to $x$. When $x = \pi$, it converges to
\[ 0 = \frac{f_+(-\pi) + f_-(\pi)}{2}. \]
When $x = \dfrac{\pi}{2}$, since $\sin\dfrac{\pi k}{2}$ is $0$ when $k = 2n$, it is equal to $1$ when $k = 4n + 1$, and it is equal to $-1$ when $k = 4n + 3$, we deduce that
\[ \frac{1}{2} F\left(\frac{\pi}{2}\right) = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{1}{2} f\left(\frac{\pi}{2}\right) = \frac{\pi}{4}, \]
which is just the Newton-Gregory formula.
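The convergence described in Example 7.10 can be observed numerically. The following sketch evaluates partial sums of the series $\sum_{k\ge 1} 2(-1)^{k-1}\sin kx / k$ at $x = \pi/2$, where the limit should be $\pi/2$, and at $x = \pi$, where the limit should be $0$. The helper name `partial_sum` and the chosen truncation levels are illustrative.

```python
import math

def partial_sum(x, n):
    """n-term partial sum of 2 * sum_{k>=1} (-1)^(k-1) sin(kx) / k, the series of f(x) = x."""
    return sum(2 * (-1) ** (k - 1) * math.sin(k * x) / k for k in range(1, n + 1))

for n in (10, 100, 1000, 10000):
    # Value at pi/2 (expected to approach pi/2) and at pi (expected to stay near 0).
    print(n, partial_sum(math.pi / 2, n), partial_sum(math.pi, n))

print("pi/2 =", math.pi / 2)
```

At $x = \pi/2$ the series is alternating with terms of size $2/k$, so the convergence is slow: the error after $n$ terms is of order $1/n$.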
Figure 7.10: Convergence of the Fourier series of the function $f(x) = x$, $-\pi < x < \pi$.

Example 7.11
The function $f : [0, 2\pi] \to \mathbb{R}$, $f(x) = x(2\pi - x)$ considered in Example 7.5 is a strongly piecewise differentiable function. Hence, its Fourier series
\[ F(x) = \frac{2\pi^2}{3} - 4 \sum_{k=1}^{\infty} \frac{\cos kx}{k^2} \]
converges everywhere. Since $f : [0, 2\pi] \to \mathbb{R}$ is continuous and $f(0) = f(2\pi)$, we find that $F(x) = f(x)$ for all $x \in [0, 2\pi]$. In particular, setting $x = 0$ and $x = \pi$ respectively, we find that
\[ F(0) = \frac{2\pi^2}{3} - 4 \sum_{k=1}^{\infty} \frac{1}{k^2} = f(0) = 0, \]
\[ F(\pi) = \frac{2\pi^2}{3} - 4 \sum_{k=1}^{\infty} \frac{(-1)^k}{k^2} = f(\pi) = \pi^2. \]
These give
\[ \sum_{k=1}^{\infty} \frac{1}{k^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \frac{1}{5^2} + \frac{1}{6^2} + \cdots = \frac{\pi^2}{6}, \]
\[ \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k^2} = 1 - \frac{1}{2^2} + \frac{1}{3^2} - \frac{1}{4^2} + \frac{1}{5^2} - \frac{1}{6^2} + \cdots = \frac{\pi^2}{12}. \]
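The identity $F(x) = x(2\pi - x)$ on $[0, 2\pi]$ from Example 7.11 can be checked numerically. The sketch below truncates the cosine series after $n$ terms and compares it with $x(2\pi - x)$ at a few sample points; since the tail is bounded by $4\sum_{k>n} 1/k^2 \le 4/n$, the agreement should be within about $4/n$. The function names and sample points are illustrative.

```python
import math

def series_value(x, n):
    """Truncation after n terms of 2*pi^2/3 - 4 * sum_{k>=1} cos(kx) / k^2."""
    return 2 * math.pi**2 / 3 - 4 * sum(math.cos(k * x) / k**2 for k in range(1, n + 1))

def f(x):
    """The function f(x) = x(2*pi - x) on [0, 2*pi]."""
    return x * (2 * math.pi - x)

n = 10000
for x in (0.0, 1.0, math.pi, 5.0, 2 * math.pi):
    print(x, series_value(x, n), f(x))
```

Setting $x = 0$ and $x = \pi$ in this comparison reproduces, to the same accuracy, the two sums $\pi^2/6$ and $\pi^2/12$ obtained above.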
Figure 7.11: The function $f : [0, 2\pi] \to \mathbb{R}$, $f(x) = x(2\pi - x)$ and its extension.

Example 7.12
In Example 7.7, we consider the function $g : [-\pi, \pi] \to \mathbb{R}$ defined as
\[ g(x) = \begin{cases} 1, & \text{if } a \le x \le b, \\ 0, & \text{otherwise}, \end{cases} \]
where $a$ and $b$ are two numbers satisfying $-\pi \le a < b \le \pi$. Notice that $g$ is a strongly piecewise differentiable function. Thus, its Fourier series
\[ G(x) = \frac{b - a}{2\pi} + \frac{i}{2\pi} \sum_{k=1}^{\infty} \frac{(e^{-ikb} - e^{-ika}) e^{ikx} - (e^{ikb} - e^{ika}) e^{-ikx}}{k} \]
converges pointwise. If $x \in (a, b)$, Dirichlet's theorem says that $G(x) = g(x) = 1$. Hence, for any $x \in (a, b) \subset (-\pi, \pi)$,
\[ b - a = 2\pi - i \sum_{k=1}^{\infty} \frac{(e^{-ikb} - e^{-ika}) e^{ikx} - (e^{ikb} - e^{ika}) e^{-ikx}}{k}. \]
Remark 7.7
If we scrutinize the proof of Theorem 7.13, we find that a necessary and sufficient condition for the Fourier series of a $2\pi$-periodic function $f : \mathbb{R} \to \mathbb{C}$ to converge at a point $x$ is that the limit
\[ \lim_{n \to \infty} \int_0^{\pi} \frac{f(x + t) + f(x - t)}{t} \sin\left(n + \frac{1}{2}\right)t\,dt \]
exists. This is known as Riemann's localization theorem. Theorem 7.13 says that if $f : [-\pi, \pi] \to \mathbb{R}$ is strongly piecewise differentiable, then this limit exists. This is sufficient for most of our applications.

Remark 7.8 Fourier Sine Series and Fourier Cosine Series
Let $L > 0$, and let $f : [0, L] \to \mathbb{C}$ be a Riemann integrable function defined on $[0, L]$. We can extend $f$ to be an odd function $f_o : [-L, L] \to \mathbb{C}$ by defining
\[ f_o(x) = \begin{cases} -f(-x), & \text{if } -L \le x < 0, \\ 0, & \text{if } x = 0, \\ f(x), & \text{if } 0 < x \le L. \end{cases} \]
We can also extend $f$ to be an even function $f_e : [-L, L] \to \mathbb{C}$ by defining
\[ f_e(x) = \begin{cases} f(-x), & \text{if } -L \le x < 0, \\ f(x), & \text{if } 0 \le x \le L. \end{cases} \]
The Fourier series of $f_o : [-L, L] \to \mathbb{C}$ is called the Fourier sine series of $f : [0, L] \to \mathbb{C}$. The Fourier series of $f_e : [-L, L] \to \mathbb{C}$ is called the Fourier cosine series of $f : [0, L] \to \mathbb{C}$.
Exercises 7.2

Question 1
Consider the function $f : [-\pi, \pi] \to \mathbb{R}$, $f(x) = x^3 - \pi^2 x$.
(a) Find the Fourier series of $f$.
(b) Use the Fourier series to find the sum
\[ \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{(2k - 1)^3} = 1 - \frac{1}{3^3} + \frac{1}{5^3} - \frac{1}{7^3} + \cdots. \]

Question 2
Let $f : [-\pi, \pi] \to \mathbb{R}$ be the function defined as
\[ f(x) = \begin{cases} x + \pi, & \text{if } -\pi \le x < 0, \\ x - \pi, & \text{if } 0 \le x \le \pi. \end{cases} \]
(a) Find the Fourier series of $f$.
(b) Study the pointwise convergence of the Fourier series.

Question 3
Study the pointwise convergence of the Fourier series of the function $f : [-\pi, \pi] \to \mathbb{R}$, $f(x) = |x|$ obtained in Exercises 7.1.

Question 4
Study the pointwise convergence of the Fourier series of the function $f : [-\pi, \pi] \to \mathbb{R}$, $f(x) = x^2$ obtained in Exercises 7.1.

Question 5
Study the pointwise convergence of the Fourier series of the function $f : [0, 2\pi] \to \mathbb{R}$, $f(x) = x^2$ obtained in Exercises 7.1.
7.3 The $L^2$ Convergence of a Fourier Series

In this section, we consider the $L^2$-convergence of a Fourier series. We first define $L^2$-convergence for a sequence of Riemann integrable functions.

Definition 7.10 $L^2$-Convergence
Let $I = [a, b]$ be an interval in $\mathbb{R}$, and let $\{f_n : I \to \mathbb{C}\}$ be a sequence of Riemann integrable functions. We say that $\{f_n : I \to \mathbb{C}\}$ converges in $L^2$ to a function $g : I \to \mathbb{C}$ in $\mathcal{R}(I, \mathbb{C})$ if
\[ \lim_{n \to \infty} \|f_n - g\| = 0. \]

In the vector space $\mathcal{R}(I, \mathbb{C})$, there are nonzero functions $h : I \to \mathbb{C}$ which have zero norm. Hence, if $\{f_n : I \to \mathbb{C}\}$ converges in $L^2$ to a function $g : I \to \mathbb{C}$, the function $g$ is not unique. Nevertheless, we have the following.

Theorem 7.15
Let $I = [a, b]$ be an interval in $\mathbb{R}$, and let $\{f_n : I \to \mathbb{C}\}$ be a sequence of functions in $\mathcal{R}(I, \mathbb{C})$. If the sequence converges in $L^2$ to two functions $g_1 : I \to \mathbb{C}$ and $g_2 : I \to \mathbb{C}$ in $\mathcal{R}(I, \mathbb{C})$, then $g_1$ and $g_2$ are $L^2$-equivalent.

Proof
By the triangle inequality,
\[ \|g_1 - g_2\| \le \|f_n - g_1\| + \|f_n - g_2\|. \]
Since
\[ \lim_{n \to \infty} \|f_n - g_1\| = 0 \quad \text{and} \quad \lim_{n \to \infty} \|f_n - g_2\| = 0, \]
we find that $\|g_1 - g_2\| = 0$. Thus, $g_1$ and $g_2$ are $L^2$-equivalent.
Example 7.13
Consider the sequence of functions $\{f_n : [0, 1] \to \mathbb{R}\}$ defined as
\[ f_n(x) = \begin{cases} 1, & \text{if } x = \dfrac{m}{n} \text{ for some integer } m, \\ 0, & \text{otherwise}. \end{cases} \]
Since $f_n : [0, 1] \to \mathbb{R}$ is a function that is nonzero only at finitely many points, we find that $\|f_n\| = 0$. This implies that $\{f_n : I \to \mathbb{R}\}$ converges in $L^2$ to the function $f_0 : I \to \mathbb{R}$ that is identically zero. However, $\{f_n : I \to \mathbb{R}\}$ does not converge pointwise. Take for example the point $x_0 = 1/2$. Then $f_n(x_0) = 1$ if $n$ is even and $f_n(x_0) = 0$ if $n$ is odd. Hence, the sequence $\{f_n(x_0)\}$ does not converge. In other words, for sequences of functions, pointwise convergence and $L^2$-convergence are different.

The Fourier series of a Riemann integrable function $f : I \to \mathbb{C}$ converges in $L^2$ to the function $f : I \to \mathbb{C}$ if
\[ \lim_{n \to \infty} \|s_n - f\| = 0. \]
Here $s_n(x)$ is the $n$th partial sum of the Fourier series. The main theorem we want to prove in this section is that the Fourier series of any Riemann integrable function $f : I \to \mathbb{C}$ converges in $L^2$ to $f$ itself. We start with the following theorem, which asserts that a Riemann integrable function $f : [a, b] \to \mathbb{C}$ can be approximated in $L^2$ by step functions.

Theorem 7.16
Let $f : [a, b] \to \mathbb{C}$ be a Riemann integrable function. For every $\varepsilon > 0$, there is a step function $g : [a, b] \to \mathbb{C}$ such that
\[ \|f - g\| < \varepsilon. \]
Proof
As in the proof of Theorem 7.9, it is sufficient to consider the case where the function $f$ is real-valued. Since $f : [a, b] \to \mathbb{R}$ is Riemann integrable, it is bounded. Therefore, there exists $M > 0$ such that
\[ |f(x)| \le M \quad \text{for all } x \in [a, b]. \]
Given $\varepsilon > 0$, Theorem 7.9 says that there is a step function $g : [a, b] \to \mathbb{R}$ such that
\[ \int_a^b |f(x) - g(x)|\,dx < \frac{\varepsilon^2}{2M}. \]
By the construction of $g$ given in the proof of Theorem 7.9, we find that $|g(x)| \le M$ for all $x \in [a, b]$. Therefore,
\[ (f(x) - g(x))^2 \le |f(x) - g(x)| \left( |f(x)| + |g(x)| \right) \le 2M |f(x) - g(x)|. \]
This implies that
\[ \|f - g\|^2 = \int_a^b (f(x) - g(x))^2\,dx \le 2M \int_a^b |f(x) - g(x)|\,dx < \varepsilon^2. \]
Hence, $\|f - g\| < \varepsilon$.

Theorem 7.17
Let $I = [-\pi, \pi]$. Given a Riemann integrable function $f : I \to \mathbb{C}$, let $\displaystyle\sum_{k=-\infty}^{\infty} c_k e^{ikx}$ be its Fourier series, and let $s_n(x) = \displaystyle\sum_{k=-n}^{n} c_k e^{ikx}$ be the $n$th partial sum. We have the following.
(a) For each $n \ge 0$, $\|s_n\|^2 = 2\pi \displaystyle\sum_{k=-n}^{n} |c_k|^2$.
(b) For each $n \ge 0$, we have Bessel's inequality $\|s_n\| \le \|f\|$.
(c) The Fourier series converges in $L^2$ to $f$ if and only if $\displaystyle\lim_{n \to \infty} \|s_n\|^2 = \|f\|^2$.
Proof
For $n \in \mathbb{Z}$, let $\phi_n : \mathbb{R} \to \mathbb{C}$ be the function $\phi_n(x) = e^{inx}$. Then $S = \{\phi_n \mid n \in \mathbb{Z}\}$ is an orthogonal system of functions in $\mathcal{R}(I, \mathbb{C})$, and $\|\phi_n\| = \sqrt{2\pi}$ for all $n \in \mathbb{Z}$. For $n \ge 0$, the set $S_n = \{\phi_k \mid -n \le k \le n\}$ spans the subspace $W_n$, and
\[ \sum_{k=-n}^{n} c_k \phi_k(x) = s_n(x) = \operatorname{proj}_{W_n} f(x). \]
Since $S_n$ is an orthogonal system, the generalized Pythagoras theorem says that
\[ \|s_n\|^2 = \sum_{k=-n}^{n} |c_k|^2 \|\phi_k\|^2 = 2\pi \sum_{k=-n}^{n} |c_k|^2. \]
This proves part (a). For part (b), recall that $f - s_n$ is orthogonal to $s_n$. By the generalized Pythagoras theorem again,
\[ \|f\|^2 = \|s_n + (f - s_n)\|^2 = \|s_n\|^2 + \|f - s_n\|^2 \ge \|s_n\|^2. \]
Hence, we find that $\|s_n\| \le \|f\|$. This proves part (b). Part (c) follows from
\[ \|f\|^2 = \|s_n\|^2 + \|f - s_n\|^2. \]
The Fourier series converges in $L^2$ to $f$ if and only if $\displaystyle\lim_{n \to \infty} \|s_n - f\| = 0$, if and only if $\displaystyle\lim_{n \to \infty} \|s_n\|^2 = \|f\|^2$.
Remark 7.9
Part (b) of Theorem 7.17 says that for the trigonometric series $\displaystyle\sum_{k=-\infty}^{\infty} c_k e^{ikx}$ to be the Fourier series of a Riemann integrable function, it is necessary that the series $\displaystyle\sum_{k=-\infty}^{\infty} |c_k|^2$ is convergent.
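Bessel's inequality can be illustrated with the function $f(x) = x$ on $[-\pi, \pi]$. Assuming the series $\sum_{k\ge1} 2(-1)^{k-1}\sin kx/k$ quoted for this function elsewhere in the chapter, one has $c_0 = 0$ and $|c_k| = 1/|k|$ for $k \ne 0$, while $\|f\|^2 = \int_{-\pi}^{\pi} x^2\,dx = 2\pi^3/3$. The sketch below, with illustrative names, compares $\|s_n\|^2 = 2\pi\sum_{|k|\le n}|c_k|^2$ with $\|f\|^2$.

```python
import math

# ||f||^2 for f(x) = x on [-pi, pi]: the integral of x^2 over [-pi, pi].
norm_f_sq = 2 * math.pi**3 / 3

def norm_sn_sq(n):
    """||s_n||^2 = 2*pi * sum_{|k|<=n} |c_k|^2, using |c_k| = 1/|k| for k != 0 and c_0 = 0."""
    return 2 * math.pi * sum(2.0 / k**2 for k in range(1, n + 1))

for n in (1, 10, 100, 10000):
    # Bessel's inequality predicts norm_sn_sq(n) <= norm_f_sq for every n.
    print(n, norm_sn_sq(n), norm_f_sq)
```

The values of $\|s_n\|^2$ increase towards $\|f\|^2$ without exceeding it, which matches parts (b) and (c) of Theorem 7.17 for this particular function.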
Now we will prove that the Fourier series of a special type of step function converges in $L^2$ to the function itself.

Theorem 7.18
Let $a$ and $b$ be two numbers satisfying $-\pi \le a < b \le \pi$, and let $g : [-\pi, \pi] \to \mathbb{R}$ be the function defined as
\[ g(x) = \begin{cases} 1, & \text{if } a \le x \le b, \\ 0, & \text{otherwise}. \end{cases} \]
The Fourier series of $g$ converges in $L^2$ to the function $g$.

Proof
By Theorem 7.17, it is sufficient to show that
\[ \lim_{n \to \infty} \|s_n\|^2 = \|g\|^2. \]
Now,
\[ \|g\|^2 = \int_{-\pi}^{\pi} |g(x)|^2\,dx = \int_a^b dx = b - a. \]
In Example 7.7, we have seen that the Fourier coefficients of $g$ are
\[ c_0 = \frac{b - a}{2\pi}, \quad \text{and} \quad c_k = \frac{e^{-ikb} - e^{-ika}}{-2\pi i k} \quad \text{when } k \ne 0. \]
By part (a) of Theorem 7.17,
\[ \|s_n\|^2 = 2\pi \left( |c_0|^2 + \sum_{k=1}^{n} \left( |c_k|^2 + |c_{-k}|^2 \right) \right) = \frac{(b - a)^2}{2\pi} + 4\pi \sum_{k=1}^{n} \frac{(e^{ikb} - e^{ika})(e^{-ikb} - e^{-ika})}{4\pi^2 k^2} = \frac{(b - a)^2}{2\pi} + \frac{2}{\pi} \sum_{k=1}^{n} \frac{1 - \cos k(b - a)}{k^2}. \]
By Example 7.11,
\[ \frac{2\pi^2}{3} - 4 \sum_{k=1}^{\infty} \frac{\cos kx}{k^2} = x(2\pi - x) \quad \text{for all } x \in [0, 2\pi], \]
and
\[ \sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}. \]
Therefore,
\[ \lim_{n \to \infty} \|s_n\|^2 = \frac{(b - a)^2}{2\pi} + \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{1}{k^2} - \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{\cos k(b - a)}{k^2} = \frac{(b - a)^2}{2\pi} + \frac{\pi}{3} - \frac{1}{2\pi} \left( \frac{2}{3}\pi^2 - 2\pi(b - a) + (b - a)^2 \right) = b - a = \|g\|^2. \]
This proves that the Fourier series of $g$ converges in $L^2$ to $g$.

Now we can prove our main theorem.

Theorem 7.19 $L^2$ Convergence of Fourier Series
Let $I = [-\pi, \pi]$ and let $f : I \to \mathbb{C}$ be a Riemann integrable function. Then the Fourier series of $f$ converges in $L^2$ to $f$ itself.

Proof
Since we will be dealing with more than one function here, we use $s_n(f)$ to denote the $n$th partial sum of the Fourier series of $f$. We will show that given $\varepsilon > 0$, there exists a positive integer $N$ such that for all $n \ge N$,
\[ \|s_n(f) - f\| < \varepsilon. \]
Fix $\varepsilon > 0$. Theorem 7.16 says that there is a step function $g : [-\pi, \pi] \to \mathbb{C}$ such that
\[ \|f - g\| < \frac{\varepsilon}{3}. \]
Let $P = \{x_0, x_1, \ldots, x_l\}$ be the partition of $[-\pi, \pi]$ such that for each $1 \le j \le l$, $g$ is constant on $(x_{j-1}, x_j)$. Define $g_j : [-\pi, \pi] \to \mathbb{C}$ by
\[ g_j(x) = \begin{cases} g(x), & \text{if } x_{j-1} < x < x_j, \\ 0, & \text{otherwise}. \end{cases} \]
Then
\[ g(x) = g_1(x) + \cdots + g_l(x) \quad \text{for all } x \in [-\pi, \pi] \setminus P. \]
Since Riemann integrals are not affected by function values at finitely many points, it follows that for each $n \ge 0$,
\[ s_n(g) = \sum_{j=1}^{l} s_n(g_j). \]
By Theorem 7.17,
\[ \|s_n(f) - s_n(g)\| = \|s_n(f - g)\| \le \|f - g\| < \frac{\varepsilon}{3}. \]
By the triangle inequality,
\[ \|g - s_n(g)\| = \left\| \sum_{j=1}^{l} (g_j - s_n(g_j)) \right\| \le \sum_{j=1}^{l} \|g_j - s_n(g_j)\|. \]
By Theorem 7.18, $\displaystyle\lim_{n \to \infty} \|g_j - s_n(g_j)\| = 0$ for $1 \le j \le l$. Therefore,
\[ \lim_{n \to \infty} \|g - s_n(g)\| = 0. \]
This implies that there is a positive integer N such that ∥g − sn (g)∥
1, it is convergent. Hence, the
\[ \sum_{k=1}^{\infty} \frac{\cos kx}{k^{3/2}} \]
is a continuous function whose Fourier series is itself.

Remark 7.11
By Remark 7.9, in order for the series $\displaystyle\sum_{k=-\infty}^{\infty} c_k e^{ikx}$ to be the Fourier series of a Riemann integrable function, it is necessary that the series $\displaystyle\sum_{k=-\infty}^{\infty} |c_k|^2$ is convergent. However, the convergence of $\displaystyle\sum_{k=-\infty}^{\infty} |c_k|^2$ does not imply the convergence of $\displaystyle\sum_{k=-\infty}^{\infty} |c_k|$.
Theorem 7.25 gives a criterion for a function defined as a trigonometric series to have a Fourier series that is equal to itself. However, we will usually start with a Riemann integrable function.

Theorem 7.27
Let $I = [-\pi, \pi]$, and let $f : I \to \mathbb{C}$ be a Riemann integrable function. If the Fourier series $\displaystyle\sum_{k=-\infty}^{\infty} c_k e^{ikx}$ of $f : I \to \mathbb{C}$ converges uniformly, it defines a continuous $2\pi$-periodic function $F : \mathbb{R} \to \mathbb{C}$ whose restriction to $I$ is $L^2$-equivalent to $f : I \to \mathbb{C}$. If the periodic extension $\tilde{f} : \mathbb{R} \to \mathbb{C}$ of $f$ is continuous at $x_0$, then $F(x_0) = \tilde{f}(x_0)$.
Proof
The fact that $\displaystyle\sum_{k=-\infty}^{\infty} c_k e^{ikx}$ defines a continuous periodic function $F : \mathbb{R} \to \mathbb{C}$ has been asserted in Theorem 7.25. Since $\displaystyle\sum_{k=-\infty}^{\infty} c_k e^{ikx}$ is the Fourier series for both $f : I \to \mathbb{C}$ and $F : I \to \mathbb{C}$, Theorem 7.19 says that it converges in $L^2$ to $f : I \to \mathbb{C}$ and $F : I \to \mathbb{C}$. Theorem 7.15 then asserts that $F : I \to \mathbb{C}$ and $f : I \to \mathbb{C}$ are $L^2$-equivalent. The values of two $L^2$-equivalent functions agree at a point where both of them are continuous.

Example 7.18
In Example 7.5, we have seen that the Fourier series of the function $f : [0, 2\pi] \to \mathbb{R}$, $f(x) = x(2\pi - x)$ is
\[ F(x) = \frac{2\pi^2}{3} - 4 \sum_{k=1}^{\infty} \frac{\cos kx}{k^2}. \]
Since the series $\displaystyle\sum_{k=1}^{\infty} \frac{1}{k^2}$ is convergent, and $f : [0, 2\pi] \to \mathbb{R}$ is continuous with $f(0) = f(2\pi) = 0$, Theorem 7.26 and Theorem 7.27 imply that the Fourier series $F(x)$ converges uniformly to the periodic extension of the function $f(x)$.
Example 7.19
In Example 7.4, we have seen that the Fourier series of the function $f : [-\pi, \pi] \to \mathbb{R}$, $f(x) = x$ is
\[ F(x) = \sum_{k=1}^{\infty} 2(-1)^{k-1} \frac{\sin kx}{k}. \]
Since the harmonic series $\displaystyle\sum_{k=1}^{\infty} \frac{1}{k}$ is divergent, we cannot apply Theorem 7.26. However, we can argue that the Fourier series does not converge uniformly in the following way.
In Example 7.10, we have used Dirichlet's theorem to conclude that $F(x)$ converges to $f(x)$ for any $x \in (-\pi, \pi)$. It is obvious that $F(\pi) = 0$. Now
\[ \lim_{x \to \pi^-} F(x) = \lim_{x \to \pi^-} f(x) = \lim_{x \to \pi^-} x = \pi \ne F(\pi). \]
This shows that $F$ is not continuous at $x = \pi$. Since a uniform limit of continuous functions is continuous, the convergence of the Fourier series is not uniform.

Remark 7.12
Theorem 7.25 and Theorem 7.27 can be regarded as uniqueness results for Fourier series.

Next we want to consider term-by-term differentiation.

Example 7.20
Let us consider again the function $f : [-\pi, \pi] \to \mathbb{R}$, $f(x) = x$, whose Fourier series is given by
\[ F(x) = \sum_{k=1}^{\infty} 2(-1)^{k-1} \frac{\sin kx}{k}. \]
Since $f : (-\pi, \pi) \to \mathbb{R}$ is continuously differentiable with $f'(x) = 1$, the extension function $\tilde{f} : \mathbb{R} \to \mathbb{R}$ is continuously differentiable at all the points $x$ that are not odd multiples of $\pi$, and
\[ \tilde{f}'(x) = 1 \quad \text{when } x \notin \{(2n + 1)\pi \mid n \in \mathbb{Z}\}. \]
Hence, for the function $f' : [-\pi, \pi] \to \mathbb{R}$, regardless of how it is defined at $\pm\pi$, its Fourier series is the constant $G(x) = 1$. However, if we differentiate the Fourier series of $f : [-\pi, \pi] \to \mathbb{R}$ term-by-term, we obtain the series
\[ \sum_{k=1}^{\infty} 2(-1)^{k-1} \cos kx. \]
As $|a_k| = 2$ for all $k \ge 1$, $\displaystyle\sum_{k=1}^{\infty} |a_k|^2$ is divergent. Thus, the series
\[ \sum_{k=1}^{\infty} 2(-1)^{k-1} \cos kx \]
cannot be the Fourier series of any Riemann integrable function.

Example 7.20 shows that term-by-term differentiation can fail even though the function $f : [-\pi, \pi] \to \mathbb{R}$ and its derivative are strongly piecewise differentiable.

Theorem 7.28
Let $I = [-\pi, \pi]$ and let $f : I \to \mathbb{R}$ be a Riemann integrable function with Fourier series $\displaystyle\sum_{k=-\infty}^{\infty} c_k e^{ikx}$. If the series $\displaystyle\sum_{k=-\infty}^{\infty} c_k$ is convergent, and the series $\displaystyle\sum_{k=-\infty}^{\infty} k c_k e^{ikx}$ converges uniformly, then $f : I \to \mathbb{R}$ is $L^2$-equivalent to the continuously differentiable function $F : I \to \mathbb{R}$, $F(x) = \displaystyle\sum_{k=-\infty}^{\infty} c_k e^{ikx}$. Moreover, the Fourier series of $F' : I \to \mathbb{R}$ is
\[ \sum_{k=-\infty}^{\infty} i k c_k e^{ikx}, \]
which converges to $F'(x)$ for all $x \in I$.

Proof
By Theorem 7.25, the uniformly convergent series
\[ \sum_{k=-\infty}^{\infty} i k c_k e^{ikx} \]
defines a continuous function $G : I \to \mathbb{C}$, whose Fourier series is itself.
Let
\[ H(x) = \int_0^x G(t)\,dt + \sum_{k=-\infty}^{\infty} c_k. \]
By the fundamental theorem of calculus, $H$ is differentiable and $H'(x) = G(x)$. Thus, $H(x)$ is continuously differentiable. Since the series
\[ \sum_{k=-\infty}^{\infty} i k c_k e^{ikx} \]
converges uniformly, we can do term-by-term integration to obtain
\[ H(x) = \sum_{k=-\infty}^{\infty} i k c_k \int_0^x e^{ikt}\,dt + \sum_{k=-\infty}^{\infty} c_k = \sum_{k=-\infty}^{\infty} c_k e^{ikx} = F(x). \]
Hence, the function $F : I \to \mathbb{C}$,
\[ F(x) = \sum_{k=-\infty}^{\infty} c_k e^{ikx}, \]
is continuously differentiable, with derivative
\[ F'(x) = H'(x) = G(x) = \sum_{k=-\infty}^{\infty} i k c_k e^{ikx}. \]
This completes the proof.

Let us look at an example.

Example 7.21
Consider the trigonometric series
\[ \sum_{k=1}^{\infty} \frac{\sin kx}{k^3} \quad \text{and} \quad \sum_{k=1}^{\infty} \frac{\cos kx}{k^2}. \]
Since the series $\displaystyle\sum_{k=1}^{\infty} \frac{1}{k^3}$ and $\displaystyle\sum_{k=1}^{\infty} \frac{1}{k^2}$ are convergent, both the series
\[ \sum_{k=1}^{\infty} \frac{\sin kx}{k^3} \quad \text{and} \quad \sum_{k=1}^{\infty} \frac{\cos kx}{k^2} \]
converge uniformly, and so they define continuous functions. Let
\[ F(x) = \sum_{k=1}^{\infty} \frac{\sin kx}{k^3} \quad \text{and} \quad G(x) = \sum_{k=1}^{\infty} \frac{\cos kx}{k^2}. \]
Since
\[ \frac{d}{dx} \frac{\sin kx}{k^3} = \frac{\cos kx}{k^2} \quad \text{for all } k \in \mathbb{Z}^+, \]
$F$ is continuously differentiable and $F'(x) = G(x)$.

At the end of this section, we give a brief discussion of the Cesàro mean of a Fourier series. As an application, we give another proof of the Weierstrass approximation theorem.

Definition 7.11 Cesàro Mean of a Fourier Series
Given a Riemann integrable function $f : [-\pi, \pi] \to \mathbb{C}$, let its Fourier series be
\[ \sum_{k=-\infty}^{\infty} c_k e^{ikx}, \quad \text{where } c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx. \]
For $n \ge 1$, the $n$th Cesàro mean of the Fourier series is
\[ \sigma_n(x) = \frac{s_0(x) + s_1(x) + \cdots + s_{n-1}(x)}{n}, \]
where
\[ s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx} \]
is the $n$th partial sum of the Fourier series.
Proposition 7.29
Let $I = [-\pi, \pi]$, and let $f : I \to \mathbb{C}$ be a Riemann integrable function. The $n$th Cesàro mean $\sigma_n(x)$ of the Fourier series of $f$ has an integral representation given by
\[ \sigma_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) F_n(x - t)\,dt, \]
where $F_n : \mathbb{R} \to \mathbb{R}$ is the kernel
\[ F_n(x) = \begin{cases} n, & \text{if } x \in 2\pi\mathbb{Z}, \\[4pt] \dfrac{\sin^2\frac{nx}{2}}{n \sin^2\frac{x}{2}}, & \text{otherwise}. \end{cases} \]

Proof
By Proposition 7.8,
\[ s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) D_n(x - t)\,dt, \]
where $D_n(t)$ is the Dirichlet kernel. This gives
\[ \sigma_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) F_n(x - t)\,dt, \]
where
\[ F_n(x) = \frac{D_0(x) + D_1(x) + \cdots + D_{n-1}(x)}{n}. \]
Using the fact that
\[ D_n(x) = \sum_{k=-n}^{n} e^{ikx} = \begin{cases} 2n + 1, & \text{if } x \in 2\pi\mathbb{Z}, \\[4pt] \dfrac{\sin\left(n + \frac{1}{2}\right)x}{\sin\frac{x}{2}}, & \text{otherwise}, \end{cases} \]
we find that when $x \in 2\pi\mathbb{Z}$,
\[ F_n(x) = \frac{1 + 3 + \cdots + (2n - 1)}{n} = n. \]
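When $x \notin 2\pi\mathbb{Z}$,
\[ F_n(x) = \frac{1}{n \sin\frac{x}{2}} \sum_{k=0}^{n-1} \sin\left(k + \frac{1}{2}\right)x = \frac{1}{n \sin\frac{x}{2}} \operatorname{Im} \left\{ \sum_{k=0}^{n-1} \exp\left( i\left(k + \frac{1}{2}\right)x \right) \right\} = \frac{1}{n \sin\frac{x}{2}} \operatorname{Im} \left\{ e^{\frac{ix}{2}} \frac{e^{inx} - 1}{e^{ix} - 1} \right\} = \frac{1}{n \sin\frac{x}{2}} \operatorname{Im} \left\{ \frac{e^{inx} - 1}{2i \sin\frac{x}{2}} \right\} = \frac{1 - \cos nx}{2n \sin^2\frac{x}{2}} = \frac{\sin^2\frac{nx}{2}}{n \sin^2\frac{x}{2}}. \]
This completes the proof.

Definition 7.12
For $n \ge 1$, the Fejér kernel $F_n : \mathbb{R} \to \mathbb{R}$ is the kernel given by
\[ F_n(x) = \begin{cases} n, & \text{if } x \in 2\pi\mathbb{Z}, \\[4pt] \dfrac{\sin^2\frac{nx}{2}}{n \sin^2\frac{x}{2}}, & \text{otherwise}. \end{cases} \]
A useful property of the Fejér kernel is that $F_n(t) \ge 0$ for all $t \in \mathbb{R}$. Now we can prove the following theorem.

Theorem 7.30
Let $f : [-\pi, \pi] \to \mathbb{C}$ be a continuous function with $f(-\pi) = f(\pi)$, and let $\sigma_n(x)$ be the $n$th Cesàro mean of the Fourier series of $f$. Then the sequence of functions $\{\sigma_n : [-\pi, \pi] \to \mathbb{C}\}$ converges uniformly to the function $f : [-\pi, \pi] \to \mathbb{C}$.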
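The closed form of the Fejér kernel can be checked against its definition as an average of Dirichlet kernels. The sketch below, with illustrative function names and an ad hoc tolerance for detecting multiples of $2\pi$, evaluates both expressions at a few points and also confirms nonnegativity on a sample grid.

```python
import math

def dirichlet(n, x):
    """Dirichlet kernel D_n(x): 2n + 1 on 2*pi*Z, a quotient of sines elsewhere."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:  # x is (numerically) a multiple of 2*pi
        return 2 * n + 1
    return math.sin((n + 0.5) * x) / s

def fejer_average(n, x):
    """F_n(x) as the average (D_0(x) + ... + D_{n-1}(x)) / n."""
    return sum(dirichlet(k, x) for k in range(n)) / n

def fejer_closed(n, x):
    """F_n(x) from the closed form sin^2(nx/2) / (n sin^2(x/2)), with value n on 2*pi*Z."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return float(n)
    return math.sin(n * x / 2) ** 2 / (n * s * s)

for n in (2, 4, 6):
    for x in (0.0, 0.5, -1.3, 3.0):
        print(n, x, fejer_average(n, x), fejer_closed(n, x))
```

Unlike the Dirichlet kernel, which changes sign, the values produced by `fejer_closed` are never negative; this is the property used in the proof of Theorem 7.30.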
Figure 7.12: The Fejér kernels $F_n : [-\pi, \pi] \to \mathbb{R}$ for $2 \le n \le 6$.

Proof
As in the proof of Dirichlet's theorem, we find that
\[ \sigma_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x - t) F_n(t)\,dt, \]
where $\tilde{f} : \mathbb{R} \to \mathbb{C}$ is the $2\pi$-periodic extension of $f$. Since
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} D_n(t)\,dt = 1 \quad \text{for all } n \ge 0, \]
we have
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} F_n(t)\,dt = 1 \quad \text{for all } n \ge 1. \]
It follows that for $x \in [-\pi, \pi]$,
\[ \sigma_n(x) - f(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( \tilde{f}(x - t) - \tilde{f}(x) \right) F_n(t)\,dt. \]
For $x \in [-\pi, \pi]$ and $t \in [-\pi, \pi]$, $x - t \in [-2\pi, 2\pi]$. Since $f : [-\pi, \pi] \to \mathbb{C}$ is a continuous function with $f(-\pi) = f(\pi)$, $\tilde{f} : [-2\pi, 2\pi] \to \mathbb{C}$ is continuous. Hence, it is uniformly continuous. Given $\varepsilon > 0$, there exists $\delta > 0$ such that if $u$ and $v$ are in $[-2\pi, 2\pi]$ and $|u - v| < \delta$, then
\[ |\tilde{f}(u) - \tilde{f}(v)| < \frac{\varepsilon}{2}. \]
By continuity, if $|u - v| \le \delta$, then
\[ |\tilde{f}(u) - \tilde{f}(v)| \le \frac{\varepsilon}{2}. \]
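Being continuous on a compact interval, the function $\tilde{f} : [-2\pi, 2\pi] \to \mathbb{C}$ is also bounded. Therefore, there exists $M > 0$ such that
\[ |\tilde{f}(x)| \le M \quad \text{for all } x \in [-2\pi, 2\pi]. \]
This implies that for any $x \in [-\pi, \pi]$ and $t \in [-\pi, \pi]$,
\[ \left| \tilde{f}(x - t) - \tilde{f}(x) \right| \le 2M. \]
On the other hand, if $\delta \le |t| \le \pi$,
\[ \sin^2\frac{t}{2} \ge \sin^2\frac{\delta}{2} > 0. \]
Therefore,
\[ 0 \le F_n(t) \le \frac{1}{n \sin^2\frac{\delta}{2}} \quad \text{when } \delta \le |t| \le \pi. \]
Let $N$ be a positive integer such that
\[ N > \frac{4M}{\varepsilon \sin^2\frac{\delta}{2}}. \]
For $n \ge N$ and $x \in [-\pi, \pi]$, we have
\[ |\sigma_n(x) - f(x)| \le \frac{1}{2\pi} \int_{|t| \le \delta} \left| \tilde{f}(x - t) - \tilde{f}(x) \right| F_n(t)\,dt + \frac{1}{2\pi} \int_{\delta \le |t| \le \pi} \left| \tilde{f}(x - t) - \tilde{f}(x) \right| F_n(t)\,dt. \]
We estimate the two terms separately. Since $F_n(t) \ge 0$ for all $t \in [-\pi, \pi]$,
\[ 0 \le \frac{1}{2\pi} \int_{|t| \le \delta} F_n(t)\,dt \le \frac{1}{2\pi} \int_{-\pi}^{\pi} F_n(t)\,dt = 1. \]
Hence,
\[ \frac{1}{2\pi} \int_{|t| \le \delta} \left| \tilde{f}(x - t) - \tilde{f}(x) \right| F_n(t)\,dt \le \frac{\varepsilon}{2} \times \frac{1}{2\pi} \int_{|t| \le \delta} F_n(t)\,dt \le \frac{\varepsilon}{2}. \]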
For the second term, we have
\[ \frac{1}{2\pi} \int_{\delta \le |t| \le \pi} \left| \tilde{f}(x - t) - \tilde{f}(x) \right| F_n(t)\,dt \le \frac{1}{2\pi} \int_{\delta \le |t| \le \pi} \frac{2M}{n \sin^2\frac{\delta}{2}}\,dt \le \frac{2M}{N \sin^2\frac{\delta}{2}} < \frac{\varepsilon}{2}. \]
This shows that for all $n \ge N$,
\[ |\sigma_n(x) - f(x)| < \varepsilon \quad \text{for all } x \in [-\pi, \pi]. \]
Thus, the sequence of functions $\{\sigma_n : [-\pi, \pi] \to \mathbb{C}\}$ converges uniformly to the function $f : [-\pi, \pi] \to \mathbb{C}$.

Notice that since $s_n(x)$ is in the span of $S_n = \{e^{ikx} \mid -n \le k \le n\}$, $\sigma_{n+1}(x)$ is in the span of $S_n$. Now we apply Theorem 7.30 to give another proof of the Weierstrass approximation theorem.

Theorem 7.31 Weierstrass Approximation Theorem
Let $f : [a, b] \to \mathbb{R}$ be a continuous function defined on $[a, b]$. Given $\varepsilon > 0$, there is a polynomial $p(x)$ such that
\[ |f(x) - p(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]

Proof
It is sufficient to prove the theorem for a specific $[a, b]$. We take $[a, b] = [0, 1]$. Given a real-valued continuous function $f : [0, 1] \to \mathbb{R}$, we extend it to an even function $f_e : [-1, 1] \to \mathbb{R}$, and let $g : [-\pi, \pi] \to \mathbb{R}$ be the function defined as
\[ g(x) = f_e(\cos x). \]
This is well-defined since the range of $\cos x$ is $[-1, 1]$. Since $\cos x$ and $f_e : [-1, 1] \to \mathbb{R}$ are continuous even functions, $g : [-\pi, \pi] \to \mathbb{R}$ is a continuous even function. Hence, we also have $g(\pi) = g(-\pi)$.
Chapter 7. Fourier Series and Fourier Transforms
583
Given ε > 0, Theorem 7.30 implies that there is a positive integer n such that |g(x) − σn+1 (x)| < ε. (7.4) Here σn+1 (x) is the (n + 1)th Ces`aro mean of the Fourier series of g. Since g : [−π, π] → R is a real-valued even function, the Fourier series of g has the form ∞ a0 X + ak cos kx, 2 k=1 where ak , k ≥ 0 are real. This implies that σn+1 (x) =
n X
αk cos kx
k=0
for some real constants α0 , α1 , . . . , αn . For any m ≥ 1, cos mx can be written as a linear combination of 1, cos x, cos2 x, . . . , cosm x. This shows that there are real constants β0 , β1 , . . . , βn such that σn+1 (x) =
n X
βk cosk x.
k=0
Let p(x) =
n X
β k xk .
k=0
Then σn+1 (x) = p(cos x). Thus, (7.4) says that |fe (cos x) − p(cos x)| < ε
for all x ∈ [−π, π].
This implies that |f (x) − p(x)| < ε
for all x ∈ [0, 1],
which completes the proof of the theorem.
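The step "$\cos mx$ is a linear combination of $1, \cos x, \ldots, \cos^m x$" can be made concrete. The following is a minimal Python sketch, not part of the original argument, which assumes the standard identity $\cos((j+1)x) = 2\cos x \cos(jx) - \cos((j-1)x)$ (the Chebyshev recurrence). The function names `cos_multiple_angle_coeffs` and `eval_poly` are hypothetical helpers introduced only for illustration.

```python
import math

def cos_multiple_angle_coeffs(m):
    """Return [b_0, ..., b_m] with cos(m x) = sum_k b_k * cos(x)**k.

    Uses cos((j+1)x) = 2 cos(x) cos(jx) - cos((j-1)x), that is, the
    Chebyshev recurrence T_{j+1}(y) = 2 y T_j(y) - T_{j-1}(y).
    """
    prev, curr = [1], [0, 1]          # T_0(y) = 1, T_1(y) = y
    if m == 0:
        return prev
    for _ in range(m - 1):
        shifted = [0] + [2 * c for c in curr]             # 2 y T_j(y)
        padded = prev + [0] * (len(shifted) - len(prev))  # T_{j-1}(y)
        prev, curr = curr, [s - p for s, p in zip(shifted, padded)]
    return curr

def eval_poly(coeffs, y):
    """Evaluate sum_k coeffs[k] * y**k by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * y + c
    return result

coeffs = cos_multiple_angle_coeffs(3)   # cos 3x = 4 cos^3 x - 3 cos x
x = 0.7
error = abs(math.cos(3 * x) - eval_poly(coeffs, math.cos(x)))
```

Applying this expansion to each term $\alpha_k \cos kx$ and collecting powers of $\cos x$ is one way to obtain the constants $\beta_0, \ldots, \beta_n$ used in the proof.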
Remark 7.13
In the proof of the Weierstrass approximation theorem given above, we do not use the Fourier series itself, since the Fourier series of a $2\pi$-periodic continuous function does not necessarily converge uniformly. An example is given in [SS03]. However, there are other approaches to proving the Weierstrass approximation theorem using Fourier series. For example, one can first approximate a continuous function uniformly by a continuous piecewise linear function. The Fourier series of a continuous piecewise linear function does converge uniformly to the function itself.

In the proof given above, we used the even extension $f_e$ of the given function $f$. The Fourier series of $f_e(\cos x)$ is a cosine series, so that the Cesàro mean is a polynomial in $\cos x$. One can also bypass the even extension and the composition with the cosine function, and use directly the uniform approximation of trigonometric functions by Taylor polynomials, as asserted by the general theory of power series.
Exercises 7.4

Question 1
Consider the function $f : [-\pi, \pi] \to \mathbb{R}$ defined as
\[ f(x) = \begin{cases} x + \pi, & \text{if } -\pi \le x < 0, \\ x - \pi, & \text{if } 0 \le x \le \pi. \end{cases} \]
The Fourier series of this function has been obtained in Exercises 7.2. Does the Fourier series converge uniformly? Justify your answer.

Question 2
Study the uniform convergence of the Fourier series of the function $f : [-\pi, \pi] \to \mathbb{R}$, $f(x) = x^2$ obtained in Exercises 7.1.

Question 3
Show that the trigonometric series
\[ \sum_{k=1}^{\infty} \frac{2k \cos kx + 3 \sin kx}{k^4} \]
defines a continuously differentiable function $F : \mathbb{R} \to \mathbb{R}$, and find the Fourier series of the function $F' : [-\pi, \pi] \to \mathbb{R}$.
7.5 Fourier Transforms
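We have seen that the Fourier series of a function $f : [-L, L] \to \mathbb{C}$ defined on $[-L, L]$ is
\[ \sum_{k=-\infty}^{\infty} c_k \exp\left(\frac{i\pi k x}{L}\right), \]
where the Fourier coefficients $c_k$, $k \in \mathbb{Z}$, are given by
\[ c_k = \frac{1}{2L}\int_{-L}^{L} f(t) \exp\left(-\frac{i\pi k t}{L}\right) dt. \]
This is also the Fourier series of the $2L$-periodic extension of the function $f$. Substituting the expression for $c_k$, we find that the Fourier series can be written as
\[ \frac{1}{2\pi}\sum_{k=-\infty}^{\infty} \frac{\pi}{L}\int_{-L}^{L} f(t) \exp\left(\frac{i\pi k (x - t)}{L}\right) dt. \tag{7.5} \]
Heuristically,
\[ \sum_{k=-\infty}^{\infty} \frac{\pi}{L} \exp\left(\frac{i\pi k t}{L}\right) \]
can be regarded as a Riemann sum for the function $g : \mathbb{R} \to \mathbb{C}$, $g(\omega) = e^{i\omega t}$. In the limit $L \to \infty$, one obtains heuristically the integral
\[ \int_{-\infty}^{\infty} e^{i\omega t}\,d\omega, \]
so that (7.5) becomes
\[ \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t) e^{i\omega(x - t)}\,dt\,d\omega. \]
This motivates us to define the Fourier transform of a function $f : \mathbb{R} \to \mathbb{C}$ as
\[ \hat f(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\,dt. \]
We know that under certain conditions, the Fourier series of a function converges to the function itself. Hence, we can also explore the conditions under which
\[ f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t) e^{i\omega(x - t)}\,dt\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat f(\omega) e^{i\omega x}\,d\omega. \tag{7.6} \]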
However, now the integrals we are working with are improper integrals. Therefore, there is another convergence issue that we need to deal with. In this section, we only give a brief discussion of Fourier transforms. An in-depth analysis would require advanced tools.

We say that a function $f : \mathbb{R} \to \mathbb{C}$ is Riemann integrable if it is Riemann integrable on any compact interval.

Definition 7.13 $L^1$ and $L^2$ Functions
Let $f : \mathbb{R} \to \mathbb{C}$ be a Riemann integrable function. We say that $f$ is $L^1$ if the improper integral
\[ \int_{-\infty}^{\infty} |f(x)|\,dx \]
is convergent. In this case, we define the $L^1$-norm of $f$ as
\[ \|f\|_1 = \int_{-\infty}^{\infty} |f(x)|\,dx. \]
We say that $f$ is $L^2$ if the improper integral
\[ \int_{-\infty}^{\infty} |f(x)|^2\,dx \]
is convergent. In this case, we define the $L^2$-norm of $f$ as
\[ \|f\|_2 = \sqrt{\int_{-\infty}^{\infty} |f(x)|^2\,dx}. \]
If $f : [a, b] \to \mathbb{C}$ is any Riemann integrable function, the zero-extension of $f$ to $\mathbb{R}$ is both an $L^1$ and an $L^2$ function. As before, the $L^1$ and $L^2$ norms are seminorms which are positive semi-definite, where there are nonzero functions that have zero norms.

Example 7.22
Consider the function $f : \mathbb{R} \to \mathbb{R}$ defined as
\[ f(x) = \frac{1}{\sqrt{x^2 + 1}}. \]
The integral
\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{x^2 + 1}}\,dx \]
is not convergent, but the integral
\[ \int_{-\infty}^{\infty} \frac{1}{x^2 + 1}\,dx \]
is convergent. Hence, $f : \mathbb{R} \to \mathbb{R}$ is $L^2$ but not $L^1$.

Definition 7.14 Fourier transform
Let $f : \mathbb{R} \to \mathbb{C}$ be a Riemann integrable function. The Fourier transform of $f$, denoted by $\mathcal{F}[f]$ or $\hat f$, is defined as
\[ \mathcal{F}[f](\omega) = \hat f(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\,dt \]
for all $\omega \in \mathbb{R}$ for which this improper integral is convergent.

Example 7.23
If $f : \mathbb{R} \to \mathbb{C}$ is an $L^1$ function, then for any $\omega \in \mathbb{R}$, the integral
\[ \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\,dt \]
converges absolutely. Hence, an $L^1$ function has a Fourier transform $\hat f$ which is defined on $\mathbb{R}$. In particular, a function that vanishes outside a bounded interval has a Fourier transform that is defined for all $\omega \in \mathbb{R}$.

Proposition 7.32
The Fourier transform is a linear operation. Namely, if $f : \mathbb{R} \to \mathbb{C}$ and $g : \mathbb{R} \to \mathbb{C}$ are functions that have Fourier transforms, then for any complex numbers $\alpha$ and $\beta$, the function $\alpha f + \beta g : \mathbb{R} \to \mathbb{C}$ also has a Fourier transform, and
\[ \mathcal{F}[\alpha f + \beta g] = \alpha \mathcal{F}[f] + \beta \mathcal{F}[g]. \]
Remark 7.14
In engineering, it is customary to use $t$ as the independent variable for the function $f : \mathbb{R} \to \mathbb{C}$, and $\omega$ as the independent variable for its Fourier transform $\hat f : \mathbb{R} \to \mathbb{C}$. The function $f$ is usually a function of time $t$, and its Fourier transform is a function of frequency $\omega$. Hence, the Fourier transform is a transform from the time domain to the frequency domain.

Example 7.24
Let $a$ and $b$ be two real numbers with $a < b$. Define the function $g : \mathbb{R} \to \mathbb{R}$ by
\[ g(t) = \begin{cases} 1, & \text{if } a \le t \le b, \\ 0, & \text{otherwise}. \end{cases} \]
Find the Fourier transform of $g$.

Solution
The Fourier transform of $g$ is
\[ \hat g(\omega) = \int_a^b e^{-i\omega t}\,dt = \begin{cases} \dfrac{i\left(e^{-i\omega b} - e^{-i\omega a}\right)}{\omega}, & \text{if } \omega \ne 0, \\[2mm] b - a, & \text{if } \omega = 0. \end{cases} \]

Of special interest is when $g : \mathbb{R} \to \mathbb{R}$ is given by
\[ g(t) = \begin{cases} 1, & \text{if } -a \le t \le a, \\ 0, & \text{otherwise}, \end{cases} \tag{7.7} \]
which is an even function. Example 7.24 shows that its Fourier transform is
\[ \hat g(\omega) = \frac{2 \sin a\omega}{\omega}. \]
One can show that this function is not $L^1$ but is $L^2$.
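The closed form in Example 7.24 can be sanity-checked numerically. The following is a rough Python sketch that approximates the defining integral by the midpoint rule and compares it with the formula. The helper names `fourier_transform` and `box_transform`, the step count, and the sample values of $a$, $b$, $\omega$ are all arbitrary choices made for illustration.

```python
import cmath

def fourier_transform(f, omega, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f(t) e^{-i omega t} over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        t = lo + (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * omega * t)
    return total * h

def box_transform(a, b, omega):
    """Closed form from Example 7.24 for the indicator function of [a, b]."""
    if omega == 0:
        return complex(b - a)
    return 1j * (cmath.exp(-1j * omega * b) - cmath.exp(-1j * omega * a)) / omega

a, b, omega = -1.0, 2.0, 1.7
numeric = fourier_transform(lambda t: 1.0, omega, a, b)   # g = 1 on [a, b]
exact = box_transform(a, b, omega)
gap = abs(numeric - exact)
```

In the symmetric case $[-a, a]$ the same closed form reduces to $2\sin(a\omega)/\omega$, which is real-valued, as expected for an even real function.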
Figure 7.13: The function g : R → R defined by (7.7) with a = 1.
Figure 7.14: The Fourier transform of the function $g : \mathbb{R} \to \mathbb{R}$ defined by (7.7) with $a = 1$.

Remark 7.15
A function that vanishes outside a bounded interval is said to have compact support. In general, the support of a function $f : \mathbb{R} \to \mathbb{C}$ is defined to be the closure of the set of those points $x$ such that $f(x) \ne 0$. Namely,
\[ \operatorname{support} f = \overline{\{x \in \mathbb{R} \mid f(x) \ne 0\}}. \]
Since a set is bounded if and only if its closure is bounded, a function $f$ has compact support if and only if the set of points where $f$ does not vanish is bounded.

Let us look at Fourier transforms of functions that do not have compact support.
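Example 7.25
Let $a$ be a positive number, and let $f : \mathbb{R} \to \mathbb{R}$ be the function defined as
\[ f(t) = e^{-a|t|}. \]
Find the Fourier transform of $f$.

Solution
The Fourier transform of $f$ is given by
\begin{align*}
\hat f(\omega) &= \int_{-\infty}^{\infty} e^{-a|t|} e^{-i\omega t}\,dt \\
&= \lim_{L\to\infty} \int_0^L e^{-at}\left(e^{-i\omega t} + e^{i\omega t}\right) dt \\
&= \lim_{L\to\infty} \int_0^L \left(e^{-(a + i\omega)t} + e^{-(a - i\omega)t}\right) dt \\
&= \lim_{L\to\infty} \left[-\frac{e^{-(a + i\omega)t}}{a + i\omega} - \frac{e^{-(a - i\omega)t}}{a - i\omega}\right]_0^L \\
&= \frac{1}{a + i\omega} + \frac{1}{a - i\omega} \\
&= \frac{2a}{a^2 + \omega^2}.
\end{align*}
Notice that the function $\hat f : \mathbb{R} \to \mathbb{C}$, $\hat f(\omega) = \dfrac{2a}{a^2 + \omega^2}$, is a function that is both $L^1$ and $L^2$.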
Figure 7.15: The function $f : \mathbb{R} \to \mathbb{R}$, $f(t) = e^{-|t|}$.

Figure 7.16: The function $\hat f : \mathbb{R} \to \mathbb{R}$, $\hat f(\omega) = \dfrac{2}{1 + \omega^2}$, which is the Fourier transform of $f : \mathbb{R} \to \mathbb{R}$, $f(t) = e^{-|t|}$.

A function $f : \mathbb{R} \to \mathbb{R}$ of the form
\[ f(x) = c \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \tag{7.8} \]
is the probability density function of a normal distribution with mean $\mu$ and standard deviation $\sigma$ when
\[ c = \frac{1}{\sqrt{2\pi}\,\sigma}. \]
It is also known as a Gaussian function. These functions are infinitely differentiable and they decay exponentially to 0 when $|x|$ gets large. When $\mu = 0$ and $\sigma = 1$,
\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \]
is the probability density of the standard normal distribution. The Fourier transform of the Gaussian function $f(t) = \exp\left(-\dfrac{t^2}{2}\right)$ is
\begin{align*}
\hat f(\omega) &= \int_{-\infty}^{\infty} e^{-\frac{t^2}{2}} e^{-i\omega t}\,dt \\
&= e^{-\frac{\omega^2}{2}} \int_{-\infty}^{\infty} \exp\left(-\frac{1}{2}(t + i\omega)^2\right) dt \\
&= e^{-\frac{\omega^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{t^2}{2}}\,dt \\
&= \sqrt{2\pi}\, e^{-\frac{\omega^2}{2}}.
\end{align*}
In the computation, the equality
\[ \int_{-\infty}^{\infty} \exp\left(-\frac{1}{2}(t + i\omega)^2\right) dt = \int_{-\infty}^{\infty} e^{-\frac{t^2}{2}}\,dt \]
can be understood in complex analysis as shifting contours of integration. We leave the details to the students.
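For readers who prefer to avoid contour shifting, here is a sketch of an alternative real-variable derivation. It is not the argument used in the text, and it assumes without proof that differentiation under the integral sign is justified, which is plausible here because $t e^{-t^2/2}$ is absolutely integrable. Writing $f(t) = e^{-t^2/2}$, so that $f'(t) = -t f(t)$, and integrating by parts:

\begin{align*}
\hat f'(\omega) &= \int_{-\infty}^{\infty} (-it)\, e^{-\frac{t^2}{2}} e^{-i\omega t}\,dt
= i \int_{-\infty}^{\infty} f'(t)\, e^{-i\omega t}\,dt \\
&= i \left[ f(t) e^{-i\omega t} \right]_{-\infty}^{\infty} - i \int_{-\infty}^{\infty} f(t)\,(-i\omega)\, e^{-i\omega t}\,dt
= -\omega \hat f(\omega).
\end{align*}

The boundary term vanishes because $f(t) \to 0$ as $t \to \pm\infty$. The differential equation $\hat f'(\omega) = -\omega \hat f(\omega)$ with initial value $\hat f(0) = \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sqrt{2\pi}$ has the unique solution $\hat f(\omega) = \sqrt{2\pi}\, e^{-\omega^2/2}$, in agreement with the computation above.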
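Notice that for the function $f(t) = \exp\left(-\dfrac{t^2}{2}\right)$, its Fourier transform $\hat f(\omega)$ is equal to $f(\omega)$ multiplied by $\sqrt{2\pi}$. Namely,
\[ \hat f(\omega) = \sqrt{2\pi}\, f(\omega). \]
The factor $\sqrt{2\pi}$ here is due to our normalization. Different textbooks use different conventions for Fourier transforms. Among them are the following:
\begin{align*}
&\int_{-\infty}^{\infty} f(t) e^{-i\omega t}\,dt, & &\int_{-\infty}^{\infty} f(t) e^{i\omega t}\,dt, \\
&\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t) e^{-i\omega t}\,dt, & &\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t) e^{i\omega t}\,dt, \\
&\frac{1}{2\pi}\int_{-\infty}^{\infty} f(t) e^{-i\omega t}\,dt, & &\frac{1}{2\pi}\int_{-\infty}^{\infty} f(t) e^{i\omega t}\,dt.
\end{align*}
Some might also replace $i\omega t$ by $2\pi i\omega t$. When one is reading about Fourier transforms, it is important to check the definition of the Fourier transform that is being used.

One can show that in our definition, the Fourier transform of the Gaussian function $f(t) = e^{-at^2}$ with $a > 0$ is
\[ \hat f(\omega) = \int_{-\infty}^{\infty} e^{-at^2} e^{-i\omega t}\,dt = \sqrt{\frac{\pi}{a}}\, e^{-\frac{\omega^2}{4a}}, \]
so that
\[ \frac{1}{\sqrt{2\pi}} \hat f(\omega) = \frac{1}{\sqrt{2a}}\, e^{-\frac{\omega^2}{4a}}. \]

Figure 7.17: The function $f(t) = e^{-t^2}$ and the function $g(\omega) = \dfrac{1}{\sqrt{2\pi}} \hat f(\omega) = \dfrac{1}{\sqrt{2}}\, e^{-\frac{\omega^2}{4}}$.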
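Remark 7.16
If $f : \mathbb{R} \to \mathbb{C}$ is an $L^1$ function and
\[ \|f\|_1 = \int_{-\infty}^{\infty} |f(t)|\,dt = 0, \]
we say that $f$ is $L^1$-equivalent to the zero function. If $f : \mathbb{R} \to \mathbb{C}$ is $L^1$-equivalent to the zero function, then for any $\omega \in \mathbb{R}$,
\[ \left|\hat f(\omega)\right| = \left|\int_{-\infty}^{\infty} f(t) e^{-i\omega t}\,dt\right| \le \int_{-\infty}^{\infty} \left|f(t) e^{-i\omega t}\right| dt = \int_{-\infty}^{\infty} |f(t)|\,dt = 0. \]
Thus, the Fourier transform of $f$ is identically zero.

Example 7.24 shows that the Fourier transform of an $L^1$ function is not necessarily $L^1$. Nevertheless, we have the following, which is an extension of the Riemann-Lebesgue lemma to $L^1$ functions on $\mathbb{R}$.

Theorem 7.33 Extended Riemann-Lebesgue Lemma
If the function $f : \mathbb{R} \to \mathbb{C}$ is $L^1$, then
\[ \lim_{\beta\to\infty} \int_{-\infty}^{\infty} f(t) e^{i\beta t}\,dt = 0. \]
In other words, the Fourier transform $\hat f : \mathbb{R} \to \mathbb{C}$ of $f$ is a function satisfying
\[ \lim_{\omega\to\pm\infty} \hat f(\omega) = 0. \]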
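Proof
We are given that
\[ J = \int_{-\infty}^{\infty} |f(t)|\,dt < \infty. \]
Given $\varepsilon > 0$, there is an $L > 0$ such that
\[ \int_{|t|\ge L} |f(t)|\,dt < \frac{\varepsilon}{2}. \]
By the triangle inequality, we have
\[ \left|\int_{-\infty}^{\infty} f(t) e^{i\beta t}\,dt\right| \le \left|\int_{-L}^{L} f(t) e^{i\beta t}\,dt\right| + \left|\int_{|t|\ge L} f(t) e^{i\beta t}\,dt\right|. \]
For the second term, we have
\[ \left|\int_{|t|\ge L} f(t) e^{i\beta t}\,dt\right| \le \int_{|t|\ge L} \left|f(t) e^{i\beta t}\right| dt = \int_{|t|\ge L} |f(t)|\,dt < \frac{\varepsilon}{2}. \]
By the Riemann-Lebesgue lemma,
\[ \lim_{\beta\to\infty} \int_{-L}^{L} f(t) e^{i\beta t}\,dt = 0. \]
Therefore, there exists $M > 0$ such that if $\beta > M$, then
\[ \left|\int_{-L}^{L} f(t) e^{i\beta t}\,dt\right| < \frac{\varepsilon}{2}. \]
It follows that for all $\beta > M$,
\[ \left|\int_{-\infty}^{\infty} f(t) e^{i\beta t}\,dt\right| < \varepsilon. \]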
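This proves the assertion.

The following theorem imposes a strong condition on a function $g : \mathbb{R} \to \mathbb{C}$ to be the Fourier transform of an $L^1$ function $f : \mathbb{R} \to \mathbb{C}$.

Theorem 7.34
If $f : \mathbb{R} \to \mathbb{C}$ is an $L^1$ function, then its Fourier transform $\hat f : \mathbb{R} \to \mathbb{C}$ is uniformly continuous.

Proof
We are given that
\[ J = \int_{-\infty}^{\infty} |f(t)|\,dt < \infty. \]
Without loss of generality, we can assume that $J > 0$. Notice that for any $\omega_1$ and $\omega_2$ in $\mathbb{R}$,
\[ \hat f(\omega_1) - \hat f(\omega_2) = \int_{-\infty}^{\infty} f(t)\left(e^{-i\omega_1 t} - e^{-i\omega_2 t}\right) dt. \]
Given $\varepsilon > 0$, there is an $L > 0$ such that
\[ \int_{|t|\ge L} |f(t)|\,dt < \frac{\varepsilon}{3}. \]
By the triangle inequality, we have
\begin{align*}
\left|\hat f(\omega_1) - \hat f(\omega_2)\right| &\le \int_{-\infty}^{\infty} |f(t)| \left|e^{-i\omega_1 t} - e^{-i\omega_2 t}\right| dt \\
&= \int_{|t|\le L} |f(t)| \left|e^{-i\omega_1 t} - e^{-i\omega_2 t}\right| dt + \int_{|t|\ge L} |f(t)| \left|e^{-i\omega_1 t} - e^{-i\omega_2 t}\right| dt.
\end{align*}
The second term is easy to estimate since $\left|e^{-i\omega_1 t} - e^{-i\omega_2 t}\right| \le 2$. We have
\[ \int_{|t|\ge L} |f(t)| \left|e^{-i\omega_1 t} - e^{-i\omega_2 t}\right| dt \le 2\int_{|t|\ge L} |f(t)|\,dt < \frac{2\varepsilon}{3}. \]
Since the function $g : \mathbb{R} \to \mathbb{C}$, $g(u) = e^{iu}$, is continuous at $u = 0$, there exists a $\delta > 0$ such that if $|u| < \delta$, then $|e^{iu} - 1|$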
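0,
\[ \lim_{L\to\infty} \int_0^a \frac{\sin Lx}{x}\,dx = \frac{\pi}{2}. \]

Proof
Making a change of variables, we have
\[ \int_0^a \frac{\sin Lx}{x}\,dx = \int_0^{aL} \frac{\sin x}{x}\,dx. \]
Therefore,
\[ \lim_{L\to\infty} \int_0^a \frac{\sin Lx}{x}\,dx = \lim_{L\to\infty} \int_0^{aL} \frac{\sin x}{x}\,dx = \int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2}. \]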
Now we can prove our main theorem. A function $f : \mathbb{R} \to \mathbb{C}$ is said to be strongly piecewise differentiable if it is strongly piecewise differentiable on any compact interval.

Theorem 7.38 Fourier Inversion Theorem
Let $f : \mathbb{R} \to \mathbb{C}$ be an $L^1$ function that is strongly piecewise differentiable, and let
\[ \hat f(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\,dt \]
be its Fourier transform. Then for any $x \in \mathbb{R}$,
\[ \lim_{L\to\infty} \frac{1}{2\pi}\int_{-L}^{L} \hat f(\omega) e^{i\omega x}\,d\omega = \frac{f_+(x) + f_-(x)}{2}. \]

Proof
Notice that
\begin{align*}
\int_{-L}^{L} \hat f(\omega) e^{i\omega x}\,d\omega &= \int_{-L}^{L}\int_{-\infty}^{\infty} f(t) e^{i\omega(x - t)}\,dt\,d\omega \\
&= \int_{-L}^{L}\int_{-\infty}^{\infty} f(x - t) e^{i\omega t}\,dt\,d\omega.
\end{align*}
To continue, we need a technical lemma which guarantees that we can interchange the order of integration.

Lemma 7.39
Let $f : \mathbb{R} \to \mathbb{C}$ be a function that satisfies the conditions in Theorem 7.38. Then for any $L > 0$, we have
\[ \int_{-L}^{L}\int_{-\infty}^{\infty} f(x - t) e^{i\omega t}\,dt\,d\omega = \int_{-\infty}^{\infty}\int_{-L}^{L} f(x - t) e^{i\omega t}\,d\omega\,dt. \]

Assuming this lemma, we can continue with the proof of Theorem 7.38.

Proof of Theorem 7.38 Continued
By Lemma 7.39, we have
\[ \int_{-L}^{L} \hat f(\omega) e^{i\omega x}\,d\omega = \int_{-\infty}^{\infty}\int_{-L}^{L} f(x - t) e^{i\omega t}\,d\omega\,dt. \]
Now we can evaluate the integral with respect to $\omega$ and obtain
\[ \int_{-L}^{L} \hat f(\omega) e^{i\omega x}\,d\omega = 2\int_{-\infty}^{\infty} f(x - t) \frac{\sin Lt}{t}\,dt. \]
Using the fact that $\dfrac{\sin Lt}{t}$ is an even function, we find that
\[ \int_{-L}^{L} \hat f(\omega) e^{i\omega x}\,d\omega = 2\int_0^{\infty} \frac{f(x + t) + f(x - t)}{t} \sin Lt\,dt. \]
Splitting the integral into two parts, we have
\[ \int_{-L}^{L} \hat f(\omega) e^{i\omega x}\,d\omega = 2\int_0^1 \frac{f(x + t) + f(x - t)}{t} \sin Lt\,dt + 2\int_1^{\infty} \frac{f(x + t) + f(x - t)}{t} \sin Lt\,dt. \]
Let
\[ u = \frac{f_+(x) + f_-(x)}{2}. \]
As in the proof of Lemma 7.12, the function $h : [0, 1] \to \mathbb{C}$ with
\[ h(t) = \frac{f(x + t) + f(x - t) - 2u}{t} \quad \text{when } t \in (0, 1] \]
is a Riemann integrable function. Therefore, the Riemann-Lebesgue lemma implies that
\[ \lim_{L\to\infty} \int_0^1 h(t) \sin Lt\,dt = 0. \]
It follows from Corollary 7.37 that
\[ \lim_{L\to\infty} 2\int_0^1 \frac{f(x + t) + f(x - t)}{t} \sin Lt\,dt = \lim_{L\to\infty} 4u \int_0^1 \frac{\sin Lt}{t}\,dt = 2\pi u. \]
On the other hand,
\[ \int_1^{\infty} \left|\frac{f(x + t) + f(x - t)}{t}\right| dt \le 2\int_{-\infty}^{\infty} |f(t)|\,dt < \infty. \]
By the extended Riemann-Lebesgue lemma,
\[ \lim_{L\to\infty} 2\int_1^{\infty} \frac{f(x + t) + f(x - t)}{t} \sin Lt\,dt = 0. \]
This completes the proof that
\[ \lim_{L\to\infty} \frac{1}{2\pi}\int_{-L}^{L} \hat f(\omega) e^{i\omega x}\,d\omega = u = \frac{f_+(x) + f_-(x)}{2}. \]

Now we prove Lemma 7.39.
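Proof of Lemma 7.39
Given $\varepsilon > 0$, since
\[ \int_{-\infty}^{\infty} |f(t)|\,dt < \infty, \]
there is an $M > 0$ such that
\[ \int_{|t|\ge M} |f(t)|\,dt < \frac{\varepsilon}{4L}. \]
Since $e^{i\omega t}$ is an infinitely differentiable function, and $f(t)$ is a piecewise continuous function on any compact interval, Fubini's theorem implies that
\[ \int_{-L}^{L}\int_{x-M}^{x+M} f(x - t) e^{i\omega t}\,dt\,d\omega = \int_{x-M}^{x+M}\int_{-L}^{L} f(x - t) e^{i\omega t}\,d\omega\,dt. \]
Now,
\begin{align*}
&\left|\int_{-L}^{L}\int_{-\infty}^{\infty} f(x - t) e^{i\omega t}\,dt\,d\omega - \int_{-L}^{L}\int_{x-M}^{x+M} f(x - t) e^{i\omega t}\,dt\,d\omega\right| \\
&\qquad \le \int_{-L}^{L}\int_{|x-t|\ge M} |f(x - t)|\,dt\,d\omega \le 2L\int_{|t|\ge M} |f(t)|\,dt < \frac{\varepsilon}{2}.
\end{align*}
On the other hand, since $|\sin Lt| \le L|t|$ for all $t \in \mathbb{R}$, we have
\begin{align*}
&\left|\int_{-\infty}^{\infty}\int_{-L}^{L} f(x - t) e^{i\omega t}\,d\omega\,dt - \int_{x-M}^{x+M}\int_{-L}^{L} f(x - t) e^{i\omega t}\,d\omega\,dt\right| \\
&\qquad \le 2\int_{|x-t|\ge M} |f(x - t)| \left|\frac{\sin Lt}{t}\right| dt \le 2L\int_{|t|\ge M} |f(t)|\,dt < \frac{\varepsilon}{2}.
\end{align*}
This proves that
\[ \left|\int_{-L}^{L}\int_{-\infty}^{\infty} f(x - t) e^{i\omega t}\,dt\,d\omega - \int_{-\infty}^{\infty}\int_{-L}^{L} f(x - t) e^{i\omega t}\,d\omega\,dt\right| < \varepsilon. \]
Since $\varepsilon > 0$ is arbitrary, the assertion follows.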
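Corollary 7.40
Let $f : \mathbb{R} \to \mathbb{C}$ be an $L^1$ function that is continuous and strongly piecewise differentiable, and let
\[ \hat f(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\,dt \]
be its Fourier transform. If $\hat f : \mathbb{R} \to \mathbb{C}$ is also an $L^1$ function, then for any $t \in \mathbb{R}$,
\[ \mathcal{F}^{-1}[\hat f](t) = f(t). \]

Example 7.30
Since the function $g : \mathbb{R} \to \mathbb{R}$,
\[ g(t) = \begin{cases} 1, & \text{if } -a \le t \le a, \\ 0, & \text{otherwise}, \end{cases} \]
is a strongly piecewise differentiable $L^1$ function with Fourier transform
\[ \hat g(\omega) = \frac{2 \sin a\omega}{\omega}, \]
the Fourier inversion theorem implies that for $|t| < a$,
\[ \lim_{L\to\infty} \frac{1}{\pi}\int_{-L}^{L} \frac{\sin a\omega}{\omega} e^{i\omega t}\,d\omega = 1, \]
while if $|t| > a$,
\[ \lim_{L\to\infty} \frac{1}{\pi}\int_{-L}^{L} \frac{\sin a\omega}{\omega} e^{i\omega t}\,d\omega = 0, \]
and for $|t| = a$,
\[ \lim_{L\to\infty} \frac{1}{\pi}\int_{-L}^{L} \frac{\sin a\omega}{\omega} e^{i\omega t}\,d\omega = \frac{1}{2}. \]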
If $f : \mathbb{R} \to \mathbb{C}$ and $g : \mathbb{R} \to \mathbb{C}$ are $L^2$ functions, then the Cauchy-Schwarz inequality implies that for any $L > 0$,
\begin{align*}
\left(\int_{-L}^{L} |f(t) g(t)|\,dt\right)^2 &\le \left(\int_{-L}^{L} |f(t)|^2\,dt\right)\left(\int_{-L}^{L} |g(t)|^2\,dt\right) \\
&\le \left(\int_{-\infty}^{\infty} |f(t)|^2\,dt\right)\left(\int_{-\infty}^{\infty} |g(t)|^2\,dt\right).
\end{align*}
This implies that the improper integral
\[ \int_{-\infty}^{\infty} f(t)\overline{g(t)}\,dt \]
converges absolutely. Thus, we can define a positive semi-definite inner product on the space of $L^2$ functions on $\mathbb{R}$ by
\[ \langle f, g\rangle = \int_{-\infty}^{\infty} f(t)\overline{g(t)}\,dt. \]
The $L^2$ semi-norm is the norm induced by this inner product. The following is a generalization of Parseval's identity to Fourier transforms.

Theorem 7.41 Parseval-Plancherel Identity
If $f : \mathbb{R} \to \mathbb{C}$ is a Riemann integrable function that is both $L^1$ and $L^2$, then its Fourier transform $\hat f : \mathbb{R} \to \mathbb{C}$ is an $L^2$ function. Moreover,
\[ \|f\|_2^2 = \frac{1}{2\pi}\|\hat f\|_2^2. \]
Namely,
\[ \int_{-\infty}^{\infty} |f(t)|^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} |\hat f(\omega)|^2\,d\omega. \tag{7.10} \]

Sketch of Proof
A rigorous proof of this theorem requires advanced tools in analysis. We give a heuristic argument for the validity of the formula (7.10) under the additional assumption that $f : \mathbb{R} \to \mathbb{C}$ is continuous and strongly piecewise differentiable, and $\hat f : \mathbb{R} \to \mathbb{C}$ is also $L^1$.
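Since $f : \mathbb{R} \to \mathbb{C}$ is a continuous and strongly piecewise differentiable $L^1$ function whose Fourier transform is also $L^1$, the Fourier inversion theorem implies that for all $t \in \mathbb{R}$,
\[ f(t) = \mathcal{F}^{-1}[\hat f](t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat f(\omega) e^{i\omega t}\,d\omega. \]
Notice that $\hat f : \mathbb{R} \to \mathbb{C}$ is an $L^2$ function if the limit
\[ \lim_{L\to\infty} \int_{-L}^{L} \left|\hat f(\omega)\right|^2 d\omega \]
exists. By the definition of the Fourier transform,
\[ \int_{-L}^{L} \left|\hat f(\omega)\right|^2 d\omega = \int_{-L}^{L} \overline{\hat f(\omega)} \int_{-\infty}^{\infty} f(t) e^{-it\omega}\,dt\,d\omega. \]
By Theorem 7.34, $\hat f(\omega)$ is uniformly continuous. By the Riemann-Lebesgue lemma, $\lim_{\omega\to\pm\infty} \hat f(\omega) = 0$. These imply that the function $\hat f : \mathbb{R} \to \mathbb{C}$ is bounded. Using the same reasoning as in the proof of Lemma 7.39, we can interchange the order of integration and obtain
\[ \int_{-L}^{L} \left|\hat f(\omega)\right|^2 d\omega = \int_{-\infty}^{\infty} f(t) \int_{-L}^{L} \overline{\hat f(\omega)}\, e^{-it\omega}\,d\omega\,dt. \]
Since $\hat f : \mathbb{R} \to \mathbb{C}$ is $L^1$, we can take the $L \to \infty$ limit under the integral sign. Since
\[ \lim_{L\to\infty} \int_{-L}^{L} \overline{\hat f(\omega)}\, e^{-it\omega}\,d\omega = \overline{\int_{-\infty}^{\infty} \hat f(\omega) e^{i\omega t}\,d\omega} = 2\pi \overline{f(t)}, \]
we conclude that
\[ \lim_{L\to\infty} \int_{-L}^{L} \left|\hat f(\omega)\right|^2 d\omega = 2\pi\int_{-\infty}^{\infty} |f(t)|^2\,dt. \]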
Example 7.31
For the function $f : \mathbb{R} \to \mathbb{C}$, $f(t) = e^{-a|t|}$ with $a > 0$, its Fourier transform is $\hat f : \mathbb{R} \to \mathbb{C}$, $\hat f(\omega) = \dfrac{2a}{a^2 + \omega^2}$. Notice that
\[ \int_{-\infty}^{\infty} |f(t)|^2\,dt = 2\int_0^{\infty} e^{-2at}\,dt = \frac{1}{a}. \]
The Parseval-Plancherel formula implies that
\[ \int_{-\infty}^{\infty} \frac{1}{(a^2 + \omega^2)^2}\,d\omega = \frac{1}{4a^2}\int_{-\infty}^{\infty} \left|\hat f(\omega)\right|^2 d\omega = \frac{\pi}{2a^2}\int_{-\infty}^{\infty} |f(t)|^2\,dt = \frac{\pi}{2a^3}. \]

One of the applications of Fourier transforms is to solve differential equations. For this we need the following.

Theorem 7.42
Let $f : \mathbb{R} \to \mathbb{C}$ be a continuously differentiable $L^1$ function such that
\[ \lim_{t\to\pm\infty} f(t) = 0, \]
and its derivative $f' : \mathbb{R} \to \mathbb{C}$ is a Riemann integrable function that has a Fourier transform. Then
\[ \mathcal{F}[f'](\omega) = i\omega \mathcal{F}[f](\omega). \]

Proof
This follows from integration by parts. For any $a$ and $b$ with $a < b$,
\[ \int_a^b f'(t) e^{-it\omega}\,dt = \left[f(t) e^{-it\omega}\right]_a^b + i\omega\int_a^b f(t) e^{-it\omega}\,dt. \]
The assertion follows by taking the limits $a \to -\infty$ and $b \to \infty$.

Example 7.32
Find the Fourier transform of the function $f : \mathbb{R} \to \mathbb{C}$, $f(t) = t^2 e^{-t^2}$.
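Solution
Let $g : \mathbb{R} \to \mathbb{C}$ be the function
\[ g(t) = e^{-t^2}. \]
Then
\[ g'(t) = -2t e^{-t^2}, \qquad g''(t) = -2e^{-t^2} + 4t^2 e^{-t^2} = -2g(t) + 4f(t). \]
Since
\[ \lim_{t\to\pm\infty} g(t) = 0, \qquad \lim_{t\to\pm\infty} g'(t) = 0, \]
and
\[ \mathcal{F}[g](\omega) = \sqrt{\pi}\, e^{-\frac{\omega^2}{4}}, \]
Theorem 7.42 implies that
\[ \mathcal{F}[g''](\omega) = -\sqrt{\pi}\,\omega^2 e^{-\frac{\omega^2}{4}}. \]
By linearity,
\[ \mathcal{F}[g''] = -2\mathcal{F}[g] + 4\mathcal{F}[f]. \]
Therefore,
\[ \mathcal{F}[f](\omega) = \sqrt{\pi}\left(\frac{1}{2} - \frac{\omega^2}{4}\right) e^{-\frac{\omega^2}{4}}. \]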
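The answer to Example 7.32 can be checked numerically. The sketch below is a rough illustration only: it truncates the integral to $[-10, 10]$ (an arbitrary cutoff, harmless here because $t^2 e^{-t^2}$ is negligible beyond it) and applies the midpoint rule. The names `transform` and `claimed` are hypothetical helpers, not notation from the text.

```python
import cmath
import math

def transform(f, omega, cutoff=10.0, steps=40000):
    """Midpoint-rule approximation of the Fourier transform of f,
    with the integral truncated to [-cutoff, cutoff]."""
    h = 2 * cutoff / steps
    total = 0j
    for k in range(steps):
        t = -cutoff + (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * omega * t)
    return total * h

def f(t):
    return t * t * math.exp(-t * t)

def claimed(omega):
    """Closed form obtained in Example 7.32."""
    return math.sqrt(math.pi) * (0.5 - omega ** 2 / 4) * math.exp(-omega ** 2 / 4)

# Largest discrepancy over a few sample frequencies.
worst = max(abs(transform(f, w) - claimed(w)) for w in (0.0, 1.0, 2.5))
```

At $\omega = 0$ the formula gives $\sqrt{\pi}/2$, which is the familiar value of $\int_{-\infty}^{\infty} t^2 e^{-t^2}\,dt$.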
In the following, we consider an operation called convolution.

Definition 7.17 Convolution
Let $f : \mathbb{R} \to \mathbb{C}$ and $g : \mathbb{R} \to \mathbb{C}$ be Riemann integrable functions. The convolution of $f$ and $g$ is the function defined as
\[ (f * g)(x) = \int_{-\infty}^{\infty} f(x - t) g(t)\,dt = \int_{-\infty}^{\infty} f(t) g(x - t)\,dt, \]
whenever this integral is convergent.

Notice that the improper integral defining $f * g$ is convergent for any $x$ in $\mathbb{R}$ when $f$ and $g$ are $L^2$ functions. Convolutions can be defined for a wider class of functions. For example, if the supports of the functions $f$ and $g$ are both contained in $[0, \infty)$, then the integrand is only nonzero when $0 \le t \le x$. This gives
\[ (f * g)(x) = \int_0^x f(t) g(x - t)\,dt, \]
which is also well-defined for any $x \in \mathbb{R}$. In fact, this is the convolution one sees in the theory of Laplace transforms.

Example 7.33
Let $f : \mathbb{R} \to \mathbb{R}$ be the function defined as
\[ f(x) = \begin{cases} 1, & \text{if } 0 \le x \le 1, \\ 0, & \text{otherwise}, \end{cases} \tag{7.11} \]
and let $g : \mathbb{R} \to \mathbb{R}$ be the function $g(x) = x$. Then
\[ (f * g)(x) = \int_{-\infty}^{\infty} f(t) g(x - t)\,dt = \int_0^1 (x - t)\,dt = x - \frac{1}{2}. \]
For $f * f$, we have
\[ (f * f)(x) = \int_0^1 f(x - t)\,dt = \int_{x-1}^{x} f(t)\,dt = \begin{cases} 0, & \text{if } x < 0, \\ x, & \text{if } 0 \le x < 1, \\ 2 - x, & \text{if } 1 \le x \le 2, \\ 0, & \text{if } x > 2. \end{cases} \]
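The two convolutions in Example 7.33 can be reproduced numerically. Here is a minimal Python sketch using the midpoint rule. The helper names (`box`, `convolve_at`, `triangle`), the integration window $[-1, 3]$, and the step count are assumptions chosen for this illustration; the window only needs to contain the support $[0, 1]$ of $f$.

```python
def box(x):
    """The function f of (7.11): indicator of [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolve_at(f, g, x, lo=-1.0, hi=3.0, steps=40000):
    """Midpoint-rule approximation of the integral of f(t) g(x - t) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * h
        total += f(t) * g(x - t)
    return total * h

def triangle(x):
    """Closed form of f * f from Example 7.33."""
    if 0.0 <= x < 1.0:
        return x
    if 1.0 <= x <= 2.0:
        return 2.0 - x
    return 0.0

samples = [-0.5, 0.3, 1.0, 1.4, 2.5]
ff_error = max(abs(convolve_at(box, box, x) - triangle(x)) for x in samples)
fg_error = max(abs(convolve_at(box, lambda s: s, x) - (x - 0.5)) for x in samples)
```

The jump discontinuities of `box` limit the accuracy of the midpoint rule to roughly the step size, so the comparison should only be read up to a small tolerance.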
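Figure 7.18: The function $f : \mathbb{R} \to \mathbb{R}$ defined by (7.11) and the function $f * f$.

Convolution usually smooths a function, as shown in Figure 7.18. In the theory of Fourier transforms, convolution plays an important role because of the following.

Theorem 7.43
Let $f : \mathbb{R} \to \mathbb{C}$ and $g : \mathbb{R} \to \mathbb{C}$ be functions that are both $L^1$ and $L^2$. Then $(f * g) : \mathbb{R} \to \mathbb{C}$ is an $L^1$ function and
\[ \mathcal{F}[f * g] = \mathcal{F}[f]\,\mathcal{F}[g]. \]

Sketch of Proof
Fubini's theorem implies that
\begin{align*}
\int_{-\infty}^{\infty} |(f * g)(x)|\,dx &\le \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |f(x - t)||g(t)|\,dt\,dx \\
&\le \int_{-\infty}^{\infty} |g(t)| \int_{-\infty}^{\infty} |f(x - t)|\,dx\,dt \\
&= \|f\|_1 \int_{-\infty}^{\infty} |g(t)|\,dt = \|f\|_1 \|g\|_1.
\end{align*}
This shows that $f * g$ is an $L^1$ function. By Fubini's theorem again, we have
\begin{align*}
\mathcal{F}[f * g](\omega) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x - t) g(t)\,dt\, e^{-i\omega x}\,dx \\
&= \int_{-\infty}^{\infty} g(t) \int_{-\infty}^{\infty} f(x - t) e^{-i\omega(x - t)}\,dx\, e^{-i\omega t}\,dt \\
&= \mathcal{F}[f](\omega) \int_{-\infty}^{\infty} g(t) e^{-i\omega t}\,dt = \mathcal{F}[f](\omega)\,\mathcal{F}[g](\omega).
\end{align*}

Example 7.34
Find the Fourier transform of the function $g : \mathbb{R} \to \mathbb{R}$,
\[ g(t) = \begin{cases} 0, & \text{if } t < 0, \\ t, & \text{if } 0 \le t < 1, \\ 2 - t, & \text{if } 1 \le t \le 2, \\ 0, & \text{if } t > 2. \end{cases} \]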
Solution
By Example 7.33, $g = f * f$, where $f$ is the function given by (7.11). The Fourier transform of $f$ is
\[ \hat f(\omega) = \frac{1 - e^{-i\omega}}{i\omega} = e^{-\frac{i\omega}{2}}\, \frac{2\sin\frac{\omega}{2}}{\omega}. \]
Therefore, the Fourier transform of $g$ is
\[ \hat g(\omega) = \hat f(\omega) \times \hat f(\omega) = e^{-i\omega}\, \frac{4\sin^2\frac{\omega}{2}}{\omega^2}. \]

Now we list some other useful properties of Fourier transforms. The proofs are left as exercises.

Theorem 7.44
Let $f : \mathbb{R} \to \mathbb{C}$ be an $L^1$ function and let $a$ be a real number.
(a) If $g : \mathbb{R} \to \mathbb{C}$ is the function $g(t) = f(t - a)$, then $\hat g(\omega) = e^{-ia\omega} \hat f(\omega)$.
(b) If $h : \mathbb{R} \to \mathbb{C}$ is the function $h(t) = f(t) e^{iat}$, then $\hat h(\omega) = \hat f(\omega - a)$.

Example 7.35
Let us consider solving a partial differential equation of the form
\[ \frac{\partial^2 u}{\partial t^2} - c^2 \frac{\partial^2 u}{\partial x^2} = 0, \tag{7.12} \]
where $c$ is a positive constant. This is called the wave equation. The function $u$ is a function of $(t, x) \in \mathbb{R}^2$. For simplicity, we assume that $u$, $u_t$, $u_{tt}$ are infinitely differentiable bounded $L^1$ functions which decay to 0 when $t \to \pm\infty$. Let $\hat u(\omega, x)$ be the Fourier transform of $u$ with respect to the variable $t$. Then
\[ \mathcal{F}[u_{tt}](\omega, x) = -\omega^2 \hat u(\omega, x). \]
It can be justified that
\[ \mathcal{F}[u_{xx}] = \frac{\partial^2}{\partial x^2} \mathcal{F}[u]. \]
Thus, under the Fourier transform with respect to $t$, the partial differential equation (7.12) is transformed to a second order ordinary differential equation
\[ \hat u_{xx}(\omega, x) + \frac{\omega^2}{c^2} \hat u(\omega, x) = 0 \tag{7.13} \]
with respect to the variable $x$. The general solution is
\[ \hat u(\omega, x) = A(\omega) e^{\frac{i\omega}{c}x} + B(\omega) e^{-\frac{i\omega}{c}x} \]
for some infinitely differentiable functions $A(\omega)$ and $B(\omega)$. Assume that
\[ \mathcal{F}^{-1}[A](t) = \widetilde{A}(t), \qquad \mathcal{F}^{-1}[B](t) = \widetilde{B}(t). \]
Then $A(\omega) e^{\frac{i\omega}{c}x}$ and $B(\omega) e^{-\frac{i\omega}{c}x}$ are the Fourier transforms of the functions
\[ \widetilde{A}\left(t + \frac{x}{c}\right) \quad \text{and} \quad \widetilde{B}\left(t - \frac{x}{c}\right) \]
respectively. These give
\[ u(t, x) = \widetilde{A}\left(t + \frac{x}{c}\right) + \widetilde{B}\left(t - \frac{x}{c}\right). \]
Let
\[ \phi(t) = \widetilde{A}\left(\frac{t}{c}\right) \quad \text{and} \quad \psi(t) = \widetilde{B}\left(-\frac{t}{c}\right). \]
Then
\[ u(t, x) = \phi(x + ct) + \psi(x - ct). \]
This shows that the solution of the wave equation can be written as a sum of a left-travelling wave $\phi(x + ct)$ and a right-travelling wave $\psi(x - ct)$.
Exercises 7.5

Question 1
Let $f : \mathbb{R} \to \mathbb{C}$ be an $L^1$ function and let $a$ be a real number. Define the function $g : \mathbb{R} \to \mathbb{C}$ by
\[ g(t) = f(t - a). \]
Show that $\hat g(\omega) = e^{-ia\omega} \hat f(\omega)$.

Question 2
Let $f : \mathbb{R} \to \mathbb{C}$ be an $L^1$ function and let $a$ be a real number. Define the function $g : \mathbb{R} \to \mathbb{C}$ by
\[ g(t) = f(t) e^{iat}. \]
Show that $\hat g(\omega) = \hat f(\omega - a)$.

Question 3
Find the Fourier transform of the function $f : \mathbb{R} \to \mathbb{C}$.
(a) $f(t) = \dfrac{1}{t^2 + 4}$
(b) $f(t) = \dfrac{1}{t^2 + 4t + 13}$
(c) $f(t) = \dfrac{\sin t}{t^2 + 4t + 13}$

Question 4
Let $f : \mathbb{R} \to \mathbb{R}$ be the function $f(t) = e^{-3|t|}$, and let $g : \mathbb{R} \to \mathbb{R}$ be the function $g(t) = (f * f)(t)$. Use the convolution theorem to find the Fourier transform of the function $g : \mathbb{R} \to \mathbb{R}$.

Question 5
Let $a$ and $b$ be two distinct positive numbers, and let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ be the functions $f(t) = e^{-at^2}$ and $g(t) = e^{-bt^2}$. Find the function $h : \mathbb{R} \to \mathbb{R}$ defined as $h(t) = (f * g)(t)$.

Question 6
Let $f : \mathbb{R} \to \mathbb{C}$ be a bounded $L^1$ function. Show that $f$ is $L^2$.
Appendix A Sylvester's Criterion

In this section, we give a proof of Sylvester's criterion, which gives a necessary and sufficient condition for a symmetric matrix to be positive definite. The proof uses the LDU factorization of a matrix.

Given an $n \times n$ matrix $A$ and an integer $1 \le k \le n$, the $k$th principal submatrix of $A$, denoted by $M_k(A)$, is the $k \times k$ matrix consisting of the first $k$ rows and first $k$ columns of $A$. Sylvester's criterion is the following.

Theorem A.1 Sylvester's Criterion for Positive Definiteness
An $n \times n$ symmetric matrix $A$ is positive definite if and only if $\det M_k > 0$ for all $1 \le k \le n$, where $M_k$ is its $k$th principal submatrix.

For a positive integer $n$, let $\mathcal{M}_n$ be the vector space of $n \times n$ matrices, and let $\mathcal{L}_n$, $\mathcal{U}_n$ and $\mathcal{D}_n$ be respectively the subspaces that consist of lower triangular, upper triangular, and diagonal matrices. Also, let
\begin{align*}
\widetilde{\mathcal{L}}_n &= \{L \in \mathcal{L}_n \mid \text{all the diagonal entries of } L \text{ are equal to } 1\}, \\
\widetilde{\mathcal{U}}_n &= \{U \in \mathcal{U}_n \mid \text{all the diagonal entries of } U \text{ are equal to } 1\}.
\end{align*}
Notice that $L$ is in $\widetilde{\mathcal{L}}_n$ if and only if its transpose $L^T$ is in $\widetilde{\mathcal{U}}_n$.

The set of $n \times n$ invertible matrices is a group under matrix multiplication. This group is denoted by $GL(n, \mathbb{R})$, and is called the general linear group. As a set, it is the subset of $\mathcal{M}_n$ that consists of all the matrices $A$ with $\det A \ne 0$. The group $GL(n, \mathbb{R})$ has a subgroup consisting of all the matrices with determinant 1, which is denoted by $SL(n, \mathbb{R})$ and called the special linear group. The sets $\widetilde{\mathcal{L}}_n$ and $\widetilde{\mathcal{U}}_n$ are subgroups of $SL(n, \mathbb{R})$.

If $A$ is an $n \times n$ matrix, an LDU factorization of $A$ is a factorization of the form
\[ A = LDU, \]
where $L \in \widetilde{\mathcal{L}}_n$, $D \in \mathcal{D}_n$, and $U \in \widetilde{\mathcal{U}}_n$. Notice that $\det A = \det D$. Hence, $A$ is invertible if and only if all the diagonal entries of $D$ are nonzero.

The following proposition says that the LDU decomposition of an invertible matrix is unique.

Proposition A.2 Uniqueness of LDU Factorization
If $A$ is an $n \times n$ invertible matrix that has an LDU factorization, then the factorization is unique.

Proof
We need to prove that if $L_1, L_2$ are in $\widetilde{\mathcal{L}}_n$, $U_1, U_2$ are in $\widetilde{\mathcal{U}}_n$, $D_1, D_2$ are in $\mathcal{D}_n$, and
\[ L_1 D_1 U_1 = L_2 D_2 U_2, \]
then $L_1 = L_2$, $U_1 = U_2$ and $D_1 = D_2$. Let $L = L_2^{-1} L_1$ and $U = U_2 U_1^{-1}$. Then
\[ L D_1 = D_2 U. \]
Notice that $L$ is in $\widetilde{\mathcal{L}}_n$ and $L D_1$ is in $\mathcal{L}_n$. Similarly, $U$ is in $\widetilde{\mathcal{U}}_n$ and $D_2 U$ is in $\mathcal{U}_n$. The intersection of $\mathcal{L}_n$ and $\mathcal{U}_n$ is $\mathcal{D}_n$. Thus, there exists $D \in \mathcal{D}_n$ such that
\[ L D_1 = D_2 U = D. \]
Since $A$ is invertible, $D_1$ and $D_2$ are invertible. Hence,
\[ L = D D_1^{-1} \quad \text{and} \quad U = D_2^{-1} D \]
are diagonal matrices. Since all the diagonal entries of $L$ and $U$ are 1, we find that $D D_1^{-1} = I_n$ and $D_2^{-1} D = I_n$, where $I_n$ is the $n \times n$ identity matrix. This proves that $D_1 = D = D_2$. But then $L = I_n = U$, which implies that $L_1 = L_2$ and $U_1 = U_2$.
Corollary A.3
(i) Given $L_0 \in \mathcal{L}_n$, if $L_0$ is invertible, it has a unique LDU decomposition with $U = I_n$, the $n \times n$ identity matrix.
(ii) Given $U_0 \in \mathcal{U}_n$, if $U_0$ is invertible, it has a unique LDU decomposition with $L = I_n$, the $n \times n$ identity matrix.

Proof
It suffices to establish (i). The uniqueness is asserted in Proposition A.2. For the existence, let $L_0 = [a_{ij}]$, where $a_{ij} = 0$ if $i < j$. Since $L_0$ is invertible, $a_{ii} \ne 0$ for all $1 \le i \le n$. Let $D = [d_{ij}]$ be the diagonal matrix with $d_{ii} = a_{ii}$ for $1 \le i \le n$. Then $D$ is invertible. Define $L = L_0 D^{-1}$. Then $L$ is a lower triangular matrix and for $1 \le i \le n$,
\[ L_{ii} = a_{ii} d_{ii}^{-1} = 1. \]
This shows that $L$ is in $\widetilde{\mathcal{L}}_n$. Thus, $L_0 = LD$ is the LDU decomposition of $L_0$ with $U = I_n$.

The following lemma says that multiplying by a matrix $L$ in $\widetilde{\mathcal{L}}_n$ does not affect the determinants of the principal submatrices.

Lemma A.4
Let $A$ be an $n \times n$ matrix, and let $L$ be a matrix in $\widetilde{\mathcal{L}}_n$. If $B = LA$, then for $1 \le k \le n$,
\[ \det M_k(B) = \det M_k(A). \]

Proof
For an $n \times n$ matrix $C$, we partition it into four blocks
\[ C = \begin{bmatrix} M_k(C) & N_k(C) \\ P_k(C) & Q_k(C) \end{bmatrix}. \]
For $L \in \widetilde{\mathcal{L}}_n$, $M_k(L)$ is in $\widetilde{\mathcal{L}}_k$, and $N_k(L)$ is the zero matrix. Now $B = LA$ implies that
\[ \begin{bmatrix} M_k(B) & N_k(B) \\ P_k(B) & Q_k(B) \end{bmatrix} = \begin{bmatrix} M_k(L) & N_k(L) \\ P_k(L) & Q_k(L) \end{bmatrix} \begin{bmatrix} M_k(A) & N_k(A) \\ P_k(A) & Q_k(A) \end{bmatrix}. \]
This implies that
\[ M_k(B) = M_k(L) M_k(A) + N_k(L) P_k(A) = M_k(L) M_k(A). \]
Since $M_k(L) \in \widetilde{\mathcal{L}}_k$, $\det M_k(L) = 1$. Therefore,
\[ \det M_k(B) = \det M_k(A). \]

Lemma A.4 has an upper triangular counterpart.

Corollary A.5
Let $A$ be an $n \times n$ matrix, and let $U$ be a matrix in $\widetilde{\mathcal{U}}_n$. If $B = AU$, then for $1 \le k \le n$,
\[ \det M_k(B) = \det M_k(A). \]

Sketch of Proof
Notice that $M_k(B^T) = M_k(B)^T$, and $B^T = U^T A^T$, where $U^T$ is in $\widetilde{\mathcal{L}}_n$. The result follows from the fact that $\det C^T = \det C$ for any $k \times k$ matrix $C$.

Now we prove the following theorem, which asserts the existence of an LDU decomposition for a matrix $A$ with $\det M_k(A) \ne 0$ for all $1 \le k \le n$.

Theorem A.6
Let $A = [a_{ij}]$ be an $n \times n$ matrix such that $\det M_k(A) \ne 0$ for all $1 \le k \le n$. Then $A$ has a unique LDU decomposition.
Proof
Notice that $M_n(A) = A$. Since we assume that $\det M_n(A) \ne 0$, $A$ is invertible. The uniqueness of the LDU decomposition of $A$ is asserted in Proposition A.2.

We prove the statement by induction on $n$. When $n = 1$, take $L = U = [1]$ and $D = A = [a]$ itself. Then $A = LDU$ is the LDU decomposition of $A$.

Let $n \ge 2$. Suppose we have proved that any $(n - 1) \times (n - 1)$ matrix $B$ that satisfies $\det M_k(B) \ne 0$ for $1 \le k \le n - 1$ has a unique LDU decomposition. Now assume that $A$ is an $n \times n$ matrix with $\det M_k(A) \ne 0$ for all $1 \le k \le n$. Since $\det M_1(A) = a_{11}$, $a = a_{11} \ne 0$. Let $L_1 = [L_{ij}]$ be the matrix in $\widetilde{\mathcal{L}}_n$ such that for $2 \le i \le n$,
\[ L_{i1} = \frac{a_{i1}}{a}, \]
and for $2 \le j < i \le n$, $L_{ij} = 0$. Namely,
\[ L_1 = \begin{bmatrix} 1 & 0 \\ P_1(L_1) & I_{n-1} \end{bmatrix}, \quad \text{where} \quad P_1(L_1) = \frac{1}{a} P_1(A). \]
Notice that
\[ L_1^{-1} = \begin{bmatrix} 1 & 0 \\ -P_1(L_1) & I_{n-1} \end{bmatrix}, \]
and
\[ C = L_1^{-1} A = \begin{bmatrix} 1 & 0 \\ -P_1(L_1) & I_{n-1} \end{bmatrix} \begin{bmatrix} a & N_1(A) \\ P_1(A) & Q_1(A) \end{bmatrix} = \begin{bmatrix} a & N_1(C) \\ P_1(C) & Q_1(C) \end{bmatrix} \]
is a matrix with
\[ P_1(C) = -a P_1(L_1) + P_1(A) = 0. \]
By Lemma A.4,
\[ \det M_k(C) = \det M_k(A) \quad \text{for all } 1 \le k \le n. \]
Let $B = Q_1(C)$. Then $B$ is an $(n - 1) \times (n - 1)$ matrix. Since $P_1(C) = 0$, we find that for $1 \le k \le n - 1$,
\[ \det M_{k+1}(C) = a \det M_k(B). \]
This shows that
\[ \det M_k(B) \ne 0 \quad \text{for all } 1 \le k \le n - 1. \]
By the inductive hypothesis, $B$ has a unique LDU decomposition given by
\[ B = L_B D_B U_B. \]
Now let $L_2$ be the matrix in $\widetilde{\mathcal{L}}_n$ given by
\[ L_2 = \begin{bmatrix} 1 & 0 \\ 0 & L_B \end{bmatrix}, \quad \text{so that} \quad L_2^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & L_B^{-1} \end{bmatrix}. \]
One can check that
\[ (L_1 L_2)^{-1} A = L_2^{-1} C = \begin{bmatrix} a & N_1(C) \\ 0 & D_B U_B \end{bmatrix}. \]
Let $L = L_1 L_2$. Then $L$ is in $\widetilde{\mathcal{L}}_n$. Since $D_B U_B$ is an upper triangular $(n - 1) \times (n - 1)$ matrix, $L^{-1} A$ is an upper triangular $n \times n$ matrix. By Corollary A.3, $L^{-1} A$ has a decomposition
\[ L^{-1} A = DU, \]
where $D \in \mathcal{D}_n$ and $U \in \widetilde{\mathcal{U}}_n$. Thus, $A = LDU$ is the LDU decomposition of $A$.

Now we can complete the proof of Sylvester's criterion for a symmetric matrix to be positive definite.
Proof of Sylvester's Criterion
Let $A$ be an $n \times n$ symmetric matrix. First we prove that if $A$ is positive definite, then for $1 \le k \le n$, $\det M_k(A) > 0$. Notice that $M_k(A)$ is also a symmetric matrix. For $\mathbf{u} \in \mathbb{R}^k$, let $\mathbf{v}$ be the vector in $\mathbb{R}^n$ given by $\mathbf{v} = (\mathbf{u}, 0, \ldots, 0)$. Then
\[ \mathbf{v}^T A \mathbf{v} = \mathbf{u}^T M_k(A) \mathbf{u}. \]
This shows that $M_k(A)$ is also positive definite. Hence, all the eigenvalues of $M_k(A)$ must be positive. This implies that $\det M_k(A) > 0$.

Conversely, assume that $\det M_k(A) > 0$ for all $1 \le k \le n$. By Theorem A.6, $A$ has an LDU decomposition given by
\[ A = LDU. \]
Since $A$ is symmetric, $A^T = A$. This gives
\[ U^T D^T L^T = A^T = A = LDU. \]
Since $U^T$ is in $\widetilde{\mathcal{L}}_n$ and $L^T$ is in $\widetilde{\mathcal{U}}_n$, the uniqueness of the LDU decomposition implies that $U = L^T$. Hence,
\[ A = L D L^T. \]
By Lemma A.4 and Corollary A.5,
\[ \det M_k(A) = \det M_k(D). \]
If $D = [d_{ij}]$, let $\tau_i = d_{ii}$. Then
\[ \det M_k(D) = \tau_1 \tau_2 \cdots \tau_k. \]
Since $\det M_k(A) > 0$ for all $1 \le k \le n$, $\tau_i > 0$ for all $1 \le i \le n$. By the invertible change of coordinates $\mathbf{y} = L^T \mathbf{x}$, we find that if $\mathbf{x} \in \mathbb{R}^n \setminus \{\mathbf{0}\}$,
\[ \mathbf{x}^T A \mathbf{x} = \mathbf{y}^T D \mathbf{y} = \tau_1 y_1^2 + \tau_2 y_2^2 + \cdots + \tau_n y_n^2 > 0. \]
This proves that $A$ is positive definite.
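The relation $\det M_k(A) = \tau_1 \tau_2 \cdots \tau_k$ between the leading principal minors and the diagonal of $D$ can be illustrated on a small example. The following Python sketch uses exact rational arithmetic. The function names (`det`, `leading_minors`, `ldu_pivots`, `is_positive_definite`) and the sample matrix are chosen for illustration only. The elimination routine assumes, as in Theorem A.6, that every leading principal minor is nonzero, so that no row exchanges are needed.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def leading_minors(A):
    """The determinants det M_k(A) for k = 1, ..., n."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def ldu_pivots(A):
    """Diagonal entries of D in A = LDU, found by elimination without row exchanges.

    Assumes every leading principal minor of A is nonzero.
    """
    n = len(A)
    W = [[Fraction(x) for x in row] for row in A]
    for j in range(n):
        for i in range(j + 1, n):
            factor = W[i][j] / W[j][j]
            W[i] = [wi - factor * wj for wi, wj in zip(W[i], W[j])]
    return [W[j][j] for j in range(n)]

def is_positive_definite(A):
    """Sylvester's criterion for a symmetric matrix A."""
    return all(m > 0 for m in leading_minors(A))

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
minors = leading_minors(A)
pivots = ldu_pivots(A)
```

For this matrix the running products of the pivots reproduce the leading principal minors, and all of them are positive, so the criterion reports that $A$ is positive definite.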
Appendix B Volumes of Parallelepipeds

In this appendix, we give a geometric proof of the formula for the volume of a parallelepiped in $\mathbb{R}^n$.

Theorem B.1
Let $\mathcal{P}$ be a parallelepiped in $\mathbb{R}^n$ spanned by the linearly independent vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$. Then the volume of $\mathcal{P}$ is equal to $|\det A|$, where $A$ is the matrix whose column vectors are $\mathbf{v}_1, \ldots, \mathbf{v}_n$.

Let us look at a special case of parallelepiped where this theorem is easy to prove by simple geometric consideration.

Definition B.1 Generalized Rectangles
A parallelepiped that is spanned by $n$ nonzero orthogonal vectors $\mathbf{w}_1, \ldots, \mathbf{w}_n$ is called a generalized rectangle.

A generalized rectangle $\mathcal{R}$ based at the origin and spanned by the $n$ nonzero orthogonal vectors $\mathbf{w}_1, \ldots, \mathbf{w}_n$ is equal to $B(Q_n)$, where $Q_n = [0, 1]^n$ is the standard unit cube, and $B$ is the matrix
$$B = \begin{bmatrix} \mathbf{w}_1 & \cdots & \mathbf{w}_n \end{bmatrix}.$$
By geometric consideration, the volume of $\mathcal{R}$ is given by the product of the lengths of its edges. Namely,
$$\operatorname{vol}(\mathcal{R}) = \|\mathbf{w}_1\| \cdots \|\mathbf{w}_n\|.$$
To see that this is equal to $|\det B|$, let $\mathbf{u}_1, \ldots, \mathbf{u}_n$ be the unit vectors in the directions of $\mathbf{w}_1, \ldots, \mathbf{w}_n$. Namely,
$$\mathbf{u}_i = \frac{\mathbf{w}_i}{\|\mathbf{w}_i\|}, \qquad 1 \le i \le n.$$
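Then $B = PD$, where $P$ is an orthogonal matrix and $D$ is a diagonal matrix given respectively by
$$P = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{bmatrix}, \qquad D = \begin{bmatrix} \|\mathbf{w}_1\| & 0 & \cdots & 0 \\ 0 & \|\mathbf{w}_2\| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \|\mathbf{w}_n\| \end{bmatrix}. \tag{B.1}$$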
An $n \times n$ matrix $P$ is called an orthogonal matrix if
$$P^T P = P P^T = I_n,$$
where $I_n$ is the $n \times n$ identity matrix. A matrix $P$ is orthogonal if and only if the column vectors of $P$ form an orthonormal basis of $\mathbb{R}^n$. If $P$ is orthogonal, $P^{-1} = P^T$, and $P^{-1}$ is also orthogonal. From $P^T P = I_n$, we find that
$$\det(P) \det(P^T) = \det(I_n) = 1.$$
Since $\det(P^T) = \det(P)$, we have $\det(P)^2 = 1$. Hence, the determinant of an orthogonal matrix can only be $1$ or $-1$. Therefore, when $B = PD$, with $P$ and $D$ as given in (B.1), we have
$$|\det B| = |\det P \det D| = |\det D| = \|\mathbf{w}_1\| \cdots \|\mathbf{w}_n\|.$$

Remark B.1
In the argument above, we do not show that the volume of a generalized rectangle $\mathcal{R}$ spanned by the $n$ nonzero orthogonal vectors $\mathbf{w}_1, \ldots, \mathbf{w}_n$ is equal to $\|\mathbf{w}_1\| \cdots \|\mathbf{w}_n\|$ using the definition of $\operatorname{vol}(\mathcal{R})$ in terms of a Riemann integral $\displaystyle\int_{\mathcal{R}} d\mathbf{x}$. This is elementary but tedious.
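The identity $|\det B| = \|\mathbf{w}_1\| \cdots \|\mathbf{w}_n\|$ for orthogonal columns can be checked on a small case. The sketch below uses three hypothetical orthogonal vectors in $\mathbb{R}^3$ and compares squares, $(\det B)^2$ against $\|\mathbf{w}_1\|^2 \|\mathbf{w}_2\|^2 \|\mathbf{w}_3\|^2$, so that the arithmetic stays in the integers. The helper names `det3` and `dot` are illustrative only.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(u, v))

# Hypothetical example: three mutually orthogonal nonzero vectors in R^3.
w = [(1, 1, 0), (1, -1, 0), (0, 0, 2)]

# B is the matrix whose columns are w1, w2, w3.
B = [[w[j][i] for j in range(3)] for i in range(3)]

det_sq = det3(B) ** 2
norm_sq_product = dot(w[0], w[0]) * dot(w[1], w[1]) * dot(w[2], w[2])
```

Here $\det B = -4$ while the edge lengths are $\sqrt{2}, \sqrt{2}, 2$, so both squared quantities equal $16$.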
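A linear transformation $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$, $\mathbf{T}(\mathbf{x}) = P\mathbf{x}$ defined by an orthogonal matrix $P$ is called an orthogonal transformation. The significance of an orthogonal transformation is as follows. For any $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$,
$$\langle \mathbf{T}(\mathbf{u}), \mathbf{T}(\mathbf{v}) \rangle = (P\mathbf{u})^T (P\mathbf{v}) = \mathbf{u}^T P^T P \mathbf{v} = \mathbf{u}^T \mathbf{v} = \langle \mathbf{u}, \mathbf{v} \rangle.$$
Namely, $\mathbf{T}$ preserves inner products. Since lengths and angles are defined in terms of the inner product, this implies that an orthogonal transformation preserves lengths and angles.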
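As a quick illustration of the computation above, the following sketch takes a hypothetical rotation of $\mathbb{R}^2$ with rational entries (cosine $3/5$, sine $4/5$), so that $P^T P = I_2$ and the equality of inner products can be checked exactly. The helper names are assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical example: a rotation matrix with rational entries.
c, s = Fraction(3, 5), Fraction(4, 5)
P = [[c, -s], [s, c]]

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def apply(M, x):
    """Matrix-vector product M x."""
    return [dot(row, x) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Matrix product X Y."""
    return [[dot(row, col) for col in zip(*Y)] for row in X]

u, v = [2, 1], [-1, 3]

gram = matmul(transpose(P), P)             # P^T P, expected to be the identity
before = dot(u, v)                         # <u, v>
after = dot(apply(P, u), apply(P, v))      # <T(u), T(v)>
```

With these choices $P\mathbf{u} = (2/5, 11/5)$ and $P\mathbf{v} = (-3, 1)$, and both inner products equal $1$.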
Under an orthogonal transformation, the image of a rectangle $R$ is a rectangle that is congruent to $R$. Since the volume of a Jordan measurable set $\mathcal{D}$ is obtained by taking the limit of a sequence of Darboux lower sums, and each Darboux lower sum is a sum of volumes of rectangles with disjoint interiors that lie in $\mathcal{D}$, we find that orthogonal transformations also preserve the volumes of Jordan measurable sets.

Theorem B.2
If $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$, $\mathbf{T}(\mathbf{x}) = P\mathbf{x}$ is an orthogonal transformation, and $\mathcal{D}$ is a Jordan measurable set, then $\mathbf{T}(\mathcal{D})$ is also Jordan measurable and
$$\operatorname{vol}(\mathbf{T}(\mathcal{D})) = \operatorname{vol}(\mathcal{D}).$$

To finish the proof of Theorem B.1, we also need the following fact.

Proposition B.3
Let $\mathcal{P}$ be a parallelepiped based at the origin and spanned by the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$. Assume that
$$\pi_n(\mathbf{v}_i) = 0 \quad \text{for } 1 \le i \le n-1,$$
or equivalently, $\mathbf{v}_1, \ldots, \mathbf{v}_{n-1}$ lie in the hyperplane $x_n = 0$. For $1 \le i \le n-1$, let $\mathbf{z}_i \in \mathbb{R}^{n-1}$ be such that $\mathbf{v}_i = (\mathbf{z}_i, 0)$. If $\mathcal{Q}$ is the parallelepiped in $\mathbb{R}^{n-1}$ based at the origin and spanned by $\mathbf{z}_1, \ldots, \mathbf{z}_{n-1}$, then
$$\operatorname{vol}(\mathcal{P}) = \operatorname{vol}(\mathcal{Q})\, h,$$
where $h$ is the distance from $\mathbf{v}_n$ to the hyperplane $x_n = 0$, which is given explicitly by
$$h = \left\| \operatorname{proj}_{\mathbf{e}_n} \mathbf{v}_n \right\|.$$

When $n = 3$, this can be argued geometrically. For general $n$, let us give a proof using the definition of volume as a Riemann integral.
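Proof
Recall that
$$\mathcal{P} = \left\{ t_1 \mathbf{v}_1 + \cdots + t_{n-1} \mathbf{v}_{n-1} + t_n \mathbf{v}_n \mid \mathbf{t} \in [0, 1]^n \right\}.$$
Notice that $\mathbf{v}_n$ can be written as $\mathbf{v}_n = (\mathbf{a}, h)$ for some $\mathbf{a} \in \mathbb{R}^{n-1}$. Hence, if a point $\mathbf{x}$ is in $\mathcal{P}$, then
$$\mathbf{x} = \left( \frac{t}{h}\mathbf{a} + \mathbf{z},\; t \right),$$
where $0 \le t \le h$, and $\mathbf{z}$ is a point in $\mathcal{Q}$. For $0 \le t \le h$, let
$$\mathcal{Q}_t = \left\{ \left( \frac{t}{h}\mathbf{a} + \mathbf{z},\; t \right) \,\middle|\, \mathbf{z} \in \mathcal{Q} \right\}.$$
Then $\mathcal{Q}_t$ is an $(n-1)$-dimensional parallelepiped contained in the hyperplane $x_n = t$, which is a translate of the $(n-1)$-dimensional parallelepiped $\mathcal{Q}_0$. By Fubini's theorem,
$$\operatorname{vol}(\mathcal{P}) = \int_{\mathcal{P}} d\mathbf{x} = \int_0^h \left( \int_{\mathcal{Q}_t} dx_1 \cdots dx_{n-1} \right) dx_n = \int_0^h \operatorname{vol}(\mathcal{Q}_t)\, dt = \int_0^h \operatorname{vol}(\mathcal{Q}_0)\, dt = \operatorname{vol}(\mathcal{Q})\, h.$$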
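For $n = 2$, the statement of Proposition B.3 is the familiar rule that the area of a parallelogram is base times height. The sketch below checks this on hypothetical data, computing the area directly with the shoelace formula (a standard polygon area formula that is not used in the text) and comparing it with $\operatorname{vol}(\mathcal{Q})\, h$.

```python
from fractions import Fraction

def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in boundary order."""
    twice_area = 0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(Fraction(twice_area)) / 2

def parallelogram(b, a, h):
    """Corners of the parallelogram spanned by v1 = (b, 0) and v2 = (a, h)."""
    return [(0, 0), (b, 0), (b + a, h), (a, h)]

# Hypothetical data: v1 = (3, 0) lies on the line x2 = 0, and v2 = (2, 5).
b, a, h = 3, 2, 5

area = shoelace_area(parallelogram(b, a, h))   # area of P computed directly
base_times_height = b * h                      # vol(Q) * h with Q = [0, b]
```

The shear parameter $a$ does not affect the area, which matches the observation in the proof that every slice $\mathcal{Q}_t$ is a translate of $\mathcal{Q}_0$.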
Now we can prove Theorem B.1.

Proof of Theorem B.1
We prove by induction on $n$. The $n = 1$ case is obvious. Assume that we have proved the $n - 1$ case. Now given that $\mathcal{P}$ is a parallelepiped in $\mathbb{R}^n$ which is spanned by $\mathbf{v}_1, \ldots, \mathbf{v}_n$, we can assume that $\mathcal{P}$ is based at the origin $\mathbf{0}$ because translations preserve volumes. Let
$$A = \begin{bmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{bmatrix}.$$
We want to show that $\operatorname{vol}(\mathcal{P}) = |\det A|$.
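Let $W$ be the subspace of $\mathbb{R}^n$ that is spanned by $\mathbf{v}_1, \ldots, \mathbf{v}_{n-1}$. Applying the Gram-Schmidt process to the basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ of $\mathbb{R}^n$, we obtain an orthonormal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$. By the algorithm, the unit vector $\mathbf{u}_n$ is orthogonal to the subspace $W$. Let
$$P = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{bmatrix}$$
be the orthogonal matrix whose column vectors are $\mathbf{u}_1, \ldots, \mathbf{u}_n$, and consider the orthogonal transformation $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$, $\mathbf{T}(\mathbf{x}) = P^{-1}\mathbf{x} = P^T \mathbf{x}$. For $1 \le i \le n$, let
$$\widetilde{\mathbf{v}}_i = \mathbf{T}(\mathbf{v}_i).$$
Then $\widetilde{\mathcal{P}} = \mathbf{T}(\mathcal{P})$ is a parallelepiped that has the same volume as $\mathcal{P}$, and it is spanned by $\widetilde{\mathbf{v}}_1, \ldots, \widetilde{\mathbf{v}}_n$. Notice that
$$\widetilde{A} = \begin{bmatrix} \widetilde{\mathbf{v}}_1 & \cdots & \widetilde{\mathbf{v}}_n \end{bmatrix} = P^T \begin{bmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{bmatrix} = \begin{bmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix} \begin{bmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{bmatrix} = \left[ \begin{array}{c|c} B & \begin{matrix} \langle \mathbf{u}_1, \mathbf{v}_n \rangle \\ \vdots \\ \langle \mathbf{u}_{n-1}, \mathbf{v}_n \rangle \end{matrix} \\ \hline \mathbf{0} & \langle \mathbf{u}_n, \mathbf{v}_n \rangle \end{array} \right],$$
where $B$ is an $(n-1) \times (n-1)$ matrix. From this, we find that
$$\det(\widetilde{A}) = \det(B) \times \langle \mathbf{u}_n, \mathbf{v}_n \rangle.$$
Comparing the columns, we also have
$$\widetilde{\mathbf{v}}_i = (\mathbf{z}_i, 0) \quad \text{for } 1 \le i \le n-1,$$
where $\mathbf{z}_1, \ldots, \mathbf{z}_{n-1}$ are the column vectors of $B$, which are vectors in $\mathbb{R}^{n-1}$; and
$$\widetilde{\mathbf{v}}_n = \begin{bmatrix} \langle \mathbf{u}_1, \mathbf{v}_n \rangle \\ \vdots \\ \langle \mathbf{u}_{n-1}, \mathbf{v}_n \rangle \\ \langle \mathbf{u}_n, \mathbf{v}_n \rangle \end{bmatrix}.$$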
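The transformation $\mathbf{T}$ maps the subspace $W$ to the hyperplane $x_n = 0$, which can be identified with $\mathbb{R}^{n-1}$. Let $\mathcal{Q}$ be the parallelepiped in $\mathbb{R}^{n-1}$ based at the origin and spanned by the vectors $\mathbf{z}_1, \ldots, \mathbf{z}_{n-1}$. The volume of the parallelepiped $\widetilde{\mathcal{P}}$ is equal to the volume of $\mathcal{Q}$ times the distance $h$ from the tip of the vector $\widetilde{\mathbf{v}}_n$ to the hyperplane $x_n = 0$. By definition,
$$h = \left\| \operatorname{proj}_{\mathbf{e}_n} \widetilde{\mathbf{v}}_n \right\| = |\langle \mathbf{u}_n, \mathbf{v}_n \rangle|.$$
Proposition B.3 gives
$$\operatorname{vol}(\widetilde{\mathcal{P}}) = \operatorname{vol}(\mathcal{Q}) \times |\langle \mathbf{u}_n, \mathbf{v}_n \rangle|.$$
By the inductive hypothesis, $\operatorname{vol}(\mathcal{Q}) = |\det(B)|$. Therefore,
$$\operatorname{vol}(\widetilde{\mathcal{P}}) = |\det(B) \times \langle \mathbf{u}_n, \mathbf{v}_n \rangle| = |\det(\widetilde{A})|.$$
Since $\widetilde{A} = P^T A$, we find that
$$\det(\widetilde{A}) = \det(P^T) \det(A) = \pm \det(A).$$
Hence,
$$\operatorname{vol}(\mathcal{P}) = \operatorname{vol}(\widetilde{\mathcal{P}}) = |\det(A)|.$$
This completes the proof of Theorem B.1.

As a corollary, we have the following.

Theorem B.4
Let $I$ be a closed rectangle in $\mathbb{R}^n$, and let $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$, $\mathbf{T}(\mathbf{x}) = A\mathbf{x}$ be an invertible linear transformation. Then
$$\operatorname{vol}(\mathbf{T}(I)) = |\det A| \operatorname{vol}(I).$$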
Proof
Let $I = \displaystyle\prod_{i=1}^n [a_i, b_i]$. Then $I = \mathbf{S}(Q_n) + \mathbf{a}$, where $\mathbf{a} = (a_1, \ldots, a_n)$, $Q_n$ is the standard unit cube $[0, 1]^n$, and $\mathbf{S} : \mathbb{R}^n \to \mathbb{R}^n$ is the linear transformation defined by the diagonal matrix $B$ with diagonal entries $b_1 - a_1, b_2 - a_2, \ldots, b_n - a_n$. Therefore,
$$\mathbf{T}(I) = (\mathbf{T} \circ \mathbf{S})(Q_n) + \mathbf{T}(\mathbf{a}).$$
Since the matrix associated with the linear transformation $(\mathbf{T} \circ \mathbf{S}) : \mathbb{R}^n \to \mathbb{R}^n$ is $AB$, $\mathbf{T}(I)$ is a parallelepiped based at $\mathbf{T}(\mathbf{a})$ and spanned by the column vectors of $AB$. By Theorem B.1,
$$\operatorname{vol}(\mathbf{T}(I)) = |\det(AB)| = |\det(A)||\det(B)|.$$
Obviously,
$$|\det B| = \prod_{i=1}^n (b_i - a_i) = \operatorname{vol}(I).$$
This proves that
$$\operatorname{vol}(\mathbf{T}(I)) = |\det A| \operatorname{vol}(I).$$

Remark B.2
The formula
$$\operatorname{vol}(\mathbf{T}(I)) = |\det A| \operatorname{vol}(I)$$
still holds even if the matrix $A$ is not invertible. In this case, $\det A = 0$, and the column vectors of $A$ are not linearly independent. Therefore, $\mathbf{T}(I)$ lies in a hyperplane in $\mathbb{R}^n$, and so $\mathbf{T}(I)$ has zero volume.
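The induction in the proof of Theorem B.1 multiplies one height at a time: the height of $\mathbf{v}_k$ above the span of the earlier vectors is the length of its Gram-Schmidt residual. The following sketch, under the assumption of exact rational arithmetic and with hypothetical helper names, compares the product of the squared heights with $(\det A)^2$ for three vectors in $\mathbb{R}^3$. Squares are used to avoid square roots.

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def squared_heights(vectors):
    """Squared lengths of the (unnormalized) Gram-Schmidt residuals.

    The k-th value is the squared distance from v_k to span(v_1, ..., v_{k-1}).
    Assumes the vectors are linearly independent.
    """
    residuals, heights_sq = [], []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for b in residuals:
            coeff = dot(v, b) / dot(b, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        residuals.append(w)
        heights_sq.append(dot(w, w))
    return heights_sq

def det(rows):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in rows]
    n, sign, result = len(M), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return sign * result

# Hypothetical example: v1, v2, v3 in R^3.
vectors = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]

volume_sq = Fraction(1)
for h_sq in squared_heights(vectors):
    volume_sq *= h_sq

# The matrix with rows v_i is A^T, and det(A^T) = det(A).
det_A = det(vectors)
```

For these vectors the squared heights are $2$, $3/2$ and $4/3$, whose product is $4 = (\det A)^2$ with $\det A = -2$.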
Appendix C Necessary and Sufficient Condition for Riemann Integrability

In this appendix, we want to prove the Lebesgue-Vitali theorem, which gives a necessary and sufficient condition for a bounded function $f : \mathcal{D} \to \mathbb{R}$ to be Riemann integrable. We will introduce the concept of Lebesgue measure zero without introducing the concept of general Lebesgue measure. The latter is often covered in a standard course in real analysis.

Recall that the volume of a closed rectangle $I = \displaystyle\prod_{i=1}^n [a_i, b_i]$ or its interior $\operatorname{int}(I) = \displaystyle\prod_{i=1}^n (a_i, b_i)$ is
$$\operatorname{vol}(I) = \operatorname{vol}(\operatorname{int} I) = \prod_{i=1}^n (b_i - a_i).$$
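If $A$ is a subset of $\mathbb{R}^n$, we say that $A$ has Jordan content zero if

(i) for every $\varepsilon > 0$, there are finitely many closed rectangles $I_1, \ldots, I_k$ such that
$$A \subset \bigcup_{j=1}^k I_j \quad \text{and} \quad \sum_{j=1}^k \operatorname{vol}(I_j) < \varepsilon.$$

This is equivalent to any of the following.

(ii) For every $\varepsilon > 0$, there are finitely many closed cubes $Q_1, \ldots, Q_k$ such that
$$A \subset \bigcup_{j=1}^k Q_j \quad \text{and} \quad \sum_{j=1}^k \operatorname{vol}(Q_j) < \varepsilon.$$

(iii) For every $\varepsilon > 0$, there are finitely many open rectangles $U_1, \ldots, U_k$ such that
$$A \subset \bigcup_{j=1}^k U_j \quad \text{and} \quad \sum_{j=1}^k \operatorname{vol}(U_j) < \varepsilon.$$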
(iv) For every $\varepsilon > 0$, there are finitely many open cubes $V_1, \ldots, V_k$ such that
$$A \subset \bigcup_{j=1}^k V_j \quad \text{and} \quad \sum_{j=1}^k \operatorname{vol}(V_j) < \varepsilon.$$
A set has Jordan content zero if and only if it is Jordan measurable and its volume is zero. Hence, we also say that a set with Jordan content zero has Jordan measure zero. The Jordan measure of a Jordan measurable set $A$ is the volume of $A$, defined as the Riemann integral of the characteristic function $\chi_A : A \to \mathbb{R}$.

In Lebesgue measure, instead of a covering by finitely many rectangles, we allow a covering by countably many rectangles. A set $S$ is countable if it is finite or countably infinite. The latter means that there is a one-to-one correspondence between $S$ and the set $\mathbb{Z}^+$. In any case, a nonempty set $S$ is countable if and only if there is a surjection $h : \mathbb{Z}^+ \to S$, which allows us to write
$$S = \left\{ s_k \mid k \in \mathbb{Z}^+ \right\}, \quad \text{where } s_k = h(k).$$
Definition C.1 Lebesgue Measure Zero
Let $A$ be a subset of $\mathbb{R}^n$. We say that $A$ has Lebesgue measure zero if for every $\varepsilon > 0$, there is a countable collection of open rectangles $\{U_k \mid k \in \mathbb{Z}^+\}$ that covers $A$, the sum of whose volumes is less than $\varepsilon$. Namely,
$$A \subset \bigcup_{k=1}^\infty U_k \quad \text{and} \quad \sum_{k=1}^\infty \operatorname{vol}(U_k) < \varepsilon.$$
The following is obvious.

Proposition C.1
Let $A$ be a subset of $\mathbb{R}^n$. If $A$ has Jordan content zero, then it has Lebesgue measure zero.

The converse is not true. There are sets with Lebesgue measure zero that do not have Jordan content zero. The following gives an example of such a set.
Example C.1
Let $A = \mathbb{Q} \cap [0, 1]$. The function $\chi_A : [0, 1] \to \mathbb{R}$ is the Dirichlet function, which is not Riemann integrable. Hence, $A$ is not Jordan measurable. Nevertheless, we claim that $A$ has Lebesgue measure zero. Recall that $\mathbb{Q}$ is a countable set. As a subset of $\mathbb{Q}$, $A$ is also countable. Hence, we can write $A$ as
$$A = \left\{ a_k \mid k \in \mathbb{Z}^+ \right\}.$$
Given $\varepsilon > 0$ and $k \in \mathbb{Z}^+$, let $U_k$ be the open rectangle
$$U_k = \left( a_k - \frac{\varepsilon}{2^{k+2}},\; a_k + \frac{\varepsilon}{2^{k+2}} \right).$$
Then $a_k \in U_k$ for each $k \in \mathbb{Z}^+$. Thus,
$$A \subset \bigcup_{k=1}^\infty U_k.$$
Now,
$$\sum_{k=1}^\infty \operatorname{vol}(U_k) = \sum_{k=1}^\infty \frac{\varepsilon}{2^{k+1}} = \frac{\varepsilon}{2} < \varepsilon.$$
Therefore, $A$ has Lebesgue measure zero.

The converse to Proposition C.1 is true if $A$ is compact.

Proposition C.2
Let $A$ be a compact subset of $\mathbb{R}^n$. If $A$ has Lebesgue measure zero, then it has Jordan content zero.

Proof
Given $\varepsilon > 0$, since $A$ has Lebesgue measure zero, there is a countable collection $\{U_\alpha \mid \alpha \in \mathbb{Z}^+\}$ of open rectangles that covers $A$, and
$$\sum_{\alpha \in \mathbb{Z}^+} \operatorname{vol}(U_\alpha) < \varepsilon.$$
Since $A$ is compact, there is a finite subcollection $\{U_{\alpha_l} \mid 1 \le l \le m\}$ that covers $A$. Obviously, we also have
$$\sum_{l=1}^m \operatorname{vol}(U_{\alpha_l}) < \varepsilon.$$
Hence, $A$ has Jordan content zero.

Example C.2
Using the same reasoning as in Example C.1, one can show that any countable subset of $\mathbb{R}^n$ has Lebesgue measure zero.

We have seen that if $A$ is a subset of $\mathbb{R}^n$ that has Jordan content zero, then its closure $\overline{A}$ also has Jordan content zero. However, the same is not true for Lebesgue measure.

Example C.3
Example C.1 shows that the set $A = \mathbb{Q} \cap [0, 1]$ has Lebesgue measure zero. Notice that $\overline{A} = [0, 1]$. It cannot have Lebesgue measure zero.

As in the case of Jordan content zero, we have the following equivalences for a set $A$ in $\mathbb{R}^n$ to have Lebesgue measure zero.

(i) For every $\varepsilon > 0$, there is a countable collection of open rectangles $\{U_k \mid k \in \mathbb{Z}^+\}$ such that
$$A \subset \bigcup_{k=1}^\infty U_k \quad \text{and} \quad \sum_{k=1}^\infty \operatorname{vol}(U_k) < \varepsilon.$$

(ii) For every $\varepsilon > 0$, there is a countable collection of closed rectangles $\{I_k \mid k \in \mathbb{Z}^+\}$ such that
$$A \subset \bigcup_{k=1}^\infty I_k \quad \text{and} \quad \sum_{k=1}^\infty \operatorname{vol}(I_k) < \varepsilon.$$

(iii) For every $\varepsilon > 0$, there is a countable collection of open cubes $\{V_k \mid k \in \mathbb{Z}^+\}$ such that
$$A \subset \bigcup_{k=1}^\infty V_k \quad \text{and} \quad \sum_{k=1}^\infty \operatorname{vol}(V_k) < \varepsilon.$$

(iv) For every $\varepsilon > 0$, there is a countable collection of closed cubes $\{Q_k \mid k \in \mathbb{Z}^+\}$ such that
$$A \subset \bigcup_{k=1}^\infty Q_k \quad \text{and} \quad \sum_{k=1}^\infty \operatorname{vol}(Q_k) < \varepsilon.$$
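The covering used in Example C.1 can be sketched for finitely many terms. The code below is an illustration under stated assumptions: the enumeration of $\mathbb{Q} \cap [0, 1]$ by increasing denominator is one hypothetical choice of the sequence $a_k$, and only the first $K$ intervals are built. The total length of the first $K$ intervals is $\frac{\varepsilon}{2}\left(1 - 2^{-K}\right)$, which stays below $\varepsilon/2$ for every $K$.

```python
from fractions import Fraction
from math import gcd

def rationals_in_unit_interval(count):
    """First `count` rationals in [0, 1], listed by increasing denominator."""
    out = [Fraction(0), Fraction(1)]
    q = 2
    while len(out) < count:
        out.extend(Fraction(p, q) for p in range(1, q) if gcd(p, q) == 1)
        q += 1
    return out[:count]

def cover(points, eps):
    """Intervals U_k = (a_k - eps/2^(k+2), a_k + eps/2^(k+2)) for k = 1, 2, ..."""
    return [(a - eps / 2 ** (k + 2), a + eps / 2 ** (k + 2))
            for k, a in enumerate(points, start=1)]

eps = Fraction(1, 100)
K = 50
points = rationals_in_unit_interval(K)
intervals = cover(points, eps)

# Sum of the lengths of the first K intervals.
total_length = sum(right - left for left, right in intervals)
```

However many rationals are enumerated, the total length never reaches $\varepsilon/2$, although the points themselves are dense in $[0, 1]$.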
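The following is obvious.

Proposition C.3
Let $A$ be a subset of $\mathbb{R}^n$. If $A$ has Lebesgue measure zero, and $B$ is a subset of $A$, then $B$ also has Lebesgue measure zero.

Using the fact that the set $\mathbb{Z}^+ \times \mathbb{Z}^+$ is countable, we find that a countable union of countable sets is countable. This gives the following.

Proposition C.4
Let $\{A_m \mid m \in \mathbb{Z}^+\}$ be a countable collection of subsets of $\mathbb{R}^n$. If each of the $A_m$, $m \in \mathbb{Z}^+$, has Lebesgue measure zero, then the set $A = \displaystyle\bigcup_{m=1}^\infty A_m$ also has Lebesgue measure zero.

Proof
Fix $\varepsilon > 0$. For each $m \in \mathbb{Z}^+$, since $A_m$ has Lebesgue measure zero, there is a countable collection $\mathcal{B}_m = \left\{ U_{m,k} \mid k \in \mathbb{Z}^+ \right\}$ of open rectangles such that
$$A_m \subset \bigcup_{k=1}^\infty U_{m,k} \quad \text{and} \quad \sum_{k=1}^\infty \operatorname{vol}(U_{m,k}) < \frac{\varepsilon}{2^m}.$$
It follows that
$$A \subset \bigcup_{m=1}^\infty \bigcup_{k=1}^\infty U_{m,k} \quad \text{and} \quad \sum_{m=1}^\infty \sum_{k=1}^\infty \operatorname{vol}(U_{m,k}) < \sum_{m=1}^\infty \frac{\varepsilon}{2^m} = \varepsilon.$$
Since the collection $\left\{ U_{m,k} \mid m, k \in \mathbb{Z}^+ \right\}$ is countable, this shows that $A$ has Lebesgue measure zero.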
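For a bounded function $f : \mathcal{D} \to \mathbb{R}$ and a point $\mathbf{x}_0$ in $\mathcal{D}$, the quantity $\omega_f(\mathbf{x}_0)$ is given by
$$\omega_f(\mathbf{x}_0) = \lim_{r \to 0^+} \sup_{\mathbf{u}, \mathbf{v} \in B(\mathbf{x}_0, r) \cap \mathcal{D}} \bigl( f(\mathbf{u}) - f(\mathbf{v}) \bigr).$$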
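Lemma C.6
Let $\mathcal{D}$ be a subset of $\mathbb{R}^n$, and let $\mathbf{x}_0$ be a point in $\mathcal{D}$. Assume that $f : \mathcal{D} \to \mathbb{R}$ is a bounded function. Then $f$ is continuous at $\mathbf{x}_0$ if and only if $\omega_f(\mathbf{x}_0) = 0$.

Proof
First assume that $f$ is continuous at $\mathbf{x}_0$. Given $\varepsilon > 0$, there is a $\delta > 0$ such that for all $\mathbf{x} \in B(\mathbf{x}_0, \delta) \cap \mathcal{D}$,
$$|f(\mathbf{x}) - f(\mathbf{x}_0)| < \frac{\varepsilon}{3}.$$
It follows that for all $\mathbf{u}, \mathbf{v} \in B(\mathbf{x}_0, \delta) \cap \mathcal{D}$,
$$|f(\mathbf{u}) - f(\mathbf{v})| \le |f(\mathbf{u}) - f(\mathbf{x}_0)| + |f(\mathbf{x}_0) - f(\mathbf{v})| < \frac{2\varepsilon}{3}.$$
Hence, for all $0 < r < \delta$,
$$\sup \left\{ f(\mathbf{u}) - f(\mathbf{v}) \mid \mathbf{u}, \mathbf{v} \in B(\mathbf{x}_0, r) \cap \mathcal{D} \right\} \le \frac{2\varepsilon}{3} < \varepsilon.$$
This shows that $\omega_f(\mathbf{x}_0) = 0$.

Conversely, assume that $\omega_f(\mathbf{x}_0) = 0$. Given $\varepsilon > 0$, there is a $\delta > 0$ such that for all $0 < r < \delta$,
$$\sup \left\{ f(\mathbf{u}) - f(\mathbf{v}) \mid \mathbf{u}, \mathbf{v} \in B(\mathbf{x}_0, r) \cap \mathcal{D} \right\} < \varepsilon.$$
If $\mathbf{x}$ is in $B(\mathbf{x}_0, \delta/2) \cap \mathcal{D}$,
$$|f(\mathbf{x}) - f(\mathbf{x}_0)| \le \sup \left\{ f(\mathbf{u}) - f(\mathbf{v}) \mid \mathbf{u}, \mathbf{v} \in B(\mathbf{x}_0, \delta/2) \cap \mathcal{D} \right\} < \varepsilon.$$
This proves that $f$ is continuous at $\mathbf{x}_0$.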
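Lemma C.6 can be illustrated numerically in one variable. The sketch below is only an approximation under stated assumptions: it replaces the supremum over $B(x_0, r) \cap \mathcal{D}$ by a maximum over finitely many sample points, so it gives a lower bound for the true supremum, and the function name `sampled_oscillation` and the step function used are hypothetical. With an odd number of samples, the point $x_0$ itself is always sampled.

```python
def sampled_oscillation(f, x0, r, domain=(0.0, 1.0), samples=2001):
    """Approximate sup{f(u) - f(v) : u, v in B(x0, r) and in the domain}.

    Samples points strictly inside the open interval (x0 - r, x0 + r),
    keeps those in the domain, and returns max f - min f over the samples.
    This is a lower bound for the true supremum.
    """
    pts = [x0 + r * (2 * i - samples - 1) / (samples + 1)
           for i in range(1, samples + 1)]
    pts = [x for x in pts if domain[0] <= x <= domain[1]]
    values = [f(x) for x in pts]
    return max(values) - min(values)

def step(x):
    """Hypothetical bounded function on [0, 1] with a jump at x = 1/2."""
    return 1 if x >= 0.5 else 0

# At the jump, the sampled oscillation stays at 1 as r shrinks.
at_jump = [sampled_oscillation(step, 0.5, r) for r in (0.1, 0.01, 0.001)]

# At a point of continuity, it is 0 once r is small enough.
at_continuity = sampled_oscillation(step, 0.25, 0.1)
```

This matches the lemma: the step function is discontinuous exactly at the point where the oscillation does not tend to zero.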
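Corollary C.7
Let $\mathcal{D}$ be a subset of $\mathbb{R}^n$, and let $f : \mathcal{D} \to \mathbb{R}$ be a bounded function defined on $\mathcal{D}$. If $\mathcal{N}$ is the set of discontinuities of $f$, then
$$\mathcal{N} = \left\{ \mathbf{x} \in \mathcal{D} \mid \omega_f(\mathbf{x}) > 0 \right\}.$$

We also need the following proposition.

Proposition C.8
Let $\mathcal{D}$ be a compact subset of $\mathbb{R}^n$, and let $f : \mathcal{D} \to \mathbb{R}$ be a bounded function defined on $\mathcal{D}$.

(a) For any $a > 0$, the set
$$A = \left\{ \mathbf{x} \in \mathcal{D} \mid \omega_f(\mathbf{x}) \ge a \right\}$$
is a compact subset of $\mathbb{R}^n$.

(b) If $\mathcal{N}$ is the set of discontinuities of $f$, then $\mathcal{N} = \displaystyle\bigcup_{k=1}^\infty \mathcal{N}_k$, where
$$\mathcal{N}_k = \left\{ \mathbf{x} \in \mathcal{D} \,\middle|\, \omega_f(\mathbf{x}) \ge \frac{1}{k} \right\}.$$

(c) The set $\mathcal{N}$ has Lebesgue measure zero if and only if $\mathcal{N}_k$ has Jordan content zero for each $k \in \mathbb{Z}^+$.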
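Proof
Since $\mathcal{D}$ is compact, it is closed and bounded. For part (a), $A \subset \mathcal{D}$ implies that $A$ is bounded. To prove that $A$ is compact, we only need to show that $A$ is closed. This is equivalent to showing that $\mathbb{R}^n \setminus A$ is open. Notice that
$$\mathbb{R}^n \setminus A = U_1 \cup U_2,$$
where $U_1 = \mathbb{R}^n \setminus \mathcal{D}$ and $U_2 = \mathcal{D} \setminus A$. Since $\mathcal{D}$ is closed, $U_1$ is open. If $\mathbf{x}_0 \in U_2$, then $\omega_f(\mathbf{x}_0) < a$. Let $\varepsilon = a - \omega_f(\mathbf{x}_0)$. Then $\varepsilon > 0$. By definition of $\omega_f(\mathbf{x}_0)$, there is a $\delta > 0$ such that for all $0 < r < \delta$,
$$\sup \left\{ f(\mathbf{u}) - f(\mathbf{v}) \mid \mathbf{u}, \mathbf{v} \in B(\mathbf{x}_0, r) \cap \mathcal{D} \right\} < \omega_f(\mathbf{x}_0) + \varepsilon = a.$$
Take $c = \delta/3$. If $\mathbf{x}$ is in $B(\mathbf{x}_0, c) \cap \mathcal{D}$, and $\mathbf{u}, \mathbf{v}$ are in $B(\mathbf{x}, c)$, then $\mathbf{u}, \mathbf{v}$ are in $B(\mathbf{x}_0, 2c)$. Since $2c < \delta$, we find that
$$\omega_f(\mathbf{x}) \le \sup \left\{ f(\mathbf{u}) - f(\mathbf{v}) \mid \mathbf{u}, \mathbf{v} \in B(\mathbf{x}, c) \cap \mathcal{D} \right\} \le \sup \left\{ f(\mathbf{u}) - f(\mathbf{v}) \mid \mathbf{u}, \mathbf{v} \in B(\mathbf{x}_0, 2c) \cap \mathcal{D} \right\} < a.$$
This shows that $B(\mathbf{x}_0, c) \cap \mathcal{D} \subset U_2$. Since the points of $B(\mathbf{x}_0, c)$ that are not in $\mathcal{D}$ lie in $U_1$, we have $B(\mathbf{x}_0, c) \subset U_1 \cup U_2 = \mathbb{R}^n \setminus A$. Together with the fact that $U_1$ is open, this shows that every point of $\mathbb{R}^n \setminus A$ is an interior point. Hence, $\mathbb{R}^n \setminus A$ is open. This completes the proof of part (a).

Part (b) follows from Corollary C.7 and the identity
$$(0, \infty) = \bigcup_{k=1}^\infty \left[ \frac{1}{k}, \infty \right).$$

For part (c), if $\mathcal{N}$ has Lebesgue measure zero, then for any $k \in \mathbb{Z}^+$, $\mathcal{N}_k$ also has Lebesgue measure zero. By part (a) and Proposition C.2, $\mathcal{N}_k$ has Jordan content zero. Conversely, assume that $\mathcal{N}_k$ has Jordan content zero for each $k \in \mathbb{Z}^+$. Then $\mathcal{N}_k$ has Lebesgue measure zero for each $k \in \mathbb{Z}^+$. Part (b) and Proposition C.4 imply that $\mathcal{N}$ also has Lebesgue measure zero.

Now we can prove the Lebesgue-Vitali theorem.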
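Proof of the Lebesgue-Vitali Theorem
First we assume that $f : I \to \mathbb{R}$ is Riemann integrable. Given $k \in \mathbb{Z}^+$, we will show that the set
$$\mathcal{N}_k = \left\{ \mathbf{x} \in \mathcal{D} \,\middle|\, \omega_f(\mathbf{x}) \ge \frac{1}{k} \right\}$$
has Jordan content zero. By Proposition C.8, this implies that the set $\mathcal{N}$ of discontinuities of $f : I \to \mathbb{R}$ has Lebesgue measure zero. Fix $k \in \mathbb{Z}^+$. Given $\varepsilon > 0$, since $f : I \to \mathbb{R}$ is Riemann integrable, there is a partition $\mathbf{P}$ of $I$ such that $U(f, \mathbf{P}) - L(f, \mathbf{P})$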
0, let $k$ be a positive integer such that
$$k \ge \frac{2 \operatorname{vol}(I)}{\varepsilon}.$$
Proposition C.8 says that $\mathcal{N}_k$ has Jordan content zero. Thus, there is a finite collection of open rectangles $\mathcal{B}_k = \{U_l \mid 1 \le l \le m\}$ such that
$$\mathcal{N}_k \subset \bigcup_{l=1}^m U_l, \qquad \sum_{l=1}^m \operatorname{vol}(U_l)$$
0 such that the open cube x + (−rx , rx )n is contained in the open set R \ Nk . By taking a smaller rx , we can assume that sup {f (u) − f (v) | u, v ∈ B(x, 2rx )}