A Modern Introduction to Mathematical Analysis (ISBN 3031237129, 9783031237126)

This textbook presents all the basics for the first two years of a course in mathematical analysis, from the natural numbers.


English · 441 [442] pages · 2023


Table of contents :
Preface
Contents
Preliminaries
The Symbols of Logic
Logical Propositions
The Language of Set Theory
First Symbols
Some Examples of Sets
Operations with Sets
The Concept of Function
Part I The Basics of Mathematical Analysis
1 Sets of Numbers and Metric Spaces
1.1 The Natural Numbers and the Induction Principle
1.1.1 Recursive Definitions
1.1.2 Proofs by Induction
1.1.3 The Binomial Formula
1.2 The Real Numbers
1.2.1 Supremum and Infimum
1.2.2 The Square Root
1.2.3 Intervals
1.2.4 Properties of Q and R \ Q
1.3 The Complex Numbers
1.3.1 Algebraic Equations in C
1.3.2 The Modulus of a Complex Number
1.4 The Space RN
1.4.1 Euclidean Norm and Distance
1.5 Metric Spaces
2 Continuity
2.1 Continuous Functions
2.2 Intervals and Continuity
2.3 Monotone Functions
2.4 The Exponential Function
2.5 The Trigonometric Functions
2.6 Other Examples of Continuous Functions
3 Limits
3.1 The Notion of Limit
3.2 Some Properties of Limits
3.3 Change of Variables in the Limit
3.4 On the Limit of Restrictions
3.5 The Extended Real Line
3.6 Some Operations with -∞ and +∞
3.7 Limits of Monotone Functions
3.8 Limits for Exponentials and Logarithms
3.9 Liminf and Limsup
4 Compactness and Completeness
4.1 Some Preliminaries on Sequences
4.2 Compact Sets
4.3 Compactness and Continuity
4.4 Complete Metric Spaces
4.5 Completeness and Continuity
4.6 Spaces of Continuous Functions
5 Exponential and Circular Functions
5.1 The Construction
5.1.1 Preliminaries for the Proof
5.1.2 Definition on a Dense Set
5.1.3 Extension to the Whole Real Line
5.2 Exponential and Circular Functions
5.3 Limits for Trigonometric Functions
Part II Differential and Integral Calculus in R
6 The Derivative
6.1 Some Differentiation Rules
6.2 The Derivative Function
6.3 Remarkable Properties of the Derivative
6.4 Inverses of Trigonometric and Hyperbolic Functions
6.5 Convexity and Concavity
6.6 L'Hôpital's Rules
6.7 Taylor Formula
6.8 Local Maxima and Minima
6.9 Analyticity of Some Elementary Functions
7 The Integral
7.1 Riemann Sums
7.2 δ-Fine Tagged Partitions
7.3 Integrable Functions on a Compact Interval
7.4 Elementary Properties of the Integral
7.5 The Fundamental Theorem
7.6 Primitivable Functions
7.7 Primitivation by Parts and by Substitution
7.8 The Taylor Formula with Integral Form Remainder
7.9 The Cauchy Criterion
7.10 Integrability on Subintervals
7.11 R-Integrable and Continuous Functions
7.12 Two Theorems Involving Limits
7.13 Integration on Noncompact Intervals
7.14 Functions with Vector Values
Part III Further Developments
8 Numerical Series and Series of Functions
8.1 Introduction and First Properties
8.2 Series of Real Numbers
8.3 Series of Complex Numbers
8.4 Series of Functions
8.4.1 Power Series
8.4.2 The Complex Exponential Function
8.4.3 Taylor Series
8.4.4 Fourier Series
8.5 Series and Integrals
9 More on the Integral
9.1 Saks–Henstock Theorem
9.2 L-Integrable Functions
9.3 Monotone Convergence Theorem
9.4 Dominated Convergence Theorem
9.5 Hake's Theorem
Part IV Differential and Integral Calculus in RN
10 The Differential
10.1 The Differential of a Scalar-Valued Function
10.2 Some Computational Rules
10.3 Twice Differentiable Functions
10.4 Taylor Formula
10.5 The Search for Maxima and Minima
10.6 Implicit Function Theorem: First Statement
10.7 The Differential of a Vector-Valued Function
10.8 The Chain Rule
10.9 Mean Value Theorem
10.10 Implicit Function Theorem: General Statement
10.11 Local Diffeomorphisms
10.12 M-Surfaces
10.13 Local Analysis of M-Surfaces
10.14 Lagrange Multipliers
10.15 Differentiable Manifolds
11 The Integral
11.1 Integrability on Rectangles
11.2 Integrability on a Bounded Set
11.3 The Measure
11.4 Negligible Sets
11.5 A Characterization of Measurable Bounded Sets
11.6 Continuous Functions and L-Integrable Functions
11.7 Limits and Derivatives under the Integration Sign
11.8 Reduction Formula
11.9 Change of Variables in the Integral
11.10 Change of Measure by Diffeomorphisms
11.11 The General Theorem on Change of Variables
11.12 Some Useful Transformations in R2
11.13 Cylindrical and Spherical Coordinates in R3
11.14 The Integral on Unbounded Sets
11.15 The Integral on M-Surfaces
11.16 M-Dimensional Measure
11.17 Length and Area
11.18 Approximation with Smooth M-Surfaces
11.19 The Integral on a Compact Manifold
12 Differential Forms
12.1 An Informal Definition
12.2 Algebraic Operations
12.3 The Exterior Differential
12.4 Differential Forms in R3
12.5 The Integral on an M-Surface
12.6 Pull-Back Transformation
12.7 Oriented Boundary of a Rectangle
12.8 Gauss Formula
12.9 Oriented Boundary of an M-Surface
12.10 Stokes–Cartan Formula
12.11 Physical Interpretation of Curl and Divergence
12.12 The Integral on an Oriented Compact Manifold
12.13 Closed and Exact Differential Forms
12.14 On the Precise Definition of a Differential Form
Bibliography
References Cited in the Book
Books on the Kurzweil–Henstock Integral
Some Textbooks on Exercises
Index

Alessandro Fonda

A Modern Introduction to Mathematical Analysis


Alessandro Fonda
Dipartimento di Matematica e Geoscienze
Università degli Studi di Trieste
Trieste, Italy

ISBN 978-3-031-23712-6
ISBN 978-3-031-23713-3 (eBook)
https://doi.org/10.1007/978-3-031-23713-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This book is published under the imprint Birkhäuser, www.birkhauser-science.com, by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

To my parents, Thea and Luciano

Preface

This book brings together the classical topics of mathematical analysis normally taught in the first two years of a university course. It is the outcome of the lessons I have been teaching for many years in the undergraduate courses in mathematics, physics, and engineering at my university.

Many excellent books on mathematical analysis have already been written, so a natural question to ask is: Why write another book on this subject? I will try to provide a brief answer. The main novelty of this book lies in the treatment of the theory of the integral. The theory of Kurzweil and Henstock is presented here in Chaps. 7, 9, and 11. Compared to Riemann's theory, it requires a modest additional effort from the student, an effort repaid with significant benefits. Indeed, it encompasses Lebesgue's theory: a function is integrable according to Lebesgue if and only if both the function and its absolute value are integrable according to Kurzweil and Henstock. In this theory, the Fundamental Theorem turns out to be very general and natural, and it finds its generalization in Taylor's formula with an integral remainder, under only essential hypotheses. Moreover, the improper integral turns out to be a normal integral.

Despite this, no preliminary knowledge of the integral is demanded of the student reading this book. Students will be guided as they construct all the necessary mathematical tools, starting from the very beginning. Indeed, the book opens with some preliminaries on logic and set theory: a short vademecum, without formal rigor, that will help readers orient themselves with respect to the notation used later on. In Chap. 1, we introduce the main sets on which we base the rest of the theory: the number sets, mainly R and C, the space R^N, and metric spaces.
It is in this general context that the concepts of continuity and limit will be developed in Chaps. 2 and 3, respectively. The discussion of numerical series, together with the series of functions, will be postponed to Chap. 8. Chapter 4 is dedicated to the notions of compactness and completeness. Although they seem to be rather abstract concepts, they happen to be necessary for a rigorous treatment of differential and integral calculus. I would like to make special mention here of the original construction of the exponential function and the trigonometric functions, which I propose in Chap. 5.


They are introduced as particular cases of a function with complex values that is constructed with elementary geometric tools. Differential calculus is first developed in Chap. 6 for functions of one real variable and, later in Chap. 10, for functions of several variables. Here the reader will find the implicit function theorem proved by induction, as in the original proofs by Dini and Genocchi–Peano. As already noted, integral calculus is presented in Chaps. 7, 9, and 11. This approach to the integral was introduced independently by J. Kurzweil [5] in 1957 and R. Henstock [3] in 1961. In Chap. 12, the theory of differential forms and their integrals on M-surfaces and on differentiable M-manifolds is developed in detail, following the approach of Spivak [6]. The Stokes–Cartan theorem, with its classic corollaries the curl and divergence theorems, is the final result. Besides being a fundamental tool in applications, it stands out for its elegance and formal perfection, like the most sublime works of art.

The book can be used at different teaching levels, in line with the preferences of the teacher. As mentioned earlier, I have proposed it as an early postsecondary text, but it could also serve in an advanced course in analysis, or for scholars wishing to approach the Kurzweil–Henstock integral starting from a simple presentation. Unlike the majority of books on these subjects, this one contains almost no exercises, since many textbooks consisting entirely of exercises have already been published, and these complement this book perfectly. An example is Solving Problems in Mathematical Analysis by T. Radożycki, published by Springer in 2020, which is divided into three volumes:

– Part I. Sets, Functions, Limits, Derivatives, Integrals, Sequences, and Series;
– Part II. Definite, Improper and Multidimensional Integrals, Functions of Several Variables, and Differential Equations;
– Part III. Curves and Surfaces, Conditional Extremes, Curvilinear Integrals, Complex Functions, Singularities, and Fourier Series.

A list of other recent textbooks with solved exercises is provided in the bibliography. Finally, I would like to mention that this book would never have been written without the strong motivation provided by my students. I thank them all, and I hope that the book will encourage others in the future to become involved in such a beautiful and fruitful theory as mathematical analysis.

Trieste, Italy

Alessandro Fonda


Preliminaries

This preliminary section aims to introduce the main language and notation used in the book. Logic and set theory are treated informally, without aiming for the highest mathematical rigor. Indeed, a rigorous treatment would require a solid background in mathematics, which students just starting their college career do not usually possess.

The Symbols of Logic

In mathematical language, we usually deal with propositions, indicated by P, Q, and so forth. Moreover, we are accustomed to combining them in different ways, for example,

P and Q ,    P or Q ,    P ⇒ Q ,    P ⇔ Q .

Let us explain the meaning of these. We start with

P and Q .

It is true if both P and Q are true; otherwise, it is false. We can draw a table where all four cases are exemplified:¹

P | Q | P and Q
T | T |    T
T | F |    F
F | T |    F
F | F |    F

¹ In all tables, T means that a proposition is true, F that it is false.


Let us now consider

P or Q .

It is true if at least one of the two is true, and false when both P and Q are false. Here is the corresponding table:

P | Q | P or Q
T | T |   T
T | F |   T
F | T |   T
F | F |   F

Let us now analyze

P ⇒ Q .

It is false only when P is true and Q is false; in all other cases it is true. Here is its table:

P | Q | P ⇒ Q
T | T |   T
T | F |   F
F | T |   T
F | F |   T

Let us conclude with

P ⇔ Q .

It is true if both are true or if both are false; otherwise, it is false. And here is its table:

P | Q | P ⇔ Q
T | T |   T
T | F |   F
F | T |   F
F | F |   T
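All four tables follow the same pattern, and they can be reproduced mechanically. Here is a small Python sketch (our own illustration, not part of the book) that prints each connective's table by enumerating the four truth assignments:

```python
from itertools import product

# Each connective as a Boolean function of (P, Q); "=>" is material
# implication and "<=>" is logical equivalence.
CONNECTIVES = {
    "P and Q": lambda p, q: p and q,
    "P or Q":  lambda p, q: p or q,
    "P => Q":  lambda p, q: (not p) or q,
    "P <=> Q": lambda p, q: p == q,
}

def tf(value):
    """Abbreviate a truth value as in the tables above."""
    return "T" if value else "F"

for name, op in CONNECTIVES.items():
    print(name)
    for p, q in product([True, False], repeat=2):
        print(f"  {tf(p)} {tf(q)} | {tf(op(p, q))}")
```

Running it lists, for each connective, the same four rows as the tables above.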

It is very important to be able to logically deny a proposition. The negation of P will be denoted by ¬ P (read as "not P"): it is true when P is false, and vice versa.


For example, we have the following De Morgan rules:

¬ (P and Q)   is equivalent to   ¬ P or ¬ Q ,
¬ (P or Q)    is equivalent to   ¬ P and ¬ Q .

It is possible, moreover, to verify that

P ⇒ Q   is equivalent to   ¬ P or Q .

Consequently,

¬ (P ⇒ Q)   is equivalent to   P and ¬ Q .
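These equivalences, like the De Morgan rules, can be verified by exhausting the four truth assignments. A minimal Python check (our own sketch, not the book's):

```python
from itertools import product

def implies(p, q):
    """Material implication: P => Q, defined as (not P) or Q."""
    return (not p) or q

for p, q in product([True, False], repeat=2):
    # De Morgan rules
    assert (not (p and q)) == ((not p) or (not q))
    assert (not (p or q)) == ((not p) and (not q))
    # Since P => Q is equivalent to (not P) or Q, its negation is P and (not Q)
    assert (not implies(p, q)) == (p and (not q))

print("all equivalences hold in every case")
```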

Logical Propositions

Our propositions will often involve one or more "variables." For example, we could write them as P(x), which contains the variable x. In this case we will typically find the following two types of propositions. The first one,

∀x : P(x) ,

means "for every x one has that P(x) is true." The second,

∃x : P(x) ,

means "there exists at least one x for which P(x) is true." Let us see how their negations can be formulated. One has that

¬ (∀x : P(x))   is equivalent to   ∃x : ¬ P(x)

and

¬ (∃x : P(x))   is equivalent to   ∀x : ¬ P(x) .
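When x ranges over a finite set, the universal and existential quantifiers correspond to Python's all and any, and the two negation rules can be observed directly. A small sketch (the set xs and the predicate P are arbitrary choices for this example):

```python
xs = [0, 1, 2, 3, 4]
P = lambda x: x % 2 == 0  # hypothetical predicate: "x is even"

# "forall x: P(x)" is all(...), and "exists x: P(x)" is any(...).
forall_P = all(P(x) for x in xs)
exists_P = any(P(x) for x in xs)

# not (forall x: P(x))  is equivalent to  exists x: not P(x)
assert (not forall_P) == any(not P(x) for x in xs)

# not (exists x: P(x))  is equivalent to  forall x: not P(x)
assert (not exists_P) == all(not P(x) for x in xs)
```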

To be more precise, these x will be assumed to be the elements of some set. Thus, this leads us to a brief review of the theory of sets.


The Language of Set Theory

First Symbols

We are more or less familiar with some numerical sets like, for example,

N , the set of natural numbers;
Z , the set of integer numbers;
Q , the set of rational numbers;
R , the set of real numbers;
C , the set of complex numbers.

Their nature will be further studied as we progress through the book, and several other sets will be introduced later. To treat sets correctly, we need to develop a proper language. This is why we will now introduce some symbols, explaining their meaning. Let us first introduce the symbol ∈ . Writing

a ∈ A

means "a belongs to the set A" or "a is an element of A." Its negation is written a ∉ A and reads "a does not belong to A" or "a is not an element of A." For example, let A = {1, 2, 3} be the set² whose elements are the three natural numbers 1, 2, and 3. We clearly have

1 ∈ A ,    2 ∈ A ,    3 ∈ A ,

whereas

4 ∉ A ,    1/2 ∉ A ,    π ∉ A .

Let us now present the symbol ⊆. We will write

A ⊆ B

and read “A is contained in B” whenever every element of A is also an element of B. In symbols,

x ∈ A  ⇒  x ∈ B.

For example, if, as previously, A = {1, 2, 3}, we have that A ⊆ N, but also A ⊆ R.

² In this example, the set A is defined by listing its elements, which are finite in number.


If A ⊆ B, we also say that “A is a subset of B,” and we can also write B ⊇ A. The negation of A ⊆ B is written A ⊈ B or B ⊉ A, and we read this as “A is not contained in B” or “B does not contain A.” We say that two sets A and B are “equal” if they coincide, i.e., if they have the same elements; in such a case we will write A = B. Therefore,

A = B  ⇔  A ⊆ B and B ⊆ A.

The negation of A = B is written A ≠ B; in this case, we say that A and B are different, i.e., they do not coincide. Let us emphasize the following “order relation” properties:

• A ⊆ A;
• A ⊆ B and B ⊆ A ⇒ A = B;
• A ⊆ B and B ⊆ C ⇒ A ⊆ C.

We end this section by introducing a very peculiar set, the “empty set,” which is a set having no elements. It is denoted by the symbol Ø.

It is convenient to consider Ø as a subset of any other set, i.e.,

Ø ⊆ A,  for any A.

Some Examples of Sets

Let us begin with the simplest sets, those having a single element, for example,

A = {3},  A = {N},  A = {Ø}.

The first one is a set having the number 3 as its only element. The second one has the single element N, and the third one has only the element Ø. We thus observe that the elements of a set may themselves be sets. We could have sets of the type

A = {N, Q},  A = {Ø, Z, R},  A = {{π}, {1, 2, 3}, N}

and of the type

A = {3, {3}, N, {N, Q}}.

xviii

Preliminaries

In this last case, one must be careful with symbols: we see that 3 ∈ A, hence {3} ⊆ A, but also {3} ∈ A, since {3} itself is an element of A. Let us also consider, as a last example, the set

A = {Ø, {Ø}}.

We have that Ø ∈ A, since Ø is one of the elements of A, and hence {Ø} ⊆ A. But we also have that {Ø} ∈ A, since {Ø} is an element of A, and hence {{Ø}} ⊆ A. We also recall that Ø ⊆ A, since this is true for every set.

Operations with Sets

It is normal practice to choose a “universal set” where we operate. We will denote it by E. All the objects we will speak of necessarily lie in this set. We define the “intersection” of two sets A and B: it is the set³

A ∩ B = {x : x ∈ A and x ∈ B},

whose elements belong to both sets. Notice that the intersection could also be the empty set: in that case, we say that A and B are “disjoint.” On the other hand, the “union” of two sets A and B is the set

A ∪ B = {x : x ∈ A or x ∈ B},

whose elements belong to at least one of the two sets, and possibly to both. The “difference” of the two sets A and B is the set

A \ B = {x : x ∈ A and x ∉ B},

whose elements belong to the first set but not to the second. In particular, the set E \ A is said to be “complementary” to A and is denoted by CA. Hence,

CA = {x : x ∉ A}.

The following De Morgan rules hold true:

C(A ∩ B) = CA ∪ CB,   C(A ∪ B) = CA ∩ CB.

The “product” of the two sets A and B is the set

A × B = {(a, b) : a ∈ A, b ∈ B},

³ Here the sets are defined by specifying the properties that their elements must possess.


whose elements are the “ordered pairs” (a, b), with an element of A in the first position and an element of B in the second.
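Python's built-in sets mirror these operations directly, which makes the definitions easy to experiment with. A small sketch, with E, A, and B chosen arbitrarily for illustration:

```python
from itertools import product

E = set(range(10))        # a small universal set
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

inter = A & B             # intersection A ∩ B
union = A | B             # union A ∪ B
diff = A - B              # difference A \ B
CA = E - A                # complement of A in E

# De Morgan rules for sets
assert E - (A & B) == (E - A) | (E - B)
assert E - (A | B) == (E - A) & (E - B)

# the product A × B as a set of ordered pairs
AxB = set(product(A, B))
assert (1, 5) in AxB and (5, 1) not in AxB   # order matters in a pair
```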

The Concept of Function

A “function” (sometimes also called a “mapping”) is defined by assigning three sets:

• a set A, the “domain” of the function;
• a set B, the “codomain” of the function;
• a set G ⊆ A × B, the “graph” of the function, having the following property: for every a ∈ A there is a unique b ∈ B such that (a, b) ∈ G.

A function defined in such a way is usually written f : A → B (read “f from A to B”). To each element a of the domain there is associated a well-determined element b of the codomain: such a b will be denoted by f(a), and we will write a ↦ f(a). We thus have that

(a, b) ∈ G  ⇔  b = f(a),

i.e.,

G = {(a, b) ∈ A × B : b = f(a)} = {(a, f(a)) : a ∈ A}.

For example, the function f : N → R, defined as f(n) = n/(n + 1), associates to every n ∈ {0, 1, 2, 3, . . . } the corresponding value n/(n + 1), i.e.,

n ↦ n/(n + 1).

We will thus have that

0 ↦ 0,  1 ↦ 1/2,  2 ↦ 2/3,  3 ↦ 3/4,  ...

Note that the values of this function are all rational numbers, so we could have used the same formula to define a function f : N → Q. Such a function, however, is not the same as the previous one, since they do not have the same codomain. A function whose domain is the set N of natural numbers is also called a “sequence,” and a different notation is usually preferred: if s : N → B is such a sequence, then instead of s(n) it is customary to write s_n, and the sequence itself is denoted by (s_n)_n.


The function f : R → R, defined by f(x) = x², associates to every x ∈ R its square. Notice that

f(−x) = f(x),  for every x ∈ R.

We will say that such a function is “even.” If, instead, as for the function f : R → R defined by f(x) = x³, one has that

f(−x) = −f(x),  for every x ∈ R,

we will say that such a function is “odd.” Clearly, a function could very well be neither even nor odd.

Sometimes it could be useful to use the notation f(·) instead of just f. For example, if g : R × R → R is a given function associating to each (x, y) ∈ R² a real number g(x, y), then by g(·, y) : R → R we will denote the function x ↦ g(x, y), for any fixed y ∈ R.

The “image” of the function f : A → B is the set

f(A) = {f(a) : a ∈ A},

and, in general, for every set U ⊆ A we can write

f(U) = {f(a) : a ∈ U};

it is the image of f|_U, the restriction of f to the domain U.

The “composition” of two functions f : A → B and g : B → C is the function g ∘ f : A → C defined as

(g ∘ f)(a) = g(f(a)).

It could also be defined by assuming only that the image of f is a subset of the domain of g.

A function f : A → B is said to be

• “injective” if a₁ ≠ a₂ ⇒ f(a₁) ≠ f(a₂);
• “surjective” if f(A) = B;
• “bijective” if it is both injective and surjective.

If f : A → B is bijective, then for every b ∈ B there is an a ∈ A such that f(a) = b (f is surjective), and such an element a is unique (f is injective). One can thus define a function from B to A that associates to every b ∈ B the unique element a ∈ A such that f(a) = b. This is the so-called “inverse function” of f : A → B, and it is usually denoted by f⁻¹ : B → A. Thus,

f(a) = b  ⇔  a = f⁻¹(b).


The word “bijective” will thus be synonymous with “invertible.” Notice that, for every a ∈ A and b ∈ B,

f⁻¹(f(a)) = a,   f(f⁻¹(b)) = b.

For any function f, whether invertible or not, given a set V ⊆ B, it is common practice to write

f⁻¹(V) = {a ∈ A : f(a) ∈ V}.

This is the so-called “counterimage set” of V; it is composed of those elements a of A whose associated element f(a) belongs to V.

To conclude this brief presentation, let us recall that, given two functions f : A → B and g : A → B, if the codomain B has an addition operation, we can define the function f + g : A → B as follows:

(f + g)(a) = f(a) + g(a).

Similar definitions can be given for the difference, product, and quotient of two functions:

(f − g)(a) = f(a) − g(a),
(fg)(a) = f(a) g(a),
(f/g)(a) = f(a)/g(a).
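For functions with finite domains, these notions can be modeled concretely with Python dictionaries; the inverse of a bijection is obtained by swapping keys and values. The names below (`f`, `g`, `counterimage`, etc.) are illustrative only:

```python
f = {1: 'a', 2: 'b', 3: 'c'}            # f : {1,2,3} -> {'a','b','c'}, bijective
f_inv = {b: a for a, b in f.items()}    # the inverse function f^{-1}

assert all(f_inv[f[a]] == a for a in f)        # f^{-1}(f(a)) = a
assert all(f[f_inv[b]] == b for b in f_inv)    # f(f^{-1}(b)) = b

# The counterimage set works for any function, invertible or not:
g = {1: 'a', 2: 'a', 3: 'b'}            # not injective
def counterimage(func, V):
    return {a for a, b in func.items() if b in V}
assert counterimage(g, {'a'}) == {1, 2}

# Pointwise sum of two real-valued functions:
h1 = lambda x: x ** 2
h2 = lambda x: x ** 3
h_sum = lambda x: h1(x) + h2(x)         # (h1 + h2)(x) = h1(x) + h2(x)
assert h_sum(2) == 12
```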

Part I The Basics of Mathematical Analysis

1 Sets of Numbers and Metric Spaces

In this chapter, we introduce the main settings where all the theory will be developed. First, we discuss the sets of numbers N, Z, Q, R, and C, then the space R^N, and, finally, abstract metric spaces.

1.1 The Natural Numbers and the Induction Principle

In 1889 Giuseppe Peano, in his fundamental paper “Arithmetices principia, nova methodo exposita,” provided an axiomatic description of the set of natural numbers N. We briefly state those axioms as follows:

(a) There exists an element, called “zero,” denoted by 0.
(b) Every element n has a “successor” n′.
(c) 0 is not the successor of any element.
(d) Different elements have different successors.
(e) Induction principle: If S is a subset of N such that
    (i) 0 ∈ S,
    (ii) n ∈ S ⇒ n′ ∈ S,
    then S = N.

It is tacitly understood that condition (ii) must be verified for any n ∈ N. We may therefore read it in the following way:

(ii) If, for some n, we have that n ∈ S, then also n′ ∈ S.

We then introduce the familiar symbols 0′ = 1, 1′ = 2, 2′ = 3, and so on. From these few axioms, making use of set theory, Peano showed how to recover all the properties of the natural numbers. In particular, we can define addition and

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Fonda, A Modern Introduction to Mathematical Analysis, https://doi.org/10.1007/978-3-031-23713-3_1


multiplication such that

n′ = n + 1.

Moreover, writing m ≤ n whenever there exists a p ∈ N such that m + p = n, we obtain an order relation. We will assume here that all the properties of addition, multiplication, and the order relation defined on N are well known.

1.1.1 Recursive Definitions

The induction principle can be used to define a sequence of objects

A_0, A_1, A_2, A_3, ...

We proceed in the following way:

(j) We define A_0.
(jj) Assuming that A_n has already been defined, for some n, we define A_{n+1}.

In this way, if we denote by S the set of those n for which A_n is well defined, it is easy to see that such a set S satisfies (i) and (ii) in the induction principle. Hence, S coincides with N, meaning that every A_n is well defined.

For example, we can define the “powers” a^n by setting

(j) a^0 = 1,
(jj) a^{n+1} = a · a^n.

We then verify that¹

a^1 = a · a^0 = a · 1 = a,
a^2 = a · a^1 = a · a,
a^3 = a · a^2 = a · a · a,
a^4 = a · a^3 = a · a · a · a,
...

Henceforth, we will assume that all elementary properties of powers are well known.

¹ If a = 0, it is sometimes a subtle matter to define 0^0. However, in this book we will always assume that 0^0 = 1.


Let us now define the “factorial” n! by setting

(j) 0! = 1,
(jj) (n + 1)! = (n + 1) · n!.

We then see that

1! = 1 · 0! = 1 · 1 = 1,
2! = 2 · 1! = 2 · 1,
3! = 3 · 2! = 3 · 2 · 1,
4! = 4 · 3! = 4 · 3 · 2 · 1,
...

Finally, let us define the “summation” of α_0, α_1, . . . , α_n using the notation

∑_{k=0}^{n} α_k,

which reads “the sum of α_k when k goes from 0 to n.” We set

∑_{k=0}^{0} α_k = α_0,

and, assuming that ∑_{k=0}^{n} α_k has been defined for some n, we set

∑_{k=0}^{n+1} α_k = ∑_{k=0}^{n} α_k + α_{n+1}.

In the preceding notation an index appears, denoted by k; it takes all the integer values between 0 and n. Informally,

∑_{k=0}^{n} α_k = α_0 + α_1 + α_2 + ··· + α_n.

The fact that the index is denoted by the letter k is unimportant; we can use any other letter or symbol to denote it, for instance

∑_{j=0}^{n} α_j,   ∑_{ℓ=0}^{n} α_ℓ,   ∑_{m=0}^{n} α_m,   ...


Notice that the same sum could be written as

∑_{k=1}^{n+1} α_{k−1},   ∑_{k=2}^{n+2} α_{k−2},   ...,   ∑_{k=m}^{n+m} α_{k−m},

or even

∑_{k=0}^{n} α_{n−k}.

As you can see, the use of the summation symbol has many variants, and we will sometimes need them in what follows.
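The recursive definitions (j)/(jj) of this section translate almost verbatim into code. A sketch, with function names of our own choosing:

```python
def power(a, n):
    # a^0 = 1 (including the book's convention 0^0 = 1); a^{n+1} = a * a^n
    return 1 if n == 0 else a * power(a, n - 1)

def factorial(n):
    # 0! = 1; (n+1)! = (n+1) * n!
    return 1 if n == 0 else n * factorial(n - 1)

def summation(alpha, n):
    # sum of alpha(0), ..., alpha(n), defined by the same recursion as the text
    return alpha(0) if n == 0 else summation(alpha, n - 1) + alpha(n)

assert power(2, 10) == 1024
assert power(0, 0) == 1
assert factorial(4) == 24
assert summation(lambda k: k, 10) == 55
# reversing the index (alpha_{n-k} instead of alpha_k) leaves the sum unchanged
assert summation(lambda k: 10 - k, 10) == 55
```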

1.1.2 Proofs by Induction

The induction principle can also be used to prove a sequence of propositions

P_0, P_1, P_2, P_3, ...

We must proceed as follows:

(j) We verify P_0.
(jj) Assuming the truth of P_n for some n, we verify P_{n+1}.

In this way, denoting by S the set of those n for which P_n is true, S verifies both (i) and (ii) in the induction principle. Hence, S coincides with N, so all propositions P_n are true. We now provide some examples.

Example 1 We want to prove the Bernoulli inequality

P_n :  (1 + a)^n ≥ 1 + na.

We first see that P_0 is true, since surely (1 + a)^0 ≥ 1 + 0 · a. Let us now assume that P_n is true for some n; under this assumption, we need to verify P_{n+1}. Indeed, we have

(1 + a)^{n+1} = (1 + a)^n (1 + a) ≥ (1 + na)(1 + a) = 1 + (n + 1)a + na² ≥ 1 + (n + 1)a,

hence P_{n+1} is also true. In conclusion, P_n is true for every n ∈ N.

Remark 1.1 In this section we are dealing with natural numbers; however, the Bernoulli inequality is true as well for any real number a ≥ −1, and the proof


is exactly the same. Similar remarks could be made for the other formulas in the following discussion.

Example 2 The following properties of summation can be proven by induction:

∑_{k=0}^{n} (α_k + β_k) = ∑_{k=0}^{n} α_k + ∑_{k=0}^{n} β_k,

which informally reads

(α_0 + β_0) + (α_1 + β_1) + ··· + (α_n + β_n) = (α_0 + α_1 + ··· + α_n) + (β_0 + β_1 + ··· + β_n),

and

∑_{k=0}^{n} (Cα_k) = C (∑_{k=0}^{n} α_k),

which informally reads

Cα_0 + Cα_1 + ··· + Cα_n = C(α_0 + α_1 + ··· + α_n).

Let us prove, for instance, the first one. We first verify that it holds for n = 0, with

∑_{k=0}^{0} (α_k + β_k) = α_0 + β_0 = ∑_{k=0}^{0} α_k + ∑_{k=0}^{0} β_k.

Assuming now that the formula is true for some n, we have

∑_{k=0}^{n+1} (α_k + β_k) = ∑_{k=0}^{n} (α_k + β_k) + (α_{n+1} + β_{n+1})
  = ∑_{k=0}^{n} α_k + ∑_{k=0}^{n} β_k + (α_{n+1} + β_{n+1})
  = (∑_{k=0}^{n} α_k + α_{n+1}) + (∑_{k=0}^{n} β_k + β_{n+1}) = ∑_{k=0}^{n+1} α_k + ∑_{k=0}^{n+1} β_k,

so that the formula also holds for n + 1. The proof is thus complete.
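Induction proves these identities for all n; a numerical spot-check on random data (an illustration, not a proof) is still a useful sanity test. A sketch covering the two summation rules and the Bernoulli inequality:

```python
import random

random.seed(0)
for _ in range(100):
    n = random.randint(0, 20)
    alpha = [random.randint(-9, 9) for _ in range(n + 1)]
    beta = [random.randint(-9, 9) for _ in range(n + 1)]
    C = random.randint(-5, 5)
    # additivity and homogeneity of the summation
    assert sum(a + b for a, b in zip(alpha, beta)) == sum(alpha) + sum(beta)
    assert sum(C * a for a in alpha) == C * sum(alpha)

# Bernoulli inequality for natural a, as in Example 1
for a in range(0, 10):
    for n in range(0, 10):
        assert (1 + a) ** n >= 1 + n * a
```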


Example 3 The following formula involves a “telescopic sum”:

∑_{k=0}^{n} (a_{k+1} − a_k) = a_{n+1} − a_0,

which can be visualized as

(a_1 − a_0) + (a_2 − a_1) + (a_3 − a_2) + ··· + (a_n − a_{n−1}) + (a_{n+1} − a_n) = a_{n+1} − a_0,

where all the intermediate terms cancel in pairs. It can also be proved by induction.

It can also be proved by induction. Example 4 Let us prove the identity Pn :

a n+1 − b n+1 = (a − b)

n 

.

 a k bn−k .

k=0

We first verify .P0 , i.e., a 0+1 − b 0+1 = (a − b)a 0b0−0 ,

.

which is clearly true. Assume now .Pn to be true for some .n ∈ N; then (a − b)

n+1 

.

a k bn+1−k



= (a − b)

k=0

n 

 a k bn+1−k + (a − b)a n+1 b0

k=0

= (a − b)

n 

 a k bn−k b + (a − b)a n+1

k=0

= (a n+1 − b n+1 )b + (a − b)a n+1 = a n+2 − b n+2 , so that .Pn+1 is also true. Thus, we have proved that .Pn is true for every .n ∈ N. As particular cases of the preceding formula we have .

a 2 − b 2 = (a − b)(a + b) , a 3 − b 3 = (a − b)(a 2 + ab + b2 ) , a 4 − b 4 = (a − b)(a 3 + a 2 b + ab2 + b 3 ) , a 5 − b 5 = (a − b)(a 4 + a 3 b + a 2 b2 + ab3 + b4) , ...


Notice also the formula

∑_{k=0}^{n} a^k = (a^{n+1} − 1)/(a − 1),

which holds for any a ≠ 1, obtained from the preceding formula by taking b = 1.

In some cases it could be useful to start the sequence of propositions, e.g., from P_1 instead of P_0, or from any other of them, say P_n̄. However, one can always reduce to the previous case by a shift of the indices, so that the principle of induction indeed remains of the same nature. Briefly, to prove the propositions

P_n̄, P_{n̄+1}, P_{n̄+2}, P_{n̄+3}, ...,

we verify the first proposition P_n̄ and then, assuming that P_n is true for some n ≥ n̄, we verify P_{n+1}.

As an exercise, the reader could try to prove by induction the following identities:

1 + 2 + 3 + ··· + n = n(n + 1)/2,
1² + 2² + 3² + ··· + n² = n(n + 1)(2n + 1)/6,
1³ + 2³ + 3³ + ··· + n³ = n²(n + 1)²/4.

Notice the beautiful equality

1³ + 2³ + 3³ + ··· + n³ = (1 + 2 + 3 + ··· + n)².
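Before attempting the induction proofs, the exercise identities and the geometric sum can be checked for small n; exact rational arithmetic avoids rounding issues in the latter. A sketch:

```python
from fractions import Fraction

for n in range(1, 50):
    ks = range(1, n + 1)
    assert sum(ks) == n * (n + 1) // 2
    assert sum(k * k for k in ks) == n * (n + 1) * (2 * n + 1) // 6
    assert sum(k ** 3 for k in ks) == n ** 2 * (n + 1) ** 2 // 4
    # the "beautiful equality"
    assert sum(k ** 3 for k in ks) == sum(ks) ** 2

# geometric sum, with any a != 1 (here an arbitrary rational)
a = Fraction(3, 2)
for n in range(0, 20):
    assert sum(a ** k for k in range(n + 1)) == (a ** (n + 1) - 1) / (a - 1)
```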

1.1.3 The Binomial Formula

Let us define, for any pair of natural numbers n, k such that k ≤ n, the binomial coefficients

\binom{n}{k} = n! / (k!(n − k)!).

The following identity holds:

\binom{n}{k−1} + \binom{n}{k} = \binom{n+1}{k}.


Indeed,

\binom{n}{k−1} + \binom{n}{k} = n!/((k − 1)!(n − k + 1)!) + n!/(k!(n − k)!)
  = (n!k + n!(n − k + 1)) / (k!(n − k + 1)!)
  = n!(n + 1) / (k!(n − k + 1)!) = (n + 1)! / (k!((n + 1) − k)!).

It is sometimes useful to represent the binomial coefficients in the so-called Pascal triangle, whose nth row lists \binom{n}{0}, \binom{n}{1}, ..., \binom{n}{n}, and which we can explicitly write as

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
...


We will now prove, for every n ∈ N, the binomial formula

P_n :  (a + b)^n = ∑_{k=0}^{n} \binom{n}{k} a^{n−k} b^k.

It will be necessary to prove separately the case n = 0 and then start the induction from n = 1. If n = 0, then

(a + b)^0 = \binom{0}{0} a^{0−0} b^0,

and the formula holds. Assuming now n ≥ 1, we proceed by induction. We first see that the formula holds when n = 1:

(a + b)^1 = \binom{1}{0} a^{1−0} b^0 + \binom{1}{1} a^{1−1} b^1.

Now, assuming that P_n is true for some n ≥ 1, we prove that P_{n+1} is also true:

(a + b)^{n+1} = (a + b)(a + b)^n
  = (a + b) ∑_{k=0}^{n} \binom{n}{k} a^{n−k} b^k
  = a ∑_{k=0}^{n} \binom{n}{k} a^{n−k} b^k + b ∑_{k=0}^{n} \binom{n}{k} a^{n−k} b^k
  = ∑_{k=0}^{n} \binom{n}{k} a^{n−k+1} b^k + ∑_{k=0}^{n} \binom{n}{k} a^{n−k} b^{k+1}
  = a^{n+1} + ∑_{k=1}^{n} \binom{n}{k} a^{n−k+1} b^k + ∑_{k=0}^{n−1} \binom{n}{k} a^{n−k} b^{k+1} + b^{n+1}
  = a^{n+1} + ∑_{k=1}^{n} \binom{n}{k} a^{n−k+1} b^k + ∑_{k=1}^{n} \binom{n}{k−1} a^{n−(k−1)} b^{(k−1)+1} + b^{n+1}
  = a^{n+1} + ∑_{k=1}^{n} (\binom{n}{k} + \binom{n}{k−1}) a^{n−k+1} b^k + b^{n+1}
  = a^{n+1} + ∑_{k=1}^{n} \binom{n+1}{k} a^{n−k+1} b^k + b^{n+1}
  = ∑_{k=0}^{n+1} \binom{n+1}{k} a^{n+1−k} b^k.

We have thus proved by induction that P_n is true for every n ∈ N. As particular cases of the binomial formula we have

(a + b)² = a² + 2ab + b²,
(a + b)³ = a³ + 3a²b + 3ab² + b³,
(a + b)⁴ = a⁴ + 4a³b + 6a²b² + 4ab³ + b⁴,
(a + b)⁵ = a⁵ + 5a⁴b + 10a³b² + 10a²b³ + 5ab⁴ + b⁵,
...

1.2 The Real Numbers

Starting from the set of natural numbers

N = {0, 1, 2, 3, . . . },

by the use of set theory arguments it is possible first to construct the set of integer numbers

Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . . }

and then the set of rational numbers

Q = {m/n : m ∈ Z, n ∈ N, n ≠ 0}.

This set has a lot of nice features from an algebraic point of view. Let us briefly review them.

1. An “order relation” ≤ is defined, with the following properties. For every choice of x, y, z:
   (a) x ≤ x.
   (b) [x ≤ y and y ≤ x] ⇒ x = y.
   (c) [x ≤ y and y ≤ z] ⇒ x ≤ z.
   Moreover, such an order relation is “total,” since any two elements x and y are comparable:


   (d) x ≤ y or y ≤ x.
   If x ≤ y, we will also write y ≥ x. If x ≤ y and y ≠ x, we will write x < y, or y > x.

2. An addition operation + is defined, with the following properties. For any choice of x, y, z:
   (a) (Associative) x + (y + z) = (x + y) + z.
   (b) There exists an “identity element” 0: we have x + 0 = x = 0 + x.
   (c) Every x has an “inverse element” −x: we have x + (−x) = 0 = (−x) + x.
   (d) (Commutative) x + y = y + x.
   (e) If x ≤ y, then x + z ≤ y + z.

3. A multiplication operation · is defined, with the following properties. For any choice of x, y, z:
   (a) (Associative) x · (y · z) = (x · y) · z.
   (b) There exists an “identity element” 1: we have x · 1 = x = 1 · x.
   (c) If x ≠ 0, then x has an “inverse element” x⁻¹: we have x · x⁻¹ = 1 = x⁻¹ · x.
   (d) (Commutative) x · y = y · x.
   (e) If x ≤ y and z ≥ 0, then x · z ≤ y · z.
   And a property involving both operations:
   (f) (Distributive) x · (y + z) = (x · y) + (x · z).

A set satisfying the foregoing properties is called an “ordered field.” The set Q is, in some sense, the smallest ordered field. We will often omit the symbol · in multiplication. Moreover, we adopt the usual notations, writing z = y − x if z + x = y, and z = y/x if zx = y, with x ≠ 0. In particular, x⁻¹ = 1/x.

We rediscover the set N as a subset of Q. Indeed, 0 and 1 are the identity elements of addition and multiplication, respectively, and then we have 2 = 1 + 1, 3 = 2 + 1, and so on.

Besides its nice algebraic properties, the set of rational numbers Q is not rich enough to deal with such an elementary geometric problem as measuring the diagonal of a square whose side has length 1, as the following theorem states.

Theorem 1.2 There is no rational number x such that x² = 2.

Proof By contradiction, assume that there exist m, n ∈ N different from 0 such that

(m/n)² = 2,

i.e., m² = 2n². Then m needs to be even, and so there exists a nonzero m₁ ∈ N such that m = 2m₁. We thus have 4m₁² = 2n², i.e., 2m₁² = n². But then n also needs to be even, and so there exists a nonzero n₁ ∈ N such that n = 2n₁. Hence,

m/n = m₁/n₁   and   (m₁/n₁)² = 2.


We can now repeat the argument as many times as we want, continuing the division by 2 of numerator and denominator:

m/n = m₁/n₁ = m₂/n₂ = m₃/n₃ = ··· = m_k/n_k = ...,

where m_k and n_k are nonzero natural numbers such that m = 2^k m_k and n = 2^k n_k. Then, since n_k ≥ 1, we have that n ≥ 2^k for any natural number k ≥ 1. In particular, n ≥ 2^n. But the Bernoulli inequality tells us that 2^n = (1 + 1)^n ≥ 1 + n, and all this implies that n ≥ 1 + n, which is clearly false. □

Therefore, one feels the need to further extend the set Q so as to be able to deal with this kind of problem. It is indeed possible to construct a set R, containing Q, which is an ordered field (hence satisfies properties (1), (2), and (3)) and moreover satisfies the following property.

4. Separation Property. Given two nonempty subsets A, B of R such that

∀a ∈ A ∀b ∈ B :  a ≤ b,

there exists an element c ∈ R such that

∀a ∈ A ∀b ∈ B :  a ≤ c ≤ b.

Mathematical analysis is based on the set R. We will assume that the reader is familiar with its elementary algebraic properties.

1.2.1 Supremum and Infimum

In this section we analyze some fundamental tools in R. Let us start with some definitions. A subset E of R is said to be “bounded from above” if there exists α ∈ R such that, for every x ∈ E, we have x ≤ α; such a number α is then an “upper bound” of E. If, moreover, α ∈ E, then we will say that α is the “maximum” of E, and we will write α = max E. Analogously, the set E is said to be “bounded from below” if there exists β ∈ R such that, for every x ∈ E, we have x ≥ β; such a number β is then a “lower bound” of E. If, moreover, we have that β ∈ E, then we will say that β is the “minimum” of E, and we will write β = min E. The set E is said to be “bounded” if it is bounded both from above and from below.

Some remarks are in order. The maximum, when it exists, is unique. However, a set could be bounded from above without having a maximum, as the example E = {x ∈ R : x < 0} shows. Similar considerations can be made for the minimum.


Theorem 1.3 If E is nonempty and bounded from above, then the set of all upper bounds of E has a minimum.

Proof Let B be the set of all upper bounds of E. Then

∀a ∈ E ∀b ∈ B :  a ≤ b,

and by the separation property there exists a c ∈ R such that

∀a ∈ E ∀b ∈ B :  a ≤ c ≤ b.

This means that c is an upper bound of E, hence c ∈ B, and it is also a lower bound of B. Hence, c = min B. □

If E is nonempty and bounded from above, the smallest upper bound of E is called the “supremum” of E: it is a real number s ∈ R that will be denoted by sup E. It is characterized by the following two properties:

(i) ∀x ∈ E : x ≤ s;
(ii) ∀s′ < s ∃x ∈ E : x > s′.

The two preceding properties can also be equivalently written as follows:

(i) ∀x ∈ E : x ≤ s;
(ii) ∀ε > 0 ∃x ∈ E : x > s − ε.

In the second expression, we understand that the number ε > 0 can be arbitrarily small. If the supremum sup E belongs to E, then sup E = max E; as we saw earlier, however, this is not always the case. We can state the following analogue of the preceding theorem.

Theorem 1.4 If E is nonempty and bounded from below, then the set of all lower bounds of E has a maximum.

If E is nonempty and bounded from below, the greatest lower bound of E is called the “infimum” of E: it is a real number ῑ ∈ R that will be denoted by inf E. It is characterized by the following two properties:

(i) ∀x ∈ E : x ≥ ῑ;
(ii) ∀ῑ′ > ῑ ∃x ∈ E : x < ῑ′.


The two foregoing properties can also be equivalently written as follows:

(i) ∀x ∈ E : x ≥ ῑ;
(ii) ∀ε > 0 ∃x ∈ E : x < ῑ + ε.

If the infimum inf E belongs to E, then inf E = min E; the minimum, however, might not exist. Notice that, defining the set

E⁻ = {x ∈ R : −x ∈ E},

we have

E is bounded from above ⇔ E⁻ is bounded from below,

and in that case sup E = − inf E⁻, while

E is bounded from below ⇔ E⁻ is bounded from above,

and in that case inf E = − sup E⁻.

In the case where E is not bounded from above, we will write

sup E = +∞.

Theorem 1.5 The set N is not bounded from above, i.e., sup N = +∞.

Proof Assume by contradiction that N is bounded from above. Then s = sup N is a real number. By the properties of the supremum, there exists an n ∈ N such that n > s − 1/2. But then n + 1 ∈ N and

n + 1 > s − 1/2 + 1 > s,

thereby contradicting the fact that s is an upper bound for N. □


In the case where E is not bounded from below, we will write

inf E = −∞.

For instance, we have inf Z = −∞.

1.2.2 The Square Root

The following property will be used several times.

Lemma 1.6 If 0 ≤ α < β, then α² < β².

Proof If 0 ≤ α < β, then α² = αα ≤ αβ < ββ = β². □

We will now prove that there exists a real number c > 0 such that c² = 2. Let us define the sets

A = {x ∈ R : x ≥ 0 and x² < 2},   B = {x ∈ R : x ≥ 0 and x² > 2}.

Let us check that

∀a ∈ A ∀b ∈ B :  a ≤ b.

Indeed, if this were not so, we would have 0 ≤ b < a and hence, by Lemma 1.6, b² < a². But we know that a² < 2 and b² > 2, hence a² < b², and we find a contradiction. By the separation property, there is an element c ∈ R such that

∀a ∈ A ∀b ∈ B :  a ≤ c ≤ b.

Notice that, since 1 ∈ A, it is surely the case that c ≥ 1. We will now prove, by contradiction, that c² = 2. If c² > 2, then, for n ≥ 1,

(c − 1/n)² = c² − 2c/n + 1/n² ≥ c² − 2c/n;

hence, if n > 2c/(c² − 2), since c ≥ 1 and n ≥ 1, then

c − 1/n ≥ 0   and   (c − 1/n)² > 2,

so that c − 1/n ∈ B. But then c ≤ c − 1/n, which is clearly impossible.


If c² < 2, then, for n ≥ 1,

(c + 1/n)² = c² + 2c/n + 1/n² ≤ c² + 2c/n + 1/n = c² + (2c + 1)/n;

hence, if n > (2c + 1)/(2 − c²), then (c + 1/n)² < 2, and therefore c + 1/n ∈ A. But then c + 1/n ≤ c, which is impossible.

Since both assumptions c² > 2 and c² < 2 lead to a contradiction, it must be that c² = 2. Lemma 1.6 also tells us that there cannot exist any other positive solution of the equation

x² = 2,

which therefore has exactly two solutions, x = c and x = −c.

The same type of reasoning can be used to prove that, for any positive real number r, there exists a unique positive real number c such that c² = r. This number c is called the square root of r, and we write c = √r. Notice that the equation x² = r has indeed two solutions, x = √r and x = −√r. One also writes √0 = 0, whereas the square root of a negative number remains undefined. This subject will be reconsidered in the framework of the complex numbers. At this point we are ready to deal with the quadratic equation

.

where a, b, and c are real numbers, with .a = 0. It can be written equivalently as follows:   b 2 b2 − 4ac x+ = . 2a (2a)2

.

Thus, we see that the equation is solvable if and only if .b2 − 4ac ≥ 0, in which case the solutions are √ −b ± b2 − 4ac . .x = 2a Let us now define the “absolute value” (or “modulus”) of a real number x as

|x| = √(x²) = { x,  if x ≥ 0,
              { −x, if x < 0.


The following properties may easily be verified. For every x₁, x₂ in R,

|x₁ x₂| = |x₁| |x₂|,

whereas

|x₁ + x₂| ≤ |x₁| + |x₂|.

We will sometimes also need the inequality

||x₁| − |x₂|| ≤ |x₁ − x₂|

and the equivalence

|x| ≤ α  ⇔  −α ≤ x ≤ α.

1.2.3 Intervals

Let us explain what we mean by “interval.”

Definition 1.7 An interval is a nonempty subset I of R having the following property: if α, β are two of its elements, then I contains all the numbers between them.

We will not exclude the case where I has only a single element.

Proposition 1.8 Let I be an interval, define a = inf I, b = sup I (possibly a = −∞ or b = +∞), and assume a ≠ b. If a < x < b, then x ∈ I.

Proof If a < x < b, then, by the properties of the infimum and supremum, we can always find α and β in I such that a < α < x < β < b. Thus, by the preceding definition, I contains x. □

By the foregoing proposition, distinguishing the cases where a and b can be real numbers or not and whether or not they belong to I, we can conclude that any interval I must be among those in the following list:

[a, b] = {x : a ≤ x ≤ b},
]a, b[ = {x : a < x < b},
[a, b[ = {x : a ≤ x < b},
]a, b] = {x : a < x ≤ b},
[a, +∞[ = {x : x ≥ a},
]a, +∞[ = {x : x > a},
]−∞, b] = {x : x ≤ b},
]−∞, b[ = {x : x < b},
R, sometimes denoted by ]−∞, +∞[.

Note that, when a = b, the interval [a, a] reduces to a single point. In that case we say that the interval is “degenerate.”

Theorem 1.9 (Cantor Theorem) Let (I_n)_n be a sequence of intervals of the type I_n = [a_n, b_n], with a_n ≤ b_n, such that

I₀ ⊇ I₁ ⊇ I₂ ⊇ I₃ ⊇ . . .

Then there is a c ∈ R that belongs to all the intervals I_n.

Proof Let us define the two sets

A = {a_n : n ∈ N},   B = {b_n : n ∈ N}.

For any a_n in A and any b_m in B (not necessarily having the same index), we have that a_n ≤ b_m. Indeed, if n ≤ m, then I_n ⊇ I_m, hence a_n ≤ a_m ≤ b_m ≤ b_n. On the other hand, if n > m, then I_m ⊇ I_n, so that a_m ≤ a_n ≤ b_n ≤ b_m. We have thus proved that

∀a ∈ A ∀b ∈ B :  a ≤ b.

By the separation property, there is a c ∈ R such that

∀a ∈ A ∀b ∈ B :  a ≤ c ≤ b.

In particular, a_n ≤ c ≤ b_n, which means that c ∈ I_n for every n ∈ N. □
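A concrete nested sequence of intervals, kept exact with rational endpoints, illustrates the hypotheses of Cantor's theorem: here we repeatedly halve [0, 2] so that the squares of the endpoints straddle 2 (the choice of target is ours, for illustration):

```python
from fractions import Fraction

a, b = Fraction(0), Fraction(2)
intervals = []
for _ in range(30):
    intervals.append((a, b))
    m = (a + b) / 2
    if m * m < 2:     # keep the endpoint squares straddling 2
        a = m
    else:
        b = m

# the sequence is nested: I_0 ⊇ I_1 ⊇ I_2 ⊇ ...
for (a0, b0), (a1, b1) in zip(intervals, intervals[1:]):
    assert a0 <= a1 and b1 <= b0

# every a_n is <= every b_m, exactly as in the proof
assert all(an <= bm for an, _ in intervals for _, bm in intervals)
```

Any point c separating the left endpoints from the right endpoints lies in every interval; here the lengths shrink to 0, so c is unique.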

1.2.4 Properties of Q and R \ Q

We will now study the “density” of Q and R \ Q in the set of real numbers R.

Theorem 1.10 Given two real numbers α, β, with α < β, there always exists a rational number between them.


Proof Let us consider three different cases.

First Case: 0 ≤ α < β. Choose n ∈ N such that

n > 1/(β − α),

and let m ∈ N be the greatest natural number such that

m < nβ.

Then clearly m/n < β, and we will now show that it must be that m/n > α. By contradiction, assume that m/n ≤ α; then

(m + 1)/n ≤ α + 1/n < α + (β − α) = β,

meaning m + 1 < nβ, in contradiction to the fact that m is the greatest natural number less than nβ.

Second Case: α < 0 < β. It is sufficient to choose 0, which is a rational number.

Third Case: α < β ≤ 0. We can reduce this case to the first one by changing signs: since 0 ≤ −β < −α, there exists a rational number m/n such that −β < m/n < −α. Hence, α < −m/n < β. □

α+

.

2
hence

S(x₀, ρ) = E \ {x₀},  if ρ = 1,
S(x₀, ρ) = Ø,         if ρ ≠ 1.

We will now introduce a series of definitions that will be crucial for understanding the theory we want to develop.

1.5 Metric Spaces

35

Definition 1.14 A set U ⊆ E is said to be a "neighborhood" of a point x0 if there exists a ρ > 0 such that B(x0, ρ) ⊆ U; in that case, the point x0 is said to be an "internal point" of U. The set of all internal points of U is called the "interior" of U and is denoted by Ů. Clearly, we always have the inclusion Ů ⊆ U. It is said that U is an "open" set if it coincides with its interior, i.e., if U = Ů.

Here is an example of an open set.

Theorem 1.15 An open ball is an open set.

Proof Let U = B(x0, ρ) be an open ball, and take any point x1 ∈ U. We want to prove that x1 ∈ Ů, i.e., that x1 is an internal point of U. Choose r > 0 such that r ≤ ρ − d(x0, x1). If we show that B(x1, r) ⊆ U, our proof will be completed. For any x ∈ B(x1, r) we have

d(x, x0) ≤ d(x, x1) + d(x1, x0) < r + d(x1, x0) ≤ ρ,

so that x ∈ B(x0, ρ). We have thus shown that B(x1, r) ⊆ B(x0, ρ). ∎

Examples Let us analyze three particular examples. In the first one, the set U coincides with E; in the second one, U is the empty set; in the third one, it is made of a single point.

1. Every point of E is internal to E, since every ball is by definition contained in E, the universal set. Hence, the interior of E coincides with E. This means that E is an open set.
2. The empty set Ø cannot have internal points. Hence, its interior, having no elements, is the empty set, meaning Ø is an open set.
3. In general, the set U = {x0}, made up of a single point, is not an open set (e.g., in RN with the Euclidean distance), but it could be an open set in certain situations, i.e., when x0 is an "isolated" point of E. This could happen, for instance, if E = N, with the usual distance inherited from R, or when considering the distance d̂.

Theorem 1.16 The interior of any set U is an open set.

Proof If Ů = Ø, this is surely true. Assume, then, that Ů is nonempty, and take any point x1 ∈ Ů. Then there exists a ρ > 0 such that B(x1, ρ) ⊆ U. If we show that B(x1, ρ) ⊆ Ů, our proof will be completed, since we will have proved that every point x1 of Ů is an internal point of Ů.

To prove that B(x1, ρ) ⊆ Ů, let x be an element of B(x1, ρ). Since B(x1, ρ) is an open set, there exists an r > 0 such that B(x, r) ⊆ B(x1, ρ). Then B(x, r) ⊆ U, showing that x belongs to Ů. The proof is complete. ∎


1 Sets of Numbers and Metric Spaces

The following implication holds:

U1 ⊆ U2 ⇒ Ů1 ⊆ Ů2.

As a consequence, we see that Ů is the greatest open set contained in U; indeed, if A is an open set and A ⊆ U, then A ⊆ Ů.

Definition 1.17 A point x0 is said to be an "adherent point" of a set U if for every ρ > 0 we have that B(x0, ρ) ∩ U ≠ Ø. The set of all adherent points of U is said to be the "closure" of U and is denoted by U̅. Clearly, we always have the inclusion U ⊆ U̅. It is said that U is a "closed" set if it coincides with its closure, i.e., if U = U̅.

Here is an example of a closed set.

Theorem 1.18 A closed ball is a closed set.

Proof Let U = B̄(x0, ρ) be a closed ball. To prove that U̅ ⊆ U, we will equivalently show that CU ⊆ C(U̅). This is surely true if U = E, i.e., if CU = Ø. Thus, assume now that CU is nonempty. Take any point x1 ∈ CU, i.e., such that d(x1, x0) > ρ. We want to prove that x1 ∈ C(U̅), i.e., that x1 is not an adherent point of U. Choose r > 0 such that r ≤ d(x0, x1) − ρ. If we show that B(x1, r) ∩ U = Ø, our proof will be completed. Assume by contradiction that B(x1, r) ∩ B̄(x0, ρ) ≠ Ø, and take an x ∈ B(x1, r) ∩ B̄(x0, ρ). Then

d(x0, x1) ≤ d(x0, x) + d(x, x1) < ρ + r ≤ ρ + (d(x0, x1) − ρ) = d(x0, x1),

which is clearly impossible. ∎

Examples Let us consider again the aforementioned three examples: U = E, U = Ø, and U = {x0}.

1. Since E is the universal set, every adherent point of E necessarily belongs to E. Hence, the closure of E coincides with E. This means that E is a closed set.
2. The empty set Ø has no adherent points. Indeed, taking any point x0 in E, for every ρ > 0 we have that B(x0, ρ) ∩ Ø = Ø. Hence, the closure of Ø, having no elements at all, is empty, meaning Ø is a closed set.
3. The set U = {x0}, made up of a single point, is always a closed set. Indeed, if we take any x1 ∉ U, choosing ρ > 0 such that ρ < d(x0, x1), we have that B(x1, ρ) ∩ U = Ø, thereby showing that x1 is not an adherent point of U.


Theorem 1.19 The closure of any set U is a closed set.

Proof Let V = U̅. If V = E, this surely is a closed set. Let us then assume that V ≠ E. We need to show that any adherent point of V belongs to V. By contradiction, assume that there exists some adherent point x1 of V that does not belong to V. Since x1 ∉ U̅, there is a ρ > 0 such that B(x1, ρ) ∩ U = Ø. On the other hand, since x1 is adherent to V, we have that B(x1, ρ) ∩ V ≠ Ø. Take an x ∈ B(x1, ρ) ∩ V. Then, since B(x1, ρ) is an open set, there exists r > 0 such that B(x, r) ⊆ B(x1, ρ). Since x ∈ V = U̅, we have B(x, r) ∩ U ≠ Ø and, hence, also B(x1, ρ) ∩ U ≠ Ø, a contradiction. ∎

The following implication holds:

U1 ⊆ U2 ⇒ U̅1 ⊆ U̅2.

As a consequence, we see that U̅ is the smallest closed set containing U: If C is a closed set and C ⊇ U, then C ⊇ U̅.

We will now try to understand the relationships between the notions of interior and closure of a set and those between open and closed sets.

Theorem 1.20 The following identities hold:

C̅U̅ = C(Ů),   (CU)˚ = C(U̅).

Proof Let us prove the first one. First of all, notice that

C̅U̅ = Ø ⇔ CU = Ø ⇔ U = E ⇔ Ů = E ⇔ C(Ů) = Ø.

Assume now that CU ≠ Ø. Then

x ∈ C̅U̅ ⇔ ∀ρ > 0, B(x, ρ) ∩ CU ≠ Ø ⇔ ∀ρ > 0, B(x, ρ) ⊄ U ⇔ x ∉ Ů ⇔ x ∈ C(Ů).

This proves the first identity. Now let V = CU. Then, applying the first identity to V,

V̊ = C(C̅V̅) = C(U̅),

since CV = U, thereby also proving the second identity. ∎




As a consequence of the preceding theorem,

U̅ = C((CU)˚),   Ů = C(C̅U̅).

Moreover, we have the following corollary.

Corollary 1.21 A set is open (closed) if and only if its complement is closed (open).

Proof If U is open, then U = Ů, hence C̅U̅ = C(Ů) = CU, so that CU is closed. On the other hand, if U is closed, then U = U̅, hence (CU)˚ = C(U̅) = CU, so that CU is open. ∎

It is possible to prove that the union and the intersection of two open (closed) sets is an open (closed) set. The same holds true for an arbitrary finite number of them. However, if one considers an infinite number of open sets, it can be proved that their union is still an open set, whereas their intersection may fail to be one. For example, in R, taking the open sets

An = ] −1/(n + 1), 1/(n + 1) [,

with n ∈ N, their intersection is {0}, which is not an open set. Analogously, if one considers an infinite number of closed sets, it can be proved that their intersection is still a closed set, whereas their union may fail to be one. For example, in R, taking the closed sets

Cn = [ −1 + 1/(n + 1), 1 − 1/(n + 1) ],

with n ∈ N, their union is ]−1, 1[, which is not a closed set.

Definition 1.22 The "boundary" of a set U, denoted by ∂U, is defined as the difference between its closure and its interior, i.e.,

∂U = U̅ \ Ů.
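The two families of sets above are easy to probe numerically; a small sketch (the helper names and sample points are ours):

```python
# A_n = ]-1/(n+1), 1/(n+1)[ : every A_n contains 0, but any x != 0
# eventually falls outside, so the intersection of all A_n is {0}.
def in_A(x, n):
    return -1 / (n + 1) < x < 1 / (n + 1)

# C_n = [-1 + 1/(n+1), 1 - 1/(n+1)] : every point of ]-1, 1[ lies in
# some C_n, but -1 and 1 never do, so the union is ]-1, 1[.
def in_C(x, n):
    return -1 + 1 / (n + 1) <= x <= 1 - 1 / (n + 1)

assert all(in_A(0.0, n) for n in range(1000))
assert not all(in_A(0.05, n) for n in range(1000))   # 0.05 leaves A_n once 1/(n+1) <= 0.05
assert any(in_C(0.999, n) for n in range(1000))      # 0.999 belongs to some C_n
assert not any(in_C(1.0, n) for n in range(1000))    # 1 belongs to no C_n
```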

We should be careful not to put too much trust in our intuition, naturally developed in a Euclidean world. For example, it is true in RN that the closure of the open ball B(x0, ρ) is the closed ball B̄(x0, ρ), and that

∂B(x0, ρ) = S(x0, ρ).

1.5 Metric Spaces

39

However, these identities are not valid in every metric space E. For instance, if we take the previously defined distance d̂, then B(x0, 1) = {x0}, which is a closed set, while B̄(x0, 1) = E, so that the closure of B(x0, 1) differs from B̄(x0, 1). Moreover, ∂B(x0, 1) = Ø, whereas S(x0, 1) = E \ {x0}, so that ∂B(x0, 1) ≠ S(x0, 1).

As a curious example, in R, taking U = Q, we have

Q̅ = R,   Q̊ = Ø,   ∂Q = R.

2 Continuity

In this chapter we introduce one of the most important concepts in mathematical analysis: the “continuity” of a function. This topic will be treated in the general framework of metric spaces.

2.1 Continuous Functions

Intuitively, a function f is "continuous" if the value f(x) varies gradually when x varies in the domain; in other words, if we encounter no sudden variations in the values of the function. In order to make this intuitive idea rigorous, it will be convenient to focus our attention on a point x0 of the domain and to clarify what we mean by saying that f is "continuous" at x0.

We will proceed gradually.

First Attempt We will say that f is "continuous" at x0 when the following statement holds:

If x is near x0, then f(x) is near f(x0).

We immediately observe that, although the idea of continuity is already quite well formulated, the preceding proposition is not an acceptable definition, because the word "near," which appears twice, does not have a precise meaning. First of all, to measure how close x is to x0 and how close f(x) is to f(x0), we need to introduce distances. More precisely, we will have to assume that the domain and the codomain of the function are metric spaces.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Fonda, A Modern Introduction to Mathematical Analysis, https://doi.org/10.1007/978-3-031-23713-3_2


Let, then, E and F be two metric spaces, with their distances dE and dF, respectively. Let x0 be a point in E and f : E → F be our function. Let us make a second attempt at a definition.

Second Attempt We will say that f is "continuous" at x0 when the following statement holds:

If dE(x, x0) is small, then dF(f(x), f(x0)) is small.

We immediately realize that the problem encountered in the first attempt has not been solved at all, since now the word "small," which appears twice, has no precise meaning. We then ask ourselves: How small do we want the distance dF(f(x), f(x0)) to be? What we have in mind is that this distance can be made as small as we want (provided that the distance dE(x, x0) is small enough, of course). To be able to measure it, we will introduce a positive real number, which we call ε, and we will require dF(f(x), f(x0)) to be smaller than ε when dE(x, x0) is sufficiently small. The arbitrariness of this positive number ε will allow us to take it as small as we like.

Third Attempt We will say that f is "continuous" at x0 when the following statement holds true: Taking any number ε > 0,

if dE(x, x0) is small, then dF(f(x), f(x0)) < ε.

Now the word "small" appears only once, whereas the distance dF(f(x), f(x0)) is simply controlled by the number ε. Hence, at least the second part of the proposition now has a precise meaning. We could then try to do the same for the distance dE(x, x0), introducing a new positive number, which we call δ, so we can control it.

Fourth Attempt (the Good One!) We will say that f is "continuous" at x0 when the following statement holds true: Taking any number ε > 0, it is possible to find a number δ > 0 for which,

if dE(x, x0) < δ, then dF(f(x), f(x0)) < ε.

This last proposition, unlike the previous ones, contains no inaccurate words. The distances dE(x, x0) and dF(f(x), f(x0)) are now simply controlled by the two positive numbers δ and ε, respectively. Let us rewrite it in a formal way.

Definition 2.1 We will say that f is "continuous" at x0 if, for any positive number ε, there exists a positive number δ such that, if x is any element in the domain E whose distance from x0 is less than δ, then the distance of f(x) from f(x0) is less than ε. In symbols:

∀ε > 0 ∃δ > 0 : ∀x ∈ E,   dE(x, x0) < δ ⇒ dF(f(x), f(x0)) < ε.

Rather often, in this formulation, "∀x ∈ E" will be tacitly understood.


Let us note that one or both of the inequalities

dE(x, x0) < δ,   dF(f(x), f(x0)) < ε

may be replaced, respectively, by

dE(x, x0) ≤ δ,   dF(f(x), f(x0)) ≤ ε

without changing the definition at all. This is due to the fact that, on the one hand, ε is an arbitrary positive number and, on the other hand, that if the implication holds for some positive number δ, then it holds a fortiori for any smaller positive number taken instead of that δ.

Reading again the definition of continuity, we see that f is continuous at x0 if and only if

∀ε > 0 ∃δ > 0 : f(B(x0, δ)) ⊆ B(f(x0), ε).

Moreover, the condition remains equivalent if one or both of these open balls are replaced by closed balls. Equivalently, we can also say that f is continuous at x0 if and only if

for every neighborhood V of f(x0) there exists a neighborhood U of x0 such that f(U) ⊆ V.

In what follows, we will often denote the distances in E and F simply by d. We are confident that this will not create confusion. When the function f happens to be continuous at all points x0 of its domain E, we will say that "f is continuous on E" or simply "f is continuous." Let us provide a few examples.

Example 1 The constant function: For some c̄ ∈ F we have that f(x) = c̄ for every x ∈ E. Since dF(f(x), f(x0)) = dF(c̄, c̄) = 0 for every x ∈ E, such a function is clearly continuous (every choice of δ > 0 is fine).

Example 2 Let x0 be an "isolated point" of E, meaning there exists a ρ > 0 such that there are no points of E whose distance from x0 is less than ρ, except x0 itself. We can then see that, in this case, any function f : E → F is continuous at x0. Indeed, for any ε > 0, taking δ = ρ, we will have B(x0, δ) = {x0}, so f(B(x0, δ)) = {f(x0)} ⊆ B(f(x0), ε).

Example 3 Let E = RN and F = RN. For a fixed number α ∈ R, let us consider the function f : RN → RN defined as f(x) = αx. This is a continuous function. Indeed, if α = 0, then we have a constant function with value 0, and we know already that such a function is continuous. Assume, in contrast, that α ≠ 0. Then,


once ε > 0 has been fixed, since

‖f(x) − f(x0)‖ = ‖αx − αx0‖ = ‖α(x − x0)‖ = |α| ‖x − x0‖,

it is sufficient to take δ = ε/|α| to verify that

‖x − x0‖ < δ ⇒ ‖f(x) − f(x0)‖ < ε.
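Example 3's choice of δ can be spot-checked numerically in dimension one; a sketch (the sample values of α, ε, and x0 are ours):

```python
import random

# For f(x) = alpha * x, the recipe delta = eps / |alpha| works:
# every x with |x - x0| < delta gives |f(x) - f(x0)| < eps.
alpha, eps = 3.0, 1e-3
delta = eps / abs(alpha)
x0 = 1.7
for _ in range(1000):
    x = x0 + random.uniform(-0.999 * delta, 0.999 * delta)  # |x - x0| < delta
    assert abs(alpha * x - alpha * x0) < eps
```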

Example 4 Let E = RN and F = R. Let us show that the function f : RN → R defined as f(x) = ‖x‖ is continuous on RN. This fact will be a simple consequence of the inequality

| ‖x‖ − ‖x′‖ | ≤ ‖x − x′‖,

which we will now prove. We have

‖x‖ = ‖(x − x′) + x′‖ ≤ ‖x − x′‖ + ‖x′‖,
‖x′‖ = ‖(x′ − x) + x‖ ≤ ‖x′ − x‖ + ‖x‖.

Since ‖x − x′‖ = ‖x′ − x‖, we have

‖x‖ − ‖x′‖ ≤ ‖x − x′‖ and ‖x′‖ − ‖x‖ ≤ ‖x − x′‖,

whence the inequality we wanted to prove. Now, considering any point x0 ∈ RN, once ε > 0 has been fixed, it is sufficient to take δ = ε to verify that

‖x − x0‖ < δ ⇒ | ‖x‖ − ‖x0‖ | < ε.
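The inequality just proved can be spot-checked on random vectors; a sketch (the dimension and ranges are our choices, and a tiny tolerance absorbs floating-point rounding):

```python
import math, random

def norm(v):
    return math.sqrt(sum(t * t for t in v))

# Check | ||x|| - ||x'|| | <= ||x - x'|| on random vectors in R^3.
random.seed(0)
for _ in range(1000):
    x  = [random.uniform(-10, 10) for _ in range(3)]
    xp = [random.uniform(-10, 10) for _ in range(3)]
    diff = [a - b for a, b in zip(x, xp)]
    assert abs(norm(x) - norm(xp)) <= norm(diff) + 1e-12
```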

Example 5 Let E be any metric space, and let y0 ∈ E be fixed. The function f : E → R defined as f(x) = d(x, y0) is continuous. The proof of this fact is similar to the earlier one, since we can show that, for any x0 ∈ E,

|d(x, y0) − d(x0, y0)| ≤ d(x, x0).

Example 6 Let E = R and F = R. Consider the "sign function" f : R → R defined as

f(x) = −1 if x < 0,   f(x) = 0 if x = 0,   f(x) = 1 if x > 0.

We can show that this function is continuous at all points except at x0 = 0. Indeed, if x0 ≠ 0, then it will be sufficient to take δ < |x0|, so as to have f constant on the interval ]x0 − δ, x0 + δ[ and, hence, continuous at x0. To see that f is not continuous at 0, let us fix an ε ∈ ]0, 1[; for any choice of δ > 0, it is possible to find an x ∈ ]−δ, δ[ such that |f(x)| = 1, hence |f(x) − f(0)| > ε.

Example 7 The "Dirichlet function" D : R → R is defined as

D(x) = 1 if x ∈ Q,   D(x) = 0 if x ∉ Q.

It can be seen that, for any x0 ∈ R, this function is not continuous at x0. Indeed, fixing ε ∈ ]0, 1[, since both Q and R \ Q are dense in R, for every x0 and any choice of δ > 0 there will surely be a rational number x′ and an irrational number x″ in ]x0 − δ, x0 + δ[; hence, depending on whether x0 is rational or irrational, we will have either |D(x″) − D(x0)| > ε or |D(x′) − D(x0)| > ε.

Let us study the behavior of continuity with respect to the sum of two functions and to the product with a constant. In the following theorem, we assume that F is a normed vector space.

Theorem 2.2 Let F be a normed vector space and α a real constant. If f, g : E → F are continuous at x0, then the same is true of f ± g and αf.

Proof Let ε > 0 be fixed. By the continuity of f and g there exist δ1 > 0 and δ2 > 0 such that

d(x, x0) < δ1 ⇒ ‖f(x) − f(x0)‖ < ε,
d(x, x0) < δ2 ⇒ ‖g(x) − g(x0)‖ < ε.

Hence, taking δ = min{δ1, δ2}, we have that, if d(x, x0) < δ, then

‖(f ± g)(x) − (f ± g)(x0)‖ ≤ ‖f(x) − f(x0)‖ + ‖g(x) − g(x0)‖ < 2ε

and

‖(αf)(x) − (αf)(x0)‖ ≤ |α| ‖f(x) − f(x0)‖ < |α| ε.

By the arbitrariness of ε, the statement is proved. ∎



Remark 2.3 The conclusion of the preceding proof is correct, since the ε > 0 in the definition of continuity is arbitrary. Indeed, even if, for some constant c > 0, one proves that

∀ε > 0 ∃δ > 0 : ∀x ∈ E,   dE(x, x0) < δ ⇒ dF(f(x), f(x0)) < c ε,


this is sufficient to conclude that f is continuous at x0. This observation will often be used in what follows.

We now state some properties of continuous functions with codomain F = R.

Theorem 2.4 If f, g : E → R are continuous at x0, then the same is true of f · g.

Proof Let ε > 0 be fixed. It is not restrictive to assume ε ≤ 1, since we could always define ε′ = min{ε, 1} and proceed with ε′ instead of ε. By the continuity of f and g there exist δ1 > 0 and δ2 > 0 such that

d(x, x0) < δ1 ⇒ |f(x) − f(x0)| < ε,
d(x, x0) < δ2 ⇒ |g(x) − g(x0)| < ε.

Here we note that, since ε ≤ 1, if |f(x) − f(x0)| < ε, then |f(x)| < |f(x0)| + 1. Hence, taking δ = min{δ1, δ2}, we have that, if d(x, x0) < δ, then

|(f · g)(x) − (f · g)(x0)| = |f(x)g(x) − f(x)g(x0) + f(x)g(x0) − f(x0)g(x0)|
 ≤ |f(x)| · |g(x) − g(x0)| + |g(x0)| · |f(x) − f(x0)|
 ≤ (|f(x0)| + 1) · |g(x) − g(x0)| + |g(x0)| · |f(x) − f(x0)|
 < (|f(x0)| + |g(x0)| + 1) ε.

By the arbitrariness of ε, this proves that f · g is continuous at x0. ∎



We now state the property of sign permanence.

Theorem 2.5 If g : E → R is continuous at x0 and g(x0) > 0, then there exists a neighborhood U of x0 such that

x ∈ U ⇒ g(x) > 0.

Proof Let us fix ε = g(x0). By continuity, there exists δ > 0 such that

d(x, x0) < δ ⇒ g(x0) − ε < g(x) < g(x0) + ε ⇒ 0 < g(x) < 2g(x0).

Then U = B(x0, δ) is the neighborhood we are looking for. ∎

Clearly enough, if g(x0) < 0 were true, then there would exist a neighborhood U of x0 such that

x ∈ U ⇒ g(x) < 0.


Theorem 2.6 If f, g : E → R are continuous at x0 and g(x0) ≠ 0, then f/g is also continuous at x0.

Proof Notice that, by the property of sign permanence, there exists a neighborhood U of x0 such that the quotient f(x)/g(x) is defined at least for all x ∈ U. Since f/g = f · (1/g), it will suffice to prove that 1/g is continuous at x0.

Let us fix ε > 0; we may assume without loss of generality that ε < |g(x0)|/2. By the continuity of g there exists a δ > 0 such that

d(x, x0) < δ ⇒ |g(x) − g(x0)| < ε.

Since ε < |g(x0)|/2, whenever d(x, x0) < δ we have

|g(x)| ≥ |g(x0)| − |g(x) − g(x0)| > |g(x0)| − ε > |g(x0)|/2.

As a consequence,

d(x, x0) < δ ⇒ |1/g(x) − 1/g(x0)| = |g(x0) − g(x)| / (|g(x)| |g(x0)|) < (2/|g(x0)|²) ε.

By the arbitrariness of ε, this proves that 1/g is continuous at x0. ∎
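The key estimate of the proof, |g(x)| > |g(x0)|/2 together with the resulting bound on |1/g(x) − 1/g(x0)|, can be checked on sample numbers (the values below are ours):

```python
# If |g(x) - g(x0)| < eps with eps < |g(x0)|/2, then |g(x)| > |g(x0)|/2,
# which yields |1/g(x) - 1/g(x0)| < (2 / g(x0)**2) * eps.
g0, eps = 4.0, 1.0                      # eps < |g0| / 2
for gx in [g0 - 0.999 * eps, g0 + 0.999 * eps, g0 + 0.5 * eps]:
    assert abs(gx - g0) < eps
    assert abs(gx) > abs(g0) / 2
    assert abs(1 / gx - 1 / g0) < (2 / g0 ** 2) * eps
```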



We know that all constant functions are continuous, as is the function f : R → R defined as f(x) = x (see Example 3 presented earlier, with α = 1). By the previous theorems, all polynomial functions are continuous, as are all rational functions, defined as quotients of two polynomials. More precisely, these latter are continuous at all points where they are defined, i.e., where the denominator is not equal to zero.

Let us now examine the behavior of a composition of continuous functions. In the following theorem, E, F, and G are three metric spaces.

Theorem 2.7 Let f : E → F be continuous at x0 and g : F → G be continuous at f(x0); then g ◦ f is continuous at x0.

Proof Let W be a fixed neighborhood of [g ◦ f](x0) = g(f(x0)). By the continuity of g at f(x0) there exists a neighborhood V of f(x0) such that g(V) ⊆ W. Then, by the continuity of f at x0, there exists a neighborhood U of x0 such that f(U) ⊆ V. Hence, [g ◦ f](U) ⊆ W, thereby proving the statement. ∎

Let us now define, for every k = 1, 2, . . . , N, the "kth projection" pk : RN → R as

pk(x1, x2, . . . , xN) = xk.


Theorem 2.8 The functions pk are continuous.

Proof We consider a point x0 = (x1⁰, x2⁰, . . . , xN⁰) ∈ RN and fix an ε > 0. Notice that, for every x = (x1, x2, . . . , xN) ∈ RN,

|xk − xk⁰| ≤ √( (x1 − x1⁰)² + · · · + (xN − xN⁰)² ) = d(x, x0);

hence, taking δ = ε, we have that

d(x, x0) < δ ⇒ |pk(x) − pk(x0)| = |xk − xk⁰| < ε.

This proves that pk is continuous at x0. ∎
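The coordinate inequality at the heart of the proof is easy to verify on a sample point (the vectors below are ours):

```python
import math

def dist(x, y):
    """Euclidean distance in R^N."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# |x_k - x0_k| <= d(x, x0) for each coordinate k, which is what makes
# every projection p_k continuous (with delta = eps).
x  = (1.0, -2.0, 3.5)
x0 = (0.5,  1.0, 3.0)
assert all(abs(x[k] - x0[k]) <= dist(x, x0) for k in range(3))
```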



Let us now consider a function f : E → RM for some integer M ≥ 1. We can define the "components" of f as fk = pk ◦ f : E → R, with k = 1, 2, . . . , M, so that

f(x) = (f1(x), f2(x), . . . , fM(x)).

Theorem 2.9 The function f is continuous at x0 if and only if all its components are as well.

Proof If f is continuous at x0, then its components are also continuous, each being the composition of two continuous functions. Conversely, let us assume that all the components of f are continuous at x0. Fixing ε > 0, for every k = 1, 2, . . . , M there is a δk > 0 such that

d(x, x0) < δk ⇒ |fk(x) − fk(x0)| < ε.

Setting δ = min{δ1, δ2, . . . , δM}, we have

d(x, x0) < δ ⇒ d(f(x), f(x0)) = √( (f1(x) − f1(x0))² + · · · + (fM(x) − fM(x0))² ) < √M ε.

By the arbitrariness of ε, the proof is complete. ∎



Theorem 2.10 Every linear function ℓ : RN → RM is continuous.

Proof We first observe that, since the projections pk are linear functions, the components ℓk = pk ◦ ℓ of the linear function ℓ are linear as well. Let [e1, e2, . . . , eN] be the canonical basis of RN, i.e.,

e1 = (1, 0, 0, . . . , 0),
e2 = (0, 1, 0, . . . , 0),
. . .
eN = (0, 0, 0, . . . , 1).

Every vector x = (x1, x2, . . . , xN) ∈ RN can be expressed as

x = x1e1 + x2e2 + · · · + xNeN = p1(x)e1 + p2(x)e2 + · · · + pN(x)eN.

Hence, for every k ∈ {1, 2, . . . , M},

ℓk(x) = p1(x)ℓk(e1) + p2(x)ℓk(e2) + · · · + pN(x)ℓk(eN),

showing that ℓk is a linear combination of the projections p1, p2, . . . , pN. Since those functions are continuous, we have proved that ℓk is continuous for every k ∈ {1, 2, . . . , M}. Therefore, since all its components are continuous, the function ℓ is continuous as well. ∎

We conclude this section with a characterization of continuity involving the counterimages of open and closed sets in arbitrary metric spaces.

Theorem 2.11 The following propositions are equivalent:

(i) f : E → F is continuous.
(ii) If A is open in F, then f⁻¹(A) is open in E.
(iii) If C is closed in F, then f⁻¹(C) is closed in E.

Proof Let us show that (i) implies (ii). Let f : E → F be continuous, and let A be an open set in F. Taking x0 ∈ f⁻¹(A), we have f(x0) ∈ A. Since A is open, there exists a ρ > 0 for which B(f(x0), ρ) ⊆ A. Since f is continuous at x0, taking ε = ρ in the definition, there exists a δ > 0 such that f(B(x0, δ)) ⊆ B(f(x0), ρ). Then B(x0, δ) ⊆ f⁻¹(B(f(x0), ρ)) ⊆ f⁻¹(A), so that x0 is in the interior of f⁻¹(A). We have thus proved that every x0 ∈ f⁻¹(A) is in the interior of f⁻¹(A), so that f⁻¹(A) is open.

Let us prove now that (ii) implies (i). We consider a point x0 ∈ E, fix ε > 0, and set A = B(f(x0), ε), which is an open set in F. If (ii) holds, then f⁻¹(A) is an open set in E containing x0. Hence, there exists a δ > 0 such that B(x0, δ) ⊆ f⁻¹(A), meaning f(B(x0, δ)) ⊆ A = B(f(x0), ε). The continuity of f at x0 is thus proved.


We now show that (ii) implies (iii). Let C be a closed set in F, and let A = CC, the complement of C. The set A is open in F, so that, if (ii) holds, then f⁻¹(A) is open in E. But f⁻¹(A) = f⁻¹(CC) = C f⁻¹(C), so f⁻¹(C) is closed. In a very similar way one proves that (iii) implies (ii), concluding the proof of the theorem. ∎
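As a concrete instance of Theorem 2.11, for f(x) = x² on R the counterimage of the open interval ]1, 4[ is the open set ]−2, −1[ ∪ ]1, 2[; a small membership check (the function and interval are our choices):

```python
# f^{-1}(]1, 4[) for f(x) = x^2 consists of ]-2, -1[ ∪ ]1, 2[,
# an open set, as the theorem predicts for a continuous f.
def f(x):
    return x * x

def in_preimage(x):            # x ∈ f^{-1}(]1, 4[)
    return 1 < f(x) < 4

assert in_preimage(1.5) and in_preimage(-1.5)
assert not in_preimage(1.0) and not in_preimage(2.0)   # boundary points excluded
assert not in_preimage(0.0) and not in_preimage(3.0)
```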

2.2 Intervals and Continuity

Here is a fundamental property of continuous functions defined on intervals.

Theorem 2.12 (Bolzano Theorem) If f : [a, b] → R is a continuous function such that

either  f(a) < 0 < f(b)  or  f(a) > 0 > f(b),

then there exists a c ∈ ]a, b[ such that f(c) = 0.

Proof We treat the case f(a) < 0 < f(b), since the other one is completely analogous. We set I0 = [a, b] and consider the midpoint (a + b)/2 of the interval I0. If f is equal to zero at that point, then we have found the point c we were looking for. Otherwise, either f((a + b)/2) < 0 or f((a + b)/2) > 0. If f((a + b)/2) < 0, then we call I1 the interval [(a + b)/2, b]; if f((a + b)/2) > 0, then instead we take I1 to be the interval [a, (a + b)/2]. Taking now the midpoint of I1 and following the same reasoning, we can define an interval I2 and, by recurrence, a sequence of intervals In = [an, bn] such that

I0 ⊇ I1 ⊇ I2 ⊇ I3 ⊇ · · ·

and f(an) < 0 < f(bn) for every n. By Cantor's Theorem 1.9, there exists a c ∈ R belonging to all these intervals. Let us prove that f(c) = 0. By contradiction, assume f(c) ≠ 0. If f(c) < 0, then by the property of sign permanence there is a δ > 0 such that f(x) < 0 for every x ∈ ]c − δ, c + δ[. Now, since bn − c ≤ bn − an and bn − an = (b − a)/2ⁿ < (b − a)/n for n ≥ 1, taking n > (b − a)/δ we have bn ∈ ]c − δ, c + δ[. But then we should have f(bn) < 0, in contradiction to the inequality f(bn) > 0 above. A similar line of reasoning rules out the case f(c) > 0. ∎
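The interval-halving construction of the proof is exactly the bisection method of numerical analysis; a minimal sketch under the assumption f(a) < 0 < f(b) (the test function and tolerance are our choices):

```python
def bisect(f, a, b, tol=1e-12):
    """Halving scheme from the proof of Theorem 2.12, case f(a) < 0 < f(b)."""
    assert f(a) < 0 < f(b)
    while b - a > tol:
        mid = (a + b) / 2
        v = f(mid)
        if v == 0:
            return mid
        if v < 0:
            a = mid          # next interval is [mid, b]
        else:
            b = mid          # next interval is [a, mid]
    return (a + b) / 2

c = bisect(lambda x: x * x - 2, 0.0, 2.0)   # approximates sqrt(2)
```

Each step halves the interval while keeping f negative at the left endpoint and positive at the right one, mirroring the nested intervals In of the proof.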

As a consequence of the foregoing theorem, we deduce that a continuous function "transforms intervals into intervals."

Corollary 2.13 Let E be a subset of R and f : E → R be a continuous function. If I ⊆ E is an interval, then f(I) is also an interval.


Proof Excluding the trivial cases where I and f(I) are made of a single element, let us take α, β ∈ f(I), with α < β, and let γ be such that α < γ < β. We want to see that γ ∈ f(I). Let g : E → R be the function defined by

g(x) = f(x) − γ.

We can find a, b in I such that f(a) = α and f(b) = β. Since I is an interval, the function g is defined on [a, b] (or [b, a] in the case b < a), and it is continuous there. Moreover, g(a) < 0 < g(b), and hence, by the foregoing theorem, there is a c ∈ ]a, b[ such that g(c) = 0, i.e., f(c) = γ. ∎

2.3 Monotone Functions

Let E be a subset of R. We will say that a function f : E → R is:

"Increasing" if [ x1 < x2 ⇒ f(x1) ≤ f(x2) ].
"Decreasing" if [ x1 < x2 ⇒ f(x1) ≥ f(x2) ].
"Strictly increasing" if [ x1 < x2 ⇒ f(x1) < f(x2) ].
"Strictly decreasing" if [ x1 < x2 ⇒ f(x1) > f(x2) ].

We will say that f is "monotone" if it is either increasing or decreasing, and "strictly monotone" if it is either strictly increasing or strictly decreasing.

Example The function f : [0, +∞[ → R defined as f(x) = xⁿ is strictly increasing. The case n = 2 was established in Lemma 1.6. The general case can easily be proved by induction.

Let us now show how one can characterize the continuity of invertible functions defined on an interval.

Theorem 2.14 Let I and J be two intervals, and let f : I → J be an invertible function. Then

f is continuous ⇔ f is strictly monotone.

In that case, f⁻¹ : J → I is also strictly monotone and continuous.

Proof Assume f to be continuous and, by contradiction, that it is not strictly monotone. Then there exist x1 < x2 < x3 in I such that either

f(x1) < f(x2) and f(x2) > f(x3)

or

f(x1) > f(x2) and f(x2) < f(x3).

(Equalities are not allowed, since f is injective.) Let us consider the first case, the other being analogous. Choosing γ ∈ R such that f(x1) < γ < f(x2) and f(x2) > γ > f(x3), by Corollary 2.13 there exist a ∈ ]x1, x2[ and b ∈ ]x2, x3[ such that f(a) = γ = f(b), in contradiction to the injectivity of f.

Assume now that f is strictly monotone, e.g., strictly increasing, the other case being analogous. Once we have fixed some x0 ∈ I, we want to prove that f is continuous at x0. Let us consider three distinct cases.

Case 1. Assume that x0 is not an endpoint of the interval I and, consequently, that y0 = f(x0) is not an endpoint of J. Let ε > 0 be fixed; we can assume without loss of generality that [y0 − ε, y0 + ε] ⊆ J. Set x1 = f⁻¹(y0 − ε) and x2 = f⁻¹(y0 + ε), and notice that x1 < x0 < x2. Since f(x1) = f(x0) − ε and f(x2) = f(x0) + ε, taking δ = min{x0 − x1, x2 − x0} we have

d(x, x0) < δ ⇒ x1 < x < x2 ⇒ f(x1) < f(x) < f(x2) ⇒ d(f(x), f(x0)) < ε,

showing that f is continuous at x0.

Case 2. Now let x0 = min I, hence also y0 = min J. Let ε > 0 be fixed; we can assume without loss of generality that [y0, y0 + ε] ⊆ J. Set, as previously, x2 = f⁻¹(y0 + ε). Since f(x2) = f(x0) + ε, taking δ = x2 − x0, we have

x0 ≤ x < x2 ⇒ f(x0) ≤ f(x) < f(x2) ⇒ d(f(x), f(x0)) < ε,

demonstrating that f is continuous at x0.

Case 3. If x0 = max I, then the argument is similar to that in Case 2.

Finally, we observe that

f strictly increasing ⇔ f⁻¹ strictly increasing,
f strictly decreasing ⇔ f⁻¹ strictly decreasing.

Therefore, if f is strictly monotone, then so is f⁻¹. Hence, since f⁻¹ : J → I is invertible and strictly monotone, as proved earlier, it is necessarily continuous. ∎
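Theorem 2.14 can be illustrated with f(x) = x³, which is continuous, strictly increasing, and invertible on R; its inverse, the cube root, is again strictly increasing (the sketch and the sample points are ours):

```python
import math

def f(x):
    return x ** 3

def f_inv(y):
    # real cube root, valid for negative y as well
    return math.copysign(abs(y) ** (1 / 3), y)

ys = [-8.0, -1.0, 0.0, 1.0, 8.0]
xs = [f_inv(y) for y in ys]
assert xs == sorted(xs)                                # f_inv is increasing
assert all(math.isclose(f(f_inv(y)), y, abs_tol=1e-9) for y in ys)
```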

2.4 The Exponential Function

Let us denote by R+ the set of positive real numbers, i.e.,

R+ = ]0, +∞[ = {x ∈ R : x > 0}.

The following theorem will be proved in Chap. 5.

Theorem 2.15 Given a > 0, there exists a unique continuous function fa : R → R+ such that

(i) fa(x1 + x2) = fa(x1)fa(x2), for every x1, x2 in R;
(ii) fa(1) = a.

Moreover, if a ≠ 1, then this function fa is invertible.

The function fa is called the "exponential to base a" and is denoted by expa. If a ≠ 1, then the inverse function fa⁻¹ : R+ → R is called the "logarithm to base a" and is denoted by loga. By Theorem 2.14, it is a continuous function. We can write, for x ∈ R and y ∈ R+,

expa(x) = y ⇔ x = loga(y).

From the properties

(i) expa(x1 + x2) = expa(x1) expa(x2),
(ii) expa(1) = a,

we directly deduce the corresponding properties of the logarithm:

(j) loga(y1 y2) = loga(y1) + loga(y2),
(jj) loga(a) = 1.

Since the constant function f(x) = 1 verifies (i) and (ii) with a = 1, by the uniqueness of that function we deduce that f = exp1, i.e.,

exp1(x) = 1, for every x ∈ R.

Let us now deduce from (i) and (ii) some general properties of the exponential function. First of all, we observe that, since expa(1) = expa(1 + 0) = expa(1) expa(0), it must be that

expa(0) = 1.


Let us now prove that, for every x ∈ R and every n ∈ N,

expa(nx) = (expa(x))ⁿ.

We argue by induction. If n = 0, then we see that

expa(0x) = 1,   (expa(x))⁰ = 1,

hence the identity surely holds. Assume now that the formula is true for some n ∈ N. Then

expa((n + 1)x) = expa(nx + x) = expa(nx) expa(x) = (expa(x))ⁿ expa(x) = (expa(x))ⁿ⁺¹,

hence it is true also for n + 1. The proof of the formula is thus completed. Taking x = 1, we see that

expa(n) = aⁿ

for every n ∈ N. This fact motivates a new notation: For every x ∈ R, we will often write aˣ instead of expa(x). Taking n ∈ N \ {0} and writing

a = expa(1) = expa(n · (1/n)) = (expa(1/n))ⁿ,

we see that expa(1/n) is that number u ∈ R+ that solves the equation uⁿ = a. Such a number u is called the "nth root of a," and it is denoted by u = ⁿ√a, i.e.,

expa(1/n) = ⁿ√a.

Hence, if m ∈ N and n ∈ N \ {0},

expa(m/n) = expa(m · (1/n)) = (expa(1/n))ᵐ = (ⁿ√a)ᵐ.

On the other hand, writing

1 = expa(0) = expa(x − x) = expa(x) expa(−x),

we see that

expa(−x) = 1/expa(x), for every x ∈ R.


In particular, if m ∈ N and n ∈ N \ {0},

expa(−m/n) = 1/expa(m/n) = 1/(ⁿ√a)ᵐ = (ⁿ√a)⁻ᵐ.

Let us check for a moment that, for every m ∈ Z and n ∈ N \ {0}, we have

(ⁿ√a)ᵐ = ⁿ√(aᵐ).

Indeed, if b = ⁿ√a, then aᵐ = (bⁿ)ᵐ = bⁿᵐ = (bᵐ)ⁿ, whence bᵐ = ⁿ√(aᵐ). We can thus conclude that

expa(m/n) = ⁿ√(aᵐ), for every m/n ∈ Q.

If a ≠ 1, then the exponential function expa : R → R+ is continuous and invertible and, hence, strictly monotone. Since expa(0) = 1 and expa(1) = a,

expa is strictly increasing if a > 1, and strictly decreasing if 0 < a < 1.

(See Fig. 2.1.) Let us also emphasize the following three important formulas:

(ab)^x = a^x b^x,  (1/a)^x = 1/a^x = a^{−x},  (a^y)^x = a^{yx}.

The first one follows from the fact that the function f(x) = a^x b^x verifies property (i) and f(1) = ab, hence f = exp_{ab}. Analogously, for the second one, we take f(x) = 1/a^x. For the third one, take f(x) = a^{yx}: it verifies (i) and f(1) = a^y, hence f = exp_{a^y}.

Fig. 2.1 The function .expa , with .a > 1 and .a < 1, respectively


Fig. 2.2 The function .loga , with .a > 1 and .a < 1, respectively

If a ≠ 1, we have that

log_a is strictly increasing if a > 1, and strictly decreasing if 0 < a < 1.

(See Fig. 2.2.) The following are two important formulas for the logarithm:

log_a(x^y) = y log_a(x),  log_b(x) = log_a(x)/log_a(b).

To prove the first one, set u = log_a(x^y) and v = log_a(x). Then a^u = x^y and a^v = x, and hence a^u = (a^v)^y = a^{vy}. By the injectivity of exp_a, it must be that u = vy, and the first formula is proved. To prove the second formula, set u = log_b(x), v = log_a(x), and w = log_a(b). Then b^u = x, a^v = x, and a^w = b, so that a^v = x = b^u = (a^w)^u = a^{wu}. By injectivity, it must be that v = wu, and the second formula is also proved.
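Both logarithm formulas can be checked numerically; here is a small sketch with hypothetical sample bases and arguments, computing logs to an arbitrary base via natural logarithms:

```python
import math

# Hypothetical sample values for the bases and the arguments.
a, b, x, y = 2.0, 5.0, 7.3, 1.9

def log_base(base, t):
    # logarithm of t to an arbitrary base, via natural logarithms
    return math.log(t) / math.log(base)

# log_a(x^y) = y * log_a(x)
assert math.isclose(log_base(a, x ** y), y * log_base(a, x))

# change of base: log_b(x) = log_a(x) / log_a(b)
assert math.isclose(log_base(b, x), log_base(a, x) / log_base(a, b))
```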

2.5 The Trigonometric Functions

We will now introduce the trigonometric functions, following a path similar to the one traced previously for the exponential function. Given a real number T > 0, a function F : R → Ω, where Ω is any possible set, is said to be "periodic with period T," or "T-periodic" for short, if

F(x + T) = F(x), for every x ∈ R.

Clearly, if T is a period for the function F, then 2T, 3T, ... are also periods for F. We will say that T is the "minimal period" if there are no smaller periods.


Let us introduce the set

S^1 = {z ∈ C : |z| = 1},

i.e., the circle centered at the origin, with radius 1, in the complex field C. The following theorem will be proved in Chap. 5.

Theorem 2.16 Given T > 0, there exists a unique function h_T : R → S^1, continuous and periodic with minimal period T, such that

(i) h_T(x1 + x2) = h_T(x1) h_T(x2), for every x1, x2 in R;
(ii) h_T(T/4) = i.

The function h_T is called the "circular function to base T." Since S^1 is indeed a subset of R², the function h_T has two components, which will be denoted by cos_T and sin_T; they will be called "cosine to base T" and "sine to base T," respectively. We can then write, for every x ∈ R,

h_T(x) = (cos_T(x), sin_T(x)), or h_T(x) = cos_T(x) + i sin_T(x).

These functions are T-periodic, and from the properties of the circular function we have that

(a) (cos_T(x))² + (sin_T(x))² = 1;
(b) cos_T(x1 + x2) = cos_T(x1) cos_T(x2) − sin_T(x1) sin_T(x2);
(c) sin_T(x1 + x2) = sin_T(x1) cos_T(x2) + cos_T(x1) sin_T(x2);
(d) cos_T(T/4) = 0, sin_T(T/4) = 1.

Let us now focus our attention on the interval [0, T[. Writing

i = h_T(T/4) = h_T(0 + T/4) = h_T(0) h_T(T/4) = h_T(0) i,

we see that h_T(0) = 1. Moreover,

h_T(T/2) = h_T(T/4 + T/4) = h_T(T/4) h_T(T/4) = i² = −1,

whereas

h_T(3T/4) = h_T(T/2 + T/4) = h_T(T/2) h_T(T/4) = (−1)i = −i.

Summing up,

cos_T(0) = 1,      sin_T(0) = 0,
cos_T(T/4) = 0,    sin_T(T/4) = 1,
cos_T(T/2) = −1,   sin_T(T/2) = 0,
cos_T(3T/4) = 0,   sin_T(3T/4) = −1.
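These identities can be probed concretely using the complex exponential h_T(x) = e^{2πix/T}, which satisfies the characterizing properties (i) and (ii) of Theorem 2.16 (the value of T below is an arbitrary, hypothetical choice):

```python
import cmath, math

T = 5.0  # a hypothetical period

def h_T(x):
    # the complex exponential of period T; it satisfies (i) and (ii)
    return cmath.exp(2j * math.pi * x / T)

x1, x2 = 0.7, -2.1
assert cmath.isclose(h_T(x1 + x2), h_T(x1) * h_T(x2))   # property (i)
assert cmath.isclose(h_T(T / 4), 1j)                    # property (ii)

# the special values on [0, T[
assert cmath.isclose(h_T(0), 1)
assert cmath.isclose(h_T(T / 2), -1)
assert cmath.isclose(h_T(3 * T / 4), -1j)
```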

Now, from

1 = h_T(0) = h_T(x − x) = h_T(x) h_T(−x)

we have h_T(−x) = h_T(x)^{−1} = h_T(x)*, since |h_T(x)| = 1. Hence,

cos_T(−x) = cos_T(x),  sin_T(−x) = −sin_T(x),

showing that cos_T is an even function, whereas sin_T is odd.

Let us prove now that ĥ_T : [0, T[ → S^1, the restriction of h_T to the interval [0, T[, is bijective. First, injectivity: take α < β in [0, T[. By contradiction, if h_T(α) = h_T(β), then

h_T(β − α) = h_T(β) h_T(−α) = h_T(β)/h_T(α) = 1,

and hence

h_T(x + (β − α)) = h_T(x) h_T(β − α) = h_T(x), for every x ∈ R,

so that β − α would be a period for h_T smaller than T, while we know that T is the minimal period. Then h_T(α) ≠ h_T(β), proving that ĥ_T is injective.

We now prove that

cos_T(x) > 0 if 0 < x < T/4,  < 0 if T/4 < x < 3T/4,  > 0 if 3T/4 < x < T;

sin_T(x) > 0 if 0 < x < T/2,  < 0 if T/2 < x < T.

Fig. 2.7 The hyperbolic function .tanha when .a > 1 and .a < 1, respectively

The striking analogies with the trigonometric functions can be explained by recalling the similar properties of the exponential and circular functions. They will be further investigated in Sect. 8.4.2. We can now define the "hyperbolic tangent" to base a as

tanh_a(x) = sinh_a(x)/cosh_a(x).

Its domain is the whole real line R, and it is continuous (Fig. 2.7).

Let us now consider two examples of functions and examine their continuity.

Example 1 Let f : R → R be defined as

f(x) = sin_T(1/x) if x ≠ 0,  f(x) = 0 if x = 0.

If x0 ≠ 0, then the function f is continuous at x0, since it is the composition of continuous functions. In contrast, if x0 = 0, then f is not continuous at x0, since every neighborhood of 0 contains values of x for which f(x) = 1, whereas f(0) = 0.


Example 2 Now let f : R → R be defined by

f(x) = x sin_T(1/x) if x ≠ 0,  f(x) = 0 if x = 0.

This function is continuous on the whole of R. Indeed, if x0 ≠ 0, then the situation is similar to the one previously described. If x0 = 0, then it is useful to observe that

|f(x)| ≤ |x|, for every x ∈ R.

Thus, once ε > 0 is fixed, it is sufficient to choose δ = ε to have that

|x − 0| < δ ⇒ |f(x) − f(0)| < ε.
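Both examples can be probed numerically. A small sketch (taking T = 2π, so that sin_T is the familiar sine):

```python
import math

def f1(x):
    # sin(1/x) extended by 0 at the origin: NOT continuous at 0
    return math.sin(1 / x) if x != 0 else 0.0

def f2(x):
    # x*sin(1/x) extended by 0 at the origin: continuous at 0
    return x * math.sin(1 / x) if x != 0 else 0.0

# arbitrarily close to 0, f1 keeps taking the value 1 (at x = 1/(pi/2 + 2k*pi))
xs = [1 / (math.pi / 2 + 2 * math.pi * k) for k in range(1, 6)]
assert all(math.isclose(f1(x), 1.0) for x in xs)

# while |f2(x)| <= |x| forces f2(x) -> 0 = f2(0)
assert all(abs(f2(x)) <= abs(x) for x in [10.0 ** -k for k in range(1, 10)])
```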

3 Limits

We will now introduce another fundamental concept that, however, is strongly related to continuity. It is the notion of the "limit" of a function, a local notion, as we will see. As in Chap. 2, the theory will be developed within the framework of metric spaces.

3.1 The Notion of Limit

Our general setting involves two metric spaces, E and F, a point x0 of E, and a function

f : E → F or f : E \ {x0} → F,

not necessarily defined at x0.

Definition 3.1 If there exists l ∈ F such that the function f̃ : E → F, defined by

f̃(x) = f(x) if x ≠ x0,  f̃(x) = l if x = x0,

is continuous at x0, then l is said to be a "limit of f at x0," or also a "limit of f(x) as x tends to x0," and we can write

l = lim_{x→x0} f(x).

In other terms, l is a limit of f at x0 if and only if

∀ε > 0 ∃δ > 0 : ∀x ∈ E, 0 < d_E(x, x0) < δ ⇒ d_F(f(x), l) < ε,

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Fonda, A Modern Introduction to Mathematical Analysis, https://doi.org/10.1007/978-3-031-23713-3_3


or, equivalently,

∀V neighborhood of l ∃U neighborhood of x0 : f(U \ {x0}) ⊆ V.

Sometimes we can also write "f(x) → l as x → x0."

We know that if x0 happens to be an isolated point, then the function f̃ defined earlier will be continuous at x0 for every l ∈ F. Therefore, the notion of limit is of no interest at all in this case. This is why we will always assume that x0 is not an isolated point, in which case we say that x0 is a "cluster point" of E: every neighborhood of x0 contains some point of E that differs from x0 itself.

Note that if x0 is a cluster point of E, then every neighborhood U0 of x0 contains infinitely many points of E. Indeed, once we have found x1 ≠ x0 in U0, it is possible to choose a neighborhood U1 ⊆ U0 of x0 that does not contain x1. Then we can find x2 ≠ x0 in U1, and so on.

From now on we will assume that x0 is a cluster point of E. This assumption also enables us to prove the following proposition.

Proposition 3.2 If a limit of f at x0 exists, then it is unique.

Proof Assume by contradiction that there are two distinct limits l and l′. Let us take ε = ½ d(l, l′). Then there exists a δ > 0 such that

0 < d(x, x0) < δ ⇒ d(f(x), l) < ε,

and there exists a δ′ > 0 such that

0 < d(x, x0) < δ′ ⇒ d(f(x), l′) < ε.

Let x ≠ x0 be such that d(x, x0) < δ and d(x, x0) < δ′ (such an x exists because x0 is a cluster point). Then

d(l, l′) ≤ d(l, f(x)) + d(f(x), l′) < 2ε = d(l, l′),

a contradiction. ∎

The following relationship will surely be useful.

Proposition 3.3 The equivalence

lim_{x→x0} f(x) = l ⇔ lim_{x→x0} d(f(x), l) = 0

always holds true.


Proof The function F(x) = d(f(x), l) has real values, and the distance in R is d(α, β) = |α − β|. The conclusion easily follows from the definitions. ∎

The following proposition underlines the strong relationship between the concepts of limit and continuity.

Proposition 3.4 For any function f : E → F,

f is continuous at x0 ⇔ lim_{x→x0} f(x) = f(x0).

Proof In this case, the function f̃ coincides with f. ∎

3.2 Some Properties of Limits

Let us start with those properties of limits that are directly inherited from those of continuous functions. In the following statements, the functions f and g are defined on E or E \ {x0}, indifferently, and x0 is a cluster point of E.

Theorem 3.5 Let F be a normed vector space, f, g : E \ {x0} → F two functions such that

l1 = lim_{x→x0} f(x),  l2 = lim_{x→x0} g(x),

and α ∈ R. Then

lim_{x→x0} [f(x) ± g(x)] = l1 ± l2,  lim_{x→x0} [αf](x) = α l1.

Assume now that F = R^M for some integer M ≥ 1. For any function f : E \ {x0} → R^M we can consider its components f_k : E \ {x0} → R, with k = 1, 2, ..., M, and we can write

f(x) = (f1(x), f2(x), ..., fM(x)).

Theorem 3.6 The limit lim_{x→x0} f(x) = l ∈ R^M exists if and only if all the limits lim_{x→x0} f_k(x) = l_k ∈ R exist, with k = 1, 2, ..., M. In that case, l = (l1, l2, ..., lM), i.e.,

lim_{x→x0} f(x) = (lim_{x→x0} f1(x), lim_{x→x0} f2(x), ..., lim_{x→x0} fM(x)).

Proof This is a direct consequence of Theorem 2.9 for continuous functions. ∎

We now assume that F = R and state the property of sign permanence.




Theorem 3.7 Let g : E \ {x0} → R be such that

lim_{x→x0} g(x) > 0.

Then there exists a neighborhood U of x0 such that

x ∈ U \ {x0} ⇒ g(x) > 0.

Similarly, if

lim_{x→x0} g(x) < 0,

then there exists a neighborhood U of x0 such that

x ∈ U \ {x0} ⇒ g(x) < 0.

As an immediate consequence, we have the following corollary.

Corollary 3.8 If there is a neighborhood U of x0 such that g(x) ≤ 0 for every x ∈ U \ {x0}, then, if the limit exists,

lim_{x→x0} g(x) ≤ 0.

Similarly, if g(x) ≥ 0 for every x ∈ U \ {x0}, then, if the limit exists,

lim_{x→x0} g(x) ≥ 0.

Still assuming that F = R, we now consider the product and the quotient of two functions.

Theorem 3.9 Let f, g : E \ {x0} → R be such that

l1 = lim_{x→x0} f(x),  l2 = lim_{x→x0} g(x).

Then

lim_{x→x0} [f(x) g(x)] = l1 l2.

Moreover, if l2 ≠ 0, then

lim_{x→x0} f(x)/g(x) = l1/l2.


Now let E be any metric space and F = R. The following theorem has a strange name, indeed.

Theorem 3.10 (Squeeze Theorem) Let F1, F2 : E \ {x0} → R be such that

lim_{x→x0} F1(x) = lim_{x→x0} F2(x) = l.

If f : E \ {x0} → R has the property that

F1(x) ≤ f(x) ≤ F2(x), for every x ∈ E \ {x0},

then

lim_{x→x0} f(x) = l.

Proof Once ε > 0 has been fixed, there exist δ1 > 0 and δ2 > 0 such that

0 < d(x, x0) < δ1 ⇒ l − ε < F1(x) < l + ε,
0 < d(x, x0) < δ2 ⇒ l − ε < F2(x) < l + ε.

Taking δ = min{δ1, δ2}, we have

0 < d(x, x0) < δ ⇒ l − ε < F1(x) ≤ f(x) ≤ F2(x) < l + ε,

thereby completing the proof. ∎

As a consequence, we have the following corollary.

Corollary 3.11 Let f, g : E \ {x0} → R be such that

lim_{x→x0} f(x) = 0,

and suppose there is a constant C > 0 such that

|g(x)| ≤ C, for every x ∈ E \ {x0}.

Then

lim_{x→x0} f(x) g(x) = 0.


Proof After noticing that

−C|f(x)| ≤ f(x) g(x) ≤ C|f(x)|,

and recalling that, by Proposition 3.3,

lim_{x→x0} f(x) = 0 ⇔ lim_{x→x0} |f(x)| = 0,

the result follows from the Squeeze Theorem 3.10, taking F1(x) = −C|f(x)| and F2(x) = C|f(x)|. ∎

Remark 3.12 Returning to the statements of the preceding theorem and corollary, we realize that "for every x ∈ E \ {x0}" could be weakened to "for every x ≠ x0 in a neighborhood of x0." This is due to the fact that the notion of limit relates only to the local behavior of the function near x0. This observation holds in general when dealing with limits and will often be used in what follows.
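Corollary 3.11 is easy to visualize numerically: an infinitesimal factor kills a bounded, oscillating one. A minimal sketch (sample points chosen arbitrarily):

```python
import math

# f(x) = x -> 0, while g(x) = sin(1/x) is bounded by C = 1 but has no
# limit at 0; the product is squeezed between -C|f| and +C|f|.
C = 1.0
f = lambda x: x
g = lambda x: math.sin(1 / x)

for x in [10.0 ** -k for k in range(1, 12)]:
    prod = f(x) * g(x)
    assert -C * abs(f(x)) <= prod <= C * abs(f(x))
    assert abs(prod) <= x   # so f*g -> 0 even though g oscillates
```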

3.3 Change of Variables in the Limit

We now return to the general setting in metric spaces and examine the composition of two functions, f and g. We have two interesting situations. In the first one, some continuity of g is needed.

Theorem 3.13 Let f : E → F, or f : E \ {x0} → F, be such that

lim_{x→x0} f(x) = l.

If g : F → G is continuous at l, then

lim_{x→x0} g(f(x)) = g(l),

i.e.,

lim_{x→x0} g(f(x)) = g(lim_{x→x0} f(x)).

Proof Recalling the definition of limit, we know that the function f̃ : E → F is continuous at x0, whereas g is continuous at l = f̃(x0). Hence, g ∘ f̃ is continuous at x0, so that, recalling that f(x) = f̃(x) when x ≠ x0,

lim_{x→x0} g(f(x)) = lim_{x→x0} g(f̃(x)) = g(f̃(x0)) = g(l),

as we wanted to prove. ∎


The following theorem gives us the change of variables formula. Here the function g does not have to be continuous at the limit point of f; indeed, it could even fail to be defined there.

Theorem 3.14 Let f : E → F, or f : E \ {x0} → F, be such that

lim_{x→x0} f(x) = l.

Assume, moreover, that f(x) ≠ l for every x ≠ x0 in a neighborhood of x0. Let g : F → G, or g : F \ {l} → G, be such that

lim_{y→l} g(y) = L.

Then

lim_{x→x0} g(f(x)) = L,

i.e.,

lim_{x→x0} g(f(x)) = lim_{y → lim_{x→x0} f(x)} g(y).    (3.1)

In the preceding formula, we say that the "change of variables y = f(x)" has been performed in the limit.

Proof We first observe that, in view of the assumptions, g ∘ f is defined on U \ {x0} for some neighborhood U of x0. Moreover, l is a cluster point of F. Recalling again the definition of limit, we know that the function f̃ : E → F is continuous at x0, with f̃(x0) = l. Similarly, let us introduce the function g̃ : F → G, defined as

g̃(y) = g(y) if y ≠ l,  g̃(y) = L if y = l.

This function is continuous at l, so the composition g̃ ∘ f̃ is continuous at x0. For every x ∈ U \ {x0}, since f(x) ≠ l, we have

g(f(x)) = g̃(f(x)) = g̃(f̃(x)),

and hence

lim_{x→x0} g(f(x)) = lim_{x→x0} g̃(f̃(x)) = g̃(f̃(x0)) = L,

thereby proving the result. ∎

3.4 On the Limit of Restrictions

We have studied some properties of limits in a context where E and F are metric spaces, x0 is a cluster point of E, and either f : E → F or f : E \ {x0} → F. We now note that all the aforementioned considerations still hold if we assume that the domain of f is a subset of E, say D ⊆ E, provided that x0 is a "cluster point of D." By this we mean that every neighborhood of x0 contains some point of D that differs from x0 itself. Notice that x0 might not be an element of D.

Now let f : E \ {x0} → F, and let Ẽ ⊆ E. We can consider the restriction of f to Ẽ \ {x0}, i.e., the function f̂ : Ẽ \ {x0} → F such that f̂(x) = f(x) for every x ∈ Ẽ \ {x0}.

Theorem 3.15 If the limit of f at x0 exists and x0 is a cluster point of Ẽ, then the limit of f̂ at x0 also exists, and it has the same value:

lim_{x→x0} f̂(x) = lim_{x→x0} f(x).

Proof The proof follows directly from the definition of f̂. ∎

The previous theorem is often used to establish the nonexistence of the limit of f at some point x0, by finding two restrictions along which the limits differ.

Example 1 The function f : R² \ {(0, 0)} → R, defined by

f(x, y) = xy/(x² + y²),

has no limit as (x, y) → (0, 0), since the restrictions of f to the lines Ẽ1 = {(x, y) : x = 0} and Ẽ2 = {(x, y) : x = y} have different limits.

Example 2 More surprising is the case of the function

f(x, y) = x²y/(x⁴ + y²).

It can be seen that all its restrictions to the lines passing through (0, 0) have limits equal to 0. Indeed, this is easily seen for Ẽ1 = {(x, y) : x = 0} and Ẽ2 = {(x, y) : y = 0}, whereas for any m ≠ 0,

lim_{x→0} f(x, mx) = lim_{x→0} mx³/(x⁴ + m²x²) = lim_{x→0} mx/(x² + m²) = 0.

However, the restriction to the parabola {(x, y) : y = x²} is constantly equal to 1/2, thereby leading to a different limit. Hence, the function f has no limit at (0, 0).
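The behavior in Example 2 is easy to observe numerically: along lines the values shrink, while along the parabola they stay at 1/2 (the sample points below are arbitrary):

```python
# f(x, y) = x^2 y / (x^4 + y^2) near (0, 0)
def f(x, y):
    return x * x * y / (x ** 4 + y * y)

# along every line y = m*x the values tend to 0 ...
for m in (1.0, -2.0, 5.0):
    assert abs(f(1e-6, m * 1e-6)) < 1e-5

# ... yet along the parabola y = x^2 the restriction is constantly 1/2
for x in (0.1, 1e-3, 1e-6):
    assert abs(f(x, x * x) - 0.5) < 1e-12
```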


Example 3 Quite unlike the two preceding examples, let us now prove that

lim_{(x,y)→(0,0)} x²y²/(x² + y²) = 0.

Let ε > 0 be fixed. After having verified that

x²y²/(x² + y²) ≤ (1/2)(x² + y²),

it is natural to take δ = √(2ε), so that

d((x, y), (0, 0)) < δ ⇒ |x²y²/(x² + y²) − 0| < ε.

Now let E be a subset of R and F be a general metric space. Given f : E → F or f : E \ {x0} → F, we can consider the two restrictions f̂1 and f̂2 to the sets Ẽ1 = E ∩ ]−∞, x0[ and Ẽ2 = E ∩ ]x0, +∞[, respectively. If x0 is a cluster point of Ẽ1, then we call the "left limit" of f at x0, whenever it exists, the limit of f̂1(x) when x approaches x0 (in Ẽ1), and we denote it by

lim_{x→x0⁻} f(x).

Analogously, if x0 is a cluster point of Ẽ2, we call the "right limit" of f at x0, whenever it exists, the limit of f̂2(x) when x approaches x0 (in Ẽ2), and we denote it by

lim_{x→x0⁺} f(x).

Theorem 3.16 If x0 is a cluster point of both E ∩ ]−∞, x0[ and E ∩ ]x0, +∞[, then the limit of f at x0 exists if and only if both the left limit and the right limit exist and are equal to each other.

Proof We already know that if the limit of f at x0 exists, then all restrictions of f must have the same limit at x0. Conversely, let us assume that the left limit and the right limit exist and that l ∈ F is their common value. Let ε > 0 be fixed. Then there exist δ1 > 0 and δ2 > 0 such that, if x ∈ E,

x0 − δ1 < x < x0 ⇒ d(f(x), l) < ε

and

x0 < x < x0 + δ2 ⇒ d(f(x), l) < ε.

Defining δ = min{δ1, δ2}, we then have that, if x ∈ E \ {x0},

x0 − δ < x < x0 + δ ⇒ d(f(x), l) < ε,

showing that the limit of f at x0 exists and is equal to l. ∎

Example The "sign function" f : R → R, defined as

f(x) = 1 if x > 0,  f(x) = 0 if x = 0,  f(x) = −1 if x < 0,

has no limit at x0 = 0, since lim_{x→0⁻} f(x) = −1 and lim_{x→0⁺} f(x) = 1.
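A quick numerical sketch of the two one-sided limits of the sign function (the sample points are arbitrary):

```python
def sign(x):
    return 1 if x > 0 else (-1 if x < 0 else 0)

# approaching 0 from the right the values are constantly 1, from the
# left constantly -1: the one-sided limits differ, so no limit at 0
assert all(sign(10.0 ** -k) == 1 for k in range(1, 9))
assert all(sign(-(10.0 ** -k)) == -1 for k in range(1, 9))
assert sign(0) == 0
```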

3.5 The Extended Real Line

Let us consider the function φ : R → ]−1, 1[ defined as

φ(x) = x/(1 + |x|).

This is an invertible function, with inverse φ⁻¹ : ]−1, 1[ → R given by

φ⁻¹(y) = y/(1 − |y|).

We can now define a new distance on R as

d̃(x, x′) = |φ(x) − φ(x′)|.

It can indeed be verified that it satisfies the four properties characterizing a distance. Let us denote by B̃(x0, ρ) the open ball for this new distance centered at x0, with radius ρ > 0, i.e.,

B̃(x0, ρ) = {x ∈ R : |φ(x) − φ(x0)| < ρ}.

We claim that the neighborhoods of any point x0 ∈ R remain the same as those provided by the usual distance on R. Indeed, since φ is continuous at x0, for every ρ1 > 0 there exists a ρ2 > 0 such that

|x − x0| < ρ2 ⇒ |φ(x) − φ(x0)| < ρ1,

i.e.,

]x0 − ρ2, x0 + ρ2[ ⊆ B̃(x0, ρ1).

Conversely, since φ⁻¹ is continuous at y0 = φ(x0) ∈ ]−1, 1[, for every ρ1 > 0 there exists a ρ2 > 0 such that

|y − y0| < ρ2 ⇒ y ∈ ]−1, 1[ and |φ⁻¹(y) − φ⁻¹(y0)| < ρ1.

In particular, taking y = φ(x),

|φ(x) − φ(x0)| < ρ2 ⇒ φ(x) ∈ ]−1, 1[ and |x − x0| < ρ1,

i.e.,

B̃(x0, ρ2) ⊆ ]x0 − ρ1, x0 + ρ1[.

We have thus proved our claim.

Let us now introduce a new set, R̃, defined by adding to R two new elements, denoted by −∞ and +∞, i.e.,

R̃ = R ∪ {−∞, +∞}.

The set R̃ is totally ordered, maintaining the usual order on the reals while setting

−∞ < x < +∞, for every x ∈ R.

Let us define the function φ̃ : R̃ → [−1, 1] as

φ̃(x) = −1 if x = −∞,  φ̃(x) = φ(x) if x ∈ R,  φ̃(x) = 1 if x = +∞.

It is invertible, with inverse φ̃⁻¹ : [−1, 1] → R̃ given by

φ̃⁻¹(y) = −∞ if y = −1,  φ̃⁻¹(y) = φ⁻¹(y) if y ∈ ]−1, 1[,  φ̃⁻¹(y) = +∞ if y = 1.

We now define, for every x, x′ ∈ R̃,

d̃(x, x′) = |φ̃(x) − φ̃(x′)|.

It is readily verified that d̃ is a distance on R̃, so that R̃ is now a metric space. Let us see, for example, what a ball centered at +∞ looks like:

B(+∞, ρ) = {x ∈ R̃ : |φ̃(x) − 1| < ρ} = {x ∈ R̃ : φ̃(x) > 1 − ρ},

hence

B(+∞, ρ) = R̃ if ρ > 2,  B(+∞, ρ) = ]−∞, +∞] if ρ = 2,  B(+∞, ρ) = ]φ⁻¹(1 − ρ), +∞] if ρ < 2,

where we have used the notation

]a, +∞] = {x ∈ R̃ : x > a} = ]a, +∞[ ∪ {+∞}.
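The compression φ̃ and the resulting distance d̃ are concrete enough to compute with; a minimal sketch, using Python's `float("inf")` to stand for the two new elements:

```python
# The map phi and its extension phi_tilde realize the distance on the
# extended real line.
INF = float("inf")

def phi(x):
    return x / (1 + abs(x))          # maps R bijectively onto ]-1, 1[

def phi_tilde(x):
    if x == INF:
        return 1.0
    if x == -INF:
        return -1.0
    return phi(x)

def d_tilde(x, y):
    return abs(phi_tilde(x) - phi_tilde(y))

# +infinity is at finite distance from every real point ...
assert d_tilde(0.0, INF) == 1.0
# ... and large reals are d_tilde-close to +infinity:
assert d_tilde(10.0 ** 6, INF) < 1e-5
# the whole extended line has diameter 2
assert d_tilde(-INF, INF) == 2.0
```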

We can thus state that a neighborhood of +∞ is a set that contains, besides +∞ itself, an interval of the type ]α, +∞[ for some α ∈ R. Analogously, a neighborhood of −∞ is a set containing −∞ and an interval of the type ]−∞, β[ for some β ∈ R.

Let us see how the definition of limit translates in some cases where the new elements −∞ and +∞ appear. To start with, let f : E → F be a function with E ⊆ R, whose codomain F is any metric space. Considering E as a subset of R̃, we have that +∞ is a cluster point of E if and only if E is not bounded from above. In that case,

lim_{x→+∞} f(x) = l ∈ F ⇔ ∀V neighborhood of l ∃U neighborhood of +∞ : f(U ∩ E) ⊆ V
                        ⇔ ∀ε > 0 ∃α ∈ R : x > α ⇒ d(f(x), l) < ε.

Similarly, if E is not bounded from below,

lim_{x→−∞} f(x) = l ∈ F ⇔ ∀ε > 0 ∃β ∈ R : x < β ⇒ d(f(x), l) < ε.


Notice that

lim_{x→−∞} f(x) = l ⇔ lim_{x→+∞} f(−x) = l.

Let us now consider a function f : E → R, or f : E \ {x0} → R, where E is any metric space and x0 is a cluster point of E. If we consider the codomain F = R as a subset of R̃, then

lim_{x→x0} f(x) = +∞ ⇔ ∀V neighborhood of +∞ ∃U neighborhood of x0 : f(U \ {x0}) ⊆ V
                     ⇔ ∀α ∈ R ∃δ > 0 : 0 < d(x, x0) < δ ⇒ f(x) > α.

Similarly,

lim_{x→x0} f(x) = −∞ ⇔ ∀β ∈ R ∃δ > 0 : 0 < d(x, x0) < δ ⇒ f(x) < β.

Notice that

lim_{x→x0} f(x) = −∞ ⇔ lim_{x→x0} [−f(x)] = +∞.

The foregoing situations can be combined. For example, if E ⊆ R is not bounded from above and F = R is considered a subset of R̃, then

lim_{x→+∞} f(x) = +∞ ⇔ ∀V neighborhood of +∞ ∃U neighborhood of +∞ : f(U ∩ E) ⊆ V
                      ⇔ ∀α ∈ R ∃α′ ∈ R : x > α′ ⇒ f(x) > α,

and

lim_{x→+∞} f(x) = −∞ ⇔ ∀β ∈ R ∃α′ ∈ R : x > α′ ⇒ f(x) < β.

On the other hand, if E ⊆ R is not bounded from below,

lim_{x→−∞} f(x) = +∞ ⇔ ∀α ∈ R ∃β′ ∈ R : x < β′ ⇒ f(x) > α,

and

lim_{x→−∞} f(x) = −∞ ⇔ ∀β ∈ R ∃β′ ∈ R : x < β′ ⇒ f(x) < β.

An important particular situation is encountered when dealing with a sequence (a_n)_n in a metric space F. We are thus given a function f : N → F defined as f(n) = a_n. Considering N as a subset of R̃, it is readily seen that the only cluster point of N is +∞, and, adapting the definition of limit to this case, we can write

lim_{n→+∞} a_n = l ∈ F ⇔ ∀ε > 0 ∃n̄ ∈ N : n ≥ n̄ ⇒ d(a_n, l) < ε.

As a particular case we may have F = R, considered as a subset of R̃, and we thus recover the preceding definitions when l = −∞ or l = +∞. The limit of a sequence will often be denoted simply by lim_n a_n, tacitly implying that n → +∞.
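The ε–n̄ definition can be made concrete by exhibiting an explicit witness n̄ for each ε; a small sketch for the hypothetical sample sequence a_n = n/(n + 1), whose limit is 1:

```python
import math

# lim a_n = 1 for a_n = n/(n+1): an explicit epsilon -> n_bar witness.
def a(n):
    return n / (n + 1)

def n_bar(eps):
    # |a_n - 1| = 1/(n+1) < eps  as soon as  n >= ceil(1/eps)
    return math.ceil(1 / eps)

for eps in (0.1, 1e-3, 1e-6):
    nb = n_bar(eps)
    assert all(abs(a(n) - 1) < eps for n in range(nb, nb + 100))
```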

3.6 Some Operations with −∞ and +∞

When the limits are −∞ or +∞, the usual rules for operating with limits cannot be applied. We will provide here a few useful rules covering some of these cases. In what follows, all the functions will be defined either on the whole metric space E or on E \ {x0}, and x0 will always be assumed to be a cluster point of E. Let us start with the sum of two functions.

Theorem 3.17 If

lim_{x→x0} f(x) = +∞

and there exists a γ ∈ R such that

g(x) ≥ γ, for every x ≠ x0 in a neighborhood of x0,

then

lim_{x→x0} [f(x) + g(x)] = +∞.

Proof Let α ∈ R be fixed. Defining α̃ = α − γ, there exists a δ > 0 such that

0 < d(x, x0) < δ ⇒ f(x) > α̃.

Hence,

0 < d(x, x0) < δ ⇒ f(x) + g(x) > α̃ + γ = α,

thereby proving the result. ∎


Corollary 3.18 If

lim_{x→x0} f(x) = +∞ and lim_{x→x0} g(x) = l ∈ R (or l = +∞),

then

lim_{x→x0} [f(x) + g(x)] = +∞.

Proof If the limit of g is some l ∈ R, then there exists a δ > 0 such that

0 < d(x, x0) < δ ⇒ g(x) ≥ l − 1.

On the other hand, if the limit of g is +∞, then we can find a δ > 0 such that

0 < d(x, x0) < δ ⇒ g(x) ≥ 0.

In either case, the previous theorem can be applied to obtain the conclusion. ∎

As a mnemonic rule, we will briefly write

(+∞) + l = +∞ if l is a real number;  (+∞) + (+∞) = +∞.

In perfect analogy, we can state a theorem, with a related corollary, in the case where the limit of f is −∞. As a mnemonic rule, we will then write

(−∞) + l = −∞ if l is a real number;  (−∞) + (−∞) = −∞.

Regarding the product of two functions, we have the following theorem.

Theorem 3.19 If

lim_{x→x0} f(x) = +∞

and there exists a γ > 0 such that

g(x) ≥ γ, for every x ≠ x0 in a neighborhood of x0,

then

lim_{x→x0} [f(x) g(x)] = +∞.

Proof Let α ∈ R be fixed. We may assume with no loss of generality that α > 0. Setting α̃ = α/γ, there exists a δ > 0 such that

0 < d(x, x0) < δ ⇒ f(x) > α̃.

Hence,

0 < d(x, x0) < δ ⇒ f(x) g(x) > α̃ γ = α,

thereby proving the statement. ∎

thereby proving the statement. Corollary 3.20 If .

lim f (x) = +∞

and

x→x0

lim g(x) = l > 0 (or l = +∞) ,

x→x0

then .

lim [f (x)g(x)] = +∞ .

x→x0

Proof If the limit of g is a real number .l > 0, then there exists a .δ > 0 such that 0 < d(x, x0 ) < δ ⇒ g(x) ≥

.

l . 2

On the other hand, if the limit of g is .+∞, then there is a .δ > 0 such that 0 < d(x, x0 ) < δ ⇒ g(x) ≥ 1 .

.

In any case, the previous theorem provides the conclusion. In the same spirit, we will briefly write .

(+∞) · l = +∞

if l > 0 is a real number ;

(+∞) · (+∞) = +∞ , with all the following variants: .

(+∞) · l = −∞

if l < 0 is a real number ;

(−∞) · l = −∞

if l > 0 is a real number ;



3.6 Some Operations with −∞ and +∞

79

(−∞) · l = +∞

if l < 0 is a real number ;

(+∞) · (−∞) = −∞ ; (−∞) · (−∞) = +∞ . Let us now analyze the reciprocal of a function. We have two theorems. Theorem 3.21 If .

lim_{x→x0} |f(x)| = +∞,

then

lim_{x→x0} 1/f(x) = 0.

Proof Let ε > 0 be fixed. Setting α = 1/ε, there exists a δ > 0 such that

0 < d(x, x0) < δ ⇒ |f(x)| > α.

Hence,

0 < d(x, x0) < δ ⇒ |1/f(x) − 0| = 1/|f(x)| < 1/α = ε,

thereby proving the claim. ∎

Theorem 3.22 If

lim_{x→x0} f(x) = 0

and

f(x) > 0, for every x ≠ x0 in a neighborhood of x0,

then

lim_{x→x0} 1/f(x) = +∞.

However, if

f(x) < 0, for every x ≠ x0 in a neighborhood of x0,

then

lim_{x→x0} 1/f(x) = −∞.

Proof We treat the first case, the second one being similar. Let α ∈ R be fixed; we can assume without loss of generality that α > 0. Setting ε = 1/α, there exists a δ > 0 such that

0 < d(x, x0) < δ ⇒ 0 < f(x) < ε.

Then

0 < d(x, x0) < δ ⇒ 1/f(x) > 1/ε = α,

and the proof is completed. ∎

Finally, we present two useful variants of the Squeeze Theorem 3.10 for the case where the limit is infinite; here only one comparison function is needed.

Theorem 3.23 Let F1 be such that

lim_{x→x0} F1(x) = +∞.

If

f(x) ≥ F1(x), for every x ≠ x0 in a neighborhood of x0,

then

lim_{x→x0} f(x) = +∞.

Proof Setting g(x) = f(x) − F1(x), we have g(x) ≥ 0 for every x ≠ x0 in a neighborhood of x0, and f(x) = F1(x) + g(x). The result then follows directly from Theorem 3.17. ∎

In the case where the limit is −∞, we have the following analogous result.

Theorem 3.24 Let F2 be such that

lim_{x→x0} F2(x) = −∞.

If

f(x) ≤ F2(x), for every x ≠ x0 in a neighborhood of x0,

then

lim_{x→x0} f(x) = −∞.

We will now deal with some elementary situations as x approaches either +∞ or −∞.

1. Let us first consider the function

f(x) = x^n,

where n is an integer. It can be verified by induction that, for every n ≥ 1,

x ≥ 1 ⇒ x^n ≥ x.

Since clearly lim_{x→+∞} x = +∞, as a consequence of the preceding theorems we have

lim_{x→+∞} x^n = +∞ if n ≥ 1,  1 if n = 0,  0 if n ≤ −1.

If we then take into account that

(−x)^n = x^n if n is even,  (−x)^n = −x^n if n is odd,

we also conclude that

lim_{x→−∞} x^n = +∞ if n ≥ 1 is even,  −∞ if n ≥ 1 is odd,  1 if n = 0,  0 if n ≤ −1.

2. Let us consider the polynomial function

f(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_2 x² + a_1 x + a_0,

where n ≥ 1 and a_n ≠ 0. Writing

f(x) = x^n (a_n + a_{n−1}/x + ··· + a_2/x^{n−2} + a_1/x^{n−1} + a_0/x^n)

and using the fact that

lim_{x→+∞} (a_n + a_{n−1}/x + ··· + a_2/x^{n−2} + a_1/x^{n−1} + a_0/x^n) = a_n,

we see that

lim_{x→+∞} f(x) = +∞ if a_n > 0,  −∞ if a_n < 0,

whereas

lim_{x→−∞} f(x) = +∞ if either [n is even and a_n > 0] or [n is odd and a_n < 0],
lim_{x→−∞} f(x) = −∞ if either [n is even and a_n < 0] or [n is odd and a_n > 0].

3. Consider now the rational function

f(x) = (a_n x^n + a_{n−1} x^{n−1} + ··· + a_2 x² + a_1 x + a_0)/(b_m x^m + b_{m−1} x^{m−1} + ··· + b_2 x² + b_1 x + b_0),

where n, m ≥ 1 and a_n ≠ 0, b_m ≠ 0. As previously, writing

f(x) = x^{n−m} · (a_n + a_{n−1}/x + ··· + a_0/x^n)/(b_m + b_{m−1}/x + ··· + b_0/x^m),

we can conclude that

lim_{x→+∞} f(x) = lim_{x→+∞} (a_n/b_m) x^{n−m} =
  +∞ if n > m and a_n, b_m have the same sign,
  −∞ if n > m and a_n, b_m have opposite signs,
  a_n/b_m if n = m,
  0 if n < m.

In a similar way, once it is observed that

lim_{x→−∞} f(x) = lim_{x→−∞} (a_n/b_m) x^{n−m},

the limit can be computed in all the different cases.
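The leading-term rule is easy to watch in action; a sketch with a hypothetical rational function where n = m = 3, so the limit at +∞ is a_n/b_m = 3/2:

```python
# f(x) = (3x^3 - x + 1) / (2x^3 + 5x^2): the limit at +infinity is
# a_n/b_m = 3/2, approached like O(1/x).
def f(x):
    return (3 * x**3 - x + 1) / (2 * x**3 + 5 * x**2)

for x in (1e3, 1e6, 1e9):
    assert abs(f(x) - 1.5) < 10 / x
```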

3.7 Limits of Monotone Functions

We will now see how the monotonicity of a function makes it possible to establish the existence of left or right limits. Let E be a subset of R, and let x0 be a cluster point of E ∩ ]x0, +∞[.

Theorem 3.25 If f : E ∩ ]x0, +∞[ → R is increasing, then

lim_{x→x0⁺} f(x) = inf f(E ∩ ]x0, +∞[).

On the other hand, if f is decreasing, then

lim_{x→x0⁺} f(x) = sup f(E ∩ ]x0, +∞[).

Proof We prove only the first statement, since the proof of the second one is analogous. Set ῑ = inf f(E ∩ ]x0, +∞[). If it happens that ῑ ∈ R, then we fix an ε > 0. By the properties of the infimum, there exists a ȳ ∈ f(E ∩ ]x0, +∞[) such that ȳ < ῑ + ε. Then, taking x̄ ∈ E ∩ ]x0, +∞[ satisfying f(x̄) = ȳ and using the fact that f is increasing, we have

x0 < x < x̄ ⇒ ῑ ≤ f(x) ≤ f(x̄) < ῑ + ε,

thereby completing the proof in this case. If it happens that ῑ = −∞, then we fix a β ∈ R. Since f(E ∩ ]x0, +∞[) is unbounded from below, there exists an x̄ ∈ E ∩ ]x0, +∞[ satisfying f(x̄) < β. Using the fact that f is increasing, we have

x0 < x < x̄ ⇒ f(x) ≤ f(x̄) < β,

so the limit equals −∞ in this case as well. ∎

We have

lim_{x→+∞} a^x = +∞ if a > 1,  0 if 0 < a < 1,

while

lim_{x→+∞} log_a(x) = +∞ if a > 1,  −∞ if 0 < a < 1.

Writing x^α = exp(α ln x), we see that

lim_{x→+∞} x^α = +∞ if α > 0,  1 if α = 0,  0 if α < 0.

In the following theorem, we compare the growth of e^x, x^α, and ln x at +∞.

Theorem 3.29 For every α > 0 we have

lim_{x→+∞} e^x/x^α = +∞,  lim_{x→+∞} (ln x)/x^α = 0.


Proof Let us start by proving that, if a > 1,

  limₙ aⁿ/n = +∞.

Indeed, writing a = 1 + b, with b > 0, we see that, if n ≥ 2,

  aⁿ = (1 + b)ⁿ = 1 + nb + (n(n − 1)/2) b² + ··· + bⁿ > (n(n − 1)/2) b².

Hence, for every n ≥ 2,

  aⁿ/n > ((n − 1)/2) b²,

whence the result, by Theorem 3.23. Let us now show that for every integer k ≥ 1,

  limₙ aⁿ/nᵏ = +∞.

Indeed, writing

  aⁿ/nᵏ = (a^{n/k}/n)ᵏ = ((ᵏ√a)ⁿ/n)ᵏ,

we can use the fact that limₙ (ᵏ√a)ⁿ/n = +∞ in order to arrive at the conclusion.

We now assume that x ≥ 1. Let n(x) and n(α) be natural numbers such that

  n(x) ≤ x < n(x) + 1,   n(α) ≤ α < n(α) + 1.

Setting k = n(α) + 1, we have

  eˣ/xᵅ ≥ eˣ/x^{n(α)+1} = eˣ/xᵏ ≥ eˣ/(n(x) + 1)ᵏ ≥ e^{n(x)}/(n(x) + 1)ᵏ.

Moreover,

  lim_{x→+∞} e^{n(x)}/(n(x) + 1)ᵏ = limₙ eⁿ/(n + 1)ᵏ = (1/e) limₙ e^{n+1}/(n + 1)ᵏ = (1/e) limₘ eᵐ/mᵏ = +∞,

and the first identity follows.


Now, by the change of variables "y = ln x," we obtain

  lim_{x→+∞} (ln x)/xᵅ = lim_{y→+∞} y/(eʸ)ᵅ = lim_{y→+∞} (y^{1/α}/eʸ)ᵅ = (lim_{y→+∞} eʸ/y^{1/α})^{−α} = 0,

thereby also proving the second identity. ∎
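The growth comparison in Theorem 3.29 is easy to observe numerically; here is a small sketch (not from the book), with the arbitrary choice α = 3:

```python
import math

# Numerical look at Theorem 3.29: e^x overwhelms x^alpha, while ln x is
# overwhelmed by it, as x grows (alpha = 3 here).
def exp_ratio(x, alpha=3.0):
    return math.exp(x) / x**alpha

def log_ratio(x, alpha=3.0):
    return math.log(x) / x**alpha

for x in [10.0, 50.0, 100.0]:
    print(x, exp_ratio(x), log_ratio(x))
```

The first column of ratios grows without bound while the second shrinks toward 0, matching the two limits of the theorem.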

We have thus seen that the exponential function eˣ grows at +∞ faster than any power xᵅ. We now show that the factorial grows faster still.

Theorem 3.30 For every a ∈ ℝ,

  limₙ aⁿ/n! = 0.

Proof If |a| < 1, then limₙ aⁿ = 0, whence the result. Let us now assume |a| ≥ 1 and prove by induction that for every n ≥ n(|a|),

  |a|^{n−n(|a|)} ≤ n!.

Indeed, this is surely true for n = n(|a|). On the other hand, if the inequality is true for some n ≥ n(|a|), then, since |a| < n + 1,

  |a|^{n+1−n(|a|)} = |a|^{n−n(|a|)} |a| ≤ n! |a| ≤ n! (n + 1) = (n + 1)!,

so that the inequality is also true for n + 1. Now, for n ≥ n(|a|) + 1,

  |a|ⁿ/n! = |a|^{n−1−n(|a|)} |a|^{1+n(|a|)}/n! ≤ (n − 1)! |a|^{1+n(|a|)}/n! = |a|^{1+n(|a|)}/n,

and the result follows. ∎
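Theorem 3.30 can also be checked numerically; a quick sketch (not from the book) with the base a = 10:

```python
import math

# Theorem 3.30 numerically: a^n / n! tends to 0 even for a large base a,
# because the factorial eventually outgrows every geometric sequence.
def term(a, n):
    return a**n / math.factorial(n)

values = [term(10.0, n) for n in (10, 25, 50, 100)]
print(values)  # rapidly decreasing toward 0
```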

3.9 Liminf and Limsup

Let (aₙ)ₙ be a sequence of real numbers. For every couple of natural numbers n, ℓ we define

  α_{n,ℓ} = min{aₙ, aₙ₊₁, …, a_{n+ℓ}},   β_{n,ℓ} = max{aₙ, aₙ₊₁, …, a_{n+ℓ}}.


If we keep n fixed, the sequence (α_{n,ℓ})_ℓ is decreasing, and the sequence (β_{n,ℓ})_ℓ is increasing, so the following limits exist:

  ᾱₙ = lim_ℓ α_{n,ℓ} = inf{aₙ, aₙ₊₁, …},   β̄ₙ = lim_ℓ β_{n,ℓ} = sup{aₙ, aₙ₊₁, …}.

Notice that ᾱₙ could be equal to −∞, and β̄ₙ could be equal to +∞. Moreover,

  ᾱₙ ≤ aₙ ≤ β̄ₙ for every n.

Now the sequence (ᾱₙ)ₙ either is constantly equal to −∞ or has real values and is increasing; similarly, the sequence (β̄ₙ)ₙ either is constantly equal to +∞ or has real values and is decreasing. We can then define the "lower limit" and the "upper limit" of (aₙ)ₙ as

  lim infₙ aₙ = limₙ ᾱₙ,   lim supₙ aₙ = limₙ β̄ₙ.

Let us see how the lower limit can be characterized. We have three cases:

  lim infₙ aₙ = ℓ ∈ ℝ ⇔ (i) ∀ε > 0 ∃n̄ ∈ ℕ : n ≥ n̄ ⇒ aₙ > ℓ − ε, and (ii) ∀ε > 0, aₙ < ℓ + ε for infinitely many values of n.

  lim infₙ aₙ = −∞ ⇔ ∀β ∈ ℝ, aₙ < β for infinitely many values of n.

  lim infₙ aₙ = +∞ ⇔ ∀α ∈ ℝ ∃n̄ ∈ ℕ : n ≥ n̄ ⇒ aₙ > α.

Notice that this last case is equivalent to limₙ aₙ = +∞. Analogously, for the upper limit we also have three cases:

  lim supₙ aₙ = ℓ ∈ ℝ ⇔ (i) ∀ε > 0 ∃n̄ ∈ ℕ : n ≥ n̄ ⇒ aₙ < ℓ + ε, and (ii) ∀ε > 0, aₙ > ℓ − ε for infinitely many values of n.

  lim supₙ aₙ = +∞ ⇔ ∀α ∈ ℝ, aₙ > α for infinitely many values of n.

  lim supₙ aₙ = −∞ ⇔ ∀β ∈ ℝ ∃n̄ ∈ ℕ : n ≥ n̄ ⇒ aₙ < β.

This last case is equivalent to limₙ aₙ = −∞. The advantage of considering the lower and upper limits is that they always exist, whereas the limit, as we know, might not exist. The following theorem clarifies this situation.
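The definitions above can be made executable by truncating the tails; a small sketch (not the book's code) for aₙ = (−1)ⁿ(1 + 1/n), which has lower limit −1 and upper limit +1 while limₙ aₙ does not exist:

```python
# Lower/upper limits via tail infima and suprema, following the definitions:
# alpha_bar_n = inf of the tail, beta_bar_n = sup of the tail.
a = [(-1)**n * (1 + 1/n) for n in range(1, 10001)]

def tail_inf(seq, n):   # alpha_bar_n = inf {a_n, a_{n+1}, ...} (finite tail)
    return min(seq[n:])

def tail_sup(seq, n):   # beta_bar_n = sup {a_n, a_{n+1}, ...} (finite tail)
    return max(seq[n:])

print(tail_inf(a, 9000), tail_sup(a, 9000))  # close to -1 and +1
```

Note how the tail infima increase toward −1 and the tail suprema decrease toward +1, exactly the monotone behavior used to define lim inf and lim sup.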


Theorem 3.31 The sequence (aₙ)ₙ has a limit (possibly equal to either −∞ or +∞) if and only if lim infₙ aₙ = lim supₙ aₙ; in that case, this common value coincides with limₙ aₙ.

Proof It is a direct consequence of the foregoing characterizations. We omit the details for brevity's sake. ∎

The following property will be useful.

Proposition 3.32 Let (aₙ)ₙ be a sequence of positive real numbers. Then

  lim infₙ aₙ₊₁/aₙ ≤ lim infₙ ⁿ√aₙ ≤ lim supₙ ⁿ√aₙ ≤ lim supₙ aₙ₊₁/aₙ.

Proof Let us prove the last inequality. Let ℓ = lim supₙ aₙ₊₁/aₙ. If ℓ = +∞, then there is nothing to be proved. Thus, assume ℓ < +∞, and notice that surely ℓ ≥ 0. Let ε > 0 be fixed. Then there exists an n̄ ∈ ℕ such that

  n ≥ n̄  ⇒  aₙ₊₁/aₙ < ℓ + ε/2.

Hence, for every n > n̄,

  aₙ < (ℓ + ε/2)^{n−n̄} aₙ̄ = (ℓ + ε/2)ⁿ · aₙ̄/(ℓ + ε/2)^{n̄}.

Since for every c > 0 we have limₙ ⁿ√c = 1, there exists an ñ ≥ n̄ + 1 such that

  n ≥ ñ  ⇒  ⁿ√aₙ < (ℓ + ε/2) ⁿ√(aₙ̄/(ℓ + ε/2)^{n̄}) < ℓ + ε.

We have thus proved that lim supₙ ⁿ√aₙ ≤ ℓ.

The first inequality can be proved similarly. ∎
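Proposition 3.32 predicts that the ratio aₙ₊₁/aₙ and the root ⁿ√aₙ approach the same value whenever the ratio has a limit; a quick numerical sketch (not from the book) for aₙ = n²/2ⁿ, where that common value is 1/2:

```python
# Ratio versus n-th root for a_n = n^2 / 2^n: both approach 1/2,
# consistent with the chain of inequalities in Proposition 3.32.
def a(n):
    return n**2 / 2**n

N = 200
ratio = a(N + 1) / a(N)     # close to 1/2
root = a(N) ** (1.0 / N)    # also close to 1/2, a bit more slowly
print(ratio, root)
```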

4 Compactness and Completeness

In this chapter we discover some more subtle properties of the set of real numbers. This investigation will emphasize two important concepts, which will then be analyzed in the general setting of metric spaces: compactness and completeness.

4.1 Some Preliminaries on Sequences

Let U be a subset of a metric space E. Let us recall that a point x₀ is an "adherent point" of U if for every ρ > 0 one has that B(x₀, ρ) ∩ U ≠ Ø. On the other hand, x₀ is a "cluster point" for U if for every ρ > 0 the ball B(x₀, ρ) contains infinitely many elements of U. We can characterize the notion of "adherent point" by making use of sequences.

Proposition 4.1 An element x of E is an adherent point of U if and only if there exists a sequence (aₙ)ₙ in U such that limₙ aₙ = x.

Proof If x is an adherent point of U, then for every n ∈ ℕ the intersection B(x, 1/(n+1)) ∩ U is nonempty, and we can select one of its elements, calling it aₙ. In this way, we have constructed a sequence (aₙ)ₙ in U, and it is now a simple task to verify that limₙ aₙ = x.

Assume now that there exists a sequence (aₙ)ₙ in U such that limₙ aₙ = x. Then, for any ρ > 0, there exists an n̄ ∈ ℕ such that

  n ≥ n̄  ⇒  aₙ ∈ B(x, ρ).

Hence, B(x, ρ) ∩ U is nonempty, proving that x is an adherent point of U. ∎

Let us now consider two metric spaces E and F and a function f : E → F. We want to characterize the continuity of f at a point x₀ ∈ E by the use of sequences.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Fonda, A Modern Introduction to Mathematical Analysis, https://doi.org/10.1007/978-3-031-23713-3_4
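Proposition 4.1 is simple to see in a concrete case; the following sketch (not from the book) takes E = ℝ and U = ]0, 1], whose adherent point 0 is witnessed by a sequence in U:

```python
# Proposition 4.1 on the real line: 0 is an adherent point of U = ]0, 1],
# witnessed by the sequence a_n = 1/(n+1), which lies in U and tends to 0.
a = [1 / (n + 1) for n in range(1, 10001)]
assert all(0 < x <= 1 for x in a)       # the sequence lives in U
for rho in (0.1, 0.01, 0.001):
    assert any(x < rho for x in a)      # every ball B(0, rho) meets U
print(a[-1])                            # close to the adherent point 0
```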


Proposition 4.2 The function f is continuous at x₀ if and only if, for any sequence (aₙ)ₙ in E,

  limₙ aₙ = x₀  ⇒  limₙ f(aₙ) = f(x₀).

Proof Assume that f is continuous at x₀, and let (aₙ)ₙ be a sequence in E such that limₙ aₙ = x₀. By Theorem 3.13 on the limit of the composition of functions,

  limₙ f(aₙ) = f(limₙ aₙ) = f(x₀),

so that one of the two implications is proved.

Let us now assume that f is not continuous at x₀. Then there is an ε > 0 such that, for every δ > 0, there exists an x ∈ E such that d(x, x₀) < δ and d(f(x), f(x₀)) ≥ ε. Taking δ = 1/(n+1), for every n ∈ ℕ there exists an aₙ in E such that d(aₙ, x₀) < 1/(n+1) and d(f(aₙ), f(x₀)) ≥ ε. Then limₙ aₙ = x₀, but surely it cannot be that limₙ f(aₙ) = f(x₀). The proof is thus completed. ∎

As an immediate corollary, we have the following characterization of the limit, assuming x₀ to be a cluster point.

Proposition 4.3 We have that lim_{x→x₀} f(x) = l if and only if, for any sequence (aₙ)ₙ in E \ {x₀},

  limₙ aₙ = x₀  ⇒  limₙ f(aₙ) = l.

Given any sequence (aₙ)ₙ, we define a "subsequence" by selecting a strictly increasing sequence of indices (nₖ)ₖ and considering the composition

  k ↦ nₖ ↦ a_{nₖ}.

We will denote by (a_{nₖ})ₖ such a subsequence. Notice that, since the indices nₖ are in ℕ and n_{k+1} > nₖ, it must be that n_{k+1} ≥ nₖ + 1. As a consequence, one proves by induction that nₖ ≥ k for every k, whence

  limₖ nₖ = +∞.

Proposition 4.4 If a sequence has a limit, then all its subsequences must have the same limit.


Proof Indeed, by the Change of Variables Formula (3.1),

  lim_{k→+∞} a_{nₖ} = lim_{n→+∞} aₙ,

thereby proving the result. ∎

Theorem 4.5 Any sequence of real numbers has a monotone subsequence.

Proof Let (aₙ)ₙ be a sequence in ℝ. We say that n̄ is a "lookout point" for the sequence if aₙ̄ ≥ aₙ for every n ≥ n̄. We now distinguish three cases.

Case 1. There are infinitely many lookout points; let us order them in a strictly increasing sequence of indices (nₖ)ₖ. Then the subsequence (a_{nₖ})ₖ is decreasing.

Case 2. There are only finitely many lookout points. Let N be the largest one, and choose n₀ > N. Then, since n₀ is not a lookout point, there exists n₁ > n₀ such that a_{n₁} > a_{n₀}. By induction, we construct a strictly increasing sequence of indices (nₖ)ₖ in this way: once nₖ has been defined, since it is not a lookout point, there exists an n_{k+1} > nₖ such that a_{n_{k+1}} > a_{nₖ}. The subsequence (a_{nₖ})ₖ thus constructed is strictly increasing.

Case 3. There are no lookout points. In this case, choose n₀ arbitrarily, and proceed as in Case 2. ∎
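The lookout-point construction in the proof above is effective for finite sequences, where Case 1 always applies (a finite tail always has a maximum); a sketch, not the book's code:

```python
import random

# Theorem 4.5, made executable: scanning from the right, keep every index n
# whose value a_n dominates the whole tail (a "lookout point"). The values of
# the sequence along these indices form a decreasing subsequence.
def lookout_indices(a):
    idx, best = [], float("-inf")
    for n in range(len(a) - 1, -1, -1):
        if a[n] >= best:        # a_n >= a_m for all m >= n
            idx.append(n)
            best = a[n]
    idx.reverse()
    return idx

random.seed(0)
a = [random.random() for _ in range(1000)]
idx = lookout_indices(a)
assert all(a[i] >= a[j] for i, j in zip(idx, idx[1:]))  # decreasing along idx
print(len(idx))
```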

4.2 Compact Sets

Here is a fundamental property of closed and bounded intervals.

Theorem 4.6 (Bolzano–Weierstrass Theorem) Every sequence (aₙ)ₙ in [a, b] has a subsequence (a_{nₖ})ₖ having a limit in [a, b].

Proof By Theorem 4.5, there is a monotone subsequence (a_{nₖ})ₖ, which by Corollary 3.27 has a limit limₖ a_{nₖ} = l. Since a ≤ a_{nₖ} ≤ b for every k, by Corollary 3.8 it must be that a ≤ l ≤ b, thereby proving the result. ∎

In a metric space E, we will say that a subset U is "compact" if every sequence (aₙ)ₙ in U has a subsequence (a_{nₖ})ₖ having a limit in U. The Bolzano–Weierstrass Theorem 4.6 thus states that if E = ℝ, then the intervals of the type U = [a, b] are compact sets. In what follows, a subset of a metric space will be said to be "bounded" whenever it is contained in a ball.

Theorem 4.7 Every compact subset of E is closed and bounded.


Proof Assume that U ⊆ E is compact. Let x be an adherent point of U; by Proposition 4.1, there is a sequence (aₙ)ₙ in U such that limₙ aₙ = x. Since U is compact, there exists a subsequence (a_{nₖ})ₖ having a limit in U. But, since it is a subsequence, limₖ a_{nₖ} = x, and hence x ∈ U. We have thus shown that every adherent point of U belongs to U; hence, U is closed.

Now fix some x₀ ∈ U arbitrarily. We will show that if n ∈ ℕ is sufficiently large, then U ⊆ B(x₀, n). By contradiction, if this is false, then we can build a sequence (aₙ)ₙ in U such that d(aₙ, x₀) ≥ n for every n ∈ ℕ. Since U is compact, there exists a subsequence (a_{nₖ})ₖ having a limit x̄ ∈ U. Using the triangle inequality,

  |d(a_{nₖ}, x₀) − d(x̄, x₀)| ≤ d(a_{nₖ}, x̄),

whence limₖ d(a_{nₖ}, x₀) = d(x̄, x₀), whereas it should be

  limₖ d(a_{nₖ}, x₀) = +∞,

a contradiction. Therefore, U must be bounded. ∎

Let us now focus our attention on the compact subsets of ℝᴺ, with N ≥ 1.

Theorem 4.8 A subset of ℝᴺ is compact if and only if it is closed and bounded.

Proof We already know that every compact set is closed and bounded. Assume now that U is a closed and bounded subset of ℝᴺ. For simplicity, we will assume that N = 2. Then U is contained in a rectangle I = [a, b] × [c, d].

Let (aₙ)ₙ be a sequence in U. Then aₙ = (aₙ¹, aₙ²), with aₙ¹ ∈ [a, b] and aₙ² ∈ [c, d]. By the Bolzano–Weierstrass Theorem 4.6, the sequence (aₙ¹)ₙ has a subsequence (a¹_{nₖ})ₖ having a limit l₁ ∈ [a, b]. Let us now consider the sequence (a²_{nₖ})ₖ, with the same indices nₖ as the one we just found; it is a subsequence of (aₙ²)ₙ. By the Bolzano–Weierstrass Theorem 4.6, the sequence (a²_{nₖ})ₖ has a subsequence (a²_{n_{k_j}})ⱼ having a limit l₂ ∈ [c, d]. By Theorem 3.6,

  limⱼ a_{n_{k_j}} = (limⱼ a¹_{n_{k_j}}, limⱼ a²_{n_{k_j}}) = (l₁, l₂).

By Proposition 4.1, l = (l₁, l₂) is an adherent point of U. Since U is closed, l is necessarily an element of U. ∎

The following property of compact sets will be useful.

Theorem 4.9 Let U ⊆ ℝᴺ be a compact set. If (Aᵢ)_{i∈I} is a family (not necessarily a countable family) of open sets such that

  U ⊆ ⋃_{i∈I} Aᵢ,


then there exists a finite subfamily (A₁, …, Aₙ) of (Aᵢ)_{i∈I} such that

  U ⊆ A₁ ∪ ··· ∪ Aₙ.

Proof For simplicity, we assume N = 2. Let us first prove the statement in the case where U is a closed rectangle, and let us denote it by R₀ = [a₀, b₀] × [c₀, d₀]. By contradiction, assume that there is an open covering (Aᵢ)_{i∈I} of R₀ without finite subcoverings. We split the rectangle R₀ into four equal smaller closed rectangles, connecting the midpoints of its sides. Among these four rectangles, there is at least one for which there is no finite subfamily of (Aᵢ)_{i∈I} covering it. Let us call it R₁. We now proceed recursively and construct in this way a sequence of closed rectangles Rₖ = [aₖ, bₖ] × [cₖ, dₖ] such that

  R₀ ⊇ R₁ ⊇ R₂ ⊇ ··· ⊇ Rₖ ⊇ R_{k+1} ⊇ ···,

for each of which there is no finite subfamily of (Aᵢ)_{i∈I} covering it. By the Cantor Theorem 1.9, there exist x̄ belonging to all intervals [aₖ, bₖ] and ȳ belonging to all intervals [cₖ, dₖ], so that (x̄, ȳ) ∈ Rₖ for every k ∈ ℕ. Since (x̄, ȳ) belongs to U, there is at least one Aᵢ containing it. This set Aᵢ is open, and the dimensions of Rₖ tend to zero as k tends to +∞. Then, for k sufficiently large, the rectangle Rₖ will be entirely contained in Aᵢ. But this is a contradiction, since there is no finite subfamily of (Aᵢ)_{i∈I} covering Rₖ.

Now let U be any closed and bounded subset of ℝ². Then U is contained in a rectangle [a, b] × [c, d]. If (Aᵢ)_{i∈I} is an open covering of U, then

  [a, b] × [c, d] ⊆ (⋃_{i∈I} Aᵢ) ∪ (ℝ² \ U).

Since ℝ² \ U is open, we now have an open covering of [a, b] × [c, d], and by the first part of the proof, there is a finite subfamily (A₁, …, Aₙ) of (Aᵢ)_{i∈I} such that

  [a, b] × [c, d] ⊆ (A₁ ∪ ··· ∪ Aₙ) ∪ (ℝ² \ U).

Consequently, U ⊆ A₁ ∪ ··· ∪ Aₙ, and the proof is thus completed. ∎

Note The preceding theorem indeed holds in any metric space, and it can be shown that the stated property is necessary and sufficient for the compactness of a set U.
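For the compact interval [0, 1] a finite subcover can even be extracted constructively by a greedy sweep; a sketch (not the book's bisection proof — the covering by intervals of radius r(c) below is an invented example):

```python
# Extracting a finite subcover of [0, 1] from an open covering by intervals
# ]c - r(c), c + r(c)[ centered at grid points c. Greedy rule: among the
# intervals containing the current point t, pick the one reaching furthest
# right, then jump to its right endpoint.
def r(c):
    return 0.05 + c / 10

centers = [k / 100 for k in range(101)]
chosen, t = [], 0.0
while t < 1.0:
    candidates = [c for c in centers if c - r(c) < t < c + r(c)]
    best = max(candidates, key=lambda c: c + r(c))
    chosen.append(best)
    t = best + r(best)
print(len(chosen))  # a small finite subfamily already covers [0, 1]
```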

4.3 Compactness and Continuity

In what follows, we will say that a function f : A → ℝ is "bounded from above" (or "bounded from below," or "bounded") if its image f(A) is. We will say that "f has a maximum" (or "f has a minimum") if f(A) does. In the case where


f has a maximum or a minimum, we will call a "maximum point" any x̄ for which f(x̄) = max f(A) and a "minimum point" any x̄ for which f(x̄) = min f(A).

Theorem 4.10 (Weierstrass Theorem) If U is a compact set and f : U → ℝ is a continuous function, then f has a maximum and a minimum.

Proof Let s = sup f(U). We will prove that there is a maximum point, i.e., an x̄ ∈ U such that f(x̄) = s. We first note that there is a sequence (yₙ)ₙ in f(U) such that limₙ yₙ = s. Indeed, if s ∈ ℝ, then for every n ≥ 1 we can find a yₙ ∈ f(U) such that s − 1/n < yₙ ≤ s; and if s = +∞, then for every n there is a yₙ ∈ f(U) such that yₙ > n. In both cases, we have limₙ yₙ = s.

Correspondingly, we can find a sequence (xₙ)ₙ in U such that f(xₙ) = yₙ. Since U is compact, there exists a subsequence (x_{nₖ})ₖ having a limit x̄ ∈ U. Because limₙ yₙ = s and y_{nₖ} = f(x_{nₖ}), the subsequence (y_{nₖ})ₖ also has the same limit s. Then, by the continuity of f,

  f(x̄) = f(limₖ x_{nₖ}) = limₖ f(x_{nₖ}) = limₖ y_{nₖ} = s.

The theorem is thus proved in what concerns the existence of the maximum. To deal with the minimum, either one proceeds analogously or one considers the continuous function g = −f and uses the fact that g has a maximum. ∎

The following theorem holds for a general metric space F.

Theorem 4.11 If U is a compact set and f : U → F is a continuous function, then f(U) is a compact set.

Proof Let (yₙ)ₙ be a sequence in f(U). We can then find a sequence (xₙ)ₙ in U such that f(xₙ) = yₙ for every n ∈ ℕ. Since U is compact, there exists a subsequence (x_{nₖ})ₖ that has a limit x̄ ∈ U. Recalling that y_{nₖ} = f(x_{nₖ}) and that f is continuous,

  limₖ f(x_{nₖ}) = f(limₖ x_{nₖ}) = f(x̄).

Therefore, the subsequence (y_{nₖ})ₖ has a limit, precisely f(x̄), in f(U). ∎
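The Weierstrass Theorem guarantees that the supremum is attained on a compact domain; a numerical sketch (not from the book), sampling the continuous function f(x) = x(1 − x) on [0, 1]:

```python
# Weierstrass's theorem in practice: f(x) = x(1 - x) is continuous on the
# compact interval [0, 1], so it attains its supremum; dense sampling locates
# both the maximum value and a maximum point.
f = lambda x: x * (1 - x)
xs = [k / 100000 for k in range(100001)]
ys = [f(x) for x in xs]
m = max(ys)
argmax = xs[ys.index(m)]
print(m, argmax)
```

By contrast, g(x) = x on the noncompact set ]0, 1[ has supremum 1 but no maximum point, which is why compactness of the domain is essential.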

We now introduce the concept of "uniform continuity." First, recall the meaning of "f : E → F is continuous." This means that f is continuous at every point x₀ ∈ E, i.e.,

  ∀x₀ ∈ E ∀ε > 0 ∃δ > 0 : ∀x ∈ E  d(x, x₀) < δ ⇒ d(f(x), f(x₀)) < ε.


Notice that, in general, the choice of δ depends on both ε and x₀. We will say that f is "uniformly continuous" whenever such a δ does not depend on x₀, i.e.,

  ∀ε > 0 ∃δ > 0 : ∀x₀ ∈ E ∀x ∈ E  d(x, x₀) < δ ⇒ d(f(x), f(x₀)) < ε.

The following theorem states that continuity implies uniform continuity when the domain is a compact set.

Theorem 4.12 (Heine Theorem) If U is a compact set and f : U → F is a continuous function, then f is uniformly continuous.

Proof By contradiction, assume that f is not uniformly continuous, i.e.,

  ∃ε > 0 : ∀δ > 0 ∃x₀ ∈ U ∃x ∈ U : d(x, x₀) < δ and d(f(x), f(x₀)) ≥ ε.

Let us fix such an ε > 0, and choose δ = 1/(n+1): for every n ∈ ℕ there are xₙ⁰ and xₙ in U such that

  d(xₙ, xₙ⁰) < 1/(n+1) and d(f(xₙ), f(xₙ⁰)) ≥ ε.

Since U is compact, the sequence (xₙ)ₙ has a subsequence (x_{nₖ})ₖ with a limit x̄ ∈ U; since d(x_{nₖ}, x⁰_{nₖ}) < 1/(nₖ + 1), the sequence (x⁰_{nₖ})ₖ has the same limit x̄. By the continuity of f, both (f(x_{nₖ}))ₖ and (f(x⁰_{nₖ}))ₖ then converge to f(x̄), which is impossible, since d(f(x_{nₖ}), f(x⁰_{nₖ})) ≥ ε >
0, for every .k ∈ N.

4.4 Complete Metric Spaces

We will now introduce the concept of "completeness" for a metric space E. To this end, we first need to introduce a special class of sequences. We will say that (aₙ)ₙ is a "Cauchy sequence" in E if

  ∀ε > 0 ∃n̄ : [ m ≥ n̄ and n ≥ n̄ ] ⇒ d(aₘ, aₙ) < ε.


The metric space E will be said to be "complete" if every Cauchy sequence has a limit in E.

It is readily seen that if (aₙ)ₙ has a limit l ∈ E, then it is a Cauchy sequence. Indeed, for any fixed ε > 0, taking m and n large enough, we have

  d(aₘ, aₙ) ≤ d(aₘ, l) + d(l, aₙ) < 2ε.

In contrast, a Cauchy sequence in E might not have a limit in the space E. As an example, take ℚ with the usual distance and the sequence aₙ = (1 + 1/n)ⁿ, whose limit is e ∉ ℚ. Indeed, ℚ is not complete, whereas ℝ is, as we will now prove.

Theorem 4.13 ℝ is complete.

Proof Let (aₙ)ₙ be a Cauchy sequence in ℝ. By definition (taking ε = 1), there exists an n̄₁ such that, for every m ≥ n̄₁ and n ≥ n̄₁, we have d(aₙ, aₘ) < 1. Taking m = n̄₁ and setting a = a_{n̄₁} − 1, b = a_{n̄₁} + 1, we thus see that the sequence (aₙ)_{n≥n̄₁} is contained in the interval [a, b]. By the Bolzano–Weierstrass Theorem 4.6, there exists a subsequence (a_{nₖ})ₖ having a limit l ∈ [a, b]. We now want to prove that

  limₙ aₙ = l.

Let ε > 0 be fixed. Since (aₙ)ₙ is a Cauchy sequence,

  ∃n̄ : m ≥ n̄ and n ≥ n̄ ⇒ d(aₘ, aₙ) < ε.

Moreover, since limₖ a_{nₖ} = l and limₖ nₖ = +∞,

  ∃k̄ : k ≥ k̄ ⇒ d(a_{nₖ}, l) < ε and nₖ ≥ n̄.

Then for every n ≥ n̄,

  d(aₙ, l) ≤ d(aₙ, a_{n_{k̄}}) + d(a_{n_{k̄}}, l) < ε + ε = 2ε,

thereby completing the proof. ∎

We now extend the previous theorem to higher dimensions.

Theorem 4.14 ℝᴺ is complete.

Proof For simplicity, we assume N = 2. Let (aₙ)ₙ be a Cauchy sequence in ℝ². We write each vector aₙ ∈ ℝ² in its coordinates

  aₙ = (a_{n,1}, a_{n,2}).

Since

  |a_{m,1} − a_{n,1}| ≤ ‖aₙ − aₘ‖,   |a_{m,2} − a_{n,2}| ≤ ‖aₙ − aₘ‖,

we see that both (a_{n,1})ₙ and (a_{n,2})ₙ are Cauchy sequences in ℝ. Hence, since ℝ is complete, each of them has a limit,

  limₙ a_{n,1} = l₁ ∈ ℝ,   limₙ a_{n,2} = l₂ ∈ ℝ.

Then

  limₙ aₙ = (limₙ a_{n,1}, limₙ a_{n,2}) = (l₁, l₂),

which is an element of ℝ². ∎
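The earlier example of a Cauchy sequence in ℚ without a rational limit can be checked with exact rational arithmetic; a sketch (not from the book):

```python
from fractions import Fraction

# a_n = (1 + 1/n)^n is a sequence of rational numbers whose terms cluster
# together (Cauchy), yet whose limit e is irrational: Q is not complete.
def a(n):
    return Fraction(n + 1, n) ** n   # exact rational value

d = abs(a(2000) - a(1000))           # terms are already close together
print(float(a(2000)), float(d))
```

Every term is an exact element of ℚ, and the gaps d shrink, but the value the sequence approaches, e ≈ 2.71828, lies outside ℚ.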



A normed vector space that is complete with respect to the distance given by its norm is said to be a "Banach space." We have thus proved that ℝᴺ is a Banach space.

The following theorem provides us with the Cauchy criterion for functions.

Theorem 4.15 Let F be a complete metric space. Then a function f : E → F has a limit lim_{x→x₀} f(x) in F if and only if the following property holds:

  ∀ε > 0 ∃δ > 0 : [ 0 < d(x, x₀) < δ and 0 < d(x′, x₀) < δ ] ⇒ d(f(x), f(x′)) < ε.

Proof Assume that lim_{x→x₀} f(x) = l ∈ F. The conclusion then follows from the definition of limit and the triangle inequality

  d(f(x), f(x′)) ≤ d(f(x), l) + d(f(x′), l).

On the other hand, if the property stated in the theorem holds, take a sequence (aₙ)ₙ such that limₙ aₙ = x₀. Then we see that (f(aₙ))ₙ is a Cauchy sequence, so it has a limit l ∈ F. To see that this limit does not depend on the sequence, let (aₙ′)ₙ be another sequence such that limₙ aₙ′ = x₀. By the foregoing property, for every ε > 0 it will be that d(f(aₙ), f(aₙ′)) < ε for n sufficiently large, from which it follows that (f(aₙ′))ₙ has the same limit as (f(aₙ))ₙ. Having thus proved that limₙ f(aₙ) = l for any sequence (aₙ)ₙ such that limₙ aₙ = x₀, the conclusion follows from Proposition 4.3. ∎


4.5 Completeness and Continuity

The following theorem provides a useful extension property for uniformly continuous functions. We say that a set is "dense" in E if its closure coincides with E.

Theorem 4.16 Let Ẽ be a dense subset of E, and let F be a complete metric space. If f̂ : Ẽ → F is uniformly continuous, then there exists a unique continuous function f : E → F whose restriction to Ẽ coincides with f̂.

Proof Taking x ∈ E, there exists a sequence (xₙ)ₙ in Ẽ such that limₙ xₙ = x. Since f̂ is uniformly continuous and (xₙ)ₙ is a Cauchy sequence, it follows that (f̂(xₙ))ₙ is also a Cauchy sequence. Hence, since F is complete, it has a limit y ∈ F. We define f(x) = limₙ f̂(xₙ) = y.

Let us verify that this is a good definition. If (x̃ₙ)ₙ is another sequence in Ẽ such that limₙ x̃ₙ = x, then limₙ d(xₙ, x̃ₙ) = 0, and since f̂ is uniformly continuous, then also limₙ d(f̂(xₙ), f̂(x̃ₙ)) = 0. Hence, (f̂(x̃ₙ))ₙ necessarily has the same limit y as (f̂(xₙ))ₙ, and the definition is consistent.

Clearly, the function f thus defined extends f̂ since, if x ∈ Ẽ, we can take the sequence (xₙ)ₙ as being constantly equal to x. Let us now prove that f is (uniformly) continuous. Once ε > 0 has been fixed, let δ > 0 be such that, taking u, v ∈ Ẽ,

  d(u, v) ≤ 2δ ⇒ d(f̂(u), f̂(v)) ≤ ε/3.

If x, y are two points in E such that d(x, y) ≤ δ, then we can take two sequences (xₙ)ₙ and (yₙ)ₙ in Ẽ such that limₙ xₙ = x and limₙ yₙ = y. Then, since limₙ f̂(xₙ) = f(x) and limₙ f̂(yₙ) = f(y), for all sufficiently large n it will be that d(xₙ, yₙ) ≤ 2δ and

  d(f(x), f(y)) ≤ d(f(x), f̂(xₙ)) + d(f̂(xₙ), f̂(yₙ)) + d(f̂(yₙ), f(y)) ≤ ε/3 + ε/3 + ε/3 = ε,

which proves that f is uniformly continuous.

To conclude the proof, let f̃ : E → F be any continuous function extending f̂. Then, for every x ∈ E, taking a sequence (xₙ)ₙ in Ẽ such that limₙ xₙ = x,

  f̃(x) = limₙ f̃(xₙ) = limₙ f̂(xₙ) = f(x).

We have thus proved that f is the only possible continuous extension of f̂ to E. ∎

4.6 Spaces of Continuous Functions

Let E and F be two metric spaces. We consider a sequence of functions fₙ : E → F, and we want to examine, whenever it exists, the limit

  limₙ fₙ(x).

Clearly enough, this limit could exist for some x ∈ E and not exist at all for others. So assume that, for some subset U ⊆ E, there is a function f : U → F for which

  limₙ fₙ(x) = f(x), for every x ∈ U.

In this case we will say that the sequence (fₙ)ₙ "converges pointwise" to f on U; it thus happens that

  ∀x ∈ U ∀ε > 0 ∃n̄ ∈ ℕ : n ≥ n̄ ⇒ d(fₙ(x), f(x)) < ε.

If the preceding choice of n̄ does not depend on x ∈ U, we will say that the sequence (fₙ)ₙ "converges uniformly" to f on U; in this case,

  ∀ε > 0 ∃n̄ ∈ ℕ : ∀x ∈ U  n ≥ n̄ ⇒ d(fₙ(x), f(x)) < ε,

i.e., equivalently,

  limₙ [ sup{ d(fₙ(x), f(x)) : x ∈ U } ] = 0.

Let us provide an example of a sequence (fₙ)ₙ that converges pointwise but not uniformly. Let fₙ : [0, 1] → ℝ be defined for n ≥ 1 as

  fₙ(x) = nx if 0 ≤ x ≤ 1/n ;  2 − nx if 1/n ≤ x ≤ 2/n ;  0 if 2/n ≤ x ≤ 1.

It is easily seen that limₙ fₙ(x) = 0 for every x ∈ [0, 1], but the convergence is not uniform, since fₙ(1/n) = 1 for every n ≥ 1.

Uniform convergence has good behavior with respect to continuity, as the following theorem states.

Theorem 4.17 If each function fₙ : E → F is continuous on U ⊆ E and (fₙ)ₙ converges uniformly to f on U, then f is also continuous on U.
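The tent functions above make the failure of uniform convergence concrete; a numerical sketch (not from the book):

```python
# The tent functions f_n: pointwise, f_n(x) -> 0 for each fixed x in [0, 1],
# yet sup |f_n - 0| = f_n(1/n) = 1 for every n, so convergence is not uniform.
def f(n, x):
    if x <= 1 / n:
        return n * x
    if x <= 2 / n:
        return 2 - n * x
    return 0.0

for n in [10, 100, 1000]:
    sup = max(f(n, k / 10000) for k in range(10001))
    print(n, f(n, 0.15), sup)  # the value at the fixed point 0.15 dies out,
                               # while the supremum over [0, 1] stays at 1
```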


Proof Consider an arbitrary point x₀ of U; we will show that f is continuous at x₀. Let ε > 0 be fixed. Then there exists an n̄ ∈ ℕ such that, for every x ∈ U,

  n ≥ n̄ ⇒ d(fₙ(x), f(x)) < ε/3.
Assume that ℑ(ζ) > 0. In this case,

  bₙ/aₙ = 2/√(4 − |σ̃ₙ − 1|²) > 1,

hence aₙ < bₙ, for every n ∈ ℕ. Let us see that the sequence (aₙ)ₙ is strictly increasing; by (5.2),

  aₙ₊₁/aₙ = 2 |σ̃ₙ₊₁ − 1|/|σ̃ₙ − 1| = 2/√(2 + √(4 − |σ̃ₙ − 1|²)) > 2/√(2 + 2) = 1,

hence aₙ < aₙ₊₁, for every n ∈ ℕ. On the other hand, let us prove that the sequence (bₙ)ₙ is strictly decreasing; by (5.2) again,

  bₙ/bₙ₊₁ = (1/2) · (|σ̃ₙ − 1|/|σ̃ₙ₊₁ − 1|) · √(4 − |σ̃ₙ₊₁ − 1|²)/√(4 − |σ̃ₙ − 1|²)
      = (1/2) · (2 + √(4 − |σ̃ₙ − 1|²))/√(4 − |σ̃ₙ − 1|²)
      = (1/2) · (2/√(4 − |σ̃ₙ − 1|²) + 1)
      > (1/2)(1 + 1) = 1,

hence bₙ > bₙ₊₁ for every n ∈ ℕ.


Thus, the sequences (aₙ)ₙ and (bₙ)ₙ are monotone and bounded (recall that aₙ < bₙ ≤ b₀ and bₙ > aₙ ≥ a₀), so they both have a finite limit. Since, then,

  limₙ |σ̃ₙ − 1| = limₙ aₙ/2ⁿ = 0,

we have

  limₙ bₙ / limₙ aₙ = limₙ (bₙ/aₙ) = limₙ 2/√(4 − |σ̃ₙ − 1|²) = 1,

so we can conclude that the two sequences do indeed have the same limit. We call this real number the "argument" of ζ and denote it by Arg(ζ). We can thus write

  Arg(ζ) = limₙ 2ⁿ |σ̃ₙ − 1|.

n

In this way, we have rigorously defined the “length” of the arc on the unitary circle S 1 starting from .(1, 0) and arriving at .ζ˜ = ζ /|ζ |, moving in counterclockwise direction. It may surprise the reader that such an intuitive notion has required so much work! However, the precise definition of the length of a curve will only be given later on in this book and requires some deeper analytical tools (Chap. 11). We are now ready to introduce an important number in mathematics, the number .π, pronounced “pie,” defined as .

π = 2Arg(i) = 3.14159 . . .

.

The importance of this number .π will emerge later on. It measures twice the length of the arc on the unitary circle .S 1 starting from .(1, 0) and arriving at .(0, 1), moving in a counterclockwise direction, so half the length of .S 1 itself. It can be proved that it is an irrational number. In the case where .(ζ ) = 0, i.e., when .ζ is a positive real number, we set Arg(ζ ) = 0 .

.

In what follows, we will require the inequality |σ˜ n − 1| ≤

.

1 Arg(ζ ) , for every n ∈ N , 2n

which is a direct consequence of the fact that .(an )n is increasing.

(5.4)

5.1 The Construction

5.1.2

113

Definition on a Dense Set

We first define the function f on the set E=

.

m 2n

: m ∈ Z, n ∈ N ,

which is a dense subset of .R. We will see that if we want a function .f : E → C\{0} to satisfy the conditions .(a), .(b), and .(c) of the statement, then its definition is uniquely determined. Thus, assume that .(a), .(b), and .(c) hold for some function .f : E → C \ {0}. Since .f (1) = ζ = σ0 , by .(b),  σ0 = f (1) = f

.

and since .f

  1 2

≥ 0, .f

1 1 + 2 2

  1 2

 =f

      2 1 1 1 f = f , 2 2 2

≥ 0, it must be that .f

  1 2

= σ1 . Similarly, since

          2 1 1 1 1 1 1 + .σ1 = f , =f =f f = f 2 4 4 4 4 4   we see that .f

1 4

= σ2 . Iterating this process, we see that we must set 

1 .f 2n

 = σn , for every n ∈ N .

Moreover,  f

.

m 2n



    m 1 1 . =f m n = f n 2 2

This shows that if .(a), .(b), and .(c) hold, then the definition of f on the set E must be   m .f (5.5) = σnm . 2n This is a good definition since .

m m = n n 2 2





σnm = σnm .

114

5 Exponential and Circular Functions n −n

Indeed, if, for instance, .n ≥ n, then we see by (5.1) that .σn = σn2 n −n

σnm = (σn2

.

)m = σnm2 

n −n

, hence



= σnm .

Let us now prove that the function .f : E → C \ {0} defined by (5.5) satisfies the properties .(a), .(b), and .(c). Notice that .f (0) = 1 and .f (1) = ζ , so that property .(a) holds. Let us prove that f (x1 + x2 ) = f (x1 )f (x2 ) , for every x1 , x2 ∈ E ,

.

(5.6)

k m which is property .(b) on the domain E. Taking .x1 = n and .x2 = n (we can now 2 2 choose the same denominator), we have  f

.

k m + n 2n 2



 =f

k+m 2n



 = σnk+m = σnk σnm = f

   k m f n . 2n 2

Finally, with the aim of verifying property .(c), we claim that n

1, σ˜ n , σ˜ n2 , . . . σ˜ n2 belong to Q1 , for every n ∈ N .

.

This is surely true if .n = 0 or 1. If .n = 2, then we have that .1, σ˜ 2 and .σ˜ 22 = σ˜ 1 surely belong to .Q1 , as well as .σ˜ 24 = ζ˜ . Concerning .σ˜ 23 , we notice that |σ˜ 23 | = 1

.

and

|σ˜ 23 − σ˜ 22 | = |σ˜ 23 − σ˜ 24 | = 2 .

In principle, two points satisfy these √ properties, one in the first quadrant and one in the third. However, since .2 < 2, it must be that .σ˜ 23 belongs to .Q1 . By induction, the same argument can be used to prove the claim for every .n ∈ N. Thus, we have constructed a function .f : E → C\{0} that verifies the properties .(a), .(b), and .(c) on its domain. And this is the only possible function with these properties.

5.1.3

Extension to the Whole Real Line

To extend the function .f : E → C \ {0} defined by (5.5) to the whole real line .R, we will apply Theorem 4.16. To this end, we first need to verify that f is uniformly continuous on any bounded subset of its domain E. We fix a real number .R > 0 and consider the restriction of f to .E ∩ [−R, R]. We define the two functions .g : E → ]0, +∞[ and .h : E → S 1 by g(x) = |f (x)| ,

.

h(x) =

f (x) , |f (x)|

5.1 The Construction

115

and we remark that .

g(x1 + x2 ) = g(x1 )g(x2 ) for every x1 , x2 ∈ E

(5.7)

h(x1 + x2 ) = h(x1 )h(x2 ) for every x1 , x2 ∈ E .

(5.8)

and .

Let us first concentrate on the function g and prove that it is uniformly continuous on .E ∩ [−R, R]. We need some preliminary considerations. It is easily seen that if .|ζ | = 1, then g is constant. Assume now that .|ζ | > 1. In this case, it can be seen that .|σn | > 1 for every .n ∈ N, so also .|σnm | > 1 for every .n ∈ N and .m ≥ 1, i.e., g(x) > 1 , for every x ∈ E ∩ ]0, +∞[ .

.

Consequently, x1 < x2

.



g(x2 ) = g(x1 )g(x2 − x1 ) > g(x1 ) ,

proving that g is strictly increasing. Let us now show that .

lim g(x) = 1 .

x→0+

Fix an . > 0. Let .n¯ ∈ N be such that .n¯ ≥ (|ζ | − 1)/. Then, for every .n ≥ n, ¯ using the Bernoulli inequality,   2n 1 n . g = |ζ | ≤ 1 + n ≤ 1 + 2n  ≤ (1 + )2 , n 2 so that  1 0. By the above considerations, there exists .δ > 0 such that 0 0, then the function h is constantly equal to 1 and f coincides with .g : R → ]0, +∞[ . This is the exponential with base a, i.e., the function .fa : R → R+ whose existence was stated in Theorem 2.15. Indeed, property .(i) follows from (5.7), and .g(1) = a. We need to prove that if .a = 1, then .g : R → ]0, +∞[ is invertible. As seen previously, when .a > 1, the function g is strictly increasing on E, and so also on .R, whereas if .a < 1, then it is strictly decreasing. Hence, if .a = 1, then the function g is injective; let us show that it is also surjective. We have shown, after the statement of Theorem 2.15, that .g(n) = a n for every n .n ∈ N. Assume, for instance, .a > 1; then .lim a = +∞, hence, by monotonicity, n

also .

lim g(x) = +∞ .

x→+∞

118

5 Exponential and Circular Functions

On the other hand, since .g(x)g(−x) = g(x − x) = g(0) = 1, .

1 = 0. x→+∞ g(x)

lim g(x) = lim g(−x) = lim

x→−∞

x→+∞

We can then conclude, by Bolzano’s Theorem 2.12, that the image of g is the whole interval .]0, +∞[ . We have thus proved that if .a > 1, then .g : R → ]0, +∞[ is invertible. The same conclusion holds when .0 < a < 1, and the proof is analogous. The proof of Theorem 2.15 is thus complete.  Proof of Theorem 2.16 Now let .ζ = i and .τ > 0 be arbitrary. Then the function g is constantly equal to 1, so f coincides with .h : R → S 1 . Notice that, since .h(τ ) = i, .

h(2τ ) = h(τ + τ ) = h(τ )2 = i 2 = −1 , h(3τ ) = h(2τ + τ ) = h(2τ )h(τ ) = −i , h(4τ ) = h(3τ + τ ) = h(3τ )h(τ ) = 1 ,

and then h(x + 4τ ) = h(x)h(4τ ) = h(x) , for every x ∈ R ,

.

showing that h is a periodic function, with period .T = 4τ . We would like to prove that T is indeed the minimal period of h. Since h is continuous and nonconstant, its minimal period is .T /k for some integer .k ≥ 1. Assume by contradiction that .k ≥ 2. Then .

2τ T = ≤τ 2k k

and 1=h

.

    2 T T , = h k 2k

and since .h(T /2k) ∈ Q1 , it must be that .h(T /2k) = 1. Then we will have 

T .1 = h 2k



  2 T = h , 4k

5.3 Limits for Trigonometric Functions

119

and since .h(T /4k) ∈ Q1 , it must be that .h(T /4k) = 1, too. Proceeding in this way, we see that it must be that   T = 1 , for every j ∈ N . .h 2j k Hence, also  h

.

mT 2j k

 = 1 , for every j ∈ N and m ∈ Z .

Since the set .{mT /2j k : j ∈ N, m ∈ Z} is dense in .R and h is continuous, this would imply that h is constantly equal to 1, a contradiction, since .h(τ ) = i. We have thus proved that T = 4τ is the minimal period of h ,

.

and henceforth we will write .hT instead of h. Since .(i) follows from (5.8) and h(τ ) = i, the proof of Theorem 2.16 is thus now complete. 

.

5.3 Limits for Trigonometric Functions

In the following theorem, the number π enters the picture.

Theorem 5.2 We have

lim_{x→0⁺} (h_T(x) − 1)/x = (2π/T) i.

Proof Since T = 4τ and

h_{4τ}(x) = h₄(x/τ),

it will be equivalent to proving that

lim_{x→0⁺} (h₄(x) − 1)/x = (π/2) i.

First of all, we show that this is true when x = 1/2^n, i.e., that

lim_n 2^n(σ_n − 1) = (π/2) i.   (5.9)

(Recall that σ_n and σ̃_n coincide in this case.) We already know that

lim_n |2^n(σ_n − 1)| = Arg(i) = π/2 = |(π/2) i| ;

hence, since ℑ(2^n(σ_n − 1)) > 0, it will be sufficient to show that

lim_n ℜ(2^n(σ_n − 1)) = 0.   (5.10)

Since σ_n* = σ_n^{−1}, we see that

ℜ(σ_n − 1) = [(σ_n − 1) + (σ_n − 1)*]/2 = (σ_n + σ_n^{−1} − 2)/2 = (σ_n² + 1 − 2σ_n)/(2σ_n) = (σ_n − 1)²/(2σ_n).

Recalling (5.4) with ζ = i, since σ̃_n = σ_n and Arg(i) = π/2, we have that

2^{n+1} |σ_n − 1| < π.   (5.11)

Hence,

|ℜ(2^n(σ_n − 1))| ≤ 2^n |σ_n − 1|²/|2σ_n| ≤ π²/2^{n+3},

thereby proving (5.10) and, hence, (5.9).
We now prove the stated limit when x varies in E, i.e., when x = m/2^n > 0. In such a case,

|(h₄(x) − 1)/x − (π/2) i| = |(σ_n^m − 1)/(m/2^n) − (π/2) i|
  = |2^n(σ_n − 1)·(1 + σ_n + σ_n² + ⋯ + σ_n^{m−1})/m − (π/2) i|
  ≤ 2^n|σ_n − 1|·|(1 + σ_n + σ_n² + ⋯ + σ_n^{m−1})/m − 1| + |2^n(σ_n − 1) − (π/2) i|
  ≤ 2^n|σ_n − 1|·(|σ_n − 1| + |σ_n² − 1| + ⋯ + |σ_n^{m−1} − 1|)/m + |2^n(σ_n − 1) − (π/2) i|.

By (5.11), for k = 1, 2, …, m − 1 we have

|σ_n^k − 1| = |Σ_{j=1}^{k} (σ_n^j − σ_n^{j−1})| ≤ Σ_{j=1}^{k} |σ_n^{j−1}| |σ_n − 1| = k |σ_n − 1| ≤ k π/2^{n+1}.

Using the formula

1 + 2 + 3 + ⋯ + (m − 1) = (m − 1)m/2,

we obtain

(|σ_n − 1| + |σ_n² − 1| + ⋯ + |σ_n^{m−1} − 1|)/m ≤ (1/m)·(π/2^{n+1} + 2π/2^{n+1} + ⋯ + (m − 1)π/2^{n+1})
  = (1/m)·((m − 1)m/2)·(π/2^{n+1}) < (π/4)·(m/2^n).

In conclusion, if x = m/2^n > 0, then

|(h₄(x) − 1)/x − (π/2) i| ≤ (π/2)·(π/4)·(m/2^n) + |2^n(σ_n − 1) − (π/2) i|.

As x = m/2^n tends to 0, necessarily n tends to +∞, and the result follows by (5.9).
We finally look for the limit as x → 0⁺, without further restrictions on x, and assume by contradiction that either such a limit does not exist or that it is not equal to (π/2) i. Then there is an ε > 0 and a strictly decreasing sequence (x_n)_n, with x_n → 0⁺, such that, for every n,

|(h₄(x_n) − 1)/x_n − (π/2) i| > ε.

By the continuity of the function (h₄(x) − 1)/x and the density of E in R, for every sufficiently large n one can find a positive number x'_n ∈ E such that

|x'_n − x_n| ≤ 1/n  and  |(h₄(x'_n) − 1)/x'_n − (π/2) i| > ε,

contradicting the previous part of the proof. □

As a consequence of the preceding theorem, we have the following corollary.

Corollary 5.3 We have

lim_{x→0} sin_T(x)/x = 2π/T.

Proof Writing h_T(x) = cos_T(x) + i sin_T(x), we have

(h_T(x) − 1)/x = (cos_T(x) − 1)/x + i·sin_T(x)/x.

Hence, by Theorem 5.2,

lim_{x→0⁺} (cos_T(x) − 1)/x = 0,  lim_{x→0⁺} sin_T(x)/x = 2π/T.

Now, we have shown, after the statement of Theorem 2.16, that sin_T is an odd function; hence,

lim_{x→0⁻} sin_T(x)/x = lim_{x→0⁺} sin_T(−x)/(−x) = lim_{x→0⁺} sin_T(x)/x = 2π/T,

and the proof is completed. □

Notice that the choice T = 2π simplifies the preceding formula. This is why we will always choose as the base of the trigonometric functions the number T = 2π. We will write cos(x), sin(x), tan(x) (or simply cos x, sin x, tan x) instead of cos_{2π}(x), sin_{2π}(x), tan_{2π}(x). Hence,

lim_{x→0} (sin x)/x = 1.

The knowledge of this limit now allows us to prove that

lim_{x→0} (tan x)/x = 1.

Indeed, we have

lim_{x→0} (tan x)/x = lim_{x→0} ((sin x)/x)·(1/cos x) = lim_{x→0} (sin x)/x · lim_{x→0} 1/cos x = 1 · (1/cos 0) = 1.

Moreover, we can also prove that

lim_{x→0} (cos x − 1)/x² = −1/2.

Indeed, we have

lim_{x→0} (cos x − 1)/x² = lim_{x→0} (cos²x − 1)/(x²(cos x + 1))
  = −lim_{x→0} (sin²x)/x² · lim_{x→0} 1/(cos x + 1) = −1 · (1/2) = −1/2.

It will be useful to keep in mind these remarkable limits for further applications.
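Though the book contains no code, the three remarkable limits are easy to probe numerically. The sketch below (the function name `remarkable_limits` is ours, not the author's) evaluates the three quotients at shrinking values of x.

```python
import math

# Numerical illustration (ours) of the remarkable limits:
# sin x / x -> 1, tan x / x -> 1, (cos x - 1)/x^2 -> -1/2 as x -> 0.
def remarkable_limits(x):
    return (math.sin(x) / x,
            math.tan(x) / x,
            (math.cos(x) - 1) / x**2)

for x in (0.1, 0.01, 0.001):
    s, t, c = remarkable_limits(x)
    print(f"x={x}: sin x/x={s:.8f}, tan x/x={t:.8f}, (cos x-1)/x^2={c:.8f}")
```

Each printed row approaches (1, 1, −0.5), matching the limits proved above.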

Part II Differential and Integral Calculus in R

6 The Derivative

We start by introducing the concept of "derivative" of a function defined on a subset of R, taking its values in R. Let O, a subset of R, be the domain of the function f : O → R, and consider as fixed a point x0 ∈ O. For every x ∈ O \ {x0}, we may write the "difference quotient"

(f(x) − f(x0))/(x − x0) ;

it is precisely the slope of the line passing through the points (x0, f(x0)) and (x, f(x)). Henceforth, x0 will be assumed to be a cluster point of O.

Definition 6.1 The limit

lim_{x→x0} (f(x) − f(x0))/(x − x0),

whenever it exists, is called the "derivative" of f at x0, and is denoted by one of the following symbols:

f'(x0),  Df(x0),  df/dx (x0).

We say that f is "differentiable" at x0 when the derivative exists and is a real number (hence, not equal to +∞ or −∞). In such a case, the line passing through the point (x0, f(x0)) and having f'(x0) as its slope, whose equation is

y = f(x0) + f'(x0)(x − x0),

is called the "tangent line" to the graph of f at the point (x0, f(x0)).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Fonda, A Modern Introduction to Mathematical Analysis, https://doi.org/10.1007/978-3-031-23713-3_6


Note that, in some cases, the derivative of f at x0 could only be a left limit or a right limit. Typically this situation arises when O is an interval and x0 coincides with an endpoint. It is sometimes useful to write, equivalently,

f'(x0) = lim_{x→x0} (f(x) − f(x0))/(x − x0) = lim_{h→0} (f(x0 + h) − f(x0))/h.

Example 1 Let f : R → R be defined as f(x) = mx + q. Then

f'(x0) = lim_{x→x0} ((mx + q) − (mx0 + q))/(x − x0) = m.

The tangent line in this case coincides with the graph of the function itself. The particular case m = 0 tells us that the derivative of a constant function is always equal to 0.

Example 2 Let f(x) = x^n; then

f'(x0) = lim_{x→x0} (x^n − x0^n)/(x − x0) = lim_{x→x0} Σ_{k=0}^{n−1} x^k x0^{n−1−k} = n x0^{n−1}.

Let us prove the same formula using a different approach:

f'(x0) = lim_{h→0} ((x0 + h)^n − x0^n)/h = lim_{h→0} (1/h)[ Σ_{k=0}^{n} C(n,k) x0^{n−k} h^k − x0^n ]
  = lim_{h→0} (1/h) Σ_{k=1}^{n} C(n,k) x0^{n−k} h^k = lim_{h→0} Σ_{k=1}^{n} C(n,k) x0^{n−k} h^{k−1} = n x0^{n−1},

where C(n,k) denotes the binomial coefficient.

Example 3 Now let f(x) = e^x; then

f'(x0) = lim_{h→0} (e^{x0+h} − e^{x0})/h = lim_{h→0} e^{x0} (e^h − 1)/h = e^{x0}.

Example 4 Choosing f(x) = cos x, we have

f'(x0) = lim_{h→0} (cos(x0 + h) − cos(x0))/h
  = lim_{h→0} (cos(x0) cos(h) − sin(x0) sin(h) − cos(x0))/h
  = −cos(x0) lim_{h→0} h·(1 − cos(h))/h² − sin(x0) lim_{h→0} sin(h)/h
  = −sin(x0).

Example 5 On the other hand, if g(x) = sin x, then

g'(x0) = lim_{h→0} (sin(x0 + h) − sin(x0))/h
  = lim_{h→0} (sin(x0) cos(h) + cos(x0) sin(h) − sin(x0))/h
  = −sin(x0) lim_{h→0} h·(1 − cos(h))/h² + cos(x0) lim_{h→0} sin(h)/h
  = cos(x0).
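The derivatives just computed can be checked against the difference quotient directly. The following sketch (ours, not from the text) evaluates (f(x0 + h) − f(x0))/h for shrinking h and compares it with the claimed derivatives of cos and sin.

```python
import math

# Hedged numerical check (ours): the difference quotient approaches f'(x0).
def diff_quotient(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

x0 = 0.7
for h in (1e-2, 1e-4, 1e-6):
    dc = diff_quotient(math.cos, x0, h)
    ds = diff_quotient(math.sin, x0, h)
    print(f"h={h}: Dcos~{dc:.6f} (exact {-math.sin(x0):.6f}), "
          f"Dsin~{ds:.6f} (exact {math.cos(x0):.6f})")
```

As h shrinks, the quotients settle to −sin(x0) and cos(x0), respectively.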

The following theorem provides us a characterization of differentiability.

Theorem 6.2 The function f is differentiable at x0 if and only if there exists a real number ℓ for which one can write

f(x) = f(x0) + ℓ(x − x0) + r(x),   (6.1)

where r is a function such that

lim_{x→x0} r(x)/(x − x0) = 0.   (6.2)

In that case, we have ℓ = f'(x0).

Proof Assume that f is differentiable at x0. Then

lim_{x→x0} (f(x) − f(x0) − f'(x0)(x − x0))/(x − x0) = 0.

Hence, setting r(x) = f(x) − f(x0) − f'(x0)(x − x0), the desired properties (6.1) and (6.2) are readily verified, taking ℓ = f'(x0).
Conversely, assume that (6.1) and (6.2) hold. Then

lim_{x→x0} (f(x) − f(x0) − ℓ(x − x0))/(x − x0) = 0,

and hence

lim_{x→x0} (f(x) − f(x0))/(x − x0) = lim_{x→x0} [ (f(x) − f(x0) − ℓ(x − x0))/(x − x0) + ℓ ] = ℓ,

showing that f is differentiable at x0. □

We now prove that differentiability implies continuity.

Theorem 6.3 If f is differentiable at x0, then f is continuous at x0.

Proof Since f is differentiable at x0, we have that

lim_{x→x0} f(x) = lim_{x→x0} [ f(x0) + ((f(x) − f(x0))/(x − x0))·(x − x0) ] = f(x0) + f'(x0) · 0 = f(x0),

which means that f is continuous at x0. □

6.1 Some Differentiation Rules

Let us review some rules for the computation of the derivative.

Theorem 6.4 If f, g : O → R are differentiable at x0, then so is f + g, and

(f + g)'(x0) = f'(x0) + g'(x0).

Proof We compute

lim_{x→x0} ((f + g)(x) − (f + g)(x0))/(x − x0) = lim_{x→x0} [ (f(x) − f(x0))/(x − x0) + (g(x) − g(x0))/(x − x0) ]
  = lim_{x→x0} (f(x) − f(x0))/(x − x0) + lim_{x→x0} (g(x) − g(x0))/(x − x0) = f'(x0) + g'(x0),

and the formula is proved. □

Theorem 6.5 If f, g : O → R are differentiable at x0, then so is f · g, and

(f · g)'(x0) = f'(x0)g(x0) + f(x0)g'(x0).


Proof We can write

lim_{x→x0} ((f · g)(x) − (f · g)(x0))/(x − x0) = lim_{x→x0} [ ((f(x) − f(x0))/(x − x0))·g(x0) + f(x)·((g(x) − g(x0))/(x − x0)) ]
  = lim_{x→x0} (f(x) − f(x0))/(x − x0) · g(x0) + lim_{x→x0} f(x) · lim_{x→x0} (g(x) − g(x0))/(x − x0),

and the conclusion follows, recalling that lim_{x→x0} f(x) = f(x0), since f is continuous at x0. □

The particular case where g is constant with a value α ∈ R gives us the formula

(αf)'(x0) = αf'(x0).

Moreover, writing f − g = f + (−1)g, we have that

(f − g)'(x0) = f'(x0) − g'(x0).
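The sum and product rules can be illustrated numerically. The sketch below (ours, not the author's) compares a symmetric difference quotient of f + g and f · g against the values the two rules predict, for f(x) = sin x and g(x) = x².

```python
import math

# Symmetric difference quotient, a standard O(h^2) derivative approximation.
def d(f, x0, h=1e-6):
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

f, g = math.sin, lambda x: x * x
fp, gp = math.cos, lambda x: 2 * x   # exact derivatives
x0 = 1.3

sum_rule = d(lambda x: f(x) + g(x), x0)
prod_rule = d(lambda x: f(x) * g(x), x0)
print(sum_rule, fp(x0) + gp(x0))
print(prod_rule, fp(x0) * g(x0) + f(x0) * gp(x0))
```

Both pairs of printed values agree to many decimal places, as the theorems predict.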

Theorem 6.6 If f, g : O → R are differentiable at x0 and g(x0) ≠ 0, then so is f/g, and

(f/g)'(x0) = (f'(x0)g(x0) − f(x0)g'(x0)) / [g(x0)]².

Proof Since f/g = f · (1/g), it will be useful to first show that 1/g is differentiable at x0. Indeed, we have

lim_{x→x0} (1/g(x) − 1/g(x0))/(x − x0) = lim_{x→x0} (g(x0) − g(x))/((x − x0)g(x)g(x0)) = −g'(x0)/[g(x0)]².

Then

(f/g)'(x0) = f'(x0)·(1/g)(x0) + f(x0)·(1/g)'(x0) = f'(x0)/g(x0) − f(x0)g'(x0)/[g(x0)]²,

whence the conclusion. □

Example 1 Let us take into consideration the tangent function

tan x = sin x / cos x.

Choosing f(x) = sin x and g(x) = cos x, we have¹

D tan x0 = (f'(x0)g(x0) − f(x0)g'(x0))/[g(x0)]² = (cos²x0 + sin²x0)/cos²x0 = 1/cos²x0.

Example 2 We now compute the derivative of the hyperbolic functions. Let

cosh(x) = (e^x + e^{−x})/2 = (1/2)(e^x + 1/e^x).

Then

D cosh(x0) = (1/2)(e^{x0} − e^{x0}/[e^{x0}]²) = (e^{x0} − e^{−x0})/2 = sinh(x0).

Similarly, writing

sinh(x) = (e^x − e^{−x})/2 = (1/2)(e^x − 1/e^x),

we have

D sinh(x0) = (1/2)(e^{x0} + e^{x0}/[e^{x0}]²) = (e^{x0} + e^{−x0})/2 = cosh(x0).

Moreover,

D tanh(x0) = (cosh(x0)·cosh(x0) − sinh(x0)·sinh(x0))/cosh²(x0) = 1/cosh²(x0).
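These hyperbolic derivative formulas can be verified numerically as well. The following check (ours) uses a symmetric difference quotient against the standard library's hyperbolic functions.

```python
import math

# Illustrative check (ours) of D cosh = sinh, D sinh = cosh, D tanh = 1/cosh^2.
def d(f, x0, h=1e-6):
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

x0 = 0.9
print(d(math.cosh, x0), math.sinh(x0))
print(d(math.sinh, x0), math.cosh(x0))
print(d(math.tanh, x0), 1 / math.cosh(x0) ** 2)
```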

Example 3 All the polynomial functions

F(x) = a_n x^n + a_{n−1} x^{n−1} + ⋯ + a_2 x² + a_1 x + a_0

are differentiable, with derivative

F'(x0) = n a_n x0^{n−1} + (n − 1) a_{n−1} x0^{n−2} + ⋯ + 2 a_2 x0 + a_1.

¹ Here and in what follows we will often write cos²x and sin²x instead of (cos x)² and (sin x)², respectively.


Hence, all rational functions of the type

F(x) = p(x)/q(x),

with p(x) and q(x) polynomials, are also differentiable at all points x0 where q(x0) ≠ 0.

Let us now see how to compute the derivative of the composition of two functions. Subsequently in this chapter, we will always consider only nondegenerate intervals, i.e., those not reduced to a single point.

Theorem 6.7 If f : O → R is differentiable at x0, and g : J → R is differentiable at f(x0), where J is an interval containing f(O), then g ∘ f is differentiable at x0, and

(g ∘ f)'(x0) = g'(f(x0)) f'(x0).

Proof Setting y0 = f(x0), let R : J → R be the auxiliary function defined as

R(y) = (g(y) − g(y0))/(y − y0)  if y ≠ y0,
R(y) = g'(y0)                   if y = y0.

We observe that the function R is continuous at y0 and

g(y) − g(y0) = R(y)(y − y0)  for every y ∈ J.

Hence, if x ≠ x0,

(g(f(x)) − g(f(x0)))/(x − x0) = R(f(x))·(f(x) − f(x0))/(x − x0).

Since f is continuous at x0 and R is continuous at y0 = f(x0), the function R ∘ f is continuous at x0, hence

lim_{x→x0} (g(f(x)) − g(f(x0)))/(x − x0) = lim_{x→x0} R(f(x)) · lim_{x→x0} (f(x) − f(x0))/(x − x0) = R(f(x0)) f'(x0) = g'(f(x0)) f'(x0),

which is what we wanted to prove. □


Example 1 Let h : R → R be defined as h(x) = cos(e^x). Then h = g ∘ f, with f(x) = e^x and g(y) = cos y. For any x0 ∈ R, we have that f'(x0) = e^{x0}, and if y0 = f(x0), then g'(y0) = −sin y0. Therefore,

h'(x0) = g'(f(x0)) f'(x0) = −sin(e^{x0})·e^{x0}.

Example 2 Now let h : R → R be defined as h(x) = e^{cos x}. Then h = g ∘ f, with f(x) = cos x and g(y) = e^y. For any x0 ∈ R, we have that f'(x0) = −sin x0, and if y0 = f(x0), then g'(y0) = e^{y0}. Therefore,

h'(x0) = g'(f(x0)) f'(x0) = e^{cos x0}·(−sin x0).
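The chain rule can be tested on these very two examples. The sketch below (ours) compares a symmetric difference quotient of each composition with the product g'(f(x0))·f'(x0).

```python
import math

# Hedged check (ours) of the chain rule on h1(x) = cos(e^x), h2(x) = e^{cos x}.
def d(f, x0, h=1e-6):
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

x0 = 0.4
h1 = lambda x: math.cos(math.exp(x))
h2 = lambda x: math.exp(math.cos(x))
exact1 = -math.sin(math.exp(x0)) * math.exp(x0)      # g'(f(x0)) f'(x0)
exact2 = math.exp(math.cos(x0)) * (-math.sin(x0))
print(d(h1, x0), exact1)
print(d(h2, x0), exact2)
```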

We will now show how to compute the derivative of the inverse of an invertible function.

Theorem 6.8 Let I, J be two intervals, and f : I → J be a continuous invertible function. If f is differentiable at x0 and f'(x0) ≠ 0, then f^{−1} is differentiable at y0 = f(x0), and

(f^{−1})'(y0) = 1/f'(x0).

Proof We first observe that, by Theorem 2.14, the function f^{−1} : J → I is continuous. Then, by the change of variable formula,

lim_{y→y0} (f^{−1}(y) − f^{−1}(y0))/(y − y0) = lim_{x→f^{−1}(y0)} (x − x0)/(f(x) − f(x0))
  = lim_{x→x0} 1/((f(x) − f(x0))/(x − x0)) = 1/f'(x0),

thereby proving the result. □

Example 1 If f(x) = e^x, then f^{−1}(y) = ln y, and, for any y0 > 0, writing y0 = e^{x0}, we have that

D ln(y0) = (f^{−1})'(y0) = 1/f'(x0) = 1/e^{x0} = 1/y0.
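The inverse-function rule is easy to test for the pair exp/ln. The following sketch (ours) compares the numerical derivative of ln at y0 with the reciprocal of the numerical derivative of exp at x0 = ln y0, and with the exact value 1/y0.

```python
import math

# Hedged check (ours) of (f^{-1})'(y0) = 1/f'(x0) with f = exp, f^{-1} = ln.
def d(f, x0, h=1e-6):
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

for y0 in (0.5, 1.0, 3.0):
    lhs = d(math.log, y0)          # derivative of the inverse at y0
    x0 = math.log(y0)              # y0 = e^{x0}
    rhs = 1 / d(math.exp, x0)      # 1 / f'(x0)
    print(y0, lhs, rhs, 1 / y0)
```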


Example 2 Let α be any real number, and let h : ]0, +∞[ → R be defined by h(x) = x^α. Since

x^α = e^{α ln x},

we can write h = g ∘ f, with f(x) = α ln x and g(y) = e^y. Then

h'(x0) = g'(f(x0)) f'(x0) = e^{α ln x0}·α·(1/x0) = x0^α·(α/x0) = α x0^{α−1}.

We thus see that the same formula we had found for an exponent n ∈ N also holds for any exponent α ∈ R.

6.2 The Derivative Function

We will now assume that the function f : I → R is defined over an interval I ⊆ R. We say that "f is differentiable" if it is differentiable at every point of I. In this case, we can associate to every x ∈ I the real number f'(x), thereby defining a function f' : I → R, which is called the "derivative function" or simply "derivative" of f.
Looking back at our previous examples, we can summarize the derivatives we have found in the following table:

  f(x)      f'(x)
  x^α       α x^{α−1}
  e^x       e^x
  ln x      1/x
  cos x     −sin x
  sin x     cos x
  tan x     1/cos²x
  cosh x    sinh x
  sinh x    cosh x
  tanh x    1/cosh²x
  ⋯         ⋯

Some care must be taken concerning the domains, of course.
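Every row of the table can be checked at a sample point inside its domain. The loop below (ours; the pairing of each function with its claimed derivative follows the table) compares a symmetric difference quotient against the tabulated formula.

```python
import math

# Hedged sketch (ours): numerically verify each (f, f') row of the table
# at a sample point x0 = 0.8, which lies in every domain involved.
def d(f, x0, h=1e-6):
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

alpha = 2.5
table = [
    (lambda x: x ** alpha, lambda x: alpha * x ** (alpha - 1)),
    (math.exp,  math.exp),
    (math.log,  lambda x: 1 / x),
    (math.cos,  lambda x: -math.sin(x)),
    (math.sin,  math.cos),
    (math.tan,  lambda x: 1 / math.cos(x) ** 2),
    (math.cosh, math.sinh),
    (math.sinh, math.cosh),
    (math.tanh, lambda x: 1 / math.cosh(x) ** 2),
]
x0 = 0.8
errors = [abs(d(f, x0) - fp(x0)) for f, fp in table]
print(max(errors))
```

The largest discrepancy over all rows is tiny, consistent with the table.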


It might be interesting at this point to see whether the derivative function f' has a derivative at some point x0 of I. If it does, then we call (f')'(x0) the "second derivative" of f at x0 and denote it by one of the following symbols:

f''(x0),  D²f(x0),  d²f/dx² (x0).

It is now possible to proceed by induction and define the nth derivative of f at x0, using the notation

f^{(n)}(x0),  D^n f(x0),  d^n f/dx^n (x0),

by setting f^{(n)}(x0) = (f^{(n−1)})'(x0).
We say that f : I → R is "n times differentiable" if it is so at every point of I. If, moreover, the nth derivative f^{(n)} : I → R is continuous, we say that f is of class C^n. The set of those functions is denoted by C^n(I, R) or sometimes by C^n(I). In this setting, C^0(I, R) is just C(I, R), the set of continuous functions.
If f is of class C^n for every n ∈ N, we say that f is "infinitely differentiable." The set of those functions is denoted by C^∞(I, R) or sometimes by C^∞(I). For example, the exponential function f(x) = e^x belongs to this set, since

D^n e^x = e^x,  for every n ≥ 1.

It can be verified that all the functions in the preceding table are infinitely differentiable on their domains.

6.3 Remarkable Properties of the Derivative

We say that x0 ∈ O is a "local maximum point" for the function f : O → R if there exists a neighborhood U of x0 for which f(U) has a maximum and f(x0) = max f(U). Equivalently, if

∃ρ > 0 :  x ∈ B(x0, ρ) ∩ O  ⇒  f(x) ≤ f(x0).

A similar definition holds for "local minimum point."
We will now compute the derivative of a function f at the local maximum or minimum points, provided that they are not at the endpoints of the domain, the interval I.

Theorem 6.9 (Fermat Theorem—I) Let x0 be an internal point of I, and assume f : I → R to be differentiable at x0. If, moreover, x0 is a local maximum or minimum point for f, then f'(x0) = 0.


Proof If x0 is an internal local maximum point for f, there exists a ρ > 0 such that ]x0 − ρ, x0 + ρ[ ⊆ I and

(f(x) − f(x0))/(x − x0) ≥ 0  if x0 − ρ < x < x0,
(f(x) − f(x0))/(x − x0) ≤ 0  if x0 < x < x0 + ρ.

Since f is differentiable at x0, the limit of the difference quotient exists, and it coincides with the left and right limits, i.e.,

f'(x0) = lim_{x→x0⁻} (f(x) − f(x0))/(x − x0) = lim_{x→x0⁺} (f(x) − f(x0))/(x − x0).

By the foregoing inequalities, as a consequence of sign permanence,

lim_{x→x0⁻} (f(x) − f(x0))/(x − x0) ≥ 0 ≥ lim_{x→x0⁺} (f(x) − f(x0))/(x − x0).

Then it must be that f'(x0) = 0. In the case of a local minimum point, one proceeds similarly. □

It is natural that the derivative, as any limit, provides us with local information on the behavior of a function. However, the following theorems will open the door to the study of the global properties of the graph of a function.

Theorem 6.10 (Rolle Theorem) If f : [a, b] → R is a continuous function, differentiable on ]a, b[, and

f(a) = f(b),

then there exists a point ξ ∈ ]a, b[ such that f'(ξ) = 0.

Proof If the function is constant, then its derivative is equal to zero at every point, and the conclusion trivially follows. Assume, then, that f is not constant. Then there exists an x̄ ∈ ]a, b[ such that either

f(x̄) < f(a) = f(b),  or  f(x̄) > f(a) = f(b).

Let us consider the first case. By Weierstrass' Theorem 4.10, f has a minimum in [a, b], and in this case any minimum point cannot be an endpoint of [a, b]; hence, it must be in ]a, b[. Let ξ ∈ ]a, b[ be such a point. By Fermat's Theorem 6.9, it must be that f'(ξ) = 0.
The situation is analogous in the second case. By Weierstrass' Theorem 4.10, f has a maximum in [a, b], and in this case any maximum point must be in ]a, b[. By Fermat's Theorem 6.9, if ξ ∈ ]a, b[ is such a point, then f'(ξ) = 0. □


What follows is a generalization of the preceding theorem; it is also known as the mean value theorem.

Theorem 6.11 (Lagrange Theorem) If f : [a, b] → R is a continuous function, differentiable on ]a, b[, then there exists a point ξ ∈ ]a, b[ such that

f'(ξ) = (f(b) − f(a))/(b − a).

Proof We define the function g : [a, b] → R as

g(x) = f(x) − [ f(a) + ((f(b) − f(a))/(b − a))·(x − a) ].

Clearly g is continuous on [a, b], differentiable on ]a, b[, and such that

g(a) = 0 = g(b).

By Rolle's Theorem 6.10, there exists a point ξ ∈ ]a, b[ where

g'(ξ) = f'(ξ) − (f(b) − f(a))/(b − a) = 0,

whence the conclusion. □
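Lagrange's theorem only asserts that a mean-value point ξ exists; for concrete functions one can locate it. The sketch below (ours, not from the text) takes f(x) = x³ on [0, 2], where the mean slope is (f(2) − f(0))/2 = 4, so the equation f'(ξ) = 3ξ² = 4 has the root ξ = 2/√3, and finds it by bisection.

```python
import math

# Hedged illustration (ours): locate the Lagrange point xi for f(x) = x^3
# on [0, 2] by bisecting g(x) = f'(x) - mean slope = 3x^2 - 4.
def bisect(g, lo, hi, tol=1e-12):
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

xi = bisect(lambda x: 3 * x * x - 4, 0.0, 2.0)
print(xi, 2 / math.sqrt(3))
```

The computed ξ agrees with the closed form 2/√3 ≈ 1.1547, which indeed lies in ]0, 2[.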

Corollary 6.12 Let I be an interval and f : I → R a continuous function, differentiable on I̊. The following propositions hold:

(a) If f'(x) ≥ 0 for every x ∈ I̊, then f is increasing.
(b) If f'(x) > 0 for every x ∈ I̊, then f is strictly increasing.
(c) If f'(x) ≤ 0 for every x ∈ I̊, then f is decreasing.
(d) If f'(x) < 0 for every x ∈ I̊, then f is strictly decreasing.
(e) If f'(x) = 0 for every x ∈ I̊, then f is constant.

Proof To prove (a), let x1 < x2 in I. By Lagrange's Theorem 6.11, there exists a ξ ∈ ]x1, x2[ such that

f'(ξ) = (f(x2) − f(x1))/(x2 − x1).

Hence, since f'(ξ) ≥ 0, it must be that f(x1) ≤ f(x2). This proves that f is increasing. All the other propositions follow similarly. □

Remark 6.13 Note that if f is increasing, then every difference quotient for f is greater than or equal to zero, and therefore f'(x) ≥ 0 for every x ∈ I̊. Hence, in (a), and the same also in (c) and (e), the implication can be reversed. But this is not

the case for (b) and (d); indeed, if f is strictly increasing, it is not true in general that f'(x) > 0 for every x ∈ I̊. The derivative could be equal to zero somewhere, as the example f(x) = x³ shows.

6.4 Inverses of Trigonometric and Hyperbolic Functions

Recalling the sign properties of the trigonometric functions and that D cos x = −sin x and D sin x = cos x, we have the following properties:

cos is strictly decreasing on [0, π], strictly increasing on [π, 2π];
sin is strictly increasing on [−π/2, π/2], strictly decreasing on [π/2, 3π/2].

Let us consider the two functions F : [0, π] → [−1, 1] and G : [−π/2, π/2] → [−1, 1] defined by F(x) = cos x and G(x) = sin x. They are strictly monotone, hence injective. Moreover, because they are continuous, their image is an interval. Since F(π) = −1 = G(−π/2) and F(0) = 1 = G(π/2), both images coincide with [−1, 1]. Therefore, the two functions thus defined are bijective. We will call the functions F^{−1} : [−1, 1] → [0, π] and G^{−1} : [−1, 1] → [−π/2, π/2] "arccosine" and "arcsine," respectively, and we will write

F^{−1}(y) = arccos y,  G^{−1}(y) = arcsin y.

The first one is strictly decreasing, whereas the second one is strictly increasing. Let us compute their derivatives. Setting y = F(x), for x ∈ ]0, π[ we have

(F^{−1})'(y) = 1/F'(x) = −1/sin x = −1/√(1 − cos²x) = −1/√(1 − y²),

while setting y = G(x), for x ∈ ]−π/2, π/2[ we have

(G^{−1})'(y) = 1/G'(x) = 1/cos x = 1/√(1 − sin²x) = 1/√(1 − y²).

Note that arccos + arcsin, having a derivative always equal to zero, is constant. Since its value at 0 is π/2, we have that

arccos y + arcsin y = π/2  for every y ∈ [−1, 1].
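This identity is easy to confirm numerically across the whole interval [−1, 1], including the endpoints. The quick check below (ours) uses the standard library's acos and asin.

```python
import math

# Hedged confirmation (ours) of arccos y + arcsin y = pi/2 on [-1, 1].
ys = [-1.0, -0.5, 0.0, 0.3, 0.99, 1.0]
sums = [math.acos(y) + math.asin(y) for y in ys]
print(sums)
```

Every entry printed equals π/2 up to rounding.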


Let us consider now the function H : ]−π/2, π/2[ → R defined as H(x) = tan x. Considerations similar to those given previously show that it is invertible. We will call the function H^{−1} : R → ]−π/2, π/2[ "arctangent," and we will write

H^{−1}(y) = arctan y.

It is strictly increasing, with

lim_{y→−∞} arctan y = −π/2,  lim_{y→+∞} arctan y = π/2.

Let us compute its derivative. Setting y = H(x), for x ∈ ]−π/2, π/2[ we have

(H^{−1})'(y) = 1/H'(x) = cos²x = 1/(1 + tan²x) = 1/(1 + y²).

sinh−1 (y) = ln(y +



y2 + 1 ) .

The derivative of this function can be computed either directly or using the formula for the inverse function. If .y = sinh(x), then D sinh−1 (y) =

.

1 1 1 1 = = = . 2 D sinh(x) cosh(x) 1 + y2 1 + sinh (x)

The hyperbolic cosine .cosh : R → R is neither injective (being even) nor surjective (since .cosh x ≥ 1 for every .x ∈ R). On the other hand, the function .F : [0, +∞[ → [1, +∞[ , defined as .F (x) = cosh x, is strictly increasing and invertible. Its inverse function .F −1 : [1, +∞[ → [0, +∞[ can be written explicitly as F −1 (y) = ln(y +

.



y2 − 1 ) .

It is often denoted, by an abuse of notation, by .cosh−1 . Let us compute its derivative. If .y = cosh(x), with .x > 0, then D cosh−1 (y) =

.

1 1 1 1 = = . = 2 D cosh(x) sinh(x) y2 − 1 cosh (x) − 1

The function .tanh : R → R is strictly increasing, but it is not surjective, since .−1 < tanh x < 1 for every .x ∈ R. On the other hand, the function .H : R → ] − 1, 1[ ,

6.5 Convexity and Concavity

141

defined as .H (x) = tanh x, is invertible, and its inverse .H −1 : ] − 1, 1[ → R is given by H

.

−1

  1+y 1 (y) = ln . 2 1−y

It is often denoted, by an abuse of notation, by .tanh−1 . Let us compute its derivative. If .y = tanh(x), then D tanh−1 (y) =

.

1 1 1 = cosh2 (x) = . = 2 D tanh(x) 1 − y2 1 − tanh (x)

We can now return to the table of derivatives and enrich it with some of those found earlier. f (x)

f  (x)



αx α−1

ex

ex 1 ln x x cos x − sin x .

6.5

sin x

f (x)

f  (x)

1 arccos x − √ 1 − x2 1 arcsin x √ 1 − x2 1 arctan x 1 + x2

cos x 1 tan x cos2 x cosh x sinh x

cosh−1 x

sinh x cosh x 1 tanh x cosh2 x

tanh−1 x

sinh−1 x

···

1 √ 2 x −1 1 √ 2 x +1 1 1 − x2 ···

Convexity and Concavity

As usual, in what follows, .I ⊆ R will denote a nondegenerate interval. We will say that a function .f : I → R is “convex” if, taking arbitrarily three points .x1 < x2 < x3 in I , the following inequality holds: (a)

.

f (x2 ) − f (x1 ) f (x3 ) − f (x2 ) ≤ . x2 − x1 x3 − x2

142

6 The Derivative

Let us show that inequality .(a) is equivalent to the following ones: (b)

f (x2 ) − f (x1 ) f (x3 ) − f (x1 ) ≤ , x2 − x1 x3 − x1

(c)

f (x3 ) − f (x1 ) f (x3 ) − f (x2 ) ≤ . x3 − x1 x3 − x2

.

.

Indeed, .

f (x3 ) − f (x2 ) f (x2 ) − f (x1 ) ≤ x2 − x1 x3 − x2 ⇔ (f (x2 ) − f (x1 ))(x3 − x2 ) ≤ (f (x3 ) − f (x2 ))(x2 − x1 ) ⇔ (f (x2 ) − f (x1 ))(x3 − x1 + x1 − x2 ) ≤ (f (x3 ) − f (x1 ) + f (x1 ) − f (x2 ))(x2 − x1 ) ⇔ (f (x2 ) − f (x1 ))(x3 − x1 ) ≤ (f (x3 ) − f (x1 ))(x2 − x1 ) ⇔

f (x2 ) − f (x1 ) f (x3 ) − f (x1 ) ≤ , x2 − x1 x3 − x1

proving that .(a) ⇔ (b). The proof of the equivalence .(a) ⇔ (c) is analogous. We may now observe that .f : I → R is convex if and only if, for every .x0 ∈ I , the difference quotient function .F : I \ {x0 } → R, defined by F (x) =

.

f (x) − f (x0 ) , x − x0

is increasing. Indeed, taking .x, x  in .I \ {x0 } such that .x < x  , we can see that    .F (x) ≤ F (x ) in all three possible cases: .x < x < x0 , or .x < x0 < x , or  .x0 < x < x . The following characterization of a convex differentiable function will now be of no surprise. I , then Theorem 6.14 If .f : I → R is continuous, differentiable on .˚ f is convex

.



f  is increasing on ˚ I.

Proof Assume that f is convex. Let α < β be two points in I̊. If α < x < β, then, by (b), we have

(f(x) − f(α))/(x − α) ≤ (f(β) − f(α))/(β − α),

whence, since f is differentiable at α,

f'(α) = lim_{x→α⁺} (f(x) − f(α))/(x − α) ≤ (f(β) − f(α))/(β − α).

Analogously, by (c), we have

(f(β) − f(α))/(β − α) ≤ (f(β) − f(x))/(β − x),

whence, since f is differentiable at β,

f'(β) = lim_{x→β⁻} (f(β) − f(x))/(β − x) ≥ (f(β) − f(α))/(β − α).

f (x2 ) − f (x1 ) x2 − x1

∃ ξ2 ∈ ]x2 , x3 [ : f  (ξ2 ) =

f (x3 ) − f (x2 ) . x3 − x2

.

and .

I , it must be that .f  (ξ1 ) ≤ f  (ξ2 ), Notice that .ξ1 < ξ2 . Since .f  is increasing on .˚ thereby yielding inequality .(a).  We will say that f is “strictly convex” if, taking arbitrarily three points .x1 < x2 < x3 in I , we have that (a  )

f (x2 ) − f (x1 ) f (x3 ) − f (x2 ) < . x2 − x1 x3 − x2

(b )

f (x2 ) − f (x1 ) f (x3 ) − f (x1 ) < x2 − x1 x3 − x1

(c )

f (x3 ) − f (x1 ) f (x3 ) − f (x2 ) < . x3 − x1 x3 − x2

.

Equivalently, .

or .

144

6 The Derivative

The following characterization also holds true in this case. I , then Theorem 6.15 If .f : I → R is continuous, differentiable on .˚ f is strictly convex

.

f  is strictly increasing on ˚ I.



Proof We need to slightly modify the proof of the previous theorem. Assume that I . If .α < x < 12 (α + β), by .(b ) f is strictly convex, and let .α < β be two points in .˚ we have .

f ( α+β f (x) − f (α) f (β) − f (α) 2 ) − f (α) < , < α+β x−α β −α −α 2

whence f  (α) = lim

.

x→α +

f ( α+β f (β) − f (α) f (x) − f (α) 2 ) − f (α) ≤ . < α+β x−α β −α −α 2

Analogously, if . 12 (α + β) < x < β, then, by .(c ), we have .

f (β) − f ( α+β f (β) − f (α) f (β) − f (x) 2 ) < , < α+β β−α β −x β− 2

whence f  (β) = lim

.

x→β −

f (β) − f ( α+β f (β) − f (x) f (β) − f (α) 2 ) ≥ . > α+β β −x β −α β− 2

I. Then .f  (α) < f  (β), thereby proving that .f  is strictly increasing on .˚ Conversely, assume .f  to be strictly increasing on .˚ I . Taking .x1 < x2 < x3 in I , by Lagrange’s Theorem 6.11, exactly as in the proof of the previous theorem, we obtain inequality .(a  ).  We will say that f is “concave” if the function .(−f ) is convex or, equivalently, the opposite inequality in .(a) (or in .(b) or in .(c)) holds. Analogously, we will say that f is “strictly concave” if the function .(−f ) is strictly convex or, equivalently, the opposite inequality in .(a  ) (or in .(b ) or in .(c )) holds. Clearly enough, analogous theorems can be written characterizing either the concavity or the strict concavity of f when f is differentiable and .f  is either decreasing or strictly decreasing, respectively. We can now state the following corollary, which is widely applied in practice.


Corollary 6.16 Let I be an interval and f : I → R be a continuous function, twice differentiable on I̊. The following propositions hold:

(a) If f''(x) ≥ 0 for every x ∈ I̊, then f is convex.
(b) If f''(x) > 0 for every x ∈ I̊, then f is strictly convex.
(c) If f''(x) ≤ 0 for every x ∈ I̊, then f is concave.
(d) If f''(x) < 0 for every x ∈ I̊, then f is strictly concave.

Proof Let us prove (a). Since f''(x) ≥ 0 for every x ∈ I̊, by Corollary 6.12, the function f' : I̊ → R is increasing. Hence, by Theorem 6.14, the function f : I → R is convex. The other properties follow similarly. □

Recalling Remark 6.13, we can observe that in (a) and (c) the implications can be reversed. If f is convex, then f''(x) ≥ 0 for every x ∈ I̊, and similarly, if f is concave, then f''(x) ≤ 0 for every x ∈ I̊. But this is not permitted either in (b), as the example f(x) = x⁴ shows, or in (d).

Example 1 The exponential function f(x) = e^x is strictly convex since

f''(x) = e^x > 0,

.

for every x ∈ R .

Its inverse .ln(x), the natural logarithm, is strictly concave. Example 2 Since .D 2 cos x = − cos x and .D 2 sin x = − sin x, recalling the sign properties of these functions, we have that

.

* π π+ ⎧ ⎪ strictly concave on − , , ⎪ ⎨ 2 2 cos is * + ⎪ ⎪ ⎩ strictly convex on π , 3π , 2 2  sin is

strictly concave on [0, π] , strictly convex on [π, 2π] .

The points separating an interval where the function is convex from an interval where it is concave are called “inflexion points.” For the cosine function, the set of inflexion points is .{ π2 +kπ : k ∈ Z}, whereas for the sine function it is .{kπ : k ∈ Z}. A similar analysis can be made on all the other elementary functions introduced till now. The following property of a differentiable convex function can be useful. It states, roughly speaking, that its graph always lies above any of its tangents.


Theorem 6.17 If f : I → R is convex and it is differentiable at some point x0 ∈ I, then

f(x) ≥ f'(x0)(x − x0) + f(x0),

.

for every x ∈ I .

Proof The inequality surely holds if x = x0. Thus, let us assume x ≠ x0. If x > x0, taking h > 0 such that x0 < x0 + h < x, then, by the convexity of f, we have that

f (x) − f (x0 ) f (x0 + h) − f (x0 ) . ≥ x − x0 h

Taking the limit as .h → 0 we find .

f (x) − f (x0 ) ≥ f  (x0 ) , x − x0

thereby leading to the inequality we want to prove. On the other hand, if .x < x0 , taking .h < 0 such that .x < x0 + h < x0 , then, by the convexity of f , we have that .

f (x0 ) − f (x0 + h) f (x0 ) − f (x) ≤ , x0 − x −h

.

f (x0 + h) − f (x0 ) f (x) − f (x0 ) . ≤ x − x0 h

i.e.,

Taking the limit as .h → 0, the conclusion follows as well.
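As a quick numerical illustration (our own sketch, not part of the book), one can check the tangent-line inequality of Theorem 6.17 for the convex function f(x) = eˣ at x₀ = 1:

```python
import math

def tangent_gap(f, fprime, x0, xs):
    """Return f(x) - [f(x0) + f'(x0)(x - x0)] for each x: nonnegative when f is convex."""
    return [f(x) - (f(x0) + fprime(x0) * (x - x0)) for x in xs]

xs = [-2.0, -0.5, 0.0, 1.0, 3.0]
gaps = tangent_gap(math.exp, math.exp, 1.0, xs)
print(all(g >= 0 for g in gaps))  # the graph of exp lies above its tangent at x0 = 1
```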

6.6 L’Hôpital’s Rules

We first need to prove the following generalization of the Lagrange Theorem 6.11.

Theorem 6.18 (Cauchy Theorem) If f, g : [a, b] → R are two continuous functions, differentiable on ]a, b[ , with g′(x) ≠ 0 for every x ∈ ]a, b[ , then there exists a point ξ ∈ ]a, b[ such that

f′(ξ) / g′(ξ) = (f(b) − f(a)) / (g(b) − g(a)) .


Proof We define the function h : [a, b] → R as

h(x) = (g(b) − g(a)) f(x) − (f(b) − f(a)) g(x) .

It is continuous on [a, b], differentiable on ]a, b[ , and such that h(a) = h(b). Then Rolle’s Theorem 6.10 guarantees the existence of a point ξ ∈ ]a, b[ where h′(ξ) = 0, whence the conclusion. ∎

Notice that Lagrange’s Theorem 6.11 can now be seen as a corollary of Cauchy’s theorem by taking g(x) = x.

In the remainder of the book, it will be convenient to adopt the following notation. Whenever a is greater than b, the symbol [a, b] indicates the interval [b, a], and ]a, b[ indicates ]b, a[ . Note that the statement of Cauchy’s Theorem 6.18 remains valid in this case as well.

The following result is known as “L’Hôpital’s rule in the indeterminate case 0/0.”

Theorem 6.19 (L’Hôpital Theorem—I) Let I be an interval containing a point x₀, and let f, g : I \ {x₀} → R be two differentiable functions, with g′(x) ≠ 0 for every x ∈ I \ {x₀}, such that

lim_{x→x₀} f(x) = lim_{x→x₀} g(x) = 0 .

If the limit

lim_{x→x₀} f′(x)/g′(x)

exists, then the limit

lim_{x→x₀} f(x)/g(x)

also exists, and the two coincide.

Proof Set l = lim_{x→x₀} f′(x)/g′(x) (allowing the possibility that l = +∞ or −∞). Let us extend the two functions to the point x₀ by setting f(x₀) = g(x₀) = 0; in this way, f and g will be continuous on the whole interval I. By Cauchy’s Theorem 6.18, for every x ≠ x₀ in I there is a point ξₓ ∈ ]x₀, x[ (depending on x) such that

f′(ξₓ)/g′(ξₓ) = (f(x) − f(x₀)) / (g(x) − g(x₀)) = f(x)/g(x) .


Notice that lim_{x→x₀} ξₓ = x₀. Then, using the change of variables formula (3.1),

lim_{x→x₀} f(x)/g(x) = lim_{x→x₀} f′(ξₓ)/g′(ξₓ) = lim_{y→x₀} f′(y)/g′(y) = l ,

and the proof is complete. ∎

Note that the preceding theorem does not exclude the possibility of x₀ being an endpoint of the interval I, in which case we are dealing with left or right limits.

We also observe that the conclusion of the statement is written as an implication: if the limit of the quotient of the two derivatives exists, then the limit of the quotient of the two functions exists. The opposite implication is not true, as we can see in the following example. Let x₀ = 0,

f(x) = x² sin(1/x) ,  g(x) = x .

Then lim_{x→0} f(x) = lim_{x→0} g(x) = 0, and

lim_{x→0} f(x)/g(x) = lim_{x→0} x sin(1/x) = 0 ,

while

f′(x)/g′(x) = 2x sin(1/x) − cos(1/x) ,

so that the limit lim_{x→0} f′(x)/g′(x) does not exist.

As an example of the application of L’Hôpital’s rule, let I = R, x₀ = 0, f(x) = sin x − x, and g(x) = x³. Then, from

lim_{x→0} f′(x)/g′(x) = lim_{x→0} (cos x − 1)/(3x²) = −1/6 ,

we deduce that

lim_{x→0} (sin x − x)/x³ = −1/6 .
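This limit can also be observed numerically; the following sketch (ours, with arbitrarily chosen step sizes h) evaluates the quotient (sin h − h)/h³ for small h and compares it with −1/6:

```python
import math

# quotient (sin h - h) / h^3 for a few small step sizes
qs = {h: (math.sin(h) - h) / h**3 for h in (1e-1, 1e-2, 1e-3)}
for h, q in qs.items():
    print(h, q)  # values approach -1/6 = -0.1666...
```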

The following corollary can be useful in determining whether a function is differentiable at some point .x0 .


Corollary 6.20 Let I be an interval containing a point x₀, and let f : I → R be a continuous function, differentiable at all x ≠ x₀. If the limit

l = lim_{x→x₀} f′(x)

exists, then the derivative of f at x₀ exists, and it coincides with l.

Proof Let F(x) = f(x) − f(x₀) and G(x) = x − x₀. We have that G′(x) ≠ 0 for every x ≠ x₀,

lim_{x→x₀} F(x) = lim_{x→x₀} G(x) = 0 ,

and

lim_{x→x₀} F′(x)/G′(x) = lim_{x→x₀} f′(x) = l .

By L’Hôpital’s rule we have that

lim_{x→x₀} F(x)/G(x) = lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) = l ,

i.e., f′(x₀) = l. ∎

L’Hôpital’s rule can be extended to the cases x₀ = +∞ or −∞. Let us analyze here the first case; the other one is analogous.

Theorem 6.21 (L’Hôpital Theorem—II) Let I be an interval, unbounded from above, and let f, g : I → R be two differentiable functions, with g′(x) ≠ 0 for every x ∈ I, such that

lim_{x→+∞} f(x) = lim_{x→+∞} g(x) = 0 .

If the limit

lim_{x→+∞} f′(x)/g′(x)

exists, then the limit

lim_{x→+∞} f(x)/g(x)

also exists, and the two coincide.


Proof Set l = lim_{x→+∞} f′(x)/g′(x). Defining the two functions F(x) = f(x⁻¹) and G(x) = g(x⁻¹), we see that G′(x) ≠ 0 for every x, and

lim_{x→0⁺} F(x) = lim_{x→0⁺} G(x) = 0 .

Moreover,

lim_{x→0⁺} F′(x)/G′(x) = lim_{x→0⁺} (f′(x⁻¹)(−x⁻²)) / (g′(x⁻¹)(−x⁻²)) = lim_{x→0⁺} f′(x⁻¹)/g′(x⁻¹) = lim_{y→+∞} f′(y)/g′(y) = l .

Then, by Theorem 6.19, lim_{x→0⁺} F(x)/G(x) = l, and hence

lim_{x→+∞} f(x)/g(x) = lim_{u→0⁺} f(u⁻¹)/g(u⁻¹) = lim_{u→0⁺} F(u)/G(u) = l ,

thereby proving the result. ∎

We will now state what is called “L’Hôpital’s rule in the indeterminate case ∞/∞.” In the following theorem, ∞ can be either +∞ or −∞.

Theorem 6.22 (L’Hôpital Theorem—III) Let I be an interval containing a point x₀, and let f, g : I \ {x₀} → R be two differentiable functions, with g′(x) ≠ 0 for every x ∈ I \ {x₀}, such that

lim_{x→x₀} f(x) = lim_{x→x₀} g(x) = ∞ .

If the limit

lim_{x→x₀} f′(x)/g′(x)

exists, then the limit

lim_{x→x₀} f(x)/g(x)

also exists, and the two coincide.


Proof Set l = lim_{x→x₀} f′(x)/g′(x). We first assume that l ∈ R and that x₀ is not the right endpoint of I. Let ε > 0 be fixed. Then there exists a δ₁ > 0 such that

x₀ < x < x₀ + δ₁ ⇒ |f′(x)/g′(x) − l| ≤ ε/2 .

By Cauchy’s Theorem 6.18, for every x ∈ ]x₀, x₀ + δ₁[ there is a ξₓ ∈ ]x, x₀ + δ₁[ such that

f′(ξₓ)/g′(ξₓ) = (f(x₀ + δ₁) − f(x)) / (g(x₀ + δ₁) − g(x)) ,

and hence

x₀ < x < x₀ + δ₁ ⇒ |(f(x₀ + δ₁) − f(x)) / (g(x₀ + δ₁) − g(x)) − l| ≤ ε/2 .

We can moreover assume that δ₁ was chosen small enough so that

x₀ < x < x₀ + δ₁ ⇒ f(x) ≠ 0 and g(x) ≠ 0 .

Let us write

(f(x₀ + δ₁) − f(x)) / (g(x₀ + δ₁) − g(x)) = ψ(x) f(x)/g(x) ,

and observe that

lim_{x→x₀} ψ(x) = lim_{x→x₀} (1 − f(x₀ + δ₁)/f(x)) / (1 − g(x₀ + δ₁)/g(x)) = 1 .

In particular,

lim_{x→x₀} (1/ψ(x)) (l − ε/2) = l − ε/2 ,  lim_{x→x₀} (1/ψ(x)) (l + ε/2) = l + ε/2 ,

so that there exists a δ ∈ ]0, δ₁[ such that, if x₀ < x < x₀ + δ, then ψ(x) > 0 and

(1/ψ(x)) (l − ε/2) ≥ l − ε ,  (1/ψ(x)) (l + ε/2) ≤ l + ε .

Therefore, if x₀ < x < x₀ + δ, then

l − ε ≤ (1/ψ(x)) (l − ε/2) ≤ (1/ψ(x)) · (f(x₀ + δ₁) − f(x)) / (g(x₀ + δ₁) − g(x)) ≤ (1/ψ(x)) (l + ε/2) ≤ l + ε ,


and hence

|f(x)/g(x) − l| ≤ ε .

We have thus proved that

lim_{x→x₀⁺} f(x)/g(x) = l .

In a perfectly analogous way one proves that, if x₀ is not the left endpoint of I, then

lim_{x→x₀⁻} f(x)/g(x) = l ,

so that the theorem is proved in the case where l ∈ R.

Assume now that l = +∞ and that x₀ is not the right endpoint of I. Let α > 0 be fixed. Then there exists a δ₁ > 0 such that

x₀ < x < x₀ + δ₁ ⇒ f′(x)/g′(x) ≥ 2α .

Proceeding as previously, we have that

x₀ < x < x₀ + δ₁ ⇒ (f(x₀ + δ₁) − f(x)) / (g(x₀ + δ₁) − g(x)) ≥ 2α .

We can, moreover, assume that δ₁ has been chosen small enough so that

x₀ < x < x₀ + δ₁ ⇒ f(x) ≠ 0 and g(x) ≠ 0 .

Let ψ(x) be defined as previously. There exists a δ ∈ ]0, δ₁[ such that

x₀ < x < x₀ + δ ⇒ 0 < ψ(x) < 2 .

Therefore, if x₀ < x < x₀ + δ, then

(1/ψ(x)) · (f(x₀ + δ₁) − f(x)) / (g(x₀ + δ₁) − g(x)) ≥ (1/ψ(x)) · 2α ≥ α ,

and hence

f(x)/g(x) ≥ α .


We have thus proved that

lim_{x→x₀⁺} f(x)/g(x) = +∞ .

In a perfectly analogous way one proves that, if x₀ is not the left endpoint of I, then

lim_{x→x₀⁻} f(x)/g(x) = +∞ ,

so that the theorem is proved also in the case l = +∞. Finally, the case l = −∞ can be reduced to the previous one by observing that a change of sign in one of the two functions leads back to the case already treated. ∎

Example We want to compute

lim_{x→0⁺} x ln x .

Setting f(x) = ln x and g(x) = 1/x, we see that lim_{x→0⁺} f(x) = −∞ and lim_{x→0⁺} g(x) = +∞. Moreover,

lim_{x→0⁺} f′(x)/g′(x) = lim_{x→0⁺} (1/x)/(−1/x²) = lim_{x→0⁺} (−x) = 0 .

Hence, also

lim_{x→0⁺} x ln x = lim_{x→0⁺} f(x)/g(x) = 0 .
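A short numerical check (our own sketch) shows the product x ln x indeed approaching 0 as x → 0⁺:

```python
import math

# x * ln(x) for x shrinking toward 0 from the right
vals = {x: x * math.log(x) for x in (1e-2, 1e-4, 1e-6)}
for x, v in vals.items():
    print(x, v)  # negative values shrinking toward 0
```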

Even in the indeterminate case ∞/∞ we can extend L’Hôpital’s rule to the cases x₀ = +∞ or −∞. Let us see, e.g., the first case.

Theorem 6.23 (L’Hôpital Theorem—IV) Let I be an interval, unbounded from above, and let f, g : I → R be two differentiable functions, with g′(x) ≠ 0 for every x ∈ I, such that

lim_{x→+∞} f(x) = lim_{x→+∞} g(x) = ∞ .

If the limit

lim_{x→+∞} f′(x)/g′(x)

exists, then the limit

lim_{x→+∞} f(x)/g(x)

also exists, and the two coincide.

The proof is analogous to that of Theorem 6.21, so we omit it for brevity’s sake.
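As an illustration of this theorem (a sketch of ours; the pair of functions is our choice, not the book’s), take f(x) = ln x and g(x) = x, both tending to +∞: since f′(x)/g′(x) = 1/x → 0, the quotient f(x)/g(x) must also tend to 0:

```python
import math

# ln(x) / x for increasingly large x
ratios = {x: math.log(x) / x for x in (1e2, 1e4, 1e6)}
for x, r in ratios.items():
    print(x, r)  # decreasing toward 0
```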

6.7 Taylor Formula

The following theorem provides us with the so-called “Taylor formula with Lagrange’s form of the remainder.”

Theorem 6.24 (Taylor Theorem—I) Let x ≠ x₀ be two points of an interval I, and let f : I → R be n + 1 times differentiable. Then there exists a ξ ∈ ]x₀, x[ such that

f(x) = pₙ(x) + rₙ(x) ,

where

pₙ(x) = f(x₀) + f′(x₀)(x − x₀) + (1/2!) f″(x₀)(x − x₀)² + ⋯ + (1/n!) f⁽ⁿ⁾(x₀)(x − x₀)ⁿ

is the “nth-order Taylor polynomial associated with the function f at the point x₀,” and

rₙ(x) = (1/(n + 1)!) f⁽ⁿ⁺¹⁾(ξ)(x − x₀)ⁿ⁺¹

is the “Lagrange form of the remainder.”

Proof We first observe that the polynomial pₙ satisfies the following properties:

pₙ(x₀) = f(x₀) ,  pₙ′(x₀) = f′(x₀) ,  pₙ″(x₀) = f″(x₀) ,  … ,  pₙ⁽ⁿ⁾(x₀) = f⁽ⁿ⁾(x₀) .


By Cauchy’s Theorem 6.18, we can find a point ξ₁ ∈ ]x₀, x[ such that

(f(x) − pₙ(x)) / (x − x₀)ⁿ⁺¹ = ((f(x) − pₙ(x)) − (f(x₀) − pₙ(x₀))) / ((x − x₀)ⁿ⁺¹ − (x₀ − x₀)ⁿ⁺¹)
  = (f′(ξ₁) − pₙ′(ξ₁)) / ((n + 1)(ξ₁ − x₀)ⁿ) .

Again by Cauchy’s Theorem 6.18, we can find a point ξ₂ ∈ ]x₀, ξ₁[ such that

(f′(ξ₁) − pₙ′(ξ₁)) / ((n + 1)(ξ₁ − x₀)ⁿ) = ((f′(ξ₁) − pₙ′(ξ₁)) − (f′(x₀) − pₙ′(x₀))) / ((n + 1)(ξ₁ − x₀)ⁿ − (n + 1)(x₀ − x₀)ⁿ)
  = (f″(ξ₂) − pₙ″(ξ₂)) / ((n + 1)n(ξ₂ − x₀)ⁿ⁻¹) .

Proceeding by induction, we find n + 1 points ξ₁, ξ₂, …, ξₙ₊₁ such that

(f(x) − pₙ(x)) / (x − x₀)ⁿ⁺¹ = (f′(ξ₁) − pₙ′(ξ₁)) / ((n + 1)(ξ₁ − x₀)ⁿ)
  = (f″(ξ₂) − pₙ″(ξ₂)) / ((n + 1)n(ξ₂ − x₀)ⁿ⁻¹)
  ⋮
  = (f⁽ⁿ⁺¹⁾(ξₙ₊₁) − pₙ⁽ⁿ⁺¹⁾(ξₙ₊₁)) / ((n + 1)! (ξₙ₊₁ − x₀)⁰) .

If x > x₀, these points satisfy the inequalities

x₀ < ξₙ₊₁ < ξₙ < ⋯ < ξ₂ < ξ₁ < x ,

whereas if x < x₀, they are in the opposite order. Since the (n + 1)th derivative of an nth-order polynomial is constantly equal to zero, we have that pₙ⁽ⁿ⁺¹⁾(ξₙ₊₁) = 0, and setting ξ = ξₙ₊₁ we conclude. ∎

If n = 0, then the preceding Taylor formula is simply

f(x) = f(x₀) + f′(ξ)(x − x₀) , for some ξ ∈ ]x₀, x[ ,

which is the outcome of Lagrange’s Theorem 6.11.

Note that the Taylor polynomial

pₙ(x) = ∑_{k=0}^{n} (f⁽ᵏ⁾(x₀)/k!) (x − x₀)ᵏ   (6.3)


could have a degree smaller than n (here f⁽⁰⁾ simply denotes f). For example, if f is a constant function, then the degree of pₙ(x) is equal to 0.

Examples Let us now determine the Taylor polynomial of some elementary functions, taking for simplicity x₀ = 0 (in which case it is sometimes called a “Maclaurin polynomial”).

1. Let f(x) = eˣ. Then

pₙ(x) = 1 + x + x²/2! + x³/3! + ⋯ + xⁿ/n! = ∑_{k=0}^{n} xᵏ/k! .

2. Let f(x) = cos x. Then, if either n = 2m or n = 2m + 1,

pₙ(x) = 1 − x²/2! + x⁴/4! − x⁶/6! + ⋯ + (−1)ᵐ x²ᵐ/(2m)! = ∑_{k=0}^{m} (−1)ᵏ x²ᵏ/(2k)! .

3. Let f(x) = sin x. Then, if either n = 2m + 1 or n = 2m + 2,

pₙ(x) = x − x³/3! + x⁵/5! − x⁷/7! + ⋯ + (−1)ᵐ x²ᵐ⁺¹/(2m + 1)! = ∑_{k=0}^{m} (−1)ᵏ x²ᵏ⁺¹/(2k + 1)! .

4. Now let f(x) = 1/(1 − x). It can be shown by induction that

f⁽ⁿ⁾(x) = n!/(1 − x)ⁿ⁺¹ .

Then f⁽ⁿ⁾(0) = n!, and hence

pₙ(x) = 1 + x + x² + x³ + ⋯ + xⁿ .

5. We proceed similarly for the function f(x) = 1/(1 + x) and find

pₙ(x) = 1 − x + x² − x³ + ⋯ + (−1)ⁿ xⁿ .

6. Consider now the function f(x) = ln(1 + x). Its derivative coincides with the previous function, and we easily obtain

pₙ(x) = x − x²/2 + x³/3 − x⁴/4 + ⋯ + (−1)ⁿ⁻¹ xⁿ/n .


7. Another example where the Taylor polynomial has an explicit formula is given by the function f(x) = 1/(1 + x²). If either n = 2m or n = 2m + 1, then we have

pₙ(x) = 1 − x² + x⁴ − x⁶ + ⋯ + (−1)ᵐ x²ᵐ .

8. At this point it is easy to deal with the function f(x) = arctan x, whose derivative is the previous function. If either n = 2m + 1 or n = 2m + 2, then

pₙ(x) = x − x³/3 + x⁵/5 − x⁷/7 + ⋯ + (−1)ᵐ x²ᵐ⁺¹/(2m + 1) .
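The Maclaurin polynomials listed above are easy to evaluate numerically. The following sketch (our own illustration; the truncation orders are arbitrary) sums the formulas for eˣ and sin x and compares them with the exact values:

```python
import math

# Maclaurin polynomial of e^x: p_n(x) = sum_{k=0}^{n} x^k / k!
def maclaurin_exp(x, n):
    return sum(x**k / math.factorial(k) for k in range(n + 1))

# Maclaurin polynomial of sin x: sum_{k=0}^{m} (-1)^k x^(2k+1) / (2k+1)!
def maclaurin_sin(x, m):
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1) for k in range(m + 1))

print(abs(maclaurin_exp(1.0, 10) - math.e))        # already very small
print(abs(maclaurin_sin(1.0, 5) - math.sin(1.0)))  # very small as well
```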

In a similar way we can find the Taylor polynomials of the hyperbolic functions cosh x, sinh x, and that of tanh⁻¹ x. The following summary table may be useful:

f(x)       pₙ(x) at the point x₀ = 0
eˣ         1 + x + x²/2! + x³/3! + ⋯ + xⁿ/n!
ln(1 + x)  x − x²/2 + x³/3 − x⁴/4 + ⋯ + (−1)ⁿ⁻¹ xⁿ/n
cos x      1 − x²/2! + x⁴/4! − x⁶/6! + ⋯ + (−1)ᵐ x²ᵐ/(2m)!
sin x      x − x³/3! + x⁵/5! − x⁷/7! + ⋯ + (−1)ᵐ x²ᵐ⁺¹/(2m + 1)!
arctan x   x − x³/3 + x⁵/5 − x⁷/7 + ⋯ + (−1)ᵐ x²ᵐ⁺¹/(2m + 1)
cosh x     1 + x²/2! + x⁴/4! + x⁶/6! + ⋯ + x²ᵐ/(2m)!
sinh x     x + x³/3! + x⁵/5! + x⁷/7! + ⋯ + x²ᵐ⁺¹/(2m + 1)!
tanh⁻¹ x   x + x³/3 + x⁵/5 + x⁷/7 + ⋯ + x²ᵐ⁺¹/(2m + 1)

On the other hand, there is no elementary expression for the Taylor polynomial of the functions tan x and tanh x. We report here only the first few terms:

tan x    x + x³/3 + 2x⁵/15 + 17x⁷/315 + ⋯
tanh x   x − x³/3 + 2x⁵/15 − 17x⁷/315 + ⋯

6.8 Local Maxima and Minima

Assuming x₀ to be fixed, we would like to take some limit in the Taylor formula as x tends to x₀. Hence, for every x ≠ x₀, to emphasize the fact that the point ξ ∈ ]x₀, x[ in the Taylor formula depends on x, we will write ξ = ξₓ. Whenever f⁽ⁿ⁺¹⁾ happens to be bounded in a neighbourhood of x₀, we see that

lim_{x→x₀} rₙ(x)/(x − x₀)ⁿ = lim_{x→x₀} (1/(n + 1)!) f⁽ⁿ⁺¹⁾(ξₓ)(x − x₀) = 0 .

This relation is sometimes written using the following notation:

rₙ(x) = o(|x − x₀|ⁿ)  if x → x₀ .

This is surely true if f⁽ⁿ⁺¹⁾ is continuous at x₀, in which case

lim_{x→x₀} rₙ(x)/(x − x₀)ⁿ⁺¹ = lim_{x→x₀} (1/(n + 1)!) f⁽ⁿ⁺¹⁾(ξₓ) = (1/(n + 1)!) f⁽ⁿ⁺¹⁾(x₀) .

Theorem 6.25 Let f ∈ C²(I, R), and assume that x₀ is an internal point of I. If f′(x₀) = 0 and f″(x₀) > 0, then x₀ is a local minimum point for f. On the other hand, if f′(x₀) = 0 and f″(x₀) < 0, then x₀ is a local maximum point for f.

Proof Let us prove the first statement; the second one is analogous. Using the Taylor formula with n = 1, we have that

lim_{x→x₀} (f(x) − f(x₀)) / (x − x₀)² = lim_{x→x₀} (f(x) − [f(x₀) + f′(x₀)(x − x₀)]) / (x − x₀)² = lim_{x→x₀} r₁(x)/(x − x₀)² = (1/2!) f″(x₀) > 0 .

Hence, by sign permanence, there is a neighborhood U of x₀ such that f(x) > f(x₀) for every x ∈ U \ {x₀}. ∎

Whenever f′(x₀) = 0 and f″(x₀) = 0, we will need further information. Always assuming that x₀ is an internal point of I and that f is sufficiently regular, it can be seen that if f‴(x₀) ≠ 0, then x₀ will be neither a local minimum nor a local maximum point for f. On the other hand, if f‴(x₀) = 0, then if

f′(x₀) = f″(x₀) = f‴(x₀) = 0 and f⁽⁴⁾(x₀) > 0 ,

then x₀ is a local minimum point, whereas if

f′(x₀) = f″(x₀) = f‴(x₀) = 0 and f⁽⁴⁾(x₀) < 0 ,

then x₀ is a local maximum point. This procedure can be continued, of course, but we avoid the details for brevity’s sake.
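A small numerical sketch (ours, not from the book) illustrates the two higher-order cases with x⁴, which has a local minimum at 0, and x³, which has neither a minimum nor a maximum there:

```python
# x^4: first three derivatives vanish at 0, fourth derivative is 24 > 0 -> local minimum
quartic = lambda x: x**4
# x^3: third derivative is 6 != 0 at 0 -> neither minimum nor maximum
cubic = lambda x: x**3

pts = [-0.1, -0.01, 0.01, 0.1]
print(all(quartic(x) > quartic(0) for x in pts))                          # True: minimum
print(any(cubic(x) < 0 for x in pts) and any(cubic(x) > 0 for x in pts))  # True: sign change
```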

6.9 Analyticity of Some Elementary Functions

Somewhat surprisingly, the Taylor polynomial at x₀ may be a good approximation of a function even at distant points x, if the degree is taken large enough. An example follows, provided by the exponential function.

Theorem 6.26 For every x ∈ R, we have that

eˣ = limₙ (1 + x + x²/2! + x³/3! + ⋯ + xⁿ/n!) .

Proof The formula clearly holds when x = 0. Assuming x ≠ 0, by Taylor’s Theorem 6.24 there exists a ξ ∈ ]0, x[ such that f(x) = pₙ(x) + rₙ(x), with

rₙ(x) = e^ξ xⁿ⁺¹/(n + 1)! .

We want to prove that limₙ rₙ(x) = 0. Notice that, for any x ∈ R,

|rₙ(x)| ≤ e^|x| |x|ⁿ⁺¹/(n + 1)! .

Since we proved in Theorem 3.30 that limₙ aⁿ/n! = 0 for every a ∈ R, the conclusion follows. ∎

Instead of

eˣ = lim_{n→+∞} ∑_{k=0}^{n} xᵏ/k! ,

we will briefly write

eˣ = ∑_{k=0}^{∞} xᵏ/k! .

This is the “Taylor series” associated with the exponential function at the point x₀ = 0. A similar phenomenon holds for the cosine and sine functions.
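The convergence stated above can be observed numerically even far from x₀ = 0. The following sketch (our own check; the point x = 10 and the number of terms are arbitrary choices) sums the series:

```python
import math

# partial sum of the exponential series at the distant point x = 10
x = 10.0
partial = sum(x**k / math.factorial(k) for k in range(60))
rel_err = abs(partial - math.exp(x)) / math.exp(x)
print(rel_err)  # tiny: the series converges even far from x0 = 0
```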

Theorem 6.27 For every x ∈ R, we have that

cos x = limₘ (1 − x²/2! + x⁴/4! − x⁶/6! + ⋯ + (−1)ᵐ x²ᵐ/(2m)!) ,
sin x = limₘ (x − x³/3! + x⁵/5! − x⁷/7! + ⋯ + (−1)ᵐ x²ᵐ⁺¹/(2m + 1)!) .

Proof Following the lines of the previous proof, and using the fact that |cos ξ| ≤ 1 and |sin ξ| ≤ 1 for every ξ ∈ R, we see that

|rₙ(x)| ≤ |x|ⁿ⁺¹/(n + 1)! .

Since limₙ aⁿ/n! = 0 for every a ∈ R, the conclusion follows. ∎

We will briefly write

cos x = ∑_{k=0}^{∞} (−1)ᵏ x²ᵏ/(2k)! ,  sin x = ∑_{k=0}^{∞} (−1)ᵏ x²ᵏ⁺¹/(2k + 1)! .

Similarly, one can prove that

cosh x = ∑_{k=0}^{∞} x²ᵏ/(2k)! ,  sinh x = ∑_{k=0}^{∞} x²ᵏ⁺¹/(2k + 1)! .

The functions f ∈ C^∞(I, R) for which f(x) = limₙ pₙ(x) for every x ∈ I are called “analytic” on I. This is not the case for every function. For instance, the function f : R → R defined as

f(x) = e^(−1/x²) if x ≠ 0 ,  f(x) = 0 if x = 0 ,

is infinitely differentiable, with f⁽ᵏ⁾(0) = 0 for every k ∈ N; hence pₙ(x) is identically equal to zero. The reader is invited to verify this.
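A numerical sketch of this counterexample (ours; the sample points are arbitrary) shows that f is strictly positive away from 0, so the identically zero Maclaurin polynomials cannot converge to it:

```python
import math

def flat(x):
    # f(x) = exp(-1/x^2) for x != 0, f(0) = 0: all derivatives vanish at 0
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

# Every Maclaurin polynomial of this f is identically zero, yet flat(1) = 1/e > 0,
# so p_n(1) = 0 can never converge to flat(1): f is not analytic at 0.
print(flat(1.0), flat(0.5), flat(0.0))
```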

7 The Integral

In this chapter, we denote by I a compact interval of the real line R, i.e.,

I = [a, b] , for some a < b .

7.1 Riemann Sums

First, we choose in I some points

a = a₀ < a₁ < ⋯ < aₘ₋₁ < aₘ = b ,

thereby obtaining a “partition” of I made by the intervals [aⱼ₋₁, aⱼ], with j = 1, …, m. Then, for each j, we choose a point

xⱼ ∈ [aⱼ₋₁, aⱼ] .

A “tagged partition” of I is the set

P̊ = {(x₁, [a₀, a₁]), …, (xₘ, [aₘ₋₁, aₘ])} .

Examples Let I = [0, 1]. Here are some tagged partitions of I:

P̊ = {(1/6, [0, 1])} ,
P̊ = {(0, [0, 1/3]), (1/2, [1/3, 1])} ,

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Fonda, A Modern Introduction to Mathematical Analysis, https://doi.org/10.1007/978-3-031-23713-3_7


P̊ = {(1/3, [0, 1/3]), (1/3, [1/3, 2/3]), (2/3, [2/3, 1])} ,
P̊ = {(1/8, [0, 1/4]), (3/8, [1/4, 1/2]), (5/8, [1/2, 3/4]), (7/8, [3/4, 1])} .

We now consider a function f : I → R. For each tagged partition P̊ as above we define the number

S(f, P̊) = ∑_{j=1}^{m} f(xⱼ)(aⱼ − aⱼ₋₁) ,

a x1

a 1 x2

a2

x

x 3 a 3 x4 b

If f is not positive on .I, the areas will be considered with a positive or negative sign depending on whether .f (xj ) is positive or negative, respectively. If .f (xj ) = 0, the j th term of the sum will clearly be equal to zero. Example Let .f : [0, 1] → R be defined as .f (x) = 4x 2 − 1, and let ˚= P

.



1 8,

*

0, 14

+ +  * +  * . , 12 , 14 , 34 , 78 , 34 , 1

Then ˚) = − 15 · S(f, P 16

.

1 4

+0·

1 2

+

33 16

·

1 4

=

9 32

.

7.2 δ-Fine Tagged Partitions

7.2

163

δ-Fine Tagged Partitions

To measure how “fine” a tagged partition is, we will have to deal with a “gauge”, i.e., a positive function .δ : I → R. ˚ is “.δ-fine” if, for every If .δ is a gauge on I , we say that the tagged partition .P .j = 1, . . . , m, xj − aj −1 ≤ δ(xj )

.

and

aj − xj ≤ δ(xj ) ;

equivalently, we may write [aj −1 , aj ] ⊆ [xj − δ(xj ), xj + δ(xj )] .

.

We will now show that it is always possible to find a .δ-fine tagged partition of the compact interval .I, whatever the gauge .δ. Theorem 7.1 (Cousin Theorem) For every gauge .δ on .I = [a, b] there is a .δ-fine tagged partition of .I. Proof Set .I0 = I , and assume by contradiction that there exists a gauge .δ : I0 → R for which there are no .δ-fine tagged partitions. Taking the midpoint of .I0 , we divide it in two closed subintervals. At least one of these two subintervals will not have any .δ-fine tagged partition (otherwise we could glue together the two .δ-fine tagged partitions to get a .δ-fine tagged partition of the original interval .I0 ). Let us choose it and denote it by .I1 . We now iterate the same procedure, thereby constructing a sequence I = I0 ⊇ I1 ⊇ I2 ⊇ I3 ⊇ . . .

.

of closed subintervals, none of which has any .δ-fine tagged partitions. By Cantor’s Theorem 1.9, there is a point c belonging to all of these intervals. For n sufficiently ˚ = {(c, In )}, large, .In will be contained in .[c − δ(c), c + δ(c)]. But then the set .P whose only element is the couple .(c, In ), is a .δ-fine tagged partition of .In , a contradiction.  Examples Let us provide some examples of .δ-fine tagged partitions of the interval I = [0, 1].

.

We start with a constant gauge: .δ(x) = 15 . Since the previous theorem does not give any information on how to find a .δ-fine tagged partition, we will proceed by guessing. As a first guess, we choose the .aj equally spaced and the .xj as the midpoints of the intervals .[aj −1 , aj ], i.e., aj =

.

j , m

xj =

aj −1 + aj 2j − 1 = . 2 2m

164

7 The Integral

For the corresponding tagged partition to be .δ-fine, it must be that xj − aj −1 =

.

1 1 ≤ 2m 5

and

aj − xj =

1 1 ≤ . 2m 5

These inequalities are satisfied choosing .m ≥ 3. If .m = 3, we have the .δ-fine tagged partition ˚= P

.



1 6,

*

0, 13

+  * +  * + , 12 , 13 , 23 , 56 , 23 , 1 .

If, instead of taking the points .xj in the middle of the respective intervals we would −1 , then in order like to choose them, for example, at the left endpoint, i.e., .xj = j m to have a .δ-fine tagged partition we should ask that xj − aj −1 = 0 ≤

.

1 5

and

aj − xj =

1 1 ≤ . m 5

These inequalities are verified if .m ≥ 5. For instance, if .m = 5, then we have the δ-fine tagged partition

.

˚= P

.



* +  * +  * +  * +  * + 0, 0, 15 , 15 , 15 , 25 , 25 , 25 , 35 , 35 , 35 , 45 , 45 , 45 , 1 .

Notice that, with such a choice of .aj , if .m ≥ 5, then the points .xj can actually be taken arbitrarily in the corresponding intervals .[aj −1 , aj ], still yielding .δ-fine tagged partitions. The previous example shows how it is possible to construct .δ-fine tagged partitions in the case of a gauge .δ that is constant with value . 15 . It is clear that a similar procedure can be used for a constant gauge with arbitrary positive value. Consider now the case where .δ is a continuous function. Then Weierstrass’ ¯ Consider Theorem 4.10 says that .δ(x) has a minimum positive value: let it be .δ. ¯ and construct a .δ-fine ¯ then the constant gauge with value .δ, tagged partition with the procedure we saw earlier. Clearly, such a tagged partition must be .δ-fine as well. This argument shows how the case of a continuous gauge can be reduced to that of a constant gauge. Consider now the noncontinuous gauge  δ(x) =

.

1 2 x 2

if x = 0 , if x ∈ ]0, 1] .

As previously, we proceed by guessing. Let us try, as earlier, taking the .aj equally distant and the .xj as the midpoints of the intervals .[aj −1 , aj ]. This time, however,

7.3 Integrable Functions on a Compact Interval

165

we are going to fail; indeed, we should have x1 = x1 − a0 ≤ δ(x1 ) =

.

x1 , 2

which is clearly impossible if .x1 > 0. The only way to solve this problem is to choose .x1 = 0. We decide, then, for instance, to take the .xj to coincide with .aj −1 , as was also done earlier. We thus find the .δ-fine tagged partition ˚= P

.

 * + +  * +  * 0, 0, 12 , 12 , 12 , 34 , 34 , 34 , 1 .

Notice that a more economic choice might have been ˚= P

.

 * +  * + 0, 0, 12 , 1, 12 , 1 .

The choice .x1 = 0 is, however, unavoidable. Finally, once a point .c ∈ ]0, 1[ is fixed, let the gauge .δ : [0, 1] → R be defined as ⎧c−x ⎪ ⎪ ⎪ 2 ⎪ ⎪ ⎨ 1 .δ(x) = ⎪ 5 ⎪ ⎪ ⎪ ⎪ ⎩x−c 2

if x ∈ [0, c[ , if x = c , if x ∈ ]c, 1] .

Similar considerations to those made in the previous case lead to the conclusion that, in order to have a .δ-fine tagged partition, it is necessary for one of the .xj to be equal to .c. For example, if .c = 12 , a possible choice is ˚= P

.

7.3

 * +  * +  * +  * +  * + 0, 0, 14 , 14 , 14 , 38 , 12 , 38 , 58 , 34 , 58 , 34 , 1, 34 , 1 .

Integrable Functions on a Compact Interval

We now want to define some kind of convergence of the Riemann sums as the tagged partitions become “finer and finer.” The following definition is due to Jaroslav Kurzweil and Ralph Henstock.

Definition 7.2 A function f : I → R is said to be “integrable” if there is a real number J with the following property: given ε > 0, it is possible to find a gauge δ : I → R such that, for every δ-fine tagged partition P̊ of I,

|S(f, P̊) − J| ≤ ε .

.

166

7 The Integral

We will also say that f is “integrable on I.”

Let us prove that there is at most one J ∈ R verifying the conditions of the definition. If there were a second one, say J′, then, for every ε > 0, there would be two gauges δ and δ′ on I, associated respectively with J and J′, satisfying the condition of the definition. Define the gauge

δ″(x) = min{δ(x), δ′(x)} .

.

˚ of I is chosen, we have that .P ˚ is both .δ-fine and Once a .δ  -fine tagged partition .P  .δ -fine, and hence ˚ + |S(f, P) ˚ − J  | ≤ 2ε . |J − J  | ≤ |J − S(f, P)|

.

Since this holds for every .ε > 0, it necessarily must be that .J = J  . If .f : I → R is an integrable function, the only element .J ∈ R verifying the conditions of the definition is called the “integral” of f on I and is denoted by one of the following symbols: -

-

-

b

f,

.

-

f,

I

b

f (x) dx ,

f (x) dx .

I

a

a

The presence of the letter x in the preceding notation has no independent importance. It could be replaced by any other letter .t, .u, .α, . . . , or by any other symbol, unless already used with another meaning. For reasons to be explained later on, we set, moreover, -

a

.

-

-

b

f =−

f,

b

a

and

a

f = 0.

a

Examples 1. As a first example, consider a constant function .f (x) = c. In this case, for any ˚ of .[a, b], tagged partition .P ˚ = S(f, P)

m 

.

c(aj − aj −1 ) = c

j =1

m  (aj − aj −1 ) = c(b − a) , j =1

hence also .

a

b

c dx = c(b − a) .

7.3 Integrable Functions on a Compact Interval

167

Indeed, once we fix .ε > 0, it is readily seen in this simple case that any gauge δ : [a, b] → R satisfies the condition of the definition, with .J = c(b − a), since ˚ of I we have .|S(f, P) ˚ − J | = 0 < ε. for every .δ-fine tagged partition .P 2. As a second example, consider the function .f (x) = x. Then .

˚ = S(f, P)

m 

.

xj (aj − aj −1 ) .

j =1

To find a candidate for the integral, let us consider a particular tagged partition where the .xj are the midpoints of the intervals .[aj −1 , aj ]. In this particular case, we have m  .

j =1

xj (aj −aj −1 ) =

m  aj −1 + aj

2

j =1

1 2 2 1 (aj −aj −1 ) = (b 2 −a 2 ). 2 2 m

(aj −aj −1 )=

j =1

We want to prove now that the function .f (x) = x is integrable on .[a, b] and that ˚ we have its integral is really . 12 (b2 − a 2 ). Fix .ε > 0. For any tagged partition .P .

   m  m      aj −1 + aj S(f, P) ˚ − 1 (b 2 − a 2 ) =   (a x (a − a ) − − a ) j j j −1 j j −1     2 2 j =1

j =1

 m    aj −1 + aj   ≤ x j −  (aj − aj −1 ) 2 j =1



m  aj − aj −1 (aj − aj −1 ) . 2 j =1

ε If we choose the gauge .δ to be constant with value . b−a , then for every .δ-fine ˚ we have tagged partition .P

.

  m m    2δ ε  S(f, P) ˚ − 1 (b 2 − a 2 ) ≤ (a − a ) = (aj − aj −1 ) = ε . j j −1   2 2 b−a j =1

j =1

The condition of the definition is thus verified with this choice of the gauge, and we have proved that -

b

.

a

x dx =

1 2 (b − a 2) . 2

168

7.4

7 The Integral

Elementary Properties of the Integral

Let .f : I → R and .g : I → R be two real functions and .α ∈ R a constant. It is ˚ of .I, easy to verify that for every tagged partition .P ˚) = S(f, P) ˚ + S(g, P) ˚ S(f + g, P

.

and ˚) = αS(f, P ˚) . S(αf, P

.

These linearity properties are inherited by the integral, as will be proved in the following two propositions. Proposition 7.3 If f and g are integrable on .I, then .f + g is integrable on I and -

-

-

(f + g) =

.

I

f+

g.

I

I

. . Proof Set .J1 = I f and .J2 = I g. Once .ε > 0 is fixed, there are two gauges .δ1 ˚ of .I, if .P ˚ is .δ1 -fine, then and .δ2 on I such that, for every tagged partition .P ˚ − J1 | ≤ |S(f, P)

ε , 2

˚ − J2 | ≤ |S(g, P)

ε . 2

.

˚ is .δ2 -fine, then whereas if .P .

˚ be a .δ-fine Let us define the gauge .δ : I → R as .δ(x) = min{δ1 (x), δ2 (x)}. Let .P tagged partition of .I. It is thus both .δ1 -fine and .δ2 -fine, hence ˚ − (J1 + J2 )| = |S(f, P) ˚ − J1 + S(g, P) ˚ − J2 | |S(f + g, P)

.

˚ − J1 | + |S(g, P) ˚ − J2 | ≤ |S(f, P) ε ε ≤ + = ε. 2 2 

This completes the proof.

Proposition 7.4 If f is integrable on I and .α ∈ R, then .αf is integrable on I and -

(αf ) = α

.

I

f. I

7.4 Elementary Properties of the Integral

169

Proof If .α = 0, then the identity is surely true. If .α = 0, then set .J = .ε > 0. There is a gauge .δ on I such that ˚ −J| ≤ |S(f, P)

.

. I

f and fix

ε |α|

˚ of .I. Then, for every .δ-fine tagged partition .P ˚ of for every .δ-fine tagged partition .P .I, we have ˚) − αJ | = |αS(f, P) ˚ − αJ | = |α| |S(f, P) ˚ − J | ≤ |α| ε = ε , |S(αf, P |α|

.



and the proof is thus completed.

We have just proved that the set of integrable functions is a real vector space and that the integral is a linear function on it. We now study the behavior of the integral with respect to the order relation in .R. Proposition 7.5 If f is integrable on I and .f (x) ≥ 0 for every .x ∈ I, then f ≥ 0.

.

I

Proof Fix .ε > 0. There is a gauge .δ on I such that  -     ˚ . S(f, P) − f  ≤ ε  I

˚ of .I. Hence, for every .δ-fine tagged partition .P -

˚ − ε ≥ −ε , f ≥ S(f, P)

.

I

. ˚ ≥ 0. Since this is true for every .ε > 0, it must be that . f ≥ since clearly .S(f, P) I 0, thereby proving the result.  Corollary 7.6 If f and g are integrable on I and .f (x) ≤ g(x) for every .x ∈ I, then . f ≤ g. I

I

Proof It is sufficient to apply the preceding proposition to the function .g − f.



170

7 The Integral

Corollary 7.7 If f and .|f | are integrable on .I, then .

-     f  ≤ |f | .   I

I

Proof Applying the preceding corollary to the inequalities − |f | ≤ f ≤ |f | ,

.

we have .

-



-

|f | ≤ I

f ≤ I

|f | , I



whence the conclusion.

7.5

The Fundamental Theorem

The following theorem establishes an unexpected link between differential and integral calculus. It is called the Fundamental Theorem of differential and integral calculus.

Theorem 7.8 (Fundamental Theorem—I) Let $F : [a, b] \to \mathbb{R}$ be a differentiable function, and let $f$ be its derivative: $F'(x) = f(x)$ for every $x \in [a, b]$. Then $f$ is integrable on $[a, b]$ and
$$\int_a^b f = F(b) - F(a).$$

Proof Let $\varepsilon > 0$ be fixed. We know that for every $x \in [a, b]$,
$$f(x) = F'(x) = \lim_{u \to x} \frac{F(u) - F(x)}{u - x}.$$
Then for every $x \in [a, b]$, there is a $\delta(x) > 0$ such that, for every $u \in [a, b]$,
$$0 < |u - x| \le \delta(x) \;\Rightarrow\; \Big| \frac{F(u) - F(x)}{u - x} - f(x) \Big| \le \frac{\varepsilon}{b - a},$$
i.e.,
$$|u - x| \le \delta(x) \;\Rightarrow\; |F(u) - F(x) - f(x)(u - x)| \le \frac{\varepsilon}{b - a}\,|u - x|.$$
We have thus defined a gauge $\delta : [a, b] \to \mathbb{R}$.


Consider now a $\delta$-fine tagged partition of $I$,
$$\mathring{P} = \{(x_1, [a_0, a_1]), \ldots, (x_m, [a_{m-1}, a_m])\}.$$
Since, for every $j = 1, \ldots, m$,
$$|a_{j-1} - x_j| \le \delta(x_j) \quad \text{and} \quad |a_j - x_j| \le \delta(x_j),$$
we have that
$$\begin{aligned}
|F(a_j) - F(a_{j-1}) - f(x_j)(a_j - a_{j-1})|
&= \big| [F(a_j) - F(x_j) - f(x_j)(a_j - x_j)] \\
&\qquad + [F(x_j) - F(a_{j-1}) + f(x_j)(a_{j-1} - x_j)] \big| \\
&\le |F(a_j) - F(x_j) - f(x_j)(a_j - x_j)| \\
&\qquad + |F(a_{j-1}) - F(x_j) - f(x_j)(a_{j-1} - x_j)| \\
&\le \frac{\varepsilon}{b-a}|a_j - x_j| + \frac{\varepsilon}{b-a}|a_{j-1} - x_j| \\
&= \frac{\varepsilon}{b-a}(a_j - x_j + x_j - a_{j-1}) = \frac{\varepsilon}{b-a}(a_j - a_{j-1}).
\end{aligned}$$
Hence,
$$\begin{aligned}
|F(b) - F(a) - S(f, \mathring{P})|
&= \Big| \sum_{j=1}^m [F(a_j) - F(a_{j-1})] - \sum_{j=1}^m f(x_j)(a_j - a_{j-1}) \Big| \\
&= \Big| \sum_{j=1}^m [F(a_j) - F(a_{j-1}) - f(x_j)(a_j - a_{j-1})] \Big| \\
&\le \sum_{j=1}^m |F(a_j) - F(a_{j-1}) - f(x_j)(a_j - a_{j-1})| \\
&\le \sum_{j=1}^m \frac{\varepsilon}{b-a}(a_j - a_{j-1}) = \varepsilon,
\end{aligned}$$
and the theorem is proved. $\square$
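As a quick numerical illustration of the Fundamental Theorem (not part of the gauge-based argument above), one can watch Riemann sums of a derivative approach $F(b) - F(a)$; the uniform midpoint partition below is an assumed special case of a tagged partition, and the functions chosen are illustrative.

```python
import math

def riemann_sum(f, a, b, m):
    """Riemann sum on a uniform partition with midpoint tags --
    a special case of the tagged partitions used in the text."""
    h = (b - a) / m
    return sum(f(a + (j + 0.5) * h) * h for j in range(m))

# F(x) = sin x, so f = F' = cos; the theorem predicts the sums approach
# F(b) - F(a) = sin(b) - sin(a) as the partition gets finer.
a, b = 0.0, 2.0
exact = math.sin(b) - math.sin(a)
assert abs(riemann_sum(math.cos, a, b, 1000) - exact) < 1e-5
```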


7.6 Primitivable Functions

In this section and the following one, we denote by $I$ any interval in $\mathbb{R}$ (not necessarily a compact interval). A function $f : I \to \mathbb{R}$ is said to be "primitivable" (or "primitivable on $I$") if there is a differentiable function $F : I \to \mathbb{R}$ such that $F'(x) = f(x)$ for every $x \in I$. Such a function $F$ is called a "primitive" of $f$.

The Fundamental Theorem establishes that all primitivable functions defined on a compact interval $I = [a, b]$ are integrable and that their integral is easily computable once a primitive is known. It can be reformulated as follows.

Theorem 7.9 (Fundamental Theorem—II) Let $f : [a, b] \to \mathbb{R}$ be a primitivable function, and let $F$ be one of its primitives. Then $f$ is integrable on $[a, b]$ and
$$\int_a^b f = F(b) - F(a).$$

It is sometimes useful to denote the difference $F(b) - F(a)$ by the symbols
$$[F]_a^b, \qquad [F(x)]_{x=a}^{x=b},$$
or variants of these, for instance $[F(x)]_a^b$, when no ambiguities can arise.

Example Consider the function $f(x) = x^n$. It is easy to see that $F(x) = \frac{1}{n+1}x^{n+1}$ is a primitive. The Fundamental Theorem tells us that
$$\int_a^b x^n\, dx = \Big[ \frac{1}{n+1}x^{n+1} \Big]_a^b = \frac{1}{n+1}(b^{n+1} - a^{n+1}).$$

The fact that the difference $F(b) - F(a)$ does not depend on the chosen primitive is explained by the following proposition.

Proposition 7.10 Let $f : I \to \mathbb{R}$ be a primitivable function, and let $F$ be one of its primitives. Then a function $G : I \to \mathbb{R}$ is a primitive of $f$ if and only if $F - G$ is a constant function on $I$.

Proof If $F - G$ is constant, then
$$G'(x) = (F - (F - G))'(x) = F'(x) - (F - G)'(x) = F'(x) = f(x)$$
for every $x \in I$, and hence $G$ is a primitive of $f$. On the other hand, if $G$ is a primitive of $f$, then we have
$$(F - G)'(x) = F'(x) - G'(x) = f(x) - f(x) = 0$$

for every $x \in I$. Consequently, $F - G$ is constant on $I$. $\square$

Note that if $f : I \to \mathbb{R}$ is a primitivable function, then it is also primitivable on every subinterval of $I$. In particular, it is integrable on every interval $[a, x] \subseteq I$, and therefore it is possible to define a function
$$x \mapsto \int_a^x f,$$
which we call the "integral function" of $f$ and denote by one of the following symbols:
$$\int_a^{\cdot} f, \qquad \int_a^{\cdot} f(t)\, dt.$$

In this last notation it is convenient to use a letter other than $x$ for the variable of $f$; for instance, here we have chosen the letter $t$. The Fundamental Theorem tells us that if $F$ is a primitive of $f$, then
$$\int_a^x f = F(x) - F(a), \quad \text{for every } x \in [a, b].$$
We thus see that $\int_a^x f$ differs from $F(x)$ by a constant, whence the following corollary.

Corollary 7.11 Let $f : [a, b] \to \mathbb{R}$ be a primitivable function. Then the integral function $\int_a^{\cdot} f$ is one of its primitives; it is differentiable on $[a, b]$ and
$$\Big( \int_a^{\cdot} f \Big)'(x) = f(x), \quad \text{for every } x \in [a, b].$$

Notice that the choice of the point $a$ in the definition of $\int_a^{\cdot} f$ is not at all mandatory. If $f : I \to \mathbb{R}$ is primitivable, one could take any point $\omega \in I$ and consider the function $\int_\omega^{\cdot} f$. The conventions made on the integral with exchanged endpoints are such that the previously stated corollary still holds with this new integral function. Indeed, if $F$ is a primitive of $f$, even if $x < \omega$, then we have
$$\int_\omega^x f = -\int_x^\omega f = -(F(\omega) - F(x)) = F(x) - F(\omega),$$
so that $\int_\omega^{\cdot} f$ is still a primitive of $f$. We can then write
$$\frac{d}{dx} \int_\omega^x f = f(x), \quad \text{or, equivalently,} \quad \frac{d}{dx} \int_\omega^x f(t)\, dt = f(x).$$


This formula can be generalized: if $\alpha, \beta : [a, b] \to \mathbb{R}$ are two differentiable functions, then
$$\frac{d}{dx} \int_{\alpha(x)}^{\beta(x)} f(t)\, dt = f(\beta(x))\beta'(x) - f(\alpha(x))\alpha'(x).$$
Indeed, if $F$ is a primitive of $f$, then the preceding formula is easily obtained by writing $\int_{\alpha(x)}^{\beta(x)} f(t)\, dt = F(\beta(x)) - F(\alpha(x))$ and differentiating.

We will denote the set of all primitives of $f$ by one of the following symbols:
$$\int f, \qquad \int f(x)\, dx.$$
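The generalized differentiation formula can be checked numerically. In the sketch below, the choices $f(t) = t^2$, $\alpha(x) = x$, $\beta(x) = x^2$ are illustrative assumptions; a central finite difference approximates the derivative.

```python
def finite_diff(g, x, h=1e-6):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

# G(x) = int_x^{x^2} t^2 dt = (x^6 - x^3)/3, and the formula predicts
# G'(x) = f(beta(x)) beta'(x) - f(alpha(x)) alpha'(x) = x^4 * 2x - x^2 * 1.
def G(x):
    return (x**6 - x**3) / 3

x = 1.7
predicted = (x**2) ** 2 * 2 * x - x**2 * 1
assert abs(finite_diff(G, x) - predicted) < 1e-4
```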

One should be careful with the notation $\int$ introduced for the primitives, which looks similar to that for the integral, even if the two concepts are completely different. Concerning the use of $x$, an observation analogous to the one made for the integral can be made here as well: it can be replaced by any other letter or symbol, with due precaution.

When applying the theory to practical problems, however, if $F$ denotes a primitive of $f$, instead of correctly writing
$$\int f = \{F + c : c \in \mathbb{R}\},$$
it is common to use improper expressions of the type
$$\int f(x)\, dx = F(x) + c,$$
where $c \in \mathbb{R}$ stands for an arbitrary constant; we will adapt to this habit, too.

Let us make a list of primitives of some elementary functions:
$$\int e^x\, dx = e^x + c,$$
$$\int \sin x\, dx = -\cos x + c,$$
$$\int \cos x\, dx = \sin x + c,$$
$$\int x^\alpha\, dx = \frac{x^{\alpha+1}}{\alpha+1} + c, \quad \text{with } \alpha \neq -1,$$
$$\int \frac{1}{x}\, dx = \ln |x| + c,$$


$$\int \frac{1}{1 + x^2}\, dx = \arctan x + c,$$
$$\int \frac{1}{\sqrt{1 - x^2}}\, dx = \arcsin x + c.$$

Notice that the definition of primitivable function makes sense even in some cases where $f$ is not necessarily defined on an interval, and indeed the preceding formulas should be interpreted on the natural domains of the considered functions. For example,
$$\int \frac{1}{x}\, dx = \begin{cases} \ln x + c & \text{if } x \in \,]0, +\infty[\,, \\ \ln(-x) + c & \text{if } x \in \,]-\infty, 0[\,. \end{cases}$$

Example Using the Fundamental Theorem we find
$$\int_0^\pi \sin x\, dx = [-\cos x]_0^\pi = -\cos \pi + \cos 0 = 2.$$
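Entries in such a table are easy to test numerically: differentiate the claimed primitive and compare with the integrand. The sketch below checks $-\cos$ against $\sin$ (sample points arbitrary) and the value $\int_0^\pi \sin x\, dx = 2$.

```python
import math

def finite_diff(F, x, h=1e-6):
    """Central finite-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# A primitive of sin is F(x) = -cos x: its derivative matches sin
for x in [0.3, 1.0, 2.5]:
    assert abs(finite_diff(lambda t: -math.cos(t), x) - math.sin(x)) < 1e-6

# and the Fundamental Theorem gives the integral over [0, pi]:
assert abs((-math.cos(math.pi)) - (-math.cos(0.0)) - 2.0) < 1e-9
```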

Notice that the presence of the arbitrary constant $c$ can sometimes lead to apparently different results. For example, we know that
$$\int \frac{1}{\sqrt{1 - x^2}}\, dx = \arcsin x + c,$$
but it is readily verified that we also have
$$\int \frac{1}{\sqrt{1 - x^2}}\, dx = -\arccos x + c.$$
This is explained by the fact that $\arcsin x = \frac{\pi}{2} - \arccos x$ for every $x \in [-1, 1]$; hence the difference of $\arcsin$ and $-\arccos$ is constant. The same notation $c$ for the arbitrary constant in the two formulas could sometimes be misleading!

From the known properties of derivatives we can easily prove the following two propositions.

Proposition 7.12 Let $f$ and $g$ be primitivable on $I$, and let $F$ and $G$ be two corresponding primitives. Then $f + g$ is primitivable on $I$, and $F + G$ is one of its primitives; we will briefly write¹
$$\int (f + g) = \int f + \int g.$$

¹ Here and in what follows, we use in an intuitive way the algebraic operations involving sets. To be precise, the sum of two sets $A$ and $B$ is defined as $A + B = \{a + b : a \in A,\ b \in B\}$.


Proposition 7.13 Let $f$ be primitivable on $I$, and let $F$ be one of its primitives. If $\alpha \in \mathbb{R}$ is any given constant, then $\alpha f$ is primitivable on $I$, and $\alpha F$ is one of its primitives; we will briefly write
$$\int (\alpha f) = \alpha \int f.$$

As a consequence of these propositions, we have that the set of primitivable functions on $I$ is a real vector space.

We conclude this section by presenting an interesting class of integrable functions that are not primitivable. Let the function $f : [a, b] \to \mathbb{R}$ be such that the set
$$E = \{x \in [a, b] : f(x) \neq 0\}$$
is finite or countable (for instance, a function that is zero everywhere except at a point, or the Dirichlet function $D : [a, b] \to \mathbb{R}$, defined by $D(x) = 1$ if $x$ is rational and $D(x) = 0$ if $x$ is irrational).

Let us prove that such a function is integrable, with $\int_a^b f = 0$. Assume for definiteness that $E$ is infinite (the case where $E$ is finite can be treated in an analogous way). Since it is countable, we can write $E = \{e_n : n \in \mathbb{N}\}$. Once $\varepsilon > 0$ has been fixed, we construct a gauge $\delta$ on $[a, b]$ as follows. If $x \notin E$, then we set $\delta(x) = 1$; if, instead, for a certain $n$ we have $x = e_n$, then we set
$$\delta(e_n) = \frac{\varepsilon}{2^{n+3} |f(e_n)|}.$$
Now let $\mathring{P} = \{(x_1, [a_0, a_1]), \ldots, (x_m, [a_{m-1}, a_m])\}$ be a $\delta$-fine tagged partition of $[a, b]$. By the way in which $f$ is defined, the associated Riemann sum becomes
$$S(f, \mathring{P}) = \sum_{\{1 \le j \le m \,:\, x_j \in E\}} f(x_j)(a_j - a_{j-1}).$$
Since $[a_{j-1}, a_j] \subseteq [x_j - \delta(x_j), x_j + \delta(x_j)]$, we have that $a_j - a_{j-1} \le 2\delta(x_j)$, and if $x_j$ is in $E$, it must be that $x_j = e_n$ for some $n \in \mathbb{N}$. Let $N$ be the largest such $n$. To any such $e_n$ there can, however, correspond one or two points $x_j$, so that we will have
$$\Big| \sum_{\{1 \le j \le m \,:\, x_j \in E\}} f(x_j)(a_j - a_{j-1}) \Big| \le 2 \sum_{n=0}^N |f(e_n)|\, 2\delta(e_n) = 4 \sum_{n=0}^N \frac{\varepsilon}{2^{n+3}} = \frac{\varepsilon}{2} \sum_{n=0}^N \Big( \frac{1}{2} \Big)^n = \frac{\varepsilon}{2} \cdot \frac{1 - (\frac{1}{2})^{N+1}}{1 - \frac{1}{2}} < \varepsilon.$$

This shows that $f$ is integrable on $[a, b]$ and that $\int_a^b f = 0$.

Let us see now that if $E$ is nonempty, then $f$ is not primitivable on $[a, b]$. Indeed, if it were, its integral function $\int_a^{\cdot} f$ would be one of its primitives. But the foregoing procedure shows that $\int_a^x f = 0$ for every $x \in [a, b]$. Then $f$ would have to be identically zero, being the derivative of a constant function, a contradiction.

7.7 Primitivation by Parts and by Substitution

We now present two methods frequently used for finding the primitives of certain functions. The first one is known as the method of "primitivation by parts."

Proposition 7.14 Let $F, G : I \to \mathbb{R}$ be two differentiable functions, and let $f, g$ be the corresponding derivatives. One has that $fG$ is primitivable on $I$ if and only if $Fg$ is, in which case a primitive of $fG$ is obtained by subtracting from $FG$ a primitive of $Fg$; we will briefly write
$$\int fG = FG - \int Fg.$$

Proof Since $F$ and $G$ are differentiable, then so is $FG$, and we have
$$(FG)' = fG + Fg.$$
Hence, $fG + Fg$ is primitivable on $I$ with primitive $FG$, and the conclusion follows from Proposition 7.12. $\square$

Example We would like to find a primitive of the function $h(x) = xe^x$. Define the following functions: $f(x) = e^x$, $G(x) = x$, and consequently $F(x) = e^x$, $g(x) = 1$. Applying the formula given by the foregoing proposition, we have
$$\int e^x x\, dx = e^x x - \int e^x\, dx = xe^x - e^x + c,$$
where $c$ stands, as usual, for an arbitrary constant.

As an immediate consequence of Proposition 7.14, we have the rule of "integration by parts":
$$\int_a^b fG = F(b)G(b) - F(a)G(a) - \int_a^b Fg.$$


Examples Applying the formula to the function $h(x) = xe^x$ of the previous example, we compute
$$\int_0^1 e^x x\, dx = e^1 \cdot 1 - e^0 \cdot 0 - \int_0^1 e^x\, dx = e - [e^x]_0^1 = e - (e^1 - e^0) = 1.$$
Note that we could have obtained the same result using the Fundamental Theorem; having already found earlier that a primitive of $h$ is given by $H(x) = xe^x - e^x$, we have that
$$\int_0^1 e^x x\, dx = H(1) - H(0) = (e - e) - (0 - 1) = 1.$$

Let us consider some additional examples. Let $h(x) = \sin^2 x$. With the obvious choice of the functions $f$ and $G$, we find
$$\begin{aligned}
\int \sin^2 x\, dx &= -\cos x \sin x + \int \cos^2 x\, dx \\
&= -\cos x \sin x + \int (1 - \sin^2 x)\, dx \\
&= x - \cos x \sin x - \int \sin^2 x\, dx,
\end{aligned}$$
from which we obtain
$$\int \sin^2 x\, dx = \frac{1}{2}(x - \cos x \sin x) + c.$$

Consider now the case of the function $h(x) = \ln x$, with $x > 0$. To apply the formula of primitivation by parts, we choose the functions $f(x) = 1$, $G(x) = \ln x$. In this way, we find
$$\int \ln x\, dx = x \ln x - \int \frac{1}{x}\, x\, dx = x \ln x - \int 1\, dx = x \ln x - x + c.$$

The second method we want to study is known as the method of "primitivation by substitution."

Proposition 7.15 Let $\varphi : I \to \mathbb{R}$ be a differentiable function and $f : \varphi(I) \to \mathbb{R}$ be a primitivable function on the interval $\varphi(I)$, with primitive $F$. Then the function $(f \circ \varphi)\varphi'$ is primitivable on $I$, and one of its primitives is given by $F \circ \varphi$. We will briefly write
$$\int (f \circ \varphi)\varphi' = \Big( \int f \Big) \circ \varphi.$$


Proof The function $F \circ \varphi$ is differentiable on $I$ and
$$(F \circ \varphi)' = (F' \circ \varphi)\varphi' = (f \circ \varphi)\varphi'.$$
It follows that $(f \circ \varphi)\varphi'$ is primitivable on $I$, with primitive $F \circ \varphi$. $\square$

As an example, we look for a primitive of the function $h(x) = xe^{x^2}$. Defining $\varphi(x) = x^2$, $f(t) = \frac{1}{2}e^t$ (it is advisable to use different letters to indicate the variables of $\varphi$ and $f$), we have that $h = (f \circ \varphi)\varphi'$. Since a primitive of $f$ is seen to be $F(t) = \frac{1}{2}e^t$, a primitive of $h$ is $F \circ \varphi$, i.e.,
$$\int xe^{x^2}\, dx = F(\varphi(x)) + c = \frac{1}{2}e^{x^2} + c.$$

The formula of primitivation by substitution is often written in the form
$$\int f(\varphi(x))\varphi'(x)\, dx = \Big[ \int f(t)\, dt \Big]_{t = \varphi(x)},$$
where, if $F$ is a primitive of $f$, the term on the right-hand side should be read as
$$\Big[ \int f(t)\, dt \Big]_{t = \varphi(x)} = \big[ F(t) + c \big]_{t = \varphi(x)} = F(\varphi(x)) + c.$$
Formally, there is a "change of variable" $t = \varphi(x)$, and the symbol $dt$ joins the game to replace $\varphi'(x)\, dx$ (the Leibniz notation $\frac{dt}{dx} = \varphi'(x)$ is a useful mnemonic rule).

Example To find a primitive of the function $h(x) = \frac{\ln x}{x}$, we can choose $\varphi(x) = \ln x$ and apply the formula
$$\int \frac{\ln x}{x}\, dx = \Big[ \int t\, dt \Big]_{t = \ln x} = \Big[ \frac{1}{2}t^2 + c \Big]_{t = \ln x} = \frac{1}{2}(\ln x)^2 + c.$$
In this case, writing $t = \ln x$, we have that the symbol $dt$ replaces $\frac{1}{x}\, dx$.
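A quick numerical check of the two primitives found by substitution: differentiating them (by a central finite difference, at arbitrary sample points) recovers the original integrands.

```python
import math

def finite_diff(F, x, h=1e-6):
    """Central finite-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# d/dx [ (1/2) e^{x^2} ] = x e^{x^2}
for x in [0.5, 1.2]:
    assert abs(finite_diff(lambda u: 0.5 * math.exp(u**2), x)
               - x * math.exp(x**2)) < 1e-4

# d/dx [ (1/2) (ln x)^2 ] = ln(x) / x
for x in [0.7, 2.0, 5.0]:
    assert abs(finite_diff(lambda u: 0.5 * math.log(u)**2, x)
               - math.log(x) / x) < 1e-6
```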

In this case, writing .t = ln x, we have that the symbol dt replaces . x1 dx. As a consequence of the preceding formulas, we have the rule of “integration by substitution”: .

a

b



f (ϕ(x))ϕ (x) dx =

-

ϕ(b)

f (t) dt . ϕ(a)


Indeed, if $F$ is a primitive of $f$ on $\varphi(I)$, applying the Fundamental Theorem twice, we have
$$\int_a^b (f \circ \varphi)\varphi' = (F \circ \varphi)(b) - (F \circ \varphi)(a) = F(\varphi(b)) - F(\varphi(a)) = \int_{\varphi(a)}^{\varphi(b)} f.$$

Example Taking the function $h(x) = xe^{x^2}$ defined previously, we have
$$\int_0^2 xe^{x^2}\, dx = \int_0^4 \frac{1}{2}e^t\, dt = \frac{1}{2}[e^t]_0^4 = \frac{e^4 - 1}{2}.$$
Clearly, the same result is obtainable directly by the Fundamental Theorem once we know that a primitive of $h$ is given by $H(x) = \frac{1}{2}e^{x^2}$. Indeed, we have
$$\int_0^2 xe^{x^2}\, dx = H(2) - H(0) = \frac{1}{2}e^4 - \frac{1}{2}e^0 = \frac{e^4 - 1}{2}.$$
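Both sides of the substitution $t = x^2$ can be compared numerically; the midpoint Riemann sum below is an illustrative stand-in for the integral.

```python
import math

def riemann_sum(f, a, b, m=20000):
    """Midpoint Riemann sum on a uniform partition."""
    h = (b - a) / m
    return sum(f(a + (j + 0.5) * h) * h for j in range(m))

# int_0^2 x e^{x^2} dx  and  int_0^4 (1/2) e^t dt  both equal (e^4 - 1)/2
exact = (math.exp(4.0) - 1.0) / 2.0
lhs = riemann_sum(lambda x: x * math.exp(x**2), 0.0, 2.0)
rhs = riemann_sum(lambda t: 0.5 * math.exp(t), 0.0, 4.0)
assert abs(lhs - exact) < 1e-3
assert abs(rhs - exact) < 1e-3
```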

When the function $\varphi : I \to \varphi(I)$ is invertible, we can also write
$$\int f(t)\, dt = \Big[ \int f(\varphi(x))\varphi'(x)\, dx \Big]_{x = \varphi^{-1}(t)},$$
with the corresponding formula for the integral:
$$\int_\alpha^\beta f(t)\, dt = \int_{\varphi^{-1}(\alpha)}^{\varphi^{-1}(\beta)} f(\varphi(x))\varphi'(x)\, dx.$$

Example Looking for a primitive of $f(t) = \sqrt{1 - t^2}$, with $t \in \,]-1, 1[\,$, we may consider the function $\varphi : \,]0, \pi[\, \to \,]-1, 1[$ defined as $\varphi(x) = \cos x$, so that
$$f(\varphi(x))\varphi'(x) = \sqrt{1 - \cos^2 x}\,(-\sin x) = -\sin^2 x,$$
since $\sin x > 0$ when $x \in \,]0, \pi[\,$. Therefore, we can write
$$\begin{aligned}
\int \sqrt{1 - t^2}\, dt &= \Big[ \int -\sin^2 x\, dx \Big]_{x = \arccos t} \\
&= \Big[ -\frac{1}{2}(x - \cos x \sin x) + c \Big]_{x = \arccos t} \\
&= -\frac{1}{2}\Big( \arccos t - t\sqrt{1 - t^2} \Big) + c.
\end{aligned}$$
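One can confirm numerically that the primitive just found differentiates back to $\sqrt{1 - t^2}$ (the sample points below are arbitrary interior points of $]-1, 1[$).

```python
import math

def finite_diff(F, t, h=1e-6):
    """Central finite-difference approximation of F'(t)."""
    return (F(t + h) - F(t - h)) / (2 * h)

# F(t) = -(1/2)(arccos t - t sqrt(1 - t^2)) should satisfy F'(t) = sqrt(1 - t^2)
F = lambda t: -0.5 * (math.acos(t) - t * math.sqrt(1 - t**2))
for t in [-0.5, 0.0, 0.6]:
    assert abs(finite_diff(F, t) - math.sqrt(1 - t**2)) < 1e-6
```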

7.8 The Taylor Formula with Integral Form Remainder

Here we have the "Taylor formula with integral form of the remainder."

Theorem 7.16 (Taylor Theorem—II) Let $x \neq x_0$ be two points of an interval $I$ and $f : I \to \mathbb{R}$ be $n + 1$ times differentiable. Then
$$f(x) = p_n(x) + \frac{1}{n!} \int_{x_0}^x f^{(n+1)}(u)(x - u)^n\, du,$$
where $p_n(x)$ is the $n$th-order Taylor polynomial associated with $f$ at $x_0$.

Proof Let us first prove by induction that if $f$ is $n + 1$ times differentiable, then the function $g_n(u) = f^{(n+1)}(u)(x - u)^n$ is primitivable (here $x$ is fixed). If $n = 0$, then we have that $g_0(u) = f'(u)$, hence the proposition is true. Assume now that the proposition is true for some $n \in \mathbb{N}$. Then, if $f$ is $n + 2$ times differentiable,
$$D_u\big( f^{(n+1)}(u)(x - u)^{n+1} \big) = f^{(n+2)}(u)(x - u)^{n+1} - (n + 1)f^{(n+1)}(u)(x - u)^n,$$
i.e.,
$$g_{n+1}(u) = (n + 1)g_n(u) + D_u\big( f^{(n+1)}(u)(x - u)^{n+1} \big).$$
Since we know that $g_n$ is primitivable, the preceding formula tells us that $g_{n+1}$ is too, since it is the sum of two primitivable functions. We have thus proved the assertion.

Let us now prove the formula by induction. If $n = 0$, then, by the Fundamental Theorem,
$$f(x) = f(x_0) + \int_{x_0}^x f'(u)\, du = p_0(x) + \frac{1}{0!} \int_{x_0}^x f^{(0+1)}(u)(x - u)^0\, du,$$
hence the formula is true. Assume now that the formula holds true for some $n \in \mathbb{N}$, and let $f$ be $n + 2$ times differentiable. Then
$$\begin{aligned}
f(x) - p_{n+1}(x) &= f(x) - \Big( p_n(x) + \frac{1}{(n+1)!} f^{(n+1)}(x_0)(x - x_0)^{n+1} \Big) \\
&= \frac{1}{n!} \int_{x_0}^x f^{(n+1)}(u)(x - u)^n\, du - \frac{1}{(n+1)!} f^{(n+1)}(x_0)(x - x_0)^{n+1} \\
&= \frac{1}{n!} \Big( \int_{x_0}^x f^{(n+1)}(u)(x - u)^n\, du - \frac{1}{n+1} f^{(n+1)}(x_0)(x - x_0)^{n+1} \Big).
\end{aligned}$$


Integrating by parts (we know that $g_n$ and $g_{n+1}$ are primitivable),
$$\begin{aligned}
\int_{x_0}^x f^{(n+1)}(u)(x - u)^n\, du
&= \Big[ -f^{(n+1)}(u)\,\frac{(x - u)^{n+1}}{n+1} \Big]_{u = x_0}^{u = x} + \int_{x_0}^x f^{(n+2)}(u)\,\frac{(x - u)^{n+1}}{n+1}\, du \\
&= \frac{1}{n+1} f^{(n+1)}(x_0)(x - x_0)^{n+1} + \frac{1}{n+1} \int_{x_0}^x f^{(n+2)}(u)(x - u)^{n+1}\, du,
\end{aligned}$$
and substituting,
$$\begin{aligned}
f(x) - p_{n+1}(x) &= \frac{1}{n!} \cdot \frac{1}{n+1} \int_{x_0}^x f^{(n+2)}(u)(x - u)^{n+1}\, du \\
&= \frac{1}{(n+1)!} \int_{x_0}^x f^{(n+2)}(u)(x - u)^{n+1}\, du.
\end{aligned}$$
Hence, the formula holds also for $n + 1$, and the proof is complete. $\square$
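The formula can be tested numerically. In this sketch the choices $f = \exp$, $x_0 = 0$, $x = 1$, $n = 2$ are illustrative assumptions, and the remainder integral is approximated by a midpoint sum.

```python
import math

def riemann_sum(f, a, b, m=5000):
    """Midpoint Riemann sum on a uniform partition."""
    h = (b - a) / m
    return sum(f(a + (j + 0.5) * h) * h for j in range(m))

# f = exp, so every derivative is exp, and p_2(x) = 1 + x + x^2/2 at x0 = 0
x0, x, n = 0.0, 1.0, 2
p_n = 1 + x + x**2 / 2
remainder = riemann_sum(lambda u: math.exp(u) * (x - u)**n, x0, x) / math.factorial(n)
assert abs(p_n + remainder - math.exp(x)) < 1e-6
```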

7.9 The Cauchy Criterion

We have already encountered the Cauchy criterion for sequences in complete metric spaces and for the limit of functions (Theorem 4.15). It is not surprising that a similar criterion also holds for integrability, which can be thought of as a kind of "limit" of the Riemann sums.

Theorem 7.17 (Cauchy Criterion) A function $f : I \to \mathbb{R}$ is integrable if and only if for every $\varepsilon > 0$ there is a gauge $\delta : I \to \mathbb{R}$ such that, taking two $\delta$-fine tagged partitions $\mathring{P}, \mathring{Q}$ of $I$, we have
$$|S(f, \mathring{P}) - S(f, \mathring{Q})| \le \varepsilon.$$

Proof Let us first prove the necessary condition. Let $f$ be integrable on $I$, with integral $J$, and fix $\varepsilon > 0$. Then there is a gauge $\delta$ on $I$ such that, for every $\delta$-fine tagged partition $\mathring{P}$ of $I$, we have
$$|S(f, \mathring{P}) - J| \le \frac{\varepsilon}{2}.$$
If $\mathring{P}$ and $\mathring{Q}$ are two $\delta$-fine tagged partitions, then we have
$$|S(f, \mathring{P}) - S(f, \mathring{Q})| \le |S(f, \mathring{P}) - J| + |J - S(f, \mathring{Q})| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$


Let us now prove the sufficiency. Once the stated condition is assumed, let us choose $\varepsilon = 1$, so that we can find a gauge $\delta_1$ on $I$ such that
$$|S(f, \mathring{P}) - S(f, \mathring{Q})| \le 1$$
whenever $\mathring{P}$ and $\mathring{Q}$ are $\delta_1$-fine tagged partitions of $I$. Taking $\varepsilon = 1/2$, we can find a gauge $\delta_2$ on $I$, which we can choose so that $\delta_2(x) \le \delta_1(x)$ for every $x \in I$, such that
$$|S(f, \mathring{P}) - S(f, \mathring{Q})| \le \frac{1}{2}$$
whenever $\mathring{P}$ and $\mathring{Q}$ are $\delta_2$-fine tagged partitions of $I$. We can continue this way, choosing $\varepsilon = 1/k$, with $k$ a positive integer, and find a sequence $(\delta_k)_k$ of gauges on $I$ such that, for every $x \in I$,
$$\delta_1(x) \ge \delta_2(x) \ge \cdots \ge \delta_k(x) \ge \delta_{k+1}(x) \ge \ldots,$$
and such that
$$|S(f, \mathring{P}) - S(f, \mathring{Q})| \le \frac{1}{k}$$
whenever $\mathring{P}$ and $\mathring{Q}$ are $\delta_k$-fine tagged partitions of $I$.

Let us fix, for every $k$, a $\delta_k$-fine tagged partition $\mathring{P}_k$ of $I$. We want to show that $(S(f, \mathring{P}_k))_k$ is a Cauchy sequence of real numbers. Let $\bar\varepsilon > 0$ be given, and choose a positive integer $N$ such that $N\bar\varepsilon \ge 1$. If $k_1 \ge N$ and $k_2 \ge N$, assuming, for instance, $k_2 \ge k_1$, then we have
$$|S(f, \mathring{P}_{k_1}) - S(f, \mathring{P}_{k_2})| \le \frac{1}{k_1} \le \frac{1}{N} \le \bar\varepsilon.$$
This proves that $(S(f, \mathring{P}_k))_k$ is a Cauchy sequence; hence, it has a finite limit, which we denote by $J$.

Now we show that $J$ is just the integral of $f$ on $I$. Fix $\varepsilon > 0$, let $n$ be a positive integer such that $n\varepsilon \ge 1$, and consider the gauge $\delta = \delta_n$. For every $\delta$-fine tagged partition $\mathring{P}$ of $I$ and for every $k \ge n$, we have
$$|S(f, \mathring{P}) - S(f, \mathring{P}_k)| \le \frac{1}{n} \le \varepsilon.$$
Letting $k$ tend to $+\infty$, we have that $S(f, \mathring{P}_k)$ tends to $J$, and consequently
$$|S(f, \mathring{P}) - J| \le \varepsilon.$$
The proof is thus completed. $\square$

7.10 Integrability on Subintervals

In this section we will see that if a function is integrable on an interval $I = [a, b]$, then it is also integrable on any of its subintervals. In particular, it is possible to consider its integral function. Moreover, we will see that if a function is integrable on two contiguous intervals, then it is also integrable on their union.

In what follows, it will be useful to consider the so-called "tagged subpartitions" of the interval $I$. A tagged subpartition is a set of the type
$$\Xi = \{(\xi_j, [\alpha_j, \beta_j]) : j = 1, \ldots, m\},$$
where the intervals $[\alpha_j, \beta_j]$ are nonoverlapping, but not necessarily contiguous, and $\xi_j \in [\alpha_j, \beta_j]$ for every $j = 1, \ldots, m$. For a tagged subpartition $\Xi$, it is still meaningful to consider the associated Riemann sum
$$S(f, \Xi) = \sum_{j=1}^m f(\xi_j)(\beta_j - \alpha_j).$$
Moreover, given a gauge $\delta$ on $I$, the tagged subpartition $\Xi$ is $\delta$-fine if, for every $j$, we have
$$\xi_j - \alpha_j \le \delta(\xi_j) \quad \text{and} \quad \beta_j - \xi_j \le \delta(\xi_j).$$

Let us state the property of "additivity on subintervals."

Theorem 7.18 Given three points $a < c < b$, the function $f : [a, b] \to \mathbb{R}$ is integrable on $[a, b]$ if and only if it is integrable on both $[a, c]$ and $[c, b]$. In this case,
$$\int_a^b f = \int_a^c f + \int_c^b f.$$

Proof We denote by $f_1 : [a, c] \to \mathbb{R}$ and $f_2 : [c, b] \to \mathbb{R}$ the two restrictions of $f$ to $[a, c]$ and $[c, b]$, respectively.

Let us first assume that $f$ is integrable on $[a, b]$ and prove that $f_1$ is integrable on $[a, c]$ by the Cauchy criterion. Fix $\varepsilon > 0$; since $f$ is integrable on $[a, b]$, it verifies the Cauchy condition, and hence there is a gauge $\delta : [a, b] \to \mathbb{R}$ such that
$$|S(f, \mathring{P}) - S(f, \mathring{Q})| \le \varepsilon$$
for every two $\delta$-fine tagged partitions $\mathring{P}, \mathring{Q}$ of $[a, b]$. The restrictions of $\delta$ to $[a, c]$ and $[c, b]$ are two gauges $\delta_1 : [a, c] \to \mathbb{R}$ and $\delta_2 : [c, b] \to \mathbb{R}$. Now let $\mathring{P}_1$ and $\mathring{Q}_1$ be two $\delta_1$-fine tagged partitions of $[a, c]$. Let us fix a $\delta_2$-fine tagged partition $\mathring{P}_2$ of


$[c, b]$, and consider the tagged partition $\mathring{P}$ of $[a, b]$ made by $\mathring{P}_1 \cup \mathring{P}_2$ and the tagged partition $\mathring{Q}$ of $[a, b]$ made by $\mathring{Q}_1 \cup \mathring{P}_2$. It is clear that both $\mathring{P}$ and $\mathring{Q}$ are $\delta$-fine. Moreover, we have
$$|S(f_1, \mathring{P}_1) - S(f_1, \mathring{Q}_1)| = |S(f, \mathring{P}) - S(f, \mathring{Q})| \le \varepsilon;$$
the Cauchy criterion thus applies, so that $f_1$ is integrable on $[a, c]$. Analogously it can be proved that $f_2$ is integrable on $[c, b]$.

Suppose now that $f_1$ is integrable on $[a, c]$ and $f_2$ on $[c, b]$. Let us then prove that $f$ is integrable on $[a, b]$ with integral $\int_a^c f + \int_c^b f$. Once $\varepsilon > 0$ is fixed, there is a gauge $\delta_1$ on $[a, c]$ and a gauge $\delta_2$ on $[c, b]$ such that, for every $\delta_1$-fine tagged partition $\mathring{P}_1$ of $[a, c]$, we have
$$\Big| S(f_1, \mathring{P}_1) - \int_a^c f \Big| \le \frac{\varepsilon}{2},$$
and for every $\delta_2$-fine tagged partition $\mathring{P}_2$ of $[c, b]$, we have
$$\Big| S(f_2, \mathring{P}_2) - \int_c^b f \Big| \le \frac{\varepsilon}{2}.$$

We now define a gauge $\delta$ on $[a, b]$ as follows:
$$\delta(x) = \begin{cases} \min\big\{ \delta_1(x), \frac{c - x}{2} \big\} & \text{if } a \le x < c, \\[2pt] \min\{ \delta_1(c), \delta_2(c) \} & \text{if } x = c, \\[2pt] \min\big\{ \delta_2(x), \frac{x - c}{2} \big\} & \text{if } c < x \le b. \end{cases}$$


Let
$$\mathring{P} = \{(x_1, [a_0, a_1]), \ldots, (x_m, [a_{m-1}, a_m])\}$$
be a $\delta$-fine tagged partition of $[a, b]$. Notice that, because of the particular choice of the gauge $\delta$, there must be a certain $\bar{j}$ for which $x_{\bar{j}} = c$. Hence, we have
$$\begin{aligned}
S(f, \mathring{P}) = {} & f(x_1)(a_1 - a_0) + \cdots + f(x_{\bar{j}-1})(a_{\bar{j}-1} - a_{\bar{j}-2}) \\
& + f(c)(c - a_{\bar{j}-1}) + f(c)(a_{\bar{j}} - c) \\
& + f(x_{\bar{j}+1})(a_{\bar{j}+1} - a_{\bar{j}}) + \cdots + f(x_m)(a_m - a_{m-1}).
\end{aligned}$$
Let us set
$$\mathring{P}_1 = \{(x_1, [a_0, a_1]), \ldots, (x_{\bar{j}-1}, [a_{\bar{j}-2}, a_{\bar{j}-1}]), (c, [a_{\bar{j}-1}, c])\}$$
and
$$\mathring{P}_2 = \{(c, [c, a_{\bar{j}}]), (x_{\bar{j}+1}, [a_{\bar{j}}, a_{\bar{j}+1}]), \ldots, (x_m, [a_{m-1}, a_m])\}$$
(but in case $a_{\bar{j}-1}$ or $a_{\bar{j}}$ coincides with $c$, we will have to take away an element from one of the two). Then $\mathring{P}_1$ is a $\delta_1$-fine tagged partition of $[a, c]$ and $\mathring{P}_2$ is a $\delta_2$-fine tagged partition of $[c, b]$, and we have
$$S(f, \mathring{P}) = S(f_1, \mathring{P}_1) + S(f_2, \mathring{P}_2).$$
Consequently,
$$\Big| S(f, \mathring{P}) - \Big( \int_a^c f + \int_c^b f \Big) \Big| \le \Big| S(f_1, \mathring{P}_1) - \int_a^c f \Big| + \Big| S(f_2, \mathring{P}_2) - \int_c^b f \Big| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,$$

which completes the proof. $\square$

Example Consider the function $f : [0, 2] \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} 3 & \text{if } x \in [0, 1], \\ 5 & \text{if } x \in \,]1, 2]. \end{cases}$$
Since $f$ is constant on $[0, 1]$ with value 3, it is integrable there, and $\int_0^1 f = 3$. Moreover, on the interval $[1, 2]$ the function $f$ differs from the constant 5 at only one point: the function $g(x) = f(x) - 5$ is zero except for $x = 1$.


As we have shown at the end of Sect. 7.6, $g$ is integrable on $[1, 2]$ with zero integral, and so, since $f(x) = g(x) + 5$, $f$ too is integrable, and
$$\int_1^2 f(x)\, dx = \int_1^2 g(x)\, dx + \int_1^2 5\, dx = 0 + 5 = 5.$$
In conclusion,
$$\int_0^2 f(x)\, dx = \int_0^1 f(x)\, dx + \int_1^2 f(x)\, dx = 3 + 5 = 8.$$

It is easy to see from the theorem just proved that if a function is integrable on an interval $I$, it is still integrable on any subinterval of $I$. Moreover, we have the following corollary.

Corollary 7.19 If $f : I \to \mathbb{R}$ is integrable, then for any three arbitrarily chosen points $u, v, w$ in $I$ one has
$$\int_u^w f = \int_u^v f + \int_v^w f.$$

Proof The case $u < v < w$ follows immediately from the previous theorem. The other possible cases are easily obtained using the conventions on integrals with exchanged or equal endpoints. $\square$

7.11 R-Integrable and Continuous Functions

Let us introduce an important class of integrable functions. As usual, $I = [a, b]$.

Definition 7.20 We say that an integrable function $f : I \to \mathbb{R}$ is "R-integrable" (or "integrable according to Riemann") if among all possible gauges $\delta : I \to \mathbb{R}$ that verify the definition of integrability it is always possible to choose one that is constant on $I$.

We can immediately see, repeating the proofs, that the set of R-integrable functions is a vector subspace of the space of integrable functions. Moreover, the following Cauchy criterion holds for R-integrable functions, where one considers only constant gauges.

Theorem 7.21 A function $f : I \to \mathbb{R}$ is R-integrable if and only if for every $\varepsilon > 0$ there is a $\delta > 0$ (i.e., a constant gauge $\delta$) such that, taking two $\delta$-fine tagged partitions $\mathring{P}, \mathring{Q}$ of $I$, one has
$$|S(f, \mathring{P}) - S(f, \mathring{Q})| \le \varepsilon.$$

We now want to establish the integrability of continuous functions. Indeed, in the following two theorems we will prove that they are both R-integrable and primitivable. To simplify the expressions to come, we will denote by $\mu(K)$ the length of a bounded interval $K$. In particular,
$$\mu([a, b]) = b - a.$$
It will be useful, moreover, to set $\mu(\varnothing) = 0$. Here is the first theorem.

Theorem 7.22 Every continuous function $f : I \to \mathbb{R}$ is R-integrable.

Proof Fix $\varepsilon > 0$. Since $f$ is continuous on a compact interval, by Heine's Theorem 4.12 it is uniformly continuous there, so that there is a $\delta > 0$ such that, for $x$ and $x'$ in $I$,
$$|x - x'| \le 2\delta \;\Rightarrow\; |f(x) - f(x')| \le \frac{\varepsilon}{b - a}.$$
We will verify the Cauchy criterion for R-integrability by considering the constant gauge $\delta$ thus found. Let
$$\mathring{P} = \{(x_1, [a_0, a_1]), \ldots, (x_m, [a_{m-1}, a_m])\}$$
and
$$\mathring{Q} = \{(\tilde{x}_1, [\tilde{a}_0, \tilde{a}_1]), \ldots, (\tilde{x}_{\tilde{m}}, [\tilde{a}_{\tilde{m}-1}, \tilde{a}_{\tilde{m}}])\}$$
be two $\delta$-fine tagged partitions of $I$. Let us define the intervals (perhaps empty or reduced to a single point)
$$I_{j,k} = [a_{j-1}, a_j] \cap [\tilde{a}_{k-1}, \tilde{a}_k].$$
Then we have
$$a_j - a_{j-1} = \sum_{k=1}^{\tilde{m}} \mu(I_{j,k}), \qquad \tilde{a}_k - \tilde{a}_{k-1} = \sum_{j=1}^{m} \mu(I_{j,k}),$$


and, if $I_{j,k}$ is nonempty, $|x_j - \tilde{x}_k| \le 2\delta$. Hence,
$$\begin{aligned}
|S(f, \mathring{P}) - S(f, \mathring{Q})|
&= \Big| \sum_{j=1}^m \sum_{k=1}^{\tilde{m}} f(x_j)\mu(I_{j,k}) - \sum_{k=1}^{\tilde{m}} \sum_{j=1}^m f(\tilde{x}_k)\mu(I_{j,k}) \Big| \\
&= \Big| \sum_{j=1}^m \sum_{k=1}^{\tilde{m}} [f(x_j) - f(\tilde{x}_k)]\,\mu(I_{j,k}) \Big| \\
&\le \sum_{j=1}^m \sum_{k=1}^{\tilde{m}} |f(x_j) - f(\tilde{x}_k)|\,\mu(I_{j,k}) \\
&\le \sum_{j=1}^m \sum_{k=1}^{\tilde{m}} \frac{\varepsilon}{b - a}\,\mu(I_{j,k}) = \varepsilon.
\end{aligned}$$
Therefore, the Cauchy criterion applies, and the proof is completed. $\square$

Here is the second theorem.

Theorem 7.23 Every continuous function $f : [a, b] \to \mathbb{R}$ is primitivable.

Proof Since it is continuous, $f$ is integrable on every subinterval of $[a, b]$, so we can consider its integral function $\int_a^{\cdot} f$. Let us prove that it is a primitive of $f$, i.e., that if a point $x_0$ is taken in $[a, b]$, the derivative of $\int_a^{\cdot} f$ at $x_0$ coincides with $f(x_0)$.

We first consider the case where $x_0 \in \,]a, b[\,$. We want to prove that
$$\lim_{h \to 0} \frac{1}{h} \Big( \int_a^{x_0 + h} f - \int_a^{x_0} f \Big) = f(x_0).$$
Equivalently, since
$$\frac{1}{h} \Big( \int_a^{x_0 + h} f - \int_a^{x_0} f \Big) - f(x_0) = \frac{1}{h} \int_{x_0}^{x_0 + h} (f(x) - f(x_0))\, dx,$$
we will show that
$$\lim_{h \to 0} \frac{1}{h} \int_{x_0}^{x_0 + h} (f(x) - f(x_0))\, dx = 0.$$
Fix $\varepsilon > 0$. Since $f$ is continuous at $x_0$, there is a $\delta > 0$ such that, for every $x \in [a, b]$ satisfying $|x - x_0| \le \delta$, one has $|f(x) - f(x_0)| \le \varepsilon$. Taking $h$ such that $0 < |h| \le \delta$, we distinguish two cases. If $0 < h \le \delta$, then
$$\Big| \frac{1}{h} \int_{x_0}^{x_0 + h} (f(x) - f(x_0))\, dx \Big| \le \frac{1}{h} \int_{x_0}^{x_0 + h} |f(x) - f(x_0)|\, dx \le \frac{1}{h} \int_{x_0}^{x_0 + h} \varepsilon\, dx = \varepsilon;$$
on the other hand, if $-\delta \le h < 0$, then we have
$$\Big| \frac{1}{h} \int_{x_0}^{x_0 + h} (f(x) - f(x_0))\, dx \Big| \le \frac{1}{-h} \int_{x_0 + h}^{x_0} |f(x) - f(x_0)|\, dx \le \frac{1}{-h} \int_{x_0 + h}^{x_0} \varepsilon\, dx = \varepsilon,$$
and the proof is completed when $x_0 \in \,]a, b[\,$. In case $x_0 = a$ or $x_0 = b$, we proceed analogously, considering the right or the left derivative, respectively. $\square$

Notice that it is not always possible to find an elementary expression for the primitive of a continuous function. As an example, the function $f(x) = \sin(x^2)$, being continuous, is primitivable, but there is no elementary formula defining any of its primitives. By "elementary formula" we mean an analytic formula in which only polynomials, exponentials, logarithms, and trigonometric functions appear.

Let us now prove that the Dirichlet function $D$ is not R-integrable on any interval $[a, b]$ (remember that $D(x)$ is 1 on the rationals and 0 on the irrationals). We will show that the Cauchy criterion is not verified. Take $\delta > 0$ constant, and let $a = a_0 < a_1 < \cdots < a_m = b$ be such that, for every $j = 1, \ldots, m$, one has $a_j - a_{j-1} \le \delta$. In every interval $[a_{j-1}, a_j]$ we can choose a rational number $x_j$ and an irrational number $\tilde{x}_j$. The two tagged partitions

and the proof is completed when .x0 ∈ ]a, b[ . In case .x0 = a or .x0 = b, we proceed analogously, considering the right or the left derivative, respectively.  Notice that it is not always possible to find an elementary expression for the primitive of a continuous function. As an example, the function .f (x) = sin(x 2 ), being continuous, is primitivable, but there is no elementary formula defining any of its primitives. By “elementary formula” we mean an analytic formula where only polynomials, exponentials, logarithms, and trigonometric functions appear. Let us now prove that the Dirichlet function .D is not R-integrable on any interval .[a, b] (remember that .D(x) is 1 on the rationals and 0 on the irrationals). We will show that the Cauchy criterion is not verified. Take .δ > 0 constant, and let .a = a0 < a1 < · · · < am = b be such that, for every .j = 1, . . . , m, one has .aj − aj −1 ≤ δ. In every interval .[aj −1 , aj ] we can choose a rational number .xj and an irrational number .x˜j . The two tagged partitions ˚ = {(x1 , [a0 , a1 ]), . . . , (xm , [am−1 , am ])} , P

.

˚ = {(x˜1, [a0 , a1 ]), . . . , (x˜m , [am−1 , am ])} Q

.

are .δ-fine, and, by the very definition of .D, we have ˚ − S(D, Q) ˚ = S(D, P)

m 

.

j =1

[D(xj ) − D(x˜j )](aj − aj −1 ) =

m  j =1

(aj − aj −1 ) = b − a .

Since $\delta > 0$ was taken arbitrarily, the Cauchy criterion for R-integrability does not hold, so that $D$ cannot be R-integrable on $[a, b]$.

7.12 Two Theorems Involving Limits

Let $I = [a, b]$ and $f : I \to \mathbb{R}$ be a continuous function; recall that the integral function $\int_a^{\cdot} f$ is a primitive of $f$, so it is surely continuous on $I$. It is then possible to define the map $\Phi : C(I, \mathbb{R}) \to C(I, \mathbb{R})$ as
$$[\Phi(f)](x) = \int_a^x f.$$
Taking $f, g \in C(I, \mathbb{R})$, for every $x \in [a, b]$ one has that
$$|[\Phi(f)](x) - [\Phi(g)](x)| = \Big| \int_a^x (f - g) \Big| \le \int_a^x |f - g| \le \int_a^b |f - g| \le (b - a)\|f - g\|_\infty,$$
whence
$$\|\Phi(f) - \Phi(g)\|_\infty \le (b - a)\|f - g\|_\infty.$$
This implies that $\Phi$ is a continuous function. We will use this fact in the following two theorems involving limits.

We consider the situation where a sequence of continuous functions $(f_n)_n$ converges pointwise to a function $f$, i.e., for every $x \in I$,
$$\lim_n f_n(x) = f(x).$$
The question is whether $f$ is integrable on $I$, with
$$\int_I f = \lim_n \int_I f_n,$$
i.e., whether
$$\int_I \lim_n f_n = \lim_n \int_I f_n.$$
In other words, we wonder whether it is possible to commute the operations of integral and limit.


Example Let us first show that in some cases the answer can be no. Consider the functions $f_n : [0, \pi] \to \mathbb{R}$, with $n = 1, 2, \ldots$, defined by
$$f_n(x) = \begin{cases} n \sin(nx) & \text{if } x \in [0, \frac{\pi}{n}], \\ 0 & \text{otherwise}. \end{cases}$$
For every $x \in [0, \pi]$ we have $\lim_n f_n(x) = 0$, whereas
$$\int_0^\pi f_n(x)\, dx = \int_0^{\pi/n} n \sin(nx)\, dx = \int_0^\pi \sin(t)\, dt = 2.$$
Hence, in this case,
$$\int_0^\pi \lim_n f_n = 0 \neq 2 = \lim_n \int_0^\pi f_n.$$
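This counterexample is easy to reproduce numerically; the midpoint Riemann sum below is an assumed stand-in for the integrals.

```python
import math

def riemann_sum(f, a, b, m=200000):
    """Midpoint Riemann sum on a uniform partition."""
    h = (b - a) / m
    return sum(f(a + (j + 0.5) * h) * h for j in range(m))

def f_n(n):
    return lambda x: n * math.sin(n * x) if x <= math.pi / n else 0.0

# Every integral equals 2 ...
for n in [1, 5, 20]:
    assert abs(riemann_sum(f_n(n), 0.0, math.pi) - 2.0) < 1e-2

# ... yet f_n(x) -> 0 pointwise: for n large enough, pi/n < x and f_n(x) = 0
assert f_n(100)(0.5) == 0.0
```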

In the following theorem, which will be generalized later (Theorem 9.13), the answer to the foregoing question is positive, provided we assume the convergence to be uniform.

Theorem 7.24 Let $(f_n)_n$ be a uniformly convergent sequence in $C([a, b], \mathbb{R})$. Then
$$\int_a^b \lim_n f_n = \lim_n \int_a^b f_n.$$

Proof Let $\lim_n f_n = f : I \to \mathbb{R}$. Since the convergence is uniform, we know that $f \in C(I, \mathbb{R})$. Moreover, since $\Phi$ is continuous, we have that $\lim_n \Phi(f_n) = \Phi(f)$, i.e.,
$$\lim_n [\Phi(f_n)](x) = [\Phi(f)](x), \quad \text{uniformly in } x \in [a, b].$$
In particular, taking $x = b$,
$$\lim_n \int_a^b f_n = \int_a^b f,$$
which is what we wanted to prove. $\square$

In the second theorem, an analogous question concerning the possibility of commuting the operations of derivative and limit is analyzed.

In the second theorem, an analogous question concerning the possibility of commuting the operations of derivative and limit is analyzed.


Theorem 7.25 Let $x_0 \in I$, $y_0 \in \mathbb{R}$, $(f_n)_n$ be a sequence in $C^1(I,\mathbb{R})$, and $g \in C(I,\mathbb{R})$ be such that

$$\lim_n f_n(x_0) = y_0 \qquad \text{and} \qquad \lim_n f_n' = g \ \text{ uniformly on } I.$$

Then $(f_n)_n$ converges uniformly to some function $f$. Moreover, $f \in C^1(I,\mathbb{R})$ and $f' = g$. Consequently, we can write

$$\frac{d}{dx}\Big(\lim_n f_n\Big)(x) = \lim_n \frac{d}{dx} f_n(x).$$

Proof Let us define the function $f : I \to \mathbb{R}$ as

$$f(x) = y_0 + \int_{x_0}^x g(t)\,dt.$$

Since $g$ is continuous, the function $f$ is differentiable and $f'(x) = g(x)$ for every $x \in I$. In particular, $f \in C^1(I,\mathbb{R})$. The proof will be completed by showing that $(f_n)_n$ converges uniformly to $f$. By the Fundamental Theorem, for every $n \in \mathbb{N}$ and $x \in I$ we can write

$$f_n(x) = f_n(x_0) + \int_{x_0}^x f_n'(t)\,dt,$$

i.e.,

$$f_n(x) = f_n(x_0) + [\Phi(f_n')](x) - [\Phi(f_n')](x_0).$$

Since $\lim_n f_n' = g$ in $C(I,\mathbb{R})$, we have that $\lim_n \Phi(f_n') = \Phi(g)$ in $C(I,\mathbb{R})$, i.e.,

$$\lim_n [\Phi(f_n')](x) = [\Phi(g)](x), \qquad \text{uniformly in } x \in I.$$

Hence, since also $\lim_n f_n(x_0) = y_0$, we have that

$$\lim_n f_n(x) = y_0 + [\Phi(g)](x) - [\Phi(g)](x_0) = y_0 + \int_{x_0}^x g(t)\,dt, \qquad \text{uniformly in } x \in I.$$

We have thus proved that $(f_n)_n$ converges uniformly to $f$. □




7.13 Integration on Noncompact Intervals

We begin by considering a function $f : [a,b[\, \to \mathbb{R}$, where $b \le +\infty$. Assume that $f$ is integrable on every compact interval of the type $[a,c]$, with $c \in\, ]a,b[\,$. This happens, for instance, when $f$ is continuous on $[a,b[\,$.

Definition 7.26 We say that a function $f : [a,b[\, \to \mathbb{R}$ is "integrable" if $f$ is integrable on $[a,c]$ for every $c \in\, ]a,b[\,$, and the limit

$$\lim_{c \to b^-} \int_a^c f$$

exists and is finite. In that case, the preceding limit is called the "integral" of $f$ on $[a,b[\,$, and it is denoted by $\int_a^b f$, or by $\int_a^b f(x)\,dx$.

In particular, if $b = +\infty$, then we will write $\int_a^{+\infty} f$, or $\int_a^{+\infty} f(x)\,dx$.

Examples Let $a > 0$; it is readily seen that the function $f : [a,+\infty[\, \to \mathbb{R}$, defined by $f(x) = x^{-\alpha}$, is integrable if and only if $\alpha > 1$, in which case we have

$$\int_a^{+\infty} \frac{dx}{x^\alpha} = \frac{a^{1-\alpha}}{\alpha - 1}.$$

Consider now the case $a < b < +\infty$. It can be verified that the function $f : [a,b[\, \to \mathbb{R}$, defined by $f(x) = (b-x)^{-\beta}$, is integrable if and only if $\beta < 1$, in which case we have

$$\int_a^b \frac{dx}{(b-x)^\beta} = \frac{(b-a)^{1-\beta}}{1-\beta}.$$

We also say that the integral $\int_a^b f$ converges if the function $f$ is integrable on $[a,b[\,$, i.e., when the limit $\lim_{c\to b^-} \int_a^c f$ exists and is finite. If the limit does not exist, we say the integral is undetermined. If it exists and equals $+\infty$ or $-\infty$, we say that the integral diverges to $+\infty$ or to $-\infty$, respectively.
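The first of the two formulas above can be verified numerically: the proper integrals $\int_a^c x^{-\alpha}\,dx$ have a closed form, and letting $c$ grow shows the limit. The sketch below is not from the text and uses the sample values $a = 2$, $\alpha = 3$:

```python
def truncated(a, alpha, c):
    # exact value of the proper integral of x**(-alpha) over [a, c]
    return (c ** (1 - alpha) - a ** (1 - alpha)) / (1 - alpha)

a, alpha = 2.0, 3.0
limit = a ** (1 - alpha) / (alpha - 1)  # the claimed value a^(1-alpha)/(alpha-1)
for c in (10.0, 1e3, 1e6):
    print(c, truncated(a, alpha, c), limit)
```

The truncated integrals approach the predicted limit $a^{1-\alpha}/(\alpha-1) = 1/8$ as $c \to +\infty$.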


It is clear that the convergence of the integral depends solely on the behavior of the function "near" the point $b$. In other words, if the function is modified outside a neighborhood of $b$, the convergence of the integral is by no means compromised. Let us now state the Cauchy criterion.

Theorem 7.27 Let $f : [a,b[\, \to \mathbb{R}$ be a function that is integrable on $[a,c]$ for every $c \in\, ]a,b[\,$. Then $f$ is integrable on $[a,b[\,$ if and only if for every $\varepsilon > 0$ there is a $\bar c \in\, ]a,b[\,$ such that, for any $c'$ and $c''$ in $[\bar c, b[\,$, we have that

$$\Big|\int_{c'}^{c''} f\Big| \le \varepsilon.$$

Proof It is a direct consequence of Theorem 4.15, when applied to the function $F : [a,b[\, \to \mathbb{R}$ defined as $F(c) = \int_a^c f$. □

From the Cauchy criterion we deduce the following comparison criterion.

Theorem 7.28 Let $f : [a,b[\, \to \mathbb{R}$ be a function that is integrable on $[a,c]$ for every $c \in\, ]a,b[\,$. If there is an integrable function $g : [a,b[\, \to \mathbb{R}$ such that, for every $x \in [a,b[\,$,

$$|f(x)| \le g(x),$$

then $f$ is integrable on $[a,b[\,$, too.

Proof Once $\varepsilon > 0$ is fixed, there is a $\bar c \in\, ]a,b[\,$ such that, taking arbitrarily $c', c''$ in $[\bar c, b[\,$, one has that $\big|\int_{c'}^{c''} g\big| \le \varepsilon$. If, for example, $c' \le c''$, since $-g \le f \le g$, we have

$$-\int_{c'}^{c''} g \le \int_{c'}^{c''} f \le \int_{c'}^{c''} g,$$

and therefore

$$\Big|\int_{c'}^{c''} f\Big| \le \int_{c'}^{c''} g \le \varepsilon.$$

The Cauchy criterion then applies, whence the conclusion. □

Note that it would have been sufficient to assume the inequality $|f(x)| \le g(x)$ on $[\bar c, b[\,$. As an immediate consequence, we have the following corollary.


Corollary 7.29 Let $f : [a,b[\, \to \mathbb{R}$ be a function that is integrable on $[a,c]$ for every $c \in\, ]a,b[\,$. If $|f|$ is integrable on $[a,b[\,$, then $f$ is, too, and

$$\Big|\int_a^b f\Big| \le \int_a^b |f|.$$

A function satisfying the assumption of the preceding corollary will be said to be L-integrable. Let us now state a corollary of the comparison criterion that is often used in practice.

Corollary 7.30 Let $f, g : [a,b[\, \to \mathbb{R}$ be two functions with positive values that are integrable on $[a,c]$ for every $c \in\, ]a,b[\,$. Assume that the limit

$$L = \lim_{x \to b^-} \frac{f(x)}{g(x)}$$

exists. Then the following conclusions hold:

(a) If $L \in\, ]0,+\infty[\,$, then $f$ is integrable on $[a,b[\,$ if and only if $g$ is.
(b) If $L = 0$ and $g$ is integrable on $[a,b[\,$, then $f$ is as well.
(c) If $L = +\infty$ and $g$ is not integrable on $[a,b[\,$, then neither is $f$.

Proof Case (a). If $L \in\, ]0,+\infty[\,$, then there exists a $\bar c \in\, ]a,b[\,$ such that

$$x \in [\bar c, b[\ \Rightarrow\ \frac{L}{2} \le \frac{f(x)}{g(x)} \le \frac{3L}{2},$$

i.e.,

$$x \in [\bar c, b[\ \Rightarrow\ f(x) \le \frac{3L}{2}\,g(x) \quad\text{and}\quad g(x) \le \frac{2}{L}\,f(x).$$

The conclusion then follows from the comparison criterion.

Case (b). If $L = 0$, then there exists a $\bar c \in\, ]a,b[\,$ such that if $x \in [\bar c, b[\,$, then $f(x) \le g(x)$, and the comparison criterion applies.

Case (c). If $L = +\infty$, then we reduce this to case (b) by exchanging the roles of $f$ and $g$. □

Example Consider the function $f : [0,+\infty[\, \to \mathbb{R}$ defined by

$$f(x) = e^{1/(x^2+1)} - 1.$$


As a comparison function, we take

$$g(x) = \frac{1}{x^2+1}.$$

It is integrable on $[0,+\infty[\,$, with

$$\int_0^{+\infty} \frac{1}{x^2+1}\,dx = \lim_{c\to+\infty} [\arctan x]_0^c = \frac{\pi}{2}.$$

Since

$$\lim_{x\to+\infty} \frac{f(x)}{g(x)} = \lim_{t\to 0^+} \frac{e^t - 1}{t} = 1,$$

we conclude that $f$ is integrable on $[0,+\infty[\,$ as well.

We now consider the case of a function $f :\, ]a,b] \to \mathbb{R}$, with $a \ge -\infty$. There is an analogous definition of its integral.

Definition 7.31 We say that a function $f :\, ]a,b] \to \mathbb{R}$ is "integrable" if $f$ is integrable on $[c,b]$ for every $c \in\, ]a,b[\,$, and the limit

$$\lim_{c \to a^+} \int_c^b f$$

exists and is finite. In that case, the preceding limit is called the "integral" of $f$ on $]a,b]$, and it is denoted by $\int_a^b f$ or $\int_a^b f(x)\,dx$.

Given the function $f :\, ]a,b] \to \mathbb{R}$, it is possible to consider the function $g : [a', b'[\, \to \mathbb{R}$, with $a' = -b$ and $b' = -a$, defined by $g(x) = f(-x)$. It is easy to see that $f$ is integrable on $]a,b]$ if and only if $g$ is integrable on $[a', b'[\,$. In this way, we are led back to the previous context.

We will also define the integral of a function $f :\, ]a,b[\, \to \mathbb{R}$, with $-\infty \le a < b \le +\infty$, in the following way.

Definition 7.32 We say that $f :\, ]a,b[\, \to \mathbb{R}$ is "integrable" if, once we fix a point $p \in\, ]a,b[\,$, the function $f$ is integrable on $[p,b[\,$ and on $]a,p]$. In that case, the "integral" of $f$ on $]a,b[\,$ is defined by

$$\int_a^b f = \int_a^p f + \int_p^b f.$$

It is easy to verify that the given definition does not depend on the choice of $p \in\, ]a,b[\,$.
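The independence from the split point $p$ can be illustrated numerically. The sketch below is not from the text; it uses the antiderivative $\arctan$ of $1/(x^2+1)$ and a large cutoff $c$ to approximate the two improper halves for several choices of $p$:

```python
import math

def both_halves(p, c=1e8):
    # f(x) = 1/(x^2 + 1); the half ]-inf, p] is approximated by [-c, p],
    # and [p, +inf[ by [p, c], using the antiderivative arctan
    left = math.atan(p) - math.atan(-c)
    right = math.atan(c) - math.atan(p)
    return left + right

for p in (-3.0, 0.0, 7.5):
    print(p, both_halves(p))
```

Every choice of $p$ gives (approximately) the same value $\pi$, as the definition requires.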


Examples If $a, b \in \mathbb{R}$, one can verify that the function

$$f(x) = \frac{1}{[(x-a)(b-x)]^\beta}$$

is integrable on $]a,b[\,$ if and only if $\beta < 1$. In this case, it is possible to choose, for instance, $p = (a+b)/2$.

Another case arises when $a = -\infty$ and $b = +\infty$. For example, we can easily verify that the function $f(x) = (x^2+1)^{-1}$ is integrable on $]-\infty,+\infty[\,$. Taking, for instance, $p = 0$, we have

$$\int_{-\infty}^{+\infty} \frac{1}{x^2+1}\,dx = \int_{-\infty}^{0} \frac{1}{x^2+1}\,dx + \int_{0}^{+\infty} \frac{1}{x^2+1}\,dx = \pi.$$

Another case that might be encountered in the applications is when a function happens not to be defined at an interior point of an interval.

Definition 7.33 Given $a < q < b$, we say that a function $f : [a,b] \setminus \{q\} \to \mathbb{R}$ is integrable if $f$ is integrable on both $[a,q[\,$ and $]q,b]$. In that case, we set

$$\int_a^b f = \int_a^q f + \int_q^b f.$$

For example, if $a < 0 < b$, then the function $f(x) = \sqrt{|x|}/x$ is integrable on $[a,b] \setminus \{0\}$, and

$$\int_a^b \frac{\sqrt{|x|}}{x}\,dx = \int_a^0 \frac{-1}{\sqrt{-x}}\,dx + \int_0^b \frac{1}{\sqrt{x}}\,dx = 2\sqrt{b} - 2\sqrt{-a}.$$

On the other hand, the function $f(x) = 1/x$ is not integrable on $[-1,1] \setminus \{0\}$, even though the fact that $f$ is odd might tempt us to say that its integral equals zero. In that case, however, some important properties of the integral would be lost, for example, the additivity on subintervals.
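The computation for $\sqrt{|x|}/x$ can be checked numerically; the sketch below is not from the text and uses midpoint sums on each side of the singularity at $0$, with the sample endpoints $a = -1$, $b = 4$:

```python
import math

def f(x):
    return math.sqrt(abs(x)) / x

def midpoint(lo, hi, steps=400_000):
    # midpoint Riemann sum; the singular endpoint is never evaluated
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

a, b = -1.0, 4.0
approx = midpoint(a, 0.0) + midpoint(0.0, b)   # split at the singular point, as in Definition 7.33
exact = 2 * math.sqrt(b) - 2 * math.sqrt(-a)   # the closed form from the text
print(round(approx, 2), exact)
```

With these endpoints the closed form gives $2\sqrt{4} - 2\sqrt{1} = 2$, and the midpoint approximation agrees to within the discretization error near the singularity.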

7.14 Functions with Vector Values

We now consider a function $f : [a,b] \to \mathbb{R}^N$ with vector values. As usual, we can write

$$f(x) = (f_1(x), \dots, f_N(x)),$$


where the functions $f_k : [a,b] \to \mathbb{R}$ are the components of $f$. We say that $f$ is integrable whenever all the components are integrable functions, and in that case we can define the integral of $f$ as

$$\int_a^b f(x)\,dx = \Big(\int_a^b f_1(t)\,dt,\ \dots,\ \int_a^b f_N(t)\,dt\Big).$$

The integral is thus a vector in $\mathbb{R}^N$. A particular case is encountered when $f : [a,b] \to \mathbb{C}$. Writing

$$f(x) = f_1(x) + i f_2(x),$$

we will have

$$\int_a^b f(x)\,dx = \int_a^b f_1(x)\,dx + i \int_a^b f_2(x)\,dx.$$

Theorem 7.34 Assume that both $f : [a,b] \to \mathbb{R}^N$ and $\|f\| : [a,b] \to \mathbb{R}$ are integrable and $a < b$. Then

$$\Big\|\int_a^b f(x)\,dx\Big\| \le \int_a^b \|f(x)\|\,dx.$$

Proof Set $v = \int_a^b f(x)\,dx$, i.e., $v = (v_1,\dots,v_N)$, with $v_k = \int_a^b f_k(x)\,dx$ for $k = 1,\dots,N$. If $v = 0$, then the statement surely holds. Assume now that $v \ne 0$. Then, using the Schwarz inequality,

$$\|v\|^2 = \sum_{k=1}^N v_k^2 = \sum_{k=1}^N v_k \int_a^b f_k(x)\,dx = \sum_{k=1}^N \int_a^b v_k f_k(x)\,dx = \int_a^b \sum_{k=1}^N v_k f_k(x)\,dx$$
$$= \int_a^b v \cdot f(x)\,dx \le \int_a^b \|v\|\,\|f(x)\|\,dx = \|v\| \int_a^b \|f(x)\|\,dx,$$

whence

$$\|v\| \le \int_a^b \|f(x)\|\,dx,$$

which is what we wanted to prove. □
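The inequality of Theorem 7.34 can be observed numerically on a sample curve. The sketch below is not from the text; the curve $f(x) = (\cos x, \sin x, x)$ in $\mathbb{R}^3$ is a hypothetical choice, and both sides are approximated by midpoint sums:

```python
import math

def f(x):
    # a sample curve in R^3, chosen only for illustration
    return (math.cos(x), math.sin(x), x)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

a, b, steps = 0.0, 2.0, 100_000
h = (b - a) / steps
mids = [a + (i + 0.5) * h for i in range(steps)]
# componentwise integral, as in the definition of the vector-valued integral
v = tuple(sum(f(x)[k] for x in mids) * h for k in range(3))
lhs = norm(v)                              # || integral of f ||
rhs = sum(norm(f(x)) for x in mids) * h    # integral of ||f||
print(lhs <= rhs, round(lhs, 3), round(rhs, 3))
```

The norm of the vector of componentwise integrals is indeed bounded by the integral of the norm.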



Now let $F : [a,b] \to \mathbb{R}^N$ be a function whose components $F_k : [a,b] \to \mathbb{R}$ are differentiable. In this case we say that $F$ is differentiable, and, writing

$$F(x) = (F_1(x), \dots, F_N(x)),$$


we can define its derivative at some $x_0 \in [a,b]$ as

$$F'(x_0) = \lim_{x\to x_0} \frac{F(x) - F(x_0)}{x - x_0} = \Big(\lim_{x\to x_0} \frac{F_1(x) - F_1(x_0)}{x - x_0},\ \dots,\ \lim_{x\to x_0} \frac{F_N(x) - F_N(x_0)}{x - x_0}\Big) = (F_1'(x_0), \dots, F_N'(x_0)).$$

Here is a version of the Fundamental Theorem in this context.

Theorem 7.35 (Fundamental Theorem—III) If $F : [a,b] \to \mathbb{R}^N$ is differentiable, then $F' : [a,b] \to \mathbb{R}^N$ is integrable, and

$$\int_a^b F'(x)\,dx = F(b) - F(a).$$

Proof Since each component $F_k : [a,b] \to \mathbb{R}$ is differentiable, by the Fundamental Theorem we know that the derivatives $F_k' : [a,b] \to \mathbb{R}$ are integrable and

$$\int_a^b F_k'(x)\,dx = F_k(b) - F_k(a), \qquad \text{for every } k = 1,\dots,N.$$

Then $F'$ is integrable, and

$$\int_a^b F'(x)\,dx = \Big(\int_a^b F_1'(x)\,dx,\ \dots,\ \int_a^b F_N'(x)\,dx\Big) = \big(F_1(b) - F_1(a),\ \dots,\ F_N(b) - F_N(a)\big)$$
$$= \big(F_1(b), \dots, F_N(b)\big) - \big(F_1(a), \dots, F_N(a)\big) = F(b) - F(a),$$

thereby completing the proof. □



Part III Further Developments

8 Numerical Series and Series of Functions

8.1 Introduction and First Properties

Let $V$ be a normed vector space. Given a sequence $(a_k)_k$ in $V$, the associated "series" is the sequence $(s_n)_n$ defined by

$$s_0 = a_0,\quad s_1 = a_0 + a_1,\quad s_2 = a_0 + a_1 + a_2,\quad \dots,\quad s_n = a_0 + a_1 + a_2 + \dots + a_n,\quad \dots$$

The element $a_k$ is called the "kth term" of the series, whereas $s_n = \sum_{k=0}^n a_k$ is said to be the "nth partial sum" of the series. Whenever $(s_n)_n$ has a limit in $V$, we say that the series "converges." In that case, the limit $S = \lim_n s_n$ is said to be the "sum of the series," and we will write

$$S = \lim_n \sum_{k=0}^n a_k = \sum_{k=0}^\infty a_k,$$

and sometimes we will also use the notation

$$S = a_0 + a_1 + a_2 + \dots + a_n + \dots$$

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Fonda, A Modern Introduction to Mathematical Analysis, https://doi.org/10.1007/978-3-031-23713-3_8


However, by an abuse of notation, the series $(s_n)_n$ itself is often denoted by the same symbols, either

$$\sum_{k=0}^\infty a_k, \qquad \text{or} \qquad a_0 + a_1 + a_2 + \dots + a_n + \dots.$$

Sometimes, for brevity's sake, we will simply write $\sum_k a_k$. Let us analyze three examples, in all of which $V = \mathbb{R}$.

Example 1 For $\alpha \in \mathbb{R}$, the "geometric series"

$$1 + \alpha + \alpha^2 + \alpha^3 + \dots + \alpha^n + \dots$$

has as its kth term $a_k = \alpha^k$. If $\alpha \ne 1$, the nth partial sum is

$$s_n = \sum_{k=0}^n \alpha^k = \frac{\alpha^{n+1} - 1}{\alpha - 1},$$

whereas if $\alpha = 1$, then we have $s_n = n + 1$. Hence, the series converges if and only if $|\alpha| < 1$, in which case its sum is

$$\sum_{k=0}^\infty \alpha^k = \frac{1}{1 - \alpha}.$$

Notice that if $\alpha \ge 1$, then $\lim_n s_n = +\infty$, whereas if $\alpha \le -1$, then the sequence $(s_n)_n$ has no limit, since $\liminf_n s_n = -\infty$ and $\limsup_n s_n = +\infty$.
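The behavior of the geometric partial sums can be watched directly. The short sketch below is not from the text; it accumulates the partial sums and compares the last one with $1/(1-\alpha)$:

```python
def geometric_partial_sums(alpha, n):
    # returns [s_0, s_1, ..., s_n] for the geometric series
    s, term, sums = 0.0, 1.0, []
    for _ in range(n + 1):
        s += term
        sums.append(s)
        term *= alpha
    return sums

alpha = 0.5
sums = geometric_partial_sums(alpha, 20)
print(sums[-1], 1 / (1 - alpha))   # s_20 is already very close to 1/(1-alpha)
```

For $|\alpha| \ge 1$ the same routine shows the partial sums growing without bound (or oscillating for $\alpha \le -1$).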

In general, if the sequence $(s_n)_n$ has no limit, then we say that the series is "undetermined." On the other hand, for real-valued series we say that:

– The series "diverges to $+\infty$" if $\lim_n s_n = +\infty$.
– The series "diverges to $-\infty$" if $\lim_n s_n = -\infty$.

Example 2 The series

$$\frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \frac{1}{3\cdot 4} + \dots + \frac{1}{(n+1)(n+2)} + \dots$$

has as its kth term $a_k = \frac{1}{(k+1)(k+2)}$. It is a "telescopic series":

$$\Big(\frac{1}{1} - \frac{1}{2}\Big) + \Big(\frac{1}{2} - \frac{1}{3}\Big) + \Big(\frac{1}{3} - \frac{1}{4}\Big) + \dots + \Big(\frac{1}{n+1} - \frac{1}{n+2}\Big) + \dots$$


Hence, simplifying,

$$s_n = 1 - \frac{1}{n+2},$$

leading to $\lim_n s_n = 1$. We have thus proved that the series converges and that its sum is equal to 1. We can then write

$$\sum_{k=0}^\infty \frac{1}{(k+1)(k+2)} = 1.$$

In the preceding example, one could use different notations for the sum, e.g.,

$$\sum_{k=1}^\infty \frac{1}{k(k+1)} = 1,$$

or variants of it; for example, the letter "k" could be replaced by any other, so that, e.g., $\sum_{j=1}^\infty \frac{1}{j(j+1)} = 1$. These remarks indeed apply to all series.

Example 3 The "harmonic series"

$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{n+1} + \dots$$

has as its kth term $a_k = \frac{1}{k+1}$. It diverges to $+\infty$; we can see this by writing it as

$$1 + \Big(\frac{1}{2} + \frac{1}{3}\Big) + \Big(\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7}\Big) + \Big(\frac{1}{8} + \frac{1}{9} + \dots + \frac{1}{15}\Big) + \Big(\frac{1}{16} + \dots\Big) + \dots,$$

gathering together the first 2 of its terms, then 4, then 8, then 16, and so on, doubling their number each time. It is easy to see that the sums in the parentheses are all greater than $\frac{1}{2}$. Hence, the sequence of partial sums must have the limit $+\infty$. We can then write

$$\sum_{k=0}^\infty \frac{1}{k+1} = +\infty.$$

It must be said that the explicit computation of the sum of a series is a rare event. Very often we will already be satisfied when proving that a series converges or not.
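The doubling argument above can be checked by computing the dyadic blocks directly. The sketch below is not from the text:

```python
# each dyadic block 1/2^m + ... + 1/(2^(m+1) - 1) of the harmonic series
# contains 2^m terms, each at least 1/2^(m+1), so it exceeds 1/2
blocks = [sum(1.0 / j for j in range(2 ** m, 2 ** (m + 1))) for m in range(1, 10)]
print([round(b, 3) for b in blocks])
print(all(b > 0.5 for b in blocks))
```

Since every block contributes more than $\frac12$, the partial sums grow without bound.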


It is important to notice that the convergence of a series is not compromised if only a finite number of its terms are modified. Indeed, if the series converges, we can change, add, or delete a finite number of initial terms, and the new series thus obtained will still converge. In contrast, if the series does not converge, because either it is undetermined or it diverges to $\pm\infty$, the same will be true of the modified series.

Theorem 8.1 If a series $\sum_k a_k$ converges, then

$$\lim_n a_n = 0.$$

Proof Let $\lim_n s_n = S \in V$. Then also $\lim_n s_{n-1} = S$, and hence

$$\lim_n a_n = \lim_n (s_n - s_{n-1}) = \lim_n s_n - \lim_n s_{n-1} = S - S = 0,$$

which is what we wanted to prove. □

Let us study the behavior of series with respect to the sum and to the product by a scalar.

Theorem 8.2 Assume that the two series $\sum_k a_k$ and $\sum_k b_k$ converge, with sums $A$ and $B$, respectively. Then the series $\sum_k (a_k + b_k)$ also converges, and its sum is $A + B$. Moreover, for any fixed $\alpha \in \mathbb{R}$, the series $\sum_k (\alpha a_k)$ also converges, and its sum is $\alpha A$. We will briefly write

$$\sum_{k=0}^\infty (a_k + b_k) = \sum_{k=0}^\infty a_k + \sum_{k=0}^\infty b_k, \qquad \sum_{k=0}^\infty (\alpha a_k) = \alpha \sum_{k=0}^\infty a_k.$$

Proof Let $s_n = \sum_{k=0}^n a_k$ and $s_n' = \sum_{k=0}^n b_k$. Then

$$s_n + s_n' = \sum_{k=0}^n (a_k + b_k), \qquad \alpha s_n = \sum_{k=0}^n (\alpha a_k),$$

and the result follows, passing to the limits. □

Let us see how the Cauchy criterion adapts to series in Banach spaces.

Theorem 8.3 If $V$ is a Banach space, the series $\sum_k a_k$ converges if and only if

$$\forall \varepsilon > 0\ \exists \bar n:\quad n > m \ge \bar n \;\Rightarrow\; \Big\|\sum_{k=m+1}^n a_k\Big\| < \varepsilon.$$


Proof Since $V$ is complete, the sequence $(s_n)_n$ has a limit in $V$ if and only if it is a Cauchy sequence, i.e.,

$$\forall \varepsilon > 0\ \exists \bar n:\quad [\,m \ge \bar n \text{ and } n \ge \bar n\,] \;\Rightarrow\; \|s_n - s_m\| < \varepsilon.$$

Now it is not restrictive to take $n > m$, and if we substitute $s_n = \sum_{k=0}^n a_k$ and $s_m = \sum_{k=0}^m a_k$, the conclusion follows. □

We now state a useful convergence criterion.

Theorem 8.4 If $V$ is a Banach space and the series $\sum_k \|a_k\|$ converges, then the series $\sum_k a_k$ also converges.

In that case we say that the series $\sum_k a_k$ "converges in norm," unless $V$ coincides with either $\mathbb{R}$ or $\mathbb{C}$, in which cases we say that the series "converges absolutely."

Proof We assume that the series $\sum_k \|a_k\|$ converges. Let $\varepsilon > 0$ be fixed. By the Cauchy criterion, there exists an $\bar n \in \mathbb{N}$ such that

$$n > m \ge \bar n \;\Rightarrow\; \sum_{k=m+1}^n \|a_k\| < \varepsilon.$$

Since

$$\Big\|\sum_{k=m+1}^n a_k\Big\| \le \sum_{k=m+1}^n \|a_k\|,$$

we have that

$$n > m \ge \bar n \;\Rightarrow\; \Big\|\sum_{k=m+1}^n a_k\Big\| < \varepsilon,$$

and the conclusion follows, using the Cauchy criterion again. □

The convergence in norm of a series thus reinforces the interest in examining series of positive real numbers.

8.2 Series of Real Numbers

In this section we only consider series with terms $a_k$ in $\mathbb{R}$. If for every $k$ one has $a_k \ge 0$, then the sequence $(s_n)_n$ is increasing, hence it has a limit, and we have only two possibilities: the series either converges or diverges to $+\infty$. The following comparison criterion will be very useful.

Theorem 8.5 Let $\sum_k a_k$ and $\sum_k b_k$ be two series for which

$$\exists \bar k \in \mathbb{N}:\quad k \ge \bar k \;\Rightarrow\; 0 \le a_k \le b_k.$$

Then:

(a) If $\sum_k b_k$ converges, then $\sum_k a_k$ also converges.
(b) If $\sum_k a_k$ diverges, then $\sum_k b_k$ also diverges.

Proof We define

$$s_n = a_0 + a_1 + a_2 + \dots + a_n, \qquad s_n' = b_0 + b_1 + b_2 + \dots + b_n.$$

By previous considerations, we can modify a finite number of terms in the two series and assume without loss of generality that $0 \le a_k \le b_k$ for every $k$. Then the two sequences $(s_n)_n$ and $(s_n')_n$ are increasing, and $s_n \le s_n'$ for every $n$. Consequently, the limits $S = \lim_n s_n$ and $S' = \lim_n s_n'$ exist, and $S \le S' \le +\infty$. If $\sum_k b_k$ converges, then $S' \in \mathbb{R}$, so also $S \in \mathbb{R}$, meaning $\sum_k a_k$ converges. If $\sum_k a_k$ diverges, then $S = +\infty$, so also $S' = +\infty$, meaning $\sum_k b_k$ diverges. □

Example The series

$$1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \dots + \frac{1}{(n+1)^2} + \dots$$

converges. This can be proved by comparing it with the series

$$1 + \frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \frac{1}{3\cdot 4} + \dots + \frac{1}{n(n+1)} + \dots,$$

which is a slight modification of the one already treated earlier in Example 2. All the terms of the first series are smaller than or equal to the corresponding terms of the second series, which converges.

As a first corollary, we have the asymptotic comparison criterion.


Corollary 8.6 Let $\sum_k a_k$ and $\sum_k b_k$ be two series with positive terms, for which the limit

$$\ell = \lim_k \frac{a_k}{b_k}$$

exists. We have three cases:

(a) $\ell \in\, ]0,+\infty[\,$: the two series either both converge or both diverge.
(b) $\ell = 0$: if $\sum_k b_k$ converges, then $\sum_k a_k$ also converges.
(c) $\ell = +\infty$: if $\sum_k b_k$ diverges, then $\sum_k a_k$ also diverges.

Proof Case (a). If $\ell \in\, ]0,+\infty[\,$, then there exists a $\bar k$ such that

$$k \ge \bar k \;\Rightarrow\; \frac{\ell}{2} \le \frac{a_k}{b_k} \le \frac{3\ell}{2},$$

i.e.,

$$k \ge \bar k \;\Rightarrow\; a_k \le \frac{3\ell}{2}\,b_k \quad\text{and}\quad b_k \le \frac{2}{\ell}\,a_k.$$

The conclusion then follows from the comparison criterion.

Case (b). If $\ell = 0$, then there exists a $\bar k$ such that if $k \ge \bar k$, then $a_k \le b_k$, and the comparison criterion applies.

Case (c). If $\ell = +\infty$, we have the analogue of case (b) with the roles of $a_k$ and $b_k$ interchanged. □

The following corollary provides us with the "root test."

Corollary 8.7 Let $\sum_k a_k$ be a series with nonnegative terms. If

$$\limsup_k \sqrt[k]{a_k} < 1,$$

then the series converges.

Proof Set $\ell = \limsup_k \sqrt[k]{a_k}$, and let $\alpha \in\, ]\ell,1[\,$ be an arbitrarily fixed number. Then there exists a $\bar k$ such that

$$k \ge \bar k \;\Rightarrow\; \sqrt[k]{a_k} \le \alpha,$$

i.e.,

$$k \ge \bar k \;\Rightarrow\; a_k \le \alpha^k.$$


The conclusion follows by comparison with the geometric series $\sum_k \alpha^k$, which converges, since $0 < \alpha < 1$. □

Recalling Proposition 3.32, as an immediate consequence we have the "ratio test."

Corollary 8.8 Let $\sum_k a_k$ be a series with positive terms. If

$$\limsup_k \frac{a_{k+1}}{a_k} < 1,$$

then the series converges.

We now present the condensation criterion, which we already implicitly used earlier when dealing with the harmonic series of Example 3.

Theorem 8.9 Let $(a_k)_k$ be a decreasing sequence of nonnegative numbers. Then the two series

$$\sum_{k=0}^\infty a_k, \qquad \sum_{k=0}^\infty 2^k a_{2^k}$$

either both converge or both diverge.

Proof For simplicity, we delete the first term $a_0$ from the first series. Let the series $\sum_k 2^k a_{2^k}$ converge. Then

$$a_1 + (a_2 + a_3) \le a_1 + 2a_2,$$
$$a_1 + (a_2 + a_3) + (a_4 + a_5 + a_6 + a_7) \le a_1 + 2a_2 + 4a_4,$$
$$a_1 + (a_2 + a_3) + (a_4 + a_5 + a_6 + a_7) + (a_8 + a_9 + \dots + a_{15}) \le a_1 + 2a_2 + 4a_4 + 8a_8,$$
$$\dots$$

leading to the inequality

$$\sum_{k=1}^{2^{n+1}-1} a_k \le \sum_{k=0}^n 2^k a_{2^k}, \qquad \text{for every } n \in \mathbb{N}.$$

By comparison, since $\sum_k 2^k a_{2^k}$ converges, $\sum_k a_k$ also converges.

1 1 By comparison, since . k 2k a2k converges, then . k ak also converges.


Assume now, conversely, that the series $\sum_k a_k$ converges. Then

$$a_1 + 2a_2 \le 2(a_1 + a_2),$$
$$a_1 + 2a_2 + 4a_4 \le 2(a_1 + a_2 + (a_3 + a_4)),$$
$$a_1 + 2a_2 + 4a_4 + 8a_8 \le 2(a_1 + a_2 + (a_3 + a_4) + (a_5 + a_6 + a_7 + a_8)),$$
$$\dots$$

leading to

$$\sum_{k=0}^n 2^k a_{2^k} \le \sum_{k=1}^{2^n} 2a_k, \qquad \text{for every } n \in \mathbb{N}.$$

By comparison, since $\sum_k 2a_k$ converges, $\sum_k 2^k a_{2^k}$ also converges. □



Example 1 Let us consider the series

$$\sum_{k=1}^\infty \frac{1}{k^\beta},$$

where $\beta > 0$ is a fixed real number. The sequence $(a_k)_k$, with $a_k = 1/k^\beta$, is decreasing. The "condensed series"

$$\sum_{k=1}^\infty 2^k a_{2^k} = \sum_{k=1}^\infty 2^k \frac{1}{(2^k)^\beta} = \sum_{k=1}^\infty (2^{1-\beta})^k$$

is a geometric series of the type $\sum_k \alpha^k$, with $\alpha = 2^{1-\beta}$. It converges if and only if $|\alpha| < 1$; hence,

$$\sum_{k=1}^\infty \frac{1}{k^\beta} \ \text{converges} \iff \beta > 1.$$

Example 2 Let us now examine the series

$$\sum_{k=2}^\infty \frac{1}{k(\ln k)^\beta}$$


for some $\beta > 0$. By some use of differential calculus, it is rather easy to see that the sequence $(a_k)_k$, with $a_k = 1/(k(\ln k)^\beta)$, is decreasing. The "condensed series" is

$$\sum_{k=2}^\infty 2^k a_{2^k} = \sum_{k=2}^\infty 2^k \frac{1}{2^k (\ln 2^k)^\beta} = \frac{1}{(\ln 2)^\beta} \sum_{k=2}^\infty \frac{1}{k^\beta}.$$

Looking back at the previous example, we conclude that

$$\sum_{k=2}^\infty \frac{1}{k(\ln k)^\beta} \ \text{converges} \iff \beta > 1.$$
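The condensation criterion of Example 1 can be watched numerically. The sketch below is not from the text; it compares partial sums of $\sum 1/k^\beta$ with partial sums of the condensed geometric series $\sum (2^{1-\beta})^k$:

```python
def partial(beta, n):
    # partial sum of sum_{k=1..n} 1/k^beta
    return sum(1.0 / k ** beta for k in range(1, n + 1))

def condensed_partial(beta, n):
    # partial sum of the condensed series sum 2^k * a_{2^k} = sum (2^(1-beta))^k
    return sum((2.0 ** (1 - beta)) ** k for k in range(1, n + 1))

# beta = 2: both sequences of partial sums stabilize (the series converge)
print(round(partial(2, 100_000), 4), round(condensed_partial(2, 60), 4))
# beta = 1: the condensed terms are all equal to 1, so that series clearly diverges
print(condensed_partial(1, 60))
```

For $\beta = 1$ the condensed series has constant terms, which is exactly the doubling argument for the divergence of the harmonic series.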

Till now in this section we have only considered series with nonnegative terms. We now shift to series having alternating signs. Consider a series of the type

$$a_0 - a_1 + a_2 - a_3 + \dots + (-1)^n a_n + \dots,$$

where all the $a_k$ are positive. What follows is the Leibniz criterion.

Theorem 8.10 If $(a_k)_k$ is a decreasing sequence of positive numbers and

$$\lim_k a_k = 0,$$

then the series $\sum_k (-1)^k a_k$ converges.

Proof Let

$$s_n = a_0 - a_1 + a_2 - a_3 + \dots + (-1)^n a_n,$$

and consider the sequence $(s_n)_n$ of partial sums. We divide it into two subsequences, one with even indices and the other with odd indices. Since $(a_k)_k$ is positive and decreasing, we see that

$$s_1 \le s_3 \le s_5 \le s_7 \le \dots \le s_6 \le s_4 \le s_2 \le s_0.$$

Hence, the subsequence $(s_{2m+1})_m$, the one with odd indices, is increasing and bounded from above, whereas the subsequence $(s_{2m})_m$, the one with even indices, is decreasing and bounded from below. Then both subsequences have a finite limit, and we can write

$$\lim_m s_{2m+1} = \ell_1, \qquad \lim_m s_{2m} = \ell_2.$$


On the other hand,

$$\ell_2 - \ell_1 = \lim_m (s_{2m} - s_{2m+1}) = \lim_m a_{2m+1} = 0,$$

hence $\ell_1 = \ell_2$. Because the two subsequences $(s_{2m+1})_m$ and $(s_{2m})_m$ have the same limit, we can be sure that the sequence $(s_n)_n$ will have that same limit. □
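The interlacing of odd and even partial sums in the proof can be checked numerically on the alternating series with $a_k = 1/(k+1)$. The sketch below is not from the text:

```python
def s(n):
    # partial sum a_0 - a_1 + ... +/- a_n with a_k = 1/(k+1)
    return sum((-1) ** k / (k + 1) for k in range(n + 1))

odds = [s(2 * m + 1) for m in range(1, 8)]    # odd-index partial sums
evens = [s(2 * m) for m in range(1, 8)]       # even-index partial sums
print(all(x < y for x, y in zip(odds, odds[1:])))    # odds increase
print(all(x > y for x, y in zip(evens, evens[1:])))  # evens decrease
print(max(odds) < min(evens))                 # the two squeeze the sum
```

The two subsequences bracket the limit from below and above, exactly as in the argument.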

8.3 Series of Complex Numbers

When we consider a series $\sum_k a_k$ whose terms are complex numbers $a_k = x_k + i y_k$, where $x_k = \Re(a_k)$ and $y_k = \Im(a_k)$, we can write its partial sums as

$$s_n = \sum_{k=0}^n a_k = \sum_{k=0}^n (x_k + i y_k) = \sum_{k=0}^n x_k + i \sum_{k=0}^n y_k = \sigma_n + i \tau_n,$$

where $\sigma_n = \Re(s_n)$ and $\tau_n = \Im(s_n)$. We thus have a sequence $(\sigma_n, \tau_n)_n$ in $\mathbb{R}^2$. Recalling that such a sequence has a limit in $\mathbb{R}^2$ if and only if both its components have a limit in $\mathbb{R}$, we obtain the following statement.

Theorem 8.11 If $a_k = x_k + i y_k$, with $x_k$ and $y_k$ being real numbers, the series $\sum_k a_k$ converges if and only if both series $\sum_k x_k$ and $\sum_k y_k$ converge. In that case,

$$\sum_{k=0}^\infty a_k = \sum_{k=0}^\infty x_k + i \sum_{k=0}^\infty y_k.$$

Example Let us consider the series

$$\sum_{k=0}^\infty \frac{i^k}{k+1} = 1 + \frac{i}{2} - \frac{1}{3} - \frac{i}{4} + \frac{1}{5} + \frac{i}{6} - \frac{1}{7} - \frac{i}{8} + \dots + \frac{i^n}{n+1} + \dots.$$

The real part is

$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots,$$

whereas the imaginary part is

$$\frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \dots.$$

Both of them converge by the Leibniz criterion on series with alternating signs, so the given series also converges.


Note that, in the previous example, the series does not converge absolutely, since

$$\sum_{k=0}^\infty \Big|\frac{i^k}{k+1}\Big| = \sum_{k=0}^\infty \frac{1}{k+1}$$

is the harmonic series, which we know to be divergent.

We now define the "Cauchy product" of two series $\sum_{k=0}^\infty a_k$ and $\sum_{k=0}^\infty b_k$. It is the series

$$\sum_{k=0}^\infty \Big(\sum_{j=0}^k a_{k-j} b_j\Big).$$

However, some care is needed concerning its convergence. Indeed, it is not true in general that if the two series converge, then their Cauchy product series also converges. The following theorem states that this will be true if at least one of them converges absolutely.

Theorem 8.12 (Mertens' Theorem) Assume that the series $\sum_k a_k$ and $\sum_k b_k$ converge, with sums $A$ and $B$, respectively. If at least one of them converges absolutely, then their Cauchy product series converges with sum $AB$.

Proof To fix ideas, let $\sum_{k=0}^\infty a_k$ converge absolutely, and set $\bar A = \sum_{k=0}^\infty |a_k|$. We denote by

$$c_k = \sum_{j=0}^k a_{k-j} b_j$$

the kth term of the Cauchy product series. Let

$$s_n = \sum_{k=0}^n a_k, \qquad s_n' = \sum_{k=0}^n b_k, \qquad s_n'' = \sum_{k=0}^n c_k.$$

Moreover, let $r_n = B - s_n'$. Then

$$s_n'' = a_0 b_0 + (a_1 b_0 + a_0 b_1) + \dots + (a_n b_0 + a_{n-1} b_1 + \dots + a_1 b_{n-1} + a_0 b_n)$$
$$= a_0 s_n' + a_1 s_{n-1}' + \dots + a_{n-1} s_1' + a_n s_0'$$
$$= a_0 (B - r_n) + a_1 (B - r_{n-1}) + \dots + a_{n-1} (B - r_1) + a_n (B - r_0)$$
$$= s_n B - (a_0 r_n + a_1 r_{n-1} + \dots + a_{n-1} r_1 + a_n r_0).$$


Since $\lim_n s_n B = AB$, the proof will be completed if

$$\lim_n (a_0 r_n + a_1 r_{n-1} + \dots + a_{n-1} r_1 + a_n r_0) = 0.$$

Let $\varepsilon > 0$ be fixed. Since $\lim_n r_n = 0$, there exists an $\bar n_1$ such that

$$n \ge \bar n_1 \;\Rightarrow\; |r_n| < \varepsilon.$$

Let us set $R = \max\{|r_n| : n \in \mathbb{N}\}$. By the Cauchy criterion, there exists an $\bar n_2 \ge \bar n_1$ such that

$$n \ge \bar n_2 \;\Rightarrow\; |a_{n-\bar n_1+1}| + |a_{n-\bar n_1+2}| + \dots + |a_n| < \varepsilon.$$

Then, if $n \ge \bar n_2$,

$$|a_0 r_n + a_1 r_{n-1} + \dots + a_{n-1} r_1 + a_n r_0| \le |a_0|\,|r_n| + \dots + |a_{n-\bar n_1}|\,|r_{\bar n_1}| + |a_{n-\bar n_1+1}|\,|r_{\bar n_1-1}| + \dots + |a_n|\,|r_0|$$
$$\le \varepsilon\big(|a_0| + \dots + |a_{n-\bar n_1}|\big) + R\big(|a_{n-\bar n_1+1}| + \dots + |a_n|\big) \le \varepsilon \bar A + R\varepsilon = (\bar A + R)\,\varepsilon,$$

thereby completing the proof. □
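Mertens' theorem can be tested on two geometric series, whose Cauchy product is easy to compute. The sketch below is not from the text; it forms the $c_k$ coefficients explicitly and compares their sum with the product $AB = \frac{1}{(1-x)(1-y)}$:

```python
def cauchy_product(a, b):
    # c_k = sum_{j=0..k} a_{k-j} * b_j, the Cauchy product coefficients
    n = min(len(a), len(b))
    return [sum(a[k - j] * b[j] for j in range(k + 1)) for k in range(n)]

x, y, n = 0.5, 0.25, 60
a = [x ** k for k in range(n)]    # geometric series, sum 1/(1-x)
b = [y ** k for k in range(n)]    # geometric series, sum 1/(1-y)
c = cauchy_product(a, b)
print(round(sum(c), 6), round(1 / ((1 - x) * (1 - y)), 6))
```

Both series converge absolutely here, so the partial sums of the product series approach $AB$, as the theorem predicts.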

8.4 Series of Functions

Let $E$ be a metric space and $F$ a normed vector space. If we have a sequence of functions $f_k : E \to F$, for any $x \in E$ we can ask ourselves whether or not the series $\sum_k f_k(x)$ converges. If there is a subset $U \subseteq E$ and a function $f : U \to F$ such that

$$\sum_{k=0}^\infty f_k(x) = f(x), \qquad \text{for every } x \in U,$$

we say that the series "converges pointwise" to $f$ on $U$; this happens when, setting $s_n(x) = \sum_{k=0}^n f_k(x)$, the sequence $(s_n)_n$ converges pointwise to $f$ on $U$. We say that the series "converges uniformly" to $f$ on $U$ if the convergence of $(s_n)_n$ to $f$ is uniform on $U$, meaning

$$\forall \varepsilon > 0\ \exists \bar n \in \mathbb{N}:\ \forall x \in U,\quad n \ge \bar n \;\Rightarrow\; \Big\|\sum_{k=0}^n f_k(x) - f(x)\Big\| < \varepsilon.$$


Theorem 8.13 If every function $f_k : E \to F$ is continuous on $U \subseteq E$ and the series $\sum_{k=0}^\infty f_k$ converges uniformly to $f$ on $U$, then $f$ is also continuous on $U$.

Proof It is a direct consequence of Theorem 4.17. □

Theorem 8.14 Let $E$ be compact, $F$ a Banach space, and all the functions $f_k : E \to F$ continuous. If the series $\sum_{k=0}^\infty \|f_k\|_\infty$ converges, then the series $\sum_{k=0}^\infty f_k$ converges uniformly to a continuous function on $E$.

Proof We know that $V = C(E,F)$ is a Banach space, and $\sum_{k=0}^\infty f_k$ is a series in $V$ that converges in norm. Then it converges in $V$, meaning that it converges uniformly. □

Example Let $E = [a,b] \subseteq \mathbb{R}$ and $F = \mathbb{R}$, and let us consider the series

$$\sum_{k=1}^\infty \frac{1}{k^2} \sin\!\big(e^{3kx-1} + \arctan(x^2 + \sqrt{k}\,)\big).$$

We will examine the series of the norms in $C([a,b],\mathbb{R})$. We have that

$$\sup\Big\{\Big|\frac{1}{k^2} \sin\!\big(e^{3kx-1} + \arctan(x^2 + \sqrt{k}\,)\big)\Big| : x \in [a,b]\Big\} \le \frac{1}{k^2},$$

and the series $\sum_k \frac{1}{k^2}$ converges. Then the series converges in norm and, hence, uniformly.

We now adapt the two theorems in Sect. 7.12 to the context of series.

Theorem 8.15 Let $(f_k)_k$ be a sequence in $C([a,b],\mathbb{R})$ such that the series $\sum_k f_k$ is uniformly convergent. Then

$$\int_a^b \sum_{k=0}^\infty f_k(t)\,dt = \sum_{k=0}^\infty \int_a^b f_k(t)\,dt.$$

Proof It is a direct consequence of Theorem 7.24 when applied to the sequence of partial sums $(s_n)_n$. □

Theorem 8.16 Let $(f_k)_k$ be a sequence in $C^1([a,b],\mathbb{R})$. Assume that the series $\sum_k f_k$ and the series of the derivatives $\sum_k f_k'$ converge uniformly to some functions $f : [a,b] \to \mathbb{R}$ and $g : [a,b] \to \mathbb{R}$, respectively. Then $f$ is of class $C^1$, and


$f' = g$. Consequently, we can write

$$\frac{d}{dx} \sum_{k=0}^\infty f_k(x) = \sum_{k=0}^\infty \frac{d}{dx} f_k(x).$$

Proof Consider the sequence of partial sums $s_n = \sum_{k=0}^n f_k$. Then $(s_n)_n$ is in $C^1(I,\mathbb{R})$, and $s_n' = \sum_{k=0}^n f_k'$. By assumption, $\lim_n s_n = f$ and $\lim_n s_n' = g$, uniformly on $I$. Hence, by Theorem 7.25, it must be that $f \in C^1(I,\mathbb{R})$ and $f' = g$. □

Iterating the same argument, we can easily generalize the preceding theorem.

Theorem 8.17 Let $(f_k)_k$ be a sequence in $C^m([a,b],\mathbb{R})$. Assume that the series

$$\sum_k f_k, \quad \sum_k f_k', \quad \sum_k f_k'', \quad \dots, \quad \sum_k f_k^{(m)}$$

converge uniformly on $[a,b]$ to some functions $f$, $g_1$, $g_2$, ..., $g_m$, respectively. Then $f$ is of class $C^m$, and

$$f' = g_1, \quad f'' = g_2, \quad \dots, \quad f^{(m)} = g_m.$$

8.4.1 Power Series

An important example of a series of functions is provided by the "power series"

$$(PS)_{\mathbb{C}} \qquad \sum_{k=0}^\infty a_k z^k,$$

whose terms are the functions $f_k : \mathbb{C} \to \mathbb{C}$ defined as $f_k(z) = a_k z^k$, for some given coefficients $a_k \in \mathbb{C}$. Let us first analyze the pointwise convergence.

Theorem 8.18 Setting

$$L = \limsup_k \sqrt[k]{|a_k|},$$


we have the following possibilities:

(a) If $L = +\infty$, then the series $(PS)_{\mathbb{C}}$ converges only for $z = 0$.
(b) If $L = 0$, then the series $(PS)_{\mathbb{C}}$ converges for every $z \in \mathbb{C}$.
(c) If $L \in\, ]0,+\infty[\,$, then the series $(PS)_{\mathbb{C}}$ converges if $|z| < \frac{1}{L}$ and does not converge if $|z| > \frac{1}{L}$.

Proof If $L = +\infty$ and $z \ne 0$, then $\sqrt[k]{|a_k|} > \frac{1}{|z|}$ for infinitely many $k$, hence

$$|a_k z^k| = \big(\sqrt[k]{|a_k|}\,|z|\big)^k > 1, \qquad \text{for infinitely many } k.$$

If the series were to converge, then it should be that $\lim_k a_k z^k = 0$, but this is not so. Hence, if $L = +\infty$, then the series converges only for $z = 0$.

If $L = 0$, then for any $z \in \mathbb{C}$ we have that

$$\limsup_k \sqrt[k]{|a_k z^k|} = |z| \limsup_k \sqrt[k]{|a_k|} = 0,$$

so, by the root test, the series converges absolutely.

Assume now $L \in\, ]0,+\infty[\,$. If $|z| < \frac{1}{L}$, then

$$\limsup_k \sqrt[k]{|a_k z^k|} = |z| \limsup_k \sqrt[k]{|a_k|} = |z|\,L < 1,$$

and, by the root test, the series converges absolutely. In contrast, if $|z| > \frac{1}{L}$, i.e., $L > \frac{1}{|z|}$, then $\sqrt[k]{|a_k|} > \frac{1}{|z|}$ for infinitely many $k$, and hence

$$|a_k z^k| = \big(\sqrt[k]{|a_k|}\,|z|\big)^k > 1, \qquad \text{for infinitely many } k.$$

If the series were to converge, then we would have $\lim_k a_k z^k = 0$, but this is not the case. □

We define the "convergence radius" $r$ of the series $(PS)_{\mathbb{C}}$ as follows:

$$r = \begin{cases} 0 & \text{if } L = +\infty, \\ +\infty & \text{if } L = 0, \\ \dfrac{1}{L} & \text{if } L \in\, ]0,+\infty[\,. \end{cases}$$

If $r > 0$, we will say that $B(0,r)$ is the "convergence disk" of the series. We emphasize that it is an open ball. If $r = +\infty$, then we set $B(0,r) = \mathbb{C}$.
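The root-test quantity $L$ can be estimated numerically for concrete coefficients. The sketch below is not from the text; the coefficients $a_k = k\,3^k$ are a hypothetical sample (so $L = 3$ and the radius should be $r = 1/3$), and logarithms are used to avoid floating-point overflow:

```python
import math

def kth_root(k):
    # |a_k|^(1/k) for a_k = k * 3^k, computed via logarithms
    log_ak = math.log(k) + k * math.log(3.0)   # log|a_k|
    return math.exp(log_ak / k)

for k in (10, 100, 10_000):
    print(k, round(kth_root(k), 4))
# the estimates decrease toward L = 3, giving convergence radius r = 1/3
```

The slowly vanishing factor $k^{1/k}$ explains why the estimates approach $L$ only gradually; the limsup itself is exactly 3.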


If $r > 0$, then the series $(PS)_{\mathbb{C}}$ converges pointwise in $B(0,r)$. However, this convergence might fail to be uniform. We now see that the convergence is uniform on any smaller disk.

Theorem 8.19 Assume $r > 0$. Then for every $\rho \in\, ]0,r[\,$ the series $(PS)_{\mathbb{C}}$ converges in norm, hence uniformly, on $B(0,\rho)$.

Proof Let $\rho \in\, ]0,r[\,$ be fixed. Then

$$\sup\{|a_k z^k| : |z| \le \rho\} = |a_k|\,\rho^k,$$

and since

$$\limsup_k \sqrt[k]{|a_k|\,\rho^k} = \rho \limsup_k \sqrt[k]{|a_k|} = \rho L < r L = 1,$$

then, by the root test, the series $\sum_k |a_k|\,\rho^k$ converges. We have thus seen that the series $(PS)_{\mathbb{C}}$ converges in norm on $B(0,\rho)$. □

Corollary 8.20 If $r > 0$ and $f : B(0,r) \to \mathbb{C}$ is defined as

$$f(z) = \sum_{k=0}^\infty a_k z^k,$$

then $f$ is continuous on $B(0,r)$.

Proof From the previous theorem, for every $\rho \in\, ]0,r[\,$ the convergence is uniform on $B(0,\rho)$; hence, $f$ is continuous on $B(0,\rho)$. Since $\rho \in\, ]0,r[\,$ is arbitrary, $f$ is thus continuous at every point of $B(0,r)$. □

Remark 8.21 The above theory can easily be generalized to series of the type

$$\sum_{k=0}^\infty a_k (z - z_0)^k$$

for some fixed point $z_0 \in \mathbb{C}$. (Indeed, the change of variables $u = z - z_0$ leads back to the previously considered case.) The convergence disk in this case is $B(z_0, r) = \{z \in \mathbb{C} : |z - z_0| < r\}$.


8.4.2 The Complex Exponential Function

Let us now examine, for every $z \in \mathbb{C}$, the series
$$\sum_{k=0}^{\infty} \frac{z^k}{k!}\,.$$
It is a power series, which converges absolutely, since
$$\sum_{k=0}^{\infty} \left|\frac{z^k}{k!}\right| = \sum_{k=0}^{\infty} \frac{|z|^k}{k!} = e^{|z|}\,.$$
It is therefore possible to define a function $F : \mathbb{C} \to \mathbb{C}$ by
$$F(z) = \sum_{k=0}^{\infty} \frac{z^k}{k!}\,.$$
Recall that if $z = x \in \mathbb{R}$, then we have proved that $F(x)$ is equal to $\exp(x)$, i.e.,
$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}\,.$$
We can then interpret the function $F$ as an extension of the exponential function to the complex plane $\mathbb{C}$. For this reason, we will call $F$ the "complex exponential function" and write either $\exp(z)$ or $e^z$ instead of $F(z)$.

Theorem 8.22 For every $z_1$ and $z_2$ in $\mathbb{C}$ we have that
$$\exp(z_1 + z_2) = \exp(z_1)\exp(z_2)\,.$$

Proof The series $\sum_{k=0}^{\infty} \frac{z_1^k}{k!}$ and $\sum_{k=0}^{\infty} \frac{z_2^k}{k!}$ converge absolutely, and their sums are $\exp(z_1)$ and $\exp(z_2)$, respectively. Then, by Mertens' Theorem 8.12, the Cauchy product series converges, and its sum is $\exp(z_1)\exp(z_2)$. On the other hand, the Cauchy product series is
$$\sum_{k=0}^{\infty} \sum_{j=0}^{k} \frac{z_1^{k-j}}{(k-j)!}\, \frac{z_2^{j}}{j!} = \sum_{k=0}^{\infty} \frac{1}{k!} \sum_{j=0}^{k} \binom{k}{j} z_1^{k-j} z_2^{j} = \sum_{k=0}^{\infty} \frac{(z_1 + z_2)^k}{k!}\,,$$
and its sum is $\exp(z_1 + z_2)$, whence the conclusion. □


Writing $z = x + iy$, we obtain
$$\exp(x + iy) = \exp(x)\exp(iy)\,.$$
Moreover,
$$\exp(iy) = \sum_{k=0}^{\infty} \frac{(iy)^k}{k!} = \lim_n q_n(y)\,,$$
where
$$q_n(y) = \sum_{k=0}^{n} \frac{(iy)^k}{k!} = 1 + iy - \frac{y^2}{2!} - i\frac{y^3}{3!} + \frac{y^4}{4!} + i\frac{y^5}{5!} - \frac{y^6}{6!} - i\frac{y^7}{7!} + \cdots + i^n \frac{y^n}{n!}\,.$$
We thus have that
$$q_n(y) = q_n^{(1)}(y) + i\, q_n^{(2)}(y)\,,$$
where
$$q_n^{(1)}(y) = 1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \frac{y^6}{6!} + \cdots + (-1)^m \frac{y^{2m}}{(2m)!}$$
if either $n = 2m$ or $n = 2m + 1$, whereas
$$q_n^{(2)}(y) = y - \frac{y^3}{3!} + \frac{y^5}{5!} - \frac{y^7}{7!} + \cdots + (-1)^m \frac{y^{2m+1}}{(2m+1)!}$$
if either $n = 2m + 1$ or $n = 2m + 2$. Since
$$\lim_{n \to +\infty} q_n^{(1)}(y) = \cos y\,, \qquad \lim_{n \to +\infty} q_n^{(2)}(y) = \sin y\,,$$
we conclude that
$$\lim_n q_n(y) = \Bigl(\lim_n q_n^{(1)}(y),\, \lim_n q_n^{(2)}(y)\Bigr) = (\cos y, \sin y)\,,$$
hence
$$e^{x+iy} = e^x(\cos y + i\sin y)\,.$$
This is the Euler formula.
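The Euler formula can be checked numerically against the partial sums $q_n$; this is a small sketch (the truncation order $30$ and the test point are arbitrary choices):

```python
import math

def exp_partial(z: complex, n: int) -> complex:
    """Partial sum q_n(z) = sum_{k=0}^{n} z^k / k! of the exponential series."""
    term, total = 1.0 + 0j, 0j
    for k in range(n + 1):
        total += term
        term *= z / (k + 1)  # next term z^{k+1} / (k+1)!
    return total

y = 1.2345
approx = exp_partial(1j * y, 30)
exact = complex(math.cos(y), math.sin(y))  # cos y + i sin y
print(abs(approx - exact) < 1e-12)  # True
```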


It can be easily verified that
$$\cos t = \frac{e^{it} + e^{-it}}{2}\,, \qquad \sin t = \frac{e^{it} - e^{-it}}{2i}\,.$$
These formulas can be used to extend the functions $\cos$ and $\sin$ to the complex field by simply taking $t \in \mathbb{C}$. The hyperbolic functions can also be extended to $\mathbb{C}$ by the formulas
$$\cosh z = \frac{e^z + e^{-z}}{2}\,, \qquad \sinh z = \frac{e^z - e^{-z}}{2}\,.$$
Note that
$$\cos t = \cosh(it)\,, \qquad \sin t = -i\sinh(it)\,.$$
The analogies between the trigonometric and the hyperbolic functions are now surely better understood.
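A quick numerical check of these two identities, assuming Python's standard `cmath` module for complex hyperbolic functions (the test point $t = 0.7$ is arbitrary):

```python
import cmath
import math

t = 0.7  # arbitrary test point
assert abs(cmath.cosh(1j * t) - math.cos(t)) < 1e-12       # cos t = cosh(it)
assert abs(-1j * cmath.sinh(1j * t) - math.sin(t)) < 1e-12  # sin t = -i sinh(it)
print("identities verified")
```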

8.4.3 Taylor Series

Let us now consider the power series
$$(PS)_\mathbb{R} \qquad \sum_{k=0}^{\infty} a_k x^k\,,$$
where $x \in \mathbb{R}$ and all coefficients $a_k$ are also real numbers. We are thus considering the series $\sum_k f_k$, where the functions $f_k : \mathbb{R} \to \mathbb{R}$ are defined by $f_k(x) = a_k x^k$. Hence, if $r > 0$, then the convergence disk $B(0, r)$ is now reduced to the interval $]-r, r[\,$, and if $r = +\infty$, then it is the whole real line $\mathbb{R}$. In these cases, we may wonder whether the sum of the series $(PS)_\mathbb{R}$ is differentiable on $]-r, r[\,$.

Theorem 8.23 Let $r > 0$ be the convergence radius of the series $(PS)_\mathbb{R}$, and let
$$f(x) = \sum_{k=0}^{\infty} a_k x^k\,, \qquad \text{for every } x \in\, ]-r, r[\,.$$
Then the series
$$\sum_{k=1}^{\infty} k a_k x^{k-1}$$


has the same convergence radius $r$. Moreover, the function $f :\, ]-r, r[\, \to \mathbb{R}$ is differentiable, and
$$f'(x) = \sum_{k=1}^{\infty} k a_k x^{k-1}\,, \qquad \text{for every } x \in\, ]-r, r[\,.$$

Proof Since
$$\limsup_k \sqrt[k]{|k a_k|} = \lim_k \sqrt[k]{k}\, \limsup_k \sqrt[k]{|a_k|} = \limsup_k \sqrt[k]{|a_k|}\,,$$
we see that the convergence radius for the series $\sum_k k a_k x^{k-1}$ is equal to $r$. We can then define the new function $g :\, ]-r, r[\, \to \mathbb{R}$ as $g(x) = \sum_{k=1}^{\infty} k a_k x^{k-1}$. For any fixed $\rho \in\, ]0, r[\,$ we know that the convergence of the series is uniform on $[-\rho, \rho]$. Setting $f_k(x) = a_k x^k$, we have that $f = \sum_k f_k$ and $g = \sum_k f_k'$, and the convergence is uniform on $[-\rho, \rho]$. By Theorem 8.16, $f$ is differentiable on $[-\rho, \rho]$ and $f'(x) = g(x)$ for every $x \in [-\rho, \rho]$. The conclusion follows since $\rho \in\, ]0, r[\,$ is arbitrary. □

Iterating the same argument and making use of Theorem 8.17, we easily obtain the following generalization.

Theorem 8.24 Let $r > 0$ be the convergence radius of the series $(PS)_\mathbb{R}$, and let

$$f(x) = \sum_{k=0}^{\infty} a_k x^k\,, \qquad \text{for every } x \in\, ]-r, r[\,.$$
Then the series
$$\sum_{k=1}^{\infty} k a_k x^{k-1}\,, \qquad \sum_{k=2}^{\infty} k(k-1)\, a_k x^{k-2}\,, \qquad \sum_{k=3}^{\infty} k(k-1)(k-2)\, a_k x^{k-3}\,, \qquad \dots\,,$$
$$\sum_{k=m}^{\infty} k(k-1)(k-2)\cdots(k-m+1)\, a_k x^{k-m}\,, \qquad \dots$$
all have the same convergence radius $r$. Moreover, the function $f :\, ]-r, r[\, \to \mathbb{R}$ is infinitely differentiable and, for every positive integer $j$,
$$f^{(j)}(x) = \sum_{k=j}^{\infty} k(k-1)(k-2)\cdots(k-j+1)\, a_k x^{k-j}\,, \qquad \text{for every } x \in\, ]-r, r[\,.$$


Note now that, taking $x = 0$ in the previous formula, we obtain
$$f^{(j)}(0) = j!\, a_j$$
for every $j \in \mathbb{N}$ (recalling that $f^{(0)} = f$). Then
$$f(x) = \sum_{k=0}^{\infty} a_k x^k = \sum_{k=0}^{\infty} \frac{1}{k!}\, f^{(k)}(0)\, x^k\,.$$
This is the "Taylor series" associated with the function $f$ at $x_0 = 0$. We have thus proved that any power series with positive convergence radius $r$ defines a function $f$ that is analytic on $]-r, r[\,$.

Remark 8.25 Referring to Remark 8.21, we can also extend the foregoing considerations, made for the series $(PS)_\mathbb{R}$, to power series of the type
$$\sum_{k=0}^{\infty} a_k (x - x_0)^k$$
for some fixed point $x_0$. If the convergence radius $r$ is positive, then the convergence disk is $]x_0 - r, x_0 + r[\,$, and the function $f :\, ]x_0 - r, x_0 + r[\, \to \mathbb{R}$ defined by the sum of the series can be expressed by
$$f(x) = \sum_{k=0}^{\infty} \frac{1}{k!}\, f^{(k)}(x_0)(x - x_0)^k\,,$$
i.e., $f(x) = \lim_n p_n(x)$, where $p_n(x)$ is the Taylor polynomial defined by (6.3).
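Termwise differentiation can be illustrated with the geometric series, a hypothetical example not worked in the text, where $a_k = 1$, $r = 1$, and $f(x) = \frac{1}{1-x}$:

```python
# Geometric series: f(x) = sum_k x^k = 1/(1-x) on ]-1, 1[.  Differentiating
# term by term should give f'(x) = sum_{k>=1} k x^{k-1} = 1/(1-x)^2.
x = 0.3  # arbitrary point inside the convergence interval
f_prime_series = sum(k * x ** (k - 1) for k in range(1, 200))
print(abs(f_prime_series - 1.0 / (1.0 - x) ** 2) < 1e-12)  # True
```

Here $f^{(j)}(0) = j!\,a_j = j!$, consistent with the coefficient formula above.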

8.4.4 Fourier Series

Let us now consider the "trigonometric polynomials" having some fixed period $T > 0$. They are defined as
$$f_n(t) = c_0 + \sum_{k=1}^{n} \left[ a_k \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) + b_k \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) \right],$$
where $c_0$, $a_k$, and $b_k$ are some real constants. We are interested in examining the convergence of the sequence of functions $(f_n)_n$.


Theorem 8.26 If there exists a function $f : [0, T] \to \mathbb{R}$ such that
$$\lim_n f_n(t) = f(t)\,, \qquad \text{uniformly on } [0, T]\,,$$
then necessarily
$$c_0 = \frac{1}{T} \int_0^T f(t)\, dt\,,$$
$$a_k = \frac{2}{T} \int_0^T f(t) \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) dt\,, \qquad b_k = \frac{2}{T} \int_0^T f(t) \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) dt\,.$$

Proof By Theorem 8.15,
$$\int_0^T f(t)\, dt = \int_0^T \left[ c_0 + \sum_{k=1}^{\infty} \Bigl( a_k \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) + b_k \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) \Bigr) \right] dt$$
$$= c_0 T + \sum_{k=1}^{\infty} \int_0^T \left[ a_k \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) + b_k \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) \right] dt = c_0 T\,,$$
whence the formula for $c_0$. Analogously, for any integer $j \ge 1$,
$$\int_0^T f(t) \cos\Bigl(\frac{2\pi j}{T}\, t\Bigr) dt = \int_0^T \left[ c_0 + \sum_{k=1}^{\infty} \Bigl( a_k \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) + b_k \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) \Bigr) \right] \cos\Bigl(\frac{2\pi j}{T}\, t\Bigr) dt$$
$$= \sum_{k=1}^{\infty} \int_0^T \left[ a_k \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) + b_k \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) \right] \cos\Bigl(\frac{2\pi j}{T}\, t\Bigr) dt\,.$$
On the other hand, integrating by parts twice, we see that, for any positive integer $k \ne j$,
$$\int_0^T \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) \cos\Bigl(\frac{2\pi j}{T}\, t\Bigr) dt = 0\,, \qquad \int_0^T \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) \cos\Bigl(\frac{2\pi j}{T}\, t\Bigr) dt = 0\,,$$


whereas, if $k = j$, then
$$\int_0^T \sin\Bigl(\frac{2\pi j}{T}\, t\Bigr) \cos\Bigl(\frac{2\pi j}{T}\, t\Bigr) dt = 0\,, \qquad \int_0^T \cos^2\Bigl(\frac{2\pi j}{T}\, t\Bigr) dt = \frac{T}{2}\,.$$
Hence,
$$\int_0^T f(t) \cos\Bigl(\frac{2\pi j}{T}\, t\Bigr) dt = \frac{T}{2}\, a_j\,,$$
yielding the formula for $a_j$. Similarly one can see that
$$\int_0^T f(t) \sin\Bigl(\frac{2\pi j}{T}\, t\Bigr) dt = \frac{T}{2}\, b_j\,,$$
providing the formula for $b_j$. □

For any given continuous function $f : [0, T] \to \mathbb{R}$, we define its "Fourier coefficients"
$$a_k = \frac{2}{T} \int_0^T f(t) \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) dt\,, \qquad b_k = \frac{2}{T} \int_0^T f(t) \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) dt\,,$$
and its "Fourier series"
$$\frac{a_0}{2} + \sum_{k=1}^{\infty} \left[ a_k \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) + b_k \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) \right].$$
The problem is: Does this series converge for every $t \in [0, T]$? The answer is, in general, no: There exist continuous functions $f : [0, T] \to \mathbb{R}$ for which the Fourier series fails to converge at some points $t \in [0, T]$. However, there are many ways to overcome this difficulty. We will very briefly review some of them.

Let us define the partial sums of the Fourier series as
$$f_n(t) = \frac{a_0}{2} + \sum_{k=1}^{n} \left[ a_k \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) + b_k \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) \right].$$
We can now also define the "Cesàro means"
$$\sigma_n(t) = \frac{1}{n+1} \bigl[ f_0(t) + f_1(t) + \cdots + f_n(t) \bigr]\,,$$
so as to be able to state, without proof, the following theorem.


Theorem 8.27 (Fejér Theorem) If $f : \mathbb{R} \to \mathbb{R}$ is continuous and $T$-periodic, then
$$\lim_n \sigma_n(t) = f(t)$$
uniformly for every $t \in \mathbb{R}$.

Here is a direct consequence.

Corollary 8.28 Let $f, \tilde{f} : \mathbb{R} \to \mathbb{R}$ be continuous and $T$-periodic functions. If the respective Fourier coefficients are such that $a_k = \tilde{a}_k$ and $b_k = \tilde{b}_k$ for every $k$, then $f$ and $\tilde{f}$ coincide.

Proof With the notations adapted to this situation, we will have that $\sigma_n(t) = \tilde{\sigma}_n(t)$ for every $n$, and hence
$$f(t) - \tilde{f}(t) = \lim_n \bigl( \sigma_n(t) - \tilde{\sigma}_n(t) \bigr) = 0$$
for every $t \in \mathbb{R}$. □

We could also define the "complex Fourier coefficients"
$$c_k = \frac{1}{T} \int_0^T f(t)\, e^{-i\frac{2\pi k}{T} t}\, dt$$
for $k \in \mathbb{Z}$. Setting $b_0 = 0$, we see that
$$c_k = \begin{cases} \frac{1}{2}(a_{-k} + i b_{-k}) & \text{if } k < 0\,,\\[2pt] \frac{1}{2}(a_k - i b_k) & \text{if } k \ge 0\,, \end{cases}$$
so that
$$f_n(t) = \sum_{k=-n}^{n} c_k\, e^{i\frac{2\pi k}{T} t}\,.$$
In what follows, we use the notation
$$\sum_{k=-\infty}^{\infty} |c_k| = \lim_{n \to \infty} \left( \sum_{k=-n}^{n} |c_k| \right).$$


Corollary 8.29 Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous and $T$-periodic function. If the series $\sum_{k=-\infty}^{\infty} |c_k|$ converges, then
$$\lim_n f_n(t) = f(t)$$
uniformly for every $t \in \mathbb{R}$.

Proof Observe that
$$\Bigl| c_k\, e^{i\frac{2\pi k}{T} t} \Bigr| = |c_k|\,.$$
Hence, if the series $\sum_{k=-\infty}^{\infty} |c_k|$ converges, by Theorem 8.14 the sequence $(f_n)_n$ converges uniformly to some continuous function $\tilde{f} : \mathbb{R} \to \mathbb{R}$, which is $T$-periodic. On the other hand, for this function,
$$\tilde{c}_k = \frac{1}{T} \int_0^T \tilde{f}(t)\, e^{-i\frac{2\pi k}{T} t}\, dt = \frac{1}{T} \int_0^T \Bigl( \lim_{n\to\infty} f_n(t) \Bigr)\, e^{-i\frac{2\pi k}{T} t}\, dt$$
$$= \lim_{n\to\infty} \frac{1}{T} \int_0^T f_n(t)\, e^{-i\frac{2\pi k}{T} t}\, dt = \lim_{n\to\infty} \frac{1}{T} \int_0^T \sum_{j=-n}^{n} c_j\, e^{i\frac{2\pi j}{T} t}\, e^{-i\frac{2\pi k}{T} t}\, dt$$
$$= \lim_{n\to\infty} \frac{1}{T} \sum_{j=-n}^{n} c_j \int_0^T e^{i\frac{2\pi (j-k)}{T} t}\, dt = \frac{1}{T} \int_0^T c_k\, dt = c_k$$

for every $k \in \mathbb{Z}$. By Corollary 8.28, the two functions $f$ and $\tilde{f}$ coincide, thereby completing the proof. □

In the following theorem, the function $f$ could be discontinuous at some points.

Theorem 8.30 (Dirichlet Theorem) Let $f : \mathbb{R} \to \mathbb{R}$ be a $T$-periodic function. Assume that there is a finite number of points $t_0, t_1, t_2, \dots, t_N$, with
$$0 = t_0 < t_1 < t_2 < \cdots < t_N = T\,,$$
having the property that $f$ is continuously differentiable on every interval $]t_{j-1}, t_j[\,$, with $j = 1, 2, \dots, N$. At the points $t_j$ (where the function could either fail to be continuous or, if continuous, could fail to be differentiable), the following finite limits must exist:
$$\lim_{s \to t_j^-} f(s)\,, \qquad \lim_{s \to t_j^+} f(s)\,, \qquad \lim_{s \to t_j^-} f'(s)\,, \qquad \lim_{s \to t_j^+} f'(s)\,.$$


Then, for every $t \in [0, T]$,
$$\frac{a_0}{2} + \sum_{k=1}^{\infty} \left[ a_k \cos\Bigl(\frac{2\pi k}{T}\, t\Bigr) + b_k \sin\Bigl(\frac{2\pi k}{T}\, t\Bigr) \right] = \frac{1}{2} \left[ \lim_{s \to t^-} f(s) + \lim_{s \to t^+} f(s) \right].$$
Moreover, the convergence is uniform on every compact interval where $f$ is continuous. Note that if $f$ is continuous at $t$, then
$$f(t) = \frac{1}{2} \left[ \lim_{s \to t^-} f(s) + \lim_{s \to t^+} f(s) \right].$$
For the proofs of Theorems 8.27 and 8.30, we refer the reader to the book by Körner [4]. We now provide two examples of applications of the preceding theorem.

Example 1 Let $f : \mathbb{R} \to \mathbb{R}$ be the $2\pi$-periodic function defined as
$$f(t) = t\,, \qquad \text{if } t \in [-\pi, \pi[\,.$$

It is readily seen that the assumptions of the Dirichlet theorem are satisfied. We compute
$$c_0 = \frac{1}{2\pi} \int_0^{2\pi} f(t)\, dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} t\, dt = 0\,,$$
$$a_k = \frac{2}{2\pi} \int_0^{2\pi} f(t)\cos(kt)\, dt = \frac{1}{\pi} \int_{-\pi}^{\pi} t \cos(kt)\, dt = 0\,,$$
since $t \mapsto t\cos(kt)$ is an odd function; and, integrating by parts,
$$b_k = \frac{2}{2\pi} \int_0^{2\pi} f(t)\sin(kt)\, dt = \frac{1}{\pi} \int_{-\pi}^{\pi} t \sin(kt)\, dt = \frac{1}{\pi} \left( \Bigl[ -t\, \frac{\cos(kt)}{k} \Bigr]_{-\pi}^{\pi} + \int_{-\pi}^{\pi} \frac{\cos(kt)}{k}\, dt \right) = \frac{2\,(-1)^{k+1}}{k}\,.$$
We can thus state that
$$f(t) = \sum_{k=1}^{\infty} \frac{2\,(-1)^{k+1}}{k} \sin(kt) \qquad \text{for every } t \in\, ]-\pi, \pi[\,.$$


As a particular case, taking $t = \frac{\pi}{2}$, we obtain the nice formula
$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots$$
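Both the sawtooth expansion and the resulting Leibniz series for $\frac{\pi}{4}$ can be checked numerically; a rough sketch (the truncation orders are arbitrary, and the pointwise convergence is slow):

```python
import math

def sawtooth_partial(t: float, n: int) -> float:
    """Partial sum of the Fourier series of f(t) = t on ]-pi, pi[."""
    return sum(2 * (-1) ** (k + 1) / k * math.sin(k * t) for k in range(1, n + 1))

# Slow pointwise convergence to t inside ]-pi, pi[:
print(sawtooth_partial(1.0, 100000))  # close to 1.0

# Taking t = pi/2 term by term reproduces the Leibniz series for pi/4:
leibniz = sum((-1) ** k / (2 * k + 1) for k in range(200000))
print(abs(4 * leibniz - math.pi) < 1e-4)  # True
```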

Example 2 Let $f : \mathbb{R} \to \mathbb{R}$ be the $2\pi$-periodic function defined as
$$f(t) = t^2\,, \qquad \text{if } t \in [-\pi, \pi[\,.$$
It is readily seen that the assumptions of the Dirichlet theorem are satisfied. We compute
$$c_0 = \frac{1}{2\pi} \int_0^{2\pi} f(t)\, dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} t^2\, dt = \frac{\pi^2}{3}\,,$$
and, integrating by parts,
$$a_k = \frac{2}{2\pi} \int_0^{2\pi} f(t)\cos(kt)\, dt = \frac{1}{\pi} \int_{-\pi}^{\pi} t^2 \cos(kt)\, dt = \frac{1}{\pi} \left( \Bigl[ t^2\, \frac{\sin(kt)}{k} \Bigr]_{-\pi}^{\pi} - \int_{-\pi}^{\pi} 2t\, \frac{\sin(kt)}{k}\, dt \right)$$
$$= -\frac{2}{\pi k} \int_{-\pi}^{\pi} t \sin(kt)\, dt = -\frac{2}{\pi k} \left( \Bigl[ -t\, \frac{\cos(kt)}{k} \Bigr]_{-\pi}^{\pi} + \int_{-\pi}^{\pi} \frac{\cos(kt)}{k}\, dt \right) = \frac{4\,(-1)^k}{k^2}\,.$$
On the other hand,
$$b_k = \frac{2}{2\pi} \int_0^{2\pi} f(t)\sin(kt)\, dt = \frac{1}{\pi} \int_{-\pi}^{\pi} t^2 \sin(kt)\, dt = 0$$
since $t \mapsto t^2\sin(kt)$ is an odd function. Since $f : \mathbb{R} \to \mathbb{R}$ is continuous, we can then state that
$$f(t) = \frac{\pi^2}{3} + \sum_{k=1}^{\infty} \frac{4\,(-1)^k}{k^2} \cos(kt)\,, \qquad \text{for every } t \in \mathbb{R}\,.$$
Let us focus on two interesting cases. If $t = \pi$, we obtain the formula
$$\pi^2 = \frac{\pi^2}{3} + \sum_{k=1}^{\infty} \frac{4}{k^2}\,,$$


yielding
$$\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}\,;$$
if $t = 0$, we have
$$0 = \frac{\pi^2}{3} + \sum_{k=1}^{\infty} \frac{4\,(-1)^k}{k^2}\,,$$
giving us the formula
$$\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k^2} = \frac{\pi^2}{12}\,.$$
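Both identities lend themselves to a direct numerical check on long partial sums; a minimal sketch:

```python
import math

basel = sum(1.0 / k ** 2 for k in range(1, 200001))
print(abs(basel - math.pi ** 2 / 6) < 1e-5)  # True (the tail is about 1/200000)

alternating = sum((-1) ** (k + 1) / k ** 2 for k in range(1, 200001))
print(abs(alternating - math.pi ** 2 / 12) < 1e-9)  # True (alternating tails are tiny)
```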

8.5 Series and Integrals

We now prove a theorem that shows the close connection between the theory of numerical series and that of the integral.

Theorem 8.31 Let $f : [1, +\infty[\, \to \mathbb{R}$ be a function that is positive, decreasing, and integrable on $[1, c]$ for every $c > 1$. Then the series $\sum_{k=1}^{\infty} f(k)$ converges if and only if $f$ is integrable on $[1, +\infty[\,$. Moreover, we have
$$\int_1^{+\infty} f \le \sum_{k=1}^{\infty} f(k) \le f(1) + \int_1^{+\infty} f\,.$$

Proof For $x \in [k, k+1]$, it must be that $f(k+1) \le f(x) \le f(k)$, hence
$$f(k+1) \le \int_k^{k+1} f \le f(k)\,.$$
Summing up, we obtain
$$\sum_{k=1}^{n} f(k+1) \le \int_1^{n+1} f \le \sum_{k=1}^{n} f(k)\,.$$
Since $f$ is positive, the sequence $\bigl(\sum_{k=1}^{n} f(k)\bigr)_n$ and the function $c \mapsto \int_1^c f$ are both increasing and, therefore, have a limit. The conclusion now follows from the comparison theorem for limits. □
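The key inequality in the proof can be checked directly for a concrete decreasing function; here $f(x) = 1/x^2$ is a hypothetical choice whose integral over $[k, k+1]$ is known in closed form:

```python
# For f(x) = 1/x^2 the integral over [k, k+1] is exactly 1/k - 1/(k+1),
# and the inequality f(k+1) <= integral <= f(k) from the proof must hold.
for k in range(1, 50):
    integral_k = 1.0 / k - 1.0 / (k + 1)
    assert 1.0 / (k + 1) ** 2 <= integral_k <= 1.0 / k ** 2
print("f(k+1) <= integral over [k, k+1] <= f(k) holds for k = 1, ..., 49")
```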



It should be clear that the choice of the starting point $a = 1$ for both the integral and the series is by no means mandatory.

Example Consider the series $\sum_{k=1}^{\infty} k^{-3}$; in this case,
$$\int_1^{+\infty} \frac{1}{x^3}\, dx \le \sum_{k=1}^{\infty} \frac{1}{k^3} \le 1 + \int_1^{+\infty} \frac{1}{x^3}\, dx\,,$$
and then
$$\frac{1}{2} \le \sum_{k=1}^{\infty} \frac{1}{k^3} \le \frac{3}{2}\,.$$
Greater accuracy is easily attained by computing the sum of the first few terms and then using the estimate given by the integral. For example, separating the first two terms, we have that
$$\sum_{k=1}^{\infty} \frac{1}{k^3} = 1 + \frac{1}{8} + \sum_{k=3}^{\infty} \frac{1}{k^3}\,,$$
with
$$\int_3^{+\infty} \frac{1}{x^3}\, dx \le \sum_{k=3}^{\infty} \frac{1}{k^3} \le \frac{1}{27} + \int_3^{+\infty} \frac{1}{x^3}\, dx\,.$$
We have thus proved that
$$\frac{255}{216} \le \sum_{k=1}^{\infty} \frac{1}{k^3} \le \frac{263}{216}\,.$$
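These rational bounds can be compared with a long partial sum of the series; a small numerical sketch:

```python
# Partial sum of sum 1/k^3 (close to 1.2020569..., Apery's constant):
s = sum(1.0 / k ** 3 for k in range(1, 200001))
print(255 / 216 <= s <= 263 / 216)  # True: the bounds 1.18055... and 1.21759... hold
```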

9 More on the Integral

9.1 Saks–Henstock Theorem

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Fonda, A Modern Introduction to Mathematical Analysis, https://doi.org/10.1007/978-3-031-23713-3_9

Let us further analyze the definition of the integral for a function $f : I \to \mathbb{R}$ when $I = [a, b]$ is a compact interval. The function $f$ is integrable on $I$ with integral $\int_I f$ if, for every $\varepsilon > 0$, there is a gauge $\delta$ on $I$ such that, for every $\delta$-fine tagged partition
$$\mathring{P} = \{(x_1, [a_0, a_1]), \dots, (x_m, [a_{m-1}, a_m])\}$$
of $I$, we have that $\bigl| S(f, \mathring{P}) - \int_I f \bigr| \le \varepsilon$. Then, since
$$S(f, \mathring{P}) = \sum_{j=1}^{m} f(x_j)(a_j - a_{j-1})\,, \qquad \int_I f = \sum_{j=1}^{m} \int_{a_{j-1}}^{a_j} f\,,$$
we have that
$$\left| \sum_{j=1}^{m} \left( f(x_j)(a_j - a_{j-1}) - \int_{a_{j-1}}^{a_j} f \right) \right| \le \varepsilon\,.$$
This fact tells us that the sum of all the "errors" $f(x_j)(a_j - a_{j-1}) - \int_{a_{j-1}}^{a_j} f$ is arbitrarily small, provided that the tagged partition is sufficiently fine. Note that those "errors" may be either positive or negative, so that in the sum they could compensate for one another. The following theorem tells us that even the sum of the absolute values of those "errors" can be made arbitrarily small.

Theorem 9.1 (Saks–Henstock Theorem—I) Let $f : I \to \mathbb{R}$ be an integrable function, and let $\delta$ be a gauge on $I$ such that, for every $\delta$-fine tagged partition $\mathring{P}$


of $I$, it happens that $\bigl| S(f, \mathring{P}) - \int_I f \bigr| \le \varepsilon$. Then for all such tagged partitions $\mathring{P} = \{(x_1, [a_0, a_1]), \dots, (x_m, [a_{m-1}, a_m])\}$ we also have that
$$\sum_{j=1}^{m} \left| f(x_j)(a_j - a_{j-1}) - \int_{a_{j-1}}^{a_j} f \right| \le 4\varepsilon\,.$$

Proof We consider separately in the sum the positive and the negative terms. Let us prove that the sum of the positive terms is less than or equal to $2\varepsilon$. In an analogous way one can proceed for the negative terms. Rearranging the terms in the sum, we can assume that the positive ones are the first $q$ terms $f(x_j)(a_j - a_{j-1}) - \int_{a_{j-1}}^{a_j} f$, with $j = 1, \dots, q$, i.e.,
$$f(x_1)(a_1 - a_0) - \int_{a_0}^{a_1} f\,, \;\dots\,,\; f(x_q)(a_q - a_{q-1}) - \int_{a_{q-1}}^{a_q} f\,.$$
Consider the remaining $m - q$ intervals $[a_{k-1}, a_k]$, with $k = q+1, \dots, m$, i.e.,
$$[a_q, a_{q+1}]\,, \dots, [a_{m-1}, a_m]\,.$$
Since $f$ is integrable on these intervals, there exist some gauges $\delta_k$ on $[a_{k-1}, a_k]$, respectively, which we can choose such that $\delta_k(x) \le \delta(x)$ for every $x \in [a_{k-1}, a_k]$, for which
$$\left| S(f, \mathring{P}_k) - \int_{a_{k-1}}^{a_k} f \right| \le \frac{\varepsilon}{m - q}$$
for every $\delta_k$-fine tagged partition $\mathring{P}_k$ of $[a_{k-1}, a_k]$. Consequently, the family $\mathring{Q}$ made by the couples $(x_1, [a_0, a_1]), \dots, (x_q, [a_{q-1}, a_q])$ and by the elements of the families $\mathring{P}_k$, with $k$ varying from $q+1$ to $m$, is a $\delta$-fine tagged partition of $I$ such that
$$S(f, \mathring{Q}) = \sum_{j=1}^{q} f(x_j)(a_j - a_{j-1}) + \sum_{k=q+1}^{m} S(f, \mathring{P}_k)\,.$$
Then we have
$$\sum_{j=1}^{q} \left( f(x_j)(a_j - a_{j-1}) - \int_{a_{j-1}}^{a_j} f \right) = \sum_{j=1}^{q} f(x_j)(a_j - a_{j-1}) - \left( \int_I f - \sum_{k=q+1}^{m} \int_{a_{k-1}}^{a_k} f \right)$$
$$= \left( S(f, \mathring{Q}) - \int_I f \right) - \sum_{k=q+1}^{m} \left( S(f, \mathring{P}_k) - \int_{a_{k-1}}^{a_k} f \right)$$
$$\le \left| S(f, \mathring{Q}) - \int_I f \right| + \sum_{k=q+1}^{m} \left| S(f, \mathring{P}_k) - \int_{a_{k-1}}^{a_k} f \right| \le \varepsilon + (m - q)\, \frac{\varepsilon}{m - q} = 2\varepsilon\,.$$
Proceeding similarly for the negative terms, the conclusion follows. □

The following corollary will be useful in the next section to study the integrability of the absolute value of an integrable function.

Corollary 9.2 Let $f : I \to \mathbb{R}$ be an integrable function, and let $\delta$ be a gauge on $I$ such that, for every $\delta$-fine tagged partition $\mathring{P}$ of $I$, it happens that $\bigl| S(f, \mathring{P}) - \int_I f \bigr| \le \varepsilon$. Then for such tagged partitions $\mathring{P} = \{(x_1, [a_0, a_1]), \dots, (x_m, [a_{m-1}, a_m])\}$ we also have
$$\left| S(|f|, \mathring{P}) - \sum_{j=1}^{m} \left| \int_{a_{j-1}}^{a_j} f \right| \right| \le 4\varepsilon\,.$$

Proof Using the well-known inequalities for the absolute value, by Theorem 9.1,
$$\left| S(|f|, \mathring{P}) - \sum_{j=1}^{m} \left| \int_{a_{j-1}}^{a_j} f \right| \right| = \left| \sum_{j=1}^{m} \left( |f(x_j)|(a_j - a_{j-1}) - \left| \int_{a_{j-1}}^{a_j} f \right| \right) \right|$$
$$= \left| \sum_{j=1}^{m} \left( |f(x_j)(a_j - a_{j-1})| - \left| \int_{a_{j-1}}^{a_j} f \right| \right) \right| \le \sum_{j=1}^{m} \left| f(x_j)(a_j - a_{j-1}) - \int_{a_{j-1}}^{a_j} f \right| \le 4\varepsilon\,.$$
This completes the proof. □

The Saks–Henstock Theorem 9.1 can be generalized to tagged subpartitions. Here is the statement.

Theorem 9.3 (Saks–Henstock Theorem—II) Let $f : I \to \mathbb{R}$ be an integrable function, and let $\delta$ be a gauge on $I$ such that, for every $\delta$-fine tagged partition $\mathring{P}$ of $I$, it happens that $\bigl| S(f, \mathring{P}) - \int_I f \bigr| \le \varepsilon$. Then for every $\delta$-fine tagged subpartition $\Xi = \{(\xi_j, [\alpha_j, \beta_j]) : j = 1, \dots, m\}$ of $I$ we have
$$\sum_{j=1}^{m} \left| f(\xi_j)(\beta_j - \alpha_j) - \int_{\alpha_j}^{\beta_j} f \right| \le 4\varepsilon\,.$$
As a consequence, for any such $\delta$-fine tagged subpartition,
$$\left| S(f, \Xi) - \sum_{j=1}^{m} \int_{\alpha_j}^{\beta_j} f \right| \le 4\varepsilon\,.$$

Proof By the Cousin theorem, it is possible to extend any $\delta$-fine tagged subpartition $\Xi$ of $I$ to a whole $\delta$-fine tagged partition $\mathring{P}$ of $I$. Hence, the Saks–Henstock Theorem 9.1 applies, proving the first part of the statement. Concerning the second part, we have
$$\left| S(f, \Xi) - \sum_{j=1}^{m} \int_{\alpha_j}^{\beta_j} f \right| = \left| \sum_{j=1}^{m} f(\xi_j)(\beta_j - \alpha_j) - \sum_{j=1}^{m} \int_{\alpha_j}^{\beta_j} f \right|$$
$$= \left| \sum_{j=1}^{m} \left( f(\xi_j)(\beta_j - \alpha_j) - \int_{\alpha_j}^{\beta_j} f \right) \right| \le \sum_{j=1}^{m} \left| f(\xi_j)(\beta_j - \alpha_j) - \int_{\alpha_j}^{\beta_j} f \right| \le 4\varepsilon\,,$$
thereby completing the proof. □

9.2 L-Integrable Functions

In this section, we introduce another important class of integrable functions on the interval $I = [a, b]$.

Definition 9.4 We say that an integrable function $f : I \to \mathbb{R}$ is "L-integrable" (or "integrable according to Lebesgue") if $|f|$ is also integrable on $I$.

It is clear that every positive integrable function is L-integrable. Moreover, every continuous function on $[a, b]$ is L-integrable there, since $|f|$ is still continuous. We have the following characterization of L-integrability.

Proposition 9.5 Let $f : I \to \mathbb{R}$ be an integrable function, and consider the set $\mathcal{S}$ of all real numbers
$$\sum_{i=1}^{q} \left| \int_{c_{i-1}}^{c_i} f \right|,$$
obtained by choosing $c_0, c_1, \dots, c_q$ in $I$ in such a way that $a = c_0 < c_1 < \cdots < c_q = b$. The function $f$ is L-integrable on $I$ if and only if $\mathcal{S}$ is bounded from above. In


that case, we have
$$\int_I |f| = \sup \mathcal{S}\,.$$

Proof Assume first that $f$ is L-integrable on $I$. If $a = c_0 < c_1 < \cdots < c_q = b$, then $f$ and $|f|$ are integrable on every subinterval $[c_{i-1}, c_i]$, and we have
$$\sum_{i=1}^{q} \left| \int_{c_{i-1}}^{c_i} f \right| \le \sum_{i=1}^{q} \int_{c_{i-1}}^{c_i} |f| = \int_I |f|\,.$$
Consequently, the set $\mathcal{S}$ is bounded from above: $\sup \mathcal{S} \le \int_I |f| < +\infty$.

Conversely, assume now that $\mathcal{S}$ is bounded from above, and let us prove that in that case $|f|$ is integrable on $I$ and $\int_I |f| = \sup \mathcal{S}$. Fix $\varepsilon > 0$. Let $\delta_1$ be a gauge such that, for every $\delta_1$-fine tagged partition $\mathring{P}$ of $I$, we have
$$\left| S(f, \mathring{P}) - \int_I f \right| \le \frac{\varepsilon}{8}\,.$$
On the other hand, letting $J = \sup \mathcal{S}$, by the properties of the supremum there surely are $a = c_0 < c_1 < \cdots < c_q = b$ such that
$$J - \frac{\varepsilon}{2} \le \sum_{i=1}^{q} \left| \int_{c_{i-1}}^{c_i} f \right| \le J\,.$$
We construct the gauge $\delta_2$ in such a way that, for every $x \in I$, the interval $[x - \delta_2(x), x + \delta_2(x)]$ meets only those intervals $[c_{i-1}, c_i]$ to which $x$ belongs. In this way:

• If $x$ belongs to the interior of one of the intervals $[c_{i-1}, c_i]$, then $[x - \delta_2(x), x + \delta_2(x)]$ is contained in $]c_{i-1}, c_i[\,$.
• If $x$ coincides with one of the $c_i$ in the interior of $[a, b]$, then $[x - \delta_2(x), x + \delta_2(x)]$ is contained in $]c_{i-1}, c_{i+1}[\,$.
• If $x = a$, then $[x, x + \delta_2(x)]$ is contained in $[a, c_1[\,$.
• If $x = b$, then $[x - \delta_2(x), x]$ is contained in $]c_{q-1}, b]$.

Define $\delta(x) = \min\{\delta_1(x), \delta_2(x)\}$ for every $x \in I$. Once we take a $\delta$-fine tagged partition $\mathring{P} = \{(x_1, [a_0, a_1]), \dots, (x_m, [a_{m-1}, a_m])\}$ of $I$, consider the intervals (possibly empty or reduced to a point)
$$I_{j,i} = [a_{j-1}, a_j] \cap [c_{i-1}, c_i]\,.$$


The choice of the gauge $\delta_2$ yields that, if $I_{j,i}$ has a positive length, then $x_j \in I_{j,i}$. Indeed, if $x_j \notin [c_{i-1}, c_i]$, then
$$[a_{j-1}, a_j] \cap [c_{i-1}, c_i] \subseteq [x_j - \delta_2(x_j), x_j + \delta_2(x_j)] \cap [c_{i-1}, c_i] = \varnothing\,.$$
Therefore, if we take those $I_{j,i}$, then the set
$$\mathring{Q} = \{(x_j, I_{j,i}) : j = 1, \dots, m\,,\; i = 1, \dots, q\,,\; \mu(I_{j,i}) > 0\}$$
is a $\delta$-fine tagged partition of $I$, and we have
$$S(|f|, \mathring{P}) = \sum_{j=1}^{m} |f(x_j)|(a_j - a_{j-1}) = \sum_{j=1}^{m} \sum_{i=1}^{q} |f(x_j)|\,\mu(I_{j,i}) = S(|f|, \mathring{Q})\,.$$
Moreover,
$$J - \frac{\varepsilon}{2} \le \sum_{i=1}^{q} \left| \int_{c_{i-1}}^{c_i} f \right| = \sum_{i=1}^{q} \left| \sum_{j=1}^{m} \int_{I_{j,i}} f \right| \le \sum_{i=1}^{q} \sum_{j=1}^{m} \left| \int_{I_{j,i}} f \right| \le J\,,$$
and by Corollary 9.2,
$$\left| S(|f|, \mathring{Q}) - \sum_{i=1}^{q} \sum_{j=1}^{m} \left| \int_{I_{j,i}} f \right| \right| \le 4\, \frac{\varepsilon}{8} = \frac{\varepsilon}{2}\,.$$
Consequently, we have
$$\bigl| S(|f|, \mathring{P}) - J \bigr| = \bigl| S(|f|, \mathring{Q}) - J \bigr| \le \left| S(|f|, \mathring{Q}) - \sum_{i=1}^{q} \sum_{j=1}^{m} \left| \int_{I_{j,i}} f \right| \right| + \left| \sum_{i=1}^{q} \sum_{j=1}^{m} \left| \int_{I_{j,i}} f \right| - J \right| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon\,,$$
which is what we wanted to prove. □



We have a series of corollaries.

Corollary 9.6 Let $f, g : I \to \mathbb{R}$ be two integrable functions such that, for every $x \in I$,
$$|f(x)| \le g(x)\,;$$
then $f$ is L-integrable on $I$.


Proof Take $c_0, c_1, \dots, c_q$ in $I$ so that $a = c_0 < c_1 < \cdots < c_q = b$. Since $-g(x) \le f(x) \le g(x)$ for every $x \in I$, we have that
$$-\int_{c_{i-1}}^{c_i} g \le \int_{c_{i-1}}^{c_i} f \le \int_{c_{i-1}}^{c_i} g\,,$$
i.e.,
$$\left| \int_{c_{i-1}}^{c_i} f \right| \le \int_{c_{i-1}}^{c_i} g\,,$$
for every $1 \le i \le q$. Hence,
$$\sum_{i=1}^{q} \left| \int_{c_{i-1}}^{c_i} f \right| \le \sum_{i=1}^{q} \int_{c_{i-1}}^{c_i} g = \int_I g\,.$$
Then the set $\mathcal{S}$ is bounded above by $\int_I g$, so that $f$ is L-integrable on $I$. □



Corollary 9.7 Let $f, g : I \to \mathbb{R}$ be two L-integrable functions, and let $\alpha \in \mathbb{R}$ be a constant. Then $f + g$ and $\alpha f$ are L-integrable on $I$.

Proof By assumption, $f, |f|$ and $g, |g|$ are integrable on $I$. Then $f + g$, $|f| + |g|$, $\alpha f$, and $|\alpha|\,|f|$ are, too. On the other hand, for every $x \in I$,
$$|(f + g)(x)| \le |f(x)| + |g(x)|\,, \qquad |\alpha f(x)| \le |\alpha|\, |f(x)|\,.$$
Corollary 9.6 then guarantees that $f + g$ and $\alpha f$ are L-integrable on $I$. □



We have thus proved that the L-integrable functions make up a vector subspace of the space of integrable functions.

Corollary 9.8 Let $f_1, f_2 : I \to \mathbb{R}$ be two L-integrable functions. Then $\min\{f_1, f_2\}$ and $\max\{f_1, f_2\}$ are L-integrable on $I$.

Proof It follows immediately from the formulas
$$\min\{f_1, f_2\} = \frac{1}{2}\bigl( f_1 + f_2 - |f_1 - f_2| \bigr)\,, \qquad \max\{f_1, f_2\} = \frac{1}{2}\bigl( f_1 + f_2 + |f_1 - f_2| \bigr)\,,$$
and from Corollary 9.7. □

Corollary 9.9 A function $f : I \to \mathbb{R}$ is L-integrable if and only if both its positive part $f^+ = \max\{f, 0\}$ and its negative part $f^- = \max\{-f, 0\}$ are integrable on $I$. In that case, $\int_I f = \int_I f^+ - \int_I f^-$.


Proof It follows immediately from Corollary 9.8 and the formulas $f = f^+ - f^-$, $|f| = f^+ + f^-$. □

We now want to see an example of an integrable function that is not L-integrable. Let $f : [0, 1] \to \mathbb{R}$ be defined by
$$f(x) = \begin{cases} \dfrac{1}{x} \sin\Bigl(\dfrac{1}{x^2}\Bigr) & \text{if } x \ne 0\,,\\[4pt] 0 & \text{if } x = 0\,. \end{cases}$$
Let us define the two auxiliary functions $g : [0, 1] \to \mathbb{R}$ and $h : [0, 1] \to \mathbb{R}$ as
$$g(x) = \begin{cases} \dfrac{1}{x} \sin\Bigl(\dfrac{1}{x^2}\Bigr) + x \cos\Bigl(\dfrac{1}{x^2}\Bigr) & \text{if } x \ne 0\,,\\[4pt] 0 & \text{if } x = 0\,, \end{cases} \qquad h(x) = \begin{cases} -x \cos\Bigl(\dfrac{1}{x^2}\Bigr) & \text{if } x \ne 0\,,\\[4pt] 0 & \text{if } x = 0\,. \end{cases}$$
It is easily seen that $g$ is primitivable on $[0, 1]$ and that one of its primitives $G : [0, 1] \to \mathbb{R}$ is given by
$$G(x) = \begin{cases} \dfrac{x^2}{2} \cos\Bigl(\dfrac{1}{x^2}\Bigr) & \text{if } x \ne 0\,,\\[4pt] 0 & \text{if } x = 0\,. \end{cases}$$
Moreover, $h$ is continuous on $[0, 1]$, so it is primitivable there, too. Hence, even the function $f = g + h$ is primitivable on $[0, 1]$. By the Fundamental Theorem, $f$ is then integrable on $[0, 1]$.

We will show now that $|f|$ is not integrable on $[0, 1]$. Consider the intervals $[((k+1)\pi)^{-1/2}, (k\pi)^{-1/2}]$, with $k \ge 1$. The function $|f|$ is continuous on these intervals, so it is primitivable there. By the substitution $y = 1/x^2$, we obtain
$$\int_{((k+1)\pi)^{-1/2}}^{(k\pi)^{-1/2}} \frac{1}{x} \left| \sin\Bigl(\frac{1}{x^2}\Bigr) \right| dx = \int_{k\pi}^{(k+1)\pi} \frac{1}{2y}\, |\sin y|\, dy\,.$$
On the other hand,
$$\int_{k\pi}^{(k+1)\pi} \frac{1}{2y}\, |\sin y|\, dy \ge \frac{1}{2(k+1)\pi} \int_{k\pi}^{(k+1)\pi} |\sin y|\, dy = \frac{1}{(k+1)\pi}\,.$$


If $|f|$ were integrable on $[0, 1]$, we would have that, for every $n \ge 1$,
$$\int_0^1 |f| = \int_0^{((n+1)\pi)^{-1/2}} |f| + \sum_{k=1}^{n} \int_{((k+1)\pi)^{-1/2}}^{(k\pi)^{-1/2}} |f| + \int_{\pi^{-1/2}}^{1} |f| \ge \sum_{k=1}^{n} \frac{1}{(k+1)\pi}\,,$$
which is impossible since the series $\sum_{k=1}^{\infty} \frac{1}{k+1}$ diverges. Hence, $f$ is not L-integrable on $[0, 1]$.

9.3 Monotone Convergence Theorem

In this section and the next, we will consider the situation where a sequence of integrable functions $(f_n)_n$ converges pointwise to a function $f$, i.e., for every $x \in I$,
$$\lim_n f_n(x) = f(x)\,.$$
The question is whether $f$ is integrable on $I$, with
$$\int_I f = \lim_n \int_I f_n\,,$$
i.e., whether the following formula holds:
$$\int_I \lim_n f_n = \lim_n \int_I f_n\,.$$
This problem has already been faced in Theorem 7.24, involving continuous functions and uniform convergence. We will see now that the formula holds true if the sequence of functions is monotone. Let us state the following result, due to Beppo Levi.

Theorem 9.10 (Monotone Convergence Theorem—I) We are given a function $f : I \to \mathbb{R}$ and a sequence of functions $f_n : I \to \mathbb{R}$, with $n \in \mathbb{N}$, verifying the following conditions:

(a) The sequence $(f_n)_n$ converges pointwise to $f$.
(b) The sequence $(f_n)_n$ is monotone.
(c) Each function $f_n$ is integrable on $I$.
(d) The real sequence $\bigl( \int_I f_n \bigr)_n$ has a finite limit.


Then $f$ is integrable on $I$, and
$$\int_I f = \lim_n \int_I f_n\,.$$
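A standard illustration (not taken from the text) uses the increasing sequence $f_n(x) = \min\{n, 1/\sqrt{x}\}$ on $[0, 1]$, whose exact integrals $2 - \frac{1}{n}$ increase to $2$, the integral of the limit function $1/\sqrt{x}$. The midpoint rule below is only a crude numerical stand-in for those integrals:

```python
import math

def midpoint_sum(f, a, b, n=200000):
    """Crude midpoint Riemann sum; adequate for this bounded integrand."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

for n in (1, 2, 10, 100):
    approx = midpoint_sum(lambda x: min(n, 1.0 / math.sqrt(x)), 0.0, 1.0)
    print(n, approx)  # should be close to the exact value 2 - 1/n
```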

Proof We assume for definiteness that the sequence $(f_n)_n$ is increasing, i.e.,
$$f_n(x) \le f_{n+1}(x) \le f(x)\,,$$
for every $n \in \mathbb{N}$ and every $x \in I$. Let us set
$$J = \lim_n \int_I f_n\,.$$
We will prove that $f$ is integrable on $I$ and that $J$ is its integral. Fix $\varepsilon > 0$. Since every $f_n$ is integrable on $I$, there are some gauges $\delta_n^*$ on $I$ such that, if $\mathring{P}_n$ is a $\delta_n^*$-fine tagged partition of $I$, then
$$\left| S(f_n, \mathring{P}_n) - \int_I f_n \right| \le \frac{\varepsilon}{3 \cdot 2^{n+3}}\,.$$
Moreover, there is an $\bar{n} \in \mathbb{N}$ such that, for every $n \ge \bar{n}$,
$$0 \le J - \int_I f_n \le \frac{\varepsilon}{3}\,,$$
and since the sequence $(f_n)_n$ converges pointwise on $I$ to $f$, for every $x \in I$ there is a natural number $n(x) \ge \bar{n}$ such that, for every $n \ge n(x)$, one has
$$|f_n(x) - f(x)| \le \frac{\varepsilon}{3(b - a)}\,.$$
Let us define the gauge $\delta$ in the following way. For every $x \in I$,
$$\delta(x) = \delta_{n(x)}^*(x)\,.$$
Now let $\mathring{P} = \{(x_1, [a_0, a_1]), \dots, (x_m, [a_{m-1}, a_m])\}$ be a $\delta$-fine tagged partition of $I$. We have
$$\bigl| S(f, \mathring{P}) - J \bigr| = \left| \sum_{j=1}^{m} f(x_j)(a_j - a_{j-1}) - J \right| \le \left| \sum_{j=1}^{m} [f(x_j) - f_{n(x_j)}(x_j)](a_j - a_{j-1}) \right|$$


$$+ \left| \sum_{j=1}^{m} \left( f_{n(x_j)}(x_j)(a_j - a_{j-1}) - \int_{a_{j-1}}^{a_j} f_{n(x_j)} \right) \right| + \left| \sum_{j=1}^{m} \int_{a_{j-1}}^{a_j} f_{n(x_j)} - J \right|.$$
Estimation of the first term gives
$$\left| \sum_{j=1}^{m} [f(x_j) - f_{n(x_j)}(x_j)](a_j - a_{j-1}) \right| \le \sum_{j=1}^{m} |f(x_j) - f_{n(x_j)}(x_j)|\,(a_j - a_{j-1}) \le \sum_{j=1}^{m} \frac{\varepsilon}{3(b - a)}\,(a_j - a_{j-1}) = \frac{\varepsilon}{3}\,.$$
To estimate the second term, set
$$r = \min_{1 \le j \le m} n(x_j)\,, \qquad s = \max_{1 \le j \le m} n(x_j)\,,$$
and note that, putting together the terms whose indices $n(x_j)$ coincide with the same value $k$, by the second statement of the Saks–Henstock Theorem 9.3 we obtain
$$\left| \sum_{j=1}^{m} \left( f_{n(x_j)}(x_j)(a_j - a_{j-1}) - \int_{a_{j-1}}^{a_j} f_{n(x_j)} \right) \right| = \left| \sum_{k=r}^{s} \sum_{\{1 \le j \le m \,:\, n(x_j) = k\}} \left( f_k(x_j)(a_j - a_{j-1}) - \int_{a_{j-1}}^{a_j} f_k \right) \right|$$
$$\le \sum_{k=r}^{s} \left| \sum_{\{1 \le j \le m \,:\, n(x_j) = k\}} \left( f_k(x_j)(a_j - a_{j-1}) - \int_{a_{j-1}}^{a_j} f_k \right) \right| \le \sum_{k=r}^{s} 4\, \frac{\varepsilon}{3 \cdot 2^{k+3}} \le \frac{\varepsilon}{3}\,.$$


Concerning the third term, since $r \ge \bar{n}$, using the monotonicity of the sequence $(f_n)_n$ we have
$$0 \le J - \int_I f_s = J - \sum_{j=1}^{m} \int_{a_{j-1}}^{a_j} f_s \le J - \sum_{j=1}^{m} \int_{a_{j-1}}^{a_j} f_{n(x_j)} \le J - \sum_{j=1}^{m} \int_{a_{j-1}}^{a_j} f_r = J - \int_I f_r \le \frac{\varepsilon}{3}\,,$$
from which
$$\left| \sum_{j=1}^{m} \int_{a_{j-1}}^{a_j} f_{n(x_j)} - J \right| \le \frac{\varepsilon}{3}\,.$$
Hence,
$$\bigl| S(f, \mathring{P}) - J \bigr| \le \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon\,,$$
and the proof is thus completed. □

As an immediate consequence of the Monotone Convergence Theorem 9.10, we have an analogous statement for a series of functions.

Corollary 9.11 We are given a function $f : I \to \mathbb{R}$ and a sequence of functions $f_k : I \to \mathbb{R}$, with $k \in \mathbb{N}$, verifying the following conditions:

(a) The series $\sum_k f_k$ converges pointwise to $f$.
(b) For every $k \in \mathbb{N}$ and every $x \in I$, we have $f_k(x) \ge 0$.
(c) Each function $f_k$ is integrable on $I$.
(d) The series $\sum_k \bigl( \int_I f_k \bigr)$ converges.

Then $f$ is integrable on $I$, and
$$\int_I f = \sum_k \int_I f_k\,.$$


We can then write
$$\int_I \left( \sum_k f_k \right) = \sum_k \int_I f_k\,.$$

Example Consider the Taylor series associated with the function $f(x) = e^{x^2}$,
$$e^{x^2} = \sum_{k=0}^{\infty} \frac{x^{2k}}{k!}\,.$$
The functions $f_k(x) = \frac{x^{2k}}{k!}$ satisfy the first three assumptions of Corollary 9.11, with $I = [a, b]$ and
$$\int_a^b f_k(x)\, dx = \left[ \frac{x^{2k+1}}{(2k+1)\,k!} \right]_a^b = \frac{b^{2k+1} - a^{2k+1}}{(2k+1)\,k!}\,,$$
so it can be seen that the series $\sum_k \bigl( \int_I f_k \bigr)$ converges. It is then possible to apply the corollary, thereby obtaining
$$\int_a^b e^{x^2}\, dx = \sum_{k=0}^{\infty} \frac{b^{2k+1} - a^{2k+1}}{(2k+1)\,k!}\,.$$
In particular, considering the integral function $\int_0^{\,\cdot} f$, we find an expression for the primitives of $e^{x^2}$, i.e.,
$$\int e^{x^2}\, dx = \sum_{k=0}^{\infty} \frac{x^{2k+1}}{(2k+1)\,k!} + c\,.$$

9.4 Dominated Convergence Theorem

Dominated Convergence Theorem

We start by proving the following preliminary result. Lemma 9.12 Let .f1 , f2 , . . . , fn : I → R be integrable functions. If there exists an integrable function .g : I → R such that g(x) ≤ fk (x) ,

.

for every x ∈ I and k ∈ {1, . . . , n} ,

then .min{f1 , f2 , . . . , fn } and .max{f1 , f2 , . . . , fn } are integrable on I .

246

9 More on the Integral

Proof Consider the case .n = 2. The functions .f1 − g and .f2 − g, being integrable and nonnegative, are L-integrable. Hence, .min{f1 −g, f2 −g} and .max{f1 −g, f2 − g} are L-integrable, by Corollary 9.8. The conclusion then follows from the fact that .

min{f1 , f2 } = min{f1 − g, f2 − g} + g , max{f1 , f2 } = max{f1 − g, f2 − g} + g .

The general case can be easily obtained by induction.



We are now ready to state and prove the following important extension of Theorem 7.24, due to Henri Lebesgue.

Theorem 9.13 (Dominated Convergence Theorem—I) We are given a function $f : I \to \mathbb{R}$ and a sequence of functions $f_n : I \to \mathbb{R}$, with $n \in \mathbb{N}$, verifying the following conditions:

(a) The sequence $(f_n)_n$ converges pointwise to $f$.
(b) Each function $f_n$ is integrable on $I$.
(c) There are two integrable functions $g, h : I \to \mathbb{R}$ for which
$$g(x) \le f_n(x) \le h(x) \quad \text{for every } n \in \mathbb{N} \text{ and } x \in I\,.$$

Then the sequence $\big(\int_I f_n\big)_n$ has a finite limit, $f$ is integrable on $I$, and
$$\int_I f = \lim_n \int_I f_n\,.$$

Proof For any pair of natural numbers $n, \ell$, define the functions
$$\phi_{n,\ell} = \min\{f_n, f_{n+1}, \ldots, f_{n+\ell}\}\,, \qquad \Phi_{n,\ell} = \max\{f_n, f_{n+1}, \ldots, f_{n+\ell}\}\,.$$
By Lemma 9.12, all $\phi_{n,\ell}$ and $\Phi_{n,\ell}$ are integrable on $I$. Moreover, for any fixed $n$, the sequence $(\phi_{n,\ell})_\ell$ is decreasing and bounded from below by $g$, and the sequence $(\Phi_{n,\ell})_\ell$ is increasing and bounded from above by $h$. Hence, these sequences converge pointwise to two functions $\phi_n$ and $\Phi_n$, respectively:
$$\lim_\ell \phi_{n,\ell} = \phi_n = \inf\{f_n, f_{n+1}, \ldots\}\,, \qquad \lim_\ell \Phi_{n,\ell} = \Phi_n = \sup\{f_n, f_{n+1}, \ldots\}\,.$$
Furthermore, the sequence $\big(\int_I \phi_{n,\ell}\big)_\ell$ is decreasing and bounded from below by $\int_I g$, whereas the sequence $\big(\int_I \Phi_{n,\ell}\big)_\ell$ is increasing and bounded from above by $\int_I h$. The Monotone Convergence Theorem 9.10 then guarantees that the functions $\phi_n$ and $\Phi_n$ are integrable on $I$.

Now the sequence $(\phi_n)_n$ is increasing, and the sequence $(\Phi_n)_n$ is decreasing; as $\lim_n f_n = f$, we must have
$$\lim_n \phi_n = \liminf_n f_n = f\,, \qquad \lim_n \Phi_n = \limsup_n f_n = f\,.$$
Moreover, the sequence $\big(\int_I \phi_n\big)_n$ is increasing and bounded from above by $\int_I h$, whereas the sequence $\big(\int_I \Phi_n\big)_n$ is decreasing and bounded from below by $\int_I g$. We can then apply the Monotone Convergence Theorem 9.10 again, from which we deduce that $f$ is integrable on $I$ and
$$\int_I f = \lim_n \int_I \phi_n = \lim_n \int_I \Phi_n\,.$$
Since $\phi_n \le f_n \le \Phi_n$, we have $\int_I \phi_n \le \int_I f_n \le \int_I \Phi_n$, and the conclusion follows by the Squeeze Theorem 3.10. $\square$

Example Consider, for $n \ge 1$, the functions $f_n : [0,3] \to \mathbb{R}$ defined by $f_n(x) = \arctan\big(nx - \frac{n^2}{n+1}\big)$. We have the following situation:
$$\lim_n f_n(x) = \begin{cases} -\dfrac{\pi}{2} & \text{if } x \in [0,1[\,, \\[1mm] \dfrac{\pi}{4} & \text{if } x = 1\,, \\[1mm] \dfrac{\pi}{2} & \text{if } x \in\, ]1,3]\,. \end{cases}$$
Moreover,
$$|f_n(x)| \le \frac{\pi}{2}\,, \quad \text{for every } n \in \mathbb{N} \text{ and } x \in [0,3]\,.$$
The assumptions of the Dominated Convergence Theorem 9.13 are then satisfied, taking the two constant functions $g(x) = -\frac{\pi}{2}$, $h(x) = \frac{\pi}{2}$. We can then conclude that
$$\lim_n \int_0^3 \arctan\Big(nx - \frac{n^2}{n+1}\Big)\,dx = -\frac{\pi}{2} + 2\,\frac{\pi}{2} = \frac{\pi}{2}\,.$$
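A quick numerical illustration of this limit (the quadrature routine below is our own rough sketch, not part of the text): the integrals $\int_0^3 f_n$ indeed approach $\pi/2$ as $n$ grows.

```python
import math

def f(n, x):
    # f_n(x) = arctan(n x - n^2/(n+1)), the sequence from the example
    return math.atan(n * x - n * n / (n + 1))

def simpson(g, a, b, m=20000):
    # Composite Simpson rule with m (even) subintervals.
    h = (b - a) / m
    s = g(a) + g(b)
    s += 4 * sum(g(a + (2*i - 1) * h) for i in range(1, m // 2 + 1))
    s += 2 * sum(g(a + 2*i * h) for i in range(1, m // 2))
    return s * h / 3

for n in (10, 100, 1000):
    print(n, simpson(lambda x: f(n, x), 0.0, 3.0))
# the printed values approach pi/2 ≈ 1.5708 as n grows
```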

9.5 Hake's Theorem

Recall that a function $f : [a,b[ \to \mathbb{R}$ is said to be integrable if it is integrable on $[a,c]$ for every $c \in\, ]a,b[\,$, and the limit
$$\lim_{c \to b^-} \int_a^c f$$
exists and is finite. We want to prove the following result by Heinrich Hake.


Theorem 9.14 (Hake's Theorem) Let $b < +\infty$, and assume that $f : [a,b[ \to \mathbb{R}$ is a function that is integrable on $[a,c]$ for every $c \in\, ]a,b[\,$. Then the function $f$ is integrable on $[a,b[$ if and only if it is the restriction of an integrable function $\bar f : [a,b] \to \mathbb{R}$. In that case,
$$\int_a^b \bar f = \int_a^b f\,.$$

Proof Assume first that $f$ is the restriction to $[a,b[$ of an integrable function $\bar f : [a,b] \to \mathbb{R}$. Fix $\varepsilon > 0$; we want to find a $\gamma > 0$ such that, if $c \in\, ]a,b[$ and $b - c \le \gamma$, then
$$\left| \int_a^c f - \int_a^b \bar f \right| \le \varepsilon\,.$$
Let $\delta$ be a gauge such that, for every $\delta$-fine tagged partition $\mathring{\mathcal P}$ of $[a,b]$, we have $\big|S(\bar f, \mathring{\mathcal P}) - \int_a^b \bar f\big| \le \frac{\varepsilon}{8}$. We choose a positive constant $\gamma \le \delta(b)$ such that $\gamma\,|\bar f(b)| \le \frac{\varepsilon}{2}$. If $c \in\, ]a,b[$ and $b - c \le \gamma$, then, taking the $\delta$-fine tagged subpartition $\mathring{\mathcal P} = \{(b, [c,b])\}$, by the Saks–Henstock Theorem 9.1 we have
$$\left| \bar f(b)(b-c) - \int_c^b \bar f \right| \le 4\,\frac{\varepsilon}{8} = \frac{\varepsilon}{2}\,,$$
and hence
$$\left| \int_a^c f - \int_a^b \bar f \right| = \left| \int_c^b \bar f \right| \le \left| \int_c^b \bar f - \bar f(b)(b-c) \right| + |\bar f(b)(b-c)| \le \frac{\varepsilon}{2} + |\bar f(b)|\,\gamma \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon\,.$$

Let us now prove the other implication. Assume that $f$ is integrable on $[a,b[\,$, and let $J$ be its integral, i.e.,
$$J = \lim_{c \to b^-} \int_a^c f\,.$$
We extend $f$ to a function $\bar f$ defined on the whole interval $[a,b]$ by setting, for instance, $\bar f(b) = 0$. To prove that $\bar f$ is integrable on $[a,b]$ with integral $J$, fix $\varepsilon > 0$. By the preceding limit, there is a $\gamma > 0$ such that, if $c \in\, ]a,b[$ and $b - c \le \gamma$, then
$$\left| \int_a^c f - J \right| \le \frac{\varepsilon}{2}\,.$$


Consider the sequence $(c_i)_i$ of points in $[a,b[$ given by
$$c_i = b - \frac{b-a}{i+1}\,.$$
Note that it is strictly increasing, it converges to $b$, and $c_0 = a$. Since $f$ is integrable on each interval $[c_{i-1}, c_i]$, we can consider, for each $i \ge 1$, a gauge $\delta_i$ on $[c_{i-1}, c_i]$ such that, for every $\delta_i$-fine tagged partition $\mathring{\mathcal P}_i$ of $[c_{i-1}, c_i]$, we have
$$\left| S(f, \mathring{\mathcal P}_i) - \int_{c_{i-1}}^{c_i} f \right| \le \frac{\varepsilon}{2^{i+4}}\,.$$
We define a gauge $\delta$ on $[a,b]$ by setting
$$\delta(x) = \begin{cases} \min\Big\{ \delta_i(x)\,,\ \dfrac{x - c_{i-1}}{2}\,,\ \dfrac{c_i - x}{2} \Big\} & \text{if } x \in\, ]c_{i-1}, c_i[\,, \\[2mm] \min\Big\{ \delta_1(a)\,,\ \dfrac{c_1 - a}{2} \Big\} & \text{if } x = a\,, \\[2mm] \min\Big\{ \delta_i(c_i)\,,\ \delta_{i+1}(c_i)\,,\ \dfrac{c_i - c_{i-1}}{2}\,,\ \dfrac{c_{i+1} - c_i}{2} \Big\} & \text{if } x = c_i \text{ and } i \ge 1\,, \\[2mm] \gamma & \text{if } x = b\,. \end{cases}$$
Let $\mathring{\mathcal P} = \{(x_j, [a_{j-1}, a_j]) : j = 1, \ldots, m\}$ be a $\delta$-fine tagged partition of $[a,b]$. Denote by $q$ the smallest integer for which $c_{q+1} \ge a_{m-1}$. The choice of the gauge allows us to split the Riemann sum, much like in the proof of Theorem 7.18 on the additivity of the integral on subintervals, so that the sum $S(\bar f, \mathring{\mathcal P})$ will contain

• the Riemann sums on $[c_{i-1}, c_i]$, with $i = 1, \ldots, q$;
• a Riemann sum on $[c_q, a_{m-1}]$;
• a last term $\bar f(x_m)(b - a_{m-1})$.

(The first line disappears if $q = 0$.) Let $\mathring{\mathcal P}_i$ be the tagged partition of $[c_{i-1}, c_i]$ and $\mathring{\mathcal Q}$ be the tagged partition of $[c_q, a_{m-1}]$ whose intervals are those of $\mathring{\mathcal P}$. Then
$$S(\bar f, \mathring{\mathcal P}) = \sum_{i=1}^{q} S(f, \mathring{\mathcal P}_i) + S(f, \mathring{\mathcal Q}) + \bar f(x_m)(b - a_{m-1})\,.$$
To better clarify what was just said, assume, for example, that $q = 2$; then there must be a $\bar j_1$ for which $x_{\bar j_1} = c_1$ and a $\bar j_2$ for which $x_{\bar j_2} = c_2$. Then
$$\begin{aligned} S(\bar f, \mathring{\mathcal P}) = {} & [f(x_1)(a_1 - a) + \cdots + f(x_{\bar j_1 - 1})(a_{\bar j_1 - 1} - a_{\bar j_1 - 2}) + f(c_1)(c_1 - a_{\bar j_1 - 1})] \\ & + [f(c_1)(a_{\bar j_1} - c_1) + \cdots + f(x_{\bar j_2 - 1})(a_{\bar j_2 - 1} - a_{\bar j_2 - 2}) + f(c_2)(c_2 - a_{\bar j_2 - 1})] \\ & + [f(c_2)(a_{\bar j_2} - c_2) + \cdots + f(x_{m-1})(a_{m-1} - a_{m-2})] + \bar f(x_m)(b - a_{m-1})\,. \end{aligned}$$

Note that $\mathring{\mathcal P}_i$ is a $\delta_i$-fine tagged partition of $[c_{i-1}, c_i]$ and that $\mathring{\mathcal Q}$ is a $\delta_{q+1}$-fine tagged subpartition of $[c_q, c_{q+1}]$. Moreover, by the choice of the gauge $\delta$, it must be that $x_m = b$ and, hence, $\bar f(x_m) = 0$ and $b - a_{m-1} \le \delta(b) = \gamma$. Using the fact that
$$\int_a^{a_{m-1}} f = \sum_{i=1}^{q} \int_{c_{i-1}}^{c_i} f + \int_{c_q}^{a_{m-1}} f\,,$$
by the Saks–Henstock Theorem 9.3 we have
$$\begin{aligned} |S(\bar f, \mathring{\mathcal P}) - J| &\le \left| S(\bar f, \mathring{\mathcal P}) - \int_a^{a_{m-1}} f \right| + \left| \int_a^{a_{m-1}} f - J \right| \\ &\le \sum_{i=1}^{q} \left| S(f, \mathring{\mathcal P}_i) - \int_{c_{i-1}}^{c_i} f \right| + \left| S(f, \mathring{\mathcal Q}) - \int_{c_q}^{a_{m-1}} f \right| + \left| \int_a^{a_{m-1}} f - J \right| \\ &\le \sum_{i=1}^{q} \frac{\varepsilon}{2^{i+4}} + 4\,\frac{\varepsilon}{2^{q+5}} + \frac{\varepsilon}{2} \\ &\le \frac{\varepsilon}{8} + \frac{\varepsilon}{4} + \frac{\varepsilon}{2} < \varepsilon\,, \end{aligned}$$
and the proof is thus completed. $\square$

The above theorem suggests that even for a function $f : [a,+\infty[ \to \mathbb{R}$ the definition of the integral could be reduced to that of a usual integral. Indeed, fixing arbitrarily $b > a$, we could define a continuously differentiable, strictly increasing auxiliary function $\varphi : [a,b[ \to \mathbb{R}$ such that $\varphi(a) = a$ and $\lim_{u \to b^-} \varphi(u) = +\infty$; for example, take $\varphi(u) = a + \ln\frac{b-a}{b-u}$. A formal change of variables then gives
$$\int_a^{+\infty} f(x)\,dx = \int_a^b f(\varphi(u))\,\varphi'(u)\,du\,,$$
and Hake's theorem applies to this last integral. With this idea in mind, it is possible to prove that $f : [a,+\infty[ \to \mathbb{R}$ is integrable with integral a real number $J$ if and only if for every $\varepsilon > 0$ there are a gauge $\delta$, defined on $[a,+\infty[\,$, and a positive constant $\alpha$ such that, if
$$a = a_0 < a_1 < \cdots < a_{m-1}\,, \quad \text{with} \quad a_{m-1} \ge \alpha\,,$$
and, for every $j = 1, \ldots, m-1$, the points $x_j \in [a_{j-1}, a_j]$ satisfy
$$x_j - a_{j-1} \le \delta(x_j) \quad \text{and} \quad a_j - x_j \le \delta(x_j)\,,$$
then
$$\left| \sum_{j=1}^{m-1} f(x_j)(a_j - a_{j-1}) - J \right| \le \varepsilon\,.$$
We refer to Bartle's book [1] for a complete treatment of this case. Needless to say, similar considerations can be made in the case where the function $f$ is defined on an interval of the type $]a,b]$, with $a \ge -\infty$.
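The change of variables above is easy to try out numerically. In the Python sketch below (function, interval, and quadrature routine are our own choices, not from the text), we take $a = 0$, $b = 1$, $\varphi(u) = -\ln(1-u)$, and $f(x) = e^{-x^2}$, for which $\int_0^{+\infty} f = \sqrt{\pi}/2$ is known:

```python
import math

def phi(u, a, b):
    # phi(u) = a + ln((b-a)/(b-u)): maps [a, b[ onto [a, +infinity[
    return a + math.log((b - a) / (b - u))

def dphi(u, a, b):
    # phi'(u) = 1/(b-u)
    return 1.0 / (b - u)

def simpson(g, a, b, m=20000):
    # Composite Simpson rule with m (even) subintervals.
    h = (b - a) / m
    s = g(a) + g(b)
    s += 4 * sum(g(a + (2*i - 1) * h) for i in range(1, m // 2 + 1))
    s += 2 * sum(g(a + 2*i * h) for i in range(1, m // 2))
    return s * h / 3

f = lambda x: math.exp(-x * x)          # integrable on [0, +infinity[
a, b = 0.0, 1.0

def g(u):
    # transformed integrand f(phi(u)) phi'(u), extended by 0 at u = b,
    # where it tends to 0 thanks to the fast decay of f
    return 0.0 if u >= b else f(phi(u, a, b)) * dphi(u, a, b)

value = simpson(g, a, b)
print(value)  # ≈ sqrt(pi)/2 ≈ 0.886227
```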

Part IV Differential and Integral Calculus in $\mathbb{R}^N$

10

The Differential

Let $\mathcal O \subseteq \mathbb{R}^N$ be an open set, $x_0$ a point of $\mathcal O$, and $f : \mathcal O \to \mathbb{R}^M$ a given function. We want to extend the notion of derivative of $f$ at $x_0$, already known in the case $M = N = 1$. The definition, inspired by Theorem 6.2, follows.

Definition 10.1 We say that $f$ is "differentiable" at $x_0$ if there exists a linear function $\ell : \mathbb{R}^N \to \mathbb{R}^M$ for which we can write
$$f(x) = f(x_0) + \ell(x - x_0) + r(x)\,,$$
where $r$ is a function satisfying
$$\lim_{x \to x_0} \frac{r(x)}{\|x - x_0\|} = 0\,.$$
If $f$ is differentiable at $x_0$, then the linear function $\ell$ is called the "differential" of $f$ at $x_0$ and is denoted by
$$df(x_0)\,.$$

Following the tradition for linear functions, taking $h \in \mathbb{R}^N$, we will often write $df(x_0)h$ instead of $df(x_0)(h)$. Assuming that $\mathcal O$ is an open set is not really necessary, but it guarantees the uniqueness of the differential and simplifies many issues. In what follows, however, we will sometimes encounter situations where the domain is not open. More care will be needed in those cases. We will now first concentrate for a while on the simpler case $M = 1$.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Fonda, A Modern Introduction to Mathematical Analysis, https://doi.org/10.1007/978-3-031-23713-3_10


10.1 The Differential of a Scalar-Valued Function

Assume, for simplicity, that $M = 1$. We start by fixing a "direction," i.e., a vector $v \in \mathbb{R}^N$ with $\|v\| = 1$, also called a "unit vector." Whenever it exists, we call the "directional derivative" of $f$ at $x_0$ in the direction $v$ the limit
$$\lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t}\,,$$
which will be denoted by
$$\frac{\partial f}{\partial v}(x_0)\,.$$
If $v$ coincides with an element $e_j$ of the canonical basis $(e_1, e_2, \ldots, e_N)$ of $\mathbb{R}^N$, the directional derivative is called the $j$th "partial derivative" of $f$ at $x_0$ and is denoted by
$$\frac{\partial f}{\partial x_j}(x_0)\,.$$
If $x_0 = (x_1^0, x_2^0, \ldots, x_N^0)$, then
$$\frac{\partial f}{\partial x_j}(x_0) = \lim_{t \to 0} \frac{f(x_0 + t e_j) - f(x_0)}{t} = \lim_{t \to 0} \frac{f(x_1^0, x_2^0, \ldots, x_j^0 + t, \ldots, x_N^0) - f(x_1^0, x_2^0, \ldots, x_j^0, \ldots, x_N^0)}{t}\,,$$
so that it is commonly called the "partial derivative with respect to the $j$th variable."

The following theorem shows, among other things, that the differential is unique.

Theorem 10.2 If $f$ is differentiable at $x_0$, then $f$ is continuous at $x_0$. Moreover, all the directional derivatives of $f$ at $x_0$ exist: for every direction $v \in \mathbb{R}^N$ we have
$$\frac{\partial f}{\partial v}(x_0) = df(x_0)v\,.$$

Proof We know that the function $\ell = df(x_0)$, being linear, is continuous, and $\ell(0) = 0$. Then
$$\begin{aligned} \lim_{x \to x_0} f(x) &= \lim_{x \to x_0}\, [f(x_0) + \ell(x - x_0) + r(x)] \\ &= f(x_0) + \ell(0) + \lim_{x \to x_0} r(x) \\ &= f(x_0) + \lim_{x \to x_0} \frac{r(x)}{\|x - x_0\|}\, \lim_{x \to x_0} \|x - x_0\| \\ &= f(x_0)\,, \end{aligned}$$
showing that $f$ is continuous at $x_0$. Concerning the directional derivatives, we have
$$\begin{aligned} \lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t} &= \lim_{t \to 0} \frac{df(x_0)(tv) + r(x_0 + tv)}{t} \\ &= \lim_{t \to 0} \frac{t\, df(x_0)v + r(x_0 + tv)}{t} \\ &= df(x_0)v + \lim_{t \to 0} \frac{r(x_0 + tv)}{t}\,. \end{aligned}$$
On the other hand, since $\|v\| = 1$, the change of variables formula (3.1) gives us
$$\lim_{t \to 0} \left| \frac{r(x_0 + tv)}{t} \right| = \lim_{x \to x_0} \frac{|r(x)|}{\|x - x_0\|} = 0\,,$$
whence the conclusion. $\square$

In particular, if $v$ coincides with an element $e_j$ of the canonical basis $(e_1, e_2, \ldots, e_N)$, then
$$\frac{\partial f}{\partial x_j}(x_0) = df(x_0)e_j\,.$$
Writing the vector $h \in \mathbb{R}^N$ as $h = h_1 e_1 + h_2 e_2 + \cdots + h_N e_N$, by linearity we have
$$df(x_0)h = h_1\, df(x_0)e_1 + h_2\, df(x_0)e_2 + \cdots + h_N\, df(x_0)e_N = h_1 \frac{\partial f}{\partial x_1}(x_0) + h_2 \frac{\partial f}{\partial x_2}(x_0) + \cdots + h_N \frac{\partial f}{\partial x_N}(x_0)\,,$$
i.e.,
$$df(x_0)h = \sum_{j=1}^{N} \frac{\partial f}{\partial x_j}(x_0)\,h_j\,.$$
If we define the "gradient" of $f$ at $x_0$ as the vector
$$\nabla f(x_0) = \left( \frac{\partial f}{\partial x_1}(x_0)\,, \ldots, \frac{\partial f}{\partial x_N}(x_0) \right),$$
we can then write
$$df(x_0)h = \nabla f(x_0) \cdot h\,.$$
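The identity $\frac{\partial f}{\partial v}(x_0) = \nabla f(x_0) \cdot v$ can be checked numerically. In the sketch below (the sample function, point, and direction are ours, for illustration), a central finite difference along $v$ is compared with the dot product of the hand-computed gradient and $v$:

```python
import math

def f(x, y):
    # a sample smooth function (ours, for illustration)
    return math.exp(x) * math.sin(y) + x * y

def grad_f(x, y):
    # hand-computed gradient of f
    return (math.exp(x) * math.sin(y) + y, math.exp(x) * math.cos(y) + x)

def directional_derivative(fn, p, v, t=1e-6):
    # central-difference approximation of  d/dt fn(p + t v)  at t = 0
    (x, y), (vx, vy) = p, v
    return (fn(x + t*vx, y + t*vy) - fn(x - t*vx, y - t*vy)) / (2*t)

p = (0.3, 1.2)
v = (3/5, 4/5)                      # a unit vector
gx, gy = grad_f(*p)
print(directional_derivative(f, p, v), gx*v[0] + gy*v[1])  # the two values agree
```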

Remark 10.3 The mere existence of the directional derivatives at some point $x_0$ does not guarantee differentiability there. For example, the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined as
$$f(x,y) = \begin{cases} \dfrac{x^4 y^2}{(x^4 + y^2)^2} & \text{if } (x,y) \ne (0,0)\,, \\[1mm] 0 & \text{if } (x,y) = (0,0)\,, \end{cases}$$
has all its directional derivatives at $x_0 = (0,0)$ equal to $0$. However, it is not even continuous there, since its restriction to the parabola $\{(x,y) : y = x^2\}$ is constantly equal to $\frac14$.

Here is a result showing that the existence of the partial derivatives is sufficient for differentiability, provided that they are continuous.

Theorem 10.4 If $f$ has partial derivatives defined in a neighborhood of $x_0$, and they are continuous at $x_0$, then $f$ is differentiable at $x_0$.

Proof To simplify the notations, we will assume that $N = 2$. We define the function $\ell : \mathbb{R}^2 \to \mathbb{R}$ associating to every vector $h = (h_1, h_2)$ the real number
$$\ell(h) = \frac{\partial f}{\partial x_1}(x_0)\,h_1 + \frac{\partial f}{\partial x_2}(x_0)\,h_2\,.$$

We will prove that $\ell$ is indeed the differential of $f$ at $x_0$. First of all, it is readily verified that it is linear. Moreover, writing $x_0 = (x_1^0, x_2^0)$ and $x = (x_1, x_2)$, by the Lagrange Mean Value Theorem 6.11 we have
$$\begin{aligned} f(x) - f(x_0) &= \big(f(x_1, x_2) - f(x_1^0, x_2)\big) + \big(f(x_1^0, x_2) - f(x_1^0, x_2^0)\big) \\ &= \frac{\partial f}{\partial x_1}(\xi_1, x_2)(x_1 - x_1^0) + \frac{\partial f}{\partial x_2}(x_1^0, \xi_2)(x_2 - x_2^0) \end{aligned}$$
for some $\xi_1 \in\, ]x_1^0, x_1[$ and $\xi_2 \in\, ]x_2^0, x_2[\,$. Hence,
$$\begin{aligned} r(x) &= f(x) - f(x_0) - \ell(x - x_0) \\ &= \left[ \frac{\partial f}{\partial x_1}(\xi_1, x_2) - \frac{\partial f}{\partial x_1}(x_1^0, x_2^0) \right](x_1 - x_1^0) + \left[ \frac{\partial f}{\partial x_2}(x_1^0, \xi_2) - \frac{\partial f}{\partial x_2}(x_1^0, x_2^0) \right](x_2 - x_2^0)\,. \end{aligned}$$
Then, since $|x_1 - x_1^0| \le \|x - x_0\|$ and $|x_2 - x_2^0| \le \|x - x_0\|$,
$$\frac{|r(x)|}{\|x - x_0\|} \le \left| \frac{\partial f}{\partial x_1}(\xi_1, x_2) - \frac{\partial f}{\partial x_1}(x_1^0, x_2^0) \right| + \left| \frac{\partial f}{\partial x_2}(x_1^0, \xi_2) - \frac{\partial f}{\partial x_2}(x_1^0, x_2^0) \right|.$$
Letting $x$ tend to $x_0$, we have that $(\xi_1, x_2) \to (x_1^0, x_2^0)$ and $(x_1^0, \xi_2) \to (x_1^0, x_2^0)$, so that, since $\frac{\partial f}{\partial x_1}$ and $\frac{\partial f}{\partial x_2}$ are continuous at $x_0 = (x_1^0, x_2^0)$, it must be that
$$\lim_{x \to x_0} \frac{|r(x)|}{\|x - x_0\|} = 0\,,$$
whence the conclusion. $\square$

We say that $f : \mathcal O \to \mathbb{R}$ is "differentiable" if it is so at every point of $\mathcal O$; it is "of class $C^1$" or "a $C^1$-function" if it has partial derivatives that are continuous on the whole domain $\mathcal O$. From the previous theorem we have that a function of class $C^1$ is surely differentiable.

Assume now that $f$ is defined on some domain $D$ that is not an open set. In this case, we say that $f$ is "differentiable" if it is the restriction of some differentiable function defined on an open set $\mathcal O$ containing $D$, and similarly when $f$ is "of class $C^1$."

10.2 Some Computational Rules

Let us start with some simple propositions.

Proposition 10.5 If $f : \mathcal O \to \mathbb{R}$ is constant, then $df(x_0) = 0$ for every $x_0 \in \mathcal O$.

Proof Let $f(x) = c$ for every $x \in \mathcal O$. Then, setting $\ell(h) = 0$ for every $h \in \mathbb{R}^N$,
$$f(x) - f(x_0) - \ell(x - x_0) = c - c - 0 = 0$$
for every $x \in \mathcal O$. $\square$

Proposition 10.6 If $A : \mathbb{R}^N \to \mathbb{R}$ is linear, then $dA(x_0) = A$ for every $x_0 \in \mathbb{R}^N$.

Proof Let $f(x) = Ax$ for every $x \in \mathbb{R}^N$. Then, setting $\ell(h) = Ah$, by linearity,
$$f(x) - f(x_0) - \ell(x - x_0) = Ax - Ax_0 - A(x - x_0) = 0$$
for every $x \in \mathbb{R}^N$. $\square$


Proposition 10.7 If $B : \mathbb{R}^{N_1} \times \mathbb{R}^{N_2} \to \mathbb{R}$ is bilinear, writing $\mathbf{x}_0 = (x_0, y_0)$ with $x_0 \in \mathbb{R}^{N_1}$, $y_0 \in \mathbb{R}^{N_2}$, and $\mathbf{h} = (h, k)$ with $h \in \mathbb{R}^{N_1}$, $k \in \mathbb{R}^{N_2}$, we have
$$dB(\mathbf{x}_0)(\mathbf{h}) = B(h, y_0) + B(x_0, k)\,.$$

Proof Writing $\mathbf{x} = (x, y)$ with $x \in \mathbb{R}^{N_1}$, $y \in \mathbb{R}^{N_2}$, let $f(\mathbf{x}) = B(x, y)$. Then, setting $\ell(\mathbf{h}) = B(h, y_0) + B(x_0, k)$, we compute
$$\begin{aligned} r(\mathbf{x}) &= f(\mathbf{x}) - f(\mathbf{x}_0) - \ell(\mathbf{x} - \mathbf{x}_0) \\ &= B(x, y) - B(x_0, y_0) - B(x - x_0, y_0) - B(x_0, y - y_0) \\ &= B(x - x_0, y - y_0)\,. \end{aligned}$$
Denoting by $e_1, \ldots, e_{N_1}$ the vectors of the canonical basis of $\mathbb{R}^{N_1}$ and by $\hat e_1, \ldots, \hat e_{N_2}$ those of the canonical basis of $\mathbb{R}^{N_2}$, for every $x \in \mathbb{R}^{N_1}$ and $y \in \mathbb{R}^{N_2}$ we have that
$$B(x, y) = B\Big( \sum_{i=1}^{N_1} x_i e_i\,,\ \sum_{j=1}^{N_2} y_j \hat e_j \Big) = \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} x_i y_j\, B(e_i, \hat e_j)\,,$$
hence there is a constant $C$ such that
$$|B(x, y)| \le C\, \|x\|\, \|y\| \quad \text{for every } x \in \mathbb{R}^{N_1},\ y \in \mathbb{R}^{N_2}\,.$$
Then
$$|r(\mathbf{x})| = |B(x - x_0, y - y_0)| \le C\, \|x - x_0\|\, \|y - y_0\| \le C\, \|\mathbf{x} - \mathbf{x}_0\|^2\,,$$
whence, if $\mathbf{x} \ne \mathbf{x}_0$, then
$$\frac{|r(\mathbf{x})|}{\|\mathbf{x} - \mathbf{x}_0\|} \le C\, \|\mathbf{x} - \mathbf{x}_0\|\,,$$
and finally
$$\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{|r(\mathbf{x})|}{\|\mathbf{x} - \mathbf{x}_0\|} = 0\,.$$
The statement is thus proved. $\square$

We now compute the differential of the sum of two functions, and of the product by real constants.


Proposition 10.8 If $f, g : \mathcal O \to \mathbb{R}$ are differentiable at $x_0$ and $\alpha, \beta$ are two real numbers, then
$$d(\alpha f + \beta g)(x_0) = \alpha\, df(x_0) + \beta\, dg(x_0)\,.$$

Proof Writing
$$f(x) = f(x_0) + df(x_0)(x - x_0) + r_1(x)\,, \qquad g(x) = g(x_0) + dg(x_0)(x - x_0) + r_2(x)\,,$$
we have that
$$(\alpha f + \beta g)(x) = (\alpha f + \beta g)(x_0) + (\alpha\, df(x_0) + \beta\, dg(x_0))(x - x_0) + r(x)\,,$$
with $r(x) = \alpha r_1(x) + \beta r_2(x)$, and
$$\lim_{x \to x_0} \frac{r(x)}{\|x - x_0\|} = \alpha \lim_{x \to x_0} \frac{r_1(x)}{\|x - x_0\|} + \beta \lim_{x \to x_0} \frac{r_2(x)}{\|x - x_0\|} = 0\,.$$
Hence, $\alpha f + \beta g$ is differentiable at $x_0$ with differential $\alpha\, df(x_0) + \beta\, dg(x_0)$. $\square$

10.3 Twice Differentiable Functions

Let $\mathcal O$ be an open subset of $\mathbb{R}^N$, and $f : \mathcal O \to \mathbb{R}$ be a differentiable function. We want to extend the notion of "second derivative," which is well known in the case $N = 1$. For simplicity's sake, let us deal with the case $N = 2$. If the partial derivatives $\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2} : \mathcal O \to \mathbb{R}$ have themselves partial derivatives at a point $x_0$, these are said to be "second-order partial derivatives" of $f$ at $x_0$ and are denoted by
$$\frac{\partial^2 f}{\partial x_1^2}(x_0) = \frac{\partial}{\partial x_1}\frac{\partial f}{\partial x_1}(x_0)\,, \qquad \frac{\partial^2 f}{\partial x_2 \partial x_1}(x_0) = \frac{\partial}{\partial x_2}\frac{\partial f}{\partial x_1}(x_0)\,,$$
$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(x_0) = \frac{\partial}{\partial x_1}\frac{\partial f}{\partial x_2}(x_0)\,, \qquad \frac{\partial^2 f}{\partial x_2^2}(x_0) = \frac{\partial}{\partial x_2}\frac{\partial f}{\partial x_2}(x_0)\,.$$

Here is a relation involving the "mixed derivatives."

Theorem 10.9 (Schwarz Theorem) If the second-order mixed partial derivatives $\frac{\partial^2 f}{\partial x_2 \partial x_1}$, $\frac{\partial^2 f}{\partial x_1 \partial x_2}$ exist in a neighborhood of $x_0$ and they are continuous at $x_0$, then
$$\frac{\partial^2 f}{\partial x_2 \partial x_1}(x_0) = \frac{\partial^2 f}{\partial x_1 \partial x_2}(x_0)\,.$$


Proof Let $\rho > 0$ be such that $B(x_0, \rho) \subseteq \mathcal O$. We write $x_0 = (x_1^0, x_2^0)$, and we take an $x = (x_1, x_2) \in B(x_0, \rho)$ such that $x_1 \ne x_1^0$ and $x_2 \ne x_2^0$. It is then possible to define
$$g(x_1, x_2) = \frac{f(x_1, x_2) - f(x_1, x_2^0)}{x_2 - x_2^0}\,, \qquad h(x_1, x_2) = \frac{f(x_1, x_2) - f(x_1^0, x_2)}{x_1 - x_1^0}\,.$$
We can verify that
$$\frac{g(x_1, x_2) - g(x_1^0, x_2)}{x_1 - x_1^0} = \frac{h(x_1, x_2) - h(x_1, x_2^0)}{x_2 - x_2^0}\,.$$
By the Lagrange Mean Value Theorem 6.11, there is a $\xi_1 \in\, ]x_1^0, x_1[$ such that
$$\frac{g(x_1, x_2) - g(x_1^0, x_2)}{x_1 - x_1^0} = \frac{\partial g}{\partial x_1}(\xi_1, x_2) = \frac{\frac{\partial f}{\partial x_1}(\xi_1, x_2) - \frac{\partial f}{\partial x_1}(\xi_1, x_2^0)}{x_2 - x_2^0}\,,$$
and there is a $\xi_2 \in\, ]x_2^0, x_2[$ such that
$$\frac{h(x_1, x_2) - h(x_1, x_2^0)}{x_2 - x_2^0} = \frac{\partial h}{\partial x_2}(x_1, \xi_2) = \frac{\frac{\partial f}{\partial x_2}(x_1, \xi_2) - \frac{\partial f}{\partial x_2}(x_1^0, \xi_2)}{x_1 - x_1^0}\,.$$
Again by the Lagrange Mean Value Theorem 6.11, there is an $\eta_2 \in\, ]x_2^0, x_2[$ such that
$$\frac{\frac{\partial f}{\partial x_1}(\xi_1, x_2) - \frac{\partial f}{\partial x_1}(\xi_1, x_2^0)}{x_2 - x_2^0} = \frac{\partial^2 f}{\partial x_2 \partial x_1}(\xi_1, \eta_2)\,,$$
and there is an $\eta_1 \in\, ]x_1^0, x_1[$ such that
$$\frac{\frac{\partial f}{\partial x_2}(x_1, \xi_2) - \frac{\partial f}{\partial x_2}(x_1^0, \xi_2)}{x_1 - x_1^0} = \frac{\partial^2 f}{\partial x_1 \partial x_2}(\eta_1, \xi_2)\,.$$
Hence,
$$\frac{\partial^2 f}{\partial x_2 \partial x_1}(\xi_1, \eta_2) = \frac{\partial^2 f}{\partial x_1 \partial x_2}(\eta_1, \xi_2)\,.$$
Taking the limit as $x = (x_1, x_2)$ tends to $x_0 = (x_1^0, x_2^0)$, we have that both $(\xi_1, \eta_2)$ and $(\eta_1, \xi_2)$ converge to $x_0$, and the continuity of the second-order partial derivatives leads to the conclusion. $\square$
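The symmetry of the mixed derivatives can be illustrated numerically. In the sketch below (the sample function and point are our own choices), the two mixed partials are computed by hand in either order, and both are compared with a central cross-difference approximation:

```python
import math

def f(x, y):
    # a sample C^2 function (ours, for illustration)
    return math.sin(x * y) + x**3 * y

def d2f_dy_dx(x, y):
    # differentiate first in x, then in y, by hand:
    # f_x = y cos(xy) + 3 x^2 y ;  (f_x)_y = cos(xy) - xy sin(xy) + 3 x^2
    return math.cos(x*y) - x*y*math.sin(x*y) + 3*x*x

def d2f_dx_dy(x, y):
    # differentiate first in y, then in x, by hand:
    # f_y = x cos(xy) + x^3 ;  (f_y)_x = cos(xy) - xy sin(xy) + 3 x^2
    return math.cos(x*y) - x*y*math.sin(x*y) + 3*x*x

def cross_difference(fn, x, y, h=1e-4):
    # central-difference approximation of the mixed second derivative
    return (fn(x+h, y+h) - fn(x+h, y-h)
            - fn(x-h, y+h) + fn(x-h, y-h)) / (4*h*h)

x0, y0 = 0.7, -0.4
print(d2f_dy_dx(x0, y0), d2f_dx_dy(x0, y0), cross_difference(f, x0, y0))
```

The two symbolic computations yield the same expression, as Schwarz's theorem guarantees, and the numeric stencil agrees with both.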


We say that $f : \mathcal O \to \mathbb{R}$ is "of class $C^2$" or "a $C^2$-function" if all its second-order partial derivatives exist and are continuous on $\mathcal O$. It is useful to consider the "Hessian matrix" of $f$ at $x_0$:
$$Hf(x_0) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2}(x_0) & \dfrac{\partial^2 f}{\partial x_1 \partial x_2}(x_0) \\[2mm] \dfrac{\partial^2 f}{\partial x_2 \partial x_1}(x_0) & \dfrac{\partial^2 f}{\partial x_2^2}(x_0) \end{pmatrix};$$
if $f$ is of class $C^2$, then this is a symmetric matrix. What was just said extends without difficulty to any $N \ge 2$: if $f$ is of class $C^2$, then the Hessian matrix is an $N \times N$ symmetric matrix. One can further define, by induction, the $n$th-order partial derivatives. It is said that $f : \mathcal O \to \mathbb{R}$ is "of class $C^n$" or "a $C^n$-function" if all its $n$th-order partial derivatives exist and are continuous on $\mathcal O$.

10.4 Taylor Formula

Let $\mathcal O$ be an open subset of $\mathbb{R}^N$, and assume that $f : \mathcal O \to \mathbb{R}$ is a function of class $C^{n+1}$ for some $n \ge 1$. As previously, for simplicity we will deal with the case $N = 2$. Let us introduce the following notations:
$$D_{x_1} = \frac{\partial}{\partial x_1}\,, \quad D_{x_2} = \frac{\partial}{\partial x_2}\,, \quad D_{x_1}^2 = \frac{\partial^2}{\partial x_1^2}\,, \quad D_{x_1} D_{x_2} = \frac{\partial^2}{\partial x_1 \partial x_2}\,, \quad D_{x_2}^2 = \frac{\partial^2}{\partial x_2^2}\,,$$
and so on for the higher-order derivatives. Note that for any vector $h = (h_1, h_2) \in \mathbb{R}^2$,
$$df(x_0)h = h_1 D_{x_1} f(x_0) + h_2 D_{x_2} f(x_0)\,,$$
which can also be written
$$df(x_0)h = [h_1 D_{x_1} + h_2 D_{x_2}]\, f(x_0)\,.$$
In this way, we can think that the function $f$ is transformed by the "operator" $[h_1 D_{x_1} + h_2 D_{x_2}]$ into the new function
$$[h_1 D_{x_1} + h_2 D_{x_2}]\, f = h_1 D_{x_1} f + h_2 D_{x_2} f\,.$$


Given two points $x_0 \ne x$ in $\mathbb{R}^N$, the "segment" joining them is defined by
$$[x_0, x] = \{x_0 + t(x - x_0) : t \in [0,1]\}\,;$$
similarly, we will write
$$]x_0, x[\, = \{x_0 + t(x - x_0) : t \in\, ]0,1[\,\}\,.$$
Assume now that the segment $[x_0, x]$ is contained in $\mathcal O$, and consider the function $\phi : [0,1] \to \mathbb{R}$ defined as
$$\phi(t) = f(x_0 + t(x - x_0))\,.$$
We will prove that $\phi$ is $n+1$ times differentiable on $[0,1]$. For any $t \in [0,1]$, since $f$ is differentiable at $u_0 = x_0 + t(x - x_0)$, we have that
$$f(u) = f(u_0) + df(u_0)(u - u_0) + r(u)\,, \quad \text{with} \quad \lim_{u \to u_0} \frac{r(u)}{\|u - u_0\|} = 0\,.$$
Hence,
$$\begin{aligned} \lim_{s \to t} \frac{\phi(s) - \phi(t)}{s - t} &= \lim_{s \to t} \frac{f(x_0 + s(x - x_0)) - f(x_0 + t(x - x_0))}{s - t} \\ &= \lim_{s \to t} \frac{df(x_0 + t(x - x_0))((s - t)(x - x_0)) + r(x_0 + s(x - x_0))}{s - t} \\ &= df(x_0 + t(x - x_0))(x - x_0) + \lim_{s \to t} \frac{r(x_0 + s(x - x_0))}{s - t}\,, \end{aligned}$$
and since
$$\lim_{s \to t} \left| \frac{r(x_0 + s(x - x_0))}{s - t} \right| = \lim_{u \to u_0} \frac{|r(u)|}{\|u - u_0\|}\, \|x - x_0\| = 0\,,$$
we have that
$$\phi'(t) = \lim_{s \to t} \frac{\phi(s) - \phi(t)}{s - t} = df(x_0 + t(x - x_0))(x - x_0)\,.$$
With the new notations, setting $x - x_0 = h = (h_1, h_2)$, we can write
$$\phi'(t) = [h_1 D_{x_1} + h_2 D_{x_2}]\, f(x_0 + t(x - x_0)) = g(x_0 + t(x - x_0))\,,$$


where $g$ is the function $[h_1 D_{x_1} + h_2 D_{x_2}]\, f$. We can then iterate the procedure and compute the second derivative
$$\phi''(t) = [h_1 D_{x_1} + h_2 D_{x_2}]\, g(x_0 + t(x - x_0)) = [h_1 D_{x_1} + h_2 D_{x_2}][h_1 D_{x_1} + h_2 D_{x_2}]\, f(x_0 + t(x - x_0))\,.$$
For brevity, we will write
$$\phi''(t) = [h_1 D_{x_1} + h_2 D_{x_2}]^2 f(x_0 + t(x - x_0))\,.$$
Notice that, by the linearity of the partial derivatives and the equality of the second-order mixed derivatives (Schwarz Theorem 10.9),
$$[h_1 D_{x_1} + h_2 D_{x_2}]^2 f = h_1^2 D_{x_1}^2 f + 2 h_1 h_2\, D_{x_1} D_{x_2} f + h_2^2 D_{x_2}^2 f = [h_1^2 D_{x_1}^2 + 2 h_1 h_2\, D_{x_1} D_{x_2} + h_2^2 D_{x_2}^2]\, f\,.$$
We now observe that the equality
$$[h_1 D_{x_1} + h_2 D_{x_2}]^2 = [h_1^2 D_{x_1}^2 + 2 h_1 h_2\, D_{x_1} D_{x_2} + h_2^2 D_{x_2}^2]$$
is formally obtained as the square of a binomial. Proceeding in this way, we can prove by induction that, for $k = 1, 2, \ldots, n+1$, the formula for the $k$th derivative of $\phi$ is
$$\phi^{(k)}(t) = [h_1 D_{x_1} + h_2 D_{x_2}]^k f(x_0 + t(x - x_0))\,,$$
and, using the binomial formula
$$(a_1 + a_2)^k = \sum_{j=0}^{k} \binom{k}{j}\, a_1^{k-j} a_2^{j}\,,$$
we formally have that
$$[h_1 D_{x_1} + h_2 D_{x_2}]^k = \sum_{j=0}^{k} \binom{k}{j}\, h_1^{k-j} h_2^{j}\, D_{x_1}^{k-j} D_{x_2}^{j}$$
(in this formula, the symbols $D_{x_1}^0$ and $D_{x_2}^0$ simply denote the identity operator). To write the Taylor formula, let us introduce the notation
$$d^k f(x_0) h^k = [h_1 D_{x_1} + h_2 D_{x_2}]^k f(x_0)\,.$$


Theorem 10.10 (Taylor Theorem—III) Let $f : \mathcal O \to \mathbb{R}$ be a function of class $C^{n+1}$ and $[x_0, x]$ be a segment contained in $\mathcal O$. Then there exists a $\xi \in\, ]x_0, x[$ such that
$$f(x) = p_n(x) + r_n(x)\,,$$
where
$$p_n(x) = f(x_0) + df(x_0)(x - x_0) + \frac{1}{2!}\, d^2 f(x_0)(x - x_0)^2 + \cdots + \frac{1}{n!}\, d^n f(x_0)(x - x_0)^n$$
is the "$n$th-order Taylor polynomial associated with the function $f$ at the point $x_0$," and
$$r_n(x) = \frac{1}{(n+1)!}\, d^{n+1} f(\xi)(x - x_0)^{n+1}$$
is the "Lagrange form of the remainder."

Proof Applying the Taylor formula to the function $\phi$, we have that
$$\phi(t) = \phi(0) + \phi'(0)\,t + \frac{1}{2!}\phi''(0)\,t^2 + \cdots + \frac{1}{n!}\phi^{(n)}(0)\,t^n + \frac{1}{(n+1)!}\phi^{(n+1)}(\xi)\,t^{n+1}$$
for some $\xi \in\, ]0,t[\,$. We thus directly conclude the proof taking $t = 1$ and substituting the values of the derivatives of $\phi$ computed earlier. $\square$

The Taylor polynomial can be expressed as
$$p_n(x) = \sum_{k=0}^{n} \frac{1}{k!}\, d^k f(x_0)(x - x_0)^k\,,$$
with the convention that $d^0 f(x_0)(x - x_0)^0$, the first addend in the sum, is simply $f(x_0)$. Hence,
$$\begin{aligned} p_n(x) &= \sum_{k=0}^{n} \frac{1}{k!}\, \big[ (x_1 - x_1^0)\, D_{x_1} + (x_2 - x_2^0)\, D_{x_2} \big]^k f(x_0) \\ &= \sum_{k=0}^{n} \frac{1}{k!} \sum_{j=0}^{k} \binom{k}{j}\, \frac{\partial^k f}{\partial x_1^{k-j}\, \partial x_2^{j}}(x_0)\, (x_1 - x_1^0)^{k-j} (x_2 - x_2^0)^{j}\,. \end{aligned}$$
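The quality of the approximation can be observed numerically. In the sketch below (the sample function and base point are ours), the Taylor polynomial with $n = 2$ is built from the hand-computed gradient and Hessian of $f(x,y) = e^x \cos y$; by the theorem, the error should shrink like the cube of the distance to the base point:

```python
import math

def f(x, y):
    # a sample smooth function (ours, for illustration)
    return math.exp(x) * math.cos(y)

def p2(x, y, x0, y0):
    # second-order Taylor polynomial of f at (x0, y0),
    # with gradient and Hessian computed by hand:
    # f_x = e^x cos y,  f_y = -e^x sin y,
    # f_xx = e^x cos y, f_xy = -e^x sin y, f_yy = -e^x cos y
    c, s, e = math.cos(y0), math.sin(y0), math.exp(x0)
    dx, dy = x - x0, y - y0
    return (e*c + e*c*dx - e*s*dy
            + 0.5*(e*c*dx*dx + 2*(-e*s)*dx*dy + (-e*c)*dy*dy))

x0, y0 = 0.5, 0.3
for t in (1e-1, 1e-2, 1e-3):
    err = abs(f(x0 + t, y0 + t) - p2(x0 + t, y0 + t, x0, y0))
    print(t, err)  # err shrinks roughly like t**3
```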


Here is a useful expression for the second-order polynomial:
$$p_2(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \tfrac12\, \big[ Hf(x_0)(x - x_0) \big] \cdot (x - x_0)\,.$$
The theorem just proved remains valid in any dimension $N$, when the notations are properly interpreted. For example, for any vector $h = (h_1, h_2, \ldots, h_N)$,
$$d^k f(x_0) h^k = [h_1 D_{x_1} + h_2 D_{x_2} + \cdots + h_N D_{x_N}]^k f(x_0)\,.$$

In this case, when writing the Taylor polynomial explicitly, the following generalization of the binomial formula will be useful:
$$(a_1 + a_2 + \cdots + a_N)^k = \sum_{m_1 + m_2 + \cdots + m_N = k} \frac{k!}{m_1!\, m_2! \cdots m_N!}\, a_1^{m_1} a_2^{m_2} \cdots a_N^{m_N}\,.$$

10.5 The Search for Maxima and Minima

As earlier, let $\mathcal O \subseteq \mathbb{R}^N$, the domain of our function $f : \mathcal O \to \mathbb{R}$, be an open set. Recall that $x_0 \in \mathcal O$ is a "local maximum point" for $f$ if there exists a neighborhood $U \subseteq \mathcal O$ of $x_0$ such that $f(U)$ has a maximum and $f(x_0) = \max f(U)$. A similar definition holds for "local minimum point."

Theorem 10.11 (Fermat's Theorem—II) Assume that $\mathcal O$ is an open set and $f : \mathcal O \to \mathbb{R}$ is differentiable at $x_0 \in \mathcal O$. If, moreover, $x_0$ is a local maximum or minimum point for $f$, then $\nabla f(x_0) = 0$.

Proof If $x_0$ is a local maximum point, then for every direction $v \in \mathbb{R}^N$ there is a $\delta > 0$ for which
$$\frac{f(x_0 + tv) - f(x_0)}{t}\ \begin{cases} \ge 0 & \text{if } -\delta < t < 0\,, \\ \le 0 & \text{if } 0 < t < \delta\,. \end{cases}$$
Since $f$ is differentiable at $x_0$, we necessarily have that
$$\frac{\partial f}{\partial v}(x_0) = \lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t} = 0\,.$$
In particular, all partial derivatives are equal to zero, hence $\nabla f(x_0) = 0$. When $x_0$ is a local minimum point, the proof is similar. $\square$

A point where the gradient vanishes is called a "stationary point." We know already from the case $N = 1$ that such a point could be neither a local maximum nor a local minimum point.


We will now show how the Taylor formula provides a criterion establishing when a stationary point is either a local maximum or a local minimum point. Let us start with a definition. We say that a symmetric $N \times N$ matrix $\mathbb{A}$ is "positive definite" if
$$[\mathbb{A}h] \cdot h > 0\,, \quad \text{for every } h \in \mathbb{R}^N \setminus \{0\}\,.$$
In contrast, we say that $\mathbb{A}$ is "negative definite" if the opposite inequality holds, i.e., when $-\mathbb{A}$ is positive definite.

Theorem 10.12 If $x_0$ is a stationary point and $f$ is of class $C^2$, with a positive definite Hessian matrix $Hf(x_0)$, then $x_0$ is a local minimum point. In contrast, if $Hf(x_0)$ is negative definite, then $x_0$ is a local maximum point.

Proof By the Taylor formula, for any $x \ne x_0$ in a neighborhood of $x_0$ there exists a $\xi \in\, ]x_0, x[$ for which
$$f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \tfrac12\, \big[ Hf(\xi)(x - x_0) \big] \cdot (x - x_0)\,.$$
If $\mathbb{A} = Hf(x_0)$ is positive definite, there is a constant $c > 0$ such that, for every $v \in \mathbb{R}^N$ with $\|v\| = 1$,
$$[\mathbb{A}v] \cdot v \ge c\,.$$
(We have used Weierstrass' Theorem 4.10 and the fact that the sphere $\{v \in \mathbb{R}^N : \|v\| = 1\}$ is a compact set.) Hence,
$$\left[ Hf(x_0)\, \frac{x - x_0}{\|x - x_0\|} \right] \cdot \frac{x - x_0}{\|x - x_0\|} \ge c\,.$$
Recalling the continuity of the second derivatives, if $x \ne x_0$ is sufficiently near $x_0$, then
$$\left[ Hf(\xi)\, \frac{x - x_0}{\|x - x_0\|} \right] \cdot \frac{x - x_0}{\|x - x_0\|} \ge \tfrac12 c > 0\,.$$
(This can be proved by contradiction, using the compactness of the sphere again.) Since $\nabla f(x_0) = 0$, for such $x$ we have that
$$f(x) = f(x_0) + \tfrac12\, \big[ Hf(\xi)(x - x_0) \big] \cdot (x - x_0) \ge f(x_0) + \tfrac14\, c\, \|x - x_0\|^2 > f(x_0)\,,$$
hence $x_0$ is a local minimum point. The proof of the second statement is analogous. $\square$


We now state (without proof) two useful criteria for determining when a symmetric $N \times N$ matrix $\mathbb{A}$ is positive definite or negative definite. We recall that all the eigenvalues of a symmetric matrix are real.

First Criterion The symmetric matrix $\mathbb{A}$ is positive definite if and only if all its eigenvalues are positive. It is negative definite if and only if all its eigenvalues are negative.

Second Criterion The symmetric matrix $\mathbb{A} = (a_{ij})_{ij}$ is positive definite if and only if
$$a_{11} > 0\,, \quad \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} > 0\,, \quad \det\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} > 0\,, \quad \ldots\,, \quad \det\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{pmatrix} > 0\,.$$
It is negative definite if and only if the determinants written above have alternating signs: those of the $M \times M$ submatrices with $M$ odd are negative, while those with $M$ even are positive.
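As a small sketch (the function and stationary point are our own example), the second criterion applied to the Hessian of $f(x,y) = x^2 + xy + y^2$ at its stationary point $(0,0)$:

```python
def hessian_f(x, y):
    # Hessian of f(x, y) = x^2 + x y + y^2 (constant in this example)
    return [[2.0, 1.0], [1.0, 2.0]]

def leading_minors_2x2(A):
    # leading principal minors of a 2 x 2 matrix
    return [A[0][0], A[0][0]*A[1][1] - A[0][1]*A[1][0]]

H = hessian_f(0.0, 0.0)
minors = leading_minors_2x2(H)
positive_definite = all(m > 0 for m in minors)
print(minors, positive_definite)  # [2.0, 3.0] True -> (0,0) is a local minimum
```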

10.6 Implicit Function Theorem: First Statement

We are now concerned with a problem involving a general equation of the type
$$g(x, y) = 0\,.$$
The question is whether or not, for the solutions $(x,y)$ of this equation, it is possible to derive $y$ as a function of $x$, say $y = \eta(x)$. As a typical example, let $g(x,y) = x^2 + y^2 - 1$, so that the equation becomes
$$x^2 + y^2 = 1\,,$$
whose solutions lie on the unit circle $S^1$. The answer to the preceding question, in this case, could be positive provided that we restrict our analysis to a small neighborhood of some particular solution $(x_0, y_0)$ with $y_0 \ne 0$. Indeed, if $y_0 > 0$, we will obtain $\eta(x) = \sqrt{1 - x^2}$, whereas if $y_0 < 0$, we will take $\eta(x) = -\sqrt{1 - x^2}$.

In general, we will show that the same conclusion holds if we take any point $(x_0, y_0)$ for which $g(x_0, y_0) = 0$, provided that $\frac{\partial g}{\partial y}(x_0, y_0) \ne 0$. In such a case, there exists a small neighborhood of $(x_0, y_0)$ where
$$g(x, y) = 0 \quad \Longleftrightarrow \quad y = \eta(x)$$

for some function $\eta$, which thus happens to be "implicitly defined." This important result, due to Ulisse Dini, will later be generalized to any finite-dimensional setting.

Theorem 10.13 (Implicit Function Theorem—I) Let $\mathcal O \subseteq \mathbb{R} \times \mathbb{R}$ be an open set, $g : \mathcal O \to \mathbb{R}$ a $C^1$-function, and $(x_0, y_0)$ a point in $\mathcal O$ for which
$$g(x_0, y_0) = 0 \quad \text{and} \quad \frac{\partial g}{\partial y}(x_0, y_0) \ne 0\,.$$
Then there exist an open neighborhood $U$ of $x_0$, an open neighborhood $V$ of $y_0$, and a function $\eta : U \to V$ such that $U \times V \subseteq \mathcal O$ and, taking $x \in U$ and $y \in V$, we have that
$$g(x, y) = 0 \quad \Longleftrightarrow \quad y = \eta(x)\,.$$
Moreover, the function $\eta$ is of class $C^1$, and the following formula holds:
$$\eta'(x) = -\left[ \frac{\partial g}{\partial y}(x, \eta(x)) \right]^{-1} \frac{\partial g}{\partial x}(x, \eta(x))\,.$$

Proof Assume, for instance, that $\frac{\partial g}{\partial y}(x_0, y_0) > 0$. By the continuity of $\frac{\partial g}{\partial y}$, there is a $\delta > 0$ such that $[x_0 - \delta, x_0 + \delta] \times [y_0 - \delta, y_0 + \delta] \subseteq \mathcal O$ and, if $|x - x_0| \le \delta$ and $|y - y_0| \le \delta$, then $\frac{\partial g}{\partial y}(x, y) > 0$. Hence, for every $x \in [x_0 - \delta, x_0 + \delta]$, the function $g(x, \cdot)$ is strictly increasing on $[y_0 - \delta, y_0 + \delta]$. Since $g(x_0, y_0) = 0$, we have that
$$g(x_0, y_0 - \delta) < 0 < g(x_0, y_0 + \delta)\,.$$
By continuity again, there is a $\delta' > 0$ such that, if $x \in [x_0 - \delta', x_0 + \delta']$, then
$$g(x, y_0 - \delta) < 0 < g(x, y_0 + \delta)\,.$$
We define $U = \,]x_0 - \delta', x_0 + \delta'[$ and $V = \,]y_0 - \delta, y_0 + \delta[\,$. Hence, for every $x \in U$, since $g(x, \cdot)$ is strictly increasing, there is exactly one $y \in\, ]y_0 - \delta, y_0 + \delta[$ for which $g(x, y) = 0$; we call this $y$ $\eta(x)$. We have thus defined a function $\eta : U \to V$


such that, taking .x ∈ U and .y ∈ V , g(x, y) = 0

.



y = η(x) .

To verify the continuity of $\eta$, let us fix an $\bar{x} \in U$ and prove that $\eta$ is continuous at $\bar{x}$. With $x \in U$ and considering the function $\gamma : [0,1] \to U \times V$ defined as

$$ \gamma(t) = \big(\bar{x} + t(x - \bar{x}),\; \eta(\bar{x}) + t(\eta(x) - \eta(\bar{x}))\big) , $$

the Lagrange Mean Value Theorem 6.11 applied to $g \circ \gamma$ tells us that there is a $\xi \in \,]0,1[\,$ for which

$$ g(x, \eta(x)) - g(\bar{x}, \eta(\bar{x})) = \frac{\partial g}{\partial x}(\gamma(\xi))(x - \bar{x}) + \frac{\partial g}{\partial y}(\gamma(\xi))(\eta(x) - \eta(\bar{x})) . $$

Since $g(x, \eta(x)) = g(\bar{x}, \eta(\bar{x})) = 0$, we have that

$$ |\eta(x) - \eta(\bar{x})| = \frac{1}{\left|\frac{\partial g}{\partial y}(\gamma(\xi))\right|} \left| \frac{\partial g}{\partial x}(\gamma(\xi))(x - \bar{x}) \right| . $$

Since the partial derivatives of $g$ are continuous and $\frac{\partial g}{\partial y}$ is not zero on the compact set $\overline{U} \times \overline{V}$, there is a constant $c > 0$ for which

$$ \frac{1}{\left|\frac{\partial g}{\partial y}(\gamma(\xi))\right|} \left| \frac{\partial g}{\partial x}(\gamma(\xi))(x - \bar{x}) \right| \le c\,|x - \bar{x}| . $$

As a consequence, $\eta$ is continuous at $\bar{x}$.

We now prove the differentiability. Taking $\bar{x} \in U$ and proceeding as previously, for $h$ small enough we have

$$ \frac{\eta(\bar{x} + h) - \eta(\bar{x})}{h} = -\frac{\frac{\partial g}{\partial x}(\gamma(\xi))}{\frac{\partial g}{\partial y}(\gamma(\xi))} , $$

with $\gamma(\xi)$ belonging to the segment joining $(\bar{x}, \eta(\bar{x}))$ to $(\bar{x} + h, \eta(\bar{x} + h))$. If $h$ tends to $0$, we have that $\gamma(\xi)$ tends to $(\bar{x}, \eta(\bar{x}))$, and hence

$$ \eta'(\bar{x}) = \lim_{h \to 0} \frac{\eta(\bar{x} + h) - \eta(\bar{x})}{h} = -\frac{\frac{\partial g}{\partial x}(\bar{x}, \eta(\bar{x}))}{\frac{\partial g}{\partial y}(\bar{x}, \eta(\bar{x}))} . $$


This implies that $\eta$ is of class $C^1$, and

$$ \eta'(x) = -\frac{\frac{\partial g}{\partial x}(x, \eta(x))}{\frac{\partial g}{\partial y}(x, \eta(x))} $$

for every $x \in U$. We have thus completed the proof. ∎
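The derivative formula just proved can be checked numerically on the circle example that motivated the theorem. The sketch below (function names, sample point, and tolerances are our choice, not the book's) compares the theorem's expression for $\eta'(x)$ with a difference quotient of $\eta(x) = \sqrt{1 - x^2}$:

```python
import numpy as np

# g(x, y) = x^2 + y^2 - 1; near a point with y0 > 0 the zero set is the
# graph of eta(x) = sqrt(1 - x^2), and Theorem 10.13 predicts
#     eta'(x) = -(dg/dx)/(dg/dy)  evaluated at (x, eta(x)).

def g_x(x, y):   # partial derivative of g with respect to x
    return 2.0 * x

def g_y(x, y):   # partial derivative of g with respect to y
    return 2.0 * y

def eta(x):      # the implicitly defined function, known here in closed form
    return np.sqrt(1.0 - x**2)

x = 0.3
slope_formula = -g_x(x, eta(x)) / g_y(x, eta(x))   # theorem's formula
h = 1e-6                                           # centered difference quotient
slope_numeric = (eta(x + h) - eta(x - h)) / (2.0 * h)

err = abs(slope_formula - slope_numeric)           # should be tiny
```

Both values agree up to the discretization error of the difference quotient.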

10.7 The Differential of a Vector-Valued Function

Let us recall the definition given at the beginning of the chapter. The differential of a function $f : O \to \mathbb{R}^M$ at a point $x_0 \in O$ is a linear function $\ell : \mathbb{R}^N \to \mathbb{R}^M$ for which one can write

$$ f(x) = f(x_0) + \ell(x - x_0) + r(x) , \quad \text{with} \quad \lim_{x \to x_0} \frac{r(x)}{\|x - x_0\|} = 0 . $$

This linear function $\ell$, when it exists, is denoted by $df(x_0)$. When $M \ge 2$, let $f_k : O \to \mathbb{R}$ be the components of the function $f : O \to \mathbb{R}^M$, with $k = 1, 2, \ldots, M$, so that

$$ f(x) = (f_1(x), f_2(x), \ldots, f_M(x)) . $$

Theorem 10.14 The function $f$ is differentiable at $x_0$ if and only if all its components are. In this case, for any vector $h \in \mathbb{R}^N$,

$$ df(x_0)h = (df_1(x_0)h, df_2(x_0)h, \ldots, df_M(x_0)h) . $$

Proof Considering the components in the equation

$$ f(x) = f(x_0) + \ell(x - x_0) + r(x) , $$

we can write

$$ f_k(x) = f_k(x_0) + \ell_k(x - x_0) + r_k(x) , $$

with $k = 1, 2, \ldots, M$, and we know that

$$ \lim_{x \to x_0} \frac{r(x)}{\|x - x_0\|} = 0 \quad \Leftrightarrow \quad \lim_{x \to x_0} \frac{r_k(x)}{\|x - x_0\|} = 0 \ \text{ for every } k = 1, 2, \ldots, M , $$

whence the conclusion. ∎


The preceding theorem permits us to recover all the computational rules obtained in the case $M = 1$. Moreover, the function $f : O \to \mathbb{R}^M$ is said to be "differentiable" or "of class $C^1$" if all its components are. This definition naturally extends to functions of class $C^n$.

Note that when $N = 1$, the differential $df(x_0) : \mathbb{R} \to \mathbb{R}^M$ is the linear function that associates to any $h \in \mathbb{R}$ the vector

$$ df(x_0)(h) = h\, df(x_0)(1) . $$

This last vector $df(x_0)(1) \in \mathbb{R}^M$ is called the "derivative" of $f$ at $x_0$ and is usually denoted simply by $f'(x_0)$. Using the preceding definition, one readily sees that

$$ f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} , $$

thereby recovering the definition given in Sect. 7.14 and the usual definition given when $N = M = 1$.

It is useful to consider the matrix associated with the linear function $\ell = df(x_0)$, given by

$$ \begin{pmatrix} \ell_1(e_1) & \ell_1(e_2) & \cdots & \ell_1(e_N) \\ \ell_2(e_1) & \ell_2(e_2) & \cdots & \ell_2(e_N) \\ \vdots & \vdots & & \vdots \\ \ell_M(e_1) & \ell_M(e_2) & \cdots & \ell_M(e_N) \end{pmatrix} , $$

where $e_1, e_2, \ldots, e_N$ are the vectors of the canonical basis of $\mathbb{R}^N$. This matrix is called the "Jacobian matrix" associated with the function $f$ at $x_0$ and is denoted by one of the symbols

$$ Jf(x_0) , \qquad f'(x_0) . $$

Recalling that

$$ \frac{\partial f_k}{\partial x_j}(x_0) = df_k(x_0)e_j , $$

with $k = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, N$, we see that

$$ Jf(x_0) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x_0) & \frac{\partial f_1}{\partial x_2}(x_0) & \cdots & \frac{\partial f_1}{\partial x_N}(x_0) \\ \frac{\partial f_2}{\partial x_1}(x_0) & \frac{\partial f_2}{\partial x_2}(x_0) & \cdots & \frac{\partial f_2}{\partial x_N}(x_0) \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_M}{\partial x_1}(x_0) & \frac{\partial f_M}{\partial x_2}(x_0) & \cdots & \frac{\partial f_M}{\partial x_N}(x_0) \end{pmatrix} . $$


Remark 10.15 Note that when $M = 1$, i.e., when $f : O \to \mathbb{R}$, its gradient is

$$ \nabla f(x_0) = Jf(x_0)^T , $$

the transpose of the row matrix $Jf(x_0)$. (Recall that a vector is always a column matrix.)
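The Jacobian matrix can be approximated column by column with difference quotients, which gives a quick sanity check of the definition. In the sketch below the sample function $f$ and all names are our own choice:

```python
import numpy as np

# Jacobian of f(x, y) = (x^2 * y, sin x), computed two ways: from the
# analytic partial derivatives, and column by column with centered
# difference quotients (column j approximates df/dx_j).

def f(p):
    x, y = p
    return np.array([x**2 * y, np.sin(x)])

def jacobian_analytic(p):
    x, y = p
    return np.array([[2*x*y, x**2],
                     [np.cos(x), 0.0]])

def jacobian_numeric(f, p, h=1e-6):
    p = np.asarray(p, dtype=float)
    cols = []
    for j in range(len(p)):
        e = np.zeros_like(p)
        e[j] = h
        cols.append((f(p + e) - f(p - e)) / (2.0 * h))
    return np.column_stack(cols)

p0 = np.array([0.7, -1.2])
err = np.max(np.abs(jacobian_analytic(p0) - jacobian_numeric(f, p0)))
```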

10.8 The Chain Rule

We now examine the differentiability of the composition of functions. As usual, $O$ denotes an open subset of $\mathbb{R}^N$, and $x_0$ is a point in $O$.

Theorem 10.16 If $f : O \to \mathbb{R}^M$ is differentiable at $x_0$, while $\widetilde{O} \subseteq \mathbb{R}^M$ is an open set containing $f(O)$ and $g : \widetilde{O} \to \mathbb{R}^L$ is differentiable at $f(x_0)$, then $g \circ f$ is differentiable at $x_0$, and

$$ d(g \circ f)(x_0) = dg(f(x_0)) \circ df(x_0) . $$

Proof Setting $y_0 = f(x_0)$, we have

$$ f(x) = f(x_0) + df(x_0)(x - x_0) + r_1(x) , \qquad g(y) = g(y_0) + dg(y_0)(y - y_0) + r_2(y) , $$

with

$$ \lim_{x \to x_0} \frac{r_1(x)}{\|x - x_0\|} = 0 , \qquad \lim_{y \to y_0} \frac{r_2(y)}{\|y - y_0\|} = 0 . $$

Let us introduce the auxiliary function $R_2 : \widetilde{O} \to \mathbb{R}^L$, defined as

$$ R_2(y) = \begin{cases} \dfrac{r_2(y)}{\|y - y_0\|} & \text{if } y \neq y_0 , \\[1mm] 0 & \text{if } y = y_0 . \end{cases} $$

Note that $R_2$ is continuous at $y_0$ and

$$ r_2(y) = \|y - y_0\|\, R_2(y) \quad \text{for every } y \in \widetilde{O} . $$

Then

$$ \begin{aligned} g(f(x)) &= g(f(x_0)) + dg(f(x_0))[f(x) - f(x_0)] + r_2(f(x)) \\ &= g(f(x_0)) + dg(f(x_0))[df(x_0)(x - x_0) + r_1(x)] + r_2(f(x)) \\ &= g(f(x_0)) + [dg(f(x_0)) \circ df(x_0)](x - x_0) + r_3(x) , \end{aligned} $$

where

$$ \begin{aligned} r_3(x) &= dg(f(x_0))(r_1(x)) + r_2(f(x)) \\ &= dg(f(x_0))(r_1(x)) + \|f(x) - f(x_0)\|\, R_2(f(x)) \\ &= dg(f(x_0))(r_1(x)) + \|df(x_0)(x - x_0) + r_1(x)\|\, R_2(f(x)) . \end{aligned} $$

Hence,

$$ \frac{\|r_3(x)\|}{\|x - x_0\|} \le \left\| dg(f(x_0))\!\left( \frac{r_1(x)}{\|x - x_0\|} \right) \right\| + \left( \left\| df(x_0)\!\left( \frac{x - x_0}{\|x - x_0\|} \right) \right\| + \frac{\|r_1(x)\|}{\|x - x_0\|} \right) \|R_2(f(x))\| . $$

We can see that all this tends to $0$ as $x \to x_0$. Indeed, if $x$ tends to $x_0$, the first summand tends to $0$, since $dg(f(x_0)) : \mathbb{R}^M \to \mathbb{R}^L$ is linear, hence continuous, and

$$ \lim_{x \to x_0} \frac{r_1(x)}{\|x - x_0\|} = 0 . \tag{10.1} $$

On the other hand, since $f$ is continuous at $x_0$ and $R_2$ is continuous at $y_0 = f(x_0)$, with $R_2(y_0) = 0$, we have that

$$ \lim_{x \to x_0} R_2(f(x)) = 0 . $$

Finally, since $df(x_0) : \mathbb{R}^N \to \mathbb{R}^M$ is linear, hence continuous, it is bounded on the compact set $\overline{B}(0, 1)$, by Weierstrass' Theorem 4.10. Therefore, using also (10.1),

$$ \left\| df(x_0)\!\left( \frac{x - x_0}{\|x - x_0\|} \right) \right\| + \frac{\|r_1(x)\|}{\|x - x_0\|} \quad \text{is bounded.} $$

Summing up,

$$ \lim_{x \to x_0} \frac{r_3(x)}{\|x - x_0\|} = 0 , $$

and we can conclude that $g \circ f$ is differentiable at $x_0$, with differential $dg(f(x_0)) \circ df(x_0)$. ∎

It is well known that the matrix associated with the composition of two linear functions is the product of the two respective matrices. From the preceding theorem we then have the following formula for the Jacobian matrices:

$$ J(g \circ f)(x_0) = Jg(f(x_0)) \cdot Jf(x_0) ; $$


this means that the matrix

$$ \begin{pmatrix} \frac{\partial (g \circ f)_1}{\partial x_1}(x_0) & \cdots & \frac{\partial (g \circ f)_1}{\partial x_N}(x_0) \\ \vdots & & \vdots \\ \frac{\partial (g \circ f)_L}{\partial x_1}(x_0) & \cdots & \frac{\partial (g \circ f)_L}{\partial x_N}(x_0) \end{pmatrix} $$

is equal to the product

$$ \begin{pmatrix} \frac{\partial g_1}{\partial y_1}(f(x_0)) & \cdots & \frac{\partial g_1}{\partial y_M}(f(x_0)) \\ \vdots & & \vdots \\ \frac{\partial g_L}{\partial y_1}(f(x_0)) & \cdots & \frac{\partial g_L}{\partial y_M}(f(x_0)) \end{pmatrix} \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x_0) & \cdots & \frac{\partial f_1}{\partial x_N}(x_0) \\ \vdots & & \vdots \\ \frac{\partial f_M}{\partial x_1}(x_0) & \cdots & \frac{\partial f_M}{\partial x_N}(x_0) \end{pmatrix} . $$

We thus obtain the formula for the partial derivatives of the composition of functions, usually called the chain rule:

$$ \frac{\partial (g \circ f)_i}{\partial x_j}(x_0) = \frac{\partial g_i}{\partial y_1}(f(x_0)) \frac{\partial f_1}{\partial x_j}(x_0) + \cdots + \frac{\partial g_i}{\partial y_M}(f(x_0)) \frac{\partial f_M}{\partial x_j}(x_0) = \sum_{k=1}^{M} \frac{\partial g_i}{\partial y_k}(f(x_0)) \frac{\partial f_k}{\partial x_j}(x_0) , $$

where $i = 1, 2, \ldots, L$ and $j = 1, 2, \ldots, N$.

Remark 10.17 When $L = 1$, i.e., when $g : \widetilde{O} \to \mathbb{R}$, in view of Remark 10.15 we obtain the formula

$$ \nabla(g \circ f)(x_0) = Jf(x_0)^T\, \nabla g(f(x_0)) . $$

Let us now prove the following generalization of the formula for the derivative of a product of two functions.

Theorem 10.18 Let $f : O \to \mathbb{R}^{N_1}$ and $g : O \to \mathbb{R}^{N_2}$ be two functions, differentiable at some $x_0$. Let $F : O \to \mathbb{R}^L$ be defined as

$$ F(x) = B(f(x), g(x)) , $$

where $B : \mathbb{R}^{N_1} \times \mathbb{R}^{N_2} \to \mathbb{R}^L$ is a bilinear function. Then, for every $h \in \mathbb{R}^N$,

$$ dF(x_0)h = B(df(x_0)h, g(x_0)) + B(f(x_0), dg(x_0)h) . $$

Proof First define the function $\varphi : O \to \mathbb{R}^{N_1} \times \mathbb{R}^{N_2}$ as $\varphi(x) = (f(x), g(x))$, and note that $d\varphi(x_0)h = (df(x_0)h, dg(x_0)h)$ for every $h \in \mathbb{R}^N$. Then it is sufficient to apply Theorem 10.16, in view of Proposition 10.7. ∎

The following two examples with $O \subseteq \mathbb{R}$, involving the scalar product and the cross product of two functions, are direct consequences of the preceding formula. Assume that $f, g : O \to \mathbb{R}^M$ are differentiable at some $x_0 \in O$. Then

$$ (f \cdot g)'(x_0) = f'(x_0) \cdot g(x_0) + f(x_0) \cdot g'(x_0) . $$

Moreover, if $M = 3$, then

$$ (f \times g)'(x_0) = f'(x_0) \times g(x_0) + f(x_0) \times g'(x_0) . $$

10.9 Mean Value Theorem

Lagrange's Theorem 6.11 does not extend directly to functions having vector values. For example, taking $a = 0$ and $b = 2\pi$, the function $f : [a,b] \to \mathbb{R}^2$ defined as $f(x) = (\cos x, \sin x)$ is such that $f(b) - f(a) = (0,0)$, but there is no $\xi \in \,]a,b[\,$ for which $f(b) - f(a) = f'(\xi)(b - a)$, since $f'(\xi) = (-\sin\xi, \cos\xi) \neq (0,0)$. We will nevertheless try to find a substitute for this theorem, which will be useful in what follows. We first need the following lemma.

Lemma 10.19 Let $\varphi : [a,b] \to \mathbb{R}^M$ be a differentiable function for which there is a constant $C \ge 0$ such that

$$ \|\varphi'(t)\| \le C \quad \text{for every } t \in [a,b] . $$

Then

$$ \|\varphi(b) - \varphi(a)\| \le C(b - a) . $$

Proof We set $I_0 = [a,b]$. Assume by contradiction that

$$ \|\varphi(b) - \varphi(a)\| - C(b - a) = \mu > 0 . $$

We divide the interval $[a,b]$ into two equal parts, taking the midpoint $m = \frac{a+b}{2}$. Then it can be seen that one of the two following inequalities holds:

$$ \|\varphi(m) - \varphi(a)\| - C(m - a) \ge \frac{\mu}{2} , \qquad \|\varphi(b) - \varphi(m)\| - C(b - m) \ge \frac{\mu}{2} . $$


If the first one holds, we set $I_1 = [a, m]$; otherwise, we set $I_1 = [m, b]$. In the same way, we proceed now to the definition of $I_2$, then $I_3$, and so on. We thus obtain a sequence of compact intervals $I_n = [a_n, b_n]$, with

$$ I_0 \supseteq I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots $$

such that

$$ \|\varphi(b_n) - \varphi(a_n)\| - C(b_n - a_n) \ge \frac{\mu}{2^n} $$

for every $n \in \mathbb{N}$. By Cantor's Theorem 1.9, there is a $c \in \mathbb{R}$ such that $a_n \le c \le b_n$ for every $n \in \mathbb{N}$, and since

$$ b_n - a_n = \frac{b - a}{2^n} , $$

we have that $\lim_n a_n = \lim_n b_n = c$. Since $\varphi$ is differentiable at $c$, we can write

$$ \varphi(t) = \varphi(c) + \varphi'(c)(t - c) + r(t) , \quad \text{with} \quad \lim_{t \to c} \frac{r(t)}{t - c} = 0 . $$

Let $\varepsilon \in \,]0, \frac{\mu}{b-a}[\,$. If $n$ is sufficiently large, we have

$$ \begin{aligned} \mu &\le 2^n \big[ \|\varphi(b_n) - \varphi(a_n)\| - C(b_n - a_n) \big] \\ &\le 2^n \big[ \|\varphi(b_n) - \varphi(c)\| + \|\varphi(c) - \varphi(a_n)\| - C(b_n - a_n) \big] \\ &= 2^n \big[ \|\varphi'(c)(b_n - c) + r(b_n)\| + \|\varphi'(c)(a_n - c) + r(a_n)\| - C(b_n - a_n) \big] \\ &\le 2^n \big[ \|\varphi'(c)\|\,|b_n - c| + \|r(b_n)\| + \|\varphi'(c)\|\,|a_n - c| + \|r(a_n)\| - C(b_n - a_n) \big] \\ &\le 2^n \big[ C(b_n - c) + \|r(b_n)\| + C(c - a_n) + \|r(a_n)\| - C(b_n - a_n) \big] \\ &= 2^n \big[ \|r(b_n)\| + \|r(a_n)\| \big] \\ &\le 2^n \big[ \varepsilon|b_n - c| + \varepsilon|a_n - c| \big] = 2^n \varepsilon (b_n - a_n) = \varepsilon(b - a) , \end{aligned} $$

a contradiction, which completes the proof. ∎



It will now be useful to introduce the norm of a linear function .A : RN → RM as A = max{Ax : x = 1} .

.

10.9 Mean Value Theorem

279

Such a maximum exists by Weierstrass’ Theorem 4.10 since the function .A, being linear, is continuous. The reader might like to check that we have indeed defined a norm, verifying the following properties: A ≥ 0 . A = 0 ⇔ x = 0 . .αA = |α| A .   .A + A  ≤ A + A  .

(a) (b) .(c) .(d) .

.

.

.

Moreover, we have that Ax ≤ A x ,

.

for every x ∈ RN .
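For the Euclidean norms, the operator norm just defined coincides with the largest singular value of the associated matrix, which numerical libraries compute directly. The sketch below (matrix and sampling parameters are our choice) approximates the maximum over the unit circle by sampling and checks the inequality $\|Ax\| \le \|A\|\,\|x\|$:

```python
import numpy as np

# Operator norm ||A|| = max{ ||Ax|| : ||x|| = 1 } for a 2x2 matrix.
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

op_norm = np.linalg.norm(A, 2)   # largest singular value of A

# Sample many unit vectors x and record the largest ||Ax||; this
# approaches the operator norm from below as the sampling gets finer.
angles = np.linspace(0.0, 2.0 * np.pi, 10000)
units = np.stack([np.cos(angles), np.sin(angles)])   # shape (2, 10000)
sampled_max = np.max(np.linalg.norm(A @ units, axis=0))
gap = op_norm - sampled_max                          # small and nonnegative

# The bound ||Ax|| <= ||A|| ||x|| for an arbitrary vector.
x = np.array([0.3, -1.7])
bound_ok = np.linalg.norm(A @ x) <= op_norm * np.linalg.norm(x) + 1e-12
```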

We are now ready to state our extension of Lagrange's Mean Value Theorem 6.11. Let $O$ be an open set in $\mathbb{R}^N$, and let $f : O \to \mathbb{R}^M$ be a differentiable function.

Theorem 10.20 (Mean Value Theorem) If $[x_0, x]$ is a segment contained in $O$, then

$$ \|f(x) - f(x_0)\| \le \sup\big\{ \|df(v)\| : v \in [x_0, x] \big\}\, \|x - x_0\| . $$

Proof If the supremum is equal to $+\infty$, there is nothing to be proved. Suppose, then, that

$$ \sup\big\{ \|df(v)\| : v \in [x_0, x] \big\} = C \in \mathbb{R} . $$

We consider the function $\varphi : [0,1] \to \mathbb{R}^M$, defined as $\varphi(t) = f(x_0 + t(x - x_0))$. Then

$$ \|\varphi'(t)\| = \|df(x_0 + t(x - x_0))(x - x_0)\| \le \|df(x_0 + t(x - x_0))\|\,\|x - x_0\| \le C\|x - x_0\| $$

for every $t \in [0,1]$. By Lemma 10.19,

$$ \|f(x) - f(x_0)\| = \|\varphi(1) - \varphi(0)\| \le C\|x - x_0\|(1 - 0) = C\|x - x_0\| , $$

which is exactly what we wanted to prove. ∎


10.10 Implicit Function Theorem: General Statement

We will now generalize the Implicit Function Theorem 10.13 to its general finite-dimensional context. Let $O$ be an open subset of $\mathbb{R}^M \times \mathbb{R}^N$ and $g : O \to \mathbb{R}^N$ a $C^1$-function. Hence, $g$ has $N$ components:

$$ g(x, y) = (g_1(x, y), \ldots, g_N(x, y)) . $$

Here $x = (x_1, \ldots, x_M) \in \mathbb{R}^M$ and $y = (y_1, \ldots, y_N) \in \mathbb{R}^N$. We will use the following notation for the Jacobian matrices:

$$ \frac{\partial g}{\partial x}(x, y) = \begin{pmatrix} \frac{\partial g_1}{\partial x_1}(x,y) & \cdots & \frac{\partial g_1}{\partial x_M}(x,y) \\ \vdots & & \vdots \\ \frac{\partial g_N}{\partial x_1}(x,y) & \cdots & \frac{\partial g_N}{\partial x_M}(x,y) \end{pmatrix} , \qquad \frac{\partial g}{\partial y}(x, y) = \begin{pmatrix} \frac{\partial g_1}{\partial y_1}(x,y) & \cdots & \frac{\partial g_1}{\partial y_N}(x,y) \\ \vdots & & \vdots \\ \frac{\partial g_N}{\partial y_1}(x,y) & \cdots & \frac{\partial g_N}{\partial y_N}(x,y) \end{pmatrix} . $$

Theorem 10.21 (Implicit Function Theorem—II) Let $O \subseteq \mathbb{R}^M \times \mathbb{R}^N$ be an open set, $g : O \to \mathbb{R}^N$ a $C^1$-function, and $(x_0, y_0)$ a point in $O$ for which

$$ g(x_0, y_0) = 0 \quad \text{and} \quad \det \frac{\partial g}{\partial y}(x_0, y_0) \neq 0 . $$

Then there exist an open neighborhood $U$ of $x_0$, an open neighborhood $V$ of $y_0$, and a $C^1$-function $\eta : U \to V$ such that $U \times V \subseteq O$ and, taking $x \in U$ and $y \in V$, we have that

$$ g(x, y) = 0 \quad \Leftrightarrow \quad y = \eta(x) . $$

Moreover, the function $\eta$ is of class $C^1$, and the following formula holds true:

$$ J\eta(x) = -\left[ \frac{\partial g}{\partial y}(x, \eta(x)) \right]^{-1} \frac{\partial g}{\partial x}(x, \eta(x)) . $$

Proof In the case where $N = 1$, the definition of $\eta$ is almost the same as the one given in the proof of Theorem 10.13. It will be sufficient to replace the interval $[x_0 - \delta, x_0 + \delta]$ with the ball $\overline{B}(x_0, \delta)$ and to replace $]x_0 - \delta', x_0 + \delta'[$ with $B(x_0, \delta')$. Once the function $\eta : U \to V$ has been defined, let us see how to prove its continuity and its differentiability.

To verify the continuity of $\eta$, let us fix an $\bar{x} \in U$ and prove that $\eta$ is continuous at $\bar{x}$. If we take $x \in U$ and consider the function $\gamma : [0,1] \to U \times V$, defined as

$$ \gamma(t) = \big(\bar{x} + t(x - \bar{x}),\; \eta(\bar{x}) + t(\eta(x) - \eta(\bar{x}))\big) , $$

Lagrange's Mean Value Theorem 6.11 applied to $g \circ \gamma$ tells us that there is a $\xi \in \,]0,1[\,$ for which

$$ g(x, \eta(x)) - g(\bar{x}, \eta(\bar{x})) = \frac{\partial g}{\partial x}(\gamma(\xi))(x - \bar{x}) + \frac{\partial g}{\partial y}(\gamma(\xi))(\eta(x) - \eta(\bar{x})) . $$

Since $g(x, \eta(x)) = g(\bar{x}, \eta(\bar{x})) = 0$, we have that

$$ |\eta(x) - \eta(\bar{x})| = \frac{1}{\left|\frac{\partial g}{\partial y}(\gamma(\xi))\right|} \left| \frac{\partial g}{\partial x}(\gamma(\xi))(x - \bar{x}) \right| . $$

Since the partial derivatives of $g$ are continuous and $\frac{\partial g}{\partial y}$ is not zero on the compact set $\overline{U} \times \overline{V}$, there is a constant $c > 0$ for which

$$ \frac{1}{\left|\frac{\partial g}{\partial y}(\gamma(\xi))\right|} \left| \frac{\partial g}{\partial x}(\gamma(\xi))(x - \bar{x}) \right| \le c\,\|x - \bar{x}\| . $$

As a consequence, $\eta$ is continuous at $\bar{x}$.

We now prove the differentiability. Taking $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_M)$, let $x = (\bar{x}_1 + h, \bar{x}_2, \ldots, \bar{x}_M)$; proceeding as previously, for $h$ small enough we have

$$ \frac{\eta(\bar{x}_1 + h, \bar{x}_2, \ldots, \bar{x}_M) - \eta(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_M)}{h} = -\frac{\frac{\partial g}{\partial x_1}(\gamma(\xi))}{\frac{\partial g}{\partial y}(\gamma(\xi))} , $$

with $\gamma(\xi)$ belonging to the segment joining $(\bar{x}, \eta(\bar{x}))$ to $(x, \eta(x))$. If $h$ tends to $0$, we have that $\gamma(\xi)$ tends to $(\bar{x}, \eta(\bar{x}))$, and hence

$$ \frac{\partial \eta}{\partial x_1}(\bar{x}) = \lim_{h \to 0} \frac{\eta(\bar{x}_1 + h, \bar{x}_2, \ldots, \bar{x}_M) - \eta(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_M)}{h} = -\frac{\frac{\partial g}{\partial x_1}(\bar{x}, \eta(\bar{x}))}{\frac{\partial g}{\partial y}(\bar{x}, \eta(\bar{x}))} . $$

The partial derivatives with respect to $x_2, \ldots, x_M$ are computed similarly, thereby yielding that $\eta$ is of class $C^1$ and

$$ J\eta(x) = -\frac{1}{\frac{\partial g}{\partial y}(x, \eta(x))}\, \frac{\partial g}{\partial x}(x, \eta(x)) \quad \text{for every } x \in U . $$

We have thus proved the theorem in the case $N = 1$.


We now assume that the statement holds up to $N - 1$ for some $N \ge 2$ (and any $M \ge 1$) and prove that it then also holds for $N$. We will use the notation

$$ \mathbf{y}_1 = (y_1, \ldots, y_{N-1}) , $$

and we will write $y = (\mathbf{y}_1, y_N)$. Since

$$ \det \begin{pmatrix} \frac{\partial g_1}{\partial y_1}(x_0, y_0) & \cdots & \frac{\partial g_1}{\partial y_N}(x_0, y_0) \\ \vdots & & \vdots \\ \frac{\partial g_N}{\partial y_1}(x_0, y_0) & \cdots & \frac{\partial g_N}{\partial y_N}(x_0, y_0) \end{pmatrix} \neq 0 , $$

at least one of the elements in the last column is different from zero. We can assume without loss of generality, possibly changing the rows, that $\frac{\partial g_N}{\partial y_N}(x_0, y_0) \neq 0$. Writing $y_0 = (\mathbf{y}_1^0, y_N^0)$, with $\mathbf{y}_1^0 = (y_1^0, \ldots, y_{N-1}^0)$, we then have

$$ g_N(x_0, \mathbf{y}_1^0, y_N^0) = 0 \quad \text{and} \quad \frac{\partial g_N}{\partial y_N}(x_0, \mathbf{y}_1^0, y_N^0) \neq 0 . $$

Then, by the already proved one-dimensional case, there are an open neighborhood $U_1$ of $(x_0, \mathbf{y}_1^0)$, an open neighborhood $V_N$ of $y_N^0$, and a $C^1$-function $\eta_1 : U_1 \to V_N$ such that $U_1 \times V_N \subseteq O$, with the following properties. If $(x, \mathbf{y}_1) \in U_1$ and $y_N \in V_N$, then

$$ g_N(x, \mathbf{y}_1, y_N) = 0 \quad \Leftrightarrow \quad y_N = \eta_1(x, \mathbf{y}_1) $$

and

$$ J\eta_1(x, \mathbf{y}_1) = -\frac{1}{\frac{\partial g_N}{\partial y_N}(x, \mathbf{y}_1, \eta_1(x, \mathbf{y}_1))}\, \frac{\partial g_N}{\partial (x, \mathbf{y}_1)}(x, \mathbf{y}_1, \eta_1(x, \mathbf{y}_1)) . $$

We may assume that $U_1$ is of the type $\widetilde{U} \times \widetilde{V}_1$, with $\widetilde{U}$ being an open neighborhood of $x_0$ and $\widetilde{V}_1$ an open neighborhood of $\mathbf{y}_1^0$.

Let us define the function $\phi : \widetilde{U} \times \widetilde{V}_1 \to \mathbb{R}^{N-1}$ by setting

$$ \phi(x, \mathbf{y}_1) = \big(g_1(x, \mathbf{y}_1, \eta_1(x, \mathbf{y}_1)), \ldots, g_{N-1}(x, \mathbf{y}_1, \eta_1(x, \mathbf{y}_1))\big) . $$

For brevity's sake we will write

$$ g_{(1,\ldots,N-1)}(x, y) = (g_1(x, y), \ldots, g_{N-1}(x, y)) , $$

so that

$$ \phi(x, \mathbf{y}_1) = g_{(1,\ldots,N-1)}(x, \mathbf{y}_1, \eta_1(x, \mathbf{y}_1)) . $$


Note that, since $\eta_1(x_0, \mathbf{y}_1^0) = y_N^0$, we have that

$$ \phi(x_0, \mathbf{y}_1^0) = g_{(1,\ldots,N-1)}(x_0, y_0) = 0 $$

and

$$ \frac{\partial \phi}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) = \frac{\partial g_{(1,\ldots,N-1)}}{\partial \mathbf{y}_1}(x_0, y_0) + \frac{\partial g_{(1,\ldots,N-1)}}{\partial y_N}(x_0, y_0)\, \frac{\partial \eta_1}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) . \tag{10.2} $$

Moreover, since $g_N(x, \mathbf{y}_1, \eta_1(x, \mathbf{y}_1)) = 0$ for every $(x, \mathbf{y}_1) \in U_1$, differentiating with respect to $\mathbf{y}_1$ we see that

$$ 0 = \frac{\partial g_N}{\partial \mathbf{y}_1}(x_0, y_0) + \frac{\partial g_N}{\partial y_N}(x_0, y_0)\, \frac{\partial \eta_1}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) . \tag{10.3} $$

Let us write the identity

$$ \det \frac{\partial \phi}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) = \frac{1}{\frac{\partial g_N}{\partial y_N}(x_0, y_0)} \det \begin{pmatrix} \dfrac{\partial \phi}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) & \dfrac{\partial g_{(1,\ldots,N-1)}}{\partial y_N}(x_0, y_0) \\[2mm] 0 & \dfrac{\partial g_N}{\partial y_N}(x_0, y_0) \end{pmatrix} . $$

Substituting the two equalities (10.2) and (10.3), we have that

$$ \begin{aligned} &\det \begin{pmatrix} \dfrac{\partial \phi}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) & \dfrac{\partial g_{(1,\ldots,N-1)}}{\partial y_N}(x_0, y_0) \\[2mm] 0 & \dfrac{\partial g_N}{\partial y_N}(x_0, y_0) \end{pmatrix} \\ &\quad = \det \begin{pmatrix} \dfrac{\partial g_{(1,\ldots,N-1)}}{\partial \mathbf{y}_1}(x_0, y_0) + \dfrac{\partial g_{(1,\ldots,N-1)}}{\partial y_N}(x_0, y_0) \dfrac{\partial \eta_1}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) & \dfrac{\partial g_{(1,\ldots,N-1)}}{\partial y_N}(x_0, y_0) \\[2mm] \dfrac{\partial g_N}{\partial \mathbf{y}_1}(x_0, y_0) + \dfrac{\partial g_N}{\partial y_N}(x_0, y_0) \dfrac{\partial \eta_1}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) & \dfrac{\partial g_N}{\partial y_N}(x_0, y_0) \end{pmatrix} \\ &\quad = \det \left( \frac{\partial g}{\partial \mathbf{y}_1}(x_0, y_0) + \frac{\partial g}{\partial y_N}(x_0, y_0)\, \frac{\partial \eta_1}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) \ \middle|\ \frac{\partial g}{\partial y_N}(x_0, y_0) \right) . \end{aligned} $$


We now recall that adding a scalar multiple of one column to another column of a matrix does not change the value of its determinant. Hence, since

$$ \frac{\partial g}{\partial y_N}(x_0, y_0)\, \frac{\partial \eta_1}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) = \begin{pmatrix} \frac{\partial g_1}{\partial y_N}(x_0, y_0)\frac{\partial \eta_1}{\partial y_1}(x_0, \mathbf{y}_1^0) & \cdots & \frac{\partial g_1}{\partial y_N}(x_0, y_0)\frac{\partial \eta_1}{\partial y_{N-1}}(x_0, \mathbf{y}_1^0) \\ \vdots & & \vdots \\ \frac{\partial g_N}{\partial y_N}(x_0, y_0)\frac{\partial \eta_1}{\partial y_1}(x_0, \mathbf{y}_1^0) & \cdots & \frac{\partial g_N}{\partial y_N}(x_0, y_0)\frac{\partial \eta_1}{\partial y_{N-1}}(x_0, \mathbf{y}_1^0) \end{pmatrix} , $$

whose columns are all scalar multiples of the column $\frac{\partial g}{\partial y_N}(x_0, y_0)$, it must be that

$$ \det \left( \frac{\partial g}{\partial \mathbf{y}_1}(x_0, y_0) + \frac{\partial g}{\partial y_N}(x_0, y_0)\, \frac{\partial \eta_1}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) \ \middle|\ \frac{\partial g}{\partial y_N}(x_0, y_0) \right) = \det \frac{\partial g}{\partial y}(x_0, y_0) . $$

Thus,

$$ \det \frac{\partial \phi}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) = \frac{1}{\frac{\partial g_N}{\partial y_N}(x_0, y_0)} \det \frac{\partial g}{\partial y}(x_0, y_0) , $$

and finally we have

$$ \phi(x_0, \mathbf{y}_1^0) = 0 \quad \text{and} \quad \det \frac{\partial \phi}{\partial \mathbf{y}_1}(x_0, \mathbf{y}_1^0) \neq 0 . $$

By the inductive assumption, there are an open neighborhood $U$ of $x_0$, an open neighborhood $V_1$ of $\mathbf{y}_1^0$, and a $C^1$-function $\eta_2 : U \to V_1$ such that $U \times V_1 \subseteq \widetilde{U} \times \widetilde{V}_1$, and the following holds: for every $x \in U$ and $\mathbf{y}_1 \in V_1$,

$$ \phi(x, \mathbf{y}_1) = 0 \quad \Leftrightarrow \quad \mathbf{y}_1 = \eta_2(x) . $$

In conclusion, for $x \in U$ and $y = (\mathbf{y}_1, y_N) \in V_1 \times V_N$, we have that

$$ \begin{aligned} g(x, y) = 0 \ &\Leftrightarrow\ \begin{cases} g_{(1,\ldots,N-1)}(x, \mathbf{y}_1, y_N) = 0 \\ g_N(x, \mathbf{y}_1, y_N) = 0 \end{cases} \\ &\Leftrightarrow\ \begin{cases} g_{(1,\ldots,N-1)}(x, \mathbf{y}_1, y_N) = 0 \\ y_N = \eta_1(x, \mathbf{y}_1) \end{cases} \\ &\Leftrightarrow\ \begin{cases} \phi(x, \mathbf{y}_1) = 0 \\ y_N = \eta_1(x, \mathbf{y}_1) \end{cases} \\ &\Leftrightarrow\ \begin{cases} \mathbf{y}_1 = \eta_2(x) \\ y_N = \eta_1(x, \mathbf{y}_1) \end{cases} \\ &\Leftrightarrow\ y = (\eta_2(x), \eta_1(x, \eta_2(x))) . \end{aligned} $$

Setting $V = V_1 \times V_N$, we may then define the function $\eta : U \to V$ as

$$ \eta(x) = (\eta_2(x), \eta_1(x, \eta_2(x))) . $$

This function is of class $C^1$ since both $\eta_1$ and $\eta_2$ are as well. Since $g(x, \eta(x)) = 0$ for every $x \in U$, we easily deduce that

$$ \frac{\partial g}{\partial x}(x, \eta(x)) + \frac{\partial g}{\partial y}(x, \eta(x))\, J\eta(x) = 0 , $$

whence the formula for $J\eta(x)$. ∎



Clearly, the following analogous statement holds true, where the roles of $x$ and $y$ are interchanged.

Theorem 10.22 (Implicit Function Theorem—III) Let $O \subseteq \mathbb{R}^M \times \mathbb{R}^N$ be an open set, $g : O \to \mathbb{R}^M$ a $C^1$-function, and $(x_0, y_0)$ a point in $O$ for which

$$ g(x_0, y_0) = 0 \quad \text{and} \quad \det \frac{\partial g}{\partial x}(x_0, y_0) \neq 0 . $$

Then there exist an open neighborhood $U$ of $x_0$, an open neighborhood $V$ of $y_0$, and a $C^1$-function $\eta : V \to U$ such that $U \times V \subseteq O$ and, taking $x \in U$ and $y \in V$, we have that

$$ g(x, y) = 0 \quad \Leftrightarrow \quad x = \eta(y) . $$

Moreover, the function $\eta$ is of class $C^1$, and the following formula holds:

$$ J\eta(y) = -\left[ \frac{\partial g}{\partial x}(\eta(y), y) \right]^{-1} \frac{\partial g}{\partial y}(\eta(y), y) . $$
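For a concrete check of the Jacobian formula in Theorem 10.21, one can take a system whose implicit function is known in closed form. In the sketch below (the system $g$ is our own choice, with $M = 1$ and $N = 2$), the matrix formula is compared with the exact derivative of $\eta$:

```python
import numpy as np

# For g(x, y) = (y1 - x*y2 - 1, y2 - x^2), the equation g(x, y) = 0 is
# solved by eta(x) = (1 + x^3, x^2), and dg/dy is invertible everywhere.
# Theorem 10.21 predicts J eta(x) = -[dg/dy]^{-1} dg/dx at (x, eta(x)).

def eta(x):
    return np.array([1.0 + x**3, x**2])

def dg_dy(x, y):
    return np.array([[1.0, -x],
                     [0.0, 1.0]])

def dg_dx(x, y):
    y1, y2 = y
    return np.array([[-y2], [-2.0 * x]])

x0 = 0.7
y0 = eta(x0)

J_formula = -np.linalg.solve(dg_dy(x0, y0), dg_dx(x0, y0))  # theorem's formula
J_exact = np.array([[3.0 * x0**2], [2.0 * x0]])             # eta'(x) = (3x^2, 2x)

err = np.max(np.abs(J_formula - J_exact))
```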


10.11 Local Diffeomorphisms

Let us introduce the notion of "diffeomorphism."

Definition 10.23 Given $A$ and $B$, two open subsets of $\mathbb{R}^N$, a function $\varphi : A \to B$ is said to be a "diffeomorphism" if it is of class $C^1$, it is a bijection, and its inverse $\varphi^{-1} : B \to A$ is also of class $C^1$.

Let us state the following important consequence of the Implicit Function Theorem.

Theorem 10.24 (Local Diffeomorphism Theorem) Let $A$ be an open subset of $\mathbb{R}^N$, and let $\varphi : A \to \mathbb{R}^N$ be a $C^1$-function. If, for some $x_0 \in A$, we have that $\det J\varphi(x_0) \neq 0$, then there exist an open neighborhood $U$ of $x_0$ contained in $A$ and an open neighborhood $V$ of $\varphi(x_0)$ such that $\varphi(U) = V$, and the restricted function $\varphi|_U : U \to V$ is a diffeomorphism.

Proof We consider the function $g : A \times \mathbb{R}^N \to \mathbb{R}^N$ defined as

$$ g(x, y) = \varphi(x) - y . $$

Setting $y_0 = \varphi(x_0)$, we have that

$$ g(x_0, y_0) = 0 \quad \text{and} \quad \det \frac{\partial g}{\partial x}(x_0, y_0) = \det J\varphi(x_0) \neq 0 . $$

By the Implicit Function Theorem 10.22, there exist an open neighborhood $V$ of $y_0$, an open neighborhood $U$ of $x_0$, and a $C^1$-function $\eta : V \to U$ such that $U \subseteq A$ and, taking $y \in V$ and $x \in U$,

$$ \varphi(x) = y \quad \Leftrightarrow \quad g(x, y) = 0 \quad \Leftrightarrow \quad x = \eta(y) . $$

Hence, $\eta = \varphi|_U^{-1}$, and the proof is thus completed. ∎

The following corollary will be useful.

Corollary 10.25 Let $A \subseteq \mathbb{R}^N$ be an open set and $\sigma : A \to \mathbb{R}^N$ an injective $C^1$-function such that $\det J\sigma(x) \neq 0$ for every $x \in A$. Then the set $B = \sigma(A)$ is open, and the function $\varphi : A \to B$ defined as $\varphi(x) = \sigma(x)$ is a diffeomorphism.

Proof For every $y_0 \in \sigma(A)$ there is a unique $x_0 \in A$ such that $\sigma(x_0) = y_0$, and we know that $\det J\sigma(x_0) \neq 0$. Hence, by the Local Diffeomorphism Theorem 10.24, there exist an open neighborhood $U$ of $x_0$ contained in $A$ and an open neighborhood $V$ of $y_0$ such that $\sigma(U) = V$, and the restricted function $\sigma|_U : U \to V$ is a diffeomorphism. Then $V = \sigma(U) \subseteq \sigma(A)$, thereby proving that $\sigma(A)$ is an open set. In conclusion, the function $\varphi : A \to \sigma(A)$ defined as $\varphi(x) = \sigma(x)$ is bijective and of class $C^1$, and, being a local diffeomorphism, its inverse $\varphi^{-1} : \sigma(A) \to A$ is of class $C^1$ as well. ∎

We now derive the formula for the differential of the inverse function.

Theorem 10.26 Let $\varphi : A \to B$ be a diffeomorphism, take $x_0 \in A$, and let $y_0 = \varphi(x_0)$. Then $d\varphi(x_0)$ is invertible, and

$$ d\varphi^{-1}(y_0) = d\varphi(x_0)^{-1} , $$

whence

$$ J\varphi^{-1}(y_0) = [J\varphi(x_0)]^{-1} . $$

Proof Observe that $\varphi^{-1} \circ \varphi : A \to A$ is the identity function $I$ on $A$, and $\varphi \circ \varphi^{-1} : B \to B$ is the identity function $I$ on $B$. Then their differentials at $x_0$ and at $y_0$, respectively, are also identity functions, and hence

$$ I = d(\varphi^{-1} \circ \varphi)(x_0) = d\varphi^{-1}(\varphi(x_0)) \circ d\varphi(x_0) = d\varphi^{-1}(y_0) \circ d\varphi(x_0) , $$

$$ I = d(\varphi \circ \varphi^{-1})(y_0) = d\varphi(\varphi^{-1}(y_0)) \circ d\varphi^{-1}(y_0) = d\varphi(x_0) \circ d\varphi^{-1}(y_0) . $$

This proves that $d\varphi(x_0)$ is invertible, and $d\varphi^{-1}(y_0)$ is its inverse. The equality for the Jacobian matrices is a consequence of the fact that the matrix associated with the inverse of a linear function is the inverse of the matrix of that linear function. ∎
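Theorem 10.26 can be illustrated with the polar-coordinate map, whose inverse is known explicitly away from the origin; all names in the sketch below are our own:

```python
import numpy as np

# phi(r, t) = (r cos t, r sin t) is a local diffeomorphism wherever r > 0.
# Its inverse is r = sqrt(x^2 + y^2), t = atan2(y, x), so both Jacobians
# can be written down and the identity J phi^{-1} = [J phi]^{-1} checked.

def J_phi(r, t):
    return np.array([[np.cos(t), -r * np.sin(t)],
                     [np.sin(t),  r * np.cos(t)]])

def J_phi_inv(x, y):
    r2 = x * x + y * y
    r = np.sqrt(r2)
    return np.array([[x / r, y / r],
                     [-y / r2, x / r2]])

r0, t0 = 2.0, 0.6
x0, y0 = r0 * np.cos(t0), r0 * np.sin(t0)

err = np.max(np.abs(J_phi_inv(x0, y0) - np.linalg.inv(J_phi(r0, t0))))
```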

10.12 M-Surfaces

We often hear talk of "curves" and "surfaces" without a precise definition of what, in fact, they are. We now begin examining these objects from a dynamical point of view. The motivation comes from a typical situation in physics when one wants to describe the trajectory of a moving object. Assuming that the object is a point, surely enough we would not be satisfied if we were only told, for example, that its trajectory was a circle. We would also like to know how the object moves on this circle: Is its speed constant or varying? Is it moving clockwise or counterclockwise? Or is it oscillating back and forth? To satisfy the need to know precisely how the object is moving, we introduce a function, defined on some interval $[a,b]$, which to each instant of time $t \in [a,b]$ associates its position in space, say, $\sigma(t)$. Such a function $\sigma : [a,b] \to \mathbb{R}^N$, if it is sufficiently regular, will be called a "curve" in $\mathbb{R}^N$.


Similar observations can be made for a "surface," which will be a function defined on some rectangle $[a_1, b_1] \times [a_2, b_2]$. (The choice of a rectangular domain is made for simplicity.) These two situations will now be generalized to an arbitrary dimension $M$, leading to the concept of "M-surface."

We denote by $I$ a "rectangle" in $\mathbb{R}^M$, i.e., a set of the type

$$ I = [a_1, b_1] \times \cdots \times [a_M, b_M] . $$

This word is surely familiar in the case of $M = 2$. If $M = 1$, a rectangle happens to be a compact interval, whereas if $M = 3$, we usually prefer to call it a "rectangular parallelepiped" or "cuboid."

Definition 10.27 Let $1 \le M \le N$. We call "M-surface" in $\mathbb{R}^N$ a function $\sigma : I \to \mathbb{R}^N$ of class $C^1$. If $M = 1$, then $\sigma$ is also said to be a "curve"; if $M = 2$, then we will simply call it a "surface." The set $\sigma(I)$ is the "image" of the M-surface $\sigma$. We will say that the M-surface $\sigma$ is "regular" if, for every $u \in \mathring{I}$, the Jacobian matrix $J\sigma(u)$ has rank $M$.

A curve in $\mathbb{R}^N$ is a function $\sigma : [a,b] \to \mathbb{R}^N$, with

$$ \sigma(t) = (\sigma_1(t), \ldots, \sigma_N(t)) . $$

The curve is regular if, for every $t \in \,]a,b[\,$, the vector $\sigma'(t) = (\sigma_1'(t), \ldots, \sigma_N'(t))$ is different from zero, i.e., $\sigma'(t) \neq (0, \ldots, 0)$. In that case, it is possible to define the "tangent unit vector" at the point $\sigma(t)$,

$$ \tau_\sigma(t) = \frac{\sigma'(t)}{\|\sigma'(t)\|} . $$

[Figure: a curve σ running from σ(a) to σ(b), with the tangent unit vector τσ(t) attached at the point σ(t).]

Example The curve $\sigma : [0, 2\pi] \to \mathbb{R}^3$, defined by

$$ \sigma(t) = (R\cos(2t), R\sin(2t), 0) , $$

has as its image the circle

$$ \{(x, y, z) : x^2 + y^2 = R^2 ,\ z = 0\} $$

(which is covered twice). Since $\sigma'(t) = (-2R\sin(2t), 2R\cos(2t), 0)$, it is a regular curve, and

$$ \tau_\sigma(t) = (-\sin(2t), \cos(2t), 0) . $$
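The tangent unit vector of the example above can be recovered numerically from the definition; the sketch below (with a sample radius $R$ and point of our choice) compares a difference-quotient approximation of $\sigma'$ with the stated closed form:

```python
import numpy as np

# Doubled circle sigma(t) = (R cos 2t, R sin 2t, 0); the tangent unit
# vector tau = sigma'/||sigma'|| should equal (-sin 2t, cos 2t, 0).

R = 1.5

def sigma(t):
    return np.array([R * np.cos(2*t), R * np.sin(2*t), 0.0])

t0, h = 0.8, 1e-6
deriv = (sigma(t0 + h) - sigma(t0 - h)) / (2.0 * h)   # approximates sigma'(t0)
tau_numeric = deriv / np.linalg.norm(deriv)
tau_exact = np.array([-np.sin(2*t0), np.cos(2*t0), 0.0])

err = np.max(np.abs(tau_numeric - tau_exact))
```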

A surface in $\mathbb{R}^3$ is a function $\sigma : [a_1, b_1] \times [a_2, b_2] \to \mathbb{R}^3$. The surface is regular if, for every $(u,v) \in \,]a_1,b_1[ \times ]a_2,b_2[\,$, the vectors $\frac{\partial\sigma}{\partial u}(u,v)$, $\frac{\partial\sigma}{\partial v}(u,v)$ are linearly independent. In that case, they determine a plane, called the "tangent plane" to the surface at the point $\sigma(u,v)$, and it is possible to define the "normal unit vector"

$$ \nu_\sigma(u,v) = \frac{\dfrac{\partial\sigma}{\partial u}(u,v) \times \dfrac{\partial\sigma}{\partial v}(u,v)}{\left\| \dfrac{\partial\sigma}{\partial u}(u,v) \times \dfrac{\partial\sigma}{\partial v}(u,v) \right\|} , $$

which is visualized in the following figure.

[Figure: a surface with the normal unit vector νσ(u,v) attached at the point σ(u,v).]

Example 1 The surface $\sigma : [0,\pi] \times [0,\pi] \to \mathbb{R}^3$, defined by

$$ \sigma(\phi, \theta) = (R\sin\phi\cos\theta, R\sin\phi\sin\theta, R\cos\phi) , $$

has as its image the hemisphere

$$ \{(x, y, z) : x^2 + y^2 + z^2 = R^2 ,\ y \ge 0\} . $$

Since

$$ \frac{\partial\sigma}{\partial\phi}(\phi,\theta) = (R\cos\phi\cos\theta, R\cos\phi\sin\theta, -R\sin\phi) , \qquad \frac{\partial\sigma}{\partial\theta}(\phi,\theta) = (-R\sin\phi\sin\theta, R\sin\phi\cos\theta, 0) , $$

we compute

$$ \frac{\partial\sigma}{\partial\phi}(\phi,\theta) \times \frac{\partial\sigma}{\partial\theta}(\phi,\theta) = (R^2\sin^2\phi\cos\theta, R^2\sin^2\phi\sin\theta, R^2\sin\phi\cos\phi) . $$

We thus see that it is a regular surface, and

$$ \nu_\sigma(\phi,\theta) = (\sin\phi\cos\theta, \sin\phi\sin\theta, \cos\phi) . $$
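The normal unit vector computed in Example 1 can be checked numerically in the same spirit; $R$ and the sample point below are our own choice:

```python
import numpy as np

# Spherical patch sigma(phi, theta); the normal unit vector
# nu = (s_phi x s_theta)/||s_phi x s_theta|| should equal
# (sin phi cos theta, sin phi sin theta, cos phi).

R = 2.0

def sigma(phi, theta):
    return np.array([R * np.sin(phi) * np.cos(theta),
                     R * np.sin(phi) * np.sin(theta),
                     R * np.cos(phi)])

phi0, theta0, h = 1.1, 0.4, 1e-6
s_phi = (sigma(phi0 + h, theta0) - sigma(phi0 - h, theta0)) / (2.0 * h)
s_theta = (sigma(phi0, theta0 + h) - sigma(phi0, theta0 - h)) / (2.0 * h)
cross = np.cross(s_phi, s_theta)
nu_numeric = cross / np.linalg.norm(cross)
nu_exact = np.array([np.sin(phi0) * np.cos(theta0),
                     np.sin(phi0) * np.sin(theta0),
                     np.cos(phi0)])

err = np.max(np.abs(nu_numeric - nu_exact))
```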

Example 2 The surface $\sigma : [0, 2\pi] \times [0, 2\pi] \to \mathbb{R}^3$, defined by

$$ \sigma(u,v) = ((R + r\cos u)\cos v, (R + r\cos u)\sin v, r\sin u) , $$

where $0 < r < R$, has as its image a torus

$$ \left\{ (x, y, z) : \left( \sqrt{x^2 + y^2} - R \right)^2 + z^2 = r^2 \right\} . $$

Even in this case, one can verify that it is a regular surface.


A 3-surface in $\mathbb{R}^3$ is also called a "volume."

Example The function $\sigma : [0,R] \times [0,\pi] \times [0,2\pi] \to \mathbb{R}^3$, defined by

$$ \sigma(\rho, \phi, \theta) = (\rho\sin\phi\cos\theta, \rho\sin\phi\sin\theta, \rho\cos\phi) , $$

has as its image the closed ball

$$ \{(x, y, z) : x^2 + y^2 + z^2 \le R^2\} . $$

In this case, $\det J\sigma(\rho, \phi, \theta) = \rho^2\sin\phi$, so that it is a regular volume.
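The determinant $\det J\sigma(\rho,\phi,\theta) = \rho^2\sin\phi$ can be confirmed at a sample interior point with a finite-difference Jacobian (all names and the sample point below are our own choice):

```python
import numpy as np

# Spherical-coordinate volume sigma(rho, phi, theta); the Jacobian is
# approximated column by column with centered difference quotients and
# its determinant compared with rho^2 * sin(phi).

def sigma(p):
    rho, phi, theta = p
    return np.array([rho * np.sin(phi) * np.cos(theta),
                     rho * np.sin(phi) * np.sin(theta),
                     rho * np.cos(phi)])

p0 = np.array([0.8, 1.0, 2.0])   # an interior point (rho, phi, theta)
h = 1e-6
J = np.column_stack([(sigma(p0 + h*e) - sigma(p0 - h*e)) / (2.0 * h)
                     for e in np.eye(3)])
det_numeric = np.linalg.det(J)
det_formula = p0[0]**2 * np.sin(p0[1])

err = abs(det_numeric - det_formula)
```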

The best way to describe a set $\mathcal{M}$ in $\mathbb{R}^N$ is to find a parametrization. Let us explain precisely what this means.

Definition 10.28 An M-surface $\sigma : I \to \mathbb{R}^N$ is an "M-parametrization" of a set $\mathcal{M}$ if it is regular and injective on $\mathring{I}$ and $\sigma(I) = \mathcal{M}$. We say that a subset of $\mathbb{R}^N$ is "M-parametrizable" if there is an M-parametrization of it.

Examples The circle $\mathcal{M} = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$ is 1-parametrizable, and $\sigma : [0,2\pi] \to \mathbb{R}^2$, given by $\sigma(t) = (\cos t, \sin t)$, is a 1-parametrization of it. A 2-parametrization of the sphere $\mathcal{M} = \{(x,y,z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 = 1\}$ is, for example, $\sigma : [0,\pi] \times [0,2\pi] \to \mathbb{R}^3$, defined by

$$ \sigma(\phi, \theta) = (\sin\phi\cos\theta, \sin\phi\sin\theta, \cos\phi) . $$

10.13 Local Analysis of M-Surfaces

Sometimes geometrical objects are given by an equation like $y = x^2$ (a parabola), $x^2 + y^2 = 1$ (a circle), or $x^2 + y^2 + z^2 = 1$ (a sphere). We will now show that, under reasonable assumptions, these kinds of objects can be locally described by a curve, a surface, or, in general, an M-surface, which, we recall, is a $C^1$-function defined on a rectangle, with values in $\mathbb{R}^N$. We now assume $1 \le M < N$.

We thus have in mind a geometrical object described by an equation like

$$ g(x) = 0 . $$

We will focus our attention at a point $x_0$ and locally describe our object by some $C^1$-function defined on some rectangle of the type

$$ B[r] = [-r, r] \times \cdots \times [-r, r] . $$

Theorem 10.29 Let $O \subseteq \mathbb{R}^N$ be an open set, $x_0$ a point of $O$, and $g : O \to \mathbb{R}^{N-M}$ a function of class $C^1$ such that

$$ g(x_0) = 0 , \quad \text{and} \quad Jg(x_0) \text{ has rank } N - M . $$

Then there exist a neighborhood $U$ of $x_0$ and a regular and injective M-surface $\sigma : B[r] \to \mathbb{R}^N$, for some $r > 0$, such that $\sigma(0) = x_0$ and

$$ \{x \in U : g(x) = 0\} = \sigma(B[r]) . $$

Proof Assume, for instance, that the matrix

$$ \frac{\partial g}{\partial \tilde{x}}(x_0) = \begin{pmatrix} \frac{\partial g_1}{\partial x_{M+1}}(x_0) & \cdots & \frac{\partial g_1}{\partial x_N}(x_0) \\ \vdots & & \vdots \\ \frac{\partial g_{N-M}}{\partial x_{M+1}}(x_0) & \cdots & \frac{\partial g_{N-M}}{\partial x_N}(x_0) \end{pmatrix} $$

is invertible. (If not, it will be sufficient to shift the columns of the matrix $Jg(x_0)$ to reduce to this case.) Let us write each $x \in O$ as $(\hat{x}, \tilde{x})$, where $\hat{x} \in \mathbb{R}^M$ and $\tilde{x} \in \mathbb{R}^{N-M}$. Since

$$ g(\hat{x}_0, \tilde{x}_0) = 0 \quad \text{and} \quad \det \frac{\partial g}{\partial \tilde{x}}(\hat{x}_0, \tilde{x}_0) \neq 0 , $$

by the Implicit Function Theorem 10.21, there exist an open neighborhood $\hat{U}$ of $\hat{x}_0$, an open neighborhood $\widetilde{U}$ of $\tilde{x}_0$, and a $C^1$-function $\eta : \hat{U} \to \widetilde{U}$ such that $\hat{U} \times \widetilde{U} \subseteq O$ and, taking $\hat{x} \in \hat{U}$ and $\tilde{x} \in \widetilde{U}$, we have that

$$ g(\hat{x}, \tilde{x}) = 0 \quad \Leftrightarrow \quad \tilde{x} = \eta(\hat{x}) . $$

Let $r > 0$ be chosen such that $B[\hat{x}_0, r] \subseteq \hat{U}$, let $U = B[\hat{x}_0, r] \times \widetilde{U}$, and let $\sigma : B[r] \to \mathbb{R}^N$ be defined as $\sigma(u) = (u + \hat{x}_0, \eta(u + \hat{x}_0))$. Since $J\sigma(u)$ has as a submatrix the identity $M \times M$ matrix, surely $\sigma$ is regular. Moreover, $\sigma$ is injective since its first component $u \mapsto u + \hat{x}_0$ is. Finally, if $x = (\hat{x}, \tilde{x}) \in U$, then

$$ g(\hat{x}, \tilde{x}) = 0 \quad \Leftrightarrow \quad \tilde{x} = \eta(\hat{x}) \quad \Leftrightarrow \quad (\hat{x}, \tilde{x}) = \sigma(\hat{x} - \hat{x}_0) , $$

yielding the conclusion. ∎

The M-surface $\sigma$ appearing in the statement of the previous theorem is called a "local M-parametrization." Let us analyze in greater detail three interesting cases. We start by considering a planar curve, i.e., the case $M = 1$, $N = 2$.

Corollary 10.30 Let $O \subseteq \mathbb{R}^2$ be an open set, $(x_0, y_0)$ a point of $O$, and $g : O \to \mathbb{R}$ a function of class $C^1$ such that

$$ g(x_0, y_0) = 0 \quad \text{and} \quad \nabla g(x_0, y_0) \neq 0 . $$

Then there exist a neighborhood $U$ of $(x_0, y_0)$ and a regular and injective curve $\sigma : [-r, r] \to \mathbb{R}^2$, for some $r > 0$, such that $\sigma(0) = (x_0, y_0)$ and

$$ \{(x, y) \in U : g(x, y) = 0\} = \sigma([-r, r]) . $$

Let us now examine the case of a surface in $\mathbb{R}^3$, i.e., the case $M = 2$, $N = 3$.

Corollary 10.31 Let $O \subseteq \mathbb{R}^3$ be an open set, $(x_0, y_0, z_0)$ a point of $O$, and $g : O \to \mathbb{R}$ a function of class $C^1$ such that

$$ g(x_0, y_0, z_0) = 0 \quad \text{and} \quad \nabla g(x_0, y_0, z_0) \neq 0 . $$

Then there exist a neighborhood $U$ of $(x_0, y_0, z_0)$ and a regular and injective surface $\sigma : [-r, r] \times [-r, r] \to \mathbb{R}^3$, for some $r > 0$, such that $\sigma(0, 0) = (x_0, y_0, z_0)$ and

$$ \{(x, y, z) \in U : g(x, y, z) = 0\} = \sigma([-r, r] \times [-r, r]) . $$

We conclude with the case of a curve in $\mathbb{R}^3$, i.e., the case $M = 1$, $N = 3$.

Corollary 10.32 Let $O \subseteq \mathbb{R}^3$ be an open set, $(x_0, y_0, z_0)$ a point of $O$, and $g_1, g_2 : O \to \mathbb{R}$ two functions of class $C^1$ such that

$$ g_1(x_0, y_0, z_0) = g_2(x_0, y_0, z_0) = 0 \quad \text{and} \quad \nabla g_1(x_0, y_0, z_0) \times \nabla g_2(x_0, y_0, z_0) \neq 0 . $$

Then there exist a neighborhood $U$ of $(x_0, y_0, z_0)$ and a regular and injective curve $\sigma : [-r, r] \to \mathbb{R}^3$, for some $r > 0$, such that $\sigma(0) = (x_0, y_0, z_0)$ and

$$ \{(x, y, z) \in U : g_1(x, y, z) = g_2(x, y, z) = 0\} = \sigma([-r, r]) . $$


10.14 Lagrange Multipliers

We are now interested in finding local minimum or local maximum points for f when its domain is constrained to a set defined by some vector-valued function g.

Theorem 10.33 (Lagrange Multiplier Theorem) Let O ⊆ R^N be an open set and x0 a point of O. Let g : O → R^{N−M} be a function of class C¹ such that

g(x0) = 0  and  Jg(x0) has rank N − M,

and let f : O → R be differentiable at x0. Setting

S = {x ∈ O : g(x) = 0},

if x0 is either a local minimum or a local maximum point for f|_S (the restriction of f to S), then there exist (N − M) real numbers λ1, . . . , λ_{N−M} such that

∇f(x0) = Σ_{j=1}^{N−M} λj ∇gj(x0).

The numbers λ1, . . . , λ_{N−M} are called "Lagrange multipliers."

Proof By Theorem 10.29, there exist a neighborhood U of x0 and a regular and injective M-surface σ : B[r] → R^N for some r > 0 such that σ(0) = x0 and

S ∩ U = σ(B[r]).

Consider the function F : B[r] → R defined as F(u) = f(σ(u)). Then 0 is either a local minimum or a local maximum point for F, hence ∇F(0) = 0, i.e.,

0 = JF(0) = Jf(x0) Jσ(0) = ∇f(x0)^T Jσ(0).

As a consequence,

∇f(x0) · ∂σ/∂u1(0) = 0, . . . , ∇f(x0) · ∂σ/∂uM(0) = 0,

i.e., ∇f(x0) is orthogonal to the vectors

∂σ/∂u1(0), . . . , ∂σ/∂uM(0).

Moreover, since g(σ(u)) = 0 for every u ∈ B[r], we have that

Jg(x0) Jσ(0) = 0,

hence the vectors ∇g1(x0), . . . , ∇g_{N−M}(x0) are all orthogonal to ∂σ/∂u1(0), . . . , ∂σ/∂uM(0).

By assumption, Jσ(0) has rank M, i.e., the real vector space T generated by ∂σ/∂u1(0), . . . , ∂σ/∂uM(0) has dimension M. Therefore, the orthogonal space T⊥ has dimension N − M. The vectors ∇g1(x0), . . . , ∇g_{N−M}(x0) are linearly independent and, as we saw earlier, they belong to T⊥, so these vectors form a basis for T⊥. Since ∇f(x0) also belongs to T⊥, it must be a linear combination of the vectors of the basis. ∎

As in the previous section, we analyze in detail three interesting cases. We start by considering the case M = 1, N = 2.

Corollary 10.34 Let O ⊆ R² be an open set and (x0, y0) a point of O. Let g : O → R be a function of class C¹ such that

g(x0, y0) = 0  and  ∇g(x0, y0) ≠ 0,

and let f : O → R be differentiable at (x0, y0). Setting

S = {(x, y) ∈ O : g(x, y) = 0},

if (x0, y0) is either a local minimum or a local maximum point for f|_S, then there exists a real number λ such that

∇f(x0, y0) = λ∇g(x0, y0).

Example Among all rectangles in the plane with a given perimeter p, we want to find those that maximize the area. Let us denote by x and y the lengths of the sides of a rectangle and define the area function

f(x, y) = xy.

We are looking for the maximum points of the function f over the set

K = {(x, y) ∈ R² : x ≥ 0, y ≥ 0, 2x + 2y = p}.

This set is compact, so that f, being continuous, surely has a maximum point in K. Taking (x, y) ∈ K, note that f(x, y) = 0 only when x = 0 or y = 0; otherwise, f(x, y) > 0. Define now the function

g(x, y) = 2x + 2y − p.

Then

∇f(x, y) = λ∇g(x, y)  ⟺  (y, x) = λ(2, 2).

By the preceding considerations and Corollary 10.34, the maximum point (x0, y0) must be such that x0 = y0; hence, the rectangle must be a square.

Now we move to the case M = 2, N = 3.

Corollary 10.35 Let O ⊆ R³ be an open set and (x0, y0, z0) a point of O. Let g : O → R be a function of class C¹ such that

g(x0, y0, z0) = 0  and  ∇g(x0, y0, z0) ≠ 0,

and let f : O → R be differentiable at (x0, y0, z0). Setting

S = {(x, y, z) ∈ O : g(x, y, z) = 0},

if (x0, y0, z0) is either a local minimum or a local maximum point for f|_S, then there exists a real number λ such that

∇f(x0, y0, z0) = λ∇g(x0, y0, z0).
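The rectangle example above can be checked numerically. The following hypothetical script (an illustration, not part of the book) scans the constraint 2x + 2y = p for a fixed perimeter and confirms that the area xy is largest at the square x = y = p/4.

```python
p = 4.0                           # fixed perimeter (illustrative choice)
best_x, best_area = None, -1.0

# scan the constraint 2x + 2y = p, i.e. y = p/2 - x with 0 <= x <= p/2
n = 100000
for i in range(n + 1):
    x = (p / 2) * i / n
    y = p / 2 - x
    area = x * y
    if area > best_area:
        best_x, best_area = x, area

assert abs(best_x - p / 4) < 1e-3          # the maximizer is the square x = y = p/4
assert abs(best_area - (p / 4) ** 2) < 1e-6
print("maximum area", best_area, "attained at x =", best_x)
```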

Example Among all cuboids with a given surface area a, we want to find those that maximize the volume. Let us denote by x, y, and z the lengths of the sides of a cuboid and define the volume function

f(x, y, z) = xyz.

We are looking for the maximum points of the function f over the set

K = {(x, y, z) ∈ R³ : x ≥ 0, y ≥ 0, z ≥ 0, 2xy + 2xz + 2yz = a}.

Taking (x, y, z) ∈ K, note that f(x, y, z) = 0 only when x = 0 or y = 0 or z = 0; otherwise, f(x, y, z) > 0. Everything then seems as in the previous example, but there is a difficulty now. The set K is unbounded, hence not compact, and the argument of the previous example needs to be modified. First of all, we note that

(√(a/6), √(a/6), √(a/6)) ∈ K  and  f(√(a/6), √(a/6), √(a/6)) = (a/6)^{3/2}.

Hence, if (x0, y0, z0) is a maximum point of f on K, it must be that

f(x0, y0, z0) ≥ (a/6)^{3/2}.

We now prove that it must be that

0 ≤ x0 ≤ 3√(3a/2),  0 ≤ y0 ≤ 3√(3a/2),  0 ≤ z0 ≤ 3√(3a/2).

Let us prove the first one, the others being analogous. By contradiction, if x0 > 3√(3a/2), then, since 2x0 y0 ≤ a and 2x0 z0 ≤ a, it must be that

x0 y0 z0 ≤ x0 · (a/(2x0)) · (a/(2x0)) = a²/(4x0) < (a²/4) · 1/(3√(3a/2)) = (a/6)^{3/2},

in contrast to f(x0, y0, z0) ≥ (a/6)^{3/2}. We can then restrict the search of a maximum point of f to the set

K̃ = {(x, y, z) ∈ R³ : 0 ≤ x ≤ 3√(3a/2), 0 ≤ y ≤ 3√(3a/2), 0 ≤ z ≤ 3√(3a/2), 2xy + 2xz + 2yz = a},

which is compact; hence, the point of maximum exists. Now define the function

g(x, y, z) = 2xy + 2xz + 2yz − a.

Then

∇f(x, y, z) = λ∇g(x, y, z)  ⟺  (yz, xz, xy) = λ(2y + 2z, 2x + 2z, 2x + 2y).

If (x, y, z) solves the preceding equation with x > 0, y > 0, and z > 0, then

x(y + z) = y(x + z) = z(x + y),


and hence x = y = z. By the foregoing considerations and Corollary 10.35, the maximum point (x0, y0, z0) must be such that x0 = y0 = z0, so the cuboid must be a cube.

Let us conclude with the case M = 1, N = 3.

Corollary 10.36 Let O ⊆ R³ be an open set and (x0, y0, z0) a point of O. Let g1, g2 : O → R be two functions of class C¹ such that

g1(x0, y0, z0) = g2(x0, y0, z0) = 0  and  ∇g1(x0, y0, z0) × ∇g2(x0, y0, z0) ≠ 0,

and let f : O → R be differentiable at (x0, y0, z0). Setting

S = {(x, y, z) ∈ O : g1(x, y, z) = 0, g2(x, y, z) = 0},

if (x0, y0, z0) is either a local minimum or a local maximum point for f|_S, then there exist two real numbers λ1, λ2 such that

∇f(x0, y0, z0) = λ1∇g1(x0, y0, z0) + λ2∇g2(x0, y0, z0).
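For the cuboid example, the symmetric candidate x = y = z = √(a/6) can be verified mechanically. The following hypothetical script (not from the book) checks that it satisfies the constraint, attains the value (a/6)^{3/2}, and solves (yz, xz, xy) = λ(2y + 2z, 2x + 2z, 2x + 2y) with the multiplier λ = x/4.

```python
import math

a = 6.0                          # prescribed surface area (illustrative choice)
s = math.sqrt(a / 6.0)           # candidate side length: x = y = z = sqrt(a/6)
x = y = z = s

# the surface-area constraint 2xy + 2xz + 2yz = a holds
assert abs(2*x*y + 2*x*z + 2*y*z - a) < 1e-12

# the volume equals (a/6)^(3/2)
assert abs(x * y * z - (a / 6.0) ** 1.5) < 1e-12

# Lagrange condition (yz, xz, xy) = lam * (2y+2z, 2x+2z, 2x+2y) with lam = x/4
lam = x / 4.0
for lhs, rhs in [(y*z, 2*y + 2*z), (x*z, 2*x + 2*z), (x*y, 2*x + 2*y)]:
    assert abs(lhs - lam * rhs) < 1e-12

print("the cube of side", s, "satisfies the multiplier condition")
```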

Example We want to find the minimum and maximum points of the function

f(x, y, z) = z

on the set

S = {(x, y, z) ∈ R³ : x² + y² + z² = 1, x + y + z = 1}.

Note that S is compact, so the minimum and maximum of f on S exist. Define g1(x, y, z) = x² + y² + z² − 1 and g2(x, y, z) = x + y + z − 1. Then

∇g1(x, y, z) = (2x, 2y, 2z),  ∇g2(x, y, z) = (1, 1, 1),

hence

∇g1(x, y, z) × ∇g2(x, y, z) = (2(y − z), 2(z − x), 2(x − y)).

Note that

∇g1(x, y, z) × ∇g2(x, y, z) = 0  ⟺  x = y = z,

which implies that (x, y, z) ∉ S. Now, a simple computation shows that if (x, y, z) ∈ S, then we have that

∇f(x, y, z) = λ1∇g1(x, y, z) + λ2∇g2(x, y, z)

if and only if either

(x, y, z) = (0, 0, 1),  λ1 = 1/2,  λ2 = 0,

or

(x, y, z) = (2/3, 2/3, −1/3),  λ1 = −1/2,  λ2 = 2/3.

Since f(0, 0, 1) = 1 and f(2/3, 2/3, −1/3) = −1/3, by the preceding considerations and Corollary 10.36, we conclude that (0, 0, 1) is a maximum point and (2/3, 2/3, −1/3) is a minimum point of f on S.
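The "simple computation" above can be double-checked numerically; this hypothetical script (not part of the book) verifies that both candidate points lie on S and satisfy ∇f = λ1∇g1 + λ2∇g2 with the stated multipliers.

```python
def grads(x, y, z):
    # gradients of f = z, g1 = x^2 + y^2 + z^2 - 1, and g2 = x + y + z - 1
    return (0.0, 0.0, 1.0), (2*x, 2*y, 2*z), (1.0, 1.0, 1.0)

candidates = [
    ((0.0, 0.0, 1.0), 0.5, 0.0),       # lambda1 = 1/2,  lambda2 = 0
    ((2/3, 2/3, -1/3), -0.5, 2/3),     # lambda1 = -1/2, lambda2 = 2/3
]

for (x, y, z), l1, l2 in candidates:
    # the point lies on both constraint surfaces
    assert abs(x*x + y*y + z*z - 1.0) < 1e-12
    assert abs(x + y + z - 1.0) < 1e-12
    # grad f = l1 * grad g1 + l2 * grad g2, componentwise
    gf, g1, g2 = grads(x, y, z)
    for i in range(3):
        assert abs(gf[i] - (l1 * g1[i] + l2 * g2[i])) < 1e-12

print("both stationary points verified; f takes the values 1 and -1/3 there")
```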

10.15 Differentiable Manifolds

There is an alternative way of looking at some geometrical objects such as "curves" and "surfaces." The intuitive idea is that they locally "look the same" as a straight line or a plane. In other words, when observing these objects from a very small distance, they look "almost flat." We will now make this idea precise, in a general finite-dimensional context. Thus, let ℳ be a subset of R^N.

Definition 10.37 The set ℳ is an "M-dimensional differentiable manifold," with 1 ≤ M ≤ N (or an "M-manifold" for short), if, taking a point x in ℳ, there are an open neighborhood A of x, an open neighborhood B of 0 in R^N, and a diffeomorphism ϕ : A → B such that ϕ(x) = 0 and either

(a) ϕ(A ∩ ℳ) = {y = (y1, . . . , yN) ∈ B : y_{M+1} = · · · = yN = 0}, or
(b) ϕ(A ∩ ℳ) = {y = (y1, . . . , yN) ∈ B : y_{M+1} = · · · = yN = 0 and yM ≥ 0}.

It can be seen that (a) and (b) cannot hold at the same time. The points x for which (b) is verified make up the "boundary" of ℳ, which we denote by ∂ℳ. If ∂ℳ is empty, we are speaking of an M-manifold without boundary; otherwise, ℳ is sometimes said to be an M-manifold with boundary.

First, note that the boundary of a differentiable manifold is itself a differentiable manifold, of a lower dimension.

Theorem 10.38 The set ∂ℳ is an (M − 1)-manifold without boundary, i.e.,

∂(∂ℳ) = Ø.

Proof Taking a point x in ∂ℳ, there are an open neighborhood A of x, an open neighborhood B of 0 in R^N, and a diffeomorphism ϕ : A → B such that ϕ(x) = 0 and

ϕ(A ∩ ℳ) = {y = (y1, . . . , yN) ∈ B : y_{M+1} = · · · = yN = 0 and yM ≥ 0}.

Based on the fact that conditions (a) and (b) of the definition cannot hold simultaneously for any point of ℳ, it is possible to prove that

ϕ(A ∩ ∂ℳ) = {y = (y1, . . . , yN) ∈ B : yM = y_{M+1} = · · · = yN = 0}.

This completes the proof. ∎

There are many examples of manifolds: Circles, spheres, and toruses are manifolds without boundary. A hemisphere is a 2-manifold whose boundary is a circle. However, a cone is not a manifold, because of a single point, its vertex. Notice that any open set in R^N is an N-manifold (without boundary).

Let us now see that, given an M-manifold ℳ, corresponding to each of its points x it is possible to find a local M-parametrization.

Theorem 10.39 For every x ∈ ℳ there is a neighborhood A′ of x such that A′ ∩ ℳ can be M-parametrized with an injective function σ : I → R^N, where I is a rectangle of R^M of the type

I = [−α, α]^M  if x ∉ ∂ℳ,   I = [−α, α]^{M−1} × [0, α]  if x ∈ ∂ℳ,

and σ(0) = x. Moreover, if x is a point of the boundary ∂ℳ, the M-parametrization σ is such that the interior points of a single face of the rectangle I are sent on ∂ℳ.

Proof Consider the diffeomorphism ϕ : A → B given by the preceding definition, and take an α > 0 such that the rectangle B′ = [−α, α]^N is contained in B. Setting A′ = ϕ^{−1}(B′), we have that A′ is a neighborhood of x (indeed, the set B″ = ]−α, α[^N is open and, hence, also A″ = ϕ^{−1}(B″) is open, and x ∈ A″ ⊆ A′). We can then take the rectangle I as in the statement and define σ(u) = ϕ^{−1}(u, 0). It is readily seen that σ is injective and σ(I) = A′ ∩ ℳ. Moreover, ϕ_{(1,...,M)}(σ(u)) = u for every u ∈ I; hence, Jϕ_{(1,...,M)}(σ(u)) · Jσ(u) is the identity matrix, so that Jσ(u) has rank M for every u ∈ I. Finally, if x ∈ ∂ℳ, then

[−α, α]^{M−1} × {0} = σ^{−1}(A′ ∩ ∂ℳ),

thereby completing the proof. ∎

Notice that the function σ is indeed defined on an open set containing I, and it is injective there.

11 The Integral

In this chapter we extend the theory of the integral to functions of several variables defined on subsets of R^N, with values in R. For simplicity, in the exposition we will first focus our attention on the case N = 2 and later provide all the results in the case of a generic dimension N.

11.1 Integrability on Rectangles

We begin by considering the case of functions defined on rectangles. We recall that a "rectangle" of R^N is a set of the type [a1, b1] × · · · × [aN, bN]. In the following exposition, we concentrate for simplicity on the two-dimensional case. The general case is largely identical and does not involve greater difficulties, except for the notations. We consider the rectangle I = [a1, b1] × [a2, b2] ⊆ R² and define its measure

μ(I) = (b1 − a1)(b2 − a2).

As a particular case, given x = (x, y) ∈ I and r > 0, we have

B[x, r] = [x − r, x + r] × [y − r, y + r];

it is the square centered at x having r as half of the length of its sides. We say that two rectangles are "nonoverlapping" if their interiors are disjoint. A "tagged partition" of the rectangle I is a set

P̊ = {(x1, I1), (x2, I2), . . . , (xm, Im)},

where the Ij are nonoverlapping rectangles whose union is I and, for every j = 1, . . . , m, the point xj = (xj, yj) belongs to Ij.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Fonda, A Modern Introduction to Mathematical Analysis, https://doi.org/10.1007/978-3-031-23713-3_11

Example If I = [0, 10] × [0, 6], a possible tagged partition is the following:

P̊ = {((1, 1), [0, 7] × [0, 2]), ((0, 5), [0, 3] × [2, 6]), ((5, 4), [3, 10] × [4, 6]), ((10, 0), [7, 10] × [0, 4]), ((5, 3), [3, 7] × [2, 4])}.


Let us now consider a function f defined on the rectangle I, with values in R, and let P̊ = {(xj, Ij) : j = 1, . . . , m} be a tagged partition of I. We call the "Riemann sum" associated with f and P̊ the real number S(f, P̊) defined by

S(f, P̊) = Σ_{j=1}^{m} f(xj) μ(Ij).

Whenever f happens to be positive, this number is the sum of the volumes of the parallelepipeds having as base Ij and height [0, f(xj)].

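The tagged partition of [0, 10] × [0, 6] given above can be used to evaluate a concrete Riemann sum. The short script below (an illustration, not from the book) checks that the five nonoverlapping rectangles exactly exhaust μ(I) = 60 and computes S(f, P̊) for the sample function f(x, y) = x + y.

```python
# tagged partition of I = [0,10] x [0,6]; each rectangle is (a1, b1, a2, b2)
P = [((1, 1),  (0, 7, 0, 2)),
     ((0, 5),  (0, 3, 2, 6)),
     ((5, 4),  (3, 10, 4, 6)),
     ((10, 0), (7, 10, 0, 4)),
     ((5, 3),  (3, 7, 2, 4))]

def mu(rect):
    a1, b1, a2, b2 = rect
    return (b1 - a1) * (b2 - a2)

# the areas of the nonoverlapping rectangles add up to mu(I) = 10 * 6
assert sum(mu(rect) for _, rect in P) == 60

def riemann_sum(f, tagged):
    # S(f, P) = sum over j of f(x_j) * mu(I_j)
    return sum(f(*x) * mu(rect) for x, rect in tagged)

S = riemann_sum(lambda x, y: x + y, P)
assert S == 2*14 + 5*12 + 9*14 + 10*12 + 8*8   # = 398
print("S(f, P) =", S)
```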


We call a "gauge" on I every positive function δ : I → R. Given a gauge δ on I, we say that the tagged partition P̊ introduced previously is "δ-fine" if, for every j = 1, . . . , m,

Ij ⊆ B[xj, δ(xj)].

Example Let I = [0, 1] × [0, 1] and δ be the gauge defined as follows:

δ(x, y) = (x + y)/3  if (x, y) ≠ (0, 0),   δ(x, y) = 1/2  if (x, y) = (0, 0).

We want to find a δ-fine tagged partition of I. Much like what we saw at the end of Sect. 7.2, in this case one of the points xj necessarily must coincide with (0, 0). We can then choose, for example,

P̊ = {((0, 0), [0, 1/2] × [0, 1/2]), ((1/2, 1), [0, 1/2] × [1/2, 1]), ((1, 1/2), [1/2, 1] × [0, 1/2]), ((1, 1), [1/2, 1] × [1/2, 1])}.

It is interesting to observe that it is not always possible to construct δ-fine tagged partitions by only taking points on the edges of I. The reader may become convinced when attempting to do this using the following gauge:

δ(x, y) = (x + y)/16  if (x, y) ≠ (0, 0),   δ(x, y) = 1/2  if (x, y) = (0, 0).
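The claimed δ-fineness of the partition in the first example can be verified mechanically. The following script (an illustration, not part of the text) checks Ij ⊆ B[xj, δ(xj)] for each of the four tagged rectangles.

```python
def delta(x, y):
    # the first gauge from the example above
    return 0.5 if (x, y) == (0.0, 0.0) else (x + y) / 3.0

# tagged partition: (tag, rectangle (a1, b1, a2, b2))
P = [((0.0, 0.0), (0.0, 0.5, 0.0, 0.5)),
     ((0.5, 1.0), (0.0, 0.5, 0.5, 1.0)),
     ((1.0, 0.5), (0.5, 1.0, 0.0, 0.5)),
     ((1.0, 1.0), (0.5, 1.0, 0.5, 1.0))]

def contained(rect, centre, r):
    # is rect a subset of B[centre, r] = [cx-r, cx+r] x [cy-r, cy+r]?
    a1, b1, a2, b2 = rect
    cx, cy = centre
    return cx - r <= a1 and b1 <= cx + r and cy - r <= a2 and b2 <= cy + r

for x, rect in P:
    assert contained(rect, x, delta(*x)), (x, rect)

print("the tagged partition is delta-fine")
```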

As in the one-dimensional case, one can prove that for every gauge δ on I there exists a δ-fine tagged partition of I (see Cousin's Theorem 7.1). The following definition is identical to the one given in Chap. 7.

Definition 11.1 A function f : I → R is said to be "integrable" (on the rectangle I) if there is a real number J with the following property: Given ε > 0, it is possible to find a gauge δ on I such that, for every δ-fine tagged partition P̊ of I,

|S(f, P̊) − J| ≤ ε.

We briefly overview all the properties that can be obtained from the given definition in the same way as was done in the case of a function of a single variable. First of all, there is at most one J ∈ R that verifies the conditions of the definition. Such a number is called the "integral" of f on I and is denoted by one of the following symbols:

∫_I f,  ∫_I f(x) dx,  ∫_I f(x, y) dx dy.

The set of integrable functions is a real vector space, and the integral is a linear function on it:

∫_I (f + g) = ∫_I f + ∫_I g,  ∫_I (αf) = α ∫_I f

(with α ∈ R). It preserves the order:

f ≤ g  ⟹  ∫_I f ≤ ∫_I g.

The Cauchy criterion of integrability holds.

Theorem 11.2 (Cauchy Criterion) A function f : I → R is integrable if and only if for every ε > 0 there is a gauge δ : I → R such that, taking two δ-fine tagged partitions P̊, Q̊ of I, we have

|S(f, P̊) − S(f, Q̊)| ≤ ε.

Moreover, we have the following property of "additivity on subrectangles."

Theorem 11.3 Let f : I → R be a function and K1, K2, . . . , Kl be nonoverlapping subrectangles of I whose union is I. Then f is integrable on I if and only if it is integrable on each of the Ki. In that case, we have

∫_I f = Σ_{i=1}^{l} ∫_{Ki} f.

In particular, if a function is integrable on a rectangle, it is still integrable on every subrectangle. The proof of the theorem is similar to that of Theorem 7.18 and is based on the possibility of constructing a gauge that allows us to split the Riemann sums on I into Riemann sums on the single subrectangles.

We say that an integrable function on I is "R-integrable" there (or integrable according to Riemann) if, among all possible gauges δ that verify the definition of integrability, it is always possible to choose one that is constant on I. The set of R-integrable functions is a vector subspace of the space of integrable functions and contains the subspace of continuous functions.

We say that an integrable function f : I → R is "L-integrable" (or integrable according to Lebesgue) if |f| is integrable on I as well. The L-integrable functions make up a vector subspace of the space of integrable functions. If f and g are two L-integrable functions on I, then the functions min{f, g} and max{f, g} are L-integrable on I, too. A function f is L-integrable if and only if both its positive part f+ = max{f, 0} and its negative part f− = max{−f, 0} are integrable.

The Saks–Henstock Theorem 9.1, the Monotone Convergence Theorem 9.10, and the Dominated Convergence Theorem 9.13 extend to the integrable functions on a rectangle, with statements and proofs perfectly analogous to those provided in Chap. 9.

11.2 Integrability on a Bounded Set

We will now provide the definition of the integral on an arbitrary bounded domain. Given a bounded set E and a function f : E → R, we define the function fE : R^N → R as follows:

fE(x) = f(x)  if x ∈ E,   fE(x) = 0  if x ∉ E.

We are thus led to the following definition.

Definition 11.4 Given a bounded set E, we say that the function f : E → R is "integrable" (on E) if there is a rectangle I containing the set E on which fE is integrable. In that case, we set

∫_E f = ∫_I fE.

To verify the consistency of the preceding definition, we will now show that when f is integrable on E, we have that fE is integrable on any rectangle containing the set E, and the integral of fE remains the same on each such rectangle.

Proposition 11.5 Let I and J be two rectangles containing the set E. Then fE is integrable on I if and only if it is integrable on J. In that case, we have ∫_I fE = ∫_J fE.

Proof We consider for simplicity the case N = 2. Assume that fE is integrable on I. Let K be a rectangle containing both I and J. We can construct some nonoverlapping rectangles K1, . . . , Kr, also nonoverlapping with I, such that I ∪ K1 ∪ · · · ∪ Kr = K. We now prove that fE is integrable on each of the subrectangles K1, . . . , Kr and that the integrals ∫_{K1} fE, . . . , ∫_{Kr} fE are all equal to zero. Notice that fE, restricted to each of these subrectangles, is zero everywhere except perhaps on one of their edges. We are thus led to prove the following lemma, which will permit us to conclude the proof.


Lemma 11.6 Let K be a rectangle and g : K → R be a function which is zero everywhere except perhaps on one edge of K. Then g is integrable on K and ∫_K g = 0.

Proof We first assume that the function g is bounded on K, i.e., that there is a constant C > 0 for which

|g(x, y)| ≤ C

for every (x, y) ∈ K. Fix ε > 0. Let L be the edge of the rectangle K on which g may be nonzero, and denote by ℓ its length. Define the constant gauge δ = ε/(Cℓ). Then, for every δ-fine tagged partition P̊ = {(x1, I1), . . . , (xm, Im)} of K, the rectangles Ij tagged by points of L lie in a strip of width δ along L, so that

|S(g, P̊)| ≤ Σ_{j=1}^{m} |g(xj)| μ(Ij) = Σ_{j : xj ∈ L} |g(xj)| μ(Ij) ≤ C Σ_{j : xj ∈ L} μ(Ij) ≤ C δ ℓ = ε.

This proves that g is integrable on K and ∫_K g = 0 in the case where g is bounded on K.

If g is not bounded on K, assume first that it has nonnegative values. Define the sequence (gn)n of functions as

gn(x) = min{g(x), n}.

Since the functions gn are bounded, for what we saw earlier we have ∫_K gn = 0 for every n. It is easily seen that the sequence thus defined satisfies the conditions of the Monotone Convergence Theorem 9.10 and converges pointwise to g. It then follows that g is integrable on K and

∫_K g = lim_n ∫_K gn = 0.

If g does not have only nonnegative values, it is always possible to consider g+ and g−. From the preceding discussion, ∫_K g+ = ∫_K g− = 0, and then ∫_K g = ∫_K g+ − ∫_K g− = 0, which is what we wanted to prove. ∎

End of Proof of Proposition 11.5 Having proved that fE is integrable on each of the K1, . . . , Kr and that the integrals ∫_{K1} fE, . . . , ∫_{Kr} fE are equal to zero, by the theorem of additivity on subrectangles we have that, since fE is integrable on I, it is also integrable on K, and

∫_K fE = ∫_I fE + ∫_{K1} fE + · · · + ∫_{Kr} fE = ∫_I fE.

But then fE is integrable on every subrectangle of K, and in particular on J. We can now construct, analogously to what was just done for I, some nonoverlapping rectangles J1, . . . , Js, also nonoverlapping with J, such that J ∪ J1 ∪ · · · ∪ Js = K. Similarly, we will have

∫_K fE = ∫_J fE + ∫_{J1} fE + · · · + ∫_{Js} fE = ∫_J fE,

which proves that ∫_I fE = ∫_J fE. To see that the condition is necessary and sufficient, simply exchange the roles of I and J in the foregoing proof. ∎

With the given definition, all the properties of the integral seen earlier easily extend to this setting. There is an exception concerning additivity, since it is not true in general that a function that is integrable on a bounded set remains integrable on any of its subsets. Indeed, take a function f : E → R which is integrable but not L-integrable. We consider the subset

E′ = {x ∈ E : f(x) ≥ 0},

and we claim that f cannot be integrable on E′. If it were, then f+ would be integrable on E. But then f− = f+ − f would also be integrable on E, and therefore f would be L-integrable on E, in contradiction with the assumption. We will see that, with respect to additivity, the L-integrable functions have a somewhat better behavior.

11.3 The Measure

We now give the definition of "measure" for a bounded subset of R^N.

Definition 11.7 A bounded set E is said to be "measurable" if the constant function 1 is integrable on E. In that case, the number ∫_E 1 is said to be the "measure" of E and is denoted by μ(E).

The measure of a measurable set is thus a nonnegative number. The empty set is assumed to be measurable, and its measure is equal to 0. In the case of a subset of R², its measure is also called the "area" of the set. If E = [a1, b1] × [a2, b2] is a rectangle, it is easily seen that

μ(E) = ∫_E 1 = (b1 − a1)(b2 − a2),

so that the notation is in accordance with the one already introduced for rectangles. For a subset of R³, the measure is also called the "volume" of the set.


Let us analyze some properties of the measure. It is useful to introduce the characteristic function of a set E, defined by

χE(x) = 1  if x ∈ E,   χE(x) = 0  if x ∉ E.

If I is a rectangle containing the set E, we thus have

μ(E) = ∫_I χE.

Proposition 11.8 Let A and B be two measurable bounded sets. The following properties hold:

(a) If A ⊆ B, then B \ A is measurable, and

μ(B \ A) = μ(B) − μ(A);

in particular, μ(A) ≤ μ(B).

(b) A ∪ B and A ∩ B are measurable, and

μ(A ∪ B) + μ(A ∩ B) = μ(A) + μ(B);

in particular, if A and B are disjoint, then μ(A ∪ B) = μ(A) + μ(B).

Proof Let I be a rectangle containing A ∪ B. If A ⊆ B, then χ_{B\A} = χB − χA, and property (a) follows by integrating on I. Since χ_{A∪B} = max{χA, χB} and χ_{A∩B} = min{χA, χB}, we have that χ_{A∪B} and χ_{A∩B} are integrable on I. Moreover,

χ_{A∪B} + χ_{A∩B} = χA + χB,

and integrating on I, we have (b). ∎
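Property (b) can be illustrated with two overlapping rectangles, for which all four measures are elementary products of side lengths (a hypothetical check, not from the book):

```python
# A = [0,2] x [0,1] and B = [1,3] x [0,1]; union and intersection are rectangles too
mu_A = (2 - 0) * (1 - 0)          # 2
mu_B = (3 - 1) * (1 - 0)          # 2
mu_union = (3 - 0) * (1 - 0)      # A u B = [0,3] x [0,1], measure 3
mu_inter = (2 - 1) * (1 - 0)      # A n B = [1,2] x [0,1], measure 1

# mu(A u B) + mu(A n B) = mu(A) + mu(B)
assert mu_union + mu_inter == mu_A + mu_B   # 3 + 1 == 2 + 2
print("mu(A u B) + mu(A n B) =", mu_union + mu_inter)
```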

The following proposition states the complete additivity property of the measure.

Proposition 11.9 If (Ak)k≥1 is a sequence of measurable bounded sets whose union A = ∪k≥1 Ak is bounded, then A is measurable, and

μ(A) ≤ Σ_{k=1}^{∞} μ(Ak).

If the sets Ak are pairwise disjoint, then equality holds.


Proof Assume first that the sets Ak are pairwise disjoint. Let I be a rectangle containing their union A. Then, for every x ∈ I,

χA(x) = Σ_{k=1}^{∞} χ_{Ak}(x).

Moreover, since for every positive integer q we have

Σ_{k=1}^{q} μ(Ak) = μ(∪_{k=1}^{q} Ak) ≤ μ(I),

the series Σ_{k=1}^{∞} μ(Ak) converges. By Corollary 9.11, we have that A is measurable and

μ(A) = ∫_I χA = ∫_I Σ_{k=1}^{∞} χ_{Ak} = Σ_{k=1}^{∞} ∫_I χ_{Ak} = Σ_{k=1}^{∞} μ(Ak).

When the sets Ak are not pairwise disjoint, consider the sets B1 = A1, B2 = A2 \ A1 and, in general, Bk = Ak \ (A1 ∪ · · · ∪ Ak−1). The sets Bk are measurable and pairwise disjoint, and ∪k≥1 Bk = ∪k≥1 Ak. The conclusion then follows from what was proved earlier. ∎

We have a similar proposition concerning the intersection of a countable family of sets.

Proposition 11.10 If (Ak)k≥1 is a sequence of measurable bounded sets, their intersection A = ∩k≥1 Ak is a measurable set.

Proof Let I be a rectangle containing the set A. Then, since A = ∩k≥1 (Ak ∩ I), we have that

∩_{k≥1} Ak = I \ ∪_{k≥1} (I \ (Ak ∩ I)),

and the conclusion follows from the two previous propositions. ∎
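Complete additivity (Proposition 11.9) can be seen at work on a simple family: the open strips Ak = ]2^{−k}, 2^{−k+1}[ × ]0, 1[ are pairwise disjoint, their union differs from ]0, 1[² only by countably many negligible segments, and the partial sums of Σ μ(Ak) approach μ(]0, 1[²) = 1 (an illustrative script, not from the book):

```python
# A_k = ]2^-k, 2^-(k-1)[ x ]0, 1[, so mu(A_k) = 2^-k; the series sums to 1
partial = 0.0
for k in range(1, 60):
    mu_k = 2.0 ** (-k)
    partial += mu_k

assert abs(partial - 1.0) < 1e-12   # sum over k >= 1 of 2^-k = 1
print("partial sum of mu(A_k) up to k = 59:", partial)
```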



The following two propositions will provide us with a large class of measurable sets.

Proposition 11.11 Every open and bounded set is measurable.

Proof Consider for simplicity the case N = 2. Let A be an open set contained in a rectangle I. We divide the rectangle I into four rectangles of equal areas using the axes of its edges. Then we proceed analogously with each of these four rectangles, thereby obtaining 16 smaller rectangles, and so on. Since A is open, for every x ∈ A there is a small rectangle among those just constructed that contains x and is contained in A. In this way, it is seen that the set A is covered by a countable family of rectangles; being the union of a countable family of measurable sets, it is therefore measurable. ∎

Proposition 11.12 Every compact set is measurable.

Proof Let B be a compact set, and let I be a rectangle whose interior I̊ contains B. Since I̊ and I̊ \ B are open and, hence, measurable, we have that B = I̊ \ (I̊ \ B) is measurable. ∎

Example The set

E = {(x, y) ∈ R² : 1 < x² + y² ≤ 4}

is measurable, since it is the difference of the closed disks with radius 2 and 1 centered at the origin, i.e.,

E = {(x, y) ∈ R² : x² + y² ≤ 4} \ {(x, y) ∈ R² : x² + y² ≤ 1}.
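The dyadic subdivision used in the proof of Proposition 11.11 can be turned into a crude area algorithm (an illustrative sketch with hypothetical choices, not from the book): counting the dyadic squares of side 2^{−n} entirely contained in the open unit disk gives an increasing lower approximation of its measure π.

```python
import math

def inner_area(n):
    # total area of the dyadic squares of side 2^-n that lie inside the open
    # unit disk, working within the bounding square [-1, 1] x [-1, 1]
    side = 2.0 ** (-n)
    cells = 2 ** (n + 1)             # cells per axis in [-1, 1]
    count = 0
    for i in range(cells):
        for j in range(cells):
            x0, y0 = -1 + i * side, -1 + j * side
            # the square lies in the open disk iff its farthest corner does
            cx = max(abs(x0), abs(x0 + side))
            cy = max(abs(y0), abs(y0 + side))
            if cx * cx + cy * cy < 1.0:
                count += 1
    return count * side * side

a4, a6 = inner_area(4), inner_area(6)
assert 0 < a4 <= a6 < math.pi        # approximations increase toward pi from below
print(a4, a6)
```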

11.4 Negligible Sets

Definition 11.13 We say that a bounded set is "negligible" if it is measurable and its measure is equal to zero.

Every set made of a single point is negligible. Consequently, all finite or countable bounded sets are negligible. The edge of a rectangle in R² is a negligible set, as shown by Lemma 11.6. By the complete additivity of the measure, the union of any sequence of negligible sets, if it is bounded, is always a negligible set.

Theorem 11.14 If E is a bounded set and f : E → R is equal to zero except on a negligible set, then f is integrable on E and ∫_E f = 0.

Proof Let T be the negligible set on which f is different from zero. Assume first that the function f is bounded, i.e., that there is a constant C > 0 such that

|f(x)| ≤ C


for every x ∈ E. We consider a rectangle I containing E and prove that ∫_I fE = 0. Fix ε > 0. Since T has zero measure, there is a gauge δ such that, for every δ-fine tagged partition P̊ = {(xj, Ij) : j = 1, . . . , m} of I,

S(χT, P̊) = Σ_{j : xj ∈ T} μ(Ij) ≤ ε/C,

so that

|S(fE, P̊)| ≤ Σ_{j : xj ∈ T} |f(xj)| μ(Ij) ≤ C Σ_{j : xj ∈ T} μ(Ij) ≤ ε.

Hence, if f is bounded, it is integrable on E and ∫_E f = 0.

If f is not bounded, assume first that it has nonnegative values. Define a sequence of functions fn : E → R as

fn(x) = min{f(x), n}.

Since the functions fn are bounded and zero except on T, for what we just saw they are integrable on E, with ∫_E fn = 0 for every n. It is easily seen that the defined sequence satisfies the conditions of the Monotone Convergence Theorem 9.10 and converges pointwise to f. Hence, f is integrable on E, and

∫_E f = lim_n ∫_E fn = 0.

If f does not have nonnegative values, it is sufficient to consider f+ and f− and apply to them what was said earlier. ∎

Here is a counterpart of the foregoing result.

Theorem 11.15 If f : E → R is an integrable function on a bounded set E, having nonnegative values, with ∫_E f = 0, then f is equal to zero except on a negligible set.

To prove this, we need the following Chebyshev inequality.

Lemma 11.16 Let E be a bounded set and f : E → R an integrable function with nonnegative values. Then, for every r > 0, the set

Er = {x ∈ E : f(x) > r}

is measurable, and

μ(Er) ≤ (1/r) ∫_E f.
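Chebyshev's inequality can be illustrated on E = [0, 1] × [0, 1] with f(x, y) = x, where everything is computable by hand: ∫_E f = 1/2 and Er = ]r, 1] × [0, 1] has measure 1 − r (a hypothetical check, not from the book):

```python
integral_f = 0.5            # integral of f(x, y) = x over the unit square

for k in range(1, 100):
    r = k / 100.0
    mu_Er = 1.0 - r         # measure of {(x, y) : x > r} in the unit square
    # Chebyshev: mu(Er) <= (1/r) * integral of f
    assert mu_Er <= integral_f / r + 1e-12

print("Chebyshev's inequality holds for all sampled r in ]0, 1[")
```

The inequality here reduces to r(1 − r) ≤ 1/2, which holds since r(1 − r) ≤ 1/4 on [0, 1].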

Proof Let I be a rectangle containing E. Once we have fixed r > 0, we define the functions fn : I → R as

fn(x) = min{1, n · max{fE(x) − r, 0}}.

These make up an increasing sequence of L-integrable functions that converges pointwise to χ_{Er}. Clearly,

0 ≤ fn(x) ≤ 1  for every n and every x ∈ I.

The Monotone Convergence Theorem 9.10 guarantees that χ_{Er} is integrable on I, i.e., that Er is measurable. Since, for every x ∈ E, we have r χ_{Er}(x) ≤ f(x), integrating both sides of this inequality we obtain the inequality we are looking for. ∎

Proof of Theorem 11.15 Using the Chebyshev inequality, we have that, for every positive integer k,

μ(E_{1/k}) ≤ k ∫_E f = 0.

Hence, every E_{1/k} is negligible, and since their union is just the set where f is different from zero, we have the conclusion. ∎

Definition 11.17 Let E be a bounded set. We say that a proposition is true "almost everywhere" on E (or for almost every point of E) if the set of points for which it is false is negligible.

The results just proved have the following simple consequence.

Corollary 11.18 If two functions f and g, defined on the bounded set E, are equal almost everywhere, then f is integrable on E if and only if g is. In that case, ∫_E f = ∫_E g.

Proof In such a case, the function g − f is equal to zero almost everywhere, hence ∫_E (g − f) = 0. Then

∫_E f = ∫_E f + ∫_E (g − f) = ∫_E (f + (g − f)) = ∫_E g,

thereby completing the proof. ∎




This last corollary permits us to consider some functions that are defined only almost everywhere and to define their integral.

Definition 11.19 A function f, defined almost everywhere on E, with real values, is said to be "integrable" on E if it can be extended to an integrable function g : E → R. In this case, we set ∫_E f = ∫_E g.

The preceding definition is consistent, since the integral does not depend on the particular extension. It can be seen that all the properties and theorems seen till now remain true for such functions. The reader is invited to verify this.

11.5 A Characterization of Measurable Bounded Sets

The following covering lemma will be useful in what follows.

Lemma 11.20 Let E be a set contained in a rectangle I, and let δ be a gauge on E. Then there is a finite or countable family of nonoverlapping rectangles Jk, contained in I, whose union covers the set E, with the following property: In each of the sets Jk there is a point xk belonging to E such that Jk ⊆ B[xk, δ(xk)].

Proof We consider for simplicity the case N = 2. Let us divide the rectangle I into four rectangles, having the same areas, by the axes of its edges. We proceed analogously with each of these four rectangles, obtaining 16 smaller rectangles, and so on. We thus obtain a countable family of smaller and smaller rectangles. For every point x of E we can choose one of these rectangles that contains x and is itself contained in B[x, δ(x)]. These rectangles would satisfy the properties of the statement if they were nonoverlapping. In order that the sets Jk be nonoverlapping, it is necessary to choose them carefully, and here is how to do it. We first choose those rectangles from the first four-rectangle partition, if there are any, that contain a point xk belonging to E such that Jk ⊆ B[xk, δ(xk)]; once this choice has been made, we eliminate all the smaller rectangles contained in them. We then consider the 16-rectangle partition and, among the rectangles that remained after the first elimination procedure, we choose those, if there are any, that contain a point xk belonging to E such that Jk ⊆ B[xk, δ(xk)]; once this choice has been made, we eliminate all the smaller rectangles contained in them; and so on. ∎

Remark 11.21 Note that if, in the assumptions of the covering lemma, it happens that E is contained in an open set that itself is contained in I, then all the rectangles Jk can be chosen so that they are all contained in that open set.

We can now prove a characterization of measurable bounded sets. In the following statement, the words in square brackets may be omitted.

11 The Integral

Proposition 11.22 Let $E$ be a bounded set, contained in a rectangle $I$. The following three propositions are equivalent:

(i) The set $E$ is measurable.

(ii) For every $\varepsilon > 0$ there are two finite or countable families $(J_k')$ and $(J_k'')$, each made of [nonoverlapping] rectangles contained in $I$, such that
$$E \subseteq \bigcup_k J_k'\,, \qquad I \setminus E \subseteq \bigcup_k J_k''\,, \qquad \text{and} \qquad \mu\Big(\Big(\bigcup_k J_k'\Big) \cap \Big(\bigcup_k J_k''\Big)\Big) \le \varepsilon\,.$$

(iii) There are two sequences $(E_n')_{n \ge 1}$ and $(E_n'')_{n \ge 1}$ of measurable bounded subsets such that
$$E_n' \subseteq E \subseteq E_n''\,, \qquad \lim_n \big(\mu(E_n'') - \mu(E_n')\big) = 0\,.$$

In that case, we have
$$\mu(E) = \lim_n \mu(E_n') = \lim_n \mu(E_n'')\,.$$

Proof Let us first prove that (i) implies (ii). Assume that $E$ is measurable, and fix $\varepsilon > 0$. By the Saks–Henstock Theorem 9.3, there is a gauge $\delta$ on $I$ such that, for every $\delta$-fine tagged subpartition $\mathring P = \{(x_j, K_j) : j = 1, \dots, m\}$ of $I$,
$$\Big|\sum_{j=1}^m \Big(\chi_E(x_j)\,\mu(K_j) - \int_{K_j} \chi_E\Big)\Big| \le \frac{\varepsilon}{2}\,.$$
By Lemma 11.20, there is a family of nonoverlapping rectangles $J_k$, contained in $I$, whose union covers $E$, and in each $J_k$ there is a point $x_k$ belonging to $E$ such that $J_k \subseteq B[x_k, \delta(x_k)]$. Let us fix a positive integer $N$ and consider only $(x_1, J_1), \dots, (x_N, J_N)$. They make up a $\delta$-fine tagged subpartition of $I$. From the preceding inequality we then deduce that
$$\Big|\sum_{k=1}^N \Big(\mu(J_k) - \int_{J_k} \chi_E\Big)\Big| \le \frac{\varepsilon}{2}\,,$$
whence
$$\sum_{k=1}^N \mu(J_k) \le \sum_{k=1}^N \int_{J_k} \chi_E + \frac{\varepsilon}{2} \le \int_I \chi_E + \frac{\varepsilon}{2} = \mu(E) + \frac{\varepsilon}{2}\,.$$


Since this holds for every positive integer $N$, we have thus constructed a family $(J_k')$ of nonoverlapping rectangles such that
$$E \subseteq \bigcup_k J_k'\,, \qquad \sum_k \mu(J_k') \le \mu(E) + \frac{\varepsilon}{2}\,.$$

Consider now $I \setminus E$, which is also measurable. We can repeat the same procedure that we just followed, replacing $E$ with $I \setminus E$, thereby finding a family $(J_k'')$ of nonoverlapping rectangles, contained in $I$, such that
$$I \setminus E \subseteq \bigcup_k J_k''\,, \qquad \sum_k \mu(J_k'') \le \mu(I \setminus E) + \frac{\varepsilon}{2}\,.$$

Consequently,
$$I \setminus \bigcup_k J_k'' \subseteq E \subseteq \bigcup_k J_k'\,,$$
and hence
$$\mu\Big(\Big(\bigcup_k J_k'\Big) \cap \Big(\bigcup_k J_k''\Big)\Big) = \mu\Big(\bigcup_k J_k' \setminus \Big(I \setminus \bigcup_k J_k''\Big)\Big) \le \mu\Big(\bigcup_k J_k'\Big) - \mu\Big(I \setminus \bigcup_k J_k''\Big)$$
$$= \mu\Big(\bigcup_k J_k'\Big) - \mu(I) + \mu\Big(\bigcup_k J_k''\Big) \le \mu(E) + \frac{\varepsilon}{2} - \mu(I) + \mu(I \setminus E) + \frac{\varepsilon}{2} = \varepsilon\,,$$
and the implication is thus proved.

Taking $\varepsilon = \frac{1}{n}$, it is easy to see that (ii) implies (iii).

Let us prove now that (iii) implies (i). Consider the measurable sets
$$\widetilde E' = \bigcup_{n \ge 1} E_n'\,, \qquad \widetilde E'' = \bigcap_{n \ge 1} E_n''\,,$$
for which it must be that
$$\widetilde E' \subseteq E \subseteq \widetilde E''\,, \qquad \mu(\widetilde E') = \mu(\widetilde E'')\,.$$
Equivalently, we have
$$\chi_{\widetilde E'} \le \chi_E \le \chi_{\widetilde E''}\,, \qquad \int_I (\chi_{\widetilde E''} - \chi_{\widetilde E'}) = 0\,,$$
so that $\chi_{\widetilde E'} = \chi_E = \chi_{\widetilde E''}$ almost everywhere. Then $E$ is measurable and $\mu(E) = \mu(\widetilde E') = \mu(\widetilde E'')$. Moreover,
$$0 \le \lim_n\,[\mu(E) - \mu(E_n')] \le \lim_n\,[\mu(E_n'') - \mu(E_n')] = 0\,,$$
hence $\mu(E) = \lim_n \mu(E_n')$. Analogously, we see that $\mu(E) = \lim_n \mu(E_n'')$, and the proof is thus completed. $\square$

Proposition 11.23 Let $E$ be a bounded set. Then $E$ is negligible if and only if, for every $\varepsilon > 0$, there is a finite or countable family $(J_k)$ of [nonoverlapping] rectangles such that
$$E \subseteq \bigcup_k J_k\,, \qquad \sum_k \mu(J_k) \le \varepsilon\,.$$

Proof The necessary condition is proved in the first part of the previous proposition. Let us prove the sufficiency. Having fixed $\varepsilon > 0$, assume there exists a family $(J_k')$ with the given properties, and let $I$ be a rectangle containing $E$. On the other hand, consider a family $(J_k'')$ whose elements all coincide with $I$. The conditions of the previous proposition are then satisfied, so that $E$ is indeed measurable. Then
$$\mu(E) \le \mu\Big(\bigcup_k J_k'\Big) \le \sum_k \mu(J_k') \le \varepsilon\,;$$
since $\varepsilon$ is arbitrary, it must be that $\mu(E) = 0$. $\square$

Remark 11.24 Observe that if $E$ is contained in an open set that is itself contained in a rectangle $I$, then all the rectangles $J_k$ can be chosen in such a way that they are all contained in that open set.

As a consequence of the preceding proposition, it is not difficult to prove the following corollary.

Corollary 11.25 If $I_{N-1}$ is a rectangle in $\mathbb{R}^{N-1}$ and $T$ is a negligible subset of $\mathbb{R}$, then $I_{N-1} \times T$ is negligible in $\mathbb{R}^N$.


Proof Fix $\varepsilon > 0$ and, according to Proposition 11.23, let $(J_k)$ be a finite or countable family of intervals in $\mathbb{R}$ such that
$$T \subseteq \bigcup_k J_k\,, \qquad \sum_k \mu(J_k) \le \frac{\varepsilon}{\mu(I_{N-1})}\,.$$
Defining the rectangles $\widetilde J_k = I_{N-1} \times J_k$, we have that
$$I_{N-1} \times T \subseteq \bigcup_k \widetilde J_k\,, \qquad \sum_k \mu(\widetilde J_k) = \mu(I_{N-1}) \sum_k \mu(J_k) \le \mu(I_{N-1})\,\frac{\varepsilon}{\mu(I_{N-1})} = \varepsilon\,,$$
and Proposition 11.23 applies. $\square$

11.6 Continuous Functions and L-Integrable Functions

We begin this section by showing that continuous functions are L-integrable on compact sets.

Theorem 11.26 Let $E \subseteq \mathbb{R}^N$ be a compact set and $f : E \to \mathbb{R}$ a continuous function. Then $f$ is L-integrable on $E$.

Proof We consider for simplicity the case $N = 2$. Since $f$ is continuous on a compact set, there is a constant $C > 0$ such that
$$|f(x)| \le C\,, \qquad \text{for every } x \in E\,.$$

Let $I$ be a rectangle containing $E$. First we divide $I$ into four rectangles by tracing the segments joining the midpoints of its edges; we denote these subrectangles by $U_{1,1}, U_{1,2}, U_{1,3}, U_{1,4}$. We now divide each of these rectangles again in the same way, thereby obtaining 16 smaller subrectangles, which we denote by $U_{2,1}, U_{2,2}, \dots, U_{2,16}$. Proceeding in this way, for every $n$ we will have a subdivision of the rectangle $I$ into $2^{2n}$ small rectangles $U_{n,j}$, with $j = 1, \dots, 2^{2n}$. Whenever $E$ has a nonempty intersection with $\mathring U_{n,j}$, we choose and fix a point $x_{n,j} \in E \cap \mathring U_{n,j}$. Define now the function $f_n$ in the following way:

• If $E \cap \mathring U_{n,j}$ is nonempty, then $f_n$ is constant on $\mathring U_{n,j}$ with value $f(x_{n,j})$.
• If $E \cap \mathring U_{n,j}$ is empty, then $f_n$ is constant on $\mathring U_{n,j}$ with value $0$.

The functions $f_n$ are thus defined almost everywhere on $I$: they are undefined only on the points of the grid made up of the previously constructed segments, which form a countable family of negligible sets. The functions $f_n$ are integrable on each subrectangle $U_{n,j}$, since they are constant in its interior. By the property of additivity on subrectangles, these functions are therefore integrable on $I$. Moreover,
$$|f_n(x)| \le C\,, \qquad \text{for almost every } x \in I \text{ and every } n \ge 1\,.$$

Let us see now that $(f_n)_n$ converges pointwise almost everywhere to $f_E$. Indeed, taking a point $x \in I$ not belonging to the grid, for every $n$ there is a $j = j(n)$ for which $x \in \mathring U_{n,j(n)}$. We have two possibilities:

(a) $x \notin E$; in this case, since $E$ is closed, for $n$ sufficiently large $\mathring U_{n,j(n)}$ (whose dimensions tend to zero as $n \to \infty$) will have an empty intersection with $E$, and then $f_n(x) = 0 = f_E(x)$.

(b) $x \in E$; in this case, if $n \to +\infty$, we have that $x_{n,j(n)} \to x$ (again using the fact that $\mathring U_{n,j(n)}$ has dimensions tending to zero). By the continuity of $f$, we have that
$$f_n(x) = f(x_{n,j(n)}) \to f(x) = f_E(x)\,.$$

The Dominated Convergence Theorem 9.13 then yields the conclusion. $\square$

We now see that L-integrability is conserved on measurable subsets.

Theorem 11.27 Let $f : E \to \mathbb{R}$ be an L-integrable function on a bounded set $E$. Then $f$ is L-integrable on every measurable subset of $E$.

Proof Assume first that $f$ has nonnegative values. Let $S$ be a measurable subset of $E$, and define on $E$ the functions $f_n = \min\{f, n\chi_S\}$. They form an increasing sequence of L-integrable functions, since both $f$ and $n\chi_S$ are L-integrable, and the sequence converges pointwise to $f_S$. Moreover, we have
$$\int_E f_n \le \int_E f$$
for every $n$. The Monotone Convergence Theorem 9.10 then guarantees that $f$ is integrable on $S$ in this case. In the general case, since $f$ is L-integrable, both $f^+$ and $f^-$ are L-integrable on $E$. Hence, based on the preceding discussion, they are both L-integrable on $S$, and then $f$ is, too. $\square$

Let us now prove the complete additivity property of the integral for L-integrable functions. We will say that two measurable bounded subsets are "nonoverlapping" if their intersection is a negligible set.


Theorem 11.28 Let $(E_k)$ be a finite or countable family of measurable nonoverlapping sets whose union is a bounded set $E$. Then $f$ is L-integrable on $E$ if and only if the following two conditions hold:

(a) $f$ is L-integrable on each $E_k$.

(b) $\sum_k \int_{E_k} |f(x)|\,dx < +\infty$.

In that case, we have
$$\int_E f = \sum_k \int_{E_k} f\,.$$

Proof Observe that
$$f(x) = \sum_k f_{E_k}(x)\,, \qquad |f(x)| = \sum_k |f_{E_k}(x)|$$
for almost every $x \in E$. If $f$ is L-integrable on $E$, then (a) follows from the preceding theorem. Moreover, it is obvious that (b) holds whenever the sets $E_k$ are finite in number. If instead they are infinite, then for any fixed $n$ we have
$$\sum_{k=1}^n \int_{E_k} |f(x)|\,dx = \sum_{k=1}^n \int_E |f_{E_k}(x)|\,dx = \int_E \sum_{k=1}^n |f_{E_k}(x)|\,dx \le \int_E |f(x)|\,dx\,,$$
and (b) follows.

Assume now that (a) and (b) hold. If the sets $E_k$ are finite in number, it is sufficient to integrate on $E$ both sides of the equation $f = \sum_k f_{E_k}$. If instead they are infinite, assume first that $f$ has nonnegative values. In this case, Corollary 9.11, applied to the series $\sum_k f_{E_k}$, yields the conclusion. In the general case, it is sufficient to consider, as usual, the positive and negative parts of $f$. $\square$
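As a quick illustration of this additivity, here is a minimal numerical sketch; the splitting of $[0,1]$ into the pieces $E_k = [2^{-k-1}, 2^{-k}]$ and the integrand $f(x) = x$ are our own illustrative choices, not taken from the text.

```python
# Illustration of complete additivity: E = [0,1], E_k = [2^{-k-1}, 2^{-k}],
# f(x) = x.  Each piece's integral is computed in closed form, and the series
# of the pieces' integrals should converge to the full integral ∫_0^1 x dx = 1/2.

def piece_integral(k: int) -> float:
    a, b = 2.0 ** (-k - 1), 2.0 ** (-k)
    return (b * b - a * a) / 2.0  # ∫_a^b x dx

total = sum(piece_integral(k) for k in range(60))
print(total)  # ≈ 0.5
```

The pieces are nonoverlapping up to the negligible set $\{0\} \cup \{2^{-k}\}$, exactly as the theorem allows.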

11.7 Limits and Derivatives under the Integration Sign

Let $X$ be a metric space, let $Y$ be a bounded subset of $\mathbb{R}^N$, and consider a function $f : X \times Y \to \mathbb{R}$. (For simplicity, we may think of $X$ and $Y$ as subsets of $\mathbb{R}$.) The first question we want to address is when the formula
$$\lim_{x \to x_0} \int_Y f(x, y)\,dy = \int_Y \Big(\lim_{x \to x_0} f(x, y)\Big)\,dy$$
holds. What follows is a generalization of the Dominated Convergence Theorem 9.13.


Theorem 11.29 Let $x_0$ be an accumulation point of $X$, and let the following assumptions hold true:

(a) For every $x \in X \setminus \{x_0\}$ the function $f(x,\cdot)$ is integrable on $Y$, so that we can define the function
$$F(x) = \int_Y f(x, y)\,dy\,.$$

(b) For almost every $y \in Y$ the limit $\lim_{x \to x_0} f(x, y)$ exists and is finite, so that we can define almost everywhere the function
$$\eta(y) = \lim_{x \to x_0} f(x, y)\,.$$

(c) There are two integrable functions $g, h : Y \to \mathbb{R}$ such that
$$g(y) \le f(x, y) \le h(y)$$
for every $x \in X \setminus \{x_0\}$ and almost every $y \in Y$.

Then $\eta$ is integrable on $Y$, and we have
$$\lim_{x \to x_0} F(x) = \int_Y \eta(y)\,dy\,.$$

Proof Let us take a sequence $(x_n)_n$ in $X \setminus \{x_0\}$ that tends to $x_0$. Define, for every $n$, the functions $f_n : Y \to \mathbb{R}$ by $f_n(y) = f(x_n, y)$. The assumptions allow us to apply the Dominated Convergence Theorem 9.13, so that
$$\lim_n F(x_n) = \lim_n \int_Y f_n(y)\,dy = \int_Y \Big(\lim_n f_n(y)\Big)\,dy = \int_Y \eta(y)\,dy\,.$$
The conclusion then follows from the characterization of the limit by the use of sequences (Proposition 4.3). $\square$

We have the following consequence of the above theorem.

Corollary 11.30 If $X$ is a subset of $\mathbb{R}^M$, $Y \subseteq \mathbb{R}^N$ is compact, and $f : X \times Y \to \mathbb{R}$ is continuous, then the function $F : X \to \mathbb{R}$, defined by
$$F(x) = \int_Y f(x, y)\,dy\,,$$
is continuous.


Proof The function $F$ is well defined, since $f(x,\cdot)$ is continuous on the compact set $Y$. Let us fix $x_0 \in X$ and prove that $F$ is continuous at $x_0$. By the continuity of $f$,
$$\eta(y) = \lim_{x \to x_0} f(x, y) = f(x_0, y)\,.$$
Moreover, given a compact neighborhood $U$ of $x_0$, there is a constant $C > 0$ such that $|f(x, y)| \le C$ for every $(x, y) \in U \times Y$. The previous theorem can then be applied, and we have
$$\lim_{x \to x_0} F(x) = \int_Y f(x_0, y)\,dy = F(x_0)\,,$$
thereby proving that $F$ is continuous at $x_0$. $\square$

Now let $X$ be a subset of $\mathbb{R}$. The second question we want to address is when the formula
$$\frac{d}{dx} \int_Y f(x, y)\,dy = \int_Y \frac{\partial f}{\partial x}(x, y)\,dy$$
holds. Here is an answer.

Theorem 11.31 (Leibniz Rule) Let $X$ be an interval in $\mathbb{R}$ containing $x_0$, and let the following assumptions hold true:

(a) For every $x \in X$ the function $f(x,\cdot)$ is integrable on $Y$, so that we can define the function
$$F(x) = \int_Y f(x, y)\,dy\,.$$

(b) For every $x \in X$ and almost every $y \in Y$ the partial derivative $\frac{\partial f}{\partial x}(x, y)$ exists.

(c) There are two integrable functions $g, h : Y \to \mathbb{R}$ such that
$$g(y) \le \frac{\partial f}{\partial x}(x, y) \le h(y)$$
for every $x \in X$ and almost every $y \in Y$.

Then the function $\frac{\partial f}{\partial x}(x_0,\cdot)$, defined almost everywhere on $Y$, is integrable there, the derivative of $F$ at $x_0$ exists, and we have
$$F'(x_0) = \int_Y \frac{\partial f}{\partial x}(x_0, y)\,dy\,.$$


Proof We define, for $x \in X$ different from $x_0$, the function
$$\psi(x, y) = \frac{f(x, y) - f(x_0, y)}{x - x_0}\,.$$
For every $x \in X \setminus \{x_0\}$ the function $\psi(x,\cdot)$ is integrable on $Y$. Moreover, for almost every $y \in Y$ we have
$$\lim_{x \to x_0} \psi(x, y) = \frac{\partial f}{\partial x}(x_0, y)\,.$$
By the Lagrange Mean Value Theorem 6.11, for $(x, y)$ as previously there is a $\xi \in X$ between $x_0$ and $x$ such that
$$\psi(x, y) = \frac{\partial f}{\partial x}(\xi, y)\,.$$
By assumption (c), we then have
$$g(y) \le \psi(x, y) \le h(y)$$
for every $x \in X \setminus \{x_0\}$ and almost every $y \in Y$. By the previous theorem, we can conclude that the function $\frac{\partial f}{\partial x}(x_0,\cdot)$, defined almost everywhere on $Y$, is integrable there, and
$$\lim_{x \to x_0} \int_Y \psi(x, y)\,dy = \int_Y \frac{\partial f}{\partial x}(x_0, y)\,dy\,.$$
On the other hand,
$$\lim_{x \to x_0} \int_Y \psi(x, y)\,dy = \lim_{x \to x_0} \int_Y \frac{f(x, y) - f(x_0, y)}{x - x_0}\,dy = \lim_{x \to x_0} \frac{1}{x - x_0} \left( \int_Y f(x, y)\,dy - \int_Y f(x_0, y)\,dy \right) = \lim_{x \to x_0} \frac{F(x) - F(x_0)}{x - x_0}\,,$$
so that $F$ is differentiable at $x_0$, and the conclusion holds. $\square$

Corollary 11.32 If $X$ is an interval in $\mathbb{R}$, $Y$ is a compact subset of $\mathbb{R}^N$, and the function $f : X \times Y \to \mathbb{R}$ is continuous and has a continuous partial derivative $\frac{\partial f}{\partial x} : X \times Y \to \mathbb{R}$, then the function $F : X \to \mathbb{R}$, defined by
$$F(x) = \int_Y f(x, y)\,dy\,,$$
is differentiable with a continuous derivative.


Proof The function $F$ is well defined, since $f(x,\cdot)$ is continuous on the compact set $Y$. Taking a point $x_0 \in X$ and a nontrivial compact interval $U \subseteq X$ containing it, there is a constant $C > 0$ such that $|\frac{\partial f}{\partial x}(x, y)| \le C$ for every $(x, y) \in U \times Y$. By the preceding theorem, $F$ is differentiable at $x_0$. The same argument holds replacing $x_0$ with any $x \in X$, and
$$F'(x) = \int_Y \frac{\partial f}{\partial x}(x, y)\,dy\,.$$
The continuity of $F' : X \to \mathbb{R}$ now follows from Corollary 11.30. $\square$

Example Consider, for $x \ge 0$, the function
$$f(x, y) = \frac{e^{-x^2(y^2+1)}}{y^2+1}\,.$$
We want to determine whether the corresponding function $F(x) = \int_0^1 f(x, y)\,dy$ is differentiable and, in that case, to find its derivative. We have that
$$\frac{\partial f}{\partial x}(x, y) = -2x\,e^{-x^2(y^2+1)}\,,$$
which, for $y \in [0, 1]$ and $x \ge 0$, is such that
$$-\sqrt{\tfrac{2}{e}} \le -2x\,e^{-x^2} \le -2x\,e^{-x^2(y^2+1)} \le 0\,.$$
We can then apply the Leibniz rule, so that
$$F'(x) = -2x \int_0^1 e^{-x^2(y^2+1)}\,dy\,.$$

Let us make a digression, so as to present an elegant formula. By the change of variable $t = xy$, we have
$$-2x \int_0^1 e^{-x^2(y^2+1)}\,dy = -2\,e^{-x^2} \int_0^x e^{-t^2}\,dt = -\frac{d}{dx}\left(\int_0^x e^{-t^2}\,dt\right)^{\!2}.$$
Taking into account that $F(0) = \pi/4$, we have
$$F(x) = \frac{\pi}{4} - \left(\int_0^x e^{-t^2}\,dt\right)^{\!2}.$$


We would now like to pass to the limit for $x \to +\infty$. Since, for $x \ge 0$, we have
$$0 \le \frac{e^{-x^2(y^2+1)}}{y^2+1} \le 1\,,$$
we can pass to the limit under the integration sign, thereby obtaining
$$\lim_{x \to +\infty} \int_0^1 \frac{e^{-x^2(y^2+1)}}{y^2+1}\,dy = \int_0^1 \Big(\lim_{x \to +\infty} \frac{e^{-x^2(y^2+1)}}{y^2+1}\Big)\,dy = 0\,.$$
Hence,
$$\left(\int_0^{+\infty} e^{-t^2}\,dt\right)^{\!2} = \frac{\pi}{4}$$
and, by symmetry,
$$\int_{-\infty}^{+\infty} e^{-t^2}\,dt = \sqrt{\pi}\,,$$
which is a very useful formula in various applications.
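These identities are easy to check numerically. The sketch below is only an illustration: it uses nothing beyond the standard library, namely `math.erf` (for which $\int_0^x e^{-t^2}\,dt = \tfrac{\sqrt{\pi}}{2}\operatorname{erf}(x)$) and a plain midpoint rule of our own choosing for $F$.

```python
import math

def F(x: float, n: int = 2000) -> float:
    """Midpoint-rule approximation of F(x) = ∫_0^1 e^{-x²(y²+1)}/(y²+1) dy."""
    h = 1.0 / n
    return sum(math.exp(-x * x * (y * y + 1)) / (y * y + 1)
               for y in ((i + 0.5) * h for i in range(n))) * h

def gauss_0_to_x(x: float) -> float:
    """∫_0^x e^{-t²} dt, expressed through the error function."""
    return math.sqrt(math.pi) / 2 * math.erf(x)

x = 1.3
print(F(x), math.pi / 4 - gauss_0_to_x(x) ** 2)  # the two values agree
print(2 * gauss_0_to_x(20), math.sqrt(math.pi))  # ∫_{-∞}^{+∞} e^{-t²} dt = √π
```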

11.8 Reduction Formula

In this section we will prove a fundamental result that permits us to compute the integral of a function of several variables by an iterative process of integration of functions of a single variable. It will be useful to recall some notation: for any fixed $x$ we will denote by $f(x,\cdot)$ the function $y \mapsto f(x, y)$; similarly, for any fixed $y$ we will denote by $f(\cdot, y)$ the function $x \mapsto f(x, y)$. Before stating the main theorem, it will be useful to first prove a preliminary result.

Proposition 11.33 Let $f : I \to \mathbb{R}$ be an integrable function on the rectangle $I = [a_1, b_1] \times [a_2, b_2]$. Then, for almost every $x \in [a_1, b_1]$, the function $f(x,\cdot)$ is integrable on $[a_2, b_2]$.

Proof Let $T \subseteq [a_1, b_1]$ be the set of those $x \in [a_1, b_1]$ for which $f(x,\cdot)$ is not integrable on $[a_2, b_2]$. Let us prove that $T$ is a negligible set. For each $x \in T$, the Cauchy condition does not hold. Hence, if we define the sets
$$T_n = \left\{ x : \begin{array}{l} \text{for every gauge } \delta_2 \text{ on } [a_2, b_2] \text{ there are two } \delta_2\text{-fine tagged} \\ \text{partitions } \mathring P_2 \text{ and } \mathring Q_2 \text{ of } [a_2, b_2] \text{ such that} \\ S(f(x,\cdot), \mathring P_2) - S(f(x,\cdot), \mathring Q_2) > \frac{1}{n} \end{array} \right\},$$


we have that each $x \in T$ belongs to $T_n$ if $n$ is sufficiently large. Thus $T$ is the union of all the $T_n$, and if we prove that each $T_n$ is negligible, then by the properties of the measure $T$ will be negligible, too. To do so, let us consider a certain $T_n$ and fix $\varepsilon > 0$. Since $f$ is integrable on $I$, there is a gauge $\delta$ on $I$ such that, given two $\delta$-fine tagged partitions $\mathring P$ and $\mathring Q$ of $I$, we have
$$|S(f, \mathring P) - S(f, \mathring Q)| \le \frac{\varepsilon}{n}\,.$$
The gauge $\delta$ on $I$ determines, for every $x \in [a_1, b_1]$, a gauge $\delta(x,\cdot)$ on $[a_2, b_2]$. We now associate to each $x \in [a_1, b_1]$ two $\delta(x,\cdot)$-fine tagged partitions $\mathring P_2^x$ and $\mathring Q_2^x$ of $[a_2, b_2]$ in the following way:

– If $x \in T_n$, we can choose $\mathring P_2^x$ and $\mathring Q_2^x$ such that
$$S(f(x,\cdot), \mathring P_2^x) - S(f(x,\cdot), \mathring Q_2^x) > \frac{1}{n}\,.$$
– If instead $x \notin T_n$, we take $\mathring P_2^x$ and $\mathring Q_2^x$ equal to each other.

Let us write the two tagged partitions $\mathring P_2^x$ and $\mathring Q_2^x$ thus determined:
$$\mathring P_2^x = \{(y_j^x, K_j^x) : j = 1, \dots, m^x\}\,, \qquad \mathring Q_2^x = \{(\tilde y_j^x, \widetilde K_j^x) : j = 1, \dots, \widetilde m^x\}\,.$$
We define a gauge $\delta_1$ on $[a_1, b_1]$, setting
$$\delta_1(x) = \min\{\delta(x, y_1^x), \dots, \delta(x, y_{m^x}^x), \delta(x, \tilde y_1^x), \dots, \delta(x, \tilde y_{\widetilde m^x}^x)\}\,.$$
Now let $\mathring P_1 = \{(x_i, J_i) : i = 1, \dots, k\}$ be a $\delta_1$-fine tagged partition of $[a_1, b_1]$. We want to prove that $S(\chi_{T_n}, \mathring P_1) \le \varepsilon$, i.e.,
$$\sum_{\{i \,:\, x_i \in T_n\}} \mu(J_i) \le \varepsilon\,.$$
To this end, define the following two tagged partitions of $I$, which make use of the elements of $\mathring P_1$:
$$\mathring P = \{((x_i, y_j^{x_i}), J_i \times K_j^{x_i}) : i = 1, \dots, k\,,\ j = 1, \dots, m^{x_i}\}\,,$$
$$\mathring Q = \{((x_i, \tilde y_j^{x_i}), J_i \times \widetilde K_j^{x_i}) : i = 1, \dots, k\,,\ j = 1, \dots, \widetilde m^{x_i}\}\,.$$
They are $\delta$-fine, and hence
$$|S(f, \mathring P) - S(f, \mathring Q)| \le \frac{\varepsilon}{n}\,.$$

i=1 j =1

i=1 j =1

 

 m ˜ xi   k  xi xi x  &xi )  = μ(Ji ) f (xi , yj )μ(Kj ) − f (xi , y˜j i )μ(K j  mxi

j =1

i=1

j =1

  k  * +  x x i i ˚ ) − S(f (xi , ·), Q ˚ )  μ(Ji ) S(f (xi , ·), P = 2 2   i=1



=

˚xi ) − S(f (xi , ·), Q ˚xi )] . μ(Ji )[S(f (xi , ·), P 2 2

{i : xi ∈Tn }

Recalling that ˚xi ) − S(f (xi , ·), Q ˚xi ) > 1 , S(f (xi , ·), P 2 2 n

.

we conclude that .

ε ˚ − S(f, Q)| ˚ > 1 ≥ |S(f, P) n n



μ(Ji ) ,

{i : xi ∈Tn }

˚1 ) ≤ ε, which is what we wanted to prove. All this shows that and hence .S(χTn , P the sets .Tn are negligible, and therefore T is negligible, too.  The following theorem, due to Guido Fubini, permits us to compute the integral of an integrable function of two variables by performing two integrations of functions of one variable. Theorem 11.34 (Reduction Theorem—I) Let .f : I → R be an integrable function on the rectangle .I = [a1 , b1 ] × [a2 , b2 ]. Then: (a) For almost every .x ∈ [a1 , b1 ] the function .f (x, ·) is integrable on .[a2 , b2 ]. .b (b) The function . a22 f (·, y) dy, defined almost everywhere on .[a1 , b1 ], is integrable there.

.

.

11.8 Reduction Formula

327

(c) We have

.

-

f =

.

I

b1

-

a1

b2

 f (x, y) dy dx .

a2

Proof We already proved .(a) in Proposition 11.33. Let us now prove .(b) and .(c). Let T be the negligible subset of .[a1 , b1 ] such that, for .x ∈ T , the function .f (x, ·) is not integrable on .[a2 , b2 ]. Since .T × [a2, b2 ] is negligible in .I, we can modify on that set the function f without changing the integrability properties. We can set, for example, .f = 0 on that set. In this way, we can assume without loss of generality that T is empty. Let us define F (x) =

b2

f (x, y) dy .

.

a2

We want to prove that F is integrable on .[a1 , b1 ] and that -

b1

.

a1

F =

f. I

Let .ε > 0 be fixed. Because of the integrability of f on .I, there is a gauge .δ on I ˚ of .I, such that, for every .δ-fine tagged partition .P  -    ε  ˚ . S(f, P) − f  ≤ .  2 I For every .x ∈ [a1 , b1 ], since .f (x, ·) is integrable on .[a2 , b2 ] with integral .F (x), there exists a gauge .δ¯x : [a2 , b2 ] → R such that, taking any .δ¯x -fine tagged partition ˚2 of .[a2 , b2 ], we have that .P ˚2 ) − F (x)| ≤ |S(f (x, ·), P

.

ε . 2(b1 − a1 )

We can assume that .δ¯x (y) ≤ δ(x, y) for every .(x, y) ∈ I . Then let us choose for ˚x of .[a2 , b2 ] and write it explicitly as every .x ∈ [a1 , b1 ] a .δ¯x -fine tagged partition .P 2 ˚x = {(y x , K x ) : j = 1, . . . , mx } . P 2 j j

.

We will thus have that, for every .x ∈ [a1 , b1 ], ˚x )| ≤ |F (x) − S(f (x, ·), P 2

.

ε . 2(b1 − a1 )

328

11 The Integral

We define a gauge .δ1 on .[a1 , b1 ] by setting x δ1 (x) = min{δ(x, y1x ), . . . , δ(x, ym x )} .

.

˚1 of .[a1 , b1 ], We will prove that, for every .δ1 -fine tagged partition .P  -    . S(F, P ˚1 ) − f  ≤ ε .   I

Let us take a .δ1 -fine tagged partition of .[a1 , b1 ], ˚1 = {(xi , Ji ) : i = 1, . . . , n} , P

.

and construct, starting from it, a .δ-fine tagged partition of I , ˚ = {((xi , y xi ), Ji × K xi ) : i = 1, . . . , n , j = 1, . . . , mxi } . P j j

.

We have the following inequalities:   -  -      ˚ + S(f, P) S(F, P ˚1 ) − f  ≤ |S(F, P ˚1 ) − S(f, P)| ˚ − f    

.

I

I

  mxi n    n ε xi xi   F (xi )μ(Ji ) − f (xi , yj )μ(Ji × Kj ) + ≤ 2 i=1 j =1

i=1

 n     ε xi xi  F (xi ) − f (x , y )μ(K ) i j j μ(Ji ) +  2 mxi



j =1

i=1



n  i=1

ε ε μ(Ji ) + = ε . 2(b1 − a1 ) 2

This proves that F is integrable on .[a1 , b1 ] and -

b1

.

a1

The proof is thus completed.

F =

f. I



11.8 Reduction Formula

329

Example Consider the function .f (x, y) = x 2 sin y on the rectangle .I = [−1, 1] × [0, π]. Since f is continuous on a compact set, it is integrable there, so that -

f =

.

I

=

-

1

 2

x sin y dy dx

−1

0

-

1 −1

π

x

2

[− cos y]π0

1



x3 dx = 2 x dx = 2 3 −1

1

2

−1

=

4 . 3

Clearly, the following version of the Fubini theorem holds, which is symmetric with respect to the preceding one. Theorem 11.35 (Reduction Theorem—II) Let .f : I → R be an integrable function on the rectangle .I = [a1 , b1 ] × [a2 , b2 ]. Then: (a) For almost every .y ∈ [a2 , b2 ] the function .f (·, y) is integrable on .[a1 , b1 ]. .b (b) The function . a11 f (x, ·) dx, defined almost everywhere on .[a2 , b2 ], is integrable there. .(c) We have

.

.

-

f =

.

I

b2

-



b1

f (x, y) dx dy . a2

a1

As an immediate consequence, we have that, if f is integrable on .I = [a1, b1 ] × [a2 , b2 ], then -

-

b1

b2

.

a1

 f (x, y) dy dx =

a2

b2 a2

-

b1

 f (x, y) dx dy .

a1

Therefore, if the preceding equality does not hold, then the function f is not integrable on .I. Examples Consider the function ⎧ ⎨

x2 − y2 if (x, y) = (0, 0) , .f (x, y) = (x 2 + y 2 )2 ⎩ 0 if (x, y) = (0, 0) , on the rectangle .I = [0, 1] × [0, 1]. If .x = 0, then we have .

0

1

y=1 x2 − y2 y 1 , dy = = (x 2 + y 2 )2 x 2 + y 2 y=0 x 2 + 1

330

11 The Integral

so that  - 1 x2 − y2 1 π dx = [arctan x]10 = . dy dx = 2 2 2 2 (x + y ) 4 0 x +1

1 - 1

.

0

0

Analogously, we see that 1 - 1

.

0

0

 x2 − y2 π dx dy = − , 2 2 2 (x + y ) 4

and we thus conclude that f is not integrable on .I. As a further example, consider the function f (x, y) =

.

⎧ ⎨ ⎩

xy if (x, y) = (0, 0) , + y 2 )2 0 if (x, y) = (0, 0) ,

(x 2

on the rectangle .I = [−1, 1] × [−1, 1]. In this case, if .x = 0, we have .

1 −1

y=1 xy −x dy = = 0, (x 2 + y 2 )2 2(x 2 + y 2 ) y=−1

so that .

1

-

−1

1 −1

 xy dy dx = 0 . (x 2 + y 2 )2

Analogously, we see that .

1 −1

-

1 −1

 xy dx dy = 0 . (x 2 + y 2 )2

Nevertheless, we are not allowed to conclude that f is integrable on .I. Actually, it is not at all. Indeed, if f were integrable, it should be on every subrectangle, and in particular on .[0, 1] × [0, 1]. But if .x = 0, then we have .

0

1

y=1 xy −x 1 , dy = = 2 2 2 2 2 (x + y ) 2(x + y ) y=0 2x(x 2 + 1)

which is not integrable with respect to x on .[0, 1]. When the function f is defined on a bounded subset E of .R2 , it is possible to state the reduction theorem for the function .fE . Let .I = [a1 , b1 ] × [a2 , b2 ] be a

11.8 Reduction Formula

331

rectangle containing .E. Let us define the “sections” of .E : Ex = {y ∈ [a2 , b2 ] : (x, y) ∈ E} ,

Ey = {x ∈ [a1 , b1 ] : (x, y) ∈ E} ,

.

and the “projections” of .E : P1 E = {x ∈ [a1 , b1 ] : Ex = Ø} ,

P2 E = {y ∈ [a2 , b2 ] : Ey = Ø} .

.

E Ex

x

P1 E

We can then reformulate the Fubini theorem in the following way. Theorem 11.36 (Reduction Theorem—III) Let .f : E → R be an integrable function on the bounded set .E. Then: (a) For almost every .x ∈ .P1 E the function .fE (x, ·) is integrable on the set .Ex . (b) The function .x → Ex f (x, y) dy, defined almost everywhere on .P1 E, is integrable there. .(c) We have

.

.

-

-

-

 f (x, y) dy dx .

f =

.

E

P1 E

Analogously, the function .y → .P2 E, is integrable there, and

. Ey

f (x, y) dx, defined almost everywhere on

-

-

-

Ex



f =

.

E

f (x, y) dx dy . P2 E

Ey

Example Consider the function .f (x, y) = |xy| on the set E = {(x, y) ∈ R2 : 0 ≤ x ≤ 1, −x 2 ≤ y ≤ x 2 } .

.

332

11 The Integral

Since f is continuous and E is compact, the theorem applies; we have .P1 E = [0, 1] and, for every .x ∈ P1 E, .Ex = [−x 2, x 2 ]. Hence: -

f =

.

E

1  - x2 −x 2

0

 |xy| dy dx =

1

|x|

0

y|y| 2

x 2 −x 2

-

1

dx =

x 5 dx =

0

x6 6

1 = 0

1 . 6

As a corollary, we have a method to compute the measure of a bounded measurable set. Corollary 11.37 If .E ⊆ R2 is a measurable bounded set, then: (a) For almost every .x ∈ P1 E the set .Ex is measurable. (b) The function .x → μ(Ex ), defined almost everywhere on .P1 E, is integrable there. .(c) We have .μ(E) = μ(Ex ) dx .

.

.

P1 E

Analogously, the function .y → μ(Ey ), defined almost everywhere on .P2 E, is integrable there, and μ(E) =

μ(Ey ) dy .

.

P2 E

Example Let us compute the area of a disk with radius .R > 0 : Let .E = {(x, y) ∈ R2 : x 2 + y 2 ≤ R 2 }. Since E is a compact set, √ it is measurable. We have that √ .P1 E = [−R, R] and, for every .x ∈ P1 E, .Ex = [− R 2 − x 2 , R 2 − x 2 ]. Hence: μ(E) =

.

R −R

2 R 2 − x 2 dx =

π/2 −π/2

2R 2 cos2 t dt

π/2

= R 2 [t + cos t sin t]−π/2 = πR 2 . In the case of functions of more than two variables, results analogous to the preceding ones hold true, with the same proofs. One simply needs to separate the variables into two different groups, calling x the first group and y the second one, and the same formulas hold. Example We want to compute the volume of a three-dimensional ball with radius R > 0. Let .E = {(x, y, z) ∈ R3 : x 2 + y 2 + z2 ≤ R 2 }. Let us group together the variables .(y, z) and consider the projection on the x-axis: .P1 E = [−R, R]. The

.

11.9 Change of Variables in the Integral

333

√ sections .Ex then are disks of radius . R 2 − x 2 , and we have -



x3 .μ(E) = π(R − x ) dx = π(2R ) − π 3 −R R

2

2

R

3

−R

=

4 πR 3 . 3

Another way to compute the same volume is to group the variables .(x, y) and consider .P1 E = {(x, y) : x 2 + y 2 = R 2 }. For every .(x, y) ∈ P1 E we have E(x,y)

.

  2 2 2 2 2 2 = − R −x −y , R −x −y ,

so that -

 2 R 2 − x 2 − y 2 dx dy

μ(E) =

.

P1 E

= =

R −R R

 - √R 2 −x 2   2 2 2 2 R − x − y dy dx √ -

−R



R 2 −x 2

π/2

 2(R − x ) cos t dt dx = 2

−π/2

2

R

2

−R

π(R 2 − x 2 ) dx =

4 πR 3 , 3

 √  by the change of variable .t = arcsin y/ R 2 − x 2 . Iterating the preceding reduction procedure, it is possible to prove, for a function of N variables that is integrable on a rectangle I = [a1, b1 ] × [a2, b2 ] × · · · × [aN , bN ] ,

.

formulas like -

f =

.

I

11.9

b1 a1

-

b2 a2

 ...

bN

  f (x1 , x2 , . . . , xN ) dxN . . . dx2 dx1 .

aN

Change of Variables in the Integral

In this section we look for an analogue to the formula of integration by substitution, which was proved in Chap. 7 for functions of a single variable. The proof of that formula was based on the Fundamental Theorem. Since we do not have such a powerful tool for functions of several variables, actually we will not be able to completely generalize that formula.

334

11 The Integral

For example, not only will the function .ϕ be assumed to be differentiable, but we will need it to be a diffeomorphism between two open sets A and B of .RN . In other words, .ϕ : A → B will be continuously differentiable and invertible, and −1 : B → A will be continuously differentiable as well. It is useful to recall .ϕ that, by Theorem 2.11, a diffeomorphism transforms open sets into open sets and closed sets into closed sets. Moreover, by Theorem 10.26, for every point .u ∈ A the Jacobian matrix .J ϕ(u) is invertible: We have .

det J ϕ(u) = 0 .

From now on, we will often use a different notation for the Jacobian matrix: instead of J ϕ(u) , we will write ϕ  (u) .

.

We will also need the following property. Lemma 11.38 Let .A ⊆ RN be an open set and .ϕ : A → RN a .C 1 -function; if S is a subset of A of the type S = [a1 , b1 ] × · · · × [aN−1 , bN−1 ] × {c} ,

.

then .ϕ(S) is negligible. Proof For simplicity, let us concentrate on the case of a subset of .R2 of the type S = [0, 1] × {0} .

.

For any positive integer n, consider the rectangles (actually squares) Jk,n =

.



k−1 k 1 1 , × − , , n n 2n 2n

with .k = 1, . . . , n. For n large enough, they are contained in a rectangle R, which itself is contained in .A. Since R is a compact set, there is a constant .C > 0 such that  .ϕ (u) ≤ C for every .u ∈ R. By the Mean Vale Theorem 10.20, .ϕ is “Lipschitz continuous” on R with Lipschitz constant C, i.e., ϕ(u) − ϕ(v) ≤ Cu − v ,

.

for every u, v ∈ R .

√ Since the sets .Jk,n have as diameter . n1 2, the sets .ϕ(Jk,n ) are surely contained in √ some squares .J˜k,n whose sides’ lengths are equal to . Cn 2. We then have that .ϕ(S) is covered by the rectangles .J˜k,n and

11.9 Change of Variables in the Integral n  .

335

μ(J˜k,n ) ≤ n



k=1

C√ 2 n

2 =

2C 2 . n

Since this quantity can be made arbitrarily small, the conclusion follows from Corollary 11.23.  As a consequence of the foregoing lemma, it is easy to see that the image of the boundary of a rectangle through a diffeomorphism .ϕ is a negligible set. In particular, given two nonoverlapping rectangles, their images are nonoverlapping sets. We are now ready to prove a first version of the change of variables formula in the integral, which will be generalized in a later section. Theorem 11.39 (Change of Variables Theorem—I) Let A and B be open subsets of .RN and .ϕ : A → B a diffeomorphism. If .f : B → R is a continuous function, then, for every compact subset D of .A, .

ϕ(D)

f (x ) d x =

D

f (ϕ(u)) | det ϕ  (u)| d u .

Proof Note first of all that the integrals in the formula are both meaningful, since the sets D and .ϕ(D) are compact and the considered functions continuous. We will proceed by induction on the dimension .N. Let us first consider the case .N = 1. First, using the method of integration by substitution, one verifies that the formula is true when D is a compact interval .[a, b]: It is sufficient to consider the two possible cases in which .ϕ is increasing or decreasing and recall that every continuous function is primitivable. For instance, if .ϕ is decreasing, then we have .ϕ([a, b]) = [ϕ(b), ϕ(a)], so that -

-

ϕ(a)

f (x) dx =

.

ϕ([a,b])

f (x) dx -

ϕ(b) a

= -

b

= =

f (ϕ(u))ϕ  (u) du

b

f (ϕ(u))|ϕ  (u)| du

a

[a,b]

f (ϕ(u))|ϕ  (u)| du .

˚ contains .D. Since both f Now let R be a compact subset of A whose interior .R and .(f ◦ ϕ)|ϕ  | are continuous, they are integrable on the compact sets .ϕ(R) and .R, ˚ and .R ˚ \ D can each be split into a countable union of respectively. The open sets .R nonoverlapping compact intervals whose images through .ϕ also are nonoverlapping close intervals. By the complete additivity of the integral, the formula holds true for

336

11 The Integral

˚ and .R ˚\ D : R

.

-

.

-

˚ ϕ(R)

f (x) dx =

˚ R

f (ϕ(u))|ϕ  (u)| du ,

-

˚ ϕ(R\D)

f (x) dx =

˚ R\D

f (ϕ(u))|ϕ  (u)| du .

Hence, -

f (x) dx =

.

ϕ(D)

= = -

˚ R\D)) ˚ ϕ(R\(

˚ ϕ(R)

˚ R

f (x) dx

f (x) dx −

˚ ϕ(R\D)

f (ϕ(u))|ϕ  (u)| du −

f (x) dx

˚ R\D

f (ϕ(u))|ϕ  (u)| du

f (ϕ(u))|ϕ  (u)| du ,

= D

so that the formula is proved in the case .N = 1. Assume now that the formula holds for the dimension .N, and let us prove that ¯ ∈ A, at least one of the partial it also holds for .N + 1.1 Once we fix a point .u ∂ϕi ¯ ) is different from zero. We can assume without loss of generality derivatives . ∂u ( u j N+1 ¯ ) = 0. Consider the function that it is . ∂ϕ ∂uN+1 (u

$$\alpha(u_1,\ldots,u_{N+1}) = (u_1,\ldots,u_N,\varphi_{N+1}(u_1,\ldots,u_{N+1})).$$

Since $\det\alpha'(\bar{\boldsymbol{u}}) = \frac{\partial\varphi_{N+1}}{\partial u_{N+1}}(\bar{\boldsymbol{u}}) \ne 0$, by Theorem 10.24 we have that $\alpha$ is a diffeomorphism between an open neighborhood U of $\bar{\boldsymbol{u}}$ and an open neighborhood V of $\alpha(\bar{\boldsymbol{u}})$. Assume first that D is contained in U, and set $\widetilde{D} = \alpha(D)$. We define on V the function $\beta = \varphi\circ\alpha^{-1}$, which is of the form

$$\beta(v_1,\ldots,v_{N+1}) = (\beta_1(v_1,\ldots,v_{N+1}),\ldots,\beta_N(v_1,\ldots,v_{N+1}),v_{N+1}),$$

where, for $j = 1,\ldots,N$, we have

$$\beta_j(v_1,\ldots,v_{N+1}) = \varphi_j\big(v_1,\ldots,v_N,[\varphi_{N+1}(v_1,\ldots,v_N,\cdot)]^{-1}(v_{N+1})\big).$$

Such a function $\beta$ is a diffeomorphism between the open sets V and $W = \varphi(U)$.

¹ At a first reading, it is advisable to consider the transition from $N = 1$ to $N+1 = 2$.

Consider the sections

$$V_t = \{(v_1,\ldots,v_N) : (v_1,\ldots,v_N,t)\in V\}$$

and the projection

$$P_{N+1}V = \{t : V_t \ne \emptyset\}.$$

For $t\in P_{N+1}V$, define the function

$$\beta_t(v_1,\ldots,v_N) = (\beta_1(v_1,\ldots,v_N,t),\ldots,\beta_N(v_1,\ldots,v_N,t)),$$

which happens to be a diffeomorphism defined on the open set $V_t$ whose image is the open set

$$W_t = \{(x_1,\ldots,x_N) : (x_1,\ldots,x_N,t)\in W\}.$$

Moreover, $\det\beta_t'(v_1,\ldots,v_N) = \det\beta'(v_1,\ldots,v_N,t)$. Consider also the sections

$$\widetilde{D}_t = \{(v_1,\ldots,v_N) : (v_1,\ldots,v_N,t)\in\widetilde{D}\}$$

and the projection

$$P_{N+1}\widetilde{D} = \{t : \widetilde{D}_t \ne \emptyset\}.$$

Analogously, we consider $\beta(\widetilde{D})_t$ and $P_{N+1}\beta(\widetilde{D})$. By the definition of $\beta$, we have

$$\beta(\widetilde{D})_t = \beta_t(\widetilde{D}_t), \qquad P_{N+1}\beta(\widetilde{D}) = P_{N+1}\widetilde{D}.$$

Using the Reduction Theorem 11.34 and the inductive assumption, we have

$$\int_{\beta(\widetilde{D})} f = \int_{P_{N+1}\beta(\widetilde{D})}\left(\int_{\beta_t(\widetilde{D}_t)} f(x_1,\ldots,x_N,t)\,dx_1\ldots dx_N\right)dt = \int_{P_{N+1}\widetilde{D}}\left(\int_{\widetilde{D}_t} f(\beta_t(v_1,\ldots,v_N),t)\,|\det\beta_t'(v_1,\ldots,v_N)|\,dv_1\ldots dv_N\right)dt = \int_{P_{N+1}\widetilde{D}}\left(\int_{\widetilde{D}_t} f(\beta(v_1,\ldots,v_N,t))\,|\det\beta'(v_1,\ldots,v_N,t)|\,dv_1\ldots dv_N\right)dt = \int_{\widetilde{D}} f(\beta(\boldsymbol{v}))\,|\det\beta'(\boldsymbol{v})|\,d\boldsymbol{v}.$$

Consider now the function $\widetilde{f} : V\to\mathbb{R}$ defined as

$$\widetilde{f}(\boldsymbol{v}) = f(\beta(\boldsymbol{v}))\,|\det\beta'(\boldsymbol{v})|.$$

Define the sections

$$D_{u_1,\ldots,u_N} = \{u_{N+1} : (u_1,\ldots,u_N,u_{N+1})\in D\}$$

and the projection

$$P_{1,\ldots,N}D = \{(u_1,\ldots,u_N) : D_{u_1,\ldots,u_N} \ne \emptyset\}.$$

In an analogous way we define $\alpha(D)_{u_1,\ldots,u_N}$ and $P_{1,\ldots,N}\alpha(D)$. They are all closed sets, and by the definition of $\alpha$, we have

$$\alpha(D)_{u_1,\ldots,u_N} = \varphi_{N+1}(u_1,\ldots,u_N,D_{u_1,\ldots,u_N}), \qquad P_{1,\ldots,N}\alpha(D) = P_{1,\ldots,N}D.$$

Moreover, for every $(u_1,\ldots,u_N)\in P_{1,\ldots,N}D$, the function defined by

$$t \mapsto \varphi_{N+1}(u_1,\ldots,u_N,t)$$

is a diffeomorphism of one variable between the open sets $U_{u_1,\ldots,u_N}$ and $V_{u_1,\ldots,u_N}$, sections of U and V, respectively. Using the Reduction Theorem 11.34 and the one-dimensional change of variables formula proved earlier, we have that

$$\int_{\alpha(D)} \widetilde{f} = \int_{P_{1,\ldots,N}\alpha(D)}\left(\int_{\alpha(D)_{u_1,\ldots,u_N}} \widetilde{f}(v_1,\ldots,v_{N+1})\,dv_{N+1}\right)dv_1\ldots dv_N = \int_{P_{1,\ldots,N}D}\left(\int_{\varphi_{N+1}(u_1,\ldots,u_N,D_{u_1,\ldots,u_N})} \widetilde{f}(v_1,\ldots,v_{N+1})\,dv_{N+1}\right)dv_1\ldots dv_N = \int_{P_{1,\ldots,N}D}\left(\int_{D_{u_1,\ldots,u_N}} \widetilde{f}(u_1,\ldots,u_N,\varphi_{N+1}(u_1,\ldots,u_{N+1}))\,\left|\frac{\partial\varphi_{N+1}}{\partial u_{N+1}}(u_1,\ldots,u_{N+1})\right|\,du_{N+1}\right)du_1\ldots du_N = \int_D \widetilde{f}(\alpha(\boldsymbol{u}))\,|\det\alpha'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

Hence, since $\varphi = \beta\circ\alpha$, we have

$$\int_{\varphi(D)} f(\boldsymbol{x})\,d\boldsymbol{x} = \int_{\beta(\widetilde{D})} f(\boldsymbol{x})\,d\boldsymbol{x} = \int_{\widetilde{D}} f(\beta(\boldsymbol{v}))\,|\det\beta'(\boldsymbol{v})|\,d\boldsymbol{v} = \int_{\alpha(D)} \widetilde{f}(\boldsymbol{v})\,d\boldsymbol{v} = \int_D \widetilde{f}(\alpha(\boldsymbol{u}))\,|\det\alpha'(\boldsymbol{u})|\,d\boldsymbol{u} = \int_D f(\beta(\alpha(\boldsymbol{u})))\,|\det\beta'(\alpha(\boldsymbol{u}))|\,|\det\alpha'(\boldsymbol{u})|\,d\boldsymbol{u} = \int_D f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

We have then proved that, for every $\boldsymbol{u}\in A$, there is a $\delta(\boldsymbol{u}) > 0$ such that the thesis holds true when D is contained in $B[\boldsymbol{u},\delta(\boldsymbol{u})]$. A gauge $\delta$ is thus defined on A. By Lemma 11.20, we can now cover A with a countable family $(J_k)_k$ of nonoverlapping rectangles, each contained in a rectangle of the type $B[\boldsymbol{u},\delta(\boldsymbol{u})]$, so that the formula holds for the closed sets contained in any of these rectangles.

At this point let us consider an arbitrary compact subset D of A. Then the formula holds for each $D\cap J_k$, and, by the complete additivity of the integral and the fact that the sets $\varphi(D\cap J_k)$ are nonoverlapping (as a consequence of Lemma 11.38), we have

$$\int_{\varphi(D)} f(\boldsymbol{x})\,d\boldsymbol{x} = \sum_k \int_{\varphi(D\cap J_k)} f(\boldsymbol{x})\,d\boldsymbol{x} = \sum_k \int_{D\cap J_k} f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u} = \int_D f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

The theorem is thus completely proved. ∎

Remark 11.40 The change of variables formula is often written, setting $\varphi(D) = E$, in the equivalent form

$$\int_E f(\boldsymbol{x})\,d\boldsymbol{x} = \int_{\varphi^{-1}(E)} f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

Example Consider the set

$$E = \{(x,y)\in\mathbb{R}^2 : -1\le x\le 1,\ x^2\le y\le x^2+1\},$$

and let $f(x,y) = x^2 y$ be a function on it. Defining $\varphi(u,v) = (u,v+u^2)$, we have a diffeomorphism with $\det\varphi'(u,v) = 1$. Since $\varphi^{-1}(E) = [-1,1]\times[0,1]$, by the change of variables formula and the use of the Fubini reduction theorem we have

$$\int_E x^2 y\,dx\,dy = \int_{-1}^{1}\left(\int_0^1 u^2(v+u^2)\,dv\right)du = \int_{-1}^{1}\left(\frac{u^2}{2}+u^4\right)du = \frac{11}{15}.$$
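This example lends itself to a quick numerical cross-check (an ad-hoc sketch, not part of the text; the helper `midpoint2d` is ours, not from the book). It approximates both sides of the change of variables formula and compares them with 11/15.

```python
# Sanity check for the example above: both sides of the change of variables
# formula for f(x, y) = x^2 * y with phi(u, v) = (u, v + u^2) should
# approximate 11/15.  midpoint2d is an ad-hoc composite midpoint rule.

def midpoint2d(g, a, b, c, d, n=400):
    """Midpoint-rule approximation of the integral of g over [a,b] x [c,d]."""
    hu, hv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * hu
        for j in range(n):
            v = c + (j + 0.5) * hv
            total += g(u, v)
    return total * hu * hv

# Right-hand side: (f o phi) |det phi'| = u^2 (v + u^2) over [-1,1] x [0,1].
rhs = midpoint2d(lambda u, v: u**2 * (v + u**2), -1.0, 1.0, 0.0, 1.0)

# Left-hand side: integrate f = x^2 y directly over E, section by section:
# for each x in [-1,1], the section in y is [x^2, x^2 + 1].
def lhs_direct(n=400):
    hx = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * hx
        hy = 1.0 / n
        for j in range(n):
            y = x * x + (j + 0.5) * hy
            total += x * x * y * hy
    return total * hx

lhs = lhs_direct()
```

Both values agree with 11/15 ≈ 0.7333 up to the discretization error of the midpoint rule.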


11.10 Change of Measure by Diffeomorphisms

In this section we study how a measure is changed by the action of a diffeomorphism.

Theorem 11.41 Let A and B be open subsets of $\mathbb{R}^N$, and let $\varphi : A\to B$ be a diffeomorphism. Let $D\subseteq A$ and $\varphi(D)\subseteq B$ be bounded sets. If D is measurable, then $\varphi(D)$ is measurable, $|\det\varphi'|$ is integrable on D, and

$$\mu(\varphi(D)) = \int_D |\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

Proof By the preceding theorem, the formula holds true whenever D is compact. Since every open set can be written as the union of a countable family of nonoverlapping (closed) rectangles, by the complete additivity and the fact that D is bounded, the formula holds true even if D is an open bounded set.

Assume now that D is a measurable bounded set whose closure $\overline{D}$ is contained in A. Let R be a compact subset of A whose interior $\mathring{R}$ contains $\overline{D}$. Then there is a constant $C > 0$ such that $|\det\varphi'(\boldsymbol{u})|\le C$ for every $\boldsymbol{u}\in R$. By Proposition 11.22, for every $\varepsilon > 0$ there are two finite or countable families $(J_k')$ and $(J_k'')$, each made of nonoverlapping rectangles contained in $\mathring{R}$, such that

$$\mathring{R}\setminus\bigcup_k J_k'' \subseteq D \subseteq \bigcup_k J_k', \qquad \mu\Big(\Big(\bigcup_k J_k'\Big)\cap\Big(\bigcup_k J_k''\Big)\Big) \le \varepsilon.$$

Since the formula to be proved holds on both the open bounded sets and the compact sets, it certainly holds on each rectangle $J_k'$ and $J_k''$; then it holds on $\cup_k J_k'$ and on $\cup_k J_k''$, and since it holds even on $\mathring{R}$, it must be true on $\mathring{R}\setminus(\cup_k J_k'')$ as well. Thus, we have that $\varphi(\cup_k J_k')$ and $\varphi(\mathring{R}\setminus(\cup_k J_k''))$ are measurable,

$$\varphi\Big(\mathring{R}\setminus\bigcup_k J_k''\Big) \subseteq \varphi(D) \subseteq \varphi\Big(\bigcup_k J_k'\Big),$$

and

$$\mu\Big(\varphi\Big(\bigcup_k J_k'\Big)\Big) - \mu\Big(\varphi\Big(\mathring{R}\setminus\bigcup_k J_k''\Big)\Big) = \int_{\cup_k J_k'} |\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u} - \int_{\mathring{R}\setminus(\cup_k J_k'')} |\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u} = \int_{(\cup_k J_k')\cap(\cup_k J_k'')} |\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u} \le C\,\mu\Big(\Big(\bigcup_k J_k'\Big)\cap\Big(\bigcup_k J_k''\Big)\Big) \le C\varepsilon.$$

Taking $\varepsilon = \frac1n$, we find in this way two sequences $D_n' = \cup_k J_{k,n}'$ and $D_n'' = \mathring{R}\setminus(\cup_k J_{k,n}'')$ with the aforementioned properties. By Proposition 11.22, we have that $\varphi(D)$ is measurable and $\mu(\varphi(D)) = \lim_n\mu(\varphi(D_n')) = \lim_n\mu(\varphi(D_n''))$. Moreover, since $\chi_{D_n''}$ converges almost everywhere to $\chi_D$, by the Dominated Convergence Theorem 9.13, we have that

$$\mu(\varphi(D)) = \lim_n \mu(\varphi(D_n'')) = \lim_n \int_{D_n''} |\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u} = \lim_n \int_R |\det\varphi'(\boldsymbol{u})|\,\chi_{D_n''}(\boldsymbol{u})\,d\boldsymbol{u} = \int_R |\det\varphi'(\boldsymbol{u})|\,\chi_D(\boldsymbol{u})\,d\boldsymbol{u} = \int_D |\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

We can now consider the case of an arbitrary measurable bounded set D in A. Since D is bounded, there is an open ball $B(0,\rho)$ containing it. Let $A' = A\cap B(0,\rho)$ and $B' = \varphi(A')$. Since $A'$ is open and bounded, as in the proof of Proposition 11.11, we can consider a sequence of nonoverlapping rectangles $(K_n)_n$ whose union is equal to $A'$. The formula holds for each of the sets $D\cap K_n$, by the foregoing considerations. The complete additivity of the integral (Theorem 11.28) and the fact that $A'$ is bounded then lead us to our conclusion. ∎

Example Consider the set

$$E = \{(x,y)\in\mathbb{R}^2 : x < y < 2x,\ 3x^2 < y < 4x^2\}.$$

We see that E is measurable since it is an open set. Taking

$$\varphi(u,v) = \left(\frac{u}{v},\frac{u^2}{v}\right),$$

we have a diffeomorphism between the set $D = \,]1,2[\,\times\,]3,4[$ and $E = \varphi(D)$. Moreover,

$$\det\varphi'(u,v) = \det\begin{pmatrix} 1/v & -u/v^2 \\ 2u/v & -u^2/v^2 \end{pmatrix} = \frac{u^2}{v^3}.$$

Applying the formula on the change of measure and the Reduction Theorem 11.34, we have that

$$\mu(E) = \int_1^2\left(\int_3^4 \frac{u^2}{v^3}\,dv\right)du = \int_1^2 \frac{7}{288}\,u^2\,du = \frac{49}{864}.$$
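As a sanity check (an ad-hoc sketch, not part of the text), one can compare the measure computed through the Jacobian with a direct section-by-section computation of the area of E; the helper names below are ours.

```python
# Jacobian side: integral of u^2 / v^3 over ]1,2[ x ]3,4[ (midpoint rule).
def area_via_jacobian(n=500):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = 1.0 + (i + 0.5) * h
        for j in range(n):
            v = 3.0 + (j + 0.5) * h
            total += u**2 / v**3
    return total * h * h

# Direct side: for each x, the section of E in y is
# ]max(x, 3x^2), min(2x, 4x^2)[ , nonempty only for x in ]1/4, 2/3[.
def area_direct(n=20000):
    a, b = 0.25, 2.0 / 3.0
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        length = min(2 * x, 4 * x * x) - max(x, 3 * x * x)
        total += max(0.0, length) * h
    return total

jac = area_via_jacobian()
direct = area_direct()
```

Both agree with 49/864 ≈ 0.05671.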

11.11 The General Theorem on Change of Variables

We are now interested in generalizing the Change of Variables Theorem 11.39, assuming f is not necessarily continuous but only L-integrable on a measurable set. To do this, it will be useful to prove the following important relationship between the integral of a function having nonnegative values and the measure of its hypograph.

Proposition 11.42 Let E be a measurable bounded set and $f : E\to\mathbb{R}$ a bounded function with nonnegative values. Let $G_f$ be the set thus defined:

$$G_f = \{(\boldsymbol{x},t)\in E\times\mathbb{R} : 0\le t\le f(\boldsymbol{x})\}.$$

Then f is integrable on E if and only if $G_f$ is measurable, in which case

$$\mu(G_f) = \int_E f.$$

Proof Assume first that $G_f$ is measurable. By Fubini's Reduction Theorem 11.36, since $P_1 G_f = E$, the sections being $(G_f)_{\boldsymbol{x}} = [0,f(\boldsymbol{x})]$, we have that the function $\boldsymbol{x}\mapsto\int_0^{f(\boldsymbol{x})} 1 = f(\boldsymbol{x})$ is integrable on E and

$$\mu(G_f) = \int_{G_f} 1 = \int_E\left(\int_0^{f(\boldsymbol{x})} 1\,dt\right)d\boldsymbol{x} = \int_E f(\boldsymbol{x})\,d\boldsymbol{x}.$$

Assume now that f is integrable on E. Let $C > 0$ be a constant such that $0\le f(\boldsymbol{x}) < C$ for every $\boldsymbol{x}\in E$. Given a positive integer n, we divide the interval $[0,C]$ into n equal parts and consider, for $j = 1,\ldots,n$, the sets

$$E_n^j = \left\{\boldsymbol{x}\in E : \frac{j-1}{n}\,C \le f(\boldsymbol{x}) < \frac{j}{n}\,C\right\};$$

as a consequence of Lemma 11.16, they are measurable and nonoverlapping, and their union is E. We can then define on E the function $\psi_n$ in the following way:

$$\psi_n = \sum_{j=1}^n \frac{j}{n}\,C\,\chi_{E_n^j},$$

and so

$$G_{\psi_n} = \bigcup_{j=1}^n \left(E_n^j\times\left[0,\frac{j}{n}\,C\right]\right).$$

By Proposition 11.22, it is easy to see that, since the sets $E_n^j$ are measurable, the sets $E_n^j\times\big[0,\frac{j}{n}C\big]$ are, too. Consequently, the sets $G_{\psi_n}$ are measurable. Moreover, since

$$G_f = \bigcap_{n\ge 1} G_{\psi_n},$$

even $G_f$ is measurable, and the proof is thus completed. ∎

We are now in a position to prove the second version of the theorem on the change of variables in the integral.

Theorem 11.43 (Change of Variables Theorem—II) Let A and B be open subsets of $\mathbb{R}^N$, and let $\varphi : A\to B$ be a diffeomorphism. Let $D\subseteq A$ and $\varphi(D)\subseteq B$ be measurable bounded sets and $f : \varphi(D)\to\mathbb{R}$ a function. Then f is L-integrable on $\varphi(D)$ if and only if $(f\circ\varphi)\,|\det\varphi'|$ is L-integrable on D, in which case

$$\int_{\varphi(D)} f(\boldsymbol{x})\,d\boldsymbol{x} = \int_D f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

Proof Assume that f is L-integrable on $E = \varphi(D)$. We first consider the case where f is bounded with nonnegative values. Let $C > 0$ be such that $0\le f(\boldsymbol{x}) < C$ for every $\boldsymbol{x}\in E$. We define the open sets

$$\widetilde{A} = A\times\,]-C,C[\,, \qquad \widetilde{B} = B\times\,]-C,C[$$

and the function $\widetilde{\varphi} : \widetilde{A}\to\widetilde{B}$ in the following way:

$$\widetilde{\varphi}(u_1,\ldots,u_N,t) = (\varphi_1(u_1,\ldots,u_N),\ldots,\varphi_N(u_1,\ldots,u_N),t).$$

This function is a diffeomorphism, and $\det\widetilde{\varphi}'(\boldsymbol{u},t) = \det\varphi'(\boldsymbol{u})$ for every $(\boldsymbol{u},t)\in\widetilde{A}$. Let $G_f$ be the hypograph of f:

$$G_f = \{(\boldsymbol{x},t)\in E\times\mathbb{R} : 0\le t\le f(\boldsymbol{x})\}.$$

Since f is L-integrable and E is measurable, by the preceding proposition we have that $G_f$ is a measurable set. Moreover,

$$\widetilde{\varphi}^{-1}(G_f) = \{(\boldsymbol{u},t)\in D\times\mathbb{R} : 0\le t\le f(\varphi(\boldsymbol{u}))\}.$$

Using Theorem 11.41 and Fubini's reduction theorem, we have

$$\mu(G_f) = \int_{\widetilde{\varphi}^{-1}(G_f)} |\det\widetilde{\varphi}'(\boldsymbol{u},t)|\,d\boldsymbol{u}\,dt = \int_{\widetilde{\varphi}^{-1}(G_f)} |\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}\,dt = \int_D\left(\int_0^{f(\varphi(\boldsymbol{u}))} |\det\varphi'(\boldsymbol{u})|\,dt\right)d\boldsymbol{u} = \int_D f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

On the other hand, by Proposition 11.42, we have that $\mu(G_f) = \int_{\varphi(D)} f$, and this proves that the formula holds in the case where f is bounded with nonnegative values.

In the case where f is not bounded but still has nonnegative values, we consider the functions

$$f_n(\boldsymbol{x}) = \min\{f(\boldsymbol{x}),n\}.$$

For each of them, the formula holds true, and using the Monotone Convergence Theorem 9.10 we prove that the formula holds for f even in this case. When f does not have nonnegative values, it is sufficient to consider its positive and negative parts, apply the formula to them, and then subtract.

To obtain the opposite implication, it is sufficient to consider $(f\circ\varphi)\,|\det\varphi'|$ instead of f and $\varphi^{-1}$ instead of $\varphi$ and to apply what was just proved. ∎

We recall here the equivalent formula

$$\int_E f(\boldsymbol{x})\,d\boldsymbol{x} = \int_{\varphi^{-1}(E)} f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

11.12 Some Useful Transformations in $\mathbb{R}^2$

Some transformations do not change the measure of any measurable set. We consider here some of those that are most frequently used in applications.

Translations We call translation by a given vector $\boldsymbol{a} = (a_1,a_2)\in\mathbb{R}^2$ the transformation $\varphi : \mathbb{R}^2\to\mathbb{R}^2$ defined by

$$\varphi(u,v) = (u+a_1,\,v+a_2).$$

It is readily seen that $\varphi$ is a diffeomorphism, with $\det\varphi' = 1$, so that, given a measurable bounded set D and an L-integrable function f on $\varphi(D)$, we have

$$\int_{\varphi(D)} f(x,y)\,dx\,dy = \int_D f(u+a_1,\,v+a_2)\,du\,dv.$$

Reflections A reflection with respect to one of the cartesian axes is defined by

$$\varphi(u,v) = (-u,v), \qquad\text{or}\qquad \varphi(u,v) = (u,-v).$$

Here, $\det\varphi' = -1$, so that, taking for example the first case, we have

$$\int_{\varphi(D)} f(x,y)\,dx\,dy = \int_D f(-u,v)\,du\,dv.$$

Rotations A rotation around the origin by a fixed angle $\alpha$ is given by

$$\varphi(u,v) = (u\cos\alpha - v\sin\alpha,\ u\sin\alpha + v\cos\alpha).$$

It is a diffeomorphism, with

$$\det\varphi'(u,v) = \det\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} = (\cos\alpha)^2 + (\sin\alpha)^2 = 1.$$

Hence, given a measurable bounded set D and an L-integrable function² f on $\varphi(D)$, we have

$$\int_{\varphi(D)} f(x,y)\,dx\,dy = \int_D f(u\cos\alpha - v\sin\alpha,\ u\sin\alpha + v\cos\alpha)\,du\,dv.$$
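Rotation invariance of the integral can be checked numerically. In the sketch below (ad-hoc names, not from the text, with a fixed random seed), D is the unit square, α = π/4, and f(x, y) = x + 2y; the left-hand side is estimated by Monte Carlo over the rotated square, using the inverse rotation to test membership in ϕ(D).

```python
import math, random

alpha = math.pi / 4
c, s = math.cos(alpha), math.sin(alpha)

def f(x, y):
    return x + 2 * y

# Right-hand side: midpoint rule for f(phi(u, v)) over D = [0,1] x [0,1].
def rotated_integrand_integral(n=500):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += f(u * c - v * s, u * s + v * c)
    return total * h * h

# Left-hand side: Monte Carlo over a box containing the rotated square;
# (x, y) lies in phi(D) iff the inverse rotation sends it back into D.
def integral_over_rotated_square(samples=200_000, seed=0):
    random.seed(seed)
    xmin, xmax, ymin, ymax = -0.8, 0.8, 0.0, 1.5
    box = (xmax - xmin) * (ymax - ymin)
    acc = 0.0
    for _ in range(samples):
        x = random.uniform(xmin, xmax)
        y = random.uniform(ymin, ymax)
        u = x * c + y * s          # inverse rotation (rotate by -alpha)
        v = -x * s + y * c
        if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
            acc += f(x, y)
    return acc * box / samples

rhs_val = rotated_integrand_integral()
lhs_val = integral_over_rotated_square()
```

For this particular f, a short computation gives the right-hand side exactly √2; the Monte Carlo estimate agrees up to sampling error.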

Homotheties A homothety of ratio $\alpha > 0$ is a function $\varphi : \mathbb{R}^2\to\mathbb{R}^2$ defined by

$$\varphi(u,v) = (\alpha u,\,\alpha v).$$

It is a diffeomorphism, with $\det\varphi' = \alpha^2$. Hence,

$$\int_{\varphi(D)} f(x,y)\,dx\,dy = \alpha^2\int_D f(\alpha u,\,\alpha v)\,du\,dv.$$

² Let us mention here that Buczolich [2] found an ingenious example of an integrable function in $\mathbb{R}^2$ whose rotation by $\alpha = \pi/4$ is not integrable. This is why we have restricted our attention only to L-integrable functions.

Polar Coordinates Another useful transformation is provided by the function $\psi : [0,+\infty[\,\times[0,2\pi[\,\to\mathbb{R}^2$ given by

$$\psi(\rho,\theta) = (\rho\cos\theta,\,\rho\sin\theta),$$

which defines the so-called "polar coordinates" in $\mathbb{R}^2$. Consider the open sets

$$A = \,]0,+\infty[\,\times\,]0,2\pi[\,, \qquad B = \mathbb{R}^2\setminus([0,+\infty[\,\times\{0\}).$$

The function $\varphi : A\to B$ defined by $\varphi(\rho,\theta) = \psi(\rho,\theta)$ happens to be a diffeomorphism, and it is easily seen that

$$\det\varphi'(\rho,\theta) = \rho \qquad\text{for every } (\rho,\theta)\in A.$$

Let $E\subseteq\mathbb{R}^2$ be a measurable bounded set, and consider a function $f : E\to\mathbb{R}$. We can apply the Change of Variables Theorem 11.43 to the set $\widetilde{E} = E\cap B$. Since $\widetilde{E}$ and $\varphi^{-1}(\widetilde{E})$ differ from E and $\psi^{-1}(E)$, respectively, by negligible sets, we obtain the following formula on the change of variables in polar coordinates:

$$\int_E f(x,y)\,dx\,dy = \int_{\psi^{-1}(E)} f(\psi(\rho,\theta))\,\rho\,d\rho\,d\theta.$$

Example Let $f(x,y) = xy$ be defined on

$$E = \{(x,y)\in\mathbb{R}^2 : x\ge 0,\ y\ge 0,\ x^2+y^2 < 9\}.$$

By the formula on the change of variables in polar coordinates, we have $\psi^{-1}(E) = [0,3[\,\times[0,\frac{\pi}{2}]$; by the Reduction Theorem 11.36, we can then compute

$$\int_E f = \int_0^{\pi/2}\left(\int_0^3 \rho^3\cos\theta\sin\theta\,d\rho\right)d\theta = \frac{81}{4}\int_0^{\pi/2}\cos\theta\sin\theta\,d\theta = \frac{81}{8}.$$
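A numerical sketch of this example (ad-hoc helpers, not from the text): the polar-coordinates side is computed by a midpoint rule, and the Cartesian side by summing f over grid midpoints inside the quarter disk.

```python
import math

# Polar side: integral of rho^3 cos(theta) sin(theta) over [0,3] x [0, pi/2].
def polar_side(n=400):
    hr, ht = 3.0 / n, (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * hr
        for j in range(n):
            th = (j + 0.5) * ht
            total += rho**3 * math.cos(th) * math.sin(th)
    return total * hr * ht

# Cartesian side: sum x*y over grid midpoints inside the quarter disk.
def cartesian_side(n=600):
    h = 3.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if x * x + y * y < 9.0:
                total += x * y
    return total * h * h

pol = polar_side()
cart = cartesian_side()
```

Both approximate 81/8 = 10.125; the Cartesian grid value carries an extra error from the cells straddling the circular boundary.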

11.13 Cylindrical and Spherical Coordinates in $\mathbb{R}^3$

We consider the function $\xi : [0,+\infty[\,\times[0,2\pi[\,\times\mathbb{R}\to\mathbb{R}^3$ defined by

$$\xi(\rho,\theta,z) = (\rho\cos\theta,\,\rho\sin\theta,\,z),$$

which gives us the so-called "cylindrical coordinates" in $\mathbb{R}^3$. Consider the open sets

$$A = \,]0,+\infty[\,\times\,]0,2\pi[\,\times\mathbb{R}, \qquad B = (\mathbb{R}^2\setminus([0,+\infty[\,\times\{0\}))\times\mathbb{R}.$$

The function $\varphi : A\to B$ defined by $\varphi(\rho,\theta,z) = \xi(\rho,\theta,z)$ happens to be a diffeomorphism, and it is easily seen that

$$\det\varphi'(\rho,\theta,z) = \rho \qquad\text{for every } (\rho,\theta,z)\in A.$$

Let $E\subseteq\mathbb{R}^3$ be a measurable bounded set, and consider a function $f : E\to\mathbb{R}$. We can then apply the Change of Variables Theorem 11.43 to the set $\widetilde{E} = E\cap B$. Since $\widetilde{E}$ and $\varphi^{-1}(\widetilde{E})$ differ from E and $\xi^{-1}(E)$, respectively, by negligible sets, we obtain the following formula on the change of variables in cylindrical coordinates:

$$\int_E f(x,y,z)\,dx\,dy\,dz = \int_{\xi^{-1}(E)} f(\xi(\rho,\theta,z))\,\rho\,d\rho\,d\theta\,dz.$$

Example Let us compute the integral $\int_E f$, where $f(x,y,z) = x^2+y^2$ and

$$E = \{(x,y,z)\in\mathbb{R}^3 : x^2+y^2\le 1,\ 0\le z\le x+y+\sqrt{2}\,\}.$$

Passing to cylindrical coordinates, we notice that

$$\rho\cos\theta + \rho\sin\theta + \sqrt{2} \ge 0$$

for every $\theta\in[0,2\pi[$ and every $\rho\in[0,1]$. By the Change of Variables Theorem 11.43, using also Fubini's Reduction Theorem 11.36, we compute

$$\int_E (x^2+y^2)\,dx\,dy\,dz = \int_{\xi^{-1}(E)} \rho^3\,d\rho\,d\theta\,dz = \int_0^1\left(\int_0^{2\pi}\left(\int_0^{\rho\cos\theta+\rho\sin\theta+\sqrt{2}} \rho^3\,dz\right)d\theta\right)d\rho = \int_0^1\left(\int_0^{2\pi} \rho^3(\rho\cos\theta+\rho\sin\theta+\sqrt{2})\,d\theta\right)d\rho = 2\pi\int_0^1 \rho^3\sqrt{2}\,d\rho = \frac{\pi\sqrt{2}}{2}.$$
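A numerical cross-check of this computation (an ad-hoc sketch, not part of the text): after integrating out z analytically, as in the text, a two-dimensional midpoint rule in (ρ, θ) should reproduce π√2/2; a Cartesian Monte Carlo estimate with a fixed seed gives the same value up to sampling error.

```python
import math, random

# Reduced integrand: rho^3 (rho cos t + rho sin t + sqrt(2)) over [0,1] x [0, 2pi].
def cylindrical_side(n=400):
    hr, ht = 1.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * hr
        for j in range(n):
            t = (j + 0.5) * ht
            total += rho**3 * (rho * math.cos(t) + rho * math.sin(t) + math.sqrt(2))
    return total * hr * ht

# Monte Carlo in Cartesian coordinates over the box [-1,1]^2 x [0, 2 sqrt(2)],
# which contains E since max(x + y) on the unit disk is sqrt(2).
def monte_carlo_side(samples=200_000, seed=0):
    random.seed(seed)
    zmax = 2 * math.sqrt(2)
    box = 2.0 * 2.0 * zmax
    acc = 0.0
    for _ in range(samples):
        x = random.uniform(-1, 1)
        y = random.uniform(-1, 1)
        z = random.uniform(0, zmax)
        if x * x + y * y <= 1.0 and z <= x + y + math.sqrt(2):
            acc += x * x + y * y
    return acc * box / samples

cyl = cylindrical_side()
mc = monte_carlo_side()
```

Both values approximate π√2/2 ≈ 2.2214.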

Now consider the function $\sigma : [0,+\infty[\,\times[0,2\pi[\,\times[0,\pi]\to\mathbb{R}^3$ defined by

$$\sigma(\rho,\theta,\phi) = (\rho\sin\phi\cos\theta,\ \rho\sin\phi\sin\theta,\ \rho\cos\phi),$$

which defines the so-called "spherical coordinates" in $\mathbb{R}^3$. Consider the open sets

$$A = \,]0,+\infty[\,\times\,]0,2\pi[\,\times\,]0,\pi[\,, \qquad B = \mathbb{R}^3\setminus([0,+\infty[\,\times\{0\}\times\mathbb{R}).$$

The function $\varphi : A\to B$ defined by $\varphi(\rho,\theta,\phi) = \sigma(\rho,\theta,\phi)$ happens to be a diffeomorphism, and it can be easily checked that

$$\det\varphi'(\rho,\theta,\phi) = -\rho^2\sin\phi \qquad\text{for every } (\rho,\theta,\phi)\in A.$$

Let $E\subseteq\mathbb{R}^3$ be a measurable bounded set, and consider a function $f : E\to\mathbb{R}$. We can then apply the Change of Variables Theorem 11.43 to $\widetilde{E} = E\cap B$. Since $\widetilde{E}$ and $\varphi^{-1}(\widetilde{E})$ differ from E and $\sigma^{-1}(E)$, respectively, by negligible sets, we obtain the following formula on the change of variables in spherical coordinates:

$$\int_E f(x,y,z)\,dx\,dy\,dz = \int_{\sigma^{-1}(E)} f(\sigma(\rho,\theta,\phi))\,\rho^2\sin\phi\,d\rho\,d\theta\,d\phi.$$

Example Let us compute the volume of the set

$$E = \left\{(x,y,z)\in\mathbb{R}^3 : x^2+y^2+z^2\le 1,\ z\ge\sqrt{x^2+y^2}\,\right\}.$$

We have

$$\mu(E) = \int_E 1\,dx\,dy\,dz = \int_{\sigma^{-1}(E)} \rho^2\sin\phi\,d\rho\,d\theta\,d\phi = \int_0^1\left(\int_0^{\pi/4}\left(\int_0^{2\pi} \rho^2\sin\phi\,d\theta\right)d\phi\right)d\rho = 2\pi\int_0^1\left(\int_0^{\pi/4} \rho^2\sin\phi\,d\phi\right)d\rho = 2\pi\left(1-\frac{\sqrt{2}}{2}\right)\int_0^1\rho^2\,d\rho = \frac{2\pi}{3}\left(1-\frac{\sqrt{2}}{2}\right).$$
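The volume just computed can be checked numerically (an ad-hoc sketch, not part of the text): a midpoint rule on the spherical-coordinates side, and a Monte Carlo count of points of the cube $[-1,1]^3$ falling in the "ice-cream cone" region, with a fixed seed.

```python
import math, random

# Spherical side: 2*pi times the integral of rho^2 sin(phi) over [0,1] x [0, pi/4].
def spherical_side(n=500):
    hr, hp = 1.0 / n, (math.pi / 4) / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * hr
        for j in range(n):
            phi = (j + 0.5) * hp
            total += rho**2 * math.sin(phi)
    return 2 * math.pi * total * hr * hp

# Monte Carlo: fraction of the cube [-1,1]^3 lying in E, times its volume 8.
def monte_carlo_volume(samples=200_000, seed=0):
    random.seed(seed)
    hits = 0
    for _ in range(samples):
        x = random.uniform(-1, 1)
        y = random.uniform(-1, 1)
        z = random.uniform(-1, 1)
        if x * x + y * y + z * z <= 1.0 and z >= math.sqrt(x * x + y * y):
            hits += 1
    return 8.0 * hits / samples

sph = spherical_side()
mc_vol = monte_carlo_volume()
```

Both values approximate (2π/3)(1 − √2/2) ≈ 0.6134.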

11.14 The Integral on Unbounded Sets

When dealing with unbounded domains, there are good reasons to limit our attention only to L-integrable functions. This section extends the theory of the integral to this context.

Let E be a subset of $\mathbb{R}^N$, not necessarily bounded, and assume first that $f : E\to\mathbb{R}$ is a nonnegative function, i.e.,

$$f(\boldsymbol{x})\ge 0 \qquad\text{for every } \boldsymbol{x}\in E.$$

As usual, we will use the notation

$$B[0,r] = [-r,r]\times\cdots\times[-r,r].$$

If f is integrable on $E\cap B[0,r]$ for every $r > 0$, we define

$$\int_E f = \lim_{r\to+\infty}\int_{E\cap B[0,r]} f.$$

Notice that this limit always exists, since the function $r\mapsto\int_{E\cap B[0,r]} f$ is increasing, because of $f\ge 0$. When this limit happens to be finite (i.e., not equal to $+\infty$), we will say that f is "integrable" (on E).

It can be easily seen that the same result is obtained if, instead of $B[0,r]$, we take the Euclidean closed balls $B(0,r)$. This is due to the fact that, for every $r > 0$,

$$B(0,r)\subseteq B[0,r] \qquad\text{and}\qquad B[0,r]\subseteq B(0,r\sqrt{N}).$$

The same observation can be made, of course, for many other families $(S_r)$ of bounded sets invading $\mathbb{R}^N$, meaning that for every $r > 0$ there exists $r' > 0$ such that $B[0,r]\subseteq S_{r'}$.

In the case where the function f also has negative values, we consider both its positive part $f^+ = \max\{f,0\}$ and its negative part $f^- = \max\{-f,0\}$, so that $f = f^+ - f^-$. Notice that $f^+\ge 0$ and $f^-\ge 0$. We say that f is L-integrable if both $f^+$ and $f^-$ are integrable, in which case we define

$$\int_E f = \int_E f^+ - \int_E f^-.$$

Notice that, in this case, since $|f| = f^+ + f^-$, we have that

$$\int_E |f| = \int_E f^+ + \int_E f^-.$$

The fact that $|f|$ is integrable justifies the name "L-integrable" for the function f. It is not difficult to prove that the set of L-integrable functions is a real vector space, and the integral is a linear function on it which preserves the order. Moreover, we can easily verify that a function f is L-integrable on a set E if and only if the function $f_E$ is L-integrable on $\mathbb{R}^N$.

Definition 11.44 A set $E\subseteq\mathbb{R}^N$ is said to be "measurable" if $E\cap B[0,r]$ is measurable for every $r > 0$. In that case, we set

$$\mu(E) = \lim_{r\to+\infty}\mu(E\cap B[0,r]).$$

Notice that $\mu(E)$, in some cases, can be $+\infty$. It is finite if and only if the constant function 1 is L-integrable on E, i.e., the characteristic function of E is L-integrable on $\mathbb{R}^N$.

The properties of measurable bounded sets extend easily to unbounded sets. In particular, all open sets and all closed sets are measurable.

The Monotone Convergence Theorem of Beppo Levi attains the following general form.

Theorem 11.45 (Monotone Convergence Theorem—II) We are given a function f and a sequence of functions $f_n$, with $n\in\mathbb{N}$, defined almost everywhere on a subset E of $\mathbb{R}^N$, with real values, verifying the following conditions:

(a) The sequence $(f_n)_n$ converges pointwise to f, almost everywhere on E.
(b) The sequence $(f_n)_n$ is monotone.
(c) Each function $f_n$ is L-integrable on E.
(d) The real sequence $(\int_E f_n)_n$ has a finite limit.

Then f is L-integrable on E, and

$$\int_E f = \lim_n\int_E f_n.$$

Proof Assume, for definiteness, that the sequence $(f_n)_n$ is increasing. By considering the sequence $(f_n - f_0)_n$ instead of $(f_n)_n$, we can assume without loss of generality that all the functions have almost everywhere nonnegative values. Let $J = \lim_n(\int_E f_n)$; for every $r > 0$ we can apply the Monotone Convergence Theorem 9.10 on the bounded set $E\cap B[0,r]$, so that f is integrable on $E\cap B[0,r]$ and

$$\int_{E\cap B[0,r]} f = \lim_{n\to\infty}\int_{E\cap B[0,r]} f_n \le \lim_{n\to\infty}\int_E f_n = J.$$

Let us prove that the limit of $\int_{E\cap B[0,r]} f$ exists, as $r\to+\infty$, and that it is equal to J. Fix $\varepsilon > 0$; there is an $\bar{n}\in\mathbb{N}$ such that, for $n\ge\bar{n}$,

$$J - \frac{\varepsilon}{2} \le \int_E f_n \le J;$$

since, moreover,

$$\int_E f_{\bar{n}} = \lim_{r\to+\infty}\int_{E\cap B[0,r]} f_{\bar{n}},$$

there is an $\bar{r} > 0$ such that, for $r\ge\bar{r}$,

$$J - \varepsilon \le \int_{E\cap B[0,r]} f_{\bar{n}} \le J.$$

Then, since the sequence $(f_n)_n$ is increasing, we have that, for every $n\ge\bar{n}$ and every $r\ge\bar{r}$,

$$J - \varepsilon \le \int_{E\cap B[0,r]} f_n \le J.$$

Passing to the limit as $n\to+\infty$, we obtain, for every $r\ge\bar{r}$,

$$J - \varepsilon \le \int_{E\cap B[0,r]} f \le J.$$

The proof is thus completed. ∎

As an immediate consequence, there is an analogous statement for series of functions.

Corollary 11.46 We are given a function f and a sequence of functions $f_k$, with $k\in\mathbb{N}$, defined almost everywhere on a subset E of $\mathbb{R}^N$, with real values, verifying the following conditions:

(a) The series $\sum_k f_k$ converges pointwise to f, almost everywhere on E.
(b) For every $k\in\mathbb{N}$ and almost every $\boldsymbol{x}\in E$, we have $f_k(\boldsymbol{x})\ge 0$.
(c) Each function $f_k$ is L-integrable on E.
(d) The series $\sum_k(\int_E f_k)$ converges.

Then f is L-integrable on E and

$$\int_E f = \sum_{k=0}^{\infty}\int_E f_k.$$

From the Monotone Convergence Theorem 11.45 we deduce, in complete analogy with what we have seen for bounded sets, the Dominated Convergence Theorem of Henri Lebesgue.

Theorem 11.47 (Dominated Convergence Theorem—II) We are given a function f and a sequence of functions $f_n$, with $n\in\mathbb{N}$, defined almost everywhere on a subset E of $\mathbb{R}^N$, with real values, verifying the following conditions:

(a) The sequence $(f_n)_n$ converges pointwise to f, almost everywhere on E.
(b) Each function $f_n$ is L-integrable on E.
(c) There are two functions g, h, defined almost everywhere and L-integrable on E, such that

$$g(\boldsymbol{x}) \le f_n(\boldsymbol{x}) \le h(\boldsymbol{x})$$

for every $n\in\mathbb{N}$ and almost every $\boldsymbol{x}\in E$.

Then the sequence $(\int_E f_n)_n$ has a finite limit, f is L-integrable on E, and

$$\int_E f = \lim_n\int_E f_n.$$

As a direct consequence we have the complete additivity property of the integral for L-integrable functions.

Theorem 11.48 Let $(E_k)$ be a finite or countable family of pairwise nonoverlapping measurable subsets of $\mathbb{R}^N$ whose union is a set E. Then f is L-integrable on E if and only if the following two conditions hold:

(a) f is L-integrable on each set $E_k$.
(b) $\sum_k\int_{E_k}|f(\boldsymbol{x})|\,d\boldsymbol{x} < +\infty$.

In that case, we have

$$\int_E f = \sum_k\int_{E_k} f.$$

As another consequence, we have the Leibniz rule for not necessarily bounded subsets Y of $\mathbb{R}^N$, which is stated as follows.

Theorem 11.49 (Leibniz Rule—II) Let $f : X\times Y\to\mathbb{R}$ be a function, where X is a nontrivial interval of $\mathbb{R}$ containing $x_0$, and Y is a subset of $\mathbb{R}^N$, such that:

(a) For every $x\in X$, the function $f(x,\cdot)$ is L-integrable on Y, so that we can define the function

$$F(x) = \int_Y f(x,\boldsymbol{y})\,d\boldsymbol{y}.$$

(b) For every $x\in X$ and almost every $\boldsymbol{y}\in Y$ the partial derivative $\frac{\partial f}{\partial x}(x,\boldsymbol{y})$ exists.
(c) There are two L-integrable functions $g,h : Y\to\mathbb{R}$ such that

$$g(\boldsymbol{y}) \le \frac{\partial f}{\partial x}(x,\boldsymbol{y}) \le h(\boldsymbol{y})$$

for every $x\in X$ and almost every $\boldsymbol{y}\in Y$.

Then the function $\frac{\partial f}{\partial x}(x_0,\cdot)$, defined almost everywhere on Y, is L-integrable there, the derivative of F in $x_0$ exists, and

$$F'(x_0) = \int_Y \frac{\partial f}{\partial x}(x_0,\boldsymbol{y})\,d\boldsymbol{y}.$$

Also the reduction theorem of Guido Fubini extends to functions defined on a not necessarily bounded subset E of $\mathbb{R}^N$. Let $N = N_1+N_2$, and write $\mathbb{R}^N = \mathbb{R}^{N_1}\times\mathbb{R}^{N_2}$. For every $(\boldsymbol{x},\boldsymbol{y})\in\mathbb{R}^{N_1}\times\mathbb{R}^{N_2}$, consider the "sections" of E:

$$E_{\boldsymbol{x}} = \{\boldsymbol{y}\in\mathbb{R}^{N_2} : (\boldsymbol{x},\boldsymbol{y})\in E\}, \qquad E_{\boldsymbol{y}} = \{\boldsymbol{x}\in\mathbb{R}^{N_1} : (\boldsymbol{x},\boldsymbol{y})\in E\},$$

and the "projections" of E:

$$P_1 E = \{\boldsymbol{x}\in\mathbb{R}^{N_1} : E_{\boldsymbol{x}}\ne\emptyset\}, \qquad P_2 E = \{\boldsymbol{y}\in\mathbb{R}^{N_2} : E_{\boldsymbol{y}}\ne\emptyset\}.$$

We can then reformulate the theorem in the following form.

Theorem 11.50 (Reduction Theorem—IV) Let $f : E\to\mathbb{R}$ be an L-integrable function. Then:

(a) For almost every $\boldsymbol{x}\in P_1 E$, the function $f(\boldsymbol{x},\cdot)$ is L-integrable on the set $E_{\boldsymbol{x}}$.
(b) The function $\boldsymbol{x}\mapsto\int_{E_{\boldsymbol{x}}} f(\boldsymbol{x},\boldsymbol{y})\,d\boldsymbol{y}$, defined almost everywhere on $P_1 E$, is L-integrable there.
(c) We have

$$\int_E f = \int_{P_1 E}\left(\int_{E_{\boldsymbol{x}}} f(\boldsymbol{x},\boldsymbol{y})\,d\boldsymbol{y}\right)d\boldsymbol{x}.$$

Analogously, the function $\boldsymbol{y}\mapsto\int_{E_{\boldsymbol{y}}} f(\boldsymbol{x},\boldsymbol{y})\,d\boldsymbol{x}$, defined almost everywhere on $P_2 E$, is L-integrable there, and we have

$$\int_E f = \int_{P_2 E}\left(\int_{E_{\boldsymbol{y}}} f(\boldsymbol{x},\boldsymbol{y})\,d\boldsymbol{x}\right)d\boldsymbol{y}.$$

Proof Consider for simplicity the case $N_1 = N_2 = 1$, the general case being perfectly analogous. Assume first that f has nonnegative values. By Fubini's Reduction Theorem 11.36 for bounded sets, once $r > 0$ is fixed, we have that, for almost every $x\in P_1 E\cap[-r,r]$, the function $f(x,\cdot)$ is L-integrable on $E_x\cap[-r,r]$; the function $g_r(x) = \int_{E_x\cap[-r,r]} f(x,y)\,dy$, defined almost everywhere on $P_1 E\cap[-r,r]$, is L-integrable there, and

$$\int_{E\cap B[0,r]} f = \int_{P_1 E\cap[-r,r]} g_r(x)\,dx.$$

In particular,

$$\int_{P_1 E\cap[-r,r]} g_r(x)\,dx \le \int_E f,$$

so that if $0 < s\le r$, then we have that $g_r$ is L-integrable on $P_1 E\cap[-s,s]$, and

$$\int_{P_1 E\cap[-s,s]} g_r(x)\,dx \le \int_E f.$$

Keeping s fixed, we let r tend to $+\infty$. Since f has nonnegative values, $g_r(x)$ will be increasing with respect to r. Consequently, for almost every $x\in P_1 E\cap[-s,s]$, the limit $\lim_{r\to+\infty} g_r(x)$ exists (possibly infinite), and we set

$$g(x) = \lim_{r\to+\infty} g_r(x) = \lim_{r\to+\infty}\int_{E_x\cap[-r,r]} f(x,y)\,dy.$$

Let $T = \{x\in P_1 E\cap[-s,s] : g(x) = +\infty\}$; let us prove that T is negligible. We define the sets

$$E_n^r = \{x\in P_1 E\cap[-s,s] : g_r(x) > n\}.$$

By Lemma 11.16, these sets are measurable, and

$$\mu(E_n^r) \le \frac1n\int_{P_1 E\cap[-s,s]} g_r(x)\,dx \le \frac1n\int_E f.$$

Hence, since the sets $E_n^r$ increase with r, the sets $F_n = \cup_r E_n^r$ are also measurable, and we have that $\mu(F_n)\le\frac1n\int_E f$. Since $T\subseteq\cap_n F_n$, we deduce that T is measurable, with $\mu(T) = 0$. Hence, for almost every $x\in P_1 E\cap[-s,s]$, the function $f(x,\cdot)$ is L-integrable on the set $E_x$ and, by definition,

$$\int_{E_x} f(x,y)\,dy = g(x).$$

Moreover, if we take r in the set of natural numbers and apply the Monotone Convergence Theorem to the functions $g_r$, it follows that g is L-integrable on $P_1 E\cap[-s,s]$, and

$$\int_{P_1 E\cap[-s,s]} g = \lim_{r\to\infty}\int_{P_1 E\cap[-s,s]} g_r,$$

so that

$$\int_{P_1 E\cap[-s,s]}\left(\int_{E_x} f(x,y)\,dy\right)dx \le \int_E f.$$

Letting now s tend to $+\infty$, we see that the limit

$$\lim_{s\to+\infty}\int_{P_1 E\cap[-s,s]}\left(\int_{E_x} f(x,y)\,dy\right)dx$$

exists and is finite; therefore, the function $x\mapsto\int_{E_x} f(x,y)\,dy$, defined almost everywhere on $P_1 E$, is L-integrable there, and its integral is the preceding limit. Moreover, from the inequality proved earlier, passing to the limit, we have that

$$\int_{P_1 E}\left(\int_{E_x} f(x,y)\,dy\right)dx \le \int_E f.$$

On the other hand,

$$\int_{E\cap B[0,r]} f = \int_{P_1 E\cap[-r,r]}\left(\int_{E_x\cap[-r,r]} f(x,y)\,dy\right)dx \le \int_{P_1 E\cap[-r,r]}\left(\int_{E_x} f(x,y)\,dy\right)dx \le \int_{P_1 E}\left(\int_{E_x} f(x,y)\,dy\right)dx,$$

so that, passing to the limit as $r\to+\infty$,

$$\int_E f \le \int_{P_1 E}\left(\int_{E_x} f(x,y)\,dy\right)dx.$$

In conclusion, equality must hold, and the proof is thus completed in the case where f has nonnegative values. In the general case, just consider $f^+$ and $f^-$, and subtract the corresponding formulas. ∎

The analogous corollary for the computation of the measure holds.

Corollary 11.51 Let E be a measurable set. Then E has a finite measure if and only if:

(a) For almost every $x\in P_1 E$ the set $E_x$ is measurable and has a finite measure.
(b) The function $x\mapsto\mu(E_x)$, defined almost everywhere on $P_1 E$, is L-integrable there.
(c) We have

$$\mu(E) = \int_{P_1 E}\mu(E_x)\,dx.$$

With a symmetric statement, if E has a finite measure, we also have

$$\mu(E) = \int_{P_2 E}\mu(E_y)\,dy.$$

The change of variables formula also extends to unbounded sets, with the same statement.

Theorem 11.52 (Change of Variables Theorem—III) Let A and B be open subsets of $\mathbb{R}^N$ and $\varphi : A\to B$ a diffeomorphism. Let $D\subseteq A$ be a measurable set and $f : \varphi(D)\to\mathbb{R}$ a function. Then f is L-integrable on $\varphi(D)$ if and only if $(f\circ\varphi)\,|\det\varphi'|$ is L-integrable on D, in which case

$$\int_{\varphi(D)} f(\boldsymbol{x})\,d\boldsymbol{x} = \int_D f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

Proof Assume first that f is L-integrable on $E = \varphi(D)$ with nonnegative values. Then, for every $r > 0$,

$$\int_{D\cap B[0,r]} f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u} = \int_{\varphi(D\cap B[0,r])} f(\boldsymbol{x})\,d\boldsymbol{x} \le \int_{\varphi(D)} f(\boldsymbol{x})\,d\boldsymbol{x},$$

so that the limit

$$\lim_{r\to+\infty}\int_{D\cap B[0,r]} f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}$$

exists and is finite. Then $(f\circ\varphi)\,|\det\varphi'|$ is L-integrable on D, and we have

$$\int_D f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u} \le \int_{\varphi(D)} f(\boldsymbol{x})\,d\boldsymbol{x}.$$

On the other hand, for every $r > 0$,

$$\int_{E\cap B[0,r]} f = \int_{\varphi^{-1}(E\cap B[0,r])} (f\circ\varphi)\,|\det\varphi'| \le \int_{\varphi^{-1}(E)} (f\circ\varphi)\,|\det\varphi'|,$$

so that, passing to the limit,

$$\int_E f(\boldsymbol{x})\,d\boldsymbol{x} = \lim_{r\to+\infty}\int_{E\cap B[0,r]} f(\boldsymbol{x})\,d\boldsymbol{x} \le \int_{\varphi^{-1}(E)} f(\varphi(\boldsymbol{u}))\,|\det\varphi'(\boldsymbol{u})|\,d\boldsymbol{u}.$$

The formula is thus proved when f has nonnegative values. In general, just proceed as usual, considering $f^+$ and $f^-$.


To obtain the opposite implication, it is sufficient to consider $(f\circ\varphi)\,|\det\varphi'|$ instead of f and $\varphi^{-1}$ instead of $\varphi$ and to repeat the preceding argument. ∎

Concerning the change of variables in polar coordinates in $\mathbb{R}^2$, or in cylindrical or spherical coordinates in $\mathbb{R}^3$, the same type of considerations we made for bounded sets extend to the general case as well.

Example Let $E = \{(x,y)\in\mathbb{R}^2 : x^2+y^2\ge 1\}$ and $f(x,y) = (x^2+y^2)^{-\alpha}$, with $\alpha > 0$. We have

$$\int_E \frac{1}{(x^2+y^2)^{\alpha}}\,dx\,dy = \int_0^{2\pi}\left(\int_1^{+\infty}\frac{1}{\rho^{2\alpha}}\,\rho\,d\rho\right)d\theta = 2\pi\int_1^{+\infty}\rho^{1-2\alpha}\,d\rho.$$

It is thus seen that f is integrable on E if and only if $\alpha > 1$, in which case the integral is $\frac{\pi}{\alpha-1}$.

Example Let us compute the three-dimensional measure of the set

$$E = \left\{(x,y,z)\in\mathbb{R}^3 : x\ge 1,\ y^2+z^2\le\frac{1}{x^2}\right\}.$$

Using Fubini's Reduction Theorem 11.50, grouping together the variables $(y,z)$, we have

$$\mu(E) = \pi\int_1^{+\infty}\frac{1}{x^2}\,dx = \pi.$$

Example Consider the function $f(x,y) = e^{-(x^2+y^2)}$, and let us make a change of variables in polar coordinates:

$$\int_{\mathbb{R}^2} e^{-(x^2+y^2)}\,dx\,dy = \int_0^{2\pi}\left(\int_0^{+\infty} e^{-\rho^2}\rho\,d\rho\right)d\theta = 2\pi\left[-\frac12\,e^{-\rho^2}\right]_0^{+\infty} = \pi.$$

Notice that, using Fubini's Reduction Theorem 11.50, we have

$$\int_{\mathbb{R}^2} e^{-(x^2+y^2)}\,dx\,dy = \int_{-\infty}^{+\infty}\left(\int_{-\infty}^{+\infty} e^{-x^2-y^2}\,dx\right)dy = \int_{-\infty}^{+\infty} e^{-x^2}\,dx\int_{-\infty}^{+\infty} e^{-y^2}\,dy = \left(\int_{-\infty}^{+\infty} e^{-x^2}\,dx\right)^{2},$$

and we thus find again the formula

$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$

11.15 The Integral on M-Surfaces

We now want to define the integral of a function $f : U\to\mathbb{R}$ on an M-surface $\sigma : I\to\mathbb{R}^N$ whose image is contained in U. When the indices $i_1,\ldots,i_M$ vary in the set $\{1,\ldots,N\}$, then for every $\boldsymbol{u}\in I$ we can consider the $M\times M$ matrices obtained from the Jacobian matrix $J\sigma(\boldsymbol{u})$ (also denoted by $\sigma'(\boldsymbol{u})$) by selecting the corresponding rows, i.e.,

$$\sigma'_{(i_1,\ldots,i_M)}(\boldsymbol{u}) = \begin{pmatrix} \dfrac{\partial\sigma_{i_1}}{\partial u_1}(\boldsymbol{u}) & \cdots & \dfrac{\partial\sigma_{i_1}}{\partial u_M}(\boldsymbol{u}) \\ \vdots & & \vdots \\ \dfrac{\partial\sigma_{i_M}}{\partial u_1}(\boldsymbol{u}) & \cdots & \dfrac{\partial\sigma_{i_M}}{\partial u_M}(\boldsymbol{u}) \end{pmatrix},$$

and we can define the vector $\Lambda(\boldsymbol{u})$ in $\mathbb{R}^{\binom{N}{M}}$ as

$$\Lambda(\boldsymbol{u}) = \Big(\det\sigma'_{(i_1,\ldots,i_M)}(\boldsymbol{u})\Big)_{1\le i_1<\cdots<i_M\le N}$$

1≤i1