English Pages 884 Year 2015
An Introduction to Modern Analysis
Vicente Montesinos • Peter Zizler • Václav Zizler
Vicente Montesinos, Departamento de Matemática Aplicada, Instituto de Matemática Pura y Aplicada, Universitat Politècnica de València, Valencia, Spain
Václav Zizler, Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada
Peter Zizler, Department of Mathematics, Physics and Engineering, Mount Royal University, Calgary, Alberta, Canada
ISBN 978-3-319-12480-3
ISBN 978-3-319-12481-0 (eBook)
DOI 10.1007/978-3-319-12481-0
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2015934584
© Springer International Publishing Switzerland 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Mathematics is the queen of the sciences
Carl Friedrich Gauss
This text is directed at undergraduate students in the mathematical sciences who wish to have solid foundations for modern analysis, a meeting point of classical analysis with other parts of mathematics, like functional analysis, operator theory, nonlinear analysis, etc. These foundations are necessary for applications of mathematics in the sciences or engineering. Moreover, students planning to pursue graduate work in mathematics will find this text useful, especially those who did not have a chance to go through the honors programs at their respective universities or colleges. It is assumed that the reader has a good understanding of elementary linear algebra and arithmetic, as well as some training in simple logic. We shall try to fill foreseeable gaps to help the reader in this direction.
The text consists of a rigorous yet gentle self-contained introduction to real analysis with various visual supplements. Moreover, we have enriched the material with several excursions to mathematical areas such as functional analysis, descriptive statistics, or Fourier analysis (some chapters that are rather self-contained can be used as material for an independent optional course in some undergraduate programs). Aside from the theoretical part, the text contains an ample number of exercises of varying difficulty, with hints for their solutions. We have prepared a number of figures (using the freely distributed programs Veusz and IPE, and on a few occasions also the registered package Mathematica) that are intended to help in understanding the material covered. We tried to touch on quite a few “folklore” things that are frequently used in real analysis. We hope that instructors in service calculus courses may find the text to be a source for more advanced problems.
In the first chapter we introduce the real number system, discuss the principle of the supremum, and first meet the important principle of compactness and the Baire category theorem.
In the second chapter we encounter the notions of convergent and Cauchy sequences of real numbers and the approximation by rational numbers.
Chapter 3 contains an introduction to Lebesgue measure on the real line and its applications.
Chapter 4 contains basic notions and results in the theory of real-valued functions and their differentiability, together with an introduction to sequences of real-valued functions and their convergence.
Chapter 5 upgrades the discussion on function convergence. We discuss pointwise convergence, uniform convergence, convergence in measure, and almost everywhere convergence. The focus is on approximation and the properties preserved through it. In particular, global and local approximations are considered. Applications of those concepts include a discussion on real analytic functions and rigorous definitions of the basic functions in analysis.
Chapter 6 deals with metric spaces. This is a wide setting in which most of the former discussions find their place. The reader may find here Tietze’s extension theorem, a discussion on separable spaces, with an emphasis on Polish spaces, a deeper analysis of compactness, including the Arzelà–Ascoli theorem, more on the Baire category theorem, and applications to metric fixed point theory.
Chapter 7 deals with integration in the Riemann and Lebesgue senses. Lebesgue’s approach is intertwined with the measure theory already developed, and allows for a finer analysis of functions and convergence.
Chapter 8 introduces the reader to the basic theory of convex functions.
Chapter 9 is a basic introduction to the theory of Fourier series and integrals, including applications. An extension to the more general setting of periodic distributions is given in Chap. 11.
Chapter 10 presents a basic introduction to descriptive statistics. The emphasis is on discrete probability, which may help in understanding the subsequent, more general approach.
In Chap. 11, named “Excursion to Functional Analysis,” we present an introduction to basic concepts and results in a few selected topics in functional analysis, like Banach spaces, operator theory, and nonlinear functional analysis, with applications to real analysis. In fact, we shall try to illustrate to some extent how “abstract” functional analysis emerges from the waters of real analysis as a lighthouse to orientate and overlook the whole sea. We believe that this chapter may be used as a basic introduction to these subjects, and may foster the reader’s interest in enlarging his/her knowledge of modern techniques used in many fields. Together with Chap. 6, this chapter may constitute basic material for an introductory graduate course in linear and nonlinear functional analysis.
We include an Appendix (Chap. 12), mainly on number systems, and on three fundamental principles in set theory: the axiom of choice, the well-ordering principle, and Zorn’s lemma.
The last chapter (Chap. 13) is formed by exercises that are organized according to the chapters in the text. They are of various levels of difficulty. Some of them just briefly review the basic techniques of rigorous elementary calculus, and some of them upgrade the material in the chapters of the text. All of them are accompanied by hints for their solutions.
Optional sections are denoted by the symbol ♣.
Acknowledgements
The authors thank the institutions where they got the opportunity to teach the material in the text, namely, the University of Alberta (Edmonton, Alberta, Canada), Mount Royal University (Calgary, Alberta, Canada), and the Universitat Politècnica de València (València, Spain). They thank their colleagues and students for many discussions that helped in preparing this text. In particular, they thank M. Fabian, A. J. Guirao, P. Hájek, J. Muldowney and the late M. Valdivia. The authors would like to thank the Springer team for their interest in this text. In particular, they are thankful to Keith F. Taylor, Vaishali Damle, and Marc Strauss. They also thank Sakshi Narang for the assistance and the very professional work done in editing the final version of this book. Above all, the authors are indebted to their families for their moral support and encouragement. They wish the reader a pleasant journey through this book.
Notation
Special symbols used in this book will be introduced throughout the text. In order to keep track of them, a list of symbols (referring to the page where they first appear or where they are defined) is included. The first appearance is written in boldface. We tried to follow the usual notation regarding mathematical symbols. However, we depart from this habit in some particular cases. For example, B[x, r] denotes the closed ball with center x and radius r in a metric space. When a symbol for a generic function is needed, we use f, g, or similar, and we speak of “the function f.” Coherence would then force us to speak about “the function sin,” or “the function ln,” for example. However, it is a tradition to refer to these functions as “the function sin x,” or “the function ln x,” and we follow this convention.
In this text, two notions of integral are used: the Riemann integral and the Lebesgue integral. For a function f defined on an interval [a, b], the first one is denoted by $\int_a^b f$ or $\int_a^b f(x)\,dx$, while the symbol $\int_{[a,b]} f$ or $\int_{[a,b]} f(x)\,dx$ is reserved for the second. Accordingly, if S is a measurable set and f : S → R is a Lebesgue integrable function on S, the Lebesgue integral of f on S will be denoted by $\int_S f$ or $\int_S f(x)\,dx$. Improper Riemann integrals will then be denoted by $\int_a^{+\infty} f$ or $\int_a^{+\infty} f(x)\,dx$. Every Riemann-integrable function f on a closed and bounded interval [a, b] is Lebesgue-integrable, and both integrals coincide. In this case, the common value of the Riemann and the Lebesgue integral will be denoted by $\int_a^b f$ (or $\int_a^b f(x)\,dx$), which seems to be an accepted practice.
The end of a proof is marked , the end of an example ♦, while the end of a remark uses the symbol ®.
Contents
1 Real Numbers: The Basics
  1.1 Notation
  1.2 Natural Numbers
  1.3 Integers
  1.4 Fractions and Rational Numbers
    1.4.1 Introduction
    1.4.2 Powers and Radicals of Rational Numbers
  1.5 Base Representation
    1.5.1 The Expansion of a Natural Number in Base b
    1.5.2 The Expansion of a Rational Number in Base b
  1.6 Real Numbers
    1.6.1 The Definition of a Real Number
    1.6.2 The Expansion of a Real Number in Base b
    1.6.3 The Extended Real Number System, Intervals
    1.6.4 Order Properties—and the Completeness—of R
  1.7 Cardinality of Sets
    1.7.1 Basics on Cardinality
    1.7.2 Cardinality of Z and Q
    1.7.3 Cardinality of R
    1.7.4 Cardinality of the Set of Real Functions
  1.8 Topology of R
    1.8.1 Introduction. Open and Closed Sets
    1.8.2 Neighborhoods, Closure, Interior
    1.8.3 Topology on a Subset
    1.8.4 Compactness
    1.8.5 Connectedness and Related Concepts
  1.9 The Baire Category Theorem in R
2 Sequences and Series
  2.1 Approximation by Rational Numbers
  2.2 Sequences
    2.2.1 Basics on Sequences
    2.2.2 Two Particular Sequences: Arithmetic and Geometric Progressions
  2.3 More on Sequences
  2.4 Series
    2.4.1 Introduction
    2.4.2 General Criteria for Convergence of Series
    2.4.3 Series of Nonnegative Terms
    2.4.4 Series of Arbitrary Terms
    2.4.5 Rearrangement of Series
    2.4.6 Double Sequences and Double Series
    2.4.7 Product of Series
  2.5 The Euler Number e
  2.6 Infinite Products
3 Measure
  3.1 Measure
    3.1.1 The Lebesgue Outer Measure
    3.1.2 The Class of Lebesgue Measurable Sets and the Lebesgue Measure
    3.1.3 Approximating Measurable Sets
    3.1.4 The Lebesgue Inner Measure
    3.1.5 The Cantor Ternary Set
    3.1.6 A Nonmeasurable Set
    3.1.7 Sequences of Sets
4 Functions
  4.1 Functions on Real Numbers
    4.1.1 Introduction
    4.1.2 The Limit of a Function
    4.1.3 Continuous Functions
    4.1.4 Differentiable Functions
  4.2 Optimization and the Mean Value Theorem
  4.3 Algebra of Derivatives
  4.4 The Trigonometric Functions
  4.5 Finer Analysis of Continuity and Differentiability
    4.5.1 Differentiability of the Inverse Mapping
    4.5.2 Inverse Goniometric Functions
    4.5.3 Monotone Functions
    4.5.4 Measurable Functions
    4.5.5 Differentiability of Monotone Functions
    4.5.6 Functions of Bounded Variation
    4.5.7 Absolutely Continuous Functions and Lipschitz Functions
    4.5.8 Examples
    4.5.9 The Intermediate Value Property II
5 Function Convergence
  5.1 Function Sequences
    5.1.1 Pointwise and Almost Everywhere Convergence
    5.1.2 Uniform Convergence
    5.1.3 Convergence in Measure
    5.1.4 Local Approximation by Polynomials
  5.2 Function Series
    5.2.1 Power Series
    5.2.2 The Taylor Series
    5.2.3 The Exponential and the Logarithmic Functions
    5.2.4 The Hyperbolic Functions
    5.2.5 The Trigonometric Functions
    5.2.6 The Binomial Series
6 Metric Spaces
  6.1 Basics
  6.2 Mappings Between Metric Spaces
  6.3 More Examples (Continued)
  6.4 Tietze’s Extension Theorem
  6.5 Complete Metric Spaces and the Completion of a Metric Space
  6.6 Separable Metric Spaces
  6.7 Polish Spaces
  6.8 Compactness in Metric Spaces
    6.8.1 Compact Metric Spaces
    6.8.2 Total Boundedness
    6.8.3 Continuous Mappings on Compact Spaces
    6.8.4 The Lebesgue Number of a Covering
    6.8.5 The Finite Intersection Property. Pseudocompactness
  6.9 The Baire Category Theorem Continued
    6.9.1 The Baire Category Theorem in the Context of Metric Spaces
    6.9.2 Some Applications of the Baire Category Theorem
  6.10 The Arzelà–Ascoli Theorem
  6.11 Metric Fixed Point Theory
    6.11.1 The Banach Contraction Principle
    6.11.2 Continuity of the Fixed Point
7 Integration
  7.1 The Riemann Integral
    7.1.1 Introduction
    7.1.2 The Definition of the Riemann Integral
    7.1.3 Properties of the Integral
    7.1.4 Functions Defined by Integrals
    7.1.5 ♣ Some Applications of the Riemann Integral and the Arzelà–Ascoli Theorem to the Theory of Ordinary Differential Equations
    7.1.6 ♣ Some Applications of the Riemann Integral and the Fixed Point Theory to the Theory of Ordinary Differential and Integral Equations
    7.1.7 Mean Value Theorems for the Riemann Integral
    7.1.8 Convergence Theorems for Riemann Integrable Functions
    7.1.9 Change of Variable; Integration by Parts
  7.2 Improper Riemann Integrals
  7.3 The Lebesgue Integral
    7.3.1 Introduction
    7.3.2 Step Functions
    7.3.3 Upper Functions
    7.3.4 Lebesgue Integrable Functions
    7.3.5 Convergence Theorems
    7.3.6 Measure and Integration
    7.3.7 Functions Defined by Integrals
    7.3.8 The Space L1
    7.3.9 Riemann versus Lebesgue Integrability, and the Riemann–Lebesgue Criterion for Riemann Integrability
    7.3.10 The Fundamental Theorem of Calculus for Lebesgue Integration
    7.3.11 Integration by Parts
    7.3.12 Parametric Lebesgue Integrals
8 Convex Functions
  8.1 Basics on Convex Functions
  8.2 Some Fundamental Inequalities
    8.2.1 Jensen’s Inequality
    8.2.2 Using the Exponential Function
    8.2.3 Using Powers of x (Minkowski’s and Hölder’s Inequalities)
9 Fourier Series
  9.1 Introduction
  9.2 Some Elementary Trigonometric Identities
  9.3 The Fourier Series of 2π-periodic Lebesgue Integrable Functions
  9.4 The Riemann–Lebesgue Lemma
  9.5 The Partial Sums of a Fourier Series and the Dirichlet Kernel
  9.6 Convergence of the Fourier Series
    9.6.1 Pointwise Convergence of the Fourier Series
    9.6.2 Cesàro Convergence of the Fourier Series
    9.6.3 Uniform Convergence of the Fourier Series
    9.6.4 Convergence of the Fourier Series in $\|\cdot\|_1$
    9.6.5 Mean Square Convergence of the Fourier Series
  9.7 The Fourier Integral
10 Basics on Descriptive Statistics
  10.1 Discrete Probability
    10.1.1 Introduction
    10.1.2 Random Variables
    10.1.3 Products of Discrete Probability Spaces
    10.1.4 Inequalities
  10.2 Distribution Functions
    10.2.1 Selected Distributions of Discrete Random Variables
    10.2.2 Continuous Random Variables and Their Distribution Functions
11 Excursion to Functional Analysis
  11.1 Real Banach Spaces
    11.1.1 Spaces with a Norm (Normed Spaces, Banach Spaces)
    11.1.2 Operators I
    11.1.3 Finite-Dimensional Banach Spaces
    11.1.4 Infinite-Dimensional Banach Spaces
    11.1.5 Operators II
    11.1.6 Finite-Rank and Compact Operators
    11.1.7 Sets of Operators
  11.2 Three Basic Principles of Linear Analysis
    11.2.1 Extending Continuous Linear Functionals
    11.2.2 Bounded Sets of Operators
    11.2.3 Continuity of the Inverse Operator
  11.3 Complex Banach Spaces
    11.3.1 The Associated Real Normed Space
    11.3.2 Operators
    11.3.3 Linear Functionals
    11.3.4 Supporting Functionals and Differentiability
    11.3.5 Basic Results in the Complex Setting
  11.4 Spaces with an Inner Product (Pre-Hilbertian and Hilbert Spaces)
    11.4.1 Basic Hilbert Space Theory
    11.4.2 An Application to the Uniform Convergence of the Fourier Series
    11.4.3 Complements to Hilbert Spaces
  11.5 Spectral Theory
  11.6 ♣ Pointwise Topology and Product Spaces
  11.7 Excursion to Nonlinear Functional Analysis
    11.7.1 Variational Principles
    11.7.2 More on Differentiability of Convex and Lipschitz Functions
    11.7.3 More on Fixed Point Theorems
  11.8 An Application: Periodic Distributions
    11.8.1 Introduction
    11.8.2 The Basic Idea
    11.8.3 The Basic Definitions
    11.8.4 Derivatives of Periodic Distributions
    11.8.5 Convergence in PD
    11.8.6 Fourier Analysis
  11.9 Concluding Remarks to Chapter 11
12 Appendix
  12.1 The Set of Natural Numbers
  12.2 Integer Numbers
  12.3 Rational Numbers
  12.4 Real Numbers
    12.4.1 The Constructive Approach
    12.4.2 The Axiomatic Approach
  12.5 The Complex Number System
  12.6 Ordering and Choice. Three Fundamental Principles in Set Theory
    12.6.1 Definitions
    12.6.2 Examples
    12.6.3 Three Basic Principles
13 Exercises
  13.1 Numbers
    13.1.1 Set-Theoretical Notations
    13.1.2 Natural Numbers
    13.1.3 Fractions
    13.1.4 Base Representation
    13.1.5 Real Numbers
    13.1.6 Cardinality of Sets—and Ordinal Numbers
    13.1.7 Topology of R
  13.2 Sequences and Series
    13.2.1 Approximation by Rational Numbers
    13.2.2 Sequences
    13.2.3 Series
    13.2.4 The Euler Number e
  13.3 Measure
    13.3.1 The Lebesgue Outer Measure
    13.3.2 The Class of Lebesgue Measurable Sets and the Lebesgue Measure
    13.3.3 The Cantor Ternary Set
    13.3.4 A Nonmeasurable Set
    13.3.5 Sequences of Sets
  13.4 Functions
    13.4.1 Functions on Real Numbers
    13.4.2 Optimization and the Mean Value Theorem
    13.4.3 The Trigonometric Functions
    13.4.4 Finer Analysis of Continuity and Differentiability
    13.4.5 Function Convergence
    13.4.6 Function Series
    13.4.7 Metric Spaces
  13.5 Integration
    13.5.1 The Riemann Integral
    13.5.2 Review of Some Frequently Used Techniques for Calculating Antiderivatives
    13.5.3 Improper Riemann Integral
    13.5.4 Notes on Vector-Valued Riemann Integration
    13.5.5 The Lebesgue Integral
    13.5.6 Convex Functions
  13.6 Fourier Series
  13.7 Basics on Descriptive Statistics
  13.8 Excursion to Functional Analysis
    13.8.1 Banach Spaces
    13.8.2 Operators
    13.8.3 Finite-Dimensional Spaces
    13.8.4 Infinite-Dimensional Spaces
    13.8.5 Operators II
    13.8.6 Three Principles of Linear Analysis
    13.8.7 Spaces with an Inner Product (Pre-Hilbertian and Hilbert Spaces)
    13.8.8 Spectral Theory
. . . . . . . . . . . . 13.8.9 Pointwise Topology and Product Spaces . . . . . . . . . . . . . . . 13.8.10 Periodic Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
711 720 723 743 743 756 770 772 775 792 796 800 801 801 805 810 811 813 816 823 825 826 827
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831 Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835 General Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839 Symbol Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861
List of Figures
Fig. 1.1 Shaded, the union (i) and the intersection (ii) of families of sets . . . 2
Fig. 1.2 A partition of a set . . . 2
Fig. 1.3 The hypotenuse of a right triangle, and the incommensurability with the side . . . 12
Fig. 1.4 The golden cut . . . 14
Fig. 1.5 Finding $\sqrt{2}$ by halving . . . 23
Fig. 1.6 The graph of the absolute value function on [−1, 1] (Definition 37) . . . 25
Fig. 1.7 The positive and negative part functions . . . 26
Fig. 1.8 The floor and the ceiling functions . . . 27
Fig. 1.9 The pattern in the proof of Proposition 53 . . . 33
Fig. 1.10 The graph {(x, f(x)) : x ∈ (0, 1)} of f (proof of Proposition 61) . . . 36
Fig. 1.11 The mapping h from (−1, 1) onto R (proof of Proposition 61) . . . 36
Fig. 1.12 The mapping g from (0, ∞) onto R (proof of Proposition 61) . . . 37
Fig. 1.13 “Catching” the point $x_0$ . . . 50
Fig. 1.14 The construction in the proof of Theorem 109 (sets $U_n$ in grey) . . . 55
Fig. 2.1 How Achilles and the tortoise proceed . . . 74
Fig. 2.2 The partial sums $B_n$ of an alternating series and the sum B (Corollary 183) . . . 84
Fig. 2.3 The difference $s_{p,q} - s_{n,m}$ (proof of Proposition 208) . . . 93
Fig. 2.4 Getting $a_{n+1,m+1}$ by subtracting $s_{n+1,m} - s_{n,m}$ from $s_{n+1,m+1} - s_{n,m+1}$ . . . 93
Fig. 2.5 A particular “summation method” (i.e., a particular function ϕ) in Proposition 212 . . . 94
Fig. 2.6 Two functions that approximate e (Proposition 216) . . . 100
Fig. 2.7 Inequalities (2.50) . . . 107
Fig. 3.1 The first two steps in the construction of the Cantor ternary set C . . . 126
Fig. 3.2 A tree representation of the Cantor ternary set; 0 points to the left, 1 to the right . . . 127
Fig. 3.3 Elements in the $\varepsilon_1, \ldots, \varepsilon_n$ intervals written using the base-3 expansion (Remark 281) . . . 129
Fig. 4.1 The graph of the function $x^2 - x + 1$ on [0, 1] . . . 136
Fig. 4.2 The graph of the function (x − 1)/(x + 1) on [−10, 10] . . . 136
Fig. 4.3 The characteristic function of the set A . . . 138
Fig. 4.4 The function $x^2$ is even, the function $x^3$ is odd . . . 139
Fig. 4.5 The limit of a function f at a point $x_0$ may be different from $f(x_0)$ . . . 141
Fig. 4.6 The function f in Example 4.1.2.1 . . . 142
Fig. 4.7 The signum function (Eq. (4.2)) . . . 142
Fig. 4.8 The function x + (1/x) on [−1, 0) ∪ (0, 1] (with partial y-range) (Example 4.1.2.1) . . . 145
Fig. 4.9 At $x_0$, f is continuous, g discontinuous . . . 148
Fig. 4.10 The function 1/x on the interval [−10, 0) ∪ (0, 10] (the range limited to [−10, 10]) . . . 148
Fig. 4.11 The preimage of (1, 2) by $f(x) := x^2$ (Remark 324) . . . 150
Fig. 4.12 The graph of f in [−10, 10] (proof of Corollary 332) . . . 153
Fig. 4.13 The example in Remark 338 . . . 155
Fig. 4.14 The intermediate value theorem . . . 155
Fig. 4.15 The function $x^2$ on the interval [0, 1] and the argument in Example 4.1.3.2 . . . 158
Fig. 4.16 The graph of $\sqrt{x}$ on [0, 10] (Example 4.1.3.3) . . . 159
Fig. 4.17 The derivative of f at a is the limit of the slopes of the chords . . . 160
Fig. 4.18 The closer we focus on f, the closer f looks, locally, as a translate of a linear function . . . 161
Fig. 4.19 The function |x| (Example 4.1.4.4) . . . 164
Fig. 4.20 The function in Example 359 and its first derivative . . . 164
Fig. 4.21 A function with a local minimum and maximum at 0 (Remark 361) . . . 166
Fig. 4.22 Some local extrema of f . . . 166
Fig. 4.23 At the nonextremum point c = 0 the derivative is 0 . . . 167
Fig. 4.24 Rolle’s theorem . . . 168
Fig. 4.25 Lagrange’s Mean Value Theorem . . . 169
Fig. 4.26 The function f in Remark 372, with f(0) = 1/2 . . . 170
Fig. 4.27 For increasing x, slopes decrease near $x_1$ and increase near $x_2$ . . . 171
Fig. 4.28 The plot of the Riemann function (Example 4.3) . . . 176
Fig. 4.29 The trigonometric functions . . . 177
Fig. 4.30 The trigonometric functions sin x and cos x on [−2π, 2π] . . . 178
Fig. 4.31 Adding angles α and β . . . 178
Fig. 4.32 The proof of Corollary 382 . . . 179
Fig. 4.33 Computing the trigonometric functions at some angles . . . 180
Fig. 4.34 The function tan x and its derivative on (−π/2, π/2) . . . 184
Fig. 4.35 The function arctan x and its derivative on [−10, 10] (Example 395) . . . 184
Fig. 4.36 The function arcsin x and its derivative (Example 396) . . . 185
Fig. 4.37 The first steps in the construction of the Lebesgue singular function S . . . 187
Fig. 4.38 The Lebesgue singular function S (i.e., the devil’s staircase) . . . 187
Fig. 4.39 One of the step functions $s_n$ (Proposition 408) . . . 192
Fig. 4.40 The function in Example 429 on [0, 2/π] . . . 203
Fig. 4.41 The function sin x as the difference of two increasing functions on [0, 2π] . . . 205
Fig. 4.42 There are “few” horizontal tangent lines (Lemma 440) . . . 208
Fig. 4.43 The function $\sqrt{x}$ and a linear function Cx (for C > 0) on [0, 1] . . . 211
Fig. 4.44 The graph of f and f′ in Example 4.5.8.3 . . . 212
Fig. 4.45 The graphs of f and f′ (its range truncated) in Example 4.5.8.4 . . . 212
Fig. 4.46 Hierarchy of some classes of functions . . . 213
Fig. 5.1 (a) The first seven functions in Example 454. (b) The pointwise limit of the sequence . . . 216
Fig. 5.2 Approximating the Riemann function . . . 219
Fig. 5.3 The functions $f_n$, after some n, are in the shaded region (uniform convergence) . . . 220
Fig. 5.4 The first four elements in the sequence of functions in Remark 470 . . . 223
Fig. 5.5 The first four functions in both sequences (Example 471) . . . 223
Fig. 5.6 The plot of the function f in (5.5) . . . 225
Fig. 5.7 A nondifferentiable uniform limit of differentiable functions . . . 228
Fig. 5.8 The function φ and the two first functions $f_1$ and $f_2$ (Definition 481) . . . 230
Fig. 5.9 Three steps in building the Takagi–van der Waerden function in [0, 1] . . . 232
Fig. 5.10 The graph of the Takagi–van der Waerden function on [0, 1] . . . 232
Fig. 5.11 Zooming on the graph of the Takagi–van der Waerden function . . . 233
Fig. 5.12 The first polynomials in Lemma 484, and the limit function |x| . . . 233
Fig. 5.13 The function exp x and its first four Taylor polynomials at 0 . . . 240
Fig. 5.14 The functions sin x and cos x and their first six Taylor polynomials at 0 . . . 241
Fig. 5.15 The function ln (1 + x) and its first five Taylor polynomials at 0 . . . 242
Fig. 5.16 Examples for Corollary 507 (in all cases, $x_0 = 0$) . . . 247
Fig. 5.17 Building the sequence of approximations in Newton’s method . . . 249
Fig. 5.18 The function f and four approximations on (−1, 1) (Example 512.1) . . . 253
Fig. 5.19 Five approximations to $\sum_{n=1}^{\infty} \frac{x^n}{n}$ on [−1, 1) (Example 512.2) . . . 253
Fig. 5.20 Five approximations to $\sum_{n=1}^{\infty} \frac{x^n}{n^2}$ on [−1, 1] (Example 512.3) . . . 254
Fig. 5.21 (a) The first five Taylor polynomials of $(1 - x)^{-1}$ at x = 0. (b) The first four Taylor polynomials of $(1 + x^2)^{-1}$ at x = 0 (Example 5.2.2.1) . . . 258
Fig. 5.22 The function f in Example 5.2.2.2 . . . 259
Fig. 5.23 The function in Example 527 . . . 261
Fig. 5.24 The graph of the exponential function on the interval [−2, 2] . . . 263
Fig. 5.25 The functions exp x and ln x on the interval [−3, 3] . . . 267
Fig. 5.26 Inequalities (5.81) . . . 271
Fig. 5.27 The function (1 + x)/(1 − x) on (0, 1) (Remark 540) . . . 271
Fig. 5.28 Using a logarithmic table to find the product . . . 272
Fig. 5.29 Computing 2 times 3 on a slide rule . . . 272
Fig. 5.30 The hyperbolic functions sinh x, cosh x, and tanh x, on [−5, 5] . . . 274
Fig. 5.31 The trigonometric functions sin x, cos x, and tan x (its OX and OY scale are different) . . . 275
Fig. 5.32 The inverse trigonometric functions on their domains (arctan x on [−10, 10]) . . . 278
Fig. 5.33 Graphs of $(1 + x)^{\alpha}$ for several α’s . . . 279
Fig. 6.1 Three distances in $R^2$ . . . 284
Fig. 6.2 The distance from a point x to a set A . . . 289
Fig. 6.3 A uniformly continuous non-Lipschitz function on [0, 1] . . . 291
Fig. 6.4 A homeomorphism from (0, 1) onto R (Example 562.1) . . . 292
Fig. 6.5 A homeomorphism from C0 onto R (Example 562.2) . . . 292
Fig. 6.6 f defined on F := [0, 1], g on M := R, and K = 3 (Lemma 567) . . . 295
Fig. 6.7 Proof of Proposition 576: functions ϕ(x) for some x’s (here M = R and $x_0 = 0$) . . . 302
Fig. 6.8 Covering a Polish space and the mapping φ (Theorem 596) . . . 310
Fig. 6.9 The construction in Proposition 606 . . . 314
Fig. 6.10 The first four functions in Example 608.12 . . . 316
Fig. 6.11 The first steps of the construction in Proposition 635 for finite A . . . 326
Fig. 6.12 Approximating a function f first by a continuous piecewise linear function p and then by a function not in $F_n$ (the construction in 6.9.2.1) . . . 331
Fig. 6.13 The function f(x) := 1 + x from R onto R has no fixed point. The dashed line is the diagonal . . . 334
Fig. 6.14 A continuous function from [0, 1] into itself has fixed points (Proposition 651) . . . 334
Fig. 6.15 The graphs (in bold) of a contraction (a), and a noncontraction (b), and the iterations (6.16). The dashed line is the diagonal . . . 336
Fig. 6.16 Each $f_n$ has a fixed point, f does not . . . 337
Fig. 7.1 Approximating the area with “inscribed” rectangles . . . 340
Fig. 7.2 Approximating the area by using Riemann sums . . . 341
Fig. 7.3 Two functions giving the same value $\frac{f(a)+f(b)}{2}$ (but not the same average!) . . . 341
Fig. 7.4 Two approaches to the area: Upper-lower sums (solid horizontal lines) and tagged sums (dashed horizontal lines) . . . 343
Fig. 7.5 Upper and lower Riemann sums . . . 344
Fig. 7.6 The effect on the Riemann lower sum of refining the partition . . . 344
Fig. 7.7 The average of the function $x^2$ on the interval [0, 1] . . . 345
Fig. 7.8 The picture for the proof of the Fundamental Theorem of Calculus for a continuous function . . . 356
Fig. 7.9 The Fundamental Theorem of Calculus for the function f(x) = constant . . . 357
Fig. 7.10 The Fundamental Theorem of Calculus for the function f(x) = x . . . 358
Fig. 7.11 f ∈ R[0, 1], although $F(x) := \int_0^x f$ is not differentiable . . . 358
Fig. 7.12 The function F in Remark 686.3, and its derivative (the graph is truncated between y = −3 and y = 3) . . . 360
Fig. 7.13 The function φ on [0, 1/8] in the construction of Volterra’s function . . . 361
Fig. 7.14 (iii) is the basic ingredient in building Volterra’s function . . . 362
Fig. 7.15 The function F on the first central open interval (first stage) . . . 362
Fig. 7.16 First four polygonals in the Cauchy–Peano construction . . . 364
Fig. 7.17 The functions $\psi_0$, $\psi_0$, and $\psi_{1/2}$ on (−1, 1) (Remark 688) . . . 366
Fig. 7.18 The three first functions $f_n$ building the devil staircase . . . 374
Fig. 7.19 Change of variable for an increasing function G . . . 376
Fig. 7.20 The function $f(x) = \sqrt{1 - x^2}$ on [0, 1] . . . 378
Fig. 7.21 The function h on [−1, 0) in Remark 707 . . . 380
Fig. 7.22 Improper Riemann integrals of the first class . . . 381
Fig. 7.23 Improper Riemann integrals of the second class . . . 382
Fig. 7.24 Plotting sin (x)/x on [0, 50] (Example 713) . . . 383
Fig. 7.25 The two functions in Example 715 (part of the range of the first one, part of the range and the domain of the second one) . . . 385
Fig. 7.26 The upper and lower Riemann sums in the proof of Theorem 716 . . . 385
Fig. 7.27 The function under the integral sign in Example 719, and its integral . . . 387
Fig. 7.28 The function under the integral sign in Example 720 . . . 387
Fig. 7.29 Divisions for the Lebesgue integral . . . 388
Fig. 7.30 A step function . . . 390
Fig. 7.31 An example related to Fatou’s lemma (Remark 746) . . . 400
Fig. 7.32 The three first step functions $s_n$ in Example 756 . . . 404
Fig. 7.33 The n- and m-regularization of a function f (proof of Theorem 763) . . . 407
Fig. 7.34 A sketch of the three first functions in Remark 735.2 . . . 415
Fig. 7.35 The functions F, $h_2$, and $h_3$ in the proof of Theorem 791 . . . 424
Fig. 7.36 The graph of the function in Remark 793 and of its derivative . . . 427
Fig. 7.37 The function $f(x) = x^{s-1} e^{-x}$ for s = 0.1, 0.5, 1, 2, and 3 in Example 803 . . . 431
Fig. 7.38 The Gamma function . . . 432
Fig. 7.39 The function f(x) = (ln (1 − x))/x in Example 804 . . . 434
Fig. 7.40 Several functions in Example 805 . . . 435
Fig. 7.41 The function in Example 806 . . . 435
Fig. 8.1 The convex hull (in grey) of a set S . . . 440
Fig. 8.2 The graph of a convex function on I := [−1, 1]; the shaded region is part of its epigraph . . . 440
Fig. 8.3 slope(A,B) ≤ slope(A,C) ≤ slope(B,C) . . . 441
Fig. 8.4 A convex function discontinuous at points a and b . . . 442
Fig. 8.5 The argument for the boundedness for f and the existence of limit at b (Remarks 812.3 and 812.8) . . . 443
Fig. 8.6 The unique subtangent at $x_1$, where f is differentiable, several subtangents at a “corner” $x_0$, where f is not . . . 444
Fig. 8.7 The three intervals in Remark 812.7 . . . 445
Fig. 8.8 Two non-Lipschitz convex functions on [0, 1] (Remark 814) . . . 446
Fig. 8.9 The proof of the sufficient condition in Proposition 819 . . . 448
Fig. 8.10 The function $f(x) := x^4$ is strictly convex, while f″(0) = 0 . . . 448
Fig. 8.11 The functions exp x and − ln x on the interval [−3, 3] . . . 451
Fig. 8.12 Some powers $x^r$ on [0, 1] (for 0 < r < 1 those functions are not convex) . . . 452
Fig. 9.1 Some of the functions in Lemma 832 . . . 458
Fig. 9.2 Changing the variable (Remark 836) . . . 463
Fig. 9.3 It is (quite) clear why the Riemann–Lebesgue Lemma holds . . . 464
Fig. 9.4 The Dirichlet kernel in [−π, π] for m = 0, 1, 2, 3, 4 . . . 466
Fig. 9.5 The graph of $\frac{1}{t/2} - \frac{1}{\sin(t/2)}$ on [−6, 6] (proof of Theorem 843) . . . 468
Fig. 9.6 The 2π-periodic extension of $f(x) = x + x^2$ on [−π, π] . . . 472
Fig. 9.7 Some partial sums for the 2π-periodic expansion of $f(x) = x + x^2$ on [−π, π] . . . 472
Fig. 9.8 The 2π-periodic extension of $f(x) = x + x^2$ on [0, 2π] . . . 473
Fig. 9.9 The Fejér kernel (Definition 857) in [−π, π] for m = 0, 1, 2, 3, 4 . . . 476
Fig. 9.10 Two different extensions of f in Example 862 . . . 479
Fig. 9.11 The kernel K in Remark 870 . . . 486
Fig. 10.1 The probability density function of the random variable S . . . 491
Fig. 10.2 Two dartboards . . . 493
Fig. 10.3 The probability density and distribution functions of the two-point distribution . . . 499
Fig. 10.4 The probability density function $f_{20}$ for the binomial distribution for several p’s . . . 500
Fig. 10.5 A distribution function F and its probability density function f . . . 502
Fig. 10.6 The density function of a normal distribution with mean 4 and variance 3 on [−20, 20] . . . 504
Fig. 11.1 Two equivalent norms on $R^2$ (inclusions (11.4)) . . . 508
Fig. 11.2 The first terms of a bounded sequence in X with an unbounded image by F (Example 906) . . . 512
Fig. 11.3 The closed unit ball in the norm $\|\cdot\|_1$ of $R^3$ (proof of Theorem 908) . . . 514
Fig. 11.4 The construction in Lemma 911 . . . 517
Fig. 11.5 The construction of an Auerbach basis $\{e_i; f_i\}_{i=1}^{2}$ in $R^2$ for a given norm . . . 518
Fig. 11.6 The construction of an Auerbach basis $\{e_i; f_i\}_{i=1}^{3}$ in $R^3$ for a given norm . . . 519
Fig. 11.7 The inductive construction in (1) in the proof of Theorem 916 . . . 520
Fig. 11.8 The construction in (3) in the proof of Theorem 916 . . . 521
Fig. 11.9 Two closed hyperplanes are always linearly isomorphic (Remark 918.3) . . . 522
Fig. 11.10 Theorem 926 . . . 528
Fig. 11.11 The construction in the proof of Theorem 926 . . . 529
Fig. 11.12 The construction in the proof of Corollary 927 . . . 529
Fig. 11.13 The hyperplane H supports C at $c_0$ . . . 530
Fig. 11.14 A supporting functional (Corollary 931 and Proposition 933) . . . 531
Fig. 11.15 The graph of the function (11.16) on [−1, 1] × [−1, 1] (Example 938) . . . 534
Fig. 11.16 The graph of the function (11.17) on [−1, 1] × [−1, 1] (Example 939) . . . 535
Fig. 11.17 Balls of a (a) non-Gâteaux (b) Gâteaux differentiable norm at $x_0$ (Remark 940.4) . . . 537
Fig. 11.18 The norm is Fréchet differentiable at $x_0$ (Remark 940.5) . . . 538
Fig. 11.19 Two supporting functionals to $B_{\ell_\infty^2}$ (at (0, 1) and (1, 1)) (Example 943) . . . 539
Fig. 11.20 In bold, the closed unit ball in $\ell_4^2$ (left) and in $\ell_{4/3}^2$ (right), the starting point $x_0$ and the computed point (a, b) (Example 944). In the picture, dual balls share the same dash-style . . . 540
Fig. 11.21 The helix in Remark 945 . . . 541
Fig. 11.22 Two projections P and Q onto Y, with $\|P\| = 1$ and $\|Q\| > 1$ (in gray, the image of $B_X$) . . . 542
Fig. 11.23 A projection of norm 1 onto the one-dimensional subspace span {$x_0$} . . . 543
Fig. 11.24 The construction of a projection of norm almost 2 onto $f^{-1}(0)$ . . . 543
Fig. 11.25 The parallelogram equality . . . 552
Fig. 11.26 In an inner product space, the sphere does not contain segments (Remark 960) . . . 553
Fig. 11.27 In an inner product space, a subspace F and its orthogonal complement $F^{\perp}$ . . . 553
Fig. 11.28 The Pythagorean Theorem (Eq. (11.27)) . . . 553
Fig. 11.29 Searching for the point at minimum distance (Lemma 967.1) . . . 556
Fig. 11.30 Continuity of the metric projection mapping $P_C$ (1(d) in Lemma 967) . . . 558
Fig. 11.31 The closest point $x_0$ to x in a subspace F . . . 558
Fig. 11.32 Decomposing X into an “orthogonal” direct sum, and the associated projections (Theorem 969 and Corollary 970) . . . 559
Fig. 11.33 The sum of the two first summands in the Fourier series of x (Theorem 977) . . . 562
Fig. 11.34 The first elements of the Haar basis . . . 567
Fig. 11.35 The supporting functional in a real Hilbert space (Proposition 990) . . . 571
Fig. 11.36 The mappings and vectors in Corollary 994. The conclusion is that $j(H) = H^{**}$ . . . 573
Fig. 11.37 The norm of the Hilbert space $(R^2, \langle\cdot,\cdot\rangle_2)$ is Fréchet differentiable out of 0, and its derivative has norm 1 . . . 574
Fig. 11.38 The function f and its perturbation (Theorem 1027) . . . 585
Fig. 11.39 Fermat’s Theorem 362 “almost” holds (Corollary 1029) . . . 587
Fig. 11.40 The three points in the proof of Lemma 1031 . . . 588
Fig. 11.41 The construction in Remark 1054.1 . . . 599
Fig. 11.42 The construction of φ in Remark 1054.2 . . . 600
Fig. 11.43 The function in Example 1 . . . 601
Fig. 11.44 Connections between periodic distributions and their Fourier coefficients sequence . . . 604
Fig. 11.45 The six first partial sums of the Fourier series for the distribution $\delta_0$ (Remark 1059) (vertical scales are different) . . . 607
Fig. 12.1 The elements $e^{ix}$ for x ∈ R . . . 627
Fig. 13.1 Construction of the golden cut and the golden ratio (Exercise 13.24) . . . 636
Fig. 13.2 The Riemann sums in Exercise 13.135 for p = 2 and n = 5 . . . 663
Fig. 13.3 First steps in the construction of C × C (Exercise 13.157) . . . 669
Fig. 13.4 The function in Exercise 13.178 . . . 672
Fig. 13.5 The function in Exercise 13.172 . . . 674
Fig. 13.6 A fragment of the graph of the function $\sqrt[3]{x}$ (see Exercise 13.193) . . . 678
Fig. 13.7 A fragment of the graph of the function in Exercise 13.195 . . . 679
Fig. 13.8 A schema of the assumption in the hint of Exercise 13.204 . . . 680
Fig. 13.9 (a) f on [0, 1] has a fixed point, (b) g on (0, 1) has no fixed points (Exercise 13.205) . . . 681
Fig. 13.10 The function $x^2$ and its tangent at 1 (Exercise 13.208) . . . 682
Fig. 13.11 The function $x^x$ on (0, 1] . . . 682
Fig. 13.12 The reason why Thales circle gives the right answer (Exercise 13.215) . . . 684
Fig. 13.13 Three functions in Exercise 13.216 . . . 684
Fig. 13.14 The function in Exercise 13.218 . . . 685
Fig. 13.15 The function in Exercise 13.219 . . . 685
Fig. 13.16 The extension in Exercise 13.220 . . . 685
Fig. 13.17 The extension in Exercise 13.221 . . . 686
Fig. 13.18 A fragment of the graph of the function in Exercise 13.222 . . . 686
Fig. 13.19 The functions g and h in Exercises 13.223 and 13.224 . . . 687
Fig. 13.20 The function ϕ in Exercise 13.225 . . . 687
Fig. 13.21 The first five functions of the approximate identity in Exercise 13.227 (d) . . . 689
Fig. 13.22 The function in Exercise 13.230 for $x_0 = 1$ . . . 690
Fig. 13.23 The function in Exercise 13.232 and its asymptote at +∞
Fig. 13.24 The function in Exercise 13.234
Fig. 13.25 The function in Exercise 13.235
Fig. 13.26 The function f in Exercise 13.242
Fig. 13.27 The function $\frac{1}{3}x^3 - \frac{5}{2}x^2 + 6x + 1$ on the interval [1, 4] (Exercise 13.243)
Fig. 13.28 The function in Exercise 13.244
Fig. 13.29 The function f in Exercise 13.247
Fig. 13.30 The function f and its two first derivatives on [−1.5, 1.5] (Exercise 13.249)
Fig. 13.31 The first four iterates of the sine function (Exercise 13.261)
Fig. 13.32 The functions tan x and $x + \frac{x^3}{3}$ in Exercise 13.264
Fig. 13.33 The four first functions in Exercise 13.271
Fig. 13.34 The function $\sqrt{x}$ on [0, 1] (Exercises 13.284 and 13.285)
Fig. 13.35 The graph of the three functions in Exercise 13.286
Fig. 13.36 Extending a Lipschitz function (Exercise 13.290)
Fig. 13.37 Functions $g_1$ and $g_2$ in Exercise 13.297
Fig. 13.38 Approximating a continuous function by a Lipschitz function (Exercise 13.298)
Fig. 13.39 Functions $f_1$, $f_2$, $f_3$ for C := [−1, 1] (Exercise 13.302)
Fig. 13.40 The first five functions $f_n$ on [0, 3] in Exercise 13.307
Fig. 13.41 The first six functions $f_n$ on [0, 1] in Exercise 13.308
Fig. 13.42 The first five summands on [0, 5] in Exercise 13.310
Fig. 13.43 The first five functions $f_n$ in Exercise 13.315
Fig. 13.44 The first six functions $f_n$ on [0, 5] in Exercise 13.316
Fig. 13.45 The first five functions $f_n$ and $g_n$ in Exercise 13.317
Fig. 13.46 The function $\sum_{k=1}^{n} f_k$ in Exercise 13.320
Fig. 13.47 The function $\sqrt{x}$ and its degree-3 Taylor polynomial at $x_0 = 1$ (Exercise 13.325)
Fig. 13.48 The first four functions in the construction (Exercise 13.326)
Fig. 13.49 Superimposing the graphs (Exercise 13.326)
Fig. 13.50 Several tangent lines to $e^x$ (Exercise 13.334)
Fig. 13.51 Three unit balls in $R^3$
Fig. 13.52 The distance between f and g (Exercise 13.347)
Fig. 13.53 The distance between Bordeaux and Marseille in the metric given by (13.24)
Fig. 13.54 Some elements $g_\delta$ that approximate $\sqrt{x}$ (Exercise 13.379)
Fig. 13.55 The first steps in building {Is : s ∈ Z
Fig. 13.56 Fig. 13.57 Fig. 13.58 Fig. 13.59 Fig. 13.60 Fig. 13.61

n in N. Statement (1.4) below is another equivalent way to formulate the Finite Induction Principle. The equivalence with (1.3) should be clear. For details and a proof see item 3 in Sect. 12.1.

There is no strictly decreasing sequence in N. (1.4)

1.3 Integers
Die ganze Zahl schuf der liebe Gott, alles Übrige ist Menschenwerk. (God made the integers, all else is the work of man.) Leopold Kronecker
1 Real Numbers: The Basics
Besides the natural numbers, our most intuitive number system is the set Z of all the integer numbers (for brevity, we refer to them as the integers). It consists of all numbers
. . ., −3, −2, −1, 0, 1, 2, 3, . . . (1.5)
The elements in the list (1.5) are presented in the accepted natural order ≤ of Z (we say that Z is ordered by this particular order; this order induces on N its natural order). Integers strictly greater than 0 are called positive integers (and they form precisely the set N of all natural numbers), while negative integers are integers strictly smaller than 0 (accordingly, nonnegative integers are those integers greater than or equal to 0). It is important to note that there are subsets of Z having no least element (the set Z itself is an example; another example is the set of all negative integers). Thus, the natural order of Z is not a well order, see the definition after (1.3). Accordingly, there are strictly decreasing infinite sequences of integers. As in the case of natural numbers, every integer has a (unique) immediate successor, and now every integer also has an immediate predecessor.
On the set Z of all integers we have two algebraic operations, the sum, also called addition (denoted by +), and the product, also called multiplication (the product of two integer numbers a and b will be denoted by a · b, by (a)(b) or, if no confusion arises, by ab). These two operations, when restricted to N, induce the two already mentioned operations of sum and product, respectively, on N. The reader may find a list of the properties enjoyed by Z, endowed with these two operations and the order, in Sect. 12.2. They make Z a so-called ordered commutative ring with no zero divisors. For details, see Theorem 1067.
Given an integer z, define its absolute value |z| in the following way:
\[ |z| := \begin{cases} z, & \text{if } z \ge 0, \\ -z, & \text{if } z < 0. \end{cases} \tag{1.6} \]
Observe that |z| is a nonnegative integer. Observe, too, that z ≤ |z| for all z ∈ Z, and that, given z1 and z2 in Z, we have |z1 + z2| ≤ |z1| + |z2| and |z1 · z2| = |z1| · |z2|. The subset of Z consisting of all nonnegative integers endowed with the order induced by ≤ has, regarding its order, the same properties as N in its natural order.
To be precise, the mapping p that sends each integer z greater than or equal to 0 to z + 1 is an order isomorphism onto N (i.e., a bijective mapping p such that p(n) ≤ p(m) if, and only if, n ≤ m). In particular (see (1.3) above), every nonempty subset of the set of nonnegative integers has a least element. This is needed in the proof of the next result.
Proposition 2 (The division algorithm) Given two integers n and d, with d ≠ 0, there exist unique integers q and r such that n = dq + r and 0 ≤ r < |d|.
Proof To prove existence, we consider three separate cases. (i) Assume first that n ≥ 0 and d > 0. Let S := {n − dq : q ∈ Z}. By letting q = 0 in the definition of S we note that S contains nonnegative integers. Let
r be the least integer in S ∩ {z ∈ Z : z ≥ 0} (see the paragraph preceding the proposition). By definition, r ≥ 0. Assume that r ≥ d. We have r = n − dq for some q ∈ Z, so 0 ≤ n − d(q + 1) < r, and this contradicts the definition of r. This shows that (0 ≤) r < d. (ii) Assume now n < 0 and d > 0. Apply (i) to −n and d to get −n = dq + r, where q ∈ Z and 0 ≤ r < d. Then n = −dq − r. If r = 0, we are done. Otherwise, put n = −d(q + 1) + d − r. It is enough to note that 0 < d − r < d. (iii) Finally, assume that d < 0. If n ≥ 0 apply (i) to n and −d; otherwise apply (ii) to n and −d. We conclude that n = (−d)q + r, where 0 ≤ r < |d|, so n = d(−q) + r, and we are done.
To prove uniqueness, assume that n = dq + r = dq′ + r′, where q, q′, r, r′ are integers, 0 ≤ r < |d|, and 0 ≤ r′ < |d|. Then d(q − q′) = r′ − r, hence |d| · |q − q′| = |r′ − r| < |d|, and this forces q = q′ (and so r = r′).
The integer q in Proposition 2 is called the quotient (of the division algorithm), and the integer r there is called the remainder of the division algorithm. Note that, in general, applying the division algorithm to arbitrary integers n and d ≠ 0 does not produce a null remainder (for example, if n = 15 and d = 7, the quotient q is 2, and the remainder r is 1, since 15 = (7) · (2) + 1; observe that Proposition 2 gives uniqueness). In the case we get a null remainder r, the integer d is called a divisor of n, and we say that d divides n. Let us isolate this in a separate statement.
Definition 3 We say that a nonzero integer d divides an integer n if there exists an integer q so that n = dq. We say then that d is a divisor of n.
Note that if a natural number d divides a natural number n, then necessarily d ≤ n. An integer n is said to be even if 2 divides n. If not, the number is said to be odd. For example, 24 is even, and 25 is odd.
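The division algorithm of Proposition 2 can be tried out numerically. The following is a minimal Python sketch, not part of the original text; the function names `divide` and `divides` are hypothetical, chosen only for this illustration. It relies on Python's built-in `divmod`, which for a positive divisor already returns a remainder in the range required by the proposition, and handles a negative divisor as in case (iii) of the proof.

```python
def divide(n, d):
    """Return (q, r) with n == d*q + r and 0 <= r < |d|; d must be nonzero."""
    if d == 0:
        raise ValueError("the divisor d must be nonzero")
    # For a positive divisor, divmod returns a remainder with 0 <= r < |d|.
    q, r = divmod(n, abs(d))
    # Case (iii) of the proof: n = (-d)q + r becomes n = d(-q) + r.
    if d < 0:
        q = -q
    return q, r


def divides(d, n):
    """Definition 3: a nonzero integer d divides n when the remainder is 0."""
    return divide(n, d)[1] == 0


print(divide(15, 7))     # the example of the text: quotient 2, remainder 1
print(divide(-15, 7))
print(divide(15, -7))
print(divides(2, 24), divides(2, 25))
```

Because the quotient and the remainder are unique, any other correct implementation must return the same pairs.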
Definition 4 Given two integers a and b, not both of them 0, we say that a natural number c is the greatest common divisor of a and b if it is the largest among the natural numbers that divide simultaneously a and b. We denote c = gcd(a, b). If gcd(a, b) = 1 we say that a and b are relatively prime.
For example, we have gcd(8, 12) = 4, gcd(−8, 12) = 4, and gcd(4, 5) = 1. Thus, the numbers 4 and 5 are relatively prime.
Lemma 5 (Bézout) Given two integers a and b, not both of them 0, there exist two integers x and y such that ax + by = d, where d is the greatest common divisor of a and b.
Proof Form the set S := {|as + bt| : as + bt ≠ 0, s, t ∈ Z}. Let m be the least element of S (this element always exists, since S is obviously a nonempty set of natural numbers, so we may apply (1.3)). Replacing, if necessary, s and t simultaneously by −s and −t, we can write m = ax + by, where x, y ∈ Z. Use the division algorithm (Proposition 2) to write a = mq + r, where q is the quotient and r the remainder (so 0 ≤ r < m). Then a = (ax + by)q + r, hence r = a(1 − xq) − byq. Assume for a moment that r > 0; in this case r ∈ S. Since r < m, we reach a contradiction. We
must have then r = 0 and so m divides a. Similarly, we prove that m divides b. Let c be a natural number that is a common divisor of a and b. Then c divides ax + by (= m) (in particular, c ≤ m), so m is the greatest common divisor of a and b.
Definition 6 A natural number greater than 1 that has no positive divisors except the number 1 and itself is said to be a prime number. If this is not the case, the natural number is said to be composite (see footnote 2).
In other words, a natural number n greater than 1 is composite precisely when it has a positive divisor different from 1 and from n. For example, the number 11 is a prime number, and the number 12 is a composite number, since 12 = (2)(2)(3).
Lemma 7 (Euclid’s lemma) Let p be a prime number that divides a · b, where a and b are natural numbers. Then p divides a or p divides b (or both).
Proof Assume that p does not divide a. Then, since p is prime, we have gcd(a, p) = 1, and Bézout’s Lemma 5 implies that there exist integer numbers x and y such that ax + py = 1. Multiply both sides of the previous equality by b to get abx + bpy = b. Since p divides simultaneously abx and bpy, we get that p divides b.
Theorem 8 (The fundamental theorem of arithmetic) Every natural number greater than 1 may be expressed in a unique way (disregarding the order) as a product of prime numbers.
Proof Fix n ∈ N such that 1 < n. There are two mutually exclusive possibilities: either n is already a prime number (and then the argument stops) or n is a composite number. If this is the case then n has a divisor different from 1 and from n, so n = n1 n2 for two natural numbers n1 and n2, both of them greater than 1 (and strictly less than n, see the paragraph after Definition 3). If n1 and n2 are both prime, the argument stops. If not, we may express each factor that is a composite number as a product of two natural numbers greater than one.
Continuing in this way the argument eventually stops (if not, we would get an infinite strictly decreasing sequence of natural numbers, something impossible, see (1.4)). It is clear that when we stop, n has been written as a product of prime numbers. To prove uniqueness, assume that a natural number n has two expressions as a product of primes, say n = p1 p2 . . . pk = q1 q2 . . . qm (primes repeated according to their multiplicity). By cancelling equal terms at both sides of the previous equality, we may assume that no prime pi equals any prime qj. Since p1 divides n = q1 (q2 . . . qm), it follows from Lemma 7 that p1 divides q2 . . . qm (since p1 does not divide the prime q1, which is different from p1). We can proceed by finite induction to finally show that p1 must divide qm, and this is false.
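Lemma 5 and Theorem 8 are existence statements, and their proofs above do not say how to compute anything. As a complement, here is a minimal Python sketch, not part of the original text. It uses a different technique from the least-element argument in the proof of Lemma 5: the coefficients x and y are obtained by the classical extended Euclidean algorithm, and the prime factors by plain trial division. The function names `bezout` and `prime_factors` are hypothetical.

```python
def bezout(a, b):
    """Return (d, x, y) with a*x + b*y == d == gcd(a, b); a, b not both 0."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    # Loop invariants: old_r == a*old_x + b*old_y and r == a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        # Change all signs so that d is a natural number, as in Definition 4.
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def prime_factors(n):
    """Return the primes whose product is n (n > 1), in nondecreasing order."""
    if n <= 1:
        raise ValueError("n must be a natural number greater than 1")
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:      # divide out p as many times as it appears
            factors.append(p)
            n //= p
        p += 1
    if n > 1:                  # whatever is left has no divisor up to its root
        factors.append(n)
    return factors


d, x, y = bezout(8, 12)
print(d, x, y, 8 * x + 12 * y)
print(prime_factors(12))       # the factorization 12 = (2)(2)(3) of the text
```

The coefficients returned by `bezout` are one possible choice; Lemma 5 does not claim that x and y are unique.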
2 By convention, 1 is considered neither prime nor composite. One of the reasons is to keep the formulation of the fundamental theorem of arithmetic as in Theorem 8. Indeed, should 1 be considered prime, uniqueness of the expansion of any number as a product of primes (disregarding order) would fail, as 1 can be added to, or suppressed from, any such expansion without changing the product.
The Greek mathematician Euclid (who lived about 2300 years ago) proved the following important result.
Theorem 9 There are infinitely many primes among the natural numbers.
Proof Suppose on the contrary that the (nonempty) set P of prime numbers is finite, say P := {p1, p2, . . ., pn}. Consider the number
m = p1 p2 · · · pn + 1. (1.7)
Since m is greater than any number in P, we get m ∉ P. It follows that m is composite. By Theorem 8 the number m can be divided by a prime number. However, m cannot be divided by any of the numbers in P (by (1.7), the division of m by any of them leaves remainder 1), a contradiction.
An alternative proof of Theorem 9 is suggested in Exercise 13.8.
Remark 10 The Prime Number Theorem establishes the asymptotic distribution of prime numbers amongst the sequence of natural numbers in the following precise way: Let π(n) be the number of prime numbers less than or equal to n, for any n ∈ N, and let ln x be the natural logarithmic function (see Definition 535). Then (the concept of limit of a sequence will be introduced in Definition 121)
\[ \lim_{n \to \infty} \frac{\pi(n)}{n/\ln n} = 1, \]
i.e., the ratio of π(n) to n/ln n approaches 1 as n → ∞. This deep result was proved independently (both in 1896) by the French mathematician J. Hadamard and the Belgian mathematician Ch. J. de la Vallée-Poussin following profound ideas of the German mathematician B. Riemann. The conjecture was raised by the German mathematician C. F. Gauss, and independently by the French mathematician A. M. Legendre, and later by the German mathematician J. P. G. L. Dirichlet. In 1980, a relatively simple proof appeared [Ne80]. Still, it uses the Cauchy integral theorem from Complex Analysis. ®
Remark 11 Note that $3^2 + 4^2 = 5^2$, and by multiplying this equality through by the square of any natural number we get many integers p, q, and r such that $p^2 + q^2 = r^2$ (a triplet (p, q, r) like that is called a Pythagorean triplet). However, if n > 2 is an integer, there are no positive integers p, q, and r such that $p^n + q^n = r^n$. This was the famous Fermat conjecture, which was established only recently (in 1995) by the British mathematician A. Wiles, after 358 years of effort by many mathematicians. The conjecture was formulated by the French mathematician P. de Fermat in the margin of his copy of Diophantus’s Arithmetica, writing there “Cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet,” i.e., “I have a wonderful proof of this fact. However, this margin is too narrow to hold it.” ®
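The limit in Remark 10 can be explored numerically. The sketch below is not part of the original text; it counts primes with the sieve of Eratosthenes (a standard method chosen here for illustration, under the hypothetical name `prime_count`) and prints the quotient of π(n) by n/ln n for a few values of n. The sample values of n are arbitrary.

```python
import math


def prime_count(n):
    """pi(n): the number of primes <= n, computed with the sieve of Eratosthenes."""
    if n < 2:
        return 0
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Every multiple of p below p*p was already crossed out earlier.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return sum(is_prime)


for n in (10**2, 10**3, 10**4, 10**5):
    ratio = prime_count(n) / (n / math.log(n))
    print(n, prime_count(n), ratio)
```

Such a table only suggests the behaviour of the quotient; it proves nothing about the limit, and the approach to 1 is slow.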
1.4 Fractions and Rational Numbers
1.4.1 Introduction
For every nonzero integer n to have a multiplicative inverse (i.e., a reciprocal, that is, a number r such that rn = 1), we need to extend the integers to the set of all fractions. A fraction is defined by a couple (p, q) of integers, the second one a natural number (see Definition 12, where the couple (p, q) is written, as usual, p/q). In order to avoid the ambiguity that different couples may define the same entity, it is necessary to consider classes consisting of fractions, two couples (p, q) and (r, s) being in the same class precisely when ps = rq. Each resulting class is called a rational number (see Definition 13), and it is customary to choose as a representative of a class a couple (p, q) such that p ∈ Z, q ∈ N, and gcd(p, q) = 1 (see Definitions 4 and 12).
Definition 12 Fractions are expressions of the form $\frac{p}{q}$, where p ∈ Z, q ∈ N, and gcd(p, q) = 1. If |p| < q we say that the fraction is proper.
Definition 13 A rational number is a class of fractions. Two fractions p/q and r/s are in the same class whenever ps = rq. The set of all rational numbers is denoted by Q.
Each fraction in a given class is called a representative of the class. However, we agree in this text to choose as a representative of the class the unique expression p/q as in Definition 12, except when explicitly said otherwise. That this representative is, in fact, unique, can be seen easily by using the decomposition of p and q into products of prime numbers (see Exercise 13.13). For example, $\frac{-6}{14}$ and $\frac{-3}{7}$ belong to the same class, and the representative is $\frac{-3}{7}$. We agree to write $a = \frac{a}{1}$, so that integers can be thought of as a subset of the rational numbers.
Number is within all things. Pythagoras
On the set Q we define two algebraic operations (the sum and the product) that extend the usual operations on Z. To add (to multiply) two classes we choose the representative of each class and produce a new fraction according to:
\[ \frac{a}{b} + \frac{c}{d} = \frac{ad + cb}{bd}; \qquad \frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}. \tag{1.8} \]
The resulting class is the one that includes the output. Note that the order of the summands and of the factors is irrelevant (i.e., the sum and multiplication of fractions are commutative).
Proposition 14 Every fraction can be written as the sum of an integer and a nonnegative proper fraction.
Proof This follows from the division algorithm (Proposition 2). Indeed, given a ∈ Z and b ∈ N, we can find (unique) integers q and r such that a = bq + r and 0 ≤ r < b. Thus, $\frac{a}{b} = q + \frac{r}{b}$.
Given two fractions $\frac{a}{b}$ and $\frac{c}{d}$, we say that
\[ \frac{a}{b} \le \frac{c}{d} \ \text{ whenever } \ ad \le bc, \tag{1.9} \]
and put a/b < c/d in case a/b ≤ c/d and a/b ≠ c/d. Now we define an ordering on Q in the following way: given two rational numbers r1 and r2, choose representatives $\frac{a_1}{b_1}$ of r1 and $\frac{a_2}{b_2}$ of r2, and say that r1 ≤ r2 if, and only if, $\frac{a_1}{b_1} \le \frac{a_2}{b_2}$ (put r1 < r2 if r1 ≤ r2 and r1 ≠ r2). This ordering on Q induces on Z the natural ordering described above. Rational numbers greater than 0 are called positive, and rational numbers smaller than 0, negative. Now, unlike integers, no rational number has an immediate successor (nor an immediate predecessor). This is simple to prove: assume that $\frac{a}{b} < \frac{c}{d}$. A direct check shows that $\frac{a}{b} < \frac{ad + bc}{2bd} < \frac{c}{d}$.
For future reference, let us isolate a property of the distribution of natural numbers among the positive fractions.
Proposition 15 For every positive fraction $\frac{a}{b}$, there exists n ∈ N such that
\[ \frac{1}{n} \le \frac{a}{b} \le n. \]
Proof We may assume, without loss of generality, that a and b are natural numbers. Since a ≤ ab, we get
\[ \frac{a}{b} \le \frac{a}{1} = a. \]
Analogously, we find that
\[ \frac{b}{a} \le b. \]
Put n := max{a, b}. Then we have, simultaneously,
\[ \frac{a}{b} \le n \ \text{ and } \ \frac{b}{a} \le n. \]
By (1.9), the second inequality is equivalent to $\frac{1}{n} \le \frac{a}{b}$. The result follows.
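The rules (1.8) and (1.9) can be checked on concrete fractions. The following minimal sketch is not part of the original text; it uses Python's standard `fractions.Fraction` type, which happens to store each rational number through the representative of Definition 12 (positive denominator, numerator and denominator relatively prime). The sample values a, b, c, d and the helper names `between` and `natural_bound` are assumptions made only for this illustration.

```python
from fractions import Fraction

# Fraction reduces to the representative of Definition 12.
x = Fraction(-6, 14)
print(x)                          # -3/7

a, b, c, d = 2, 3, 5, 7           # arbitrary sample values
r1, r2 = Fraction(a, b), Fraction(c, d)

# Sum and product agree with (1.8).
assert r1 + r2 == Fraction(a * d + c * b, b * d)
assert r1 * r2 == Fraction(a * c, b * d)

# The order (1.9): a/b <= c/d exactly when ad <= bc (here b, d > 0).
assert (r1 <= r2) == (a * d <= b * c)


def between(s, t):
    """A rational number strictly between s < t, namely (ad + bc)/(2bd)."""
    return (s + t) / 2


def natural_bound(r):
    """Proposition 15: for a positive fraction r = a/b, return n = max(a, b)."""
    if r <= 0:
        raise ValueError("r must be a positive fraction")
    return max(r.numerator, r.denominator)


m = between(r1, r2)
print(r1, m, r2)
n = natural_bound(r1)
print(Fraction(1, n) <= r1 <= n)
```

The function `between` shows why no rational number has an immediate successor: applying it again to r1 and m gives yet another rational number between them.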
1.4.2 Powers and Radicals of Rational Numbers
A notational device useful for writing products of equal rational numbers, like (x)(x) · · · (x) (n times) (where x ∈ Q and n ∈ N), is to use exponents, and to write $x^n$ instead. It is clear then that $x^n \cdot x^m = x^{n+m}$ and $(x^n)^m = x^{n \cdot m}$ for n, m ∈ N. Looking for consistency, we should accept then that $x^1 \cdot x^0 = x^{1+0} = x^1$, so $x^0 = 1$ for all nonzero x ∈ Q. According to this, we should have, at least formally, that $x^n \cdot x^{-n} = x^0 = 1$. Analogously (again formally) $x^{1/n} \cdot x^{1/n} \cdots x^{1/n}$ (n times) $= x$. This makes it natural to attempt to define negative powers and radicals of fractions.
Fig. 1.3 The hypotenuse of a right triangle, and the incommensurability with the side
Definition 16 If x ∈ Q, x > 0, and n ∈ N, then
\[ x^{-n} := \frac{1}{x^n}, \tag{1.10} \]
and
$x^{1/n}$, denoted also by $\sqrt[n]{x}$, is a number (its existence in Q not guaranteed, see below) such that its n-th power is x. (1.11)
Remark 17 Combining (1.10) and (1.11), we can consider, for a rational number x > 0, powers $x^{n/m}$, where n and m are integers, and m ≠ 0. Indeed, $x^{n/m} = (x^{1/m})^n = (x^n)^{1/m}$. For example $4^{1/2} = 2$, $8^{2/3} = 4$, and $\left(\frac{4}{9}\right)^{-3/2} = \frac{27}{8}$. ®
A positive fraction x in the expression $x^q$, where q ∈ Q, is referred to as the base. Restricting ourselves to positive bases eliminates the following paradoxical fallacy.
\[ 1^2 = \left(\sqrt[4]{(-1)^2}\right)^2 = \left((-1)^{2/4}\right)^2 = \left((-1)^{1/2}\right)^2 = (-1)^{2/2} = -1 \]
Expressions as in (1.10) are consistent in Q, in the sense that they produce again elements in Q. However, even with positive bases, finding radicals as in (1.11) is in general impossible if we insist on remaining in the field Q. Indeed, let us consider the “number” $\sqrt{2}$ (= $2^{1/2}$). When you take $\sqrt{2}$ and multiply it by itself you get the number 2. It can also be interpreted as the length of the hypotenuse in an isosceles right triangle whose sides adjacent to the right angle have unit length (see Fig. 1.3). The “number” $\sqrt{2}$ is, however, not a fraction. This will be proved in Theorem 18. There are many legends about the discovery of this fact that involve Pythagoras of Samos. This new number (and many more) is included in a coherent system larger than Q, where all the considered computations will be meaningful. The elements of this larger system are called real numbers. It is obtained by adding to the rational numbers new entities, called irrational numbers, and will be presented in detail in Sect. 1.6. For the moment, let us prove that the length of the hypotenuse described above is not a rational number or, as historically has been said, “it is not commensurable with the side of the isosceles triangle.” This means that the irrationality of $\sqrt{2}$ manifests itself in the following geometrical manner. Consider
the triangle in Fig. 1.3. No matter how finely we mesh the sides (having length 1) we can never recover the length of the hypotenuse exactly in the given scale.
Theorem 18 There is no rational number whose square is 2.
Proof Assuming the contrary, we may write (for some p ∈ Z and q ∈ N)
\[ \sqrt{2} = \frac{p}{q}, \ \text{ with } \gcd(p, q) = 1. \]
Then $2 = p^2/q^2$, so that $2q^2 = p^2$ and thus 2 divides $p^2$ (i.e., $p^2$ is even). As 2 is prime, it follows from Euclid’s Lemma 7 that 2 must divide p, hence $p^2$ is a multiple of 4, and therefore $2q^2$ is also a multiple of 4. Thus $q^2$ is even and, by the previous argument, q is also even. This contradicts gcd(p, q) = 1. We conclude that $\sqrt{2}$ cannot be a fraction.
He is unworthy of the name of man who is ignorant of the fact that the diagonal of a square is incommensurable with its side. Plato
For a variation of Theorem 18, see Exercise 13.15. The result in Theorem 18 can be immediately generalized (with almost the same proof). Indeed, we have the following result.
Theorem 19 Let n be a natural number that is not a perfect square (i.e., it is not of the form $m^2$ for some natural number m). Then there is no rational number whose square is n.
Proof Assume that $\sqrt{n} = \frac{p}{q}$ for some p, q ∈ N. Squaring both sides of this equality we get
\[ q^2 n = p^2. \tag{1.12} \]
Let us consider the expansion of $p^2$, $q^2$, and n as products of primes (see Theorem 8). Since n is not a perfect square, in its expansion some factor appears an odd number of times. However, both $p^2$ and $q^2$ have expansions where each factor appears an even number of times. This is a contradiction, since the expansion of $p^2$ is obtained, according to (1.12), by putting together the expansions of $q^2$ and n.
Another example of the impossibility to solve a certain equation in Q is the following: There is no rational number x that satisfies
\[ \frac{1}{x} = \frac{x}{1 - x}. \tag{1.13} \]
Fig. 1.4 The golden cut (the points 0, x, and 1 marked on the unit interval)
This will be proved in Exercise 13.23. Geometrically, a positive number x satisfying (1.13) should produce a cut (called the golden cut) of the unit interval [0, 1] (see footnote 3) in such a way that the ratio of the length of the full segment (i.e., 1) to the length of the segment [0, x] (i.e., x) equals the ratio of the length of this last segment to the length of the remaining segment (i.e., 1 − x), see Fig. 1.4. The inverse of the golden cut x is called the golden ratio. The golden ratio has been used purposely in Architecture, for example by the French architect Le Corbusier, or in Painting, advocated by the Italian mathematician and art scholar L. Pacioli or in the work of the Spanish painter S. Dalí, as a source of aesthetic harmony. For more on this subject and a proof of the previous statement see Exercises 13.23 and 13.24.
1.5 Base Representation
In (1.9) we introduced an ordering in the set Q of rational numbers. Take a particular example: Imagine we wish to compare the two fractions
\[ \alpha = \frac{213957}{343157}, \ \text{ and } \ \beta = \frac{428171}{685858}. \]
It is not immediate which fraction is larger. We can record fractions in a positional number system (see Remark 1). In a decimal expansion the above fractions read as α = 0.623 · · ·, and β = 0.624 · · ·, and the size comparison is now immediate. The main idea behind the positional number system is simple, yet ingenious. We will first consider natural numbers. Choose a base b, i.e., a natural number greater than 1. We will have b digits to record a natural number. The digits, listed in increasing order of magnitude, are {0, 1, 2, . . ., b − 1}. We shall express every natural number by using the coefficients in its expansion in terms of successive powers of b, i.e., in terms of $\{1, b, b^2, b^3, \ldots\}$.
3 Intervals will be introduced in Definition 33. Here we only need the definition of a closed and bounded interval, i.e., a subset of R of the form [a, b] := {x ∈ R : a ≤ x ≤ b}, where a and b are real numbers such that a ≤ b. In particular, [0, 1], called the unit interval, is the set of all real numbers greater than or equal to 0 and, simultaneously, smaller than or equal to 1.
As a particular example, let b = 10 (the base of our popular decimal system; this will be our default base). The successive powers of 10 are {1, 10, 100, 1000, . . .}. For example, the natural number 373, written in base b = 10, is understood as $373 = (373)_{10} = 3(10^2) + 7(10) + 3(1)$. However, if $(373)_2$ denotes a number in base b = 2, this must be understood as $3(2^2) + 7(2) + 3$, i.e., the number $(29)_{10}$ (this only illustrates the positional rule: strictly speaking, the digits available in base 2 are just 0 and 1).
1.5.1 The Expansion of a Natural Number in Base b
We present a procedure (building the positional system mentioned in Remark 1) to write an arbitrary natural number n in an arbitrary base b (recall that by a base we always mean a natural number greater than 1). As an example, we choose n = 46 and set the base b = 3. (The reader will find how to proceed analogously with other numbers.) Recall the division algorithm (Proposition 2), which, when applied to natural numbers n and d, gives two unique integers q (the quotient) and r (the remainder), so that
0 ≤ r < d and n = dq + r. (1.14)
Observe that q is, for natural numbers n and d, a nonnegative integer, since q ≤ −1 implies n = dq + r ≤ −d + r < 0, a contradiction.
• Divide 46 by 3; it gives q = 15 and r = 1. Since q ≥ b, continue.
• Divide 15 by 3; it gives q = 5 and r = 0. Since q ≥ b, continue.
• Divide 5 by 3; it gives q = 1 and r = 2. Since q < b, stop.
Applying (1.14) repeatedly,
\[ 46 = (15)(3) + 1 = \bigl((5)(3) + 0\bigr)(3) + 1 = \Bigl(\bigl((1)(3) + 2\bigr)(3) + 0\Bigr)(3) + 1 = 1(3^3) + 2(3^2) + 0(3) + 1, \]
hence 46 = 1201 (base 3). Observe that we recorded the successive remainders from right to left, and finally put the last quotient as the first-to-the-left digit.
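The repeated divisions just described translate directly into a short program. The following Python sketch is not part of the original text; the names `to_base` and `from_base` are hypothetical. Digits are returned as a list of integers (most significant first), so that bases greater than 10 need no extra symbols.

```python
def to_base(n, b):
    """Digits of the nonnegative integer n in base b, most significant first."""
    if b < 2:
        raise ValueError("a base is a natural number greater than 1")
    if n < 0:
        raise ValueError("n must be nonnegative")
    remainders = []
    while n >= b:                 # "Since q >= b, continue."
        n, r = divmod(n, b)       # n becomes the quotient q of (1.14)
        remainders.append(r)
    remainders.append(n)          # the last quotient is the leading digit
    return remainders[::-1]       # remainders were recorded right to left


def from_base(digits, b):
    """Value of a digit list in base b (nested evaluation, as in the text)."""
    value = 0
    for digit in digits:
        value = value * b + digit
    return value


print(to_base(46, 3))             # the worked example: 1201 (base 3)
print(from_base([1, 2, 0, 1], 3))
print(to_base(373, 2))
```

The function `from_base` evaluates the nested form ((1)(3) + 2)(3) + 0)(3) + 1 from the inside out, which avoids computing the powers of b separately.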
1.5.2 The Expansion of a Rational Number in Base b
Let us now record a fraction in the positional number system. To this end we need negative powers of the base to reach smaller scales. We form the power sequence $\{b^{-1}, b^{-2}, b^{-3}, \ldots\}$. To record a positive fraction $\alpha = \frac{p}{q}$ in a base b we first write α = n + β where n is a nonnegative integer and β is a fraction less than 1. We represent n in base
b according to the algorithm in Sect. 1.5.1. We now describe the algorithm for a positional representation of β. Let us choose $\beta = \frac{5}{8}$ and keep the base b = 3. (The reader will find immediately how to adapt the procedure to any other proper fraction. We recommend that the reader carefully compare each of the following steps with the high school method of getting the decimal expansion of the same fraction, i.e., instead of multiplying by 3 one multiplies by 10.) At each stage, a remainder zero would stop the procedure.
• Multiply 5 by 3 and divide the result by 8. It gives a quotient q = 1 and a remainder r = 7.
• Multiply the remainder 7 by 3 and divide the result again by 8. This gives a quotient q = 2 and a remainder r = 5.
• Multiply the remainder 5 by 3 and divide the result again by 8. In this particular case we are back to the first step, so we enter a cycle.
We get $\frac{5}{8}$ = 0.121212 · · · (base 3). (Note that the decimal expansion is $\frac{5}{8}$ = 0.625. Observe, thus, that in one base the expansion may be terminating while in another it may be repeating.) Observe that this time we use the convention of writing the successive quotients as digits after the radix point (denoted in this text by “.”).
The following result shows that it is not by chance that, starting from a fraction, we reach either a finite (sometimes called terminating) or an eventually periodic base-b expansion (i.e., from a position on, a nonzero finite block is infinitely repeated; in this case the minimal block is written overlined, as in $5/8 = 0.\overline{12}$, see above). Conversely, it shows that every finite or eventually periodic expansion in some base b represents a fraction (see footnote 4). The result will be formulated for proper fractions. Observe that Proposition 14 allows us to write every fraction as the sum of an integer and a proper fraction.
Theorem 20 Consider a natural number b ≥ 2 as a base.
Then, every positive rational number $\alpha = \frac{p}{q}$ (where p and q are natural numbers, p < q, gcd(p, q) = 1, and q ≥ 2) has either a finite or an eventually periodic base-b expansion. In either case the repetition (or eventual zeros in the case of finite base expansion) occurs at the latest after q positions. Conversely, every base expansion in some base b ≥ 2 of the form 0.a1 a2 . . . that is either finite or eventually periodic, represents a proper fraction.
Proof The first part is a consequence of the algorithm described above in this subsection. Indeed, note that there are q possible remainders {0, 1, . . ., q − 1} to a division of any natural number p by the number q (see Proposition 2). Therefore
4 The proof, given below, of this last statement is somewhat incomplete. Its aim is just to provide a practical procedure to find the fraction associated to a given expansion in some base. The reader may note that we are assuming (without justification) that the algebraic operations of sum and product on Q, when applied to their expansions, give the usual rules for manipulating them. This has a simple proof when the expansions are finite, since finally they represent just a finite sum. However, to justify them in the case of nonterminating (hence periodic) expansions we need to ultimately rely on the concept of the sum of a series. This will be done, properly, in Example 171.1.
we have to reach a repetition in the remainders at the latest after q steps (if a remainder is zero, we stop the procedure, as mentioned). For the second part, let $x = 0.a_1 a_2 \cdots a_n \overline{b_1 b_2 \cdots b_m}$ (base b). Then
\[ b^{n+m} x = a_1 a_2 \cdots a_n b_1 b_2 \cdots b_m . \overline{b_1 b_2 \cdots b_m}, \tag{1.15} \]
and
\[ b^{n} x = a_1 a_2 \cdots a_n . \overline{b_1 b_2 \cdots b_m}. \tag{1.16} \]
By subtracting (1.16) from (1.15) we get $b^n (b^m - 1) x = a_1 a_2 \cdots a_n b_1 b_2 \cdots b_m - a_1 a_2 \cdots a_n$, hence
\[ x = \frac{a_1 a_2 \cdots a_n b_1 b_2 \cdots b_m - a_1 a_2 \cdots a_n}{b^n (b^m - 1)}. \]
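Both directions of Theorem 20 can be turned into small routines. The sketch below is not part of the original text, and the names `expand_proper_fraction` and `fraction_from_expansion` are hypothetical. The first function follows the multiply-and-divide steps of this subsection and records where each remainder first appeared, so that the repeating block can be read off; the second one applies the formula obtained by subtracting (1.16) from (1.15). Digits are handled as lists of integers.

```python
from fractions import Fraction


def expand_proper_fraction(p, q, b):
    """Return (prefix, block): base-b digits of p/q (0 <= p < q) after the
    radix point; block is the repeating part, empty for a finite expansion."""
    if not 0 <= p < q or b < 2:
        raise ValueError("need 0 <= p < q and a base b >= 2")
    digits = []
    first_seen = {}                   # remainder -> position where it appeared
    r = p
    while r != 0 and r not in first_seen:
        first_seen[r] = len(digits)
        d, r = divmod(r * b, q)       # multiply by the base, divide by q
        digits.append(d)
    if r == 0:
        return digits, []             # a remainder zero stops the procedure
    start = first_seen[r]             # the cycle begins where r first appeared
    return digits[:start], digits[start:]


def fraction_from_expansion(prefix, block, b):
    """The fraction 0.a1...an(b1...bm repeated) in base b."""
    def value(ds):
        v = 0
        for d in ds:
            v = v * b + d
        return v
    n, m = len(prefix), len(block)
    if m == 0:
        return Fraction(value(prefix), b ** n)
    return Fraction(value(prefix + block) - value(prefix), b ** n * (b ** m - 1))


print(expand_proper_fraction(5, 8, 3))    # the worked example in base 3
print(expand_proper_fraction(5, 8, 10))   # the same fraction in base 10
print(fraction_from_expansion([], [1, 2], 3))
```

Since at most q different remainders can occur, the loop in `expand_proper_fraction` runs at most q times, which is the bound stated in Theorem 20.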
For example, if we denote by $\left(\frac{p}{q}\right)_b$ the base expansion of the number $\frac{p}{q}$ in base b, we have $\left(\frac{1}{7}\right)_{10} = 0.\overline{142857}$, $\left(\frac{1}{3}\right)_{10} = 0.\overline{3}$, $\left(\frac{61}{495}\right)_{10} = 0.1\overline{23}$, where the overlined block is the one that repeats. Conversely, if $x = 3.45\overline{137}$ (base 10), then $10^5 x = 345137.\overline{137}$, and $10^2 x = 345.\overline{137}$, hence $10^2 (10^3 - 1) x = 345137 - 345$, so
\[ x = \frac{345137 - 345}{99900}. \]
For further examples see, e.g., Exercise 13.14. The number 0 has an expansion 0.000 · · · in any base. If q is a negative rational number, and n.a1 a2 . . . is the base-b expansion of −q, then the base-b expansion of q is, by definition, −n.a1 a2 . . . .
Remark 21 For a characterization of numbers having a finite base-b expansion in terms of the prime-number factorization of b, see Proposition 114. ®
Certainly, if we choose a base greater than 10, we need to use more than ten distinct digits. As an example, suppose we work with the base b = 11. We can borrow the decimal digits {0, 1, 2, . . ., 9} and add a new digit, written here as A, representing the number ten. The number $373 = (373)_{10}$ written in the base b = 11 is then given by $(373)_{10} = (30A)_{11}$ since $373 = (3)(11^2) + (0)(11) + 10$. Conversely, $(AA)_{11} = 10(11^1) + 10 = (120)_{10}$.
The base b = 2 is a special base. We have only two digits {0, 1} at our disposal. This fact alone makes 2 a perfect base for representing numbers on a computer, since capacitors have only two options: charged, and uncharged. Note that $(373)_{10} = (1)(2^8) + (0)(2^7) + (1)(2^6) + (1)(2^5) + (1)(2^4) + (0)(2^3) + (1)(2^2) + (0)(2) + (1)(1)$ and hence $(373)_{10} = (101110101)_2$.
A trade-off exists between the number of digits and the number of positions in a base representation. The more digits we have at our disposal the shorter the base representation is. Note that in a positional number system the number 0 has become a crucial computational tool for distinguishing, for example, the numbers 203 and 23. In mathematical formulas, on the other hand, fractions sometimes fit better than numbers in their positional notation, not to speak of the intrinsic loss of accuracy coming from truncation. This is why even mathematical assistant computer programs include fraction manipulation in their symbolic algorithms as a regular procedure.
Remark 22 Natural numbers that have a large number of divisors (relative to their size) have played, historically, a relevant role in easing computations. Indeed, a positional system having as a base such a composite number may tremendously help in this. The reader should be aware that computational complexity was, in the past, a burden, not only for scientific development but for everyday life. Financial transactions need division, and the choice of a base with many divisors may help. For example, take b = 60. It has many divisors, since $60 = 2^2 (3)(5)$. The division of a number by, say, 3, amounts to multiplying this number by 1/3. In base b := 60, the number 1/3 is just $0(60^0) + (20)60^{-1}$, and it has a finite number (only one) of digits after the radix point, since 20 is one of the digits of the system. However, in base b := 10, the number 1/3 has an infinite number of digits after the radix point (in this case called the decimal mark). An agent with a good memory (or a table of basic operations to help, maybe in the form of an “abacus”) may compute many complicated operations faster and more accurately by using base 60 than by using base 10.
This is one of the reasons why the number 60 was chosen as a measurement for angles and time; thus 60 became the number of seconds in a minute, and the number of minutes in an hour (returning to our example, 1/3 of an hour is just 20 minutes; if a decimal system had been used, it would be impossible to refer to this verbally in terms of minutes), and 360 became the angle measurement for a full rotation, related to the number of days (approximately) taken by the Earth in its full revolution around the Sun, or the return of the seasons, if you wish. For a precise argument to support what has been said here, see Proposition 114. ®
1.6 Real Numbers
1.6.1 The Definition of a Real Number
To accommodate numbers such as $\sqrt{2}$ (see Theorem 18) we need to extend the set of fractions (i.e., the set Q of rational numbers). We need a new number system that would include fractions together with numbers such as $\sqrt{2}$. This system will be denoted by R, and its elements called real numbers. There are several ways to introduce the system R. Essentially they reduce to two, the “axiomatic” and the “constructive.” By the “axiomatic way” we mean to “define”
the system of real numbers by saying that it is an abstract set having a certain number of basic properties (given by fiat and called “axioms”). The “constructive” way consists of using the elements in Q as the “building blocks” for the new entities (called “real numbers”). Once these new entities have been constructed, we must embark on the task of proving that the new system has the expected properties, that is, the properties we suspect it has “by experience.” The advantage of the axiomatic method is that we do not need to prove the basic properties, since these are attributed to the system in the form of axioms from the beginning. However, to appease the conscientious reader, there is the need to actually exhibit an object conforming to the structure, and then an appeal to the rational numbers and some manipulations on them are needed. On the other hand, the advantage of the constructive way is that the “existence” of the constructed system is shown in the process, once the reader is convinced of the “existence” of the set of rational numbers. We shall adopt the second approach, the “constructive way.” Even so, we need to mention that there are at least two common procedures to construct the real numbers from the rational numbers. In order to illustrate both of them, imagine first that you have at your disposal an electronic calculator. Press 2 and press the square-root key. Then (depending on your device) you get something like 1.4142135623730950488. This is not the true value (see Theorem 18). Indeed, the sequence $q_0 := 1$, $q_1 = 1.4$, $q_2 = 1.41$, $q_3 = 1.414$, $q_4 = 1.4142$, $q_5 = 1.41421$, . . . consists only of rational numbers, so we never reach $\sqrt{2}$. We may define the new number $\sqrt{2}$ precisely as the sequence $\{q_n\}_{n=0}^{\infty}$.
Not every sequence of rational numbers will define a new entity; we should restrict ourselves to, roughly speaking, sequences whose terms have mutual distances going to zero, a rational number q being represented by the sequence q, q, q, . . . (in this way, we are really “enlarging” the set Q). It is then enough to decide how to order such sequences, and how to add and multiply them, to endow the new set with some “structure,” in the sense that it will include the system Q with all its properties. For an account of this the reader may check, e.g., [Gof53, Chap. 3]. A second version of the constructive method (still using Q to define the new entities) is illustrated by the following procedure: the “number” $\sqrt{2}$ is used to divide Q into two sets,
L := {q ∈ Q : q < 0} ∪ {q ∈ Q : q ≥ 0, $q^2$ < 2}, and R := {q ∈ Q : q ≥ 0, $q^2$ ≥ 2}. (1.17)
In some sense, they leave $\sqrt{2}$ “in between” (the reader should understand that L was chosen for “left” and R for “right”). The particular couple (L, R) is said to be a cut in Q (more precisely, a Dedekind cut, see Definition 23, named after the German mathematician R. Dedekind), and this cut is (by definition) the real number $\sqrt{2}$. It is very important to notice that L and R were defined without using the mysterious “number” $\sqrt{2}$; otherwise the definition of $\sqrt{2}$ would have been circular, and so vicious. The new number (in this case, $\sqrt{2}$) is the given couple.
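Both constructions can be explored with exact rational arithmetic. The sketch below is only an illustration: the names `in_L` and `truncation` are hypothetical, and `math.isqrt` is used merely as a convenient way to obtain the decimal truncations $q_n$ described above.

```python
from fractions import Fraction
from math import isqrt

def in_L(q):
    """Membership test for the left set L of the cut (1.17)."""
    return q < 0 or q * q < 2

def truncation(n):
    """q_n: sqrt(2) truncated to n decimal digits, as an exact rational."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

for n in range(6):
    q = truncation(n)
    # q_n lies in L, while q_n + 10^(-n) already lies in R.
    print(n, q, in_L(q), in_L(q + Fraction(1, 10 ** n)))
```

Note that `in_L` never refers to $\sqrt{2}$ itself; it only squares rational numbers, which mirrors the non-circularity stressed in the text.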
1 Real Numbers: The Basics
We shall adopt this second approach, as in the next definition.
Definition 23 A real number is a pair (L, R) of sets of rational numbers which have the following three properties: (i) Both sets L and R are nonempty, and every rational number is in exactly one of the sets. (ii) Every rational number in the set L is smaller than every rational number in the set R. (iii) The set L has no largest element. The set of all such couples is called the real number system, and is denoted by R. The above pair (L, R) is called a Dedekind cut. So, R is the set of all Dedekind cuts. In short, a Dedekind cut will be called just a cut.
The set Q can be identified with a subset of R. To this end, to a rational number q we associate the cut (called a rational cut) ($L_q$, $R_q$) given by $L_q$ := {l ∈ Q : l < q} (hence $R_q$ = {r ∈ Q : q ≤ r}). We will define on R an order relation (≤) and two algebraic operations, called the sum (+) and the product (·), respectively:
We say that a cut (L, R) is less than or equal to another cut (L′, R′) whenever L ⊂ L′. In this case we write (L, R) ≤ (L′, R′). If this happens and, moreover, (L, R) ≠ (L′, R′) (i.e., L ≠ L′), we write (L, R) < (L′, R′). A cut (L, R) is said to be positive if 0 < (L, R). In other words, {l ∈ Q : l < 0} ⊂ L and, simultaneously, {l ∈ Q : l < 0} ≠ L.
Let (L, R) and (L′, R′) be two cuts. The sum (L, R) + (L′, R′) is the cut (L″, R″), where L″ := {l + l′ : l ∈ L, l′ ∈ L′}, and R″ := Q \ L″.
Let (L, R) and (L′, R′) be two positive cuts. We define the product (L, R).(L′, R′) as the cut (L″, R″) where L″ := ((−∞, 0] ∩ Q) ∪ {l.l′ : l ∈ L, l > 0, l′ ∈ L′, l′ > 0}. The product of the zero cut and any other cut is the zero cut, by definition. In the remaining situations, the definition of the product of two cuts reduces to the former cases by computing first, if necessary, the additive inverses to get positive cuts, and then defining, for cuts $c_1$ and $c_2$,
$$c_1 . c_2 := \begin{cases} -((-c_1) . c_2) & \text{if } c_1 < 0 \text{ and } 0 < c_2, \\ -(c_1 . (-c_2)) & \text{if } 0 < c_1 \text{ and } c_2 < 0, \\ (-c_1) . (-c_2) & \text{if } c_1 < 0 \text{ and } c_2 < 0. \end{cases}$$
For details see Sect. 12.4. The set R of all cuts, endowed with the two algebraic operations and the order defined there, is an order complete field (see definitions and the relevant properties in Sect. 12.4 and, in particular, Theorem 1070). These operations and this order, when restricted to Q, induce the usual sum, product, and order in Q. There are cuts that are not rational. Those are, precisely, the cuts (L, R) where R has no smallest element. The cut (L, R) described in (1.17) is not rational. This is proved after Definition 1069, and it is a consequence of Theorem 18.
From now on, we shall use a single symbol for a real number, although, eventually, we will return to the Dedekind cut description. Real numbers can be realized as points on a geometrical line in space (referred to as the real line): fractions can be placed on this line by dividing the unit interval [0, 1] := {x ∈ R : 0 ≤ x ≤ 1} into a finite number of equal segments and then by translating. Therefore, real numbers are referred to as points on the real line. The fact that the points on the real line can be put into a one-to-one and onto correspondence with the real numbers is called the Cantor–Dedekind axiom, named after the German mathematician G. Cantor and the aforementioned R. Dedekind.
1.6.2 The Expansion of a Real Number in Base b
Given a natural number b ≥ 2, let {0, 1, 2, . . ., b − 1} be the sequence of the b digits used in base b, ordered as shown. Let us describe a process that associates to each real number x := (L, R) a sequence of digits from {0, 1, 2, . . ., b − 1}. Assume, first, that the real number x := (L, R) is positive. This means that there exists a positive q ∈ Q such that q ∈ L (see the definition of order in Sects. 1.6.1 and 12.4.1). Thus, the set (N ∪ {0}) ∩ L is not empty, since it contains 0. It follows that there exists n ∈ N ∪ {0} such that n ∈ L and n + 1 ∈ R: otherwise, by the Finite Induction Principle, N ∪ {0} ⊂ L and so L = Q by Proposition 15. This shows R = ∅, a contradiction. In passing, let us single out the result that has been proved above, for future reference.
Proposition 24 Given a positive real number x := (L, R), there exists m ∈ N such that x ≤ m.
Returning to the process, observe that n < n + 1/b < n + 2/b < . . . < n + (b − 1)/b < n + 1. Thus we can find $a_1$ ∈ {0, 1, 2, . . ., b − 1} such that n + $a_1/b$ ∈ L and n + $(a_1 + 1)/b$ ∈ R. In the same way, we have n + $a_1/b$ < n + $a_1/b + 1/b^2$ < . . . < n + $a_1/b + (b - 1)/b^2$ < n + $(a_1 + 1)/b$, and we can find $a_2$ ∈ {0, 1, 2, . . ., b − 1} such that n + $a_1/b + a_2/b^2$ ∈ L and n + $a_1/b + (a_2 + 1)/b^2$ ∈ R. Continue in this way to find a sequence {$a_1$, $a_2$, . . . } in {0, 1, 2, . . ., b − 1}.
Definition 25 Let b ≥ 2 be a natural number, and let x ∈ R. If x > 0, write in base b the number n found in the former construction. Then, the expression n.$a_1 a_2 \cdots$, where {$a_1$, $a_2$, . . . } is the sequence found above, is called the expansion in base b of x. If x < 0, then proceed to find n and {$a_1$, $a_2$, . . . } for −x. The expression −n.$a_1 a_2 \cdots$ is called the expansion in base b of x. In both cases, “.” is said to be the radix point. If x = 0, its expansion in base b is just 0.
Remark 26 1. Note that if x ∈ Q, then the expansion of x in base b agrees with the one obtained in Sect. 1.5.
This is an immediate consequence of the division algorithm described there.
2. Numbers that have terminating or repeating base expansions in some base correspond precisely to rational numbers, i.e., to elements in Q (see Theorem 20). From this theorem it follows, too, that if a number has a terminating or repeating expansion in one base then the same happens in all bases. Consequently, numbers that have a nonterminating and non-eventually-periodic base expansion correspond precisely to irrational numbers. The set of all irrational numbers is denoted by P. ®
The following result shows that, conversely, every expansion n.$a_1 a_2 \cdots$ in a base b defines a real number. The case −n.$a_1 a_2 \cdots$ (see Definition 25) is treated similarly.
Theorem 27 Let b ≥ 2 be a natural number. Given an expression n.$a_1 a_2 \cdots$, where n ∈ N ∪ {0} and $a_k$ ∈ {0, 1, 2, . . ., b − 1} for all k ∈ N, put $q_0$ = n, and $q_k$ := n.$a_1 a_2 \cdots a_k$ (base b) for all k ∈ N. Then the couple (L, R), where L := $\bigcup_{k=0}^{\infty}$ {q ∈ Q : q < $q_k$} and R := Q \ L, is a Dedekind cut, and its expansion in base b is, precisely, n.$a_1 a_2 \cdots$.
Proof Each $q_k$ is a rational number. Obviously, L ≠ ∅ and, since $q_k$ ≤ n + 1 for all k ∈ N ∪ {0}, we have R ≠ ∅. Given l ∈ L and r ∈ R, there exists k ∈ N ∪ {0} such that l ∈ {q ∈ Q : q < $q_k$}. Since r ∉ {q ∈ Q : q < $q_k$} we get r ≥ $q_k$. All together, l < $q_k$ ≤ r, hence l < r. Let us show now that L has no largest element. Observe first that the sequence $\{q_n\}_{n=0}^{\infty}$ is increasing (i.e., $q_n$ ≤ $q_{n+1}$ for all n ∈ N, see Definition 134). Given l ∈ L, there exists, as before, k ∈ N ∪ {0} such that l ∈ {q ∈ Q : q < $q_k$}. Either $q_k$ = $q_m$ for all m > k (in which case L = {q ∈ Q : q < $q_k$}, a set with no largest element) or there exists m > k such that $q_k$ < $q_m$ (in which case l < $q_k$ and $q_k$ ∈ L, so l is not the largest element in L). All together, (L, R) is a Dedekind cut. It is obvious now, due to the way the sequence $\{q_k\}_{k=0}^{\infty}$ has been constructed, that the expansion in base b associated to (L, R) (see the beginning of this subsection and, in particular, Definition 25) is, precisely, n.$a_1 a_2 \cdots$.
As a particular example of the procedure described above, choose b = 2 and let us see that finding the binary expansion of a real number amounts to locating it on the real line through a (maybe infinite) process of halving intervals (see Footnote 3 above or Definition 33 below). Take again the number $\sqrt{2}$, defined as the Dedekind cut (L, R) in (1.17). Geometrically we proceed in the following way (see Fig. 1.5):
• Choose the origin and define a unit.
• Locate the points 1 and 2 on the real line and note that 1 ∈ L and 2 ∈ R. This implies that n = 1 in the binary expansion of $\sqrt{2}$, so $\sqrt{2}$ = 1. · · · .
• Divide the interval [1, 2] into two halves. Since 1.5 ∈ R, we get $a_1$ = 0, and the binary expansion of $\sqrt{2}$ is 1.0 · · · .
• Divide the interval [1, 1.5] again into two halves. Since 1.25 ∈ L we get $a_2$ = 1, so the binary expansion of $\sqrt{2}$ is 1.01 · · · .
• Continue this dyadic process of halving-and-then-choosing.
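The halving steps above, and more generally the digit-by-digit process of this subsection, can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: a positive cut is represented only by a membership test for its left set (here the hypothetical `in_L` for the cut (1.17)), and `expansion_digits` is an illustrative name.

```python
from fractions import Fraction

def in_L(q):
    """Left set of the cut (1.17) that defines sqrt(2)."""
    return q < 0 or q * q < 2

def expansion_digits(in_left, base, count):
    """Integer part n and the first `count` digits of a positive cut.

    `in_left(q)` must decide whether the rational q belongs to the left set.
    """
    n = 0
    while in_left(Fraction(n + 1)):      # largest n with n in L and n + 1 in R
        n += 1
    value = Fraction(n)
    step = Fraction(1)
    digits = []
    for _ in range(count):
        step /= base                     # width of the current subinterval
        digit = 0
        # advance while the next subdivision point is still in L
        while digit + 1 < base and in_left(value + (digit + 1) * step):
            digit += 1
        digits.append(digit)
        value += digit * step
    return n, digits

print(expansion_digits(in_L, 2, 8))   # binary: halving intervals
print(expansion_digits(in_L, 10, 8))  # decimal digits of the same cut
```

With `base=2` each pass of the outer loop is exactly one halving step: the midpoint is tested against L, and the digit records which half is kept.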
Fig. 1.5 Finding $\sqrt{2}$ by halving
After some iterations we get
$\sqrt{2} = (1.01101010000010011110011001100111111100111011110011001001000 \cdots)_2$ (1.18)
Since $\sqrt{2}$ is not a fraction (see Theorem 18) it follows from Theorem 20 that (1.18) is neither a terminating nor an eventually periodic expansion. In base b = 10, we have
$\sqrt{2} = (1.414213562373095048801688724209698078569671875376948073176679 \cdots)_{10}$,
and, again, an appeal to Theorems 18 and 20 shows that this base expansion is certainly neither terminating nor eventually periodic. Theorems 20 and 27 suggest a way to produce real numbers (as many as we wish, by slightly changing the procedure) that are not in Q, i.e., irrational numbers. To this end, consider, for example, the number x = 0.1010010001 . . . (base 10), where the number of 0’s between two consecutive 1’s is increased each time by one. Nowhere in the nonterminating decimal expansion do we reach an eventual block repetition. Therefore the number x must be an irrational number.
Remark 28 The base expansion interpretation of real numbers has a subtle problem. Consider, for example, the numbers z = 0.999 . . . and w = 1.0 written in base b = 10. Even though these expansions are not equal, the numbers (if you wish, the geometric points on the real line) they represent must be the same. The distance between the points z and w (i.e., the quantity |z − w|, where | · | denotes the absolute value function, see Definition 37) is nonnegative and less than any positive number. Therefore it must be zero. This phenomenon occurs only in the case of expansions whose digits are eventually 0 or expansions whose digits are eventually 9 (in the decimal case), or in general, expansions whose digits are eventually 0 or expansions whose digits are eventually b − 1 (in base b). ®
Remark 29 It is interesting to note (it will be seen later on, see Definition 594, Theorem 596, Example 1023, Exercises 13.50 and 13.392) that the set P of all irrational numbers behaves in many respects much better than that of the rational numbers, in a large part of mathematics. ®
Definition 30 The subset D of Q defined by D = {$k/2^n$ : k ∈ Z, and n ∈ N} is called the set of dyadic numbers.
Note that a real number x has a terminating binary expansion if and only if x is a dyadic number.
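For rational inputs, the equivalence just stated can be tested directly. The sketch below is only illustrative: `is_dyadic` and `binary_expansion_terminates` are hypothetical names, and the second function inspects at most `max_digits` binary digits, so it is reliable only for fractions whose denominator is below $2^{\text{max\_digits}}$.

```python
from fractions import Fraction

def is_dyadic(q):
    """True when q = k / 2**n, i.e. the reduced denominator is a power of two."""
    d = q.denominator
    return d & (d - 1) == 0

def binary_expansion_terminates(q, max_digits=64):
    """Run the digit process in base 2 and report whether it reaches zero."""
    frac = q - (q.numerator // q.denominator)  # fractional part in [0, 1)
    for _ in range(max_digits):
        if frac == 0:
            return True
        frac = (2 * frac) % 1                  # discard the digit just produced
    return frac == 0

for q in (Fraction(5, 8), Fraction(-3, 4), Fraction(1, 3), Fraction(7, 10)):
    print(q, is_dyadic(q), binary_expansion_terminates(q))
```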
1.6.3 The Extended Real Number System, Intervals
A useful notational device, which avoids splitting arguments about real numbers into several cases, is to extend the system R to include two special symbols, called +∞ and −∞, respectively. The conventions and manipulations with them are collected in the following definition.
Definition 31 The extended real number system consists of adding two symbols, +∞ and −∞, to R, which satisfy the following rules:
(i) If x ∈ R, then −∞ < x < +∞.
(ii) If x ∈ R, then x + ∞ = +∞, x − ∞ = −∞, and $\frac{x}{+\infty} = \frac{x}{-\infty} = 0$.
(iii) If x > 0, then x(+∞) = +∞, and x(−∞) = −∞.
(iv) If x < 0, then x(+∞) = −∞, and x(−∞) = +∞.
The extended real number system will be denoted by R ∪ {+∞} ∪ {−∞}, and, in short, by $R^*$.
Remark 32 The operations 0(+∞) or 0(−∞) are not allowed. ®
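Floating-point arithmetic offers a familiar, though only approximate, model of these conventions. The sketch below assumes Python floats (IEEE 754 doubles), where `math.inf` plays the role of +∞; the function name `rules_hold` is illustrative, and floats are of course not the real numbers of the text.

```python
import math

def rules_hold(x):
    """Check rules (i)-(iv) of Definition 31 for a nonzero finite float x."""
    inf = math.inf
    ok = -inf < x < inf                              # rule (i)
    ok = ok and x + inf == inf and x - inf == -inf   # rule (ii), sums
    ok = ok and x / inf == 0 and x / -inf == 0       # rule (ii), quotients
    if x > 0:
        ok = ok and x * inf == inf and x * -inf == -inf    # rule (iii)
    else:
        ok = ok and x * inf == -inf and x * -inf == inf    # rule (iv)
    return ok

print(rules_hold(2.5), rules_hold(-7.0))
# The product 0 * (+inf), excluded in Remark 32, gives "not a number" here.
print(math.isnan(0 * math.inf))
```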
Definition 33 Let a, b ∈ R be such that a ≤ b.
(i) If a < b and I := {x ∈ R : a < x < b}, then the set I is called a bounded open interval. We write I = (a, b).
(ii) If I := {x ∈ R : a ≤ x ≤ b}, the set I is called a bounded closed interval. We write I = [a, b].
(iii) If a < b and I := {x ∈ R : a < x ≤ b}, the set I is called a bounded left half open interval. We write I = (a, b].
(iv) If a < b and I := {x ∈ R : a ≤ x < b}, the set I is called a bounded right half open interval. We write I = [a, b).
In all these cases, the terminology “bounded” is in agreement with the concept introduced in Definition 41 below.
(v) By using the newly defined symbols +∞ and −∞, we shall sometimes write intervals with infinite endpoints, a convention with the following meaning: Given a ∈ R, we write [a, +∞) for the set {x ∈ R : a ≤ x}; this set will be called a left half closed unbounded interval; similarly for (a, +∞), (−∞, a], or (−∞, a). The set (−∞, +∞) coincides with R.
Remark 34 According to Definition 33, intervals are considered to be nonempty subsets of R, unless otherwise stated. ®
Fig. 1.6 The graph of the absolute value function on [−1, 1] (Definition 37)
Definition 35 The length |I| of a bounded interval I as in (i), (ii), (iii), or (iv) in Definition 33, is the real number b − a.
In the remaining part of this text, [a, b] will always denote a closed and bounded interval in R, so we shall implicitly assume that a ∈ R, b ∈ R, and a ≤ b. The same applies to the symbol (a, b): it will always denote an open and bounded interval in R, and in this case a and b will be real numbers such that a < b. Similar remarks apply to the other kinds of intervals defined above.
Definition 36 An interval I in R not reduced to a single point is said to be nondegenerate.
1.6.4 Order Properties (and the Completeness) of R
On the set R there is an order ≤ (see Sects. 1.6.1 and 12.4) that, when restricted to Q, is the canonical order in Q (see (1.9)). We defined above a real number x to be nonnegative if 0 ≤ x, and positive if it is nonnegative and different from 0.
Definition 37 The function absolute value of x, denoted by |x| (5), is defined for x ∈ R by
$$|x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases}$$
The term “graph of a function” was introduced in Sect. 1.1. The graph of the function |x| appears in Fig. 1.6. Thus, for example, |5| = 5, and |−5| = 5. Note that |x| = max{x, −x} for all x ∈ R, and that for x, y ∈ R,
$d_{\mathrm{abs}}(x, y) := |x - y|$ (1.19)
is the standard distance between x and y in R. Thus, e.g., given a ∈ R, {x ∈ R : |x − a| < 1} = (a − 1, a + 1).
5 We distinguish by the context between the role of the symbol | · | when applied to an interval (denoting then its length, see Definition 35) and when applied to a real number (denoting then its absolute value, see Definition 37).
Fig. 1.7 The positive and negative part functions
The following are simple but important statements, the second one called the triangle inequality.
Proposition 38 Let x and y be two elements in R. Then (i) |xy| = |x|.|y|, and (ii) |x + y| ≤ |x| + |y|.
Proof We omit the standard proof of (i), which consists of considering the signs of x and y. For the proof of (ii) first note that x ≤ |x| and −x ≤ |x| for all x ∈ R. Thus for each x, y ∈ R we have x + y ≤ |x| + |y|, and also −x − y ≤ |x| + |y|. Therefore |x + y| = max{x + y, −(x + y)} ≤ |x| + |y|.
Note that Proposition 38 implies that, for all x, y ∈ R, we have ||x| − |y|| ≤ |x − y|. Indeed, write x = y + x − y and use Proposition 38 to get |x| − |y| ≤ |x − y|. Then interchange the roles of x and y. Similarly to the absolute value function, the following two functions are defined, namely the so-called positive part function $x^+$, and the negative part function $x^-$. Precisely, the function $x^+$ is defined for x ∈ R by $x^+ = \max\{x, 0\}$, while $x^- = \max\{-x, 0\}$.
For a graph of these two functions, see Fig. 1.7. By considering the sign of the numbers, we easily verify that for every x ∈ R we have $x = x^+ - x^-$ and $|x| = x^+ + x^-$.
Definition 39 Let x ∈ R. By $\lfloor x \rfloor$ we understand the greatest integer smaller than or equal to x. This integer is sometimes called the integer part of x. The number fr(x) (called the fractional part of x) is defined by fr(x) := $x - \lfloor x \rfloor$. The function that sends x to $\lfloor x \rfloor$ is called the floor function. By $\lceil x \rceil$ we understand the smallest integer greater than or equal to x. The function that sends x to $\lceil x \rceil$ is called the ceiling function (see Fig. 1.8).
Note that, for every x ∈ R, fr(x) ∈ [0, 1). For example $\lfloor 1.49 \rfloor = 1$, fr(1.49) = 0.49, and $\lceil 1.49 \rceil = 2$. Similarly, $\lfloor -3.14 \rfloor = -4$, fr(−3.14) = 0.86, and $\lceil -3.14 \rceil = -3$.
Remark 40 The canonical order ≤ defined on R (see Sects. 1.6.1 and 12.4) is not a well-order. Indeed, there are nonempty subsets of R (such as the set (0, 1)) that have no least element. The order ≤ is not a well-order on Q, either. For example, the set (0, 1) ∩ Q has no least element in Q. ®
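The numerical examples of Definition 39 and the identities for the positive and negative parts can be checked with exact arithmetic. In this small sketch the function names are illustrative, and `Fraction` is used so that 0.49 and 0.86 are represented exactly.

```python
import math
from fractions import Fraction

def positive_part(x):
    return max(x, 0)

def negative_part(x):
    return max(-x, 0)

def fractional_part(x):
    return x - math.floor(x)

for x in (Fraction("1.49"), Fraction("-3.14")):
    # floor, fractional part, ceiling
    print(math.floor(x), fractional_part(x), math.ceil(x))
    # x = x^+ - x^-  and  |x| = x^+ + x^-
    print(x == positive_part(x) - negative_part(x),
          abs(x) == positive_part(x) + negative_part(x))
```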
Fig. 1.8 The floor and the ceiling functions
Definition 41 Let A denote a subset of R.
• We say that A is bounded above if there exists u ∈ R so that a ≤ u for all a ∈ A. The real number u is referred to as an upper bound for the set A.
• We say that A is bounded below if there exists l ∈ R so that l ≤ a for all a ∈ A. The real number l is referred to as a lower bound for the set A.
• If A is simultaneously bounded above and bounded below we say that A is bounded.
The following is an important definition.
Definition 42 Let A denote a subset of R.
• We say that U ∈ R is a least upper bound, also called a supremum for the set A, if the two following conditions hold true: (i) The number U is an upper bound for the set A. (ii) If u is another upper bound for the set A, then U ≤ u. If a supremum U of A exists, we write U = sup A (see Remark 43).
• Analogously, we say that L ∈ R is a greatest lower bound, also called an infimum for the set A, if the two following conditions hold true: (i) The number L is a lower bound for the set A. (ii) If l is another lower bound for the set A, then l ≤ L. If an infimum L of A exists, we write L = inf A (see Remark 43).
Remark 43 If a subset A of R has a supremum, this supremum is unique. Indeed, assume that $U_1$ and $U_2$ are two suprema of A. Since $U_1$ is an upper bound for A, and $U_2$, as a supremum of A, is the least upper bound for A, we have $U_2$ ≤ $U_1$. By reversing the roles of $U_1$ and $U_2$, we get $U_1$ ≤ $U_2$. It follows that $U_1$ = $U_2$. A similar argument proves the uniqueness of the infimum of a set, if such an infimum exists. ®
A rewording of the definition of supremum (infimum) of a bounded above (respectively, below) subset A of R is given in the following simple result (for the proof, see Exercise 13.25).
Proposition 44 Let A be a bounded above (below) subset of R. Then, U ∈ R is the supremum (L is the infimum) of A if, and only if, the two following conditions hold simultaneously: (i) For every a ∈ A, we have a ≤ U (respectively, L ≤ a). (ii) Given ε > 0, there exists a ∈ A such that U − ε < a (≤ U) (respectively, (L ≤) a < L + ε).
Consider the set A = {x : 0 ≤ x, and $x^2$ < 2}. The set A is bounded, since it is bounded above (for example, by the number 2) and bounded below (for example, by the number 0). In fact, we have infinitely many upper and lower bounds for the set A. For example 1.42, 3.5, 7, and 123 are all upper bounds, while −3, −1, and −0.001 are all lower bounds. The least upper bound for the set A is the real number U = $\sqrt{2}$ and the greatest lower bound is L = 0. Note that the number 0 lies in the set A whereas $\sqrt{2}$ does not.
The property exhibited in the next result is called the completeness of the real number system. It is responsible for the enormous supply of deep results associated to the study of R (6). For a proof see Theorem 18.
Theorem 45 (Completeness Theorem) Let A be a nonempty subset of R that is bounded above. Then there exists a supremum of A.
Remark 46 1. Theorem 45 ensures the existence of a supremum of any nonempty bounded above subset of R. The uniqueness was established in Remark 43. So, finally, putting everything together, every nonempty bounded above subset of R has one, and only one, supremum. 2. An equivalent formulation of Theorem 45 is that every nonempty subset of R that is bounded below has an infimum. That both formulations are equivalent can be seen by passing to the set B := {−x : x ∈ A} and noticing that − inf B = sup A. 3. The completeness of R can be formulated in several (equivalent) terms.
For example, Theorem 45 does it by ensuring the existence of a supremum of a bounded above set. Remark 46.2 above presents a version using infima instead of suprema (for bounded below nonempty subsets of R). For another formulation in terms of Cauchy sequences (see Definition 150), or even in terms of intersection of nested sequences of sets, see Theorem 1074, and see also Theorem 152 and Remark 70.4.
6 The question about the existence of a least upper bound for a bounded above subset of Q in the number system Q has a negative answer in general. For example, remaining in Q, the set {q ∈ Q : $q^2$ < 2} has no least upper bound. This is exactly what was proved in Theorem 18.
4. Note that Theorem 45 can be equivalently formulated by saying that for a nonempty bounded above set, the set of upper bounds has a least element. ®
Remark 47 If a subset A of R is not bounded above, sometimes we write sup A = +∞ (again a notational device). By this we do not mean at all that the set A has a supremum in the sense of Definition 42. The same applies to a set A that is not bounded below. We write, in this case, inf A = −∞. ®
Remark 48 We may now try to define properly (and to settle some ambiguities about) exponential formulas such as those appearing in (1.11). Let α be a positive real number and let n ∈ N. The symbol $\alpha^n$ denotes the product α.α. . . . .α (n factors), while $\alpha^{1/n}$ denotes a positive real number β such that $\beta^n$ = α. The existence of such a number β is guaranteed by the axiom of the supremum: Indeed, form the set S := {r ∈ R : r > 0, $r^n$ < α}. This set is nonempty, due to the fact that for 0 < r < min{1, α} we have $r^n$ ≤ r < α, and it is bounded above, due to the fact that for r > max{1, α} we have $r^n$ ≥ r > α. Moreover, β := sup S satisfies $\beta^n$ = α (see Exercise 13.26). It follows that $\alpha^q$ is well defined for every q ∈ Q, q > 0. Put now $\alpha^{-q} := 1/\alpha^q$ for q ∈ Q, q > 0. To be consistent, put $\alpha^0$ := 1. In this way we defined $\alpha^q$ for q ∈ Q.
What is the meaning of an expression like $\sqrt{2}^{\sqrt{2}}$? You cannot say “multiply $\sqrt{2}$ by itself a number $\sqrt{2}$ of times.” In the same way, you cannot search for a number x “that multiplied by itself a number $\sqrt{2}$ of times gives a previously chosen number.” How to deal with this situation? Now you will see how powerful Theorem 45 is. For properly defining $\alpha^\beta$, where α, β are numbers in R and α > 0 (7), we proceed in the following way. Assume first that β > 0. For α ≥ 1 we shall construct a certain set whose supremum is, by definition, the sought number $\alpha^\beta$ in the following way: Remember that $\alpha^q$ is well defined if q ∈ Q (see the beginning of this remark). Let S := {$\alpha^q$ : q ∈ Q, q ≤ β}.
This is a bounded above subset of R (an upper bound is $\alpha^p$, where p ∈ Q, p > β), hence it has a (unique) supremum s. Put then $\alpha^\beta$ := s. If, on the contrary, 0 < α < 1, then 1/α > 1 and we may define $(1/\alpha)^\beta$ as above. Then put $\alpha^\beta$ for the reciprocal of the number $(1/\alpha)^\beta$ so defined. If β < 0, apply the construction using −β and, at the end, compute the reciprocal of the value so obtained. If, finally, β = 0, put $\alpha^\beta$ := 1. This construction, applied to $\alpha^q$, where q ∈ Q, gives a result that agrees with the previously obtained value. For an (equivalent) definition of the power $\alpha^\beta$ by using the exponential function see Sect. 5.2.3, more precisely Corollary 537. ®
A careful analysis of the way real numbers are defined and how they are manipulated shows that the underlying idea here is approximation. In applications we either manipulate purely symbolic irrational numbers (like $\sqrt{2}$, or the numbers e and π to be introduced later) or approximate them by rational numbers. However, it is true that we try to delay approximation as much as possible to avoid any roundoff errors propagating. Nevertheless, approximation questions appear again and again. They are extremely important, so they have to be asked and addressed. This subject will be treated in Sect. 1.8 and in Chap. 2.
7 If α = 0 and β > 0, then $\alpha^\beta$ is defined to be 0, while $0^0$ is undefined. If α < 0 we stumble into another serious problem, whose solution cannot be found, in general, in the framework of the theory of real analysis: another extension, the complex analysis theory, is needed for that.
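The role of approximation in Remark 48 can be made concrete for the root $\alpha^{1/n}$, defined there as the supremum of S = {r > 0 : $r^n$ < α}. The sketch below is not the text's construction but a bisection under stated assumptions: α is a positive rational, the name `root_bounds` and the parameter `steps` are illustrative, and the function returns rational bounds that trap sup S.

```python
from fractions import Fraction

def root_bounds(alpha, n, steps=40):
    """Return rationals (lo, hi) with lo**n < alpha <= hi**n.

    Since every r > 0 with r**n < alpha lies below hi, and every positive
    rational up to lo belongs to S, sup S lies in [lo, hi]. The gap hi - lo
    is halved at every step.
    """
    lo = Fraction(0)                 # 0**n = 0 < alpha
    hi = Fraction(max(1, alpha))     # hi >= 1, hence hi**n >= hi >= alpha
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** n < alpha:
            lo = mid                 # mid belongs to S
        else:
            hi = mid                 # mid is an upper bound of S
    return lo, hi

lo, hi = root_bounds(Fraction(2), 2)
print(float(lo), float(hi))          # rational bounds around sqrt(2)
```

All arithmetic is exact, so the inequalities `lo**n < alpha <= hi**n` hold without roundoff; only the final `float` conversion for printing is approximate.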
1.7 Cardinality of Sets
Music is the pleasure the human soul experiences from counting without being aware that it is counting. Gottfried Leibniz
1.7.1 Basics on Cardinality
The concept of function was introduced in Sect. 1.1. The terms “function” and “mapping” are equivalent. Recall that a function f : A → B is said to be one-to-one if $a_1 = a_2$ whenever $f(a_1) = f(a_2)$, $a_1, a_2$ ∈ A, and onto if every b ∈ B is the image of an element a ∈ A. If f : A → B is a one-to-one mapping from A onto B, we can consider its inverse mapping as the function $f^{-1}$ : B → A that satisfies $f^{-1}(f(a)) = a$ for all a ∈ A. The concept of a finite set was introduced in the same Sect. 1.1.
Definition 49 We say that two sets A and B have the same cardinality if there exists a one-to-one and onto mapping between the sets A and B. We say that the cardinality of a set B is larger than or equal to the cardinality of a set A (or, equivalently, that the cardinality of a set A is smaller than or equal to the cardinality of a set B) if there is a one-to-one mapping from the set A into the set B. If, moreover, the sets A and B do not have the same cardinality, then we say that the cardinality of the set B is larger than the cardinality of the set A (or, equivalently, that the cardinality of the set A is smaller than the cardinality of the set B). We say that a set A has cardinality $\aleph_0$ (or that it is countably infinite) whenever A and N have the same cardinality. If a set A is finite or countably infinite we say that A is countable. Otherwise, we say that the set is uncountable. We say that a set A has cardinality c (or that it has cardinality of the continuum) whenever A and R have the same cardinality.
Technically speaking, a set is used to “count” another set by using a particular one-to-one mapping from the first set onto the second. The familiar process of counting by using the set N (or an initial part of it) is a way to define such a mapping. Observe that the first part of Definition 49 does not restrict the sets A and B to be finite.
In a sense, to say that two sets have the same cardinality means that they have the same “size” (although the reader must be careful about relying on intuition and should apply, instead, Definition 49; as illustrated by (iii) in Proposition 51, when using infinite sets some apparently paradoxical situations may arise: a set and a proper subset of it may have the same cardinality).
The following result is basic in the theory of cardinal numbers. It is due to the aforementioned G. Cantor and the German mathematicians F. Bernstein and E. Schröder.
Theorem 50 (Cantor–Bernstein–Schröder) Assume that A and B are two sets such that the cardinality of A is less than or equal to the cardinality of B, and that the cardinality of B is less than or equal to the cardinality of A. Then both sets have the same cardinality.
Proof (J. König) We may assume, without loss of generality, that A and B are disjoint. Let f : A → B and g : B → A be one-to-one mappings. We shall define a one-to-one and onto mapping h : A → B. To this end, given a ∈ A, define a two-sided sequence
$$\cdots \to g^{-1}f^{-1}g^{-1}(a) \to f^{-1}g^{-1}(a) \to g^{-1}(a) \to$$
$$a \to f(a) \to gf(a) \to fgf(a) \to gfgf(a) \to \cdots \qquad (1.20)$$
Note that for any a ∈ A, the second line in (1.20) extends indefinitely, while the first line may be empty, nonempty and finite (to the left of a certain position the terms may not be defined), or infinite. If, to the left, it stops at an element in A (in B) we say that the sequence is an A-stopper (a B-stopper, respectively). If it does not stop, the sequence is doubly infinite if all elements are distinct, or cyclic if they repeat. It is obvious, due to the injectivity of f and g, that two such sequences are either equal (as sets) or disjoint. Therefore, to establish the existence of h it is enough to fix a sequence S as above and to prove the existence of a one-to-one and onto mapping from A ∩ S onto B ∩ S. (i) If S is an A-stopper, then $f|_{A \cap S}$ : A ∩ S → B ∩ S is a bijection. (ii) If S is a B-stopper, then $g^{-1}|_{A \cap S}$ : A ∩ S → B ∩ S is a bijection. (iii) If S is doubly infinite or cyclic, then either $f|_{A \cap S}$ or $g^{-1}|_{A \cap S}$ does the job.
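König's argument is constructive enough to be run on a small example. The sketch below makes several assumptions that are not in the text: A and B are two copies of {1, 2, 3, . . .}, both injections are doubling maps, and the names `f`, `g`, `f_inv`, `g_inv`, `h` are illustrative. In this example every chain stops to the left (each inverse step halves the number), so only cases (i) and (ii) of the proof occur.

```python
def f(a):            # assumed injection A -> B
    return 2 * a

def g(b):            # assumed injection B -> A
    return 2 * b

def f_inv(b):        # partial inverse of f: defined only on even numbers
    return b // 2 if b % 2 == 0 else None

def g_inv(a):        # partial inverse of g: defined only on even numbers
    return a // 2 if a % 2 == 0 else None

def h(a):
    """Image of a in B under the bijection built from the chain through a."""
    x, in_A = a, True
    while True:      # walk to the left along the sequence (1.20)
        prev = g_inv(x) if in_A else f_inv(x)
        if prev is None:
            break    # the chain stops at x
        x, in_A = prev, not in_A
    # A-stopper: use f; B-stopper: use the inverse of g
    return f(a) if in_A else g_inv(a)

print([h(a) for a in range(1, 11)])
```

Although neither `f` nor `g` is onto, the resulting `h` hits every element of B exactly once.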
Proposition 51
(i) Any infinite set contains a set whose cardinality is ℵ0.
(ii) Any nonempty subset of a nonempty finite set is finite.
(iii) Any infinite subset of a countably infinite set is countably infinite.

Proof (i) Let A be an infinite set. Pick an arbitrary $a_1 \in A$. If the set $A \setminus \{a_1\}$ is nonempty, pick an arbitrary $a_2 \in A \setminus \{a_1\}$. If $A \setminus \{a_1, a_2\}$ is nonempty, pick an arbitrary $a_3 \in A \setminus \{a_1, a_2\}$. There are only two possibilities: either this process can continue indefinitely, or there is a first natural number $n_0$ such that $A \setminus \{a_1, a_2, \ldots, a_{n_0}\} = \emptyset$. In the first case, the finite induction principle defines a subset $N := \{a_n : n \in \mathbb{N}\}$ of A, and the one-to-one mapping $f : \mathbb{N} \to A$ sending $n \in \mathbb{N}$ to $a_n$ shows that $N$ has cardinality ℵ0. In the second case, the mapping $f : \{1, 2, \ldots, n_0\} \to A$ that sends $n \in \{1, 2, \ldots, n_0\}$ to $a_n$ is one-to-one and onto, which contradicts the fact that A is infinite.
1 Real Numbers: The Basics
(ii) Let S be a nonempty subset of a finite set A. Without loss of generality, we may assume that A := {1, 2, ..., n} for some n ∈ N. To prove that S is finite we propose two approaches. The second one uses Theorem 50; the first one does not.
(ii.1) We shall find a natural number m such that m ≤ n and a one-to-one mapping f from {1, 2, ..., m} onto S. This will prove that S is finite. Put c := 1 and i := 1. (*) If i ∈ S, then set f(c) := i and add 1 to c, calling the result again c. If, on the contrary, i ∉ S, do nothing. Add 1 to i, and call the result again i. If i > n, put m := c − 1 and stop. If, on the contrary, i ≤ n, go to (*) and follow the instructions again. This procedure stops after n iterations. At this moment, the number m and the function f have been defined. Note that this proof is, in fact, the description of a simple computer program.
(ii.2) Assume that S is infinite. By (i) above, it contains a countably infinite subset N. The set N is included in A, so ℵ0 ≤ |A|. Note, too, that |A| ≤ ℵ0. Theorem 50 then gives |A| = ℵ0, and this contradicts the fact that A is finite.
(iii) Let A be a countably infinite set, and let S be an infinite subset of A. In order to prove that S is countably infinite, we propose again two approaches. The first one uses the procedure in (ii.1) above and the finite induction principle. For this, we may assume, without loss of generality, that A is the set N of all natural numbers. Follow word by word the instructions in (ii.1). The only difference is that now the iteration does not stop. The finite induction principle ensures that the function f is defined inductively from N onto S. For the second one, note that S contains, by (i), a countably infinite subset, so ℵ0 ≤ |S|; since S ⊂ A, we also have |S| ≤ ℵ0. It is enough to apply Theorem 50 to conclude that S is countably infinite.

For (iii) in Proposition 51, see also Exercise 13.39. In particular, the set N and the set of all even natural numbers have the same cardinality (Galileo’s paradox, 1638).
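The remark in (ii.1) that the proof describes a simple computer program can be taken literally. The following is a minimal sketch of the procedure (*), assuming that S is given as a Python set of integers contained in {1, ..., n}; the function name `enumerate_subset` is hypothetical, and the counters `c` and `i` play the same roles as in the proof.

```python
def enumerate_subset(n, S):
    """Follow procedure (*) of (ii.1): return m and the bijection f as a dict."""
    f = {}
    c = 1
    i = 1
    while i <= n:          # the loop body runs exactly n times
        if i in S:
            f[c] = i       # set f(c) := i
            c += 1         # add 1 to c
        i += 1             # add 1 to i
    m = c - 1
    return m, f


m, f = enumerate_subset(8, {2, 5, 7})
```

For this input the procedure returns m = 3 and the mapping 1 ↦ 2, 2 ↦ 5, 3 ↦ 7.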
That N and the set of all even natural numbers have the same cardinality can be proved directly: the mapping f(n) = 2n is one-to-one and maps the set N onto the set of all even natural numbers. The reader may also provide a similar direct proof of the fact that the set of all odd natural numbers has the same cardinality as the set N. It might be surprising that a proper subset of some given set has the same cardinality as the given set. This cannot occur for nonempty finite sets (see again Exercise 13.36, where it is proved that the absence of this phenomenon is an intrinsic, and equivalent, definition of finiteness).

Corollary 52 A nonempty set A is countable if, and only if, there exists a mapping from N onto A.

Proof Assume first that A is nonempty and countable. If A is finite, there exist n ∈ N and a one-to-one mapping f from {1, 2, ..., n} onto A. Choose a ∈ A and extend f to a mapping f : N → A by letting f(m) = a for m ∈ N, m > n. Then f is a mapping from N onto A. Assume now that A is countably infinite. Then there exists a one-to-one mapping from N onto A, and we are done.
Assume now that A is nonempty and there exists a mapping f from N onto A. For a ∈ A, choose an element $n_a \in \mathbb{N}$ such that $f(n_a) = a$. If the set $N_A := \{n_a : a \in A\}$ is finite, there exist n ∈ N and a one-to-one mapping g from {1, 2, ..., n} onto $N_A$.
Fig. 1.9 The pattern in the proof of Proposition 53
The mapping h : {1, 2, ..., n} → A given by h(m) = f(g(m)) for m ∈ {1, 2, ..., n} is one-to-one and maps {1, 2, ..., n} onto A, hence A is finite. If, on the contrary, the set $N_A$ is infinite, it is countably infinite by (iii) in Proposition 51, so there exists a one-to-one mapping g from N onto $N_A$. The mapping h : N → A given by h(n) = f(g(n)) for n ∈ N is one-to-one and maps N onto A, so A is countably infinite. See also Exercise 13.37.

Proposition 53 The union of countably many countable sets is countable.

Proof Let $\{A_n\}_{n=1}^{\infty}$ be a (countable) family of countable sets; we may clearly assume that each $A_n$ is nonempty. Since each $A_n$ is countable, there exists, by Corollary 52, a function $f_n$ from N onto $A_n$. Now define a mapping f from N onto $\bigcup_{n=1}^{\infty} A_n$ in the following way: put $f(1) := f_1(1)$, $f(2) := f_2(1)$, $f(3) := f_1(2)$, $f(4) := f_3(1)$, $f(5) := f_2(2)$, $f(6) := f_1(3)$, $f(7) := f_4(1)$, $f(8) := f_3(2)$, etc. (the pattern appears in Fig. 1.9, where (n, m) stands for $f_n(m)$). The conclusion follows from Corollary 52.

Proposition 54 If A is an infinite set and B is a countable set, then A ∪ B has the same cardinality as A.

Proof Since A ∪ B = A ∪ (B \ A) and any subset of a countable set is countable (Proposition 51), it suffices to prove the statement for the case of disjoint sets A and B. Let C ⊂ A be a countably infinite set (it exists thanks to (i) in Proposition 51). Then C ∪ B is a countably infinite set (see Proposition 53) and thus there exists a one-to-one mapping f from C ∪ B onto C. Define a map h from A ∪ B into A by
\[ h(x) := \begin{cases} f(x) & \text{if } x \in C \cup B, \\ x & \text{if } x \in A \setminus C. \end{cases} \]
Then h is a one-to-one map from A ∪ B onto A.
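The enumeration pattern used in the proof of Proposition 53 (the one depicted in Fig. 1.9) can be generated explicitly. The sketch below is only an illustration: the generator name `diagonal_pairs` is hypothetical, and it yields the index pairs (n, m), standing for $f_n(m)$, in the order in which the proof lists them.

```python
from itertools import islice


def diagonal_pairs():
    """Yield (n, m) so that the k-th pair gives f(k) := f_n(m) as in the proof."""
    level = 1
    while True:
        # On each level n + m is constant; n runs downwards, m upwards.
        for n in range(level, 0, -1):
            yield (n, level + 1 - n)
        level += 1


first_eight = list(islice(diagonal_pairs(), 8))
```

The first eight pairs are (1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), (4, 1), (3, 2), in agreement with the values f(1), ..., f(8) listed in the proof. Every pair (n, m) is eventually produced, which is what makes f onto.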
Corollary 55 If A is an uncountable set and B is a countable set, then the set A \ B has the same cardinality as A. Proof The set A \ B is uncountable as otherwise A would be countable (see Propositions 51 and 53). In particular, A \ B is infinite. By Propositions 51 and 54, the sets A \ B and A (= (A \ B) ∪ (B ∩ A)) have the same cardinality.
1.7.2 Cardinality of Z and Q
Proposition 56 The set Z has cardinality ℵ0.

Proof Define a mapping f from N into Z by the following rule: if n ∈ N,
\[ f(n) = \begin{cases} \dfrac{n}{2} & \text{if } n \text{ is even,} \\[1ex] \dfrac{-n+1}{2} & \text{if } n \text{ is odd.} \end{cases} \]
Note that this mapping f : N → Z is one-to-one and onto. Alternatively, we have Z = N ∪ M, where M := {z ∈ Z : z ≤ 0}. Since clearly M is countably infinite, the result follows from Proposition 54.

Remark 57 Imagine that you are the manager of a motel with countably many rooms (consider a low-rise motel with all the rooms arranged in an infinite row). Every person has to have his/her own room. You receive a report that every room in your motel is occupied (one person per room). Should you, as a manager, turn on the no-vacancy sign? In a regular motel (with a nonempty finite number of rooms) the answer is obviously yes. But in our special motel things are not what they seem. You can always create a vacancy for one more person (no one moves out, of course). Number the rooms in your motel by using all natural numbers. Then instruct the occupant of room number n to move to room number n + 1. Suddenly room number 1 is empty and, at the same time, all previous occupants have their own room. This paradox is named after the German mathematician D. Hilbert, and the motel is referred to as Hilbert’s hotel. A modification of the procedure even allows one to accommodate all the passengers of an incoming bus having a countable number of seats in such a way that, at the end, each (old and new) guest has his/her own room. Intimately related is Galileo’s paradox, mentioned in the paragraph preceding Corollary 52.

The proof of the following result has similarities with the proof of Proposition 53. Again, Fig. 1.9 shows the essential procedure behind this technique.

Proposition 58 The set Q has cardinality ℵ0.

Proof Let us enumerate all positive fractions as follows: first take the fraction $\{\tfrac{1}{1}\}$. This is level 1. Then take the fractions $\{\tfrac{1}{2}, \tfrac{2}{1}\}$. These form level 2. Continue with $\{\tfrac{1}{3}, \tfrac{2}{2}, \tfrac{3}{1}\}$ and suppress $\tfrac{2}{2}$, since it belongs to the same class as $\tfrac{1}{1}$, already listed. This is level 3. Continue in this way.
At level n > 1, list the fractions $\{\tfrac{p}{q} : p + q = n + 1\}$, and suppress, if necessary, fractions whose class has already appeared. This enumeration creates a one-to-one and onto mapping from the set N to the set of all positive fractions. Enumerate negative fractions in a similar manner, and obtain a mapping (one-to-one and onto) from the set of all negative integers to the set of all negative fractions. Finally send 0 to 0, and we have a one-to-one and onto mapping from Z to the set of all fractions. To conclude, use Proposition 56.

There are infinite sets that do not have the same cardinality as the set N (i.e., sets that are not countable). This will be shown in Sect. 1.7.3.
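The level-by-level enumeration of the positive fractions in the proof of Proposition 58 can be sketched as follows. This is an illustration under stated assumptions: the generator name `positive_rationals` is hypothetical, and a fraction p/q is suppressed exactly when gcd(p, q) > 1, which amounts to saying that its class already appeared at an earlier level (its reduced form has a smaller value of p + q).

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def positive_rationals():
    """List positive fractions level by level: at level n, p + q = n + 1."""
    level = 1
    while True:
        for p in range(1, level + 1):
            q = level + 1 - p
            if gcd(p, q) == 1:      # otherwise the class was already listed
                yield Fraction(p, q)
        level += 1


first_nine = list(islice(positive_rationals(), 9))
```

The first nine terms are 1, 1/2, 2, 1/3, 3, 1/4, 2/3, 3/2, 4; the fraction 2/2 of level 3 has been suppressed, as in the proof.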
The infinite! No other question has ever moved so profoundly the spirit of man. David Hilbert
1.7.3 Cardinality of R
We now present a famous argument due to G. Cantor. We prove that the set of all real numbers has a larger cardinality than the set of all natural numbers. In other words, it is impossible to list all real numbers in an infinite column. There are fundamentally more real numbers than countable infinity allows. We say that there are uncountably many real numbers.

Two natural numbers n and m are said to be congruent modulo p, where p is another natural number, whenever p divides n − m. If this is the case we write n ≡ m (mod p).

Theorem 59 (Cantor) The set of all real numbers in (0, 1) has a larger cardinality than the set N. In particular, there does not exist a one-to-one and onto mapping between the sets R and N.

Proof Arguing by contradiction, assume that there exists a one-to-one and onto map between N and the set of real numbers in (0, 1). Thus we can list all real numbers in (0, 1) in an infinite column. Suppose the following is such a list:
\[ \begin{array}{l}
0.x_{11}x_{12}x_{13}x_{14}x_{15}x_{16}x_{17}\cdots \\
0.x_{21}x_{22}x_{23}x_{24}x_{25}x_{26}x_{27}\cdots \\
0.x_{31}x_{32}x_{33}x_{34}x_{35}x_{36}x_{37}\cdots \\
0.x_{41}x_{42}x_{43}x_{44}x_{45}x_{46}x_{47}\cdots \\
0.x_{51}x_{52}x_{53}x_{54}x_{55}x_{56}x_{57}\cdots \\
0.x_{61}x_{62}x_{63}x_{64}x_{65}x_{66}x_{67}\cdots \\
0.x_{71}x_{72}x_{73}x_{74}x_{75}x_{76}x_{77}\cdots \\
\qquad\vdots
\end{array} \]
where the numbers in (0, 1) are written in their nonterminating binary expansion. Without loss of generality we may assume $x_{11} = 0$ and $x_{22} = 1$. We claim that the number $z = 0.z_1z_2z_3z_4z_5z_6z_7\cdots$ (base 2), where $z_n \equiv x_{nn} + 1 \pmod 2$ for all n ∈ N, is not in the above list, although z ∈ (0, 1). Indeed, $z = 0.10\cdots$, hence 0 < z < 1. Note, too, that $z_n \neq x_{nn}$ for all n ∈ N, and therefore z differs from the first, from the second, from the third, etc., number in the list, and we reach a contradiction.

The technique used in the proof of Theorem 59 is known as Cantor’s diagonal method.
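The diagonal construction can be tried out on a finite square array of binary digits. The sketch below is a finite stand-in for the infinite list in the proof, not the proof itself: the function name `diagonal_complement` and the sample rows are hypothetical, and each row is assumed to be a string of 0s and 1s at least as long as the number of rows.

```python
def diagonal_complement(rows):
    """Return the digit string z with z_n = x_nn + 1 (mod 2) for every row n."""
    return "".join("1" if row[n] == "0" else "0" for n, row in enumerate(rows))


# Sample rows with x_11 = 0 and x_22 = 1, as assumed in the proof.
rows = ["0110", "1101", "0011", "1010"]
z = diagonal_complement(rows)
```

Here z is the string 1001: it begins with the digits 1, 0 and differs from the n-th row in the n-th digit, so it is not one of the rows.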
Fig. 1.10 The graph {(x, f (x)) : x ∈ (0, 1)} of f (proof of Proposition 61)
Fig. 1.11 The mapping h from (−1, 1) onto R (proof of Proposition 61)
Corollary 60 The set R and the set P of all irrational numbers have the same cardinality.

Proof This follows from Propositions 54 and 58. For a different approach see Exercise 13.50.

Proposition 61 Any nondegenerate interval J in R has cardinality c.

Proof We shall prove first that (0, 1) has the same cardinality as any bounded interval of the form (a, b). To see this, it is enough to consider the one-to-one map f from (0, 1) onto (a, b) given by (see Fig. 1.10)
\[ f(x) = (b - a)x + a, \quad \text{for } x \in (0, 1). \tag{1.21} \]
Second, we shall prove that the interval (−1, 1) has cardinality c. To see this, note that the mapping h(x) := x/(1 − |x|) maps (−1, 1) onto R in a one-to-one way. For a graph of the function h see Fig. 1.11. The two results just shown, put together, conclude the proof of the case J := (a, b). To deal with the cases [a, b), (a, b], and [a, b], use Proposition 54 (alternatively, the Cantor–Bernstein–Schröder Theorem 50 can be used, see Exercise 13.44). To see that, e.g., (0, ∞) has cardinality c use, for example, the function g defined by
\[ g(x) = \begin{cases} 1 - \dfrac{1}{x} & \text{if } 0 < x \le 1, \\[1ex] x - 1 & \text{if } x > 1 \end{cases} \]
(see Fig. 1.12). It is left to the reader to show that g maps (0, ∞) one-to-one onto R.

In order to prove Theorem 63 below, we shall use the following lemma:

Lemma 62 If r and s are irrational numbers, and r < s, then there is a rational number t ∈ (r, s).

Proof Let r and s be two irrational numbers so that r < s. Pick two integers n < m such that n < r < s < m. In the first part of the proof of Proposition 61, for
Fig. 1.12 The mapping g from (0, ∞) onto R (proof of Proposition 61)
the interval (a, b) := (n, m), we defined a one-to-one mapping f from (0, 1) onto (n, m); clearly f sends rational numbers to rational numbers, and so does its inverse mapping. Thus it is enough to prove the assertion for irrational numbers r and s in (0, 1) such that r < s. Expand r and s in base 2 to obtain $r = 0.x_1x_2\cdots$ and $s = 0.y_1y_2\cdots$, where $x_n$, $y_n$ belong to {0, 1} for n ∈ N. Choose the first k ∈ N such that $x_k \neq y_k$. Since r < s, we obviously have $0 = x_k < y_k = 1$. Define $t = 0.x_1\cdots x_{k-1}1$ (base 2), and note that t is a rational number such that r < t < s.
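The construction of t in the proof of Lemma 62 can be followed numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and floating-point values (which are themselves rational) are used as stand-ins for the irrational numbers √2/2 and π/4, so the strict inequalities are checked for this example only and are not guaranteed for arbitrary rational inputs.

```python
import math
from fractions import Fraction


def binary_digits(x, count):
    """First `count` binary digits of x in (0, 1), computed with exact arithmetic."""
    digits = []
    for _ in range(count):
        x *= 2
        d = 1 if x >= 1 else 0
        digits.append(d)
        x -= d
    return digits


def rational_between(r, s, count=60):
    """Return t = 0.x_1 ... x_{k-1} 1 (base 2), with k the first differing digit."""
    xs, ys = binary_digits(r, count), binary_digits(s, count)
    k = next(i for i in range(count) if xs[i] != ys[i])  # 0-based position
    prefix = sum(Fraction(d, 2 ** (i + 1)) for i, d in enumerate(xs[:k]))
    return prefix + Fraction(1, 2 ** (k + 1))


r = Fraction(math.sqrt(2) / 2)   # stand-in for an irrational number in (0, 1)
s = Fraction(math.pi / 4)        # stand-in for a larger irrational number in (0, 1)
t = rational_between(r, s)
```

The binary expansions begin 0.10... and 0.11..., so they first differ at the second digit and the construction gives t = 0.11 (base 2) = 3/4, which lies strictly between r and s.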
Theorem 63 Between two distinct real numbers there always lie a rational and an irrational number.

Proof Let a, b ∈ R, a < b.
(i) If a and b are both irrational numbers, a rational number c ∈ (a, b) exists by Lemma 62. Then $d = \frac{c+b}{2}$ is an irrational number in (a, b).
(ii) If a, b are both rational numbers, then $\frac{a+b}{2}$ is a rational number in (a, b). Moreover, $a + \frac{b-a}{\sqrt{2}}$ is an irrational number in (a, b).
(iii) If a is a rational number and b is an irrational number, then $c := \frac{a+b}{2}$ is an irrational number in (a, b). Using this number c, the number b, and (i) above, we get a rational number in (c, b) ⊂ (a, b).
(iv) The case of a being an irrational number and b a rational number can be treated similarly to (iii).

Remark 64 As already mentioned, Theorem 59 is due to G. Cantor, the founder of Set Theory, a discipline which now lies at the foundation of most of modern mathematics. A natural question arises from the fact that the set R has a strictly greater cardinality than the set N: whether there exists a set whose cardinality is strictly in between. That this is not the case is what is called the Continuum Hypothesis. To prove or to disprove this conjecture was a task undertaken by the aforementioned G. Cantor and by many mathematicians after him. That this
conjecture can be neither proved nor disproved came as a real surprise, and only after more than 50 years of intensive effort by many people. Thus, the validity of the Continuum Hypothesis turned out to be independent of the standard axioms of Set Theory. The result is due to the Austrian–American mathematician K. Gödel and the American mathematician P. Cohen.

In mathematics you don’t understand things. You just get used to them. John von Neumann

1.7.4 Cardinality of the Set of Real Functions
The purpose of this short subsection is to illustrate the existence of cardinalities larger than the continuum.

Proposition 65 Let F be the family of all real-valued functions on the real line R. Then the cardinality $\mathfrak{f}$ of the set F is larger than c.

Proof The mapping φ : R → F, where for a ∈ R, φ(a) denotes the constant function with value a, is one-to-one, showing that the cardinality of F is larger than or equal to c. To show that the cardinality of F is indeed larger than c, assume that a mapping $a \mapsto f_a$ is one-to-one and maps R onto F. We use the Cantor diagonal procedure discussed in the proof of Theorem 59 to derive a contradiction. Indeed, define the element f in F by $f(x) = f_x(x) + 1$, for x ∈ R. We show that $f \neq f_a$ for every a ∈ R. Assume that, on the contrary, $f = f_a$ for some a ∈ R. Compare the values of f and $f_a$ at the point a: we have $f(a) = f_a(a) + 1 = f(a) + 1$, a contradiction. See Exercise 13.56 for a precise computation of $\mathfrak{f}$.

Remark 66 Note that by using the technique in the proof of Proposition 65 we can build sets having larger and larger cardinalities. See also (d) in Exercise 13.32.

In Exercises (Sect. 13.17) we will have some complements to the theory of the so-called cardinal numbers and an ample amount of solved problems on cardinalities. The related theory of ordinal numbers will also be sketched.

1.8 Topology of R
1.8.1 Introduction. Open and Closed Sets
The concept of closeness in the real number system is of fundamental importance in Real Analysis. The idea is the basis, in an abstract setting —even in absence of a distance— of a theory known as topology, from the Greek word “topos” (place,
location). The Latin term used at the beginning of the historical development was “analysis situs” (analysis of places, of locations). The following is the basic definition in this direction. It introduces the so-called “open sets” (their complements will be called “closed sets,” aiming at the concept of “closeness” mentioned above). It is built using the open intervals, already introduced in Definition 33 (see footnote 8).

Definition 67 Let S be a subset of R. We say that S is an open set whenever for every x ∈ S there exists an open interval I (depending on x) so that x ∈ I ⊂ S. A set C ⊂ R is said to be a closed set if the complement of C (i.e., the set $C^c := \mathbb{R} \setminus C$) is an open set.

Note that every open interval is an open set, and that every closed interval is a closed set. In particular, every singleton is a closed set. There are subsets that are neither open nor closed, as the example of a set like (1, 2] shows. Observe, too, that the definition forces R to be simultaneously open and closed, and that the same is true for the empty set ∅ (see Proposition 71). We shall prove later (Proposition 103) that this is all: the only subsets of R being simultaneously closed and open are R and ∅. The following result shows some stability properties of the class of the open intervals in R.

Proposition 68 Let $\{I_j\}_{j \in J}$ be a family of open intervals in R such that $\bigcap_{j \in J} I_j \neq \emptyset$.
(i) If J is finite, then $\bigcap_{j \in J} I_j$ is an open interval.
(ii) For an arbitrary nonempty set J, the set $\bigcup_{j \in J} I_j$ is an open interval.

Proof Clearly, J is nonempty. For j ∈ J, put $I_j := (a_j, b_j)$, where $a_j$ and $b_j$ are elements of $\mathbb{R}^*$ such that $a_j < b_j$, and let $I := \bigcup_{j \in J} I_j$. Choose $x \in \bigcap_{j \in J} I_j$.
(i) Assume first that J := {1, 2, ..., n}. Let $\alpha := \max\{a_j : j \in J\}$ (put α := −∞ in the case that $a_j = -\infty$ for all j ∈ J), and let $\beta := \min\{b_j : j \in J\}$ (put β := +∞ in the case that $b_j = +\infty$ for all j ∈ J). Since $a_j < x < b_j$ for j ∈ J, we get α < x < β. It is enough now to observe that $\bigcap_{j \in J} (a_j, b_j) = (\alpha, \beta)$.
(ii) If the set $\{a_j : j \in J\}$ is bounded below, then define $\alpha = \inf_{j \in J} a_j$; otherwise set α = −∞. Similarly, if the set $\{b_j : j \in J\}$ is bounded above, then define $\beta = \sup_{j \in J} b_j$; otherwise set β = +∞. If y ∈ I, then $y \in I_j$ for some j ∈ J. Since $x \in I_j$, the (closed) interval with endpoints x and y is contained in $I_j$, hence in I. This shows that I is an interval. Clearly, I = (α, β).
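For a finite family, the endpoints α and β in the proof of Proposition 68 can be computed directly. The following is a minimal sketch, assuming that each open interval is represented by a pair (a, b) of floats, with `math.inf` playing the role of ±∞; the function names are hypothetical.

```python
import math


def intersection(intervals):
    """Case (i): intersection of finitely many open intervals sharing a point."""
    alpha = max(a for a, _ in intervals)
    beta = min(b for _, b in intervals)
    if not alpha < beta:
        raise ValueError("the intervals have no common point")
    return alpha, beta


def union(intervals):
    """Case (ii) for a finite family: union of open intervals sharing a point."""
    intersection(intervals)  # raises unless a common point exists
    return min(a for a, _ in intervals), max(b for _, b in intervals)


family = [(-math.inf, 3.0), (0.0, 5.0), (1.0, math.inf)]
common = intersection(family)
hull = union(family)
```

For this family the intersection is the open interval (1, 3) and the union is the whole real line. The check for a common point matters: without it, the union of (0, 1) and (2, 3) would wrongly be reported as (0, 3).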
8 The family of the open sets in R is much larger than the family of the open intervals, and it is the model, in an abstract setting, for families of sets that will be called “topologies” (their elements generically named “open sets”). The basic properties (O1) to (O4) of the family of all open sets in R, isolated in Proposition 71, are the requirements for the abstract definition of a “topology” on a set (see also Remark 105).
Note that
\[ \bigcap_{n=1}^{\infty} \left( -\frac{1}{n}, \frac{1}{n} \right) = \{0\}, \]
thus a countable intersection of open intervals need not be an open interval. Compare this with (i) in the statement of Proposition 68.

Regarding sequences of closed bounded intervals, the following property is crucial. It is a direct consequence of the completeness property of the system of real numbers (see Theorem 45) and can be seen as a building block for subsequent developments in real analysis presented in this text. It is referred to as the Nested Interval Property or the Nested Interval Theorem. The reader will understand its central role as soon as he/she realizes that the Nested Interval Property is equivalent to the fact that R is complete (see Remark 70.4 and Theorem 1074).

Theorem 69 (Nested Interval Theorem) Consider a sequence $\{I_k\}_{k=1}^{\infty}$ of closed bounded intervals in R, say $I_k := [a_k, b_k]$, k ∈ N, so that $I_{k+1} \subset I_k$ for all k ∈ N. Assume that the lengths of the intervals $I_k$ approach zero, which means that for every ε > 0 there exists $k_\varepsilon \in \mathbb{N}$ so that $b_{k_\varepsilon} - a_{k_\varepsilon} < \varepsilon$. Then we have
\[ I := \bigcap_{k=1}^{\infty} I_k = \{z\}, \quad \text{for some } z \in \mathbb{R}. \]
Proof Define the sets $A := \{a_k : k \in \mathbb{N}\}$ and $B := \{b_k : k \in \mathbb{N}\}$. Observe that $a_1 \le a_2 \le \ldots \le b_2 \le b_1$. Thus, by Theorem 45, the quantities α := sup(A) and β := inf(B) do exist in R. Observe, too, that $a_k \le \alpha \le \beta \le b_k$ for each k ∈ N. Now, note that $[\alpha, \beta] = \bigcap_{k=1}^{\infty} I_k$. Indeed, if x ∈ [α, β] then, by the previous observation, $a_k \le x \le b_k$ for all k ∈ N, meaning that $x \in I_k$ for all k ∈ N. On the other hand, if $x \in I_k$ for all k ∈ N then $a_k \le x \le b_k$ for all k ∈ N, and so α ≤ x ≤ β. Assume for a moment that ε := β − α > 0. Find $k_\varepsilon$ such that $b_{k_\varepsilon} - a_{k_\varepsilon} < \varepsilon$. Since $[\alpha, \beta] \subset [a_{k_\varepsilon}, b_{k_\varepsilon}]$, we have $\varepsilon = \beta - \alpha \le b_{k_\varepsilon} - a_{k_\varepsilon} < \varepsilon$, a contradiction. This shows that α = β, hence I = {z}, where z = α = β.
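A concrete nested sequence may help to visualize the theorem. The sketch below is an illustration chosen here, not taken from the text: it bisects [1, 2] repeatedly, always keeping the half [a, b] with a² ≤ 2 ≤ b², so that the intervals are nested, their lengths approach zero, and the only point common to all of them is √2. The function name `nested_intervals` is hypothetical.

```python
from fractions import Fraction


def nested_intervals(steps):
    """Return closed intervals [a_k, b_k] with rational endpoints closing in on sqrt(2)."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if mid * mid <= 2:
            a = mid          # keep the right half [mid, b]
        else:
            b = mid          # keep the left half [a, mid]
        intervals.append((a, b))
    return intervals


intervals = nested_intervals(20)
last_a, last_b = intervals[-1]
```

After 20 steps the interval has length exactly 1/2^20, and both endpoints are rational; the common point √2 is not, which shows why the theorem relies on the completeness of R.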
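Remark 70
1. The assumption that the intervals are closed in the Nested Interval Theorem 69 is crucial, as the following example shows:
\[ \bigcap_{n=1}^{\infty} \left[ 0, \frac{1}{n} \right] = \{0\}, \quad \text{although} \quad \bigcap_{n=1}^{\infty} \left( 0, \frac{1}{n} \right) = \emptyset. \]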
2. Observe, too, that the bounded character of the intervals in the Nested Interval Theorem 69 is vital: for an example, see Exercise 13.72.
3. There is a version of Theorem 69 that does not assume that the lengths of the intervals $I_k$ approach zero (boundedness is still necessary, see the previous item). In this case, the conclusion is that the intersection $\bigcap_{k=1}^{\infty} I_k$ is nonempty. The proof repeats the former one, except for its last part. We can only conclude now that α ≤ β (so the intersection is the interval [α, β], in general no longer a singleton). Below (see Corollary 148) we shall encounter an extension of this result: every nested sequence consisting of nonempty closed and bounded subsets of R has a nonempty intersection.
4. The Nested Interval Property (Theorem 69) is equivalent to the completeness of R (see Theorem 1074).

As an application of Theorem 69 (taken from [Stromb81, Theorem 1.58]), let us give another proof of the fact that any general nondegenerate interval I in R is uncountable (see Proposition 61). First note that any general nondegenerate interval contains a nondegenerate closed and bounded interval [a, b]. There is a one-to-one mapping from the interval (a, b) onto R (see the first and second parts of the proof of Proposition 61). The set R is infinite, since it contains N; it follows that (a, b), and so [a, b], are both infinite. Assume that [a, b] is countable. Then it is countably infinite, say $[a, b] := \{x_n : n \in \mathbb{N}\}$. Consider the three intervals
\[ \left[ a, a + \frac{b-a}{3} \right], \quad \left[ a + \frac{b-a}{3}, a + \frac{2(b-a)}{3} \right], \quad \left[ a + \frac{2(b-a)}{3}, b \right]. \]
Choose one of them that does not contain $x_1$, and call it $I_1$. Now divide $I_1$ into three closed intervals by using equally spaced points as above, and let $I_2$ be one of them having the property that $x_2 \notin I_2$. Continue in this way to obtain a nested sequence $\{I_n\}_{n=1}^{\infty}$ of closed intervals. Their lengths approach zero, hence $\bigcap_{n=1}^{\infty} I_n$ is a single point in [a, b], according to Theorem 69. However, no $x_n$ belongs to $\bigcap_{n=1}^{\infty} I_n$, a contradiction.
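The choice of thirds in the argument above can be carried out step by step for any finite initial segment of a proposed enumeration. The sketch below is a finite illustration under stated assumptions: the function name `avoid_points` and the sample points are hypothetical, and exact rational arithmetic is used so that the membership tests are reliable.

```python
from fractions import Fraction


def avoid_points(a, b, xs):
    """Return a closed interval inside [a, b] that contains none of the points xs."""
    a, b = Fraction(a), Fraction(b)
    for x in xs:
        third = (b - a) / 3
        thirds = [(a, a + third), (a + third, b - third), (b - third, b)]
        # x cannot lie in all three closed thirds: the first and the last are disjoint.
        a, b = next((lo, hi) for lo, hi in thirds if not lo <= x <= hi)
    return a, b


xs = [Fraction(1, 2), Fraction(1, 3), Fraction(0), Fraction(1, 9), Fraction(2, 9)]
lo, hi = avoid_points(0, 1, xs)
```

After processing five points, the resulting interval has length 1/3^5 and contains none of them; continuing indefinitely along a full enumeration is what produces the contradiction in the text.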
The following result is, somehow, an extension of Proposition 68, telling us that a stability result similar to what was established there for the class of open intervals holds for the class of open subsets of R. We mention it here because it singles out the essential features the family of open subsets in an abstract setting should possess. Proposition 71 (O1) The empty set is an open subset of R. (O2) R itself is an open subset of R. (O3) Any finite intersection of open subsets of R is open. (O4) Any union (possibly infinite) of open subsets of R is open. Consequently, the empty set and R itself are both closed subsets of R, every finite union of closed subsets of R is closed, and any intersection (possibly infinite) of closed subsets of R is closed. Proof (O1) This holds since Definition 67 applies trivially to the empty set. (O2) is also clear from the very definition of open subset of R. (O3) If the collection of open sets is empty, the result is obviously true. Otherwise, consider any nonempty finite
intersection of open sets $U = \bigcap_{k=1}^{n} U_k$, for some n ∈ N. If U is empty we are done. Otherwise, let x ∈ U. Then $x \in U_k$ for all k ∈ {1, ..., n}. There exist open intervals $I_k$ containing x so that $I_k \subset U_k$ (since each $U_k$ is an open set) for all k ∈ {1, ..., n}. By Proposition 68, we know that the intersection
\[ J = \bigcap_{k=1}^{n} I_k \]
is an open interval containing x. Since $J \subset \bigcap_{k=1}^{n} U_k$, we are done. (O4) Let $\{U_k\}_{k \in K}$ be a collection of open subsets of R, and put $U := \bigcup_{k \in K} U_k$. The result is obviously true for K = ∅. If not, and if all $U_k$ are empty, then U is also empty and we are done. Otherwise, let x ∈ U; therefore $x \in U_k$ for some k. Since $U_k$ is open there exists an open interval I containing x and such that $I \subset U_k$, hence I ⊂ U. This proves that U is an open set.

Note that
\[ \bigcup_{n=1}^{\infty} \left[ \frac{1}{n}, \frac{n-1}{n} \right] = (0, 1). \]
This shows that an infinite union of closed sets need not be closed (observe that (0, 1) is indeed not closed, since its complement C := (−∞, 0] ∪ [1, +∞) is not open: the point 1 does not lie in an open interval contained in C). On the other hand, we saw in the paragraph after Definition 67 that every singleton is a closed set. Thus, Proposition 71 implies that all finite subsets of R are closed sets.
1.8.2 Neighborhoods, Closure, Interior
Let us build some notation that will help to handle situations where the idea of closeness appears. All the concepts that will be introduced below depend, ultimately, on the definition of open set. The following definition strongly conveys the idea of an element y being “close” to an element x: the element y belongs to a “neighborhood” of x (better think of y being in “many” neighborhoods of x). The concept defined will be used again and again as a way to avoid complicated notation of ε’s and δ’s. Its use in a general context will substitute for quantitative estimates, and will greatly help in understanding and dealing with the basic concepts in this area.

Definition 72 Given a point x ∈ R, a subset U of R is called a neighborhood of x if there exists an open subset O of R such that x ∈ O ⊂ U.

Observe that, in the previous definition, the set U is not necessarily open. By definition, any subset of R that contains a neighborhood of a point x is, itself, also a neighborhood of x. Observe that, due to the definition of an open set in R (Definition 67), U is a neighborhood of a point x ∈ R if, and only if, there exists δ > 0 such that (x − δ, x + δ) ⊂ U. The following definition gives a name to the set of elements that, vaguely speaking, are “close” to the elements of a given set.
Definition 73 The closure of a subset A of R, denoted by $\overline{A}$, is the intersection of all closed sets containing A.

Remark 74 Thanks to Proposition 71, $\overline{A}$ is closed for each A ⊂ R. Due to this observation, and to the way the closure has been defined, $\overline{A}$ is the smallest (in the sense of inclusion) subset of R that satisfies the following two properties: (i) it is closed, and (ii) it contains A. We shall provide later (see Propositions 78 and 79, and Theorem 83) some other characterizations of the set $\overline{A}$, where A is a given subset of R. Thanks again to the definition, we may state that a set A in R is closed if, and only if, $A = \overline{A}$. Observe that the closure of the empty set is the empty set, due to the fact that the empty set is closed.

As an example, let us show that (i) the closure of the nondegenerate interval (a, b) is the interval [a, b], and (ii) the closure of the interval (a, b] is again the interval [a, b]. Indeed, note that $(a, b) \subset \overline{(a, b)} \subset [a, b]$, since [a, b] is closed. We have only four possibilities: either $\overline{(a, b)} = (a, b)$, $\overline{(a, b)} = (a, b]$, $\overline{(a, b)} = [a, b)$ or, finally, $\overline{(a, b)} = [a, b]$. Only in the last case is the set $\overline{(a, b)}$ closed. This proves (i). The proof of (ii) is similar.

A notion complementary to the concept of closure is the notion of interior of a set A, defined below.

Definition 75 The interior of a subset A of R, denoted by Int A, is the union of all open subsets of R contained in A.

Remark 76 Thanks to Proposition 71, for each A ⊂ R the set Int A is open. Due to this, and to the way the interior has been defined, Int A is the biggest (in the sense of inclusion) subset of R that satisfies the following two properties: (i) it is open, and (ii) it is contained in A. We shall provide later some other characterizations of the set Int A, where A is a given subset of R. Thanks again to the definition, we may state that a set A in R is open if, and only if, A = Int A. Observe that a nonempty set A may have an empty interior (for example, Int {0} = ∅).
For example, the interior of the interval [a, b] is the interval (a, b). The interior of the interval (a, b] is again the interval (a, b). These two statements can be proved in much the same way as we proved, after Remark 74, that the closure of (a, b), for a < b, is the set [a, b].

Definition 77 The boundary of a subset A of R, denoted by bdr A, is the set $\overline{A} \setminus \operatorname{Int} A$.

For example, the boundary of the interval [a, b] is the set {a, b}. The boundary of the set (a, b] is again the set {a, b}. This follows from the previous two examples. For a subset A of R, we have that bdr A = ∅ if, and only if, A = ∅ or A = R (see Exercise 13.77). Note that the boundary of a set is, by the definition, a closed set, as $\overline{A} \setminus \operatorname{Int} A = \overline{A} \cap (\mathbb{R} \setminus \operatorname{Int} A)$.

Some connections between the concepts of closure, interior, and boundary of a set and of its complement are collected in the following result.

Proposition 78 Let A be a subset of R. Then,
(i) $\overline{A} = (\operatorname{Int}(A^c))^c$, hence $\operatorname{Int} A = (\overline{A^c})^c$.
(ii) $\operatorname{bdr} A = \operatorname{bdr}(A^c) = \overline{A} \cap \overline{A^c}$.
(iii) $\overline{A} = A \cup \operatorname{bdr} A = \operatorname{Int} A \cup \operatorname{bdr} A$.

Proof (i) It is enough to observe that F is a closed set such that A ⊂ F if, and only if, $F^c$ is an open set such that $F^c \subset A^c$. This proves the first statement in (i). The second follows by taking complements.
(ii) follows from (i). Indeed, $\operatorname{bdr} A = \overline{A} \setminus \operatorname{Int} A = \overline{A} \cap (\operatorname{Int} A)^c = \overline{A} \cap \overline{A^c}$, and $\operatorname{bdr}(A^c) = \overline{A^c} \setminus \operatorname{Int}(A^c) = \overline{A^c} \cap (\operatorname{Int}(A^c))^c = (\operatorname{Int} A)^c \cap \overline{A}$.
(iii) $\operatorname{Int} A \cup \operatorname{bdr} A \subset A \cup \operatorname{bdr} A \subset \overline{A}$. On the other hand, $\operatorname{Int} A \cup \operatorname{bdr} A = \operatorname{Int} A \cup (\overline{A} \setminus \operatorname{Int} A) = \operatorname{Int} A \cup (\overline{A} \cap (\operatorname{Int} A)^c) = (\operatorname{Int} A \cup \overline{A}) \cap (\operatorname{Int} A \cup (\operatorname{Int} A)^c) = \overline{A} \cap \mathbb{R} = \overline{A}$.

The following result contains the promised description of the closure, interior, and boundary of a given set in terms of “proximity,” i.e., of neighborhoods.

Proposition 79 Let A be a subset of R. Then
(i) $\overline{A}$ is the set of all points x ∈ R with the property that every neighborhood of x intersects A.
(ii) Int A is the set of all points x ∈ A with the property that some neighborhood of x is contained in A.
(iii) bdr A is the set of all points x ∈ R with the property that every neighborhood of x intersects both A and $A^c$.

Proof (i) Let x ∈ R be such that there exists a neighborhood U (that we can choose to be open) of x that does not intersect A, so $A \subset U^c$. The set $U^c$ is closed, hence $\overline{A} \subset U^c$. Since $x \notin U^c$, we get $x \notin \overline{A}$. Conversely, if $x \notin \overline{A}$, then $(\overline{A})^c$ is an open set (see Remark 74) that contains x and does not intersect A. (ii) follows from (i) here and (i) in Proposition 78. (iii) follows from (i) and (ii) here and (ii) in Proposition 78.

Proposition 80 The closure of a bounded subset of R is bounded.

Proof Let A be a bounded subset of R. We can then find M ∈ R such that A ⊂ [−M, M]. Since [−M, M] is a closed subset of R, we have $\overline{A} \subset [-M, M]$ (see Remark 74). This proves the statement.

Definition 81 Let A be a nonempty subset of R.
(i) A point a ∈ A is said to be isolated in A if there exists a neighborhood U of a such that U ∩ A = {a}.
(ii) A point x ∈ R is said to be an accumulation point of A if every neighborhood of x contains points in A \ {x}. Remark 82 Let A be a nonempty subset of R.
1. Observe that a point a ∈ A is isolated in A if, and only if, it is not an accumulation point of A. 2. Observe that a point x ∈ R is an accumulation point of a subset A of R if, and only if, every neighborhood of x contains infinitely many points of A. Indeed, if every neighborhood of x contains infinitely many points of A, then x is certainly an accumulation point of A. Conversely, if x ∈ R is an accumulation point of A, we may proceed inductively: Given a neighborhood U of x, take $n_1 \in N$ such that $(x - 1/n_1, x + 1/n_1) \subset U$. Find $x_1 \in A \cap (x - 1/n_1, x + 1/n_1)$ with $x_1 \neq x$. Find $n_2 > n_1$ such that $x_1 \notin (x - 1/n_2, x + 1/n_2)$ and find $x_2 \in A \cap (x - 1/n_2, x + 1/n_2)$ with $x_2 \neq x$. Continue in this way to find the infinite set $\{x_n : n \in N\}$ in A ∩ U. 3. Nowhere in the definition of an accumulation point is it said that the point x must belong to A. For example, let A := {1/n : n ∈ N}. The point x = 0 is an accumulation point of A, although 0 ∉ A. 4. Observe, too, that it is implicit in the definition that a set S, in order to have an accumulation point, must itself be infinite. It is worth mentioning that an infinite (even closed) subset of R may lack, in general, accumulation points. For example, N is an infinite subset of R with no accumulation points (see Exercise 13.73 and compare with Theorem 96). ® Theorem 83 The closure of a subset A of R is the union of the set of all isolated points in A and the set of all accumulation points of A. Proof Let $x \in \overline{A}$. If x ∈ A, then either x is isolated in A or every neighborhood of x contains points in A other than x, i.e., x is an accumulation point of A. If, on the contrary, $x \in \overline{A}$ and x ∉ A, then, by Proposition 79 (i), every neighborhood of x intersects A, hence x is an accumulation point of A. This shows that every point in $\overline{A}$ is either isolated in A or an accumulation point of A. On the other hand, if x is isolated in A then $x \in A\ (\subset \overline{A})$. If x is an accumulation point of A, then $x \in \overline{A}$ by Proposition 79 (i).
Theorem 63 shows, in particular, that any real number can be approximated, with a prescribed degree of accuracy, by rational numbers, something important in Numerical Analysis. The description of the closure of a set given in Proposition 79 (i) allows for another formulation of the same phenomenon: The closure of the set Q is the set R (the details are worked out in the proof of Proposition 85 below). This property of a subset of R is important enough to be singled out. Definition 84 A subset D of R is said to be dense in R if $\overline{D} = R$. Obviously, a finite subset F of R cannot be dense in R since it is closed, and so $\overline{F} = F$. However, there are countable subsets of R that are dense in R: The set Q is such an example. This and other examples of dense subsets of R appear in the statement of the next result. Proposition 85 The set Q of all rational numbers, the set P of all irrational numbers, and the set D of all dyadic numbers (see Definition 30), are dense in R.
Proof Let x ∈ R be arbitrary. Consider any neighborhood U of x. The set U has to contain $I_x(\varepsilon) := (x - \varepsilon, x + \varepsilon)$ for some ε > 0. Now, between the two distinct real numbers x − ε and x + ε there has to lie a rational number and an irrational number (see Theorem 63). This proves the first two statements. Regarding dyadic numbers, it is enough first to observe that given any real number r, we can find n ∈ N such that r < n (see Proposition 24). We proved in the paragraph preceding equation (1.3) that $n < 2^n$ for all n ∈ N. All together, this shows that given ε > 0, we can find n ∈ N such that $1/\varepsilon < 2^n$, and so $2^{-n} < \varepsilon$. Since consecutive dyadic numbers of the form $k/2^n$ are at distance $2^{-n}$ from each other, some dyadic number of the form $k/2^n$ belongs to $I_x(\varepsilon)$. Remark 86 The fact that R has a countable dense subset, as proved in Proposition 85, is summarized by saying that R is separable. Note that we found in Proposition 85 three different proper subsets of R, each of them dense in R (the first and third countable, and the first two mutually disjoint). Separability will be treated in the context of metric spaces in Sect. 6.6. ® Remark 87 Collecting some of the previous remarks on some particular subsets of R, let us point out the following facts that can be shown in a standard way. 1. N is a closed subset of R with no interior point. Thus, $\overline{N} = N$, Int N = ∅, and bdr N = N. 2. Q is a dense subset of R with no interior point. Thus, $\overline{Q} = R$, Int Q = ∅, and bdr Q = R. 3. The set P of all irrational numbers is a dense subset of R with no interior point. Thus, $\overline{P} = R$, Int P = ∅, and bdr P = R. 4. If D denotes the set of all dyadic numbers in R (see Definition 30), then D is a dense subset of R with no interior point. Thus, $\overline{D} = R$, Int D = ∅, and bdr D = R. 5. The closure of any of the sets [a, b], (a, b), (a, b], and [a, b), is the set [a, b], where a and b are two real numbers such that a < b. The interior of any of the previous sets is the set (a, b). The boundary of any of the previous sets is the set {a, b}. 6. The set {1/n : n ∈ N} has empty interior. Its closure is the set {1/n : n ∈ N} ∪ {0}, hence its boundary coincides with its closure. 7. Every finite subset of R is closed, has empty interior, and so the set coincides with its boundary. ®
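The dyadic part of the proof of Proposition 85 is constructive, and it may help to see it carried out. The following Python sketch is only an illustration under our own naming (the function `dyadic_in_interval` is hypothetical, not something defined in the text): it first looks for n with 2^(-n) < ε and then takes the largest k with k/2^n ≤ x. Exact rational arithmetic is used, so the input x is assumed to be rational.

```python
from fractions import Fraction
import math


def dyadic_in_interval(x, eps):
    """Return a dyadic number k/2**n with |x - k/2**n| < eps.

    Hypothetical helper: x and eps are assumed to be rationals
    (ints, Fractions, or strings accepted by Fraction), eps > 0.
    """
    x, eps = Fraction(x), Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = 0
    # Find n with 2**(-n) < eps; such n exists because n < 2**n.
    while Fraction(1, 2**n) >= eps:
        n += 1
    # Largest k with k/2**n <= x, so 0 <= x - k/2**n < 2**(-n) < eps.
    k = math.floor(x * 2**n)
    return Fraction(k, 2**n)
```

For instance, calling `dyadic_in_interval(Fraction(1, 3), Fraction(1, 100))` produces a dyadic number within 1/100 of 1/3; the denominator of the result is a power of 2, as the proof predicts.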
1.8.3 Topology on a Subset
In some contexts it is natural to restrict our work to a particular subset of R, and ignore what is outside. For example, notions like “open,” “closed,” “neighborhood,” “dense,” etc., may be considered just in the ambience of, say, the interval [0, 1], a natural thing to do in case we are dealing, e.g., with a function defined just there. In this circumstance, it will be not only inconvenient, but maybe even wrong, to
consider points out of the given set. Our “universe” will be just this given set. To be precise, let us consider the following definition. Definition 88 Let S be a nonempty subset of R. We say that a subset O of S is open relatively to S if there exists an open set U ⊂ R so that O = U ∩ S. Once we have this notion, all the other related notions are defined accordingly: for example, a subset F of S is closed relatively to S if S \ F is open relatively to S; a subset U of S is a neighborhood of a point x ∈ S relatively to S if U contains a set O open relatively to S and x ∈ O; a subset D of S is dense in S relatively to S if every nonempty open relatively to S subset of S intersects D. As a particular instance, observe that the set D of rational points in [0, 1] is dense in [0, 1] relatively to [0, 1]. Sometimes, if it would not cause any misunderstanding, we will just say that D is dense in [0, 1]. The same for other notions. It is simple to show that, if S and T are subsets of R, and S ⊂ T , the set S is open (closed) relatively to T , and T is open (respectively, closed) in R, then S is open (respectively, closed) in R. In general, a set can be open relatively to a certain superset and not open in R. For example, the set [0, 1] is open relatively to [0, 1], since [0, 1] = (−1, 2) ∩ [0, 1], and (−1, 2) is open in R. Certainly, [0, 1] is not open in R. Analogously, the set [0, 1/2) is open relatively to [0, 1], since [0, 1/2) = (−1, 1/2) ∩ [0, 1]. Later on, when dealing with metric spaces (Chap. 6), those concepts will find their natural place.
1.8.4 Compactness
We pass now to a series of results that deal with one of the most important and most beautiful concepts in Mathematics: Compactness. Let A be a bounded subset of R. By the definition it lies inside a closed interval, say [a, b]. This interval has, then, a finite length, precisely b − a (see Definition 35). Intuitively, if A is moreover infinite, it must “accumulate” around some point in [a, b], i.e., we should be able to find an element $x_0 \in [a, b]$ such that “many” points of A are close to $x_0$. This is an argument that has an important number of consequences, as we shall show later. For example, it allows us to show that “reasonable” real-valued functions attain their maximum and minimum values on “reasonable” sets, which is a crucial thing in Approximation and Optimization theories. A tool to formulate what is behind (the “compactness” of a closed and bounded interval) is the concept of “open cover.” Recall that a subfamily of a given family $\{A_i : i \in I\}$ of sets is a family $\{A_i : i \in J\}$, where J is a nonempty subset of I. A family $\mathcal{F}$ of subsets of a set S is said to be a cover of a subset A of S if $A \subset \bigcup_{F \in \mathcal{F}} F$. Definition 89 By an open cover of a set A ⊂ R we understand a family $\mathcal{O}$ (possibly infinite) of open subsets of R that covers A, i.e., whose union contains A. A subcover of the cover $\mathcal{O}$ is a subfamily of $\mathcal{O}$ that still covers A. We speak of a finite subcover if the subcover consists of a finite collection of sets.
Definition 90 A subset A of R is said to be compact if every open cover $\mathcal{O}$ of A has a finite subcover. Example 91 To check the compactness of the examples below we can rely on Theorem 96. However, we find it instructive to treat them just by using the definition of compactness. 1. Every finite subset F of R is compact. Indeed, given an open cover $\mathcal{O}$ of F, find, for each x ∈ F, an element $O_x \in \mathcal{O}$ such that $x \in O_x$. The (finite) family $\{O_x : x \in F\}$ is a subcover of the cover $\mathcal{O}$. 2. The set {1/n : n ∈ N} is not compact. Indeed, consider the open cover $\{I_n : n \in N\}$, where $I_n := (1/n, +\infty)$ for n ∈ N. If $\{I_{n_k} : k = 1, 2, \ldots, m\}$ is a finite subcover, and $n_0 := \max\{n_k : k = 1, 2, \ldots, m\}$, it is clear that $1/n_0$ is not in $\bigcup_{k=1}^{m} I_{n_k}$. 3. The interval (0, 1] is not compact. The open cover $\mathcal{O}$ of (0, 1] from which no finite subcover can be extracted is the same as in item 91.2 above, as well as the argument to prove the impossibility. 4. The following example begins to reveal the beauty of the concept of compactness: The set $C := \{0\} \cup \bigcup_{n=1}^{\infty} \{1/n\}$ is compact in R. Indeed, let $\mathcal{O}$ be an open cover of C. Then for some $O \in \mathcal{O}$ we have 0 ∈ O. Let ε > 0 be such that 0 ∈ (−ε, ε) ⊂ O. Then, for n > 1/ε (recall that the existence of such a natural number follows from Proposition 24) we have 1/n ∈ (−ε, ε) ⊂ O. For n ≤ 1/ε, find $O_n \in \mathcal{O}$ such that $1/n \in O_n$. Thus $\{O, O_n : n \le 1/\varepsilon\}$ is a finite subcover of $\mathcal{O}$ of the set C. ♦ One of the basic results in Analysis says that the compact subsets of R are exactly the sets that are simultaneously closed and bounded (see Theorem 96 below). In order to prove this result we shall proceed in the following way: first, we shall prove that every closed and bounded interval in R is compact (Theorem 92 below). Then, we shall prove (Lemma 93 below) that every closed subset of a compact set in R is itself compact. Conversely, we will show that every compact subset of R is simultaneously closed and bounded (Lemmas 94 and 95 below, respectively).
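The counting argument in Example 91.4 can be made concrete. The sketch below is an illustration under our own assumptions (the function name `points_needing_own_set` is hypothetical): given the radius ε of an interval (−ε, ε) contained in the set O that covers 0, it lists the finitely many points 1/n with n ≤ 1/ε, which are the only points of C that may need a further member of the cover.

```python
from fractions import Fraction
import math


def points_needing_own_set(eps):
    """List the points 1/n of C that lie outside (-eps, eps).

    Hypothetical helper for Example 91.4: these are exactly the
    points 1/n with n <= 1/eps, so there are finitely many of them.
    eps is assumed to be a positive rational.
    """
    eps = Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    n_max = math.floor(1 / eps)  # largest n with n <= 1/eps
    return [Fraction(1, n) for n in range(1, n_max + 1)]
```

With ε = 1/4 the list consists of 1, 1/2, 1/3, and 1/4, so a subcover with at most five members (O together with one set for each listed point) suffices for that choice of ε.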
Let us proceed along this path. The following is one of the most important basic results in Analysis. Its proof further develops the main idea in Example 91.4 above. Theorem 92 Any interval [a, b], where a and b are real numbers, and a ≤ b, is compact. Proof Assume, on the contrary, that there exists an open cover $\mathcal{O}$ of [a, b] that admits no finite subcover. Split the interval [a, b] into two consecutive closed subintervals, each of length (b − a)/2, i.e., [a, (a + b)/2] and [(a + b)/2, b]. Certainly, one of these subintervals, say $I_1$, cannot be covered by any finite subfamily of the cover $\mathcal{O}$. Split
the interval $I_1$ into two consecutive closed subintervals, each of length (b − a)/4; again, one of these subintervals, say $I_2$, cannot be covered by any finite subfamily of the cover $\mathcal{O}$. Note that $I_2 \subset I_1$. We continue this procedure and produce a sequence of closed intervals $\{I_1, I_2, \ldots\}$, where $I_{k+1} \subset I_k$ for all k ∈ N and none of them can be covered by a finite subfamily of $\mathcal{O}$. For each k ∈ N, the length of the interval $I_k$ is $(b - a)/2^k$. The intersection of these intervals is a single point x ∈ [a, b], due to Theorem 69. Consider a member O of the family $\mathcal{O}$ for which x ∈ O. Note that O is open, and thus we can choose an open interval I containing x, say (x − ε, x + ε) for some ε > 0, so that I ⊂ O. Choose $k_0 \in N$ such that $(b - a)/2^{k_0} < \varepsilon$ (see Propositions 15 and 85). Since $x \in I_{k_0}$, as $\{x\} = \bigcap_{k=1}^{\infty} I_k$, for every $y \in I_{k_0}$ we have |x − y| < ε. Thus y ∈ I (⊂ O) for every $y \in I_{k_0}$, hence $I_{k_0} \subset O$ and the subfamily of $\mathcal{O}$ consisting of the single set O is a finite subcover of the cover $\mathcal{O}$ for the interval $I_{k_0}$, a contradiction. Lemma 93 Let K be any compact subset of R. Then, every closed subset C of R contained in K is also compact. Proof Consider any open cover $\mathcal{O}$ of the set C. Since C is closed, $\mathcal{O} \cup \{C^c\}$ is an open cover of K. Due to the fact that K is compact, we can choose a finite subcover of $\mathcal{O} \cup \{C^c\}$, say $\{U_1, \ldots, U_n, C^c\}$, that covers K. Therefore, the family $\{U_1, \ldots, U_n\}$ is a finite subcover of $\mathcal{O}$ that covers C. It follows that the set C is compact. Lemma 94 Any compact subset K of R is closed. Proof We will show that the complement of K, i.e., the set $K^c$, is open. If $K^c$ is empty, we are done. Otherwise, fix an arbitrary $x \in K^c$. For each k ∈ K choose two open intervals $U_k$ and $V_k$ so that $U_k \cap V_k = \emptyset$, $x \in U_k$, and $k \in V_k$. The collection $\{V_k : k \in K\}$ is an open cover of K. Since K is compact there exist points $\{k_1, \ldots, k_n\}$ in K so that $\{V_{k_1}, \ldots, V_{k_n}\}$ is a finite subcover of $\{V_k : k \in K\}$ that covers K. Form the set $U := \bigcap_{i=1}^{n} U_{k_i}$ and note that $(x \in)\ U \subset K^c$. Indeed, $U \subset U_{k_i}$ for each i = 1, 2, . . ., n, and $U_{k_i} \cap V_{k_i} = \emptyset$. Thus $U \cap \bigl(\bigcup_{i=1}^{n} V_{k_i}\bigr) = \emptyset$. As $\{V_{k_i} : i = 1, 2, \ldots, n\}$ is a cover of K, we get U ∩ K = ∅, so $U \subset K^c$. The set U is a finite intersection of open sets, so it is an open neighborhood of x. Since this happens for every $x \in K^c$, the set $K^c$ is open, and so K is closed.
Lemma 95 Any compact subset K of R is bounded. Proof Put Ik := (k − 1, k + 1) for k ∈ K. Each set Ik is an open interval in R that contains k. The family O := {Ik : k ∈ K} is an open cover of K. It has, by definition, a finite subfamily that covers K. The union of the members of this subfamily is bounded, hence K is also bounded.
Fig. 1.13 “Catching” the point x0
We arrive now at the announced result. It is due to the German mathematician H. E. Heine and the French mathematician É. Borel. Theorem 96 (Heine–Borel) A subset of R is compact if, and only if, it is closed and bounded. Proof Let K be a compact subset of R. That K is closed and bounded follows, respectively, from Lemmas 94 and 95. Conversely, assume that K is a closed and bounded subset of R. Then we may find a closed interval [a, b] such that K ⊂ [a, b]. The interval [a, b] is compact (see Theorem 92), and thus K, by Lemma 93, is also compact. The following is a crucial result in Analysis. The technique behind its proof can be visualized by looking at Fig. 1.13. Theorem 97 Every infinite compact subset A of R has at least one accumulation point that belongs to A. Proof Since A is bounded (Lemma 95), we can find an interval $I_1 := [a, b]$ such that A ⊂ [a, b]. Halve this interval to obtain two adjacent closed intervals. At least one of them contains an infinite number of elements of A. Call this interval $I_2$. Halve it. At least one of the resulting intervals contains an infinite number of elements of A. Call this interval $I_3$. Continue in this way to obtain $\{I_n\}_{n=1}^{\infty}$. Use Theorem 69 to conclude that $\bigcap_{n=1}^{\infty} I_n = \{x_0\}$ for some $x_0 \in R$. Note that $x_0 \in \overline{A}$ since the lengths of the intervals $I_n$ approach zero, and any of them contains infinitely many points of A; due to the fact that A is closed (Lemma 94), we get $x_0 \in A$ (see Remark 74). It is clear that $x_0$ is an accumulation point of A. Remark 98 A consequence of Theorem 96 is that every nonempty compact subset K of R has a supremum S and an infimum s that belong to K. Indeed, the existence of S and s follows from the fact that K is bounded (see Theorem 45 and Remark 46.2). If S, say, does not belong to K, the fact that K is closed implies the existence of ε > 0 such that (S − ε, S + ε) ∩ K = ∅.
This violates that S = sup K, as it can be seen from the description of sup K given in Proposition 44, namely there is k ∈ K such that S − ε < k ≤ S. The argument for s is similar. ®
1.8.5 Connectedness and Related Concepts
The following result says that, in some sense, the open intervals of R are the building blocks for the open sets of R.
Proposition 99 Given any open set U ⊂ R there exists a countable pairwise disjoint family $\{I_n : n \in N\}$ of open intervals so that $U = \bigcup_{n=1}^{\infty} I_n$.
Proof Let x ∈ U. Consider the two sets $L_x := \{t \in U : (t, x] \subset U\}$ and $R_x := \{t \in U : [x, t) \subset U\}$. Since U is open, both $L_x$ and $R_x$ are nonempty. If $L_x$ is not bounded below, put $a_x := -\infty$ (this is just a symbol). Otherwise, put $a_x := \inf L_x$. Similarly, if $R_x$ is not bounded above, put $b_x := +\infty$ (again just a symbol). Otherwise, put $b_x := \sup R_x$. Note that $(a_x, b_x) \subset U$. Note, too, that if $a_x$ is finite, then $a_x \notin U$ (if $a_x \in U$, we would be able to “enlarge” the interval $(a_x, x]$ to the left while staying in U, since U is open). The same applies to $b_x$. Another important remark is that for any $y \in U \cap (a_x, b_x)$, we have $(a_y, b_y) = (a_x, b_x)$. This shows, in particular, that for any two arbitrary intervals $(a_x, b_x)$ and $(a_y, b_y)$, either $(a_x, b_x) = (a_y, b_y)$ or $(a_x, b_x) \cap (a_y, b_y) = \emptyset$. It follows, then, that the collection $\{(a_x, b_x) : x \in U\}$ is pairwise disjoint. Since each x ∈ U belongs to the corresponding $(a_x, b_x)$, the union of this collection is, precisely, the set U. All we have to show now is that $\{(a_x, b_x) : x \in U\}$ is a countable collection. This seems unlikely at first glance (there are many elements in U). However, keep in mind that many of the intervals in the family coincide. To prove the claim define the map F from the set Q ∩ U into the family $\{(a_x, b_x) : x \in U\}$ by $F(r) = (a_r, b_r)$, for r ∈ Q ∩ U. This mapping is onto, since each interval in the family contains a rational point, and this point is then in U. It is true that the mapping F is not (cannot be) one-to-one. However, we may choose, for each interval I in the family $\{(a_x, b_x) : x \in U\}$, a single element r ∈ Q ∩ I. We obtain a subset S of Q. Now, the restriction of the mapping F to S is one-to-one, and onto the family $\{(a_x, b_x) : x \in U\}$. Since S, as a subset of Q, is certainly countable (see Proposition 51), we get that our family $\{(a_x, b_x) : x \in U\}$ is also countable.
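When the open set U is given as a finite union of open intervals, the decomposition of Proposition 99 can be computed directly. The following Python sketch is only an illustration of that special case, under our own assumptions (the function name `components` and the representation of an interval (a, b) as a pair of numbers are hypothetical). Two intervals are merged only when they actually overlap; the intervals (0, 1) and (1, 2) stay separate, since the point 1 does not belong to their union.

```python
def components(intervals):
    """Write a finite union of open intervals as a union of pairwise
    disjoint open intervals (hypothetical helper).

    Each interval (a, b) is given as a pair with a < b; pairs with
    a >= b represent the empty set and are ignored.
    """
    nonempty = sorted((a, b) for a, b in intervals if a < b)
    merged = []
    for a, b in nonempty:
        # (a, b) meets the last component exactly when a < its right end
        if merged and a < merged[-1][1]:
            left, right = merged[-1]
            merged[-1] = (left, max(right, b))
        else:
            merged.append((a, b))
    return merged
```

For example, `components([(1, 3), (0, 2), (5, 6)])` returns `[(0, 3), (5, 6)]`.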
Remark 100 Note the following point highlighted in the proof above: Any pairwise disjoint family of open subsets of R must be countable. ® We now define a concept that allows to show that “reasonable” real-valued functions attain all values between two values that they attain (i.e., they satisfy the so-called “intermediate value property”). This is widely used in many areas in Mathematics, in particular in solving equations. Applications will be presented along the pages of this text. Definition 101 A set E ⊂ R is said to be disconnected if there exist two nonempty disjoint subsets U and V of E, each of them open relatively to E (see Definition 88), so that E = U ∪ V . If a set E is not disconnected then it is said to be connected.
Remark 102 Assume that U ⊂ E is open relatively to E. Then E \ U is closed relatively to E. So, an equivalent definition is that E is disconnected whenever there exists a nonempty subset U of E such that U ≠ E and U is simultaneously open and closed relatively to E. Observe that the set Q ∩ [0, 1] is disconnected. Indeed, the two sets $U := Q \cap (-\infty, \sqrt{2}/2) \cap [0, 1]$ and $V := Q \cap (\sqrt{2}/2, +\infty) \cap [0, 1]$ are both open relatively to Q ∩ [0, 1], disjoint and nonempty, and U ∪ V = Q ∩ [0, 1]. Another example of a disconnected subset of R is the set (0, 1) \ {1/2}, since it can be written as (0, 1/2) ∪ (1/2, 1). ® Proposition 103 No subset of R but R itself or ∅ may be simultaneously open and closed in R. In other words, R is connected. Proof Assume that a set A ⊂ R is simultaneously open (in R), closed (in R), nonempty and different from R. Use Proposition 99 to write $A = \bigcup_{n=1}^{\infty} I_n$, where, in order to avoid considering different cases, the pairwise disjoint open intervals $I_n$ in the family $\{I_n : n \in N\}$ may be eventually empty. Since $A^c$ is also open (and nonempty), the same proposition ensures that $A^c = \bigcup_{n=1}^{\infty} J_n$, again for a family of pairwise disjoint open intervals (maybe eventually empty). Let x ∈ A, and let $I := (a_x, b_x)$ be the associated interval (one of the intervals $I_n$) defined in the proof of Proposition 99. We cannot have at the same time $a_x = -\infty$ and $b_x = +\infty$. Assume, without loss of generality, that $a_x \in R$. We showed there that $a_x \notin A$, so $a_x \in A^c$, hence $a_x \in J_n$ for some n ∈ N. This is impossible, for $(a_x, b_x) \subset A$ would then have elements in $J_n$, hence in $A^c$. By a general interval in R we mean any interval of type bounded, or unbounded, or closed, or open, or half-closed in R. See also the first paragraph in Subsection 7.3.2. Corollary 104 Every general interval in R is a connected set, and, conversely, every connected subset of R is a general interval. Proof It is enough to prove the assertion for an open interval.
Indeed, if U and V are nonempty disjoint subsets of [a, b], each of them relatively open, and such that U ∪ V = [a, b], then U ∩ (a, b) and V ∩ (a, b) are relatively open in (a, b), disjoint, nonempty, and (U ∩ (a, b)) ∪ (V ∩ (a, b)) = (a, b). A similar argument applies to other type of intervals. The proof that (a, b) is connected is a direct consequence of the fact that R is connected (Proposition 103) and some properties of functions. We shall postpone this direct proof to the moment where the concept of a continuous function will be at hand (see Proposition 331). A proof that does not rely on this is the following: Assume that (a, b) = A ∪ B, where A and B are disjoint nonempty subsets of (a, b), both open relatively to (a, b). Since (a, b) is already open, the two sets A and B are open in R. Take a0 ∈ A and b0 ∈ B. Without loss of generality, we may assume that a0 < b0 . Let C := {x ∈ (a0 , b) : (a0 , x] ⊂ A}. This set is nonempty, due to the fact that A is
open. Let $a_1 := \sup C$ ($> a_0$). It exists since C is bounded. Note that $[a_0, a_1) \subset A$. If $a_1 \in A$, then, A being open, we may find $a_2 \in (a_1, b)$ such that $[a_1, a_2] \subset A$, hence $(a_0, a_2] \subset A$, contradicting the definition of $a_1$. Thus, $a_1 \in B$. Since B is open, we can find $b_1 \in B$, $b_1 < a_1$, such that $(b_1, a_1] \subset B$, and this contradicts again that for some $x \in (b_1, a_1]$ we have $(a_0, x] \subset A$. That the converse holds, i.e., that every connected subset of R is a general interval, is easy: assume that S is a connected subset of R. If S fails to be an interval, there exists $x_0 \in R \setminus S$ such that $S_1 := (-\infty, x_0) \cap S \neq \emptyset$, and $S_2 := (x_0, +\infty) \cap S \neq \emptyset$. Then $S = S_1 \cup S_2$, and $S_1 \cap S_2 = \emptyset$. Since $S_1$ and $S_2$ are two open relatively to S subsets of S, we reach a contradiction. Remark 105 The properties enjoyed by the family of all the open subsets of R in Proposition 71 characterize, in the abstract setting, families called topologies. To be precise, a topology on a nonempty set T is a family $\mathcal{T}$ of subsets of T having the following properties:
(O1) $\emptyset \in \mathcal{T}$.
(O2) $T \in \mathcal{T}$.
(O3) If $O_1 \in \mathcal{T}$ and $O_2 \in \mathcal{T}$, then $O_1 \cap O_2 \in \mathcal{T}$.
(O4) If I is an arbitrary nonempty index set and $O_i \in \mathcal{T}$ for every i ∈ I, then $\bigcup_{i \in I} O_i \in \mathcal{T}$.
Elements in the family $\mathcal{T}$ are called open sets. The couple $(T, \mathcal{T})$ is called a topological space. Many of the definitions and results in this section carry over to this more general setting. In general, there is no distance here to define the concept of open set. The starting point is the family of open sets, given from the beginning and subjected, only, to satisfy the four axioms above. In Chap. 6 we shall present the case in which the topology is given by a distance (also called a metric), giving rise to what shall be called in due course a metric space. Given two topological spaces $(T, \mathcal{T})$ and $(S, \mathcal{S})$, a one-to-one mapping f from T onto S such that $f(O) \in \mathcal{S}$ for every $O \in \mathcal{T}$ and $f^{-1}(U) \in \mathcal{T}$ for all $U \in \mathcal{S}$ is called a homeomorphism. From the purely topological point of view, spaces $(T, \mathcal{T})$ and $(S, \mathcal{S})$ are indistinguishable if there is such a homeomorphism. ® The elegant concept of a topology greatly simplified a big part of mathematics, distilling the essential facts about proximity, simplifying and unifying many proofs, and allowing for an application of general principles to several apparently unrelated areas of many disciplines. Pioneers were mathematicians like L. Euler, members of the French, Italian, German and Polish Schools, around names like H. Poincaré, M. Fréchet, R. Baire, V. Volterra, C. Arzelà, G. Ascoli, F. Hausdorff, D. Hilbert, G. Cantor, L. E. J. Brouwer, K. Kuratowski, H. Steinhaus, K. Borsuk, S. Banach, S. Mazur, and many others.
1.9 The Baire Category Theorem in R
In this section we fix an arbitrary nonempty closed subset F of R (the possibility that F = R not being excluded). Definition 88 introduced the concept of relatively open set (relatively to F), and thus all topological concepts introduced so far (i.e., closures, interiors, etc.) have a “relatively to F” version. For example, if A is a subset of F, the closure of A relatively to F is the smallest (relatively to F) closed subset of F that contains A. Since the terminology and the notation would be cumbersome if we repeated the clause “relatively to F” again and again, in this section all topological concepts (i.e., closures, neighborhoods, etc.) will be considered relatively to a given nonempty closed subset F of R unless stated otherwise. The reader is invited to consider that, for the time being, nothing outside F exists. Definition 106 A subset A of F is said to be nowhere dense if its closure $\overline{A}$ has an empty interior. For example, in F := R, the set N, or the set {1/n : n ∈ N}, are nowhere dense. We shall see many more examples later on. Obviously, a subset of a nowhere dense set is itself nowhere dense. Definition 107 A subset A of F is said to be of first category if it is a countable union of nowhere dense sets. If the set A is not of first category, it is said to be of second category. Clearly, a subset of a set of first category is itself of first category. Theorem 109 below is a deep result in real analysis. The reader will notice its importance growing along the following pages. This result is due to the French mathematician R. Baire. It will motivate the introduction of a class of topological spaces (the so-called Baire spaces, see Definition 638) that play an important role in modern Analysis.
Typically, Theorem 109 is often used in the following way: Imagine that one needs to construct an object that is determined by countably many conditions Gn , and we know that there are plenty of objects with the condition Gn for each n ∈ N. Then we can show that there are plenty of objects we look for. Note that this is an existence theorem: We ensure the existence of plenty of the sought objects without actually specifying any particular one. This situation is common in Mathematics, and we shall encounter it in many places in this text. In the proof of Theorem 109, we shall use the following simple result: Lemma 108 Let F be a closed subset of R. Let S be a subset of F . Assume that S is closed relatively to F . Then S is closed in R. Proof There exists a closed set C of R such that C ∩F = S. Note that the intersection of two closed sets in R is closed in R (Proposition 71). The result follows.
Fig. 1.14 The construction in the proof of Theorem 109 (closures of the sets Un in grey)
Given a bounded subset A of R, the diameter of the set A is the real number diam (A) := sup{d(x, y) : x, y ∈ A}. Theorem 109 (Baire Category Theorem) Let F be a nonempty closed subset of R. Let $\{G_n\}_{n=1}^{\infty}$ be a sequence of open dense subsets of F. Then $\bigcap_{n=1}^{\infty} G_n$ is dense in F. Proof Fix a nonempty open subset V of F. Since $G_1$ is dense in F, we have $G_1 \cap V \neq \emptyset$. We may find a nonempty open interval $U_1$ such that $\overline{U_1} \subset G_1 \cap V$ and $\operatorname{diam}(\overline{U_1}) < 1$. Since $G_2$ is dense in F, we have $G_2 \cap U_1 \neq \emptyset$. We may find a nonempty open interval $U_2$ such that $\overline{U_2} \subset G_2 \cap U_1$ and $\operatorname{diam}(\overline{U_2}) < 1/2$. Continue in this way to obtain a sequence $\{U_n\}$ of nonempty open intervals such that $V \supset \overline{U_1} \supset U_1 \supset \overline{U_2} \supset U_2 \supset \ldots \supset \overline{U_n} \supset U_n \supset \ldots$, $\overline{U_n} \subset G_n \cap U_{n-1}$, and $\operatorname{diam}(\overline{U_n}) < 1/n$ for n = 2, 3, . . . Note that $\overline{U_n}$ is the closure of $U_n$ relatively to F. However, since F is itself closed, $\overline{U_n}$ coincides with the closure of $U_n$ in R (see Lemma 108). This holds for every n ∈ N. By Theorem 69 we get $\bigcap_{n=1}^{\infty} \overline{U_n} = \{x\}$ for some $x \in \overline{U_1}$ (hence x ∈ V). Note too that $x \in \bigcap_{n=1}^{\infty} G_n$. This shows that $V \cap \bigcap_{n=1}^{\infty} G_n \neq \emptyset$. Since V is an arbitrary nonempty open subset of F we get that $\bigcap_{n=1}^{\infty} G_n$ is dense in F. (For an illustration of the construction in the proof, see Fig. 1.14. Even though the result here is 1-dimensional, a 2-dimensional picture is easier to read.) Remark 110 The condition of being open for the sets $G_n$ in the statement of Theorem 109 cannot be dropped. For an example, see Exercise 13.398. ® Several other formulations turn out to be equivalent to the statement of the Baire Category Theorem 109. We present the following version (Theorem 111 below), and we shall prove that it follows from Theorem 109. How to prove that Theorem 109 follows from Theorem 111 is the purpose of Exercise 13.82. Theorem 111 (Baire) Let F be a nonempty closed subset of R. Then, every nonempty open subset O of F is of second category in F.
Proof Let O be a nonempty open subset of F. Assume that O is of first category in F, i.e., $O = \bigcup_{n=1}^{\infty} A_n$, where each $A_n$ is nowhere dense in F. It follows easily that $G_n := (\overline{A_n})^c$ is open and dense in F. By Theorem 109, $\bigcap_{n=1}^{\infty} G_n$ $\bigl(= (\bigcup_{n=1}^{\infty} \overline{A_n})^c\bigr)$ is dense. However, $(\bigcup_{n=1}^{\infty} \overline{A_n})^c \cap O = \emptyset$, a contradiction. Remark 112 Theorem 111 applies to nonempty subsets of F that are open relatively to F. Note that the set F itself is open relatively to F, since F = R ∩ F, and the set R is open in R. Then Theorem 111 applies also to O = F. Of course, this means that F (an arbitrary nonempty closed subset of R) cannot be written as a countable union of subsets of F that are nowhere dense relatively to F. As an application, we recover the fact that [0, 1] is uncountable (see Theorem 59 and Proposition 61, where we proved a more precise statement). Indeed, were [0, 1] countable, then for at least one of its points x the interior (relatively to [0, 1]) of {x} would be nonempty. Since {x} is closed, this means that x would be isolated in [0, 1], and this is false. ® A particular case of Theorem 111, stated in the light of Remark 112, is worth mentioning. Corollary 113 Let F be a closed nonempty subset of R. Assume that $F = \bigcup_{n=1}^{\infty} F_n$, where each $F_n$ is a closed set. Then there exists n ∈ N such that $F_n$ has a nonempty interior relatively to F.
Chapter 2
Sequences and Series
This chapter deals with sequences, series, and products of real numbers, and the fundamental concept of convergence of these entities. We shall treat, too, approximation of real numbers by rational numbers, and we shall introduce the Euler number e.
2.1 Approximation by Rational Numbers
A good deal of the computational work with real numbers is done on subsets of rational numbers (there is no way to store in a computer an infinite sequence of digits, such as the decimal, or the binary, expansion of an irrational number). Fortunately, as we showed in Proposition 85, the set of rational numbers, as well as the set of dyadic numbers, is dense in the set of real numbers. This allows us, then, to approximate any real number by rational numbers or by dyadic numbers as closely as wished. A set of techniques to properly manage approximation is, certainly, needed. These, as part of what is known as “approximation theory”, have been well developed, and the reader certainly realizes their importance for dealing with the world around us.
Although this may seem a paradox, all exact science is dominated by the idea of approximation. Bertrand Russell
Le Calcul infinitésimal, [...], est l’apprentissage du maniement des inégalités bien plus que des égalités, et on pourrait le résumer en trois mots: MAJORER, MINORER, APPROCHER. (The infinitesimal calculus, [...], consists of learning the use of inequalities, rather than equalities themselves, and may be summarized in three actions: to SEARCH FOR UPPER BOUNDS, to SEARCH FOR LOWER BOUNDS, to APPROXIMATE.) Jean Dieudonné
© Springer International Publishing Switzerland 2015 V. Montesinos et al., An Introduction to Modern Analysis, DOI 10.1007/978-3-319-12481-0_2

Suppose we want to store the number 1/3 in a computer that is able to manage only dyadic numbers from the list $\{k/2^n : k = 0, 1, 2, \ldots, 2^n,\ n = 0, 1, 2, 3\}$. We find that the closest dyadic number to 1/3 in our list is the number 3/8 (= 0.375 in base 10). This may not be very satisfactory. Suppose now that we could store all dyadic numbers in a computer (something impossible). We still would not be able to store the number 1/3, since 1/3 is not a dyadic number (see Exercise 13.20).
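The search for the closest dyadic number in the list above can be checked mechanically. The following is a minimal sketch in Python using exact rational arithmetic; the function names `dyadic_list` and `closest` are ours and only serve this illustration.

```python
from fractions import Fraction

def dyadic_list(max_n):
    """All k / 2**n with 0 <= k <= 2**n and 0 <= n <= max_n, as exact fractions."""
    return {Fraction(k, 2**n) for n in range(max_n + 1) for k in range(2**n + 1)}

def closest(target, candidates):
    """Element of candidates nearest to target."""
    return min(candidates, key=lambda d: abs(d - target))

target = Fraction(1, 3)
best = closest(target, dyadic_list(3))
print(best, float(best), abs(best - target))  # 3/8 0.375 1/24
```

The error 1/24 cannot be removed by this list; enlarging `max_n` only reduces it, in agreement with the fact that 1/3 is not dyadic.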
Proposition 114 Let b ≥ 2 be a natural number, and let
$$b = p_1^{k_1} p_2^{k_2} \cdots p_s^{k_s} \tag{2.1}$$
be the (unique up to reordering) prime-number factorization of b (see Proposition 8). Then a proper fraction α has a finite expansion in the base b if, and only if, α = p/q, where p, q ∈ Z, q ≠ 0, and the prime-number factorization of q uses only prime numbers in (2.1).

Proof Assume that α has a finite expansion in the base b. Then, for some m ∈ N and integers $n_1, \ldots, n_m$,
$$\alpha = \frac{n_1}{b} + \frac{n_2}{b^2} + \cdots + \frac{n_m}{b^m} = \frac{n_1 b^{m-1} + n_2 b^{m-2} + \cdots + n_m}{b^m} = \frac{L}{p_1^{m k_1} \cdots p_s^{m k_s}},$$
where L ∈ Z. Conversely, assume that, for some p ∈ Z, and $l_i \in \mathbb{N} \cup \{0\}$ for i = 1, 2, ..., s, we have
$$\alpha = \frac{p}{p_1^{l_1} \cdots p_s^{l_s}}.$$
Choose $\{n_1, \ldots, n_s\}$ so that $p_1^{l_1} \cdots p_s^{l_s}\, p_1^{n_1} \cdots p_s^{n_s} = b^N$ for some natural number N. Clearly we have
$$\alpha = \frac{p\, p_1^{n_1} \cdots p_s^{n_s}}{b^N},$$
so the number α has a finite expansion in the base b.

As a particular case of Proposition 114, observe that fractions (between 0 and 1) that have finite decimal expansions are, precisely, those of the form p/q, for $q = 2^l 5^k$, where k, l ∈ N ∪ {0}.

Corollary 115 Consider a base b, where b is a prime number. Then a fraction α has a finite expansion in the base b if, and only if,
$$\alpha = \frac{p}{b^l}, \quad \text{where } p \in \mathbb{Z} \text{ and } l \in \mathbb{N} \cup \{0\}.$$
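The criterion of Proposition 114 is easy to test on examples. The sketch below (the name `has_finite_expansion` is ours, not the book's) strips from the reduced denominator every prime factor it shares with the base; the expansion is finite exactly when nothing but 1 remains.

```python
from fractions import Fraction
from math import gcd

def has_finite_expansion(alpha, b):
    """True if the reduced denominator of alpha uses only prime factors of b."""
    q = Fraction(alpha).denominator  # Fraction keeps alpha in lowest terms
    g = gcd(q, b)
    while g > 1:
        while q % g == 0:            # remove the common factor completely
            q //= g
        g = gcd(q, b)
    return q == 1

print(has_finite_expansion(Fraction(7, 40), 10))  # True: 40 = 2**3 * 5
print(has_finite_expansion(Fraction(1, 3), 10))   # False
print(has_finite_expansion(Fraction(1, 3), 3))    # True
```

Working with the greatest common divisor avoids factoring b into primes, which is all the criterion needs.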
Imagine the ideal (impossible) situation where we can store all fractions in a computer. Since Q is dense in R, for every irrational number x ∈ R there exists a fraction p/q so that |x − p/q| can be made arbitrarily small. There is a price we have to pay: the more accurate the approximation, the larger the denominator of the fraction. Increasing the size of the denominators of our fractions can be computationally costly; we then need to know how close we can get to an irrational number with a fixed size of the denominator. The two following results are due to the German mathematician J. P. G. Lejeune Dirichlet.

Lemma 116 (Dirichlet) Let θ ∈ R and t ∈ N. Then there exist integers p and q so that 0 < q ≤ t and
$$|q\theta - p| < \frac{1}{t}.$$

Proof For every k ∈ {0, ..., t} choose an integer $n_k$ so that $0 \le k\theta - n_k < 1$ and set $x_k = k\theta - n_k$. Split the interval [0, 1) into t intervals $I_1, I_2, \ldots, I_t$, each of them having length 1/t:
$$I_1 = \left[0, \frac{1}{t}\right), \quad I_2 = \left[\frac{1}{t}, \frac{2}{t}\right), \quad \ldots, \quad I_t = \left[\frac{t-1}{t}, 1\right).$$
Having t intervals and t + 1 numbers $\{x_k\}$, k ∈ {0, ..., t}, we conclude that at least one interval $I_j$ contains at least two numbers $x_k$ and $x_l$ for k ≠ l; we may assume k > l. Set q = k − l and $p = n_k - n_l$ and observe
$$|q\theta - p| = |(k - l)\theta - (n_k - n_l)| = |x_k - x_l| < \frac{1}{t}.$$
[...]
$$\left|\theta - \frac{p_k}{q_k}\right| > \frac{1}{Q}, \quad k = 1, 2, \ldots, n. \tag{2.4}$$
For this Q we may find, due to Theorem 117, an expression p/q that satisfies (2.2), i.e.,
$$\left|\theta - \frac{p}{q}\right| < \frac{1}{qQ} \le \frac{1}{Q},$$
so p/q is not in the list (2.3), a contradiction.
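The pigeonhole argument in the proof of Lemma 116 is constructive, and it may help to see it run. The following sketch assumes that θ is given as an exact `Fraction` (for instance, a rational approximation of an irrational number), so that no rounding interferes; the name `dirichlet_pair` is ours.

```python
from fractions import Fraction
from math import floor

def dirichlet_pair(theta, t):
    """Return integers (p, q) with 0 < q <= t and |q*theta - p| < 1/t."""
    seen = {}                          # interval index j -> (k, n_k)
    for k in range(t + 1):
        n_k = floor(k * theta)
        x_k = k * theta - n_k          # x_k lies in [0, 1)
        j = floor(x_k * t)             # x_k lies in [j/t, (j+1)/t)
        if j in seen:                  # two points in the same interval
            l, n_l = seen[j]
            return n_k - n_l, k - l    # here k > l, hence 0 < q <= t
        seen[j] = (k, n_k)
    raise AssertionError("unreachable: t + 1 points in t intervals")

theta = Fraction(14142135623730951, 10**16)   # a rational stand-in for sqrt(2)
p, q = dirichlet_pair(theta, 100)
print(p, q, float(abs(q * theta - p)))
```

The function returns the first collision it meets, so the pair found is not necessarily the best approximation with denominator at most t; it only realizes the bound of the lemma.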
® In Definition 39 we introduced the floor and ceiling functions, and denoted by fr(x) the fractional part of x, i.e., fr(x) := x − ⌊x⌋, for all x ∈ R. Observe that fr(x) ∈ [0, 1) for all x ∈ R.

Theorem 119 Let θ ∈ [0, 1]. Then the set $A = \{\mathrm{fr}(n\theta)\}_{n=1}^{\infty}$ is dense in [0, 1] if, and only if, θ is irrational.
Proof Let θ ∈ [0, 1] be a given irrational number. Let x ∈ [0, 1] be arbitrary and let ε ∈ (0, 1). Choose t ∈ N large enough so that 1/t < ε (Proposition 24). According to Lemma 116, there exist integers p and q, with q > 0, so that β := |qθ − p| < 1/t < ε. Since θ is not a rational number we have β ≠ 0. Note that we either have qθ − p = β or qθ − p = −β.

Assume first that qθ − p = β. Then fr(qθ) = β. Choose n ∈ N so that nβ < 1 and (n + 1)β > 1 (recall that β is irrational). Note that fr(kqθ) = kβ for all k ∈ {1, ..., n} and thus $|\mathrm{fr}(k_0 q\theta) - x| \le \beta < \varepsilon$ for some $k_0 \in \{1, \ldots, n\}$. The result then follows.

Assume now that qθ − p = −β. Then fr(qθ) = 1 − β. Choose n ∈ N so that nβ < 1 and (n + 1)β > 1. We have fr(kqθ) = 1 − kβ for all k ∈ {1, ..., n} and thus $|\mathrm{fr}(k_0 q\theta) - x| \le \beta < \varepsilon$ for some $k_0 \in \{1, \ldots, n\}$. This finishes the proof of this implication.

If, on the contrary, θ is a fraction, then the set A cannot be dense. This follows from the fact that if θ = p/q for some p ∈ Z, q ∈ N, q ≠ 0, then qθ ∈ Z, so fr((q + n)θ) ∈ {fr(θ), fr(2θ), fr(3θ), ..., fr(qθ)} for all n ∈ N; hence A is a finite set.

Taking for granted the continuity of the trigonometric functions sin x and cos x (see Sect. 5.2.5), Theorem 119 implies the density in the unit circle of the set {(cos nθ, sin nθ) : n ∈ N} whenever θ is an irrational multiple of π. See Exercise 13.86.
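Theorem 119 can be explored numerically. The sketch below is only an experiment in floating-point arithmetic, not a proof: it searches for the first n with fr(nθ) within ε of a target x. The names `fr` and `first_hit`, and the search bound `max_n`, are assumptions of this illustration.

```python
from math import floor, sqrt

def fr(x):
    """Fractional part of x, a number in [0, 1)."""
    return x - floor(x)

def first_hit(theta, x, eps, max_n=10**6):
    """Smallest n <= max_n with |fr(n*theta) - x| < eps, or None if there is none."""
    for n in range(1, max_n + 1):
        if abs(fr(n * theta) - x) < eps:
            return n
    return None

print(first_hit(sqrt(2) - 1, 0.5, 0.01))         # some n is found
print(first_hit(0.25, 0.1, 0.05, max_n=1000))    # None: fr(n/4) takes only 4 values
```

For the rational value θ = 1/4 the set A is {0, 1/4, 1/2, 3/4}, so targets far from these four points are never approached, however large `max_n` is taken.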
2.2 Sequences

2.2.1 Basics on Sequences
A sequence of real numbers is a mapping s from N into R. It is customary, instead of writing s(n) for the image of the element n ∈ N, to use just $s_n$. Then we represent a sequence as a list in the form $\{s_1, s_2, s_3, \ldots\}$, also denoted just by $\{s_n\}_{n=1}^{\infty}$. If there is no risk of misunderstanding, we will write $\{s_n\}$ instead.¹

¹ Note that we consider only infinite sequences, unless stated otherwise. This does not mean that the range of the sequence must be an infinite set. For example, a constant sequence {1, 1, 1, ...} is a perfectly acceptable sequence.
Definition 120 We say that a sequence $\{x_n\}_{n=1}^{\infty}$ of real numbers is bounded above (bounded below) (bounded) if the set $\{x_n : n \in \mathbb{N}\} \subset \mathbb{R}$ is bounded above (respectively, bounded below) (respectively, bounded).

The following are examples of sequences of real numbers:
$$\left\{\frac{1}{n}\right\}_{n=1}^{\infty} \left(= \left\{1, \frac{1}{2}, \frac{1}{3}, \ldots\right\}\right), \quad \{\sqrt[n]{n}\}_{n=1}^{\infty} \ (= \{1, \sqrt{2}, \sqrt[3]{3}, \ldots\}), \quad \{1, 0, 1, 0, 1, \ldots\}.$$

Definition 121 below is central in Analysis. It is a typical ε-versus-$n_0$ game. Let us first explain the main idea behind the mechanism. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in R. We want to convey the idea that it approaches some point x ∈ R, that the approximation turns out to be as good as we wish if we let the sequence “run”, and that after some moment, if the approximation is good, it will be good later on as well. All this may be expressed in the following quantitative way, which has a “two player game flavor”: Player 1 plays positive real numbers ε, and player 2 plays in response a natural number n having a previously fixed property. Player 1 wins (and then the sequence $\{x_n\}_{n=1}^{\infty}$ does not converge to x) if he/she can produce ε > 0 such that no answer n from player 2 fits. Player 2 wins (and then the sequence converges to x) if he/she has a strategy that produces n for any ε > 0 (the natural number n depends on ε).

More precisely: Player 1 starts by playing some positive number ε. Player 2, in response, plays $n_\varepsilon \in \mathbb{N}$ (we use this notation to stress that n depends on ε) in such a way that for all $n \ge n_\varepsilon$, $|x_n - x| < \varepsilon$. Player 1 plays another ε (he/she tries to beat player 2, i.e., to play so that player 2 cannot find the corresponding number $n_\varepsilon$, so he/she is interested in playing a small positive number). No way. The second player produces another $n_\varepsilon \in \mathbb{N}$ (probably much bigger than the first one) having the same property: for $n \ge n_\varepsilon$, $x_n$ is ε-close to x.
If the game continues forever, in the sense that no ε > 0 played by the first player can make the second player surrender (in other words, if the second player can always provide the $n_\varepsilon \in \mathbb{N}$ needed, that is, the second player has a “winning strategy”), we say that the sequence $\{x_n\}_{n=1}^{\infty}$ converges to x. Let us formalize this precisely as a definition, a most fundamental and truly ingenious one, essentially due to the Czech mathematician B. Bolzano and the French mathematician A. L. Cauchy.

Definition 121 A sequence $\{x_n\}_{n=1}^{\infty}$ in R is said to converge to a real number x if for every ε > 0 there exists $n_\varepsilon \in \mathbb{N}$ such that $|x_n - x| < \varepsilon$ for all n ∈ N, $n \ge n_\varepsilon$. If this is the case, we write
$$\lim_{n\to\infty} x_n = x, \quad \text{in short,} \quad \lim_n x_n = x, \quad \text{or} \quad \lim x_n = x, \quad \text{or even} \quad x_n \to x,$$
and we say that x is the limit of the sequence $\{x_n\}_{n=1}^{\infty}$, or, equivalently, that the sequence $\{x_n\}_{n=1}^{\infty}$ tends to x. A sequence $\{x_n\}_{n=1}^{\infty}$ that has a limit in R is said to be convergent. Otherwise, the sequence is said to be divergent.
Example 122 As a first example, let us define a sequence $\{x_n\}_{n=1}^{\infty}$ in the following way: put $x_n = 10^{-3}$ if n is odd, otherwise $x_n = 0$. The question is whether $\{x_n\}_{n=1}^{\infty}$ approaches 0. Player 1 plays ε = 1/2; player 2 replies with any $n_\varepsilon$. Player 1 plays 1/3. Player 2 again may play any $n_\varepsilon$. The game continues until player 1 plays $\varepsilon = 10^{-3}$. Then the game ends, since there is no choice of $n_\varepsilon \in \mathbb{N}$ that will satisfy $|x_n - 0| < 10^{-3}$ for all $n \ge n_\varepsilon$. This shows that the sequence $\{x_n\}_{n=1}^{\infty}$ does not converge to 0. ♦

Remark 123
1. Note that the statement $|x_n - x| < \varepsilon$ in Definition 121 is equivalent to $x_n \in (x - \varepsilon, x + \varepsilon)$.
2. A terminological remark: given a sequence $\{x_n\}_{n=1}^{\infty}$, we say that a property of the terms $x_n$ occurs eventually whenever there exists N ∈ N such that the property holds for all n ≥ N. We say that the property occurs frequently in case that, for every N ∈ N, there exists n ≥ N such that the property holds for $x_n$. ®

It is important to realize that the concept of limit of a sequence is not ambiguous. Precisely, we have the following result.

Proposition 124 If a sequence in R is convergent, its limit is unique.

Proof Assume that a sequence $\{x_n\}_{n=1}^{\infty}$ has two limits, say x and y. Then, given ε > 0, we can find $n_0$ and $n_1$ such that, for $n \ge n_0$, $|x - x_n| < \varepsilon/2$, and for $n \ge n_1$, $|y - x_n| < \varepsilon/2$. Then, for $n = \max\{n_0, n_1\}$, we have $|x - y| = |x - x_n + x_n - y| \le |x - x_n| + |x_n - y| < \varepsilon$. Since ε > 0 is arbitrary, we conclude that x = y.

Remark 125 The reader will find that computations concerning limits in many of the subsequent arguments give that a certain quantity is less than Kε, where K is a given fixed constant, and ε is an arbitrary number. Thanks to the arbitrariness of ε and the fact that K is a constant, fixed along the entire argument, the reader can safely replace Kε by ε and derive the conclusion.
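Thus, for example, $\lim x_n = 0$ if for every ε > 0 there exists $n_\varepsilon \in \mathbb{N}$ such that $|x_n| < 10\varepsilon$ for all n ∈ N, $n \ge n_\varepsilon$, and $\lim x_n = 0$ in turn implies that for every ε > 0 there exists $n'_\varepsilon \in \mathbb{N}$ such that $|x_n| < \varepsilon/6$ for all n ∈ N, $n \ge n'_\varepsilon$. ®

Proposition 126 We have
$$\lim_{n\to\infty} \frac{1}{n} = 0.$$

Proof Let ε > 0. Choose $n_\varepsilon \in \mathbb{N}$ so that $n_\varepsilon > 1/\varepsilon$ (this follows from Proposition 24), hence $1/n_\varepsilon < \varepsilon$. If $n \ge n_\varepsilon$ then $1/n \le 1/n_\varepsilon$, thus
$$\left|\frac{1}{n} - 0\right| \le \left|\frac{1}{n_\varepsilon} - 0\right| < \varepsilon, \quad \text{if } n \ge n_\varepsilon.$$
According to the definition, this means that 1/n → 0.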
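The two-player game behind Definition 121 can be played on a computer, at least over a finite window of indices. The sketch below gives player 2's reply for the sequence 1/n of Proposition 126 and shows player 1 winning in Example 122. The names `n_eps` and `holds_on_window`, and the window length, are assumptions of this illustration; a finite check can refute a reply, but of course it cannot prove convergence.

```python
from fractions import Fraction
from math import floor

def n_eps(eps):
    """Player 2's reply for x_n = 1/n: a natural number n_eps with 1/n_eps < eps."""
    return floor(1 / Fraction(eps)) + 1

def holds_on_window(x, limit, eps, n_start, length=1000):
    """Check |x(n) - limit| < eps for n_start <= n < n_start + length."""
    return all(abs(x(n) - limit) < eps for n in range(n_start, n_start + length))

eps = Fraction(1, 1000)
print(n_eps(eps))                                                    # 1001
print(holds_on_window(lambda n: Fraction(1, n), 0, eps, n_eps(eps)))  # True

# Example 122: x_n = 10**-3 for odd n, 0 otherwise; no reply survives eps = 10**-3.
x122 = lambda n: Fraction(1, 1000) if n % 2 else Fraction(0)
print(holds_on_window(x122, 0, eps, 10**6))                           # False
```

Exact fractions are used so that the strict inequality 1/n_eps < eps is not blurred by rounding.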
Example 127 The sequence $\{x_n\}_{n=1}^{\infty}$, where $x_n = (-1)^n$ for n ∈ N, is not convergent. Indeed, assume that it converges to some x ∈ R. Put ε = 1/2. Let $n_\varepsilon$ be
such that $|x_n - x| < \varepsilon = 1/2$ for every $n \ge n_\varepsilon$. If $n \ge n_\varepsilon$, then $|x_n - x_{n+1}| \le |x_n - x| + |x - x_{n+1}| < 2 \cdot \tfrac{1}{2} = 1$, which is not true since for every n ∈ N we have $|x_n - x_{n+1}| = 2$. ♦

It is important to note that all topological concepts on R (i.e., concepts like open or closed sets, neighborhoods, etc.) can be described by using sequences. This is a consequence of the fact that the closed subsets of R (and then their complements, the open subsets of R) can be characterized by sequences in the following way.

Proposition 128 A subset F of R is closed if, and only if, whenever a sequence $\{x_n\}_{n=1}^{\infty}$ in F converges to x ∈ R, then necessarily x ∈ F.

Proof If F is closed, then $F^c$ is open. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in F that converges to some x ∈ R. Assume that $x \in F^c$. There exists ε > 0 such that $(x - \varepsilon, x + \varepsilon) \subset F^c$. The sequence $\{x_n\}_{n=1}^{\infty}$ is in $(x - \varepsilon, x + \varepsilon)$ $(\subset F^c)$ for n big enough, a contradiction.

Assume now that F is not closed. Then $F^c$ is not open. Thus, there exists $x \in F^c$ such that $(x - 1/n, x + 1/n) \cap F \neq \emptyset$ for every n ∈ N. We can choose then $x_n \in (x - 1/n, x + 1/n) \cap F$ for n ∈ N. The sequence $\{x_n\}_{n=1}^{\infty}$ is in F and converges to x (∉ F). Indeed, fix ε > 0. Find $n_\varepsilon \in \mathbb{N}$ such that $1/n_\varepsilon < \varepsilon$ (this is possible thanks to Proposition 126, see also Proposition 24). For $n \ge n_\varepsilon$ we get $1/n \le 1/n_\varepsilon < \varepsilon$, hence $|x - x_n| < \varepsilon$ for $n \ge n_\varepsilon$ (see Remark 123.1). This shows that $x_n \to x$.

Proposition 129 Every convergent sequence in R is bounded.

Proof Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in R that converges to x. Fix ε = 1 and find $n_1 \in \mathbb{N}$ such that, if $n \ge n_1$, then $|x_n - x| < 1$. This gives $|x_n| \le |x| + 1$ for $n \ge n_1$. Thus $|x_n| \le \max\{|x_1|, \ldots, |x_{n_1}|, |x| + 1\}$ for all n ∈ N.

Remark 130 There is a notational device that may help to shorten some expressions. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in R. Assume that given any natural number N we can find $n_N \in \mathbb{N}$ such that $x_n \ge N$ for all $n \ge n_N$. In this case, we write $\lim_{n\to\infty} x_n = +\infty$. Observe that, according to Definition 121, the sequence $\{x_n\}_{n=1}^{\infty}$ does not converge (see also Proposition 129).
In a similar way, if given N ∈ N there exists $n_N \in \mathbb{N}$ such that $x_n \le -N$ for all $n \ge n_N$, we write $\lim_{n\to\infty} x_n = -\infty$, despite the fact that the sequence $\{x_n\}_{n=1}^{\infty}$ does not converge. ®

In order to prove Corollary 132 below, we will need a simple yet useful inequality, named after the Swiss mathematician J. Bernoulli.

Lemma 131 (Bernoulli’s inequality) Let x ∈ R and let n ≥ 2 be a positive integer. If x > −1 and x ≠ 0 we have
$$(1 + x)^n > 1 + nx. \tag{2.5}$$

Proof Given x ∈ R, x > −1, and x ≠ 0, we proceed by induction on n ∈ N, starting with n = 2. In this case we get
$$(1 + x)^2 = x^2 + 2x + 1 > 1 + 2x,$$
so (2.5) holds for n = 2. Assume now that the inequality holds for some n ≥ 2. Since (1 + x) > 0 and x ≠ 0, we have
$$(1 + x)^{n+1} = (1 + x)(1 + x)^n > (1 + x)(1 + nx) = 1 + (n + 1)x + nx^2 > 1 + (n + 1)x$$
and the inequality holds for n + 1. By the finite induction principle, we get that the inequality holds for every n ∈ N, n ≥ 2.

Corollary 132 For x ∈ R, the sequence $\{x^n\}_{n=1}^{\infty}$ converges if, and only if, x ∈ (−1, 1]. If |x| < 1, then $x^n \to 0$.

Proof If x = 1, the sequence $\{x^n\}_{n=1}^{\infty}$ obviously converges (to 1). If x = −1, the sequence does not converge (see Example 127). For another argument, see the example after Proposition 140. If x = 0, again the sequence is obviously convergent (to 0). Assume that 0 < |x| < 1. Then y := 1/|x| > 1. Write y = 1 + ε, where ε > 0. Then, by Lemma 131, we have $y^n = (1 + \varepsilon)^n > 1 + n\varepsilon$ for all n ∈ N, n ≥ 2. This shows that given k ∈ N there exists $n_0 \in \mathbb{N}$ such that $y^n > k$ for every $n \ge n_0$. Then $|x^n| = 1/y^n < 1/k$ for every $n \ge n_0$. Since, by Proposition 126, $\{1/k\}_{k=1}^{\infty} \to 0$, we obtain $|x^n| \to 0$, hence $x^n \to 0$. Finally, if |x| > 1, we have |x| = 1 + ε for some ε > 0. Again by Lemma 131, $|x^n| = |x|^n = (1 + \varepsilon)^n > 1 + n\varepsilon$, hence $\{x^n\}_{n=1}^{\infty}$ is unbounded, and so, by Proposition 129, $\{x^n\}_{n=1}^{\infty}$ does not converge.

Proposition 133 Consider two sequences $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$ in R that converge to limits A and B in R, respectively. Then we have
(i) $\lim_{n\to\infty} (x_n + y_n) = A + B$.
(ii) $\lim_{n\to\infty} x_n y_n = AB$.
(iii) $\lim_{n\to\infty} \frac{1}{x_n} = \frac{1}{A}$, provided $x_n \neq 0$ for all n ∈ N, and A ≠ 0.

Proof Let ε > 0. Let $n_\varepsilon \in \mathbb{N}$ be large enough so that, if $n \ge n_\varepsilon$, then $|x_n - A| < \varepsilon$ and $|y_n - B| < \varepsilon$. It follows that
$$|(x_n + y_n) - (A + B)| \le |x_n - A| + |y_n - B| < \varepsilon + \varepsilon = 2\varepsilon, \quad \text{for } n \ge n_\varepsilon.$$
Since ε > 0 is arbitrary, we conclude that $(x_n + y_n) \to A + B$, and this proves (i).

Recall now that every convergent sequence in R is bounded (see Proposition 129). In particular, there exists M > 0 such that $|x_n| \le M$ for all n ∈ N. Then, for $n \ge n_\varepsilon$,
$$|x_n y_n - AB| = |x_n y_n - x_n B + x_n B - AB| \le |x_n (y_n - B)| + |B(x_n - A)| \le M\varepsilon + |B|\varepsilon = \varepsilon(M + |B|).$$
Since ε > 0 is arbitrary, we conclude then (see Remark 125) that $x_n y_n \to AB$, and this proves (ii).
Let ε > 0 be such that ε < |A|/2. Observe that, for $n \ge n_\varepsilon$, $|x_n| = |A - (A - x_n)| \ge |A| - |A - x_n| \ge |A| - |A|/2 = |A|/2$. Then, for $n \ge n_\varepsilon$,
$$\left|\frac{1}{x_n} - \frac{1}{A}\right| = \frac{|x_n - A|}{|x_n A|} \le \frac{2|x_n - A|}{|A|^2} < \frac{2\varepsilon}{|A|^2}.$$
Since 0 < ε < |A|/2 is, otherwise, arbitrary, we conclude (see again Remark 125) that $1/x_n \to 1/A$.

Definition 134 We say a sequence $\{x_n\}_{n=1}^{\infty}$ is increasing (decreasing) if $x_n \le x_{n+1}$ for every n ∈ N (respectively, $x_n \ge x_{n+1}$ for every n ∈ N). We say that $\{x_n\}_{n=1}^{\infty}$ is strictly increasing (strictly decreasing) if $x_n < x_{n+1}$ for every n ∈ N (respectively, $x_n > x_{n+1}$ for every n ∈ N).

Theorem 135 Every increasing (decreasing) and bounded above (respectively, bounded below) sequence in R is convergent (to the supremum (respectively, infimum) of its values). If a sequence $\{x_n\}_{n=1}^{\infty}$ in R is increasing (decreasing) and unbounded, then we have $\lim_{n\to\infty} x_n = +\infty$ (respectively, $\lim_{n\to\infty} x_n = -\infty$), see Remark 130.

Proof Consider a sequence $\{x_n\}_{n=1}^{\infty}$ in R that is increasing and bounded above. The set $\{x_n : n \in \mathbb{N}\}$ is bounded above. Let $x := \sup\{x_n : n \in \mathbb{N}\}$. It exists thanks to Theorem 45. We shall show that
$$\lim_{n\to\infty} x_n = x.$$
To this end, let ε > 0 be an arbitrary positive number. By the definition of the supremum, there exists $n_\varepsilon$ so that $|x - x_{n_\varepsilon}| < \varepsilon$. Since the sequence is increasing we have $x_{n_\varepsilon} \le x_n \le x$ for all $n \ge n_\varepsilon$, hence
$$|x - x_n| = x - x_n \le x - x_{n_\varepsilon} = |x - x_{n_\varepsilon}| < \varepsilon$$
for all $n \ge n_\varepsilon$, and thus $\lim_{n\to\infty} x_n = x$. The situation in which the sequence is decreasing is similar.

If the sequence is increasing and unbounded, given r ∈ R there exists $n_r \in \mathbb{N}$ such that $x_{n_r} > r$. Since the sequence is increasing, $r < x_{n_r} \le x_n$ for all $n \ge n_r$, and this shows that $x_n \to +\infty$. The remaining case is treated similarly.

If a sequence $\{x_n\}_{n=1}^{\infty}$ in R is increasing and converges to some x ∈ R, we shall write $x_n \uparrow x$. Analogously, we shall write $x_n \downarrow x$ for a sequence $\{x_n\}_{n=1}^{\infty}$ that decreases and converges to x.

Remark 136 The property exhibited in Theorem 135 is equivalent to the completeness of R (see Theorem 1074). ®
Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in R. Assume that $\{x_n\}_{n=1}^{\infty}$ is bounded above. Put, for n ∈ N,
$$u_n := \sup\{x_m : m \ge n\}. \tag{2.6}$$
The sequence $\{u_n\}_{n=1}^{\infty}$ is decreasing. Assume now that $\{x_n\}_{n=1}^{\infty}$ is bounded below. Put, for n ∈ N,
$$l_n := \inf\{x_m : m \ge n\}. \tag{2.7}$$
The sequence $\{l_n\}_{n=1}^{\infty}$ is increasing.

Definition 137 Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in R. We define the limes superior (or limit superior) of the sequence $\{x_n\}_{n=1}^{\infty}$, denoted $\limsup_{n\to\infty} x_n$ (or just $\limsup_n x_n$ or even $\limsup x_n$ if no misunderstanding is expected), in the following way:
• If $\{x_n\}_{n=1}^{\infty}$ is not bounded above, put $\limsup_{n\to\infty} x_n := +\infty$.
• If $\{x_n\}_{n=1}^{\infty}$ is bounded above, the sequence $\{u_n\}_{n=1}^{\infty}$ defined in (2.6) is decreasing. Then put $\limsup_{n\to\infty} x_n := \lim_{n\to\infty} u_n$ (this limit exists and is finite by Theorem 135 if the sequence $\{u_n\}_{n=1}^{\infty}$ is bounded below. Otherwise, it is −∞ according to the convention in Remark 130).

We define the limes inferior (or limit inferior) of the sequence $\{x_n\}_{n=1}^{\infty}$, denoted $\liminf_{n\to\infty} x_n$ (or just $\liminf_n x_n$ or even $\liminf x_n$ if no misunderstanding is expected), in the following way:
• If $\{x_n\}_{n=1}^{\infty}$ is not bounded below, put $\liminf_{n\to\infty} x_n := -\infty$.
• If $\{x_n\}_{n=1}^{\infty}$ is bounded below, the sequence $\{l_n\}_{n=1}^{\infty}$ defined in (2.7) is increasing. Then put $\liminf_{n\to\infty} x_n := \lim_{n\to\infty} l_n$ (this limit exists and is finite by Theorem 135 if the sequence $\{l_n\}_{n=1}^{\infty}$ is bounded above. Otherwise, it is +∞ according to the convention in Remark 130).

From the definition of $\{u_n\}_{n=1}^{\infty}$ and $\{l_n\}_{n=1}^{\infty}$ (see (2.6) and (2.7), respectively) it follows immediately that for a bounded sequence $\{x_n\}$ we have $l_n \le u_n$ for every n ∈ N. Since $\{l_n\}$ is increasing and $\{u_m\}$ is decreasing, we have, for n ≤ m, $l_n \le l_m \le u_m \le u_n$, so $l_n \le u_m$ for all n ≤ m. Letting m → ∞ we get $l_n \le \limsup_{m\to\infty} x_m$. This happens for every n ∈ N, hence $\liminf_{n\to\infty} x_n \le \limsup_{n\to\infty} x_n$. Of course, the same is true for unbounded (above, below, or both) sequences.
Example 138 For the unbounded above sequence {1, 2, 3, ...} we have $\liminf x_n = \limsup x_n = +\infty$. For the unbounded below sequence {−1, −2, −3, ...} we have $\liminf x_n = \limsup x_n = -\infty$. The sequence {1, −2, 3, −4, 5, −6, ...} has $\liminf x_n = -\infty$, $\limsup x_n = +\infty$. For other examples see Proposition 140 below and the sequence after its proof. ♦

Remark 139 For a characterization of $\limsup x_n$ and $\liminf x_n$ that may help to understand their nature, formulated in terms of subsequences (Definition 143 below), see Exercise 13.102. ®
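The tail suprema $u_n$ and tail infima $l_n$ of (2.6) and (2.7) involve infinitely many terms, so a computer can only look at a finite window of each tail. The sketch below does exactly that for the sequence $x_m = (-1)^m (1 + 1/m)$, which is our own choice of example; for this particular sequence the supremum and the infimum of each tail are attained at one of its first two terms, so the finite window gives the true values. The name `tail_sup_inf` and the parameter `horizon` are assumptions of this illustration.

```python
from fractions import Fraction

def tail_sup_inf(x, n, horizon):
    """max and min of x(m) for n <= m <= horizon: finite stand-ins for u_n and l_n."""
    window = [x(m) for m in range(n, horizon + 1)]
    return max(window), min(window)

x = lambda m: (-1) ** m * (1 + Fraction(1, m))

for n in (1, 10, 100):
    u_n, l_n = tail_sup_inf(x, n, 1000)
    print(n, u_n, l_n)   # u_n decreases towards 1, l_n increases towards -1
```

In general a finite window may miss the behaviour of the tail entirely, so such experiments suggest the values of lim sup and lim inf but never establish them.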
Proposition 140 A bounded sequence $\{x_n\}_{n=1}^{\infty}$ converges if, and only if,
$$\liminf_{n\to\infty} x_n = \limsup_{n\to\infty} x_n.$$
If this is the case, the common value $\limsup x_n = \liminf x_n$ coincides with $\lim x_n$.

Proof Let $\{u_n\}_{n=1}^{\infty}$ and $\{l_n\}_{n=1}^{\infty}$ be the sequences defined in (2.6) and (2.7), respectively. Assume first that $\{x_n\}$ converges. Let x be its limit. Fix ε > 0. Then find N ∈ N such that $|x - x_n| < \varepsilon$ for every n ≥ N. Thus, $u_n \le x + \varepsilon$ and $l_n \ge x - \varepsilon$ for all n ≥ N. This shows that $x - \varepsilon \le \liminf x_n \le \limsup x_n \le x + \varepsilon$. Since ε > 0 was arbitrary, we obtain $\liminf x_n = \limsup x_n = x$.

Assume now that $\liminf x_n = \limsup x_n = x$. Given ε > 0 there exists N ∈ N such that $x - \varepsilon < l_n \le x \le u_n < x + \varepsilon$ for all n ≥ N. Since $l_n \le x_n \le u_n$, this shows that $x_n < x + \varepsilon$ and, analogously, $x - \varepsilon < x_n$, for all n ≥ N. Putting the two inequalities together, we get $x - \varepsilon < x_n < x + \varepsilon$ for all n ≥ N. This shows that $\lim x_n$ exists and coincides with x.

As an example, observe that
$$\limsup_{n\to\infty} (-1)^n = 1 \quad \text{and} \quad \liminf_{n\to\infty} (-1)^n = -1.$$
In particular, the sequence {1, −1, 1, −1, ...} does not converge, as follows from Proposition 140 (see also Example 127).

As an application of the notion of convergent sequence, we can reformulate (i) in Proposition 79 in the following way. Note the similarities between the proof of Proposition 128 and the next one.

Proposition 141 If A ⊂ R, then
$$\overline{A} = \{x \in \mathbb{R} : \text{there exists a sequence } \{x_n\}_{n=1}^{\infty} \text{ in } A \text{ with } \lim_{n\to\infty} x_n = x\}.$$

Proof If A is empty there is nothing to prove. Assume then that A ≠ ∅. Let $x \in \overline{A}$. It follows from (i) in Proposition 79 that, for all n ∈ N, we have $(x - 1/n, x + 1/n) \cap A \neq \emptyset$. If $x_n \in (x - 1/n, x + 1/n) \cap A$ for n ∈ N, we get a sequence $\{x_n\}_{n=1}^{\infty}$ in A that converges to x.

If there exists a sequence $\{x_n\}_{n=1}^{\infty}$ in A that converges to some x ∈ R, every neighborhood U of x contains eventually the elements $x_n$ of the sequence, hence U intersects A. Again by (i) in Proposition 79, we get that $x \in \overline{A}$.

Corollary 142 Every real number is the limit of a sequence of rational numbers.

Proof Recall that $\overline{\mathbb{Q}} = \mathbb{R}$ (Proposition 85). The result follows from Proposition 141.
2.2.2 Two Particular Sequences: Arithmetic and Geometric Progressions

Arithmetic progressions and geometric progressions are particular kinds of sequences often encountered in mathematics.
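Arithmetic Progressions

Let us consider the sequence {1, 2, 3, ...} consisting of all natural numbers. For n ∈ N put
$$s_n := \sum_{k=1}^{n} k = 1 + 2 + \ldots + n. \tag{2.8}$$
In order to calculate $s_n$, use the following diagram:
$$\begin{array}{rcccccccccccc}
s_n & = & 1 & + & 2 & + & 3 & + & \ldots & + & (n-1) & + & n \\
s_n & = & n & + & (n-1) & + & (n-2) & + & \ldots & + & 2 & + & 1 \\
\hline
2s_n & = & (1+n) & + & (1+n) & + & (1+n) & + & \ldots & + & (1+n) & + & (1+n) \\
 & = & \multicolumn{11}{l}{(1+n)n.}
\end{array} \tag{2.9}$$
It follows that
$$\sum_{k=1}^{n} k = \frac{1+n}{2}\, n, \quad \text{for all } n \in \mathbb{N}. \tag{2.10}$$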
This can be proved also by finite induction (see Exercise 13.111).

Now assume that $a_1$, r ∈ R are given. The sequence $\{a_n\}_{n=1}^{\infty}$, where $a_{n+1} := a_n + r$ for all n ∈ N, is called an arithmetic progression. Observe that $a_n = a_1 + (n - 1)r$ for all n ∈ N (something that can be also proved easily by finite induction). Note, too, that an argument similar to the one used in (2.9) gives immediately
$$\sum_{k=1}^{n} a_k = \frac{a_1 + a_n}{2}\, n, \quad \text{for all } n \in \mathbb{N}. \tag{2.11}$$
Again, this can be proved by induction (see Exercise 13.111).
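Formulas (2.10) and (2.11) can be compared with a direct summation for a few values. The sketch below is only a sanity check, not a substitute for the induction proof; the function name `arithmetic_sum` is ours.

```python
from fractions import Fraction

def arithmetic_sum(a1, r, n):
    """Closed form (2.11): (a_1 + a_n) / 2 * n, where a_n = a_1 + (n - 1) * r."""
    a_n = a1 + (n - 1) * r
    return Fraction(a1 + a_n) * n / 2

# (2.10) is the case a_1 = 1, r = 1.
print(arithmetic_sum(1, 1, 100))                      # 5050
# Direct summation of a_k = 3 + (k - 1)/2 for k = 1, ..., 10.
direct = sum(3 + k * Fraction(1, 2) for k in range(10))
print(direct == arithmetic_sum(3, Fraction(1, 2), 10))  # True
```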
Geometric Progressions

Associated with the legend of the birth of the chess game is the story that its inventor asked for a reward to be paid with wheat grains, 1 at the first square, 2 at the second, 4 at the third, 8 at the fourth, each time doubling the number. There are 64 squares. The resulting sequence to be summed up is $\{2^0, 2^1, 2^2, 2^3, \ldots\}$. This is an example of a geometric progression.

Fix r ∈ R, r ≠ 1. The sequence
$$\{1, r, r^2, \ldots, r^n, r^{n+1}, \ldots\} \tag{2.12}$$
is called a geometric progression of ratio r. For n ∈ N, put $s_n := 1 + r + r^2 + \ldots + r^n$. Observe that $(1 + r + r^2 + \ldots + r^n)(1 - r) = 1 - r^{n+1}$. This shows that
$$s_n = \frac{1 - r^{n+1}}{1 - r}, \quad \text{for } r \in \mathbb{R},\ r \neq 1. \tag{2.13}$$
The introductory paragraph asks for computing $s_{63}$ for r = 2. The answer is then $2^{64} - 1$, i.e., 18,446,744,073,709,551,615, roughly $1.84 \times 10^{19}$.
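The number of grains on the chessboard is small enough to be computed exactly, both from the closed form (2.13) and by adding the 64 terms. A minimal sketch follows; the function name `geometric_sum` is ours.

```python
from fractions import Fraction

def geometric_sum(r, n):
    """Closed form (2.13) for 1 + r + ... + r**n, valid for r != 1."""
    if r == 1:
        raise ValueError("formula (2.13) requires r != 1")
    r = Fraction(r)
    return (1 - r ** (n + 1)) / (1 - r)

grains = geometric_sum(2, 63)
print(grains)                                   # 18446744073709551615
print(grains == sum(2**k for k in range(64)))   # True
```

Python integers and fractions have arbitrary precision, so no rounding occurs here; with floating-point numbers the last digits of the result would be lost.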
2.3 More on Sequences
The sequence $\{x_n\}_{n=1}^{\infty} := \{0, 1, 0, 1, 0, 1, 0, \ldots\}$ does not converge. This follows from Proposition 140: indeed, it is enough to observe that $u_n = 1$ for all n ∈ N, where $u_n$ was defined by equation (2.6) above, and $l_n = 0$ for all n ∈ N, where $l_n$ was defined by equation (2.7) above. Thus, $\liminf_{n\to\infty} x_n = 0$ and $\limsup_{n\to\infty} x_n = 1$. However, some of the terms in the sequence, infinitely many of them (like {0, 0, 0, ...}), form a sequence that converges. The following definition gives a name to a certain kind of “extraction”.

Definition 143 By a subsequence of a sequence $\{x_n\}_{n=1}^{\infty}$ we understand a sequence $\{x_{n_k}\}_{k=1}^{\infty}$, where $\{n_k\}_{k=1}^{\infty}$ is a strictly increasing sequence of natural numbers, i.e., $n_1 < n_2 < n_3 < \ldots$

For example, the sequence $\{\frac{1}{2n}\}_{n=1}^{\infty}$ is a subsequence of the sequence $\{\frac{1}{n}\}_{n=1}^{\infty}$. The sequence {0, 0, 0, ...} is a subsequence of the sequence {0, 1, 0, 1, 0, 1, 0, ...}. A trivial observation is that every subsequence of a sequence that converges to, say, x, is again convergent (to the same limit x).
Definition 144 Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in R. We say that x ∈ R is a cluster point of the sequence $\{x_n\}_{n=1}^{\infty}$ if, for any neighborhood U of x and for every N ∈ N we may find n > N such that $x_n \in U$.

Remark 145 The concept of an accumulation point of a subset of R was introduced in Definition 81. There is a close connection between being a cluster point of a sequence $\{x_n\}_{n=1}^{\infty}$ and being an accumulation point of the set $\{x_n : n \in \mathbb{N}\}$. However, the relationship can be more subtle than appears at first glance. For example, it is clear that any accumulation point x of the set $\{x_n : n \in \mathbb{N}\}$ is a cluster point of the sequence $\{x_n\}_{n=1}^{\infty}$, since any neighborhood of x must contain an infinite number of elements in $\{x_n : n \in \mathbb{N}\}$ (see Remark 82.2). However, recall that no finite set may have an accumulation point (see Remark 82.2). The point 0 is a cluster point of the sequence {1, 0, 1, 0, 1, 0, 1, ...} according to Definition 144 (and so is the point 1; there are no more cluster points). However, the set $\{x_n : n \in \mathbb{N}\}$ (= {0, 1}) is finite, hence it has no accumulation point at all. ®

Proposition 146 If a sequence $\{x_n\}_{n=1}^{\infty}$ has a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ that converges to some point x, then x is a cluster point of the whole sequence $\{x_n\}_{n=1}^{\infty}$. Conversely, if x is a cluster point of the sequence $\{x_n\}_{n=1}^{\infty}$, then there exists a subsequence of $\{x_n\}_{n=1}^{\infty}$ that converges to x.

Proof Assume that $\{x_{n_k}\}_{k=1}^{\infty}$ is a subsequence of $\{x_n\}_{n=1}^{\infty}$ that converges to some x ∈ R. Given ε > 0, there exists $k_0 \in \mathbb{N}$ such that $\{x_{n_k} : k \ge k_0\} \subset (x - \varepsilon, x + \varepsilon)$. Fix N ∈ N. By the definition of a subsequence, we can find k ∈ N such that $k \ge k_0$ and simultaneously $n_k > N$. Thus, $x_{n_k} \in (x - \varepsilon, x + \varepsilon)$. Since ε > 0 and N ∈ N were arbitrary, we get that x is a cluster point of $\{x_n\}_{n=1}^{\infty}$.

Conversely, assume that x is a cluster point of $\{x_n\}_{n=1}^{\infty}$. There exists an infinite subset $N_1$ of N such that $\{x_n : n \in N_1\} \subset (x - 1, x + 1)$. Choose $n_1 \in N_1$. There exists an infinite subset $N_2$ of N such that $\{x_n : n \in N_2\} \subset (x - 1/2, x + 1/2)$. Since $N_2$ is infinite we may choose $n_2 \in N_2$ such that $n_2 > n_1$. There exists an infinite subset $N_3$ of N such that $\{x_n : n \in N_3\} \subset (x - 1/3, x + 1/3)$. Since $N_3$ is infinite we may choose $n_3 \in N_3$ such that $n_3 > n_2$. Continue in this way. The subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ clearly converges to x.

Given a sequence $\{x_n\}_{n=1}^{\infty}$ (maybe not convergent) it is natural to consider the family of all convergent subsequences (if any), and their limits. At least, if $\{x_n\}_{n=1}^{\infty}$ is bounded, there are some. This is the content of the following important result, due to the aforementioned B. Bolzano and the German mathematician K. Weierstrass.
Theorem 147 (Bolzano–Weierstrass) Any bounded sequence in R has a convergent subsequence.

Proof Let $\{x_n\}_{n=1}^{\infty}$ be a bounded sequence in R. If the set $A := \{x_n : n \in \mathbb{N}\}$ is finite, then there is a constant subsequence, since at least one of the terms repeats infinitely many times. Assume now that A is infinite. The set $\overline{A}$ is closed and bounded (see Proposition 80), and certainly infinite; hence it has an accumulation point $x_0 \in \overline{A}$ (Theorem 97). Obviously, $x_0$ is a cluster point of $\{x_n\}_{n=1}^{\infty}$ (see Remark 145). Thus, it is enough to apply Proposition 146 to obtain a subsequence of $\{x_n\}_{n=1}^{\infty}$ that converges to $x_0$.
Corollary 148 Any nested sequence of nonempty compact subsets of R has a nonempty intersection.

Proof Let $\{K_n\}_{n=1}^{\infty}$ be a nested sequence of nonempty compact subsets of R (i.e., for n ∈ N we have $K_{n+1} \subset K_n$). For n ∈ N, pick $x_n \in K_n$. The sequence $\{x_n\}_{n=1}^{\infty}$ so obtained is bounded (indeed, it is contained in the bounded set $K_1$, and we can use Lemma 95), so it has, by Theorem 147, a convergent subsequence $\{x_{n_k}\}_{k=1}^{\infty}$. Its limit x clearly belongs to $\bigcap_{n=1}^{\infty} K_n$.

Theorem 147 is one of the key results in real analysis. Many great theorems follow from it. To stress the connection with the concept of compactness and, in particular, with Theorem 96, let us prove that compactness may be characterized, in fact, by convergence of subsequences.

Theorem 149 A subset K of R is compact if and only if every sequence in K has a subsequence that converges to a point in K.

Proof Assume that K is compact. Then it is, by Theorem 96, closed and bounded. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in K. By Theorem 147, it has a subsequence that converges to some x ∈ R. In fact, x ∈ K since K is closed (see Proposition 128).

Assume now that K is not compact. Then, by Theorem 96, K cannot be simultaneously bounded and closed. Assume that K is not bounded. We can choose then a sequence $\{x_n\}_{n=1}^{\infty}$ in K such that $|x_n| \ge n$ for all n ∈ N. By Proposition 129, this sequence does not converge, and none of its subsequences converge. Assume now that K is not closed. Then, by Proposition 128, there exists a sequence $\{x_n\}_{n=1}^{\infty}$ in K that converges to an element x ∈ R, x ∉ K. No subsequence of $\{x_n\}_{n=1}^{\infty}$ converges then to an element in K, as all of them converge to x ∉ K.

The following definition is crucial in analysis.

Definition 150 A sequence $\{x_n\}_{n=1}^{\infty}$ in R is said to be a Cauchy sequence if for every ε > 0 there exists $n_\varepsilon \in \mathbb{N}$ such that $|x_m - x_n| < \varepsilon$ for all $m, n \ge n_\varepsilon$.

This concept is named after the French mathematician A. L. Cauchy.

Proposition 151 Every Cauchy sequence in R is bounded.

Proof Let $\{x_n\}_{n=1}^{\infty}$ be a Cauchy sequence. Then, there exists $n_0 \in \mathbb{N}$ such that $|x_n - x_m| < 1$ for all $n, m \ge n_0$. In particular, $|x_n - x_{n_0}| < 1$ for all $n \ge n_0$, hence $|x_n| < |x_{n_0}| + 1$ for all $n \ge n_0$. Put $M := \max\{|x_n|, |x_{n_0}| + 1 : n \in \mathbb{N}, n < n_0\}$. Then $|x_n| \le M$ for all n ∈ N.

Theorem 152 A sequence in R converges if, and only if, it is a Cauchy sequence.

Proof Let $\{x_n\}_{n=1}^{\infty}$ be a convergent sequence and put $\lim_{n\to\infty} x_n = L$. Let ε > 0 be given and choose $n_\varepsilon$ so that $|L - x_n| < \varepsilon/2$ for all $n \ge n_\varepsilon$. We have $|x_m - x_n| \le |x_m - L| + |L - x_n| < \varepsilon$ for any $m, n \ge n_\varepsilon$. So the sequence is Cauchy.
Conversely, assume that the sequence $\{x_n\}_{n=1}^{\infty}$ is Cauchy. By Proposition 151, the sequence is bounded. Then, by Theorem 147, there exists a convergent subsequence $\{x_{n_k}\}_{k=1}^{\infty}$. Let L be its limit. Therefore, given ε > 0 there exists $k_0 \in \mathbb{N}$ such that $|L - x_{n_k}| < \varepsilon$ for every $k \ge k_0$. Moreover, there exists $n_0 \in \mathbb{N}$ such that $|x_n - x_m| < \varepsilon$ for every $n, m \ge n_0$. Find k ∈ N such that $k \ge k_0$ and $n_k \ge n_0$. Then, for $n \ge n_0$, $|x_n - L| = |x_n - x_{n_k} + x_{n_k} - L| \le |x_n - x_{n_k}| + |x_{n_k} - L| < 2\varepsilon$. This proves that $x_n \to L$.

A great advantage of the Cauchy property of a sequence is that, according to Theorem 152, for checking its convergence we do not need to have a formula (nor even a guess) for the limit.

Remark 153 The sufficient condition in Theorem 152 (i.e., that every Cauchy sequence in R converges) is equivalent to the completeness of R (see Theorem 1074). If the space we were living in were the set Q of all rational numbers, then any sequence $\{x_n\}_{n=1}^{\infty}$ in Q converging to $\sqrt{2}$ would be Cauchy in Q and not convergent in Q. A similar situation holds true for the set of all irrational numbers. ®
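Remark 153 can be made concrete with a sequence of rational numbers that converges to $\sqrt{2}$ in R. The sketch below uses the classical recursion $x_1 = 2$, $x_{n+1} = (x_n + 2/x_n)/2$ (Heron's, or Newton's, iteration), which is our choice of example and is not taken from the text above; the function name is likewise ours. Every term is a fraction, consecutive terms get closer and closer, and yet no term has square equal to 2.

```python
from fractions import Fraction

def sqrt2_approximations(count):
    """First `count` terms of x_1 = 2, x_{n+1} = (x_n + 2/x_n) / 2, all rational."""
    x = Fraction(2)
    terms = []
    for _ in range(count):
        terms.append(x)
        x = (x + 2 / x) / 2
    return terms

terms = sqrt2_approximations(6)
for a, b in zip(terms, terms[1:]):
    print(b, float(abs(b - a)))        # gaps between consecutive terms shrink
print(all(t * t != 2 for t in terms))  # True: the limit is not reached in Q
```

The shrinking gaps illustrate the Cauchy property on the computed terms only; that the whole sequence is Cauchy follows from its convergence in R and Theorem 152.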
2.4 Series

2.4.1 Introduction
That which is in locomotion must arrive at the half-way stage before it arrives at the goal. Aristotle, Physics VI:9, 239b10
Recall one of Zeno’s famous paradoxes, the one called “Achilles and the tortoise”: “swift-footed” Achilles races against the tortoise, placed ahead (say at 100 meters). When Achilles has run 100 meters the tortoise is, say, 10 meters ahead. When Achilles reaches this new position, the tortoise is 1 meter ahead, etc. Since this process continues forever, the tortoise is placed, at any stage, some distance ahead of Achilles, and so Achilles can never overtake the tortoise. While the discussion about the infinite divisibility of time and space continues, it has been said that the mathematics of the infinite (especially as laid down in the work of Cauchy and Weierstrass) solves the problem. The distances traveled by Achilles and the tortoise and the position of both at several steps are depicted in Figure 2.1.

[Fig. 2.1 How Achilles and the tortoise proceed. Axes: time (horizontal, with marks t1, t2) and space (vertical, with marks 0, 100, e1, e2); one line for Achilles, one for the tortoise.]

The increments in position of the two racers at successive steps are

space   Achilles   0     100   10   1      1/10    1/100    ...
        tortoise   100   10    1    1/10   1/100   1/1000   ...
time               0     10    1    1/10   1/100   1/1000   ...
and the two racers meet at the same moment $10 + \sum_{n=0}^{\infty} 10^{-n}$ (finite!) at the same place $100 + 10 + \sum_{n=0}^{\infty} 10^{-n}$ (finite!). That this “infinite sum” (consisting in computing the sum $s_n$ of the first $n$ summands, and then letting $n$ increase towards $\infty$ or, in other terms, finding $\lim_{n\to\infty} s_n$) converges is guaranteed by Proposition 163 below (look also at what was said above concerning geometric progressions).

Another instance in which an “infinite sum” appears naturally, although in disguise, is the base representation of a real number. Indeed, choose a natural number $b \ge 2$ as a base. A number $x \in (0, 1)$ has a representation $x = 0.a_1 a_2 a_3 \ldots$ (base $b$) (see Sect. 1.5). Even in the case that $x \in \mathbb{Q}$, this representation may be “nonterminating” (see Theorem 20). What is behind a nonterminating base expansion is again an infinite sum (called from now on a “series”). Indeed, the number $x$ is the sum $\sum_{n=1}^{\infty} a_n/b^n$. The meaning of such a sum was succinctly explained above. Details are provided in Example 171.1.

Let us formalize these ideas by introducing the concept of an (infinite) series, and try to develop some criteria for convergence and for finding the sum of such an entity.

Definition 154 Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Consider the sequence $\{s_n\}_{n=1}^{\infty}$, where
$$s_1 := x_1,\quad s_2 := x_1 + x_2,\quad s_3 := x_1 + x_2 + x_3,\ \ldots \tag{2.14}$$
The sequence $\{s_n\}_{n=1}^{\infty}$ is called the series associated to the sequence $\{x_n\}_{n=1}^{\infty}$. The series is denoted, in short, by $\sum_{n=1}^{\infty} x_n$, or just by $\sum x_n$. The element $s_n$ is said to be the $n$-th partial sum of the series. We say that a series $\sum_{n=1}^{\infty} x_n$ converges when the sequence $\{s_n\}_{n=1}^{\infty}$ converges. If this is the case, the limit $s$ of the sequence $\{s_n\}_{n=1}^{\infty}$ is called the sum of the series, and we write $s = \sum_{n=1}^{\infty} x_n$. A series $\sum_{n=1}^{\infty} x_n$ is said to be divergent whenever the sequence $\{s_n\}_{n=1}^{\infty}$ diverges.

Remark 155 Fix $N \in \mathbb{N}$. A series $\sum_{n=1}^{\infty} x_n$ of real numbers converges if, and only if, the series $\sum_{n=N+1}^{\infty} x_n$ converges. This is true, since $s_n := \sum_{k=1}^{n} x_k = \sum_{k=1}^{N} x_k + \sum_{k=N+1}^{n} x_k$ for each $n \ge N + 1$. This remark will be applied without mentioning: We may alter a finite number of terms in a series without affecting the convergence or divergence of the given series (of course, it may affect, in case of convergence, the sum of the series). Notice, then, that
all the subsequent convergence criteria, although usually stated for simplicity for the whole series, may in fact be checked for one of its tails, i.e., a series of the form $\sum_{n=N}^{\infty} x_n$ for some $N \in \mathbb{N}$. ®
Remark 156 Let $\sum_{n=1}^{\infty} x_n$ be a convergent series, and let $x$ be its sum. Then $\sum_{n=N+1}^{\infty} x_n \to 0$ as $N \to \infty$. Indeed, note first that it follows from Remark 155 that each of the series $\sum_{n=N}^{\infty} x_n$ is convergent. Moreover, given $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that $\left|\sum_{i=1}^{n} x_i - x\right| < \varepsilon$ for all $n \ge N$. It follows that, for all $N \le n < m$,
$$\left|\sum_{i=n+1}^{m} x_i\right| = \left|\sum_{i=1}^{m} x_i - \sum_{i=1}^{n} x_i\right| \le \left|\sum_{i=1}^{m} x_i - x\right| + \left|x - \sum_{i=1}^{n} x_i\right| < 2\varepsilon. \tag{2.15}$$
By letting $m \to +\infty$ in (2.15), we get $\left|\sum_{i=n+1}^{\infty} x_i\right| \le 2\varepsilon$. Since this is true for all $n \ge N$, we get the conclusion. ®

The following result is a straightforward consequence of Proposition 133. Its proof is left to the reader.

Proposition 157 Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be two convergent series of real numbers, and let $A$ and $B$ be their sums, respectively. Let $\alpha$ and $\beta$ be real numbers. Then the series $\sum_{n=1}^{\infty} (\alpha a_n + \beta b_n)$ is convergent, and its sum is $\alpha A + \beta B$.

Proposition 158 If $\sum_{n=1}^{\infty} x_n$ is a convergent series of real numbers, then $x_n \to 0$.

Proof The sequence $\{s_n\}_{n=1}^{\infty}$ is Cauchy (see Theorem 152). Thus, given $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $|s_n - s_m| < \varepsilon$ for all $n, m \ge N$. In particular, $|x_{n+1}| = |s_{n+1} - s_n| < \varepsilon$ for every $n \ge N$. This shows the statement.

Remark 159
1. The result in Proposition 158 can be used as a test for divergence: if the general term $x_n$ of a series $\sum_{n=1}^{\infty} x_n$ does not converge to 0, then the series diverges. For example, the series $\sum_{n=1}^{\infty} x_n$, where $x_n := (-1)^{n+1}$ for $n \in \mathbb{N}$, i.e., the series $1 - 1 + 1 - 1 + \ldots$, diverges.
2. The converse of Proposition 158 does not hold. See Remark 162 below. ®

Definition 160 The series
$$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \ldots \tag{2.16}$$
is called the harmonic series.

Proposition 161 We have
$$1 + \frac{n}{2} \le \sum_{k=1}^{2^n} \frac{1}{k} < 1 + n, \quad \text{for all } n \in \mathbb{N}. \tag{2.17}$$
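In particular, the harmonic series diverges.

Proof Observe that
$$1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \ldots + \left(\frac{1}{2^{n-1}+1} + \cdots + \frac{1}{2^n}\right) \ge 1 + \frac{1}{2} + \frac{2}{4} + \frac{4}{8} + \ldots + \frac{2^{n-1}}{2^n} = 1 + \frac{n}{2},$$
with equality only for $n = 1$. On the other hand,
$$1 + \left(\frac{1}{2} + \frac{1}{3}\right) + \left(\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7}\right) + \ldots + \left(\frac{1}{2^{n-1}} + \cdots + \frac{1}{2^n - 1}\right) + \frac{1}{2^n} \le 1 + \frac{2}{2} + \frac{4}{4} + \ldots + \frac{2^{n-1}}{2^{n-1}} + \frac{1}{2^n} = n + \frac{1}{2^n} < 1 + n.$$

$$\sum_{k=1}^{n} \frac{1}{2k-1} > \sum_{k=1}^{n} \frac{1}{2k} = \frac{1}{2} \sum_{k=1}^{n} \frac{1}{k},$$
and the result follows from Proposition 161, since the partial sums of these two series are thus unbounded.

One of the simplest and yet most important series is the so-called “geometric series”. By this we understand any series of the form
$$\sum_{n=0}^{\infty} x^n, \quad x \in \mathbb{R}. \tag{2.18}$$
The sequence $\{x^n\}_{n=0}^{\infty}$ is an instance of a geometric progression (see Subsect. 2.2.2). The character of the series (2.18) (i.e., its convergence or divergence) depends clearly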
on the value $x$. It is a simple, yet crucial, result that the series converges precisely when $|x| < 1$. In this case we are even able to compute the sum of the series. To this end, observe first that, if $s_n := \sum_{k=0}^{n} x^k$ (note that we depart here from the previous notation, as we are summing up $(n + 1)$ terms), then we get, based on (2.13),
$$s_n = \frac{1 - x^{n+1}}{1 - x}, \quad \text{if } x \ne 1. \tag{2.19}$$
This gives the following result.

Proposition 163 The geometric series $\sum_{n=0}^{\infty} x^n$ converges if, and only if, $|x| < 1$, and the value of the sum is, in the case of convergence, $\frac{1}{1-x}$.

Proof If $|x| < 1$ we have, using (2.19) and Corollary 132, that
$$\sum_{n=0}^{\infty} x^n = \lim_{n\to\infty} \sum_{k=0}^{n-1} x^k = \lim_{n\to\infty} \frac{1 - x^n}{1 - x} = \frac{1}{1-x}, \quad \text{if } |x| < 1. \tag{2.20}$$
If, on the contrary, $|x| \ge 1$, the general term of the series does not converge to 0. It follows from Proposition 158 that the series diverges.
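Formula (2.19) and the sum in Proposition 163 can be checked in exact arithmetic. The sketch below is only an illustration under stated assumptions: the helper names `geometric_partial_sum` and `closed_form` are hypothetical, and the ratio $x = 1/10$ is chosen to match the Achilles example of the Introduction, where the meeting time is $10 + \sum_{n=0}^{\infty} 10^{-n}$.

```python
from fractions import Fraction


def geometric_partial_sum(x, n):
    """s_n = sum_{k=0}^{n} x**k, added term by term (n + 1 summands)."""
    return sum(x ** k for k in range(n + 1))


def closed_form(x, n):
    """Right-hand side of (2.19); only valid for x != 1."""
    return (1 - x ** (n + 1)) / (1 - x)


x = Fraction(1, 10)
for n in (0, 1, 5, 20):
    # The term-by-term sum and the closed form agree exactly.
    assert geometric_partial_sum(x, n) == closed_form(x, n)

limit = 1 / (1 - x)        # sum of the series according to Proposition 163
meeting_time = 10 + limit  # Achilles example: 10 + sum_{n>=0} 10**(-n)
print(limit, meeting_time)  # prints: 10/9 100/9
```

Working with `Fraction` rather than floating point keeps the comparison with (2.19) an exact identity instead of an approximate one.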
2.4.2 General Criteria for Convergence of Series
Proposition 158 can be understood as a negative criterion for convergence of a series of arbitrary terms, as was already mentioned in Remark 159.1. Precisely, if the general term of a series does not converge to 0, the series diverges.

Definition 164 We say that a series $\sum_{n=1}^{\infty} x_n$ is Cauchy (or a Cauchy series) if the sequence of its partial sums is a Cauchy sequence.

Observe that a series $\sum_{n=1}^{\infty} x_n$ is Cauchy if, and only if, for every $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that, for every $n, m \in \mathbb{N}$ such that $m > n \ge N$, $\left|\sum_{k=n+1}^{m} x_k\right| < \varepsilon$. This is a consequence of the fact that $s_m - s_n = \sum_{k=n+1}^{m} x_k$, where $s_n := \sum_{k=1}^{n} x_k$ for all $n \in \mathbb{N}$. The following result is a straightforward consequence of the Cauchy criterion for the convergence of a sequence (Theorem 152), so we omit the proof.

Proposition 165 (Cauchy’s criterion for series) A series $\sum_{n=1}^{\infty} x_n$ converges if, and only if, it is Cauchy.

As with sequences (here even more transparently), the advantage of the Cauchy property of a series is that in checking its convergence we do not need to have a formula for the sum, which is usually hard to find. This observation is often used in this area.

The Cauchy criterion (Proposition 165) has the following consequence: Given a series $\sum_{n=1}^{\infty} x_n$, consider the series $\sum_{n=1}^{\infty} |x_n|$. If the last one converges, the first one
converges too (but not conversely, see Remark 168). This is the content of Proposition 167. We introduce first a definition.

Definition 166 We say that a series $\sum_{n=1}^{\infty} x_n$ is absolutely convergent if the series $\sum_{n=1}^{\infty} |x_n|$ converges.

Proposition 167 Every absolutely convergent series is convergent.

Proof Assume that the series $\sum_{n=1}^{\infty} x_n$ is absolutely convergent. Then the series $\sum_{n=1}^{\infty} |x_n|$ is Cauchy, i.e., given $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that for $m > n \ge N$, $\sum_{k=n+1}^{m} |x_k| < \varepsilon$. The triangle inequality implies $\left|\sum_{k=n+1}^{m} x_k\right| \le \sum_{k=n+1}^{m} |x_k| < \varepsilon$, and the conclusion follows from the Cauchy criterion (Proposition 165).

For another proof of Proposition 167 see Exercise 13.114.

Remark 168 The converse to Proposition 167 is false. The alternating series $\sum_{n=1}^{\infty} (-1)^{n+1}/n$ is convergent (see the paragraph succeeding Remark 184 below). However, the series $\sum_{n=1}^{\infty} |(-1)^{n+1}/n|$ is the harmonic series, proved to be divergent in Proposition 161. ®
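The contrast in Remark 168 can be observed numerically. The following sketch is an illustration only (the helper names `harmonic` and `alternating` are hypothetical): it checks the bounds of Proposition 161 in exact rational arithmetic, with the lower bound written as a non-strict inequality since it is an equality for $n = 1$, and it shows that the partial sums of $\sum (-1)^{n+1}/n$ stay between $1/2$ and $1$ while those of the harmonic series grow without bound.

```python
from fractions import Fraction


def harmonic(m):
    """Partial sum 1 + 1/2 + ... + 1/m of the harmonic series."""
    return sum(Fraction(1, k) for k in range(1, m + 1))


def alternating(m):
    """Partial sum 1 - 1/2 + 1/3 - ... of the alternating harmonic series."""
    return sum(Fraction((-1) ** (k + 1), k) for k in range(1, m + 1))


for n in range(1, 11):
    h = harmonic(2 ** n)
    # Bounds of Proposition 161 for the first 2**n terms.
    assert 1 + Fraction(n, 2) <= h < 1 + n

for m in (1, 2, 10, 100):
    # The harmonic sums keep growing; the alternating ones stay in [1/2, 1].
    print(m, float(harmonic(m)), float(alternating(m)))
```

A finite computation of course proves nothing about divergence; it only makes the logarithmic growth forced by the lower bound $1 + n/2$ visible.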
2.4.3 Series of Nonnegative Terms
Due to the fact that many convergent series in applications are, in fact, absolutely convergent, convergence criteria for series of positive terms are welcome (not to mention that they are easier to establish and to use). To list and prove some of them is the content of this subsection.

A first observation concerning series of nonnegative terms is that the sequence of partial sums is increasing. This leads easily to the first test concerning such a series.

Proposition 169 Let $\sum x_n$ be a series of nonnegative terms. Then $\sum x_n$ converges if, and only if, the sequence $\{s_n\}_{n=1}^{\infty}$ of partial sums is bounded above.

Proof Due to the fact, mentioned above, that the sequence $\{s_n\}_{n=1}^{\infty}$ of partial sums is increasing, the result follows from Theorem 135.

Most of the tests concerning series of nonnegative terms are based on the comparison between the given series and an appropriate convergent series, according to the following simple result.

Proposition 170 (Comparison test) Let $\sum y_n$ and $\sum x_n$ be two series of nonnegative terms. Assume that $\sum x_n$ converges, and that $(0 \le)\ y_n \le x_n$ for all $n \in \mathbb{N}$. Then the series $\sum y_n$ converges, too.

Proof Let $X_n := \sum_{k=1}^{n} x_k$ and $Y_n := \sum_{k=1}^{n} y_k$ for all $n \in \mathbb{N}$. Then $\{X_n\}_{n=1}^{\infty}$ is bounded above, since it is a convergent sequence (see Proposition 129). Since $Y_n \le X_n$ for all $n \in \mathbb{N}$, the sequence $\{Y_n\}_{n=1}^{\infty}$ is also bounded above, hence $\sum y_n$ converges by Proposition 169.
Examples 171
1. Let $b$ be a positive integer greater than or equal to 2. Then, given a sequence $\{a_n\}_{n=1}^{\infty}$ of digits (i.e., nonnegative integers) less than $b$, the series $\sum_{n=1}^{\infty} (a_n/b^n)$ converges. This is a consequence of Proposition 170, the inequality $a_n/b^n \le (b-1)/b^n$ for every $n \in \mathbb{N}$, and the fact that the series $\sum_{n=1}^{\infty} (b-1)/b^n$ converges, due to Propositions 157 and 163. This shows that every expansion in a base $b$ converges, and defines a real number. Assume now that the expansion is terminating or periodic. Then the sum of the associated series is a rational number. This was “proved” in Theorem 20 by using an algebraic argument that was not properly established. We can now justify the validity of the argument there: indeed, all the algebraic computations used in the “proof” of Theorem 20 are valid under the assumption that the involved series converge; that this happens was the conclusion obtained above.
2. Taking for granted the existence and boundedness of the function $\sin x$ (see Sect. 5.2.5), observe that the series $\sum_{n=1}^{\infty} \frac{\sin n}{2^n}$ is convergent: Indeed, $\left|\frac{\sin n}{2^n}\right| \le \frac{1}{2^n}$ for every $n \in \mathbb{N}$, and the series $\sum_{n=1}^{\infty} \frac{1}{2^n}$ is convergent (see Proposition 163). Thus $\sum_{n=1}^{\infty} \frac{\sin n}{2^n}$ is absolutely convergent by Proposition 170, hence convergent by Proposition 167. ♦

Corollary 172 Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be series of positive terms. Assume that there exist two positive constants $P$ and $Q$ such that
$$P \le \frac{x_n}{y_n} \le Q, \quad \text{for all } n \in \mathbb{N}.$$
Then, $\sum_{n=1}^{\infty} x_n$ converges if, and only if, $\sum_{n=1}^{\infty} y_n$ converges.
Proof It follows from Proposition 170, having in mind that $P y_n \le x_n \le Q y_n$ for all $n \in \mathbb{N}$.

The previous result is often used in the following way: given two series $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ of positive terms, such that $\lim_{n\to\infty} \frac{x_n}{y_n}$ exists as a nonzero finite number, then one series converges if, and only if, the other series converges.

The following is a useful criterion for the convergence of series.

Proposition 173 (Cauchy’s condensation criterion) Let $\sum_{n=1}^{\infty} x_n$ be a series of nonnegative terms, and assume that the sequence $\{x_n\}_{n=1}^{\infty}$ is decreasing. Then $\sum_{n=1}^{\infty} x_n$ converges if, and only if, the series $\sum_{n=0}^{\infty} 2^n x_{2^n}$ converges.

Proof Note that, due to the decreasing character of the sequence $\{x_n\}$,
$$x_1 + x_2 + 2x_4 + 4x_8 + \ldots \le x_1 + x_2 + (x_3 + x_4) + (x_5 + x_6 + x_7 + x_8) + \ldots,$$
and
$$x_1 + (x_2 + x_3) + (x_4 + x_5 + x_6 + x_7) + (x_8 + \ldots + x_{15}) + \ldots \le x_1 + 2x_2 + 4x_4 + 8x_8 + \ldots$$
The result follows by using Proposition 170.

As an application, let us prove the following useful result.
Proposition 174 The series $\sum_{n=1}^{\infty} 1/n^p$ converges if, and only if, $p > 1$.

Proof By Proposition 158, if $p \le 0$ then the general term of the series does not converge to 0, and so the series certainly diverges. Assume now that $p > 0$. The sequence $\{1/n^p\}$ is then strictly decreasing. Apply Proposition 173 to conclude that the series $\sum 1/n^p$ converges if, and only if, the series $\sum_{n=0}^{\infty} 2^n \left(1/(2^n)^p\right) = \sum_{n=0}^{\infty} (2^{1-p})^n$ converges. The last one is a geometric series, which converges if, and only if, $2^{1-p} < 1$, i.e., if, and only if, $p > 1$ (see Proposition 163).

One of the tests most often used in this area is the following.

Proposition 175 (The ratio test) Let $\sum x_n$ be a series of positive terms, and let us consider the sequence $\left\{\frac{x_{n+1}}{x_n}\right\}_{n=1}^{\infty}$.
(i) If $l := \limsup_{n\to\infty} \frac{x_{n+1}}{x_n} < 1$, then the series $\sum x_n$ converges.
(ii) If there exists $N \in \mathbb{N}$ such that $x_{n+1}/x_n \ge 1$ for all $n \ge N$, then the series $\sum x_n$ diverges.
(iii) If $l = 1$, the test is inconclusive.

Proof (i) Find $\alpha$ such that $l < \alpha < 1$. Then there exists $N \in \mathbb{N}$ such that $\frac{x_{n+1}}{x_n} < \alpha$ for every $n \ge N$. In particular, $x_{N+1} < \alpha x_N$, $x_{N+2} < \alpha x_{N+1} < \alpha^2 x_N$, etc. In general, $x_{N+p} < \alpha^p x_N$ for all $p \in \mathbb{N}$, and the result follows from Proposition 170 and the fact that $\sum_{p=1}^{\infty} \alpha^p$ converges (see Proposition 163). (ii) In this case, clearly $\{x_n\}$ does not converge to 0, and the result follows from Proposition 158. (iii) The series $\sum 1/n$ diverges (see Proposition 161), while $\sum (1/n^2)$ converges (see Proposition 174). In both cases, $l = 1$.

Remark 176 It is possible to increase the accuracy of the ratio test to deal with series falling in case (iii), namely series $\sum x_n$ where $\frac{x_{n+1}}{x_n} \to 1$. An example of this will be given in Proposition 509 (the so-called Raabe’s test). ®

Proposition 177 (The root test) Let $\sum x_n$ be a series of nonnegative terms, and let us consider the sequence $\{\sqrt[n]{x_n}\}_{n=1}^{\infty}$. Let $l := \limsup_{n\to\infty} \sqrt[n]{x_n}$.
(i) If $l < 1$, then the series $\sum x_n$ converges.
(ii) If $l > 1$, then the series $\sum x_n$ diverges.
(iii) If $l = 1$ the test is inconclusive (although if $\sqrt[n]{x_n} \ge 1$ for all $n \in \mathbb{N}$, the series diverges).

Proof (i) Find $\alpha$ such that $l < \alpha < 1$. Then there exists $N \in \mathbb{N}$ such that $\sqrt[n]{x_n} < \alpha$ (i.e., $x_n < \alpha^n$) for every $n \ge N$, see Exercise 13.103. The result follows from Proposition 170 and the fact that $\sum_{n=1}^{\infty} \alpha^n$ converges (see Proposition 163). (ii) In this case, there is a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ such that $\sqrt[n_k]{x_{n_k}} \ge 1$ (i.e.,
$x_{n_k} \ge 1$) for all $k \in \mathbb{N}$, hence $\{x_n\}$ does not converge to 0, and the result follows from Proposition 158. (iii) Again, the series $\sum 1/n$ diverges (see Proposition 161), and the series $\sum (1/n^2)$ converges (see Proposition 174). In both cases, $l = 1$ (see Lemma 178). If $\sqrt[n]{x_n} \ge 1$ for all $n \in \mathbb{N}$, then certainly $\{x_n\}$ does not converge to 0, and so the series $\sum x_n$ diverges by Proposition 158.

Although it is in general easier to apply the ratio test than the root test, the reader should notice that the second one is more powerful, in the sense that if the ratio test ensures convergence, then so does the root test, and if the root test is inconclusive, then so is the ratio test. This statement is a consequence of the following lemma, which is interesting in itself since it allows us to compute some limits that, otherwise, would be difficult to obtain. An example of this is given after its proof.

Lemma 178 Let $\{a_n\}$ be a sequence of positive numbers. Then
$$\liminf_{n\to\infty} \frac{a_{n+1}}{a_n} \le \liminf_{n\to\infty} \sqrt[n]{a_n} \le \limsup_{n\to\infty} \sqrt[n]{a_n} \le \limsup_{n\to\infty} \frac{a_{n+1}}{a_n}. \tag{2.21}$$

Proof We shall prove the last inequality. The proof of the first one is similar, and the central inequality is trivial. Assume first that $l := \limsup_{n\to\infty} \frac{a_{n+1}}{a_n} = +\infty$. Then there is nothing to prove. Otherwise, let $l < \alpha$. There exists $N \in \mathbb{N}$ such that $\frac{a_{n+1}}{a_n} < \alpha$ for every $n \ge N$. Proceeding as in the proof of Proposition 175, we find $a_{N+p} < \alpha^p a_N$ for all $p \in \mathbb{N}$. Therefore, $(a_{N+p})^{1/(N+p)} < \alpha^{p/(N+p)}\, a_N^{1/(N+p)}$ for all $p \in \mathbb{N}$. Letting $p \to +\infty$ we get $\limsup_{n\to\infty} (a_n)^{1/n} \le \alpha$. Since this is true for every $\alpha > l$, we get $\limsup_{n\to\infty} (a_n)^{1/n} \le l$.

As an example of how to use Lemma 178 in computing limits, let us show that $\lim_{n\to\infty} \sqrt[n]{n} = 1$. Indeed, if $a_n = n$ for all $n \in \mathbb{N}$, the sequence $\{a_{n+1}/a_n\}$ converges to 1. From (2.21), $\liminf_{n\to\infty} \sqrt[n]{a_n} = \limsup_{n\to\infty} \sqrt[n]{a_n} = 1$, hence $\lim_{n\to\infty} \sqrt[n]{a_n}$ exists and has value 1 (see Proposition 140).
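The claim that the root test is more powerful than the ratio test can be made concrete with a small experiment. The sketch below is not from the text: it assumes the illustrative sequence $a_n := 2^{-n + (-1)^n}$ (the function name `a` is hypothetical), for which the ratios $a_{n+1}/a_n$ oscillate between $1/8$ and $2$, so neither (i) nor (ii) of the ratio test applies, while $\sqrt[n]{a_n} \to 1/2$ and the root test gives convergence. It also evaluates $\sqrt[n]{n}$ for a large $n$, in line with the example above.

```python
def a(n):
    """Illustrative positive sequence a_n = 2**(-n + (-1)**n)."""
    return 2.0 ** (-n + (-1) ** n)


ratios = [a(n + 1) / a(n) for n in range(1, 41)]
roots = [a(n) ** (1.0 / n) for n in range(1, 41)]

# The ratios only take the values 1/8 (n even) and 2 (n odd):
# lim sup = 2 > 1 and lim inf = 1/8 < 1, so the ratio test says nothing.
print(sorted(set(ratios)))

# The n-th roots approach 1/2 < 1, so the root test applies.
print(roots[-1])

# n**(1/n) is close to 1 for large n, as Lemma 178 predicts for a_n = n.
n_big = 10 ** 6
print(n_big ** (1.0 / n_big))
```

Note that both extreme ratios, $1/8$ and $2$, bracket the limit $1/2$ of the roots, exactly as the chain of inequalities (2.21) requires.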
2.4.4 Series of Arbitrary Terms
We shall now consider series $\sum x_n$ where the general term is arbitrary. In this context it is natural to express such a series in the form $\sum a_n b_n$ for some chosen sequences $\{a_n\}$ and $\{b_n\}$ of real numbers, hoping that $a_n$ and $b_n$, when considered separately, have a reasonable behavior. The following result is suited for this approach. It is due to the Norwegian mathematician N. Abel.

Lemma 179 (Abel’s partial summation lemma) Consider sequences $\{a_n\}_{n=0}^{\infty}$ and $\{b_n\}_{n=0}^{\infty}$ in $\mathbb{R}$. Set
$$A_n := \sum_{k=0}^{n} a_k, \ \text{for } n \ge 0, \qquad A_{-1} := 0.$$
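Then for $0 \le p < q$ we have
$$\sum_{n=p}^{q} a_n b_n = \sum_{n=p}^{q-1} A_n (b_n - b_{n+1}) + A_q b_q - A_{p-1} b_p. \tag{2.22}$$

Proof
$$\sum_{n=p}^{q} a_n b_n = \sum_{n=p}^{q} (A_n - A_{n-1}) b_n = \sum_{n=p}^{q} A_n b_n - \sum_{n=p}^{q} A_{n-1} b_n$$
$$= \sum_{n=p}^{q} A_n b_n - \sum_{n=p-1}^{q-1} A_n b_{n+1} = \sum_{n=p}^{q-1} A_n (b_n - b_{n+1}) + A_q b_q - A_{p-1} b_p.$$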
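Since (2.22) is a purely algebraic identity, it can be tested on arbitrary finite data. The sketch below is an illustration only: the function name `abel_sides` is hypothetical, and random integers are used so that both sides can be compared exactly rather than up to rounding.

```python
import random


def abel_sides(a, b, p, q):
    """Return (lhs, rhs) of formula (2.22) for 0 <= p < q < len(a) == len(b)."""
    # A[n] = a_0 + ... + a_n, with the convention A_{-1} := 0 handled below.
    A = [sum(a[: n + 1]) for n in range(len(a))]
    A_prev = A[p - 1] if p > 0 else 0
    lhs = sum(a[n] * b[n] for n in range(p, q + 1))
    rhs = (
        sum(A[n] * (b[n] - b[n + 1]) for n in range(p, q))
        + A[q] * b[q]
        - A_prev * b[p]
    )
    return lhs, rhs


random.seed(0)
a = [random.randint(-9, 9) for _ in range(12)]
b = [random.randint(-9, 9) for _ in range(12)]
for p in range(0, 11):
    for q in range(p + 1, 12):
        lhs, rhs = abel_sides(a, b, p, q)
        assert lhs == rhs  # exact equality, since all entries are integers
print("formula (2.22) holds on all tested index ranges")
```

The formula is the discrete counterpart of integration by parts: the partial sums $A_n$ play the role of the antiderivative and the differences $b_n - b_{n+1}$ that of the derivative.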
Remark 180 Formula (2.22) will be used frequently in the following way: Let $\{a_n\}_{n=0}^{\infty}$ and $\{b_n\}_{n=0}^{\infty}$ be as in the statement of Lemma 179. Fix $N \in \mathbb{N}$, and let $n \ge N$. Put $A'_n := \sum_{k=N}^{n} a_k$ (and $A'_{N-1} := 0$). Then $A_n = A_{N-1} + A'_n$ for all $n \ge N$. Thus, for $N \le p < q$,
$$\sum_{n=p}^{q} a_n b_n = \sum_{n=p}^{q-1} A_n (b_n - b_{n+1}) + A_q b_q - A_{p-1} b_p$$
$$= \sum_{n=p}^{q-1} (A_{N-1} + A'_n)(b_n - b_{n+1}) + (A_{N-1} + A'_q) b_q - (A_{N-1} + A'_{p-1}) b_p$$
$$= \sum_{n=p}^{q-1} A_{N-1} (b_n - b_{n+1}) + A_{N-1} b_q - A_{N-1} b_p + \sum_{n=p}^{q-1} A'_n (b_n - b_{n+1}) + A'_q b_q - A'_{p-1} b_p$$
$$= \sum_{n=p}^{q-1} A'_n (b_n - b_{n+1}) + A'_q b_q - A'_{p-1} b_p. \tag{2.23}$$
In other words, in formula (2.22), instead of partial sums $A_n$ we can consider partial sums $A'_n$ for an arbitrary tail of the series $\sum_{n=0}^{\infty} a_n$. ®

Lemma 179 is the key ingredient in proving the two following results.

Theorem 181 (Dirichlet criterion) Let $\{a_n\}_{n=0}^{\infty}$ and $\{b_n\}_{n=0}^{\infty}$ be two sequences of real numbers such that:
(i) The sequence $\{A_n\}_{n=0}^{\infty}$ is bounded, where $A_n := \sum_{k=0}^{n} a_k$, $n = 0, 1, 2, \ldots$
(ii) The sequence $\{b_n\}_{n=0}^{\infty}$ is decreasing, and
(iii) $\lim_{n\to\infty} b_n = 0$.
Then $\sum_{n=0}^{\infty} a_n b_n$ converges.
Proof Let $A$ be an upper bound for the sequence $\{|A_n|\}_{n=0}^{\infty}$. The existence of $A$ follows from (i). From Lemma 179 we get, for $p \le q$ in $\mathbb{N} \cup \{0\}$, and having in mind that $(b_n - b_{n+1}) \ge 0$ for all $n \in \mathbb{N} \cup \{0\}$ (condition (ii)) and hence $b_n \ge 0$ (conditions (ii) and (iii)),
$$\left|\sum_{n=p}^{q} a_n b_n\right| = \left|\sum_{n=p}^{q-1} A_n (b_n - b_{n+1}) + A_q b_q - A_{p-1} b_p\right| \le \sum_{n=p}^{q-1} |A_n| (b_n - b_{n+1}) + |A_q| b_q + |A_{p-1}| b_p$$
$$\le A \sum_{n=p}^{q-1} (b_n - b_{n+1}) + A b_q + A b_p = A (b_p - b_q) + A b_q + A b_p = 2A b_p. \tag{2.24}$$
Given $\varepsilon > 0$ we can find, due to (iii), $N \in \mathbb{N}$ such that $2A b_p < \varepsilon$ for every $p \ge N$. It follows from (2.24) that the series $\sum_{n=0}^{\infty} a_n b_n$ is Cauchy, hence, by Proposition 165, convergent.

The following test also uses Lemma 179 in its proof.

Theorem 182 (Abel’s criterion) Let $\{a_n\}_{n=0}^{\infty}$ and $\{b_n\}_{n=0}^{\infty}$ be two sequences of real numbers such that:
(i) The series $\sum_{n=0}^{\infty} a_n$ converges.
(ii) The sequence $\{b_n\}$ is monotone, and
(iii) The sequence $\{b_n\}$ converges.
Then, the series $\sum_{n=0}^{\infty} a_n b_n$ converges.

Proof Assume that $\{b_n\}_{n=0}^{\infty}$ is decreasing (in the other case, the argument is similar). Given $\varepsilon > 0$ there exists $N \in \mathbb{N} \cup \{0\}$ such that $|A'_n| < \varepsilon$ for every $n \ge N$, where $A'_n := \sum_{k=N}^{n} a_k$ for $n \ge N$ (see Remark 180). Observe that $\{|b_n|\}_{n=0}^{\infty}$ is a bounded sequence; let $B$ be an upper bound for $\{|b_n|\}_{n=0}^{\infty}$. Lemma 179 (in the version of Remark 180, formula (2.23)) gives for $N \le p < q$, according to the estimation in (2.24),
$$\left|\sum_{n=p}^{q} a_n b_n\right| \le A'B + A'B + A'B + A'B = 4A'B, \tag{2.25}$$
where $A'$ is an upper bound for the sequence $\{|A'_n|\}_{n=N}^{\infty}$ (so $A' \le \varepsilon$). The conclusion follows from the Cauchy criterion (Proposition 165).

A straightforward consequence of Theorem 181 is a sufficient condition for the convergence of what is called an alternating series. This is a series of the form $\sum_{n=0}^{\infty} (-1)^n b_n$, where $\{b_n\}_{n=0}^{\infty}$ is a sequence of nonnegative terms. The criterion bears the name of the German mathematician and philosopher G. Leibniz.
[Fig. 2.2 The partial sums $B_n$ of an alternating series and the sum $B$ (Corollary 183). Points marked on the real line, from left to right: $B_1$, $B_3$, ..., $B$, ..., $B_4$, $B_2$, $B_0$.]
Corollary 183 (Leibniz) Let $\{b_n\}_{n=0}^{\infty}$ be a sequence in $\mathbb{R}$. Assume that
(i) $b_n \ge 0$ for all $n \in \mathbb{N} \cup \{0\}$.
(ii) $\{b_n\}_{n=0}^{\infty}$ is a decreasing sequence, and
(iii) $\lim_{n\to\infty} b_n = 0$.
Then the series $\sum_{n=0}^{\infty} (-1)^n b_n$ converges.

Proof It is enough to take $a_n := (-1)^n$ for $n \in \mathbb{N} \cup \{0\}$ in Theorem 181.

Remark 184 An alternative proof of Corollary 183 (which moreover provides an estimate of the error between any partial sum $B_n := \sum_{k=0}^{n} (-1)^k b_k$ and the sum $B := \sum_{n=0}^{\infty} (-1)^n b_n$) combines the Nested Interval Theorem 69 with a simple geometric argument shown in Fig. 2.2, where the location on the real line of the successive partial sums is sketched. It is clear from the picture that $0 \le B_{2n} - B \le b_{2n+1}$, and that $0 \le B - B_{2n+1} \le b_{2n+2}$ for all $n \in \mathbb{N} \cup \{0\}$. Summarizing,
$$0 \le (-1)^{n+1} (B - B_n) \le b_{n+1}, \quad \text{for } n \in \mathbb{N} \cup \{0\}. \tag{2.26}$$
®

An interesting particular example of an alternating series is the so-called alternating harmonic series. This is the series
$$\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \ldots \tag{2.27}$$
By Corollary 183, this series converges. What adds interest to the particular example (2.27) is that when suppressing the signs, the series diverges, since it becomes the harmonic series (Proposition 161). In other words, the alternating harmonic series is an example of a convergent, not absolutely convergent, series. In connection with Leibniz’s Corollary 183, see Exercise 13.124.
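The error estimate (2.26) can be tried out on the alternating harmonic series (2.27). The sketch below rests on an assumption not made in the text: that the sum of (2.27) equals $\ln 2$, a classical fact used here only as a reference value. With the indexing of Corollary 183 we take $b_n = 1/(n+1)$ for $n \ge 0$; the function name `check_leibniz_bound` is hypothetical.

```python
import math


def check_leibniz_bound(n_max):
    """Check 0 <= (-1)**(n+1) * (B - B_n) <= b_{n+1} for n = 0, ..., n_max.

    Here b_n = 1/(n + 1) and B is ASSUMED to be ln 2 (not proved in the text).
    """
    B = math.log(2.0)
    partial = 0.0  # running partial sum B_n
    for n in range(n_max + 1):
        partial += (-1) ** n / (n + 1)
        signed_error = (-1) ** (n + 1) * (B - partial)
        if not (0.0 <= signed_error <= 1.0 / (n + 2)):
            return False
    return True


print(check_leibniz_bound(1000))
```

The check uses floating-point arithmetic, which is adequate here because the true error is of order $1/(2n)$, far above rounding level for the range tested.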
2.4.5 Rearrangement of Series
The sum of (finitely many) real numbers is commutative, meaning that the sum does not depend on the order, and so any rearrangement adds to the same sum. We may ask whether this is true for an infinite sum (i.e., for a series). We first make precise what we mean by the word rearrangement in this case.
Definition 185 A series $\sum_{n=1}^{\infty} b_n$ of real numbers is said to be a rearrangement of a series $\sum_{n=1}^{\infty} a_n$ if there exists a permutation $\sigma$ of $\mathbb{N}$, i.e., a one-to-one mapping from $\mathbb{N}$ onto $\mathbb{N}$, such that $b_n = a_{\sigma(n)}$ for all $n \in \mathbb{N}$.

The question, formulated at the beginning of this subsection, whether the commutative law applies to an infinite sum has a positive answer for a series of nonnegative terms: if such a series converges, any rearrangement of it converges (to the same sum). It is easy to provide an argument for it, but we prefer to postpone it, since we shall prove later (in Corollary 193) something more general. However, the answer to our question, for series of arbitrary terms, is, in general, negative. The alternating series considered at the end of Sect. 2.4.4 is a good example in this direction, illustrating something that at first glance may look paradoxical (after all, we are adding all the terms in the series, although in a different order): we can rearrange the series to add to any preassigned real number; even more, we can rearrange the series to diverge to $+\infty$ or to $-\infty$. The argument, due to the aforementioned B. Riemann, reads: Suppose we want a rearrangement of the series (2.27), i.e.,
$$\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \ldots$$
to sum up to, say, $S = 1000$. Start by adding only positive terms, first $1$, then $1/3$, etc., until the first moment the sum exceeds 1000 (we know that this happens: the harmonic series diverges, and so does the series of the reciprocals of the odd numbers, see Proposition 161). Then stop. Continue by adding to this only negative terms, first $-1/2$, then $-1/4$, etc., until the first moment the sum dips below 1000 (again, observe that the series of the reciprocals of the even numbers diverges, see Proposition 161). Then stop. Continue by adding only positive terms, from where they were left above, until the first moment the sum exceeds 1000. Now add only negative terms, from where they were left above, until the first moment the sum dips below 1000. Continue in this way. As the terms in the series approach zero, the “jumps” between successive partial sums become, after some time, as small as we wish. So 1000 is “trapped” (more and more tightly) in between those partial sums. The sequence of partial sums converges, then, to 1000.

The reader will certainly be able to devise a modification of the strategy above to make the rearranged sum diverge to $+\infty$ or to $-\infty$. It is clear, too, how to modify the construction in order to obtain a rearrangement of the alternating series whose partial sums form a sequence having for $\limsup$ and $\liminf$ any couple of preassigned “numbers” in $[-\infty, +\infty]$. This will be done in detail in the proof of Theorem 191. Let us extract the basic features that allow for such a behavior. We need a definition.

Definition 186 A series $\sum_{n=1}^{\infty} a_n$ of real numbers is said to be unconditionally convergent if every rearrangement of it converges.

Remark 187 It will be proved later (see Corollary 193) that if a series is unconditionally convergent, then all of its rearrangements converge to the same sum. ®
The analysis of the example of the alternating series done at the beginning of this subsection suggests that the reason for the non-unconditional convergence of a series should be that the series consisting of its positive terms and the series consisting of its negative terms are both divergent. This intuition is correct. To establish it as a precise result, we need some simple facts.

Let $\sum_{n=1}^{\infty} a_n$ be a series of real numbers. Let us define two associated sequences $\{a_n^+\}_{n=1}^{\infty}$ and $\{a_n^-\}_{n=1}^{\infty}$ as follows: For $n \in \mathbb{N}$, let
$$a_n^+ := \begin{cases} a_n & \text{if } a_n \ge 0, \\ 0 & \text{if } a_n < 0, \end{cases} \qquad a_n^- := \begin{cases} 0 & \text{if } a_n \ge 0, \\ -a_n & \text{if } a_n < 0. \end{cases} \tag{2.28}$$
Note that $a_n^+ \ge 0$ and $a_n^- \ge 0$ for all $n \in \mathbb{N}$. It is easy to show that $a_n^+ - a_n^- = a_n$ and $a_n^+ + a_n^- = |a_n|$ for all $n \in \mathbb{N}$.

Proposition 188 Let $\sum_{n=1}^{\infty} a_n$ be a series of real numbers.
(i) $\sum_{n=1}^{\infty} a_n$ is absolutely convergent if, and only if, the series $\sum_{n=1}^{\infty} a_n^+$ and $\sum_{n=1}^{\infty} a_n^-$ are both convergent, where $a_n^+$ and $a_n^-$ are defined, for all $n \in \mathbb{N}$, in (2.28). If this is the case, then $\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^-$.
(ii) If $\sum_{n=1}^{\infty} a_n$ is convergent but not absolutely convergent, then the series $\sum_{n=1}^{\infty} a_n^+$ and $\sum_{n=1}^{\infty} a_n^-$ are both divergent.

Proof (i) Assume that $\sum_{n=1}^{\infty} a_n$ is absolutely convergent. Since $a_n^+ + a_n^- = |a_n|$, we have $0 \le a_n^+ \le |a_n|$ and $0 \le a_n^- \le |a_n|$ for all $n \in \mathbb{N}$. This implies, by Proposition 170, that both $\sum_{n=1}^{\infty} a_n^+$ and $\sum_{n=1}^{\infty} a_n^-$ converge. Conversely, assume that both $\sum_{n=1}^{\infty} a_n^+$ and $\sum_{n=1}^{\infty} a_n^-$ converge. Since $|a_n| = a_n^+ + a_n^-$ for all $n \in \mathbb{N}$, this shows that $\sum_{n=1}^{\infty} |a_n|$ converges, too. Clearly, $\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^-$.
(ii) Assume that $\sum_{n=1}^{\infty} a_n$ converges and $\sum_{n=1}^{\infty} |a_n|$ diverges. Note that $a_n^- = a_n^+ - a_n$ for all $n \in \mathbb{N}$; thus, if $\sum_{n=1}^{\infty} a_n^+$ converges then $\sum_{n=1}^{\infty} a_n^-$ converges, too, and it follows from (i) that $\sum_{n=1}^{\infty} |a_n|$ converges, a contradiction. This proves that $\sum_{n=1}^{\infty} a_n^+$ diverges. That $\sum_{n=1}^{\infty} a_n^-$ diverges is proved analogously.

The reason for the good behavior, regarding rearrangements, of an absolutely convergent series lies in the following corollary, since the sum of a series of nonnegative terms is unchanged under rearrangements (this is shown, independently of the notion of unconditional convergence, in the proof of Proposition 190).

Corollary 189 A series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent if, and only if, $a_n = b_n - c_n$ for all $n \in \mathbb{N}$, where $\sum_{n=1}^{\infty} b_n$ and $\sum_{n=1}^{\infty} c_n$ are some convergent series of nonnegative terms.

Proof If $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, then put $b_n := a_n^+$ and $c_n := a_n^-$ for all $n \in \mathbb{N}$, where $a_n^+$ and $a_n^-$ are as in (2.28). The result follows from (i) in Proposition 188. On the other hand, if the condition holds, then $|a_n| \le b_n + c_n$ for all $n \in \mathbb{N}$, hence $\sum_{n=1}^{\infty} |a_n|$ converges.

The following result is, in fact, an equivalence. The complete statement will appear as Corollary 192, as the reverse implication depends on Theorem 191.
Proposition 190 Every absolutely convergent series of real numbers is unconditionally convergent, and any of its rearrangements sums to the same number.

Proof It follows from Corollary 189 that there exist two convergent series $\sum_{n=1}^{\infty} b_n$ and $\sum_{n=1}^{\infty} c_n$ of nonnegative terms such that $a_n = b_n - c_n$ for all $n \in \mathbb{N}$. As a consequence, it is enough to prove the result for a series of nonnegative terms. Assume then that $a_n \ge 0$ for all $n \in \mathbb{N}$. Let $\sigma$ be a permutation of $\mathbb{N}$. Put $b_n := a_{\sigma(n)}$ for all $n \in \mathbb{N}$. This defines a rearrangement $\sum_{n=1}^{\infty} b_n$ of $\sum_{n=1}^{\infty} a_n$. Let $A_n := \sum_{k=1}^{n} a_k$, $B_n := \sum_{k=1}^{n} b_k$, and $A := \sum_{n=1}^{\infty} a_n$. Given $n \in \mathbb{N}$, let $m := \max\{\sigma(1), \sigma(2), \ldots, \sigma(n)\}$. Then, obviously, $\{\sigma(1), \sigma(2), \ldots, \sigma(n)\} \subset \{1, 2, \ldots, m\}$, hence $B_n \le A_m \le A$. This shows that $B_n \le A$ for all $n \in \mathbb{N}$, so $\sum_{n=1}^{\infty} b_n$ converges by Proposition 169. If we denote $B := \sum_{n=1}^{\infty} b_n$, then $B \le A$, proving that $\sum_{n=1}^{\infty} a_n$ is unconditionally convergent. Since, on the other hand, $\sum_{n=1}^{\infty} a_n$ is a rearrangement of $\sum_{n=1}^{\infty} b_n$, the previous argument gives $A \le B$. All together, $A = B$.

The argument in the proof of the following result was already used in the analysis of the rearrangements of the alternating series done at the beginning of this subsection.

Theorem 191 (Riemann) Let $\sum_{n=1}^{\infty} a_n$ be a convergent series of real numbers. If $\sum_{n=1}^{\infty} a_n$ is not absolutely convergent and $\alpha, \beta$ are two elements of $[-\infty, +\infty]$ such that $\alpha \le \beta$, then there exists a rearrangement $\sum_{n=1}^{\infty} b_n$ of $\sum_{n=1}^{\infty} a_n$ such that
$$\liminf_{n\to\infty} B_n = \alpha, \qquad \limsup_{n\to\infty} B_n = \beta, \tag{2.29}$$
where $B_n := \sum_{k=1}^{n} b_k$, for $n \in \mathbb{N}$.

Proof Let $\{\alpha_n\}_{n=1}^{\infty}$ and $\{\beta_n\}_{n=1}^{\infty}$ be two sequences in $\mathbb{R}$ such that $\alpha_n \to \alpha$ and $\beta_n \to \beta$. Due to Proposition 188, the series $\sum_{n=1}^{\infty} a_n^+$ and $\sum_{n=1}^{\infty} a_n^-$ are both divergent, where the sequences $\{a_n^+\}_{n=1}^{\infty}$ and $\{a_n^-\}_{n=1}^{\infty}$ were defined in (2.28). Start by adding consecutive elements from the sequence $\{a_n^+\}_{n=1}^{\infty}$, starting with $a_1^+$, until the first moment the sum exceeds $\beta_1$ (that this happens is guaranteed by the fact that $\sum_{n=1}^{\infty} a_n^+$ diverges). Then subtract from this sum consecutive elements from the sequence $\{a_n^-\}_{n=1}^{\infty}$, starting with $a_1^-$, until the first moment the sum falls below $\alpha_1$ (this is guaranteed by the fact that the series $\sum_{n=1}^{\infty} a_n^-$ diverges). Continue by adding to this sum consecutive elements, starting at the first one left, from the sequence $\{a_n^+\}_{n=1}^{\infty}$, until the moment the sum exceeds $\beta_2$. Keep going. This defines a rearrangement $\sum_{n=1}^{\infty} b_n$ of $\sum_{n=1}^{\infty} a_n$. That the partial sums $B_n$ of $\sum_{n=1}^{\infty} b_n$ satisfy (2.29) is a consequence of the fact that $a_n \to 0$ (see Proposition 158).

Related to the previous result, see Exercises 13.124 and 13.138.
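The construction in the proof of Theorem 191 is effectively an algorithm, and the special case $\alpha = \beta$ is easy to simulate for the alternating harmonic series (2.27). The sketch below is an illustration under stated assumptions: the function name `rearranged_partial_sums` is hypothetical, the target $1.5$ replaces the value $1000$ used in the text (reaching $1000$ would require an astronomically large number of positive terms), and floating-point sums stand in for exact ones.

```python
def rearranged_partial_sums(target, n_terms):
    """Greedy rearrangement of 1 - 1/2 + 1/3 - ... aiming at `target`.

    While the running sum does not exceed the target, the next unused
    positive term 1/(2k - 1) is added; otherwise the next unused negative
    term -1/(2k) is added. Every term is used at most once and in its
    original relative order within its own sign class.
    """
    pos_k = 1  # the next positive term is 1/(2*pos_k - 1)
    neg_k = 1  # the next negative term is -1/(2*neg_k)
    s = 0.0
    sums = []
    for _ in range(n_terms):
        if s <= target:
            s += 1.0 / (2 * pos_k - 1)
            pos_k += 1
        else:
            s -= 1.0 / (2 * neg_k)
            neg_k += 1
        sums.append(s)
    return sums


sums = rearranged_partial_sums(1.5, 100_000)
# The partial sums oscillate around the target with shrinking amplitude.
print(sums[:6])
print(sums[-1])
```

Because both the positive and the negative parts diverge, neither branch can be taken forever, so every term of the original series is eventually used; this is exactly the role played by Proposition 188 in the proof.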
Corollary 192 A series of real numbers is unconditionally convergent if, and only if, it is absolutely convergent. Proof This follows from Proposition 190 and Theorem 191.
Corollary 193 All rearrangements of an unconditionally convergent series sum to the same number.
88
2 Sequences and Series
Proof This follows from Corollary 192 and Proposition 190.

Several convergence notions for a series (unconditional convergence already appeared in Definition 186, and we can add subseries, bounded-multiplier, sign-multiplier, and unordered convergence, see Definition 199) play an important role, especially in more general settings such as infinite-dimensional normed spaces. Even the definition of a series can be extended to the case when the index set is not $\mathbb{N}$. We shall present the basics here, to conclude (in Proposition 200) that, in the case of series of real numbers with $\mathbb{N}$ as their set of indices, all these convergence notions agree with absolute convergence. Recall that, given an arbitrary nonempty set $\Gamma$, the family $\mathcal{P}_f(\Gamma)$ consists of all the finite subsets of $\Gamma$ (this family includes the empty set). For this and related notation see Sect. 1.1.

Definition 194 Let $\Gamma$ be a countable set, and let $\{a_\gamma : \gamma \in \Gamma\}$ be a set of real numbers. Given an element $F \in \mathcal{P}_f(\Gamma)$, put $s_F := \sum_{\gamma \in F} a_\gamma$ (with the convention that $s_\emptyset := 0$). The countable family $\{s_F : F \in \mathcal{P}_f(\Gamma)\}$ is said to be the (unordered) series defined by $\{a_\gamma : \gamma \in \Gamma\}$. The series $\{s_F : F \in \mathcal{P}_f(\Gamma)\}$ is said to be unordered convergent to $s \in \mathbb{R}$ if for every $\varepsilon > 0$ there exists a finite subset $F_0$ of $\Gamma$ such that, if $F$ is an arbitrary finite subset of $\Gamma$ such that $F \supset F_0$, then $|s - s_F| < \varepsilon$. In this case we write $s := \sum_{\gamma \in \Gamma} a_\gamma$, and $s$ is said to be the (unordered) sum of the series $\{s_F : F \in \mathcal{P}_f(\Gamma)\}$. In short, we denote in this case by $\sum_{\gamma \in \Gamma} a_\gamma$ both the unordered series and its unordered sum.

Proposition 195 Let $\Gamma$ be a countable set. An unordered series $\sum_{\gamma \in \Gamma} a_\gamma$ of nonnegative terms is unordered convergent if, and only if, the family $\{s_F : F \in \mathcal{P}_f(\Gamma)\}$ is bounded. If this is the case, then the unordered sum $\sum_{\gamma \in \Gamma} a_\gamma$ is precisely $\sup\{s_F : F \in \mathcal{P}_f(\Gamma)\}$.

Proof Assume that $\sum_{\gamma \in \Gamma} a_\gamma$ is unordered convergent, and let $s$ be its unordered sum. There exists $F_0 \in \mathcal{P}_f(\Gamma)$ such that for $F \in \mathcal{P}_f(\Gamma)$ with $F \supset F_0$, we have $|s - s_F| < 1$. Then, given an arbitrary $G \in \mathcal{P}_f(\Gamma)$, we have $s_G \le s_{G \cup F_0} < s + 1$, and this proves the necessary condition. To prove sufficiency, assume that the family $\{s_F : F \in \mathcal{P}_f(\Gamma)\}$ is bounded. Let $s$ be its supremum. Given $\varepsilon > 0$ there exists $F_0 \in \mathcal{P}_f(\Gamma)$ such that $s - \varepsilon < s_{F_0} \le s$. Let $F \in \mathcal{P}_f(\Gamma)$ be such that $F \supset F_0$. Then, clearly, $s - \varepsilon < s_{F_0} \le s_F \le s$, hence $\sum_{\gamma \in \Gamma} a_\gamma = s$. The argument also proves the last assertion.

Corollary 196 Let $\Gamma$ be a countable set. Let $\sum_{\gamma \in \Gamma} a_\gamma$ be an unordered convergent series of nonnegative terms, and let $\Gamma_0 \subset \Gamma$. Then the unordered series $\sum_{\gamma \in \Gamma_0} a_\gamma$ converges.

Proof Obviously, the family $\{s_F : F \in \mathcal{P}_f(\Gamma_0)\}$ is bounded if the family $\{s_F : F \in \mathcal{P}_f(\Gamma)\}$ is bounded. The result follows from Proposition 195.

Corollary 197 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of nonnegative numbers. Then $\sum_{n \in \mathbb{N}} a_n$ (i.e., the unordered series) converges if, and only if, $\sum_{n=1}^{\infty} a_n$ converges. If this is the case, then $\sum_{n \in \mathbb{N}} a_n = \sum_{n=1}^{\infty} a_n$.
2.4 Series
89
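Proof Given $F \in \mathcal{P}_f(\mathbb{N})$, there exists $n \in \mathbb{N}$ such that $F \subset \{1, 2, \ldots, n\}$. This shows that $\sup\{s_F : F \in \mathcal{P}_f(\mathbb{N})\} = \sup\{s_n : n \in \mathbb{N}\}$, where $s_n := \sum_{k=1}^{n} a_k$ for $n \in \mathbb{N}$. The conclusion follows from Proposition 195.

If an unordered series $\sum_{\gamma \in \Gamma} a_\gamma$ of nonnegative numbers does not converge, we put $\sum_{\gamma \in \Gamma} a_\gamma = +\infty$. This is a natural convention in view of Proposition 195 or its Corollary 197.

Proposition 198 Let $\Gamma$ be a countable set, and let $\{\Gamma_n : n = 1, 2, \ldots\}$ be a sequence of pairwise disjoint subsets of $\Gamma$ that covers $\Gamma$. Let $\{a_\gamma : \gamma \in \Gamma\}$ be a set of nonnegative numbers. Then (including the case that some of the sums below are $+\infty$)
\[ \sum_{\gamma \in \Gamma} a_\gamma = \sum_{n=1}^{\infty} \sum_{\gamma \in \Gamma_n} a_\gamma . \]

Proof Let $F$ be a nonempty finite subset of $\Gamma$. Put $F_n := F \cap \Gamma_n$, for $n \in \mathbb{N}$. Then
\[ \sum_{\gamma \in F} a_\gamma = \sum_{n=1}^{\infty} \sum_{\gamma \in F_n} a_\gamma \le \sum_{n=1}^{\infty} \sum_{\gamma \in \Gamma_n} a_\gamma . \]
It follows that
\[ \sum_{\gamma \in \Gamma} a_\gamma \le \sum_{n=1}^{\infty} \sum_{\gamma \in \Gamma_n} a_\gamma . \tag{2.30} \]
This shows the statement if $\sum_{\gamma \in \Gamma} a_\gamma = +\infty$. On the other hand, fix $N \in \mathbb{N}$, and take arbitrary finite sets $F_n \subset \Gamma_n$ for $1 \le n \le N$. The set $F := \bigcup_{n=1}^{N} F_n$ is finite, and $\sum_{n=1}^{N} \sum_{\gamma \in F_n} a_\gamma = \sum_{\gamma \in F} a_\gamma \le \sum_{\gamma \in \Gamma} a_\gamma$. It follows that $\sum_{n=1}^{N} \sum_{\gamma \in \Gamma_n} a_\gamma \le \sum_{\gamma \in \Gamma} a_\gamma$. Since this is true for every $N \in \mathbb{N}$, we finally get
\[ \sum_{n=1}^{\infty} \sum_{\gamma \in \Gamma_n} a_\gamma \le \sum_{\gamma \in \Gamma} a_\gamma . \tag{2.31} \]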
The two inequalities (2.30) and (2.31) together prove the assertion.

The next definition introduces some convergence notions related to a series of real numbers. Proposition 200 below will show that all of them coincide. The reason for presenting those concepts nevertheless is that they are different in an infinite-dimensional setting.

Definition 199 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers.
(i) A subseries of the series $\sum_{n=1}^{\infty} a_n$ is a series $\sum_{k=1}^{\infty} a_{n_k}$, where $\{a_{n_k}\}_{k=1}^{\infty}$ is a subsequence of the sequence $\{a_n\}_{n=1}^{\infty}$. The series $\sum_{n=1}^{\infty} a_n$ is said to be subseries convergent if every subseries of it converges.
(ii) The series $\sum_{n=1}^{\infty} a_n$ is said to be bounded-multiplier convergent if $\sum_{n=1}^{\infty} b_n a_n$ converges for every bounded sequence $\{b_n\}_{n=1}^{\infty}$ of real numbers.
(iii) The series $\sum_{n=1}^{\infty} a_n$ is said to be sign-multiplier convergent if $\sum_{n=1}^{\infty} \varepsilon_n a_n$ converges for every sequence $\{\varepsilon_n\}_{n=1}^{\infty}$, where $\varepsilon_n \in \{-1, 1\}$ for all $n \in \mathbb{N}$.

Proposition 200 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Then, the following are equivalent:
(i) The series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent.
(ii) The series $\sum_{n=1}^{\infty} a_n$ is unconditionally convergent.
(iii) The series $\sum_{n=1}^{\infty} a_n$ is subseries convergent.
(iv) The series $\sum_{n \in \mathbb{N}} a_n$ is unordered convergent.
(v) The series $\sum_{n=1}^{\infty} a_n$ is bounded-multiplier convergent.
(vi) The series $\sum_{n=1}^{\infty} a_n$ is sign-multiplier convergent.

Proof (i)⇔(ii) has been proved in Corollary 192.

(i)⇒(iii): If $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, then clearly every subseries is also absolutely convergent, hence convergent (see Proposition 167).

(iii)⇒(i): Assume that $\sum_{n=1}^{\infty} a_n$ is subseries convergent (in particular, convergent). If it does not converge absolutely, it follows from (ii) in Proposition 188 that, if $a_n^{+}$ is defined as in (2.28) for $n \in \mathbb{N}$, then the series $\sum_{n=1}^{\infty} a_n^{+}$ (a subseries of $\sum_{n=1}^{\infty} a_n$) diverges, a contradiction.

(i)⇒(iv): Assume that $\sum_{n=1}^{\infty} a_n$ is absolutely convergent. In particular, it converges (see Proposition 167). Let $s$ be its sum. It follows that, given $\varepsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that $\left|\sum_{k=1}^{n} a_k - s\right| < \varepsilon$ for every $n \ge n_0$. We can choose $n_0$ in such a way that, at the same time, $\sum_{n=n_0+1}^{\infty} |a_n| < \varepsilon$. Put $F_0 := \{1, 2, \ldots, n_0\}$. Let $F \in \mathcal{P}_f(\mathbb{N})$ be such that $F \supset F_0$. Then
\[ \Big|\sum_{n \in F} a_n - s\Big| = \Big|\sum_{n \in F_0} a_n + \sum_{n \in F \setminus F_0} a_n - s\Big| \le \Big|\sum_{n \in F_0} a_n - s\Big| + \sum_{n \in F \setminus F_0} |a_n| < 2\varepsilon . \]
This proves that the series $\sum_{n \in \mathbb{N}} a_n$ is unordered convergent, and that $\sum_{n \in \mathbb{N}} a_n = \sum_{n=1}^{\infty} a_n$.

(iv)⇒(i): Let $\sum_{n \in \mathbb{N}} a_n$ be an unordered convergent series, and let $s := \sum_{n \in \mathbb{N}} a_n$. Given $\varepsilon > 0$ there exists $F_0 \in \mathcal{P}_f(\mathbb{N})$ such that $|s_F - s| < \varepsilon$ for every $F \in \mathcal{P}_f(\mathbb{N})$ such that $F \supset F_0$. Let $n_0 := \max\{n : n \in F_0\}$. Then, given $n \ge n_0$ we have $\left|\sum_{k=1}^{n} a_k - s\right| < \varepsilon$. This proves that the series $\sum_{n=1}^{\infty} a_n$ converges (and sums to $s$). If it does not converge absolutely, it follows from (ii) in Proposition 188 that the series $\sum_{n=1}^{\infty} a_n^{+}$ diverges. Fix an arbitrary $F_1 \in \mathcal{P}_f(\mathbb{N})$. We can find $F \in \mathcal{P}_f(\mathbb{N})$ such that $F \cap F_1 = \emptyset$, $a_n \ge 0$ for all $n \in F$, and $\sum_{n \in F} a_n > 1$. Then $\sum_{n \in F \cup F_1} a_n - \sum_{n \in F_1} a_n = \sum_{n \in F} a_n > 1$, and this contradicts the fact that $\sum_{n \in \mathbb{N}} a_n$ is unordered convergent.

(i)⇒(v)⇒(vi)⇒(i): Assume that $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, and let $\{b_n\}_{n=1}^{\infty}$ be a bounded sequence of real numbers. Clearly, the series $\sum_{n=1}^{\infty} b_n a_n$ is also absolutely convergent, hence convergent (see Proposition 167). If $\sum_{n=1}^{\infty} a_n$ is bounded-multiplier convergent, it is obviously sign-multiplier convergent. Assume now that $\sum_{n=1}^{\infty} a_n$ is sign-multiplier convergent. Then, by choosing $\varepsilon_n \in \{-1, 1\}$ such that $\varepsilon_n a_n = |a_n|$, for all $n \in \mathbb{N}$, we get that $\sum_{n=1}^{\infty} |a_n|$ converges.
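The last step of the proof, (vi)⇒(i), can be illustrated numerically. The sketch below is a hedged example, not part of the text: it assumes the alternating harmonic series $a_n = (-1)^{n+1}/n$ as test case, and the helper name `signed_partial_sum` is hypothetical. With all signs equal to $+1$ the partial sums approach $\ln 2$; with the signs $\varepsilon_n = (-1)^{n+1}$, for which $\varepsilon_n a_n = |a_n|$, they are the harmonic partial sums and grow without bound, so this conditionally convergent series is not sign-multiplier convergent.

```python
import math


def signed_partial_sum(term, sign, n_terms):
    """Return sum_{n=1}^{n_terms} sign(n) * term(n)."""
    return sum(sign(n) * term(n) for n in range(1, n_terms + 1))


def alternating_harmonic(n):
    """Term a_n = (-1)**(n+1) / n (illustrative choice of series)."""
    return (-1) ** (n + 1) / n


def all_plus(n):
    """Trivial sign sequence: epsilon_n = +1."""
    return 1


def matching_signs(n):
    """Signs with epsilon_n * a_n = |a_n| for the series above."""
    return (-1) ** (n + 1)


if __name__ == "__main__":
    for n_terms in (10, 1_000, 100_000):
        print(n_terms,
              signed_partial_sum(alternating_harmonic, all_plus, n_terms),
              signed_partial_sum(alternating_harmonic, matching_signs, n_terms))
```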
2.4.6 Double Sequences and Double Series
Notions introduced in this subsection will be used later in this text. For example, besides their use in the theory of infinite products (Sect. 2.6), we shall encounter them in measure theory (e.g., Proposition 236), function series (see, in particular, Theorem 480 and Remark 514.2), integration theory (see, in particular, Theorems 742 and 750, and the arguments related to Fubini's theorem for multiple integrals; G. Fubini was an Italian mathematician), and in distribution theory (e.g., Proposition 1061).

Double Sequences

A double sequence of real numbers is a collection $\{r_{n,m}\}_{n,m=1}^{\infty}$, where $r_{n,m} \in \mathbb{R}$ for all $n, m \in \mathbb{N}$.

Definition 201 We say that the double sequence of real numbers $\{r_{n,m}\}_{n,m=1}^{\infty}$ converges to the double limit $r \in \mathbb{R}$ (and we denote $r := \lim_{n,m\to\infty} r_{n,m}$ or, in short, $r := \lim_{n,m} r_{n,m}$) if for every $\varepsilon > 0$ there exists $n_\varepsilon \in \mathbb{N}$ such that $|r_{n,m} - r| < \varepsilon$ for every $n, m \ge n_\varepsilon$. If the double sequence does not converge, we say that it diverges.

Definition 202 Let $\{r_{n,m}\}_{n,m=1}^{\infty}$ be a double sequence of real numbers. If for every $n \in \mathbb{N}$ the limit $a_n := \lim_{m\to\infty} r_{n,m}$ exists and is finite, and the sequence $\{a_n\}_{n=1}^{\infty}$ converges to $a \in \mathbb{R}$, then we write $a := \lim_{n\to\infty} \lim_{m\to\infty} r_{n,m}$ (in short, $a := \lim_n \lim_m r_{n,m}$), and we call $a$ an iterated limit of the double sequence $\{r_{n,m}\}_{n,m=1}^{\infty}$. Analogously, if for every $m \in \mathbb{N}$ the limit $b_m := \lim_{n\to\infty} r_{n,m}$ exists and is finite, and the sequence $\{b_m\}_{m=1}^{\infty}$ converges to $b$, then we write $b := \lim_{m\to\infty} \lim_{n\to\infty} r_{n,m}$ (in short, $b := \lim_m \lim_n r_{n,m}$), and we also call $b$ an iterated limit of the double sequence $\{r_{n,m}\}_{n,m=1}^{\infty}$.

Proposition 203 Given a convergent double sequence $\{r_{n,m}\}_{n,m=1}^{\infty}$ in $\mathbb{R}$ such that $\lim_m r_{n,m}$ exists and is finite for each $n \in \mathbb{N}$, then $\lim_n \lim_m r_{n,m}$ exists, and
\[ \lim_n \lim_m r_{n,m} = \lim_{n,m} r_{n,m} . \]
A similar statement holds in case we assume that $\lim_n r_{n,m}$ exists for each $m \in \mathbb{N}$.

Proof If $r := \lim_{n,m} r_{n,m}$, given $\varepsilon > 0$ there exists $n_\varepsilon \in \mathbb{N}$ such that $|r_{n,m} - r| < \varepsilon$ for $n, m \ge n_\varepsilon$. Letting $m \to \infty$ we get $|\lim_m r_{n,m} - r| \le \varepsilon$ for all $n \ge n_\varepsilon$. This shows the assertion.

Remark 204 Let us mention two examples on the relationship between double limits and iterated limits.
1. If $r_{n,m} := \frac{n}{m+n}$ for all $n, m \in \mathbb{N}$, then $\lim_n r_{n,m} = 1$ for all $m \in \mathbb{N}$, thus $\lim_m \lim_n r_{n,m} = 1$, and $\lim_m r_{n,m} = 0$ for all $n \in \mathbb{N}$, and so $\lim_n \lim_m r_{n,m} = 0$. However, $r_{n,n} = 1/2$ for all $n \in \mathbb{N}$. Assume that $r := \lim_{n,m} r_{n,m}$ exists. This forces $r = 1/2$. Clearly, $1/2$ is not the double limit of the double sequence $\{r_{n,m}\}_{n,m=1}^{\infty}$ (for instance, $r_{n,2n} = 1/3$ for all $n \in \mathbb{N}$), so the double sequence diverges.
2. Let $r_{n,m} := (-1)^{n+m}\left(\frac{1}{n} + \frac{1}{m}\right)$ for all $n, m \in \mathbb{N}$. Clearly, $\lim_{n,m} r_{n,m} = 0$. However, given $n \in \mathbb{N}$, the sequence $\{r_{n,m}\}_{m=1}^{\infty}$ diverges, as well as the sequence $\{r_{n,m}\}_{n=1}^{\infty}$ for each $m \in \mathbb{N}$. Thus, neither of the two iterated limits of the double sequence $\{r_{n,m}\}_{n,m=1}^{\infty}$ exists.

Double Series

We are mostly interested in double sequences arising as the partial sums of double series. Let us formalize this concept.

Definition 205 Let $\{a_{n,m}\}_{n,m=1}^{\infty}$ be a double sequence of real numbers. For $n, m \in \mathbb{N}$ put $s_{n,m} := \sum_{i=1}^{n} \sum_{j=1}^{m} a_{i,j}$. The double sequence $\{s_{n,m}\}_{n,m=1}^{\infty}$ is said to be the double series generated by $\{a_{n,m}\}_{n,m=1}^{\infty}$, and is denoted by $\sum_{n,m} a_{n,m}$. The element $a_{n,m}$ is called the $(n,m)$-th term of the series $\sum_{n,m} a_{n,m}$. If the double sequence $\{s_{n,m}\}_{n,m=1}^{\infty}$ converges to $s$, we say that the double series converges, and we call $s$ the sum of the double series $\sum_{n,m} a_{n,m}$.

Remark 206 Obviously, given a double sequence $\{s_{n,m}\}_{n,m=1}^{\infty}$ of real numbers, we can define a double sequence $\{a_{n,m}\}_{n,m=1}^{\infty}$ such that the double series generated by $\{a_{n,m}\}_{n,m=1}^{\infty}$ is $\{s_{n,m}\}_{n,m=1}^{\infty}$. This is done inductively: $a_{1,1} := s_{1,1}$, $a_{1,n+1} := s_{1,n+1} - s_{1,n}$ for all $n \in \mathbb{N}$, and $a_{m+1,1} := s_{m+1,1} - s_{m,1}$ for all $m \in \mathbb{N}$. This defines $a_{1,n}$ and $a_{m,1}$ for all $n, m \in \mathbb{N}$. Now $a_{2,2} := s_{2,2} - a_{1,1} - a_{1,2} - a_{2,1}$, etc. This observation, together with the examples in Remark 204, shows that
1. We may have a double series $\sum_{n,m} a_{n,m}$ that converges though neither $\sum_n a_{n,m}$ nor $\sum_m a_{n,m}$ converge for each $n, m \in \mathbb{N}$, and that
2. We may have a divergent double series $\sum_{n,m} a_{n,m}$ such that $\sum_n a_{n,m}$ and $\sum_m a_{n,m}$ exist for all $n, m \in \mathbb{N}$.

Despite the "pathologies" that may arise, shown in Remark 206, a direct application of Proposition 203 to the partial sums of a given double series gives the following stability result, due to the German mathematician A. Pringsheim:

Proposition 207 (Pringsheim) Let $\sum_{n,m} a_{n,m}$ be a convergent double series of real numbers. If $\sum_n a_{n,m}$ exists and is finite for all $m \in \mathbb{N}$ and $\sum_m a_{n,m}$ exists and is finite for all $n \in \mathbb{N}$, then $\sum_n \sum_m a_{n,m}$ and $\sum_m \sum_n a_{n,m}$ both exist, and
\[ \sum_n \sum_m a_{n,m} = \sum_m \sum_n a_{n,m} = \sum_{n,m} a_{n,m} . \]

The following result, due to the Austrian mathematician O. Stolz, characterizes convergent double series, and it is based on the Cauchy convergence criterion. See Fig. 2.3.

Proposition 208 (Stolz) A double series of real numbers converges if, and only if, given $\varepsilon > 0$ there exists $n_\varepsilon \in \mathbb{N}$ such that $|s_{p,q} - s_{n,m}| < \varepsilon$ for every $n, m, p, q$ in $\mathbb{N}$ such that $n_\varepsilon \le n \le p$ and $n_\varepsilon \le m \le q$.
Fig. 2.3 The difference $s_{p,q} - s_{n,m}$ (proof of Proposition 208)

Fig. 2.4 Getting $a_{n+1,m+1}$ by subtracting $s_{n+1,m} - s_{n,m}$ from $s_{n+1,m+1} - s_{n,m+1}$
Proof (For a sketch of the double summands see Fig. 2.3.) Assume first that $\sum_{n,m} a_{n,m}$ converges, and let $s$ be its sum. Then, given $\varepsilon > 0$ there exists $n_\varepsilon \in \mathbb{N}$ such that $|s - s_{n,m}| < \varepsilon$ if $n \ge n_\varepsilon$ and $m \ge n_\varepsilon$. Now, for $n, m, p, q$ as in the statement, $|s - s_{n,m}| < \varepsilon$ and $|s - s_{p,q}| < \varepsilon$. It follows that $|s_{n,m} - s_{p,q}| < 2\varepsilon$. Since $\varepsilon > 0$ is arbitrary, this proves the necessary condition.

As for sufficiency, assume that the condition holds. Given $\varepsilon > 0$ find $n_\varepsilon$ as in the statement. In particular, $|s_{n,n} - s_{p,p}| < \varepsilon$ for $n, p \in \mathbb{N}$ such that $n_\varepsilon \le n \le p$, so the sequence $\{s_{n,n}\}_{n=1}^{\infty}$ is Cauchy, hence convergent. Let $s$ be its limit. Observe that $|s - s_{n,n}| \le \varepsilon$ for every $n \ge n_\varepsilon$. Take $n, m \in \mathbb{N}$ such that $n \ge n_\varepsilon$ and $m \ge n_\varepsilon$. Then $|s_{n,m} - s_{n_\varepsilon,n_\varepsilon}| < \varepsilon$ due to the given condition. Moreover, $|s_{n_\varepsilon,n_\varepsilon} - s| \le \varepsilon$. It follows that $|s_{n,m} - s| < 2\varepsilon$ for $n, m \ge n_\varepsilon$, and this proves that $\sum_{n,m} a_{n,m}$ converges.

Corollary 209 The general terms of a convergent double series form a double sequence that converges to 0.

Proof Let $\sum_{n,m} a_{n,m}$ be a convergent double series. Given $\varepsilon > 0$ we can find, by Proposition 208, $n_\varepsilon \in \mathbb{N}$ such that $|s_{n,m} - s_{p,q}| < \varepsilon$ whenever $n_\varepsilon \le n \le p$ and $n_\varepsilon \le m \le q$. In particular, for $n, m \ge n_\varepsilon$ we have $|s_{n+1,m} - s_{n,m}| < \varepsilon$ and $|s_{n+1,m+1} - s_{n,m+1}| < \varepsilon$. Since $a_{n+1,m+1} = (s_{n+1,m+1} - s_{n,m+1}) - (s_{n+1,m} - s_{n,m})$ (see Fig. 2.4), we get $|a_{n+1,m+1}| < 2\varepsilon$. The proof follows from this easily.

Corollary 210 Let $\sum_{n,m} a_{n,m}$ and $\sum_{n,m} b_{n,m}$ be two double series of nonnegative terms. If $a_{n,m} \le b_{n,m}$ for all $n, m \in \mathbb{N}$ and $\sum_{n,m} b_{n,m}$ converges, then $\sum_{n,m} a_{n,m}$ converges, too.

Proof This follows from Proposition 208. Indeed, it is enough to observe that for $n, m, p, q \in \mathbb{N}$ such that $n \le p$ and $m \le q$, we have $|A_{p,q} - A_{n,m}| \le |B_{p,q} - B_{n,m}|$, where $A_{n,m} := \sum_{i=1}^{n} \sum_{j=1}^{m} a_{i,j}$ and $B_{n,m} := \sum_{i=1}^{n} \sum_{j=1}^{m} b_{i,j}$.
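The identity used in the proof of Corollary 209, together with the inductive construction of Remark 206, can be written as one difference formula: $a_{n,m} = s_{n,m} - s_{n-1,m} - s_{n,m-1} + s_{n-1,m-1}$, with the convention that $s$ vanishes when an index is 0. The following Python sketch is an illustration only; the function names are hypothetical, indices are 0-based in the code, and the test data $s_{n,m} = n/(n+m)$ from Remark 204 is chosen just as an example.

```python
from fractions import Fraction


def terms_from_partial_sums(s):
    """Recover the terms a[i][j] from rectangular partial sums s[i][j].

    Uses a = s[i][j] - s[i-1][j] - s[i][j-1] + s[i-1][j-1], where any
    entry with a negative index is read as 0 (empty sum).
    """
    def S(i, j):
        return s[i][j] if i >= 0 and j >= 0 else 0

    return [[S(i, j) - S(i - 1, j) - S(i, j - 1) + S(i - 1, j - 1)
             for j in range(len(s[0]))]
            for i in range(len(s))]


def partial_sums(a):
    """Rectangular partial sums s[i][j] = sum of a[k][l], k <= i, l <= j."""
    rows, cols = len(a), len(a[0])
    s = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        row_total = 0
        for j in range(cols):
            row_total += a[i][j]                      # sum of row i up to column j
            s[i][j] = row_total + (s[i - 1][j] if i > 0 else 0)
    return s


if __name__ == "__main__":
    # s_{n,m} = n / (n + m) with 1-based n, m (example 1 in Remark 204).
    s = [[Fraction(i + 1, i + j + 2) for j in range(4)] for i in range(4)]
    a = terms_from_partial_sums(s)
    print(a[0][:3])
    print(partial_sums(a) == s)
```

Exact rational arithmetic is used so that the round trip from partial sums to terms and back can be compared with `==`.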
Fig. 2.5 A particular "summation method" (i.e., a particular function $\varphi$) in Proposition 212
Double Series of Nonnegative Terms

For double series of nonnegative terms, the pathologies mentioned in Remark 206 cannot arise. Even more, we may sum a given such double series by any "rule" or "summation method", getting the sum of the double series (see Fig. 2.5). This is the content of Proposition 212. If $\sum_{n,m} a_{n,m}$ is a double series, we put $s_{n,m} := \sum_{i=1}^{n} \sum_{j=1}^{m} a_{i,j}$ for $n, m \in \mathbb{N}$.

Proposition 211 Let $\sum_{n,m} a_{n,m}$ be a double series of nonnegative terms. Then the series converges if, and only if, the double sequence $\{s_{n,m}\}_{n,m=1}^{\infty}$ is bounded above. If this is the case, then the double series $\sum_{n,m} a_{n,m}$ sums to $\sup\{s_{n,m} : n, m \in \mathbb{N}\}$.

Proof Let $s := \sup\{s_{n,m} : n, m \in \mathbb{N}\}$. Given $\varepsilon > 0$ there exist $n_0$ and $m_0$ in $\mathbb{N}$ such that $s_{n_0,m_0} > s - \varepsilon$. Letting $n_\varepsilon := \max\{n_0, m_0\}$ we also have $s_{n_\varepsilon,n_\varepsilon} > s - \varepsilon$. Due to the fact that $a_{n,m} \ge 0$ for all $n, m \in \mathbb{N}$ we get $s_{n,m} \ge s_{n_\varepsilon,n_\varepsilon} > s - \varepsilon$ for all $n, m \ge n_\varepsilon$, and this proves the result.

Proposition 212 Let $\sum_{n,m} a_{n,m}$ be a double series of nonnegative terms. Let $\varphi$ be a one-to-one mapping from $\mathbb{N}$ onto $\mathbb{N} \times \mathbb{N}$. Then the following statements are equivalent:
(i) The double series converges.
(ii) $\sum_n \sum_m a_{n,m}$ exists as a real number.
(iii) $\sum_m \sum_n a_{n,m}$ exists as a real number.
(iv) The series $\sum_{k=1}^{\infty} a_{\varphi(k)}$ converges.
If one (and then all) of the previous statements hold, then
\[ \sum_{n,m} a_{n,m} = \sum_n \sum_m a_{n,m} = \sum_m \sum_n a_{n,m} = \sum_k a_{\varphi(k)} . \tag{2.32} \]

Proof Observe first that for $n \in \mathbb{N}$ the sequence $\{s_{n,m}\}_{m=1}^{\infty}$ is increasing, and for $m \in \mathbb{N}$ the sequence $\{s_{n,m}\}_{n=1}^{\infty}$ is increasing, too. Moreover, $s_{n,m} \le s_{p,q}$ for $n \le p$ and $m \le q$.

(i)⇒(ii): Assume that $\sum_{n,m} a_{n,m}$ is convergent. Let $s$ be its sum. Since $s_{n,m} \le s$ for all $n, m \in \mathbb{N}$, we get from the previous observation that $\{s_{n,m}\}_{m=1}^{\infty}$ and $\{s_{n,m}\}_{n=1}^{\infty}$ both converge, that $\lim_m s_{n,m} \le s$ for all $n \in \mathbb{N}$, and that $\lim_n s_{n,m} \le s$ for all $m \in \mathbb{N}$.

(ii)⇒(i): Put $u := \sum_n \sum_m a_{n,m}$. Let $\varepsilon > 0$. We can find $n_0 \in \mathbb{N}$ such that
\[ \sum_{n=1}^{n_0} \sum_{m=1}^{\infty} a_{n,m} > u - \varepsilon . \]
For $n \in \{1, 2, \ldots, n_0\}$ we can find $m_n \in \mathbb{N}$ such that
\[ \sum_{m=1}^{m_n} a_{n,m} > \sum_{m=1}^{\infty} a_{n,m} - \varepsilon/n_0 . \]
Therefore,
\[ \sum_{n=1}^{n_0} \sum_{m=1}^{m_n} a_{n,m} > \sum_{n=1}^{n_0} \sum_{m=1}^{\infty} a_{n,m} - \varepsilon > u - \varepsilon - \varepsilon = u - 2\varepsilon . \]
Let $n_\varepsilon := \max\{n_0, m_n : n = 1, 2, \ldots, n_0\}$. Then
\[ s_{n_\varepsilon,n_\varepsilon} = \sum_{n=1}^{n_\varepsilon} \sum_{m=1}^{n_\varepsilon} a_{n,m} \ge \sum_{n=1}^{n_0} \sum_{m=1}^{m_n} a_{n,m} > u - 2\varepsilon . \]
Since $s_{p,q} \ge s_{n_\varepsilon,n_\varepsilon}$ for $p, q \ge n_\varepsilon$, we finally get $u - 2\varepsilon < s_{p,q} \le u$ for all $p, q \ge n_\varepsilon$, and this shows that $\sum_{n,m} a_{n,m}$ converges and sums to $u$.

(i)⇒(iii) and (iii)⇒(i) are shown in the same way as (i)⇒(ii) and (ii)⇒(i). Put $v := \sum_m \sum_n a_{n,m}$. Incidentally, we showed that $\sum_{n,m} a_{n,m} = v$.

The argument to prove (i)⇔(iv) is based on the following observations:
(a) The sequence $\left\{\sum_{k=1}^{n} a_{\varphi(k)}\right\}_{n=1}^{\infty}$ is clearly increasing.
(b) Given $K \in \mathbb{N}$ there exists $N(K) \in \mathbb{N}$ such that $\{\varphi(k) : k = 1, 2, \ldots, K\} \subset \{(n,m) : n, m \le N(K)\}$.
(c) Given $N \in \mathbb{N}$ there exists $K(N) \in \mathbb{N}$ such that $\{(n,m) : n, m \le N\} \subset \{\varphi(k) : k = 1, 2, \ldots, K(N)\}$. This is due to the fact that $\varphi$ is onto.

(i)⇒(iv): Given $\varepsilon > 0$, we can find $n_\varepsilon \in \mathbb{N}$ such that $s_{n_\varepsilon,n_\varepsilon} > s - \varepsilon$, where $s = \sum_{n,m} a_{n,m}$. Find $k_\varepsilon := K(n_\varepsilon)$ as in (c) above. It follows that $\sum_{k=1}^{k_\varepsilon} a_{\varphi(k)} > s - \varepsilon$. Since the injectivity of $\varphi$ implies, obviously, that the sequence $\left\{\sum_{k=1}^{n} a_{\varphi(k)}\right\}_{n=1}^{\infty}$ is bounded above by $s$, we get from (a) above that $\sum_{k=1}^{\infty} a_{\varphi(k)}$ converges, and it sums to $s$.

(iv)⇒(i): Assume that $\sum_{k=1}^{\infty} a_{\varphi(k)} = \ell$. Given $\varepsilon > 0$ we can find $k_\varepsilon \in \mathbb{N}$ such that $\ell - \varepsilon < \sum_{k=1}^{k_\varepsilon} a_{\varphi(k)}$. Find $n_\varepsilon := N(k_\varepsilon)$ as in (b) above. It follows that $\ell - \varepsilon < s_{n_\varepsilon,n_\varepsilon}$. Since $\varphi$ is onto, we clearly have $s_{n,m} \le \ell$ for $n, m \in \mathbb{N}$. This shows that $\ell - \varepsilon < s_{n,m} \le \ell$ for $n, m \ge n_\varepsilon$, and so $\sum_{n,m} a_{n,m}$ converges and sums to $\ell$.

The last assertion follows from the proof of the equivalences above.

Absolutely Convergent Double Series

In view of Proposition 212, and by analogy with the case of series of real numbers, it can be expected that, regarding convergence, the behavior of an absolutely convergent double series should be stable. By an absolutely convergent double series we mean a double series $\sum_{n,m} a_{n,m}$ such that $\sum_{n,m} |a_{n,m}|$ converges.

Proposition 213 If $\sum_{n,m} a_{n,m}$ is an absolutely convergent double series of real numbers and $\varphi$ is a one-to-one mapping from $\mathbb{N}$ onto $\mathbb{N} \times \mathbb{N}$, then
(i) The double series $\sum_{n,m} a_{n,m}$ converges.
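(ii) $\sum_n \sum_m a_{n,m}$ exists as a real number.
(iii) $\sum_m \sum_n a_{n,m}$ exists as a real number.
(iv) The series $\sum_{k=1}^{\infty} a_{\varphi(k)}$ converges.
Moreover,
\[ \sum_{n,m} a_{n,m} = \sum_n \sum_m a_{n,m} = \sum_m \sum_n a_{n,m} = \sum_k a_{\varphi(k)} . \tag{2.33} \]

Proof Put $s_{n,m} := \sum_{i=1}^{n} \sum_{j=1}^{m} a_{i,j}$ and $S_{n,m} := \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{i,j}|$ for $n, m \in \mathbb{N}$.

(i) Assume that $\sum_{n,m} |a_{n,m}|$ converges. By Proposition 208, given $\varepsilon > 0$ there exists $n_\varepsilon \in \mathbb{N}$ such that for $n, m, p, q$ in $\mathbb{N}$ such that $n_\varepsilon \le n \le p$ and $n_\varepsilon \le m \le q$ we have $|S_{n,m} - S_{p,q}| < \varepsilon$. It is clear that $|s_{n,m} - s_{p,q}| \le |S_{n,m} - S_{p,q}|$ $(< \varepsilon)$. Again by Proposition 208, we get that $\sum_{n,m} a_{n,m}$ converges. Let $s$ be its sum.

(ii) For $n, m \in \mathbb{N}$, let
\[ p_{n,m} := \begin{cases} a_{n,m} & \text{if } a_{n,m} \ge 0, \\ 0 & \text{if } a_{n,m} < 0, \end{cases} \qquad q_{n,m} := \begin{cases} 0 & \text{if } a_{n,m} \ge 0, \\ -a_{n,m} & \text{if } a_{n,m} < 0. \end{cases} \]
Then, for all $n, m \in \mathbb{N}$,
\[ a_{n,m} = p_{n,m} - q_{n,m}, \tag{2.34} \]
\[ |a_{n,m}| = p_{n,m} + q_{n,m}, \tag{2.35} \]
\[ 0 \le p_{n,m} \le |a_{n,m}|, \text{ and} \tag{2.36} \]
\[ 0 \le q_{n,m} \le |a_{n,m}| . \tag{2.37} \]
From (2.36), (2.37), and Corollary 210, we get that the double series $\sum_{n,m} p_{n,m}$ and $\sum_{n,m} q_{n,m}$ both converge. From Proposition 212 it follows that
\[ \sum_n \sum_m p_{n,m} = \sum_{n,m} p_{n,m}, \quad \text{and} \quad \sum_n \sum_m q_{n,m} = \sum_{n,m} q_{n,m} . \]
Observe that for $n \in \mathbb{N}$ and $M \in \mathbb{N}$, it follows from (2.34) that
\[ \sum_{m=1}^{M} p_{n,m} - \sum_{m=1}^{M} q_{n,m} = \sum_{m=1}^{M} a_{n,m} . \]
This shows that $\sum_m a_{n,m}$ exists and coincides with $\sum_m p_{n,m} - \sum_m q_{n,m}$. Moreover, given $N \in \mathbb{N}$, we have
\[ \sum_{n=1}^{N} \sum_m p_{n,m} - \sum_{n=1}^{N} \sum_m q_{n,m} = \sum_{n=1}^{N} \sum_m a_{n,m}, \]
hence $\sum_n \sum_m a_{n,m}$ exists and coincides with $\sum_n \sum_m p_{n,m} - \sum_n \sum_m q_{n,m}$ $\left(= \sum_{n,m} p_{n,m} - \sum_{n,m} q_{n,m}\right)$. Due to the fact that, again from (2.34),
\[ \sum_{i=1}^{n} \sum_{j=1}^{m} p_{i,j} - \sum_{i=1}^{n} \sum_{j=1}^{m} q_{i,j} = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{i,j} \]
for all $n, m \in \mathbb{N}$, we get $\sum_{n,m} p_{n,m} - \sum_{n,m} q_{n,m} = \sum_{n,m} a_{n,m}$ (note that $\sum_{n,m} a_{n,m}$ exists by (i) here). This shows, finally, that $\sum_n \sum_m a_{n,m} = \sum_{n,m} a_{n,m}$.

(iii) is proved in the same way as (ii).

(iv) follows from the obvious fact that for $K \in \mathbb{N}$ we have $\sum_{k=1}^{K} p_{\varphi(k)} - \sum_{k=1}^{K} q_{\varphi(k)} = \sum_{k=1}^{K} a_{\varphi(k)}$, and from (i)⇔(iv) in Proposition 212 applied to $\sum_{n,m} p_{n,m}$ and $\sum_{n,m} q_{n,m}$.

Equalities (2.33) follow from the previous arguments in (ii), (iii), and (iv).

Corollary 214 Let $\sum_{n,m} a_{n,m}$ be an absolutely convergent double series. Let $\sigma : \mathbb{N} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be a one-to-one and onto mapping. Then the double series $\sum_{n,m} a_{\sigma(n,m)}$ (called a reordering of the double series $\sum_{n,m} a_{n,m}$) is also absolutely convergent, and $\sum_{n,m} a_{n,m} = \sum_{n,m} a_{\sigma(n,m)}$.

Proof Let $\varphi : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be a one-to-one and onto mapping. By Proposition 212, the series $\sum_k |a_{\varphi(k)}|$ converges, and $\sum_{n,m} |a_{n,m}| = \sum_k |a_{\varphi(k)}|$. Let $\delta : \mathbb{N} \to \mathbb{N}$ be the mapping $\varphi^{-1} \circ \sigma \circ \varphi$. This is a one-to-one mapping from $\mathbb{N}$ onto $\mathbb{N}$. Put $A_k := |a_{\varphi(k)}|$ for $k \in \mathbb{N}$. By Proposition 190, the series $\sum_k A_{\delta(k)}$ converges, and $\sum_k A_k = \sum_k A_{\delta(k)}$. Observe that, if $(n,m) = \varphi(k)$, then $(\varphi \circ \delta)(k) = \sigma(n,m)$. Thus,
\[ \sum_{n,m} |a_{n,m}| = \sum_k |a_{\varphi(k)}| = \sum_k A_k = \sum_k A_{\delta(k)} = \sum_k |a_{\varphi(\delta(k))}| = \sum_{n,m} |a_{\sigma(n,m)}|, \]
where the last equality follows from Proposition 212. This shows that the double series $\sum_{n,m} a_{\sigma(n,m)}$ is absolutely convergent. Now, by Proposition 213 and Proposition 190, we get
\[ \sum_{n,m} a_{n,m} = \sum_k a_{\varphi(k)} = \sum_k a_{\varphi(\delta(k))} = \sum_{n,m} a_{\sigma(n,m)} . \]
This finalizes the proof.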
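The equalities (2.33) can be checked numerically on a concrete absolutely convergent double series. The sketch below is an illustration under assumptions that are not in the text: the terms are chosen as $a_{n,m} = (-1)^{n+m}/(2^n 3^m)$, whose sum is $(-1/3)(-1/4) = 1/12$, the enumeration $\varphi$ runs along the anti-diagonals $n + m = d$, and all function names are hypothetical.

```python
def term(n, m):
    """Illustrative term a_{n,m} = (-1)**(n+m) / (2**n * 3**m), n, m >= 1."""
    return (-1) ** (n + m) / (2 ** n * 3 ** m)


def square_partial_sum(N):
    """Partial sum s_{N,N} of the double series."""
    return sum(term(n, m) for n in range(1, N + 1) for m in range(1, N + 1))


def rows_then_columns(N, M):
    """Truncated iterated sum: outer index n, inner index m."""
    return sum(sum(term(n, m) for m in range(1, M + 1)) for n in range(1, N + 1))


def columns_then_rows(N, M):
    """Truncated iterated sum: outer index m, inner index n."""
    return sum(sum(term(n, m) for n in range(1, N + 1)) for m in range(1, M + 1))


def diagonal_sum(D):
    """Sum of a_{phi(k)} over all pairs with n + m <= D (anti-diagonal order)."""
    total = 0.0
    for d in range(2, D + 1):
        for n in range(1, d):
            total += term(n, d - n)
    return total


if __name__ == "__main__":
    print(square_partial_sum(40), rows_then_columns(40, 40),
          columns_then_rows(40, 40), diagonal_sum(90), 1 / 12)
```

All four truncated sums agree with $1/12$ up to rounding and truncation error, as Proposition 213 predicts for the exact sums.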
2.4.7 Product of Series
Assume that $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are two convergent series. Sometimes we need to compute the product of the two sums, and it is natural to ask whether it coincides with the sum of the series that is obtained by formally multiplying the two given series. Before proceeding we need to establish what we mean by the "formal product" of the two series. Let us proceed for the moment without regarding convergence:
\[ \Big(\sum_{n=0}^{\infty} a_n\Big)\Big(\sum_{n=0}^{\infty} b_n\Big) = (a_0 + a_1 + \ldots)(b_0 + b_1 + \ldots) = (a_0 b_0) + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \ldots = \sum_{n=0}^{\infty} c_n, \]
where we have been collecting terms $a_k b_{n-k}$, $k = 0, 1, 2, \ldots, n$, to form $c_n := \sum_{k=0}^{n} a_k b_{n-k}$ for all $n \in \mathbb{N} \cup \{0\}$. The expression $\sum_{n=0}^{\infty} c_n$ is called the Cauchy product of the two series.

A way to look at this product is the following: form the double sequence $\{p_{n,m} := a_n b_m\}_{n,m=0}^{\infty}$, and define a particular summation method (i.e., a one-to-one and onto mapping $\varphi : \mathbb{N} \cup \{0\} \to (\mathbb{N} \cup \{0\}) \times (\mathbb{N} \cup \{0\})$, see Propositions 212 and 213) as follows (see Fig. 2.5):
\[ \begin{pmatrix} \varphi(0) = (0,0) & \varphi(2) = (0,1) & \varphi(5) = (0,2) & \varphi(9) = (0,3) & \ldots \\ \varphi(1) = (1,0) & \varphi(4) = (1,1) & \varphi(8) = (1,2) & \ldots & \\ \varphi(3) = (2,0) & \varphi(7) = (2,1) & \ldots & & \\ \varphi(6) = (3,0) & \ldots & & & \\ \varphi(10) = (4,0) & \ldots & & & \end{pmatrix} \]

Assume that the series $\sum_n a_n$ and $\sum_m b_m$ are both absolutely convergent, and put $A := \sum_n |a_n|$, $B := \sum_m |b_m|$. Clearly, $AB$ is the supremum of the partial sums (and so the sum) of the double series $\sum_{n,m} |a_n| \cdot |b_m|$ (see Proposition 211). In particular, the double series $\sum_{n,m} p_{n,m}$ is absolutely convergent, and so $\sum_{k=0}^{\infty} p_{\varphi(k)}$ converges to the sum $\sum_{n,m} p_{n,m}$ $\left(= \sum_n a_n \sum_m b_m\right)$ (see Proposition 213). The sequence of partial sums of the series $\sum_{k=0}^{\infty} c_k$, which was called the Cauchy product of the two series $\sum_n a_n$ and $\sum_m b_m$, is just a subsequence of the sequence of partial sums of the series $\sum_{k=0}^{\infty} p_{\varphi(k)}$. This shows that the Cauchy product of two absolutely convergent series is (absolutely) convergent, and its sum is the product of the sums of the two series. This result was improved by the German-Polish mathematician F. Mertens. It is presented below.

Theorem 215 (Mertens) Let $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ be two series of real numbers. Assume that
(i) $\sum_{n=0}^{\infty} a_n$ converges absolutely.
(ii) $\sum_{n=0}^{\infty} a_n = A$.
(iii) $\sum_{n=0}^{\infty} b_n = B$.
(iv) $c_n := \sum_{k=0}^{n} a_k b_{n-k}$ for all $n = 0, 1, 2, \ldots$
Then $\sum_{n=0}^{\infty} c_n = AB$.
2.5 The Euler Number e
99
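Proof Put, for $n = 0, 1, 2, \ldots$,
\[ A_n := \sum_{k=0}^{n} a_k, \quad B_n := \sum_{k=0}^{n} b_k, \quad C_n := \sum_{k=0}^{n} c_k, \quad \beta_n := B_n - B. \]
Then
\begin{align*}
C_n &= a_0 b_0 + (a_0 b_1 + a_1 b_0) + \ldots + (a_0 b_n + a_1 b_{n-1} + \ldots + a_n b_0) \\
&= a_0 B_n + a_1 B_{n-1} + \ldots + a_n B_0 \\
&= a_0 (B + \beta_n) + a_1 (B + \beta_{n-1}) + \ldots + a_n (B + \beta_0) \\
&= A_n B + a_0 \beta_n + a_1 \beta_{n-1} + \ldots + a_n \beta_0 .
\end{align*}
Put $\gamma_n := a_0 \beta_n + a_1 \beta_{n-1} + \ldots + a_n \beta_0$. We wish to show that $C_n \to AB$. Clearly, $A_n B \to AB$. Therefore, it suffices to show that
\[ \gamma_n \to 0. \tag{2.38} \]
Put $\alpha := \sum_{n=0}^{\infty} |a_n|$ (here we use (i)). Fix $\varepsilon > 0$. By (iii) we have $\beta_n \to 0$. Hence we may choose $N \in \mathbb{N}$ such that $|\beta_n| < \varepsilon$ for every $n \ge N$. For $n > N$ we have
\[ |\gamma_n| \le |\beta_0 a_n + \ldots + \beta_N a_{n-N}| + |\beta_{N+1} a_{n-N-1} + \ldots + \beta_n a_0| \le |\beta_0 a_n + \ldots + \beta_N a_{n-N}| + \varepsilon\alpha . \]
Keeping $N$ fixed, and letting $n \to \infty$, we get
\[ \limsup_{n\to\infty} |\gamma_n| \le \varepsilon\alpha, \]
since $a_k \to 0$ as $k \to \infty$. Since $\varepsilon$ is arbitrary, (2.38) follows.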
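Mertens' theorem only asks one of the two factors to converge absolutely. The following Python sketch illustrates this under assumptions of our own choosing: $a_n = 2^{-n}$ (absolutely convergent, $A = 2$) and $b_n = (-1)^n/(n+1)$ (conditionally convergent, $B = \ln 2$), so the Cauchy product should approach $2\ln 2$. The function name `cauchy_product_partial_sum` is hypothetical.

```python
import math


def cauchy_product_partial_sum(a, b, n_max):
    """Return C_n for n = n_max, where C_n = c_0 + ... + c_n and
    c_n = sum_{k=0}^{n} a(k) * b(n - k)."""
    a_vals = [a(k) for k in range(n_max + 1)]
    b_vals = [b(k) for k in range(n_max + 1)]
    total = 0.0
    for n in range(n_max + 1):
        # n-th term of the Cauchy product.
        total += sum(a_vals[k] * b_vals[n - k] for k in range(n + 1))
    return total


def geometric_half(n):
    """Illustrative absolutely convergent factor: a_n = (1/2)**n, A = 2."""
    return 0.5 ** n


def alternating(n):
    """Illustrative conditionally convergent factor: b_n = (-1)**n/(n+1), B = ln 2."""
    return (-1) ** n / (n + 1)


if __name__ == "__main__":
    for n_max in (10, 100, 2000):
        print(n_max,
              cauchy_product_partial_sum(geometric_half, alternating, n_max),
              2 * math.log(2))
```

The convergence is slow here because $\beta_n = B_n - B$ only decays like $1/n$, which is exactly the quantity that the proof controls through $\gamma_n$.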
2.5 The Euler Number e

This section deals with the important number $e$, to be introduced in Definition 217. This is the way the Swiss mathematician J. Bernoulli formulated the so-called compound-interest problem:

Quaeritur, si creditor aliquis pecuniae summam faenori exponat, ea lege, ut singulis momentis pars proportionalis usurae annuae sorti annumeretur; quantum ipsi finito anno debeatur? (The question is, if the creditor lends a sum of money at usury following the rule that the proportion of the interest rate should be paid at every moment to complete the annual allotment, how much is due to the end of the year?) Jacob Bernoulli, circa 1700

Let us present it in more detail: You invest \$1 into a bank that pays a 100 % interest rate. Let us see how much money you would get at the end of 1 year with various compounding options. If the bank issues annual compounding, one obtains $(1 + 1)^1 = \$2$. If the bank decides on semiannual compounding, one obtains
\[ \Big(1 + \frac{1}{2}\Big)^2 = \$2.25 \]
dollars, a little more. Quarterly compounding would yield $(1 + \frac{1}{4})^4 = \$2.44140625$, still a bit more. Daily compounding yields $(1 + \frac{1}{365})^{365} \approx \$2.71456748$. In general, if you allow $n$ compoundings per year, you obtain
\[ \Big(1 + \frac{1}{n}\Big)^n \]
dollars. Thus, it is beneficial for your investment to increase the frequency of compounding per year as much as possible. The best option, of course, is to obtain continuous compounding, that is, compounding where you collect interest at all instants of time. In this case you would obtain
\[ \lim_{n\to\infty} \Big(1 + \frac{1}{n}\Big)^n \]
dollars at the end of the year, if such a limit exists. In this section we show that such a limit indeed exists, and the limit is the irrational number $e$, named after the Swiss mathematician L. Euler.

Read Euler: he is our master in everything. Pierre-Simon, marquis de Laplace

Fig. 2.6 Two functions that approximate $e$ (Proposition 216)
The strategy for showing that the limit exists consists of building two sequences, one strictly increasing and bounded above, and the other one strictly decreasing and bounded below (so that, by Theorem 135, both sequences converge), and having a common limit, called $e$.

Proposition 216 For $n \in \mathbb{N}$ define
\[ a_n = \Big(1 + \frac{1}{n}\Big)^{n}, \quad \text{and} \quad b_n = \Big(1 + \frac{1}{n}\Big)^{n+1} . \]
Then the sequence $\{a_n\}_{n=1}^{\infty}$ is strictly increasing, and the sequence $\{b_n\}_{n=1}^{\infty}$ is strictly decreasing. Moreover, for all $n$ we have $2 \le a_n < b_n \le 4$. The two limits $\lim_{n\to\infty} a_n$ and $\lim_{n\to\infty} b_n$ exist as finite numbers, and
\[ \lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n . \]

Proof Consider the above sequences starting at $n = 2$. Using Bernoulli's inequality (2.5) for $n \ge 2$, we have
\[ \frac{a_n}{b_{n-1}} = \Big(\frac{n+1}{n}\Big)^{n} \Big(\frac{n}{n-1}\Big)^{-n} = \Big(1 - \frac{1}{n^2}\Big)^{n} > 1 - \frac{1}{n} . \]
Therefore
\[ a_n > b_{n-1}\, \frac{n-1}{n} = a_{n-1}, \]
and thus $\{a_n\}$ is strictly increasing. Similarly,
\[ \frac{b_{n-1}}{a_n} = \Big(1 + \frac{1}{n-1}\Big)^{n} \Big(1 + \frac{1}{n}\Big)^{-n} = \Big(\frac{n^2}{n^2 - 1}\Big)^{n} = \Big(1 + \frac{1}{n^2 - 1}\Big)^{n} > \Big(1 + \frac{1}{n^2}\Big)^{n} > 1 + \frac{1}{n}, \]
and so
\[ b_{n-1} > a_n \Big(1 + \frac{1}{n}\Big) = b_n, \]
thus the sequence $\{b_n\}$ is strictly decreasing. Moreover, note that $2 = a_1 < a_n < b_n < b_1 = 4$ for all $n > 1$, so both sequences $\{a_n\}$ and $\{b_n\}$ are bounded. By Theorem 135, the following limits exist as finite numbers:
\[ \lim_{n\to\infty} a_n, \quad \text{and} \quad \lim_{n\to\infty} b_n . \]
Moreover,
\[ \lim_{n\to\infty} b_n = \lim_{n\to\infty} a_n \cdot \lim_{n\to\infty} \Big(1 + \frac{1}{n}\Big) = \lim_{n\to\infty} a_n, \]
and thus
\[ 2 < a_2 = \frac{9}{4} \le \lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n \le b_2 = \frac{27}{8} < 4 . \]
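The monotonicity and the bracketing in Proposition 216 are easy to observe with exact rational arithmetic. The following Python sketch is only an illustration; the function names `a` and `b` mirror the notation of the proposition, and the use of `fractions.Fraction` is our own choice to avoid rounding in the comparisons.

```python
from fractions import Fraction


def a(n):
    """a_n = (1 + 1/n)**n as an exact rational number."""
    return Fraction(n + 1, n) ** n


def b(n):
    """b_n = (1 + 1/n)**(n + 1) as an exact rational number."""
    return Fraction(n + 1, n) ** (n + 1)


if __name__ == "__main__":
    # Compounding 1, 2, 4 and 365 times a year, and the upper bounds b_n.
    for n in (1, 2, 4, 365):
        print(n, float(a(n)), float(b(n)))
```

Since $b_n - a_n = a_n / n$, the two sequences squeeze their common limit at rate roughly $e/n$, which is why the limit in (2.39) converges slowly compared with the series studied next.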
Definition 217 We denote
\[ e := \lim_{n\to\infty} \Big(1 + \frac{1}{n}\Big)^{n}, \tag{2.39} \]
a well-defined real number due to the existence of the limit (see Proposition 216).

Observe now that, for $n \in \mathbb{N}$,
\[ \Big(1 + \frac{1}{n}\Big)^{n} = \binom{n}{0} + \binom{n}{1}\frac{1}{n} + \binom{n}{2}\frac{1}{n^2} + \ldots + \binom{n}{n}\frac{1}{n^n}, \tag{2.40} \]
where $\binom{n}{p}$ is a combinatorial number, defined as the number of choices of $p$ elements from a set of $n$ elements (the order is not considered), hence
\[ \binom{n}{p} = \frac{n!}{p!\,(n-p)!}, \quad \text{for } n \in \mathbb{N} \text{ and } p = 0, 1, 2, \ldots, n, \tag{2.41} \]
(see Exercise 13.10, where (2.40) is proved in the more general form of the so-called finite binomial expansion $(a + b)^n$, and (2.41) is established). So, the $p$-th summand in (2.40) is
\[ \binom{n}{p}\frac{1}{n^p} = \frac{n(n-1)\cdots(n-p+1)}{p!}\,\frac{1}{n^p}, \]
and by dividing each of the $p$ factors in the numerator by $n$ we can see that it converges to $1/p!$ as $n \to \infty$. This suggests studying the series $\sum_{p=0}^{\infty} \frac{1}{p!}$.

Lemma 218 The series (recall that $0! = 1$)
\[ \sum_{n=0}^{\infty} \frac{1}{n!} \tag{2.42} \]
is convergent.

Proof The statement follows from Proposition 175: Indeed,
\[ \frac{1/(n+1)!}{1/n!} = \frac{n!}{(n+1)!} = \frac{1}{n+1} \to 0 \quad \text{whenever } n \to \infty . \]

Let us provide a direct proof of Lemma 218 that shows some useful bounds for the difference of partial sums of the series (2.42). Let $\varepsilon > 0$ be given and choose $n_\varepsilon$ so that $\frac{1}{n_\varepsilon (n_\varepsilon !)} < \varepsilon$. Let $n, m \ge n_\varepsilon$ be such that $m > n$. If $s_n := \sum_{k=0}^{n} 1/k!$ for $n \in \mathbb{N}$, we have
\begin{align*}
s_m - s_n &= \frac{1}{(n+1)!} + \cdots + \frac{1}{m!} \\
&= \frac{1}{n!}\Big(\frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \cdots + \frac{1}{(n+1)(n+2)\cdots(n+m-n)}\Big) \\
&\le \frac{1}{n!}\Big(\frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots + \frac{1}{(n+1)^{m-n}}\Big) < \frac{1}{n\,(n!)} \le \frac{1}{n_\varepsilon (n_\varepsilon !)} < \varepsilon, \tag{2.43}
\end{align*}
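where the second inequality may be checked by using formula (2.20). This shows that the given series is Cauchy. It follows from Proposition 165 that the given series converges.

The series in (2.42) will play an important role in the sequel.

Proposition 219 We have
\[ e = \sum_{n=0}^{\infty} \frac{1}{n!} \tag{2.44} \]
and the number $e$ is irrational.

Proof By Lemma 218, the series in (2.44) converges. Denote its partial sums and the sum of the series respectively by
\[ s_n = \sum_{k=0}^{n} \frac{1}{k!}, \; n \in \mathbb{N}, \qquad s = \sum_{k=0}^{\infty} \frac{1}{k!} . \]
(Note that $s_n$ adds $n + 1$ summands.) If $m \ge n \ge 1$, recall the estimate $s_m - s_n < \frac{1}{n\,n!}$ (see equation (2.43)). Since this estimate is valid for all $m \ge n$, we obtain, by passing to the limit for $m$,
\[ s - s_n \le \frac{1}{n\,(n!)}, \quad \text{for all } n \in \mathbb{N}. \tag{2.45} \]
With the notation of Proposition 216, and having in mind (2.40) and (2.41), we can write
\begin{align*}
a_n = \Big(1 + \frac{1}{n}\Big)^{n} &= 1 + \sum_{k=1}^{n} \frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k\,(k!)} \\
&= 1 + \sum_{k=1}^{n} \frac{1}{k!}\Big(1 - \frac{1}{n}\Big)\Big(1 - \frac{2}{n}\Big)\cdots\Big(1 - \frac{k-1}{n}\Big) \le 1 + \sum_{k=1}^{n} \frac{1}{k!} = s_n . \tag{2.46}
\end{align*}
Letting $n \to \infty$ we get $e \le s$. On the other hand, given $p \in \mathbb{N}$, for $n > p$ we have
\[ e \ge a_n = \Big(1 + \frac{1}{n}\Big)^{n} > 1 + \sum_{k=1}^{p} \frac{1}{k!}\Big(1 - \frac{1}{n}\Big)\Big(1 - \frac{2}{n}\Big)\cdots\Big(1 - \frac{k-1}{n}\Big), \tag{2.47} \]
where the second inequality follows from the third equality in (2.46) just by taking a smaller number $p$ of the summands involved. Letting $n \to \infty$ in (2.47), it becomes clear that $e \ge s_p$ for all $p \in \mathbb{N}$, hence $e \ge s$. Together with $e \le s$, this proves (2.44).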
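Estimate (2.45) shows how quickly the partial sums $s_n$ approach $e$. The Python sketch below is an illustration only: the helper names `s` and `remainder_bound` are hypothetical, and `math.e` is used as a floating-point reference value for $e$, which limits the comparison to moderate values of $n$.

```python
import math
from fractions import Fraction
from math import factorial


def s(n):
    """Exact partial sum s_n = sum_{k=0}^{n} 1/k!."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))


def remainder_bound(n):
    """Right-hand side 1/(n * n!) of estimate (2.45)."""
    return Fraction(1, n * factorial(n))


if __name__ == "__main__":
    for n in range(1, 11):
        gap = math.e - float(s(n))          # approximates e - s_n
        print(n, float(s(n)), gap, float(remainder_bound(n)))
```

Already $s_{10}$ agrees with $e$ to seven decimal places, whereas $(1 + 1/n)^n$ needs $n$ of the order of millions for comparable accuracy.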
We now show that $e$ is irrational. Arguing by contradiction, assume that $e$ is a fraction, say
\[ e = \frac{p}{q}, \quad \text{where } \gcd(p, q) = 1, \text{ and } p > 1. \]
Note that (using the notation in Proposition 216) $a_2 = 2.25 < e < b_5 = 2.9859\ldots$ Thus $e$ is not an integer number, and so $q \ge 2$. Invoke the estimate (2.45), and observe that $\{s_n\}_{n=1}^{\infty}$ is a strictly increasing sequence. We get
\[ 0 < e - s_q \le \frac{1}{q\,(q!)} . \]
Multiply by $q!$ and obtain $0 < q!\,(e - s_q) \le 1/q < 1$. However, $q!\,e = (q-1)!\,p$ and $q!\,s_q = \sum_{k=0}^{q} q!/k!$ are both integers, so $q!\,(e - s_q)$ would be an integer strictly between 0 and 1, a contradiction.

[…] $u_n > -1$ for all $n \in \mathbb{N}$. The following result provides tests for convergence of the infinite product by checking the behavior of the infinite series $\sum_{n=1}^{\infty} u_n$.

Proposition 225 Put $a_n = 1 + u_n$ for all $n \in \mathbb{N}$.
(i) Assume $u_n \ge 0$ for all $n \in \mathbb{N}$. Then $\prod_{n=1}^{\infty} a_n$ converges if, and only if, $\sum_{n=1}^{\infty} u_n$ converges.
(ii) Assume $u_n > -1$ for all $n \in \mathbb{N}$.
(iia) If $\sum_{n=1}^{\infty} u_n$ is absolutely convergent, then $\prod_{n=1}^{\infty} a_n$ converges.
(iib) If $\sum_{n=1}^{\infty} u_n$ converges, then $\prod_{n=1}^{\infty} a_n$ converges if, and only if, $\sum_{n=1}^{\infty} u_n^2$ converges.

Proof (i) This is a consequence of the following inequality:
\[ 1 + \sum_{n=1}^{N} u_n \le \prod_{n=1}^{N} (1 + u_n) \le \exp\Big(\sum_{n=1}^{N} u_n\Big), \quad \text{if } u_n \ge 0 \text{ for } n = 1, 2, \ldots, N, \; N \in \mathbb{N}. \tag{2.49} \]
The validity of (2.49) can be checked as follows: first, $(1 + u_1)(1 + u_2) \cdots (1 + u_N) \ge 1 + u_1 + u_2 + \ldots + u_N$ due to the fact that all summands in the expansion of $(1 + u_1)(1 + u_2) \cdots (1 + u_N)$ are nonnegative. The second inequality in (2.49) follows by taking logarithms of both sides of the inequality and from the fact that $\ln(1 + u) \le u$ for all $u \ge 0$ (see Corollary 539).

(iia) Let $|u| < 1$. Since $\ln(1 + u) = u - \frac{u^2}{2} + \frac{u^3}{3} - \ldots$ (see formula (5.77)), we get
\[ |\ln(1 + u)| \le \sum_{n=1}^{\infty} \frac{|u|^n}{n} \le \sum_{n=1}^{\infty} |u|^n = \frac{|u|}{1 - |u|} . \]
Thus, if $|u| \le 1/2$, then $|\ln(1 + u)| \le 2|u|$. Assume that $\sum_{n=1}^{\infty} |u_n|$ converges. Then $|u_n| \le 1/2$ for $n$ big enough. From the previous argument, and by the Comparison Test (Proposition 170), the series $\sum_{n=1}^{\infty} |\ln(1 + u_n)|$ converges, hence $\sum_{n=1}^{\infty} \ln(1 + u_n)$ converges, too. Use, finally, Proposition 223.

(iib) Observe that there exist $A > 0$ and $B > 0$ such that
\[ A u^2 \le u - \ln(1 + u) \le B u^2 \quad \text{for } |u| < 1/2 \tag{2.50} \]
2.6 Infinite Products
Fig. 2.7 Inequalities (2.50)
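(see Fig. 2.7). Indeed, for $|u| < 1$,
$$u - \ln(1 + u) = \frac{u^2}{2} - \frac{u^3}{3} + \frac{u^4}{4} - \frac{u^5}{5} + \ldots \ge \frac{u^2}{2} - \frac{u^3}{3} \ge \frac{u^2}{2} - \frac{u^2}{3} = \frac{u^2}{6},$$
and for $|u| < 1/2$,
$$u - \ln(1 + u) = \frac{u^2}{2} - \frac{u^3}{3} + \frac{u^4}{4} - \frac{u^5}{5} + \ldots \le \frac{u^2}{2} + \frac{|u|^3}{3} + \frac{|u|^4}{4} + \ldots \le \frac{u^2}{2} + \frac{|u|^3}{1 - |u|} \le \frac{u^2}{2} + 2|u|^3 \le \frac{u^2}{2} + 2u^2 = \frac{5}{2}\,u^2.$$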
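The two estimates just derived give the constants $A = 1/6$ and $B = 5/2$ in (2.50). A minimal numerical spot check, written here in Python as an illustration only (the helper names `gap` and `check_bounds` and the sample grid are assumptions, not part of the text), could look as follows:

```python
import math

def gap(u):
    """Return u - ln(1 + u), the quantity bounded in (2.50)."""
    return u - math.log1p(u)

def check_bounds(samples, A=1/6, B=5/2):
    """True if A*u^2 <= u - ln(1+u) <= B*u^2 holds at every sample point."""
    return all(A * u * u <= gap(u) <= B * u * u for u in samples)

# Hypothetical grid of nonzero points in (-1/2, 1/2).
samples = [k / 100 for k in range(-49, 50) if k != 0]
print(check_bounds(samples))
```

A check on finitely many points is, of course, not a proof; it only confirms that the constants are plausible on the sampled grid.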
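Assume now that $\sum_{n=1}^{\infty} u_n$ converges. In particular, $u_n \to 0$, so $|u_n| < 1/2$ for $n$ big enough. By (2.50) and the Comparison Test, the series of nonnegative terms $\sum_{n} (u_n - \ln(1 + u_n))$ converges if, and only if, $\sum_{n} u_n^2$ converges; since $\sum_{n} u_n$ converges, this is equivalent to the convergence of $\sum_{n} \ln(1 + u_n)$. Then, the result follows from Proposition 223.

Examples 226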
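1. Consider the infinite product $\prod_{n=2}^{\infty} \left(1 - \frac{1}{n^2}\right)$. We are in case (iib) of Proposition 225. Indeed, $u_n = -1/n^2 > -1$ for all $n \ge 2$, and the series $\sum_{n=2}^{\infty} u_n$ converges. Moreover, the series $\sum_{n=2}^{\infty} u_n^2$ converges, too, hence $\prod_{n=2}^{\infty} (1 - 1/n^2)$ converges. In this particular example, we can compute the value of the product, since, for $N \ge 2$,
$$\prod_{n=2}^{N} \left(1 - \frac{1}{n^2}\right) = \prod_{n=2}^{N} \frac{n^2 - 1}{n^2} = \prod_{n=2}^{N} \frac{(n - 1)(n + 1)}{n \cdot n} = \frac{1 \cdot 3}{2 \cdot 2} \cdot \frac{2 \cdot 4}{3 \cdot 3} \cdot \frac{3 \cdot 5}{4 \cdot 4} \cdot \frac{4 \cdot 6}{5 \cdot 5} \cdots \frac{(N - 1)(N + 1)}{N \cdot N} = \frac{N + 1}{2N} \to \frac{1}{2}.$$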
2 Sequences and Series
2. A second example provides an interesting formula for $e$. Define recursively the following sequence of positive real numbers: $e_1 = 1$, $e_{n+1} = (n + 1)(e_n + 1)$ for $n = 1, 2, 3, \ldots$ Then the infinite product $\prod_{n=1}^{\infty} \frac{e_n + 1}{e_n}$ converges, and we have
$$e = \prod_{n=1}^{\infty} \frac{e_n + 1}{e_n} = \frac{2}{1} \cdot \frac{5}{4} \cdot \frac{16}{15} \cdot \frac{65}{64} \cdot \frac{326}{325} \cdots \qquad (2.51)$$
The convergence of the product is guaranteed by (i) in Proposition 225. Indeed, $\frac{e_n + 1}{e_n} = 1 + \frac{1}{e_n}$, so $u_n := (e_n + 1)/e_n - 1 = \frac{1}{e_n}$ for $n \in \mathbb{N}$ (we follow the notation there), and it is simple to prove by induction that $e_n \ge n!$ for all $n \in \mathbb{N}$. Then the series $\sum_{n=1}^{\infty} u_n = \sum_{n=1}^{\infty} \frac{1}{e_n}$ converges. To prove that the product is $e$, observe that, given $N \in \mathbb{N}$,
$$\prod_{n=1}^{N} \frac{e_n + 1}{e_n} = \prod_{n=1}^{N} \frac{e_{n+1}/(n + 1)}{e_n} = \frac{e_{N+1}}{(N + 1)!}. \qquad (2.52)$$
Notice that
$$\frac{e_1}{1!} = 1, \qquad \frac{e_{n+1}}{(n + 1)!} = \frac{(n + 1)(1 + e_n)}{(n + 1)!} = \frac{1 + e_n}{n!} = \frac{1}{n!} + \frac{e_n}{n!}, \quad \text{for } n \ge 1,$$
so, by using a telescopic argument,
$$\frac{e_{N+1}}{(N + 1)!} = \sum_{k=0}^{N} \frac{1}{k!},$$
and the conclusion follows from (2.52) and from Proposition 219.
♦
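The two products in Examples 226 can be checked with exact rational arithmetic. The following Python sketch is an illustration only; the function names are assumptions introduced here, and the recursion is taken to start at $e_1 = 1$ with $e_{n+1} = (n+1)(e_n+1)$ for $n \ge 1$, which reproduces the factors displayed in (2.51).

```python
from fractions import Fraction
from math import e, factorial

def product_one_minus_inv_squares(N):
    """Partial product of (1 - 1/n^2) for n = 2, ..., N, computed exactly."""
    p = Fraction(1)
    for n in range(2, N + 1):
        p *= 1 - Fraction(1, n * n)
    return p

def e_sequence(count):
    """First `count` terms of e_1 = 1, e_{n+1} = (n + 1)(e_n + 1)."""
    terms = [1]
    for n in range(1, count):
        terms.append((n + 1) * (terms[-1] + 1))
    return terms

def e_product(N):
    """Partial product of (e_n + 1)/e_n for n = 1, ..., N, computed exactly."""
    p = Fraction(1)
    for en in e_sequence(N):
        p *= Fraction(en + 1, en)
    return p

print(product_one_minus_inv_squares(10))   # 11/20, i.e. (N + 1)/(2N) for N = 10
print(e_sequence(5))                       # [1, 4, 15, 64, 325]
print(float(e_product(15)), e)             # the two values should agree closely
```

By (2.52) and the telescoping identity, `e_product(N)` should coincide with the partial sum $\sum_{k=0}^{N} 1/k!$, which is what makes the product converge to $e$ so quickly.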
Remark 227 For a formula (due to the English mathematician J. Wallis) that expresses the number π as an infinite product, see Exercise 13.491. ®
Chapter 3
Measure
This chapter studies the basic concepts, methods, and results in the Lebesgue measure theory on the real line. We focus on results that will be needed in the rest of this text.
3.1
Measure
Measure theory is one of the most important parts of real analysis. Laying the foundations for a theory that matches our experience is relatively simple: we start with the measurement of intervals. However, things become delicate, due to the complexity of the structure of the real number system.

Measure what is measurable, and make measurable what is not so.
Galileo Galilei
Remark 228 In this section, the term “interval,” except when explicitly stated, refers to a bounded interval of the form (a, b), [a, b], [a, b), or (a, b] (see Definition 33). ®
3.1.1
The Lebesgue Outer Measure
The length of a bounded interval was introduced in Definition 35. We want to extend this notion to be applied to a more general class of subsets of R that include the bounded intervals. It is customary to speak then of a “measure” of a set, instead of the more specific term of “length.” As an intermediate step in the construction of a “measure,” we start by introducing what is called the Lebesgue outer measure λ∗ , named after the French mathematician H. Lebesgue. Recall that the word “countable” means either “finite” or “countably infinite.”
© Springer International Publishing Switzerland 2015 V. Montesinos et al., An Introduction to Modern Analysis, DOI 10.1007/978-3-319-12481-0_3
Definition 229 Let $A$ be a subset of $\mathbb{R}$. By the Lebesgue outer measure $\lambda^*(A)$ of $A$, we understand
$$\lambda^*(A) := \inf\Big\{ \sum_i |I_i| : \{I_i\} \text{ is a countable cover of } A \text{ by open bounded intervals in } \mathbb{R} \Big\}, \qquad (3.1)$$
where $|I_i|$ denotes the length of the interval $I_i$ for each $i$, with the agreement that the infimum in (3.1) is considered to be $+\infty$ in case all series $\sum_i |I_i|$ there diverge.

Remark 230
1. The sum $\sum_i |I_i|$ in (3.1) is meant in the sense of Definition 194. In this way, the sum is unambiguous even in the infinite case, since $|I_i| \ge 0$ for each $i$. Indeed, under any reordering $\{i_n\}$, the sum $\sum_n |I_{i_n}|$ has either a finite number of summands or a countably infinite number of them. In the second case, if the series converges it is absolutely convergent (see Theorem 190), and so the character (convergent or divergent) and the value of the sum in the convergent case are independent of the reordering.
2. Observe that, if $A$ and $B$ are subsets of $\mathbb{R}$ such that $A \subset B$, then $\lambda^*(A) \le \lambda^*(B)$. Indeed, if $\{I_i\}$ is a finite or countably infinite family of open bounded intervals that covers $B$, it also covers $A$, and so the infimum that defines $\lambda^*(A)$ in (3.1) is clearly less than or equal to $\lambda^*(B)$. In particular, any subset of a set of Lebesgue outer measure zero has outer measure zero.
3. Due to the way the length $|I|$ of a bounded interval $I$ has been introduced (see Definition 35), it is easy to see that we may omit the word “open” in Definition 229 (see Exercise 13.141).
4. The Lebesgue outer measure is defined for all subsets of $\mathbb{R}$ (indeed, given $A \subset \mathbb{R}$ we have $A \subset \bigcup_{n=1}^{\infty} (-n, n)$, and so the existence of at least one cover as in Definition 229 is guaranteed) and takes values in the extended real number system introduced in Definition 229.
5. Note that $\lambda^*$ takes only nonnegative values or, possibly, $+\infty$.
6. If a subset $A$ of $\mathbb{R}$ is bounded (in the sense that it is contained in an interval $(a, b)$, where $a, b \in \mathbb{R}$ are such that $a < b$), then $\lambda^*(A) < +\infty$. This follows from the definition of $\lambda^*(A)$, since $\{(a, b)\}$ is an open cover of $A$ (by a single open and bounded interval), so $\lambda^*(A) \le b - a$. On the other hand, $\lambda^*(A) < \infty$ does not imply that $A$ is bounded. For example, we shall see in Corollary 237 below that every countable subset $M$ of $\mathbb{R}$ satisfies $\lambda^*(M) = 0$. In particular, $\lambda^*(\mathbb{N}) = 0$ and, certainly, $\mathbb{N}$ is not bounded.
7. Observe that, in Definition 229, we are not asking for the family of intervals to be pairwise disjoint.
8. Since $\emptyset \subset (0, \varepsilon)$ and $|(0, \varepsilon)| = \varepsilon$ for any $\varepsilon > 0$, we have $\lambda^*(\emptyset) = 0$.
9. We will see in Remark 235.2 below that $\lambda^*(\mathbb{R}) = +\infty$. ®
Proposition 231 The Lebesgue outer measure is an extension of the length of a bounded interval introduced in Definition 35, in the sense that for any bounded interval I , we have |I | = λ∗ (I ).
(3.2)
Proof Let I be a bounded interval in R with endpoints a and b such that a ≤ b (a < b if the interval is open). Then, for ε > 0, we have I ⊂ (a−ε, b + ε), and {(a−ε, b + ε)} is an open cover of I . This implies that λ∗ (I ) ≤ |(a − ε, b + ε)| = |I | + 2ε. Since ε > 0 is arbitrary, we get λ∗ (I ) ≤ |I |.
(3.3)
In order to prove the reverse inequality, we need the following three simple results.

1. Lemma 232 Let $I_1$ and $I_2$ be two open nonempty intervals in $\mathbb{R}$ such that $I_1 \cap I_2 \ne \emptyset$. Then $I := I_1 \cup I_2$ is an (open) interval, and if $I_1$ and $I_2$ are both bounded, so is $I$, and we have $|I| \le |I_1| + |I_2|$.

Proof We prove the first part of the lemma in the case that both intervals are bounded. If one (or both) of them is (are) unbounded, the proof of this first part is similar, and it is left to the reader. Let $x_0 \in I_1 \cap I_2$. Put $I_i = (a_i, b_i)$ for $i = 1, 2$, and let $m := \min\{a_1, a_2\}$, $M := \max\{b_1, b_2\}$. If $m = a_i$ and $M = b_i$ for some $i \in \{1, 2\}$ we are done (including the statement about lengths), so assume, without loss of generality, that $m = a_1$ and $M = b_2$. If $x \in (m, x_0]$, then $x \in I_1$, while $x \in I_2$ if $x \in [x_0, M)$. This shows that $(m, M) \subset I_1 \cup I_2$. On the other hand, if $x \in I_1$ then $(m =)\ a_1 < x < b_1\ (\le M)$, hence $x \in (m, M)$, and if $x \in I_2$ then $(m \le)\ a_2 < x < b_2\ (= M)$, so again $x \in (m, M)$. This shows that $I_1 \cup I_2 = (m, M)$. Moreover $|I| = M - m \le (b_1 - a_1) + (b_2 - a_2) = |I_1| + |I_2|$. □

2. Lemma 233 Let $I$ be a bounded interval in $\mathbb{R}$. Assume that $I \subset \bigcup_{i=1}^{n} I_i$, where $\{I_i\}_{i=1}^{n}$ is a finite family of open bounded intervals in $\mathbb{R}$. Then $|I| \le \sum_{i=1}^{n} |I_i|$.

Proof The lemma is proved by induction on $n$. Assume first that $n = 1$. Then $I \subset I_1$, and the result is obvious. Assume now that the lemma has been proved for $n = 1, 2, \ldots, N$, where $N \in \mathbb{N}$. Let $I$ be a bounded interval such that $I \subset \bigcup_{i=1}^{N+1} I_i$, where $\{I_i\}_{i=1}^{N+1}$ is a finite family of open bounded intervals. If all intervals $I_i$, $i = 1, 2, \ldots, N + 1$, are pairwise disjoint, Corollary 104 concludes that $I$ must be a subset of $I_i$ for some $i \in \{1, 2, \ldots, N + 1\}$, and the result follows. Otherwise, there are at least two intervals, say $I_1$ and $I_2$, such that $I_1 \cap I_2 \ne \emptyset$. Then, Lemma 232 implies that $I_0 := I_1 \cup I_2$ is an (open) interval, and $|I_0| \le |I_1| + |I_2|$. Clearly, $I \subset I_0 \cup I_3 \cup \ldots \cup I_{N+1}$, a cover by $N$ intervals. The induction hypothesis implies $|I| \le |I_0| + |I_3| + \ldots + |I_{N+1}|$, so $|I| \le \sum_{i=1}^{N+1} |I_i|$, and the statement holds for $n = N + 1$. □

To avoid cumbersome notation, the Lebesgue outer measure of an interval $(a, b)$ will be denoted $\lambda^*(a, b)$. The same applies to other kinds of intervals.
3. Lemma 234 If $a < b$, then $\lambda^*(a, b) = \lambda^*[a, b]$. Thus, $\lambda^*(a, b) = \lambda^*(a, b] = \lambda^*[a, b) = \lambda^*[a, b]$.

Proof Given a countable cover $\{I_i\}$ of $(a, b)$ by open bounded intervals $I_i$, and $\varepsilon > 0$, the family $\{I_i\} \cup \{(a - \varepsilon, a + \varepsilon)\} \cup \{(b - \varepsilon, b + \varepsilon)\}$ is a countable cover of $[a, b]$. Assume that $\sum_i |I_i| < +\infty$. Note that $\sum_i |I_i| + |(a - \varepsilon, a + \varepsilon)| + |(b - \varepsilon, b + \varepsilon)| = \sum_i |I_i| + 4\varepsilon$. It follows that $\lambda^*[a, b] < \lambda^*(a, b) + 4\varepsilon$. Since this is true for every $\varepsilon > 0$, we get $\lambda^*[a, b] \le \lambda^*(a, b)$. Obviously, the reverse inequality holds, and so $\lambda^*(a, b) = \lambda^*[a, b]$. Since $\lambda^*(a, b) \le \lambda^*(a, b] \le \lambda^*[a, b]$, we get $\lambda^*(a, b] = \lambda^*(a, b)$. Analogously, $\lambda^*[a, b) = \lambda^*(a, b)$. □

Now, we can finish the proof of Proposition 231. Since for two bounded intervals $I$ and $J$ with the same endpoints, we have $\lambda^*(I) = \lambda^*(J)$ (see Lemma 234) and $|I| = |J|$, we may assume, without loss of generality, that $I$ is a closed bounded interval in $\mathbb{R}$. Let $\{I_i\}$ be a countable cover of $I$ by open bounded intervals. Since $I$ is compact, there exists a finite subcover, say $\{I_i\}_{i=1}^{n}$, for some $n \in \mathbb{N}$. It is enough to apply Lemma 233 to conclude that $|I| \le \sum_{i=1}^{n} |I_i|\ (\le \sum_i |I_i|)$. This shows that $|I| \le \lambda^*(I)$.
(3.4)
The conclusion follows from (3.3) and (3.4).
□
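The merging step of Lemma 232 and the induction of Lemma 233 can be mirrored computationally. The Python sketch below is an illustration under stated assumptions: open intervals are modelled as pairs `(a, b)` with `a < b`, and the function names are hypothetical. It merges intersecting open intervals and checks that a finite open cover of a closed interval has total length at least the length of that interval.

```python
from fractions import Fraction

def merge_open_intervals(intervals):
    """Merge open intervals (a, b) that intersect, as in Lemma 232.

    After sorting by left endpoint, an interval (a, b) meets the current
    merged component (m, M) exactly when a < M.
    """
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:
            last_a, last_b = merged[-1]
            merged[-1] = (last_a, max(last_b, b))
        else:
            merged.append((a, b))
    return merged

def total_length(intervals):
    """Sum of the lengths b - a of the given intervals."""
    return sum(b - a for a, b in intervals)

def covers_closed_interval(intervals, c, d):
    """True if [c, d] lies in the union of the given open intervals.

    A closed interval is connected, so it must sit inside one merged component.
    """
    return any(m < c and d < M for m, M in merge_open_intervals(intervals))

cover = [(Fraction(-1, 10), Fraction(2, 5)),
         (Fraction(3, 10), Fraction(4, 5)),
         (Fraction(7, 10), Fraction(11, 10))]
print(merge_open_intervals(cover))
print(covers_closed_interval(cover, 0, 1), total_length(cover) >= 1)
```

Note that the merged family never has larger total length than the original one, which is the content of the inequality $|I_0| \le |I_1| + |I_2|$ used in the induction.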
Remark 235
1. It follows from Remark 230.2 and Proposition 231 that a singleton $\{x\} \subset \mathbb{R}$ has Lebesgue outer measure zero. Indeed, $\{x\} \subset (x - \varepsilon, x + \varepsilon)$, so $\lambda^*(\{x\}) \le |(x - \varepsilon, x + \varepsilon)| = 2\varepsilon$, and this holds for all $\varepsilon > 0$.
2. It follows again from Remark 230.2 and Proposition 231 that $\lambda^*(\mathbb{R}) = +\infty$. Indeed, we have $\lambda^*(-n, n) = |(-n, n)| = 2n \le \lambda^*(\mathbb{R})$ for all $n \in \mathbb{N}$. ®

From now on, and in order to avoid splitting the arguments into cases, countable families of sets (finite or countably infinite) will be indexed by $\mathbb{N}$. If the family turns out to be finite, this is achieved either by repeating one of the sets countably infinitely many times, or by adding to the family the empty set repeated countably infinitely many times.

Proposition 236 Let $\{E_n\}_{n=1}^{\infty}$ be a sequence of subsets of $\mathbb{R}$. Then (we do not necessarily assume that the series on the right-hand side of the inequality (3.5) converges),
$$\lambda^*\Big(\bigcup_{n=1}^{\infty} E_n\Big) \le \sum_{n=1}^{\infty} \lambda^*(E_n). \qquad (3.5)$$
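Proof If one of the sets $E_n$ satisfies $\lambda^*(E_n) = +\infty$, then certainly inequality (3.5) holds. Otherwise, fix $\varepsilon > 0$. For each $n$ consider a countable cover of the set $E_n$ by open bounded intervals $I_{n,m}$, $m \in \mathbb{N}$, so that
$$\sum_{m=1}^{\infty} |I_{n,m}| < \lambda^*(E_n) + \frac{\varepsilon}{2^n}. \qquad (3.6)$$
Let $\varphi : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be a one-to-one and onto mapping. The sequence $\{I_{\varphi(k)}\}_{k=1}^{\infty}$ is a countable cover of $\bigcup_{n=1}^{\infty} E_n$ by open bounded intervals. It follows, then, that $\lambda^*\big(\bigcup_{n=1}^{\infty} E_n\big) \le \sum_{k=1}^{\infty} |I_{\varphi(k)}|$. Apply Proposition 212 to the double series of nonnegative terms $\sum_{n,m} |I_{n,m}|$: If the double series diverges, then $\sum_{n=1}^{\infty} \sum_{m=1}^{\infty} |I_{n,m}| = +\infty$, hence (3.6) implies (3.5); otherwise, we get
$$\sum_{k=1}^{\infty} |I_{\varphi(k)}| = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} |I_{n,m}|,$$
hence
$$\lambda^*\Big(\bigcup_{n=1}^{\infty} E_n\Big) \le \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} |I_{n,m}| \le \sum_{n=1}^{\infty} \Big(\lambda^*(E_n) + \frac{\varepsilon}{2^n}\Big) = \sum_{n=1}^{\infty} \lambda^*(E_n) + \varepsilon. \qquad (3.7)$$
Since (3.7) holds for every $\varepsilon > 0$, the result follows. □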
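The $\varepsilon/2^n$ device in (3.6) is easiest to see when every $E_n$ is a single point. The Python sketch below is only an illustration (the function name `shrinking_cover` and the sample points are assumptions introduced here): the $n$-th point receives an open interval of length $\varepsilon/2^n$, so the total length of the cover stays below $\varepsilon$ however many points are listed.

```python
from fractions import Fraction

def shrinking_cover(points, eps):
    """Cover the n-th point x_n by the open interval of length eps / 2**n centred at x_n."""
    eps = Fraction(eps)
    return [(x - eps / 2 ** (n + 1), x + eps / 2 ** (n + 1))
            for n, x in enumerate(points, start=1)]

# Hypothetical sample: the rationals in [0, 1] with denominator at most 5.
points = sorted({Fraction(p, q) for q in range(1, 6) for p in range(q + 1)})
eps = Fraction(1, 100)
cover = shrinking_cover(points, eps)
total = sum(b - a for a, b in cover)
print(len(points), total < eps)
```

The total length equals $\varepsilon(1 - 2^{-N})$ for $N$ points, which is the finite version of the geometric series summed in (3.7).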
Corollary 237 The union of countably many subsets of $\mathbb{R}$, each of them having Lebesgue outer measure zero, has Lebesgue outer measure zero. In particular, every countable subset of $\mathbb{R}$ has Lebesgue outer measure zero.

Proof This is a straightforward consequence of Proposition 236. The particular case follows from Remark 235.1. □

Remark 238 Observe that, on the one hand, the set $\mathbb{Q}$ is dense in $\mathbb{R}$, and on the other hand, $\mathbb{Q}$ has Lebesgue outer measure zero, as it is countable. This means that topological and measure concepts are quite different. We have here an extreme situation: the outer measure of $\mathbb{Q}$ is zero, while the outer measure of its closure $\overline{\mathbb{Q}}$ $(= \mathbb{R})$ is $+\infty$. ®

Corollary 239 Let $A$ be a subset of $\mathbb{R}$. Assume that $\lambda^*(A) < \infty$. Then, for every $\varepsilon > 0$ there exists an open superset $V$ of $A$ such that $\lambda^*(A) \le \lambda^*(V) < \lambda^*(A) + \varepsilon$.

Proof Given $\varepsilon > 0$ we can find, by the definition of $\lambda^*(A)$, a cover $\{I_n\}_{n=1}^{\infty}$ of $A$ by open bounded intervals such that $\lambda^*(A) \le \sum_{n=1}^{\infty} |I_n| < \lambda^*(A) + \varepsilon$. Put $V := \bigcup_{n=1}^{\infty} I_n$. The set $V$ is open, and it contains $A$. It follows from Propositions 231 and 236 that $\lambda^*(V) \le \sum_{n=1}^{\infty} |I_n|$. Since $\lambda^*(A) \le \lambda^*(V)$, the conclusion follows. □

For the following result, we need a definition first.
Definition 240 A countable intersection of open subsets of $\mathbb{R}$ is called a $G_\delta$-set. A countable union of closed subsets of $\mathbb{R}$ is called an $F_\sigma$-set.

Note that $G_\delta$-sets need not be open. For example, $\bigcap_{n=1}^{\infty} (-1/n, 1 + 1/n) = [0, 1]$. In the same way, $F_\sigma$-sets need not be closed. For example, $\bigcup_{n=1}^{\infty} [1/n, 1 - 1/n] = (0, 1)$.
Corollary 241 Let $A$ be a subset of $\mathbb{R}$. Then there exists a $G_\delta$-subset $G$ of $\mathbb{R}$ such that $A \subset G$ and $\lambda^*(A) = \lambda^*(G)$.

Proof If $\lambda^*(A) = +\infty$ it is enough to take $G := \mathbb{R}$. Otherwise, it follows from Corollary 239 that for every $n \in \mathbb{N}$ we can find an open subset $V_n$ of $\mathbb{R}$ such that $A \subset V_n$ and $\lambda^*(V_n) < \lambda^*(A) + 1/n$. The set $G := \bigcap_{n=1}^{\infty} V_n$ satisfies the requirements, since $\lambda^*(A) \le \lambda^*(G) \le \lambda^*(V_n) < \lambda^*(A) + 1/n$ for all $n \in \mathbb{N}$. □

Remark 242 Note that Corollary 241 does not say that given a subset $A$ of $\mathbb{R}$, we may always find a $G_\delta$-set $G$ in $\mathbb{R}$ such that $A \subset G$ and $\lambda^*(G \setminus A) = 0$. We shall see later (see Corollary 267) that this conclusion characterizes a class of subsets of $\mathbb{R}$ called Lebesgue measurable (see Definition 245), while there are sets in $\mathbb{R}$ (even bounded sets) that are not Lebesgue measurable (see Sect. 3.1.6 and, in particular, Lemma 283). Similarly, Corollary 239 does not conclude that for every subset $A$ of $\mathbb{R}$ with $\lambda^*(A) < +\infty$ and for every $\varepsilon > 0$, there exists an open subset $V_\varepsilon$ of $\mathbb{R}$ such that $A \subset V_\varepsilon$ and $\lambda^*(V_\varepsilon \setminus A) < \varepsilon$. If this were true then, by the method of proof of Corollary 241, we would derive the first false statement in this remark. ®

Definition 243 Consider a real number $\alpha$ and a subset $E$ of $\mathbb{R}$. By the set $\alpha + E$ we understand the set $\{\alpha + x : x \in E\}$, and we refer to such a set as a translate (by $\alpha$) of $E$.

Proposition 244 Consider a subset $E$ of $\mathbb{R}$ and a real number $\alpha$. Then $\lambda^*(\alpha + E) = \lambda^*(E)$. (We say that the Lebesgue outer measure is translation invariant.)

Proof If $\{I_n\}_{n=1}^{\infty}$ is a cover of $E$ by open bounded intervals, then $\{\alpha + I_n\}_{n=1}^{\infty}$ is a cover of $\alpha + E$ by bounded open intervals, and for every $n \in \mathbb{N}$, $\lambda^*(\alpha + I_n) = \lambda^*(I_n)$. The result follows then from the definition of outer measure. □
3.1.2
The Class of Lebesgue Measurable Sets and the Lebesgue Measure
We proved in Proposition 236 a certain “regularity” of the outer measure: the outer measure of a countable union of sets is less than or equal to the sum of the individual outer measures. It can certainly be less than the sum, since the sets in the family may overlap. The reader may ask why this was not stated more precisely, say by proving that the outer measure of the union of a pairwise disjoint family of sets equals the sum of the individual measures. The reason is that this supposed-to-be result is false, even in the most elementary setting of having two disjoint sets. Indeed, we will see (see Sect. 3.1.6 and Remark 246.4) that it is possible for two sets $E$ and $F$ to be disjoint and yet
$$\lambda^*(E \cup F) \ne \lambda^*(E) + \lambda^*(F). \qquad (3.8)$$
Still, it is highly desirable that for some disjoint bounded sets E and F , λ∗ (E ∪ F ) = λ∗ (E) + λ∗ (F ).
(3.9)
This is something we do expect. If this is not always the case, then an acceptable compromise will be to restrict ourselves to a class of subsets of $\mathbb{R}$ where (3.9) holds. To support a rich theory, this class must be sufficiently big and must have good stability properties. Moreover, it must include at least all the intervals of $\mathbb{R}$. The following definition is due to the Greek mathematician C. Carathéodory, and presents in a convenient way a class of sets (the Lebesgue measurable ones) that behaves “properly” with respect to the measure introduced.1 We shall see later that this class is very big (including, for example, all open and all closed subsets of $\mathbb{R}$).

Definition 245 (Carathéodory) Suppose $E$ is a subset of $\mathbb{R}$. We say that $E$ is Lebesgue measurable (in short, measurable) if
$$\lambda^*(E \cap T) + \lambda^*(E^c \cap T) = \lambda^*(T), \quad \text{for all subsets } T \text{ of } \mathbb{R}, \qquad (3.10)$$
where $E^c$ denotes the complement of the set $E$, i.e., $\mathbb{R} \setminus E$.

Remark 246
1. Recall that $\lambda^*(\emptyset) = 0$, see Remark 230.8. Thus the sets $\mathbb{R}$ and $\emptyset$ are both measurable.
2. Observe that it follows from the definition that a subset $E$ of $\mathbb{R}$ is measurable if, and only if, its complement $E^c$ is measurable.
3. Since $(E \cap T) \cup (E^c \cap T) = T$ for all the subsets $E$ and $T$ of $\mathbb{R}$, we have, by Proposition 236, $\lambda^*(T) \le \lambda^*(E \cap T) + \lambda^*(E^c \cap T)$ for all sets $E$ and $T$. Thus, to show that $E$ is measurable we only have to show
$$\lambda^*(T) \ge \lambda^*(E \cap T) + \lambda^*(E^c \cap T) \qquad (3.11)$$
for all subsets $T$ of $\mathbb{R}$.
1
This was not the original definition of a measurable set given by H. Lebesgue in his Ph.D. dissertation [Le02], where he introduced what is now called the Lebesgue theory of measure and integration. There, he said that a bounded subset of R is measurable whenever its outer measure (introduced in Definition 229) and its inner measure (introduced in Definition 268 below) coincide. We shall prove in Theorem 269 below that the class of bounded measurable subsets of R according to Carathéodory (Definition 245) and the class of bounded measurable subsets of R introduced by Lebesgue coincide. By the way, in his Ph.D. dissertation Lebesgue states what he calls the “problème de la mesure”, i.e., to know whether it is possible to attach to any bounded subset of R a non-negative number that satisfies the following three conditions: (i) There exist bounded subsets of positive measure. (ii) The measure is translation-invariant. (iii) The measure is countably additive. The solution to this question, in the negative, was provided later by the Italian mathematician G. Vitali in 1905. His example will be presented in Lemma 283 below.
4. Observe that if the Carathéodory condition (3.10) fails (i.e., if the set $E$ is not measurable), then there exists a set $T$ so that $\lambda^*(T) \ne \lambda^*(T \cap E) + \lambda^*(T \cap E^c)$, and thus we have two disjoint sets where the outer measure of the disjoint union is not equal to the sum of the outer measures of the two sets. For the moment, we cannot exhibit a set that is not measurable. This will be done in Sect. 3.1.6. ®

It is convenient from the very beginning to note that any subset of $\mathbb{R}$ having outer measure zero is measurable. This is the content of the following result.

Proposition 247 Any subset of $\mathbb{R}$ having Lebesgue outer measure zero is measurable. In particular, any countable subset of $\mathbb{R}$ is measurable.

Proof Assume that $\lambda^*(E) = 0$ for some $E \subset \mathbb{R}$. According to (3.11), we only have to show $\lambda^*(T) \ge \lambda^*(E \cap T) + \lambda^*(E^c \cap T)$ for all subsets $T$ of $\mathbb{R}$. Since $\lambda^*(E) = 0$ and $E \cap T \subset E$, we have $\lambda^*(E \cap T) = 0$. Moreover $E^c \cap T \subset T$ and thus $\lambda^*(T) \ge \lambda^*(E^c \cap T)$, and the result follows. The last part is a consequence of the first part and Corollary 237. □

Besides sets of Lebesgue outer measure 0, the next class of subsets that should be measurable is the one consisting of all intervals. This crucial result is proved in the next proposition.

Proposition 248 Any interval $I$ in $\mathbb{R}$ is measurable.

Proof If $I = \emptyset$, the result holds (see Remark 246.1). Otherwise, by Remark 246.3 we need just to show that $\lambda^*(T) \ge \lambda^*(I \cap T) + \lambda^*(I^c \cap T)$ for all subsets $T$ of $\mathbb{R}$. Let $\{I_n\}_{n=1}^{\infty}$ be a cover of $T$ by bounded intervals. Without loss of generality, we may assume that $I^c \cap I_n$ is an interval for each $n \in \mathbb{N}$. The family $\{I \cap I_n\}_{n=1}^{\infty}$ is, obviously, a cover of $I \cap T$ by bounded intervals. It follows that
$$\lambda^*(I \cap T) \le \sum_{n=1}^{\infty} |I \cap I_n|. \qquad (3.12)$$
The family $\{I^c \cap I_n\}_{n=1}^{\infty}$ is a cover of $I^c \cap T$ by bounded intervals. It follows similarly that
$$\lambda^*(I^c \cap T) \le \sum_{n=1}^{\infty} |I^c \cap I_n|.$$
Observe now that, for each $n \in \mathbb{N}$, we have
$$|I \cap I_n| + |I^c \cap I_n| = |I_n|. \qquad (3.13)$$
Assume first that $\sum_{n=1}^{\infty} |I_n| < +\infty$. Then $\sum_{n=1}^{\infty} |I \cap I_n| < +\infty$ and $\sum_{n=1}^{\infty} |I^c \cap I_n| < +\infty$. Then, we may use Proposition 157 to get
$$\sum_{n=1}^{\infty} |I \cap I_n| + \sum_{n=1}^{\infty} |I^c \cap I_n| = \sum_{n=1}^{\infty} |I_n|. \qquad (3.14)$$
If, on the contrary, $\sum_{n=1}^{\infty} |I_n| = +\infty$, then (3.14) obviously holds, too. Thus, from (3.12), (3.13), and (3.14), we get in either case
$$\lambda^*(I \cap T) + \lambda^*(I^c \cap T) \le \sum_{n=1}^{\infty} |I_n|.$$
Since this holds for every cover $\{I_n\}_{n=1}^{\infty}$ of $T$ by bounded intervals, we finally get (see Remark 230.3) $\lambda^*(I \cap T) + \lambda^*(I^c \cap T) \le \lambda^*(T)$, and this proves, thanks to Remark 246.3, that every interval in $\mathbb{R}$ is measurable. □

For two measurable disjoint subsets of $\mathbb{R}$, (3.8) cannot occur, and instead (3.9) holds. In fact, it is enough that one of them is measurable. This is the content of the next result.

Proposition 249 Let $E$ and $F$ be two disjoint subsets of $\mathbb{R}$, and assume that $E$ is measurable. Then $\lambda^*(E \cup F) = \lambda^*(E) + \lambda^*(F)$.

Proof Due to the measurability of $E$ we have
$$\lambda^*(E \cup F) = \lambda^*\big((E \cup F) \cap E\big) + \lambda^*\big((E \cup F) \cap E^c\big) = \lambda^*(E) + \lambda^*(F).$$
□
We collect some properties of the class of measurable sets in the following proposition. We first introduce a definition.

Definition 250 A family $\mathcal{S}$ of subsets of $\mathbb{R}$ is said to be a σ-algebra whenever
(i) $\emptyset \in \mathcal{S}$
(ii) If $E \in \mathcal{S}$, then $E^c \in \mathcal{S}$
(iii) If $\{E_n\}_{n=1}^{\infty}$ is a sequence in $\mathcal{S}$, then $\bigcup_{n=1}^{\infty} E_n \in \mathcal{S}$

Remark 251
1. Observe that it follows from De Morgan's laws that if $\{E_n\}_{n=1}^{\infty}$ is a sequence in a σ-algebra $\mathcal{S}$, then $\bigcap_{n=1}^{\infty} E_n \in \mathcal{S}$.
2. Note that the union and the intersection of a finite number of elements in $\mathcal{S}$ are also elements of $\mathcal{S}$. This follows from (i) and (iii) in Definition 250, and from the previous item. It is enough to add a countably infinite number of empty sets to the original family. ®
Proposition 252 Let $\mathcal{M}$ be the class of all measurable subsets of $\mathbb{R}$. Then $\mathcal{M}$ is a σ-algebra.

Proof (i) and (ii) in Definition 250 for the class $\mathcal{M}$ follow from Remark 230.8, from Proposition 247, and from Remark 246.2. For proving (iii), we shall establish first some intermediate results.

1. Lemma 253 Let $E_1$ and $E_2$ be two measurable subsets of $\mathbb{R}$. Then $E_1 \cup E_2$ is also measurable. Moreover, if $T$ is an arbitrary subset of $\mathbb{R}$, and $E_1 \cap E_2 = \emptyset$, then $\lambda^*(T \cap (E_1 \cup E_2)) = \lambda^*(T \cap E_1) + \lambda^*(T \cap E_2)$.

Proof Let $T$ be an arbitrary subset of $\mathbb{R}$. Put $E := E_1 \cup E_2$. Then, using successively that $E_1$ and $E_2$ are both measurable, we get
$$\lambda^*(T) = \lambda^*(T \cap E_1) + \lambda^*(T \cap E_1^c) = \lambda^*(T \cap E_1) + \lambda^*\big((T \cap E_1^c) \cap E_2\big) + \lambda^*\big((T \cap E_1^c) \cap E_2^c\big). \qquad (3.15)$$
On the other hand, due to the fact that $E_1$ is measurable,
$$\lambda^*\big(T \cap (E_1 \cup E_2)\big) = \lambda^*\big((T \cap (E_1 \cup E_2)) \cap E_1\big) + \lambda^*\big((T \cap (E_1 \cup E_2)) \cap E_1^c\big) = \lambda^*(T \cap E_1) + \lambda^*\big((T \cap E_2) \cap E_1^c\big). \qquad (3.16)$$
From (3.15) and (3.16), we get
$$\lambda^*(T) = \lambda^*\big(T \cap (E_1 \cup E_2)\big) + \lambda^*\big(T \cap (E_1 \cup E_2)^c\big),$$
since $T \cap (E_1 \cup E_2)^c = (T \cap E_1^c) \cap E_2^c$. This shows that $E_1 \cup E_2$ is measurable. The second part follows from (3.16). Indeed, if $E_1 \cap E_2 = \emptyset$, then $(T \cap E_2) \cap E_1^c = T \cap E_2$. □

2. Lemma 254 For $n \in \mathbb{N}$, let $\{E_k\}_{k=1}^{n}$ be a finite family of measurable subsets of $\mathbb{R}$. Then $\bigcup_{k=1}^{n} E_k$ is also measurable. Moreover, if $T$ is an arbitrary subset of $\mathbb{R}$, and the family $\{E_k\}_{k=1}^{n}$ consists of pairwise disjoint sets, then $\lambda^*\big(T \cap \bigcup_{k=1}^{n} E_k\big) = \sum_{k=1}^{n} \lambda^*(T \cap E_k)$.

Proof This follows from Lemma 253 by finite induction. □

Let $\{S_n\}_{n=1}^{\infty}$ be a sequence of subsets of $\mathbb{R}$. We say that $\{S_n\}_{n=1}^{\infty}$ is increasing (decreasing) if $S_n \subset S_{n+1}$ (respectively, $S_n \supset S_{n+1}$) for all $n \in \mathbb{N}$. We say that $\{S_n\}_{n=1}^{\infty}$ is strictly increasing (strictly decreasing) if simultaneously $S_n \subset S_{n+1}$ and $S_n \ne S_{n+1}$, in symbols, $S_n \subsetneq S_{n+1}$ (respectively, $S_n \supset S_{n+1}$ and $S_n \ne S_{n+1}$, in symbols $S_n \supsetneq S_{n+1}$), for all $n \in \mathbb{N}$.

3. Lemma 255 Let $\{S_n\}_{n=1}^{\infty}$ be an increasing sequence of measurable subsets of $\mathbb{R}$. Let $T$ be an arbitrary subset of $\mathbb{R}$. Put $S := \bigcup_{n=1}^{\infty} S_n$. Then (considering even the possibility that $\lim_{n\to\infty} \lambda^*(T \cap S_n) = +\infty$),
$$\lim_{n\to\infty} \lambda^*(T \cap S_n) = \lambda^*(T \cap S). \qquad (3.17)$$
Proof Put $D_1 := S_1$, and, for $n \ge 2$, let $D_n := S_n \setminus S_{n-1}$. Observe, first, that, as a consequence of Lemmas 253, 254, and (ii) in Proposition 252, the sets $S_n$ and $D_n$ are measurable for every $n \in \mathbb{N}$. Note that $T \cap S = \bigcup_{n=1}^{\infty} (T \cap D_n)$, and that the family $\{D_n\}_{n=1}^{\infty}$ consists of pairwise disjoint sets. Given $n \in \mathbb{N}$ we get, by Proposition 236 and Lemma 254,
$$\lambda^*(T \cap S_n) \le \lambda^*(T \cap S) \le \sum_{k=1}^{\infty} \lambda^*(T \cap D_k) = \lim_{n\to\infty} \sum_{k=1}^{n} \lambda^*(T \cap D_k) = \lim_{n\to\infty} \lambda^*(T \cap S_n). \qquad (3.18)$$
This proves (3.17) both when $\lambda^*(T \cap S) < +\infty$ (in this case the bounded increasing sequence $\{\lambda^*(T \cap S_n)\}_{n=1}^{\infty}$ has a finite limit) and when $\lambda^*(T \cap S) = +\infty$ (in this case, the series in (3.18) diverges and $\lim_{n\to\infty} \lambda^*(T \cap S_n) = +\infty$). □

For a variant of Lemma 255 for decreasing sequences of measurable sets, see Lemma 256.

We can now complete the proof of Proposition 252. We need to prove that (iii) in Definition 250 holds. For this, let $\{E_n\}_{n=1}^{\infty}$ be a sequence of measurable sets in $\mathbb{R}$, and let $E := \bigcup_{n=1}^{\infty} E_n$. Assume that $T$ is an arbitrary subset of $\mathbb{R}$. Put $S_n := \bigcup_{k=1}^{n} E_k$ for all $n \in \mathbb{N}$, and observe that $E = \bigcup_{n=1}^{\infty} S_n$. The sequence $\{S_n\}_{n=1}^{\infty}$ is increasing. Moreover, each set $S_n$ is measurable, due to Lemma 254, hence we have
$$\lambda^*(T) = \lambda^*(T \cap S_n) + \lambda^*(T \cap S_n^c). \qquad (3.19)$$
Letting $n \to \infty$ in (3.19), we get
$$\lambda^*(T) = \lim_{n\to\infty} \lambda^*(T \cap S_n) + \lim_{n\to\infty} \lambda^*(T \cap S_n^c) = \lambda^*(T \cap E) + \lim_{n\to\infty} \lambda^*(T \cap S_n^c) \ge \lambda^*(T \cap E) + \lambda^*(T \cap E^c). \qquad (3.20)$$
Indeed, the first limit in (3.20) exists, by Lemma 255, and this forces the second limit to exist, too. The inequality in (3.20) then follows from the fact that $T \cap S_n^c \supset T \cap E^c$ for all $n \in \mathbb{N}$. This proves that (3.11) holds again, hence $E$ is measurable. This finishes the proof of Proposition 252. □

A variant of Lemma 255 is Lemma 256 below. Note that we need the additional assumption that $\lambda^*(T \cap R_1) < +\infty$. This requirement recurs often in measure theory, and we will meet it again later in this text.

Lemma 256 Let $\{R_n\}_{n=1}^{\infty}$ be a decreasing sequence of measurable subsets of $\mathbb{R}$, and let $T$ be an arbitrary subset of $\mathbb{R}$. Then, if $\lambda^*(T \cap R_1) < +\infty$, we have that $\lim_{n\to\infty} \lambda^*(T \cap R_n)$ exists and is equal to $\lambda^*\big(T \cap \bigcap_{n=1}^{\infty} R_n\big)$.

Proof For $n \in \mathbb{N}$, put $S_n := R_n^c \cap R_1$, $S := \bigcup_{n=1}^{\infty} S_n$, and $R := \bigcap_{n=1}^{\infty} R_n$. Then $S \cup R = R_1$, $S \cap R = \emptyset$, and the two sets $R$ and $S$ are measurable. Since $T \cap S \subset$
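$T \cap R_1$, we get $\lambda^*(T \cap S) < +\infty$. The sequence $\{S_n\}_{n=1}^{\infty}$ is increasing, so Lemma 255 ensures that
$$\lim_{n\to\infty} \lambda^*(T \cap S_n) = \lambda^*(T \cap S)\ (< +\infty). \qquad (3.21)$$
From this and Lemma 253, it follows that
$$\lambda^*(T \cap R_1) = \lambda^*\big((T \cap S) \cup (T \cap R)\big) = \lambda^*(T \cap S) + \lambda^*(T \cap R) = \lim_{n\to\infty} \lambda^*(T \cap S_n) + \lambda^*(T \cap R). \qquad (3.22)$$
Note that $R_1 = S_n \cup R_n$ and $S_n \cap R_n = \emptyset$ for all $n \in \mathbb{N}$. Then, again by Lemma 253, $\lambda^*(T \cap R_1) = \lambda^*(T \cap S_n) + \lambda^*(T \cap R_n)$ for all $n \in \mathbb{N}$. Finally, it follows from this and from (3.22) that
$$\lambda^*(T \cap R) = \lambda^*(T \cap R_1) - \lim_{n\to\infty} \lambda^*(T \cap S_n) = \lim_{n\to\infty} \big(\lambda^*(T \cap R_1) - \lambda^*(T \cap S_n)\big) = \lim_{n\to\infty} \lambda^*(T \cap R_n).$$
□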
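Remark 257 The requirement $\lambda^*(T \cap R_1) < +\infty$ cannot be dispensed with in Lemma 256. For example, let $R_n := [n, +\infty)$ for $n \in \mathbb{N}$, and let $T := \mathbb{R}$. The sequence $\{R_n\}_{n=1}^{\infty}$ is decreasing, and $\bigcap_{n=1}^{\infty} R_n = \emptyset$. Then $\lambda^*\big(T \cap \bigcap_{n=1}^{\infty} R_n\big) = 0$, although $\lambda^*(T \cap R_n) = +\infty$ for all $n \in \mathbb{N}$. ®

The class $\mathcal{M}$ of all measurable sets introduced in Definition 245 is stable under countable operations, as shown in Proposition 252. It is of the utmost importance for the theory that the Lebesgue outer measure, when restricted to this class, also behaves very well, in the sense that it is, for example, countably additive. This is shown in the next result.

Corollary 258 Let $\{E_n\}_{n=1}^{\infty}$ be a pairwise disjoint family of measurable sets. Then $\lambda^*\big(\bigcup_{n=1}^{\infty} E_n\big) = \sum_{n=1}^{\infty} \lambda^*(E_n)$.

Proof Put $E := \bigcup_{n=1}^{\infty} E_n$. Assume first that the series $\sum_{n=1}^{\infty} \lambda^*(E_n)$ converges. By Lemma 254, we have $\lambda^*\big(\bigcup_{k=1}^{n} E_k\big) = \sum_{k=1}^{n} \lambda^*(E_k)$ for all $n \in \mathbb{N}$. Now, $E = \big(\bigcup_{k=1}^{n} E_k\big) \cup \big(\bigcup_{k=n+1}^{\infty} E_k\big)$, so, by Lemma 253,
$$\lambda^*(E) = \lambda^*\Big(\bigcup_{k=1}^{n} E_k\Big) + \lambda^*\Big(\bigcup_{k=n+1}^{\infty} E_k\Big) = \sum_{k=1}^{n} \lambda^*(E_k) + \lambda^*\Big(\bigcup_{k=n+1}^{\infty} E_k\Big). \qquad (3.23)$$
By Proposition 236, $\lambda^*\big(\bigcup_{k=n+1}^{\infty} E_k\big) \le \sum_{k=n+1}^{\infty} \lambda^*(E_k) \to 0$ as $n \to \infty$, due to the convergence of the series $\sum_{n=1}^{\infty} \lambda^*(E_n)$. It follows from (3.23) that $\lambda^*(E) = \sum_{n=1}^{\infty} \lambda^*(E_n)$.

Assume now that the series $\sum_{n=1}^{\infty} \lambda^*(E_n)$ diverges. Since $\sum_{k=1}^{n} \lambda^*(E_k) = \lambda^*\big(\bigcup_{k=1}^{n} E_k\big) \le \lambda^*(E)$ for all $n \in \mathbb{N}$, we get $\lambda^*(E) = +\infty$, and the result follows in this case too. □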
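Countable additivity can be illustrated with a concrete pairwise disjoint family of intervals, which are measurable by Proposition 248 and whose measure is their length by Proposition 231. In the Python sketch below (the choice of the family $E_n = [1/(n+1), 1/n)$ and the function names are assumptions made for illustration), the pieces tile $(0, 1]$, and the partial sums of their lengths telescope to $1 - 1/(N+1)$, which tends to $1$.

```python
from fractions import Fraction

def piece(n):
    """Endpoints of the half-open interval E_n = [1/(n+1), 1/n)."""
    return (Fraction(1, n + 1), Fraction(1, n))

def partial_sum(N):
    """Sum of the lengths of E_1, ..., E_N, computed exactly."""
    return sum((b - a for a, b in map(piece, range(1, N + 1))), Fraction(0))

for N in (1, 10, 1000):
    print(N, partial_sum(N), 1 - partial_sum(N))
```

The remainder `1 - partial_sum(N)` is the length of $(0, 1/(N+1))$, the part of $(0, 1]$ not yet covered, and it plays the role of the tail term in (3.23).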
In view of Proposition 252 and Corollary 258, it is natural to restrict the outer measure to the class of measurable subsets of $\mathbb{R}$. The resulting set function is called the Lebesgue measure.

Definition 259 The Lebesgue measure (in short, the measure) on $\mathbb{R}$ is the outer measure $\lambda^*$ introduced in Definition 229, when restricted to the family $\mathcal{M}$ of all measurable subsets of $\mathbb{R}$, a family introduced in Proposition 252. It is denoted by $\lambda$.

Remark 260 The Lebesgue measure is a particular case of the general concept of a countably additive measure (in short, a measure) defined on a σ-algebra. In general, a measure $\mu$ defined on a σ-algebra $\Sigma$ of subsets of a nonempty set $S$ is a mapping that satisfies the three following properties:
(i) $\mu$ is defined on $\Sigma$ and takes values in $[0, +\infty]$
(ii) $\mu(\emptyset) = 0$
(iii) $\mu\big(\bigcup_{n=1}^{\infty} E_n\big) = \sum_{n=1}^{\infty} \mu(E_n)$ where $\{E_n\}_{n=1}^{\infty}$ is a pairwise disjoint family in $\Sigma$ (this includes, as it follows from (ii), the case of a finite union of sets from $\Sigma$).
That the class $\mathcal{M}$ of all measurable sets forms a σ-algebra was proved in Proposition 252, and that the Lebesgue measure satisfies the requirements for a measure (i.e., (i), (ii), and (iii) above) was proved in Remarks 230.5, 230.8, and Corollary 258, respectively. Compare this with the original formulation of Lebesgue for bounded measurable subsets of $\mathbb{R}$ (see footnote 1 in Sect. 3.1.2). ®

Thanks to Propositions 231 and 248, we may unify the notation from now on and write
$$\lambda(I) \text{ for the length } |I| \text{ of a bounded interval } I \text{ in } \mathbb{R}. \qquad (3.24)$$
Definition 261 A subset of R having Lebesgue outer measure zero (in view of Proposition 247 this is the same as saying “having Lebesgue measure zero”) is said to be null. A property about real numbers that holds for every real number in a certain set M ⊂ R except for the points of a null set is said to hold almost everywhere in M or for almost all x ∈ M (in short, (a.e.) in M). We shall omit “in M” if the set M is understood from the context.

Collecting some of what has been said above, every null set in R is measurable, every subset of a null set is again null, the countable union of null sets is itself null, and every countable subset of R is null. Let us prove below that some familiar sets of points in R are measurable.

Proposition 262 Any open (closed) subset of R is measurable.

Proof By Proposition 99, an open set is a countable union of open intervals, so the result follows from Propositions 248 and 252. The corresponding result for a closed set follows just by taking complements and looking again at Proposition 252. □

Remark 263 It is simple to prove that the intersection of an arbitrary number of families, each of them a σ-algebra, is again a σ-algebra. This shows that every
family $\mathcal{F}$ of subsets of $\mathbb{R}$ is contained in a smallest σ-algebra, called the σ-algebra generated by $\mathcal{F}$ and denoted by $\mathcal{S}_{\mathcal{F}}$. In fact, the family of all subsets of $\mathbb{R}$ is itself a σ-algebra. This shows, in particular, that there are σ-algebras containing $\mathcal{F}$. It is enough now to take the intersection of all of them. This is the family $\mathcal{S}_{\mathcal{F}}$. ®

Definition 264 The σ-algebra generated by the family of all the open subsets of $\mathbb{R}$ is called the family of Borel subsets of $\mathbb{R}$. This family is named after É. Borel.

Proposition 265 All Borel subsets of $\mathbb{R}$ are measurable.

Proof We showed in Proposition 262 that every open subset of $\mathbb{R}$ is measurable. The family of all measurable subsets of $\mathbb{R}$ is a σ-algebra by Proposition 252. Since the family of Borel sets is the smallest σ-algebra that contains the family of open subsets of $\mathbb{R}$, the conclusion follows. □

Note that all open and closed sets in $\mathbb{R}$ are Borel sets. All countable unions and intersections of (arbitrarily chosen) open and closed sets are Borel sets. This is where our intuition works. We intuitively stay within the frame of Borel sets, as we can identify intervals, their unions and intersections, and their complements. However, there are measurable sets which are not Borel sets. We will construct such a set later on (see Proposition 416).
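The notion of a generated σ-algebra from Remark 263 can be made concrete on a finite set, where countable unions reduce to finite ones. The Python sketch below is a finite toy model only, not the Borel construction on $\mathbb{R}$; the function name `generated_sigma_algebra` and the sample universe are assumptions introduced for illustration. It closes a family under complements and pairwise unions until nothing new appears.

```python
from itertools import combinations

def generated_sigma_algebra(universe, family):
    """Smallest family containing `family` that is closed under complement and union.

    On a finite universe this is the sigma-algebra generated by `family`.
    """
    universe = frozenset(universe)
    sets = {frozenset(), universe} | {frozenset(s) for s in family}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for s in current:                      # close under complements
            c = universe - s
            if c not in sets:
                sets.add(c)
                changed = True
        for s, t in combinations(current, 2):  # close under pairwise unions
            u = s | t
            if u not in sets:
                sets.add(u)
                changed = True
    return sets

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(sorted(sorted(s) for s in sigma))
```

In this example the generated family consists of all unions of the three "atoms" $\{1\}$, $\{2\}$, and $\{3, 4\}$; closure under intersections then comes for free from De Morgan's laws, as in Remark 251.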
3.1.3 Approximating Measurable Sets
We saw in Corollary 239 that every subset $A$ of $\mathbb{R}$ having finite Lebesgue outer measure has an open superset $V$ with almost the same measure (and this implied, see Corollary 241, that we can find a $G_\delta$-superset $G$ with the same measure). In Remark 242 we mentioned, without proof, that this is not the same as saying that the difference $V \setminus A$ has a small Lebesgue outer measure (or that $G \setminus A$ is a null set, respectively), and that in fact each of these statements characterizes the Lebesgue measurability of the given set. Now we can justify these assertions.
Proposition 266 Let $A$ be a subset of $\mathbb{R}$. Then the following statements are equivalent:
(i) $A$ is measurable.
(ii) For every $\varepsilon > 0$ there exists an open superset $V_\varepsilon$ of $A$ such that $\lambda^*(V_\varepsilon \setminus A) < \varepsilon$.
(iii) For every $\varepsilon > 0$ there exists a closed subset $F_\varepsilon$ of $\mathbb{R}$ such that $F_\varepsilon \subset A$ and $\lambda^*(A \setminus F_\varepsilon) < \varepsilon$.
(iv) For every $\varepsilon > 0$ there exist an open subset $V_{\varepsilon/2}$ of $\mathbb{R}$ and a closed subset $F_{\varepsilon/2}$ of $\mathbb{R}$ such that $F_{\varepsilon/2} \subset A \subset V_{\varepsilon/2}$ and $\lambda(V_{\varepsilon/2} \setminus F_{\varepsilon/2}) < \varepsilon$.
Proof (i)⇒(ii): Let $\varepsilon > 0$. If $\lambda(A) < +\infty$, then the conclusion follows from Corollary 239 and Proposition 252. Indeed, if $V_\varepsilon$ is the open set found in Corollary 239 for the given $\varepsilon > 0$, then the sets $V_\varepsilon$ and $V_\varepsilon \setminus A$ ($= V_\varepsilon \cap (\mathbb{R} \setminus A)$) are both
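measurable (Propositions 262 and 252), and $\lambda(A) \le \lambda(V_\varepsilon) = \lambda(A) + \lambda(V_\varepsilon \setminus A) < \lambda(A) + \varepsilon$, hence $\lambda(V_\varepsilon \setminus A) < \varepsilon$. If, on the contrary, $\lambda(A) = +\infty$, put $A_n := A \cap [-n, n]$ for each $n \in \mathbb{N}$. Note that $A = \bigcup_{n=1}^{\infty} A_n$. Now $\lambda(A_n) < +\infty$, hence, by the first part of this implication, there exists an open set $V_{\varepsilon,n}$ such that $A_n \subset V_{\varepsilon,n}$ and $\lambda(V_{\varepsilon,n} \setminus A_n) < \varepsilon/2^n$ for each $n \in \mathbb{N}$. Put $V_\varepsilon := \bigcup_{n=1}^{\infty} V_{\varepsilon,n}$. The set $V_\varepsilon$ is open and contains $A$. Obviously, $V_\varepsilon \setminus A \subset \bigcup_{n=1}^{\infty} (V_{\varepsilon,n} \setminus A_n)$. It follows from Proposition 236 that $\lambda(V_\varepsilon \setminus A) \le \sum_{n=1}^{\infty} \lambda(V_{\varepsilon,n} \setminus A_n) < \varepsilon$.
(ii)⇒(i): Given $n \in \mathbb{N}$, find an open superset $V_{1/n}$ of $A$ such that $\lambda^*(V_{1/n} \setminus A) < 1/n$, and put $V := \bigcap_{n=1}^{\infty} V_{1/n}$. Then $V \setminus A \subset V_{1/n} \setminus A$, hence $\lambda^*(V \setminus A) \le \lambda^*(V_{1/n} \setminus A)$ ($< 1/n$) for all $n \in \mathbb{N}$. It follows that $\lambda^*(V \setminus A) = 0$. By Proposition 252, the set $V$ is measurable, and so is the null set $V \setminus A$ (Proposition 247). Then, again by Proposition 252, the set $A = V \setminus (V \setminus A)$ is also measurable.
(i)⇒(iii): If $A$ is measurable, so is $A^c$ by Proposition 252. By (i)⇒(ii) applied to $A^c$, given $\varepsilon > 0$ we can find an open set $F_\varepsilon^c$ such that $A^c \subset F_\varepsilon^c$ and $\lambda(F_\varepsilon^c \setminus A^c) < \varepsilon$. Observe that $F_\varepsilon$ is closed, that $F_\varepsilon \subset A$, and that $F_\varepsilon^c \setminus A^c = A \setminus F_\varepsilon$. The conclusion follows.
(iii)⇒(i): The proof of this implication follows the same pattern as the proof of (ii)⇒(i). Now we get, for every $n \in \mathbb{N}$, closed sets $F_{1/n}$, each of them a subset of $A$, such that $\lambda^*(A \setminus F_{1/n}) < 1/n$. The set $F := \bigcup_{n=1}^{\infty} F_{1/n}$ is measurable, and $\lambda^*(A \setminus F) = 0$. We conclude then that $A$ is measurable.
(ii), together with (iii), clearly implies (iv) (apply both with $\varepsilon/2$), since $V \setminus F = (V \setminus A) \cup (A \setminus F)$ if $F \subset A \subset V$. Finally, (iv) clearly implies (ii) (and also (iii)). □
From the proof of Proposition 266 we immediately obtain the following consequence.
Corollary 267 Let $A$ be a subset of $\mathbb{R}$. Then the following are equivalent:
(i) $A$ is measurable.
(ii) There exists a $G_\delta$-subset $G$ of $\mathbb{R}$ such that $A \subset G$ and $\lambda^*(G \setminus A) = 0$.
(iii) There exists an $F_\sigma$-subset $F$ of $\mathbb{R}$ such that $F \subset A$ and $\lambda^*(A \setminus F) = 0$.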
3.1.4 The Lebesgue Inner Measure
The Definition and Some Properties of the Lebesgue Inner Measure
In retrospect, observe that the Lebesgue outer measure (and then the Lebesgue measure, when restricted to the measurable subsets of $\mathbb{R}$) was defined "from outside": we "approximated" the outer measure of a set by using countably many open bounded intervals covering the set, which amounts to using open supersets with almost the same measure (see Proposition 266). In Sect. 3.1.3
we saw how to "approximate" the outer measure "from inside" by using closed subsets, in a sense a more natural thing to do: for measuring a set it is artificial to look "outside." It will certainly be convenient to be able to use compact subsets of the given set for approximating the (outer) measure. We shall prove in Theorem 269 below that, for sets having finite outer measure, both procedures give the same value. In fact, we shall prove in the same result something more precise: that the coincidence of the values given by the two methods characterizes, in the case of finite outer measure, the measurable sets.
Definition 268 Let $E$ be a subset of $\mathbb{R}$. The (Lebesgue) inner measure of $E$ is the value $\lambda_*(E) \in [0, +\infty]$ given by $\lambda_*(E) := \sup\{\lambda(K) : K \subset E,\ K \text{ compact}\}$.
Theorem 269 Let $A$ be a subset of $\mathbb{R}$. Then we have:
(i) If $A$ is measurable, then $\lambda_*(A) = \lambda^*(A)$ ($= \lambda(A)$).
(ii) If $\lambda_*(A) = \lambda^*(A) < +\infty$, then $A$ is measurable.
Proof (i) Assume that $A$ is measurable.
(a) If $A$ is bounded, apply Proposition 266 to obtain, for $n \in \mathbb{N}$, a closed subset $F_{1/n}$ of $\mathbb{R}$ such that $F_{1/n} \subset A$ and $\lambda(A \setminus F_{1/n}) < 1/n$. Observe that, in this case, for every $n \in \mathbb{N}$ the set $F_{1/n}$ is compact. Since $\lambda(F_{1/n}) + \lambda(A \setminus F_{1/n}) = \lambda(A)$, we have $\lambda(A) - 1/n < \lambda(F_{1/n}) \le \lambda(A)$. This happens for all $n \in \mathbb{N}$; hence $\lambda_*(A) = \lambda(A)$ ($= \lambda^*(A)$).
(b) In the general case put, for $n \in \mathbb{N}$, $A_n := A \cap [-n, n]$. Then $A_n$ is a bounded and measurable subset of $\mathbb{R}$. By (i)(a) above we have $\lambda_*(A_n) = \lambda^*(A_n)$ ($= \lambda(A_n)$). The sequence $\{A_n\}_{n=1}^{\infty}$ is increasing, and $\bigcup_{n=1}^{\infty} A_n = A$. It follows from Lemma 255 that $\lambda(A_n) \to \lambda(A)$. Then we have, for all $n \in \mathbb{N}$, $\lambda(A_n) = \lambda_*(A_n) \le \lambda_*(A) \le \lambda(A)$. Letting $n \to \infty$ we get $\lambda_*(A) = \lambda(A)$ ($= \lambda^*(A)$).
(ii) Assume that $\lambda_*(A) = \lambda^*(A) < +\infty$. Then there exist Borel subsets $B_1$ and $B_2$ of $\mathbb{R}$ such that $B_1 \subset A \subset B_2$ and $\lambda(B_2 \setminus B_1) = 0$ (compare Corollary 267). Indeed, take for $B_2$ a $G_\delta$-superset of $A$ with $\lambda(B_2) = \lambda^*(A)$ (Corollary 241), and for $B_1$ the union of compact sets $K_n \subset A$ with $\lambda(K_n) > \lambda_*(A) - 1/n$, $n \in \mathbb{N}$; since $\lambda(B_1) = \lambda(B_2) < +\infty$, we get $\lambda(B_2 \setminus B_1) = 0$. Recall that, by Proposition 247, every subset of a null set is measurable (and is a null set). In particular, $A \setminus B_1$ ($\subset B_2 \setminus B_1$) is measurable.
This implies that $A$ ($= B_1 \cup (A \setminus B_1)$) is measurable. □
Remark 270 The requirement that $\lambda^*(A) < +\infty$ in Theorem 269 cannot be dispensed with. In fact, we may have a nonmeasurable (unbounded) subset $A$ of $\mathbb{R}$ such that $\lambda_*(A) = \lambda^*(A)$ ($= +\infty$). To provide an example, consider the nonmeasurable set $V$ constructed in Lemma 283 below. We have $V \subset [0, 1]$. Let $A := V \cup [2, +\infty)$.
This set certainly satisfies $\lambda_*(A) = \lambda^*(A) = +\infty$. Assume that $A$ is measurable; then $[2, +\infty)^c \cap A$ ($= V$) should be measurable, too, a contradiction.
For subsets $A$ of $\mathbb{R}$ such that $\lambda^*(A) < +\infty$ we may improve Proposition 266(iii).
Proposition 271 Let $A$ be a subset of $\mathbb{R}$ such that $\lambda^*(A) < +\infty$. Then the following are equivalent:
(i) $A$ is measurable.
(ii) For every $\varepsilon > 0$ there exists a compact subset $K_\varepsilon$ of $A$ such that $\lambda^*(A \setminus K_\varepsilon) < \varepsilon$.
Proof (i)⇒(ii): If $A$ is measurable then, by Theorem 269, we have $\lambda_*(A) = \lambda^*(A)$ ($= \lambda(A)$). So, due to the fact that $\lambda(A) < +\infty$, given $\varepsilon > 0$ there exists a compact subset $K_\varepsilon$ of $A$ such that $\lambda(A) - \varepsilon < \lambda(K_\varepsilon)$. Since $\lambda(K_\varepsilon) + \lambda(A \setminus K_\varepsilon) = \lambda(A)$, the conclusion follows. (ii)⇒(i) is contained in (iii)⇒(i) in Proposition 266. □
Remark 272 Again, Proposition 271 is trivially false without the requirement that $\lambda^*(A) < +\infty$. Indeed, take $A = \mathbb{R}$. Then $A$ is measurable. Since every compact subset of $\mathbb{R}$, being bounded, has finite measure, (ii) fails.
A Result of Steinhaus
Definition 273 Consider a nonempty subset $A$ of $\mathbb{R}$. We define $A - A := \{x - y : x, y \in A\}$.
The next result is due to the Polish mathematician H. Steinhaus.
Theorem 274 (Steinhaus) Consider a measurable set $A \subset \mathbb{R}$. If the Lebesgue measure of $A$ is positive, then the set $A - A$ contains an interval $(-\delta, \delta)$ for some $\delta > 0$.
Remark 275 Note that a measurable subset of $\mathbb{R}$ having positive measure need not contain an interval. An example of this phenomenon is the set $\mathbb{P} \cap [0, 1]$ of all irrational numbers in the interval $[0, 1]$. Indeed, its complement in $[0, 1]$ (i.e., the set $[0, 1] \cap \mathbb{Q}$ of all rational numbers in $[0, 1]$) is a countable set, hence measurable, and $\lambda([0, 1] \cap \mathbb{Q}) = 0$ (see Proposition 247). It follows that $\mathbb{P} \cap [0, 1]$ is measurable, $\lambda(\mathbb{P} \cap [0, 1]) = 1$, and certainly $\mathbb{P} \cap [0, 1]$ does not contain any interval. Exercise 13.153 complements this remark by checking the validity of Steinhaus' Theorem 274 for the set $\mathbb{P} \cap [0, 1]$ above.
The following lemma will be used in the proof of Theorem 274.
Lemma 276 Let $K$ be a nonempty compact subset of $\mathbb{R}$, and let $F$ be a nonempty closed subset of $\mathbb{R}$. Assume that $K \cap F = \emptyset$. Then $\delta := \inf\{|x - y| : x \in K \text{ and } y \in F\} > 0$.
Proof Assume, on the contrary, that $\delta = 0$. Then there exist a sequence $\{x_n\}_{n=1}^{\infty} \subset K$ and a sequence $\{y_n\}_{n=1}^{\infty} \subset F$ so that $\lim_{n\to\infty} |x_n - y_n| = 0$.
Fig. 3.1 The first two steps in the construction of the Cantor ternary set C
Since $K$ is compact, we can extract a convergent subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ (see Theorem 149). Let $x$ be its limit. Then, obviously, the sequence $\{y_{n_k}\}_{k=1}^{\infty}$ also converges to $x$. This is a contradiction, since $x \in K \cap F$ due to the fact that both sets $K$ and $F$ are closed. □
Proof of Theorem 274 Assume first that $A$ is bounded. Let $\lambda(A) = \alpha > 0$. By Theorem 269, $\lambda_*(A) = \lambda(A)$, hence we can find a compact set $K \subset A$ so that $\lambda(K) > \alpha/2$. Fix $\varepsilon > 0$ such that $\varepsilon < \alpha/2$. We can also find, by Proposition 266 (see also Corollary 239), an open subset $V$ of $\mathbb{R}$ such that $A \subset V$ and $\lambda(V \setminus A) < \varepsilon$. Define now $\delta := \inf\{|x - y| : x \in K \text{ and } y \in V^c\}$. By Lemma 276, we have $\delta > 0$. Let $t \in \mathbb{R}$ be such that $|t| < \delta$. Consider the compact set $t + K$ (see Definition 243). Note that $(t + K) \subset V$ (otherwise $|t| \ge \delta$), and that $t + K$ is measurable as it is closed (Proposition 262). We now show that
$$(t + K) \cap A \neq \emptyset. \tag{3.25}$$
Suppose, on the contrary, that $(t + K) \cap A = \emptyset$. Then $(t + K) \subset V \setminus A$ and we have, by Proposition 244, $\varepsilon < \alpha/2 < \lambda(K) = \lambda(t + K) \le \lambda(V \setminus A) < \varepsilon$, a contradiction. This proves (3.25). By (3.25), every $t$ with $|t| < \delta$ can be written as $t = a - k$ with $a \in A$ and $k \in K$. Therefore $(-\delta, \delta) \subset A - K \subset A - A$, and the result follows.
Assume now that $A$ is unbounded. We can find $n \in \mathbb{N}$ such that $\lambda(A \cap [-n, n]) \neq 0$ (otherwise $\lambda(A) = 0$, since $A = \bigcup_{n=1}^{\infty} (A \cap [-n, n])$ and we may apply Proposition 236). The set $A \cap [-n, n]$ is bounded, so $(A \cap [-n, n]) - (A \cap [-n, n])$ contains an interval $(-\delta, \delta)$ for some $\delta > 0$. The same holds for $A - A$. □
3.1.5 The Cantor Ternary Set
The Classical Cantor Ternary Set
We study in this subsection a subset $C$ of $[0, 1]$, first introduced by G. Cantor, that plays an important role in mathematics. Its construction was motivated by questions in Fourier analysis. The set $C$ is usually referred to as the Cantor ternary set. The term "ternary" comes from the way it is constructed (see Fig. 3.1): Put
Fig. 3.2 A tree representation of the Cantor ternary set; 0 points to the left, 1 to the right
$$\Delta_\emptyset := [0, 1]. \qquad \text{(level 0)}$$
Remove its open middle third interval $I_{1,1} := (1/3, 2/3)$ and keep the remaining two closed intervals
$$\Delta_0 := [0, 1/3], \quad \Delta_1 := [2/3, 1]. \qquad \text{(level 1)}$$
Now proceed with each of $\Delta_0$ and $\Delta_1$ as above: From $\Delta_0$, remove its open middle third interval $I_{2,1} := (1/9, 2/9)$ and keep the remaining two closed intervals $\Delta_{0,0} := [0, 1/9]$ and $\Delta_{0,1} := [2/9, 3/9]$. Analogously, from $\Delta_1$, remove its open middle third interval $I_{2,2} := (7/9, 8/9)$ and keep the remaining two closed intervals $\Delta_{1,0} := [6/9, 7/9]$ and $\Delta_{1,1} := [8/9, 1]$. This leads to
$$\Delta_{0,0} := [0, 1/9], \quad \Delta_{0,1} := [2/9, 3/9], \quad \Delta_{1,0} := [6/9, 7/9], \quad \Delta_{1,1} := [8/9, 1]. \qquad \text{(level 2)}$$
Continue in this way. Schematically, we are building an inverted dyadic tree of closed intervals (see Fig. 3.2). At each level $n$ we get $2^n$ closed intervals, each of length $3^{-n}$, whose union is denoted $K_n$. Precisely, put $K_0 := \Delta_\emptyset$ and $K_n := \bigcup_{\varepsilon_i \in \{0,1\},\ i=1,2,\ldots,n} \Delta_{\varepsilon_1,\ldots,\varepsilon_n}$. We get a decreasing sequence $\{K_n\}_{n=0}^{\infty}$ of nonempty compact sets. By Corollary 148, its intersection is nonempty.
Definition 277 The Cantor ternary set is the subset $C$ of $[0, 1]$ defined by
$$C := \bigcap_{n=0}^{\infty} K_n, \tag{3.26}$$
where $\{K_n\}_{n=0}^{\infty}$ is the sequence of nonempty compact subsets of $\mathbb{R}$ described in the preceding paragraph.
We shall consider $C$ endowed with the metric induced by the absolute-value metric of $\mathbb{R}$ (the resulting space is usually referred to as the Cantor space). A number of its properties should be clear from its very construction. We list them in Proposition 279 for future reference. Other less obvious properties will be established shortly. To shorten the statement, we need the following definition.
Definition 278 A closed subset $P$ of $\mathbb{R}$ is said to be perfect if each of its points is an accumulation point of $P$.
Proposition 279 The Cantor ternary set $C$ has the following properties.
(i) It is a nonempty compact subset of $[0, 1]$.
(ii) It is Lebesgue measurable, and it has Lebesgue measure zero.
(iii) It is a perfect set (see Definition 278).
(iv) It is a nowhere dense set (see Definition 106).
(v) It can be identified, as a set, with the set $2^{\mathbb{N}}$ consisting of all sequences $s = \{\varepsilon_n\}_{n=1}^{\infty}$ of 0's and 1's.
(vi) It is uncountable (more precisely, it has cardinality $\mathfrak{c}$, i.e., the cardinality of the set of real numbers).
Proof (i) That $C$ is nonempty was mentioned above. It is a compact set, being a closed subset of the compact set $[0, 1]$.
(ii) Observe that every closed subset of $\mathbb{R}$ is measurable (Proposition 262). That $C$ is a null set follows from the fact that $C \subset K_n$ and $\lambda(K_n) = 2^n/3^n$ for all $n \in \mathbb{N}$, since $(2/3)^n \to 0$.
(iii) If $x \in C$ and $\varepsilon > 0$ are given, choose $n \in \mathbb{N}$ such that $3^{-n} < \varepsilon$. Find $\varepsilon_i \in \{0, 1\}$, $i = 1, 2, \ldots, n$, such that $x \in \Delta_{\varepsilon_1,\ldots,\varepsilon_n}$. It follows that the distance from $x$ to any $y$ in $C$ such that $y \in \Delta_{\varepsilon_1,\ldots,\varepsilon_n,\varepsilon_{n+1},\ldots,\varepsilon_m}$ for some $m > n$ is less than $\varepsilon$.
(iv) The set $C$ is closed. If it contained a nonempty open subset, it would then contain a proper open interval, having some positive length $\delta$. However, $\lambda(C) = 0$, as was proved in (ii).
(v) Each element $x$ in $C$ can be located by determining a unique sequence $\{\varepsilon_1, \varepsilon_2, \ldots\}$, where $\varepsilon_n \in \{0, 1\}$ for all $n \in \mathbb{N}$ ($\varepsilon_n = 0$ points to the left, $\varepsilon_n = 1$ to the right, see Fig. 3.2). Conversely, each such sequence $\{\varepsilon_1, \varepsilon_2, \ldots\}$ defines a single element $x$ in $C$ by "going down the tree along the chosen path." Precisely,
$$\{x\} = \bigcap_{n=1}^{\infty} \Delta_{\varepsilon_1,\ldots,\varepsilon_n}. \tag{3.27}$$
The mapping $\varphi : 2^{\mathbb{N}} \to C$ defined by (3.27) is clearly one-to-one and onto. This shows that $C$ and $2^{\mathbb{N}}$ can be identified as sets.
(vi) This is a consequence of (v) and Exercise 13.47. This result can also be obtained from more general statements (see Corollary 591). □
Remark 280 Some elements of $C$ are easily identified: those are the end points of all intervals $\Delta_{\varepsilon_1,\ldots,\varepsilon_n}$ for each $n \in \mathbb{N}$. Since this set $E$ of points is countable, it is clear (see (vi) in Proposition 279) that there are other points in $C$ besides those in $E$. From the geometric construction above it is clear that the elements $\{\varepsilon_n\}_{n=1}^{\infty}$ in $E$ are characterized by the fact that the sequence $\{\varepsilon_n\}_{n=1}^{\infty}$ is eventually constant: indeed, if it is eventually 0 (eventually 1), then we stay at a left-end (respectively, right-end) point of a certain $\Delta_{\varepsilon_1,\ldots,\varepsilon_n}$. From this it is clear that $C \setminus E$ consists of all sequences $\{\varepsilon_n\}_{n=1}^{\infty}$ that are not eventually constant.
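The first levels of the construction can be reproduced with exact rational arithmetic. The following Python sketch is only an illustration (the function name `cantor_level` is ours): it lists the $2^n$ closed intervals whose union is $K_n$, in the left-to-right order of the tree in Fig. 3.2, so that the count $2^n$ and the total length $(2/3)^n$ used in Proposition 279(ii) can be checked for small $n$.

```python
from fractions import Fraction


def cantor_level(n):
    """Closed intervals (pairs of Fractions) whose union is K_n."""
    intervals = [(Fraction(0), Fraction(1))]           # level 0: [0, 1]
    for _ in range(n):
        next_level = []
        for a, b in intervals:
            third = (b - a) / 3
            next_level.append((a, a + third))          # left child  (epsilon = 0)
            next_level.append((b - third, b))          # right child (epsilon = 1)
        intervals = next_level
    return intervals


for n in range(5):
    level = cantor_level(n)
    total = sum(b - a for a, b in level)
    print(n, len(level), total)                        # level, count, lambda(K_n)
```

Using `Fraction` instead of floating-point numbers keeps the end points exact, which matters here because the end points are precisely the elements of the countable set $E$ of Remark 280.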
Fig. 3.3 Elements in the intervals $\Delta_{\varepsilon_1,\ldots,\varepsilon_n}$ written using the base-3 expansion (Remark 281)
Remark 281 Consider the base-3 expansion $0.a_1a_2a_3\cdots$ of a real number in $[0, 1]$, where $a_n \in \{0, 1, 2\}$ for all $n \in \mathbb{N}$. Clearly the intervals $\Delta_{\varepsilon_1,\ldots,\varepsilon_n}$ defined in the process of building $C$ consist of numbers of the form given in Fig. 3.3, where $a_n \in \{0, 1, 2\}$ for all $n \in \mathbb{N}$. This shows that $C$ consists of all real numbers in $[0, 1]$ that admit a base-3 expansion of the form $0.a_1a_2a_3\cdots$, where $a_n \in \{0, 2\}$ for all $n \in \mathbb{N}$.
A Cantor Ternary Set of Positive Measure
Naturally, we may ask whether it is possible to have a Cantor-like subset of $[0, 1]$ with positive measure. The answer is "yes", and the idea is to remove less and less (as a fraction of the remaining set) at each step of the classical construction (see the first part of this subsection). Choose $0 < p < 1$. At first we remove
$$\left(\frac{2-p}{4}, \frac{2+p}{4}\right).$$
The measure of the removed set is $\frac{p}{2}$. In the remaining two closed sets, we remove
$$\left(\frac{2-p}{16}, \frac{2+p}{16}\right), \quad \left(\frac{14-p}{16}, \frac{14+p}{16}\right).$$
The measure of the removed set is now $\frac{p}{4}$. Continue inductively. The total measure of the removed intervals is
$$\sum_{n=1}^{\infty} \frac{p}{2^n} = p.$$
The set $C_+$ that remains then has measure $1 - p$, it is closed (hence compact), and no open interval can lie in $C_+$. Again, each point in $C_+$ is an accumulation point of $C_+$. Note, too, that $\operatorname{card} C_+ = \mathfrak{c}$. All arguments supporting these facts are similar to those used in the case of the classical Cantor ternary set of zero measure, except one: the emptiness of $\operatorname{Int} C_+$. We cannot rely now on having zero measure, since the opposite holds. The argument (which could also have been used in the classical case) goes like
this: At step 1, each of the two intervals left (i.e., not removed) has length strictly less than $1/2$. At step 2, each of the four intervals left has length strictly less than $1/4$. At step $n$, each of the $2^n$ intervals left has length strictly less than $1/2^n$. Assume now that some nondegenerate open interval $J$ is inside $C_+$. Take two different points in $J$, say $x$ and $y$. Then $d := |x - y| > 0$, so there is $n \in \mathbb{N}$ such that $1/2^n < d$. This shows that $x$ and $y$ do not belong to the same interval among the $2^n$ intervals left at step $n$, a contradiction with the fact that $[x, y] \subset C_+$. Related to this construction, see Exercise 13.154.
Remark 282 It was seen in Proposition 266 that the measure of a measurable subset of $\mathbb{R}$ can be approximated from below by the measure of closed subsets. This cannot be done by using open subsets instead: for example, a Cantor set $C_+$ of positive measure has empty interior.
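The bookkeeping behind $C_+$ can be checked numerically. The Python sketch below is an illustration under one stated assumption: every removed open interval is centered in its parent interval. This agrees with the first step of the text, but the text's second-step intervals are placed slightly differently; only the removed lengths ($p/2^k$ in total at step $k$) are taken from the text. The name `fat_cantor_level` is ours.

```python
from fractions import Fraction


def fat_cantor_level(p, n):
    """Intervals left after n steps when step k removes total length p / 2**k.

    Assumption of this sketch: each removed open interval is centered
    in its parent interval.
    """
    intervals = [(Fraction(0), Fraction(1))]
    for k in range(1, n + 1):
        # Step k removes p / 2**k in total, split evenly among the
        # 2**(k-1) intervals currently present.
        gap = p / Fraction(2) ** k / len(intervals)
        next_level = []
        for a, b in intervals:
            mid = (a + b) / 2
            next_level.append((a, mid - gap / 2))
            next_level.append((mid + gap / 2, b))
        intervals = next_level
    return intervals


p = Fraction(1, 3)
for n in (1, 2, 5, 10):
    level = fat_cantor_level(p, n)
    remaining = sum(b - a for a, b in level)
    longest = max(b - a for a, b in level)
    print(n, len(level), float(remaining), float(longest))
```

After $n$ steps the remaining length is $1 - p(1 - 2^{-n})$, which decreases to $1 - p$, while the longest remaining interval stays below $1/2^n$; the second fact is exactly what the empty-interior argument above uses.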
3.1.6 A Nonmeasurable Set
In this section we shall present an example of a nonmeasurable subset $V$ of $[0, 1]$. The following construction is due to G. Vitali.
Lemma 283 (Vitali) There exists a set $V \subset [0, 1]$ which is not measurable.
Proof Let $I = [0, 1]$. We define the following relation $\sim$ on $I$: $x \sim y$ if, and only if, $x - y \in \mathbb{Q}$. It is clear that $\sim$ is an equivalence relation, so the set $I$ splits into the corresponding equivalence classes. Form a set $V$ of elements in $I$ consisting of exactly one representative from each equivalence class (the selection is allowed on the basis of the so-called Axiom of Choice, see Remark 289, Sect. 12.6 and, more particularly, Theorem 1076). $V$ is then a subset of $I$. We claim that the set $V$ is not measurable. In order to prove the claim, we mention here the following two facts:
(i) If $v, v' \in V$ satisfy $v \sim v'$, then $v = v'$, since $V$ contains only one element from each equivalence class.
(ii) If $x \in I$, then there exists one, and only one, $v \in V$ so that $x \sim v$. Indeed, $x$ belongs to some equivalence class, so there exists $v \in V$ such that $x \sim v$. Uniqueness comes from (i).
Let $B := [-1, 1] \cap \mathbb{Q}$. Given $q \in B$, let $V_q := q + V = \{q + v : v \in V\}$. It follows from (i) above that
$$V_q \cap V_{q'} = \emptyset \quad \text{whenever } q, q' \in B \text{ and } q \neq q'. \tag{3.28}$$
Since $B$ is countable, we can list its elements in a sequence: $B = \{q_1, q_2, \ldots\}$. Let us prove that
$$[0, 1] \subset \bigcup_{n=1}^{\infty} V_{q_n}. \tag{3.29}$$
Indeed, given $x \in [0, 1]$, by (ii) above there exists $v \in V$ such that $x \sim v$. This means $x - v \in B$, say $x - v = q_n$, hence $x = q_n + v$ and thus $x \in V_{q_n}$. This proves (3.29). Assume that $V$ is measurable. We have only two possibilities.
(a) $\lambda(V) > 0$. Use then Theorem 274 to conclude that $V - V$ contains $(-\delta, \delta)$ for some $\delta > 0$. The interval $(-\delta, \delta)$ contains a rational point $r \neq 0$. Then $r = v_1 - v_2$ for some $v_1, v_2 \in V$. This forces $v_1 \sim v_2$, hence, by (i) above, $v_1 = v_2$, so $r = 0$, and this is a contradiction.
(b) $\lambda(V) = 0$. The set $V_q$ is a translate of $V$; thus it is measurable, and $\lambda(V_q) = 0$ for all $q \in B$ (see Proposition 244). Use now (3.28) and (3.29), together with Corollary 258, to obtain $\lambda[0, 1] = 0$, again a contradiction.
This shows that $V$ is not measurable. □
Definition 284 Any set $V$ constructed as in the proof of Lemma 283 (observe that any selection by using the Axiom of Choice was allowed) is called a Vitali nonmeasurable set.
In fact, the argument used in (a) in the proof above shows the following.
Corollary 285 If $A$ is a measurable subset of any Vitali nonmeasurable set $V$, then $\lambda(A) = 0$. In particular, $\lambda_*(V) = 0$.
Remark 286 Observe that Vitali nonmeasurable sets have empty interior. Indeed, if $V$ is a Vitali nonmeasurable set and $O$ is an open subset of $V$, the set $O$ is a countable union of open intervals $I_n$ (Proposition 99), and each of them has Lebesgue measure zero by Corollary 285. This happens only in case $I_n = \emptyset$ for each $n \in \mathbb{N}$, hence $O = \emptyset$.
Corollary 287 For every $\varepsilon > 0$, we can choose a Vitali nonmeasurable set $V$ so that $0 < \lambda^*(V) < \varepsilon$.
Proof Note first that, due to Proposition 247, every Vitali set must have positive outer measure. Fix $\varepsilon > 0$. Given an arbitrary $x \in [0, 1]$, let $R(x)$ be the equivalence class to which $x$ belongs. We may find $q \in [0, 1] \cap \mathbb{Q}$ such that $0 \le x - q < \varepsilon/2$. Put $y := x - q$. Then $y \in [0, 1]$ and $x \sim y$, hence $y \in R(x)$. This shows that we may always find a representative of any class in $[0, \varepsilon/2]$, i.e., there is a Vitali set $V \subset [0, \varepsilon/2]$, and then $\lambda^*(V) \le \varepsilon/2 < \varepsilon$. This proves the assertion. □
Corollary 288 Let $V$ be any Vitali nonmeasurable subset of $I := [0, 1]$. Then
$$\lambda^*(V) + \lambda^*(I \setminus V) > 1, \quad \text{with } \lambda^*(V) > 0 \text{ and } \lambda^*(I \setminus V) = 1. \tag{3.30}$$
Moreover,
$$\lambda_*(V) + \lambda_*(I \setminus V) < 1, \quad \text{with } \lambda_*(V) = 0 \text{ and } \lambda_*(I \setminus V) < 1. \tag{3.31}$$
Proof Let $\{I_n\}_{n=1}^{\infty}$ be a cover of $I \setminus V$ by open bounded intervals. Put $A := \bigcup_{n=1}^{\infty} (I_n \cap I)$. Then $I \setminus V \subset A$. Note that $A$ is a measurable subset of $I$, and that $I \setminus A$ (again a measurable subset of $I$) is a subset of $V$. Use Corollary 285 to obtain $\lambda(I \setminus A) = 0$, hence $\lambda(A) = 1$. Then
$$1 = \lambda(A) \le \sum_{n=1}^{\infty} \lambda(I_n \cap I) \le \sum_{n=1}^{\infty} \lambda(I_n).$$
Since this holds for every cover $\{I_n\}_{n=1}^{\infty}$ of $I \setminus V$ by open bounded intervals, we get $\lambda^*(I \setminus V) \ge 1$. On the other hand, $\lambda^*(I \setminus V) \le 1$, so $\lambda^*(I \setminus V) = 1$. Due to the fact that $V$ is not measurable, we get from Proposition 247 that $\lambda^*(V) > 0$. Altogether, we get (3.30). The set $I \setminus V$ is nonmeasurable (otherwise, the set $V$ would be measurable). Use now Theorem 269 to get $1 = \lambda^*(I \setminus V) \neq \lambda_*(I \setminus V)$. This forces $\lambda_*(I \setminus V) < 1$. That $\lambda_*(V) = 0$ follows from Corollary 285. This proves (3.31). □
Remark 289 We mentioned in the proof of Lemma 283 that the construction of a nonmeasurable set relied on the Axiom of Choice. This statement ensures that, given a nonempty family of nonempty sets, we are able to pick one element from each set (see Sect. 12.6 and, particularly, Theorem 1076, where it is stated that it is equivalent to Zorn's Lemma (named after the German-American mathematician M. Zorn) or to the Well-Ordering Principle). The Axiom of Choice becomes relevant only for infinite families of sets (for finite families, the induction process can be used). The Banach–Tarski Paradox (named after the Polish mathematicians S. Banach and A. Tarski) is commonly associated with nonmeasurable sets. It states the following: a unit ball in three-dimensional space can be disassembled into nine pieces and then reassembled to make two unit balls. The disassembly and reassembly are accomplished via rigid motions (translations and rotations).
3.1.7 Sequences of Sets
Definition 290 If $\{M_n\}_{n=1}^{\infty}$ is a sequence of sets, then $\limsup_{n\to\infty} M_n$ denotes the collection of all the elements in $\bigcup_{n=1}^{\infty} M_n$ that lie in infinitely many $M_n$'s, and $\liminf_{n\to\infty} M_n$ denotes the collection of all the elements in $\bigcup_{n=1}^{\infty} M_n$ that lie in all but finitely many $M_n$'s. If $\liminf_{n\to\infty} M_n = \limsup_{n\to\infty} M_n$ ($= M$), then we call this set $\lim_{n\to\infty} M_n$, and we say that the sequence $\{M_n\}_{n=1}^{\infty}$ converges to $M$.
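Note that we have
$$\liminf_{n\to\infty} M_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} M_k, \quad \text{and} \quad \limsup_{n\to\infty} M_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} M_k. \tag{3.32}$$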
Indeed, $x \in \limsup_{n\to\infty} M_n$ means that given $n \in \mathbb{N}$, there exists $k \ge n$ such that $x \in M_k$. Furthermore, $x \in \liminf_{n\to\infty} M_n$ means that there exists $n \in \mathbb{N}$ such that $x \in M_k$ for all $k \ge n$. Observe that we always have $\liminf_{n\to\infty} M_n \subset \limsup_{n\to\infty} M_n$.
Example 291
1. For $n \in \mathbb{N}$, let $M_n$ be the set of all real numbers of the form $\frac{k}{n}$, where $k$ runs over all integers. Then $\limsup M_n$ is the set of all rational numbers, since $\frac{k}{n} = \frac{mk}{mn}$ for all $m \ge 1$. Furthermore, $\liminf M_n$ is the set of all integers. Indeed, if $a, b$ are integers with no common factor and $b > 1$, the number $\frac{a}{b}$ lies in no $M_n$ where $n$ is a natural number such that $b$ does not divide $n$, and there are infinitely many such $n$.
2. Let $A$ and $B$ be two sets. For $n \in \mathbb{N}$, let $A_n$ be defined by
$$A_n = \begin{cases} A & \text{if } n \text{ is odd}, \\ B & \text{if } n \text{ is even}. \end{cases}$$
Then $\limsup_{n\to\infty} A_n = A \cup B$ and $\liminf_{n\to\infty} A_n = A \cap B$. Thus $\lim_{n\to\infty} A_n$ exists if and only if $A = B$.
3. Every sequence $\{M_n\}_{n=1}^{\infty}$ of countable sets has a convergent subsequence. Indeed, we need to show that a subsequence can be extracted so that, if an element lies in an infinite number of sets in this subsequence, then it belongs to all but finitely many sets of this subsequence. This can be achieved by a Cantor diagonal procedure. More precisely, put $M_n := \{x_{n,k} : k \in \mathbb{N}\}$ for $n \in \mathbb{N}$ and order the set $M := \bigcup_{n=1}^{\infty} M_n$ as $x_{1,1} \le x_{2,1} \le x_{1,2} \le x_{3,1} \le x_{2,2} \le x_{1,3} \le x_{4,1} \le \ldots$. If for every $x \in M$ there exists $n \in \mathbb{N}$ such that $x \notin \bigcup_{k=n}^{\infty} M_k$, then $\limsup_{n\to\infty} M_n = \emptyset$, and there is nothing to prove. Otherwise, select the first element in $M$ that belongs to infinitely many $M_n$'s and consider the subsequence $\{M_{n_k}\}_{k=1}^{\infty}$ of those $M_n$'s. Now, if no other element in $\bigcup_{k=1}^{\infty} M_{n_k}$ belongs to infinitely many elements in the subsequence, we are done. Otherwise, pick the next element in $\bigcup_{k=1}^{\infty} M_{n_k}$ with this property, and choose the corresponding subsequence of $\{M_{n_k}\}_{k=1}^{\infty}$. Proceed recursively. Either we stop at some step (and then we obtain a convergent subsequence) or, on the contrary, the process continues forever. In this case, choose a subsequence consisting of the "diagonal" of the successive subsequences so obtained. This concludes the proof. ♦
Observe that if $\{M_n\}_{n=1}^{\infty}$ is a sequence of measurable subsets of $\mathbb{R}$, then $\liminf_{n\to\infty} M_n$ and $\limsup_{n\to\infty} M_n$ are both measurable sets. This follows from (3.32).
Lemma 292 (Borel–Cantelli) If $\{E_n\}_{n=1}^{\infty}$ is a sequence of measurable subsets of $\mathbb{R}$ such that $\sum_{n=1}^{\infty} \lambda(E_n) < \infty$, then $\lambda(\limsup_{n\to\infty} E_n) = 0$.
Proof Given $m \in \mathbb{N}$, there is $n(m) \in \mathbb{N}$ such that $\lambda\left(\bigcup_{k=n(m)}^{\infty} E_k\right) < 1/m$; this holds because $\lambda\left(\bigcup_{k=n}^{\infty} E_k\right) \le \sum_{k=n}^{\infty} \lambda(E_k)$ (Proposition 236) and the tails of a convergent series tend to zero. Without loss of generality, we may assume that $\{n(m)\}_{m=1}^{\infty}$ is a strictly increasing sequence. Since $\left\{\bigcup_{k=n}^{\infty} E_k\right\}_{n=1}^{\infty}$ is a decreasing sequence, we have that $\lambda\left(\bigcup_{k=n}^{\infty} E_k\right) \to 0$ (see Remark 230.2). By Lemma 256, this implies $\lambda(\limsup E_n) = \lambda\left(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k\right) = 0$. □
Concluding Remarks
We finish this chapter with a few remarks that show again that the relationship between measure and set-theoretical properties is quite subtle.
1. The Cantor set is, from the set-theoretical point of view, a huge set, as it is uncountable (see Proposition 279 (vi)), and yet it has Lebesgue measure zero.
2. It is not difficult to find, for any $\varepsilon > 0$, a closed set in $[0, 1]$ that contains no interior point and yet has Lebesgue measure greater than $1 - \varepsilon$. Indeed, if $\varepsilon_n > 0$ are such that $\sum_{n=1}^{\infty} 2\varepsilon_n < \varepsilon$ and $\{r_n : n \in \mathbb{N}\}$ is the set of all the rational numbers in $[0, 1]$, then put $S = [0, 1] \setminus \bigcup_{n=1}^{\infty} (r_n - \varepsilon_n, r_n + \varepsilon_n)$. The set $S$ is clearly closed and has measure greater than $1 - \varepsilon$. Moreover, it contains no interior point, due to the fact that any nonempty open interval in $\mathbb{R}$ must contain a rational point.
3. Note that any closed subset $F$ of $[0, 1]$ that has measure 1 must be equal to $[0, 1]$. Indeed, otherwise its nonempty complement $[0, 1] \setminus F$ is a subset of $[0, 1]$ that is open relative to $[0, 1]$ and has measure zero. However, every nonempty subset of $[0, 1]$ that is open relative to $[0, 1]$ must contain a nonempty open subinterval, hence it has positive measure.
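The estimate in Concluding Remark 2 can be explored numerically for a finite part of the construction. The following Python sketch is an illustration only, with choices that are ours and not the book's: the rationals of $[0, 1]$ are enumerated by increasing denominator, and $\varepsilon_n := \varepsilon/2^{n+2}$, so that $\sum_{n} 2\varepsilon_n = \varepsilon/2 < \varepsilon$. It computes the exact length of the union of the first $N$ removed intervals (clipped to $[0, 1]$) and confirms that it stays below $\varepsilon$. A finite truncation says nothing about interior points, of course; only the measure bound is being checked.

```python
from fractions import Fraction


def rationals_in_unit_interval(count):
    """First `count` rationals of [0, 1], listed by increasing denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def union_length(intervals):
    """Total length of a finite union of intervals (sort, merge, add)."""
    total, right = Fraction(0), None
    for a, b in sorted(intervals):
        if right is None or a > right:      # disjoint from what came before
            total += b - a
            right = b
        elif b > right:                     # overlaps: count only the new part
            total += b - right
            right = b
    return total


eps = Fraction(1, 10)
rs = rationals_in_unit_interval(200)
removed_intervals = []
for n, r in enumerate(rs, start=1):
    e_n = eps / Fraction(2) ** (n + 2)      # our choice of epsilon_n
    removed_intervals.append((max(r - e_n, Fraction(0)), min(r + e_n, Fraction(1))))

removed = union_length(removed_intervals)
print(float(removed), float(1 - removed))   # removed length, length kept
```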
Chapter 4
Functions
This chapter deals with the basic concepts and results in continuity and differentiability of real-valued functions on the real line, together with their various applications.
4.1 Functions on Real Numbers
4.1.1 Introduction
Functions (also called "mappings") have been used in previous chapters. A function $f$ describes a way to assign to any element $x$ in a given set $X$ an element $y$ (denoted by $f(x)$) in a given set $Y$. As was mentioned in Sect. 1.1, the set $X$ is called the domain of $f$ (denoted $D(f)$), and the set $\{f(x) : x \in D(f)\}$ the range of $f$ (denoted $R(f)$). It is also useful to understand a function as a "black box" that accepts inputs (the elements in $X$) and produces, for each input $x$, a certain output $f(x)$. This process is sometimes symbolized by the notational device $x \mapsto f(x)$. The input is also referred to as the independent variable (sometimes denoted simply as the variable), and the output as the dependent variable (it "depends" on $x$). If $f$ is a function from $X$ into $Y$, and $X_1$ is a subset of $X$, the function that to each $x_1 \in X_1$ associates the element $f(x_1) \in Y$ is said to be the restriction of $f$ to $X_1$, and is denoted by $f|_{X_1}$. On the other hand, if $f$ is a function from $X$ into $Y$, a set $Z$ is a superset of $X$, and $g$ is a function from $Z$ into $Y$ such that $f(x) = g(x)$ for all $x \in X$, we say that $g$ extends $f$ (to $Z$), or that $g$ is an extension of $f$ to $Z$. A way to represent a function as the action on elements in $X$ giving elements in $Y$ is to write $f : X \to Y$, or
$$X \xrightarrow{\ f\ } Y.$$
© Springer International Publishing Switzerland 2015 V. Montesinos et al., An Introduction to Modern Analysis, DOI 10.1007/978-3-319-12481-0_4
Fig. 4.1 The graph of the function $x^2 - x + 1$ on $[0, 1]$
Fig. 4.2 The graph of the function (x − 1)/(x + 1) on [−10, 10]
In particular, we are interested in functions $f$ having a domain $D(f) \subset \mathbb{R}$ and a range $R(f) \subset \mathbb{R}$. Sometimes we refer to this situation by saying that $f$ is a real-valued (since the range is a subset of $\mathbb{R}$) function of a real variable (since the domain is a subset of $\mathbb{R}$). In some cases, the rule that associates to an element $x$ an element $y$ is given by an algebraic expression (a formula, as in $f(x) = x^2 - x + 1$), although many times this expression is not algebraic and the action of $f$ must be described by more elaborate procedures. Functions in applications may have a very complicated form, and computer aid may be necessary to obtain some insight into the behavior of a given function (either by plotting its graph, or by evaluating it at significant points in its domain). There is a natural way to represent a real-valued function of a real variable in a plane (after all, $f$ is nothing but a certain subset of the Cartesian product $\mathbb{R} \times \mathbb{R}$). The graph of a function $f : D(f) \to \mathbb{R}$ is a subset of $\mathbb{R} \times \mathbb{R}$, precisely $\operatorname{graph} f := \{(x, f(x)) : x \in D(f)\} \subset \mathbb{R} \times \mathbb{R}$. For example, Fig. 4.1 shows the graph of the function $f(x) = x^2 - x + 1$ ($= (x - 1/2)^2 + 3/4$) on the interval $[0, 1]$. For a second example, consider the following function $f$ defined on $D(f) := \{x \in \mathbb{R} : x \neq -1\}$ and having range $R(f) := \{y \in \mathbb{R} : y \neq 1\}$: to each $x \in D(f)$, the function $f$ associates the real number $f(x) := \frac{x-1}{x+1}$. For the graph of the function on the interval $[-10, 10]$, see Fig. 4.2. Check also Exercise 13.162.
Definition 293 Consider a function $f$ with domain $D(f)$ and range $R(f)$. Let $g$ be a function such that $R(f) \subset D(g)$, i.e.,
$$D(f) \xrightarrow{\ f\ } R(f) \subset D(g) \xrightarrow{\ g\ } R(g).$$
Then the composition of $f$ and $g$ is the function $g \circ f : D(f) \to R(g)$ (also denoted $g(f)$) given by $(g \circ f)(x) = g(f(x))$ for every $x \in D(f)$.
For example, if $g(x) = x^2$ and $f(x) = 2x + 1$ for $x \in \mathbb{R}$, then $(g \circ f)(x) = (2x + 1)^2$ for $x \in \mathbb{R}$. The concepts of one-to-one function and of onto function were introduced in Sect. 1.1.
Definition 294 Let $f$ be a function with domain $D(f)$ and range $R(f)$. Assume that $f$ is one-to-one. The inverse function for $f$, denoted by $f^{-1}$, is the function from $R(f)$ to $D(f)$ so that $(f \circ f^{-1})(y) = y$ for all $y \in R(f)$, and $(f^{-1} \circ f)(x) = x$ for all $x \in D(f)$. Schematically,
$$D(f) \xrightarrow{\ f\ } R(f), \qquad D(f) \xleftarrow{\ f^{-1}\ } R(f).$$
For an example, let us consider again the function $f(x) := \frac{x-1}{x+1}$ from $D(f) := \{x \in \mathbb{R} : x \neq -1\}$ onto $R(f) := \{y \in \mathbb{R} : y \neq 1\}$, whose graph appears in Fig. 4.2. It is simple to show that $f$ is one-to-one, and that its inverse function $f^{-1} : R(f) \to D(f)$ is given by $f^{-1}(y) = \frac{1+y}{1-y}$, $y \in R(f)$ (see Exercise 13.162).
Having a simple algebraic formula for our function is no guarantee of an immediate way of obtaining an output from an input. Let us define $f$ on $\mathbb{R}$ by $f(x) = x^2$. For this function, the output $y$ is obtained from the input $x$ simply by squaring the input. For some inputs $x$ we can readily describe the output $y$. For example, if $x = \sqrt{2}$ then $y = 2$. For all rational numbers $x = \frac{p}{q}$ the output is $y = \frac{p^2}{q^2}$. For other values of $x$ the output $y$ might be more difficult to establish. Let $x = 0.110100010000000100000000000000010\ldots$ (base 2), where the digits are all zero except when the position in the expansion is a power of two (then the digit is 1). Note that $x$ is irrational (see Theorem 20). We have
(see Exercise 13.139, which uses the theory of series and, in particular, the product of series; see Sect. 2.4.7) x² = 0.10101010101000101010001000000010101000100000001000…(base 2).

In other cases we have to describe a function on real numbers by a rule. A typical example, which will appear often, is the so-called characteristic function of a subset of R.

Definition 295 Let A be a subset of R. The function on R defined by χ_A(x) = 1 if x ∈ A, and χ_A(x) = 0 otherwise, is called the characteristic or the indicator function of the set A (see Fig. 4.3).

Fig. 4.3 The characteristic function of the set A

The following function, named after J. P. G. Lejeune Dirichlet, is important in many branches of mathematics.

Definition 296 (Dirichlet’s function) The Dirichlet function on R is χ_P, i.e., the characteristic function of the set of all irrational numbers. This function will be denoted by D.

This function cannot be plotted or realized on computer systems, as computers can only record subsets of the rational numbers. Naturally, functions that can be described by an algebraic formula are convenient to work with. The bulk of mathematical modelling is accomplished with these functions alone. Functions whose rules are defined solely by multiplication and addition are especially nice. We call them polynomials (see Definition 297 below). We will be able (see Theorem 490 below) to approximate a large class of functions by polynomials.

Definition 297 A function p : R → R is called a polynomial function (or just a polynomial) if for all x ∈ R, p(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_0, where n is a natural number, called the degree of the polynomial p if a_n ≠ 0. Here a_k ∈ R for all k ∈ {0, 1, . . ., n}, and we refer to these numbers as the coefficients of the polynomial p.
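Returning to the binary example above, the leading digits of x² can be checked with exact rational arithmetic. The following is only a sketch: the helper name `binary_digits` and the truncation level `TERMS` are assumptions made for illustration, and x is replaced by the finite sum of its first seven nonzero digits. The neglected tail is smaller than 2⁻¹²⁷, so the leading digits printed here are not affected by the truncation.

```python
from fractions import Fraction


def binary_digits(value, count):
    """Return the first `count` binary digits of a rational number in [0, 1)."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = 1 if value >= 1 else 0
        digits.append(str(bit))
        value -= bit
    return "".join(digits)


# Hypothetical truncation level: keep the terms 2**-(2**k) for k = 0, ..., 6,
# i.e., the digits 1 at positions 1, 2, 4, 8, 16, 32 and 64.
TERMS = 7
x_truncated = sum(Fraction(1, 2 ** (2 ** k)) for k in range(TERMS))

x_digits = binary_digits(x_truncated, 33)
square_digits = binary_digits(x_truncated ** 2, 35)

print("x   = 0." + x_digits + "...")
print("x^2 = 0." + square_digits + "...")
```

Exact fractions are used instead of floating-point numbers because a double-precision float carries only 53 significant binary digits, which is too few to trust a long binary expansion.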
Fig. 4.4 The function x² is even, the function x³ is odd
Definition 298 A function f is called a rational function if f = p/q, where p and q are polynomials.

Remark 299 The domain of a rational function is D(f) = {x ∈ R : q(x) ≠ 0}, in general a proper subset of R. ®

Real-valued functions of a real variable are objects that can be manipulated algebraically. More precisely, we can define the sum and the product of two such functions in a natural way: if f : D(f) → R and g : D(g) → R are functions, we define f + g : D(f) ∩ D(g) → R and fg : D(f) ∩ D(g) → R by (f + g)(x) = f(x) + g(x) and (fg)(x) = f(x)g(x), respectively, for all x ∈ D(f) ∩ D(g). The quotient f/g is defined analogously; however, its domain is restricted to the set D(f) ∩ D(g) ∩ {x ∈ D(g) : g(x) ≠ 0}.

The following notion was already introduced for the class of real-valued functions having N as domain (i.e., sequences of real numbers), see Definition 134.

Definition 300 A real-valued function f defined on an interval I is called increasing (strictly increasing) if f(x) ≤ f(y) (respectively, f(x) < f(y)) whenever x < y in I. The function f is called decreasing (strictly decreasing) if f(x) ≥ f(y) (respectively, f(x) > f(y)) whenever x < y in I. A function that is either increasing or decreasing (either strictly increasing or strictly decreasing) is called monotone (respectively, strictly monotone).

Functions that exhibit some kind of symmetry have special names:

Definition 301 Let a > 0, and let f : [−a, a] → R be a function. We say that f is even (odd) if f(x) = f(−x) (respectively, if f(x) = −f(−x)) for all x ∈ [−a, a].

Figure 4.4 exhibits an even and an odd function.

Definition 302 A real-valued function f of a real variable is said to be bounded if there exist real numbers M and N such that N ≤ f(x) ≤ M for all x ∈ D(f).

For example, the function f(x) := 1/x, defined on [1, +∞), is a bounded function (bounded above by 1, and below by 0). When f(x) := 1/x is defined on (0, 1) it
becomes an unbounded function: It is still bounded below by 0; however, it is not bounded above. Indeed, if M > 0 and x ∈ (0, 1) satisfies 0 < x < 1/M, we get f (x) > M. For a plot of this function see Fig. 4.10, where it is plotted on [−10, 0) ∪ (0, 10].
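As a quick check of the composition and inverse-function examples given earlier in this section (Definitions 293 and 294), the identities (f⁻¹ ◦ f)(x) = x and (f ◦ f⁻¹)(y) = y can be tested on a finite sample of rational points. This is a minimal sketch, not a proof; the names `compose`, `f_inverse`, `affine` and the choice of sample points are assumptions made for illustration.

```python
from fractions import Fraction


def f(x):
    """f(x) = (x - 1)/(x + 1), defined for x != -1."""
    return (x - 1) / (x + 1)


def f_inverse(y):
    """Candidate inverse (1 + y)/(1 - y), defined for y != 1."""
    return (1 + y) / (1 - y)


def g(x):
    """g(x) = x^2."""
    return x * x


def affine(x):
    """The function x -> 2x + 1 from the composition example."""
    return 2 * x + 1


def compose(outer, inner):
    """Return the composition outer ∘ inner."""
    return lambda x: outer(inner(x))


# Exact rational sample points; n = -7 is skipped because it gives x = -1,
# and n = 7 is skipped because it gives y = 1.
domain_samples = [Fraction(n, 7) for n in range(-30, 31) if n != -7]
range_samples = [Fraction(n, 7) for n in range(-30, 31) if n != 7]

left_identity = all(compose(f_inverse, f)(x) == x for x in domain_samples)
right_identity = all(compose(f, f_inverse)(y) == y for y in range_samples)

print(left_identity, right_identity)
print(compose(g, affine)(3))  # (2*3 + 1)^2
```

Working with `Fraction` keeps every comparison exact, so a failed identity would indicate a wrong formula rather than a rounding artifact.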
4.1.2 The Limit of a Function
The following is a basic concept in the theory of functions. The modern epsilon-delta definition was introduced by B. Bolzano in 1817. Later on, A. L. Cauchy considered the same idea in his Cours d’Analyse (1821). The way it is formulated today is due to K. Weierstrass.

Definition 303 Let f : D(f) (⊂ R) → R be a real-valued function of a real variable. Let x0 ∈ R be an accumulation point of D(f) (not necessarily an element of D(f)) and let L ∈ R. Then we say that the function f has a limit L at x0 (and we write limx→x0 f(x) = L) whenever the following holds: For every ε > 0 there exists δ (= δ(x0, ε)) > 0 such that

|f(x) − L| < ε whenever x ∈ D(f) and 0 < |x − x0| < δ. (4.1)
Remark 304 1. The ingenuity of the definition lies in the fact that no rule of dependence of δ on ε is a priori prescribed (in other terms, the limit exists precisely when a winning strategy for player 2 in our previous discussion of the concept of the limit of a sequence—that can be translated here with minor modifications—exists; it will depend on the particular form of the function). 2. The requirement that x0 ∈ R is an accumulation point of D(f ) is made in order to avoid checking (4.1) on an empty set. Indeed, in another case, we may have {x ∈ D(f ) : 0 < |x − x0 | < δ} = ∅ for small δ > 0 (this may happen even in case that x0 ∈ D(f ): Note the strict inequality 0 < |x − x0 | in the formula above). 3. Note that a function may have at most one limit at a given accumulation point x0 of D(f ): Indeed, assume that limx→x0 f (x) = L1 and limx→x0 f (x) = L2 . Then, given ε > 0 we may find δi > 0 such that xi ∈ D(f ) and 0 < |xi − x0 | < δi imply that |f (xi ) − Li | < ε, for i = 1, 2. Let δ := min{δ1 , δ2 }. Take x ∈ D(f ) such that 0 < |x − x0 | < δ. Then we have |f (x) − Li | < ε for i = 1, 2. In particular, |L1 − L2 | ≤ |L1 − f (x)| + |f (x) − L2 | < ε + ε = 2ε. Since ε > 0 was arbitrary, we obtain |L1 − L2 | = 0, i.e., L1 = L2 . 4. Note that the Dirichlet function D, introduced in Definition 296, has a limit at no point in R. Indeed, assume that limx→x0 D(x) = L for some x0 ∈ R and L ∈ R. Fix ε = 1/4 and find δ > 0 according to the definition of limit. Find, thanks to Proposition 85, a rational number r and an irrational number x such
that 0 < |r − x0| < δ and 0 < |x − x0| < δ. Then (|L| =) |D(r) − L| < ε, and (|1 − L| =) |D(x) − L| < ε. It follows that 1 ≤ |1 − L| + |L| < ε + ε = 2ε = 1/2, a contradiction.
5. At first glance it may look strange that we exclude precisely the point x0 when checking the existence and the value of limx→x0 f(x). However, this is at the root of the definition. We are interested in a tendency, not in the value f(x0), which may not exist at all (x0 is not forced to be in D(f)), or which may not coincide with L (see Fig. 4.5).
6. We note that the value of δ in Definition 303 generally depends on the chosen input value x0 as well as on the quantity ε. Except for constant functions, the dependence on ε is unavoidable. Example 4.1.3.2 below helps to understand that, in general, there is a dependence on the input value. In order to single out the special situation where δ does not depend on the input value x0, the concept of uniform continuity (see Definition 343) is introduced later.
7. Although we already mentioned this in item 5 above, it is important to stress that it is possible and desirable, as long as x0 is an accumulation point of D(f), to define the limit of a function at points x0 that are not in the domain of f.
8. Note that |x − a| < ε means that the distance from x to a is less than ε, i.e., a − ε < x < a + ε or, in other terms, that x ∈ (a − ε, a + ε).
9. Note that if we find δ for some ε0 in the definition of limit, the same δ is then good for all ε ≥ ε0. Thus it is enough to find δ’s only for, say, ε < 1/100. This may be used in practical problems. ®

Fig. 4.5 The limit of a function f at a point x0 may be different from f(x0)
Examples 305 Although Chap. 13 contains several exercises on computing limits of functions (see, e.g., Exercises 13.164–13.169), some examples at this stage may help to clarify the meaning of Definition 121.
1. The function f(x) := x² + x is defined on R, and 1 is certainly an accumulation point of R. We claim that L := limx→1 (x² + x) exists with value L = 2 (a value guessed at this moment by estimation: if x = 1.1, then f(x) = 2.31; if x = 1.01, then f(x) = 2.0301, etc.) (see also Fig. 4.6). Indeed, fix ε > 0 and put δ := min{ε/4, 1}¹. By taking x = 1 + h, where 0 < |h| < δ (i.e., 0 < |x − 1| < δ), we get |f(x) − 2| = |f(1 + h) − 2| = |3h + h²| ≤ 3|h| + |h|² < 3δ + δ² ≤ 3δ + δ = 4δ ≤ ε. This proves the claim.
¹ We remark that the proposed formula for δ originates in trying to estimate |f(x) − 2| for x close to 1, usually achieved by performing some rough work on the algebraic expressions. Quite often, the guess is obtained by working backwards. This method does require some practice; we will touch on it again and again (see, e.g., Example 4.1.3.2), including in some of the proposed exercises.

Fig. 4.6 The function f in Example 4.1.2.1

Fig. 4.7 The signum function (Eq. (4.2))

2. The signum (or sign) function (see Fig. 4.7) is defined by

sign(x) = −1 if x < 0, sign(x) = 1 if x > 0, and sign(x) = 0 if x = 0. (4.2)

We shall prove that limx→0 sign(x) does not exist. To this end, observe first that the sign function is defined on R, and that 0 is certainly an accumulation point of R. Assume for a moment that the limit exists with value L. Take ε = 1/2. Find δ > 0 such that |sign x − L| < 1/2 for 0 < |x| < δ. For 0 < x < δ this gives |1 − L| < 1/2, and for −δ < x < 0 we get |−1 − L| < 1/2. Using Remark 304.8 we thus have that the distance from L to 1 is less than 1/2 and the distance from L to −1 is less than 1/2. So L ∈ (1/2, 3/2), and at the same time L ∈ (−3/2, −1/2). This is a contradiction.
3. limx→0 xD(x) = 0, where D is the Dirichlet function (see Definition 296). Again, the function f(x) := xD(x) is defined on R, and 0 is an accumulation point of R. Observe that D is a bounded function (indeed, |D(x)| ≤ 1 for all x ∈ R). Given ε > 0, take δ := ε. Then we have, for 0 < |x| < δ, |xD(x) − 0| ≤ |x| < δ = ε, and the conclusion follows.
4. limx→1 x⁴ = 1. Indeed, the function f(x) := x⁴ is defined on R, and 1 is certainly an accumulation point of R. Using Remark 304.9, given ε ∈ (0, 1) take δ := ε/15. Use the finite binomial expansion (2.40) or, more generally, (13.3) in Exercise 13.10, to get, for 0 < |h| < δ, and using that 0 < δ < 1 (hence |h|² < |h|, |h|³ < |h|, and |h|⁴ < |h|),

|f(1 + h) − 1| = |4h + 6h² + 4h³ + h⁴| ≤ |4h| + |6h²| + |4h³| + |h⁴| < (4 + 6 + 4 + 1)|h| < 15δ = ε,

and the conclusion follows. ♦
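The choices of δ made in the examples above (δ = min{ε/4, 1} for x² + x at the point 1, and δ = ε/15 for x⁴ at the point 1 when ε < 1) can be tested on a finite grid of points. The sketch below is only a sanity check under stated assumptions, not a proof: the function names and the grid size `steps` are hypothetical, and a finite grid can never replace the estimate valid for all x.

```python
from fractions import Fraction


def quadratic(x):
    """f(x) = x^2 + x, with limit 2 at x0 = 1."""
    return x * x + x


def quartic(x):
    """f(x) = x^4, with limit 1 at x0 = 1."""
    return x ** 4


def delta_for_quadratic(eps):
    """The choice delta = min{eps/4, 1} for x^2 + x at x0 = 1."""
    return min(eps / 4, Fraction(1))


def delta_for_quartic(eps):
    """The choice delta = eps/15 for x^4 at x0 = 1, intended for eps in (0, 1)."""
    return eps / 15


def delta_works(f, x0, limit, eps, delta, steps=200):
    """Check |f(x) - limit| < eps on a finite grid of x with 0 < |x - x0| < delta."""
    for k in range(1, steps):
        h = delta * Fraction(k, steps)  # 0 < h < delta
        for x in (x0 - h, x0 + h):
            if abs(f(x) - limit) >= eps:
                return False
    return True


for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    print(eps,
          delta_works(quadratic, 1, 2, eps, delta_for_quadratic(eps)),
          delta_works(quartic, 1, 1, eps, delta_for_quartic(eps)))

# A delta chosen too generously fails the same test.
print(delta_works(quadratic, 1, 2, Fraction(1, 10), Fraction(1)))
```

The last line shows why δ must shrink with ε: with ε = 1/10 and δ = 1 the grid contains points where |f(x) − 2| is close to 4.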
We will gradually develop many techniques for simplifying the calculation of limits of functions in the next pages and in the exercises (see, in particular, Sect. 13.4.1).

The definition of limit is extended to cover the possibility that x0 = +∞, x0 = −∞, L = +∞, or L = −∞. For example, if D(f) is not bounded above, we say that limx→+∞ f(x) = L ∈ R if the following holds: For every ε > 0 there exists α = α(ε) ∈ R such that |f(x) − L| < ε whenever x ∈ D(f) and x > α(ε). Another example: If x0 is an accumulation point of D(f), we say that limx→x0 f(x) = +∞ if, for every β ∈ R, there exists δ = δ(β) > 0 such that f(x) > β whenever x ∈ D(f) and 0 < |x − x0| < δ(β). Examples of these situations will be considered in Remark 378, (vi) in Proposition 531, Remark 534, the paragraph following Definition 877, the proof of Proposition 416, and Exercises 13.164, 13.222, 13.232, 13.236, 13.248, 13.250, and 13.321.

One-Sided Limits

Another extension of the definition of limit considers one-sided limits. Those are defined as follows.

Definition 306 Let f : D(f) (⊂ R) → R be a function. If x0 ∈ D(f) satisfies that for some δ0 > 0 we have (x0 − δ0, x0) ⊂ D(f), then we say that L := limx→x0− f(x) (where L is a real number) if for every ε > 0 there exists δ ∈ (0, δ0) such that |f(x) − L| < ε whenever x0 − δ < x < x0, and we write f(x0−) := L. We say that limx→x0− f(x) = +∞ if for every M > 0 there exists δ ∈ (0, δ0) such that f(x) > M whenever x0 − δ < x < x0, and we write f(x0−) = +∞. Analogously for limx→x0− f(x) = −∞, writing f(x0−) = −∞.

If x0 ∈ D(f) satisfies that for some δ0 > 0 we have (x0, x0 + δ0) ⊂ D(f), then we say that L := limx→x0+ f(x) (where L is a real number) if for every ε > 0 there exists δ ∈ (0, δ0) such that |f(x) − L| < ε whenever x0 < x < x0 + δ, and we write f(x0+) := L. We say that limx→x0+ f(x) = +∞ if for every M > 0 there exists δ ∈ (0, δ0) such that f(x) > M whenever x0 < x < x0 + δ, and we write f(x0+) = +∞.
Analogously for limx→x0+ f(x) = −∞, writing f(x0+) = −∞.

Proposition 307 Let f be a function defined on an open interval I and let x0 ∈ I. Then limx→x0 f(x) exists if and only if both limits in the next line exist and

limx→x0− f(x) = limx→x0+ f(x).

If this is the case, then limx→x0 f(x) = limx→x0− f(x) = limx→x0+ f(x).

Proof Assume that both one-sided limits exist, and L := limx→x0− f(x) = limx→x0+ f(x). Then, given ε > 0, there exist δ−(x0, ε) and δ+(x0, ε) as in Definition 306 (the first one for the left-sided and the second one for the right-sided limit). It is enough to take δ = min{δ−(x0, ε), δ+(x0, ε)} to conclude that |f(x) − L| < ε for every x ∈ D(f) with 0 < |x − x0| < δ. The converse is obvious. □

Remark 308 Proposition 307 is also used as follows: If the one-sided limits of a function at a point x0 both exist and are different, the limit limx→x0 f(x) does not exist. As a particular example of this situation, consider again Example 4.1.2.2 above: At x0 = 0, both limits limx→0+ sign x (= 1) and limx→0− sign x (= −1) exist and differ. So the function sign has no limit as x → 0. ®

The following result ensures that monotone functions have one-sided limits.

Proposition 309 Let f : I → R be a monotone function defined on an open interval I ⊂ R. Then both limits limx→a− f(x) and limx→a+ f(x) exist (as real numbers) for every a ∈ I. If, moreover, f is bounded on I, then finite one-sided limits exist at the endpoints of the interval (possibly at −∞ or +∞).

Proof Without loss of generality, we may assume that f is increasing. The set A := {f(x) : x ∈ I, x < a} is bounded above by f(a), and the set B := {f(x) : x ∈ I, x > a} is bounded below by f(a). Therefore, by Theorem 45, A has a supremum, say L, and B has an infimum, say R. We shall prove that limx→a− f(x) = L. Indeed, fixing ε > 0, we can find x0 ∈ I with x0 < a such that f(x0) > L − ε. Since f is increasing, we have L − ε < f(x0) ≤ f(x) ≤ L for every x ∈ (x0, a). Since ε > 0 was arbitrary, this proves that limx→a− f(x) = L. Analogously, we can prove that limx→a+ f(x) = R.

Assume now that f is bounded on I, and let L := inf{f(x) : x ∈ I} and R := sup{f(x) : x ∈ I}. The argument to prove the existence of the one-sided limit at the left endpoint of I (possibly −∞), with value L, and at the right endpoint of I (possibly +∞), with value R, is similar to the one in the first paragraph. □

Limit Superior and Limit Inferior

Another related concept that will be used throughout this book is that of the limit superior and the limit inferior of a function at a point. This concept was already introduced for sequences of real numbers in Definition 137.

Definition 310 Let f : D(f) (⊂ R) → R be a function and let x0 ∈ R be an accumulation point of D(f).
If for some n0 ∈ N, we have supx∈D(f); 0 […] 0 such that |f(x) − f(x0)| < ε whenever x ∈ D(f) and |x − x0| < δ. A function f that fails to be continuous at x0 is said to be discontinuous at x0. A function that is continuous at every point of its domain is called an (everywhere) continuous function. If D(f) := [a, b] is a closed interval in R, the continuity of f at, say, a, is referred to as one-sided continuity. Figure 4.9 depicts the graph of a function that is continuous at some point x0 and the graph of a function that is discontinuous there.
Fig. 4.9 At x0, f is continuous, g discontinuous

Fig. 4.10 The function 1/x on the interval [−10, 0) ∪ (0, 10] (the range limited to [−10, 10])
Remark 317 Let f : D(f) (⊂ R) → R be a function. Assume that x0 ∈ D(f) is an accumulation point of D(f). Then it follows from Definition 316 that f is continuous at x0 if and only if limx→x0 f(x) exists and has value f(x0). Assume now that x0 ∈ D(f) is an isolated point of D(f), i.e., there exists δ > 0 such that (x0 − δ, x0 + δ) ∩ D(f) = {x0} (see Definition 81). It follows from Definition 316 that f is continuous at x0. Indeed, given ε > 0, take δ(x0, ε) := δ; if x ∈ D(f) and |x − x0| < δ, we have, necessarily, x = x0, hence (0 =) |f(x) − f(x0)| < ε. ®

Examples 318 We now list some examples of continuous and discontinuous functions.
1. We shall see later that every polynomial function is continuous everywhere (see Corollary 330).
2. The function f(x) = 1/x is continuous on its domain D(f) = R \ {0} (see Fig. 4.10). Although this result will follow from Proposition 329 below, let us give here a direct argument. Let us prove that f is continuous at a ≠ 0. Assume without loss of generality that a > 0. Let ε > 0. Choose δ > 0 so that

δ < min{εa²/2, a/2}.

If |x − a| < δ then x > a/2, and we have

|1/x − 1/a| = |x − a|/(xa) ≤ 2|x − a|/a² < 2δ/a² < ε.
This proves the continuity of f at a. Note that the choice of δ depended heavily on the input value a. We shall prove later (Theorem 344) that, for continuous functions defined on closed and bounded subsets of R, the value δ can be made independent of the point a in the domain (although, certainly, it still depends on ε). We shall practice in a few exercises later (see, e.g., Exercises 13.172, 13.179, 13.193, 13.194, 13.199) how to choose δ(x0, ε) in the definition of continuity.
3. The Dirichlet function D introduced in Definition 296 is discontinuous everywhere. This follows directly from the definition: Indeed, given x ∈ R and δ > 0 we can always find y ∈ R such that |y − x| < δ and |D(x) − D(y)| = 1. It is also a consequence of Example 4.1.2.4 and Remark 317. See Exercise 13.185, too. ♦

We may give a condition equivalent to the continuity of a function at a point in terms of lim sup and lim inf (see Definition 310), a straightforward consequence of Proposition 313. Additionally, the equivalence between (i) and (ii) below is, in fact, Remark 317, so we shall also omit its proof.

Proposition 319 Let f : D(f) (⊂ R) → R be a bounded function, and let x0 ∈ D(f) be an accumulation point of D(f). Then the following statements are equivalent:
(i) f is continuous at x0.
(ii) The limit limx→x0 f(x) exists, and it has value f(x0).
(iii) lim infx→x0 f(x) = lim supx→x0 f(x) = f(x0).

A consequence of Proposition 314 is that continuity can be checked by using sequences. Since this is a fact that will be used often, we isolate it in Proposition 320. It follows straightforwardly from Proposition 314, so we omit its proof. See also Proposition 328, where continuity of a function is checked by using sequences from a given dense subset of its domain.

Proposition 320 A function f : D(f) → R is continuous at x if and only if, for every sequence {xn} in D(f) such that limn→∞ xn = x, we have limn→∞ f(xn) = f(x).
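To see concretely how the admissible δ for f(x) = 1/x shrinks as the point a approaches 0 (Example 318.2), one can evaluate the bound δ < min{εa²/2, a/2} for several values of a. The sketch below is illustrative only: taking half of the minimum is just one arbitrary admissible choice, and the names `delta_for_reciprocal` and `check` are hypothetical.

```python
from fractions import Fraction


def delta_for_reciprocal(a, eps):
    """One admissible delta for f(x) = 1/x at a > 0: half of min{eps*a^2/2, a/2}."""
    return min(eps * a * a / 2, a / 2) / 2


def check(a, eps, steps=100):
    """Test |1/x - 1/a| < eps on a finite grid of points x with |x - a| < delta."""
    delta = delta_for_reciprocal(a, eps)
    for k in range(-steps + 1, steps):
        x = a + delta * Fraction(k, steps)  # |x - a| < delta, and x > a/2 > 0
        if abs(1 / x - 1 / a) >= eps:
            return False
    return True


eps = Fraction(1, 10)
for a in (Fraction(1), Fraction(1, 10), Fraction(1, 100)):
    print(a, delta_for_reciprocal(a, eps), check(a, eps))
```

For fixed ε the printed values of δ decrease like a², which is the dependence on the input value that uniform continuity is designed to rule out.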
Corollary 321 If two real-valued continuous functions f and g with a common domain D ⊂ R agree on a dense subset S of D, then f = g on D.

Proof Fix x ∈ D. Find a sequence {sn} in S such that sn → x, so f(sn) → f(x) and g(sn) → g(x). Since f(sn) = g(sn) for all n ∈ N, we obtain f(x) = g(x). □

In particular, if two continuous functions defined on R agree on the dyadic numbers, then the two functions must be equal. Therefore, plots of continuous functions on computers reflect their true behavior, unlike plots of the Dirichlet function D introduced in Definition 296. Function values at irrational numbers can be approximated by function values at dyadic numbers (or at the points of any other dense subset).

Definition 322 Let a real-valued function f be defined on an open interval I (⊂ R). We say that f has a jump discontinuity at a point a ∈ I if limx→a− f(x) and limx→a+ f(x) both exist and they are not equal.
Fig. 4.11 The preimage of (1, 2) by f(x) := x² (Remark 324)
The signum function (see its definition in Eq. (4.2) and its graph in Fig. 4.7) has a jump discontinuity at x = 0.

Definition 323 Let f : D(f) (⊂ R) → R be a function. Let U be a subset of R. By the preimage of U under f (denoted by f⁻¹(U)) we understand the set f⁻¹(U) := {x ∈ D(f) : f(x) ∈ U}.

Remark 324 The symbol f⁻¹(U) should not suggest that the inverse function exists. In fact, it may happen that the function f is not one-to-one. As an illustration, consider first the function f(x) = x² defined on R. This function is not one-to-one. The preimage of U := (1, 2) is the set f⁻¹(U) = (−√2, −1) ∪ (1, √2) (see Fig. 4.11). Second, let D be the Dirichlet function introduced in Definition 296. This function is again not one-to-one. Let U := {0}. Then D⁻¹(U) = Q. ®

There is a useful characterization of continuity in terms of preimages. Precisely, the following holds.

Proposition 325 Let f : D(f) (⊂ R) → R be a function. Then f is continuous at x0 ∈ D(f) if and only if the preimage of every neighborhood of f(x0) in R is a neighborhood of x0 relative to D(f).

Proof Suppose that f is continuous at x0, and let V be a neighborhood of f(x0). Let ε > 0 be such that (f(x0) − ε, f(x0) + ε) ⊂ V (such ε exists since V is a neighborhood of f(x0)). By the definition of continuity, there exists δ > 0 such that x ∈ D(f) and |x − x0| < δ together imply that |f(x) − f(x0)| < ε. This shows that (x0 − δ, x0 + δ) ∩ D(f) ⊂ f⁻¹(V), hence f⁻¹(V) is a neighborhood of x0 relative to D(f).

Conversely, assume that the preimage of every neighborhood of f(x0) is a neighborhood of x0 relative to D(f). Let ε > 0 be given. The set I := (f(x0) − ε, f(x0) + ε) is a neighborhood of f(x0), hence f⁻¹(I) is a neighborhood of x0 relative to D(f). This implies that there exists δ > 0 such that (x0 − δ, x0 + δ) ∩ D(f) ⊂ f⁻¹(I). In particular, if x ∈ D(f) and |x − x0| < δ, then |f(x) − f(x0)| < ε, which means that f is continuous at x0, since ε > 0 was arbitrary. □
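The preimage computed in Remark 324, namely f⁻¹((1, 2)) = (−√2, −1) ∪ (1, √2) for f(x) = x², can be compared with the definition f⁻¹(U) = {x : f(x) ∈ U} on a grid of sample points. This is a rough numerical sketch in floating-point arithmetic; the helper names and the grid are assumptions made for illustration, and agreement on a grid is of course not a proof.

```python
import math


def in_preimage(f, x, lower, upper):
    """True when x belongs to the preimage of the open interval (lower, upper)."""
    return lower < f(x) < upper


def in_claimed_set(x):
    """Membership in (-sqrt(2), -1) ∪ (1, sqrt(2))."""
    return 1 < abs(x) < math.sqrt(2)


def square(x):
    return x * x


# Sample points -3.000, -2.999, ..., 3.000.
grid = [k / 1000 for k in range(-3000, 3001)]
agree = all(in_preimage(square, x, 1, 2) == in_claimed_set(x) for x in grid)
print(agree)
```

Note that the preimage has two pieces even though U is a single interval, which reflects the fact that x² is not one-to-one on R.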
Since a set is open if and only if it is a neighborhood of each of its points, we immediately get the following consequence of Proposition 325 (the version in brackets is obtained by taking complements).

Corollary 326 Let f : D(f) (⊂ R) → R be a function. Then f is continuous if and only if the preimage of every open (closed) subset of R is open (respectively, closed) relative to D(f).

Corollary 327 Let f : D(f) (⊂ R) → R be a continuous function. Then, if S is a dense subset of D(f), the set f(S) := {f(x) : x ∈ S} is dense in R(f) := {f(x) : x ∈ D(f)}.

Proof Let y ∈ R(f), y = f(x) for some x ∈ D(f), and let U be a neighborhood of y in R. By Proposition 325, the set f⁻¹(U) is a neighborhood of x relative to D(f). It follows that there exists s ∈ S such that s ∈ f⁻¹(U), hence f(s) ∈ U. This proves the assertion. □

A way to check the continuity of a given function by approximating from a dense subset of its domain appears in the following result (a more general result for functions with values in metric spaces will be given in Exercise 13.367).

Proposition 328 Let f : D(f) (⊂ R) → R be a function, and let D0 be a dense subset of D(f). Then f is continuous if and only if limd→x, d∈D0 f(d) = f(x) for every x ∈ D(f).

Proof That the condition is necessary is obvious. To prove sufficiency, fix x0 ∈ D(f). Given ε > 0, there exists an open neighborhood U(x0) of x0 such that |f(x0) − f(d)| < ε for every d ∈ U(x0) ∩ D0 (observe that U(x0) ∩ D0 ≠ ∅). For x ∈ U(x0) ∩ D(f) choose a neighborhood U(x) of x such that U(x) ⊂ U(x0) and |f(x) − f(d)| < ε for every d ∈ U(x) ∩ D0 (again, the set U(x) ∩ D0 is nonempty; moreover, it is a subset of U(x0) ∩ D0). Let d ∈ U(x) ∩ D0. Then |f(x) − f(x0)| ≤ |f(x) − f(d)| + |f(d) − f(x0)| < 2ε. Since x ∈ U(x0) ∩ D(f) was arbitrary, this shows that f is continuous at x0. □

Another way to formulate Proposition 328 is to say that f is continuous if and only if limn→∞ f(dn) = f(x) for every x ∈ D(f) and every sequence {dn} in D0 that converges to x.

Algebraic Operations with Continuous Functions

Continuity is stable under the usual operations on functions. In fact, we have the following result.

Proposition 329 Let f and g be real-valued functions on a common domain D ⊂ R, and let x0 ∈ D. Assume that f and g are continuous at x0. Then
(i) The function f + g is continuous at x0.
(ii) The function fg is continuous at x0.
(iii) The function f/g is continuous at x0 if g(x0) ≠ 0.
(iv) If h : R(f) → R is continuous at f(x0) (where R(f) denotes the range of f), then h ◦ f is continuous at x0.
(v) The function |f| is continuous at x0.
(vi) The functions max{f, g} and min{f, g} are both continuous at x0.
Proof Consider any sequence {xn} in D that converges to x0. Since f and g are continuous at x0 we have

limn→∞ f(xn) = f(x0) and limn→∞ g(xn) = g(x0).

Therefore, by Proposition 133,

limn→∞ (f(xn) + g(xn)) = f(x0) + g(x0) and limn→∞ f(xn)g(xn) = f(x0)g(x0).

Thus, by Proposition 320, f + g and fg are continuous at x0. This proves (i) and (ii). For proving (iii), note that if g(x0) ≠ 0, we may restrict ourselves to sequences {xn} such that g(xn) ≠ 0 for all n ∈ N. Then

limn→∞ f(xn)/g(xn) = f(x0)/g(x0),

and, again by Proposition 320, this proves (iii).
(iv) Consider a neighborhood U of h(f(x0)). Since h is continuous at f(x0), we have that h⁻¹(U) is a neighborhood of f(x0) relative to R(f) (see Proposition 325). Now, since f is continuous at x0, f⁻¹(h⁻¹(U)) is a neighborhood of x0, again by Proposition 325. This proves (iv).
(v) is a consequence of (iv), since the function y → |y| is continuous on R. Indeed, given x, x0 ∈ R we have |x| = |x − x0 + x0| ≤ |x − x0| + |x0|, hence |x| − |x0| ≤ |x − x0|. By reversing the roles of x and x0 we get |x0| − |x| ≤ |x − x0|, hence ||x| − |x0|| ≤ |x − x0|. This shows the continuity of y → |y| at any x0 ∈ R (see also Example 4.1.4.4 and Fig. 4.19).
(vi) follows from (v), since max{f, g} = ½(f + g + |f − g|) and min{f, g} = ½(f + g − |f − g|) (as can easily be checked by considering separately the cases f(x) ≥ g(x) and f(x) < g(x)). □

Corollary 330 Any polynomial is a continuous function.

Proof The function f(x) := x, for x ∈ R, is clearly continuous, as is any constant function. The result then follows from Proposition 329. □

Continuity and Connectedness

We may now complete the direct proof of the fact that a general interval in R is connected (Corollary 104). Connectedness was introduced in Definition 101.

Proposition 331 Let f : D(f) (⊂ R) → R be a continuous function. Assume that D(f) is connected. Then R(f) is also connected.

Proof Assume, on the contrary, that R(f) = U1 ∪ U2, where U1 and U2 are two disjoint nonempty subsets of R(f), both open relative to R(f). Then D(f) = f⁻¹(U1) ∪ f⁻¹(U2), and f⁻¹(U1) ∩ f⁻¹(U2) = ∅. By Corollary 326, f⁻¹(U1) and f⁻¹(U2) are open relative to D(f), and certainly each of them is nonempty. This contradicts the fact that D(f) was assumed to be connected. □
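The identities max{f, g} = ½(f + g + |f − g|) and min{f, g} = ½(f + g − |f − g|), used in the proof of item (vi) of Proposition 329, are easy to test pointwise. The sketch below builds the two combinations and compares them with Python's built-in `max` and `min` at exact rational sample points; the two sample functions and the helper names are arbitrary choices made for illustration.

```python
from fractions import Fraction


def pointwise_max(f, g):
    """Return x -> (f(x) + g(x) + |f(x) - g(x)|) / 2."""
    return lambda x: (f(x) + g(x) + abs(f(x) - g(x))) / 2


def pointwise_min(f, g):
    """Return x -> (f(x) + g(x) - |f(x) - g(x)|) / 2."""
    return lambda x: (f(x) + g(x) - abs(f(x) - g(x))) / 2


def f(x):
    return x * x - 1


def g(x):
    return 2 * x + 1


samples = [Fraction(n, 4) for n in range(-20, 21)]
max_ok = all(pointwise_max(f, g)(x) == max(f(x), g(x)) for x in samples)
min_ok = all(pointwise_min(f, g)(x) == min(f(x), g(x)) for x in samples)
print(max_ok, min_ok)
```

The point of the formulas is that they express max and min through sums, differences, and absolute values only, so items (i) and (v) apply directly.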
Fig. 4.12 The graph of f in [−10, 10] (proof of Corollary 332)
Corollary 332 Any general interval in R is connected.

Proof It is enough to prove that every interval (a, b), where a < b in R, is connected (see Corollary 104). There exists a continuous mapping from (−1, 1) onto any such interval (see the proof of Proposition 61). The conclusion, in view of Propositions 103 and 331, will follow as soon as the existence of a continuous mapping f from R onto (−1, 1) is guaranteed. This is accomplished, for example, by the mapping (see Fig. 4.12)

f(x) := 1 − 1/(2x) if x ≥ 1, f(x) := x/2 if x ∈ [−1, 1], and f(x) := −1 − 1/(2x) if x ≤ −1.

It is obvious that f is continuous (see Proposition 329), and that it maps R onto (−1, 1). □

Continuity and Optimization

Theorem 334 and its Corollary 335 below present a key result for optimization theory. Both are due to K. Weierstrass. It is the gateway to the familiar optimization theorems of calculus. The following definition introduces a basic concept.

Definition 333 We say that a bounded above (below) function f : D(f) (⊂ R) → R attains its maximum (respectively, minimum) if there exists x1 ∈ D(f) (respectively, x0 ∈ D(f)) such that f(x1) = sup{f(x) : x ∈ D(f)} (respectively, f(x0) = inf{f(x) : x ∈ D(f)}).

Theorem 334 (Weierstrass) Let f : K → R be a continuous function, where K is a compact subset of R. Then f(K) is compact.

Proof Let {yn} be a sequence in f(K). Then we can find a sequence {xn} in K such that f(xn) = yn for all n ∈ N. By Theorem 149, the sequence {xn} has a subsequence {xnk} that converges to some point x ∈ K. Since f is continuous, ynk = f(xnk) → f(x) (∈ f(K)). Again by Theorem 149, the set f(K) is compact. □
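Returning to the piecewise mapping used in the proof of Corollary 332, a few of its claimed properties can be spot-checked: the three branches agree at x = ±1, the values stay strictly inside (−1, 1), and the mapping is increasing. The code is a sketch on a finite sample and not a proof; the function name `squeeze` is hypothetical and does not come from the text.

```python
from fractions import Fraction


def squeeze(x):
    """Piecewise mapping from the proof of Corollary 332, sending R into (-1, 1)."""
    if x >= 1:
        return 1 - 1 / (2 * x)
    if x <= -1:
        return -1 - 1 / (2 * x)
    return x / 2


# At the junction points the outer branches take the values +1/2 and -1/2,
# which is exactly what the middle branch x/2 gives there.
print(squeeze(Fraction(1)), squeeze(Fraction(-1)))

samples = [Fraction(n, 3) for n in range(-300, 301)]
values = [squeeze(x) for x in samples]

inside = all(-1 < v < 1 for v in values)
increasing = all(u < v for u, v in zip(values, values[1:]))
print(inside, increasing)
```

Since the mapping is increasing and continuous, and its values approach −1 and 1 as x tends to −∞ and +∞, its range is the whole interval (−1, 1).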
Corollary 335 (Weierstrass) A real-valued continuous function f defined on a compact subset K of R is bounded, and attains its maximum as well as its minimum. Proof Since f (K) is compact, it is closed and bounded, by Theorem 96. The existence of M := sup f (K) in f (K) is then guaranteed (see Remark 98). This shows that there exists x1 ∈ K such that f (x1 ) = M.
The argument for the minimum is similar. An alternative (constructive) proof of the same result, imitating the proof of Theorem 334, is the following: If M = sup f(K), find a sequence {yn} in f(K) that converges to M. For n ∈ N find xn ∈ K such that f(xn) = yn. The sequence {xn} has, according to Theorem 149, a convergent subsequence {xnk}. Let x be its limit. Since K is closed, x ∈ K. Since f is continuous, we have ynk = f(xnk) → f(x); this shows that M = f(x). □

Remark 336
1. Still another proof of Corollary 335 can be found in Exercise 13.188.
2. The compactness assumption cannot be omitted in Corollary 335. For example, the function f(x) = 1/x is defined and continuous on (0, 1]. However, it does not attain its maximum there (in fact, it is not even bounded above there, see Fig. 4.10). Boundedness is not a sufficient condition, either. The function f(x) := x defined on (0, 1) is continuous and bounded, and attains neither its supremum (= 1) nor its infimum (= 0) on (0, 1). ®

In some situations, if a continuous function has an inverse, this inverse is automatically continuous. This is the content of the following useful result.

Proposition 337 Let f : K → R be a continuous one-to-one function, where K is a compact subset of R. Then the inverse function f⁻¹ : f(K) → K is also continuous.

Proof The continuity of f⁻¹ will be proved if we can show that {xn} converges to x whenever {xn} is a sequence in K and x is an element of K such that {f(xn)} converges to f(x). Assume that this fails for some sequence {xn} as above. Then there exist ε > 0 and a subsequence {yn} of {xn} such that |x − yn| ≥ ε for all n ∈ N. The sequence {yn} has, by Theorem 147, a further subsequence {zn} that converges (to some z ∈ K). Obviously, |x − z| ≥ ε. Observe that {f(zn)} is a subsequence of {f(xn)}, hence f(zn) → f(x). By the continuity of f we also have f(zn) → f(z). This implies f(z) = f(x) and, since f is one-to-one, z = x, a contradiction. □

For an alternative proof of Proposition 337, based on Corollary 326 and Theorem 334, see Exercise 13.200.

Remark 338 The result in Proposition 337 is no longer true without the assumption that the domain is compact. For example, let D(f) := {−1} ∪ (0, +∞), and let f : D(f) → R(f) be defined by f(x) = 1 if x = −1, and f(x) = 1 − 1/(1 + x) if x > 0.
The range R(f ) is (0, 1] (see Fig. 4.13). The function f is continuous and oneto-one. However, if yn := 1 − 1/n, n = 2, 3, . . ., the sequence {yn }∞ n=1 belongs to
Fig. 4.13 The example in Remark 338
Fig. 4.14 The intermediate value theorem
$R(f)$ and converges to 1. However, $x_n := f^{-1}(y_n) = n - 1$, $f^{-1}(1) = -1$, and the sequence $\{x_n\}_{n=2}^{\infty}$ does not converge to $-1$ (in fact, it does not converge at all). ®
The Intermediate Value Property I
Theorem 339 (Intermediate Value Property) Let $f$ be a real-valued continuous function defined on an interval $I = [a, b]$. Let $m$ ($M$) be the minimum (respectively, maximum) of $f$ on $I$ (see Corollary 335). Let $d \in [m, M]$. Then there exists $c \in [a, b]$ so that $f(c) = d$.
Proof Assume first that $d = M$ or $d = m$. Then use Corollary 335 to find $c \in I$ such that $f(c) = d$. So, it is enough to prove the result for $d \in (m, M)$. Consider the nonempty relatively open sets in $I$
$$A := \{x \in I : f(x) < d\}, \quad\text{and}\quad B := \{x \in I : f(x) > d\};$$
they are open due to Corollary 326. If we assume that $I = A \cup B$, we violate Corollary 104. Thus there exists $c \in I$ such that $c \notin A \cup B$. This forces $f(c) = d$ (see Fig. 4.14, where two points $c_1$ and $c_2$ satisfying $f(c_1) = f(c_2) = d$ are represented). □
As a consequence, we obtain the following
Corollary 340 Let $f$ be a continuous real-valued function defined on an interval $I = [a, b]$. Then the range $f(I)$ is an interval $[c, d]$.
The conclusion of Theorem 339 is called the Intermediate Value Property:
Definition 341 A real-valued function $f$ defined on a generalized interval $I$ in $\mathbb{R}$ is said to have the Intermediate Value Property (also called the Darboux property, after the name of the French mathematician J. G. Darboux) whenever given any two
points $a, b \in I$ with $a < b$, and any $y$ between $f(a)$ and $f(b)$, there exists $c$ between $a$ and $b$ such that $f(c) = y$.
Theorem 339 says that every real-valued continuous function defined on a general interval in $\mathbb{R}$ has the Intermediate Value Property. The converse is false (see Remark 342 below). Note that the image of any interval $J \subset I$ by a real-valued function $f$ defined on $I$ and enjoying the Intermediate Value Property is an interval in $\mathbb{R}$.
Remark 342
1. The function $f$ defined on $\mathbb{R}$ by $f(x) := \sin(1/x)$ for $x \neq 0$ and $f(0) := 0$, which is discontinuous at 0, has the Intermediate Value Property. Observe that the image of every non-degenerate interval containing 0 is $[-1, 1]$. For a plot of the graph of $f$ on $[-1, 0)$ see Fig. 4.44, and recall that the function is odd.
2. More dramatically, there are functions $\varphi$ such that the image of every nontrivial interval in the domain of $\varphi$ is the same fixed nontrivial interval. In particular, those functions are discontinuous everywhere, although they have the Intermediate Value Property. For an example, see Exercise 13.207, providing a function $\varphi : [0, 1] \to [0, 1]$ such that the image of every non-degenerate interval in $[0, 1]$ is the whole interval $[0, 1]$. Note that functions with this property cannot have jump discontinuities: In fact, if the function had a jump discontinuity at $x_0$, by taking a small enough neighborhood of it we would violate the Intermediate Value Property, since the function has at $x_0$ (different) left and right limits. It follows that those functions have “wild” discontinuities at every point of their domains, and a prototype is the function in the previous paragraph. Note, too, that for the class of functions discussed here, i.e., functions $f$ with the property that the image of every nontrivial interval in its domain is the same fixed interval, and for any $x_0 \in D(f)$, there is $y \in \mathbb{R}$ such that the equation $f(x) = y$ has infinitely many solutions as close to $x_0$ as wished. ®
In Sect. 4.5.9 we shall continue the study of the Intermediate Value Property. In particular, we shall show there that the derivative of a differentiable function has this property on the interval of definition. Since there are functions with a noncontinuous derivative, this will provide further examples of discontinuous functions with the Intermediate Value Property. Related to the Intermediate Value Property, see Exercises 13.184, 13.204, 13.250, and 13.538.
Uniform Continuity
In Example 4.1.3.2 we mentioned that the number $\delta(x, \varepsilon)$ associated to the definition of continuity depends, in general, both on $x$ and $\varepsilon$. The dependence on $\varepsilon$ is unavoidable, except in the trivial case of a constant function. However, in many cases $\delta$ does not depend on the particular point $x$ where the continuity is checked. We introduce a definition to describe the class of continuous functions having this behavior.
Definition 343 A function $f : D(f)\ (\subset \mathbb{R}) \to \mathbb{R}$ is said to be uniformly continuous on $D(f)$ if the following holds: For every $\varepsilon > 0$ there exists $\delta(\varepsilon) > 0$ such that $|f(x) - f(y)| < \varepsilon$ whenever $x, y \in D(f)$ and $|x - y| < \delta(\varepsilon)$.
Obviously, every uniformly continuous function is continuous on its domain. On the other hand, there are many real-valued continuous functions $f$ on their domain $D(f) \subset \mathbb{R}$ that are not uniformly continuous there (see, e.g., Example 4.1.3.1 below). The following general result, due to the mathematicians Georg Cantor and Heinrich Eduard Heine, gives a sufficient condition for a continuous function to be uniformly continuous.
Theorem 344 (Heine–Cantor) Every real-valued continuous function $f$ defined on a compact subset $K$ of $\mathbb{R}$ is uniformly continuous on $K$.
Proof Fix $\varepsilon > 0$. Given $x \in K$, there exists, by the continuity of $f$, a number $\delta(x, \varepsilon) > 0$ such that $k \in K$ and $|k - x| < \delta(x, \varepsilon)$ imply $|f(k) - f(x)| < \varepsilon/2$. Let us consider the family
$$\{(x - \delta(x, \varepsilon)/2,\ x + \delta(x, \varepsilon)/2) \cap K : x \in K\} \tag{4.6}$$
of relatively open subsets of $K$ that covers $K$. Since $K$ is compact, there exists a finite subfamily, say
$$\{(x_i - \delta(x_i, \varepsilon)/2,\ x_i + \delta(x_i, \varepsilon)/2) \cap K : i = 1, 2, \ldots, n\}, \tag{4.7}$$
where $x_i \in K$ for all $i = 1, 2, \ldots, n$, that still covers $K$. Put $\delta(\varepsilon) := \min\{\delta(x_i, \varepsilon) : i = 1, 2, \ldots, n\}$. Take $x, y \in K$ such that $|x - y| < \delta(\varepsilon)/2$. Since the family in (4.7) covers $K$, we can find $i_0 \in \{1, 2, \ldots, n\}$ such that, for this $i_0$,
$$|x - x_{i_0}| < \delta(x_{i_0}, \varepsilon)/2. \tag{4.8}$$
It follows that
$$|y - x_{i_0}| \le |y - x| + |x - x_{i_0}| < \delta(\varepsilon)/2 + \delta(x_{i_0}, \varepsilon)/2 \le \delta(x_{i_0}, \varepsilon)/2 + \delta(x_{i_0}, \varepsilon)/2 = \delta(x_{i_0}, \varepsilon). \tag{4.9}$$
From (4.8) and (4.9) we get $|f(x) - f(x_{i_0})| < \varepsilon/2$ and $|f(y) - f(x_{i_0})| < \varepsilon/2$, respectively. Then $|f(x) - f(y)| < \varepsilon$. This shows the statement. □
Remark 345 Note the ease with which we have proved the above nontrivial results. This is because all the hard work is hidden in the Heine–Borel Theorem 96 regarding compactness, that has been used again and again. Note the rôle of the factor 1/2 in the proof of Theorem 344.
Fig. 4.15 The function $x^2$ on the interval [0, 1] and the argument in Example 4.1.3.2
An alternative, easier but probably less instructive, proof of Theorem 344 is in Exercise 13.190. ®
Definition 346 Let $f$ be a real-valued continuous function defined on a subset $D$ of $\mathbb{R}$. The modulus of continuity $\delta(\varepsilon)$ is the function (taking values in $[0, +\infty]$) defined by
$$\delta(\varepsilon) := \sup\{|f(x) - f(y)| : x, y \in D,\ |x - y| \le \varepsilon\}, \quad\text{where } \varepsilon \ge 0. \tag{4.10}$$
The following proposition has an easy proof that is left to the reader in Exercises 13.189 and 13.199.
Proposition 347 Let $f$ be a real-valued function defined on its domain $D(f)$. Then the following are equivalent.
(i) $f$ is uniformly continuous on $D(f)$.
(ii) $f(x_n) - f(y_n) \to 0$ whenever $x_n, y_n \in D(f)$ are such that $x_n - y_n \to 0$.
(iii) $\lim_{\varepsilon \to 0+} \delta(\varepsilon) = 0$, where $\delta(\varepsilon)$ is the modulus of continuity of $f$ (introduced in Definition 346).
Examples 348 Let us consider some examples:
1. For the function $f(x) := 1/x$, $x \in (0, 1]$, it follows that $\delta(\varepsilon) = +\infty$ for $\varepsilon > 0$. To prove this, it is enough to consider the sequence $\{1/n\}_{n=1}^{\infty}$ as we did in Example 4.1.3.2. Indeed, given $\varepsilon > 0$, choose $n \in \mathbb{N}$ so that $n > 1/\varepsilon$. Then $\frac{1}{n} - \frac{1}{2n} = \frac{1}{2n} < \varepsilon$, and yet $f\left(\frac{1}{2n}\right) - f\left(\frac{1}{n}\right) = n$. Since $n$ can be taken arbitrarily large, the supremum in (4.10) is $+\infty$. See also Fig. 4.10 for the graph of the function in $[-10, 0) \cup (0, 10]$. It follows from (iii) in Proposition 347 that $f$ is not uniformly continuous on $(0, 1]$ (see also Exercises 13.194 and 13.199), although it is continuous there. Observe, too, that due to Theorem 344, $f$ is uniformly continuous when restricted to $[c, 1]$, for every $c \in (0, 1]$.
2. The function $f(x) = x^2$ (see Fig. 4.15) is uniformly continuous on $[0, 1]$; however, it is not uniformly continuous on $[1, +\infty)$. The first assertion follows from Theorem 344, and a direct proof of it can be provided by computing its modulus of continuity on $[0, 1]$. The precise value of this modulus is, for the function on $[0, 1]$,
$$\delta(\varepsilon) = 1^2 - (1 - \varepsilon)^2 = 2\varepsilon - \varepsilon^2, \quad\text{for } 0 < \varepsilon \le 1.$$
Fig. 4.16 The graph of $\sqrt{x}$ on [0, 10] (Example 4.1.3.3)
Indeed, the supremum in (4.10) is attained for $x = 1$, $y = 1 - \varepsilon$, due to the increasing character of the “slope” of the function on $[0, 1]$ (see Fig. 4.15 and Remark 352; see also Exercise 13.193). In this case, the reader may also proceed disregarding the precise value of $\delta(\varepsilon)$, and giving instead an estimate that will suffice for the conclusion: For this, observe that for $x, y \in [0, 1]$ such that $|x - y| \le \varepsilon$, we have
$$|f(x) - f(y)| = |x^2 - y^2| = |x - y| \cdot |x + y| \le \varepsilon |x + y| \le 2\varepsilon.$$
This shows that $\delta(\varepsilon) \le 2\varepsilon$ for all $\varepsilon \in (0, 1]$. Since $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0+$ we conclude from (iii) in Proposition 347 that $f$ is uniformly continuous on $[0, 1]$. In order to prove that $f$ is not uniformly continuous on $[1, +\infty)$, consider the two sequences $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$ given by $x_n := \sqrt{n + 1}$ and $y_n := \sqrt{n}$ for $n \in \mathbb{N}$. Apply then (ii) in Proposition 347. Indeed, the sequence $\{\sqrt{n + 1} - \sqrt{n}\}_{n=1}^{\infty}$ converges to 0 (see Exercise 13.89), while $f(x_n) - f(y_n) = 1$ for all $n \in \mathbb{N}$.
3. The function $f(x) := \sqrt{x}$ is uniformly continuous on $[0, \infty)$: Indeed, note that if $x \ge y \ge 0$, then $\sqrt{x} \le \sqrt{y} + \sqrt{x - y}$, as we can see by squaring both sides. Thus $|\sqrt{x} - \sqrt{y}| \le \sqrt{|x - y|}$ for all $x, y \in [0, \infty)$. Therefore $\delta(\varepsilon) := \sqrt{\varepsilon}$ is clearly the modulus of continuity of the function $\sqrt{x}$ on $[0, \infty)$. Again, an appeal to Proposition 347 concludes that $f$ is uniformly continuous on $[0, +\infty)$. See Fig. 4.16. ♦
Proposition 349 Let $f$ and $g$ be uniformly continuous real-valued functions defined on a set $D \subset \mathbb{R}$. Let $\alpha$ and $\beta$ be real numbers. Then the function $\alpha f + \beta g$ is uniformly continuous. If $f$ and $g$ are moreover bounded, then the product $f \cdot g$ is also uniformly continuous. If $h : R(f) \to \mathbb{R}$ is uniformly continuous, the composition $h \circ f$ is a uniformly continuous function.
Proof The proof of the first statement is standard and shall be omitted. Regarding the second one, assume that $|f(x)| \le M$ and $|g(x)| \le M$ for all $x \in D$. Then, for $x, y \in D$,
$$|f(x)g(x) - f(y)g(y)| = |f(x)g(x) - f(x)g(y) + f(x)g(y) - f(y)g(y)| \le |f(x)| \cdot |g(x) - g(y)| + |g(y)| \cdot |f(x) - f(y)| \le M \big( |g(x) - g(y)| + |f(x) - f(y)| \big),$$
and the conclusion follows easily. As for the third one, fix $\varepsilon > 0$ and find then $\delta > 0$ such that $|y_1 - y_2| < \delta$ in $R(f)$ implies $|h(y_1) - h(y_2)| < \varepsilon$. Find now $\eta > 0$ such that $|x_1 - x_2| < \eta$ in $D(f)$ implies $|f(x_1) - f(x_2)| < \delta$. It follows that, for $|x_1 - x_2| < \eta$ in $D(f)$, we have $|h(f(x_1)) - h(f(x_2))| < \varepsilon$. This shows the statement. □
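The value of the modulus of continuity of $x^2$ on $[0, 1]$ stated in Examples 348 can be checked numerically. The following is only a rough sketch: the helper name `modulus_of_continuity`, the grid size, and the small tolerance are assumptions made for illustration, and a grid search gives a lower estimate of the supremum in (4.10), not a proof.

```python
def modulus_of_continuity(f, a, b, eps, steps=400):
    """Grid estimate of sup{|f(x) - f(y)| : x, y in [a, b], |x - y| <= eps}.

    Hypothetical helper: it only inspects grid points, so it can
    underestimate the true supremum in (4.10).
    """
    grid = [a + (b - a) * i / steps for i in range(steps + 1)]
    best = 0.0
    for x in grid:
        for y in grid:
            # tiny tolerance so that pairs at distance exactly eps survive rounding
            if abs(x - y) <= eps + 1e-12:
                best = max(best, abs(f(x) - f(y)))
    return best


def square(x):
    return x * x


for eps in (0.5, 0.25, 0.1):
    estimate = modulus_of_continuity(square, 0.0, 1.0, eps)
    closed_form = 2 * eps - eps ** 2  # value stated in Examples 348.2
    print(f"eps = {eps}: grid estimate = {estimate:.6f}, 2*eps - eps^2 = {closed_form:.6f}")
```

For these three values of $\varepsilon$ the points $x = 1$ and $y = 1 - \varepsilon$ lie on the grid, so the estimate should agree with $2\varepsilon - \varepsilon^2$ up to rounding.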
Fig. 4.17 The derivative of f at a is the limit of the slopes of the chords
Remark 350 The requirement about boundedness in the statement of Proposition 349 cannot be dropped. Indeed, the function $f(x) := x$ defined on $[1, +\infty)$ is clearly uniformly continuous (take $\delta = \varepsilon$ in Definition 343). However, the function $f(x) \cdot f(x)$ ($= x^2$) was shown not to be uniformly continuous on $[1, +\infty)$ (see Example 4.1.3.2). ®
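Remark 350 relies on the failure of uniform continuity of $x^2$ on $[1, +\infty)$, which was obtained from the sequences $\sqrt{n+1}$ and $\sqrt{n}$. A small numerical illustration follows; the function name `gaps` is hypothetical and the printed values are floating-point approximations.

```python
import math


def gaps(n):
    """Return (x_n - y_n, x_n**2 - y_n**2) for x_n = sqrt(n + 1), y_n = sqrt(n)."""
    x_n = math.sqrt(n + 1)
    y_n = math.sqrt(n)
    return x_n - y_n, x_n ** 2 - y_n ** 2


for n in (10, 10**3, 10**6):
    point_gap, image_gap = gaps(n)
    # the points get closer, while the images under x -> x^2 stay about 1 apart
    print(f"n = {n}: x_n - y_n = {point_gap:.2e}, x_n^2 - y_n^2 = {image_gap:.6f}")
```

The first gap shrinks as $n$ grows while the second stays close to 1, which is exactly the situation excluded by condition (ii) in Proposition 347.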
4.1.4 Differentiable Functions
At the basis of the calculus is the concept of differentiability. It allows for techniques of optimization. The key idea is to approximate around a point $a$ a given function $f$ by a function $g$ of the form $g(x) := cx + d$ (whose graph is a straight line), see Fig. 4.17.
Definition 351 A real-valued function $f$ defined on an open interval $I \subset \mathbb{R}$ is said to be differentiable (or smooth) at a point $a \in I$ if the following limit exists as a real number:²
$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}. \tag{4.11}$$
If this is the case, we denote this limit by $f'(a)$ or $\left.\frac{df}{dx}\right|_{x=a}$, and call it the derivative of $f$ at $a$. The function $f$ is said to be differentiable on $I$ if it is differentiable at every point in $I$. A real-valued function $f$ defined on a closed interval $[b, c]$ is said to be differentiable on $[b, c]$ if $f'(x)$ exists for all $x \in (b, c)$ and the following finite one-sided limits (denoted $f'_+(b)$ and $f'_-(c)$, and called the one-sided derivatives of $f$ at $b$ and $c$, respectively) exist as well:
$$\lim_{h \to 0+} \frac{f(b + h) - f(b)}{h}, \quad\text{and}\quad \lim_{h \to 0-} \frac{f(c + h) - f(c)}{h}.$$
² Note that the limit in (4.11) can be alternatively written as $\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$.
Fig. 4.18 The closer we focus on f , the closer f looks—locally—as a translate of a linear function
Remark 352 The derivative of $f$ at $a$ can be presented as the slope of the tangent line to the graph of $f$ at the point $(a, f(a))$ (see Fig. 4.17 for this geometric interpretation of the meaning of the limit in (4.11)). Let $f$ be a real-valued function defined on an open interval $I \subset \mathbb{R}$. Let $a \in I$. Assume that $f$ is differentiable at $a$. For $h \in \mathbb{R}$, $h \neq 0$, such that $a + h \in I$, put
$$u(h) := \frac{f(a + h) - f(a)}{h} - f'(a). \tag{4.12}$$
Observe that $\lim_{h \to 0} u(h) = 0$. Equation (4.12), for $h \in \mathbb{R}$, $h \neq 0$, such that $a + h \in I$, can be written
$$f(a + h) = f(a) + f'(a)h + h \cdot u(h). \tag{4.13}$$
Equation (4.13) holds also trivially for $h = 0$. Now, this equation has a transparent analytical and geometrical meaning: For $h$ small, the function $h \mapsto f(a + h) - f(a)$ is “almost” a linear function, precisely the function $h \mapsto f'(a)h$. The “discrepancy” is really small: it is of the form $u(h)h$. If the reader has in mind that $u(h) \to 0$ as $h \to 0$, then $u(h)h$ is “doubly” small for small values of $h$. This is equivalent to $f$ being differentiable at $a$ (see Proposition 353 below). The geometrical counterpart of these considerations is that the graph of $f$ is, near the point $(a, f(a))$, almost a straight line (the translate of the graph of a linear function!). The closer we look around $(a, f(a))$, the closer the graph of $f$ is to the straight line (see Fig. 4.18). This straight line is called the tangent line to the graph of $f$ at $(a, f(a))$. ®
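To see how fast the discrepancy in (4.13) shrinks, one can tabulate $u(h)$ from (4.12) for a concrete function. The sketch below uses $f(x) = x^3$ at $a = 2$, where $f'(2) = 12$ and, by direct expansion, $u(h) = 6h + h^2$; the helper name `remainder_u` is an assumption made for illustration.

```python
def remainder_u(f, derivative_at_a, a, h):
    """Return u(h) = (f(a + h) - f(a)) / h - f'(a), as in Eq. (4.12)."""
    if h == 0:
        raise ValueError("u(h) is defined by (4.12) only for h != 0")
    return (f(a + h) - f(a)) / h - derivative_at_a


def cube(x):
    return x ** 3


# f'(2) = 12 for f(x) = x^3 (Proposition 357)
for h in (1e-1, 1e-2, 1e-3, -1e-3):
    u = remainder_u(cube, 12.0, 2.0, h)
    # u(h) tends to 0, and the discrepancy h * u(h) tends to 0 even faster
    print(f"h = {h:+.0e}   u(h) = {u:+.6e}   h*u(h) = {h * u:+.6e}")
```

The last column is the difference between $f(a+h)$ and the tangent-line value $f(a) + f'(a)h$, which is what the remark calls “doubly” small.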
It is important to state this result formally. The proof is easy, yet we shall provide the details. Note that a linear function $L : \mathbb{R} \to \mathbb{R}$ is defined by $L(h) = c \cdot h$ for all $h \in \mathbb{R}$, where $c$ is a real number. Thus, $L(1) = c$ and so $L(h) = L(1)h$ for all $h \in \mathbb{R}$.
Proposition 353 Let $f$ be a function defined on an open interval $I \subset \mathbb{R}$ and let $a \in I$. Then $f$ is differentiable at $a$ if and only if there exists a linear function $L : \mathbb{R} \to \mathbb{R}$ such that
$$f(a + h) = f(a) + L(h) + h \cdot u(h), \tag{4.14}$$
for every $h \in \mathbb{R}$ that satisfies $a + h \in I$, where $u : \{h \in \mathbb{R} : a + h \in I\} \to \mathbb{R}$ is a function with the property
$$\lim_{h \to 0} u(h) = 0. \tag{4.15}$$
Proof The necessary condition has been already discussed in Remark 352. Assume now that a linear map $L : \mathbb{R} \to \mathbb{R}$ satisfying (4.14) exists, and that (4.15) holds. Then, for $h \in \mathbb{R}$ such that $h \neq 0$ and $a + h \in I$,
$$\frac{f(a + h) - f(a)}{h} - L(1) = u(h) \to 0 \quad\text{as } h \to 0,$$
where we used that $L(h)/h = L(1)$ for every $h \neq 0$. This proves that $f$ is differentiable at $a$ (and $f'(a) = L(1)$, so $L(h) = f'(a)h$ for all $h \in \mathbb{R}$). □
Remark 354 Note that the mapping $L$ in (4.14) is unique. In fact, if two linear mappings $L$ and $T$ satisfying (4.14) exist (for functions $u$ and $v$ satisfying (4.15), respectively), then $0 = (L - T)(h) + hu(h) - hv(h)$ for all $h \in \mathbb{R}$ such that $a + h \in I$. Divide by such an $h \neq 0$ to get $(L - T)(1) = v(h) - u(h)$. Letting $h \to 0$ we get then $(L - T)(1) = 0$, and this implies, since $L$ and $T$ are linear functions from $\mathbb{R}$ into $\mathbb{R}$, that $L = T$.
Definition 355 The (unique, see Remark 354) linear function $L$ in Proposition 353 is called the differential of $f$ at $a$, and it is denoted by $df_a$.
Proposition 356 Let $f$ be a real-valued function defined on an open interval $I \subset \mathbb{R}$. Then, if $f$ is differentiable at a point $a \in I$, it is continuous at $a$.
Proof This follows from Proposition 353. Indeed, $L$ is clearly continuous, $L(0) = 0$, and (4.14) holds for $h$ small enough, hence $\lim_{h \to 0} f(a + h) = f(a)$. □
There are continuous nowhere differentiable real-valued functions on any interval $[a, b]$, for $a < b$. An explicit example will be given in Definition 481, and an existence proof in Sect. 6.9.2.1. For some historical notes on this subject see the paragraph preceding Definition 481 and the references therein.
Graphs of differentiable functions appear to be smooth at every point of the domain. As we already mentioned (see Remark 352 and Fig. 4.18), the graph of a function differentiable at $a$ can be approximated near $(a, f(a))$ by a tangent line at
the point $(a, f(a))$. The equation of the tangent line to the graph of the function $f$ at $(a, f(a))$ is
$$x \mapsto f'(a)(x - a) + f(a). \tag{4.16}$$
The following is a basic result. An extension of this will be provided in Corollary 375.
Proposition 357 For every $n \in \mathbb{N}$, the function $f(x) := x^n$ defined on $\mathbb{R}$ is differentiable at every point $x \in \mathbb{R}$, and $f'(x) = nx^{n-1}$.
Proof It is enough to observe that, for $x, y \in \mathbb{R}$, we have $y^n - x^n = (y - x)(y^{n-1} + y^{n-2}x + \ldots + x^{n-1})$ (see Exercise 13.19). Thus, for $y \neq x$ we get
$$\frac{f(y) - f(x)}{y - x} = y^{n-1} + y^{n-2}x + \ldots + x^{n-1},$$
hence
$$\lim_{y \to x} \frac{f(y) - f(x)}{y - x} = nx^{n-1},$$
as we wanted to show. □
Examples 358
1. Note that it directly follows from the definition that the derivative of a constant function defined on an open interval $I$ is, at every point of $I$, zero.
2. The function $f(x) := x^2$, defined on $\mathbb{R}$, is, according to Proposition 357, differentiable at every point $a \in \mathbb{R}$, and $f'(1) = 2$. The equation of the tangent line to the graph of $f$ at $(1, 1)$ is then $x \mapsto 2(x - 1) + 1$ (see Eq. (4.16)).
3. Let us consider the function $f(x) := x^2 D(x)$ on $\mathbb{R}$, where $D$ is the Dirichlet function (see Definition 296). Note that $f$ is differentiable only at $a = 0$ (and $f'(0) = 0$). Indeed, for $x \neq 0$,
$$\left| \frac{x^2 D(x) - 0}{x} \right| = |xD(x)| \le |x|,$$
hence $\lim_{x \to 0} \frac{x^2 D(x) - 0}{x} = 0$. Fix now $a \neq 0$. The function $f$ is discontinuous at $a$. Indeed, for a sequence $\{x_n\}_{n=1}^{\infty}$ of rational numbers converging to $a$ we have $0 = f(x_n) \to 0$, while $y_n^2 = f(y_n) \to a^2$ ($\neq 0$) if $\{y_n\}_{n=1}^{\infty}$ is a sequence of irrational numbers converging to $a$. Thus, by Proposition 356, $f$ is not differentiable at $a$. For a direct approach when $a = 1$ see Exercise 13.209.
4. The function $f(x) = |x|$ from $\mathbb{R}$ into $\mathbb{R}$ (see Fig. 4.19) is continuous everywhere (for this, observe that $||x| - |y|| \le |x - y|$ for all $x, y \in \mathbb{R}$), while it is differentiable precisely at points $x \in \mathbb{R} \setminus \{0\}$. Indeed, let first $a \neq 0$. Assume $a > 0$. Then,
Fig. 4.19 The function |x| (Example 4.1.4.4)
Fig. 4.20 The function in Example 359 and its first derivative
on the neighborhood $(a/2, 2a)$ of $a$, the function $f$ coincides with the function $g(x) := x$, and so it is differentiable (with $f'(a) = 1$), see Proposition 357. If $a < 0$, the argument is similar (and $f'(a) = -1$). Finally, let $a = 0$ and observe that
$$\frac{|x| - |a|}{x - a} = \frac{|x|}{x} = 1 \ \text{ for } x > 0; \qquad \frac{|x| - |a|}{x - a} = \frac{|x|}{x} = -1 \ \text{ for } x < 0.$$
Thus, the derivative of $f$ at 0 cannot exist (in geometrical terms, the graph of $f$ has a corner at $(0, 0)$, see Figs. 1.6 and 4.19). ♦
Let $I$ be a non-degenerate open interval in $\mathbb{R}$, and let $f : I \to \mathbb{R}$ be a function. Assume that $f'(x)$ exists at every point $x \in I$. Then a new function $f' : I \to \mathbb{R}$ is defined, and the question of the existence of the derivative of $f'$ at a certain point $a \in I$ makes sense. In case $f'$ has a derivative at $a$, we say that $f$ is twice differentiable at $a$, and we denote the derivative of $f'$ at $a$ by the symbol $f''(a)$ (called the second derivative of $f$ at $a$). Of course, this process can be continued, under the assumption of the existence of the corresponding “high derivative function”, to define inductively derivatives of higher order.
Example 359 It is not completely trivial to see, geometrically, if a function has, at some point, a second derivative or not. To illustrate this, consider the following example: Let $f : \mathbb{R} \to \mathbb{R}$ be defined (see Fig. 4.20) by
$$f(x) := \begin{cases} x^4 & \text{if } x < 0, \\ x^2 & \text{otherwise.} \end{cases}$$
This function does not have a second order derivative at $a = 0$, although $f'$ exists at every point $x \in \mathbb{R}$. Indeed, the existence of $f'$ at points $a \neq 0$ is clear (at a certain
neighborhood of $a$ the function $f$ is just $x^4$ or $x^2$ depending on the sign of $a$), while the existence of $f'(0)$ ($= 0$) follows from the computation, and the coincidence, of the one-sided derivatives of the two polynomials $x^2$ and $x^4$ at 0. Again it is obvious that $f$ has derivatives of any order at $a \neq 0$, while $f''(0)$ does not exist. This last statement follows from computing the left- and right-hand-side second derivatives of $f$ at 0 (Fig. 4.20 hints at this fact). ♦
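The one-sided computation mentioned at the end of Example 359 can be made concrete. The sketch below hard-codes the first derivative of the example ($4x^3$ for $x < 0$, $2x$ otherwise, with $f'(0) = 0$) and evaluates the difference quotients of $f'$ at 0 from both sides; the function names are hypothetical.

```python
def f_prime(x):
    """First derivative of the function in Example 359 (computed by hand)."""
    return 4 * x ** 3 if x < 0 else 2 * x


def second_quotient(h):
    """Difference quotient (f'(0 + h) - f'(0)) / h of f' at 0, for h != 0."""
    return (f_prime(h) - f_prime(0.0)) / h


for h in (1e-1, 1e-2, 1e-3):
    # from the right the quotient is 2; from the left it equals 4*h**2 and tends to 0
    print(f"h = {h:.0e}: right quotient = {second_quotient(h):.6f}, "
          f"left quotient = {second_quotient(-h):.6f}")
```

The right-hand quotients stay at 2 while the left-hand ones tend to 0, so the two one-sided second derivatives at 0 differ, in agreement with the example.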
4.2 Optimization and the Mean Value Theorem
The results in previous sections provide a springboard for optimization techniques in applied continuous mathematics. Theorem 362 below is one of the key results in optimization of differentiable functions. It allows us to collect a list of candidates for local extrema for differentiable functions. We define first this useful concept.
Definition 360 Let $f$ be a real-valued function defined on an open interval $(a, b)$. We say that $f$ has a local maximum (local minimum) at $c \in (a, b)$ if there exists $\delta > 0$ so that $f(c) \ge f(x)$ for all $x \in (c - \delta, c + \delta)$ ($f(c) \le f(x)$ for all $x \in (c - \delta, c + \delta)$, respectively). If $f$ has a local maximum or a local minimum at $c$, we say that $f$ has a local extremum at $c$.³
Remark 361 If $f$ attains its (global) maximum on the open interval $(a, b)$ at a point $c \in (a, b)$, then obviously $f$ attains a local maximum at $c$. On the other hand, if a function $f$ is defined on $(-1, 1)$ such that it is constant zero on some open neighborhood of 0 and its maximum on $(-1, 1)$ is 1 and its minimum is $-1$, then note that $f$ has a local minimum and simultaneously a local maximum at 0 (see Fig. 4.21). ®
Theorem 362 (Fermat) Suppose that a real-valued function $f$ is defined on an open interval $(a, b)$ and has a derivative at each point of $(a, b)$. If $f$ has a local extremum at $c \in (a, b)$, then $f'(c) = 0$.
Proof Without loss of generality we may assume that $f$ has a local maximum at $c$ (see Fig. 4.22); thus there exists $\delta > 0$ so that $f(c) \ge f(x)$ for all $x \in (c - \delta, c + \delta)$. Then
$$\frac{f(c + h) - f(c)}{h} \begin{cases} \le 0 & \text{for all } 0 < h < \delta, \\ \ge 0 & \text{for all } -\delta < h < 0. \end{cases} \tag{4.17}$$
³ The word maximum has been used for points where a bounded above real-valued function attains its supremum (see Definition 333). Sometimes, in order to emphasize this “global” character in contrast with the “local” character of a local maximum, the term global maximum is used for what we called before just a maximum. The same applies to minimum and global minimum.
Fig. 4.21 A function with a local minimum and maximum at 0 (Remark 361)
Fig. 4.22 Some local extrema of f
Using (4.17), we have
$$f'(c) = \lim_{h \to 0+} \frac{f(c + h) - f(c)}{h} \le 0,$$
and, at the same time,
$$f'(c) = \lim_{h \to 0-} \frac{f(c + h) - f(c)}{h} \ge 0,$$
hence $f'(c) = 0$. □
Points where the derivative of a differentiable function $f$ vanishes are called critical points of the function $f$.
Remark 363
1. Although Theorem 362 gives only a necessary condition for having a local extremum, it is important in two senses: (i) It helps to single out candidates for extrema (see, for example, Remark 363.2 and Exercises 13.242, 13.243, and 13.244), and (ii) it has a large number of consequences, such as Rolle’s Theorem 364 and its byproduct, the central Mean Value Theorem 365 of Lagrange. That the condition in Theorem 362 is not sufficient for an extremum is shown by the function $f(x) := x^3$ on $(-1, 1)$ at the point $c = 0$, see Fig. 4.23. Indeed, this function has a critical point at $c = 0$, yet this point is not a local extremum for $f$. For a sufficient condition for local extrema in terms of second derivatives, a kind of converse of Theorem 362, see Theorem 373.
Fig. 4.23 At the nonextremum point c = 0 the derivative is 0
2. The function $f(x) = x$ defined on $[0, 1]$ has a maximum (with value 1) at the point 1 and a minimum (with value 0) at the point 0. It has no local extrema on $(0, 1)$. Despite its simplicity, this example illustrates a general procedure to find extrema of a function defined on an interval $[a, b]$: If the function is continuous, it attains its supremum and infimum on $[a, b]$ at points in $[a, b]$ (Corollary 335). In order to locate any one of those points, we may assume first that it belongs to $(a, b)$. Then, if $f$ is differentiable on $(a, b)$, Theorem 362 says that we must look for a critical point. The alternative is that the point we are looking for would be located on the boundary of $[a, b]$, i.e., in $\{a, b\}$. Finally, we are left with the collection $\{x \in (a, b) : x \text{ a critical point of } f\} \cup \{a, b\}$. To decide which of those points gives the maximum or the minimum (or none) should be an easy task in many instances. In our case, the function $f(x) := x$ on $[0, 1]$ satisfies all the requirements (continuity on $[0, 1]$, differentiability on $(0, 1)$), and there are no critical points on $(0, 1)$. So we are left with the choice $\{0, 1\}$. That 0 gives a minimum and 1 a maximum is clear from the fact that $f(0) = 0 < f(x) < f(1) = 1$ for all $x \in (0, 1)$. See again Exercises 13.242, 13.243, and 13.244.
3. Note that the function $f(x) = |x|$ on $\mathbb{R}$ has a local (even global) minimum at 0 and yet $f'(0)$ does not exist (see Example 4.1.4.4 and Fig. 4.19).
4. Concerning critical points we mention in passing the deep Morse–Sard theorem, whose proof is beyond the scope of this introductory text: If $f$ is a continuously differentiable real-valued function on the real line and $C$ is the set of critical points of $f$, then the measure of the set $f(C)$ is zero. For a reference, see, e.g., [Ster64], and for a reduced version see Lemma 440 below. ®
There is an interesting consequence of Theorem 362: It is the fact that the derivative of a function, if it exists, has the Intermediate Value Property (see Theorem 339), although this derivative is not in general a continuous function (as in the case of the function in Example 4.5.8.3). This shall be stated precisely and proved later (Theorem 448).
The next result is due to the French mathematician M. Rolle.
Theorem 364 (Rolle) Suppose that a real-valued function $f$ is continuous on a closed bounded interval $[a, b]$ and has a derivative at each point in the open interval $(a, b)$. Assume that $f(b) = f(a)$. Then there exists $c \in (a, b)$ so that $f'(c) = 0$. (See Fig. 4.24, where two such points $c_1$ and $c_2$ are depicted.)
Fig. 4.24 Rolle’s theorem
Proof By Corollary 335, $f$ attains its maximum at some $c_{\max} \in [a, b]$, and its minimum at some $c_{\min} \in [a, b]$. Either the function $f$ is constant on $[a, b]$ (the result is obvious then) or one of $f(c_{\max})$ or $f(c_{\min})$ differs from $f(a)$ (recall that $f(a) = f(b)$). Assume without loss of generality that it is the value $f(c_{\max})$. Then $c_{\max} \in (a, b)$, and so $f$ has a local maximum at $c_{\max}$. The result follows from Theorem 362. □
The next two results are direct consequences of Rolle’s Theorem 364. The first one (Theorem 365) is referred to as the Mean Value Theorem and it is a key result in real analysis. This theorem is due to the Italian–French mathematician J. L. Lagrange. It states that an average velocity on a time interval is equal to an instantaneous velocity at some time in the interval. The second one (Corollary 367) is, in fact, an extension of the Mean Value Theorem, and it is sometimes referred to as the Generalized Mean Value Theorem. It is due to A. L. Cauchy. We present the two results separately for historical reasons and for proper references below.
When we ask advice, we are usually looking for an accomplice. Joseph-Louis Lagrange
Theorem 365 (Lagrange’s Mean Value Theorem) Suppose that a real-valued function $f$ is continuous on a closed and bounded interval $[a, b]$, where $a < b$, and it has a derivative at each point in the open interval $(a, b)$. Then there exists $c \in (a, b)$ so that
$$f'(c) = \frac{f(b) - f(a)}{b - a}, \tag{4.18}$$
i.e.,
$$f(b) - f(a) = f'(c)(b - a). \tag{4.19}$$
(See Fig. 4.25, where two such points $c_1$ and $c_2$ are presented.)
Proof Consider the function $g$ on $[a, b]$ given by
$$g(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a), \quad\text{for all } x \in [a, b]. \tag{4.20}$$
Note that $g(a) = g(b) = 0$, and that $g$ is continuous on $[a, b]$ and differentiable on $(a, b)$. Therefore, by Theorem 364 there exists $c \in (a, b)$ so that $g'(c) = 0$. Since $g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}$ for all $x \in (a, b)$, the result follows. □
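For a concrete function, a point $c$ as in (4.18) can be located numerically. The sketch below is only an illustration under extra assumptions that Theorem 365 does not need: the derivative is supplied by hand, it is continuous, and $g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}$ takes values of opposite sign (or zero) at the endpoints, so that bisection applies. The name `mean_value_point` is hypothetical.

```python
def mean_value_point(f, f_prime, a, b, tol=1e-12):
    """Approximate some c in [a, b] with f'(c) = (f(b) - f(a)) / (b - a).

    Sketch only: bisection on g'(x) = f'(x) - slope, assuming f' is
    continuous and g' does not have the same strict sign at a and b.
    """
    slope = (f(b) - f(a)) / (b - a)

    def g_prime(x):
        return f_prime(x) - slope

    lo, hi = a, b
    if g_prime(lo) * g_prime(hi) > 0:
        raise ValueError("bisection needs a sign change of g' on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g_prime(lo) * g_prime(mid) <= 0:
            hi = mid  # a zero of g' lies in [lo, mid]
        else:
            lo = mid  # otherwise it lies in [mid, hi]
    return (lo + hi) / 2


def cube(x):
    return x ** 3


def cube_prime(x):
    return 3 * x ** 2


c = mean_value_point(cube, cube_prime, 0.0, 2.0)
print(c, cube_prime(c), (cube(2.0) - cube(0.0)) / 2.0)
```

For $f(x) = x^3$ on $[0, 2]$ the chord has slope 4, and solving $3c^2 = 4$ by hand gives $c = 2/\sqrt{3}$, which the computed value should approximate.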
Fig. 4.25 Lagrange’s Mean Value Theorem
Remark 366 The reader should realize that the definition of the function $g$ in the proof of Theorem 365 (Eq. (4.20)) is natural: The graph of the function $g$ is just the graph of the function $f$ tilted in such a way that the dashed segment in Fig. 4.25 becomes horizontal. At that moment Rolle’s Theorem 364 applies.
Corollary 367 (Cauchy) Let $f$ and $g$ be two real-valued functions on $[a, b]$, both continuous on $[a, b]$ and differentiable on $(a, b)$. Then there exists $c \in (a, b)$ so that
$$\big(f(b) - f(a)\big) g'(c) = \big(g(b) - g(a)\big) f'(c). \tag{4.21}$$
Remark 368
1. Equation (4.21) appears, in the case that $g'(c) \neq 0$ and $g(a) \neq g(b)$, as
$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}. \tag{4.22}$$
2. Note that one cannot show Cauchy’s Corollary 367 directly from Lagrange’s Theorem 365 applied separately to the numerator and denominator in (4.22), since we would get different points $c$ for each of them.
3. Observe that Theorem 365 is a special case of Corollary 367 for $g(x) := x$, $x \in [a, b]$. ®
Proof of Corollary 367 Define on $[a, b]$ the function
$$\varphi(x) := \big(f(b) - f(a)\big) g(x) - \big(g(b) - g(a)\big) f(x),$$
and note that $\varphi$ meets the conditions in Rolle’s Theorem 364 (in particular, $\varphi(a) = \varphi(b) = f(b)g(a) - f(a)g(b)$). The result then follows from it. □
Corollary 369 If $f$ is a real-valued function on $[a, b]$, continuous on $[a, b]$, differentiable on $(a, b)$, and $f'(x) = 0$ for all $x \in (a, b)$, then $f$ is identically equal to a constant.
Proof Fix $c$ and $d$ such that $a \le c < d \le b$ and apply Theorem 365 to the function $f$ restricted to the interval $[c, d]$. It follows that $f(c) = f(d)$. This proves that $f$ is constant on $[a, b]$. □
Fig. 4.26 The function f in Remark 372, with $f'(0) = 1/2$
Corollary 370 Let $f$ and $g$ be two real-valued continuous functions on $[a, b]$, such that both $f$ and $g$ are differentiable on $(a, b)$. Assume that $f'(x) = g'(x)$ for all $x \in (a, b)$. Then $f = g + C$, where $C$ is a constant.
Proof Apply Corollary 369 to the function $f - g$. □
As mentioned before, the Mean Value Theorem 365 has profound implications in real analysis. The following result is one of the direct applications of the Mean Value Theorem.
Proposition 371 Assume that $f$ is a real-valued function defined and differentiable on an open interval $I$. Assume, too, that $f'(x) > 0$ ($f'(x) \ge 0$) for all $x \in I$. Then $f$ is strictly increasing (respectively, increasing) on $I$.
Proof Let $x, y \in I$ with $x < y$ be arbitrary. By Theorem 365 there exists $c \in (x, y)$ ($\subset I$) so that
$$\frac{f(y) - f(x)}{y - x} = f'(c).$$
Since $f'(c) > 0$ (respectively, $f'(c) \ge 0$), we have $f(y) > f(x)$ (respectively, $f(y) \ge f(x)$). □
Remark 372 For concluding that $f$ is locally increasing at some point $a$ (i.e., increasing on a neighborhood of $a$) it is not enough to assume that $f'(a) > 0$, even in the case that $f$ is differentiable everywhere. As an example, and taking for granted the existence and properties of the $\sin x$ function (see Sect. 4.4), consider the function $f$ (see Fig. 4.26, where $f$ is plotted on $[-0.1, 0.1]$) defined by
$$f(t) := \begin{cases} t^2 \sin\frac{1}{t} + \frac{1}{2}t, & \text{if } t \neq 0, \\ 0, & \text{otherwise.} \end{cases}$$
Observe that $f'(0) = \frac{1}{2}$. Indeed, for $t \neq 0$ we have that $\frac{f(t) - f(0)}{t} = t \sin\frac{1}{t} + \frac{1}{2} \to \frac{1}{2}$ as $t \to 0$, due to the fact that $\sin x$ is a bounded function. Yet $f$ is not increasing on any neighborhood of zero. Indeed,
$$f'(t) = 2t \sin\frac{1}{t} - \cos\frac{1}{t} + \frac{1}{2}, \quad\text{if } t \neq 0,$$
Fig. 4.27 For increasing x, slopes decrease near x1 and increase near x2
and on every neighborhood of zero the function f′ switches its sign (trigonometric functions will be studied in Sect. 5.2.5). ®
The following result gives a sufficient condition for extrema in terms of the sign of the second derivative. The geometric interpretation is depicted in Fig. 4.27. For numerical examples, see Exercises 13.242, 13.243, and 13.244.
Theorem 373 Let f be a real-valued function defined on an open interval I, and let a ∈ I be a point where the second derivative of f exists.
(i) If f′(a) = 0 and f″(a) > 0, then f has a local minimum at a.
(ii) If f′(a) = 0 and f″(a) < 0, then f has a local maximum at a.
Proof Observe first that the existence of the second derivative of f at a requires the existence of the first derivative on an interval (a − δ0, a + δ0) (⊂ I), for some δ0 > 0. In particular, f is continuous on (a − δ0, a + δ0).
(i) Since f′(a) = 0 and f″(a) > 0,
$$0 < f''(a) = \lim_{h \to 0+} \frac{f'(a+h) - f'(a)}{h} = \lim_{h \to 0+} \frac{f'(a+h)}{h}.$$
Thus there is δ ∈ (0, δ0) such that
$$\frac{f'(a+h)}{h} > \frac{f''(a)}{2} > 0, \quad \text{for all } 0 < h < \delta.$$
In particular, f′(a + h) > 0 for all 0 < h < δ. By the Mean Value Theorem 365, f(a + h) − f(a) > 0 for all 0 < h < δ. Therefore f(a + h) > f(a) for all 0 < h < δ. Similarly we get that, for some δ1 ∈ (0, δ0), we have f′(a − h) < f′(a) = 0 for all 0 < h < δ1, and hence, again by the Mean Value Theorem 365, f(a − h) > f(a) for these h. All together, we proved that f has a local minimum at a. (ii) is proved similarly. □
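The sign changes of f′ claimed in Remark 372 can be observed numerically. The following is a minimal sketch, not part of the original text: the helper names `f_prime` and `sample_points` are ours, and the sample points are chosen (as an assumption that makes the arithmetic transparent) so that cos(1/t) equals +1 or −1.

```python
import math


def f(t: float) -> float:
    """f(t) = t^2 sin(1/t) + t/2 for t != 0, and f(0) = 0 (Remark 372)."""
    return t * t * math.sin(1.0 / t) + 0.5 * t if t != 0.0 else 0.0


def f_prime(t: float) -> float:
    """f'(t) = 2t sin(1/t) - cos(1/t) + 1/2 for t != 0, and f'(0) = 1/2."""
    if t == 0.0:
        return 0.5
    return 2.0 * t * math.sin(1.0 / t) - math.cos(1.0 / t) + 0.5


def sample_points(k: int) -> tuple[float, float]:
    """Two points tending to 0 as k grows, where cos(1/t) is +1 and -1 respectively."""
    return 1.0 / (2.0 * math.pi * k), 1.0 / ((2.0 * k + 1.0) * math.pi)


for k in (1, 10, 100, 1000):
    t_neg, t_pos = sample_points(k)
    # f' is close to -1/2 at the first point and close to +3/2 at the second.
    print(f"k={k:5d}  f'({t_neg:.2e}) = {f_prime(t_neg):+.4f}   "
          f"f'({t_pos:.2e}) = {f_prime(t_pos):+.4f}")
```

Both families of points accumulate at 0, so every neighborhood of zero contains points where f′ is negative and points where it is positive, although f′(0) = 1/2 > 0.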
4.3 Algebra of Derivatives
The following result collects some algebraic properties of differentiable functions.
Proposition 374 Let f and g be real-valued functions defined on an open interval I and differentiable at a point a ∈ I. Then
(i) The function f + g is differentiable at a, and (f + g)′(a) = f′(a) + g′(a).
(ii) (Product Rule) The function fg is differentiable at a, and (fg)′(a) = f′(a)g(a) + f(a)g′(a). In particular, if k is a constant, we have that kf is differentiable at a, and (kf)′(a) = kf′(a).
(iii) (Quotient Rule) The function f/g is differentiable at a if g(a) ≠ 0, and
$$\left(\frac{f}{g}\right)'(a) = \frac{f'(a)g(a) - f(a)g'(a)}{g^2(a)}.$$
(iv) (Chain Rule) Assume that the real-valued function g is defined on an open interval I and that g is differentiable at a ∈ I. Assume that the real-valued function f is defined on an open interval J such that g(I) ⊂ J, and that f is differentiable at g(a). Then the function f ∘ g is differentiable at a, and
$$(f \circ g)'(a) = f'\big(g(a)\big)\, g'(a).$$
Proof As for (i), consider, for x ∈ I, x ≠ a,
$$\frac{(f+g)(x) - (f+g)(a)}{x-a} = \frac{f(x) - f(a)}{x-a} + \frac{g(x) - g(a)}{x-a};$$
the result follows easily. For (ii) consider, for x ∈ I, x ≠ a,
$$\frac{(fg)(x) - (fg)(a)}{x-a} = \frac{f(x)g(x) - f(x)g(a)}{x-a} + \frac{f(x)g(a) - f(a)g(a)}{x-a} = f(x)\,\frac{g(x) - g(a)}{x-a} + g(a)\,\frac{f(x) - f(a)}{x-a},$$
and the result follows. Let us now consider (iv). We know that g′(a) exists, so there exists a function α(x) such that α(x) → 0 as x → a, and
$$g(x) - g(a) = (x-a)\big(g'(a) + \alpha(x)\big), \quad \text{for } x \in I \tag{4.23}$$
(see Proposition 353, in particular (4.14)). Let d := g(a) (∈ J). For x ∈ I put y := g(x) (∈ J). Since f′(d) exists, there exists a function β(y) such that β(y) → 0 as y → d, and
$$f(y) - f(d) = (y-d)\big(f'(d) + \beta(y)\big), \quad \text{for } y \in J. \tag{4.24}$$
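For x ∈ I, it follows from (4.23) and (4.24) that
$$(f \circ g)(x) - (f \circ g)(a) = f(y) - f(d) = \big(g(x) - g(a)\big)\Big(f'\big(g(a)\big) + \beta\big(g(x)\big)\Big) = (x-a)\big(g'(a) + \alpha(x)\big)\Big(f'\big(g(a)\big) + \beta\big(g(x)\big)\Big).$$
Thus we have
$$\lim_{x \to a} \frac{(f \circ g)(x) - (f \circ g)(a)}{x-a} = f'\big(g(a)\big)\, g'(a).$$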
As for (iii), note that $f/g = f \cdot g^{-1}$, and apply the Product Rule together with the Chain Rule ((iv) in Proposition 374). □
Corollary 375 Any polynomial p is a differentiable function on R (and its derivative is again a polynomial). Precisely, if $p(x) := a_n x^n + \cdots + a_1 x + a_0$ for x ∈ R, then $p'(x) = n a_n x^{n-1} + \cdots + a_1$ for all x ∈ R.
Proof The first part follows from (i) and (ii) in Proposition 374. To get the second part, use Proposition 357 and again (i) and (ii) in Proposition 374. □
As an example of how to use the Chain Rule (item (iv) in Proposition 374), let $p(x) := (x^2 + 1)^4$. Then $p'(x) = 4(x^2 + 1)^3 \cdot 2x = 8x(x^2 + 1)^3$.
The following result is named after the French mathematician G. de L'Hôspital.
Theorem 376 (L'Hôspital) Let f and g be real-valued functions that are differentiable on (a, b) and g′(x) ≠ 0 for all x ∈ (a, b). Suppose either
$$\lim_{x \to a+} f(x) = \lim_{x \to a+} g(x) = 0 \tag{4.25}$$
or
$$\lim_{x \to a+} g(x) = \pm\infty. \tag{4.26}$$
Furthermore assume
$$\lim_{x \to a+} \frac{f'(x)}{g'(x)} = L, \tag{4.27}$$
where L is in R or could be ±∞. Then
$$\lim_{x \to a+} \frac{f(x)}{g(x)} = L. \tag{4.28}$$
The value a can be replaced with −∞ and the result still holds.
Remark 377
1. Note that g cannot have two zeros in (a, b): this would imply, by Rolle's Theorem 364, that g′ vanishes at some point in (a, b). Therefore, we can find c ∈ (a, b) such that g(x) ≠ 0 for x ∈ (a, c), and so f(x)/g(x) ∈ R for x ∈ (a, c).
2. The reader may provide a corresponding version of Theorem 376 for a replaced by b, and convergence to a from the right replaced by convergence to b from the left. The result also holds in this case for b = +∞. ®
Proof of Theorem 376 1. Assume first that −∞ ≤ L < +∞, and fix β > L. Fix α such that L < α < β. From (4.27) we can find c ∈ (a, b) such that
$$\frac{f'(x)}{g'(x)} < \alpha, \quad \text{for } x \in (a, c). \tag{4.29}$$
Let a < x < y < c, and note that g(x) ≠ g(y) (otherwise, Rolle's Theorem 364 would imply that g′ vanishes at some point in (x, y)). From Corollary 367 we get t ∈ (x, y) such that
$$\frac{f(y) - f(x)}{g(y) - g(x)} = \frac{f'(t)}{g'(t)} < \alpha.$$
[…] Multiply (4.30) by the positive number $\frac{g(x) - g(y)}{g(x)}$ to get, for x ∈ (a, c1), […] so that all fractions in the neighborhood I = (a − δ, a + δ) ∩ (0, 1) have denominators greater than 1/ε. This is possible since all fractions with denominators less than or equal to 1/ε form a finite set. It follows that, if x ∈ (a − δ, a + δ) ∩ (0, 1), then R(x) < ε. Thus $\lim_{x \to a} R(x) = 0$ and hence R is continuous at a.
2. The Riemann function R is nowhere differentiable. Indeed, observe first that clearly R cannot be differentiable at rational numbers, since the function is not continuous there (see the previous item and Proposition 356). We now show that R′(a) does not exist even if a is irrational. The argument is based on Theorem 117: Given a natural number Q there exist integers p and q so that 0 < q ≤ Q and |a − (p/q)| < 1/(Qq) ≤ 1/q². Remark 118.2 ensures that the number of irreducible expressions p/q that satisfy the previous inequality for Q running through all natural numbers is infinite, so we can find such fractions x := p/q with q as large as we wish. Now, for this x,
$$\left|\frac{R(x) - R(a)}{x - a}\right| = \frac{1/q}{\left|a - \frac{p}{q}\right|} \ge \frac{1/q}{1/q^2} = q.$$
Thus R′(a) cannot exist.
Fig. 4.29 The trigonometric functions
If we wish to have R defined on [0, 1] we can set R(0) = R(1) = 0. The function is then continuous at x = 0 and at x = 1. In Fig. 4.28 we try to plot the graph of the Riemann function on [0, 1]. We plot function values on a subset of the dyadic numbers. We can visually detect the continuity at irrational numbers as the function values tend to compress to the x axis. ♦
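The continuity argument at an irrational point can be tried out with exact rational arithmetic. The sketch below is ours, not the authors': it assumes, as the text above indicates, that R(p/q) = 1/q for an irreducible fraction p/q in (0, 1); the names `riemann` and `delta_for` are hypothetical, and a floating-point value stands in for the irrational point a.

```python
from fractions import Fraction
import math


def riemann(x: Fraction) -> Fraction:
    """R(p/q) = 1/q for p/q in lowest terms (Fraction reduces automatically)."""
    return Fraction(1, x.denominator)


def delta_for(a: float, eps: float) -> float:
    """Distance from a to the finite set of fractions in (0, 1) with denominator <= 1/eps."""
    q_max = math.floor(1.0 / eps)
    return min(abs(a - float(Fraction(p, q)))
               for q in range(2, q_max + 1) for p in range(1, q))


a = math.sqrt(2.0) - 1.0   # float stand-in for an irrational point of (0, 1)
eps = 0.1
delta = delta_for(a, eps)

# Largest value of R over the fractions with denominator <= 200 lying within delta of a.
worst = max(riemann(Fraction(p, q))
            for q in range(2, 201) for p in range(1, q)
            if abs(a - float(Fraction(p, q))) < delta)
print(f"delta = {delta:.6f}, largest R on the sampled neighborhood = {worst}")
```

Every fraction closer to a than δ has a reduced denominator larger than 1/ε, so the sampled values of R stay below ε, as in the proof.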
4.4 The Trigonometric Functions
Where there is matter, there is geometry. (Ubi materia, ibi geometria.)
Johannes Kepler
Trigonometric functions form an important class of functions. These functions arise in geometry. They are used in analysis as basis functions to represent more complicated functions. In this section we depart somewhat from our previous approach and rely on the geometrical description (see Fig. 4.29) of the sine, cosine, and tangent functions. For the sine (cosine) function, this is the quotient of the side opposite (respectively, adjacent) to an acute angle in a right triangle and the hypotenuse. For the tangent function, it is the quotient of the opposite side and the adjacent side. Observe that, if the angle x is measured in radians, then the length of the arc in Fig. 4.29 is precisely x, so |sin x| ≤ |x|. Later on (see Sect. 5.2.5), we shall provide an analytic description of the trigonometric functions, proving that the resulting functions are the only ones having the properties described here (see Proposition 544). Accordingly, and once this is established, the analytic expressions (5.88) and (5.89) there will be taken as definitions of the trigonometric functions.
In this section, then, we take for granted the elementary Euclidean geometry of the plane, in particular the notion of angle, of the Euclidean distance and the concept of the unit circle. We measure angles by arc-length; the functions sine (in symbols, sin x) and cosine (cos x) are defined in Fig. 4.29 for an angle x ∈ [0, π/2], while the function tangent (tan x) is defined for an angle x ∈ [0, π/2). We take sin(π − x) = sin x and cos(π − x) = −cos x for x ∈ [0, π/2], and then sin(−x) = −sin x and cos(−x) = cos x for x ∈ [0, π]. In the case of the tangent, tan(π − x) = −tan x for x ∈ [0, π/2), and tan(−x) = −tan x for x ∈ [0, π/2) ∪ (π/2, π]. Finally, by
Fig. 4.30 The trigonometric functions sin x and cos x on [−2π, 2π]
Fig. 4.31 Adding angles α and β
definition, the functions sin x, cos x, and tan x are 2π-periodic, i.e., sin(x + 2π) = sin x for all x ∈ R (the same for the others). We plot in Fig. 4.30 the functions sin x and cos x on the interval [−2π, 2π]. We have, then,
$$\sin n\pi = 0, \quad \sin\left((2n+1)\frac{\pi}{2}\right) = (-1)^n, \quad \text{for } n \in \mathbb{Z}, \tag{4.34}$$
and
$$\cos n\pi = (-1)^n, \quad \cos\left((2n+1)\frac{\pi}{2}\right) = 0, \quad \text{for } n \in \mathbb{Z}. \tag{4.35}$$
Note that the similarity between the triangles OPQ and ORS in Fig. 4.29 gives
$$\tan x = \frac{\sin x}{\cos x}, \quad \text{for } x \in \mathbb{R} \setminus \{(2n+1)\pi/2 : n \in \mathbb{Z}\}.$$
Observe, too, that the Pythagorean Theorem yields sin² x + cos² x = 1.
Proposition 381 Let α, β be two real numbers. Then
(i) sin(α + β) = sin α cos β + cos α sin β.
(ii) sin(α − β) = sin α cos β − cos α sin β.
(iii) cos(α + β) = cos α cos β − sin α sin β.
(iv) cos(α − β) = cos α cos β + sin α sin β.
Proof Observe Fig. 4.31.
(4.36)
Fig. 4.32 The proof of Corollary 382
The dashed triangle is obtained by rotating the solid gray triangle by an angle −β. Let us compute the squares of the two (equal) lengths PQ and RS. We get
$$\overline{PQ}^2 = (\cos(\alpha+\beta) - 1)^2 + \sin^2(\alpha+\beta) = \cos^2(\alpha+\beta) + 1 - 2\cos(\alpha+\beta) + \sin^2(\alpha+\beta) = 2 - 2\cos(\alpha+\beta),$$
and
$$\overline{RS}^2 = (\cos\alpha - \cos(-\beta))^2 + (\sin\alpha - \sin(-\beta))^2 = (\cos\alpha - \cos\beta)^2 + (\sin\alpha + \sin\beta)^2$$
$$= \cos^2\alpha - 2\cos\alpha\cos\beta + \cos^2\beta + \sin^2\alpha + 2\sin\alpha\sin\beta + \sin^2\beta = 2 - 2\cos\alpha\cos\beta + 2\sin\alpha\sin\beta.$$
Since PQ = RS, we get 2 − 2 cos(α + β) = 2 − 2 cos α cos β + 2 sin α sin β, and (iii) follows from this. To prove (i), it is enough to observe that sin²(α + β) = 1 − cos²(α + β) and to use (iii). (ii) follows from (i) and (iv) follows from (iii), having in mind that cos x is an even function and sin x is an odd function. □
Corollary 382 For every α ∈ R we have
$$\sin\left(\alpha + \frac{\pi}{2}\right) = \cos\alpha, \quad \cos\left(\alpha + \frac{\pi}{2}\right) = -\sin\alpha.$$
Proof This follows from (4.34) and (4.35), together with Proposition 381. It follows also from the observation of Fig. 4.32. □
Remark 383 From the point of view of the graphs of the two functions sin x and cos x, Corollary 382 states that the second is obtained from the first by a horizontal translation of length −π/2. See Fig. 4.30. ®
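As a quick sanity check, and not as a substitute for the geometric proof, the four identities of Proposition 381 and the two of Corollary 382 can be tested at random angles. The sketch below relies on the standard-library implementations of sine and cosine; the function name `largest_residual` is ours.

```python
import math
import random


def largest_residual(alpha: float, beta: float) -> float:
    """Largest absolute discrepancy among (i)-(iv) of Proposition 381 and Corollary 382."""
    s, c = math.sin, math.cos
    residuals = [
        s(alpha + beta) - (s(alpha) * c(beta) + c(alpha) * s(beta)),   # (i)
        s(alpha - beta) - (s(alpha) * c(beta) - c(alpha) * s(beta)),   # (ii)
        c(alpha + beta) - (c(alpha) * c(beta) - s(alpha) * s(beta)),   # (iii)
        c(alpha - beta) - (c(alpha) * c(beta) + s(alpha) * s(beta)),   # (iv)
        s(alpha + math.pi / 2) - c(alpha),                             # Corollary 382
        c(alpha + math.pi / 2) + s(alpha),                             # Corollary 382
    ]
    return max(abs(r) for r in residuals)


random.seed(0)  # fixed seed so that the run is reproducible
worst = max(largest_residual(random.uniform(-10, 10), random.uniform(-10, 10))
            for _ in range(1000))
print(f"largest residual over 1000 random pairs: {worst:.2e}")
```

The residuals are of the order of floating-point rounding error, which is what one expects if the identities hold exactly.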
Fig. 4.33 Computing the trigonometric functions at some angles
Corollary 384 For α ∈ R, we have
$$\sin(2\alpha) = 2\sin\alpha\cos\alpha, \quad \cos(2\alpha) = \cos^2\alpha - \sin^2\alpha,$$
$$\cos^2\alpha = \frac{1 + \cos(2\alpha)}{2}, \quad \sin^2\alpha = \frac{1 - \cos(2\alpha)}{2}.$$
Proof It follows from Proposition 381 and the fact that sin² α + cos² α = 1. □
The following collection of identities is obtained easily from Proposition 381. We omit its proof, although Exercise 13.251 hints at the argument for one of them.
Corollary 385 Let α and β be two real numbers. Then
(i) $\sin\alpha + \sin\beta = 2\sin\frac{\alpha+\beta}{2}\cos\frac{\alpha-\beta}{2}$.
(ii) $\sin\alpha - \sin\beta = 2\cos\frac{\alpha+\beta}{2}\sin\frac{\alpha-\beta}{2}$.
(iii) $\cos\alpha + \cos\beta = 2\cos\frac{\alpha+\beta}{2}\cos\frac{\alpha-\beta}{2}$.
(iv) $\cos\alpha - \cos\beta = -2\sin\frac{\alpha+\beta}{2}\sin\frac{\alpha-\beta}{2}$.
The geometric definition of the trigonometric functions given at the beginning of this section, together with the basic relations in Proposition 381 and its Corollaries 382 and 384, allows for easily computing the values sin α, cos α, and tan α for some particular α. To begin with, observe Fig. 4.33. From (i) in Fig. 4.33, it is clear that the shaded triangle is equilateral, hence
$$\sin\frac{\pi}{6} = \frac{1}{2}$$
by looking at the thick lines. From it, we get
$$\cos\frac{\pi}{6} = \sqrt{1 - \sin^2(\pi/6)} = \frac{\sqrt{3}}{2},$$
and
$$\tan\frac{\pi}{6} = \frac{\sin(\pi/6)}{\cos(\pi/6)} = \frac{\sqrt{3}}{3}.$$
From (ii) in Fig. 4.33 it is clear that the two shaded triangles are isosceles (as they are similar), hence sin(π/4) = cos(π/4). It follows then that 2 sin²(π/4) = 1, hence sin(π/4) = cos(π/4) = √2/2, and so tan(π/4) = 1. Now, we may compute the trigonometric functions for some other typical angles. For example,
$$\sin\frac{\pi}{12} = \sqrt{\frac{1 - \cos(\pi/6)}{2}} = \sqrt{\frac{2 - \sqrt{3}}{4}}.$$
The following lemma pertains to general functions having a limit at some point. This simple, and yet useful, result is known as the sandwich lemma.
Lemma 386 (Sandwich Lemma) Consider three functions f, g and h defined in a neighborhood D of some point a ∈ R, such that f(x) ≤ h(x) ≤ g(x) for all x ∈ D \ {a}. Moreover, assume that $\lim_{x \to a} f(x) = \lim_{x \to a} g(x) = L$. Then $\lim_{x \to a} h(x) = L$.
Proof The proof will be done for L ∈ R. The case L := +∞ or L := −∞ can be handled similarly. Let ε > 0 and choose δ > 0 so that (a − δ, a + δ) ⊂ D, and |g(x) − L| < ε and |f(x) − L| < ε for all x ∈ (a − δ, a + δ) \ {a}. Thus if x ∈ (a − δ, a + δ) \ {a}, then f(x) and g(x) are both in (L − ε, L + ε) and so is h(x). Therefore |h(x) − L| < ε whenever x ∈ (a − δ, a + δ) \ {a}. □
Proposition 387 The functions sin x and cos x are uniformly continuous on R.
Proof Observe that |sin x| ≤ |x| for all x ∈ R (see Fig. 4.29). Moreover, for x, y ∈ R, we have $\sin x - \sin y = 2\sin\frac{x-y}{2}\cos\frac{x+y}{2}$ (see Corollary 385). Since |cos x| ≤ 1 for all x ∈ R, we get |sin x − sin y| ≤ |x − y|, hence sin x is clearly uniformly continuous on R. Since cos x = sin(x + π/2) (see Corollary 382), the function cos x is uniformly continuous on R as well. □
Remark 388 Note that the conclusion of Proposition 387 can be obtained from Theorem 344 and the fact that sin x and cos x are 2π-periodic functions (see the introduction to this section). Indeed, it is then enough to check the continuity of the functions sin x and cos x on the compact interval [0, 2π] (or any other closed interval having length 2π) to conclude that they are uniformly continuous on R. This observation applies to any other continuous periodic function. ®
Proposition 389 We have
$$\lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{x}{\sin x} = 1.$$
Proof Let x ∈ (0, δ) for some small positive real number δ. Comparing the areas in Fig. 4.29 we obtain (note that the area of a circular sector having radius 1 and angle x is equal to x/2; indeed, the area of the circle of radius 1 is π, and this corresponds to an angle 2π, so the result follows by proportionality)
$$\frac{1}{2}\sin x\cos x < \frac{1}{2}x < \frac{1}{2}\tan x,$$
which implies
$$\cos x \le \frac{x}{\sin x} \le \frac{1}{\cos x}$$
for all x ∈ (0, δ). Since
$$\lim_{x \to 0} \cos x = \lim_{x \to 0} \frac{1}{\cos x} = 1$$
(see Proposition 387), we have, by Lemma 386,
$$\lim_{x \to 0+} \frac{x}{\sin x} = 1.$$
Note that sin(−x) = −sin x (see the introduction to this section), and thus
$$\lim_{x \to 0} \frac{x}{\sin x} = 1.$$
□
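The squeeze used in the proof can be watched numerically. The following sketch is only an illustration under the usual floating-point caveats (for very small x the three quantities become indistinguishable in double precision); the helper name `squeeze_bounds` is ours.

```python
import math


def squeeze_bounds(x: float) -> tuple[float, float, float]:
    """Return (cos x, x / sin x, 1 / cos x) for 0 < x < pi/2."""
    return math.cos(x), x / math.sin(x), 1.0 / math.cos(x)


for n in range(0, 7):
    x = 10.0 ** (-n)
    lower, middle, upper = squeeze_bounds(x)
    # The middle quantity is trapped between two quantities that both tend to 1.
    print(f"x = {x:.0e}:  {lower:.15f} <= {middle:.15f} <= {upper:.15f}")
```

Both outer columns approach 1 as x decreases, and the middle column is forced to do the same, exactly as Lemma 386 predicts.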
Corollary 390 We have
$$\lim_{x \to 0} \frac{1 - \cos x}{x} = 0.$$
Proof Consider
$$\lim_{x \to 0} \frac{1 - \cos x}{x} = \lim_{x \to 0} \frac{(1 - \cos x)(1 + \cos x)}{x(1 + \cos x)} = \lim_{x \to 0} \frac{\sin^2 x}{x(1 + \cos x)} = \lim_{x \to 0} \frac{\sin x}{x}\,\sin x\,\frac{1}{1 + \cos x} = 0.$$
□
Proposition 391 The function sin x is differentiable on R, and we have
$$\frac{d \sin x}{dx} = \cos x.$$
Proof Using Corollary 385 and Proposition 389 we get
$$\lim_{h \to 0} \frac{\sin(x+h) - \sin x}{h} = \lim_{h \to 0} \frac{2}{h}\cos\frac{2x+h}{2}\,\sin\frac{x+h-x}{2} = \lim_{h \to 0} \frac{2}{h}\cos\left(x + \frac{h}{2}\right)\sin\frac{h}{2}$$
$$= \lim_{h \to 0} \cos\left(x + \frac{h}{2}\right) \lim_{h \to 0} \frac{\sin(h/2)}{h/2} = \cos x.$$
□
Proposition 392 The function cos x is differentiable on R, and we have
$$\frac{d \cos x}{dx} = -\sin x.$$
Proof From Corollary 382 we have cos x = sin(x + π/2) for every x ∈ R. Thus, by Proposition 391 and (iv) in Proposition 374, we have $\frac{d \cos x}{dx} = \cos(x + \pi/2) = -\sin x$. □
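Propositions 391 and 392 can be illustrated by computing forward difference quotients for shrinking h. This is a minimal numerical sketch, not a proof; `difference_quotient` is a helper name of ours, and the evaluation point x = 0.7 is an arbitrary choice.

```python
import math
from typing import Callable


def difference_quotient(func: Callable[[float], float], x: float, h: float) -> float:
    """Forward difference quotient (func(x + h) - func(x)) / h."""
    return (func(x + h) - func(x)) / h


x = 0.7
for h in (1e-1, 1e-2, 1e-3, 1e-4, 1e-5):
    err_sin = difference_quotient(math.sin, x, h) - math.cos(x)     # should tend to 0
    err_cos = difference_quotient(math.cos, x, h) + math.sin(x)     # should tend to 0
    print(f"h = {h:.0e}:  sin error = {err_sin:+.2e}   cos error = {err_cos:+.2e}")
```

The errors shrink roughly in proportion to h, consistent with the quotients converging to cos x and −sin x respectively.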
4.5 Finer Analysis of Continuity and Differentiability
4.5.1 Differentiability of the Inverse Mapping
We proved in Proposition 337 that the inverse mapping of a continuous function, if it exists, is continuous under the requirement of compactness of the domain. The following result addresses the question of differentiability. It is a special case of what is called the Inverse Mapping Theorem. Its hypothesis already ensures that the inverse mapping exists.
Theorem 393 Let f : [a, b] → R be a real-valued function. Assume that f is continuous on [a, b] and differentiable on (a, b). Assume, too, that either f′(x) > 0 for every x ∈ (a, b) or f′(x) < 0 for every x ∈ (a, b). Then the inverse mapping f⁻¹ exists from the interval f[a, b] onto [a, b], it is continuous on f[a, b], and it is differentiable at every interior point y of f[a, b]; moreover, at such y, we have (f⁻¹)′(y) = 1/f′(x), where x ∈ (a, b) is such that f(x) = y.
Proof Since f is continuous, by Corollary 340 the range f[a, b] is a closed and bounded interval, say [c, d]. Assume that f′(x) > 0 for all x ∈ (a, b) (a similar argument can be used if, instead, f′(x) < 0 for all x ∈ (a, b)). Proposition 371 shows then that f is strictly increasing on (a, b) and, by continuity, also on [a, b] (see Exercise 13.177). This proves, in particular, that f has an inverse function g := f⁻¹ defined on [c, d], and mapping [c, d] onto [a, b] in a one-to-one way. The continuity of g follows from Proposition 337. Let us show that g is differentiable on (c, d). Fix y ∈ (c, d). Let x be the (unique) point in (a, b) such that f(x) = y. Fix Δy ≠ 0 such that y + Δy ∈ (c, d). Use the differentiability of f at x: there exists a function u as in (4.14) (see Proposition 353). In order to simplify the expressions, put Δg := g(y + Δy) − g(y). Then we can write
$$\Delta y = f\big(g(y + \Delta y)\big) - f\big(g(y)\big) = f'\big(g(y)\big)\,\Delta g + \Delta g\, u(\Delta g) = \Delta g\,\Big(f'\big(g(y)\big) + u(\Delta g)\Big). \tag{4.37}$$
Fig. 4.34 The function tan x and its derivative on (−π/2, π/2)
Fig. 4.35 The function arctan x and its derivative on [-10,10] (Example 395)
Note that, due to the one-to-one character of g, we have Δg ≠ 0, and so, from (4.37), f′(g(y)) + u(Δg) ≠ 0. Again from (4.37), we get, having in mind that f′(g(y)) ≠ 0,
$$\Delta g = \frac{\Delta y}{f'\big(g(y)\big) + u(\Delta g)} = \frac{\Delta y}{f'\big(g(y)\big)} + \Delta y \left(\frac{1}{f'\big(g(y)\big) + u(\Delta g)} - \frac{1}{f'\big(g(y)\big)}\right).$$
In order to conclude that g is differentiable at y we need to show, by using Proposition 353, that the function
$$v(\Delta y) := \frac{1}{f'\big(g(y)\big) + u(\Delta g)} - \frac{1}{f'\big(g(y)\big)}$$
satisfies v(Δy) → 0 as Δy → 0. Note that, since g is continuous, we have Δg → 0 as Δy → 0. The result now clearly follows (proving, also, that g′(y) = 1/f′(g(y))). □
Remark 394 The fact that g′(y) = 1/f′(g(y)) is also a consequence of the Chain Rule (Proposition 374, (iv)), once we know that f⁻¹ is differentiable. Indeed, it is enough to apply the Chain Rule to Id₍c,d₎ = f ∘ f⁻¹, where Id₍c,d₎ denotes the identity mapping on the interval (c, d). ®
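The formula g′(y) = 1/f′(g(y)) can be checked on a function whose inverse has no convenient closed form. The sketch below makes several assumptions of ours: the sample function f(x) = x³ + x on [0, 2] (which satisfies the hypotheses of Theorem 393 since f′ > 0), a bisection routine named `inverse`, and a central difference to approximate the derivative of the inverse.

```python
def f(x: float) -> float:
    """Sample function with f'(x) = 3x^2 + 1 > 0; it maps [0, 2] onto [0, 10]."""
    return x ** 3 + x


def f_prime(x: float) -> float:
    return 3.0 * x * x + 1.0


def inverse(y: float, lo: float = 0.0, hi: float = 2.0, iterations: int = 80) -> float:
    """Approximate g(y) = f^{-1}(y) by bisection, using that f is strictly increasing."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


y = 3.0                      # an interior point of f[0, 2] = [0, 10]
x = inverse(y)               # the unique x in (0, 2) with f(x) = y
h = 1e-6
numeric = (inverse(y + h) - inverse(y - h)) / (2.0 * h)   # difference quotient of g at y
predicted = 1.0 / f_prime(x)                              # value given by Theorem 393
print(f"x = {x:.12f}, numeric g'(y) = {numeric:.9f}, 1/f'(x) = {predicted:.9f}")
```

The two printed derivative values agree to several decimal places, in line with the conclusion of the theorem.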
4.5.2 Inverse Goniometric Functions
Example 395 The function arctan x (see Fig. 4.35). The function tan x maps (−π/2, π/2) one-to-one continuously onto (−∞, ∞) with a continuous derivative that is positive everywhere, and so tan x is a strictly increasing function by Proposition 371 (see Fig. 4.34). Thus its inverse function arctan x is differentiable (see Theorem 393), it maps (−∞, ∞) onto the interval (−π/2, π/2), and is a strictly increasing function (see Fig. 4.35). We now calculate its derivative: For this we use again Theorem 393. First note that, for every x at which tan x is defined,
$$\frac{1}{\tan^2 x + 1} = \frac{\cos^2 x}{\sin^2 x + \cos^2 x} = \cos^2 x.$$
Since tan(arctan x) = x for all x ∈ R, by the Chain Rule ((iv) in Proposition 374) we get
$$\frac{1}{\cos^2(\arctan x)}\,\arctan' x = 1 \quad \text{for all } x \in \mathbb{R}.$$
Thus
$$\arctan' x = \cos^2(\arctan x) = \frac{1}{\tan^2(\arctan x) + 1} = \frac{1}{x^2 + 1}, \quad \text{for all } x \in \mathbb{R}.$$
♦
Fig. 4.36 The function arcsin x and its derivative (Example 396)
Example 396 The function arcsin x. The function sin x maps [−π/2, π/2] one-to-one continuously onto [−1, 1] with a continuous derivative cos x that is strictly positive everywhere in (−π/2, π/2) (see Fig. 4.30). This shows, by Proposition 371, that sin x is a strictly increasing function on [−π/2, π/2]. Thus its inverse function arcsin x is differentiable on (−1, 1) (thanks to Theorem 393) and maps [−1, 1] onto the interval [−π/2, π/2]. Moreover, it is a strictly increasing function (see Fig. 4.36). In order to compute its derivative, we use again Theorem 393. First note that
$$\cos^2 x = \frac{\cos^2 x}{\sin^2 x + \cos^2 x} = \frac{1}{\tan^2 x + 1}, \quad \text{for every } x \text{ at which } \tan x \text{ is defined.}$$
We have sin(arcsin x) = x, for x ∈ [−1, 1]. Hence, by the Chain Rule ((iv) in Proposition 374), cos(arcsin x) · arcsin′ x = 1, for all x ∈ (−1, 1). Thus
$$\arcsin' x = \frac{1}{\cos(\arcsin x)} = \frac{1}{\sqrt{1 - \sin^2(\arcsin x)}} = \frac{1}{\sqrt{1 - x^2}}, \quad \text{for all } x \in (-1, 1).$$
To decide on the derivative from the left of arcsin x at the point x = 1, we calculate, using L'Hôspital's Rule (Theorem 376),
$$\lim_{x \to 1-} \frac{\arcsin x - \frac{\pi}{2}}{x - 1} = \lim_{x \to 1-} \frac{1}{\sqrt{1 - x^2}} = +\infty.$$
Similarly we get
$$\lim_{x \to -1+} \frac{\arcsin x + \frac{\pi}{2}}{x + 1} = \lim_{x \to -1+} \frac{1}{\sqrt{1 - x^2}} = +\infty.$$
♦
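The derivative formulas of Examples 395 and 396, and the blow-up of the one-sided quotient at x = 1, can be observed numerically. The sketch below uses the standard-library functions `math.atan` and `math.asin`; the helper names `central_difference` and `left_quotient_at_one` are ours and purely illustrative.

```python
import math
from typing import Callable


def central_difference(func: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


def left_quotient_at_one(h: float) -> float:
    """(arcsin(1 - h) - pi/2) / ((1 - h) - 1) for small h > 0."""
    return (math.asin(1.0 - h) - math.pi / 2.0) / (-h)


for x in (-0.9, -0.5, 0.0, 0.5, 0.9):
    print(f"x = {x:+.1f}:  arctan' ~ {central_difference(math.atan, x):.6f} "
          f"vs {1.0 / (1.0 + x * x):.6f};  "
          f"arcsin' ~ {central_difference(math.asin, x):.6f} "
          f"vs {1.0 / math.sqrt(1.0 - x * x):.6f}")

for h in (1e-2, 1e-4, 1e-6):
    # The one-sided quotient grows without bound as h decreases.
    print(f"h = {h:.0e}:  left quotient at 1 = {left_quotient_at_one(h):.1f}")
```

The first loop compares the difference quotients with 1/(x² + 1) and 1/√(1 − x²); the second shows the quotient at the endpoint growing as h shrinks, matching the limit +∞ computed above.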
4.5.3 Monotone Functions
In Proposition 309 we showed that monotone functions have one-sided limits at any point. However, the left-hand and the right-hand limits may differ. This can happen only at "few" points, which is the content of the next result.
Proposition 397 Let f : I → R be a monotone function defined on an open interval I in R. Then the set of discontinuities of f on I is at most countable, and these discontinuities are jump discontinuities.
Proof Assume, without loss of generality, that f is increasing. Let a ∈ I be a point of discontinuity of f. By Proposition 309, both $f(a-) := \lim_{x \to a-} f(x)$ and $f(a+) := \lim_{x \to a+} f(x)$ exist. Since f is discontinuous at a, we must have f(a−) < f(a+), and so a is a jump discontinuity. Note, too, that f(x+) ≤ f(y−) whenever x, y ∈ I, x < y. Therefore, the family of nonempty open intervals F := {(f(a−), f(a+)) : f is discontinuous at a} is pairwise disjoint. It is enough to choose a rational point in each of them to define a one-to-one mapping from F into Q. This shows that F is countable. □
Recall the geometric construction of the Cantor ternary set C after Definition 343. At the stage n ∈ N we removed from I := [0, 1] a number $2^{n-1}$ of open intervals, say $I_{n,k}$, for $k = 1, 2, \ldots, 2^{n-1}$.
Definition 398 The Lebesgue singular function S : [0, 1] → R is defined by
$$S(x) := \begin{cases} \dfrac{2k-1}{2^n} & \text{if } x \in I_{n,k} \text{ for some } n \in \mathbb{N},\ k \in \{1, 2, \ldots, 2^{n-1}\},\\[2mm] \sup\{S(t) : t \in I \setminus C,\ t < x\} & \text{if } x \in C. \end{cases}$$
Fig. 4.37 The first steps in the construction of the Lebesgue singular function S
Fig. 4.38 The Lebesgue singular function S (i.e., the devil's staircase)
The graph of the Lebesgue singular function looks like a staircase and is sometimes referred to as the devil's staircase (see Fig. 4.38).
Proposition 399 The Lebesgue singular function S is a continuous increasing function mapping the interval I := [0, 1] onto itself. The function S is differentiable at x ∈ I if and only if x ∉ C (where C denotes the Cantor ternary set), and in this case S′(x) = 0.
Proof Clearly S is an increasing function. Assume that x ∈ [0, 1] \ C. Then x belongs to one of the removed open intervals $I_{n,k}$, and S is constant on this interval. Thus S is differentiable (in particular, continuous) at x and S′(x) = 0. The key observation is that
$$\text{if } x = \sum_{n=1}^{\infty} \frac{a_n}{3^n} \in C,\ a_n \in \{0, 2\} \text{ for all } n \in \mathbb{N}, \text{ then } S(x) = \sum_{n=1}^{\infty} \frac{a_n}{2^{n+1}}, \tag{4.38}$$
and this implies that S must be continuous at any x ∈ [0, 1]. Indeed, it remains to check continuity at any x ∈ C. Put $x = \sum_{n=1}^{\infty} a_n/3^n$, where $a_n \in \{0, 2\}$. Fix ε > 0. We can find then $n_0 \in \mathbb{N}$ such that $2^{1-n_0} < \varepsilon$. Put $\delta := 3^{-n_0}$. Let y ∈ C such that 0 < |y − x| < δ, and write $y = \sum_{n=1}^{\infty} b_n/3^n$, where $b_n \in \{0, 2\}$. Let n ∈ N be the first natural number such that $b_n \neq a_n$. Then we have
$$\frac{1}{3^{n_0}} = \delta > |y - x| = \left|\frac{b_n - a_n}{3^n} + \sum_{k=n+1}^{\infty} \frac{b_k - a_k}{3^k}\right| \ge \frac{2}{3^n} - \sum_{k=n+1}^{\infty} \frac{2}{3^k} = \frac{2}{3^n} - \frac{1}{3^n} = \frac{1}{3^n},$$
and this implies $n > n_0$. Thus,
$$|S(y) - S(x)| = \left|\sum_{k=1}^{\infty} \frac{b_k - a_k}{2^{k+1}}\right| = \left|\sum_{k=n}^{\infty} \frac{b_k - a_k}{2^{k+1}}\right| \le \sum_{k=n}^{\infty} \frac{2}{2^{k+1}} = \frac{1}{2^{n-1}} < \frac{1}{2^{n_0-1}} < \varepsilon.$$
Fix now two elements y1 and y2 in C such that y1 < x < y2 and |yi − x| < δ for i = 1, 2. Then |S(yi) − S(x)| < ε for i = 1, 2. Since S is increasing, given y ∈ [0, 1] such that y1 < y < y2 we also have |S(y) − S(x)| < ε, and this proves the continuity of S at x. Note that, as a consequence, the image of [0, 1] by S is again [0, 1].
We now show that S fails to be differentiable at each point of C. To this end, fix x = 0.a1a2a3 ⋯ (base 3) ∈ C, i.e., $x = \sum_{n=1}^{\infty} a_n/3^n$. Given n ∈ N, look at $a_n$. If $a_n = 0$, then put $b_n = 2$; otherwise, put $b_n = 0$. Let $y = 0.a_1 a_2 a_3 \cdots a_{n-1} b_n a_{n+1} \cdots$ (base 3). In the first case (i.e., if $a_n = 0$), the element y satisfies x < y and $y - x = 2/3^n$, hence
$$\frac{S(y) - S(x)}{y - x} = \frac{1/2^n}{2/3^n} = \frac{1}{2}\left(\frac{3}{2}\right)^n.$$
In the second case, y < x, and $y - x = -2/3^n$, hence we have, again,
$$\frac{S(y) - S(x)}{y - x} = \frac{1/2^n}{2/3^n} = \frac{1}{2}\left(\frac{3}{2}\right)^n.$$
Letting n → ∞ we obtain a contradiction with the differentiability of S at x, since the sequence $\{(3/2)^n\}_{n=1}^{\infty}$ does not converge.
□
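Formula (4.38) suggests a direct way to evaluate S from ternary digits. The following is a minimal sketch under assumptions of ours: the function name `cantor`, the use of exact `Fraction` arithmetic, and a truncation after `depth` ternary digits (exact for the points used below, approximate in general). It reproduces the difference quotients (1/2)(3/2)ⁿ at the point x = 0 of C.

```python
from fractions import Fraction


def cantor(x: Fraction, depth: int = 64) -> Fraction:
    """Evaluate the Lebesgue singular function S at x in [0, 1] from ternary digits.

    Digits 0 and 2 contribute binary digits 0 and 1 (formula (4.38)); the first
    digit 1 means that x lies in the closure of a removed interval, where S is constant.
    """
    if x == 1:
        return Fraction(1)
    value = Fraction(0)
    weight = Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = x.numerator // x.denominator   # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            return value + weight
        value += weight * (digit // 2)
        weight /= 2
    return value


x0 = Fraction(0)                                # a point of C (all ternary digits are 0)
for n in (1, 5, 10, 20):
    y = Fraction(2, 3 ** n)                     # flips the n-th ternary digit of x0
    quotient = (cantor(y) - cantor(x0)) / (y - x0)
    print(f"n = {n:2d}:  difference quotient = {float(quotient):.3f}")
```

The printed quotients grow like (3/2)ⁿ, which is the obstruction to differentiability at points of C used at the end of the proof.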
4.5.4 Measurable Functions
The following definition introduces a class of functions that play an important role in the theory of measure and integration. The functions in this class form the natural objects to which any reasonable integrability procedure should apply (still with no guarantee of getting a finite value). Since we shall consider, in the future, functions that are defined as limits of sequences of other functions, it is natural to allow from the beginning functions that take values in the extended real number system R∗ (see Definition 31). The next definition is formulated for a general σ-algebra Σ of subsets of R (see Definition 250). Mostly we shall be interested in the case that Σ = M (the σ-algebra of all Lebesgue measurable subsets of R, see Definition 245) or Σ = B (the σ-algebra of all Borel subsets of R, see Definition 264).
Definition 400 Given a σ-algebra Σ of subsets of R, and an element S ∈ Σ, a function f : S → R∗ is said to be Σ-measurable if, for each r ∈ R, the set {x ∈ S : f(x) < r} belongs to Σ. If Σ is the σ-algebra of all Lebesgue measurable (Borel) subsets of R, such a function is said to be Lebesgue measurable (respectively, a Borel function).
Remark 401
1. Note that any Borel function (in particular, any continuous function, see Corollary 405 below) is Lebesgue measurable. This is a consequence of Proposition 265.
2. If Σ is a given σ-algebra of subsets of R, observe that the characteristic function χS of a subset S of R that does not belong to Σ is not Σ-measurable. ®
The following result collects some features and some stability properties of Σ-measurable functions. All algebraic and analytic manipulations regarding functions in the following proposition are understood to be defined pointwise, i.e., (f + g)(x) := f(x) + g(x), (f²)(x) := (f(x))², and so on, for every x in the domain.
Proposition 402 Let Σ be a σ-algebra of subsets of R. Let S ∈ Σ. Let f : S → R∗ be a function and let r ∈ R, a ∈ R and b ∈ R such that a < b.
(a) If f is Σ-measurable, then
(i) {x ∈ S : f(x) ≤ r} ∈ Σ,
(ii) {x ∈ S : f(x) > r} ∈ Σ,
(iii) {x ∈ S : f(x) ≥ r} ∈ Σ,
(iv) {x ∈ S : a < f(x) < b} ∈ Σ,
(v) {x ∈ S : a ≤ f(x) ≤ b} ∈ Σ,
(vi) {x ∈ S : f(x) = r} ∈ Σ,
(vii) {x ∈ S : f(x) = −∞} ∈ Σ,
(viii) {x ∈ S : f(x) = +∞} ∈ Σ.
Conversely, if one of (i), (ii), (iii) holds for every r ∈ R, then f is Σ-measurable. If one of (vii) or (viii) holds, and (iv) holds for every a, b ∈ R such that a < b, then f is Σ-measurable. Analogously, if one of (vii) or (viii) holds, and (v) holds for every a, b ∈ R such that a < b, then f is Σ-measurable.
(b) If g : S → R∗ is another Σ-measurable function, then the following functions, when defined, are Σ-measurable:
(i) f + g,
(ii) f²,
(iii) fg,
(iv) cf, for c ∈ R,
(v) max{f, g},
(vi) min{f, g}.
(c) If $\{f_n\}_{n=1}^{\infty}$ is a sequence of Σ-measurable functions, then $\limsup_{n \to \infty} f_n$ and $\liminf_{n \to \infty} f_n$ are Σ-measurable.
Proof
(ai) $\{x \in S : f(x) \le r\} = \bigcap_{n=1}^{\infty} \{x \in S : f(x) < r + 1/n\}$.
(aii) $\{x \in S : f(x) > r\} = \{x \in S : f(x) \le r\}^c$.
(aiii) $\{x \in S : f(x) \ge r\} = \{x \in S : f(x) < r\}^c$.
(aiv) $\{x \in S : a < f(x) < b\} = \{x \in S : f(x) > a\} \cap \{x \in S : f(x) < b\}$.
(av) $\{x \in S : a \le f(x) \le b\} = \{x \in S : f(x) \ge a\} \cap \{x \in S : f(x) \le b\}$.
(avi) $\{x \in S : f(x) = r\} = \{x \in S : f(x) \le r\} \cap \{x \in S : f(x) \ge r\}$.
(avii) $\{x \in S : f(x) = -\infty\} = \bigcap_{n=1}^{\infty} \{x \in S : f(x) < -n\}$.
(aviii) $\{x \in S : f(x) = +\infty\} = \bigcap_{n=1}^{\infty} \{x \in S : f(x) > n\}$.
Assume now that for every r ∈ R the set {x ∈ S : f(x) ≤ r} belongs to Σ. Then the set {x ∈ S : f(x) < r} also belongs to Σ, due to the fact that $\{x \in S : f(x) < r\} = \bigcup_{n=1}^{\infty} \{x \in S : f(x) \le r - 1/n\}$. This shows that f is Σ-measurable. If for every r ∈ R the set {x ∈ S : f(x) > r} belongs to Σ, so does the set {x ∈ S : f(x) ≤ r}, and the result follows from the preceding argument. If {x ∈ S : f(x) ≥ r} belongs to Σ for every r ∈ R, so does {x ∈ S : f(x) < r} for every r ∈ R.
Assume now that (vii) holds, i.e., {x ∈ S : f(x) = −∞} ∈ Σ, and that (iv) holds for every a, b ∈ R such that a < b. Since
$$\{x \in S : f(x) < r\} = \bigcup_{n=1}^{\infty} \{x \in S : r - n < f(x) < r\} \cup \{x \in S : f(x) = -\infty\},$$
we get that {x ∈ S : f(x) < r} ∈ Σ for every r ∈ R, and the function f is Σ-measurable. Assume that (viii) holds and that (iv) holds for every a, b ∈ R such that a < b. Since
$$\{x \in S : f(x) > r\} = \bigcup_{n=1}^{\infty} \{x \in S : r < f(x) < r + n\} \cup \{x \in S : f(x) = +\infty\},$$
we get that {x ∈ S : f(x) > r} ∈ Σ for every r ∈ R, and the function f is Σ-measurable by the argument above. We proceed analogously if (iv) is replaced by (v).
(bi) If f(x) + g(x) < r then there exists q ∈ Q such that f(x) < q and g(x) < r − q. Indeed, f(x) < r − g(x), and it suffices to take q ∈ Q such that f(x) < q < r − g(x). Hence
$$\{x \in S : f(x) + g(x) < r\} = \bigcup_{q \in \mathbb{Q}} \big(\{x \in S : f(x) < q\} \cap \{x \in S : g(x) < r - q\}\big).$$
(bii) For r > 0, $\{x \in S : f^2(x) < r\} = \{x \in S : f(x) < \sqrt{r}\} \cap \{x \in S : f(x) > -\sqrt{r}\}$; for r ≤ 0 the set is empty.
(biii) $fg = (1/4)\big((f + g)^2 - (f - g)^2\big)$.
(biv) Assume c ≠ 0. Then cf(x) < r if and only if
$$\begin{cases} f(x) < r/c, & \text{if } c > 0,\\ f(x) > r/c, & \text{if } c < 0. \end{cases}$$
(bv) {x ∈ S : max{f, g} < r} = {x ∈ S : f(x) < r} ∩ {x ∈ S : g(x) < r}.
(bvi) {x ∈ S : min{f, g} < r} = {x ∈ S : f(x) < r} ∪ {x ∈ S : g(x) < r}.
(c) Let $f := \limsup_{n \to \infty} f_n$. Then f(x) ≤ r if and only if, for all k ∈ N, we have $f_n(x) \le r + (1/k)$ for all n ∈ N big enough. Thus
$$\{x \in S : f(x) \le r\} = \bigcap_{k=1}^{\infty} \bigcup_{n_0=1}^{\infty} \bigcap_{n=n_0+1}^{\infty} \{x \in S : f_n(x) \le r + (1/k)\}.$$
Analogously, if $g := \liminf_{n \to \infty} f_n$, we have
$$\{x \in S : g(x) \ge r\} = \bigcap_{k=1}^{\infty} \bigcup_{n_0=1}^{\infty} \bigcap_{n=n_0+1}^{\infty} \{x \in S : f_n(x) \ge r - (1/k)\}.$$
□
Remark 403 Note that the fact that (avi) above holds for all r ∈ R∗ does not imply in general that f is Σ-measurable. Indeed, there exists a Lebesgue nonmeasurable subset S of (0, 1) (see Sect. 3.1.6). Define a mapping f : R → R by
$$f(x) := \begin{cases} x & \text{if } x \in S,\\ 0 & \text{if } x \le 0, \text{ and}\\ -x & \text{if } x > 0,\ x \notin S. \end{cases}$$
Note that f⁻¹(r) is either a singleton or the empty set for every r ∈ R∗ \ {0}, and that f⁻¹(0) = (−∞, 0]. However, {x ∈ R : f(x) > 0} = S, and S is Lebesgue nonmeasurable. It follows from Proposition 402 that the function f is not Lebesgue measurable. ®
Proposition 404 Let Σ be a σ-algebra of subsets of R. Let S ∈ Σ. A function f : S → R∗ is Σ-measurable if and only if, for every open set G in R, the set f⁻¹(G) belongs to Σ.
Proof One direction is obvious, since the set (−∞, r) is open for every r ∈ R. To prove the other implication, assume that f is Σ-measurable and G is an open set in R. Write $G = \bigcup_{n=1}^{\infty} I_n$, where $\{I_n\}_{n=1}^{\infty}$ is a sequence of pairwise disjoint open intervals in R (see Proposition 99). Then $f^{-1}(G) = \bigcup_{n=1}^{\infty} f^{-1}(I_n)$, and $f^{-1}(I_n) \in \Sigma$ for each n ∈ N, due to (a) in Proposition 402. Since Σ is a σ-algebra we get f⁻¹(G) ∈ Σ. □
Corollary 405 A real-valued continuous function f defined on an open subset of R is a Borel function.
Proof The conclusion follows from Corollary 326 and Proposition 404. □
Corollary 406 Let Σ be a σ-algebra of subsets of R, let S ∈ Σ and let f : S → R be a Σ-measurable function. If ϕ : R → R is continuous, then ϕ ∘ f is Σ-measurable.
Proof Observe that (ϕ ∘ f)⁻¹ = f⁻¹ ∘ ϕ⁻¹. Let G ⊂ R be an open set. By Corollary 326 the set ϕ⁻¹(G) is open. In view of Proposition 404, the set f⁻¹(ϕ⁻¹(G)) belongs to Σ, and the conclusion follows by applying again Proposition 404. □
Fig. 4.39 One of the step functions sn (Proposition 408)
Corollary 407 Let Σ be a σ-algebra of subsets of R, let S ∈ Σ and let f : S → R be a Σ-measurable function. Then the function |f| is Σ-measurable.
Given a σ-algebra Σ of subsets of R, Proposition 408 below provides an important characterization of Σ-measurable real-valued functions defined on an element S in Σ. If S ∈ Σ, a Σ-measurable step function s : S → R is a Σ-measurable function defined on S that takes only finitely many values. Equivalently, there exist a partition $\{S_1, \dots, S_n\}$ of S (i.e., a (in this case, finite) family of pairwise disjoint subsets whose union is S) such that $S_i \in \Sigma$ for all i = 1, 2, ..., n, and real numbers $r_1, \dots, r_n$ such that $s(x) = r_i$ if $x \in S_i$, i = 1, 2, ..., n.
Proposition 408 Let Σ be a σ-algebra of subsets of R, let S ∈ Σ, and let f : S → R be a function. Then f is Σ-measurable if and only if there exists a sequence $\{s_n\}_{n=1}^{\infty}$ of Σ-measurable step functions defined on S such that $\lim_{n\to\infty} s_n(x) = f(x)$ for every x ∈ S.
Proof Assume first that f is Σ-measurable. Given n ∈ N, put
$$E_{n,k} := f^{-1}\Bigl(\Bigl[\frac{k}{n}, \frac{k+1}{n}\Bigr)\Bigr), \quad \text{for } k \in \mathbb{Z}.$$
The family $\{E_{n,k} : k \in \mathbb{Z}\}$ is in Σ (see Proposition 402), it is pairwise disjoint, and covers S. Define, for n ∈ N, the Σ-measurable step function
$$s_n(x) := \begin{cases} \dfrac{k}{n} & \text{if } x \in E_{n,k},\ |k| \le n^2 + 1, \\ 0 & \text{otherwise.} \end{cases}$$
Fix x ∈ S and ε > 0. There exists $n_0 \in \mathbb{N}$ such that $|f(x)| \le n_0$ and $1/n_0 < \varepsilon$. Given n ∈ N such that $n \ge n_0$ there exists a unique k ∈ Z such that $x \in E_{n,k}$, i.e., $k \le n f(x) < k + 1$. This implies $|n f(x) - k| < 1$. Thus,
$$|k| = |k - n f(x) + n f(x)| \le n|f(x)| + |n f(x) - k| < n|f(x)| + 1 \le n_0 n + 1 \le n^2 + 1.$$
It follows that $s_n(x) = k/n$ and so $|f(x) - s_n(x)| < 1/n < \varepsilon$. This shows that $s_n(x) \to f(x)$ as n → ∞.
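The construction of $s_n$ in the first half of the proof can be tried out numerically. The following is a minimal Python sketch, not part of the original text: the helper name `step_approx` and the sample function `f` are illustrative assumptions. It builds $s_n(x) = k/n$ with $k = \lfloor n f(x) \rfloor$ when $|k| \le n^2 + 1$, and 0 otherwise.

```python
import math

def step_approx(func, n):
    """Return the step function s_n built from func as in the proof of Proposition 408."""
    def s_n(x):
        k = math.floor(n * func(x))          # the unique k with k/n <= func(x) < (k+1)/n
        return k / n if abs(k) <= n * n + 1 else 0.0
    return s_n

def f(x):
    # Sample function, chosen only for illustration.
    return x * x - 2.0

sample_points = [i / 10 for i in range(-20, 21)]   # points of [-2, 2]
for n in (2, 5, 50):
    s = step_approx(f, n)
    worst = max(abs(f(x) - s(x)) for x in sample_points)
    print(n, worst)   # the worst error on the sample stays below 1/n
```

On these sample points the error is below 1/n once n dominates |f(x)|, in line with the estimate in the proof. Floating-point rounding means this is an illustration only, not a verification.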
Conversely, assume that f is the pointwise limit of a sequence of Σ-measurable step functions defined on S. Then, the measurability of f follows from (c) in Proposition 402. □
Remark 409 Recall that a function $f : D(f) \to \mathbb{R}$, where D(f) is a nonempty subset of R, is continuous if and only if the preimage of any open subset of R is open relative to D(f) (see Corollary 326). There are real-valued continuous functions defined on R such that the preimage of a measurable set is nonmeasurable, see, e.g., [Stromb81, p. 309]. ®
Remark 410 If f is a measurable function on R, it may be discontinuous at every point of R. As an example, consider the Dirichlet function D introduced in Definition 296. This function is measurable, since $D^{-1}(-\infty, \alpha)$ is either the empty set, Q, or R, all of them measurable sets (see Proposition 402). The function D is discontinuous at each point x ∈ R (see Example 4.1.3.3). However, for every ε > 0 there is an open set $O_\varepsilon$ of Lebesgue measure less than ε such that, restricted to its complement, the Dirichlet function is continuous (even constant). Indeed, it is enough to put $O_\varepsilon := \bigcup_{n=1}^{\infty} (q_n - \varepsilon_n, q_n + \varepsilon_n)$, where $\mathbb{Q} \cap [0, 1] = \{q_n : n \in \mathbb{N}\}$ and $\{\varepsilon_n\}_{n=1}^{\infty}$ is a sequence of positive real numbers such that $\sum_{n=1}^{\infty} \varepsilon_n < \varepsilon/2$. This fact is reflected in the next important Theorem 411, due to the Russian mathematician N. N. Luzin. ®
Note that a base for the topology of R (i.e., a family of open sets such that every other open set can be written as a union of elements in the base) is, for example, the collection of all open and bounded intervals with rational endpoints. This gives a countable base, which we can write as $\{U_n : n \in \mathbb{N}\}$. It follows immediately from Corollary 326 that a function f : R → R is continuous if and only if $f^{-1}(U_n)$ is open for each n ∈ N.
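The open cover of the rationals used in Remark 410 can be made concrete for a finite initial segment of an enumeration. The Python sketch below is only an illustration: the enumeration by increasing denominator, the choice $\varepsilon_n = \varepsilon/2^{n+2}$, and the function names are assumptions made here, not choices taken from the text. Exact rational arithmetic is used so that the length bound is checked without rounding.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """First `count` rationals of [0, 1], enumerated by increasing denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(eps, count):
    """Intervals (q_n - eps_n, q_n + eps_n) with eps_n = eps / 2**(n + 2), n = 1, 2, ..."""
    qs = rationals_in_unit_interval(count)
    return [(q - eps / 2 ** (n + 2), q + eps / 2 ** (n + 2))
            for n, q in enumerate(qs, start=1)]

eps = Fraction(1, 100)
intervals = cover(eps, 200)
total_length = sum(right - left for left, right in intervals)
print(total_length < eps / 2)   # the summed lengths stay below eps/2, hence below eps
```

With this choice the lengths $2\varepsilon_n$ form a geometric series with sum $\varepsilon/2$, so every finite partial sum is strictly smaller than ε.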
Theorem 411 (Luzin) A real-valued function f defined on R is Lebesgue measurable if and only if, for each ε > 0, there exists an open set E ⊂ R such that λ(E) ≤ ε and $f|_{E^c}$, i.e., the restriction of f to the complement of the set E, is continuous.
Proof Let $\{U_n\}_{n=1}^{\infty}$ be a base of the topology of R. Assume first that f is measurable. Fix ε > 0. Given n ∈ N, the set $f^{-1}(U_n)$ is Lebesgue measurable (this follows from Proposition 404). Hence, by using Proposition 404, we can find a closed set $F_n$ and an open set $G_n$ in R such that $F_n \subset f^{-1}(U_n) \subset G_n$, and $\lambda(G_n \setminus F_n) < \varepsilon/2^n$. Put $E := \bigcup_{n=1}^{\infty} (G_n \setminus F_n)$. Observe that, due to Proposition 236, we have λ(E) < ε. Given n ∈ N,
$$(f|_{E^c})^{-1}(U_n) = f^{-1}(U_n) \cap E^c = G_n \cap E^c,$$
and this shows that $(f|_{E^c})^{-1}(U_n)$ is open relative to $E^c$. Since this holds for every n ∈ N, the observation in the paragraph preceding the statement of this theorem shows that the function $f|_{E^c} : E^c \to \mathbb{R}$ is continuous.
Assume now that the property holds. We can find, then, a sequence $\{E_n\}$ of open subsets of R such that $\lambda(E_n) < 1/n$ and such that $f|_{E_n^c}$ is continuous. Let U be
an open subset of R. Given n ∈ N, the set $f^{-1}(U) \cap E_n^c$ is open relative to $E_n^c$, due to Corollary 326. Therefore, we can find an open subset $G_n$ of R such that $G_n \cap E_n^c = f^{-1}(U) \cap E_n^c$. Put $E := \bigcap_{n=1}^{\infty} E_n$, so $E^c = \bigcup_{n=1}^{\infty} E_n^c$ (observe that E is a null set). Therefore,
$$f^{-1}(U) \cap E^c = \bigcup_{n=1}^{\infty} \bigl(f^{-1}(U) \cap E_n^c\bigr) = \bigcup_{n=1}^{\infty} \bigl(G_n \cap E_n^c\bigr),$$
and so $f^{-1}(U) \cap E^c$ is Lebesgue measurable. Since $f^{-1}(U) = (f^{-1}(U) \cap E^c) \cup (f^{-1}(U) \cap E)$, and $f^{-1}(U) \cap E$ is a null set (hence Lebesgue measurable), we get that $f^{-1}(U)$ is Lebesgue measurable. This shows, in the light of Proposition 404, that f is Lebesgue measurable. □
Corollary 412 Let f be a Lebesgue measurable real-valued function defined on R. Then, for every ε > 0 there is a continuous function g on R such that the set $\{x \in \mathbb{R} : g(x) \ne f(x)\}$ has Lebesgue measure less than ε.
Proof Given ε > 0 we can find, by Theorem 411, an open subset E of R with the stated property. The function $f|_{E^c}$ is continuous, and $E^c$ is a closed set. We can apply Tietze’s extension Theorem 566 below (a simple proof for the real line is given in Exercise 13.186) to obtain a continuous extension g of $f|_{E^c}$ to all R. This function satisfies the conclusion. □
Remark 413 It is natural to ask whether Luzin’s Theorem 411 holds for ε = 0, i.e., whether a real-valued function on R is Lebesgue measurable if and only if there exists an open null subset E of R such that $f|_{E^c}$ is continuous. This is false, and the reader may check Exercise 13.151, where a dramatic counterexample is exhibited: There is a real-valued measurable function f on R such that for no null set A the restriction of f to R \ A has a point of continuity. ®
Compare the next result, due to the Russian mathematician N. N. Luzin, with Proposition 408.
Corollary 414 (Luzin) A real-valued function f defined on R is Lebesgue measurable if and only if there is a sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued continuous functions on R such that $\lim_{n\to\infty} f_n(x) = f(x)$ for almost all x ∈ R.
Proof Assume first that f is Lebesgue measurable on R. Fix a base $\{U_k\}_{k=1}^{\infty}$ of the topology of R. Fix n ∈ N, and find an open set $E_n \subset \mathbb{R}$ as in the proof of the sufficient condition in Theorem 411 for $\varepsilon = 2^{-n}$.
Let $f_n$ be a continuous extension to R of $f|_{E_n^c}$ (use Tietze’s Theorem 566; as we mentioned above, a simple proof for the case of functions on R can be found in Exercise 13.186). Put, for k ∈ N,
$$H_k = \bigcup_{n=k}^{\infty} E_n.$$
Then $\lambda(H_k) \le \sum_{n=k}^{\infty} 2^{-n} = 2^{-k+1}$. We have $H_1 \supset H_2 \supset \dots$ Put
$$H = \bigcap_{k=1}^{\infty} H_k.$$
Then
$$\lambda(H) \le \lambda(H_k) \le 2^{-k+1}$$
for every k ∈ N, and thus λ(H) = 0. We claim that if $x \notin H$, then $\lim f_n(x) = f(x)$. This would finish the proof. To verify the claim note that if $x \notin H$, then there is $k_0$ such that $x \notin H_{k_0}$, i.e., $x \notin \bigcup_{n=k_0}^{\infty} E_n$, i.e., $x \in \bigcap_{n=k_0}^{\infty} E_n^c$. Thus, for each $n \ge k_0$ we have $f_n(x) = f(x)$. This shows that for each $x \notin H$ we have $\lim f_n(x) = f(x)$.
Assume now that $\{f_n\}_{n=1}^{\infty}$ is a sequence of real-valued continuous functions on R such that for some S ⊂ R of Lebesgue measure 0 we have $\lim f_n(x) = f(x)$ for all x ∈ R \ S. If r ∈ R, then
$$\{x \in \mathbb{R} : f(x) < r\} = \bigcup_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{k \ge n} \{x \in \mathbb{R} \setminus S : f_k(x) < r - (1/m)\} \ \cup\ \{x \in S : f(x) < r\}.$$
The first set in the previous union is Lebesgue measurable (see Proposition 402), while the second one is Lebesgue measurable because it is a subset of a set of Lebesgue measure zero (see Proposition 247). Thus f is Lebesgue measurable on R by Proposition 252. □
Remark 415 We note that for the Lebesgue measurable Dirichlet function f (see Definition 296) there does not exist a sequence $\{f_n\}_{n=1}^{\infty}$ of continuous functions on R such that $\lim_{n\to\infty} f_n(x) = f(x)$ for every x ∈ R (see Example 458.1 and Exercise 13.305). ®
We proved in Proposition 265 that the σ-algebra of the Borel subsets of R is contained in the σ-algebra of the Lebesgue measurable subsets of R. It is natural to ask whether these two σ-algebras coincide. The answer is “no,” and this is shown in the next result.
Proposition 416 There exists a Lebesgue measurable subset of R that is not a Borel set.
Proof Let V be a nonmeasurable Vitali subset of the interval I := [0, 1] constructed in Lemma 283. Let S be the Lebesgue singular function (Definition 398). Note that V consists of some irrational numbers together with a single rational number r. Note, too, that S takes elements in the open removed intervals $I_{n,k}$ (see the construction of C in Sect. 3.1.5) to rational values. Therefore, $S^{-1}(V)$ is the union of a subset Z of C and, at most, one of the intervals $I_{n,k}$.
Since Z is a subset of a null set, Z is measurable. The restriction $S|_C$ of S to C is one-to-one (observe the formula (4.38) for S on C). Since S is continuous it follows from Proposition 337 that $S|_C$ is a homeomorphism (see the definition in Remark 105). Assume that Z is Borel. Then S(Z) should be Borel, too. However, S(Z) is either V or V \ {r}. Neither of those sets is a Borel set. □
The construction in the proof of Proposition 416 gives the following result.
Corollary 417 There exists a continuous one-to-one function f from the Cantor ternary set C into R and a (measurable) subset Z of C such that f (Z) is not measurable.
4.5.5 Differentiability of Monotone Functions
We know that monotone functions are continuous everywhere possibly up to a countable set (see Propositions 397 and 399). Recall that the Lebesgue singular function (Definition 398) is monotone on the interval [0, 1] and differentiable (a.e.) (see Proposition 399). That this is a general behavior of monotone functions is a result of H. Lebesgue (Theorem 424). To prove this statement we need a result due to G. Vitali, referred to as Vitali’s Cover Theorem, and the definition of Dini’s derivatives (a concept named after the Italian mathematician U. Dini). The concept of cover was introduced in Sect. 1.8.4. Let us start by proving Vitali’s Cover Theorem 420 below. To this end we will begin with a crucial lemma.
Lemma 418 (Vitali) If E is a subset of R with $\lambda^*(E) < +\infty$ and $\mathcal{I}$ is a cover of E by closed intervals of positive length, then there exists a finite number of pairwise disjoint elements $I_1, I_2, \dots, I_n$ in $\mathcal{I}$ such that
$$\sum_{k=1}^{n} \lambda(I_k) \ge \frac{1}{6} \lambda^*(E). \qquad (4.39)$$
Proof If $\lambda^*(E) = 0$ there is nothing to prove. So, assume that $\lambda^*(E) > 0$. Let $\lambda_1 := \sup\{\lambda(I) : I \in \mathcal{I}\}$. Assume first that $\lambda_1 = +\infty$. Then we can find a sequence $\{J_n\}_{n=1}^{\infty}$ of elements in $\mathcal{I}$ such that $\lambda(J_n) \to +\infty$. It is enough to choose n ∈ N such that $\lambda(J_n) \ge (1/6)\lambda^*(E)$; then (4.39) holds (for a single interval $I_1 := J_n$).
Assume now that $\lambda_1 < +\infty$ and put $\mathcal{I}_1 := \mathcal{I}$. We may find $I_1 \in \mathcal{I}_1$ such that $\lambda(I_1) > (1/2)\lambda_1$. Split the family $\mathcal{I}_1$ into two subfamilies, namely $\mathcal{I}_2$ (consisting of the intervals in $\mathcal{I}_1$ that are disjoint from $I_1$) and $\mathcal{J}_2$ (consisting of the intervals in $\mathcal{I}_1$ that intersect $I_1$). Denote by $I_1^*$ the interval in R that has the same center as $I_1$ and satisfies $\lambda(I_1^*) = 5\lambda(I_1)$. Since $2\lambda(I_1) > \lambda_1$, we obtain that $I \subset I_1^*$ for all $I \in \mathcal{J}_2$. We have two possibilities:
(a) If $\mathcal{I}_2 = \emptyset$, put $\lambda_2 := 0$; observe that then $\mathcal{J}_2 = \mathcal{I}_1$, hence $E \subset I_1^*$. It follows that $\lambda^*(E) \le \lambda(I_1^*) = 5\lambda(I_1)$, hence $\lambda(I_1) \ge (1/5)\lambda^*(E)$ and (4.39) holds (for a single interval $I_1$).
(b) If $\mathcal{I}_2 \ne \emptyset$, then we define $\lambda_2 := \sup\{\lambda(I) : I \in \mathcal{I}_2\}$ and choose $I_2 \in \mathcal{I}_2$ such that $\lambda(I_2) > (1/2)\lambda_2$. Observe that $I_2$ is disjoint from $I_1$. We split $\mathcal{I}_2$ into two subfamilies, namely $\mathcal{I}_3$ (consisting of the intervals in $\mathcal{I}_2$ disjoint from $I_2$), and $\mathcal{J}_3$ (consisting of the intervals in $\mathcal{I}_2$ that intersect $I_2$). Denote by $I_2^*$ the interval in R that has the same center as $I_2$ and satisfies $\lambda(I_2^*) = 5\lambda(I_2)$. Since $2\lambda(I_2) > \lambda_2$ we obtain that $I \subset I_2^*$ for all $I \in \mathcal{J}_3$.
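Continue in this way getting a sequence $\lambda_1 \ge \lambda_2 \ge \dots$ (and intervals $I_1, I_2, \dots$). Assume that for some n ≥ 2 we obtain $\lambda_n = 0$ (i.e., $\mathcal{I}_n = \emptyset$). Note that
$$\mathcal{I}_1 = \mathcal{I}_2 \cup \mathcal{J}_2 = \mathcal{I}_3 \cup \mathcal{J}_3 \cup \mathcal{J}_2 = \dots = \mathcal{I}_n \cup \mathcal{J}_n \cup \mathcal{J}_{n-1} \cup \dots \cup \mathcal{J}_2.$$
Then E is covered by the intervals in $\mathcal{J}_n \cup \mathcal{J}_{n-1} \cup \dots \cup \mathcal{J}_2$, in particular by $\bigcup_{k=1}^{n-1} I_k^*$. Then
$$\lambda^*(E) \le \sum_{k=1}^{n-1} \lambda(I_k^*) = \sum_{k=1}^{n-1} 5\lambda(I_k) = 5 \sum_{k=1}^{n-1} \lambda(I_k),$$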
hence (4.39) holds (for the intervals $I_1, \dots, I_{n-1}$). The remaining possibility is that no $\lambda_n$ vanishes. We shall distinguish now two cases:
(i) $\lambda_n \to \lambda > 0$. Then $\lambda(I_n) > \lambda_n/2 \ge \lambda/2$ for all n ∈ N, hence $\sum_{k=1}^{\infty} \lambda(I_k) = +\infty$. We may find then n ∈ N such that $\sum_{k=1}^{n} \lambda(I_k) > (1/5)\lambda^*(E)$, so (4.39) holds.
(ii) $\lambda_n \to 0$. In this case, we claim that every interval $I \in \mathcal{I}_1$ is contained in $\bigcup_{k=1}^{\infty} I_k^*$. Otherwise, there would be $I \in \mathcal{I}_1$ not intersecting any $I_n$. Therefore, $I \in \mathcal{I}_n$ for all n ∈ N, and this shows that $\lambda(I) \le \lambda_n$ for all n ∈ N, hence λ(I) = 0, a contradiction. This proves the claim. Since $\mathcal{I}_1$ is a cover of E, we get $\lambda^*(E) \le \sum_{k=1}^{\infty} \lambda(I_k^*) = 5 \sum_{k=1}^{\infty} \lambda(I_k)$. Find n ∈ N such that $(1/6)\lambda^*(E) < \sum_{k=1}^{n} \lambda(I_k)$. This shows (4.39). □
Definition 419 Let E be a subset of R. We say that a family $\mathcal{V}$ of closed intervals, each of positive length, is a Vitali cover of E if for each x ∈ E and each ε > 0 there exists $I \in \mathcal{V}$ such that x ∈ I and λ(I) < ε.
As an example, let E be the set of irrational numbers in [0, 1], and let $\mathcal{V} = \{[\alpha, \beta] : \alpha, \beta \in \mathbb{Q} \cap [0, 1],\ \alpha < \beta\}$. Then $\mathcal{V}$ is a Vitali cover of E.
Theorem 420 (Vitali) Let E be a subset of R such that $0 < \lambda^*(E) < +\infty$, and let $\mathcal{V}$ be a Vitali cover of E. Then, given ε > 0, there exists a pairwise disjoint family $\{I_n\}_{n=1}^{\infty} \subset \mathcal{V}$ such that
$$\lambda^*\Bigl(E \setminus \bigcup_{n=1}^{\infty} I_n\Bigr) = 0, \quad \text{and} \quad \sum_{n=1}^{\infty} \lambda(I_n) < (1 + \varepsilon)\lambda^*(E). \qquad (4.40)$$
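Proof Let U be an open subset of R such that E ⊂ U, and $\lambda(U) < (1 + \varepsilon)\lambda^*(E)$ (see Corollary 239). It is enough now to consider the family $\mathcal{V}_0$ of all elements I in $\mathcal{V}$ such that I ⊂ U to obtain the second part of (4.40).
Apply Lemma 418 to the set E. We obtain a finite sequence $\{I_n\}_{n=1}^{N_1}$ of pairwise disjoint intervals in $\mathcal{V}_0$ such that $\sum_{n=1}^{N_1} \lambda(I_n) \ge (1/6)\lambda^*(E)$. Then
$$\lambda^*\Bigl(E \setminus \bigcup_{n=1}^{N_1} I_n\Bigr) \le \lambda\Bigl(U \setminus \bigcup_{n=1}^{N_1} I_n\Bigr) = \lambda(U) - \sum_{n=1}^{N_1} \lambda(I_n) < \Bigl(1 + \varepsilon - \frac{1}{6}\Bigr)\lambda^*(E).$$
We may restrict ourselves to ε < 1/12. Then we get
$$\lambda^*\Bigl(E \setminus \bigcup_{n=1}^{N_1} I_n\Bigr) < \frac{11}{12}\lambda^*(E). \qquad (4.41)$$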
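Apply what has been already proved to the set $E_1 := E \setminus \bigcup_{n=1}^{N_1} I_n$ and the cover $\mathcal{V}_1$ consisting of the intervals in $\mathcal{V}_0$ that are disjoint from $\bigcup_{n=1}^{N_1} I_n$. Observe that this is a Vitali cover of $E_1$ due to the fact that $\bigcup_{n=1}^{N_1} I_n$ is a closed set. Then we get $I_{N_1+1}, I_{N_1+2}, \dots, I_{N_2}$ pairwise disjoint in $\mathcal{V}_1$, and (4.41) gives, applied to this case,
$$\lambda^*\Bigl(E_1 \setminus \bigcup_{n=N_1+1}^{N_2} I_n\Bigr) < \frac{11}{12}\lambda^*(E_1) < \Bigl(\frac{11}{12}\Bigr)^2 \lambda^*(E),$$
i.e.,
$$\lambda^*\Bigl(E \setminus \bigcup_{n=1}^{N_2} I_n\Bigr) < \Bigl(\frac{11}{12}\Bigr)^2 \lambda^*(E).$$
[...] $\varepsilon > 0$ there exists a finite pairwise disjoint subfamily $\{I_1, \dots, I_n\}$ of $\mathcal{V}$ such that
$$\lambda^*\Bigl(E \setminus \bigcup_{k=1}^{n} I_k\Bigr) < \varepsilon, \quad \text{and} \quad \sum_{k=1}^{n} \lambda(I_k) < (1 + \varepsilon)\lambda^*(E). \qquad (4.43)$$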
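For a finite family of intervals, the selection procedure used in the proof of Lemma 418 can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the general argument: the family is finite, so the supremum of the lengths is attained and one may simply pick a longest remaining interval at each step. The function names `greedy_disjoint` and `union_length` and the sample family are hypothetical.

```python
def greedy_disjoint(intervals):
    """Greedy selection in the spirit of Lemma 418 for a finite family.

    intervals: list of (left, right) pairs describing closed intervals, left < right.
    Returns a pairwise disjoint subfamily.
    """
    remaining = list(intervals)
    chosen = []
    while remaining:
        # For a finite family the longest interval satisfies length > (1/2) * sup of lengths.
        longest = max(remaining, key=lambda iv: iv[1] - iv[0])
        chosen.append(longest)
        # Keep only the intervals that are disjoint from the one just chosen.
        remaining = [iv for iv in remaining
                     if iv[1] < longest[0] or iv[0] > longest[1]]
    return chosen

def union_length(intervals):
    """Lebesgue measure of a finite union of closed intervals."""
    total, end = 0.0, float("-inf")
    for left, right in sorted(intervals):
        if right > end:
            total += right - max(left, end)
            end = right
    return total

family = [(0.0, 1.0), (0.5, 2.5), (2.0, 3.0), (2.8, 3.2), (4.0, 4.5), (4.4, 6.0)]
chosen = greedy_disjoint(family)
selected_length = sum(right - left for left, right in chosen)
print(chosen, selected_length, union_length(family))
```

Every discarded interval meets a chosen interval that is at least as long, so it lies in the fivefold enlargement of that interval. This is why the total selected length is at least one fifth of the measure of the union, which is stronger than the bound (4.39) requires.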
The second ingredient in the proof of Lebesgue’s differentiation Theorem 424 below is the concept of Dini derivative.
Definition 422 Let f be a real-valued function defined on a nonempty open subset U of R. Let x ∈ U. Define
$$D^+ f(x) = \limsup_{y \to x+} \frac{f(y) - f(x)}{y - x}; \qquad D_+ f(x) = \liminf_{y \to x+} \frac{f(y) - f(x)}{y - x},$$
$$D^- f(x) = \limsup_{y \to x-} \frac{f(y) - f(x)}{y - x}; \qquad D_- f(x) = \liminf_{y \to x-} \frac{f(y) - f(x)}{y - x}.$$
The above quantities are referred to as Dini derivatives of f at x. They exist as elements in R ∪ {±∞}.
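As a numerical illustration of Definition 422, consider the function f(x) = x sin(1/x) for x ≠ 0, f(0) = 0. Its right difference quotient at 0 is sin(1/h), so $D^+ f(0) = 1$ and $D_+ f(0) = -1$. The Python sketch below samples these quotients on shrinking windows (0, h_max]; the function names and the sampling grid are assumptions made for this illustration, and a finite sample can only suggest the values of the upper and lower limits.

```python
import math

def f(x):
    """x * sin(1/x), extended by 0 at x = 0."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0

def right_quotients(func, x, h_max, samples):
    """Difference quotients (func(x + h) - func(x)) / h on a uniform grid of h in (0, h_max]."""
    hs = (h_max * (i + 1) / samples for i in range(samples))
    return [(func(x + h) - func(x)) / h for h in hs]

for h_max in (1e-1, 1e-2, 1e-3):
    q = right_quotients(f, 0.0, h_max, 20_000)
    # The largest and smallest sampled quotients approximate the sup and inf over (0, h_max].
    print(f"h_max={h_max:g}: sup ~ {max(q):.4f}, inf ~ {min(q):.4f}")
```

The sampled suprema stay close to 1 and the infima close to −1 as the window shrinks, which is consistent with the two right-hand Dini derivatives being different at 0.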
Definition 423 Let f be a real-valued function defined on a nonempty open subset U of R. Let x ∈ U. If $D^+ f(x) = D_+ f(x)$ then we write
$$f'_+(x) := D^+ f(x) = D_+ f(x) = \lim_{y \to x+} \frac{f(y) - f(x)}{y - x}$$
and refer to this quantity as the right-hand-side derivative of f at x. Similarly, if $D^- f(x) = D_- f(x)$ then we write
$$f'_-(x) := D^- f(x) = D_- f(x) = \lim_{y \to x-} \frac{f(y) - f(x)}{y - x}$$
and refer to this quantity as the left-hand-side derivative of f at x.
The following result is due to H. Lebesgue.
Theorem 424 (Lebesgue) Let f be a real-valued monotone function defined on a general open interval I. Then $f'(x)$ exists (a.e.) in I as a real number.
Proof Without loss of generality we may assume that f is increasing. In this situation, note that $0 \le D_+ f(x) \le D^+ f(x) \le +\infty$, and $0 \le D_- f(x) \le D^- f(x) \le +\infty$, for x ∈ I. It is also enough to assume that I is a bounded interval (a, b) (otherwise, splitting I into a countable number of bounded subintervals and applying the result to each of them we get the conclusion).
We shall first prove that, for any two Dini derivatives, the set of points in (a, b) where these two derivatives are different is a null set (we shall deal with the finiteness of the derivatives later in the proof). Since the argument is the same for any two Dini derivatives, we shall prove that λ(E) = 0 for the set $E := \{x \in (a, b) : D_+ f(x) < D^+ f(x)\}$. For each pair of rational numbers u and v such that u < v, define the set $E_{u,v} := \{x \in (a, b) : D_+ f(x) < u < v < D^+ f(x)\}$. Then we have
$$E = \bigcup \{E_{u,v} : u, v \in \mathbb{Q},\ 0 < u < v\}.$$
It is sufficient to show $\lambda(E_{u,v}) = 0$ for all u, v ∈ Q such that 0 < u < v. Assume on the contrary that there exist u, v ∈ Q so that 0 < u < v and $\lambda^*(E_{u,v}) = \alpha > 0$. Let $x \in E_{u,v}$. Since $D_+ f(x) < u$, there exists an arbitrarily small h > 0 such that [x, x + h] ⊂ (a, b), and f(x + h) − f(x) < uh. This is done for every $x \in E_{u,v}$, and we get a family $\mathcal{V}$ of these intervals [x, x + h] that forms a Vitali cover of $E_{u,v}$. Fix ε > 0. By Corollary 421 we can find a finite pairwise disjoint subfamily $\{[x_i, x_i + h_i]\}_{i=1}^{N}$ of $\mathcal{V}$ such that
(i) $f(x_i + h_i) - f(x_i) < u h_i$, i = 1, 2, ..., N,
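(ii) $\lambda^*\bigl(E_{u,v} \setminus \bigcup_{i=1}^{N} [x_i, x_i + h_i]\bigr) < \varepsilon$, hence $\lambda^*\bigl(E_{u,v} \cap \bigcup_{i=1}^{N} [x_i, x_i + h_i]\bigr) > \lambda^*(E_{u,v}) - \varepsilon$, and
(iii) $\sum_{i=1}^{N} h_i < (1 + \varepsilon)\lambda^*(E_{u,v})$.
From (i) and (iii) we get
$$\sum_{i=1}^{N} \bigl(f(x_i + h_i) - f(x_i)\bigr) < (1 + \varepsilon) u \lambda^*(E_{u,v}). \qquad (4.44)$$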
Put now $A := E_{u,v} \cap \bigcup_{i=1}^{N} [x_i, x_i + h_i]$. Let y be a point in A that is not an end point of any $[x_i, x_i + h_i]$, i = 1, 2, ..., N. Note that $D^+ f(y) > v$. Then there exists an arbitrarily small k > 0 such that $[y, y + k] \subset [x_i, x_i + h_i]$ for some i ∈ {1, 2, ..., N}, and f(y + k) − f(y) > vk. This is done for each such point y, so obtaining a Vitali cover $\mathcal{V}$ of $A \setminus \{x_i, x_i + h_i : i = 1, 2, \dots, N\}$ from which, again by Corollary 421, we can extract a finite pairwise disjoint subfamily $\{[y_j, y_j + k_j]\}_{j=1}^{M}$ such that
(iv) each $[y_j, y_j + k_j]$ lies in some $[x_i, x_i + h_i]$,
(v) $f(y_j + k_j) - f(y_j) > v k_j$, j = 1, 2, ..., M, and
(vi) $\sum_{j=1}^{M} k_j > \lambda^*(A) - \varepsilon$ ($> \lambda^*(E_{u,v}) - 2\varepsilon$, due to (ii)).
From (v) and (vi) we get
$$\sum_{j=1}^{M} \bigl(f(y_j + k_j) - f(y_j)\bigr) > v\bigl(\lambda^*(E_{u,v}) - 2\varepsilon\bigr). \qquad (4.45)$$
Since f is increasing and (iv) holds, we have
$$\sum_{j=1}^{M} \bigl(f(y_j + k_j) - f(y_j)\bigr) \le \sum_{i=1}^{N} \bigl(f(x_i + h_i) - f(x_i)\bigr). \qquad (4.46)$$
Putting together (4.44), (4.45), and (4.46), we get
$$v\bigl(\lambda^*(E_{u,v}) - 2\varepsilon\bigr) < (1 + \varepsilon) u \lambda^*(E_{u,v}).$$
Since ε > 0 is arbitrary, we obtain $v\lambda^*(E_{u,v}) \le u\lambda^*(E_{u,v})$, which, as $\lambda^*(E_{u,v}) > 0$, contradicts the fact that u < v. As we mentioned, the remaining cases for different Dini derivatives are treated similarly.
To finish the proof, we need to deal with the case of equal and infinite Dini derivatives. Since we are assuming that f is increasing, this reduces to the case $f'(x) = +\infty$:
Fix n ∈ N such that a < a + 1/n < b − 1/n < b. Let $F := \{x \in (a + 1/n, b - 1/n) : f'(x) = +\infty\}$. Let β be an arbitrary positive number. For each x ∈ F choose an arbitrarily small number h > 0 so that [x, x + h] ⊂ (a + 1/n, b − 1/n), and f(x + h) − f(x) > βh. As this provides a Vitali cover of F, we can choose for every ε > 0, by using Corollary 421, a finite pairwise disjoint subfamily $\{[x_i, x_i + h_i]\}_{i=1}^{N}$ so that
(i) $f(x_i + h_i) - f(x_i) > \beta h_i$, i = 1, 2, ..., N, and
(ii) $\sum_{i=1}^{N} h_i > \lambda^*(F) - \varepsilon$.
From (i) and (ii) we get
$$\sum_{i=1}^{N} \bigl(f(x_i + h_i) - f(x_i)\bigr) > \beta(\lambda^*(F) - \varepsilon),$$
and, since f is increasing,
$$f(b - 1/n) - f(a + 1/n) \ge \sum_{i=1}^{N} \bigl(f(x_i + h_i) - f(x_i)\bigr) \quad \bigl(> \beta(\lambda^*(F) - \varepsilon)\bigr). \qquad (4.47)$$
This is true for arbitrary ε > 0 and β > 0, so (4.47) forces $\lambda^*(F) = 0$. □
Remark 425 Note that it follows from Exercises 13.156 and 13.272 that the set of points of differentiability of an increasing function may not contain any dense Gδ -set, though it must be dense by the Lebesgue Theorem 424. ®
4.5.6 Functions of Bounded Variation
Let [a, b] be an interval in R (as usual, we assume that it is bounded, and, in order to avoid trivialities, that a < b). A finite set $P := \{x_0, x_1, \dots, x_n\}$ is called a partition⁴ of the interval [a, b] whenever $a = x_0 < x_1 < x_2 < \dots < x_n = b$.
Definition 426 Let [a, b] be a closed and bounded interval in R. We say a real-valued function f on [a, b] is of bounded variation if
$$V_a^b f := \sup\{V(P, f) : P \text{ a partition of } [a, b]\} < \infty, \qquad (4.48)$$
where $V(P, f) := \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})|$ and $P := \{a = x_0 < x_1 < \dots < x_n = b\}$. The quantity $V_a^b f$ is called the total variation of f on [a, b].
⁴ A partition of a set was defined in Sect. 1.1 as a nonempty family of pairwise disjoint subsets whose union is the given set. In some parts of Real Analysis theory, as here and in integration theory, a partition of an interval is understood to be a finite splitting of the interval, given by a finite number of its points, including the endpoints.
Remark 427
1. In Definition 426, the expression for V(P, f) can be substituted by $\bigl|\sum_{k=1}^{n} (f(x_k) - f(x_{k-1}))\bigr|$. Of course, the result of the computation is different, and so is the corresponding value of $V_a^b f$. However, the resulting concept (bounded variation) is the same. For details, see Exercise 13.273.
2. Note that if P, Q are two partitions of [a, b], and Q is finer than P (i.e., P ⊂ Q), then V(P, f) ≤ V(Q, f). In order to prove this, note first that if $P := \{a = x_0 < x_1 < \dots < x_n = b\}$ and c ∈ [a, b], we have $V(P, f) \le V(P', f)$, where $P' := P \cup \{c\}$. Indeed, if $c \in (x_{m-1}, x_m)$ for some m ∈ {1, 2, ..., n}, then $|f(x_m) - f(x_{m-1})| \le |f(x_m) - f(c)| + |f(c) - f(x_{m-1})|$, and the result follows. The general statement can then be established by finite induction. ®
Remark 428 Note that it follows, by a telescopic argument (see Remark 678), that if f is a monotone function on [a, b] then $V_a^b f = |f(b) - f(a)|$. Indeed, if f is, say, increasing, and $P := \{a = x_0 < x_1 < \dots < x_n = b\}$ is a partition of [a, b], then
$$V(P, f) := \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| = \sum_{k=1}^{n} \bigl(f(x_k) - f(x_{k-1})\bigr) = f(b) - f(a)$$
(= |f(b) − f(a)|). Since this holds for any partition P of [a, b], the result follows. This shows that every increasing function on a closed and bounded interval is of bounded variation there. A similar argument proves the result in case that f is decreasing. (For a kind of converse statement, see Theorem 432 below.) ®
Example 429 The function (see Fig. 4.40)
$$f(x) = \begin{cases} x \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases}$$
defined on I := [0, 2/π] is continuous on its domain, and it is not of bounded variation. Indeed, continuity must be checked just at x = 0. For this, observe that |x sin (1/x)| ≤ |x| for all x ∈ (0, 2/π], and so limx→0+ f (x) = f (0) = 0.
Fig. 4.40 The function in Example 429 on [0, 2/π]
To see that f is not of bounded variation on [0, 2/π] define, for each n ∈ N, a partition of [0, 2/π] by
$$0 < x_n < x_{n-1} < \dots < x_2 < x_1 = \frac{2}{\pi},$$
where $x_k := \frac{2}{k\pi}$ for all k = 1, 2, ..., n. Observe that $f\bigl(\frac{2}{k\pi}\bigr) = 0$ for k even, and that $f\bigl(\frac{2}{(2k+1)\pi}\bigr) = \frac{2}{(2k+1)\pi}(-1)^k$ for every k ∈ N. Now, for n odd,
$$\sum_{k=1}^{n-1} |f(x_k) - f(x_{k+1})| + |f(x_n) - f(0)| = \frac{2}{\pi} + \frac{2}{3\pi} + \frac{2}{3\pi} + \dots + \frac{2}{n\pi} + \frac{2}{n\pi}.$$
Since the harmonic series diverges, the quantity $V_a^b f$ (see (4.48)) must be infinite, hence f is not of bounded variation. For other features of the same function see Example 4.5.8.4. ♦
Proposition 430 Let [a, b] be a closed and bounded interval in R. Let f be a real-valued function defined on [a, b]. Let c ∈ [a, b], and assume that f is of bounded variation on [a, b]. Then
$$V_a^c f + V_c^b f = V_a^b f \qquad (4.49)$$
(in particular, the restriction of f to any closed subinterval of [a, b] is of bounded variation there), and the function $x \mapsto V_a^x f$, x ∈ [a, b], is increasing.
Proof Observe that Remark 427.2 shows that, given a point c ∈ [a, b], we have
$$V_a^b f = \sup\{V(P, f) : P \text{ a partition of } [a, b],\ c \in P\}. \qquad (4.50)$$
To prove (4.49), let $P = \{a = x_0 < x_1 < \dots < x_{n-1} < x_n = b\}$ be a partition of [a, b]. By the previous observation, we may assume that $c = x_m$ for some m ∈ {0, 1, 2, ..., n}. Let
$$P_1 := \{a = x_0 < x_1 < \dots < x_m = c\}, \quad \text{and} \quad P_2 := \{c = x_m < \dots < x_n = b\}$$
(if c = a (c = b), then $P_1$ (respectively, $P_2$) reduces to a single point). Obviously,
$$V(P, f) = V(P_1, f) + V(P_2, f) \le V_a^c f + V_c^b f.$$
Taking the supremum on the left hand side over the family of all partitions of [a, b] that contain c, we get, having in mind (4.50), $V_a^b f \le V_a^c f + V_c^b f$. To prove the other inequality, choose partitions $P_1$ and $P_2$ of [a, c] and [c, b], respectively. Consider the partition $P := P_1 \cup P_2$ of [a, b]. Then
$$V(P_1, f) + V(P_2, f) = V(P, f) \le V_a^b f.$$
Taking suprema on the left hand side over $P_1$ and $P_2$, we get $V_a^c f + V_c^b f \le V_a^b f$.
If [c, d] is a closed subinterval of [a, b], equation (4.49) for the intermediate point d shows that f is of bounded variation on [a, d]; applying (4.49) again, now to the interval [a, d] and the intermediate point c, shows that f is of bounded variation on [c, d].
To prove the second statement, fix x, y ∈ [a, b] such that x < y. Given an arbitrary partition P of [a, x], $P' := P \cup \{y\}$ is a partition of [a, y], and
$$V(P, f) \le V(P, f) + |f(y) - f(x)| = V(P', f) \quad (\le V_a^y f).$$
By taking suprema over P on the left-hand side of the previous inequality we get $V_a^x f \le V_a^y f$. □
Example 431 Using Remark 428 and the first statement in Proposition 430, we can easily compute, for example, the total variation $V_0^{2\pi}(\sin x)$ of the sine function on [0, 2π] (proving, incidentally, that sin x is a function of bounded variation on [0, 2π]). We proceed as follows:
$$V_0^{2\pi}(\sin x) = V_0^{\pi/2}(\sin x) + V_{\pi/2}^{3\pi/2}(\sin x) + V_{3\pi/2}^{2\pi}(\sin x) = \Bigl|\sin\frac{\pi}{2} - \sin 0\Bigr| + \Bigl|\sin\frac{3\pi}{2} - \sin\frac{\pi}{2}\Bigr| + \Bigl|\sin 2\pi - \sin\frac{3\pi}{2}\Bigr| = 4.$$
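The sums V(P, f) of Definition 426 are easy to evaluate for a given partition. The Python sketch below is an illustration only (the function names are chosen here, not taken from the text). It recomputes the value 4 of Example 431 and evaluates the sums of Example 429 along the partitions $x_k = 2/(k\pi)$ for a few odd n.

```python
import math

def variation(points, func):
    """V(P, f): sum of |f(x_k) - f(x_{k-1})| over an increasing list of points."""
    return sum(abs(func(b) - func(a)) for a, b in zip(points, points[1:]))

# Example 431: cutting [0, 2*pi] at pi/2 and 3*pi/2 splits sin into monotone pieces.
v_sin = variation([0.0, math.pi / 2, 3 * math.pi / 2, 2 * math.pi], math.sin)

def g(x):
    """The function of Example 429: x * sin(1/x), extended by 0 at x = 0."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0

def partition_429(n):
    """0 < x_n < ... < x_1 = 2/pi with x_k = 2/(k*pi), listed in increasing order."""
    return [0.0] + [2.0 / (k * math.pi) for k in range(n, 0, -1)]

sums = [variation(partition_429(n), g) for n in (11, 101, 1001)]
print(v_sin, sums)
```

The first value agrees with Example 431 up to rounding. The sums for Example 429 keep growing with n (roughly like a logarithm of n), which is the numerical counterpart of the divergence argument in that example.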
♦ The following is a result due to the French mathematician C. Jordan. It characterizes functions of bounded variation in terms of increasing functions.
Fig. 4.41 The function sin x as the difference of two increasing functions on [0, 2π]
Theorem 432 (Jordan) A real-valued function f defined on an interval [a, b] is of bounded variation, if and only if, f is the difference of two increasing functions on [a, b].
Proof Assume that f is of bounded variation. Write $f(x) = V_a^x f - \bigl(V_a^x f - f(x)\bigr)$ for x ∈ [a, b], and observe that $x \mapsto V_a^x f - f(x)$ is an increasing function on [a, b]. Indeed, if x, y ∈ [a, b] and x ≤ y, then
$$\bigl(V_a^y f - f(y)\bigr) - \bigl(V_a^x f - f(x)\bigr) = V_x^y f - (f(y) - f(x)) \ge V_x^y f - |f(y) - f(x)| = V_x^y f - V(P, f) \ge 0,$$
where the first equality follows from Proposition 430, and P denotes the partition {x, y} of [x, y]. Observe, too, that the mapping $x \mapsto V_a^x f$ is also increasing (see Proposition 430). These two observations, together, prove that f is the difference of two increasing functions.
Conversely, a monotone function is of bounded variation. This was shown in Remark 428. It is simple to prove that the difference of two real-valued functions of bounded variation defined on [a, b] is again a function of bounded variation on [a, b]. This proves the sufficient condition. □
As an example illustrating the statement in Theorem 432, observe that the function sin x on [0, 2π] (a function of bounded variation there, see Example 431) is the difference of the functions 2x + sin x and 2x, two increasing functions on [0, 2π], see Fig. 4.41.
As a consequence of Theorem 432, Proposition 397, and Theorem 424, we get the following result.
Corollary 433 A real-valued function f of bounded variation defined on [a, b] is continuous everywhere possibly up to a countable set, where the discontinuities are jump discontinuities. Moreover, f is differentiable (a.e.).
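The decomposition used in the proof of Theorem 432, $f = V_a^x f - (V_a^x f - f)$, can be approximated on a grid. The Python sketch below is a rough numerical illustration, assuming a uniform grid and replacing $V_a^x f$ by the variation along the grid points (a lower bound for the true value). The name `cumulative_variation` is hypothetical.

```python
import math

def cumulative_variation(func, a, b, steps):
    """Grid points xs and the variation of func along the grid from a up to each point."""
    xs = [a + (b - a) * i / steps for i in range(steps + 1)]
    v = [0.0]
    for x0, x1 in zip(xs, xs[1:]):
        v.append(v[-1] + abs(func(x1) - func(x0)))
    return xs, v

xs, v = cumulative_variation(math.sin, 0.0, 2 * math.pi, 1000)
g = v                                            # approximates x -> V_0^x f
h = [vi - math.sin(x) for x, vi in zip(xs, v)]   # approximates x -> V_0^x f - f(x)

# Both g and h are nondecreasing along the grid, and g - h recovers sin.
print(v[-1])
```

The last entry of `v` is close to 4, matching Example 431. This pair of increasing functions is different from the pair 2x + sin x and 2x mentioned in the text, which shows that the decomposition is not unique.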
4.5.7 Absolutely Continuous Functions and Lipschitz Functions
Absolutely Continuous Functions
The class of absolutely continuous functions that follows (a smaller subclass of the class of functions of bounded variation, see Proposition 436 and Remark 441 below) plays an important role in the so-called Fundamental Theorem of Calculus. In fact, functions in this class arise naturally as indefinite integrals of integrable functions (see Sect. 7.1.4 for the Riemann integral, and 7.3.7 for the Lebesgue integral). This will be elaborated in Sect. 7.3.10.
We say that the members in a collection of intervals are pairwise nonoverlapping if the intersection of the interiors of any two of them is empty.
Definition 434 A real-valued function f defined on an interval [a, b] is said to be absolutely continuous on [a, b] if for every ε > 0 there exists δ > 0 such that if n ∈ N and $\{[x_i, y_i]\}_{i=1}^{n}$ is any finite collection of pairwise nonoverlapping subintervals of [a, b] such that $\sum_{i=1}^{n} (y_i - x_i) < \delta$, then
$$\sum_{i=1}^{n} |f(y_i) - f(x_i)| < \varepsilon. \qquad (4.51)$$
Remark 435
1. In Definition 434 we may consider, equivalently, sequences $\{[x_i, y_i]\}_{i=1}^{\infty}$ of nonoverlapping intervals instead of finite families. It is clear, too, that we may use open subintervals $(x_i, y_i)$ (or, in fact, any other form of bounded subintervals) instead.
2. Every absolutely continuous function is clearly uniformly continuous. However, not every continuous function on a closed and bounded interval (hence uniformly continuous by Theorem 344) is absolutely continuous, see Remark 441.
3. For some equivalent formulations of absolute continuity see Exercises 13.275 and 13.278. ®
Proposition 436 Let f be a real-valued absolutely continuous function defined on [a, b]. Then f is of bounded variation on [a, b].
Proof Let ε = 1 and choose the corresponding δ > 0 in Definition 434. Let $\{x_0, x_1, \dots, x_n\}$ be a partition of [a, b] such that
$$x_{k+1} - x_k = \frac{b - a}{n}, \quad \text{for all } k = 0, 1, \dots, n - 1,$$
and n ∈ N is chosen so that $\frac{b-a}{n} < \delta$. From the absolute continuity of f we get that $V_{x_k}^{x_{k+1}} f \le 1$ for all k ∈ {0, ..., n − 1}, and thus Proposition 430 gives
$$V_a^b f = \sum_{k=0}^{n-1} V_{x_k}^{x_{k+1}} f \le n.$$
This shows that f is of bounded variation. □
We shall show in Remark 441 that the class of all the real-valued absolutely continuous functions on an interval in R is smaller than the class of all the real-valued continuous functions of bounded variation on this interval. To this end, we shall prove Proposition 437 and its Corollary 438, two results that are of independent interest.
Proposition 437 Let f be a real-valued absolutely continuous function defined on [a, b]. Let A ⊂ [a, b] with λ(A) = 0. Then $\lambda(f(A)) = 0$.
Proof Let ε > 0 and choose δ > 0 as in Definition 434. Since λ(A) = 0, there exists a sequence of (closed) intervals $\{I_n\}_{n=1}^{\infty}$ that covers A and such that $\sum_{n=1}^{\infty} \lambda(I_n) < \delta$. Since f is continuous, $f(I_n)$ is a closed interval for each n ∈ N (see Corollary 340), and so $\{f(I_n)\}_{n=1}^{\infty}$ is a sequence of closed intervals that cover f(A). Again, the continuity of f implies that f is bounded on [a, b] (Corollary 335); in particular, the set f(A) is bounded, and so $\lambda^*(f(A)) < +\infty$, where $\lambda^*$ denotes the upper measure (see Definition 229). Use Vitali’s Lemma 418 to find a pairwise disjoint finite family of intervals $f(I_i)$ (say $\{f(I_i)\}_{i=1}^{n}$) such that
$$\sum_{i=1}^{n} \lambda(f(I_i)) \ge (1/6)\lambda^*(f(A)). \qquad (4.52)$$
The finite family $\{I_i\}_{i=1}^{n}$ is also pairwise disjoint, and $\sum_{i=1}^{n} \lambda(I_i) \le \sum_{n=1}^{\infty} \lambda(I_n) < \delta$; hence, by the definition of absolute continuity,
$$\sum_{i=1}^{n} \lambda(f(I_i)) < \varepsilon. \qquad (4.53)$$
Using (4.52) and (4.53), and having in mind that ε > 0 is arbitrary, we get λ∗ (f (A)) = 0 (=λ(f (A))). 2 Corollary 438 Let f be a real-valued absolutely continuous function defined on a closed and bounded interval [a, b]. Then, if E ⊂ [a, b] is a measurable set, the set f (E) is also measurable. Proof Write E= F ∪ N, where F is an Fσ set and N is a null set (see Corollary ∞ F , where each F is compact. Then f (F ) = 267). Put F = ∞ k k k=1 k=1 f (Fk ), and each f (Fk ) is compact due to the continuity of f . It follows that f (F ) is an Fσ subset of R, as each f (Fk ) is closed in [a, b]. By Proposition 437, f (N ) is a null set, so f (E) (=f (F ) ∪ f (N )) is measurable, again by Corollary 267. 2 Theorem 439 Let f be a real-valued absolutely continuous function defined on [a, b]. Suppose f (x) = 0 (a.e.) on (a, b). Then f is constant on [a, b]. In the proof of Theorem 439 we shall need the following lemma of independent interest, a reduced version of the Morse–Sard theorem (see Remark 363.4 above and Fig. 4.42). The theorem is named after the American mathematicians H. C. M. Morse and A. Sard.
4 Functions
Fig. 4.42 There are “few” horizontal tangent lines (Lemma 440)
Lemma 440 Let f be a real-valued function defined on a closed and bounded interval [a, b] such that a < b. Let B be a subset of (a, b) such that f′(x) exists and is zero for all x ∈ B. Then λ∗(f(B)) = 0.
Proof Fix ε > 0 and δ > 0. Observe that, given x ∈ B, we may find n ∈ N (depending on x) such that (x − 1/n, x + 1/n) ⊂ (a, b) and, for 0 < |h| < 1/n,
\[ \left| \frac{f(x+h) - f(x)}{h} \right| < \varepsilon. \]
For n ∈ N, put
\[ B_n := \left\{ x \in B : \left(x - \tfrac{1}{n},\, x + \tfrac{1}{n}\right) \subset (a, b),\ |f(x+h) - f(x)| < \varepsilon |h| \text{ for } 0 < |h| < \tfrac{1}{n} \right\}. \tag{4.54} \]
According to our previous observation, $B = \bigcup_{n=1}^{\infty} B_n$, so $f(B) = \bigcup_{n=1}^{\infty} f(B_n)$. In order to prove that λ∗(f(B)) = 0 it will be enough, in view of Corollary 237, to show that $\lambda^*(f(B_n)) = 0$ for each n ∈ N. So fix n ∈ N. Since $B_n$ is bounded, Remark 230.6 shows that $\lambda^*(B_n) < +\infty$ (in fact, $\lambda^*(B_n) \le b - a$). We can then find a sequence $\{I_k\}_{k=1}^{\infty}$ of open intervals such that $B_n \subset \bigcup_{k=1}^{\infty} I_k$ and $\sum_{k=1}^{\infty} \lambda(I_k) < \lambda^*(B_n) + \delta$. Without loss of generality, we may also assume that $\lambda(I_k) < 1/n$ for each k ∈ N. Then, given $x, y \in B_n \cap I_k$, we have $|f(y) - f(x)| < \varepsilon \lambda(I_k)$, for all k ∈ N. Since $f(B_n) \subset \bigcup_{k=1}^{\infty} f(I_k \cap B_n)$, this proves, by using Proposition 236, that
\[ \lambda^*(f(B_n)) \le \sum_{k=1}^{\infty} \lambda^*(f(I_k \cap B_n)) \le \varepsilon \sum_{k=1}^{\infty} \lambda(I_k) < \varepsilon(\lambda^*(B_n) + \delta) \le \varepsilon(b - a + \delta). \]
This holds for all ε > 0, hence $\lambda^*(f(B_n)) = 0$. □
Proof of Theorem 439 Let B := {x ∈ (a, b) : f′(x) = 0}, and A := [a, b] \ B. By hypothesis, λ(A) = 0. Observe that, by Proposition 437, we have λ(f(A)) = 0, and, by Lemma 440, λ(f(B)) = 0. Since f([a, b]) = f(A) ∪ f(B), we get λ(f([a, b])) = 0. Due to the fact that the range of a continuous function on [a, b] is a closed interval, it follows that f is constant on [a, b]. □
Remark 441 The class of all real-valued absolutely continuous functions on a nondegenerate closed bounded interval [a, b] is strictly contained in the class of all
real-valued continuous functions of bounded variation on [a, b]. An example of a function of the second class not in the first class is the Lebesgue singular function S introduced in Definition 398. That this function is continuous on [0, 1] was proved in Proposition 399. Since it is increasing (see the same proposition), it is of bounded variation there (Remark 428). We propose here several arguments to prove that S is not absolutely continuous.
1. Let C be the Cantor ternary set (see Definition 343). The set C is null. Note that S(C) = [0, 1], and thus λ(S(C)) = 1. This shows, in view of Proposition 437, that S is not absolutely continuous on [0, 1].
2. Another way to see that S is not absolutely continuous on [0, 1] is the following (see the proof of Proposition 416 and follow the notation there): S sends the measurable set $S^{-1}(V)$ onto the nonmeasurable set V. Use now Corollary 438.
3. Still another argument is provided by Theorem 439: The function S has a derivative (a.e.) on [0, 1]. Indeed, it has a derivative precisely at points in [0, 1] \ C. Moreover, S′(x) = 0 at every point x ∈ [0, 1] \ C (for these two statements, see Proposition 399). Should S be absolutely continuous, it would be a constant function (use Theorem 439), and this is false. ®
Remark 442 Theorem 439 does not apply in general to functions of bounded variation. An example is given by the Lebesgue singular function. See Remark 441.3. ®
Lipschitz Functions
Lipschitz functions, introduced in Definition 443 below, form an important strict subclass of the class of absolutely continuous functions (see Proposition 444 and Example 4.5.8.1 below). They arise naturally when considering functions with bounded derivative (see Proposition 445 below).
Definition 443 We say that a real-valued function f defined on a nonempty subset D(f) of R is Lipschitz on D(f) if there exists a constant C ≥ 0 so that
\[ |f(y) - f(x)| \le C|y - x| \quad \text{for all } x, y \in D(f). \]
The constant C is called a Lipschitz constant of f, and we say in this case that f is a C-Lipschitz function on D(f). Lipschitz functions are named after the German mathematician R. Lipschitz. The set of all real-valued Lipschitz functions defined on a nonempty subset D of R is denoted Lip(D).
Proposition 444 If f is Lipschitz on a compact interval I := [a, b], then f is absolutely continuous on I. In particular, f is uniformly continuous on I, and so continuous everywhere on I. Moreover, f is of bounded variation on I, hence differentiable (a.e.) on I.
Proof Let C > 0 be a Lipschitz constant for f. Given n ∈ N, let $\{[x_i, y_i]\}_{i=1}^{n}$ be a finite collection of pairwise nonoverlapping subintervals of I. Then,
\[ \sum_{i=1}^{n} |f(y_i) - f(x_i)| \le C \sum_{i=1}^{n} |y_i - x_i|. \]
This shows that f is absolutely continuous on I (choose δ = ε/C for every ε > 0). The additional statements are a consequence of Remark 435.2, the paragraph after Definition 343, Proposition 436, and Theorem 424, respectively. □
Note that there are absolutely continuous functions on closed and bounded intervals that are not Lipschitz. An example is the function f(x) = √x on [0, 1], see Sect. 4.5.8.1. Note, too, that the function f(x) = |x| is Lipschitz on R (indeed, ||x| − |y|| ≤ |x − y| for all x, y ∈ R, see the paragraph after Proposition 38), yet it is not differentiable at x = 0 (see Fig. 4.19). The following proposition gives a simple, yet useful, sufficient condition for a differentiable function to be Lipschitz.
Proposition 445 Let f be a differentiable function on an interval (a, b) with bounded derivative (precisely, there exists a constant C > 0 so that |f′(x)| ≤ C for all x ∈ (a, b)). Then f is C-Lipschitz on (a, b).
Proof Let x, y ∈ (a, b) with x < y. By the Mean Value Theorem 365, we can choose c ∈ (x, y) so that
\[ \frac{f(y) - f(x)}{y - x} = f'(c). \]
Now since |f′(z)| ≤ C for all z ∈ (a, b), we obtain |f(y) − f(x)| ≤ C|y − x| for all x, y ∈ (a, b). □
Theorem 446 Assume that f is a real-valued function on a compact set K that is locally Lipschitz, i.e., for every point in K there is a neighborhood U in K such that f is Lipschitz on U. Then f is Lipschitz on K.
Proof First, note that since f is continuous on K it is bounded on K, say by the real number M. Assume that f is not Lipschitz on K. Then there are $x_n, y_n \in K$, n ∈ N, such that $|f(x_n) - f(y_n)| > n|x_n - y_n|$ for all n ∈ N. Note that necessarily $\lim |x_n - y_n| = 0$. Indeed, if not, then for some ε > 0 and for some subsequence $\{n_k\}$, we would have $|x_{n_k} - y_{n_k}| \ge \varepsilon$. Thus $|f(x_{n_k}) - f(y_{n_k})| > n_k |x_{n_k} - y_{n_k}| \ge n_k \varepsilon$, a contradiction with the fact that $|f(x_{n_k}) - f(y_{n_k})| \le |f(x_{n_k})| + |f(y_{n_k})| \le 2M$ for all k. Because of the compactness of K, assume without loss of generality (passing to a subsequence if necessary) that $\lim x_n = x$ in K. Then $\lim y_n = x$.
Let U be a neighborhood of x on which f is Lipschitz with Lipschitz constant $M_x$. Then for large n, both $x_n$ and $y_n$ lie in U and $|f(x_n) - f(y_n)| > n|x_n - y_n| \ge M_x |x_n - y_n|$, a contradiction. This finishes the proof. □
Remark 447 Theorem 446 should be compared with Exercise 13.287 on the notion of a function pointwise Lipschitz. ®
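The contrast between the Lipschitz function |x| and the non-Lipschitz function √x can be explored numerically. The following is only a minimal sketch, not part of the original text: the helper names `max_difference_quotient` and `uniform_grid` are hypothetical, and the quantity computed is merely a lower bound for the best Lipschitz constant on the sampled points. Under these assumptions, the quotients for |x| stay at 1, while those for √x on [0, 1] grow like √n as the grid is refined.

```python
import math


def max_difference_quotient(f, points):
    """Largest |f(y) - f(x)| / |y - x| over consecutive sample points.

    This is only a lower bound for the best Lipschitz constant of f
    on the sampled set; it illustrates, but does not prove, anything.
    """
    best = 0.0
    for x, y in zip(points, points[1:]):
        best = max(best, abs(f(y) - f(x)) / abs(y - x))
    return best


def uniform_grid(a, b, n):
    """n + 1 equally spaced points from a to b (hypothetical helper)."""
    return [a + (b - a) * k / n for k in range(n + 1)]


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        q_abs = max_difference_quotient(abs, uniform_grid(-1.0, 1.0, n))
        q_sqrt = max_difference_quotient(math.sqrt, uniform_grid(0.0, 1.0, n))
        # q_abs stays bounded; q_sqrt keeps growing with n.
        print(n, q_abs, q_sqrt)
```

The largest quotient for √x always comes from the first subinterval [0, 1/n], which mirrors the argument that 1/√x is unbounded as x ↓ 0.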
Fig. 4.43 The function √x and a linear function Cx (for C > 0) on [0, 1]
4.5.8
Examples
1. An example of an absolutely continuous non-Lipschitz function F on [0, 1] is given by F(x) := √x (see Fig. 4.43 and Exercises 13.282 for the absolute continuity and 13.285 for the non-Lipschitz character). For this last statement we may repeat the argument here: Assume |√x − √y| ≤ C|x − y| for some constant C > 0 and all x, y ∈ [0, 1]. In particular (taking y = 0), √x ≤ Cx for all x ∈ [0, 1], and so 1/√x ≤ C for all x ∈ (0, 1]. We reach a contradiction by letting x ↓ 0. Note that in Example 4.1.3.3 we proved that this function is uniformly continuous on [0, 1]; this is implied by its absolute continuity. For proofs of the absolute continuity of F on [0, 1] different from the one in Exercise 13.282, see Exercises 13.284 and 13.517.
2. The Lebesgue singular function S (see Definition 398), defined on [0, 1], provides a one-to-one and onto map between the Cantor ternary set C (a null set) and the set [0, 1] (which has measure one). This shows, by using Proposition 437, that S is not absolutely continuous. It follows then that S cannot be Lipschitz, due to Proposition 444. However, it is monotone, hence of bounded variation. This was shown in Remark 441.
3. Consider the function (see Fig. 4.44)
\[ f(x) = \begin{cases} x^2 \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases} \]
defined on I := [0, 1/π]. Then f′(x) exists for all x ∈ I, but f′ is not continuous at 0. This example is due to Darboux [Da75]. The function f is Lipschitz and hence absolutely continuous, in particular of bounded variation. In order to see all this, let us prove first that f′(0) exists and f′(0) = 0. Consider, for x ∈ (0, 1/π],
\[ \frac{f(x) - f(0)}{x} = \frac{x^2 \sin(1/x)}{x} = x \sin\frac{1}{x}. \]
Fig. 4.44 The graph of f and f′ in Example 4.5.8.3
Fig. 4.45 The graphs of f and f′ (its range truncated) in Example 4.5.8.4
Since |x sin(1/x)| ≤ |x| for all x ∈ (0, 1/π), we have $\lim_{x \to 0} x \sin(1/x) = 0$. This shows that f′(0) = 0. Observe that
\[ f'(x) = \begin{cases} 2x \sin(1/x) - \cos(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases} \]
By looking at the sequence $\{1/(n\pi)\}_{n=1}^{\infty}$ we easily see that $\lim_{x \to 0} f'(x)$ does not exist. This shows that the function f′ is not continuous at x = 0 (see again Fig. 4.44). Note that |f′(x)| ≤ 2/π + 1 for all x ∈ I and thus, by Proposition 445, f is Lipschitz.
4. Consider the function
\[ f(x) = \begin{cases} x \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases} \]
defined on the domain I := [0, 2/π]. This function has been considered in Example 429, where we proved that f is continuous on I (hence uniformly continuous there by Theorem 344), although f is not of bounded variation on I. Note that f′(0) does not exist (see Fig. 4.45). Indeed, consider, for x ∈ (0, 2/π],
\[ \frac{f(x) - f(0)}{x} = \frac{x \sin(1/x)}{x} = \sin(1/x). \]
Note that $\lim_{x \to 0} \sin(1/x)$ does not exist. In order to see this, choose a sequence $\{x_n\}_{n=1}^{\infty}$, where $x_n := 2/(n\pi)$ for n ∈ N. Clearly, $\{x_n\}_{n=1}^{\infty}$ converges to 0, and $\{\sin(1/x_n)\}_{n=1}^{\infty}$ does not converge. This shows that f′(0) does not exist.
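The two sequences used in Examples 3 and 4 can be evaluated directly. The sketch below is an illustration under stated assumptions, not part of the original text: the names `f3_prime` and `quotient_f4` are hypothetical, and floating-point evaluation only approximates the exact values. It tabulates f′ along 1/(nπ) for Example 3 and the difference quotient at 0 along 2/(nπ) for Example 4.

```python
import math


def f3_prime(x):
    """Derivative of x^2 sin(1/x) for x != 0, with the value 0 at x = 0."""
    if x == 0.0:
        return 0.0
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)


def quotient_f4(x):
    """Difference quotient (f(x) - f(0)) / x for f(x) = x sin(1/x), x != 0."""
    return math.sin(1.0 / x)


# Example 3: along x_n = 1/(n*pi) the derivative alternates near +1 and -1,
# so it has no limit as x -> 0 although f'(0) = 0.
deriv_values = [f3_prime(1.0 / (n * math.pi)) for n in range(1, 9)]

# Example 4: along x_n = 2/(n*pi) the quotient cycles near 1, 0, -1, 0,
# so the derivative at 0 cannot exist.
quot_values = [quotient_f4(2.0 / (n * math.pi)) for n in range(1, 9)]

if __name__ == "__main__":
    print([round(v, 6) for v in deriv_values])
    print([round(v, 6) for v in quot_values])
```

The bound |f′(x)| ≤ 2/π + 1 from Example 3 can be spot-checked the same way by evaluating `f3_prime` on any sample of points in (0, 1/π].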
Fig. 4.46 Hierarchy of some classes of functions (diagram relating the classes (L), (AC), (BV), and (C); its labels are Proposition 434, Proposition 426, Remark 425.2, and the separating examples (i), (ii), (iii), (iv))
Figure 4.46 schematizes a hierarchy regarding classes of real-valued functions defined on a closed bounded interval considered in this section, and refers to some separating examples below. We shall use the following abbreviations:
(L) Lipschitz
(AC) Absolutely continuous
(UC) Uniformly continuous
(BV) Bounded variation
(C) Continuous
(i) The function √x on [0, 1] (Example 1 in Sect. 4.5.8, see also Exercises 13.282, 13.284, and 13.517 for the absolute continuity and Exercise 13.285 for the non-Lipschitz character).
(ii) The Lebesgue singular function S on [0, 1] (Remark 441).
(iii) Any increasing function (Remark 428 or Theorem 432) on [0, 1] with a jump discontinuity. However, (BV) implies at most countably many (jump) discontinuities (Corollary 433).
(iv) The function
\[ f(x) = \begin{cases} x \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases} \]
defined on I := [0, 2/π] (Example 429).
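Item (iv) relies on the fact that x sin(1/x) is not of bounded variation on [0, 2/π]. One way to make this plausible numerically is sketched below; this is an illustration only, and the particular partition points $x_k = 2/((2k+1)\pi)$ are an assumption chosen here for convenience, not necessarily those used in Example 429. At these points f takes the values $\pm x_k$ with alternating signs, so the partial variation sums behave like a harmonic series and grow without bound.

```python
import math


def f(x):
    """x sin(1/x), extended by 0 at x = 0."""
    return x * math.sin(1.0 / x) if x != 0.0 else 0.0


def variation_lower_bound(m):
    """Sum of |f(x_{k+1}) - f(x_k)| over x_k = 2/((2k+1)pi), k = 0..m.

    Every such sum is a lower bound for the total variation of f on
    [0, 2/pi]; the choice of points is an assumption of this sketch.
    """
    pts = [2.0 / ((2 * k + 1) * math.pi) for k in range(m + 1)]
    return sum(abs(f(y) - f(x)) for x, y in zip(pts, pts[1:]))


if __name__ == "__main__":
    for m in (10, 100, 1000, 10000):
        # The sums keep increasing, roughly like (2/pi) * log(m).
        print(m, variation_lower_bound(m))
```

Since $|f(x_{k+1}) - f(x_k)| = x_k + x_{k+1}$ at these points, the sum over k = 0, ..., m − 1 equals $2\sum_{k=0}^{m} x_k - x_0 - x_m$, which diverges as m → ∞.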
4.5.9
The Intermediate Value Property II
We saw in Example 4.5.8.3 that there are functions that are differentiable on [a, b] and yet their derivatives are not continuous functions. Even though f′ need not be continuous if f is differentiable, it has the Intermediate Value Property. This is the content of the following theorem due to J. G. Darboux. Recall the concepts of left-hand and right-hand derivatives introduced in Definition 423.
Theorem 448 (Darboux) Let f : [a, b] → R be a function that is differentiable on (a, b) and such that the one-sided derivatives $f'_+(a)$ and $f'_-(b)$ both exist and are finite. Suppose that γ lies strictly between $f'_+(a)$ and $f'_-(b)$. Then there exists c ∈ (a, b) so that f′(c) = γ.
Proof Obviously, f is continuous on [a, b]. Assume that $f'_+(a) < \gamma < f'_-(b)$. Put g(x) := f(x) − γx, for x ∈ [a, b]. The function g attains its infimum on [a, b] at some point c (Corollary 335). Since $g'_+(a) < 0 < g'_-(b)$, it is clear that c ∈ (a, b). Indeed, since $g'_+(a) < 0$, we have
\[ \lim_{h \to 0+} \frac{g(a+h) - g(a)}{h} < 0, \]
so for a small positive number h we have g(a + h) < g(a), hence c ≠ a. A similar argument can be used to prove that c ≠ b. It follows from Theorem 362 that g′(c) = 0, hence f′(c) = γ. The argument in the case $f'_+(a) > \gamma > f'_-(b)$ is similar. □
Corollary 449 Let f be a differentiable real-valued function defined on (a, b). Choose x, y ∈ (a, b) and suppose that γ lies strictly between f′(x) and f′(y). Then there exists c strictly between x and y so that f′(c) = γ.
Remark 450 Functions that satisfy the conclusion of Corollary 449, i.e., that have the Intermediate Value Property (see Definition 341), are sometimes called Darboux functions, or functions having the Darboux property (after the name of J. G. Darboux). We saw in Theorem 339 that every continuous function is a Darboux function, and in Corollary 449 that the derivative of a differentiable function is a Darboux function. Thus, the function f′ in Example 4.5.8.3 is an instance of a Darboux function that is discontinuous at 0. We mentioned in Remark 342 an example of a real-valued function f defined on [0, 1] such that the image by f of every non-degenerate subinterval of [0, 1] is the interval [0, 1]. This function is clearly discontinuous at every point, although it has the Intermediate Value Property. The example (due to the Polish mathematicians B. Knaster and K. Kuratowski) is given in Exercise 13.207. ®
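The proof of Theorem 448 is constructive in spirit: the point c is a minimizer of g(x) = f(x) − γx. The following sketch imitates that idea numerically for the function of Example 4.5.8.3 on [0, 1/π]. It is only an illustration under assumptions made here: the grid search, the grid size, the value γ = 0.5, and the name `darboux_point` are all hypothetical choices, and a grid minimizer only approximates the true minimizer.

```python
import math


def f(x):
    """Example 4.5.8.3: x^2 sin(1/x), extended by 0 at x = 0."""
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0


def f_prime(x):
    """Derivative of f: 2x sin(1/x) - cos(1/x) for x != 0, and 0 at x = 0."""
    if x == 0.0:
        return 0.0
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)


def darboux_point(gamma, a, b, n=20000):
    """Grid minimizer of g(x) = f(x) - gamma * x on [a, b] (a crude search)."""
    best_x, best_g = a, f(a) - gamma * a
    for k in range(1, n + 1):
        x = a + (b - a) * k / n
        g = f(x) - gamma * x
        if g < best_g:
            best_x, best_g = x, g
    return best_x


a, b = 0.0, 1.0 / math.pi
gamma = 0.5  # strictly between f'_+(0) = 0 and f'_-(1/pi) = 1
c = darboux_point(gamma, a, b)

if __name__ == "__main__":
    print(c, f_prime(c))  # f'(c) should be close to gamma
```

Here the minimizer lies in the interior of the interval, as the proof predicts, and f′ evaluated there is close to γ even though f′ is discontinuous at 0.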
Chapter 5
Function Convergence
The study of sequences of functions is central in Analysis. On one hand, many functions we are working with can be defined as limits of sequences of elementary functions (polynomials, trigonometric polynomials, etc.). On the other hand, it may be convenient to approximate a given function by functions that have good properties in order to simplify computations or whole theories. The purpose of this chapter is to investigate all this.
5.1
Function Sequences
We shall consider a sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued functions defined on a certain nonempty subset D of R. We shall say that D is a common domain for all the functions $f_n$ of the sequence.
5.1.1
Pointwise and Almost Everywhere Convergence
Definition 451 Suppose we have a sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued functions defined on a common nonempty domain D ⊂ R. If $\lim_{n \to \infty} f_n(x)$ exists as a real number for every x ∈ D, we say that $\{f_n\}_{n=1}^{\infty}$ converges pointwise on D. If this is the case, the function f : D → R defined by $f(x) := \lim_{n \to \infty} f_n(x)$ is called the pointwise limit of the sequence $\{f_n\}_{n=1}^{\infty}$ on D. If the sequence $\{f_n\}_{n=1}^{\infty}$ converges pointwise on a subset $D_1 := D \setminus D_0$ of D such that $D_0 \subset D$ satisfies $\lambda(D_0) = 0$, we say that it converges almost everywhere, in short (a.e.). The function f defined (a.e.) on D as $f(x) := \lim_{n \to \infty} f_n(x)$ for all $x \in D_1$ is called the (a.e.) limit of the sequence $\{f_n\}_{n=1}^{\infty}$.
Pointwise convergence of the sequence of functions $\{f_n\}_{n=1}^{\infty}$ on the set D means, then, the convergence of each of the sequences $\{f_n(x)\}_{n=1}^{\infty}$ of real numbers, for x ∈ D. According to the Cauchy criterion (Theorem 152), this will happen if each sequence $\{f_n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence. It is natural then to introduce the concept of a pointwise Cauchy sequence of functions and use it as a test for convergence.
© Springer International Publishing Switzerland 2015 V. Montesinos et al., An Introduction to Modern Analysis, DOI 10.1007/978-3-319-12481-0_5
Fig. 5.1 a The first seven functions in Example 454. b The pointwise limit of the sequence
Definition 452 A sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued functions defined on a common nonempty domain D ⊂ R is said to be pointwise Cauchy on D if, for every x ∈ D, the sequence $\{f_n(x)\}_{n=1}^{\infty}$ is Cauchy.
The following result is the Cauchy criterion for pointwise convergence of a sequence of functions, and it is a straightforward consequence of the Cauchy criterion for sequences of real numbers (Theorem 152).
Proposition 453 A sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued functions defined on a common nonempty domain D ⊂ R converges pointwise on D if and only if it is pointwise Cauchy on D.
Although very useful (and, generally speaking, easy to compute), the pointwise limit of “good” functions may lack good properties. As a matter of fact, let us consider Example 454 below. It provides a sequence of monomials (note that monomials are continuous functions) defined on a closed and bounded interval in R that converges pointwise on the interval, and yet whose pointwise limit is a discontinuous function.
Example 454 Consider the sequence $\{f_n\}_{n=1}^{\infty}$ of functions $f_n(x) = x^n$, where $D(f_n) = [0, 1]$ for each n ∈ N (see Fig. 5.1a). Then the pointwise limit $f = \lim_{n \to \infty} f_n$ on [0, 1] is given by
\[ f(x) = \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x = 1. \end{cases} \]
This follows from the fact that $x^n \to 0$ whenever |x| < 1 (see Corollary 132) and that $f_n(1) = 1$ for all n ∈ N. Note that f, defined on [0, 1], is not continuous at x = 1 (see Fig. 5.1b). ♦
To classify functions that can be obtained as the pointwise limit of a sequence of functions of a certain type, R. Baire introduced the following hierarchy of classes:
Definition 455 A real-valued continuous function defined on a nonempty set D (⊂ R) is said to belong to the (Baire) class 0 on D. The pointwise limit of a sequence of real-valued continuous functions defined on D is said to belong to the (Baire) class 1 on D. In general, once a certain Baire class n has been defined, functions that are
the pointwise limit of a sequence of Baire class n functions are said to belong to the Baire class n + 1 on D. When D := R, we shall just speak of functions in the Baire class n.
We saw in Example 454 that the characteristic function of the set {1} in [0, 1] is a Baire class 1 function that is not in the Baire class 0. In this context, see Theorem 456. See also Example 5.1.1.1 and Exercise 13.305 for an example of a function in the Baire class 2 that is not in the Baire class 1. It is possible to prove that, for every n ∈ N, there are functions in the Baire class n + 1 that are not in the Baire class n (see, e.g., [Nat55]).
Apart from the possible discontinuities of pointwise limits of continuous functions, we still have to tackle another issue: The convergence rates may vary significantly for various input values x. In Example 454 (see also Corollary 132), it takes significantly longer for $(0.999)^n$ to approach zero than for $(0.001)^n$ to approach zero as n → ∞ (see Fig. 5.1). Both of these problems (continuity, rate of convergence) have to be addressed. New types of function convergence are needed. Some of them are considered in this chapter.
Baire’s Theorem
We saw in Example 454 that the pointwise limit of a sequence of continuous functions (i.e., a function of Baire class 1) does not need to be continuous. However, a result, due to Baire himself, shows that Baire class 1 functions must retain certain traces of continuity. A subset of R is said to be a Gδ-set if it is the intersection of a countable number of open subsets of R.
Theorem 456 (Baire) Let F be a nonempty closed subset of R. Then, every real-valued function of Baire class 1 on F is continuous at the points of a dense Gδ-subset of F.
Proof Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of continuous real-valued functions on F that converges pointwise to f. Let ε > 0 be given. For p ∈ N define
\[ F_p(\varepsilon) := \{x \in F : |f_n(x) - f_m(x)| \le \varepsilon/3 \text{ for all } m, n \ge p\}. \]
Since, for any n, m ∈ N, the functions $f_n$ and $f_m$ are continuous, the set $F_p(\varepsilon)$ is closed for all p ∈ N. Due to the pointwise convergence of the sequence $\{f_n\}_{n=1}^{\infty}$, we have
\[ \bigcup_{p=1}^{\infty} F_p(\varepsilon) = F. \]
Note that $F_1(\varepsilon) \subset F_2(\varepsilon) \subset \ldots$. By the Baire Category Theorem 111, there must exist $p_0 \in N$ so that $F_{p_0}(\varepsilon)$ has a nonempty interior relative to F. Therefore, $F_p(\varepsilon)$ has a nonempty interior relative to F for all $p \ge p_0$. Let $U_p(\varepsilon)$ be the interior of $F_p(\varepsilon)$ (relative to F) for $p \ge p_0$ (note $U_p(\varepsilon) \subset U_q(\varepsilon)$ if p ≤ q). Let
\[ U(\varepsilon) := \bigcup_{p=p_0}^{\infty} U_p(\varepsilon). \]
Note that U(ε) is an open dense set relative to F. Now consider the set (see Definition 310)
\[ G_\varepsilon = \{x \in U(\varepsilon) : \limsup_{y \to x} |f(y) - f(x)| < \varepsilon\}. \]
We claim that $G_\varepsilon = U(\varepsilon)$. Clearly $G_\varepsilon \subset U(\varepsilon)$. Suppose now x ∈ U(ε). Then there exists p ∈ N so that $x \in U_p(\varepsilon)$. Take $y \in U_p(\varepsilon)$. Then we have
\[ |f_n(x) - f_m(x)| \le \varepsilon/3, \quad |f_n(y) - f_m(y)| \le \varepsilon/3, \quad \text{for all } n, m \ge p. \tag{5.1} \]
Letting m = p and n → ∞ in (5.1), we get $|f(x) - f_p(x)| \le \varepsilon/3$ and $|f(y) - f_p(y)| \le \varepsilon/3$, hence
\[ |f(y) - f(x)| \le |f(y) - f_p(y)| + |f_p(y) - f_p(x)| + |f_p(x) - f(x)| \le \frac{2}{3}\varepsilon + |f_p(y) - f_p(x)|. \tag{5.2} \]
Take the lim sup as y → x, $y \in U_p(\varepsilon)$, in (5.2) and use the fact that $f_p$ is continuous to get
\[ \limsup_{y \to x} |f(y) - f(x)| \le \frac{2}{3}\varepsilon < \varepsilon, \]
so $x \in G_\varepsilon$. This shows that $U(\varepsilon) \subset G_\varepsilon$, hence $G_\varepsilon = U(\varepsilon)$. Therefore the set $G_\varepsilon$ is a dense open set relative to F for each ε > 0. Now define
\[ G := \bigcap_{n=1}^{\infty} G_{1/n}. \]
The Baire Category Theorem 109 forces the set G to be dense relative to F . Clearly, any point of G is a point of continuity of f . Remark 457 The fact that the set G in Theorem 456 is a Gδ -set follows from the construction in its proof. However, it is a general result pertaining to an arbitrary real-valued function defined on R that the set of its points of continuity is a Gδ -set (see Exercises 13.181 and 13.366 for an extension to metric spaces). ® Examples 458 1. As an example of an interesting behavior related to Theorem 456, observe that, due to this result, the Dirichlet function described in Definition 296 (a function on R that is nowhere continuous, see Example 318.3) cannot be written as a pointwise limit of a sequence of continuous functions, i.e., it is not in the Baire class 1 (see Definition 455). However, it is in the Baire class 2 (see Exercise 13.305). 2. For each q ≥ 2, consider the set of fractions (nonzero and not equal to one) in [0, 1] that has q as a denominator. Denote this set as Fq . The number of elements
Fig. 5.2 Approximating the Riemann function
in $F_q$ is less than q. For each q ≥ 2, define a real-valued function (see Fig. 5.2) $f_q$ on [0, 1] as follows. If q = 2, let $f_2$ be a continuous hat function with support in [1/4, 3/4] (i.e., it vanishes outside this interval) such that $f_2(1/2) = 1/2$. For q > 2, construct pairwise disjoint closed subintervals of [0, 1] with center at all fractions in [0, 1] with denominator q, and select those that were not chosen in previous steps. The graph of $f_q$ is a continuous hat function with support in the union of these intervals and having value 1/q at the center of each of these intervals. Then $\lim_{q \to \infty} f_q = R$ pointwise, where R is the Riemann function introduced in Definition 379. This shows that the Riemann function R is in the Baire class 1 (see Definition 455). Recall that R is continuous at all irrational numbers in [0, 1], a dense subset of [0, 1] (see Example 380). This agrees with the conclusion of Theorem 456.
3. Note that there is no real-valued function defined on R that is continuous at all points in Q and discontinuous at all points in R \ Q (see Item 3 in Sect. 6.9.2). Related to this, see also Exercise 13.181. ♦
The following is a fundamental result characterizing functions of Baire class 1:
Theorem 459 (Baire’s Great Theorem) Let F be a closed subset of R and f be a real-valued function on F. Then the following statements are equivalent:
(i) f is of Baire class 1.
(ii) For every nonempty closed subset C of F, the restriction of f to C has a point of continuity.
(iii) The preimage of every open subset of R by f is an Fσ-subset of F.
For the proof we refer, e.g., to [DGZ93, p. 18].
5.1.2
Uniform Convergence
In this section we shall introduce some concepts that carry the adjective “uniform,” such as uniform convergence, uniform limit, uniform Cauchy, uniform boundedness, etc. (uniform continuity has been already defined, see Definition 343). “Uniformity” can be translated to “in the same way along the entire domain,” or “with the same rate on the entire domain.” This applies, too, to the definition of uniform continuity. There is a way to unify the different appearances of the term “uniform”: To introduce a “distance” in the set under discussion (generally a set of functions), a distance usually identified by the subscript or superscript ∞. Since general distances will
Fig. 5.3 The functions $f_n$, after some n, are in the shaded region of width 2ε around f (uniform convergence)
only be discussed later (in Chap. 6), we prefer here to spell out the definition in each case.
Definition 460 We say that a sequence of real-valued functions $\{f_n\}_{n=1}^{\infty}$ defined on a common nonempty domain D ⊂ R converges uniformly to a function f on D if the following holds: for every ε > 0, there exists N ∈ N such that
\[ |f(x) - f_n(x)| < \varepsilon \quad \text{for all } n \ge N \text{ and } x \in D. \tag{5.3} \]
We say that, in this case, f is the uniform limit on D of the sequence $\{f_n\}_{n=1}^{\infty}$, or that the sequence $\{f_n\}_{n=1}^{\infty}$ converges uniformly to f. We say that the sequence $\{f_n\}_{n=1}^{\infty}$ converges uniformly if there exists a function f : D → R such that $\{f_n\}_{n=1}^{\infty}$ converges uniformly to f.
Observe that the uniform convergence of a sequence of functions to a function implies its pointwise convergence to the same function, while the converse is clearly false (see Remark 464 below). The crucial difference between pointwise convergence and uniform convergence is as follows. In the case of uniform convergence, the choice of N can be made independently of the value of x (it depends on ε, certainly), whereas in the case of pointwise convergence, this is not the case (now, such an N as in (5.3) exists depending on x and on ε). The convergence rates in case of uniform convergence are controlled independently of the chosen point x ∈ D (see Fig. 5.3). The corresponding version of the Cauchy criterion for uniform convergence needs a definition.
Definition 461 A sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued functions defined on a common nonempty domain D ⊂ R is said to be uniformly Cauchy on D if, given ε > 0, there exists N ∈ N such that, for every n, m ≥ N and every x ∈ D, we have $|f_n(x) - f_m(x)| < \varepsilon$.
The following result is the Cauchy criterion for uniform convergence of a sequence of functions. The proof is the natural adaptation of the one provided for sequences of real numbers (Theorem 152), and so it will be omitted.
Proposition 462 A sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued functions defined on a common nonempty domain D ⊂ R converges uniformly to some function on D if and only if it is uniformly Cauchy on D.
Uniform Convergence and Continuity
The following result addresses the first problem encountered when dealing with pointwise convergence, i.e., the lack of continuity of the pointwise limit of a sequence of continuous functions. That uniform convergence avoids this “pathology” (Theorem 463 below) is an important result in real analysis and allows for construction of large classes of continuous functions.
Theorem 463 The uniform limit of a sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued continuous functions defined on a common domain D ⊂ R is a continuous function on D.
Proof Let f be the uniform limit of the sequence $\{f_n\}_{n=1}^{\infty}$ on D. Fix ε > 0. Choose n so that $|f_n(x) - f(x)| < \varepsilon/3$ for all x ∈ D. Let $x_0 \in D$. Since $f_n$ is continuous at $x_0$, we may choose δ > 0 so that $|f_n(x) - f_n(x_0)| < \varepsilon/3$ if x ∈ D and $|x - x_0| < \delta$. Now
\[ |f(x) - f(x_0)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(x_0)| + |f_n(x_0) - f(x_0)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon \]
if x ∈ D and $|x - x_0| < \delta$. Thus f is continuous at $x_0$.
Remark 464
1. As a consequence of Theorem 463, the sequence of functions $\{x^n\}_{n=1}^{\infty}$ defined on [0, 1], given in Example 454, does not converge uniformly on this interval, although it converges pointwise (see Fig. 5.1). This can be shown, as a matter of fact, directly from the definition. Indeed, let ε = 1/2 and let n ∈ N be arbitrary. Due to the fact that $\lim_{x \to 1-} x^n = 1$, we can choose x ∈ (0, 1) such that $x^n > 1/2$. This shows that the convergence of $\{x^n\}_{n=1}^{\infty}$ to the function f : [0, 1] → R given by f(x) = 0 for x ∈ [0, 1) and f(1) = 1 (see Example 454 and Fig. 5.1) is not uniform on [0, 1], for otherwise, for large n ∈ N, we should have $x^n < 1/2$ for all x ∈ [0, 1). Observe, too, that the convergence is uniform on every subinterval [0, δ], for 0 < δ < 1. Indeed, $0 \le x^n \le \delta^n$ for x ∈ [0, δ], and for n ∈ N. The result follows then from Proposition 466 below. Related to this remark and the example of the sequence $\{x^n\}_{n=1}^{\infty}$, see Exercises 13.308 and 13.322.
2. Note that the result similar to Theorem 463 is false for both the notions of Lipschitz or absolutely continuous functions. Indeed, any continuous function f on a bounded closed interval J is the uniform limit of polynomials (see Theorem 490 below), and those are Lipschitz functions on J (see Proposition 445), hence absolutely continuous there. Thus it is enough to consider the function √x (Example 4.5.8.1) or the Lebesgue singular function S (see Remark 441), respectively, on [0, 1]. This again shows how ingenious the definition of continuity is. ®
Remark 465 The fact that a sequence $\{f_n\}_{n=1}^{\infty}$ of functions converges uniformly to a function f on its domain D has some more advantages: First, all values of f can be determined as soon as the values of $f_n$ are known on a dense subset S of D for all n ∈ N. Indeed, f is then known on S. Since f is continuous, it is thus known on D.
Second, since the discrepancy between fn and f is uniformly small, a plot of fn for a large n gives an accurate image of the graph of f (see Fig. 5.3). ®
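The behavior of $\{x^n\}$ described in Remark 464.1 can be checked with a few lines of code. The sketch below is an illustration only, under assumptions made here: the function names are hypothetical, and the level 0.75 in `witness` is an arbitrary choice of a number above 1/2. It shows that the supremum over [0, δ] is $\delta^n$, that for every n there is a point of (0, 1) where $x^n$ exceeds 1/2, and how strongly the number of steps needed to get below a tolerance depends on x.

```python
import math


def sup_on_closed_interval(n, delta):
    """sup of |x^n - 0| over [0, delta], 0 < delta < 1; attained at x = delta."""
    return delta ** n


def witness(n, level=0.75):
    """A point x in (0, 1) with x^n = level > 1/2 (level is an arbitrary choice)."""
    return level ** (1.0 / n)


def steps_needed(x, eps):
    """Smallest n >= 1 with x^n < eps, for 0 < x < 1 and 0 < eps < 1."""
    n = max(1, math.ceil(math.log(eps) / math.log(x)))
    while x ** n >= eps:              # guard against rounding in the logarithms
        n += 1
    while n > 1 and x ** (n - 1) < eps:
        n -= 1
    return n


if __name__ == "__main__":
    print([sup_on_closed_interval(n, 0.9) for n in (1, 10, 100)])
    print([witness(n) ** n for n in (1, 10, 100)])
    print(steps_needed(0.001, 1e-5), steps_needed(0.999, 1e-5))
```

The last line makes the remark about convergence rates concrete: the same tolerance is reached after very few steps at x = 0.001 and only after thousands of steps at x = 0.999.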
Some Criteria for Uniform Convergence
We now have to develop tests as to when a sequence of functions converges uniformly to its pointwise limit. The following statement, the proof of which is left to the reader, gives an equivalent condition for the uniform convergence of a sequence of functions that is frequently used.
Proposition 466 A sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued functions on a domain D ⊂ R converges uniformly to a function f defined on D if and only if $\sigma_n \to 0$ as n → ∞, where $\sigma_n := \sup\{|f_n(x) - f(x)| : x \in D\}$.
Our next result is due to U. Dini. We need first a definition.
Definition 467 A sequence of real-valued functions $\{f_n\}_{n=1}^{\infty}$ defined on a common domain D ⊂ R is said to be increasing if $f_n(x) \le f_{n+1}(x)$ for every n ∈ N and every x ∈ D. Decreasing sequences are defined similarly.
The following theorem is very useful in practical problems.
Theorem 468 (Dini) Consider an increasing sequence $\{f_n\}_{n=1}^{\infty}$ of continuous real-valued functions, defined on a common compact domain K ⊂ R, which converges pointwise to a continuous function f. Then the convergence is uniform.
Remark 469 A similar statement holds for decreasing sequences. ®
Proof of Theorem 468 Let ε > 0 and n ∈ N be given and consider the set $A_n := \{x \in K : f(x) - f_n(x) \ge \varepsilon\}$. Since f and $f_n$ are both continuous, the set $A_n$ is closed. We claim that there exists N ∈ N so that $A_n = \emptyset$ for all n ≥ N. Do note that this ensures uniform convergence, since $0 \le f(x) - f_n(x)$ for all x ∈ K because the sequence is increasing. Assume the contrary. Since the sequence $\{f_n\}_{n=1}^{\infty}$ is increasing, we have that n ≤ m implies $A_m \subset A_n$. Then $\bigcap_{n=1}^{\infty} A_n \ne \emptyset$ due to the compactness of K (see Corollary 148). Choose $x \in \bigcap_{n=1}^{\infty} A_n$. Then $|f(x) - f_n(x)| \ge \varepsilon$ for all n ∈ N, and this violates the pointwise convergence of $\{f_n\}_{n=1}^{\infty}$ to f (at x).
Remark 470
1. The monotonicity condition in Theorem 468 cannot be dropped. Consider, in this direction, the following example: Let $f_n$, n = 2, 3, . . ., be the function on [0, 1] whose graph is the broken line through the points [0, 0], [1/n, 1], [2/n, 0], [1, 0] (see Fig. 5.4). Then $f_n \to 0$ pointwise on [0, 1] and $\{f_n\}$ does not converge to 0 uniformly on [0, 1], since for every n ∈ N, n ≥ 2, there is a point $x_n \in [0, 1]$ such that $f_n(x_n) = 1$.
2. The continuity of the pointwise limit function in Theorem 468 cannot be dropped, either. Indeed, the sequence $\{f_n\}_{n=1}^{\infty}$ of functions defined on [0, 1] by $f_n(x) := x^n$ for all n ∈ N and x ∈ [0, 1] converges pointwise to a discontinuous function f, see Remark 464.1. We saw there that the convergence of the sequence toward the limit function was not uniform.
Fig. 5.4 The first four elements in the sequence of functions in Remark 470
Fig. 5.5 The first four functions in both sequences (Example 471)
3. The compactness of the domain in Theorem 468 cannot be dropped, either. Consider the sequence $\{x^n\}_{n=1}^{\infty}$ of functions defined on (0, 1).
Example 471 Put $f_n(x) := x^n - x^{n+1}$ and $g_n(x) := x^n - x^{2n}$ for n ∈ ℕ. We claim that the two sequences $\{f_n\}_{n=1}^{\infty}$ and $\{g_n\}_{n=1}^{\infty}$ converge pointwise on [0, 1] to the function 0, that the convergence of $\{f_n\}_{n=1}^{\infty}$ is uniform, and that the convergence of $\{g_n\}_{n=1}^{\infty}$ is not. See Fig. 5.5. Indeed, the pointwise convergence of both sequences to the function 0 is due to the fact that $f_n(x) = x^n(1 - x)$ and $g_n(x) = x^n(1 - x^n)$ for each x ∈ [0, 1] and n ∈ ℕ. Now, if x ∈ [0, 1), the sequence $\{x^n\}_{n=1}^{\infty}$ converges to 0 (see Corollary 132), and so $f_n(x) \to 0$ and $g_n(x) \to 0$ as n → ∞. Furthermore, $f_n(1) = 0$ and $g_n(1) = 0$ for all n ∈ ℕ.
To ensure that the sequence $\{f_n\}_{n=1}^{\infty}$ converges uniformly, we may apply Theorem 468 (in the version of Remark 469), due to the fact that it is decreasing. Indeed, $f_n(x) - f_{n+1}(x) = x^n(1 - x)^2 \ge 0$ for all x ∈ [0, 1] and all n ∈ ℕ. We may also provide a direct argument to prove the uniform convergence of $\{f_n\}_{n=1}^{\infty}$ to the function 0: fix ε ∈ (0, 1). If x ∈ (1 − ε, 1], then 0 ≤ 1 − x < ε, and so $0 \le f_n(x) < \varepsilon$ for all n ∈ ℕ. If, on the contrary, x ∈ [0, 1 − ε], we can find $n_0 \in \mathbb{N}$ (depending only on ε) such that $0 \le f_n(x) < \varepsilon$ for all $n \ge n_0$ and all x ∈ [0, 1 − ε]. Indeed, find $n_0$ so that $(1 - \varepsilon)^n < \varepsilon$ whenever $n \ge n_0$. If $n \ge n_0$, then $f_n(x) = x^n(1 - x) \le x^n \le (1 - \varepsilon)^n < \varepsilon$ for every x ∈ [0, 1 − ε]. This shows the statement.
Regarding the sequence $\{g_n\}_{n=1}^{\infty}$, observe that $g_n(2^{-1/n}) = 1/4$ and $2^{-1/n} \in (0, 1)$ for all n ∈ ℕ. This shows that the sequence $\{g_n\}_{n=1}^{\infty}$ does not converge uniformly to the function 0. Note that $2^{-1/n} \to 1$ as n → ∞, so the sequence $\{g_n\}_{n=1}^{\infty}$ does not converge uniformly on any nondegenerate subinterval of [0, 1] that contains 1.
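The contrast between the two sequences in Example 471 can be checked numerically through the quantity $\sigma_n$ of Proposition 466. The following is only a sketch: the helper name `sup_on_grid` and the grid size are assumptions made for illustration, and a finite grid can only approximate the true supremum from below.

```python
def f(n, x):
    # f_n(x) = x^n - x^(n+1)
    return x**n - x**(n + 1)

def g(n, x):
    # g_n(x) = x^n - x^(2n)
    return x**n - x**(2 * n)

def sup_on_grid(func, n, points=10001):
    # Approximates sup{|func_n(x)| : x in [0, 1]} on an equally spaced grid.
    return max(abs(func(n, i / (points - 1))) for i in range(points))

for n in (1, 5, 25, 125):
    # The first column of suprema shrinks; the second stays close to 1/4.
    print(n, sup_on_grid(f, n), sup_on_grid(g, n), g(n, 2 ** (-1 / n)))
```

The suprema of $f_n$ decrease toward 0, while those of $g_n$ stay near 1/4, the value attained at $2^{-1/n}$.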
Related to this, we remark that there is a sequence $\{h_n\}_{n=1}^{\infty}$ of real-valued continuous functions on ℝ that converges pointwise to the function 0 but converges uniformly on no interval (a, b) ⊂ ℝ. See Exercise 13.324. ♦
Our next result is due to K. Weierstrass and provides a sufficient condition for uniform convergence. It is referred to as the Weierstrass M-test, and it concerns series of functions. In applications this result is very powerful and is used quite extensively.
A mathematician who is not also something of a poet will never be a complete mathematician.
Karl Weierstrass
Let us start by introducing some notation on series of functions.
Definition 472 Assume that $\{f_n\}_{n=1}^{\infty}$ is a sequence of real-valued functions defined on a common domain D ⊂ ℝ. The sequence $\{s_N\}_{N=1}^{\infty}$ defined by $s_N := \sum_{n=1}^{N} f_n$ for N ∈ ℕ is called the sequence of partial sums of $\{f_n\}_{n=1}^{\infty}$. We say that the series $\sum_{n=1}^{\infty} f_n$ pointwise (uniformly) converges to a function f : D → ℝ if the sequence $\{s_N\}_{N=1}^{\infty}$ pointwise (respectively, uniformly) converges to the function f. The series will be denoted by $\sum_{n=1}^{\infty} f_n$ independently of its character and, if the series pointwise converges, f will be called its sum, and will also be denoted by $\sum_{n=1}^{\infty} f_n$.
Theorem 473 (Weierstrass M-test) Consider a sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued functions defined on a common domain D ⊂ ℝ, and suppose that for each n ∈ ℕ there exists a real number $M_n \ge 0$ so that $|f_n(x)| \le M_n$ for all x ∈ D. Then, if the series $\sum_{n=1}^{\infty} M_n$ converges, the series $\sum_{n=1}^{\infty} f_n$ converges uniformly on D.
Proof Since the series $\sum_{n=1}^{\infty} M_n$ converges, it is Cauchy, i.e., for every ε > 0, we can choose N ∈ ℕ so that, if N < n < m, then $\sum_{k=n+1}^{m} M_k < \varepsilon$. Now, for N < n < m and for x ∈ D,
\[ |s_m(x) - s_n(x)| = \Bigl|\sum_{k=n+1}^{m} f_k(x)\Bigr| \le \sum_{k=n+1}^{m} |f_k(x)| \le \sum_{k=n+1}^{m} M_k < \varepsilon. \tag{5.4} \]
Therefore, the sequence $\{s_n\}_{n=1}^{\infty}$ is uniformly Cauchy. It is enough to apply Proposition 462.
Remark 474 The Weierstrass M-test, as it has been stated (Theorem 473), gives only a sufficient condition for uniform convergence of a function series. That the condition is by no means necessary is shown in Exercise 13.320. Despite this, its usefulness is apparent; see Example 475 below and Exercises 13.266 and 13.310.
Example 475 Consider the function (see Fig. 5.6)
\[ f(t) := \sum_{n=1}^{\infty} \frac{\sin(n^2 t)}{n^2}, \quad \text{for } t \in \mathbb{R}. \tag{5.5} \]
This function was originally studied by Weierstrass for checking differentiability properties. The series in (5.5) is uniformly convergent: it is enough to observe that $|\sin(n^2 t)/n^2| \le 1/n^2$ for all t ∈ ℝ, that the series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges (see Proposition 174), and then apply Theorem 473.
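The estimate (5.4) behind the M-test can be observed numerically for the series (5.5) with $M_n = 1/n^2$. This is a minimal sketch, assuming an arbitrary sample grid on [0, 7] and arbitrary cut-offs N and M; the names `partial_sum` and `tail_bound` are illustrative only.

```python
import math

def partial_sum(t, N):
    # s_N(t) = sum_{n=1}^{N} sin(n^2 t) / n^2
    return sum(math.sin(n * n * t) / (n * n) for n in range(1, N + 1))

def tail_bound(N, M):
    # sum of the majorants M_n = 1/n^2 for N < n <= M
    return sum(1.0 / (n * n) for n in range(N + 1, M + 1))

ts = [7 * i / 500 for i in range(501)]  # sample points in [0, 7]
N, M = 50, 400
# Largest observed gap |s_M(t) - s_N(t)| over the sample points.
gap = max(abs(partial_sum(t, M) - partial_sum(t, N)) for t in ts)
print(gap, tail_bound(N, M))
```

The observed gap between two partial sums never exceeds the corresponding block of the numerical series, and that bound does not depend on t. This independence is exactly what makes the convergence uniform.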
Fig. 5.6 The plot of the function f in (5.5)
Since the convergence is uniform and the partial sums are continuous, it follows from Theorem 463 that the limit function f is continuous. ♦
Additional Tests for Uniform Convergence of General Function Series
Theorem 473 provides a useful test for the uniform convergence of a series of functions. We add here some other tests analogous to Dirichlet’s and Abel’s criteria for the convergence of series of numbers (Theorems 181 and 182, respectively). Their proofs are just adaptations of the ones provided for those criteria; we include them here for the sake of completeness. Those criteria appeared for the first time in some of the papers of the British mathematician G. H. Hardy. We first need a definition.
Definition 476 A sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued functions defined on a common domain D ⊂ ℝ is called uniformly bounded if there exists a real number M such that $|f_n(x)| \le M$ for every n ∈ ℕ and every x ∈ D.
Theorem 477 Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be sequences of real-valued functions defined on a common domain D ⊂ ℝ. Assume that one of the following two sets of conditions holds.
(i) [Dirichlet] The sequence $\{b_n(x)\}_{n=1}^{\infty}$ is monotone for each x ∈ D, the sequence $\{b_n\}_{n=1}^{\infty}$ converges uniformly to 0 on D, and the sequence $\{A_n\}_{n=1}^{\infty}$ of partial sums associated to $\{a_n\}_{n=1}^{\infty}$ is uniformly bounded on D.
(ii) [Abel] The sequence $\{b_n\}_{n=1}^{\infty}$ is uniformly bounded on D, the sequence $\{b_n(x)\}_{n=1}^{\infty}$ is monotone for each x ∈ D, and the series $\sum_{n=1}^{\infty} a_n$ is uniformly convergent on D.
Then $\sum_{n=1}^{\infty} a_n b_n$ is uniformly convergent on D.
Proof Put $\|f\|_{\infty} := \sup\{|f(x)| : x \in D\}$ for a bounded real-valued function f defined on D.
(i) Let A be an upper bound for the sequence $\{\|A_n\|_{\infty}\}_{n=0}^{\infty}$. The existence of A follows from (i). Fix x ∈ D. Assume first that $b_n(x) \ge b_{n+1}(x)$ for all n ∈ ℕ ∪ {0}; since $b_n(x) \to 0$, this gives $b_n(x) \ge 0$ for all n. From Lemma 179, we get, for p ≤ q in ℕ,
\[
\begin{aligned}
\Bigl|\sum_{n=p}^{q} a_n(x) b_n(x)\Bigr|
&= \Bigl|\sum_{n=p}^{q-1} A_n(x)\bigl(b_n(x) - b_{n+1}(x)\bigr) + A_q(x) b_q(x) - A_{p-1}(x) b_p(x)\Bigr| \\
&\le \sum_{n=p}^{q-1} \|A_n\|_{\infty}\bigl(b_n(x) - b_{n+1}(x)\bigr) + \|A_q\|_{\infty}\, b_q(x) + \|A_{p-1}\|_{\infty}\, b_p(x) \\
&\le A \sum_{n=p}^{q-1} \bigl(b_n(x) - b_{n+1}(x)\bigr) + A b_q(x) + A b_p(x) \\
&= A\bigl(b_p(x) - b_q(x)\bigr) + A b_q(x) + A b_p(x) = 2A b_p(x).
\end{aligned}
\tag{5.6}
\]
If, on the contrary, $b_n(x) \le b_{n+1}(x)$ for all n ∈ ℕ ∪ {0}, we get the analogous estimate $2A|b_p(x)|$. Given ε > 0 we can find, due to the uniform convergence of $\{b_n\}$ to 0 assumed in (i), N ∈ ℕ such that $2A\|b_p\|_{\infty} < \varepsilon$ for every p ≥ N. Hence the sequence of partial sums of the series $\sum a_n b_n$ (see Definition 472) is uniformly Cauchy and so, by Proposition 462, uniformly convergent.
(ii) Fix x ∈ D and assume that $\{b_n(x)\}_{n=0}^{\infty}$ is decreasing. Given ε > 0, there exists N ∈ ℕ ∪ {0} such that $\|A'_n\|_{\infty} < \varepsilon$ for every n ≥ N, where $A'_n := \sum_{k=N}^{n} a_k$ for n ≥ N (see Remark 180). Observe that $\{\|b_n\|_{\infty}\}_{n=0}^{\infty}$ is a bounded sequence; let B be an upper bound for it. Lemma 179 (in the version of Remark 180, formula (2.23)) gives for N ≤ p < q, according to the estimation in (5.6),
\[ \Bigl|\sum_{n=p}^{q} a_n(x) b_n(x)\Bigr| \le A'B + A'B + A'B + A'B = 4A'B, \tag{5.7} \]
where $A'$ is an upper bound for the sequence $\{\|A'_n\|_{\infty}\}_{n=N}^{\infty}$ (so we may take $A' \le \varepsilon$). The same estimate holds in case $\{b_n(x)\}_{n=0}^{\infty}$ is increasing. The conclusion follows from the Cauchy criterion for uniform convergence (Proposition 462).
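As a concrete instance of the Dirichlet part of Theorem 477, one may take $a_n(x) = \sin(nx)$ and $b_n(x) = 1/n$ on $[\delta, 2\pi - \delta]$. This example is not taken from the text: it relies on the standard bound $|\sum_{k=1}^{n} \sin(kx)| \le 1/|\sin(x/2)|$, so that $A = 1/\sin(\delta/2)$ is an admissible uniform bound for the partial sums. The sketch below, with an assumed value of δ and an assumed sample grid, checks the estimate $2A b_p$ from (5.6) on blocks of the series.

```python
import math

delta = 0.1
# Uniform bound for |sin(x) + ... + sin(nx)| on [delta, 2*pi - delta].
A = 1.0 / math.sin(delta / 2)

def block(x, p, q):
    # sum_{n=p}^{q} a_n(x) b_n(x) with a_n(x) = sin(nx), b_n = 1/n
    return sum(math.sin(n * x) / n for n in range(p, q + 1))

xs = [delta + (2 * math.pi - 2 * delta) * i / 400 for i in range(401)]

def worst(p, q):
    # Largest |block| over the sample points; (5.6) predicts at most 2*A/p.
    return max(abs(block(x, p, q)) for x in xs)

for p, q in ((10, 200), (100, 1000)):
    print(p, q, worst(p, q), 2 * A / p)
```

The bound $2A/p$ is uniform in x and tends to 0 as p grows, which is the uniform Cauchy condition used in the proof.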
Egorov’s Theorem
As we saw in Example 454 and Remark 464, pointwise convergence of a sequence of functions need not imply uniform convergence. We can be more precise regarding Example 454: there, the sequence $\{f_n\}_{n=1}^{\infty}$, where each $f_n$ was defined on [0, 1] by $f_n(x) := x^n$, was seen to converge pointwise to the function f : [0, 1] → ℝ given by f(x) = 0 for x ∈ [0, 1) and f(1) = 1, but not uniformly on [0, 1], see Fig. 5.1. The following remarks are in order.
1. The sequence $\{f_n\}_{n=1}^{\infty}$, where $f_n(x) := x^n$ for all x ∈ [0, 1] and n ∈ ℕ, does not converge uniformly on any subset S ⊂ [0, 1] of Lebesgue measure 1. Indeed, such a set cannot avoid any interval (otherwise its complement would have positive measure). So sup S = 1. Observe that if the convergence on S is uniform, then the convergence on S \ {1} will also be uniform, and the set S \ {1} has Lebesgue measure 1. The pointwise limit of $\{f_n\}_{n=1}^{\infty}$ on S \ {1} is the function 0. However, $\sup_{x \in S \setminus \{1\}} x^n = 1$ for each n ∈ ℕ, and we reach a contradiction. Hence the convergence of $\{f_n\}_{n=1}^{\infty}$ to the function f on S cannot be uniform.
2. However, given 0 < ε < 1, we have that $f_n \to 0$ uniformly on [0, 1 − ε]. Indeed, if $s_n := \sup_{x \in [0, 1-\varepsilon]} f_n(x)$ $(= (1 - \varepsilon)^n)$, the sequence $\{s_n\}_{n=1}^{\infty}$ converges to 0 (see Corollary 132), and we can apply Proposition 466.
3. In Example 5.1.1.2, we saw that there is a sequence of continuous functions on [0, 1] that converges pointwise to the Riemann function (Definition 379), i.e., the Riemann function is in the Baire class 1. Since the Riemann function is continuous on no interval, this convergence is uniform on no interval in [0, 1]. In this direction, see also Exercise 13.324.
All these comments should be compared with the following theorem, which is due to the Russian mathematician D. Egorov.
Theorem 478 (Egorov) Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence of real-valued measurable functions defined on a closed and bounded interval [a, b], which converges pointwise to a function f. Then for every ε > 0, there exists a measurable subset E of [a, b] so that $f_n \to f$ uniformly on E and λ([a, b] \ E) < ε.
Proof Let ε > 0 be given. For every k, m ∈ ℕ, define the set $E_m(k) := \{x \in [a, b] : |f(x) - f_n(x)| < 1/k \text{ for all } n \ge m\}$. Use Proposition 402 (c) to conclude that f is a measurable function. This implies that the set $E_m(k)$ is measurable for every m, k ∈ ℕ. Since $f_n \to f$ pointwise, observe that for every k ∈ ℕ, $E_1(k) \subset E_2(k) \subset E_3(k) \subset \cdots$, and
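\[ \bigcup_{m=1}^{\infty} E_m(k) = [a, b]. \tag{5.8} \]
For each k ∈ ℕ, choose $m_k \in \mathbb{N}$ so that
\[ \lambda\bigl([a, b] \setminus E_{m_k}(k)\bigr) < \frac{\varepsilon}{2^k}. \]
This is possible due to (5.8) and Lemma 255. Define the set
\[ E = \bigcap_{k=1}^{\infty} E_{m_k}(k) \]
and note that λ([a, b] \ E)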
0, the set E has positive measure, and we may apply Proposition 271.
3. Note that the conclusion of Theorem 478 holds if the sequence $\{f_n\}$ is assumed only to converge to f (a.e.). Indeed, if this happens and we redefine $f_n$ and f to $\tilde{f}_n$ and $\tilde{f}$, respectively, by letting all of them be zero on a common set N of measure zero such that $\{f_n\}$ converges pointwise on $N^c$, and keeping their values on $N^c$, then $\{\tilde{f}_n\}$ converges pointwise to $\tilde{f}$ (and all those functions are measurable). Given ε > 0, find a set E according to Theorem 478 (for $\{\tilde{f}_n\}$ and $\tilde{f}$). The set [a, b] \ (E \ N) still satisfies λ([a, b] \ (E \ N)) < ε, and the convergence of $\{f_n\}$ to f is uniform on E \ N.
Uniform Convergence and Differentiability
A uniform limit of a sequence of differentiable functions need not be differentiable (see Fig. 5.7 and, more precisely, Lemma 484 below). The following is a sufficient condition for a uniform limit of differentiable functions to be differentiable.
Theorem 480 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions defined on an interval [a, b], each of them continuous on [a, b] and differentiable on (a, b). Assume that the sequence $\{f'_n\}_{n=1}^{\infty}$ of derivatives converges uniformly on (a, b). Also, assume that $\lim_{n \to \infty} f_n(c)$ exists as a real number for some c ∈ [a, b]. Then $\{f_n\}_{n=1}^{\infty}$ converges uniformly on [a, b] to some function f, and f is continuous on [a, b] and differentiable on (a, b). Moreover,
\[ \lim_{n \to \infty} f'_n(x) = f'(x) \quad \text{for all } x \in (a, b). \]
Proof Let g be the (uniform) limit of the sequence $\{f'_n\}_{n=1}^{\infty}$ on (a, b). Since $\{f'_n\}_{n=1}^{\infty}$ is uniformly convergent on (a, b), it is uniformly Cauchy there; this implies that, for sufficiently large m, n ∈ ℕ, the number
\[ M_{m,n} := \sup_{x \in (a, b)} |f'_m(x) - f'_n(x)| \]
exists as an element in ℝ. Thus, without loss of generality, we may assume that $M_{m,n}$ is finite for all m, n ∈ ℕ. Moreover, $M_{m,n} \to 0$ as m, n → ∞ (the convergence in the sense of Definition 201). Apply the Mean Value Theorem 365 to the function $f_m - f_n$ in order to obtain
\[ |(f_m - f_n)(x) - (f_m - f_n)(t)| \le M_{m,n}\,|x - t| \quad \text{for all } x, t \in [a, b]. \tag{5.9} \]
Set t = c in (5.9) and note that
\[ |(f_m - f_n)(x)| \le |(f_m - f_n)(x) - (f_m - f_n)(c)| + |(f_m - f_n)(c)| \le M_{m,n}(b - a) + |f_m(c) - f_n(c)| \tag{5.10} \]
for all x ∈ [a, b]. Since $\{f_n(c)\}_{n=1}^{\infty}$ is a Cauchy sequence and $M_{m,n} \to 0$ as m, n → ∞, it follows from (5.10) that the sequence $\{f_n\}_{n=1}^{\infty}$ is uniformly Cauchy on [a, b]. Hence, by Proposition 462, it converges uniformly on [a, b] to a certain function f. By Theorem 463, the function f is continuous on [a, b].
Let ε > 0 be given. Choose N ∈ ℕ so that
\[ \sup_{s \in (a, b)} |f'_N(s) - g(s)| < \varepsilon/3. \tag{5.11} \]
Fix x ∈ (a, b). Thanks to the differentiability of $f_N$ at x, we can choose δ > 0 so that
\[ \Bigl|\frac{f_N(t) - f_N(x)}{t - x} - f'_N(x)\Bigr| < \varepsilon/3, \tag{5.12} \]
provided 0 < |t − x| < δ. Now fix t ∈ (a, b) such that 0 < |t − x| < δ. Again by the Mean Value Theorem 365 we have, for any m ∈ ℕ, that there exists z ∈ (a, b) (depending on m) such that
\[ \Bigl|\frac{f_m(t) - f_m(x)}{t - x} - \frac{f_N(t) - f_N(x)}{t - x}\Bigr| = \Bigl|\frac{(f_m - f_N)(t)}{t - x} - \frac{(f_m - f_N)(x)}{t - x}\Bigr| \tag{5.13} \]
\[ = |(f_m - f_N)'(z)| \le \sup_{s \in (a, b)} |f'_m(s) - f'_N(s)|. \tag{5.14} \]
Let m → ∞ in (5.14). Recall that $\{f_n\}_{n=1}^{\infty}$ converges (pointwise and uniformly) to f on [a, b], and that $\{f'_n\}_{n=1}^{\infty}$ converges uniformly to g on (a, b). We get
\[ \Bigl|\frac{f(t) - f(x)}{t - x} - \frac{f_N(t) - f_N(x)}{t - x}\Bigr| \le \sup_{s \in (a, b)} |g(s) - f'_N(s)| \quad (< \varepsilon/3). \tag{5.15} \]
From (5.11), (5.12), and (5.15), we obtain
\[ \Bigl|\frac{f(t) - f(x)}{t - x} - g(x)\Bigr| < \varepsilon, \]
provided 0 < |t − x| < δ. Thus $f'(x) = g(x)$ for all x ∈ (a, b).
Fig. 5.8 The function φ and the two first functions $f_1$ and $f_2$ (Definition 481)
An Example (the Takagi–van der Waerden Function)
We now present a series (consisting of continuous functions) that converges uniformly to a function that is (continuous and) nowhere differentiable. An example of this sort was provided by the Japanese mathematician T. Takagi in 1903, and independently by the Dutch mathematician B. L. van der Waerden in 1930. By Theorem 463, the limit function is guaranteed to be continuous. Such a construction can be useful when we try to digitally create a fractal-like object such as a cloud, a snowflake, or a coastline. The first examples of continuous nowhere differentiable functions were due to the Czech mathematician B. Bolzano around 1830 (published in 1922), the Swiss mathematician Ch. Cellérier around 1860, and K. Weierstrass (in 1872, published by the German mathematician P. du Bois-Reymond in 1875). For a detailed account of this story see, e.g., [Th03]. As an application of the Baire category theorem 641, we shall provide an existence proof of continuous nowhere differentiable functions in Sect. 6.9.2.1.
Definition 481 Let S := {4k : k = 0, 1, 2, . . .} = {0, 4, 8, 12, . . .}, and define a real-valued function φ on ℝ by φ(x) := distance between x and the set S, for all x ∈ ℝ, x ≥ 0, and φ(x) := φ(−x) for all x ∈ ℝ, x < 0. For each n ∈ ℕ, we define the function $f_n(x) = \frac{1}{4^n}\varphi(4^n x)$, for all x ∈ ℝ (see Fig. 5.8). Let
\[ f(x) := \sum_{n=1}^{\infty} f_n(x), \quad \text{for all } x \in \mathbb{R}. \tag{5.16} \]
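The definition can be explored numerically. The sketch below evaluates partial sums of (5.16) in floating point; the names `phi` and `takagi_partial` and the truncation levels are assumptions made for illustration. Since 0 ≤ φ ≤ 2, the tail after N terms is at most $\sum_{n>N} 2 \cdot 4^{-n} = \tfrac{2}{3} 4^{-N}$.

```python
def phi(x):
    # Distance from x to the nearest multiple of 4 (phi is even and 4-periodic).
    r = abs(x) % 4.0
    return min(r, 4.0 - r)

def takagi_partial(x, N):
    # Partial sum f_1(x) + ... + f_N(x), with f_n(x) = 4^(-n) * phi(4^n * x).
    return sum(phi(4.0**n * x) / 4.0**n for n in range(1, N + 1))

# Difference quotients at x = 0 along h = 4^(-k): the k-th one equals k.
for k in range(1, 9):
    h = 4.0**-k
    print(k, (takagi_partial(h, 30) - takagi_partial(0.0, 30)) / h)
```

At x = 0 the difference quotients along $h = 4^{-k}$ grow without bound, which already shows that the limit function has no derivative there.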
We refer to f as the Takagi–van der Waerden function (see Figs. 5.9 and 5.10).
Proposition 482 The Takagi–van der Waerden function f introduced in Definition 481 is continuous and nowhere differentiable on ℝ.
Proof First note that
\[ |\varphi(s) - \varphi(t)| = |s - t| \tag{5.17} \]
whenever s, t ∈ ℝ are such that there is no even integer in the open interval having endpoints s and t. Fix a real number x. For each k ∈ ℕ, choose $e_k = 1$ or $e_k = -1$ so that the open interval with endpoints $4^k x$ and $4^k x + e_k$ contains no even integer. If 1 ≤ n ≤ k, the open interval with endpoints $4^n x$ and $4^n x + 4^{n-k} e_k$ contains no even integer. Indeed, assume that it contains 2p, for some integer p. Then the interval with endpoints $4^k x$ and $4^k x + e_k$ would contain the even integer $2p \cdot 4^{k-n}$, a contradiction. Thus, for 1 ≤ n ≤ k we have, due to (5.17),
\[ |f_n(x + 4^{-k} e_k) - f_n(x)| = 4^{-n}\,\bigl|\varphi(4^n x + 4^{n-k} e_k) - \varphi(4^n x)\bigr| = 4^{-n}\, 4^{n-k} = 4^{-k}. \tag{5.18} \]
If n > k, then $4^{n-k}$ is a multiple of 4, the period of φ, so
\[ f_n(x + 4^{-k} e_k) = f_n(x). \tag{5.19} \]
Put $h_k := 4^{-k} e_k$ for k ∈ ℕ. Then, by using (5.18) and (5.19),
\[ \frac{f(x + h_k) - f(x)}{h_k} = \sum_{n=1}^{k} \frac{f_n(x + h_k) - f_n(x)}{h_k} = \sum_{n=1}^{k} \sigma_{n,k}, \quad \text{where each } \sigma_{n,k} \in \{-1, 1\}. \]
Hence the kth difference quotient is an integer with the same parity as k, so two consecutive difference quotients differ by at least 1. Since $h_k \to 0$, the limit $\lim_{k \to \infty} \frac{f(x + h_k) - f(x)}{h_k}$ does not exist, hence f cannot have a derivative at x. On the other hand, since 0 ≤ φ ≤ 2, we have $|f_n(x)| \le 2 \cdot 4^{-n}$ for all x ∈ ℝ, so the Weierstrass M-test (Theorem 473) ensures that the convergence of the series in (5.16) is uniform on ℝ. Since each summand is a continuous function, we get that the function f is continuous on ℝ.
The first three steps in the construction of the Takagi–van der Waerden function on the interval [0, 1] appear in Fig. 5.9. A more accurate graph of the Takagi–van der Waerden function on the same interval [0, 1] is sketched in Fig. 5.10. The reader may appreciate the fractal-like nature of the Takagi–van der Waerden function as we zoom in on the interval x ∈ [0.299, 0.301] (see Fig. 5.11). Imagine that the graph of the Takagi–van der Waerden function is a coastline. The lack of differentiability of the function can be pictured as follows: no matter how high the resolution of the map is, no line-like section of the coast can ever be found. For a related example, exhibiting a continuous nowhere monotone function whose construction relies on the ideas above, see Exercise 13.233.
Fig. 5.9 Three steps in building the Takagi–van der Waerden function in [0, 1] (graphs of $f_1$, $f_1 + f_2$, and $f_1 + f_2 + f_3$)
Fig. 5.10 The graph of the Takagi–van der Waerden function on [0, 1]
Uniform Approximation by Polynomials
Polynomials are computationally very nice functions. According to Theorem 463, the uniform limit of a sequence of polynomials is a continuous function. It is a natural question, with important practical implications, whether, conversely, every continuous function can be approximated uniformly by polynomials. We address this problem below.
Definition 483 We say that a real-valued function f on [a, b] can be uniformly approximated by polynomials to an arbitrary accuracy if for every ε > 0, there exists a polynomial p such that |f(x) − p(x)| < ε for all x ∈ [a, b].
Observe that a real-valued function f defined on [a, b] can be uniformly approximated by polynomials to an arbitrary accuracy if and only if f is the uniform limit on [a, b] of a sequence of polynomials defined on [a, b]. One of the most striking results in the area is that we can approximate any continuous function on a closed interval by polynomials to any arbitrary accuracy. This result is the Weierstrass approximation theorem (Theorem 490 below). We shall provide three different proofs
of this theorem, the first one based on the stability of the class of functions that can be uniformly approximated by polynomials on the given interval (proof of Theorem 490), the second one in Exercise 13.228, by using the notions of convolution and of approximate identity, and a third one based on Fejér’s Theorem 859 (see Remark 860). For the first approach, and in order to establish the result, we need some lemmas.
Fig. 5.11 Zooming on the graph of the Takagi–van der Waerden function
Fig. 5.12 The first polynomials in Lemma 484, and the limit function |x|
Lemma 484 The function t ↦ |t| defined on [−1, 1] (see Figs. 4.19 and 5.12) is the uniform limit of an increasing sequence $\{p_n\}_{n=0}^{\infty}$ of polynomials such that $0 \le p_n(t) \le |t|$ for all n and all t ∈ [−1, 1].
Proof Define a sequence $\{p_n\}_{n=0}^{\infty}$ of polynomials on [−1, 1] by the recursive formula
\[ p_0 \equiv 0; \quad p_{n+1}(t) := p_n(t) + \frac{1}{2}\bigl(t^2 - p_n^2(t)\bigr), \quad \text{for } n \in \mathbb{N} \cup \{0\},\ t \in [-1, 1]. \tag{5.20} \]
Observe that for n ∈ ℕ ∪ {0} and t ∈ [−1, 1] we have
\[
\begin{aligned}
|t| - p_{n+1}(t) &= |t| - p_n(t) - \frac{1}{2}\bigl(|t|^2 - p_n^2(t)\bigr) \\
&= |t| - p_n(t) - \frac{1}{2}\bigl(|t| - p_n(t)\bigr)\bigl(|t| + p_n(t)\bigr) \\
&= \bigl(|t| - p_n(t)\bigr)\Bigl(1 - \frac{1}{2}\bigl(|t| + p_n(t)\bigr)\Bigr).
\end{aligned}
\tag{5.21}
\]
We shall prove first, by induction, that
\[ 0 \le p_n(t) \le |t| \quad \text{for all } t \in [-1, 1] \text{ and all } n \in \mathbb{N} \cup \{0\}. \tag{5.22} \]
Fix t ∈ [−1, 1]. Certainly, $0 = p_0(t) \le |t|$. Assume that (5.22) holds for some n ∈ ℕ ∪ {0}. Thus, $0 \le |t| + p_n(t) \le 2|t| \le 2$, hence
\[ 1 - \tfrac{1}{2}\bigl(|t| + p_n(t)\bigr) \ge 0. \tag{5.23} \]
By (5.21), (5.22), and (5.23), we obtain $|t| - p_{n+1}(t) \ge 0$, i.e., $p_{n+1}(t) \le |t|$. Moreover, since $p_n(t) \le |t|$ gives $t^2 - p_n^2(t) \ge 0$, formula (5.20) yields $p_n(t) \le p_{n+1}(t)$, and we obtain $0 \le p_n(t) \le p_{n+1}(t) \le |t|$. This holds for all t ∈ [−1, 1], and so (5.22) holds for all n ∈ ℕ ∪ {0}. Observe, too, that we proved that the sequence $\{p_n\}_{n=0}^{\infty}$ is increasing. This, together with (5.22), proves that $\{p_n(t)\}_{n=0}^{\infty}$ converges (to l(t), say) for every t ∈ [−1, 1]. Taking limits in (5.20) for n → ∞, we get $l(t) = l(t) + \tfrac{1}{2}(t^2 - l^2(t))$, and this implies l(t) = |t|, since 0 ≤ l(t), for all t ∈ [−1, 1]. This shows that $\{p_n\}_{n=0}^{\infty}$ converges pointwise to the function |t| on [−1, 1]. Since the sequence $\{p_n\}_{n=0}^{\infty}$ is increasing and consists of continuous functions on the compact interval [−1, 1], and the limit |t| is continuous, the uniform convergence is a consequence of Dini’s Theorem 468.
We list a few polynomials in the sequence $\{p_n\}_{n=0}^{\infty}$ defined in Lemma 484 and provide, in Fig. 5.12, the corresponding graphs.
\[ p_0 \equiv 0; \quad p_1(t) = \frac{1}{2}t^2; \quad p_2(t) = t^2 - \frac{1}{8}t^4; \quad p_3(t) = \frac{3}{2}t^2 - \frac{5}{8}t^4 + \frac{1}{8}t^6 - \frac{1}{128}t^8. \]
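The recursion (5.20) is easy to run by machine. The following sketch uses exact rational arithmetic to reproduce the coefficients of the first polynomials, and floating point to estimate $\sup_t (|t| - p_n(t))$ on a grid. The helper names (`poly_mul`, `poly_add`, `next_poly`, `sup_error`) and the grid size are assumptions made for illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    # Product of two polynomials given as coefficient lists, lowest degree first.
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    # Sum of two coefficient lists, padding the shorter one with zeros.
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def next_poly(p):
    # One step of (5.20): p_{n+1} = p_n + (t^2 - p_n^2) / 2.
    t2 = [Fraction(0), Fraction(0), Fraction(1)]
    diff = poly_add(t2, [-c for c in poly_mul(p, p)])
    return poly_add(p, [c / 2 for c in diff])

def sup_error(n, points=2001):
    # Grid estimate of sup{|t| - p_n(t) : t in [-1, 1]}, iterating (5.20) pointwise.
    worst = 0.0
    for i in range(points):
        t = -1 + 2 * i / (points - 1)
        p = 0.0
        for _ in range(n):
            p += (t * t - p * p) / 2
        worst = max(worst, abs(t) - p)
    return worst

polys = [[Fraction(0)]]
for _ in range(3):
    polys.append(next_poly(polys[-1]))
print([str(c) for c in polys[3]])  # coefficients of p_3, lowest degree first
print(sup_error(10), sup_error(50), sup_error(200))
```

The estimated errors decrease slowly, in line with the bound $|t| - p_n(t) \le |t|(1 - |t|/2)^n$ that follows from (5.21) and $p_n \ge 0$.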
Remark 485 The requirement that the approximation of the function |x| by polynomials be done on a bounded interval, as in Lemma 484, is crucial. On an unbounded interval this is by no means true, see Exercise 13.321.
Lemma 486 Let $f_i$, i = 1, 2, . . ., n, be a finite sequence of real-valued functions on [a, b] that can be uniformly approximated to an arbitrary accuracy by polynomials. Then the same is true for the product function $f_1 f_2 \cdots f_n$.
Proof It is enough to prove the result for two functions f, g and then apply finite induction. The functions f and g are continuous on [a, b], due to Theorem 463. Therefore, both functions are bounded on [a, b] and then, by scaling, we may assume, without loss of generality, that |f(x)| ≤ 1/2 and |g(x)| ≤ 1/2 for all x ∈ [a, b]. Fix 0 < ε < 1/2 and find two polynomials p and q such that |f(x) − p(x)| < ε and |g(x) − q(x)| < ε for all x ∈ [a, b]. In particular, |q(x)| ≤ 1 for all x ∈ [a, b]. Therefore,
\[ |(fg)(x) - (pq)(x)| = |f(x)g(x) - f(x)q(x) + f(x)q(x) - p(x)q(x)| \le |f(x)|\,|g(x) - q(x)| + |f(x) - p(x)|\,|q(x)| \le \tfrac{1}{2}\varepsilon + \varepsilon = \tfrac{3}{2}\varepsilon. \]
Since 0 < ε < 1/2 was arbitrary and pq is a polynomial, the result follows.
Corollary 487 Suppose f is a real-valued function on [a, b] that can be uniformly approximated to an arbitrary accuracy by polynomials. Then |f| has the same property.
Proof Observe first that f, as the uniform limit of a sequence of polynomials, must be continuous (Theorem 463), and thus bounded on [a, b] (see Corollary 335). By scaling, we may then assume that |f(x)| ≤ 1 for all x ∈ [a, b]. Given ε > 0, use Lemma 484 to find a nonnegative polynomial p such that 0 ≤ |t| − p(t) < ε for all t ∈ [−1, 1]. In particular, 0 ≤ |f(x)| − p(f(x)) < ε for all x ∈ [a, b]. Observe, too, that necessarily p(0) = 0, hence p has the form $p(t) = a_1 t + a_2 t^2 + \ldots + a_n t^n$ for some finite sequence of coefficients $a_1, a_2, \ldots, a_n$ and some n ∈ ℕ. Thus $p(f) = a_1 f + a_2 f^2 + \ldots + a_n f^n$. Now, for m = 1, 2, . . ., n, the function $f^m$ can be approximated uniformly by polynomials (see Lemma 486), so the same is true for p(f). The result follows.
Definition 488 Let f and g be real-valued functions on [a, b]. Then we denote (f ∨ g)(t) := max{f(t), g(t)}, and (f ∧ g)(t) := min{f(t), g(t)}, for all t ∈ [a, b].
Corollary 489 Let $\{f_i\}_{i=1}^{n}$ be a finite sequence of real-valued functions defined on [a, b] that can be uniformly approximated by polynomials to an arbitrary accuracy. Then the same is true for the functions $f_1 \vee f_2 \vee \ldots \vee f_n$ and $f_1 \wedge f_2 \wedge \ldots \wedge f_n$.
Proof It is enough to prove the result for two functions f, g and then apply finite induction. Note that
\[ f \vee g = \frac{1}{2}\bigl(f + g + |f - g|\bigr), \quad \text{and} \quad f \wedge g = \frac{1}{2}\bigl(f + g - |f - g|\bigr), \]
so the result follows from Corollary 487.
The following important theorem is due to K. Weierstrass.
Theorem 490 (Weierstrass) Any real-valued continuous function f on [a, b] can be uniformly approximated by polynomials to an arbitrary accuracy.
Proof Let ε > 0 and x, y ∈ [a, b]. If x ≠ y, let us define a function $h_{x,y}$ on ℝ as $h_{x,y}(t) := \alpha t + \beta$, where
\[ \alpha := \frac{f(x) - f(y)}{x - y} \quad \text{and} \quad \beta := f(x) - \alpha x \]
(the graph of $h_{x,y}$ is the straight line through (x, f(x)) and (y, f(y))). If x = y, let $h_{x,x}$ be the constant function defined on ℝ with value f(x). Put $V_{x,y} := \{t \in [a, b] : h_{x,y}(t) < f(t) + \varepsilon\}$, and note that $x, y \in V_{x,y}$. Since $f - h_{x,y}$ is continuous, the set $V_{x,y}$ is open in [a, b]. Fix x ∈ [a, b]. The family $\{V_{x,y} : y \in [a, b]\}$ is an open cover of [a, b]. Since [a, b] is compact, there exists a finite open subcover $\{V_{x,y_n} : n = 1, 2, \ldots, N\}$. Define $g_x := h_{x,y_1} \wedge h_{x,y_2} \wedge \cdots \wedge h_{x,y_N}$. By Corollary 489, $g_x$ can be approximated by a polynomial to an arbitrary accuracy. Note that $g_x(t) < f(t) + \varepsilon$ for all t ∈ [a, b]. Now define $W_x := \{t \in [a, b] : g_x(t) > f(t) - \varepsilon\}$ and note that $W_x$ is an open set that contains x, since $g_x(x) = f(x)$. The open cover $\{W_x : x \in [a, b]\}$ has a finite subcover $\{W_{x_n} : n = 1, 2, \ldots, M\}$. Define $g = g_{x_1} \vee g_{x_2} \vee \cdots \vee g_{x_M}$, and note that g can be approximated by a polynomial to an arbitrary accuracy. Moreover, f(t) − ε < g(t) < f(t) + ε for all t ∈ [a, b]. Choosing a polynomial p with |g(t) − p(t)| < ε on [a, b], we get |f(t) − p(t)| < 2ε for all t ∈ [a, b], and the result follows.
Remark 491 The Weierstrass Approximation Theorem 490 holds, more generally, for the space C(K) of all continuous functions on a compact subset K of ℝ (see, e.g., [Jame, Theorem 26.7]), where uniformly approximating a function f ∈ C(K) by a sequence $\{g_n\}_{n=1}^{\infty}$ means $\sup_{x \in K} |f(x) - g_n(x)| \to 0$ as n → ∞. Instead of uniformly approximating by functions in the class P of all polynomials, we may choose any subalgebra A of C(K) that strongly separates points of K. To say that A is a subalgebra means that A is a linear subspace of C(K) (i.e., if f, g ∈ A and α, β ∈ ℝ, then αf + βg ∈ A) that is closed under products (that is, fg ∈ A whenever f ∈ A and g ∈ A). To say that A strongly separates points of K means that given x, y ∈ K, x ≠ y, and α, β ∈ ℝ, there exists an element g ∈ A such that g(x) = α and g(y) = β.
Of course, the set of all polynomials on [a, b] is a subalgebra of C[a, b] that strongly separates points of [a, b]. Another nice example (from the point of view of applications) is the subalgebra TP of real-valued trigonometric polynomials on the interval [−π, π], given by
\[ TP := \Bigl\{ f(t) := \sum_{n=-N}^{N} \bigl(\alpha_n \cos(nt) + \beta_n \sin(nt)\bigr) : \alpha_n, \beta_n \in \mathbb{R},\ n = -N, \ldots, N,\ N \in \mathbb{N} \Bigr\}. \]
Without relying on this more general result, we shall prove in Exercise 13.228 that any continuous function on a closed interval can be uniformly approximated by a trigonometric polynomial to an arbitrary accuracy.
5.1.3 Convergence in Measure
Definition 492 We say that a sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued measurable functions defined on a measurable set M converges in measure to a measurable function f, if
\[ \lim_{n \to \infty} \lambda\bigl(\{x \in M : |f_n(x) - f(x)| \ge \delta\}\bigr) = 0 \]
for every δ > 0, where λ denotes the Lebesgue measure on ℝ.
Theorem 493 (Lebesgue) Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of real-valued measurable functions defined on a closed and bounded interval I in ℝ, and let f be a real-valued measurable function defined on I. Assume that $f_n \to f$ (a.e.) on I. Then $f_n \to f$ in measure on I.
Proof Let δ > 0 be given. For k ∈ ℕ, denote $W_k := \{x \in I : |f_k(x) - f(x)| \ge \delta\}$. Fix ε > 0. We shall prove that there exists $k_0 \in \mathbb{N}$ such that $\lambda(W_k) < \varepsilon$ for $k \ge k_0$. In order to see this, note that, by Proposition 271 and Egorov’s Theorem 478, there is a compact subset K of I such that λ(I \ K) < ε and $f_n - f \to 0$ uniformly on K (see Remark 479.2). Thus there is $k_0$ such that $W_k \subset I \setminus K$ for $k \ge k_0$. This proves that $\lambda(W_k) < \varepsilon$ for $k \ge k_0$.
Remark 494
1. Theorem 493 may fail if the measure of I is infinite. Indeed, let I := ℝ, and put $f_n = \chi_{[-n,n]}$ for n ∈ ℕ. The sequence $\{f_n\}$ converges pointwise on I to the function f ≡ 1, although it certainly does not converge to f in measure on I.
2. If M is any measurable set, and if a sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued functions converges uniformly on M to a function f, then it converges in measure to f. Indeed, given δ > 0, there exists $n_0 \in \mathbb{N}$ such that $|f_n(x) - f(x)| < \delta$ for all $n \ge n_0$ and all x ∈ M. This shows that the set $\{x \in M : |f_n(x) - f(x)| \ge \delta\}$ is empty for all $n \ge n_0$.
3. For any measurable set M, and for any sequence $\{f_n\}$ of real-valued measurable functions, if $\int_M |f_n| \to 0$ (in the Lebesgue sense, see Sect. 7.3), then $f_n \to 0$ in measure. Indeed, if for some δ > 0 we have $\lambda(\{x \in M : |f_{n_k}(x)| \ge \delta\}) \ge \varepsilon$ for some subsequence $\{n_k\}_k$ and some ε > 0, then $\int_M |f_{n_k}| \ge \delta\varepsilon$ for all k ∈ ℕ, a contradiction.
The following result is due to the Hungarian mathematician F. Riesz.
Theorem 495 (Riesz) Let $f_n$, n ∈ ℕ, and f be real-valued measurable functions on a measurable set M. Assume that $f_n \to f$ in measure. Then there is a subsequence $\{f_{n_k}\}_{k=1}^{\infty}$ of $\{f_n\}_{n=1}^{\infty}$ such that $f_{n_k}(x) \to f(x)$ as k → ∞ for almost all x ∈ M.
Proof First we choose a subsequence $\{n_k\}$ so that the sets $M_k = \{x \in M : |f_{n_k}(x) - f(x)| \ge 2^{-k}\}$, k ∈ ℕ, satisfy $\lambda(M_k) < 2^{-k}$. By Lemma 292, we have $\lambda(\limsup M_k) = 0$. Given $x \in M \setminus \limsup M_k$, we get, from the definition of $\limsup M_k$, that there is $k_0 \in \mathbb{N}$ so that $x \notin M_k$ for $k \ge k_0$. Then $|f_{n_k}(x) - f(x)| < 2^{-k}$ for all $k \ge k_0$. It follows that $\lim_k f_{n_k}(x) = f(x)$ for all $x \in M \setminus \limsup M_k$.
Remark 496
1. For n ∈ ℕ, let $f_n := \chi_{I_n}$, where $I_n = [2^{-k} j, 2^{-k}(j + 1)]$ if $n = 2^k + j$, for k ≥ 0 and $0 \le j < 2^k$. Then $f_n \to 0$ in measure on [0, 1], but $\{f_n\}$ converges at no point in [0, 1]. This does not contradict Theorem 495; a subsequence that converges almost everywhere is $\{f_{2^k}\}_{k=1}^{\infty}$. Indeed, $f_{2^k}(x) \to 0$ for all x ∈ (0, 1].
2. If $f_n = n\chi_{[0, 1/n]}$ for n ∈ ℕ, then $f_n \to 0$ in measure on [0, 1], but $\int_0^1 |f_n| = 1$ for all n ∈ ℕ (the integral either in the Riemann or in the Lebesgue sense, see Chap. 7).
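The sequence in Remark 496.1 can be enumerated explicitly. The sketch below uses exact rational arithmetic; the helper names `interval` and `f` and the sample point x = 1/3 are assumptions chosen for illustration. It shows that the lengths of the intervals $I_n$ shrink, while the values $f_n(x)$ keep returning to 1.

```python
from fractions import Fraction

def interval(n):
    # Writes n = 2^k + j with 0 <= j < 2^k and returns the endpoints of I_n.
    k = n.bit_length() - 1
    j = n - 2**k
    return Fraction(j, 2**k), Fraction(j + 1, 2**k)

def f(n, x):
    # Characteristic function of I_n evaluated at x.
    lo, hi = interval(n)
    return 1 if lo <= x <= hi else 0

x = Fraction(1, 3)
for k in range(1, 6):
    block = [f(n, x) for n in range(2**k, 2**(k + 1))]
    # Each block of indices contains both values 0 and 1 at x.
    print(k, float(interval(2**k)[1] - interval(2**k)[0]), block)
```

Along the subsequence $n = 2^k$ the intervals are $[0, 2^{-k}]$, so the values at any fixed x > 0 are eventually 0, in agreement with Theorem 495.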
5.1.4 Local Approximation by Polynomials
The Taylor Polynomial
Sections 5.1.1 and 5.1.2 presented some results on “global” approximation (i.e., on a whole set, first pointwise, then uniformly) of a function by elements in a certain class (polynomials, more generally continuous functions, etc.). Now we are interested in a different problem: given a function f and a point $x_0$ in its domain, we want to approximate f (pointwise, uniformly) by a function in a given class locally at $x_0$, i.e., on some neighborhood of $x_0$. Most typically, we want to approximate the function f locally at $x_0$ by a polynomial of a preassigned degree. Obviously, the theoretical and practical consequences of this approximation will also be of a local character, something that can shed light on the local behavior of f at $x_0$ (existence of local extrema, local monotonicity, local concavity or convexity, etc.). Polynomials were introduced in Definition 297. Here we rewrite their definition by using a slightly different algebraic formula, suited to our purposes.
A polynomial of degree n (an nth degree polynomial, in short) is a function
\[ P_n(x) = a_0 + a_1 (x - x_0) + a_2 (x - x_0)^2 + \dots + a_n (x - x_0)^n, \tag{5.24} \]
where $x_0$ is a preassigned real number (otherwise arbitrary) and $a_n \neq 0$. The way to write the monomials, putting $(x - x_0)^k$ instead of the more familiar $x^k$, is intended only to allow simple expressions for the coefficients $a_0, a_1, \dots, a_n$. A way to convey the idea of proximity of the polynomial $P_n$ to f in the vicinity of $x_0$ is to force $P_n$ to agree, at $x = x_0$, with f in a precise way. We impose the requirement that $P_n^{(k)}(x_0) = f^{(k)}(x_0)$ for all $k = 0, 1, 2, \dots, n$. Altogether, those are n + 1 conditions, exactly the number of coefficients $a_0, a_1, \dots, a_n$ to be computed, so it seems plausible that we can determine $P_n$ uniquely. This precise polynomial of degree less than or equal to n will be denoted $P_{f,x_0,n}$, since it depends on f, $x_0$, and n. To ensure its existence and to make its expression precise are the goals of Proposition 498. Observe, in passing, that this is intended as the natural generalization of the essential fact that the tangent line at $(x_0, f(x_0))$ (the graph of a first-degree polynomial) approximates a function f in a vicinity of $x_0$, as was discussed in Sect. 4.1.4. As a way to suggest to the reader the form the polynomial $P_{f,x_0,n}$ must adopt, the following proposition shows a way to express an nth degree polynomial P in terms of its derivatives at the point $x_0$. In this case, of course, we do not speak of “approximation”: the polynomial P is the polynomial (5.25). Related to this, see Exercise 13.328.

Proposition 497 Let P be an nth degree polynomial defined on R, and let $x_0 \in \mathbb{R}$. Then, for every $x \in \mathbb{R}$,
\[ P(x) = \sum_{k=0}^{n} \frac{P^{(k)}(x_0)}{k!} (x - x_0)^k. \tag{5.25} \]
Proof Write P in the form (5.24). Fix $k \in \{0, 1, 2, \dots, n\}$, and compute the mth derivative of the monomial $a_k (x - x_0)^k$ for $m = 0, 1, 2, \dots, k$. We get
\[ \bigl(a_k (x - x_0)^k\bigr)^{(m)} = k(k - 1) \cdots (k - m + 1)\, a_k (x - x_0)^{k-m}. \]
This shows that the “independent” term of $P^{(k)}$ (i.e., the degree 0 monomial in $P^{(k)}$, an expression that coincides with $P^{(k)}(x_0)$) is $k(k - 1) \cdots 1 \cdot a_k$ ($= k!\, a_k$). The conclusion follows.

Proposition 498 Let f be a real-valued function defined on an open interval $(a, b) \subset \mathbb{R}$. Let $x_0 \in (a, b)$ and $n \in \mathbb{N} \cup \{0\}$. Assume that $f^{(n)}(x_0)$ exists (if n > 0, this requires the existence of $f^{(k)}(x)$ at points in a neighborhood of $x_0$, for $k = 0, 1, 2, \dots, n - 1$). Then there exists a unique polynomial $P_{f,x_0,n}$ of degree less than or equal to n such that
\[ P_{f,x_0,n}^{(k)}(x_0) = f^{(k)}(x_0), \quad k = 0, 1, 2, \dots, n. \tag{5.26} \]
Fig. 5.13 The function exp x and its first four Taylor polynomials at 0
Precisely, we have (compare with (5.25))
\[ P_{f,x_0,n}(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k, \quad \text{for all } x \in \mathbb{R}. \tag{5.27} \]
Proof It is enough to prove (i) that the polynomial (5.27) satisfies conditions (5.26), and (ii) that every polynomial of type (5.24) that satisfies conditions (5.26) has coefficients $a_k$ given by
\[ a_k = \frac{f^{(k)}(x_0)}{k!}, \quad k = 0, 1, 2, \dots, n. \tag{5.28} \]
The validity of (i) and (ii) is checked easily by a simple computation.
Definition 499 Given a real-valued function f defined on $(a, b) \subset \mathbb{R}$, $n \in \mathbb{N} \cup \{0\}$, and $x_0 \in (a, b)$, such that the conditions in Proposition 498 are satisfied, the polynomial (5.27) is called (after the English mathematician B. Taylor) the nth Taylor polynomial associated to f at $x_0$.

Remark 500
1. Assume that the conditions in Proposition 498 are satisfied. Observe that the 0th Taylor polynomial associated to f at $x_0$ is just the constant function having value $f(x_0)$. It is clear, too, that the graph of the corresponding first Taylor polynomial is the tangent line to the graph of f at $(x_0, f(x_0))$, etc. See Fig. 5.13 for the first four Taylor polynomials for the function exp x at the point 0.
2. If f is already an nth degree polynomial, it is clear that the nth Taylor polynomial associated to f at $x_0$ coincides with f.
3. If f and g are two real-valued functions defined on (a, b), $\alpha, \beta \in \mathbb{R}$, $x_0 \in (a, b)$, $n \in \mathbb{N} \cup \{0\}$, and f and g have kth derivatives at $x_0$ for $k = 0, 1, 2, \dots, n$, it is clear that $P_{(\alpha f + \beta g),x_0,n} = \alpha P_{f,x_0,n} + \beta P_{g,x_0,n}$.
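Proposition 497 and Remark 500.2 can be checked by hand on a small polynomial. The following Python sketch is only an illustration, not part of the text's development: the helper names (`derivative`, `evaluate`, `taylor_coefficients`, `evaluate_centered`) and the sample polynomial are assumptions made here. It computes the coefficients $P^{(k)}(x_0)/k!$ of (5.25) and confirms that re-expanding around $x_0$ reproduces the same polynomial.

```python
from math import factorial

def derivative(coeffs):
    """Differentiate a polynomial given by its coefficients of 1, x, x^2, ..."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def taylor_coefficients(coeffs, x0):
    """Return [P^(k)(x0) / k! for k = 0..n], the coefficients in (5.25)."""
    out = []
    current = list(coeffs)
    for k in range(len(coeffs)):
        out.append(evaluate(current, x0) / factorial(k))
        current = derivative(current)
    return out

def evaluate_centered(a, x0, x):
    """Evaluate sum_k a[k] * (x - x0)^k."""
    return sum(ak * (x - x0) ** k for k, ak in enumerate(a))

# Hypothetical sample: P(x) = 2 - 3x + x^3, re-expanded around x0 = 1.
P = [2, -3, 0, 1]
a = taylor_coefficients(P, 1)
print(a)  # coefficients of 1, (x - 1), (x - 1)^2, (x - 1)^3
```

For this sample, the output says that $2 - 3x + x^3 = 3(x - 1)^2 + (x - 1)^3$, which can be confirmed by expanding the right-hand side.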
Fig. 5.14 The functions sin x and cos x and their first six Taylor polynomials at 0
Examples 501 We provide now some examples of “elementary” functions and their nth Taylor polynomials at $x_0 = 0$.

1. Let f(x) := exp x. The function f has all derivatives at every point in R, and $f^{(k)} = f$ for all $k = 0, 1, 2, \dots$. In particular, $f^{(k)}(0) = \exp 0 = 1$ for all $k = 0, 1, 2, \dots$ (for a detailed treatment of the exponential function, see Sect. 5.2.3 below). This implies that
\[ P_{\exp,0,n}(x) = 1 + \frac{x}{1} + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^n}{n!}, \quad n = 0, 1, 2, \dots,\ x \in \mathbb{R}. \tag{5.29} \]
(See Fig. 5.13.)

2. Let f(x) := sin x. The function f has all derivatives at every point in R, $f'(x) = \cos x$ and $f''(x) = -\sin x$, for all $x \in \mathbb{R}$ (the expression for the subsequent derivatives may be deduced from these). We get, then (see Fig. 5.14),
\[ P_{\sin,0,2n+1}(x) = P_{\sin,0,2n+2}(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots + (-1)^n \frac{x^{2n+1}}{(2n+1)!}, \quad n = 0, 1, 2, \dots,\ x \in \mathbb{R}. \tag{5.30} \]

3. Let f(x) := cos x. The function f has all derivatives at every point in R, $f'(x) = -\sin x$ and $f''(x) = -\cos x$, for all $x \in \mathbb{R}$ (the expression for the subsequent derivatives may be deduced from these). We get, then (see Fig. 5.14),
\[ P_{\cos,0,2n}(x) = P_{\cos,0,2n+1}(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots + (-1)^n \frac{x^{2n}}{(2n)!}, \quad n = 0, 1, 2, \dots,\ x \in \mathbb{R}. \tag{5.31} \]

4. Let f(x) = ln(1 + x), for all x > −1. The function f has all derivatives at every point of (−1, +∞). Then,
\[ P_{\ln(1+x),0,n}(x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots + (-1)^{n-1} \frac{x^n}{n}. \tag{5.32} \]
Fig. 5.15 The function ln (1 + x) and its first five Taylor polynomials at 0
The function f and its first five Taylor polynomials are depicted in Fig. 5.15. For an analysis of the convergence of the Taylor polynomials to the function f, see the last part of Sect. 5.2.3. ♦

Estimating the Approximation

In the previous entry, we constructed the nth Taylor polynomial $P_{f,x_0,n}$ associated to a function f defined on an interval (a, b) and to a point $x_0 \in (a, b)$, with the sole requirement that f has all derivatives up to order n at $x_0$ (and this needs, to be sure, the existence of all derivatives up to order n − 1 on a neighborhood of $x_0$). This Taylor polynomial would be of little use if no estimate of how well it approximates the function (if it does at all) were available. The problem of finding such an estimate can be considered from different points of view:

(A) Clearly, $f(x_0) = P_{f,x_0,n}(x_0)$ (after all, this is the first of the set of conditions that $P_{f,x_0,n}$ must satisfy, see (5.26)). We expect that $|f(x) - P_{f,x_0,n}(x)|$ will be small, at least for x close to $x_0$. So we try to compute $M_\delta := \sup\{|f(x) - P_{f,x_0,n}(x)| : x \in [x_0 - \delta, x_0 + \delta]\ (\subset (a, b))\}$ to find how $M_\delta$ depends on δ for small δ > 0. For a partial answer to this question, see Theorem 502 below and the comments after it.

(B) Observe Example 5.1.4.2. The polynomials $P_{\sin,0,2n+1}$ and $P_{\sin,0,2n+2}$ give the same approximation (in fact, they are the same polynomial), so no better approximation is obtained by passing from $P_{\sin,0,2n+1}$ to $P_{\sin,0,2n+2}$. In general, we may expect that the higher the degree of the polynomial, the better the approximation (provided that the function f has derivatives at $x_0$ of sufficiently high order). However, the computational cost of finding higher derivatives may not pay off for a (hypothetical) higher accuracy. In any case, and assuming that the function has derivatives up to order n, we can ask
(B1) Does the accuracy increase when passing from $P_{f,x_0,k}$ to $P_{f,x_0,k+1}$ for $k = 0, 1, 2, \dots, n - 1$?
(B2) If it does, in what sense (pointwise in $(x_0 - \delta, x_0 + \delta)$? uniformly on this set? and for what δ?).

(C) If all the derivatives at $x_0$ do exist, we may even write down a series $\sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k$. This possibility generates several more questions. To properly formulate these questions below we need to develop at least the basic theory of function series and, more particularly, of power series. This will be done in the next section. We can ask
(Ci) Does the series converge?
(Cii) If the answer to (Ci) is positive, where does the series converge? Does the series converge in the same domain where the function f is defined and has all the derivatives? Or does it converge in a smaller domain? For a partial answer to this question, see Example 5.2.2.1.
(Ciii) If the answer to (Ci) is positive and we know the domain of convergence of the series,
(Ciiia) Does the series converge pointwise to f there? Can it converge pointwise to a different function?
(Ciiib) Does it converge uniformly there? Can it converge uniformly to a different function?

Let us fix some notation: given a real-valued function f defined on an interval $(a, b) \subset \mathbb{R}$, $n \in \mathbb{N}$, and a point $x_0 \in (a, b)$ where f has derivatives $f^{(k)}(x_0)$, $k = 0, 1, 2, \dots, n$, put $R_{f,x_0,n} := f - P_{f,x_0,n}$, where $P_{f,x_0,n}$ is the nth Taylor polynomial associated to f at $x_0$. The function $R_{f,x_0,n}$ is called the nth Taylor remainder associated to f at $x_0$. In this way,
\[ f = P_{f,x_0,n} + R_{f,x_0,n}. \tag{5.33} \]
Theorem 502 (Taylor formula with remainder) Let f be a real-valued function defined on an interval $(a, b) \subset \mathbb{R}$, let $n \in \mathbb{N}$, and $x_0 \in (a, b)$. Assume that f has derivatives $f^{(k)}$, $k = 1, 2, \dots, n + 1$, on (a, b). Then, given $x \in (a, b)$, $x \neq x_0$, and $p \in \mathbb{N}$, there exists $\xi \in (x_0, x)$ if $x_0 < x$ (or $\xi \in (x, x_0)$ if $x < x_0$) such that
\[ R_{f,x_0,n}(x) = \frac{(x - x_0)^p (x - \xi)^{n-p+1}}{n!\, p} f^{(n+1)}(\xi). \tag{5.34} \]
Proof Fix $x \in (a, b)$, $x \neq x_0$, and $p \in \mathbb{N}$. Define a real number P by the following equation:
\[ f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \dots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + \frac{P}{n!\, p}(x - x_0)^p, \tag{5.35} \]
and consider the following auxiliary function in the variable $t \in (a, b)$:
\[ \varphi(t) := f(x) - f(t) - \frac{f'(t)}{1!}(x - t) - \frac{f''(t)}{2!}(x - t)^2 - \dots - \frac{f^{(n)}(t)}{n!}(x - t)^n - \frac{P}{n!\, p}(x - t)^p. \tag{5.36} \]
Observe that, due to (5.35) and the way φ was defined, we have φ(x0 ) = φ(x) = 0. Since clearly φ is a differentiable function on (a, b), we can apply Rolle’s Theorem
364 to find ξ strictly between x and $x_0$ such that $\varphi'(\xi) = 0$. It is easy to compute the derivative of φ on (a, b). After some simple manipulations, we get
\[ \varphi'(t) = \frac{(x - t)^{p-1}}{n!} \Bigl( P - (x - t)^{n-p+1} f^{(n+1)}(t) \Bigr). \tag{5.37} \]
Since $\xi \neq x$, we obtain, from (5.37) and the fact that $\varphi'(\xi) = 0$,
\[ P = (x - \xi)^{n-p+1} f^{(n+1)}(\xi). \tag{5.38} \]
Carrying this value to Eq. (5.35), we get
\[ f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \dots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + \frac{(x - x_0)^p}{n!\, p}(x - \xi)^{n-p+1} f^{(n+1)}(\xi), \tag{5.39} \]
and this gives the announced formula (5.34) for the remainder $R_{f,x_0,n}$.
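Theorem 502 only asserts that a suitable ξ exists. In simple cases it can be found explicitly, which makes the statement concrete. The Python sketch below is an illustration under assumptions chosen here (f = exp, $x_0 = 0$, x > 0, and the choice p = n + 1 in (5.34)); the function names are hypothetical. With these choices, (5.34) reads $R = x^{n+1} e^{\xi}/(n+1)!$, which can be solved for ξ.

```python
import math

def remainder_exp(n, x):
    """R_{exp,0,n}(x) = exp(x) - sum_{k=0}^{n} x^k / k!."""
    return math.exp(x) - sum(x ** k / math.factorial(k) for k in range(n + 1))

def xi_for_exp(n, x):
    """Solve (5.34) with p = n + 1 for xi, when f = exp, x0 = 0 and x > 0.

    With p = n + 1 the formula reads R = x^(n+1) * exp(xi) / (n + 1)!,
    hence xi = log(R * (n + 1)! / x^(n+1)).
    """
    r = remainder_exp(n, x)
    return math.log(r * math.factorial(n + 1) / x ** (n + 1))

# The theorem predicts 0 < xi < x; here x = 1.
for n in range(6):
    print(n, xi_for_exp(n, 1.0))
```

For large n the remainder is computed as a difference of nearly equal floating-point numbers, so this sketch is only meant for small n.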
Remark 503 In practice, we are usually interested in the cases p = 1 or p = n + 1 in Theorem 502. In the first case (for p = 1), we get
\[ R_{f,x_0,n}(x) = \frac{(x - x_0)(x - \xi)^n}{n!} f^{(n+1)}(\xi), \tag{5.40} \]
called the Cauchy form of the remainder, and, in the second case (for p = n + 1), we get
\[ R_{f,x_0,n}(x) = \frac{(x - x_0)^{n+1}}{(n + 1)!} f^{(n+1)}(\xi), \tag{5.41} \]
called the Lagrange form of the remainder, where in both cases, ξ is the real number whose existence is guaranteed by Theorem 502.

Remark 504 Theorem 502 provides a sufficient condition for the local approximation we were looking for (see question (A) above). To be precise, and using the Lagrange form (5.41) of the remainder, if f is a function as in the statement of this theorem, and if $|f^{(n+1)}(x)| \le M$ for all $x \in [x_0 - \delta, x_0 + \delta] \subset (a, b)$, then $|f(x) - P_{f,x_0,n}(x)| \le M \delta^{n+1}/(n + 1)!$ for all $x \in [x_0 - \delta, x_0 + \delta]$.

Remark 505 From Theorem 502 we get that, for a real-valued function f as in the statement having, moreover, the property that $f^{(n+1)}$ is a bounded function on (a, b),
\[ \lim_{x \to x_0} \frac{f(x) - P_{f,x_0,n}(x)}{(x - x_0)^n} = \lim_{x \to x_0} \frac{R_{f,x_0,n}(x)}{(x - x_0)^n} = \frac{1}{(n + 1)!} \lim_{x \to x_0} f^{(n+1)}(\xi)(x - x_0) = 0 \]
(using the Lagrange form (5.41) of the remainder). However, this estimate can be improved in the sense that it holds for f with weaker properties. This is important in some applications and is the content of Proposition 506.

Proposition 506 Assume that a function f is defined on an open interval (a, b), and that it is n times differentiable at a point $x_0 \in (a, b)$, where n is a positive integer. Let $P_{f,x_0,n}$ be the nth Taylor polynomial for f at $x_0$, defined in (5.27). Then
\[ \lim_{x \to x_0} \frac{f(x) - P_{f,x_0,n}(x)}{(x - x_0)^n} = 0. \tag{5.42} \]
In other words,
\[ f(x) = P_{f,x_0,n}(x) + o((x - x_0)^n), \quad \text{as } x \to x_0, \tag{5.43} \]
where o(·) in the last formula is the so-called “small o” Landau’s notation (see footnote 1).

Proof Put $g(h) = f(x_0 + h) - P_{f,x_0,n}(x_0 + h)$ for $h \in \mathbb{R}$. By applying (n − 1) times L’Hôspital Rule (Theorem 376), we get
\[ \lim_{h \to 0} \frac{g(h)}{h^n} = \frac{1}{n!} \lim_{h \to 0} \frac{g^{(n-1)}(h)}{h} = \frac{1}{n!} \lim_{h \to 0} \frac{f^{(n-1)}(x_0 + h) - f^{(n-1)}(x_0) - f^{(n)}(x_0) h}{h} = \frac{1}{n!} \lim_{h \to 0} \left( \frac{f^{(n-1)}(x_0 + h) - f^{(n-1)}(x_0)}{h} - f^{(n)}(x_0) \right) = 0 \]
by the definition of $f^{(n)}(x_0)$.

According to Example 5.1.4.2, we get, in particular, that
\[ \sin x = x - \frac{x^3}{3!} + o(x^3), \quad \text{as } x \to 0, \tag{5.44} \]
since $x - \frac{x^3}{3!}$ is the third Taylor polynomial $P_{\sin,0,3}$ of sin x at $x_0 = 0$. Note that $x - \frac{x^3}{3!}$ is also the fourth Taylor polynomial $P_{\sin,0,4}$ of sin x at $x_0 = 0$, so we have, also,
\[ \sin x = x - \frac{x^3}{3!} + o(x^4), \quad \text{as } x \to 0, \tag{5.45} \]
which gives a more precise asymptotic behavior of the function sin x at 0 than (5.44).
1 For the definition and use of the “big O” and “small o” in formulas (a notational device due to Landau), see Exercise 13.257.
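The bound in Remark 504 and the asymptotic statements (5.44) and (5.45) can be observed numerically. The following Python sketch is an informal check, not a proof: the grid-based maximum only approximates the supremum, and the function names are chosen here for illustration. It uses the fact that $x - x^3/3!$ is the fourth Taylor polynomial of sin at 0 and that $|\sin^{(5)}| = |\cos| \le 1$, so Remark 504 applies with n = 4 and M = 1.

```python
import math

def p_sin_3(x):
    """Third (and also fourth) Taylor polynomial of sin at 0: x - x^3/3!."""
    return x - x ** 3 / 6

def max_error(delta, samples=2001):
    """Approximate sup |sin x - P(x)| over [-delta, delta] on a uniform grid."""
    xs = [-delta + 2 * delta * i / (samples - 1) for i in range(samples)]
    return max(abs(math.sin(x) - p_sin_3(x)) for x in xs)

for delta in (1.0, 0.5, 0.1):
    # Lagrange bound with n = 4 and M = 1: delta^5 / 5!
    bound = delta ** 5 / math.factorial(5)
    print(delta, max_error(delta), bound)

for x in (0.5, 0.1, 0.01):
    r = math.sin(x) - p_sin_3(x)
    # Both quotients tend to 0 as x -> 0, in line with (5.44) and (5.45).
    print(x, r / x ** 3, r / x ** 4)
```

Much smaller values of x are avoided on purpose, since the difference sin x − P(x) then suffers from floating-point cancellation.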
Some Applications

• By using Proposition 506, we can now extend Theorem 373 to functions having a higher differentiability degree at a certain point.

Corollary 507 Let f be a real-valued function defined on an open interval (a, b). Assume that for some $x_0 \in (a, b)$ and some $n \in \mathbb{N}$, n ≥ 2, the nth derivative $f^{(n)}(x_0)$ exists as a real number. Assume, too, that $f^{(k)}(x_0) = 0$ for $k = 1, 2, \dots, n - 1$.
1. If $f^{(n)}(x_0) > 0$, then (a) f has a strict local minimum at $x_0$ if n is even. (b) f has no local extremum at $x_0$ if n is odd.
2. If $f^{(n)}(x_0) < 0$, then (a) f has a strict local maximum at $x_0$ if n is even. (b) f has no local extremum at $x_0$ if n is odd.

Proof Observe that, in this case,
\[ P_{f,x_0,n}(x) = f(x_0) + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n, \quad \text{for all } x \in (a, b). \]
Then, for $x \in (a, b)$, $x \neq x_0$, we have
\[ \frac{f(x) - P_{f,x_0,n}(x)}{(x - x_0)^n} = \frac{f(x) - f(x_0)}{(x - x_0)^n} - \frac{f^{(n)}(x_0)}{n!}. \]
It follows from Proposition 506 that
\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{(x - x_0)^n} = \frac{f^{(n)}(x_0)}{n!}. \tag{5.46} \]
1. Assume now that $f^{(n)}(x_0) > 0$. We can find, due to (5.46), δ > 0 small enough to have $(x_0 - \delta, x_0 + \delta) \subset (a, b)$, and such that for $0 < |x - x_0| < \delta$, we have $\frac{f(x) - f(x_0)}{(x - x_0)^n} > 0$. (a) If n is even, this shows that $f(x) > f(x_0)$ for all $x \in (x_0 - \delta, x_0 + \delta)$, $x \neq x_0$, hence $x_0$ is a strict local minimum for f. (b) If, on the contrary, n is odd, then $f(x) < f(x_0)$ if $x \in (x_0 - \delta, x_0)$, and $f(x) > f(x_0)$ if $x \in (x_0, x_0 + \delta)$, so $x_0$ is not a local extremum.
2. If $f^{(n)}(x_0) < 0$, we can proceed similarly. Precisely, thanks to (5.46) we can find δ > 0 small enough to have $(x_0 - \delta, x_0 + \delta) \subset (a, b)$, and such that for $0 < |x - x_0| < \delta$, we have $\frac{f(x) - f(x_0)}{(x - x_0)^n} < 0$. (a) If n is even, this shows that $f(x) < f(x_0)$ for all $x \in (x_0 - \delta, x_0 + \delta)$, $x \neq x_0$, hence $x_0$ is a strict local maximum for f. (b) If, on the contrary, n is odd, then $f(x) > f(x_0)$ if $x \in (x_0 - \delta, x_0)$, and $f(x) < f(x_0)$ if $x \in (x_0, x_0 + \delta)$, so $x_0$ is not a local extremum.

For examples concerning Corollary 507, see Fig. 5.16.

• A second application of Taylor polynomials concerns further convergence criteria for numerical series.
Fig. 5.16 Examples for Corollary 507 (in all cases, $x_0 = 0$)
Proposition 170 provides a criterion for convergence of a series by comparing its general term with the general term of a series whose behavior is known. In Proposition 175, comparing the quotient of consecutive terms of a given series with a constant provided another convergence test. This last procedure suggests comparing quotients of consecutive terms of a given series and of a series whose character (convergence or divergence) is known. Precisely we have the following simple result (note that this criterion, as well as any other criterion for convergence or divergence of a series, can be checked just on a tail of the series, see Remark 155). It lies at the core of the proof of Raabe’s test for convergence (Proposition 509).

Lemma 508 Let $\sum a_n$ and $\sum b_n$ be two series of positive terms. Assume that
\[ \frac{a_{n+1}}{a_n} \le \frac{b_{n+1}}{b_n} \tag{5.47} \]
for all $n \in \mathbb{N}$. Then, if $\sum b_n$ converges, so does $\sum a_n$. Assume now that
\[ \frac{a_{n+1}}{a_n} \ge \frac{b_{n+1}}{b_n} \tag{5.48} \]
for all $n \in \mathbb{N}$ and that $\sum b_n$ diverges. Then $\sum a_n$ diverges, too.
Proof We claim that (5.47) implies $a_n \le \frac{a_1}{b_1} b_n$ for all $n \in \mathbb{N}$. From this and the Comparison Test (Proposition 170), the first part of the statement follows. In order to prove the claim, proceed by induction: for n = 1, the claim clearly holds. Assume
that it is true for 1, 2, ..., n. Since $\frac{a_{n+1}}{a_n} \le \frac{b_{n+1}}{b_n}$, we have, by the induction hypothesis,
\[ a_{n+1} \le a_n \frac{b_{n+1}}{b_n} \le \frac{a_1}{b_1} b_n \frac{b_{n+1}}{b_n} = \frac{a_1}{b_1} b_{n+1}, \]
and this proves the claim for n + 1. By induction, this holds for all $n \in \mathbb{N}$. The second part is proved similarly.

The next statement follows from Lemma 508, by comparing with the series $\sum_{n=1}^{\infty} (1/n)^{\mu}$ for different values of μ. It will be used later (see Exercise 13.122).

Proposition 509 (Raabe’s test) Let $\sum a_n$ be a series of positive terms. Put
\[ \frac{a_{n+1}}{a_n} = \frac{1}{1 + \alpha_n} \quad \text{for all } n \in \mathbb{N}. \tag{5.49} \]
Then,
(i) If there exists k > 1 such that $n\alpha_n > k$ for all $n \in \mathbb{N}$, the series $\sum a_n$ converges.
(ii) If $n\alpha_n \le 1$ for all $n \in \mathbb{N}$, the series $\sum a_n$ diverges.
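To see how the quantity $n\alpha_n$ of (5.49) behaves in concrete cases, one may compute it exactly for two familiar series. The Python sketch below is only an illustration; the helper `n_alpha` and the two sample series are choices made here, not taken from the text. For $a_n = 1/n^2$ one finds $n\alpha_n = 2 + 1/n > 2$, so (i) applies with k = 2; for the harmonic series $a_n = 1/n$ one finds $n\alpha_n = 1$, so (ii) applies.

```python
from fractions import Fraction

def n_alpha(a, n):
    """Return n * alpha_n, where a(n+1)/a(n) = 1 / (1 + alpha_n) as in (5.49)."""
    alpha = a(n) / a(n + 1) - 1
    return n * alpha

def inverse_square(n):
    """General term a_n = 1/n^2 (a convergent series)."""
    return Fraction(1, n * n)

def harmonic(n):
    """General term a_n = 1/n (the divergent harmonic series)."""
    return Fraction(1, n)

print([float(n_alpha(inverse_square, n)) for n in (1, 10, 100)])
print([float(n_alpha(harmonic, n)) for n in (1, 10, 100)])
```

Exact rational arithmetic is used so that the borderline value $n\alpha_n = 1$ for the harmonic series is not blurred by rounding.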
Proof (i) Let μ be such that 1 < μ < k. We shall prove that, for $n \in \mathbb{N}$ big enough, we have
\[ \frac{a_{n+1}}{a_n} < \frac{1}{\left(1 + \frac{1}{n}\right)^{\mu}} = \frac{\left(\frac{1}{n+1}\right)^{\mu}}{\left(\frac{1}{n}\right)^{\mu}}. \tag{5.50} \]
The result will follow then from Lemma 508 and Proposition 174. Observe that
\[ \frac{a_{n+1}}{a_n} = \frac{1}{1 + \alpha_n} = \frac{n}{n + n\alpha_n} < \frac{n}{n + k} = \frac{1}{1 + \frac{k}{n}}. \]
So, in order to show (5.50), it is enough to prove that, for $n \in \mathbb{N}$ big enough,
\[ 1 + \frac{k}{n} > \left(1 + \frac{1}{n}\right)^{\mu}. \]
For this, consider the function $f(x) := x^{\mu}$. According to Theorem 502, we can find $\theta \in (1, 1 + 1/n)$ such that $f(1 + 1/n) = f(1) + f'(1)(1/n) + (1/2) f''(\theta)(1/n^2)$. This gives
\[ \left(1 + \frac{1}{n}\right)^{\mu} = 1 + \frac{\mu}{n} + \frac{\mu(\mu - 1)}{2}\, \theta^{\mu - 2}\, \frac{1}{n^2} \]
[…]

[…] R. (iii) The series converges uniformly on every interval contained in (−R, R) that is simultaneously closed and bounded.
5.2 Function Series
(iv) The sum s (i.e., $s(x) := \sum_{n=0}^{\infty} a_n x^n$) of the series is an infinitely differentiable function on (−R, R). Precisely, the kth derivative of s(x) is given by $\sum_{n=k}^{\infty} n(n - 1) \cdots (n - k + 1)\, a_n x^{n-k}$, for every $k \in \mathbb{N} \cup \{0\}$ and every $x \in \mathbb{R}$ such that |x| < R.
Moreover, we have the following (Cauchy–Hadamard formula)
\[ R = \Bigl( \limsup_{n \to \infty} \sqrt[n]{|a_n|} \Bigr)^{-1}. \tag{5.52} \]
This includes the following two situations: R = 0 if $\limsup_{n \to \infty} \sqrt[n]{|a_n|} = +\infty$, and R = +∞ if $\limsup_{n \to \infty} \sqrt[n]{|a_n|} = 0$.
Proof Put
\[ R := \sup\Bigl\{ |r| : r \in \mathbb{R},\ \sum_{n=0}^{\infty} a_n r^n \text{ converges} \Bigr\} \quad (R = +\infty \text{ is allowed}). \tag{5.53} \]
Note that, if |x| > R, where R is defined in (5.53), then we have that $\sum_{n=0}^{\infty} a_n x^n$ diverges. On the other hand, if R = 0, there is nothing to prove (the series converges only for x = 0). Assume then that R > 0 and that 0 < |x| < R. By the definition of R in (5.53), we may find $s \in \mathbb{R}$ with 0 < |x| < |s| < R such that $\sum_{n=0}^{\infty} a_n s^n$ converges. This implies that $|a_n s^n| \to 0$ as n → ∞; in particular, the sequence $\{|a_n s^n|\}_{n=0}^{\infty}$ is bounded. Let M be an upper bound for it. Then we have
\[ |a_n x^n| = |a_n s^n| \left(\frac{|x|}{|s|}\right)^n \le M \left(\frac{|x|}{|s|}\right)^n, \]
and the last term is the general term of a convergent series, since |x|/|s| < 1 (see Proposition 163). This shows that the series $\sum_{n=0}^{\infty} a_n x^n$ converges absolutely by the comparison test (see Proposition 170). We proved then (i) and (ii).

(iii) is an immediate consequence of the Weierstrass M-test (see Theorem 473). Indeed, if an interval [a, b] is contained in (−R, R), then we may find an interval [−A, A] such that [a, b] ⊂ [−A, A] ⊂ (−R, R). It follows that for x ∈ [a, b], we have $|a_n x^n| \le |a_n A^n|$, and we know that the series $\sum_{n=0}^{\infty} |a_n A^n|$ converges by (i).

We shall prove now (5.52) by showing that if we put R = 1/L, where $L := \limsup_{n \to \infty} \sqrt[n]{|a_n|}$, then R satisfies (5.53). Assume first that L ≠ +∞. Then 1/L ≠ 0. Let $x \in \mathbb{R}$ be such that 0 < |x| < 1/L. Then 1/|x| > L. Fix $r \in \mathbb{R}$ such that 1/|x| > r > L. It follows that there exists $N \in \mathbb{N}$ such that $\sqrt[n]{|a_n|} < r < 1/|x|$ for every n ≥ N. In other terms, $|a_n x^n| < r^n |x|^n < 1$ for every n ≥ N. Since r|x| < 1, and so the series $\sum_{n=0}^{\infty} r^n |x|^n$ converges, it follows that $\sum_{n=0}^{\infty} |a_n x^n|$ converges too. Include now the possibility that L = +∞ (in such a case put 1/L = 0). Assume that |x| > 1/L. Then 1/|x| < L. Thus there exists a subsequence $\{n_k\}_{k=1}^{\infty}$ of 1, 2, ... such that $1/|x| < \sqrt[n_k]{|a_{n_k}|}$ for every $k \in \mathbb{N}$. Then $1/|x|^{n_k} < |a_{n_k}|$, hence $1 < |a_{n_k} x^{n_k}|$ for every $k \in \mathbb{N}$, and the series $\sum_{n=0}^{\infty} a_n x^n$ diverges since its general term does not converge to 0. Altogether, we proved the formula (5.52).
To prove (iv), we need a simple observation: consider the series obtained as the formal derivative of the given series (no convergence is assured for the moment). Precisely, given the series
\[ \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots, \tag{5.54} \]
consider the series
\[ a_1 + 2a_2 x + 3a_3 x^2 + \dots = \sum_{n=1}^{\infty} n a_n x^{n-1}. \tag{5.55} \]
Let R be the radius of convergence of the series (5.54). Compute the radius of convergence R′ of the series (5.55) by using the corresponding Cauchy–Hadamard formula (5.52). We have
\[ (R')^{-1} = \limsup_{n \to \infty} |n a_n|^{1/(n-1)} = \limsup_{n \to \infty} \Bigl( (n |a_n|)^{1/n} \Bigr)^{n/(n-1)} = \limsup_{n \to \infty} |a_n|^{1/n} = R^{-1}, \]
since $\lim_{n \to \infty} n/(n - 1) = 1$ and $\lim_{n \to \infty} n^{1/n} = 1$ (see Exercise 13.90). This proves that both series (5.54) and (5.55) have the same radius of convergence. We may now conclude (iv) by using Theorem 480. Indeed, it is enough to assume R > 0. Fix x ∈ (−R, R) and let $r \in \mathbb{R}$ be such that |x| < r < R. By (iii) (and the fact that the radius of convergence of (5.55) is also R), the series (5.55) converges uniformly on [−r, r]. The result follows from Theorem 480. This proves that s is differentiable at x. The argument can be applied again to prove that, in fact, s is infinitely differentiable at x.

Remark 512 Note that Theorem 511 does not discuss the convergence of the series $\sum_{n=0}^{\infty} a_n x^n$ at the points x = R or x = −R. The reason is that there is no decisive statement about this (although some information is available in certain cases, see, for example, Theorem 518). Let us provide some examples to illustrate several possibilities:

1. The power series $\sum_{n=0}^{\infty} x^n$ has radius of convergence 1 (a consequence of formula (5.52)). Its sum on (−1, 1) can be readily calculated, since its general term is a geometric progression. Apply Proposition 163 to obtain that the sum is the function $f(x) := (1 - x)^{-1}$ on (−1, 1) (see Fig. 5.18). However, the series does not converge at x = 1 nor at x = −1. The convergence on (−1, 1) is not uniform. To check this, assume for a moment that the series converges uniformly to f. Then, for every ε > 0, there exists $n \in \mathbb{N}$ such that $|f(x) - \sum_{k=0}^{m} x^k| < \varepsilon$ for all m ≥ n and all x ∈ (−1, 1). This is false, since, for example, $\sum_{k=0}^{n} x^k$ is bounded on (−1, 1) and f is not. Alternatively, note that $f(x) - \sum_{k=0}^{m} x^k = \frac{1}{1 - x} - \frac{x^{m+1} - 1}{x - 1} = \frac{x^{m+1}}{1 - x}$, and this is unbounded on (−1, 1).
Fig. 5.18 The function f and four approximations on (−1, 1) (Example 512.1)
Fig. 5.19 Five approximations to $\sum_{n=1}^{\infty} \frac{x^n}{n}$ on [−1, 1) (Example 512.2)
2. The power series $\sum_{n=1}^{\infty} (1/n) x^n$ has radius of convergence 1 (use formula (5.52) and Exercise 13.90). The series converges at x = −1 by Corollary 183, and diverges at x = 1 by Proposition 161 (see Fig. 5.19). The convergence on [−1, 1) is not uniform. Arguing by contradiction, assume it is. Then for ε > 0, there exists $n \in \mathbb{N}$ such that $|\sum_{k=p}^{q} x^k/k| < \varepsilon$ for every $p, q \in \mathbb{N}$ with n ≤ p ≤ q and every x ∈ [−1, 1). Since the harmonic series diverges (see Proposition 161), we can find $p, q \in \mathbb{N}$ such that n ≤ p ≤ q and $\sum_{k=p}^{q} 1/k > \varepsilon$. Now, it is enough to take x ∈ (0, 1) close enough to 1 to obtain $|\sum_{k=p}^{q} x^k/k| > \varepsilon$, a contradiction. Related to this example, see Theorem 518.

3. The power series $\sum_{n=1}^{\infty} (1/n^2) x^n$ has radius of convergence 1. It converges also for x = 1 and for x = −1, due to Proposition 174. It converges uniformly on [−1, 1], due to the Weierstrass M-test (Theorem 473) (see Fig. 5.20).

4. The power series $\sum_{n=0}^{\infty} n!\, x^n$ has radius of convergence R = 0 (thus it converges only for x = 0). This follows from the Cauchy–Hadamard formula (5.52) and Lemma 178. Indeed, $\frac{(n+1)!}{n!} = n + 1 \to +\infty$ as n → ∞.

5. The power series $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ has radius of convergence R = +∞ (thus it converges for every $x \in \mathbb{R}$). This follows again from the Cauchy–Hadamard formula (5.52) and Lemma 178. Indeed, $\frac{n!}{(n+1)!} = \frac{1}{n+1} \to 0$ as n → ∞ (alternatively, use the ratio test, i.e., Proposition 175). The function on R defined as the sum of this series is the exponential function. For a more complete treatment, see Sect. 5.2.3.
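The five radii of convergence in Remark 512 can be made plausible numerically by looking at $\sqrt[n]{|a_n|}$ for a few large n. The Python sketch below is a heuristic only: a finite sample of values cannot establish a lim sup, and the names (`root_estimate`, `examples`) are hypothetical. The coefficients are passed through their logarithms so that n! does not overflow.

```python
import math

def root_estimate(log_abs_a, n):
    """Return |a_n|^(1/n), computed from log|a_n| to avoid overflow."""
    return math.exp(log_abs_a(n) / n)

# Each entry maps n to log|a_n| for one of the series in Remark 512.
examples = {
    "sum x^n":       lambda n: 0.0,                   # a_n = 1
    "sum x^n / n":   lambda n: -math.log(n),          # a_n = 1/n
    "sum x^n / n^2": lambda n: -2 * math.log(n),      # a_n = 1/n^2
    "sum n! x^n":    lambda n: math.lgamma(n + 1),    # a_n = n!
    "sum x^n / n!":  lambda n: -math.lgamma(n + 1),   # a_n = 1/n!
}

for name, log_a in examples.items():
    print(name, [root_estimate(log_a, n) for n in (10, 1000, 100000)])
```

The first three rows approach 1 (so R = 1 by (5.52)), the fourth grows without bound (R = 0), and the fifth tends to 0 (R = +∞), in agreement with the remark.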
Fig. 5.20 Five approximations to $\sum_{n=1}^{\infty} \frac{x^n}{n^2}$ on [−1, 1] (Example 512.3)
Remark 513
1. The information provided by Theorem 511 is more far-reaching than it may appear at first glance. For example, given any power series $\sum_{n=0}^{\infty} a_n x^n$, if it converges at some $x_0 \in \mathbb{R}$, then it converges (even absolutely) at every $x \in \mathbb{R}$ such that $|x| < |x_0|$ (every power series converges at x = 0, so $x_0 = 0$ is not relevant here). Equivalently, if it diverges at some $x_0 \in \mathbb{R}$, then it diverges at every $x \in \mathbb{R}$ such that $|x| > |x_0|$. This may be used as a test for estimating the radius of convergence of a power series.
2. The sum s of a power series $\sum_{n=0}^{\infty} a_n x^n$ whose radius of convergence is R (for some R > 0) is a function on (−R, R) that, in some cases, can be extended beyond the set [−R, R], although the series certainly does not converge at points of R \ [−R, R]. For instance, we saw in Example 512.1 that the series $\sum_{n=0}^{\infty} x^n$ has radius of convergence 1 (precisely, it converges if and only if |x| < 1), and sums $(1 - x)^{-1}$ on (−1, 1). The function $(1 - x)^{-1}$ exists on (−∞, 1). We cannot expect $(1 - x)^{-1} = \sum_{n=0}^{\infty} x^n$ on (−∞, −1], since the series does not converge there.

Remark 514
1. The reader will have no difficulty in adapting the previous and subsequent results on power series to series of the form $\sum_{n=0}^{\infty} a_n (x - x_0)^n$, where $x_0$ is some given real number (we refer to those series as power series centered at $x_0$). He/she can understand this as a simple change of variable. For example, the following result is an easy consequence of Theorem 511. We provide the complete statement but not the proof, since it really consists of just “changing the variable.”

Theorem 515 Let $x_0 \in \mathbb{R}$, and let $\sum_{n=0}^{\infty} b_n (x - x_0)^n$ be a power series centered at $x_0$. Then, there exists R ∈ [0, +∞] such that the following properties hold:
(i) The series converges absolutely for every $x \in (x_0 - R, x_0 + R)$.
(ii) The series diverges for every $x \in \mathbb{R}$ such that $|x - x_0| > R$.
(iii) The series converges uniformly on every closed interval contained in $(x_0 - R, x_0 + R)$.
(iv) The sum s of the series is an infinitely differentiable function on $(x_0 - R, x_0 + R)$. Precisely, the kth derivative of s is given by $\sum_{n=k}^{\infty} n(n - 1) \cdots (n - k + 1)\, b_n (x - x_0)^{n-k}$, for every $k \in \mathbb{N} \cup \{0\}$ and every $x \in \mathbb{R}$ such that $|x - x_0| < R$.
Moreover, we have the following (Cauchy–Hadamard formula)
\[ R = \Bigl( \limsup_{n \to \infty} \sqrt[n]{|b_n|} \Bigr)^{-1}. \tag{5.56} \]
This includes the following two situations: R = 0 if $\limsup_{n \to \infty} \sqrt[n]{|b_n|} = +\infty$, and R = +∞ if $\limsup_{n \to \infty} \sqrt[n]{|b_n|} = 0$.

2. Assume that $\sum_{n=0}^{\infty} a_n x^n$ is a power series whose radius of convergence is R > 0. Let s(x) be the sum of the series for |x| < R. Take $x_0 \in \mathbb{R}$ such that $0 < |x_0| < R$ and put $\delta := R - |x_0|$. We claim that there is a power series $\sum_{n=0}^{\infty} b_n (x - x_0)^n$ that converges in $(x_0 - \delta, x_0 + \delta)$, and whose sum is again s on this set. To prove this, put (formally) for $x \in (x_0 - \delta, x_0 + \delta)$,
\[ \sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} a_n (x - x_0 + x_0)^n = \sum_{n=0}^{\infty} a_n \sum_{k=0}^{n} \binom{n}{k} (x - x_0)^k x_0^{n-k} = \sum_{p=0}^{\infty} b_p (x - x_0)^p, \tag{5.57} \]
where, for $p = 0, 1, 2, \dots$, we collect the terms of (5.57) with k = p, that is,
\[ b_p = \sum_{n \ge p} a_n \binom{n}{p} x_0^{n-p}. \tag{5.58} \]
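The re-centering in (5.57) can be tried out on the geometric series. The Python sketch below is an illustration under assumptions made here: $a_n = 1$ for all n (so $s(x) = 1/(1 - x)$ and R = 1), $x_0 = 1/2$, and a truncation of the infinite sum after a fixed number of terms; the function names are hypothetical. Collecting the coefficient of $(x - x_0)^p$ in (5.57), i.e., the terms with k = p, gives $b_p = \sum_{n \ge p} a_n \binom{n}{p} x_0^{n-p}$, which for this series should equal $s^{(p)}(x_0)/p! = (1 - x_0)^{-(p+1)}$.

```python
from math import comb

def recentered_coefficient(a, p, x0, terms=200):
    """Truncated sum over n >= p of a(n) * C(n, p) * x0^(n - p)."""
    return sum(a(n) * comb(n, p) * x0 ** (n - p) for n in range(p, p + terms))

def geometric(n):
    """Coefficients a_n = 1 of the geometric series, s(x) = 1 / (1 - x)."""
    return 1.0

x0 = 0.5
b = [recentered_coefficient(geometric, p, x0) for p in range(4)]
print(b)                                        # truncated sums
print([(1 - x0) ** -(p + 1) for p in range(4)])  # closed form for comparison
```

The two printed lists agree up to rounding, which is consistent with the claim that the re-centered series sums s on $(x_0 - \delta, x_0 + \delta)$.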
In (5.57), we rearranged the terms of the series in order to collect those having the same factor $(x - x_0)^p$. The possibility to do this is guaranteed by the absolute convergence of the series for $x \in (x_0 - \delta, x_0 + \delta)$ (see (i) in Theorem 511, Proposition 213, and Corollary 214). This justifies the validity of the formal development above. Observe that, by the argument, the series $\sum_{n=0}^{\infty} b_n (x - x_0)^n$ converges for $x \in (x_0 - \delta, x_0 + \delta)$, and sums s(x) there.

Corollary 516 Let $\sum_{n=0}^{\infty} a_n x^n$ be a power series with radius of convergence R ∈ (0, +∞]. Put $s(x) := \sum_{n=0}^{\infty} a_n x^n$ for x ∈ (−R, R). Then,
(i) We have
\[ a_n = \frac{s^{(n)}(0)}{n!}, \quad \text{for every } n \in \mathbb{N} \cup \{0\}. \tag{5.59} \]
In particular, for every $n \in \mathbb{N} \cup \{0\}$, $\sum_{k=0}^{n} a_k x^k$ coincides with $P_{s,0,n}(x)$, where $P_{s,0,n}$ is the nth Taylor polynomial of s at 0.
(ii) The values of s on (−R, R) are determined by the values of s on an arbitrarily small neighborhood of 0.
(iii) If $s(x_n) = 0$ for every $n \in \mathbb{N}$, where $\{x_n\}_{n=1}^{\infty}$ is a sequence in (−R, R) that has an accumulation point $x_0 \in (-R, R)$, then s(x) = 0 for all x ∈ (−R, R).

Proof (i) follows by evaluating $s^{(n)}$, given by (iv) in Theorem 511, at x = 0, for $n \in \mathbb{N}$. (ii) follows from (i): If s is known on (−ε, ε), for some 0 < ε < R, then we can compute $s^{(n)}(0)$ (and this determines $a_n$) for all $n \in \mathbb{N} \cup \{0\}$, hence s on (−R, R). To prove (iii), we will show, first, a particular case:

(*) Let $g(t) = \sum_{n=0}^{\infty} b_n t^n$ be a power series converging on (−δ, δ) for some δ > 0. Assume that there exists a sequence $\{t_n\}_{n=1}^{\infty}$ in (−δ, δ) such that $t_n \to 0$ and $g(t_n) = 0$ for all $n \in \mathbb{N}$. Then g(t) = 0 for all t ∈ (−δ, δ).

Proof of (*) Since g is continuous on (−δ, δ) (see (iv) in Theorem 511), we get $0 = g(t_n) \to g(0)$, hence g(0) = 0. Apply the Mean Value Theorem 365 to g on the interval with endpoints 0 and $t_n$ to obtain $t'_n$ in the interior of this interval such that $g'(t'_n) = 0$. This holds for all $n \in \mathbb{N}$, so we get a sequence $\{t'_n\}_{n=1}^{\infty}$ in (−δ, δ) such that $t'_n \to 0$ and $g'(t'_n) = 0$ for all $n \in \mathbb{N}$. By the fact that g′ is continuous, we get g′(0) = 0. Apply the Mean Value Theorem to g′ and to the interval with endpoints 0 and $t'_n$ to get $t''_n$ in the interior of this interval such that $g''(t''_n) = 0$. Continue in this way. This recursive process shows that $g^{(n)}(0) = 0$ for $n = 0, 1, 2, \dots$, and this implies, by (i), that g vanishes on (−δ, δ). Thus, (*) holds.

Assume now that $\{x_n\}_{n=1}^{\infty}$ and $x_0$ are as in the statement. By passing to a subsequence if necessary we may assume, without loss of generality, that the sequence $\{x_n\}_{n=1}^{\infty}$ converges to $x_0$. Let $\delta := R - |x_0|$ (> 0). The argument in Remark 514 shows that $s(x) = \sum_{n=0}^{\infty} b_n (x - x_0)^n$ for some sequence $\{b_n\}_{n=0}^{\infty}$ of coefficients and for all $x \in (x_0 - \delta, x_0 + \delta)$. Put $t := x - x_0$ for $x \in (x_0 - \delta, x_0 + \delta)$, and let $g(t) := \sum_{n=0}^{\infty} b_n t^n$ for |t| < δ. There exists $N \in \mathbb{N}$ such that $x_n \in (x_0 - \delta, x_0 + \delta)$ for all n ≥ N. Now we have a power series $\sum_{n=0}^{\infty} b_n t^n$ converging for t ∈ (−δ, δ), and a sequence $\{t_n := x_{N+n} - x_0\}_{n=1}^{\infty}$ in (−δ, δ) such that $t_n \to 0$ and $g(t_n) = 0$ for all $n \in \mathbb{N}$. By (*) in this proof, we get g(t) = 0 for all t ∈ (−δ, δ), i.e., s(x) = 0 for all $x \in (x_0 - \delta, x_0 + \delta)$.

To finish the proof, let S := {x ∈ (−R, R) : there exists an open neighborhood U(x) ⊂ (−R, R) of x such that s vanishes on U(x)}. Obviously, S is an open subset of (−R, R), and it is nonempty, since $x_0 \in S$ by the preceding paragraph. Let us prove that S is also closed relative to (−R, R). For this, choose an infinite sequence $\{s_n\}$ in S such that $s_n \to x \in (-R, R)$. Due to Remark 514.2, there exists an open interval V(x) centered at x such that s is the sum of a power series on V(x). Moreover, this power series vanishes on the sequence $\{s_n\}_{n \ge N}$, where $N \in \mathbb{N}$ is such that $s_n \in V(x)$ for all n ≥ N. The preceding paragraph shows that s vanishes on V(x), hence x ∈ S. Apply now Corollary 104 to conclude that S = (−R, R).
Remark 517 Once it has been proved that, given $x_0 \in (-R, R)$, the function $s := \sum_{n=0}^{\infty} a_n x^n$ in Remark 514 can be expressed in $(x_0 - \delta, x_0 + \delta)$ as a power series $\sum_{n=0}^{\infty} b_n (x - x_0)^n$ centered at $x_0$ (where $\delta := R - |x_0|$), the coefficients $b_n$ can be calculated, alternatively to the formula (5.58), by the simpler formula
\[ b_n = \frac{s^{(n)}(x_0)}{n!}, \quad \text{for } n \in \mathbb{N} \cup \{0\}. \tag{5.60} \]
This is a consequence of (i) in Corollary 516 (formula (5.59)) in the corresponding version for a series centered at a point $x_0$.

The following result presents an application of Theorem 182 to the convergence of a power series at the boundary of its (bounded) interval of convergence (−R, R). This shows, in particular (see Corollary 519), that the function defined as the sum of a power series in (−R, R) is left-continuous at x = R as soon as the series converges at x = R. The result is proved for the case R = 1, although a simple change of variable shows the assertion in the general case. Related to this result, see Example 512.2.

Theorem 518 (Abel) Consider a sequence $\{a_n\}_{n=0}^{\infty}$ of real numbers so that $\sum_{n=0}^{\infty} a_n$ converges. Then the series
\[ f(r) := \sum_{n=0}^{\infty} a_n r^n \]
converges uniformly on [0, 1], and the function f is thus continuous on [0, 1]. Proof Set bn (r) := r n for r ∈ [0, 1] and n ∈ N ∪ {0}. Note that for r ∈ [0, 1], the sequence {bn (r)}∞ and convergent. n=1 is bounded above by 1, and it is decreasing n We can then apply Theorem 182 to conclude that ∞ n=0 an r converges. In order to prove that the convergence is uniform on [0, 1], we may use the estimate (2.25) (we follow the notation there, i.e., An := nk=0 ak for all n ∈ N ∪ {0}). This gives, for 0 ≤ p < q and r ∈ [0, 1], q−1 q−1
an bn (r) ≤ |An | bn (r) − bn+1 (r) + |Aq ||bq (r)| + |Ap−1 ||bp (r)|. n=p
n=p
The Cauchy criterion (Proposition 165), together with the Weierstrass M-test (Theorem 473), concludes the result.

The novelty in the above result is the fact that we are assuming that the series $\sum_{n=0}^{\infty} a_n$ is convergent, not necessarily absolutely convergent. If it were absolutely convergent, the result would be rather trivial (by using directly the Weierstrass M-test).

Corollary 519 Let $\{a_n\}_{n=0}^{\infty}$ be a sequence of real numbers so that $\sum_{n=0}^{\infty} a_n$ converges. Then
$$\lim_{r \to 1^-} \sum_{n=0}^{\infty} a_n r^n = \sum_{n=0}^{\infty} a_n.$$
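The limit in Corollary 519 can be observed numerically. The following is a minimal sketch, not part of the original text: it assumes the conditionally convergent sample series $a_n = (-1)^n/(n+1)$, whose sum is $\ln 2$ and for which $\sum a_n r^n = \ln(1+r)/r$ when $0 < r < 1$. The helper names `abel_sum` and `alt_harmonic` are hypothetical.

```python
import math

def abel_sum(coeff, r, terms=50_000):
    """Partial sum of sum_{n>=0} coeff(n) * r**n using `terms` terms."""
    total = 0.0
    power = 1.0  # holds r**n, updated incrementally
    for n in range(terms):
        total += coeff(n) * power
        power *= r
    return total

def alt_harmonic(n):
    # a_n = (-1)^n / (n + 1); sum a_n converges (not absolutely) to ln 2
    return (-1) ** n / (n + 1)

for r in (0.9, 0.99, 0.999):
    # closed form of the power series for comparison: ln(1 + r) / r
    print(r, abel_sum(alt_harmonic, r), math.log(1 + r) / r)
print("sum of a_n:", math.log(2))
```

As $r$ approaches 1 from the left, the printed values approach $\ln 2$, in agreement with the corollary. The truncation at 50,000 terms is harmless here only because $r^n$ is negligible for the sampled values of $r$.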
5 Function Convergence
Fig. 5.21 (a) The first five Taylor polynomials of $(1 - x)^{-1}$ at $x = 0$. (b) The first four Taylor polynomials of $(1 + x^2)^{-1}$ at $x = 0$ (Example 5.2.2.1)

5.2.2 The Taylor Series
Basics on the Taylor Series

Now we can provide some answers to the set (C) of questions formulated in Sect. 5.1.4. Let us start with a definition.

Definition 520 Let $f$ be a real-valued function defined on $(x_0 - \delta, x_0 + \delta)$ for some $x_0 \in \mathbb{R}$ and some $\delta > 0$. Assume that $f$ has all derivatives on $(x_0 - \delta, x_0 + \delta)$. The formal series
$$\sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!} (x - x_0)^n \tag{5.61}$$
is called the Taylor series of $f$ at $x_0$.

Examples 521
1. An example of an infinitely differentiable real-valued function $f$ defined on a nonempty open domain $U$ ($\subset \mathbb{R}$), whose Taylor series at 0 has an interval of convergence smaller than $U$.
(i) Let $f(x) := (1 - x)^{-1}$, for $x \in U := (-\infty, 1)$. The function $f$ is infinitely differentiable on $U$. Its Taylor series at 0 is $\sum_{n=0}^{\infty} x^n$, due to the fact that for $|x| < 1$, the series (a geometric progression) sums to $(1 - x)^{-1}$. However, this series has radius of convergence 1, so it does not converge for $|x| > 1$ (in fact, it does not converge for $|x| \ge 1$, see Remark 512.1).
(ii) A variant of this example is provided by the function $g(x) := (1 + x^2)^{-1}$, defined and infinitely differentiable on $U := \mathbb{R}$. Again, the Taylor series $\sum_{n=0}^{\infty} (-1)^n x^{2n}$ of $g$ at $x_0 = 0$ has radius of convergence 1; as in the preceding example, it converges precisely on $(-1, 1)$ (see Fig. 5.21).
2. An example of an infinitely differentiable real-valued function $f$ defined on $\mathbb{R}$, whose Taylor series at 0 converges on all of $\mathbb{R}$, although its sum differs from $f(x)$ at every point $x \in \mathbb{R}$, $x \ne 0$. The function (see Fig. 5.22)
$$f(x) := \begin{cases} \exp(-1/x^2) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases}$$
Fig. 5.22 The function f in Example 5.2.2.2
is infinitely differentiable on $\mathbb{R}$, is positive on $\mathbb{R} \setminus \{0\}$, and its Taylor polynomial at $x = 0$ is the zero polynomial. In order to show this, observe that if $P$ is any polynomial, then
$$\lim_{h \to 0} P(1/h)\, e^{-1/h^2} = \lim_{t \to \infty} P(t)\, e^{-t^2} = 0.$$
Now for each $n \in \mathbb{N}$, we have
$$f^{(n)}(0) = \lim_{h \to 0} P_n(1/h)\, e^{-1/h^2} = 0,$$
where $P_n$ is some polynomial, as can be easily proved by induction. Exercise 13.222 is a nonsymmetric variant of this example. There, an infinitely differentiable function that is 0 on $(-\infty, 0]$ and positive on $(0, +\infty)$ is presented.
3. An example of an infinitely differentiable real-valued function $f$ such that its Taylor series at $x = 0$ diverges at every point $x \ne 0$. The existence of such a function can be proved by using Theorem 528 below. It is enough to select a sequence $\{d_n\}_{n=0}^{\infty}$ in $\mathbb{R}$ such that $\limsup_{n \to \infty} (|d_n|/n!)^{1/n} = +\infty$ and an infinitely differentiable function $f$ defined on a neighborhood of 0 such that $f^{(n)}(0) = d_n$ for $n \in \mathbb{N} \cup \{0\}$. The Taylor series $\sum_{n=0}^{\infty} (d_n/n!)\, x^n$ of this function converges only at $x = 0$, according to Theorem 511 and, more particularly, formula (5.52). ♦

Proposition 522 Suppose that a real-valued function $f$ is defined and possesses all derivatives on a bounded interval $(a, b)$. Let $x_0$ be a point in $(a, b)$. Define
$$M_n := \sup_{x \in (a, b)} \frac{|f^{(n+1)}(x)|}{(n + 1)!} (b - a)^{n+1},$$
and suppose that $\lim_{n \to \infty} M_n = 0$. Then $\lim_{n \to \infty} P_{f, x_0, n} = f$, and the convergence is uniform on $(a, b)$.

Proof From Theorem 502 we get, following the notation in (5.33) and using the Lagrange form of the Taylor remainder (5.41), that for $x \in (a, b)$,
$$|f(x) - P_{f, x_0, n}(x)| = |R_{f, x_0, n}(x)| = \frac{|f^{(n+1)}(\xi)|}{(n + 1)!} |x - x_0|^{n+1} \le \frac{|f^{(n+1)}(\xi)|}{(n + 1)!} (b - a)^{n+1} \le M_n \quad \text{for } n \in \mathbb{N},$$
and the result follows from the Weierstrass M-test for sequences of functions (see the paragraph below Definition 460).

Real Analytic Functions
Definition 523 A real-valued function $f$ defined on an open interval $(a, b)$ is said to be real analytic at a point $x_0 \in (a, b)$ if there exists $\delta > 0$ such that $f$ is the sum of a power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ on $(x_0 - \delta, x_0 + \delta)$. We say that $f$ is real analytic on $(a, b)$ if it is real analytic at each point $x \in (a, b)$. In other words, we say that a function $f$ defined on an open interval $(a, b)$ is real analytic if it is, locally, the sum of a power series.

Remark 524 Two remarks are in order regarding a real analytic function $f$ defined on an interval $(a, b)$.
1. It follows from (iv) in Theorem 515 that $f$ is infinitely differentiable on $(a, b)$.
2. The power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ that coincides with $f$ on a neighborhood of $x_0$ according to Definition 523 is the Taylor series of $f$ at $x_0$. This follows again from (iv) in Theorem 515; indeed, $f^{(n)}(x_0) = n!\, a_n$ for all $n \in \mathbb{N} \cup \{0\}$. ®

Examples 525
1. Every power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ with a radius of convergence $R \in (0, +\infty]$ defines a real analytic function on $(x_0 - R, x_0 + R)$. This is a consequence of Remark 514. Note that this result improves (iv) in Theorem 515. The sum of a power series is not only an infinitely differentiable function, but in fact a real analytic function on its interval of convergence. That this is a genuine improvement follows from the observation that there are infinitely differentiable functions that are not real analytic on their domain. Example 5.2.2.2 above shows an infinitely differentiable function $f$ on $\mathbb{R}$ such that its Taylor series at 0 is convergent on $\mathbb{R}$ although its sum differs from $f(x)$ at each point $x \ne 0$ (certainly, it sums to $f(0)$ at 0). Were $f$ real analytic on an interval $(-\delta, \delta)$ for some $\delta > 0$, its Taylor series would converge to $f(x)$ there, and this is not the case. By using Theorem 528, other examples of this behavior can be constructed (see Example 5.2.2.3).
2. Example 5.2.2.1 shows a real analytic function on $(-\infty, 1)$ that is not a power series there. ♦
Proposition 526 Any real analytic function on an open interval (a,b) can have at most finitely many zeros in a closed and bounded interval J ⊂ (a, b). In particular, a real analytic function on (a, b) that vanishes on a nondegenerate subinterval of (a, b) is identically zero. Proof Let S := {x ∈ (a, b) : there exists an open neighborhood U (x) ⊂ (a, b) of x such that f vanishes on U (x)} . Obviously, S is an open subset of (a, b). Let us prove that S is also closed relatively to (a, b). For this, choose an infinite sequence {sn } in S such that sn → x ∈ (a, b). Due to the real analyticity of f at x, there exists an open interval V (x) centered at x such that f is a power series on V (x). Moreover, this power series vanishes on the
Fig. 5.23 The function in Example 527
sequence $\{s_n\}_{n \ge N}$, where $N \in \mathbb{N}$ is such that $s_n \in V(x)$ for all $n \ge N$. Corollary 516 (iii) shows that $f$ vanishes on $V(x)$, hence $x \in S$. Apply now Corollary 104 to conclude that $S = (a, b)$.

Example 527 The function (see Fig. 5.23)
$$f(x) = \begin{cases} x^2 \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{otherwise,} \end{cases}$$
is real analytic on no open interval $J$ containing zero, since there is an infinite number of zeros of $f$ in any such $J$ (see Proposition 526). On the other hand, this function is real analytic in every interval $L$ such that $0 \notin L$. Indeed, $\sin t = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{t^{2n-1}}{(2n-1)!}$ for every $t \in \mathbb{R}$, hence $\sin(1/x) = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{x^{-2n+1}}{(2n-1)!}$ for every $x \ne 0$. It follows that $f(x) = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{x^{-2n+3}}{(2n-1)!}$ for every $x \ne 0$, in particular for every $x \in L$. ♦

If $f$ is a real-valued function on $(a, b)$ that is real analytic at some $x_0 \in (a, b)$, then the sequence $\{f^{(n)}(x_0)\}_{n=0}^{\infty}$ cannot be entirely arbitrary. In fact, in order that a sequence $\{d_n\}_{n=0}^{\infty}$ in $\mathbb{R}$ be the sequence of the successive derivatives at $x_0$ of a real analytic function on some open interval $(a, b)$ containing $x_0$, it is necessary and sufficient that the sequence $\{|a_n|^{1/n}\}_{n=1}^{\infty}$ is bounded, where $a_n := (1/n!)\, d_n$ for all $n \in \mathbb{N}$. Indeed, assume first that the condition holds. Then $L := \limsup_{n \to \infty} |a_n|^{1/n} < +\infty$, hence the radius of convergence $R$ ($= 1/L$) of the series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ is not zero. The function $f(x) := \sum_{n=0}^{\infty} a_n (x - x_0)^n$ defined on $(x_0 - R, x_0 + R)$ is a real analytic function (see Example 5.2.2.1) whose successive derivatives at $x_0$ form the sequence $\{d_n\}_{n=0}^{\infty}$. Conversely, assume that $\{d_n\}_{n=0}^{\infty}$ is the sequence of the successive derivatives at $x_0$ of a function $f$ defined on $(a, b)$ and real analytic at $x_0$. Then the Taylor series $\sum_{n=0}^{\infty} \frac{d_n}{n!} (x - x_0)^n$ of $f$ at $x_0$ converges to $f(x)$ on a neighborhood of $x_0$. Put $a_n := d_n/n!$ for $n \in \mathbb{N} \cup \{0\}$. Then $L := \limsup_{n \to \infty} |a_n|^{1/n}$ satisfies $R := 1/L > 0$. This shows that the sequence $\{|a_n|^{1/n}\}_{n=0}^{\infty}$ is bounded.

This allows us to see the difference between the class of real analytic functions on an interval and the class of infinitely differentiable functions there. For, given an arbitrary sequence $\{d_n\}_{n=0}^{\infty}$ in $\mathbb{R}$, we can always find an infinitely differentiable function $f$ on a neighborhood of a chosen point $x_0$ in the interval such that the sequence $\{f^{(n)}(x_0)\}_{n=0}^{\infty}$
of successive derivatives of $f$ at $x_0$ is precisely $\{d_n\}_{n=0}^{\infty}$. Obviously, it is enough to prove the result for $x_0 = 0$. Theorem 528 below, due to É. Borel, shows the existence of an infinitely differentiable function on $\mathbb{R}$ having the right sequence of successive derivatives at 0.

Theorem 528 (Borel) Let $\{d_n\}_{n=0}^{\infty}$ be a sequence in $\mathbb{R}$. Then, there exists a real-valued infinitely differentiable function $f$ on $\mathbb{R}$ such that $f^{(n)}(0) = d_n$ for all $n \in \mathbb{N} \cup \{0\}$.

Proof [L. Gårding] Let $\phi : \mathbb{R} \to \mathbb{R}$ be an infinitely differentiable function such that
$$\phi(x) = \begin{cases} 1 & \text{if } |x| \le 1/2, \\ 0 & \text{if } |x| \ge 1. \end{cases}$$
The existence of such a function is guaranteed by combining Exercises 13.223 and 13.224. Define now
$$f(x) := \sum_{k=0}^{\infty} \frac{d_k}{k!}\, x^k \phi(c_k x), \quad x \in \mathbb{R}, \tag{5.62}$$
where $c_k := k + |d_k|$ for $k \in \mathbb{N}$. It follows that $f$ is infinitely differentiable at every point $x \in \mathbb{R}$ such that $x \ne 0$. Indeed, on an open nondegenerate interval $J$ such that $x \in J$ and $0 \notin J$, all summands in (5.62), but a finite number, vanish, due to the fact that $\phi$ has bounded support and that $c_k \to +\infty$ as $k \to \infty$. We shall prove now that $f$ is infinitely differentiable on a closed interval $[-\delta, \delta]$, for any $\delta > 0$. Given $n = 0, 1, 2, \ldots$, formally differentiate $n$ times the series (5.62) to obtain, using Leibniz's rule, the series
$$\sum_{k=0}^{\infty} \left\{ \frac{d_k}{k!} \sum_{j=0}^{n} \binom{n}{j} r_{k,j}\, x^{k-j} \phi^{(n-j)}(c_k x)\, c_k^{n-j} \right\}, \tag{5.63}$$
where
$$r_{k,j} := \begin{cases} \dfrac{k!}{(k-j)!} & \text{if } k \ge j, \\ 0 & \text{otherwise.} \end{cases}$$
Fix $n \in \mathbb{N} \cup \{0\}$. Given $k > n$, put
$$M_k := \frac{2^n}{(k - n)!} \sup\{|\phi^{(n-j)}(x)| : 0 \le j \le n,\ x \in \mathbb{R}\}.$$
Observe that the series $\sum_{k=n+1}^{\infty} M_k$ converges. Let us prove that for $k > n$, the absolute value of the $k$th summand in the series (5.63) (i.e., the term between curly brackets) is dominated by $M_k$. By using the Weierstrass M-test (see Theorem 473), this will show that the series (5.63) converges uniformly on $\mathbb{R}$.
Fig. 5.24 The graph of the exponential function on the interval [−2, 2]
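Indeed, if $|c_k x| > 1$, the $k$th summand in (5.63) vanishes. If, on the contrary, $|c_k x| \le 1$ and $j = 0, 1, 2, \ldots, n$, then
$$|d_k x^{k-j} c_k^{n-j}| = |d_k|\, |c_k x|^{k-j} c_k^{n-k} \le |d_k|\, c_k^{n-k} < c_k^{n-k+1} \le 1. \tag{5.64}$$
Therefore, for $x \in \mathbb{R}$ and $k > n$, we have
$$\left| \frac{d_k}{k!} \sum_{j=0}^{n} \binom{n}{j} r_{k,j}\, x^{k-j} \phi^{(n-j)}(c_k x)\, c_k^{n-j} \right| = \left| \sum_{j=0}^{n} \binom{n}{j} \frac{d_k}{k!} \frac{k!}{(k-j)!}\, x^{k-j} \phi^{(n-j)}(c_k x)\, c_k^{n-j} \right| \le \sum_{j=0}^{n} \binom{n}{j} \frac{|\phi^{(n-j)}(c_k x)|}{(k-j)!} \le \frac{M_k}{2^n} \sum_{j=0}^{n} \binom{n}{j} = M_k.$$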
The uniform convergence on R of the series (5.63) is guaranteed. For n ∈ N ∪ {0}, apply Theorem 480 to the series (5.62) on the interval [−δ, δ]. This shows that f is infinitely differentiable on [− δ, δ], and that the sum of (5.63) gives the nth derivative of f at x ∈ [−δ, δ]. It follows then that f (n) (0) = dn for n ∈ N ∪ {0}.
5.2.3 The Exponential and the Logarithmic Functions

The Exponential Function

The exponential function $\exp x$ is one of the most important functions. We shall define this function precisely and establish its most relevant features by using the properties of power series established in Sect. 5.2.1 (Fig. 5.24).
Proposition 529 The series
$$\sum_{n=0}^{\infty} \frac{x^n}{n!} \tag{5.65}$$
has radius of convergence $+\infty$, and so it converges for every $x \in \mathbb{R}$.

Proof It is enough to apply the Cauchy–Hadamard formula (5.52) together with Lemma 178.

Now the following definition makes sense.

Definition 530 The exponential function $\exp x$ (also denoted $e^x$) is the sum of the series
$$\sum_{n=0}^{\infty} \frac{x^n}{n!}, \quad \text{for } x \in \mathbb{R}. \tag{5.66}$$
As a consequence of Theorem 511, we obtain the following result, where we list the most important properties of the exponential function.

Proposition 531 The series (5.66) and the sum function $\exp x$ that it defines have the following properties:
(i) The series converges absolutely for every $x \in \mathbb{R}$, and it converges uniformly on each bounded interval of $\mathbb{R}$.
(ii) The function $\exp x$ is infinitely differentiable on $\mathbb{R}$, and it coincides with each of its derivatives.
(iii) The function $\exp x$ is real analytic on $\mathbb{R}$.
(iv) For every $x, y \in \mathbb{R}$, we have $\exp(x) \exp(y) = \exp(x + y)$.
(v) For every $x \in \mathbb{R}$, we have $\exp x > 0$.
(vi) The function $\exp x$ is strictly increasing, $\lim_{x \to -\infty} \exp x = 0$ and $\lim_{x \to +\infty} \exp x = +\infty$.
(vii) The number $e$ defined by formula (2.39) or, equivalently, by (2.44) (see Definition 217) coincides with $\exp 1$. Therefore, for every $x \in \mathbb{R}$, the value $\exp x$ coincides with the value $e^x$, where $e^x$ is defined as in Remark 48.

Proof (i) is a consequence of the fact that $R = +\infty$, and (i) and (iii) in Theorem 511.
(ii) is a consequence of (iv) in the same theorem. For example, by taking $k = 1$ there, we get
$$(\exp x)' = \sum_{n=1}^{\infty} n (1/n!)\, x^{n-1} = \sum_{n=1}^{\infty} (1/(n-1)!)\, x^{n-1} = \sum_{n=0}^{\infty} (1/n!)\, x^n = \exp x.$$
(iii) This follows from the fact that the sum of every power series is a real analytic function on its open interval of convergence (see Remark 5.2.2.1).
(iv) Observe that, for $n \in \mathbb{N} \cup \{0\}$, and for every $x, y \in \mathbb{R}$,
$$\sum_{k=0}^{n} \frac{x^k}{k!} \frac{y^{n-k}}{(n-k)!} = \frac{1}{n!} \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} = \frac{1}{n!} (x + y)^n.$$
This shows that the $n$th term of the Cauchy product series of $\sum_{n=0}^{\infty} x^n/n!$ and $\sum_{n=0}^{\infty} y^n/n!$ (see the definition preceding Theorem 215) is the $n$th term of the series $\exp(x + y)$. Since the series $\sum_{n=0}^{\infty} x^n/n!$ is certainly absolutely convergent, we may apply Theorem 215 to obtain the conclusion.
(v) From (iv), we have, for every $x \in \mathbb{R}$, that $\exp(x) \exp(-x) = \exp 0 = 1$. This shows, in particular, that $\exp x \ne 0$ and $\exp(-x) = (\exp x)^{-1}$.
(vi) Note that $\exp x = 1 + x + x^2/2! + x^3/3! + \ldots$; hence $\exp x \ge 1 + x \ge 1$ for every $x \ge 0$ (in particular, $\exp x \to +\infty$ whenever $x \to +\infty$). Since, by (iv), $\exp(-x) = (\exp x)^{-1}$, we obtain that $\exp x \to 0$ whenever $x \to -\infty$, and $\exp x > 0$ for every $x \in \mathbb{R}$. Moreover, we know from (ii) that $(\exp x)' = \exp x$ ($> 0$) for every $x \in \mathbb{R}$. This shows that $\exp x$ is a strictly increasing function on $\mathbb{R}$.
(vii) Note, first, that $\exp 1 = e$ due to Proposition 219. Using (iv), we get, by finite induction, that $\exp p = e^p$ for every $p \in \mathbb{N}$. Moreover, $\exp(p/q) \exp(p/q) \cdots \exp(p/q)$ ($q$ times) $= \exp p = e^p$ for every $q \in \mathbb{N}$, hence $\exp(p/q) = e^{p/q}$. By the fact that $\exp(-x) = (\exp x)^{-1}$, we obtain that, for every rational number $r := p/q$, $\exp r = e^r$. Recall now that we defined $e^x := \sup\{e^r : r \in \mathbb{Q},\ r < x\}$ (see Remark 48). Since $\exp x$ is a continuous increasing function (see (ii) and (vi)), we have $e^x := \sup\{e^r : r \in \mathbb{Q},\ r < x\} = \sup\{\exp r : r \in \mathbb{Q},\ r < x\} = \exp x$. This proves (vii).

Corollary 532 The exponential function and its multiples are the sole differentiable real-valued functions on $\mathbb{R}$ which have the property that the function and its derivative coincide.

Proof Assume that $f$ is a differentiable real-valued function defined on $\mathbb{R}$ such that $f' = f$. Consider the function $g(x) := f(x) \exp(-x)$. Note that $g'(x) = 0$ for all $x \in \mathbb{R}$. By Corollary 369, $g(x) = K$, where $K$ is a real constant. This shows, thus, that $f(x) = K \exp x$ for all $x \in \mathbb{R}$.

Remark 533 Proposition 531 does not provide a quantitative estimate of the speed of convergence of the series (5.66). Providing one is the purpose of the following calculations. Fix $x \in \mathbb{R}$ and find $p \in \mathbb{N}$ such that $|x| < p^{1/2}$. Let $N \ge 2p$. Fix $n, m \in \mathbb{N}$ such that ($2p \le$) $N \le n \le m$. We claim that for $k \ge n$ ($\ge 2p$), we have $k! \ge p^{k/2}$. This can be proved by induction: fix $p \in \mathbb{N}$. For $k = 2p$, we clearly have $1 \cdot 2 \cdot 3 \cdots p\,(p + 1) \cdots (2p) \ge (p + 1) \cdots (2p) \ge p^p = p^{k/2}$. Assume now that it has been already proved, for a given $k \ge 2p$, that $k! \ge p^{k/2}$. So we have
$$(k + 1)! = (k + 1)(k!) \ge (k + 1)\, p^{k/2} \ge p^{(k+1)/2},$$
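where the last inequality follows from the fact that $(k + 1)^2 \ge p$, and so $(k + 1)^2 p^k \ge p^{k+1}$. The claim is proved. Thus,
$$\left| \sum_{k=n}^{m} \frac{x^k}{k!} \right| \le \sum_{k=n}^{m} \frac{|x|^k}{k!} \le \sum_{k=n}^{m} \frac{|x|^k}{p^{k/2}} = \sum_{k=n}^{m} \left( \frac{|x|}{p^{1/2}} \right)^k \le \sum_{k=n}^{\infty} \left( \frac{|x|}{p^{1/2}} \right)^k$$
$$= \left( \frac{|x|}{p^{1/2}} \right)^n \sum_{k=n}^{\infty} \left( \frac{|x|}{p^{1/2}} \right)^{k-n} = \left( \frac{|x|}{p^{1/2}} \right)^n \frac{1}{1 - \frac{|x|}{p^{1/2}}} \le \left( \frac{|x|}{p^{1/2}} \right)^N \frac{1}{1 - \frac{|x|}{p^{1/2}}}, \tag{5.67}$$
where the inequality involving the infinite series is allowed since this series is convergent, due to the fact that $|x|/p^{1/2} < 1$. By choosing $N$ ($\ge 2p$) big enough, the last term in (5.67) can be made arbitrarily small. This proves that the series (5.65) is Cauchy, hence convergent. Even more, (5.67) gives an estimate for the approximation between $\exp x$ and $P_{\exp, n, 0}(x)$ for every $n \in \mathbb{N}$ and every $x \in \mathbb{R}$. Indeed, fix $K > 0$ and put $p := 4K^2$. Take $x \in \mathbb{R}$ such that $|x| \le K$ (i.e., $|x| \le \sqrt{p}/2$). Then
$$\frac{|x|}{\sqrt{p}} \le \frac{1}{2}, \quad \text{and so} \quad 1 - \frac{|x|}{\sqrt{p}} \ge \frac{1}{2}.$$
Carry this to (5.67) to get that, for $N \ge 2p$ ($= 8K^2$) and $m \ge n \ge N$, we have
$$\sum_{k=n}^{m} \frac{|x^k|}{k!} \le 2 \left( \frac{1}{2} \right)^N = \frac{1}{2^{N-1}}.$$
Letting $m \to \infty$, we get, for $n \ge N$,
$$\left| \exp x - P_{\exp, n-1, 0}(x) \right| = \left| \exp x - \sum_{k=0}^{n-1} \frac{x^k}{k!} \right| = \left| \sum_{k=n}^{\infty} \frac{x^k}{k!} \right| \le \sum_{k=n}^{\infty} \frac{|x^k|}{k!} \le \frac{1}{2^{N-1}},$$
$$\text{for all } n \ge N,\ N \ge 8K^2 \text{ and all } x \in [-K, K]. \tag{5.68}$$
®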
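The estimate (5.68) can be checked numerically for small values of $K$. The sketch below is illustrative only: the helper names `exp_partial_sum` and `check_bound` are hypothetical, the grid of sample points is an arbitrary choice, and $N$ is taken as the smallest integer with $N \ge 8K^2$. Double-precision arithmetic limits the experiment to small $K$, since for larger $K$ the bound $2^{-(N-1)}$ falls below rounding error.

```python
import math

def exp_partial_sum(x, n):
    """Return sum_{k=0}^{n-1} x**k / k!, built term by term."""
    total, term = 0.0, 1.0  # term holds x**k / k!
    for k in range(n):
        total += term
        term *= x / (k + 1)
    return total

def check_bound(K, samples=201):
    """Compare the worst observed error on a grid in [-K, K] with 2**-(N-1)."""
    N = max(1, math.ceil(8 * K * K))   # smallest integer N >= 8 K^2
    bound = 2.0 ** -(N - 1)
    worst = 0.0
    for i in range(samples):
        x = -K + 2 * K * i / (samples - 1)
        worst = max(worst, abs(math.exp(x) - exp_partial_sum(x, N)))
    return N, worst, bound

for K in (0.5, 1, 2):
    print(K, check_bound(K))
```

In each printed triple the observed error is far smaller than the bound, which is consistent with (5.68) being a safe but not sharp estimate.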
Remark 534 It is important to note that the exponential function "grows faster" than any polynomial in the following precise sense: $\lim_{x \to +\infty} |p(x)/e^x| = 0$ for any polynomial $p$ in the variable $x$. Indeed, if $p(x) := a_0 + a_1 x + \ldots + a_n x^n$, where $a_0, a_1, \ldots, a_n$ are real numbers and $a_n \ne 0$, then, for $x > 0$,
$$\left| \frac{p(x)}{e^x} \right| < (n + 1)!\, \frac{|p(x)|}{x^{n+1}} \le (n + 1)! \left( \frac{|a_0|}{x^{n+1}} + \frac{|a_1|}{x^n} + \ldots + \frac{|a_n|}{x} \right) \to 0 \tag{5.69}$$
Fig. 5.25 The functions exp x and ln x on the interval [−3, 3]
as $x \to +\infty$. Note that this result could also have been obtained from a repeated use of l'Hôpital's Rule (Theorem 376). For the case of a polynomial of degree 1, this was already mentioned in Remark 378. ®

The Logarithmic Function

Since $\exp x$ is a strictly increasing function that maps $\mathbb{R}$ onto $(0, +\infty)$, we can consider the inverse function $\exp^{-1} : (0, +\infty) \to \mathbb{R}$.

Definition 535 The inverse function for the exponential function $\exp x$ is called the natural logarithmic function, and is denoted by $\ln x$.

Figure 5.25 plots the graphs of the two functions $\exp x$ and $\ln x$. To plot $\exp x$ accurately on a bounded interval, we may use the estimate given in (5.68). Once the graph of $\exp x$ has been drawn, the graph of $\ln x$ (its inverse) can be deduced immediately by taking its mirror image with respect to the diagonal in the OXY plane (see Fig. 5.25). Indeed, this behavior characterizes the geometric relationship between the graphs of a function $f$ having an inverse, and of its inverse $f^{-1}$. For this, note that the symmetric point of $(x, f(x))$ with respect to the diagonal is $(f(x), x)$. If this is the point of the graph of a function $g$, i.e., if $g(f(x)) = x$, then $g = f^{-1}$.

Proposition 536 The logarithmic function $\ln x$ has the following properties:
(i) $$e^{\ln x} = x \text{ for } x > 0, \text{ and } \ln(e^x) = x \text{ for all } x \in \mathbb{R}. \tag{5.70}$$
(ii) It is a strictly increasing function that maps $(0, +\infty)$ onto $\mathbb{R}$.
(iii) Given $\alpha$ and $\beta$ in $(0, +\infty)$, we have
$$\ln(\alpha\beta) = \ln \alpha + \ln \beta. \tag{5.71}$$
(iv) It is differentiable at every $x > 0$, and we have
$$(\ln x)' = \frac{1}{x}, \quad \text{for all } x > 0. \tag{5.72}$$
In fact, $\ln x$ is infinitely differentiable at every $x > 0$, and we have
$$\ln^{(n)} x = (-1)^{n+1} (n - 1)!\, x^{-n}, \quad \text{for all } x > 0 \text{ and for all } n \in \mathbb{N}. \tag{5.73}$$
Proof (i) to (iii) are straightforward consequences of the fact that $\exp x$ and $\ln x$ are mutually inverse functions (for a "change-of-variable" proof of statement (iii), see Exercise 13.336).
(iv) follows from Proposition 531, which ensures the differentiability of $\exp x$ and that $(\exp x)' = \exp x \ne 0$, together with Theorem 393. More precisely, observe that (vi) in Proposition 531 implies that $\exp x$ maps $\mathbb{R}$ onto $(0, +\infty)$ in a strictly increasing way. In particular, given $y_0 > 0$, we can find $x_0 \in \mathbb{R}$ such that $e^{x_0} = y_0$. Fix $\delta > 0$ and consider the restriction $f$ of the exponential function to $[x_0 - \delta, x_0 + \delta]$. This is a continuous function that is differentiable on $(x_0 - \delta, x_0 + \delta)$. Since $f'(x) > 0$ for every $x \in (x_0 - \delta, x_0 + \delta)$, Theorem 393 guarantees the existence of the inverse mapping $f^{-1} : [f(x_0 - \delta), f(x_0 + \delta)] \to [x_0 - \delta, x_0 + \delta]$ and its differentiability at each $y \in (f(x_0 - \delta), f(x_0 + \delta))$, in particular at $y_0$. The same result shows that ($\ln'(y_0) =$) $(f^{-1})'(y_0) = 1/f'(x_0) = 1/e^{x_0} = 1/y_0$. The existence of every higher derivative then follows from the differentiability properties of the function $1/x$.

The next result expresses any power $\alpha^\beta$, for $\alpha > 0$ and $\beta \in \mathbb{R}$ (see Remark 48), in terms of the exponential function. Thus, if the theory of infinite series (in particular, the theory of power series) is taken as the starting point in the development of the Calculus, formula (5.75) can be taken as the definition of the power $\alpha^\beta$.

Corollary 537 Given $\alpha \in (0, +\infty)$ and $\beta \in \mathbb{R}$, we have
$$\ln(\alpha^\beta) = \beta \ln \alpha, \tag{5.74}$$
or, equivalently,
$$\alpha^\beta = \exp(\beta \ln \alpha). \tag{5.75}$$

Proof Let $p, q \in \mathbb{N}$. From (iii) in Proposition 536, we get $\ln(\alpha^p) = p \ln \alpha$. Therefore, $q \ln(\alpha^{1/q}) = \ln((\alpha^{1/q})^q) = \ln \alpha$, so $\ln(\alpha^{1/q}) = (1/q) \ln \alpha$. Using (5.71) again we get $\ln(\alpha^{p/q}) = (p/q) \ln \alpha$. By applying $\exp$ to both sides of the previous identity, we get $\alpha^Q = \exp(Q \ln \alpha)$ for each rational number $Q$. Recall now that $\alpha^\beta := \sup\{\alpha^Q : Q \in \mathbb{Q},\ Q \le \beta\}$, hence $\alpha^\beta = \sup\{\exp(Q \ln \alpha) : Q \in \mathbb{Q},\ Q \le \beta\}$. The fact that $\exp$ is an increasing and continuous function shows that $\sup\{\exp(Q \ln \alpha) : Q \in \mathbb{Q},\ Q \le \beta\} = \exp(\beta \ln \alpha)$, and the conclusion follows.

Due to the fact that the domain of $\ln x$ is $(0, +\infty)$, we attempt to obtain the Taylor series of this function at $x_0 = 1$, and to study its convergence properties. It is computationally simpler to treat the function
$$f(x) := \ln(1 + x), \quad \text{for } x \in (-1, +\infty). \tag{5.76}$$
We found the Taylor polynomials of $f$ and drew their graphs in Example 5.1.4.4 and Fig. 5.15, respectively. The Taylor series of $\ln(1 + x)$ at $x_0 = 0$ is then
$$\sum_{n=1}^{\infty} (-1)^{n+1} \frac{x^n}{n}. \tag{5.77}$$
1. The first thing to observe is that (5.77) is a power series whose radius of convergence is 1. This follows easily from the Cauchy–Hadamard formula (5.52). Observe, too, that the series (5.77) converges for $x = 1$ (Corollary 183), while it diverges at $x = -1$ (Proposition 161). The only chance to get $\ln(1 + x) = \sum_{n=1}^{\infty} (-1)^{n+1} x^n/n$ is thus to assume $x \in (-1, 1]$.
2. Take $x \in (-1, 1]$, fix $n \in \mathbb{N}$, and use the Cauchy form of the $n$th Taylor remainder $R_{f, 0, n}$ (see Eq. (5.40) in Remark 503). Precisely,
$$R_{f, 0, n}(x) = (-1)^n \frac{x (x - \xi)^n}{(1 + \xi)^{n+1}}, \tag{5.78}$$
where $\xi$ is a real number that depends on $x$ and $n$, and lies strictly between 0 and $x$. Put $\xi = \theta x$, where $\theta \in (0, 1)$ depends, accordingly, on $x$ and $n$. Then (5.78) appears as
$$R_{f, 0, n}(x) = (-1)^n x^{n+1} \left( \frac{1 - \theta}{1 + \theta x} \right)^n \frac{1}{1 + \theta x}. \tag{5.79}$$
(a) Fix $0 < \delta < 1$. Let $x \in [-\delta, \delta]$. Observe that
(i) $|x|^{n+1} \le \delta^{n+1}$,
(ii) $0 <$ […] 1, since the series there does not converge for those values. Of course, (5.81) gives an estimate, although a very poor one, as can be seen in Fig. 5.26. However, there is a way to estimate $\ln x$ for $x \in (1, +\infty)$. Replacing $x$ by $-x$ in (5.80), we get
$$\ln(1 - x) = -\sum_{n=1}^{\infty} \frac{x^n}{n}, \quad \text{for } -1 \le x < 1. \tag{5.82}$$
Fig. 5.26 Inequalities (5.81)
Fig. 5.27 The function (1 + x)/(1 − x) on (0, 1) (Remark 540)
Subtracting (5.82) from (5.80) for $x \in (-1, 1)$, we get
$$\ln \frac{1 + x}{1 - x} = 2 \left( \frac{x}{1} + \frac{x^3}{3} + \ldots + \frac{x^{2n+1}}{2n + 1} + \ldots \right). \tag{5.83}$$
The continuous function $(1 + x)/(1 - x)$ is strictly increasing on $(0, 1)$, and its range is $(1, +\infty)$ (see Fig. 5.27), so (5.83) allows for computing the logarithm of numbers in this last interval. As an example, observe that $x = 1/2$ in (5.83) gives $(1 + x)/(1 - x) = 3$, and so $s_3 := 2 \sum_{n=0}^{3} x^{2n+1}/(2n + 1)$ gives an approximation to $\ln 3$. We get $s_3 = 1.09806547619\ldots$. To estimate the error, observe that $2 \sum_{n=4}^{\infty} x^{2n+1}/(2n + 1) < 2 x^9 (1/9)(1 + x^2 + x^4 + \ldots) = 2 \cdot 2^{-9} (1/9)(4/3) < 5.788 \cdot 10^{-4}$. Indeed, $\ln 3 = 1.09861228867\ldots$.

The series in (5.83) converges very slowly for large values of $(1 + x)/(1 - x)$. A more efficient way is to use Newton's method (see Sect. 5.1.4 and, more particularly, Eq. (5.51)). Fix $x > 0$ and consider the function $f(y) = x - \exp y$. We search for a zero $y$ of this function, computing the limit of the sequence $\{y_n\}_{n=0}^{\infty}$ defined recursively as $y_{n+1} = y_n - f(y_n)/f'(y_n) = y_n + (x - \exp y_n)/\exp y_n$ for $n = 0, 1, 2, \ldots$. This requires an initial estimate $y_0$ that can be approximated by noticing that $\ln x = \log_{10} x \cdot \ln 10$, and taking 2.3 as a rough estimate of $\ln 10$. For example, in order to compute $\ln 100$, observe that $\log_{10} 100 = 2$, so a first estimate should be $y_0 = 4.6$. We readily get, by using 11 significant decimal digits, $y_1 = 4.60518357446$, $y_2 = 4.60517018608$, $y_3 = 4.60517018599$, $y_4 = 4.60517018599$, and the last value is $\ln 100$ to 11 significant decimal digits.
®
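The two procedures described in this remark, the series (5.83) and Newton's method applied to $f(y) = x - \exp y$, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `ln_via_series` and `ln_newton` are hypothetical, and the library function `math.exp` is used for the exponential instead of the series (5.66).

```python
import math

def ln_via_series(y, terms):
    """Approximate ln y for y > 1 by (5.83), with x = (y - 1)/(y + 1)."""
    x = (y - 1) / (y + 1)   # solves (1 + x)/(1 - x) = y
    return 2 * sum(x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

def ln_newton(x, y0, steps=4):
    """Newton iteration y_{n+1} = y_n + (x - exp(y_n)) / exp(y_n)."""
    y = y0
    for _ in range(steps):
        ey = math.exp(y)
        y += (x - ey) / ey
    return y

s3 = ln_via_series(3, terms=4)            # n = 0, 1, 2, 3 with x = 1/2
print(s3, math.log(3) - s3)               # approximation and its error
print(ln_newton(100, y0=4.6), math.log(100))
```

With four terms the series reproduces the value $s_3$ quoted above, and four Newton steps from $y_0 = 4.6$ agree with the library value of $\ln 100$ to the precision of double arithmetic.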
Fig. 5.28 Using a logarithmic table to find the product

number                       logarithm
$\alpha$                     $l_\alpha$
$\beta$                      $l_\beta$
$\alpha\beta$ (locate the sum)   $l_\alpha + l_\beta$ (sum)
Remark 541 The historical success of the use of logarithms in computations comes essentially from property (iii) in Proposition 536 (formula (5.71)). Indeed, once a logarithmic table (i.e., a two-column list consisting of numbers in one of the columns and their logarithms in the other) is available, then it is possible to perform the product of two numbers easily. The process is described in Fig. 5.28. A mechanical device, called a slide rule, was widely used before the advent of the electronic calculator. It has numbers spaced according to a logarithmic pattern along two adjacent (and sliding) scales, so it performs the product of α and β just by adding lengths. Figure 5.29 shows the way to compute 2 times 3. ®
Applications
1. Given a positive number $\alpha$, and an arbitrary real number $\beta$, the power $\alpha^\beta$ was defined in Remark 48. The definition of the exponential function allows for a different approach to the meaning of expressions like $\alpha^\beta$ above. Indeed, for $\alpha > 0$ and an arbitrary $\beta \in \mathbb{R}$, the power $\alpha^\beta$ is defined to be $e^{\beta \ln \alpha}$. Due to the fact that $\ln x$ is strictly increasing, it is simple to prove that this definition agrees with the one introduced in Remark 48.
2. For an example of the use of the exponential and logarithmic functions in computations of limits, see Exercise 13.211.
3. Property (ii) in Proposition 531, precisely the fact that $(\exp x)' = \exp x$, makes the exponential function a cornerstone in continuous mathematical modeling. We give a typical elementary application to population analysis. As an example from the theory of population growth, suppose that a population $P = P(t)$
Fig. 5.29 Computing 2 times 3 on a slide rule
grows at a rate such that, at every instant of time $t$, we have
$$\frac{dP}{dt} = kP(t), \quad \text{for all } t \ge 0, \tag{5.84}$$
where $k$ is the growth rate constant (this is a plausible assumption: the instantaneous rate of reproduction depends on the instantaneous size of the population). Let $P_0$ be the initial population at time $t = 0$, i.e., $P_0 = P(0)$. According to Proposition 531, a solution of Eq. (5.84) (a typical example of what is called a differential equation) is given by $P(t) = P_0 \exp(kt)$ (and so the population has exponential growth). Let us compute the time $T$ for doubling the population starting at $t = 0$. We need to solve $2P_0 = P(T) = P_0 \exp(kT)$. This gives $T = \frac{\ln 2}{k}$. For example, if $k = 0.02$, then $T = 34.657$, i.e., 34 years and 8 months, approximately. Observe that the time needed for doubling the population is independent of the given starting moment.
4. For an alternative approach to $\ln x$, we refer to Exercise 13.336.
5. The logarithmic function $\ln x$ is defined as the inverse mapping of the exponential function (see Definition 535). The choice of $e$ as the base for the exponential function is somewhat arbitrary, although the analytical properties of the function $e^x$ have a simpler symbolic expression than those of the function $a^x$ ($a$ a positive real number). If $a > 0$ is chosen as the base for the exponential function $a^x$, the corresponding inverse function is called the base-$a$ logarithm, denoted as $\log_a x$. Precisely, given $y > 0$, the number $x := \log_a y$ is the only real number having the property $a^x = y$. A special role is played by the base 10, for obvious reasons. The $\log_{10} x$ function is called the decimal logarithm, while the base-$e$ logarithm (that we denoted as $\ln x$) is usually called the natural logarithm.
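Returning to the population model of item 3, the doubling time $T = (\ln 2)/k$ and its independence of the starting moment can be checked with a short computation. This is a minimal sketch; the function names `population` and `doubling_time`, and the sample values $P_0 = 1000$ and $t_0 = 10$, are assumptions made for illustration.

```python
import math

def population(t, p0, k):
    """Solution P(t) = P0 * exp(k t) of dP/dt = k P with P(0) = P0."""
    return p0 * math.exp(k * t)

def doubling_time(k):
    """Time T with P(t + T) = 2 P(t), namely T = ln 2 / k."""
    return math.log(2) / k

T = doubling_time(0.02)
print(round(T, 3))  # 34.657

# The ratio P(t0 + T) / P(t0) equals 2 whatever the starting moment t0 is.
t0 = 10.0
print(population(t0 + T, 1000.0, 0.02) / population(t0, 1000.0, 0.02))
```

The fractional part 0.657 of a year corresponds to roughly 8 months, as stated in the text.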
5.2.4 The Hyperbolic Functions

Several useful functions are defined by means of the exponential function $e^x$. They are modeled after the complex trigonometric functions. Let us collect some of them, all defined on $\mathbb{R}$. See Fig. 5.30 for fragments of their graphs.
1. The hyperbolic sine is the function
$$\sinh x := \frac{e^x - e^{-x}}{2}. \tag{5.85}$$
2. The hyperbolic cosine is the function
$$\cosh x := \frac{e^x + e^{-x}}{2}. \tag{5.86}$$
Fig. 5.30 The hyperbolic functions sinh x, cosh x, and tanh x, on [−5, 5]
3. The hyperbolic tangent is the function
$$\tanh x := \frac{\sinh x}{\cosh x}, \tag{5.87}$$
i.e., $\tanh x = (e^x - e^{-x})/(e^x + e^{-x})$. In Exercise 13.337, we consider the inverse functions $\operatorname{arsinh} x$, $\operatorname{arcosh} x$, and $\operatorname{artanh} x$ of the hyperbolic functions.
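The definitions (5.85) to (5.87) translate directly into code. The sketch below is illustrative: the names `sinh_def`, `cosh_def`, and `tanh_def` are hypothetical, and the comparison against Python's `math.sinh`, `math.cosh`, and `math.tanh` is only a sanity check. The identity $\cosh^2 x - \sinh^2 x = 1$ used in the last line is a standard fact that follows from the definitions, although it is not stated in this section.

```python
import math

def sinh_def(x):
    """Hyperbolic sine from (5.85)."""
    return (math.exp(x) - math.exp(-x)) / 2

def cosh_def(x):
    """Hyperbolic cosine from (5.86)."""
    return (math.exp(x) + math.exp(-x)) / 2

def tanh_def(x):
    """Hyperbolic tangent from (5.87)."""
    return sinh_def(x) / cosh_def(x)

for x in (-2.0, 0.5, 3.0):
    print(x, sinh_def(x), cosh_def(x), tanh_def(x),
          cosh_def(x) ** 2 - sinh_def(x) ** 2)  # last value should be close to 1
```

For $x$ very close to 0 the subtraction in `sinh_def` loses relative accuracy in floating point, which is one reason numerical libraries do not evaluate $\sinh$ from the bare definition.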
5.2.5 The Trigonometric Functions

The power series definition of the trigonometric functions, sine and cosine, is given below. Note, first, that the two series (5.88) and (5.89) have radius of convergence $+\infty$, as follows by using the Cauchy–Hadamard formula (5.52) and Lemma 178. This shows that the two series define functions on $\mathbb{R}$.

Definition 542 The function sine, denoted as $\sin x$, is the sum of the power series
$$\sin x := \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n + 1)!}, \quad x \in \mathbb{R}. \tag{5.88}$$
The function cosine, denoted as $\cos x$, is the sum of the power series
$$\cos x := \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}, \quad x \in \mathbb{R}. \tag{5.89}$$
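Partial sums of the series (5.88) and (5.89) can be evaluated term by term, each term being obtained from the previous one by a simple ratio. The following sketch is illustrative: the names `sin_series` and `cos_series` and the default of 30 terms are assumptions, and the comparison with `math.sin` and `math.cos` serves only as a check for moderate values of $x$.

```python
import math

def sin_series(x, terms=30):
    """Partial sum of (5.88): sum of (-1)^n x^(2n+1) / (2n+1)!."""
    total, term = 0.0, x            # term for n = 0
    for n in range(terms):
        total += term
        # ratio of consecutive terms: -x^2 / ((2n+2)(2n+3))
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def cos_series(x, terms=30):
    """Partial sum of (5.89): sum of (-1)^n x^(2n) / (2n)!."""
    total, term = 0.0, 1.0          # term for n = 0
    for n in range(terms):
        total += term
        # ratio of consecutive terms: -x^2 / ((2n+1)(2n+2))
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total

for x in (-2.0, 0.3, 1.0, 3.0):
    print(x, sin_series(x) - math.sin(x), cos_series(x) - math.cos(x))
```

For large $|x|$ the alternating terms first grow before they decay, so cancellation degrades the accuracy of such a direct summation in floating point; the sample points above are kept small for that reason.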
Proposition 543 The functions $\sin x$ and $\cos x$ have the following properties (see Fig. 5.31):
(i) Sine is an infinitely differentiable function, and $(\sin x)' = \cos x$ for all $x \in \mathbb{R}$. Moreover, $\sin 0 = 0$, and $\sin x$ is an odd function, in the sense that $\sin(-x) = -\sin x$ for all $x \in \mathbb{R}$.
Fig. 5.31 The trigonometric functions sin x, cos x, and tan x (its OX and OY scale are different)
(ii) Cosine is an infinitely differentiable function, and cos x = − sin x for all x ∈ R. Moreover, cos 0 = 1, and cos x is an even function, in the sense that cos (−x) = cos x for all x ∈ R. (iii) sin2 x + cos2 x = 1 for all x ∈ R (hence | sin x| ≤ 1 and | cos x| ≤ 1 for all x ∈ R). (iv) There exists a (unique) positive number α with the following properties: (a) cos α = 0 and (b) cos x > 0 for all x ∈ [0, α). The number 2α is denoted by π . We have sin (π/2) = 1. (v) We have sin (x + y) = sin x cos y + cos x sin y for all x, y ∈ R. (vi) The functions sin x and cos x are 2π-periodic. Proof (i) That sin x is infinitely differentiable follows from Theorem 511. Using (iv) in that theorem, it is easy to see that sin x = cos x for all x ∈ R. Clearly, sin 0 = 0, and sin x is odd. The proof of (ii) is similar. (iii) Compute the derivative of the function f (x) := sin2 x + cos2 x to get f (x) = 2 sin x cos x + 2 cos (x)(− sin (x)) = 0 for all x ∈ R. This shows that f is constant on R. Since f (0) = 1, we get the result. The second part follows from the first, since then sin2 x ≤ 1 and cos2 x ≤ 1 for all x ∈ R. (iv) Assume that cos x > 0 for all x > 0, i.e., sin (x) > 0 for all x > 0. This shows that sin x is a strictly increasing function on [0, +∞) (thus, by (iii), cos x is a strictly decreasing function there). Since sin x ≤ 1 for all x ∈ R, there exists S := limx→∞ sin x, and 0 < S ≤ 1, and there exists C := limx→∞ cos x. In particular, cos n → C, and the Mean Value Theorem 365 shows that there exists a sequence {xn }∞ n=1 in (0, +∞) such that xn → +∞ and sin xn → 0. This is a contradiction. This shows that the set {x > 0 : cos x = 0} is not empty. Let α := inf{x > 0 : cos x = 0}. We have 0 < α, since cos 0 = 1. Since cos x is continuous, cos α = 0 and sin α = 1 follow from this. The uniqueness of such a number α is a consequence of the definition. 
Indeed, if some other β > 0 shares the same properties, then either α < β, so cos α > 0, a contradiction, or β < α and so cos β > 0, again a contradiction.
(v) Fix x, y ∈ R. Let us compute sin (x) cos (y) by using Theorem 215. The Cauchy product of the two series is sin (x) cos (y) = ∑_{n=0}^∞ c_n. For n ∈ N ∪ {0}, n odd,
5 Function Convergence
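we have
\begin{align}
c_n &= \sum_{k=0}^{(n-1)/2} (-1)^k \frac{x^{2k+1}}{(2k+1)!}\,(-1)^{\frac{n-(2k+1)}{2}} \frac{y^{\,n-(2k+1)}}{(n-(2k+1))!} \notag \\
&= (-1)^{(n-1)/2} \sum_{k=0}^{(n-1)/2} \frac{x^{2k+1}\, y^{\,n-(2k+1)}}{(2k+1)!\,(n-(2k+1))!}\,\frac{n!}{n!} \tag{5.90} \\
&= (-1)^{(n-1)/2} \sum_{k=0}^{(n-1)/2} \binom{n}{2k+1} \frac{x^{2k+1}\, y^{\,n-(2k+1)}}{n!}. \tag{5.91}
\end{align}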
It is clear that c_n = 0 for n ∈ N ∪ {0}, n even. In order to compute cos (x) sin (y), we shall use (5.91) interchanging the roles of x and y. So, cos (x) sin (y) = ∑_{n=0}^∞ d_n, where, for n ∈ N ∪ {0}, n odd, we have
\begin{align}
d_n &= \sum_{k=0}^{(n-1)/2} (-1)^k \frac{y^{2k+1}}{(2k+1)!}\,(-1)^{\frac{n-(2k+1)}{2}} \frac{x^{\,n-(2k+1)}}{(n-(2k+1))!} \notag \\
&= (-1)^{(n-1)/2} \sum_{k=0}^{(n-1)/2} \binom{n}{2k+1} \frac{y^{2k+1}\, x^{\,n-(2k+1)}}{n!}. \tag{5.92}
\end{align}
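Again it is clear that d_n = 0 for n ∈ N ∪ {0}, n even. Adding (5.91) and (5.92) and reindexing (the odd powers of x come from c_n, the even powers from d_n), we get, for n ∈ N ∪ {0}, n odd,
\[
c_n + d_n = (-1)^{(n-1)/2} \sum_{k=0}^{n} \binom{n}{k} \frac{x^{k} y^{\,n-k}}{n!} = (-1)^{(n-1)/2}\, \frac{(x+y)^n}{n!}, \tag{5.93}
\]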
while, for n even, c_n + d_n = 0. Finally, adding expressions (5.93) for n ∈ N ∪ {0}, we get sin x cos y + cos x sin y = sin (x + y).
(vi) This is a consequence of (iv) and (v).
Proposition 544 Real-valued functions defined on R and having properties (i), (ii), and (iii) in Proposition 543 are unique.
Proof Let s : R → R and c : R → R be functions that satisfy properties (i), (ii), and (iii) for sine and cosine in Proposition 543, respectively. Observe that s′ = c, s″ = −s, s‴ = −c, and s^{(iv)} = s (the formulas for successive derivatives follow). Fix δ > 0 and consider the interval J := (−δ, δ). Theorem 502 applied to the function s, n ∈ N ∪ {0}, the interval J, and to x_0 := 0 shows that
\[
s(x) = s(0) + \frac{s'(0)}{1!}\,x + \frac{s''(0)}{2!}\,x^2 + \dots + \frac{s^{(n)}(0)}{n!}\,x^n + R_{s,0,n}(x), \quad \text{for } x \in J,
\]
where R_{s,0,n}(x) = s^{(n+1)}(ξ) x^{n+1}/(n + 1)!, the number ξ lies strictly between 0 and x if x ≠ 0, and ξ = 0 if x = 0. Due to (iii), we have |s(x)| ≤ 1 and |c(x)| ≤ 1 for all x ∈ J, so every derivative of s is bounded in absolute value by 1 on J; hence
sup{|R_{s,0,n}(x)| : x ∈ J} → 0 as n → ∞. This shows that the sequence {P_n}_{n=0}^∞ of Taylor polynomials
\[
P_n(x) := s(0) + \frac{s'(0)}{1!}\,x + \frac{s''(0)}{2!}\,x^2 + \dots + \frac{s^{(n)}(0)}{n!}\,x^n
\]
for s converges uniformly to s on J. It follows from (i), (ii), and (iii) that s^{(2n)}(0) = 0 and s^{(2n+1)}(0) = (−1)^n, for n ∈ N ∪ {0}, so we get, according to (5.88), that s(x) = sin x on J. Since δ > 0 was chosen to be arbitrary, we get the conclusion for s. A similar argument proves that c(x) = cos x.
A number of properties of the trigonometric functions sin x, cos x, and related, follow from the statement of Proposition 543. For example, for every x, y ∈ R,
\begin{align*}
&\sin (x + \pi/2) = \cos x, \qquad \cos (x + \pi/2) = -\sin x, \\
&\sin (x \pm y) = \sin x \cos y \pm \cos x \sin y, \qquad \cos (x \pm y) = \cos x \cos y \mp \sin x \sin y, \\
&\sin (2x) = 2 \sin x \cos x, \qquad \cos (2x) = \cos^2 x - \sin^2 x, \\
&\sin^2 x = \frac{1 - \cos (2x)}{2}, \qquad \cos^2 x = \frac{1 + \cos (2x)}{2}, \\
&\sin x \sin y = \frac{\cos (x-y) - \cos (x+y)}{2}, \qquad \cos x \cos y = \frac{\cos (x-y) + \cos (x+y)}{2}, \\
&\sin x \cos y = \frac{\sin (x+y) + \sin (x-y)}{2}, \qquad \cos x \sin y = \frac{\sin (x+y) - \sin (x-y)}{2}.
\end{align*}
The functions sin x and cos x are bounded on R. It is natural to ask whether the Taylor series of those two functions converge uniformly. The answer is that they do (to the corresponding function) on every bounded interval, but not on R. The reason for the former is clear from the Lagrange form of the remainder (see Eq. (5.41)). Indeed, given M > 0, take x ∈ [−M, M]. Observe that, for n ∈ N and some ξ ∈ (−M, M) (depending on x and n),
\[
|R_{\sin,0,n}(x)| = \left| \sin^{(n+1)}(\xi)\, \frac{x^{n+1}}{(n+1)!} \right| \le \frac{M^{n+1}}{(n+1)!}. \tag{5.94}
\]
The series ∑_{n=0}^∞ M^n/n! converges (indeed, it sums e^M), hence given ε > 0, there exists N ∈ N such that M^n/n! < ε for n ≥ N. It follows then from (5.94) that sup_{x∈[−M,M]} |R_{sin,0,n}(x)| ≤ ε for n ≥ N, and so the convergence of the Taylor series of sin x to the function sin x is uniform on [−M, M]. That the convergence is not uniform on R can be seen in the following way: assume for a moment the contrary. Observe that sin (π/2 + 2nπ) = 1 and that sin (3π/2 + 2nπ) = −1 for all n ∈ Z.
Assume that for some k ∈ N, the kth Taylor polynomial P_{sin,0,k} of sin x satisfies | sin x − P_{sin,0,k}(x)| < 1/2 for all x ∈ R. Then P_{sin,0,k}, being continuous, must have an infinite number of zeros (due to the Intermediate Value Theorem 339), something impossible for a polynomial because of the Fundamental Theorem of Algebra. The same arguments apply to the Taylor polynomial of the function cos x.
Since cos x ≠ 0 for x ∈ R \ {(2n − 1)π/2 : n ∈ Z}, we can introduce the following definition.
Definition 545 The function tangent, denoted as tan x, is defined (see Fig. 5.31) by
\[
\tan x := \frac{\sin x}{\cos x}, \quad \text{for } x \in \mathbb{R} \setminus \{(2n-1)\pi/2 : n \in \mathbb{Z}\}. \tag{5.95}
\]
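As a numerical illustration of the remainder estimate (5.94), the following Python sketch compares the observed maximum error of the Taylor polynomials of sin x on [−M, M] with the bound M^{n+1}/(n + 1)!. The helper names (sin_taylor, sup_error, lagrange_bound), the grid size, and the choice M = 3 are assumptions made for this illustration only; a finite grid merely approximates the supremum.

```python
import math

def sin_taylor(x, n):
    """Taylor polynomial of sin at 0, keeping all terms of degree <= n."""
    total = 0.0
    k = 0
    while 2 * k + 1 <= n:
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
        k += 1
    return total

def sup_error(M, n, samples=2001):
    """Approximate sup of |sin x - P_n(x)| over [-M, M] on a uniform grid."""
    grid = [-M + 2 * M * i / (samples - 1) for i in range(samples)]
    return max(abs(math.sin(x) - sin_taylor(x, n)) for x in grid)

def lagrange_bound(M, n):
    """The bound M^(n+1)/(n+1)! appearing in (5.94)."""
    return M ** (n + 1) / math.factorial(n + 1)

M = 3.0  # illustrative choice of the interval [-M, M]
for n in (5, 9, 13, 17):
    # Observed error on the grid next to the theoretical bound.
    print(n, sup_error(M, n), lagrange_bound(M, n))
```

The same polynomial that is accurate on [−3, 3] is useless far from the origin: evaluating sin_taylor(40.0, 17) gives a value far outside [−1, 1], in line with the lack of uniform convergence on R.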
The Inverse Trigonometric Functions
Since the functions sin x, cos x, and tan x are 2π-periodic, they are certainly not one-to-one and, consequently, do not have inverse functions (Fig. 5.32). Restricting their domain allows for such an inversion. The choice of the restricted domain is somewhat arbitrary. Below we present the customary one.
Fig. 5.32 The inverse trigonometric functions on their domains (arctan x on [−10, 10])
Definition 546
(i) The inverse function of the (one-to-one and onto) function sin : [−π/2, π/2] → [−1, 1] is called the arcsine function, denoted as arcsin x, and maps, accordingly, [−1, 1] onto [−π/2, π/2].
(ii) The inverse function of the (one-to-one and onto) function cos : [0, π] → [−1, 1] is called the arccosine function, denoted as arccos x, and maps, accordingly, [−1, 1] onto [0, π].
(iii) The inverse function of the (one-to-one and onto) function tan : (−π/2, π/2) → R is called the arctangent function, denoted as arctan (x), and maps, accordingly, R onto (−π/2, π/2).
Certainly, these functions are one-to-one and onto as defined. An application of Theorem 393 shows that all three functions above are infinitely differentiable in the interior of their corresponding domains. The Chain Rule ((iv) in Proposition 374) shows that
\[
\arcsin' x = \frac{1}{\sqrt{1-x^2}}, \qquad \arccos' x = \frac{-1}{\sqrt{1-x^2}}, \qquad \arctan' (x) = \frac{1}{1+x^2}
\]
(the first two for |x| < 1), and that, accordingly, the Taylor series (and their interval of convergence) of those functions are
\begin{align*}
\arcsin x = \frac{\pi}{2} - \arccos x &= \sum_{n=0}^{\infty} \frac{(2n)!}{2^{2n}(n!)^2(2n+1)}\, x^{2n+1} \\
&= x + \frac{1}{2}\,\frac{x^3}{3} + \frac{1\cdot 3}{2\cdot 4}\,\frac{x^5}{5} + \frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}\,\frac{x^7}{7} + \dots, \quad \text{for } |x| \le 1, \\
\arctan (x) &= \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}\, x^{2n+1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \dots, \quad \text{for } |x| \le 1.
\end{align*}
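The two series above can be checked numerically. The sketch below, in Python, sums finitely many terms and compares the result with the library values; the function names arcsin_series and arctan_series and the numbers of terms are choices made for this illustration, not part of the text.

```python
import math

def arcsin_series(x, terms):
    """Partial sum of sum_{n>=0} (2n)! / (2^(2n) (n!)^2 (2n+1)) * x^(2n+1)."""
    total = 0.0
    coeff = 1.0  # equals (2n)! / (2^(2n) (n!)^2) for the current n, starting at n = 0
    for n in range(terms):
        total += coeff * x ** (2 * n + 1) / (2 * n + 1)
        coeff *= (2 * n + 1) / (2 * n + 2)  # ratio between consecutive coefficients
    return total

def arctan_series(x, terms):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / (2n+1)."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

# Inside the interval of convergence the agreement is fast ...
print(arcsin_series(0.5, 30), math.pi / 6)
print(arctan_series(0.5, 40), math.atan(0.5))
# ... while at the endpoint x = 1 the arctangent series converges only slowly.
print(4 * arctan_series(1.0, 1000), math.pi)
```

At x = 1 the alternating-series estimate bounds the error of the arctangent partial sum by the first omitted term, 1/(2·terms + 1), which explains the slow convergence observed in the last line.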
Fig. 5.33 Graphs of (1 + x)α for several α’s
5.2.6 The Binomial Series
The following formula, whose right-hand side is referred to as the finite binomial expansion of (1 + x)^n, is obtained easily by induction (see (iii) in Exercise 13.10):
\[
(1+x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k, \quad \text{for all } n \in \mathbb{N} \text{ and } x \in \mathbb{R}, \tag{5.96}
\]
where the coefficients $\binom{n}{k}$, called the binomial coefficients, or the combinatorial numbers, were given in (2.41). Precisely,
\[
\binom{n}{0} = 1, \qquad \binom{n}{k} := \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k!}, \quad \text{for } k = 1, 2, \dots, n. \tag{5.97}
\]
That it is possible to express f(x) := (1 + x)^α for any α (not necessarily a positive integer) in a related form (now as a series) was a result of Newton, known as the binomial theorem (Theorem 547 below). For a picture of the graph of the function (1 + x)^α for several values of α, see Fig. 5.33.
Theorem 547 (The binomial theorem) (i) For α ∈ R, we have
\[
(1+x)^{\alpha} = \sum_{k=0}^{\infty} \binom{\alpha}{k} x^k, \quad \text{for } |x| < 1, \tag{5.98}
\]
where
\[
\binom{\alpha}{0} := 1, \quad \text{and} \quad \binom{\alpha}{k} = \frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-k+1)}{k!}, \quad \text{for all } k \in \mathbb{N}. \tag{5.99}
\]
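The series in (5.98) is called the binomial series.
(ii) For α > 0, the series (5.98) converges uniformly and absolutely in [−1, 1] to (1 + x)^α.
Proof (we follow in part [HewStr65, Theorem 7.25]) The power series in (5.98) has radius of convergence 1 (if α ∈ N ∪ {0}, the series is a finite sum and there is nothing to prove). This follows from the Cauchy–Hadamard formula (5.52) by using Lemma 178. Indeed,
\[
\left| \binom{\alpha}{k+1} \Big/ \binom{\alpha}{k} \right| = \frac{|\alpha - k|}{k+1} \to 1 \quad \text{as } k \to \infty.
\]
Put $f_\alpha(x) := \sum_{k=0}^{\infty} \binom{\alpha}{k} x^k$ for x ∈ (−1, 1). Theorem 511 ensures that
\[
f_\alpha'(x) = \sum_{k=1}^{\infty} \binom{\alpha}{k} k\, x^{k-1} = \sum_{k=0}^{\infty} \binom{\alpha}{k+1} (k+1)\, x^{k}, \quad \text{for all } x \in (-1, 1).
\]
Use the identity $(k+1)\binom{\alpha}{k+1} = \alpha \binom{\alpha-1}{k}$ to get
\[
f_\alpha'(x) = \alpha \sum_{k=0}^{\infty} \binom{\alpha-1}{k} x^{k} \ \big( = \alpha f_{\alpha-1}(x) \big), \quad \text{for all } x \in (-1, 1). \tag{5.100}
\]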
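Next, for x ∈ (−1, 1),
\begin{align}
(1+x) f_{\alpha-1}(x) &= (1+x) \sum_{k=0}^{\infty} \binom{\alpha-1}{k} x^{k} = \sum_{k=0}^{\infty} \binom{\alpha-1}{k} x^{k} + \sum_{k=0}^{\infty} \binom{\alpha-1}{k} x^{k+1} \notag \\
&= 1 + \sum_{k=1}^{\infty} \binom{\alpha-1}{k} x^{k} + \sum_{k=1}^{\infty} \binom{\alpha-1}{k-1} x^{k} \notag \\
&= 1 + \sum_{k=1}^{\infty} \left[ \binom{\alpha-1}{k} + \binom{\alpha-1}{k-1} \right] x^{k} \notag \\
&= 1 + \sum_{k=1}^{\infty} \binom{\alpha}{k} x^{k} = \sum_{k=0}^{\infty} \binom{\alpha}{k} x^{k} = f_\alpha(x). \tag{5.101}
\end{align}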
Combining (5.100) and (5.101), we get (1 + x)f_α′(x) − αf_α(x) = 0 for all x ∈ (−1, 1), so the derivative of the function x ↦ f_α(x)/(1 + x)^α vanishes on (−1, 1). This shows that the function x ↦ f_α(x)/(1 + x)^α is constant on (−1, 1). Since its value at 0 is 1, we get f_α(x) = (1 + x)^α for all x ∈ (−1, 1), and this shows (5.98).
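(ii) Assume now that α > 0. If α ∈ N, then the result is trivial. Otherwise, put $a_k := \left|\binom{\alpha}{k}\right|$ (≠ 0) for k = 0, 1, 2, …. Then
\[
\frac{a_{k+1}}{a_k} = \frac{|\alpha - k|}{k+1}, \quad \text{so, for } k \ge \alpha, \quad \frac{a_{k+1}}{a_k} = \frac{k-\alpha}{k+1},
\]
i.e.,
\[
k a_k - (k+1) a_{k+1} = \alpha a_k \ (> 0), \quad \text{for } k \ge \alpha. \tag{5.102}
\]
The sequence {k a_k}_{k=1}^∞ is thus eventually decreasing, and all its terms are nonnegative. This shows that it converges. Note, too, that if k_0 := ⌊α⌋ + 1, then, by using (5.102),
\[
\sum_{k=k_0}^{n} \alpha a_k = \sum_{k=k_0}^{n} \big( k a_k - (k+1) a_{k+1} \big) = k_0 a_{k_0} - (n+1) a_{n+1}, \quad \text{for } n \ge k_0,
\]
so the series ∑_{k=0}^∞ α a_k converges. Since α > 0, the series ∑_{k=0}^∞ a_k converges, too. Since $\left|\binom{\alpha}{k} x^k\right| \le \left|\binom{\alpha}{k}\right|$ for all x ∈ [−1, 1], the series in (5.98) converges, by the Weierstrass M-test (Theorem 473), uniformly and absolutely on [−1, 1]. Its sum is therefore continuous on [−1, 1] and, by (i), coincides with (1 + x)^α on (−1, 1), hence on all of [−1, 1]. This shows the statement.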
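The statements of Theorem 547 and the telescoping identity (5.102) lend themselves to a quick numerical check. The following Python sketch is only an illustration: the names binom and binomial_partial_sum, and the sample values of α, x, and the number of terms, are assumptions of this example.

```python
def binom(alpha, k):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-k+1)/k!, as in (5.99)."""
    result = 1.0
    for j in range(k):
        result *= (alpha - j) / (j + 1)
    return result

def binomial_partial_sum(alpha, x, n):
    """Sum of the first n + 1 terms of the binomial series (5.98)."""
    total = 0.0
    term = 1.0  # binom(alpha, 0) * x**0
    for k in range(n + 1):
        total += term
        term *= (alpha - k) / (k + 1) * x  # passes from term k to term k + 1
    return total

# (i): for |x| < 1 the partial sums approach (1 + x)**alpha quickly.
print(binomial_partial_sum(0.5, 0.5, 60), 1.5 ** 0.5)

# (5.102): k*a_k - (k+1)*a_{k+1} = alpha*a_k for k >= alpha, with a_k = |binom(alpha, k)|.
alpha = 0.5
a = [abs(binom(alpha, k)) for k in range(25)]
print(max(abs(k * a[k] - (k + 1) * a[k + 1] - alpha * a[k]) for k in range(1, 24)))

# (ii): for alpha > 0 the series still converges at the endpoint x = -1, to (1 - 1)**alpha = 0.
print(binomial_partial_sum(0.5, -1.0, 10000))
```

The last printed value is small but visibly nonzero, which reflects how slowly the series converges at the endpoints compared with interior points.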
Chapter 6
Metric Spaces
This chapter deals with metric spaces, where most of the discussions in the whole text find their place. The reader may find here a discussion on compact spaces, separable spaces, Polish spaces and the Baire category theorem. Some applications to the fixed point theory are included.
6.1 Basics
Geometry has worked, since ancient times, essentially with two instruments: a ruler and a compass. The first one measures distances, the second, angles. Modern mathematics isolates these two activities in two subjects: metric spaces (for a theory of distances) and inner product spaces (which present a theory of angles or, if the reader prefers, orthogonality, based on the notion of an inner product, also called a dot product). Most interestingly, the notion of an inner product is powerful enough to induce also a distance, and so the frame in which metric geometry can be done naturally is settled (see Sect. 11.1). In this chapter, we shall explore the concepts, methods, and results pertaining to a metric (or a distance, an equivalent term).
We have been using a distance d on R, precisely the absolute-value distance d_abs(x, y) := |x − y|, for x, y ∈ R (see Eq. (1.19)). Perhaps the reader is familiar with a distance (the so-called Euclidean distance d_2) on R² or R³, precisely
\[
d_2(x, y) := \Big( \sum_{i=1}^{n} (x_i - y_i)^2 \Big)^{1/2}
\]
for x = (x_i)_{i=1}^n ∈ R^n, y = (y_i)_{i=1}^n ∈ R^n, and n = 2, 3 (the same definition extends to any n ∈ N). The distance by car between two points x := (x_1, x_2) and y := (y_1, y_2) in a modern reticulated city should be given by d_1(x, y) := ∑_{i=1}^2 |x_i − y_i|. If we are interested, instead, in the maximum discrepancy between two vectors we should use d_∞(x, y) := max{|x_i − y_i| : i = 1, 2, …, n} (for a sketch of these three distances see Fig. 6.1). In this chapter we will discuss what these concepts have in common.
© Springer International Publishing Switzerland 2015 V. Montesinos et al., An Introduction to Modern Analysis, DOI 10.1007/978-3-319-12481-0_6
Definition 548 Let M be a nonempty set. A distance (also called a metric) on M is a mapping d : M × M → R that satisfies:
(i) d(x, y) ≥ 0 for all x, y ∈ M.
(ii) d(x, y) = 0 if and only if x = y.
(iii) d(x, y) = d(y, x) for all x, y ∈ M.
(iv) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ M (the so-called triangle inequality).
Fig. 6.1 Three distances in R²
The couple (M, d) is said to be a metric space. If the metric d on M is understood, we shall speak just of a metric space M.
Example 549 (Some ubiquitous examples) Here we give a list of examples. We shall return to them later on in more detail.
1. The set R was equipped with the metric d_abs, defined via the absolute value function by the formula (6.1). That d_abs is indeed a metric on R is easy to check. The space (R, d_abs) is then a metric space.
2. Let E be a vector space, i.e., a set with two operations, namely the sum of elements and the product of an element and a scalar (for us, now, just a real number). The reader may consult any linear algebra book to find the properties attributed to these two operations. Elements of E are called, accordingly, vectors. Now let ‖·‖ be a mapping from E into R that satisfies the following properties:
(i) ‖x‖ ≥ 0 for every x ∈ E.
(ii) ‖x‖ = 0 if and only if x = 0.
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for every x, y ∈ E.
(iv) ‖λx‖ = |λ|·‖x‖ for every x ∈ E and every scalar λ
(see Definition 895). Such a mapping is said to be a norm, and the couple (E, ‖·‖) is called a normed space. The associated mapping d : E × E → R given by
\[
d(x, y) := \|x - y\| \tag{6.1}
\]
for all x, y ∈ E is a metric on E, as it is standard to check. If nothing is said to the contrary, we always assume that a normed space carries the distance d defined above, and so it becomes a particular instance of a metric space. Some examples of this kind of object follow.
(a) The set R, with the usual operations sum and product, is a vector space. This vector space was equipped with the metric d_abs mentioned in Example 549.1. Observe that the norm ‖x‖_1 := |x| on R induces the distance d_abs. The space (R, ‖·‖_1) is a normed space.
(b) An extension of the former example is the vector space R^n, where n ∈ N. The operations sum of vectors and product of a vector by a scalar are defined coordinatewise, i.e., (x_1, …, x_n) + (y_1, …, y_n) := (x_1 + y_1, …, x_n + y_n), and λ(x_1, …, x_n) := (λx_1, …, λx_n) for (x_1, …, x_n), (y_1, …, y_n) ∈ R^n, and λ ∈ R. The so-called Euclidean norm is defined by
\[
\|(x_1, \dots, x_n)\|_2 := \Big( \sum_{i=1}^{n} x_i^2 \Big)^{1/2} \tag{6.2}
\]
for (x_1, …, x_n) ∈ R^n. We shall see in Sect. 11.4 that (6.2) is indeed a norm. The metric defined from ‖·‖_2 via (6.1) is denoted by d_2 (see Fig. 6.1 for the metric d_2 on R²).
(c) Given a nonempty set Γ, define ℓ_∞(Γ) as the set of all bounded real-valued functions on Γ. The operations are defined pointwise, i.e., (f + g)(γ) := f(γ) + g(γ), and (λf)(γ) := λf(γ), for all f, g ∈ ℓ_∞(Γ), λ ∈ R, and γ ∈ Γ. Thus ℓ_∞(Γ) becomes a vector space. The supremum norm ‖·‖_∞ is defined on ℓ_∞(Γ) by
\[
\|f\|_\infty := \sup\{|f(\gamma)| : \gamma \in \Gamma\} \tag{6.3}
\]
for f ∈ ℓ_∞(Γ). The corresponding metric is denoted by d_∞ (see Fig. 6.1 for the metric d_∞ on R²). Observe that the concept of uniform convergence of sequences of functions (see Definition 460) is, precisely, convergence in the metric d_∞. The space ℓ_∞(N) will be denoted just by ℓ_∞. A particular case is Γ = {1, 2, …, n}, and we recover the space R^n. The norm (6.3) is then
\[
\|(x_1, x_2, \dots, x_n)\|_\infty = \max\{|x_i| : i = 1, 2, \dots, n\}, \tag{6.4}
\]
for (x_1, x_2, …, x_n) ∈ R^n, and so the induced metric is
\[
d_\infty((x_1, x_2, \dots, x_n), (y_1, y_2, \dots, y_n)) = \max\{|x_i - y_i| : i = 1, 2, \dots, n\}, \tag{6.5}
\]
for (x_1, x_2, …, x_n), (y_1, y_2, …, y_n) ∈ R^n.
3. Let S be a nonempty set. For x, y ∈ S, let us define
\[
d(x, y) := \begin{cases} 0 & \text{if } x = y, \\ 1 & \text{otherwise.} \end{cases} \tag{6.6}
\]
It is standard and left to the reader to show that d is a metric on S; thus, (S, d) is a metric space. The metric d is said to be the discrete metric. ♦
Remark 550 Let (M, d) be a metric space and let S be a nonempty subset of M. We may consider on S the metric induced by d, i.e., the restriction d̃ of the mapping d to the set S × S. It is clear that d̃ is a metric on S, so the couple (S, d̃) is a metric space. We speak of a subspace S of M, considering S endowed with the restriction metric d̃. In order to avoid cumbersome notation, we shall denote simply by d the restriction of the metric d on M to a subset S, if no misunderstanding can be expected.
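The distances d_1, d_2, d_∞ on R^n and the discrete metric can be written down directly, and the axioms of a metric can be spot-checked on a finite sample of points. The Python sketch below does this; the function names, the random sample, and the tolerance are assumptions of this illustration, and passing such a check on finitely many points is of course no substitute for a proof.

```python
import itertools
import math
import random

def d1(x, y):
    """Sum of coordinatewise absolute differences (the 'distance by car')."""
    return sum(abs(a - b) for a, b in zip(x, y))

def d2(x, y):
    """Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d_inf(x, y):
    """Maximum coordinatewise discrepancy."""
    return max(abs(a - b) for a, b in zip(x, y))

def d_discrete(x, y):
    """Discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1

def satisfies_metric_axioms(d, points, tol=1e-12):
    """Check the four axioms of a metric on every pair and triple of the given points."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:                      # nonnegativity
            return False
        if (d(x, y) == 0) != (x == y):       # d(x, y) = 0 exactly when x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:     # symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True

random.seed(0)  # fixed seed so that the sample is reproducible
points = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(12)]
for metric in (d1, d2, d_inf, d_discrete):
    print(metric.__name__, satisfies_metric_axioms(metric, points))
```

For the points (0, 0) and (3, 4) in R² the three distances are d_1 = 7, d_2 = 5, and d_∞ = 4, which illustrates the general ordering d_∞ ≤ d_2 ≤ d_1.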
If (E, ‖·‖) is a normed space and F is a vector subspace of E, then the restriction of the mapping ‖·‖ to F is a norm on F, and we shall always consider, if nothing is said to the contrary, F endowed with this norm, denoted again by ‖·‖. ®
Example 551 (Continued) 4. By the fact that every real-valued continuous function on [0, 1] is bounded (see Corollary 335), and that the sum of continuous functions, as well as the product of a continuous function by a scalar, are continuous, the space C[0, 1] is a (vector) subspace of the space ℓ_∞([0, 1]) (see Example 549.2c). Thus, according to Remark 550, we may consider C[0, 1] endowed with the restriction of the norm ‖·‖_∞ (see Eq. (6.3)), and so with the associated metric d_∞. ♦
One of the most important features of a metric space (this was, in fact, the purpose for introducing a distance) is that we may speak of “proximity,” “tendency,” or “convergence” (almost equivalent ideas) in terms of the distance. So we have the following central concept.
Definition 552 Let (M, d) be a metric space. A sequence {x_n}_{n=1}^∞ in M is said to converge to a point x ∈ M (or, equivalently, x is said to be the limit of {x_n}_{n=1}^∞) if the sequence {d(x, x_n)}_{n=1}^∞ converges to 0 in R (i.e., it is a null sequence in R). The fact that a sequence {x_n}_{n=1}^∞ converges to x will be denoted by writing lim_{n→∞} x_n = x, or, simply, x_n → x.
Proximity needs a convenient language to help describe elements that are close to a certain element in M. Given x_0 ∈ M and r ≥ 0, we denote by B(x_0, r) the set {x ∈ M : d(x, x_0) < r}; this set will be called the open ball centered at x_0 and having radius r. The set B[x_0, r] := {x ∈ M : d(x, x_0) ≤ r} is called the closed ball centered at x_0 and having radius r. For example, in the metric space (R, d_abs), we have B(x_0, r) = {x ∈ R : |x − x_0| < r} = (x_0 − r, x_0 + r) and B[x_0, r] = {x ∈ R : |x − x_0| ≤ r} = [x_0 − r, x_0 + r].
So, to express that in a metric space we have d(x, y) < r we just write y ∈ B(x, r), and to express that d(x, y) ≤ r we put y ∈ B[x, r]. Hence, the fact that a sequence {x_n}_{n=1}^∞ in M converges to an element x ∈ M can be expressed by saying that for every ε > 0 there exists N ∈ N such that x_n ∈ B(x, ε) for every n ≥ N.
Most of the notions that were introduced in Sect. 1.8 (those that depend on the distance defined in R) extend naturally to the more general setting of a metric space (M, d). We just did it for the notion of a convergent sequence (see Definition 552). A sequence {x_n}_{n=1}^∞ in M is said to be a Cauchy sequence if for every ε > 0 there exists N ∈ N such that d(x_n, x_m) < ε for all n, m ≥ N. A set B ⊂ M is said to be bounded if it is a subset of a ball B(x, r), where x ∈ M and r > 0. In this case, we define the diameter of a bounded subset B of M as diam (B) := sup{d(x, y) : x, y ∈ B}. A metric space (M, d) is said to be bounded whenever M itself is a bounded set in M, i.e., whenever there exist x_0 ∈ M and r > 0 such that M = B(x_0, r).
A subset O of M is open if for each x ∈ O there exists r > 0 such that B(x, r) ⊂ O. Clearly, both the empty set and the set M are open sets, the union of an arbitrary family of open subsets of M is open, and the intersection of a finite family of open subsets of M is open. A subset C of M is said to be closed if its complement C^c (= M \ C) is open. By taking complements we
immediately get that the intersection of a family of closed subsets of M is closed, and that the union of a finite family of closed subsets of M is closed (the proof of all these facts is an easy adaptation of the proof of Proposition 71). In particular, given a subset S of M, the intersection of all closed subsets of M that contain S is a closed subset of M named the closure of S and denoted by S̄. Clearly, S̄ is the smallest (in the sense of inclusion) closed subset of M that contains S. The following result has a proof similar to the one of Proposition 128. For a detailed argument see Exercise 13.359. Since clearly a subset S of a metric space (M, d) is closed if and only if S = S̄, an implication follows immediately from Proposition 554 below.
Proposition 553 The closure of a subset S of a metric space M consists of the set of limits of all sequences in S that converge in M.
The following result characterizes closed sets by using sequences. It extends Proposition 128 to the case of an arbitrary metric space.
Proposition 554 Let (M, d) be a metric space. Then, a subset C of M is closed if and only if the limit of any convergent sequence {x_n}_{n=1}^∞ in C belongs to C.
Proof Assume first that C is closed. If C is empty, there is nothing to prove. If not, let {x_n}_{n=1}^∞ be a sequence in C that converges to some x ∈ M. Assume that x ∉ C. Since M \ C is open, there exists r > 0 such that B(x, r) ⊂ M \ C, and so we can find n_0 ∈ N such that x_n ∈ B(x, r) for all n ≥ n_0, a contradiction with the fact that x_n ∈ C for all n ∈ N. Assume now that C is not closed. Thus, M \ C is not open. We can find then x ∈ M \ C such that B(x, 1/n) ∩ C ≠ ∅ for all n ∈ N. This gives a sequence {x_n}_{n=1}^∞ in C such that d(x, x_n) < 1/n for all n ∈ N. The sequence {x_n}_{n=1}^∞ is thus in C, although its limit belongs to M \ C.
We will frequently be using the following result.
Proposition 555 Let M be a metric space and S be a subset of M. Let O be a subset of S. Then O is open as a subset of the metric space S if and only if there is an open set Õ in M such that O = Õ ∩ S. Similarly for closed sets.
Proof We shall only prove the version for closed sets. The version for open sets follows by taking complements. First, let C be a closed subset of the metric space S. The required closed set is C̄ (the closure of C in M), since C = C̄ ∩ S. Indeed, we have clearly C ⊂ C̄ ∩ S. On the other hand, let x ∈ C̄ ∩ S. By Proposition 553, there exists a sequence {x_n}_{n=1}^∞ in C that converges to x. Since x ∈ S and C is closed in S, we have x ∈ C. Assume now that C̃ is a closed subset of M. We shall prove that C̃ ∩ S is a closed subset of the metric space S. To this end, let {x_n}_{n=1}^∞ be a sequence in C̃ ∩ S that converges to an element x ∈ S. Then x ∈ C̃, due to the fact that C̃ is closed. Since x ∈ S we finally get that x ∈ C̃ ∩ S and so C̃ ∩ S is closed in the metric space S.
A point x ∈ M is a cluster point of a sequence {x_n}_{n=1}^∞ in M if given ε > 0 and N ∈ N, there exists n ≥ N such that d(x_n, x) < ε. Observe that x is a cluster point of {x_n}_{n=1}^∞ if and only if there exists a subsequence {x_{n_k}}_{k=1}^∞ of {x_n}_{n=1}^∞ that converges
to x (see Theorem 146 for the case M = R; the proof of this metric-space version is similar). The reader will have no difficulties in extending concepts like neighborhood of a point, the interior of a subset of M, etc.
An accumulation point of a subset S of M is a point x ∈ M such that every neighborhood of x contains a point in S other than x. Observe that a point x ∈ M is an accumulation point of S if and only if every neighborhood of x contains infinitely many points of S. Indeed, if x is an accumulation point of S and U is a neighborhood of x, a sequence {x_n}_{n=1}^∞ of distinct points in U ∩ S can be constructed by induction: Start by taking a point x_1 ∈ U ∩ S, x_1 ≠ x. Assume now that the set {x_1, x_2, …, x_n}, consisting of n distinct elements in U ∩ S, all of them different from x, has already been constructed. Let r := min{d(x, x_i) : i = 1, 2, …, n}. Then r > 0. By assumption, there exists x_{n+1} in (B(x, r/2) ∩ U ∩ S) \ {x}. This shows that x_{n+1} ∉ {x_1, x_2, …, x_n}, and that x_{n+1} ∈ U ∩ S. This finalizes the induction process. On the other hand, if every neighborhood of x contains infinitely many points of S, then clearly x is an accumulation point of S.
The set of all accumulation points of S is denoted by S′ and we say that S′ is the derived set of S. We say that S ⊂ M is a perfect set if it is closed and S = S′. If a point x ∈ S is not an accumulation point of S we say that x is an isolated point in S. Observe that x ∈ S is isolated in S if and only if there is a neighborhood U of x such that U ∩ S = {x}. A point x ∈ M is said to be a condensation point of S whenever every neighborhood of x contains uncountably many points of S. A subset D of a set A is said to be dense in A whenever A ⊂ D̄, i.e., whenever every neighborhood of a contains an element in D, for every a ∈ A. This is equivalent, in metric spaces, to saying that every point a ∈ A is the limit of a sequence in D.
Example 556 (Continued) In due course, we will test the results in this chapter on the list of Examples 549, 551, the following additional ones, and those in Examples 565 below.
5. The set Q of all rational numbers with the metric d_abs.
6. The set P of all irrational numbers with the metric d_abs.
7. The set N of all natural numbers with the metric d_abs.
8. The set R with the arctangent metric d_arctan, defined by d_arctan(x, y) := | arctan x − arctan y| for x, y ∈ R.
9. The space S_0 consisting of the elements of a given sequence of real numbers that converges to 0 together with the real number 0, endowed with the metric d_abs.
10. The space S consisting of the elements of a given sequence of real numbers that converges to 0 without the real number 0, endowed with the metric d_abs.
11. Given a nonempty set Γ, the closed unit ball B_{ℓ_∞(Γ)} of the space (ℓ_∞(Γ), ‖·‖_∞) (see Example 549.2c) endowed with the restriction of the metric d_∞ defined by the norm ‖·‖_∞.
12. The unit ball of the space C[0, 1] (see Example 551.4) with the restriction of the metric d_∞ induced by the supremum norm ‖·‖_∞.
13. The set [0, 1] with the metric d_abs.
14. The set [0, 1) with the metric d_abs.
Fig. 6.2 The distance from a point x to a set A
♦
Given a nonempty subset A of a metric space (M, d) and a point x ∈ M, define dist (x, A) := inf{d(x, a) : a ∈ A} (see Fig. 6.2). The function x ↦ dist (x, A) is said to be the distance function to A. It is a function from M into the set of nonnegative real numbers.
Proposition 557 Let (M, d) be a metric space and A be a nonempty subset of M. Then the distance function to A is a 1-Lipschitz function on M.
Proof Let x, y ∈ M. Then for each a ∈ A, we have d(x, a) ≤ d(x, y) + d(y, a). Thus, dist (x, A) ≤ d(x, y) + dist (y, A), hence dist (x, A) − dist (y, A) ≤ d(x, y). By interchanging x and y, we get |dist (x, A) − dist (y, A)| ≤ d(x, y).
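The inequality in Proposition 557 can be observed numerically. The Python sketch below works in (R, d_abs) with a finite set A, for which the infimum in the definition of dist (x, A) is a minimum; the names d_abs, dist, and is_one_lipschitz, the sample set A, and the sampling range are assumptions of this illustration.

```python
import random

def d_abs(x, y):
    """Absolute-value distance on R."""
    return abs(x - y)

def dist(x, A, d=d_abs):
    """Distance from the point x to the finite nonempty set A (here the infimum is a minimum)."""
    return min(d(x, a) for a in A)

def is_one_lipschitz(A, trials=1000, seed=0, tol=1e-12):
    """Test |dist(x, A) - dist(y, A)| <= d(x, y) on randomly chosen pairs x, y."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
        if abs(dist(x, A) - dist(y, A)) > d_abs(x, y) + tol:
            return False
    return True

A = [-3.0, 0.5, 2.0, 7.25]   # an arbitrary finite subset of R
print(dist(1.0, A))          # distance from the point 1.0 to A
print(is_one_lipschitz(A))
```

Replacing d_abs by another metric and A by a finite subset of the corresponding space leaves the check unchanged, since the proof only uses the triangle inequality.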
Remark 558 1. For a description of the closure of a subset of a metric space in terms of the distance function introduced in Proposition 557 see Exercise 13.359. 2. The infimum in the definition of the distance function is not always attained. Consider, for example, the metric space (S0 , dabs ) := ({−1/n : n ∈ N} ∪ {0}, dabs ) in item 556.9. The distance from 0 to the set A := {−1/n : n ∈ N} is zero and there is no element in A whose distance to 0 is zero. For another example of the same phenomenon, see Exercise 13.375. ®
6.2 Mappings Between Metric Spaces
The reader will have no difficulties in extending most of the definitions and results related to mappings on R to the setting of metric spaces. For example, given two metric spaces (M1 , d1 ) and (M2 , d2 ) and a point x1 ∈ M1 , a function f : M1 → M2
is said to be continuous at x_1 if for every ε > 0 there exists δ > 0 such that y_1 ∈ M_1 and d_1(y_1, x_1) < δ imply d_2(f(y_1), f(x_1)) < ε. This is equivalent to saying that f(y_n) → f(x_1) in M_2 whenever y_n → x_1 in M_1. The function f is said to be continuous if it is continuous at each x_1 ∈ M_1. For a test of continuity of a function by using elements in a dense subset of its domain see Exercise 13.367. The space of all continuous functions from M_1 into M_2 is denoted by C(M_1, M_2). The following useful proposition (easily seen to extend Proposition 325 to the setting of general metric spaces) characterizes continuous mappings between metric spaces.
Proposition 559 A function f from a metric space (M_1, d_1) into a metric space (M_2, d_2) is continuous if and only if the inverse image of any open subset of M_2 is an open subset of M_1.
Proof Assume that f is continuous and let O_2 be an open subset of M_2. If O_1 := f^{−1}(O_2) is empty, there is nothing to prove. Assume, on the contrary, that x ∈ O_1. If for every n ∈ N we have B(x, 1/n) ∩ (M_1 \ O_1) ≠ ∅, we can find a sequence {x_n} in M_1 \ O_1 such that d_1(x, x_n) < 1/n for all n ∈ N. Then the sequence {f(x_n)} is in M_2 \ O_2, and converges to f(x) (∈ O_2) by the continuity of f, a contradiction with the fact that M_2 \ O_2 is closed. Hence, we can find n ∈ N such that B(x, 1/n) ⊂ O_1. Since this happens for every x ∈ O_1, the set O_1 is open.
Assume now that the condition holds. If f is not continuous at some x ∈ M_1, there exists a sequence {x_n} in M_1 that converges to x and the sequence {f(x_n)} does not converge to f(x). This proves the existence of an open neighborhood O_2 of f(x) and a subsequence {x_{n_k}} of {x_n} such that f(x_{n_k}) ∉ O_2 for k ∈ N. Put O_1 := f^{−1}(O_2). The set O_1 is open, by hypothesis. The sequence {x_{n_k}} is in M_1 \ O_1 (a closed set), and converges to x (∈ O_1), a contradiction.
Continuity of a function from a metric space into another can be checked on a dense subset of the former.
See Exercise 13.367.
A function f : (M_1, d_1) → (M_2, d_2) is said to be uniformly continuous if for every ε > 0 there exists δ > 0 such that d_2(f(x), f(y)) < ε if d_1(x, y) < δ (compare with the definition of continuity above; the crucial difference is that now δ does not depend on the particular point x ∈ M_1). The function f is said to be Lipschitz if there exists K > 0 such that d_2(f(x), f(y)) ≤ K d_1(x, y) for every x, y ∈ M_1 (if this is the case, we say that f is K-Lipschitz on M_1). Clearly, every Lipschitz mapping is uniformly continuous and every uniformly continuous mapping is continuous. The converse implications do not hold. For an example of a continuous nonuniformly continuous function, see the second paragraph after Remark 625 below. For an example of a uniformly continuous non-Lipschitz function, observe that due to Theorem 344, every real-valued continuous function defined on a compact subset of R is uniformly continuous (see also Corollary 630 below); however, not every continuous function on, say, [0, 1], is Lipschitz. As a particular example, let us consider the function f : [0, 1] → [0, 1] given by f(x) := 1 − √(1 − (x − 1)²), x ∈ [0, 1] (see Fig. 6.3). It is obviously continuous on [0, 1]. Assume that there exists K > 0 such that |f(x) − f(y)| ≤ K|x − y| for all x, y ∈ [0, 1]. In particular,
Fig. 6.3 A uniformly continuous non-Lipschitz function on [0, 1]
since f(0) = 1,
\[
|f(x) - f(0)| = \sqrt{1 - (x-1)^2} \le Kx \quad \text{for all } x \in [0, 1], \text{ hence,} \quad \frac{\sqrt{1 - (x-1)^2}}{x} \le K, \tag{6.7}
\]
for all x ∈ (0, 1]. This is false, due to the fact that the left-hand side of (6.7) goes to +∞ as x ↓ 0. Other examples of functions uniformly continuous on a closed and bounded interval but not Lipschitz there can be found in Exercises 13.281, 13.285, and 13.288.
Functions between metric spaces that preserve distances are called isometries. Precisely, a mapping J : (M_1, d_1) → (M_2, d_2) is said to be an isometry if d_2(J(x), J(y)) = d_1(x, y) for all x, y ∈ M_1. Two metric spaces (M_1, d_1) and (M_2, d_2) such that an isometry from one onto the other exists, are called isometric. For example, if Γ is a nonempty finite set and card (Γ) = n, the metric spaces (ℓ_∞(Γ), d_∞) and (R^n, d_∞) are isometric. Indeed, we may assume that Γ := {1, 2, …, n}. Thus, the mapping J : ℓ_∞(Γ) → R^n given by J(x) := (x(1), x(2), …, x(n)) for x ∈ ℓ_∞(Γ) is an isometry from (ℓ_∞(Γ), d_∞) onto (R^n, d_∞).
The concept of homeomorphism was introduced in Remark 105. We rephrase this definition here in the context of metric spaces:
Definition 560 Let (M_1, d_1) and (M_2, d_2) be two metric spaces. A mapping J : M_1 → M_2 is said to be a homeomorphism from M_1 onto M_2 if J is one-to-one and onto, and both J and its inverse mapping J^{−1} are continuous. If such a mapping J exists, we say that the two metric spaces (M_1, d_1) and (M_2, d_2) are homeomorphic.
Remark 561 Observe that J : (M_1, d_1) → (M_2, d_2) is a homeomorphism if and only if it is one-to-one and onto, and both J and J^{−1} preserve convergent sequences, i.e., if {x_n}_{n=1}^∞ is a sequence in M_1 that converges in the metric d_1 to an element x ∈ M_1, then {J(x_n)}_{n=1}^∞ converges in the metric d_2 to J(x), and conversely, if {y_n}_{n=1}^∞ is a sequence in M_2 that converges in the metric d_2 to an element y ∈ M_2, then {J^{−1}(y_n)}_{n=1}^∞ converges in the metric d_1 to J^{−1}(y). Note too, that in view of Proposition 559, a mapping J : (M_1, d_1) → (M_2, d_2) is a homeomorphism if and only if it is one-to-one and onto, and both J and J^{−1} preserve the family of open sets, i.e., if O_i is an open subset of (M_i, d_i), i = 1, 2, then J(O_1) is open in (M_2, d_2) and J^{−1}(O_2) is open in (M_1, d_1). ®
Fig. 6.4 A homeomorphism from (0, 1) onto R (Example 562.1)
Fig. 6.5 A homeomorphism from C0 onto R (Example 562.2)
Example 562
1. Let $a, b$ be two real numbers such that $a < b$. Then the two metric spaces $(\mathbb{R}, d_{\mathrm{abs}})$ and $((a, b), d_{\mathrm{abs}})$ are homeomorphic. Indeed, the mapping $f : (a, b) \to \mathbb{R}$ given by $f(x) := 1/(a - x) + 1/(b - x)$ for $x \in (a, b)$ is a homeomorphism (see Exercise 13.269). The graph of this function on the open interval $(0, 1)$ appears in Fig. 6.4.
2. Let $C_0$ be the subset of $\mathbb{R}^2$ consisting of the unit circle centered at $(0, 0)$ without the “North Pole” $N$ (see Fig. 6.5), endowed with the restriction of the Euclidean metric $d_2$ on $\mathbb{R}^2$. This metric space $(C_0, d_2)$ is homeomorphic to the real line endowed with the absolute-value metric. Indeed, the function $f : C_0 \to \mathbb{R}$ depicted in Fig. 6.5 that maps $C_0$ onto the line $\mathbb{R}$ is a homeomorphism from $(C_0, d_2)$ onto $(\mathbb{R}, d_2)$, and this last space is clearly homeomorphic to $(\mathbb{R}, d_{\mathrm{abs}})$. ♦

Definition 563 Let $M$ be a nonempty set and let $d_1$ and $d_2$ be two metrics on $M$. We say that $d_1$ and $d_2$ are equivalent if the identity mapping $I : (M, d_1) \to (M, d_2)$ is a homeomorphism.

In view of Remark 561 above, the following three statements regarding a set $M$ and two metrics $d_1$ and $d_2$ on $M$ are equivalent:
(i) A sequence $\{x_n\}_{n=1}^{\infty}$ in $M$ converges in the metric $d_1$ if and only if it converges in the metric $d_2$.¹
(ii) A set $S \subset M$ is open in the metric $d_1$ if and only if it is open in the metric $d_2$.
(iii) The metrics $d_1$ and $d_2$ are equivalent.

¹ This statement implies the following, apparently more precise, one: A sequence $\{x_n\}_{n=1}^{\infty}$ in $M$ converges in the metric $d_1$ if and only if it converges in the metric $d_2$, and both limits coincide. Indeed, if $x_n \to x$ in the metric $d_1$, the sequence $\{x_1, x, x_2, x, x_3, \dots\}$ converges to $x$ in the metric $d_1$, so it converges in the metric $d_2$. Its $d_2$-limit must be $x$, since $x$ appears infinitely often in the sequence. Hence $x_n \to x$ in the metric $d_2$.
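Returning to Example 562.1, the following sketch illustrates numerically, for $a = 0$ and $b = 1$, that $f(x) = 1/(a - x) + 1/(b - x)$ is a bijection from $(a, b)$ onto $\mathbb{R}$. It inverts $f$ by bisection, which works because $f$ is strictly increasing. The function names and the iteration cap are assumptions made for this illustration; the sketch is a numerical check, not a proof.

```python
def f(x, a=0.0, b=1.0):
    """The map of Example 562.1 from (a, b) onto R."""
    return 1.0 / (a - x) + 1.0 / (b - x)


def f_inverse(y, a=0.0, b=1.0, iterations=200):
    """Approximate f^{-1}(y) by bisection on (a, b); relies on f being increasing."""
    lo, hi = a, b
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if mid == lo or mid == hi:
            break  # floating-point resolution reached
        if f(mid, a, b) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Every sampled real value y is attained at some point of (0, 1).
for y in (-50.0, -1.0, 0.0, 2.0, 50.0):
    x = f_inverse(y)
    print(y, x, f(x))
```

Continuity of the inverse is not tested here; it follows from the general fact that a continuous strictly increasing bijection between intervals has a continuous inverse.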
Example 564
1. The identity mapping $f(x) := x$ from the metric space $(\mathbb{R}, d_{\mathrm{abs}})$ onto the metric space $(\mathbb{R}, d_{\arctan})$ is a homeomorphism, hence the two metrics $d_{\mathrm{abs}}$ and $d_{\arctan}$ on $\mathbb{R}$ are equivalent. Indeed, assume that $\{x_n\}_{n=1}^{\infty}$ is a sequence in $\mathbb{R}$ that $d_{\mathrm{abs}}$-converges to $x \in \mathbb{R}$. Then, since the function $\arctan$ is continuous (see Example 395), we get $\arctan x_n \to \arctan x$ in the metric $d_{\mathrm{abs}}$, i.e., $d_{\arctan}(x, x_n) := |\arctan x - \arctan x_n| \to 0$ as $n \to \infty$. This shows that the sequence $\{x_n\}_{n=1}^{\infty}$ is $d_{\arctan}$-convergent to $x$. Conversely, assume that $d_{\arctan}(x, x_n)$ $(= |\arctan x - \arctan x_n|) \to 0$ as $n \to \infty$. Then, since the inverse mapping of the function $\arctan : \mathbb{R} \to (-\pi/2, \pi/2)$ is the continuous function $\tan$, we get $|x - x_n| \to 0$ as $n \to \infty$, i.e., the sequence $\{x_n\}_{n=1}^{\infty}$ converges to $x$ in the metric $d_{\mathrm{abs}}$.
2. If $(M, d)$ is a metric space, there always exists an equivalent metric $d_0$ on $M$ such that $(M, d_0)$ is bounded. Indeed, put
$$d_0(x, y) := \frac{d(x, y)}{1 + d(x, y)}, \quad \text{for all } x, y \in M. \tag{6.8}$$
Then $d_0$ is clearly bounded (the number 1 is an upper bound), it is a metric on $M$, and the two metrics $d$ and $d_0$ are equivalent (for the last two statements see Exercise 13.341, and for the graph of the function $f(x) := x/(1 + x)$ see Fig. 13.512). ♦
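As a quick numerical illustration of Example 564.2, the sketch below builds $d_0 = d/(1 + d)$ from the absolute-value metric on $\mathbb{R}$ and spot-checks, on a few sample points, that $d_0$ stays below 1 and satisfies the triangle inequality. The helper names (`bounded`, `d_abs`) and the sample points are assumptions chosen for this illustration; a finite check of this kind does not replace the proof requested in Exercise 13.341.

```python
import itertools


def d_abs(x, y):
    """Absolute-value metric on R."""
    return abs(x - y)


def bounded(d):
    """Return the metric d0 = d / (1 + d) of Eq. (6.8) built from a metric d."""
    def d0(x, y):
        t = d(x, y)
        return t / (1.0 + t)
    return d0


d0 = bounded(d_abs)

points = [-1000.0, -3.5, 0.0, 0.25, 2.0, 1.0e6]
for x, y, z in itertools.product(points, repeat=3):
    assert 0.0 <= d0(x, y) < 1.0                      # boundedness
    assert d0(x, z) <= d0(x, y) + d0(y, z) + 1e-12    # triangle inequality

# The sequence 1/n tends to 0 in both metrics.
tail_abs = [d_abs(1.0 / n, 0.0) for n in (10, 100, 1000)]
tail_d0 = [d0(1.0 / n, 0.0) for n in (10, 100, 1000)]
print(tail_abs, tail_d0)
```

Since $t \mapsto t/(1 + t)$ is increasing with a continuous inverse on $[0, 1)$, small $d_0$-distances correspond exactly to small $d$-distances, which is the reason behind the equivalence of the two metrics.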
6.3 More Examples (Continued)
Example 565
15. For $p \ge 1$, the space
$$\ell_p(\mathbb{N}) := \Big\{ x = (x_n)_{n=1}^{\infty} : x_n \in \mathbb{R} \text{ for all } n \in \mathbb{N},\ \sum_{n=1}^{\infty} |x_n|^p < +\infty \Big\}, \tag{6.9}$$
endowed with the norm $\|\cdot\|_p$ defined by $\|x\|_p = \big(\sum_{n=1}^{\infty} |x_n|^p\big)^{1/p}$ for $x := (x_n) \in \ell_p(\mathbb{N})$. To prove that $\|\cdot\|_p$ is indeed a norm is easy, except for the triangle inequality in the case $p > 1$; this relies on the inequality (8.18). This norm induces via Eq. (11.1) the distance $d_p$ on $\ell_p(\mathbb{N})$.
16. (For this item and the next one, see Sects. 7.3 and 11.1.) For $1 \le p$ and for a measurable subset $E$ of $\mathbb{R}$, the space $L_p(E)$ of all classes of measurable Lebesgue $p$-integrable scalar-valued functions on $E$, i.e., classes of measurable scalar-valued functions $f$ such that $|f|^p$ is Lebesgue integrable on $E$. Here, two functions $f$ and $g$ belong to the same class if and only if they are equal almost everywhere (a.e.). This space is endowed with the norm $\|f\|_p := \big(\int_E |f(x)|^p \, dx\big)^{1/p}$, for
$f \in L_p(E)$. To prove that $\|\cdot\|_p$ is indeed a norm is easy, except for the triangle inequality, which relies on inequalities (8.19) below. This norm induces a distance $d_p$ on $L_p(E)$ via Eq. (11.1).
17. For a measurable subset $E$ of $\mathbb{R}$, the space $L_\infty(E)$ of all classes of measurable essentially bounded real-valued functions on $E$, i.e., classes of measurable functions $f$ such that there exist a null set $N \subset E$ and some $M > 0$ such that $|f(x)| \le M$ for all $x \in E \setminus N$. Here again, two functions $f$ and $g$ belong to the same class if and only if they are equal (a.e.). A norm is defined on $L_\infty(E)$ by $\|f\|_\infty := \operatorname{ess\,sup}(f) := \inf\{\sup\{|f(x)| : x \in E \setminus N\} : N \subset E,\ \lambda(N) = 0\}$. It induces on $L_\infty(E)$ the metric $d_\infty$ via Eq. (11.1).
18. Let $\Gamma$ be an infinite set. The space $c_0(\Gamma)$ consists of all real-valued functions $f$ on $\Gamma$ such that, given $\varepsilon > 0$, the set $\{\gamma \in \Gamma : |f(\gamma)| > \varepsilon\}$ is finite. It is a linear subspace of $\ell_\infty(\Gamma)$ (see Example 549.2c) which, endowed with the restriction of the norm $\|\cdot\|_\infty$, becomes a normed space. Observe that the support of any $f \in c_0(\Gamma)$ is countable, since it is $\bigcup_{n=1}^{\infty} \{\gamma \in \Gamma : |f(\gamma)| > 1/n\}$, a countable union of finite sets.
19. Let $\Gamma$ be an infinite set. Let $c_{00}(\Gamma)$ denote the linear subspace of $c_0(\Gamma)$ (see Example 565.18) consisting of all finitely supported real-valued functions defined on $\Gamma$, i.e., functions $f : \Gamma \to \mathbb{R}$ such that their support $\{\gamma \in \Gamma : f(\gamma) \ne 0\}$ is finite. Endow this space with the norm induced by $\|\cdot\|_\infty$ on $c_0(\Gamma)$ (denoted again by $\|\cdot\|_\infty$). The space $(c_{00}(\Gamma), \|\cdot\|_\infty)$ is a normed space. ♦
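The norms of Example 565.15 can be explored on finitely supported sequences, represented below as Python lists. This is only a sketch: `p_norm` is an illustrative helper, the random vectors are arbitrary test data, and the truncated sequence $(1/n)$ merely hints at the fact that $(1/n)$ belongs to $\ell_2(\mathbb{N})$ but not to $\ell_1(\mathbb{N})$.

```python
import math
import random


def p_norm(x, p):
    """(sum |x_n|^p)^(1/p) for a finitely supported sequence x, with p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


# Spot-check of the triangle inequality for several exponents p >= 1.
random.seed(0)
for p in (1, 1.5, 2, 3, 10):
    for _ in range(200):
        x = [random.uniform(-5, 5) for _ in range(8)]
        y = [random.uniform(-5, 5) for _ in range(8)]
        s = [a + b for a, b in zip(x, y)]
        assert p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-9

# First 100000 terms of (1/n): the 2-norm stays below pi/sqrt(6),
# while the 1-norm (a harmonic partial sum) keeps growing with the cutoff.
harmonic = [1.0 / n for n in range(1, 100001)]
print(p_norm(harmonic, 2), math.pi / math.sqrt(6))
print(p_norm(harmonic, 1))
```

The random test does not prove the triangle inequality; for $p > 1$ the proof relies on the inequality (8.18) cited in the text.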
6.4 Tietze’s Extension Theorem

An important result in the theory of continuous functions on metric spaces is the so-called Tietze’s Extension Theorem, due to the Austrian mathematician H. F. F. Tietze. It allows one to extend a real-valued continuous function defined on a closed subset of a metric space to a continuous function defined on the whole space. The resulting function is called an extension of the given function; the term refers to the fact that both functions coincide on the given closed set. We follow [BBT97].

Theorem 566 (Tietze) Let $(M, d)$ be a metric space. Let $F$ be a nonempty closed subset of $M$ and let $f : F \to \mathbb{R}$ be a continuous function. Then there exists a continuous extension $\tilde f : M \to \mathbb{R}$ of $f$ (i.e., a continuous function $\tilde f : M \to \mathbb{R}$ such that $\tilde f(x) = f(x)$ for all $x \in F$). Moreover, if $f : F \to \mathbb{R}$ satisfies $|f(x)| \le K$ for all $x \in F$, then the continuous extension $\tilde f$ can be chosen such that $|\tilde f(x)| < K$ for all $x \in M \setminus F$.

To prove Theorem 566, we need the following result.

Lemma 567 Let $(M, d)$ be a metric space, $F$ a nonempty closed subset of $M$, and $f : F \to \mathbb{R}$ a continuous function. Assume that for some $K > 0$ we have $|f(x)| \le K$ for all $x \in F$. Then there exists a continuous function $g : M \to \mathbb{R}$ such that
(i) $|g(x)| \le K/3$ for all $x \in F$.
Fig. 6.6 f defined on F := [0, 1], g on M := R, and K = 3 (Lemma 567)
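(ii) $|g(x)| < K/3$ for all $x \in F^c$.
(iii) $|f(x) - g(x)| \le 2K/3$ for all $x \in F$.

Proof Define the sets $A := \{x \in F : f(x) \le -K/3\}$ and $B := \{x \in F : f(x) \ge K/3\}$. The sets $A$ and $B$ are closed and disjoint. For $x \in M$ put
$$g(x) := \begin{cases} \dfrac{K}{3}\,\dfrac{d(x, A) - d(x, B)}{d(x, A) + d(x, B)}, & \text{if } A \ne \emptyset \text{ and } B \ne \emptyset, \\[2ex] \dfrac{K}{3}\,\big(1 - \min\{1, d(x, B)\}\big), & \text{if } A = \emptyset \text{ and } B \ne \emptyset, \\[2ex] \dfrac{K}{3}\,\big(\min\{1, d(x, A)\} - 1\big), & \text{if } A \ne \emptyset \text{ and } B = \emptyset, \\[2ex] 0, & \text{if } A = \emptyset \text{ and } B = \emptyset. \end{cases}$$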
(See Fig. 6.6 for a function $f$ defined on $[0, 1]$, with $|f(x)| \le 3$ on $[0, 1]$, $A = [0.6, 1]$, and $B = [0, 0.4]$.) It is straightforward to check that $g$ satisfies the requirements.

Proof of Theorem 566 Assume first that $f$ is bounded, say $|f(x)| \le K$ for all $x \in F$. The sought extension $\tilde f$ will be the sum of a series $\sum_{n=0}^{\infty} g_n$, where the functions $g_n$ will be defined inductively. Put $g_0 \equiv 0$ on $M$ and observe that $|f(x) - g_0(x)| \le \left(\frac{2}{3}\right)^0 K$ for all $x \in F$. Suppose that the functions $g_0, \dots, g_n$ have already been defined for some $n \in \mathbb{N} \cup \{0\}$ in such a way that
$$\Big| f(x) - \sum_{k=0}^{n} g_k(x) \Big| \le \Big(\frac{2}{3}\Big)^{n} K, \quad \text{for all } x \in F. \tag{6.10}$$
Apply Lemma 567 to the function $f - \sum_{k=0}^{n} g_k$ defined on $F$ and its upper bound $\left(\frac{2}{3}\right)^n K$ to obtain a function $g_{n+1}$ defined on $M$ such that
(i) $|g_{n+1}(x)| \le \frac{1}{3}\left(\frac{2}{3}\right)^n K$ for all $x \in F$.
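(ii) $|g_{n+1}(x)| < \frac{1}{3}\left(\frac{2}{3}\right)^n K$ for all $x \in F^c$.
(iii) $\big|f(x) - \sum_{k=0}^{n} g_k(x) - g_{n+1}(x)\big| \le \left(\frac{2}{3}\right)^{n+1} K$ for all $x \in F$.

Due to (i) and (ii) above, the series $\sum_{k=0}^{\infty} g_k$ converges uniformly on $M$ (see Theorem 473), hence, by Theorem 463, its sum $\tilde f$ is a continuous function on $M$. Because of (6.10) we conclude that $\tilde f = f$ on $F$. In order to check that $|\tilde f| \le K$ on $M$, with strict inequality on $F^c$, it is enough then to take $x \in F^c$. Since $g_0 \equiv 0$, we get
$$|\tilde f(x)| = \Big| \sum_{n=0}^{\infty} g_n(x) \Big| = \Big| \sum_{n=0}^{\infty} g_{n+1}(x) \Big| \le \sum_{n=0}^{\infty} |g_{n+1}(x)| < K \sum_{n=0}^{\infty} \frac{1}{3}\Big(\frac{2}{3}\Big)^{n} = K,$$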
where the last inequality follows from (ii). This completes the proof in the case that $f$ is bounded.

Assume now that $f$ is arbitrary. Let $h$ be a continuous strictly increasing function mapping $\mathbb{R}$ onto $(-1, 1)$ (for example, we may consider the function $(2/\pi) \arctan x$). Then the function $h \circ f$ is continuous and $|(h \circ f)(x)| < 1$ for all $x \in F$. From the first part of the proof, we obtain an extension $\widetilde{h \circ f}$ of $h \circ f$ to $M$, and we still have $|\widetilde{h \circ f}(x)| < 1$ for every $x \in M$. Observe that $h^{-1}$ exists and is a continuous function from $(-1, 1)$ to $\mathbb{R}$ (in our example, $h^{-1}(x) = \tan((\pi/2)x)$, although the result follows for any such function $h$ by an application of Proposition 337). The mapping $h^{-1} \circ \widetilde{h \circ f}$ is the required extension.

Remark 568
1. In the case that the metric space $M$ is the real line, Tietze’s Theorem 566 has an easier proof. See, e.g., Exercise 13.186.
2. Note that the closedness requirement in Theorem 566 cannot be dropped: for instance, the function $\sin(1/x)$, defined on $(0, 1)$, cannot be extended to a continuous function on $[0, 1]$. See Fig. 7.21 for a plot of the function on $[-1, 0)$ and note that the function is odd. ®
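The inductive construction in the proof of Theorem 566 can be traced numerically in a very simple setting. The sketch below assumes $M = \mathbb{R}$ with the absolute-value metric and takes $F$ to be a finite set of reals (so every function on $F$ is continuous and distances to subsets of $F$ are easy to compute). The function names `dist`, `lemma_567`, and `tietze_partial_sum` are illustrative choices, not notation from the text, and only finitely many terms of the series are summed.

```python
import math


def dist(x, S):
    """Distance d(x, S) from x to a nonempty finite set S in (R, d_abs)."""
    return min(abs(x - s) for s in S)


def lemma_567(F, f, K):
    """Return the function g of Lemma 567 for f (a dict on the finite set F), |f| <= K."""
    A = [x for x in F if f[x] <= -K / 3]
    B = [x for x in F if f[x] >= K / 3]

    def g(x):
        if A and B:
            a, b = dist(x, A), dist(x, B)
            return (K / 3) * (a - b) / (a + b)
        if B:  # A is empty
            return (K / 3) * (1 - min(1.0, dist(x, B)))
        if A:  # B is empty
            return (K / 3) * (min(1.0, dist(x, A)) - 1)
        return 0.0

    return g


def tietze_partial_sum(F, f, K, steps):
    """Return x -> g_1(x) + ... + g_steps(x), following the proof of Theorem 566."""
    residual = dict(f)   # f minus the terms found so far, on F
    bound = K            # current bound (2/3)^n K for the residual
    terms = []
    for _ in range(steps):
        g = lemma_567(F, residual, bound)
        terms.append(g)
        residual = {x: residual[x] - g(x) for x in F}
        bound *= 2 / 3
    return lambda x: sum(g(x) for g in terms)


F = [0.0, 0.25, 0.5, 0.75, 1.0]
f = {x: 3 * math.cos(math.pi * x) for x in F}
ext = tietze_partial_sum(F, f, 3.0, 40)
print(max(abs(ext(x) - f[x]) for x in F))   # at most about 3 * (2/3)**40
print(ext(-1.0), ext(0.1), ext(2.0))        # values off F stay strictly inside (-3, 3)
```

After $n$ steps the partial sum agrees with $f$ on $F$ up to $(2/3)^n K$, which is exactly the estimate (6.10); the full extension is the uniform limit of these partial sums.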
6.5 Complete Metric Spaces and the Completion of a Metric Space
In iterative processes usual in numerical computations and most particularly in the way a computer handles the limit of a sequence—say in root-finding programs— we need to instruct the process (the computer) when to stop. Usually this is done by checking the proximity of successive outputs and telling the machine to finalize the process when this proximity is small enough. Note that, since behind the concept of a limit there is an infinite process, it is absolutely out of reach for a computer to
actually find the limit of a sequence. What (more or less) a computer does is to verify that a certain sequence is Cauchy.

Remark 569 Observe that every convergent sequence $\{x_n\}_{n=1}^{\infty}$ in a metric space $(M, d)$ is Cauchy, and that every Cauchy sequence there is bounded. Indeed, for the first assertion, observe that given $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $d(x_n, x) < \varepsilon$ for all $n \ge N$, where $x$ is the limit of the sequence. This shows that $d(x_n, x_m) \le d(x_n, x) + d(x, x_m) < 2\varepsilon$ for all $n, m \ge N$, and so the sequence is Cauchy. For the second statement, if $\{x_n\}_{n=1}^{\infty}$ is Cauchy then there exists $N \in \mathbb{N}$ such that $d(x_n, x_m) < 1$ for all $n, m \ge N$. In particular, $d(x_n, x_N) < 1$ for all $n \ge N$. Let $r := \max\{1, d(x_n, x_N) : n = 1, 2, \dots, N\}$. Then $\{x_n : n \in \mathbb{N}\} \subset B[x_N, r]$, and the conclusion follows. ®

Once the Cauchy property is (more or less) checked, the computer may conclude that the sequence has a limit. This is not true in general, since not every Cauchy sequence in a metric space converges. The most conspicuous example is the space $(\mathbb{Q}, d)$, where $d$ is the absolute-value distance. Write the irrational number $\sqrt{2}$ as the limit of a sequence $\{x_n\}_{n=1}^{\infty}$ of rational numbers (see Proposition 85). Then $\{x_n\}_{n=1}^{\infty}$ is certainly Cauchy (see Remark 569), although it does not converge in $\mathbb{Q}$. Thus we introduce the following concept.

Definition 570 A metric space $(M, d)$ is said to be complete if every Cauchy sequence in $M$ converges to a point in $M$.

Remark 571 A normed space $(X, \|\cdot\|)$ is said to be complete whenever the metric space $(X, d)$ is complete, where $d$ is the distance defined by the norm $\|\cdot\|$ (see Eq. (6.1)). A normed space that is complete is called a Banach space. Banach spaces will be considered at some length in Sect. 11.1. ®

Proposition 572 Let $(M, d)$ be a metric space and let $S$ be a subset of $M$. Denote also by $d$ the restriction of the metric $d$ to $S$. Then
(i) If $(M, d)$ is complete and $S$ is closed, then the metric space $(S, d)$ is complete.
(ii) If the metric space $(S, d)$ is complete, then $S$ is a closed subset of $M$.

Proof
(i) Assume that $S$ is closed and that the metric space $(M, d)$ is complete. Given a Cauchy sequence $\{x_n\}_{n=1}^{\infty}$ in $S$, it is a Cauchy sequence in $(M, d)$, and so it converges to an element $x \in M$. Since $S$ is closed, it follows that $x \in S$. This proves that $(S, d)$ is complete.
(ii) Assume that the space $(S, d)$ is complete. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $S$ that converges to $x \in M$. The sequence $\{x_n\}_{n=1}^{\infty}$ is Cauchy in the space $(S, d)$, hence it has a limit $s$ in $S$. The uniqueness of the limit shows that $x = s$ $(\in S)$, hence $S$ is closed.

Example 573 Let us check completeness for the list of examples given in Examples 549, 551, 556, and 565. We keep the numbering there.
1. The metric space $(\mathbb{R}, d_{\mathrm{abs}})$ of all real numbers with the usual metric is complete. This is the content of Theorem 45 (see also Theorem 152).
2. (a) Due to the fact that the space $(\mathbb{R}, d_{\mathrm{abs}})$ is complete (see the previous item), and that $\|\cdot\|_1$ induces $d_{\mathrm{abs}}$ on $\mathbb{R}$, the normed space $(\mathbb{R}, \|\cdot\|_1)$ is complete. It is, then, a Banach space. Incidentally, observe that the Euclidean norm $\|\cdot\|_2$ on $\mathbb{R}$ coincides with $\|\cdot\|_1$.
(b) For all $n \in \mathbb{N}$, the metric space $(\mathbb{R}^n, \|\cdot\|_2)$ is complete. Indeed, if $\{x_k\}_{k=1}^{\infty}$ is a Cauchy sequence in $(\mathbb{R}^n, \|\cdot\|_2)$, given $\varepsilon > 0$ there exists $k_0 \in \mathbb{N}$ such that $\|x_j - x_k\|_2 < \varepsilon$ for all $j, k \ge k_0$. In particular, given $i \in \{1, 2, \dots, n\}$, we have $|x_j^{(i)} - x_k^{(i)}| < \varepsilon$ for all $j, k \ge k_0$, where $x_k = (x_k^{(1)}, \dots, x_k^{(n)})$ for all $k \in \mathbb{N}$. It follows that each of the sequences $\{x_k^{(i)}\}_{k=1}^{\infty}$ is Cauchy. By 1 above, it converges to some $x^{(i)} \in \mathbb{R}$. Put $x := (x^{(1)}, \dots, x^{(n)})$. Since $\big(\sum_{i=1}^{n} |x_j^{(i)} - x_k^{(i)}|^2\big)^{1/2} < \varepsilon$ for all $j, k \ge k_0$, by passing to the limit when $j \to \infty$ we get $\big(\sum_{i=1}^{n} |x^{(i)} - x_k^{(i)}|^2\big)^{1/2} \le \varepsilon$ for all $k \ge k_0$, i.e., $\|x - x_k\|_2 \le \varepsilon$ for all $k \ge k_0$. This shows that $\{x_k\}_{k=1}^{\infty}$ converges (to $x$).
(c) If $\Gamma$ is an arbitrary nonempty set, the normed space $(\ell_\infty(\Gamma), \|\cdot\|_\infty)$ (see Example 549.2c) is complete. Indeed, let $\{x_n\}_{n=1}^{\infty}$ be a Cauchy sequence in $\ell_\infty(\Gamma)$. It was proved in Remark 569 that it is bounded, so there exists $K > 0$ such that $d_\infty(x_n, 0) \le K$ for all $n \in \mathbb{N}$. Note, too, that for any $\gamma \in \Gamma$, the sequence $\{x_n(\gamma)\}_{n=1}^{\infty}$ is Cauchy; thus, by Theorem 152, it converges, say, to $x(\gamma)$. Since $|x_n(\gamma)| \le K$ for all $n \in \mathbb{N}$ and all $\gamma \in \Gamma$, we have $|x(\gamma)| \le K$ for all $\gamma \in \Gamma$, so the function $x : \Gamma \to \mathbb{R}$ so defined belongs to $\ell_\infty(\Gamma)$. It remains to prove that $d_\infty(x, x_n) \to 0$. This is simple: given $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $d_\infty(x_n, x_m) < \varepsilon$ for all $n, m \ge N$. In particular, $|x_n(\gamma) - x_m(\gamma)| < \varepsilon$ for all $\gamma \in \Gamma$ and all $n, m \ge N$. Letting $m \to \infty$ we get $|x_n(\gamma) - x(\gamma)| \le \varepsilon$ for all $\gamma \in \Gamma$, hence $d_\infty(x, x_n) \le \varepsilon$ for all $n \ge N$. This proves the result.
3. The metric space $(S, d)$, where $S$ is a nonempty set and $d$ is the discrete metric, is complete. Indeed, let $\{s_n\}_{n=1}^{\infty}$ be a Cauchy sequence. Given $\varepsilon \in (0, 1)$, there exists $n_0$ such that $d(s_n, s_m) < \varepsilon$ for all $n, m \ge n_0$. This implies that $s_n = s_m$ for all $n, m \ge n_0$, and so $\{s_n\}_{n=1}^{\infty}$ is eventually constant, hence convergent.
4. The normed space $(C[0, 1], \|\cdot\|_\infty)$ is complete, i.e., it is a Banach space. This is a consequence of the fact that $(\ell_\infty([0, 1]), \|\cdot\|_\infty)$ is complete (Example 573.2c above), of Theorem 463, and of Proposition 572. Indeed, $C[0, 1]$ is a closed subset of the space $\ell_\infty([0, 1])$ endowed with the norm $\|\cdot\|_\infty$, because the uniform limit of a sequence of continuous functions is itself continuous (see Theorem 463).
5. The metric space $(\mathbb{Q}, d_{\mathrm{abs}})$ of all rational numbers with its usual metric is not complete. This was proved in the paragraph preceding Definition 570. An alternative argument uses Proposition 572: the set $\mathbb{Q}$ is not closed in the complete metric space $(\mathbb{R}, d_{\mathrm{abs}})$ (Example 573.1 above).
6. The metric space $(\mathbb{P}, d_{\mathrm{abs}})$ of all irrational numbers with its usual metric is not complete. It is enough to observe that every rational number is the limit of a sequence $\{x_n\}_{n=1}^{\infty}$ of irrational numbers (see Proposition 85). Thus, the sequence $\{x_n\}_{n=1}^{\infty}$ is Cauchy, although it does not converge in the space $\mathbb{R} \setminus \mathbb{Q}$. Again, an
alternative approach is to use Proposition 572: the set $\mathbb{R} \setminus \mathbb{Q}$ is not closed in the complete metric space $(\mathbb{R}, d_{\mathrm{abs}})$ (Example 573.1 above).
7. The metric space $(\mathbb{N}, d_{\mathrm{abs}})$ of all natural numbers with its usual metric is complete. The argument is similar to the one used in the case of Example 573.3. Precisely, every Cauchy sequence is eventually constant, hence convergent.
8. The metric space $(\mathbb{R}, d_{\arctan})$ is not complete. Indeed, the sequence $\{1, 2, \dots\}$ is readily seen to be Cauchy in the metric $d_{\arctan}$. However, it does not converge in this metric, since $\arctan n \to \pi/2$ and $\pi/2$ is not a value of $\arctan$. See Remark 574 below.
9. The metric space $(S_0, d_{\mathrm{abs}})$, where $S_0$ consists of the elements of a sequence of real numbers that converges to 0 together with the real number 0, is complete. Indeed, this set is closed in the complete metric space $(\mathbb{R}, d_{\mathrm{abs}})$ (Example 573.1 above) and we can apply Proposition 572. An argument to prove that $S_0$ is closed in $(\mathbb{R}, d_{\mathrm{abs}})$ is in Exercise 13.99.
10. The metric space $(S, d_{\mathrm{abs}})$, where $S$ consists of the elements of a sequence of real numbers that converges to 0, without the real number 0, is not complete. Indeed, since $0 \notin S$, the set $S$ is not closed in the complete metric space $(\mathbb{R}, d_{\mathrm{abs}})$ (Example 573.1 above), and we can apply Proposition 572.
11. The metric space $(B_{\ell_\infty(\Gamma)}, d_\infty)$ is complete, since $B_{\ell_\infty(\Gamma)}$ is closed in the Banach space $(\ell_\infty(\Gamma), \|\cdot\|_\infty)$ (Example 573.2c) and we may then apply Proposition 572.
12. The closed unit ball of the space $(C[0, 1], \|\cdot\|_\infty)$, endowed with the restriction of the metric $d_\infty$ on $(C[0, 1], \|\cdot\|_\infty)$, is a complete metric space, as follows from the fact that it is a closed subset of the Banach space $(C[0, 1], \|\cdot\|_\infty)$ (see Example 573.4 above) and from Proposition 572.
13. The metric space $([0, 1], d_{\mathrm{abs}})$ is complete, since $[0, 1]$ is a closed subset of the complete metric space $(\mathbb{R}, d_{\mathrm{abs}})$ (Example 573.1 above) and we may then use Proposition 572.
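14. The metric space $([0, 1), d_{\mathrm{abs}})$ is not complete, since the set $[0, 1)$ is not closed in the complete metric space $(\mathbb{R}, d_{\mathrm{abs}})$ (Example 573.1 above) and we may then use Proposition 572.
15. For $p \ge 1$, the space $(\ell_p(\mathbb{N}), \|\cdot\|_p)$ is a Banach space. Indeed, let $\{x^k\}_{k=1}^{\infty}$ be a Cauchy sequence in $\ell_p(\mathbb{N})$, where $x^k = (x_i^k)_{i=1}^{\infty}$. Given $\varepsilon > 0$, find $k_0$ such that
$$\Big( \sum_{i=1}^{\infty} |x_i^k - x_i^l|^p \Big)^{1/p} \le \varepsilon \tag{6.11}$$
for every $k, l \ge k_0$. In particular, $|x_i^k - x_i^l| \le \varepsilon$ for every $k, l \ge k_0$ and $i \in \mathbb{N}$, hence the sequence $\{x_i^k\}_{k=1}^{\infty}$ is Cauchy in $\mathbb{R}$ and so converges to some $x_i \in \mathbb{R}$ for every $i \in \mathbb{N}$. Put $x = (x_i)$. We will show that $x \in \ell_p(\mathbb{N})$. Since every Cauchy sequence is bounded (see Remark 569), there is a constant $C > 0$ such that $\big(\sum_{i=1}^{\infty} |x_i^k|^p\big)^{1/p} \le C$ for every $k$. Therefore $\big(\sum_{i=1}^{n} |x_i^k|^p\big)^{1/p} \le C$ for all $n, k \in \mathbb{N}$. By letting $k \to \infty$ we get $\big(\sum_{i=1}^{n} |x_i|^p\big)^{1/p} \le C$ for every $n \in \mathbb{N}$. Therefore $\big(\sum_{i=1}^{\infty} |x_i|^p\big)^{1/p} \le C$, and so $x \in \ell_p(\mathbb{N})$.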
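We will now show that $x^k \to x$ in $(\ell_p(\mathbb{N}), \|\cdot\|_p)$. Given $\varepsilon > 0$ and $n \in \mathbb{N}$, we restrict the sum in (6.11) to its first $n$ terms, let $l \to \infty$, and get $\big(\sum_{i=1}^{n} |x_i^k - x_i|^p\big)^{1/p} \le \varepsilon$ for every $n \in \mathbb{N}$ and every $k \ge k_0$. We let $n \to \infty$ to obtain
$$\Big( \sum_{i=1}^{\infty} |x_i^k - x_i|^p \Big)^{1/p} \ \big(= \|x^k - x\|_p\big) \le \varepsilon$$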
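for every $k \ge k_0$. Therefore $x^k \to x$ in $(\ell_p(\mathbb{N}), \|\cdot\|_p)$.
16. For $p \ge 1$ and for a measurable subset $E$ of $\mathbb{R}$, the space $(L_p(E), \|\cdot\|_p)$ is a Banach space. The proof for $p = 2$ is in Proposition 965, and for $p = 1$ in Exercise 13.485. For other $p \in [1, +\infty)$ the proof is similar.
17. For a measurable subset $E$ of $\mathbb{R}$, the space $(L_\infty(E), \|\cdot\|_\infty)$ is a Banach space. For the proof, we refer to, e.g., [FHHMZ11, Proposition 1.24].
18. For an infinite set $\Gamma$, the space $(c_0(\Gamma), \|\cdot\|_\infty)$ is a Banach space, since it is a closed subset of the Banach space $(\ell_\infty(\Gamma), \|\cdot\|_\infty)$ (see Example 573.2c). Indeed, if $x \in \ell_\infty(\Gamma) \setminus c_0(\Gamma)$, there exists $\varepsilon > 0$ such that $\{\gamma \in \Gamma : |x(\gamma)| > \varepsilon\}$ is infinite. Note that if $y \in \ell_\infty(\Gamma)$ satisfies $\|y - x\|_\infty < \varepsilon/2$, then $\{\gamma \in \Gamma : |x(\gamma)| > \varepsilon\} \subset \{\gamma \in \Gamma : |y(\gamma)| > \varepsilon/2\}$, hence this last set is infinite, and so $y \notin c_0(\Gamma)$. This shows that $\ell_\infty(\Gamma) \setminus c_0(\Gamma)$ is open.
19. For an infinite set $\Gamma$, the space $(c_{00}(\Gamma), \|\cdot\|_\infty)$ is not complete. In fact, it is dense in $c_0(\Gamma)$, and obviously $c_{00}(\Gamma) \ne c_0(\Gamma)$, so $c_{00}(\Gamma)$ is not closed in $c_0(\Gamma)$ and Proposition 572 applies. To prove denseness, fix $x \in c_0(\Gamma)$ and $\varepsilon > 0$. Find a finite subset $\Gamma_0$ of $\Gamma$ such that $|x(\gamma)| < \varepsilon$ for all $\gamma \in \Gamma \setminus \Gamma_0$. Put $x_0(\gamma) = x(\gamma)$ for all $\gamma \in \Gamma_0$, and $x_0(\gamma) = 0$ otherwise. Then $x_0 \in c_{00}(\Gamma)$ and $\|x - x_0\|_\infty \le \varepsilon$. ♦

Remark 574 Observe that completeness is not in general preserved by homeomorphisms. We already provided several examples of this phenomenon, namely:
(i)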
We showed in Example 562.1 that the two metric spaces $(\mathbb{R}, d_{\mathrm{abs}})$ and $((0, 1), d_{\mathrm{abs}})$ are homeomorphic. However, the first one is complete (see Example 573.1), but not the second one, due to Proposition 572.
(ii) It was proved in Example 562.2 that the two metric spaces $(\mathbb{R}, d_{\mathrm{abs}})$ and $(C_0, d_2)$ are homeomorphic (see Fig. 6.5), where $C_0$ denotes the subset of $\mathbb{R}^2$ consisting of the unit circle centered at $(0, 0)$ minus the “North Pole.” The first one is complete (see Example 573.1), but not the second one; indeed, a sequence $\{x_n\}_{n=1}^{\infty}$ in $C_0$ that $d_2$-converges to $N$ in $\mathbb{R}^2$ is $d_2$-Cauchy, although it does not converge in $(C_0, d_2)$.
(iii) The two metric spaces $(\mathbb{R}, d_{\mathrm{abs}})$ and $(\mathbb{R}, d_{\arctan})$ are homeomorphic. In fact, the identity mapping is a homeomorphism. This was proved in Example 564.1. However, the first space is complete (see Example 573.1), while the second one is not (see Example 573.8). ®
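Item (iii) of Remark 574 can be made concrete with a small experiment. The sketch below compares the sequence $x_n = n$ under the two metrics: its tails have tiny $d_{\arctan}$-diameter, so a stopping rule of the kind described at the beginning of this section would halt, yet the sequence has no limit in $\mathbb{R}$. The function `tail_diameter` and its truncation parameter `horizon` are assumptions introduced for this illustration; a finite scan can only suggest, not prove, the Cauchy property.

```python
import math


def d_abs(x, y):
    """Absolute-value metric on R."""
    return abs(x - y)


def d_arctan(x, y):
    """The metric |arctan x - arctan y| on R."""
    return abs(math.atan(x) - math.atan(y))


def tail_diameter(d, n, horizon=10_000):
    """Largest distance d(n, m) over the integers n <= m <= n + horizon."""
    return max(d(float(n), float(m)) for m in range(n, n + horizon + 1))


for n in (10, 100, 1000):
    # Under d_arctan the value is below arctan(1/n) < 1/n; under d_abs it equals horizon.
    print(n, tail_diameter(d_arctan, n), tail_diameter(d_abs, n))
```

The bound used in the comment comes from $\pi/2 - \arctan n = \arctan(1/n)$ for $n > 0$.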
The set $\mathbb{Q}$ is dense in $(\mathbb{R}, d_{\mathrm{abs}})$, and the latter is a complete metric space (see Example 573.1); we say in this particular case that $(\mathbb{R}, d_{\mathrm{abs}})$ is a completion of $(\mathbb{Q}, d_{\mathrm{abs}})$. The precise definition of a completion is given below.

Definition 575 A couple consisting of a complete metric space $(\widetilde M, \tilde d)$ and an isometry $J : M \to \widetilde M$ is called a completion of the metric space $(M, d)$ if $J(M)$ is a dense subset of $\widetilde M$.

The reader may find in references (see, e.g., [KoFo75, II.7.4]) the description of a certain mechanism to construct such a completion that mimics one of the ways $\mathbb{R}$ was constructed starting from $\mathbb{Q}$. It consists in identifying the given metric space $(M, d)$ with a subset of the set of all Cauchy sequences (in fact, with the family of classes of Cauchy sequences). Some details will be given in Exercise 13.382. A more direct approach is to rely on two facts: (i) the space $(\ell_\infty(\Gamma), \|\cdot\|_\infty)$ is complete (proved in Example 573.2c), and (ii) this kind of space is universal for the class of metric spaces (i.e., every metric space is isometric to a subspace of one of them). This is proved in Proposition 576 below.

Proposition 576 Every metric space $(M, d)$ is isometric to a subset of $(\ell_\infty(\Gamma), d_\infty)$ for some set $\Gamma$.

Proof Fix $x_0 \in M$ and let $\Gamma$ be the set $M$ itself. The mapping $\varphi : M \to \ell_\infty(\Gamma)$ given by
$$\varphi(x)(\gamma) := d(x, \gamma) - d(x_0, \gamma), \quad \text{for } x \in M \text{ and } \gamma \in \Gamma\ (= M), \tag{6.12}$$
is an isometry into $\ell_\infty(\Gamma)$ (see Fig. 6.7). Note that $\varphi(x)$ is indeed bounded, since $|\varphi(x)(\gamma)| \le d(x, x_0)$ for all $\gamma \in \Gamma$. For $x, y \in M$ and $\gamma \in \Gamma$,
$$(\varphi(x) - \varphi(y))(\gamma) = d(x, \gamma) - d(x_0, \gamma) - d(y, \gamma) + d(x_0, \gamma) = d(x, \gamma) - d(y, \gamma) \le d(x, y), \tag{6.13}$$
as follows from the fact that $d(x, \gamma) \le d(x, y) + d(y, \gamma)$. Since this is true for every $\gamma \in \Gamma$, and the roles of $x$ and $y$ can be interchanged, we get $\|\varphi(x) - \varphi(y)\|_\infty \le d(x, y)$. By taking $\gamma = y$ in (6.13) we get $(\varphi(x) - \varphi(y))(y) = d(x, y)$. This shows, finally, that $\|\varphi(x) - \varphi(y)\|_\infty = d(x, y)$, so $\varphi$ is an isometry.

Corollary 577 Every metric space has a completion.

Proof The couple $((\overline{\varphi(M)}, d_\infty), \varphi)$, where $\varphi$ is the isometry built in the proof of Proposition 576 and the closure is taken in $\ell_\infty(\Gamma)$, is a completion of $(M, d)$. Indeed, the space $(\ell_\infty(\Gamma), d_\infty)$ is complete (Example 573.2c), so $(\overline{\varphi(M)}, d_\infty)$ is a complete metric space (see Proposition 572). Certainly, $\varphi(M)$ is dense in $\overline{\varphi(M)}$.

Proposition 578 The completion of a metric space is unique up to isometries.

Proof Precisely stated, assume that $((\widetilde M_1, \tilde d_1), J_1)$ and $((\widetilde M_2, \tilde d_2), J_2)$ are two completions of $(M, d)$. Then there is an onto isometry $\tilde J : \widetilde M_1 \to \widetilde M_2$ such that $\tilde J \circ J_1 = J_2$. In order to prove this statement, consider the isometry $J := J_2 \circ J_1^{-1} : J_1(M) \to J_2(M)$. Since $J_1(M)$ is dense in $\widetilde M_1$ and $J_2(M)$ is dense in $\widetilde M_2$, there is a unique extension $\tilde J$ of $J$ to $\widetilde M_1$. Indeed, if $\tilde y \in \widetilde M_1$ is the limit of a sequence $\{y_n\}$ in $J_1(M)$,
Fig. 6.7 Proof of Proposition 576: functions ϕ(x) for some x’s (here M = R and x0 = 0)
put $\tilde J(\tilde y) := \lim_n J(y_n)$. This limit exists since $\{J(y_n)\}$ is a Cauchy sequence in the complete space $\widetilde M_2$. It is simple to prove that the mapping $\tilde J$ so defined is an isometry from $\widetilde M_1$ onto $\widetilde M_2$.

Theorem 579 (Cantor) Let $(M, d)$ be a metric space. Then $(M, d)$ is complete if and only if, for every decreasing sequence $\{F_i\}_{i=1}^{\infty}$ of nonempty closed sets in $M$ with $\lim_{i \to \infty} \operatorname{diam}(F_i) = 0$, we have $\bigcap_{i=1}^{\infty} F_i \ne \emptyset$.

Proof Assume that $(M, d)$ is complete. Pick $x_i \in F_i$ for each $i$. Then it is not difficult to show that $\{x_i\}_{i=1}^{\infty}$ is a Cauchy sequence. Thus $x_i \to x$ for some $x \in M$. Given $i$, we have $x_j \in F_i$ for each $j \ge i$. Thus $x \in F_i$, because $F_i$ is closed. Therefore $\bigcap_{i=1}^{\infty} F_i \ne \emptyset$.

Assume now that the condition holds. Given an arbitrary Cauchy sequence $\{x_n\}_{n=1}^{\infty}$ in $M$, let $F_i := \overline{\{x_i, x_{i+1}, \dots\}}$ for each $i \in \mathbb{N}$. The sets $F_i$ are closed and form a decreasing sequence. Observe that $\operatorname{diam}(F_i) \to 0$ as $i \to \infty$. Indeed, given $\varepsilon > 0$ there exists $i_0 \in \mathbb{N}$ such that $d(x_j, x_k) < \varepsilon$ for all $j, k \ge i_0$. This shows that $\operatorname{diam}(F_i) \le \varepsilon$ for all $i \ge i_0$. Thus there is $x \in \bigcap_{i=1}^{\infty} F_i$. We claim that $x_i \to x$ as $i \to \infty$. Indeed, given $\varepsilon > 0$ there exists $i_0 \in \mathbb{N}$ such that $\operatorname{diam}(F_i) < \varepsilon$ for all $i \ge i_0$. For any $i \ge i_0$, we have $x \in F_i$, so there exists $j \ge i$ such that $d(x, x_j) < \varepsilon$ (see Exercise 13.359). Since $\operatorname{diam}(F_i) < \varepsilon$ we get $d(x, x_i) \le d(x, x_j) + d(x_j, x_i) < 2\varepsilon$, and the claim is proved.
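The embedding used in Proposition 576 is easy to test on a finite metric space, where $\ell_\infty(\Gamma)$ reduces to tuples with the maximum distance. The sketch below takes four points of the plane with the Euclidean metric; the helper names `embed` and `d_sup` and the sample points are assumptions chosen for this illustration.

```python
import itertools
import math


def embed(points, d, x0):
    """Map each x to the tuple (d(x, g) - d(x0, g)) over g in points, as in Eq. (6.12)."""
    return {x: tuple(d(x, g) - d(x0, g) for g in points) for x in points}


def d_sup(u, v):
    """Supremum distance between two bounded functions given as equal-length tuples."""
    return max(abs(a - b) for a, b in zip(u, v))


points = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.5), (6.0, -2.0)]
phi = embed(points, math.dist, points[0])

# phi preserves distances: the sup-distance of the images equals the original distance.
for x, y in itertools.product(points, repeat=2):
    assert math.isclose(d_sup(phi[x], phi[y]), math.dist(x, y), abs_tol=1e-12)

print(d_sup(phi[points[0]], phi[points[1]]))   # distance between (0, 0) and (3, 4)
```

As in the proof, the supremum is attained at the coordinate $\gamma = y$, while the triangle inequality keeps every other coordinate below $d(x, y)$.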
6.6 Separable Metric Spaces
The metric space (R, dabs ) (see Example 549.1), contains a countable dense subset, namely Q. The denseness of Q allows to approximate, with a preassigned accuracy, each element in R by an element in Q. Unfortunately, the existence of a countable dense subset of a metric space is not guaranteed in general (see, as an instance, Example 586.2c below). We introduce a definition. Definition 580 A metric space (M, d) is said to be separable whenever there exists a countable and dense subset D of M. Remark 581 The concept of separability depends only on the family of open sets in the space, i.e., of the topology of the space. Hence, if a metric space (M, d) is separable, the space (M, d1 ) is also separable whenever d1 is another equivalent
metric on $M$. To show this directly, it is enough to observe that if $D$ is a countable dense subset of $(M, d)$, it is also dense in $(M, d_1)$, due to the fact that a sequence $\{x_n\}_{n=1}^{\infty}$ in $M$ is $d$-convergent to some $x \in M$ if and only if it is $d_1$-convergent to $x$. ®

The following result characterizes separability in metric spaces. A metric space $(M, d)$ is said to be Lindelöf if every open cover of $M$ has a countable subcover. A subfamily $\mathcal{B}$ of the family of open sets of $M$ is said to be a base for the topology of $(M, d)$ if every open subset of $M$ is a union of elements in $\mathcal{B}$. Given $\delta > 0$, a subset $N$ of $M$ is said to be a $\delta$-net in $M$ if for every $x \in M$ there exists $y \in N$ such that $d(x, y) < \delta$. We say that a subset $S$ of a metric space $(M, d)$ is, for some $\delta > 0$, $\delta$-separated, if $d(x, y) \ge \delta$ for every $x, y \in S$, $x \ne y$.

Theorem 582 Let $(M, d)$ be a metric space. Then, the following conditions are equivalent.
(i) $(M, d)$ is separable.
(ii) The topology of $(M, d)$ has a countable base.
(iii) $(M, d)$ is Lindelöf.
(iv) For each $\delta > 0$ there exists a countable $\delta$-net $N_\delta$ in $M$.
(v) For each $\delta > 0$, any $\delta$-separated set in $M$ is countable.
Proof (i)⇒(ii) Let $D$ be a countable dense subset of $M$. We claim that $\mathcal{B} := \{B(y, r) : y \in D,\ r \in \mathbb{Q},\ r > 0\}$ is a (countable) base for the topology of $(M, d)$. Indeed, let $O$ be an arbitrary nonempty open subset of $M$. Fix $x \in O$ and let $r \in \mathbb{Q}$, $r > 0$, be such that $B(x, r) \subset O$. Since $D$ is dense in $M$, there exists $y \in D$ such that $d(y, x) < r/3$. Note that $x \in B(y, r/3) \in \mathcal{B}$. If $z \in B(y, r/3)$, then $d(z, x) \le d(z, y) + d(y, x) < r/3 + r/3 < r$, hence $x \in B(y, r/3) \subset B(x, r) \subset O$. This shows that $O$ can be written as a union of elements in $\mathcal{B}$.

(ii)⇒(iii) Assume that $\mathcal{O}$ is an open cover of $M$. Let $\mathcal{B}$ be a countable base of $M$. For each $B \in \mathcal{B}$ that is contained in some $O \in \mathcal{O}$ select a single element $O(B) \in \mathcal{O}$ such that $B \subset O(B)$. In this way we obtain a countable subfamily of $\mathcal{O}$. We claim that the subfamily so obtained covers $M$. Indeed, given $x \in M$ there exists $O \in \mathcal{O}$ such that $x \in O$. Since $\mathcal{B}$ is a base for the topology of $(M, d)$, there exists $B \in \mathcal{B}$ such that $x \in B \subset O$. Then $x \in B \subset O(B)$ and this proves the claim.

(iii)⇒(iv) Given $\delta > 0$, the family $\{B(x, \delta) : x \in M\}$ covers $M$, hence there exists a countable subcover $\{B(y, \delta) : y \in N\}$. It is clear that the set $N$ is a (countable) $\delta$-net.

(iv)⇒(v) Fix $\delta > 0$ and let $S_\delta$ be a $\delta$-separated subset of $M$. By (iv), there exists a countable $\delta/2$-net $N_{\delta/2}$ in $M$. Given $s \in S_\delta$, we can find $\phi(s) \in N_{\delta/2}$ such that $d(s, \phi(s)) < \delta/2$. This defines a mapping $\phi : S_\delta \to N_{\delta/2}$. Assume that $\phi(s_1) = \phi(s_2)$ for two $s_1, s_2 \in S_\delta$. Then $d(s_1, s_2) \le d(s_1, \phi(s_1)) + d(\phi(s_1), s_2) < \delta/2 + \delta/2 = \delta$, hence $s_1 = s_2$. This proves that $\phi$ is one-to-one, hence $S_\delta$ is countable.

(v)⇒(i) Fix $\delta > 0$. Let $\mathcal{F}_\delta$ be the family of all subsets $F$ of $M$ that are $\delta$-separated. The family $\mathcal{F}_\delta$, with the partial order defined by the inclusion (see Sect. 12.6), has the property that every chain (i.e., every totally ordered subset) has an upper bound (namely, its union). Then, by Zorn’s Lemma (see Sect. 12.6.3), $\mathcal{F}_\delta$ has a maximal
element $S_\delta$. It is clear that $S_\delta$ is a $\delta$-net: otherwise there would be a point $x \in M$ with $d(x, s) \ge \delta$ for all $s \in S_\delta$, and $S_\delta \cup \{x\}$ would be a strictly larger $\delta$-separated set. By assumption, $S_\delta$ is countable. It follows that the set $\bigcup_{n=1}^{\infty} S_{1/n}$ is dense and countable, hence $(M, d)$ is separable.

Corollary 583 Every subspace of a separable metric space is again separable.

Proof It is enough to observe that (ii) in Theorem 582 is inherited by subspaces. An alternative approach is the following. Let $Y$ be a subset of the separable metric space $(M, d)$ and let $\delta > 0$. By (iv) in Theorem 582, there exists a countable $\delta$-net $N_\delta$ in $M$. Let $N'_\delta$ be the subset of $N_\delta$ formed by the points $x$ such that $B(x, \delta) \cap Y \ne \emptyset$. For each point $x \in N'_\delta$, choose $y \in B(x, \delta) \cap Y$. The collection of such chosen $y$'s is easily seen to form a (countable) $2\delta$-net in $Y$, and the result follows from (iv) in Theorem 582.

Remark 584 We remark that any attempt to show directly that the subspace $\mathbb{P}$ consisting of all irrational numbers contains a countable dense set will lead the reader to appreciate the power of Corollary 583. ®

Regarding the action of a continuous mapping on a separable metric space, we have the following stability result.

Proposition 585 The continuous image of a separable metric space into a metric space is separable.

Proof Let $(M, d)$ be a separable metric space. There exists a countable and dense subset $D$ of $M$. Let $(N, r)$ be a metric space and $f : M \to N$ a continuous function. The set $f(D)$ is countable. It is also dense in $f(M)$; indeed, if $f(x)$ is an element in $f(M)$, there exists a sequence $\{x_n\}_{n=1}^{\infty}$ in $D$ that $d$-converges to $x$. Since $f$ is continuous, the sequence $\{f(x_n)\}_{n=1}^{\infty}$ (a sequence in $f(D)$) $r$-converges to $f(x)$. It follows that $f(M)$ is separable.

Example 586 We will now check separability for the metric spaces in Examples 549, 551, 556, and 565.
1. The space $(\mathbb{R}, d_{\mathrm{abs}})$ is separable. This is a consequence of the fact that $\mathbb{Q}$ is countable and dense in $\mathbb{R}$ (see Proposition 58 and Theorem 63, respectively).
2. (a) The space $(\mathbb{R}, \|\cdot\|_1)$ is a separable Banach space. Since $\|\cdot\|_1$ induces the distance $d_{\mathrm{abs}}$, this follows from Example 586.1.
(b) The space $(\mathbb{R}^n, \|\cdot\|_2)$ is separable, due to the fact that the set consisting of all vectors $(r_1, \dots, r_n)$ having rational coordinates is countable (see Exercise 13.42) and dense in $(\mathbb{R}^n, \|\cdot\|_2)$ (an obvious extension of the fact that every real number is the limit of a sequence of rational numbers).
(c) Let $\Gamma$ be a nonempty set. The space $(\ell_\infty(\Gamma), \|\cdot\|_\infty)$ is separable if and only if $\Gamma$ is finite. In order to see this, observe that if $\Gamma$ is finite, then the space is just $(\mathbb{R}^n, \|\cdot\|_\infty)$, where $n$ is the cardinality of $\Gamma$. As in Example 586.2b above, the set of all vectors having rational coordinates is countable and dense in $(\mathbb{R}^n, \|\cdot\|_\infty)$. Assume now that $\Gamma$ is infinite. Given $J \subset \Gamma$, put $\chi_J$ for the characteristic function of $J$, and observe that $\{\chi_J : J \subset \Gamma\}$ is a 1-separated set in $(\ell_\infty(\Gamma), \|\cdot\|_\infty)$ (see the paragraph after Remark 581). Since the family $\mathcal{P}(\Gamma)$ of all subsets of $\Gamma$ is uncountable
(see (d) in Exercise 13.32), the nonseparability of (ℓ_∞(Γ), ‖·‖_∞) follows from (v) in Theorem 582.
3. Obviously, the metric space (S, d), where S is a nonempty set and d is the discrete metric (see Example 549.3), is separable if and only if S is finite or countably infinite.
4. The space (C[0, 1], ‖·‖_∞) is separable. This is a consequence of the Weierstrass Theorem 490. Indeed, the set of all polynomials on [0, 1] having rational coefficients is countable, and it is dense in the subspace of (C[0, 1], ‖·‖_∞) consisting of all polynomials, which is in turn dense in (C[0, 1], ‖·‖_∞) by the Weierstrass theorem.
5. The metric space (Q, d_abs) is separable because it is, itself, countable.
6. Due to the fact that (R, d_abs) is separable (Example 586.1) and to Corollary 583, the space P of all irrational numbers, endowed with the metric induced by d_abs, is separable. An attempt to prove this result directly may illustrate the usefulness of Corollary 583 (see Exercise 13.383).
7. The space (N, d_abs) is separable, due to the fact that N is countable.
8. The space (R, d_arctan) is separable. To see this, observe that the identity mapping I from (R, d_abs) onto (R, d_arctan) is continuous. Even more, it is a homeomorphism. This was proved in Example 564.1. Since (R, d_abs) is separable (Example 586.1), the conclusion follows from Proposition 585.
9. The space (S_0, d_abs) := ({x_n : n ∈ N} ∪ {0}, d_abs), where {x_n}_{n=1}^∞ is a sequence in R that converges in d_abs to 0, is separable since {x_n : n ∈ N} ∪ {0} is countable.
10. The space (S, d_abs) := ({x_n : n ∈ N}, d_abs), where {x_n}_{n=1}^∞ is a sequence in R that converges in d_abs to 0, is separable since {x_n : n ∈ N} is countable.
11. Let Γ be a nonempty set. The space (B_{(ℓ_∞(Γ), ‖·‖_∞)}, d_∞) is separable if Γ is finite, as follows from Example 586.2c above and Corollary 583. However, if Γ is infinite, it is not separable, due to the fact that the 1-separated set built in Example 586.2c above is contained in B_{(ℓ_∞(Γ), ‖·‖_∞)}.
12. The space (B_{(C[0,1], ‖·‖_∞)}, d_∞) is separable, since it is a subspace of the separable metric space (C[0, 1], ‖·‖_∞) (Example 586.4), and we may apply Corollary 583.
13. The space [0, 1], endowed with the metric d_abs, is separable, since it is a subspace of the separable metric space (R, d_abs) (Example 586.1), and we may apply Corollary 583.
14. The space [0, 1), endowed with the metric d_abs, is separable, since it is a subspace of the separable metric space (R, d_abs) (Example 586.1), and we may apply Corollary 583.
15. For p ≥ 1, the space (ℓ_p(N), ‖·‖_p) is separable. Indeed, consider in ℓ_p(N) the family F formed by all finitely supported vectors with rational coefficients. Then F is countable. We will show that F is dense in (ℓ_p(N), ‖·‖_p). Given x ∈ ℓ_p(N)
and ε > 0, choose n_0 ∈ N, n_0 > 1, such that Σ_{i=n_0}^∞ |x_i|^p ≤ ε^p/2, and then find rational numbers r_1, r_2, ..., r_{n_0−1} such that |x_i − r_i|^p ≤ ε^p/(2n_0) for i = 1, ..., n_0 − 1. Then s := (r_1, r_2, ..., r_{n_0−1}, 0, ...) is in F and
\[
\|s - x\|_p^p = \sum_{i=1}^{n_0-1} |x_i - r_i|^p + \sum_{i=n_0}^{\infty} |x_i|^p \le \sum_{i=1}^{n_0-1} \frac{\varepsilon^p}{2n_0} + \frac{\varepsilon^p}{2} < \varepsilon^p.
\]
Therefore F is dense in (ℓ_p(N), ‖·‖_p).
16. For p ≥ 1, the Banach space (L_p[0, 1], ‖·‖_p) is separable. Indeed, it follows from the definition that the subspace of all measurable step functions is ‖·‖_p-dense in L_p[0, 1], and each measurable step function can be approximated in the norm ‖·‖_p by a continuous function. The space C[0, 1] is ‖·‖_∞-separable (Example 586.4), hence ‖·‖_p-separable, and the result follows.
17. The space (L_∞[0, 1], ‖·‖_∞) is not separable. The family of functions {χ_[0,t] : t ∈ [0, 1]} is uncountable and 1-separated. The result follows from (v) in Theorem 582.
18. The space (c_0(N), ‖·‖_∞) is separable. Indeed, the set F of all finitely supported vectors with rational coefficients is a countable dense subset of (c_0(N), ‖·‖_∞). The argument is similar to the one used in Example 586.15. If Γ is uncountable, then (c_0(Γ), ‖·‖_∞) is not separable. Indeed, the set {e_γ : γ ∈ Γ} is 1-separated and uncountable, where, for γ ∈ Γ, e_γ denotes the canonical unit vector associated to γ, i.e., the characteristic function of the set {γ}. We can then use Theorem 582.
19. The space (c_00(N), ‖·‖_∞), as a subspace of the separable Banach space (c_0(N), ‖·‖_∞) (Example 586.18), is itself separable (see Corollary 583). If Γ is uncountable, the space (c_00(Γ), ‖·‖_∞) is not separable (see the argument in Example 586.18). ♦
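The 1-separated family of characteristic functions used in Example 586.2c can be inspected by hand on a small index set. The following Python sketch takes a finite Γ with four elements (an illustrative choice) and checks that any two distinct characteristic functions are at sup-distance exactly 1. The helper names `sup_distance` and `characteristic_functions` are hypothetical, and the computation illustrates the separation only; it does not, of course, prove nonseparability for infinite Γ.

```python
from itertools import combinations


def sup_distance(f, g):
    """Sup-norm distance between two vectors given as equal-length tuples."""
    return max(abs(a - b) for a, b in zip(f, g))


def characteristic_functions(n):
    """All characteristic functions of subsets of {0, ..., n-1}, as 0/1 tuples."""
    return [tuple((k >> i) & 1 for i in range(n)) for k in range(2 ** n)]


# Illustrative finite index set with 4 elements: 2**4 = 16 subsets.
chis = characteristic_functions(4)

# Collect every pairwise sup-distance between distinct characteristic functions.
distances = {sup_distance(f, g) for f, g in combinations(chis, 2)}
print(len(chis), distances)
```

The number of such functions doubles with each new element of Γ while their mutual distances stay equal to 1, which is the mechanism behind the use of (v) in Theorem 582.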
6.7
Polish Spaces
Definition 587 A Polish space is a metric space homeomorphic to a complete and separable metric space.
Polish spaces are so named in the mathematical literature to honor the enormous contribution that the Polish mathematical schools have made to this area of mathematics.
Remark 588
1. Observe that a Polish space is not necessarily complete. For example, the space (R, d_arctan) introduced in Example 556.8 is not complete (see Example 556.8), although the metric spaces (R, d_arctan) and (R, d_abs) are homeomorphic (see Example 564.1), and so (R, d_arctan) is a Polish space. Another example in the same direction can be presented based on Theorem 589 below. The metric space (R, d_abs) is complete and separable. According to Theorem 589, any open subset O of R is, endowed with the induced metric, a Polish space. However, by Proposition 572, if the set O is not empty and is not the whole space R, it cannot be complete in the metric induced by d_abs, since it is not closed (see Proposition 103).
2. Observe that every Polish space can be equipped with a complete compatible metric. To be precise, let (P, d_1) be a Polish space. There exists then a separable
complete metric space (M, ρ) and a homeomorphism F : (P, d_1) → (M, ρ). We can define a metric d_2 on P as d_2(x, y) := ρ(F(x), F(y)) for all x, y ∈ P. The two metrics d_1 and d_2 on P are compatible, i.e., they induce the same family of open subsets of P, due to the fact that F is a homeomorphism. Equivalently, a sequence in P converges in the metric d_1 if and only if it converges in the metric d_2.
3. Note that every Polish space is separable, due to Proposition 585.
4. Every closed subspace of a Polish space is itself Polish. This is a consequence of Proposition 572 and Corollary 583. ®
A wide class of Polish spaces is described in the following result, due to the Russian mathematician P. S. Alexándrov. A consequence of it is that the space (R \ Q, d_abs) is a Polish space, since R \ Q = ⋂_{n=1}^∞ (R \ {q_n}), where Q = {q_n : n ∈ N} is an enumeration of the set of rational points.
Theorem 589 (Alexándrov) Every G_δ-subset of a Polish space is a Polish space.
Proof Let M be a Polish space and let d be a metric on M compatible with its topology such that (M, d) is complete and separable (see Remark 588.2). Let G ⊂ M be a G_δ-set, so G = ⋂_{n=1}^∞ G_n, where G_n is open for n ∈ N. Define a new metric d_0 on G by
\[
d_0(x, y) := d(x, y) + \sum_{n=1}^{\infty} \min\left\{ 2^{-n}, \left| \frac{1}{\operatorname{dist}(x, G_n^c)} - \frac{1}{\operatorname{dist}(y, G_n^c)} \right| \right\}, \quad \text{for } x, y \in G, \tag{6.14}
\]
where dist(x, S) denotes the distance from a point x ∈ M to a set S ⊂ M (see the definition of the distance function in the paragraph before Proposition 557). To check that d_0 is a metric on G, see Exercise 13.384. It is easy to verify that d_0 and d are equivalent on G. Let us prove that (G, d_0) is complete. To this end, let {x_i}_{i=1}^∞ be a d_0-Cauchy sequence in G. In particular, {x_i}_{i=1}^∞ is d-Cauchy, hence it d-converges to some x ∈ M. Moreover, for each n ∈ N, the sequence {1/dist(x_i, G_n^c)}_{i=1}^∞ is Cauchy, hence it converges. This means, in particular, that the sequence {dist(x_i, G_n^c)}_{i=1}^∞ is bounded away from 0.
Obviously, dist(x_i, G_n^c) → dist(x, G_n^c) as i → ∞, hence dist(x, G_n^c) ≠ 0, and so x ∈ G_n (recall that G_n^c is closed). This happens for every n ∈ N, hence x ∈ G.
Recall the construction of the Cantor ternary set C (Definition 277). The intervals ε1, ε2, ..., εn defined there had indices in the set 2 [...] 0, M contains a finite δ-net.
According to the introduction above, every compact metric space is totally bounded. This will be recorded in Proposition 613.
Example 610 The space (R, d_arctan) is totally bounded. Indeed, given ε ∈ (0, π), let c := tan(π/2 − ε/2). Note that R = (−∞, −c) ∪ [−c, c] ∪ (c, +∞). Recall that the identity mapping from (R, d_abs) onto (R, d_arctan) is a homeomorphism (see Example 564.1), so [−c, c] is compact in (R, d_arctan), hence totally bounded there. Note too that each of the two sets (−∞, −c) and (c, +∞) has d_arctan-diameter less than ε. Altogether, this proves the assertion. In particular, this shows that total boundedness is not stable under homeomorphisms. ♦
Remark 611 The reader will have no difficulty in proving that the following four statements regarding a metric space (M, d) are equivalent:
(i) M is totally bounded.
(ii) For every δ > 0, there exists a finite open cover of M by sets of diameter less than δ.
(iii) For every δ > 0, there exists a finite open cover of M by balls of radius less than δ.
(iv) For every δ > 0, the cover {B(x, δ) : x ∈ M} has a finite subcover.
We remark, too, that the two following statements are equivalent:
(v) For every δ > 0, every open cover of M by sets having diameter less than δ has a finite subcover.
(vi) For every δ > 0, every open cover of M by balls having radius less than δ has a finite subcover,
and that they imply (i), i.e., that M is totally bounded. However, there are totally bounded metric spaces such that (v) (equivalently, (vi)) fails. For example, take (0, 1) endowed with the restriction d of the absolute-value metric on R. Obviously, ((0, 1), d) is totally bounded. However, the open cover {B(x, 1/3) : x ∈ (1/3, 2/3)} has no finite subcover. Compactness implies, obviously, (v) (equivalently, (vi)) above. However, there are noncompact metric spaces verifying (v) (equivalently, (vi)). An example is the metric space (M, d), where M := {1, 1/2, 1/3, ...} and d is the restriction to M of the absolute-value metric on R. Certainly, M is not compact (the open cover {B(1/n, 1/(n(n + 1))) : n ∈ N}, each of whose members contains only its center, has no finite subcover). However, any cover by open balls of a fixed radius has a finite subcover, as is easy to see. ®
Proposition 612 Every totally bounded metric space is bounded. Every subspace of a totally bounded metric space, endowed with the restricted distance, is again totally bounded.
Proof The first statement follows from the definition. For the second, we proceed as in the proof of Corollary 583. Precisely, let Y be a subset of the totally bounded metric space (M, d) and let δ > 0. Let N_δ be a finite δ-net in M. Let N'_δ be the subset of N_δ formed by the points x such that B(x, δ) ∩ Y ≠ ∅. For each point x ∈ N'_δ, choose y ∈ B(x, δ) ∩ Y. The collection of the points y so chosen is easily seen to form a (finite) 2δ-net in Y, and the result follows.
Proposition 613 Every compact metric space is totally bounded.
Proof It was presented at the beginning of Sect. 6.8.2.
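The finite nets behind Example 610 can be written down explicitly. The sketch below is a numerical illustration under one loudly stated assumption: it takes d_arctan(x, y) = |arctan x − arctan y| as the form of the metric of Example 556.8, which is not restated in this section. The function `arctan_net` is a hypothetical helper that places centers at equally spaced angles in (−π/2, π/2), a slightly different construction from the one in the text.

```python
import math
import random


def d_arctan(x, y):
    # Assumed form of the metric d_arctan: |arctan x - arctan y|.
    return abs(math.atan(x) - math.atan(y))


def arctan_net(eps):
    """Finite eps-net for (R, d_arctan): centers tan(theta_k), theta_k equally spaced."""
    m = math.ceil(math.pi / eps)   # number of centers
    step = math.pi / m             # angular spacing, step <= eps
    return [math.tan(-math.pi / 2 + (k + 0.5) * step) for k in range(m)]


eps = 0.1
net = arctan_net(eps)

# Check the net property on random samples and on a few extreme points.
random.seed(0)
samples = [random.uniform(-1e6, 1e6) for _ in range(1000)] + [0.0, 1e12, -1e12]
worst = max(min(d_arctan(x, c) for c in net) for x in samples)
print(len(net), worst < eps)
```

Every real number has its arctangent within half an angular step of some center, so the finitely many centers form an ε-net although R is unbounded in the usual metric.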
Remark 614 We mentioned in the introduction of Sect. 6.8.2 that the converse to Proposition 613 does not hold. Related to this, see Theorem 620. ®
The following result presents some more characterizations of total boundedness.
Theorem 615 Let (M, d) be a metric space. Then, the following are equivalent.
(i) (M, d) is totally bounded.
(ii) Every sequence in M has a Cauchy subsequence.
(iii) For each δ > 0, all δ-separated sets in M are finite.
Proof (i)⇒(ii) Let (M, d) be totally bounded. Let {x_n}_{n=1}^∞ be a sequence in M. Assume first that there exists a subsequence {x_{n_k}}_{k=1}^∞ whose elements are all equal. This is trivially a Cauchy sequence, and we are done. Otherwise, passing to a subsequence if necessary, we may
assume that all terms in {x_n}_{n=1}^∞ are mutually distinct. Since M can be covered by a finite number of sets of the form B(x, 1), there is a subsequence {x_{1,n}}_{n=1}^∞ of {x_n}_{n=1}^∞ contained in a set B_1 of this form. The space (B_1, d) is totally bounded (see Proposition 612), hence it can be covered by a finite number of sets of the form B(x, 1/2). Thus, there exists a subsequence {x_{2,n}}_{n=1}^∞ of {x_{1,n}}_{n=1}^∞ contained in a set B_2 of this form. Proceed recursively. The sequence {x_{n,n}}_{n=1}^∞ is a subsequence of {x_n}_{n=1}^∞ and it is clearly Cauchy.
(ii)⇒(iii) Assume that, for some δ > 0, there exists an infinite δ-separated set S in M. Then, S contains a δ-separated sequence {s_n}_{n=1}^∞. Clearly, {s_n}_{n=1}^∞ cannot have a Cauchy subsequence.
(iii)⇒(i) Fix δ > 0. Let F_δ be the family of all δ-separated subsets of M. This family is partially ordered by inclusion (see Sect. 12.6), and clearly every chain has an upper bound (namely, its union). By Zorn's Lemma (see Sect. 12.6.3), the family F_δ has a maximal element N_δ. Observe that N_δ is a δ-net in M. Since it is δ-separated, N_δ is finite by the assumption in (iii).
Corollary 616 Every totally bounded metric space is separable.
Proof Use the equivalence (i)⇔(iii) in Theorem 615 and the equivalence (i)⇔(v) in Theorem 582.
The following is a useful consequence of Theorem 615.
Corollary 617 Let (M, d) be a metric space. Let A be a subset of M with the following property: For every ε > 0 there exists a totally bounded subset B_ε of M such that, for all a ∈ A, there exists b ∈ B_ε with d(a, b) < ε. Then A is totally bounded.
Proof If not, there exists, by Theorem 615, a positive number δ and a δ-separated infinite subset S of A. Find B_{δ/3} according to the hypothesis. Then, given s ∈ S we can find b(s) ∈ B_{δ/3} such that d(s, b(s)) < δ/3. Observe that, given s_1 and s_2 in S such that s_1 ≠ s_2, we have
d(b(s_1), b(s_2)) ≥ d(s_1, s_2) − d(b(s_1), s_1) − d(b(s_2), s_2) > δ − δ/3 − δ/3 = δ/3.
This shows two things: {b(s) : s ∈ S} is a δ/3-separated subset of B_{δ/3} and it is infinite. Thus, again by Theorem 615, the set B_{δ/3} is not totally bounded, a contradiction.
Corollary 618 The closure of a totally bounded subset of a metric space is totally bounded.
Proof If S is totally bounded, the closure A := S̄ satisfies the hypothesis of Corollary 617 (for all ε > 0, the subset B_ε there can be taken to be just S).
Remark 619 Observe that, for every n ∈ N, the two concepts "totally bounded" and "bounded" coincide for subsets of R^n. Indeed, every totally bounded subset of an arbitrary metric space is bounded (Proposition 612). Conversely, if S is a bounded subset of R^n, it is contained in a ball (hence in a closed ball), showing that its closure S̄ is also bounded. By the Heine–Borel Theorem 96, S̄ is then compact. It
follows from Proposition 613 that S̄ is totally bounded, and then S is totally bounded, too, by Proposition 612. ®
Let (M, d) be a metric space and S be a nonempty subset of M. Recall that a point x ∈ M is said to be an accumulation point (also called a limit point) of S if every neighborhood of x contains points in S \ {x} (see Definition 81). The set of all accumulation points of S is denoted by S′ and it is called the Cantor derived set of S. Note that S̄ = S ∪ S′.
The following result lists some characterizations of compactness in the class of metric spaces. Item (iv) is the version of Theorem 149 (a consequence of the Bolzano–Weierstrass Theorem 147) in this more general context. Item (vii) is the generalized version of the Nested Interval Theorem 69.
Theorem 620 Let (M, d) be a metric space. Then the following are equivalent.
(i) M is compact.
(ii) Any cover by open balls contains a finite subcover.
(iii) For every infinite set A ⊂ M, A′ ≠ ∅.
(iv) Every sequence in M has a convergent subsequence.
(v) M is totally bounded and complete.
(vi) Every countable open cover of M has a finite subcover.
(vii) Every decreasing sequence {C_n}_{n=1}^∞ of nonempty closed subsets of M has a nonempty intersection.
Proof (i)⇒(ii) is trivial.
(ii)⇒(iii) Assume that there is an infinite subset A of M such that A′ = ∅. Then A is closed, since A′ = ∅ and Ā = A ∪ A′, hence M \ A is open. Let ℬ be a family of open balls in M such that ⋃{B : B ∈ ℬ} = M \ A. Since every point a of A has a neighborhood that intersects A in a finite set, we can find B(a, r(a)) such that B(a, r(a)) ∩ A = {a}. Consider the following open cover of M: {B(a, r(a)) : a ∈ A} ∪ ℬ. Observe that if any B(a, r(a)) is removed, then a is not covered by other members of this cover, so this cover has no finite subcover, a contradiction.
(iii)⇒(iv) If a sequence {x_n} in M is formed by a finite number of points, then one of them must repeat infinitely many times, and we get a convergent (constant, in fact) subsequence. Thus we may assume that the sequence {x_n} is formed by distinct points. Let x be a limit point of the set {x_n : n ∈ N}. Then every neighborhood of x contains an infinite number of distinct points of this set. Thus, for each k we can pick x_{n_k} such that d(x_{n_k}, x) < 1/k, and n_1 < n_2 < ... . Then {x_{n_k}}_{k=1}^∞ is a subsequence of {x_n}_{n=1}^∞, and lim_k x_{n_k} = x.
(iv)⇒(v) By Theorem 615, (iv) implies that M is totally bounded. We will show that M is complete. To this end, let {x_n}_{n=1}^∞ be a Cauchy sequence in M. By (iv), {x_n}_{n=1}^∞ has a convergent subsequence {x_{n_k}}_{k=1}^∞, converging, say, to x. Let us prove that x_n → x. Indeed, given ε > 0, there is n_0 such that d(x_n, x_m) < ε for all n, m ≥ n_0 and d(x_{n_k}, x) < ε for all k ≥ n_0. Then, if n ≥ n_0, since n_{n_0} ≥ n_0,
we have d(x_n, x) ≤ d(x_n, x_{n_{n_0}}) + d(x_{n_{n_0}}, x) ≤ 2ε. Therefore every Cauchy sequence is convergent in M and M is thus complete.
(v)⇒(i) Let U be an open cover of M. By Corollary 616, M is separable and thus, by Theorem 582, M is Lindelöf. Therefore U has a countable subcover V = {U_n}_{n=1}^∞. For n ∈ N, let W_n := U_1 ∪ U_2 ∪ ... ∪ U_n. We will show that W_n = M for some n ∈ N. This will finish the proof that M is compact. Assume, on the contrary, that for every n ∈ N there is x_n ∈ M \ W_n. By the equivalence (i)⇔(ii) in Theorem 615, the sequence {x_n}_{n=1}^∞ has a Cauchy subsequence {x_{n_k}}_{k=1}^∞. Since (M, d) is complete, {x_{n_k}}_{k=1}^∞ converges to some x ∈ M. Then x ∈ W_{n_0} for some n_0 ∈ N, as {U_n}_{n=1}^∞ is a cover of M. Since W_{n_0} is open, there is k_0 ∈ N such that x_{n_k} ∈ W_{n_0} for k ≥ k_0. Thus, for k ≥ k_0 and n_k > n_0, we have x_{n_k} ∈ W_{n_0} ⊂ W_{n_k}, a contradiction.
Thus, (i) to (v) are equivalent. Now, (i)⇒(vi) is obvious from the definition.
(vi)⇒(vii) Let {C_n}_{n=1}^∞ be a decreasing sequence of nonempty closed subsets of M. Assume that ⋂_{n=1}^∞ C_n = ∅. Put O_n := M \ C_n for all n ∈ N. Then {O_n}_{n=1}^∞ is a countable open cover of M that has no finite subcover, a contradiction.
(vii)⇒(iv) Let {x_n}_{n=1}^∞ be a sequence in M. For n ∈ N, let C_n be the closure of the set {x_i : i ≥ n}. The sequence {C_n}_{n=1}^∞ is decreasing and consists of nonempty closed sets. Let x ∈ ⋂_{n=1}^∞ C_n. We shall choose inductively a subsequence {x_{n_k}}_{k=1}^∞ of {x_n}_{n=1}^∞ that will converge to x. To begin with, due to the fact that x ∈ C_1 there exists n_1 ∈ N such that d(x_{n_1}, x) < 1. Assume that n_1 < n_2 < ... < n_i have already been chosen. Since x ∈ C_{n_i + 1}, there exists n_{i+1} > n_i such that d(x_{n_{i+1}}, x) < 1/(i + 1). This defines the sequence n_1 < n_2 < n_3 < ... in N. Clearly, x_{n_i} → x. This finishes the proof of the theorem.
Corollary 621 Let (M, d) be a complete metric space. Then, a subset A of M is totally bounded (in the restricted metric) if and only if A is relatively compact.
Proof Assume first that A is totally bounded.
It is easy to see that the closure Ā is totally bounded. Indeed, a finite δ-net for A is a 2δ-net for Ā. Then, by Proposition 572 and (v) in Theorem 620, Ā is compact, since it is complete in the restricted metric. Conversely, if Ā is compact, then Ā is totally bounded, by (v) in Theorem 620. By Proposition 612, A is also totally bounded.
The following is a useful statement.
Corollary 622 Let (M, d) be a complete metric space and let A be a subset of M that is not relatively compact. Then, for some δ > 0 the set A contains an infinite δ-separated set.
Proof By Corollary 621, A is not totally bounded. The result follows from Theorem 615.
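The link between δ-separated sets and δ-nets used in Theorem 615 and in Corollary 622 can be made concrete on a finite sample, where no appeal to Zorn's Lemma is needed: a greedy pass already produces a maximal δ-separated subset, and maximality forces it to be a δ-net. The Python sketch below is only an illustration. It assumes the conventions that "δ-separated" means d(p, q) ≥ δ for distinct points and that a δ-net leaves every point at distance less than δ from the net; the book's precise conventions are fixed earlier and may differ in the strictness of the inequalities. The function name `greedy_separated_net` is hypothetical.

```python
def greedy_separated_net(points, delta, dist):
    """Return a maximal delta-separated subset of `points`, built greedily."""
    net = []
    for p in points:
        # Keep p only if it stays at distance >= delta from every point kept so far.
        if all(dist(p, q) >= delta for q in net):
            net.append(p)
    return net


def dist(x, y):
    """Absolute-value metric on the real line."""
    return abs(x - y)


# Finite sample of the interval [0, 1].
points = [i / 1000 for i in range(1001)]
delta = 0.25
net = greedy_separated_net(points, delta, dist)

# The selected set is delta-separated ...
is_separated = all(dist(p, q) >= delta for i, p in enumerate(net) for q in net[:i])
# ... and, by maximality, every sample point lies within delta of it.
is_net = all(any(dist(p, q) < delta for q in net) for p in points)
print(len(net), is_separated, is_net)
```

A point rejected by the greedy pass is, by construction, closer than δ to a point already selected, which is exactly the maximality argument of the implication (iii)⇒(i) in Theorem 615.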
Remark 623 In the absence of completeness, Corollary 622 does not hold. Indeed, consider the metric space M = (0, 1) with the restriction d_abs of the absolute-value metric on R. Then (M, d_abs) is certainly not complete. It is totally bounded, although the set M is not relatively compact in the metric space (M, d_abs). However, in view of Theorem 615, the set M does not contain infinite δ-separated sets. ®
For separable metric spaces, the following holds.
Proposition 624 Let (M, d) be a separable metric space. Then the two following conditions are equivalent:
(i) (M, d) is compact.
(ii) Every countable cover of M by open balls has a finite subcover.
Proof (i)⇒(ii) is clear. Assume now (ii) and let D be a countable dense subset of M. Observe first that the countable family G := {B(z, r) : z ∈ D, r ∈ Q, r > 0} has the property that every open subset O of M is the (countable) union of a subfamily of G. In order to see this, let x ∈ O and let r ∈ Q, r > 0, be such that B(x, r) ⊂ O. Find z ∈ D such that d(z, x) < r/2. Let y ∈ B(z, r/2). Then d(y, x) ≤ d(y, z) + d(z, x) < r/2 + r/2 = r, hence (x ∈) B(z, r/2) ⊂ B(x, r) ⊂ O. Let 𝒪 be an open cover of M. By the previous observation, each O ∈ 𝒪 is the countable union of a subfamily of G. The family of all elements in G so obtained forms a (countable) cover of M by open balls, hence it has a finite subcover F. It is enough to choose, for each such B ∈ F, a superset O ∈ 𝒪 to obtain a (finite) cover of M consisting of elements in 𝒪.
Remark 625 It may seem strange, at first glance, that (ii) in Proposition 624 does not appear as one of the equivalences in Theorem 620. The reason is that, outside the class of separable (metric) spaces, it is not a characterization of compactness. For an example, see Exercise 13.565. ®
We will now discuss briefly the action of a continuous function on a totally bounded metric space. Observe first that the continuous image of a totally bounded metric space is not necessarily totally bounded.
Indeed, the identity mapping I : (R, d_arctan) → (R, d_abs) is continuous (for a graph of the arctan x function see Fig. 4.35). Even more, it is a homeomorphism (see Example 564.1). However, it was shown in Example 610 that (R, d_arctan) is totally bounded, while (R, d_abs) is not: it is complete and not compact, so the statement follows from Theorem 620. Incidentally, and in the light of Proposition 626 below, the (continuous) mapping I : (R, d_arctan) → (R, d_abs) is not uniformly continuous. A strengthening of the continuity property ensures the permanence of the total boundedness property. This is the content of the next result. We provide two proofs. The first one is based on the characterization of total boundedness in terms of δ-separated sets ((iii) in Theorem 615). The second one uses a lemma on Cauchy sequences and (ii) in the same theorem.
Proposition 626 The uniformly continuous image of a totally bounded metric space into a metric space is totally bounded.
Proof Let (M, d) be a totally bounded metric space and let f : M → N be a uniformly continuous mapping into a metric space (N, r). Fix ε > 0 and let Y be an ε-separated subset of f(M). By the uniform continuity of f we can find δ > 0 such that r(f(x_1), f(x_2)) < ε whenever x_1, x_2 ∈ M, d(x_1, x_2) < δ. For each y ∈ Y find x ∈ M such that f(x) = y. The collection of those points x forms a subset X of M. Note that X is a δ-separated set. Indeed, if x_1 and x_2 in X satisfy d(x_1, x_2) < δ, then r(f(x_1), f(x_2)) < ε, and so f(x_1) = f(x_2) because Y is ε-separated, hence x_1 = x_2. Since every δ-separated subset of M is finite (see Theorem 615), the set X is finite, and so is Y. This holds for every ε > 0. Again by Theorem 615, we get that f(M) is totally bounded.
Lemma 627 Let (M, d) and (N, r) be metric spaces and let f : M → N be a uniformly continuous function. Then the image by f of any Cauchy sequence in (M, d) is a Cauchy sequence in (N, r).
Proof Given ε > 0, there exists δ > 0 such that r(f(x), f(y)) < ε whenever x, y ∈ M satisfy d(x, y) < δ. Let {x_n} be a Cauchy sequence in (M, d). Find n_0 ∈ N such that d(x_n, x_m) < δ for all n, m ∈ N such that n ≥ n_0 and m ≥ n_0. Then r(f(x_n), f(x_m)) < ε for all n, m ∈ N such that n ≥ n_0 and m ≥ n_0. This shows that the sequence {f(x_n)} is Cauchy in (N, r).
A Second Proof of Proposition 626 Let (M, d) be a totally bounded metric space. Let f : M → N be a uniformly continuous mapping from M into a metric space (N, r). Assume that {y_n} is a sequence in f(M). For n ∈ N find x_n ∈ M such that f(x_n) = y_n. The sequence {x_n} has, according to Theorem 615, a Cauchy subsequence, whose image, by Lemma 627, is a Cauchy subsequence of {y_n}. This shows, by using Theorem 615 again, that f(M) is totally bounded.
6.8.3
Continuous Mappings on Compact Spaces
Results 628 to 633 below relate to properties of continuous mappings on compact spaces, extending some of the results obtained for real-valued functions defined on closed and bounded intervals in R. These results are crucial in Optimization Theory.
Proposition 628 Let K be a compact metric space and let f be a continuous mapping from K into a metric space M. Then f(K) is a compact subset of M. Moreover, if f is one-to-one, then the inverse mapping f^{-1} : f(K) → K is continuous.
Proof For the first part observe that, according to Theorem 620, it is enough to show that every sequence {y_n}_{n=1}^∞ in f(K) has a subsequence that converges to an element in f(K). For n ∈ N, let x_n ∈ K be such that f(x_n) = y_n. The sequence {x_n}_{n=1}^∞ has a convergent subsequence {x_{n_k}}_{k=1}^∞, due to the fact that K is compact (see Theorem 620). Let x ∈ K be its limit. Since f is continuous, {f(x_{n_k})}_{k=1}^∞ converges to f(x) (∈ f(K)).
Assume now that f is, moreover, one-to-one, so there exists the inverse function f^{-1} : f(K) → K. Any closed subset F of K is compact (see Proposition 602). Use the first part to obtain that f(F) is compact. By Proposition 601, the set f(F) is closed in f(K). This shows, in view of Proposition 559, that f^{-1} is continuous.
Corollary 629 Every continuous function f : K → R, where (K, d) is a compact metric space, is bounded, and attains its infimum and supremum.
Proof According to Proposition 628, f(K) is a compact subset of R. From Lemma 95 it follows that f(K) is bounded, so there exist sup f(K) and inf f(K), and from Lemma 94 we get that both sup f(K) and inf f(K) belong to f(K). This shows the existence of x and y in K such that f(x) = sup f(K) and f(y) = inf f(K).
We propose an alternative, more "constructive", proof. Assume first that f(K) is unbounded. We can then find a sequence {x_n}_{n=1}^∞ in K such that |f(x_n)| > n for each n ∈ N. The sequence {x_n}_{n=1}^∞ has a convergent subsequence, say {x_{n_k}}_{k=1}^∞. It follows that {f(x_{n_k})}_{k=1}^∞ converges, in particular {f(x_{n_k}) : k ∈ N} is bounded, a contradiction. Since the set f(K) is bounded, it has a supremum sup f(K). Find a sequence {x_n}_{n=1}^∞ in K such that f(x_n) → sup f(K). The sequence {x_n}_{n=1}^∞ has a convergent subsequence, say {x_{n_k}}_{k=1}^∞, with limit x ∈ K. It follows that f(x) = lim_{k→∞} f(x_{n_k}) = sup f(K). The argument for the infimum is similar.
The following result, a consequence of the implication (i)⇒(iv) in Theorem 620, is the extension to metric spaces of the Heine–Cantor Theorem 344 for the space R.
Corollary 630 Let (K, d_1) be a compact metric space and let (M, d_2) be a metric space. Then, every continuous mapping f from K into M is uniformly continuous.
Proof Assume that the result fails. Then, there exist ε > 0 and two sequences {x_n}_{n=1}^∞ and {y_n}_{n=1}^∞ in K such that d_1(x_n, y_n) < 1/n and d_2(f(x_n), f(y_n)) ≥ ε for all n ∈ N.
The sequence {x_n}_{n=1}^∞ has a convergent subsequence, say {x_{n_k}}_{k=1}^∞. Let x ∈ K be its limit. Obviously, {y_{n_k}}_{k=1}^∞ converges to the same limit. Since f is continuous, it follows that f(x_{n_k}) → f(x) and f(y_{n_k}) → f(x) as k → ∞, a contradiction with the fact that d_2(f(x_{n_k}), f(y_{n_k})) ≥ ε for all k ∈ N.
The following result (Corollary 632) for families of functions is similar to Corollary 630, and it can be proved in almost the same way, so the proof is left to the reader. We first need a definition.
Definition 631 Let (M_1, d_1) and (M_2, d_2) be metric spaces. Let F be a nonempty subset of the space C(M_1, M_2) of all continuous functions from M_1 into M_2. Given a point x_1 ∈ M_1, the set F is said to be
(i) equicontinuous at x_1 if given ε > 0 there exists δ (= δ(ε, x_1)) > 0 such that d_2(f(x_1), f(y_1)) < ε for every f ∈ F whenever y_1 ∈ M_1 satisfies d_1(x_1, y_1) < δ;
(ii) equicontinuous if F is equicontinuous at each point of M_1; and
(iii) uniformly equicontinuous if given ε > 0 there exists δ (= δ(ε)) > 0 such that d_2(f(x_1), f(y_1)) < ε for every f ∈ F whenever x_1, y_1 ∈ M_1 satisfy d_1(x_1, y_1) < δ.
Corollary 632 Let (K, d_1) be a compact metric space and let (M, d_2) be a metric space. Then, an equicontinuous family of functions in C(K, M) is uniformly equicontinuous.
Corollary 633 Let (K, d_1) be a compact metric space and let (M, d_2) be a metric space. Then, a pointwise bounded and equicontinuous family of functions in C(K, M) is uniformly bounded.
Proof Let F be a pointwise bounded and equicontinuous family of functions in C(K, M). Assume that F is not uniformly bounded. Therefore, we can find a sequence {f_n}_{n=1}^∞ in F and a sequence {x_n}_{n=1}^∞ in K such that |f_n(x_n)| > n for every n ∈ N. By passing to a subsequence if necessary we may assume that {x_n}_{n=1}^∞ converges, say to x_0 ∈ K. The family F is equicontinuous at x_0, so there exists δ > 0 such that d_1(x, x_0) < δ implies d_2(f(x), f(x_0)) < 1 for all f ∈ F. We can find n_0 ∈ N such that d_1(x_n, x_0) < δ for every n ≥ n_0. Then d_2(f(x_n), f(x_0)) < 1 for all n ≥ n_0 and f ∈ F; in particular, d_2(f_n(x_n), f_n(x_0)) < 1 for all n ≥ n_0. Due to the fact that {f_n(x_0) : n ∈ N} is a bounded set in (M, d_2), we reach a contradiction.
6.8.4
The Lebesgue Number of a Covering
The following result is helpful in understanding the structure of compact metric spaces.
Theorem 634 (Lebesgue) Let U be an open cover of a compact metric space (M, d). Then there is a number δ > 0 (called a Lebesgue number of U) such that for every x ∈ M, there is U_x ∈ U such that B(x, δ) ⊂ U_x.
Proof For every x ∈ M, choose V_x ∈ U such that x ∈ V_x and choose δ(x) > 0 such that B(x, 2δ(x)) ⊂ V_x. Then {B(x, δ(x)) : x ∈ M} is an open cover of M. By the compactness of M, there exists a set {x_j : j = 1, 2, ..., n} in M such that
\[
M = \bigcup_{j=1}^{n} B(x_j, \delta(x_j)).
\]
Put δ := min{δ(x_1), δ(x_2), ..., δ(x_n)}. Given x ∈ M, we can find j_0 ∈ {1, 2, ..., n} such that x ∈ B(x_{j_0}, δ(x_{j_0})). Then B(x, δ) ⊂ B(x, δ(x_{j_0})) ⊂ B(x_{j_0}, 2δ(x_{j_0})) ⊂ V_{x_{j_0}} (=: U_x ∈ U). This proves the result.
We mention here a simple result that pertains to the general theory of compact spaces. It is often used in applications.
Proposition 635 If K is an infinite compact metric space, then there is an infinite sequence of nonempty pairwise disjoint open subsets of K.
Fig. 6.11 The first steps of the construction in Proposition 635 for finite A
Proof (See Fig. 6.11) Let A be the set of all isolated points in K. Since for each x ∈ A the set {x} is open, A is an open subset of K. Assume first that A is infinite. Choose a countably infinite subset {x_n : n ∈ N} of A. Then {{x_n} : n ∈ N} is an infinite sequence of pairwise disjoint open subsets of K. Assume that, on the contrary, A is finite or empty. Put K_1 := K \ A, an infinite compact subset of K consisting only of accumulation points. All the concepts in the rest of the argument refer to the compact space K_1, and cl denotes the closure there. Take x_1 and x_2 in K_1, x_1 ≠ x_2, and find an open neighborhood U(x_1) of x_1 such that x_2 ∉ cl U(x_1). Since K_1 \ cl U(x_1) is open and nonempty, it contains infinitely many points (recall that x_2 is an accumulation point). Choose x_3 ∈ K_1 \ cl U(x_1), x_3 ≠ x_2, and an open neighborhood U(x_2) of x_2 in K_1 \ cl U(x_1) such that x_3 ∉ cl U(x_2). Again, K_1 \ (cl U(x_1) ∪ cl U(x_2)) is an open neighborhood of x_3, hence it contains infinitely many points of K_1. Choose x_4 ∈ K_1 \ (cl U(x_1) ∪ cl U(x_2)), x_4 ≠ x_3, and an open neighborhood U(x_3) of x_3 in K_1 \ (cl U(x_1) ∪ cl U(x_2)) such that x_4 ∉ cl U(x_3). Continuing in this way, we obtain a pairwise disjoint sequence {U(x_n) : n ∈ N} of open subsets of K_1. To finish the proof, it is enough to observe that, since A is finite, each U(x_n) may be chosen to be an open subset of K that is contained in K_1.
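Returning to Theorem 634, a Lebesgue number can be estimated numerically in a simple case. The Python sketch below considers the compact space [0, 1] with the cover by the two intervals (−0.1, 0.6) and (0.4, 1.1), an illustrative choice that does not come from the text. For each grid point x it computes the largest radius r for which (x − r, x + r) lies inside some member of the cover, and then takes the minimum over the grid. The helper names `margin` and `lebesgue_number_estimate` are hypothetical, and balls are taken in R rather than in [0, 1], which can only make the estimate smaller.

```python
def margin(x, interval):
    """Largest r with (x - r, x + r) inside the open interval (a, b); negative if x is outside."""
    a, b = interval
    return min(x - a, b - x)


def lebesgue_number_estimate(cover, grid):
    """Minimum over the grid of the best margin offered by any member of the cover."""
    return min(max(margin(x, u) for u in cover) for x in grid)


# Illustrative open cover of [0, 1] by two intervals.
cover = [(-0.1, 0.6), (0.4, 1.1)]
grid = [i / 1000 for i in range(1001)]

delta = lebesgue_number_estimate(cover, grid)
print(delta)
```

The worst points are x = 0, x = 0.5 and x = 1, where the best available margin is 0.1 (up to rounding), so any δ not exceeding that value works for this cover.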
6.8.5
The Finite Intersection Property. Pseudocompactness
Definition 636 A family of sets has the Finite Intersection Property if every nonempty finite subcollection of it has a nonempty intersection.
Theorem 637 Let M be a metric space. Then the following are equivalent.
(i) M is compact.
(ii) Every family of closed sets in M that has the Finite Intersection Property has a nonempty intersection.
(iii) Every real-valued continuous function defined on M is bounded.
(iv) Every real-valued continuous function defined on M attains its supremum.
Proof (i)⇒(ii): Assume that M is compact. Let $\mathcal{F}$ be a family of closed sets in M with empty intersection. Then $\{F^c\}_{F \in \mathcal{F}}$ is an open cover of M. By the compactness of M, there are $F_1, F_2, \ldots, F_n$ in $\mathcal{F}$ such that $\{F_i^c\}_{i=1}^{n}$ is an open cover of M, i.e., $M = \bigcup_{i=1}^{n} F_i^c$. In other words, $\bigcap_{i=1}^{n} F_i = \emptyset$, so $\mathcal{F}$ does not have the Finite Intersection Property.
(ii)⇒(i): Assume (ii). Let $\mathcal{U}$ be an open cover of M. Then $M = \bigcup_{U \in \mathcal{U}} U$, i.e., $\bigcap_{U \in \mathcal{U}} U^c = \emptyset$. Since each $U^c$ is closed, we get, by (ii), that $\bigcap_{i=1}^{n} U_i^c = \emptyset$ for some $U_1, U_2, \ldots, U_n$ in $\mathcal{U}$, i.e., $M = \bigcup_{i=1}^{n} U_i$. This means that M is compact.
(iii)⇒(iv): Assume (iii). If (iv) does not hold, there is a continuous function f on M that does not attain its supremum s on M; note that s is finite, since f is bounded by (iii). Then the function g defined on M for x ∈ M by
$g(x) = \frac{1}{s - f(x)}$,
is a continuous function on M that is unbounded, which contradicts (iii).
(iv)⇒(iii): Assuming (iv), every continuous function f on M has a finite supremum, as this supremum is attained. The function −f also has a finite supremum. It follows that f is bounded.
(i)⇒(iii): See Corollary 335.
(iii)⇒(i): First we show that, assuming (iii), M is totally bounded. Indeed, if not, then M would contain, for some δ > 0, an infinite δ-separated set D = {xn : n ∈ N}. The set D is clearly closed (see Exercise 13.346). The function f defined on D by f (xn ) = n for n ∈ N can be extended to a continuous function on M by Tietze’s Theorem 566, and this extended function is not bounded on M, a contradiction.
Now we show that, assuming (iii), M is complete. Let $\widehat{M}$ be its completion. If $\hat{f} : \widehat{M} \to \mathbb{R}$ is a continuous function, its restriction f to M is continuous, hence bounded, i.e., there exists R > 0 such that |f (x)| ≤ R for every x ∈ M. Since M is dense in $\widehat{M}$, it is clear that $|\hat{f}(\hat{x})| \le R$ for every $\hat{x} \in \widehat{M}$. This shows that $\widehat{M}$ also satisfies (iii). Assume that there exists $\hat{x} \in \widehat{M} \setminus M$. The function $x \mapsto d(x, \hat{x})$ defined on M is continuous and its infimum is 0 (see Exercise 13.359). However, this infimum is not attained, so the continuous function $x \mapsto -d(x, \hat{x})$ does not attain its supremum on M. This contradicts (iv), which follows from (iii), and shows that $M = \widehat{M}$. The compactness of M follows now from Theorem 620.
6.9 The Baire Category Theorem Continued
6.9.1
The Baire Category Theorem in the Context of Metric Spaces
Notions that were introduced for subsets of R (see Sect. 1.9) extend naturally to this more general setting. So, we say that a set S in a metric space (M, d) is nowhere dense if its closure has an empty interior; a set S in M is said to be of first category if it is a countable union of nowhere dense sets; and a set that is not of first category is said to be of second category. Observe that a subset of a set of first category is itself of first category.
The following definition isolates a property enjoyed by the closed subsets of R, see the crucial Theorem 109.
Definition 638 We say that a metric space (M, d) is a Baire space if the intersection of any countable collection of open dense subsets of M is dense in M.
The following result gives an alternative formulation of the property of being a Baire space.
Proposition 639 A metric space is Baire if and only if the union of any countable collection of closed nowhere dense sets has empty interior.
Proof Assume first that the metric space (M, d) is Baire. Let $\{F_n\}_{n=1}^{\infty}$ be a sequence of closed nowhere dense subsets of M. Let $F := \bigcup_{n=1}^{\infty} F_n$. Assume that F contains a nonempty open subset U . Put $G_n := M \setminus F_n$ for n ∈ N. Observe that $G_n$ is an open dense subset of M, for all n ∈ N. Put $G := \bigcap_{n} G_n$. Then $U \subset F = \bigcup_{n} F_n = M \setminus \bigcap_{n} G_n = M \setminus G$, a contradiction with the density of G (which holds due to the fact that (M, d) is Baire). Conversely, assume that the property holds. Let $\{G_n\}_{n=1}^{\infty}$ be a sequence of open dense subsets of M and $G := \bigcap_{n} G_n$. Assume that G is not dense. There exists then an open nonempty subset U of M such that U ∩ G = ∅. Put $F_n := M \setminus G_n$, for n ∈ N. Then $\{F_n\}_{n=1}^{\infty}$ is a sequence of closed nowhere dense subsets of M. Observe that $M \setminus \bigcup_{n} F_n = \bigcap_{n} G_n = G$, hence $\bigcup_{n} F_n = M \setminus G \supset U$, a contradiction with the fact that $\bigcup_{n} F_n$ has an empty interior.
Theorem 640 (Baire Category Theorem) Every complete metric space is a Baire space.
Proof Let $\{O_n\}_{n=1}^{\infty}$ be a sequence of open dense subsets of a complete metric space (M, d). Fix x ∈ M and ε > 0. Then $B(x, \varepsilon) \cap O_1 \neq \emptyset$. Find a closed subset $C_1$ of $B(x, \varepsilon) \cap O_1$ with a nonempty interior and having diameter less than 1. Since the interior of $C_1$ meets the dense open set $O_2$, we can find a closed subset $C_2$ of $C_1 \cap O_2$ with a nonempty interior and having diameter less than 1/2. Proceed in this way. If we select $y_n \in C_n$ for each n ∈ N, the sequence $\{y_n\}$ is Cauchy, hence converges to some y ∈ M. Since the sets $C_n$ are closed and decreasing, it is clear that $y \in \bigcap_{n=1}^{\infty} O_n \cap B(x, \varepsilon)$. This proves that $\bigcap_{n=1}^{\infty} O_n$ is dense in M.
It is useful to have a reformulation of this important result. The following theorem is equivalent to the Baire Category Theorem, in the sense that each of them implies the other. This was already verified in the context of real numbers, see Theorem 111.
Theorem 641 Every nonempty open subset of a complete metric space is of second category.
Proof Assume that an open set O in a complete metric space (M, d) is of first category. Then $O = \bigcup_{n=1}^{\infty} R_n$, where each $R_n$ is nowhere dense, so $O \subset \bigcup_{n=1}^{\infty} \overline{R_n}$. Since (M, d) is a Baire space, O = ∅.
Let us show how Theorem 640 can be proved by using Theorem 641. Let $\{R_n\}_{n=1}^{\infty}$ be a collection of closed nowhere dense subsets of M, where (M, d) is a complete metric space. Put $O := \operatorname{Int}\left(\bigcup_{n=1}^{\infty} R_n\right)$. Then $O = \bigcup_{n=1}^{\infty} (R_n \cap O)$, and each set $R_n \cap O$ is nowhere dense. This implies O = ∅.
From the definition of a Polish space and from Theorem 640 we get the following result.
Corollary 642 Every Polish space is a Baire space.
Remark 643 It follows from Theorem 589, Proposition 639, and Theorem 640, that Q is not a Gδ -subset of R. Indeed, were Q a Gδ -subset of the complete metric space R, it would be a Polish, hence a Baire, space. However, Q is the countable union of closed nowhere dense subsets (namely, all its singletons). ®
Definition 644 A set A in a metric space (M, d) is called residual in M if its complement $A^c$ in M is a set of first category in M.
Remark 645 An example of a residual set in the real line is the set P of all irrational numbers. Indeed, its complement, i.e., the set Q, is of first category, since it is the countable union of nowhere dense sets (each one a singleton). On the other hand, the set Q is not residual in R; otherwise R would be of first category, contradicting Theorem 641. ®
Theorem 646 Let (M, d) be a complete metric space and A be a set in M. Then A is residual in M if and only if A contains a set G which is a Gδ -set and dense in M.
Proof Assume first that A contains a set G which is a Gδ -set and dense in M. Write $G = \bigcap_{n} G_n$, where each $G_n$ is open (and dense, of course) in M. Put $F_n := G_n^c$ (a nowhere dense set) for n ∈ N, and let $B := \bigcup_{n} F_n$. Then B is of first category in M. Moreover, $B = G^c \supset A^c$. Thus $A^c$ is of first category, hence A is residual in M. Assume now that A is residual in M. So, $A^c = \bigcup_{n} F_n$, where each $F_n$ is nowhere dense in M. Put $G_n := M \setminus \overline{F_n}$ for each n. Then each $G_n$ is open and dense in M. As M is a complete metric space, by the Baire Category Theorem 640 the set $\bigcap_{n} G_n$ is dense in M. Moreover, $G_n \subset F_n^c$ for each n ∈ N, and thus
$\bigcap_{n} G_n \subset \bigcap_{n} F_n^c = M \setminus \bigcup_{n} F_n = M \setminus A^c = A$.
The set A thus contains a set $G := \bigcap_{n} G_n$, which is a Gδ -set and dense in M. This finishes the proof.
6.9.2
Some Applications of the Baire Category Theorem
1. The first construction of an everywhere continuous nowhere differentiable real-valued function on a real interval was given by B. Bolzano around 1830 (see also the paragraph preceding Definition 570). We gave an explicit example of a function having such a behavior in Definition 570 (see Proposition 482). Relying on the Baire Category Theorem 641 it is possible to prove that most of the functions encountered in Analysis (in the sense of Baire category) are of that form. This is a result due to the Polish mathematicians S. Banach and S. Mazurkiewicz
in 1931, answering a question by H. Steinhaus in 1929. Precisely, the Banach–Mazurkiewicz theorem states that the set of all everywhere continuous, nowhere differentiable real-valued functions on any interval [a, b], where a < b, forms a residual set in (C[a, b], d∞ ) (see Example 551). The argument below is essentially due to the American mathematician J. C. Oxtoby. Obviously, it is enough to prove the theorem for [a, b] := [0, 1]. Fix n ∈ N. Let us define
$F_n := \{ f \in C[0,1] : \text{there exists } x \in [0, 1 - 1/n] \text{ such that } |f(x+h) - f(x)| \le nh \text{ for all } h \in (0, 1 - x) \}$.
For each n ∈ N, the set Fn is d∞ -closed. Indeed, let $\{f_i\}_{i=1}^{\infty}$ be a sequence in Fn that d∞ -converges to an element f ∈ C[0, 1], and for each i ∈ N let xi be the element in [0, 1 − 1/n] associated to fi in the definition of Fn . By Theorem 147, the sequence $\{x_i\}_{i=1}^{\infty}$ has a convergent subsequence, so without loss of generality we may assume that $\{x_i\}_{i=1}^{\infty}$ converges to some element x ∈ [0, 1 − 1/n]. Fix h ∈ (0, 1 − x). We can find i0 ∈ N such that 0 < h < 1 − xi for i ≥ i0 . For such i we have
$|f(x+h) - f(x)| \le |f(x+h) - f(x_i+h)| + |f(x_i+h) - f_i(x_i+h)| + |f_i(x_i+h) - f_i(x_i)| + |f_i(x_i) - f(x_i)| + |f(x_i) - f(x)|$
$\le |f(x+h) - f(x_i+h)| + d_\infty(f, f_i) + nh + d_\infty(f_i, f) + |f(x_i) - f(x)|$.
Given ε > 0, the continuity of f at x and x + h and the uniform convergence of {fi } to f allow us to find i1 ≥ i0 such that, simultaneously, |f (x + h) − f (xi + h)| < ε, |f (x) − f (xi )| < ε, and d∞ (f , fi ) < ε for i ≥ i1 . This shows that |f (x + h) − f (x)| < nh + 4ε. Since ε > 0 was arbitrary, we get |f (x + h) − f (x)| ≤ nh. This shows that f ∈ Fn , and thus Fn is closed in (C[0, 1], d∞ ). Let us now prove that Fn is nowhere dense. To this end, observe first that the set of all continuous piecewise linear functions on [0, 1] is d∞ -dense in C[0, 1]. This is a simple consequence of the uniform continuity of every element in C[0, 1] (Theorem 344), see (i) in Fig. 6.12.
Thus, every nonempty open subset of C[0, 1] contains a continuous piecewise linear function p. It is enough to show that, as d∞ -close to every such function p as we wish, there is a continuous function g such that for every x ∈ [0, 1 − 1/n] there exists h ∈ (0, 1 − x) such that |g(x + h) − g(x)| > nh, i.e., g ∉ Fn . This is simple. It is enough to construct a continuous “saw-tooth” function g close to p (see (ii) in Fig. 6.12). This proves that Fn has an empty interior. Since every function in C[0, 1] that has a finite right-hand derivative at some point of [0, 1) belongs to Fn for some n ∈ N, the set of all functions in C[0, 1] that are somewhere differentiable from the right is of first category in C[0, 1]. The details are left to the reader.
2. As an application of the Baire Category Theorem 111 in R, we proved in Remark 112 that the interval [0, 1] is uncountable. The reader may immediately identify the ultimate reason for this and produce the following extension: every complete metric space without isolated points must be uncountable. Indeed, should a complete metric space M be countable, at least one of its points must form a set having
Fig. 6.12 Approximating a function f first by a continuous piecewise linear function p and then by a function not in Fn (the construction in 6.9.2.1)
a nonempty interior (see Proposition 639 and Theorem 640), and this ensures that such a point is isolated. A more precise result in the case of complete metrizable and separable spaces (i.e., Polish spaces) was given in Corollary 591, so in particular both the interval [0, 1] and the Cantor ternary set C (see Definition 277) have cardinality c (something that was proved in Proposition 61 and in (vi) in Proposition 279, respectively, by other methods).
3. The Riemann function R on (0, 1) was introduced in Definition 379. It is continuous at all irrational points and discontinuous at all rational points. A natural question is whether there are functions that are continuous at all rational points and discontinuous at all irrational points. An argument using the Baire Category Theorem shows that the answer is negative. In order to prove this, endow (0, 1) with a metric d that makes ((0, 1), d) complete (a consequence of Theorem 589 and Remark 588.2) and refer all the topological concepts below to the metric space ((0, 1), d). If f : (0, 1) → R is such a function, put Wn := {x ∈ (0, 1) : ω(f , x) < 1/n} for n ∈ N, where ω(f , x) denotes the oscillation of f at x (see Definition 700 below). Then $\mathbb{Q} \cap (0,1) = \bigcap_{n=1}^{\infty} W_n$. It is simple to prove that each Wn is an open set. It is, moreover, dense, since it contains Q ∩ (0, 1). Thus, (0, 1) \ Wn is a closed set with empty interior, hence $\mathbb{P} \cap (0,1) = \bigcup_{n=1}^{\infty} ((0,1) \setminus W_n)$ is a set of first category. Since Q ∩ (0, 1) is countable (hence of first category), we obtain that (0, 1) is also of first category, and this contradicts Theorem 641.
Remark 647 The Banach–Mazurkiewicz theorem above (see Item 1 in Sect. 6.9.2) again documents that in modern mathematics it often happens that it is “easier” to show that most of the studied objects possess a given property than to exhibit one concrete example. Apart from Baire category, which was used in this direction, Probability Theory is a natural choice too. Moreover, sometimes this is the only way to show existence, since it may be very difficult, or beyond our knowledge, to find concrete examples of the objects that occur most often. ®
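The “saw-tooth” step in Item 1 of Sect. 6.9.2 is left to the reader; the following is only a minimal numerical sketch of it, not the authors' construction. The specific choices (n = 5, the function p(x) = |x − 1/2|, the amplitude a, the period P, and the helper names `tooth` and `witness_h`) are assumptions made for illustration. Exact rational arithmetic is used so that no rounding is involved. The check runs over finitely many sample points only; for all x the inequality follows from the slope estimate stated in the comments.

```python
from fractions import Fraction
from math import floor

n = 5                     # index of the set F_n (illustrative choice)
a = Fraction(1, 100)      # amplitude, so that sup |g - p| = a
P = Fraction(1, 200)      # period of the saw-tooth; its slopes are +-4a/P = +-8


def p(x):
    # A continuous piecewise linear function with Lipschitz constant 1.
    return abs(x - Fraction(1, 2))


def tooth(x):
    # Triangle wave with values in [-a, a], period P, kinks at multiples of P/2.
    t = x / P
    frac = t - floor(t)
    return a * (1 - 4 * abs(frac - Fraction(1, 2)))


def g(x):
    # Perturbation of p: uniformly a-close to p, but steep on every small scale.
    return p(x) + tooth(x)


def witness_h(x):
    # Half the distance from x to the next kink strictly to the right of x,
    # so that the saw-tooth is linear (slope +-8) on [x, x + h].
    half = P / 2
    next_kink = (floor(x / half) + 1) * half
    return (next_kink - x) / 2


# Sample points x = i/997 in [0, 1 - 1/n] = [0, 4/5].
xs = [Fraction(i, 997) for i in range(0, 798)]

# On [x, x + h]: |tooth(x+h) - tooth(x)| = 8h and |p(x+h) - p(x)| <= h,
# hence |g(x+h) - g(x)| >= 7h > n*h.
steep = all(abs(g(x + witness_h(x)) - g(x)) > n * witness_h(x) for x in xs)
close = all(abs(g(x) - p(x)) <= a for x in xs)
```

Decreasing `a` while keeping the ratio 4a/P larger than n plus the Lipschitz constant of p gives functions outside Fn that are as close to p as desired.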
6.10 The Arzelà–Ascoli Theorem
Let (K, d) be a compact metric space. The Arzelà–Ascoli Theorem 648 below (due to the Italian mathematicians C. Arzelà and G. Ascoli) gives a necessary and sufficient condition for a subset of C(K) to be compact in the topology of the uniform convergence, i.e., in the topology of the metric induced by the norm $\|\cdot\|_\infty$. We formulate the result in terms of sequences. Although this will not be used in the proof, note that, as a consequence of Corollary 633, the sequence in the next result is uniformly bounded.
Theorem 648 (Arzelà–Ascoli) Let (K, d) be a compact metric space. Then, every pointwise-bounded and equicontinuous sequence in C(K) has a $\|\cdot\|_\infty$-convergent subsequence.
Proof Let $\{f_n\}_{n=1}^{\infty}$ be a pointwise-bounded and equicontinuous sequence in C(K). By Corollary 632, the sequence $\{f_n\}_{n=1}^{\infty}$ is uniformly equicontinuous. By Corollary 616, K is separable. Let D ⊂ K be a countable and dense subset of K, say D := {dm : m ∈ N}. A standard diagonal procedure gives a subsequence $\{g_n\}_{n=1}^{\infty}$ of $\{f_n\}_{n=1}^{\infty}$ such that $\{g_n(d_m)\}_{n=1}^{\infty}$ converges for each m ∈ N. To be precise, due to the fact that $\{f_n(d_1)\}_{n=1}^{\infty}$ is bounded in R, we may extract a subsequence $\{f_n^1\}_{n=1}^{\infty}$ of $\{f_n\}_{n=1}^{\infty}$ such that $\{f_n^1(d_1)\}_{n=1}^{\infty}$ converges. Since $\{f_n^1(d_2)\}_{n=1}^{\infty}$ is bounded, we may extract a subsequence $\{f_n^2\}_{n=1}^{\infty}$ of $\{f_n^1\}_{n=1}^{\infty}$ such that $\{f_n^2(d_2)\}_{n=1}^{\infty}$ converges. Observe that $\{f_n^2(d_1)\}_{n=1}^{\infty}$ converges too. Continue in this way. The subsequence $\{g_n\}_{n=1}^{\infty}$, where $g_n := f_n^n$ for all n ∈ N, has the stated property. We claim that $\{g_n\}_{n=1}^{\infty}$ is $\|\cdot\|_\infty$-Cauchy. Since $(C(K), \|\cdot\|_\infty)$ is complete (see Example 573.4, where the result is proved for C[0, 1]; the argument for the general case is similar), this will show that $\{g_n\}_{n=1}^{\infty}$ is $\|\cdot\|_\infty$-convergent to some g ∈ C(K), and the proof will be finished. To this end, fix ε > 0 and find δ > 0 from the definition of uniform equicontinuity of $\{f_n\}_{n=1}^{\infty}$. Since D is dense in K, the family {B(x, δ) : x ∈ D} is an open cover of K, so there exists a finite subcover {B(x, δ) : x ∈ S} of K. Since S is a finite subset of D, we can find N ∈ N such that for every n, m ≥ N , we have $|g_n(s) - g_m(s)| < \varepsilon$ for every s ∈ S. Fix x ∈ K and find s ∈ S such that d(x, s) < δ. Then, for every n, m ≥ N ,
$|g_n(x) - g_m(x)| \le |g_n(x) - g_n(s)| + |g_n(s) - g_m(s)| + |g_m(s) - g_m(x)| < \varepsilon + \varepsilon + \varepsilon = 3\varepsilon$.
This holds for every x ∈ K, so $\|g_n - g_m\|_\infty \le 3\varepsilon$ for every n, m ≥ N , and the claim is proved.
Corollary 649 Let (K, d) be a compact metric space. A subset F of C(K) is $\|\cdot\|_\infty$-compact if and only if it is simultaneously $\|\cdot\|_\infty$-closed, pointwise bounded, and equicontinuous.
Proof Assume that F is $\|\cdot\|_\infty$-compact. Then it is obviously $\|\cdot\|_\infty$-closed and bounded, in particular pointwise bounded. Fix ε > 0. The family {B(f , ε) : f ∈ F } is an open cover of F , hence there exists a finite subcover {B(g, ε) : g ∈ F0 }, where F0 is a nonempty finite subset of F . Each g ∈ F0 is uniformly continuous by the Heine–Cantor Theorem 344 (this result was proved for K a compact subset of R; the
proof can be immediately adapted to the case when K is a compact metric space), so there exists δ > 0 such that if x, y ∈ K and d(x, y) < δ, then |g(x) − g(y)| < ε for every g ∈ F0 . Given f ∈ F we can find g ∈ F0 such that $\|f - g\|_\infty < \varepsilon$. Then, if x, y ∈ K are such that d(x, y) < δ,
$|f(x) - f(y)| \le |f(x) - g(x)| + |g(x) - g(y)| + |g(y) - f(y)| < \varepsilon + \varepsilon + \varepsilon = 3\varepsilon$.
This shows that F is uniformly equicontinuous, in particular equicontinuous. Assume now that F is $\|\cdot\|_\infty$-closed, pointwise bounded, and equicontinuous. Let $\{f_n\}_{n=1}^{\infty}$ be a sequence in F . The Arzelà–Ascoli Theorem 648 ensures that $\{f_n\}_{n=1}^{\infty}$ has a $\|\cdot\|_\infty$-convergent subsequence $\{f_{n_k}\}_{k=1}^{\infty}$. Its $\|\cdot\|_\infty$-limit belongs to F , since F is $\|\cdot\|_\infty$-closed. From Theorem 620 it follows that F is $\|\cdot\|_\infty$-compact.
For some applications of Theorem 648 see Sect. 7.1.5, Proposition 1049, and Exercises 13.368, 13.401, 13.403, and 13.569.
6.11
Metric Fixed Point Theory
In this section, we shall investigate the following problem: Let S be a nonempty set and f be a mapping from S into itself. When does there exist a point s ∈ S such that f (s) = s? (Such a point is said to be a fixed point of the function f .) Many natural problems in Analysis can be solved by answering this question in special situations. For example, if f is a mapping from R into R, finding a solution x to f (x) = r, where r ∈ R is given, is equivalent to finding a fixed point for the mapping g(x) := f (x) − r + x. This applies, in particular, to the problem of finding a zero of a function f : R → R, i.e., an element x ∈ R such that f (x) = 0. Solving an equation of the type f (x) = 0 is not, in general, easy. Even for polynomials p(x) of degree larger than or equal to 5, there cannot exist general explicit formulas by radicals in terms of the coefficients for calculating the solutions of p(x) = 0, i.e., the roots of p. Some particular quadratic equations were solved already by the Babylonians and the general solution is attributed to Arabic algebraists. At the beginning of the XVI century, Italian mathematicians provided solutions to the general cubic (N. Fontana, nicknamed Tartaglia, although maybe inspired by S. del Ferro, and independently G. Cardano, who solved the cubic in full generality) and quartic equations (L. Ferrari, a student of Cardano). The quintic was thought to be solvable by J. L. Lagrange, who positively knew that the methods used for the quadratic, cubic, and quartic would never work for the quintic (1771). C. F. Gauss seems to have been the first to explicitly state his belief that the quintic was not solvable (1799). Finally, it was N. H. Abel (1824) who published the first complete proof of this important result. For a detailed account of the story of the attempts, failures, and successes around those problems see, e.g., [Brw]. Fixed point theory is a huge active area of pure and applied mathematics. For a reference see, e.g., [KS01] and [BeLi00].
Let us start with the basic definition.
Fig. 6.13 The function f (x) := 1 + x from R onto R has no fixed point. The dashed line is the diagonal
Fig. 6.14 A continuous function from [0, 1] into itself has fixed points (Proposition 651)
Definition 650 Let f : S → S be a mapping from a set S into itself. An element s ∈ S is said to be a fixed point of the mapping f if f (s) = s. In order to ensure the existence of fixed points, some condition either on the mapping f or on the set S (or on both the mapping and the set), should be imposed, since an arbitrary mapping from an arbitrary set into itself lacks, in general, fixed points (observe, for example, that a real-valued function f defined on R has a fixed point at x if the graph of f cuts the diagonal at (x, x), so Fig. 6.13 depicts a continuous function from R onto R without fixed points; see, also, e.g., Exercises 13.205, 13.404, and 13.405). On the other hand we have the following result, illustrated by Fig. 6.14. Proposition 651 Let f be a continuous function from [0, 1] into [0, 1]. Then f has a fixed point. Proof If f (0) = 0 or if f (1) = 1, we are done. Otherwise consider the map h(x) := f (x) − x for x ∈ [0, 1]. Note that h(0) > 0 and that h(1) < 0. By the Intermediate Value Theorem 339, there exists c ∈ (0, 1) such that h(c) = 0. This shows the result. Related to Proposition 651, see Exercise 13.205. We shall discuss important extensions of Proposition 651 in Sect. 11.7.3 (for example, Brouwer’s Fixed Point Theorem). Here we continue with further results in the area of fixed points.
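The proof of Proposition 651 is nonconstructive, but the sign change of h(x) := f (x) − x suggests a simple way to locate a fixed point numerically. The following is a minimal sketch using bisection; the function name `fixed_point_by_bisection`, the tolerance, and the test function cos (which maps [0, 1] into itself) are assumptions chosen for illustration, not part of the text.

```python
import math


def fixed_point_by_bisection(f, tol=1e-12):
    """Approximate a fixed point of a continuous f: [0, 1] -> [0, 1].

    Bisection is applied to h(x) = f(x) - x, which satisfies
    h(0) >= 0 and h(1) <= 0 because f takes values in [0, 1].
    """
    lo, hi = 0.0, 1.0
    if f(lo) == lo:
        return lo
    if f(hi) == hi:
        return hi
    # Invariant: h(lo) > 0 and h(hi) < 0, so h has a zero in (lo, hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2
        h_mid = f(mid) - mid
        if h_mid == 0:
            return mid
        if h_mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# cos maps [0, 1] into [cos 1, 1], a subset of [0, 1].
c = fixed_point_by_bisection(math.cos)
```

Bisection only uses continuity, in the spirit of the Intermediate Value Theorem; it finds one fixed point and says nothing about uniqueness.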
6.11.1
The Banach Contraction Principle
One of the most useful results in this area concerns contractions in complete metric spaces. It employs the following idea: For a continuous function f , if we iteratively put xn+1 := f (xn ) (starting with a chosen point x0 ) and can ensure that the sequence $\{x_n\}_{n=0}^{\infty}$ so defined is convergent (say to x), then clearly f (x) = x. For a picture of two different behaviors of the sequences of iterations see Fig. 6.15. A contraction f from a metric space (M, d) into itself is, simply, a C-Lipschitz function f : M → M where 0 ≤ C < 1 (see Definition 443). So, f must satisfy d(f (x), f (y)) ≤ C d(x, y) for any x, y ∈ M. To be precise, we call such a mapping a C-contraction. Given a mapping f : M → M and n ∈ N, denote by f [n] the n-th iterate of f , i.e., the mapping $x \mapsto f(f(\cdots f(x) \cdots))$, where f is applied n times. The following is the key result in metric fixed point theory and it is due to S. Banach.
Theorem 652 (Banach Contraction Principle) Let (M, d) be a nonempty complete metric space and let f : M → M be a C-contraction (for some C ∈ [0, 1)). Then f has a unique fixed point u ∈ M. Moreover, any sequence $\{x_n\}_{n=0}^{\infty}$, where x0 ∈ M and
$x_n := f(x_{n-1})$ for n ∈ N (i.e., $x_n := f^{[n]}(x_0)$ for n ∈ N), (6.16)
converges to u. Additionally, we have
$d(x_n, u) \le \frac{C^n}{1 - C}\, d(x_0, x_1)$, for n ∈ N. (6.17)
Proof We shall prove first uniqueness. Assume that u, v ∈ M are two fixed points of the mapping f . Then d(u, v) = d(f (u), f (v)) ≤ C d(u, v). Since 0 ≤ C < 1, we get d(u, v) = 0, i.e., u = v. To prove existence (and the convergence of the sequence $\{x_n\}_{n=0}^{\infty}$ to the fixed point), observe that $d(x_n, x_{n+1}) = d(f(x_{n-1}), f(x_n)) \le C\, d(x_{n-1}, x_n)$, for n ∈ N. Recursively, we get $d(x_n, x_{n+1}) \le C^n d(x_0, x_1)$ for n ∈ N. Let p, q ∈ N be such that p < q. By the triangle inequality we have $d(x_p, x_q) \le d(x_p, x_{p+1}) + \ldots + d(x_{q-1}, x_q)$. Thus,
$d(x_p, x_q) \le (C^p + \ldots + C^{q-1})\, d(x_0, x_1) = \frac{C^q - C^p}{C - 1}\, d(x_0, x_1) \le \frac{C^p}{1 - C}\, d(x_0, x_1)$. (6.18)
It is clear from (6.18) and from the fact that $C^n \to 0$ as n → ∞ (see Corollary 132) that the sequence $\{x_n\}_{n=0}^{\infty}$ is a Cauchy sequence, hence convergent (to some u ∈ M). Since f is Lipschitz, it is continuous, hence f (xn ) → f (u). Observe that f (xn ) = xn+1 for all n ∈ N, and that xn+1 → u as n → ∞; thus, f (u) = u. This
Fig. 6.15 The graphs (in bold) of a contraction (a), and a noncontraction (b), and the iterations (6.16). The dashed line is the diagonal
proves that u is a fixed point of f and that any starting point x0 generates a sequence via (6.16) that converges to u. The estimate (6.17) follows from (6.18) by letting q → ∞.
Figure 6.15 shows the different behavior of the fixed-point iteration described in (6.16) according to the presence (a) or the absence (b) of the property of being a contraction.
Remark 653
1. The requirement of completeness in Theorem 652 cannot be dropped. For an example, see Exercise 13.404.
2. Theorem 652 is no longer true for strictly metric mappings, i.e., functions f from a metric space (M, d) into itself such that d(f (x), f (y)) < d(x, y) for all x, y ∈ M, x ≠ y. For an example, see Exercise 13.406. However, the result holds true for this kind of function as long as the metric space is compact. This will be proved in Proposition 655.
3. For a situation where the existence of a fixed point is guaranteed, even in the absence of any Lipschitz condition (but retaining continuity), see Proposition 651 and Sect. 11.7.3. ®
The following result gives a slight improvement of the Banach Contraction Principle (Theorem 652).
Corollary 654 Let f be a function from a complete metric space (M, d) into itself. Assume that for some positive integer k the function f [k] is a contraction. Then f has a unique fixed point, and the (unique) fixed point of f [k] is the unique fixed point of f .
Proof Theorem 652 gives that f [k] has a unique fixed point u ∈ M. Then f (u) = f (f [k] (u)) = f [k] (f (u)), hence f (u) is a fixed point of f [k] . The uniqueness of the fixed point of f [k] ensures that f (u) = u, so u is a fixed point of f .
Fig. 6.16 Each fn has a fixed point, f does not
Assume now that v ∈ M is a fixed point of f . So, f (v) = v and, by induction, f [n] (v) = v for n ∈ N. In particular, f [k] (v) = v, so v is a fixed point of f [k] . Again the uniqueness of the fixed point of f [k] gives v = u.
The next proposition gives the announced fixed point result for strictly metric mappings on compact metric spaces (see Remark 653.2). Exercise 13.406 shows that the compactness requirement cannot be dropped.
Proposition 655 Let f be a strictly metric mapping from a compact metric space (K, d) into itself. Then f has a unique fixed point in K.
Proof The mapping f is obviously continuous. So is the mapping ϕ : K → R given by ϕ(x) := d(x, f (x)) for x ∈ K. By Corollary 335, ϕ attains its infimum at some u ∈ K, i.e., d(u, f (u)) = ϕ(u) ≤ ϕ(x) = d(x, f (x)) for all x ∈ K. Since f is strictly metric, if f (u) ≠ u we have ϕ(f (u)) = d(f (u), f [2] (u)) < d(u, f (u)) = ϕ(u), a contradiction. This shows f (u) = u. Assume now that v ∈ K is a fixed point of f . Then, if u ≠ v, d(u, v) = d(f (u), f (v)) < d(u, v), a contradiction. This shows u = v.
We shall see some applications of Fixed Point Theory in Sect. 7.1.6, once the basic theory of the Riemann integral has been developed.
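The iteration (6.16) and the a priori estimate (6.17) of the Banach Contraction Principle can be tried out numerically. The sketch below is an illustration under stated assumptions: the helper name `iterate`, the map cos on the complete space [0, 1] (it maps [0, 1] into itself and, by the Mean Value Theorem, is Lipschitz there with constant C = sin 1 < 1), and the use of the 200th iterate as a stand-in for the exact fixed point u are all choices made here, not taken from the text.

```python
import math


def iterate(f, x0, n):
    """Return [x_0, x_1, ..., x_n] with x_k = f(x_{k-1}), as in (6.16)."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs


C = math.sin(1.0)                  # Lipschitz constant of cos on [0, 1] (assumed setting)
xs = iterate(math.cos, 0.0, 200)
u = xs[-1]                         # numerical stand-in for the fixed point
d01 = abs(xs[0] - xs[1])           # d(x_0, x_1)

# Check the estimate (6.17): d(x_n, u) <= C^n / (1 - C) * d(x_0, x_1).
# The small slack absorbs floating-point rounding.
bounds_hold = all(
    abs(xs[n] - u) <= C**n / (1 - C) * d01 + 1e-12
    for n in range(1, 100)
)
```

The bound (6.17) is computable before the fixed point is known, which is why it can serve as a stopping criterion; in this example the actual error decays faster than the bound predicts.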
6.11.2
Continuity of the Fixed Point
If f is a function from a metric space (M, d) into itself, and f is the limit of a sequence {fn } of functions, each of them having a fixed point un , it is natural to ask whether the sequence {un } converges to a fixed point of f . In general, this is not the case, even if the convergence is uniform and all functions fn are contractions. Figure 6.16 shows this situation.
However, a common Lipschitz constant 0 ≤ C < 1 for all the functions in the sequence {fn } guarantees a positive answer to this question (in the case of complete metric spaces).
Proposition 656 Let (M, d) be a complete metric space and let {fm } be a sequence of contractions with the same Lipschitz constant 0 ≤ C < 1. Let um be the (unique) fixed point of fm , for m ∈ N. Assume that the sequence {fm } pointwise converges to a function f . Then f has a fixed point u and um → u.
Proof Given x, y ∈ M, we have d(fm (x), fm (y)) ≤ C d(x, y) for all m ∈ N. Passing to the limit for m → ∞ we get d(f (x), f (y)) ≤ C d(x, y), so f is a contraction with Lipschitz constant C. It follows from Theorem 652 that f has a (unique) fixed point u. Take x0 := u in Theorem 652 for each function fm . By (6.17) we get, for n = 0,
$d(u, u_m) \le \frac{1}{1 - C}\, d(u, f_m(u)) = \frac{1}{1 - C}\, d(f(u), f_m(u))$, for m ∈ N. (6.19)
Letting m → ∞ we get fm (u) → f (u), and (6.19) shows that um → u. Proposition 656 will be used later on (see Example 690) to show how the local solution to a problem in differential equations depends continuously on the initial conditions.
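The estimate (6.19) can be checked on a toy family. In the sketch below, the maps f_m(x) = x/2 + 1/m on R (all with Lipschitz constant C = 1/2), their pointwise limit f(x) = x/2, and the helper names are assumptions chosen for illustration; exact rational arithmetic keeps the comparison free of rounding.

```python
from fractions import Fraction

C = Fraction(1, 2)   # common Lipschitz constant of all the maps below


def f(x):
    # Pointwise limit of the family f_m; its fixed point is u = 0.
    return x / 2


def f_m(m, x):
    # Hypothetical family of C-contractions converging pointwise to f.
    return x / 2 + Fraction(1, m)


def fixed_point_of_f_m(m):
    # Solving x = x/2 + 1/m gives x = 2/m.
    return Fraction(2, m)


u = Fraction(0)
gaps = [abs(u - fixed_point_of_f_m(m)) for m in range(1, 6)]
bounds = [abs(f(u) - f_m(m, u)) / (1 - C) for m in range(1, 6)]
```

For this family the two sides of (6.19) coincide, so the constant 1/(1 − C) in the estimate cannot be improved in general.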
Chapter 7
Integration
This chapter is devoted to the basics of integration theory, both in the Riemann and in the Lebesgue sense. Lebesgue’s theory is intertwined with the measure theory developed in Chapter 3. This allows for a finer analysis of functions and convergence.
7.1 The Riemann Integral
7.1.1
Introduction
One of the tasks of geometry is to compute the area of a planar object (more particularly, the area enclosed by a closed curve in the plane), and the volume of a three-dimensional object (again, the volume bounded by a closed surface in space). These problems have been addressed from early times, and solutions to particular cases were proposed before the existence of an adequate tool, which became available only much later. However, the basic idea behind the computation of some areas and volumes, present since almost the beginning of “mathematical thinking,” has remained almost the same: To approximate general planar and three-dimensional objects by simple shapes (polygons and polyhedrons, respectively) to a certain degree of accuracy. For example, this was the way Archimedes computed the area enclosed by a parabola in a treatise that bears the title “On the Quadrature of the Parabola” (quadrature was the name given to the process of finding a square of the same area as a given planar figure). Archimedes proceeded to exhaust a parabolic segment by using inscribed triangles. The object of the integral calculus is to provide tools for “automatizing” these calculations, and to put this practice on a solid base, essentially by using the aforementioned idea, already at the origin of the subject. No surprise then that it preceded in time the more elusive differential calculus, to which it later appeared as intimately connected. After all, a shape or a volume are static creatures, while the “rate of change” required a new way of thinking: to “freeze” an instant in time and then to consider infinitely many such “isolated” moments to recompose the (usually too fast to be understood) movement. In terms of the geometry, this
© Springer International Publishing Switzerland 2015 V. Montesinos et al., An Introduction to Modern Analysis, DOI 10.1007/978-3-319-12481-0_7
Fig. 7.1 Approximating the area with “inscribed” rectangles
required the ability to trace a tangent to a curve by selecting a contact point (from infinitely many choices) and a direction (again from infinitely many choices). To realize that the calculation of an area and the drawing of a tangent are related operations is one of the great achievements in this area. Quadrature by exhaustion, when applied to the computation of the area of the nonnegative part of the “subgraph” of a bounded nonnegative function f defined on an interval [a, b] (and most of the practical problems can be reduced to this case), translates into the following systematic procedure: Split the interval [a, b] into a finite number of nonoverlapping subintervals (by using “cut” points a = x0 < x1 < x2 < . . . < xn = b) and build on top of each [xi−1 , xi ] a rectangle having height mi := inf{f (t) : t ∈ [xi−1 , xi ]} (see Fig. 7.1). Adding the area of the rectangles we get
$\sum_{i=1}^{n} m_i (x_i - x_{i-1})$. (7.1)
By increasing the number of cut points, a sort of “limit process” (to be defined later) should produce the sought area of the subgraph. A closely related procedure uses f (ti ) instead of mi as height of the i-th rectangle (see Eq. (7.1)), where ti is a chosen point in [xi−1 , xi ], for i = 1, 2, . . ., n. Adding the area of the rectangles (see Fig. 7.2) we get
$\sum_{i=1}^{n} f(t_i)(x_i - x_{i-1})$, (7.2)
an expression called Riemann sum after the name of B. Riemann. Again, a limit process would provide the right value of the planar surface. These two approaches should produce the same value—called the Riemann inb tegral of f in [a, b], and denoted a f (x) dx—under “normal” circumstances. The study of these and related procedures is the goal of the first part of the present chapter. Another instance in which integration appears is the calculation of an “average.” Suppose we have a bounded nonnegative real-valued function f defined on an interval [a, b]. The reader may think that f depicts the time evolution of his/her income. Our
7.1 The Riemann Integral
341
Fig. 7.2 Approximating the area by using Riemann sums
f
a t1 x1
t 2 x2
t3 x3 t4 x4 t5
b
Fig. 7.3 Two functions giving (b) (but the same value f (a)+f 2 not the same average!)
a
b a
b
wish is to determine the average value—a term whose meaning will be specified along the way—of the function f on [a, b]. If we had to average only n function values, say f (t1 ), . . ., f (tn ), where a ≤ t1 < t2 , . . . < tn ≤ b, the plan is clear. Simply add up the values and then divide by n. We, then, can write fave =
n 1 f (ti ). n i=1
This rough approximation may produce too coarse a result, due to lack of information. For example, assume that we select only two values, say $f(a)$ and $f(b)$. The “average” in this case is $(1/2)f(a) + (1/2)f(b)$, a result that does not convey the idea we are looking for: a glimpse at the two examples in Fig. 7.3 shows that for the two functions depicted this “average” gives the same value; however, there is no doubt which income evolution the reader will choose as more favorable. Of course, a way to mend this is to choose many more points $a \le t_1 < \ldots < t_n \le b$ and weight them by using nonnegative coefficients $w_1, w_2, \ldots, w_n$ adding up to 1, to form
\[
\sum_{i=1}^{n} w_i f(t_i).
\]
A particular way to do this is the following: split the interval $[a, b]$ into a finite collection of nonoverlapping subintervals by using cut points $a = x_0 < x_1 < \ldots < x_n = b$, use as weights the ratios $(x_i - x_{i-1})/(b - a)$ (in this way we have positive weights adding up to 1), and add the evaluations of $f$ at selected points $t_i \in [x_{i-1}, x_i]$, balanced with those weights, to obtain
\[
\frac{1}{b-a} \sum_{i=1}^{n} f(t_i)(x_i - x_{i-1}). \tag{7.3}
\]
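The weighted sum (7.3) can be tried out numerically. The following is only a sketch under stated assumptions: the two “income” profiles `slow` and `fast`, the equally spaced cut points and the choice of midpoints as tags are illustrative and do not come from the text. Both profiles share the endpoint average $(f(a)+f(b))/2$, while their weighted sums differ markedly, as Fig. 7.3 suggests.

```python
def weighted_average(f, cuts, tags):
    """Weighted sum (7.3): the weights are (x_i - x_{i-1}) / (b - a)."""
    a, b = cuts[0], cuts[-1]
    total = 0.0
    for i in range(1, len(cuts)):
        weight = (cuts[i] - cuts[i - 1]) / (b - a)
        total += weight * f(tags[i - 1])
    return total


def uniform_cuts(a, b, n):
    """Cut points a = x_0 < x_1 < ... < x_n = b, equally spaced."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def midpoints(cuts):
    """One tag per subinterval: its midpoint (an arbitrary choice)."""
    return [(cuts[i - 1] + cuts[i]) / 2 for i in range(1, len(cuts))]


# Two hypothetical "income" profiles on [0, 1] with the same endpoint values.
slow = lambda t: t ** 4       # stays low for most of the interval
fast = lambda t: t ** 0.25    # rises quickly

cuts = uniform_cuts(0.0, 1.0, 1000)
tags = midpoints(cuts)
print((slow(0.0) + slow(1.0)) / 2, (fast(0.0) + fast(1.0)) / 2)  # both 0.5
print(weighted_average(slow, cuts, tags))  # close to 1/5
print(weighted_average(fast, cuts, tags))  # close to 4/5
```

The values $1/5$ and $4/5$ are the integrals of $t^4$ and $t^{1/4}$ over $[0, 1]$, in agreement with the remark that the average reduces to an integral divided by $b - a$.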
The “limit” situation should be the right definition of the average of $f$ on the interval $[a, b]$. Comparing Eqs. (7.2) and (7.3) we may observe that the computation of an average finally gets back to the notion of integral, since $f_{\mathrm{ave}} = \frac{1}{b-a}\int_a^b f(x)\,dx$. The average is thus the height of a rectangle having base $[a, b]$ and the same area as the set $\{(x, y) : x \in [a, b],\ 0 \le y \le f(x)\}$ (see again Fig. 7.3, where the position of the horizontal line is the average).

A natural extension of the concept of a Riemann integral is motivated by the need to describe a situation in which some parts of the interval $[a, b]$ are more relevant to the integral than others. A way to proceed is to consider sums of the form
\[
\sum_{i=1}^{n} f(t_i)\bigl(\alpha(x_i) - \alpha(x_{i-1})\bigr),
\]
instead of the usual Riemann sums (7.2), where $\alpha$ is a real-valued function defined on $[a, b]$ that, for technical purposes, is chosen to be of bounded variation. The integral so obtained is called the Riemann–Stieltjes integral of $f$ on the interval $[a, b]$ with respect to the integrator $\alpha$ (named after B. Riemann and the Dutch mathematician T. J. Stieltjes). We shall not pursue this theory in the present text. The reader may consult, e.g., [Ap74].

The Riemann theory focusses on bounded functions defined on bounded intervals. Another extension consists in removing those restrictions, giving way to what are called improper Riemann integrals. This will be considered in Sect. 7.2. A different extension of the Riemann notion is the Lebesgue theory of integration. It allows for working with many more functions than the Riemann theory. Moreover, it also allows for more techniques of integration. This will be considered in Sect. 7.3.
7.1.2 The Definition of the Riemann Integral
Here, we shall study the Riemann integral, named after B. Riemann. A more general theory of the integral (more general in the sense that it is able to integrate more functions) will be developed in Sect. 7.3. Two equivalent approaches to the Riemann integral are commonly used. They are represented in Fig. 7.4. Either we compute the supremum and infimum of the function (getting the solid horizontal lines) or we select a particular value of $f$ (the dashed horizontal line) in a given subinterval, computing in either case the area of the so-formed rectangle and adding the results. The first approach (technically speaking, computing the upper and lower Riemann sums, see Definition 658 below) is generally attributed to J. G. Darboux. The second one (getting the tagged Riemann sums, see Definition 667 below) was the original Riemann approach. We shall start with the former, to prove later that both procedures amount to the same concept and value: the Riemann integral.

Fig. 7.4 Two approaches to the area: Upper-lower sums (solid horizontal lines) and tagged sums (dashed horizontal lines)

In this section, we shall consider bounded real-valued functions defined on a given closed and bounded interval $[a, b]$, where $a < b$ unless stated otherwise. The concept of a partition of an interval $[a, b]$ in $\mathbb{R}$ was already introduced in Sect. 4.5.6. We repeat the definition here and introduce some associated concepts.

Definition 657 Fix a closed and bounded interval $[a, b] \subset \mathbb{R}$, where $a < b$. A partition $P$ of the interval $[a, b]$ is a finite sequence $\{x_i\}_{i=0}^{n}$ such that $a = x_0 < x_1 < x_2 < \ldots < x_n = b$. We write $P = \{a = x_0 < \ldots < x_n = b\}$. The partition $P$ splits the interval $[a, b]$ into subintervals. Set $\Delta_i := [x_{i-1}, x_i]$ for all $i \in \{1, \ldots, n\}$. The norm of the partition $P$, denoted $|P|$, is the number $\max\{\lambda(\Delta_i) : i = 1, 2, \ldots, n\}$, where $\lambda(\cdot)$ denotes the Lebesgue measure (in this case, the length of the interval). The family of all partitions of $[a, b]$ will be denoted by $\mathcal{P}[a, b]$. Given two partitions $P, Q \in \mathcal{P}[a, b]$, we say that $P$ is finer than $Q$ (equivalently, that $Q$ is coarser than $P$, or that $P$ refines $Q$) if, as sets, $Q \subset P$. In this case, we write $P \succ Q$ (equivalently, $Q \prec P$).

Definition 658 Consider a bounded real-valued function $f$ defined on $[a, b]$, and a partition $P := \{a = x_0 < \ldots < x_n = b\}$ of $[a, b]$. By the Riemann lower sum, denoted $L(f, P)$ (the Riemann upper sum, denoted $U(f, P)$), associated to $f$ and to the partition $P$, we understand (see Fig. 7.5)
\[
L(f, P) = \sum_{i=1}^{n} m_i \lambda(\Delta_i), \quad \text{respectively,} \quad U(f, P) = \sum_{i=1}^{n} M_i \lambda(\Delta_i), \tag{7.4}
\]
where $m_i := \inf_{t \in \Delta_i} f(t)$ and $M_i := \sup_{t \in \Delta_i} f(t)$, $i = 1, 2, \ldots, n$.

Fig. 7.5 Upper and lower Riemann sums

Fig. 7.6 The effect on the Riemann lower sum of refining the partition

Remark 659
1. Obviously, given $P \in \mathcal{P}[a, b]$, we have $L(f, P) \le U(f, P)$. It is simple to prove that, given $P, Q \in \mathcal{P}[a, b]$ such that $P \prec Q$, then $L(f, P) \le L(f, Q)$ and $U(f, Q) \le U(f, P)$ (see Fig. 7.6). Indeed, if $\Delta_i$ is a subinterval associated to $P$ that $Q$ splits into some subintervals $I_j, I_{j+1}, \ldots, I_{j+q}$, then $m_i \lambda(\Delta_i) \le \sum_{k=0}^{q} m'_k \lambda(I_{j+k})$, where $m'_k := \inf\{f(x) : x \in I_{j+k}\}$ for $k = 0, 1, \ldots, q$. A similar argument applies to $U$. Thus,
\[
L(f, P) \le L(f, Q) \le U(f, Q) \le U(f, P), \quad \text{if } P \prec Q. \tag{7.5}
\]
2. A consequence of the previous remark is that, if $P, Q \in \mathcal{P}[a, b]$ are arbitrary, then $L(f, P) \le U(f, Q)$. Indeed, we have $L(f, P) \le L(f, P \cup Q) \le U(f, P \cup Q) \le U(f, Q)$.
3. Observe that if $f$ is a real-valued bounded function on $[a, b]$ (say $|f(x)| \le M$ for all $x \in [a, b]$), then $-M\lambda([a, b]) \le L(f, P) \le U(f, P) \le M\lambda([a, b])$ for every $P \in \mathcal{P}[a, b]$. ®

A consequence of Remark 659.3 is that the real numbers $\underline{\int_a^b} f$ and $\overline{\int_a^b} f$ in the next definition do exist. These two numbers are introduced with the obvious purpose of approaching the “area” enclosed by the graph of a function from below and above, respectively. This should be clear from the aspect of the Riemann lower and upper sums (see again Fig. 7.5) and the effect of refining the partition (see Fig. 7.6).

Definition 660 Let $f$ be a bounded real-valued function defined on an interval $[a, b] \subset \mathbb{R}$. The number
\[
\underline{\int_a^b} f := \sup\{L(f, P) : P \in \mathcal{P}[a, b]\}
\]
is called the lower Riemann integral of $f$ on $[a, b]$. Analogously, the number
\[
\overline{\int_a^b} f := \inf\{U(f, P) : P \in \mathcal{P}[a, b]\}
\]
is called the upper Riemann integral of $f$ on $[a, b]$.

Fig. 7.7 The average of the function $x^2$ on the interval $[0, 1]$

Observe that, due to Remark 659.2, we have $\underline{\int_a^b} f \le \overline{\int_a^b} f$. Now the following definition is natural.

Definition 661 Let $f$ be a bounded real-valued function defined on an interval $[a, b] \subset \mathbb{R}$. We say that $f$ is Riemann integrable on $[a, b]$ if
\[
\underline{\int_a^b} f = \overline{\int_a^b} f. \tag{7.6}
\]
In this case, this common value, denoted by $\int_a^b f$, is called the Riemann integral of $f$ on $[a, b]$. The class of Riemann integrable functions on $[a, b]$ is denoted by $\mathcal{R}[a, b]$.

Remark 662 The symbol for the integral of a function $f$ on an interval $[a, b]$ has been fixed as $\int_a^b f$. Sometimes (and this is a notation due to Leibniz) we use, equivalently, $\int_a^b f(x)\,dx$. The name of the variable appearing under the integral sign ($x$ in this case) is irrelevant. ®

Definition 663 Let $[a, b]$ be an interval in $\mathbb{R}$, and let $f \in \mathcal{R}[a, b]$. The real number
\[
f_{\mathrm{ave}} := (b - a)^{-1} \int_a^b f \tag{7.7}
\]
is called the average of $f$ on the interval $[a, b]$. For a discussion of the idea of an average and for a picture that tries to make clear its precise meaning, see Sect. 7.1.1 and Fig. 7.3, respectively. See also the example after Corollary 665 and Fig. 7.7.

Proposition 664 Let $f$ be a bounded real-valued function defined on a closed and bounded interval $[a, b] \subset \mathbb{R}$. Then $f \in \mathcal{R}[a, b]$ if and only if, for every $\varepsilon > 0$, there exists a partition $P \in \mathcal{P}[a, b]$ such that $U(f, P) - L(f, P) < \varepsilon$.
Proof Assume first that $f \in \mathcal{R}[a, b]$. Given $\varepsilon > 0$, we may find $P_1, P_2 \in \mathcal{P}[a, b]$ such that
\[
\int_a^b f - \varepsilon/2 < L(f, P_1) \le \int_a^b f \le U(f, P_2) < \int_a^b f + \varepsilon/2.
\]
Let $P \in \mathcal{P}[a, b]$ be such that $P \succ P_i$ for $i = 1, 2$ (for example, we may take $P := P_1 \cup P_2$). Then, by using Remark 659.1, we get
\[
\int_a^b f - \varepsilon/2 < L(f, P) \le U(f, P) < \int_a^b f + \varepsilon/2,
\]
and the conclusion follows.

Assume now that the given condition holds; i.e., given $\varepsilon > 0$, we can find $P \in \mathcal{P}[a, b]$ such that $U(f, P) - L(f, P) < \varepsilon$. Observe that $L(f, P) \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le U(f, P)$. It follows that $\overline{\int_a^b} f - \underline{\int_a^b} f < \varepsilon$. Since $\varepsilon > 0$ was arbitrary, we get $\underline{\int_a^b} f = \overline{\int_a^b} f$. □
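The criterion of Proposition 664 lends itself to a small experiment. The sketch below is an illustration under stated assumptions, not part of the text: it takes $f(x) = x^2$ on $[-1, 1]$, for which the infimum and supremum on a subinterval are known in closed form (the helpers `sq_inf` and `sq_sup` are hypothetical names), uses exact rational arithmetic, and doubles the number of equally spaced subintervals until $U(f, P) - L(f, P) < \varepsilon$.

```python
from fractions import Fraction


def lower_sum(inf_on, cuts):
    """L(f, P): sum of m_i * (x_i - x_{i-1}); inf_on(u, v) is the inf of f on [u, v]."""
    return sum(inf_on(u, v) * (v - u) for u, v in zip(cuts, cuts[1:]))


def upper_sum(sup_on, cuts):
    """U(f, P): sum of M_i * (x_i - x_{i-1}); sup_on(u, v) is the sup of f on [u, v]."""
    return sum(sup_on(u, v) * (v - u) for u, v in zip(cuts, cuts[1:]))


# Exact inf and sup of f(x) = x**2 on a subinterval [u, v] (hypothetical helpers).
def sq_inf(u, v):
    return Fraction(0) if u <= 0 <= v else min(u * u, v * v)


def sq_sup(u, v):
    return max(u * u, v * v)


def uniform_cuts(a, b, n):
    """Equally spaced cut points with exact rational coordinates."""
    return [a + (b - a) * Fraction(i, n) for i in range(n + 1)]


def partition_with_gap_below(eps, a=Fraction(-1), b=Fraction(1)):
    """Double the number of subintervals until U - L < eps (criterion of Prop. 664)."""
    n = 1
    while True:
        cuts = uniform_cuts(a, b, n)
        low, up = lower_sum(sq_inf, cuts), upper_sum(sq_sup, cuts)
        if up - low < eps:
            return n, low, up
        n *= 2


n, low, up = partition_with_gap_below(Fraction(1, 100))
print(n, float(low), float(up))  # low <= 2/3 <= up, with up - low < 1/100
```

Each doubling produces a refinement of the previous partition, so by (7.5) the lower sums do not decrease and the upper sums do not increase along the loop.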
Corollary 665 Let $f$ be a real-valued bounded function defined on a closed and bounded interval $[a, b]$ in $\mathbb{R}$. Then $f \in \mathcal{R}[a, b]$ if and only if there exists a sequence $\{P_n\}_{n=1}^{\infty}$ of partitions of $[a, b]$ such that $\lim_{n\to\infty} L(f, P_n) = \lim_{n\to\infty} U(f, P_n)$ $(= l)$. If this is the case, then $\int_a^b f = l$.

Proof Assume first that $f \in \mathcal{R}[a, b]$. Thanks to Proposition 664, we can find, for $n \in \mathbb{N}$, a partition $P_n \in \mathcal{P}[a, b]$ such that $U(f, P_n) - L(f, P_n) < 1/n$. Without loss of generality, we may assume that $P_1 \prec P_2 \prec \ldots \prec P_n \prec \ldots$ (replace $P_n$ by $P_1 \cup \ldots \cup P_n$; by Remark 659.1, this does not increase $U(f, P_n) - L(f, P_n)$). It follows that $\{U(f, P_n)\}_{n=1}^{\infty}$ is a decreasing sequence bounded below by $L(f, P_1)$ (see Remark 659.2), hence convergent by Theorem 135. A similar argument proves that the sequence $\{L(f, P_n)\}_{n=1}^{\infty}$ also converges. Obviously, $\lim_{n\to\infty} L(f, P_n) = \lim_{n\to\infty} U(f, P_n)$.

Assume now that the condition holds for a sequence $\{P_n\}_{n=1}^{\infty}$. Since
\[
L(f, P_n) \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le U(f, P_n) \quad \text{for all } n \in \mathbb{N},
\]
we get $\underline{\int_a^b} f = \overline{\int_a^b} f = l$, and the conclusion follows. □
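The sequence criterion of Corollary 665 can be checked with exact arithmetic on $f(x) = x^2$ over $[0, 1]$ and the equally spaced partitions $P_n := \{0 < 1/n < 2/n < \ldots < 1\}$, the example treated in the text right after this point. The sketch below is an illustration only; the function name `lower_upper_square` is a hypothetical choice. It compares the sums with the closed form obtained from $\sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6$.

```python
from fractions import Fraction


def lower_upper_square(n):
    """Exact L(f, P_n) and U(f, P_n) for f(x) = x**2 and P_n = {0 < 1/n < ... < 1}."""
    h = Fraction(1, n)
    # f is increasing on [0, 1], so the inf (sup) sits at the left (right) endpoint.
    low = sum(h * (k * h) ** 2 for k in range(0, n))
    up = sum(h * (k * h) ** 2 for k in range(1, n + 1))
    return low, up


for n in (10, 100, 1000):
    low, up = lower_upper_square(n)
    # Closed form of the upper sum: (n + 1)(2n + 1) / (6 n^2).
    assert up == Fraction((n + 1) * (2 * n + 1), 6 * n * n)
    # The two sums differ by the single term (1/n) * f(1) = 1/n.
    assert up - low == Fraction(1, n)
    print(n, float(low), float(up))
```

Both columns approach $1/3$, and the gap $U(f, P_n) - L(f, P_n) = 1/n$ tends to zero, which is exactly what Corollary 665 requires.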
As a first instructive example, consider the function $f(x) = x^2$ defined on the interval $[0, 1]$. Given $n \in \mathbb{N}$, set $P_n := \{0 < 1/n < 2/n < \ldots < 1\}$. Obviously we have
\[
L(f, P_n) = \frac{1}{n}\,0^2 + \frac{1}{n}\left(\frac{1}{n}\right)^2 + \ldots + \frac{1}{n}\left(\frac{n-1}{n}\right)^2 = \frac{1}{n^3}\sum_{k=0}^{n-1} k^2,
\]
\[
U(f, P_n) = \frac{1}{n}\left(\frac{1}{n}\right)^2 + \frac{1}{n}\left(\frac{2}{n}\right)^2 + \ldots + \frac{1}{n}\left(\frac{n}{n}\right)^2 = \frac{1}{n^3}\sum_{k=1}^{n} k^2.
\]
Now, recall that $\sum_{k=1}^{n} k^2 = (1/6)\,n(n + 1)(2n + 1)$, as can be readily proved by induction (see Exercise 13.4). Then
\[
L(f, P_n) \to \frac{1}{3}, \quad \text{and} \quad U(f, P_n) \to \frac{1}{3}, \quad \text{as } n \to \infty.
\]
Therefore, by Corollary 665, we have that $f \in \mathcal{R}[0, 1]$ and $\int_0^1 f = 1/3$ (of course, we will later develop a technique to obtain the value of this integral easily). In particular, $f_{\mathrm{ave}} = 1/3$. The geometric meaning of the average of a function on an interval should be clear from Fig. 7.7, where the graph of the function $x^2$ on $[0, 1]$ is depicted.

As a second instructive example, consider the function $f : [0, 1] \to \mathbb{R}$ defined by
\[
f(x) := \begin{cases} 0 & \text{if } x = 0, \\ 1 & \text{otherwise.} \end{cases}
\]
Consider an arbitrary partition $P := \{0 = x_0 < x_1 < \ldots < x_n = 1\}$ of $[0, 1]$. We have $L(f, P) = 1 - x_1$, and $U(f, P) = 1$. Note that $x_1$ close to 0 gives a lower sum close to 1. Thus, by Corollary 665, $\underline{\int_0^1} f = 1 = \overline{\int_0^1} f$, and the function $f$ turns out to be Riemann integrable on $[0, 1]$, with Riemann integral 1. Note that the function is not continuous at one point, the origin, and that, essentially, this discontinuity did not matter either in the question of integrability of the function or in the value of the integral. The reader may grasp how to deal with similar situations (the presence of a finite number of discontinuities) in Riemann integration (see Proposition 672). The case of an infinite number of discontinuities is more delicate, and shall be considered in due course (see Theorem 777).

Despite what we said about choosing, for efficiency, not necessarily equally spaced intervals in evaluating $L(f, P)$ and $U(f, P)$ for approximation purposes, numerical analysts usually try first to use “equally spaced” points for partitions (a “blind” procedure that, although certainly not too elaborate, at least claims simplicity). To show that this approach is justified is the subject of the next result. Given a closed and bounded interval $[a, b]$, by an equally spaced partition $P \in \mathcal{P}[a, b]$ we mean $P := \{a = x_0 < x_1 < \ldots < x_n = b\}$ such that $x_i - x_{i-1} = (b - a)/n$ for all $i = 1, 2, \ldots, n$.

Proposition 666 Let $f$ be a bounded real-valued function defined on a closed and bounded interval $[a, b]$ in $\mathbb{R}$. Then $f \in \mathcal{R}[a, b]$ if and only if, given $\varepsilon > 0$, there exists an equally spaced partition $P$ of $[a, b]$ such that $U(f, P) - L(f, P) < \varepsilon$.

Proof Due to Proposition 664, it is enough to prove the “only if” part. Without loss of generality we may assume that $a = 0$ and $b = 1$. Let $f \in \mathcal{R}[0, 1]$. By Proposition 664, given $\varepsilon > 0$ there exists $P = \{0 = x_0 < x_1 < \ldots < x_n = 1\} \in \mathcal{P}[0, 1]$ such that $U(f, P) - L(f, P) < \varepsilon$.
Our goal is to produce an equally spaced partition $\widehat{P} \in \mathcal{P}[0, 1]$ such that $U(f, \widehat{P}) - L(f, \widehat{P}) < \varepsilon$. Let $\delta := \varepsilon - (U(f, P) - L(f, P))$ $(> 0)$, $M := \sup\{|f(x)| : x \in [0, 1]\}$ (we may assume $M > 0$), and put $r := \delta/(8M(n + 1))$. “Shrink” each interval $\Delta_i$ associated to $P$ by increasing its left endpoint by less than $r$ and decreasing its right endpoint by less than $r$, to obtain a smaller subinterval $\widetilde{\Delta}_i$ whose endpoints are both rational numbers. Observe that $\widetilde{M}_i - \widetilde{m}_i \le M_i - m_i$ for all $i = 1, 2, \ldots, n$, where $m_i, M_i$ (respectively, $\widetilde{m}_i, \widetilde{M}_i$) denote the infimum and supremum of $f$ on $\Delta_i$ (respectively, $\widetilde{\Delta}_i$). The family $\{\widetilde{\Delta}_i : i = 1, 2, \ldots, n\}$ may fail to be a partition of $[0, 1]$. We obtain a partition (call it $\widetilde{P}$) by adding the small intervals $\{\Delta'_j\}_{j=1}^{m}$ needed. Note that $m \le n + 1$, and that each $\Delta'_j$ has length less than $2r$. So we have
\[
U(f, \widetilde{P}) - L(f, \widetilde{P}) \le \sum_{i=1}^{n} (\widetilde{M}_i - \widetilde{m}_i)\lambda(\widetilde{\Delta}_i) + \sum_{j=1}^{m} 2M\lambda(\Delta'_j) \le U(f, P) - L(f, P) + \delta/2 < \varepsilon.
\]
Since all intervals in $\widetilde{P}$ now have rational endpoints, we can obtain an equally spaced $\widehat{P} \in \mathcal{P}[0, 1]$ such that $\widehat{P} \succ \widetilde{P}$. Indeed, if $\widetilde{P} := \{0 < p_1/q_1 < \ldots < p_s/q_s = 1\}$, where all $p_i$ and $q_i$ are natural numbers, let $q := \operatorname{lcm}(q_1, \ldots, q_s)$, so that every $p_i/q_i$ is an integer multiple of $1/q$. Clearly, the partition $\widehat{P} \in \mathcal{P}[0, 1]$ of $(1/q)$-spaced points satisfies $\widehat{P} \succ \widetilde{P}$. We may then use (7.5) to ensure that $U(f, \widehat{P}) - L(f, \widehat{P}) < \varepsilon$. □

We start now on the second path to the Riemann integral announced in the introduction, and prove its equivalence to the Darboux approach. Until then, let us agree that $f$ has a Riemann integral if it does by the Darboux approach. Certainly, it is often convenient, especially from the computational point of view, to evaluate the Riemann sums approaching the integral by computing values of $f$ at chosen points in the intervals $\Delta_i$ defined by partitions $P \in \mathcal{P}[a, b]$. To show that this can be done is the purpose of Theorem 669 below.

A partition $P := \{a = x_0 < x_1 < \ldots < x_n = b\} \in \mathcal{P}[a, b]$, together with a selection $\{t_1, t_2, \ldots, t_n\}$ of points (called tags), $t_i \in [x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$, is called a tagged partition of $[a, b]$, and is denoted by $(P, \{t_i\})$. We say that a tagged partition $(P, \{t_i\})$ is finer than a partition $P_0$ whenever $P_0 \prec P$. The following concept is due to B. Riemann.

Definition 667 Consider a bounded real-valued function $f$ defined on an interval $[a, b] \subset \mathbb{R}$. A Riemann sum for $f$, denoted $R(f, P, \{t_i\})$, over a tagged partition $(P := \{a = x_0 < x_1 < \ldots < x_n = b\}, \{t_i\})$ of $[a, b]$, is defined by
\[
R(f, P, \{t_i\}) = f(t_1)\lambda(\Delta_1) + f(t_2)\lambda(\Delta_2) + \cdots + f(t_n)\lambda(\Delta_n),
\]
where $\Delta_i = [x_{i-1}, x_i]$, for $i = 1, 2, \ldots, n$.

Remark 668 Definition 667 could have been given for any real-valued function $f$ defined on $[a, b]$, even if the function is not bounded there. The purpose of introducing Riemann sums is to define Riemann integrability and the Riemann integral by using this device instead of the lower and upper sums defined above. It will turn out (see Remark 670) that the existence of a real number that can be reasonably understood as a candidate for the integral in terms of Riemann sums forces the function to be bounded. ®
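Definition 667 translates directly into code. The sketch below is a minimal illustration under stated assumptions: the partitions and tags are drawn at random (the helper `random_tagged_partition` and the seed are arbitrary choices, not something the text prescribes), and the test function is $f(x) = x^2$ on $[0, 1]$, whose integral $1/3$ was computed above. It prints the norm $|P|$ next to the distance between the Riemann sum and $1/3$.

```python
import random


def riemann_sum(f, cuts, tags):
    """R(f, P, {t_i}) = sum of f(t_i) * (x_i - x_{i-1}), with t_i in [x_{i-1}, x_i]."""
    assert len(tags) == len(cuts) - 1
    assert all(u <= t <= v for u, t, v in zip(cuts, tags, cuts[1:]))
    return sum(f(t) * (v - u) for u, t, v in zip(cuts, tags, cuts[1:]))


def random_tagged_partition(a, b, n, rng):
    """A partition with n subintervals and one random tag in each (illustrative helper)."""
    inner = sorted(rng.uniform(a, b) for _ in range(n - 1))
    cuts = [a] + inner + [b]
    tags = [rng.uniform(u, v) for u, v in zip(cuts, cuts[1:])]
    return cuts, tags


def norm(cuts):
    """|P|: the length of the longest subinterval."""
    return max(v - u for u, v in zip(cuts, cuts[1:]))


rng = random.Random(0)
f = lambda x: x * x          # its integral over [0, 1] is 1/3
for n in (10, 100, 10_000):
    cuts, tags = random_tagged_partition(0.0, 1.0, n, rng)
    print(n, norm(cuts), abs(riemann_sum(f, cuts, tags) - 1 / 3))
```

For this particular $f$ the error is at most $2|P|$, because $|x^2 - y^2| \le 2|x - y|$ on $[0, 1]$; the tags play no role in that bound, which is the phenomenon made precise in Theorem 669 (ii).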
For a geometric interpretation of a Riemann sum, see Fig. 7.2 above. If the function $f$ is continuous on $[a, b]$ then the infimum $m_i$ (the supremum $M_i$) is attained in each subinterval $\Delta_i$ at some $c_i \in \Delta_i$ (respectively, $d_i \in \Delta_i$). Thus, the Riemann lower (respectively, upper) sum appears as a Riemann sum, precisely $L(f, P) = R(f, P, \{c_i\})$ (respectively, $U(f, P) = R(f, P, \{d_i\})$).

Theorem 669 Let $f$ be a real-valued function defined on a closed and bounded interval $[a, b]$ in $\mathbb{R}$. Then, the following are equivalent:
(i) $f \in \mathcal{R}[a, b]$.
(ii) There exists a real number $I$ with the property that for every $\varepsilon > 0$ we can find $\delta > 0$ such that for every tagged partition $(P, \{t_i\})$ with $|P| < \delta$, we have $|R(f, P, \{t_i\}) - I| < \varepsilon$.
(iii) There exists a real number $I$ with the property that for every $\varepsilon > 0$ we can find a partition $P_\varepsilon \in \mathcal{P}[a, b]$ such that for every tagged partition $(P, \{t_i\})$ finer than $P_\varepsilon$, we have $|R(f, P, \{t_i\}) - I| < \varepsilon$.
Moreover, if one (and then all) of the conditions above holds, then $I = \int_a^b f$.

Theorem 669 shows that both approaches to the Riemann integral lead to the same class of integrable functions and to the same value of the integral.

Remark 670 Note that if a function $f$ satisfies (iii) in Theorem 669 then it must be bounded on $[a, b]$. Indeed, assume on the contrary that $f$ is unbounded on $[a, b]$. Then we can find a sequence $\{x_n\}_{n=1}^{\infty}$ of distinct points in $[a, b]$ such that $|f(x_n)| > n$ for all $n \in \mathbb{N}$. By Theorem 147, the sequence $\{x_n\}_{n=1}^{\infty}$ has a subsequence that converges to a point $x_0 \in [a, b]$. By (iii) there exist $I \in \mathbb{R}$ and a partition $P_1 \in \mathcal{P}[a, b]$ such that for every tagged partition $(P, \{t_i\})$ finer than $P_1$ we have $|R(f, P, \{t_i\}) - I| < 1$. At least one of the intervals $\Delta_i$ of $P_1$ contains infinitely many points of $\{x_n\}_{n=1}^{\infty}$. Since the tag $t_i$ can be chosen arbitrarily in $\Delta_i$ (in particular, equal to $x_n$ for $n$ as large as we wish, while the other tags are kept fixed), we immediately reach a contradiction. ®

Proof of Theorem 669 (i)⇒(ii): Assume that $f \in \mathcal{R}[a, b]$. Let $M := \sup\{|f(x)| : x \in [a, b]\}$. If $M = 0$, then $f \equiv 0$, and certainly $f$ satisfies (ii). Assume, then, that $M > 0$.
Fix $\varepsilon > 0$. Then we can find a partition $P = \{a = y_0 < \ldots < y_m = b\}$ of $[a, b]$ such that $U(f, P) - L(f, P) < \varepsilon$. Let us choose $\delta > 0$ according to the following rule (here, $\Delta_j = [y_{j-1}, y_j]$ for $j = 1, 2, \ldots, m$):
\[
\begin{cases}
\delta < \min\left\{\dfrac{\varepsilon}{2M(m-1)},\ \min\{\lambda(\Delta_j) : j = 1, 2, \ldots, m\}\right\} & \text{if } m > 1, \\[2mm]
\delta < 1 & \text{if } m = 1.
\end{cases}
\]
Let now $Q := \{a = x_0 < \ldots < x_n = b\}$ be a partition of $[a, b]$ such that $|Q| < \delta$, and let $t_i \in [x_{i-1}, x_i]$ for $i = 1, 2, \ldots, n$. If $[x_{i-1}, x_i] \subset [y_{j-1}, y_j]$ for some $j \in \{1, 2, \ldots, m\}$, then $f(t_i) \in [m_j, M_j]$, where $m_j := \inf\{f(x) : x \in \Delta_j\}$, and $M_j := \sup\{f(x) : x \in \Delta_j\}$.
Assume first that this happens for every $i = 1, 2, \ldots, n$. Then $L(f, P) \le R(f, Q, \{t_i\}) \le U(f, P)$. If, on the contrary, some subinterval $[x_{i-1}, x_i]$ is not contained in any $[y_{j-1}, y_j]$, then, since $|Q| < \delta$, it intersects at most two such subintervals, so we have, for some $j = j(i) \in \{1, 2, \ldots, m - 1\}$, $y_{j-1} < x_{i-1} < y_j < x_i < y_{j+1}$, and this may happen at most $(m - 1)$ times. Put, for such $i$, $f(t_i)(x_i - x_{i-1}) = f(t_i)(x_i - y_j) + f(t_i)(y_j - x_{i-1})$. It follows then that $R(f, Q, \{t_i\}) < U(f, P) + 2(m - 1)M\delta < U(f, P) + \varepsilon$. Analogously, $R(f, Q, \{t_i\}) > L(f, P) - 2(m - 1)M\delta > L(f, P) - \varepsilon$. This proves that $|R(f, Q, \{t_i\}) - I| < 3\varepsilon$, where $I = \int_a^b f$.

(ii)⇒(iii): Assume that $I$ exists as in (ii). Let $\varepsilon > 0$ and find $\delta > 0$ accordingly. Fix a partition $P_\varepsilon \in \mathcal{P}[a, b]$ such that $|P_\varepsilon| < \delta$. Now, given a tagged partition $(P, \{t_i\})$ finer than $P_\varepsilon$, we obviously have $|P| < \delta$, hence $|R(f, P, \{t_i\}) - I| < \varepsilon$.

(iii)⇒(i): By Remark 670, if a function $f$ satisfies (iii) then it is bounded. Given $\varepsilon > 0$ find $P_\varepsilon \in \mathcal{P}[a, b]$ such that for every tagged partition $(P, \{t_i\})$ finer than $P_\varepsilon$ we have $|R(f, P, \{t_i\}) - I| < \varepsilon$. In particular we have $|R(f, P_\varepsilon, \{t_i\}) - I| < \varepsilon$ for all choices of tags $t_i \in \Delta_i$, where $\{\Delta_i\}_{i=1}^{n}$ is the collection of subintervals of the partition $P_\varepsilon$. It is clear then that we have $|L(f, P_\varepsilon) - I| \le \varepsilon$ and $|U(f, P_\varepsilon) - I| \le \varepsilon$. This shows that $|U(f, P_\varepsilon) - L(f, P_\varepsilon)| \le 2\varepsilon$. Since $\varepsilon > 0$ was taken arbitrarily, this shows, by Proposition 664, that $f \in \mathcal{R}[a, b]$.

It should be clear from the proof of the implications that in (ii) and (iii) the same real number $I$ works, and that it coincides with $\int_a^b f$. □

A Cauchy criterion for Riemann integrability is given in Exercise 13.409.
7.1.3 Properties of the Integral
Proposition 671 Let $f, g \in \mathcal{R}[a, b]$, and let $\alpha, \beta \in \mathbb{R}$. Then
(i) $\alpha f + \beta g \in \mathcal{R}[a, b]$, and $\int_a^b (\alpha f + \beta g) = \alpha \int_a^b f + \beta \int_a^b g$.
(ii) For any $c \in [a, b]$, we have $f|_{[a,c]} \in \mathcal{R}[a, c]$, and $f|_{[c,b]} \in \mathcal{R}[c, b]$; moreover, $\int_a^c f + \int_c^b f = \int_a^b f$.
(iii) If $f(x) \le g(x)$ for all $x \in [a, b]$, we have $\int_a^b f \le \int_a^b g$.
(iv) $|f| \in \mathcal{R}[a, b]$, and $\left|\int_a^b f\right| \le \int_a^b |f|$, where $|f|(x) := |f(x)|$ for all $x \in [a, b]$.
(v) $\max\{f, g\} \in \mathcal{R}[a, b]$ and $\min\{f, g\} \in \mathcal{R}[a, b]$.
(vi) $f \cdot g \in \mathcal{R}[a, b]$.
Let $f : [a, b] \to \mathbb{R}$ be a function.
(vii) If $c \in [a, b]$ and simultaneously $f \in \mathcal{R}[a, c]$ and $f \in \mathcal{R}[c, b]$, then $f \in \mathcal{R}[a, b]$, and $\int_a^b f = \int_a^c f + \int_c^b f$.

Proof (i) Observe first that given a partition $P := \{a = x_0 < x_1 < \ldots < x_n = b\}$ of $[a, b]$, $\Delta_i := [x_{i-1}, x_i]$ and tags $z_i \in \Delta_i$ for $i = 1, 2, \ldots, n$, we have $R(\alpha f + \beta g, P, \{z_i\}) = \alpha R(f, P, \{z_i\}) + \beta R(g, P, \{z_i\})$. Therefore,
\[
\left| \alpha \int_a^b f + \beta \int_a^b g - R(\alpha f + \beta g, P, \{z_i\}) \right|
\le |\alpha| \left| \int_a^b f - R(f, P, \{z_i\}) \right| + |\beta| \left| \int_a^b g - R(g, P, \{z_i\}) \right|. \tag{7.8}
\]
Using Theorem 669, Eq. (7.8) proves that $\alpha f + \beta g \in \mathcal{R}[a, b]$, and that
\[
\int_a^b (\alpha f + \beta g) = \alpha \int_a^b f + \beta \int_a^b g.
\]
(ii) Since $f \in \mathcal{R}[a, b]$, given $\varepsilon > 0$ we can find $P \in \mathcal{P}[a, b]$ such that $U(f, P) - L(f, P) < \varepsilon$. Let $P' := P \cup \{c\}$. Then $P'$ can be split in two sets, $P_1 \in \mathcal{P}[a, c]$, and $P_2 \in \mathcal{P}[c, b]$. Note, by using (7.5), that $L(f, P) \le L(f, P') \le U(f, P') \le U(f, P)$. Then $U(f, P_1) - L(f, P_1) + U(f, P_2) - L(f, P_2) = U(f, P') - L(f, P') < \varepsilon$. This implies that $U(f, P_1) - L(f, P_1) < \varepsilon$ and $U(f, P_2) - L(f, P_2) < \varepsilon$. As a result, $f|_{[a,c]} \in \mathcal{R}[a, c]$ and $f|_{[c,b]} \in \mathcal{R}[c, b]$. Since $\int_a^b f$ is the unique value in $[L(f, P), U(f, P)]$ for all the partitions $P$ of $[a, b]$ (and the same holds for any other subinterval), we easily get $\int_a^c f + \int_c^b f = \int_a^b f$.
(iii) The result follows from (i) and the fact that if a function $h \in \mathcal{R}[a, b]$ satisfies $h(x) \ge 0$ for all $x \in [a, b]$, then clearly $L(h, P) \ge 0$ for all $P \in \mathcal{P}[a, b]$ (apply this to $h := g - f$).
(iv) Given $x, y \in [a, b]$, we have $\bigl||f|(x) - |f|(y)\bigr| = \bigl||f(x)| - |f(y)|\bigr| \le |f(x) - f(y)|$. By taking suprema on a subinterval $\Delta_i$ of a partition $P \in \mathcal{P}[a, b]$ we get $\sup_{\Delta_i} |f| - \inf_{\Delta_i} |f| \le \sup_{\Delta_i} f - \inf_{\Delta_i} f$. Adding up, we get $U(|f|, P) - L(|f|, P) \le U(f, P) - L(f, P)$, and so $|f| \in \mathcal{R}[a, b]$ by Proposition 664. Since $f(x) \le |f(x)|$ and $-f(x) \le |f(x)|$ for all $x \in [a, b]$, we get $\int_a^b f \le \int_a^b |f|$ and $-\int_a^b f \le \int_a^b |f|$; the conclusion follows.
(v) Observe that $\max\{f, g\} = \frac{1}{2}(f + g) + \frac{1}{2}|f - g|$, and $\min\{f, g\} = \frac{1}{2}(f + g) - \frac{1}{2}|f - g|$. The result follows then from (i) and (iv).
(vi) Note that $f(x) \cdot g(x) = (1/4)\bigl[(f(x) + g(x))^2 - (f(x) - g(x))^2\bigr]$ for all $x \in [a, b]$. Since $f + g$ and $f - g$ are Riemann integrable functions (by (i) above), it will be enough to prove that the square of a real-valued Riemann integrable function is Riemann integrable. By (iv) above, it suffices to show this for a nonnegative Riemann integrable function. So assume that $f$ is a Riemann integrable function on $[a, b]$ such that $f(x) \ge 0$ for all $x \in [a, b]$. The function $f$ is bounded, say $f(x) \le M$ on $[a, b]$. Given $x, y \in [a, b]$, we have $|f^2(x) - f^2(y)| = |f(x) - f(y)|\,(f(x) + f(y)) \le 2M|f(x) - f(y)|$. This shows that, given a partition $P \in \mathcal{P}[a, b]$, $U(f^2, P) - L(f^2, P) \le 2M(U(f, P) - L(f, P))$, and the conclusion follows from Proposition 664.
(vii) Define two functions $g$ and $h$ on $[a, b]$ as
\[
g(x) := \begin{cases} f(x) & \text{if } x \in [a, c], \\ 0 & \text{otherwise,} \end{cases}
\qquad
h(x) := \begin{cases} 0 & \text{if } x \in [a, c], \\ f(x) & \text{otherwise.} \end{cases}
\]
Obviously, $g, h \in \mathcal{R}[a, b]$. Item (i) above shows that $(f =)\ g + h \in \mathcal{R}[a, b]$. □

The next result shows that the property of being Riemann integrable and the value of the Riemann integral are unaffected by changing the value of a Riemann integrable function at a finite number of points. That the new function so obtained is again Riemann integrable may also be deduced from the general criterion for Riemann integrability (see Theorem 777). Note that the result does not hold for changing values at a countable number of points (see Remark 673.2).

Proposition 672 The characteristic function of a finite subset of a closed and bounded interval is Riemann integrable, and its integral is zero. In particular, if $f, g$ are two real-valued bounded functions defined on a closed and bounded interval $[a, b] \subset \mathbb{R}$, $f \in \mathcal{R}[a, b]$, and $g = f$ up to a finite number of points in $[a, b]$, then $g \in \mathcal{R}[a, b]$ and $\int_a^b g = \int_a^b f$.

Proof Let $S$ be a nonempty finite subset of $[a, b]$, and let $m := \operatorname{card} S$. Given $\varepsilon > 0$, let $P := \{a = x_0 < \ldots < x_n = b\} \in \mathcal{P}[a, b]$ be such that $|P| < \varepsilon$, and let $I := \{i \in \{1, 2, \ldots, n\} : \Delta_i \cap S \ne \emptyset\}$. Note that $\operatorname{card} I \le 2m$. Then,
\[
0 = L(\chi_S, P) \le U(\chi_S, P) = \sum_{i=1}^{n} \sup_{x \in \Delta_i} \chi_S(x)\,\lambda(\Delta_i) = \sum_{i \in I} \lambda(\Delta_i) < 2m\varepsilon.
\]
Since $\varepsilon > 0$ was arbitrary, we get $\underline{\int_a^b} \chi_S = \overline{\int_a^b} \chi_S = 0$, and the conclusion follows. The last sentence of the statement is a consequence of this and (i) in Proposition 671. □

Remark 673
1. According to Proposition 672, two bounded functions on a closed and bounded interval that differ only at a finite number of points are indistinguishable from the point of view of the Riemann theory of integration. Results in this theory that are stated for a certain function $f$ hold, then, for all functions that differ from $f$ at a finite number of points. The reader should think, then, of a Riemann integrable function as a class of functions, the difference of two elements in the class being a function that vanishes outside a finite set.
2. Proposition 672 does not hold in general for functions that differ at a countably infinite number of points. For example, the Dirichlet function $D : [0, 1] \to \mathbb{R}$ (see Definition 296) differs from the constant function 1 at the points of the set $\mathbb{Q} \cap [0, 1]$ (a countable set). However, $D$ is not Riemann integrable, as follows from considering $U(D, P)$ and $L(D, P)$ for every $P \in \mathcal{P}[0, 1]$. Indeed, $U(D, P) = 1$ and $L(D, P) = 0$ for all $P \in \mathcal{P}[0, 1]$. Another argument, based on the criterion for Riemann integrability (Theorem 777), will be given in Remark 782.4. ®

Proposition 674 below shows that continuous functions on a closed and bounded interval are Riemann integrable (an extension of this result is presented in Corollary 676). A much more general statement concerning bounded functions on closed and bounded intervals is true: such a function is Riemann integrable if and only if the set of points where it is discontinuous is null. This is the content of the characterization of Riemann integrable functions in Theorem 777 (see also Remark 782). The proof of Proposition 674 below (and its Corollaries 675 and 676) is, however, considerably easier, so we formulate and prove this statement here (a further extension will be provided in Remark 699).

Proposition 674 Every real-valued continuous function defined on a closed and bounded interval in $\mathbb{R}$ is Riemann integrable.

Proof Any continuous function $f$ defined on $[a, b]$ is uniformly continuous there (Theorem 344). Thus, for every $\varepsilon > 0$ there exists $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ whenever $x, y \in [a, b]$ and $|x - y| < \delta$. Choose a partition $P := \{a = x_0 < x_1 < \ldots < x_n = b\}$ of $[a, b]$ so that $|P| < \delta$. Observe that for each interval $\Delta_i := [x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$, we have $\sup_{x \in \Delta_i} f(x) - \inf_{x \in \Delta_i} f(x) \le \varepsilon$. Then
\[
U(f, P) - L(f, P) = \sum_{i=1}^{n} \Bigl( \sup_{x \in \Delta_i} f(x) - \inf_{x \in \Delta_i} f(x) \Bigr) \lambda(\Delta_i) \le \varepsilon(b - a).
\]
The result follows from Proposition 664. □
Corollary 675 Let $f$ be a continuous bounded function on an open and bounded interval $(a, b)$. Then, any extension $\widetilde{f}$ of $f$ to $[a, b]$ is Riemann integrable on $[a, b]$, and the Riemann integral $\int_a^b \widetilde{f}$ is independent of the particular extension.

Proof Let $|\widetilde{f}|$ be bounded by $M$ $(> 0)$. Given $\varepsilon > 0$ such that $\varepsilon/(4M) < (b - a)/2$, choose $a_1$ and $b_1$ such that $a < a_1 < a + \frac{\varepsilon}{4M}$ and $b - \frac{\varepsilon}{4M} < b_1 < b$. Since $f$ is continuous on $[a_1, b_1]$, we can choose, by the proof of Proposition 674, a partition $P$ of $[a_1, b_1]$ such that $U(f, P) - L(f, P) < \varepsilon$. Add to $P$ the points $a$ and $b$ to form a partition $\widetilde{P}$ of $[a, b]$. Each of the two added subintervals has length less than $\varepsilon/(4M)$, and $\widetilde{f}$ oscillates there by at most $2M$. Thus, we have $U(\widetilde{f}, \widetilde{P}) - L(\widetilde{f}, \widetilde{P}) < 2\varepsilon$. Since $\varepsilon$ can be taken as small as we wish, we get, from Proposition 664, that $\widetilde{f} \in \mathcal{R}[a, b]$. That two arbitrary extensions give the same integral is a consequence of Proposition 672. □

Corollary 676 Let $f$ be a bounded function on a closed and bounded interval $[a, b]$, and assume that the number of points of discontinuity of $f$ is finite. Then $f \in \mathcal{R}[a, b]$.
Proof If the set of points of discontinuity is $(a \le)\ x_1 < x_2 < \ldots < x_n\ (\le b)$, apply Corollary 675 to $f$ restricted to each of the open intervals determined by consecutive points among $a, x_1, \ldots, x_n, b$. Finally, use (vii) in Proposition 671. □

Corollary 675, together with Proposition 672, can be used, for example, to prove that any extension of the function $\sin\frac{1}{x}$, $x \in (0, 1]$, to the interval $[0, 1]$, is Riemann integrable on $[0, 1]$ (and that the integral value does not depend on the particular extension). Related to Corollary 675, see Remark 699 below.

The next result is somewhat less expected than Proposition 674 (monotone functions may have, after all, “many” discontinuities, although, in view of Proposition 397, “not too many”). Later on (see Remark 782) we shall give an alternative proof, based on the criterion for Riemann integrability (Theorem 777), for the next result. Here we present a direct proof based on the definition of the Riemann integral.

Proposition 677 Let $f$ be a real-valued monotone function on a closed and bounded interval $[a, b] \subset \mathbb{R}$. Then $f \in \mathcal{R}[a, b]$.

Proof Assume, without loss of generality, that $f$ is increasing. Given $\varepsilon > 0$, we shall find a partition $P \in \mathcal{P}[a, b]$ such that $U(f, P) - L(f, P) < \varepsilon$. This will show that $f \in \mathcal{R}[a, b]$. For this, first find $n \in \mathbb{N}$ so that
\[
n > \frac{(b - a)\bigl(f(b) - f(a)\bigr)}{\varepsilon}.
\]
Choose a partition $P := \{a = x_0 < x_1 < \cdots < x_n = b\}$, where
\[
x_i = x_{i-1} + \frac{b - a}{n}, \quad \text{for } i = 1, 2, \ldots, n.
\]
Then
\[
\begin{aligned}
U(f, P) - L(f, P) &= \sum_{i=1}^{n} \Bigl( \sup_{x \in [x_{i-1}, x_i]} f(x) - \inf_{x \in [x_{i-1}, x_i]} f(x) \Bigr)(x_i - x_{i-1}) \\
&= \sum_{i=1}^{n} \bigl(f(x_i) - f(x_{i-1})\bigr)(x_i - x_{i-1}) = \sum_{i=1}^{n} \bigl(f(x_i) - f(x_{i-1})\bigr)\frac{b - a}{n} \\
&= \frac{b - a}{n} \sum_{i=1}^{n} \bigl(f(x_i) - f(x_{i-1})\bigr) = \frac{b - a}{n}\bigl(f(b) - f(a)\bigr) < \varepsilon.
\end{aligned}
\]
□

Remark 678 The last equality in the proof above is referred to as a telescopic argument. ®
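The proof of Proposition 677 is constructive: it says how many equally spaced subintervals suffice. The sketch below follows that recipe under stated assumptions. The step function `step`, with jumps at $1/4$, $1/2$, $3/4$ and $1$, is a hypothetical example of a discontinuous increasing function, and floating-point arithmetic stands in for exact arithmetic.

```python
import math


def subintervals_needed(f, a, b, eps):
    """Smallest n with (b - a) * (f(b) - f(a)) / n < eps, for increasing f."""
    return math.floor((b - a) * (f(b) - f(a)) / eps) + 1


def gap_equally_spaced(f, a, b, n):
    """U(f, P) - L(f, P) for increasing f on the equally spaced partition with n parts."""
    h = (b - a) / n
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    # For increasing f: the inf on [x_{i-1}, x_i] is f(x_{i-1}), the sup is f(x_i).
    return sum((f(xs[i]) - f(xs[i - 1])) * h for i in range(1, n + 1))


# An increasing step function on [0, 1] (hypothetical example).
step = lambda x: math.floor(4 * x) / 4
eps = 0.01
n = subintervals_needed(step, 0.0, 1.0, eps)
gap = gap_equally_spaced(step, 0.0, 1.0, n)
print(n, gap)  # gap equals (b - a) * (f(b) - f(a)) / n up to rounding, below eps
```

The telescopic argument of Remark 678 is visible in `gap_equally_spaced`: all the increments $f(x_i) - f(x_{i-1})$ are multiplied by the same length $h$, so only $f(b) - f(a)$ survives.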
Example 679 The Riemann function R, introduced in Definition 700, is Riemann integrable on [0, 1], and $\int_0^1 R = 0$. Indeed, let ε > 0 and choose N so that 1/N < ε/2. Consider the set $D_N$ of all rational numbers in [0, 1] that have denominator smaller than or equal to N. Note that the set $D_N$ is finite. Choose, for each point of $D_N$, a closed subinterval of [0, 1] that contains the point, in such a way that these chosen intervals are pairwise disjoint and the sum of their lengths is less than ε/2. Their endpoints form a partition P of the interval [0, 1]. Denote those intervals of P that contain a point of $D_N$ by $I_i$, and the intervals of P that do not contain points of $D_N$ by $J_j$. Note that each interval forming the partition P contains an irrational number, and thus L(R, P) = 0. In the notation of Definition 658 (with $M_i$ and $M_j$ the suprema of R on $I_i$ and $J_j$, respectively) we have
$$U(R,P) = \sum_i M_i\,\lambda(I_i) + \sum_j M_j\,\lambda(J_j).$$
The first summand is less than or equal to ε/2, the second one is less than or equal to 1/N (< ε/2). Thus U(R, P) ≤ ε. This shows that R ∈ R[0, 1], and that $\int_0^1 R = 0$. Another argument to ensure the Riemann integrability of R, based on the criterion for Riemann integrability (Theorem 777), will be given in Remark 782.4. ♦
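As a numerical companion to Example 679, the sketch below computes exact upper sums of the Riemann function over uniform partitions of [0, 1] (not the tailored partition used in the example). It assumes that R(p/q) = 1/q for p/q in lowest terms, so that R(0) = R(1) = 1, and R(x) = 0 for irrational x; Definition 700 is not reproduced here, so this convention, like the function names, is an assumption of ours.

```python
from fractions import Fraction

def smallest_denominator(k, n):
    """Smallest q >= 1 such that some fraction p/q lies in [k/n, (k+1)/n]."""
    q = 1
    while True:
        lowest_p = -((-k * q) // n)      # ceil(k*q/n), in integer arithmetic
        highest_p = ((k + 1) * q) // n   # floor((k+1)*q/n)
        if lowest_p <= highest_p:
            return q                     # the search stops at q = n at the latest (p = k)
        q += 1

def upper_sum(n):
    """U(R, P_n) for the uniform partition P_n of [0, 1] into n parts.

    Assumed convention: R(p/q) = 1/q in lowest terms, R = 0 at irrationals.
    The sup of R on a subinterval is then 1/q for the smallest admissible q.
    """
    return sum(Fraction(1, smallest_denominator(k, n)) for k in range(n)) / n
```

The lower sums are all 0, since every subinterval contains an irrational number; the upper sums become small as n grows, in agreement with $\int_0^1 R = 0$.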
7.1.4 Functions Defined by Integrals
It follows from (ii) in Proposition 671 that a function f ∈ R[a, b] has the property that the restricted function $f|_{[a,x]}$ belongs to R[a, x] for every x ∈ [a, b]. We can then define a new function F : [a, b] → R by
$$F(x) := \int_a^x f, \quad \text{for } x \in [a,b]. \tag{7.9}$$
This function is referred to as the indefinite integral of f. A central subject of this subsection is to establish the basic properties of the indefinite integral F defined by (7.9), and to relate it to the function f. As a general rule, the function F is essentially “better” than the function f from which it proceeds. For example, the next result ensures that F is always (despite the possible lack of continuity properties of f) a continuous function (even more, a Lipschitz function).

Proposition 680 Let [a, b] be a closed and bounded interval in R, and let f ∈ R[a, b]. Then, the function F : [a, b] → R defined by $F(x) := \int_a^x f$, for x ∈ [a, b], is Lipschitz (in particular absolutely continuous, hence continuous) on [a, b].

Proof Observe first that for every x ∈ [a, b], we have $f|_{[a,x]}$ ∈ R[a, x] (see (ii) in Proposition 671). The function f is bounded on [a, b], so there exists M > 0 such that |f(x)| ≤ M for every x ∈ [a, b]. Let x, y ∈ [a, b]. Without loss of generality, we may assume that x ≤ y. From (ii), (iii), and (iv) in Proposition 671 we get
Fig. 7.8 The picture for the proof of the Fundamental Theorem of Calculus for a continuous function (panels (a) and (b))
$$|F(y) - F(x)| = \Bigl|\int_a^y f - \int_a^x f\Bigr| = \Bigl|\int_x^y f\Bigr| \le \int_x^y |f| \le M|y-x|.$$
The conclusion follows (see Proposition 444). □

For the next results we need the following definition.

Definition 681 Let [a, b] be a closed and bounded interval in R. Given a function f : [a, b] → R, an antiderivative for f on [a, b] is a continuous function G : [a, b] → R such that G is differentiable on (a, b), and G′(x) = f(x) for all x ∈ (a, b).

Proposition 680 says that the function F defined by (7.9) for a Riemann integrable function f is Lipschitz. A central theorem in Analysis (the “Fundamental Theorem of Calculus,” see Theorem 685) says that the function F often has a derivative, and (something of the utmost importance) this derivative coincides with f, so F is in fact an antiderivative for f. This important result (whose earlier versions were the contribution of the Scottish mathematician J. Gregory, the English mathematician I. Barrow and, in a fundamental way, I. Newton) yields a practical method for evaluating Riemann integrals for functions possessing an antiderivative (also called a primitive function, see Definition 681).

For a nonnegative continuous real-valued function f defined on [a, b], the fact that F defined by (7.9) is indeed an antiderivative for f may be reasonably conjectured as soon as we accept that the integral of f on a given interval [a, b] is the area in R² between the vertical lines x = a and x = b, the OX axis, and the graph of f. Observe first that, for x ∈ [a, b) and h > 0 such that x + h ∈ [a, b],
$$F(x+h) - F(x) = \int_a^{x+h} f - \int_a^x f = \int_x^{x+h} f. \tag{7.10}$$
Thus, as a consequence of our agreement, (7.10) corresponds to the shaded area in Fig. 7.8a. This area is in between the areas of the two shaded rectangles in Fig. 7.8b, having height min{f(t) : t ∈ [x, x + h]} and max{f(t) : t ∈ [x, x + h]}, respectively. Both these rectangles have width h. This implies that
$$\min\{f(t) : t \in [x, x+h]\} \le \frac{F(x+h) - F(x)}{h} \le \max\{f(t) : t \in [x, x+h]\}. \tag{7.11}$$
If we now let h → 0+ in (7.11), the continuity of f implies that both the max and the min above approach f(x), hence we shall have
$$\frac{F(x+h) - F(x)}{h} \to f(x) \quad \text{as } h \to 0+. \tag{7.12}$$
Fig. 7.9 The Fundamental Theorem of Calculus for the function f(x) = constant (here f(x) = a and F(x) = ax)
A similar reasoning would apply to the case x ∈ (a, b] and h < 0 such that x + h ∈ [a, b] (with obvious changes in the geometric approach), now letting h → 0−, and so F would have a derivative at x equal to f(x). This argument (for continuous functions) will be worked out rigorously in Corollary 683 below. This shows, in plain words, that the processes of integration and differentiation are, at least in a certain setting, mutually reciprocal. This is the essence of the Fundamental Theorem of Calculus 685.

Proposition 682 Let [a, b] ⊂ R be a closed and bounded interval. Let f be a Riemann integrable real-valued function defined on [a, b]. Let x ∈ (a, b) be a point of continuity of f. Then, the function F : [a, b] → R defined by Eq. (7.9) is differentiable at x and F′(x) = f(x). If f is continuous at a (continuous at b), then F has a right-hand-side derivative $F'_+(a) = f(a)$ (respectively, a left-hand-side derivative $F'_-(b) = f(b)$).

Proof Fix x ∈ (a, b). Since f is continuous at x, given ε > 0 there exists δ > 0 such that |f(x + h) − f(x)| < ε for h ∈ R such that x + h ∈ (a, b) and |h| < δ. Therefore, for such an h ≠ 0 we have
$$\Bigl|\frac{F(x+h) - F(x)}{h} - f(x)\Bigr| = \Bigl|\frac{1}{h}\int_x^{x+h} f(t)\,dt - \frac{1}{h}\int_x^{x+h} f(x)\,dt\Bigr| \le \frac{1}{|h|}\Bigl|\int_x^{x+h} |f(t) - f(x)|\,dt\Bigr| \le \frac{1}{|h|}\,\varepsilon|h| = \varepsilon.$$
This shows that F′(x) exists and F′(x) = f(x). The argument for x := a or x := b is similar. □

The following result gives a natural instance of a class of functions having antiderivatives.

Corollary 683 Let [a, b] ⊂ R be a closed and bounded interval. Then, every continuous function f : [a, b] → R has an antiderivative on [a, b]. The indefinite integral F defined by Eq. (7.9) is an antiderivative of f on [a, b].

Proof The function f is Riemann integrable on [a, b] (see Proposition 674) and, by Proposition 680, the function F is a continuous function on [a, b]. Apply now Proposition 682. □
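The role of continuity in Proposition 682 can be observed numerically. The Python sketch below approximates the difference quotient (F(x + h) − F(x))/h through (7.10), using a midpoint Riemann sum; the helper names and the two sample functions (the cosine, and a step function with a jump at 1/2) are our own illustrative assumptions.

```python
import math

def integral(f, a, b, n=2000):
    """Midpoint Riemann sum of f over [a, b] with n equal parts (b < a is allowed)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def difference_quotient(f, x, h):
    """(F(x+h) - F(x)) / h for the indefinite integral F of f, h != 0."""
    # By (7.10), F(x+h) - F(x) is the integral of f over [x, x+h].
    return integral(f, x, x + h) / h

# At a point of continuity both one-sided quotients are close to f(x).
right_cos = difference_quotient(math.cos, 1.0, 1e-3)
left_cos = difference_quotient(math.cos, 1.0, -1e-3)

# At a jump the one-sided quotients disagree, so F is not differentiable there.
step = lambda t: 0.0 if t < 0.5 else 1.0
right_step = difference_quotient(step, 0.5, 1e-3)
left_step = difference_quotient(step, 0.5, -1e-3)
```

For the step function the two one-sided quotients stay near 1 and 0, respectively, however small |h| is taken; this is the behaviour of a bounded function with a jump, as in the signum example of Remark 684.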
Fig. 7.10 The Fundamental Theorem of Calculus for the function f(x) = x (here F(x) = x²/2)
Fig. 7.11 f ∈ R[0, 1], although $F(x) := \int_0^x f$ is not differentiable
Remark 684 Note that not every real-valued Riemann integrable function on a closed and bounded interval [a, b] has an antiderivative. Indeed, according to Theorem 448, a function without the intermediate value property cannot be the derivative of another function. By considering the signum function in Example 305.2 (or, more generally, step functions of the form $\sum_{i=1}^{n} c_i \chi_{I_i}$, where {Ii : i = 1, 2, . . ., n} is a family of intervals, and ci ∈ R for i = 1, 2, . . ., n) the reader will immediately provide Riemann integrable functions that lack the intermediate value property, and thus do not have an antiderivative (see Fig. 7.11 for an explicit example). A more dramatic example is given by the Riemann function R defined on (0, 1), introduced in Definition 700. Let [a, b] ⊂ (0, 1). Note that R ∈ R[a, b] (see Example 679). Therefore, we may define $F(x) := \int_a^x R$, for all x ∈ [a, b]. It turns out that F(x) = 0 for each x ∈ [a, b] (see Example 679 again), and yet (0 =) F′(x) ≠ R(x) for each rational number x ∈ (a, b). If R had an antiderivative on [a, b], Theorem 685 below would apply, and then F′(x) = R(x) for all x ∈ (a, b), a contradiction.

From now on, and more particularly in the statement of the next result, we shall use the following notation: Given a real-valued function G defined on an interval [a, b], we write $G|_a^b := G(b) - G(a)$.

Theorem 685 (Fundamental Theorem of Calculus for the Riemann integral) Let f be a Riemann integrable function on a closed and bounded interval [a, b] ⊂ R, and let f have an antiderivative G on [a, b]. Then the function F : [a, b] → R given by $F(x) = \int_a^x f$ for x ∈ [a, b] is an antiderivative for f on [a, b]. Moreover,
$$\int_a^b f = G\big|_a^b. \tag{7.13}$$
Proof We will prove that
$$G(x) - G(a) = \int_a^x f \ \bigl(= F(x)\bigr) \quad \text{for every } x \in [a,b]. \tag{7.14}$$
This will show that F is an antiderivative for f on [a, b], and, by letting x := b, that (7.13) holds (two simple examples are represented graphically in Figs. 7.9 and 7.10, respectively). Clearly, (7.14) is true for x = a. If x ∈ (a, b], then consider a sequence $\{P_n\}_{n=1}^{\infty}$ of partitions of [a, x], say $P_n := \{a = x_0^{(n)} < x_1^{(n)} < \cdots < x_{k_n}^{(n)} = x\}$, such that $\lim_{n\to\infty} |P_n| = 0$, where $|P_n|$ is the norm of $P_n$. Then, by Lagrange’s Mean Value Theorem 365,
$$G(x) - G(a) = \sum_{i=1}^{k_n} \bigl(G(x_i^{(n)}) - G(x_{i-1}^{(n)})\bigr) = \sum_{i=1}^{k_n} G'(c_i^{(n)})\,(x_i^{(n)} - x_{i-1}^{(n)})$$
for some $c_i^{(n)} \in (x_{i-1}^{(n)}, x_i^{(n)})$, i = 1, 2, . . ., $k_n$. Since G′ = f on (a, b),
$$\sum_{i=1}^{k_n} G'(c_i^{(n)})\,(x_i^{(n)} - x_{i-1}^{(n)}) = \sum_{i=1}^{k_n} f(c_i^{(n)})\,(x_i^{(n)} - x_{i-1}^{(n)}) = R\bigl(f, P_n, \{c_1^{(n)}, \ldots, c_{k_n}^{(n)}\}\bigr).$$
Theorem 669 implies that
$$\lim_{n\to\infty} R\bigl(f, P_n, \{c_1^{(n)}, \ldots, c_{k_n}^{(n)}\}\bigr) = \int_a^x f.$$
Therefore (7.14) holds. □
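The key step of the proof is that, with the tags supplied by the Mean Value Theorem, every Riemann sum equals G(x) − G(a) exactly, whatever the partition. The Python sketch below illustrates this for f(t) = t² and G(t) = t³/3 on [0, 2], where the tag can be written in closed form; the function names, the sample partition and the restriction to nonnegative endpoints are assumptions made only for this illustration.

```python
import math

def mean_value_tag(u, v):
    """For G(t) = t**3 / 3 and 0 <= u < v: the point c in (u, v) with G'(c) = (G(v) - G(u)) / (v - u)."""
    # (v**3 - u**3) / (3 (v - u)) = (u*u + u*v + v*v) / 3, and G'(c) = c*c.
    return math.sqrt((u * u + u * v + v * v) / 3.0)

def tagged_riemann_sum(f, points, tags):
    """R(f, P, tags): the sum of f(c_i) (x_i - x_{i-1}) over the partition 'points'."""
    return sum(f(c) * (v - u) for u, v, c in zip(points, points[1:], tags))

f = lambda t: t * t
points = [0.0, 0.1, 0.5, 0.7, 1.5, 2.0]            # an arbitrary non-uniform partition of [0, 2]
tags = [mean_value_tag(u, v) for u, v in zip(points, points[1:])]
mvt_sum = tagged_riemann_sum(f, points, tags)      # G(2) - G(0) = 8/3, up to rounding
left_sum = tagged_riemann_sum(f, points, points)   # left-endpoint tags: only an approximation
```

With the mean value tags the sum telescopes to 8/3 even on this coarse partition, whereas the left-endpoint sum is still far from the integral.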
Remark 686
1. The statement of the Fundamental Theorem of Calculus 685 seems, at first glance, somewhat circular: we need to know in advance that a function f has an antiderivative in order to find one. The existence of an antiderivative is not what we are looking for: this is a requirement, not the conclusion. What matters is the precise form of the antiderivative (it is given by an integral). As a byproduct, the value of an integral ($\int_a^b f$ is sometimes called a “definite integral”) can be calculated if we have at hand the expression G of a particular antiderivative, which can be obtained by some other means (see footnote 1 below). Precisely, we can evaluate the Riemann integral $\int_a^b f$ by $\int_a^b f = G(b) - G(a)$. As a simple example, consider the function f(x) = x² on an interval [a, b] (the integral on [0, 1] was calculated after Definition 663 by computing Riemann upper and lower sums). Observe that G(x) := x³/3, x ∈ [a, b], is an antiderivative for f. Then we can apply Theorem 685 to obtain
$$\int_a^b x^2 = \frac{b^3}{3} - \frac{a^3}{3}.$$
In particular, $f_{\mathrm{ave}} = \dfrac{a^2 + ab + b^2}{3}$.
Footnote 1: Part of the effort of standard calculus books is devoted to the so-called “Calculus of Antiderivative (or ‘Primitive’) Functions” methods; some of them are included here in the exercises collected in Sect. 13.5.2.

2. The Fundamental Theorem of Calculus 685 may fail if the hypothesis of f having an antiderivative is removed. Two examples of this situation were given in Remark 684. Still, a Riemann integrable function without antiderivative, like the function f in Fig. 7.11, may have an antiderivative when restricted to some subintervals, say to [0, 1/2) or to [1/2, 1] in that instance, and Theorem 685 can be applied there.

3. Theorem 685 implicitly contains the result that if a Riemann integrable function f has an antiderivative on a closed and bounded interval [a, b], then two arbitrary antiderivatives for f on [a, b] differ by a constant. This particular statement follows, too, from Corollary 370, a consequence of Lagrange’s Mean Value Theorem 365. Corollary 370 is, however, more general, in the sense that there are continuous functions F : [a, b] → R that are differentiable on (a, b) although F′ on (a, b) cannot be extended to a Riemann integrable function on [a, b]. An example is given by the function F : [0, 1] → R, defined by
$$F(x) = \begin{cases} x^2 \sin\dfrac{1}{x^2}, & \text{if } x \in (0,1],\\[4pt] 0, & \text{if } x = 0. \end{cases}$$
The function F is continuous on [0, 1] and it has a derivative on (0, 1) (it even has a right-hand side derivative at 0, with value 0, and a left-hand side derivative at 1). However, $F'(x) = 2x \sin\dfrac{1}{x^2} - \dfrac{2}{x}\cos\dfrac{1}{x^2}$ for x ∈ (0, 1), an unbounded function on (0, 1) (see Fig. 7.12).

Fig. 7.12 The function F in Remark 686.3, and its derivative (the graph is truncated between y = −3 and y = 3)

4. A more dramatic (and more interesting) example than the one in Remark 686.3 is given by a function F that is continuous on [0, 1], it has a bounded derivative
Fig. 7.13 The function φ on [0, 1/8] in the construction of Volterra’s function
on [0, 1] (at the end points we consider one-sided derivatives), and this derivative is not Riemann integrable on any interval [0, x] for 0 < x ≤ 1. The example is due to the Italian mathematician V. Volterra in 1881, and is presented below. Apparently, this pathological situation motivated H. Lebesgue in his search for his new integral (we shall see later that F′ is Lebesgue integrable on [0, 1], see Corollary 749, and check also Propositions 445, 444, 436, Theorem 424, and its Corollary 433, in this order).

The example is constructed as follows: Let C be a Cantor ternary set of positive measure in [0, 1]. To be precise, we repeat the construction after Definition 277, this time removing a central open interval of length 1/4, then, from the remaining part, two central open intervals of length 1/4², then, from the remaining part, four central open intervals of length 1/4³, and so on. The compact remaining set is called the Smith–Volterra–Cantor set SVC (after the names of the Irish mathematician H. J. S. Smith, and the aforementioned V. Volterra and G. Cantor). Like the original Cantor ternary set, SVC has empty interior, and this time λ(SVC) = 1/2. It is easy to prove that, given a pairwise disjoint collection of intervals in [0, 1] whose total length is greater than 1/2, the union of the family intersects SVC. The basic ingredient is the function φ : [0, 1/8] → R given by (see Fig. 7.13)
$$\varphi(x) = \begin{cases} x^2 \sin\dfrac{1}{x}, & \text{if } x \in (0, 1/8],\\[4pt] 0, & \text{if } x = 0. \end{cases}$$
This function was used before as an example of a differentiable function having a discontinuous derivative. See Example 4.5.8.3. Define a function F on [0, 1] as follows. For x ∈ C put F(x) = 0. If (a, b) is a component interval of [0, 1] \ C, let c := sup{t : 0 < t ≤ (b − a)/2, φ′(t) = 0} and define
$$\begin{cases} F(a + t) = F(b - t) = \varphi(t), & \text{if } 0 < t \le c, \text{ and}\\ F(x) = \varphi(c), & \text{if } a + c \le x \le b - c. \end{cases}$$
Fig. 7.14 (iii) is the basic ingredient in building Volterra’s function
Fig. 7.15 The function F on the first central open interval (first stage)
In Fig. 7.14 we represent (i) φ on [0, 1/8], (ii) the modification of φ on the same interval, and (iii) the reflection to fill the interval [0, 1/4]. In Fig. 7.15 we represent the translation of (iii) to the removed central interval. This is the first stage in building F, and it is then repeated on each component interval of [0, 1] \ C. The resulting function F has the following properties.
(a) F is differentiable on [0, 1] (at the end points we consider one-sided derivatives).
(b) F′(ζ) = 0 for every ζ ∈ C.
(c) |F′(x)| < 3 for every x ∈ [0, 1].
(d) F′ is discontinuous at every point of C.
(e) F′ is not Riemann integrable on any interval [0, x] for 0 < x ≤ 1.
(f) $F(x) = \int_{[0,x]} F'$ for all x ∈ [0, 1], where $\int_E$ means the Lebesgue integral on a set E (see Sect. 7.3 and, in particular, Sect. 7.3.6).

Sketch of the Proof (a) and (b): For differentiability of F at the points of C note that if ζ ∈ C and δ > 0 then, for x ∈ (a, b), where (a, b) is a component interval of [0, 1] \ C close enough to ζ, and a is the endpoint of (a, b) that is nearest to ζ of {a, b}, we have
$$\Bigl|\frac{F(x) - F(\zeta)}{x - \zeta}\Bigr| = \Bigl|\frac{F(x)}{x - \zeta}\Bigr| \le \Bigl|\frac{F(x)}{x - a}\Bigr| \le \frac{(x-a)^2}{|x - a|} < \delta.$$
This shows that F′(ζ) = 0. That F is differentiable at points x ∈ [0, 1] \ C is clear.
(c) follows from (a) and (b), and the formula for the derivative of φ.
(d) On every component interval (a, b) of [0, 1] \ C, there are points where φ′ equals ±1.
(e) This is a consequence of Proposition 664. In fact, no matter how small in norm a partition of [0, x] is, the oscillation of F′ (see Definition 700) on each subinterval is big.
(f) Since |F′| ≤ 3, F is Lipschitz on [0, 1] (see Proposition 445), and we can then use the Fundamental Theorem of Calculus (in this case, the Lipschitz version, Theorem 1083, suffices) to get that F′ is Lebesgue integrable and that F is the indefinite integral (in the Lebesgue sense) of its derivative.
7.1.5 ♣ Some Applications of the Riemann Integral and the Arzelà–Ascoli Theorem to the Theory of Ordinary Differential Equations
We give an application of the Fundamental Theorem of Calculus for the Riemann integral and the Arzelà–Ascoli Theorem 648 to the theory of ordinary differential equations. The continuity of the two-variable, real-valued function f in Theorem 687 below must be understood in the sense of mapping between metric spaces (see Sect. 6.2).

Theorem 687 (Cauchy–Peano Existence Theorem) Let f : U → R be a continuous function defined on a neighborhood U of a point (x0, y0) ∈ R². Then there exists α > 0 such that the initial value problem
$$\begin{cases} \dfrac{dy}{dx}(x) = f(x, y(x)),\\[4pt] y(x_0) = y_0, \end{cases} \tag{7.15}$$
has a solution in the interval (x0 − α, x0 + α), i.e., there exists a differentiable function y(x) defined on the interval (x0 − α, x0 + α) that satisfies (7.15).

Proof Find a > 0 such that Q := [x0 − a, x0 + a] × [y0 − a, y0 + a] ⊂ U. Since |f| is continuous on Q, it is bounded there. Let M > 1 be such that |f(x, y)| ≤ M for all (x, y) ∈ Q. Put α := a/M. Observe that, by the Fundamental Theorem of Calculus (Theorem 685) and Remark 686.3, a differentiable function ϕ is a solution of (7.15) on (x0 − α, x0 + α) if and only if
$$\varphi(x) = y_0 + \int_{x_0}^{x} f(t, \varphi(t))\,dt, \quad \text{for all } x \in (x_0 - \alpha, x_0 + \alpha). \tag{7.16}$$
The function f is uniformly continuous on Q (see Corollary 630), so for every ε > 0 there exists δ(ε) > 0 such that
$$|f(x,y) - f(x',y')| < \varepsilon, \quad \text{if } (x,y), (x',y') \in Q \text{ satisfy } d_\infty\bigl((x,y),(x',y')\bigr) < \delta(\varepsilon). \tag{7.17}$$
Fig. 7.16 First four polygonals in the Cauchy–Peano construction
We shall define a sequence of approximate solutions $\{\varphi_n\}_{n=1}^{\infty}$ to (7.15) in the form of the so-called Euler polygonals, and then, by using the Arzelà–Ascoli Theorem 648, we shall extract a convergent subsequence. The proof will be finished once we show that the limit of this subsequence is a true solution of (7.15).

Fix n ∈ N. Choose then m ∈ N such that Mα/m < δ(1/n). Put $x_{n,k} := x_0 + k\alpha/m$ for k = −m, −m + 1, . . ., m. Note that $x_{n,0} = x_0$, $x_{n,-m} = x_0 - \alpha$, and $x_{n,m} = x_0 + \alpha$. Let $y_{n,0} := y_0$. We define step by step a continuous piecewise linear function $\varphi_n$ in the following way (see Fig. 7.16). For x ≥ x0,
$$\varphi_n(x) = \begin{cases}
y_{n,0} + f(x_{n,0}, y_{n,0})(x - x_{n,0}), & \text{if } x \in [x_{n,0}, x_{n,1}], \quad y_{n,1} := \varphi_n(x_{n,1}),\\
y_{n,1} + f(x_{n,1}, y_{n,1})(x - x_{n,1}), & \text{if } x \in [x_{n,1}, x_{n,2}], \quad y_{n,2} := \varphi_n(x_{n,2}),\\
\quad\vdots\\
y_{n,m-1} + f(x_{n,m-1}, y_{n,m-1})(x - x_{n,m-1}), & \text{if } x \in [x_{n,m-1}, x_{n,m}],
\end{cases} \tag{7.18}$$
and for x ≤ x0,
$$\varphi_n(x) = \begin{cases}
y_{n,0} + f(x_{n,0}, y_{n,0})(x - x_{n,0}), & \text{if } x \in [x_{n,-1}, x_{n,0}], \quad y_{n,-1} := \varphi_n(x_{n,-1}),\\
y_{n,-1} + f(x_{n,-1}, y_{n,-1})(x - x_{n,-1}), & \text{if } x \in [x_{n,-2}, x_{n,-1}], \quad y_{n,-2} := \varphi_n(x_{n,-2}),\\
\quad\vdots\\
y_{n,-m+1} + f(x_{n,-m+1}, y_{n,-m+1})(x - x_{n,-m+1}), & \text{if } x \in [x_{n,-m}, x_{n,-m+1}].
\end{cases} \tag{7.19}$$
Observe that the function $\varphi_n$ is M-Lipschitz, that it has a piecewise constant derivative (that may have jump discontinuities at the points $x_{n,k}$), and that $\varphi_n(x_0) = y_0$. Put
$$R_n(t) := \begin{cases}
\varphi_n'(t) - f(t, \varphi_n(t)), & \text{if } t \in (x_{n,k}, x_{n,k+1}), \quad k = -m, -m+1, \ldots, m-1,\\
0, & \text{if } t = x_{n,k}, \quad k = -m, -m+1, \ldots, m.
\end{cases}$$
Note that
$$|R_n(t)| < 1/n \quad \text{for all } t \in [x_0 - \alpha, x_0 + \alpha]. \tag{7.20}$$
Indeed, for t ∈ [x0, x0 + α],
$$|R_n(t)| \le |\varphi_n'(t) - f(t, \varphi_n(t))| = |f(x_{n,k}, y_{n,k}) - f(t, \varphi_n(t))| \quad \text{for } t \in (x_{n,k}, x_{n,k+1}),\ k = 0, 1, 2, \ldots, m-1. \tag{7.21}$$
Since, for $t \in (x_{n,k}, x_{n,k+1})$, k = 0, 1, 2, . . ., m − 1, $|x_{n,k} - t| < \alpha/m < \delta(1/n)$, and $|y_{n,k} - \varphi_n(t)| = |\varphi_n(x_{n,k}) - \varphi_n(t)| \le M|x_{n,k} - t| < M\alpha/m < \delta(1/n)$, we get
$$d_\infty\bigl((x_{n,k}, y_{n,k}), (t, \varphi_n(t))\bigr) < \delta(1/n), \quad \text{for } t \in (x_{n,k}, x_{n,k+1}),\ k = 0, 1, 2, \ldots, m-1, \tag{7.22}$$
and so, from (7.21) it follows that $|R_n(t)| < 1/n$ for t ∈ [x0, x0 + α]. A similar argument shows that $|R_n(t)| < 1/n$ for t ∈ [x0 − α, x0].

This is done for all n ∈ N. In this way, we obtain a sequence $\{\varphi_n\}_{n=1}^{\infty}$ of M-Lipschitz functions on [x0 − α, x0 + α] such that $\varphi_n(x_0) = y_0$ for all n ∈ N. In particular, the sequence $\{\varphi_n\}_{n=1}^{\infty}$ is d∞-bounded, and clearly equicontinuous. We may apply the Arzelà–Ascoli Theorem 648 to get a d∞-convergent subsequence $\{\varphi_{n_k}\}_{k=1}^{\infty}$. Let ϕ ∈ C[x0 − α, x0 + α] be its d∞-limit. We claim that ϕ is a solution of the problem (7.15) on (x0 − α, x0 + α). Indeed, the Fundamental Theorem of Calculus (it is enough to use the Lipschitz version, i.e., Theorem 1083) ensures that, for x ∈ [x0 − α, x0 + α],
$$\varphi_n(x) = y_0 + \int_{x_0}^{x} \varphi_n'(t)\,dt = y_0 + \int_{x_0}^{x} \bigl[f(t, \varphi_n(t)) + R_n(t)\bigr]\,dt. \tag{7.23}$$
The integral in (7.23) must be considered in the Lebesgue sense, although here, since $\varphi_n'$ is piecewise constant, the integral is in the Riemann sense (in fact it is the integral of a step function). It is enough to observe that, letting n = n_k in (7.23) and k → ∞, we get, in view of (7.17) and (7.20),
$$\varphi(x) = y_0 + \int_{x_0}^{x} f(t, \varphi(t))\,dt, \tag{7.24}$$
and this shows the claim. □
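The Euler polygonals (7.18) are easy to compute. The following Python sketch builds the nodes of one polygonal on [x0, x0 + α] only (the left half (7.19) is symmetric); the function name `euler_polygonal` and the test problem y′ = y, y(0) = 1 are our own illustrative choices. For that problem one may take a = 1 and M = 2 in the proof, so that α = 1/2.

```python
def euler_polygonal(f, x0, y0, alpha, m):
    """Nodes (x_k, y_k), k = 0, ..., m, of the Euler polygonal on [x0, x0 + alpha], as in (7.18)."""
    step = alpha / m
    nodes = [(x0, y0)]
    for k in range(m):
        x, y = nodes[-1]
        # On [x_k, x_{k+1}] the polygonal is linear with slope f(x_k, y_k).
        nodes.append((x0 + (k + 1) * step, y + f(x, y) * step))
    return nodes

growth = lambda x, y: y                          # right-hand side of y' = y (sample problem)
coarse = euler_polygonal(growth, 0.0, 1.0, 0.5, 2)
fine = euler_polygonal(growth, 0.0, 1.0, 0.5, 1000)
```

For this particular problem the solution is unique (it is the exponential), so the whole sequence of polygonals converges to it; in general, the proof only yields a convergent subsequence.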
Remark 688 Theorem 687 is an existence result. It does not, in general, guarantee uniqueness (related to this, see Theorem 689). As a particular example of the lack of
Fig. 7.17 The functions ψ0 , ψ0 , and ψ1/2 on (−1, 1), (Remark 688)
uniqueness, let $f(x, y) := +\sqrt{|y|}$, for (x, y) ∈ (−2, 2) × (−2, 2). The function f is certainly continuous on (−2, 2) × (−2, 2), so Theorem 687 applies to ensure the existence of a local solution of the initial-value problem
$$\begin{cases} \dfrac{dy}{dx}(x) = f(x, y(x)),\\[4pt] y(0) = 0. \end{cases} \tag{7.25}$$
A review of the proof of Theorem 687 shows that we can take a = 1, so M = 1 and thus α = 1. We obtain then a solution ϕ on (−1, 1) of the initial-value problem (7.25). By inspection, the function ϕ defined on (−1, 1) as ϕ(x) := 0 for all x ∈ (−1, 1), is a solution. Observe, too, that the function (see Fig. 7.17)
$$\psi_0(x) := \begin{cases} \dfrac{x^2}{4}, & \text{if } x \in [0, 1),\\[4pt] -\dfrac{x^2}{4}, & \text{if } x \in (-1, 0) \end{cases}$$
is also a solution of the initial-value problem (7.25), as it is straightforward to check. Indeed, there are infinitely many solutions to problem (7.25) in (−1, 1): Choose any r ∈ (0, 1), and put
$$\psi_r(x) := \begin{cases} -\dfrac{(x+r)^2}{4}, & \text{if } x \in (-1, -r],\\[4pt] 0, & \text{if } x \in [-r, r],\\[4pt] \dfrac{(x-r)^2}{4}, & \text{if } x \in [r, 1). \end{cases}$$
It is easy to show that ψr is a solution to (7.25). Related to Theorem 689, observe that f is not a Lipschitz function of its second variable (see Example 4.5.8.1).
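That every ψr solves (7.25) can also be checked numerically. The sketch below compares a central difference quotient of ψr with $\sqrt{|\psi_r|}$ on a grid in (−1, 1); the function names, the step h and the grid are assumptions of this illustration, and the value r = 0 reproduces ψ0.

```python
import math

def psi(r, x):
    """The function psi_r of Remark 688 on (-1, 1), for 0 <= r < 1 (r = 0 gives psi_0)."""
    if x <= -r:
        return -((x + r) ** 2) / 4.0
    if x >= r:
        return ((x - r) ** 2) / 4.0
    return 0.0

def residual(r, x, h=1e-6):
    """Central difference quotient of psi_r at x minus sqrt(|psi_r(x)|)."""
    derivative = (psi(r, x + h) - psi(r, x - h)) / (2.0 * h)
    return derivative - math.sqrt(abs(psi(r, x)))

# Largest residual over three values of r and a grid of points in (-1, 1).
worst = max(abs(residual(r, i / 100.0)) for r in (0.0, 0.25, 0.5) for i in range(-99, 100))
```

All these functions take the value 0 at x = 0, so they are different solutions of the same initial-value problem.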
7.1.6 ♣ Some Applications of the Riemann Integral and the Fixed Point Theory to the Theory of Ordinary Differential and Integral Equations
1. An application to differential equations

The following result (due to the French mathematician É. Picard) is similar to Theorem 687. However, there is a substantial difference: it requires an extra condition on the function f (a Lipschitz condition on the second variable) and ensures not only existence of a local solution but also uniqueness (see Remark 688). It is an example of application of the Banach contraction principle (Theorem 652). We assume a basic knowledge of partial derivatives and Riemann integration.

Theorem 689 (Picard) Let f : U → R be a continuous function defined on a neighborhood U of a point (x0, y0) ∈ R². Assume that f satisfies a Lipschitz condition on the second variable, i.e., there exists a constant C > 0 such that |f(x, y1) − f(x, y2)| ≤ C|y1 − y2| for all (x, y1) and (x, y2) in U. Then there exist α > 0 and a unique continuous real-valued function ϕ defined on [x0 − α, x0 + α], continuously differentiable on (x0 − α, x0 + α), that solves the initial-value problem
$$\begin{cases} \dfrac{d\varphi}{dx}(x) = f(x, \varphi(x)) \quad \text{for all } x \in (x_0 - \alpha, x_0 + \alpha),\\[4pt] \varphi(x_0) = y_0. \end{cases} \tag{7.26}$$

Proof There exists r > 0 such that the closed disc B[(x0, y0), r] is contained in U. The function f is continuous on B[(x0, y0), r], hence it is bounded there. Let M be an upper bound of |f| on B[(x0, y0), r]. Find α > 0 and δ > 0 such that, simultaneously, [x0 − α, x0 + α] × [y0 − δ, y0 + δ] ⊂ B[(x0, y0), r], 2αM < δ, and 2Cα < 1. Let S be the set of all continuous functions from [x0 − α, x0 + α] into [y0 − δ, y0 + δ]. This is a closed subset of the complete metric space (C[x0 − α, x0 + α], d∞), hence (S, d∞) is itself a complete metric space. Define a mapping G : S → C[x0 − α, x0 + α] by
$$G(\varphi)(x) := y_0 + \int_{x_0}^{x} f(t, \varphi(t))\,dt, \quad \text{for } \varphi \in S \text{ and } x \in [x_0 - \alpha, x_0 + \alpha]. \tag{7.27}$$
Observe that, for ϕ ∈ S and x ∈ [x0 − α, x0 + α],
$$|G(\varphi)(x) - y_0| = \Bigl|\int_{x_0}^{x} f(t, \varphi(t))\,dt\Bigr| \le \Bigl|\int_{x_0}^{x} |f(t, \varphi(t))|\,dt\Bigr| \le 2\alpha M < \delta.$$
This shows that G maps S into S. Moreover, for x ∈ [x0 − α, x0 + α] and ϕ, ψ ∈ S,
$$\begin{aligned}
|G(\varphi)(x) - G(\psi)(x)| &= \Bigl|\int_{x_0}^{x} \bigl[f(t, \varphi(t)) - f(t, \psi(t))\bigr]\,dt\Bigr| \\
&\le \Bigl|\int_{x_0}^{x} |f(t, \varphi(t)) - f(t, \psi(t))|\,dt\Bigr| \le C\,\Bigl|\int_{x_0}^{x} |\varphi(t) - \psi(t)|\,dt\Bigr| \le 2C\alpha\, d_\infty(\varphi, \psi).
\end{aligned}$$
Thus, d∞(G(ϕ), G(ψ)) ≤ 2Cα d∞(ϕ, ψ), hence G is a contraction from (S, d∞) into itself. By Theorem 652, it has a (unique) fixed point, say ϕ, i.e., an element ϕ ∈ S such that
$$G(\varphi)(x) = y_0 + \int_{x_0}^{x} f(t, \varphi(t))\,dt = \varphi(x) \quad \text{for all } x \in [x_0 - \alpha, x_0 + \alpha].$$
According to the Fundamental Theorem of Calculus (Theorem 685) and Remark 686.3, this is equivalent to saying that ϕ is a solution of (7.26). □

Example 690 Proposition 656 can be used to show how the local solution to the problem (7.26) treated in Theorem 689 depends continuously on the initial condition ϕ(x0) = y0. Indeed, assume that {ym} is a sequence of real numbers such that ym → y0. Follow the proof of Theorem 689 and modify the selection of α by requiring 4αM < δ instead of 2αM < δ. Find m0 ∈ N such that |ym − y0| < 2αM for all m ≥ m0. Define, for m = 0, 1, 2, . . .,
$$G_m(\varphi)(x) := y_m + \int_{x_0}^{x} f(t, \varphi(t))\,dt, \quad \text{for } \varphi \in S \text{ and } x \in [x_0 - \alpha, x_0 + \alpha].$$
Since |Gm(ϕ)(x) − y0| ≤ |Gm(ϕ)(x) − ym| + |ym − y0| ≤ 2αM + 2αM = 4αM < δ for all m ≥ m0 and all x ∈ [x0 − α, x0 + α], we get that Gm maps S into S for all m ≥ m0. The same argument as in the proof of Theorem 689 shows that all Gm are Lipschitz mappings with the same Lipschitz constant 2Cα (< 1). It is clear, too, that the sequence {Gm} converges pointwise to G0. Proposition 656 then applies to get that if ϕm is the solution to the problem
$$\begin{cases} \dfrac{d\varphi}{dx}(x) = f(x, \varphi(x)) \quad \text{for all } x \in (x_0 - \alpha, x_0 + \alpha),\\[4pt] \varphi(x_0) = y_m, \end{cases} \tag{7.28}$$
in the interval [x0 − α, x0 + α] for m ≥ m0, the sequence {ϕm} converges in the metric d∞ to the solution of the problem (7.26) in the same interval. ♦

2. An application to integral equations

The following application shows the use of the improved version of the Banach Theorem 652 given by Corollary 654.

Proposition 691 Let [a, b] be a closed and bounded interval in R, and let (C[a, b], d∞) be the (complete) metric space of all real-valued continuous functions defined on [a, b] (see Example 551). Let K : [a, b] × [a, b] → R be a continuous mapping, where the square [a, b] × [a, b] is endowed with the metric induced by d∞ on R². Let ϕ ∈ C[a, b]. Finally, let λ be a real number. Define a mapping T : f → Tf from C[a, b] into itself by
$$Tf(x) := \lambda \int_a^x K(x, t) f(t)\,dt + \varphi(x), \quad \text{for } f \in C[a,b] \text{ and } x \in [a,b]. \tag{7.29}$$
Then, T has a unique fixed point in C[a, b].

The mapping K is called the kernel of the mapping T. Put M := sup{|K(x, y)| : (x, y) ∈ [a, b] × [a, b]}.

Proof of Proposition 691 Observe first that T maps C[a, b] into itself, as claimed. Indeed, since the functions K and ϕ are continuous on [a, b] × [a, b] and [a, b], respectively, they are uniformly continuous on their respective domains (see Corollary 630), so given ε > 0 there exists δ > 0 (that can be taken less than ε), such that |K(x, s) − K(y, t)| < ε and |ϕ(x) − ϕ(y)| < ε whenever d∞((x, s), (y, t)) < δ. Given f ∈ C[a, b], let R := sup{|f(t)| : t ∈ [a, b]}. Let x, y ∈ [a, b] be such that a ≤ x ≤ y ≤ b, and |x − y| < δ. Then we have
$$\begin{aligned}
|Tf(x) - Tf(y)| &= \Bigl|\lambda \int_a^x K(x,t) f(t)\,dt + \varphi(x) - \lambda \int_a^y K(y,t) f(t)\,dt - \varphi(y)\Bigr| \\
&\le |\lambda| \int_a^x |K(x,t) - K(y,t)|\,|f(t)|\,dt + |\lambda| \int_x^y |K(y,t)|\,|f(t)|\,dt + |\varphi(x) - \varphi(y)| \\
&\le |\lambda|\varepsilon R(b-a) + |\lambda| M R \delta + \varepsilon \le \varepsilon\bigl(|\lambda| R (b-a) + |\lambda| M R + 1\bigr).
\end{aligned}$$
This shows that Tf ∈ C[a, b]. The following estimate is proved by induction (here, for n ∈ N, the expression $T^n f$ denotes the n-th iterate of T acting on f):
$$d_\infty(T^n f, T^n g) \le |\lambda|^n M^n \frac{(b-a)^n}{n!}\, d_\infty(f, g), \quad \text{for } f, g \in C[a,b]. \tag{7.30}$$
Indeed, we shall prove that for all n ∈ N and x ∈ [a, b],
$$|T^n f(x) - T^n g(x)| \le |\lambda|^n M^n \frac{(x-a)^n}{n!}\, d_\infty(f, g). \tag{7.31}$$
Observe that
$$|Tf(x) - Tg(x)| \le |\lambda| \int_a^x |K(x,t)|\,|f(t) - g(t)|\,dt \le |\lambda| M (x-a)\, d_\infty(f, g),$$
and this proves (7.31) for n = 1. Take n ∈ N and assume now that (7.31) holds for k = 1, 2, . . ., n. Then, for x ∈ [a, b],
$$\begin{aligned}
|T^{n+1} f(x) - T^{n+1} g(x)| &= |T(T^n f)(x) - T(T^n g)(x)| \\
&\le |\lambda| \int_a^x |K(x,t)|\,|T^n f(t) - T^n g(t)|\,dt \le |\lambda|^{n+1} M^{n+1} d_\infty(f, g) \int_a^x \frac{(t-a)^n}{n!}\,dt \\
&= |\lambda|^{n+1} M^{n+1} d_\infty(f, g)\, \frac{(x-a)^{n+1}}{(n+1)!}.
\end{aligned}$$
By induction, (7.31) holds for all n ∈ N and all x ∈ [a, b]. This implies (7.30). Observe that $|\lambda|^n M^n \frac{(b-a)^n}{n!} \to 0$ as n → ∞ (indeed, this is the n-th term of the convergent series defining exp(|λ| M (b − a)), see Definition 530). This shows that $T^n$ is a contraction from C[a, b] into itself for n big enough, and then Corollary 654 applies to conclude the result. □

3. An application to fractal-like sets

Let (M, d) be a metric space. Define a metric H (called the Hausdorff metric, after the name of the German mathematician F. Hausdorff) on the family $\mathcal{B}$ of all nonempty closed and bounded subsets of M in the following way: given B ∈ $\mathcal{B}$ and ε > 0, the ε-neighborhood Uε(B) of B is the set Uε(B) := {x ∈ M : d(x, B) < ε}. Then put H(B1, B2) := inf{ε > 0 : B1 ⊂ Uε(B2) and B2 ⊂ Uε(B1)}, for B1, B2 ∈ $\mathcal{B}$. It is simple to prove that H is a metric on $\mathcal{B}$, and that ($\mathcal{B}$, H) is complete if (M, d) is complete (see Exercise 13.407). From now on, assume that (M, d) is complete. Let $\mathcal{K}$ be the family of all nonempty compact subsets of M. Endow $\mathcal{K}$ with the restriction of the metric H on $\mathcal{B}$. Then ($\mathcal{K}$, H) is still complete, since it is closed in ($\mathcal{B}$, H). Given a finite set {T1, . . ., Tn} of contractions from M into itself, put
$$T^*(K) := \bigcup_{i=1}^{n} T_i(K), \quad \text{for } K \in \mathcal{K}. \tag{7.32}$$
The mapping T∗ is a contraction from $\mathcal{K}$ into itself, hence it has, by Theorem 652, a unique fixed point. As a particular example, put (M, d) := (R, d_abs). The mappings Ti : R → R for i = 1, 2 given by T1(x) := x/3 and T2(x) := x/3 + 2/3 are surely contractions, and so is the mapping T∗ : $\mathcal{K}$ → $\mathcal{K}$ given by (7.32). According to the previous observation, T∗ has a unique fixed point K0 ∈ $\mathcal{K}$. It is simple to prove that K0 is the Cantor ternary set (see Definition 277).
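The iterates of T∗ in the last example can be computed exactly. The Python sketch below represents a finite union of closed intervals as a list of endpoint pairs and applies T∗ with T1(x) = x/3 and T2(x) = x/3 + 2/3, starting from the compact set [0, 1]; the representation, the starting set and the function names are assumptions made for this illustration only.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

def t_star(intervals):
    """Image under T* of a finite union of closed intervals, given as (lo, hi) pairs."""
    left = [(lo * THIRD, hi * THIRD) for lo, hi in intervals]                    # T1(x) = x/3
    right = [(lo * THIRD + 2 * THIRD, hi * THIRD + 2 * THIRD) for lo, hi in intervals]  # T2(x) = x/3 + 2/3
    return sorted(left + right)

def iterate(n):
    """The set obtained by applying T* n times to [0, 1], as a sorted list of intervals."""
    current = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        current = t_star(current)
    return current

def total_length(intervals):
    """Sum of the lengths of the (pairwise disjoint) intervals."""
    return sum(hi - lo for lo, hi in intervals)
```

After n steps one obtains the 2ⁿ closed intervals of length 3⁻ⁿ that appear at the n-th stage of the usual construction of the Cantor ternary set, in agreement with the fact that the fixed point K0 is that set.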
7.1.7 Mean Value Theorems for the Riemann Integral
Theorems 692 and 694 below are usually called the first and second mean value theorems for the Riemann integral, respectively.

Theorem 692 (The First Mean Value Theorem for Riemann integrable functions) Let f and g be two real-valued Riemann integrable functions on an interval [a, b] ⊂ R, where a < b. Assume that g ≥ 0 on (a, b). Let s := inf{f(x) : x ∈ (a, b)} and S := sup{f(x) : x ∈ (a, b)}. Then there is μ ∈ [s, S] such that
$$\int_a^b fg = \mu \cdot \int_a^b g.$$
If f is, moreover, continuous, then there exists ξ ∈ [a, b] such that $\int_a^b fg = f(\xi) \int_a^b g$.

Proof First note that fg ∈ R[a, b] (see (vi) in Proposition 671). Since s ≤ f ≤ S on (a, b) and g ≥ 0 on (a, b), we have sg ≤ fg ≤ Sg on (a, b) and thus, by (iii) in Proposition 671,
$$s \int_a^b g \le \int_a^b fg \le S \int_a^b g \tag{7.33}$$
(note that the values of f and g at a and b are irrelevant for the computation of the former integrals). If $\int_a^b g = 0$, then from (7.33) we have $\int_a^b fg = 0$, and the conclusion follows (by taking any μ ∈ [s, S]). If, on the contrary, $\int_a^b g > 0$, we get
$$s \le (\mu :=)\ \frac{\int_a^b fg}{\int_a^b g} \le S,$$
as we wanted to show. In the case that f is continuous on [a, b] we have s = min{f(x) : x ∈ [a, b]} and S = max{f(x) : x ∈ [a, b]}. By the Intermediate Value Theorem 339 we get that there exists ξ ∈ [a, b] such that f(ξ) = μ. This finishes the proof. □

A particular and useful case of Theorem 692 is given in the following result.

Corollary 693 Let f be a real-valued Riemann integrable function on a closed and bounded interval [a, b]. Let s := inf{f(x) : x ∈ [a, b]} and S := sup{f(x) : x ∈ [a, b]}. Then, there exists μ ∈ [s, S] such that $\int_a^b f = \mu(b - a)$.

Proof Just let g be the constant function 1 on [a, b] in Theorem 692. □
Theorem 694 (The Second Mean Value Theorem for Riemann integrable functions) Let f and g be two real-valued Riemann integrable functions on an interval [a, b], where a < b. Assume that g is increasing (respectively, decreasing) on (a, b) and that A and B are two real numbers such that A ≤ g(a+) ≤ g(b−) ≤ B (respectively, A ≤ g(b−) ≤ g(a+) ≤ B). Then there is ξ ∈ [a, b] such that
$$\int_a^b fg = A \int_a^\xi f + B \int_\xi^b f \qquad (7.34)$$
(respectively,
$$\int_a^b fg = B \int_a^\xi f + A \int_\xi^b f). \qquad (7.35)$$
Proof Assume that g is increasing. Define a function F on [a, b] by $F(x) := A \int_a^x f + B \int_x^b f$. This function is continuous on [a, b] by Proposition 680. Note that s := inf{g(x) : x ∈ (a, b)} = g(a+) and S := sup{g(x) : x ∈ (a, b)} = g(b−). Assume first that g ≥ 0 on (a, b). Find, by Theorem 692, an element μ ∈ [s, S] (⊂ [A, B]) such that $\int_a^b fg = \mu \int_a^b f$. Note that $\mu \int_a^b f$ is in between $A \int_a^b f$ (= F(b)) and $B \int_a^b f$ (= F(a)). By the Intermediate Value Property (Theorem 339) we can find ξ ∈ [a, b] such that $F(\xi) = \mu \int_a^b f$ ($= \int_a^b fg$). This proves (7.34) in this case.
To prove the general case, put G(x) := g(x) − g(a+) for x ∈ (a, b). The function G is Riemann integrable on [a, b], is increasing on (a, b), and G(x) ≥ 0 on (a, b). Note, too, that A − g(a+) ≤ 0 = G(a+) ≤ G(b−) = g(b−) − g(a+) ≤ B − g(a+). By the first part, we can find ξ ∈ [a, b] such that
$$\int_a^b fg - g(a+) \int_a^b f = \int_a^b fG = (A - g(a+)) \int_a^\xi f + (B - g(a+)) \int_\xi^b f.$$
We get
$$\int_a^b fg = g(a+) \left[ \int_a^b f - \int_a^\xi f - \int_\xi^b f \right] + A \int_a^\xi f + B \int_\xi^b f = A \int_a^\xi f + B \int_\xi^b f,$$
as we wanted to show. The version for a decreasing function g is obtained by considering the increasing function −g and applying the first part. □
7.1.8
Convergence Theorems for Riemann Integrable Functions
The following result shows an application of Egorov’s Theorem 478.
Theorem 695 Let $\{f_n\}_{n=1}^\infty$ be a uniformly bounded sequence of Riemann integrable functions defined on [a, b] that converges pointwise to a Riemann integrable function f. Then
$$\int_a^b f = \lim_{n\to\infty} \int_a^b f_n.$$
Proof Let K > 0 be such that |fn(x)| ≤ K for every n ∈ N and every x ∈ [a, b]. Let ε > 0. By Egorov’s Theorem 478, there exists a measurable subset $A_\varepsilon$ of [a, b] so that $2K\lambda(A_\varepsilon) < \varepsilon/2$, and the sequence $\{f_n\}_{n=1}^\infty$ converges uniformly to f on $B_\varepsilon := [a, b] \setminus A_\varepsilon$. Therefore, we can find N ∈ N so that
$$(b - a) \sup_{x \in B_\varepsilon} |f(x) - f_n(x)| < \varepsilon/2 \quad \text{for all } n \ge N.$$
Now, for n ≥ N,
$$\left| \int_a^b f - \int_a^b f_n \right| \le \int_a^b |f - f_n| \le \int_{A_\varepsilon} |f - f_n| + \int_{B_\varepsilon} |f - f_n| \le 2K\lambda(A_\varepsilon) + (b - a) \sup_{x \in B_\varepsilon} |f(x) - f_n(x)| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
□
Remark 696 The hypothesis of Riemann integrability of the pointwise limit in Theorem 695 cannot be dispensed with. Indeed, there are uniformly bounded sequences of real-valued Riemann integrable functions that converge pointwise to a function that is not Riemann integrable. We provide some examples in Remark 752.3 and in Exercises 13.434 and 13.435 below. Here we present another one: Define, for n ∈ N, a function fn on [0, 1] by
$$f_n(x) = \begin{cases} 0 & \text{if } x = p/q, \text{ where } q \le n, \\ 1 & \text{otherwise.} \end{cases}$$
We obtain a uniformly bounded sequence $\{f_n\}_{n=1}^\infty$ of functions on [0, 1]. Note that $\lim_{n\to\infty} f_n = D|_{[0,1]}$ pointwise, where D is the Dirichlet function introduced in Definition 296, which is not Riemann integrable on [0, 1] (see Remark 673.2). That each of the functions fn is Riemann integrable follows from Proposition 672 or its Corollary 676: Indeed, its set of points of discontinuity is finite. ®
Corollary 697 Let [a, b] be a closed and bounded interval in R. Let $\{f_n\}_{n=1}^\infty$ be a uniformly bounded sequence in R[a, b]. Assume that $\{f_n\}_{n=1}^\infty$ converges uniformly to a function f. Then f ∈ R[a, b], and $\int_a^b f_n \to \int_a^b f$.
Proof There exists K > 0 such that $\sup_{x\in[a,b]} |f_n(x)| \le K$ for all n ∈ N. We shall prove first that f ∈ R[a, b]. To this end, fix ε > 0. There exists N ∈ N such that $\sup_{x\in[a,b]} |f(x) - f_n(x)| < \varepsilon$ for every n ≥ N. This shows, in particular, that f is a bounded function on [a, b]. Since $f_N$ ∈ R[a, b], there exists a partition P of [a, b], consisting of subintervals $\Delta_i$, i = 0, 1, ..., p − 1, such that
$$U(f_N, P) - L(f_N, P) < \varepsilon. \qquad (7.36)$$
Put $M_i := \sup_{x\in\Delta_i} f(x)$ and $M'_i := \sup_{x\in\Delta_i} f_N(x)$ for i = 0, 1, ..., p − 1. Then $|M_i - M'_i| < \varepsilon$ for i = 0, 1, ..., p − 1. This shows that
$$|U(f, P) - U(f_N, P)| = \left| \sum_{i=0}^{p-1} (M_i - M'_i)\lambda(\Delta_i) \right| \le \varepsilon(b - a). \qquad (7.37)$$
Analogously, we can prove that
$$|L(f, P) - L(f_N, P)| = \left| \sum_{i=0}^{p-1} (m_i - m'_i)\lambda(\Delta_i) \right| \le \varepsilon(b - a), \qquad (7.38)$$
where $m_i := \inf_{x\in\Delta_i} f(x)$ and $m'_i := \inf_{x\in\Delta_i} f_N(x)$ for i = 0, 1, ..., p − 1. From (7.37) and (7.38) we get
$$\left| [U(f, P) - L(f, P)] - [U(f_N, P) - L(f_N, P)] \right| < 2\varepsilon(b - a). \qquad (7.39)$$
Fig. 7.18 The first three functions fn (f1, f2, f3 on [0, 1]) building the devil's staircase
By using (7.36), Eq. (7.39) implies U(f, P) − L(f, P) < 2ε(b − a) + ε. The Riemann integrability of f follows. Finally, it is enough to apply Theorem 695. □
Example 698 Let S be the Lebesgue singular function introduced in Definition 398. This function is continuous, hence, by Proposition 674, S ∈ R[0, 1]. We shall compute its integral on the interval [0, 1] to show that $\int_0^1 S = \tfrac{1}{2}$. Indeed, recall the sets $I_{n,k}$, $k = 1, 2, \dots, 2^{n-1}$, n ∈ N, in the construction of S. For each n define the continuous function (see Fig. 7.18)
$$f_n(x) := \begin{cases} \dfrac{2k-1}{2^m}, & \text{if } x \in I_{m,k},\ k = 1, 2, \dots, 2^{m-1},\ m = 1, 2, \dots, n, \\ \text{piecewise linear} & \text{in between the } I_{m,k} \text{ intervals.} \end{cases}$$
Observe that fn → S uniformly on [0, 1], and that, for every n ∈ N, $\int_0^1 f_n = \tfrac{1}{2}$ (this can be computed explicitly; the reader may also rely on a symmetry-with-respect-to-the-diagonal argument, see again Fig. 7.18). Thus, by Corollary 697, $\int_0^1 S = \tfrac{1}{2}$. ♦
Remark 699 A simple modification of the proof of Proposition 672 allows us to prove the following variant of Theorem 695: Let (a, b) be an open and bounded interval in R, and let f : (a, b) → R be a bounded function such that, for every c, d ∈ R with a < c < d < b, f ∈ R[c, d]. Then any extension $\tilde f$ : [a, b] → R of f to the interval [a, b] is Riemann integrable on [a, b]. Moreover, the sequence $\left\{ \int_{a+1/n}^{b-1/n} f \right\}_{n=1}^\infty$ converges to $\int_a^b \tilde f$ as n → ∞. Indeed, assume that $|\tilde f(x)| \le M$ on [a, b]. Then, given ε > 0 find c, d as above such that c − a < ε and b − d < ε. Since f ∈ R[c, d], we may find P ∈ P[c, d] such that U(f, P) − L(f, P) < ε. Put Q := P ∪ {a, b} ∈ P[a, b]. It is clear that $U(\tilde f, Q) - L(\tilde f, Q) < \varepsilon + 4M\varepsilon$. Since ε > 0 is arbitrary, we get $\tilde f$ ∈ R[a, b]. To prove the last part, define, for n ∈ N, functions fn : [a, b] → R by
$$f_n(x) = \begin{cases} \tilde f(a), & \text{for } x \in [a, a + 1/n), \\ f(x), & \text{for } x \in [a + 1/n, b - 1/n], \\ \tilde f(b), & \text{for } x \in (b - 1/n, b]. \end{cases}$$
Then the sequence $\{f_n\}_{n=1}^\infty$ satisfies the requirements in Theorem 695.
As a particular case, consider a bounded continuous function f : (a, b) → R, where (a, b) is a bounded open interval in R. The conclusion is that every extension $\tilde f$ to [a, b] is Riemann integrable, and that $\int_a^b \tilde f = \lim_{n\to\infty} \int_{a+1/n}^{b-1/n} f$.
For an example, consider the function f defined on (0, 1) by f(x) := 1/n if x ∈ [1/(n + 1), 1/n), for n ∈ N. The extension of f to [0, 1] given by f(0) := 0 and f(1) := 1 is Riemann integrable, and its integral $\int_0^1 f(x)\,dx$ is, according to the previous observation, $\sum_{n=1}^\infty 1/(n^2(n + 1))$. Related to this remark, see Corollary 675 above. ®
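The value of the series in the last example can be checked numerically. The sketch below is an illustration only, not taken from the text. It adds the areas (height 1/n times length 1/n − 1/(n + 1)) of the first steps of the function, and compares the result with the closed form π²/6 − 1. That closed form is an assumption brought in here: it follows from the partial-fraction identity 1/(n²(n+1)) = 1/n² − 1/n + 1/(n+1) together with the classical value Σ 1/n² = π²/6. The function name `step_integral` is hypothetical.

```python
import math

def step_integral(n_terms):
    """Sum of height * length over the first n_terms steps of f.

    On [1/(n+1), 1/n) the function equals 1/n, and the step has
    length 1/n - 1/(n+1) = 1/(n(n+1)).
    """
    return sum((1.0 / n) * (1.0 / n - 1.0 / (n + 1)) for n in range(1, n_terms + 1))

# Closed form of sum 1/(n^2 (n+1)), assuming the classical value of sum 1/n^2.
closed_form = math.pi ** 2 / 6 - 1

approx = step_integral(100_000)
print(approx, closed_form)
```

The neglected tail is smaller than the tail of Σ 1/n³, so with 100 000 terms the two numbers should agree to roughly ten decimal places.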
7.1.9
Change of Variable; Integration by Parts
The following result allows us to evaluate an integral “by changing the variable” (also called “by a substitution rule”), i.e., by integrating the composition of the given function with another function (times a “distortion factor”). The reader will appreciate Leibniz’s notation: if x = G(t), then dx = G′(t) dt = g(t) dt (in case G has a derivative g), taking this last expression for the moment as a formal manipulation of G′(t) = x′ = dx/dt.
For practical reasons, it is convenient from now on to allow writing $\int_p^q f$ even in the case that p > q, with the agreement that $\int_p^q f = -\int_q^p f$. We shall use in the proof of Theorem 702 below the following useful concept.
Definition 700 Let f be a real-valued bounded function defined on an interval [a, b]. Let S ⊂ [a, b]. The oscillation of f on S is the real number
$$\omega(f, S) := \sup\{|f(x) - f(y)| : x, y \in S\}. \qquad (7.40)$$
If x0 ∈ [a, b], then the oscillation of f at x0 is the real number
$$\omega(f, x_0) := \lim_{\delta \to 0} \omega(f, B(x_0, \delta) \cap [a, b]), \qquad (7.41)$$
where for δ > 0, B(x0, δ) := {x ∈ R : |x − x0| < δ}.
Remark 701
1. Note that, if f is a real-valued bounded function defined on [a, b], then f is continuous at some x0 ∈ [a, b] if and only if ω(f, x0) = 0. This follows directly from the definition of continuity.
2. The concept of oscillation can be used for the following characterization of Riemann integrability: A function f : [a, b] → R is Riemann integrable on [a, b] if and only if it is bounded and, given ε, η > 0, there exists a partition P of [a, b] such that the sum of the lengths of the subintervals Δ in P for which ω(f, Δ) > η is less than ε. For a proof, see Exercise 13.426. ®
Theorem 702 (Change of variable) Let [a, b] be a closed bounded interval in R such that a < b. Let g : [a, b] → R be a Riemann integrable function. Fix K ∈ R and put $G(t) := \int_a^t g(s)\,ds + K$ for t ∈ [a, b]. Then, if f is a real-valued Riemann integrable function on G([a, b]), the integral $\int_a^b f[G(t)]g(t)\,dt$ exists and
$$\int_a^b f(G(t))g(t)\,dt = \int_{G(a)}^{G(b)} f(x)\,dx. \qquad (7.42)$$
Fig. 7.19 Change of variable for an increasing function G
Remark 703 For a Riemann integrable function g on an interval [a, b], where a < b, the existence of G in Theorem 702 is always guaranteed (see the introduction to Sect. 7.1.4). However, g may fail to have an antiderivative, see Remark 684 above. In case it does, G defined in the statement of Theorem 702 is an antiderivative (see Theorem 685), so G′(t) = g(t) for all t ∈ (a, b), and Eq. (7.42) adopts the more familiar form
$$\int_a^b f(G(t))G'(t)\,dt = \int_{G(a)}^{G(b)} f(x)\,dx. \qquad (7.43)$$
Note that G′(t) may not be defined at a or at b. Despite this, the Riemann integral in the first member of (7.43) exists (see Remark 673.1). Related to formula (7.43), see also Corollary 704 (Fig. 7.19). ®
Proof of Theorem 702 [Dav61] Observe first that G is a continuous function (see Proposition 680), hence G([a, b]) is a closed bounded interval in R. Let M := max{sup{|f(x)| : x ∈ G([a, b])}, sup{|g(t)| : t ∈ [a, b]}}. Fix ε > 0. Use the characterization of Riemann integrability in Remark 701.2 to obtain P ∈ P[a, b] such that the sum of the lengths of the subintervals Δ in P for which ω(g, Δ) > ε is less than ε. Call those intervals of type I. The remaining intervals Δ, in which ω(g, Δ) ≤ ε, are of two kinds:
Intervals of type II. Those intervals Δ in which there exists t0 such that |g(t0)| < ε. Note that |g(t)| ≤ 2ε for all t ∈ Δ.
Intervals of type III. Those intervals Δ := [u, v] in which |g(t)| ≥ ε for all t ∈ Δ. Assume there are N of them. Due to the fact that ω(g, Δ) ≤ ε, we must have either g(t) ≥ ε for all t ∈ Δ or g(t) ≤ −ε for all t ∈ Δ. Suppose, for example, the former (the argument in the other case is similar). Then, by Corollary 693 we have
$$\frac{G(t'') - G(t')}{t'' - t'} \ge \varepsilon \quad \text{whenever } u \le t' < t'' \le v \qquad (7.44)$$
(note that, in particular, G is strictly increasing on [u, v]). Use again the characterization of Riemann integrability in Remark 701.2 for the function f on [G(u), G(v)] to obtain Q := {u = τ0 < τ1 < ... < τm = v} such that the sum of the lengths of the subintervals G(J) in G(Q) for which ω(f, G(J)) > ε is less than ε²/N. We further subdivide the intervals J in Q into two subtypes.
Intervals of type IIIa: Those J’s in which ω(f ◦ G, J) > ε. By (7.44), the sum of their lengths is less than (1/ε)(ε²/N) (= ε/N).
Intervals of type IIIb: Those J’s in which ω(f ◦ G, J) ≤ ε.
All this amounts to a partition {a = t0 < t1 < ... < tn = b} of [a, b] in subintervals of type I, II, IIIa, and IIIb. Observe that the sum of the lengths of intervals of type I and IIIa is less than ε + N(ε/N) (= 2ε). Apply Corollary 693 twice to obtain
$$\int_{G(a)}^{G(b)} f(x)\,dx = \sum_{i=1}^{n} \int_{G(t_{i-1})}^{G(t_i)} f(x)\,dx = \sum_{i=1}^{n} \lambda_i [G(t_i) - G(t_{i-1})] = \sum_{i=1}^{n} \lambda_i \mu_i (t_i - t_{i-1}), \qquad (7.45)$$
where $\lambda_i$ is between the lower and upper bound of f on $[G(t_{i-1}), G(t_i)]$ and $\mu_i$ is between the lower and upper bound of g on $[t_{i-1}, t_i]$, for i = 1, 2, ..., n. For i = 1, 2, ..., n, fix $\xi_i \in [t_{i-1}, t_i]$. From (7.45) we have
$$\left| \int_{G(a)}^{G(b)} f(x)\,dx - \sum_{i=1}^{n} f[G(\xi_i)]g(\xi_i)(t_i - t_{i-1}) \right| \le \sum_{i=1}^{n} |\lambda_i \mu_i - f[G(\xi_i)]g(\xi_i)|(t_i - t_{i-1}); \qquad (7.46)$$
we shall prove that this sum is small. To this end, observe that the sum of the lengths of the intervals of type I and IIIa is less than 2ε, so
(I, IIIa) their total contribution to (7.46) is less than 2M² · 2ε. (7.47)
Throughout an interval of type II, we have |g(t)| < 2ε, and so $|\mu_i| \le 2\varepsilon$. It follows that
(II) their total contribution to (7.46) is at most 4Mε(b − a). (7.48)
Finally, for an interval J of type IIIb, we have both ω(g, J) < ε and ω(f, G(J)) < ε. Consequently, on this interval
$$|\lambda_i \mu_i - f(G(\xi_i))g(\xi_i)| = |\lambda_i [\mu_i - g(\xi_i)] + [\lambda_i - f(G(\xi_i))]g(\xi_i)| \le |\lambda_i| \cdot |\mu_i - g(\xi_i)| + |\lambda_i - f(G(\xi_i))| \cdot |g(\xi_i)| < 2M\varepsilon,$$
Fig. 7.20 The function $f(x) = \sqrt{1 - x^2}$ on [0, 1]
so
(IIIb) their total contribution to (7.46) is at most 2Mε(b − a). (7.49)
Combining (7.47), (7.48), and (7.49), we get
$$\left| \int_{G(a)}^{G(b)} f(x)\,dx - \sum_{i=1}^{n} f[G(\xi_i)]g(\xi_i)(t_i - t_{i-1}) \right| < 4M^2\varepsilon + 6M\varepsilon(b - a).$$
Since, for i = 1, 2, ..., n, the tag $\xi_i \in [t_{i-1}, t_i]$ was chosen arbitrarily, this concludes the proof. □
As an example of how to use Theorem 702, let g : [0, π/2] → R be given by g(t) = cos t (a continuous, and so Riemann integrable, function on [0, π/2], having an antiderivative G(t) := sin t on [0, π/2], see Corollary 683), and let $f(x) = \sqrt{1 - x^2}$, for x ∈ [0, 1] (a continuous function on [0, 1], hence Riemann integrable; for its graph, see Fig. 7.20). Then, with the notation in Theorem 702, G(t) := sin t for t ∈ [0, π/2], G[0, π/2] = [0, 1], and we get
$$\int_0^1 \sqrt{1 - x^2}\,dx = \int_0^{\pi/2} \cos^2 t\,dt.$$
In order to compute the last integral, use the fact that cos² t = (1 + cos(2t))/2 to obtain that $\int_0^{\pi/2} \cos^2 t\,dt = \pi/4$. For another approach to this evaluation, see Exercise 13.463. Observe that this example gives a quarter of the area of a circle of radius 1. Thus, the area of such a circle is π.
Corollary 704 Let G be a continuous real-valued function on a closed and bounded interval [a, b], and assume that G has a continuous bounded derivative on (a, b). Let f be a real-valued Riemann integrable function on the closed and bounded interval G[a, b]. Then
$$\int_a^b f(G(t))G'(t)\,dt = \int_{G(a)}^{G(b)} f(x)\,dx. \qquad (7.50)$$
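Formula (7.50) can be sanity-checked numerically on the example G(t) = sin t, f(x) = √(1 − x²) discussed before the corollary. The following sketch is an illustration under stated assumptions, not part of the text: it approximates both sides with a plain midpoint rule (the helper `midpoint_rule` and the number of subintervals are arbitrary choices made here) and compares them with π/4.

```python
import math

def midpoint_rule(func, lo, hi, n=20000):
    """Midpoint Riemann sum of func on [lo, hi] with n equal subintervals."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

def f(x):
    # f(x) = sqrt(1 - x^2); max() guards against tiny negative rounding errors
    return math.sqrt(max(0.0, 1.0 - x * x))

G = math.sin          # the substitution x = G(t)
G_prime = math.cos    # its derivative G'(t)

# Left-hand side of (7.50): integral of f(G(t)) G'(t) over [0, pi/2]
lhs = midpoint_rule(lambda t: f(G(t)) * G_prime(t), 0.0, math.pi / 2)
# Right-hand side of (7.50): integral of f over [G(0), G(pi/2)] = [0, 1]
rhs = midpoint_rule(f, G(0.0), G(math.pi / 2))

print(lhs, rhs, math.pi / 4)
```

Both approximations should be close to π/4; the right-hand side converges a little more slowly because the derivative of f is unbounded near x = 1.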
Proof Put g(t) := G′(t) for t ∈ (a, b), g(a) = g(b) = 0. Then g is Riemann integrable on [a, b], and G is an antiderivative of g on [a, b]. The conclusion follows from Theorem 702. □
Theorem 705 (Integration by Parts) Let f and g be real-valued continuous functions on a closed and bounded interval [a, b] ⊂ R, and assume that both f and g have bounded continuous derivatives on (a, b). Then
$$\int_a^b fg' = f(b)g(b) - f(a)g(a) - \int_a^b f'g. \qquad (7.51)$$
Proof Observe that (fg)′ = f′g + fg′ on (a, b). Extend the function (fg)′ arbitrarily to [a, b], and call this extension h (see Remark 699). Then the function fg is an antiderivative for h on [a, b], so the result follows from the Fundamental Theorem of Calculus (Theorem 685). □
Remark 706 The integration-by-parts formula (7.51) can be expressed in the language of antiderivatives in the following familiar way:
$$\int fg' = fg - \int f'g. \qquad (7.52)$$
Under the hypotheses in Theorem 705, formula (7.52) relates the antiderivatives $\int fg'$ and $\int f'g$ of fg′ and f′g, respectively, to the function fg. This formula comes from (7.51) by letting b := x, where x ∈ [a, b]: we get
$$\int_a^x f(t)g'(t)\,dt = f(x)g(x) - f(a)g(a) - \int_a^x f'(t)g(t)\,dt. \qquad (7.53)$$
The left-hand side of (7.53) is, according to Theorem 685, an antiderivative of fg′, and so is, then, $f(x)g(x) - \int_a^x f'(t)g(t)\,dt$, which justifies (7.52).
As an example of the use of Theorem 705 and formula (7.52), let us calculate $\int xe^x\,dx$ and $\int_0^1 xe^x\,dx$. For the computation of an antiderivative, put
$$\begin{cases} f = x, & g' = e^x, \\ f' = 1, & g = e^x. \end{cases} \qquad (7.54)$$
Then, according to (7.52),
$$\int xe^x\,dx = xe^x - \int e^x\,dx + C = xe^x - e^x + C,$$
where C is an arbitrary constant. To evaluate $\int_0^1 xe^x\,dx$, use the antiderivative $G(x) := xe^x - e^x$ computed above and formula (7.13) to obtain $\int_0^1 xe^x\,dx = G(1) - G(0) = (1 \cdot e^1 - e^1) - (-e^0) = 1$.
If we wish to use formula (7.51) directly we have, according to (7.54),
$$\int_0^1 xe^x\,dx = \left[ xe^x \right]_0^1 - \int_0^1 e^x\,dx = e - \left[ e^x \right]_0^1 = e - e + 1 = 1.$$
®
Fig. 7.21 The function h on [−1, 0) in Remark 707
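The integration-by-parts computation for $\int_0^1 xe^x\,dx$ can be cross-checked numerically. The sketch below is only illustrative: it evaluates both sides of formula (7.51) for f(x) = x and g(x) = eˣ with a midpoint rule (the helper `midpoint_rule` is a hypothetical name introduced here) and compares them with the exact value 1 obtained above.

```python
import math

def midpoint_rule(func, lo, hi, n=10000):
    """Midpoint Riemann sum of func on [lo, hi] with n equal subintervals."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

a, b = 0.0, 1.0
f = lambda x: x               # f(x) = x, so f'(x) = 1
f_prime = lambda x: 1.0
g = math.exp                  # g(x) = e^x, so g'(x) = e^x
g_prime = math.exp

# Left-hand side of (7.51): integral of f g'
lhs = midpoint_rule(lambda x: f(x) * g_prime(x), a, b)
# Right-hand side of (7.51): boundary terms minus the integral of f' g
rhs = f(b) * g(b) - f(a) * g(a) - midpoint_rule(lambda x: f_prime(x) * g(x), a, b)

print(lhs, rhs)
```

Both numbers should agree with 1 up to the discretization error of the midpoint rule.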
Remark 707 In Corollary 704 and Theorem 705 we were using functions defined and continuous on a closed and bounded interval [a, b], and having a bounded continuous derivative on (a, b). The class of those functions is larger than the class of continuously differentiable functions on [a, b], i.e., functions having a derivative on (a, b) and one-sided derivatives $f'_+(a)$ at a and $f'_-(b)$ at b (denoted here by f′(a) and f′(b), respectively), with f′ so defined continuous on [a, b]. As an example, consider the function h(x) := sin(1/x), x ∈ [−1, 0) (see Fig. 7.21). For x ∈ [−1, 0), put $H(x) := \int_{-1}^{x} h(t)\,dt$. Since h is continuous on [−1, x] (hence Riemann integrable on [−1, x]), we may use Proposition 680 to ensure that H is continuous on [−1, x]. Corollary 683 shows that h : [−1, x] → R has an antiderivative, and so Theorem 685 concludes that H is in fact an antiderivative for h on [−1, x]. Since x ∈ [−1, 0) was arbitrary, we get that H : [−1, 0) → R is continuous and has a derivative on (−1, 0) that coincides with h. Note, too, that for −1 ≤ x < y < 0, we have
$$\left| \int_x^y h(t)\,dt \right| \le \int_x^y |h(t)|\,dt \le \int_x^y dt = y - x.$$
This shows that $\lim_{x\to 0-} H(x)$ exists and is finite, so H has a continuous extension to a function (denoted again by H) defined on [−1, 0]. Summarizing, H : [−1, 0] → R is a continuous function on [−1, 0], it has a continuous and bounded derivative h on (−1, 0) (in particular, H is an antiderivative on [−1, 0] of any extension of h to [−1, 0]), although no possible extension of h to [−1, 0] may be continuous, as $\lim_{x\to 0-} \sin(1/x)$ does not exist.
Observe, too, that any extension $\tilde h$ of h to [−1, 0] is Riemann integrable. This follows from Corollary 675 (see also Remark 699 and Theorem 777 below). Indeed, h is bounded and continuous on (−1, 0). Moreover, $\int_{-1}^{x} h(t)\,dt \to \int_{-1}^{0} \tilde h(t)\,dt$ as x → 0−, as it was shown in Remark 699. ®
Fig. 7.22 Improper Riemann integrals of the first class
7.2
Improper Riemann Integrals
Definition 708 Let f be a real-valued function defined on the interval [a, +∞), where a ∈ R. We say that f has an improper Riemann integral of the first class on [a, +∞) (or, alternatively, that the improper Riemann integral of the first class of f on [a, +∞) exists) if f ∈ R[a, b] for every b ∈ R, b > a, and $\lim_{b\to+\infty} \int_a^b f$ exists as a real number. This limit is denoted by $\int_a^{+\infty} f$ (or by $\int_a^{+\infty} f(x)\,dx$), and its value is said to be the improper Riemann integral of the first class of f on [a, +∞). Similarly, we define improper Riemann integrals of the first class $\int_{-\infty}^{a} f$ for functions f ∈ R[b, a] for every b ∈ R, b < a, as the limit (whenever it exists as a real number) $\lim_{b\to-\infty} \int_b^a f$. This limit is denoted by $\int_{-\infty}^{a} f$ (or $\int_{-\infty}^{a} f(x)\,dx$). We say that the improper Riemann integral
$$\int_{-\infty}^{+\infty} f \quad \left( \text{denoted also by } \int_{-\infty}^{+\infty} f(x)\,dx \right)$$
exists if f ∈ R[a, b] for all choices of a, b ∈ R such that a ≤ b and $\lim_{a\to-\infty,\, b\to+\infty} \int_a^b f$ exists as a real number I, in the sense that for every ε > 0 there exists N ∈ N such that, for a < −N and b > N, $\left| \int_a^b f - I \right| < \varepsilon$. (See Fig. 7.22.)
Definition 709 Let f be a real-valued function defined on the interval [a, b), where a, b ∈ R and a < b. Assume that f ∈ R[a, d] for every a < d < b, and that $I := \lim_{d\to b-} \int_a^d f$ exists as a real number. Then, we say that the improper Riemann integral of the second class exists, and the value I is called the improper Riemann integral of the second class of f in [a, b], and is denoted again by $\int_a^b f$ (or by $\int_a^b f(x)\,dx$). The same applies to the case that f is defined on (a, b], that f ∈ R[c, b] for each a < c < b, and that $\lim_{c\to a+} \int_c^b f$ exists as a real number. Finally, for a function f defined on (a, b) such that f ∈ R[c, d] for each c, d ∈ (a, b) with c < d, and that the limit $I := \lim_{c\to a+,\, d\to b-} \int_c^d f$ exists as a real number (in the sense that
Fig. 7.23 Improper Riemann integrals of the second class
for every ε > 0 there exists δ > 0 such that for a < c < a + δ and b − δ < d < b we have $\left| \int_c^d f - I \right| < \varepsilon$), we say that the improper Riemann integral of the second class of f in [a, b] exists, and the value I is said to be the improper Riemann integral of the second class in [a, b], denoted again by $\int_a^b f$ (or by $\int_a^b f(x)\,dx$). (See Fig. 7.23.) We shall omit the expressions “of the first class” or “of the second class” in the previous definitions if the situation is clear from the context.
Remark 710 Any improper Riemann integral of the second class becomes an improper Riemann integral of the first class after a change of variable. For example, let f be a real-valued improper Riemann integrable function defined on the interval [a, b), a, b ∈ R, a < b. Define $t = \varphi(x) := (b - x)^{-1}$ for x ∈ [a, b). The function ϕ maps [a, b) onto [1/(b − a), +∞), and it has a continuous inverse ψ from [1/(b − a), +∞) onto [a, b) given by x = ψ(t) = b − 1/t. Fix t0 ∈ [1/(b − a), +∞). Since f : [a, b − 1/t0] → R is a Riemann integrable function, and the mapping ψ : [1/(b − a), t0] → [a, b − 1/t0] is a differentiable function with Riemann integrable (in fact, continuous) derivative, we can apply Theorem 702 to get that $t \mapsto f[\psi(t)]\,t^{-2}$ is Riemann integrable on [1/(b − a), t0] and
$$\int_{1/(b-a)}^{t_0} f[\psi(t)]\,t^{-2}\,dt = \int_a^{b - 1/t_0} f(x)\,dx.$$
This is true for every t0 ∈ [1/(b − a), +∞), and so f is improper Riemann integrable in [a, b) if and only if $t \mapsto f[\psi(t)]\,t^{-2}$ is improper Riemann integrable in [1/(b − a), +∞). ®
The Cauchy criterion for the convergence of a sequence (Theorem 152) or for the existence of the limit of a function at a point (Proposition 315) allows us to formulate a corresponding Cauchy criterion for the existence of an improper Riemann integral. We shall formulate it for improper Riemann integrals of the first class. A similar criterion for improper Riemann integrals of the second class holds.
Fig. 7.24 Plotting sin(x)/x on [0, 50] (Example 713)
Proposition 711 Let f be a real-valued function defined on the interval [a, +∞), where a ∈ R. Assume that f ∈ R[a, b] for each b ∈ (a, +∞). Then the improper Riemann integral $\int_a^{+\infty} f$ exists as a real number if and only if, given ε > 0, there exists b0 > a such that for b0 ≤ b1 < b2 we have $\left| \int_{b_1}^{b_2} f \right| < \varepsilon$.
Proof Assume first that $L := \int_a^{+\infty} f \in \mathbb{R}$. Then $\lim_{b\to+\infty} \int_a^b f = L$. Given ε > 0 there exists b0 ∈ (a, +∞) such that $\left| \int_a^b f - L \right| < \varepsilon/2$ for every b ∈ [b0, +∞). In particular, for b0 ≤ b1 < b2 we have
$$\left| \int_{b_1}^{b_2} f \right| = \left| \int_a^{b_2} f - \int_a^{b_1} f \right| \le \left| \int_a^{b_2} f - L \right| + \left| L - \int_a^{b_1} f \right| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
Conversely, assume that the condition holds. Fix a sequence $\{x_n\}_{n=1}^\infty$ in [a, +∞) such that xn → +∞. The sequence $\left\{ \int_a^{x_n} f \right\}_{n=1}^\infty$ is Cauchy, so it converges. The existence of $\lim_{b\to+\infty} \int_a^b f$ follows from the version of Proposition 314 for a limit at +∞. □
Corollary 712 Let f be a function defined on the interval [a, +∞), where a ∈ R, such that f ∈ R[a, b] for each b ∈ (a, +∞). Assume that $\int_a^{+\infty} |f|$ exists. Then $\int_a^{+\infty} f$ exists, too.
Proof By Proposition 711, for every ε > 0 there exists b0 > a such that for b0 ≤ b1 < b2 we have $\int_{b_1}^{b_2} |f| < \varepsilon$. Note that $\left| \int_{b_1}^{b_2} f \right| \le \int_{b_1}^{b_2} |f|$ (< ε), and so the result follows by applying again Proposition 711. □
Example 713 As an application of Theorem 694 and Proposition 711, we will show that the integral
$$\int_0^{+\infty} \frac{\sin x}{x}\,dx$$
exists as an improper Riemann integral of the first class (see Fig. 7.24). Indeed, if α and β are in (0, ∞) with α < β, then, by Theorem 694, there is ξ ∈ [α, β] such that
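$$\left| \int_\alpha^\beta \frac{\sin x}{x}\,dx \right| = \left| \frac{1}{\alpha} \int_\alpha^\xi \sin x\,dx + \frac{1}{\beta} \int_\xi^\beta \sin x\,dx \right| = \left| \frac{\cos\alpha - \cos\xi}{\alpha} + \frac{\cos\xi - \cos\beta}{\beta} \right| \le \left| \frac{\cos\alpha - \cos\xi}{\alpha} \right| + \left| \frac{\cos\xi - \cos\beta}{\beta} \right| \le \frac{2}{\alpha} + \frac{2}{\beta}.$$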
Then the result follows by using the Cauchy condition for the existence of an improper Riemann integral (Proposition 711). Observe that, in this particular example, the existence of the improper Riemann integral is clearly equivalent to the convergence of the series whose terms are the (decreasing) areas of the “bumps” (alternately positive and negative) depicted in Fig. 7.24. The convergence of this (alternating) series is given by the Leibniz criterion (Corollary 183), see also the argument in Theorem 716 below. Related to this integral, see Example 806 and Exercise 13.467, where the actual value of the integral is calculated. ♦
Remark 714 Note that if $\int_a^b f$ exists as a Riemann integral, then it also exists as an improper Riemann integral of the second class. Indeed, there exists a constant M > 0 such that |f| ≤ M on [a, b]. Then, for a < c < b,
$$\left| \int_a^b f - \int_a^c f \right| = \left| \int_c^b f \right| \le \int_c^b |f| \le M|b - c|.$$
Thus $\lim_{c\to b-} \int_a^c f = \int_a^b f$. ®
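The bound 2/α + 2/β obtained in Example 713 from the Second Mean Value Theorem can be observed numerically. The sketch below is an illustration only: it approximates $\int_\alpha^\beta (\sin x)/x\,dx$ with a midpoint rule for a few arbitrarily chosen pairs (α, β) and prints the absolute value next to the bound. The helper names `midpoint_rule` and `sinc_integral` and the step density are assumptions made here.

```python
import math

def midpoint_rule(func, lo, hi, n):
    """Midpoint Riemann sum of func on [lo, hi] with n equal subintervals."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

def sinc_integral(alpha, beta, steps_per_unit=200):
    """Approximate the integral of sin(x)/x over [alpha, beta], with 0 < alpha < beta."""
    n = max(1, int((beta - alpha) * steps_per_unit))
    return midpoint_rule(lambda x: math.sin(x) / x, alpha, beta, n)

pairs = [(10.0, 50.0), (100.0, 200.0), (1000.0, 1500.0)]
results = []
for alpha, beta in pairs:
    value = abs(sinc_integral(alpha, beta))
    bound = 2 / alpha + 2 / beta
    results.append((value, bound))
    print(alpha, beta, value, bound)
```

As α grows the bound tends to 0 uniformly in β > α, which is exactly the Cauchy condition of Proposition 711.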
Example 715 In the sense of improper Riemann integral we have (see Fig. 7.25)
1.
$$\int_0^1 \frac{1}{\sqrt{x}}\,dx = \lim_{\varepsilon\to 0} \int_\varepsilon^1 \frac{1}{\sqrt{x}}\,dx = \lim_{\varepsilon\to 0} \left[ 2\sqrt{x} \right]_\varepsilon^1 = \lim_{\varepsilon\to 0} (2 - 2\sqrt{\varepsilon}) = 2.$$
Note that the Riemann integral $\int_0^1 \frac{1}{\sqrt{x}}\,dx$ does not exist, as the function $\frac{1}{\sqrt{x}}$ is not bounded on (0, 1].
2.
$$\int_1^\infty \frac{1}{x^2}\,dx = \lim_{s\to\infty} \int_1^s \frac{1}{x^2}\,dx = \lim_{s\to\infty} \left[ -\frac{1}{x} \right]_1^s = \lim_{s\to\infty} \left( 1 - \frac{1}{s} \right) = 1.$$
♦
As the reader may conjecture after looking at the definition of the Riemann integral in terms of Riemann sums (see Theorem 669), there should be a connection between the computation of an improper Riemann integral and the sum of a series. A look at Fig. 7.26 will illustrate the point. This is made explicit in the following result (whose proof, by the way, is just a formalization of what should be clear by looking again at Fig. 7.26). Note, too, that Theorem 716 can be used in two ways: the behavior of a numerical series may inform us on the behavior of an improper integral, and conversely (see Corollary 717).
Fig. 7.25 The two functions in Example 715 (part of the range of the first one, part of the range and the domain of the second one)
Fig. 7.26 The upper and lower Riemann sums in the proof of Theorem 716 (panels (i) and (ii), function f on [1, 10])
Theorem 716 (Cauchy) Let f be a decreasing positive function defined on [1, +∞). Then the improper Riemann integral $\int_1^{+\infty} f$ exists if and only if the series $\sum_{n=1}^\infty f(n)$ converges.
Proof Assume that $\int_1^{+\infty} f$ exists as an improper Riemann integral. Then there is a constant C > 0 such that for each n ∈ N, $\int_1^n f < C$. Since $\sum_{j=2}^{n} f(j)$ is a lower Riemann sum (see Definition 658) for the function f on [1, n] and the partition {1 < 2 < ··· < n} (see Fig. 7.26i), we have $\sum_{j=2}^{n} f(j) < C$. As f(j) > 0 for each j, we get that $\sum_{n=1}^\infty f(n)$ is convergent (see the beginning of Sect. 2.4.3).
Assume now that $\sum_{n=1}^\infty f(n)$ is convergent. Thus for some C > 0, $\sum_{j=1}^{n-1} f(j) < C$ for all n ∈ N. If n ∈ N is given, note that the Riemann integral $\int_1^n f$ exists as f is decreasing (see Proposition 677). Then $\sum_{j=1}^{n-1} f(j)$ is an upper Riemann sum (see Definition 658) for the function f on [1, n] and the partition {1 < 2 < ··· < n} (see Fig. 7.26ii). Therefore $\int_1^n f \le C$ for all n. Since f > 0, we get $\int_1^x f \le C$ for every x ≥ 1. Therefore $x \mapsto \int_1^x f$ is a bounded increasing function and thus its limit at +∞ exists as a real number. This means that the improper Riemann integral $\int_1^{+\infty} f$ exists. □
Applications The following corollary gives an alternative approach, based on Theorem 716, to the study of the character of series of the form $\sum \frac{1}{n^p}$. See Proposition 174, a consequence of Cauchy’s Condensation Criterion (Proposition 173).
Corollary 717 The series $\sum_{n=1}^\infty \frac{1}{n^p}$ is convergent if and only if p > 1.
Proof If p ≤ 0, then it is not true that $\lim_{n\to\infty} \frac{1}{n^p} = 0$ and thus $\sum_{n=1}^\infty \frac{1}{n^p}$ is divergent by Proposition 158. If 0 < p and p ≠ 1, we can use Theorem 716. Indeed, we have
$$\int_1^x \frac{1}{t^p}\,dt = \left[ \frac{t^{-p+1}}{-p+1} \right]_1^x = \frac{1}{1-p} \left( \frac{1}{x^{p-1}} - 1 \right).$$
Thus $\lim_{x\to+\infty} \int_1^x \frac{1}{t^p}\,dt$ exists as a real number if and only if p > 1. The case p = 1 gives the harmonic series, which is divergent (see Proposition 161). This case can also be treated this way: $\lim_{x\to+\infty} \int_1^x \frac{1}{t}\,dt = \lim_{x\to+\infty} \ln x = +\infty$, so the series is divergent. □
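The two Riemann-sum inequalities used in the proof of Theorem 716 can be checked numerically for f(t) = 1/tᵖ. The sketch below is a hedged illustration: it uses the closed form of $\int_1^n t^{-p}\,dt$ from the proof of Corollary 717 (valid for p ≠ 1) and verifies that it lies between the lower sum Σ_{j=2}^{n} f(j) and the upper sum Σ_{j=1}^{n−1} f(j). The function names and the sample values of p and n are choices made here.

```python
def f(t, p):
    """The decreasing positive function f(t) = 1 / t**p on [1, +inf), for p > 0."""
    return t ** (-p)

def integral_1_to_n(n, p):
    """Closed form of the integral of t**(-p) over [1, n]; requires p != 1."""
    return (n ** (1 - p) - 1) / (1 - p)

def bracket(n, p):
    """Return (lower Riemann sum, integral, upper Riemann sum) on [1, n]."""
    lower = sum(f(j, p) for j in range(2, n + 1))   # right endpoints: f is decreasing
    upper = sum(f(j, p) for j in range(1, n))       # left endpoints
    return lower, integral_1_to_n(n, p), upper

for p in (2.0, 0.5):
    for n in (10, 1000):
        lower, integral, upper = bracket(n, p)
        print(p, n, lower, integral, upper)
```

For p = 2 all three quantities stay bounded as n grows, while for p = 0.5 they grow without bound together, in agreement with Corollary 717.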
Integral Comparison Tests
Theorem 718 Let f and g be two continuous positive functions on [1, +∞).
(i) Assume that f ≤ g and assume that the improper Riemann integral $\int_1^{+\infty} g$ exists. Then the improper Riemann integral $\int_1^{+\infty} f$ exists.
(ii) Assume that the improper Riemann integral $\int_1^{+\infty} g$ exists and assume that $\lim_{x\to+\infty} \frac{f(x)}{g(x)} = 0$. Then the improper Riemann integral $\int_1^{+\infty} f$ exists.
(iii) Assume that $\lim_{x\to\infty} \frac{f(x)}{g(x)} = L$ with 0 < L < +∞. Then the improper Riemann integral $\int_1^{+\infty} f$ exists if and only if the improper Riemann integral $\int_1^{+\infty} g$ exists.
Proof We will prove (ii). The others are similar. From the limit condition there is x0 such that f(x) ≤ g(x) for all x ≥ x0. From the existence of the integral $\int_1^{+\infty} g$ it follows that there is a constant C > 0 such that $\int_{x_0}^{x} g \le C$ for all x ≥ x0. Thus, $\int_{x_0}^{x} f \le C$ for all x ≥ x0. As the function $x \mapsto \int_{x_0}^{x} f$ on [1, +∞) is increasing, we get that $\lim_{x\to+\infty} \int_{x_0}^{x} f$ exists as a real number. Thus, the improper Riemann integral $\int_{x_0}^{+\infty} f$ exists. Since f is continuous on [1, ∞) we get that the improper Riemann integral $\int_1^{+\infty} f$ exists. □
Example 719 The improper Riemann integral
$$\int_0^{\pi/2} \frac{\ln \sin x}{\sqrt{x}}\,dx$$
exists. The problem is at x = 0 (see Fig. 7.27, where the graph of the function f under the integral sign is plotted). We try to compare the function f with some power function that would “eliminate” the logarithm in the following sense: try to compare with, say, $x^{-3/4}$. We get
$$\lim_{x\to 0+} \frac{(\ln \sin x)\,x^{-1/2}}{x^{-3/4}} = 0$$
by L’Hôspital’s Rule (Theorem 376). Since $\int_0^{\pi/2} x^{-3/4}\,dx$ exists, we get that the integral in question exists by Theorem 718 (ii). ♦
Fig. 7.27 The function under the integral sign in Example 719, and its integral
Fig. 7.28 The function under the integral sign in Example 720
Example 720 To decide on the existence of the improper Riemann integral $\int_0^{+\infty} x^{100} e^{-x}\,dx$, compare with $e^{-x/2}$. Observe that $\lim_{x\to+\infty} \frac{x^{100} e^{-x}}{e^{-x/2}} = 0$ by L’Hôspital’s Rule (Theorem 376). Since $\int_0^{+\infty} e^{-x/2}\,dx$ exists, we get from Theorem 718 (ii) that the integral in question exists as well (Fig. 7.28). ♦
7.3
The Lebesgue Integral
7.3.1
Introduction
The theory of the Riemann integral developed in Sect. 7.1 is well adapted to the usual requirements of the elementary calculus. However, it is not sufficient for the needs of a more advanced function theory. The behavior regarding convergence, for example, is unsatisfactory (an example was shown in Remark 696). On the other hand, the theory of integration provided by the Lebesgue integral appears well suited for finer analysis, is very stable regarding convergence, and, last but not least, properly includes the Riemann theory, in the sense that on Riemann integrable functions the Lebesgue and the Riemann integral agree, and strictly more functions can be integrated in the Lebesgue sense than in the Riemann sense. A great advantage of the Lebesgue integral is that, unlike the situation with the Riemann integral, the space of Lebesgue integrable functions is a complete metric space in the standard integral norm $\|\cdot\|_1$ (see Exercise 13.485). More precisely, the
Fig. 7.29 Divisions for the Lebesgue integral
vector space R1[a, b] of all classes containing Riemann integrable functions on a given closed and bounded interval [a, b] is dense in the complete normed space (i.e., the Banach space) L1[a, b] of all classes of Lebesgue integrable functions on [a, b] endowed with the integral norm $\|\cdot\|_1$. Thus, (L1[a, b], $\|\cdot\|_1$) becomes a completion of (R1[a, b], $\|\cdot\|_1$). This will be proved in Proposition 785 and Exercise 13.485.

The Riemann procedure for integrating a bounded real-valued function defined on an interval [a, b] basically proceeds by dividing the given interval [a, b] into adjacent subintervals and approximating the integral by a Riemann sum, i.e., considering the measure (the length) of the base intervals times the value of the function at some point in the interval (the height), and adding all these products. Proceed now differently: divide the range of the function into adjacent disjoint intervals, and consider their preimages (a collection of disjoint subsets of [a, b]). Then compute the analogous sum (adding the measure of each of those sets times the value of the function at some point in the set). This will be the approximation to the integral. The only objection to this procedure is that the partition of [a, b] so obtained need not consist of intervals. If the function is acceptable in the sense of measure theory (i.e., measurable), the partition will certainly consist of measurable sets, and the existence of the measure of each of them will be guaranteed.²

We claim that, even from the computational point of view, the Lebesgue approach is more efficient than Riemann’s technique. Indeed, assume that we want to compute the integral of a function like the one depicted in Fig. 7.29. The values of f along the subset $I_5$ are somehow redundant, since there the function is almost constant. If we divide [a, b] into, say, equally spaced intervals and compute the value of f at a point in each subinterval, we repeat the same information many times, a useless procedure.
However, if we divide the range of f in, say, equally spaced intervals
2 The difference between the Riemann and the Lebesgue integration is sometimes described as follows: to count the total value of a pile of coins, the Riemann approach consists in taking coins one by one and adding their values, while the Lebesgue approach proceeds by grouping first the two-dollar coins, then the one-dollar ones, then the quarters, etc., and finally counting the number of coins in each of the piles.
$J_0, \dots, J_5$, only one of them ($J_5$) accounts for the redundant almost-horizontal part of the graph. A single preimage (the set $I_5$) collects all this repetitive information.

In the literature we find several ways to present the Lebesgue measure and integral theory on the real line. Some of them proceed from the measure to the integral, others vice versa. We already have at hand the basic Lebesgue measure theory, so nothing is more natural than to use it to develop the theory of integration. However, we find it enlightening to start with the construction of the integral (first for simple functions called “step functions”, then for more complicated ones, the so-called “Lebesgue measurable functions”) without referring to the Lebesgue measure. Later on (in Sect. 7.3.6) we shall connect both approaches and see that they are, in fact, equivalent.

A remark is in order: although we will evolve from integral to measure, some measure is needed at the beginning: at least, the measure $\lambda((a, b))$ ($= b - a$) of a bounded interval $(a, b)$, and the concept of set of measure zero. Recall that a set $S \subset \mathbb{R}$ is said to have measure zero (we say then that the set is null) if for every $\varepsilon > 0$ there exists a cover $\{I_n\}_{n=1}^{\infty}$ of $S$ by open intervals such that $\sum_{n=1}^{\infty} \lambda(I_n) < \varepsilon$ (see Definition 261 and Proposition 247). Recall, too (see Definition 261), that a property related to real numbers is said to hold almost everywhere in a certain subset $S$ (in short, (a.e.) in $S$), if it holds for every element $x \in S \setminus N$, where $N$ is a null set.

Regarding notation, we have been using the symbols $\int_a^b f$ or $\int_a^b f(x)\,dx$ for the Riemann integral of a real-valued Riemann integrable function $f$ on a closed and bounded interval $[a, b]$, and $\int_a^{+\infty} f$ or $\int_a^{+\infty} f(x)\,dx$ (and their variants) for the improper Riemann integral of $f$ on some general interval. We shall use the symbol $\int_E f$ or $\int_E f(x)\,dx$ for the Lebesgue integral of a real-valued Lebesgue integrable function $f$ on a measurable set $E \subset \mathbb{R}$. Since every Riemann integrable function $f$ on a closed and bounded interval $[a, b]$ is Lebesgue integrable (see Theorem 783), we shall use interchangeably $\int_a^b f$ or $\int_{[a,b]} f$ (or $\int_a^b f(x)\,dx$ and $\int_{[a,b]} f(x)\,dx$, respectively) in such a case.

Monotone sequences of functions pointwise converging to a function (maybe almost everywhere) play an important role in this section. If $\{f_n\}_{n=1}^{\infty}$ is a pointwise increasing sequence of real-valued functions defined on a common domain $D$ (i.e., such that $f_n(x) \le f_{n+1}(x)$ for every $x \in D$ and every $n \in \mathbb{N}$) that pointwise converges to a function $f$ on $D$, we shall write $f_n \uparrow f$. If $f_n(x) \le f_{n+1}(x)$ (a.e.) on $D$, and the pointwise convergence of the sequence $\{f_n\}_{n=1}^{\infty}$ to $f$ occurs also (a.e.), we shall write $f_n \uparrow f$ (a.e.). Change $\uparrow$ to $\downarrow$ if “increasing” is changed to “decreasing” in the preceding sentence.
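The contrast between partitioning the domain and partitioning the range, described in this introduction, can be sketched numerically. The example below is an illustration under stated assumptions, not the construction used later in the text: the test function $f(x) = x^2$ on $[0, 1]$ is chosen here because it is increasing, so the preimage of a range interval $[y_j, y_{j+1})$ is the interval $[\sqrt{y_j}, \sqrt{y_{j+1}})$ and its measure is available in closed form. The function names are hypothetical.

```python
import math

def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum: partition the domain [a, b] into n equal pieces."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def lebesgue_sum_square(n):
    """Range-partition sum for f(x) = x**2 on [0, 1].

    The range [0, 1] is cut at the levels y_j = j/n. The preimage of
    [y_j, y_{j+1}) under f is [sqrt(y_j), sqrt(y_{j+1})), an interval
    whose length is known exactly because f is increasing.
    """
    total = 0.0
    for j in range(n):
        y_lo, y_hi = j / n, (j + 1) / n
        measure = math.sqrt(y_hi) - math.sqrt(y_lo)  # length of the preimage
        total += y_lo * measure                      # lowest value times measure
    return total

print(riemann_sum(lambda x: x * x, 0.0, 1.0, 1000))
print(lebesgue_sum_square(1000))
```

Both sums approximate $1/3$. The range-partition sum uses the lowest value on each preimage, so it approximates from below with an error of at most $1/n$.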
7.3.2 Step Functions
We are interested in integrating (in general unbounded) real-valued functions defined on subsets (in general unbounded) of R. We shall start by integrating step functions, that, as a matter of convenience, we shall assume to be defined on general intervals. By a general interval we mean a (bounded, unbounded, closed, open, or half-closed)
Fig. 7.30 A step function
subinterval of $\mathbb{R}$. Σ-measurable step functions were defined in a more general setting (real-valued functions defined on elements of a σ-algebra of subsets of $\mathbb{R}$) in Sect. 4.5.4.

Definition 721 Let $I$ be a general interval. A function $s$ defined on $I$ is called a step function if there exists a compact subinterval $[a, b]$ of $I$ and a partition $P := \{a = x_0 < x_1 < \dots < x_n = b\}$ of $[a, b]$ such that $s$ vanishes outside $[a, b]$, and it is constant on each open subinterval $o_i := (x_{i-1}, x_i)$ (with value $s_i$, say), $i = 1, 2, \dots, n$. The set of all real-valued step functions defined on $I$ will be denoted by $S(I)$. The Lebesgue integral of such an $s$ on $I$ is defined by
$$\int_I s := \sum_{i=1}^{n} s_i\,\lambda(o_i).$$

Remark 722 Observe that nothing is said on the values of $s$ at the end points of the subintervals $[x_{i-1}, x_i]$, $i = 1, \dots, n$, defined by the partition $P$. The way the integral of a step function $s$ has been defined is independent of the particular values $s$ takes at those end points. ®

The set $S(I)$ is, obviously, a subset of the set of all Lebesgue measurable functions defined on $I$ (see Definition 400). Clearly, $S(I)$ is stable under taking sums, products by real numbers, and products. It is also clear that $\int_I$ is a linear operator, i.e., $\int_I (s_1 + s_2) = \int_I s_1 + \int_I s_2$, and $\int_I \alpha s_1 = \alpha \int_I s_1$, where $s_1, s_2$ are step functions on $I$ and $\alpha \in \mathbb{R}$. It is obvious, too, that $\int_I$ is a monotone operator, in the sense that if $s_1, s_2$ are as before, and $s_1(t) \le s_2(t)$ on $I$, then $\int_I s_1 \le \int_I s_2$. The next property about sequences of step functions is crucial.

Proposition 723 Let $\{s_n\}_{n=1}^{\infty}$ be a decreasing sequence of step functions defined on a general interval $I$. Assume that $s_n \downarrow 0$ (a.e.). Then $\int_I s_n \downarrow 0$.

Proof The function $s_1$ vanishes outside a compact interval $[a, b]$. Observe that $s_1 \ge 0$ (a.e.), so $s_n$ vanishes (a.e.) outside $[a, b]$ and $s_n \ge 0$ (a.e.), for every $n \in \mathbb{N}$. Moreover, $s_1(t) \le M$ for every $t \in [a, b]$, and for some $M \ge 0$. The same holds (a.e.) for every $s_n$, $n \in \mathbb{N}$. Let $E_n$ be the (finite) collection of all the end points of the subintervals defining $s_n$. Then $E := \bigcup_{n=1}^{\infty} E_n$ is a countable subset of $[a, b]$, hence null. Let $N$ be the subset of $[a, b]$ consisting of all points $t$ where $\{s_n(t)\}_{n=1}^{\infty}$ does not converge. By assumption $N$ is null. Put $F := E \cup N$, again a null subset of $[a, b]$. Fix $\varepsilon > 0$. There exists a cover $\{I_n\}_{n=1}^{\infty}$ of $F$ by open intervals such that $\sum_{n=1}^{\infty} \lambda(I_n) < \varepsilon$. If $x \in [a, b] \setminus F$, then $s_n(x) \to 0$, hence we may find $N(x) \in \mathbb{N}$ such that $s_n(x) < \varepsilon$ for all $n \ge N(x)$. Observe that $x \notin E$, so $s_{N(x)}$ is constant (a constant
less than $\varepsilon$) on some open interval, say $U(x)$. Since $\{s_n\}_{n=1}^{\infty}$ is decreasing, we get $s_n(t) < \varepsilon$, for all $n \ge N(x)$, $t \in U(x)$. The family $\{U(x) : x \in [a, b] \setminus F\} \cup \{I_n : n \in \mathbb{N}\}$ is an open cover of the compact set $[a, b]$. Therefore, we can find a finite subset $\{x_i : i = 1, \dots, p\}$ of $[a, b] \setminus F$ and some $q \in \mathbb{N}$ such that $\{U(x_i) : i = 1, \dots, p\} \cup \{I_n : n = 1, \dots, q\}$ covers $[a, b]$. Put $N := \max\{N(x_i) : i = 1, 2, \dots, p\}$. Finally, let $B := \bigcup_{n=1}^{q} I_n$, and put $A := [a, b] \setminus B$. The set $A$ is a finite union of intervals. Observe that
$$\int_I s_n = \int_A s_n + \int_B s_n, \quad n \in \mathbb{N}.$$
Now we have $\int_B s_n \le M\varepsilon$, since $s_n(t) \le M$ for every $n \in \mathbb{N}$ and every $t \in [a, b]$, and $\sum_{n=1}^{\infty} \lambda(I_n) < \varepsilon$. Moreover, $\int_A s_n \le \varepsilon(b - a)$ for $n \ge N$, since $s_n(t) < \varepsilon$ for every $t \in A$ and every $n \ge N$. All together, this shows that
$$\int_I s_n < M\varepsilon + \varepsilon(b - a), \quad \text{for all } n \ge N.$$
This proves the assertion. □
Corollary 724 Let $\{s_n\}_{n=1}^{\infty}$ be an increasing sequence of step functions on a general interval $I$. Assume that, for some function $f$ on $I$, $s_n \uparrow f$ (a.e.). Assume, too, that $\int_I s_n \uparrow L$ for some $L \in \mathbb{R}$. Then, if $s$ is a step function on $I$ such that $s \le f$ (a.e.), we have $\int_I s \le L$.

Proof Define a sequence $\{r_n\}_{n=1}^{\infty}$ of step functions on $I$ by $r_n := \max\{s - s_n, 0\}$, for $n \in \mathbb{N}$. Note that the sequence $\{r_n\}_{n=1}^{\infty}$ is decreasing. Note, too, that $r_n \to 0$ (a.e.). By Proposition 723, $\int_I r_n \to 0$. Since $s - s_n \le r_n$ for all $n \in \mathbb{N}$, and $\int_I (s - s_n) \to \int_I s - L$, we get $\int_I s - L \le 0$. This proves the assertion. □
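The integral of a step function in Definition 721 is a finite sum, so it can be computed directly. The sketch below is illustrative only: the function `step_integral` and its argument names are hypothetical, a step function is represented by its partition points and its values on the open subintervals, and exact rational arithmetic is used to avoid rounding. The sample sequence $s_n := (1/n)\chi_{(0,1)}$ is a simple instance of Proposition 723.

```python
from fractions import Fraction

def step_integral(points, values):
    """Integral of a step function in the sense of Definition 721.

    points: the partition a = x_0 < x_1 < ... < x_n = b
    values: s_1, ..., s_n, the constant value on each open interval (x_{i-1}, x_i)
    Values at the partition points play no role (Remark 722).
    """
    if len(points) != len(values) + 1:
        raise ValueError("need exactly one value per subinterval")
    if any(p >= q for p, q in zip(points, points[1:])):
        raise ValueError("partition points must be strictly increasing")
    # Sum of s_i times the length of (x_{i-1}, x_i).
    return sum(s * (q - p) for s, p, q in zip(values, points, points[1:]))

# s_n equals 1/n on (0, 1) and 0 elsewhere: a decreasing sequence with s_n -> 0.
integrals = [step_integral([Fraction(0), Fraction(1)], [Fraction(1, n)])
             for n in range(1, 6)]
print(integrals)
```

For this sequence the integrals are $1/n$, which decrease to $0$ as Proposition 723 predicts.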
7.3.3 Upper Functions
We enlarge now the set of functions for which the integral can be computed. Corollary 724 provides the clue for the proper definition of the enlarged set. In the same spirit as for step functions, the new objects are functions that may be defined only (a.e.) on their domain.

Definition 725 Let $I$ be a general interval in $\mathbb{R}$. A real-valued function $u$ defined on $I$ is called an upper function if there exists a sequence $\{s_n\}_{n=1}^{\infty}$ in $S(I)$ such that
(i) $\{s_n\}_{n=1}^{\infty}$ is increasing.
(ii) $s_n \uparrow u$ (a.e.).
(iii) $\int_I s_n \uparrow L$ for some $L \in \mathbb{R}$.
A sequence $\{s_n\}_{n=1}^{\infty}$ in $S(I)$ such that (i), (ii), and (iii) hold is said to generate the function $u$. We define $\int_I u := L$ (a real number). The set of upper functions on $I$ is denoted by $U(I)$.

The set $U(I)$ is a subset of the set of all measurable functions defined on $I$. This follows readily from (c) in Proposition 402. See also Proposition 408. To justify the consistency of the definition of $\int_I u$ for an upper function $u$ in Definition 725, we need to guarantee that the limit $L$ there, whenever it exists, is independent of the particular choice of the sequence $\{s_n\}_{n=1}^{\infty}$ of step functions. This is done in the next result.

Proposition 726 Let $u \in U(I)$, where $I$ is a general interval in $\mathbb{R}$. Let $\{r_n\}_{n=1}^{\infty}$ and $\{s_n\}_{n=1}^{\infty}$ be two increasing sequences in $S(I)$ such that $r_n \uparrow u$ (a.e.), $s_n \uparrow u$ (a.e.), and $\int_I r_n \uparrow L$, where $L \in \mathbb{R}$. Then $\int_I s_n \uparrow L$.

Proof By Corollary 724, for every $n \in \mathbb{N}$ we have $\int_I s_n \le L$, since $s_n$ is a step function such that $s_n \le u$ (a.e.). Moreover, the sequence $\{\int_I s_n\}_{n=1}^{\infty}$ is increasing. Therefore, it converges, by Theorem 135. Let $S$ be its limit. Since $\int_I s_n \le L$ for all $n \in \mathbb{N}$, we get $S \le L$. It is enough now to reverse the roles of $\{s_n\}$ and $\{r_n\}$ to get $L \le S$. □

Proposition 727 Let $f$ be a real-valued continuous function defined on a closed and bounded interval $[a, b]$. Then $f \in U([a, b])$, and the Riemann integral $\int_a^b f$ of $f$ on $[a, b]$ coincides with the integral $\int_{[a,b]} f$ of $f$ as a function in $U([a, b])$ (Definition 725).

Proof If the interval $[a, b]$ is degenerate, there is nothing to prove. Assume now that $a < b$ and let $M := \sup\{f(x) : x \in [a, b]\}$. For $n \in \mathbb{N}$, let $P_n \in P([a, b])$ be given by
$$P_n := \{a = x_0 < x_1 = a + (b - a)/n < x_2 = a + 2(b - a)/n < \dots < x_{n-1} = a + (n - 1)(b - a)/n < x_n = b\}.$$
Define a step function $s_n$ on $[a, b]$ as
$$s_n(x) := \begin{cases} \inf\{f(y) : y \in (x_{i-1}, x_i)\} & \text{if } x \in (x_{i-1}, x_i),\ i = 1, 2, \dots, n, \\ M & \text{otherwise.} \end{cases}$$
The sequence $\{s_n\}_{n=1}^{\infty}$ is obviously increasing. The uniform continuity of $f$ on $[a, b]$ ensures that $\{s_n(x)\}_{n=1}^{\infty}$ converges to $f(x)$ for all $x \in [a, b] \setminus P$, where $P := \bigcup_{n=1}^{\infty} P_n$ (since $P$ is countable, it has Lebesgue measure 0). Indeed, given $\varepsilon > 0$ there exists $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ whenever $x, y \in [a, b]$, $|x - y| < \delta$. Fix $x \in [a, b] \setminus P$. It follows that for $n \in \mathbb{N}$ such that $(b - a)/n < \delta$, and letting $I$ be the subinterval defined by $P_n$ where $x$ lies, we have $0 \le f(x) - \inf\{f(y) : y \in I\} \le \varepsilon$, so $f(x) - s_n(x) \le \varepsilon$ for $n \in \mathbb{N}$ big enough.
Observe that $\int_a^b s_n = L(f, P_n)$, where $L(f, P_n)$ is the Riemann lower sum defined in (7.4). The fact that $f$ is Riemann integrable (see Proposition 674) shows that $\int_a^b s_n \to \int_a^b f$ as $n \to \infty$. All together we get then that $f \in U([a, b])$ and that $\int_a^b s_n \to \int_a^b f$. This proves that $\int_a^b f = \int_{[a,b]} f$. □

The set $U(I)$ is rather stable: this is the content of the following result.

Proposition 728 Let $u, v$ be two real-valued functions defined on a general interval $I$. Let $\alpha \ge 0$. Then, if $u, v \in U(I)$,
(i) $(u + v) \in U(I)$, and $\int_I (u + v) = \int_I u + \int_I v$.
(ii) $\alpha u \in U(I)$, and $\int_I (\alpha u) = \alpha \int_I u$.
(iii) If $u \le v$ (a.e.), then $\int_I u \le \int_I v$.
(iv) If $u = v$ (a.e.), then $\int_I u = \int_I v$.
(v) $\max\{u, v\} \in U(I)$ and $\min\{u, v\} \in U(I)$.

Proof Let $\{s_n\}_{n=1}^{\infty}$ ($\{t_n\}_{n=1}^{\infty}$) be a sequence in $S(I)$ that generates $u$ (respectively, $v$).
(i) Note that $\{s_n + t_n\}_{n=1}^{\infty}$ (a sequence in $S(I)$) obviously generates $u + v$. Since $\int_I (s_n + t_n) = \int_I s_n + \int_I t_n$ for all $n \in \mathbb{N}$, $\int_I s_n \to \int_I u$, and $\int_I t_n \to \int_I v$, we get $\int_I (u + v) = \int_I u + \int_I v$.
(ii) The sequence $\{\alpha s_n\}_{n=1}^{\infty}$ generates $\alpha u$. The result follows from the fact that $\int_I \alpha s_n = \alpha \int_I s_n$ for all $n \in \mathbb{N}$.
(iii) Consider the sequence $\{t_n\}_{n=1}^{\infty}$, and note that $\int_I t_n \to \int_I v$. Since $s_n \le u \le v$ (a.e.), Corollary 724 shows that $\int_I s_n \le \int_I v$, for all $n \in \mathbb{N}$. Letting $n \to \infty$ we get $\int_I u \le \int_I v$.
(iv) is a straightforward consequence of (iii).
(v) Put $R_n := \max\{s_n, t_n\}$, and $r_n := \min\{s_n, t_n\}$, for $n \in \mathbb{N}$. Then $\{r_n\}_{n=1}^{\infty}$ and $\{R_n\}_{n=1}^{\infty}$ are sequences in $S(I)$. The sequence $\{r_n\}_{n=1}^{\infty}$ is increasing; moreover, $r_n \to \min\{u, v\}$ (a.e.), and $r_n \le s_n \le u$, hence $\int_I r_n \le \int_I u$, so the increasing sequence $\{\int_I r_n\}$ is bounded above. Therefore, it converges, so $\min\{u, v\} \in U(I)$. Note now that $s_n + t_n = r_n + R_n$, so $R_n = s_n + t_n - r_n$ for all $n \in \mathbb{N}$. We get $\int_I R_n = \int_I s_n + \int_I t_n - \int_I r_n$. Since the right-hand side of this equality converges, so does the left-hand side. This proves that $\{R_n\}_{n=1}^{\infty}$ generates $\max\{u, v\}$, and so $\max\{u, v\} \in U(I)$. □
Remark 729 The condition $\alpha \ge 0$ in Proposition 728 (ii) cannot be removed. There is a function $u \in U(I)$ such that $-u \notin U(I)$. To describe this function, let $\{r_n\}_{n=1}^{\infty}$ be the sequence of all rational numbers in $I := [0, 1]$, and put $I_n := [r_n - 4^{-n}, r_n + 4^{-n}]$ for all $n \in \mathbb{N}$. Let $A := \bigcup_{k=1}^{\infty} I_k$, and, for $n \in \mathbb{N}$, put $A_n := \bigcup_{k=1}^{n} I_k$. For each $n \in \mathbb{N}$, the characteristic function $\chi_{A_n}$ of $A_n$ belongs to $S(I)$, and $\chi_{A_n} \uparrow \chi_A$. Since $\lambda(A_n) \le \sum_{k=1}^{n} \lambda(I_k) \le \sum_{k=1}^{n} 2/4^k$, we get $\int_I \chi_{A_n} \le 2/3$ for all $n \in \mathbb{N}$, hence $\chi_A \in U(I)$ and $\int_I \chi_A \le 2/3$. Observe now that if a step function $s$ satisfies $s \le -\chi_A$, then $s \le -1$ (a.e.) (hence $\int_I s \le -1$). This follows from the fact that every nonempty subinterval of $I$ contains rational numbers. Assume for a moment that $-\chi_A \in U(I)$. Then, from the previous observation we get $\int_I (-\chi_A) \le -1$. Since, due to Proposition 728 (i), $\int_I \chi_A + \int_I (-\chi_A) = 0$, we obtain $\int_I (-\chi_A) = -\int_I \chi_A \ge -2/3$, a contradiction. ®
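The bound $\lambda(A_n) \le 2/3$ in Remark 729 can be verified for finitely many intervals with exact arithmetic. This is a sketch under assumptions that are not in the text: the particular enumeration of the rationals ($0, 1, 1/2, 1/3, 2/3, \dots$) is one arbitrary choice, the intervals are clipped to $[0, 1]$, and all function names are hypothetical.

```python
from fractions import Fraction
from math import gcd

def rationals_in_unit_interval(count):
    """First `count` terms of one (arbitrary) enumeration of the rationals in [0, 1]."""
    out = [Fraction(0), Fraction(1)]
    q = 2
    while len(out) < count:
        out.extend(Fraction(p, q) for p in range(1, q) if gcd(p, q) == 1)
        q += 1
    return out[:count]

def measure_of_union(intervals):
    """Total length of a finite union of closed intervals (merge overlaps, then add)."""
    total = Fraction(0)
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total

def measure_A_n(n):
    """Measure of A_n, the union of I_k = [r_k - 4**-k, r_k + 4**-k] for k <= n."""
    intervals = []
    for k, r in enumerate(rationals_in_unit_interval(n), start=1):
        eps = Fraction(1, 4 ** k)
        # Clipping to [0, 1] is an assumption of this sketch; it can only shrink A_n.
        intervals.append((max(r - eps, Fraction(0)), min(r + eps, Fraction(1))))
    return measure_of_union(intervals)

print([float(measure_A_n(n)) for n in (1, 2, 5, 20)])
```

Whatever enumeration is used, the total length stays below $\sum_{k \ge 1} 2/4^k = 2/3$, even though $A$ contains every rational number in $[0, 1]$.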
Proposition 730 Let $I$ be a general interval. Let $c \in I$. Put $I_c^- := I \cap (-\infty, c]$, and $I_c^+ := I \cap [c, +\infty)$.
(i) If $u \in U(I)$, then $u|_{I_c^-} \in U(I_c^-)$, and $u|_{I_c^+} \in U(I_c^+)$.
(ii) Let $u$ be a real-valued function on $I$. If $u|_{I_c^-} \in U(I_c^-)$ and $u|_{I_c^+} \in U(I_c^+)$, then $u \in U(I)$.

Proof
(i) Let $\{s_n\}_{n=1}^{\infty}$ be a sequence in $S(I)$ that generates $u$. Obviously, $\{s_n|_{I_c^-}\}_{n=1}^{\infty}$ is a sequence in $S(I_c^-)$ that generates $u|_{I_c^-}$, and so $u|_{I_c^-} \in U(I_c^-)$. A similar argument applies to $I_c^+$.
(ii) Let $\{s_n\}_{n=1}^{\infty}$ ($\{t_n\}_{n=1}^{\infty}$) be a sequence in $S(I_c^-)$ (respectively, $S(I_c^+)$) that generates $u|_{I_c^-}$ (respectively, $u|_{I_c^+}$). Define, for $n \in \mathbb{N}$,
$$h_n(x) := \begin{cases} s_n(x) & \text{if } x \in I_c^-, \\ t_n(x) & \text{if } x \in I_c^+ \setminus \{c\}. \end{cases}$$
Then $\{h_n\}_{n=1}^{\infty}$ is a sequence in $S(I)$ that generates $u$, so $u \in U(I)$. □

7.3.4 Lebesgue Integrable Functions
The “pathology” of the set $U(I)$ described in Remark 729 (if a function $u$ belongs to $U(I)$ it is not necessarily true that $-u \in U(I)$) leads us to introduce a larger set of functions, defined below.

Definition 731 Let $I$ be a general interval in $\mathbb{R}$. A function $f : I \to \mathbb{R}$ is said to be Lebesgue integrable if $f = u_1 - u_2$, where $u_1, u_2 \in U(I)$. In such a case, we define the real number
$$\int_I f := \int_I u_1 - \int_I u_2, \tag{7.55}$$
and we call this value the Lebesgue integral of $f$ on $I$. The set of all Lebesgue integrable functions on $I$ will be denoted by $L(I)$.

It is necessary, for coherence, to prove that $\int_I f$ is independent of the particular representation of $f$ as a difference of two functions in $U(I)$. This is done in the next result.

Proposition 732 Let $u_1, u_2, v_1, v_2$ be elements in $U(I)$ such that $u_1 - u_2 = v_1 - v_2$ (a.e.). Then $\int_I u_1 - \int_I u_2 = \int_I v_1 - \int_I v_2$.

Proof Note that $u_1 + v_2 = u_2 + v_1$ (a.e.). Apply then (i) and (iv) in Proposition 728 to get $\int_I u_1 + \int_I v_2 = \int_I u_2 + \int_I v_1$, and the conclusion follows. □

Remark 733 Let $I$ be a general interval in $\mathbb{R}$. Let $u_1$ and $u_2$ be two real-valued functions defined on $I$. Assume that $u_1(x) = u_2(x)$ (a.e.) in $I$, and that $u_1 \in U(I)$.
It follows from Definition 725 that $u_2 \in U(I)$, too, and from (iv) in Proposition 728 that $\int_I u_1 = \int_I u_2$. As a consequence of this and Proposition 732, the same is true for Lebesgue integrable functions and their integrals: If $f_1$ and $f_2$ are two real-valued functions defined on $I$, $f_1(x) = f_2(x)$ (a.e.) on $I$, and $f_1$ is Lebesgue integrable on $I$, then $f_2$ is Lebesgue integrable on $I$ and $\int_I f_1 = \int_I f_2$ (see also (iv) in Proposition 734 below). ®

The set $L(I)$ is a subset of the set of all measurable functions defined on $I$. Indeed, each upper function was seen to be measurable, and so the conclusion follows from (b) in Proposition 402. It is implicit in Definition 731 that the integral of a Lebesgue integrable function is always finite. This is in contrast to some other approaches, where the integral of a (nonnegative) function is allowed to be $+\infty$ (although the term “integrable” applied to a function means that the integrals should be finite), see, for example, [Wi69, pp. 22, 32]. Note that $U(I) \subset L(I)$. The example in Remark 729 shows that this inclusion is strict.

The set $L(I)$ is quite stable. This is the content of the next result. Recall that if $f$ is a real-valued function defined on $I$, then $f^+ := \max\{f, 0\}$, and $f^- := \max\{-f, 0\}$.

Proposition 734 Let $I$ be a general interval in $\mathbb{R}$. Let $f, g \in L(I)$. Then
(i) If $\alpha, \beta \in \mathbb{R}$, then $\alpha f + \beta g \in L(I)$, and $\int_I (\alpha f + \beta g) = \alpha \int_I f + \beta \int_I g$.
(ii) If $f \ge 0$ (a.e.), then $\int_I f \ge 0$.
(iii) If $f \ge g$ (a.e.), then $\int_I f \ge \int_I g$.
(iv) If $f = g$ (a.e.), then $\int_I f = \int_I g$.
(v) The functions $f^+$, $f^-$, $|f|$, $\max\{f, g\}$, and $\min\{f, g\}$ are in $L(I)$. Moreover, $|\int_I f| \le \int_I |f|$.

Proof (i) follows from the corresponding property of the class $U(I)$ (see (i) and (ii) in Proposition 728). (ii) follows from (iii) in Proposition 728. (iii) follows from (ii), and (iv) follows from (iii) (see also Proposition 732 above). (v) follows from the fact that, if $f = u - v$, where $u, v \in U(I)$, then
$$f^+ = \max\{u - v, 0\} = \max\{u, v\} - v.$$
It is then enough to apply Proposition 728 (v) to conclude that $f^+ \in L(I)$. Since $f^- = f^+ - f$, we have then $f^- \in L(I)$. Note, too, that $|f| = f^+ + f^-$, hence $|f| \in L(I)$. Since $-|f| \le f \le |f|$, we get $-\int_I |f| \le \int_I f \le \int_I |f|$, and so $|\int_I f| \le \int_I |f|$. Note, too, that $\max\{f, g\} = (1/2)(f + g + |f - g|)$, and $\min\{f, g\} = (1/2)(f + g - |f - g|)$, so $\max\{f, g\}$ and $\min\{f, g\}$ are in $L(I)$. □

Remark 735 The reader certainly detected the absence of a stability result on the product of two Lebesgue integrable functions in Proposition 734. The reason is that a result of this type is false: There are even Lebesgue integrable functions whose square is not Lebesgue integrable. For an example, see Remark 781. In Propositions 757 and 829 we shall give two positive results in this direction. See also Corollary 780 for the validity of the result in the Riemann case. ®

Proposition 736 Let $I$ be a general interval. Let $c \in I$. Put $I_c^- := I \cap (-\infty, c]$, and $I_c^+ := I \cap [c, +\infty)$.
(i) If $f \in L(I)$, then $f|_{I_c^-} \in L(I_c^-)$, and $f|_{I_c^+} \in L(I_c^+)$.
(ii) Let $f$ be a real-valued function on $I$. If $f|_{I_c^-} \in L(I_c^-)$ and $f|_{I_c^+} \in L(I_c^+)$, then $f \in L(I)$.

Proof The conclusion follows from Proposition 730. Indeed, for (i), if $f \in L(I)$, then $f = u - v$, where $u, v \in U(I)$. Since $f|_{I_c^-} = u|_{I_c^-} - v|_{I_c^-}$, the conclusion follows. The same argument applies to $f|_{I_c^+}$. The argument for proving (ii) is similar. □

Example 737 Dirichlet’s function introduced in Definition 296 and the function identically zero are equal (a.e.). It follows that Dirichlet’s function is Lebesgue integrable, and its integral is zero (see Remark 733). ♦

We conclude this subsection with an approximation result that will be used later.

Lemma 738 Let $I$ be a general interval in $\mathbb{R}$. Fix $f \in L(I)$ and $\varepsilon > 0$. Then
(i) there exist $u, v \in U(I)$ such that $f = u - v$, $v \ge 0$ (a.e.), and $\int_I v < \varepsilon$. […] $\varepsilon > 0$, there exists a continuous function $h \in C(I)$ such that $\int_I |f - h| < \varepsilon$.

Proof Find, by (ii) in Lemma 738, $s \in S(I)$ and $g \in L(I)$ such that $f = s + g$ and $\int_I |g| < \varepsilon/2$. It is not difficult to see (see Exercise 13.479) that there exists an element $h \in C(I)$ such that $\int_I |s - h| < \varepsilon/2$. Then
$$\int_I |f - h| \le \int_I |f - s| + \int_I |s - h| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
□

7.3.5 Convergence Theorems
The Monotone Convergence Theorem
This name usually refers to any result that ensures the existence (a.e.) and integrability of the limit of an increasing (decreasing) sequence
(in the pointwise sense, and understood almost everywhere) of integrable functions with a common upper (respectively, lower) bound for the integrals.

Remark 740 Theorems 741, 742, and 744 below are formulated for increasing sequences of functions. Corresponding results hold for decreasing sequences of functions. ®

The next four results are due to the Italian mathematician B. Levi.

Theorem 741 (Levi’s Monotone Convergence Theorem for the class $S(I)$) Let $I$ be a general interval in $\mathbb{R}$. Let $\{s_n\}_{n=1}^{\infty}$ be a sequence in $S(I)$ that satisfies the following two conditions:
(i) The sequence $\{s_n\}_{n=1}^{\infty}$ is increasing.³
(ii) The sequence $\{\int_I s_n\}_{n=1}^{\infty}$ is bounded above.
Then there exists an element $u \in U(I)$ such that $s_n \uparrow u$ (a.e.), and $\int_I s_n \uparrow \int_I u$.
Proof Since $s_1 \le s_2 \le \dots$, and $s_1$ is a step function, by adding if necessary a constant we may assume that $0 \le s_1(x)$ for all $x \in I$. We shall prove that the set
$$D := \{x \in I : s_n(x) \text{ does not converge as } n \to \infty\} \tag{7.56}$$
is null. Let $M$ be an upper bound for the sequence $\{\int_I s_n\}$. Fix $\varepsilon > 0$. Define another sequence $\{t_n\}_{n=1}^{\infty}$ of step functions on $I$ as
$$t_n(x) := \left\lfloor \frac{\varepsilon}{2M}\, s_n(x) \right\rfloor, \quad \text{for } x \in I \text{ and } n \in \mathbb{N}, \tag{7.57}$$
where $\lfloor \cdot \rfloor$ denotes the floor function (see Definition 39). It is obvious that the sequence $\{t_n\}_{n=1}^{\infty}$ is increasing and nonnegative. Put
$$D_n := \{x \in I : t_{n+1}(x) - t_n(x) \ge 1\}, \quad n \in \mathbb{N}. \tag{7.58}$$
Certainly,
$$D \subset \bigcup_{n=1}^{\infty} D_n. \tag{7.59}$$
Since, for each $n \in \mathbb{N}$, $t_{n+1} - t_n$ is a step function, the set $D_n$ is a finite union of intervals. Therefore, from the definition of the integral of a step function, and from the monotonicity of the integral on the class $S(I)$,
$$\lambda(D_n) = \int_I \chi_{D_n} \le \int_I (t_{n+1} - t_n).$$
It follows that, given $N \in \mathbb{N}$,
$$\sum_{n=1}^{N} \lambda(D_n) \le \sum_{n=1}^{N} \int_I (t_{n+1} - t_n) = \int_I t_{N+1} - \int_I t_1 \le \int_I t_{N+1} \le \frac{\varepsilon}{2M} \int_I s_{N+1} \le \frac{\varepsilon}{2} < \varepsilon.$$
This holds for every $N \in \mathbb{N}$, hence $\sum_{n=1}^{\infty} \lambda(D_n) \le \varepsilon$. It follows that $D$ is null, by the definition of a null set. Put $u(x) := \lim_{n\to\infty} s_n(x)$ for $x \in I \setminus D$, and $u(x) := 0$ for $x \in D$. Then $u \in U(I)$, and the statement follows from the definition of the class $U(I)$ and the integral of an element in this class. □

3 Since a step function $s$ is defined (see Definition 721) disregarding the values at the end points of the finite family of subintervals that define $s$ (and then $s$ can be redefined to be, say, 0 at those points without affecting the integral), we may omit the term (a.e.) here.

Theorem 742 (Levi’s Monotone Convergence Theorem for the class $U(I)$) Let $I$ be a general interval in $\mathbb{R}$. Let $\{u_n\}_{n=1}^{\infty}$ be a sequence in $U(I)$ that satisfies
(i) The sequence $\{u_n\}_{n=1}^{\infty}$ is increasing (a.e.).
(ii) The sequence $\{\int_I u_n\}_{n=1}^{\infty}$ is bounded above.
Then $\{u_n\}_{n=1}^{\infty}$ converges (a.e.) to an element $u \in U(I)$, and $\int_I u_n \uparrow \int_I u$.

Proof Assume that $\int_I u_n \le R$ for every $n \in \mathbb{N}$. For $n \in \mathbb{N}$, let $\{s_{n,m}\}_{m=1}^{\infty}$ be a sequence in $S(I)$ that generates $u_n$. For $m \in \mathbb{N}$, put $t_m := \max\{s_{n,m} : n = 1, 2, \dots, m\}$. It is obvious that $\{t_m\}_{m=1}^{\infty}$ is an increasing sequence in $S(I)$. Moreover, for $m \in \mathbb{N}$ and $n = 1, 2, \dots, m$,
$$\int_I s_{n,m} \le \int_I u_n \le \int_I u_m \le R,$$
hence $\int_I t_m \le R$ for all $m \in \mathbb{N}$. By Theorem 741, the sequence $\{t_m\}_{m=1}^{\infty}$ converges (a.e.) to a function $u \in U(I)$, and $\int_I t_m \to \int_I u$. Given $m \in \mathbb{N}$ and $n = 1, 2, \dots, m$, we have $s_{n,m} \le t_m \le u$ (a.e.). Fixing $n \in \mathbb{N}$ and letting $m \to \infty$, we get $u_n \le u$ (a.e.). Thus, the increasing sequence $\{u_n\}$ is bounded above (a.e.) by $u$, hence it converges (a.e.) to a function $v$ such that $v \le u$ (a.e.). Moreover, $t_m \le u_m$ (a.e.) for every $m \in \mathbb{N}$. Letting $m \to \infty$ we get then $u \le v$ (a.e.). Thus, the two functions $u$ and $v$ are equal (a.e.). In particular, the sequence $\{u_n\}_{n=1}^{\infty}$ is convergent (a.e.) to $u$, and $\int_I u_n \to \int_I u$. □
Theorem 743 (Levi’s Monotone Convergence Theorem for series in $L(I)$) Let $I$ be a general interval in $\mathbb{R}$. Let $\{g_n\}_{n=1}^{\infty}$ be a sequence in $L(I)$ that satisfies
(i) Each function $g_n$ is nonnegative (a.e.).
(ii) There exists $R > 0$ such that $\sum_{k=1}^{n} \int_I g_k \le R$ for every $n \in \mathbb{N}$.
Then there exists $g \in L(I)$ such that $\sum_{n=1}^{\infty} g_n = g$ (a.e.) and $\sum_{n=1}^{\infty} \int_I g_n = \int_I g$.

Proof By Lemma 738, we can write $g_n = u_n - v_n$, where $u_n, v_n \in U(I)$, $v_n \ge 0$ (a.e.), and $\int_I v_n < 1/2^n$, so $u_n = g_n + v_n \ge 0$ (a.e.), for every $n \in \mathbb{N}$. Put $U_N := \sum_{n=1}^{N} u_n$, for $N \in \mathbb{N}$. The sequence $\{U_N\}_{N=1}^{\infty}$ is an increasing sequence in $U(I)$, and
$$\int_I U_N = \sum_{n=1}^{N} \int_I u_n = \sum_{n=1}^{N} \int_I g_n + \sum_{n=1}^{N} \int_I v_n < R + 1, \quad \text{for all } N \in \mathbb{N}.$$
By Theorem 742, there exists $U \in U(I)$ such that $U_N \to U$ (a.e.), and $\int_I U_N \to \int_I U$. Analogously, put $V_N := \sum_{n=1}^{N} v_n$, for $N \in \mathbb{N}$. The sequence $\{V_N\}_{N=1}^{\infty}$ is an increasing sequence in $U(I)$, and
$$\int_I V_N = \sum_{n=1}^{N} \int_I v_n < 1, \quad \text{for all } N \in \mathbb{N}.$$
Again by Theorem 742, there exists $V \in U(I)$ such that $V_N \to V$ (a.e.), and $\int_I V_N \to \int_I V$. Note that $\sum_{n=1}^{N} g_n = U_N - V_N \to U - V$ (a.e.), and this means that $\sum_{n=1}^{\infty} g_n = U - V$ exists (a.e.). Then
$$\sum_{n=1}^{N} \int_I g_n = \int_I \sum_{n=1}^{N} g_n = \int_I (U_N - V_N) = \int_I U_N - \int_I V_N \xrightarrow[N\to\infty]{} \int_I U - \int_I V = \int_I (U - V) = \int_I \sum_{n=1}^{\infty} g_n.$$
It follows that
$$\sum_{n=1}^{\infty} \int_I g_n = \int_I \sum_{n=1}^{\infty} g_n.$$
□
Theorem 744 (Levi’s Monotone Convergence Theorem for sequences in $L(I)$) Let $I$ be a general interval in $\mathbb{R}$. Let $\{f_n\}_{n=1}^{\infty}$ be a sequence in $L(I)$ that satisfies
(i) The sequence $\{f_n\}_{n=1}^{\infty}$ is increasing (a.e.).
(ii) The sequence $\{\int_I f_n\}_{n=1}^{\infty}$ is bounded above.
Then $\{f_n\}_{n=1}^{\infty}$ converges (a.e.) to an element $f \in L(I)$, and $\int_I f_n \uparrow \int_I f$.

Proof Put $g_1 := f_1$, and $g_n := f_n - f_{n-1}$ for $n = 2, 3, \dots$ Apply Theorem 743 to the sequence $\{g_n\}_{n=2}^{\infty}$. □

For Corollary 745 below we adopt the following conventions: If $I$ is a general interval, and a measurable function $f : I \to [0, +\infty]$ is not Lebesgue integrable, we put $\int_I f = +\infty$. Recall the definition of $\liminf$ of a sequence $\{x_n\}_{n=1}^{\infty}$: $\liminf_{n\to\infty} x_n := \lim_{n\to\infty} \inf\{x_k : k \ge n\}$ (see Definition 137). We may extend this definition to cover the case of a sequence $\{x_n\}_{n=1}^{\infty}$ in $[0, +\infty]$: The symbol $+\infty$ is considered, as in Definition 31, to be greater than any real number, and we say that a sequence in $[0, +\infty]$ tends to $+\infty$ whenever there exists, for every $n \in \mathbb{N}$, an index $n_0 = n_0(n) \in \mathbb{N}$ such that $x_k \ge n$ for every $k \ge n_0$. Note then that $\liminf_{n\to\infty} x_n = +\infty$ whenever every subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ tends to $+\infty$.

The next result is due to the French mathematician P. Fatou.

Corollary 745 (Fatou’s lemma) Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of measurable functions $f_n : I \to [0, +\infty]$, where $I \subset \mathbb{R}$ is a general interval. Define a function $f : I \to [0, +\infty]$ by
$$f(x) := \liminf_{n\to\infty} f_n(x), \quad \text{for } x \in I. \tag{7.60}$$
Then $f$ is measurable and $\int_I f \le \liminf_{n\to\infty} \int_I f_n$.

Proof The measurability of $f$ is guaranteed by (c) in Proposition 402. Put
$$g_n(x) := \inf\{f_n(x), f_{n+1}(x), f_{n+2}(x), \dots\}, \quad \text{for } n \in \mathbb{N} \text{ and } x \in I. \tag{7.61}$$
Fig. 7.31 An example related to Fatou’s lemma (Remark 746)
The function $g_n$ is defined on $I$ and takes values in $[0, +\infty]$. Observe that the sequence $\{g_n\}_{n=1}^{\infty}$ is increasing and tends to $f$ by the definition of $\liminf$ (Definition 137) and its extension mentioned in the paragraph preceding the statement of the corollary. Note that $0 \le g_n(x) \le f_n(x) \le +\infty$ for all $n \in \mathbb{N}$ and all $x \in I$. It follows that $0 \le \int_I g_n \le \int_I f_n \le +\infty$ for all $n \in \mathbb{N}$, hence $0 \le L := \liminf_{n\to\infty} \int_I g_n \le \liminf_{n\to\infty} \int_I f_n \le +\infty$. Assume first that $L = +\infty$. Then there is nothing to prove. If, on the contrary, $(0 \le)\ L < +\infty$, we can find a subsequence $\{g_{n_k}\}_{k=1}^{\infty}$ of $\{g_n\}_{n=1}^{\infty}$ such that $\int_I g_{n_k} \le L + 1$ for all $k \in \mathbb{N}$ and $\lim_{k\to\infty} \int_I g_{n_k} = L$. The sequence $\{g_{n_k}\}_{k=1}^{\infty}$ is in $L(I)$. Theorem 744 implies that $\{g_{n_k}\}_{k=1}^{\infty}$ converges (a.e.) to a Lebesgue integrable function $g$, and $\int_I g_{n_k} \to \int_I g$ ($= L$). It is clear that $f = g$ (a.e.), hence $\int_I f = \int_I g = \lim_k \int_I g_{n_k} = L \le \liminf_{n\to\infty} \int_I f_n$. □

Remark 746 It is instructive to consider the following example; it shows that some requirement on the negative parts of the functions $f_n$ in Corollary 745 must be imposed to ensure its validity. Let $I := [1, +\infty)$, and put $f_n := (-1/n)\chi_{[n,2n]}$ for $n \in \mathbb{N}$ (see Fig. 7.31). These functions are Lebesgue integrable on $I$ (and their Lebesgue integral is $-1$ for each of them). It is clear that the function $f$ defined in (7.60) is the constant 0 function, whose Lebesgue integral is 0, and so the conclusion of Corollary 745 does not hold. At a certain moment the proof above breaks down for this example: The problem here is that, for every $n \in \mathbb{N}$, the function $g_n$ defined in (7.61) is not Lebesgue integrable (according to the convention above, its integral should be $-\infty$ by considering $-g_n$). The dichotomy $L = +\infty$ or $L \in \mathbb{R}$ in the proof does not hold in this case. ®

Corollary 747 Let $f$ be a nonnegative function on a general interval $I$ and let $\{I_n\}_{n=1}^{\infty}$ be an increasing sequence of intervals whose union is $I$. Assume that, for each $n \in \mathbb{N}$, $f|_{I_n}$ is Lebesgue integrable, and that the sequence $\{\int_{I_n} f\}_{n=1}^{\infty}$ is bounded. Then $f$ is Lebesgue integrable and $\int_I f = \lim_{n\to\infty} \int_{I_n} f$.

Proof For $n \in \mathbb{N}$, put $f_n := f\chi_{I_n}$, and apply Fatou’s Lemma (Corollary 745) to the sequence $\{f_n\}_{n=1}^{\infty}$. □

Example 748 The function $f(x) := 1/\sqrt{x}$, $x \ne 0$, is Lebesgue integrable on $(0, 1)$ (for a picture of the graph see Fig. 7.25). Indeed, consider the increasing sequence $\{I_n := [1/n, 1 - 1/n]\}_{n=3}^{\infty}$ of intervals; its union is $(0, 1)$. Since $f|_{I_n}$ is continuous on $I_n$, it is an upper function (Proposition 727), hence Lebesgue integrable. Moreover, $\int_{I_n} (1/\sqrt{x})\,dx = 2(\sqrt{1 - 1/n} - \sqrt{1/n})$ ($\le 2$) for all $n \in \mathbb{N}$, $n \ge 3$, as follows from the same Proposition 727. The Lebesgue integrability of $f$ on $(0, 1)$, as well as the fact that $\int_{(0,1)} (1/\sqrt{x})\,dx = 2$, follow from Corollary 747. Observe that the integral exists as an improper Riemann integral, see Example 715.1. ♦
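The values $\int_{I_n} (1/\sqrt{x})\,dx$ in Example 748 can be tabulated to watch them increase towards $2$. This is a minimal sketch that simply evaluates the closed form stated in the example; the function name `integral_on_I_n` is hypothetical.

```python
import math

def integral_on_I_n(n):
    """Closed form of the integral of 1/sqrt(x) over I_n = [1/n, 1 - 1/n], n >= 3."""
    return 2.0 * (math.sqrt(1.0 - 1.0 / n) - math.sqrt(1.0 / n))

values = [integral_on_I_n(n) for n in (3, 10, 100, 10_000, 1_000_000)]
print(values)
```

The sequence is increasing and bounded by $2$, which is exactly the hypothesis Corollary 747 needs; convergence is slow, since the gap to $2$ behaves like $2/\sqrt{n}$.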
Corollary 749 Let $F$ be a real-valued function of bounded variation defined on a closed and bounded interval $[a, b]$. Then $F'$ (a function that exists (a.e.) according to Corollary 433) belongs to $L[a, b]$, and $\int_a^b F' \le |F(b) - F(a)|$.

Proof In view of the Jordan Theorem 432, it is enough to prove the result for an increasing function $F$. Given $n \in \mathbb{N}$, define $\varphi_n : [a, b] \to \mathbb{R}$ by
$$\varphi_n(x) := \frac{F(x + 1/n) - F(x)}{1/n}, \quad x \in [a, b] \tag{7.62}$$
(in order to define $\varphi_n$ properly, we may assume that $F$ was extended to $[a, b + 1]$ by letting $F(x) := F(b)$ for all $x \in [b, b + 1]$). Since $F$ has a derivative (a.e.) in $(a, b)$ (see Lebesgue’s Theorem 424), the sequence $\{\varphi_n\}_{n=1}^{\infty}$ converges to $F'$ (a.e.). The fact that $F$ is increasing on $[a, b]$ implies that $F$ is Riemann integrable on $[a, b]$ (see Proposition 677), in particular measurable on $[a, b]$. Thus, each function $\varphi_n$ is also measurable, and it follows then that $F'$, defined (a.e.) on $[a, b]$, is a measurable function. Moreover, for each $n \in \mathbb{N}$ the function $\varphi_n$ is nonnegative, due to the fact that $F$ is increasing, and we have
$$\begin{aligned}
\int_a^b \varphi_n &= n \int_a^b F\!\left(x + \frac{1}{n}\right) dx - n \int_a^b F(x)\,dx \\
&= n \int_{a+1/n}^{b+1/n} F(x)\,dx - n \int_a^b F(x)\,dx = n \int_b^{b+1/n} F(b)\,dx - n \int_a^{a+1/n} F(x)\,dx \\
&\le F(b) - n \int_a^{a+1/n} F(a)\,dx = F(b) - F(a).
\end{aligned}$$
It is enough to apply now Fatou’s Lemma 745 to conclude that $F'$ is Lebesgue integrable on $[a, b]$, and that $\int_a^b F' \le F(b) - F(a)$. □

The Dominated Convergence Theorem
The following theorem is one of the cornerstones of the whole Lebesgue theory of integration.

Theorem 750 (Lebesgue Dominated Convergence Theorem) Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of real-valued measurable functions defined on a general interval $I \subset \mathbb{R}$. Assume that $f_n \to f$ (a.e.), and that there exists a function $\Phi \in L(I)$ such that $|f_n| \le \Phi$ (a.e.). Then $f \in L(I)$, and $\int_I f_n \to \int_I f$.

Remark 751 The function $\Phi$ in Theorem 750 is usually called an integrable majorant. ®

Proof of Theorem 750 We shall prove that there exist two sequences $\{g_n\}_{n=1}^{\infty}$ and $\{h_n\}_{n=1}^{\infty}$ in $L(I)$ such that $\{g_n\}_{n=1}^{\infty}$ is increasing, $\{h_n\}_{n=1}^{\infty}$ is decreasing, $g_n \le f_n \le h_n$ for all $n \in \mathbb{N}$, and, finally, $g_n \to f$ (a.e.) and $h_n \to f$ (a.e.). Once this is done, we will
have $\int_I g_n \le \int_I h_1$ for every $n \in \mathbb{N}$, so Theorem 744 will give that $f \in L(I)$ and $\int_I g_n \to \int_I f$. We will have, too, that $\int_I h_n \ge \int_I g_1$ for every $n \in \mathbb{N}$, so the version of Theorem 744 for decreasing sequences will give that $\int_I h_n \to \int_I f$. Together with the fact that $\int_I g_n \le \int_I f_n \le \int_I h_n$, this will show that $\int_I f_n \to \int_I f$.
To construct such $h_n$ put $h_n := \sup\{f_n, f_{n+1}, f_{n+2}, \ldots\}$, for $n \in \mathbb{N}$. This is well defined (a.e.), due to the fact that $f_n \to f$ (a.e.). Obviously, the sequence $\{h_n\}_{n=1}^{\infty}$ is decreasing. Given $\varepsilon > 0$ and $x \in I$ such that $f_n(x) \to f(x)$, there exists $N \in \mathbb{N}$ such that $f(x) - \varepsilon < f_n(x) < f(x) + \varepsilon$ for every $n \ge N$. Then, $f(x) - \varepsilon < h_n(x) \le f(x) + \varepsilon$ for every $n \ge N$. This shows that $h_n \to f$ (a.e.). It is not obvious that $h_n \in L(I)$. To prove this fact, put, for $n \in \mathbb{N}$ and $m \ge n$, $h_{n,m} := \max\{f_n, f_{n+1}, \ldots, f_m\}$. Then $\{h_{n,m}\}_{m=n}^{\infty}$ is an increasing sequence in $L(I)$ that converges (a.e.) to $h_n$. Moreover,
$\int_I h_{n,m} \le \int_I |h_{n,m}| \le \int_I \Phi$, for every $m \ge n$.
Use Theorem 744 to conclude that $h_n \in L(I)$. Since $\int_I h_n \ge -\int_I \Phi$, we get $f \in L(I)$ and $\int_I h_n \to \int_I f$.
The procedure for defining $\{g_n\}_{n=1}^{\infty}$ is similar: Put, for $n \in \mathbb{N}$, $g_n := \inf\{f_n, f_{n+1}, f_{n+2}, \ldots\}$. This is well defined (a.e.), due to the fact that $f_n \to f$ (a.e.). Obviously, the sequence $\{g_n\}_{n=1}^{\infty}$ is increasing. Given $\varepsilon > 0$ and $x \in I$ such that $f_n(x) \to f(x)$, there exists $N \in \mathbb{N}$ such that $f(x) - \varepsilon < f_n(x) < f(x) + \varepsilon$ for every $n \ge N$. Then, $f(x) - \varepsilon \le g_n(x) \le f(x) + \varepsilon$ for every $n \ge N$. This shows that $g_n \to f$ (a.e.). To show that $g_n \in L(I)$, put, for $n \in \mathbb{N}$ and $m \ge n$, $g_{n,m} := \min\{f_n, f_{n+1}, \ldots, f_m\}$. Then $\{g_{n,m}\}_{m=n}^{\infty}$ is a decreasing sequence in $L(I)$ that converges (a.e.) to $g_n$. Moreover, we have $\int_I g_{n,m} \ge -\int_I \Phi$ for every $m \ge n$. Use Theorem 744 to conclude that $g_n \in L(I)$. Since $\int_I g_n \le \int_I \Phi$, we get $\int_I g_n \to \int_I f$. □
Remark 752 1.
An easy consequence of Theorem 750 is that if $I$ is a general interval, $f : I \to \mathbb{R}$ and $\Phi : I \to \mathbb{R}$ are two measurable functions such that $|f| \le \Phi$ (a.e.), and $\Phi \in L(I)$, then $f \in L(I)$ (apply the theorem to the constant sequence $f_n := f$). In particular, for a measurable real-valued function $f$ defined on a general interval $I$ we have $f \in L(I)$ if and only if $|f| \in L(I)$. Indeed, $f \in L(I) \Rightarrow |f| \in L(I)$ is in Proposition 734 (v), while the converse implication follows from the Lebesgue Dominated Convergence Theorem 750.
2. Note that it follows from Theorem 750 that a real-valued measurable and bounded function defined on a bounded interval $I$ is Lebesgue integrable, due to the fact that any constant function is clearly Lebesgue integrable on $I$.
3. Note that Theorem 750 does not hold for Riemann integration. Examples are provided in Remark 696 and Exercises 13.434 and 13.435. Another simple example is the following: Let $\{r_n\}_{n=1}^{\infty}$ be the sequence of all rational numbers in $[0, 1]$. For $n \in \mathbb{N}$, let $f_n$ be the characteristic function of the set $\bigcup_{i=1}^{n} \{r_i\}$. Then $f_n$ is Riemann integrable for each $n \in \mathbb{N}$, $|f_n| \le 1$ on $[0, 1]$, and $f_n \to 1 - D$ pointwise on $[0, 1]$, where $D$ is the Dirichlet function (see Definition 296), whose restriction to $[0, 1]$ is not Riemann integrable (see Remark 673.2). ®
A useful consequence of Theorem 750 is the following result. Compare with Corollary 747.
Corollary 753 Assume that a real-valued function $f$ is Lebesgue integrable on a general interval $I$, and let $\{I_n\}_{n=1}^{\infty}$ be an increasing sequence of intervals that cover $I$. Then $\int_{I_n} f \to \int_I f$ as $n \to \infty$.
Proof The sequence $\{f_n\}_{n=1}^{\infty}$, where $f_n := f\chi_{I_n}$ for all $n \in \mathbb{N}$, converges pointwise to $f$ on $I$, and $|f_n| \le |f|$ for all $n \in \mathbb{N}$. Since $|f| \in L(I)$, the result follows from Theorem 750. □
An example of the use of Corollary 753 is the following.
Example 754 The function $f(x) := 1/x$, defined on $(0, 1)$, is not Lebesgue integrable there. Otherwise, Corollary 753 would give that $\int_{(0,1)} f_n \to \int_{(0,1)} f$, where
$f_n(x) := \begin{cases} 1/x & \text{if } x \in (1/n, 1), \\ 0 & \text{if } x \in (0, 1/n], \end{cases}$
for $n \in \mathbb{N}$. The function $f_n$ is bounded and continuous on $(0, 1)$ except at the point $1/n$, hence Riemann integrable there, and $\int_{(0,1)} f_n = \int_{1/n}^{1} (1/x)\,dx = -\ln(1/n) = \ln n \to +\infty$ as $n \to \infty$, a contradiction. ♦
Another straightforward consequence of Theorem 750 is the following version for series of functions, whose proof is left to the reader.
Theorem 755 Let $I$ be a general interval. Let $\{g_n\}_{n=1}^{\infty}$ be a sequence in $L(I)$ that satisfies the following conditions:
(i) $\sum_{n=1}^{\infty} g_n$ converges (a.e.) to a function $g : I \to \mathbb{R}$.
(ii) There exists a function $G \in L(I)$ such that $\left|\sum_{n=1}^{N} g_n\right| \le G$ (a.e.) for every $N \in \mathbb{N}$.
Then $g \in L(I)$, the series $\sum_{n=1}^{\infty} \int_I g_n$ converges, and $\sum_{n=1}^{\infty} \int_I g_n = \int_I g$.
Example 756 Let $S$ be the Lebesgue singular function introduced in Definition 398. We mentioned in Example 698 that this function is continuous, hence, by Proposition 674, $S \in \mathcal{R}[0, 1]$. There we computed its integral on the interval $[0, 1]$ to show that $\int_0^1 S = \frac{1}{2}$. Here, we modify the argument by using a series of step functions instead of a sequence of continuous functions to approximate $S$ (this time (a.e.)). Recall again the sets $I_{n,k}$, $k = 1, 2, \ldots, 2^{n-1}$, $n \in \mathbb{N}$, in the construction of $S$. For each $n$ define the function
$s_n = \sum_{m=1}^{n} \sum_{k=1}^{2^{m-1}} \dfrac{2k - 1}{2^m}\,\chi_{I_{m,k}},$
where $\chi_A$ denotes the characteristic function of a set $A$. Observe that $S = \sum_{m=1}^{\infty} \sum_{k=1}^{2^{m-1}} \frac{2k-1}{2^m}\,\chi_{I_{m,k}}$ (a.e.) (in fact, the series converges pointwise on $[0, 1] \setminus C$, where $C$ is the Cantor ternary set), and so $\{s_n\}_{n=1}^{\infty}$ is a sequence of functions that converges to $S$ (a.e.). Moreover, $0 \le s_n \le 1$ on $[0, 1]$. The function $s_n$ is a step function, so we obtain easily that
$\int_0^1 s_n = \sum_{m=1}^{n} \dfrac{4^{m-1}}{6^m},$
and thus, by Theorem 750,
$\int_0^1 S = \sum_{m=1}^{\infty} \dfrac{4^{m-1}}{6^m} = \dfrac{1}{2}.$ ♦
Fig. 7.32 The three first step functions $s_n$ in Example 756
The following stability result concerning the class of Lebesgue integrable functions may be useful. It is again a consequence of Theorem 750. Note that the result no longer holds for two Lebesgue integrable functions without extra assumptions. We shall give an example of this situation in Remark 781 below.
Proposition 757 Let $I$ be a general interval in $\mathbb{R}$, let $f$ and $g$ be Lebesgue integrable real-valued functions defined on $I$, and assume that $g$ is, moreover, bounded. Then $f \cdot g$ is Lebesgue integrable.
Proof The function $f \cdot g$ is measurable on $I$ (see (b.3) in Proposition 402). By assumption, there exists a constant $M$ such that $|g(x)| \le M$ (a.e.). Then $|f(x) \cdot g(x)| \le M|f(x)|$ (a.e.), and the function $M|f(x)|$ is Lebesgue integrable (see (v) in Proposition 734). Thus, the result follows from Remark 752.1. □
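To see why the boundedness of $g$ cannot simply be dropped in Proposition 757, one can combine the functions already met in Examples 748 and 754. The following is only a sketch; the choice $f = g$ is ours, made for illustration.

```latex
% Assumed illustration: take f = g on I = (0,1).
\[
  f(x) = g(x) := x^{-1/2}, \quad x \in (0,1),
  \qquad
  \int_{(0,1)} f = \int_{(0,1)} g = 2 \quad \text{(Example 748)},
\]
\[
  (f \cdot g)(x) = x^{-1}, \qquad f \cdot g \notin L(0,1) \quad \text{(Example 754)}.
\]
```

Both factors are integrable but unbounded near $0$, and their product is not integrable.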
7.3.6 Measure and Integration
The Lebesgue theory of integration can be understood as a part of the Lebesgue measure theory or vice versa. If the emphasis is put on integration, then it is enough to say that a subset $A$ of $\mathbb{R}$ is Lebesgue measurable with finite Lebesgue measure $\lambda(A)$ if its characteristic function is Lebesgue integrable and its Lebesgue integral is $\lambda(A)$. If, on the contrary, the main matter is measure, the Lebesgue integral of a characteristic function (and then of a step function) can be defined as the measure of its support set, and then, for an "arbitrary" function, via approximation by step functions.
Measure and the Integral of a Characteristic Function
Recall that the characteristic function $\chi_A$ of a subset $A$ of $\mathbb{R}$ is the function that takes value 1 on $A$ and 0 on its complement.
Proposition 758 A subset $A$ of $\mathbb{R}$ is Lebesgue measurable if and only if its characteristic function $\chi_A$ is Lebesgue measurable.
Proof Assume that $A$ is measurable. Note that
$\chi_A^{-1}(-\infty, r) = \begin{cases} \mathbb{R} & \text{if } r > 1, \\ A^c & \text{if } 0 < r \le 1, \text{ and} \\ \emptyset & \text{if } r \le 0. \end{cases}$
It follows that $\chi_A$ is Lebesgue measurable. Assume now that $\chi_A$ is Lebesgue measurable. Then $A^c = \chi_A^{-1}(-\infty, 1)$ is Lebesgue measurable, hence $A$ is also Lebesgue measurable. □
Proposition 759 Let $A \subset \mathbb{R}$ be a Lebesgue measurable subset of $\mathbb{R}$ such that $\lambda(A) < \infty$. Then $\chi_A \in L(\mathbb{R})$ and $\lambda(A) = \int_{\mathbb{R}} \chi_A$.
Proof Note first that the result is obvious for a bounded interval in $\mathbb{R}$. Second, we shall prove the result for a nonempty open set $U$ in $\mathbb{R}$ such that $\lambda(U) < +\infty$. There exists a sequence $\{I_n\}_{n=1}^{\infty}$ of pairwise disjoint open intervals in $\mathbb{R}$ such that $U = \bigcup_{n=1}^{\infty} I_n$. Put $S_n := \bigcup_{k=1}^{n} I_k$ for $n \in \mathbb{N}$. The function $s_n := \chi_{S_n}$ is a step function, hence $\int_{\mathbb{R}} s_n = \sum_{k=1}^{n} \lambda(I_k) = \lambda(S_n)$ $(\le \lambda(U) < +\infty)$. Moreover, the sequence $\{s_n\}_{n=1}^{\infty}$ is increasing and converges pointwise to $\chi_U$. This shows that $\chi_U$ is an upper function (in particular, Lebesgue integrable), and that $\int_{\mathbb{R}} s_n \to \int_{\mathbb{R}} \chi_U$. Since $\int_{\mathbb{R}} s_n = \lambda(S_n) \to \lambda(U)$, we get the conclusion. Assume now that $K$ is a nonempty compact subset of $\mathbb{R}$.
Find an open bounded interval $I$ in $\mathbb{R}$ such that $K \subset I$. The set $I \setminus K$ is open, and $\lambda(I \setminus K) < +\infty$. Obviously, $\chi_K = \chi_I - \chi_{I \setminus K}$. From the second part of the proof, we get $\int_{\mathbb{R}} \chi_{I \setminus K} = \lambda(I \setminus K)$. Since $\int_{\mathbb{R}} \chi_I = \lambda(I)$, the result follows. Finally, let $A$ be an arbitrary nonempty measurable set in $\mathbb{R}$ such that $\lambda(A) < +\infty$. Given $n \in \mathbb{N}$ there exists a compact set $K_n$ and an open set $U_n$ such that
$K_n \subset A \subset U_n$, and $\lambda(U_n \setminus K_n) < 1/n$ (Proposition 271). Note that $\chi_{K_n} \to \chi_A$ (a.e.), and that, from the second part of the proof, the function $\chi_{U_1}$ is a Lebesgue integrable function that dominates the sequence $\{\chi_{K_n}\}_{n=1}^{\infty}$. By Theorem 750 we get that $\chi_A \in L(\mathbb{R})$ and $\int_{\mathbb{R}} \chi_{K_n} \to \int_{\mathbb{R}} \chi_A$. The result follows from the fact, already proved, that $\int_{\mathbb{R}} \chi_{K_n} = \lambda(K_n)$ for every $n \in \mathbb{N}$, and that $\lambda(K_n) \to \lambda(A)$. □
Corollary 760 Let $A$ be a subset of $\mathbb{R}$. Then, if $\chi_A$ is integrable and $\int_{\mathbb{R}} \chi_A = 0$, we have that $\lambda(A) = 0$.
Proof By Proposition 758, the set $A$ is measurable. Given $n \in \mathbb{N}$, put $A_n := A \cap [-n, n]$. This is a measurable subset of $\mathbb{R}$, hence, again by Proposition 758, $\chi_{A_n}$ is measurable. Moreover, $\lambda(A_n) \le 2n$, hence, by Proposition 759, $\lambda(A_n) = \int_{\mathbb{R}} \chi_{A_n}$. Note that $0 \le \chi_{A_n} \le \chi_A$, hence $\int_{\mathbb{R}} \chi_{A_n} = 0$. Since $A = \bigcup_{n=1}^{\infty} A_n$, we get $\lambda(A) = 0$. □
The following result completes Remark 772.
Corollary 761 If $I$ is a general interval in $\mathbb{R}$, and $f : I \to \mathbb{R}$ is an (a.e.) nonnegative Lebesgue integrable function such that $\int_I f = 0$, then $f = 0$ (a.e.).
Proof Given $n \in \mathbb{N}$, let $A_n := \{x \in I : f(x) \ge 1/n\}$, a measurable subset of $\mathbb{R}$. Note that $(1/n)\chi_{A_n} \le f$ (a.e.), hence $0 \le (1/n)\int_I \chi_{A_n} \le \int_I f = 0$, hence $\int_I \chi_{A_n} = 0$. By Corollary 760, we get $\lambda(A_n) = 0$. Since $\{x \in I : f(x) > 0\} = \bigcup_{n=1}^{\infty} A_n$ and $f$ is nonnegative (a.e.), the conclusion follows. □
The Integral of a Function on an Arbitrary Measurable Set
If $E \subset \mathbb{R}$ is a measurable set, we may extend the whole theory of integration to the case of measurable functions $f : E \to \mathbb{R}$ (we say that a function $f : E \to \mathbb{R}$ is (Lebesgue) measurable if $\{x \in E : f(x) < r\}$ is measurable for each $r \in \mathbb{R}$). Assume that $f : E \to \mathbb{R}$ is measurable. Define a function $\hat{f} : \mathbb{R} \to \mathbb{R}$ by letting
$\hat{f}(x) := \begin{cases} f(x), & \text{if } x \in E, \\ 0, & \text{otherwise.} \end{cases}$ (7.63)
Clearly, $\hat{f}$ is a measurable function. We say that $f$ is integrable on $E$ whenever $\hat{f} \in L(\mathbb{R})$, and we define in this case $\int_E f := \int_{\mathbb{R}} \hat{f}$. The class of all integrable functions $f : E \to \mathbb{R}$ is denoted by $L(E)$.
It is a simple consequence of Remark 752.1 that if $f \in L(E)$ and $F$ is a measurable subset of $E$, then $f|_F \in L(F)$ (indeed, if $\hat{g}$ denotes the extension to $\mathbb{R}$ of a function $g$ defined on a measurable set as in (7.63), we have $\hat{f} \in L(\mathbb{R})$, hence $|\hat{f}| \in L(\mathbb{R})$ by (v) in Proposition 734. Now, it is clear that $|\widehat{f|_F}| \le |\hat{f}|$, hence $\widehat{f|_F} \in L(\mathbb{R})$, and we conclude finally that $f|_F \in L(F)$). Note that, in this case, $\int_F f|_F = \int_E (f\chi_F)$. We write then $\int_F f$ instead of the more cumbersome $\int_F f|_F$. A consequence of the Lebesgue Dominated Convergence Theorem 750 and its variant, Theorem 755, is the following result.
Fig. 7.33 The n- and m-regularization of a function f (proof of Theorem 763)
Proposition 762 Let $f \in L(E)$, where $E$ is a measurable subset of $\mathbb{R}$. Let $\{E_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint measurable subsets of $E$ such that $\bigcup_{n=1}^{\infty} E_n = E$. Then $\sum_{n=1}^{\infty} \int_{E_n} f = \int_E f$.
Proof For $n \in \mathbb{N}$ put $f_n := \hat{f}\chi_{E_n}$, where $\hat{f}$ is the extension of $f$ to $\mathbb{R}$ that vanishes on $E^c$. The sequence $\{f_n\}_{n=1}^{\infty}$ consists of measurable functions. Obviously, $\sum_{n=1}^{\infty} f_n$ converges to $\hat{f}$, and $\left|\sum_{n=1}^{N} f_n\right| \le |\hat{f}|$ for all $N \in \mathbb{N}$. Then, by Theorem 755, $\sum_{n=1}^{\infty} \int_{\mathbb{R}} f_n = \int_{\mathbb{R}} \hat{f}$, i.e., $\sum_{n=1}^{\infty} \int_{E_n} f = \int_E f$. □
Another consequence of the Lebesgue Dominated Convergence Theorem 750 is the following important result. For a different proof see Exercise 13.487.
Theorem 763 Let $f \in L(S)$, where $S$ is a nonempty measurable subset of $\mathbb{R}$. Then, given $\varepsilon > 0$ there exists $\delta > 0$ such that $\int_E |f| < \varepsilon$ whenever $E$ is a measurable subset of $S$ with $\lambda(E) < \delta$.
Proof Given $n \in \mathbb{N}$, let $f_n$ be the "n-regularization" of $f$, i.e., (see Fig. 7.33)
$f_n(x) := \begin{cases} f(x) & \text{if } |f(x)| \le n, \\ 0 & \text{otherwise.} \end{cases}$
The sequence $\{|f - f_n|\}$ converges to 0 (a.e.), and $|f - f_n| \le |f| + |f_n| \le 2|f|$. Thus, the Lebesgue Dominated Convergence Theorem 750 ensures that $\int_S |f - f_n| \to 0$, so given $\varepsilon$ there exists $n \in \mathbb{N}$ such that $\int_S |f - f_n| < \varepsilon/2$. Put $\delta := \varepsilon/(2n)$, and let $E$ be a measurable subset of $S$ such that $\lambda(E) < \delta$. Note that $\int_E |f - f_n| \le \int_S |f - f_n|$. Then
$\int_E |f| \le \int_E |f - f_n| + \int_E |f_n| < \dfrac{\varepsilon}{2} + \lambda(E) \cdot n < \dfrac{\varepsilon}{2} + \dfrac{\varepsilon}{2} = \varepsilon,$
and this proves the result. □
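The choice of $\delta$ in the proof of Theorem 763 can be followed on a concrete function. The sketch below assumes $S = (0,1)$ and $f(x) = x^{-1/2}$ (our choice, taken from Example 748); the final comparison with the set $(0, t)$ uses only that $f$ is decreasing.

```latex
% Assumed example: S = (0,1), f(x) = x^{-1/2}.
% Step 1: the n-regularization keeps f where f <= n, i.e. where x >= 1/n^2.
\[
  \int_S |f - f_n| = \int_0^{1/n^2} x^{-1/2}\,dx = \frac{2}{n} < \frac{\varepsilon}{2}
  \iff n > \frac{4}{\varepsilon}.
\]
% Step 2: the proof then takes
\[
  \delta := \frac{\varepsilon}{2n} < \frac{\varepsilon^{2}}{8}.
\]
% Direct check: since f is decreasing, among measurable sets E with
% \lambda(E) = t the integral of f is largest for E = (0,t).
\[
  \int_E |f| \le \int_0^{\lambda(E)} x^{-1/2}\,dx = 2\sqrt{\lambda(E)} < \varepsilon
  \quad\text{whenever}\quad \lambda(E) < \frac{\varepsilon^{2}}{4}.
\]
```

So for this particular $f$ the $\delta$ produced by the proof is of the right order in $\varepsilon$, though not the largest admissible one.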
The Measure Defined by a Measurable Function Recall that M denotes the σ -algebra of all (Lebesgue) measurable subsets of R (see Definition 245 and Proposition 252).
Proposition 764 Let $f : \mathbb{R} \to \mathbb{R}$ be a nonnegative measurable function. Put $\nu(E) := \int_E f$, for any measurable subset $E$ of $\mathbb{R}$ such that $f \in L(E)$, and put $\nu(E) = +\infty$ if $f \notin L(E)$. Then $\nu$ is a measure on $\mathcal{M}$.
Proof Obviously, $\nu(E) \in [0, +\infty]$ for all $E \in \mathcal{M}$. Let $\{E_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint elements in $\mathcal{M}$ and let $E := \bigcup_{n=1}^{\infty} E_n$. We shall show
$\sum_{n=1}^{\infty} \nu(E_n) = \nu(E).$ (7.64)
This follows from Proposition 762 if $f|_E \in L(E)$. If $f|_E \notin L(E)$, we have two possibilities: either $f|_{E_n} \notin L(E_n)$ for some $n \in \mathbb{N}$ (and then obviously $\sum_{n=1}^{\infty} \nu(E_n) = \nu(E) = +\infty$) or, on the contrary, $f|_{E_n} \in L(E_n)$ for all $n \in \mathbb{N}$. In this last case, if $\sum_{n=1}^{\infty} \int_{E_n} f < +\infty$, Theorem 743 concludes that $f|_E \in L(E)$, a contradiction. This shows that in fact $\sum_{n=1}^{\infty} \int_{E_n} f = +\infty$ and, again, $\sum_{n=1}^{\infty} \nu(E_n) = \nu(E) = +\infty$. □
The measure $\nu$ defined in Proposition 764 has a special property: It is absolutely continuous with respect to the Lebesgue measure $\lambda$ (see Definition 765 below). This is in fact what was proved in Theorem 763. In Theorem 766 below, we reformulate this result in the language of absolute continuity of measures.
Definition 765 A measure $\nu$ on $\mathcal{M}$ is said to be absolutely continuous with respect to the Lebesgue measure $\lambda$ if for every $\varepsilon > 0$ there exists $\delta > 0$ such that, if $E \in \mathcal{M}$ and $\lambda(E) < \delta$, then $\nu(E) < \varepsilon$.
Theorem 766 Let $f \in L(S)$, where $S$ is a nonempty measurable subset of $\mathbb{R}$. If $\nu$ is the measure defined by $|f|$ on $S$ as in Proposition 764, then $\nu$ is absolutely continuous with respect to $\lambda$.
The next proposition gives a characterization of absolutely continuous measures in terms of null sets.
Proposition 767 Let $\nu$ be a measure on $\mathcal{M}$. Then $\nu$ is absolutely continuous with respect to the Lebesgue measure $\lambda$ if and only if $\nu(E) = 0$ whenever $E \in \mathcal{M}$ satisfies $\lambda(E) = 0$.
Proof The condition is obviously necessary. To prove sufficiency, assume that there exists $\varepsilon > 0$ with the property that for all $n \in \mathbb{N}$ we can find $E_n \in \mathcal{M}$ such that $\lambda(E_n) < 2^{-n}$ and $\nu(E_n) \ge \varepsilon$. Put $E := \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k$. Then $\lambda\left(\bigcup_{k=n}^{\infty} E_k\right) \le \sum_{k=n}^{\infty} \lambda(E_k) < \sum_{k=n}^{\infty} 2^{-k} = 2^{-n+1}$ for all $n \in \mathbb{N}$, hence $\lambda(E) = 0$. However, it follows from Lemma 256 (proved there for the Lebesgue outer measure, although the result holds for any measure $\nu$ on $\mathcal{M}$) that $\nu(E) \ge \varepsilon$, a contradiction. □
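A measure that is not absolutely continuous with respect to $\lambda$ helps to see what Definition 765 excludes. The following sketch uses the point mass at $0$, which is our own illustrative choice and does not appear in the text above.

```latex
% Assumed example: the point mass at 0 on the sigma-algebra \mathcal{M}.
\[
  \nu(E) := \begin{cases} 1 & \text{if } 0 \in E, \\ 0 & \text{if } 0 \notin E, \end{cases}
  \qquad E \in \mathcal{M}.
\]
% The null-set condition of Proposition 767 fails:
\[
  \lambda(\{0\}) = 0 \quad\text{but}\quad \nu(\{0\}) = 1 .
\]
% Directly in terms of Definition 765: for \varepsilon := 1 and every \delta > 0,
% the set E := \{0\} satisfies \lambda(E) = 0 < \delta and \nu(E) = 1 \ge \varepsilon.
```

In particular, this $\nu$ cannot arise from an integrable function as in Theorem 766.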
7.3.7 Functions Defined by Integrals
In the preceding subsection we saw how to associate, to every nonnegative measurable function $f$ on $\mathbb{R}$, a measure on $\mathbb{R}$ that has a particular continuity property (absolute continuity) with respect to the Lebesgue measure. If the function $f$ is
Lebesgue integrable (not necessarily nonnegative) we may naturally define a function $F$ in a similar way as was done in formula (7.9) for a Riemann integrable function $f$. Precisely, assume that $f \in L(I)$, where $I := [a, b]$ is a closed and bounded interval in $\mathbb{R}$. Observe that $f|_{[a,x]} \in L[a, x]$ for $x \in [a, b]$ (see the paragraph immediately preceding Proposition 762). Put then
$F(x) := \int_{[a,x]} f$, for $x \in [a, b]$, (7.65)
and call the function $F$ the indefinite integral of $f$. A central issue in the theory of the integral is to establish the relationship between the two functions $f$ and $F$ appearing in (7.65). This was mentioned and treated in Sect. 7.1.4 for the Riemann integral. For the Lebesgue integral, the discussion will be done in Sect. 7.3.10. Here, we start by proving the following result that is connected to Proposition 680.
Proposition 768 Let $[a, b]$ be a closed and bounded interval in $\mathbb{R}$, and let $f \in L[a, b]$. Then, the function $F : [a, b] \to \mathbb{R}$ defined in (7.65) is absolutely continuous (hence continuous) on $[a, b]$.
Proof Fix $\varepsilon > 0$ and, for the function $|f|$, find $\delta > 0$ according to Theorem 763. Let $\{[x_i, y_i]\}_{i=1}^{n}$ be a finite family of nonoverlapping intervals in $I$ such that $\sum_{i=1}^{n} (y_i - x_i) < \delta$. Then
$\sum_{i=1}^{n} |F(y_i) - F(x_i)| = \sum_{i=1}^{n} \left|\int_{[x_i,y_i]} f\right| \le \sum_{i=1}^{n} \int_{[x_i,y_i]} |f| = \sum_{i=1}^{n} \nu([x_i, y_i]) = \nu\left(\bigcup_{i=1}^{n} [x_i, y_i]\right) < \varepsilon,$ (7.66)
where $\nu$ is the measure defined by $|f|$ as in Proposition 764. Indeed, the last inequality in (7.66) follows from the fact that $\lambda\left(\bigcup_{i=1}^{n} [x_i, y_i]\right) = \sum_{i=1}^{n} (y_i - x_i) < \delta$ (see Theorem 763). □
Remark 769 Comparing Proposition 768 to Proposition 680, note that in general we cannot get in Proposition 768 the function $F$ to be Lipschitz, as the example of $f(x) := 1/\sqrt{x}$ on the open interval $(0, 1)$ shows (define $f$ to be 0 at 0 and 1). Indeed, the corresponding function $F$ is $2\sqrt{x}$, and this function is not Lipschitz on $[0, 1]$ (see Example 4.5.8.1). ®
We shall prove in Proposition 770 below that if the indefinite integral $F$ of a real-valued Lebesgue integrable function $f$ on a given closed and bounded interval $[a, b]$ in $\mathbb{R}$ vanishes, then $f = 0$ (a.e.) on $[a, b]$. This result will turn out to be crucial in deriving properties of $f$ from properties of $F$. Compare it with Corollary 761. For a slightly different proof of this proposition, see Exercise 13.486.
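Returning to the example in Remark 769, the failure of the Lipschitz condition can be made explicit by computing the indefinite integral (7.65) for $f(x) = x^{-1/2}$ on $[0,1]$. This is a short worked check added for illustration.

```latex
% Indefinite integral of f(t) = t^{-1/2} on [0,1] (f(0) := 0 does not affect it).
\[
  F(x) = \int_{[0,x]} t^{-1/2}\,dt = 2\sqrt{x}, \qquad x \in [0,1].
\]
% Difference quotient at the origin:
\[
  \frac{F(h) - F(0)}{h - 0} = \frac{2\sqrt{h}}{h} = \frac{2}{\sqrt{h}} \longrightarrow +\infty
  \quad\text{as } h \to 0^{+}.
\]
% Hence no constant K satisfies |F(h) - F(0)| <= K h for all h in (0,1].
```

The function $F$ is nevertheless absolutely continuous on $[0,1]$, as Proposition 768 guarantees.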
Proposition 770 Let $f \in L[a, b]$, where $[a, b]$ is a given closed and bounded interval in $\mathbb{R}$. Assume that the function $F$ given by (7.65) is identically 0. Then $f = 0$ (a.e.) on $[a, b]$.
Proof Given $n \in \mathbb{N}$, put $E_n := \{x \in (a, b) : f(x) \ge 1/n\}$. Fix $\varepsilon > 0$ and $n \in \mathbb{N}$. We can apply Theorem 763 to find a number $\delta > 0$ such that if $E$ is a measurable subset of $(a, b)$ such that $\lambda(E) < \delta$ then $\int_E |f| < \varepsilon/n$. The set $E_n$ is a measurable subset of $(a, b)$ so, from Proposition 266, we can find an open superset $O_n$ of $E_n$ such that $\lambda(O_n \setminus E_n) < \delta$. Without loss of generality we may assume that $O_n \subset (a, b)$. Apply now Proposition 99 to find a pairwise disjoint sequence $\{I_j\}_{j=1}^{\infty}$ of open subintervals of $(a, b)$ such that $O_n = \bigcup_{j=1}^{\infty} I_j$. From Proposition 762 it follows that $\int_{O_n} f = \sum_{j=1}^{\infty} \int_{I_j} f$. Observe that, due to the fact that $\int_a^x f = 0$ for every $x \in (a, b)$, we get $\int_I f = 0$ for every subinterval $I$ of $(a, b)$, again a consequence of Proposition 762. These two facts together show that $\int_{O_n} f = 0$. We have
$-\int_{O_n \setminus E_n} f = \int_{O_n} f - \int_{O_n \setminus E_n} f = \int_{E_n} f \ge \dfrac{1}{n}\lambda(E_n).$
Thus,
$\dfrac{1}{n}\lambda(E_n) \le \int_{O_n \setminus E_n} |f| < \varepsilon/n,$
due to the fact that $\lambda(O_n \setminus E_n) < \delta$. This shows that $\lambda(E_n) < \varepsilon$. Since $\varepsilon > 0$ was chosen arbitrarily, we get $\lambda(E_n) = 0$. This happens for every $n \in \mathbb{N}$, so finally we get $\lambda\left(\bigcup_{n=1}^{\infty} E_n\right) = 0$, hence the function $f$ is less than or equal to 0 (a.e.). A similar argument shows that $f$ is greater than or equal to 0 (a.e.), and from these two statements the conclusion follows. □
7.3.8 The Space L1
Remark 733 shows that if $f$ and $g$ are two measurable real-valued functions such that $f(x) = g(x)$ (a.e.) on a general interval $I$, and $f \in L(I)$, then $g \in L(I)$, too, and (iv) in Proposition 734 shows that $\int_I f = \int_I g$. It is natural then to identify, in the framework of the Lebesgue integration theory, functions that are equal (a.e.) on a given general interval $I$. We may even assume that the functions are defined only (a.e.) on $I$. Clearly, the following is an equivalence relation on the set $L(I)$ of all Lebesgue integrable real-valued functions on $I$ (see the definition in Sect. 12.3): $f \sim g$ whenever $f(x) = g(x)$ (a.e.) on $I$. Thus, $L(I)$ can be split into mutually disjoint classes, two functions being in the same class whenever they are equal (a.e.). Note, too, that the equivalence relation $\sim$ introduced above is "compatible" with the two operations, sum and product by a scalar, that turn $L(I)$ into a vector space, in the sense that the class of a sum depends only on the classes of the summands, and the class of the product of a scalar by a function depends only on the class of the function. It turns out
that the space L1(I) of all equivalence classes in $L(I)$ is a vector space, where the sum of two classes and the product of a scalar by a class are defined by taking a "representative" of each class, performing the corresponding operation, and giving as output the class where the result is. Now, the real-valued function defined on L1(I) by $\|\mathbf{f}\|_1 := \int_I |f|$, where $f \in \mathbf{f}$ (the result being independent of the particular representative $f$ of $\mathbf{f}$), is a norm, in the sense of Definition 895 below. Indeed, all items (i), (iii), (iv), and the "if" part of (ii) there, follow from Proposition 734, while the "only if" part of (ii) follows from Proposition 770. The space (L1(I), $\|\cdot\|_1$) becomes, thus, a normed space (see again Definition 895). Minor changes allow for replacing the general interval $I$ by a measurable subset $E$ of $\mathbb{R}$ in the two paragraphs above. We collect the resulting concept in a definition for future reference.
Definition 771 Let $E$ be a measurable subset of $\mathbb{R}$. Two Lebesgue integrable functions defined (a.e.) on $E$ are said to be in the same class whenever they coincide (a.e.) on $E$. The integral on $E$ of a class is defined to be the integral on $E$ of any of its representatives. The set L1(E) consists of all classes of Lebesgue integrable real-valued functions defined (a.e.) on $E$; it is a vector space when the operations are naturally defined by taking representatives of the classes, and it is a normed space when endowed with the norm $\|\mathbf{f}\|_1 := \int_E |f|$, where $\mathbf{f} \in$ L1(E) and $f$ is a representative of $\mathbf{f}$. If $E$ is a closed and bounded interval $[a, b]$, the subset of L1[a, b] consisting of all classes having a Riemann integrable representative is denoted by R1[a, b].
Remark 772 As a particular instance, any real-valued function $f$ defined on a measurable subset $E$ of $\mathbb{R}$ and vanishing (a.e.) is Lebesgue integrable on $E$, and its Lebesgue integral is 0. The class to which $f$ belongs is denoted by $\mathbf{0}$.
For example, if $D$ is the Dirichlet function (Definition 296), the function $(1 - D)|_{[0,1]}$ is the characteristic function of the set $\mathbb{Q} \cap [0, 1]$, so it vanishes (a.e.), due to the fact that $\mathbb{Q}$ has Lebesgue measure zero, and accordingly, $(1 - D)|_{[0,1]}$ is Lebesgue integrable on $[0, 1]$ and belongs to the class $\mathbf{0}$ of L1[0, 1]. Its Lebesgue integral on $[0, 1]$ is thus 0. ®
Remark 773 Two bounded real-valued functions defined on a closed and bounded interval $[a, b]$ may be identical (a.e.), one of them Riemann integrable on $[a, b]$, but not the other. In other terms, an element in R1[a, b] may contain functions that are not Riemann integrable. As an example, consider the function 0 and the function $(1 - D)|_{[0,1]}$ on $[0, 1]$ (see Remark 772). The first one is obviously Riemann integrable, but not the second one (see Remark 673.2). However, both belong to the same class. ®
In order to avoid cumbersome notation and complicated statements, and unless stated otherwise, in the framework of Lebesgue theory we shall speak of functions instead of classes of functions. Exercise 13.485 shows that the normed space (L1[0, 1], $\|\cdot\|_1$) is complete, i.e., it is a Banach space (see Definition 896 below). The proof can be adapted to the
space (L1(E), $\|\cdot\|_1$), where $E$ is an arbitrary measurable subset of $\mathbb{R}$. The space (L1[0, 1], $\|\cdot\|_1$) is separable. This was proved in Example 586.16. The proof can be adapted to cover the case of L1(E) for any measurable subset $E$ of $\mathbb{R}$.
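Two small computations may help to fix the meaning of the norm $\|\cdot\|_1$ and of the passage to classes. The pair of functions in the first line is our own choice for illustration; the second line only restates Remark 772 in terms of the norm.

```latex
% Assumed example on [0,1]: f(x) = x and g(x) = x^2, so that f >= g there.
\[
  \|f - g\|_1 = \int_0^1 |x - x^{2}|\,dx = \int_0^1 (x - x^{2})\,dx
  = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}.
\]
% A function that is not identically zero but represents the zero class (Remark 772):
\[
  \bigl\| (1 - D)|_{[0,1]} \bigr\|_1 = \int_{[0,1]} \chi_{\mathbb{Q} \cap [0,1]} = \lambda(\mathbb{Q} \cap [0,1]) = 0 .
\]
```

The second line shows why $\int_I |f|$ is a norm only after functions equal (a.e.) have been identified.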
7.3.9 Riemann versus Lebesgue Integrability, and the Riemann–Lebesgue Criterion for Riemann Integrability
The Lebesgue theory of integration is, indeed, an extension of the Riemann theory. Not only does the Lebesgue theory allow for integrating unbounded functions and/or functions defined on unbounded intervals (or other measurable sets), but also every real-valued bounded Riemann integrable function defined on an interval $[a, b]$ is Lebesgue integrable, and the value of the integral in the Riemann sense coincides with the value of the integral in the Lebesgue sense. This is the content of Theorem 783. This extension is a true one, since there are real-valued bounded Lebesgue integrable functions defined on bounded intervals that are not Riemann integrable (see Remark 772).
The Riemann–Lebesgue Criterion for Riemann Integrability
The ultimate way to characterize the Riemann integrable functions among the bounded real-valued Lebesgue integrable functions defined on a bounded interval $[a, b]$ is given by Theorem 777 below. Essentially, it depends on how big (in the measure sense) the set of points of discontinuity of the given function is. A tool that will be used in the proof is the concept of oscillation of a function, introduced in Definition 700.
Lemma 774 Let $f : [a, b] \to \mathbb{R}$ be a bounded function. Then, for any $\varepsilon > 0$, the set $D_\varepsilon := \{x \in [a, b] : \omega(f, x) \ge \varepsilon\}$ is closed.
Proof Let $x_0 \in [a, b]$ be such that $\omega(f, x_0) < \varepsilon$. Then we can find $\delta > 0$ such that $\omega(f, B(x_0, \delta) \cap [a, b]) < \varepsilon$. Obviously, every $x \in B(x_0, \delta) \cap [a, b]$ satisfies $\omega(f, x) < \varepsilon$. This shows that $[a, b] \setminus D_\varepsilon$ is open in $[a, b]$, hence $D_\varepsilon$ is closed. □
Remark 775 Note that, if $f$ is a real-valued bounded function on $[a, b]$, the set
$D := \{x \in [a, b] : f \text{ is discontinuous at } x\}$ (7.67)
is the countable union of closed sets (i.e., it is an $F_\sigma$ subset of $[a, b]$). Indeed, put, for $n \in \mathbb{N}$,
$D_n := \{x \in [a, b] : \omega(f, x) \ge 1/n\}.$ (7.68)
The set $D_n$ is closed, by Lemma 774, and, obviously,
$D = \bigcup_{n=1}^{\infty} D_n.$ (7.69) ®
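The sets $D_n$ of Remark 775 can be written down explicitly for a function with a single jump. The sketch below assumes the usual meaning of the oscillation at a point, $\omega(f, x) = \inf_{\delta > 0} \omega(f, B(x, \delta) \cap [a, b])$ with $\omega(f, A) = \sup f(A) - \inf f(A)$; Definition 700 is not reproduced here, so this reading is an assumption, and the function $f$ is our own choice.

```latex
% Assumed example: f := \chi_{[1/2,1]} on [a,b] = [0,1].
\[
  \omega(f, x) = \begin{cases} 1 & \text{if } x = 1/2, \\ 0 & \text{if } x \in [0,1] \setminus \{1/2\}. \end{cases}
\]
% Hence, for every n in N (since 1 >= 1/n),
\[
  D_n = \{x \in [0,1] : \omega(f, x) \ge 1/n\} = \{1/2\},
  \qquad
  D = \bigcup_{n=1}^{\infty} D_n = \{1/2\}.
\]
```

Each $D_n$ is closed, in agreement with Lemma 774, and $D$ is a null set.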
The following lemma, used in the proof of Theorem 777 below, is of independent interest.
Lemma 776 Let $f$ be a real-valued bounded function on an interval $[a, b]$. Let $\varepsilon > 0$. Assume that $\omega(f, x) < \varepsilon$ for every $x \in [a, b]$. Then, there exists $\delta > 0$ such that, if $J \subset [a, b]$ is a closed interval such that $\lambda(J) < \delta$, then $\omega(f, J) < \varepsilon$.
Proof For each $x \in [a, b]$ we may find $\delta(x) > 0$ such that $\omega(f, B(x, \delta(x))) < \varepsilon$. The family $\{B(x, \delta(x)/2) : x \in [a, b]\}$ is an open cover of $[a, b]$. Thus, there exists a finite subcover $\{B(x_i, \delta(x_i)/2) : i = 1, 2, \ldots, n\}$. Put $\delta := \min\{\delta(x_i)/2 : i = 1, 2, \ldots, n\}$. Assume now that $J \subset [a, b]$ is a closed interval such that $\lambda(J) < \delta$. Certainly, $J$ intersects some $B(x_i, \delta(x_i)/2)$, so $J \subset B(x_i, \delta(x_i))$. This implies that $\omega(f, J) < \varepsilon$. □
Now we can formulate the announced criterion for Riemann integrability (Theorem 777 below). It is implicit in the work of B. Riemann, and explicit in H. Lebesgue's PhD dissertation [Le02] and in his Memoir [Le04], where references to the work of Riemann are made. It is known in the literature as the Riemann–Lebesgue criterion for Riemann integrability.
Theorem 777 (Riemann–Lebesgue) Let $f$ be a real-valued bounded function defined on $[a, b]$. Then $f \in \mathcal{R}[a, b]$ if and only if $D$ is a null set, where $D$ is the set of all points of $[a, b]$ where $f$ is discontinuous.
Proof Note, first, that $D$ is an $F_\sigma$-set, see Remark 775; in particular, $D$ is Lebesgue measurable. Assume first that $\lambda(D) > 0$. Since $D = \bigcup_{n=1}^{\infty} D_n$ (where $D_n$ is the subset of $D$ defined in Eq. (7.68) for $n \in \mathbb{N}$), there exists $N \in \mathbb{N}$ such that $\lambda(D_N) > 0$ (see Proposition 247). Fix $\varepsilon$ such that $\lambda(D_N) > \varepsilon > 0$. It follows from the definition of Lebesgue outer measure that given any countable family $\{I_n\}_{n=1}^{\infty}$ of open bounded intervals that covers $D_N$, we must have $\sum_{n=1}^{\infty} \lambda(I_n) > \varepsilon$. Let $P := \{a = x_0 < x_1 < \ldots < x_n = b\}$ be an arbitrary partition of $[a, b]$. Then $U(f, P) - L(f, P) = S_1 + S_2$, where
$S_1 := \sum_{i=1,\ (x_{i-1},x_i) \cap D_N \ne \emptyset}^{n} (M_i - m_i)\lambda(\Delta_i),$ and $S_2 := \sum_{i=1,\ (x_{i-1},x_i) \cap D_N = \emptyset}^{n} (M_i - m_i)\lambda(\Delta_i),$ (7.70)
and, as usual, $M_i := \sup\{f(x) : x \in \Delta_i\}$, $m_i := \inf\{f(x) : x \in \Delta_i\}$, and $\Delta_i := [x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$. The intervals in $S_1$ cover the set $D_N \setminus P$, again a set with Lebesgue measure greater than $\varepsilon > 0$, hence $\sum_{i=1,\ (x_{i-1},x_i) \cap D_N \ne \emptyset}^{n} \lambda(\Delta_i) > \varepsilon$. Moreover, for those intervals, $M_i - m_i \ge 1/N$, since each of them contains an element in $D_N$. All together, we get $S_1 \ge \varepsilon/N$, hence $U(f, P) - L(f, P) \ge \varepsilon/N$ for every partition $P$ of $[a, b]$. This shows that $f \notin \mathcal{R}[a, b]$.
Assume now that $\lambda(D) = 0$. Then, $\lambda(D_n) = 0$ for all $n \in \mathbb{N}$. Fix $n \in \mathbb{N}$. We can find a countable cover of $D_n$ by open (relative to $[a, b]$) intervals such that the
sum of their lengths is less than $1/n$. Since, by Lemma 774, $D_n$ is a closed subset of $[a, b]$, it is compact, so this cover has a finite subcover. The union of the elements in this finite subcover is an open (relative to $[a, b]$) set $A_n$. Then, $B_n := [a, b] \setminus A_n$ is a finite union of closed subintervals of $[a, b]$. Let $I$ be a typical subinterval of $B_n$. At each $x \in I$ we have $\omega(f, x) < 1/n$. Use Lemma 776 to find $\delta(n) > 0$ such that if $J$ is a subinterval of $I$ with $\lambda(J) < \delta(n)$, then $\omega(f, J) < 1/n$. Therefore, each such interval $I$ has a partition into subintervals $J$ with $\omega(f, J) < 1/n$. Put together all the end points of the intervals so obtained (the ones covering $D_n$ and those partitioning $B_n$) to form a partition $P := \{a = x_0 < x_1 < \ldots < x_r = b\}$ of $[a, b]$. As before, $U(f, P) - L(f, P) = S_1 + S_2$, where
$S_1 := \sum_{i=1,\ (x_{i-1},x_i) \cap D_n \ne \emptyset}^{r} (M_i - m_i)\lambda(\Delta_i),$ and $S_2 := \sum_{i=1,\ (x_{i-1},x_i) \cap D_n = \emptyset}^{r} (M_i - m_i)\lambda(\Delta_i).$ (7.71)
Now, if $M := \sup\{f(x) : x \in [a, b]\}$ and $m := \inf\{f(x) : x \in [a, b]\}$, observe that $S_1 \le (M - m)/n$, while in each of the subintervals in $S_2$, the oscillation of $f$ is less than $1/n$, hence $S_2 \le (b - a)/n$. This shows that $U(f, P) - L(f, P) < (M - m)/n + (b - a)/n$. Since $n \in \mathbb{N}$ was arbitrary, it follows from Proposition 664 that $f \in \mathcal{R}[a, b]$. □
Some Consequences of the Riemann–Lebesgue Criterion, and Some Examples
A straightforward consequence of Theorem 777 is the following useful result.
Proposition 778 Let $[a, b]$ be a closed and bounded interval in $\mathbb{R}$. Let $f \in \mathcal{R}[a, b]$. If $\varphi$ is a real-valued continuous function on an interval $[c, d]$ such that $f([a, b]) \subset [c, d]$, then $\varphi \circ f \in \mathcal{R}[a, b]$.
Proof Note first that the function $\varphi$ is bounded on $[c, d]$, and so is $\varphi \circ f$. Then, due to Theorem 777, the function $f$ is continuous at the points of a set $C \subset [a, b]$ such that $\lambda(D) = 0$, where $D := [a, b] \setminus C$. Since $\varphi$ is continuous, $\varphi \circ f$ is continuous at each point $c \in C$, by Proposition 329. It is enough now to use again Theorem 777. □
Remark 779 1. Note that Proposition 778 fails if $\varphi$ is assumed only to be Riemann integrable. Indeed, let $\varphi$ be the characteristic function of the one point set $\{0\}$ in $[0, 1]$, i.e.,
$\varphi(x) = \begin{cases} 1, & \text{if } x = 0, \\ 0, & \text{if } 0 < x \le 1. \end{cases}$
Let $R : [0, 1] \to [0, 1]$ be the Riemann function (see Definition 700) defined by 1 at the endpoints of the interval $[0, 1]$. We showed in Example 679 that
[Fig. 7.34 A sketch of the three first functions in Remark 735.2]
R ∈ R[0, 1]. It is easy to see that ϕ ∈ R[0, 1]. However, ϕ ◦ R is the Dirichlet function on [0, 1] (Definition 296), which is not Riemann integrable, as was proved in Remark 673.2.
2. It is natural to ask whether a result similar to Proposition 778 holds reversing the order in the composition. The result is false in general. We slightly modify an example in [Lu99]: Let ϕ : [0, 1] → [0, 1] be the pointwise sum of the sequence of real-valued functions $\{\varphi_n\}_{n=1}^{\infty}$ defined on [0, 1] in the following way (see Fig. 7.34; check the construction of the Cantor ternary set C+ of positive measure in Sect. 3.1.5 and fix there, for definiteness, p = 1/2; the removed open intervals are called, again, $I_{i,j}$): The function ϕ1 is continuous on [0, 1], is positive on $I_{1,1}$, with maximum value 1/2 there, and is zero on [0, 1] \ $I_{1,1}$. The function ϕ2 is continuous on [0, 1], is positive on $I_{1,2} \cup I_{2,2}$, has maximum value $1/2^2$ there, and is zero on [0, 1] \ ($I_{1,2} \cup I_{2,2}$). Continue this way. Observe that the series $\sum_{n=1}^{\infty} \varphi_n$ is uniformly convergent by the Weierstrass M-test (Theorem 473), hence the sum ϕ is a continuous function on [0, 1]. Let f : [0, 1] → [0, 1] be defined by
$$f(x) := \begin{cases} 0, & \text{if } x = 0,\\ 1, & \text{otherwise.} \end{cases}$$
The function f is clearly Riemann integrable. Note that f ◦ ϕ is 0 at the points of the set C+ and is 1 at the remaining points in [0, 1]. Since C+ is perfect (i.e., all points in C+ are accumulation points of C+) and it does not contain any nondegenerate interval, the set of points of discontinuity of f ◦ ϕ is C+, a set that has Lebesgue measure 1/2. Thus, this function is not Riemann integrable by Theorem 777.
3. Proposition 778 is no longer true for Lebesgue integration. As an example, consider the function $f(x) := x^{-1/2}$ for x ∈ I, where I := (0, 1], and let $\varphi(x) := x^2$ for x ∈ R. Observe that f is in L(I) (see Example 748), while ϕ ◦ f is the function $x^{-1}$ on I, and this last function is not in L(I) (see Example 754).
This proves, too, that the square of a Lebesgue integrable function may fail to be Lebesgue integrable—incidentally showing that the product of two Lebesgue integrable functions may fail to be Lebesgue integrable. Another example is presented in Remark 781 below. In Proposition 757, we gave a positive result, under extra
hypotheses, in this direction, and in Corollary 780 below we shall provide a second proof of the stability result on the product of two functions for the class of Riemann integrable functions. ®
Let us see how Theorem 777 provides an alternative proof of (vi) in Proposition 671.
Corollary 780 Let [a, b] be a closed and bounded interval in R. Let f and g be functions in R[a, b]. Then f · g ∈ R[a, b].
Proof Note first that the set of points of continuity of f · g contains the intersection of the sets of points of continuity of f and g. Use then Theorem 777. □
Remark 781 Another example illustrating that the square of a Lebesgue integrable function may fail to be Lebesgue integrable, this time a function defined on an unbounded interval, is the following: consider $f := \sum_{n=2}^{\infty} n\,\chi_{[n,\,n+1/n^3]}$, a real-valued function defined on [2, +∞). Note that f is Lebesgue integrable on [2, +∞) (this can be proved by using Levi's monotone convergence Theorem 744, since the integral of the step function $s_n := \sum_{k=2}^{n} k\,\chi_{[k,\,k+1/k^3]}$ is $\sum_{k=2}^{n} 1/k^2$, which is less than $\sum_{k=2}^{\infty} 1/k^2$, for n = 3, 4, . . .). However, the function $f^2$ is not Lebesgue integrable on [2, +∞), since the integral of $s_n^2$ is $\sum_{k=2}^{n} (1/k)$ for n = 3, 4, . . . ®
Remark 782 Some previous results in the theory of the Riemann integral can be deduced from the criterion for Riemann integrability (Theorem 777). We saw one instance in Corollary 780 above. Let us collect some others to appreciate the power of Theorem 777 and its consequence, Proposition 778; we include also further results that may be derived from it.
1. Real-valued continuous functions on closed and bounded intervals are Riemann integrable. This was proved in Proposition 674, and it follows immediately from Theorem 777.
2. Every real-valued monotone function f : [a, b] → R is Riemann integrable (see Proposition 677); this is also a consequence of Theorem 777.
Indeed, the set of discontinuities of a monotone function is countable (Proposition 397), hence it has Lebesgue measure zero (Corollary 237; see also Proposition 247).
3. If f ∈ R[a, b], then |f| ∈ R[a, b]. This was proved in (iv) in Proposition 671. It is also a consequence (by using Proposition 778) of the trivial fact that x ↦ |x| is a continuous function on R.
4. If f ∈ R[a, b] and g : [a, b] → R differs from f at a finite subset S of [a, b], then g ∈ R[a, b]. This was proved in Proposition 672, together with the fact that $\int_a^b f = \int_a^b g$ (by the way, this provides another proof of Corollary 675). The first part can be easily deduced from Theorem 777. That the result does not hold in general if S is supposed to be countably infinite was shown in Remark 673.2 by using the Dirichlet function D, which is not Riemann integrable on [0, 1]. Observe, too, that this follows from Theorem 777, since the Dirichlet function is discontinuous everywhere (see Example 318.3), hence D = [0, 1] (we use the notation in Theorem 777). Of course, D belongs to L[0, 1] (see Remark 772), and its Lebesgue integral on [0, 1] is 1.
5. The Riemann function, introduced in Definition 700, is Riemann integrable in [0, 1]. This was shown in Example 679, and it follows also from Theorem 777, since the set of points of discontinuity (the rational points in (0, 1)) is null (see Example 380).
6. The number of Riemann integrable functions is huge. For example, if C denotes the Cantor set of measure 0 and the function f is defined on [0, 1] to be 0 on the complement of C (a set relatively open in [0, 1] and having measure 1) and an arbitrary bounded function on C, then f is continuous at all points in the complement of C, thus Riemann integrable on [0, 1] by the Riemann criterion.
7. On the other hand, if C+ denotes the Cantor set of positive measure (see Sect. 3.1.5), then the characteristic function of C+ is in the Baire class 1 (see Exercise 13.302) and is not Riemann integrable, as it is discontinuous at every point of C+ and we can use Theorem 777. ®
Lebesgue's Theory (Properly) Extends Riemann's Theory
Riemann integration theory applies to bounded functions defined on a closed and bounded interval [a, b] in R (we assume a < b). By the words “proper extension” here we have in mind two different things:
• First, that every Riemann integrable function f on [a, b] is Lebesgue integrable, and that in this case the Riemann integral of f equals its Lebesgue integral (this is the content of Theorem 783 below), and that there are bounded functions on [a, b] that are Lebesgue but not Riemann integrable (examples were already furnished; we mention one again in Example 784.1 below).
• Second, that when the set R[a, b] is seen as a subset of L1[a, b] (and then we are identifying (a.e.) equal functions) the inclusion is proper (this will be shown in Examples 784.2 and 784.3 below). The subset of L1[a, b] consisting of classes that contain Riemann integrable functions in [a, b] was denoted by R1[a, b] in Definition 771.
In Proposition 785 below, we shall make this situation precise by proving that R1[a, b] is dense in (L1[a, b], ‖·‖1), where ‖·‖1 is the integral norm (see Definition 771).
Theorem 783 Let [a, b] ⊂ R be a closed and bounded interval in R. Then, every Riemann integrable function f on [a, b] is Lebesgue integrable there, and the value of the integral of f on [a, b] in Riemann's sense equals the value of the integral of f on [a, b] in Lebesgue's sense.
Proof Let f ∈ R[a, b]. Then f is a bounded function, so there exist m, M ∈ R such that m ≤ f(x) ≤ M for every x ∈ [a, b]. Given n ∈ N, there exists, according to Proposition 666, a partition Pn of [a, b] such that U(f, Pn) − L(f, Pn) < 1/n and the cut points of Pn are equally spaced in [a, b]. Without loss of generality we may assume that this spacing is less than 1/n. Put sn (Sn) for the step function associated to L(f, Pn) (respectively, to U(f, Pn)). Precisely (see Fig. 7.5), if Pn := {a = x0 < x1 < . . . < xN = b}, mi := inf{f(x) : x ∈ [xi−1, xi]}, and Mi := sup{f(x) : x ∈
[xi−1, xi]}, for i = 1, . . ., N, then
$$s_n = \sum_{i=1}^{N} m_i\,\chi_{(x_{i-1},x_i)}, \quad\text{and}\quad S_n = \sum_{i=1}^{N} M_i\,\chi_{(x_{i-1},x_i)}. \tag{7.72}$$
Observe that sn and Sn vanish on the cut points of the partition. The values on those cut points are irrelevant, since the arguments below are made modulo (a.e.). We have m ≤ sn ≤ f ≤ Sn ≤ M (a.e.), $\int_{[a,b]} s_n = L(f, P_n)$, and $\int_{[a,b]} S_n = U(f, P_n)$. For n ∈ N, put pn := max{s1, s2, . . ., sn}, and qn := min{S1, S2, . . ., Sn}. Then $\{p_n\}_{n=1}^{\infty}$ ($\{q_n\}_{n=1}^{\infty}$) is an increasing (respectively, decreasing) sequence of step functions and we have
m ≤ sn ≤ pn ≤ f ≤ qn ≤ Sn ≤ M (a.e.).
(7.73)
Let C be the countable subset of [a, b] consisting of all cut points of the sequence of partitions $\{P_n\}_{n=1}^{\infty}$; hence, λ(C) = 0. Let D be the subset of [a, b] consisting of all discontinuity points of f. We know, from Theorem 777, that λ(D) = 0. Fix ε > 0 and take x0 ∈ [a, b] \ (D ∪ C). The function f is continuous at x0, so there exists δ := δ(x0, ε) > 0 such that |f(x) − f(x0)| < ε for every x ∈ [a, b] such that |x − x0| < δ. Since the length of each of the subintervals in Pn tends to 0 with n, we may find N ∈ N such that the interval in PN that contains x0 lies entirely in (x0 − δ, x0 + δ). In particular, f(x0) − ε ≤ sN(x0) ≤ SN(x0) ≤ f(x0) + ε, hence f(x0) − ε ≤ pN(x0) ≤ qN(x0) ≤ f(x0) + ε. Since $\{p_n\}_{n=1}^{\infty}$ is increasing and $\{q_n\}_{n=1}^{\infty}$ is decreasing, we get
f(x0) − ε ≤ pn(x0) ≤ qn(x0) ≤ f(x0) + ε, for every n ≥ N.
This shows that pn(x0) → f(x0), and qn(x0) → f(x0). Since λ(C ∪ D) = 0, we get pn → f (a.e.) and qn → f (a.e.). Moreover $\int_{[a,b]} p_n \le M(b-a)$ and $\int_{[a,b]} q_n \ge m(b-a)$ for every n ∈ N, hence $\{\int_{[a,b]} p_n\}_{n=1}^{\infty}$ (respectively, $\{\int_{[a,b]} q_n\}_{n=1}^{\infty}$) converges. This proves that f ∈ U([a, b]) (in particular, that f ∈ L([a, b])), and that
$$\int_{[a,b]} p_n \to \int_{[a,b]} f \quad \text{(integral in the Lebesgue sense).} \tag{7.74}$$
Note, too, that from (7.73) we have, for every n ∈ N,
$$m(b-a) \le \int_{[a,b]} s_n \le \int_{[a,b]} p_n \le \int_{[a,b]} f \le \int_{[a,b]} q_n \le \int_{[a,b]} S_n \le M(b-a), \tag{7.75}$$
and that the Riemann integral of f in [a, b] is the (unique) real number in between $L(f, P_n) = \int_{[a,b]} s_n$ and $U(f, P_n) = \int_{[a,b]} S_n$ for every n ∈ N (see Proposition 666). The equality of the Lebesgue and the Riemann integral of f follows then from this and from (7.74) and (7.75). □
Example 784
1. The function D↾[0,1], where D is the Dirichlet function introduced in Definition 296, is Lebesgue integrable on [0, 1], yet it is not Riemann integrable there. For the first statement see Example 737, for the second see Remark 673.2.
2. From the point of view of the Lebesgue theory, D↾[0,1] and the constant function 1 on [0, 1] coincide, since they are equal (a.e.), so the class to which D↾[0,1] belongs contains Riemann integrable functions. In order to exhibit a class of Lebesgue integrable functions on [0, 1], say, that does not contain any Riemann integrable function on [0, 1], consider the class that contains the function $f(x) := 1/\sqrt{x}$, x ∈ (0, 1] (this function is in L[0, 1], see Example 748; the plot of its graph is in Fig. 7.25). Let g : [0, 1] → R be a measurable function such that f = g (a.e.) on [0, 1], and let N := {x ∈ [0, 1] : f(x) ≠ g(x)}. Since N is null, given n ∈ N we can find $x_n \in (0, 1/n^2) \setminus N$. We get g(xn) = f(xn) > n, and so g is unbounded. Thus, the class where f is does not contain any Riemann integrable function. This proves that R1[0, 1] is a proper subset of L1[0, 1].
3. The class in Example 784.2 does not contain any Riemann integrable function on [0, 1] since any function in the class is unbounded. We may provide here an example of a class of Lebesgue integrable functions on [0, 1] containing bounded functions but not Riemann integrable functions. Let C be a Cantor set of positive measure in [0, 1] (see Sect. 3.1.5). Let χC be its characteristic function on [0, 1]. Since C is a compact (hence measurable) subset of [0, 1], the function χC is measurable (see Proposition 758). Let f be a real-valued function defined on [0, 1] such that f = χC (a.e.).
The function f is measurable. Let N := {x ∈ [0, 1] : f(x) ≠ χC(x)}. The set N is null. We claim that f ∉ R[0, 1]; by Theorem 777 this is equivalent to claiming that the set D of all discontinuity points of f has positive measure. This follows from the fact that D contains the set C \ N (a set of positive measure). To see this, take x ∈ C \ N, so f(x) = 1. Note that C has an empty interior, and that N cannot contain an interval. Thus, every neighborhood of x contains points in $C^c \cap N^c$, where f takes the value 0, so x is a discontinuity point of f. ♦
Proposition 785 Let [a, b] be a closed and bounded interval in R, and assume a < b. Then, R1[a, b] is a proper dense subset of (L1[a, b], ‖·‖1).
Proof That the inclusion is proper was seen in Examples 784.2 and 784.3. For denseness, fix f ∈ L[a, b] and ε > 0. By Corollary 739 there is a continuous function g on [a, b] (hence Riemann integrable on [a, b], see Proposition 674) such that ‖f − g‖1 < ε. □
Remark 786 Regarding Proposition 785, note that the space C[a, b] of all continuous functions on [a, b], seen as a subset of L1[a, b], is a proper subset of R1[a, b].
Indeed, there are Lebesgue classes containing Riemann integrable functions on [a, b] that do not contain any continuous function on [a, b]. An example is given by the Lebesgue class that contains a function f on [a, b] with a single jump discontinuity (say at c ∈ (a, b)). Indeed, any function g that differs from f at a null set of points must be discontinuous at c, due to the fact that we can find two sequences $\{x_n\}_{n=1}^{\infty}$ in (a, c) and $\{y_n\}_{n=1}^{\infty}$ in (c, b) such that xn → c and yn → c, and f(xn) = g(xn) and f(yn) = g(yn) for all n ∈ N. ®
Improper Riemann and Lebesgue Integrability
Let I be a general interval in R, and let f be a real-valued measurable function defined on I. Observe, first, that a real-valued function f on a general interval I may be bounded, improper Riemann integrable on I, and yet not Lebesgue integrable on I. An instance of this situation is provided in Example 787 below.
Example 787 Consider the function f defined on I := [π, +∞) as $f(x) := \frac{\sin x}{x}$ (see Fig. 7.24 for its plot). It was proved in Example 713 that f has a finite improper Riemann integral on I. However f is not Lebesgue integrable on I. In order to show this, assume for a moment that f ∈ L(I). Then, by (v) in Proposition 734, |f| ∈ L(I) as well. This is false, as it follows from the First Mean Value Theorem 692 for the Riemann integral. Indeed, given n ∈ N, there exists ξ ∈ [nπ, (n + 1)π] such that
$$\left|\int_{n\pi}^{(n+1)\pi} \frac{\sin x}{x}\,dx\right| = \frac{1}{\xi}\left|\int_{n\pi}^{(n+1)\pi} \sin x\,dx\right| = \frac{1}{\xi}\left|(-\cos x)\Big|_{n\pi}^{(n+1)\pi}\right| = \frac{2}{\xi} \ge \frac{2}{(n+1)\pi}.$$
It is enough to observe now that, since sin x has constant sign on each interval [kπ, (k + 1)π], for n ∈ N we have
$$\int_{\pi}^{(n+1)\pi} |f| = \sum_{k=1}^{n} \left|\int_{k\pi}^{(k+1)\pi} f\right| \ge \sum_{k=1}^{n} \frac{2}{(k+1)\pi},$$
and the series $\sum_{k=1}^{\infty} \frac{2}{(k+1)\pi}$ diverges. In Example 806 the argument for proving the non-Lebesgue integrability is different. ♦
On the other hand, a function f : I → R may be Lebesgue integrable and yet not improper Riemann integrable. An instance of this situation is given in Example 788 below.
Example 788 Consider D↾[0,+∞), where D is the Dirichlet function introduced in Definition 296. It is Lebesgue integrable on I := [0, +∞) (see Example 737). However, it is not improper Riemann integrable on I; indeed, for each b > 0, the function D↾[0,b] is not Riemann integrable on [0, b] (see Remark 673.2). ♦
In contrast with Example 787, note that, in some cases, improper Riemann integrability implies Lebesgue integrability (and the coincidence of the two integrals). A typical and useful result in this direction is the following.
Proposition 789 Let f be a real-valued function defined on [a, +∞), where a ∈ R. Assume f ≥ 0 (a.e.), and that the integral $\int_a^{+\infty} f(x)\,dx$ exists as an improper Riemann integral. Then f is Lebesgue integrable on [a, +∞), and the improper Riemann integral $\int_a^{+\infty} f(x)\,dx$ equals the Lebesgue integral $\int_{[a,+\infty)} f(x)\,dx$.
Proof This is a consequence of Levi's monotone convergence Theorem 744. Indeed, given n ∈ N, n > a, the function f↾[a,n] is Riemann integrable, hence Lebesgue integrable. The sequence {f↾[a,n] : n ∈ N} is increasing (a.e.), and $\int_{[a,n]} f(x)\,dx = \int_a^n f(x)\,dx \ (\le \int_a^{+\infty} f(x)\,dx)$. It follows from Theorem 744 that f ∈ L[a, +∞) and that $\int_{[a,n]} f(x)\,dx \to \int_{[a,+\infty)} f(x)\,dx$. Since $\int_a^n f(x)\,dx \to \int_a^{+\infty} f(x)\,dx$, this shows that the Lebesgue and the improper Riemann integrals of f on [a, +∞) coincide. □
Of course, the same result holds in case f ≤ 0 (a.e.) on [a, +∞). Similar versions of this result can be stated for improper Riemann integrals of the form $\int_{-\infty}^{b} f(x)\,dx$, $\int_{-\infty}^{+\infty} f(x)\,dx$, or for improper Riemann integrals of the second class.
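The divergence used in Example 787 can be observed numerically. The following is only an illustrative sketch, not part of the argument: the helper names (abs_sinc_integral, lower_bound, partial_sums) and the choice of a midpoint rule with m sample points are assumptions made for this example. It approximates the integral of |sin x|/x over each interval [kπ, (k + 1)π] and compares it with the lower bound 2/((k + 1)π) obtained from the First Mean Value Theorem.

```python
import math

def abs_sinc_integral(k, m=400):
    """Midpoint-rule approximation of the integral of |sin x|/x over [k*pi, (k+1)*pi].

    The rule and the number m of sample points are illustrative choices.
    """
    a = k * math.pi
    h = math.pi / m
    total = 0.0
    for j in range(m):
        x = a + (j + 0.5) * h          # midpoint of the j-th subinterval
        total += abs(math.sin(x)) / x
    return h * total

def lower_bound(k):
    """The bound 2/((k+1)*pi) from Example 787."""
    return 2.0 / ((k + 1) * math.pi)

def partial_sums(n):
    """Return (sum of the approximate integrals, sum of the lower bounds) for k = 1..n."""
    integrals = sum(abs_sinc_integral(k) for k in range(1, n + 1))
    bounds = sum(lower_bound(k) for k in range(1, n + 1))
    return integrals, bounds

if __name__ == "__main__":
    for n in (10, 100, 1000):
        integrals, bounds = partial_sums(n)
        print(n, round(integrals, 4), round(bounds, 4))
```

Both columns printed by the script keep growing (roughly like (2/π) log n), which is consistent with |f| not being Lebesgue integrable on [π, +∞), while the signed integrals of f over the same intervals alternate in sign and form a convergent series.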
7.3.10 The Fundamental Theorem of Calculus for Lebesgue Integration
Introduction
Under the denomination “Fundamental Theorem of Calculus” one loosely understands the connection between integrals and derivatives along one of the following two circular paths (here, ∫f denotes the indefinite integral, see formula (7.65)):
(A) (1) Start with F ⟹ (2) Compute f := F′ (if possible) ⟹ (3) Let us hope that f is integrable ⟹ (4) Compute ∫f. Then try to prove that (1) and (4) are equal, i.e., that F and ∫f coincide.
(B) (1) Start with f ⟹ (2) Compute F := ∫f (if possible) ⟹ (3) Let us hope that F is derivable ⟹ (4) Compute F′. Then try to prove that (1) and (4) are equal, i.e., that f and F′ coincide.
Remark 790 Regarding (A) and (B) above, some remarks are in order. Below, I := [a, b] denotes a closed and bounded interval in R. Let f be a real-valued function defined on I . We agreed to consider real-valued functions f that are defined (a.e.) on I . 1. On the set (A) of implications. (a) To go from (1) to (2) we should consider (a.e.) differentiable functions. We already know that every monotone—hence every bounded variation—function
defined on I has a derivative (a.e.), and that this derivative belongs to L(I) (Theorem 424, its Corollary 433, and Corollary 749). However, we cannot expect that the bounded variation setting should be the proper one for this kind of result⁴. Indeed, we proved in Proposition 399 that the Lebesgue singular function S is continuous, increasing (hence of bounded variation), and has a zero derivative (a.e.). Therefore, (1), (2), and (3) hold for F := S; however, we cannot close the circle since the indefinite integral of 0 cannot recover the original function S (a function that is not zero (a.e.)!).
(b) We immediately realize that to start with a bounded variation function F was not only too optimistic but totally inadequate also: If we want to recover F as an indefinite integral, say of a Lebesgue integrable function f, the function F must be, from the very beginning, absolutely continuous (the absolutely continuous functions form a narrower class than the class of bounded variation functions, see Remark 441). Indeed, the indefinite integral of a function in L(I) is an absolutely continuous function. This was proved in Proposition 768. For a more precise description of this related to bounded-variation functions, see again Corollary 796.
(c) Let us restart the chain of implications according to the previous remarks. (1) Let F be an absolutely continuous real-valued function on I. (2) Since F is of bounded variation, it has (a.e.) a derivative f := F′. Moreover, (3) the function f belongs to L(I). (4) We may now compute the indefinite integral of f, i.e., the function $x \mapsto \int_{[a,x]} f$, for x ∈ I. Theorem 791 will show that $F(x) - F(a) = \int_{[a,x]} f$, i.e., $F(x) = F(a) + \int_{[a,x]} f$, for x ∈ I, and this closes the chain of implications (we need, certainly, to introduce a constant, in our case F(a), since $\int_a^a f = 0$ while F(a) may be different from 0).
2. On the set (B) of implications.
(a) In (1) we start with a function f on I whose indefinite integral should be computed, so we need at least to assume f ∈ L(I).
(b) For (2), note that the indefinite integral F is given by $F(x) := \int_{[a,x]} f$ for x ∈ I (see formula (7.65) and the paragraph that precedes it).
(c) The function F is absolutely continuous on I (Proposition 768), thus of bounded variation, so it has a derivative (a.e.), say g, and g ∈ L(I) (Theorem 424, its Corollary 433, and Corollary 749). This shows (3) and (4).
(d) We apply Theorem 791 to the function F and its (a.e.) derivative g to obtain $F(x) = \int_{[a,x]} g$ (recall that F(a) = 0), so $\int_{[a,x]} (f - g) = 0$ for every x ∈ [a, b]. By Proposition 770 we get f − g = 0 (a.e.) on I, i.e., F′ = f (a.e.), and the circle is closed.
3. The two procedures above that we called (A) and (B), although running in the same direction around the circle of implications, started at a different place. This is not the only difference. In (A), departing from an absolutely continuous function F, we recovered F on all the interval [a, b] of definition (and up to a constant, something unavoidable, since F and F + C, where C is a constant,
⁴ See, however, Corollary 796.
have the same derivative). In (B), on the contrary, starting from a Lebesgue integrable function f, we recovered f almost everywhere on [a, b], and this is again unavoidable. Anyhow, the function f, as a Lebesgue integrable function, is in fact the representative of a class of functions, defined almost everywhere on [a, b] and agreeing almost everywhere on [a, b]. Any two of them give the same indefinite integral. ®
The Main Result
The following result is the Lebesgue counterpart of Theorem 685, and extends a version for Lipschitz functions (Theorem 1083 in Exercise 13.490) to the case of absolutely continuous functions⁵. Its proof is quite natural: the derivative f of the function F is approximated (a.e.) by a sequence of step functions, obtained by splitting the interval on a dyadic basis and considering, on each subinterval, the slope of the linear approximation. Those step functions have a straightforward integral (in fact, each of these integrals is F(b) − F(a)). The “hard” part consists in proving that the sequence of step functions so obtained converges (a.e.) (in fact, in (L1(I), ‖·‖1)) to the function f. It is important to remark that a single result (Theorem 791 below) allows for completing the two procedures we called (A) and (B) at the beginning of this subsection. This has been explicitly stated in Remark 790.1 and 790.2. We think it worthwhile to write the second part in a separate statement at the end of this subsection (Theorem 799). The first part is literally Theorem 791 below.
Theorem 791 [Fundamental Theorem of Calculus for the Lebesgue integral] Let F be an absolutely continuous real-valued function defined on a closed and bounded interval [a, b], and let f := F′ (defined (a.e.)). Then we have f ∈ L[a, b] and
$$F(b) - F(a) = \int_{[a,b]} f. \tag{7.76}$$
Proof (we follow [LP12]). Observe that f exists (a.e.) and belongs to L[a, b], see Theorem 424, its Corollary 433, and Corollary 749. For each n ∈ N we consider the equally-spaced partition $P_n := \{a = x_{n,0} < x_{n,1} < x_{n,2} < \ldots < x_{n,2^n} = b\}$ of I := [a, b] having $2^n$ subintervals, so $x_{n,i} = a + i(b-a)2^{-n}$ for $i = 0, 1, 2, \ldots, 2^n$. For n ∈ N, let us define a step function hn : [a, b) → R as follows (see Fig. 7.35): for each x ∈ [a, b) there is a unique $i \in \{0, 1, 2, \ldots, 2^n - 1\}$ such that $x \in [x_{n,i}, x_{n,i+1})$; put then
$$h_n(x) := \frac{F(x_{n,i+1}) - F(x_{n,i})}{x_{n,i+1} - x_{n,i}} = \frac{2^n}{b-a}\,\bigl(F(x_{n,i+1}) - F(x_{n,i})\bigr). \tag{7.77}$$
⁵ For the bounded-variation version see Corollary 796.
[Fig. 7.35 The functions F, h2, and h3 in the proof of Theorem 791, drawn over the partitions P2 and P3 of [a, b]]
On the one hand, the construction of $\{h_n\}_{n\in\mathbb{N}}$ implies that
$$\lim_{n\to\infty} h_n(x) = f(x), \quad \text{for all } x \in [a,b)\setminus N, \tag{7.78}$$
where N ⊂ I is a null set such that f(x) exists for all x ∈ I \ N. On the other hand, for each n ∈ N, we compute
$$\int_I h_n(x)\,dx = \sum_{i=0}^{2^n-1} \int_{[x_{n,i},x_{n,i+1}]} h_n(x)\,dx = \sum_{i=0}^{2^n-1} \bigl(F(x_{n,i+1}) - F(x_{n,i})\bigr) = F(b) - F(a),$$
and therefore, it only remains to prove that
$$\lim_{n\to\infty} \int_I h_n(x)\,dx = \int_I f(x)\,dx.$$
Let us prove that, in fact, we have convergence in (L1(I), ‖·‖1), i.e.,
$$\lim_{n\to\infty} \int_I |h_n(x) - f(x)|\,dx = 0. \tag{7.79}$$
Let ε > 0 be fixed and let δ > 0 correspond to ε/4 in the definition of absolute continuity of F (Definition 434). Since f ∈ L(I) (see Corollary 749) we can find, by Theorem 763, a number ρ > 0 such that for any measurable set E ⊂ I we have
$$\int_E |f(x)|\,dx < \frac{\varepsilon}{4}, \quad \text{whenever } \lambda(E) < \rho. \tag{7.80}$$
The following crucial lemma will give us estimates for the measure of the set where f is big. We postpone its proof to the end of this one for better readability.
Lemma 792 For each ε > 0 there exist k and nk in N such that
$$k\,\lambda\Bigl(\bigl\{x \in [a,b) : \sup_{n \ge n_k} |h_n(x)| > k\bigr\}\Bigr) < \varepsilon. \tag{7.81}$$
We continue now with the proof of Theorem 791. Lemma 792 guarantees that there exist k, nk ∈ N such that
$$k\,\lambda\Bigl(\bigl\{x \in [a,b) : \sup_{n \ge n_k} |h_n(x)| > k\bigr\}\Bigr) < \min\Bigl\{\delta, \frac{\varepsilon}{4}, \rho\Bigr\}. \tag{7.82}$$
Let us denote
$$A := \bigl\{x \in [a,b) : \sup_{n \ge n_k} |h_n(x)| > k\bigr\},$$
which, by virtue of (7.80) and (7.82), satisfies the following properties:
$$\lambda(A) < \delta/k \ (< \delta), \tag{7.83}$$
$$k\,\lambda(A) < \frac{\varepsilon}{4}, \ \text{and} \tag{7.84}$$
$$\int_A |f(x)|\,dx < \frac{\varepsilon}{4}. \tag{7.85}$$
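We are now in a position to prove that the integrals in (7.79) are smaller than ε for all sufficiently large values of n ∈ N. This will finish the proof of Theorem 791. We start by noticing that (7.85) guarantees that for all n ∈ N we have
$$\int_I |h_n - f| = \int_{I\setminus A} |h_n - f| + \int_A |h_n - f| \le \int_{I\setminus A} |h_n - f| + \int_A |h_n| + \int_A |f| < \int_{I\setminus A} |h_n - f| + \int_A |h_n| + \frac{\varepsilon}{4}.$$
By the definition of A, for all n ≥ nk we have |hn(x)| ≤ k for every x ∈ [a, b) \ A, so |hn − f| ≤ k + |f| there; by (7.78) and the Lebesgue Dominated Convergence Theorem 750, the integral of |hn − f| over I \ A tends to 0. Hence there exists nε ≥ nk such that this integral is smaller than ε/4 for all n ≥ nε. It remains to estimate the integral of |hn| over A for a fixed n ≥ nε. Split A = B ∪ C, where B := {x ∈ A : |hn(x)| ≤ k} and C := A \ B. By (7.84), the integral of |hn| over B is at most kλ(B) ≤ kλ(A) < ε/4. If C = ∅ there is nothing else to estimate. Otherwise, for every x ∈ C = {x ∈ A : |hn(x)| > k} there is a unique index $i \in \{0, 1, 2, \ldots, 2^n - 1\}$ such that $x \in [x_{n,i}, x_{n,i+1})$. Since |hn| is constant on $[x_{n,i}, x_{n,i+1})$ we deduce that $[x_{n,i}, x_{n,i+1}) \subset C$. Thus there exist distinct indexes $i_j \in \{0, 1, 2, \ldots, 2^n - 1\}$, with j = 1, 2, . . ., p, such that
$$C = \bigcup_{j=1}^{p} [x_{n,i_j}, x_{n,i_j+1}).$$
Therefore, by (7.83),
$$\sum_{j=1}^{p} (x_{n,i_j+1} - x_{n,i_j}) = \lambda(C) \le \lambda(A) < \delta,$$
and then the absolute continuity of F finally comes into action:
$$\int_C |h_n(x)|\,dx = \sum_{j=1}^{p} \int_{[x_{n,i_j},x_{n,i_j+1})} |h_n(x)|\,dx = \sum_{j=1}^{p} |F(x_{n,i_j+1}) - F(x_{n,i_j})| < \frac{\varepsilon}{4}.$$
Altogether, the integral of |hn − f| over I is smaller than ε/4 + ε/4 + ε/4 + ε/4 = ε for all n ≥ nε, which proves (7.79). □
Proof of Lemma 792 Let ε > 0 be fixed and let ρ > 0 be such that for every measurable set E ⊂ I with λ(E) < ρ, we have
$$\int_E |f(x)|\,dx < \frac{\varepsilon}{2}$$
(see Theorem 763). Let N ⊂ I be as in (7.78). We can find k ∈ N sufficiently large so that
$$\lambda(\{x \in I \setminus N : |f(x)| \ge k\}) < \rho. \tag{7.90}$$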
[Fig. 7.36 The graph of the function in Remark 793 and of its derivative]
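Indeed, $\bigcap_{k=1}^{\infty} \{x \in I \setminus N : |f(x)| \ge k\}$ is a null set, and the sequence $\{\{x \in I \setminus N : |f(x)| \ge k\}\}_{k=1}^{\infty}$ of measurable sets decreases, so we may apply Lemma 256. Equation (7.90) implies that
$$k \cdot \lambda(\{x \in I \setminus N : |f(x)| \ge k\}) \le \int_{\{x \in I \setminus N \,:\, |f(x)| \ge k\}} |f(x)|\,dx < \frac{\varepsilon}{2}. \tag{7.91}$$
Let us define
$$E_j := \bigl\{x \in I \setminus N : \sup_{n \ge j} |h_n(x)| > k\bigr\} \quad (j \in \mathbb{N}).$$
Notice that $E_{j+1} \subset E_j$ for every j ∈ N, and λ(E1) < ∞, hence, again by Lemma 256,
$$\lim_{j\to\infty} \lambda(E_j) = \lambda\Bigl(\bigcap_{j=1}^{\infty} E_j\Bigr). \tag{7.92}$$
Clearly, $\bigcap_{j=1}^{\infty} E_j \subset \{x \in I \setminus N : |f(x)| \ge k\}$, so we deduce from (7.92) that we can find some nk ∈ N such that
$$\lambda(E_{n_k}) \le \lambda(\{x \in I \setminus N : |f(x)| \ge k\}) + \frac{\varepsilon}{2k},$$
and then (7.91) yields $k\lambda(E_{n_k}) < \varepsilon$. □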
Remark 793 Just continuity of F in the statement of Theorem 791 is not sufficient for getting the conclusion. Indeed, consider the following example: Let the function F be defined on [0, 1] by
$$F(x) := \begin{cases} x^2 \cos\left(\dfrac{\pi}{x^2}\right), & \text{if } x \neq 0,\\ 0, & \text{if } x = 0 \end{cases}$$
(see Fig. 7.36). Then F′ exists at each point of [0, 1]. However F′ is not Lebesgue integrable. To see this, note that if 0 < a < b ≤ 1, then the function F′ is continuous on [a, b] and we can thus use the regular Riemann-integral Fundamental Theorem of Calculus (Theorem 685) to find that, for example, $\int_{\alpha_n}^{\beta_n} F' = \frac{1}{2n}$ if $\alpha_n = \sqrt{\frac{2}{4n+1}}$ and $\beta_n = \frac{1}{\sqrt{2n}}$. By considering the union $\bigcup_{n=1}^{\infty} [\alpha_n, \beta_n]$ of these pairwise disjoint intervals, and the divergence of $\sum_{n=1}^{\infty} \frac{1}{2n}$, this shows that the integral of F′ over [0, 1] does not exist as a real number. ®
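The computation in Remark 793 can be checked with a few lines of code. This is only a sketch under stated assumptions: the function names (F, alpha, beta, gain, lower_bound_abs_integral) are hypothetical helpers introduced here, and the script evaluates F at the endpoints rather than integrating F′ numerically, which is what Theorem 685 allows on each interval [αn, βn] ⊂ (0, 1].

```python
import math

def F(x):
    """F(x) = x^2 cos(pi / x^2) for x != 0, and F(0) = 0 (the function of Remark 793)."""
    return 0.0 if x == 0 else x * x * math.cos(math.pi / (x * x))

def alpha(n):
    """Left endpoint alpha_n = sqrt(2 / (4n + 1))."""
    return math.sqrt(2.0 / (4 * n + 1))

def beta(n):
    """Right endpoint beta_n = 1 / sqrt(2n)."""
    return 1.0 / math.sqrt(2 * n)

def gain(n):
    """F(beta_n) - F(alpha_n), which equals the integral of F' over [alpha_n, beta_n]."""
    return F(beta(n)) - F(alpha(n))

def lower_bound_abs_integral(n_max):
    """Sum of the gains for n = 1..n_max: a lower bound for the integral of |F'| over [0, 1]."""
    return sum(gain(n) for n in range(1, n_max + 1))

if __name__ == "__main__":
    for n_max in (10, 1000, 100000):
        print(n_max, lower_bound_abs_integral(n_max))
```

Since F(βn) = 1/(2n) and F(αn) = 0, each gain is 1/(2n); the intervals are pairwise disjoint because β(n+1) < αn < βn, so the printed lower bounds grow like half the harmonic series.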
Theorem 791 deals with absolutely continuous functions. We saw in Remark 790.1 that it is, in general, impossible to recover a continuous function of bounded variation from its (a.e.) derivative by integration, and the example was the Lebesgue singular function, a nonzero (a.e.) bounded-variation function whose derivative vanishes (a.e.). We shall see below (Corollary 796) that this is precisely the situation: what we recover by integration is the absolutely continuous part of the function, and its singular part is lost. We introduce a definition.
Definition 794 A real-valued function S of bounded variation defined on a generalized interval I in R is said to be singular whenever its (a.e.) derivative vanishes (a.e.) on I.
Remark 795 An example of a singular function is the Lebesgue singular function S given in Definition 398. As an increasing function, it is of bounded variation. Its derivative vanishes (a.e.). ®
Corollary 796 Let G be a real-valued function of bounded variation defined on an interval I := [a, b] in R. Then G = F + S, where F is a real-valued absolutely continuous function on I, and S is singular on I. Moreover, if g is the (a.e.) derivative of G on I, then we can take F to be the indefinite integral of g, i.e., $F(x) = \int_{[a,x]} g$ for x ∈ I.
Proof Let g be the (a.e.) derivative of G (it exists by Theorem 424). Then g ∈ L(I) (see Corollary 749). Put $F(x) := \int_{[a,x]} g$ for x ∈ I. The function F is, according to Proposition 768, absolutely continuous on I. Put S := G − F. Observe that S is of bounded variation. We proved in Remark 790.2 that F′ = g (a.e.), hence S′ = (G − F)′ = 0 (a.e.). □
Remark 797 An integral that recovers continuous functions from their derivative was discovered later than the Lebesgue one, and is due to the German mathematician O. Perron and the French mathematician A. Denjoy. ®
We finish this section by comparing the Fundamental Theorem 685 of Calculus for the Riemann integral and the Fundamental Theorem 791 of Calculus for the Lebesgue integral. Note that the first one follows from the second one: Indeed, if f : [a, b] → R is Riemann integrable on [a, b], and has an antiderivative G on [a, b], then G is Lipschitz (in particular absolutely continuous) due to the fact that its derivative on (a, b) is the bounded function f. Theorem 791, together with Corollary 675, allows us to conclude that $G(x) - G(a) = \int_{[a,x]} f \ (= \int_a^x f(t)\,dt)$ for all x ∈ [a, b], so $F(x) := \int_a^x f(t)\,dt$ is an antiderivative, and $\int_a^b f(t)\,dt = G(b) - G(a)$.
Example 798 As a use of Theorem 791, let us find $\int_{-1}^{1} \operatorname{sign} x\,dx$: Take F(x) = |x| for x ∈ [−1, 1]. The function F is Lipschitz on [−1, 1], hence absolutely continuous. Note that F′(x) exists at points x ∈ (−1, 0) ∪ (0, 1), and F′(x) = sign(x) at those points. By Theorem 791 and Corollary 675 we get that our integral is F(1) − F(−1) (= 0). ♦
We are ready to present now the promised basic classical result of Lebesgue.
Theorem 799 [Lebesgue's differentiation theorem] Let f be a Lebesgue integrable function on a closed and bounded interval [a, b]. Then its indefinite integral is differentiable with derivative f(x) at almost every point x ∈ [a, b].
Proof Let F be the indefinite integral of f, i.e., $F(x) = \int_{[a,x]} f$ for x ∈ [a, b]. By Proposition 768, the function F is absolutely continuous. By Proposition 436 and Theorem 424, F has (a.e.) a derivative g on [a, b], and, due to Theorem 791, $F(x) - F(a) = \int_{[a,x]} g$. Since $F(x) - F(a) = \int_{[a,x]} f$, we get $0 = \int_{[a,x]} (f - g)$ for all x ∈ [a, b]. Thus, by Proposition 770, we have f = g a.e. on [a, b] and the proof is completed. □
We refer to Exercise 13.511, which gives a stronger result than Lebesgue's differentiation Theorem 799.
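The dyadic step functions hn used in the proof of Theorem 791 can be explored numerically. The sketch below is an illustration only, under assumptions chosen for this example: it takes F(x) = √x on [0, 1] (absolutely continuous, with unbounded derivative f(x) = 1/(2√x) in L[0, 1]), and the helper names (step_slopes, integral_of_h, l1_distance) are hypothetical. On a subinterval [u, v], with s = √u and t = √v, the slope is 1/(s + t), and a direct computation gives that the integral of |hn − f| over [u, v] equals (t − s)²/(2(s + t)); the code uses this closed form instead of numerical quadrature.

```python
import math

def F(x):
    """Illustrative choice: F(x) = sqrt(x), absolutely continuous on [0, 1]."""
    return math.sqrt(x)

def step_slopes(n, a=0.0, b=1.0):
    """Cut points of the dyadic partition P_n and the values of h_n on its 2^n subintervals."""
    count = 2 ** n
    xs = [a + i * (b - a) / count for i in range(count + 1)]
    slopes = [(F(xs[i + 1]) - F(xs[i])) / (xs[i + 1] - xs[i]) for i in range(count)]
    return xs, slopes

def integral_of_h(n):
    """Integral of h_n over [0, 1]; the sum telescopes to F(1) - F(0)."""
    xs, slopes = step_slopes(n)
    return sum(slope * (xs[i + 1] - xs[i]) for i, slope in enumerate(slopes))

def l1_distance(n):
    """Exact L1 distance between h_n and f(x) = 1/(2 sqrt(x)) on [0, 1].

    On [u, v] the contribution is (t - s)^2 / (2 (s + t)) with s = sqrt(u), t = sqrt(v).
    """
    xs, _ = step_slopes(n)
    total = 0.0
    for i in range(len(xs) - 1):
        s, t = math.sqrt(xs[i]), math.sqrt(xs[i + 1])
        total += (t - s) ** 2 / (2.0 * (s + t))
    return total

if __name__ == "__main__":
    for n in range(1, 13):
        print(n, integral_of_h(n), l1_distance(n))
```

Every hn has integral F(1) − F(0) = 1 up to rounding, while the L1 distance to f decreases to 0 (here at a rate comparable to 2^(−n/2)), which is the convergence (7.79) in this particular case.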
7.3.11 Integration by Parts
Results that were established for the Riemann integral in Sect. 7.1 can be extended, by using the Fundamental Theorem of Calculus 791, to the Lebesgue integral.
Proposition 800 Let F and G be absolutely continuous functions on a closed and bounded interval [a, b]. Then
$$\int_{[a,b]} F(x)G'(x)\,dx = [FG]_a^b - \int_{[a,b]} F'(x)G(x)\,dx. \qquad (7.93)$$
Proof Both F and G have derivatives $F'$ and $G'$, respectively, defined (a.e.), that are Lebesgue integrable functions on [a, b] (Corollary 749). Let x ∈ (a, b) be such that $F'(x)$ and $G'(x)$ both exist. Then, for h ≠ 0 such that x + h ∈ [a, b],
$$\frac{(FG)(x+h) - (FG)(x)}{h} = \frac{F(x+h) - F(x)}{h}\,G(x+h) + F(x)\,\frac{G(x+h) - G(x)}{h} \to F'(x)G(x) + F(x)G'(x),$$
as h → 0, since G is continuous at x, so
$$(FG)' = F'G + FG' \quad \text{(a.e.)}. \qquad (7.94)$$
The function FG is absolutely continuous (see Exercise 13.277). Recall that $F' \in L[a,b]$ and $G' \in L[a,b]$; moreover, F and G are continuous (and so bounded) on [a, b]. This shows that $F'G \in L[a,b]$ and $FG' \in L[a,b]$ (see Remark 752.1). Thus, it follows from (7.94) and Theorem 791 that
$$(FG)(b) - (FG)(a) = \int_{[a,b]} F'(x)G(x)\,dx + \int_{[a,b]} F(x)G'(x)\,dx. \qquad □$$
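Formula (7.93) can be sanity-checked numerically. The following is only a minimal sketch, not part of the original text: it assumes the illustrative choices F(x) = |x| and G(x) = x² on [−1, 2] (both absolutely continuous, with F not differentiable at 0), and the helper name `midpoint_integral` is hypothetical. A simple midpoint rule stands in for the Lebesgue integral, which is harmless here because both integrands are piecewise smooth.

```python
def midpoint_integral(func, a, b, n=200_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def F(x):
    return abs(x)          # Lipschitz, hence absolutely continuous

def G(x):
    return x * x

def dF(x):
    # F'(x) = sign(x) for x != 0; the value at 0 is irrelevant (a null set)
    return 1.0 if x > 0 else -1.0

def dG(x):
    return 2.0 * x

a, b = -1.0, 2.0
lhs = midpoint_integral(lambda x: F(x) * dG(x), a, b)          # integral of F G'
boundary = F(b) * G(b) - F(a) * G(a)                           # [FG]_a^b
rhs = boundary - midpoint_integral(lambda x: dF(x) * G(x), a, b)
print(lhs, rhs)
```

A hand computation gives 14/3 for both sides in this particular case (boundary term 7, and the integral of F′G equal to 7/3), so the two printed values should agree with each other and with 14/3 up to quadrature error.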
7.3.12 Parametric Lebesgue Integrals
Theorem 801 Let M be a Lebesgue measurable subset of R, I be a general interval in R and f(x, α) be a real-valued function on M × I such that
(i) For almost all x ∈ M, α ↦ f(x, α) is continuous on I,
(ii) For every α ∈ I, x ↦ f(x, α) is Lebesgue measurable on M,
(iii) There is a function g ∈ L(M) such that |f(x, α)| ≤ g(x) for all α ∈ I and for almost all x ∈ M.
Then the function F defined on I by
$$F(\alpha) = \int_M f(x,\alpha)\,dx$$
is continuous on I.
Proof Let $\alpha_n$, α ∈ I be such that $\lim_{n\to\infty}\alpha_n = \alpha$. Put $f_n(x) := f(x,\alpha_n)$ and $f_0(x) := f(x,\alpha)$ for n ∈ N and x ∈ M. Then $f_n(x) \to f_0(x)$ and $|f_n(x)| \le g(x)$ for almost all x ∈ M. It follows that $\int_M f_n(x)\,dx \to \int_M f_0(x)\,dx$ (i.e., $F(\alpha_n) \to F(\alpha)$), by the Lebesgue Dominated Convergence Theorem 750. □
Theorem 802 Let M be a Lebesgue measurable set in R, let I be an open interval in R, and let f(x, α) be a real-valued function on M × I such that
(i) The integral $(F(\alpha) :=)\ \int_M f(x,\alpha)\,dx$ exists in the Lebesgue sense for at least one $\alpha_0 \in I$.
(ii) For every α ∈ I, the function x ↦ f(x, α) is Lebesgue measurable on M.
(iii) There is a set N of measure zero in M such that for every x ∈ M \ N and for all α ∈ I, $\frac{\partial f}{\partial\alpha}(x,\alpha)$ exists and is finite.
(iv) There is a function g ∈ L(M) such that for all x ∈ M \ N (where N is the set in (iii)) and for all α ∈ I, $\left|\frac{\partial f}{\partial\alpha}(x,\alpha)\right| \le g(x)$.
Then the function F : I → R given by $F(\alpha) = \int_M f(x,\alpha)\,dx$ for every α ∈ I is well defined, is differentiable on I, for every α ∈ I the function $x \mapsto \frac{\partial f}{\partial\alpha}(x,\alpha)$ belongs to L(M), and
$$F'(\alpha) = \int_M \frac{\partial f}{\partial\alpha}(x,\alpha)\,dx, \quad \text{for every } \alpha \in I. \qquad (7.95)$$
Proof If x ∈ M \ N, α ∈ I, and h is such that α + h ∈ I, the Mean Value Theorem 365 for the function α ↦ f(x, α) gives θ ∈ (0, 1) such that
$$|f(x,\alpha+h) - f(x,\alpha)| = \left|h\,\frac{\partial f}{\partial\alpha}(x,\alpha+\theta h)\right| \le |h|\,g(x).$$
In particular, we have for x ∈ M \ N and for α ∈ I, $|f(x,\alpha)| \le |f(x,\alpha_0)| + |\alpha - \alpha_0|\,g(x)$. Thus, F(α) exists (and is finite) for all α ∈ I.
Fig. 7.37 The function $f(x) = x^{s-1}e^{-x}$ for s = 0.1, 0.5, 1, 2, and 3 in Example 803
Fix α ∈ I. Then, given h ∈ R such that α + h ∈ I,
$$\frac{F(\alpha+h) - F(\alpha)}{h} = \int_M \frac{f(x,\alpha+h) - f(x,\alpha)}{h}\,dx. \qquad (7.96)$$
Let $\{h_n\}_{n=1}^\infty$ be a sequence in R \ {0} such that $\alpha + h_n \in I$ for all n ∈ N, and $h_n \to 0$. Put
$$g_n(x) := \frac{f(x,\alpha+h_n) - f(x,\alpha)}{h_n}, \quad x \in M,\ n \in N.$$
Now, for almost all x ∈ M, we have $g_n(x) \to \frac{\partial f}{\partial\alpha}(x,\alpha)$ as n → ∞, and $|g_n| \le g$ (a.e.). Therefore, by the Lebesgue Dominated Convergence Theorem 750, the function $x \mapsto \frac{\partial f}{\partial\alpha}(x,\alpha)$ belongs to L(M), and
$$\frac{F(\alpha+h_n) - F(\alpha)}{h_n} = \int_M g_n(x)\,dx \longrightarrow \int_M \frac{\partial f}{\partial\alpha}(x,\alpha)\,dx, \qquad (7.97)$$
where (7.96) was used. Since (7.97) holds for every sequence $\{h_n\}$ as above, we get that F is differentiable at α, and (7.95) holds for the given α. This is true for every α ∈ I. □
Example 803 The Gamma function is defined for s > 0 by
$$\Gamma(s) = \int_{(0,+\infty)} x^{s-1}e^{-x}\,dx. \qquad (7.98)$$
In Fig. 7.37, we plot the function $x^{s-1}e^{-x}$ for several values of s > 0.
1. The integral in (7.98) exists as a Lebesgue integral. Indeed, for fixed s > 0, observe that $x^{s-1}e^{-1} \le x^{s-1}e^{-x} \le x^{s-1}$, for x ∈ (0, 1). This shows that
$$e^{-1}\frac{1}{s} = e^{-1}\left[\frac{x^s}{s}\right]_0^1 = e^{-1}\int_{(0,1)} x^{s-1}\,dx \le \int_{(0,1)} x^{s-1}e^{-x}\,dx \le \int_{(0,1)} x^{s-1}\,dx = \left[\frac{x^s}{s}\right]_0^1 = \frac{1}{s}. \qquad (7.99)$$
Fig. 7.38 The Gamma function
Moreover, $\lim_{x\to+\infty} (x^{s-1}e^{-x})/x^{-2} = 0$. Indeed, $e^x > x^n/n!$ for x > 0 and for each n ∈ N ∪ {0}, in particular for n > s + 1. Thus, there exists K > 1 such that $(0 \le)\ x^{s-1}e^{-x} \le x^{-2}$ on [K, +∞); this implies that $\int_{[K,+\infty)} x^{s-1}e^{-x}\,dx$ exists as a Lebesgue integral since $\int_{[K,+\infty)} x^{-2}\,dx$ does (indeed, $\int_{[K,+\infty)} x^{-2}\,dx = \left[-x^{-1}\right]_K^{+\infty} = 1/K$). Finally, $\int_{[1,K]} x^{s-1}e^{-x}\,dx$ is finite as $x^{s-1}e^{-x}$ is a continuous function on [1, K].
2. The function Γ has the following properties (see Fig. 7.38 for a fragment of the graph of Γ on the positive axis):
(a) $\lim_{s\to 0+}\Gamma(s) = +\infty$.
(b) $\lim_{s\to+\infty}\Gamma(s) = +\infty$.
(c) Γ is infinitely differentiable on (0, +∞).
(d) Γ is a strictly convex function on (0, +∞) (see Definition 807 below).
(e) Γ(s + 1) = sΓ(s) for s > 0.
(f) Γ(n + 1) = n! for n ∈ N.
Let us proceed to prove the assertions above.
(a) That $\lim_{s\to 0+}\Gamma(s) = +\infty$ follows immediately from (7.99) and the fact that the function $x \mapsto x^{s-1}e^{-x}$ is positive on (0, +∞).
(b) We now show that $\lim_{s\to+\infty}\Gamma(s) = +\infty$. Let $\{s_n\}_{n=1}^\infty$ be an increasing sequence of positive numbers with $\lim_n s_n = +\infty$. Then $\{x^{s_n-1}e^{-x}\}_{n=1}^\infty$ is an increasing sequence of positive functions tending to +∞ at each point of the interval (1, +∞). By Fatou's Lemma (Corollary 745), $\lim_{n\to\infty}\int_{[1,+\infty)} x^{s_n-1}e^{-x}\,dx = +\infty$. This holds for every such sequence $\{s_n\}_{n=1}^\infty$, hence $\lim_{s\to+\infty}\Gamma(s) = +\infty$.
(c) Observe first that Γ is continuous on (0, +∞). This follows from Theorem 801. Indeed, note that if $0 < s_1 < s_2 < \infty$, then for $s \in [s_1, s_2]$,
$$0 < x^{s-1}e^{-x} \le \max\{x^{s_1-1}, x^{s_2-1}\}\,e^{-x} \le (x^{s_1-1} + x^{s_2-1})\,e^{-x},$$
and the latter is an integrable function on (0, ∞). In order to prove that Γ is infinitely differentiable on (0, +∞), we use repeatedly Theorem 802 to obtain
$$\Gamma^{(n)}(s) = \int_{(0,+\infty)} x^{s-1}e^{-x}\ln^n x\,dx, \quad \text{for } s > 0. \qquad (7.100)$$
The argument to prove the existence of this integral is similar to the one used before, splitting (0, +∞) in three intervals (0, 1), [1, K], and [K, +∞). Only the existence of the integral on (0, 1) needs extra work, done by using first the Integration by Parts Theorem 705⁶ and then L'Hôspital's Rule for proving that $x^s\ln^n x \to 0$ as x → 0+.
(d) From (7.100) we obtain, in particular, that $\Gamma''(s) > 0$ for all s > 0, and Γ is thus a strictly convex function on (0, +∞) (see Definition 807, Corollary 820 and Remark 821 below).
(e) To show that Γ(s + 1) = sΓ(s) for s > 0, we may use again Integration by Parts (Theorem 705)⁷:
$$\int_{(0,+\infty)} x^s e^{-x}\,dx = \left[-x^s e^{-x}\right]_0^{+\infty} + s\int_{(0,+\infty)} x^{s-1}e^{-x}\,dx.$$
(f) Since we obtain easily Γ(1) = 1, it follows from (e) above that Γ(n + 1) = n! for all n ∈ N.
Since Γ is a nonnegative function with properties (a) and (b) above, it must attain its minimum on (0, +∞), and, being a (strictly) convex function, the minimum is attained at a single point $x_0$. It is possible to compute, by numerical methods, that $x_0 = 1.4616321\ldots$, and $\Gamma(x_0) = 0.8856032\ldots$ Moreover, the substitution $x = t^2$ gives (see Exercise 13.493)
$$\Gamma\left(\tfrac{1}{2}\right) = \int_{(0,+\infty)} \frac{e^{-x}}{\sqrt{x}}\,dx = 2\int_{(0,+\infty)} e^{-t^2}\,dt = \sqrt{\pi}. \qquad ♦$$
⁶ For an example of how to use properly the Integration by Parts Theorem 705 (formulated for the Riemann integral) in this context, see footnote 7.
⁷ Theorem 705 was established for Riemann integrals. However, it can be used here in the following way: put $f(x) := x^s$ and $g(x) := e^{-x}$, fix n ∈ N, and consider the functions
$$f_n(x) := \begin{cases} x^s & \text{if } x \in I_n := [1/n, n],\\ 0 & \text{if } x \in (0,+\infty)\setminus I_n,\end{cases} \qquad\text{and}\qquad g_n(x) := \begin{cases} e^{-x} & \text{if } x \in I_n := [1/n, n],\\ 0 & \text{if } x \in (0,+\infty)\setminus I_n.\end{cases}$$
Both $f_n$ and $g_n$ are continuous on $I_n$ and continuously differentiable on (1/n, n), hence
$$\int_{(0,+\infty)} f_n g_n' = \int_{1/n}^{n} f_n g_n' = \left[f_n g_n\right]_{1/n}^{n} - \int_{1/n}^{n} f_n' g_n = n^s e^{-n} - \left(\frac{1}{n}\right)^{s} e^{-1/n} - \int_{(0,+\infty)} f_n' g_n.$$
Observe that $n^s e^{-n} \to 0$, $(1/n)^s e^{-1/n} \to 0$ as n → ∞, and $f_n g_n' \to f g'$ and $f_n' g_n \to f' g$ pointwise on (0, +∞) as n → ∞, with $|f_n g_n'| \le |f g'|$ and $|f_n' g_n| \le |f' g|$. Since $f g'$ and $f' g$ are in L(0, +∞), the use of the Dominated Convergence Theorem 750 proves the statement.
Fig. 7.39 The function f(x) = (ln(1 − x))/x in Example 804
Example 804 In order to evaluate
$$\int_{[0,1)} \frac{\ln(1-x)}{x}\,dx$$
(see Fig. 7.39) it is enough to observe that, for |x| < 1, the Taylor expansion of the function ln(1 − x) gives
$$\ln(1-x) = -\sum_{n=1}^{\infty} \frac{x^n}{n},$$
hence, for 0 < x < 1,
$$\frac{\ln(1-x)}{x} = -\sum_{n=1}^{\infty} \frac{x^{n-1}}{n}.$$
Note, too, that $\int_0^1 \frac{x^{n-1}}{n}\,dx = \left.\frac{x^n}{n^2}\right|_{x=0}^{1} = \frac{1}{n^2}$, and that the series $\sum_{n=1}^\infty \frac{1}{n^2}$ converges. We may use now Levi's Theorem 743 to conclude that the function $\frac{\ln(1-x)}{x}$ is in L[0, 1], and
$$\int_{[0,1)} \frac{\ln(1-x)}{x}\,dx = -\sum_{n=1}^{\infty}\frac{1}{n^2} = -\frac{\pi^2}{6}$$
by Example 850 below.
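The value obtained in Example 804 can be cross-checked numerically. The sketch below is illustrative only and is not part of the original text: the helper name `midpoint_integral`, the number of subintervals, and the truncation point of the series are arbitrary assumptions. Midpoints are used so that the integrand is never evaluated at x = 0 or x = 1.

```python
import math

def midpoint_integral(func, a, b, n):
    """Composite midpoint rule; never evaluates func at the endpoints."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def integrand(x):
    # ln(1 - x) / x, computed with log1p for accuracy near x = 0
    return math.log1p(-x) / x

quad = midpoint_integral(integrand, 0.0, 1.0, 200_000)     # direct quadrature
series = -sum(1.0 / k ** 2 for k in range(1, 100_001))     # partial sum of -sum 1/n^2
exact = -math.pi ** 2 / 6
print(quad, series, exact)
```

The three printed numbers should be close to one another; the quadrature converges slowly because of the logarithmic singularity at x = 1, so only a few correct digits should be expected from it.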
♦
Example 805 To show that $\lim_{n\to\infty}\int_0^{+\infty} e^{-x^n}\,dx = 1$, observe first (see Fig. 7.40) that on (0, 1) the sequence $\{e^{-x^n}\}_{n=1}^\infty$ is increasing and converges pointwise to the constant function 1. Then, by the Lebesgue Dominated Convergence Theorem 750, we have $\int_0^1 e^{-x^n}\,dx \to \int_0^1 dx = 1$.
Fig. 7.40 Several functions in Example 805
Fig. 7.41 The function in Example 806
On (1, ∞), the sequence $\{e^{-x^n}\}_{n=1}^\infty$ is decreasing, and converges pointwise to the constant function 0. Each function in the sequence is bounded above by $e^{-x}$, an element in L(1, +∞). Therefore, again by the Lebesgue Dominated Convergence Theorem 750, $\int_1^\infty e^{-x^n}\,dx \to 0$ as n → ∞. Thus, $\int_0^\infty e^{-x^n}\,dx \to 1$ when n → ∞. ♦
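The limit in Example 805 can be observed numerically. The following sketch is not taken from the text: it relies on the standard substitution $t = x^n$, which gives $\int_0^\infty e^{-x^n}\,dx = \Gamma(1 + 1/n)$, and on assumed, arbitrary choices for the truncation point (`upper`) and the number of steps. The names `integrand` and `integral` are hypothetical.

```python
import math

def integrand(x, n):
    # Guard: for x > 1 the power x**n may overflow a float;
    # in that range exp(-x**n) is 0 to machine precision anyway.
    if x > 1.0 and n * math.log(x) > 700.0:
        return 0.0
    return math.exp(-(x ** n))

def integral(n, upper=40.0, steps=200_000):
    """Midpoint-rule approximation of the integral of exp(-x**n) over (0, upper)."""
    h = upper / steps
    return h * sum(integrand((i + 0.5) * h, n) for i in range(steps))

for n in (1, 2, 5, 20, 100):
    # second column: quadrature; third column: Gamma(1 + 1/n)
    print(n, integral(n), math.gamma(1.0 + 1.0 / n))
```

Both columns should approach 1 as n grows, in agreement with the Dominated Convergence argument above.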
Example 806 In Example 713 we proved that the so-called Dirichlet integral given by
$$\int_0^{+\infty} \frac{\sin x}{x}\,dx \qquad (7.101)$$
exists (and it is finite) as an improper Riemann integral. We mentioned there an alternative argument noticing that in fact the evaluation of the integral is done by summing an alternating series whose general term has absolute value tending to zero, see Corollary 183 and Fig. 7.41. Note that the integral $\int_0^{+\infty} \frac{\sin x}{x}\,dx$ does not exist in the Lebesgue sense. This was proved in Example 787. Here we provide an alternative argument: If the integral existed, then $\int_0^{+\infty} \frac{|\sin x|}{x}\,dx$ would exist, too (see (v) in Proposition 734). However, since $|\sin x| \ge \sin^2 x$ for all x, and
$$\int_0^{+\infty} \frac{\sin^2 x}{x}\,dx = \int_0^{+\infty} \frac{1 - \cos 2x}{2x}\,dx = \lim_{n\to\infty}\int_0^{n} \frac{1 - \cos 2x}{2x}\,dx = \infty,$$
we get a contradiction. For still another argument to show this, take α ∈ (0, π/2) such that sin α = 1/2. Then (we are providing a lower bound of the integral in [nπ, (n + 1)π] by estimating from below the area of the thick rounded box in Fig. 7.41)
$$\int_{n\pi}^{(n+1)\pi} \left|\frac{\sin x}{x}\right| dx \ge \int_{n\pi+\alpha}^{(n+1)\pi-\alpha} \left|\frac{\sin x}{x}\right| dx \ge \frac{\pi - 2\alpha}{2[(n+1)\pi - \alpha]}.$$
Indeed, the length of the basis is (π − 2α), and the value of the function |sin x|/x at (n + 1)π − α is $[2[(n+1)\pi - \alpha]]^{-1}$. Finally, observe that the series $\sum_{n=1}^\infty [(n+1)\pi - \alpha]^{-1}$ diverges (compare with the harmonic series $\sum_{n=1}^\infty n^{-1}$).
We will evaluate (7.101) by using integrals depending on a parameter. For this, we introduce the auxiliary function
$$F(a) = \int_0^{+\infty} \frac{\sin(x)}{x}\,e^{-ax}\,dx, \quad \text{for } a > 0. \qquad (7.102)$$
We note that, for each a > 0, the integral in (7.102) exists as a Lebesgue integral. Indeed, the function $x \mapsto \frac{\sin x}{x}e^{-ax}$ on [0, +∞) is measurable (in fact, it is continuous on (0, +∞)), $\left|\frac{\sin x}{x}e^{-ax}\right| \le e^{-ax}$, and $e^{-ax} \in L[0,+\infty)$; thus, we can use Remark 752.1. Fix $a_0 > 0$, and consider $I := [a_0, +\infty)$, M := [0, +∞), $f(x,a) := \frac{\sin x}{x}e^{-ax}$, and $g(x) = e^{-a_0 x}$ in Theorem 801; we obtain that F is continuous on $[a_0, +\infty)$. Since $a_0 > 0$ is arbitrary, we get that F is continuous on (0, +∞). Similarly, since $|\sin x|\,e^{-ax} \le e^{-ax} \le e^{-a_0 x}$ if $a > a_0 > 0$ (for some arbitrarily chosen $a_0 > 0$), we get, by Theorem 802, that F has a derivative in $(a_0, +\infty)$ and
$$F'(a) = -\int_0^{+\infty} \sin(x)\,e^{-ax}\,dx$$
for $a > a_0$. Since $a_0 > 0$ was arbitrary, this happens for every a > 0.
We now evaluate $\int_0^{+\infty} \sin(x)\,e^{-ax}\,dx$ for a > 0. By Integration by Parts,
$$\int_0^{+\infty} \sin(x)\,e^{-ax}\,dx = \left[-\frac{\sin(x)\,e^{-ax}}{a}\right]_0^{+\infty} + \int_0^{+\infty} \frac{\cos(x)\,e^{-ax}}{a}\,dx,$$
and again by Integration by Parts,
$$\int_0^{+\infty} \cos(x)\,e^{-ax}\,dx = \left[-\frac{\cos(x)\,e^{-ax}}{a}\right]_0^{+\infty} - \int_0^{+\infty} \frac{\sin(x)\,e^{-ax}}{a}\,dx.$$
Putting this together, we get
$$(F'(a) =)\ -\int_0^{+\infty} \sin(x)\,e^{-ax}\,dx = -\frac{1}{a^2+1}.$$
Thus, F(a) = − arctan a + C, for a > 0, for some constant C.
To find the constant C, note that if $\{a_n\}_{n=1}^\infty$ is a sequence that increases to +∞ and $a_1 > 1$, then the sequence of functions $\left\{\frac{\sin(x)}{x}e^{-a_n x}\right\}_{n=1}^\infty$ is decreasing in absolute value and has a pointwise limit 0 on (0, +∞). Moreover, $\left|\frac{\sin x}{x}e^{-a_n x}\right| \le e^{-x}$ on (0, +∞). The function $e^{-x}$ is Lebesgue integrable on (0, +∞), hence by the Dominated Convergence Theorem 750, $F(a_n) \to 0$. Since $\arctan(a_n) \to \frac{\pi}{2}$, we get $C = \frac{\pi}{2}$. Hence,
$$F(a) = \frac{\pi}{2} - \arctan(a), \quad \text{for } a > 0.$$
Note that from the Second Mean Value Theorem 694 it follows that, for some ξ ∈ [m, n],
$$\int_m^n \frac{e^{-\alpha x}}{x}\sin x\,dx = \frac{e^{-\alpha m}}{m}\int_m^{\xi} \sin x\,dx + \frac{e^{-\alpha n}}{n}\int_{\xi}^{n} \sin x\,dx \to 0,$$
when n, m → ∞, uniformly on α > 0. Therefore,
$$\int_0^n \frac{e^{-\alpha x}}{x}\sin x\,dx \to \int_0^\infty \frac{e^{-\alpha x}}{x}\sin x\,dx = \frac{\pi}{2} - \arctan\alpha$$
when n → ∞, uniformly for α > 0. Thus, taking the limit of both sides of the previous equality when α → 0, we get $\int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}$. For a different approach to the evaluation of this integral see Exercise 13.467. ♦
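The closed form F(a) = π/2 − arctan(a) obtained in Example 806 lends itself to a quick numerical check. The sketch below is an illustration under stated assumptions, not part of the original text: the function name `F_numeric`, the truncation of the integral at 40/a (where the factor $e^{-ax}$ is about $e^{-40}$), and the step count are all arbitrary choices.

```python
import math

def F_numeric(a, steps=200_000):
    """Midpoint-rule approximation of the integral of sin(x)/x * exp(-a x) over (0, 40/a)."""
    upper = 40.0 / a          # beyond this point exp(-a x) < exp(-40)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h     # midpoints avoid the removable singularity at x = 0
        total += math.sin(x) / x * math.exp(-a * x)
    return h * total

for a in (0.5, 1.0, 2.0):
    print(a, F_numeric(a), math.pi / 2 - math.atan(a))
```

For each a, the two printed values should agree to several digits; as a decreases towards 0, both approach π/2, the value of the Dirichlet integral.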
Chapter 8
Convex Functions
Convex functions form a class of functions indispensable in many fields of modern mathematics, ranging from linear and nonlinear analysis to approximation, optimization, and applied mathematics.
8.1 Basics on Convex Functions
A convex combination of two points x, y in a vector space E is any point of the form λx + (1 − λ)y, where λ ∈ [0, 1]. Letting λ range over [0, 1] we get the so-called line segment between x and y. Observe that if x ≠ y and we let instead λ ∈ R, we get all points in the straight line through x and y. A subset C of a vector space E is said to be convex whenever any convex combination of two arbitrary points in C belongs to C. Note that a subset of R is convex if and only if it is a general interval. More generally, a convex combination of points $x_1, x_2, \ldots, x_n$ in a vector space E is any point of the form $\sum_{i=1}^n w_i x_i$, where $w_i \ge 0$ for all i = 1, 2, . . . , n, and $\sum_{i=1}^n w_i = 1$. It is simple to prove that a subset C of E is convex if and only if it is closed under convex combinations of its points, i.e., if and only if, given n ∈ N and points $x_1, x_2, \ldots, x_n$ in C, any convex combination of $x_1, x_2, \ldots, x_n$ belongs to C (see Exercise 13.514). Define the convex hull of a subset S of a vector space E (denoted conv (S)), as the least (in the sense of inclusion) convex subset of E that contains S (see Fig. 8.1). In this section, I denotes a general interval in R.
Definition 807 A function f : I → R defined on a general interval I ⊂ R is said to be convex if
$$f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda) f(y), \quad \text{for all } x, y \in I, \text{ and for all } \lambda \in [0,1]. \qquad (8.1)$$
We say that f is strictly convex if for every x, y ∈ I with x ≠ y, and every λ ∈ (0, 1), the inequality in (8.1) is strict.
© Springer International Publishing Switzerland 2015 V. Montesinos et al., An Introduction to Modern Analysis, DOI 10.1007/978-3-319-12481-0_8
Fig. 8.1 The convex hull conv(S) (in grey) of a set S
Fig. 8.2 The graph of a convex function on I := [−1, 1]; the shaded region is part of its epigraph
A function g : I → R is said to be concave (strictly concave) if −g is a convex (respectively, strictly convex) function. Geometrically, the graph of f on I (i.e., the set {(x, f(x)) : x ∈ I}, see Subsect. 4.1.1) is on or below the line segment with end points (x, f(x)) and (y, f(y)), for all x, y ∈ I (see Fig. 8.2). Observe that a function f on I is convex if and only if the epigraph of f, i.e., the set epi (f) := {(x, t) : x ∈ I, f(x) ≤ t}, is a convex subset of R². Observe, too, that z := λx + (1 − λ)y in formula (8.1) can be written as z = y + λ(x − y), and that, in case x ≠ y,
$$\lambda = \frac{z-y}{x-y}, \qquad 1-\lambda = \frac{x-z}{x-y}.$$
Examples 808
1. The function $x^2$ is strictly convex on R. To show it directly one may use the fact that for x ≠ y in R we have $xy < \frac{1}{2}(x^2 + y^2)$ (see Exercise 13.513). The reader may appreciate the convenience of using Corollary 820 and Remark 821 below: It is enough, in this case, to observe that the second derivative of the given function is positive (in fact, a positive constant).
2. The function $\sqrt{x}$ is not convex on [0, 1]. To see this, check the points 0, 1, and $\frac{1}{2}$.
3. More generally, the function $x^r$, with r > 0, is convex in (0, +∞) if and only if r ≥ 1. In order to see this, apply Corollary 820 below. For a plot of some of those functions on [0,1], see Fig. 8.12 below. ♦
The class of convex functions on I has good stability properties. For example, if f, g are convex functions on I and α ≥ 0, β ≥ 0, then αf + βg is a convex function. The sum of finitely many convex functions on I is a convex function, and so is
Fig. 8.3 slope(A,B) ≤ slope(A,C) ≤ slope(B,C)
the pointwise limit of a pointwise convergent sequence of convex functions. If the pointwise supremum of an arbitrary family of convex functions is finite everywhere, it is a convex function. We leave to the reader the proof of those simple statements. It is easy to see that the infimum of two convex functions may fail to be convex, and the same is true for products by negative numbers.
The concept of a linear function from a vector space into another is introduced in Definition 899 below.
Definition 809 A function A from a vector space E into a vector space F is said to be affine if there is a linear function T from E into F and an element y ∈ F such that A(x) = T(x) + y for every x ∈ E.
Note that an affine function preserves convex combinations. Precisely, if A : E → F is affine, n ∈ N, $x_1, x_2, \ldots, x_n$ are points in E, and $w_1, w_2, \ldots, w_n$ are nonnegative real numbers such that $\sum_{i=1}^n w_i = 1$, then $A\left(\sum_{i=1}^n w_i x_i\right) = \sum_{i=1}^n w_i A(x_i)$. Indeed, put A(x) = L(x) + y, where L : E → F is a linear function and y ∈ F. Then
$$A\left(\sum_{i=1}^n w_i x_i\right) = L\left(\sum_{i=1}^n w_i x_i\right) + y = \sum_{i=1}^n w_i L(x_i) + \sum_{i=1}^n w_i y = \sum_{i=1}^n w_i A(x_i).$$
As a consequence, every affine function f : R → R is convex. In particular, every linear function l : R → R is convex.
The following basic result is sometimes called the three chord property for convex functions (for obvious reasons, see Fig. 8.3).
Proposition 810 Let f be a convex function defined on I. Then, given points x < y < z in I, we have
$$\frac{f(y) - f(x)}{y - x} \le \frac{f(z) - f(x)}{z - x} \le \frac{f(z) - f(y)}{z - y}. \qquad (8.2)$$
Proof By the definition of convexity, f(y) ≤ λf(x) + (1 − λ)f(z), where λ ∈ [0, 1] satisfies y = λx + (1 − λ)z. From this we obtain f(y) − f(x) ≤ (1 − λ)(f(z) − f(x)) and f(z) − f(y) ≥ λ(f(z) − f(x)). Since λ = (z − y)/(z − x) and (1 − λ) = (y − x)/(z − x), (8.2) follows. □
A straightforward consequence of Proposition 810 is the following result:
Fig. 8.4 A convex function discontinuous at points a and b
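Proposition 811 Let f : I → R be a convex function. Then, f has finite one-sided derivatives $f'_-(x)$ and $f'_+(x)$ at every point x ∈ Int I, and they satisfy $f'_-(x) \le f'_+(x)$. Moreover, given y ∈ Int I such that x < y, we have $f'_+(x) \le f'_-(y)$.
Proof Let $\varepsilon_1$ and $\varepsilon_2$ be such that $[x - \varepsilon_2, x + \varepsilon_2] \subset$ Int I and $0 < \varepsilon_1 < \varepsilon_2$. Then it follows from (8.2) that
$$\frac{f(x - \varepsilon_2) - f(x)}{-\varepsilon_2} \le \frac{f(x + \varepsilon_1) - f(x)}{\varepsilon_1} \le \frac{f(x + \varepsilon_2) - f(x)}{\varepsilon_2}. \qquad (8.3)$$
Observe that (8.3) implies, in particular, that the function
$$\varepsilon \mapsto \frac{f(x + \varepsilon) - f(x)}{\varepsilon} \qquad (8.4)$$
is increasing on $[-\varepsilon_2, 0)$ and on $(0, \varepsilon_2]$. Thus, there exists $f'_+(x) := \lim_{\varepsilon_1 \to 0+} \frac{f(x+\varepsilon_1) - f(x)}{\varepsilon_1}$ and $\frac{f(x-\varepsilon_2) - f(x)}{-\varepsilon_2} \le f'_+(x)$. Letting now $\varepsilon_2 \to 0+$ we see that there exists $f'_-(x)$ and $f'_-(x) \le f'_+(x)$. Applying (8.2) twice for points $x < x + \varepsilon_1 < y - \varepsilon_2 < y$ we get
$$\frac{f(x+\varepsilon_1) - f(x)}{\varepsilon_1} \le \frac{f(y-\varepsilon_2) - f(x)}{y - \varepsilon_2 - x} \le \frac{f(y-\varepsilon_2) - f(x+\varepsilon_1)}{y - \varepsilon_2 - x - \varepsilon_1} \le \frac{f(y) - f(x+\varepsilon_1)}{y - x - \varepsilon_1} \le \frac{f(y-\varepsilon_2) - f(y)}{-\varepsilon_2}. \qquad (8.5)$$
It is enough now to let $\varepsilon_1 \to 0+$ in (8.5) to obtain $f'_+(x) \le \frac{f(y-\varepsilon_2) - f(y)}{-\varepsilon_2}$. Letting now $\varepsilon_2 \to 0+$, we get $f'_+(x) \le f'_-(y)$. □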
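The monotonicity of the difference quotient (8.4) and the resulting one-sided derivatives can be observed on a concrete example. The sketch below is illustrative and not part of the original text: it assumes the convex function f(x) = |x| + x², which has a corner at x = 0, and uses dyadic step sizes so that the floating-point arithmetic is exact. The names `quotient`, `left`, and `right` are hypothetical.

```python
def f(x):
    return abs(x) + x * x        # convex, with a corner at x = 0

def quotient(x, eps):
    """Difference quotient (f(x + eps) - f(x)) / eps from (8.4)."""
    return (f(x + eps) - f(x)) / eps

x = 0.0
eps_values = [2.0 ** (-k) for k in range(0, 21)]     # 1, 1/2, ..., 2**-20

left = [quotient(x, -e) for e in eps_values]         # eps = -1, -1/2, ... -> 0-
right = [quotient(x, e) for e in eps_values]         # eps = 1, 1/2, ...  -> 0+

# Arrange the quotients by increasing eps: -1, ..., -2**-20, 2**-20, ..., 1
ordered = left + right[::-1]
is_increasing = all(p <= q for p, q in zip(ordered, ordered[1:]))

print(is_increasing)     # the quotient is increasing in eps on both sides of 0
print(left[-1])          # approximates the left derivative of f at 0, which is -1
print(right[-1])         # approximates the right derivative of f at 0, which is 1
```

Here the left derivative (−1) is strictly smaller than the right derivative (1), so f is not differentiable at 0, in agreement with the inequality between the one-sided derivatives in Proposition 811.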
Remark 812 1. Proposition 811 shows, in particular, that every convex function f on I is continuous on Int I . This can be deduced immediately from the existence of one-sided derivatives at such points, as this implies continuity from the left and from the right there. We shall prove in Proposition 813 that, even more, f is locally Lipschitz. Regarding this remark, see also Remark 812.8. 2. Observe that a convex function on I may be discontinuous at boundary points of I . An example is shown in Fig. 8.4. 3. A convex function f defined on a closed and bounded interval [a, b] is bounded. Indeed, the result is obviously true if a = b. Otherwise (see Fig. 8.5) observe
Fig. 8.5 The argument for the boundedness of f and the existence of limit at b (Remarks 812.3 and 812.8)
first that f must be bounded above on [a, b] (use Proposition 810) or, equivalently, that its graph in R² lies below or on the segment [(a, f(a)), (b, f(b))]. Assume that there exists a sequence $\{x_n\}_{n=1}^\infty$ in [a, b] such that $f(x_n) \le -n$ for all n ∈ N. The sequence $\{x_n\}_{n=1}^\infty$ has a convergent subsequence, say $\{x_{n_k}\}_{k=1}^\infty$. Let $x_0$ be its limit. If $x_0 \in (a, b)$ we reach a contradiction, due to the fact that f is continuous at $x_0$ (see Remark 812.1); thus $x_0 \in \{a, b\}$. Assume $x_0 = b$. Take x ∈ (a, b). There exists $k_0 \in N$ such that $x < x_{n_k}$ for $k \ge k_0$. Proposition 810 applied to the interval $[a, x_{n_k}]$ will be violated for k big enough, so we need to assume $x_0 = a$. An argument similar to the previous one proves this impossible. This shows the boundedness of f on [a, b].¹
4. Another consequence of Proposition 811 is that if f is a convex function on I, then the function $x \mapsto f'_+(x)$ on Int I is increasing. The same is true for the function $x \mapsto f'_-(x)$ on Int I. Indeed, given x < y in Int I, we have $f'_-(x) \le f'_+(x) \le f'_-(y) \le f'_+(y)$. If f is not differentiable at some x ∈ Int I, then $f'_-(x) < f'_+(x)$. In particular, $f'_-$ has a jump discontinuity at x, since $f'_-$ is increasing.
5. By Proposition 397, the number of jump discontinuities of any monotone function is finite or countably infinite. Together with Remark 812.4 above, this shows that every convex function f on I has at most a countable number of points of nondifferentiability (and those points are "corners" of the graph of f, see Fig. 8.6).² Proposition 811 has thus a clear geometrical meaning: Fix $x_0 \in$ Int I, and let a ∈ R be such that $f'_-(x_0) \le a \le f'_+(x_0)$. Then, given x ∈ Int I, we have
$$\frac{f(x) - f(x_0)}{x - x_0} \begin{cases} \le a, & \text{if } x < x_0,\\ \ge a, & \text{if } x > x_0,\end{cases}$$
i.e., $f(x) - f(x_0) \ge a(x - x_0)$ for all x ∈ Int I. In other terms, $a(x - x_0) + f(x_0) \le f(x)$ for all x ∈ Int I. This means that the graph of the affine function $x \mapsto s(x) := a(x - x_0) + f(x_0)$ (a straight line in R² that contains $(x_0, f(x_0))$) "touches" the epigraph of f "from below" at $(x_0, f(x_0))$. The linear function x ↦ ax is called a subdifferential of f at $x_0$, and the graph of the affine function $x \mapsto s(x) := a(x - x_0) + f(x_0)$ above is called a subtangent of f at $x_0$ (see again Fig. 8.6). Certainly, a convex function f may have, at some $x_0 \in$ Int I, many different subdifferentials, and it is customary to speak of the subdifferential of a convex function f at a point $x_0 \in$ Int I as of the (always nonempty) set of all subdifferentials of f at $x_0$. By Remark 812.4, the slopes of the subdifferentials of a convex function f : I → R at points in Int I form nonoverlapping intervals that "travel to the right" as the evaluation point in Int I increases. It is clear from the definition that a convex function is differentiable at $x_1 \in$ Int I if and only if the subdifferential of f at $x_1$ is a singleton. It is clear, too, that f is strictly convex if and only if any subtangent of f at any $x_1 \in$ Int I intersects the graph of f exactly at $(x_1, f(x_1))$ (see Fig. 8.6).
¹ In every infinite-dimensional Banach space X there exists an unbounded real-valued convex function defined on $B_X$ (see Example 906 below).
² For an example of a convex function on (0, 1) whose points of differentiability are, precisely, the irrational points of (0, 1) see Remark 812.10.
Fig. 8.6 The unique subtangent at $x_1$, where f is differentiable, several subtangents at a "corner" $x_0$, where f is not
6. Note, too, that Propositions 810 and 811 give that the function $f'_+$ is right-continuous on Int I, and that $f'_-$ is left-continuous there. Indeed, given x < y in Int I, and due to (8.2), (8.4), and the continuity of f at x, we have
$$\frac{f(y) - f(x)}{y - x} = \lim_{z\downarrow x} \frac{f(y) - f(z)}{y - z} \ge \lim_{z\downarrow x} f'_+(z), \quad \text{whenever } x < z < y. \qquad (8.6)$$
Letting y ↓ x in (8.6), we get $f'_+(x) \ge \lim_{z\downarrow x} f'_+(z)$. On the other hand, we proved in Remark 812.4 that $f'_+$ is increasing on Int I. This shows that $f'_+(x) \le \lim_{z\downarrow x} f'_+(z)$. The two inequalities together give $f'_+(x) = \lim_{z\downarrow x} f'_+(z)$. A similar argument proves that $f'_-(x) = \lim_{z\uparrow x} f'_-(z)$.
7. Observe that a convex function f on a general interval I is finitely piecewise monotone (in particular, of bounded variation). Precisely, there exist three nonoverlapping closed intervals (some of them possibly empty) $I_1$, $I_2$, and $I_3$, with $\bigcup_{j=1}^3 I_j = I$ and $x_1 < x_2 < x_3$ for all $x_j \in$ Int $I_j$, j = 1, 2, 3, such that f is strictly decreasing on $I_1$, constant on $I_2$, and strictly increasing on $I_3$. Indeed, put $J_1 := \{x \in \text{Int } I : f'_+(x) < 0\}$, $J_2 := \{x \in \text{Int } I : f'_+(x) = 0\}$, and $J_3 := \{x \in \text{Int } I : f'_+(x) > 0\}$. Proposition 811 implies that Int $I = \bigcup_{j=1}^3 J_j$. According to Remark 812.4, we have $x_1 < x_2 < x_3$ whenever $x_j \in J_j$, j = 1, 2, 3. It follows from Remark 812.6 that $J_1$ is open, $J_2$ is open to the right, and $J_3$ is also open to the right. A typical situation is depicted in Fig. 8.7. It is simple to prove that f is strictly decreasing on $J_1$, constant on $J_2$, and strictly
Fig. 8.7 The three intervals $J_1$, $J_2$, $J_3$ in Remark 812.7
increasing on $J_3$. For example, to prove the first assertion, assume that $x_1 < y_1$ in $J_1$. If $f(x_1) \le f(y_1)$ we can find $z_1 > y_1$ such that $f(y_1) > f(z_1)$, and this situation violates Proposition 810. The other statements are proved similarly. Finally, put $I_j := \overline{J_j}$ for j = 1, 2, 3.
8. Let f be a convex function defined on a closed and bounded interval [a, b]. Then $\lim_{x\downarrow a} f(x)$ and $\lim_{x\uparrow b} f(x)$ both exist as finite numbers. In particular, if we redefine f on a as $\lim_{x\downarrow a} f(x)$ and f on b as $\lim_{x\uparrow b} f(x)$, the resulting function defined on [a, b] is still convex, and it is continuous. The result is a straightforward consequence of the boundedness of the function f (Remark 812.3) and of its finite piecewise monotonicity (Remark 812.7).
9. Another consequence of Remark 812.6 is that if f : I → R is a convex function on an interval I and f is differentiable on Int I, then $f'$ is continuous on Int I. A direct proof of this result is as follows: From Proposition 811, we get that $f'$ is an increasing function on Int I, so it may have only jump discontinuities there. Moreover, as the derivative of a differentiable function, $f'$ has the intermediate value property. These two facts, together, show that $f'$ must be continuous on Int I.
10. There are convex functions on (0, 1) differentiable precisely at irrational points. An example is given in Exercise 13.519. ®
We refer to the next result as "the local Lipschitz property" of a convex function.
Proposition 813 Let f : I → R be a convex function. Then $f|_{[a,b]}$ is Lipschitz for every closed and bounded interval [a, b] ⊂ Int I.
Proof From (8.2) and Proposition 811 we get
$$f'_+(a) \le f'_+(x) \le \frac{f(y) - f(x)}{y - x} \le f'_-(y) \le f'_-(b), \quad \text{for all } a \le x < y \le b.$$
It follows that |f(x) − f(y)| ≤ K|x − y| for all x, y ∈ [a, b], where $K := \max\{|f'_+(a)|, |f'_-(b)|\}$. □
Remark 814 In general, even in the case that f is convex on a closed and bounded interval [a, b], we cannot conclude that f is Lipschitz on [a, b]. For example, consider the convex function $f(x) := 1 - \sqrt{1 - x^2}$ defined on [−1, 1]. Its non-Lipschitz character was proven in the paragraph after Proposition 559. For another example, consider $g(x) = -\sqrt{x}$ on [0, 1]. This is a convex function, as it follows
Fig. 8.8 Two non-Lipschitz convex functions on [0, 1] (Remark 814)
from computing $g''$ and applying Corollary 820. That g is not Lipschitz is proved in Exercise 13.285. The plots of those functions appear in Fig. 8.8. ®
Proposition 815 Let f be a real-valued continuous convex function on a closed and bounded interval [a, b]. Then f is absolutely continuous on [a, b].
Proof Remark 812.7 shows that f is finitely piecewise monotone (hence of bounded variation) on [a, b]. Observe that for any c ∈ (a, (a + b)/2), the interval [c, (a + b)/2] is contained in (a, b). Proposition 813 implies then that f is Lipschitz (and so absolutely continuous) on [c, (a + b)/2]. Use Exercise 13.283 to prove that f is absolutely continuous on [a, (a + b)/2]. A similar argument proves that f is absolutely continuous on [(a + b)/2, b], and the conclusion follows. □
Remark 816 Observe that Proposition 815 could be obtained (even in a stronger form, getting the Lipschitz property) in case that f could be extended to a convex function on an open interval containing [a, b], thanks to Proposition 813 above. However, this extension is not always possible (see Fig. 8.8). ®
The following result gives a useful "symmetric" criterion for a convex function to be differentiable at a certain point.
Proposition 817 A convex function f : I → R is differentiable at a point x ∈ Int I if and only if
$$\lim_{h\to 0} \frac{f(x+h) + f(x-h) - 2f(x)}{h} = 0. \qquad (8.7)$$
Proof Assume first that f is differentiable at x. Then
$$\lim_{h\to 0} \frac{f(x+h) - f(x)}{h} = f'(x) = \lim_{h\to 0} \frac{f(x-h) - f(x)}{-h}.$$
It is enough to compute the limit of the difference of the two quotients above to get (8.7).
Assume now that (8.7) holds. We can write
$$\frac{f(x+h) + f(x-h) - 2f(x)}{h} = \frac{f(x+h) - f(x)}{h} - \frac{f(x-h) - f(x)}{-h}. \qquad (8.8)$$
Since the two limits $\lim_{h\to 0+} \frac{f(x+h) - f(x)}{h}$ $(= f'_+(x))$ and $\lim_{h\to 0-} \frac{f(x+h) - f(x)}{h}$ $(= f'_-(x))$ exist, we conclude from (8.8) that they are equal, i.e., $f'_-(x) = f'_+(x)$, and so f is differentiable at x. □
We say that a function f : I → R is midpoint convex on an interval I when $f\left(\frac{x+y}{2}\right) \le \frac{f(x) + f(y)}{2}$ for all x, y ∈ I. Obviously, a convex function is midpoint convex. However, there are (discontinuous, see below) non-convex midpoint convex functions on R. This is connected to the existence of discontinuous solutions f to the functional equation f(x + y) = f(x) + f(y) for every x, y ∈ R, see (Stromb81, p.307). For continuous functions the converse statement is true, i.e., midpoint convexity implies convexity. This is shown in the following result.
Proposition 818 Let f : I → R be a midpoint convex and continuous function. Then f is convex.
Proof Fix x, y ∈ I. It is easy to prove, by induction, that given $x_1, x_2, \ldots, x_n \in I$, we have
$$f\left(\frac{x_1 + x_2 + \ldots + x_n}{n}\right) \le \frac{f(x_1) + f(x_2) + \ldots + f(x_n)}{n} \qquad (8.9)$$
if $n = 2^m$ for some m ∈ N. Let r ∈ N be such that 1 ≤ r ≤ n, and take $x_i = x$ for i = 1, 2, . . . , r, and $x_i = y$ otherwise. It follows from (8.9) that, for $n = 2^m$,
$$f\left(\frac{r}{n}x + \frac{n-r}{n}y\right) \le \frac{r}{n}f(x) + \frac{n-r}{n}f(y). \qquad (8.10)$$
Since every real number λ ∈ [0, 1] is the limit of a sequence $\{q_n\}_{n=1}^\infty$ of numbers in [0, 1] of the form $r/2^m$ for m ∈ N and $1 \le r \le 2^m$ (consider just the truncated binary expansion of λ), the continuity of f and (8.10) give f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y). □
The following result and its corollary are useful tests for convexity.
Proposition 819 Let I be an open interval in R, and let f : I → R be a differentiable function. Then f is convex if and only if $f'$ is an increasing function.
Proof That the condition is necessary follows from Proposition 811 (see also Remark 812.4). Assume now that $f'$ is increasing. If f is not convex, there exist x < z in I and λ ∈ (0, 1) such that f(λx + (1 − λ)z) > λf(x) + (1 − λ)f(z) (see Fig. 8.9). Put y := λx + (1 − λ)z. By the Mean Value Theorem 365, we can find u ∈ (x, y) and v ∈ (y, z) such that $f'(u) = (f(y) - f(x))/(y - x)$ and $f'(v) = (f(z) - f(y))/(z - y)$. This is a contradiction, since (f(y) − f(x))/(y − x) > (f(z) − f(y))/(z − y). □
Fig. 8.9 The proof of the sufficient condition in Proposition 819
Fig. 8.10 The function $f(x) := x^4$ is strictly convex, while $f''(0) = 0$
Corollary 820 Let I be an open interval and let f : I → R be a twice differentiable function. Then f is convex if and only if $f''(x) \ge 0$ for every x ∈ I.

Proof Assume that f is convex on I. From Remark 812.4 the function $f'$ on I is increasing. Since $f'$ is differentiable, this shows that $f''(x) \ge 0$ for all x ∈ I. Assume now that $f''(x) \ge 0$ for all x ∈ I. Then $f'$ is an increasing function, and the result follows from Proposition 819. □

Remark 821 In the situation of Corollary 820, note that $f''(x) > 0$ for all x ∈ I implies that f is strictly convex. Indeed, f is convex by Corollary 820; if it were not strictly convex, the graph of f would contain three collinear points, so, after subtracting a suitable affine function (which does not change $f''$), we would have f(x) = f(y) = f(z) for some x, y, z in I such that x < y < z. Rolle’s Theorem 364 then implies that $f'(u) = f'(v) = 0$ for some u ∈ (x, y) and v ∈ (y, z), and a second application of the same theorem gives ξ ∈ (u, v) such that $f''(\xi) = 0$, a contradiction. Observe that the converse fails. For example, the function f : R → R given by $f(x) := x^4$ for all x ∈ R is strictly convex, as can be proved easily (see Fig. 8.10). However, $f''(0) = 0$. ®

Another simple but useful result about convex functions that pertains to the theory of maxima and minima is the following.

Proposition 822 Let f be a convex function defined on a general interval I. If f has a local minimum at some point $x_0 \in \operatorname{Int} I$, then $x_0$ is a global minimum of f on I.

Proof Assume that $f(x_1) < f(x_0)$ for some $x_1 \in I$. Then
$$f(\lambda x_0 + (1-\lambda)x_1) \le \lambda f(x_0) + (1-\lambda)f(x_1) < \lambda f(x_0) + (1-\lambda)f(x_0) = f(x_0) \quad \text{for all } \lambda \in (0,1),$$
and this contradicts the fact that f has a local minimum at $x_0$, since λ ∈ (0, 1) can be taken arbitrarily close to 1, and thus the point $\lambda x_0 + (1-\lambda)x_1$ arbitrarily close to $x_0$. □

Remark 823 Observe that a convex function f defined on a general interval I cannot have strict local maxima at interior points of I. Moreover, if f has a local maximum at an interior point $x_0$ of I, then f is locally constant at $x_0$. Finally, if f has a global maximum at an interior point of I, then f is necessarily constant on I. The three statements can be easily proved from the fact that, in $R^2$, the graph of a convex function between two arbitrary points $x_1$ and $x_2$ of I is below or on the segment $[(x_1, f(x_1)), (x_2, f(x_2))]$ (see Fig. 8.2). ®
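The convexity tests above can be probed numerically. The following is only a sketch under stated assumptions: the helper names `second_difference` and `is_midpoint_convex_on_grid`, the step `h`, the tolerance `tol` and the finite grid are all illustrative choices, and a check on a finite grid is evidence, not a proof.

```python
def second_difference(f, x, h=1e-3):
    """Central second difference, a numerical stand-in for f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def is_midpoint_convex_on_grid(f, grid, tol=1e-12):
    """Check f((x+y)/2) <= (f(x)+f(y))/2 for every pair of grid points.

    The tolerance absorbs floating-point rounding (an assumption of this sketch).
    """
    return all(
        f((x + y) / 2.0) <= (f(x) + f(y)) / 2.0 + tol
        for x in grid
        for y in grid
    )


if __name__ == "__main__":
    grid = [-2.0 + 0.1 * i for i in range(41)]
    quartic = lambda x: x ** 4
    # x^4 passes the midpoint test although its second derivative vanishes at 0.
    print(is_midpoint_convex_on_grid(quartic, grid))
    print(second_difference(quartic, 0.0), second_difference(quartic, 1.0))
```

The quartic illustrates the last observation of Remark 821: the midpoint test succeeds on the grid while the approximate second derivative at 0 is essentially zero.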
8.2 Some Fundamental Inequalities

8.2.1 Jensen’s Inequality
The following result bears the name of the Danish mathematician J. Jensen.

Proposition 824 (Jensen’s inequality: finite form) A function f : I → R is convex if and only if, for every n ∈ N, $\{x_i\}_{i=1}^n \subset I$, and $\{w_i\}_{i=1}^n \subset (0, 1)$ such that $\sum_{i=1}^n w_i = 1$, we have
$$f\left(\sum_{i=1}^n w_i x_i\right) \le \sum_{i=1}^n w_i f(x_i). \tag{8.11}$$
Moreover, f is strictly convex if and only if equality in (8.11) happens precisely when all $x_1, x_2, \dots, x_n$ are equal.

Proof The sufficient condition for the convexity statement is obvious. The necessary condition can be proved easily by finite induction. However, it becomes more transparent to consider an arbitrary subdifferential x → ax of f at $x_0 := \sum_{i=1}^n w_i x_i$ (see Remark 812.5). Let $s(x) := a(x - x_0) + f(x_0)$. Since s(x) ≤ f(x) for all x ∈ I, we have in particular $s(x_i) \le f(x_i)$ for i = 1, 2, . . . , n. Moreover, $s(x_0) = f(x_0)$ and $\sum_{i=1}^n w_i = 1$. Thus, due to the fact that affine functions preserve convex combinations (see the note after Definition 809), we have
$$f(x_0) = s(x_0) = \sum_{i=1}^n w_i s(x_i) \le \sum_{i=1}^n w_i f(x_i), \tag{8.12}$$
and we obtain (8.11). The sufficient condition for the strict convexity statement is again obvious. Assume now that f is strictly convex and that two elements in $\{x_i\}_{i=1}^n$ are different. In the argument above, and due to the fact that s(x) < f(x) for all $x \ne x_0$ (see Remark 812.5), we obtain strict inequality in (8.12) and (8.11). □
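As a numerical illustration of (8.11), the sketch below computes the "Jensen gap" $\sum w_i f(x_i) - f(\sum w_i x_i)$. The names `jensen_gap` and `random_weights` are hypothetical helpers introduced here, and the random data are arbitrary; the gap should be nonnegative for a convex f and vanish when all points coincide.

```python
import math
import random


def jensen_gap(f, xs, ws):
    """Return sum(w_i * f(x_i)) - f(sum(w_i * x_i)), nonnegative for convex f."""
    mean = sum(w * x for w, x in zip(ws, xs))
    return sum(w * f(x) for w, x in zip(ws, xs)) - f(mean)


def random_weights(n, rng):
    """Positive weights that sum to 1 (hypothetical helper for this sketch)."""
    raw = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(5)]
    ws = random_weights(5, rng)
    print(jensen_gap(math.exp, xs, ws))         # strictly positive: points differ
    print(jensen_gap(math.exp, [0.7] * 5, ws))  # essentially zero: points coincide
```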
The following is the integral form of Jensen’s inequality. The reader will notice that the proofs of Propositions 824 and 825 are similar, the second being a “continuous” version of the first.

Proposition 825 (Jensen’s inequality: integral form) Let g be a real-valued Lebesgue integrable function defined on a general interval I ⊂ R, and let w : I → R be a bounded nonnegative integrable function on I with $\int_I w = 1$. Then, if f is a convex function on R, we have $f\left(\int_I gw\right) \le \int_I (f \circ g)w$ (for the second integral, the +∞ value is allowed).

Proof First of all, note that gw is Lebesgue integrable on I. Indeed, due to the fact that w is bounded, we may use Proposition 757. Note, however, that although (f ◦ g)w is measurable, due to the fact that f is continuous on the interior of I (see Proposition 813) and g is measurable (see Corollary 406), it is not Lebesgue integrable in general (see Remark 779.3). Let $x_0 := \int_I gw$. Choose a subdifferential x → ax for f at $x_0$ (see Remark 812.5). Then we have ax + b ≤ f(x) for all x ∈ R, where $b := f(x_0) - ax_0$. In particular, ag(t) + b ≤ f(g(t)) for all t ∈ I. Due to the fact that w ≥ 0, it follows that ag(t)w(t) + bw(t) ≤ f(g(t))w(t) for all t ∈ I, hence $\int_I (ag + b)w \le \int_I (f \circ g)w$. Since $\int_I w = 1$, we have
$$f\left(\int_I gw\right) = f(x_0) = ax_0 + b = a\int_I gw + b = \int_I (ag + b)w \le \int_I (f \circ g)w.$$
□

Remark 826 A simple change of the function under the integral sign allows us to estimate $f\left(\int_a^b g\right)$ whenever f is a convex function and g : [a, b] → R is a Lebesgue integrable function on [a, b]. Indeed, apply Proposition 825 to the function h(t) := (b − a)g(t) and w(t) := 1/(b − a) for t ∈ [a, b]. It follows that
$$f\left(\int_a^b g(t)\,dt\right) \le \frac{1}{b-a}\int_a^b f\big((b-a)g(t)\big)\,dt.$$
®

Several common functions encountered in analysis can be proved to be convex (or strictly convex) by using Corollary 820 (respectively, Remark 821).
8.2.2 Using the Exponential Function
The function exp x is strictly convex on R, and the function − ln x is strictly convex on (0, +∞). Both statements follow from Corollary 820 and Remark 821 (see Fig. 8.11 for fragments of the plots of these two functions). These two facts lead to some important inequalities.

Proposition 827 Fix n ∈ N. Then, given $\{a_i\}_{i=1}^n \subset (0, +\infty)$ and $\{w_i\}_{i=1}^n \subset [0, 1]$ such that $\sum_{i=1}^n w_i = 1$, we have
$$\prod_{i=1}^n a_i^{w_i} \le \sum_{i=1}^n w_i a_i, \tag{8.13}$$
Fig. 8.11 The functions exp x and − ln x on the interval [−3, 3]
and (8.13) is an equality if and only if all elements in $\{a_i\}_{i=1}^n$ are equal. In particular, if a, b > 0 and λ ∈ [0, 1], then
$$a^{\lambda} b^{1-\lambda} \le \lambda a + (1-\lambda)b, \tag{8.14}$$
and (8.14) is an equality if and only if a = b.

Proof The convexity of exp x and Proposition 824 give, for $\{x_i\}_{i=1}^n \subset R$,
$$\prod_{i=1}^n e^{w_i x_i} = \exp\left(\sum_{i=1}^n w_i x_i\right) \le \sum_{i=1}^n w_i e^{x_i}.$$
To obtain (8.13) it is enough to put $a_i := e^{x_i}$ for i = 1, 2, . . . , n. For x ≠ y in R, the strict convexity of exp x gives $e^{\lambda x}e^{(1-\lambda)y} = e^{\lambda x + (1-\lambda)y} < \lambda e^{x} + (1-\lambda)e^{y}$. Letting $a := e^x$ and $b := e^y$ we get the conclusion. □

Remark 828
1. A particular case of (8.13) is when $w_i = 1/n$ for i = 1, 2, . . . , n. We get, in this case,
$$\left(\prod_{i=1}^n a_i\right)^{1/n} \le \frac{a_1 + \dots + a_n}{n}, \quad \text{for } a_i > 0,\ i = 1, 2, \dots, n, \tag{8.15}$$
getting an equality if and only if all $a_i$ are equal. This says that the so-called geometric mean is less than (or equal to, if all elements coincide) the so-called arithmetic mean.
2. Observe that, putting λ = 1/p and 1 − λ = 1/q for λ ∈ (0, 1), we get from (8.14) that $a^{1/p} b^{1/q} \le (1/p)a + (1/q)b$. Thus, if $x := a^{1/p}$ and $y := b^{1/q}$, we get
$$xy \le \frac{1}{p}x^p + \frac{1}{q}y^q, \quad \text{for all } x, y > 0,\ p, q > 0, \text{ and } \frac{1}{p} + \frac{1}{q} = 1,$$
and the inequality is strict when $x^p \ne y^q$ (that is, when a ≠ b).
®
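The inequalities (8.13) and the one in Remark 828.2 are easy to test on concrete numbers. The sketch below is only illustrative; the function names `weighted_geometric_mean`, `weighted_arithmetic_mean` and `young_gap` are assumptions introduced here, not notation from the text.

```python
import math


def weighted_geometric_mean(a, w):
    """Left-hand side of (8.13): product of a_i ** w_i."""
    return math.prod(ai ** wi for ai, wi in zip(a, w))


def weighted_arithmetic_mean(a, w):
    """Right-hand side of (8.13): sum of w_i * a_i."""
    return sum(wi * ai for ai, wi in zip(a, w))


def young_gap(x, y, p):
    """x**p / p + y**q / q - x*y with 1/p + 1/q = 1; nonnegative for x, y > 0."""
    q = p / (p - 1.0)
    return x ** p / p + y ** q / q - x * y


if __name__ == "__main__":
    a, w = [1.0, 4.0, 16.0], [1.0 / 3.0] * 3
    print(weighted_geometric_mean(a, w), weighted_arithmetic_mean(a, w))
    print(young_gap(2.0, 4.0, 3.0))  # x**p == y**q here, so the gap is ~0
    print(young_gap(2.0, 2.0, 3.0))  # x == y but x**p != y**q: gap is positive
```

The last two calls show that equality is governed by $x^p = y^q$ rather than by x = y.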
Fig. 8.12 Some powers $x^r$ on [0, 1] (for 0 < r < 1 those functions are not convex)
8.2.3 Using Powers of x (Minkowski’s and Hölder’s Inequalities)
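For r > 1, the function $f(x) := x^r$ on [0, +∞) is strictly convex (see Proposition 819, Remark 821, and Fig. 8.12 for plots of those functions on [0, 1]). Therefore, for x, y > 0 and λ ∈ (0, 1), we have $(\lambda x + (1-\lambda)y)^r \le \lambda x^r + (1-\lambda)y^r$, and the equality holds if and only if x = y. Fix u, v > 0 and let x := u/λ and y := v/(1 − λ). This gives
$$(u + v)^r \le \lambda^{1-r}u^r + (1-\lambda)^{1-r}v^r, \tag{8.16}$$
and the equality holds if and only if u/λ = v/(1 − λ), i.e., λ = u/(u + v). We consider now two finite sequences $\{a_k\}_{k=1}^n$ and $\{b_k\}_{k=1}^n$ in (0, +∞). For each k = 1, 2, . . . , n, apply (8.16) to $a_k$ and $b_k$, and add the resulting inequalities to get
$$\sum_{k=1}^n (a_k + b_k)^r \le \lambda^{1-r}\sum_{k=1}^n a_k^r + (1-\lambda)^{1-r}\sum_{k=1}^n b_k^r. \tag{8.17}$$
Letting
$$\lambda := \frac{\left(\sum_{k=1}^n a_k^r\right)^{1/r}}{\left(\sum_{k=1}^n a_k^r\right)^{1/r} + \left(\sum_{k=1}^n b_k^r\right)^{1/r}}, \quad \text{hence} \quad 1 - \lambda = \frac{\left(\sum_{k=1}^n b_k^r\right)^{1/r}}{\left(\sum_{k=1}^n a_k^r\right)^{1/r} + \left(\sum_{k=1}^n b_k^r\right)^{1/r}},$$
in (8.17), we obtain
$$\left(\sum_{k=1}^n (a_k + b_k)^r\right)^{1/r} \le \left(\sum_{k=1}^n a_k^r\right)^{1/r} + \left(\sum_{k=1}^n b_k^r\right)^{1/r}. \tag{8.18}$$
Inequality (8.18) is called Minkowski’s inequality (after the name of the German mathematician H. Minkowski). It is an equality if and only if the equality case of (8.16) holds for every k with the same λ, i.e., if and only if the two sequences are proportional: there is c > 0 such that $a_k = c\,b_k$ for all k = 1, 2, . . . , n.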
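A quick numerical look at (8.18), including the equality case for proportional sequences. This is a minimal sketch; `p_norm` and `minkowski_gap` are hypothetical helper names, and the sample vectors are arbitrary.

```python
def p_norm(v, r):
    """(sum |v_k|^r)^(1/r), the quantity appearing on both sides of (8.18)."""
    return sum(abs(x) ** r for x in v) ** (1.0 / r)


def minkowski_gap(a, b, r):
    """Right-hand side minus left-hand side of (8.18); nonnegative for r > 1."""
    s = [x + y for x, y in zip(a, b)]
    return p_norm(a, r) + p_norm(b, r) - p_norm(s, r)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    print(minkowski_gap(a, [3.0, 1.0, 2.0], 3.0))  # positive: not proportional
    print(minkowski_gap(a, [2.0, 4.0, 6.0], 3.0))  # ~0: b = 2a
```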
The same argument applied to integrals instead of sums gives, for integrable functions f and g on I,
$$\left(\int_I |f + g|^r\right)^{1/r} \le \left(\int_I |f|^r\right)^{1/r} + \left(\int_I |g|^r\right)^{1/r}. \tag{8.19}$$
Apply now Jensen’s inequality (8.11) to the strictly convex function $x^q$ (where q > 1). We get, for $x_1, \dots, x_n$ in (0, +∞) and $w_1, \dots, w_n$ in (0, 1) such that $\sum_{k=1}^n w_k = 1$,
$$\left(\sum_{k=1}^n w_k x_k\right)^q \le \sum_{k=1}^n w_k x_k^q. \tag{8.20}$$
Given $s_1, \dots, s_n$ in (0, +∞), let $w_k := s_k / \sum_{k=1}^n s_k$ (> 0) for k = 1, 2, . . . , n. Then $\sum_{k=1}^n w_k = 1$. Equation (8.20) then reads
$$\left(\frac{\sum_{k=1}^n s_k x_k}{\sum_{k=1}^n s_k}\right)^q \le \frac{\sum_{k=1}^n s_k x_k^q}{\sum_{k=1}^n s_k},$$
i.e.,
$$\sum_{k=1}^n s_k x_k \le \left(\sum_{k=1}^n s_k\right)^{(q-1)/q}\left(\sum_{k=1}^n s_k x_k^q\right)^{1/q}. \tag{8.21}$$
Let us change variables in such a way that the powers of the summands on the right-hand side of the former inequality would be, respectively, q/(q − 1) and q, i.e., put $s_k = a_k^{q/(q-1)}$ and $s_k x_k^q = b_k^q$, for k = 1, 2, . . . , n. Therefore, (8.21) becomes, after noticing that $s_k x_k = a_k b_k$ for all k, and putting p := q/(q − 1),
$$\sum_{k=1}^n a_k b_k \le \left(\sum_{k=1}^n a_k^p\right)^{1/p}\left(\sum_{k=1}^n b_k^q\right)^{1/q}, \quad \text{where } 1 < p, \text{ and } \frac{1}{p} + \frac{1}{q} = 1. \tag{8.22}$$
Inequality (8.22) is known as Hölder’s inequality (after the name of the German mathematician O. Hölder). It turns into an equality if and only if all the $x_k$ in (8.20) are equal, i.e., if and only if there is c > 0 such that $b_k^q = c\,a_k^p$ for all k = 1, 2, . . . , n. The same argument applied to integrals instead of sums gives, for measurable functions f and g on I such that $|f|^p$ and $|g|^q$ are both integrable on I,
$$\int_I |fg| \le \left(\int_I |f|^p\right)^{1/p}\left(\int_I |g|^q\right)^{1/q}, \quad \text{where } 1 < p, \text{ and } \frac{1}{p} + \frac{1}{q} = 1. \tag{8.23}$$
The case p = q = 2 is known as the Cauchy–Schwarz inequality (named after A. L. Cauchy and the German mathematician K. H. A. Schwarz). It is worth stating it separately (Proposition 829). Later, in Sect. 11.1, we shall prove a general
Cauchy–Schwarz inequality valid in every vector space endowed with an inner product (Theorem 958). The present inequality (for the Euclidean space $R^n$) is a particular case of that statement.

Proposition 829 (Cauchy–Schwarz inequality) Let $\{a_k\}_{k=1}^n$ and $\{b_k\}_{k=1}^n$ be two sets of real numbers. Then
$$\sum_{k=1}^n a_k b_k \le \left(\sum_{k=1}^n a_k^2\right)^{1/2}\left(\sum_{k=1}^n b_k^2\right)^{1/2}. \tag{8.24}$$
The inequality turns out to be an equality if and only if one of the two vectors $(a_k)_{k=1}^n$, $(b_k)_{k=1}^n$ is a nonnegative multiple of the other. Given two measurable functions f and g on a general interval I such that $|f|^2$ and $|g|^2$ are both integrable, then |fg| is integrable on I and
$$\int_I |fg| \le \left(\int_I |f|^2\right)^{1/2}\left(\int_I |g|^2\right)^{1/2}. \tag{8.25}$$
Remark 830 For p ≥ 1 and a general interval I, denote by $L_p(I)$ the vector space of all measurable real-valued functions defined on I such that $|f|^p$ is Lebesgue integrable. A simple consequence of (8.23) is that, in case that I is a bounded interval, and p > q ≥ 1, then $L_p(I) \subset L_q(I)$. Indeed, due to the fact that p/q > 1 we can apply (8.23) to the functions $|f|^q$ and 1, with exponents p/q and p/(p − q), to get, if $f \in L_p(I)$,
$$\int_I |f|^q \le \left(\int_I |f|^p\right)^{q/p}\left(\int_I 1^{p/(p-q)}\right)^{(p-q)/p} = \lambda(I)^{(p-q)/p}\left(\int_I |f|^p\right)^{q/p}.$$
Thus $|f|^q$ is Lebesgue integrable on I, i.e., $f \in L_q(I)$, and
$$\|f\|_q \le (\lambda(I))^{(p-q)/pq}\,\|f\|_p. \tag{8.26}$$
®
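Hölder’s inequality (8.22) and its equality case can be checked on small vectors. The sketch below is a hedged illustration only: `holder_gap` is a hypothetical name, and the inputs are assumed to be positive so that the real powers are well defined.

```python
def holder_gap(a, b, p):
    """Right-hand side minus left-hand side of (8.22), with q = p / (p - 1).

    Assumes p > 1 and positive entries in a and b.
    """
    q = p / (p - 1.0)
    lhs = sum(x * y for x, y in zip(a, b))
    rhs = sum(x ** p for x in a) ** (1.0 / p) * sum(y ** q for y in b) ** (1.0 / q)
    return rhs - lhs


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    print(holder_gap(a, [2.0, 1.0, 4.0], 3.0))  # positive
    print(holder_gap(a, [1.0, 4.0, 9.0], 3.0))  # ~0: b_k**q equals a_k**p
    print(holder_gap([1.0, 2.0], [3.0, 6.0], 2.0))  # ~0: Cauchy-Schwarz, b = 3a
```

In the second call $b_k = a_k^2$, so $b_k^{3/2} = a_k^3$, which is the proportionality that forces equality.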
Chapter 9
Fourier Series
This chapter deals with Fourier Analysis —a theory that bears the name of the French mathematician and physicist Joseph Fourier, who initiated the systematic approach to it in order to explain the analytic theory of heat—, i.e., to represent functions f defined on R as the sum of a series whose terms are simple trigonometric functions, i.e., the nowadays called Fourier series of f .
9.1 Introduction
To represent a function f as a Fourier series, i.e., a series of the form $\frac{a_0}{2} + \sum_{n=1}^{\infty}(a_n \cos nx + b_n \sin nx)$, may look difficult to achieve, since f is not, in general, 2π-periodic, while trigonometric (…) functions are. This is not a big problem if f is supposed to be defined on a closed and bounded interval. A simple change of variable allows us to assume that this interval is [0, 2π], and, after redefining f at the endpoints, f can then be extended to a 2π-periodic function on R. The case of non-periodic functions defined on R can be treated in a related way, by using integrals instead of series.

We have to determine precisely what “to be represented by a (trigonometric) series” means (the reader is aware that there are many kinds of convergence). Another, computational, problem is to determine the “coefficients” in the given series. This had been worked out, prior to Fourier’s work, by Euler, who used the orthogonality of the trigonometric system to give a simple description of the form of those coefficients.

It is not an overstatement to say that Fourier analysis has been the seed for many, if not most, of the modern theories in analysis (real and complex analysis, harmonic analysis, and the theory of ordinary and partial differential equations among others), topology, functional analysis (including Banach space theory, operator theory, Banach algebras, distribution theory, etc.), set theory, number theory, combinatorics, statistics, measure theory and linear algebra. This is only a partial list in pure mathematics. The list of topics in applied mathematics and engineering that have been influenced by Fourier analysis is even longer (theory of communication, wave transmission, codification, signal processing, imaging, cryptography, option pricing, acoustics, oceanography, optics, diffraction, numerical analysis, computing, time-frequency analysis, wavelets, etc.).
The subject is better stated and developed in the framework of complex-valued real-variable functions. A succinct presentation of some of the basics on the complex-number system is given in Sect. 12.5, and a brief report on normed and Hilbert spaces over the complex number system is included in Chap. 11.
9.2 Some Elementary Trigonometric Identities
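Fix x ∈ R, x ≠ 2mπ for m ∈ Z, and n = 0, 1, 2, . . . . Observe that $\{e^{ikx}\}_{k=0}^n$ is a finite geometric progression. Hence
$$\sum_{k=0}^n e^{ikx} = \frac{e^{inx}e^{ix} - 1}{e^{ix} - 1} = \frac{e^{i\frac{n+1}{2}x}\left(e^{i\frac{n+1}{2}x} - e^{-i\frac{n+1}{2}x}\right)}{e^{i\frac{x}{2}}\left(e^{i\frac{x}{2}} - e^{-i\frac{x}{2}}\right)} = e^{i\frac{n}{2}x}\,\frac{\sin\frac{n+1}{2}x}{\sin\frac{x}{2}}. \tag{9.1}$$
Taking the real and imaginary parts in (9.1), we get
$$\sum_{k=0}^n \cos kx = \cos\frac{n}{2}x\;\frac{\sin\frac{n+1}{2}x}{\sin\frac{x}{2}}, \qquad \sum_{k=0}^n \sin kx = \sin\frac{n}{2}x\;\frac{\sin\frac{n+1}{2}x}{\sin\frac{x}{2}}. \tag{9.2}$$
Analogously, we have, for x ∈ R, x ≠ 2mπ for m ∈ Z, and n = 0, 1, 2, . . . ,
$$\sum_{k=-n}^n e^{ikx} = \frac{e^{i(n+1)x} - e^{-inx}}{e^{ix} - 1} = \frac{e^{i(n+\frac{1}{2})x} - e^{-i(n+\frac{1}{2})x}}{e^{ix/2} - e^{-ix/2}} = \frac{\sin\left(n+\frac{1}{2}\right)x}{\sin\frac{x}{2}}, \tag{9.3}$$
where the second equality follows by multiplying numerator and denominator by $e^{-ix/2}$. Finally,
$$\sum_{k=0}^n e^{(k+\frac{1}{2})ix} = e^{\frac{ix}{2}}\,\frac{e^{(n+1)ix} - 1}{e^{ix} - 1} = \frac{e^{(n+1)ix} - 1}{2i\sin\frac{x}{2}}.$$
Taking the real and imaginary parts we obtain
$$\sum_{k=0}^n \cos\left(k + \frac{1}{2}\right)x = \frac{\sin(n+1)x}{2\sin\frac{x}{2}}, \tag{9.4}$$
$$\sum_{k=0}^n \sin\left(k + \frac{1}{2}\right)x = \frac{1 - \cos(n+1)x}{2\sin\frac{x}{2}} = \frac{\sin^2\left(\frac{n+1}{2}\right)x}{\sin\frac{x}{2}}. \tag{9.5}$$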
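The closed forms (9.1), (9.3) and (9.5) can be compared with the sums they represent. The following sketch does so for one arbitrary choice of n and x; the function names are hypothetical and introduced only for this check.

```python
import cmath
import math


def geometric_sum(n, x):
    """Left-hand side of (9.1): sum of exp(i k x) for k = 0..n."""
    return sum(cmath.exp(1j * k * x) for k in range(n + 1))


def closed_form(n, x):
    """Right-hand side of (9.1); requires x not a multiple of 2*pi."""
    return cmath.exp(1j * n * x / 2) * math.sin((n + 1) * x / 2) / math.sin(x / 2)


def symmetric_sum(n, x):
    """Left-hand side of (9.3): sum of exp(i k x) for k = -n..n."""
    return sum(cmath.exp(1j * k * x) for k in range(-n, n + 1))


if __name__ == "__main__":
    n, x = 7, 0.9
    print(abs(geometric_sum(n, x) - closed_form(n, x)))
    print(symmetric_sum(n, x).real - math.sin((n + 0.5) * x) / math.sin(x / 2))
```

Both printed differences should be at the level of floating-point rounding.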
The following inequality will be frequently used later.

Lemma 831 For every x ∈ R and every n ∈ N, we have
$$\left|\sum_{k=1}^n \frac{\sin kx}{k}\right| \le 1 + 2\pi. \tag{9.6}$$

Proof First of all, use the Abel partial summation Lemma 179 for $a_n := \sin nx$ and $b_n := 1/n$ to get, for each 0 < x < 2π and 0 < m ≤ n,
$$\sum_{k=m}^n \frac{\sin kx}{k} = \sum_{k=m}^{n-1} A_k\left(\frac{1}{k} - \frac{1}{k+1}\right) + \frac{1}{n}A_n - \frac{1}{m}A_{m-1} = \sum_{k=m}^{n} A_k\left(\frac{1}{k} - \frac{1}{k+1}\right) + \frac{1}{n+1}A_n - \frac{1}{m}A_{m-1}, \tag{9.7}$$
where $A_n := \sum_{k=0}^n \sin kx$ for n = 0, 1, 2, . . . . Note that, from (9.2), $|A_n| \le 1/\sin(x/2)$ for n = 0, 1, 2, . . . , due to the fact that sin(x/2) > 0 for 0 < x < 2π. This gives
$$\left|\sum_{k=m}^n \frac{\sin kx}{k}\right| \le \sum_{k=m}^{n} |A_k|\left(\frac{1}{k} - \frac{1}{k+1}\right) + \frac{1}{n+1}|A_n| + \frac{1}{m}|A_{m-1}| \le \frac{1}{\sin(x/2)}\left(\frac{1}{m} - \frac{1}{n+1}\right) + \frac{1}{\sin(x/2)}\,\frac{1}{n+1} + \frac{1}{\sin(x/2)}\,\frac{1}{m} = \frac{2/m}{\sin(x/2)}. \tag{9.8}$$
Note that in order to prove the desired inequality, it suffices to assume that 0 < x ≤ π, as sin kx = − sin k(2π − x). For these values of x, we have
$$|\sin kx| \le kx, \quad \text{and} \quad \sin\frac{x}{2} \ge \frac{1}{\pi}x. \tag{9.9}$$
Thus, by using (9.8) (with m the smallest integer such that m ≥ 1/x) and (9.9), we have
$$\left|\sum_{k=1}^n \frac{\sin kx}{k}\right| \le \sum_{1 \le k < \frac{1}{x}} \frac{kx}{k} + \left|\sum_{\frac{1}{x} \le k \le n} \frac{\sin kx}{k}\right| \le x \cdot \frac{1}{x} + \frac{2x}{\pi^{-1}x} = 1 + 2\pi.$$
□

Lebesgue integrals are denoted by $\int_{[a,b]} f$, and Riemann integrals, indistinctly, by $\int_a^b f$ or $\int_{[a,b]} f$ (see Theorem 783), although in this second case we shall try to stick to the notation $\int_a^b f$ in order to stress that in fact f is Riemann integrable. The following result will be applied several times in this section.
Fig. 9.1 Some of the functions in Lemma 832
Lemma 832 Fix δ ∈ (0, π). Then
$$\int_0^{\delta} \frac{\sin \alpha t}{t}\,dt \to \frac{\pi}{2} \quad \text{as } \alpha \to +\infty. \tag{9.10}$$

Proof The function $g(t) := \frac{\sin \alpha t}{t}$ is continuous on (0, δ] and has a limit when t → 0+, precisely g(0+) = α (see Fig. 9.1 for the plot of some of the functions). Then, if we define g(0) = α, the resulting function is continuous on [0, δ]. The change of variable y = αt gives
$$\int_0^{\delta} \frac{\sin \alpha t}{t}\,dt = \int_0^{\alpha\delta} \frac{\sin y}{y}\,dy.$$
When α → +∞, the last integral converges to the improper Riemann integral $\int_0^{+\infty} \frac{\sin y}{y}\,dy$ (= π/2) (see Example 806; see also Exercise 13.467). This concludes the proof. □
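The limit (9.10) can be observed numerically. The sketch below uses a plain midpoint rule, which is an assumption of this illustration and not a method from the text; the name `sinc_integral` and the number of steps are arbitrary choices. The midpoint nodes avoid t = 0, so the removable singularity needs no special treatment.

```python
import math


def sinc_integral(alpha, delta, steps=200_000):
    """Midpoint-rule approximation of the integral of sin(alpha*t)/t over [0, delta]."""
    h = delta / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h  # midpoints never hit t = 0
        total += math.sin(alpha * t) / t
    return total * h


if __name__ == "__main__":
    for alpha in (5.0, 50.0, 500.0):
        print(alpha, sinc_integral(alpha, 1.0) - math.pi / 2)
```

The printed differences shrink roughly like 1/(αδ), in line with the change of variable y = αt used in the proof.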
9.3 The Fourier Series of 2π-periodic Lebesgue Integrable Functions

In the remainder of this chapter, we shall consider real or complex-valued functions f defined on [0, 2π]. In case that f(0) = f(2π) we may always consider its 2π-periodic extension $\tilde{f}$ to the whole of R. By a 2π-periodic extension of f we mean a function $\tilde{f}$ defined on R that coincides with f on [0, 2π] and such that $\tilde{f}(x + 2\pi) = \tilde{f}(x)$ for all x ∈ R. In order to avoid cumbersome notation, the 2π-periodic extension of f will be denoted again by f, if there is no possibility of misunderstanding. Recall that L[0, 2π] is the space of all Lebesgue integrable scalar-valued functions on the interval [0, 2π]. If the values of f ∈ L[0, 2π] at the interval end-points are irrelevant, we may assume that f(0) = f(2π), and identify f with its 2π-periodic extension to R.
Note that under 2π-periodicity of f, a simple change of variable (see Exercise 13.525) shows that
$$\int_{[0,2\pi]} f(t)\,dt = \int_I f(t)\,dt, \quad \text{for every interval } I \text{ of length } 2\pi. \tag{9.11}$$
Observe, too, that the product of two continuous and 2π-periodic scalar-valued functions defined on R is again continuous and 2π-periodic. Consider the continuous 2π-periodic functions 1, sin nx, and cos nx, for n ∈ N, defined on R. Note that, for n, m ∈ N,
$$\left.\begin{aligned}
&\int_0^{2\pi} 1\,dx = 2\pi,\\
&\int_0^{2\pi} \sin nx\,dx = \int_0^{2\pi} \cos nx\,dx = 0,\\
&\int_0^{2\pi} \sin nx \cos mx\,dx = \frac{1}{2}\int_0^{2\pi} \big(\sin (n+m)x + \sin (n-m)x\big)\,dx = 0,\\
&\int_0^{2\pi} \cos nx \cos nx\,dx = \int_0^{2\pi} \cos^2 nx\,dx = \frac{1}{2}\int_0^{2\pi} (1 + \cos 2nx)\,dx = \pi,\\
&\int_0^{2\pi} \sin nx \sin nx\,dx = \int_0^{2\pi} \sin^2 nx\,dx = \frac{1}{2}\int_0^{2\pi} (1 - \cos 2nx)\,dx = \pi.
\end{aligned}\right\} \tag{9.12}$$
Similarly for other combinations of sin nx and cos nx. Relations (9.12) are central in the theory. They show that, if we endow the space of continuous 2π-periodic functions on R with the scalar product¹ given by
$$\langle f, g\rangle := \int_0^{2\pi} f(x)\overline{g(x)}\,dx, \quad \text{for } f, g \text{ continuous and } 2\pi\text{-periodic on } R, \tag{9.13}$$
where the upper bar denotes complex conjugation, then the system
$$CS := \left\{\frac{1}{\sqrt{2\pi}}, \frac{\cos x}{\sqrt{\pi}}, \frac{\sin x}{\sqrt{\pi}}, \dots, \frac{\cos nx}{\sqrt{\pi}}, \frac{\sin nx}{\sqrt{\pi}}, \dots\right\} \tag{9.14}$$
is orthonormal (a combination of the two words orthogonal and normalized), i.e.,
$$\langle f, g\rangle = 0, \text{ and } \langle f, f\rangle = 1, \quad \text{if } f, g \in CS,\ f \ne g. \tag{9.15}$$
Note that, in case that the functions f and g are real-valued (as in the system (9.14)), the scalar product ⟨f, g⟩ in (9.13) reads $\langle f, g\rangle = \int_{[0,2\pi]} f(x)g(x)\,dx$. The analogy with the case of $R^n$, where we have an orthonormal system $\{e_1, e_2, \dots, e_n\}$ that allows us to “expand” any vector x in $R^n$ as a sum $\sum_{i=1}^n \langle x, e_i\rangle e_i$, suggests associating to any f ∈ L[0, 2π] its expansion in terms of the system (9.14),

¹ The general definition of a scalar product (also called an inner product) will be given in Chap. 11, Definition 957. To check that formula (9.13) defines indeed a scalar product on the space of all continuous and 2π-periodic scalar valued functions on R is routine, except maybe to verify that ⟨f, f⟩ = 0 implies f = 0. The details are provided in Exercise 13.418.
meaning by that an expression of the form $a_0/2 + \sum_{n=1}^{\infty}(a_n \cos nx + b_n \sin nx)$ (the mysterious form $a_0/2$ for the “independent” term will be understood later, see the paragraph after Eq. (9.18)). For the moment, this is just a formal expression, so we prefer to write the association of f to the expansion above as
$$f(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty}(a_n \cos nx + b_n \sin nx), \tag{9.16}$$
where nothing is said about the convergence or divergence of the series in (9.16). As above, the orthogonality of the system (9.14) suggests the following heuristic argument to determine the coefficients in the series in (9.16): Note first that the functions f(x) sin mx and f(x) cos mx are Lebesgue integrable on [0, 2π] by Proposition 757. By using formulas (9.12), (9.16), and assuming the interchange of the integral and the infinite sum allowed, we have, for m ∈ N ∪ {0},
$$\int_{[0,2\pi]} f(x)\cos mx\,dx = \frac{a_0}{2}\int_0^{2\pi} \cos mx\,dx + \sum_{n=1}^{\infty}\left(a_n\int_0^{2\pi} \cos nx \cos mx\,dx + b_n\int_0^{2\pi} \sin nx \cos mx\,dx\right) = a_m\pi,$$
and for m ∈ N,
$$\int_{[0,2\pi]} f(x)\sin mx\,dx = \frac{a_0}{2}\int_0^{2\pi} \sin mx\,dx + \sum_{n=1}^{\infty}\left(a_n\int_0^{2\pi} \cos nx \sin mx\,dx + b_n\int_0^{2\pi} \sin nx \sin mx\,dx\right) = b_m\pi.$$
Hence the following values are proposed as the coefficients:
$$a_n = \frac{1}{\pi}\int_{[0,2\pi]} f(x)\cos nx\,dx, \quad \text{for } n \in N \cup \{0\}, \tag{9.17}$$
$$b_n = \frac{1}{\pi}\int_{[0,2\pi]} f(x)\sin nx\,dx, \quad \text{for } n \in N. \tag{9.18}$$
The reader will now appreciate that putting $a_0/2$ as the constant term in the series in (9.16) allows us to use a single formula for all $a_n$, n ∈ N ∪ {0}.
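Formulas (9.17) and (9.18) lend themselves to a direct numerical approximation. The sketch below is a minimal illustration that replaces the Lebesgue integrals by a midpoint rule, which is an assumption suitable only for reasonably regular f; the name `fourier_coefficients` and the step count are hypothetical choices.

```python
import math


def fourier_coefficients(f, n_max, steps=20_000):
    """Approximate a_0..a_{n_max} and b_1..b_{n_max} from (9.17)-(9.18).

    Returns two lists indexed by n; b[0] is a placeholder equal to 0.0.
    """
    h = 2.0 * math.pi / steps
    nodes = [(j + 0.5) * h for j in range(steps)]
    values = [f(x) for x in nodes]
    a = [
        sum(v * math.cos(n * x) for v, x in zip(values, nodes)) * h / math.pi
        for n in range(n_max + 1)
    ]
    b = [0.0] + [
        sum(v * math.sin(n * x) for v, x in zip(values, nodes)) * h / math.pi
        for n in range(1, n_max + 1)
    ]
    return a, b


if __name__ == "__main__":
    # For f(x) = x on [0, 2*pi): a_0 = 2*pi, a_n = 0 and b_n = -2/n for n >= 1.
    a, b = fourier_coefficients(lambda x: x, 3)
    print(a)
    print(b)
```

For f(x) = x, integration by parts gives the values quoted in the comment, so the example doubles as a check of the quadrature.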
By an abuse of notation (indeed, note that formula (9.13) does not² define a scalar product on L[0, 2π]), we may write
$$a_n = \frac{1}{\sqrt{\pi}}\left\langle f, \frac{\cos nx}{\sqrt{\pi}}\right\rangle,\ n \in N \cup \{0\}, \qquad b_n = \frac{1}{\sqrt{\pi}}\left\langle f, \frac{\sin nx}{\sqrt{\pi}}\right\rangle,\ n \in N.$$

Definition 833 The (trigonometric form of the) Fourier series of a function f ∈ L[0, 2π] is the series in (9.16), where the coefficients $\{a_0, a_1, a_2, \dots, b_1, b_2, \dots\}$ (called the Fourier coefficients of f (trigonometric form)) are given by (9.17) and (9.18).

There is a way to use the (complex) exponential function instead of the two real-valued functions cos x and sin x. For this, note that $e^{ikx} = \cos kx + i \sin kx$ (hence $\cos kx = \frac{e^{ikx} + e^{-ikx}}{2}$, and $\sin kx = \frac{e^{ikx} - e^{-ikx}}{2i}$) for all k ∈ Z. The system of continuous 2π-periodic functions
$$E := \left\{\frac{e^{inx}}{\sqrt{2\pi}}\right\}_{n \in Z} \tag{9.19}$$
is orthonormal for the scalar product defined in (9.13), as it is simple to prove (see Exercise 13.524). The association (9.16) becomes
$$f(x) \sim \sum_{n=-\infty}^{+\infty} c_n e^{inx} \tag{9.20}$$
(called the complex or exponential form of the Fourier series of f. We insist again that the series in (9.20) is merely formal, since no convergence is assumed at this stage). The values of the coefficients $c_n$, n ∈ Z (called the Fourier coefficients in their exponential form) are deduced immediately from the expressions (9.17) and (9.18), since it follows by identification that
$$c_0 = a_0/2, \quad c_n = \frac{1}{2}(a_n - ib_n) \text{ for } n > 0, \quad \text{and} \quad c_n = \frac{1}{2}(a_{-n} + ib_{-n}) \text{ for } n < 0. \tag{9.21}$$
In integral form, it follows from (9.17), (9.18), and (9.21), that
$$c_n = \frac{1}{2\pi}\int_{[0,2\pi]} f(x)e^{-inx}\,dx, \quad \text{for } n \in Z \tag{9.22}$$
² Indeed, ⟨f, f⟩ = 0 does not imply that f = 0. This should not be a problem if we identify functions that are equal almost everywhere, in view of Corollary 761. The problem is that, in general, the product of two Lebesgue integrable functions is not Lebesgue integrable, as we showed in Remark 779.3.
(note again that the functions $f(x)e^{inx}$, for n ∈ Z, are Lebesgue integrable on [0, 2π] by Proposition 757). Again by an abuse of notation we can write
$$c_n = \frac{1}{\sqrt{2\pi}}\left\langle f, \frac{e^{inx}}{\sqrt{2\pi}}\right\rangle, \quad \text{for } n \in Z.$$

Definition 834 The (exponential form of the) Fourier series of a function f ∈ L[0, 2π] is the series in (9.20), where the coefficients $\{c_n : n \in Z\}$ (called the Fourier coefficients of f (exponential form)) are given by (9.22).

Remark 835 It follows from the expression of the Fourier coefficients (9.17) and (9.18) that, if f is an even function (i.e., if f(x) = f(−x) for all x ∈ R), then $b_n = 0$ for all n ∈ N. On the other hand, if f is an odd function (i.e., if f(x) = −f(−x) for all x ∈ R), then $a_n = 0$ for all n ∈ N ∪ {0}. ®

Remark 836 The full theory of Fourier series can be developed for l-periodic functions defined on R, where l > 0. A simple change of variable allows us to transfer the situation to the presented [0, 2π]-case. For the sake of completeness, we provide here the argument and we list the resulting formulas regarding an l-periodic function f defined on [a, b], where a < b and l = b − a. Put t = φ(x) := a + (x/2π)(b − a) and g(x) = f(φ(x)) for x ∈ [0, 2π], so x = 2π(t − a)/(b − a) for t ∈ [a, b]. Then we have
$$f(t) = g(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty}(a_n \cos nx + b_n \sin nx) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left(a_n \cos 2\pi n\frac{t-a}{b-a} + b_n \sin 2\pi n\frac{t-a}{b-a}\right),$$
where
$$a_n = \frac{1}{\pi}\int_{[0,2\pi]} g(x)\cos nx\,dx = \frac{2}{b-a}\int_{[a,b]} f(t)\cos 2\pi n\frac{t-a}{b-a}\,dt, \quad n = 0, 1, 2, \dots \tag{9.24}$$
$$b_n = \frac{1}{\pi}\int_{[0,2\pi]} g(x)\sin nx\,dx = \frac{2}{b-a}\int_{[a,b]} f(t)\sin 2\pi n\frac{t-a}{b-a}\,dt, \quad n = 1, 2, \dots. \tag{9.25}$$
If we use the exponential form,
$$f(t) = g(x) \sim \sum_{n=-\infty}^{+\infty} c_n e^{inx} = \sum_{n=-\infty}^{+\infty} c_n \exp\left(i2\pi n\frac{t-a}{b-a}\right), \tag{9.23}$$
Fig. 9.2 Changing the variable (Remark 836)
where
$$c_n = \frac{1}{2\pi}\int_{[0,2\pi]} g(x)e^{-inx}\,dx = \frac{1}{b-a}\int_{[a,b]} f(t)\exp\left(-i2\pi n\frac{t-a}{b-a}\right)dt, \quad n \in Z.$$
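The coefficient formula on a general interval [a, b] can be tried on trigonometric polynomials, whose coefficients are known in advance. The following is a sketch under stated assumptions: a midpoint rule replaces the integral, and `complex_coefficient` is a hypothetical name introduced here.

```python
import cmath
import math


def complex_coefficient(f, n, a, b, steps=20_000):
    """Midpoint-rule value of (1/(b-a)) * integral over [a, b] of
    f(t) * exp(-2*pi*i*n*(t - a)/(b - a)) dt."""
    h = (b - a) / steps
    total = 0j
    for j in range(steps):
        t = a + (j + 0.5) * h
        total += f(t) * cmath.exp(-2j * math.pi * n * (t - a) / (b - a))
    return total * h / (b - a)


if __name__ == "__main__":
    # cos(2*pi*t) on [0, 1] equals (e^{ix} + e^{-ix})/2 with x = 2*pi*t.
    f = lambda t: math.cos(2.0 * math.pi * t)
    print(complex_coefficient(f, 1, 0.0, 1.0), complex_coefficient(f, 2, 0.0, 1.0))
```

Consistently with (9.21), a pure sine with $b_1 = 1$ gives $c_1 = -i/2$, which the test below checks on the interval [1, 3].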
9.4 The Riemann–Lebesgue Lemma

The reader will observe that integrals of the form $\int f(x)\sin nx\,dx$ and $\int f(x)\cos nx\,dx$ are ubiquitous in this theory. An important behavior of those is recorded in the next result.

Lemma 837 [Riemann–Lebesgue lemma] Let I be a general interval, and let f ∈ L(I). Then
$$\lim_{\alpha\to+\infty}\int_I f(x)\sin \alpha x\,dx = 0, \quad \text{and} \quad \lim_{\alpha\to+\infty}\int_I f(x)\cos \alpha x\,dx = 0. \tag{9.26}$$

Proof In view of Proposition 402, the function x → f(x) sin αx defined on I is measurable. Since |f(x) sin αx| ≤ |f(x)| for all x ∈ I and all α ∈ R, Remark 752 shows that the function x → f(x) sin αx is Lebesgue integrable on I. Assume first that f is a constant function with compact support on I, say f(x) = K for all x ∈ [a, b] ⊂ I (and f(x) = 0 elsewhere). Then the result follows, since, for α > 0,
$$\int_a^b K \sin \alpha x\,dx = \frac{K}{\alpha}(\cos \alpha a - \cos \alpha b) \to 0 \quad \text{as } \alpha \to +\infty.$$
As a consequence, the result holds for every step function s ∈ S(I) (see Definition 721). Given ε > 0 get, by Lemma 738, a function s ∈ S(I) and a function g ∈ L(I) such that f = s + g and $\int_I |g| < \varepsilon/2$. From the first part of the proof, we may find $\alpha_0 > 0$ such that $\left|\int_I s(x)\sin \alpha x\,dx\right| < \varepsilon/2$ for all $\alpha \ge \alpha_0$. Then, for $\alpha \ge \alpha_0$,
$$\left|\int_I f(x)\sin \alpha x\,dx\right| \le \left|\int_I s(x)\sin \alpha x\,dx\right| + \left|\int_I (f(x) - s(x))\sin \alpha x\,dx\right| \le \varepsilon/2 + \int_I |g(x)|\,dx < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
A similar argument works for the second integral in (9.26). □
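The decay in (9.26) is easy to observe for a smooth integrand. The sketch below again uses a midpoint rule, an assumption adequate only while the step is small compared with the oscillation period 2π/α; `oscillatory_integral` is a hypothetical name.

```python
import math


def oscillatory_integral(f, alpha, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f(x)*sin(alpha*x) over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for j in range(steps):
        x = a + (j + 0.5) * h
        total += f(x) * math.sin(alpha * x)
    return total * h


if __name__ == "__main__":
    for alpha in (10.0, 100.0, 1000.0):
        print(alpha, oscillatory_integral(math.exp, alpha, 0.0, 1.0))
```

For the constant function the result can be compared with the closed form $(1 - \cos\alpha)/\alpha$ used in the first step of the proof.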
Fig. 9.3 It is (quite) clear why the Riemann–Lebesgue Lemma holds
Figure 9.3 hints at the reason why the Riemann–Lebesgue Lemma holds. Certainly, $\int_{[-1,1]} f(x)\,dx$ does not vanish, where f is the function depicted in that figure. However, letting its graph oscillate with higher and higher frequencies around the OX axis makes the integral closer and closer to 0.

Remark 838 Lemma 837, when applied to a function f in L[0, 2π], shows that the sequences $\{a_n\}_{n=0}^{\infty}$, $\{b_n\}_{n=1}^{\infty}$, and $\{c_0, c_1, c_{-1}, c_2, c_{-2}, \dots\}$ of its Fourier coefficients are all null sequences, i.e., they all have limit 0. ®
9.5 The Partial Sums of a Fourier Series and the Dirichlet Kernel

Given f ∈ L[0, 2π], m ∈ N, and x ∈ R, consider the mth partial sum of its Fourier series, i.e.,
$$s_m(x) = \frac{a_0}{2} + \sum_{k=1}^m a_k \cos kx + \sum_{k=1}^m b_k \sin kx = \sum_{k=-m}^m c_k e^{ikx}, \tag{9.27}$$
where the Fourier coefficients $a_0, a_1, a_2, \dots, b_1, b_2, \dots$ are given by (9.17) and (9.18), and the coefficients $c_k$, for k ∈ Z, by (9.22). Thus, carrying (9.22) into (9.27), and noticing that both f and $e^{-ikx}$ are 2π-periodic functions, we have, in view of (9.11) and the fact that the interval [x − π, x + π] has length 2π,
$$s_m(x) = \frac{1}{2\pi}\int_{[x-\pi,x+\pi]} f(v)\sum_{k=-m}^m e^{ik(x-v)}\,dv. \tag{9.28}$$
Remark 839 In the rest of this chapter we shall encounter integrals where the function under the integral sign may not be defined at the points of some subset of R. This is not a problem as far as this set is null (if the integrals involved are Lebesgue
integrals) or finite (in case of Riemann integrals). For example, we shall substitute $\sum_{k=-m}^m e^{ikt}$ by $\sin\left(m + \frac{1}{2}\right)t / \sin(t/2)$ in Eq. (9.28), valid if t ≠ 2kπ, k ∈ Z (see Eq. (9.3)). As far as the two expressions $\sum_{k=-m}^m e^{ikt}$ and $\sin\left(m + \frac{1}{2}\right)t / \sin(t/2)$ are under the integral sign, this should not cause any problem. Even more, in this particular case the second expression has a finite limit when t → 2kπ, and we can assume that the function $\sin\left(m + \frac{1}{2}\right)t / \sin(t/2)$ is extended to take this limit value at each point 2kπ, k ∈ Z, thus becoming a continuous function. In order to avoid lengthy statements and reiterative warnings in this direction, we shall consider that, under the integral sign, functions are defined almost everywhere (in the Lebesgue case) and up to a finite number of points (in the Riemann case). ®

Thus, by (9.3) and (9.28), for m ∈ N, and for x ∈ R,
$$\begin{aligned}
s_m(x) &= \frac{1}{2\pi}\int_{[x-\pi,x+\pi]} f(v)\,\frac{\sin\left(m+\frac{1}{2}\right)(v-x)}{\sin\frac{1}{2}(v-x)}\,dv\\
&= \frac{1}{2\pi}\int_{[x-\pi,x]} f(v)\,\frac{\sin\left(m+\frac{1}{2}\right)(v-x)}{\sin\frac{1}{2}(v-x)}\,dv + \frac{1}{2\pi}\int_{[x,x+\pi]} f(v)\,\frac{\sin\left(m+\frac{1}{2}\right)(v-x)}{\sin\frac{1}{2}(v-x)}\,dv\\
&= \frac{1}{\pi}\int_{[0,\pi]} \frac{f(x+t)+f(x-t)}{2}\cdot\frac{\sin\left(m+\frac{1}{2}\right)t}{\sin\frac{t}{2}}\,dt,
\end{aligned} \tag{9.29}$$
where we used the substitutions v = x − t in [x − π, x], and v = x + t in [x, x + π]. Note that (9.29) is also valid for m = 0. The expression that multiplies (1/2)(f(x + t) + f(x − t)) under the integral sign in (9.29) appears frequently, and is isolated in the next definition.

Definition 840 For m = 0, 1, 2, . . . , the function
$$D_m(x) := \begin{cases} \dfrac{\sin\left(m+\frac{1}{2}\right)x}{\sin\frac{x}{2}}, & \text{if } x \in R,\ x \ne 2k\pi,\ k \in Z,\\[2mm] 2m+1, & \text{otherwise,} \end{cases} \tag{9.30}$$
is called the Dirichlet kernel.

Remark 841 Observe that
$$D_m(x) = \sum_{k=-m}^m e^{ikx} \quad \text{for } x \in R \text{ and } m \in N \cup \{0\}, \tag{9.31}$$
according to Eq. (9.3) in case x ≠ 2nπ, n ∈ Z (otherwise the equality follows by performing a simple addition). Note, too, that if $s_m(x)$ denotes the mth partial sum of the Fourier series associated to f (see Eq. (9.27)), then Eq. (9.28) may be written as
$$s_m(x) = (D_m * f)(x), \tag{9.32}$$
Fig. 9.4 The Dirichlet kernel in [−π, π] for m = 0, 1, 2, 3, 4
where ∗ denotes the convolution operator, defined in Exercise 13.227 for the class of continuous functions and, more generally, in Definition 1060 for the class of periodic distributions. ®

Remark 842 The following properties of the Dirichlet kernel hold (see Fig. 9.4).
1. Equation (9.29) appears as
$$s_m(x) = \frac{1}{\pi}\int_{[0,\pi]} \frac{f(x+t)+f(x-t)}{2}\,D_m(t)\,dt, \quad \text{for } m \in N, \text{ and } x \in R. \tag{9.33}$$
2. $D_m$ is a continuous function. Indeed, $\lim_{t\to 2k\pi} D_m(t) = 2m + 1$ for k ∈ Z and m ∈ N, as it follows, for example, by using L’Hôspital’s Rule (Theorem 376).
3. Obviously, $D_m$ is 2π-periodic.
4. If we apply the former development to the function f being identically 1, we get
$$\frac{1}{2}a_0 = 1, \quad \text{and} \quad a_k = b_k = 0 \quad \text{for} \quad k \in N,$$
hence $s_m(x) = 1$ for all x ∈ R. Thus, from (9.33), we get
$$1 = \frac{1}{\pi}\int_0^{\pi} D_m(t)\,dt \quad \text{for } m = 0, 1, 2, \dots, \tag{9.34}$$
hence
$$\int_0^{\pi} D_m(t)\,dt = \pi, \quad \text{for all } m = 0, 1, 2, \dots \tag{9.35}$$
®
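The properties collected in Remark 842 can be confirmed numerically. The sketch below compares the closed form (9.30) with the real form of the sum (9.31) and approximates the integral in (9.35). The function names and the threshold used to detect multiples of 2π are assumptions of this illustration.

```python
import math


def dirichlet_kernel(m, x):
    """D_m(x) as in (9.30): closed form away from multiples of 2*pi, else 2m + 1.

    The 1e-12 threshold for "x is a multiple of 2*pi" is a numerical assumption.
    """
    s = math.sin(x / 2.0)
    if abs(s) < 1e-12:
        return 2.0 * m + 1.0
    return math.sin((m + 0.5) * x) / s


def dirichlet_sum(m, x):
    """D_m(x) as 1 + 2*sum_{k=1}^m cos(kx), the real form of the sum in (9.31)."""
    return 1.0 + 2.0 * sum(math.cos(k * x) for k in range(1, m + 1))


def kernel_integral(m, steps=20_000):
    """Midpoint-rule approximation of the integral of D_m over [0, pi]."""
    h = math.pi / steps
    return h * sum(dirichlet_kernel(m, (j + 0.5) * h) for j in range(steps))


if __name__ == "__main__":
    print(dirichlet_kernel(4, 1.3) - dirichlet_sum(4, 1.3))
    print(kernel_integral(5) - math.pi)
```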
9.6 Convergence of the Fourier Series
Given a Lebesgue integrable 2π-periodic function defined on R, an important issue is to decide on the convergence of its Fourier series (9.16) or (9.20). We consider pointwise convergence (Sect. 9.6.1), the so-called Cesàro convergence (Sect. 9.6.2), uniform convergence (see Sect. 9.6.3), $\|\cdot\|_1$-convergence (Sect. 9.6.4), and finally in Sect. 9.6.5 the case in which the function is in $L_2[0, 2\pi]$ (see Example 963.2). We shall postpone the study of the so-called $\|\cdot\|_2$-convergence to the moment the general theory of Hilbert spaces will be at hand (Sect. 11.4 and, more particularly, (iii) in Theorem 977 and Proposition 981).
9.6.1 Pointwise Convergence of the Fourier Series
We shall discuss the pointwise convergence of the Fourier series of a given function f ∈ L[0, 2π] at a certain point x, i.e., of the sequence $\{s_m(x)\}_{m=1}^{\infty}$ of its partial sums, showing below that it depends only on the behavior of f in a neighborhood of x. To this end, we need the following result, which allows us to localize, and to substitute, the sequence to be checked.
Theorem 843 (Riemann’s localization lemma) Let f ∈ L[0, 2π], and let δ ∈ (0, π). Put, for m ∈ N and x ∈ R,
\[ s_m(x, \delta) := \frac{1}{\pi}\int_{[0,\delta]} \frac{f(x+t)+f(x-t)}{2}\cdot\frac{\sin\left(m+\frac{1}{2}\right)t}{\sin\frac{t}{2}}\,dt = \frac{1}{\pi}\int_{[0,\delta]} \frac{f(x+t)+f(x-t)}{2}\, D_m(t)\,dt, \tag{9.36} \]
and the associated sequence
\[ r_m(x, \delta) := \frac{1}{\pi}\int_{[0,\delta]} \frac{f(x+t)+f(x-t)}{2}\cdot\frac{\sin\left(m+\frac{1}{2}\right)t}{t/2}\,dt. \tag{9.37} \]
Then, for x ∈ R, $s_m(x) - s_m(x, \delta) \to 0$ and $s_m(x) - r_m(x, \delta) \to 0$ when m → ∞, where $\{s_m(x)\}_{m=1}^{\infty}$ is the sequence of partial sums of the Fourier series of f (see (9.27) above).
Remark 844 Note that Theorem 843 implies that the convergence of the sequence $\{s_m\}_{m=1}^{\infty}$ at x depends only on the behavior of f on an arbitrarily small neighborhood of x. ®
Fig. 9.5 The graph of $\frac{1}{t/2} - \frac{1}{\sin(t/2)}$ on [−6, 6] (proof of Theorem 843)
Proof of Theorem 843 We saw in (9.29) that, given x ∈ R and m ∈ N, we have
\[ s_m(x) = \frac{1}{\pi}\int_{[0,\pi]} \frac{f(x+t)+f(x-t)}{2}\, D_m(t)\,dt. \]
Pick x ∈ R. The function $t \mapsto \dfrac{f(x+t)+f(x-t)}{2\sin(t/2)}$ is a function in L[δ, π]. Hence, by the Riemann–Lebesgue Lemma 837 we have
\[ s_m(x) - s_m(x, \delta) = \frac{1}{\pi}\int_{[\delta,\pi]} \frac{f(x+t)+f(x-t)}{2}\, D_m(t)\,dt = \frac{1}{\pi}\int_{[\delta,\pi]} \frac{f(x+t)+f(x-t)}{2\sin\frac{t}{2}}\,\sin\left(m+\tfrac{1}{2}\right)t\,dt \to 0, \quad \text{when } m \to \infty. \]
For the sequence $\{r_m(x, \delta)\}_{m=1}^{\infty}$, note that
\[ r_m(x, \delta) - s_m(x, \delta) = \frac{1}{\pi}\int_{[0,\delta]} \frac{f(x+t)+f(x-t)}{2}\left(\frac{1}{t/2} - \frac{1}{\sin(t/2)}\right)\sin\left(m+\tfrac{1}{2}\right)t\,dt = \frac{1}{\pi}\int_{[0,\delta]} \frac{f(x+t)+f(x-t)}{2}\cdot\frac{\sin(t/2) - (t/2)}{(t/2)\sin(t/2)}\,\sin\left(m+\tfrac{1}{2}\right)t\,dt. \]
The function defined by $t \mapsto \dfrac{f(x+t)+f(x-t)}{2}\cdot\dfrac{\sin(t/2)-(t/2)}{(t/2)\sin(t/2)}$ if t ≠ 0 and having value 0 at t = 0 is an element in L[0, δ] (the reader will have no difficulty in proving, by using L’Hôspital’s Rule (Theorem 376), that the function $\dfrac{\sin(t/2)-(t/2)}{(t/2)\sin(t/2)}$ is continuous at 0 if defined to be 0 at 0, see Fig. 9.5). Then, the fact that $r_m(x, \delta) - s_m(x, \delta) \to 0$ when m → ∞ follows again from the Riemann–Lebesgue Lemma 837. This finishes the proof. □
Remark 845 By applying Theorem 843 to the function f identically equal to 1, we get, given any δ ∈ (0, π),
\[ \frac{1}{\pi}\int_0^{\delta} D_m(t)\,dt \to 1, \quad \text{and} \quad \frac{1}{\pi}\int_0^{\delta} \frac{\sin\left(m+\frac{1}{2}\right)t}{t/2}\,dt \to 1, \quad \text{when } m \to \infty. \tag{9.38} \]
Indeed, the Fourier series of the function f identically 1 on [0, 2π] is just f, and any of its partial sums $s_m$ is again f. Thus, $s_m(x) = 1$ for m ∈ N and x ∈ R (compare with Remark 842.4). ®
Let us summarize what we have obtained so far: Let f ∈ L[0, 2π]. Fix x ∈ [0, 2π]. We are looking for sufficient conditions for the convergence of the Fourier series of f at the point x, i.e., for the convergence of the sequence $\{s_m(x)\}_{m=0}^{\infty}$ of partial sums (see (9.27)). Theorem 843 shows that for an arbitrary δ ∈ (0, π), the three sequences $\{s_m(x)\}_{m=0}^{\infty}$, $\{s_m(x, \delta)\}_{m=1}^{\infty}$, and $\{r_m(x, \delta)\}_{m=1}^{\infty}$ converge (to the same limit) or all three of them diverge. We shall focus on the sequence $\{r_m(x, \delta)\}_{m=1}^{\infty}$, since it is somewhat easier to manage. Finally, it all amounts to proving that for some δ ∈ (0, π), the sequence $\{r_m(x, \delta)\}_{m=1}^{\infty}$ converges. In view of (9.37), we must concentrate on the analysis of integrals of the form $\int_{[0,\delta]} g(t)\,\frac{\sin \alpha t}{t}\,dt$. Those integrals are called Dirichlet integrals, and they are central in the theory of Fourier series. We claim that, under suitable conditions on g,
\[ \lim_{\alpha\to+\infty}\int_{[0,\delta]} g(t)\,\frac{\sin \alpha t}{t}\,dt = \frac{\pi}{2}\,g(0+). \tag{9.39} \]
In order to make this claim plausible, we may consider the particular case of a constant function g on [0, δ]. Assume then that g(x) = K for x ∈ [0, δ]. Then, by Lemma 832,
\[ \lim_{\alpha\to+\infty}\int_{[0,\delta]} g(t)\,\frac{\sin \alpha t}{t}\,dt = K\lim_{\alpha\to+\infty}\int_0^{\delta} \frac{\sin \alpha t}{t}\,dt = K\,\frac{\pi}{2} = g(0+)\,\frac{\pi}{2}. \]
The impact that the convergence of the Dirichlet integral for α → +∞ (see Eq. (9.39)) has on the problem of the pointwise convergence of the Fourier series of a function is shown in the following result.
Proposition 846 Let f ∈ L[0, 2π]. Let x ∈ R be such that the two limits f(x+) and f(x−) exist and are finite. Assume that, for some δ ∈ (0, π), Eq. (9.39) holds for the two functions g(t) = f(x + t) and g(t) = f(x − t) defined on (0, δ) (in the sense that the limit exists, is finite, and has the recorded value). Then the Fourier series of f converges at x to (1/2)(f(x+) + f(x−)).
Proof Indeed,
\[ \begin{aligned} \lim_{m\to\infty} r_m(x, \delta) &= \lim_{m\to\infty}\frac{1}{\pi}\left(\int_{[0,\delta]} \frac{f(x+t)}{2}\cdot\frac{\sin\left(m+\frac{1}{2}\right)t}{t/2}\,dt + \int_{[0,\delta]} \frac{f(x-t)}{2}\cdot\frac{\sin\left(m+\frac{1}{2}\right)t}{t/2}\,dt\right) \\ &= \lim_{m\to\infty}\frac{1}{\pi}\int_{[0,\delta]} f(x+t)\,\frac{\sin\left(m+\frac{1}{2}\right)t}{t}\,dt + \lim_{m\to\infty}\frac{1}{\pi}\int_{[0,\delta]} f(x-t)\,\frac{\sin\left(m+\frac{1}{2}\right)t}{t}\,dt \\ &= \frac{f(x+) + f(x-)}{2}. \end{aligned} \]
The result follows from the fact that $\lim_{m\to\infty} s_m(x) = \lim_{m\to\infty} r_m(x, \delta)$, proved in Theorem 843. □
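The claim (9.39) can be observed numerically before any sufficient condition is proved. The sketch below is illustrative only: the helper `dirichlet_integral`, the composite Simpson rule, and the two test functions (a constant and the Lipschitz function 1 + t) are assumptions made for this example, and `g(0.0)` is used as a stand-in for g(0+), which is legitimate only because both test functions are continuous at 0.

```python
import math

def dirichlet_integral(g, alpha, delta, panels=20000):
    """Composite Simpson approximation of the integral over [0, delta]
    of g(t) * sin(alpha * t) / t dt (panels must be even)."""
    def integrand(t):
        if t == 0.0:
            # sin(alpha*t)/t tends to alpha as t -> 0+; g(0.0) stands in for g(0+)
            return g(0.0) * alpha
        return g(t) * math.sin(alpha * t) / t

    h = delta / panels
    total = integrand(0.0) + integrand(delta)
    for j in range(1, panels):
        total += (4.0 if j % 2 else 2.0) * integrand(j * h)
    return total * h / 3.0

# Both columns should drift towards (pi/2) * g(0+) = pi/2 as alpha grows.
for alpha in (10.0, 100.0, 400.0):
    constant = dirichlet_integral(lambda t: 1.0, alpha, 1.0)
    lipschitz = dirichlet_integral(lambda t: 1.0 + t, alpha, 1.0)
    print(alpha, constant, lipschitz, math.pi / 2)
```

The convergence is slow (the error behaves roughly like 1/α), which is consistent with the oscillatory nature of the integrand.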
Theorems 847 and 851 below give sufficient conditions for the convergence of a Dirichlet integral (9.39). Their Corollaries 848, 849, and 852 together cover a wide spectrum of particular cases. For the classes of functions considered there, we may ensure pointwise convergence (or at least convergence at some points) of the associated Fourier series.
Theorem 847 (Dini) Let δ ∈ (0, π) and g : (0, δ] → R be a function such that g(0+) exists. Assume, too, that the function $x \mapsto \frac{g(x) - g(0+)}{x}$ is Lebesgue integrable on (0, δ]. Then, given α > 0, the function $x \mapsto g(x)\,\frac{\sin \alpha x}{x}$ is Lebesgue integrable on (0, δ] and, moreover,
\[ \lim_{\alpha\to+\infty}\int_{[0,\delta]} g(t)\,\frac{\sin \alpha t}{t}\,dt = \frac{\pi}{2}\,g(0+), \]
i.e., Eq. (9.39) holds.
Proof To prove that for every α > 0 the function $x \mapsto g(x)\,\frac{\sin \alpha x}{x}$ is Lebesgue integrable on (0, δ], put
\[ g(x)\,\frac{\sin \alpha x}{x} = \frac{g(x) - g(0+)}{x}\,\sin \alpha x + g(0+)\,\frac{\sin \alpha x}{x}. \tag{9.40} \]
The first summand in the right-hand side of Eq. (9.40) is a Lebesgue integrable function on (0, δ], due to the assumption and Proposition 757. The function $x \mapsto \frac{\sin \alpha x}{x}$, when extended to be α at 0, is continuous on [0, δ], and so the second summand in the right-hand side of Eq. (9.40) is, when extended this way, also Lebesgue integrable on [0, δ]. Thus,
\[ \int_{[0,\delta]} g(t)\,\frac{\sin \alpha t}{t}\,dt = \int_{[0,\delta]} \frac{g(t) - g(0+)}{t}\,\sin \alpha t\,dt + g(0+)\int_0^{\delta} \frac{\sin \alpha t}{t}\,dt. \tag{9.41} \]
Now, the first integral in the right-hand side of the equality (9.41) converges to 0 as α → +∞ due to the Riemann–Lebesgue Lemma 837 and our assumption, and the second integral converges to g(0+)π/2 due to Lemma 832. □
Corollary 848 Let f ∈ L[0, 2π], and x ∈ R. Assume that there exists δ ∈ (0, π) such that f satisfies a Lipschitz condition on (x, x + δ) and on (x − δ, x). Then the Fourier series of f converges at x to (1/2)(f(x+) + f(x−)).
Proof Observe that, in this case, both f(x+) and f(x−) exist. Indeed, if |f(y) − f(z)| ≤ K|y − z| for some K > 0 and all y, z ∈ (x, x + δ), then for any sequence $\{x_n\}$ in (x, x + δ) that converges to x the sequence $\{f(x_n)\}$ is clearly Cauchy, hence convergent. Since this is true for every such sequence, Proposition 314 concludes that f(x+) exists. The same argument applies to f(x−). Put now g(t) := f(x + t) for t ∈ (0, δ). Since |g(t) − g(s)| = |f(x + t) − f(x + s)| ≤ 2K|t − s| for all t, s ∈ (0, δ), we may let s → 0 in the previous inequality to get |g(t) − g(0+)| ≤ 2K|t| for all t ∈ (0, δ). Thus, $\left|\frac{g(t) - g(0+)}{t}\right| \le 2K$ for t ∈ (0, δ), and so $t \mapsto \frac{g(t) - g(0+)}{t}$ is clearly integrable on (0, δ). This shows, by Theorem 847, that (9.39) holds. A similar result
is true for the function t ↦ f(x − t) in (−δ, 0). Proposition 846 applies, and this proves the statement. □
Corollary 849 Let f ∈ L[0, 2π], and x ∈ R. Assume that f(x+) and f(x−) both exist, and that the one-sided derivatives $f'_{+}(x)$ and $f'_{-}(x)$ exist (and are finite), in the sense of the existence of both limits
\[ f'_{+}(x) := \lim_{h\to 0+}\frac{f(x+h) - f(x+)}{h} \quad \text{and} \quad f'_{-}(x) := \lim_{h\to 0-}\frac{f(x+h) - f(x-)}{h} \]
finite. Then the Fourier series of f converges at x to (1/2)(f(x+) + f(x−)).
Proof Given ε > 0 we may find δ > 0 such that $\left|\frac{f(x+h) - f(x+)}{h} - f'_{+}(x)\right| < \varepsilon$ for 0 < h < δ. It follows then that
\[ \left|\frac{f(x+h) - f(x+)}{h}\right| \le \left|\frac{f(x+h) - f(x+)}{h} - f'_{+}(x)\right| + |f'_{+}(x)| < \varepsilon + |f'_{+}(x)|, \quad \text{for all } 0 < h < \delta. \]
A similar argument shows that
\[ \left|\frac{f(x+h) - f(x-)}{h}\right| \le \left|\frac{f(x+h) - f(x-)}{h} - f'_{-}(x)\right| + |f'_{-}(x)| < \varepsilon + |f'_{-}(x)|, \quad \text{for all } -\delta < h < 0. \]
All together, this proves that both functions g(t) = f(x + t) and g(t) = f(x − t) satisfy the hypotheses of Theorem 847. Then f satisfies the hypotheses of Proposition 846, and the conclusion follows. □
A numerical example related to Corollary 849 is the following.
Example 850 We consider the 2π-periodic extension of the function f(x) := x + x² defined on [−π, π] (see Fig. 9.6), and call this extension F. Observe that every point x ∈ R, x ≠ (2n + 1)π, n ∈ Z, has a neighborhood where F satisfies all the requirements in Corollary 849. Moreover, F is continuous in this neighborhood. It follows that the Fourier series of F is convergent at x, and it converges to (1/2)(F(x+) + F(x−)) = F(x). Assume now that x = (2n + 1)π for some n ∈ Z. Again, there is a neighborhood of x where F satisfies all the requirements in Corollary 849, hence the convergence of the Fourier series of F is guaranteed. The only difference with respect to the previous case is that now F is not continuous at x. The convergence of the Fourier series at x is toward the value (1/2)(F(x+) + F(x−)) (= π²). Note that in both cases (for x ≠ (2n + 1)π and for x = (2n + 1)π, n ∈ Z, respectively), the use of Corollary 848 instead also guarantees pointwise convergence. The computation of the Fourier coefficients of the function F is easy (it is just a matter of using integration by parts, see Theorem 705). There is only one warning: the expressions (9.17) and (9.18) involve integrals on [0, 2π]. The computation (see Fig. 9.6) requires splitting the interval [0, 2π] into the two intervals [0, π] and [π, 2π], due to the fact that F has been defined as the 2π-periodic extension of a function
Fig. 9.6 The 2π-periodic extension of f(x) = x + x² on [−π, π]
Fig. 9.7 Some partial sums for the 2π-periodic expansion of f(x) = x + x² on [−π, π]
originally defined on [−π, π]. We get (see Fig. 9.7 for the plots of some of the partial sums)
\[ F(x) \sim \frac{\pi^2}{3} + \sum_{k=1}^{\infty}\left(\frac{4}{k^2}(-1)^k\cos kx - \frac{2}{k}(-1)^k\sin kx\right). \tag{9.42} \]
Then, the sum of the series in (9.42) at x = π gives, according to the previous paragraph,
\[ \pi^2 = \frac{\pi^2}{3} + \sum_{k=1}^{\infty}\frac{4}{k^2}(-1)^{2k}, \]
hence
\[ \sum_{k=1}^{\infty}\frac{1}{k^2} = \frac{\pi^2}{6}, \tag{9.43} \]
an expression useful for estimating π due to the relatively fast convergence of the series $\sum 1/k^2$. For another approach to this expression see Exercise 9.43.
In order to see how dramatically the situation changes if we consider the 2π-periodic extension G of the function f defined on [0, 2π] (see Fig. 9.8), let us compute its Fourier series and check for its pointwise convergence. Now
\[ G(x) \sim \pi + \frac{4}{3}\pi^2 + \sum_{k=1}^{\infty}\left(\frac{4}{k^2}\cos kx + \frac{-2-4\pi}{k}\sin kx\right). \tag{9.44} \]
According to the previous discussion, the Fourier series in (9.44) converges to G(x) at every continuity point of G (i.e., at every point x ∈ R, x ≠ 2nπ, n ∈ Z), while it converges to (1/2)(2π + (2π)²) = π + 2π² at every discontinuity point (i.e., at every point x = 2nπ, n ∈ Z). ♦
A second sufficient condition for the convergence of the Fourier series of a function at a given point is provided by the following result.
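Before turning to that result, the two expansions of Example 850 can be checked numerically. The sketch below simply evaluates partial sums of (9.42) and (9.44); the function names and the choice of evaluation points are illustrative and not part of the original text.

```python
import math

def partial_sum_F(x, m):
    """m-th partial sum of the series (9.42) for the extension F."""
    total = math.pi ** 2 / 3.0
    for k in range(1, m + 1):
        sign = -1.0 if k % 2 else 1.0  # (-1)^k
        total += sign * (4.0 / k ** 2 * math.cos(k * x) - 2.0 / k * math.sin(k * x))
    return total

def partial_sum_G(x, m):
    """m-th partial sum of the series (9.44) for the extension G."""
    total = math.pi + 4.0 * math.pi ** 2 / 3.0
    for k in range(1, m + 1):
        total += 4.0 / k ** 2 * math.cos(k * x) - (2.0 + 4.0 * math.pi) / k * math.sin(k * x)
    return total

m = 4000
# Continuity point x = 1: both series should approach 1 + 1**2 = 2.
print(partial_sum_F(1.0, m), partial_sum_G(1.0, m))
# Jump of F at x = pi: the limit should be the midpoint value pi**2.
print(partial_sum_F(math.pi, m), math.pi ** 2)
# Jump of G at x = 0: the limit should be the midpoint value pi + 2*pi**2.
print(partial_sum_G(0.0, m), math.pi + 2.0 * math.pi ** 2)
```

At the jump points the partial sums settle on the midpoint of the one-sided limits, not on any value of the function itself, exactly as Corollary 849 predicts.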
Fig. 9.8 The 2π-periodic extension of f(x) = x + x² on [0, 2π]
Theorem 851 (Jordan) Let δ ∈ (0, π) and g : (0, δ] → R be a function of bounded variation. Then
\[ \lim_{\alpha\to+\infty}\int_0^{\delta} g(t)\,\frac{\sin \alpha t}{t}\,dt = \frac{\pi}{2}\,g(0+), \]
i.e., Eq. (9.39) holds.
Proof Since every function of bounded variation on (0, δ] is the difference of two increasing functions (Theorem 432), we may assume, without loss of generality, that g is increasing on (0, δ] (note that, then, g is Riemann integrable in (0, δ] by Proposition 677). It follows that g(0+) exists. Fix ε > 0 and find then δ′ ∈ (0, δ) such that (0 ≤) g(t) − g(0+) < ε for t ∈ (0, δ′). Observe that, given α > 0,
\[ \left|\int_0^{\delta} g(t)\,\frac{\sin \alpha t}{t}\,dt - \frac{\pi}{2}\,g(0+)\right| \le \left|\int_0^{\delta'} (g(t) - g(0+))\,\frac{\sin \alpha t}{t}\,dt\right| + \left|\int_{\delta'}^{\delta} g(t)\,\frac{\sin \alpha t}{t}\,dt\right| + |g(0+)|\left|\int_0^{\delta'} \frac{\sin \alpha t}{t}\,dt - \frac{\pi}{2}\right|. \tag{9.45} \]
The Riemann–Lebesgue Lemma 837 ensures the existence of α₀ > 0 such that
\[ \left|\int_{\delta'}^{\delta} g(t)\,\frac{\sin \alpha t}{t}\,dt\right| < \varepsilon, \quad \text{for } \alpha \ge \alpha_0. \tag{9.46} \]
By Lemma 832 we can find α₁ (≥ α₀) such that
\[ \left|\int_0^{\delta'} \frac{\sin \alpha t}{t}\,dt - \frac{\pi}{2}\right| < \varepsilon \quad \text{for } \alpha \ge \alpha_1. \tag{9.47} \]
Finally, the function g(t) − g(0+) is increasing and satisfies 0 ≤ g(t) − g(0+) < ε on (0, δ′]. By the Second Mean Value Theorem 694 for the Riemann integral, with A = 0 and B = ε, we obtain ξ ∈ (0, δ′] such that
\[ \int_0^{\delta'} (g(t) - g(0+))\,\frac{\sin \alpha t}{t}\,dt = \varepsilon\int_{\xi}^{\delta'} \frac{\sin \alpha t}{t}\,dt = \varepsilon\int_{\alpha\xi}^{\alpha\delta'} \frac{\sin u}{u}\,du, \tag{9.48} \]
the last equality obtained by changing the variable to u (= αt). Recall that the improper Riemann integral $\int_0^{+\infty} (\sin u/u)\,du$ is convergent (see Example 713), hence there exists c > 0 such that $\left|\int_a^b (\sin u/u)\,du\right| < \varepsilon$ for c ≤ a < b. In particular, it follows from this and (9.48) that there exists α₂ (≥ α₁) such that, for α ≥ α₂,
\[ \left|\int_0^{\delta'} (g(t) - g(0+))\,\frac{\sin \alpha t}{t}\,dt\right| < \varepsilon^2. \tag{9.49} \]
Carrying (9.46), (9.47), and (9.49) to (9.45) shows that
\[ \left|\int_0^{\delta} g(t)\,\frac{\sin \alpha t}{t}\,dt - \frac{\pi}{2}\,g(0+)\right| < \varepsilon^2 + \varepsilon + |g(0+)|\,\varepsilon \quad \text{for } \alpha \ge \alpha_2, \]
as we wanted to prove. □
The following corollary follows from Theorem 851 in the same way as Corollary 849 was deduced from Theorem 847.
Corollary 852 Let f ∈ L[0, 2π], and x ∈ R. Assume that there exists δ ∈ (0, π) such that f is of bounded variation on (x − δ, x + δ). Then the Fourier series of f converges at x to (1/2)(f(x+) + f(x−)).
Example 853 Example 850 could have been treated by using Corollary 852 instead of Corollary 849, since both functions F and G there are clearly monotone on (x − δ, x) and on (x, x + δ) for a suitable δ > 0 (depending on x), for every x ∈ R. ♦
Example 854 We will now present Fejér’s example of a continuous function on R in the class L[0, 2π] whose Fourier series diverges at the point x = 0. For n ∈ N and x ∈ R, put
\[ f_n(x) := \frac{1}{n} + \frac{\cos x}{n-1} + \frac{\cos 2x}{n-2} + \cdots + \frac{\cos (n-1)x}{1} - \frac{\cos (n+1)x}{1} - \frac{\cos (n+2)x}{2} - \cdots - \frac{\cos 2nx}{n}. \tag{9.50} \]
Put, for x ∈ R,
\[ F(x) := \sum_{n=1}^{\infty} \frac{1}{n^2}\, f_{2^{n^3}}(x). \tag{9.51} \]
We claim that F is a continuous 2π-periodic function on R. It is obviously 2π-periodic. In order to prove continuity, note that, by using (iv) in Corollary 385, we can write
\[ f_n(x) = \sum_{k=1}^{n} \frac{1}{k}\left(\cos (n-k)x - \cos (n+k)x\right) = 2\sin nx \sum_{k=1}^{n} \frac{\sin kx}{k}, \]
and so we can use (9.6) to see that $|f_n(x)| \le 2 + 4\pi$ for all n ∈ N and all x ∈ R. This shows that the series in (9.51) is uniformly convergent on R. Since each summand is a continuous function, continuity of F follows from Theorem 463.
Thus, if $s_m(x)$ is the sum of the first m + 1 members of the Fourier series for F, then by integrating term by term in (9.51) to get the Fourier coefficients, we get
\[ s_m(x) = \sum_{n=1}^{\infty} \frac{1}{n^2}\, s_{m, 2^{n^3}}(x), \tag{9.52} \]
where $s_{m, 2^{n^3}}$ is the sum of the first m + 1 members of the Fourier series for $f_{2^{n^3}}$. Due to the fact that $f_n$ is already written in (9.50) as the sum of its Fourier series, $s_{m,n}$ is the sum of the first m + 1 summands in (9.50), and so $s_{m,n}(0) \ge 0$ for all m = 0, 1, 2, . . . and n ∈ N. Moreover,
\[ s_{n,n}(0) = \frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{2} + 1 > \int_1^n \frac{dx}{x} = \ln n. \]
Thus, from (9.52) we get
\[ s_{2^{m^3}}(0) \ge \frac{1}{m^2}\, s_{2^{m^3}, 2^{m^3}}(0) > \frac{1}{m^2}\ln 2^{m^3} = m \ln 2. \]
Therefore the sequence $\{s_n(0)\}_{n=0}^{\infty}$ is not convergent and the Fourier series of F is divergent at x = 0. ♦
Remark 855 The first example of a continuous 2π-periodic function whose Fourier series diverges at some point was given by the German mathematician P. du Bois-Reymond in 1876. In 1913, the Russian mathematician N. N. Luzin conjectured that the Fourier series of a 2π-periodic $L_2$-function (i.e., a measurable scalar-valued 2π-periodic function f such that |f|² is Lebesgue integrable, see Example 565.16; the space consisting of such functions and the problem of convergence of their Fourier series will be considered in Sect. 11.4, in particular in Example 963.2) should converge to the function almost everywhere. This conjecture was proven by the Swedish mathematician L. Carleson in 1966. In contrast with this result, it had been proved by the Russian mathematician A. Kolmogorov, in 1923, that there is a 2π-periodic Lebesgue integrable function whose Fourier series diverges almost everywhere, a result improved in 1926 by the same author by exhibiting a 2π-periodic Lebesgue integrable function whose Fourier series diverges everywhere. Note that Carleson’s result implies, in particular, that the Fourier series of a 2π-periodic continuous function converges to the function almost everywhere. ®
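Returning to Example 854, the two ingredients of Fejér's construction (the product form of $f_n$ and the logarithmic growth of $s_{n,n}(0)$) can be checked numerically. This is a minimal sketch with illustrative function names; it verifies identities for a few sample values and proves nothing about the divergence itself.

```python
import math

def f_n(n, x):
    """f_n(x) as written in (9.50)."""
    # j = 0 contributes the constant 1/n, j = n - 1 contributes cos((n-1)x)/1
    head = sum(math.cos(j * x) / (n - j) for j in range(0, n))
    tail = sum(math.cos((n + k) * x) / k for k in range(1, n + 1))
    return head - tail

def f_n_product_form(n, x):
    """2 sin(nx) * sum_{k=1}^{n} sin(kx)/k, the form used to bound f_n."""
    return 2.0 * math.sin(n * x) * sum(math.sin(k * x) / k for k in range(1, n + 1))

def s_nn_at_zero(n):
    """s_{n,n}(0): the positive half of (9.50) evaluated at x = 0."""
    return sum(1.0 / k for k in range(1, n + 1))

print(f_n(7, 1.3), f_n_product_form(7, 1.3))   # the two forms should agree
for n in (10, 100, 1000):
    print(n, s_nn_at_zero(n), math.log(n))     # s_{n,n}(0) stays above ln n
```

The uniform bound on $f_n$ keeps the series (9.51) uniformly convergent, while the unbounded values $s_{n,n}(0)$ are what force the partial sums of F to blow up along the subsequence $m \mapsto 2^{m^3}$.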
9.6.2 Cesàro Convergence of the Fourier Series
Let f ∈ L[0, 2π]. For m = 0, 1, 2, . . . and x ∈ R, put
\[ \sigma_m(x) := \frac{1}{m+1}\sum_{k=0}^{m} s_k(x), \tag{9.53} \]
Fig. 9.9 The Fejér kernel (Definition 857) in [−π , π ] for m = 0, 1, 2, 3, 4
where $\{s_k\}_{k=0}^{\infty}$ is the sequence of partial sums of the Fourier series of f (see Eq. (9.27)). Note that each $\sigma_m(x)$ is a trigonometrical polynomial. The convergence of averages of numerical or function series is usually called Cesàro convergence, after the Italian mathematician E. Cesàro.
Proposition 856 For f ∈ L[0, 2π] and x ∈ R, using the notation above, we have
\[ \sigma_m(x) = \frac{1}{(m+1)\pi}\int_{[0,\pi]} \frac{f(x+t)+f(x-t)}{2}\cdot\frac{\sin^2\frac{m+1}{2}t}{\sin^2\frac{t}{2}}\,dt. \tag{9.54} \]
Proof We saw in (9.29) that, for k = 0, 1, 2, . . . ,
\[ s_k(x) = \frac{1}{\pi}\int_{[0,\pi]} \frac{f(x+t)+f(x-t)}{2}\cdot\frac{\sin\left(k+\frac{1}{2}\right)t}{\sin\frac{t}{2}}\,dt. \]
Adding up from k = 0 to m and using (9.5), we get (9.54). □
The following definition isolates the kernel that appears in the integral expression of the averages $\sigma_m$ of the first m + 1 partial sums $s_k$ of the Fourier series, and bears the name of the Hungarian mathematician L. Fejér, who proved Theorem 859 below.
Definition 857 For m = 0, 1, 2, . . . , the function
\[ F_m(t) := \begin{cases} \dfrac{1}{m+1}\left(\dfrac{\sin\frac{m+1}{2}t}{\sin\frac{t}{2}}\right)^{2} & \text{if } t \in \mathbb{R},\ t \ne 2k\pi,\ k \in \mathbb{Z}, \\[2ex] m+1 & \text{otherwise,} \end{cases} \tag{9.55} \]
is called the Fejér kernel. Equation (9.54) appears then as
\[ \sigma_m(x) = \frac{1}{\pi}\int_{[0,\pi]} \frac{f(x+t)+f(x-t)}{2}\, F_m(t)\,dt. \tag{9.56} \]
The Fejér kernel is a continuous 2π-periodic function on R, and it is clearly nonnegative (see Fig. 9.9). This last property accounts for the good behaviour of the Fejér kernel (and so for the sequence of averages of the partial sums of the Fourier series of a function), in contrast with the not-so-good behaviour of the Dirichlet kernel (9.3). Note that
\[ F_m(x) = \frac{1}{m+1}\sum_{k=0}^{m} D_k(x), \quad m = 0, 1, 2, \ldots,\ x \in \mathbb{R}, \tag{9.57} \]
and so, according to Eq. (9.32),
\[ \sigma_m = \frac{1}{m+1}\sum_{k=0}^{m} D_k * f = F_m * f, \quad \text{for } m = 0, 1, 2, \ldots, \tag{9.58} \]
where ∗ denotes the convolution operator.
Corollary 858 For m ∈ N we have
\[ \frac{1}{\pi}\int_0^{\pi} F_m(t)\,dt = 1. \]
Proof Take f in Proposition 856 identically equal to 1. Then, as we mentioned above, $s_m(x) = 1$ for m = 0, 1, 2, . . . , and thus also $\sigma_m(x) = 1$ for m = 0, 1, 2, . . . □
Theorem 859 (Fejér) Let f be a complex-valued continuous 2π-periodic function on R. Then $\sigma_m(x) \to f(x)$ when m → ∞, uniformly on R, where the sequence $\{\sigma_m\}_{m=1}^{\infty}$ is defined in Eq. (9.53). In particular, the subspace of all (classes containing) trigonometrical polynomials is dense in the space $(L_1[0, 2\pi], \|\cdot\|_1)$, where $\|\cdot\|_1$ denotes the integral norm $\|f\|_1 := \int_{[0,2\pi]} |f(x)|\,dx$ for $f \in L_1[0, 2\pi]$.
Proof Use Proposition 856 and Corollary 858 to write, for m = 0, 1, 2, . . . , x ∈ R, and δ ∈ (0, π),
\begin{align} \sigma_m(x) - f(x) = {} & \frac{1}{\pi(m+1)}\int_{[0,\delta]} \frac{f(x+t)+f(x-t)-2f(x)}{2}\cdot\frac{\sin^2\frac{m+1}{2}t}{\sin^2\frac{t}{2}}\,dt \tag{9.59} \\ & + \frac{1}{\pi(m+1)}\int_{[\delta,\pi]} \frac{f(x+t)+f(x-t)-2f(x)}{2}\cdot\frac{\sin^2\frac{m+1}{2}t}{\sin^2\frac{t}{2}}\,dt. \tag{9.60} \end{align}
We are to show that, given ε > 0, there is m₀ so that $|\sigma_m(x) - f(x)| < \varepsilon$ for all m > m₀ and for all x ∈ R. To this end, note that, since f is uniformly continuous on R (it is continuous and 2π-periodic), there is δ ∈ (0, 1) so that |f(x + t) + f(x − t) − 2f(x)| < ε for all x ∈ R and for all |t| < δ.
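Then, since $\left(\frac{\sin((m+1)t/2)}{\sin(t/2)}\right)^{2} \ge 0$, we have, in view of Corollary 858,
\[ |(9.59)| \le \frac{1}{\pi(m+1)}\int_0^{\delta} \frac{\varepsilon}{2}\cdot\frac{\sin^2\frac{m+1}{2}t}{\sin^2\frac{t}{2}}\,dt \le \frac{\varepsilon}{2\pi}\int_0^{\pi} F_m(t)\,dt = \varepsilon/2. \tag{9.61} \]
Keeping this δ > 0 fixed, and noticing that f is a bounded function, say |f(x)| ≤ K for all x ∈ R, we have
\[ |(9.60)| \le 2K\,\frac{1}{\pi(m+1)\sin^2(\delta/2)}\int_{\delta}^{\pi} \sin^2\frac{m+1}{2}t\,dt \le \frac{2K\pi}{\pi(m+1)\sin^2(\delta/2)} \le \varepsilon/2, \tag{9.62} \]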
if m is sufficiently large. The conclusion follows from (9.61) and (9.62). To prove the particular case, note first that the continuous functions on [0, 2π] form a $\|\cdot\|_1$-dense subspace of $L_1[0, 2\pi]$ (see Corollary 739). Second, a consequence of the first part of Theorem 859 is that the trigonometric polynomials form a $\|\cdot\|_\infty$-dense (hence $\|\cdot\|_1$-dense) subspace of the space of all continuous functions on [0, 2π]. These two facts together thus prove the last statement. □
Remark 860 We remark that Fejér’s Theorem 859 provides an alternative proof of Weierstrass’ approximation Theorem 490. Indeed, assume without loss of generality that f is a continuous function on [0, 2π] (otherwise transfer a given interval to [0, 2π] by a linear map). Extend f periodically on R and write f as a uniform limit of the sequence $\{\sigma_m\}_{m=0}^{\infty}$ by Fejér’s theorem. Note finally that each function sin nx and cos nx is, on bounded intervals, the uniform limit of its Taylor polynomials (see also Exercise 13.228). ®
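The three properties of the Fejér kernel used in the proof of Theorem 859 (it is the average of Dirichlet kernels, it is nonnegative, and its normalized integral over [0, π] equals 1) can be checked numerically. The sketch below is illustrative only; the function names and the midpoint quadrature are assumptions of this example.

```python
import math

def dirichlet_kernel(k, t):
    """D_k(t) written as 1 + 2*sum_{j=1}^{k} cos(jt)."""
    return 1.0 + 2.0 * sum(math.cos(j * t) for j in range(1, k + 1))

def fejer_kernel_average(m, t):
    """F_m(t) as the average of D_0, ..., D_m, see (9.57)."""
    return sum(dirichlet_kernel(k, t) for k in range(m + 1)) / (m + 1)

def fejer_kernel_closed(m, t):
    """F_m(t) from (9.55); only valid when t is not a multiple of 2*pi."""
    return (math.sin(0.5 * (m + 1) * t) / math.sin(0.5 * t)) ** 2 / (m + 1)

def mean_over_0_pi(m, panels=2000):
    """Midpoint approximation of (1/pi) * integral of F_m over [0, pi]."""
    h = math.pi / panels
    return h * sum(fejer_kernel_closed(m, (j + 0.5) * h) for j in range(panels)) / math.pi

for m in range(5):
    # value at 0 (expected m + 1), agreement of both forms at t = 0.9, normalized integral
    print(m, fejer_kernel_average(m, 0.0),
          fejer_kernel_average(m, 0.9) - fejer_kernel_closed(m, 0.9),
          mean_over_0_pi(m))
```

Nonnegativity is visible from the closed form (a square divided by m + 1); it is exactly what allows the estimate (9.61), and it has no counterpart for the Dirichlet kernel.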
9.6.3 Uniform Convergence of the Fourier Series
The partial sums of the Fourier series of a 2π-periodic function are, certainly, continuous (2π-periodic) functions. If the Fourier series of f is to converge uniformly to f, a necessary condition is that f be continuous (see Theorem 463). The following result gives a sufficient condition for uniform convergence of the Fourier series of a 2π-periodic function. Its proof will be postponed until some tools from the theory of Hilbert spaces are at hand (see Sect. 11.4.2 and, in particular, Theorem 985).
Theorem 861 Let f be a Lipschitz 2π-periodic function. Then the Fourier series of f converges to f uniformly on R.
Example 862 Consider the 2π-periodic odd extension Of of the function f(x) = x, x ∈ [0, π]. The function Of is not continuous on R, hence the convergence of its Fourier series cannot be uniform. However, if Ef denotes the 2π-periodic even extension of f, the function Ef is Lipschitz on R, and its Fourier series converges uniformly (to Ef). Fragments of the graphs of Of and Ef are depicted in Fig. 9.10. ♦
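The contrast described in Example 862 can be made visible numerically. The sketch below is not taken from the text: it assumes the standard closed forms of the two series (for Ef, which equals |x| on [−π, π], the series π/2 − (4/π) Σ cos(kx)/k² over odd k; for Of, which equals x on (−π, π), the series 2 Σ (−1)^{k+1} sin(kx)/k), and it measures the largest error on a finite grid only.

```python
import math

def partial_sum_even(x, m):
    """Partial sum (frequencies <= m) of the Fourier series of |x| on [-pi, pi]."""
    return math.pi / 2 - (4 / math.pi) * sum(
        math.cos(k * x) / k ** 2 for k in range(1, m + 1, 2))

def partial_sum_odd(x, m):
    """Partial sum (frequencies <= m) of the Fourier series of x on (-pi, pi)."""
    return 2 * sum((-1) ** (k + 1) * math.sin(k * x) / k for k in range(1, m + 1))

def sup_error(partial_sum, target, m, points=2000):
    """Largest error on a midpoint grid of (-pi, pi); a proxy for the sup norm."""
    h = 2 * math.pi / points
    grid = (-math.pi + (j + 0.5) * h for j in range(points))
    return max(abs(partial_sum(x, m) - target(x)) for x in grid)

for m in (10, 50, 200):
    print(m,
          sup_error(partial_sum_even, abs, m),           # shrinks with m
          sup_error(partial_sum_odd, lambda x: x, m))    # stays large near the jump
```

The first column of errors decreases towards 0, as Theorem 861 guarantees for the Lipschitz function Ef, while the second stays of the order of the jump of Of near the odd multiples of π.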
Fig. 9.10 Two different extensions of f in Example 862
9.6.4 Convergence of the Fourier Series in $\|\cdot\|_1$
Recall that the space $L_1[0, 2\pi]$ may be endowed with the so-called integral norm $\|\cdot\|_1$, i.e., the norm defined by $\|f\|_1 := \int_{[0,2\pi]} |f(x)|\,dx$ for $f \in L_1[0, 2\pi]$, and that in this way it becomes a Banach space (i.e., a complete normed space, see Definition 896 and Exercise 13.485). The general argument behind the proof of the $\|\cdot\|_1$-continuity of the convolution operator is beyond the scope of this book, essentially because it needs Fubini’s Theorem on the interchange of the order of integration in a double integral (a result named after the Italian mathematician G. Fubini). However, we can complete the proof of the case that one of the functions is the Dirichlet kernel, say, without relying on the full version of that delicate result (a result that, in order to be properly presented, needs to introduce the product of two measures). For the general setting we shall provide some references. We need first some background.
Given m = 0, 1, 2, . . . , denote by $S_m$ the (linear) operator from L[0, 2π] into itself given by $S_m f(x) = \sum_{k=-m}^{m} c_k e^{ikx}$ for x ∈ R (where each $c_k$ is given by (9.22)), i.e., the mapping that to a function f ∈ L[0, 2π] associates the m-th partial sum of its Fourier series (see Eq. (9.27)). Clearly each $S_m$ is linear. That it is continuous in the $\|\cdot\|_1$ norm is a consequence of the following facts:
(i) As was mentioned in Remark 841, $S_m f = D_m * f$ (see formula (9.32)), where ∗ denotes the convolution of two functions (defined in Exercise 13.227 for the case of continuous functions and in Definition 1060 for the general case of two distributions) and $D_m$ is the Dirichlet kernel.
(ii) For some of the statements here we need Fubini’s Theorem if we are in the general setting of two Lebesgue integrable functions (for a reference, see, e.g., [Wi69]): The convolution of two functions g and h in L[0, 2π] exists (a.e.) on [0, 2π], and belongs to L[0, 2π]. Moreover, $\|g * h\|_1 \le (2\pi)^{-1}\|g\|_1\,\|h\|_1$.
However, we may rely on a reduced version of Fubini’s Theorem in this context to conclude that $\|S_m f\|_1 = \|D_m * f\|_1 \le (2\pi)^{-1}\|D_m\|_1\,\|f\|_1$. This will show in particular that the operator $S_m$ is $\|\cdot\|_1$-$\|\cdot\|_1$-continuous. This is done in Corollaries 864 and 865, a consequence of the following result.
Proposition 863 Let M be a nonempty Lebesgue measurable set in R, let I := (a, b) be a nonempty bounded open interval in R, and let f(x, α) be a complex-valued function on M × I such that
(i) For every α ∈ I, the function x ↦ f(x, α) is Lebesgue measurable on M.
(ii) There exists α₀ ∈ I such that the function x ↦ f(x, α₀) is Lebesgue integrable on M.
(iii) There is a set N of measure zero in M such that for every x ∈ M \ N and all α ∈ I, the derivative $\frac{\partial f}{\partial \alpha}(x, \alpha)$ exists and is finite, and the function $\alpha \mapsto \frac{\partial f}{\partial \alpha}(x, \alpha)$ defined on I is measurable.
(iv) There is a function g ∈ L(M) such that for all x ∈ M \ N (where N is the set in (iii)) and for all α ∈ I, $\left|\frac{\partial f}{\partial \alpha}(x, \alpha)\right| \le g(x)$.
Then we have
\[ \int_I\left(\int_M \frac{\partial f}{\partial \alpha}(x, \alpha)\,dx\right)d\alpha = \int_M\left(\int_I \frac{\partial f}{\partial \alpha}(x, \alpha)\,d\alpha\right)dx. \]
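Proof Define $F(\alpha) := \int_M f(x, \alpha)\,dx$ for α ∈ I. According to Theorem 802, F is well defined and differentiable, the function $x \mapsto \frac{\partial f}{\partial \alpha}(x, \alpha)$ belongs to L(M), and $F'(\alpha) = \int_M \frac{\partial f}{\partial \alpha}(x, \alpha)\,dx$. We claim that the function F is absolutely continuous. Indeed, given ε > 0, take $\delta := \varepsilon/\|g\|_1$. Then, given a finite collection of nonoverlapping intervals $\{(\alpha_i, \beta_i)\}_{i=1}^{n}$ in I such that $\sum_{i=1}^{n} (\beta_i - \alpha_i) < \delta$, note first that the Mean Value Theorem 365 provides, for a given x ∈ M \ N, an element $\gamma_i \in (\alpha_i, \beta_i)$ (depending on x) such that $|f(x, \beta_i) - f(x, \alpha_i)| = \left|\frac{\partial f}{\partial \alpha}(x, \gamma_i)\right|\cdot(\beta_i - \alpha_i) \le |g(x)|\cdot(\beta_i - \alpha_i)$ for all i = 1, 2, . . . , n. Thus,
\[ \begin{aligned} \sum_{i=1}^{n} |F(\beta_i) - F(\alpha_i)| &= \sum_{i=1}^{n}\left|\int_M (f(x, \beta_i) - f(x, \alpha_i))\,dx\right| \\ &\le \sum_{i=1}^{n}\int_M |f(x, \beta_i) - f(x, \alpha_i)|\,dx \\ &\le \sum_{i=1}^{n}\int_M |g(x)|(\beta_i - \alpha_i)\,dx \\ &= \|g\|_1\sum_{i=1}^{n} (\beta_i - \alpha_i) < \delta\|g\|_1 = \varepsilon, \end{aligned} \]
as we claimed. The Fundamental Theorem 791 of the Calculus gives, then,
\[ \int_I\left(\int_M \frac{\partial f}{\partial \alpha}(x, \alpha)\,dx\right)d\alpha = \int_I F'(\alpha)\,d\alpha = F(b) - F(a). \tag{9.63} \]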
On the other hand, note that for x ∈ M \ N, the function $\alpha \mapsto \frac{\partial f}{\partial \alpha}(x, \alpha)$ defined on I is measurable there, and its modulus is bounded by the constant function α ↦ g(x). Thus, the function $\alpha \mapsto \frac{\partial f}{\partial \alpha}(x, \alpha)$ is Lebesgue integrable on I. A new application of Theorem 791 shows that $\int_I \frac{\partial f}{\partial \alpha}(x, \alpha)\,d\alpha = f(x, b) - f(x, a)$, hence
\[ \int_M\left(\int_I \frac{\partial f}{\partial \alpha}(x, \alpha)\,d\alpha\right)dx = \int_M (f(x, b) - f(x, a))\,dx = F(b) - F(a). \tag{9.64} \]
Combining (9.63) and (9.64) we get the result. □
Corollary 864 Let I := [a, b] and J be two closed and bounded intervals in R, f ∈ L(J) and φ : J × I → C be a complex-valued continuous function. Then
\[ \int_I\left(\int_J \varphi(x, \alpha) f(x)\,dx\right)d\alpha = \int_J\left(\int_I \varphi(x, \alpha) f(x)\,d\alpha\right)dx. \]
Proof Let $\Phi(x, \alpha) := \int_a^{\alpha} \varphi(x, s)\,ds$ for x ∈ J and α ∈ [a, b]. Put F(x, α) := Φ(x, α)·f(x), for (x, α) ∈ J × I. Since Φ and φ are both bounded functions, F satisfies (i) to (iv) in Proposition 863. It is enough to observe that ∂Φ/∂α = φ. □
For the next corollary, recall that ∗ denotes the convolution of two functions, defined in Exercise 13.227 for the class of continuous functions and, more generally, in Definition 1060 for the class of periodic distributions.
Corollary 865 Let f : [0, 2π] → C be a Lebesgue integrable function. Then, for every m ∈ N ∪ {0},
\[ \|D_m * f\|_1 \le \frac{1}{2\pi}\,\|D_m\|_1\,\|f\|_1, \]
where for m = 0, 1, 2, . . . , $D_m$ is the Dirichlet kernel (Definition 840).
Proof Equation (9.28) shows that, after a translation of the interval,
\[ S_m f(x) = (D_m * f)(x) = \frac{1}{2\pi}\int_{[0,2\pi]} f(v)\,D_m(x - v)\,dv \]
for any x ∈ [0, 2π]. Note that Corollary 864 applies, and so
\[ \begin{aligned} \|D_m * f\|_1 &= \int_{[0,2\pi]} |(D_m * f)(x)|\,dx = \frac{1}{2\pi}\int_{[0,2\pi]}\left|\int_{[0,2\pi]} f(v)\,D_m(x - v)\,dv\right|dx \\ &\le \frac{1}{2\pi}\int_{[0,2\pi]}\left(\int_{[0,2\pi]} |f(v)|\,|D_m(x - v)|\,dv\right)dx \\ &= \frac{1}{2\pi}\int_{[0,2\pi]}\left(\int_{[0,2\pi]} |f(v)|\,|D_m(x - v)|\,dx\right)dv \\ &= \frac{1}{2\pi}\int_{[0,2\pi]} |f(v)|\,dv\cdot\int_{[0,2\pi]} |D_m(x)|\,dx = \frac{1}{2\pi}\,\|f\|_1\,\|D_m\|_1, \end{aligned} \]
the third equality due to Corollary 864, and the next one to the fact that $D_m$ is a 2π-periodic function. □
We may prove now the following intermediate result (we shall show later that (i) fails; thus (ii) will fail, too). In fact, we need just the implication (ii)⇒(i). However, the two statements in Lemma 866 are equivalent, something that is of independent interest.
Lemma 866 The two following statements are equivalent.
(i) If, given m ∈ {0, 1, 2, . . . }, $S_m$ denotes the linear operator from $L_1[0, 2\pi]$ into itself that to a function associates the m-th partial sum of its Fourier series, then the sequence $\{\|S_m\|\}_{m=1}^{\infty}$ is bounded.
(ii) For every $f \in L_1[0, 2\pi]$, the Fourier series of f converges to f in the norm $\|\cdot\|_1$.
Proof (i)⇒(ii) follows from the fact that the trigonometric polynomials form a dense subspace of $(L_1[0, 2\pi], \|\cdot\|_1)$ (see Theorem 859). Indeed, assume that $\|S_m\| \le K$ for all m ∈ N ∪ {0}, for some constant K > 0. Given ε > 0 there exists a trigonometrical polynomial P such that $\|f - P\|_1 < \varepsilon$. Thus, $\|S_m f - S_m P\|_1 \le \|S_m\|\cdot\|f - P\|_1 < \varepsilon K$ for all m ∈ N ∪ {0}. By letting m be big enough, we have $S_m P = P$, hence $\|S_m f - P\|_1 < \varepsilon K$. It follows that $\|f - S_m f\|_1 < (1 + K)\varepsilon$ for m big enough. (ii)⇒(i) follows from Theorem 95 below. □
We conclude that there are functions f in $L_1[0, 2\pi]$ such that their Fourier series do not converge in $\|\cdot\|_1$. Indeed, the sequence $\{\|S_m\|\}_{m=0}^{\infty}$ is unbounded (see Exercise 13.533 and estimate $\|S_m(D_n)\|_1$ ($= \|D_m\|_1$ for n ≥ m)), and we can apply Lemma 866.
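The unboundedness invoked here is left to Exercise 13.533; it can at least be made plausible numerically by computing the normalized norms $(2\pi)^{-1}\|D_m\|_1$ for growing m. The sketch below is only an illustration under stated assumptions (a midpoint rule on [0, 2π] and the closed form of $D_m$ away from multiples of 2π); it suggests slow, logarithmic-looking growth but proves nothing.

```python
import math

def dirichlet_l1_norm_over_2pi(m, panels=20000):
    """Midpoint approximation of (1/(2*pi)) * integral over [0, 2*pi] of |D_m|."""
    h = 2 * math.pi / panels
    total = 0.0
    for j in range(panels):
        t = (j + 0.5) * h  # midpoints never hit the zeros of sin(t/2)
        total += abs(math.sin((m + 0.5) * t) / math.sin(0.5 * t))
    return total * h / (2 * math.pi)

for m in (0, 1, 10, 100):
    print(m, dirichlet_l1_norm_over_2pi(m))
```

The printed values increase with m, in contrast with the Fejér kernel, whose normalized integral stays equal to 1 because the kernel is nonnegative.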
9.6.5 Mean Square Convergence of the Fourier Series
The important case of a measurable 2π-periodic scalar-valued function f defined on R such that |f|² is Lebesgue integrable will be considered in the framework of the so-called Hilbert spaces (Sect. 11.4). We shall prove there that in such a case the Fourier series of f converges to f in the norm $\|g\|_2 := \left(\int_{[0,2\pi]} |g(x)|^2\,dx\right)^{1/2}$, a kind of convergence usually referred to as mean square.
9.7 The Fourier Integral
A simple change of variable turns a function f ∈ L[a, b], where [a, b] is a nondegenerate closed and bounded interval in R, into a function defined on [0, 2π] (see Remark 836). We may also change the value of f at one of the endpoints to get f(0) = f(2π) if necessary, and this does not modify the value of the integrals involved in computing its Fourier coefficients. All this allows us to develop the theory of Fourier series, with almost no changes, for Lebesgue integrable functions defined on any nondegenerate closed and bounded interval. The resulting Fourier series is a periodic function, so there is no way to represent a (in general, nonperiodic) Lebesgue integrable function defined on R by using this technique. However, there is an alternative procedure for functions f ∈ L(R), using integrals instead, that in some sense mimics the Fourier series approach. In order to suggest the kind of integral expressions involved, and to show how the Fourier series expansion of a periodic Lebesgue integrable function may inform
about it, let us proceed, informally, in the following way (we follow an idea in [My73]): Write the Fourier series of the restriction of the function f to an interval [−l, l] as a 2l-periodic function (l a positive number), and then let l → +∞. For this we need the expression of the Fourier series and the Fourier coefficients of the 2l-periodic extension of a Lebesgue integrable function defined on [−l, l]. By using (9.23), (9.24) and (9.25) for a = −l and b = l we may write
\[ f(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty}\left(a_n\cos\frac{n\pi x}{l} + b_n\sin\frac{n\pi x}{l}\right), \tag{9.65} \]
\[ \text{where } a_n = \frac{1}{l}\int_{[-l,l]} f(t)\cos\frac{n\pi t}{l}\,dt, \quad n = 0, 1, 2, \ldots, \tag{9.66} \]
\[ \text{and } b_n = \frac{1}{l}\int_{[-l,l]} f(t)\sin\frac{n\pi t}{l}\,dt, \quad n = 1, 2, \ldots \tag{9.67} \]
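Carrying (9.66) and (9.67) into (9.65) we get
\[ \begin{aligned} f(x) &\sim \frac{1}{2l}\int_{[-l,l]} f(t)\,dt + \frac{1}{l}\sum_{n=1}^{\infty}\left(\int_{[-l,l]} f(t)\cos\frac{n\pi t}{l}\,dt\cdot\cos\frac{n\pi x}{l} + \int_{[-l,l]} f(t)\sin\frac{n\pi t}{l}\,dt\cdot\sin\frac{n\pi x}{l}\right) \\ &= \frac{1}{2l}\int_{[-l,l]} f(t)\,dt + \frac{1}{l}\sum_{n=1}^{\infty}\int_{[-l,l]} f(t)\cos\left(\frac{n\pi}{l}(t - x)\right)dt. \end{aligned} \]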
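As mentioned, we shall find the limit of the last expression as $l \to +\infty$. Note first that
$$\left|\frac{1}{2l}\int_{[-l,l]} f(t)\,dt\right| \le \frac{1}{2l}\int_{-\infty}^{+\infty} |f(t)|\,dt \to 0, \quad \text{as } l \to +\infty.$$
If $\alpha_n := n\pi/l$ and $\Delta\alpha_n := \alpha_{n+1} - \alpha_n$ $(= \pi/l)$ for $n \in \mathbb{N}$, we can put
$$\frac{1}{l}\sum_{n=1}^{\infty}\int_{[-l,l]} f(t)\cos\left(\frac{n\pi}{l}(t - x)\right)dt = \sum_{n=1}^{\infty} F(\alpha_n)\,\Delta\alpha_n, \tag{9.68}$$
where
$$F(\alpha) := \frac{1}{\pi}\int_{[-l,l]} f(t)\cos(\alpha(t - x))\,dt.$$
Finally, when $l \to +\infty$ we get that (9.68) should approach $\int_0^{+\infty} F(\alpha)\,d\alpha$, due to the fact that $\Delta\alpha_n \to 0$. Thus,
$$f(x) \sim \frac{1}{\pi}\int_0^{+\infty}\int_{(-\infty,+\infty)} f(t)\cos(\alpha(t - x))\,dt\,d\alpha, \tag{9.69}$$
and the right-hand side in (9.69) is the Fourier integral representation of $f$.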
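The informal formula (9.69) can be tested numerically. The following Python sketch is only an illustration under stated assumptions: the sample function $f(t) = e^{-|t|}$, the truncation of both integrals to bounded intervals, the midpoint rule, and all function names are choices made here, not part of the text.

```python
import math

def f(t):
    """Sample integrable function (an assumption of this sketch)."""
    return math.exp(-abs(t))

def inner_integral(alpha, x, half_width=15.0, steps=1500):
    """Midpoint rule for the integral of f(t) cos(alpha (t - x)) over
    [-half_width, half_width], standing in for the integral over R."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        t = -half_width + (i + 0.5) * h
        total += f(t) * math.cos(alpha * (t - x))
    return total * h

def fourier_integral(x, alpha_max=40.0, steps=800):
    """Midpoint rule for (1/pi) times the integral over [0, alpha_max] of the
    inner integral, a truncated version of the right-hand side of (9.69)."""
    h = alpha_max / steps
    total = 0.0
    for j in range(steps):
        total += inner_integral((j + 0.5) * h, x)
    return total * h / math.pi

x0 = 1.0
approx = fourier_integral(x0)
print(approx, f(x0))  # the two numbers should be close
```

The agreement is only approximate, since both integrals are truncated and discretized; enlarging `alpha_max` and the number of steps should reduce the gap.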
Of course, this is just an informal approach to the integral representation of $f$ as given by (9.69). Let us formulate and prove a precise result. The reader should compare the requirements with those for the pointwise convergence of the Fourier series of a Lebesgue integrable periodic function as in Proposition 846.
Theorem 867 Let $f \in L(\mathbb{R})$. Fix $x \in \mathbb{R}$ and assume that there exists $\delta > 0$ such that the two limits $f(x+)$ and $f(x-)$ exist and
$$\lim_{\alpha\to+\infty}\int_{[0,\delta]} g(t)\,\frac{\sin\alpha t}{t}\,dt = \frac{\pi}{2}\,g(0+)$$
for the functions $g(t) := f(x + t)$ and $g(t) := f(x - t)$, defined in $[0, \delta)$. Then
$$\frac{f(x+) + f(x-)}{2} = \frac{1}{\pi}\int_0^{+\infty}\int_{(-\infty,+\infty)} f(t)\cos\omega(t - x)\,dt\,d\omega, \tag{9.70}$$
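where the integral with respect to $\omega$ is understood in the sense of an improper Riemann integral.
Proof Note that
$$\int_{(-\infty,+\infty)} f(x + u)\,\frac{\sin\alpha u}{u}\,du = \left(\int_{(-\infty,-\delta)} + \int_{[-\delta,0]} + \int_{[0,\delta]} + \int_{(\delta,+\infty)}\right) f(x + u)\,\frac{\sin\alpha u}{u}\,du \tag{9.71}$$
and that the first and last integrals in the right-hand side of the former equality tend to 0 as $\alpha \to +\infty$, in view of the Riemann–Lebesgue Lemma 837. By assumption, the third integral converges to $(\pi/2)f(x+)$, and by changing the variable, the second integral converges to $(\pi/2)f(x-)$. This shows that
$$\frac{1}{\pi}\int_{(-\infty,+\infty)} f(x + u)\,\frac{\sin\alpha u}{u}\,du \to \frac{f(x+) + f(x-)}{2}, \quad \text{as } \alpha \to +\infty. \tag{9.72}$$
In the integral in (9.72) put $x + u = t$ to get
$$\frac{1}{\pi}\int_{(-\infty,+\infty)} f(x + u)\,\frac{\sin\alpha u}{u}\,du = \frac{1}{\pi}\int_{(-\infty,+\infty)} f(t)\,\frac{\sin\alpha(t - x)}{t - x}\,dt.$$
Observe that
$$\int_0^{\alpha}\cos\omega(t - x)\,d\omega = \frac{\sin\alpha(t - x)}{t - x},$$
hence
$$\lim_{\alpha\to+\infty}\frac{1}{\pi}\int_{(-\infty,+\infty)} f(t)\left(\int_0^{\alpha}\cos\omega(t - x)\,d\omega\right)dt = \frac{f(x+) + f(x-)}{2}. \tag{9.73}$$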
We can apply Proposition 863 to reverse the order of integration in the iterated integrals in (9.73). This shows, then, that
$$\lim_{\alpha\to+\infty}\frac{1}{\pi}\int_0^{\alpha}\int_{(-\infty,+\infty)} f(t)\cos\omega(t - x)\,dt\,d\omega = \frac{f(x+) + f(x-)}{2}. \tag{9.74}$$
The function $\omega \mapsto \int_{(-\infty,+\infty)} f(t)\cos\omega(t - x)\,dt$ is continuous on $[0, +\infty)$, as follows from Theorem 801. Since the limit in (9.74) exists, the improper Riemann integral in the left-hand side of the following equality exists, and its value is the right-hand side there, i.e.,
$$\frac{1}{\pi}\int_0^{+\infty}\int_{(-\infty,+\infty)} f(t)\cos\omega(t - x)\,dt\,d\omega = \frac{f(x+) + f(x-)}{2}.$$
Corollary 868 Let $f \in L(\mathbb{R})$. Fix $x \in \mathbb{R}$ and assume that at least one of the two following conditions is satisfied:
(i) Both limits $f(x+)$ and $f(x-)$ exist, and $f'_{\pm}(x)$ exists (and is finite), in the sense that both limits $\lim_{h\to 0+}\frac{f(x+h) - f(x+)}{h}$ and $\lim_{h\to 0-}\frac{f(x+h) - f(x-)}{h}$ exist and are finite.
(ii) There exists $\delta > 0$ such that $f$ is of bounded variation on $(x - \delta, x + \delta)$.
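Then Eq. (9.70) holds.
Remark 869 There is a more symmetric way to write (9.70) by using complex exponentials. Note that $F(\omega) := \int_{(-\infty,+\infty)} f(t)\cos\omega(t - x)\,dt$ is an even and continuous function, so we have $\frac{1}{\pi}\int_0^{+\infty} F(\omega)\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty} F(\omega)\,d\omega$. If we define $G(\omega) := \int_{(-\infty,+\infty)} f(t)\sin\omega(t - x)\,dt$ for $\omega \in \mathbb{R}$, the function $G$ is odd and continuous, hence $\frac{1}{\pi}\int_{-r}^{r} G(\omega)\,d\omega = 0$ for every $r > 0$. Thus, under the assumptions of Theorem 867, we get, by computing $\frac{1}{2\pi}\int_{-\infty}^{+\infty}(F(\omega) + iG(\omega))\,d\omega$,
$$\frac{1}{2\pi}\int_{-\infty}^{+\infty}\int_{(-\infty,+\infty)} f(t)\,e^{i\omega(t - x)}\,dt\,d\omega = \frac{f(x+) + f(x-)}{2}, \tag{9.75}$$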
where, as above, the integral with respect to $\omega$ must be understood as an improper Riemann integral.
Remark 870 Equation (9.75) should be understood in the following sense: A function $f$ in the variable $x \in \mathbb{R}$ is changed into another function $F$ in the variable $\omega \in \mathbb{R}$ by using a so-called integral transform, i.e.,
$$F(\omega) := \int_I f(x)K(x, \omega)\,dx, \tag{9.76}$$
where $K$ is a function in two variables called the kernel of the transformation. The main idea, already explicit in some other parts of this text, is that an integral is a very stable operation that “improves” the quality of a function. It is instructive, to illustrate the point, to consider the following simple example: Let $K : [0, 1] \times [0, 1] \to \mathbb{R}$ be defined by
$$K(x, \omega) := \begin{cases} 1 & \text{if } x \le \omega, \\ 0 & \text{otherwise.} \end{cases}$$
For a plot of the graph of $K$ see Fig. 9.11.
Fig. 9.11 The kernel K in Remark 870
Note that $K$ is not even continuous. However, the “improvement” that formula (9.76) produces when applied to any function is substantial. Indeed, fix $\omega \in [0, 1]$. Then $F(\omega) = \int_0^{\omega} f(x)\,dx$. Observe that, even in the case that $f$ is merely continuous, the “transformed” function $F$ is differentiable (in fact, it is a primitive function of $f$). If, more generally, $f$ is a Lebesgue integrable function on $[0, 1]$, we know by Theorem 799 that $F$ is (a.e.) differentiable on $[0, 1]$ with (a.e.) derivative $f$.
An integral transformation widely used in applications is the so-called Fourier transform. It is defined by
$$\mathcal{F}(f)(\omega) := \int_{(-\infty,+\infty)} f(x)\,e^{i\omega x}\,dx, \quad \omega \in \mathbb{R}, \tag{9.77}$$
for a function $f \in L(\mathbb{R})$. It corresponds to formula (9.76) for the kernel $K(x, \omega) := e^{i\omega x}$ defined on $\mathbb{R} \times \mathbb{R}$. Observe that, under the assumptions of Theorem 867, we have
$$\frac{f(x+) + f(x-)}{2} = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\int_{(-\infty,+\infty)} f(t)\,e^{i\omega(t - x)}\,dt\,d\omega$$
$$= \frac{1}{2\pi}\int_{-\infty}^{+\infty}\left(\int_{(-\infty,+\infty)} f(t)\,e^{i\omega t}\,dt\right)e^{-i\omega x}\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\mathcal{F}(f)(\omega)\,e^{-i\omega x}\,d\omega. \tag{9.78}$$
The way (9.78) should be understood is, finally, that $f$ can be “recovered” (in the wide sense of recovering $(1/2)(f(x+) + f(x-))$) under some special requirements (see Theorem 867) from its Fourier transform $\mathcal{F}(f)$. Theorem 867 (also in its exponential version in Remark 869) should then properly be called the Inversion Fourier Transform Theorem.
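The smoothing effect of the indicator kernel in Remark 870 can be illustrated numerically. This is a minimal Python sketch, assuming the sample function $f = \cos$ and a midpoint-rule discretization of (9.76) on $I = [0, 1]$; the names `kernel` and `transform` are hypothetical.

```python
import math

def kernel(x, omega):
    """Discontinuous kernel of Remark 870: 1 if x <= omega, 0 otherwise."""
    return 1.0 if x <= omega else 0.0

def transform(f, omega, steps=2000):
    """Midpoint-rule approximation of F(omega), the integral over [0, 1]
    of f(x) K(x, omega) dx, as in (9.76)."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) * kernel((i + 0.5) * h, omega)
                   for i in range(steps))

# With f = cos the transform should be close to sin(omega),
# a primitive of f, in agreement with F(omega) = integral of f over [0, omega].
F_half = transform(math.cos, 0.5)
print(F_half, math.sin(0.5))
```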
Chapter 10
Basics on Descriptive Statistics
Probability is a part of measure theory. This short chapter tries to describe what is called Discrete Probability Theory or, sometimes, “Descriptive Statistics”.
10.1 Discrete Probability
10.1.1 Introduction
We start with a simple example in order to present the basic terms and ideas.
Example 871 In this example we play with dice (coins, a simpler “device,” will be considered later). Faces are numbered from 1 to 6. The output is the face up after rolling a dice. We collect the outputs of the “experiment” and form with them a set $\Omega$. In this particular case, $\Omega := \{1, 2, 3, 4, 5, 6\}$ (the reader should have in mind that $\Omega$, in other instances, may consist of elements other than numbers, for example $\Omega := \{\text{head}, \text{tail}\}$; of course, we shall try to “associate” numbers to those issues, and this is the purpose of what will be called a random variable). Now we want to convey the idea that each of these outputs comes with an assigned “weight” (technically speaking, the probability), i.e., the “chance” to get this output in the process of rolling the dice. To give a precise definition of this chance can be very elusive. It will be acceptable to say that it is the limit, as $n \to \infty$, of the quotient of the number of times the output appeared in throwing the dice $n$ times, and $n$ (i.e., “favorable cases”/“possible cases”). This appeals to a kind of intuition; indeed, nobody will try to fix the probability by counting up to trillions of experiments. So, by “fiat,” we accept that the probability of each face is 1/6 if the dice is “fair” (after all, there are six faces). The reader with a critical sense will immediately discover the circularity of the “definition”: a dice is “fair” if the chances to get one particular face are the same as the chances to get any other one. A dice is “biased” in other cases. In order to illustrate several possibilities, let us assume that we have two dice: dice F is fair, dice B is biased. This amounts to endowing $\Omega$ with a measure (more precisely, a “probability measure”), a different one according to the case (measures $P_F$ and $P_B$, respectively). It will be enough to assign a measure (a “probability”) to each
© Springer International Publishing Switzerland 2015 V. Montesinos et al., An Introduction to Modern Analysis, DOI 10.1007/978-3-319-12481-0_10
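elementary event (i.e., each subset of $\Omega$ having a single element). Precisely,
$$\begin{array}{c|cccccc} \text{Output} & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline P_F & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ P_B & 0.2 & 0.1 & 0.1 & 0.2 & 0.2 & 0.2 \end{array} \tag{10.1}$$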
♦
The following definition is a particular instance of a general probability space, i.e., a triplet $(\Omega, \Sigma, P)$, where $\Omega$ is a nonempty set, $P$ is a probability measure on $\Omega$, i.e., a measure $P$ on $\Omega$ such that $P(\Omega) = 1$, and $\Sigma$ is the family of all $P$-measurable subsets of $\Omega$. Recall that $\mathcal{P}(\Omega)$ denotes the family of all subsets of a set $\Omega$.
Definition 872 A discrete probability space is a probability space $(\Omega, \mathcal{P}(\Omega), P)$ such that $\Omega$ is a nonempty finite or countably infinite set. The discrete probability space is called finite if $\Omega$ is a finite set.
We already mentioned that subsets $\{\omega\}$, where $\omega \in \Omega$, are called elementary events. Note that $\sum_{\omega\in\Omega} P(\{\omega\}) = 1$ (since all summands in the previous series are nonnegative, the series is unconditionally convergent). In general, subsets $W$ of $\Omega$ (i.e., elements of $\mathcal{P}(\Omega)$) are called events. Observe that, due to the countable additivity of $P$, we may compute $P(W)$ for any event $W$ as soon as the probability $P(\{\omega\})$ of each elementary event is known. Indeed (and again the series converges unconditionally),
$$P(W) = \sum_{\omega\in W} P(\{\omega\}), \quad \text{for } W \in \mathcal{P}(\Omega). \tag{10.2}$$
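Formula (10.2) translates directly into code. The sketch below is an illustration only: it stores the two measures of table (10.1) as Python dictionaries and sums elementary probabilities over two sample events; the names `P_F`, `P_B`, `prob`, `even` and `low` are choices made here.

```python
from fractions import Fraction

# Probabilities of the elementary events, taken from table (10.1).
P_F = {k: Fraction(1, 6) for k in range(1, 7)}
P_B = {1: Fraction(2, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
       4: Fraction(2, 10), 5: Fraction(2, 10), 6: Fraction(2, 10)}

def prob(P, event):
    """P(W) as the sum of P({w}) over w in W, as in (10.2)."""
    return sum((P[w] for w in event), Fraction(0))

even = {2, 4, 6}   # "getting an even output"
low = {1, 2, 3}    # "getting an output less than or equal to 3"

for name, P in (("fair", P_F), ("biased", P_B)):
    print(name, prob(P, even), prob(P, low))
```

Exact rational arithmetic (`Fraction`) is used so that the total mass is exactly 1 and no rounding enters the comparison.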
In Example 871, let $W$ be the event “getting an even output.” Then $W = \{2, 4, 6\}$, and, clearly, $P_F(W) = 0.5$ and $P_B(W) = 0.5$. If $W$ is the event “getting an output less than or equal to 3,” then $P_F(W) = 0.5$, while $P_B(W) = 0.4$.
Example 873 Assume now that the experiment consists of flipping a coin. The space $\Omega$ is fairly simple: $\Omega := \{H, T\}$, where $H$ stands for “head” and $T$ for “tail.” If the coin is fair, then $P(\{H\}) = P(\{T\}) = 0.5$. In general, put $P(\{H\}) = p \in [0, 1]$, so $P(\{T\}) = q := 1 - p$. The triplet $(\Omega, \mathcal{P}(\Omega), P)$ is another instance of a finite probability space. ♦
Note that a discrete probability space on a finite or countably infinite set $\Omega$ is always defined by providing an arbitrary sequence $\{p_n\}$ such that $p_n \ge 0$ for all $n \in \mathbb{N}$ and $\sum_{n=1}^{\infty} p_n = 1$, then by writing the elements in $\Omega$ as a sequence, say $\Omega = \{\omega_n\}_{n=1}^{\infty}$, and finally by putting $P(\{\omega_n\}) := p_n$ for all $n \in \mathbb{N}$. Thus, (10.2) defines a probability measure on $\mathcal{P}(\Omega)$.
Example 874 It is necessary to consider discrete probability spaces with a countably infinite set of outputs even in the case of “finite” experiments. For example, imagine that the following game is played: flip a coin until $T$ is shown. The set $\Omega$ of outputs
is countably infinite. Indeed,
$$\Omega = \{T, HT, HHT, HHHT, \dots\} \cup \{HHHHHH\dots\}. \tag{10.3}$$
If $H$ appears with probability $p \in [0, 1]$, and $T$ with probability $q := 1 - p$, the corresponding discrete probability space $(\Omega, \mathcal{P}(\Omega), P)$ associated to the experiment of flipping a coin until $T$ appears consists of the set $\Omega$ above, the family $\mathcal{P}(\Omega)$, and the probability $P$ defined on elementary events $\{\omega\}$ (in the case that $p \in [0, 1)$) by
$$P(\{\omega\}) := \begin{cases} q, & \text{if } \omega = T, \\ p^n q, & \text{if } \omega = \underbrace{HH\cdots H}_{n}T, \text{ for } n \in \mathbb{N}, \\ 0, & \text{if } \omega = HHHH\dots \end{cases} \tag{10.4}$$
Observe that $\sum_{\omega\in\Omega} P(\{\omega\}) = q + q\sum_{n=1}^{\infty} p^n = q\left(1 + \frac{p}{1-p}\right) = q + p = 1$, as expected. ♦
10.1.2 Random Variables
Since the set $\Omega$ may consist of nonnumerical elements (think again of “heads” and “tails”), it is convenient to associate a real number to each element $\omega \in \Omega$.
Definition 875 A random variable is a measurable function $X : \Omega \to \mathbb{R}$, where $(\Omega, \Sigma, P)$ is a probability space. A random variable is said to be discrete if its range $X(\Omega)$ is a finite or countably infinite subset of $\mathbb{R}$ such that $X(\Omega) \cap [a, b]$ is finite for every closed bounded interval $[a, b]$ in $\mathbb{R}$.
Example 876
1. An elementary example of a discrete random variable is the function $X$ that assigns 1 to $H$ and 0 to $T$ in tossing a coin. In this case, the probability space is given in Example 873, and $X(\Omega) = \{0, 1\}$.
2. Another example of a discrete random variable, this time with an infinite range, associated to the experiment of tossing a coin until $T$ appears, is the function $X$ defined as
$$X(\omega) := \begin{cases} 1 & \text{if } \omega = T, \\ n + 1 & \text{if } \omega = \underbrace{HH\cdots H}_{n}T, \\ 0 & \text{if } \omega = HHH\dots \end{cases} \tag{10.5}$$
(i.e., $X$ assigns to an elementary event the moment $T$ appears). The probability space is given in Example 874. In this case, $X(\Omega) = \mathbb{N} \cup \{0\}$.
3. As a third example, suppose that we have two fair dice, and we roll both of them simultaneously. In this case, the set $\Omega$ is
$$\Omega := \{(1, 1), (1, 2), (1, 3), \dots, (6, 1), (6, 2), \dots, (6, 6)\}, \tag{10.6}$$
(see Example 878). If the experiment consists of adding the outputs, the random variable associated is $S : \Omega \to \mathbb{R}$ defined by
$$S(\omega_1, \omega_2) = \omega_1 + \omega_2, \quad \text{for all } (\omega_1, \omega_2) \in \Omega. \tag{10.7}$$
♦
Once we have a random variable $X$ on $\Omega$, the “particular” outputs of the experiment (i.e., the elements of $\Omega$) are irrelevant, and what matters is the action of $X$ on them, and the chances to get a certain result. This is why it is convenient to focus on the probability density function of a discrete random variable $X : \Omega \to \mathbb{R}$ and its associated distribution function. Both terms are defined below.
Definition 877 The probability density function $f$ of a discrete random variable $X$ is
$$f(x) := P(\{\omega \in \Omega : X(\omega) = x\}), \quad \text{for } x \in \mathbb{R}. \tag{10.8}$$
The distribution function $F$ of $X$ is
$$F(x) := P(\{\omega \in \Omega : X(\omega) \le x\}), \quad \text{for } x \in \mathbb{R}. \tag{10.9}$$
In order to simplify the expression in (10.8), we shall write $P(X = x)$ instead of the more cumbersome $P(\{\omega \in \Omega : X(\omega) = x\})$, and its variants $P(X \le x)$ (instead of $P(\{\omega \in \Omega : X(\omega) \le x\})$), $P(x < X \le y)$, and so on. In this way, Eqs. (10.8) and (10.9) appear as
$$f(x) := P(X = x), \quad \text{for } x \in \mathbb{R}, \tag{10.10}$$
$$F(x) := P(X \le x), \quad \text{for } x \in \mathbb{R}. \tag{10.11}$$
Observe that $f(x) \ge 0$ and $0 \le F(x) \le 1$ for all $x \in \mathbb{R}$, that $F$ is an increasing function, and that we have $\lim_{x\to-\infty} F(x) = 0$ and $\lim_{x\to+\infty} F(x) = 1$. Note, too, that
$$F(x) = \sum_{t \le x} f(t), \quad \text{for every } x \in \mathbb{R}. \tag{10.12}$$
Example 878 Returning to Example 876.3 of the random variable “sum $S$ of outputs when rolling two fair dice,” let us list the sets $\{\omega \in \Omega : S(\omega) = x\}$ for different $x$, compute their sizes, and assign a probability to them according to the rule “favorable cases/possible cases” (i.e., define the corresponding probability density function $f$).
$$\begin{array}{c|l|c|c} x & \{\omega \in \Omega : S(\omega) = x\} & \text{size} & f(x) \\ \hline 2 & \{(1,1)\} & 1 & 1/36 \\ 3 & \{(1,2), (2,1)\} & 2 & 2/36 \\ 4 & \{(1,3), (2,2), (3,1)\} & 3 & 3/36 \\ 5 & \{(1,4), (2,3), (3,2), (4,1)\} & 4 & 4/36 \\ 6 & \{(1,5), (2,4), (3,3), (4,2), (5,1)\} & 5 & 5/36 \\ 7 & \{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\} & 6 & 6/36 \\ 8 & \{(2,6), (3,5), (4,4), (5,3), (6,2)\} & 5 & 5/36 \\ 9 & \{(3,6), (4,5), (5,4), (6,3)\} & 4 & 4/36 \\ 10 & \{(4,6), (5,5), (6,4)\} & 3 & 3/36 \\ 11 & \{(5,6), (6,5)\} & 2 & 2/36 \\ 12 & \{(6,6)\} & 1 & 1/36 \end{array} \tag{10.13}$$
Fig. 10.1 The probability density function of the random variable S
We represent the probability density function $f$ on a graph (see Fig. 10.1). ♦
The reader would suspect that, once the probability density function of a random variable $X$ is known, we may deduce the probability density of another random variable related to the first one (in our example, we were adding the simple random variable $X$ that produces the face-up number of a rolled dice to itself in an independent way). The notion of independence will be defined in the next entry.
Sometimes it is convenient to define a random variable $X : \Omega \to \mathbb{R}^n$, for some $n \in \mathbb{N}$ (a vector-valued random variable). In this case, we can write $X = (X_1, \dots, X_n)$, where $X_i : \Omega \to \mathbb{R}$ is a real-valued random variable defined on $\Omega$ for $i = 1, \dots, n$. Precisely, $X(\omega) = (X_1(\omega), \dots, X_n(\omega))$ for every $\omega \in \Omega$.
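The table (10.13) and the relation (10.12) can be reproduced by enumeration. The following Python sketch is an illustration, not a definitive implementation: it builds the space (10.6) with the uniform measure and derives $f$ and $F$ for the sum $S$ from Definition 877. The helper names `pmf` and `cdf` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# The space (10.6): ordered pairs of faces, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def pmf(X, P):
    """Probability density function f(x) = P(X = x), as in (10.10)."""
    f = {}
    for w, p in P.items():
        f[X(w)] = f.get(X(w), Fraction(0)) + p
    return f

def cdf(f, x):
    """Distribution function F(x), the sum of f(t) over t <= x, as in (10.12)."""
    return sum((p for t, p in f.items() if t <= x), Fraction(0))

f_S = pmf(lambda w: w[0] + w[1], P)
for x in sorted(f_S):
    print(x, f_S[x], cdf(f_S, x))
```

Note that `Fraction` prints values in lowest terms, so 6/36 appears as 1/6.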
Independent Random Variables
Definition 879 Let $(\Omega, \mathcal{P}(\Omega), P)$ be a discrete probability space. Let $X : \Omega \to \mathbb{R}$ and $Y : \Omega \to \mathbb{R}$ be two discrete random variables. We say that $X$ and $Y$ are independent if
$$P(X = x \text{ and } Y = y) = P(X = x)\,P(Y = y), \quad \text{for every } x, y \in \mathbb{R}. \tag{10.14}$$
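Condition (10.14) can be checked by brute force on a finite space. The sketch below assumes the two-dice space (10.6) with the uniform measure and tests the first coordinate against the second coordinate and against the sum (10.7); the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # the space (10.6)
P = {w: Fraction(1, 36) for w in omega}

def prob(event):
    """Probability of the set of outcomes w for which event(w) is true."""
    return sum((P[w] for w in omega if event(w)), Fraction(0))

def independent(X, Y):
    """Check (10.14) for all x in X(omega) and y in Y(omega).
    For other real x, y both sides of (10.14) vanish, so nothing is lost."""
    xs = {X(w) for w in omega}
    ys = {Y(w) for w in omega}
    return all(
        prob(lambda w: X(w) == x and Y(w) == y)
        == prob(lambda w: X(w) == x) * prob(lambda w: Y(w) == y)
        for x in xs for y in ys
    )

first = lambda w: w[0]           # face shown by the first dice
second = lambda w: w[1]          # face shown by the second dice
total = lambda w: w[0] + w[1]    # the sum S of (10.7)

print(independent(first, second))   # coordinates of the product space
print(independent(first, total))    # the sum depends on the first dice
```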
Observe that $\{\omega \in \Omega : X(\omega) = x,\ Y(\omega) = y\} = \{\omega \in \Omega : X(\omega) = x\} \cap \{\omega \in \Omega : Y(\omega) = y\}$, so we are asking that the probability of the intersection of these two sets should be the product of their probabilities. In a sense, this means that the value taken by $X$ does not affect the value taken by $Y$.
Example 880 In order to illustrate this, consider rolling two fair dice simultaneously. The set $\Omega$ was described in (10.6). Let $X : \Omega \to \mathbb{R}$ be the random variable $X(\omega_1, \omega_2) = \omega_1$, for $(\omega_1, \omega_2) \in \Omega$, and $S : \Omega \to \mathbb{R}$ be the random variable defined in (10.7). Observe that $P(X = 1) = 1/6$, and that $P(S = 2) = 1/36$ (see the table (10.13)). Note that $P(X = 1 \text{ and } S = 2) = P(\{(1, 1)\}) = 1/36 \ne P(X = 1)P(S = 2)$, hence the two random variables are not independent. This was expected: if the game scores by adding the outputs, the sum is not independent of what the first dice shows. In fact, if the first dice shows, say, 5, you cannot expect to get 3 by adding. On the other hand, let $Y : \Omega \to \mathbb{R}$ be the random variable $Y(\omega_1, \omega_2) = \omega_2$, for $(\omega_1, \omega_2) \in \Omega$. Note that $P(X = x \text{ and } Y = y) = 1/36$, and that $P(X = x) = 1/6$ and $P(Y = y) = 1/6$ for every $x, y \in \{1, 2, \dots, 6\}$. This shows that $X$ and $Y$ are independent. This was also expected: what the first dice shows is independent of what the second does. ♦
Definition 881 Let $X_i : \Omega \to \mathbb{R}$, $i = 1, \dots, n$, where $(\Omega, \mathcal{P}(\Omega), P)$ is a discrete probability space. We say that $X_1, \dots, X_n$ are mutually independent if $P(X_i = x_i,\ i = 1, \dots, n) = \prod_{i=1}^{n} P(X_i = x_i)$ for every $x_1, \dots, x_n \in \mathbb{R}$.
Centrality and Dispersion of a Random Variable
Some special values that help to “understand” the behavior of a random variable are associated to it. The most important are the mean, the median and the mode for centrality, and the variance and the standard deviation for dispersion. They give an idea of the main features of the random variable.
For example, they will locate a kind of “average” (the mean), of “splitting by likelihood” (the median) or of “the most frequent value” (the mode), while the variance will describe the “concentration” around the mean (the standard deviation is just the square root of the variance). These verbal descriptions may already convey the sought meaning. Let us give another example before proceeding to the formal definitions. In the game of darts, we aim at the center of a circular dartboard. The random variable associated to the game is the distance from the impact to the center. Compare, after a game, the dartboards from your two favorite pubs. Imagine that you get something like this (see Fig. 10.2):
Fig. 10.2 Two dartboards (panels A and B)
What can we deduce at first glance? It is supposed that players aim at the center. Let us look at the A dartboard. Something is wrong at pub A (maybe the lighting is wrong, maybe someone leaves an open window and the wind blows through): the mean is positive; to compute the median you should trace a circle that leaves half of the impacts out, half in; the mode is some spot near (but maybe not equal to) the mean; the players (dismissing this strange tendency to deviate to the left) are quite good, since the impacts are quite concentrated (i.e., the variance is small). Now look at the B dartboard: The mean is practically 0; for the median, trace the circle that leaves half of the impacts out and the other half in; the mode is a certain positive number, corresponding to the radius of some “cluster” spot to the left of the center; and the players are quite bad (or drunk), since the variance is enormous.
Definition 882 Let $X$ be a discrete random variable defined on $\Omega$, where $(\Omega, \mathcal{P}(\Omega), P)$ is a discrete probability space.
• The mean (also called the expected value) of $X$ is the real number defined by
$$E(X) := \sum_{\omega\in\Omega} X(\omega)P(\{\omega\}) = \sum_{x\in X(\Omega)} x\,P(X = x), \tag{10.15}$$
provided that the series above is absolutely convergent. Otherwise, we say that the mean does not exist.
• The median of $X$ is the set of all $x \in \mathbb{R}$ such that, simultaneously,
$$P(X \le x) \ge 1/2, \quad \text{and} \quad P(X \ge x) \ge 1/2. \tag{10.16}$$
• The mode of $X$ is the set of all $x_0 \in \mathbb{R}$ such that
$$P(X = x_0) \ge P(X = x) \quad \text{for all } x \in \mathbb{R}. \tag{10.17}$$
• If $E(X)$ exists, then $(X - E(X))^2$ is again a discrete random variable. Its expected value, if it exists, is called the variance of $X$, and it is denoted by $V(X)$. Precisely,
$$V(X) := E\left(\left(X - E(X)\right)^2\right), \tag{10.18}$$
if the former expression exists and is finite. Otherwise, we say that the variance does not exist.
• The standard deviation of $X$ is
$$\sigma(X) := \sqrt{V(X)}, \tag{10.19}$$
if $V(X)$ is finite.
Example 883
1. Rolling a dice (see Example 871). $\Omega := \{1, 2, 3, 4, 5, 6\}$, $X(\omega) = \omega$ for all $\omega \in \Omega$.
a) A fair dice. $E(X) = \sum_{x=1}^{6} x\,P(X = x) = \sum_{x=1}^{6} x/6 = 3.5$. The median is the set $[3, 4]$. The mode is the set $\{1, 2, 3, 4, 5, 6\}$. The variance is $V(X) = \sum_{x=1}^{6} (x - 3.5)^2 (1/6) \approx 2.9167$, so $\sigma(X) = \sqrt{V(X)} \approx 1.7078$.
b) The biased dice in (10.1). $E(X) = (1)(0.2) + (2)(0.1) + (3)(0.1) + (4)(0.2) + (5)(0.2) + (6)(0.2) = 3.7$. The median is 4. The mode is the set $\{1, 4, 5, 6\}$. $V(X) = 3.21$, $\sigma(X) \approx 1.7916$.
2. Adding the output of rolling two dice (see Examples 876.3 and 878). $\Omega := \{(1, 1), (1, 2), \dots, (6, 6)\}$, $S(\omega_1, \omega_2) = \omega_1 + \omega_2$.
a) Two fair dice (see Fig. 10.1). Then $E(S) = 7$, median $= 7$, mode $= 7$, $V(S) \approx 5.8333$, $\sigma(S) \approx 2.4152$.
b) Two biased dice as in (10.1). Then $E(S) = 7.4$, median $= 7$, mode $= 7$, $V(S) = 6.42$, $\sigma(S) \approx 2.5338$.
3. Flipping a coin, where $p \in [0, 1)$ is the probability of $H$ and $q$ $(= 1 - p)$ the probability of $T$ (see Example 873). Let $X$ be the random variable defined in Example 876.1. Then $E(X) = 1p + 0q = p$, and $V(X) = (1 - p)^2 p + (0 - p)^2 q = q^2 p + p^2 q = pq(q + p) = pq$.
4. Flipping a coin until $T$ is shown, where $p \in [0, 1)$ is the probability of $H$ and $q$ $(= 1 - p)$ the probability of $T$ (see Example 874). The space $\Omega$ is given in (10.3), and the probability on $\mathcal{P}(\Omega)$ in (10.4). The random variable $X$ is defined in (10.5). Let us compute its expected value and its variance. To this end, we need the following formulae, valid for $|z| < 1$ (see Exercise 13.319):
$$\sum_{n=1}^{\infty} n z^n = \frac{z}{(1 - z)^2}, \qquad \sum_{n=1}^{\infty} n^2 z^n = \frac{z(1 + z)}{(1 - z)^3}.$$
Then
$$\mu := E(X) = 1q + 2(pq) + 3(p^2 q) + 4(p^3 q) + \dots = \frac{q}{p}\sum_{n=1}^{\infty} n p^n = \frac{q}{p}\cdot\frac{p}{(1 - p)^2} = \frac{1}{q},$$
and
$$V(X) = (1 - \mu)^2 q + (2 - \mu)^2 pq + (3 - \mu)^2 p^2 q + \dots = \frac{q}{p}\sum_{n=1}^{\infty} (n - \mu)^2 p^n = \frac{q}{p}\sum_{n=1}^{\infty} (n^2 + \mu^2 - 2n\mu)\,p^n$$
$$= \frac{q}{p}\left(\sum_{n=1}^{\infty} n^2 p^n + \mu^2\sum_{n=1}^{\infty} p^n - 2\mu\sum_{n=1}^{\infty} n p^n\right) = \frac{q}{p}\left(\frac{p(1 + p)}{(1 - p)^3} + \mu^2\frac{p}{1 - p} - 2\mu\frac{p}{(1 - p)^2}\right) = \frac{p}{q^2}.$$
In the case of a fair coin, $p = q = 1/2$, hence $E(X) = 2$ and $V(X) = 2$.
♦
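The values in Example 883.1 can be recomputed directly from Definition 882. This Python sketch is an illustration under stated assumptions: a random variable is represented by its probability density function as a dictionary, and the median set is reported through its two endpoints, which for a discrete random variable belong to its range. All function names are hypothetical.

```python
from fractions import Fraction
from math import sqrt

P_F = {x: Fraction(1, 6) for x in range(1, 7)}            # fair dice
P_B = {1: Fraction(1, 5), 2: Fraction(1, 10), 3: Fraction(1, 10),
       4: Fraction(1, 5), 5: Fraction(1, 5), 6: Fraction(1, 5)}  # table (10.1)

def mean(f):
    """E(X), the sum of x P(X = x), as in (10.15)."""
    return sum(x * p for x, p in f.items())

def variance(f):
    """V(X) = E((X - E(X))^2), as in (10.18)."""
    m = mean(f)
    return sum((x - m) ** 2 * p for x, p in f.items())

def median_interval(f):
    """Endpoints (lo, hi) of the set of x satisfying (10.16)."""
    half = Fraction(1, 2)
    lo = min(x for x in f if sum(p for t, p in f.items() if t <= x) >= half)
    hi = max(x for x in f if sum(p for t, p in f.items() if t >= x) >= half)
    return lo, hi

def mode(f):
    """Set of values with maximal probability, as in (10.17)."""
    top = max(f.values())
    return {x for x, p in f.items() if p == top}

for name, f in (("fair", P_F), ("biased", P_B)):
    v = variance(f)
    print(name, mean(f), median_interval(f), sorted(mode(f)), v, sqrt(v))
```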
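Some Properties of the Mean and the Variance
In this entry $(\Omega, \mathcal{P}(\Omega), P)$ is a discrete probability space, $X : \Omega \to \mathbb{R}$ and $Y : \Omega \to \mathbb{R}$ are two discrete random variables, and $\alpha \in \mathbb{R}$.
Proposition 884
(i) $E(X + Y) = E(X) + E(Y)$.
(ii) $E(\alpha X) = \alpha E(X)$.
(iii) If $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$.
Proof
(i) $E(X + Y) = \sum_{\omega\in\Omega}\left(X(\omega) + Y(\omega)\right)P(\{\omega\}) = \sum_{\omega\in\Omega} X(\omega)P(\{\omega\}) + \sum_{\omega\in\Omega} Y(\omega)P(\{\omega\}) = E(X) + E(Y)$.
(ii) $E(\alpha X) = \sum_{\omega\in\Omega} (\alpha X)(\omega)P(\{\omega\}) = \alpha\sum_{\omega\in\Omega} X(\omega)P(\{\omega\}) = \alpha E(X)$.
(iii)
$$E(XY) = \sum_{x\in X(\Omega),\ y\in Y(\Omega)} xy\,P(X = x \text{ and } Y = y) = \sum_{x\in X(\Omega),\ y\in Y(\Omega)} xy\,P(X = x)P(Y = y)$$
$$= \left(\sum_{x\in X(\Omega)} x\,P(X = x)\right)\left(\sum_{y\in Y(\Omega)} y\,P(Y = y)\right) = E(X)E(Y),$$
the second equality above due to the independence of $X$ and $Y$.
Proposition 885
(i) $V(X) = E(X^2) - (E(X))^2$.
(ii) $V(\alpha X) = \alpha^2 V(X)$.
(iii) If $X$ and $Y$ are independent, then $V(X + Y) = V(X) + V(Y)$.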
Proof
(i) $V(X) = E((X - E(X))^2) = E(X^2 - 2E(X)X + (E(X))^2) = E(X^2) - 2E(X)E(X) + (E(X))^2 = E(X^2) - (E(X))^2$, where we used (i) and (ii) in Proposition 884, and the fact that, obviously, the expected value of a random variable that has a constant value is, precisely, this same constant.
(ii) By (ii) in Proposition 884, $V(\alpha X) = E((\alpha X - \alpha E(X))^2) = \alpha^2 E((X - E(X))^2) = \alpha^2 V(X)$.
(iii) In view of the linearity of the expected value (see (i) and (ii) in Proposition 884), we have
$$V(X + Y) = E\left[\left((X + Y) - E(X + Y)\right)^2\right]$$
$$= E\left[(X + Y)^2 + \left(E(X) + E(Y)\right)^2 - 2(X + Y)\left(E(X) + E(Y)\right)\right]$$
$$= E\left[X^2 + (E(X))^2 - 2XE(X)\right] + E\left[Y^2 + (E(Y))^2 - 2YE(Y)\right] + E\left[2XY + 2E(X)E(Y) - 2XE(Y) - 2YE(X)\right].$$
The first two terms are $V(X)$ and $V(Y)$, so it is enough to observe that
$$E\left[2XY + 2E(X)E(Y) - 2XE(Y) - 2YE(X)\right] = 2E(XY) + 2E(X)E(Y) - 2E(X)E(Y) - 2E(Y)E(X) = 0,$$
since $E(XY) = E(X)E(Y)$ due to the fact that $X$ and $Y$ are independent (see (iii) in Proposition 884).
10.1.3 Products of Discrete Probability Spaces
Let $(\Omega_i, \mathcal{P}(\Omega_i), P_i)$, $i = 1, \dots, n$, be discrete probability spaces. Put $\Omega := \Omega_1 \times \dots \times \Omega_n$. Define $P$ on $\mathcal{P}(\Omega)$ by $P(\{\omega\}) := \prod_{i=1}^{n} P_i(\{\omega_i\})$ for every $\omega := (\omega_1, \dots, \omega_n) \in \Omega$. It is obvious that $P$ is a probability measure, called the product of the probability measures $P_i$, $i = 1, \dots, n$. Assume that $X_i : \Omega_i \to \mathbb{R}$ is a random variable for $i = 1, \dots, n$. Define a random variable $Y_i$ on $\Omega$ by $Y_i(\omega) = X_i(\omega_i)$ for $i = 1, \dots, n$ if $\omega = (\omega_1, \dots, \omega_n)$. Then, given $x_1, \dots, x_n \in \mathbb{R}$ we have $P(Y_i = x_i) = P_i(X_i = x_i)$ for $i = 1, \dots, n$. Thus, $E(Y_i) = E(X_i)$, and $V(Y_i) = V(X_i)$ for all $i = 1, \dots, n$.
A particular instance of this situation is the following: Let $(\Omega, \mathcal{P}(\Omega), P)$ be a discrete probability space, and let $X : \Omega \to \mathbb{R}$ be a random variable. Fix $n \in \mathbb{N}$, and consider the probability space $(\Omega^n, \mathcal{P}(\Omega^n), P^n)$, where $\Omega^n := \Omega \times \overset{(n)}{\cdots} \times \Omega$, and $P^n := \prod_{i=1}^{n} P$. Define the random variable $Y_i$ on $\Omega^n$ by $Y_i(\omega) = X(\omega_i)$ for $i = 1, \dots, n$, where $\omega = (\omega_1, \dots, \omega_n) \in \Omega^n$. The random variables $Y_1, \dots, Y_n$ are
independent. Now, define a real-valued random variable $S_n$ on $\Omega^n$ by
$$S_n(\omega) = S_n(\omega_1, \dots, \omega_n) = \sum_{i=1}^{n} Y_i(\omega) = \sum_{i=1}^{n} X(\omega_i).$$
By Proposition 884 we get
$$E(S_n) = \sum_{i=1}^{n} E(Y_i) = \sum_{i=1}^{n} E(X) = nE(X),$$
and, by (iii) in Proposition 885 and the independence of $Y_1, \dots, Y_n$,
$$V(S_n) = \sum_{i=1}^{n} V(Y_i) = \sum_{i=1}^{n} V(X) = nV(X).$$
Hence, if $A_n(\omega) := \frac{1}{n}\sum_{i=1}^{n} X(\omega_i)$, then $E(A_n) = E(X)$ and $V(A_n) = \frac{1}{n}V(X)$. Now, it can be seen that the values in Example 883.2 (at least those for the mean and the variance) could have been obtained from the results in Example 883.1.
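The identities $E(S_n) = nE(X)$ and $V(S_n) = nV(X)$ can be verified on small cases. The following Python sketch assumes that the probability density function of a sum of independent discrete random variables is obtained by convolving their density functions (a standard fact, used here without proof); the helper names are hypothetical.

```python
from fractions import Fraction

P_F = {x: Fraction(1, 6) for x in range(1, 7)}            # fair dice
P_B = {1: Fraction(1, 5), 2: Fraction(1, 10), 3: Fraction(1, 10),
       4: Fraction(1, 5), 5: Fraction(1, 5), 6: Fraction(1, 5)}  # table (10.1)

def mean(f):
    return sum(x * p for x, p in f.items())

def variance(f):
    m = mean(f)
    return sum((x - m) ** 2 * p for x, p in f.items())

def convolve(f, g):
    """Density of X + Y for independent X, Y with densities f, g."""
    h = {}
    for x, p in f.items():
        for y, q in g.items():
            h[x + y] = h.get(x + y, Fraction(0)) + p * q
    return h

def sum_pmf(f, n):
    """Density of S_n, the sum of n independent copies of X."""
    result = {0: Fraction(1)}
    for _ in range(n):
        result = convolve(result, f)
    return result

S2_biased = sum_pmf(P_B, 2)
print(mean(S2_biased), variance(S2_biased))   # compare with Example 883.2 b)
S5_fair = sum_pmf(P_F, 5)
print(mean(S5_fair), variance(S5_fair))
```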
10.1.4 Inequalities
Let $(\Omega, \mathcal{P}(\Omega), P)$ be a finite probability space. The following inequalities have the added value of describing the mean and the variance of a random variable as upper bounds for probabilities. Their meaning is clear: both Markov’s inequality (from the Russian mathematician A. Markov) and Chebyshev’s inequality (from the Russian mathematician P. Chebyshev) describe in a quantitative way how improbable it is to get a value of a random variable away from the mean.
Proposition 886 [Markov’s inequality] Let $X$ be a nonnegative random variable on $\Omega$. Then, for any $a > 0$,
$$P(X \ge a) \le \frac{E(X)}{a}.$$
Proof
$$E(X) = \sum_{x} x\,P(X = x) = \sum_{x < a} x\,P(X = x) + \sum_{x \ge a} x\,P(X = x) \ge \sum_{x \ge a} a\,P(X = x) = a\,P(X \ge a).$$
Proposition 887 [Chebyshev’s inequality] Let $X$ be a random variable on $\Omega$, and let $a > 0$. Then
$$P(|X - E(X)| \ge a) \le \frac{V(X)}{a^2}.$$
Proof
$$P(|X - E(X)| \ge a) = P\left(\left(X - E(X)\right)^2 \ge a^2\right) \le \frac{E\left(\left(X - E(X)\right)^2\right)}{a^2} = \frac{V(X)}{a^2},$$
where the inequality follows from Proposition 886.
Corollary 888 [Chernoff’s bound] Let $X$ be a random variable on $\Omega$. Put $F(t) := E(e^{tX})$ for $t > 0$. Then, for $x \in \mathbb{R}$ and $t > 0$, we have
$$P(X \ge x) \le e^{-tx}F(t).$$
Proof Fix $t > 0$ and define a nonnegative random variable $Y$ on $\Omega$ by $Y := e^{tX}$. Let $a := e^{tx}$. Then
$$P(X \ge x) = P(tX \ge tx) = P(e^{tX} \ge e^{tx}) = P(Y \ge a) \le \frac{E(Y)}{a} = e^{-tx}F(t),$$
where the inequality follows from Proposition 886.
Example 889 Let us consider the experiment of flipping a fair coin until tail $T$ is shown (Example 883.4), and let $X$ be the random variable defined in (10.5). We proved there that $E(X) = 2$ and $V(X) = 2$. Then, according to Proposition 886, the probability $P(X \ge 10)$ that $T$ appears only after 10 tosses is less than or equal to $E(X)/10 = 0.2$. However, by using Proposition 887, we get that the probability that $T$ appears only after 10 tosses (i.e., $P(|X - 2| \ge 8)$) is less than or equal to $V(X)/8^2$, i.e., less than or equal to 0.03125. In this case, this probability can be easily computed, since it is the probability of getting nine times $H$ in a row, i.e., $(1/2)^9 = 0.001953125$. ♦
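The three numbers of Example 889 can be reproduced in a few lines, and Chernoff’s bound can be added to the comparison. In the Python sketch below, the closed form $E(e^{tX}) = q e^{t}/(1 - p e^{t})$ for $p e^{t} < 1$ (a geometric series) and the choice $t = \ln 1.8$ are assumptions of this illustration, not statements from the text; the names are hypothetical.

```python
import math

p = q = 0.5                 # fair coin: P(H) = p, P(T) = q
mean = 1.0 / q              # E(X) = 1/q = 2     (Example 883.4)
variance = p / q ** 2       # V(X) = p/q^2 = 2   (Example 883.4)
threshold = 10              # the event X >= 10

markov = mean / threshold                          # Proposition 886
chebyshev = variance / (threshold - mean) ** 2     # Proposition 887, a = 8
exact = p ** (threshold - 1)                       # nine heads in a row

def chernoff(t, x):
    """e^{-tx} E(e^{tX}) for the X of (10.5), valid while p e^t < 1."""
    r = p * math.exp(t)
    if not 0.0 < r < 1.0:
        raise ValueError("the series for E(e^{tX}) diverges for this t")
    return math.exp(-t * x) * q * math.exp(t) / (1.0 - r)

chernoff_bound = chernoff(math.log(1.8), threshold)

print(markov, chebyshev, chernoff_bound, exact)
```

All three bounds hold, and each is considerably larger than the exact probability, which illustrates how conservative such general inequalities can be.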
10.2 Distribution Functions
10.2.1 Selected Distributions of Discrete Random Variables
1. The two-point distribution: This corresponds to the experiment of flipping a coin, where the probability of $H$ is $p \in [0, 1]$ and the probability of $T$ is $q$ $(= 1 - p)$ (see Examples 873, 876.1, and 883.3). More generally, every experiment where the output may be “success” or “failure” follows the same pattern. The random variable $X$ associated to this experiment was defined in Example 876.1. Its expected value and variance were calculated in Example 883.3. The probability density function $f$ and the distribution function $F$ for this random variable are (see Fig. 10.3)
$$f(x) := \begin{cases} q & \text{if } x = 0, \\ p & \text{if } x = 1, \\ 0 & \text{if } x \notin \{0, 1\}, \end{cases} \tag{10.20}$$
and
$$F(x) := \begin{cases} 0 & \text{if } x < 0, \\ q & \text{if } x \in [0, 1), \\ 1 & \text{if } x \ge 1. \end{cases} \tag{10.21}$$
Fig. 10.3 The probability density and distribution functions of the two-point distribution
2. The binomial distribution: The binomial distribution is a model for the experiment of, for a fixed $n \in \mathbb{N}$, tossing a coin $n$ times and counting the number of heads (in general, for an experiment consisting in counting “successes” in a row of length $n$ of possible “successes” and “failures”). In this case, $\Omega$ consists of all elements of the form $\omega := O_1 O_2 O_3 \dots O_n$, where $O_i \in \{H, T\}$ for $i = 1, 2, \dots, n$. The probability $P(\omega)$ is $p^h q^{n-h}$, where $p$ is the probability of $H$, and $h$ is the number of $H$’s in $\omega$. The random variable associated to this experiment (counting heads) is $S_n(\omega) := X_1(\omega_1) + \dots + X_n(\omega_n)$, where $X_i$ is a random variable defined on $\{H, T\}$ that has a two-point distribution, as it was introduced in the previous example. Observe that the probability density function $f_n$ is (Fig. 10.4 plots $f_{20}$ for three different $p$’s)
$$f_n(k) = P(S_n = k) = \binom{n}{k} p^k q^{n-k}, \quad \text{for } k = 0, 1, 2, \dots, n, \tag{10.22}$$
and so the distribution function $F_n$ is
$$F_n(m) = P(S_n \le m) = \sum_{k=0}^{m}\binom{n}{k} p^k q^{n-k}, \quad \text{for } m = 0, 1, 2, \dots, n. \tag{10.23}$$
Fig. 10.4 The probability density function f20 for the binomial distribution for several p’s
In order to compute the mean and the variance of $S_n$ it is enough to use Proposition 884 (i) and Proposition 885 (iii), respectively, to get $E(S_n) = \sum_{k=1}^{n} p = np$, and $V(S_n) = \sum_{k=1}^{n} pq = npq$, the last assertion since the random variables $X_1, \dots, X_n$ are independent.
3. The Poisson distribution: Fix $\lambda \ge 0$. In the previous example we considered the experiment of tossing a coin $n$ times and counting the number of “successes” (say “heads,” where “head” appears with probability $p$). The random variable associated to this experiment had a binomial distribution with mean $np$ and variance $npq$. Assume now that we increase $n$ indefinitely, keeping a constant mean $\lambda$. The pointwise limit of the probability density function is, by definition, the probability density function of a random variable $S$ having a Poisson distribution with parameter $\lambda$ (after the name of the French mathematician S. D. Poisson). Put $f$ for its probability density function. Observe that $p_n := \lambda/n$, and that $q_n := 1 - \lambda/n$ for every $n \in \mathbb{N}$. Let us compute $f(k)$ for $k = 0, 1, 2, \dots$:
$$f(k) = \lim_{n\to\infty} f_n(k) = \lim_{n\to\infty}\binom{n}{k} p_n^k q_n^{n-k} = \lim_{n\to\infty}\frac{n!}{k!(n - k)!}\left(\frac{\lambda}{n}\right)^k\left(1 - \frac{\lambda}{n}\right)^{n-k}$$
$$= \lim_{n\to\infty}\frac{\lambda^k}{k!}\cdot\frac{n(n - 1)\cdots(n - k + 1)}{n^k}\cdot\frac{\left(1 - \frac{\lambda}{n}\right)^n}{\left(1 - \frac{\lambda}{n}\right)^k}$$
$$= \lim_{n\to\infty}\frac{\lambda^k}{k!}\cdot\frac{\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right)}{\left(1 - \frac{\lambda}{n}\right)^k}\left(1 - \frac{\lambda}{n}\right)^n = \frac{\lambda^k}{k!}\,e^{-\lambda}$$
(for the last equality see Exercise 13.140). Note, too, that
$$E(S) = \sum_{k=0}^{\infty} k f(k) = e^{-\lambda}\sum_{k=1}^{\infty} k\,\frac{\lambda^k}{k!} = e^{-\lambda}\lambda\sum_{k=0}^{\infty}\frac{\lambda^k}{k!} = \lambda,$$
and
$$V(S) + (E(S))^2 = E(S^2) = \sum_{k=0}^{\infty} k^2\,\frac{\lambda^k}{k!}\,e^{-\lambda} = e^{-\lambda}\sum_{k=1}^{\infty} k\,\frac{\lambda^k}{(k - 1)!} = e^{-\lambda}\sum_{k=0}^{\infty} (k + 1)\frac{\lambda^{k+1}}{k!}$$
$$= e^{-\lambda}\lambda\left(\sum_{k=0}^{\infty} k\,\frac{\lambda^k}{k!} + \sum_{k=0}^{\infty}\frac{\lambda^k}{k!}\right) = e^{-\lambda}\lambda\left(\lambda\sum_{k=0}^{\infty}\frac{\lambda^k}{k!} + e^{\lambda}\right) = e^{-\lambda}\lambda(\lambda e^{\lambda} + e^{\lambda}) = \lambda^2 + \lambda,$$
hence $V(S) = \lambda$. Observe that, by its very definition, the particular value $f(k)$ of the probability density function $f$ at $k$ for the Poisson distribution is approached by $f_n(k)$, where $f_n$ is the probability density function of the binomial distribution, provided that $p$ is small and $n$ is large.
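The convergence of the binomial density (10.22) with $p_n = \lambda/n$ to the Poisson density can be observed numerically. This Python sketch is an illustration only; the value $\lambda = 3$, the range of $k$, and the function names are arbitrary choices made here.

```python
import math

def binom_pmf(n, k, p):
    """f_n(k) of (10.22), with q = 1 - p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Limit density f(k) = e^{-lam} lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.0
errors = {}
for n in (10, 100, 1000, 10000):
    # largest pointwise gap over k = 0, ..., 10, keeping the mean n p = lam
    errors[n] = max(abs(binom_pmf(n, k, lam / n) - poisson_pmf(k, lam))
                    for k in range(11))
    print(n, errors[n])
```

The printed gaps should shrink as $n$ grows, in line with the remark that the Poisson value $f(k)$ is approached by $f_n(k)$ when $p$ is small and $n$ is large.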
10.2.2 Continuous Random Variables and Their Distribution Functions
The concept of a random variable was introduced in Definition 875. To emphasize that a certain random variable is not discrete, we usually speak of a continuous random variable (since is not, in general, a topological space, this terminology should not convey the idea that X is a continuous mapping in the sense of Definition 316. Many of the previously defined concepts related to a discrete random variable extend naturally to the case of a general random variable X. In particular, the distribution function FX associated to X is defined by formula (10.9). However, Equation (10.12) is no longer valid, since, in general, it may happen that the probability of X taking a particular value x in R is 0. For example, assume that the “experiment” consists in choosing at random an element in the interval [0, 1] ⊂ R. Then := [0, 1], the σ -algebra consists of all Lebesgue measurable subsets of [0, 1], the probability measure is the Lebesgue measure λ restricted to [0, 1], and the random variable X : → R is given by X(x) = x for all x ∈ [0, 1]. The distribution function of X is FX (x) = λ([0, x]) = x. However, the probability of choosing a particular point in [0, 1] is, clearly, 0. Recall that a general measure μ defined on the σ -algebra M of the Lebesgue measurable subsets of R is called a probability measure whenever μ(R) = 1. The triplet (R, M, μ) is thus a probability space. Definition 890 We say a function F defined on R is a distribution function if (i) limx→−∞ F (x) = 0, (ii) limx→+∞ F (x) = 1, and (iii) F is increasing. In case that a distribution function F is differentiable, its derivative f := F is called the probability density function associated to F . Figure 10.5 depicts a typical distribution function F and its probability density function f associated. A distribution function F induces a probability measure p on R. More precisely, it induces a probability measure on the class of all measurable subsets of R in the
Fig. 10.5 A distribution function F and its probability density function f
10 Basics on Descriptive Statistics
following way: the probability $p([a, b])$ of any interval $[a, b] \subset \mathbb{R}$ is defined as
$$p([a, b]) := F(b) - F(a). \tag{10.24}$$
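As a small illustration of (10.24), the following sketch evaluates the interval formula for the uniform choice on $[0, 1]$ discussed above, where $F(x) = x$ on $[0, 1]$. The function names are ours and purely illustrative; extending $F$ by $0$ to the left of $0$ and by $1$ to the right of $1$ is the natural extension assumed here.

```python
def uniform_cdf(x: float) -> float:
    """Distribution function of a uniform choice on [0, 1].

    F(x) = x on [0, 1]; we extend it by 0 below 0 and by 1 above 1.
    """
    return min(1.0, max(0.0, x))


def interval_probability(cdf, a: float, b: float) -> float:
    """p([a, b]) := F(b) - F(a), as in formula (10.24)."""
    return cdf(b) - cdf(a)


# Probability of the middle half of [0, 1].
middle_half = interval_probability(uniform_cdf, 0.25, 0.75)

# Probability of a single point: F is continuous there, so it is 0.
single_point = interval_probability(uniform_cdf, 0.5, 0.5)
```

Here the middle half of the interval receives probability $0.5$, while a single point receives probability $0$, in agreement with the discussion of the uniform example.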
The measure of an interval was the starting point for the definition of the outer and inner measure of any subset of $\mathbb{R}$, and then of the Lebesgue measure, corresponding to $p$, of any measurable subset of $\mathbb{R}$. In this way, a probability measure $p$ on the class of all the measurable subsets of $\mathbb{R}$ is defined.
Remark 891 Observe that $p([a, b])$ can be zero even if $b > a$ and, on the other hand, $p([a, a]) > 0$ if $F$ fails to be continuous at $a$. ®
Assume that $(\Omega, \Sigma, P)$ is a probability space, and that $X : \Omega \to \mathbb{R}$ is a random variable. A particular instance of a distribution function is given by the distribution function associated to $X$ (whence the name), whose formula is given in (10.11). This is the content of the following result.
Proposition 892 Let $X : \Omega \to \mathbb{R}$ be a random variable, where $(\Omega, \Sigma, P)$ is a given probability space. Then, the function $F_X := F$ defined in (10.11) is a right-continuous distribution function on $\mathbb{R}$.
Proof For $x \in \mathbb{R}$, put $E_x := X^{-1}(x, +\infty)$. This is an element in $\Sigma$. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$ such that $\lim_{n \to \infty} x_n = -\infty$. Then $\{E_{x_n}\}_{n=1}^{\infty}$ is an increasing sequence in $\Sigma$ such that $\bigcup_{n=1}^{\infty} E_{x_n} = \Omega$. Use Proposition 255 to obtain $P(E_{x_n}) \to 1$. This shows that $P(E_{x_n}^c) \to 0$. Since this is true for every sequence $\{x_n\}_{n=1}^{\infty}$ such that $x_n \to -\infty$, we get $F_X(x) \to 0$ as $x \to -\infty$, and this proves (i) in Definition 890.
To prove (ii) there, for $x \in \mathbb{R}$ put $S_x := X^{-1}(-\infty, x]$. Again, let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$ such that $x_n \to +\infty$. Then $\bigcup_{n=1}^{\infty} S_{x_n} = \Omega$, hence, by Lemma 255, $F_X(x_n) = P(S_{x_n}) \to 1$. Since this is true for every such sequence $\{x_n\}_{n=1}^{\infty}$, we get that $F_X(x) \to 1$ whenever $x \to +\infty$.
The increasing character of $F_X$ follows from the fact that $P$ is a positive measure. This proves (iii) in Definition 890.
In order to prove the right-continuity of $F_X$ at each point $x_0 \in \mathbb{R}$ we need to show that, for any sequence $\{x_n\}_{n=1}^{\infty}$ in $\mathbb{R}$ such that $x_n \downarrow x_0$, we have $F_X(x_n) \to F_X(x_0)$. Observe that $\{X^{-1}(-\infty, x_n]\}_{n=1}^{\infty}$ is a decreasing sequence of sets, and $\bigcap_{n=1}^{\infty} X^{-1}(-\infty, x_n] = X^{-1}(-\infty, x_0]$. Then, by Lemma 255 (see Remark 893 below), we obtain $F_X(x_n) = P(X \le x_n) \to P(X^{-1}(-\infty, x_0]) = F_X(x_0)$.
Since $F_X$ is increasing, it is a function of bounded variation on $\mathbb{R}$ (the definition of a function of bounded variation was given for a function defined on a closed and bounded subinterval of $\mathbb{R}$; the extension to a function defined on $\mathbb{R}$ is natural). Let
10.2 Distribution Functions
$x_0 < x_1 < x_2 < \ldots < x_n$ be a finite sequence in $\mathbb{R}$. Then
$$\sum_{i=0}^{n-1} |F_X(x_{i+1}) - F_X(x_i)| = \sum_{i=0}^{n-1} \bigl(F_X(x_{i+1}) - F_X(x_i)\bigr) = \sum_{i=0}^{n-1} P(x_i < X \le x_{i+1}) = P(x_0 < X \le x_n) \le 1,$$
and then $V_{x_0}^{x_n} F_X = P(x_0 < X \le x_n)$ (see Definition 426).
Remark 893 Lemma 255 was proved for the Lebesgue outer measure on $\mathbb{R}$. However, the reader may observe that the argument is independent of this, and relies only on the fact that $\lambda$ was a positive countably additive function defined on a $\sigma$-algebra. Observe, also, that Lemma 255 was formulated for an increasing sequence of sets in the $\sigma$-algebra. A corresponding version for decreasing sequences, in the case of finite measures like the present one (we deal, in fact, with a probability measure), holds as a consequence of that lemma.
Observe, too, that if $x_n \uparrow x_0$ (with $x_n < x_0$ for all $n$), then $\{X^{-1}(-\infty, x_n]\}_{n=1}^{\infty}$ is an increasing sequence of sets, and $\bigcup_{n=1}^{\infty} X^{-1}(-\infty, x_n] = X^{-1}(-\infty, x_0)$. Hence, by Lemma 255, we obtain $F_X(x_n) = P(X \le x_n) \to P(X < x_0) = F_X(x_0) - P(X = x_0)$. We cannot ensure that $P(X = x_0) = 0$, i.e., we cannot assert that the function $F_X$ is left-continuous at $x_0$. By the way, left-continuity at $x_0$ holds if and only if $P(X = x_0) = 0$, as follows from the previous argument. ®
The fact that a distribution function is increasing ensures that it has a derivative (a.e.). For a continuous random variable, this derivative $f$ is called the density function of the random variable. If we restrict ourselves to random variables $X$ with absolutely continuous distribution functions $F_X$, then $f \in L(\mathbb{R})$, and Theorem 791, when applied to any closed and bounded interval $[a, x]$, gives $F(x) - F(a) = \int_{[a,x]} f$. Letting $a \to -\infty$ we get, by the Lebesgue Dominated Convergence Theorem 750, and the fact that $F(a) \to 0$ as $a \to -\infty$, that
$$F(x) = \int_{(-\infty, x]} f \quad \text{for all } x \in \mathbb{R}, \tag{10.25}$$
a continuous analogue of Eq. (10.12).
Example 894 [The normal distribution] Maybe the most important continuous distribution is the normal distribution (also called the Gaussian distribution, to honor C. F. Gauss). It can be understood as the "limit" of the binomial distribution whenever the number of experiments increases without bound (this statement is a particular case of an important result in the theory, the so-called Central Limit Theorem; for a reference see, e.g., [Lo77]). It is then well adapted to analyze, in terms of probability, situations like the following: Assume that, regarding the sex of a newborn, the probability of male at birth in a certain population is 0.487. We want to know the
Fig. 10.6 The density function of a normal distribution with mean 4 and variance 3 on [−20, 20]
probability that the number of females born in some period at some hospital (where they may assist at 113 births per period) will be greater than or equal to 60. The figures are too high to apply formula (10.23). The situation is handled by using the density function of the normal distribution, precisely,
$$f(x) := \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}. \tag{10.26}$$
(A graph of $f$ for $\mu = 2$ and $\sigma^2 = 3$ on $[-20, 20]$ appears in Fig. 10.6.) The distribution function defined by $f$ is then
$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(t-\mu)^2/(2\sigma^2)}\, dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x-\mu)/\sigma} e^{-t^2/2}\, dt, \tag{10.27}$$
an integral that cannot be evaluated in closed form. That $F$ in (10.27) is a distribution function according to Definition 890, together with the fact that the mean $E(X)$ of the random variable $X$ with such a density function is $\mu$, and its variance $V(X)$ is $\sigma^2$, is proved in Exercise 13.535.
To answer the question that motivated the introduction of the normal distribution, observe that if the random variable $X$ is the number of females born, the probability $p$ of female is $1 - 0.487 = 0.513$, and we approximate the mean $\mu := E(X)$ and the variance $\sigma^2 := V(X)$ by the corresponding values for the binomial distribution, i.e., $\mu = np = 57.97$ and $\sigma^2 = npq = 28.23$. A numerical computation to approximate $F(60)$ gives $0.6489$ (and this is the probability $P(X \le 60)$). So, the answer to our question is $0.3511$. ♦
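The numerical computation mentioned in Example 894 can be reproduced with a short sketch. It assumes the standard identity expressing the normal distribution function through the error function, $F(x) = \tfrac12\bigl(1 + \operatorname{erf}\bigl((x-\mu)/(\sigma\sqrt{2})\bigr)\bigr)$, which is not stated in the text; the variable names are ours.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Distribution function (10.27), written via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Data of Example 894: 113 births, probability of female 0.513.
n = 113
p = 1.0 - 0.487
q = 1.0 - p

mu = n * p            # binomial mean, about 57.97
variance = n * p * q  # binomial variance, about 28.23

prob_at_most_60 = normal_cdf(60.0, mu, math.sqrt(variance))
prob_at_least_60 = 1.0 - prob_at_most_60

print(f"F(60)     ~ {prob_at_most_60:.4f}")
print(f"1 - F(60) ~ {prob_at_least_60:.4f}")
```

Up to rounding, the two printed values should agree with the figures 0.6489 and 0.3511 quoted in the example.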
Chapter 11
Excursion to Functional Analysis
In Modern Analysis, most of the problems in Science and Engineering have solutions in infinite-dimensional spaces of functions, where closed bounded sets are usually not compact, measure theory is of limited use, linear operators may not have eigenvalues even in the complex case, or even reasonable continuous functions do not attain their extrema on closed bounded sets, just to mention some difficulties. Thus we encounter there problems additional to those we handled in the first chapters of this text. Functional Analysis helps in solving these problems by providing additional powerful tools that work in general contexts, making it possible to treat a large variety of situations from a unifying point of view. For example, by considering functions as points of a space, approximation and convergence of sequences of functions can be seen more clearly, and functions on functions (especially linear operators like differentiation and integration) can be treated by a mixture of linear algebra, geometry and topology (as in the Banach space and, more particularly, the Hilbert space setting, approached here in Sections 11.1 to 11.4). Compactness now extends to the important case of families of functions. Fixed point theory and, in general, non-linear analysis, in this functional context, provide solutions to differential, partial differential, and integral equations, and variational calculus aims at solving perturbation problems. Spectral theory extends many of the classical results in finite matrix theory to infinite-dimensional cases. Here we can only hope to glimpse these topics. The knowledge of modern integral theory (Lebesgue), and a deeper knowledge of differential calculus, are a must, as the functions involved in solutions of modern analysis problems are usually quite complicated. All this shows why Real Analysis is a basis for many fields of Mathematics, like function spaces, modern differential equations, operator theory, and linear and nonlinear functional analysis, just to name a few.
© Springer International Publishing Switzerland 2015 V. Montesinos et al., An Introduction to Modern Analysis, DOI 10.1007/978-3-319-12481-0_11
11.1 Real Banach Spaces
11.1.1 Spaces with a Norm (Normed Spaces, Banach Spaces)
The following definition presents a concept that extracts the main features of the absolute value function on $\mathbb{R}$ and the Euclidean norm in $\mathbb{R}^n$, this time in the broader context of an arbitrary vector space over the field of the real or complex numbers. As we shall justify later (see Sect. 11.3), all geometrical and almost all analytical concepts we are interested in can be presented in the framework of real normed spaces. Still, the complex case is needed in view of the fact that, especially in Fourier analysis, but also in operator theory, we deal with spaces of (one-variable) complex-valued functions, and those are vector spaces over the field of complex numbers. To avoid misunderstandings, the special features of the complex case will be treated separately (see again Sect. 11.3). Thus, unless otherwise stated, vector spaces are considered over the field $\mathbb{R}$ of real numbers (henceforth called real vector spaces, in short).
Definition 895 A real-valued function $\|\cdot\|$ on a vector (i.e., linear) space $X$ is called a norm on $X$ if
(i) $\|x\| \ge 0$ for every $x \in X$
(ii) $\|x\| = 0$ if and only if $x = 0$
(iii) $\|\lambda x\| = |\lambda|\,\|x\|$ for every $x \in X$ and every $\lambda \in \mathbb{R}$
(iv) $\|x + y\| \le \|x\| + \|y\|$ for every $x, y \in X$ (the "triangle inequality")
A vector space $X$ with a norm $\|\cdot\|$ is denoted by $(X, \|\cdot\|)$, and is called a normed linear space (or just a normed space). If the norm $\|\cdot\|$ is understood, we simply speak of a normed space $X$. Observe that the function $x \mapsto \|x\|$ from a normed space $(X, \|\cdot\|)$ into $\mathbb{R}$ is convex (for the extension of the concept of convexity to a multidimensional setting see Exercise 13.523). Indeed, given two vectors $x, y \in X$ and $\lambda \in [0, 1]$, we have $\|\lambda x + (1 - \lambda) y\| \le \lambda \|x\| + (1 - \lambda) \|y\|$. Note, too, that the function
$$d(x, y) := \|x - y\|, \quad \text{where } x, y \in X \tag{11.1}$$
is indeed a metric on $(X, \|\cdot\|)$, usually referred to as the canonical metric associated to $\|\cdot\|$. To check the triangle inequality (see Definition 548) we write $d(x, z) := \|x - z\| = \|x - y + y - z\| \le \|x - y\| + \|y - z\| =: d(x, y) + d(y, z)$. Therefore, normed linear spaces are special examples of metric spaces; we can then speak of separable normed spaces, complete normed spaces, etc. Convergence in the norm refers to convergence in the associated metric. Precisely, a sequence $\{x_n\}_{n=1}^{\infty}$
in a normed space $(X, \|\cdot\|)$ is norm-convergent (in symbols, $\|\cdot\|$-convergent) to an element $x \in X$ precisely when $d(x_n, x) \to 0$, i.e., $\|x_n - x\| \to 0$.
In this section, by a subspace $Y$ of a normed space $(X, \|\cdot\|)$ we shall always understand a linear subspace, i.e., a subset $Y$ of $X$ such that $a_1 y_1 + a_2 y_2 \in Y$ whenever $y_1, y_2 \in Y$ and $a_1, a_2$ are real numbers. The space $Y$ will always be considered endowed with the restriction of the norm $\|\cdot\|$ (denoted again $\|\cdot\|$), and so $(Y, \|\cdot\|)$ becomes a normed space.
Note that the norm $\|\cdot\|$ of a normed space $(X, \|\cdot\|)$ is a 1-Lipschitz function. Indeed, by the triangle inequality, for all $x, y \in X$ we have $\|x\| = \|x - y + y\| \le \|x - y\| + \|y\|$, hence $\|x\| - \|y\| \le \|x - y\|$. By interchanging the roles of $x$ and $y$ we get $\|y\| - \|x\| \le \|y - x\|$ $(= \|x - y\|)$. Altogether,
$$\bigl|\,\|x\| - \|y\|\,\bigr| \le \|x - y\| =: d(x, y), \quad \text{for all } x, y \in X. \tag{11.2}$$
In particular, $\|\cdot\|$ is continuous.
Note, too, that if $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$ are two sequences in $X$, and $\{\lambda_n\}_{n=1}^{\infty}$ is a sequence in $\mathbb{R}$ such that $x_n \to x$, $y_n \to y$, and $\lambda_n \to \lambda$, then $x_n + y_n \to x + y$ and $\lambda_n x_n \to \lambda x$. These follow easily from the properties of the norm (in the second case, we rely on the fact that every convergent sequence in $X$ is bounded).
Let $(X, \|\cdot\|)$ be a normed space. The set $B_X := \{x \in X : \|x\| \le 1\}$ is said to be the closed unit ball of $(X, \|\cdot\|)$, and $S_X := \{x \in X : \|x\| = 1\}$ the unit sphere of $(X, \|\cdot\|)$. Given $x_0 \in X$ and $r > 0$, the set $B(x_0, r) := \{x \in X : \|x - x_0\| < r\}$ ($B[x_0, r] := \{x \in X : \|x - x_0\| \le r\}$) is said to be the open (respectively, closed) ball centered at $x_0$ with radius $r$.
When the canonical metric of a normed space is complete, the normed space is said to be a Banach space, in honor of the Polish mathematician S. Banach, who together with his collaborators introduced and studied this class.
Definition 896 A Banach space is a normed linear space $(X, \|\cdot\|)$ that is complete in the canonical metric defined by $d(x, y) = \|x - y\|$ for $x, y \in X$, i.e., every Cauchy sequence in $X$ for the metric $d$ converges in the metric $d$ to some point in $X$.
We formulate the completeness condition in terms of the norm: a normed space $(X, \|\cdot\|)$ is said to be a Banach space if, given any sequence $\{x_n\}_{n=1}^{\infty}$ in $X$ such that for every $\varepsilon > 0$ there exists $n_0 \in \mathbb{N}$ with $\|x_n - x_m\| < \varepsilon$ for $n, m \ge n_0$, we can find $x \in X$ such that $\|x_n - x\| \to 0$.
In particular, the spaces $(\mathbb{R}, \|\cdot\|_1)$, $(\mathbb{R}^n, \|\cdot\|_2)$, $(\ell_\infty(\Gamma), \|\cdot\|_\infty)$, $(C[0, 1], \|\cdot\|_\infty)$, introduced in Examples 549.2a, 549.2b, 549.2c, 551.4, respectively, are all Banach spaces. We shall provide below some other examples of normed and Banach spaces.
Example 897
1. Let $c_0(\mathbb{N})$ (also denoted by $c_0$) be the normed space of all real-valued sequences that converge to $0$, endowed with the supremum norm $\|\cdot\|_\infty$. It was considered
Fig. 11.1 Two equivalent norms on $\mathbb{R}^2$ (inclusions (11.4))
in Example 565.18. In Example 573.18 it was shown to be a Banach space, and in Example 586.18 to be separable. The more general setting of the space $c_0(\Gamma)$, where $\Gamma$ is an arbitrary nonempty set, was considered in Example 565.18. In the aforementioned examples it was shown to be a Banach space, and to be separable if and only if $\Gamma$ is countable.
2. Given an arbitrary nonempty set $\Gamma$, the space $c_{00}(\Gamma)$ consists of all elements in $c_0(\Gamma)$ that have a finite support. This space was introduced in Example 565.19, and was endowed with the supremum norm $\|\cdot\|_\infty$. It was proved in Example 573.19 that it is dense in $c_0(\Gamma)$, hence it is not complete. It is separable if and only if $\Gamma$ is countable (see Example 586.19).
3. Let $\ell_1(\mathbb{N})$ (also denoted by $\ell_1$) be the vector space of all real-valued sequences $\{x_n\}_{n=1}^{\infty}$ such that the series $\sum_{n=1}^{\infty} x_n$ is absolutely convergent, i.e., $\sum_{n=1}^{\infty} |x_n|$ is convergent. We endowed this space with the norm $\|x\| := \sum_{n=1}^{\infty} |x_n|$, if $x := \{x_n\}_{n=1}^{\infty} \in \ell_1(\mathbb{N})$ (see Example 565.15). It was proved to be a Banach space in Example 573.15, and separable in Example 586.15. ♦
In the rest of the chapter, and in the exercises, each of those spaces will be assumed to carry the respective norm defined above unless otherwise stated.
Banach spaces provide an ideal interplay between the (complete) metric structure and the linear structure. Thus they form a framework for a large part of mathematics and its applications. So, the analysis on Banach spaces and their relatives is an important part of modern analysis.
Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on a vector space $X$ are called equivalent if there are constants $c_1 > 0$ and $c_2 > 0$ such that
$$c_1 \|x\|_1 \le \|x\|_2 \le c_2 \|x\|_1, \quad \text{for every } x \in X. \tag{11.3}$$
Geometrically, this means that the corresponding unit balls $B_1 := B_{(X, \|\cdot\|_1)}$ and $B_2 := B_{(X, \|\cdot\|_2)}$ satisfy (see Fig. 11.1)
$$c_1 B_2 \subset B_1 \subset c_2 B_2. \tag{11.4}$$
Remark 898
1. We may formulate the condition of equivalence in (11.3) by requesting the existence of a single constant $c > 0$ such that
$$\frac{1}{c} \|x\|_1 \le \|x\|_2 \le c \|x\|_1 \tag{11.5}$$
for all $x \in X$. For this, it is enough to choose $c$ big enough to have simultaneously $1/c < c_1$ and $c_2 \le c$.
2. Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on a vector space $X$ are equivalent if and only if one of the two following conditions holds: (i) $(X, \|\cdot\|_1)$ and $(X, \|\cdot\|_2)$ have the same convergent sequences. (ii) $(X, \|\cdot\|_1)$ and $(X, \|\cdot\|_2)$ have the same bounded sets. A hint for the proof is in Exercise 13.543.
3. If two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on a linear space $X$ are equivalent, then clearly the two normed spaces $(X, \|\cdot\|_1)$ and $(X, \|\cdot\|_2)$ have the same Cauchy sequences. ®
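To make condition (11.3) concrete, here is a minimal sketch on $\mathbb{R}^n$ comparing the supremum norm with the sum-of-absolute-values norm. It assumes the standard inequalities $\|x\|_\infty \le \sum_i |x_i| \le n \|x\|_\infty$, which are not stated in the text, so that $c_1 = 1$ and $c_2 = n$ work in (11.3). The function names, the random test vectors and the small floating-point tolerance are our own choices.

```python
import random


def norm_sum(v):
    """Sum of absolute values of the coordinates."""
    return sum(abs(t) for t in v)


def norm_sup(v):
    """Supremum (maximum) of the absolute values of the coordinates."""
    return max(abs(t) for t in v)


def check_equivalence(n, trials=1000, seed=0):
    """Test c1*||x||_sup <= ||x||_sum <= c2*||x||_sup on random vectors of R^n."""
    rng = random.Random(seed)
    c1, c2 = 1.0, float(n)  # assumed constants for (11.3)
    tol = 1e-12             # guards against floating-point rounding only
    for _ in range(trials):
        x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        a, b = norm_sup(x), norm_sum(x)
        if not (c1 * a <= b + tol and b <= c2 * a + tol):
            return False
    return True


all_hold = check_equivalence(n=5)
```

A random test of this kind proves nothing, of course; it only illustrates what the two-sided estimate in (11.3) says for a specific pair of norms.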
11.1.2 Operators I
Definition 899 A mapping $T$ from a vector space $X$ into a vector space $Y$ is called linear if
$$T(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 T(x_1) + \alpha_2 T(x_2) \quad \text{for every } \alpha_1, \alpha_2 \in \mathbb{R} \text{ and } x_1, x_2 \in X. \tag{11.6}$$
There is a tradition to call linear mappings between vector spaces operators (sometimes we stress the fact that they are linear by referring to them as linear operators, although in this text by the word "operator" we always mean "linear operator"). Observe that if $X$, $Y$, and $Z$ are vector spaces, and $T : X \to Y$ and $S : Y \to Z$ are operators, then $S \circ T : X \to Z$ is also an operator. Since in this context there is no possibility of misunderstanding, this composition will usually be written as $ST$. The vector space of all operators from $X$ into $Y$ will be denoted $L(X, Y)$. A linear functional on $X$ is a linear mapping from $X$ into $\mathbb{R}$.
Continuity, and related concepts, of a mapping between metric spaces were introduced in Chap. 6. For linear mappings between two normed spaces, the following result lists a number of useful conditions equivalent to continuity.
Proposition 900 Let $(X, \|\cdot\|)$ and $(Y, |\cdot|)$ be normed spaces and let $T$ be a linear mapping from $X$ into $Y$. The following are equivalent:
(i) $T$ is continuous on $X$
(ii) $T$ is continuous at the origin
(iii) There is $C > 0$ such that $|T(x)| \le C \|x\|$ for every $x \in X$
(iv) $T$ is Lipschitz
(v) $T(B_X)$ is a bounded set in $Y$
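Proof (i) $\Leftrightarrow$ (ii) follows from the linearity of $T$. (iii) $\Leftrightarrow$ (iv) follows as $|T(x) - T(y)| = |T(x - y)| \le C \|x - y\|$. If (ii) is true, then given $\varepsilon > 0$, there is $\delta > 0$ such that $|T(x)| \le \varepsilon$ whenever $\|x\| \le \delta$. If $x \in X$, $x \ne 0$, then $\left\| \delta \frac{x}{\|x\|} \right\| = \delta$ and thus $\left| T\!\left( \delta \frac{x}{\|x\|} \right) \right| \le \varepsilon$, hence $|T(x)| \le (\varepsilon/\delta) \|x\|$ for every $x \in X$, which shows (iii). On the other hand, (iii)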
clearly implies (ii): Given $\varepsilon > 0$ take $\delta := \varepsilon/C$ to get $|T(x)| \le \varepsilon$ whenever $\|x\| \le \delta$, and this shows the continuity at $0$.
Assuming (iii) we obtain that $|T(x)| \le C$ whenever $x \in B_X$, so $T(B_X)$ is bounded. On the other hand, if $T(B_X)$ is bounded and, say, $|T(x)| \le C$ for every $x \in B_X$, then for every $x \in X$, $x \ne 0$, we have $\left| T\!\left( \frac{x}{\|x\|} \right) \right| \le C$, so $|T(x)| \le C \|x\|$. This shows the equivalence of (iii) and (v).
For a list of equivalences for the continuity of linear functionals in the vein of Proposition 900 see Exercise 13.548. It is worth recalling here one of the equivalences from this exercise. First, some notation: A 0-hyperplane is a linear subspace $H$ of a vector space $E$ having the property that, algebraically, $E = H \oplus \operatorname{span}\{x\}$, for some $x \in E \setminus H$. A hyperplane is the translate of a 0-hyperplane. It is simple to prove that $H$ is a (0-)hyperplane if and only if there exists a linear functional $f : E \to \mathbb{R}$ and a real number $\alpha$ such that $H = f^{-1}(\alpha)$ (respectively, $H = f^{-1}(0)$). The equivalence we want to stress here is the following: A linear functional $f : X \to \mathbb{K}$ is continuous if and only if the 0-hyperplane $f^{-1}(0)$ is closed.
An operator $T$ from $X$ into $Y$ is called bounded if $T(B_X)$ is bounded in $Y$. It follows from Proposition 900 that an operator is continuous if and only if it is bounded. We define the operator norm of a bounded operator $T$ from $(X, \|\cdot\|)$ into $(Y, |\cdot|)$ by
$$\|T\| = \sup\{|T(x)| : x \in B_X\}. \tag{11.7}$$
Note that
$$\|T\| = \sup\{|T(x)| : x \in S_X\}. \tag{11.8}$$
Indeed, $\sup\{|T(x)| : x \in S_X\} \le \sup\{|T(x)| : x \in B_X\}$. On the other hand, if $x \in B_X$ and $x \ne 0$, by the homogeneity of $T$ we have $|T(x)| = \|x\| \cdot |T(x/\|x\|)| \le |T(x/\|x\|)|$, hence $\sup\{|T(x)| : x \in B_X\} \le \sup\{|T(x)| : x \in S_X\}$. Note, too, that
$$\|T\| = \sup\{|T(x)| : x \in B_X \setminus S_X\}. \tag{11.9}$$
The proof is left as an exercise. Observe also that $|T(x/\|x\|)| \le \|T\|$ for every $x \in X$, $x \ne 0$, and thus
$$|Tx| \le \|T\| \cdot \|x\|, \quad \text{for every } x \in X. \tag{11.10}$$
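The supremum in (11.7) can be computed exactly in a simple finite-dimensional situation. The sketch below takes $X = Y = \mathbb{R}^n$ with the supremum norm and an operator given by a matrix. It relies on two standard facts that are assumptions here, since they are not stated in the text: the map $x \mapsto |T(x)|$ is convex, so its supremum over the cube $B_X = [-1, 1]^n$ is attained at a vertex, and the resulting value equals the largest absolute row sum of the matrix. The matrix `A` and all function names are illustrative.

```python
from itertools import product


def sup_norm(v):
    """Supremum norm on R^n."""
    return max(abs(t) for t in v)


def apply(matrix, v):
    """Image of the vector v under the operator given by the matrix."""
    return [sum(a * t for a, t in zip(row, v)) for row in matrix]


def operator_norm_sup(matrix):
    """sup{|T(x)| : x in B_X} as in (11.7), evaluated on the cube's vertices."""
    n = len(matrix[0])
    vertices = product((-1.0, 1.0), repeat=n)
    return max(sup_norm(apply(matrix, v)) for v in vertices)


A = [[1.0, -2.0],
     [3.0, 0.5]]

norm_A = operator_norm_sup(A)
max_row_sum = max(sum(abs(a) for a in row) for row in A)

# Inequality (11.10) for one sample vector.
x = [0.3, -0.7]
lhs = sup_norm(apply(A, x))
rhs = norm_A * sup_norm(x)
```

For this matrix both `norm_A` and `max_row_sum` equal 3.5, and the sample vector satisfies $|Tx| \le \|T\| \cdot \|x\|$.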
$B(X, Y)$ denotes the vector space of all bounded operators from $X$ into $Y$, endowed with the operator norm. That $\|\cdot\|$ defined by (11.7) is indeed a norm on $B(X, Y)$ can be easily checked. We put $B(X) := B(X, X)$.
Proposition 901 Let $X$, $Y$ be normed linear spaces. If $Y$ is a Banach space then $B(X, Y)$ is also a Banach space.
Proof The proof of the completeness is similar to that for the space $C[0, 1]$ (see Example 573.4).
Remark 902
1. If $X$ is a finite-dimensional normed space, then the space $B(X)$ is finite-dimensional, too. This is a consequence of the fact that $X$ has a finite algebraic basis (say with $n$ elements), and that every linear operator from $X$ into $X$ is given by an $(n \times n)$ matrix. By the way, every linear operator from $X$ into $X$ is already continuous. This follows from the matrix description of the operator (see Sect. 11.1.3 and in particular Corollary 910 below).
2. In contrast with the previous item, $B(X)$ may be nonseparable even if $X$ is a separable Banach space. For an example, see Exercise 13.558, where $X$ is in fact a separable Hilbert space. ®
In the theory of linear algebra, the algebraic dual of a vector space $E$ is the set $E'$ of all linear functionals on $E$, i.e., all linear real-valued mappings defined on $E$. It is a linear space when endowed with two operations: the sum of two mappings and the product of a real number and a mapping. The following definition gives the linear-topological counterpart of this concept.
Definition 903 If $(X, \|\cdot\|)$ is a normed space, then the space $B(X, \mathbb{R})$ (denoted $X^*$) is called the dual space of $X$, and the norm $\|\cdot\|$ on $X^*$ defined by (11.7) is called the dual norm or the supremum norm. Precisely,
$$\|f\| := \sup\{|f(x)| : x \in B_X\}, \quad \text{for } f \in X^*. \tag{11.11}$$
Unless otherwise stated, we will assume $X^*$ endowed with the norm defined by (11.11). Sometimes this norm will be denoted | · |∗ to avoid any misunderstanding. For the identification of dual spaces of some classical Banach spaces see Exercises 13.581 and 13.584.
The following result is a particular case of Proposition 901 above.
Corollary 904 Let $(X, \|\cdot\|)$ be a normed space. Then $(X^*, \|\cdot\|)$ is a Banach space.
An operator $T$ from a normed space $X$ into a normed space $Y$ is said to be a linear isomorphism if $T$ is one-to-one, onto, and both $T : X \to Y$ and $T^{-1} : Y \to X$ are continuous. If such an operator exists between $X$ and $Y$, we say that $X$ and $Y$ are linearly isomorphic, or that $X$ is linearly isomorphic to $Y$.
Recall that a mapping between metric spaces that preserves distances is called an isometry. Observe that an operator $T : X \to Y$, where $(X, \|\cdot\|)$ and $(Y, |\cdot|)$ are normed spaces, is an isometry if and only if $|Tx| = \|x\|$ for all $x \in X$. Indeed, if $T$ is an isometry then $|Tx| = d_Y(Tx, T0) = d_X(x, 0) = \|x\|$ for all $x \in X$, where $d_X$ ($d_Y$) is the metric induced by the norm of $X$ (respectively, of $Y$), see Eq. (11.1). Conversely, if $|Tx| = \|x\|$ for all $x \in X$, then $d_Y(Tx_1, Tx_2) = |Tx_1 - Tx_2| = |T(x_1 - x_2)| = \|x_1 - x_2\| = d_X(x_1, x_2)$ for all $x_1, x_2 \in X$.
Note that every linear isometry $T$ from a normed space $(X, \|\cdot\|)$ into a normed space $(Y, |\cdot|)$ is always one-to-one. Indeed, if for $x_1, x_2 \in X$ we have $Tx_1 = Tx_2$, then $|T(x_1 - x_2)| = 0$, hence $\|x_1 - x_2\| = 0$, i.e., $x_1 = x_2$. Two normed spaces $X$ and $Y$ such that there exists a linear isometry from $X$ onto $Y$ are said to be linearly isometric. In particular, they are linearly isomorphic.
Fig. 11.2 The first terms of a bounded sequence in X with an unbounded image by F (Example 906)
Example 905 Let $X = (C[0, 1], \|\cdot\|_\infty)$ (where $C[0, 1]$ denotes the space of all real-valued continuous functions defined on $[0, 1]$, see Example 551.4) and let $t \in [0, 1]$ be a given point. Then the mapping $\delta_t$ from $X$ into $\mathbb{R}$ given by $\delta_t(f) = f(t)$ for all $f \in X$ is a continuous linear functional, and its norm is $1$. Indeed, the linearity of $\delta_t$ is clear from the definition, and its continuity follows from the fact that $|\delta_t(f)| = |f(t)| \le \|f\|_\infty$ for all $f \in X$. This inequality shows that $\|\delta_t\| \le 1$, where $\|\cdot\|$ denotes the norm on $C[0, 1]^*$ dual to $\|\cdot\|_\infty$. Since $\delta_t(\mathbf{1}) = 1$, where $\mathbf{1}$ denotes the constant function $1$, we have, in fact, that $\|\delta_t\| = 1$. Observe, too, that $\|\delta_t - \delta_s\| = 2$ whenever $t, s \in [0, 1]$ and $t \ne s$. Indeed, $\|\delta_t - \delta_s\| \le \|\delta_t\| + \|\delta_s\| \le 2$. On the other hand, find a function $f \in C[0, 1]$ such that $\|f\|_\infty = 1$, $f(t) = 1$ and $f(s) = -1$ and evaluate $(\delta_t - \delta_s) f$. Since $\{\delta_t : t \in [0, 1]\}$ is an uncountable set, (v) in Theorem 582 shows that $(C[0, 1]^*, \|\cdot\|)$ is nonseparable, while we know that $(C[0, 1], \|\cdot\|_\infty)$ is separable (see Example 586.4). ♦
Example 906 We provide an example of a (noncomplete) normed linear space $X$ and a noncontinuous linear functional $F$ on $X$. Let $X$ be the normed linear space of all continuous real-valued functions on $[-1, 1]$ that are differentiable on $(-1, 1)$. This is a subspace of $C[-1, 1]$ that contains the restriction to $[-1, 1]$ of any polynomial. We endow $X$ with the restriction of the supremum norm $\|\cdot\|_\infty$. We shall show that the space $(X, \|\cdot\|_\infty)$ is not complete and that the functional $F$ defined on $X$ by $F(f) = f'(0)$ is discontinuous. Indeed, should $(X, \|\cdot\|_\infty)$ be complete, the Weierstrass approximation theorem and the fact that every complete space is closed in overspaces would imply that $X = C[-1, 1]$; however, not every continuous function on $[-1, 1]$ is differentiable on $(-1, 1)$. That $F$ is discontinuous follows from the equivalence between (i) and (v) in Proposition 900: The sequence $\{\sin nx : n \in \mathbb{N}\}$ is in $B_X$, while $F(\sin nx) = n$ for every $n \in \mathbb{N}$, and so $F(B_X)$ is not bounded (see Fig. 11.2). In this direction, see also Corollary 910. ♦
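The unboundedness of $F$ on $B_X$ in Example 906 can also be observed numerically. The following sketch approximates $F(f_n) = f_n'(0)$ for $f_n(x) = \sin nx$ by a central difference quotient (our choice of approximation, with a hypothetical step `h`), and estimates $\|f_n\|_\infty$ on a grid of $[-1, 1]$.

```python
import math


def derivative_at_zero(f, h=1e-6):
    """Central difference approximation of f'(0); h is an assumed step size."""
    return (f(h) - f(-h)) / (2.0 * h)


def f_n(n):
    """The function x -> sin(n x) from Example 906."""
    return lambda x: math.sin(n * x)


def sup_norm_on_grid(f, points=10001):
    """Maximum of |f| on an equally spaced grid of [-1, 1]."""
    return max(abs(f(-1.0 + 2.0 * k / (points - 1))) for k in range(points))


indices = (1, 10, 100, 1000)
values = {n: derivative_at_zero(f_n(n)) for n in indices}
sup_norms = {n: sup_norm_on_grid(f_n(n)) for n in indices}

for n in indices:
    print(f"n = {n:4d}   sup norm <= {sup_norms[n]:.4f}   F(f_n) ~ {values[n]:.4f}")
```

The functions stay inside the closed unit ball of $(X, \|\cdot\|_\infty)$ while the values of $F$ grow like $n$, which is exactly the failure of condition (v) in Proposition 900.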
11.1.3 Finite-Dimensional Banach Spaces
We shall first review a few basic facts from linear algebra needed below.
Recall that if $X$ is a vector space, then $X'$ denotes the vector space of all linear functionals on $X$, called the algebraic dual space of $X$.
A vector space (also called a linear space) $E$ is said to be finite-dimensional if there exists a finite subset $\{e_1, e_2, \ldots, e_n\}$ of $E$ (called a system of generators) such that every vector $x \in E$ is a linear combination of vectors in $\{e_1, e_2, \ldots, e_n\}$, i.e., $x = \sum_{k=1}^{n} \lambda_k e_k$ for some real numbers $\lambda_1, \ldots, \lambda_n$. A Hamel (or algebraic) basis of a finite-dimensional vector space is a minimal finite system of generators (in the sense that no proper subset of it is a system of generators). Minimality is equivalent to the fact that the expression of any vector as a linear combination of elements from the system of generators is unique. If $\{e_1, e_2, \ldots, e_n\}$ (for a natural number $n$) is a Hamel basis of a finite-dimensional vector space $E$, we say that $E$ has (linear) dimension $n$. Of course, the coherence of this definition needs the proof of the fact that any two Hamel bases of a finite-dimensional vector space have the same cardinality. This is done in basic linear algebra courses.
If $X$ is a finite-dimensional vector space, and $\{e_i : i = 1, 2, \ldots, n\}$ is an algebraic basis of $X$, call $f_i$ the linear functional on $X$ defined by $f_i(x) = x_i$ for $x = \sum_{i=1}^{n} x_i e_i$, $i = 1, 2, \ldots, n$. Then $f_i(e_j) = \delta_{ij}$ (where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise), so $\{f_i : i = 1, 2, \ldots, n\}$ is a linearly independent subset of $X'$. Moreover, if $f \in X'$, then $f = \sum_{i=1}^{n} f(e_i) f_i$, as one may check by evaluating both sides on $e_i$ for all $i = 1, 2, \ldots, n$. This shows that $\{f_i : i = 1, 2, \ldots, n\}$ is an algebraic basis of $X'$.
We collect in the next result a list of basic facts on finite-dimensional linear spaces and linear mappings between them. We do not mention norms: The statement is purely algebraic.
Facts 907
(i) Any two $n$-dimensional linear spaces are linearly algebraically isomorphic.
(ii) An $n$-dimensional linear space is not linearly algebraically isomorphic to any $m$-dimensional linear space if $n \ne m$.
(iii) If $X$ is a finite-dimensional linear space then $X'$ is linearly algebraically isomorphic to $X$.
(iv) Let $X$ be a finite-dimensional space and let $F$ be a linear functional on $X'$. Then there is $x \in X$ such that $F(x') = x'(x)$ for all $x' \in X'$.
(v) Let $X$ be a finite-dimensional space and $T$ be a linear operator from $X$ into $X$. Then $T$ is one-to-one if and only if $T$ is onto.
Proof
(i) If $\{e_i, i = 1, 2, \ldots, n\}$ is an algebraic basis of $X$ and $\{f_i, i = 1, 2, \ldots, n\}$ is an algebraic basis of $Y$, then the mapping $T\bigl(\sum_{i=1}^{n} \lambda_i e_i\bigr) := \sum_{i=1}^{n} \lambda_i f_i$ is a linear isomorphism from $X$ onto $Y$.
(ii) If $n > m$, $\{e_i, i = 1, 2, \ldots, n\}$ is an algebraic basis for $X$, and $T$ is a linear isomorphism from $X$ onto an $m$-dimensional space $Y$, then $\{Te_i, i = 1, 2, \ldots, n\}$ is a linearly independent set in $Y$, a contradiction with the dimension of $Y$ being less than $n$.
(iii) The dimensions of $X$ and $X'$ are the same by the paragraph preceding these facts. Thus, it follows from (i), above, that $X$ and $X'$ are linearly isomorphic.
Fig. 11.3 The closed unit ball in the norm $\|\cdot\|_1$ of $\mathbb{R}^3$ (proof of Theorem 908)
(iv) Let $\{e_i, i = 1, 2, \ldots, n\}$ be an algebraic basis of $X$. Then the functionals $\{F_i : i = 1, 2, \ldots, n\}$ defined on $X'$ by $F_i(f) = f(e_i)$ for $i = 1, 2, \ldots, n$ are linearly independent in the $n$-dimensional space $(X')'$ and thus they span the whole $(X')'$.
(v) Let the dimension of $X$ be $n$. If $T$ is not one-to-one, there is $x \ne 0$ in $X$ such that $T(x) = 0$. Let $x, e_2, \ldots, e_n$ be an algebraic basis of $X$. Then $Y = TX$ is spanned by $T(e_2), \ldots, T(e_n)$, hence $Y \ne X$ and so $T$ is not onto. If $T$ is one-to-one, and $\{e_1, e_2, \ldots, e_n\}$ is an algebraic basis of $X$, then $\{T(e_1), \ldots, T(e_n)\}$ is a linearly independent set in $X$, since, if $\sum_{j=1}^{n} \lambda_j T(e_j) = 0$, then $\sum_{j=1}^{n} \lambda_j e_j = 0$, giving $\lambda_j = 0$ for all $j$ as $\{e_1, \ldots, e_n\}$ is a basis of $X$. Any linearly independent set of $n$ elements in an $n$-dimensional space is necessarily a basis in it. This proves that $T$ is onto.
The next result is crucial in the theory of finite-dimensional spaces.
Theorem 908 Let $X$ be a vector space. If $X$ is finite-dimensional, then any two norms on $X$ are equivalent.
Proof Let $\{e_1, \ldots, e_n\}$ be an algebraic basis of $X$. We introduce a norm $\|\cdot\|_1$ on $X$ by $\|x\|_1 := \sum_{i=1}^{n} |\lambda_i|$ for $x = \sum_{i=1}^{n} \lambda_i e_i \in X$. To check the triangle inequality, for $x = \sum_{i=1}^{n} \lambda_i e_i$ and $y = \sum_{i=1}^{n} \beta_i e_i$ we write
$$\|x + y\|_1 = \sum_{i=1}^{n} |\lambda_i + \beta_i| \le \sum_{i=1}^{n} |\lambda_i| + \sum_{i=1}^{n} |\beta_i| = \|x\|_1 + \|y\|_1.$$
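We will show that an arbitrary norm $\|\cdot\|$ on $X$ is a Lipschitz function on $(X, \|\cdot\|_1)$. Indeed, if $x = \sum_{i=1}^{n} \lambda_i e_i$ and $y = \sum_{i=1}^{n} \beta_i e_i$, then
$$\|x - y\| = \Bigl\| \sum_{i=1}^{n} (\lambda_i - \beta_i) e_i \Bigr\| \le \sum_{i=1}^{n} |\lambda_i - \beta_i| \cdot \|e_i\| \le \max\{\|e_i\| : i = 1, 2, \ldots, n\} \sum_{i=1}^{n} |\lambda_i - \beta_i| = \max\{\|e_i\| : i = 1, 2, \ldots, n\} \cdot \|x - y\|_1.$$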
Use now (11.2) and the previous estimate to get $\bigl|\,\|x\| - \|y\|\,\bigr| \le \|x - y\| \le \max\{\|e_i\| : i = 1, 2, \ldots, n\} \|x - y\|_1$.
We shall prove that $S_1 := \{x \in X : \|x\|_1 = 1\}$ is compact in $(X, \|\cdot\|_1)$ by showing that every sequence $\{x^{(k)}\}_{k=1}^{\infty}$ in $S_1$ has a subsequence that $\|\cdot\|_1$-converges to an element in $S_1$. To this end, put $x^{(k)} = \sum_{i=1}^{n} \lambda_i^{(k)} e_i$ for $k \in \mathbb{N}$. We have $\sum_{i=1}^{n} |\lambda_i^{(k)}| = 1$ for every $k$ and thus $\{\lambda_i^{(k)}\}_{k=1}^{\infty}$ is bounded for every $i = 1, 2, \ldots, n$. Let a subsequence $\{k_l\}_{l=1}^{\infty}$ of $\{1, 2, \ldots\}$ be such that $\lambda_i^{(k_l)} \to \lambda_i$ for every $i = 1, 2, \ldots, n$. Then $\sum_{i=1}^{n} |\lambda_i^{(k_l)} - \lambda_i| \to 0$ as $l \to \infty$, so we have $x^{(k_l)} \to x$ as $l \to \infty$, where $x = \sum_{i=1}^{n} \lambda_i e_i$. Since $\sum_{i=1}^{n} |\lambda_i^{(k_l)}| = 1$ for every $l$, we have $\sum_{i=1}^{n} |\lambda_i| = 1$ and thus $x \in S_1$.
Since $\|\cdot\|$ is continuous on the compact set $S_1$, by Weierstrass' Corollary 335, it is bounded above on $S_1$ (by some $d \in \mathbb{R}$), and it attains its minimum $c$ at a point $x \in S_1$, hence $c := \|x\| \ne 0$. It follows that $c \le \left\| \frac{x}{\|x\|_1} \right\| \le d$ for every nonzero $x \in X$. From the latter inequality we have $c \|x\|_1 \le \|x\| \le d \|x\|_1$ for every $x \in X$, so $\|\cdot\|$ is equivalent to $\|\cdot\|_1$. Consequently, all norms on $X$ are equivalent.
Remark 909 Let $X$ be an $n$-dimensional linear space and let $\{e_1, \ldots, e_n\}$ be an algebraic basis of $X$. The norm $\|\cdot\|_1$ was defined in the proof of Theorem 908. It is straightforward to check that, in $(X, \|\cdot\|_1)$, a sequence $\{x^{(k)}\}_{k=1}^{\infty}$ converges to $x$ if and only if each coordinate of $x^{(k)}$ converges to the corresponding coordinate of $x$, that a sequence is Cauchy in this space if and only if each coordinate forms a Cauchy sequence, etc. Thus, we have that $(X, \|\cdot\|_1)$ is complete. Since all the norms on $X$ are equivalent (Theorem 908), this observation, together with Remarks 898.2 and 898.3, helps in proving Corollary 910 below. ®
Corollary 910
(i) If $X$ is a finite-dimensional normed space and if $T$ is a linear operator from $X$ into a normed space $Y$, then $T$ is continuous. On the other hand, if all linear functionals on a given normed linear space $X$ are continuous, then $X$ is finite-dimensional. As a consequence, the algebraic dual $X'$ of $X$ and the dual $X^*$ (see Definition 903) coincide if $X$ is finite-dimensional.
(ii) All the $n$-dimensional normed spaces are mutually linearly topologically isomorphic, i.e., there is a continuous linear one-to-one map from one onto the other.
(iii) All the $n$-dimensional normed spaces are linearly topologically isomorphic to their dual spaces.
(iv) All the $n$-dimensional normed spaces are complete and closed in all their normed overspaces.
(v) The closed unit ball and the unit sphere in a finite-dimensional normed space are both compact. On the other hand, if the closed unit ball or the unit sphere of a normed space $X$ is compact then $X$ is finite-dimensional.
Proof (i) For the continuity of $T$, it is enough to show that $T$, as a mapping from $(X, \|\cdot\|_1)$ into $Y$, is continuous (see Theorem 908 and Remark 909). To this end, let $M = \max\{\|Te_1\|, \dots, \|Te_n\|\}$. If $\|x\|_1 \le 1$ and $x = (x_1, \dots, x_n)$, then
\[ \|Tx\| = \Bigl\| \sum_{i=1}^{n} x_i\, Te_i \Bigr\| \le \sum_{i=1}^{n} |x_i| \cdot \|Te_i\| \le M \sum_{i=1}^{n} |x_i| \le M . \]
For the second part, we shall prove that if $X$ is an infinite-dimensional normed space, then there always is a discontinuous linear functional on $X$. Indeed, let $\{e_\alpha\}$ be an (infinite) normalized Hamel basis for $X$ (it exists by Zorn's Lemma, see Sect. 12.6.3) and choose an infinite countable subset $\{e_{\alpha_n} : n \in \mathbb{N}\}$ of it. Define a function $f : \{e_\alpha\} \to \mathbb{R}$ by $f(e_{\alpha_n}) = n$ for each $n \in \mathbb{N}$ and $f(e_\alpha) = 0$ for $\alpha \notin \{\alpha_n : n \in \mathbb{N}\}$, and extend it by linearity to $X$, i.e., $f\bigl(\sum a_\alpha e_\alpha\bigr) := \sum a_\alpha f(e_\alpha)$ (the sums here have a nonempty finite number of summands). Then $f$ is an unbounded linear functional on $X$, since $\|e_{\alpha_n}\| = 1$ and $f(e_{\alpha_n}) = n$ for every $n$. For another example on a particular normed space, see Exercise 13.550, and for an instance of an unbounded linear operator between Banach spaces see Example 906.
(ii) Follows from (i) and (v) in Facts 11.1.3 and (i) in this Corollary 910, as any linear isomorphism between finite-dimensional spaces is necessarily a topological isomorphism.
(iii) Follows from (iii) in Facts 11.1.3 and (i) in this Corollary 910.
(iv) Follows from Remark 909 and from the fact in metric spaces that complete spaces are closed in overspaces (see Proposition 572).
(v) Let us first prove that the closed unit ball $B_X$ of a finite-dimensional normed space $(X, \|\cdot\|)$ is compact.
Note that $B_X$ is a closed subset of a multiple of the closed unit ball of the space $(X, \|\cdot\|_1)$ defined in the proof of Theorem 908 (see this theorem and Remark 909). So it suffices to show that the unit ball of $(X, \|\cdot\|_1)$ is compact. To this end, let $\{x_n\}_{n=1}^{\infty}$ be a sequence in it. The boundedness of the sequence consisting of the first coordinates allows us to extract a subsequence of $\{x_n\}_{n=1}^{\infty}$ whose first coordinates form a convergent sequence. A (finite) diagonal argument produces a subsequence $\{y_n\}_{n=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ that converges to a point $y$ in each coordinate, hence in $(X, \|\cdot\|_1)$, see Remark 909. Note that $y$ is in the unit ball of $(X, \|\cdot\|_1)$, as this set is closed. This finishes the proof that the unit ball of $(X, \|\cdot\|_1)$ is compact.
To show that the closed unit ball of an infinite-dimensional normed space cannot be compact, we shall first state and prove the following result, due to F. Riesz.
Lemma 911 (Riesz) Let $(X, \|\cdot\|)$ be a finite-dimensional normed space. Let $Y$ be a proper linear subspace (i.e., a linear subspace different from $X$). Then, given $\varepsilon \in (0, 1)$, there exists an element $x \in S_X$ such that $\operatorname{dist}(x, Y) > 1 - \varepsilon$.
Proof Let $Z$ be an algebraic complement of $Y$ in $X$, i.e., algebraically $X = Y \oplus Z$ or, in other words, every element $x \in X$ can be written in a unique form as $y + z$, where $y \in Y$ and $z \in Z$. Define a nonzero linear mapping $f : X \to \mathbb{R}$ such that $f|_Y = 0$. Corollary 910 (i) shows that $f$ is continuous. By scaling, we may assume
Fig. 11.4 The construction in Lemma 911
that $\|f\| = 1$. Find $x \in S_X$ such that $f(x) > 1 - \varepsilon$ (see Fig. 11.4). Then, for $y \in Y$ we have
\[ 1 - \varepsilon < f(x) = f(x - y) \le \|x - y\| . \tag{11.12} \]
By taking the infimum over $y \in Y$ in (11.12) we get $\operatorname{dist}(x, Y) > 1 - \varepsilon$.
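The estimate (11.12) can be checked by hand in a small case. The sketch below is only an illustration under assumptions that are not in the text: it takes $X = \mathbb{R}^2$ with the norm $\|\cdot\|_1$, the subspace $Y = \operatorname{span}\{(1, 1)\}$, and the functional $f(x) = x_1 - x_2$, which vanishes on $Y$ and has norm 1. Points of $Y$ are sampled on a finite grid, so the computed minimum is a grid value rather than a proof.

```python
from fractions import Fraction

def norm1(v):
    """The l1 norm on R^2 (illustrative choice of norm)."""
    return abs(v[0]) + abs(v[1])

def f(v):
    """Linear functional vanishing on Y = span{(1, 1)}; its norm on (R^2, l1) is 1."""
    return v[0] - v[1]

# A unit vector on which f takes the value 1 = ||f||.
x = (Fraction(1), Fraction(0))

# Sample points y = t * (1, 1) of Y and record ||x - y||_1 for each of them.
ts = [Fraction(k, 100) for k in range(-300, 301)]
distances = [norm1((x[0] - t, x[1] - t)) for t in ts]
approx_dist = min(distances)

print("f(x) =", f(x), " sampled dist(x, Y) =", approx_dist)
```

In this particular example the inequality $f(x) \le \|x - y\|$ is sharp, so the distance from $x$ to $Y$ equals 1; in general Lemma 911 only promises a distance larger than $1 - \varepsilon$.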
We can now finish the proof of (v) in Corollary 910. Assume that $X$ is infinite-dimensional. Use then Lemma 911 to define inductively a sequence $\{x_n\}_{n=1}^{\infty}$ in $S_X$ such that $\|x_n - x_m\| > 1/2$ for all $n, m \in \mathbb{N}$, $1 \le m < n$. To be precise, start by taking an arbitrary element $x_1 \in S_X$. Form the one-dimensional subspace $X_1 := \operatorname{span}\{x_1\}$, and let $X_2$ be a two-dimensional subspace of $X$ such that $X_1 \subset X_2$; there exists $x_2 \in S_{X_2}$ such that $\operatorname{dist}(x_2, X_1) > 1/2$. Find a three-dimensional subspace $X_3$ of $X$ such that $X_2 \subset X_3$; there exists $x_3 \in S_{X_3}$ such that $\operatorname{dist}(x_3, X_2) > 1/2$. Continue in this way. We get a $1/2$-separated infinite subset $\{x_n : n \in \mathbb{N}\}$ of $B_X$, and this implies, by Theorems 615 and 620, that $B_X$ is not compact.
Remark 912 We repeat here a consequence of (i) in Corollary 910: In case $X$ is a finite-dimensional normed space, its algebraic and topological dual spaces coincide. From now on, the algebraic dual of a finite-dimensional normed space will accordingly be denoted by $X^*$, instead of by the separate symbol used for the algebraic dual.
Remark 913 Regarding (iii) in Corollary 910, we mention here that for $n, m \in \mathbb{N}$, the two spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ cannot be homeomorphic if $n \ne m$. This is a consequence of the so-called "invariance of domain" theorem, due to the Dutch mathematician L. E. J. Brouwer: If $U$ is an open subset of $\mathbb{R}^n$, and $f : U \to \mathbb{R}^n$ is a continuous and one-to-one mapping, then $V := f(U)$ is an open set and $f$ is a homeomorphism from $U$ onto $V$. For a proof, see, e.g., [Du66, p. 358].
Recall the definition of the Kronecker delta (named after the German mathematician L. Kronecker): $\delta_{ij} = 0$ if $i \ne j$ and $\delta_{ij} = 1$ if $i = j$. Let $X$ be a Banach space and consider a set of vectors $\{e_i\}_{i=1}^{n} \subset X$ and a set $\{f_i\}_{i=1}^{n}$ of continuous linear functionals. We say that the set $\{e_i; f_i\}_{i=1}^{n}$ is a biorthogonal system in $X \times X^*$ if $f_i(e_j) = \delta_{ij}$ for $i, j = 1, \dots, n$ (see also Sect. 11.1.6).
A biorthogonal system $\{e_i; f_i\}_{i=1}^{n}$ is called an Auerbach basis of a finite-dimensional normed space $X$ if $\{e_i\}_{i=1}^{n}$ is a basis of $X$ and $\|e_i\| = \|f_i\| = 1$ for every $i = 1, 2, \dots, n$. Auerbach bases are important in the geometry of Banach spaces. They bear the name of the Polish mathematician H. Auerbach. For example, if $n \in \mathbb{N}$, in the Banach space $(\mathbb{R}^n, \|\cdot\|_\infty)$ the system $\{e_i; f_i\}_{i=1}^{n}$ in $\mathbb{R}^n \times (\mathbb{R}^n)^*$ is an Auerbach basis, where $e_i := (0, 0, \dots, 0, 1, 0, \dots, 0)$ (1 in the
Fig. 11.5 The construction of an Auerbach basis $\{e_i; f_i\}_{i=1}^{2}$ in $\mathbb{R}^2$ for a given norm
$i$th position) and $f_j(e_i) = \delta_{i,j}$ for $i$ and $j$ in $\{1, 2, \dots, n\}$. So, if $x = (x_i)_{i=1}^{n}$ and $(\|x\|_\infty =) \max\{|x_i| : i = 1, 2, \dots, n\} \le 1$, then $|f_j(x)| = |x_j| \le 1$; moreover, $f_j(e_j) = 1$ for all $j = 1, 2, \dots, n$. These two facts together show that $\|f_j\| = 1$ for all $j = 1, 2, \dots, n$. Similarly for the norm $\|\cdot\|_p$ in $\mathbb{R}^n$, for any $p \ge 1$.
On the other hand, let the basis in $(\mathbb{R}^2, \|\cdot\|_\infty)$ be $\{u_i\}_{i=1}^{2}$, where $u_1 := (1, 0)$ and $u_2 := (1, 1)$. Note that $\|u_i\|_\infty = 1$ for $i = 1, 2$. If $\{u_i; g_i\}_{i=1}^{2}$ is a biorthogonal system in $\mathbb{R}^2 \times (\mathbb{R}^2)^*$, note that $(1, -1) = 2(1, 0) - (1, 1)$ and $\|(1, -1)\|_\infty = 1$, while $g_1(1, -1) = 2g_1(1, 0) - g_1(1, 1) = 2$. This shows that $\|g_1\| \ge 2$. Thus $\{u_i; g_i\}_{i=1}^{2}$ is not an Auerbach basis in $(\mathbb{R}^2, \|\cdot\|_\infty)$.
We remark that the proof of the existence of an Auerbach basis is not trivial, even in two-dimensional spaces: We are looking for independent vectors of norm one, such that the line through each one parallel to the other vector leaves the whole ball on one side (see Fig. 11.5).
Theorem 914 (Auerbach) Every Banach space $X$ of finite dimension $n$ has an Auerbach basis $\{e_i; f_i\}_{i=1}^{n}$.
Proof Let $\{z_1, \dots, z_n\}$ be an algebraic basis of $X$. For $u_1, \dots, u_n \in B_X$ let $v(u_1, \dots, u_n)$ be the determinant of the matrix whose $j$th column is formed by the coordinates of $u_j$ in the basis $\{z_1, \dots, z_n\}$. The function $|v|$ is continuous on the compact set $B_X \times \dots \times B_X$; therefore, there is $(e_1, \dots, e_n) \in B_X \times \dots \times B_X$ such that $v(e_1, \dots, e_n) = \max\{|v(x_1, \dots, x_n)| : (x_1, \dots, x_n) \in B_X \times \dots \times B_X\}$. Since determinants are homogeneous in each column, we have $e_i \in S_X$. Clearly, $v(e_1, \dots, e_n) \ne 0$, so the vectors $\{e_i\}_{i=1}^{n}$ are linearly independent, hence they form a basis of $X$. For $i = 1, \dots, n$, define $f_i \in X^*$ by
\[ f_i(x) = \frac{v(e_1, \dots, e_{i-1}, x, e_{i+1}, \dots, e_n)}{v(e_1, e_2, \dots, e_n)} . \]
Then $f_i(e_j) = \delta_{ij}$, so $\{e_i; f_i\}_{i=1}^{n}$ is a biorthogonal system. Moreover, by the maximality of $v(e_1, \dots, e_n)$, we have $\sup\{|f_j(x)| : x \in B_X\} \le 1$ for each $j = 1, 2, \dots, n$. Since $f_j(e_j) = 1$, it follows that $\|f_j\| = 1$ for $j = 1, 2, \dots, n$. Thus $\{e_i; f_i\}_{i=1}^{n}$ is an Auerbach basis of $X$.
Having in mind that $v(u_1, \dots, u_n)$ in the proof of Theorem 914 is the volume of the parallelepiped determined by the vectors $u_1, \dots, u_n$ in $X$, the construction of an
Fig. 11.6 The construction of an Auerbach basis $\{e_i; f_i\}_{i=1}^{3}$ in $\mathbb{R}^3$ for a given norm
Auerbach basis $\{e_i; f_i\}_{i=1}^{n}$ there amounts to finding, on one hand, the vertices $\{e_i\}_{i=1}^{n}$ of the parallelepiped of maximum volume inscribed in the closed unit ball $B_X$ of the space $X$, and, on the other hand, the hyperplanes $f_i^{-1}(1)$, $i = 1, \dots, n$, that support $B_X$ at those points (see Definition 928 below, and Fig. 11.6 for a three-dimensional representation of it).
Remark 915 It is not known whether $(C[0, 1], \|\cdot\|_\infty)$ (see Example 551.4) has an (infinite) Auerbach basis $\{f_n; \mu_n\}_{n=1}^{\infty}$. By this we mean a biorthogonal system $\{f_n; \mu_n\}_{n=1}^{\infty}$ in $C[0, 1] \times C[0, 1]^*$ such that $\{f_n\}_{n=1}^{\infty}$ is linearly dense (i.e., the span of $\{f_n : n \in \mathbb{N}\}$ is dense) in $(C[0, 1], \|\cdot\|_\infty)$ and $\{\mu_n\}_{n=1}^{\infty}$ separates points of $C[0, 1]$ (i.e., given $f$ and $g$ in $C[0, 1]$ such that $f \ne g$, we can find $n \in \mathbb{N}$ such that $\mu_n(f) \ne \mu_n(g)$).
Some Remarks on Compactness in Finite-Dimensional Banach Spaces
The basic Theorem 908 shows that every finite-dimensional Banach space $(X, \|\cdot\|)$ is linearly isomorphic to $(\mathbb{R}^n, \|\cdot\|_2)$ (where $n$ is the algebraic dimension of $X$). It follows from Theorem 96 that a subset $K$ of $X$ is compact if and only if it is closed and bounded.
The following result, due to H. Minkowski and C. Carathéodory, is the finite-dimensional precedent of the Krein–Milman theorem (compare with the last but one paragraph in Sect. 11.1.4, and with Exercises 13.593 and 13.608). For the concept of extreme point see Exercise 13.587. The Krein–Milman Theorem is named after the Russian mathematicians M. G. Krein and D. P. Milman.
Theorem 916 (Minkowski, Carathéodory) Let $K$ be a compact subset of an $n$-dimensional Banach space $(X, \|\cdot\|)$. Then $\operatorname{conv}(K)$ is compact, all extreme points of $\operatorname{conv}(K)$ lie in $K$, and every point $x \in \operatorname{conv}(K)$ is a convex combination of at most $n + 1$ extreme points of $\operatorname{conv}(K)$.
Proof (Sketch) (1) We show first that if $C$ is a nonempty compact convex subset of an $n$-dimensional Banach space (we may always assume $0 \in C$), then every $x \in C$ is a convex combination of at most $n + 1$ extreme points of $C$. This is proved by induction
Fig. 11.7 The inductive construction in (1) in the proof of Theorem 916
on $n$: It is obvious for $n = 1$ ($C$ is just a closed and bounded interval); if valid for $n \ge 1$, and $C$ is in an $(n + 1)$-dimensional space $X$, it is enough to assume that $\operatorname{span} C = X$, hence that $C$ has a nonempty interior (use the Baire Category Theorem 111). If $x$ is a boundary point, it is in a hyperplane $H$ that supports $C$ at $x$ (use the Separation theorem), and the induction hypothesis allows us to write $x$ as a convex combination of no more than $n + 1$ extreme points of $H \cap C$ (and those are extreme points of $C$, see Exercise 13.588). If, on the contrary, $x$ belongs to the interior of $C$, take an extreme point $e$ of $C$ (it exists by the Krein–Milman theorem, see Exercise 13.593) and draw a segment from $e$ through $x$ until meeting a boundary point $b \in C$ (the intersection of a line and $C$ is a closed and bounded interval, due to the connectedness of $C$). Apply the previous argument to $b$ to finally write $x$ as a convex combination of no more than $n + 2$ extreme points of $C$ (see Fig. 11.7).
(2) Assume that $\{K_1, K_2, \dots, K_m\}$ is a finite collection of compact convex sets in a Banach space $X$. Then $C := \operatorname{conv}\bigl(\bigcup_{i=1}^{m} K_i\bigr)$ is a compact set. Indeed, let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $C$. Since $x_n = \sum_{i=1}^{m} \lambda_i^n e_i^n$, where $e_i^n \in K_i$ and $\lambda_i^n \in [0, 1]$ for all $n \in \mathbb{N}$, $i = 1, 2, \dots, m$, and $\sum_{i=1}^{m} \lambda_i^n = 1$ for all $n \in \mathbb{N}$, a (finite) diagonal method allows us to extract a subsequence of $\{x_n\}_{n=1}^{\infty}$ converging to an element $x \in C$.
(3) All norms in a finite-dimensional Banach space are equivalent (Theorem 908), so we may assume from the beginning that $X$ carries the $\|\cdot\|_\infty$-norm. Its open unit ball is thus a finite intersection of open half-spaces. Obviously, $C := \overline{\operatorname{conv}}(K)$ is a (closed and) bounded subset of $X$, so it is compact. Let us prove that every extreme point $e$ of $C$ is in $K$ (see Fig. 11.8). Fix $r > 0$. The set $B(e, r)$ is a finite intersection of open half-spaces $H_i$, $i = 1, 2, \dots, m$. Each set $K_i := H_i^c \cap C$ is convex and compact, and we get from (2) that $\operatorname{conv}\bigl(\bigcup_{i=1}^{m} K_i\bigr)$ is compact. Since $e$, being extreme, is not in $\operatorname{conv}\bigl(\bigcup_{i=1}^{m} K_i\bigr)$ (as it is not in $\bigcup_{i=1}^{m} K_i$), it can be separated from it by a single hyperplane, i.e., there exist $f \in X^*$ and $\alpha \in \mathbb{R}$ such that $f(x) < \alpha < f(e)$ for all $x \in \operatorname{conv}\bigl(\bigcup_{i=1}^{m} K_i\bigr)$. The set $K$ must intersect $\{x \in X : f(x) > \alpha\}$, and so it intersects $B(e, r)$, which contains $C \cap \{x \in X : f(x) > \alpha\}$.
Fig. 11.8 The construction in (3) in the proof of Theorem 916
Since $r > 0$ was arbitrary and $K$ is closed, we get $e \in K$. As a consequence, every $x \in C$ is a convex combination of elements in $K$, so $C = \operatorname{conv}(K)$.
Remark 917
1. That $n + 1$ extreme points are needed in general in Theorem 916 can be seen by looking at the set $K := \{(0, 0), (1, 0), (0, 1)\}$ in $\mathbb{R}^2$ and its convex hull $C$.
2. Note that the convex hull of the set $\{e_n : n \in \mathbb{N}\}$ in $\ell_2$ consists of finitely supported vectors, so it is not closed.
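The triangle mentioned in Remark 917.1 can be examined with exact arithmetic. The following sketch is an illustration only: the helper names (`weights`, `combine`, `on_segment`) and the test point $(1/3, 1/3)$ are assumptions introduced here, and the formula for the weights is specific to this particular triangle. It writes the point as a convex combination of the three vertices and checks that it lies on no segment joining two of them, so two extreme points would not suffice.

```python
from fractions import Fraction as Fr

# Extreme points of C = conv{(0,0), (1,0), (0,1)} in R^2.
VERTICES = [(Fr(0), Fr(0)), (Fr(1), Fr(0)), (Fr(0), Fr(1))]
EDGES = [(VERTICES[0], VERTICES[1]),
         (VERTICES[1], VERTICES[2]),
         (VERTICES[0], VERTICES[2])]

def weights(p):
    """Convex weights of p with respect to VERTICES (valid for this triangle only)."""
    a, b = p
    return [1 - a - b, a, b]

def combine(ws, pts):
    """The combination sum_i ws[i] * pts[i] in R^2."""
    return (sum(w * q[0] for w, q in zip(ws, pts)),
            sum(w * q[1] for w, q in zip(ws, pts)))

def on_segment(p, u, v):
    """True if p lies on the closed segment [u, v]."""
    cross = (v[0] - u[0]) * (p[1] - u[1]) - (v[1] - u[1]) * (p[0] - u[0])
    if cross != 0:
        return False
    dot = (p[0] - u[0]) * (v[0] - u[0]) + (p[1] - u[1]) * (v[1] - u[1])
    return 0 <= dot <= (v[0] - u[0]) ** 2 + (v[1] - u[1]) ** 2

p = (Fr(1, 3), Fr(1, 3))          # an interior point of the triangle
ws = weights(p)
print("weights:", ws)
print("reconstructed point:", combine(ws, VERTICES))
print("on some edge:", any(on_segment(p, u, v) for u, v in EDGES))
```

Since the only extreme points of this triangle are its three vertices, a point lying on none of the three edges cannot be a convex combination of two extreme points, which matches the count $n + 1 = 3$ for $n = 2$.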
11.1.4 Infinite-Dimensional Banach Spaces
We will show how results on bounded operators and functionals on finite-dimensional spaces (see Sect. 11.1.3) generalize, or fail to generalize, to bounded operators and functionals on infinite-dimensional spaces. In this direction, let us first note the following list of remarks.
Remark 918
1. There is a Banach space $X$ and a bounded linear operator from $X$ into $X$ that is one-to-one but not onto. There is a Banach space $X$ and a bounded linear operator from $X$ into $X$ that is onto but not one-to-one. Indeed, check the right and left shift operators on $\ell_2$ (see Example 1010 and Exercise 13.614). Compare these situations with (v) in Facts 11.1.3.
2. There is a normed linear space $X$ and a bounded linear one-to-one operator from $X$ onto $X$ that is not a linear isomorphism from $X$ onto $X$. Indeed, let an operator $T$ be defined on $c_{00}$ by
\[ T(x) := \Bigl( \frac{1}{2^n}\, x_n \Bigr)_{n=1}^{\infty} \]
for $x = (x_n) \in c_{00}$. Compare this situation with (v) in Facts 11.1.3 and with the fundamental Banach Open Mapping Theorem 953.
3. Note that the right shift operator on $\ell_2$ (see Remark 918.1 above and Example 1010) is an isometry onto a hyperplane (see also Exercise 13.536). However, there is an infinite-dimensional Banach space that is not linearly isomorphic to any of its closed hyperplanes. The first example of this behavior, answering a question raised by S. Banach in his book [Ba32], was given by the British mathematician W. T. Gowers in 1994 (for some details and references, see [FHHMZ11], §5.4, Remark 2). Note that any two closed hyperplanes of a Banach space are mutually
Fig. 11.9 Two closed hyperplanes are always linearly isomorphic (Remark 918.3)
homeomorphic. The reader may provide a proof of this last fact by looking at Fig. 11.9. The Russian mathematician M. J. Kadets proved in the 1960s that all separable infinite-dimensional Banach spaces are mutually homeomorphic (for the result and references see, e.g., [FHHMZ11], Theorem 12.46).
4. If $X$ is infinite-dimensional, then $B_X$ is homeomorphic to $S_X$ and to $X$ (see, e.g., [BePe], p. 190). This is in contrast with the finite-dimensional case (a consequence of L. E. J. Brouwer's invariance of domain theorem [Br12], see Remark 913 above).
5. The space $c_0$ is not isomorphic to $\ell_2$. We propose here two approaches.
a) The sequence $\{s_n\}_{n=1}^{\infty}$ in $c_0$, where $s_n = \sum_{i=1}^{n} e_i$ for $n \in \mathbb{N}$ and $e_n$ is the $n$th canonical unit vector, does not have any pointwise convergent subsequence (see Remark 999 below). If $T$ is an isomorphism from $\ell_2$ onto $c_0$, and $t_n = T^{-1}(s_n)$ for each $n \in \mathbb{N}$, the sequence $\{t_n\}_{n=1}^{\infty}$ in $\ell_2$ is bounded, so there is $t \in \ell_2$ and a subsequence $\{t_{n_k}\}_{k=1}^{\infty}$ of $\{t_n\}_{n=1}^{\infty}$ such that $f(t_{n_k}) \to f(t)$ as $k \to \infty$ for every $f \in \ell_2^*$ (see Theorem 998 below). This shows that $g(T(t_{n_k})) = g(s_{n_k}) \to g(T(t))$ as $k \to \infty$ for every $g \in c_0^*$ (in particular, $s_{n_k} \to T(t)$ pointwise), and we reach a contradiction with the property of $\{s_n\}_{n=1}^{\infty}$ we started with.
b) For a second argument, see Exercise 13.606.
We proved in Corollary 910 that the closed unit ball of an infinite-dimensional Banach space is never compact. Since compactness is a highly desirable property, the following result, due to the Canadian-American mathematician L. Alaoglu, which ensures some kind of compactness of the closed unit ball of a (dual) Banach space, plays an important role in the whole theory. We restrict ourselves here to the case of separable Banach spaces, mainly because in this text we did not consider general topological spaces beyond the metric ones.
Theorem 919 (Alaoglu) Let $X$ be a separable Banach space and let $\{x_n\}_{n=1}^{\infty}$ be a dense sequence in $S_X$. Let a metric $d_{w^*}$ be introduced on $B_{X^*}$ by
\[ d_{w^*}(f, g) := \sum_{i=1}^{\infty} \frac{1}{2^i}\, |f(x_i) - g(x_i)| . \]
Then $(B_{X^*}, d_{w^*})$ is a compact metric space, and $f_n \xrightarrow{d_{w^*}} f$ if and only if $f_n(x) \to f(x)$ for all $x \in X$. In particular, from every sequence $\{f_n\}_{n=1}^{\infty}$ in $B_{X^*}$ one can get a subsequence $\{f_{n_k}\}_{k=1}^{\infty}$ such that, for some $f \in B_{X^*}$, $f_{n_k}(x) \to f(x)$ for every $x \in X$.
Sketch of the Proof First note that, in $(B_{X^*}, d_{w^*})$, $f_n \to f$ if and only if $f_n(x_i) \to f(x_i)$ for all $i \in \mathbb{N}$, which is the same as $f_n(x) \to f(x)$ for all $x \in X$. Then, by the Cantor diagonal procedure, given a sequence $\{f_n\}_{n=1}^{\infty}$ in $B_{X^*}$ we choose a subsequence $\{f_{n_k}\}_{k=1}^{\infty}$ of $\{f_n\}_{n=1}^{\infty}$ such that $\{f_{n_k}(x_i)\}_{k=1}^{\infty}$ is convergent for all $i \in \mathbb{N}$. From that we get that $\{f_{n_k}(x)\}_{k=1}^{\infty}$ is Cauchy and thus convergent for all $x \in X$, and the pointwise limit defines a linear functional $f$ on $X$ such that on $B_X$ it is bounded in absolute value by 1, by inspection. Thus, $f \in B_{X^*}$.
Example 920 The space $\ell_1$ is a subset of $\ell_2$, since $\sum_{k=1}^{\infty} |x_k| < +\infty$ implies that there exists $k_0 \in \mathbb{N}$ such that $|x_k| < 1$ for $k \ge k_0$, hence $|x_k|^2 \le |x_k|$ for all $k \ge k_0$, and the conclusion follows. Note that the closed unit ball of $\ell_1$, as a subset of $\ell_2$, is closed. This can be proven in two ways.
(i) Directly. Let $\{x^{(n)}\}_{n=1}^{\infty}$ be a sequence in $B_{\ell_1}$ that $\|\cdot\|_2$-converges to an element $x \in \ell_2$. Assume that $(\|x\|_1 =) \sum_{k=1}^{\infty} |x_k| > 1$, where $x = (x_k)$. Find $\varepsilon > 0$ such that $\sum_{k=1}^{\infty} |x_k| > 1 + \varepsilon$. There exists $k_0 \in \mathbb{N}$ such that $\sum_{k=1}^{k_0} |x_k| > 1 + \varepsilon$. Since the sequence $\{x^{(n)}\}_{n=1}^{\infty}$ converges to $x$ coordinatewise, we can find $n \in \mathbb{N}$ such that $|x_k^{(n)} - x_k| < \varepsilon/(2k_0)$ for all $k = 1, 2, \dots, k_0$. This shows, in particular, that $\sum_{k=1}^{k_0} |x_k^{(n)}| > 1 + \varepsilon/2$, and so $\sum_{k=1}^{\infty} |x_k^{(n)}| > 1 + \varepsilon/2$, a contradiction.
(ii) As an application of Theorem 919. Indeed, due to the fact that $\ell_1$ is the dual space of $c_0$ (see Exercise 13.581), we have, by Theorem 919, that $B_{\ell_1}$ is compact in the topology $d_{w^*}$ of the pointwise convergence on elements of $c_0$, in particular in the topology of the coordinatewise convergence. This shows that $B_{\ell_1}$, as a subset of $\ell_2$, is compact, and hence closed, in the topology of the coordinatewise convergence; since $\|\cdot\|_2$-convergence implies coordinatewise convergence, $B_{\ell_1}$ is closed in the $\|\cdot\|_2$-topology as well.
♦
A simple application of the Baire Category Theorem 640 shows that there is no infinite-dimensional Banach space of countable algebraic dimension, i.e., no infinite-dimensional Banach space $X$ can have a countable subset $N := \{x_n : n \in \mathbb{N}\}$ such that $\operatorname{span} N = X$. Indeed, if such a set existed, we could write $X = \bigcup_{n=1}^{\infty} F_n$, where $F_n := \operatorname{span}\{x_i : i = 1, 2, \dots, n\}$ for $n \in \mathbb{N}$, hence some $F_n$ would have a nonempty interior, something impossible for a finite-dimensional subspace of $X$. (Indeed, if some $x$ belongs to the interior of $F_n$, then $B[x, \delta] \subset F_n$ for some $\delta > 0$. Obviously, $B[x, \delta]$ and $B_X$ are homeomorphic, and $B[x, \delta]$ is compact by (v) in Corollary 910, so $B_X$ is compact, too. This shows, again by the same reference, that $X$ must be finite-dimensional.) In particular, the space of all polynomials cannot be endowed with a norm that makes it a Banach space, since the linear span of the set of functions $\{x^n : n = 0, 1, 2, \dots\}$ is the whole space.
Some Remarks on Compactness in Banach Spaces
Note that the closed convex hull of a compact subset $K$ of a Banach space $X$ is compact (compare with Theorem 916 above). Indeed, $K$ is totally bounded (Proposition 613), so for every $\delta > 0$, the set $K$ contains a finite $\delta$-net $N$. It is simple to prove that
$\operatorname{conv}(N)$ is a $\delta$-net for $\operatorname{conv}(K)$. The set $\operatorname{conv}(N)$ is bounded and lies in a finite-dimensional space, hence its closure is compact and so $\operatorname{conv}(N)$ is totally bounded. Clearly, a $\delta$-net for $\operatorname{conv}(N)$ is a $2\delta$-net for $\operatorname{conv}(K)$. This shows that $\operatorname{conv}(K)$ is totally bounded, thus $\overline{\operatorname{conv}}(K)$ is also totally bounded (see Corollary 618). Since $X$ is a Banach space, the result follows from Theorem 620.
We shall prove in Exercise 13.593 that a closed convex and compact subset of a Banach space is the closed convex hull of the set consisting of all its extreme points, and we mention in Remark 1087 a general setting where a similar result holds. Compare again with Theorem 916 above.
It is worth mentioning here that every compact set in a Banach space lies in the closed convex hull of a null sequence. This is a result due to the German-French mathematician A. Grothendieck. For a reference, see, e.g., [FHHMZ11], Exercise 1.69.
11.1.5 Operators II
11.1.6 Finite-Rank and Compact Operators
Definition 921 Let $T$ be a bounded linear operator from a Banach space $X$ into a Banach space $Y$. Then
(i) $T$ is called a finite-rank operator if $\dim TX$ is finite.
(ii) $T$ is called a compact operator if the closure $\overline{TB_X}$ of $TB_X$ in $Y$ is compact.
Note that a bounded linear operator $T$ from a Banach space $X$ into a Banach space $Y$ is a finite-rank operator if, and only if, there is a family $\{e_i : i = 1, \dots, n\}$ in $Y$ and a family $\{f_i : i = 1, \dots, n\}$ in $X^*$ for some $n \in \mathbb{N}$, such that $Tx = \sum_{i=1}^{n} f_i(x) e_i$ for all $x \in X$. Indeed, if $\dim TX = n$, we can find a Hamel (i.e., algebraic) basis $\{e_1, \dots, e_n\}$ of $TX$. Let $\{g_1, \dots, g_n\}$ be a subset of $(TX)^*$ such that the system $\{e_i; g_i\}_{i=1}^{n}$ is biorthogonal (i.e., $g_i(e_j) = \delta_{i,j}$ for all $i, j = 1, 2, \dots, n$) (see Theorem 914). Finally, put $f_i := g_i \circ T$ for all $i = 1, \dots, n$. Then, for all $x \in X$ we have $Tx = \sum_{i=1}^{n} g_i(Tx) e_i = \sum_{i=1}^{n} f_i(x) e_i$. Conversely, if $f_1, \dots, f_n \in X^*$ and $e_1, \dots, e_n \in Y$, then the mapping $x \mapsto \sum_{i=1}^{n} f_i(x) e_i$ is a finite-rank operator from $X$ into $Y$.
Note that every finite-rank operator is a compact operator. Indeed, if $TX$ is finite-dimensional in $Y$, then $TB_X$ is a bounded subset of the finite-dimensional space $TX$. Since $TX$ is closed in $Y$ (see (iv) in Corollary 910), we get that $\overline{TB_X}$ is a closed and bounded subset of the finite-dimensional space $TX$, hence $\overline{TB_X}$ is compact in $TX$ and in $Y$ as well.
To see that the class of compact operators is, in general, strictly larger than the class of finite-rank operators, define an operator $T$ from $\ell_2$ into $\ell_2$ by $Tx = (x_i / 2^i)$ for $x = (x_i)$. Then $TB_{\ell_2}$ is a subset of the Hilbert cube (see Example 984 below)
and thus $\overline{TB_{\ell_2}}$ is compact. Hence $T$ is a compact operator that is not a finite-rank operator.
Example 922 We give here an example of a finite-rank operator $f$, hence a compact operator, from a Banach space $X$ into another Banach space $Y$, such that the image of the closed unit ball is not closed. The space $Y$ is $\mathbb{R}$, and $f$ is a continuous linear functional on $X$ that does not attain its norm, i.e., does not attain its supremum on $B_X$. The space $X$ is $(c_0, \|\cdot\|_\infty)$, and $f \in c_0^*$ (see also Exercise 13.581) is defined by
\[ f(x) = \sum_{i=1}^{\infty} \frac{x_i}{2^i}, \quad \text{for } x = (x_i) \in c_0 . \]
The functional $f$ is indeed well defined, since if $x = (x_i) \in c_0$ then
\[ |f(x)| \le \sum_{i=1}^{\infty} \frac{|x_i|}{2^i} \le \|x\|_\infty \sum_{i=1}^{\infty} \frac{1}{2^i} = \|x\|_\infty \]
(this shows, in particular, that $\|f\| \le 1$). In fact, $\|f\| = 1$ (evaluate $f$ on the elements $\sum_{i=1}^{n} e_i$, where $e_n = (0, \dots, 0, 1, 0, \dots)$, and 1 is in the $n$th position) but, as for every $x \in c_0$ there are indices $i$ for which $|x_i| < 1$, it follows that for every $x \in B_{c_0}$ we have $|f(x)| < 1$. This shows that $f(B_{c_0}) = (-1, 1)$. Thus the closure is needed in the definition of compact operators (Definition 921). This example shows a continuous linear functional $f$ on $c_0$ that does not attain its supremum (the value 1) on a nonempty closed and bounded subset of $c_0$, namely its closed unit ball (see Remark 934). ♦
If $X$ is an infinite-dimensional Banach space, $B_X$ is not norm compact (see (v) in Corollary 910), and thus the identity operator on $X$ is not a compact operator. To see this on a concrete example, let $X := (c_0, \|\cdot\|_\infty)$ and note that for each unit vector $e_i$ (see Example 922), we have $\|e_i\|_\infty = 1$ and $\|e_i - e_j\|_\infty = 1$ for $i \ne j$. It follows that $B_{c_0}$ is not compact (it is complete and not totally bounded, due to the existence of an infinite 1-separated set in it, see Theorems 615 and 620). For other examples where the formal identity mapping is a noncompact operator see Exercises 13.570 and 13.571.
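The behaviour of the functional $f$ of Example 922 on the vectors $s_n := e_1 + \dots + e_n$ can be tabulated exactly. The sketch below is an illustration only: it represents a finitely supported sequence by a Python list (an assumption made here for convenience; such sequences belong to $c_{00} \subset c_0$), and the helper names `f` and `s` are introduced just for this example. The values come out as $1 - 2^{-n}$, which approach 1 without reaching it.

```python
from fractions import Fraction

def f(x):
    """f(x) = sum_i x_i / 2^i for a finitely supported x, with x[0] playing the role of x_1."""
    return sum(Fraction(xi) / 2 ** i for i, xi in enumerate(x, start=1))

def s(n):
    """The vector e_1 + ... + e_n, of sup norm 1."""
    return [1] * n

values = [f(s(n)) for n in range(1, 11)]
for n, v in enumerate(values, start=1):
    print(n, v, float(1 - v))
```

The computation only samples particular vectors of the unit ball; the claim that no $x \in B_{c_0}$ gives $|f(x)| = 1$ is the argument in Example 922, not something the code establishes.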
11.1.7 Sets of Operators
Recall that the space of all bounded operators $B(X, Y)$ from a normed space $X$ into a normed space $Y$ is assumed to be endowed, unless stated otherwise, with the supremum norm $\|T\| := \sup\{\|Tx\| : x \in B_X\}$, see formula (11.7). We called this norm the operator norm in Sect. 11.1.2. We note that if $X$ and $Y$ are Banach spaces, then $(B(X, Y), \|\cdot\|)$ is a Banach space (Proposition 901). Observe that if $\{T_n\}_{n=1}^{\infty}$ is a sequence of linear mappings from $X$ into $Y$ and $T_n(x) \to T(x)$ for all $x \in X$, then $T$ must be linear.
Proposition 923 Let $X$ and $Y$ be Banach spaces. Then, the subspace of $B(X, Y)$ consisting of all compact operators from $X$ into $Y$ is closed.
Proof Let $\{T_n\}_{n=1}^{\infty}$ be a sequence of compact operators in $B(X, Y)$ that $\|\cdot\|$-converges to an operator $T$. Given $\varepsilon > 0$ we can find $T_n$ such that $\|T - T_n\| < \varepsilon$. In particular, $T(B_X) \subset T_n(B_X) + \varepsilon B_Y$. Since $\overline{T_n(B_X)}$ is compact, we can find a finite cover of it by sets of diameter less than $\varepsilon$, hence a finite cover of $T(B_X)$ by sets of diameter less than $3\varepsilon$. This shows that $T(B_X)$ is totally bounded (see Remark 611), and so is its closure. The closure is moreover complete, due to the fact that it is closed in the complete space $Y$. Thus the set $\overline{T(B_X)}$ is compact¹ (Theorem 620).
The following result follows from Proposition 923.
Corollary 924 Let $X$ and $Y$ be Banach spaces. Then, the closure of the space of finite-rank operators from $X$ into $Y$ in $B(X, Y)$ is contained in the space of compact operators.
Exercise 13.579 shows that, in some instances (for example, in case of separable Hilbert spaces, see Sect. 11.4), the space of all compact operators is the closure, in the operator norm, of the space of all finite-rank operators.
Let $X$, $Y$ be normed spaces, and let $A \subset B(X, Y)$. We say that $A$ is bounded if it is bounded in the operator norm, i.e., if $\sup\{\|T\| : T \in A\} < \infty$. We say that $A$ is pointwise bounded if $\sup\{\|T(x)\| : T \in A\} < \infty$ for all $x \in X$.
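Corollary 924 can be made concrete on the diagonal operator $Tx = (x_i / 2^i)$ on $\ell_2$ from Sect. 11.1.6. The sketch below is a numerical illustration under assumptions that are not in the text: it works with the first $N = 30$ coordinates only, it introduces the hypothetical truncations $T_n$ (keep the first $n$ coordinates of $Tx$ and set the rest to zero, a finite-rank operator), and it samples random unit vectors with a fixed seed. The gap $\|(T - T_n)x\|_2$ stays below $2^{-(n+1)}$ on the samples, and this value is reached at the unit vector $e_{n+1}$.

```python
import math
import random

def T(x):
    """Diagonal operator x -> (x_i / 2^i), applied to a finite list of coordinates."""
    return [xi / 2 ** i for i, xi in enumerate(x, start=1)]

def T_n(x, n):
    """Hypothetical finite-rank truncation: first n coordinates of T x, zeros afterwards."""
    return [xi / 2 ** i if i <= n else 0.0 for i, xi in enumerate(x, start=1)]

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def gap(x, n):
    """The quantity ||(T - T_n) x||_2."""
    return norm2([a - b for a, b in zip(T(x), T_n(x, n))])

N, n = 30, 5                      # number of coordinates kept, truncation level
random.seed(0)                    # fixed seed so that the run is repeatable
samples = []
for _ in range(200):
    x = [random.uniform(-1.0, 1.0) for _ in range(N)]
    r = norm2(x)
    samples.append(gap([xi / r for xi in x], n))

e = [0.0] * N
e[n] = 1.0                        # the unit vector e_{n+1}
attained = gap(e, n)
print("bound 2^-(n+1):", 2.0 ** -(n + 1))
print("largest sampled gap:", max(samples), " gap at e_{n+1}:", attained)
```

The bound itself follows from $\|(T - T_n)x\|_2^2 = \sum_{i > n} x_i^2 / 4^i \le 4^{-(n+1)} \|x\|_2^2$, so $\|T - T_n\| \to 0$ and $T$ lies in the closure of the finite-rank operators.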
11.2 Three Basic Principles of Linear Analysis
The following three fundamental principles of functional analysis (Theorems 925, 951, and 953, together with its Corollary 956) are all due to the Polish mathematician S. Banach and his collaborators.
11.2.1 Extending Continuous Linear Functionals
The Hahn–Banach Theorem
Every linear functional on a subspace $Y$ of a vector space $X$ can be extended to a linear functional on the whole of $X$. This is a consequence of the possibility of extending an algebraic basis of $Y$ to an algebraic basis of $X$, something guaranteed by Zorn's Lemma (see Sect. 12.6.3). More delicate is the proof of the fact that continuous
¹ The reader may identify, in the argument just used, a general and useful result: If a subset of a metric space $M$ is, for every $\varepsilon > 0$, at $\varepsilon$-distance from a relatively compact subset of $M$ (a different relatively compact set, in general, for each $\varepsilon$), then the set is itself relatively compact. This is a straightforward consequence of Corollary 617. See Exercise 13.397.
linear functionals defined on subspaces of a normed space $(X, \|\cdot\|)$ can be extended to continuous linear functionals defined on the whole of $X$, even preserving the norm of the functional. This result (Theorem 925) depends, ultimately, also on Zorn's Lemma. It is of fundamental importance in infinite-dimensional functional analysis, and it is due to the Austrian mathematician H. Hahn and S. Banach. In the proof, we shall use some terms associated with the concept of a partial order (see Sect. 12.6.1).
Theorem 925 (Hahn–Banach) Let $X$ be a normed space and $Y$ be a linear subspace of $X$. Let $f$ be a continuous linear functional on $Y$. Then there is a continuous linear functional $\tilde{f}$ on $X$ that extends $f$ and satisfies $\|\tilde{f}\| = \|f\|$.
Proof Let $\mathcal{F}$ be the family of all pairs $(Z, g)$, where $Z$ is a linear subspace of $X$ that contains $Y$, and $g : Z \to \mathbb{R}$ is a continuous linear functional such that $\|g\| = \|f\|$ and $g$ extends $f$, i.e., $g(y) = f(y)$ for all $y \in Y$. Such a family is nonempty, since $(Y, f) \in \mathcal{F}$. Partially order $\mathcal{F}$ in the following way: $(Z, g) \le (H, h)$ whenever $Z \subset H$ and $h$ extends $g$. Observe that every chain $\mathcal{C}$ in $\mathcal{F}$ has an upper bound, namely the couple $(L, l)$ consisting of the union $L$ of all subspaces in $\mathcal{C}$ (itself a subspace) and the extension $l$ defined on $L$ in the natural way by using the functions in the couples of the chain. Zorn's Lemma (see Sect. 12.6.3) ensures the existence of a maximal element $(M, \tilde{f})$ in $\mathcal{F}$. We claim that $M = X$ (and so $\tilde{f}$ would be the sought extension). To prove the claim it is enough to show that we can always extend a continuous linear functional from a subspace $H$ to an overspace $E$ of $H$ having one more dimension, keeping the norm.
To prove this last statement, let $(H, g) \in \mathcal{F}$. Let $E := H \oplus \operatorname{span}\{x_0\}$, where $x_0 \in X \setminus H$. An extension $\tilde{g}$ of $g$ to $E$ must be of the form $\tilde{g}(h + \lambda x_0) := g(h) + \lambda\alpha$, for some $\alpha \in \mathbb{R}$ to be determined, where $h \in H$ and $\lambda \in \mathbb{R}$. Without loss of generality we may assume $\|g\| = 1$. We want then $|\tilde{g}(h + \lambda x_0)| \le \|h + \lambda x_0\|$ for every $h \in H$ and $\lambda \in \mathbb{R}$.
By homogeneity, it is enough to prove $|\tilde{g}(h + x_0)| \le \|h + x_0\|$ for $h \in H$, which amounts to proving $\tilde{g}(h + x_0) \le \|h + x_0\|$ and $\tilde{g}(h - x_0) \le \|h - x_0\|$ simultaneously, for all $h \in H$. In other terms, we want to show that $g(h) + \alpha \le \|h + x_0\|$ and $g(h) - \alpha \le \|h - x_0\|$ for all $h \in H$. Rewrite this as
\[ g(h) - \|h - x_0\| \le \alpha \le \|h + x_0\| - g(h) . \tag{11.13} \]
The purpose is to find $\alpha \in \mathbb{R}$ that satisfies (11.13) for all $h \in H$. The existence of such an $\alpha$ will be guaranteed as soon as $\sup\{g(h) - \|h - x_0\| : h \in H\} \le \inf\{\|h + x_0\| - g(h) : h \in H\}$. For this it will be enough to show that $g(h_1) - \|h_1 - x_0\| \le \|h_2 + x_0\| - g(h_2)$ for all $h_1, h_2 \in H$. This is equivalent to proving $g(h_1) + g(h_2) \le \|h_1 - x_0\| + \|h_2 + x_0\|$, and this is certainly true, since we have $g(h_1 + h_2) \le \|h_1 + h_2\| \le \|h_1 - x_0\| + \|h_2 + x_0\|$.
The following result is usually known as the geometrical version of the Hahn–Banach theorem.
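Before turning to that geometrical version, the one-dimensional extension step in the proof of Theorem 925 can be tried out on a toy case. The sketch below is an illustration under assumptions that are not in the text: it takes $X = \mathbb{R}^2$, $H = \operatorname{span}\{(1, 0)\}$, $g(t, 0) = t$ (a functional of norm 1 on $H$ for both norms used) and $x_0 = (0, 1)$, and it evaluates the two sides of (11.13) on a finite grid of points $h = (t, 0)$. For these two norms the supremum and the infimum are attained on the grid, so the grid values are the exact endpoints of the interval of admissible $\alpha$; for other norms a grid would only give an approximation.

```python
from fractions import Fraction

def norm1(v):
    return abs(v[0]) + abs(v[1])

def norm_inf(v):
    return max(abs(v[0]), abs(v[1]))

def alpha_bounds(norm, grid):
    """Grid versions of both sides of (11.13) for h = (t, 0), g(h) = t, x0 = (0, 1)."""
    lower = max(t - norm((t, -1)) for t in grid)   # sup of g(h) - ||h - x0||
    upper = min(norm((t, 1)) - t for t in grid)    # inf of ||h + x0|| - g(h)
    return lower, upper

grid = [Fraction(k, 10) for k in range(-50, 51)]
b1 = alpha_bounds(norm1, grid)
binf = alpha_bounds(norm_inf, grid)
print("l1 norm:   alpha in", b1)
print("sup norm:  alpha in", binf)
```

With the norm $\|\cdot\|_1$ every $\alpha \in [-1, 1]$ gives a norm-preserving extension $\tilde{g}(t, s) = t + \alpha s$, while with $\|\cdot\|_\infty$ only $\alpha = 0$ does, so the extension in Theorem 925 may or may not be unique depending on the norm.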
Fig. 11.10 Theorem 926
Theorem 926 Let $C$ be an open convex set in a normed space $X$ and let $Y$ be a subspace of $X$ disjoint from $C$. Then there is a closed 0-hyperplane $H$ that contains $Y$ and is also disjoint from $C$ (see Fig. 11.10).
Proof Use Zorn's Lemma (see (Z) in Sect. 12.6.3) to show the existence of a maximal subspace $H$ of $X$ disjoint from $C$ and containing $Y$. We shall prove that $H$ is a closed 0-hyperplane. To this end (see Fig. 11.11) put $D := \bigcup_{\lambda > 0} \lambda C$. This is a convex open subset of $X$. Let $H^+ := H + D$ and $H^- := -H^+$, both convex open subsets of $X$. We shall prove the following statements:
(i) $H^+ \cap H^- = \emptyset$
(ii) $X = H \cup H^+ \cup H^-$
From (i) and (ii) it follows that $H$ is a 0-hyperplane. Indeed, arguing by contradiction, assume that the codimension of $H$ in $X$ is greater than 1. We may then find $h^+ \in H^+$ such that $\operatorname{span}(H, h^+) \ne X$ (if such an element can be found in $H^-$ instead, multiply this element by $-1$). Certainly $H^- \not\subset \operatorname{span}(H, h^+)$, so we may find $h^- \in H^-$ such that $h^- \notin \operatorname{span}(H, h^+)$. The segment $[h^-, h^+]$ is connected. Due to the fact that $H^+$ and $H^-$ are both open, there exists $h \in H \cap (h^-, h^+)$, a contradiction with the fact that $h^- \notin \operatorname{span}(H, h^+)$. This shows that $H$ is indeed a 0-hyperplane. That $H$ is closed follows from two facts: first, a subset of $X$ is a 0-hyperplane if and only if it is the kernel of a nonzero linear functional (see the paragraph immediately succeeding the proof of Proposition 900), and second, from this and Exercise 13.548, since $H$ is clearly not dense. It remains then to prove (i) and (ii).
(i) Follows from the fact that $H^+$ is convex: Indeed, if $h^+ \in H^+ \cap H^-$, then $-h^+ \in H^+$, hence $0 \in H^+$, a contradiction with the fact that $H \cap C = \emptyset$.
(ii) If $x \notin H \cup H^+ \cup H^-$, then $\operatorname{span}(H, x)$ contains $H$ properly and is clearly disjoint from $C$, contradicting the maximality of $H$.
11.2 Three Basic Principles of Linear Analysis
Fig. 11.11 The construction in the proof of Theorem 926
Fig. 11.12 The construction in the proof of Corollary 927
Compare the properties of the two convex subsets H+ and H− in the proof of Theorem 926 with those in the situation described in Exercise 13.549. The following is a useful corollary to Theorem 926.
Corollary 927 Let C be a closed convex set in a normed space X. Assume that {xi}_{i=1}^∞ is a sequence in C such that, for some x ∈ X, f(xi) → f(x) for all f ∈ X∗. Then x ∈ C.
Proof Using a translation (which preserves convexity and closedness) we may assume, without loss of generality, that x = 0. Assume that 0 ∉ C. Put δ = dist(0, C) (> 0). Let B be the open ball centered at 0 with radius δ/2. Then B + C is clearly a convex set. Moreover, it is open, as it is the union of the open sets B + c, where c ∈ C. Note that 0 ∉ B + C. Indeed, if 0 = b + c for some b ∈ B and c ∈ C, then c = −b and thus dist(0, C) ≤ dist(0, c) = ‖c‖ = ‖b‖ < δ/2, a contradiction. Apply Theorem 926 to the set B + C and the subspace {0} to obtain a closed 0-hyperplane H such that H ∩ (B + C) = ∅. Recall that H is the kernel of a continuous linear functional f, and that f ≠ 0 since H is proper. Since B + C is convex and f does not vanish on it, f has constant sign there, so without loss of generality we may assume that f(b + c) > 0 for all b ∈ B and c ∈ C. Choose b ∈ B such that f(b) < 0. Then, in particular, f(b + xi) > 0, hence f(xi) > −f(b) (> 0), for all i. This contradicts the fact that f(xi) → f(0) (= 0).
Fig. 11.13 The hyperplane H supports C at c0
Some Consequences of the Hahn–Banach Theorem
1. Supporting Functionals. Differentiability
The main idea here is to extend, in an abstract setting, the concept of a tangent line to the epigraph of a convex function. The epigraph will be replaced by a convex set C in a normed space, and the tangent line by a supporting hyperplane H (see Fig. 11.13), i.e., a hyperplane intersecting C and keeping C “at one side.” Since the main goal is to reproduce the idea behind the concept of a tangent, it is not by chance that differentiability will play a decisive role in this treatment, and will turn out to be a fundamental computational tool. A hyperplane H := {x ∈ X : f(x) = α} in a normed space X, where f ∈ X∗ is a nonzero continuous linear functional and α ∈ R, defines two closed half-spaces H+ := {x ∈ X : f(x) ≥ α} and H− := {x ∈ X : f(x) ≤ α}.
Definition 928 Let C be a nonempty convex subset of a normed space X, and let f ∈ X∗, f ≠ 0. Let c0 ∈ C. The hyperplane H := {x ∈ X : f(x) = f(c0)} is said to support C at c0 if C is contained in one of the two closed half-spaces H+ and H− (see Fig. 11.13). We say in this case that f is a supporting functional of C at c0, or that f supports C at c0.
Remark 929 Observe that a continuous linear functional f ≠ 0 supports C at c0 ∈ C whenever f attains its supremum or infimum on C at c0. ®
We list below some corollaries of the separation Theorem 926.
Corollary 930 Let C be a nonempty open convex subset of a normed space X, and let x0 ∈ X \ C. Then there exists a closed hyperplane H of X such that x0 ∈ H and H ∩ C = ∅. In particular, if x0 ∈ C̄ \ C (where C̄ denotes the closure of C), there exists a closed hyperplane H such that x0 ∈ H, H supports C̄ at x0, and H ∩ C = ∅.
Proof Without loss of generality we may assume x0 = 0. Then it is enough to apply Theorem 926 to C and the subspace {0}.
A metric version of Corollary 930 for the case of the open unit ball of a normed space is given in the following result.
Although it is a consequence of Corollary 930, we provide a proof, this time by using the Hahn–Banach extension result (Theorem 925).
Fig. 11.14 A supporting functional (Corollary 931 and Proposition 933)
Corollary 931 Let X be a normed space and x0 ∈ SX. Then there is f ∈ SX∗ such that f(x0) = 1 (i.e., f is a supporting functional of BX at x0 or, in other terms, the hyperplane {x ∈ X : f(x) = 1} supports BX at x0).
Proof Let Y := span{x0}. Define a function f0 : Y → R as f0(λx0) := λ for all λ ∈ R. Observe that |f0(λx0)| = |λ| = ‖λx0‖ for all λ ∈ R, hence f0 is continuous (see Proposition 900) and, moreover, ‖f0‖ ≤ 1 (see formula (11.7)). Since f0(x0) = 1, we also have ‖f0‖ ≥ 1. This shows that ‖f0‖ = 1. The Hahn–Banach Theorem 925 ensures the existence of a linear continuous extension f : X → R of f0 with ‖f‖ = 1.
Corollary 932 Let (X, ‖·‖) be a normed space. Then ‖x‖ = sup{|f(x)| : f ∈ BX∗} for every x ∈ X. In particular, the mapping j from X into (X∗)∗ given by j(x)(f) := f(x) for all x ∈ X and f ∈ X∗ is a linear isometry into (X∗)∗.
Proof Let x ∈ X be such that x ≠ 0. Apply Corollary 931 to the vector x/‖x‖ to get f ∈ SX∗ such that f(x/‖x‖) = 1. Then f(x) = ‖x‖. Since |g(x)| ≤ ‖x‖ for all g ∈ BX∗, we get ‖x‖ = sup{|g(x)| : g ∈ BX∗}. This shows, in particular, that the mapping j is an isometry. Moreover, it is clearly linear.
The space (X∗)∗ is called the bidual space of X, and is denoted by X∗∗. According to Corollary 932, X can be identified (linearly isometrically) with the subspace j(X) of X∗∗. Note that Corollary 931 says that for every x ∈ SX there exists a continuous linear functional f ∈ SX∗ that attains its supremum on BX at x. In general, this functional is not unique (consider, for example, x to be a corner of the closed unit ball of (ℓ^2_∞, ‖·‖_∞); see Example 943 or, alternatively, (a) in Fig. 11.17 below). See also Remark 940.4, where uniqueness of the supporting functional of BX at a point of SX is related to differentiability of the norm at this point. The concept of a supporting functional is an important one.
It allows replacing, in many cases, the boundary of the ball by the hyperplane f⁻¹(1) (also called a tangent hyperplane), which is an affine set that is easier to deal with. For example, we have the following statement.
Proposition 933 For a normed space X, let x ∈ SX be a point and f be a supporting functional to BX at x. If y ∈ X is such that f(y) = 0, then ‖x + y‖ ≥ ‖x‖ (= 1) (see Fig. 11.14).
Proof Use (11.10) to get ‖x + y‖ ≥ |f(x + y)| = |f(x)| = 1.
The search for supporting functionals is the subject of many topics in geometry.
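As a numerical illustration of Proposition 933, the following Python sketch works in the two-dimensional space (ℓ^2_4, ‖·‖_4). It assumes the standard closed formula for a norming functional of a point of ℓp with 1 < p < ∞, which is not derived in the text; the point (0.6, 0.58) is the one used in Example 944, and all function and variable names are illustrative only.

```python
def lp_norm(v, p):
    """The l_p norm of a finite sequence v, for 1 <= p < infinity."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def norming_functional(x, p):
    """Coefficients of a functional f with dual norm 1 and f(x) = ||x||_p.

    Assumed formula (not taken from the text):
    a_i = sign(x_i) |x_i|^(p-1) / ||x||_p^(p-1).
    """
    norm = lp_norm(x, p)
    sign = lambda c: (c > 0) - (c < 0)
    return [sign(c) * abs(c) ** (p - 1) / norm ** (p - 1) for c in x]


def apply(f, v):
    """Evaluate the functional with coefficient vector f at the point v."""
    return sum(a * c for a, c in zip(f, v))


p = 4.0
q = p / (p - 1)                          # conjugate exponent, here 4/3
x0 = (0.6, 0.58)
x = [c / lp_norm(x0, p) for c in x0]     # a point of the unit sphere
f = norming_functional(x, p)             # supporting functional of the ball at x

# In dimension 2 every y with f(y) = 0 is a multiple of (-f[1], f[0]).
kernel_direction = (-f[1], f[0])
smallest = min(
    lp_norm([x[i] + s * kernel_direction[i] for i in range(2)], p)
    for s in [k / 100.0 for k in range(-300, 301)]
)
# Proposition 933 predicts smallest >= 1 (up to rounding).
```

The sampled minimum of ‖x + y‖_4 over the kernel of f is attained at y = 0, in agreement with the proposition; the sampling range and step are arbitrary choices.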
Remark 934 A natural question regards the possibility to “dualize” Corollary 931: precisely, it asks whether, if (X, ‖·‖) is a Banach space, every f ∈ SX∗ has a support point in BX or, in other terms, whether every f ∈ SX∗ attains its supremum on BX. It is clear that, if a point x in BX satisfies f(x) = 1, then ‖x‖ = 1 (a geometric formulation of this fact in terms of the distance from a point to a hyperplane is given in Exercise 13.551). We shall see below (see Proposition 990) that in the context of Hilbert spaces the result holds true. However, this is no longer true for general Banach spaces, as we showed in Example 922. There are two deep results in this direction, the first one due to the American mathematicians E. A. Bishop and R. Phelps, and the second to the American mathematician R. C. James:
(i) [Bishop–Phelps] Let (X, ‖·‖) be a Banach space. Then the set of all linear functionals f ∈ X∗ that attain their supremum on BX is ‖·‖-dense in X∗, and
(ii) [James] A Banach space X has the property that for every continuous linear functional F : X∗ → R there is x ∈ X such that F(f) = f(x) for all f ∈ X∗ if and only if every f ∈ X∗ attains its supremum on BX (footnote 2).
The proofs of these two results are beyond the scope of this introduction. For those and related results see, e.g., ([FHHMZ11], Theorem 7.41) and ([FHHMZ11], Theorem 3.130), respectively. ®
Regarding Remark 934, note that in general we cannot expect a continuous linear functional f on a normed space X, even if it attains its norm on the closed unit ball BX, to do so at a single point x ∈ SX (look at the closed unit ball of (ℓ^2_∞, ‖·‖_∞) and the functional f(x, y) = x for (x, y) ∈ R2). The reader will understand that it would be useful to be able to “renorm” a normed space (i.e., to find an equivalent norm there) in such a way that the new norm enjoys some desirable features. This is an active area of research in the whole theory of Banach spaces.
A simple example in this direction is presented in the next result.
Footnote 2 Spaces with the aforementioned property, namely that for every continuous linear functional F : X∗ → R there is x ∈ X such that F(f) = f(x) for all f ∈ X∗, are called reflexive. In other terms, a Banach space X is reflexive precisely when the mapping j defined in Corollary 932 maps X onto X∗∗. We shall see later (see Proposition 990) that, e.g., every Hilbert space is reflexive.
Proposition 935 Let (X, ‖·‖) be a separable Banach space. Then there exists an equivalent norm |·| on X such that its closed unit ball is strictly convex.
Proof By Exercise 13.573, we can find a linear, continuous, and one-to-one operator T from X into ℓ2. Define a new norm |·| on X by |x| := ‖x‖ + ‖Tx‖_2, for x ∈ X. It is simple to prove that |·| is an equivalent norm on X. Indeed, ‖x‖ ≤ |x| ≤ 2‖x‖ for all x ∈ X, and all other properties of a norm are seen to hold. We claim that if 1 = |x| = |y| = |(x + y)/2| then ‖Tx‖_2 = ‖Ty‖_2 = ‖(1/2)(Tx + Ty)‖_2. Indeed, under these circumstances,
‖(x + y)/2‖ + ‖(Tx + Ty)/2‖_2 = (‖x‖ + ‖y‖)/2 + (‖Tx‖_2 + ‖Ty‖_2)/2.
Since ‖x + y‖ ≤ ‖x‖ + ‖y‖ and ‖Tx + Ty‖_2 ≤ ‖Tx‖_2 + ‖Ty‖_2, we get, in particular,
‖Tx‖_2 + ‖Ty‖_2 = ‖Tx + Ty‖_2. (11.14)
The parallelogram equality ‖Tx + Ty‖_2^2 + ‖Tx − Ty‖_2^2 = 2‖Tx‖_2^2 + 2‖Ty‖_2^2, together with Eq. (11.14), gives 2‖Tx‖_2·‖Ty‖_2 = ‖Tx‖_2^2 + ‖Ty‖_2^2, hence (‖Tx‖_2 − ‖Ty‖_2)^2 = 0, and so ‖Tx‖_2 = ‖Ty‖_2. This, together with the fact that the closed unit ball of the space (ℓ2, ‖·‖_2) is strictly convex (see Remark 960), shows that Tx = Ty, hence x = y. This proves that |·| is strictly convex.
A very efficient way to construct supporting functionals is to use differentiation techniques (the kind of results we are looking for appears, for example, in Remarks 940.2, 940.4, and 940.5). Below, the term “Gâteaux differentiability” comes from the name of the French mathematician R. Gâteaux, and the term “Fréchet differentiability” from the name of the French mathematician M. Fréchet.
Definition 936 Let X be a Banach space and f be a real-valued function on X. We say that f is Gâteaux differentiable at x0 ∈ X if there is a continuous linear functional l ∈ X∗ (called the derivative of f at x0) such that
lim_{t→0} (f(x0 + th) − f(x0))/t = l(h) (11.15)
for all h ∈ X. The limit in (11.15), when it exists, is called the directional derivative of f at x0 in the direction h. It is also denoted by Dh f(x0). If the limit in (11.15) is uniform in h ∈ BX, we say that f is Fréchet differentiable at x0. The derivative l of f at x0 is denoted in this case by f′(x0).
Remark 937 For real-valued functions defined on R, the concepts of being Gâteaux differentiable, Fréchet differentiable or, simply, differentiable according to Definition 351, at a point x0 ∈ R, coincide. Indeed, formula (4.14) shows that, if f is differentiable at x0, there exists a linear mapping L : R → R, given by L(h) := f′(x0)h for h ∈ R, and a function u : R → R with lim_{h→0} u(h) = 0, such that
f(x0 + th) = f(x0) + L(th) + t·h·u(th), for h ∈ R and t ∈ R, t ≠ 0.
Thus,
|(f(x0 + th) − f(x0))/t − L(h)| = |h·u(th)|, for h ∈ R and t ∈ R, t ≠ 0.
If ε > 0, we can find δ > 0 such that |h·u(th)| < ε for 0 < |t| < δ and |h| ≤ 1, hence
lim_{t→0} |(f(x0 + th) − f(x0))/t − L(h)| = 0, for h ∈ R,
and the convergence is uniform on h ∈ [−1, 1]. Thus, f is Fréchet (and then Gâteaux) differentiable at x0, and L defined by L(h) := hf′(x0) for h ∈ R is its Fréchet (Gâteaux) derivative at x0. Similarly, we can prove the remaining implication. ®
Fig. 11.15 The graph of the function (11.16) on [−1, 1] × [−1, 1] (Example 938)
Example 938 Let the real-valued function f be defined on R2 by
f(x, y) := xy/‖(x, y)‖_2 if (x, y) ≠ (0, 0), and f(x, y) := 0 if (x, y) = (0, 0). (11.16)
Observe that the function f is continuous at the origin: Indeed, if x = r cos θ and y = r sin θ, where r = ‖(x, y)‖_2, we obtain f(x, y) = r cos θ sin θ, and so f(x, y) → 0 as r → 0. However, the function f is not Gâteaux differentiable at the origin (see Fig. 11.15 for a plot of its graph). Indeed, the only possible linear functional in Definition 936 must be the zero functional, due to the fact that both partial derivatives (i.e., directional derivatives Dei f(0) in the direction of the basis unit vectors e1 and e2) at the origin vanish. Then the limit to consider in (11.15) is, for x0 := (0, 0) and h := (x, y) ≠ (0, 0),
lim_{t→0} (|t|/t)·xy/‖(x, y)‖_2.
This limit clearly does not exist if x ≠ 0 and y ≠ 0.
♦
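The one-sided behaviour of the difference quotient in Example 938 can be checked numerically. The following Python sketch (function names are illustrative, not taken from the text) evaluates the quotient in (11.15) at the origin for the function (11.16), once for positive and once for negative values of t.

```python
import math


def f(x, y):
    """The function (11.16): xy / ||(x, y)||_2 away from the origin, 0 at it."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / math.hypot(x, y)


def quotient(h, t):
    """Difference quotient (f(0 + t h) - f(0)) / t from (11.15)."""
    return (f(t * h[0], t * h[1]) - f(0.0, 0.0)) / t


h = (1.0, 1.0)
right = [quotient(h, t) for t in (1e-1, 1e-3, 1e-6)]    # t decreasing to 0
left = [quotient(h, -t) for t in (1e-1, 1e-3, 1e-6)]    # t increasing to 0
axis = [quotient((1.0, 0.0), t) for t in (1e-1, -1e-1)]  # a coordinate direction
```

For h = (1, 1) the quotients from the right stay near xy/‖(x, y)‖_2 = 1/√2 and those from the left near −1/√2, so the two-sided limit does not exist, while along a coordinate axis the quotient vanishes identically.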
Example 939 Even in finite-dimensional Banach spaces, a function can be Gâteaux differentiable at a point and discontinuous there (hence not Fréchet differentiable at this point, since Fréchet differentiability clearly implies continuity). For example, let us consider the following function from R2 into R (for a plot of the graph of the function see Fig. 11.16):
Fig. 11.16 The graph of the function (11.17) on [−1, 1] × [−1, 1] (Example 939)
f(x, y) := x^4 y/(x^6 + y^3) if (x, y) ≠ (0, 0), and f(x, y) := 0 if (x, y) = (0, 0). (11.17)
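The two properties of the function (11.17) that are established in the next paragraph can be previewed numerically. The following Python sketch is only a sanity check under the stated sampling choices (directions with positive second coordinate and positive t, so that the denominator x^6 + y^3 stays positive); the names are illustrative.

```python
def f(x, y):
    """The function (11.17); the origin is handled separately."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** 4 * y / (x ** 6 + y ** 3)


def quotient(v, t):
    """Difference quotient (f(t v) - f(0, 0)) / t at the origin."""
    return (f(t * v[0], t * v[1]) - f(0.0, 0.0)) / t


# Directions (a, b) with b > 0 and t > 0 keep the denominator positive.
directions = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
quotients = {v: [quotient(v, t) for t in (1e-2, 1e-4, 1e-6)] for v in directions}

# Values of f along the parabola y = x^2, approaching the origin.
along_parabola = [f(x, x * x) for x in (1e-1, 1e-2, 1e-3)]
```

The sampled quotients shrink towards 0 as t decreases, consistent with a zero Gâteaux derivative, while the values along y = x^2 stay at 1/2 and therefore do not tend to f(0, 0) = 0.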
Observe, first, that f has all directional derivatives Dv f(0, 0) at (0, 0) (see Definition 936). Indeed, if v := (a, b) and a = 0 or b = 0, then clearly Dv f(0, 0) = 0. If a ≠ 0 and b ≠ 0, we easily get Dv f(0, 0) = lim_{t→0} t a^4 b/(t^3 a^6 + b^3) = 0. This shows that the continuous linear mapping l = 0 satisfies (11.15) and so f is Gâteaux differentiable at (0, 0). To see that f is discontinuous at (0, 0) it is enough to approach (0, 0) along the curve y = x^2. Indeed, f(x, x^2) = 1/2 if x ≠ 0, while f(0, 0) = 0.
This is in contrast with the fact that for a continuous convex function on a finite-dimensional Banach space, Gâteaux differentiability at some point implies Fréchet differentiability there. This is a consequence of the local Lipschitz property of continuous convex functions (see Proposition 813 for the case of dimension 1) and the compactness property of the closed unit ball of a finite-dimensional Banach space (Theorem 147 and (v) in Corollary 910). Precisely, if X is a finite-dimensional Banach space, the Gâteaux differentiability of a continuous convex function f : X → R at a point x0 ∈ X ensures the existence of l ∈ X∗ such that the limit (11.15) exists for all h ∈ X. In particular, given ε > 0 and h ∈ BX there exists δ(ε, h) > 0 such that
|(f(x0 + th) − f(x0))/t − l(h)| < ε, for all t such that 0 < |t| < δ(ε, h).
Moreover, f is locally Lipschitz at x0 (Proposition 813), i.e., there exist K > 0 and δ > 0 such that |f(x) − f(y)| < K‖x − y‖ for all x, y ∈ X with ‖x − x0‖ < δ and ‖y − x0‖ < δ. By (v) in Corollary 910, BX is compact, hence BX is totally bounded (see Theorem 620). Thus there exists a finite ε-net {hi}_{i=1}^m in BX (see Definition 609). Put δ0 := min{δ, δ(ε, hi) : i = 1, 2, . . . , m} and take t such that 0 < |t| < δ0. Given h ∈ BX there exists i ∈ {1, 2, . . . , m} such that ‖h − hi‖ < ε. Then we have
|(f(x0 + th) − f(x0))/t − l(h)|
= |(f(x0 + thi) − f(x0))/t − l(hi) + (f(x0 + th) − f(x0 + thi))/t + l(hi) − l(h)|
≤ |(f(x0 + thi) − f(x0))/t − l(hi)| + |(f(x0 + th) − f(x0 + thi))/t| + |l(hi) − l(h)|
< ε + Kε + ‖l‖ε.
[...]
(‖0 + th‖ − ‖0‖)/t = ‖h‖ if t > 0, and (‖0 + th‖ − ‖0‖)/t = −‖h‖ if t < 0.
This shows that the limit in (11.15) does not exist.
2. If the norm of a Banach space X is Gâteaux differentiable at x0 ∈ SX with derivative l, then l is a supporting functional to BX at the point x0. Indeed, for every h ∈ BX, we have |‖x0 + th‖ − ‖x0‖|/|t| ≤ 1 (as the norm is a 1-Lipschitz function) and thus, by taking the limit when t → 0, we get |l(h)| ≤ 1 for all ‖h‖ ≤ 1. By inspection l(x0) = 1. Thus ‖l‖ = 1 and l(x0) = 1, which means that l is a supporting functional at x0.
3. Note that if the norm ‖·‖ of a Banach space is Gâteaux (Fréchet) differentiable at some x ≠ 0, with derivative l, then it is Gâteaux (Fréchet) differentiable at λx with derivative sign(λ)l, for any λ ≠ 0. Indeed, for t ≠ 0,
(‖λx + th‖ − ‖λx‖)/t = (|λ|/λ)·(‖x + (t/λ)h‖ − ‖x‖)/(t/λ),
and the result follows by taking limits when t → 0. In particular, the derivative of the norm, whenever it exists, at a nonzero point in X is always in SX∗.
Fig. 11.17 Balls of a (a) non-Gâteaux (b) Gâteaux differentiable norm at x0 (Remark 940.4)
4. By using the Hahn–Banach theorem we shall prove that the norm of a Banach space is Gâteaux differentiable at a point x0 ∈ SX if and only if BX has a unique supporting functional at x0 (see Fig. 11.17; and then, by what was said in the previous paragraph, this unique supporting functional is the derivative of ‖·‖ at x0). Indeed, assume first that ‖·‖ is Gâteaux differentiable at x0 ∈ SX. Let f ∈ SX∗ be a supporting functional at x0. Given y ∈ X such that f(y) = 0 we have, by Proposition 933, that ‖x0 + ty‖ ≥ ‖x0‖ for all t ∈ R. In particular, this shows that
(‖x0 + ty‖ − ‖x0‖)/t ≥ 0 for all t > 0, and (‖x0 + ty‖ − ‖x0‖)/t ≤ 0 for all t < 0. (11.18)
Since ‖·‖ is Gâteaux differentiable at x0 (with Gâteaux derivative l) we get, by letting t → 0 in (11.18), that l(y) = 0, hence ker f ⊂ ker l (≠ X). Moreover, f(x0) = l(x0) = 1. Thus, f = l. Conversely, assume that ‖·‖ is not Gâteaux differentiable at some x0 ∈ SX. Therefore, we can find h ∈ X, h ≠ 0 (and so, by normalizing, ‖h‖ = 1), such that
lim_{t↑0} (‖x0 + th‖ − ‖x0‖)/t = α < β = lim_{t↓0} (‖x0 + th‖ − ‖x0‖)/t.
Observe that the function t → (‖x0 + th‖ − ‖x0‖)/t is monotonically increasing in R \ {0} (see Proposition 810), and that
(‖x0 + th‖ − ‖x0‖)/t ≤ ‖x0 + th − x0‖/t = ‖h‖ (= 1) if t > 0, and (‖x0 + th‖ − ‖x0‖)/t ≥ ‖th‖/t = |t|·‖h‖/t = −‖h‖ (= −1) if t < 0. (11.19)
Define two linear functionals f and g on span{x0, h} by f(sx0 + th) := s + αt and g(sx0 + th) := s + βt for all s, t ∈ R. Since −1 ≤ α < β ≤ 1 and f(x0) = g(x0) = 1, we get, using the monotonicity observed above, that ‖f‖ = ‖g‖ = 1 on span{x0, h}. Extend those functionals, by the Hahn–Banach Theorem 925, to the whole of X by keeping the norm to get two different functionals supporting BX at x0.
5. The norm ‖·‖ of a Banach space is Fréchet differentiable at x0 ∈ SX if and only if the following property holds: There exists l ∈ SX∗ such that ‖fn − l‖ → 0 for any sequence {fn}_{n=1}^∞ in BX∗ with fn(x0) → 1 (this is a version of a lemma due to the Russian mathematician J. L. Šmulyan). Observe that it is equivalent to say that the diameter of the set {f ∈ BX∗ : f(x0) > 1 − δ} tends to 0 as δ → 0. Note, too, that if this is the case, the functional l is the Fréchet derivative of ‖·‖ at x0 (see Fig. 11.18).
Fig. 11.18 The norm is Fréchet differentiable at x0 (Remark 940.5)
To prove this, assume first that ‖·‖ is Fréchet differentiable at x0. Let l be the Fréchet derivative of ‖·‖ at x0. Given ε > 0 there exists r > 0 such that
|(‖x0 + th‖ − ‖x0‖)/t − l(h)| < ε for 0 < |t| ≤ r and h ∈ BX.
Put δ := rε and fix f ∈ BX∗ such that f(x0) > 1 − δ. Then, for t = r and h ∈ BX,
(1 − δ + t·f(h) − 1)/t − l(h) < (f(x0 + th) − 1)/t − l(h) ≤ (‖x0 + th‖ − ‖x0‖)/t − l(h) < ε,
hence (f − l)(h) < ε + δ/t = ε + δ/r = ε + ε = 2ε. Since this is true for all h ∈ BX, we get ‖f − l‖ ≤ 2ε.
Conversely, assume that the condition on sequences does not hold. Fix l ∈ BX∗ such that l(x0) = 1 (see Corollary 931). We may then find ε > 0 and a sequence {fn} in BX∗ such that fn(x0) → 1 and ‖fn − l‖ > ε for all n ∈ N. Find, for n ∈ N, an element hn ∈ BX such that (fn − l)(hn) > ε. Fix r > 0. Put δ := min{r, ε/2}. We can find n ∈ N such that 1 − fn(x0) < δ^2. Take (0 <) t := δ (≤ r). Then
(‖x0 + thn‖ − ‖x0‖)/t − l(hn) ≥ (fn(x0) + t fn(hn) − 1)/t − l(hn) = (fn − l)(hn) − (1 − fn(x0))/t > ε − δ^2/δ ≥ ε − ε/2 = ε/2.
This shows that ‖·‖ is not Fréchet differentiable at x0.
®
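Remark 940.4 can be illustrated in (R2, ‖·‖_∞), whose unit ball is the square of Example 943. The following Python sketch (names and the step size 1e-6 are illustrative assumptions) approximates the one-sided limits α and β of the difference quotient of the norm at the corner (1, 1) and at the edge point (0, 1), and evaluates at h a few of the supporting functionals f(x, y) = ax + (1 − a)y listed in Example 943.

```python
def sup_norm(v):
    """The sup norm of a finite sequence."""
    return max(abs(c) for c in v)


def quotient(x0, h, t):
    """Difference quotient (||x0 + t h|| - ||x0||) / t for the sup norm."""
    moved = [a + t * b for a, b in zip(x0, h)]
    return (sup_norm(moved) - sup_norm(x0)) / t


def one_sided(x0, h, t=1e-6):
    """Approximations (alpha, beta) of the left and right limits at t = 0."""
    return quotient(x0, h, -t), quotient(x0, h, t)


corner, edge = (1.0, 1.0), (0.0, 1.0)
h = (1.0, -1.0)
alpha, beta = one_sided(corner, h)        # different: no Gateaux derivative
flat = one_sided(edge, (1.0, 0.0))        # equal: derivative 0 in this direction

# Functionals (a, 1 - a), 0 <= a <= 1, have dual norm |a| + |1 - a| = 1
# and take the value 1 at the corner, so they all support the ball there.
supports = [(a, 1.0 - a) for a in (0.0, 0.5, 1.0)]
values_at_h = [a * h[0] + b * h[1] for a, b in supports]
```

At the corner the two one-sided limits are −1 and 1, and the values f(h) of the supporting functionals fill the interval between them; at the edge point both one-sided limits agree.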
Corollary 941 If the norm of a Banach space X is Fréchet differentiable at every point x ≠ 0, it is automatically C^1-smooth, i.e., the mapping that to x ∈ X \ {0} associates the Fréchet derivative f′(x) ∈ X∗ of the norm f := ‖·‖ is continuous.
Proof It suffices to show, due to positive homogeneity, that if {xn}_{n=1}^∞ is a sequence in SX that converges to x ∈ SX, then the derivatives f′(xn) form a sequence that converges to f′(x). This follows from Šmulyan’s result in Remark 940.5. Indeed, f′(xn)(x) = f′(xn)(xn) + f′(xn)(x − xn) ≥ 1 − ‖x − xn‖ → 1 as n → ∞.
Fig. 11.19 Two supporting functionals to the closed unit ball of ℓ^2_∞ (at (0, 1) and (1, 1)) (Example 943)
Example 942 The standard norm ‖·‖_1 of the Banach space ℓ1 (see Example 565.15 and Fig. 11.3 for its unit ball in the three-dimensional case) is Fréchet differentiable nowhere. However, this norm is Gâteaux differentiable precisely at all points in {x = (xn) ∈ ℓ1 : xn ≠ 0 for all n ∈ N}. Indeed, by a version of Proposition 817, if the norm ‖·‖_1 of ℓ1 was Fréchet differentiable at some x in the unit sphere of ℓ1 (note that it is enough to check differentiability at points in the unit sphere, see Remark 940.1), we would have
lim_{‖h‖_1→0} (‖x + h‖_1 + ‖x − h‖_1 − 2)/‖h‖_1 = 0.
However, this is not true: Given ε > 0, get i ∈ N such that |xi| < ε/2. Put h = εei, where ei is the ith standard unit vector in ℓ1. Then a direct calculation shows that ‖x + h‖_1 + ‖x − h‖_1 − 2 = 2ε − 2|xi| > ε, so the quotient above is greater than 1. Another approach uses the characterization given in Remark 940.5: The dual space of (ℓ1, ‖·‖_1) is (ℓ∞, ‖·‖_∞) (see Exercise 13.584). Given x in the unit sphere of ℓ1, the sequence {sn}_{n=1}^∞ in the unit sphere of ℓ∞, where for each n ∈ N we put sn := Σ_{i=1}^n sign(xi)ei, satisfies sn(x) → 1. However, ‖sn − sm‖_∞ = 1 for all n ≠ m, hence {sn} does not converge in (ℓ∞, ‖·‖_∞). Note that Exercise 13.600 gives a more precise result. For the statement about Gâteaux differentiability see Exercise 13.599. ♦
Example 943 Let X be the Banach space ℓ^2_∞. We shall find a supporting functional to BX at the points (0, 1) and (1, 1) (see Fig. 11.19). In order to do this, we shall search for a supporting functional f(x, y) = ax + by for some a, b ∈ R. Then
‖f‖ = sup{|f(x, y)| : (x, y) ∈ BX} = sup{|ax + by| : |x| ≤ 1, |y| ≤ 1} = |a| + |b|.
It is standard to find that for the first point the only solution is a = 0, b = 1, and that for the second one any a ∈ [0, 1] with b = 1 − a works. ♦
Example 944 Let X be the Banach space (ℓ^2_4, ‖·‖_4) (see Example 565.15). Then (X∗, ‖·‖∗) = (ℓ^2_{4/3}, ‖·‖_{4/3}). Indeed, it can be proved that the dual space of ℓp for p > 1 is ℓq, where p^{-1} + q^{-1} = 1 (see, e.g., [FHHMZ11], Proposition 2.17). See Fig. 11.20 for the shape of the closed unit balls in X and X∗. Let us find, given (x0, y0) ∈ X, (x0, y0) ≠ (0, 0), the (unique) element (a, b) ∈ SX∗ where (x0, y0) attains its norm, i.e., its supremum on BX∗. This amounts to solving the nonlinear problem in the variables a and b
maximize F(a, b) := a x0 + b y0 subject to G(a, b) := a^{4/3} + b^{4/3} − 1 = 0. (11.20)
For this problem we can use the Lagrange multiplier method (see, e.g., [Edw95]). Observe that ∇F(a, b) = (x0, y0), while ∇G(a, b) = (4/3)(a^{1/3}, b^{1/3}). We know that, if (a, b) is a solution to the problem (11.20), then ∇F(a, b) = λ∇G(a, b) for some λ ∈ R. Moreover, a^{4/3} + b^{4/3} = 1. This shows, easily, that for (x0, y0) := (0.6, 0.58) we have, with an error less than 10^{-4}, (a, b) = (0.6245, 0.5641). Let us now find in the space ℓ^2_4 the farthest point in the Euclidean sphere to the point (x0, y0). This is again a nonlinear problem in the variables x and y, this time
maximize F(x, y) := (x − x0)^{4/3} + (y − y0)^{4/3} subject to G(x, y) := x^2 + y^2 − 1 = 0. (11.21)
By the Lagrange multiplier method we are driven to solve ∇F(x, y) = λ∇G(x, y), x^2 + y^2 = 1. We get, with an error less than 10^{-4}, (x, y) = (−0.70932, −0.70488). ♦
Fig. 11.20 In bold, the closed unit ball in ℓ^2_4 (left) and in ℓ^2_{4/3} (right), the starting point x0 and the computed point (a, b) (Example 944). In the picture, dual balls share the same dash-style
Remark 945 The differentiability theory that has been developed in Subsect. 4.1.1 and Sects. 4.2, 4.3, and 4.5 can be extended, with some restrictions, to functions of several variables or, more generally, to functions between infinite-dimensional normed spaces. Example 939 exhibits a real-valued function on a two-dimensional Euclidean space that is Gâteaux differentiable but not continuous at some point. Another example of the failure of some basic results when passing to more than one dimension, even for functions of a single real variable, is the function f : [0, 2π] → R3 given by f(t) := (cos t, sin t, t) (functions from an interval [a, b] into Rn or, more generally, into a normed space, are called paths and, if t is understood as “time,” they can be seen dynamically as the movement of a point in the space along the line f[a, b]). Figure 11.21 depicts the line for this particular example, precisely a circular helix.
The point starts at (1, 0, 0) and ends at (1, 0, 2π). Observe that f′(t) = (−sin t, cos t, 1), and that there is no t ∈ [0, 2π] such that 2π f′(t) = f(2π) − f(0), since the first two coordinates would require sin t = 0 and cos t = 0 simultaneously. Thus, Rolle’s Theorem 364 has no literal translation to vector-valued functions. ®
2. Solving Linear Equations
We list two other consequences of Theorem 925.
Fig. 11.21 The helix in Remark 945
Fig. 11.22 Two projections P and Q onto Y, with ‖P‖ = 1 and ‖Q‖ > 1 (in gray, the image of BX)
Corollary 946 Let X be a normed space, let x1, x2, . . . , xn be linearly independent elements of X, and let α1, α2, . . . , αn be real numbers. Then there is a bounded linear functional f on X such that f(xi) = αi for all i = 1, 2, . . . , n.
Proof Let Y := span{x1, x2, . . . , xn}. The function g : Y → R given by g(y) := Σ_{i=1}^n ai αi, where y = Σ_{i=1}^n ai xi ∈ Y, is a linear mapping on Y. It is continuous due to Corollary 910, and so it has a continuous linear extension f : X → R by the Hahn–Banach Theorem 925. The function f satisfies the condition in the statement.
Corollary 947 If X is a normed space and x1, x2 ∈ X with x1 ≠ x2, then there is f ∈ X∗ such that f(x1) ≠ f(x2).
Proof If one of the two vectors x1 and x2, say x2, is 0, then apply Corollary 946 to the system {x1} and to the scalar α1 = 1. If x1 and x2 are linearly independent, apply Corollary 946 to the system {x1, x2} and to two scalars α1 ≠ α2. If both are nonzero and linearly dependent, so that x2 = λx1 for some λ ≠ 1, apply Corollary 946 to the system {x1} and to the scalar α1 = 1.
Remark 948 The property of X∗ exhibited in Corollary 947 is referred to by saying that X∗ separates points of X. The set {en : n ∈ N} of canonical unit vectors in ℓ2 is an example of a proper subset of the dual of a Banach space X that separates points of X. In every finite-dimensional Banach space X, an algebraic basis of its dual space X∗ is a proper subset of X∗ that separates points of X. ®
3. Projections
Definition 949 If X is a normed space and Y is a closed subspace of X, then a bounded linear mapping P from X onto Y is called a projection if Py = y for every y ∈ Y. If such a projection exists, then Y is called a complemented subspace in X.
Note that, from the definition, ‖P‖ ≥ 1 for any projection P onto a nonzero subspace (see Fig. 11.22). Observe that the problem of extending a continuous linear functional from a subspace Y of a normed space X to the whole space, solved by the Hahn–Banach Theorem 925, has a trivial solution if the subspace Y is complemented. Indeed, if P : X → X is a continuous linear projection with range Y, and f : Y → R is a continuous linear mapping, the mapping f ∘ P is the sought extension. However, we cannot ensure that f ∘ P has the same norm as f unless ‖P‖ = 1. Note that if Y is a closed subspace of a Banach space X, saying that Y is complemented in X is equivalent to saying that the identity map on Y can be extended to a bounded linear operator P from X into Y (the extension P is indeed a projection). If this happens, we mentioned above that every bounded linear functional on Y can be extended to a bounded linear functional on X. More generally, the complementability of Y in X
implies that every bounded linear operator T from Y into a Banach space Z can be extended to a bounded linear operator T̃ from X into Z (namely T̃ = T ∘ P).
Another situation in this direction is the following consequence of the Hahn–Banach Theorem 925, namely that every one-dimensional subspace of a Banach space is complemented by a projection of norm 1 (see Fig. 11.23). Indeed, if x0 ∈ SX is given, let f ∈ SX∗ be a supporting functional of BX at x0 (its existence is guaranteed by Corollary 931). Define an operator P from X onto span{x0} by P(x) = f(x)x0 for x ∈ X. Then ‖P‖ = sup{|f(x)|·‖x0‖ : x ∈ BX} = 1, and P is a projection since P(λx0) = f(λx0)x0 = λx0 for all real numbers λ. An extension of this result to n dimensions (see Corollary 950) can be obtained by an application of Auerbach’s Theorem 914. However, in the n-dimensional case, n > 1, the norm of the projection cannot, in general, be forced to be 1. For a result where a bound for the norm of a projection onto an n-dimensional subspace is established, see Corollary 950 below.
Fig. 11.23 A projection of norm 1 onto the one-dimensional subspace span{x0}
Another situation in which the existence of a projection with an upper bound for the norm is guaranteed appears in the following observation: If Y is a hyperplane in a Banach space X of the form Y = f⁻¹(0) for some f ∈ SX∗, then for every ε > 0 there is a projection from X onto Y of norm at most 2 + ε. Indeed (see Fig. 11.24), given ε > 0, choose e ∈ SX such that f(e) > 1/(1 + ε). Put, for x ∈ X, Px := x − (f(x)/f(e)) e. If x ∈ X then f(Px) = f(x) − f(x) = 0, so P maps X into Y. If f(x) = 0, then Px = x, so in fact P is a projection onto Y. Finally, if x ∈ BX, then ‖Px‖ ≤ 1 + (1 + ε), so ‖P‖ ≤ 2 + ε.
Fig. 11.24 The construction of a projection of norm almost 2 onto f⁻¹(0)
Hilbert spaces will be treated in Sect. 11.4. The problem of complementability has a nice solution in the context of Hilbert spaces: Every closed subspace of a Hilbert space is complemented by a norm-one projection. This will be precisely
stated in Proposition 1002 below. A deep statement of the Israeli mathematicians J. Lindenstrauss and L. Tzafriri yields, in fact, a characterization of the Banach spaces that are linearly isomorphic to Hilbert spaces: If X is a Banach space that is not linearly isomorphic to a Hilbert space (for example C[0, 1], c0 or ℓp for p ≠ 2; see Remark 918.5 and Exercise 13.606) then X contains an uncomplemented subspace. For a concrete example note that c0 is not complemented in ℓ∞ (a result due to the American mathematician A. Sobczyk; see, e.g., [FHHMZ11], Theorem 5.6).
Projections can be used in estimating distances from complemented subspaces. This is the subject of the following statement and the argument in its proof: If P is a projection from a Banach space X onto its subspace PX, then for every x ∈ X we have dist(x, PX) ≤ ‖x − Px‖ ≤ (1 + ‖P‖) dist(x, PX). Indeed, for every y ∈ PX, we have dist(x, PX) ≤ ‖x − Px‖ = ‖x − y + P(y − x)‖ ≤ (1 + ‖P‖)‖y − x‖. Hence, dist(x, PX) ≤ ‖x − Px‖ ≤ (1 + ‖P‖) inf{‖x − y‖ : y ∈ PX} = (1 + ‖P‖) dist(x, PX).
The following corollary extends to a subspace of dimension n the fact mentioned above that every one-dimensional subspace of a Banach space X is complemented in X. It gives, too, a bound for the norm of a particular projection. We shall see later (Proposition 1002) that the result in Hilbert spaces can be substantially improved.
Corollary 950 Let Y be a subspace of a Banach space X. If dim(Y) = n then there exists a projection P of X onto Y such that ‖P‖ ≤ n.
Proof Let {ei; fi}_{i=1}^n be an Auerbach basis of Y (see Theorem 914). By the Hahn–Banach Theorem 925, we extend each fi to a norm-one functional on X. Then we define an operator P : X → Y by P(x) = Σ_{i=1}^n fi(x)ei for x ∈ X. We claim that P is a projection onto Y. Indeed, for every y = Σ_{i=1}^n αi ei ∈ Y we have αi = fi(y). Therefore P(y) = Σ_{i=1}^n fi(y)ei = y. Finally, if x ∈ X and ‖x‖ ≤ 1, then ‖P(x)‖ ≤ Σ_{i=1}^n |fi(x)| ‖ei‖ ≤ Σ_{i=1}^n 1 = n.
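The norm-one projection onto a line and the distance estimate dist(x, PX) ≤ ‖x − Px‖ ≤ (1 + ‖P‖) dist(x, PX) can be tried out in (R2, ‖·‖_∞). The following Python sketch makes the illustrative choices x0 = (1, 1) and f(x, y) = x (a supporting functional of the unit ball at x0, compare Example 943); the closed formula used for the distance to the line and all names are assumptions of this sketch, not taken from the text.

```python
def sup_norm(v):
    """The sup norm on R^2."""
    return max(abs(v[0]), abs(v[1]))


def project(v):
    """P(v) = f(v) x0 with x0 = (1, 1) and f(x, y) = x, so P(x, y) = (x, x)."""
    return (v[0], v[0])


def dist_to_line(v):
    """Sup-norm distance from v to span{(1, 1)}.

    Minimising max(|x - s|, |y - s|) over s gives s = (x + y) / 2,
    hence the distance |x - y| / 2.
    """
    return abs(v[0] - v[1]) / 2.0


# The convex function v -> ||P v|| attains its supremum over the unit square
# at one of the four corners, so this maximum is the operator norm of P.
corners = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
norm_P = max(sup_norm(project(c)) for c in corners)

grid = [(i / 4.0, j / 4.0) for i in range(-8, 9) for j in range(-8, 9)]
residuals = [sup_norm((v[0] - project(v)[0], v[1] - project(v)[1])) for v in grid]
distances = [dist_to_line(v) for v in grid]
```

In this example ‖x − Px‖ = |y − x| is exactly twice the distance to the line, so the upper estimate with the factor 1 + ‖P‖ = 2 is attained and cannot be improved in general.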
Corollary 950 can be substantially improved ($n$ can be replaced by $\sqrt{n}$; this is the content of a theorem of the Ukrainian mathematicians M. I. Kadets and M. G. Snobar; see, e.g., [FHHMZ11, Theorem 6.28]).
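The distance estimate for projections can be checked numerically in a small case. The following is only a sketch under assumptions that are not in the text: the space is $\mathbb{R}^2$ with the Euclidean norm, and `project` is a hypothetical oblique projection onto the first axis along the direction $(1, 1)$, whose operator norm is $\sqrt{2}$.

```python
import math
import random

def project(x):
    """Oblique projection of R^2 onto the first axis along the direction (1, 1)."""
    x1, x2 = x
    return (x1 - x2, 0.0)

def norm(x):
    """Euclidean norm on R^2."""
    return math.hypot(x[0], x[1])

# P(x) = ((1, -1) . x, 0), so the Euclidean operator norm of P is sqrt(2).
NORM_P = math.sqrt(2.0)

def check_estimate(x, tol=1e-12):
    """True if dist(x, PX) <= ||x - Px|| <= (1 + ||P||) dist(x, PX)."""
    dist = abs(x[1])                      # distance from x to the first axis PX
    px = project(x)
    residual = norm((x[0] - px[0], x[1] - px[1]))
    return dist <= residual + tol and residual <= (1.0 + NORM_P) * dist + tol

random.seed(0)
samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
all_ok = all(check_estimate(x) for x in samples)
print(all_ok)
```

Here $\|x - Px\| = \sqrt{2}\,|x_2|$ while $\operatorname{dist}(x, PX) = |x_2|$, so neither inequality is an equality for this particular projection.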
11.2.2 Bounded Sets of Operators
It is crucial for the theory that pointwise-bounded sets of operators between Banach spaces are norm-bounded. This is a consequence of the following result due to the Polish mathematicians S. Banach and H. Steinhaus.
Theorem 951 (Banach, Steinhaus) Let $X$ be a Banach space, and let $Y$ be a normed space. Let $A \subset B(X, Y)$. If $A$ is pointwise bounded then $A$ is bounded in $B(X, Y)$.
Proof For $n \in \mathbb{N}$ set $N_n := \{x \in X : \sup_{T \in A} \|T(x)\|_Y \le n\}$. We claim that $N_n$ is closed, convex, and symmetric in $X$. The symmetry of $N_n$ is obvious. To check closedness, let $\{x_k\}$ be a sequence in $N_n$ and assume that $x_k \to x \in X$. Given $T \in A$, we have $\|T(x_k)\|_Y \le n$ for all $k$, so $\|T(x)\|_Y \le n$ by continuity. To see that $N_n$ is convex, let $x_1, x_2 \in N_n$ and $\lambda \in [0, 1]$. Then for every $T \in A$ we have $\|T(\lambda x_1 + (1 - \lambda)x_2)\|_Y \le \lambda\|T(x_1)\|_Y + (1 - \lambda)\|T(x_2)\|_Y \le \lambda n + (1 - \lambda)n = n$. Since for every $x \in X$ we have $\sup_{T \in A} \|T(x)\|_Y < \infty$, there is some $n \in \mathbb{N}$ greater than the supremum, hence $x \in N_n$. So $\bigcup_{n=1}^{\infty} N_n = X$. By the Baire Category Theorem 640, there is $n_0 \in \mathbb{N}$ such that the set $N_{n_0}$ contains an interior point $x_0$. Thus, there is $\delta > 0$ such that $x_0 + \delta B_X \subset N_{n_0}$. Due to the symmetry of $N_{n_0}$ we have $-x_0 + \delta B_X \subset N_{n_0}$. If $b \in \delta B_X$, then by the convexity of $N_{n_0}$ we have $b = \frac{1}{2}(x_0 + b) + \frac{1}{2}(-x_0 + b) \in N_{n_0}$. Hence, $\delta B_X \subset N_{n_0}$. Consequently, given $T \in A$, for every $x \in B_X$ we have $\|T(\delta x)\|_Y \le n_0$, that is, $\|T\| \le n_0/\delta$. This means that $\sup_{T \in A} \|T\| \le n_0/\delta$.
Remark 952 For incomplete normed spaces Theorem 951 may fail. For an example of this situation, observe that there is a pointwise bounded family of linear operators on $(c_{00}, \|\cdot\|_\infty)$ that is not a bounded family. Indeed, define for $n \in \mathbb{N}$ a bounded linear functional $f_n$ on $c_{00}$ by $f_n(x) := n x_n$ for $x = (x_n) \in c_{00}$. Check that $\|f_n\| = n$ and that $f_n(x) \to 0$ as $n \to \infty$ for each $x \in c_{00}$, since each such $x$ is finitely supported. Compare this example with the statement of Theorem 951.
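The counterexample of Remark 952 is easy to experiment with. The sketch below is an illustration only: elements of $c_{00}$ are modelled as finite Python lists (zero beyond their length), and the helper names `f` and `unit_vector` are assumptions made for this example.

```python
def f(n, x):
    """The functional f_n(x) = n * x_n on c00; x is a finite list, zero beyond its length."""
    return n * x[n - 1] if n <= len(x) else 0.0

def unit_vector(n):
    """The n-th canonical unit vector e_n, of sup-norm 1."""
    return [0.0] * (n - 1) + [1.0]

x = [1.0, -0.5, 0.25]                        # an arbitrary finitely supported sequence
orbit = [f(n, x) for n in range(1, 11)]      # values f_1(x), ..., f_10(x)
norms_lower_bound = [abs(f(n, unit_vector(n))) for n in range(1, 11)]

print(orbit)              # vanishes from n = 4 on, so sup_n |f_n(x)| is finite
print(norms_lower_bound)  # |f_n(e_n)| = n, so the norms ||f_n|| are unbounded
```

Each fixed $x$ yields a bounded (eventually zero) orbit, while the norms $\|f_n\| \ge |f_n(e_n)| = n$ grow without bound; this is compatible with Theorem 951 only because $c_{00}$ is not complete.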
11.2.3 Continuity of the Inverse Operator
If $T$ is a one-to-one linear operator from a finite-dimensional Banach space $X$ onto $Y$, then $Y$ is also finite-dimensional, and it follows from Corollary 910 that $T$ and its inverse map are both continuous, i.e., $T$ is an isomorphism. The important Theorem 953 below extends this fact to the infinite-dimensional setting. The name it bears (the “Open Mapping Theorem”) refers to the fact that as soon as a continuous linear operator $T$ maps a Banach space $X$ onto another Banach space $Y$, then $T$ turns out to be “open,” i.e., it maps open subsets of $X$ onto open subsets of $Y$. We formulate the result for one-to-one mappings; in this case, the outcome is that $T$ is an isomorphism. Note that in the infinite-dimensional case not every one-to-one and onto continuous map has a continuous inverse (see, e.g., Remark 918.2).
Theorem 953 (Banach Open Mapping Theorem) Let $T$ be a bounded linear one-to-one operator from a Banach space $X$ onto a Banach space $Y$. Then $T$ is an isomorphism.
Proof It is enough to show that $TB_X$ is a neighborhood of $0$. Since $\bigcup_{n=1}^{\infty} nTB_X = Y$, the Baire Category Theorem 641 ensures the existence of $n \in \mathbb{N}$ such that $\overline{nTB_X}$ contains an open ball. By homogeneity, $\overline{TB_X}$ contains $B(y_0, r)$ for some $y_0 \in Y$ and $r > 0$. Thus, given $y \in Y$, $y \neq 0$, we have $y_0 + ry/\|y\| \in \overline{TB_X}$. Since $y_0 \in \overline{TB_X}$ we get $ry/\|y\| \in \overline{2TB_X}$, hence $y/\|y\| \in \overline{T(2/r)B_X}$. This proves the following statement:
Given $y \in Y$ and $\varepsilon > 0$, there exists $x \in X$, $\|x\| \le \frac{2}{r}\|y\|$, such that $\|Tx - y\| < \varepsilon$. (11.22)
Fix $y \in B_Y$. By (11.22) there exists $x_1 \in X$, $\|x_1\| < 2/r$, such that $\|Tx_1 - y\| < 1/2$. Use (11.22) again to get $x_2 \in X$ with $\|x_2\| < (2/r)(1/2)$ such that $\|Tx_2 + Tx_1 - y\| < 1/2^2$. Once more, use (11.22) to get $x_3 \in X$, $\|x_3\| < (2/r)(1/2^2)$, such that $\|Tx_3 + Tx_2 + Tx_1 - y\| < 1/2^3$. Proceed inductively to find a sequence $\{x_n\}$ in $X$. It is clear that $\{s_n\}$ is a Cauchy sequence, where $s_n := x_1 + \ldots + x_n$ for all $n \in \mathbb{N}$. Let $x$ be its limit. It follows that $Tx = y$. Note that $\|x\| < 4/r$. This shows that $B_Y \subset T(4/r)B_X$, hence $(r/4)B_Y \subset TB_X$.
Corollary 954 Let $X$ be a Banach space and $T$ be a one-to-one continuous linear map from $X$ into a Banach space $Y$. Then $T$ is an isomorphism if and only if $TX$ is closed in $Y$.
Proof If $TX$ is closed, the conclusion follows from Theorem 953, as $TX$ is a Banach space. Conversely, if $T$ is an isomorphism, let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $X$ such that $Tx_n \to y \in Y$. The sequence $\{x_n\}_{n=1}^{\infty}$ is thus Cauchy, hence it converges to some $x \in X$. It follows that $Tx = y$, so $TX$ is closed.
Example 955 Let $T$ be the formal identity operator from $(\ell_1, \|\cdot\|_1)$ into $(\ell_2, \|\cdot\|_2)$. Then $T\ell_1$ $(= \ell_1)$ is dense and not closed in $(\ell_2, \|\cdot\|_2)$, as follows from Corollary 954 and Exercise 13.585 (for details see Exercise 13.571). Note in passing that $TB_{\ell_1}$ is closed in $(\ell_2, \|\cdot\|_2)$ (see Example 920). ♦
Corollary 956 (The Closed Graph Theorem) Let $T$ be a linear operator from a Banach space $X$ into a Banach space $Y$ that has a closed graph, i.e., whenever $x_n \to x$ in $X$ and $T(x_n) \to y$ in $Y$, then $Tx = y$. Then $T$ is a bounded linear operator.
Sketch of the Proof Denote by $G$ the graph of $T$ in $X \oplus Y$, i.e., $G = \{(x, Tx) : x \in X\}$. Observe that $G$ is a Banach space and the operator $(x, Tx) \mapsto x$ is a one-to-one linear continuous operator from $G$ onto $X$. So its inverse, i.e., the map $x \mapsto (x, Tx)$, is continuous by the Banach Open Mapping Theorem 953, and hence so is $T$.
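The failure of closedness in Example 955 can be seen numerically. The following sketch is an illustration under an assumption not made in the text: it uses the truncations of the particular sequence $(1/k)_{k \ge 1}$, which belongs to $\ell_2$ but not to $\ell_1$; the function names are hypothetical.

```python
import math

def truncation(n):
    """First n terms of the sequence (1/k); finitely supported, hence an element of l_1."""
    return [1.0 / k for k in range(1, n + 1)]

def l1_norm(x):
    return sum(abs(t) for t in x)

def l2_distance(x, y):
    """l_2 distance between two finitely supported sequences (padded with zeros)."""
    m = max(len(x), len(y))
    xs = x + [0.0] * (m - len(x))
    ys = y + [0.0] * (m - len(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)))

short, long_ = truncation(1000), truncation(10000)
gap_l2 = l2_distance(short, long_)   # small: the truncations are Cauchy in l_2
size_l1 = l1_norm(long_)             # partial harmonic sum: grows without bound with n
print(gap_l2, size_l1)
```

The truncations lie in $T\ell_1$ and converge in $\|\cdot\|_2$, yet their $\|\cdot\|_1$-norms are unbounded, so the limit cannot belong to $\ell_1$.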
11.3 Complex Banach Spaces
In this short section, we shall consider complex normed spaces, how to reduce the analysis on them to the real case, and how the previous results on real normed spaces apply to this context. A complex normed space is a complex linear space $X$ (i.e., a linear space over the field $\mathbb{C}$ of complex numbers), together with a real-valued function $\|\cdot\|$ defined on $X$ that satisfies (i) to (iv) in Definition 895, where now (iii) holds for every $\lambda \in \mathbb{C}$. For a summary of properties of the field $\mathbb{C}$ of complex numbers see Sect. 12.5.
11.3.1 The Associated Real Normed Space
A complex normed space $(X, \|\cdot\|)$ can always be considered as a real normed space (we shall call the new structure the associated real normed space, and denote it by $(X_{\mathbb{R}}, \|\cdot\|)$). To be precise, the set $X$ remains the same, the sum of vectors does not change, and now only multiplication by real numbers is allowed. The norm (a real-valued function) remains as it is, and certainly $\|\lambda x\| = |\lambda| \cdot \|x\|$ for any real number $\lambda$, so $\|\cdot\|$ satisfies the requirements of a norm on a vector space over the field of the real numbers. In particular, when passing from $(X, \|\cdot\|)$ to $(X_{\mathbb{R}}, \|\cdot\|)$, metric concepts do not change. Balls are the same, mutual distances between vectors too, as well as Cauchy sequences and convergent sequences. In particular, $(X_{\mathbb{R}}, \|\cdot\|)$ is a Banach space if $(X, \|\cdot\|)$ is. Clearly, a vector subspace of a complex normed space is also a subspace of the associated real normed space.
The two algebraic structures of $X$ and of $X_{\mathbb{R}}$ are different. For example, if $X$ is a complex vector space of finite dimension $n$, then $X_{\mathbb{R}}$ is a real vector space of finite dimension $2n$. Indeed, assume that $\{e_k\}_{k=1}^n$ is an algebraic basis of $X$. It is an easy exercise to prove that the system $\{e_k\}_{k=1}^n \cup \{ie_k\}_{k=1}^n$ is an algebraic basis of $X_{\mathbb{R}}$. As a simple example, the associated real normed space to $(\mathbb{C}, |\cdot|)$ (a one-dimensional complex normed space) is the two-dimensional real normed space $(\mathbb{R}^2, \|\cdot\|_2)$.
11.3.2 Operators
If $T$ is a linear operator from a complex linear space $X$ into another $Y$, it is obviously a linear operator from $X_{\mathbb{R}}$ into $Y_{\mathbb{R}}$. In case that $(X, \|\cdot\|)$ and $(Y, |||\cdot|||)$ are normed spaces, the fact that $(X, \|\cdot\|)$ and $(X_{\mathbb{R}}, \|\cdot\|)$ share the same norm (the same for $(Y, |||\cdot|||)$ and $(Y_{\mathbb{R}}, |||\cdot|||)$) shows that everything regarding continuity, the norm of the operator, and, in general, any norm-related aspect of it, remains the same for the complex and the associated real case.
11.3.3 Linear Functionals
Let $(X, \|\cdot\|)$ be a complex normed space and let $(X_{\mathbb{R}}, \|\cdot\|)$ be its associated real normed space.
1. If $f : X \to \mathbb{C}$ is a linear functional, and $f(x) = u(x) + iv(x)$, where $u(x) \in \mathbb{R}$ and $v(x) \in \mathbb{R}$ for $x \in X$, the two mappings $u : X_{\mathbb{R}} \to \mathbb{R}$ and $v : X_{\mathbb{R}} \to \mathbb{R}$ are (real) linear. They are called the real part (denoted by $R_f$) and the imaginary part (denoted by $I_f$), respectively, of $f$. The two mappings $R_f$ and $I_f$ are related: Indeed, $I_f(x) = -R_f(ix)$ for $x \in X$, as can easily be seen by computing $f(ix)$. Conversely, given a (real) linear functional $u : X_{\mathbb{R}} \to \mathbb{R}$, the mapping $f : X \to \mathbb{C}$ given by $f(x) := u(x) - i\,u(ix)$ for $x \in X$ is (complex) linear on $X$. It follows that $\ker f = \ker R_f \cap i\ker R_f$ for all linear functionals $f : X \to \mathbb{C}$.
2. Let $f : X \to \mathbb{C}$ be a continuous linear functional. Then $\|f\| = \|R_f\| = \|I_f\|$. Indeed, for all $x \in X$ we have $|R_f(x)| \le |f(x)|$, so $\|R_f\| \le \|f\|$. Moreover, if $\|x\| \le 1$ and $f(x) = 0$, then $R_f(x) = 0$; otherwise
$|f(x)| = \frac{\overline{f(x)}\,f(x)}{|f(x)|} = \frac{f(\overline{f(x)}\,x)}{|f(x)|} = \frac{R_f(\overline{f(x)}\,x)}{|f(x)|} = R_f\left(\frac{\overline{f(x)}}{|f(x)|}\,x\right) \le \sup\{|R_f(y)| : y \in B_X\} = \|R_f\|$
(the third equality due to the fact that $f(\overline{f(x)}\,x) \in \mathbb{R}$), hence $\|f\| \le \|R_f\|$, and we get all together $\|f\| = \|R_f\|$. A similar argument applies to $I_f$. In particular, $R_f$ and $I_f$ are continuous (real) linear functionals on $X_{\mathbb{R}}$.
3. The previous considerations show that $(X_{\mathbb{R}}, \|\cdot\|)^* = \{R_f : f \in X^*\}$.
4. Let $\{x_n\}$ be a sequence in $X$ and let $x \in X$. Note that $f(x_n) \to f(x)$ for all $f \in X^*$ if and only if $u(x_n) \to u(x)$ for all $u \in X_{\mathbb{R}}^*$. Indeed, if $f(x_n) \to f(x)$ we get, by computing the real part of $f(x)$ and $f(x_n)$ for all $n \in \mathbb{N}$, that $R_f(x_n) \to R_f(x)$. Conversely, it is enough to observe that given $f \in X^*$, the mapping $x \mapsto -if(x)$ from $X$ into $\mathbb{C}$ belongs to $X^*$, and it has real part $I_f$, so $I_f \in X_{\mathbb{R}}^*$. Thus, if $u(x_n) \to u(x)$ for every $u \in X_{\mathbb{R}}^*$, it follows that, for every $f \in X^*$, $R_f(x_n) \to R_f(x)$ and $I_f(x_n) \to I_f(x)$, hence $f(x_n) \to f(x)$.
5. Let $\{f_n\}$ be a sequence in $X^*$ and let $f \in X^*$. Observe that $\{f_n\}$ pointwise converges to $f$ if and only if $\{R_{f_n}\}$ pointwise converges to $R_f$. Indeed, if for some $x \in X$ we have $f_n(x) \to f(x)$, we get, by computing the real part of $f_n(x)$ and $f(x)$, that $R_{f_n}(x) \to R_f(x)$. Conversely, if $R_{f_n}(x) \to R_f(x)$ for all $x \in X$, we have, in particular, $-R_{f_n}(ix) \to -R_f(ix)$ for all $x \in X$. Since $g(x) = R_g(x) - iR_g(ix)$ for $g \in X^*$, we get $f_n(x) \to f(x)$.
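The relation between a complex functional and its real part in item 1 can be verified on a concrete case. This is a minimal sketch, assuming $X = \mathbb{C}^2$ and an arbitrarily chosen functional $f(x) = a_1x_1 + a_2x_2$; the coefficients and function names are illustrative and not taken from the text.

```python
def f(x, a=(2 + 1j, -1 + 3j)):
    """A sample complex-linear functional on C^2 (the coefficients a are an arbitrary choice)."""
    return a[0] * x[0] + a[1] * x[1]

def u(x):
    """Real part R_f of f, a real-linear functional on the associated real space."""
    return f(x).real

def rebuilt(x):
    """Recover f from its real part via f(x) = u(x) - i u(ix)."""
    ix = (1j * x[0], 1j * x[1])
    return u(x) - 1j * u(ix)

x = (0.5 - 2j, 3 + 1j)
error = abs(f(x) - rebuilt(x))                                   # f(x) versus u(x) - i u(ix)
imag_error = abs(f(x).imag + u((1j * x[0], 1j * x[1])))          # I_f(x) versus -R_f(ix)
print(error, imag_error)
```

Both printed quantities should vanish up to rounding, in agreement with $I_f(x) = -R_f(ix)$.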
11.3.4 Supporting Functionals and Differentiability
If we identify a supporting functional of the closed unit ball $B_X$ at a point $x_0 \in S_X$ (called a support point) as an element $f$ in $S_{X^*}$ such that $f(x_0) = 1$, this definition agrees with the geometric one in the real case ($f$ defines a “tangent” hyperplane to $B_X$ at $x_0$), and has the advantage that it makes sense in the complex setting. Moreover, if $f$ is such, then $R_f \in S_{X_{\mathbb{R}}^*}$ (see 2 in Sect. 11.3.3), and obviously $R_f(x_0) = 1$, hence $R_f$ defines a “tangent” (real) hyperplane to $B_X$ at $x_0$ in $(X_{\mathbb{R}}, \|\cdot\|)$ (i.e., it “touches” $B_X$ at $x_0$ and leaves $B_X$ “at one side”). Differentiability will always be considered in the associated real normed space.
11.3.5 Basic Results in the Complex Setting
We summarize here the situation regarding the validity of some important results established in the real case when passing to the complex one.
• We repeat here that any result valid in the real case and stated purely in terms of norms remains valid also in the complex case.
• Proposition 900 characterizing continuous linear operators certainly holds in the complex case, as a result of what was said in Sect. 11.3.2.
• Proposition 901 about the completeness of $B(X, Y)$ holds for complex spaces. Indeed, if $\{T_n\}$ is a Cauchy sequence in $B(X, Y)$, then it is also a Cauchy sequence in $B(X_{\mathbb{R}}, Y_{\mathbb{R}})$, hence it converges in norm to some $T \in B(X_{\mathbb{R}}, Y_{\mathbb{R}})$. Due to the fact that for $\lambda \in \mathbb{C}$ and $x \in X$ we have $T_n(\lambda x) \to T(\lambda x)$ and $T_n(\lambda x) = \lambda T_n(x) \to \lambda T(x)$, we get that $T \in B(X, Y)$. This shows that $B(X, Y)$ is a Banach space in case that $Y$ is a Banach space. In particular, $X^*$ is a Banach space if $X$ is a Banach space, so Corollary 904 holds in the complex setting, too.
• The results on finite-dimensional normed spaces translate word by word to the complex setting, in particular the crucial Theorem 908, since it is enough to prove them in the real case. Indeed, recall that if $(X, \|\cdot\|)$ is a complex normed space of dimension $n$, then $(X_{\mathbb{R}}, \|\cdot\|)$ is a real normed space of dimension $2n$. The important Lemma 911 also holds in the complex case, as well as Theorem 914.
• Alaoglu’s Theorem 919 holds for separable complex normed spaces. Indeed, if $\{f_n\}_n$ is a sequence in $B_{X^*}$, the sequence $\{R_{f_n}\}_n$ is in $B_{X_{\mathbb{R}}^*}$, and so, by the real case, it has a subsequence $\{R_{f_{n_k}}\}_k$ that pointwise converges to some $u \in B_{X_{\mathbb{R}}^*}$. It is enough to apply 5 in Sect. 11.3.3 to conclude that the sequence $\{f_{n_k}\}$ pointwise converges to $f$, where $f(x) = u(x) - i\,u(ix)$ for all $x \in X$. Thus, $B_{X^*}$ is compact in the metric $d_{w^*}$ (see again Theorem 919).
• Proposition 923 and its Corollary 924 on finite-rank and compact operators translate word by word to the complex case.
• The basic Hahn–Banach extension Theorem 925 holds for complex normed spaces. Indeed, assume that $X$ is such, and let $Y$ be a subspace of $X$. By the real case, if $f \in Y^*$, the mapping $R_f$ has a continuous linear extension $\widetilde{R_f} : X_{\mathbb{R}} \to \mathbb{R}$ such that $\|\widetilde{R_f}\| = \|R_f\|$. The function $\tilde{f}(x) := \widetilde{R_f}(x) - i\widetilde{R_f}(ix)$, $x \in X$, is a continuous linear extension of $f$ (due to the fact that $iy \in Y$ for $y \in Y$), and, according to 2 in Sect. 11.3.3, we have $\|\tilde{f}\| = \|\widetilde{R_f}\| = \|R_f\| = \|f\|$.
• The geometric version of the Hahn–Banach Theorem 926 holds in the complex case, too. Indeed, assume that $(X, \|\cdot\|)$ is a complex normed space. Consider the associated real normed space $(X_{\mathbb{R}}, \|\cdot\|)$. Clearly $Y$ is a subspace of $X_{\mathbb{R}}$, and so, from the real case, there exists a real 0-hyperplane $H_{\mathbb{R}}$ that contains $Y$ and is disjoint from $C$. Since $iY = Y$, we have $Y \subset iH_{\mathbb{R}}$, and so $Y \subset H_{\mathbb{R}} \cap iH_{\mathbb{R}}$. According to 1 in Sect. 11.3.3, the set $H := H_{\mathbb{R}} \cap iH_{\mathbb{R}}$ is a 0-hyperplane, clearly disjoint from $C$. Its Corollary 927 about the closure of a convex set holds, too. Indeed, if $(X, \|\cdot\|)$ is a complex normed space, the fact that $f(x_i) \to f(x)$ for $f \in X^*$ implies that $R_f(x_i) \to R_f(x)$. From the real case, and according to 3 in Sect. 11.3.3, it follows that $x \in \overline{C}$.
• The useful Corollary 932 also remains valid in the complex case. Indeed, if $(X, \|\cdot\|)$ is a complex normed space, and $x \in X$, we may apply the real version to get, having in mind 3 in Sect. 11.3.3, $\|x\| = \sup\{|u(x)| : u \in X_{\mathbb{R}}^*, \|u\| \le 1\} \le \sup\{|f(x)| : f \in X^*, \|f\| \le 1\} \le \|x\|$, and the conclusion follows.
• In the same vein, both Corollaries 946 and 947, about solving a finite number of equations and the dual separating points, respectively, remain with the same statement in the complex case.
• Statements about projections are valid in the complex setting, too, with the same proofs.
• The fundamental results of Banach and Steinhaus (Theorem 951), the Open Mapping Theorem 953, and the Closed Graph Theorem (Corollary 956) remain valid in the complex setting, as follows from Sect. 11.3.2.
11.4 Spaces with an Inner Product (Pre-Hilbertian and Hilbert Spaces)
A rich geometric theory can be developed in the framework of Banach spaces. However, the full picture of the classical Euclidean geometry on $\mathbb{R}^n$, now on infinite-dimensional spaces, is achieved in a more restrictive, yet quite ubiquitous structure, namely the class of Banach spaces with an inner product $\langle\cdot,\cdot\rangle$ that is compatible with both the norm and the linear structure. Those spaces are called Hilbert spaces after D. Hilbert. The scalar field $\mathbb{K}$ can be $\mathbb{R}$ or $\mathbb{C}$. Elements in $\mathbb{K}$ will be called scalars. The geometric structure of an inner product space allows for illustrative sketches. Even with the constraint of being finite-dimensional and only valid in the real case, they may reinforce the required intuition.
11.4.1 Basic Hilbert Space Theory
Definition 957 An inner product (or a scalar product or a dot product) on a real or complex vector space $X$ is a scalar-valued function $\langle\cdot,\cdot\rangle$ on $X \times X$ such that
(1) for every $y \in X$, the function $x \mapsto \langle x, y\rangle$ is a linear functional,
(2) $\langle x, y\rangle = \overline{\langle y, x\rangle}$, where the bar denotes complex conjugation,
(3) $\langle x, x\rangle \ge 0$ for every $x \in X$,
(4) $\langle x, x\rangle = 0$ if and only if $x = 0$.
A vector space endowed with an inner product is called an inner product space (sometimes the term pre-Hilbertian space is used). It will be denoted by $(X, \langle\cdot,\cdot\rangle)$. Note that $\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle$ and that $\langle x, \lambda y\rangle = \overline{\lambda}\langle x, y\rangle$ for all $x, y, z \in X$ and all scalars $\lambda$, something that can be deduced from (1) and (2) in Definition 957 above. Observe, too, that $\langle 0, y\rangle = \langle x, 0\rangle = 0$ for all $x, y \in X$, and that $\langle x, y\rangle = \langle y, x\rangle$ for all $x, y \in X$ if $X$ is a real vector space.
The following result includes a statement (the Cauchy–Schwarz inequality) that extends the result proved in Proposition 829 for the case of the Euclidean space $\mathbb{R}^n$ (see Example 549.2b) and the space $L_2(I)$ of all square-integrable measurable functions on a general interval $I$ (see Example 565.16). Indeed, $\mathbb{R}^n$ with the Euclidean inner product $\langle x, y\rangle := \sum_{k=1}^n x_k y_k$, for $x = (x_k)_{k=1}^n$ and $y = (y_k)_{k=1}^n$ in $\mathbb{R}^n$, and $L_2(I)$ with the inner product $\langle f, g\rangle := \int_I f(t)\overline{g(t)}\,dt$ for $f, g \in L_2(I)$, are both examples of inner product spaces, the first one real and the second one complex.
Theorem 958 Let $(X, \langle\cdot,\cdot\rangle)$ be an inner product space. Then
(i) For $x, y \in X$ we have
$|\langle x, y\rangle| \le \sqrt{\langle x, x\rangle}\sqrt{\langle y, y\rangle}$ (the Cauchy–Schwarz inequality). (11.23)
(ii) The function
$\|x\| := \sqrt{\langle x, x\rangle}$ (11.24)
is a norm on $X$.
(iii) The norm $\|\cdot\|$ satisfies that for every $x, y \in X$,
$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$ (the parallelogram equality) (11.25)
(see Fig. 11.25).
The norm $\|\cdot\|$ defined in (11.24) is called the norm associated to an inner product. Thus, an inner product space is a particular case of a normed space. Note that the Cauchy–Schwarz inequality (11.23) appears then as
$|\langle x, y\rangle| \le \|x\| \cdot \|y\|$, $x, y \in X$. (11.26)
Fig. 11.25 The parallelogram equality
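Corollary 959 Let $(X, \langle\cdot,\cdot\rangle)$ be an inner product space. Then, for each $y \in X$, the function $x \mapsto \langle x, y\rangle$ is a continuous linear functional on $X$.
Proof If $\{x_n\}_{n=1}^{\infty}$ is a sequence in $X$ that converges to $x \in X$, then we have $|\langle x - x_n, y\rangle| \le \|x - x_n\| \cdot \|y\| \to 0$ as $n \to \infty$.
Proof of Theorem 958 (i) If $\langle y, y\rangle = 0$, we have $y = 0$ and the inequality is satisfied. Assume that $\langle y, y\rangle > 0$. Then, by (3) in Definition 957,
$0 \le \left\langle x - \frac{\langle x, y\rangle}{\langle y, y\rangle}y,\ x - \frac{\langle x, y\rangle}{\langle y, y\rangle}y \right\rangle = \langle x, x\rangle - \frac{|\langle x, y\rangle|^2}{\langle y, y\rangle}$,
and the statement follows.
(ii) We will check the triangle inequality. For $x, y \in X$ we have
$\|x + y\|^2 = \langle x + y, x + y\rangle = \langle x, x\rangle + \langle y, y\rangle + \langle x, y\rangle + \langle y, x\rangle \le \langle x, x\rangle + \langle y, y\rangle + 2|\langle x, y\rangle| \le \langle x, x\rangle + \langle y, y\rangle + 2\sqrt{\langle x, x\rangle}\sqrt{\langle y, y\rangle} = \left(\sqrt{\langle x, x\rangle} + \sqrt{\langle y, y\rangle}\right)^2 = (\|x\| + \|y\|)^2$
(the second inequality due to (i)).
(iii) Observe that
$\|x + y\|^2 + \|x - y\|^2 = \langle x + y, x + y\rangle + \langle x - y, x - y\rangle = \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle + \langle x, x\rangle - \langle x, y\rangle - \langle y, x\rangle + \langle y, y\rangle = 2\|x\|^2 + 2\|y\|^2$.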
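The three statements of Theorem 958 can be tested numerically on random vectors. The sketch below assumes the space $\mathbb{C}^5$ with its standard inner product (linear in the first variable, as in Definition 957); the function names are chosen for this illustration only.

```python
import math
import random

def inner(x, y):
    """Inner product on C^n, linear in the first argument and conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm associated to the inner product."""
    return math.sqrt(inner(x, x).real)

def random_vector(n):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

random.seed(1)
x, y = random_vector(5), random_vector(5)
plus = [a + b for a, b in zip(x, y)]
minus = [a - b for a, b in zip(x, y)]

cauchy_schwarz_ok = abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12      # (11.26)
parallelogram_gap = abs(norm(plus) ** 2 + norm(minus) ** 2
                        - 2 * norm(x) ** 2 - 2 * norm(y) ** 2)         # (11.25)
triangle_ok = norm(plus) <= norm(x) + norm(y) + 1e-12                  # part (ii)
print(cauchy_schwarz_ok, parallelogram_gap, triangle_ok)
```

The small tolerances only absorb floating-point rounding; the parallelogram gap should be zero up to that rounding.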
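Note that in a real inner product space $(X, \langle\cdot,\cdot\rangle)$, the Cauchy–Schwarz inequality (11.26) appears as $-\|x\| \cdot \|y\| \le \langle x, y\rangle \le \|x\| \cdot \|y\|$ for all $x, y \in X$, so we can write
$-1 \le \frac{\langle x, y\rangle}{\|x\| \cdot \|y\|} \le 1$, for $x, y \in X$, $x \neq 0$, $y \neq 0$.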
Fig. 11.26 In an inner product space, the sphere does not contain segments (Remark 960)
Fig. 11.27 In an inner product space, a subspace $F$ and its orthogonal complement $F^\perp$
Fig. 11.28 The Pythagorean Theorem (Eq. (11.27))
This allows us to define in this case the angle between $x$ and $y$ as the angle $\theta \in [0, \pi]$ such that $\cos\theta = \frac{\langle x, y\rangle}{\|x\| \cdot \|y\|}$. Thus, inequality (11.26) appears, in this context, as $\langle x, y\rangle = \|x\| \cdot \|y\| \cos\theta$, and Definition 961 below has, for Euclidean spaces, a clear geometrical meaning.
Remark 960 Note that a consequence of the parallelogram equality (11.25) is the fact that the unit sphere $S_H$ of an inner product space $H$ has the following geometric property: It does not contain any proper linear segment or, in other words, $x, y \in S_H$, $x \neq y$, implies that $\|\lambda x + (1 - \lambda)y\| < 1$ for any $0 < \lambda < 1$ (see Fig. 11.26). Indeed, any vector $z$ of the form $\lambda x + (1 - \lambda)y$, where $0 < \lambda < 1$, can be written as the middle point of two vectors $z + h$, $z - h$, both in $B_H$, where $h \neq 0$. Then $2 \ge \|z + h\|^2 + \|z - h\|^2 = 2\|z\|^2 + 2\|h\|^2$. Since $\|h\| > 0$, this shows that $\|z\| < 1$.
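The angle formula and the geometric property of Remark 960 can be illustrated in $\mathbb{R}^2$. The sketch below compares the Euclidean norm with the supremum norm, which does not come from an inner product; this comparison and the helper names are assumptions added for illustration.

```python
import math

def euclid(x):
    return math.sqrt(sum(t * t for t in x))

def sup_norm(x):
    return max(abs(t) for t in x)

def angle(x, y):
    """Angle in [0, pi] between nonzero vectors of R^n, from cos(theta) = <x, y> / (||x|| ||y||)."""
    c = sum(a * b for a, b in zip(x, y)) / (euclid(x) * euclid(y))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding

def midpoint(x, y):
    return [(a + b) / 2 for a, b in zip(x, y)]

theta = angle([1.0, 0.0], [1.0, 1.0])                      # pi / 4
mid_euclid = euclid(midpoint([1.0, 0.0], [0.0, 1.0]))      # < 1: strictly inside the ball
mid_sup = sup_norm(midpoint([1.0, 1.0], [1.0, -1.0]))      # = 1: the segment lies on the sphere
print(theta, mid_euclid, mid_sup)
```

The midpoint of two distinct Euclidean unit vectors has norm strictly less than 1, while the unit sphere of the supremum norm contains the whole segment between $(1, 1)$ and $(1, -1)$.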
Definition 961 Let $(X, \langle\cdot,\cdot\rangle)$ be an inner product space. Two vectors $x, y \in X$ are said to be orthogonal if $\langle x, y\rangle = 0$. If this is the case we write $x \perp y$, and we also say that $x$ is orthogonal to $y$. Let $M \subset X$, $M \neq \emptyset$. We say that the vector $x$ is orthogonal to $M$, denoted $x \perp M$, if $x$ is orthogonal to every vector $y$ in $M$. If $F$ is a subspace of $X$, then the set $F^\perp$ of all vectors $x \in X$ such that $x \perp F$ is called the orthogonal complement of $F$ in $X$ (see Fig. 11.27).
Observe that given $x \in X$ (see Fig. 11.28),
$\|x\|^2 = \|y\|^2 + \|z\|^2$, if $x = y + z$, $y \perp z$ (the Pythagorean Theorem). (11.27)
Indeed, $\|x\|^2 = \langle x, x\rangle = \langle y + z, y + z\rangle = \langle y, y\rangle + \langle y, z\rangle + \langle z, y\rangle + \langle z, z\rangle = \langle y, y\rangle + \langle z, z\rangle = \|y\|^2 + \|z\|^2$.
Definition 962 A Banach space $(H, \|\cdot\|)$ is called a Hilbert space if there is an inner product $\langle\cdot,\cdot\rangle$ on $H$ such that $\|x\| = \sqrt{\langle x, x\rangle}$ for every $x \in H$.
Example 963
1. The (vector) space $\ell_2(\mathbb{N})$ (also denoted by $\ell_2$) consists of all square-summable sequences of scalars, i.e., sequences $\{x_n\}_{n=1}^{\infty}$ such that $\sum_{n=1}^{\infty} |x_n|^2 < +\infty$. We endow this space with the inner product
$\langle x, y\rangle_2 := \sum_{n=1}^{\infty} x_n\overline{y_n}$ (11.28)
for $x := \{x_n\}_{n=1}^{\infty}$ and $y := \{y_n\}_{n=1}^{\infty}$ in $\ell_2$, where $\overline{y}$ denotes the conjugate of the number $y$. That this inner product is well defined follows from the Cauchy–Schwarz inequality (8.24). It induces the norm
$\|x\|_2 := \left(\sum_{n=1}^{\infty} |x_n|^2\right)^{1/2}$ (11.29)
for $x = \{x_n\}_{n=1}^{\infty} \in \ell_2(\mathbb{N})$. The space $(\ell_2, \langle\cdot,\cdot\rangle_2)$ is a separable Hilbert space. To check separability, proceed as in Example 897.1. To prove completeness, follow the pattern in Example 897.3 with some obvious changes.
2. Let $L_2[0, 2\pi]$ be the (vector) space of all scalar-valued square Lebesgue integrable functions (a.e.) defined on $[0, 2\pi]$ (two functions identified if they agree (a.e.)), i.e., scalar-valued measurable functions $f$ defined (a.e.) on $[0, 2\pi]$ such that $\int_{[0,2\pi]} |f(t)|^2\,dt < +\infty$. Define the inner product
$\langle f, g\rangle_2 := \int_{[0,2\pi]} f(t)\overline{g(t)}\,dt$ (11.30)
for $f, g \in L_2[0, 2\pi]$ (this was already introduced in Eq. (9.13) for the space of $2\pi$-periodic continuous scalar-valued functions). It is well defined thanks to the Cauchy–Schwarz inequality (8.25). It is clear that $\langle\cdot,\cdot\rangle_2$ satisfies (1) to (3) in Definition 957, and that $\langle 0, 0\rangle_2 = 0$. To prove that $\langle f, f\rangle_2 = 0$ implies $f = 0$ (a.e.), use Corollary 761. All together, $\langle\cdot,\cdot\rangle_2$ is an inner product on $L_2[0, 2\pi]$. It induces the norm
$\|f\|_2 := \left(\int_{[0,2\pi]} |f(t)|^2\,dt\right)^{1/2}$, (11.31)
for $f \in L_2[0, 2\pi]$. The space $L_2[0, 2\pi]$ is a separable Hilbert space. Separability will be proved in Corollary 982 as a consequence of the existence of a countably infinite orthonormal basis. To prove completeness (Proposition 965 below), we need first the following intermediate result:
Lemma 964 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence in $L_2[0, 2\pi]$ such that $\sum_{n=1}^{\infty} \|f_n\|_2 < +\infty$. Then there exists a function $s \in L_2[0, 2\pi]$ such that $\sum_{n=1}^{\infty} f_n = s$ (a.e.) and also in the norm $\|\cdot\|_2$.
Proof For $n \in \mathbb{N}$ put $S_n := \sum_{k=1}^n |f_k|$. The sequence of functions $\{S_n^2\}_{n=1}^{\infty}$ is increasing (a.e.). Moreover, for every $n \in \mathbb{N}$, Minkowski’s inequality (8.19) gives
$\int_{[0,2\pi]} S_n^2 = \left\|\sum_{k=1}^n |f_k|\right\|_2^2 \le \left(\sum_{k=1}^n \|f_k\|_2\right)^2 \le \left(\sum_{k=1}^{\infty} \|f_k\|_2\right)^2 < +\infty.$
(Note that, in particular, $S_n^2 \in L_1[0, 2\pi]$.) The Monotone Convergence Theorem 744 implies then that the sequence $\{S_n^2\}_{n=1}^{\infty}$ converges (a.e.) to a function $S^2 \in L_1[0, 2\pi]$ (in particular, the sequence $\{S_n\}_{n=1}^{\infty}$ converges (a.e.) to $S$). For $n \in \mathbb{N}$ put $s_n := \sum_{k=1}^n f_k$. We just proved that the series $\sum_{k=1}^{\infty} f_k$ is absolutely convergent (a.e.); thus, it converges (a.e.) (to some function $s$), so the sequence $\{s_n^2\}_{n=1}^{\infty}$ (a sequence in $L_1[0, 2\pi]$) converges (a.e.) to $s^2$. Observe that $|s_n^2| \le S_n^2 \le S^2$ for all $n \in \mathbb{N}$, and that $S^2 \in L_1[0, 2\pi]$. Thus, the Dominated Convergence Theorem 750 implies that $s^2 \in L_1[0, 2\pi]$. Moreover, (a.e.) we have
$|s - s_n|^2 \le \left(|s| + \sum_{k=1}^n |f_k|\right)^2 \le (2S)^2 \in L_1[0, 2\pi].$
Again relying on the Dominated Convergence Theorem 750 we get now that $\int_{[0,2\pi]} |s - s_n|^2 \to 0$, hence $\|s - s_n\|_2 \to 0$.
Proposition 965 The space $(L_2[0, 2\pi], \|\cdot\|_2)$ is complete.
Proof Let $\{f_n\}_{n=1}^{\infty}$ be a Cauchy sequence in $L_2[0, 2\pi]$. Consider a subsequence $\{f_{n_k}\}_{k=1}^{\infty}$ such that $\|f_{n_{k+1}} - f_{n_k}\|_2 < 2^{-k}$ for $k \in \mathbb{N}$. For $K \in \mathbb{N}$ let
$s_K := f_{n_1} + \sum_{k=1}^{K} (f_{n_{k+1}} - f_{n_k}).$ (11.32)
By Lemma 964, the series (11.32), whose partial sums form the subsequence $\{f_{n_k}\}_{k=1}^{\infty}$ of $\{f_n\}_{n=1}^{\infty}$, is $\|\cdot\|_2$-convergent. It is obvious that a Cauchy sequence having a convergent subsequence must be convergent itself.
Remark 966 We remark that the completeness in Proposition 965 is due to the properties of the Lebesgue integral. The Riemann integral does not generate such a property. Precisely, we showed in Examples 784.2 and 784.3, and in Proposition 785, that the subspace $R_1[a, b]$ of $L_1[a, b]$ consisting of all classes containing
Fig. 11.29 Searching for the point at minimum distance (Lemma 967.1)
Riemann integrable functions on a closed and bounded interval $[a, b]$, for $a < b$, is a proper dense subspace when $L_1[a, b]$ is endowed with the integral norm $\|\cdot\|_1$, and in Exercise 13.485 that the space $(L_1[a, b], \|\cdot\|_1)$ is complete. ♦
Given a normed space $(X, \|\cdot\|)$, a point $x \in X$, and a nonempty subset $A$ of $X$, the distance from $x$ to $A$ is the real number $\operatorname{dist}(x, A) := \inf\{\|x - a\| : a \in A\}$. This concept was defined in the setting of metric spaces in the paragraph preceding Proposition 557. It was proved there that this function is 1-Lipschitz. A point $a_0 \in A$ is said to be at minimum distance from $x$ if $\|x - a_0\| = \operatorname{dist}(x, A)$. In the proof of Theorem 969 we shall need the following lemma, a crucial result in this area (see Fig. 11.29 for (1) and Fig. 11.31 for (2) there):
Lemma 967 Let $H$ be a Hilbert space, and let $x$ be a point in $H$.
1. Let $C$ be a nonempty closed and convex subset of $H$. Then
(a) There exists a point $x_0 \in C$ at minimum distance from $x$.
(b) The point $x_0 \in C$ at minimum distance from $x$ is unique.
(c) The mapping $x \mapsto \operatorname{dist}(x, C)$ from $H$ into $\mathbb{R}$ is 1-Lipschitz.
(d) The mapping $P_C : H \to H$ that sends $x$ to the point $x_0 \in C$ at minimum distance from $x$ is continuous.
2. Let $F$ be a closed subspace of $H$. Then
(a) The point $x_0 \in F$ at minimum distance from $x$ (it exists by (1.a) and is unique by (1.b) above) satisfies $(x - x_0) \perp F$.
(b) Conversely, if a point $x_0 \in F$ satisfies $(x - x_0) \perp F$, then $x_0$ is the (unique) point in $F$ at minimum distance from $x$.
Proof 1. Put $d := \operatorname{dist}(x, C)$ (see Fig. 11.29).
(a) Let $y_n \in C$ be such that $(d^2 \le)\ \|x - y_n\|^2 < d^2 + \frac{1}{n}$. We shall prove that $\{y_n\}_{n=1}^{\infty}$ is a Cauchy sequence. For this, consider $x - y_n$ and $x - y_m$ in the parallelogram equality (11.25); it gives
$\|2x - (y_n + y_m)\|^2 + \|y_n - y_m\|^2 = 2\|x - y_n\|^2 + 2\|x - y_m\|^2.$
From this we can estimate $\|y_n - y_m\|$, getting
$\|y_n - y_m\|^2 = 2\|x - y_n\|^2 + 2\|x - y_m\|^2 - \|2x - (y_n + y_m)\|^2 = 2\|x - y_n\|^2 + 2\|x - y_m\|^2 - 4\left\|x - \frac{y_n + y_m}{2}\right\|^2 < 2\left(d^2 + \frac{1}{n}\right) + 2\left(d^2 + \frac{1}{m}\right) - 4d^2 = \frac{2}{n} + \frac{2}{m},$
where we used that $\frac{y_n + y_m}{2} \in C$. Therefore, $\{y_n\}_{n=1}^{\infty}$ is a Cauchy sequence in $H$, hence it converges to some point $x_0 \in C$ (due to the fact that $C$ is closed), and $\|x - x_0\| = \lim_{n \to \infty} \|x - y_n\| = d$.
(b) By 1(a) here, there exists a point $x_0 \in C$ at minimum distance $d$ from $x$. Assume that $y_0 \in C$ satisfies $\|x - y_0\| = d$. Use the parallelogram equality (11.25) for the vectors $x - x_0$ and $x - y_0$ to get
$4\left\|x - \frac{x_0 + y_0}{2}\right\|^2 + \|x_0 - y_0\|^2 = \|(x - x_0) + (x - y_0)\|^2 + \|(x - x_0) - (x - y_0)\|^2 = 2\|x - x_0\|^2 + 2\|x - y_0\|^2 = 4d^2.$
If $x_0 \neq y_0$, we obtain $\left\|x - \frac{x_0 + y_0}{2}\right\| < d$, a contradiction with the fact that $(x_0 + y_0)/2 \in C$. So 1(a) and 1(b) together allow us to define the mapping $P_C : H \to C$ that sends $x \in H$ to the unique point in $C$ at minimum distance from $x$.
(c) Take $x, y \in H$. Let $x_0 := P_C(x)$ and $y_0 := P_C(y)$. Then
$\operatorname{dist}(x, C) = \|x - x_0\| = \|x - y + y - x_0\| \ge \|y - x_0\| - \|x - y\| \ge \operatorname{dist}(y, C) - \|x - y\|.$
Reverse the roles of $x$ and $y$ to get $\operatorname{dist}(y, C) \ge \operatorname{dist}(x, C) - \|x - y\|$. These two inequalities together show that $|\operatorname{dist}(x, C) - \operatorname{dist}(y, C)| \le \|x - y\|$.
(d) Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $H$ that converges to $x \in H$ (see Fig. 11.30). Let $y_n := P_C(x_n)$ for $n \in \mathbb{N}$. We shall prove that $y_n \to x_0$ $(:= P_C(x))$. Observe that $\operatorname{dist}(x_n, C) = \|x_n - y_n\|$; thus, from the triangle inequality we get
$\operatorname{dist}(x_n, C) + \|x - x_n\| \ge \|x - x_n + x_n - y_n\| = \|x - y_n\| \ge \operatorname{dist}(x_n, C) - \|x - x_n\|.$
Moreover, due to 1(c) above, $\operatorname{dist}(x_n, C) \to \operatorname{dist}(x, C)$ as $n \to \infty$. We get then that $\|x - y_n\| \to \operatorname{dist}(x, C)$, and the proof of 1(a) above shows then that $\{y_n\}$ converges to $P_C(x)$.
2. Assume now that $F$ is a closed subspace of $H$ (see Fig. 11.31).
Fig. 11.30 Continuity of the metric projection mapping PC (1(d) in Lemma 967)
Fig. 11.31 The closest point x0 to x in a subspace F
(a) Let $x_0 := P_F(x)$, and $d := \|x - x_0\|$. Put $x_1 = x - x_0$, and note that $\|x_1\|^2 = d^2$. Assume that $x_1$ is not orthogonal to $F$, so there is $z \in F$ such that $\langle x_1, z\rangle > 0$ (change $z$ for $e^{i\theta}z$ if necessary, where $\theta$ is a suitable real number). Then for $\varepsilon > 0$ we have
$\|x - (x_0 + \varepsilon z)\|^2 = \|x_1 - \varepsilon z\|^2 = \langle x_1 - \varepsilon z, x_1 - \varepsilon z\rangle = \langle x_1, x_1\rangle - 2\varepsilon\langle x_1, z\rangle + \varepsilon^2\langle z, z\rangle = d^2 - \varepsilon\left(2\langle x_1, z\rangle - \varepsilon\|z\|^2\right).$
Since $\langle x_1, z\rangle > 0$, for $\varepsilon$ small enough we have $2\langle x_1, z\rangle - \varepsilon\|z\|^2 > 0$, and therefore $\|x - (x_0 + \varepsilon z)\| < d$, a contradiction, since $x_0 + \varepsilon z \in F$. Thus $x_1 \perp F$.
(b) Assume now that $x_0 \in F$ satisfies $(x - x_0) \perp F$. Then, given $y \in F$, we have
$\|x - y\|^2 = \|(x - x_0) + (x_0 - y)\|^2 = \|x - x_0\|^2 + \|y - x_0\|^2 \ge \|x - x_0\|^2,$
the second equality due to the fact that $(x - x_0) \perp (y - x_0)$ (see (11.27)). This proves that $x_0$ is a point in $F$ at minimum distance from $x$.
Remark 968 The function $P_C$ defined in 1(d) in Lemma 967 is, in fact, 1-Lipschitz, i.e., $\|P_C(x) - P_C(y)\| \le \|x - y\|$ for all $x, y \in H$. For a proof see, e.g., [FHHMZ11, Exercise 7.48].
Lemma 967, when applied to closed subspaces of a Hilbert space, gives the following central result in Hilbert space theory (Theorem 969), due to F. Riesz. In its formulation, the following construction is needed: Given two normed spaces $(X, \|\cdot\|)$ and $(Y, |||\cdot|||)$, the set $X \times Y$ becomes a vector space if the two operations sum and product by a scalar are defined as $(x_1, y_1) + (x_2, y_2) := (x_1 + x_2, y_1 + y_2)$, and $\lambda(x, y) := (\lambda x, \lambda y)$, for any $x, x_1, x_2 \in X$, $y, y_1, y_2 \in Y$, and any scalar $\lambda$. Now, norms on $X \times Y$ can be defined in several ways: for example,
Fig. 11.32 Decomposing X into an “orthogonal” direct sum, and the associated projections (Theorem 969 and Corollary 970)
(i) $\|(x, y)\|_\infty := \max\{\|x\|, |||y|||\}$,
(ii) $\|(x, y)\|_1 := \|x\| + |||y|||$,
(iii) $\|(x, y)\|_2 := (\|x\|^2 + |||y|||^2)^{1/2}$,
for $x \in X$ and $y \in Y$. All these three norms turn out to be equivalent (see Theorem 908). It is easy to prove that $X \times Y$ endowed with any of those norms is a Banach space in case that $(X, \|\cdot\|)$ and $(Y, |||\cdot|||)$ are both Banach spaces.
Theorem 969 (Riesz) Let $F$ be a closed subspace of a Hilbert space $H$. Then $F \oplus F^\perp$ (called the direct sum of the subspaces $F$ and $F^\perp$) is equal to $H$, in the sense that $F + F^\perp = H$ (algebraically), $F \cap F^\perp = \{0\}$, and the mapping $S : F \times F^\perp \to H$ given by $S(x_0, x_1) = x_0 + x_1$ is a linear isomorphism from the Banach space $F \times F^\perp$ onto $H$. The mapping $P_0$ ($P_1$) that sends $x \in H$ to the unique element $x_0 \in F$ (respectively, $x_1 \in F^\perp$) such that $x = x_0 + x_1$, is a linear projection. Moreover, $\|x\|^2 = \|P_0x\|^2 + \|P_1x\|^2$. In particular, $\|P_0x\| \le \|x\|$ (respectively, $\|P_1x\| \le \|x\|$) for all $x \in H$, so $\|P_0\| = \|P_1\| = 1$ (see Fig. 11.32).
Proof For any $x \in H$, let $x_0 \in F$ be the point in $F$ at minimum distance from $x$ (see Lemma 967 and Fig. 11.31). Put $x_1 := x - x_0$. Then $x_1 \in F^\perp$ (see (2.a) in Lemma 967), and $x = x_0 + x_1$, so $F + F^\perp = H$. Obviously, $F \cap F^\perp = \{0\}$. The uniqueness of the decomposition $x = x_0 + x_1$ shows the linearity of $P_0$ and $P_1$. Finally, $\|x\|^2 = \|x_0\|^2 + \|x_1\|^2$ (see (11.27)), hence $(\|P_0x\| =)\ \|x_0\| \le \|x\|$. Analogously, $\|P_1x\| \le \|x\|$. That $P_0$ (and then $P_1$) is a continuous mapping was proven in (1.d) in Lemma 967 (the fact that $P_0$ and $P_1$ are linear mappings, together with the inequalities $\|P_0(x)\| \le \|x\|$ and $\|P_1(x)\| \le \|x\|$ for all $x \in H$, shows, too, that both $P_0$ and $P_1$ are continuous; see Proposition 900). The fact that $S$ is a linear isomorphism from $F \times F^\perp$ onto $H$ follows from it. Clearly, $P_0x_0 = x_0$ for $x_0 \in F$, hence $P_0$ is a projection (Definition 949). Since the norm of any projection is greater than or equal to 1, we have $\|P_0\| = 1$; the same applies to $P_1$.
The following result is a particular case of Theorem 969. See also Proposition 990.
Corollary 970 Let $H$ be a Hilbert space, and let $f$ be a nonzero continuous linear functional on $H$. Then, if $F := f^{-1}(0)$, there exists $x_1 \in H$ such that $x_1 \perp F$ and $H = F \oplus \operatorname{span}\{x_1\}$.
Proof Let $x \in H$ be such that $f(x) \neq 0$. The construction in the proof of Theorem 969 shows that $x = x_0 + x_1$, where $x_0 \in F$ ($:= f^{-1}(0)$) and $x_1 \in F^\perp$. Observe that $f(x_1) \neq 0$, and that if $h \in H$, then $h_0 := h - (f(h)/f(x_1))x_1 \in F$. This shows that $h = h_0 + (f(h)/f(x_1))x_1$, hence $F^\perp = \operatorname{span}\{x_1\}$. The result follows from Theorem 969.
Let $H$ be a Hilbert space and let $S$ be a subset of $H$. The set $S$ is called an orthonormal set if $\langle s_1, s_2\rangle = 0$ whenever $s_1 \neq s_2 \in S$, and $\langle s, s\rangle = 1$ for every $s \in S$. A maximal orthonormal set (in the sense of inclusion) in $H$ is called an orthonormal basis of $H$.
Theorem 971 Given an orthonormal set $S_0$ in a Hilbert space $H$, there exists an orthonormal basis of $H$ that contains $S_0$. In particular, every Hilbert space has an orthonormal basis.
Proof Let $S_0$ be an orthonormal subset of $H$. Let $\mathcal{O}$ be the family of all orthonormal subsets of $H$ that contain $S_0$ (a nonempty family since it contains $S_0$). We can order $\mathcal{O}$ by inclusion. Then $\mathcal{O}$ becomes a preordered set (see Sect. 12.6.1) such that every chain in $\mathcal{O}$ has an upper bound in $\mathcal{O}$ (the union of the sets in the chain), hence it has, by Zorn’s Lemma (see Sect. 12.6.3), a maximal element $S \in \mathcal{O}$. By definition, $S$ is an orthonormal basis of $H$.
Remark 972 As an immediate consequence of Theorem 969 we obtain that if $\{e_\mu\}$ is an orthonormal basis of $H$, then $\overline{\operatorname{span}}\{e_\mu\} = H$. Indeed, otherwise we could find a vector $x$ in $S_H \cap (\overline{\operatorname{span}}\{e_\mu\})^\perp$ (where $S_H$ is the unit sphere in $H$). The system $\{e_\mu\} \cup \{x\}$ is then orthonormal and larger than $\{e_\mu\}$, a contradiction with the maximality of $\{e_\mu\}$. ®
Example 973 We present some examples of orthonormal bases in some Hilbert spaces.
1. For $n \in \mathbb{N}$, the space $\mathbb{R}^n$ endowed with the inner product $\langle x, y\rangle := \sum_{k=1}^n x_k y_k$, where $x = (x_k)_{k=1}^n$ and $y = (y_k)_{k=1}^n$, is a real Hilbert space. Clearly, the (finite) system $\{e_k\}_{k=1}^n$, where $e_k := (0, \ldots, 0, 1, 0, \ldots, 0)$ (the number 1 in the $k$-th coordinate) for $k = 1, 2, \ldots, n$, is an orthonormal basis.
2. The space $\ell_2(\mathbb{N})$ was seen to be a (separable) Hilbert space in Example 963.1. The system $\{e_n : n \in \mathbb{N}\}$, where $e_n$ is the sequence with all elements 0 but the number 1 at the $n$-th position, is clearly an orthonormal basis.
3. Another fundamental example is the space $L_2[0, 2\pi]$. This space was considered in Example 963.2, and its study will be pursued in Example 980. It has an orthonormal basis (see Proposition 981). ♦
Many of the Hilbert spaces $H$ used in applications are separable. Theorem 977 below shows that, in the case $H$ is separable, we can find a countable orthonormal basis $\{e_n\}_{n=1}^\infty$ of $H$ (in fact, every orthonormal basis in $H$ will be countable), and that every vector in $H$ can be represented (uniquely up to permutations) as the sum of a series of the form $\sum_{n=1}^\infty c_n e_n$. Conversely, a Hilbert space with a countable orthonormal basis is separable. This important result is a consequence of a simple formula that computes the $\|\cdot\|$-distance from a given $x \in H$ to the finite-dimensional subspace
of $H$ generated by the first $n$ vectors of the orthonormal basis (see Fig. 11.33). This formula contains the ingredients to prove a basic result concerning the behavior of the sequence of the so-called Fourier coefficients of $x$.
Lemma 974 Let $H$ be a Hilbert space. Let $S_0 := \{e_1, e_2, \ldots, e_n\}$ be a finite orthonormal subset of $H$, and let $x \in H$. Then, for all scalars $c_1, c_2, \ldots, c_n$ we have
$$\Big\| x - \sum_{i=1}^n c_i e_i \Big\|^2 = \langle x, x\rangle - \sum_{i=1}^n |\langle x, e_i\rangle|^2 + \sum_{i=1}^n |c_i - \langle x, e_i\rangle|^2. \tag{11.33}$$
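Proof Observe that, thanks to the orthonormality of the system $S_0$,
$$\Big\| x - \sum_{i=1}^n c_i e_i \Big\|^2 = \Big\langle x - \sum_{i=1}^n c_i e_i,\ x - \sum_{i=1}^n c_i e_i \Big\rangle = \langle x, x\rangle - \sum_{i=1}^n \overline{c_i}\,\langle x, e_i\rangle - \sum_{i=1}^n c_i\,\overline{\langle x, e_i\rangle} + \sum_{i=1}^n |c_i|^2. \tag{11.34}$$
Moreover,
$$\sum_{i=1}^n |c_i - \langle x, e_i\rangle|^2 = \sum_{i=1}^n (c_i - \langle x, e_i\rangle)\,\overline{(c_i - \langle x, e_i\rangle)} = \sum_{i=1}^n |c_i|^2 - \sum_{i=1}^n \overline{c_i}\,\langle x, e_i\rangle - \sum_{i=1}^n c_i\,\overline{\langle x, e_i\rangle} + \sum_{i=1}^n |\langle x, e_i\rangle|^2. \tag{11.35}$$
It is enough to substitute (11.35) into (11.34) to get (11.33).
Remark 975 Lemma 974 has several consequences.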
1. It immediately gives that the closest point in $\operatorname{span}\{e_1, e_2, \ldots, e_n\}$ to $x$ is $\sum_{i=1}^n \langle x, e_i\rangle e_i$ (indeed, this choice of the set of coefficients $\{c_i\}_{i=1}^n$ in (11.33) minimizes the distance from $x$ to $\operatorname{span}\{e_1, e_2, \ldots, e_n\}$, see Fig. 11.33), and the square of the distance is given by
$$\operatorname{dist}^2(x, \operatorname{span}\{e_1, e_2, \ldots, e_n\}) = \|x\|^2 - \sum_{i=1}^n |\langle x, e_i\rangle|^2. \tag{11.36}$$
Note that the existence and uniqueness of such a point were guaranteed by Lemma 967. However, in this case both statements follow directly from Eq. (11.33).
2. Since $\operatorname{dist}(x, \operatorname{span}\{e_1, e_2, \ldots, e_n\}) \ge 0$ we get from (11.36)
$$\sum_{i=1}^n |\langle x, e_i\rangle|^2 \le \|x\|^2. \tag{11.37}$$
Fig. 11.33 The sum of the two first summands in the Fourier series of x (Theorem 977)
® The estimate (11.37) holds for every orthonormal system $\{e_1, e_2, \ldots, e_n\}$, and for every $n \in \mathbb{N}$, so we get the following corollary.
Corollary 976 (Bessel inequality) Let $\{e_i\}_{i=1}^\infty$ be an orthonormal system in a Hilbert space $H$. Then
$$\sum_{i=1}^\infty |\langle x, e_i\rangle|^2 \le \|x\|^2 \quad \text{for every } x \in H \text{ (Bessel inequality)}. \tag{11.38}$$
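The identity (11.33), the distance formula (11.36), and the Bessel inequality can be checked numerically in a small example. The following is only a sketch under stated assumptions: the space is $\mathbb{C}^3$ with the standard inner product (linear in the first argument), and the orthonormal pair `e`, the vector `x`, and the helper names are hypothetical choices made for illustration.

```python
import math

def inner(u, v):
    """Inner product on C^n: linear in u, conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm_sq(u):
    return inner(u, u).real

def linear_combination(coeffs, vectors):
    """Return sum_i coeffs[i] * vectors[i] as a list of complex numbers."""
    dim = len(vectors[0])
    return [sum(c * e[j] for c, e in zip(coeffs, vectors)) for j in range(dim)]

def subtract(u, v):
    return [a - b for a, b in zip(u, v)]

# Hypothetical data: an orthonormal pair in C^3 and a vector x outside its span.
s = 1 / math.sqrt(2)
e = [[complex(s), complex(0, s), 0j], [0j, 0j, 1 + 0j]]
x = [1 + 2j, 3 - 1j, 0.5j]

fourier = [inner(x, ei) for ei in e]             # the coefficients <x, e_i>
bessel_sum = sum(abs(c) ** 2 for c in fourier)   # left-hand side of (11.37)

def both_sides_of_11_33(c):
    """Evaluate the two sides of (11.33) for the scalars c = (c_1, c_2)."""
    lhs = norm_sq(subtract(x, linear_combination(c, e)))
    rhs = norm_sq(x) - bessel_sum + sum(abs(ci - fi) ** 2 for ci, fi in zip(c, fourier))
    return lhs, rhs

lhs, rhs = both_sides_of_11_33([0.3 - 1j, 2 + 0.25j])   # arbitrary scalars
# Squared distance from x to span{e_1, e_2}, attained at the Fourier coefficients (11.36).
dist_sq = norm_sq(subtract(x, linear_combination(fourier, e)))

print(lhs, rhs)
print(dist_sq, norm_sq(x) - bessel_sum)
```

Any other choice of scalars in `both_sides_of_11_33` gives a value at least `dist_sq`, which is the minimization statement in Remark 975.1.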
Theorem 977 Let $(H, \langle\cdot,\cdot\rangle)$ be an infinite-dimensional Hilbert space.
(i) If $H$ is separable, then it has a countably infinite orthonormal basis $\{e_i\}_{i=1}^\infty$.
(ii) If $H$ is separable, then every orthonormal basis in $H$ is countably infinite.
(iii) If $\{e_i\}_{i=1}^\infty$ is an orthonormal basis of $H$, then, for every $x \in H$,
$$x = \sum_{i=1}^\infty \langle x, e_i\rangle e_i, \tag{11.39}$$
(where the convergence of the series is in $\|\cdot\|$) and
$$\sum_{i=1}^\infty |\langle x, e_i\rangle|^2 = \|x\|^2 \quad \text{(Parseval identity)}. \tag{11.40}$$
(iv) If $H$ has a countably infinite orthonormal basis, then $H$ is separable.
(v) (Riesz–Fischer) If $\{e_i\}_{i=1}^\infty$ is an orthonormal basis of $H$, then the mapping $F$ that sends each $x \in H$ to the sequence $\{\langle x, e_i\rangle\}_{i=1}^\infty$ is a linear isometry from the space $H$ onto the (complex) Banach space $\ell_2$ (see Example 963.1). In particular, given an arbitrary sequence $\{c_i\}_{i=1}^\infty$ in $\ell_2$, there exists a (unique) element $x \in H$ such that $\langle x, e_i\rangle = c_i$ for every $i \in \mathbb{N}$.
(vi) Two separable infinite-dimensional Hilbert spaces are linearly isometric.
The Riesz–Fischer Theorem ((v) in Theorem 977) is due to F. Riesz and the Austrian mathematician E. S. Fischer. Before proceeding with the proof of Theorem
977, we give a name to the sequence $\{\langle x, e_i\rangle\}_{i=1}^\infty$ of scalars in Eq. (11.39) associated to a given orthonormal system $\{e_i\}_{i=1}^\infty$ and to a vector $x$ in a Hilbert space $H$.
Definition 978 Let $H$ be a separable infinite-dimensional Hilbert space, and let $\{e_i\}_{i=1}^\infty$ be an orthonormal system in $H$. Given $x \in H$, the numbers $\langle x, e_i\rangle$ are called the Fourier coefficients of $x$ in the system $\{e_i\}_{i=1}^\infty$, and the formal sum $\sum_{i=1}^\infty \langle x, e_i\rangle e_i$ is called the Fourier expansion of $x$ or the Fourier series for $x$ (see Fig. 11.33) in the given system. This circumstance will be denoted by the symbol
$$x \sim \sum_{i=1}^\infty \langle x, e_i\rangle e_i.$$
Remark 979
1. Statement (iii) in Theorem 977 asserts that, given an orthonormal basis in a separable Hilbert space $H$, the Fourier series of any $x \in H$ in this basis converges to $x$ in the norm of $H$. A byproduct of this is the following: Any reordering $\{e_{\sigma(i)}\}_{i=1}^\infty$ (i.e., $\sigma : \mathbb{N} \to \mathbb{N}$ is a permutation) is again an orthonormal basis, hence $x = \sum_{i=1}^\infty \langle x, e_i\rangle e_i = \sum_{i=1}^\infty \langle x, e_{\sigma(i)}\rangle e_{\sigma(i)}$, so the series $\sum_{i=1}^\infty \langle x, e_i\rangle e_i$ is unconditionally convergent. An alternative approach to the same conclusion is the following: Fix $\varepsilon > 0$ and find $n_0 \in \mathbb{N}$ such that $\big\|x - \sum_{i=1}^{n_0} \langle x, e_i\rangle e_i\big\| < \varepsilon$ and, simultaneously, $\sum_{i=n_0+1}^\infty |\langle x, e_i\rangle|^2 < \varepsilon^2$ (use (11.39) and (11.40) in Theorem 977, respectively). Find $n_1 \in \mathbb{N}$ such that $\{1, 2, \ldots, n_0\} \subset \{\sigma(i) : i = 1, \ldots, n_1\}$, and note that, for $n \ge n_1$, we have
$$\Big\| \sum_{i=1}^{n} \langle x, e_{\sigma(i)}\rangle e_{\sigma(i)} - \sum_{i=1}^{n_0} \langle x, e_i\rangle e_i \Big\| \le \Big( \sum_{i=n_0+1}^\infty |\langle x, e_i\rangle|^2 \Big)^{1/2} < \varepsilon.$$
This shows that $\big\|x - \sum_{i=1}^{n} \langle x, e_{\sigma(i)}\rangle e_{\sigma(i)}\big\| < 2\varepsilon$ for every $n \ge n_1$, and so, since $\varepsilon > 0$ was taken arbitrary, $x = \sum_{i=1}^\infty \langle x, e_{\sigma(i)}\rangle e_{\sigma(i)}$. The reader will recognize the argument behind the proof of the fact that every absolutely convergent series in $\mathbb{R}$ is unconditionally convergent (Proposition 190).
2. Item (vi) in Theorem 977 says that, essentially, all separable infinite-dimensional Hilbert spaces are the same (and so all of them are essentially $\ell_2(\mathbb{N})$ (real or complex according to the nature of the Hilbert space)). This can be extended, with minor modifications, to any two Hilbert spaces having orthonormal bases of the same cardinality. ®
Proof of Theorem 977 (i) and (ii). Let $\{e_\mu\}$ be an orthonormal basis of $H$ (it exists by Theorem 971). Note that
$$\|e_\mu - e_{\mu'}\| = \langle e_\mu - e_{\mu'}, e_\mu - e_{\mu'}\rangle^{1/2} = (\langle e_\mu, e_\mu\rangle + \langle e_{\mu'}, e_{\mu'}\rangle)^{1/2} = (\|e_\mu\|^2 + \|e_{\mu'}\|^2)^{1/2} = \sqrt{2},$$
for $\mu \neq \mu'$, so it follows that $\{e_\mu\}$ is countable (see (v) in Theorem 582). Note that $\{e_\mu\}$ cannot be finite. Indeed, if $\{e_i\}_{i=1}^n$ is an orthonormal basis of $H$ for some $n \in \mathbb{N}$, the space $L := \operatorname{span}\{e_i : i = 1, 2, \ldots, n\}$ is finite-dimensional, hence a closed proper subspace of $H$. However, $\operatorname{span}\{e_i : i = 1, 2, \ldots, n\} = \overline{\operatorname{span}}\{e_i : i = 1, 2, \ldots, n\} = H$ (see Remark 972), a contradiction with the fact that $H$ is infinite-dimensional.
(iii) Due to the fact that $\overline{\operatorname{span}}^{\|\cdot\|}\{e_n : n \in \mathbb{N}\} = H$ (see Remark 972) we get, for a given $x \in H$, that $\operatorname{dist}(x, \operatorname{span}\{e_1, e_2, \ldots, e_n\}) \to 0$ as $n \to \infty$ (i.e., $\sum_{i=1}^n |\langle x, e_i\rangle|^2 \to \|x\|^2$ as $n \to \infty$, see Eq. (11.36)). In other words, $\|x - x_n\| \to 0$ as $n \to \infty$, where, for $n \in \mathbb{N}$, $x_n$ is the point in $\operatorname{span}\{e_1, e_2, \ldots, e_n\}$ at minimum distance from $x$ (i.e., $x_n = \sum_{i=1}^n \langle x, e_i\rangle e_i$ according to Remark 975.1).
(iv) follows from (iii). Indeed, the countable set
$$\Big\{ \sum_{i=1}^n \lambda_i e_i : \lambda_i \in \mathbb{K} \text{ with rational real and imaginary parts},\ i = 1, 2, \ldots, n,\ n \in \mathbb{N} \Big\}$$
is dense in $\operatorname{span}\{e_i : i \in \mathbb{N}\}$, and we showed that this last space is dense in $H$.
(v) That the sequence $\{\langle x, e_i\rangle\}_{i=1}^\infty$ belongs to $\ell_2$ is a consequence of Corollary 976, so $F$ maps $H$ into $\ell_2$, obviously in a linear way. To prove that $F$ is onto, let $\{c_i\}_{i=1}^\infty$ be a sequence in $\ell_2$, and let $s_n := \sum_{i=1}^n c_i e_i$ for $n \in \mathbb{N}$. The sequence $\{s_n\}_{n=1}^\infty$ is $\|\cdot\|$-Cauchy, since $\big\|\sum_{i=n}^m c_i e_i\big\|^2 = \sum_{i=n}^m |c_i|^2$ for $1 \le n \le m$. Due to the fact that $H$ is complete, there exists $x \in H$ such that $s_n \to x$ in the norm $\|\cdot\|$. By Corollary 959, for $i \in \mathbb{N}$ we have $\langle s_n, e_i\rangle \to \langle x, e_i\rangle$ as $n \to \infty$. Since $\langle s_n, e_i\rangle = c_i$ for $n \ge i$, this shows that the sequence of Fourier coefficients of $x$ is $\{c_i\}_{i=1}^\infty$. That $F$ is an isometry follows from (11.40).
(vi) is a consequence of (i) and (v) above.
An orthonormal basis in a separable infinite-dimensional Hilbert space is a particular case of what is called a Schauder basis (named after the Polish mathematician J. Schauder) in a separable Banach space $X$, i.e., a sequence $\{e_i\}_{i=1}^\infty$ in $X$ such that every element $x \in X$ can be uniquely written as $x = \sum_{i=1}^\infty a_i e_i$ for some scalars $a_i$, $i \in \mathbb{N}$ (the convergence of this series being in the norm). This was part of the statement in Theorem 977 for the Hilbert space case. For some examples of Schauder bases in Banach spaces, see Exercise 13.566. We note that there are separable infinite-dimensional Banach spaces $X$ for which there is no Schauder basis. This important result was proved by the Swedish mathematician P. Enflo in 1974.
Example 980 The $L_2[0, 2\pi]$ space (continued, see Example 963.2). The set
$$\Big\{ \frac{e^{inx}}{\sqrt{2\pi}} \Big\}_{n \in \mathbb{Z}}, \quad \text{where } x \in [0, 2\pi], \tag{11.41}$$
of functions in $L_2[0, 2\pi]$ was introduced by Eq. (9.19), and it was proved thereafter that it is an orthonormal system in the space of continuous $2\pi$-periodic functions on $\mathbb{R}$ endowed with the scalar product given by Eq. (9.13), in particular in $L_2[0, 2\pi]$. In the next result we shall prove something more.
Proposition 981 The system
$$\Big\{ \frac{e^{inx}}{\sqrt{2\pi}} \Big\}_{n \in \mathbb{Z}}, \quad x \in [0, 2\pi], \tag{11.42}$$
is an orthonormal basis of $L_2[0, 2\pi]$.
Proof Put $e_n := (1/\sqrt{2\pi})e^{inx}$ for $n \in \mathbb{Z}$. Let $f \in L_2[0, 2\pi]$ be such that $\langle f, e_n\rangle = 0$ for all $n \in \mathbb{Z}$. This clearly implies that
$$\int_{[0,2\pi]} f(x)\,dx = \int_{[0,2\pi]} f(x) \sin nx\,dx = \int_{[0,2\pi]} f(x) \cos nx\,dx = 0 \quad \text{for all } n \in \mathbb{N}, \tag{11.43}$$
due to the fact that $e^{inx} = \cos nx + i \sin nx$ and $e^{-inx} = \cos nx - i \sin nx$, for all $x \in [0, 2\pi]$ and all $n = 0, 1, 2, \ldots$. We shall prove that $f = 0$ (a.e.). This will conclude, by using Theorem 969, that $\overline{\operatorname{span}}\{e_n : n \in \mathbb{Z}\} = L_2[0, 2\pi]$. The following argument applies to a real-valued function defined on $[0, 2\pi]$. To conclude the proof it will be enough to consider, separately, the real and imaginary parts of a complex-valued function.
Assume first that $f$ is continuous on $[0, 2\pi]$. Let $\{g_n\}_{n=1}^\infty$ be the approximate identity on $[0, 2\pi]$ constructed in (e) in Exercise 13.227. Then the sequence $\{f * g_n\}_{n=1}^\infty$ converges uniformly to $f$ on $[0, 2\pi]$, where $(f * g_n)(x) := (2\pi)^{-1} \int_{[0,2\pi]} f(t) g_n(x - t)\,dt$ for all $x \in [0, 2\pi]$ and all $n \in \mathbb{N}$ (see (b) in Exercise 13.227). Since $g_n$ is a trigonometric polynomial, it follows from (11.43) that $f * g_n = 0$ for all $n \in \mathbb{N}$, hence $f = 0$.
Let $f$ be now an arbitrary element in $L_2[0, 2\pi]$. Note that $f \in L_1[0, 2\pi]$ (see Remark 830). Assume that $f$ satisfies, additionally, (11.43) above. Put then
$$F(x) := \int_{[0,x]} f(t)\,dt, \quad x \in [0, 2\pi].$$
In view of Proposition 768, $F$ is an absolutely continuous function on $[0, 2\pi]$, and due to the fact that $\int_{[0,2\pi]} f(t)\,dt = 0$ (see (11.43)), the function $F$ satisfies $F(0) = F(2\pi) = 0$. Then, by using integration by parts (Proposition 800) it is easy to see that
$$\int_{[0,2\pi]} F(t) \sin nt\,dt = \int_{[0,2\pi]} F(t) \cos nt\,dt = 0, \quad \text{for } n \in \mathbb{N}.$$
Put $C_0 := \langle F, e_0\rangle = (1/\sqrt{2\pi}) \int_{[0,2\pi]} F(x)\,dx$. Then
$$\Big\langle F - \frac{1}{\sqrt{2\pi}} C_0,\ e_n \Big\rangle = 0 \quad \text{for all } n \in \mathbb{Z}.$$
It follows from the first part of the proof that $F - (1/\sqrt{2\pi})C_0 = 0$, i.e., $F$ is a constant function. Due to the fact that $F(0) = 0$ we get $F = 0$. Proposition 770 concludes that $f = 0$ (a.e.).
Corollary 982 The Hilbert space $(L_2[0, 2\pi], \langle\cdot,\cdot\rangle_2)$ is separable.
Proof This is a consequence of the fact that $L_2[0, 2\pi]$ has a countably infinite orthonormal basis (Proposition 981). Then (iv) in Theorem 977 concludes that $(L_2[0, 2\pi], \|\cdot\|_2)$ is separable.
The orthonormal basis (11.42) is used in frequency decompositions of periodic functions. Remark 979.1 notes that the system (11.42), written, arbitrarily, as a
sequence, is an orthonormal basis in $L_2[0, 2\pi]$, to which Theorem 977 may apply. It is customary to organize the two-sided sequence (11.42) in such a way that the partial sums to be considered are $\sum_{i=-n}^{n} c_i e_i$, and the associated “series” written as $\sum_{i=-\infty}^{+\infty} c_i e_i$. Following this convention, and according to Theorem 977, every $f \in L_2[0, 2\pi]$ satisfies
$$f = \sum_{n=-\infty}^{+\infty} c_n e^{inx}, \tag{11.44}$$
where the convergence of the series in (11.44) is in $\|\cdot\|_2$, and
$$c_n = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx}\,dx \quad \text{for } n \in \mathbb{Z}. \tag{11.45}$$
Parseval identity (11.40) appears in this particular case as
$$\sum_{n=-\infty}^{+\infty} |c_n|^2 = \frac{1}{2\pi} \|f\|_2^2, \tag{11.46}$$
for $f \in L_2[0, 2\pi]$. It follows from the previous discussion that the system
$$\Big\{ \frac{1}{\sqrt{2\pi}}, \frac{\cos x}{\sqrt{\pi}}, \frac{\sin x}{\sqrt{\pi}}, \frac{\cos 2x}{\sqrt{\pi}}, \frac{\sin 2x}{\sqrt{\pi}}, \ldots \Big\}, \quad x \in [0, 2\pi], \tag{11.47}$$
is an orthonormal basis of $L_2[0, 2\pi]$. Given $f \in L_2[0, 2\pi]$, we have
$$f = \frac{a_0}{2} + \sum_{k=1}^\infty (a_k \cos kx + b_k \sin kx), \tag{11.48}$$
where again the convergence of the series in (11.48) is in $\|\cdot\|_2$, and
$$a_k := \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx\,dx, \quad k = 0, 1, 2, \ldots,$$
$$b_k := \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx\,dx, \quad k = 1, 2, \ldots$$
Now, Parseval identity (11.40) appears, in terms of the two sequences $\{a_n\}_{n=0}^\infty$ and $\{b_n\}_{n=1}^\infty$, as
$$\frac{a_0^2}{2} + \sum_{n=1}^\infty (a_n^2 + b_n^2) = \frac{1}{\pi} \|f\|_2^2. \tag{11.49}$$
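As an illustration of (11.49), one may take the real-valued function $f(x) = x$ on $[0, 2\pi]$, for which a hand computation gives $a_0 = 2\pi$, $a_k = 0$ and $b_k = -2/k$ for $k \ge 1$, while $\frac{1}{\pi}\|f\|_2^2 = 8\pi^2/3$. The sketch below approximates the coefficients by a composite midpoint rule; the choice of $f$, the quadrature rule, the number of nodes, and all function names are assumptions made for this example only.

```python
import math

def integrate(g, a, b, m=4000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (j + 0.5) * h) for j in range(m))

def real_fourier_coefficients(f, n_max):
    """Return ([a_0, ..., a_N], [b_1, ..., b_N]) from the formulas for a_k and b_k."""
    two_pi = 2.0 * math.pi
    a = [integrate(lambda x, k=k: f(x) * math.cos(k * x), 0.0, two_pi) / math.pi
         for k in range(n_max + 1)]
    b = [integrate(lambda x, k=k: f(x) * math.sin(k * x), 0.0, two_pi) / math.pi
         for k in range(1, n_max + 1)]
    return a, b

def f(x):
    return x

N = 50
a, b = real_fourier_coefficients(f, N)

# Partial sum of the left-hand side of (11.49), truncated at n = N.
lhs_partial = a[0] ** 2 / 2 + sum(a[k] ** 2 + b[k - 1] ** 2 for k in range(1, N + 1))
# Right-hand side of (11.49): (1/pi) * ||f||_2^2.
rhs = integrate(lambda x: f(x) ** 2, 0.0, 2.0 * math.pi) / math.pi

print(lhs_partial, rhs, 8 * math.pi ** 2 / 3)
```

The truncated sum stays below the right-hand side, as the Bessel inequality predicts, and the gap is the tail $\sum_{k > N} 4/k^2$, which is smaller than $4/N$.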
Fig. 11.34 The first elements of the Haar basis
The two systems (11.42) and (11.47) are related, as we mentioned above, by the so-called Euler formulas: $e^{ix} = \cos x + i \sin x$, and $\cos x = (1/2)(e^{ix} + e^{-ix})$, $\sin x = (1/2i)(e^{ix} - e^{-ix})$, for $x \in \mathbb{R}$. ♦
Example 983 Let $L_2[0, 1]$ denote the Hilbert space of all measurable complex-valued square integrable functions on the interval $[0, 1]$. Let the functions $h_i$ (called Haar wavelets after the Hungarian mathematician A. Haar) be defined on $[0, 1]$, for $i \in \mathbb{N}$, as follows (see Fig. 11.34): $h_0(x) = 1$ for $x \in [0, 1]$; $h_1(x) = 1$ for $x \in [0, \frac{1}{2})$ and $h_1(x) = -1$ for $x \in [\frac{1}{2}, 1]$; $h_2(x) = 1$ for $x \in [0, \frac{1}{4})$, $h_2(x) = -1$ for $x \in [\frac{1}{4}, \frac{1}{2}]$, $h_2(x) = 0$ for $x \in (\frac{1}{2}, 1]$; $h_3(x) = 1$ for $x \in [\frac{1}{2}, \frac{3}{4})$, $h_3(x) = -1$ for $x \in (\frac{3}{4}, 1]$ and $h_3(x) = 0$ elsewhere, etc. Fix $p \in [1, \infty)$. Clearly, the subset $\{h_i : i \in \mathbb{N}\}$ of $L_p[0, 1]$ is linearly independent. Since $H := \operatorname{span}\{h_i : i \in \mathbb{N}\}$ contains the characteristic functions of the dyadic intervals, we get $\overline{H}^{L_p} = L_p[0, 1]$. In $L_2[0, 1]$, the set $\{h_i : i \in \mathbb{N}\}$ constitutes, after normalization, an orthonormal basis. To check orthogonality is easy: if $h_i$ and $h_j$ have disjoint supports, $\int_{[0,1]} h_i h_j$ clearly vanishes. If, on the contrary, their supports intersect, one of them is a subset of the other, and again $\int_{[0,1]} h_i h_j = 0$. Thus, it follows from the linear density of $\operatorname{span}\{h_i : i \in \mathbb{N}\}$ in $L_2[0, 1]$ that $\{h_i : i \in \mathbb{N}\}$ is, after normalization, an orthonormal basis there. Haar wavelets are used in signal decompositions in time and frequency. ♦
Example 984 The Hilbert cube $Q$ is the subset of $\ell_2$ defined by
$$Q = \Big\{ x = (x_i) \in \ell_2 : |x_i| \le \frac{1}{2^i} \text{ for all } i \in \mathbb{N} \Big\}.$$
The set $Q$ is compact in $(\ell_2, \|\cdot\|_2)$. Indeed, given $\varepsilon > 0$ there is $n_0 \in \mathbb{N}$ such that $\sum_{i=n_0+1}^\infty \frac{1}{2^{2i}} |x_i| \le \varepsilon$ for all $x \in Q$, as $|x_i| \le 1$ for all $x \in Q$ and all $i$. Then use finite $\varepsilon$-nets (see Theorems 615 and 620) in $\mathbb{R}^{n_0}$ to finish the proof. ♦
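The orthogonality argument for the Haar functions in Example 983 can be checked exactly on a dyadic grid. The following sketch assumes the usual indexing $h_{2^j + k}$ (level $j$, position $k$), which matches $h_1, h_2, h_3$ as listed in the example up to the values at finitely many endpoints; the function names and the number of levels are hypothetical choices. Integrals are computed with exact rational arithmetic, since every function involved is constant on the grid cells.

```python
from fractions import Fraction

def haar(i, x):
    """Value of the (unnormalized) Haar function h_i at x in [0, 1).

    Indexing assumption: h_0 = 1 and, for i = 2**j + k with 0 <= k < 2**j,
    h_i is +1 on the left half of [k/2**j, (k+1)/2**j) and -1 on the right half.
    """
    if i == 0:
        return 1
    j = i.bit_length() - 1          # level: 2**j <= i < 2**(j+1)
    k = i - 2 ** j                  # position within the level
    left = Fraction(k, 2 ** j)
    mid = left + Fraction(1, 2 ** (j + 1))
    right = left + Fraction(1, 2 ** j)
    if left <= x < mid:
        return 1
    if mid <= x < right:
        return -1
    return 0

LEVELS = 3
COUNT = 2 ** LEVELS                 # use h_0, ..., h_7
CELLS = 2 ** LEVELS                 # each of these h_i is constant on cells of width 1/8
midpoints = [Fraction(2 * m + 1, 2 * CELLS) for m in range(CELLS)]

def integral_of_product(i, j):
    """Exact integral over [0, 1] of h_i * h_j (a step function on the cells)."""
    return sum(Fraction(haar(i, x) * haar(j, x), CELLS) for x in midpoints)

gram = [[integral_of_product(i, j) for j in range(COUNT)] for i in range(COUNT)]

for row in gram:
    print([str(entry) for entry in row])
```

The Gram matrix is diagonal, and its diagonal entries $\int_{[0,1]} h_i^2$ are the squares of the normalization constants mentioned in the example (for instance $1/2$ for $h_2$ and $h_3$).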
11.4.2 An Application to the Uniform Convergence of the Fourier Series
The space $C_P[0, 2\pi]$ of all scalar-valued continuous $2\pi$-periodic functions defined on $\mathbb{R}$ is obviously a closed subspace of the normed space $(C[0, 2\pi], \|\cdot\|_\infty)$. This
last space is complete (Example 573.4, the scalar-valued version is similar) and separable (Example 586.4, also for the scalar-valued case), so $(C_P[0, 2\pi], \|\cdot\|_\infty)$ is also a separable Banach space. The proof of Theorem 861 above is based on some of the results in previous sections (they were stated for real-valued functions, and extend naturally to complex-valued functions just by applying them to their real and imaginary parts). We write down again the statement for completeness.
Theorem 985 Let $f$ be a scalar-valued Lipschitz $2\pi$-periodic function. Then the Fourier series of $f$ converges to $f$ uniformly on $\mathbb{R}$.
Proof Every Lipschitz function is of bounded variation (see Propositions 436 and 444). Thus, Corollary 749 shows that $f'$ (that exists (a.e.) thanks to Corollary 433) belongs to $L[0, 2\pi]$. Moreover, if $f$ is $C$-Lipschitz for some $C > 0$, then at points $x$ where $f'(x)$ exists we have, clearly, $|f'(x)| \le C$, hence $|f'(x)|^2 \le C^2$. The function $|f'|^2$ is then measurable (see (b)3 in Proposition 402 and Corollary 407) and bounded on the bounded interval $[0, 2\pi]$, hence Lebesgue integrable (see Remark 752.2), and so $f' \in L_2[0, 2\pi]$. For $n \in \mathbb{Z}$, let
$$C_n := \frac{1}{2\pi} \int_{[0,2\pi]} f'(x) e^{-inx}\,dx \tag{11.50}$$
be its $n$-th Fourier coefficient (see formula (9.22)). Bessel’s inequality (Corollary 976) shows then that $\sum_{n \in \mathbb{Z}} |C_n|^2 < +\infty$. Observe that the function $e^{-inx}$ is Lipschitz on $[0, 2\pi]$, due to the fact that it has a bounded derivative there. In particular, it is absolutely continuous on $[0, 2\pi]$ (see Proposition 444). We can then use the integration-by-parts formula (Proposition 800) in (11.50). Due to the fact that $e^{inx}$ is a $2\pi$-periodic function and that the continuity of $f$ implies $f(2\pi) = f(0)$, we get $C_n = i n c_n$ for all $n \in \mathbb{Z}$, where $c_n$ denotes the $n$-th Fourier coefficient of $f$; in particular, $|C_n| = |n|\,|c_n|$. Using the Cauchy–Schwarz inequality (Theorem 958) we get
$$\sum_{n=1}^\infty |c_n| = \sum_{n=1}^\infty \frac{1}{n} |C_n| \le \Big( \sum_{n=1}^\infty \frac{1}{n^2} \Big)^{1/2} \Big( \sum_{n=1}^\infty |C_n|^2 \Big)^{1/2} < +\infty.$$
The same argument shows that $\sum_{n=-\infty}^{-1} |c_n| < +\infty$. Since $|e^{inx}| = 1$ for all $x \in \mathbb{R}$ and all $n \in \mathbb{Z}$, this proves that the sequence $\{s_n\}_{n=1}^\infty$ of partial sums $s_n := \sum_{k=-n}^{n} c_k e^{ikx}$ of the Fourier series of $f$ is Cauchy in the Banach space $(C_P[0, 2\pi], \|\cdot\|_\infty)$, hence it converges (to a continuous $2\pi$-periodic function $g$)$^3$. Since the averages of the sequence $\{s_n\}$ form a sequence that $\|\cdot\|_\infty$-converges to $f$ (Theorem 859), it follows from Exercise 13.101 that $f = g$.
$^3$ The argument is, in fact, a generalized version of the one used for proving the Weierstrass M-test (Theorem 473).
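The mechanism in the proof of Theorem 985 (summability of $\sum |c_n|$ forces uniform convergence) can be observed on a concrete Lipschitz function. The sketch below assumes the $2\pi$-periodic triangle wave $f(x) = |x - \pi|$ on $[0, 2\pi]$, whose coefficients, computed by hand for this example, are $c_0 = \pi/2$, $c_n = 2/(\pi n^2)$ for odd $n$, and $c_n = 0$ for even $n \neq 0$. The function names and the evaluation grid are hypothetical.

```python
import math

def f(x):
    """2*pi-periodic triangle wave equal to |x - pi| on [0, 2*pi]; it is 1-Lipschitz."""
    return abs((x % (2 * math.pi)) - math.pi)

def c(n):
    """Closed-form Fourier coefficient c_n of f (assumed, computed by hand)."""
    if n == 0:
        return math.pi / 2
    return 2 / (math.pi * n * n) if n % 2 else 0.0

def partial_sum(x, n):
    """s_n(x) = sum_{k=-n}^{n} c_k e^{ikx}; real here because c_k = c_{-k}."""
    return c(0) + sum(2 * c(k) * math.cos(k * x) for k in range(1, n + 1))

def tail(n):
    """sum_{|k| > n} |c_k|, using that the sum of 1/k^2 over odd k >= 1 is pi^2 / 8."""
    head = sum(1 / (k * k) for k in range(1, n + 1, 2))
    return (4 / math.pi) * (math.pi ** 2 / 8 - head)

grid = [2 * math.pi * m / 400 for m in range(401)]
results = []
for n in (1, 5, 25, 125):
    sup_error = max(abs(f(x) - partial_sum(x, n)) for x in grid)
    results.append((n, sup_error, tail(n)))
    print(n, sup_error, tail(n))
```

On this grid the observed sup-norm error never exceeds the tail $\sum_{|k| > n} |c_k|$, which is the bound used in the proof, and it coincides with that tail at $x = 0$, where all the cosines equal 1.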
11.4.3 Complements to Hilbert Spaces
In this subsection we shall see the form that some previous results in the general framework of Banach spaces take in the important case of Hilbert spaces. Some of them, due to the introductory level of this “Excursion,” will be formulated only for separable Hilbert spaces. The program to be developed is the following: we shall stick to the same order of presentation already used. In all items below, $H$ denotes a Hilbert space. In some of them, we may consider $(H, \|\cdot\|)$ as a (complex) Banach space; then, $(H_{\mathbb{R}}, \|\cdot\|)$ denotes its associated real Banach space (see Sect. 11.3.1).
1. Corollary 931 ensures the existence, for each $x \in S_H$, of a supporting functional to $B_H$ at $x$ (see Sect. 11.3.4). In this case, the supporting functional is unique. In other words, the norm of $H_{\mathbb{R}}$ is Gâteaux differentiable at any point in $H_{\mathbb{R}} \setminus \{0\}$ (see Remarks 940.1 and 940.4).
2. Even more, the norm of $H_{\mathbb{R}}$ is Fréchet differentiable at any point in $H_{\mathbb{R}} \setminus \{0\}$.
3. Regarding Corollary 932, the mapping $j$ maps $H$ onto $H^{**}$.
4. Related to Remark 934, now every continuous linear functional on $H$ has a (unique) support point in $B_H$. This shows that the absolute value of every continuous linear functional (or any continuous linear real functional) attains its supremum on $B_H$.
5. Regarding projections, every closed linear subspace of $H$ is complemented with a norm-1 projection.
This program will be developed by using, in particular, a special tool: the existence in every Hilbert space of an orthonormal basis (Theorem 977). We shall show a central result, that the mapping $J$ from $H$ into $H^*$ that sends $x \in H \setminus \{0\}$ to $\|x\|$-times the unique supporting functional in $S_{H^*}$, and maps 0 to 0, is an antilinear isometry from $H$ onto $H^*$. Its inverse $J^{-1}$ maps any continuous linear functional $f \neq 0$ into $\|f\|$-times the unique support point of $f$ in $S_H$. The mapping $J$ will allow us to define a scalar product $\langle\cdot,\cdot\rangle$ on $H^*$. It induces the dual norm, and $(H^*, \langle\cdot,\cdot\rangle)$ becomes a Hilbert space “antilinearly” isometric to $(H, \langle\cdot,\cdot\rangle)$.
Let us start by item 1 above. If $(X, \|\cdot\|)$ is a Banach space not reduced to $\{0\}$, the function $x \mapsto \|x\|$, as a function on $X_{\mathbb{R}}$, is never Gâteaux differentiable at the origin (this was proven in Remark 940.1). In general, the norm may or may not be differentiable at nonzero points (see Example 942 for a Banach space where the norm is nowhere Fréchet differentiable). In case of Hilbert spaces, the situation is pleasant: The norm $\|\cdot\|$, as a mapping from $H_{\mathbb{R}}$ into $\mathbb{R}$, is Gâteaux differentiable at every nonzero point. This is the content of Proposition 986 given below. We shall prove later (Proposition 1000) that, even more, the norm is, in fact, Fréchet differentiable at every nonzero point.
Proposition 986 The norm of a Hilbert space $H$, as a mapping from $H_{\mathbb{R}}$ into $\mathbb{R}$, is Gâteaux differentiable at every nonzero point.
Proof Let us prove first that $\|\cdot\|^2$, as a mapping from $H_{\mathbb{R}}$ into $\mathbb{R}$, is Gâteaux differentiable at every $x \in H$. On the way, we shall show that the Gâteaux derivative of
$\|\cdot\|^2$ at $x$ is the element in $H_{\mathbb{R}}^*$ given by $h \mapsto \langle h, x\rangle + \langle x, h\rangle$, for $h \in H$. Indeed, for $h \in H$ and $t \neq 0$,
$$\Big| \frac{\|x + th\|^2 - \|x\|^2}{t} - \langle h, x\rangle - \langle x, h\rangle \Big| = \Big| \frac{\langle x, x\rangle + t\langle h, x\rangle + t\langle x, h\rangle + t^2\langle h, h\rangle - \|x\|^2}{t} - \langle h, x\rangle - \langle x, h\rangle \Big|$$
$$= \big| \langle h, x\rangle + \langle x, h\rangle + t\langle h, h\rangle - \langle h, x\rangle - \langle x, h\rangle \big| \to 0 \quad \text{as } t \to 0.$$
This shows the assertion. Now, in order to prove the Gâteaux differentiability of $\|\cdot\|$ at $x$, for $x \in H$, $x \neq 0$, it is enough to compose $\|\cdot\|^2$ with the real-valued differentiable function $\sqrt{y}$ (defined on $(0, +\infty)$).
Remark 987 The precise value of the Gâteaux derivative of the norm $\|\cdot\|$ on $H_{\mathbb{R}}$ at a point $x \in H$, $x \neq 0$, is $\operatorname{Re} Jx / \|x\|$, where $Jx$ is the (unique) functional in $\|x\| S_{H^*}$ that supports $\|x\| B_H$ at $x$ and $\operatorname{Re} Jx$ denotes its real part, a (unique) real continuous linear functional in $\|x\| S_{H_{\mathbb{R}}^*}$ that supports $\|x\| B_{H_{\mathbb{R}}}$ at $x$. Analytically, $Jx(h) = \langle h, x\rangle$ for $h \in H$, and so the Gâteaux derivative of $\|\cdot\|$ on $H_{\mathbb{R}}$ at $x \neq 0$ is the real continuous linear mapping $h \mapsto \|x\|^{-1} \operatorname{Re}\langle h, x\rangle$, $h \in H$. This will be justified in Lemma 992 and Theorem 993. ®
The following consequence should be compared with Proposition 990 below.
Corollary 988 Let $H$ be a Hilbert space. Then, every $x \in S_H$ has a unique supporting functional $f \in S_{H^*}$.
Proof This follows from Proposition 986 and Remark 940.4.
Remark 989 By homogeneity, Corollary 988 implies that, given a Hilbert space $H$, for any $x \in H$, $x \neq 0$, there exists a unique $f \in \|x\| S_{H^*}$ such that $f(x) = \|x\|^2$. Compare with Remark 991.1. ®
In Hilbert spaces there is a kind of symmetrization (or dualization) of Corollary 931. The first conclusion in the following result (the only one that makes sense in a general Banach space) may fail in that context. For a brief discussion on this pathology see Remark 934. Note the symmetry with the situation in Corollary 988 above.
Proposition 990 Let $H$ be a Hilbert space. Then, for every continuous linear functional $f$ on $H$ such that $\|f\| = 1$ there exists a unique point $x \in S_H$ such that $f(x) = 1$, and $x \perp f^{-1}(0)$, i.e., $f$ supports $B_H$ at the (unique) point $x$. Even more, the diameter of the sections $S(f, \delta) := \{x \in B_H : \operatorname{Re} f(x) > 1 - \delta\}$, $0 < \delta < 1$, tends to 0 as $\delta \to 0$ (see Fig. 11.35 for a real Hilbert space).
Proof Apply 2 in Lemma 967 (after a translation) to find the (unique) point $x \in f^{-1}(1)$ at minimum distance $d$ from 0. It satisfies $x \perp f^{-1}(0)$. By Exercise 13.551 (applied to the translate $f^{-1}(1)$ of $f^{-1}(0)$) we get $d = 1$. Thus, $\|x\| = 1$. To prove the second part, take $y \in S(f, \delta)$ and let $z := (x + y)/2$ and $u := x - z$. Then $x = z + u$ and $y = z - u$. Note that $1 - \delta \le \operatorname{Re} f(z) \le \|\operatorname{Re} f\| \cdot \|z\| = \|z\|$; hence, by
Fig. 11.35 The supporting functional in a real Hilbert space (Proposition 990)
the parallelogram equality, $2 \ge \|x\|^2 + \|y\|^2 = 2\|z\|^2 + 2\|u\|^2 \ge 2(1 - \delta)^2 + 2\|u\|^2$ and we get that $\|u\|^2 \le 2\delta - \delta^2$. This shows the statement.
Remark 991
1. Proposition 990 can be formulated in the following way: For every $f \in H^*$, $f \neq 0$, there exists a unique $x_1$ in $H$ such that $\|x_1\| = \|f\|$ and $f(x_1) = \|f\|^2$. This result follows easily by homogeneity. Compare with Remark 989.
2. Uniqueness in Proposition 990 follows too from Remark 960. ®
In the following lemma and in Theorem 993, we shall define a mapping $J$ from $H$ into $H^*$. In order to avoid cumbersome notation, this time the image of an element $x \in H$ (itself a function) will be denoted alternatively by $J(x)$ or $Jx$.
Lemma 992 Let $H$ be a Hilbert space. For $x \in H$ define a scalar-valued mapping $Jx$ on $H$ by the formula
$$Jx(h) := \langle h, x\rangle, \quad \text{for } h \in H. \tag{11.51}$$
Then, the mapping $J$ that sends $x \in H$ to $Jx$ has the following properties:
(i) It is a one-to-one mapping from $H$ onto $H^*$.
(ii) It is an isometry, i.e., $\|Jx\| = \|x\|$ for all $x \in H$.
(iii) For $x, y \in H$, and for a scalar $\lambda$, we have $J(x + y) = Jx + Jy$, and $J(\lambda x) = \overline{\lambda} Jx$.
A mapping between normed spaces with property (iii) in Lemma 992 is said to be antilinear.
Proof of Lemma 992 From the properties of the scalar product, it is clear that, for every $x \in H$, the mapping $Jx$ is linear. The Cauchy–Schwarz inequality (11.23) shows that $|Jx(h)| \le \|x\| \cdot \|h\|$ for every $h \in H$, hence $Jx$ is continuous (thus an element in $H^*$) and $\|Jx\| \le \|x\|$ for every $x \in H$. Obviously, $J0 = 0$. If $x \in H$ is not zero, then $Jx(x) = \langle x, x\rangle = \|x\|^2$, hence $Jx(x/\|x\|) = \|x\|$, and we get $\|Jx\| \ge \|x\|$. From this and the inequality above it follows that $\|Jx\| = \|x\|$. Note, too, that $J : H \to H^*$ is antilinear, in the sense
that $J(x + y) = Jx + Jy$ and $J(\lambda x) = \overline{\lambda} Jx$ for all $x, y \in H$ and all scalars $\lambda$. As a consequence, $J : H \to H^*$ is an isometry (and in particular, $J$ is one-to-one): Indeed, if $Jx = Jy$ for some $x, y \in H$, then $J(x - y) = Jx - Jy = 0$, and $\|x - y\| = \|J(x - y)\| = \|Jx - Jy\| = 0$, so $x = y$. From this it follows that, for $x \neq 0$, $Jx$ is the (unique, see Corollary 988 and Remark 989) element in $\|x\| S_{H^*}$ supporting $\|x\| B_H$ at $x$. As a consequence, $J : H \to H^*$ is an onto mapping. Indeed, if $f \in H^*$, $f \neq 0$, find, by Remark 991.1, a (unique) element $x_1 \in H$ such that $\|x_1\| = \|f\|$ and $f(x_1) = \|f\|^2$. Clearly, $f$ is the (unique) element in $\|f\| S_{H^*}$ supporting $\|f\| B_H$ at $x_1$, hence $Jx_1 = f$.
Property (i) in Lemma 992 ensures that the mapping $J$ has an inverse $J^{-1} : H^* \to H$. It is clear that $J^{-1}$ has the corresponding properties (i), (ii), and (iii) in the same lemma, and that, for $f \in H^*$, $f \neq 0$, $f$ supports $\|f\| B_H$ at the (unique) point $J^{-1} f$ (see Proposition 990 and Remark 991.1). Define on $H^*$ an inner product $\langle\cdot,\cdot\rangle$ by
$$\langle f, g\rangle := \langle J^{-1} g, J^{-1} f\rangle, \quad f, g \in H^*. \tag{11.52}$$
It is a matter of checking that $\langle\cdot,\cdot\rangle$ on $H^*$ is indeed an inner product, and that the norm on $H^*$ induced by $\langle\cdot,\cdot\rangle$ via the formula (11.24) is the dual norm. Since $(H^*, \|\cdot\|)$ is complete (see Corollary 904), the space $(H^*, \langle\cdot,\cdot\rangle)$ is a Hilbert space. Let us summarize the previous observations in the following important result. It is a consequence of Lemma 992 and what was said in the previous paragraph. Its conclusion can be condensed in the statement that $H$ and $H^*$ are Hilbert antilinearly isometric.
Theorem 993 (Riesz) Let $H$ be a Hilbert space. Then the mapping $x \mapsto Jx$ from $H$ onto $H^*$ given by $Jx(h) := \langle h, x\rangle$ for all $x, h \in H$ is an antilinear isometry from $H$ onto $H^*$. Its inverse mapping $J^{-1}$ is also an antilinear isometry from $H^*$ onto $H$. The space $H^*$ is a Hilbert space when endowed with the inner product $\langle f, g\rangle := \langle J^{-1} g, J^{-1} f\rangle$, for $f, g \in H^*$. This inner product induces the dual norm on $H^*$. The two Hilbert spaces $H$ and $H^*$ are antilinearly isometric, in the sense that $J$ above has the stated properties: it is an antilinear isometry from $H$ onto $H^*$, and $\langle x, y\rangle = \langle Jy, Jx\rangle$ for all $x, y \in H$.
Corollary 994 If $H$ is a Hilbert space, the mapping $j : H \to H^{**}$ defined in Corollary 932 maps $H$ onto $H^{**}$.
Proof Lemma 992, when applied to the Hilbert space $H$, gives an antilinear onto isometry $J_1 : H \to H^*$, and when applied to the Hilbert space $H^*$ gives an antilinear onto isometry $J_2 : H^* \to H^{**}$. Fix $F \in H^{**}$ and let $f := J_2^{-1}(F)$, $x := J_1^{-1}(f)$ (see Fig. 11.36). Given $g \in H^*$, put $y := J_1^{-1}(g)$. Then $F(g) = J_2 f(g) = \langle g, f\rangle$, and $g(x) = J_1 y(x) = \langle x, y\rangle = \langle J_1 y, J_1 x\rangle = \langle g, f\rangle$. This shows that $g(x) = F(g)$ for all $g \in H^*$, hence $j(x) = F$ and the mapping $j : H \to H^{**}$ is onto.
Remark 995 Let us consider the action of the mapping $J : H \to H^*$ defined in Lemma 992 on any orthonormal basis $\{e_n\}_{n=1}^\infty$ of a separable Hilbert space $H$ (its
Fig. 11.36 The mappings and vectors in Corollary 994. The conclusion is that $j(H) = H^{**}$
existence and the fact that it is countably infinite are guaranteed by (i) and (ii) in Theorem 977). Note first that J en (em ) = δn,m for n, m ∈ N. The definition of the scalar product in H ∗ (see formula (11.52)) and the fact that J is an isometry state that the system {J (en )}∞ n=1 is orthonormal. Indeed, J en , J em = em , en = δn,m for all n, m ∈ N. Let us prove that {J en }∞ n=1 is a basis. Assume not. Then Y := span{J en : n ∈ N} is a proper closed subspace of H ∗ . By Lemma 967 and Theorem 969 we can find a function f ∈ SH ∗ such that f ∈ Y ⊥ . Find J −1 (f ) according to Theorem 993. We have en , J −1 f = f , J en = 0 for all n ∈ N, and this implies J −1 f = 0, a contradiction with the fact that J −1 f = f = 1. ® Remark 996 The translation of Theorem 993 to the case of the Hilbert space 2 is the following: The antilinear isometry J from 2 onto ∗2 that sends an element x = (xn ) ∈ 2 to the element J x ∈ ∗2 is defined by J (x)(h) = h, x = ∞ n=1 hn xn for h = (hn ) ∈ 2 . Remark 997 In case H is a real Hilbert space, the mappings J and its inverse J −1 defined in Lemma 992 become linear isometries. The spaces H and H ∗ turn out to be linearly isometric, and the mapping J from H onto H ∗ preserves the inner product, in the sense that J x, J y = x, y for all x, y ∈ H (we say then that H and H ∗ are Hilbert isomorphic). ® The following result is of capital importance for the whole area of Functional Analysis. Theorem 998 If H is a separable Hilbert space, then given any bounded sequence ∞ ∞ {hn }∞ n=1 in H there exists h ∈ H and a subsequence {hnk }k=1 of {hn }n=1 such that ∗ f (hnk ) → f (h) for every f ∈ H . Proof The result is a consequence of Corollary 994 and its proof, and the Alaoglu– Bourbaki Theorem 919. Indeed, H ∗∗ and j (H ) (see Corollary 994) coincide, in particular f (h) = j h(f ) for all f ∈ H ∗ and all h ∈ H . Thus, H is the dual space of H ∗. 
Remark 999 Note that, in contrast with Theorem 998, the (bounded) sequence $\{s_n\}_{n=1}^{\infty}$ in $c_0$, where $s_n := \sum_{k=1}^{n} e_k$ for $n \in \mathbb{N}$ and $\{e_n\}_{n=1}^{\infty}$ is the system of canonical unit vectors, has no pointwise convergent subsequence in $c_0$. Indeed, the only candidate to a pointwise limit of a subsequence is the vector $(1, 1, 1, \ldots)$, which is not in $c_0$. Related to this, see Remark 918.5. ®

We present below the announced result on the Fréchet differentiability of the norm of a Hilbert space $H$ (always understood as a mapping from $H_{\mathbb{R}}$ into $\mathbb{R}$). Observe that the computation of the mapping $J$ in Lemma 992 (see Remark 997) and the Fréchet
11 Excursion to Functional Analysis
Fig. 11.37 The norm of the Hilbert space $(\mathbb{R}^2, \langle\cdot,\cdot\rangle_2)$ is Fréchet differentiable away from 0, and its derivative has norm 1
derivative of $\|\cdot\|$ are closely related, as follows from Remark 940.2. Precisely, we have (see Fig. 11.37):

Proposition 1000 The norm of a Hilbert space $H$ (as a mapping from $H_{\mathbb{R}}$ into $\mathbb{R}$) is Fréchet differentiable at every nonzero point $x$, and its derivative at $x$ is $\Re Jx/\|x\|$, where $J$ is the mapping defined in Lemma 992 and $\Re Jx$ denotes the real part of the functional $Jx$.

Proof It is clear that the real part of $Jx/\|x\|$ supports $B_{H_{\mathbb{R}}}$ at $x$. The result follows by using Proposition 990 in the space $H^*$ and Remark 940.5.

Remark 1001 We note that the differentiability of the Hilbertian norm can be checked directly, see Exercise 13.612. ®

Considering complementability in Hilbert spaces, observe Proposition 1002 below, the easy part of a deep result of J. Lindenstrauss and L. Tzafriri mentioned in the sixth paragraph after Definition 949: Every Banach space that is not linearly isomorphic to a Hilbert space has an uncomplemented subspace.

Proposition 1002 Every closed subspace of a Hilbert space is complemented in it by a projection of norm 1.

Proof This is a direct consequence of Riesz Theorem 969.
11.5 Spectral Theory
Let $X$ be a linear space, and let $T : X \to X$ be a linear operator. A vector $x \in X$ is said to be an eigenvector of $T$ if $x \neq 0$ and $Tx = \lambda x$ for some scalar $\lambda$ (that is called an eigenvalue of the operator). Note that $\lambda$ is an eigenvalue of $T$ if and only if there exists a nonzero vector $x \in X$ such that $(\lambda I_X - T)x = 0$, where $I_X$ denotes the identity operator from $X$ onto $X$, i.e., if $\ker(\lambda I_X - T) \neq \{0\}$ or, in other words, if $\lambda I_X - T$ is not one-to-one. In case $X$ is a finite-dimensional vector space (say with dimension $n$), a test for showing this is to compute the determinant of the associated square matrix (an $n$-degree polynomial $P$ in the variable $\lambda$). Its roots are, precisely, the eigenvalues of $T$. The Fundamental Theorem of Algebra ensures the existence of roots (in the complex plane), and so the nonemptiness of the spectrum $\sigma(T)$, i.e., in this case, the set of all eigenvalues. Clearly, again in this case, if $\lambda \in \mathbb{C} \setminus \sigma(T)$, the operator $\lambda I_X - T$ is an isomorphism from $X$ onto $X$ (and conversely) (see Facts 11.1.3).
This is the reason why it is natural to consider in this context vector spaces over the field of complex numbers. For Banach spaces, Theorem 1009 below shows a variant of this result. We need some preliminary lemmas.

If $X$ and $Y$ are Banach spaces, an operator $T \in B(X, Y)$ is called invertible if $T$ is an isomorphism from $X$ onto $Y$. In other words, an operator $T \in B(X, Y)$ is invertible if and only if there is a bounded operator $T^{-1} \in B(Y, X)$ such that $T^{-1}T = I_X$ and $TT^{-1} = I_Y$. By the Open Mapping Theorem 953, this is equivalent to $T$ being one-to-one and onto.

The following lemma is basic. It can be compared to the fact that for $|t| < 1$, the function $f(t) := 1 - t$ has a reciprocal $(1 - t)^{-1}$ that can be written as the sum of the absolutely convergent series $\sum_{k=0}^{\infty} t^k$ (see Remark 512.1). In a sense, its proof is similar, and in fact it uses the sum of the aforesaid numerical power series. In the proof we need the fact that an absolutely convergent series of continuous linear operators on a Banach space converges to a continuous linear operator in the operator norm (a series $\sum_{n=1}^{\infty} T_n$ of operators is said to be absolutely convergent if the series $\sum_{n=1}^{\infty} \|T_n\|$ is convergent). The proof of this fact is almost a repetition of the proof of Proposition 167 having in mind that $(B(X), \|\cdot\|)$ is complete (see Proposition 901). By $T^n$ we denote the composition of the operator $T$ with itself a number $n$ of times.

Lemma 1003 Let $X$ be a Banach space and $T \in B(X)$. If $\|T\| < 1$ then $(I_X - T)$ is invertible and
\[ (I_X - T)^{-1} = \sum_{k=0}^{\infty} T^k, \tag{11.53} \]
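where the series converges absolutely in $B(X)$.

Proof First note that $\sum_{k=0}^{\infty} \|T^k\| \le \sum_{k=0}^{\infty} \|T\|^k = \frac{1}{1 - \|T\|}$ and so $\sum_{k=0}^{\infty} T^k$ is absolutely convergent in $B(X)$. As we mentioned prior to the statement of the lemma, this implies that the series converges to a continuous linear operator (denoted $\sum_{k=0}^{\infty} T^k$) from $X$ into $X$. Note that, for $n \in \mathbb{N}$,
\[ (I_X - T)\sum_{k=0}^{n} T^k = I_X + T + T^2 + \ldots + T^n - T - T^2 - T^3 - \ldots - T^{n+1} = I_X - T^{n+1}. \]
Observe, too, that $\|T^n\| \le \|T\|^n \to 0$, hence $I_X - T^{n+1} \to I_X$ as $n \to \infty$, while $(I_X - T)\sum_{k=0}^{n} T^k \to (I_X - T)\sum_{k=0}^{\infty} T^k$. Therefore
\[ (I_X - T)\sum_{k=0}^{\infty} T^k = I_X. \]
Similarly,
\[ \Bigl(\sum_{k=0}^{\infty} T^k\Bigr)(I_X - T) = I_X. \]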
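As a numerical illustration of Lemma 1003 (an added sketch, not part of the original exposition), the following Python code sums the series (11.53) for a $2 \times 2$ matrix acting on $\mathbb{R}^2$ with the maximum norm, for which the operator norm is the largest absolute row sum. The matrix `T` and all helper names are assumptions chosen only for this example.

```python
# Illustrative sketch: partial sums of the Neumann series for a 2x2 matrix.
# The matrix T and every helper name below are assumptions made for this example.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def mat_sub(a, b):
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def op_norm(a):
    # Operator norm induced by the max-norm on R^n: largest absolute row sum.
    return max(sum(abs(x) for x in row) for row in a)

def neumann_partial_sum(t, n_terms):
    # Returns I + T + T^2 + ... + T^n_terms.
    total = identity(len(t))
    power = identity(len(t))
    for _ in range(n_terms):
        power = mat_mul(power, t)
        total = mat_add(total, power)
    return total

T = [[0.2, 0.3], [0.1, 0.4]]   # both absolute row sums equal 0.5, so ||T|| < 1
S = neumann_partial_sum(T, 60)

# (I - T) S = I - T^61, so the residual below should be negligible.
residual = mat_sub(mat_mul(mat_sub(identity(2), T), S), identity(2))
print("norm of T:", op_norm(T))
print("norm of (I - T)S - I:", op_norm(residual))
print("norm of S and the bound 1/(1 - ||T||):", op_norm(S), 1.0 / (1.0 - op_norm(T)))
```

The last line compares the norm of the partial sum with the bound $1/(1 - \|T\|)$ obtained in the first step of the proof.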
Lemma 1004 Let $X$ be a Banach space and $S, T \in B(X)$. If $T$ is invertible and $\|T - S\| < \|T^{-1}\|^{-1}$, then $S$ is invertible and
\[ \|S^{-1} - T^{-1}\| \le \frac{\|T^{-1}\|^2 \|T - S\|}{1 - \|T^{-1}\| \|T - S\|}. \]

Proof We have $\|T^{-1}(T - S)\| \le \|T^{-1}\| \cdot \|T - S\| < 1$. Thus we can use Lemma 1003 to get that $I_X - T^{-1}(T - S)$ $(= T^{-1}S)$ is invertible, hence $S$ is invertible. We also have, by the same lemma, that $[I_X - T^{-1}(T - S)]^{-1} = \sum_{n=0}^{\infty} (T^{-1}(T - S))^n$. Hence,
\[ S^{-1} = (T - (T - S))^{-1} = \bigl(T(I_X - T^{-1}(T - S))\bigr)^{-1} = \sum_{n=0}^{\infty} (T^{-1}(T - S))^n T^{-1} = T^{-1} + \sum_{n=1}^{\infty} (T^{-1}(T - S))^n T^{-1}, \]
and thus
\[ \|S^{-1} - T^{-1}\| \le \Bigl\|\sum_{n=1}^{\infty} (T^{-1}(T - S))^n T^{-1}\Bigr\| \le \|T^{-1}\| \sum_{n=1}^{\infty} (\|T - S\| \cdot \|T^{-1}\|)^n = \frac{\|T^{-1}\|^2 \|T - S\|}{1 - \|T^{-1}\| \|T - S\|}. \]
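The estimate in Lemma 1004 can be checked on a small example. The following Python sketch is an added illustration under stated assumptions: the matrices `T` and `S` are arbitrary choices, the helper names are hypothetical, and the norm is the operator norm induced by the maximum norm on $\mathbb{R}^2$ (largest absolute row sum).

```python
# Illustrative sketch of the estimate in Lemma 1004 for 2x2 matrices.
# T, S and the helper names are assumptions made for this example.

def inverse_2x2(a):
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def mat_sub(a, b):
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def op_norm(a):
    # Operator norm induced by the max-norm on R^2: largest absolute row sum.
    return max(sum(abs(x) for x in row) for row in a)

T = [[2.0, 1.0], [0.0, 3.0]]
S = [[2.1, 0.9], [0.05, 3.0]]   # a small perturbation of T

T_inv = inverse_2x2(T)
gap = op_norm(mat_sub(T, S))                      # ||T - S||
hypothesis_holds = gap < 1.0 / op_norm(T_inv)     # ||T - S|| < ||T^{-1}||^{-1}

lhs = op_norm(mat_sub(inverse_2x2(S), T_inv))     # ||S^{-1} - T^{-1}||
rhs = op_norm(T_inv) ** 2 * gap / (1.0 - op_norm(T_inv) * gap)

print("hypothesis of the lemma holds:", hypothesis_holds)
print("||S^-1 - T^-1|| =", lhs, " bound =", rhs)
```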
Lemma 1004 gives immediately the following consequence.

Corollary 1005 Let $X$ be a Banach space. The set $\mathcal{C}$ of all invertible operators on $X$ is an open set in $B(X)$ and the mapping $T \mapsto T^{-1}$ is a homeomorphism of $\mathcal{C}$ onto $\mathcal{C}$, in the sense of the operator norm (11.7) on $B(X)$.

Definition 1006 Let $X$ be a Banach space, and let $T \in B(X)$. The spectrum $\sigma(T)$ of $T$ is defined by $\sigma(T) = \{\lambda \in \mathbb{C} : \lambda I_X - T \text{ is not invertible}\}$. The resolvent set $\rho(T)$ is defined by $\rho(T) = \mathbb{C} \setminus \sigma(T)$. The points of $\rho(T)$ are called regular values of $T$.

Remark 1007 Recall that a one-to-one continuous linear operator from a Banach space onto another Banach space is, automatically, an isomorphism (Theorem 953). Accordingly, if $T : X \to X$ is a continuous linear operator and $X$ is a Banach space, there are exactly three, mutually exclusive, reasons why the operator $\lambda I_X - T : X \to X$ should not be invertible, so defining three, mutually disjoint, subsets of $\sigma(T)$:
(i) $\lambda I_X - T$ is not one-to-one; those $\lambda$'s form the so-called point spectrum $\sigma_p(T)$.
(ii) $\lambda I_X - T$ is one-to-one, it is not onto, but it has a dense image, defining the so-called continuous spectrum $\sigma_c(T)$, and, finally,
(iii) $\lambda I_X - T$ is one-to-one and it has a nondense image, defining the so-called residual spectrum $\sigma_r(T)$.
Note that a point $\lambda \in \mathbb{C}$ is in the point spectrum of $T$ precisely when there exists $x \in X$, $x \neq 0$, such that $Tx = \lambda x$, i.e., when $\lambda$ is an eigenvalue. ®

In particular, the point spectrum (i.e., the set of all eigenvalues, see Remark 1007) is a (maybe empty) subset of $\sigma(T)$. The point spectrum coincides with the spectrum if the space is finite-dimensional, since in finite-dimensional spaces an operator $T$ from $X$ into $X$ is an isomorphism if and only if it is one-to-one (see Facts 11.1.3). Note also that, for compact operators, 0 is always in the spectrum if the space is infinite-dimensional, since the unit ball of $X$ is not compact in this case, and so there is no compact linear isomorphism of an infinite-dimensional space onto itself. On the other hand, a compact operator may not have any eigenvalue. However, it has a so-called nontrivial invariant subspace, i.e., a subspace $Y$ of $X$ different from $\{0\}$ and the whole space so that $TY \subset Y$. The existence of a bounded operator on a space $X$ without such a subspace was first shown by P. Enflo in 1981. Whether $X$ can be chosen to be $\ell_2$ is still unknown.

In order to locate the spectrum of an operator, the following intermediate result will be useful.

Lemma 1008 The spectrum of a continuous linear operator $T : X \to X$, where $X$ is a complex Banach space, is a subset of the set $\|T\| \cdot B_{\mathbb{C}}$, where $B_{\mathbb{C}}$ denotes the closed unit ball of the set $\mathbb{C}$ of all complex numbers.

Proof If $|\lambda| > \|T\|$, then $\|\lambda^{-1}T\| < 1$, so by Lemma 1003 the operator $\lambda I_X - T = \lambda(I_X - \lambda^{-1}T)$ is invertible, i.e., $\lambda \notin \sigma(T)$.

In general, we have the following result.
Theorem 1009 The spectrum of a continuous linear operator $T : X \to X$, where $X$ is a complex Banach space, is a nonempty compact subset of $\|T\| \cdot B_{\mathbb{C}}$, where $B_{\mathbb{C}}$ denotes the closed unit ball of the set $\mathbb{C}$ of all complex numbers.

Proof By Lemma 1008, $\sigma(T) \subset \|T\| B_{\mathbb{C}}$. The complement of $\sigma(T)$ is open by Corollary 1005. So, the set $\sigma(T)$ is a bounded and closed subset of $\mathbb{C}$, so it is compact. The proof that the spectrum is nonempty is beyond the scope of this text, as it uses the theory of complex variables.

Example 1010 Let the mapping $R$ (called the right shift operator) from $\ell_2$ into $\ell_2$ be defined by $R(x) = (0, x_1, x_2, \ldots)$ for $x = (x_1, x_2, \ldots) \in \ell_2$. Clearly $R$ is a linear operator. Note that $\|Rx\|_2 = \|x\|_2$ for all $x \in \ell_2$: It follows that $R$ is a linear isometry into (so $\|R\| = 1$), and its range is a hyperplane in $\ell_2$. $R$ is clearly one-to-one so 0 is not an eigenvalue. If $\lambda \neq 0$, then by solving $(0, x_1, x_2, \ldots) = (\lambda x_1, \lambda x_2, \lambda x_3, \ldots)$, we get $x_i = 0$ for all $i$, hence $R$ has no eigenvalues at all. This shows that the point spectrum $\sigma_p(R)$ is empty. In view of Remark 1007, Lemma 1008, and Theorem 1009, $\emptyset \neq \sigma(R) = \sigma_c(R) \cup \sigma_r(R) \subset B_{\mathbb{C}}$. Note that $\sigma_r(R) = \{\lambda : \lambda \in \mathbb{C}, |\lambda| < 1\}$, and that $\sigma_c(R) = S_{\mathbb{C}}$. To see this, fix $\lambda \in \mathbb{C}$, $|\lambda| \le 1$ and compute $(\lambda I - R)e_n = \lambda e_n - e_{n+1}$ for all $n \in \mathbb{N}$. Let
$x = (x_n) \in \ell_2$ be such that $\langle (\lambda I - R)e_n, x\rangle = 0$ for all $n \in \mathbb{N}$. Then $\lambda \overline{x_n} = \overline{x_{n+1}}$ for $n \in \mathbb{N}$. This shows that $x = (x_1, \bar{\lambda} x_1, \bar{\lambda}^2 x_1, \ldots) = x_1 (1, \bar{\lambda}, \bar{\lambda}^2, \ldots)$. Since $x \in \ell_2$, this forces $|\lambda| < 1$ whenever $x \neq 0$ and, conversely, if $|\lambda| < 1$ we can find a nonzero vector $x$ such that $x \perp (\lambda I - R)e_n$ for all $n \in \mathbb{N}$. This shows that the range of $(\lambda I - R)$ is not dense in $\ell_2$. Assume now that $|\lambda| = 1$. The same argument as above shows that the only vector $x \in \ell_2$ orthogonal to $(\lambda I - R)e_n$ for all $n \in \mathbb{N}$ is the origin, hence the range of $\lambda I - R$ is dense in $\ell_2$. We shall prove that the mapping $\lambda I - R$ is not onto, since there is no $x \in \ell_2$ such that $(\lambda I - R)x = e_1$. Indeed, if such an $x = (x_n)$ exists, then $\lambda x_1 = 1$ and $\lambda x_n - x_{n-1} = 0$ for $n \ge 2$. Thus, $x_n = 1/\lambda^n$ for all $n \in \mathbb{N}$, and this is impossible: $|x_n| = 1$ for all $n$, whereas every element of $\ell_2$ satisfies $|x_n| \to 0$. ♦

Remark 1011 A related example, the so-called left shift operator from $\ell_2$ into $\ell_2$, will be considered in Exercise 13.614. ®
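The last step of Example 1010 can be explored numerically. The Python sketch below is an added illustration (the function names are hypothetical): it solves $(\lambda I - R)x = e_1$ coordinate by coordinate, using $\lambda x_1 = 1$ and $\lambda x_n - x_{n-1} = 0$ for $n \ge 2$, and compares the partial sums of $\sum |x_n|^2$ for a point $\lambda$ on the unit circle and for a point with $|\lambda| > \|R\| = 1$.

```python
# Illustrative sketch: the only candidate solution of (lam*I - R)x = e_1,
# where R is the right shift. Function names are assumptions for this example.

def solve_against_e1(lam, n_terms):
    """First n_terms coordinates of the formal solution of (lam*I - R)x = e_1."""
    coords = []
    previous = 0  # plays the role of x_0 := 0
    for n in range(1, n_terms + 1):
        rhs = 1 if n == 1 else 0
        # Coordinate n of (lam*I - R)x is lam*x_n - x_{n-1}.
        x_n = (rhs + previous) / lam
        coords.append(x_n)
        previous = x_n
    return coords

def squared_norm(coords):
    return sum(abs(c) ** 2 for c in coords)

on_circle = solve_against_e1(1j, 200)   # |lam| = 1: every |x_n| equals 1
outside = solve_against_e1(2.0, 200)    # |lam| = 2 > ||R||: x_n = 2**(-n)

print("partial sum of |x_n|^2 for lam = i:", squared_norm(on_circle))
print("partial sum of |x_n|^2 for lam = 2:", squared_norm(outside))
```

For $\lambda = i$ the partial sums grow like the number of terms, so the formal solution is not in $\ell_2$; for $\lambda = 2$ they stay bounded, in agreement with Lemma 1008.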
11.6 ♣ Pointwise Topology and Product Spaces
Throughout this book, the term “pointwise” appeared in many places. Definition 451 precisely says what pointwise convergence of a sequence of functions means. However, there is a difficulty: sequences are not enough to describe all topological features behind this form of convergence. To see this, and to look for a remedy, is one of the purposes of this brief section. The proper setting for speaking about pointwise convergence is product spaces, or (what amounts to the same thing) function spaces. Let $X$ be a nonempty set, and let $(Y, d)$ be a metric space. The set $Y^X$ is the collection of all functions from $X$ into $Y$.

Proposition 1012 A topology $\mathcal{T}_p$ on $Y^X$ is defined by calling subsets $O$ of $Y^X$ “open” if for every element $f_0 \in O$, there exist $\varepsilon > 0$ and a nonempty finite subset $F$ of $X$ such that
\[ U(f_0; F, \varepsilon) := \{f \in Y^X : d(f(x), f_0(x)) < \varepsilon \text{ for each } x \in F\} \subset O. \tag{11.54} \]
Proof We must show that the family $\mathcal{T}_p$ of open sets satisfies the axioms (O1) to (O4) in Remark 105. Clearly (O1), (O2), and (O4) are satisfied. In order to prove (O3) let $O_1$ and $O_2$ be two open subsets. We shall prove that $O_1 \cap O_2$ is open. If $O_1 \cap O_2 = \emptyset$, we are done. If not, let $f \in O_1 \cap O_2$. Then we can find a nonempty finite subset $F_i$ of $X$ and $\varepsilon_i > 0$ such that $U(f; F_i, \varepsilon_i) \subset O_i$, $i = 1, 2$. It is clear that $U(f; F_1 \cup F_2, \min\{\varepsilon_1, \varepsilon_2\}) \subset O_1 \cap O_2$, and this proves that $O_1 \cap O_2$ is open.

Definition 1013 The topology $\mathcal{T}_p$ on $Y^X$ defined in Proposition 1012 is called the topology of pointwise convergence. The reason for the name is in Proposition 1017.

Until now we have been dealing with topologies defined by a metric. We say that a topology $\mathcal{T}$ on a set $S$ is metrizable whenever there exists a metric $d$ on $S$ that
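induces $\mathcal{T}$. The following result clarifies when the topology $\mathcal{T}_p$ on $Y^X$ is metrizable ($Y$ is always a metric space).

Proposition 1014 Let $(Y, d)$ be a metric space not reduced to a singleton, and let $X$ be a nonempty set. The topology $\mathcal{T}_p$ on $Y^X$ of the pointwise convergence is metrizable if and only if $X$ is countable.

Proof Assume first that $X$ is countable, say $X := \{x_n : n \in \mathbb{N}\}$. Define a metric $p$ on $Y^X$ by
\[ p(f, g) := \sum_{n=1}^{\infty} \frac{1}{2^n} \frac{d(f(x_n), g(x_n))}{1 + d(f(x_n), g(x_n))}, \quad f, g \in Y^X. \]
It is easy to show that $p$ is indeed a metric on $Y^X$. Let us prove that $p$ defines the topology $\mathcal{T}_p$. Fix a nonempty set $O \in \mathcal{T}_p$ and an element $f_0 \in O$. Find a nonempty finite set $F \subset X$ and $\varepsilon > 0$ such that $U(f_0; F, \varepsilon) \subset O$, where $U(f_0; F, \varepsilon)$ is given in (11.54). We may assume that $F = \{x_1, \ldots, x_{n_0}\}$ for some $n_0 \in \mathbb{N}$ such that $\sum_{n=n_0+1}^{\infty} 2^{-n} < \varepsilon/2$. Now, find $\delta > 0$ such that $2^{n_0}\delta < 1$ and $2^{n_0}\delta/(1 - 2^{n_0}\delta) < \varepsilon$. Let $f \in Y^X$ be such that $p(f, f_0) < \delta$. Then, for $n = 1, 2, \ldots, n_0$,
\[ \frac{d(f(x_n), f_0(x_n))}{1 + d(f(x_n), f_0(x_n))} < 2^n \delta \le 2^{n_0}\delta, \]
hence $d(f(x_n), f_0(x_n)) < 2^{n_0}\delta/(1 - 2^{n_0}\delta) < \varepsilon$, and so $f \in U(f_0; F, \varepsilon) \subset O$. Thus, every open set in the topology $\mathcal{T}_p$ is also open in the topology defined by the metric $p$. Conversely, let $O$ be a nonempty open set in the topology defined by $p$, and let $f_0 \in O$. Find $\varepsilon > 0$ such that $f \in O$ for all $f \in Y^X$ such that $p(f, f_0) < \varepsilon$. Find $n_0 \in \mathbb{N}$ such that $\sum_{n=n_0+1}^{\infty} 2^{-n} < \varepsilon/2$. Now, for $f \in Y^X$ such that $d(f(x_n), f_0(x_n)) < \varepsilon/2^{n_0}$ for all $n = 1, 2, \ldots, n_0$ we have
\[ p(f, f_0) = \sum_{n=1}^{n_0} \frac{1}{2^n} \frac{d(f(x_n), f_0(x_n))}{1 + d(f(x_n), f_0(x_n))} + \sum_{n=n_0+1}^{\infty} \frac{1}{2^n} \frac{d(f(x_n), f_0(x_n))}{1 + d(f(x_n), f_0(x_n))} < \varepsilon/2 + \varepsilon/2 = \varepsilon, \]
hence $f \in O$. This shows that $U(f_0; F, \varepsilon/2^{n_0}) \subset \{f \in Y^X : p(f, f_0) < \varepsilon\} \subset O$, where $F := \{x_1, \ldots, x_{n_0}\}$. Thus, every open set in the topology defined by the metric $p$ is also open in the topology $\mathcal{T}_p$. We proved thus that the topology defined by the metric $p$ coincides with the topology $\mathcal{T}_p$.

Assume now that $\mathcal{T}_p$ is defined by a metric, say $p$. Fix $f_0 \in Y^X$. For $n \in \mathbb{N}$, let us consider $B(f_0, 1/n) := \{f \in Y^X : p(f, f_0) < 1/n\}$. Since this is also a $\mathcal{T}_p$-neighborhood of $f_0$, there exist a nonempty finite set $F_n \subset X$ and $\varepsilon_n > 0$ such that $U(f_0; F_n, \varepsilon_n) \subset B(f_0, 1/n)$. We claim that $X = \bigcup_{n=1}^{\infty} F_n$. If not, find $x_0 \in X \setminus \bigcup_{n=1}^{\infty} F_n$. Let $f : X \to Y$ be such that $f(x_0) \neq f_0(x_0)$ (here we need that $Y$ is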
not a singleton) and $f(x) = f_0(x)$ for all $x \in \bigcup_{n=1}^{\infty} F_n$. Put $\delta := d(f(x_0), f_0(x_0))$ $(> 0)$. The set $U(f_0; \{x_0\}, \delta)$ is a neighborhood of $f_0$, hence it must contain a set $B(f_0, 1/n)$ for some $n \in \mathbb{N}$. It follows that $U(f_0; F_n, \varepsilon_n) \subset U(f_0; \{x_0\}, \delta)$. However, $f \in U(f_0; F_n, \varepsilon_n)$, and $f \notin U(f_0; \{x_0\}, \delta)$, a contradiction.

The topology $\mathcal{T}$ of a metric space $(M, d)$ can be described by sequences. By this we mean that $\mathcal{T}$ is completely determined as soon as the family of all convergent sequences in $(M, d)$ and their limits is known, and conversely. This is a byproduct of the characterization of closedness given in Proposition 554.

Proposition 1015 Assume that $(Y, d)$ is a metric space not reduced to a singleton, and let $X$ be a nonempty set. Then sequences are enough to describe the topology $\mathcal{T}_p$ on $Y^X$ if and only if $\mathcal{T}_p$ is metrizable, which, according to Proposition 1014, amounts to saying “if and only if $X$ is countable.”

Proof Assume first that $X$ is uncountable. Fix $y_0 \in Y$ and let $\mathcal{N} := \{f \in Y^X : \{x \in X : f(x) \neq y_0\} \text{ is countable}\}$. We shall prove that $\mathcal{N}$ is dense in $(Y^X, \mathcal{T}_p)$ and yet there is an element $f_1$ in $Y^X$ such that no sequence in $\mathcal{N}$ converges to $f_1$. To prove denseness fix $f \in Y^X$, a nonempty finite subset $F$ of $X$, and $\varepsilon > 0$. Define
\[ g(x) := \begin{cases} f(x) & \text{if } x \in F, \\ y_0 & \text{otherwise.} \end{cases} \]
Then $g \in \mathcal{N} \cap U(f; F, \varepsilon)$. This proves denseness. Fix $y_1 \in Y$ such that $y_1 \neq y_0$. Let $f_1 : X \to Y$ be the constant function that takes the value $y_1$. There is no sequence in $\mathcal{N}$ that $\mathcal{T}_p$-converges to $f_1$, since the limit of a $\mathcal{T}_p$-convergent sequence in $\mathcal{N}$ is clearly a function that takes the value $y_0$ on an uncountable subset of $X$. Assume now that $X$ is countable. Then we proved in Proposition 1014 that $\mathcal{T}_p$ on $Y^X$ is metrizable. We mentioned after Definition 552 that a subset $C$ of a metric space is closed if and only if the limit of each convergent sequence in $C$ belongs to $C$.
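To get a feeling for the metric $p$ used in the proof of Proposition 1014, the following Python sketch (an added illustration under stated assumptions) takes $X = \mathbb{N}$ enumerated as $x_n = n$ and $Y = \mathbb{R}$ with the absolute-value metric, and approximates $p$ by a partial sum; the neglected tail is at most $2^{-\text{terms}}$. The function names are hypothetical.

```python
# Illustrative sketch: the metric p of Proposition 1014 for X = N, Y = R.
# Functions on N are modelled as Python callables; names are assumptions.

def pointwise_metric(f, g, terms=60):
    """Partial sum of p(f, g) = sum_n 2^(-n) d/(1 + d), with d = |f(n) - g(n)|."""
    total = 0.0
    for n in range(1, terms + 1):
        dist = abs(f(n) - g(n))
        total += dist / (1.0 + dist) / 2 ** n
    return total

def unit_vector(k):
    # The function that is 1 at k and 0 elsewhere.
    return lambda n: 1.0 if n == k else 0.0

def zero(n):
    return 0.0

# e_k -> 0 pointwise, and accordingly p(e_k, 0) = 2^(-(k+1)) -> 0,
# although sup_n |e_k(n) - 0| = 1 for every k.
distances = [pointwise_metric(unit_vector(k), zero) for k in range(1, 11)]
for k, dist in enumerate(distances, start=1):
    print(k, dist)
```

The sequence of unit vectors converges to the zero function in the metric $p$ but not uniformly, which is the behaviour one expects from a metric inducing the topology of pointwise convergence.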
After Proposition 1015 it is clear that nonmetrizable topologies require a convergence tool more general than a sequence. This is done in the following definition.

Definition 1016 Let $(I, \le)$ be a nonempty partially ordered directed upward set. Let $X$ be a nonempty set. A function $r : I \to X$ is said to be a net in $X$. If $X$ is a topological space, a net $r$ in $X$ is said to converge to a point $x \in X$ if, given a neighborhood $U$ of $x$, there exists $i_0 \in I$ such that $r(i) \in U$ for every $i \ge i_0$.

Note that a sequence is a particular case of a net, precisely when $(I, \le)$ is the set $\mathbb{N}$ endowed with its natural order. In the same way that it is usual to write a sequence as $\{r_n\}_{n=1}^{\infty}$ instead of the more artificial notation $r : \mathbb{N} \to X$, putting $r_n$ instead of $r(n)$ for $n \in \mathbb{N}$, we write $\{r_i\}_{i \in I}$ for a net $r : I \to X$, where $r(i)$ is just written $r_i$ for $i \in I$. The following result justifies the name given to the topology $\mathcal{T}_p$.

Proposition 1017 A net $\{f_i\}_{i \in I}$ in $Y^X$ converges to $f \in Y^X$ in the topology $\mathcal{T}_p$ of pointwise convergence if and only if $f_i(x) \to f(x)$ in $(Y, d)$ for each $x \in X$.
Proof Let $\{f_i\}_{i \in I}$ be a net in $Y^X$ and $f \in Y^X$. Assume first that $\{f_i\}_{i \in I}$ converges to $f$ in $(Y^X, \mathcal{T}_p)$. Fix $x \in X$. Then, given $\varepsilon > 0$ there exists $i_0 \in I$ such that $f_i \in U(f; \{x\}, \varepsilon)$ for $i \ge i_0$, i.e., $d(f_i(x), f(x)) < \varepsilon$ for $i \ge i_0$. This shows that $\{f_i(x)\}_{i \in I}$ converges to $f(x)$ in $(Y, d)$. Conversely, assume that $\{f_i(x)\}_{i \in I}$ converges to $f(x)$ in $(Y, d)$ for each $x \in X$. Fix a nonempty finite subset $F := \{x_1, x_2, \ldots, x_n\}$ of $X$ and $\varepsilon > 0$. Then, for $k \in \{1, 2, \ldots, n\}$ we can find $i_k$ in $I$ such that $d(f_i(x_k), f(x_k)) < \varepsilon$ for $i \ge i_k$. It is enough to choose $i_0 \in I$ such that $i_0 \ge i_k$ for $k = 1, 2, \ldots, n$ to get $f_i \in U(f; F, \varepsilon)$ for $i \ge i_0$. This shows that $\{f_i\}_{i \in I}$ converges to $f$ in $(Y^X, \mathcal{T}_p)$.

Corollary 1018 Let $N$ be a nonempty countable set, and let $(K, d)$ be a compact metric space. Then the space $(K^N, \mathcal{T}_p)$ is compact.

Proof The case that $N$ consists of two elements was proved in Proposition 606, and for finite $N$ it is proved by finite induction. So assume that $N$ is countably infinite. We may also assume, without loss of generality, that $N := \mathbb{N}$. By Proposition 1015, it is enough to show that every sequence $\{x_k\}_{k=1}^{\infty}$ in $K^N$ has a convergent subsequence. Due to the fact that $(K, d)$ is compact, we can extract a subsequence of $\{x_k\}_{k=1}^{\infty}$ whose first coordinates form a convergent sequence. Repeat the argument for the sequence of the second coordinates of the extracted subsequence. Continue in this way. A simple diagonal argument shows that there exists a subsequence of $\{x_k\}_{k=1}^{\infty}$ that pointwise converges to an element in $K^N$ (thanks to Proposition 1017).

We provide below some concrete examples of product spaces and the corresponding topology of pointwise convergence.

Example 1019 Let $[0, 1]^{[0,1]}$ denote the space of all real-valued functions from the interval $[0, 1]$ into $[0, 1]$. Consider the interval $[0, 1]$ endowed with the absolute-value metric.
(i) We remark that $([0, 1]^{[0,1]}, \mathcal{T}_p)$ is a compact space.
This is the celebrated Tychonoff theorem (see, e.g., [Ke55, Thm 5.13]; A. N. Tychonoff was a Russian mathematician). It provides the foundation for a good deal of modern real analysis and related fields. A proof of a particular case is in Corollary 1018 above.
(ii) The closure in $([0, 1]^{[0,1]}, \mathcal{T}_p)$ of the subset of all continuous functions on $[0, 1]$ that have values in $[0, 1]$ is the whole set $[0, 1]^{[0,1]}$. Indeed, given $g \in [0, 1]^{[0,1]}$ and a finite subset $F := \{t_1, t_2, \ldots, t_n\}$ in $[0, 1]$, define a continuous function $f_F : [0, 1] \to [0, 1]$ that to $t_i$ associates $g(t_i)$ for $i = 1, 2, \ldots, n$. The family $\mathcal{P}_f[0, 1]$ consisting of all the finite subsets of $[0, 1]$, endowed with the partial order defined by the inclusion, is a directed set, hence $\{f_F : F \in \mathcal{P}_f[0, 1], \subseteq\}$ is a net in $C[0, 1]$, and clearly converges to $g$ in the topology of the pointwise convergence (see Proposition 1017). However, the Dirichlet function (see Definition 296), for instance, is not the limit of a sequence of continuous functions in $[0, 1]^{[0,1]}$ in the topology $\mathcal{T}_p$ (see Remark 415). This illustrates by a different example the fact that the space $([0, 1]^{[0,1]}, \mathcal{T}_p)$ is not metrizable, something that follows from Proposition 1014.
(iii) It is separable. Indeed, $C[0, 1]$ is separable endowed with the supremum norm (see Example 586.4), and so it is when endowed with a coarser topology, namely the topology of the pointwise convergence (since this topology has fewer open sets). Now, given $h \in [0, 1]^{[0,1]}$, i.e., a function $h : [0, 1] \to [0, 1]$, and an arbitrary finite set $\{x_1 < x_2 < \ldots < x_n\}$ in $[0, 1]$, there exists a continuous function $f : [0, 1] \to [0, 1]$ such that $f(x_i) = h(x_i)$ for $i = 1, 2, \ldots, n$. This proves that $C[0, 1]$ is dense in $[0, 1]^{[0,1]}$ in the topology of the pointwise convergence, and the conclusion follows. ♦

Example 1020 Let $[0, 1]^{\omega}$ (also written $[0, 1]^{\mathbb{N}}$) denote the space of all functions from the set $\omega := \{1, 2, 3, \ldots\}$ into $[0, 1]$ (i.e., of all sequences in $[0, 1]$), endowed with the topology of pointwise convergence (again $[0, 1]$ carries the metric induced by the absolute-value metric on $\mathbb{R}$). This is a compact space (a consequence of Tychonoff's Theorem, see, e.g., [Ke55, Thm 5.13]; now Corollary 1018 applies), and it is metrizable (by the metric $\rho(f, g) := \sum_{n=1}^{\infty} \frac{1}{2^n} |f(n) - g(n)|$, see Proposition 1014). The compactness can be proved directly by the use of the Cantor diagonal method. An important result from modern analysis is that any infinite-dimensional compact convex set in a Banach space is homeomorphic to this space (Keller's Theorem, see, e.g., ([FHHMZ11], §12.3 and Theorem 12.37)). ♦

Example 1021 Let $\mathbb{R}^{\omega}$ (also written $\mathbb{R}^{\mathbb{N}}$) denote the space of all real-valued functions on the set $\omega = \{1, 2, 3, \ldots\}$ (i.e., all sequences of real numbers), endowed with the topology of pointwise convergence ($\mathbb{R}$ carries the absolute-value metric). This is a complete metrizable separable space and all separable infinite-dimensional Banach spaces are homeomorphic to it. See, e.g., ([FHHMZ11], Thm. 12.46). ♦

Example 1022 Let $2^{\omega}$ (also written $2^{\mathbb{N}}$) denote the space of all functions from $\omega = \{1, 2, 3, \ldots\}$ into $\{0, 1\}$ (i.e., of all sequences consisting of 0's and 1's) endowed with the topology of the pointwise convergence. The set $\{0, 1\}$ carries the metric induced by the absolute-value metric on $\mathbb{R}$. We note that, in this case, a subset $O$ of $2^{\omega}$ is open whenever for every $f_0 \in O$ there are numbers $n_1, n_2, \ldots, n_k$ in $\mathbb{N}$ such that $\{f \in 2^{\omega} : f(n_i) = f_0(n_i) \text{ for all } i = 1, 2, \ldots, k\} \subset O$. This space is homeomorphic to the Cantor ternary set $C$ endowed with its usual topology. Indeed, we proved in Proposition 279 that the mapping $\varphi$ from $2^{\mathbb{N}}$ onto $C$ that to $f \in 2^{\mathbb{N}}$ associates the element $x \in C$ such that $\{x\} = \bigcap_{n=1}^{\infty} I_{f(1),\ldots,f(n)}$ (see Eq. (3.27); here $I_{f(1),\ldots,f(n)}$ stands for the closed interval of the construction of $C$ determined by $f(1), \ldots, f(n)$) is one-to-one and onto. Let us see that it is also continuous with continuous inverse whenever $2^{\mathbb{N}}$ is endowed with the product topology and $C$ with the topology inherited from the absolute-value topology in $\mathbb{R}$. To this end, fix $f \in 2^{\omega}$ and $\varepsilon > 0$. Let $n_0 \in \mathbb{N}$ be such that $3^{-n_0} < \varepsilon$. If $g \in 2^{\mathbb{N}}$ satisfies $g(n) = f(n)$ for $n = 1, 2, \ldots, n_0$, then $\varphi(g) \in I_{g(1),\ldots,g(n_0)}$ $(= I_{f(1),\ldots,f(n_0)})$, and $\operatorname{diam} I_{f(1),\ldots,f(n_0)} \le 3^{-n_0} < \varepsilon$. The continuity of $\varphi$ follows. This shows, without extra effort, that the inverse mapping is also continuous. Indeed, it is enough to apply Proposition 337 (its proof shows that the statement holds true if $K$ there is a compact metric space and $f$ maps $K$ into a metric space in a one-to-one way).
An important result in modern analysis is that every compact metric space is a continuous image of this space. ♦

Example 1023 Let $\omega^{\omega}$ (also written $\mathbb{N}^{\mathbb{N}}$) denote the space of all functions from $\omega := \{1, 2, \ldots\}$ into $\omega$ (i.e., of all sequences of natural numbers) endowed with the pointwise topology. The set $\omega$ carries the metric induced by the absolute-value metric on $\mathbb{R}$. Note that, in this case, a subset $O$ of $\omega^{\omega}$ is open if for every $f_0 \in O$ there are numbers $n_1, n_2, \ldots, n_k$ such that $\{f \in \omega^{\omega} : f(n_i) = f_0(n_i) \text{ for all } i = 1, 2, \ldots, k\} \subset O$. In Definition 638 this space was endowed with a metric that induced the pointwise topology described above. The resulting metric space was called there the space of Baire. In Exercise 13.392 it is proved that this space is homeomorphic to the space $\mathbb{P}$ of all irrational numbers endowed with the usual metric in $\mathbb{R}$. We proved in Theorem 596 that any complete separable metric space is a continuous image of this space. ♦

Example 1024 Let $(X, \|\cdot\|)$ be a (real) Banach space. Recall that $(X^*, \|\cdot\|)$ is a Banach space (see Corollary 904). Observe that $X^*$ is a subset of $\mathbb{R}^X$, and so $X^*$ can be endowed with the restriction of the topology $\mathcal{T}_p$ of the pointwise convergence on $\mathbb{R}^X$ (where $\mathbb{R}$ carries the usual topology induced by the absolute-value metric). This topology on $X^*$ is called the weak$^*$ topology, and is denoted by $w^*$. A consequence of Proposition 1017 is that a net $\{x_i^*\}_{i \in I}$ in $X^*$ is $w^*$-convergent to $x^* \in X^*$ if and only if $x_i^*(x) \to x^*(x)$ for each $x \in X$. Every $x \in X$ can also be understood as a mapping from $X^*$ into $\mathbb{R}$, precisely as the one that to $x^* \in X^*$ associates $x^*(x)$. It is clear that this mapping is linear, and the fact that the association is one-to-one is a consequence of the Hahn–Banach Theorem 925. Indeed, if $x$ and $y$ are two elements in $X$ and $x \neq y$, there exists $x^* \in X^*$ such that $x^*(x) \neq x^*(y)$. In this way, $X$ can be considered as a subspace of $\mathbb{R}^{X^*}$.
The topology on $X$ induced by the topology $\mathcal{T}_p$ of $\mathbb{R}^{X^*}$ is called the weak topology on $X$, and is denoted by $w$. A consequence of Proposition 1017 is that a net $\{x_i\}_{i \in I}$ in $X$ is $w$-convergent to $x \in X$ if and only if $x^*(x_i) \to x^*(x)$ for each $x^* \in X^*$. Proposition 1014 shows that $(\mathbb{R}^X, \mathcal{T}_p)$ is not metrizable if $X$ has dimension greater than or equal to 1. In principle, this does not exclude the possibility that $\mathcal{T}_p$, when restricted to $X^*$, should be metrizable. In fact, it is so in some cases. For example, assume that $(X, \|\cdot\|)$ is an $n$-dimensional Banach space, where $n \in \mathbb{N}$. Observe that, thanks to the linearity of the elements in $X^*$, the space $(X^*, w^*)$ can be identified with $(\mathbb{R}^n, \mathcal{T}_p)$. Precisely, if $\{e_i\}_{i=1}^{n}$ is an algebraic basis of $X$, the mapping $\varphi : X^* \to \mathbb{R}^n$ defined as $\varphi(x^*) := (x^*(e_1), \ldots, x^*(e_n))$ for $x^* \in X^*$ is a linear isomorphism if $X^*$ is endowed with the $w^*$ topology, and $\mathbb{R}^n$ with the Euclidean topology. This shows that $(X^*, w^*)$ is metrizable. This is the only case in which $w^*$ on $X^*$ is metrizable. In fact, we have the following result.

Proposition 1025 Let $X$ be an infinite-dimensional Banach space. Then the topology $w^*$ on $X^*$ is not metrizable.
Proof Assume it is. There exists a metric $d$ on $X^*$ whose associated topology is $w^*$. For $n \in \mathbb{N}$, let $B(0, 1/n) := \{x^* \in X^* : d(x^*, 0) < 1/n\}$. Since this is a $w^*$-neighborhood of 0 there exist a nonempty finite subset $F_n$ of $X$ and $\varepsilon_n > 0$ such that $U(0; F_n, \varepsilon_n) \subset B(0, 1/n)$. Fix $x_0 \in X$ and let $V := \{x^* \in X^* : |x^*(x_0)| < 1\}$. This is a $w^*$-neighborhood of 0, hence there exists $n \in \mathbb{N}$ such that $B(0, 1/n) \subset V$. All together we have $U(0; F_n, \varepsilon_n) \subset B(0, 1/n) \subset V$. Let $x^* \in X^*$ be such that $x^*(x) = 0$ for $x \in F_n$. Then $mx^* \in U(0; F_n, \varepsilon_n)$ for all $m \in \mathbb{N}$, hence $mx^* \in V$ for all $m \in \mathbb{N}$, and so $x^*(x_0) = 0$. This shows that, writing $F_n = \{x_1, \ldots, x_k\}$ and considering $x_0, x_1, \ldots, x_k$ as linear mappings from $X^*$ into $\mathbb{R}$, they have the property that $\bigcap_{i=1}^{k} \ker x_i \subset \ker x_0$, hence $x_0$ is a linear combination of $x_1, x_2, \ldots, x_k$ (see, e.g., [FHHMZ11], Lemma 3.21). We proved that $X$ is the linear span of $\bigcup_{n=1}^{\infty} F_n$, hence it has countable algebraic dimension. This is impossible (see the paragraph after Theorem 919).

A similar argument, that is left to the reader, shows that the weak topology of a Banach space $X$ is never metrizable except when $X$ is finite-dimensional. Observe that, as a consequence of this and Proposition 1025, neither $w^*$ in $X^*$ nor $w$ in $X$ coincides with the norm topology if $X$ is an infinite-dimensional Banach space. An extension of Theorem 919 says that $(B_{X^*}, w^*)$ is compact whenever $X$ is a Banach space (this is the important Alaoglu–Bourbaki theorem). The proof is similar to the one given there, this time relying on the compactness of any product of compact spaces. Observe that in this case we do not have, in general, metrizability of the space $(B_{X^*}, w^*)$. This trend of ideas will not be pursued here. ♦

Theorem 1026 Let $M$ be a metric space and let $BC(M)$ denote the space of all bounded continuous real functions on $M$ endowed with the supremum norm, denoted by $\|\cdot\|_{\infty}$. Then $(BC(M), \|\cdot\|_{\infty})$ is a Banach space, and the unit ball $B_{BC(M)^*}$ of its dual is compact in its pointwise topology $w^*$.
Let $h$ denote the homeomorphism of $M$ into $(B_{BC(M)^*}, w^*)$ defined by $t \mapsto h_t$, where $h_t(f) := f(t)$ for every $f \in BC(M)$. Let $\beta(M)$ denote the closure of $h(M)$ in $(B_{BC(M)^*}, w^*)$, endowed with the restriction of the topology $w^*$. Then $\beta(M)$ is a compact space and each bounded continuous function on $M$ can be extended to a continuous function on $\beta(M)$.

The space $\beta(M)$ is called the Stone–Čech compactification of $M$. For an account of the Stone–Čech compactification, see, e.g., [Jame, pp. 299–303]. M. H. Stone was an American mathematician, and E. Čech was a Czech mathematician.
11.7 Excursion to Nonlinear Functional Analysis
11.7.1 Variational Principles
One of the difficulties for optimization in an infinite-dimensional setting is that a real-valued bounded-below continuous function on a reasonable metric space may
Fig. 11.38 The function f and its perturbation (Theorem 1027)
not attain its infimum. The following result, due to the French mathematician I. Ekeland, says that a small perturbation (actually, as small as wished) of the original function does attain its infimum (see Fig. 11.38). This has a number of important consequences, some of them listed below.

Theorem 1027 (Ekeland Variational Principle) Let $(X, d)$ be a complete metric space and let $f$ be a bounded-below continuous real-valued function defined on $X$. Then, given $\varepsilon > 0$, there is a point $x_0 \in X$ that minimizes the function $x \mapsto f(x) + \varepsilon d(x, x_0)$, i.e., $f(x_0) \le f(x) + \varepsilon d(x, x_0)$ for all $x \in X$.

Proof ([BeLi00], p. 87) Choose $x_1 \in X$ arbitrary. Then, having chosen $x_1, x_2, \ldots, x_n \in X$, choose $x_{n+1} \in X$ such that
\[ f(x_{n+1}) + \varepsilon d(x_n, x_{n+1}) \le \inf\{f(x) + \varepsilon d(x_n, x) : x \in X\} + \frac{1}{2^n}. \tag{11.55} \]
The sequence $\{f(x_n)\}$ is “almost nonincreasing” in the sense that
\[ f(x_{n+1}) \le f(x_{n+1}) + \varepsilon d(x_n, x_{n+1}) \le f(x_n) + \frac{1}{2^n}, \]
hence $\{f(x_n)\}$ is convergent, due to the fact that $f$ is bounded below (for details, see Exercise 13.110). Summing the inequalities
\[ \varepsilon d(x_n, x_{n+1}) \le f(x_n) - f(x_{n+1}) + \frac{1}{2^n} \tag{11.56} \]
over $n \in \mathbb{N}$, we get from the convergence of $\{f(x_n)\}$ that $\sum_{n=1}^{\infty} d(x_n, x_{n+1})$ is convergent and that $\{x_n\}$ is thus a Cauchy sequence, hence convergent, say to $x_0$. We will show that $x_0$ is the required point. Indeed, from (11.55) we have for each $n \in \mathbb{N}$, and for each $x \in X$,
\[ f(x_{n+1}) \le f(x) + \varepsilon d(x_n, x) + \frac{1}{2^n}. \tag{11.57} \]
Given x ∈ X, take the limit in (11.57) for n → ∞ to get f (x0 ) ≤ f (x) + εd(x0 , x).
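A simple one-dimensional instance may help to visualize Theorem 1027. The Python sketch below is an added illustration, not the construction used in the proof: it takes the assumed example $f(x) = e^x$ on $X = \mathbb{R}$, which is bounded below but does not attain its infimum, and checks on a finite grid that $x_0 = \ln \varepsilon$ satisfies $f(x_0) \le f(x) + \varepsilon |x - x_0|$ (this follows from convexity, since $f'(x_0) = \varepsilon$). The function names, the grid, and the tolerance are hypothetical choices.

```python
# Illustrative sketch: checking the conclusion of the Ekeland Variational
# Principle for f(x) = exp(x) on the real line. All names are assumptions.
import math

def f(x):
    return math.exp(x)

def is_ekeland_point(x0, eps, grid, tol=1e-12):
    """Check f(x0) <= f(x) + eps*|x - x0| at every grid point (up to tol)."""
    return all(f(x0) <= f(x) + eps * abs(x - x0) + tol for x in grid)

eps = 0.1
x0 = math.log(eps)                                # here f'(x0) = eps
grid = [-20.0 + 0.01 * k for k in range(3001)]    # sample points in [-20, 10]

good = is_ekeland_point(x0, eps, grid)
bad = is_ekeland_point(0.0, eps, grid)            # f'(0) = 1 is larger than eps

print("x0 = ln(eps) passes the test:", good)
print("x0 = 0 passes the test:", bad)
```

A grid check is of course no proof; it only samples the inequality stated in the theorem.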
Remark 1028 By choosing conveniently the "starting point" $x_1$ in the proof of Theorem 1027 we can get $x_0$ such that (i) $f(x_0)$ is arbitrarily close to $b := \inf\{f(x) : x \in X\}$, and (ii) $d(x_0, x_1)$ is small. Indeed, fix $\delta > 0$ and take $x_1 \in X$ such that $f(x_1) - b < \delta$. Then carry on the selection of the subsequent elements $x_n$ to satisfy
\[
f(x_{n+1}) + \varepsilon d(x_n, x_{n+1}) \le \inf\{f(x) + \varepsilon d(x_n, x) : x \in X\} + \frac{\delta}{2^n}. \tag{11.58}
\]
We get, as in (11.56),
\[
\varepsilon d(x_k, x_{k+1}) \le f(x_k) - f(x_{k+1}) + \frac{\delta}{2^k}, \tag{11.59}
\]
and adding up (11.59) from $k = 1$ to $n$ we obtain, by the triangle inequality,
\[
\varepsilon d(x_1, x_{n+1}) \le f(x_1) - f(x_{n+1}) + \delta \sum_{k=1}^{n} \frac{1}{2^k}.
\]
Letting $n \to \infty$ we get
\[
\varepsilon d(x_1, x_0) \le f(x_1) - f(x_0) + \delta \sum_{k=1}^{\infty} \frac{1}{2^k} \le f(x_1) - b + \delta < \delta + \delta = 2\delta.
\]
This implies $d(x_1, x_0) < 2\delta/\varepsilon$, and $(b \le)\ f(x_0) \le f(x_1) + \varepsilon d(x_0, x_1) < f(x_1) + 2\delta < b + \delta + 2\delta = b + 3\delta$. It is enough to choose, for example, $\delta = \varepsilon^2$ to get simultaneously $d(x_1, x_0) < 2\varepsilon$ and $(b \le)\ f(x_0) < b + 3\varepsilon^2$.

The following result, due to the American mathematicians R. S. Palais and S. Smale, retains the basic necessary condition for a minimum (Fermat's Theorem 362) from the classical one-dimensional optimization analysis (see Fig. 11.39).

Corollary 1029 [Palais, Smale] Let $f$ be a bounded-below Fréchet differentiable real-valued function on a Banach space $X$. Then there is a sequence $\{x_n\}_{n=1}^{\infty}$ in $X$ such that (i) $f(x_n) \to \inf_{x \in X} f(x)$, and (ii) $\|f'(x_n)\| \to 0$.

Proof Put $b := \inf\{f(x) : x \in X\}$. According to Theorem 1027 and Remark 1028, we can choose a sequence $\{x_n\}_{n=1}^{\infty}$ in $X$ such that (*) $f(x_n) \le f(x) + (1/n)\|x_n - x\|$,
Fig. 11.39 Fermat’s Theorem 362 “almost” holds (Corollary 1029)
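and (**) $(b \le)\ f(x_n) < b + 1/n$, for all $x \in X$ and all $n \in \mathbb{N}$. From (**) we get (i) in the statement. From (*) it follows that, given $u \in S_X$ and $\lambda \in \mathbb{R}$,
\[
f(x_n) \le f(x_n + \lambda u) + \frac{1}{n}\|\lambda u\|.
\]
This shows that, for $\lambda > 0$,
\[
\frac{f(x_n + \lambda u) - f(x_n)}{\lambda} \ge -\frac{1}{n}, \tag{11.60}
\]
and for $\lambda < 0$,
\[
\frac{f(x_n + \lambda u) - f(x_n)}{\lambda} \le \frac{1}{n}. \tag{11.61}
\]
Since $f$ is Fréchet differentiable, letting $\lambda \to 0$ in (11.60) and (11.61) gives $|f'(x_n)(u)| \le 1/n$ for every $u \in S_X$, so $\|f'(x_n)\| \le 1/n$, and this shows (ii) in the statement.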
11.7.2
More on Differentiability of Convex and Lipschitz Functions
Convex Functions
The following result extends Proposition 813 to the case of Banach spaces.

Proposition 1030 Let $f$ be a convex function which is defined and bounded on an open convex set $C$ in a Banach space. Then $f$ is locally Lipschitz on $C$, i.e., for every $x \in C$ there is a ball $B(x, r) \subset C$ such that $f$ is Lipschitz on $B(x, r)$.

In order to prove Proposition 1030 we show first the following statement:

Lemma 1031 If a convex function $f$ is bounded by 1 on $B_X$, then $f$ is 5-Lipschitz on $\frac{1}{2}B_X$.

Proof of Lemma 1031 Assume $x, y \in \frac{1}{2}B_X$ and assume that $f(y) - f(x) > 5\|x - y\|$. Consider the point $z = y + \frac{y - x}{2\|y - x\|}$ (see Fig. 11.40). Clearly, $z \in B_X$. Since the points $x, y, z$ are on a line in this order, the three chord property (see Proposition 810) gives
\[
\frac{f(z) - f(y)}{\|z - y\|} \ge \frac{f(y) - f(x)}{\|y - x\|} > 5.
\]
We have $\|z - y\| = \frac{1}{2}$. Thus $f(z) > f(y) + \frac{5}{2} \ge -1 + \frac{5}{2} = \frac{3}{2}$, a contradiction.

Fig. 11.40 The three points in the proof of Lemma 1031

Proof of Proposition 1030 It follows easily from Lemma 1031.
Proposition 1032 Every convex function $f$ defined on a finite-dimensional normed space $X$ is continuous.

Proof Let $\{e_1, e_2, \ldots, e_n\}$ be an algebraic basis of $X$. Put $M := \max\{|f(e_i)| : i = 1, 2, \ldots, n\}$. Since all norms on $X$ are equivalent (see Theorem 908), we may assume that the norm $\|\cdot\|$ of $X$ has the property that $\|x\| \ge \sum_{i=1}^{n} |x_i|$, where $x = \sum_{i=1}^{n} x_i e_i$. Then for $x \in B_X$ we get $\sum_{i=1}^{n} |x_i| \le 1$ and so we have
\[
|f(x)| = \Bigl|\sum_{i=1}^{n} x_i f(e_i)\Bigr| \le \sum_{i=1}^{n} |x_i| \cdot |f(e_i)| \le M \sum_{i=1}^{n} |x_i| \le M.
\]
Thus, by Proposition 1030, $f$ is locally Lipschitz, hence continuous, on $B_X$.

The following result is due to the Polish mathematician S. Mazur.
Theorem 1033 (Mazur) Let $X$ be a separable Banach space and $f$ be a continuous convex function on $X$. Then the set of points of Gâteaux differentiability of $f$ is a dense $G_\delta$ in $X$.

Proof Let $\{x_m\}_{m=1}^{\infty}$ be a dense sequence in the unit sphere of $X$. For $n, m \in \mathbb{N}$, put
\[
G_{n,m} := \Bigl\{x \in X :\ \text{there exists } \delta > 0 \text{ so that } \frac{f(x + \delta x_m) + f(x - \delta x_m) - 2f(x)}{\delta} < \frac{1}{n}\Bigr\}.
\]
Since any convex continuous function on the real line is differentiable at all but countably many points (see Remark 812.4), each $G_{n,m}$ is dense in $X$. Since by Proposition 1030 the function $f$ is locally Lipschitz on $X$, each $G_{n,m}$ is open in $X$. To see this, we may assume, without loss of generality, that $f$ is $L$-Lipschitz on $X$ for some $L > 0$. Fix $x \in G_{n,m}$. Then, by the definition of $G_{n,m}$, there exist $\delta > 0$ and $C \in (0, 1/n)$ such that
\[
\frac{f(x + \delta x_m) + f(x - \delta x_m) - 2f(x)}{\delta} < C.
\]
Choose $0 < \varepsilon < \min\bigl\{\frac{\delta}{2}, \frac{\delta}{4L}\bigl(\frac{1}{n} - C\bigr)\bigr\}$. Then, if $\|z - x\| < \varepsilon$, we have
\[
\frac{f(z + \delta x_m) + f(z - \delta x_m) - 2f(z)}{\delta}
\le \frac{f(x + \delta x_m) + L\|x - z\| + f(x - \delta x_m) + L\|x - z\| - 2f(x) + 2L\|x - z\|}{\delta}
\le C + \frac{4L\varepsilon}{\delta} < \frac{1}{n}.
\]
[...]
\[
\inf_{\delta > 0}\ \sup_{y \in S_X} \frac{f(x + \delta y) + f(x - \delta y) - 2f(x)}{\delta} < \frac{1}{n}
\]
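Since $f$ is convex, $\frac{f(x + \delta y) + f(x - \delta y) - 2f(x)}{\delta}$ is decreasing as $\delta \searrow 0^+$ and we get $G = \bigcap_n G_n$. Hence, it suffices to show that each $G_n$ is an open subset of $X$. To this end, let $x \in G_n$ and let $f$ be $L$-Lipschitz on $B(x, \alpha)$ (see Proposition 813). There exist $C < \frac{1}{n}$ and $\delta < \frac{\alpha}{2}$ such that
\[
\sup_{y \in S_X} \frac{f(x + \delta y) + f(x - \delta y) - 2f(x)}{\delta} < C.
\]
Choose $0 < \varepsilon < \min\bigl\{\frac{\delta}{2}, \frac{\delta}{4L}\bigl(\frac{1}{n} - C\bigr)\bigr\}$. We claim that $B(x, \varepsilon) \subset G_n$. Indeed, given $z \in B(x, \varepsilon)$ we have for all $y \in S_X$:
\[
\frac{f(z + \delta y) + f(z - \delta y) - 2f(z)}{\delta}
\le \frac{f(x + \delta y) + L\|z - x\| + f(x - \delta y) + L\|z - x\| - 2f(x) + 2L\|z - x\|}{\delta}
\le C + \frac{4L\varepsilon}{\delta} < \frac{1}{n}.
\]
[...]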
For $m \in \mathbb{N}$ put $A_m = \{x \in A : m_x = m\}$. Given $m \in \mathbb{N}$, consider the cover of $X^*$ by all open balls in $X^*$ of radius $\frac{1}{24m}$. Since $X^*$ is separable, by the Lindelöf property, let $\{B_k^m\}_k$ be a countable subfamily of these balls that covers $X^*$. For $k \in \mathbb{N}$ define $A_{m,k} = \{x \in A_m : p_x \in B_k^m\}$. We have $A = \bigcup_{m,k} A_{m,k}$. Hence it is enough to show that $A_{m,k}$ is nowhere dense for each $m, k$. Fix $m$ and $k$, choose any $x \in A_{m,k}$ and a neighborhood $U$ of $x$. We will show that there is a point $y \in U$ that has a neighborhood $V$ such that $V \cap A_{m,k} = \emptyset$. By Proposition 1030, assume that $U$ is of the form $U = B_X(x, r)$, the open $r$-ball centered at $x$, where $r$ is chosen so that $f$ is Lipschitz with constant $K > 1/m$ on $B_X(x, r)$. Since $x \in A_m$, there is $h \in X$, $\|h\| < r$, such that
\[
f(x + h) - f(x) > \frac{\|h\|}{m} + p_x(h). \tag{11.62}
\]
We will show that $B_X(x + h, \|h\|/12Km) \cap A_{m,k} = \emptyset$. Assume that there is $z \in B_X(x + h, \|h\|/12Km) \cap A_{m,k}$. As $z \in A_{m,k}$ and $x \in A_{m,k}$, by the definition of $A_{m,k}$ we obtain $\|p_x - p_z\| < \frac{1}{12m}$. By the choice of $p_z$ we have
\[
f(x) - f(z) \ge p_z(x - z). \tag{11.63}
\]
Adding (11.62) and (11.63) we have
\[
f(x + h) - f(z) > p_z(x - z) + \frac{\|h\|}{m} + p_x(h)
= p_x(x + h - z) + (p_z - p_x)(x - z) + \frac{\|h\|}{m}. \tag{11.64}
\]
Since $\|x + h - z\| \le \frac{\|h\|}{12Km}$ and $\|p_x\| \le K$, we have $|p_x(x + h - z)| \le \frac{\|h\|}{12m}$. Furthermore, $\|z - x\| \le \|z - (x + h)\| + \|h\| \le \frac{\|h\|}{12Km} + \|h\| \le 2\|h\|$ and $\|p_x - p_z\| < \frac{1}{12m}$. Therefore, $|(p_z - p_x)(x - z)| \le \frac{1}{12m} \cdot 2\|h\| = \frac{\|h\|}{6m}$. Hence from (11.64) we obtain
\[
f(x + h) - f(z) > \frac{\|h\|}{m} - \frac{\|h\|}{6m} - \frac{\|h\|}{3m} = \frac{\|h\|}{2m}.
\]
This contradicts the fact that $|f(x + h) - f(z)| \le K\|x + h - z\| \le \frac{\|h\|}{12m}$. Therefore, $f$ is Fréchet differentiable on a residual set in $X$. The fact that it is actually Fréchet differentiable on a dense $G_\delta$ set in $X$ follows from Lemma 1036.

On the real line we do have the Lebesgue theorem on differentiability (a.e.) for Lipschitz functions (as they have finite variation, see Theorem 424). However, even for finite-dimensional spaces the situation is difficult: we have the following result, a special case of a theorem of the German mathematician H. A. Rademacher.
Theorem 1038 (Rademacher) Every Lipschitz real-valued function on a finite-dimensional Banach space is Fréchet differentiable at points of a dense set.

For the proof we refer, e.g., to [BoVan10, Theorem 2.3.1] or [FHHMZ11, Theorem 11.21]. Note that in Theorem 1038 we do not speak of the $G_\delta$-property of the set of points of differentiability, as the set of points of differentiability of a Lipschitz real-valued function defined on the real line may not be a residual set (though the function is differentiable a.e.). The following important result was obtained by the Czech-British mathematician D. Preiss in the nineties of the twentieth century:

Theorem 1039 (Preiss) If $X$ is a Banach space such that $X^*$ is separable, then every Lipschitz real-valued function on $X$ is Fréchet differentiable at points of a dense set.

Its (constructive) proof is beyond the scope of this introductory text. For a reference, see, e.g., [Le02]. Let us mention that the result is so subtle that, even now, it is not known whether three Lipschitz real-valued functions on a Hilbert space have a common point of Fréchet differentiability, though it has been recently proved that two functions do have one. Concerning smooth approximation on infinite-dimensional Banach spaces we mention the following result, due to R. Bonic and J. Frampton.

Theorem 1040 (Bonic, Frampton) Let $X$ be a Banach space with separable dual. Then any continuous real-valued function on $X$ can be uniformly approximated by Fréchet continuously differentiable functions.

On the other hand we have the following result of the Czech mathematician J. Kurzweil.

Theorem 1041 (Kurzweil) The space $C[0, 1]$ does not admit any Fréchet differentiable real-valued function with bounded nonempty support.

Using this fact and a standard composition with continuous real-valued functions on the real numbers, it is not difficult to prove that Kurzweil's Theorem 1041 has the following corollary.
Corollary 1042 The norm $\|\cdot\|_\infty$ of $C[0, 1]$ cannot be approximated by Fréchet differentiable functions uniformly on the unit ball.

Rough Norms
The following concept of a rough norm was studied, and important results on it were proved, by the Canadian mathematician John Whitfield.

Definition 1043 Let $\varepsilon > 0$. We say that a norm $\|\cdot\|$ on a Banach space $X$ is $\varepsilon$-rough if for all $x \in X$,
\[
\limsup_{h \to 0} \frac{\|x + h\| + \|x - h\| - 2\|x\|}{\|h\|} \ge \varepsilon. \tag{11.65}
\]
We say that a norm $\|\cdot\|$ is rough if it is $\varepsilon$-rough for some $\varepsilon > 0$.

Lemma 1044 The standard norm of $C[0, 1]$ is 2-rough.

Proof Let $f$ be an element in the unit sphere of $C[0, 1]$ and let $\lambda > 0$. Assume without loss of generality that there is $t_0 \in [0, 1]$ such that $f(t_0) = 1$. Pick two different points $s$ and $t$ of $[0, 1]$ such that $f(s) > 1 - \lambda^2$ and $f(t) > 1 - \lambda^2$. Consider a function $g$ in the unit sphere of $C[0, 1]$ such that $g(s) = 1$ and $g(t) = -1$. Then we have
\[
\|f + \lambda g\| + \|f - \lambda g\| \ge (f + \lambda g)(s) + (f - \lambda g)(t) = f(s) + f(t) + \lambda(g(s) - g(t)) \ge 2 - 2\lambda^2 + 2\lambda.
\]
This shows that
\[
\frac{\|f + \lambda g\| + \|f - \lambda g\| - 2}{\lambda} \ge 2 - 2\lambda.
\]
As this holds for all $\lambda > 0$, we get for all $f$ in the unit sphere of $C[0, 1]$,
\[
\limsup_{h \to 0} \frac{\|f + h\| + \|f - h\| - 2}{\|h\|} \ge 2.
\]
Thus the norm of $C[0, 1]$ is 2-rough.

The following result is due to B. Leach and J. H. M. Whitfield.
Theorem 1045 (Leach, Whitfield) If a Banach space $X$ admits an equivalent rough norm, then it cannot admit an equivalent Fréchet differentiable norm.

Proof Consider $X$ in its $\varepsilon$-rough norm $\|\cdot\|$, and let $|\cdot|$ be an equivalent Fréchet differentiable norm on $X$. Let a constant $C > 0$ be such that $|x| \ge C\|x\|$ for all $x \in X$. Define the function $f$ on $X$ by
\[
f(x) = |x|^2 - \|x\| \quad \text{for } x \in X.
\]
Then the function $f$ is a bounded-below continuous function on $X$. Indeed, for each $x \in X$, we have $f(x) = |x|^2 - \|x\| \ge C^2\|x\|^2 - \|x\| = \|x\|(C^2\|x\| - 1)$. So, if $\|x\| > C^{-2}$, then $f(x) > 0$. If $\|x\| \le C^{-2}$, then $f(x) = |x|^2 - \|x\| \ge -\|x\| \ge -C^{-2}$. Thus we can apply Ekeland's Variational Principle (Theorem 1027) to $f$. Accordingly, there is $x_0 \in X$ such that
\[
f(x_0 + h) \ge f(x_0) - \frac{\varepsilon}{4}\|h\| \quad \text{for every } h \in X.
\]
Write down this inequality for $h$ and for $-h$ and add them. We obtain, for all $h \in X$,
\[
|x_0 + h|^2 + |x_0 - h|^2 - \|x_0 + h\| - \|x_0 - h\| \ge 2|x_0|^2 - 2\|x_0\| - \frac{\varepsilon}{2}\|h\|.
\]
Hence,
\[
\limsup_{h \to 0} \frac{|x_0 + h|^2 + |x_0 - h|^2 - 2|x_0|^2}{\|h\|}
\ge \limsup_{h \to 0} \frac{\|x_0 + h\| + \|x_0 - h\| - 2\|x_0\|}{\|h\|} - \frac{\varepsilon}{2} > 0.
\]
This contradicts the Fréchet differentiability of $|\cdot|^2$ at $x_0$.
11.7.3
More on Fixed Point Theorems
A fundamental theorem in Fixed Point Theory is the so-called Banach Contraction Principle for complete metric spaces (see Theorem 652). We saw in Sect. 6.11 that non-contractive mappings on complete metric spaces may lack fixed points. Another cornerstone is Brouwer's Fixed Point Theorem, ensuring that every continuous function from a nonempty convex compact subset of a finite-dimensional Banach space into itself has a fixed point (for a proof see, e.g., [FHHMZ11], Theorem 12.14). For a one-dimensional version of this result see Proposition 651 and Exercise 13.205. The following result, due to J. Schauder, P. K. Lin, and Y. Sternfeld, extends Brouwer's theorem to the infinite-dimensional case. We refer to, e.g., [BeLi00], for a proof.

Theorem 1046 (Schauder, Lin, Sternfeld) Let $C$ be a nonempty convex compact set in a Banach space and $f$ be a continuous map from $C$ into $C$. Then $f$ has a fixed point. If $C$ is a closed noncompact convex set in a Banach space, then there is a Lipschitz mapping $f$ from $C$ into $C$ that has no fixed point.

Example 1047 Let the mapping $f$ from the closed unit ball of $c_0$ into itself be defined by $f(x) = (1 - \|x\|_\infty, x_1, x_2, \ldots)$ for $x = (x_1, x_2, \ldots) \in B_{c_0}$. Then the mapping $f$ is Lipschitz, and it has no fixed point. Indeed, if $x$ is a fixed point of $f$, then by inspection $x_1 = 1 - \|x\|_\infty = x_2 = x_3 = \ldots$, so in order that $x \in c_0$, we have to have $x_i = 0$ for each $i \in \mathbb{N}$. This is a contradiction, as $x_1 = 1 - \|x\|_\infty$. ♦
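To see why the argument in Example 1047 needs infinitely many coordinates, one can look at a finite-dimensional truncation of the map. The Python sketch below is only an illustration: the names `truncated_map` and `sup_dist` and the dimension `N` are hypothetical, and the truncation (dropping the last coordinate after the shift) is an assumption made here, not a construction from the text. In $\mathbb{R}^N$ with the supremum norm the truncated map does have a fixed point, the constant vector $(1/2, \ldots, 1/2)$, in agreement with Brouwer's theorem; constant nonzero sequences are exactly what membership in $c_0$ rules out.

```python
def truncated_map(x):
    """Finite analogue of f(x) = (1 - ||x||_inf, x_1, x_2, ...).

    The coordinates are shifted one place to the right and the last
    one is dropped, so the result has the same length as x.
    """
    sup_norm = max(abs(t) for t in x)
    return [1.0 - sup_norm] + x[:-1]

def sup_dist(x, y):
    """Distance induced by the supremum norm on R^N."""
    return max(abs(a - b) for a, b in zip(x, y))

N = 8
fixed = [0.5] * N
is_fixed = truncated_map(fixed) == fixed

# Spot check of the Lipschitz property on two sample points of the unit ball.
x = [0.3, -0.2, 0.9, 0.0]
y = [-0.5, 0.1, 0.4, 0.2]
lipschitz_ok = sup_dist(truncated_map(x), truncated_map(y)) <= sup_dist(x, y) + 1e-12
```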
11.8
An Application: Periodic Distributions
11.8.1
Introduction
Physicists and engineers routinely apply the rules of Calculus to "functions" such as the Dirac delta (which assigns 0 to $x \ne 0$ and $+\infty$ to $x = 0$), for example "computing" its definite integral or its derivative, and letting it form part of a differential equation. It is possible to develop a theory that justifies those manipulations, a framework in which the definition and the extended calculus sit properly. "Extended" is a crucial word, since this new framework really includes the classical one. This was the purpose of the so-called "Distribution Theory," today a part of Mathematical Analysis thanks to the essential contributions of the French mathematician L. Schwartz. Many of the processes considered by engineers are periodic. It is easier to establish the basis of Distribution Theory in the case of the so-called "periodic distributions." Since one of the objectives in mind is to treat topics in Fourier Analysis in this framework, complex-valued functions of one real variable will be considered. In this section we are indebted in part to the reference [Bea73].
11.8.2
The Basic Idea
In the classical sense, a function is determined by assigning to each element $x$ in its domain a value $f(x)$ in its range. In the new setting, a "function" is determined by its action on other functions via integration. This is to say that we identify $f$ by the result of computing $\frac{1}{2\pi}\int_0^{2\pi} f(x)\phi(x)\,dx$ for a "substantial" family of "simple" functions $\phi$, called "test functions" (the integral is taken over the interval $[0, 2\pi]$ since $2\pi$-periodicity is what we are interested in). In this new context we can still define the standard operational rules. For example, differentiation is carried out in a very simple way: the key point is the use of the integration-by-parts formula (7.51), see Theorem 705. Indeed, assume that, given a function $f$, we are able to compute $\frac{1}{2\pi}\int_0^{2\pi} f(x)\phi(x)\,dx$ for all $\phi$ in a suitable class of test functions (so identifying $f$ by this procedure, as we suggested above). Then, in order to identify $f'$ (i.e., to compute $\frac{1}{2\pi}\int_0^{2\pi} f'(x)\phi(x)\,dx$ for $\phi$ in the class), it is enough to recall that, under suitable conditions,
\[
\frac{1}{2\pi}\int_0^{2\pi} f'(x)\phi(x)\,dx = \frac{1}{2\pi}\Bigl(f(2\pi)\phi(2\pi) - f(0)\phi(0) - \int_0^{2\pi} f(x)\phi'(x)\,dx\Bigr)
\]
for all $\phi$ in the class (clearly, we must consider classes of test functions that are differentiable). The right-hand member of the previous formula provides, thus, the "derivative" of $f$ (the left-hand member) without actually computing $f'$. Several remarks are in order.
1. Despite a common understanding, it is definitely easier to integrate than to differentiate (ask a computer).
2. The process of integrating is incomparably more stable than that of differentiating: perturb a function slightly and its integral is almost the same, while its derivative can be totally different.
3. If the class of test functions consists of infinitely differentiable functions, we can repeat the procedure again and again, "computing" higher and higher derivatives of $f$.
11.8.3
The Basic Definitions
The Test Functions
We consider the space $T$ of all infinitely differentiable and $2\pi$-periodic scalar-valued functions defined on $\mathbb{R}$, endowed with the metric
\[
d(\phi, \psi) := \sum_{n=0}^{\infty} \frac{1}{2^{n+1}}\, \frac{d_\infty(D^n\phi, D^n\psi)}{1 + d_\infty(D^n\phi, D^n\psi)}, \quad \text{for all } \phi, \psi \in T, \tag{11.66}
\]
where $D^n$ denotes the $n$-th derivative operator. This metric space $(T, d)$ is called the space of test functions for the periodic distributions. Examples of functions in $T$ are the exponential function $e^{it}$, the sine $\sin t$ and cosine $\cos t$ functions, the compositions of those trigonometric functions with infinitely differentiable functions, and the functions in Exercises 13.219 and 13.225.

Proposition 1048 The metric space $(T, d)$ is complete.

Proof Let $\{f_n\}_{n=1}^{\infty}$ be a Cauchy sequence in $T$. Fix $k \in \{0, 1, 2, \ldots\}$. The sequence $\{D^k f_n\}_{n=1}^{\infty}$ is Cauchy in the Banach space $(C_P[0, 2\pi], \|\cdot\|_\infty)$ (see Subsect. 11.4.2), hence it converges to an element $g_k \in C_P[0, 2\pi]$. We shall prove that $g_0 \in T$, and that $D^k g_0 = g_k$ for $k \in \mathbb{N}$. All together, this will show that $\{f_n\}$ converges to $g_0$ in the metric (11.66). Note that, for $x \in \mathbb{R}$, $k \in \{0, 1, 2, \ldots\}$, and $n \in \mathbb{N}$, the Fundamental Theorem of Calculus 685 gives
\[
D^k f_n(x) = D^k f_n(0) + \int_0^x D^{k+1} f_n(t)\,dt. \tag{11.67}
\]
Letting $n \to \infty$ in (11.67) we get
\[
g_k(x) = g_k(0) + \int_0^x g_{k+1}(t)\,dt \tag{11.68}
\]
(the last summand because the convergence of $D^{k+1} f_n$ to $g_{k+1}$ is uniform, see Corollary 697). Now, (11.68) implies that $Dg_k = g_{k+1}$ (see again Theorem 685). By induction we get $D^k g_0 = g_k$ for $k \in \mathbb{N}$ (in particular, $D^k g_0$ exists for every $k \in \mathbb{N}$, hence $g_0 \in T$).
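The metric (11.66) can be evaluated explicitly for simple test functions, which makes it visible that $d$-convergence requires uniform convergence of every derivative. The Python sketch below is a hedged illustration only: the function name `d_to_zero` and the truncation length are choices made here, not part of the text. It uses the fact that for $\phi(x) = a \sin(mx)$ one has $\|D^k\phi\|_\infty = |a|\,m^k$, and that the neglected tail of the series is smaller than $2^{-\text{terms}}$.

```python
def d_to_zero(a, m, terms=60):
    """Truncation of d(phi, 0) from (11.66) for phi(x) = a * sin(m * x).

    Uses ||D^k phi||_inf = |a| * m**k; the neglected tail of the series
    is bounded by 2**(-terms).
    """
    total = 0.0
    for k in range(terms):
        t = abs(a) * m**k                  # sup norm of the k-th derivative
        total += t / (1.0 + t) / 2**(k + 1)
    return total

# For phi = sin every derivative has sup norm 1, so d(sin, 0) = 1/2.
d_sin = d_to_zero(1.0, 1)

# For phi_m(x) = sin(m x) / m**3 the sup norms m**(-3) tend to zero,
# yet the derivatives of order >= 3 keep d(phi_m, 0) away from zero.
distances = [d_to_zero(m**-3, m) for m in (1, 10, 100, 1000)]
```

The second computation shows a sequence that is null for the supremum norm but not for $d$: all terms of the series with $k \ge 3$ contribute at least $\frac{1}{2} \cdot 2^{-(k+1)}$.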
The next result shows two properties of the space of $2\pi$-periodic test functions. We say that a space has the Heine–Borel property if it satisfies the Heine–Borel Theorem 96, i.e., every closed and bounded subset is compact. This, together with the separability, tells us that, in a sense, the space is rather small, hence, speaking again loosely, the dual space, i.e., the space of all $2\pi$-periodic distributions, is rather big.

Proposition 1049 The space $(T, d)$ has the Heine–Borel property and is separable.

Proof We shall prove that every bounded sequence $\{f_n\}_{n=1}^{\infty}$ in $T$ has a $d$-convergent subsequence. Fix $k = 0, 1, 2, \ldots$, and consider the sequence $\{D^k f_n\}_{n=1}^{\infty}$. This is, by assumption, $d_\infty$-bounded. Moreover, the sequence $\{D^{k+1} f_n\}_{n=1}^{\infty}$ is also $d_\infty$-bounded. Use Exercise 13.402 to conclude that the sequence $\{D^k f_n\}_{n=1}^{\infty}$ is equicontinuous. By Theorem 648, there exists a subsequence that $d_\infty$-converges. Proceed then recursively: apply the previous argument to the case $k = 0$ to obtain a subsequence $\{f_n^0\}_{n=1}^{\infty}$ that $d_\infty$-converges, say to $h_0$. Apply the argument again to the sequence $\{f_n^0\}_{n=1}^{\infty}$ to obtain a subsequence $\{f_n^1\}_{n=1}^{\infty}$ such that $\{D^1 f_n^1\}_{n=1}^{\infty}$ is $d_\infty$-convergent, say to $h_1$. Continue in this way. The sequence $\{g_n := f_n^n\}_{n=1}^{\infty}$ has the property that $\{D^k g_n\}_{n=1}^{\infty}$ is $d_\infty$-convergent to $h_k$ for all $k = 0, 1, 2, \ldots$ Now, the argument in the proof of Proposition 1048 shows that $h_0 \in T$ and that $D^k h_0 = h_k$ for $k \in \mathbb{N}$. This implies that $g_n \to h_0$ in the metric $d$.

We shall prove that $(T, d)$ is separable. Proceed by contradiction by assuming that $(T, d)$ is not separable. Let
\[
p_n(f, g) := \sum_{k=0}^{n} \frac{1}{2^{k+1}}\, \frac{d_\infty(D^k f, D^k g)}{1 + d_\infty(D^k f, D^k g)}, \quad \text{for all } f, g \in T \text{ and for all } n \in \mathbb{N},
\]
and put $U_n := \{f \in T : p_n(f, 0) < 1/n\}$ for $n \in \mathbb{N}$. We claim that then, for some $N \in \mathbb{N}$, the space $(T, p_N)$ is not separable. If, on the contrary, each $(T, p_n)$ is separable, we may find a countable $p_n$-dense subset $D_n$ of $T$. The set $D := \bigcup_{n=1}^{\infty} D_n$ is $d$-dense in $T$. Indeed, given $f \in T$ and $\varepsilon > 0$, we can find $N \in \mathbb{N}$ such that $\sum_{k=N+1}^{\infty} \frac{1}{2^{k+1}} < \varepsilon/2$. Since $D_N$ is $p_N$-dense in $T$, there exists $g \in D_N$ such that $p_N(f, g) < \varepsilon/2$. It follows that $d(f, g) < \varepsilon/2 + \varepsilon/2 = \varepsilon$, and so $D$ is $d$-dense. Since $D$ is countable, this contradicts the nonseparability of $(T, d)$, and the claim is proved. In view of Theorem 582, there exist $\delta > 0$ and an uncountable subset $S_N$ of $U_N$ that is $\delta$-separated in $(T, p_N)$. Put $k_N := 1$. Since $T = \bigcup_{k=1}^{\infty} kU_{N+1}$, there exists $k_{N+1} \in \mathbb{N}$ such that $S_N \cap k_{N+1}U_{N+1}$ contains a proper uncountable subset $S_{N+1}$. Continue in this way to get a sequence $\{k_n\}_{n=N}^{\infty}$ in $\mathbb{N}$ and a strictly decreasing sequence $\{S_n\}_{n=N}^{\infty}$ of uncountable subsets of $T$ such that $S_n \subset k_n U_n$ for all $n \ge N$. For each $n \ge N$, choose $f_n \in S_n \setminus S_{n+1}$. Since $p_m(f_n, 0) \le k_m$ for all $n \ge m$ and for all $m \ge N$, the sequence $\{f_n\}_{n=N}^{\infty}$ is $d$-bounded in $T$. By the first part of the proof, $\{f_n\}_{n=N}^{\infty}$ has a $d$-convergent subsequence $\{f_{n_k}\}_{k=1}^{\infty}$. In particular, $p_N(f_{n_i}, f_{n_j}) < \delta$ for $n_i$ and $n_j$ big enough, and this contradicts the property of the set $S_N$.

Propositions 1048 and 1049 give, in particular, that the space $(T, d)$ is a Polish space. However, its structure is richer, since it is a vector space (accordingly, its elements, the infinitely differentiable $2\pi$-periodic functions, are referred to as "vectors"). Both structures, the algebraic and the topological ones, are, moreover,
intimately related (we say that they are compatible): the two algebraic operations of sum of vectors and product of a scalar and a vector are continuous functions (something that can be checked easily). Technically speaking, our space $T$ is a topological vector space; more precisely, it is a Fréchet space. A Fréchet space (from M. Fréchet) is a vector space $X$ endowed with a topology $\mathcal{T}$ that has the following properties:
1. $\mathcal{T}$ is compatible with the linear structure, i.e., both mappings $(x, y) \mapsto x + y$ and $(\lambda, x) \mapsto \lambda x$, from $X \times X$ into $X$ and from $\mathbb{R} \times X$ into $X$, respectively, are continuous.
2. $\mathcal{T}$ is induced by a metric $d$ (i.e., given a sequence $\{x_n\}$ in $X$ and an element $x \in X$, we have $x_n \to x$ in $\mathcal{T}$ if and only if $d(x_n, x) \to 0$), and the metric $d$ is translation-invariant (i.e., $d(x + z, y + z) = d(x, y)$ for all $x, y, z \in X$).
3. The metric space $(X, d)$ is complete.

Remark 1050 In Sect. 11.1 we introduced the class of Banach spaces and the subclass of Hilbert spaces. It is worth mentioning that the metric on $T$ defined in (11.66) does not come from a norm on $T$ that makes $T$ a normed space. In order to prove this, we shall show that if $(T, \|\cdot\|)$ is a normed space, then there exists a $\|\cdot\|$-null sequence $\{\phi_m\}$ in $T$ that is not $d$-null. We shall need the following.

Lemma 1051 Let $F : T \to \mathbb{R}$ be a positively homogeneous function. Then $F$ is $d$-continuous at 0 if and only if there exists $n \in \mathbb{N}$ such that
\[
|F(\phi)| \le n\bigl(\|\phi\|_\infty + \ldots + \|D^n\phi\|_\infty\bigr), \quad \text{for all } \phi \in T. \tag{11.69}
\]
Proof Assume first that (11.69) does not hold. Then, given $n \in \mathbb{N}$, there exists $\phi_n \in T$ such that $|F(\phi_n)| > n(\|\phi_n\|_\infty + \ldots + \|D^n\phi_n\|_\infty)$ $(\ge n\|D^k\phi_n\|_\infty$ for all $k = 0, 1, 2, \ldots, n)$. It follows that
\[
\Bigl\|\frac{D^k\phi_n}{|F(\phi_n)|}\Bigr\|_\infty = \frac{1}{|F(\phi_n)|}\|D^k\phi_n\|_\infty < \frac{1}{n} \quad \text{for } k = 0, 1, 2, \ldots, n. \tag{11.70}
\]
Letting $n \to \infty$ in (11.70) we get $\bigl\|\frac{D^k\phi_n}{|F(\phi_n)|}\bigr\|_\infty \to 0$ as $n \to \infty$. Since this is true for $k = 0, 1, 2, \ldots$, we get $d(\phi_n/|F(\phi_n)|, 0) \to 0$. However, due to the positive homogeneity of $F$, we have $|F(\phi_n/|F(\phi_n)|)| = 1$ for all $n \in \mathbb{N}$, so $F$ is not $d$-continuous at 0. Conversely, assume that (11.69) holds. Let $\{\phi_m\}$ be a $d$-null sequence in $T$. In particular, for every $k = 0, 1, 2, \ldots$ we have $\|D^k\phi_m\|_\infty \to 0$ as $m \to \infty$, so $F(\phi_m) \to 0$. This proves that $F$ is continuous at 0.

Now we can conclude the proof of the assertion: Put
\[
\phi_m(x) := \frac{\sin(mx)}{m^{n+1}}, \quad \text{for } m \in \mathbb{N},
\]
where $n$ is given by Lemma 1051 for the positively homogeneous function $\|\cdot\|$. Observe that $\|D^k\phi_m\|_\infty = m^{k-n-1}$ for $m \in \mathbb{N}$ and $k = 0, 1, 2, \ldots, n$. It follows from (11.69) that
\[
\|\phi_m\| \le n \sum_{k=0}^{n} \frac{1}{m^{n-k+1}} \le \frac{n}{m - 1}, \quad \text{for } m \ge 2,
\]
hence $\|\phi_m\| \to 0$. However, $\|D^{n+1}\phi_m\|_\infty = 1$ for all $m \in \mathbb{N}$, and so $\{d(\phi_m, 0)\}_{m=1}^{\infty}$ does not converge to zero.

The Space of Periodic Distributions
We can now introduce the following definition.

Definition 1052 The space of periodic distributions, denoted $PD$, is the topological dual of the space $(T, d)$, i.e., the space of all linear and continuous functions from $T$ into $\mathbb{C}$.

As a consequence of Lemma 1051 we get that, given $F \in PD$, there exist a constant $C > 0$ and an integer $n \in \mathbb{N} \cup \{0\}$ such that
\[
|F(\phi)| \le C\bigl(\|\phi\|_\infty + \|D(\phi)\|_\infty + \ldots + \|D^n(\phi)\|_\infty\bigr) \quad \text{for all } \phi \in T. \tag{11.71}
\]
The minimum of such $n \in \mathbb{N} \cup \{0\}$ is said to be the order of the distribution $F$.

Example 1053 It is important to show that some objects of the classical Calculus (continuous $2\pi$-periodic functions, more generally Riemann integrable or, even more generally, Lebesgue integrable functions defined on $[0, 2\pi]$) can be viewed, in fact, as elements of $PD$ (the identification, as we mentioned above, to be done via integration), and that some not-so-classical, although "familiar," objects like the Dirac delta function also belong to $PD$. This is done below. Some of the arguments depend on the following simple fact, a consequence of the particular behavior of the metric $d$ on $T$: given a $d$-null sequence $\{\phi_n\}$ in $T$, we have, in particular, $\|\phi_n\|_\infty \to 0$.
1. Let $f$ be a Lebesgue integrable complex-valued function defined on $[0, 2\pi]$. Let us define a function $F_f : T \to \mathbb{C}$ as
\[
F_f(\phi) := \frac{1}{2\pi}\int_{[0, 2\pi]} f(t)\phi(t)\,dt \quad \text{for all } \phi \in T.^{4} \tag{11.72}
\]
Note that $f\phi$ is a measurable function (see Proposition 402) that, moreover, is Lebesgue integrable on $[0, 2\pi]$ (see Remark 757). Observe that $F_f$ is a linear function: indeed, given $\phi, \psi \in T$ and $\alpha, \beta \in \mathbb{C}$, we have
\[
\int_{[0, 2\pi]} f(t)\bigl(\alpha\phi(t) + \beta\psi(t)\bigr)\,dt = \alpha\int_{[0, 2\pi]} f(t)\phi(t)\,dt + \beta\int_{[0, 2\pi]} f(t)\psi(t)\,dt.
\]
$^{4}$ Due to the $2\pi$-periodic character of both $f$ and $\phi \in T$, we may assume, if convenient, that the integral in (11.72) is extended over any closed interval having length $2\pi$, the result being independent of the chosen interval.
Fig. 11.41 The construction in Remark 1054.1
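Moreover, the mapping $F_f$ from $T$ into $\mathbb{C}$ is $d$-continuous. Indeed,
\[
|F_f(\phi)| \le \frac{1}{2\pi}\int_{[0, 2\pi]} |f(t)| \cdot |\phi(t)|\,dt \le \Bigl(\frac{1}{2\pi}\int_{[0, 2\pi]} |f(t)|\,dt\Bigr)\|\phi\|_\infty, \tag{11.73}
\]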
and the continuity of Ff at 0 follows from Lemma 1051. This is enough for ensuring the continuity of Ff at every point in T, due to the linearity of Ff and the compatibility of the algebraic and geometric structures of T. We proved, all together, that Ff ∈ PD for each Lebesgue-integrable complex-valued function f defined on [0, 2π]. Moreover, it follows from (11.73) that Ff has order 0. 2. Let t ∈ [0, 2π]. Let δt be the function defined on T as δt (φ) := φ(t), for all φ ∈ T.
(11.74)
The mapping $\delta_t$ is linear: indeed, $\delta_t(\alpha\phi + \beta\psi) = (\alpha\phi + \beta\psi)(t) = \alpha\phi(t) + \beta\psi(t) = \alpha\delta_t(\phi) + \beta\delta_t(\psi)$ for all $\alpha, \beta \in \mathbb{C}$ and all $\phi, \psi \in T$. Moreover, $\delta_t$ is $d$-continuous: indeed, $|\delta_t(\phi)| = |\phi(t)| \le \|\phi\|_\infty$ for all $\phi \in T$, and Lemma 1051 shows the continuity of $\delta_t$ at 0. As above, the fact that $\delta_t$ is linear implies then the continuity at every point in $T$. All together, $\delta_t \in PD$ for each $t \in [0, 2\pi]$, and $\delta_t$ has order 0. ♦

Remark 1054 Two remarks are in order:
1. Periodic distributions of type $F_f$ (see Example 1053.1) form a proper subspace of the space $PD$. Indeed, given $t_0 \in [0, 2\pi]$, the distribution $\delta_{t_0}$ (see Example 1053.2) is not defined by an integrable function. In order to prove it, assume for a moment that a Lebesgue-integrable complex-valued function $f$ defined on $[0, 2\pi]$ exists such that $\delta_{t_0} = F_f$. We may assume, without loss of generality, that $t_0 \in (0, 2\pi)$ (see the footnote in formula (11.72)). Let $U$ be an open interval such that $t_0 \in U \subset (0, 2\pi)$, so small that $\int_U |f| < 1$ (see Theorem 763). Let $\phi$ be an element in $T$ such that $0 \le \phi(t) \le 1$ for all $t \in [0, 2\pi]$, that $\phi(t_0) = 1$, and that $\phi$ vanishes outside $U$ (such a function $\phi$ exists, see Exercises 13.219 and 13.225). Then we have
\[
1 = \phi(t_0) = \delta_{t_0}(\phi) = F_f(\phi) = \frac{1}{2\pi}\int_{[0, 2\pi]} f\phi = \frac{1}{2\pi}\int_U f\phi \le \frac{1}{2\pi}\int_U |f| \cdot |\phi| \le \frac{1}{2\pi}\int_U |f| < 1,
\]
a contradiction.
[...] $\varepsilon > 0$ and a step function $s$ associated to a partition $P := \{0 = a_0 < a_1 < a_2 < \ldots < a_n = 2\pi\}$. Find $\delta > 0$ such that $\int_E |f| < \varepsilon$ for every measurable subset $E$ of $[0, 2\pi]$ such that $\lambda(E) < \delta$ (see Theorem 763). Find $\phi \in T$ such that $0 \le \phi(t) \le 1$ for all $t \in [0, 2\pi]$, $\phi(a_i) = 0$ for all $i = 0, 1, 2, \ldots, n$, and $\lambda\{t \in [0, 2\pi] : \phi(t) \ne 1\} < \delta$ (see Fig. 11.42 and Exercise 13.225). Then
\[
\Bigl|\int_{[0, 2\pi]} f s\Bigr| = \Bigl|\int_{[0, 2\pi]} f\phi + \int_{[0, 2\pi]} f(s - \phi)\Bigr| \le \int_E |f| < \varepsilon.
\]
Since this holds for every $\varepsilon > 0$, we get $\int_{[0, 2\pi]} f s = 0$ for every step function $s$ on $[0, 2\pi]$, as we wanted to show. If we apply the first part to the step function $\chi_{[0, x]}$ (i.e., the characteristic function of the interval $[0, x]$) for $x \in [0, 2\pi]$, we get $\int_{[0, x]} f = 0$. This is true for every $x \in [0, 2\pi]$, and so the result follows from Proposition 770.

From now on, and thanks to Remark 1054.2, we shall identify a Lebesgue-integrable complex-valued function $f$ defined on $[0, 2\pi]$ with the periodic distribution $F_f$ defined by (11.72). Accordingly, those distributions will be called "functions."
11.8.4
Derivatives of Periodic Distributions
Assume that $f$ is a $2\pi$-periodic differentiable complex-valued function defined on $\mathbb{R}$, and that its derivative $f'$ (again a $2\pi$-periodic function) is Lebesgue integrable. As a distribution, $F_f$ is defined by (11.72). The integration-by-parts formula (Theorem 705) shows that, for a test function $\phi \in T$ we have, due to the $2\pi$-periodic character of both $f$ and $\phi$,
\[
\frac{1}{2\pi}\int_{[0, 2\pi]} f'(t)\phi(t)\,dt = -\frac{1}{2\pi}\int_{[0, 2\pi]} f(t)\phi'(t)\,dt. \tag{11.75}
\]
The left-hand member of (11.75) is the way the distribution $F_{f'}$ acts, so it corresponds, in our identification, to the function $f'$. What is somewhat amazing is that, since (11.75) is an equality, we are able to compute $F_{f'}$ without actually computing $f'$: just use the right-hand member of (11.75) instead. This observation suggests how to introduce derivatives (of any order) of periodic distributions in a way that coincides with the classical differentiation procedure in the case of differentiable functions. This is done below.

Definition 1055 Let $F \in PD$ and let $k \in \mathbb{N} \cup \{0\}$. Then, the $k$-th derivative of $F$ is the periodic distribution $D^k F \in PD$ given by
\[
D^k F(\phi) := (-1)^k F(D^k\phi) \quad \text{for all } \phi \in T. \tag{11.76}
\]
There is a point in the previous definition that must be checked: formula (11.76) does indeed define an element in $PD$. Certainly, $D^k F$ is a linear mapping on $T$. Moreover, it is continuous. This follows from the following fact: if $\{\phi_n\}$ is a $d$-null sequence in $T$, we have $\|D^k\phi_n\|_\infty \to 0$ as $n \to \infty$ for every $k = 0, 1, 2, \ldots$ This shows that $\{D^k\phi_n\}_{n=1}^{\infty}$ is a $d$-null sequence for every $k = 0, 1, 2, \ldots$ Since $F$ is continuous, $\{F(D^k\phi_n)\}_{n=1}^{\infty}$ tends to 0, and so does $\{D^k F(\phi_n)\}_{n=1}^{\infty}$. It follows that $D^k F$ is continuous at 0 and so continuous on $T$.

Definition 1055 allows us to consider derivatives of any order of $2\pi$-periodic Lebesgue-integrable real-valued functions as soon as they are viewed as periodic distributions. We consider two illustrative examples.
1. Let $f$ be the $2\pi$-periodic extension of the function (see Fig. 11.43)
\[
f(x) = \begin{cases} -1 & \text{if } x \in (0, \pi), \\ \phantom{-}1 & \text{if } x \in (\pi, 2\pi), \\ \phantom{-}0 & \text{if } x = 0, \pi, 2\pi. \end{cases}
\]
Fig. 11.43 The function in Example 1
We identify $f$ with the periodic distribution $F_f$. Then, for $\phi \in T$,
\[
\begin{aligned}
(DF_f)(\phi) = -F_f(\phi') &= -\frac{1}{2\pi}\int_0^{2\pi} f(t)\phi'(t)\,dt \\
&= \frac{1}{2\pi}\int_0^{\pi} \phi'(t)\,dt - \frac{1}{2\pi}\int_{\pi}^{2\pi} \phi'(t)\,dt \\
&= \frac{1}{2\pi}\bigl(\phi(\pi) - \phi(0) - \phi(2\pi) + \phi(\pi)\bigr) \\
&= \frac{1}{2\pi}\bigl(-\phi(0) + 2\phi(\pi) - \phi(2\pi)\bigr).
\end{aligned}
\]
The reader may provide the expressions for the successive derivatives $D^k F_f$, $k = 2, 3, \ldots$ Observe that the function $f$ is not differentiable in the classical sense.
2. Let $t \in [0, 2\pi]$. Then, for $\phi \in T$ and $k = 0, 1, 2, \ldots$,
\[
(D^k\delta_t)(\phi) = (-1)^k\delta_t(D^k\phi) = (-1)^k(D^k\phi)(t).
\]
It is easy to show that the derivative of a periodic distribution of order $n \in \mathbb{N} \cup \{0\}$ is a distribution of order $n + 1$. Starting from functions and Dirac deltas we may then construct periodic distributions of any order $n \in \mathbb{N} \cup \{0\}$. It is interesting that essentially all periodic distributions can be obtained this way. This will be shown in Remark 1059.
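The computation in the first example can be checked numerically for a concrete test function. The Python sketch below is only an illustration under stated assumptions: the test function $\phi(t) = e^{\cos t}$, the midpoint rule, and the names `phi`, `dphi` and `midpoint` are choices made here, not taken from the text. It compares $-F_f(\phi')$, obtained by numerical integration, with the closed form $\frac{1}{2\pi}(-\phi(0) + 2\phi(\pi) - \phi(2\pi))$.

```python
import math

def f(t):
    """Square wave of the example: -1 on (0, pi), +1 on (pi, 2*pi)."""
    t = t % (2 * math.pi)
    if t == 0.0 or t == math.pi:
        return 0.0
    return -1.0 if t < math.pi else 1.0

def phi(t):
    """Hypothetical test function: smooth and 2*pi-periodic."""
    return math.exp(math.cos(t))

def dphi(t):
    """Classical derivative of phi."""
    return -math.sin(t) * math.exp(math.cos(t))

def midpoint(g, a, b, n=20_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

integrand = lambda t: f(t) * dphi(t)

# (D F_f)(phi) = -F_f(phi') = -(1 / 2pi) * integral over [0, 2pi] of f * phi'.
# The integral is split at pi, where f jumps.
lhs = -(midpoint(integrand, 0.0, math.pi)
        + midpoint(integrand, math.pi, 2 * math.pi)) / (2 * math.pi)

# Closed form obtained in the example.
rhs = (-phi(0.0) + 2 * phi(math.pi) - phi(2 * math.pi)) / (2 * math.pi)
```

Only values of $\phi$ at the jump points of $f$ enter the closed form, which is why $DF_f$ acts like a combination of Dirac deltas at $0$, $\pi$ and $2\pi$.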
11.8.5
Convergence in PD
We may now define convergence of sequences of periodic distributions in the following way: We say that a sequence $\{F_n\}$ in $PD$ converges to an element $F$ in the sense of distributions (and we shall write $F_n \xrightarrow{PD} F$) whenever $\{F_n(\phi)\}$ converges to $F(\phi)$ for each $\phi \in T$.
11.8.6 Fourier Analysis
In the space $\mathcal{T}$ we have the (two-sided) sequence $\{e_n\}_{n\in\mathbb{Z}}$, where
\[
e_n(t) := e^{int}, \quad \text{for } n \in \mathbb{Z} \text{ and } t \in [0, 2\pi] \tag{11.77}
\]
(indeed, those functions are infinitely differentiable and $2\pi$-periodic). Given a periodic distribution $F \in \mathcal{PD}$, define its $n$-th Fourier coefficient $c_n$ by
\[
c_n := F(e_{-n}), \quad \text{for } n \in \mathbb{Z}. \tag{11.78}
\]
In this way, we associate to any periodic distribution $F$ its sequence of Fourier coefficients $\{c_n\}_{n\in\mathbb{Z}}$. Note that formula (11.78), when applied to a periodic distribution $F := F_f$ defined by a complex-valued Lebesgue integrable $2\pi$-periodic function $f$ (see Example 1053.2), returns its sequence (9.22) of Fourier coefficients.

The mapping that sends each $F \in \mathcal{PD}$ to the associated sequence $(c_n)_{n\in\mathbb{Z}}$ is one-to-one: indeed, the subspace $\operatorname{span}\{e_n : n \in \mathbb{Z}\}$ is dense in $(\mathcal{T}, d)$, hence two periodic distributions $F$ and $G$ such that $F(e_n) = G(e_n)$ for all $n \in \mathbb{Z}$ are equal. The $d$-density of $\operatorname{span}\{e_n : n \in \mathbb{Z}\}$ is a consequence of Theorem 861: observe that any function $f$ in $\mathcal{T}$ is Lipschitz, due to the fact that it has a bounded derivative on $[0, 2\pi]$. Therefore its Fourier series converges uniformly to $f$ (Theorem 861). The same applies to the function $Df$. Observe that the partial sums of the Fourier series of $Df$ are the derivatives of the corresponding partial sums of the Fourier series of $f$ (just use the integration-by-parts Proposition 800; see also the proof of Theorem 861). By induction, this holds for any derivative $D^nf$, and so the Fourier series of $f$ is $d$-convergent to $f$. This shows the density of $\operatorname{span}\{e_n : n \in \mathbb{Z}\}$ in $(\mathcal{T}, d)$ and, incidentally, provides an alternative proof of the separability of the space $(\mathcal{T}, d)$, see Proposition 1049.

Given a periodic distribution $F \in \mathcal{PD}$, its Fourier series is the (formal) series $\sum_{n\in\mathbb{Z}} c_ne_n$, where $\{c_n\}_{n\in\mathbb{Z}}$ is the sequence of its Fourier coefficients. We write
\[
F \sim \sum_{n\in\mathbb{Z}} c_ne_n \tag{11.79}
\]
(in short, $F \sim \{c_n\}_{n\in\mathbb{Z}}$ or, alternatively, $\{c_n\}_{n\in\mathbb{Z}} \sim F$) to denote that the series $\sum_{n\in\mathbb{Z}} c_ne_n$ is the Fourier series of $F \in \mathcal{PD}$. By the remark above, if $f$ is a complex-valued Lebesgue integrable $2\pi$-periodic function defined on $\mathbb{R}$, and $F_f$ denotes the periodic distribution defined by $f$ (Example 1053.2), then the Fourier series of $F_f$ coincides with the Fourier series of $f$ given by (9.20).

When dealing with Fourier exponential series $\sum_{n\in\mathbb{Z}} a_ne_n$, convergence must be understood in any of the allowed senses (pointwise, uniform, quadratic mean, in the metric $d$, in the sense of distributions, etc.) by considering the sequence of partial sums $\bigl\{\sum_{n=-N}^{N} a_ne_n\bigr\}_{N\in\mathbb{N}}$.
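As an illustration of formula (11.78), the following sketch approximates the Fourier coefficients $c_n = F_f(e_{-n}) = \frac{1}{2\pi}\int_0^{2\pi} f(t)e^{-int}\,dt$ of the square wave of Sect. 11.8.4 (equal to $-1$ on $(0,\pi)$ and $1$ on $(\pi,2\pi)$) and evaluates the coefficients $\delta_{t_0}(e_{-n}) = e^{-int_0}$ of a Dirac delta. The helper names and the midpoint quadrature are assumptions made for this example only. A direct integration gives $c_n = 2i/(\pi n)$ for odd $n$ and $c_n = 0$ for even $n$, which is what the code is compared against.

```python
import cmath
import math


def square_wave(t):
    # The function f of the example: -1 on (0, pi), +1 on (pi, 2*pi).
    return -1.0 if t < math.pi else 1.0


def fourier_coefficient(f, n, samples=20000):
    # c_n = F_f(e_{-n}) = (1/(2*pi)) * int_0^{2*pi} f(t) e^{-int} dt (midpoint rule).
    h = 2.0 * math.pi / samples
    total = 0j
    for j in range(samples):
        t = (j + 0.5) * h
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2.0 * math.pi)


def square_wave_coefficient_exact(n):
    # Closed form obtained by integrating e^{-int} on (0, pi) and (pi, 2*pi).
    return 2j / (math.pi * n) if n % 2 != 0 else 0j


def delta_coefficient(t0, n):
    # c_n = delta_{t0}(e_{-n}) = e^{-i n t0}; for t0 = 0 every coefficient equals 1.
    return cmath.exp(-1j * n * t0)


if __name__ == "__main__":
    for n in (-3, -1, 1, 2, 3):
        print(n, fourier_coefficient(square_wave, n), square_wave_coefficient_exact(n))
```

The coefficients of the square wave tend to $0$, while those of $\delta_0$ are constantly $1$.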
We shall transfer the calculus on periodic distributions to the calculus on their sequences of Fourier coefficients. Moreover, classes of periodic distributions (functions, Dirac deltas, etc.) will be characterized by the behavior of the sequence of Fourier coefficients of their elements. Chart 11.44 summarizes the accessible implications and equivalences. They are formally stated and proved in Proposition 1058 below. The missing definitions are recorded below. The space $C^1P[0, 2\pi]$ consists of all continuously differentiable (i.e., having a continuous derivative) $2\pi$-periodic complex-valued functions defined on $\mathbb{R}$.

Definition 1056 A sequence $\{c_n\}_{n\in\mathbb{Z}}$ in $\mathbb{C}$ is said to be of slow growth if there exist $r > 0$ and $c > 0$ such that $|c_n| \le c|n|^r$ for all $n \in \mathbb{Z}$, $n \ne 0$.
Fig. 11.44 Connections between periodic distributions and their Fourier coefficients sequence
A sequence $\{c_n\}_{n\in\mathbb{Z}}$ in $\mathbb{C}$ is said to be of rapid decrease if for every $r > 0$ there exists $c = c(r) > 0$ such that $|c_n| \le c|n|^{-r}$ for all $n \in \mathbb{Z}$, $n \ne 0$.

Figure 11.44 presents part of the announced correspondence between distributional calculus and sequence calculus. Sums and products by scalars, derivatives, and convergence are operations already defined; the sequence counterpart of the first two will be presented in Proposition 1057, that of the third one in Proposition 1061. Other useful ones, such as complex conjugation, reversal, translation, and convolution, should be defined, like the previous ones, in $\mathcal{PD}$, and the correspondence in spaces of sequences proved afterwards. However, forced by the brevity of this introduction, we shall shortcut this approach by defining those last ones directly in the space of sequences, a licence that the reader may kindly accept in view of its practical interest.

Proposition 1057 (i) The mapping that associates to any $F \in \mathcal{PD}$ its sequence of Fourier coefficients is linear.
(ii) If $F \sim \sum_{n\in\mathbb{Z}} c_ne_n$ and $k \in \mathbb{N}\cup\{0\}$, then $D^kF \sim \sum_{n\in\mathbb{Z}} (in)^kc_ne_n$.

Proof The proof of (i) is obvious. To prove (ii) it is enough to observe that $D^ke^{-inx} = (-in)^ke^{-inx}$ for all $k \in \mathbb{N}\cup\{0\}$ and for all $n \in \mathbb{Z}$. Then,
\[
D^kF(e_{-n}) = (-1)^kF(D^ke_{-n}) = (-1)^kF\bigl((-in)^ke_{-n}\bigr) = (in)^kF(e_{-n}),
\]
hence $D^kF \sim \sum_{n\in\mathbb{Z}} (in)^kc_ne_n$.

Proposition 1058 The following statements hold:
(i) A periodic distribution is a function $f \in L_2[0, 2\pi]$ if, and only if, its sequence of Fourier coefficients belongs to $\ell_2(\mathbb{Z})$. If this is the case, its Fourier series converges to $f$ in the norm $\|\cdot\|_2$.
(ii) A periodic distribution is a test function $\varphi$ (i.e., an infinitely differentiable $2\pi$-periodic function) if, and only if, its sequence $\{c_n\}_{n\in\mathbb{Z}}$ of Fourier coefficients is of rapid decrease. If this is the case, the Fourier series $\sum_{n\in\mathbb{Z}} c_ne_n$ converges to $\varphi$ in the metric space $(\mathcal{T}, d)$. As a consequence, if $\sum_{n\in\mathbb{Z}} c_ne_n \sim \varphi \in \mathcal{T}$ and $\sum_{n\in\mathbb{Z}} a_ne_n \sim F \in \mathcal{PD}$, then
\[
F(\varphi) = \sum_{n\in\mathbb{Z}} c_na_{-n} = \sum_{n\in\mathbb{Z}} c_{-n}a_n. \tag{11.80}
\]
(iii) The sequence $\{c_n\}_{n\in\mathbb{Z}}$ of Fourier coefficients of an arbitrary periodic distribution $F$ has slow growth. Conversely, any sequence $\{c_n\}_{n\in\mathbb{Z}}$ of slow growth defines a unique periodic distribution $F$ such that $F \sim \sum_{n\in\mathbb{Z}} c_ne_n$. Moreover, the series $\sum_{n\in\mathbb{Z}} c_ne_n$ converges to $F$ in the sense of distributions.
(iv) Let $f \in C^1P[0, 2\pi]$. Then the sequence $\{c_n\}_{n\in\mathbb{Z}}$ of its Fourier coefficients belongs to the (complex) Banach space $\ell_1(\mathbb{Z})$, and the Fourier series of $f$ converges to $f$ uniformly.
(v) If a sequence $\{c_n\}_{n\in\mathbb{Z}}$ belongs to $\ell_1(\mathbb{Z})$, it is the sequence of Fourier coefficients of a periodic distribution $f$ in $CP[0, 2\pi]$, and its Fourier series converges to $f$ uniformly, i.e., in the norm $\|\cdot\|_\infty$.

Proof (i) The correspondence between $L_2[0, 2\pi]$ and $\ell_2(\mathbb{Z})$ is (v) in Theorem 977.

(ii) Let $\varphi \in \mathcal{T}$, and let $\varphi \sim \sum_{n\in\mathbb{Z}} c_ne_n$. Induction on the integration-by-parts Proposition 800 gives, for $k = 0, 1, 2, \dots$, $D^k\varphi \sim \sum_{n\in\mathbb{Z}} (in)^kc_ne_n$. Observe that every continuous $2\pi$-periodic function is an element of $L_2[0, 2\pi]$. Use then the Parseval identity (11.40) to show that $\sum_{n\in\mathbb{Z}} |in|^{2k}|c_n|^2 = \|D^k\varphi\|_2^2$ and, in particular, $|n|^k|c_n| \le \|D^k\varphi\|_2$ for all $n \in \mathbb{Z}$. This proves that $\{c_n\}_{n\in\mathbb{Z}}$ is of rapid decrease.

Assume now that $\{c_n\}_{n\in\mathbb{Z}}$ is of rapid decrease. Put $s_N(x) := \sum_{n=-N}^{N} c_ne_n(x)$, for $N \in \mathbb{N}\cup\{0\}$. By letting $r := 2$ in the definition of rapid decrease (Definition 1056) we get $\sum_{n\in\mathbb{Z}} |c_n| < +\infty$. It follows that $\{s_N\}_{N=1}^\infty$ is a $\|\cdot\|_\infty$-Cauchy sequence in $CP[0, 2\pi]$. A similar argument applies to the sequence $\{D^ks_N\}_{N=1}^\infty$, where $k \in \mathbb{N}$, giving that it is $\|\cdot\|_\infty$-Cauchy in $CP[0, 2\pi]$. This shows (see the proof of the completeness of $(\mathcal{T}, d)$ in Proposition 1048) that $\{s_N\}_{N=1}^\infty$ is $d$-convergent to some $\varphi \in \mathcal{T}$. It is clear that $\varphi \sim \sum_{n\in\mathbb{Z}} c_ne_n$.

Since $\sum_{n\in\mathbb{Z}} c_ne_n = \varphi$ in the metric $d$ and every $F \in \mathcal{PD}$ is a linear and $d$-continuous function on $\mathcal{T}$, it follows that, if $F \sim \sum_{n\in\mathbb{Z}} a_ne_n$, then $F\bigl(\sum_{n\in\mathbb{Z}} c_ne_n\bigr) = \sum_{n\in\mathbb{Z}} c_nF(e_n) = \sum_{n\in\mathbb{Z}} c_na_{-n}$.

(iii) Assume that $\sum_{n\in\mathbb{Z}} c_ne_n \sim F \in \mathcal{PD}$. Let $k \in \mathbb{N}\cup\{0\}$ be the order of $F$, and let $C > 0$ be a constant that satisfies simultaneously Eq. (11.71) and $\max\{|c_1|, |c_{-1}|\} \le 2C$. In particular, for $|n| \ge 2$,
\[
(|c_n| =)\ |F(e_{-n})| \le C\bigl(\|e_{-n}\|_\infty + \|(-in)e_{-n}\|_\infty + \dots + \|(-in)^ke_{-n}\|_\infty\bigr) = C(1 + |n| + \dots + |n|^k) \le 2C|n|^k,
\]
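where the last inequality can be proved easily by induction (and for $|n| = 1$, $|c_n| \le 2C$). This shows that $\{c_n\}_{n\in\mathbb{Z}}$ has slow growth.

Assume now that a sequence $\{c_n\}_{n\in\mathbb{Z}}$ has slow growth. Then there exist $r > 0$ and $C > 0$ such that $|c_n| \le C|n|^r$ for all $n \in \mathbb{Z}$, $n \ne 0$. Choose $k \in \mathbb{N}$ such that $r \le k - 2$. Thus $|c_n| \le C|n|^{k-2}$ for all $n \in \mathbb{Z}$, $n \ne 0$. Define $b_0 := 0$, $b_n := (in)^{-k}c_n$ for $n \in \mathbb{Z}$, $n \ne 0$, and put $v_N := \sum_{n=-N}^{N} b_ne_n$. Observe that $|b_n| \le C|n|^{-2}$ for $n \in \mathbb{Z}$, $n \ne 0$, so $\sum_{n\in\mathbb{Z}} |b_n| < +\infty$, and then $v_N \to v$ in the norm $\|\cdot\|_\infty$ by the Weierstrass M-test (Theorem 473), for some $v \in C[0, 2\pi]$. Note that $F_v(e_{-n}) = b_n$ for all $n \in \mathbb{Z}$. Form the periodic distribution $F := D^kF_v + F_f$, where $f := c_0e_0$. Then
\[
F \sim \sum_{n\in\mathbb{Z}} c_ne_n. \tag{11.81}
\]
Indeed, for $n \in \mathbb{Z}$,
\[
\begin{aligned}
F(e_{-n}) &= D^kF_v(e_{-n}) + F_f(e_{-n}) = F_v\bigl[(-1)^kD^ke_{-n}\bigr] + F_f(e_{-n})\\
&= (in)^kF_v(e_{-n}) + F_f(e_{-n}) = (in)^kb_n + F_f(e_{-n}) = c_n.
\end{aligned}
\]
The uniqueness follows from the remark after formula (11.78).

Assume now that $\sum_{n\in\mathbb{Z}} a_ne_n \sim \varphi \in \mathcal{T}$, and that $\sum_{n\in\mathbb{Z}} c_ne_n \sim F \in \mathcal{PD}$. Put, for $N \in \mathbb{N}$, $F_N := \sum_{n=-N}^{N} c_ne_n$. By (ii) we have $F_N(\varphi) = \sum_{n=-N}^{N} c_na_{-n}$. We proved there that the series $\sum_{n\in\mathbb{Z}} c_na_{-n}$ converges to $F(\varphi)$. This shows that $F_N(\varphi) \to F(\varphi)$. This finishes the proof of (iii).

(iv) The function $Df$ is continuous and $2\pi$-periodic, in particular an element of $L_2[0, 2\pi]$, and its Fourier coefficients are $(in)c_n$, $n \in \mathbb{Z}$ (see the proof of (ii)). Use (i) to show that $\sum_{n\in\mathbb{Z}} |in|^2|c_n|^2 < +\infty$. Finally, by using the Cauchy–Schwarz inequality 8.24, we have
\[
\sum_{n\in\mathbb{Z},\, n\ne0} |c_n| = \sum_{n\in\mathbb{Z},\, n\ne0} \frac{1}{|n|}\,|nc_n| \le \Biggl(\sum_{n\in\mathbb{Z},\, n\ne0} \frac{1}{n^2}\Biggr)^{1/2}\Biggl(\sum_{n\in\mathbb{Z},\, n\ne0} n^2|c_n|^2\Biggr)^{1/2},
\]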
and the conclusion follows as in (iii) (use again the Weierstrass M-test).

(v) Assume that $\{c_n\}_{n\in\mathbb{Z}}$ belongs to $\ell_1(\mathbb{Z})$. It follows that the series $\sum_{n\in\mathbb{Z}} c_ne_n$ is $\|\cdot\|_\infty$-Cauchy, hence it converges in $\|\cdot\|_\infty$ to a ($2\pi$-periodic and continuous) function $f$. Clearly, $f \sim \sum_{n\in\mathbb{Z}} c_ne_n$. The last part of (v) follows as in (iv).

Remark 1059 The proof of (iii) in Proposition 1058 gives the following consequence: every periodic distribution $F \in \mathcal{PD}$ is the sum of a derivative of a distribution given by a continuous $2\pi$-periodic function and the distribution defined by a constant function. This gives a full description of the space of all periodic distributions. For example, the distribution $\delta_0$ clearly has the Fourier series $\sum_{n\in\mathbb{Z}} e_n$, i.e., $c_n = 1$ for all $n \in \mathbb{Z}$. We can write $\delta_0 = D^2F_v + F_f$, where $v := \sum_{n\in\mathbb{Z},\, n\ne0} (in)^{-2}e_n$, and $f$
Fig. 11.45 The six first partial sums of the Fourier series for the distribution δ0 (Remark 1059) (vertical scales are different)
is the constant function $1$, as can be easily checked (this proves, by the way, that $\delta_0$ is not a function, since the sequence of Fourier coefficients of any Lebesgue integrable $2\pi$-periodic function belongs to the space $c_0$, according to the Riemann–Lebesgue Lemma 837). See in Fig. 11.45 the plot of the first six partial sums $\sum_{n=-N}^{N} e_n$, $N = 0, 1, 2, 3, 4, 5$, of the Fourier series for the periodic distribution $\delta_0$, and recall that $\sum_{n=-N}^{N} e_n$ is just the Dirichlet kernel $D_N$, introduced in Definition 840 (see formula (9.31) and Fig. 9.4). Item (iii) in Proposition 1058 shows, in particular, that the sequence $\{D_N\}_{N=0}^\infty$ converges to $\delta_0$ in the sense of distributions.

The following definition introduces a useful operation between periodic distributions. The reader may check, by using (iii) in Proposition 1058, that the object so defined belongs, indeed, to the space $\mathcal{PD}$. This operation, when applied to two $2\pi$-periodic scalar-valued continuous functions, gives the classical convolution (see Exercise 13.227 for the definition and Exercise 13.621 for the result). Some applications of the concept of convolution appear in exercises (see, in particular, Exercise 13.623).

Definition 1060 Let $\sum_{n\in\mathbb{Z}} a_ne_n \sim F \in \mathcal{PD}$, and $\sum_{n\in\mathbb{Z}} b_ne_n \sim G \in \mathcal{PD}$. We define the convolution of $F$ and $G$ as the periodic distribution $F * G \sim \sum_{n\in\mathbb{Z}} a_nb_ne_n$.

For example, if $f \in L[0, 2\pi]$ and $s_n$ denotes the $n$-th partial sum of its Fourier series, recall that $s_n = D_n * f$ for all $n = 0, 1, 2, \dots$ (see formula (9.32)). Note that, indeed, when applying Definition 1060 to the distributions $D_n$ and $F_f$ we get $\sum_{k=-n}^{n} c_ke_k$, if $f \sim \sum_{k\in\mathbb{Z}} c_ke_k$.

The next result gives necessary and sufficient conditions for convergence of sequences in $(\mathcal{T}, d)$ and in $\mathcal{PD}$ (in this last space, in the sense of distributions).

Proposition 1061 (i) A sequence $\{\varphi_m\}_{m=1}^\infty$ in $\mathcal{T}$ such that $\varphi_m \sim \sum_{n\in\mathbb{Z}} c_{m,n}e_n$ for all $m \in \mathbb{N}$ is $d$-convergent to an element $\varphi \sim \sum_{n\in\mathbb{Z}} c_ne_n$ in $\mathcal{T}$ if, and only if,
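$\{(c_{m,n})_{n\in\mathbb{Z}}\}_{m=1}^\infty$ is of "uniform rapid decrease" and converges pointwise to $(c_n)_{n\in\mathbb{Z}}$, i.e.,
\[
\text{for all } r > 0 \text{ there exists } c(r) > 0 \text{ such that } |c_{m,n}| \le c(r)|n|^{-r}, \text{ for } m \in \mathbb{N},\ n \in \mathbb{Z},\ n \ne 0, \tag{11.82}
\]
and
\[
c_{m,n} \to c_n, \text{ as } m \to \infty, \text{ for all } n \in \mathbb{Z}. \tag{11.83}
\]
(ii) A sequence $\{F_m\}_{m=1}^\infty$ in $\mathcal{PD}$ such that $F_m \sim \sum_{n\in\mathbb{Z}} c_{m,n}e_n$ for all $m \in \mathbb{N}$ converges in the sense of distributions to an element $F \sim \sum_{n\in\mathbb{Z}} c_ne_n$ in $\mathcal{PD}$ if, and only if, $\{(c_{m,n})_{n\in\mathbb{Z}}\}_{m=1}^\infty$ is of "uniform slow growth" and converges pointwise to $(c_n)_{n\in\mathbb{Z}}$, i.e.,
\[
\text{there exist } r > 0 \text{ and } C > 0 \text{ such that } |c_{m,n}| \le C|n|^r, \text{ for } m \in \mathbb{N} \text{ and } n \in \mathbb{Z},\ n \ne 0, \tag{11.84}
\]
and
\[
c_{m,n} \to c_n, \text{ as } m \to \infty, \text{ for all } n \in \mathbb{Z}. \tag{11.85}
\]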
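Before turning to the proof, here is a small numerical illustration of part (ii), offered as a sketch under stated assumptions. The Dirichlet kernels $D_N$ have coefficients $c_{N,n} = 1$ for $|n| \le N$ and $0$ otherwise, so they are of uniform slow growth and converge pointwise to the coefficients of $\delta_0$. By (11.80), $F_{D_N}(\varphi) = \sum_{|n|\le N} a_{-n}$, where $\{a_n\}$ are the Fourier coefficients of $\varphi$, and this should approach $\delta_0(\varphi) = \varphi(0)$. The test function `phi` and the equispaced quadrature are choices made only for this example.

```python
import cmath
import math


def phi(t):
    # Illustrative test function (an assumption): smooth and 2*pi-periodic.
    return math.exp(math.cos(t))


def coefficient(n, samples=2000):
    # a_n = (1/(2*pi)) * int_0^{2*pi} phi(t) e^{-int} dt, equispaced rule on one period.
    h = 2.0 * math.pi / samples
    return sum(phi(j * h) * cmath.exp(-1j * n * j * h) for j in range(samples)) / samples


def dirichlet_pairing(N):
    # By (11.80) with c_n = 1 for |n| <= N and c_n = 0 otherwise:
    # F_{D_N}(phi) = sum over |n| <= N of a_{-n}.
    return sum(coefficient(-n) for n in range(-N, N + 1))


if __name__ == "__main__":
    for N in (1, 2, 5, 10):
        print(N, abs(dirichlet_pairing(N) - phi(0.0)))
```

The printed errors should decrease as $N$ grows, in agreement with $D_N \to \delta_0$ in the sense of distributions.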
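Proof (i) Sufficiency. Fix $r > 0$ and find $c(r) > 0$ such that $|c_{m,n}| \le c(r)|n|^{-r}$ for all $n \in \mathbb{Z}$, $n \ne 0$, and for all $m \in \mathbb{N}$. Then $|c_n| \le c(r)|n|^{-r}$ for all $n \in \mathbb{Z}$, $n \ne 0$. Proposition 1058 (ii) shows that there exists $\varphi \in \mathcal{T}$ such that $\varphi \sim \sum_{n\in\mathbb{Z}} c_ne_n$. We shall prove that $\varphi_m \to \varphi$ in $(\mathcal{T}, d)$. To this end, we fix now $r > 2$ and get $c(r)$ accordingly. Given $\varepsilon > 0$, we can find $N \in \mathbb{N}$ such that $c(r)\sum_{n=N+1}^{\infty} n^{-r} < \varepsilon$. Then, for $x \in [0, 2\pi]$ and for $m \in \mathbb{N}$,
\[
\Bigl|\sum_{n=-\infty}^{+\infty} (c_{m,n} - c_n)e^{inx}\Bigr| \le \sum_{n=-N}^{N} \cdots
\]
[…] $0$. We have $D^k\varphi_m \sim \sum_{n\in\mathbb{Z}} (in)^kc_{m,n}e_n$ (see (ii) in Proposition 1057), hence $|(in)^kc_{m,n}| \le \frac{1}{2\pi}\int_0^{2\pi} |D^k\varphi_m(x)|\,dx < c(k)$, and so $|c_{m,n}| \le c(k)|n|^{-k}$ for all $m \in \mathbb{N}$ and all $n \in \mathbb{Z}$, $n \ne 0$. Since $k \in \mathbb{N}\cup\{0\}$ is arbitrary, this proves (11.82). That (11.83) holds follows from the fact that $\{F_n(\varphi_m)\}_{m=1}^\infty \to F_n(\varphi)$ for $F_n := F_{e_n}$, $n \in \mathbb{Z}$ (see (11.80)).

(ii) Sufficiency. There exist $r > 0$ and $C > 0$ such that $|c_{m,n}| \le C|n|^r$ for all $m \in \mathbb{N}$ and all $n \in \mathbb{Z}$, $n \ne 0$. It follows that $|c_n| \le C|n|^r$ for all $n \in \mathbb{Z}$, $n \ne 0$, hence there exists $F \in \mathcal{PD}$ such that $F \sim \sum_{n\in\mathbb{Z}} c_ne_n$. We shall prove that the sequence $\{F_m\}_{m=1}^\infty$ converges to $F$ in the sense of distributions. Fix $\sum_{n\in\mathbb{Z}} a_ne_n \sim \varphi \in \mathcal{T}$ and $\varepsilon > 0$. Due to the fact that the sequence $\{a_n\}_{n=-\infty}^{+\infty}$ is of rapid decrease, we can find $c(-r-2) > 0$ such that $|a_n| \le c(-r-2)|n|^{-r-2}$ for all $n \in \mathbb{Z}$, $n \ne 0$. Choose $N \in \mathbb{N}$ such that $c(-r-2)C\sum_{n=N+1}^{\infty} n^{-2} < \varepsilon$. Find then $M \in \mathbb{N}$ such that for $m \ge M$ we have $\sum_{n=-N}^{N} |a_{-n}|\,|c_{m,n} - c_n| < \varepsilon$. We finally get, by using (11.80),
\[
\begin{aligned}
|F_m(\varphi) - F(\varphi)| &= \Bigl|\sum_{n\in\mathbb{Z}} a_{-n}(c_{m,n} - c_n)\Bigr|\\
&\le \sum_{n=-N}^{N} |a_{-n}|\,|c_{m,n} - c_n| + \sum_{n=-\infty}^{-N-1} |a_{-n}|\,|c_{m,n}| + \sum_{n=N+1}^{+\infty} |a_{-n}|\,|c_{m,n}|\\
&\quad + \sum_{n=-\infty}^{-N-1} |a_{-n}|\,|c_n| + \sum_{n=N+1}^{+\infty} |a_{-n}|\,|c_n|\\
&\le \varepsilon + c(-r-2)C\Biggl(\sum_{n=-\infty}^{-N-1} |n|^{-r-2}|n|^r + \sum_{n=N+1}^{+\infty} |n|^{-r-2}|n|^r + \sum_{n=-\infty}^{-N-1} |n|^{-r-2}|n|^r + \sum_{n=N+1}^{+\infty} |n|^{-r-2}|n|^r\Biggr)\\
&\le \varepsilon + 4c(-r-2)C\varepsilon = \varepsilon\bigl(1 + 4c(-r-2)C\bigr),
\end{aligned}
\]
and this proves that $|(F - F_m)(\varphi)| \to 0$ as $m \to \infty$.

Necessity. Observe that the space $(\mathcal{T}, d)$ is a complete metric space, and $d$ is a translation-invariant metric (see Sect. 11.8.3). Adapt the proof of the Banach–Steinhaus Theorem to show that every pointwise-bounded sequence $\{F_m\}_{m=1}^\infty$ of continuous linear functionals on $(\mathcal{T}, d)$ is equicontinuous, i.e., given $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon) > 0$ such that $\sup_{\varphi\in B(0,\delta)} |F_m(\varphi)| < \varepsilon$ for all $m \in \mathbb{N}$. Let $\{F_m\}_{m=1}^\infty$ be a sequence in $\mathcal{PD}$ such that $\{F_m(\varphi)\}_{m=1}^\infty$ converges to $F(\varphi)$ for all $\varphi \in \mathcal{T}$, where $F$ is a given element in $\mathcal{PD}$. Since $\{F_m(\varphi)\}_{m=1}^\infty$ is bounded for each $\varphi \in \mathcal{T}$, we can apply the previous observation to find $\delta := \delta(1)$. There exists $k_0 \in \mathbb{N}$ such that $\sum_{k=k_0+1}^{\infty} 2^{-k-1} < \delta/2$. Find $r > 0$ such that
\[
|n|^{k-r} < \frac{\delta}{2(k_0+1)}, \quad \text{for all } n \in \mathbb{Z},\ |n| \ge 2, \text{ and for all } k = 0, 1, 2, \dots, k_0.
\]
Then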
\[
\sum_{k=0}^{k_0} \frac{1}{2^{k+1}}\,\frac{|n|^{k-r}}{1 + |n|^{k-r}} < \frac{\delta}{2}, \quad \text{for all } n \in \mathbb{Z},\ |n| \ge 2.
\]
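Note that, given $n \in \mathbb{Z}$, $|n| \ge 2$, we have
\[
d\Bigl(\frac{e_n}{|n|^r}, 0\Bigr) = \sum_{k=0}^{\infty} \frac{1}{2^{k+1}}\,\frac{|n|^{k-r}}{1 + |n|^{k-r}} \le \sum_{k=0}^{k_0} \frac{1}{2^{k+1}}\,\frac{|n|^{k-r}}{1 + |n|^{k-r}} + \sum_{k=k_0+1}^{\infty} \frac{1}{2^{k+1}} < \delta.
\]
This shows that $|F_m(e_n/|n|^r)| < 1$ for all $n \in \mathbb{Z}$, $|n| \ge 2$, and for all $m \in \mathbb{N}$. Thus,
\[
(|c_{m,-n}| =)\ |F_m(e_n)| \le |n|^r \quad \text{for all } m \in \mathbb{N}, \text{ and all } n \in \mathbb{Z},\ |n| \ge 2. \tag{11.86}
\]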
Since $\{F_m(e_n)\}_{m=1}^\infty$ converges, and so is bounded, for each $n \in \mathbb{Z}$, the estimate (11.86) extends, after enlarging the constant if necessary, to all $n \in \mathbb{Z}$, $n \ne 0$. This proves (11.84). To prove (11.85) note that $\{F_m(e_n)\}_{m=1}^\infty \to F(e_n)$ for each $n \in \mathbb{Z}$, and use (11.80).

An Application: The Heat Equation

We shall solve the following (second-order linear) problem (with an initial condition) in the theory of partial differential equations. The partial differential equation (PDE) in Proposition 1062 below is the so-called heat equation. In the statement of the result, given a family $\{F_t\}_{t>0}$ in $\mathcal{PD}$, by the derivative $\frac{d}{dt}F_t$ at $t > 0$ in the sense of distributions we understand the limit $\lim_{\Delta t\to0} \frac{F_{t+\Delta t} - F_t}{\Delta t}$ in the space $\mathcal{PD}$, the limit being computed in the sense of distributions.

Proposition 1062 Given $G \in \mathcal{PD}$, there exists a unique family $\{F_t\}_{t>0}$ in $\mathcal{T}$ such that
\[
(H)\quad \begin{cases} \dfrac{d}{dt}F_t = D^2F_t & (d/dt \text{ in the sense of distributions}), & \text{(PDE)}\\[4pt] F_t \to G \text{ as } t \downarrow 0 & (\text{convergence in the sense of distributions}). & \text{(Init)} \end{cases} \tag{11.87}
\]
In this way, $F_t := F_{\varphi_t}$ for $t > 0$, where $\{\varphi_t\}_{t>0}$ is a family in $\mathcal{T}$, and the function $\varphi(x, t) := \varphi_t(x)$, for $x \in \mathbb{R}$ and $t > 0$, is also infinitely differentiable as a function of the variable $t \in (0, +\infty)$.

Proof We shall prove uniqueness first: assume that $\{F_t\}_{t>0}$ is a family in $\mathcal{T}$ that satisfies (H). Let $G \sim \sum_{n\in\mathbb{Z}} b_ne_n$ and $F_t \sim \sum_{n\in\mathbb{Z}} a_n(t)e_n$ for $t > 0$. Equation (PDE) in (11.87) says, precisely, that for $t > 0$,
\[
\lim_{\Delta t\to0,\ t+\Delta t>0,\ \Delta t\ne0} \frac{F_{t+\Delta t} - F_t}{\Delta t} = D^2F_t, \quad \text{in the sense of distributions.}
\]
In particular, for $n \in \mathbb{Z}$,
\[
\frac{a_n(t+\Delta t) - a_n(t)}{\Delta t} = \frac{F_{t+\Delta t} - F_t}{\Delta t}(e_{-n}) \xrightarrow{\Delta t\to0} D^2F_t(e_{-n}) = F_t(D^2e_{-n}) = -n^2F_t(e_{-n}) = -n^2a_n(t).
\]
This shows that $a_n$ is differentiable at $t$ and
\[
a_n'(t) = -n^2a_n(t). \tag{11.88}
\]
Moreover, from (Init) in (H) we get
\[
(a_n(t) =)\ F_t(e_{-n}) \xrightarrow{t\downarrow0} G(e_{-n})\ (= b_n). \tag{11.89}
\]
Eqs. (11.88) and (11.89) for $n \in \mathbb{Z}$ define a problem consisting of an ordinary differential equation and an initial condition. This problem, for each $n \in \mathbb{Z}$, has one, and only one, solution. Precisely,
\[
a_n(t) = b_ne^{-n^2t}, \quad n \in \mathbb{Z},\ t > 0. \tag{11.90}
\]
This shows uniqueness of the solution of (H).

Let us prove now existence. Define $a_n(t) := b_ne^{-n^2t}$ for $n \in \mathbb{Z}$ and $t > 0$. Since $G \in \mathcal{PD}$ we have, by (iii) in Proposition 1058, that there exist $r > 0$ and $C > 0$ such that $|b_n| \le C|n|^r$ for all $n \in \mathbb{Z}$, $n \ne 0$. Fix $t > 0$. Observe that
\[
|a_n(t)| \le |b_n| \le C|n|^r, \quad \text{for all } n \in \mathbb{Z},\ n \ne 0, \tag{11.91}
\]
hence $\{a_n(t)\}_{n\in\mathbb{Z}}$ is the sequence of Fourier coefficients of an element $F_t \in \mathcal{PD}$. In fact $F_t \in \mathcal{T}$. This can be proven in the following way: take $k \in \mathbb{N}$ and note that, for $n \in \mathbb{Z}$, $n \ne 0$, and $p \in \mathbb{N}\cup\{0\}$,
\[
|a_n(t)| = |b_n|e^{-n^2t} = |b_n|\frac{1}{e^{n^2t}} \le |b_n|\frac{p!}{(n^2t)^p} \le C|n|^r\frac{p!}{|n|^{2p}t^p} = \frac{C(p!)}{t^p}|n|^{r-2p}.
\]
It is enough to choose $p \in \mathbb{N}\cup\{0\}$ such that $r - 2p < -k$ to get that $\{a_n(t)\}_{n\in\mathbb{Z}}$ is the sequence of Fourier coefficients of a periodic distribution $F_t \in \mathcal{T}$, i.e., there exists a function $\varphi_t \in \mathcal{T}$ such that $F_t = F_{\varphi_t}$ for all $t > 0$. Observe that for all $n \in \mathbb{Z}$ we have $a_n(t) \to b_n$ as $t \downarrow 0$; this, together with inequalities (11.91), shows that $F_t \to G$ in the sense of distributions (see (ii) in Proposition 1061).

It remains to prove that, for $t > 0$, $\frac{d}{dt}F_t = D^2F_t$ (where the limit involved in the computation of the derivative $d/dt$ must be understood in the sense of distributions). To this end, fix $t > 0$ and observe that, for $\Delta t \ne 0$ small enough to have $t + \Delta t > 0$,
\[
\frac{F_{t+\Delta t} - F_t}{\Delta t} \sim \sum_{n\in\mathbb{Z}} \frac{a_n(t+\Delta t) - a_n(t)}{\Delta t}\,e_n,
\]
and $D^2F_t \sim \sum_{n\in\mathbb{Z}} -n^2a_n(t)e_n = \sum_{n\in\mathbb{Z}} a_n'(t)e_n$ (see (ii) in Proposition 1057). Fix $n \in \mathbb{Z}$. Observe that
\[
\lim_{\Delta t\to0} \frac{a_n(t+\Delta t) - a_n(t)}{\Delta t} = a_n'(t) = -n^2a_n(t), \quad \text{for } n \in \mathbb{Z} \text{ and } t > 0.
\]
By using the Mean Value Theorem 365,
\[
\frac{a_n(t+\Delta t) - a_n(t)}{\Delta t} = a_n'(t + \theta_n\Delta t), \quad \text{for some } \theta_n = \theta_n(\Delta t),\ 0 < \theta_n < 1.
\]
Note that $|a_n'(s)| = |n^2b_ne^{-n^2s}| \le C|n|^{r+2}$ for $s > 0$ and $n \ne 0$. Therefore, for $\delta := t/2$,
\[
|a_n'(t + \theta_n\Delta t)| \le |n|^{r+2}C, \quad \text{for all } n \in \mathbb{Z},\ n \ne 0, \text{ and all } \Delta t,\ |\Delta t| < \delta.
\]
It is enough to apply (ii) in Proposition 1061 to conclude that $\frac{d}{dt}F_t = D^2F_t$. Put $\varphi(x, t) := \varphi_t(x)$ for all $x \in \mathbb{R}$ and $t > 0$. Since we proved, in particular, that $\bigl(\frac{\partial\varphi}{\partial t} =\bigr)\ \frac{d}{dt}F_t \sim \sum_{n\in\mathbb{Z}} a_n'(t)e_n$, a similar argument proves that $\varphi$ has derivatives of all orders with respect to $t$. This concludes the proof.

Exercises 13.622 and 13.623 give another application to the theory of partial differential equations.
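The solution formula (11.90) can be explored numerically. The sketch below takes $G := \delta_0$, so that $b_n = 1$ for all $n$ and $\varphi_t(x) = \sum_{n\in\mathbb{Z}} e^{-n^2t}e^{inx} = 1 + 2\sum_{n\ge1} e^{-n^2t}\cos(nx)$, and checks the heat equation $\partial\varphi/\partial t = \partial^2\varphi/\partial x^2$ with centered finite differences. The truncation order `N_TERMS`, the step sizes and the sample point are assumptions made for illustration only.

```python
import math

N_TERMS = 60  # truncation order of the series (an assumption; ample for t >= 0.1)


def heat_kernel(x, t):
    # phi_t(x) = sum_n e^{-n^2 t} e^{inx} = 1 + 2 * sum_{n>=1} e^{-n^2 t} cos(n x),
    # i.e. formula (11.90) with b_n = 1 (initial datum G = delta_0).
    return 1.0 + 2.0 * sum(
        math.exp(-n * n * t) * math.cos(n * x) for n in range(1, N_TERMS + 1)
    )


def pde_residual(x, t, ht=1e-4, hx=1e-3):
    # Centered differences for d/dt and d^2/dx^2; the residual should be close to 0.
    dt = (heat_kernel(x, t + ht) - heat_kernel(x, t - ht)) / (2.0 * ht)
    dxx = (
        heat_kernel(x + hx, t) - 2.0 * heat_kernel(x, t) + heat_kernel(x - hx, t)
    ) / (hx * hx)
    return dt - dxx


if __name__ == "__main__":
    print(pde_residual(0.7, 0.5))
    print(heat_kernel(0.0, 0.5), heat_kernel(math.pi, 0.5))
```

For every $t > 0$ the coefficients $e^{-n^2t}$ are of rapid decrease, so $\varphi_t$ is a test function even though the initial datum $\delta_0$ is not a function.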
11.9 Concluding Remarks to Chapter 11
What we know is not much. What we do not know is immense. Pierre-Simon, marquis de Laplace
Banach space theory originated in the thirties of the twentieth century, mainly through the work of a group of mathematicians around S. Banach in Lvov (then in Poland), in an attempt to use all possible modern tools from topology, probability, and differential and integral calculus to solve problems in function theory, approximation, optimization, and differential equations, just to name a few. Since then, this theory has developed into one of the most important modern fields in mathematics. In this chapter, we saw some of its early fundamental principles (the Hahn–Banach theorem, the Banach open mapping theorem, and the Banach–Steinhaus theorem). Recently, it turned out that the structure of Banach spaces is much more involved than had been commonly expected, and thus Banach spaces can model more situations than previously thought. Some of these recent results solved problems that had been open for a few decades. This is good news for the theory. It also calls for more research in this area. In a few lines below, we will try to comment on some of this recent progress. For references to the results mentioned below we refer, e.g., to [BeLi00], [FHHMZ11], [JL01], and [JL03]. Before we start, we need some preliminaries.

Proposition 1063 Let $X$ be a Banach space and $Y$ and $Z$ be two infinite-dimensional closed subspaces of $X$ such that $Y \cap Z = \{0\}$. Then the following are equivalent.
(i) $Y + Z$ is closed in $X$.
(ii) The canonical linear projections from $Y + Z$ onto $Y$ and $Z$ are both continuous.
(iii) The distance of $S_Y$ to $S_Z$ is positive, i.e., $\inf\{\|y - z\| : y \in S_Y,\ z \in S_Z\} > 0$.
Proof (i) ⇒ (ii) If $Y + Z$ is closed, it is a Banach space. Then the map $(y, z) \mapsto y + z$ is continuous from $Y \times Z$ into $Y + Z$, one-to-one, and onto $Y + Z$. By the Open Mapping Theorem, it is an isomorphism. This implies (ii).

(ii) ⇒ (i) Let $y_n + z_n \to x$, where $x \in X$ and $y_n \in Y$, $z_n \in Z$ for all $n \in \mathbb{N}$. Assuming (ii), $\{y_n\}$ and $\{z_n\}$ are both Cauchy sequences and thus convergent, say to $y \in Y$ and $z \in Z$, respectively. Thus $y_n + z_n \to y + z$. This proves that $x = y + z$, and $Y + Z$ is thus closed.

(ii) ⇒ (iii) Let $P$ be the (continuous) canonical projection of $Y + Z$ onto $Y$. If $y \in S_Y$ and $z \in S_Z$, then $\|P\|\cdot\|y - z\| \ge \|P(y - z)\| = \|Py\| = \|y\| = 1$. Thus the distance of $S_Y$ to $S_Z$ is greater than or equal to $\|P\|^{-1}$.

(iii) ⇒ (ii) Assume that (ii) does not hold. Without loss of generality, we may assume that the canonical linear projection from $Y + Z$ onto $Y$ is not continuous. Thus we can find two sequences $\{y_n\}$ in $Y$ and $\{z_n\}$ in $Z$ such that $y_n + z_n \to 0$ and $y_n \not\to 0$. By scaling and choosing a subsequence if necessary we may assume that $\|y_n\| \to 1$. Then $\|z_n\| \to 1$ and it follows easily that $y_n/\|y_n\| + z_n/\|z_n\| \to 0$, that is, the distance between the points $y_n/\|y_n\| \in S_Y$ and $-z_n/\|z_n\| \in S_Z$ tends to zero, which gives that the distance from $S_Y$ to $S_Z$ is zero.

The following two notions are crucial in this new progress.

Definition 1064 Let $f$ be a real-valued function on the unit sphere $S_E$ of an infinite-dimensional Banach space $E$. The function $f$ is said to be oscillation stable if for every infinite-dimensional subspace $F$ of $E$ and for every $\varepsilon > 0$ there is an infinite-dimensional subspace $G$ of $F$ such that the oscillation of $f$ on $S_G$ is at most $\varepsilon$. Here the oscillation of $f$ on $S_G$ is defined by $\sup\{|f(x) - f(y)| : x, y \in S_G\}$.

Definition 1065 A subset $A$ of the unit sphere of an infinite-dimensional Banach space $E$ is said to be an asymptotic set (or an inevitable set) if the distance $d(A, F)$ from $A$ to $F$ is zero, where $F$ is an arbitrary infinite-dimensional subspace of $E$.
Note that it follows from linear algebra that the unit sphere of any hyperplane (through the origin) is an example of an asymptotic set in a Banach space. It is nontrivial to find separated asymptotic sets, i.e., asymptotic sets with positive distance between them. In fact, the first such example was obtained from the existence of a space that does not contain any isomorphic copy of $\ell_p$ or $c_0$, discovered by the Russian-Israeli mathematician B. S. Tsirelson in 1974, by using a previous result of V. D. Milman. Nowadays, there are techniques for creating sequences of mutually separated asymptotic sets in some classes of spaces. On the other hand, we will see below that every two asymptotic sets in $c_0$ are at mutual distance zero (T. Gowers). An instructive initial result in this direction is the following proposition.

Proposition 1066 Let $E$ be an infinite-dimensional Banach space. Then the following are equivalent.
(i) Every real-valued Lipschitz function on $S_E$ is oscillation stable.
(ii) There is no infinite-dimensional subspace $F$ of $E$ whose unit sphere contains two asymptotic sets $A$ and $B$ such that $d(A, B) > 0$.
Proof Assume (ii) does not hold. Let $F$, $A$, and $B$ contradict (ii). The function $f(x) = d(x, B)$ on the unit sphere of $E$ is a Lipschitz function which vanishes on $B$ and is bigger than or equal to $d(A, B)$ on $A$. We will show that $f$ is not oscillation stable. To see this, let $G$ be any infinite-dimensional subspace of $F$. We will show that the oscillation of $f$ on $S_G$ is greater than or equal to $\frac{1}{2}d(A, B)$. For it, due to the fact that $A$ is an asymptotic set, given $0 < \varepsilon < \frac{1}{4}d(A, B)$, choose $x \in S_G$ and $a \in A$ such that $\|x - a\| < \varepsilon$. Then, as $f$ is 1-Lipschitz, $f(x) \ge f(a) - \varepsilon \ge d(A, B) - \varepsilon$. On the other hand, since $B$ is an asymptotic set, choose $y \in S_G$ and $b \in B$ such that $\|b - y\| < \varepsilon$. Then, as $f(b) = 0$, we have $f(y) < \varepsilon$. Thus $f(x) - f(y) \ge d(A, B) - 2\varepsilon \ge \frac{1}{2}d(A, B)$. This shows that $f$ is not oscillation stable. Thus (i)⇒(ii).

To prove (ii)⇒(i), assume that $f$ is a Lipschitz function on $S_E$ that is not oscillation stable. Thus we can find an infinite-dimensional subspace $F_0$ of $E$ and $\varepsilon_0 > 0$ such that the oscillation of $f$ on the unit sphere of any infinite-dimensional subspace of $F_0$ is at least $2\varepsilon_0$. Put
\[
a := \sup_G \inf\{f(x) : x \in S_G\},
\]
where the supremum is taken over all infinite-dimensional subspaces $G$ of $F_0$. Choose an infinite-dimensional subspace $F$ of $F_0$ such that $\inf\{f(x) : x \in S_F\} > a - \varepsilon_0$. Put
\[
b := \inf_G \sup\{f(x) : x \in S_G\},
\]
where the infimum is taken over all infinite-dimensional subspaces $G$ of $F$. Then $b - a > \varepsilon_0$, by the choice of $F_0$ and $\varepsilon_0$. Put $A := \{x \in S_F : f(x) \ge (a + b)/2 + \varepsilon_0/2\}$ and $B := \{x \in S_F : f(x) \le (a + b)/2 - \varepsilon_0/2\}$. Then $A$ and $B$ are asymptotic sets in $S_F$ and $d(A, B) \ge C^{-1}\varepsilon_0$, where $C$ is the Lipschitz constant of $f$. Thus (ii) does not hold.

Here are some particular recent striking results in this area:

1. (W. T. Gowers, B. Maurey) There is a Banach space $X$ such that if $Y$ and $Z$ are two infinite-dimensional closed subspaces of $X$, then the distance of the unit spheres $S_Y$ and $S_Z$ is zero, meaning that $\inf\{\|y - z\| : y \in S_Y,\ z \in S_Z\} = 0$.

This is in contrast with, say, the following situation: let $X := c_0$, let $Y$ be the subspace of $c_0$ formed by vectors that are supported on the set of odd numbers, and let $Z$ be the subspace of vectors supported on the even numbers. Then $\|y - z\| = 1$ for any $y \in S_Y$ and any $z \in S_Z$. Thus the distance of $S_Y$ to $S_Z$ is $1$. Spaces as in 1 above are nowadays called hereditarily indecomposable. Such a space $X$ has, among others, the following properties:
(a) No closed subspace of $X$ can be written as a topological direct sum of two of its closed infinite-dimensional subspaces. This is seen from Proposition 1063.
(b) (W. T. Gowers) $X$ is not isomorphic to any of its proper subspaces (i.e., subspaces different from $X$). In particular, $X$ is not isomorphic to any of its hyperplanes. This is not easily seen and we refer to Maurey's article [Ma03].
(c) (B. S. Tsirelson) $X$ does not contain any isomorphic image of $\ell_p$ for $p \ge 1$ or $c_0$. This result of Tsirelson (1974) triggered all the recent progress in this area.
(d) (V. D. Milman) $(X, \|\cdot\|)$ contains a subspace $Y$ such that there is $\lambda > 1$ and an equivalent norm $|\cdot|$ on $Y$ such that for all infinite-dimensional closed subspaces $Z$ of $Y$, we have $\sup\bigl\{\frac{|z_1|}{|z_2|} : z_1, z_2 \in S_{(Z,\|\cdot\|)}\bigr\} \ge \lambda$. We say that the space $Y$ is distortable. This is not easy to see, and we refer to the Odell–Schlumprecht article [OSc03].

Further striking recent results include:

2. (E. Odell, T. Schlumprecht) The space $\ell_p$ is distortable for any $p > 1$. In particular, the Hilbert space is distortable. We refer to the Odell–Schlumprecht article [OSc03] or ([BeLi00], Chap. 13).
3. (W. T. Gowers) Let $g$ be a uniformly continuous real-valued function on the unit sphere of $c_0$. Then for every $\varepsilon > 0$ there is an infinite-dimensional closed subspace $X$ of $c_0$ such that the oscillation of $g$ on the unit sphere of $X$ is at most $\varepsilon$, i.e., $\sup\{|g(x) - g(y)| : x, y \in S_X\} < \varepsilon$. This is Gowers' result, see ([BeLi00], Chap. 13).
4. (S. Argyros, R. Haydon) There is a Banach space $X$ such that for every bounded linear operator $T$ from $X$ into $X$ there is a real number $\lambda$ and a compact operator $K$ from $X$ into $X$ such that $T = \lambda I + K$, where $I$ stands for the identity operator on $X$. This is a striking new result due to Argyros and Haydon [ArHa09].
5. (W. T. Gowers, R. Komorowski, N. Tomczak-Jaegermann) If a Banach space $X$ is isomorphic to all of its infinite-dimensional subspaces, then $X$ is isomorphic to a Hilbert space.
We refer to the citation in Maurey's paper [Ma03]. This can be compared to the earlier result: (J. Lindenstrauss, L. Tzafriri) If $X$ is not isomorphic to a Hilbert space, then it contains an uncomplemented subspace, i.e., a subspace that is not the range of any bounded projection. We refer to [FHHMZ11, Thm. 6.16] for a proof.
6. (S. Argyros, A. Tolias) There is a nonseparable hereditarily indecomposable Banach space $X$ such that all bounded operators $T$ from $X$ into itself are of the form $T = \lambda I + S$, where $\lambda$ is a real number, $I$ is the identity operator on $X$, and the range of the operator $S$ is separable. We refer to ([FHHMZ11], Remark 5.2) for the reference to this article.
Chapter 12
Appendix
This optional appendix briefly expands on a few of the results in the text that are not necessarily used there directly.
12.1 The Set of Natural Numbers

The axiomatic approach to the introduction of the set of natural numbers and its defining properties is due to the Italian mathematician G. Peano. It uses three basic concepts, $N$, $1$, and $S$. The symbol $N$ is read as "to be a natural number," $1$ is just a symbol, and $S$ denotes a function, named "immediate successor." The five Peano axioms for the set of natural numbers are the following:

(Ai) $N(1)$ (i.e., $1$ is a natural number).
(Aii) If $N(n)$, then $N(S(n))$ (i.e., if $n$ is a natural number, its successor $S(n)$ is also a natural number).
(Aiii) There is no $n$ such that $N(n)$ and $1 = S(n)$ (i.e., the number $1$ is not the successor of any natural number).
(Aiv) If $N(n)$ and $N(m)$, and $S(n) = S(m)$, then $n = m$ (i.e., if two natural numbers have the same successor, they are equal).
(Av) If $K$ is a set consisting of natural numbers, if $1 \in K$, and $S(n) \in K$ whenever $n \in K$, then $K$ is the set of all natural numbers. (This is the so-called finite induction principle.)

Besides the five defining axioms, there are three definitions: the sum ($+$), the product ($.$) of natural numbers, and an order relation, according to the following rules:

(D1) For $N(n)$ we define $n + 1 := S(n)$, and for $N(n)$ and $N(m)$, $n + S(m) := S(n + m)$.
(D2) For $N(n)$, we define $n.1 := n$, and for $N(n)$ and $N(m)$, $n.S(m) := (n.m) + n$.
(D3) For $N(n)$ and $N(m)$ we say that $n < m$ whenever there exists $p$ with $N(p)$ such that $n + p = m$, and we put $n \le m$ whenever $n < m$ or $n = m$.
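The recursive definitions (D1) to (D3) translate almost literally into code. The following is a minimal sketch, assuming a representation of natural numbers as nested tuples built from a symbol `ONE` and a successor constructor `S`; the helpers `from_int` and `to_int` are hypothetical conveniences for reading results and are not part of the axioms.

```python
ONE = ("1",)  # the symbol 1


def S(n):
    # Immediate successor of n.
    return ("S", n)


def add(n, m):
    # (D1): n + 1 := S(n), and n + S(m) := S(n + m).
    if m == ONE:
        return S(n)
    return S(add(n, m[1]))


def mul(n, m):
    # (D2): n.1 := n, and n.S(m) := (n.m) + n.
    if m == ONE:
        return n
    return add(mul(n, m[1]), n)


def from_int(k):
    # Hypothetical helper: build the k-th natural number 1, S(1), S(S(1)), ...
    n = ONE
    for _ in range(k - 1):
        n = S(n)
    return n


def to_int(n):
    # Hypothetical helper: count the nested successors (plus one for the symbol 1).
    count = 1
    while n != ONE:
        n = n[1]
        count += 1
    return count


def less(n, m):
    # (D3): n < m whenever n + p = m for some natural number p.
    # Candidates p = 1, S(1), ... are tried until n + p exceeds m.
    p = ONE
    while to_int(add(n, p)) <= to_int(m):
        if add(n, p) == m:
            return True
        p = S(p)
    return False


if __name__ == "__main__":
    three, four = from_int(3), from_int(4)
    print(to_int(add(three, four)), to_int(mul(three, four)), less(three, four))
```

The recursion in `add` and `mul` terminates because every natural number different from $1$ is the successor of another one, a fact established in consequence 1 below.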
© Springer International Publishing Switzerland 2015 V. Montesinos et al., An Introduction to Modern Analysis, DOI 10.1007/978-3-319-12481-0_12
Observe that every natural number can be written by means of the symbols $1$ and $S$. Indeed, the set $M := \{1, S(1), S(S(1)), \dots\}$ is the set $\mathbb{N}$ of all natural numbers: it is a subset of $\mathbb{N}$, $1 \in M$, and if $n \in M$ then $S(n) \in M$. It is then enough to use (Av). It is clear then that $1 < S(1) < S(S(1)) < \dots$, and so $1$ is the smallest (regarding the defined order $\le$) element in $\mathbb{N}$. A natural number $n$ such that $S(n) = m$ (i.e., $m = n + 1$) is called an immediate predecessor of $m$. A natural number $p$ is a predecessor of $m$ if $p + q = m$ for some $q \in \mathbb{N}$.

A sequence of natural numbers is a mapping $f$ from $\mathbb{N}$ into $\mathbb{N}$. We write $n_k := f(k)$ for each $k \in \mathbb{N}$, and denote a sequence as $\{n_k\}_{k=1}^\infty$. We say that the sequence $\{n_k\}_{k=1}^\infty$ is increasing (strictly increasing) if $n_k \le n_{k+1}$ (respectively $n_k < n_{k+1}$) for every $k \in \mathbb{N}$. We say that the sequence $\{n_k\}_{k=1}^\infty$ is decreasing (strictly decreasing) if $n_k \ge n_{k+1}$ (respectively $n_k > n_{k+1}$) for every $k \in \mathbb{N}$.

Some consequences of the axioms follow:

1. Every natural number but $1$ has a unique immediate predecessor. Indeed, Axiom (Aiii) says that $1$ has no immediate predecessor, and axiom (Aiv) says that the immediate predecessor of a natural number, if it exists, is unique. Let $P$ be the set of natural numbers having an immediate predecessor, and let $P_1 := \{1\} \cup P$. Then $1 \in P_1$, and if $n \in P_1$ then $S(n) \in P_1$, since $n$ is an immediate predecessor of $S(n)$. This shows, by (Av), that $P_1 = \mathbb{N}$, and the conclusion follows.

2. The finite induction principle (Axiom (Av) at the beginning of this section) is equivalent to the following axiom:
\[
\text{(Av') Every nonempty subset of } \mathbb{N} \text{ has a least element.} \tag{12.1}
\]
We call the least element of a nonempty subset of N its first element. We refer to the property stated in (Av′) by saying that N is well ordered by ≤. To prove the equivalence, assume first the validity of (Av′). Let P be a certain property regarding natural numbers, and write P(n) in case P holds for n ∈ N, ¬P(n) otherwise. Assume P(1), and assume too that P(n + 1) holds whenever P(n) does. Let K be the set of all n ∈ N such that ¬P(n). If K ≠ ∅, then it has a first element, say s1. Since P(1), we get s1 > 1. Then, by 1 above, s1 has a unique immediate predecessor, denoted s0, and s0 ∉ K. It follows P(s0), hence P(s1), a contradiction. We conclude then that K is empty, and so P(n) for all n ∈ N. Now, start from the validity of (Av), and let K be a nonempty subset of N. Let P be the following property: P(n) whenever n and all its predecessors are in K^c (we denote the complement of K in N by K^c). Assume that K has no first element. Then certainly P(1), since 1 ∉ K (otherwise 1 would be the first element of K). Now, assume P(n). Then necessarily P(S(n)) (otherwise, S(n) would be the first element of K). By (Av), we get P(n) for all n ∈ N, i.e., K = ∅. We reach a contradiction, hence K must have a first element.
3. Given a strictly increasing sequence {n_k}_{k=1}^∞ in N, and n ∈ N, there exists k ∈ N such that n_k > n. Indeed, put P(n) for the property "there exists k ∈ N such that n < n_k." Since 1 ≤ n_1 < n_2, we have P(1). Assume P(n) for some n ∈ N. Then there exists k ∈ N such that n < n_k. It follows that S(n) ≤ n_k < n_{k+1}, hence P(S(n)). Use (Av) to conclude P(n) for all n ∈ N.
4. Axiom (Av) is also equivalent to the following:
(Av″) There is no strictly decreasing sequence of natural numbers. (12.2)
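To prove the equivalence, assume first the validity of (Av) (and so of (Av′)). Let {n_k}_{k=1}^∞ be a strictly decreasing sequence of natural numbers. Then the set {n_k : k ∈ N} has no first element, a contradiction with (Av′). Now, assume that (Av″) holds. If a given nonempty subset M of N has no first element, then for every m ∈ M there exists m′ ∈ M with m′ < m. Starting from any n_1 ∈ M and choosing inductively n_{k+1} ∈ M with n_{k+1} < n_k, we obtain a strictly decreasing sequence of natural numbers, which contradicts (Av″). Hence (Av′), and so (Av), holds.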
12.2 Integer Numbers
Define a set Z consisting of N, a symbol 0 (referred to as “zero”), and a set N− := {−n : n ∈ N} where − is again a symbol. The elements in N, when considered as elements in Z, are referred to as positive integers, the element 0 is called “zero,” and the elements in N− are called negative integers. An order ≤ that extends the order in N is defined on Z: 0 ≤ n and −n ≤ 0 for all n ∈ N, −n ≤ m for all n, m ∈ N, and −n ≤ −m whenever m ≤ n, for all n, m ∈ N. On the set Z we define two algebraic operations that extend the operations in N: the sum (denoted by +), and the product (denoted by .). Recall that if n < m in N, then m − n denotes the element p in N such that p + n = m. The sum is defined in the following way: (i) If n, m ∈ N, then n + ( − m) = ( − m) + n = n − m if n > m, and n + ( − m) = (−m) + n = −(m − n) if m > n. (ii) If n ∈ N, then n + (−n) = (−n) + n = 0. (iii) If n, m ∈ N, then (−m) + (−n) = −(n + m). (iv) If z ∈ Z, then z + 0 = 0 + z = z. The product is defined in the following way: (i) If n, m ∈ N, then n.(−m) = (−m).n = −(n.m), and (−n).(−m) = n.m. (ii) If z ∈ Z, then z.0 = 0.z = 0. The following result collects the main properties of the set Z endowed with the sum, the product, and the order defined above. The proof is a routine exercise, and will be omitted. It is simple to prove that the algebraic operations and the order on Z induce the algebraic operations on N and the order assumed on N. Here, by the word “closed,” we mean that the result of the algebraic operation is again an element in the set. Theorem 1067 The set Z, endowed with the two operations sum ( + ) and product (.), and with the order ≤, is an ordered commutative ring without zero divisors. Precisely, it has the following properties. (i) The sum is closed, associative, and commutative; it has 0 as the additive identity, and every element z ∈ Z has −z as its inverse, i.e., z + (−z) = 0.
(ii) The product is closed, associative, and commutative; it has 1 as the multiplicative identity, and z1 .z2 = 0 implies z1 = 0 or z2 = 0 (or both). (iii) The product is distributive with respect to the sum, i.e., if z1 , z2 , z3 ∈ Z, then z1 .(z2 + z3 ) = (z1 .z2 ) + (z1 .z3 ). (iv) If z1 , z2 , z3 ∈ Z and z1 ≤ z2 , then z1 + z3 ≤ z2 + z3 , and if 0 ≤ z3 , then z1 z3 ≤ z2 z3 .
12.3 Rational Numbers
Let S be a nonempty set. An equivalence relation on S is a subset R of S × S that has the following properties: (i) (s, s) ∈ R for all s ∈ S. (ii) If (s, t) ∈ R, then (t, s) ∈ R. (iii) If (s, t) ∈ R and (t, u) ∈ R, then (s, u) ∈ R. An alternative notation for (s, t) ∈ R will be sRt. Given s ∈ S, the set R(s) := {t ∈ S : tRs} is called the equivalence class (or just the class) defined by s and the equivalence relation R. Observe that the family {R(s) : s ∈ S} is a pairwise disjoint cover of S.
Define a relation ∼ on Z × (Z \ {0}) by (z1, w1) ∼ (z2, w2) if and only if z1.w2 = z2.w1. It is simple to prove that ∼ is an equivalence relation on Z × (Z \ {0}). The set of classes is called Q, and its elements are called rational numbers. So, a rational number is a class ∼(z, w), where (z, w) ∈ Z × (Z \ {0}). An element (z, w) in a class will be called a representative of this class. If we agree to identify z with ∼(z, 1) for each z ∈ Z, the set Z is then a subset of Q.
Let us define on Q two algebraic operations, sum (+) and product (.), and an order ≤. Given q1, q2 ∈ Q, take representatives (z1, w1) and (z2, w2) of q1 and q2, respectively. Define q1 + q2 as the class containing (z1.w2 + z2.w1, w1.w2), and q1.q2 as the class containing (z1.z2, w1.w2). We say that q1 ≤ q2 whenever z1.w2 ≤ z2.w1, where the representatives are chosen with w1 > 0 and w2 > 0 (this is always possible, since (z, w) ∼ (−z, −w)). Since the sum, product, and order were defined by taking representatives, it is necessary, prior to stating their main properties, to ensure that the definitions are independent of the representatives chosen. This is simple to prove, and it is left to the reader. It is also simple to prove that the algebraic operations and the order on Q induce on Z the algebraic operations and the order defined above.
Theorem 1068 The set Q, endowed with the algebraic operations sum (+) and product (.), and the order ≤ defined above, is an ordered commutative field. Precisely, it has the following properties.
(i) Q is an ordered commutative ring (see Theorem 1067). (ii) Every element q1 ∈ Q such that q1 ≠ 0 has a multiplicative inverse, i.e., an element q2 ∈ Q such that q1.q2 = 1. If (z, w) ∈ q1, then (w, z) ∈ q2.
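The construction of Q by classes of pairs can be checked on examples. The sketch below is only illustrative: it works with representatives (z, w), w ≠ 0, stored as Python tuples, and the function names (equivalent, q_add, q_mul, q_inverse, normalize) are assumptions introduced here, not notation from the text.

```python
from math import gcd

def equivalent(a, b):
    """(z1, w1) ~ (z2, w2) if and only if z1.w2 = z2.w1."""
    (z1, w1), (z2, w2) = a, b
    return z1 * w2 == z2 * w1

def q_add(a, b):
    """A representative of the sum: (z1.w2 + z2.w1, w1.w2)."""
    (z1, w1), (z2, w2) = a, b
    return (z1 * w2 + z2 * w1, w1 * w2)

def q_mul(a, b):
    """A representative of the product: (z1.z2, w1.w2)."""
    (z1, w1), (z2, w2) = a, b
    return (z1 * z2, w1 * w2)

def q_inverse(a):
    """If (z, w) represents q1 and z != 0, then (w, z) represents its inverse."""
    z, w = a
    if z == 0:
        raise ZeroDivisionError("the class of 0 has no multiplicative inverse")
    return (w, z)

def normalize(a):
    """A canonical representative: positive denominator and gcd equal to 1."""
    z, w = a
    if w < 0:
        z, w = -z, -w
    g = gcd(z, w)
    return (z // g, w // g)
```

Changing the representatives, say (1, 2) to (2, 4) and (1, 3) to (−1, −3), changes the pair returned by q_add but not its class, which is the independence of representatives mentioned above.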
12.4 Real Numbers
12.4.1 The Constructive Approach
Definition 1069 A real number x is a pair (L, R) of subsets of Q which have the following properties: (i) L ≠ ∅, R ≠ ∅, L ∪ R = Q, L ∩ R = ∅. (ii) Every rational number in the set L is smaller than every rational number in the set R. (iii) The set L has no largest element. The set of all such pairs is called the real number system, and is denoted by R. The above pair (L, R) is called a Dedekind cut (from R. Dedekind), or, in short, a cut.¹
To any rational number q we associate the cut (L_q, R_q), where L_q := {l ∈ Q : l < q}, and, consequently, R_q := {r ∈ Q : q ≤ r} (we say that cuts of this kind are rational cuts). In this way, Q is mapped in a one-to-one way onto the set of all rational cuts. From now on, we shall identify the rational number q with the particular rational cut defined above; the cut will be denoted, simply, by q.
There are cuts (L, R) such that R has no smallest element, and so do not correspond to any rational number. For example, put L := {x ∈ Q : x ≤ 0} ∪ {x ∈ Q : x^2 < 2}, and R := Q \ L. By the very definition, (L, R) is a cut. Let us prove that R has no smallest element. Fix q ∈ R, i.e., q ∈ Q such that q > 0 and q^2 ≥ 2. By Theorem 18, q^2 > 2. Use Proposition 15 to find n ∈ N big enough such that 0 < 1/n < q and, simultaneously,
\[ \frac{2q}{q^2 - 2} < n. \]
It follows that
\[ \left(q - \frac{1}{n}\right)^2 = q^2 - \frac{2q}{n} + \frac{1}{n^2} > q^2 + (2 - q^2) + \frac{1}{n^2} = 2 + \frac{1}{n^2} > 2, \]
and, simultaneously, q − 1/n > 0. This shows that q − 1/n ∈ R, and so R has no smallest element.
We endow R with an order relation (≤) and two algebraic operations, the sum (+) and the product (.). Those are defined below.
1. We say that a cut (L, R) is less than or equal to another cut (L′, R′) whenever L ⊂ L′. In this case we write (L, R) ≤ (L′, R′). If this happens and, moreover, (L, R) ≠ (L′, R′) (i.e., L ≠ L′), we write then (L, R) < (L′, R′).
¹ Clearly, if (L, R) is a cut, the set R is simply Q \ L, so L determines the cut, and reference to R is superfluous. However, there are two reasons for speaking about pairs. One is historical: This was the kind of cuts that were introduced. The second one is that a pair conveys the idea of a "cut" in the set Q, and invites the reader to think of the defined new point as the location "in between" L and R.
A cut (L, R) is said to be positive if 0 < (L, R). It follows that (L, R) is positive if and only if 0 ∈ L or, equivalently, if L contains a positive rational number. A cut (L, R) is said to be nonnegative if 0 ≤ (L, R). A cut (L, R) is said to be negative if (L, R) < 0.
2. Let (L, R) and (L′, R′) be two cuts. We define the sum (L, R) + (L′, R′) to be the cut (L″, R″), where L″ := {l + l′ : l ∈ L, l′ ∈ L′}, and R″ := Q \ L″. It is a simple exercise to prove that, in fact, (L″, R″) is a cut, and that this definition, when applied to rational cuts (identified with rational numbers), agrees with the definition of sum of rational numbers. Observe, too, that the rational cut 0 (i.e., ((−∞, 0), [0, +∞))) acts as the zero element of the newly defined operation. The agreement on Q of the sum just defined and the original one shows that, in case the cut corresponds to the rational number q, the additive inverse is the cut corresponding to the rational number −q. If, on the contrary, the cut (L, R) corresponds to an irrational number, then the additive inverse is the cut (L′, R′), where L′ := {−r : r ∈ R} and R′ := {−l : l ∈ L}. If c is a cut, write −c for its additive inverse.
3. Let (L, R) and (L′, R′) be two positive cuts. We define the product (L, R).(L′, R′) as the cut (L″, R″) where L″ := (−∞, 0] ∪ {l.l′ : l ∈ L, l > 0, l′ ∈ L′, l′ > 0}. It is straightforward to prove that (L″, R″) so defined is a cut. The product of the zero cut and any other cut is the zero cut, by definition. The definition of the product of two cuts in the remaining situations reduces to the former cases by computing first, if necessary, the additive inverses to get positive cuts, and then defining, for cuts c and c′,
\[ c.c' := \begin{cases} -((-c).c') & \text{if } c < 0 \text{ and } 0 < c', \\ -(c.(-c')) & \text{if } 0 < c \text{ and } c' < 0, \\ (-c).(-c') & \text{if } c < 0 \text{ and } c' < 0. \end{cases} \]
We shall use the following notational convention: A cut (L, R) will be denoted, alternatively, by a single symbol. If x is a real number we shall denote the corresponding cut by (L_x, R_x).
The following two results summarize the main properties of R endowed with the order, the sum, and the product defined, respectively, in items 1, 2, and 3 above. It is routine to check the validity of the order and algebraic statements, so we will not prove Theorem 1070. Checking the new property of completeness is somewhat more involved. This will be done below. Recall that, given a nonempty subset S of R, a number b ∈ R is said to be an upper bound of S if s ≤ b for all s ∈ S, and a number β ∈ R is said to be the supremum of S if β is an upper bound of S and there is no smaller upper bound of S. A set S having an upper bound is said to be bounded above. Analogously we may define a lower bound of S, the infimum of S, and the concept of being bounded below (see Definitions 41 and 42).
Theorem 1070 The set R, endowed with the order ≤ and the algebraic operations sum (+) and product (.), defined in items 1, 2, and 3 above, is a commutative ordered
field (see Theorem 1068). The order and the algebraic operations sum and product induce the order and the operations on Q, respectively, defined in Sect. 12.3.
Theorem 1071 The order on R defined in item 1 above is complete. That is, every nonempty bounded above subset of R has a supremum.
Proof Let S be a nonempty subset of R. Assume that S is bounded above, i.e., that there exists M ∈ R such that s ≤ M for all s ∈ S. Put
\[ L := \bigcup_{s \in S} L_s, \qquad R := \bigcap_{s \in S} R_s. \tag{12.3} \]
We claim that (L, R) is a cut. In order to prove the claim, we shall verify (i) to (iii) in Definition 1069. (i) Let us show first that R ≠ ∅. Given s ∈ S we have s ≤ M, hence L_s ⊂ L_M and, therefore, R_s ⊃ R_M. It follows that R_M ⊂ ⋂_{s∈S} R_s (= R). In particular, R ≠ ∅. Obviously, L ≠ ∅. Let q ∈ Q. We have only two possibilities: either q ∈ L_s for some s ∈ S (and then q ∈ L), or q ∉ L_s (i.e., q ∈ R_s) for each s ∈ S (and then q ∈ R). Hence L ∪ R = Q and L ∩ R = ∅. (ii) Assume that l, r ∈ Q satisfy l ∈ L, r ∈ R. Then, there exists s ∈ S such that l ∈ L_s. Certainly, r ∈ R_s, hence l < r. (iii) If l ∈ L, there exists s ∈ S such that l ∈ L_s. We can find then l′ ∈ L_s (⊂ L) such that l < l′. This proves that L has no largest element. The claim has then been proved.
Observe that s ≤ (L, R) for each s ∈ S. This follows directly from the fact that L_s ⊂ L for each s ∈ S. Finally, we shall prove that (L, R) is the smallest cut that is greater than or equal to each element in S (i.e., (L, R) is the supremum of S). Assume, on the contrary, that there is a cut K such that s ≤ K < (L, R) for all s ∈ S. This shows that L_s ⊂ L_K for all s ∈ S, and that L_K ⊊ L. This contradicts the definition of L, since then L = ⋃_{s∈S} L_s ⊂ L_K. □
Remark 1072
1. Observe that the supremum of a bounded above nonempty subset S of R is unique. This was proved in Remark 43.
2. A statement similar to Theorem 1071 can be formulated for a nonempty bounded below subset of R: it has an infimum (and this infimum is unique). In fact both statements (the existence of infimum and the existence of supremum) are equivalent, as can be proved easily.
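The argument in Sect. 12.4.1 showing that the right-hand set of the cut L = {x ∈ Q : x ≤ 0} ∪ {x ∈ Q : x^2 < 2} has no smallest element is constructive, and can be followed with exact rational arithmetic. The sketch below is an illustration under that reading; the names in_L and smaller_element_of_R are hypothetical helpers introduced here.

```python
from fractions import Fraction
from math import floor

def in_L(x):
    """Membership in L = {x in Q : x <= 0} U {x in Q : x^2 < 2}."""
    return x <= 0 or x * x < 2

def smaller_element_of_R(q):
    """Given q in R = Q \\ L, return q - 1/n, a strictly smaller element of R.

    n is any natural number with 1/n < q and 2q/(q^2 - 2) < n, as in the text.
    """
    q = Fraction(q)
    if in_L(q):
        raise ValueError("q must belong to R, i.e., q > 0 and q^2 > 2")
    n = floor(max(1 / q, 2 * q / (q * q - 2))) + 1
    return q - Fraction(1, n)
```

Starting from q = 3/2 the function returns 37/26, whose square 1369/676 is still greater than 2; iterating it produces an unending strictly decreasing list of elements of R.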
12.4.2 The Axiomatic Approach
Definition 1073 The real number system is a set R together with two algebraic operations, denoted by + (called sum or addition) and · (called product), i.e., two mappings + and · from R × R into R, and an order relation (i.e., a subset R of the Cartesian product R × R) that satisfy the following axioms (we write p + q instead of +(p, q), p · q (or pq) instead of ·(p, q), and p < q (alternatively, q > p) instead of (p, q) ∈ R):
Addition Axioms
(+i) (Internal law) p + q ∈ R for all p, q ∈ R.
(+ii) (Commutative law) p + q = q + p for all p, q ∈ R.
(+iii) (Associative law) p + (q + r) = (p + q) + r for all p, q, r ∈ R.
(+iv) (Existence of a neutral element) There exists an element 0 ∈ R such that 0 + p = p for every p ∈ R.
(+v) (Existence of inverse) For every p ∈ R there exists an element (denoted −p) in R such that p + (−p) = 0.
Product Axioms
(.i) (Internal law) pq ∈ R for all p, q ∈ R.
(.ii) (Commutative law) pq = qp for all p, q ∈ R.
(.iii) (Associative law) p(qr) = (pq)r for all p, q, r ∈ R.
(.iv) (Existence of a neutral element) There exists an element 1 ∈ R such that 1 ≠ 0, and 1p = p for every p ∈ R.
(.v) (Existence of inverse) For every p ∈ R such that p ≠ 0, there exists an element (denoted 1/p) in R such that p(1/p) = 1.
Distributive Axiom
(d) For p, q, r ∈ R we have p(q + r) = pq + pr.
Order Axioms
(oi) If p, q ∈ R, then one and only one of the following statements holds: p < q, q < p, p = q.
(oii) Given p, q, r ∈ R such that p < q and q < r, then p < r.
(oiii) If p, q, r ∈ R and p < q, then p + r < q + r.
(oiv) If p, q ∈ R, 0 < p, and 0 < q, then 0 < pq.
Axiom of the Supremum
Given a nonempty subset S of R, we say that S is bounded above if an upper bound (i.e., an element b ∈ R such that s < b or s = b holds for every s ∈ S) exists. An element s ∈ R is said to be a supremum of S if it is the smallest among the collection of all upper bounds of S.
(s) Given a nonempty bounded above subset S of R, then S has a supremum.
The axiom of the supremum is also referred to as the completeness axiom.
A set S endowed with two algebraic operations + and · and an order relation 1, 2 n+1 n > (n + 1)n for n ∈ N, n ≥ 3.
Hint. Standard.
13.5 Find directly the formula for s(n) := 1^2 + 2^2 + 3^2 + . . . + n^2 for each n ∈ N (see also Exercise 13.4).
Hint. Write (m + 1)^3 − m^3 = (m + 1)^2 + m(m + 1) + m^2 = 3m^2 + 3m + 1. Summing up from m = 1 to n and using the formula 1 + 2 + . . . + n = n(n + 1)/2 (see Exercise 13.4), we get
\[ (n + 1)^3 - 1 = 3s(n) + \frac{3}{2}\, n(n + 1) + n. \]
Thus
\[ 3s(n) = n^3 + 3n^2 + 3n + 1 - 1 - \frac{3}{2}\, n(n + 1) - n, \]
hence
\[ s(n) = \frac{n(n + 1)(2n + 1)}{6}. \]
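The telescoping argument in the hint can be confirmed numerically. The following is a small check, not a proof; the function names s, s_closed and telescoped are introduced here only for illustration.

```python
def s(n):
    """s(n) = 1^2 + 2^2 + ... + n^2, computed term by term."""
    return sum(m * m for m in range(1, n + 1))

def s_closed(n):
    """The closed formula n(n + 1)(2n + 1)/6 obtained in the hint."""
    return n * (n + 1) * (2 * n + 1) // 6

def telescoped(n):
    """Sum of (m + 1)^3 - m^3 for m = 1..n, which collapses to (n + 1)^3 - 1."""
    return sum((m + 1) ** 3 - m ** 3 for m in range(1, n + 1))
```

For every n tried, telescoped(n) equals both (n + 1)^3 − 1 and 3·s(n) + 3n(n + 1)/2 + n, which is exactly the identity that the hint solves for s(n).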
Similarly we may get the formula for 1^3 + 2^3 + 3^3 + . . . + n^3 by using the expansion of (n + 1)^4 (see also Exercise 13.4).
13.6 Show that the cardinality of the family of all subsets (including the empty set) of a set of cardinality n ∈ N is 2^n.
Hint. By induction: split the family of all subsets of {1, 2, . . ., n + 1} into two disjoint subfamilies: The first subfamily is formed by the subsets of {1, 2, . . . , n}, the second one by all subsets of {1, 2, . . . , n} to which n + 1 was added. Note that 2^n + 2^n = 2^{n+1}.
13.7 Recall that a triplet {a, b, c} of natural numbers is said to be a Pythagorean triplet if a^2 + b^2 = c^2 (see Remark 11). Prove that {3, 4, 5} is the only consecutive Pythagorean triplet.
Hint. n^2 + (n + 1)^2 = (n + 2)^2, hence n^2 − 2n − 3 = 0, so n = 3.
13.8 Prove that for any prime number p, there exists a prime number q greater than p by following the hint. This gives an alternative proof to the statement that there are infinitely many primes (Theorem 9).
13.1 Numbers
Hint. Let p be a prime number. Let {2, 3, 5, . . ., p} be the (finite) sequence of all prime numbers less than or equal to p. The number q := (2 · 3 · · · p) + 1 is certainly greater than p. Either q is prime, and then we are done, or it is composite. In the second case, it has a prime number divisor, say r (Theorem 8). Since none of the prime numbers 2, 3, . . ., p is a divisor of q, the prime number r must be greater than p, and we get the conclusion also in this case.
13.9 Suppose 2^n − 1 is a prime number with n ∈ N. Prove that n must be a prime number.
Hint. Prime numbers of the form 2^p − 1, where p is a prime number, are referred to as Mersenne primes. Assume n = rs with r, s > 1. Then 2^n − 1 = (2^r)^s − 1 = (2^r − 1)((2^r)^{s−1} + (2^r)^{s−2} + · · · + 1), and both factors are greater than 1.
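Both hints above are effective procedures, so they can be run on small inputs. The sketch below uses plain trial division, which is an assumption made for simplicity and is only adequate for small numbers; primes_up_to, smallest_prime_factor, prime_beyond and mersenne_split are names chosen for this example.

```python
def primes_up_to(p):
    """All primes <= p, found by trial division (adequate for small p)."""
    return [m for m in range(2, p + 1)
            if all(m % d for d in range(2, int(m ** 0.5) + 1))]

def smallest_prime_factor(q):
    """The least divisor d >= 2 of q; it is necessarily prime."""
    d = 2
    while d * d <= q:
        if q % d == 0:
            return d
        d += 1
    return q

def prime_beyond(p):
    """A prime greater than the prime p, obtained as in the hint of Exercise 13.8."""
    q = 1
    for r in primes_up_to(p):
        q *= r
    q += 1                       # q = (2 * 3 * ... * p) + 1
    return smallest_prime_factor(q)

def mersenne_split(r, s):
    """The two factors of 2^(rs) - 1 written in the hint of Exercise 13.9."""
    base = 2 ** r
    return base - 1, sum(base ** j for j in range(s))
```

For p = 13 the number q = 30031 is composite, and prime_beyond(13) returns its prime divisor 59, which is indeed greater than 13. Similarly, mersenne_split(3, 4) returns (7, 585), and 7 · 585 = 4095 = 2^12 − 1.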
13.10 Combinatorial numbers. Given n, k ∈ N, the symbol \(\binom{n}{k}\) (it reads: n choose k) denotes the number of choices of k elements from a set of n elements (the order is not considered).
(i) Prove that
\[ \binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1}. \tag{13.1} \]
(ii) Prove (2.41), i.e., that
\[ \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{k!\,(n-k)!}. \tag{13.2} \]
(iii) Derive directly (no formulas or induction needed) the finite binomial expansion (see also Eq. (5.96))
\[ (a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}, \quad \text{for } a, b \in R. \tag{13.3} \]
Hint. (i) Given n + 1 elements, k + 1 of them are chosen in one of the two following ways: (a) either we choose them from the first n elements (and so we get the second summand in (13.1)) or (b) we choose k from the first n elements and add to them the one left (and so we get the first summand in (13.1)). (ii) Use induction. (iii) Write (a + b).(a + b) . . . (a + b) (n times) and collect terms in which, in the product, a is taken k times (and b thus n − k times).
13.11 Show that, for every n ∈ N, we have \(\sum_{k=0}^{n} \binom{n}{k} = 2^n\).
Hint. 2^n = (1 + 1)^n and use (iii) in Exercise 13.10.
13.12 Show that n^5 − n is divisible by 30 for each n ∈ N, n ≥ 2.
Hint. Proceed by induction: Obviously, the result is true for n = 2. Assume now that it holds for some n ≥ 2. Note that
13 Exercises
\[ (n+1)^5 - (n+1) - (n^5 - n) = n^5 + \sum_{j=1}^{4} \binom{5}{j} n^j + 1 - (n+1) - n^5 + n = \sum_{j=1}^{4} \binom{5}{j} n^j. \]
The last term above is divisible by 5 due to the combinatorial numbers involved, and so is n^5 − n by the induction hypothesis. Thus (n + 1)^5 − (n + 1) is divisible by 5. This proves that n^5 − n is divisible by 5 for all n ≥ 2. The divisibility of n^5 − n by 2 and 3 follows by using the three consecutive numbers in the equality n^5 − n = (n − 1)n(n + 1)(n^2 + 1).
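The identities of Exercises 13.10 to 13.12 lend themselves to a quick numerical check. The sketch below relies on math.comb from the Python standard library (available from Python 3.8); the helper names pascal_rule_holds, binomial_expansion and induction_step are introduced only for this illustration.

```python
from math import comb

def pascal_rule_holds(n, k):
    """Identity (13.1): C(n, k) + C(n, k+1) = C(n+1, k+1)."""
    return comb(n, k) + comb(n, k + 1) == comb(n + 1, k + 1)

def binomial_expansion(a, b, n):
    """Right-hand side of (13.3): sum over k of C(n, k) a^k b^(n-k)."""
    return sum(comb(n, k) * a ** k * b ** (n - k) for k in range(n + 1))

def induction_step(n):
    """(n+1)^5 - (n+1) - (n^5 - n), which the hint rewrites as sum_{j=1}^{4} C(5, j) n^j."""
    return (n + 1) ** 5 - (n + 1) - (n ** 5 - n)
```

With a = b = 1 the expansion gives 2^n, which is Exercise 13.11, and induction_step(n) is a multiple of 5 because each of C(5, 1), . . ., C(5, 4) is.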
13.1.3 Fractions
13.13 Prove that each rational number has a unique representative p/q, for p ∈ Z, q ∈ N \ {0}, and gcd(p, q) = 1.
Hint. Assume that p/q = r/s, where the couples (p, q) and (r, s) satisfy the requirements in the statement. Then ps = rq. Every prime number in the factorization of p must appear (with the same or bigger multiplicity) in the factorization of r. The argument can be reversed, and so every prime number in the factorization of r must appear (with the same or bigger multiplicity) in the factorization of p. This forces p = r (and then q = s).
13.14 Find the fraction that represents the real number 0.1122333. . . (the digit 3 repeating).
Hint. Follow the procedure in the proof of Theorem 20: x = 0.1122333. . ., 10^4 x = 1122.333. . ., 10^5 x = 11223.333. . ., 10^5 x − 10^4 x = 11223 − 1122, x = (11223 − 1122)/(9 · 10^4).
13.15 Prove that ∛2 is not a fraction.
Hint. Modify accordingly the proof of Theorem 18. Precisely, assume that ∛2 = p/q for some p ∈ Z and q ∈ N \ {0}, where gcd(p, q) = 1. Then p^3 = 2q^3 ⇒ p^3 even ⇒ p even, p = 2k ⇒ 8k^3 = 2q^3 ⇒ q^3 = 4k^3 ⇒ q^3 even ⇒ q even, which contradicts gcd(p, q) = 1.
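The procedure in the hint of Exercise 13.14 works for any eventually periodic decimal. The sketch below assumes, as the hint suggests, that the number meant is 0.1122 followed by the repeating digit 3; the function name and its string-based interface are choices made for this example only.

```python
from fractions import Fraction

def repeating_decimal_to_fraction(non_repeating, repeating):
    """Value of 0.<non_repeating><repeating><repeating>... as a reduced fraction.

    With k non-repeating and r repeating digits,
    10^(k+r) x - 10^k x is the integer computed below.
    """
    k, r = len(non_repeating), len(repeating)
    upto_first_period = int(non_repeating + repeating)
    before_period = int(non_repeating) if non_repeating else 0
    return Fraction(upto_first_period - before_period, 10 ** (k + r) - 10 ** k)
```

For the digits of the exercise, repeating_decimal_to_fraction("1122", "3") equals (11223 − 1122)/(9 · 10^4), which reduces to 3367/30000; by Exercise 13.13 this reduced representative is unique.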
13.1.4 Base Representation
13.16 (i) Write 1130 in base 11. (ii) Write 1/5 in base 2. (iii) Write 1/3 in base 2. (iv) Write 3/4 in base 2. Check back the result by summing infinite series.
Hint. (i) Following the text in Sect. 1.5.1, put b := 11. Then, 1130 = (102 × 11) + 8 = (9 · 11 + 3) · 11 + 8 = 9 · b^2 + 3 · b + 8 = 938 (base 11). (ii) Following the text in Sect. 1.5.2, compute subsequently 1 : 5, 2 : 5, 4 : 5, 8 : 5, 6 : 5, 1 : 5, . . . This gives 1/5 = 0.00110011. . . (base 2), with the block 0011 repeating. (iii) 1/3 = 0.010101. . . (base 2), with the block 01 repeating. (iv) 3/4 = 0.11 (base 2).
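The two procedures used in the hint (repeated division by the base for integers, repeated multiplication by the base for fractions) can be sketched as follows. The function names integer_digits and fraction_digits are assumptions of this example, and exact arithmetic with Fraction is used so that no rounding occurs.

```python
from fractions import Fraction

def integer_digits(n, b):
    """Digits of the integer n >= 0 in base b, most significant first."""
    digits = []
    while n > 0:
        n, r = divmod(n, b)
        digits.append(r)
    return digits[::-1] or [0]

def fraction_digits(x, b, count):
    """First `count` digits after the point of x in [0, 1), written in base b."""
    x = Fraction(x)
    digits = []
    for _ in range(count):
        x *= b
        d = x.numerator // x.denominator   # integer part is the next digit
        digits.append(d)
        x -= d
    return digits
```

Here integer_digits(1130, 11) gives [9, 3, 8], and fraction_digits(Fraction(1, 5), 2, 8) gives [0, 0, 1, 1, 0, 0, 1, 1]. The check by series asked for in the exercise is the geometric sum (3/16)(1 + 1/16 + 1/16^2 + . . .) = (3/16)/(1 − 1/16) = 1/5.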
13.1.5 Real Numbers
13.17 Prove that
• 1 + 1/√2 + 1/√3 + · · · + 1/√n >
• (1/2) · (3/4) · · · ((2n − 1)/(2n))
0, (U − ε, U] ∩ A = ∅. It follows that U − ε is an upper bound for A, a contradiction with the fact that U is the least upper bound for A. This shows (ii). Assume now that (i) and (ii) hold simultaneously. (i) shows that U is an upper bound for A, while (ii) shows that U is the least upper bound for A. Together, we get that U is the supremum of A. For the infimum, proceed similarly.
13.26 Show that for two subsets A and B of R we have sup A + sup B = sup (A + B), where A + B := {a + b : a ∈ A, b ∈ B}. Show that for two subsets A and B of {r ∈ R : r ≥ 0} we have sup A . sup B = sup (A.B), where A.B := {ab : a ∈ A, b ∈ B}.
Hint. Standard.
13.27 Draw the picture of the set (0, 1) + [0, 1/2).
Hint. Standard.
13.28 Give an example of a subset M of R such that M + M is not a subset of 2M. Compare with Exercise 13.29.
Hint. M = {0, 1}.
13.29 Show that for a convex subset M of R we have M + M ⊂ 2M. Compare with Exercise 13.28.
Hint. The definition of convexity (see Sect. 8.1).
13.30 For which subsets M of R do we have 2M ⊂ M + M?
Hint. All.
13.31 Show in detail that ⋂_{n=1}^∞ (0, 1/n) = ∅.
Hint. Use, for example, Proposition 126.
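For finite sets, the sets A + B and 2M of Exercises 13.26 to 13.30 can be computed directly, which makes the example in the hint of Exercise 13.28 easy to inspect. This is only a sketch for finite sets of numbers; the names minkowski_sum and dilate are introduced here for illustration.

```python
def minkowski_sum(A, B):
    """A + B := {a + b : a in A, b in B} for finite sets of numbers."""
    return {a + b for a in A for b in B}

def dilate(M, t):
    """tM := {t m : m in M}."""
    return {t * m for m in M}
```

With M = {0, 1} one gets M + M = {0, 1, 2} and 2M = {0, 2}, so 2M ⊂ M + M holds (as it does for every M, since 2m = m + m), while M + M is not a subset of 2M.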
13.1.6 Cardinality of Sets—and Ordinal Numbers
Exercises 13.32–13.40 review the basics of the theory of cardinal numbers, and Exercises 13.64–13.68 elements of the theory of ordinal numbers. They complete the material in Sect. 1.7. We are indebted to several sources, in particular [Kam50] and [Sie65]. 13.32 Introduction to Cardinal Numbers A set A is said to be equivalent to a set B (in symbols, A ∼ B) if there exists a one-to-one mapping from A onto B. By assumption, the empty set is equivalent only to itself. By a cardinal number u we mean a class of mutually equivalent sets. If A is a set, the cardinal number to which A belongs is denoted by card A. Thus, two sets A and B are equivalent, if and only if, card A = card B, i.e., they have the same cardinality according to Definition 49. Any set A in the class card A is said to be a representative of the cardinal number card A
A set A is said to have a smaller cardinal number than a set B (in symbols card A < card B) if A is equivalent to a subset of B and B is equivalent to no subset of A. Given two cardinal numbers u and v, we say that u ≤ v whenever u < v or u = v. This agrees with Definition 49. The relationship ≤ is a partial order, in the sense that for cardinal numbers u, v, w, (ai) u ≤ u, (aii) if u ≤ v and v ≤ u then u = v, and (aiii) if u ≤ v and v ≤ w then u ≤ w. Property (aii) is the statement of the Cantor–Bernstein– Schröder Theorem 50. Prove (ai) and (aiii) above. (b) Prove that given two cardinal numbers u and v, at most one of the three possibilities u < v, u = v, or v < u arises. It is true, too, that at least one happens, although the proof of this will be postponed to Exercise 13.64. The notions of finite and infinite sets were introduced in Definition 49. The cardinal number of the empty set is, by definition, 0. The cardinal number of a nonempty finite set is called, accordingly, a finite cardinal number, and corresponds to (and will be denoted by) a natural number. Otherwise, we speak of transfinite cardinal numbers. The smallest transfinite cardinal number is the cardinal number of N, denoted by ℵ0 ; this follows from (c) below. (c) Prove that every infinite set contains a countably infinite subset. The cardinal number of the set R is denoted by c. It was proved in Theorem 59 that ℵ0 < c. There are infinitely many cardinal numbers, since the following result holds: (d) For every set A, the set P(A) of all the subsets of A has a cardinal number larger than card A. The hypothesis that there is no cardinal number larger than that of natural numbers and at the same time smaller than that of real numbers is called the continuum hypothesis (CH) and is consistent with the usual axioms of mathematics. By this we mean that, simultaneously, (i) if we assume it, no contradiction can occur, and (ii) if we assume its negation, no contradiction can occur either. 
See, e.g., [Du66] for more information on this. (a)
Hint. (ai) and (aiii) have an easy proof. (b) follows from the definitions. (c) This is (i) in Proposition 51. (d) This assertion is proved by using a Cantor diagonal method, similar to the one used to prove that ℵ0 < c (see again Theorem 59, Proposition 65, and Remark 66). In detail, assume that f : V → P(V) is a one-to-one and onto mapping. Form the set S := {x ∈ V : x ∉ f(x)}. Since S ∈ P(V), there exists v ∈ V such that f(v) = S. If v ∈ S, then v ∉ f(v) (= S), a contradiction. If v ∉ S, then v ∈ f(v) (= S), again a contradiction.
13.33 Let n ∈ N. Prove that the sets {1, 2, . . ., n} and {1, 2, . . ., n, n + 1} do not have the same cardinality (i.e., they are not equivalent). Prove then that the same is true for the sets {1, 2, . . ., n} and {1, 2, . . ., n, n + 1, . . ., n + m} for m ∈ N.
Hint. Assume that there is a one-to-one mapping f from {1, 2, . . ., n + 1} onto {1, 2, . . ., n}. It is easy to prove by induction that the sequence {n + 1, f(n + 1), f(f(n + 1)), f(f(f(n + 1))), . . .} consists of mutually distinct elements in {1, 2, . . ., n + 1}, and this is impossible due to the fact that this last set is finite. The second part follows from the first one by finite induction.
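The diagonal set used in the hint for (d) of Exercise 13.32 can be examined exhaustively when V is a small finite set. The sketch below enumerates every mapping f : V → P(V) and checks that the set S := {x ∈ V : x ∉ f(x)} is never a value of f; the names powerset, diagonal_set and diagonal_always_missed are chosen for this example, and the brute-force enumeration is only feasible for very small V.

```python
from itertools import combinations, product

def powerset(V):
    """All subsets of the finite set V, as frozensets."""
    items = sorted(V)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(V, f):
    """S := {x in V : x not in f(x)} for a mapping f given as a dict."""
    return frozenset(x for x in V if x not in f[x])

def diagonal_always_missed(V):
    """True if, for every f : V -> P(V), the diagonal set is not in the image of f."""
    items = sorted(V)
    for images in product(powerset(V), repeat=len(items)):
        f = dict(zip(items, images))
        if diagonal_set(V, f) in images:
            return False
    return True
```

For V = {0, 1, 2} there are 8^3 = 512 mappings to inspect, and diagonal_always_missed(V) returns True: no mapping from V into P(V) is onto.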
13.34 Prove, by using the Cantor–Bernstein–Schröder Theorem 50, that if A is equivalent to a subset of B, then card A ≤ card B.
Hint. If A ∼ C, where C ⊂ B, two possibilities may arise: (i) No subset of A is equivalent to B, so card A < card B, or (ii) there is a subset of A equivalent to B. Apply in this case the Cantor–Bernstein–Schröder Theorem 50.
13.35 Let A be a finite set. Let B be a subset of A. Assume that there exists a one-to-one mapping f from B onto A. Prove that A = B.
Hint. If A is empty, there is nothing to prove. If not, without loss of generality we may assume that A = {1, 2, . . ., n} for some n ∈ N. The set B is nonempty. By (ii) in Proposition 51, B is finite, hence it is equivalent to {1, 2, . . ., m} for some m ∈ N. Therefore, we get a one-to-one mapping from {1, 2, . . ., n} onto {1, 2, . . ., m}, and this violates the statement in Exercise 13.33 if n ≠ m.
13.36 Prove that a nonempty set A is finite, if and only if, it is not equivalent to any of its proper subsets.
Hint. Let A be a nonempty finite set. Then, we may assume, without loss of generality, that A = {1, 2, . . ., n} for some n ∈ N. Assume that A is equivalent to a proper subset B. Then, there exists a one-to-one mapping from B onto A, a contradiction with the statement in Exercise 13.35. Assume now that A is infinite. By (i) in Proposition 51, A contains N′, a subset equivalent to N. If f : N′ → N is a one-to-one and onto mapping, and 2N denotes the subset of N of all even natural numbers, put E := f^{−1}(2N). Let h : N → 2N be a one-to-one and onto mapping (for example, put h(n) := 2n for all n ∈ N). The mapping g : A → A defined as
\[ g(s) := \begin{cases} s & \text{if } s \in A \setminus N', \\ f^{-1} \circ h \circ f(s) & \text{otherwise,} \end{cases} \]
maps A onto (A \ N′) ∪ E in a one-to-one way, and (A \ N′) ∪ E is a proper subset of A.
13.37 Show that if there is a map ϕ from a set A onto a set B, then card B ≤ card A.
Hint. For b ∈ B let ϕ^{−1}(b) = {a ∈ A : ϕ(a) = b}. Using the axiom of choice, pick a_b ∈ ϕ^{−1}(b) for every b ∈ B. Define A1 := {a_b : b ∈ B}. Show that card B = card A1 ≤ card A.
13.38 Let A and B be two nonempty finite sets with the same cardinality. Prove that if a mapping f from A into B is one-to-one then f has to be onto as well.
Hint. Since card(f(A)) = card(B), there is a one-to-one mapping from f(A) onto B. Exercise 13.35 shows that f(A) = B.
13.39 Show that the cardinality of any infinite subset S of N is ℵ0.
Hint. This is (iii) in Proposition 51. Alternatively, construct by induction a strictly increasing sequence of integers {a_n} in S. It follows that the map n → a_n from N into S is one-to-one. The identity is a one-to-one map from S into N. Then use the Cantor–Bernstein–Schröder Theorem 50.
13.40 Algebra of Cardinal Numbers Given two cardinal numbers u and v, define the sum u + v as card(U ⊔ V), where ⊔ denotes the disjoint union, and U (V) is a representative of u (respectively, of v). (a)
(b) (c)
(d)
(e)
Prove that the definition of sum above is independent of the representatives U and V, that the sum is commutative and associative, and that u1 ≤ v1 and u2 ≤ v2 implies u1 + u2 ≤ v1 + v2. The following equalities hold: n + ℵ0 = ℵ0 for every finite cardinal number n (Exercise 13.41). ℵ0 + ℵ0 = ℵ0 (Proposition 54, see also Exercise 13.41). More generally, ℵ0 + u = u for every transfinite cardinal number u (Proposition 54, see also Exercise 13.41). c + c = c (Exercise 13.45). Define the product u.v of two cardinal numbers as card(U × V), where U and V are sets that represent u and v, respectively, and U × V denotes the Cartesian product of U and V (see Sect. 1.1). Prove that the product is commutative, associative, and distributive with respect to the sum. Prove also that u1 ≤ v1 and u2 ≤ v2 implies u1.u2 ≤ v1.v2. The following equalities hold: n.ℵ0 = ℵ0 (Exercise 13.42) and n.c = c (Exercise 13.45) for every finite cardinal number n ≠ 0, ℵ0.ℵ0 = ℵ0 (Exercise 13.42), ℵ0.c = c (Exercise 13.45), and finally c.c = c (Exercise 13.55). Associate to each element v of a nonempty set V a set U_v. We may assume that the sets in {U_v : v ∈ V} are pairwise disjoint. Let u_v be the cardinal of U_v for v ∈ V. We may extend the concept of sum and product of two cardinal numbers to this situation, introducing the cardinals Σ_{v∈V} u_v and ∏_{v∈V} u_v accordingly (use the sets ⊔_{v∈V} U_v and ∏_{v∈V} U_v, respectively). Prove that the expected corresponding properties for the generalized sum and product of cardinal numbers hold. A particular case is the power of two cardinal numbers u and v, represented u^v. It corresponds to the product in the case when card U_v = u for all v ∈ V, and card V = v. The power of two cardinal numbers can be formulated equivalently by defining first the concept of the power of two sets: Let U and V be two nonempty sets. Let U^V := F(V, U), i.e., the set of all mappings from V into U.
Prove that the set F(V, U) can be canonically identified with the product ∏_{v∈V} U_v, where U_v = U for all v ∈ V. An important particular case is when U := {0, 1}. Instead of {0, 1}^V we write 2^V. Denote, as usual, by P(V) the set of all subsets of V. The mapping χ : P(V) → 2^V given by χ(B) := χ_B for each B ⊂ V, where χ_B is the characteristic function of B (see Definition 295), is a one-to-one mapping from P(V) onto 2^V. Now we may define card(U)^{card(V)} to be card(U^V). This definition coincides with the previous one due to the fact that the set F(V, U) has been identified with the set ∏_{v∈V} U_v. Since card{0, 1} = 2 we get, in particular, 2^{card(V)} = card(P(V)). Taking V := N, we get 2^{ℵ0} = card(P(N)). The exact value of 2^{ℵ0} is c (see Exercise 13.47). In Exercise 13.32 (d) we mentioned already that for every set V, card(V) < card(P(V)), i.e., v < 2^v for every cardinal number v.
13.1 Numbers
(f) Prove that if C is a family of cardinal numbers that does not contain a largest one, then there exists a cardinal number that is larger than any of them.
(g) Prove, by using (f), that it is self-contradictory to consider the set of all cardinal numbers.
Hint. (a), (b), and (c) are standard. (d) is standard. (e) To the function f : V → U corresponds the element $(f(v))_{v\in V} \in \prod_{v\in V} U_v$. (f) Consider the cardinal number $\sum_{w\in C} w$.
13.41 (i) Show that n + ℵ0 = ℵ0 for every finite cardinal number n. (ii) Show that ℵ0 + ℵ0 = ℵ0. (iii) Show that ℵ0 + u = u for every transfinite cardinal number u.
Hint. (i) This is a consequence of Proposition 54. A direct proof is as follows: Given n ∈ N, the set A := {n + 1, n + 2, . . .} is equivalent to N (this can be seen directly; alternatively, use Exercise 13.39), and N = {1, 2, . . ., n} ∪ A. (ii) Write N as the union of the set of odd numbers and the set of even numbers and use (iii) in Proposition 51. Alternatively, use Proposition 54. (iii) This is Proposition 54. Alternatively, let U be a set whose cardinal number is u. The set U contains a countable subset U0 (see (c) in Exercise 13.32), and U0 ∼ U0 ∪ N by (ii). Let U1 := U \ U0. Then U ∼ U1 ∪ U0 ∼ U1 ∪ U0 ∪ N ∼ U ∪ N.
13.42 Write N as the union of infinitely many disjoint infinite sets. Show, as a consequence, that ℵ0.ℵ0 = ℵ0 and that n.ℵ0 = ℵ0 for all finite cardinals n ≠ 0.
Hint. For n ∈ N, let Sn be the collection of all natural numbers that require exactly n primes in their expansion. For the second part, use Proposition 53. Alternatively, use the first part and Exercise 13.39. That n.ℵ0 = ℵ0 for all finite cardinals n ≠ 0 follows by induction from the fact that ℵ0 + ℵ0 = ℵ0 (see Exercise 13.41) or from the fact that there are one-to-one mappings from N into S1 ∪ . . . ∪ Sn and from this set into S1 ∪ S2 ∪ . . . = N, using finally the Cantor–Bernstein–Schröder Theorem 50.
13.43 Let S be a nonempty set of cardinality less than or equal to ℵ0. What is the cardinality of the family of all finite subsets of S?
Hint. If n := card S is finite (a natural number or 0), then the family of all finite subsets of S is the family of all subsets of S, whose cardinality is $2^n$.
If S := N, then note that, for each n ∈ N, the set of all subsets of N having cardinal n can be viewed as a subset of $N^n$, whose cardinal is $\aleph_0^n$ (i.e., ℵ0, as follows from ℵ0.ℵ0 = ℵ0, see Exercise 13.42, by finite induction); the Cantor–Bernstein–Schröder Theorem 50 shows that its cardinality is precisely ℵ0. It is enough now to apply Exercise 13.42 again to obtain that the sought cardinal is ℵ0.
13.44 (i) Prove that all intervals (0, 1), [0, 1), (0, 1] and [0, 1] have cardinality c by using the given hint. (ii) Show that [0, 1] and (0, 1) have the same cardinal number by giving an explicit bijection f : [0, 1] → (0, 1).
Hint. (i) This follows from Proposition 61. Alternatively, observe that (0, 1) ⊂ [0, 1) ⊂ [0, 1] ⊂ (−1, 2), and that (0, 1) and (−1, 2) have cardinality c. Use the Cantor–Bernstein–Schröder Theorem 50. Finally, use Proposition 61. A similar argument applies to (0, 1].
13 Exercises
(ii) Let {qn : n = 3, 4, . . .} be an enumeration (without repetitions) of the set of all rational numbers in (0, 1), and put q1 := 0 and q2 := 1. Let S be the set of all irrational numbers in (0, 1). Define a mapping f from [0, 1] onto (0, 1) by f(qn) := qn+2 for n ∈ N, and f(s) := s for each s ∈ S. Clearly f is a bijection from [0, 1] onto (0, 1). Alternatively, to show for instance that card [0, 1] equals card (0, 1] we can proceed as follows: If $\{r_i\}_{i=0}^{\infty}$ is the sequence of all the rational numbers in [0, 1] with r0 = 0, then consider the map ϕ(ri) = ri+1 and ϕ(x) = x if x is irrational.
13.45 Prove that c + c = c. By induction, prove that n.c = c for each finite cardinal number n ≠ 0, and that ℵ0.c = c.
Hint. Use Exercise 13.44 and Proposition 61 to get that c = card R = card (0, 1) = card (0, 1] ≤ card ((0, 1] ∪ (1, 2)) = card (0, 2) = card R. Observe that card ((0, 1] ∪ (1, 2)) = c + c. To prove the last statement, put $R = \bigcup_{n\in Z} [n, n + 1)$.
13.46 Show that every set of cardinality c can be decomposed into a countably infinite number of disjoint sets of cardinality c.
Hint. Use Exercise 13.45.
13.47 Prove that $2^{\aleph_0}$ = c.
Hint. Put Pf(N) for the set of all finite subsets of N. Observe first that card (Pf(N)) = ℵ0 (see Exercise 13.43). Identifying $2^N$ with P(N), define a map ϕ : $2^N$ \ Pf(N) → (0, 1] by ϕ(A) := 0.χA(1)χA(2)χA(3). . . (base 2), where χA is the characteristic function of A, a subset of N. We claim that ϕ is one-to-one and onto. Indeed, let A and B be two infinite subsets of N such that A ≠ B. Let n be the first element in N that belongs to exactly one of the sets A and B. Assume, without loss of generality, that n ∈ A \ B. Then χA(n) = 1 and χB(n) = 0. There exists m > n such that m ∈ A, since A is infinite. It follows that
$$\varphi(A) - \varphi(B) \ge 2^{-n} + 2^{-m} - \sum_{k=n+1}^{\infty} 2^{-k} = 2^{-m},$$
and this shows that ϕ is one-to-one. Moreover, every x ∈ (0, 1] has a nonterminating binary expansion x = 0.a1a2a3 . . .. Indeed, if x = 0.b1b2 . . . bn is a terminating binary expansion (with bn = 1), then x = 0.b1b2 . . . bn−1(bn − 1)111 . . .. This implies that ϕ is onto, and the claim holds. This proves that card ($2^N$ \ Pf(N)) = card ((0, 1]) = c (see Exercise 13.44). Since card (Pf(N)) = ℵ0 and ℵ0 + c = c (see Exercise 13.41) we conclude that $2^{\aleph_0}$ = c, as claimed.
13.48 What is the cardinality of the set of all integer-valued functions defined on the set of all integers? (In other words, what is the cardinality of $N^N$?)
Hint. $2^{\aleph_0} \le \aleph_0^{\aleph_0} \le (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0}$ (use Exercise 13.42). So, the answer is $2^{\aleph_0}$, i.e., c, due to Exercise 13.47.
13.49 What is the cardinality of the set of all permutations of N?
Hint. The set is included in $N^N$, and includes, for each subset A of N that is not a singleton, a permutation that keeps the elements in N \ A and changes all elements in A. Use then Exercise 13.48 and the Cantor–Bernstein–Schröder Theorem 50. The answer is c.
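The explicit bijection from [0, 1] onto (0, 1) described in the hint to Exercise 13.44 (ii) can be tried out on finitely many rationals. The sketch below is an illustration only: it fixes one particular enumeration of the rationals of (0, 1) (by increasing denominator, our own choice), lists 0 and 1 first, and shifts every listed rational two places down the list. The names `rationals_01` and `f` are hypothetical, and irrational numbers, which the map leaves fixed, are not represented.

```python
from fractions import Fraction
from math import gcd

def rationals_01(count):
    """First `count` terms of the list 0, 1, followed by the rationals of (0, 1)
    ordered by increasing denominator (one possible enumeration)."""
    qs = [Fraction(0), Fraction(1)]
    d = 2
    while len(qs) < count:
        for p in range(1, d):
            if gcd(p, d) == 1 and len(qs) < count:
                qs.append(Fraction(p, d))
        d += 1
    return qs

N = 50
q = rationals_01(N + 2)                      # q[0], q[1], q[2], ... play the role of q1, q2, q3, ...
index = {x: i for i, x in enumerate(q[:N])}  # positions of the first N listed rationals

def f(x):
    """Shift a listed rational two places down the list (defined on the first N terms only)."""
    return q[index[x] + 2]

values = [f(x) for x in q[:N]]
print(f(Fraction(0)), f(Fraction(1)))        # images of the two endpoints
print(all(0 < v < 1 for v in values), len(set(values)) == N)
```

On the finite table the images are pairwise distinct and avoid both endpoints, which is exactly what makes the shift a bijection once it is extended by the identity on the irrationals.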
13.50 Show that the cardinality of the set P of all irrational numbers is c by using the given hint.
Hint. This is Corollary 60. Alternatively, we propose four different approaches:
1. The set $N^N$ (which is of cardinality c, see Exercise 13.48) can be mapped one-to-one and onto the set of all irrational numbers between 0 and 1. This can be proved by using the concept of continued fractions; the sought map ϕ is given by
$$\varphi(n_1, n_2, \ldots) = \cfrac{1}{n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \cdots}}}, \quad \text{for } (n_1, n_2, \ldots) \in N^N.$$
We refer to, e.g., ([FHHMZ11], § 17.14) for more on continued fractions. For another proof of the existence of a one-to-one mapping from $N^N$ onto the set P see Exercise 13.392.
2. Alternatively, fill in all the gaps in the following argument: Write the set P of irrational numbers as $P = \bigcup_{i=1}^{\infty} S_i$, where $S_i := P \cap I_i$ for all i ∈ N, and $\{I_i\}_{i=1}^{\infty}$ is the sequence of open intervals between two consecutive integer numbers. (a) Explain why card Si = card P for every i ∈ N. Use this to construct a one-to-one map from P onto a set of countably many disjoint copies of P. Thus ℵ + ℵ + ℵ + . . . = ℵ, where ℵ denotes the cardinality of the set of all irrational numbers. Then, in particular, ℵ ≤ c = ℵ0 + ℵ ≤ ℵ + ℵ ≤ ℵ + ℵ + ℵ + . . . = ℵ. Thus c = ℵ. Maybe only (a) requires some help: Use the mapping x → 1/x to pass from an open bounded interval (not containing 0) to an open unbounded interval. This map takes rational (irrational) numbers into rational (respectively, irrational) numbers.
3. Here is one more alternative proof: Let ϕ be a one-to-one map from R onto R², and call A the image of the set Q. The set A is countable. Let ℓ be a vertical line in R² whose x-coordinate is different from the x-coordinates of all the points in A. Then ℓ lies outside of A, and the cardinality of ℓ is c. Then the cardinality of ϕ⁻¹(ℓ) is also c. The set ϕ⁻¹(ℓ) lies outside of Q. This shows that P contains a subset whose cardinality is c, and it is contained in a set whose cardinality is c. The Cantor–Bernstein–Schröder Theorem 50 shows that the cardinality of P is c.
4. (This needs results from Sect. 6.7.) The set P, endowed with the metric inherited from the usual metric in R, is a perfect Polish space. It is enough now to use Corollary 591.
13.51 What is the cardinality of the set of all algebraic numbers? Note that a real number is called algebraic if it is a root of a nonzero polynomial with integer coefficients.
Hint. Every nonzero polynomial has a finite number (if any) of real roots.
How many polynomials of degree n with integer coefficients are there? Then use Proposition 54 or, alternatively, Exercise 13.42.
13.52 Prove Sierpiński’s lemma: There is a family F formed by infinite subsets of N such that card F = c and, if F1 and F2 are two different elements of F, then card (F1 ∩ F2) < ∞.
Hint. Write every real number as the limit of a sequence of distinct rational numbers and identify, by a one-to-one onto map, the set Q and the set N. In this way to each
real number corresponds an infinite subset of N; due to the definition of limit, this gives the sought family F.
13.53 (a) What is the cardinality of the family of all countable subsets of (0, 1)? (b) What is the cardinality of the set of all continuous functions on [0, 1]?
Hint. (a) Observe that there are as many countable subsets of (0, 1) as mappings from N into (0, 1). Accordingly, the cardinal of the family of all countable subsets of (0, 1) is card $(0, 1)^N = c^{\aleph_0}$. Now compute $c^{\aleph_0} = (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0} = c$ (see Exercises 13.42 and 13.47). (b) Use the fact that each continuous function is determined by its values at Q ∩ [0, 1]. Then use (a).
13.54 (a) What is the cardinality of the family of all subsets of Q? (b) What is the cardinality of the family of all open sets in (0, 1)?
Hint. (a) According to Exercise 13.47, card P(Q) = $2^{\operatorname{card} Q} = 2^{\aleph_0}$ = c. (b) Use the fact that every open set is a union of a countable family of intervals with rational endpoints. How many such countable families are there? Map each such countable family to its union. Alternatively, observe that an open subset of (0, 1) is characterized by the set of rational numbers it contains, so there is a one-to-one mapping from the family of all open subsets of (0, 1) into the family of subsets of Q. Use now (a) and the Cantor–Bernstein–Schröder Theorem 50.
13.55 (a) Show that c.c = c. (b) Show that $c^n = c^{\aleph_0} = c$ for each finite cardinal number n ≠ 0.
Hint. (a) Write x, y ∈ (0, 1] as x = 0.a1a2a3 · · · (not terminating) and y = 0.b1b2b3 · · · (not terminating). Map (x, y) to 0.a1b1a2b2 . . . . This map is one-to-one from (0, 1] × (0, 1] into (0, 1]; now use the Cantor–Bernstein–Schröder Theorem 50. (b) $c^{\aleph_0} = (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0} = c$ (see Exercises 13.47 and 13.42). For $c^n$ use the previous result and the Cantor–Bernstein–Schröder Theorem 50.
13.56 (a) What is the cardinality of all real-valued functions defined on the set of integer numbers? (b) What is the cardinality f of the set of all real-valued functions on R?
(c) What is the cardinality of the set of all real-valued measurable functions on R?
Hint. (a) $c^{\aleph_0} = (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0} = c$ (use Exercises 13.42 and 13.47). (b) That f is strictly bigger than c is in Theorem 65. Note that, by definition, $f = c^c$. Now, $2^c = 2^{\aleph_0 \cdot c} = (2^{\aleph_0})^c = c^c$, so $f = 2^c$ (use Exercises 13.45 and 13.47). (c) If C has measure zero and cardinality c (such a set exists, see Definition 277), consider the set of characteristic functions of all the subsets of C (each of them is measurable) and use the Cantor–Bernstein–Schröder Theorem 50 to conclude that the sought cardinal is $2^c$.
13.57 What is the cardinality of the family of all Lebesgue measurable sets in R?
Hint. $2^c$: Consider all the subsets of the Cantor ternary set, which has Lebesgue measure 0.
13.58 What is the cardinality of the set of all Riemann integrable functions on [0, 1]?
Hint. Similar to Exercise 13.57, having in mind (b) in Exercise 13.56.
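The digit-interleaving map used in the hint to Exercise 13.55 (a) is easy to experiment with on finite truncations of the expansions. The following sketch works with strings of decimal digits (base 10 and the truncation length are our own choices for illustration) and shows why the map is one-to-one: the two original digit strings can be read back from the interleaved one. The function names `interleave` and `split` are hypothetical.

```python
def interleave(a_digits, b_digits):
    """Interleave two digit strings of equal length: a1 b1 a2 b2 ..."""
    return "".join(x + y for x, y in zip(a_digits, b_digits))

def split(c_digits):
    """Recover the two digit strings from an interleaved one."""
    return c_digits[0::2], c_digits[1::2]

x = "4999999999"   # first digits of 0.4999... (the nonterminating expansion of 1/2)
y = "3333333333"   # first digits of 0.3333... = 1/3
z = interleave(x, y)

print("0." + z)
print(split(z) == (x, y))
```

The truncations only illustrate the mechanism; the actual argument needs the full infinite expansions and the convention that nonterminating expansions are used.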
13.59 What is the cardinality of the set of all real-valued Baire class 1 functions on R?
Hint. c. Use the definition, the fact that the set of all real-valued continuous functions has cardinality c (see Exercise 13.53), and Exercise 13.45.
13.60 What is the cardinality of the set of all real-valued monotone functions on [0, 1]?
Hint. Every monotone function on [0, 1] is determined by its values at the subset consisting of all rational numbers in [0, 1] and the points of discontinuity (which is countable, see Proposition 397). The value at a point of discontinuity is chosen in a nonempty interval (that has cardinality c), and those intervals are pairwise disjoint by monotonicity, so they are, for each function, countably many. Use then Exercise 13.55 to conclude that there are $2^{\aleph_0} \cdot 2^{\aleph_0} = 2^{\aleph_0} = c$ monotone functions on [0, 1].
13.61 This exercise presents a version of Ramsey’s theorem [Ra30]. Given a set X, we let $X^{(n)}$ be the set of all subsets of X of cardinality n. We say that a system of k disjoint sets $\{S_i\}_{i=1}^{k}$ forms a partitioning of $X^{(n)}$ whenever $X^{(n)} = \bigcup_{i=1}^{k} S_i$.
Theorem 1078 (Ramsey [Ra30]) Let k, n ∈ N. Then for every partitioning $\{S_i\}_{i=1}^{k}$ of $N^{(n)}$ there exist i ∈ {1, . . . , k} and an infinite set M ⊂ N such that $M^{(n)} \subset S_i$.
An equivalent formulation is the following. Let n be a natural number. Let ψ be a mapping from $N^{(n)}$ to some nonempty finite set C. Then, there is an infinite subset M of N such that ψ is constant on $M^{(n)}$. Still a paraphrase of the previous statement is the following: If a coloring (with a finite number of colors) of sets of natural numbers of a given cardinality n is defined, then there is an infinite subset M of N such that all subsets of M of length n have the same color.
Proof of Theorem 1078 By induction on n. For n = 1 the result is obvious. Assume that for some n > 1 the statement has been proved for 1, 2, . . ., n − 1. We shall prove it for n.
The argument is based on the following observation: If we fix some j ∈ N, we may consider all elements in $N^{(n)}$ that contain j. Define a mapping ψj from $(N \setminus \{j\})^{(n-1)}$ into C as ψj(F) := ψ(F ∪ {j}) for all F ∈ $(N \setminus \{j\})^{(n-1)}$. This is a coloring of all subsets of length n − 1 of N \ {j}; hence, by the induction hypothesis, there exists an infinite subset M1 of N \ {j} such that all subsets of M1 of length n − 1 get the same color (i.e., ψj is constant on them). This means that ψ is constant (the same constant) on all sets F ∪ {j}, where F ∈ $M_1^{(n-1)}$. To prove the assertion for n we iterate the construction above: Let us start with n0 := 1, and find an infinite subset M1 of N \ {n0} such that ψ is constant on all sets of the form {n0} ∪ F, for F a subset of length n − 1 of M1. Let n1 := min M1 (> n0). Find an infinite subset M2 of M1 \ {n1} such that ψ is constant (maybe a different constant) on all sets of the form {n1} ∪ F, for F a subset of length n − 1 of M2, and put n2 := min M2 (> n1). Continue in this way to obtain a sequence M1 ⊃ M2 ⊃ . . . of infinite sets (and the sequence n0 < n1 < n2 < . . .). By passing to a subsequence if necessary (denoted again {Mi}) we may assume that the same constant is associated to all Mi’s. The sought set is then $\{n_i\}_{i=1}^{\infty}$.
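Theorem 1078 concerns infinite sets and cannot be verified by computer, but its best-known finite analogue can. The sketch below is an illustration of that finite statement only, not of the proof above: it checks exhaustively that every coloring of the 2-element subsets of a 6-element set with two colors has a 3-element subset all of whose pairs get the same color, while this fails for a 5-element set. The function names are ours.

```python
from itertools import combinations, product

def has_mono_triangle(n, colour):
    """colour maps each pair (i, j) with i < j in range(n) to 0 or 1."""
    return any(
        colour[(a, b)] == colour[(a, c)] == colour[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

def every_colouring_has_mono_triangle(n):
    """Brute force over all 2-colourings of the pairs of an n-element set."""
    pairs = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(pairs, bits)))
        for bits in product((0, 1), repeat=len(pairs))
    )

r5 = every_colouring_has_mono_triangle(5)
r6 = every_colouring_has_mono_triangle(6)
print(r5)  # False: some colouring of the pairs of 5 points avoids it
print(r6)  # True: 6 points always force a monochromatic triple
```

The search for six points runs through 2^15 colorings, so it finishes quickly; larger cases grow far too fast for this naive approach.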
13.62 Notes on Infinite Combinatorics
1. Root lemma: Let A be an uncountable family of finite sets. Then, there exist an uncountable subfamily B of A and a finite (possibly empty) set R such that A ∩ B = R for every pair of distinct elements A, B of B. A family like B is called a Δ-system, and R is called the root of the Δ-system.
2. Let X be an uncountable set. For each α ∈ X, let Lα be a finite subset of X such that α ∉ Lα. Then, there exists an uncountable subset Y ⊂ X such that Y ∩ Lα = ∅ for every α ∈ Y.
3. There is a subset B of the real line R such that for any nonempty perfect subset P of R, both B ∩ P and P \ B have cardinality c.
A set like B in 3. is called a Bernstein set. For another description of a Bernstein set, see Exercise 13.158.
Hint. (a) For a proof in the case that card (A) = c and that A is formed by finite sets in [0, ω1] see, e.g., [FHHMPZ01, Exercise 12.28], [FHHMZ11, Exercise 14.30], or [DGZ93, page 262].
13.63 Define the sum of an arbitrary collection of nonnegative real numbers, and show that it is independent of any permutation of the set of indices.
Hint. Let {aγ : γ ∈ Γ} be a collection of nonnegative real numbers. If Pf(Γ) denotes the family of all finite subsets of Γ, define $s := \sum_{\gamma\in\Gamma} a_\gamma$ as
$$s := \sup_{F \in P_f(\Gamma)} \sum_{\gamma\in F} a_\gamma,$$
whenever s is finite. Note that, in this case, given ε > 0, the set {γ : aγ > ε} is finite. Thus, the set {γ : aγ ≠ 0} is countable, say, $\{a_{\gamma_i}\}_{i=1}^{\infty}$. Then $s = \sum_{\gamma\in\Gamma} a_{\pi(\gamma)}$, where π : Γ → Γ is any permutation of Γ.
Exercises 13.64 to 13.68 provide the basic elements of the theory of ordinal numbers. Some of them do not really include explicit questions, but statements and descriptions whose validity should be checked by the reader.
13.64 Basics on Ordinal Numbers The reader is invited to read first Sect. 12.6 for concepts and results. Assume that (S, ≤) and (T, ⪯) are two partially ordered sets. A mapping f : S → T is said to be increasing if f(s1) ⪯ f(s2) whenever s1 ≤ s2. If an increasing one-to-one mapping f from (S, ≤) onto (T, ⪯) exists (such a mapping is called a similarity), we say that (S, ≤) and (T, ⪯) are similar. Of course, a partially ordered set similar to a well-ordered set is itself a well-ordered set. The class of all partially ordered sets that are similar to a given well-ordered set is called an ordinal number. Since two similar sets are also equivalent, to each ordinal number there is a definite associated cardinal number. However, different ordinal numbers may have the same associated cardinal number. For example, the two sets (N, ≤) (where ≤ denotes the natural ordering) and the set (the list defines the increasing order) {1, 3, 5, 7, . . ., 2, 4, 6, 8, . . .} represent different ordinal numbers: the first one is denoted by ω, and every element but 1 has an immediate predecessor, the second one
contains two elements, namely, 1 and 2, without immediate predecessor, so they are not similar. However, both sets have cardinal ℵ0. Given a well-ordered set (M, ≤) and an element x ∈ M, the set {y ∈ M : y < x} is called an initial segment (in short, a segment) of M, and it is denoted by Mx. Note that the element x does not belong to Mx. Some facts on segments follow:
(i) First, if (M, ≤) is a well-ordered set and f is a similarity mapping from M onto a subset N of M, then x ≤ f(x) for all x ∈ M. Hence, the only similarity mapping from M onto M is the identity mapping.
(ii) A segment of a segment of M is a segment of M. Given two distinct segments Mx and My of M, then one of them is a segment of the other.
(iii) Similarity mappings between two well-ordered sets transform a segment of the first onto a segment of the second. No well-ordered set is similar to any of its segments, or to a segment of a subset. Two distinct segments of the same well-ordered set are never similar to each other. We say that two ordinal numbers μ and ν satisfy μ < ν if there exist well-ordered sets (M, ≤) representing μ and (N, ⪯) representing ν, and a similarity mapping from M onto an initial segment of N; we put μ ≤ ν if μ < ν or μ = ν. Given an ordinal number μ > 0, represent by Mμ the set of ordinal numbers that are strictly less than μ.
(iv) Maybe the most interesting feature about the set Mμ defined in (iii) above is that, when ordered according to increasing magnitude, it is a well-ordered set whose ordinal number is precisely μ. This gives canonical representatives for the ordinal numbers: To a given ordinal number μ, we associate its representative Mμ. This immediately ensures that, given two ordinal numbers μ and ν, at least (and also at most) one of the three following possibilities arises: μ < ν, μ = ν, or ν < μ. We obtain, too, that any set consisting of ordinal numbers, if ordered according to increasing magnitude, is well-ordered.
In particular, to every ordinal number μ corresponds an immediate successor, denoted μ + 1. Not every ordinal number has an immediate predecessor (if it has none, we say that the ordinal number is a limit ordinal number). A consequence of the well-ordering of any set of ordinal numbers is that it can be described by a (generalized) sequence of ordinal numbers, i.e., a family $\{\mu_i\}_{i\in I}$, where (I, ⪯) is some well-ordered set, for each i ∈ I, μi is an ordinal number, and if i ≺ j in I then μi < μj. We say that a (generalized) sequence M is fundamental if it does not contain a greatest element. In this case, there exists a definite next larger ordinal number to all elements in the sequence M, and we denote this element by $\lim_{\mu\in M} \mu$. Observe that $\lim_{\mu\in M} \mu$ is an ordinal number whose associated cardinal number is ℵ0, if card (M) = ℵ0 and each μ in M has associated cardinal number ℵ0. Given a cardinal number m, we consider the family of all ordinal numbers μ such that their associated cardinal number is m. Since this family is a subset of the well-ordered set Mν, where ν is an ordinal number having a cardinal number strictly
greater than m, it is well-ordered, so it has a first ordinal number, that we associate to the cardinal number m. Observe that two cardinal numbers m and n satisfy m < n if, and only if, their respective associated ordinal numbers μ and ν satisfy μ < ν. It follows from the fact that two arbitrary ordinal numbers can be compared that the same is true for two arbitrary cardinal numbers.
Hint. (i) Let A := {x ∈ M : f(x) < x}. Assume that A ≠ ∅, and let x0 be its first element. Then f(x0) < x0, hence f(x0) ∉ A. It follows that f(f(x0)) ≥ f(x0). However, since f is a similarity mapping, we have f(f(x0)) < f(x0), a contradiction. This implies that A = ∅. To prove the second part, observe that if f : M → M is onto and a similarity, so is f⁻¹. From the first part we get x ≤ f(x) and, when applied to f⁻¹, f(x) ≤ f⁻¹(f(x)), too, i.e., f(x) ≤ x. This shows that f(x) = x for all x ∈ M. (ii) Standard. (iii) The first statement is trivial. The rest follow from (i). (iv) Fix an ordinal number ν and a well-ordered set N whose ordinal number is ν. The mapping f from the set N into the set Mν of all ordinal numbers less than ν that sends an element x ∈ N to the ordinal number of the segment Nx is a similarity from N onto Mν. That the mapping is onto is a consequence of the definition of the ordering of ordinal numbers. See Exercise 13.68 for examples of limit ordinal numbers (ω, the ordinal of N in its natural order, is such), or for the fact that the ordinal number associated to the cardinal number ℵ0 is ω.
13.65 Algebra of Ordinal Numbers Let (M, ≤) be a well-ordered set, and let {(Nμ, ≤μ) : μ ∈ M} be a family of well-ordered sets, that we may assume without loss of generality to be pairwise disjoint. Put λμ for the ordinal number of the well-ordered set (Nμ, ≤μ), μ ∈ M. Then, we define the sum λ of the family {λμ : μ ∈ M}, denoted $\sum_{\mu\in M} \lambda_\mu$, as the ordinal number of the set $N := \bigcup_{\mu\in M} N_\mu$, ordered by ⪯, where ⪯ is defined in the following way: if x, y ∈ N are in the same set Nμ, then x ⪯ y if x ≤μ y. If, on the contrary, x ∈ Nμ and y ∈ Nν, with μ < ν, then we put x ≺ y. Prove that (N, ⪯) is a well-ordered set, and that λ is independent of the representative (Nμ, ≤μ) chosen for each λμ. It follows from this that $\lambda = \sum_{\mu\in M} \lambda_\mu$ is well-defined. Observe, too, that the sum of ordinal numbers, even a finite sum, is not commutative. Given two ordinal numbers μ and ν, represented by the well-ordered sets (M, ≤) and (N, ⪯), we define the product μ.ν of the two ordinal numbers as the ordinal number of the set M × N, ordered according to the following rule: if (x1, y1) and (x2, y2) are two elements in M × N, then (x1, y1) precedes or equals (x2, y2) if either x1 = x2 and y1 ⪯ y2, or if x1 < x2. To define the product of a well-ordered set of ordinal numbers, and then the power of two ordinal numbers, we need the transfinite induction method (stated in Exercise 13.66); see Exercise 13.67.
Hint. To prove the first question is routine. For the second (the noncommutativity of the sum of ordinal numbers) observe that the ordinal number ω + 1 is different from the ordinal number 1 + ω. The first one is represented by the set (listed according to increasing magnitude) {0, 1, 2, 3, . . ., κ} (where κ is just a symbol), the second one is clearly ω. Those two sets are not similar: the first one has a last element, not the second one.
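The noncommutativity of the sum, ω + 1 ≠ 1 + ω, can be made concrete with a small model. The sketch below is an illustration under a restrictive assumption: it only handles ordinals obtained by placing finitely many copies of ω one after another and then finitely many extra points, stored as a pair `(a, b)` (a copies of ω, then b points). The encoding and the name `add` are ours, not the book's.

```python
def add(x, y):
    """Sum of two ordinals stored as (a, b): a copies of omega followed by b points.

    The second summand is placed after the first one, as in the definition of the sum.
    """
    (a, b), (c, d) = x, y
    if c == 0:
        return (a, b + d)      # only finitely many points are appended
    return (a + c, d)          # the finite tail b is absorbed by the next copy of omega

ONE, OMEGA = (0, 1), (1, 0)

print(add(OMEGA, ONE))   # (1, 1): omega + 1 has a last element
print(add(ONE, OMEGA))   # (1, 0): 1 + omega is again omega
```

The absorption step reflects the fact that finitely many points followed by a copy of ω are similar to ω itself, which is the reason why 1 + ω = ω while ω + 1 is strictly larger.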
13.66 The Transfinite Induction Principle Prove the following result: Let μ > 0 be an ordinal number. Let P(ν) be a proposition concerning the ordinal numbers ν in Mμ. (i) Assume that P(0) is true. (ii) Assume that, given an arbitrary ordinal number ν in Mμ, if P(λ) is true for all λ ∈ Mν, then P(ν) is true. Then, we can conclude that P(ν) is true for all ν ∈ Mμ. In the case of the nonnegative integers, the induction principle reads as follows (see Sect. 1.2 and 12.1): Let P be a proposition concerning the nonnegative integers. (i) Assume that P(0) is true. (ii) Assume that, given an arbitrary positive integer m, if P(n) is true for each 0 ≤ n < m, then P(m) is true. Then, we can conclude that P is true for all the nonnegative integers. In the case of the nonnegative integers, (ii) can be replaced by (ii)’: Assume that, given an arbitrary nonnegative integer m, if P(m) is true then P(m + 1) is true.
Hint. If P(ν) fails for some ν ∈ Mμ, consider the element ν + 1 and the nonempty set S := {λ ∈ Mν+1 : P(λ) fails}. Since S, in the induced order, is a well-ordered set, it has a first element, say ν0. According to (i), 0 < ν0. The set Mν0 is, therefore, nonempty. By the definition of ν0, P(λ) holds true for every λ ∈ Mν0. According to (ii), P(ν0) holds true. This is a contradiction with the fact that ν0 ∈ S.
13.67 The Product of a Well-Ordered Set of Ordinal Numbers, and the Power of Two Ordinal Numbers Let μ be an ordinal number. We know that the ordinal number of the set Mμ is μ. To each ν ∈ Mμ we associate a well-ordered set Mν. Let λν be its ordinal number. Put f(0) := 1. Assume that, for some ν ∈ Mμ+1, f(λ) has been defined for all λ ∈ Mν. Then we have two possibilities: (i) If ν is not a limit ordinal, it has an immediate predecessor, denoted ν − 1. Put then f(ν) := f(ν − 1).λν−1. (ii) If ν is a limit ordinal, put $f(\nu) := \lim_{\lambda\in M_\nu} f(\lambda)$.
According to the Transfinite Induction Principle (see Exercise 13.66), f(ν) is defined for every ν ∈ Mμ+1. In particular, f(μ) is well defined. We call this ordinal number the product of the well-ordered set {λν : ν ∈ Mμ}, and we denote it by ∏ν […] a3 > . . .. The set {an : n ∈ N} has no first element.
13.70 Let Ω denote the set of all ordinal numbers less than or equal to the first uncountable ordinal number. Show that there is no strictly decreasing positive real-valued function defined on Ω.
Hint. Put δ := inf f and let f(αn) < δ + 1/n. Consider ordinal numbers greater than all αn. This shows, in particular, that the ordinal number associated to the cardinal number ℵ0 is, precisely, ω.
13.1.7 Topology of R
13.71 Let A := {1/i : i ∈ N}. Show that A is a Gδ-set in R. Construct a real-valued function on R that is continuous exactly at the points of A.
Hint. Choose εi > 0 such that εi < 1/i for each i ∈ N, and such that the intervals (1/i − εi, 1/i + εi) are pairwise disjoint. Call the union of these intervals G1. Do the same with intervals of radius εi/2, and call their union G2. Keep going to form G3, G4, etc. Then $\bigcap_{n=1}^{\infty} G_n$ is the set A. The function is, for example,
$$f(x) = \begin{cases} D(x)\sin\dfrac{\pi}{x}, & \text{for } x \neq 0,\\ 1, & \text{for } x = 0,\end{cases}$$
where D is the Dirichlet function (see Definition 296). Show that xD(x) does not have one-sided derivative at 0.
13.72 Give an example of a decreasing sequence of closed intervals with empty intersection.
Hint. The sequence of intervals $\{[n, \infty)\}_{n=1}^{\infty}$. This shows that the requirement of boundedness in the sequence of closed nested intervals in Theorem 69 is crucial.
13.73 Prove that the set N, as a subset of R, has no accumulation points.
Hint. Assume that x ∈ R is an accumulation point of N. Then, the neighborhood (x − 1, x + 1) must contain infinitely many natural numbers; this is plainly false.
13.74 Let K be a nonempty finite set in the real line R and let {an} be a sequence in K that converges to a ∈ R. Show that a ∈ K.
Hint. If a ∉ K find δ > 0 such that (a − δ, a + δ) ∩ K = ∅. Then, eventually an ∈ (a − δ, a + δ), a contradiction. Observe that this exercise shows that every finite subset of R is closed (see Proposition 128).
13.75 Let M be a nonempty bounded set in R, and s be the supremum of M. Show that s lies in the closure of M. Formulate a similar result for the infimum.
Hint. See (ii) in Proposition 44.
13.76 Use Theorem 45 in R to show that any nonempty bounded set of natural numbers has a maximal element in its usual order.
Hint. First note that the set is finite, hence closed (see Exercise 13.74). Then apply Exercise 13.75.
13.77 Let S be a subset of R such that its boundary is empty.
Show that S = ∅ or S = R.
Hint. S must be closed. This follows from (iii) in Proposition 78. The same argument applied to $S^c$ shows that $S^c$ is also closed (see Exercise 13.345). Finally, use the connectedness of R (see Proposition 103).
13.78 Find a set in R that is neither a Gδ-set nor an Fσ-set.
Hint. The union of the set of rational numbers in (−∞, 0) and the set of irrational numbers in (0, ∞).
13.79 Is it possible to express [0, 1] as the union of nondegenerate disjoint closed intervals each of them of length less than 1?
Hint. No. If yes, consider the set S of all endpoints of such intervals. Show that S is perfect and thus uncountable (see Corollary 591), but any such interval contains rational points.
13.80 Show that R cannot be covered by a countable family consisting of more than one closed disjoint sets.
Hint. This is Sierpiński’s result (for references and other sources see [Knu, Problem E 2613]); we present here the proof by several authors recorded there. Note its relationship to the connectedness property of R. Assume that R is the union of disjoint closed sets {Ci : i = 0, 1, 2, . . . }. Choose x < y ∈ R so that x ∈ C0, y ∉ C0. Put a0 := sup (C0 ∩ (−∞, y)). Then a0 ∈ C0 and C0 ∩ (a0, y) = ∅. Let i1 be the smallest positive integer such that Ci1 ∩ (a0, y) ≠ ∅, and put b0 := inf (Ci1 ∩ (a0, y)). Then b0 ∈ Ci1, a0 < b0, and the sets Ci, i ≤ i1, do not meet (a0, b0). Let i2 be the smallest positive integer such that Ci2 ∩ (a0, b0) ≠ ∅, and put a1 := sup (Ci2 ∩ (a0, b0)). Then a1 ∈ Ci2, and a0 < a1 < b0. Moreover, the sets Ci, i ≤ i2, do not meet (a1, b0). Let i3 be the smallest positive integer such that Ci3 ∩ (a1, b0) ≠ ∅, and put b1 := inf (Ci3 ∩ (a1, b0)). Then b1 ∈ Ci3, and a0 < a1 < b1 < b0. Moreover, the sets Ci for i ≤ i3 do not meet (a1, b1). Keep going. We get a sequence of intervals {[ai, bi] : i = 0, 1, 2, . . .} that is nested, and thus there is z such that z ∈ [ai, bi] for all i ≥ 0. It follows from the construction that then z ∉ Ci for all i ≥ 0. This is a contradiction.
13.81 Is the circle homeomorphic to the disc in R²?
Hint. No, use the connectedness after removing a point.
13.82 Prove Theorem 109 by using Theorem 111.
Hint. Let $\{G_n\}_{n=1}^{\infty}$ be a sequence of open dense subsets of F. Assume that $\bigcap_{n=1}^{\infty} G_n$ is not dense. We can find then a nonempty open subset O of F such that $O \cap \bigcap_{n=1}^{\infty} G_n = \emptyset$, i.e., $O \subset \left(\bigcap_{n=1}^{\infty} G_n\right)^c \left(= \bigcup_{n=1}^{\infty} G_n^c\right)$. Note that $G_n^c$ is nowhere dense for n ∈ N. It follows that $\bigcup_{n=1}^{\infty} G_n^c$ is of first category in F. Since $O \subset \bigcup_{n=1}^{\infty} G_n^c$, we get that O is of first category, too, a contradiction.
13.83 Let f be a continuous function defined on [0, +∞) such that f(0) = 0. If for any x ∈ [0, 1] we have $\lim_{n\to\infty} f(nx) = 0$, use Baire category to show that $\lim_{x\to\infty} f(x) = 0$.
Hint. Fix ε > 0 and consider the sets $A_{n,\varepsilon}$ = {x ∈ [0, 1] : |f(kx)| ≤ ε for all k ≥ n} for n ∈ N. Since $\bigcup_n A_{n,\varepsilon} = [0, 1]$, find n ∈ N and a nonempty open interval (a, b) ⊂ [0, 1] such that (a, b) ⊂ $A_{n,\varepsilon}$. Note that for big k, the intervals (ka, kb) overlap.
13.2 Sequences and Series
13.2.1 Approximation by Rational Numbers
13.84 Prove that if $\theta \in \mathbb{R}$ is irrational, and we choose for each $t \in \mathbb{N}$ integers $p = p_t$ and $q = q_t$ as in Lemma 116 then, whatever the choice is, we have $\lim_{t\to\infty} q_t = +\infty$.
Hint. Otherwise, there is a sequence $t_1 < t_2 < \dots$ for which we have to choose the same $p$ and the same $q$. Then we have $|q\theta - p| < t_n^{-1}$ for all $n \in \mathbb{N}$, meaning that $q\theta - p = 0$, a contradiction as $\theta$ is irrational.
13.85 If $\theta$ is irrational, show that every point of the interval $[-1, 1]$ is a limit point of the sequence $\sin \pi\theta, \sin 2\pi\theta, \dots, \sin n\pi\theta, \dots$
Hint. For $n \in \mathbb{N}$, $\mathrm{fr}(n\theta/2) = n\theta/2 - \lfloor n\theta/2 \rfloor$ is the fractional part of $n\theta/2$. Multiply by $2\pi$ to get $2\pi\,\mathrm{fr}(n\theta/2) = n\pi\theta - 2\pi\lfloor n\theta/2 \rfloor$, hence $\sin(2\pi\,\mathrm{fr}(n\theta/2)) = \sin(n\pi\theta)$. According to Theorem 119, the sequence $\{\mathrm{fr}(n\theta/2)\}_{n=1}^{\infty}$ is dense in $[0, 1]$, hence the sequence $\{2\pi\,\mathrm{fr}(n\theta/2)\}_{n=1}^{\infty}$ is dense in $[0, 2\pi]$. The continuity of the function $\sin x$ shows the result.
13.86 Prove that the set $\{(\cos n\theta, \sin n\theta) : n \in \mathbb{N}\}$ is dense in the unit circle $\mathbb{T}$ (in $\mathbb{C}$) endowed with the Euclidean metric, whenever $\theta$ is an irrational multiple of $\pi$.
Hint. Put $\theta = \alpha\pi$, for a fixed irrational number $\alpha$. Exercise 13.85 shows that $\{2\pi\,\mathrm{fr}(n\alpha/2)\}_{n=1}^{\infty}$ is dense in $[0, 2\pi]$. The mapping $x \mapsto \exp(ix)$ maps continuously $[0, 2\pi]$ onto the unit circle $\mathbb{T}$ in $\mathbb{C}$ (see Sect. 12.5), hence the set $\{\exp(i 2\pi\,\mathrm{fr}(n\alpha/2))\ (= \exp(i\pi n\alpha) = \exp(in\theta) = (\cos n\theta, \sin n\theta)) : n \in \mathbb{N}\}$ is dense in $\mathbb{T}$.
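The density statement of Exercise 13.85 can be explored numerically. The following is only a sketch under assumptions that are not part of the text: the choice $\theta = \sqrt{2}$, the tolerance, the search bound, and the helper name `first_index_near` are all illustrative.

```python
import math

def first_index_near(target, theta, tol, max_n=100_000):
    """Return the smallest n <= max_n with |sin(n*pi*theta) - target| < tol, or None."""
    for n in range(1, max_n + 1):
        if abs(math.sin(n * math.pi * theta) - target) < tol:
            return n
    return None

theta = math.sqrt(2)  # an irrational number chosen only for illustration
targets = (-1.0, -0.5, 0.0, 0.5, 1.0)
hits = {t: first_index_near(t, theta, 1e-3) for t in targets}
for t, n in hits.items():
    print(f"target {t:+.1f}: first index n = {n}")
```

A finite search like this proves nothing, of course; it merely illustrates that terms of the sequence come close to each sampled point of $[-1, 1]$.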
13.2.2 Sequences
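13.87 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers such that $a_n \to \ell$, and let $k \in \mathbb{N}$. Show that $\lim_{n\to\infty} (a_n + \dots + a_{n+k}) = (k+1)\ell$.
Hint. Standard. Note that the sum has $k + 1$ terms.
13.88 Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be two sequences of real numbers such that $a_n \to \ell$ and $b_n \to \ell$. Show that $\lim_{n\to\infty} \max\{a_n, b_n\} = \ell$.
Hint. Standard.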
13.89 Compute the following limits.
(i) $\lim_{n\to\infty} (\sqrt{n+1} - \sqrt{n})$.
(ii) $\lim_{n\to\infty} (\sqrt{n^2+1} - n)$.
(iii) $\lim_{n\to\infty} (\sqrt{n^2+n} - n)$.
(iv) $\lim_{n\to\infty} \frac{n^2}{2^n}$.
(v) $\lim_{n\to\infty} \frac{n^2+n+1}{n^2+1}$.
Hint. (i) Multiply and divide by $\sqrt{n+1} + \sqrt{n}$ to get that the limit is 0. (ii) Multiply and divide by $\sqrt{n^2+1} + n$ to obtain that the limit is 0. (iii) Multiply and divide by $\sqrt{n^2+n} + n$. Then divide numerator and denominator by $n$ to get that the limit is $1/2$. (iv) Use the finite binomial expansion (see (iii) in Exercise 13.10) to show that $(1+1)^n = \binom{n}{0} + \binom{n}{1} + \dots + \binom{n}{n}$. This gives, for $n \ge 3$, $n^2/2^n \le n^2/\binom{n}{3} = 6n^2/(n(n-1)(n-2)) \to 0$ as $n \to \infty$. (v) Divide numerator and denominator by $n^2$ to get that the limit is 1.
13.90 Use the finite binomial expansion (see (iii) in Exercise 13.10) to show that $\sqrt[n]{n} \to 1$ when $n \to \infty$.
Hint. Write $\sqrt[n]{n} = 1 + \varepsilon_n$. We show that $\varepsilon_n \to 0$. By the finite binomial expansion, using the second-order term in the expansion, $n = (1 + \varepsilon_n)^n \ge \binom{n}{2}\varepsilon_n^2$. Thus $\varepsilon_n^2 \le n/\binom{n}{2} = 2/(n-1) \to 0$. (For another approach to the computation of the limit, see
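Lemma 178.)
13.91 Show in an elementary way that $\lim_{n\to\infty} \frac{n!}{n^n} = 0$.
Hint. Write $\frac{n!}{n^n} = \frac{n(n-1)\cdots 2\cdot 1}{n\cdot n\cdots n} \le \frac{n\cdot n\cdots n\cdot 1}{n\cdot n\cdots n} = \frac{1}{n}$, for each $n \in \mathbb{N}$.
13.92 Show that $\lim_{n\to\infty} \frac{\sqrt[n]{n!}}{n} = \frac{1}{e}$.
Hint. We can use Stirling's formula (from the Scottish mathematician J. Stirling) (see, e.g., [Stromb81], p. 253):
$$n! = \sqrt{2\pi n}\, n^n e^{-n+\lambda_n}, \quad \frac{1}{12n+1} < \lambda_n < \frac{1}{12n}, \quad n \in \mathbb{N}. \tag{13.9}$$
Alternatively, use Definition 217 and Lemma 178 to get
$$\lim_{n\to\infty} \frac{\sqrt[n]{n!}}{n} = \lim_{n\to\infty} \sqrt[n]{\frac{n!}{n^n}} = \lim_{n\to\infty} \frac{(n+1)!}{(n+1)^{n+1}} \cdot \frac{n^n}{n!} = \lim_{n\to\infty} \frac{1}{\left(1 + \frac{1}{n}\right)^n} = \frac{1}{e}.$$
In Table 13.1, we list some values of $n!$ and their approximations from below and from above obtained from Stirling's formula (13.9).
13.93 If $r$ is a rational number, show that $r$ is the limit of the sequence of irrational numbers $\{r + \frac{\sqrt{2}}{n}\}_{n=1}^{\infty}$.
Hint. Standard.
13.94 Show by the definition that $\lim_{n\to\infty} \frac{n^2+1}{n^2+n} = 1$.
Hint. $\left|\frac{n^2+1}{n^2+n} - 1\right| = \frac{n-1}{n^2+n} < \frac{n}{n^2+n}$. Given $\varepsilon > 0$, the choice $n_0 \ge \frac{1}{\varepsilon}$ in the definition of the limit would do.
13.95 Show by the definition that $\lim_{n\to\infty} \frac{\sin n}{n} = 0$.
Hint. The function $|\sin(x)|$ is bounded (by 1). This proves the estimate $\left|\frac{\sin n}{n}\right| \le \frac{1}{n}$ for all $n \in \mathbb{N}$ (see Fig. 7.24 for a plot of the function $\sin(x)/x$).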
Table 13.1 Stirling formula (see Exercise 13.92 and formula (13.9))

 n    n!               taking λn := 1/(12n + 1)    taking λn := 1/(12n)
 1    1                0.995870161                 1.002274449
 2    2                1.997320405                 2.000652048
 3    6                5.996095879                 6.000599142
 4    24               23.99082154                 24.00102389
 5    120              119.969854                  120.0026371
 6    720              719.8722123                 720.0091873
 7    5040             5039.334743                 5040.040582
 8    40320            40315.8881                  40320.21779
 9    362880           362850.5534                 362881.3779
10    3628800          3628560.142                 3628810.051
11    39916800         39914609.49                 39916883.11
12    479001600        478979428.3                 479002368.5
13    6227020800       6226774418                  6227028660
14    87178291200      87175308851                 87178379323
15    1.30767×10^12    1.30764×10^12               1.30768×10^12
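The entries of Table 13.1 can be recomputed from formula (13.9). The sketch below assumes only the standard `math` module; the helper name `stirling_bounds` is illustrative and is not taken from the text.

```python
import math

def stirling_bounds(n):
    """Return (lower, upper) from formula (13.9) with lambda_n = 1/(12n+1) and 1/(12n)."""
    base = math.sqrt(2 * math.pi * n) * n**n * math.exp(-n)
    lower = base * math.exp(1 / (12 * n + 1))
    upper = base * math.exp(1 / (12 * n))
    return lower, upper

for n in range(1, 16):
    lower, upper = stirling_bounds(n)
    # Each row mirrors a row of Table 13.1: n, n!, lower bound, upper bound.
    print(n, math.factorial(n), f"{lower:.10g}", f"{upper:.10g}")
```

Floating-point arithmetic limits the number of trustworthy digits, so the printed values should be read as approximations of the two bounds rather than exact quantities.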
13.96 Prove that $\{\sin(n^2)\}_{n=1}^{\infty}$ is not a null sequence, i.e., it does not converge to 0.
Hint. Assume $\sin(n^2) \to 0$. Hence $\sin^2(n^2) \to 0$ and so $\cos^2(n^2) \to 1$. We have $\sin((n+1)^2) = \sin(n^2 + 2n + 1) = \sin(n^2)\cos(2n+1) + \sin(2n+1)\cos(n^2)$. Thus we get $\sin(2n+1) \to 0$. Write $\sin(2n+3) = \sin(2n+1)\cos 2 + \sin 2 \cos(2n+1)$ ($\to 0$). Therefore, $\cos(2n+1) \to 0$ and thus $1 = \sin^2(2n+1) + \cos^2(2n+1) \to 0$, a contradiction.
13.97 Consider the sequence $\{x_n\}_{n=1}^{\infty}$, where $x_n := \sin n$ for $n \in \mathbb{N}$.
(i) Prove that the sequence $\{x_n\}_{n=1}^{\infty}$ does not converge.
(ii) Find $\limsup_{n\to\infty} x_n$ and $\liminf_{n\to\infty} x_n$.
Hint. (i) If $\lim_{n\to\infty} \sin n = 0$, develop $\sin(n+1)$ to get that $\lim_{n\to\infty} \cos n = 0$. This contradicts $\sin^2 x + \cos^2 x = 1$. If $\lim_{n\to\infty} \sin n = \ell \neq 0$, develop $\sin 2n$ to see that then $\lim_{n\to\infty} \cos n = \frac{1}{2}$. Then develop $\cos 2n$ to get a contradiction. (ii) Observe that $\sin n = \sin(n\pi(1/\pi))$. Use now Exercise 13.85 to obtain that $\limsup_{n\to\infty} x_n = 1$ and that $\liminf_{n\to\infty} x_n = -1$. Note that this result implies (i).
13.98 Show that any sequence $\{a_n\}$ of real numbers has a monotone subsequence.
Hint. Assume first that the sequence is bounded. By the Bolzano–Weierstrass Theorem 147, it has a convergent subsequence. Denote this subsequence again by $\{a_n\}_{n=1}^{\infty}$, and put $\ell := \lim a_n$. If $\{a_n\}_{n=1}^{\infty}$ is eventually constant, we are done. If not, we may assume, without loss of generality, that $a_n > \ell$ for infinitely many $n$'s. Then, choose $a_{n_1} > \ell$. Find $a_{n_2} \in (\ell, a_{n_1})$ such that $n_2 > n_1$. Find $a_{n_3} \in (\ell, a_{n_2})$ such that $n_3 > n_2$. Keep going.
Assume now that the sequence is unbounded, say unbounded above. Define a subsequence $\{a_{n_k}\}$ of $\{a_n\}$ by induction: Start by choosing $n_1 = 1$. If $n_1, n_2, \dots, n_k$ in $\mathbb{N}$ have been already chosen, then find $n_{k+1} \in \mathbb{N}$ such that $n_{k+1} > n_k$ and $a_{n_{k+1}} > a_{n_k}$. The resulting sequence $\{a_{n_k}\}$ is a subsequence of $\{a_n\}$, and it is increasing. If the sequence $\{a_n\}$ is unbounded below, proceed analogously to obtain a decreasing subsequence.
13.99 Show that the set $S := \{x_n : n \in \mathbb{N}\} \cup \{x\}$ is closed in $\mathbb{R}$ if $\{x_n\}_{n=1}^{\infty}$ is a sequence in $\mathbb{R}$ that converges to $x \in \mathbb{R}$.
Hint. Fix $y \in \mathbb{R}$, $y \notin S$. Let $r := |y - x|$ ($> 0$) and $r_n := |y - x_n|$ ($> 0$) for $n \in \mathbb{N}$. Since $x_n \to x$, there exists $n_0 \in \mathbb{N}$ such that $r_n > r/2$ for all $n \ge n_0$. Put $\delta := \min\{r_1, r_2, \dots, r_{n_0}, r/2\}$. Clearly, $(y - \delta, y + \delta) \cap S = \emptyset$, hence $\mathbb{R} \setminus S$ is open.
13.100 Let $a_n > 0$ for all $n \in \mathbb{N}$, and $\lim a_n = 0$. Show that there is an unbounded sequence $\{b_n\}$ of positive numbers such that $\lim a_n b_n = 0$. This shows that there is no "slowest" convergence to zero.
Hint. Consider $b_n = 1/\sqrt{a_n}$.
13.101 Assume that $\lim a_n = a$. Show that $\lim \frac{a_1 + a_2 + \dots + a_n}{n} = a$.
Hint. Fix $\varepsilon > 0$. Let $N \in \mathbb{N}$ be so that $|a_n - a| < \varepsilon$ for $n > N$. Then, for $m > N$ we have
$$\frac{a_1 + \dots + a_m}{m} - a = \frac{(a_1 - a) + \dots + (a_N - a)}{m} + \frac{(a_{N+1} - a) + \dots + (a_m - a)}{m}.$$
The first fraction on the right-hand side goes to 0 as $m \to \infty$ due to the denominator; the second one is, in absolute value, less than or equal to $\varepsilon$.
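The statement of Exercise 13.101 on arithmetic means can be illustrated numerically. This is a minimal sketch; the test sequence $a_n = 2 + (-1)^n/n$ and the helper name `cesaro_means` are hypothetical choices made only for this example.

```python
def cesaro_means(terms):
    """Return the running averages (a_1 + ... + a_n) / n of the given terms."""
    means, total = [], 0.0
    for n, a in enumerate(terms, start=1):
        total += a
        means.append(total / n)
    return means

# Hypothetical test sequence a_n = 2 + (-1)^n / n, which converges to 2.
terms = [2 + (-1) ** n / n for n in range(1, 100_001)]
means = cesaro_means(terms)
print(means[9], means[999], means[-1])  # averages after 10, 1000 and 100000 terms
```

The averages approach 2 as well, in line with the exercise, although a numerical run is an illustration and not a proof.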
13.102 Prove that if $\{x_n\}$ is a bounded sequence in $\mathbb{R}$, then $\limsup x_n$ is the only real number $s$ that has the two following properties: (i) There is a subsequence of $\{x_n\}$ that converges to $s$, and (ii) $s$ is the greatest number with property (i). Analogously, $\liminf x_n$ is the only real number $l$ that has the two following properties: (i) There is a subsequence of $\{x_n\}$ that converges to $l$, and (ii) $l$ is the smallest number with property (i). Observe that in case that $\{x_n\}$ is unbounded above, there is a subsequence whose limit is $+\infty$. Analogously, if $\{x_n\}$ is unbounded below, there is a subsequence whose limit is $-\infty$. Note that it is possible, for a bounded above sequence $\{x_n\}_{n=1}^{\infty}$, that $\limsup_{n\to\infty} x_n = -\infty$. For example, consider the sequence $\{-1, -2, -3, \dots\}$.
Hint. Let $s := \limsup x_n$. There exists $N_1 \in \mathbb{N}$ such that $\sup\{x_n : n \ge N_1\} \in (s - 1, s + 1)$. Find $n_1 \ge N_1$ such that $x_{n_1} \in (s - 1, s + 1)$. There exists $N_2 \in \mathbb{N}$, $N_2 > n_1$, such that $\sup\{x_n : n \ge N_2\} \in (s - 1/2, s + 1/2)$. Find $n_2 \ge N_2$ ($> n_1$) such that $x_{n_2} \in (s - 1/2, s + 1/2)$. Continue in this way. The subsequence $\{x_{n_k}\}$ converges to $s$. Moreover, if a subsequence of $\{x_n\}$ converges to $s' > s$, for every $N \in \mathbb{N}$ we have $\sup\{x_n : n \ge N\} > (s + s')/2$, hence $\lim_N \sup\{x_n : n \ge N\} \ge (s + s')/2$, a contradiction. Hence, $s$ satisfies (i) and (ii).
Let $s \in \mathbb{R}$ satisfy (i) and (ii). Fix $\varepsilon > 0$. Since a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ converges to $s$, we have $\sup\{x_n : n \ge N\} > s - \varepsilon$ for all $N \in \mathbb{N}$. This shows that $\lim_N \sup\{x_n : n \ge N\} \ge s - \varepsilon$. The set $\{n \in \mathbb{N} : x_n \ge s + \varepsilon\}$ is finite (otherwise, a subsequence of $\{x_n\}$ would converge to an element not smaller than $s + \varepsilon$, see Theorem 147). This shows that, for some $N \in \mathbb{N}$, $\sup\{x_n : n \ge N\} \le s + \varepsilon$, hence $\lim_N \sup\{x_n : n \ge N\} \le s + \varepsilon$. Since $\varepsilon > 0$ was arbitrary, $\lim_N \sup\{x_n : n \ge N\} = s$.
The argument for $\liminf$ is similar.
13.103 Prove the following assertions concerning a sequence $\{x_n\}_{n=1}^{\infty}$ in $\mathbb{R}$:
(i) If $\limsup_{n\to\infty} x_n = s$, where $s$ is a finite number, then given $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $x_n < s + \varepsilon$ for every $n \ge N$, and there exists a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ such that $s - \varepsilon < x_{n_k}$ for every $k \in \mathbb{N}$.
(ii) If $\liminf_{n\to\infty} x_n = l$, where $l$ is a finite number, then given $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $x_n > l - \varepsilon$ for every $n \ge N$, and there exists a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ such that $x_{n_k} < l + \varepsilon$ for every $k \in \mathbb{N}$.
Hint. (i) From the definition, $s := \lim_{n\to\infty} s_n$, where $s_n := \sup\{x_k : k \ge n\}$ for all $n \in \mathbb{N}$. Find $N \in \mathbb{N}$ such that $s_n < s + \varepsilon$ for every $n \ge N$. In particular, $s_N < s + \varepsilon$, hence $x_k < s + \varepsilon$ for every $k \ge N$. By Exercise 13.102, there exists a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ that converges to $s$ (we do not need to assume that $\{x_n\}_{n=1}^{\infty}$ is bounded in this case). The conclusion follows. The proof of (ii) is similar.
13.104 Write $\mathbb{Q}$ as a sequence $\{r_n\}_{n=1}^{\infty}$. Show that every real number is a cluster point of the sequence $\{r_n\}_{n=1}^{\infty}$.
Hint. The definition of cluster point and Theorem 63.
13.105 Show that a bounded sequence in $\mathbb{R}$ converges if, and only if, all of its convergent subsequences converge to the same limit.
Hint. One implication is obvious. Assume now that a bounded sequence $\{x_n\}_{n=1}^{\infty}$ does not converge. By Theorem 147, there exists a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ that converges to some $x_0$. Since $\{x_n\}_{n=1}^{\infty}$ does not converge to $x_0$, there exists $\varepsilon > 0$ such that, for all $n \in \mathbb{N}$, we can find $m > n$ with $|x_m - x_0| \ge \varepsilon$. Proceeding recursively, we find a subsequence $\{x_{m_j}\}_{j=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ such that $|x_{m_j} - x_0| \ge \varepsilon$ for all $j \in \mathbb{N}$. If we apply again Theorem 147, this subsequence has a further subsequence that converges (to some point $y_0$ that satisfies, certainly, $|y_0 - x_0| \ge \varepsilon$), and we reach a contradiction.
13.106 Put $x_1 := 2$ and $x_{n+1} := \frac{1}{2}\left(x_n + \frac{1}{x_n}\right)$ for $n \ge 1$. Show that $x := \lim x_n$ exists and is equal to 1.
Hint. Show that $x_n \ge 1$ for all $n$ and that $\{x_n\}$ is decreasing. It follows from Theorem 135 that the sequence $\{x_n\}$ converges. Let $x$ be its limit. Then note that $x = \frac{1}{2}\left(x + \frac{1}{x}\right)$, which implies that $x^2 = 1$. Of course, the possibility $x = -1$ is ruled out.
13.107 Put $x_1 := 0$, $x_2 := 1$ and $x_{n+2} := \frac{1}{2}(x_n + x_{n+1})$ for all $n \in \mathbb{N}$. Show that $\lim x_n$ exists and is equal to $\frac{2}{3}$.
Hint. Observe that $|x_{n+1} - x_n| = 2^{-(n-1)}$ for all $n \in \mathbb{N}$. This shows that the sequence $\{x_n\}$ is Cauchy, hence convergent. Let $x$ be its limit. Note that $x_{2n+1} - x_{2n-1} = 2^{-2n+1}$ for all $n \in \mathbb{N}$. This shows that $x_{2n+1} = x_1 + 2^{-1} + 2^{-3} + \dots + 2^{-2n+1}$ for all $n \in \mathbb{N}$, hence $x = \sum_{n=1}^{\infty} 2^{-2n+1} = 2/3$.
13.108 Let $x_1 = 1$ and define $x_n$ iteratively by $2x_{n+1}^2 = 4x_n - 1$ (taking $x_{n+1} > 0$). Show that the sequence $\{x_n\}_{n=1}^{\infty}$ converges and find its limit.
Hint. Use monotonicity and boundedness; the limit is $1 + \frac{\sqrt{2}}{2}$.
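The limits claimed in Exercises 13.106 to 13.108 can be checked by simply iterating the recurrences. The sketch below is illustrative only: the helper `iterate` and the iteration counts are arbitrary, and for Exercise 13.108 it assumes that the recurrence reads $2x_{n+1}^2 = 4x_n - 1$ with $x_{n+1} > 0$, which is the reading consistent with the stated limit $1 + \sqrt{2}/2$.

```python
import math

def iterate(step, start, count):
    """Apply x -> step(x) `count` times starting from `start` and return the final value."""
    x = start
    for _ in range(count):
        x = step(x)
    return x

# Exercise 13.106: x_{n+1} = (x_n + 1/x_n) / 2 with x_1 = 2.
limit_106 = iterate(lambda x: 0.5 * (x + 1 / x), 2.0, 50)

# Exercise 13.107: x_{n+2} = (x_n + x_{n+1}) / 2 with x_1 = 0, x_2 = 1.
prev, curr = 0.0, 1.0
for _ in range(60):
    prev, curr = curr, 0.5 * (prev + curr)
limit_107 = curr

# Exercise 13.108, ASSUMING the recurrence 2 * x_{n+1}**2 = 4 * x_n - 1 with x_{n+1} > 0.
limit_108 = iterate(lambda x: math.sqrt((4 * x - 1) / 2), 1.0, 200)

print(limit_106, limit_107, limit_108)
```

The three printed values should be close to $1$, $2/3$ and $1 + \sqrt{2}/2$, respectively.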
13.109 Assume that $a_n > 0$ for all $n \in \mathbb{N}$, that $a_1 = 1$, and that $a_2 < 1$. Assume also that the sequence $\{a_n\}$ is decreasing, and that
$$a_{nk} \le a_n a_k, \text{ for all } n, k \in \mathbb{N}. \tag{13.10}$$
Show that there is $\varepsilon > 0$ such that $a_n \le \frac{1}{n^{\varepsilon}}$, for all $n \in \mathbb{N}$.
Hint. Given $n \in \mathbb{N}$, $n \ge 2$, choose $k \in \mathbb{N}$ so that $2^k \le n < 2^{k+1}$. We have $a_n \le a_{2^k} \le a_2^k$, where the second inequality follows from (13.10). Therefore, due to the monotonicity of the function $\ln x$, we get $\ln a_n \le k \ln a_2$. Thus, since $\ln a_2 < 0$ and $\ln n < (k+1)\ln 2 \le 2k \ln 2$, we have
$$\frac{\ln a_n}{\ln n} \le \frac{\ln a_2}{2\ln 2}.$$
Put
$$\varepsilon := -\frac{\ln a_2}{2\ln 2}.$$
Then, for $n \in \mathbb{N}$ we have $a_n \le n^{-\varepsilon}$ (the case $n = 1$ is trivial).
13.110 Let $\{x_n\}_{n=1}^{\infty}$ be a bounded-below sequence in $\mathbb{R}$ such that $x_{n+1} \le x_n + 2^{-n}$ for all $n \in \mathbb{N}$ (such sequences are called almost nonincreasing). Prove that $\{x_n\}_{n=1}^{\infty}$ converges.
Hint. Observe first that
$$x_m - x_n \le \sum_{k=n}^{m-1} 2^{-k} \text{ for every } m > n \ge 1. \tag{13.11}$$
Then note that $\{x_n\}_{n=1}^{\infty}$ is bounded. Indeed, $x_{n+1} - x_1 \le \sum_{k=1}^{n} 2^{-k} < 1$, hence $x_{n+1} \le x_1 + 1$ for all $n \in \mathbb{N}$, and the sequence is bounded below by assumption. Use Theorem 147 to ensure the existence of a convergent subsequence, say $\{x_{n_k}\}_{k=1}^{\infty}$, and let $a$ be its limit. Let $\{x_{m_k}\}_{k=1}^{\infty}$ be another convergent subsequence, and let $b$ be its limit. Due to (13.11), and the fact that the series $\sum_{k=0}^{\infty} 2^{-k}$ converges, we obtain $a \le b$. The same argument proves that $b \le a$, hence $a = b$. Since this happens for every convergent subsequence of $\{x_n\}_{n=1}^{\infty}$, the sequence itself must converge.
Arithmetic and Geometric Progressions
13.111 Prove by induction that $1 + 2 + 3 + \dots + n = \frac{1+n}{2}\,n$. Prove a similar formula for the sum of the first $n$ terms of a general arithmetic progression (this was proved in Sect. 2.2.2 by another method).
Hint. The result is obviously true for $n = 1$. Assume that it holds for some $n \in \mathbb{N}$. Then, if $S_n := 1 + 2 + \dots + n$,
$$S_{n+1} = S_n + (n+1) = \frac{1+n}{2}\,n + (n+1) = (1+n)\left(\frac{n}{2} + 1\right) = \frac{1 + (n+1)}{2}\,(n+1).$$
This proves the formula for $n + 1$. If $\{a_n\}_{n=1}^{\infty}$ is an arithmetic progression (say $a_{n+1} = a_n + r$ for all $n \in \mathbb{N}$), then $S_n := a_1 + a_2 + \dots + a_n = \frac{a_1 + a_n}{2}\,n$. The proof is similar.
13.112 Let $a_n = 1 + \frac{1}{2} + \dots + \frac{1}{n} - \ln n$, for $n \in \mathbb{N}$. Show that $\gamma := \lim_{n\to\infty} a_n$ exists and $\gamma \in (0, 1)$. The number $\gamma$ is called the Euler–Mascheroni constant (from L. Euler and the Italian mathematician L. Mascheroni).
Hint. Put $b_n := a_n - \frac{1}{n}$ for $n \in \mathbb{N}$. By the Mean Value Theorem 365,
$$\frac{1}{n+1} < \ln(n+1) - \ln n = \frac{1}{n + \theta_n} < \frac{1}{n}$$
for some $\theta_n \in (0, 1)$ and for all $n \in \mathbb{N}$. Therefore, for $n \in \mathbb{N}$,
$$a_{n+1} - a_n = \frac{1}{n+1} - (\ln(n+1) - \ln n) < 0,$$
$$b_{n+1} - b_n = \frac{1}{n} - (\ln(n+1) - \ln n) > 0.$$
It follows that $\{a_n\}$ is decreasing and $\{b_n\}$ is increasing. Moreover, $0 = b_1 < b_n < a_n < a_1 = 1$ for all $n \ge 2$, and $a_n - b_n = 1/n$ for all $n \in \mathbb{N}$. It follows from Theorem 135 that $\{a_n\}$ and $\{b_n\}$ converge to the same limit $\gamma$. Its approximate value is
γ = 0.5772156649015328606065120900824024310421593359399235988. . .
13.113 Calculate $\lim_{n\to\infty} \left(\frac{1}{n} + \frac{1}{n+1} + \dots + \frac{1}{2n}\right)$.
Hint. Using the notation in Exercise 13.112, we have
$$a_{2n} - a_{n-1} = \frac{1}{n} + \frac{1}{n+1} + \dots + \frac{1}{2n} - (\ln 2n - \ln(n-1)).$$
Since $\ln 2n - \ln(n-1) = \ln\frac{2n}{n-1} \to \ln 2$, and $a_{2n} - a_{n-1} \to 0$ (see Exercise 13.112), we get the result $\ln 2$.
13.2.3 Series
Series of Nonnegative Terms
13.114 Prove, by using the comparison test (Proposition 170), that every absolutely convergent series converges (Proposition 167).
Hint. Assume that $\sum_{n=1}^{\infty} x_n$ is absolutely convergent. Then, $\sum_{n=1}^{\infty} x_n^+$ and $\sum_{n=1}^{\infty} x_n^-$ converge, by Proposition 170. Since $\sum_{n=1}^{\infty} x_n = \sum_{n=1}^{\infty} x_n^+ - \sum_{n=1}^{\infty} x_n^-$, it follows that $\sum_{n=1}^{\infty} x_n$ converges, too.
13.115 Assume that $\sum a_n$ is convergent and $a_n \ge 0$ for all $n$. Show that $\sum a_n^2$ is convergent.
Hint. After some index, $a_n < 1$ and then $a_n^2 \le a_n$. Then use the comparison test (Proposition 170).
13.116 Let $\sum a_n$ be a convergent series of positive terms such that $a_{n+1} \le a_n$ for all $n$. Show that $\lim n a_n = 0$.
Hint. Given $\varepsilon > 0$ there is $N \in \mathbb{N}$ such that $\sum_{j=n}^{2n} a_j < \varepsilon$ for all $n \ge N$. Then $(n+1)a_{2n} \le \sum_{j=n}^{2n} a_j < \varepsilon$ for all $n \ge N$. Thus $(n+1)a_{2n} \to 0$. Since $a_{n+1} \le a_n$, we have also $(n+1)a_{2n+1} \to 0$. Hence, by an easy combination of these things, $n a_n \to 0$.
13.117 Prove by a telescopic argument$^2$ that the series $\sum_{n=2}^{\infty} \frac{1}{n^2 - 1}$ is convergent, find its sum, and then show that $\sum \frac{1}{n^2}$ is also convergent.
Hint. Write $\frac{1}{n^2 - 1} = \frac{1}{2}\left(\frac{1}{n-1} - \frac{1}{n+1}\right)$. Use a telescopic argument (do the evaluation of the partial sums) to show that the partial sums associated to the sequence $\left\{\frac{1}{n^2-1}\right\}$ are bounded. Precisely,
$$\sum_{n=2}^{\infty} \frac{1}{n^2 - 1} = \frac{1}{2}\sum_{n=2}^{\infty}\left(\frac{1}{n-1} - \frac{1}{n+1}\right) = \frac{1}{2}\left[\left(\frac{1}{1} - \frac{1}{3}\right) + \left(\frac{1}{2} - \frac{1}{4}\right) + \left(\frac{1}{3} - \frac{1}{5}\right) + \left(\frac{1}{4} - \frac{1}{6}\right) + \dots\right] = \frac{3}{4}.$$
Since $\frac{1}{n^2} \le \frac{1}{n^2 - 1}$, we get that the series $\sum \frac{1}{n^2}$ is convergent.
13.118 Find the sum $\frac{1}{1\cdot 3} + \frac{1}{3\cdot 5} + \dots$
Hint. Put $1/[(2n+1)(2n+3)] = (1/2)\,(1/(2n+1) - 1/(2n+3))$ and proceed by cancelling equal terms. The result is $\frac{1}{2}$.
13.119 Let $\sum a_n$ be a convergent series with positive terms. Show that there is a sequence $\{b_n\}$ of positive terms such that $\lim_n b_n = +\infty$ and $\sum a_n b_n$ is convergent. This shows that there is no "slowest" convergent series. Compare with Exercise 13.100.
Hint. Get indexes $j_k$ such that $\sum_{j=n}^{m} a_j < 2^{-k}$ whenever $m > n \ge j_k$. Define $b_j = k$ for $j = j_k + 1, \dots, j_{k+1}$. Use the fact that $\sum \frac{k}{2^k}$ is convergent (apply the ratio test, see Proposition 175) and the fact that if a subsequence of partial sums of a series with positive terms is convergent then the series is convergent (since the sequence of partial sums is increasing).
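The telescopic evaluation in Exercise 13.117 can be double-checked with exact rational arithmetic. This is a small sketch using Python's standard `fractions` module; the function names `partial_sum` and `telescoped` are illustrative.

```python
from fractions import Fraction

def partial_sum(N):
    """Exact partial sum of 1/(n^2 - 1) for n = 2, ..., N."""
    return sum(Fraction(1, n * n - 1) for n in range(2, N + 1))

def telescoped(N):
    """Closed form after cancellation: (1/2) * (1 + 1/2 - 1/N - 1/(N+1))."""
    return Fraction(1, 2) * (1 + Fraction(1, 2) - Fraction(1, N) - Fraction(1, N + 1))

for N in (2, 10, 1000):
    # The directly computed partial sum agrees with the telescoped closed form.
    assert partial_sum(N) == telescoped(N)
    print(N, float(partial_sum(N)))
```

Since the closed form equals $\frac{3}{4} - \frac{1}{2}\left(\frac{1}{N} + \frac{1}{N+1}\right)$, the partial sums increase to $3/4$, in agreement with the hint.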
13.120 Let
$$a_n = \begin{cases} 1/n, & \text{if } n = m^2, \\ 1/n^2, & \text{if } n \neq m^2. \end{cases}$$
Show that $\sum a_n$ is convergent.
Hint. $\sum (1/k^2) < \infty$ (see Proposition 174).
$^2$ By a "telescopic argument" we mean the following cancellation fact: $\sum_{n=2}^{N} (x_n - x_{n-1}) = (x_2 - x_1) + (x_3 - x_2) + \dots + (x_N - x_{N-1}) = x_N - x_1$. See Remark 678.
13.121 Decide on the convergence of the series $\sum_{n=1}^{\infty} \frac{n!}{n^n}$.
Hint. Use Exercise 13.92 and the root test (Proposition 177).
13.122 Show that the series
$$\left(\frac{1}{2}\right)^3 + \left(\frac{1\cdot 3}{2\cdot 4}\right)^3 + \left(\frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}\right)^3 + \dots$$
is convergent.
Hint. Use Raabe's test (Proposition 509).
13.123 Decide on the convergence of $\sum (\nu(n)/n^2)$ where $\nu(n)$ is the number of digits in $n$.
Hint.
$$\sum \frac{\nu(n)}{n^2} = 1\left(\frac{1}{1^2} + \dots + \frac{1}{9^2}\right) + 2\left(\frac{1}{10^2} + \dots + \frac{1}{99^2}\right) + \dots \le \frac{10}{1^2} + \frac{2\cdot 10^2}{10^2} + \frac{3\cdot 10^3}{10^4} + \dots + \frac{n\cdot 10^n}{10^{2n-2}} + \dots = 10 + 2 + \frac{3}{10} + \frac{4}{10^2} + \dots + \frac{n}{10^{n-2}} + \dots$$
and the last series converges (use, for example, the root test, i.e., Proposition 177).
Series of Arbitrary Terms
13.124 Assume that $a_n > 0$ for all $n \in \mathbb{N}$ and that $\lim a_n = 0$. Is the series $\sum (-1)^n a_n$ convergent? Observe that no assumption on monotonicity is made. Compare this with the Leibniz criterion (Corollary 183).
Hint. No in general. Consider $\sum (-1)^n \frac{2 + (-1)^n}{n}$. In order to see that this series diverges, assume the contrary. Subtract then the (convergent) alternating series $\sum (-1)^n \frac{2}{n}$ (use Corollary 183) to get the divergent harmonic series (2.16).
13.125 Assume that $\sum_{n=1}^{\infty} a_n^2$ and $\sum_{n=1}^{\infty} b_n^2$ are both convergent. Show that $\sum_{n=1}^{\infty} a_n b_n$ is absolutely convergent.
Hint. $|a_n b_n| \le \frac{1}{2}(a_n^2 + b_n^2)$.
13.126 Decide on the convergence of the series
$$\sum_{n=1}^{\infty} (-1)^{n+1} \frac{\sin n}{n}.$$
Hint. The series $\sum_{n=1}^{\infty} (-1)^{n+1} \sin n$ has bounded partial sums (see Sect. 9.2). Then use the Dirichlet–Jordan test (split the sum collecting even terms, on one side, and odd terms, on the other), and apply the Dirichlet test (Theorem 181). We can actually compute the sum of the series, although this needs the theory of Fourier series (see Exercise 13.527).
13.127 Decide on the convergence of the series
$$\sum_{n=1}^{\infty} (-1)^{n+1} \frac{\sin^2 n}{n}.$$
Hint. Use $\sin^2 n = \frac{1 - \cos 2n}{2}$ and Exercise 13.126.
13.128 Show that
$$\sum_{n=1}^{\infty} \frac{\sin n^2 \sin n}{n}$$
is convergent.
Hint. The series $\sum_{n=1}^{\infty} \sin n^2 \sin n$ has bounded partial sums: Indeed, write
$$\sin n^2 \sin n = \frac{1}{2}\left(\cos n(n-1) - \cos n(n+1)\right)$$
and use a telescopic argument (see Remark 678). Finally, use the Dirichlet test (Theorem 181).
13.129 Show that
$$\sum_{n=2}^{\infty} \frac{1}{\ln(n!)}$$
is divergent.
Hint. Note that $\ln(n!) \le \ln(n^n) = n \ln n$. The improper Riemann integral $\int_2^{\infty} \frac{1}{x \ln x}\,dx$ is divergent. To see this, note that an antiderivative of $\frac{1}{x \ln x}$ is $\ln(\ln x)$.
13.130 Show that the series
$$\sum_{n=2}^{\infty} \frac{\ln n}{n^2}$$
is convergent.
Hint. Since $\lim \frac{\ln n}{\sqrt{n}} = 0$ by L'Hôpital's rule (Theorem 376), we have that for large $n$,
$$\frac{\ln n}{n^2} \le \frac{\sqrt{n}}{n^2} = \frac{1}{n^{3/2}},$$
and the last is the general term of a convergent series (see Proposition 174).
13.131 Find the sum
$$\sum_{n=1}^{\infty} \frac{n}{2^n}.$$
Fig. 13.2 The Riemann sums in Exercise 13.135 for p = 2 and n = 5
Hint. Consider the function $f(x) = \sum_{n=1}^{\infty} n x^n$, for $x \in (-1, 1)$. We search the value $f\left(\frac{1}{2}\right)$. Write $f(x) = x \sum_{n=1}^{\infty} n x^{n-1}$. Denote $g(x) := \sum_{n=1}^{\infty} n x^{n-1}$. Then, for $x \in (-1, 1)$, we have
$$\int_0^x g = \sum_{n=1}^{\infty} x^n = \frac{x}{1 - x} + C$$
for some constant $C$ (in fact, $C = 0$). Thus $g(x) = \frac{1}{(1-x)^2}$. It follows that $g\left(\frac{1}{2}\right) = 4$, and so $f\left(\frac{1}{2}\right) = 2$.
13.132 Assume that $\sum a_n$ and $\sum b_n$ are series such that $\sum a_n$ is convergent and $\lim \frac{a_n}{b_n} = 1$. Does $\sum b_n$ converge? Compare with the comparison test (Proposition 170) and, more particularly, Corollary 172.
Hint. No in general. Consider $a_n = \frac{(-1)^n}{\sqrt{n}}$ and $b_n = \frac{(-1)^n}{\sqrt{n}} + \frac{1}{n}$.
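The value $\sum_{n=1}^{\infty} n/2^n = 2$ from Exercise 13.131 can also be confirmed through the partial sums. The sketch below uses exact fractions; the identity $2 - S_N = (N+2)/2^N$ checked in the loop is a standard closed form that is not stated in the text, and `partial_sum` is an illustrative name.

```python
from fractions import Fraction

def partial_sum(N):
    """Exact value of sum_{n=1}^{N} n / 2^n."""
    return sum(Fraction(n, 2**n) for n in range(1, N + 1))

# The gap to the claimed limit 2 is (N + 2) / 2^N, which tends to 0.
for N in (1, 5, 40):
    assert 2 - partial_sum(N) == Fraction(N + 2, 2**N)
    print(N, float(partial_sum(N)))
```

Because the gap $(N+2)/2^N$ tends to 0, the partial sums converge to 2, matching the value $f(1/2) = 2$ obtained in the hint.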
13.133 Let $\{a_n\}$ be a sequence of real numbers. Assume that, for every $p \in \mathbb{N}$,
$$\lim_{n\to\infty} (a_{n+1} + a_{n+2} + \dots + a_{n+p}) = 0.$$
Is then $\sum a_n$ convergent?
Hint. No in general. Consider the harmonic series and use Proposition 161.
13.134 Let $\sum a_n$ be a convergent series, $\{b_n\}$ be a bounded sequence. Is it true that $\sum a_n b_n$ is convergent? Show that the answer is no in general and yes for absolutely convergent series.
Hint. $a_n = b_n = \frac{(-1)^n}{\sqrt{n}}$ for all $n \in \mathbb{N}$. For the second part, use the Cauchy criterion (Proposition 165).
13.135 Compute $\lim \frac{1^p + 2^p + \dots + n^p}{n^{p+1}}$, $p > 0$.
Hint. Use the integral $\int_0^1 x^p\,dx$ and the partition $0 < \frac{1}{n} < \frac{2}{n} < \dots < 1$ to obtain $S_n := \frac{1^p + 2^p + \dots + n^p}{n^{p+1}}$ as a particular Riemann sum (see Fig. 13.2 for the case $p = 2$). Since the function $x^p$ is integrable in $[0, 1]$ we know that the sequence of Riemann sums $\{S_n\}_{n=1}^{\infty}$ converges to the integral $\int_0^1 x^p\,dx$. The result is $\frac{1}{p+1}$.
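The Riemann sums of Exercise 13.135 are easy to compute exactly for integer exponents. The following sketch uses the standard `fractions` module; restricting to integer $p$ and the name `riemann_sum` are assumptions made only for this illustration.

```python
from fractions import Fraction

def riemann_sum(p, n):
    """Right-endpoint Riemann sum of x^p on [0, 1] with n equal subintervals (integer p)."""
    return sum(Fraction(k, n) ** p for k in range(1, n + 1)) / n

# For p = 2 the sum has the closed form (n + 1)(2n + 1) / (6 n^2), which tends to 1/3.
for n in (5, 50, 500):
    assert riemann_sum(2, n) == Fraction((n + 1) * (2 * n + 1), 6 * n * n)
    print(n, float(riemann_sum(2, n)))
```

The case `riemann_sum(2, 5)` corresponds to the situation depicted in Fig. 13.2 ($p = 2$, $n = 5$).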
13.136 Compute $\lim \frac{1}{n}\left(\sin\frac{\pi}{n} + \sin\frac{2\pi}{n} + \dots + \sin\frac{(n-1)\pi}{n}\right)$.
Hint. Use $\int_0^{\pi} \sin x\,dx$ and the partition $0 < \frac{\pi}{n} < \frac{2\pi}{n} < \dots < \pi$. Proceed similarly to Exercise 13.135. The result is $\frac{2}{\pi}$.
13.137 Show that $\sum \frac{n+1}{n} a_n$ converges if $\sum a_n$ converges.
Hint. Use the Abel criterion (Theorem 182). Observe that the sequence $\left\{\frac{n+1}{n}\right\}_{n=1}^{\infty}$ is monotone. Note, too, that we could use the comparison test (Proposition 170) if the series $\sum a_n$ consists of nonnegative terms. Here, however, the result holds for an arbitrary series $\sum a_n$.
13.138 If a series is convergent but not unconditionally convergent, show that it can be rearranged to a series that is not convergent but has bounded partial sums.
Hint. The technique in the proof of Riemann Theorem 191.
13.139 Compute the square of the real number $0.110100010\dots$ (base 2), where 1's are at positions $2^k$ for $k = 0, 1, 2, \dots$
Hint. Put $a_0 = 0$, $a_n = 1/2^n$ for $n = 2^k$, $k = 0, 1, 2, \dots$, and $a_n = 0$ otherwise. Note that $x = \sum_{n=0}^{\infty} a_n$, hence the product $x \cdot x$ is written as $\sum_{n=0}^{\infty} c_n$, where
$$c_n = \sum_{k=0}^{n} a_k a_{n-k}, \text{ for all } n = 0, 1, 2, \dots \tag{13.12}$$
It is simple to prove that for $m = 0, 1, 2, \dots$, the number of nonzero summands in Eq. (13.12) for $c_m$ is, at most, 2. We have the following cases:
(i) If $m = 1$, then $c_m = 0$.
(ii) If $m$ is odd, $m \ge 3$, and $m = 2^0 + 2^q$ for $q = 1, 2, \dots$, then $c_m = 1/2^{m-1}$.
(iii) If $m$ is even, and $m = 2^p + 2^q$, we still have two cases:
(iiia) $p \neq q$: Then $c_m = 1/2^{m-1}$.
(iiib) $p = q$: Then $c_m = 1/2^m$.
For all other $m$, $c_m = 0$. Adding up (note that the contributions at the positions $2^j$, $j \ge 1$, appear twice and produce a carry), the result is
x^2 = 0.10101010101000101010001000000010101000100000001000. . . (base 2).
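The binary expansion in Exercise 13.139 can be computed with exact rational arithmetic, which avoids floating-point rounding. The sketch below truncates $x$ after the bit at position $2^6 = 64$; this truncation, and the helper names `binary_digits` and `x_truncated`, are choices made only for illustration. The neglected tail of $x$ is smaller than $2^{-127}$, so it does not affect the first 50 binary digits of $x^2$ shown here.

```python
from fractions import Fraction

def binary_digits(value, count):
    """Return the first `count` binary digits after the point of a Fraction in [0, 1)."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = int(value >= 1)
        digits.append(str(bit))
        value -= bit
    return "".join(digits)

def x_truncated(terms):
    """Sum of 2^(-2^k) for k = 0, ..., terms - 1 (ones at positions 1, 2, 4, 8, ...)."""
    return sum(Fraction(1, 2 ** (2 ** k)) for k in range(terms))

x = x_truncated(7)
print("x   = 0." + binary_digits(x, 50))
print("x^2 = 0." + binary_digits(x * x, 50))
```

The digits of $x^2$ have ones exactly at the positions $2^j - 1$ ($j \ge 1$) and $2^p + 2^q - 1$ ($1 \le p < q$), which is what the case analysis in the hint predicts once the carries are taken into account.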
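13.2.4 The Euler Number e
13.140 Prove that
(i) $\lim_{n\to\infty} \left(1 - \frac{1}{n}\right)^n = e^{-1}$.
(ii) $\lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^{n+p} = e$, where $p$ is a constant.
(iii) $\lim_{n\to\infty} \left(1 + \frac{\lambda}{n}\right)^n = e^{\lambda}$, where $\lambda \in \mathbb{R}$, $\lambda \neq 0$.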
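Hint. (i)
$$\left(1 - \frac{1}{n}\right)^n = \left(\frac{n-1}{n}\right)^n = \left(\frac{n}{n-1}\right)^{-n} = \left(1 + \frac{1}{n-1}\right)^{-n} = \left[\left(1 + \frac{1}{n-1}\right)^{-(n-1)}\right]^{\frac{n}{n-1}} \to e^{-1},$$
as $n \to \infty$.
(ii) $\left(1 + \frac{1}{n}\right)^{n+p} = \left[\left(1 + \frac{1}{n}\right)^n\right]^{\frac{n+p}{n}} \to e$, as $n \to \infty$.
(iii) $\left(1 + \frac{\lambda}{n}\right)^n = \left[\left(1 + \frac{1}{n/\lambda}\right)^{n/\lambda}\right]^{\lambda} \to e^{\lambda}$, as $n \to \infty$. Note that (i) is a particular case of this.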
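The three limits of Exercise 13.140 can be observed numerically for a large index. In the sketch below, the values $n = 10^6$, $p = 5$ and $\lambda = 2.5$ are arbitrary illustrative choices, and `compound` is a hypothetical helper name.

```python
import math

def compound(lam, n, shift=0):
    """Return (1 + lam/n) ** (n + shift)."""
    return (1 + lam / n) ** (n + shift)

n = 10**6
approx_inverse_e = compound(-1, n)       # case (i)
approx_e_shifted = compound(1, n, 5)     # case (ii) with the arbitrary choice p = 5
approx_exp_lambda = compound(2.5, n)     # case (iii) with the arbitrary choice lambda = 2.5

print(approx_inverse_e, math.exp(-1))
print(approx_e_shifted, math.e)
print(approx_exp_lambda, math.exp(2.5))
```

The approximations differ from the limits by terms of order $1/n$, so agreement to four or five digits is what one should expect here.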
13.3 Measure
13.3.1 The Lebesgue Outer Measure
13.141 Prove that in Definition 229 we may omit the word "open" regarding the sequences of bounded intervals used in (3.1). Similarly, the same concept is defined if the intervals considered are closed, or half-closed, etc.
Hint. It is enough to prove that given an arbitrary cover $\{J_n\}_{n=1}^{\infty}$ of $A$ by bounded intervals, we can always produce a cover $\{I_n\}_{n=1}^{\infty}$ of $A$ by open bounded intervals such that $\sum_{n=1}^{\infty} |I_n|$ and $\sum_{n=1}^{\infty} |J_n|$ differ by an arbitrarily small preassigned positive number $\varepsilon$. If, say, $J_n := [a_n, b_n]$, take $I_n := (a_n - \varepsilon/2^{n+1}, b_n + \varepsilon/2^{n+1})$. If $J_n = [a_n, b_n)$ or $J_n = (a_n, b_n]$, proceed similarly. Observe, too, that $|(a, b)| = |[a, b]| = |[a, b)| = |(a, b]|$ by Definition 35.
13.3.2 The Class of Lebesgue Measurable Sets and the Lebesgue Measure
13.142 (a) Show that every closed set in $[0, 1]$ that has Lebesgue measure 1 must be equal to $[0, 1]$. (b) Does there exist a nowhere dense set in $[0, 1]$ of Lebesgue measure 1? (c) Let $A$ and $B$ be sets in $[0, 1]$ each of Lebesgue measure 1. Does $A \cap B$ have measure 1?
Hint. (a) Let $F$ be a closed subset of $[0, 1]$ such that $\lambda(F) = 1$. If $F \neq [0, 1]$, then $F^c$ contains an open nonempty interval $J$. Thus, $1 = \lambda([0, 1]) \ge \lambda(F) + \lambda(J) > 1$, a contradiction. (b) No. If such $S$ existed, $\overline{S}$ would have measure 1, hence $\overline{S} = [0, 1]$ by the first part. (c) Yes, De Morgan formulas (1.1) and (1.2). Precisely, $\lambda(A \setminus B) = \lambda(B \setminus A) = 0$, hence $\lambda(A \cap B) = 1$.
13.143 If $E$ is a measurable subset of $\mathbb{R}$ and $0 < \alpha < \beta < \lambda(E)$, then there exists a compact set $A \subset E$ which is nowhere dense in $\mathbb{R}$ and such that $\alpha < \lambda(A) < \beta$. Prove this statement.
Hint. Without loss of generality, we may assume that $\lambda(E) < +\infty$. Choose a compact set $C \subset E \setminus \mathbb{Q}$ with $\lambda(C) > \beta$ (see Proposition 271). Cover $C$ by open sets $V_1, V_2, \dots, V_n$, each having measure less than $\beta - \alpha$. Let $A := C \setminus (V_1 \cup V_2 \cup \dots \cup V_m)$, where $m$ is the smallest integer for which this set has measure less than $\beta$.
13.144 If $E$ is a measurable subset of $\mathbb{R}$ and $0 < \alpha < \lambda(E)$, then there exists a compact set $C \subset E$ that is nowhere dense in $\mathbb{R}$ and such that $\lambda(C) = \alpha$. Prove this statement.
Hint. Use Exercise 13.143 to construct compact subsets of $E$, say $C_1 \supset C_2 \supset \dots$, such that $\alpha < \lambda(C_n) < \alpha + \frac{1}{n}$ for each $n \in \mathbb{N}$, and let $C$ be their intersection.
13.145 Given $\varepsilon > 0$, construct a compact set in $[0, 1] \setminus \mathbb{Q}$ with Lebesgue measure greater than $1 - \varepsilon$.
Hint. If $\{r_n : n \in \mathbb{N}\}$ denotes the set of all rational numbers in $[0, 1]$, consider the set $S = \bigcup_{n=1}^{\infty} (r_n - \delta_n, r_n + \delta_n)$ where $\{\delta_n\}_{n=1}^{\infty}$ satisfies $\delta_n > 0$ for all $n \in \mathbb{N}$, and $\sum_{n=1}^{\infty} \delta_n < \varepsilon/2$. Then consider the complement of $S$ in $[0, 1]$.
13.146 Let $A$ be the set of all points in $[0, 1]$ the decimal expansions of which do not contain the digit 7. Show that the Lebesgue measure of $A$ is zero.
Hint. Consider the ten intervals $\left\{\left[\frac{k}{10}, \frac{k+1}{10}\right) : k = 0, 1, 2, \dots, 9\right\}$. Throw away the eighth interval; in this way, you dispense with all decimals whose first term is 7. Note that the length of this interval is $\frac{1}{10}$. Next we divide each of the remaining nine intervals into ten intervals each of length $\frac{1}{10^2}$. From each of these divisions, we throw away the eighth interval; in this way, we remove all decimals whose second term is 7. Note that at this stage we dispense with lengths totaling $\frac{9}{10^2}$. We keep going. The total length we throw away is the sum of the series involved, namely $\sum_{k=1}^{\infty} \frac{9^{k-1}}{10^k} = 1$.
13.147 Let $Z$ be a subset of $\mathbb{R}$ with Lebesgue measure zero. Show that the set $\{x^2 : x \in Z\}$ also has Lebesgue measure zero.
Hint. Assume first that $Z$ is bounded. The function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) := x^2$ for all $x \in \mathbb{R}$ is absolutely continuous on every closed bounded interval (this follows, for example, from Propositions 444 and 445). Proposition 437 concludes the proof. If $Z$ is arbitrary, apply the former argument to $Z \cap [-n, n]$ for each $n \in \mathbb{N}$, and then note that $f(Z) = \bigcup_{n=1}^{\infty} f(Z \cap [-n, n])$.
13.148 If $|f|$ is a measurable real-valued function on $\mathbb{R}$, is $f$ necessarily measurable?
Hint. No. Consider the function taking the value 1 on a nonmeasurable set (see Lemma 283) and $-1$ on its complement.
13.149 There exists a measurable set $E \subset \mathbb{R}$ such that $0 < \lambda(E \cap I) < \lambda(I)$ for each nonempty open interval $I \subset \mathbb{R}$. Prove this statement. Compare with Exercise 13.150.
Hint. Let $\{I_n\}_{n=1}^{\infty}$ be all open intervals having rational endpoints. Put $C_0 := \emptyset$ and $r_0 := 1$. After compact nowhere dense sets $C_0, C_1, \dots, C_{n-1}$ and positive numbers $r_0, r_1, \dots, r_{n-1}$ have been constructed, let $r_n := \min\left\{r_0, \dots, r_{n-1}, \lambda\left(I_n \setminus \bigcup_{i=0}^{n-1} C_i\right)\right\}$ and use Exercise 13.143 to choose a compact nowhere dense $C_n \subset I_n \setminus \bigcup_{i=0}^{n-1} C_i$ with
0 < λ(Cn )
0, then x = 2x − x, and clearly 2x ∈ A, x ∈ A. If, on the contrary, x < 0, then x = −x − (−2x), and −x ∈ A, −2x ∈ A. Assume now that x ∈ Q. Find an irrational number y such that 0 < y < 1/2. If x ≥ 0, then x = (x + y) − y, and (x + y) ∈ A, y ∈ A. If, on the contrary, x < 0, put x = y − (−x + y), and note that y ∈ A and (−x + y) ∈ A. This shows the statement.
13.3.3 The Cantor Ternary Set
13.154 Construct a subset of $[0, 1]$ in the same manner as the Cantor set, except that at the $k$th stage each interval removed has length $\delta 3^{-k}$, $0 < \delta < 1$ (see Definition 277 and the construction in Sect. 3.1.5). Explain why the resulting set is perfect, has measure $1 - \delta$, and is nowhere dense.
Hint. Look at Sect. 3.1.5.
13.155 Show that there is a perfect nowhere dense subset of $\mathbb{R}$ consisting of irrational numbers.
Hint. Start the construction (see Sect. 3.1.5) of a Cantor-like set on a closed interval $J$ in $\mathbb{R}$ with irrational endpoints, and gather all the rational numbers in $J$ as $\{r_n : n \in \mathbb{N}\}$. Delete the "middle" interval defined in such a way that it has irrational ends and contains $r_1$. Keep going. Since $\{r_n : n \in \mathbb{N}\}$ is left aside this Cantor-like set, we are done.
13.156 Show that there is a set $F$ of first category in $[0, 1]$ such that $\lambda(F) = 1$.
Hint. Let $\{r_n\}$ be the sequence of all rational numbers in $[0, 1]$. For $\delta > 0$, let $\delta_n > 0$, $n \in \mathbb{N}$, be so that $\sum_n \delta_n < \frac{\delta}{2}$. Let $S_{\delta} := \bigcup_{n=1}^{\infty} (r_n - \delta_n, r_n + \delta_n)$. Let $M_{\delta} := [0, 1] \setminus S_{\delta}$. Since the measure of $S_{\delta}$ is $\le \delta$, we get that the measure of $M_{\delta}$ is greater than or equal to $1 - \delta$. Note that $M_{\delta}$ is nowhere dense in $[0, 1]$. Thus the set $M := \bigcup_{n=1}^{\infty} M_{1/n}$ is a first category set in $[0, 1]$ of measure one. Observe that its complement in $[0, 1]$ is thus a residual set in $[0, 1]$ of measure zero. Alternatively, for $n \in \mathbb{N}$, let $C_n$ be a Cantor-like set in $[0, 1]$ of measure greater than $1 - \frac{1}{n}$ (see Sect. 3.1.5 and Exercise 13.154). Consider $\bigcup_{n=1}^{\infty} C_n$.
13.157 If $C$ denotes the Cantor ternary set in $[0, 1]$, show that $C - C = [-1, +1]$.
Hint. Consider the set $(C \times C) \subset ([0, 1] \times [0, 1])$. The set $C \times C$ can be obtained by deleting "middle" squares similarly as in the process for obtaining $C$. Now, given $\alpha \in [-1, 1]$, check that the line $y = x + \alpha$ intersects at least one corner square left after deleting the first four "middle" squares. The same situation occurs in this corner square in the second step of the construction. By the nested sets principle (Theorem 69), this line has to intersect $C \times C$, see Fig. 13.3. If $c_2 = c_1 + \alpha$, then $\alpha = c_2 - c_1$.
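The measure $1 - \delta$ claimed in Exercise 13.154 comes from a geometric series, which the following sketch verifies exactly. It assumes, as in the standard Cantor construction, that $2^{k-1}$ intervals are removed at the $k$th stage; the value $\delta = 1/2$ and the name `removed_length` are illustrative.

```python
from fractions import Fraction

def removed_length(delta, stages):
    """Total length removed after `stages` steps: 2^(k-1) intervals of length delta * 3^(-k) at step k."""
    return sum(2 ** (k - 1) * delta / 3**k for k in range(1, stages + 1))

delta = Fraction(1, 2)  # arbitrary illustrative choice with 0 < delta < 1
for stages in (1, 5, 30):
    # Geometric sum: the removed length equals delta * (1 - (2/3)^stages).
    assert removed_length(delta, stages) == delta * (1 - Fraction(2, 3) ** stages)
    print(stages, float(1 - removed_length(delta, stages)))
```

As the number of stages grows, the removed length tends to $\delta$, so the remaining set has measure $1 - \delta$.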
Fig. 13.3 First steps in the construction of C × C (Exercise 13.157)
13.3.4 A Nonmeasurable Set
13.158 A set $B \subset \mathbb{R}$ is called a Bernstein set if neither $B$ nor $\mathbb{R} \setminus B$ contains any nonempty perfect set.
(i) In Exercise 13.62, we defined a Bernstein set as a subset $B$ of $\mathbb{R}$ such that for every nonempty perfect subset $P$ of $\mathbb{R}$, both $B \cap P$ and $P \setminus B$ have cardinality $c$. Prove that both concepts coincide.
(ii) Taking for granted that Bernstein sets exist, prove that no Bernstein set is Lebesgue measurable.
Hint. (i) Assume that $B$ is a Bernstein set according to Exercise 13.62. Then, it is clear that no nonempty perfect set can be contained in $B$ nor in $\mathbb{R} \setminus B$. Assume now that $B$ is a Bernstein set according to the definition here. Observe that if $P$ is a nonempty perfect set, and $N$ is a countable subset of $P$, then $P \setminus N$ is again perfect. This follows from the fact that for $p \in P$, any neighborhood of $p$ has uncountably many points in $P$ (see Corollary 591). As a consequence, if $P \cap B$ is countable, we reach a contradiction. The same happens if we assume that $P \setminus B$ is countable. (ii) Assume that $B$ is measurable. Without loss of generality we may assume, too, that $\lambda(B) > 0$. Since $B = \bigcup_{n=1}^{\infty} (B \cap [-n, n])$, there exists $n \in \mathbb{N}$ such that $\lambda(B \cap [-n, n]) > 0$. Use Proposition 271 to get a compact subset $K$ of $B \cap [-n, n]$ such that $\lambda(K) > 0$. Then, $K$ must be uncountable. Use now Exercise 13.389 to get a contradiction.
13.3.5 Sequences of Sets
13.159 Let $\{M_n\}_{n=1}^{\infty}$ be a sequence of measurable sets. Show that
(a) $\lambda(\liminf M_n) \le \liminf \lambda(M_n)$.
(b) If, moreover, $\lambda\left(\bigcup_{n=1}^{\infty} M_n\right) < \infty$, then $\lambda(\limsup M_n) \ge \limsup \lambda(M_n)$.
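Hint. (a) If $\liminf \lambda(M_n) = +\infty$, there is nothing to prove. Otherwise, note that the sequence $\left\{\bigcap_{k=n}^{\infty} M_k\right\}_{n=1}^{\infty}$ is increasing, hence $\lambda(\liminf M_n) = \lim_n \lambda\left(\bigcap_{k=n}^{\infty} M_k\right)$. Observe, too, that $\lambda\left(\bigcap_{k=n}^{\infty} M_k\right) \le \lambda(M_k)$ for all k ≥ n, hence $\lambda\left(\bigcap_{k=n}^{\infty} M_k\right) \le \liminf_k \lambda(M_k)$. This happens for all n ∈ N, hence $\lambda\left(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} M_k\right) \le \liminf_k \lambda(M_k)$, as claimed. The proof of (b) is similar.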
13.4 Functions
13.4.1 Functions on Real Numbers
Introduction
13.160 Let the map f from [0, 1] into [0, 1] be defined by
$$f(x) := \begin{cases} x, & \text{for } x \text{ an irrational number}, \\ 1 - x, & \text{for } x \text{ a rational number}. \end{cases}$$
Show that f is a one-to-one map from [0, 1] onto [0, 1].
Hint. Direct computation.
13.161 Let us say that a real-valued function f defined on an open interval I is increasing at a point a ∈ I if there is δ > 0 such that (a − δ, a + δ) ⊂ I and f(x) < f(a) < f(y) whenever a − δ < x < a < y < a + δ. Let f be increasing at each point of an open interval I ⊂ R. Show that f is increasing on I.
Hint. Pick two points a, b in I such that a < b. We need to show that f(a) < f(b). To this end let M := {x ∈ I : a < x ≤ b, f(a) < f(x)}. This set is nonempty since f is increasing at a. We want to show that b ∈ M. First, we show that sup M = b. If sup M < b, then, as f is increasing at sup M, there is δ > 0 such that (sup M − δ, sup M + δ) ⊂ (a, b) and f(x) < f(sup M) < f(y) for all sup M − δ < x < sup M < y < sup M + δ. From the definition of sup M, there is x ∈ M such that sup M − δ < x ≤ sup M. Since x ∈ M, we have f(a) < f(x), hence f(a) < f(y) for all y ∈ (sup M, sup M + δ). This contradicts the definition of sup M, so sup M = b. We will now show that b ∈ M. Since f is increasing at b, there is δ′ > 0 such that (b − δ′, b + δ′) ⊂ I and f(x) < f(b) < f(y) whenever b − δ′ < x < b < y < b + δ′. From the definition of sup M (= b), there is x ∈ M with x ∈ (b − δ′, b], so f(a) < f(x). If x = b, then we are done. If x ≠ b, then f(x) < f(b) and f(x) > f(a), giving together f(a) < f(b). Thus b ∈ M.
13.162 For the function $f(x) = \frac{x-1}{x+1}$ find its domain D(f), range R(f), and the inverse function in case it exists.
Hint. Clearly, D(f) = R \ {−1}, R(f) = R \ {1}. For finding $f^{-1}$, do a simple algebraic manipulation: Put y = (x − 1)/(x + 1). This shows, if y ≠ 1, that x = (1 + y)/(1 − y). The possibility of getting this formula for y ≠ 1 implies that f is one-to-one. Alternatively, if $(x_1 - 1)/(x_1 + 1) = (x_2 - 1)/(x_2 + 1)$ for some
$x_1, x_2 \in \mathbb{R}$, both different from −1, again a simple algebraic computation shows that $x_1 = x_2$. For the graph of f on [−10, 10], see Fig. 4.2.
The Limit of a Function
13.163 Assume that for a function f defined on an open interval I and for a number a in this interval, $L := \lim_{x \to a} f(x) > 0$. Show that this implies that on some neighborhood of a, except perhaps at a, the function f is strictly positive.
Hint. In the definition of limit, take ε := L/2. Find then δ > 0 such that |f(x) − L| < ε (= L/2) for x ∈ (a − δ, a + δ) ∩ I, x ≠ a. This shows that f(x) > L/2 for x ∈ (a − δ, a + δ) ∩ I, x ≠ a.
13.164 Show that
(i) $\lim_{x \to +\infty} (x^5 - x^2) = +\infty$.
(ii) $\lim_{x \to +\infty} \frac{x^5 + x^2}{x^5 + 2x^3} = 1$.
Hint. (i) $x^5 - x^2 = x^5(1 - x^{-3})$. (ii) $\lim_{x \to +\infty} \frac{x^5 + x^2}{x^5 + 2x^3} = \lim_{x \to +\infty} \frac{1 + x^{-3}}{1 + 2x^{-2}}$.
13.165 Show in detail that $\lim_{x \to 0} \sin(1/x)$ does not exist.
Hint. For f(x) := sin(1/x), x ≠ 0, examine the two sequences $\{f(1/(n\pi))\}_{n=1}^{\infty}$ and $\{f(2/((2n-1)\pi))\}_{n=1}^{\infty}$. See also Example 312.2, where it is shown that $\limsup_{x \to 0} f(x) \ne \liminf_{x \to 0} f(x)$, and refer to Proposition 313.
13.166 Show by using an ε-δ-argument that $\lim_{x \to 1} \frac{x+1}{x+2} = \frac{2}{3}$.
Hint. It is enough to restrict ourselves to ε ∈ (0, 1). Let δ ∈ (0, ε/6) and take 0 < |x| < δ, so that 1 + x is the point near 1. Compute |(1 + x + 1)/(1 + x + 2) − 2/3| and prove that it is less than ε.
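The choice of δ in the hint for Exercise 13.166 can be sanity-checked numerically. The sketch below is only an illustration, not a proof: it fixes δ = ε/7 (an arbitrary value inside (0, ε/6)), samples displacements h with 0 < |h| < δ, and records the largest value of |g(1 + h) − 2/3|, where g(t) = (t + 1)/(t + 2). The names `g` and `worst_gap` are hypothetical helpers introduced here. Algebraically, the gap equals |h| / (3|h + 3|), which is why it stays far below ε.

```python
def g(t):
    """The function from Exercise 13.166."""
    return (t + 1.0) / (t + 2.0)


def worst_gap(eps, samples=1000):
    """Largest sampled |g(1 + h) - 2/3| over 0 < |h| < delta.

    Assumption: delta = eps / 7, one admissible choice in (0, eps / 6).
    """
    delta = eps / 7.0
    worst = 0.0
    for i in range(1, samples + 1):
        h = delta * i / (samples + 1)  # strictly between 0 and delta
        for signed_h in (h, -h):
            worst = max(worst, abs(g(1.0 + signed_h) - 2.0 / 3.0))
    return delta, worst


for eps in (0.9, 0.1, 0.001):
    delta, worst = worst_gap(eps)
    print(f"eps={eps}, delta={delta:.6f}, worst sampled gap={worst:.3e}")
```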
13.167 Show by using an ε-δ-argument that $\lim_{x \to 0} x \sin\frac{1}{x} = 0$.
Hint. Given ε > 0, take δ = ε. If 0 < |x| < δ, then |x sin(1/x)| ≤ |x| < δ = ε.
13.168 Assume that a real-valued function f defined on a neighborhood of a point a ∈ R has the property that $f(a_n) \to f(a)$ whenever $\{a_n\}$ is a sequence converging to a such that $|a_n - a| < \frac{1}{2^n}$ for each n. Is f necessarily continuous at a?
Hint. Yes. Similar to Exercise 13.173.
13.169 Show that $\lim_{x \to 0} x\left[\frac{1}{x}\right] = 1$, where [.] stands for the integer part of the number.
Hint. $\frac{1}{x} - 1 \le \left[\frac{1}{x}\right] \le \frac{1}{x} + 1$ and the sandwich Lemma 386.
13.170 Show that in Definition 137 of lim sup and lim inf, we can replace $\{x \in D(f),\ 0 < |x - x_0| < 1/n\}$ by $\{x \in D(f),\ 0 < |x - x_0| < \delta_n\}$ for n ∈ N, where $\{\delta_n\}_{n=1}^{\infty}$ is a sequence of positive numbers such that $\delta_n \to 0$.
Hint. Standard.
13.171 Prove the following Levine theorem: If f is a real-valued function on [a, b], then the set of all points where the left limit exists and is finite but f is discontinuous is countable.
Fig. 13.4 The function in Exercise 13.178
Hint. ([Stromb81], p. 280) Call this set A, and put $A_n := \{x \in A : \omega(f, x) \ge \frac{1}{n}\}$, where ω(f, x) is the oscillation of f at x (see Definition 700). Note that each point of $A_n$ is the right endpoint of some open interval that is disjoint from $A_n$. Thus $A_n$ is countable.
Continuous Functions
13.172 Show in an elementary way that the function $f(x) = x^3 + 2x^2 + x + 1$ is continuous at x = 1.
Hint. 1. “Rough work”: We need to show that f(x) is close to f(1) (= 5) if x is close to 1. Observe that
$$|f(x) - 5| = |x^3 + 2x^2 + x + 1 - 5| = |x^3 - 1 + 2x^2 - 2 + x - 1| = |(x - 1)(x^2 + x + 1) + 2(x - 1)(x + 1) + x - 1| = |(x - 1)(x^2 + 3x + 4)| \qquad (13.13)$$
(see Fig. 13.4). (Note that this could have been obtained by checking that x = 1 is a zero of the function x → f(x) − 5.) Restrict yourself to points x ∈ R such that |x − 1| < 1. From (13.13), we get |f(x) − 5| ≤ 14|x − 1|. Fix ε > 0. It follows that, if |x| ≤ 2 and |x − 1| < ε/14, then |f(x) − 5| < ε.
2. “Precise work”: Given ε > 0, put $\delta = \min\{1, \frac{\varepsilon}{14}\}$. If |x − 1| < δ, then |f(x) − 5| < ε. This proves the continuity of f at the point x = 1.
13.173 Assume that a real-valued function f defined on a neighborhood of a point a ∈ R has the property that $f(a_n) \to f(a)$ whenever $\{a_n\}$ is a monotone sequence converging to a. Is f necessarily continuous at a?
Hint. Yes. Indeed, if not, then there is a sequence $\{b_n\}$ that converges to a and ε > 0 such that $|f(b_n) - f(a)| \ge \varepsilon$ for all n. Choose a monotone subsequence of $\{b_n\}$.
13.174 Construct a real-valued function on [0, 1] that has a limit at each point but is discontinuous at countably infinitely many points.
Hint. Put $f(x) = \frac{1}{n}$ if $x = 1 - \frac{1}{n+1}$ for some n ∈ N, and f(x) = 0 otherwise.
Another example is the Riemann function R (Definition 379). Indeed, R has a limit (0) at each point x ∈ (0, 1), and so it is discontinuous precisely at the points of Q ∩ (0, 1).
13.175 (a) Construct a function f on R that is continuous at all integers and discontinuous at all numbers that are not integers. (b) Construct a function f on R that is continuous at all numbers that are not integers and, at the same time, discontinuous at all integers.
Hint. (a) (sin πx) × D(x), where D is the Dirichlet function (see Definition 296). (b) Put f(x) = 0 if x is not an integer and f(x) = 1 if x is an integer.
13.176 Let $D = \{r_n : n \in \mathbb{N}\}$, where $\{r_n\}_{n=1}^{\infty}$ is a sequence of mutually distinct real numbers. Define the function f : R → R by $f(x) = \sum_{r_n < x} 1/2^n$, for x ∈ R, … if $x = r_n$ and y > x, then $f(y) - f(x) \ge 2^{-n}$. If x is not in $\{r_n : n \in \mathbb{N}\}$, then to get the continuity of f from the right at x, given ε > 0 find a nonempty finite subset S of $\{r_n : n \in \mathbb{N},\ r_n > x\}$ such that $\sum_{r_n > x} \frac{1}{2^n} - \sum_{r_n \in S} \frac{1}{2^n} < \varepsilon$. If x < y < min S, then f(y) − f(x) ≤ ε. The last statement here follows from the fact that if x, y ∈ R, x < y, then there is $r_k$ such that $x < r_k < y$ and thus $f(r_k) < f(y)$ by the first part of the proof.
13.177 Assume that f is a continuous function on [0, 1], and that it is strictly increasing on (0, 1). Is f strictly increasing on [0, 1]?
Hint. Yes. By continuity, f(0) ≤ f(x) for all x ∈ (0, 1). Assume that f(0) = f(x) for some x ∈ (0, 1). Then, for 0 < y < x we get (f(x) =) f(0) ≤ f(y) < f(x), a contradiction. The proof for x = 1 is similar.
Fig. 13.5 The function in Exercise 13.172
13.178 Construct a continuous function f on R such that (a) f(x) = 0 for all x ∈ R with |x| ≥ 1, and (b) f(0) = 2. (See also Exercise 13.223 for an infinitely differentiable version.)
Hint. Let
$$f(x) = \begin{cases} 2\cos\frac{\pi}{2}x, & \text{if } |x| \le 1, \\ 0, & \text{if } |x| \ge 1. \end{cases}$$
Show that f has the required properties by considering one-sided continuity.
13.179 Let f be a bounded (not necessarily continuous) real-valued function on R. For x ∈ R, put F(x) := sup{f(y) : y < x}. Show that F is continuous from the left at every point of R.
Hint. Let x ∈ R. Given ε > 0, by the definition of the supremum there is a point y < x such that f(y) > F(x) − ε. Let δ := x − y. Take now z ∈ (x − δ, x] = (y, x]. Then F(z) = sup{f(w) : w < z} ≥ f(y) > F(x) − ε. From the definition of F, we have for z ≤ x that F(z) ≤ F(x), as any point that is less than z is then less than x. Thus, for z ∈ (x − δ, x] we have F(x) − ε < F(z) ≤ F(x). This shows that F is continuous from the left at x.
13.180 Show that if f is a continuous real-valued function on R and B ⊂ R is Borel, then $f^{-1}(B)$ is Borel. Extend this result to the class of continuous functions between metric spaces.
Hint. The family F of all C ⊂ R for which $f^{-1}(C)$ is Borel is a σ-algebra containing every open set.
13.181 If f is a real-valued function defined on R, then the set of its points of continuity is a $G_\delta$-set (see Exercise 13.366 for an extension to metric spaces). On the other hand, given a $G_\delta$-set G in R, there is a function on R whose set of points of continuity is exactly G.
Hint. For the first part: Let $G_n := \{x \in \mathbb{R} : \text{there is a neighborhood } U \text{ of } x \text{ such that } |f(y) - f(z)| < \frac{1}{n} \text{ for all } y, z \in U\}$.
Then each $G_n$ is open and $\bigcap_{n=1}^{\infty} G_n$ is exactly the set of continuity points of f. Indeed, if $x \in \bigcap_{n=1}^{\infty} G_n$, then clearly x is a point of continuity of f. If $x_0$ is a point of continuity of f and n ∈ N is given, choose δ > 0 so that for each point x of $(x_0 - \delta, x_0 + \delta)$, $|f(x) - f(x_0)| < \frac{1}{2n}$. Then, if $x, y \in (x_0 - \delta, x_0 + \delta)$, we have $|f(x) - f(y)| \le |f(x) - f(x_0)| + |f(x_0) - f(y)| < \frac{1}{2n} + \frac{1}{2n} = \frac{1}{n}$, so $x_0 \in G_n$.
… ε > 0 and two sequences $\{x_n\}$ and $\{y_n\}$ in [a, b] are given such that $x_n - y_n \to 0$, first find δ > 0 for ε from the definition of uniform continuity. Then for this δ find $n_0 \in \mathbb{N}$ such that $|x_n - y_n| < \delta$ whenever $n > n_0$. Then for such n, $|f(x_n) - f(y_n)| < \varepsilon$. Thus $f(x_n) - f(y_n) \to 0$. If f is not uniformly continuous, find ε > 0 so that for each n, there are $x_n, y_n \in [a, b]$ with $|x_n - y_n| < \frac{1}{n}$ so that $|f(x_n) - f(y_n)| > \varepsilon$. This shows that the condition does not hold true.
13.190 Give an alternative proof of Theorem 344 by using Exercise 13.189.
Hint. Let f : [a, b] → R be a continuous function. If f is not uniformly continuous then, by Exercise 13.189, there exist two sequences $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$
in [a, b] such that $x_n - y_n \to 0$ and $f(x_n) - f(y_n) \not\to 0$. By passing if necessary to a subsequence, we may assume that both $\{x_n\}$ and $\{y_n\}$ converge. From the given condition, they have a common limit, say $x_0$. Thus, $f(x_n) \to f(x_0)$ and $f(y_n) \to f(x_0)$, a contradiction. Observe that this proof works for continuous functions defined on compact metric spaces as well.
13.191 Assume that f is uniformly continuous on [a, b] and also on [b, c]. Show that f is uniformly continuous on [a, c].
Hint. If a ≤ x ≤ b ≤ y ≤ c, then |f(y) − f(x)| ≤ |f(y) − f(b)| + |f(b) − f(x)|. Then, use Exercise 13.189.
13.192 Show that $\sqrt[n]{x} - \sqrt[n]{a} < \sqrt[n]{x - a}$ for n > 1 and x > a > 0.
Hint. We need to show that $\sqrt[n]{x} < \sqrt[n]{x - a} + \sqrt[n]{a}$. By raising to the nth power, this amounts to showing that $x < (\sqrt[n]{x - a} + \sqrt[n]{a})^n$. Note that, according to the finite binomial expansion (13.3), we have
$$(\sqrt[n]{x - a} + \sqrt[n]{a})^n = \binom{n}{0}(x - a) + \binom{n}{1}(x - a)^{\frac{n-1}{n}} a^{\frac{1}{n}} + \dots + \binom{n}{n-1}(x - a)^{\frac{1}{n}} a^{\frac{n-1}{n}} + \binom{n}{n} a$$
$$= (x - a) + \binom{n}{1}(x - a)^{\frac{n-1}{n}} a^{\frac{1}{n}} + \dots + \binom{n}{n-1}(x - a)^{\frac{1}{n}} a^{\frac{n-1}{n}} + a$$
$$= x + \binom{n}{1}(x - a)^{\frac{n-1}{n}} a^{\frac{1}{n}} + \dots + \binom{n}{n-1}(x - a)^{\frac{1}{n}} a^{\frac{n-1}{n}} > x,$$
as we wanted to show.
13.193 (i) Find the modulus of continuity of the function $f(x) := x^2$ on [0, 1]. (ii) Find an estimate of the modulus of continuity for the function $f(x) = \sqrt[3]{x}$ on [0, 1] and then on [1, ∞).
Hint. (i) The function f is (strictly) increasing on [0, 1] (see Fig. 4.15). It follows that
$$\delta(\varepsilon) = \sup\{f(x) - f(y) : x - \varepsilon \le y \le x,\ x, y \in [0, 1]\} = \sup\{x^2 - (x - \varepsilon)^2 : x, x - \varepsilon \in [0, 1]\} = \sup\{2\varepsilon x - \varepsilon^2 : x, x - \varepsilon \in [0, 1]\}$$
(see formula (4.10)). Obviously, the supremum is attained when x = 1, and this gives $\delta(\varepsilon) = 2\varepsilon - \varepsilon^2$.
(ii) If 0 < y < x, then $\sqrt[3]{x} - \sqrt[3]{y} \le \sqrt[3]{x - y}$ (see Exercise 13.192). Thus, an answer to the first question is $\sqrt[3]{\varepsilon}$. Due to the decreasing character of the derivative (see Fig. 13.6), the supremum in formula (4.10) is attained for x = ε and y = 0, so
Fig. 13.6 A fragment of the graph of the function $\sqrt[3]{x}$ (see Exercise 13.193)
$\delta(\varepsilon) = \sqrt[3]{\varepsilon}$. An answer to the second one is $\frac{1}{3}\varepsilon$, by using the Mean Value Theorem 365. Again due to the decreasing character of the derivative, the supremum in formula (4.10) is attained for x = 1 + ε and y = 1, giving $\delta(\varepsilon) = \sqrt[3]{1 + \varepsilon} - 1$.
13.194 Show that the function $f(x) = \frac{1}{x}$ is continuous but not uniformly continuous on (0, 1]. Show that this function is uniformly continuous on [c, +∞) for every c > 0.
Hint. The graph of f on the interval [−10, 0) ∪ (0, 10] is represented in Fig. 4.10.
1. Continuity. Fix a ∈ (0, 1]. We need to upper-estimate $\left|\frac{1}{x} - \frac{1}{a}\right|$ in terms of |x − a| for points x in (0, 1] close to a. Without loss of generality, we may restrict ourselves to points x ∈ (0, 1] so that |x − a| < a/2. For such points x, we have
$$\left|\frac{1}{x} - \frac{1}{a}\right| = \frac{|x - a|}{|ax|} \le \frac{2|x - a|}{a^2}.$$
Therefore, if we put $\delta := \min\left\{\frac{a}{2}, \frac{a^2 \varepsilon}{2}\right\}$, then $\left|\frac{1}{x} - \frac{1}{a}\right| \le \varepsilon$ provided |x − a| < δ.
2. To show that f is not uniformly continuous on (0, 1], we may use (ii) in Proposition 347: Put $x_n := \frac{1}{n}$, $y_n := \frac{2}{n}$, for n ∈ N. Then $x_n - y_n \to 0$ and (n/2 =) $f(x_n) - f(y_n) \not\to 0$. See Exercise 13.189.
3. Fix c > 0. We shall prove that f is uniformly continuous on [c, +∞). If x, y ∈ [c, +∞), then
$$\left|\frac{1}{x} - \frac{1}{y}\right| = \frac{|x - y|}{|xy|} \le \frac{|x - y|}{c^2}.$$
Therefore, if ε > 0 is given, it suffices to take $\delta = c^2 \varepsilon$ in the definition to see that the function is uniformly continuous on [c, +∞). Alternatively, use the Mean Value Theorem 365.
13.195 Show that the function defined for x ∈ R by f(x) = x sin x is not uniformly continuous on R.
Hint. Consider the points 2nπ and $2n\pi + \frac{1}{n}$, for n ∈ N. Use Exercise 13.189 together with Proposition 389.
13.196 Let the function f be defined on (0, 1) by $f(x) = x \sin\frac{1}{x}$. Show that f is uniformly continuous on (0, 1) by using the Heine–Cantor Theorem 344.
Hint. Define f(0) = 0, f(1) = sin 1, and show that the function f extended in this way is continuous on the closed interval [0, 1]. Thus, it is uniformly continuous there by
Fig. 13.7 A fragment of the graph of the function in Exercise 13.195
the Heine–Cantor Theorem 344. This function was considered in Example 4.5.8.4 (a partial plot of the function can be found there, see Fig. 4.45), where it was proved that f′(0) does not exist, and that f is not of bounded variation.
13.197 Let f be a real-valued uniformly continuous function on a bounded open interval (a, b). Show that f is bounded on (a, b).
Hint. For ε = 1 get δ from the uniform continuity of f. Let the points $a = a_1 < a_2 < a_3 < \dots < a_n = b$ be chosen so that the distance from $a_j$ to the next point $a_{j+1}$ is less than δ/2. Let $M = \max\{|f(a_2)|, \dots, |f(a_{n-1})|\}$. If x ∈ (a, b), there is j ∈ {2, 3, …, n − 1} such that $|x - a_j| < \delta$. Then $|f(x) - f(a_j)| < \varepsilon = 1$. Hence $|f(x)| < |f(a_j)| + 1 \le M + 1$.
13.198 Show that the function tan x is not uniformly continuous on $(0, \frac{\pi}{2})$.
Hint. Use Exercise 13.197. See Fig. 4.34.
13.199 Show that a real-valued function f defined on a subset D of R is uniformly continuous on D if and only if δ(ε) → 0 as ε → 0+, where δ is the modulus of continuity of f on D (see Definition 346).
Hint. Assume first that f is uniformly continuous on D. Then, given α > 0 we can find ε > 0 such that for every x, y ∈ D with |x − y| ≤ ε we have |f(x) − f(y)| ≤ α. It follows from the definition of δ that δ(ε) ≤ α. Since δ is increasing on [0, +∞) we obtain $\lim_{\varepsilon \to 0+} \delta(\varepsilon) = 0$. Conversely, assume that $\lim_{\varepsilon \to 0+} \delta(\varepsilon) = 0$. Then, given α > 0 we can find ε > 0 such that δ(ε) ≤ α. This shows that |f(x) − f(y)| ≤ α for all x, y ∈ D with |x − y| ≤ ε. Therefore, f is uniformly continuous on D.
13.200 Prove, by using Corollary 326 and Theorem 334, that if K is a compact subset of R, and f : K → R is a continuous and one-to-one mapping, then $f^{-1} : f(K) \to K$ is continuous. This gives an alternative proof of Proposition 337.
Hint. It is sufficient, in view of Corollary 326, to prove that $f(K_0)$ is closed in f(K) for every closed subset $K_0$ of K. Indeed, f(K) is compact due to Theorem 334. By Lemma 94, f(K) is closed in R. The set $K_0$ is closed in K, hence, by Lemma 93, $K_0$ is compact. Again Theorem 334 shows that $f(K_0)$ is compact in R, hence, by Lemma 94, it is closed in R. It is contained in f(K), hence it is closed in f(K).
13.201 Let f be a continuous function on a bounded open interval (a, b). Prove that f can be extended continuously to [a, b] if and only if f is uniformly continuous on (a, b).
Fig. 13.8 A schema of the assumption in the hint of Exercise 13.204
Hint. If f is uniformly continuous, then the limits of f at the endpoints exist and are finite by the Cauchy condition. Putting $f(a) := \lim_{x \to a+} f(x)$ and $f(b) := \lim_{x \to b-} f(x)$ defines f on [a, b], and clearly this extension is continuous. If f can be extended continuously to the compact set [a, b], the extension is uniformly continuous there, by the Heine–Cantor Theorem 344, and so is its restriction f : (a, b) → R.
13.202 Let f be a bounded monotone continuous function on a bounded interval (a, b). Prove that f is then uniformly continuous on (a, b).
Hint. The limit of f at each endpoint exists as a real number. Then, use Exercise 13.201.
13.203 Let f be a continuous real-valued function on [0, ∞) such that $\lim_{x \to \infty} f(x)$ exists as a real number. Show that f is uniformly continuous on [0, ∞).
Hint. Assume for simplicity $\lim_{x \to \infty} f(x) = 0$. Get K > 0 so that |f(x)| < ε for x ≥ K. Then get δ > 0 so that if x, y ≤ K are such that |x − y| < δ, then |f(x) − f(y)| < ε (note that f is uniformly continuous on [0, K]). Then, if x, y ≥ 0 and |x − y| < δ, it is not hard to show, using if needed the point K, that |f(x) − f(y)| ≤ 2ε.
The Intermediate Value Property
13.204 Prove that if f is a real-valued function on an open interval I ⊂ R, and f is continuous and one-to-one, then f is strictly increasing or strictly decreasing. Compare with Exercise 13.160.
Hint. This time, we think a sketch is enough for a proof. Assume that the conclusion is false. Then, we can find x < y in I such that f(x) > f(y), and u < v in I such that f(u) < f(v). For the function to be one-to-one, no horizontal line may meet the graph more than once. In Fig. 13.8, we locate alternative positions of u and v (the grey band is not allowed), all of them leading to a contradiction by using the intermediate value property of continuous functions.
13.205 Show that Proposition 651 does not hold for the interval (0, 1). (See Fig. 13.9b.) Compare with Theorem 652.
Hint. Use $g(x) := \sin\frac{\pi}{2}x$.
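The hint for Exercise 13.205 can be explored numerically. The sketch below is a grid check, not a proof: it evaluates g(x) = sin(πx/2) on interior points of (0, 1) and confirms that g(x) stays inside (0, 1) while g(x) − x stays positive, so no sampled point is a fixed point. The helper name `min_gap` is introduced here for illustration. The underlying reason is that the sine function is concave on [0, π/2], so the graph of g lies strictly above the chord y = x between the endpoints 0 and 1.

```python
import math


def g(x):
    """Candidate map from the hint of Exercise 13.205."""
    return math.sin(math.pi * x / 2.0)


def min_gap(samples=10_000):
    """Smallest value of g(x) - x over the interior grid points i / samples."""
    return min(g(i / samples) - i / samples for i in range(1, samples))


print("smallest sampled g(x) - x on (0, 1):", min_gap())
```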
Fig. 13.9 (a) f on [0, 1] has a fixed point, (b) g on (0, 1) has no fixed points (Exercise 13.205)
13.206 Let P be a polynomial of odd degree. Show that the intermediate value property (Theorem 339) implies that P has at least one real root, i.e., P(x) = 0 for some real number x.
Hint. Let $P(x) = a_n x^n + \dots + a_0$, where n is odd. Assume without loss of generality that $a_n = 1$. Then, $P(x) = x^n\left(1 + a_{n-1}\frac{1}{x} + \dots + a_0 \frac{1}{x^n}\right)$. Thus, $\lim_{x \to +\infty} P(x) = +\infty$ and $\lim_{x \to -\infty} P(x) = -\infty$. This shows that there exist a, b ∈ R such that a < b, P(a) < 0, and P(b) > 0. By the intermediate value property, there is x ∈ (a, b) such that P(x) = 0.
13.207 There exists a function ϕ defined on [0, 1] such that ϕ(J) = [0, 1] for every nondegenerate interval J ⊂ [0, 1]. Observe that this gives an example of a discontinuous function satisfying the intermediate value property.
Hint. (B. Knaster, K. Kuratowski) For x ∈ [0, 1], put $x = 0.a_1 a_2 a_3 \cdots$ (base 2) (if x has two such expansions, choose one with finitely many nonzero digits). For n ∈ N, put $p_x(n) := a_1 + a_2 + \dots + a_n$. Then define
$$\varphi(x) := \limsup_{n \to \infty} \frac{p_x(n)}{n}.$$
The discontinuity of the function ϕ at each point in [0, 1] is a straightforward consequence of its behavior on each open subinterval.
Differentiable Functions
13.208 Find directly (just by using the definition of derivative) a function f(x) = ax + b such that f′(1) = 2 and f(1) = 1. Verify that the graph of f is the tangent line to the graph of $g(x) := x^2$ at x = 1.
Hint. Note that (f(1 + h) − f(1))/h = a for all h ∈ R, h ≠ 0. This shows, then, that (2 =) f′(1) = a. Since 1 = f(1) = a + b, we get b = −1. Note, too, that g′(x) = 2x, hence g′(1) = 2, and that g(1) = 1. This shows the assertion. See Fig. 13.10.
13.209 Show directly that the function $f(x) = x^2 D(x)$, where D is the Dirichlet function (see Definition 296), is not differentiable at x = 1 (see also Example 358.3). Use either (i) Definition 355 or (ii) the product rule.
Hint. Assume that f is differentiable at x = 1.
Fig. 13.10 The function x 2 and its tangent at 1 (Exercise 13.208)
Fig. 13.11 The function $x^x$ on (0, 1]
(i) Let L be its differential at x = 1. Then, f(1 + h) = f(1) + L(h) + h·u(h) for all h ∈ R, where u(h) → 0 as h → 0. For h ∉ Q, we get 0 = L(h) + h·u(h), so L(1) = −u(h), and letting h → 0 we obtain L(1) = 0, hence L = 0. For h ∈ Q, we find now $f(1 + h) = (1 + h)^2 = f(1) + L(h) + h \cdot u(h) = 1 + h \cdot u(h)$. Thus, $2h + h^2 = h \cdot u(h)$, hence u(h) = 2 + h. This violates u(h) → 0 as h → 0.
(ii) The function $g(x) := 1/x^2$ defined on (0, +∞) is differentiable at x = 1. If f were differentiable at x = 1, then the function f(x)·g(x) would be differentiable at x = 1, and this is false, since f(x)·g(x) is the Dirichlet function D(x), which is not even continuous at x = 1.
13.210 Let f and g be two real-valued functions on an open interval I, and assume that f(x) > 0 for all x ∈ I. If f and g are both differentiable at a ∈ I, show that $f^g$ is differentiable at a and find its derivative.
Hint. For x ∈ I, we can compute $\ln(f(x)^{g(x)})$, getting g(x) ln f(x), due to the fact that f(x) > 0 for all x ∈ I. This shows that $f(x)^{g(x)} = \exp(g(x) \ln f(x))$. The function on the right-hand side is differentiable at a (by using repeatedly the chain rule and the derivative of a product), and its derivative at a is
$$\left(g'(a) \ln f(a) + g(a)\frac{f'(a)}{f(a)}\right) \exp(g(a) \ln f(a)) = \left(g'(a) \ln f(a) + g(a)\frac{f'(a)}{f(a)}\right) f(a)^{g(a)}.$$
13.211 Evaluate $\lim_{x \to 0+} x^x$.
Hint. It is $\lim_{x \to 0+} e^{x \ln x}$. By L’Hôspital’s rule (Theorem 376), $\lim_{x \to 0+} x \ln x = \lim_{x \to 0+} \frac{\ln x}{1/x} = 0$. Thus, the result is 1. See Fig. 13.11.
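The limit in Exercise 13.211 can be illustrated numerically. The following sketch, which is a plausibility check rather than a proof, tabulates x ln x and $x^x = e^{x \ln x}$ along the sequence $x = 10^{-k}$; the helper name `x_to_the_x` is an assumption introduced for this example.

```python
import math


def x_to_the_x(x):
    """Evaluate x**x for x > 0 through the identity x**x = exp(x * ln x)."""
    return math.exp(x * math.log(x))


for k in range(1, 9):
    x = 10.0 ** (-k)
    # x * ln x tends to 0 from below, so x**x tends to 1 from below.
    print(f"x=1e-{k}: x ln x = {x * math.log(x):.3e}, x^x = {x_to_the_x(x):.9f}")
```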
13.212 Calculate, for a > 0,
$$\lim_{x \to a} \frac{a^x - x^a}{x - a}.$$
Hint.
$$\lim_{x \to a} \frac{a^x - x^a}{x - a} = \lim_{x \to a} \frac{a^x - a^a + a^a - x^a}{x - a} = \lim_{x \to a} \frac{e^{x \ln a} - a^a}{x - a} - \lim_{x \to a} \frac{x^a - a^a}{x - a} = a^a(\ln a - 1),$$
by using L’Hôspital’s rule (Theorem 376).
13.213 Find a function that is continuous on [−1, 1] and differentiable exactly at 0.
Hint. The Takagi–van der Waerden function (Definition 481) multiplied by $x^2$.
13.214 Find a function that has a first derivative everywhere but a second derivative nowhere.
Hint. The antiderivative of the Takagi–van der Waerden function (Definition 481).
13.215 Find the equation of the tangent line through the point (2, 0) to the circle $x^2 + y^2 = 1$.
Hint. A direct calculation says that the equation of the tangent line at the point $(x_0, \sqrt{1 - x_0^2})$ is
$$y = \frac{-x_0 x}{\sqrt{1 - x_0^2}} + \frac{1}{\sqrt{1 - x_0^2}}.$$
From the fact that the tangent line goes through the point (2, 0), we get
$$\frac{-2x_0}{\sqrt{1 - x_0^2}} + \frac{1}{\sqrt{1 - x_0^2}} = 0.$$
Thus $x_0 = \frac{1}{2}$ and the points of tangency are $\left(\frac{1}{2}, \pm\frac{\sqrt{3}}{2}\right)$. One can also use the geometrical argument involving the so-called Thales circle. In fact, the point of tangency must lie on the circle $(x - 1)^2 + y^2 = 1$ (see Fig. 13.12 for a geometric explanation). Therefore, we solve
$$\begin{cases} x^2 + y^2 = 1, \\ (x - 1)^2 + y^2 = 1. \end{cases}$$
We get, by subtracting, $(x - 1)^2 - x^2 = 0$, which gives $x = \frac{1}{2}$.
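The computation in Exercise 13.215 can be verified with a few lines of arithmetic. The sketch below generalizes slightly, as an assumption made only for illustration, to an external point (p, 0) with p > 1: the tangent line to the unit circle at $(x_0, y_0)$ is $x_0 x + y_0 y = 1$, so passing through (p, 0) forces $x_0 = 1/p$. The function name `tangent_points` is hypothetical. The check also confirms the Thales-circle argument, namely that the radius to the point of tangency is perpendicular to the segment joining that point to (p, 0).

```python
import math


def tangent_points(p):
    """Points of tangency on x^2 + y^2 = 1 for the tangents drawn from (p, 0), p > 1."""
    x0 = 1.0 / p                      # from x0 * p + y0 * 0 = 1
    y0 = math.sqrt(1.0 - x0 * x0)     # the point lies on the unit circle
    return [(x0, y0), (x0, -y0)]


for x0, y0 in tangent_points(2.0):
    on_circle = x0 * x0 + y0 * y0                 # should equal 1
    line_at_p = x0 * 2.0 + y0 * 0.0               # tangent line evaluated at (2, 0)
    right_angle = x0 * (x0 - 2.0) + y0 * y0       # radius . (touch point - (2, 0))
    print((x0, y0), on_circle, line_at_p, right_angle)
```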
Fig. 13.12 The reason why Thales circle gives the right answer (Exercise 13.215)
Fig. 13.13 Three functions in Exercise 13.216
13.216 Consider the following functions: $f_1(x) = \sqrt[3]{x}$, $f_2(x) = \sqrt[3]{|x|}$ and
$$f_3(x) = \begin{cases} \sqrt[3]{x} + 1 & \text{for } x \ge 0, \\ \sqrt[3]{x} & \text{for } x < 0. \end{cases}$$
Show that $f_1$ and $f_2$ are continuous everywhere, $f_3$ is discontinuous at 0, $f_1'(0) = +\infty$, $f_3'(0) = +\infty$, $f_{2-}'(0) = -\infty$, $f_{2+}'(0) = +\infty$.
Hint. Standard (see Fig. 13.13).
13.217 Let the function f be defined by
$$f(x) := \begin{cases} 0 & \text{if } x \le 0, \\ x^2 & \text{otherwise}. \end{cases}$$
Show that f is a $C^1$-smooth function that is not twice differentiable at the origin.
Hint. One-sided derivatives.
13.218 Let the function f be defined by
$$f(x) := \begin{cases} (x^2 - 1)^4 & \text{if } |x| \le 1, \\ 0 & \text{otherwise}. \end{cases}$$
Show that f is a $C^2$-smooth bump function on R.
Hint. One-sided derivatives (see Fig. 13.14).
Fig. 13.14 The function in Exercise 13.218
Fig. 13.15 The function in Exercise 13.219
Fig. 13.16 The extension in Exercise 13.220
13.219 Let the function f be defined by
$$f(x) := \begin{cases} \exp\left(-(1 - x^2)^{-1}\right) & \text{if } |x| < 1, \\ 0 & \text{otherwise}. \end{cases}$$
Show that f is a $C^\infty$-smooth bump function on R. See also Exercise 13.223.
Hint. One-sided derivatives. See Fig. 13.15.
13.220 Let ε > 0 be given. Let f be the function defined on (−∞, −ε] ∪ [ε, ∞) by f(x) = |x|. Extend f to a differentiable function on R.
Hint. Define the function g on R by
$$g(x) = \begin{cases} \frac{1}{2\varepsilon}x^2 + \frac{\varepsilon}{2}, & \text{if } |x| \le \varepsilon, \\ |x|, & \text{if } |x| \ge \varepsilon. \end{cases}$$
See Fig. 13.16.
13.221 Let the function f be defined by
$$f(x) = \begin{cases} 0 & \text{if } x \le 0, \\ 1 & \text{if } x \ge 1. \end{cases}$$
Extend f to a twice differentiable function on R.
Fig. 13.17 The extension in Exercise 13.221
Fig. 13.18 A fragment of the graph of the function in Exercise 13.222
Hint. Try to use the polynomial $P(x) := a_1 x^5 + a_2 x^4 + a_3 x^3 + a_4 x^2 + a_5 x + a_6$. Then, we require P(0) = P′(0) = P″(0) = 0 and P(1) = 1, P′(1) = P″(1) = 0. We solve the linear equations and get $P(x) = 6x^5 - 15x^4 + 10x^3$. So the extended function is, for instance (see Fig. 13.17),
$$\tilde{f}(x) = \begin{cases} P(x) = 6x^5 - 15x^4 + 10x^3, & \text{if } 0 \le x \le 1, \\ f(x), & \text{otherwise}. \end{cases}$$
There is a stronger version of this in Exercise 13.224. In fact, there we will extend the function f to a $C^\infty$-differentiable function on R.
In Exercises 13.222–13.226, we shall establish that the class of real-valued infinitely differentiable functions defined on a given closed and bounded interval is quite large, in the sense of separating points, for example. This will turn out to be important for the theory of distributions, in particular for the theory of periodic distributions, developed in this text in Sect. 11.8.
13.222 Prove that there exists an infinitely differentiable real-valued function f on R such that f(x) = 0 for all x ≤ 0, f(x) > 0 for all x > 0, it is strictly increasing on [0, +∞), and $\lim_{x \to +\infty} f(x) = 1$.
Hint. Consider f(x) = 0 for all x ≤ 0 and f(x) = exp(−1/x) for x > 0. The only point where differentiability must be carefully checked is x = 0. The reader will find that $f_+^{(n)}(0) = 0$ for all n ∈ N.
13.223 Use Exercise 13.222 to prove that for every closed bounded interval [a, b] ⊂ R, there exists an infinitely differentiable real-valued function g such that g(x) = 0 for all x ∉ (a, b) and g(x) > 0 for all x ∈ (a, b). See also Exercise 13.219 for a different example of such a function.
Hint. Put g(x) := f(x − a) f(b − x), where f is the function in Exercise 13.222. See Fig. 13.19.
13.224 Prove that for every closed bounded interval [a, b] ⊂ R, there exists an infinitely differentiable real-valued function h such that h(x) = 0 for all x ≤ a, h(x) ∈ (0, 1) for all x ∈ (a, b), and h(x) = 1 for x ≥ b.
Hint. Put $h(x) := M^{-1} \int_{-\infty}^{x} g(t)\,dt$, where $M := \int_{-\infty}^{b} g(t)\,dt$, for g the function in Exercise 13.223. See Fig. 13.19.
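The construction in Exercises 13.222–13.224 can be sketched numerically as follows. This is an illustration under stated assumptions, not part of the exercises: the integral defining h is approximated with a midpoint rule (the parameter `steps` and the inner helper `integral` are introduced here), and the names `f`, `g`, `h` mirror the functions in the hints. Since g vanishes outside (a, b), the integral from −∞ reduces to an integral over [a, min(x, b)].

```python
import math


def f(x):
    """Exercise 13.222: 0 for x <= 0 and exp(-1/x) for x > 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def g(x, a, b):
    """Exercise 13.223: a smooth bump that is positive exactly on (a, b)."""
    return f(x - a) * f(b - x)


def h(x, a, b, steps=2000):
    """Exercise 13.224: midpoint-rule approximation of the normalized integral of g."""

    def integral(upper):
        if upper <= a:
            return 0.0
        upper = min(upper, b)  # g vanishes to the right of b
        width = (upper - a) / steps
        return width * sum(g(a + (i + 0.5) * width, a, b) for i in range(steps))

    return integral(x) / integral(b)


for x in (-1.0, 0.0, 0.25, 0.5, 0.75, 1.0, 2.0):
    print(x, h(x, 0.0, 1.0))
```

On [0, 1] the bump g is symmetric about 1/2, so the computed value of h at 1/2 should be close to 1/2, which gives a simple consistency check.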
Fig. 13.19 The functions g and h in Exercises 13.223 and 13.224
Fig. 13.20 The function ϕ in Exercise 13.225
13.225 Prove that for a < b < c < d in R, there exists an infinitely differentiable function ϕ such that ϕ(x) = 0 for x ∈ (−∞, a] ∪ [d, +∞), ϕ(x) ∈ (0, 1) for x ∈ (a, b) ∪ (c, d), and ϕ(x) = 1 for x ∈ [b, c].
Hint. Use the function h defined in Exercise 13.224. See Fig. 13.20. Observe that a similar function was constructed in Exercise 13.219. An extension of this result can be found in Exercise 13.226.
13.226 Let K be a nonempty compact subset of an open set G in R. Show that there is a $C^\infty$-differentiable function ϕ on R such that ϕ = 1 on K, ϕ = 0 on R \ G, and 0 ≤ ϕ ≤ 1 on R.
Hint. By using Exercise 13.223, for each a ∈ K let $\varphi_a$ be a $C^\infty$-differentiable function on R such that $0 \le \varphi_a(x)$ for all x ∈ R, $\varphi_a(a) > 1$ and $\varphi_a = 0$ on R \ G. For each a ∈ K, let $U_a := \{x : \varphi_a(x) > 1\}$. By the compactness of K, there exist $a_i$ in K, i = 1, 2, …, n, such that $K \subset \bigcup_{i=1}^{n} U_{a_i}$. Put $\tilde{\varphi} := \sum_{i=1}^{n} \varphi_{a_i}$. Then $\tilde{\varphi}(x) > 1$ for all x ∈ K and $\tilde{\varphi}(x) = 0$ for all x ∈ R \ G. Let ψ be a $C^\infty$-differentiable function on R such that ψ(0) = 0, 0 ≤ ψ(x) ≤ 1 for x ∈ R, and ψ(x) = 1 for x ≥ 1 (see Exercise 13.224). Then put $\varphi = \psi \circ \tilde{\varphi}$.
13.227 We say that a sequence of 2π-periodic continuous real-valued functions $\{g_n\}_{n=1}^{\infty}$ is an approximate identity if
(i) $g_n(x) \ge 0$ for all x ∈ R.
(ii) $\frac{1}{2\pi}\int_0^{2\pi} g_n(x)\,dx = 1$ for all n ∈ N.
(iii) For every 0 < δ < π, $\lim_{n \to \infty} \int_{\delta}^{2\pi - \delta} g_n(x)\,dx = 0$.
Define the convolution f ∗ g of two 2π-periodic continuous real-valued functions f, g on R as
$$(f * g)(x) = \frac{1}{2\pi}\int_0^{2\pi} f(x - t)\,g(t)\,dt, \quad \text{for } x \in \mathbb{R}. \qquad (13.14)$$
Note that Definition 1060 extends the previous definition to elements in the space of all periodic distributions. (a) Prove that the convolution is well-defined, and it is a continuous 2π-periodic function. (b) Prove that $(f * g)(x) = (2\pi)^{-1}\int_0^{2\pi} f(t)\,g(x - t)\,dt$. (c) Prove that if at least one of the functions f or g (say g) is infinitely differentiable, then the convolution f ∗ g is also infinitely differentiable (and $D^k(f * g) = f * (D^k g)$ for all k = 0, 1, 2, …, where $D^k$ denotes the kth derivative operator). (d) Prove that $f * g_n \to f$ uniformly on R if f is 2π-periodic and continuous and $\{g_n\}_{n=1}^{\infty}$ is an approximate identity. (e) Prove that there exists an approximate identity $\{\varphi_n\}_{n=1}^{\infty}$ consisting of infinitely differentiable functions, more precisely, of trigonometric polynomials. All together, this shows that every 2π-periodic continuous function is the uniform limit of a sequence of 2π-periodic infinitely differentiable functions.
Hint. (a) Given t ∈ R, put $T_t f(x) = f(x - t)$ for x ∈ R. Observe that $T_t(f * g) = (T_t f) * g = f * (T_t g)$. Note, too, that $\|T_t(f * g) - (f * g)\|_\infty \le \|f\|_\infty \|T_t g - g\|_\infty$, where $\|\cdot\|_\infty$ denotes the supremum on [0, 2π]. Use the Heine–Cantor Theorem 344 to conclude that f ∗ g is continuous.
(b) A simple change of variable in the integral, and the periodicity of the functions f and g.
(c) Put $Q_t(g)(x) = t^{-1}(g(x + t) - g(x))$ for t ≠ 0. Using the Mean Value Theorem 365, show that, if g is differentiable and it has a continuous derivative, then $\|Q_t g - Dg\|_\infty \to 0$ when t → 0. Then, by induction, show that if g is infinitely differentiable, then $\|D^k(Q_t g) - D^{k+1} g\|_\infty \to 0$ when t → 0, for every k = 0, 1, 2, … Prove that $Q_t(f * g) = f * Q_t g$. Then show that if g is differentiable and its derivative is continuous, then f ∗ g is also differentiable and D(f ∗ g) = f ∗ Dg. By induction, this proves (c).
To obtain (d), fix ε > 0. Find δ > 0 depending on ε, according to the uniform continuity of f on [0, 2π]. Then, for x ∈ R,
$$2\pi\,|(f * g_n)(x) - f(x)| = \left|\int_0^{2\pi} g_n(t) f(x - t)\,dt - \int_0^{2\pi} g_n(t) f(x)\,dt\right| = \left|\int_0^{2\pi} g_n(t)\big(f(x - t) - f(x)\big)\,dt\right| \le \int_0^{2\pi} g_n(t)\,|f(x - t) - f(x)|\,dt = \int_0^{\delta} + \int_{\delta}^{2\pi - \delta} + \int_{2\pi - \delta}^{2\pi}.$$
Now the first and third integrals are small due to the uniform continuity of f and the way δ was chosen, and the middle integral is small for n big enough, due to (iii) in the definition of an approximate identity. This is independent of x.
For (e), try $\varphi_n(x) := c_n\left(\frac{1 + \cos x}{2}\right)^n$ for n ∈ N, where $c_n$ has been chosen in order that (ii) in the definition of an approximate identity holds. The details are as follows: Fix δ ∈ (0, π). Then
$$\frac{1}{2\pi}\int_{\delta}^{2\pi - \delta} \varphi_n(x)\,dx = \frac{c_n}{2\pi}\int_{\delta}^{2\pi - \delta} \left(\frac{1 + \cos x}{2}\right)^n dx = \left(\int_0^{2\pi} \left(\frac{1 + \cos x}{2}\right)^n dx\right)^{-1} \int_{\delta}^{2\pi - \delta} \left(\frac{1 + \cos x}{2}\right)^n dx.$$
Note that 1 + cos x < r(1 + cos y) for x ∈ [δ, 2π − δ] and y ∈ [0, δ/2], for some r ∈ (0, 1). Therefore $\varphi_n(x) = c_n\left(\frac{1 + \cos x}{2}\right)^n < c_n r^n \left(\frac{1 + \cos y}{2}\right)^n$
13.4 Functions
689
Fig. 13.21 The first five functions of the approximate identity in Exercise 13.227 (d)
= r n ϕn (y), for x ∈ [δ, 2π − δ] and y ∈ [0, δ/2]. Integrating on y ∈ [0, (1/2)δ], we get 4 4 δ/2 ϕn (x) dy ≤ r n 0
δ/2
ϕn (y) dy ≤ 2π r n ,
0
i.e., ϕn (x) ≤ 4π δ −1 r n for x ∈ [δ, 2π − δ]. This shows that ϕn → 0 uniformly on [δ, 2π − δ]. 13.228 Give an alternative proof of the Weierstrass Approximation Theorem 490 by using Exercise 13.227. Hint. Observe, first, that it is enough to consider functions defined on [0, 2π ]. Prove that, if f is a continuous 2π-periodic real-valued function, and ϕ is a trigonometric polynomial, then f ∗ ϕ is again a trigonometric polynomial (to see this, apply a change of variable in the integral for the convolution, see (b) in Exercise 13.227). The existence of an approximate identity {ϕn }∞ n=1 consisting of trigonometric polynomials (see (e) in the same exercise) ensures then that f is the uniform limit of a sequence of trigonometric polynomials (see (d) in the same exercise). Alternatively, we can use Fejér Theorem 859. Finally, observe that a trigonometric polynomial can be uniformly approximated by a polynomial on [0, 2π ] (use its Taylor expansion). 13.229 Assume that f is a real-valued function on R such that at a certain point x ∈ R we have f (x − 1/n) − f (x) f (x + 1/n) − f (x) = lim =L lim n→∞ n→∞ 1/n 1/n and L is finite. Is f necessarily differentiable at x? Hint. No in general: The Dirichlet function (see Definition 296). 13.230 Let the function f be defined on R by & x 2 , if x ≤ x0 , f (x) = ax + b, if x > x0 . Choose a and b, if possible, so that f is differentiable at x0 . Hint. Use one-sided continuity and differentiability. See Fig. 13.22 for a particular example (x0 = 1).
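The one-sided conditions in the hint to Exercise 13.230 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the step size `h` are illustrative assumptions, and the coefficients come from matching the value and the slope of $x^2$ at $x_0$.

```python
def tangent_line_coefficients(x0):
    """Return (a, b) so that a*x + b matches x**2 in value and slope at x0."""
    a = 2.0 * x0          # slopes agree: a equals the derivative of x^2 at x0
    b = x0**2 - a * x0    # values agree: a*x0 + b = x0^2
    return a, b


def f(x, x0):
    """Piecewise function of Exercise 13.230 with the coefficients above."""
    a, b = tangent_line_coefficients(x0)
    return x * x if x <= x0 else a * x + b


def one_sided_quotients(x0, h=1e-6):
    """Left and right difference quotients of f at x0 (h is an assumed step)."""
    left = (f(x0, x0) - f(x0 - h, x0)) / h
    right = (f(x0 + h, x0) - f(x0, x0)) / h
    return left, right


if __name__ == "__main__":
    print(tangent_line_coefficients(1.0))
    print(one_sided_quotients(1.0))
```

Both quotients approach the same number as `h` shrinks, which is what differentiability at $x_0$ requires; any other choice of $a$ or $b$ breaks either continuity or the equality of the one-sided derivatives.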
Fig. 13.22 The function in Exercise 13.230 for x0 = 1
Fig. 13.23 The function in Exercise 13.232 and its asymptote at +∞
13.231 Let $f$ be a real-valued continuous function on an interval $(a, b)$. Fix $x_0 \in (a, b)$. Assume that $f$ is differentiable at each point $x$ of $(a, b)$ different from $x_0$, and that the limit
$$\lim_{x \to x_0} f'(x) = L$$
exists as a real number. Show that $f$ is then differentiable at $x_0$, and $f'(x_0) = L$. This shows, in particular, that the derivative cannot have jump discontinuities.
Hint. Study the proof of L'Hôpital's rule (Theorem 376).

13.232 Find an asymptote at $+\infty$ for the function $f(x) = x + \arctan x$. (An asymptote at $+\infty$ is a line $y(x) := kx + q$ such that $\lim_{x\to+\infty} |f(x) - y(x)| = 0$.)
Hint. Letting $x \to +\infty$, we have $x + \arctan x - kx - q \to 0 \implies 1 + \frac{\arctan x}{x} - k - \frac{q}{x} \to 0$. Thus $k = 1$. Therefore, $x + \arctan x - x - q \to 0$, so $q = \frac{\pi}{2}$. (See Fig. 13.23.)

13.233 Prove that there is a continuous function on $[0, 1]$ that is monotone or convex on no subinterval.
Hint. Use the Lebesgue differentiation theorem for monotone functions (Theorem 424) and Sect. 6.9.2.1.

13.234 Show that $\lim_{x\to 0} \dfrac{x^2 \sin\frac{1}{x}}{\sin x} = 0$. Note that L'Hôpital's rule (Theorem 376) cannot be used, due to the fact that (4.27) fails.
Hint. The function can be written $\dfrac{x}{\sin x}\, x \sin\dfrac{1}{x}$. Use then Proposition 389 and the fact that $\sin\frac{1}{x}$ is a bounded function. See Fig. 13.24 for the graph of the given function.
Fig. 13.24 The function in Exercise 13.234
Fig. 13.25 The function in Exercise 13.235
13.235 Assume that $\lim_{x\to\infty} f(x) = 0$ and that $f$ is differentiable. Does it follow that $\lim_{x\to\infty} f'(x) = 0$?
Hint. No in general: $\dfrac{\sin x^2}{x}$. See Fig. 13.25.

13.236 Assume that $\lim_{x\to\infty} f'(x) = 0$. Does it imply that $\lim_{x\to\infty} f(x)$ exists?
Hint. No in general: $\cos(\ln x)$.

13.237 (a) Does there exist a real-valued continuous map $f$ defined on $\mathbb{R}$ such that $f(\mathbb{Q}) \subset \mathbb{P}$ and $f(\mathbb{P}) \subset \mathbb{Q}$? (b) Does there exist a map from $\mathbb{Q}$ onto $\mathbb{P}$? (c) Does there exist a continuous map from $\mathbb{R}$ onto $\mathbb{P}$?
Hint. (a) No. Assume that such a function $f$ exists. Then, $f(\mathbb{R})$ $(= f(\mathbb{Q}) \cup f(\mathbb{P}))$ is countable, say $f(\mathbb{R}) := \{z_n : n \in \mathbb{N}\}$. Then $\mathbb{R} = \bigcup_{n=1}^{\infty} f^{-1}(z_n)$, and the Baire Category Theorem 111 implies that for some $n \in \mathbb{N}$, $f^{-1}(z_n)$ contains an interval $J$. However, $J$ contains both rational and irrational numbers, and so its image cannot be a single number $z_n$. (b) No (cardinality reasons). (c) No (connectedness).

13.238 Graph √ the following functions: √ 2 2 |x| + x , x 2 |x| , x 4 − x 2 , x 2|x|+1 , x 4x+1 , dist (x, Z), where dist (x, Z) is the distance +1 √ √ from set√ Z of the integer numbers, x − x 2 , sin2 (π x), | sin x|, 1 − √ x to the 2 x − x 2 , e−|x| , e |x| , e−x , ln1x .
Hint. Standard.

13.239 Find the family of all functions $y = y(x)$ on an interval $(a, b)$ that solves the first order linear ordinary differential equation problem
$$y'(x) + f(x)\,y(x) = g(x), \tag{13.15}$$
where $f$ and $g$ are continuous functions on $(a, b)$.
Hint. Find an antiderivative function $F$ (e.g., $F(x) := \int_a^x f(t)\,dt$) of $f$ on $(a, b)$. It exists because of Corollary 683. Now, multiply both sides of the Eq. (13.15) by $\exp(F(x))$, and observe that
$$\bigl(\exp(F(x))\,y(x)\bigr)' = F'(x)\exp(F(x))\,y(x) + \exp(F(x))\,y'(x) = \exp(F(x))\bigl(f(x)\,y(x) + y'(x)\bigr) = \exp(F(x))\,g(x).$$
It follows that $\exp(F(x))\,y(x) = \int_a^x \exp(F(s))\,g(s)\,ds + K$, where $K$ is an arbitrary constant. Finally, for $x \in (a, b)$,
$$y(x) = \exp(-F(x)) \left( \int_a^x \exp(F(s))\,g(s)\,ds + K \right),$$
where $K$ is an arbitrary constant.

13.240 Find a function $y = y(x)$ on a neighborhood $U$ of $1$ that does not contain $0$ such that
$$y'(x) = \frac{y(x)}{x} + x^2, \quad \text{for } x \in U, \text{ and } y(1) = 1.$$
Hint. Exercise 13.239 applied to this situation gives immediately $y(x) = x^3/2 + Kx$, where $K$ is an arbitrary constant. Due to the fact that $y(1) = 1$, we get $K = 1/2$, so finally $y(x) = (x + x^3)/2$ for $x \in U$.

13.241 Find all differentiable real-valued functions $f$ on $\mathbb{R}$ such that $f' = f$ on $\mathbb{R}$.
Hint. Put $f = y$ and solve $y' = y$. This gives $\frac{y'}{y} = 1$, so $\ln|y| = x + K$ for an arbitrary constant $K$. It follows that $|y| = e^{x+K} = e^K e^x$, so $y = Ce^x$ for $C$ an arbitrary real constant.
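The solution stated in the hint to Exercise 13.240 can be verified with exact rational arithmetic. This is a small sketch under stated assumptions: the derivative `y_prime` is written out by hand from $y(x) = (x + x^3)/2$, and the helper names are illustrative, not from the text.

```python
from fractions import Fraction


def y(x):
    """Candidate solution y(x) = (x + x^3) / 2 from the hint."""
    return (x + x**3) / 2


def y_prime(x):
    """Derivative of the candidate, computed by hand: (1 + 3x^2) / 2."""
    return (1 + 3 * x**2) / 2


def residual(x):
    """y'(x) - (y(x)/x + x^2); identically zero if the equation holds."""
    return y_prime(x) - (y(x) / x + x**2)


if __name__ == "__main__":
    for x in (Fraction(1), Fraction(3, 2), Fraction(7, 5)):
        print(x, residual(x))
    print("y(1) =", y(Fraction(1)))
```

Using `Fraction` avoids rounding, so a zero residual at several sample points is an exact check at those points (though of course not a proof for all $x$).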
13.4.2 Optimization and the Mean Value Theorem
13.242 Find points where the following function has local maxima or local minima: $f(x) = x^3 - 6x^2 + 9x - 4$.
Hint. See Fig. 13.26. It is easy to see that $f'(x) = 0$ if and only if $x = 1$ or $x = 3$. Note that $f''(1) < 0$, $f''(3) > 0$. Thus, according to Theorem 373, the function $f$ has a local maximum at $x = 1$ and a local minimum at $x = 3$. By Theorem 362, there is no other local extremum.

13.243 Consider the function (see Fig. 13.27)
$$f(x) = \frac{1}{3}x^3 - \frac{5}{2}x^2 + 6x + 1$$
defined on $\mathbb{R}$.
Fig. 13.26 The function f in Exercise 13.242
Fig. 13.27 The function $\frac{1}{3}x^3 - \frac{5}{2}x^2 + 6x + 1$ on the interval $[1, 4]$ (Exercise 13.243)
(i) Find its critical points and decide on their character (local maxima, minima, or none of the above).
(ii) Consider $f$ defined on $[1, 4]$. Find its maximum and minimum on this interval.
Hint. (i) We have $f'(x) = x^2 - 5x + 6$ for all $x$, so $f'(x) = 0$ precisely when $x = 2$ or $x = 3$. Those are then its critical points. Observe that $f''(x) = 2x - 5$, hence $f''(2) = -1$ and $f''(3) = 1$. According to Theorem 373, $x = 2$ is a local maximum and $x = 3$ a local minimum. There are no other local extrema, as it follows from Theorem 362.
(ii) The function $f$ is certainly continuous on $[1, 4]$. According to Corollary 335, it attains its supremum and infimum on this interval. In view of Remark 363.2, for extrema we must check the set $\{1, 2, 3, 4\}$. Observe that $f(1) = 29/6 \approx 4.83$, $f(2) = 17/3 \approx 5.67$, $f(3) = 5.5$, $f(4) = 19/3 \approx 6.33$. Thus, $x = 1$ gives a minimum and $x = 4$ a maximum.
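The comparison of values in part (ii) of Exercise 13.243 can be reproduced exactly. The sketch below is only an illustration (the variable names are assumptions); it evaluates $f$ at the two endpoints and the two critical points using rational arithmetic.

```python
from fractions import Fraction


def f(x):
    """f(x) = x^3/3 - (5/2) x^2 + 6x + 1, evaluated exactly."""
    x = Fraction(x)
    return x**3 / 3 - Fraction(5, 2) * x**2 + 6 * x + 1


# Endpoints of [1, 4] together with the critical points 2 and 3.
candidates = [1, 2, 3, 4]
values = {c: f(c) for c in candidates}
argmin = min(values, key=values.get)
argmax = max(values, key=values.get)

if __name__ == "__main__":
    for c in candidates:
        print(c, values[c], float(values[c]))
    print("minimum at", argmin, "maximum at", argmax)
```

Note that the local maximum at $x = 2$ is not the global maximum on $[1, 4]$: the endpoint $x = 4$ gives a larger value, which is why the endpoints must be included among the candidates.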
13.244 Find the maximum and the minimum of the function
$$f(x) = \frac{x}{x^2 + 1}$$
on the interval $[0, 2]$.
Hint. The function is certainly continuous on $[0, 2]$. By Corollary 335, $f$ attains its extrema on $[0, 2]$. If this happens at an interior point, $f$ must have at this point a local extremum. Thus, according to Theorem 362, the derivative at such point must be $0$. Observe that $f'(x) = 0$ if and only if $x = \pm 1$. So, on $(0, 2)$ the only candidate
Fig. 13.28 The function in Exercise 13.244
Fig. 13.29 The function f in Exercise 13.247
for an extremum must be $x = 1$, hence the only candidates for extrema in $[0, 2]$ belong to the set $\{0, 1, 2\}$. We compare the values: $f(0) = 0$, $f(1) = \frac{1}{2}$, $f(2) = \frac{2}{5}$. Therefore, the maximum is at the point $1$ (equal to $\frac{1}{2}$), and the minimum at the point $0$ (equal to $0$). See Fig. 13.28.

13.245 Prove that $\sin x + \cos x \le \sqrt{2}$ by using differentiability.
Hint. Put $f(x) := \sin x + \cos x$ for $x \in \mathbb{R}$. The function $f$ is $2\pi$-periodic, so it is enough to consider $f$ on $[0, 2\pi]$, where it attains its extrema. Note that $f(0) = f(2\pi) = 1$. If $x_0 \in (0, 2\pi)$ is an extremum, then $f'(x_0) = 0 = \cos x_0 - \sin x_0$. Since $\cos^2 x_0 + \sin^2 x_0 = 1$, we get $\sin^2 x_0 = 1/2$, and so $f(x_0) = \pm\sqrt{2}$ (and $\sqrt{2} > 1$). The conclusion follows. For another approach, see Exercise 13.252.

13.246 Use the mean value theorem to show that $\left\{\frac{\ln n}{n}\right\}_{n=3}^{\infty}$ is a decreasing sequence (the first three terms satisfy $0 < \frac{\ln 2}{2} < \frac{\ln 3}{3}$, so the sequence decreases only from $n = 3$ on).
Hint. Standard.

13.247 Show that of all the rectangles with constant area, the square has the least perimeter.
Hint. Say the area is $1$. Let the sides be $x$ and $\frac{1}{x}$. We look for the minimum of the function $f(x) = x + \frac{1}{x}$ (see Fig. 13.29). It is at the point where the derivative of $f$ is $0$, i.e., at $x = 1$.

13.248 Assume that $f$ is a real-valued continuous function on $\mathbb{R}$ such that
$$\lim_{x\to+\infty} f(x) = \lim_{x\to-\infty} f(x) = +\infty.$$
Show that $f$ attains its minimum on $\mathbb{R}$.
Fig. 13.30 The function f and its two first derivatives on [−1.5, 1.5] (Exercise 13.249)
Hint. Let $x_0 > 0$ be such that $f(x) > f(0) + 1$ whenever $x \ge x_0$ or $x \le -x_0$. The function $f$ is continuous on $[-x_0, x_0]$. Thus, by Corollary 335, it attains its minimum on $[-x_0, x_0]$, say at $y_0$. Then $f(y_0) \le f(0) \le f(0) + 1 \le f(x)$ whenever $x \ge x_0$ or $x \le -x_0$. Since $f(y_0) \le f(x)$ whenever $-x_0 \le x \le x_0$, we have that $f(y_0) \le f(x)$ for all $x \in \mathbb{R}$.

13.249 Find inflection points of the function $f$ defined by $f(x) = x^4 - x^2$ for $x \in \mathbb{R}$. An inflection point of a twice differentiable function $f$ is a point $x_0$ such that $f''$ changes its sign at $x_0$, i.e., $f''(x) < 0$ for $x_0 - \delta < x < x_0$ and $f''(x) > 0$ for $x_0 < x < x_0 + \delta$ for some $\delta > 0$, or vice versa.
Hint. $x_0 = \pm\frac{1}{\sqrt{6}}$ (see Fig. 13.30). Observe that, due to Corollary 820 and Remark 821, an inflection point of a twice differentiable function is a point where the function changes from strictly convex to strictly concave, or vice versa.
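The sign change of $f''$ in Exercise 13.249 can be illustrated numerically. This is a minimal sketch: the second derivative $12x^2 - 2$ is computed by hand, and the helper name `changes_sign` and the window `delta` are assumptions made for the example.

```python
import math


def second_derivative(x):
    """f''(x) for f(x) = x^4 - x^2, differentiated by hand."""
    return 12 * x * x - 2


def changes_sign(point, delta=1e-3):
    """True if f'' has opposite signs just left and just right of `point`."""
    return second_derivative(point - delta) * second_derivative(point + delta) < 0


x0 = 1 / math.sqrt(6)

if __name__ == "__main__":
    print(changes_sign(x0), changes_sign(-x0), changes_sign(0.0))
```

The check at $0$ is included as a contrast: there $f''$ is negative on both sides, so $0$ is a local maximum of $f$ but not an inflection point.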
13.4.3 The Trigonometric Functions
13.250 The functions hyperbolic sine, hyperbolic cosine, and hyperbolic tangent were defined in (5.85), (5.86), and (5.87), respectively. Find their domain, range, and inverse function (in case they exist).
Hint. (See the plots of their graphs in Fig. 5.30.) The common domain is $\mathbb{R}$. All three functions are continuous on $\mathbb{R}$. Since $\sinh x \to +\infty$ as $x \to +\infty$ and $\sinh x \to -\infty$ as $x \to -\infty$, its range is $\mathbb{R}$ (use the intermediate value property of continuous functions, Theorem 339). Clearly, the range of $\cosh x$ is $[1, +\infty)$ by a similar argument. Note that $\tanh x = (e^x - e^{-x})/(e^x + e^{-x}) = (e^{2x} - 1)/(e^{2x} + 1)$, clearly $-1 < \tanh x < 1$ for all $x \in \mathbb{R}$, $\tanh x \to 1$ as $x \to +\infty$ and $\tanh x \to -1$ as $x \to -\infty$. Theorem 339 concludes that its range is $(-1, 1)$. The function $\cosh x$ is not injective (it is an even function), so it has no inverse function. Letting $e^x = z$, we can solve for $z$ both $\sinh x = (e^x - e^{-x})/2 = (z - z^{-1})/2 = y$ to obtain $x = \ln\bigl(y + \sqrt{y^2 + 1}\bigr)$ for $y \in \mathbb{R}$, and $\tanh x = (e^x - e^{-x})/(e^x + e^{-x}) = (z - z^{-1})/(z + z^{-1}) = y$ to obtain $x = \frac{1}{2}\ln\bigl((1 + y)/(1 - y)\bigr)$ for $y \in (-1, 1)$.
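The logarithmic formulas for the inverses in Exercise 13.250 can be compared against the standard library. The sketch below uses the usual closed forms $\operatorname{arsinh} y = \ln(y + \sqrt{y^2+1})$ and $\operatorname{artanh} y = \frac12 \ln\frac{1+y}{1-y}$; the function names are illustrative assumptions.

```python
import math


def asinh_via_log(y):
    """Inverse of sinh obtained by solving (z - 1/z)/2 = y for z = e^x."""
    return math.log(y + math.sqrt(y * y + 1))


def atanh_via_log(y):
    """Inverse of tanh, valid for -1 < y < 1: e^(2x) = (1 + y)/(1 - y)."""
    return 0.5 * math.log((1 + y) / (1 - y))


if __name__ == "__main__":
    for y in (-1.5, 0.0, 2.0):
        print(y, asinh_via_log(y), math.asinh(y))
    for y in (-0.3, 0.0, 0.9):
        print(y, atanh_via_log(y), math.atanh(y))
```

The factor $\frac12$ in the second formula comes from taking the logarithm of $e^{2x}$ rather than of $e^x$; composing each function with `math.sinh` or `math.tanh` returns the input up to rounding.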
13.251 Prove the equalities in Corollary 382.
Hint. We shall hint at the proof of (i). The rest is similar. Add (i) and (ii) in Proposition 381 to get $\sin(u+v) + \sin(u-v) = 2\sin u\cos v$. Now put $u := (\alpha+\beta)/2$ and $v := (\alpha-\beta)/2$.

13.252 Show that for all $x \in \mathbb{R}$, we have $\sin x + \cos x \le \sqrt{2}$.
Hint. Square the expression $\sin x + \cos x$ and use Corollary 384. We propose three other (more sophisticated) approaches to this: (i) Exercise 13.522, based on the Cauchy–Schwarz inequality; (ii) Exercise 13.245 (using differentiability), and (iii) Exercise 13.542 (a byproduct of norm-equivalence).

13.253 Use the Mean Value Theorem 365 to show that the function $\arctan$ is continuous at every point of $\mathbb{R}$.
Hint. Standard. For a picture of the graph of the $\arctan$ function see Fig. 4.35.

13.254 Show that
$$\lim_{n\to\infty} \cos\frac{x}{2}\cos\frac{x}{4}\cdots\cos\frac{x}{2^n} = \frac{\sin x}{x}.$$
Hint.
$$\frac{\cos\frac{x}{2}\cos\frac{x}{4}\cdots\cos\frac{x}{2^n}}{\sin x} = \frac{\cos\frac{x}{2}\cos\frac{x}{4}\cdots\cos\frac{x}{2^n}}{2\sin\frac{x}{2}\cos\frac{x}{2}} = \frac{\cos\frac{x}{4}\cdots\cos\frac{x}{2^n}}{2\sin\frac{x}{2}} = \frac{\cos\frac{x}{4}\cdots\cos\frac{x}{2^n}}{4\sin\frac{x}{4}\cos\frac{x}{4}} = \ldots = \frac{1}{2^n\sin\frac{x}{2^n}} \to \frac{1}{x},$$
when $n \to \infty$.

13.255 Show in an elementary way that the series
$$\sum_{n=1}^{\infty} \sin n$$
has bounded partial sums. This is in contrast with the series $\sum_{n=1}^{\infty} |\sin n|$, which cannot have bounded partial sums.
Hint.
$$(\sin 1 + \sin 2 + \ldots + \sin n)\sin\frac{1}{2} = \frac{1}{2}\left[\cos\frac{1}{2} - \cos\frac{3}{2} + \cos\frac{3}{2} - \cos\frac{5}{2} + \ldots + \cos\left(n - \frac{1}{2}\right) - \cos\left(n + \frac{1}{2}\right)\right] = \frac{1}{2}\left[\cos\frac{1}{2} - \cos\left(n + \frac{1}{2}\right)\right].$$
Thus
$$|\sin 1 + \sin 2 + \ldots + \sin n| \le \frac{1}{\sin\frac{1}{2}}$$
for each $n \in \mathbb{N}$. The series $\sum_{n=1}^{\infty} |\sin n|$ cannot have bounded partial sums, as otherwise it would converge and thus $\sin n \to 0$, which is not true (see Exercise 13.97).

13.256 Show that $\sum_{n=1}^{\infty} \frac{|\sin n|}{n}$ is divergent.
Hint. Assume it is convergent. Since $|\sin n| \ge \sin^2 n$ for each $n$, we would get that $\sum \frac{1 - \cos 2n}{2n}$ is convergent. Since $\sum \frac{\cos 2n}{2n}$ is convergent by the Dirichlet test (see Theorem 181) (use also the formulas in Sect. 9.2), we would get that the sum of the last two series, i.e., $\sum \frac{1}{2n}$, is convergent, a contradiction.

Exercises 13.257–13.259 below are intended to practice with what is called Landau notation, a code to compare the asymptotic behavior of two functions. E. G. H. Landau was a German mathematician.

13.257 (Landau Notation) Let $f$ and $g$ be two real-valued functions defined on a neighborhood of a point $x_0 \in \mathbb{R}$. We write
$f(x) = O(g(x))$ as $x \to x_0$
(13.16)
if there exist $\delta > 0$ and $M > 0$ such that $|f(x)| \le M|g(x)|$ for all $x \in \mathbb{R}$ such that $|x - x_0| < \delta$. Variations of this definition can be stated for $x_0 = +\infty$, $x_0 = -\infty$, or $x_0 = \infty$. For example, we write $f(x) = O(g(x))$ as $x \to +\infty$ whenever there exist $M > 0$ and $a \in \mathbb{R}$ such that $|f(x)| \le M|g(x)|$ for all $x > a$. We write
$f(x) = o(g(x))$ as $x \to x_0$
(13.17)
if $\lim_{x\to x_0} \frac{f(x)}{g(x)} = 0$. As above, we may let $x \to +\infty$, $x \to -\infty$, or $x \to \infty$.
Show that $\sin x = o(\sqrt{x})$ as $x \to 0$, and that $\sin x \neq o(x)$ as $x \to 0$. Show that $\sin x = O(x)$ as $x \to 0$.
Hint. Observe that, according to Proposition 506, we have $\sin x = x + o(x)$ as $x \to 0$. Hence,
$$\frac{\sin x}{\sqrt{x}} = \frac{x + o(x)}{\sqrt{x}} = \sqrt{x} + \frac{o(x)}{x}\sqrt{x} \to 0, \text{ as } x \to 0,$$
and
$$\frac{\sin x}{x} = \frac{x + o(x)}{x} = 1 + \frac{o(x)}{x} \to 1, \text{ as } x \to 0.$$

13.258 Find
$$\lim_{x\to 0}\left(\frac{1}{\sin x} - \frac{1}{x}\right).$$
Hint. By Proposition 506, we have $\sin x = x + o(x)$ as $x \to 0$, and $\sin x = x - \frac{x^3}{3!} + o(x^3)$ as $x \to 0$. Thus,
$$\frac{1}{\sin x} - \frac{1}{x} = \frac{x - \sin x}{x\sin x} = \frac{x - x + \frac{x^3}{3!} - o(x^3)}{x\sin x} = \frac{\frac{x^3}{3!} - o(x^3)}{x(x + o(x))} = \frac{\frac{x^3}{3!} - o(x^3)}{x^2\left(1 + \frac{o(x)}{x}\right)} = \frac{\frac{x}{3!} - \frac{o(x^3)}{x^2}}{1 + \frac{o(x)}{x}} \to 0 \text{ as } x \to 0.$$

Fig. 13.31 The first four iterates of the sine function (Exercise 13.261)
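13.259 Show that
$$\lim_{x\to 0} \frac{x - \sin x}{x^3} = \frac{1}{6}.$$
Hint. Recall that $\sin x = x - \frac{x^3}{3!} + o(x^3)$ as $x \to 0$ (see Proposition 506). Then
$$\lim_{x\to 0} \frac{x - \sin x}{x^3} = \lim_{x\to 0} \frac{x - \left(x - \frac{x^3}{3!} + o(x^3)\right)}{x^3} = \frac{1}{6}.$$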
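The limit in Exercise 13.259 can be observed numerically. This is only an illustrative sketch (the function name `ratio` and the sample points are assumptions); for very small $x$ the subtraction $x - \sin x$ loses floating-point accuracy, so moderate values of $x$ are used.

```python
import math


def ratio(x):
    """(x - sin x) / x^3, which tends to 1/6 as x -> 0."""
    return (x - math.sin(x)) / x**3


if __name__ == "__main__":
    for x in (0.5, 0.1, 0.01):
        print(x, ratio(x), abs(ratio(x) - 1 / 6))
```

The next term of the Taylor expansion of $\sin x$ shows that the error behaves like $x^2/120$, which matches the rate at which the printed differences shrink.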
13.260 Given $\varepsilon > 0$, find $\delta > 0$ so that $|\arctan x - \arctan 1| < \varepsilon$ whenever $|x - 1| < \delta$.
Hint. The Mean Value Theorem 365. The function is 1-Lipschitz. Thus, the answer is $\delta = \varepsilon$.

13.261 Show that $\lim_{n\to\infty} \sin(\sin(\cdots(\sin x)\cdots)) = 0$ for every $x \in \mathbb{R}$, where $\sin(\sin(\cdots(\sin x)\cdots))$ denotes the function $\sin$ applied $n$ times to $x$.
Hint. If, say, $x \ge 0$, and $a_n(x) := \sin(\sin(\cdots(\sin x)\cdots))$ ($n$ times), then $0 \le a_{n+1}(x) \le a_n(x) \le 1$ for each $n$. Thus $\lim_{n\to\infty} a_n(x)$ exists, denote it by $\alpha \in [0, 1]$. We have $\sin(a_n(x)) = a_{n+1}(x)$ for each $n \in \mathbb{N}$. Take the limit for $n \to \infty$ in this last equality, and use the continuity of the function $\sin x$. We get $\sin\alpha = \alpha$. Thus, we get $\alpha = 0$.

13.262 Show that $\lim_{n\to\infty} \cos\bigl(\pi\sqrt{n^2 + n}\bigr) = 0$.
Hint. First, note that $\lim_{n\to\infty}\bigl(\sqrt{n^2 + n} - (n + \frac{1}{2})\bigr) = 0$. Indeed, write
$$\sqrt{n^2 + n} - \left(n + \frac{1}{2}\right) = \frac{n^2 + n - \left(n + \frac{1}{2}\right)^2}{\sqrt{n^2 + n} + n + \frac{1}{2}} = -\frac{1}{4}\,\frac{1}{\sqrt{n^2 + n} + n + \frac{1}{2}} \to 0,$$
when $n \to \infty$. Second, put $x_n := \pi\sqrt{n^2 + n}$, $y_n := (n + \frac{1}{2})\pi$, for $n \in \mathbb{N}$. Then $\cos y_n = 0$ for all $n$, and $x_n - y_n \to 0$. Since the function $\cos x$ is uniformly continuous (even Lipschitz) on $\mathbb{R}$ as it has a bounded derivative, we have that $\cos x_n \to 0$. Alternatively, one can use the formula for $\cos x - \cos y$, using the points $x_n$ and $y_n$.

13.263 Let $f$ be the function defined on $\mathbb{R}$ by
$$f(x) = \begin{cases} x\sin\frac{1}{x}, & \text{for } x \neq 0, \\ 0, & \text{for } x = 0 \end{cases}$$
(see the first graph in Fig. 4.45). Let $g$ be the function defined on $\mathbb{R}$ by
$$g(x) = \begin{cases} 1, & \text{for } x = 0, \\ 0, & \text{for } x \neq 0. \end{cases}$$
Show that $\lim_{x\to 0} f(x) = 0$, $\lim_{x\to 0} g(x) = 0$, and $\lim_{x\to 0} (g\circ f)(x)$ does not exist.
Hint. Use the points $x_n = \frac{1}{n\pi}$ and $y_n = \frac{1}{(2n + \frac{1}{2})\pi}$. Compare this with Proposition 329.

13.264 Show the inequality
$$\tan x > x + \frac{x^3}{3}$$
for $0 < x < \frac{\pi}{2}$.
Hint. Repeated use of the Mean Value Theorem 365. Precisely, observe first that on $(0, \pi/2)$ we have $\tan x - x = x\,(1 + \tan^2\xi - 1) > 0$, where $\xi \in (0, x)$. Then, $\tan x - x - x^3/3 = x\,(1 + \tan^2\kappa - 1 - \kappa^2) = x\,(\tan\kappa - \kappa)(\tan\kappa + \kappa) > 0$, where $\kappa \in (0, x)$, by using the previous estimate (see Fig. 13.32).

Fig. 13.32 The functions $\tan x$ and $x + \frac{x^3}{3}$ in Exercise 13.264

13.265 Find a formula for $\sum_{j=1}^{n} j\cos jx$.
Hint. $(\sin x + \sin 2x + \ldots + \sin nx)'$.
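The hint to Exercise 13.265 says that the sum $\sum_{j=1}^{n} j\cos jx$ is the derivative of $\sum_{j=1}^{n}\sin jx$. The sketch below checks this numerically with a central difference; the function names and the step `h` are assumptions made for illustration, and no closed form is claimed here.

```python
import math


def weighted_cosine_sum(n, x):
    """Left-hand side: sum of j*cos(j*x) for j = 1..n."""
    return sum(j * math.cos(j * x) for j in range(1, n + 1))


def sine_sum(n, x):
    """sin x + sin 2x + ... + sin nx."""
    return sum(math.sin(j * x) for j in range(1, n + 1))


def derivative_of_sine_sum(n, x, h=1e-5):
    """Central-difference approximation of d/dx of sine_sum (h is assumed)."""
    return (sine_sum(n, x + h) - sine_sum(n, x - h)) / (2 * h)


if __name__ == "__main__":
    for n, x in ((3, 0.4), (5, 0.7), (10, 2.0)):
        print(n, x, weighted_cosine_sum(n, x), derivative_of_sine_sum(n, x))
```

To obtain an explicit formula one would differentiate a closed form of the sine sum (for instance the one obtained by the telescoping trick of Exercise 13.255, with $\sin\frac{x}{2}$ in place of $\sin\frac12$) instead of differentiating numerically.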
13.266 Let
$$f(x) = \sum_{n=1}^{\infty} \frac{n^n}{3^n n!}\sin(nx).$$
Determine where $f$ is continuous.
Hint. $\mathbb{R}$. (Use the Weierstrass M-test (Theorem 473) for the function series and the ratio (Proposition 175) or the root test (Proposition 177) for the numerical series. Then, apply Theorem 463.)

13.267 Find two discontinuous functions $f$ and $g$ so that $f\circ g$ is continuous everywhere.
Hint. In $[0, 1]$, $g = \chi_{\{1\}}$, $f = \chi_{(0,1)}$, where $\chi$ stands for the characteristic function.

13.268 Find two sets $A \subset \mathbb{R}$ and $B \subset \mathbb{R}^2$, and an everywhere continuous function $f : A \to B$ that is one-to-one and onto, such that $f^{-1} : B \to A$ is not everywhere continuous.
Hint. Run the circle.
13.4.4 Finer Analysis of Continuity and Differentiability
Monotone Functions

13.269 Let $a, b$ be two real numbers such that $a < b$. Define the function $f : (a, b) \to \mathbb{R}$ by $f(x) := 1/(a - x) + 1/(b - x)$ for all $x \in (a, b)$ (for the graph of $f$ see Fig. 6.4). Prove that $f$ on $(a, b)$, (i) is continuous, (ii) is strictly increasing, (iii) its range is $\mathbb{R}$, and (iv) it has a continuous inverse. This shows that $\mathbb{R}$ and $(a, b)$, both endowed with the absolute-value metric, are homeomorphic metric spaces.
Hint. (i) and (ii) are obvious. (iii) follows from (i), (ii), and the fact that $\lim_{x\to a+} f(x) = -\infty$ and $\lim_{x\to b-} f(x) = +\infty$. The existence of the inverse mapping follows from the fact that $f$ is one-to-one. In order to prove (iv), we can fix $y \in \mathbb{R}$ and find $x \in (a, b)$ such that $f(x) = y$. Then, given $\delta > 0$ such that $[x - \delta, x + \delta] \subset (a, b)$, the set $f[x - \delta, x + \delta] = [f(x - \delta), f(x + \delta)]$ contains $y$ in its interior, so for checking the continuity of $f^{-1}$ at $y$ it is enough to check it on $[f(x - \delta), f(x + \delta)]$. For this, we can apply Proposition 337.

13.270 Prove the following result called Helly's first theorem (E. Helly was an Austrian–American mathematician): Let $\{f_n\}$ be a sequence of increasing functions on $[a, b]$ that is uniformly bounded, i.e., there is $K > 0$ such that $|f_n(x)| \le K$ for every $n \in \mathbb{N}$ and for all $x \in [a, b]$. Then, there is a subsequence $\{f_{n_k}\}$ of $\{f_n\}$ that pointwise converges to an increasing function $f$ on $[a, b]$.
Hint. Let $\{r_k\}$ denote the sequence that includes all the rational numbers in $[a, b]$ as well as the end points $a$ and $b$. Choose a subsequence $\{f_n^1\}$ of $\{f_n\}$ such that $\{f_n^1(r_1)\}_{n=1}^{\infty}$ converges. From this subsequence, choose a subsequence $\{f_n^2\}$ such that $\{f_n^2(r_2)\}_{n=1}^{\infty}$ converges. Keep going. Let the sequence of functions $\{g_n\}$ be defined by $g_n = f_n^n$ for all $n \in \mathbb{N}$. Then, $\{g_n\}$ is a subsequence of every $\{f_n^k\}_n$ and thus $\{g_n(r_k)\}_{n=1}^{\infty}$ is convergent for every $k \in \mathbb{N}$ (this procedure is called the Cantor diagonal process, and is a very important technique in analysis). Define the function $f$ on the set $\{r_k : k \in \mathbb{N}\}$ by $f(r_k) := \lim_{n\to\infty} g_n(r_k)$, for $k \in \mathbb{N}$. Then, extend $f$ to $[a, b]$ by letting f (x) := suprk 0 pick $r_k$ and $r_i$ such that $r_k < x_0 < r_i$ and $f(r_i) - f(r_k) < \varepsilon$. Fixing $r_k$ and $r_i$, find $n_0 \in \mathbb{N}$ such that $|g_n(r_k) - f(r_k)| < \varepsilon$ and $|g_n(r_i) - f(r_i)| < \varepsilon$ for $n > n_0$. Then, $f(x_0) - 2\varepsilon < g_n(r_k) \le g_n(r_i) < f(x_0) + 2\varepsilon$ for $n > n_0$. Since $g_n(r_k) \le g_n(x_0) \le g_n(r_i)$, we have $f(x_0) - 2\varepsilon < g_n(x_0) < f(x_0) + 2\varepsilon$ for $n > n_0$. This finishes the proof.

13.271 Let the sequence $\{f_n\}$ of functions on $[0, 2\pi]$ be defined by $f_n(x) := \sin(nx)$ for all $n \in \mathbb{N}$ and $x \in [0, 2\pi]$ (see Fig. 13.33). Then, $\{f_n\}$ has no subsequence that converges (a.e.) on $[0, 2\pi]$. Compare this with Helly's first theorem (see Exercise 13.270).
Hint. Indeed, if $\{g_n\}$ were such a subsequence, then put $h_n := g_{n+1} - g_n$ for all $n \in \mathbb{N}$. Then, $\{h_n^2\}$ converges (a.e.) to zero on $[0, 2\pi]$, all the functions $h_n^2$ are uniformly bounded by $4$ and thus, by the Lebesgue Dominated Convergence Theorem 750, $\int_0^{2\pi} h_n^2 \to 0$. However, by simple calculation, $\int_0^{2\pi} h_n^2 = 2\pi$.

Fig. 13.33 The four first functions in Exercise 13.271

Differentiability of Monotone Functions

13.272 Show that for every set $E$ of measure zero in $[a, b]$ there exists a continuous increasing function $\sigma(x)$ such that $\sigma'(x) = \infty$ at all points of $E$.
Hint. Fill the gaps in the following argument: For $n \in \mathbb{N}$, let $G_n$ be an open set such that $G_n \supset E$ and $\lambda(G_n) < \frac{1}{2^n}$, and let $\psi_n(x) := \lambda(G_n \cap [a, x])$ for $x \in [a, b]$. Then, each $\psi_n$ is continuous, increasing, nonnegative, and $\psi_n(x) < \frac{1}{2^n}$ for all $x \in [a, b]$. For $x \in [a, b]$, put $\sigma(x) := \sum_{n=1}^{\infty} \psi_n(x)$. Then, $\sigma$ is an increasing, nonnegative, and continuous function on $[a, b]$. Let $x_0 \in E$ and $n \in \mathbb{N}$ be given.
Then for a small $h > 0$, we have $[x_0, x_0 + h] \subset G_n$ and thus
$$\psi_n(x_0 + h) = \lambda\bigl((G_n \cap [a, x_0]) \cup (G_n \cap (x_0, x_0 + h])\bigr) = \lambda(G_n \cap [a, x_0]) + h = \psi_n(x_0) + h.$$
A similar argument is used for $h < 0$. Therefore, we have
$$\frac{\psi_n(x_0 + h) - \psi_n(x_0)}{h} = 1, \text{ for small } |h|.$$
Given $N \in \mathbb{N}$, for small $|h|$ (since we consider only a finite number of summands) we have
$$\frac{\sigma(x_0 + h) - \sigma(x_0)}{h} \ge \sum_{n=1}^{N} \frac{\psi_n(x_0 + h) - \psi_n(x_0)}{h} = N.$$
It follows that $\sigma'(x_0) = +\infty$. Note that this implies that the set consisting of all points of differentiability of a continuous monotone function may not be residual. Indeed it suffices to apply this exercise to the residual set of measure zero $[0, 1] \setminus F$ given in Exercise 13.156.

Functions of Bounded Variation

13.273 Do we get the same notion of bounded variation if we require $V(P, f) = \bigl|\sum (f(x_{i+1}) - f(x_i))\bigr|$ in Definition 426 instead of $V(P, f) = \sum |f(x_{i+1}) - f(x_i)|$?
Hint. Yes. Split the intervals into two classes by the sign of $f(x_{i+1}) - f(x_i)$.

13.274 Assume that $f$ is a continuous function of bounded variation on an interval $[a, b]$. Show that the variation function $V_a^x f$ is a continuous function on $[a, b]$. Show that this implies that $f$ is the difference of two continuous and increasing functions.
Hint. First, observe that $x \mapsto V_a^x f$ is increasing (Proposition 430). Fix $x_0 \in [a, b)$. We shall prove that $V_a^x f$ is continuous to the right at $x_0$. Assume not. Then, we can find $\varepsilon_0 > 0$ such that for every $\delta > 0$, there exists $h \in (0, \delta)$ with $(V_{x_0}^{x_0+h} f =)\ V_a^{x_0+h} f - V_a^{x_0} f > \varepsilon_0$ (see again Proposition 430). Since $f$ is continuous on $[a, b]$, it is uniformly continuous (Theorem 344), hence there exists $\delta_0 > 0$ such that $|f(x) - f(y)| < \varepsilon_0/2$ whenever $x, y \in [a, b]$, $|x - y| < \delta_0$. Proceed by induction to find a strictly decreasing sequence $\{h_n\}$ of positive numbers in the following way: Fix $r \in (0, \delta_0)$. Find $h_1 \in (0, r)$ such that $V_{x_0}^{x_0+h_1} f > \varepsilon_0$. This implies that there exists a finite family of nondegenerate nonoverlapping intervals $\{(c_i^1, d_i^1)\}_{i=1}^{n_1}$ in $[x_0, x_0 + h_1]$ such that $\sum_{i=1}^{n_1} |f(d_i^1) - f(c_i^1)| > \varepsilon_0$. We may assume that the finite sequence $\{(c_i^1, d_i^1)\}_{i=1}^{n_1}$ is indexed in such a way that $d_i^1 \le c_{i+1}^1$ for $i = 1, 2, \ldots, n_1 - 1$. Since $0 < d_1^1 - c_1^1 < \delta_0$, we have $|f(d_1^1) - f(c_1^1)| < \varepsilon_0/2$. This shows that $\sum_{i=2}^{n_1} |f(d_i^1) - f(c_i^1)| > \varepsilon_0/2$. For the next step, find $h_2 \in (0, d_1^1 - x_0)$ such that $V_{x_0}^{x_0+h_2} f > \varepsilon_0$, so a finite family of nonoverlapping intervals $\{(c_i^2, d_i^2)\}_{i=1}^{n_2}$ in $[x_0, x_0 + h_2]$ exists such that $\sum_{i=1}^{n_2} |f(d_i^2) - f(c_i^2)| > \varepsilon_0$. Again, $|f(d_1^2) - f(c_1^2)| < \varepsilon_0/2$, hence $\sum_{i=2}^{n_2} |f(d_i^2) - f(c_i^2)| > \varepsilon_0/2$. In this way, we find $\{h_n\}_{n=1}^{\infty}$, and families $\{(c_i^m, d_i^m)\}_{i=1}^{n_m}$ of nonoverlapping intervals in $[a, b]$ with $\sum_{i=2}^{n_m} |f(d_i^m) - f(c_i^m)| > \varepsilon_0/2$ for each $m \in \mathbb{N}$.
This is a contradiction with the fact that $f$ has bounded variation.
For $x_0$ in $(a, b]$ and for proving continuity from the left, proceed similarly. For the second part, recall that $f(x) = V_a^x f - (V_a^x f - f(x))$, and that both functions $V_a^x f$ and $V_a^x f - f(x)$ are increasing (see the proof of Theorem 432).

Absolutely Continuous Functions. Lipschitz Functions

13.275 Do we get the same notion of absolute continuity if we require in Definition 434 $\bigl|\sum (f(x_{i+1}) - f(x_i))\bigr| < \varepsilon$ instead of (4.51)?
Hint. Yes. Split the intervals into two classes by the sign of $(f(x_{i+1}) - f(x_i))$.

13.276 Let $f$ be an absolutely continuous real-valued function defined on the interval $[a, b]$. Show that $\sum_{i=1}^{\infty} |f(b_i) - f(a_i)| < \infty$ whenever $\{(a_i, b_i)\}_{i=1}^{\infty}$ is a family of pairwise nonoverlapping intervals in $[a, b]$ such that $\sum_{i=1}^{\infty} (b_i - a_i) < \infty$.
Hint. Given $\varepsilon > 0$, get $\delta > 0$ from the definition of the absolute continuity of $f$. There is $n_0 \in \mathbb{N}$ such that $\sum_{i=n}^{m} (b_i - a_i) < \delta$ whenever $m \ge n \ge n_0$. Then $\sum_{i=n}^{m} |f(b_i) - f(a_i)| < \varepsilon$ by the absolute continuity of $f$.

13.277 Show that the product of two real-valued absolutely continuous functions on a closed and bounded interval is absolutely continuous.
Hint. Let $f$ and $g$ be two absolutely continuous functions on $[a, b]$. Since they are continuous, they are bounded there, say $|f(x)| \le M$ and $|g(x)| \le M$ for all $x \in [a, b]$. Given a finite sequence $\{[x_i, y_i]\}_{i=1}^{n}$ of nonoverlapping subintervals of $[a, b]$, put
$$\sum_{i=1}^{n} |fg(y_i) - fg(x_i)| = \sum_{i=1}^{n} \bigl|f(y_i)g(y_i) - f(y_i)g(x_i) + f(y_i)g(x_i) - f(x_i)g(x_i)\bigr|$$
$$\le \sum_{i=1}^{n} \bigl[|f(y_i)|\,|g(y_i) - g(x_i)| + |f(y_i) - f(x_i)|\,|g(x_i)|\bigr] \le M\left[\sum_{i=1}^{n} |g(y_i) - g(x_i)| + \sum_{i=1}^{n} |f(y_i) - f(x_i)|\right].$$
The conclusion readily follows.

13.278 If we assume in the definition of absolute continuity (Definition 434) that the intervals are countably many, or disjoint, do we get the same?
Hint. Yes. Gymnastics with $\varepsilon$'s and $\delta$'s, and continuity. See Remark 435.
13.279 Let $f$ be a Lipschitz real-valued function defined on $\mathbb{R}$, and let $g$ be a real-valued absolutely continuous function, defined also on $\mathbb{R}$. Is $f\circ g$ absolutely continuous?
Hint. Yes, it is. The proof is standard.

13.280 Use the concept of Lipschitz property of functions to give an alternative proof to the fact that $f(x) = \frac{1}{x}$ is continuous at every $x_0 > 0$ (see Example 318.2).
Hint. We have $f'(x) = -\frac{1}{x^2}$. So, $|f'(x)| \le \frac{4}{x_0^2}$ if $x \ge \frac{1}{2}x_0$. Thus, the function is Lipschitz on $[\frac{1}{2}x_0, \infty)$ with the constant $\frac{4}{x_0^2}$. So put $\delta = \min\bigl\{\frac{1}{2}x_0, \frac{\varepsilon x_0^2}{4}\bigr\}$ and use the Mean Value Theorem 365 to verify it.
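The choice of $\delta$ in the hint to Exercise 13.280 can be tested on sample points. The following sketch is illustrative only: the function names and the number of sample points are assumptions, and sampling is of course not a proof.

```python
def delta_for(x0, eps):
    """delta = min(x0/2, eps*x0^2/4), as suggested in the hint."""
    return min(x0 / 2, eps * x0**2 / 4)


def worst_gap(x0, eps, samples=1000):
    """Largest |1/x - 1/x0| over sample points with |x - x0| < delta."""
    d = delta_for(x0, eps)
    worst = 0.0
    for k in range(1, samples):
        x = x0 - d + 2 * d * k / samples  # strictly inside (x0 - d, x0 + d)
        worst = max(worst, abs(1 / x - 1 / x0))
    return worst


if __name__ == "__main__":
    for x0, eps in ((0.5, 0.1), (3.0, 0.01), (1.0, 1.0)):
        print(x0, eps, delta_for(x0, eps), worst_gap(x0, eps))
```

The first term of the minimum keeps $x$ inside $[\frac12 x_0, \infty)$, where the Lipschitz constant $4/x_0^2$ is valid; the second term then forces $|f(x) - f(x_0)| \le \frac{4}{x_0^2}|x - x_0| < \varepsilon$.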
13.281 Let the function $f$ be defined as follows (see Fig. 4.45):
$$f(x) = \begin{cases} x\sin\frac{1}{x} & \text{for } 0 \le x \le 1,\ x \neq 0, \\ 0 & \text{for } x = 0. \end{cases}$$
(i) Show that $f$ is not Lipschitz on any interval $(0, \delta)$, but it is Lipschitz on any $(\delta, 1)$, if $\delta > 0$.
(ii) Show that $f'(0)$ does not exist.
(iii) Show that $f$ is continuous on $[0, 1]$.
(iv) To appreciate the Heine–Cantor Theorem 344, show directly (i.e., not using (iii) nor the Heine–Cantor theorem) that $f$ is uniformly continuous on $[0, 1]$ (see also Exercise 13.196).
(v) Given $x \in [0, 1]$, show that there is a constant $K > 0$ such that if $y \in [0, 1]$, then $|f(x) - f(y)| \le K|x - y|$ (we say that $f$ is pointwise Lipschitz on $[0, 1]$, see Exercises 13.287 and 13.288).
Hint. See Example 4.5.8.4.
(i) For $x \neq 0$, we have
$$f'(x) = \sin\frac{1}{x} - \frac{1}{x}\cos\frac{1}{x}$$
and this function is bounded on any $(\delta, 1)$. Use then the Mean Value Theorem 365 to see that $f$ is Lipschitz on $[\delta, 1]$. Note that $f'$ is not bounded on any interval $(0, \delta)$ (consider $h_n = (\frac{\pi}{2} + 2n\pi)^{-1}$ and $h_n = (2n\pi)^{-1}$), so $f$ cannot be Lipschitz there (use the definition of the derivative).
(ii)
$$f'(0) = \lim_{h\to 0} \frac{f(h)}{h} = \lim_{h\to 0} \sin\frac{1}{h}$$
does not exist (consider $h_n = (\frac{\pi}{2} + 2n\pi)^{-1}$ and $h_n = (2n\pi)^{-1}$).
(iii) $f$ is continuous (from the right) at $0$: Indeed, given $\varepsilon > 0$ put $\delta = \varepsilon$. If $|x| < \delta$, then $|f(x)| \le |x| \le \varepsilon$. The function $f$ is continuous at any nonzero point as it has a derivative there (see (i) above).
(iv) Let $\varepsilon > 0$ be given. Since $f$ is Lipschitz and thus uniformly continuous on $[\varepsilon, 1]$, choose $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ whenever $x, y \in [\varepsilon, 1]$ are
such that $|x - y| \le \delta$. Note that $|f(x) - f(y)| \le |f(x)| + |f(y)| \le 2\varepsilon$, if $x, y \in [0, \varepsilon]$. If $x \in [0, \varepsilon]$ and $y \in [\varepsilon, 1]$, let $z$ be in the segment $[x, y]$ with $z = \varepsilon$. Then $|f(x) - f(y)| \le |f(x) - f(z)| + |f(z) - f(y)| \le 3\varepsilon$. Thus, altogether, $|f(x) - f(y)| \le 3\varepsilon$ whenever $x, y \in [0, 1]$ are such that $|x - y| \le \delta$.
(v) First, if $x = 0$, $|f(y) - f(0)| = |f(y)| \le |y|$, so we can take $K = 1$. Let $x \in (0, 1]$. Since $f$ is $K$-Lipschitz on $[\frac{x}{2}, 1]$ by (i), we have $|f(x) - f(y)| \le K|x - y|$ whenever $y \in [\frac{x}{2}, 1]$. Let $L > \frac{4}{x}$. If $y \in [0, \frac{x}{2}]$, then $|x - y| \ge \frac{x}{2}$. Thus
$$|f(y) - f(x)| \le |f(y)| + |f(x)| \le 2 < L\,\frac{x}{2} \le L|x - y|.$$
Therefore, for any $x \in [0, 1]$, there is a constant $C > 0$ such that $|f(x) - f(y)| \le C|x - y|$ (take $C = \max\{K, L\}$).

13.282 Explain why $\sqrt{x}$ is an absolutely continuous function on $[0, 1]$.
Hint. (See Fig. 4.43.) Consider the function $f(x) := (2\sqrt{x})^{-1}$ on $(0, 1]$, and put $f(0) := 0$. Then $f \in L[0, 1]$. Indeed, $\int_{[1/n, 1]} f = 1 - \sqrt{1/n}$, so the sequence $\{f_n\}_{n=1}^{\infty}$, where $f_n := f\cdot\chi_{[1/n, 1]}$, $n \in \mathbb{N}$, satisfies the hypothesis in Theorem 744, and we obtain that the pointwise limit (the function $f$) belongs to $L[0, 1]$. It is enough now to apply Proposition 768 to get that $\int_{[0, x]} f(s)\,ds$ $(= \sqrt{x})$ is absolutely continuous on $[0, 1]$. Related to this exercise, see also Exercise 13.285. For another proof of the absolute continuity of $\sqrt{x}$ on $[0, 1]$, based on Exercise 13.283, see Exercises 13.284 and 13.517.

13.283 Prove the following result: If a real-valued function $f$ is continuous and has bounded variation on $[a, b]$, and is absolutely continuous on every interval $[a', b]$ for $a < a' < b$, then $f$ is absolutely continuous on $[a, b]$.
Hint. If, for $x \in [a, b]$, $V(x) := V_a^x f$ denotes the variation of $f$ on $[a, x]$ (see Definition 426 and formula (4.48)), then $V(x)$ is continuous on $[a, b]$. Given $\varepsilon > 0$, get $\delta > 0$ such that $V(a + \delta) < \varepsilon$.
Since $f$ is absolutely continuous on $[a + \delta, b]$, there is $\delta_1 > 0$ such that for any finite system of nonoverlapping intervals with total length less than $\delta_1$, we have the summation corresponding to these intervals in the definition of absolute continuity less than $\varepsilon$. If we now have a finite collection of nonoverlapping intervals of $[a, b]$ with total length less than $\delta_1$, we assume that $a + \delta$ is not in the interior of any of them and split them into the collection of intervals to the left and right of $a + \delta$. The first part of the corresponding sum in the definition of absolute continuity is less than $\varepsilon$ due to the choice of $\delta$ and the second one is less than $\varepsilon$ due to the choice of $\delta_1$.

13.284 Use Exercise 13.283 to show that the function $\sqrt{x}$ is absolutely continuous on $[0, 1]$ (this was already proved in Exercise 13.282, see also Exercise 13.517).
Hint. $\sqrt{x}$ is continuous on $[0, 1]$ and Lipschitz on $[a, 1]$ for every $a \in (0, 1]$. This follows from Proposition 445. Therefore, it is absolutely continuous on $[a, 1]$ for every $a \in (0, 1]$ (see Proposition 444). Moreover, $\sqrt{x}$ is monotone on $[0, 1]$, hence of bounded variation on $[0, 1]$ (see Theorem 432). It is now enough to use Exercise 13.283.
Fig. 13.34 The function $\sqrt{x}$ on $[0, 1]$ (Exercises 13.284 and 13.285)
Fig. 13.35 The graph of the three functions in Exercise 13.286
√ 13.285 (a) Show that x is uniformly continuous on [0, +∞) but not Lipschitz on [0, 1]. (b) Show that ex is not uniformly continuous on (0, ∞). √ √ √ Hint. (See Fig. 13.34). (a) For the uniform continuity, observe that | x − y| ≤ |x − y| (see Exercise √ 13.192, and see also Example 348.3). Note, too, that the uniform continuity of x on [0, 1] follows directly from Theorem 344. Observe that √ in Exercises 13.282 and 13.284 we showed something more precise: the function x on √ [0, 1] is even absolutely continuous (and this implies the uniform continuity of x on [0, 1],√see Remark 435). See also Exercise 13.517. For the Lipschitz property, assume that x ≤ Kx for some constant K > 0 and take the limit as x → 0+. (b) Consider eln (n+1) − eln (n) . 13.286 Is the composition of two absolutely continuous functions necessarily absolutely continuous? √ √ Hint. No. Consider x 2 sin2 (1/x) and x on [0, 1]. That the function x is absolutely continuous on [0, 1] follows from Exercise 13.284. That the function x 2 sin2 (1/x) is absolutely continuous on [0, 1] follows from the fact that it is Lipschitz there (see Proposition 445). That the composition, i.e., the function x sin (1/x) on [0, 1], is not absolutely continuous followed by considering the sequence of intervals 2 2 {[ (2n+1)π , (2n)π ]}∞ n=1 . The graph of the first function is in Fig. 13.35, while the graph of the second one is in Fig. 4.43. The graph of the composition is in the first part of Fig. 4.45. In Fig. 13.35, we plot the three graphs together. 13.287 Let us say that a real-valued function f defined on an interval I is Lipschitz at a ∈ I if there is a constant C > 0 such that |f (x) − f (a)| ≤ C|x − a| for all x ∈ I . We say that f is pointwise Lipschitz on I if this happens for each a ∈ I (see Exercises 13.281 and 13.288). Show the following statement: Assume that a real-valued function f is bounded on an interval I , and that it has a (finite) derivative at some point a ∈ I . Then f is Lipschitz at a. 
Hint. Assume that $\lim_{x\to a} \frac{f(x)-f(a)}{x-a} = L$. Then, there is $\delta > 0$ such that $|f(x) - f(a)| \le (|L| + 1)|x - a|$ whenever $x \in I$, $|x - a| < \delta$. Let a constant $D > 0$ be such that $|f(x)| \le D$ for each $x \in I$. Let $\alpha > 0$ be such that $D < \alpha\delta$. Then, if $|x - a| \ge \delta$ we get $|f(x) - f(a)| \le 2D < 2\alpha\delta \le 2\alpha|x - a|$. Therefore, the constant $\max\{|L| + 1, 2\alpha\}$ can be used to see that $f$ is Lipschitz at $a$.
13.288 Show that the function
$$f(x) = \begin{cases} x^2 \sin\dfrac{1}{x^2} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0, \end{cases}$$
is pointwise Lipschitz (see Exercise 13.287) but not Lipschitz on $[0, 1]$.
Hint. $f'(x)$ exists as a real number at each $x \in [0, 1]$ (at 0 note that $\lim_{x\to 0} x \sin(1/x^2) = 0$). Thus, one can use Exercise 13.287 to see that $f$ is pointwise Lipschitz on $[0, 1]$. If $x \neq 0$, then $f'(x) = 2x \sin(1/x^2) + x^2(-2)x^{-3} \cos(1/x^2) = 2x \sin(1/x^2) - 2x^{-1} \cos(1/x^2)$, which is not a bounded function on $(0, 1]$. Thus, $f$ cannot be Lipschitz on $[0, 1]$. A graph depicting $f$ and $f'$ is shown in Fig. 7.12. The function $f$ was used to give an example of a function having a derivative that is not Riemann integrable, see Remark 686.
13.289 Assume that a function $f$ is uniformly continuous on $\mathbb{R}$. Show that $f$ is Lipschitz on large distances, i.e., for every $d > 0$, there is a $K_d > 0$ such that $|f(x) - f(y)| \le K_d |x - y|$ whenever $|x - y| \ge d$.
Hint. ([BeLi00], p. 18) Let $C := \sup\{|f(x) - f(y)| : |x - y| \le d\}$. If $|x - y| \ge d$, find $n$ ($\le 2|x - y|/d$) points $x = x_0 < x_1 < \cdots < x_n = y$ such that $|x_i - x_{i-1}| \le d$ for each $i$. Thus $|f(x) - f(y)| \le \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| \le nC \le K_d |x - y|$, where $K_d = 2C/d$.
13.290 Let $C$ be a nonempty subset of $\mathbb{R}$ and $f$ be a $K$-Lipschitz function on $C$ for some $K > 0$. Show that $f$ can be extended to a $K$-Lipschitz function on $\mathbb{R}$.
Hint. ([BeLi00], p. 12) Assume $f$ is 1-Lipschitz. Put, for $y \in \mathbb{R}$, $F(y) := \inf\{f(z) + |z - y| : z \in C\}$ (see Fig. 13.36; it is convenient to realize $F$ as the pointwise infimum of the family $\{f_z\}_{z\in C}$, where for each $z \in C$ the function $f_z$ is defined by $f_z(y) := f(z) + |z - y|$, $y \in \mathbb{R}$). To see that $F(y)$ is finite for every $y \in \mathbb{R}$, fix $z_0 \in C$. By the 1-Lipschitz property of $f$, and by the triangle inequality, we have $f(z) + |z - y| \ge f(z_0) - |z - z_0| + |z - y| \ge f(z_0) - |y - z_0|$ for every $z \in C$. $F$ is clearly an extension of $f$ to $\mathbb{R}$. To see that $F$ is 1-Lipschitz, fix $x, y \in \mathbb{R}$. Choose $z_0 \in C$ so that $F(x)$ is close to $f(z_0) + |x - z_0|$.
Since F (y) ≤ f (z0 ) + |y − z0 | we have that, up to a small number, F (y) − F (x) ≤ f (z0 ) + |y − z0 | − f (z0 ) − |x − z0 | ≤ |x − y|. Similarly we prove that F (x) − F (y) ≤ |x − y|.
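The construction in the hint to Exercise 13.290 can be tried out numerically. The following is only a minimal sketch under stated assumptions: the set $C$ is replaced by a finite sample (so the infimum becomes a minimum), and the data (the function `math.sin` and the points in `C`) as well as the helper name `lipschitz_extension` are hypothetical choices made for illustration.

```python
import math


def lipschitz_extension(points, values, K=1.0):
    """Return F(y) = min over z in C of f(z) + K*|z - y| for a finite set C.

    `points` plays the role of C and `values` the role of f restricted to C.
    """
    pairs = list(zip(points, values))

    def F(y):
        # Pointwise minimum of the "cones" f_z(y) = f(z) + K*|z - y|.
        return min(fz + K * abs(z - y) for z, fz in pairs)

    return F


# Hypothetical data: sin is 1-Lipschitz, sampled on a finite set C.
C = [-2.0, -0.5, 0.3, 1.0, 2.5]
F = lipschitz_extension(C, [math.sin(z) for z in C], K=1.0)
```

On the sample points `F` agrees with the original values, and on any grid the difference quotients of `F` stay below `K`, which is what the hint proves in general.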
Fig. 13.36 Extending a Lipschitz function (Exercise 13.290)
13.291 Let $d_{\mathrm{abs}}$ be the absolute-value distance on $\mathbb{R}$. If $S \subset \mathbb{R}$ and $x \in \mathbb{R}$, let the distance of $x$ to $S$ be defined by $d(x, S) = \inf\{d_{\mathrm{abs}}(x, s) : s \in S\}$. Show that $d(\cdot, S)$ is a 1-Lipschitz function. Is the result true in any metric space?
Hint. If $x, y \in \mathbb{R}$ and $s \in S$, then $d_{\mathrm{abs}}(x, s) - d_{\mathrm{abs}}(y, s) \le d_{\mathrm{abs}}(x, y)$. Shuffle around this and infima. Note that the argument depends only on the properties of a distance, so it applies also to any metric space.
13.292 (a) Show that the function defined for $x \in \mathbb{R}$ by $f(x) = |x|$ is 1-Lipschitz on $\mathbb{R}$ (this can be related to Exercise 13.291). (b) Let $f$ be a real-valued continuous function on $\mathbb{R}$. Show that $|f|$ defined on $\mathbb{R}$ by $|f|(x) = |f(x)|$ is continuous on $\mathbb{R}$. (c) Let $f$ and $g$ be two real-valued continuous functions on $\mathbb{R}$. Show that the function $\max\{f, g\}$ defined on $\mathbb{R}$ by $\max\{f, g\}(x) = \max\{f(x), g(x)\}$ is continuous on $\mathbb{R}$.
Hint. (a) $\bigl||x| - |y|\bigr| \le |x - y|$. Note, too, that $|x|$ is the distance from $x$ to the set $\{0\}$, and use Exercise 13.291. (b) Use (a) and the fact that $|f|$ is the composition of $|x|$ and the function $f$ (see Proposition 329). (c) $\max\{f(x), g(x)\} = \frac{1}{2}(f(x) + g(x)) + \frac{1}{2}|f(x) - g(x)|$.
13.293 Assume that $f$ and $g$ are two bounded real-valued Lipschitz functions on a general interval $I$. Show that $fg$ is a Lipschitz function on $I$.
Hint. $|f(x)g(x) - f(y)g(y)| = |(f(x) - f(y))g(x) + f(y)(g(x) - g(y))| \le |f(x) - f(y)| \cdot |g(x)| + |f(y)| \cdot |g(x) - g(y)|$.
13.294 Show that $\cos x$ is a 1-Lipschitz function on $\mathbb{R}$.
Hint. Proposition 445. It is possible to prove the statement directly from a formula for $\cos x - \cos y$.
13.295 An extended-valued function $f$ defined on a metric space $(M, d)$ is said to be lower semicontinuous at $x_0 \in M$ if $\liminf_{x\to x_0} f(x) \ge f(x_0)$ (the notion of limes inferior of $f$ at a point $x_0$ in $M$ is the natural extension of Definition 310 to the case of a metric space; we do not request that the point $x_0$ should be an accumulation point).
The function $f$ is said to be lower semicontinuous if it is lower semicontinuous at every $x_0 \in M$. Prove the following equivalences:
(i) $f$ is lower semicontinuous at $x_0$.
(ii) If $f(x_0) = +\infty$, then $f(x) \to +\infty$ as $x \to x_0$, and if $f(x_0) < +\infty$, then given $\varepsilon > 0$ there exists a neighborhood $U$ of $x_0$ such that $f(x) \ge f(x_0) - \varepsilon$ for all $x \in U$.
Moreover, the two following statements are equivalent:
(a) $f$ is lower semicontinuous.
(b) For every $r \in \mathbb{R}$, the set $\{x \in M : f(x) \le r\}$ is closed.
Formulate and prove the corresponding concept and, respectively, properties of upper semicontinuity.
Hint. To prove the equivalence between (i) and (ii) is routine. Assume (a). Fix $r \in \mathbb{R}$, and let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $S := \{x \in M : f(x) \le r\}$ that converges to $x_0 \in M$. If $f(x_0) = +\infty$ then, by (ii) above, $f(x_n) \to +\infty$, and this is impossible. If $r < f(x_0)$ ($< +\infty$) then, again by (ii) above, there exists a neighborhood $U$ of $x_0$ such that $f(x) > r$ for $x \in U$, and we arrive at a contradiction since $x_n \in U$ for $n$ big enough. This shows that $x_0 \in S$, and so $S$ is closed. Assume now that $\{x \in M : f(x) \le r\}$ is closed for all $r \in \mathbb{R}$. Fix $x_0 \in M$. If $f(x_0) = +\infty$, and $r \in \mathbb{R}$ is arbitrary, then the set $U := \{x \in M : f(x) > r\}$ is open and contains $x_0$, hence there exists $n_0 \in \mathbb{N}$ such that $B(x_0, 1/n_0) \subset U$. This shows that $\inf\{f(x) : x \in B(x_0, 1/n)\} \ge r$ for all $n \ge n_0$, hence $\lim_{n\to\infty} \inf\{f(x) : x \in B(x_0, 1/n)\} = +\infty$. If, on the contrary, $f(x_0) < +\infty$, and $\varepsilon > 0$ is arbitrary, the set $U := \{x \in M : f(x) > f(x_0) - \varepsilon\}$ is open and contains $x_0$, so there exists $n_0 \in \mathbb{N}$ such that $B(x_0, 1/n_0) \subset U$. This shows that $\inf\{f(x) : x \in B(x_0, 1/n)\} \ge f(x_0) - \varepsilon$ for all $n \ge n_0$, hence $\liminf_{x\to x_0} f(x) := \lim_{n\to\infty} \inf\{f(x) : x \in B(x_0, 1/n)\} \ge f(x_0)$.
13.296 Prove that the characteristic function $\chi_S$ of a subset $S$ of a metric space $(M, d)$ is lower semicontinuous if and only if the set $S$ is open.
Hint. Assume first that $\chi_S$ is lower semicontinuous. Then, $\{x \in M : \chi_S(x) \le 1/2\} = M \setminus S$, and this set is closed according to Exercise 13.295, hence $S$ is open. Assume now that $S$ is open. Observe that, for $r \in \mathbb{R}$, the set $\{x \in M : \chi_S(x) \le r\}$ is either $M$, $M \setminus S$, or, finally, the empty set (all of them closed in $M$). From Exercise 13.295, the function $\chi_S$ is lower semicontinuous.
13.297 Prove the following result of Baire: Let $g$ be a real-valued lower semicontinuous bounded below function on $\mathbb{R}$.
Then, there is an increasing sequence $\{g_n\}_{n=1}^{\infty}$ of real-valued Lipschitz functions on $\mathbb{R}$ such that $\lim g_n(x) = g(x)$ for every $x \in \mathbb{R}$.
Hint. For $n \in \mathbb{N}$, define $g_n(x) := \inf\{g(z) + n|x - z| : z \in \mathbb{R}\}$. The graphs of the functions $g_1$ and $g_2$ are plotted in Fig. 13.37. If $g(x) \ge -K$ for all $x \in \mathbb{R}$, then $g_1(x) \ge -K$ for all $x \in \mathbb{R}$. Moreover, $-K \le g_1 \le g_2 \le \ldots \le g$. Since, by the triangle inequality, $g_n(x) \le g(z) + n|y - z| + n|x - y|$ for all $x, y, z \in \mathbb{R}$, we have $g_n(x) \le g_n(y) + n|x - y|$ and thus $|g_n(x) - g_n(y)| \le n|x - y|$ for all $x, y \in \mathbb{R}$. Given $x \in \mathbb{R}$, choose $z_n$ such that $g(z_n) + n|x - z_n| < g_n(x) + \frac{1}{n} \le g(x) + \frac{1}{n}$. Therefore, $n|x - z_n| < g(x) + \frac{1}{n} + K$ and thus $z_n \to x$. Therefore, by the lower semicontinuity of $g$, we have $g(x) \le \liminf g(z_n) \le \lim g_n(x) \le g(x)$. Note: The requirement on the boundedness below of $g$ can be removed by the trick of composition of $g$ with the function $\varphi(t) = \frac{t}{1+|t|}$. Note, too, that it follows from this that every lower semicontinuous real-valued function on $\mathbb{R}$ is of Baire class 0 or 1. Recall that continuous functions are called of
Fig. 13.37 Functions $g_1$ and $g_2$ in Exercise 13.297
Fig. 13.38 Approximating a continuous function by a Lipschitz function (Exercise 13.298)
Baire class 0 and functions that are not in Baire class 0 but are pointwise limits of sequences of continuous functions are called of Baire class 1.
13.298 Let $C[a, b]$, the space of all real-valued continuous functions on a closed and bounded interval $[a, b]$, be endowed with the $d_\infty$ distance (see Example 551.4). Is the set of Lipschitz functions closed there?
Hint. No. In fact, the set of all Lipschitz functions on $[a, b]$ is dense in $(C[a, b], d_\infty)$. Indeed, each $f \in C[a, b]$ can be approximated in the metric $d_\infty$ by a sequence of piecewise linear functions (each of them clearly Lipschitz), due to the uniform continuity of $f$. See Fig. 13.38 for a sketch of the approximation. For another approach to this result, see Exercise 13.380.
13.299 Is the circle homeomorphic to $[0, 1]$?
Hint. No. Removing the point $1/2$ from $[0, 1]$ leaves a disconnected set, while removing any single point from the circle leaves a connected set, and a homeomorphism would preserve this property. Note also that the mapping $f(t) := (\cos 2\pi t, \sin 2\pi t)$ is continuous and one-to-one from $[0, 1)$ onto the circle, but it is not a homeomorphism: its inverse, were it continuous, would map a compact space onto a noncompact one, something impossible by Theorem 334.
13.300 Show that the function $\arctan x$ is 1-Lipschitz on $\mathbb{R}$.
Hint. Use the Mean Value Theorem 365 on $[x, y]$ for $x, y \in \mathbb{R}$. For a graph of the function $\arctan$ on $[-10, 10]$ see Fig. 4.35.
Fig. 13.39 Functions f1 , f2 , f3 for C := [−1, 1] (Exercise 13.302)
13.4.5 Function Convergence
Pointwise and Almost Everywhere Convergence
13.301 Show that the characteristic function of a finite set in $\mathbb{R}$ is a Baire class 1 function.
Hint. This is clear if the finite set is empty. Otherwise, observe that the characteristic function $f$ of the one-point set $\{0\}$ is the pointwise limit of a sequence $\{f_n\}$ of continuous functions $f_n$ defined on $\mathbb{R}$ by $f_n(0) = 1$, $f_n(x) = 0$ for $|x| \ge \frac{1}{n}$, and $f_n$ linear on $(0, \frac{1}{n})$ and on $(-\frac{1}{n}, 0)$, for $n \in \mathbb{N}$. The result in this exercise is a particular case of the result in Exercise 13.302.
13.302 Show that the characteristic function of any closed set in $\mathbb{R}$ is a Baire class 1 function on $\mathbb{R}$.
Hint. If $C$ is a closed set in $\mathbb{R}$, consider the functions $\left\{\frac{1}{1+n\,d(x,C)}\right\}_{n=1}^{\infty}$, where $d(\cdot, C)$ denotes the distance function to $C$ (see Fig. 13.39). An alternative approach is the following: The complement of $C$ is an open set, hence its characteristic function $\chi_{C^c}$ is lower semicontinuous (see Exercise 13.296) and bounded below, so we can use Exercise 13.297 to conclude that $\chi_{C^c}$ is in the Baire class 1. Clearly, so is $\chi_C$.
13.303 Let $f$ be a Baire class 1 function and $g$ be a measurable function. Show that $f \circ g$ is a measurable function.
Hint. The pointwise limit of a sequence of measurable functions is measurable, and $f \circ g$ is measurable if $f$ is continuous and $g$ measurable (see Corollary 406).
13.304 Show that the set of points of continuity of Baire class 1 functions on $[0, 1]$ may have positive measure strictly less than one.
Hint. Consider the characteristic function of a Cantor ternary set of positive measure (see Sect. 3.1.5 and Exercise 13.154). See also Exercises 13.185 and 13.302.
13.305 Show that the Dirichlet function $D$ on $\mathbb{R}$ introduced in Definition 296 (i.e., the characteristic function of the set $P$ of all irrational numbers) is a Baire class 2 function that is not Baire class 1.
Hint. To show that $D$ is not in the Baire class 1, use Theorem 456 and Exercise 13.185. To show that $D$ is in the Baire class 2 consider $1 - \lim_{n\to\infty}\left(\lim_{m\to\infty}(\cos(n!\pi x))^{2m}\right)$. Alternatively, if $\{r_n\}_{n=1}^{\infty}$ is the sequence of all rational numbers, let $f_n$ be the characteristic function of the set $\{r_1, r_2, \ldots, r_n\}$, $n \in \mathbb{N}$. Check that $1 - D$ is the pointwise limit of the sequence $\{f_n\}$. Then use Exercise 13.301.
13.306 Let $f$ be a Baire class 1 function on $\mathbb{R}$. Show that the preimage of every open set $U$ in $\mathbb{R}$ is an $F_\sigma$ set in $\mathbb{R}$.
Hint.
$$f^{-1}(U) = \bigcup_{k=1}^{\infty} \bigcup_{m=1}^{\infty} \left\{x \in \mathbb{R} : \operatorname{dist}(f_n(x), \mathbb{R} \setminus U) \ge \frac{1}{k} \text{ for all } n \ge m\right\},$$
where $\{f_n\}_{n=1}^{\infty}$ is a sequence of continuous functions that converges pointwise to $f$. Note that the property in this exercise actually characterizes the Baire class 1 functions, as does the fact that for each closed subset $C$ of $\mathbb{R}$ the restriction of the function to $C$ has a point of continuity (this is the Baire Great Theorem, see e.g., [DGZ93, Theorem I.4.1]).
Fig. 13.40 The first five functions $f_n$ on $[0, 3]$ in Exercise 13.307
Uniform Convergence
13.307 Let $f_n(x) = \frac{2nx}{1+n^2x^2}$ for $n \in \mathbb{N}$ and $x \in \mathbb{R}$. Decide on the uniform convergence of the sequence $\{f_n\}$ on (i) $[0, 1]$, and on (ii) $[1, +\infty)$.
Hint. First of all, note that $f_n(x) \to 0$ for every $x \in \mathbb{R}$. Note, too, that $\max_{x\in[0,+\infty)} f_n(x) = f_n(\frac{1}{n})$, as follows by a simple graphing (see Fig. 13.40). Finally observe that
(i) $f_n(\frac{1}{n}) = 1 \not\to 0$, so the convergence of $\{f_n\}$ to 0 is not uniform on $[0, 1]$.
(ii) $\max_{x\ge 1} f_n(x) = f_n(1)$ by graphing (see again Fig. 13.40). Since $f_n(1) = \frac{2n}{1+n^2} \to 0$, the sequence converges to 0 uniformly on $[1, +\infty)$.
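The two cases in Exercise 13.307 can be illustrated numerically. This is a rough sketch, not a proof: the supremum is only approximated on a finite grid, the unbounded interval $[1, +\infty)$ is truncated to $[1, 50]$ (an assumption justified by the fact that $f_n$ is decreasing there), and the helper names `f` and `sup_on_grid` are hypothetical.

```python
def f(n, x):
    """The function f_n(x) = 2nx / (1 + n^2 x^2) of Exercise 13.307."""
    return 2 * n * x / (1 + n**2 * x**2)


def sup_on_grid(n, a, b, samples=20001):
    """Approximate sup of f_n on [a, b] by sampling an evenly spaced grid."""
    step = (b - a) / (samples - 1)
    return max(f(n, a + i * step) for i in range(samples))


for n in (1, 10, 100):
    # On [0, 1] the grid supremum stays near 1; on [1, 50] it equals f_n(1).
    print(n, sup_on_grid(n, 0.0, 1.0), sup_on_grid(n, 1.0, 50.0))
```

The first column of suprema does not decay, while the second behaves like $2n/(1+n^2)$, in line with (i) and (ii).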
13.308 Decide on the uniform convergence of the sequence $\{f_n\}_{n=1}^{\infty}$ on $(0, 1)$, where $f_n(x) = 1 - x^n$ for $n \in \mathbb{N}$.
Hint. See Fig. 13.41 for the plot of the first six functions of the sequence. An indirect way to prove that the sequence does not converge uniformly is to consider the sequence on $[0, 1]$. Should the convergence be uniform on $[0, 1]$, the pointwise limit would be continuous (Theorem 463), which is not the case. Regarding the convergence on $(0, 1)$, observe that by adding the points 0 and 1 to the interval $(0, 1)$, and ensuring that $\lim_{n\to\infty} f_n(0)$ and $\lim_{n\to\infty} f_n(1)$ both exist, the convergence character of the sequence does not change.
Fig. 13.41 The first six functions fn on [0, 1] in Exercise 13.308
Fig. 13.42 The first five summands on [0, 5] in Exercise 13.310
13.309 Does $\sum (x^n - x^{n+1})$ converge uniformly on $[0, 1]$?
Hint. No. By a telescopic argument (see Remark 678), the partial sums are $1 - x^n$, which do not converge uniformly on $(0, 1)$ (see Exercise 13.308).
13.310 Decide on the uniform convergence of $\sum x^2 e^{-nx}$ on $[0, +\infty)$.
Hint. It converges uniformly. Indeed, by graphing (see Fig. 13.42), $\max_{x\ge 0} x^2 e^{-nx} = \left(\frac{2}{n}\right)^2 e^{-2}$. Then use the Weierstrass M-test (Theorem 473).
13.311 Assume that the series $\sum f_n(x)$ is uniformly convergent on the interval $(a, b)$, and that $b_n := \lim_{x\to b^-} f_n(x)$ exists for each $n \in \mathbb{N}$. Show that $\sum b_n$ is convergent.
Hint. Given $\varepsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that
$$\left|\sum_{k=n}^{m} f_k(x)\right| < \varepsilon \qquad (13.18)$$
for every $x \in (a, b)$ and $n_0 \le n \le m$. By letting $x \to b^-$ in (13.18) we get $\left|\sum_{k=n}^{m} b_k\right| \le \varepsilon$. Then use the Cauchy condition. Note that the existence of $\lim_{x\to b^-} f_n(x)$ for each $n \in \mathbb{N}$ must be requested, as it can be that $f_n(x) \to \infty$ as $x \to b^-$ for some $n \in \mathbb{N}$. It is true that the uniform convergence hypothesis shows that this cannot happen infinitely many times. Indeed, it follows again from (13.18) that $|f_k(x)| < \varepsilon$ for every $x \in (a, b)$ and $k \ge n_0$.
13.312 Assume that for each $n \in \mathbb{N}$, $f_n$ is a real-valued function on $\mathbb{R}$ so that $\lim_{x\to\infty} f_n(x) = a_n$. Assume that $f_n \to f$ uniformly on $\mathbb{R}$. Show that $\lim_{x\to\infty} f(x) = \lim a_n$. Show a counterexample if the convergence is not uniform.
Hint. Standard from the definition of uniform convergence. For the counterexample, consider $\chi_{[0,n]}$, the characteristic function of the interval $[0, n]$.
Fig. 13.43 The first five functions fn in Exercise 13.315
13.313 Assume that $\sum f_n$ converges absolutely and converges uniformly on an interval $I$. Is it true that $\sum |f_n|$ converges uniformly on $I$?
Hint. No. Consider $\sum (-1)^n \frac{x^n}{n}$ on $(0, 1)$. Use Exercise 13.311.
13.314 Decide on the uniform convergence of the series
$$\sum_{n} \arctan \frac{x}{x^2 + n^3}$$
on $(-\infty, \infty)$.
Hint. Put $f(x) := \arctan x$. By the Mean Value Theorem 365, we readily obtain $|f(x)| \le |x|$ for all $x \in \mathbb{R}$. This shows that $|f(x/(x^2 + n^3))| \le |x|/(x^2 + n^3)$ for all $x \in \mathbb{R}$. It is easy to show that $x/(x^2 + n^3)$ attains its supremum at $n^{3/2}$, with value $(1/2)n^{-3/2}$. Use now the Weierstrass M-test (Theorem 473).
13.315 Decide on the uniform convergence of the sequence $\{f_n\}_{n=1}^{\infty}$, where $f_n(x) = \sqrt[n]{1 + x^n}$, $n \in \mathbb{N}$, are defined on the interval $[0, 2]$.
Hint. Yes. Observe that if $0 \le x \le 1$, then $1 \le \sqrt[n]{1 + x^n} \le \sqrt[n]{2} \to 1$. If $1 \le x \le 2$, then $x \le \sqrt[n]{1 + x^n} \le \sqrt[n]{2x^n} = \sqrt[n]{2}\,x$, and $\sqrt[n]{2}\,x - x = x(\sqrt[n]{2} - 1) \le 2(\sqrt[n]{2} - 1) \to 0$. See Fig. 13.43.
13.316 For $n \in \mathbb{N}$, let
$$f_n(x) = \left(1 + \frac{x}{n}\right)^n, \quad x \ge 0.$$
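Decide on the uniform convergence of the sequence $\{f_n\}_{n=1}^{\infty}$ on $[0, 1]$ and on $[0, \infty)$.
Hint.
$$\begin{aligned}
e^x - \left(1 + \frac{x}{n}\right)^n &= \sum_{k=0}^{\infty} \frac{x^k}{k!} - \sum_{k=0}^{n} \binom{n}{k} \frac{x^k}{n^k} \\
&= \sum_{k=0}^{n} \frac{x^k}{k!}\left(1 - \frac{n!}{n^k (n-k)!}\right) + \sum_{k=n+1}^{\infty} \frac{x^k}{k!} \\
&= \sum_{k=0}^{n} \frac{x^k}{k!}\left(1 - \frac{(n-k+1)(n-k+2)\cdots n}{n^k}\right) + \sum_{k=n+1}^{\infty} \frac{x^k}{k!}
\end{aligned} \qquad (13.19)$$
Fig. 13.44 The first six functions $f_n$ on $[0, 5]$ in Exercise 13.316
Fig. 13.45 The first five functions $f_n$ and $g_n$ in Exercise 13.317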
See Fig. 13.44. Observe that on $[0, +\infty)$, the function $h_n(x) := e^x - \left(1 + \frac{x}{n}\right)^n$ is increasing (and nonnegative), since $h_n'(x) = e^x - \left(1 + \frac{x}{n}\right)^{n-1} \ge e^x - \left(1 + \frac{x}{n}\right)^n \ge 0$, as follows from (13.19). This shows that, for $x \in [0, 1]$, we have $0 \le h_n(x) \le h_n(1) = e - (1 + 1/n)^n \to 0$ as $n \to \infty$ (Definition 217), and this proves the uniform convergence (to $e^x$) of the given sequence on $[0, 1]$. The convergence (still to $e^x$) is not uniform on $[0, \infty)$, since $\lim_{x\to+\infty} \left(e^x - \left(1 + \frac{x}{n}\right)^n\right) = \infty$ for each $n$.
13.317 Let $f_n(x) := \arctan nx$, $g_n(x) := x \arctan nx$, for $n \in \mathbb{N}$. Decide on the uniform convergence of $\{f_n\}$ and $\{g_n\}$ on $[0, \infty)$.
Hint. See Fig. 13.45. Concerning $\{f_n\}$: No; the sequence of functions converges to
$$f(x) := \begin{cases} 0 & \text{if } x = 0, \\ \frac{\pi}{2} & \text{if } x > 0, \end{cases}$$
which is discontinuous.
Concerning $\{g_n\}$: First of all, it converges uniformly to the function $g(x) := \frac{\pi}{2}x$ on $[1, \infty)$. To see this, first note that by L'Hôspital's rule (Theorem 376),
$$\lim_{z\to\infty} \frac{\frac{\pi}{2} - \arctan z}{\frac{1}{z}} = 1.$$
Thus, there is $z_0$ such that for $z \ge z_0$ we have
$$\frac{\pi}{2} - \arctan z \le \frac{2}{z}.$$
So, for $n > z_0$ and for all $x \ge 1$, we have
$$\frac{\pi}{2} - \arctan nx \le \frac{2}{nx}.$$
It follows that for $n > z_0$ and for all $x \ge 1$, we have
$$x\left(\frac{\pi}{2} - \arctan nx\right) \le \frac{2}{n}.$$
Therefore, $x \arctan nx$ converges uniformly to $g(x) = \frac{\pi}{2}x$ on $[1, \infty)$. On the other hand, the sequence $g_n$ is increasing, and thus the uniformity of its convergence on $[0, 1]$ to the function $\frac{\pi}{2}x$ follows from Dini's Theorem 468.
13.318 Find the sum of the power series $f(x) := \sum_{n=0}^{\infty} \frac{x^{2n+1}}{2n+1}$ for $|x| < 1$.
Hint. The power series has radius of convergence 1, hence $f'$ exists on $(-1, 1)$, and $f'(x) = \sum_{n=0}^{\infty} x^{2n} = \frac{1}{1-x^2}$ for $|x| < 1$ (see Theorem 511). It follows that $f(x) = \frac{1}{2}\ln\frac{1+x}{1-x} + C$ for some constant $C$, and $C = 0$ by inspecting the point $x = 0$.
13.319 Find the sum of the power series $f(x) := \sum_{n=1}^{\infty} n x^n$ for $|x| < 1$.
Hint. Consider the power series $\sum_{n=0}^{\infty} x^n$, whose radius of convergence is 1, summing $(1 - x)^{-1}$ for $|x| < 1$ (see formula (2.20)). According to Theorem 511, we can take derivatives on both sides of $(1 - x)^{-1} = \sum_{n=0}^{\infty} x^n$ to get, after multiplying by $x$, $\sum_{n=1}^{\infty} n x^n = x(1 - x)^{-2}$ for $|x| < 1$. Again, this series has radius of convergence 1. The same argument allows us to take derivatives on both sides; after multiplying by $x$ we get $\sum_{n=1}^{\infty} n^2 x^n = x(1 + x)(1 - x)^{-3}$, valid for $|x| < 1$.
13.320 Assume that $\{a_n\}$ is a strictly decreasing sequence of nonnegative numbers such that $a_1 := 1$, $a_n \to 0$, and $\sum a_n$ diverges. Denote by $\mathrm{Id}$ the identity mapping on $[0, 1]$. Let the nonnegative functions $f_n$ be defined on $[0, 1]$ by $f_n := \mathrm{Id} \cdot \chi_{(a_{n+1}, a_n]}$, $n \in \mathbb{N}$. Show that $\sum f_n(x) = x$ uniformly on $[0, 1]$ and that $\sum \sup f_n$ diverges. Compare this with the Weierstrass convergence M-test (Theorem 473).
Hint. Standard. This shows that the test gives only a sufficient condition.
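The closed forms obtained in Exercises 13.318 and 13.319 can be compared with partial sums at a sample point. The sketch below is only a numerical sanity check under stated assumptions: the point $x = 0.5$ and the cutoff of 200 terms are arbitrary choices, and `partial_sum` is a hypothetical helper.

```python
import math


def partial_sum(term, x, n_terms, start):
    """Sum term(n, x) for n = start, ..., start + n_terms - 1."""
    return sum(term(n, x) for n in range(start, start + n_terms))


x = 0.5  # any sample point with |x| < 1

odd_series = partial_sum(lambda n, t: t ** (2 * n + 1) / (2 * n + 1), x, 200, 0)
n_series = partial_sum(lambda n, t: n * t**n, x, 200, 1)
n2_series = partial_sum(lambda n, t: n * n * t**n, x, 200, 1)

# Closed forms from the hints.
odd_closed = 0.5 * math.log((1 + x) / (1 - x))
n_closed = x / (1 - x) ** 2
n2_closed = x * (1 + x) / (1 - x) ** 3
```

Each partial sum should agree with its closed form up to rounding, since the tails are geometrically small at this point.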
Fig. 13.46 The function $\sum_{k=1}^{n} f_k$ in Exercise 13.320
13.321 Show that there does not exist a polynomial $p$ such that
$$\bigl|p(x) - |x|\bigr| < 1 \quad \text{for all } x \in \mathbb{R}. \qquad (13.20)$$
Compare with Lemma 484.
Hint. Assume that $p(x) := a_n x^n + \ldots + a_1 x + a_0$, where $a_n \neq 0$. If $n \ge 2$, for $x > 0$, we get from (13.20) that
$$\left|\frac{a_0}{x^n} + \frac{a_1}{x^{n-1}} + \ldots + a_n - \frac{1}{x^{n-1}}\right| < \frac{1}{x^n}. \qquad (13.21)$$
Then, by taking the limit in (13.21) as $x \to +\infty$, we get that $a_n = 0$, a contradiction. Obviously, if $n = 0$, then we obtain also a contradiction, since $p$ is, in this case, a constant function. Thus, we are left only with the possibility $n = 1$. We get, from (13.20), that
$$\left|\frac{a_0}{x} + a_1 - 1\right| < \frac{1}{x} \quad \text{for all } x > 0. \qquad (13.22)$$
Taking the limit in (13.22) as $x \to +\infty$, we get $a_1 = 1$. Similarly,
$$\left|\frac{a_0}{x} + a_1 + 1\right| < \frac{1}{|x|} \quad \text{for all } x < 0. \qquad (13.23)$$
By taking the limit as $x \to -\infty$ in (13.23), we get $a_1 = -1$. Thus, we arrive at a contradiction with the existence of such a polynomial.
13.322 We say that a sequence of functions $\{f_n\}$ defined on a nonempty subset $D$ of $\mathbb{R}$ converges locally uniformly to a function $f$ on $D$, if for each $x \in D$, there is a neighborhood $U(x)$ in $D$ such that $\{f_n\}$ converges uniformly to $f$ on $U(x)$.
(i) Show that $\{f_n\}$ defined on $(0, 1)$ by $f_n(x) := x^n$, $n \in \mathbb{N}$, converges locally uniformly to the function identically equal to 0 on $(0, 1)$, although it does not converge uniformly on $(0, 1)$.
(ii) Show that if $K$ is a nonempty compact subset of $\mathbb{R}$ and $\{f_n\}$ converges locally uniformly to $f$ on $K$, then $f_n$ converges uniformly to $f$ on $K$.
Hint. (i) Given $x \in (0, 1)$, choose $\delta \in (x, 1)$. We proved in Remark 464 (see also Example 454 and Fig. 5.1) that the sequence $\{x^n\}_{n=1}^{\infty}$ converges uniformly to the function 0 on the neighborhood $(0, \delta)$ of $x$. That the convergence cannot be uniform on $[0, 1]$ follows from Theorem 463 (see again Remark 464 and also Exercise 13.308). It is clear, then, that it cannot be uniform on $(0, 1)$ ($= [0, 1] \setminus \{0, 1\}$). (ii) For $x \in K$, let $U(x)$ be a neighborhood of $x$ in $K$ on which $f_n \to f$ uniformly. Due to the compactness of $K$, we can choose a finite set $\{x_i : i = 1, 2, \ldots, n\}$ in $K$ such that $\bigcup_{i=1}^{n} U(x_i)$ covers $K$. Given $\varepsilon > 0$ and $i$, let $n_i \in \mathbb{N}$ be so that $|f_n(x) - f(x)| < \varepsilon$ for all $x \in U(x_i)$ and $n \ge n_i$. Let $n_0 := \max_i\{n_i\}$. Then, if $n > n_0$, we have $|f_n(x) - f(x)| \le \varepsilon$ for all $x \in K$.
13.323 Show that in Egorov's Theorem 478, applied for $[0, 1]$, one cannot, in general, get the measure of the set of uniform convergence to be equal to 1.
Hint. For $n \in \mathbb{N}$, let $f_n(x)$ be defined to be $\frac{1}{n}$ on $(1/n, 1]$, $n$ on $(0, \frac{1}{n}]$, and 0 at $x = 0$. Assume that the sequence $\{f_n\}$ converges uniformly (to 0) on a measurable set $A$ such that $\lambda(A) = 1$. Since, for each $n \in \mathbb{N}$, $A \cap (0, 1/n] \neq \emptyset$, we can construct a sequence $\{a_n\}_{n=1}^{\infty}$ in $A \setminus \{0\}$ such that $a_n \downarrow 0$. The sequence $\{f_n\}$ converges uniformly to 0 on $N := \{a_n : n \in \mathbb{N}\}$. This leads to a contradiction.
13.324 Find an example of a sequence $\{f_n\}_{n=1}^{\infty}$ of real-valued continuous functions on $\mathbb{R}$ such that $\lim_{n\to\infty} f_n(x) = 0$ for every $x \in \mathbb{R}$ and on no interval $(a, b) \subset \mathbb{R}$ does the sequence $\{f_n\}_{n=1}^{\infty}$ converge uniformly. Compare this with Egorov's Theorem 478.
Hint. Let $\varphi(x) = 2x$ on $[0, 1/2]$, $\varphi(x) = 2 - 2x$ on $(1/2, 1]$, and $\varphi(x) = 0$ otherwise on $\mathbb{R}$. Then, $\lim_{n\to\infty} \varphi(2^n x) = 0$ for every $x \in \mathbb{R}$. Let $\{r_i\}_{i=1}^{\infty}$ be a dense sequence in $\mathbb{R}$. Define $f_n(x) := \sum_{i=1}^{\infty} 2^{-i} \varphi(2^n (x - r_i))$, for $x \in \mathbb{R}$. Then, as a sum of a uniformly convergent series of continuous functions, $f_n$ is continuous on $\mathbb{R}$ (see Theorem 463), and for a similar reason, $\lim f_n(x) = 0$ for every $x \in \mathbb{R}$. If $(a, b) \subset \mathbb{R}$, then $r_i \in (a, b)$ for some $i$, and we have $\sup\{f_n(x) : x \in (a, b)\} \ge \frac{1}{2^i}$ for sufficiently large $n$. This shows that $f_n$ does not converge uniformly on $(a, b)$.
In fact, if $E$ is any set on which $\{f_n\}$ converges uniformly, then $E$ is nowhere dense. Indeed, assume that $\{f_n\}$ converges uniformly on a set $E$. Then, if $\alpha_n := \sup\{f_n(x) : x \in E\}$, then $\alpha_n \to 0$. As the functions $f_n$ are all continuous, $\alpha_n$ is also the supremum of $f_n$ on $\overline{E}$. Hence, $\{f_n\}$ converges uniformly on $\overline{E}$. From what we have shown, $\overline{E}$ cannot contain an open interval. Therefore, any set on which $\{f_n\}$ converges uniformly is nowhere dense.
Local Approximation by Polynomials
13.325 Find the Taylor polynomial of the third order for the function $f(x) = \sqrt{x}$ at the point 1.
Hint. Standard. The result: $1 + \frac{1}{2}(x - 1) - \frac{1}{8}(x - 1)^2 + \frac{1}{16}(x - 1)^3$. See Fig. 13.47.
13.326 Find the flaw in the following argument: We show that $\pi = 2$. Consider a line segment of length 1. Consider semicircles of radii $\frac{1}{2^{n+1}}$ with centers at $\frac{2k-1}{2^{n+1}}$ for $k \in \{1, 2, \ldots, 2^n\}$ and $n \in \{0, 1, \ldots\}$ (Fig. 13.48 shows the graphs for $n = 0$ to $n = 3$; see also the beauty of superimposing the graphs in Fig. 13.49). The total length of the semicircles at level $n$ is given by
$$2^n \cdot \frac{\pi}{2^{n+1}} = \frac{\pi}{2},$$
which is independent of $n$. Note that the graphs of the waves approach uniformly the zero function. Therefore, the lengths of the semicircles must approach the length of the line segment $[0, 1]$. Thus, we conclude that $\pi = 2$.
Hint. Assume that a sequence of functions $\{f_n\}_{n=1}^{\infty}$ converges uniformly to a function $f$. Compute the length of the graph of $f_n$. This length depends on $f_n'$, not on $f_n$, and there is no reason for the sequence $\{f_n'\}_{n=1}^{\infty}$ to converge uniformly.
Fig. 13.47 The function and its degree-3 Taylor polynomial at $x_0 = 1$ (Exercise 13.325)
Fig. 13.48 The first four functions in the construction (Exercise 13.326)
Fig. 13.49 Superimposing the graphs (Exercise 13.326)
13.327 Express the polynomial $P(x) = 1 + 3x + 5x^2 - 2x^3$ as a Taylor polynomial centered at $x = -1$.
Hint. $P(-1) = 5$, $P'(-1) = -13$, $P''(-1) = 22$, $P'''(-1) = -12$. By the formula for the Taylor polynomial, the result is $5 - 13(x + 1) + 11(x + 1)^2 - 2(x + 1)^3$.
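The recentering in Exercise 13.327 can also be carried out mechanically. The sketch below does not use the derivative formula from the hint; it swaps in repeated synthetic division (Horner's scheme) by $(x - c)$, which yields the same coefficients. The function name `recenter` is a hypothetical helper introduced for illustration.

```python
def recenter(coeffs, c):
    """Rewrite a polynomial in powers of (x - c).

    coeffs[i] is the coefficient of x**i. The returned list b satisfies
    P(x) = sum(b[i] * (x - c)**i), i.e. b[i] = P^(i)(c) / i!.
    """
    work = list(coeffs)
    n = len(work)
    result = []
    for k in range(n):
        # One pass of synthetic division by (x - c) on the current quotient;
        # the remainder ends up in work[k] and is the next Taylor coefficient.
        for i in range(n - 2, k - 1, -1):
            work[i] += c * work[i + 1]
        result.append(work[k])
    return result


# P(x) = 1 + 3x + 5x^2 - 2x^3 recentered at c = -1.
print(recenter([1, 3, 5, -2], -1))
```

For the polynomial of the exercise this reproduces the coefficients $5, -13, 11, -2$ stated in the hint.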
13.4.6 Function Series
Power Series
13.328 Prove the following result: Let $P$ be a polynomial of degree $n$ on an interval $(a, b)$, and let $x_0 \in (a, b)$. Then $P(x) = \sum_{i=0}^{n} a_i (x - x_0)^i$, where $a_i = P^{(i)}(x_0)/i!$ for $i = 0, 1, 2, \ldots, n$.
Hint. Consider the ($(n+1)$-dimensional) linear space $F$ of functions generated by the monomials $\{1, x, x^2, \ldots, x^n\}$. The functions $1, (x - x_0), (x - x_0)^2, \ldots, (x - x_0)^n$ are all in $F$, and they form a linearly independent set, hence an algebraic basis of $F$. Therefore, $P$ is a linear combination of these functions. The precise value of the coefficients is in the statement of Proposition 497.
13.329 Find the radius of convergence and the interval of convergence of the series
$$\sum_{n=1}^{\infty} \frac{x^n}{3^n + 2^n}.$$
Hint. The radius of convergence is 3: Use $\sqrt[n]{3^n + 2^n} = \sqrt[n]{3^n\left(1 + \left(\frac{2}{3}\right)^n\right)} \to 3$ and Lemma 178. The interval of convergence is $(-3, 3)$: Indeed, for $|x| = 3$, the general term of the series does not converge to 0.
13.330 Find the radius of convergence and the interval of convergence of the series
$$\sum_{n=1}^{\infty} \left(\frac{\pi}{2} - \arctan n\right) x^n.$$
Hint. Put $a_n = \frac{\pi}{2} - \arctan n$. Then $\frac{a_{n+1}}{a_n} \to 1$. Thus, the radius of convergence is 1 (see Lemma 178). Furthermore, $\frac{a_n}{1/n} \to 1$. Thus, the interval is $[-1, +1)$ (use the harmonic series, Proposition 161, and the Leibniz criterion, Corollary 183).
13.331 Find power series with the following interval of convergence: (i) $\{0\}$. (ii) $(-\infty, +\infty)$. (iii) $(-1, +1)$. (iv) $(-1, 1]$. (v) $[-1, +1)$. (vi) $[-1, +1]$.
Hint. (i) $\sum n!\,x^n$, (ii) $\sum \frac{x^n}{n!}$, (iii) $\sum x^n$, (iv) $\sum (-1)^n \frac{x^n}{n}$, (v) $\sum \frac{x^n}{n}$, (vi) $\sum \frac{x^n}{n^2}$.
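The radius found in Exercise 13.329 can be illustrated numerically. This is a minimal sketch, not a proof: it evaluates the $n$-th root from the hint at one large index and the general term at the endpoint $x = 3$. The function names `root_estimate` and `term_at_endpoint` are hypothetical helpers.

```python
def root_estimate(n):
    """n-th root of 3^n + 2^n, which the hint shows tends to 3."""
    return (3**n + 2**n) ** (1.0 / n)


def term_at_endpoint(n):
    """General term x^n / (3^n + 2^n) of the series evaluated at x = 3."""
    return 3**n / (3**n + 2**n)


for n in (1, 10, 100, 200):
    print(n, root_estimate(n), term_at_endpoint(n))
```

The root estimates approach 3, while the endpoint terms approach 1 rather than 0, which matches the open interval $(-3, 3)$ in the hint.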
Fig. 13.50 Several tangent lines to ex (Exercise 13.334)
The Taylor Series
13.332 Use the Taylor expansion to show that $\lim_{x\to+\infty} \frac{e^x}{x^n} = \infty$ for all $n \in \mathbb{N}$.
Hint. Since $e^x = \sum_{j=0}^{\infty} \frac{x^j}{j!}$, we have that for all $x \ge 0$, $e^x \ge \frac{x^n}{n!}$ for all $n$. Thus given $n \in \mathbb{N}$, $e^x \ge \frac{x^{n+1}}{(n+1)!}$ for all $x \ge 0$.
The Exponential and the Logarithmic Functions
13.333 Without using L'Hôspital's rule (Theorem 376), show that $\lim_{n\to\infty} \frac{\ln n}{n} = 0$.
Hint. First note that for $n > 2$, $2^n = (1 + 1)^n \ge \binom{n}{2}$. This directly follows from the finite binomial expansion (see (iii) in Exercise 13.10). Given $\varepsilon > 0$, get $k_0$ such that $\ln 2 \cdot \frac{k+1}{2^k} < \varepsilon$ for every $k \ge k_0$. Then, if $n > 2^{k_0}$ it follows that for some $k \ge k_0$, $2^k \le n \le 2^{k+1}$. Thus,
$$\frac{\ln n}{n} \le \frac{\ln 2^{k+1}}{2^k} = \ln 2 \cdot \frac{k+1}{2^k} \le \varepsilon.$$
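13.334 For which $k \in \mathbb{R}$ is the line $y = kx$ a tangent line to the graph of $f(x) = e^x$?
Hint. See Fig. 13.50. Given a point $(x_0, e^{x_0})$ construct the tangent line at this point. The slope is $e^{x_0}$, so the equation is $y = e^{x_0} x + q$ for some $q \in \mathbb{R}$. By putting in the point $(x_0, e^{x_0})$, we get $q = e^{x_0}(1 - x_0)$. We look for such a point that makes $q = 0$. Thus $x_0 = 1$. Then, we get $k = e^1 = e$.
13.335 Show that for $n \in \mathbb{N}$,
$$\left(\frac{n}{e}\right)^n < n! < e\left(\frac{n}{2}\right)^n.$$
Hint. Finite induction. For the first inequality, use the fact that $(1 + \frac{1}{n})^n < e$ (see Proposition 216) to write
$$\frac{\left(\frac{n+1}{e}\right)^{n+1}}{\left(\frac{n}{e}\right)^n} = \frac{\left(\frac{n+1}{e}\right)^n}{\left(\frac{n}{e}\right)^n} \cdot \frac{n+1}{e} = \left(1 + \frac{1}{n}\right)^n \frac{n+1}{e} < n + 1.$$
Thus, assuming $\left(\frac{n}{e}\right)^n < n!$, we get $\left(\frac{n+1}{e}\right)^{n+1} < (n + 1)!$.
For the second inequality, use the fact that $(1 + \frac{1}{n})^n \ge 2$ (see again Proposition 216) to write
$$\frac{\left(\frac{n+1}{2}\right)^{n+1}}{\left(\frac{n}{2}\right)^n} = \frac{\left(\frac{n+1}{2}\right)^n}{\left(\frac{n}{2}\right)^n} \cdot \frac{n+1}{2} = \left(1 + \frac{1}{n}\right)^n \frac{n+1}{2} \ge n + 1.$$
Thus, assuming $e\left(\frac{n}{2}\right)^n > n!$, we get $e\left(\frac{n+1}{2}\right)^{n+1} > (n + 1)!$.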
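The two-sided bound of Exercise 13.335 is easy to spot-check for small $n$. The snippet below is only a numerical sanity check, not a substitute for the induction; the range $1 \le n \le 100$ is an arbitrary choice that keeps the floating-point powers within range, and `factorial_bounds_hold` is a hypothetical helper.

```python
import math


def factorial_bounds_hold(n):
    """Check (n/e)^n < n! < e * (n/2)^n for a single n in floating point."""
    lower = (n / math.e) ** n
    upper = math.e * (n / 2) ** n
    return lower < math.factorial(n) < upper


# Spot-check a moderate range of n.
all_hold = all(factorial_bounds_hold(n) for n in range(1, 101))
print(all_hold)
```

Note that the factor $e$ in the upper bound is needed already at $n = 1$, where $(n/2)^n = 1/2 < 1!$.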
13.336 Prove, by using the change-of-variable Theorem 702, that $\ln(xy) = \ln x + \ln y$ (see Proposition 536).
Hint. In the following integrals, change variables, first by $t = xu$, and then by $xu = v$. We have
$$\ln(xy) = \int_1^{xy} \frac{1}{t}\,dt = \int_{1/x}^{y} \frac{1}{xu}\,x\,du = \int_{1/x}^{1} \frac{1}{u}\,du + \int_1^{y} \frac{1}{u}\,du = \int_1^{x} \frac{1}{v}\,dv + \int_1^{y} \frac{1}{u}\,du = \ln x + \ln y.$$
The Hyperbolic Functions
13.337 Find expressions for the inverse hyperbolic functions $\operatorname{arsinh} x$, $\operatorname{arcosh} x$, and $\operatorname{artanh} x$ (see Fig. 5.30).
Hint. For the hyperbolic sine, put $t := e^x$ to write $y = \sinh x = (t - t^{-1})/2$. Algebraic manipulations lead to $t = y \pm \sqrt{y^2 + 1}$. Since $t > 0$, we get $x = \ln(y + \sqrt{y^2 + 1})$, valid for $y \in \mathbb{R}$. Note that the hyperbolic cosine is not a one-to-one function, hence it has no inverse. We can compute the inverse when the function is restricted to $[0, +\infty)$ or when restricted to $(-\infty, 0]$, getting $x = \ln(y + \sqrt{y^2 - 1})$ and $x = \ln(y - \sqrt{y^2 - 1})$, respectively. The function $\operatorname{artanh}$ is defined on $(-1, 1)$, and we have $x = (1/2) \ln((1 + y)/(1 - y))$, $y \in (-1, 1)$.
The Trigonometric Functions
13.338 Show that given a nondegenerate interval $I$ and real numbers $a, b$, at least one of them nonzero, there is $x \in I$ such that $a \cos x + b \sin x \neq 0$.
Hint. Assuming the contrary, there is an interval $I$ and numbers $a, b \in \mathbb{R}$, at least one of them nonzero, such that $a \cos x + b \sin x = 0$ for all $x \in I$. By differentiating the last equation, we have that $-a \sin x + b \cos x = 0$. So we have
$$a \cos x + b \sin x = 0, \qquad -a \sin x + b \cos x = 0$$
for all $x \in I$. Choose $x \in I$ such that both $\sin x$ and $\cos x$ are nonzero. Multiply the first of the equations by $\sin x$ and the second by $\cos x$ and sum up the resulting equations. We get $b = 0$. Similarly we get $a = 0$, a contradiction.
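The logarithmic expressions for the inverse hyperbolic functions in Exercise 13.337 can be compared against the standard library. This is a small sketch under stated assumptions: the sample points are arbitrary, `arcosh` implements only the branch with $x \ge 0$, and the three function names are hypothetical helpers that mirror the formulas of the hint.

```python
import math


def arsinh(y):
    """Inverse of sinh on the whole real line."""
    return math.log(y + math.sqrt(y * y + 1))


def arcosh(y):
    """Inverse of cosh restricted to [0, +inf); requires y >= 1."""
    return math.log(y + math.sqrt(y * y - 1))


def artanh(y):
    """Inverse of tanh; requires -1 < y < 1."""
    return 0.5 * math.log((1 + y) / (1 - y))


for y in (0.1, 0.7):
    print(arsinh(y) - math.asinh(y), artanh(y) - math.atanh(y))
print(arcosh(2.5) - math.acosh(2.5))
```

The printed differences should be at the level of rounding error, and composing each function with the corresponding hyperbolic function returns the input.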
13.4.7 Metric Spaces
Basics
13.339 Show that the discrete metric $d$ on a space $X$, i.e., the metric
$$d(x, y) = \begin{cases} 1, & \text{if } x \neq y, \\ 0, & \text{if } x = y, \end{cases}$$
for $x, y \in X$, satisfies $d(x, z) \le \max\{d(x, y), d(y, z)\}$ ($\le d(x, y) + d(y, z)$), for all $x, y, z \in X$.
Hint. Standard.
13.340 In a metric space $(M, d)$, put $d_1(x, y) := \min\{d(x, y), 1\}$ for $x, y \in M$. Show that $d_1$ is a metric on $M$ that gives the same convergent sequences as $d$ does (i.e., the two metrics $d$ and $d_1$ are equivalent, see Definition 563).
Hint. We show the triangle inequality only. The other properties have an easy proof. To show the triangle inequality, given $x, y, z \in M$ denote $a := d(x, y)$, $b := d(y, z)$, and $c := d(x, z)$. Observe that $\min\{2, 1 + a, 1 + b, a + b\} \ge \min\{1, c\}$. Thus $d_1(x, y) + d_1(y, z) = \min\{1, a\} + \min\{1, b\} = \min\{2, 1 + a, 1 + b, a + b\} \ge \min\{1, c\} = d_1(x, z)$.
13.341 Let $(X, d)$ be a metric space. Put, for $x, y \in X$,
$$d_0(x, y) := \frac{d(x, y)}{1 + d(x, y)}.$$
(See Example 564.2.) Prove that $d_0$ is a metric on $X$ that is equivalent to the metric $d$ (see Definition 563). Moreover, ($0 \le$) $d_0(x, y) \le 1$ for all $x, y \in X$. In the particular case that $X = \mathbb{R}$ and $d(x, y) = d_{\mathrm{abs}}(x, y) := |x - y|$ for $x, y \in \mathbb{R}$, prove that $(\mathbb{R}, d_0)$ is totally bounded.
Hint. To see that $d_0$ is a metric, note the following inequalities for $x, y, z \in X$, based on the fact that the function $t \mapsto \frac{t}{1+t}$ is increasing on $(0, +\infty)$ (for the graph of this function on the interval $[0, 10]$, see Fig. 13.78), and that $d$ is a metric:
$$\begin{aligned}
d_0(x, z) = \frac{d(x, z)}{1 + d(x, z)} &\le \frac{d(x, y) + d(y, z)}{1 + d(x, y) + d(y, z)} \\
&= \frac{d(x, y)}{1 + d(x, y) + d(y, z)} + \frac{d(y, z)}{1 + d(x, y) + d(y, z)} \\
&\le \frac{d(x, y)}{1 + d(x, y)} + \frac{d(y, z)}{1 + d(y, z)} = d_0(x, y) + d_0(y, z).
\end{aligned}$$
That $d_0$ and $d$ are equivalent follows from the fact that $t \mapsto \frac{t}{1+t}$ is a homeomorphism from $[0, +\infty)$ onto $[0, 1)$, which implies that $d(x_n, x) \to 0$ if and only if $d_0(x_n, x) \to 0$. The proof of the last statement follows similarly to the statement on the arctan metric on $\mathbb{R}$ in Example 610.
Remark. This exercise shows that completeness is not preserved, in general, by passing to an equivalent metric. Indeed, $(\mathbb{R}, d_{\mathrm{abs}})$ is complete (see Theorem 1071). Were $(\mathbb{R}, d_0)$ complete, it would be compact, due to Theorem 620 and the fact that $(\mathbb{R}, d_0)$ was seen to be totally bounded. Compactness is a topological notion (i.e., it is preserved by homeomorphisms, see Proposition 628), so $(\mathbb{R}, d_{\mathrm{abs}})$ would be compact, and this is not the case by the Heine–Borel Theorem 96.
13.342 Let (M, d) be a metric space. Put S_{1/3} := {x : d(x, 0) = 1/3}, S1 := {x : d(x, 0) = 1}, and S2 := {x : d(x, 0) = 2}. Find S_{1/3}, S1, and S2 for the following spaces:
1. R with the metric datan(x, y) := |arctan x − arctan y| (see Example 556.8)
2. R with the metric d(x, y) := min{|x − y|, 1} (see Exercise 13.340)
3. N^N with the Baire metric (see Definition 594)
4. R^2 with the metrics d∞, d2, and d1 (see Sect. 6.1 and Examples 549.2b and 549.2c)
5. R with the discrete metric (see Example 549.3 and Exercise 13.339).
Hint.
1. S_{1/3} = {x : |arctan x| = 1/3}, S1 = {x : |arctan x| = 1}, S2 = {x : |arctan x| = 2} = ∅ (since |arctan x| < π/2 < 2)
2. S_{1/3} = {x : x = ±1/3}, S1 = {x : |x| ≥ 1}, S2 = ∅
3. S_{1/3} = {(0, 0, x3, x4, . . .) : xi ∈ N, x3 ≠ 0}, S1 = {x : x1 ≠ 0}, S2 = ∅
4. Observe the closed unit balls in R^3 for the three distances (Fig. 13.51)
5. S_{1/3} = ∅, S1 = R \ {0}, S2 = ∅.
13.343 Let B[x, r] be the closed ball with center x and radius r in a metric space (M, d).
Fig. 13.51 Three unit balls in R3
(i) Show that the diameter of B[x, r] is less than or equal to 2r.
(ii) Is it true that the diameter of B[x, r] is equal to 2r?
Hint. (i) Use the triangle inequality through the center point. (ii) No. See the discrete metric (Example 549.3).

13.344 Let (X, d) be a metric space and let B(x0, r) and B[x0, r] be open and closed balls in (X, d), respectively.
(i) Show that B(x0, r) is an open set in (X, d) and B[x0, r] is a closed set in (X, d).
(ii) Give an example of a metric space where the closure of an open ball is not the corresponding closed ball.
Hint. (i) Use the triangle inequality and the continuity of the distance function. (ii) Let (M, d) be the metric space defined in Example 549.3, with at least two points. Observe that for every x ∈ M, we have B(x, 1) = {x}. This set is closed and open, simultaneously, as it is a singleton and, at the same time, an open ball, hence an open set (see (i) above). It follows that the closure of B(x, 1) is again B(x, 1), while B[x, 1] = M.

13.345 Let A be a subset of a metric space (M, d). Show that all of (i), (ii), and (iii) in Proposition 78, proved there for M := R, extend to this situation.
Hint. Repeat the proof of Proposition 78.

13.346 Let (M, d) be a metric space. Prove that, given δ > 0, any δ-separated subset L of M (see the definition in the paragraph preceding Theorem 582) is necessarily closed.
Hint. Let x ∈ M \ L. We claim that B(x, δ/2) ∩ L is either empty or a singleton. Indeed, if two distinct elements y, z ∈ L belong to B(x, δ/2), we get d(y, z) ≤ d(y, x) + d(x, z) < δ/2 + δ/2 = δ, a contradiction. This shows the claim. If B(x, δ/2) ∩ L is a singleton {y}, put r := d(x, y) (note that r > 0, since x ∉ L). Otherwise, put r := δ/2. In either case, B(x, r) ∩ L = ∅. This proves that M \ L is open, and so L is closed.

13.347 Find the distance of the function f(x) = x^2 to the function g(x) = x^3 in the space (C[0, 1], d∞).
Hint. Direct calculation. The result: 4/27. See Fig. 13.52.

13.348 Show that a set in a metric space is Gδ if and only if its complement is Fσ.
Hint. De Morgan formulas (1.1) and (1.2).

13.349 Show that in a metric space, every closed set is Gδ, and every open set is Fσ.
Fig. 13.52 The distance between f and g (Exercise 13.347)
Hint. Let F be a closed set in the metric space (M, d). Then F = ⋂_{n=1}^∞ Gn, as it follows from Exercise 13.359, where Gn := {x ∈ M : d(x, F) < 1/n} (note that this is an open set) for n ∈ N. The second statement follows from this and Exercise 13.348.

13.350 Show that if F is an Fσ-set, then there is an increasing sequence {Fi}_{i=1}^∞ of closed sets such that F = ⋃_{i=1}^∞ Fi.
Hint. If ⋃_{i=1}^∞ Fi = F, then ⋃_{n=1}^∞ (F1 ∪ F2 ∪ . . . ∪ Fn) = F.

13.351 Show that if G is a Gδ-set, then there is a decreasing sequence {Gi}_{i=1}^∞ of open sets Gi such that G = ⋂_{i=1}^∞ Gi.
Hint. Exercises 13.348 and 13.350.

13.352 Show that a countable union and a finite intersection of Fσ-sets is an Fσ-set.
Hint. Write

\[ \bigcup_{i=1}^{\infty} \bigcup_{j=1}^{\infty} F_{i,j} = \bigcup \{ F_{i,j} : (i, j) \in \mathbb{N} \times \mathbb{N} \}. \]

If the family is not empty, write

\[ \bigcap_{i=1}^{n} \bigcup_{j=1}^{\infty} F_{i,j} = \bigcup \{ F_{1,j_1} \cap \dots \cap F_{n,j_n} : (j_1, \dots, j_n) \in \mathbb{N} \times \dots \times \mathbb{N} \}. \]
13.353 Show that a countable intersection and a finite union of Gδ-sets is Gδ.
Hint. Similar to Exercise 13.352.

13.354 Show that [0, 1) is simultaneously Gδ and Fσ.
Hint. Consider (−1/n, 1) and [0, 1 − 1/n].

13.355 Let M be a nonempty finite set and d be a metric on it. Show that each point in M is an open set, that M is a compact metric space, and describe convergent and Cauchy sequences in M. Show that every real-valued function on M is continuous.
Hint. Given x ∈ M, put r := min{d(x, y) : y ∈ M, y ≠ x} (if M = {x}, there is nothing to prove). Then r > 0 and {x} = B(x, r/2). This shows that each point in M is an open set. The compactness is obvious from the fact that M is finite. The convergent sequences, as well as the Cauchy sequences, are the eventually constant ones. The last statement follows from the fact that each subset of M is open.
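The hint to Exercise 13.355 can be illustrated on a small example. The sketch below takes a finite subset of R with the usual metric (an arbitrary choice made here for illustration) and checks that the open ball of radius r/2 around each point is the singleton; the function names are hypothetical.

```python
def isolation_radius(points, metric, x):
    """min{metric(x, y) : y in points, y != x}; needs at least two points."""
    return min(metric(x, y) for y in points if y != x)

def open_ball(points, metric, center, radius):
    """Open ball B(center, radius) inside the finite space `points`."""
    return {y for y in points if metric(center, y) < radius}

M = [0.0, 0.5, 2.0, 3.5]                 # illustrative finite metric space
metric = lambda x, y: abs(x - y)         # usual metric restricted to M

# For each x, the ball B(x, r/2) with r the isolation radius of x.
balls = {
    x: open_ball(M, metric, x, isolation_radius(M, metric, x) / 2)
    for x in M
}
```

Each entry of `balls` is expected to be the singleton containing its center, which is the reason every subset of M is open.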
13.356 Let K be a finite nonempty set of real numbers. Construct a sequence {xn} ⊂ K such that the set of all cluster points of {xn} is exactly K.
Hint. If K = {z1, z2, . . . , zn}, consider {z1, z2, . . . , zn, z1, z2, . . . , zn, z1, z2, . . . , zn, . . .}.

13.357 Construct a sequence {xn} of real numbers that has exactly one cluster point but is not convergent. Can {xn} be bounded?
Hint. Put x_{2n} = 1/n, x_{2n+1} = n. No; otherwise, if x is a cluster point of {xn} and xn ↛ x, there is ε > 0 and a subsequence {x_{nk}} of {xn} such that |x_{nk} − x| ≥ ε for each k. Let y be a cluster point of {x_{nk}} (compactness); then y is a cluster point of {xn} different from x.

13.358 Show that every separable metric space has cardinality less than or equal to c.
Hint. Let D be a countable dense subset of M. Define a mapping ϕ : M → P(D) in the following way: to x ∈ M we associate a sequence {dn}_{n=1}^∞ in D such that dn → x. Then put ϕ(x) := {dn : n ∈ N}. Obviously ϕ is one-to-one. This shows that card(M) ≤ 2^{ℵ0} (= c, see Exercise 13.47).

13.359 Let (M, d) be a metric space. Let A be a nonempty subset of M, and let Ā denote its closure. Prove that for an element x ∈ M the following are equivalent:
(i) x ∈ Ā
(ii) There exists a sequence {an} in A such that an → x
(iii) d(x, A) = 0, where d(·, A) is the distance function introduced in Proposition 557.
Hint. (i)⇒(ii): Let x ∈ Ā. Assume that an open neighborhood U(x) of x is disjoint from A. Then M \ U(x) is a closed set containing A that does not contain x, a contradiction with the fact that Ā is the intersection of all closed sets containing A. In particular, B(x, 1/n) ∩ A ≠ ∅, so we can choose an ∈ B(x, 1/n) ∩ A for every n ∈ N. Clearly, the sequence {an} converges to x.
(ii)⇒(iii): If {an} is a sequence in A such that an → x, then d(x, an) → 0. Since d(x, A) ≤ d(x, an), we get d(x, A) = 0.
(iii)⇒(i): Let F be a closed subset of M such that A ⊂ F. If x ∉ F, there exists ε > 0 such that B(x, ε) ∩ F = ∅. It follows that d(x, A) ≥ ε, a contradiction. Hence x belongs to every closed set containing A, i.e., x ∈ Ā.
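The two sequences suggested in the hints to Exercises 13.356 and 13.357 are easy to generate, which may help to see why they behave as claimed. This is a minimal sketch; the function names and the sample set K are assumptions made for illustration only.

```python
from itertools import cycle, islice

def cyclic_sequence(K, length):
    """First `length` terms of z1, ..., zn, z1, ..., zn, ... (Exercise 13.356)."""
    return list(islice(cycle(K), length))

def one_cluster_point(length):
    """Terms x_2, x_3, ... with x_{2n} = 1/n and x_{2n+1} = n (Exercise 13.357)."""
    return [
        1.0 / (k // 2) if k % 2 == 0 else float(k // 2)
        for k in range(2, length + 2)
    ]

K = [-1.0, 0.25, 3.0]
xs = cyclic_sequence(K, 30)
counts = {z: xs.count(z) for z in K}   # every element of K keeps recurring

ys = one_cluster_point(20)
even_terms = ys[0::2]   # 1, 1/2, 1/3, ...: accumulate at 0
odd_terms = ys[1::2]    # 1, 2, 3, ...: unbounded, no cluster point
```

In the first sequence each element of K appears in every block of length n, so it is a cluster point; in the second, only the even-indexed terms accumulate, and they do so at 0.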
13.360 Let (X, d) be a metric space, and a1, a2, a3, a4 be points in X. Show that |d(a1, a4) − d(a2, a3)| ≤ d(a1, a2) + d(a3, a4).
Hint. Write d(a1, a4) ≤ d(a1, a2) + d(a2, a3) + d(a3, a4). Thus d(a1, a4) − d(a2, a3) ≤ d(a1, a2) + d(a3, a4). Interchanging a1 with a2 and a3 with a4 gives d(a2, a3) − d(a1, a4) ≤ d(a1, a2) + d(a3, a4), and we thus get |d(a1, a4) − d(a2, a3)| ≤ d(a1, a2) + d(a3, a4).
13.361 Let (X, d) be a metric space. Let {xn} and {yn} be two sequences in X such that xn → x and yn → y in X. Show that then

\[ \lim_{n \to \infty} d(x_n, y_n) = d(x, y). \]

Hint. In Exercise 13.360, put a1 := xn, a2 := x, a3 := y, and a4 := yn. We get |d(xn, yn) − d(x, y)| ≤ d(xn, x) + d(yn, y).

13.362 Let X and Y be metric spaces, Y being complete. Let D be a dense set in X. Let ϕ be a uniformly continuous map from D into Y. Show that ϕ can be extended to a uniformly continuous map from X into Y.
Hint. ϕ carries Cauchy sequences in D to Cauchy sequences in Y. If x ∈ X, define

\[ \tilde{\varphi}(x) := \lim_{n \to \infty} \varphi(x_n), \]

where {xn} is a sequence in D converging to x. Show that this does not depend on the choice of the sequence {xn}, and that the extension so defined is uniformly continuous on X.

13.363 Let C be the unit circle in (R^2, d2), where d2 is the Euclidean metric in R^2, and let a metric on C be defined by

\[ d(x, y) := \begin{cases} d_2(x, y) & \text{if } x, y \text{ are on the same ray from the center of } C, \\ d_2(x, 0) + d_2(y, 0) & \text{otherwise.} \end{cases} \tag{13.24} \]

This is sometimes called the French railroad system (Paris is the origin, see Fig. 13.53). Show that d is a metric and find B((1/2, 0), 1/4) and B((1/8, 0), 1/4). Describe the sequences converging to a point in C.
Hint. Draw pictures. Observe the dramatic difference between the first and the second ball. For the last question, distinguish between sequences converging to 0 and to any other point.
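The metric (13.24) of Exercise 13.363 is simple to experiment with. The sketch below is one possible implementation, with points given as coordinate pairs and the origin treated as lying on every ray; the function names and the tolerance `tol` are assumptions of this sketch. Testing membership for a few points suggests the shape of the two balls asked for in the exercise.

```python
import math

def d2(x, y):
    # Euclidean distance in the plane.
    return math.hypot(x[0] - y[0], x[1] - y[1])

def same_ray(x, y, tol=1e-12):
    """True if x and y lie on a common ray from the origin.

    The origin is regarded as lying on every ray.
    """
    cross = x[0] * y[1] - x[1] * y[0]
    dot = x[0] * y[0] + x[1] * y[1]
    return abs(cross) <= tol and dot >= 0.0

def d_rail(x, y):
    # Metric (13.24): direct distance along a ray, otherwise via the origin.
    origin = (0.0, 0.0)
    if same_ray(x, y):
        return d2(x, y)
    return d2(x, origin) + d2(y, origin)

def in_ball(center, radius, y):
    # Membership in the open ball B(center, radius) for the metric d_rail.
    return d_rail(center, y) < radius
```

For instance, `in_ball((0.5, 0.0), 0.25, (0.5, 0.01))` is false, since a point off the ray through (1/2, 0) is at distance at least 1/2 from the center, while `in_ball((0.125, 0.0), 0.25, (0.0, 0.1))` is true, because the detour through the origin has length 1/8 + 1/10 < 1/4.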
Mappings Between Metric Spaces

13.364 Choose a point t ∈ [0, 1] and define a real-valued function δt on the metric space C[0, 1] by δt(f) = f(t) for all f ∈ C[0, 1]. Show that δt is a continuous function on C[0, 1]. Evaluate its maximum on the unit ball of C[0, 1], i.e., on the set {f ∈ C[0, 1] : d∞(f, 0) ≤ 1}.
Hint. If {fn}_{n=1}^∞ is a sequence in C[0, 1] such that fn → f in the metric d∞, we have, in particular, fn(t) → f(t), i.e., δt(fn) → δt(f). The maximum is 1.
Fig. 13.53 The distance between Bordeaux and Marseille in the metric given by (13.24)
13.365 Let a real-valued function F be defined on C[0, 1] by F(f) = ∫_0^1 f for f ∈ C[0, 1]. Show that F is a continuous function on the metric space (C[0, 1], d∞) and find its maximum on the unit ball of C[0, 1]. Regarding Exercise 13.364, show that there is no t ∈ [0, 1] such that F = δt.
Hint. If fn → f in the d∞ metric on [0, 1], then ∫_0^1 fn → ∫_0^1 f. The maximum is 1. Fix t ∈ [0, 1] and find a sequence {fn}_{n=1}^∞ in C[0, 1] such that fn(t) → 0 and F(fn) → 1.

13.366 If f is a real-valued function on a metric space M, show that the set G of all points where f is continuous is a Gδ-set in M. This exercise extends Exercise 13.181 to the case of a metric space. Adapt the argument there to this more general situation.
Hint. For n ∈ N, consider

\[ G_n = \{ x \in M : \exists \text{ a neighborhood } V(x) \text{ of } x \text{ such that } |f(y) - f(z)| < \tfrac{1}{n} \text{ for every } y, z \in V(x) \}. \]

Show that each Gn is open and that G = ⋂_{n=1}^∞ Gn.
13.367 Continuity of a given real-valued function of a real variable can be tested by approximating from a dense subset of its domain. This was proved in Proposition 328. Prove a similar result for a function from a metric space into a metric space.
Hint. The result should read as follows: Let f : T → M be a function, where (T, ρ) and (M, d) are metric spaces, and let D0 be a dense subset of T. Then f is continuous if and only if lim_{s→x, s∈D0} f(s) = f(x) for every x ∈ T. That the condition is necessary is obvious. To prove sufficiency, fix x0 ∈ T. Given ε > 0, there exists a neighborhood U(x0) of x0 such that f(U(x0) ∩ D0) ⊂ B(f(x0), ε) (observe that U(x0) ∩ D0 ≠ ∅). For x ∈ U(x0) choose a neighborhood U(x) of x such that U(x) ⊂ U(x0) and f(U(x) ∩ D0) ⊂ B(f(x), ε) (again, the set U(x) ∩ D0 is nonempty; moreover, it is a subset of U(x0) ∩ D0). Let s ∈ U(x) ∩ D0. Then d(f(x), f(x0)) ≤ d(f(x), f(s)) + d(f(s), f(x0)) < 2ε. Since x ∈ U(x0) was arbitrary, this shows that f is continuous at x0.
13.368 Find a countable set {Hn : n ∈ N} of continuous functions on the unit ball B := {g ∈ C[0, 1] : sup_{t∈[0,1]} |g(t)| ≤ 1} of the space (C[0, 1], d∞) (see Example 551.4) such that, for fn, f ∈ B, we have fn → f uniformly on [0, 1] if Hm(fn) →n Hm(f) for each m ∈ N. Put this result in contrast with Exercise 13.618.
Hint. Let {tn}_{n=1}^∞ be the sequence of all rational numbers in [0, 1], and for each n ∈ N let Fn be the real-valued function on B defined by Fn(g) = g(tn), for g ∈ B. For each m ∈ N, let Gm be the real-valued function defined on B given by

\[ G_m(g) = \sup \{ |g(t_i) - g(t_j)| : i, j \in \mathbb{N},\ |t_i - t_j| \le \tfrac{1}{m} \}, \quad \text{for } g \in B. \]

Observe that

\[ G_{m+1}(g) \le G_m(g), \quad \text{for each } m \in \mathbb{N} \text{ and } g \in C[0, 1]. \tag{13.25} \]

Note, too, that

\[ G_m(g) \to_m 0, \quad \text{for each } g \in C[0, 1], \tag{13.26} \]

due to the uniform continuity of g on [0, 1] (Theorem 344). Put {Hn : n ∈ N} := {Fn : n ∈ N} ∪ {Gn : n ∈ N}. Let fn, f ∈ B be such that

\[ F_m(f_n) \to_n F_m(f) \quad \text{for each } m, \tag{13.27} \]

\[ G_m(f_n) \to_n G_m(f) \quad \text{for each } m. \tag{13.28} \]

Given ε > 0, by the uniform continuity of f on [0, 1], there is m ∈ N such that Gm(f) < ε. Due to (13.28), there is n0 such that Gm(fn) < ε for every n ≥ n0. It follows from (13.26) that there exists m1 ∈ N (that, by (13.25), can be taken greater than m) such that Gm1(fn) < ε for each n < n0. Since Gm1(fn) ≤ Gm(fn) (< ε) for n ≥ n0, we finally get Gm1(fn) < ε for all n ∈ N. It follows from the Arzelà–Ascoli Theorem 648 that {fn : n ∈ N} is relatively compact. Assume that it is not true that fn → f in the supremum metric. Then there is an element f0 ∈ C[0, 1], f0 ≠ f, and a subsequence {nk}_{k=1}^∞ of {1, 2, 3, . . .} such that f_{nk} → f0 in the supremum metric. Let k ∈ N be such that f0(tk) ≠ f(tk). From (13.27), we have that fn(tk) → f(tk), which contradicts f_{nk}(tk) → f0(tk). Thus, fn → f in the supremum metric.
Remark. The above statement was used, e.g., in showing that all separable infinite-dimensional Banach spaces are mutually homeomorphic. This was a result by the Russian mathematician M. I. Kadets. For the proof of this result, we refer, e.g., to [FHHMZ11, Theorem 12.46]. The result was later extended by the Polish mathematician H. Toruńczyk.
13.369 Let F be a family of continuous real-valued functions on a complete metric space X and assume that for each x ∈ X there is a number Mx > 0 such that |f(x)| ≤ Mx for all f ∈ F. Then there is a nonempty open set O ⊂ X and a constant M > 0 such that |f(x)| ≤ M for all f ∈ F and all x ∈ O.
Hint. For n ∈ N, put Fn := {x ∈ X : |f(x)| ≤ n for all f ∈ F}. Then each Fn is a closed set in X, and X = ⋃_n Fn. It follows from the Baire Category Theorem 641 that some Fn has an interior point.

13.370 Show that there are metric spaces X and Y such that X is homeomorphic to a subset of Y and Y is homeomorphic to a subset of X, and yet X and Y are not homeomorphic. Compare this with the Cantor–Bernstein–Schröder Theorem 50 on cardinalities. The similar question for Banach spaces and linear isomorphisms was solved in the negative only recently by W. T. Gowers.
Hint. X := [0, ∞) and Y := {−1} ∪ [0, ∞). To get the nonhomeomorphism part, consider isolated points.

13.371 Let p ≥ 1, and let n ∈ N. Show that the function dp defined on ℓ_p^n × ℓ_p^n by

\[ d_p(x, y) = \Big( \sum_{i=1}^{n} |x_i - y_i|^p \Big)^{1/p} \]

is a metric on ℓ_p^n. Exercise 13.372 shows that this metric is equivalent on ℓ_p^n to the d∞ metric.
Hint. We use Minkowski's inequality (see Sect. 8.2): For x, y, z ∈ ℓ_p^n we have

\[ \begin{aligned} d_p(x, z) = \Big( \sum_{i=1}^{n} |x_i - z_i|^p \Big)^{1/p} &\le \Big( \sum_{i=1}^{n} (|x_i - y_i| + |y_i - z_i|)^p \Big)^{1/p} \\ &\le \Big( \sum_{i=1}^{n} |x_i - y_i|^p \Big)^{1/p} + \Big( \sum_{i=1}^{n} |y_i - z_i|^p \Big)^{1/p} = d_p(x, y) + d_p(y, z). \end{aligned} \]

The other requirements for a distance function are clearly met.

13.372 Show that for every n ∈ N and every real number p ≥ 1, there is a constant Cp > 0 such that

\[ \frac{1}{C_p} \max_{1 \le i \le n} |x_i - y_i| \le \Big( \sum_{i=1}^{n} |x_i - y_i|^p \Big)^{1/p} \le C_p \max_{1 \le i \le n} |x_i - y_i|, \]

for every x = (xi), y = (yi) ∈ ℓ_p^n. This shows that on ℓ_p^n the metric dp introduced in Exercise 13.371 and the metric d∞ are equivalent.
Hint. Use the following inequalities: max_i |xi − yi|^p ≤ Σ_{i=1}^n |xi − yi|^p ≤ n max_i |xi − yi|^p.
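Taking pth roots in the inequalities of the hint to Exercise 13.372 gives d∞ ≤ dp ≤ n^{1/p} d∞, so Cp := n^{1/p} is one admissible constant. The sketch below checks these two bounds on random vectors; it is an illustration under the stated choice of Cp, with hypothetical function names and a small tolerance for floating-point rounding.

```python
import random

def d_p(x, y, p):
    # The metric of Exercise 13.371 on R^n.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def d_inf(x, y):
    # The maximum metric on R^n.
    return max(abs(a - b) for a, b in zip(x, y))

def check_equivalence(n, p, trials=1000, seed=0, tol=1e-9):
    """Check d_inf <= d_p <= n**(1/p) * d_inf on random pairs of vectors."""
    rng = random.Random(seed)
    upper = n ** (1.0 / p)
    for _ in range(trials):
        x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        y = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        lo, mid = d_inf(x, y), d_p(x, y, p)
        if not (lo <= mid + tol and mid <= upper * lo + tol):
            return False
    return True
```

A call such as `check_equivalence(5, 2.5)` is expected to return `True`; a `False` would point to a bug in the sketch rather than in the inequality.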
13.373 If 1 ≤ q < p, show that ℓq ⊂ ℓp and ℓq ≠ ℓp.
Hint. Note that if x = (xi) ∈ ℓq, then xi → 0. Note, too, that if |a| < 1, then |a|^q ≥ |a|^p. Thus, if Σ |xi|^q < ∞ then Σ |xi|^p < ∞. The element x = (xi), where xi = i^{−1/q} for i ∈ N, is not in ℓq; however, it is in ℓp for any p > q, as Σ_n 1/n^α < ∞ for any α > 1.

13.374 Let the sequence {x^n}_{n=1}^∞ of elements x^n = (x_i^n) of ℓ1 be defined by

\[ x_i^n := \begin{cases} \frac{1}{n} & \text{if } i \le n, \\ 0 & \text{if } i > n. \end{cases} \]

Show that d(x^n, 0) = 1 in ℓ1 but d(x^n, 0) → 0 in any ℓp, p > 1.
Hint. A direct calculation.

13.375 For n ∈ N, let an element x^n = (x_i^n) of ℓ2 be defined by

\[ x_i^n = \begin{cases} 1 + \frac{1}{i} & \text{if } i = n, \\ 0 & \text{otherwise.} \end{cases} \]

Let D := {x^n : n ∈ N}. Show that D is closed in ℓ2, the distance of the zero element 0 in ℓ2 to D is 1, but there is no element in D whose distance to 0 in ℓ2 is 1.
Hint. If for a sequence {dn} in D we have dn → x, then d(dn, dm) → 0 as n, m → ∞. Since two distinct elements of D are at distance greater than √2 from each other, the sequence {dn}_{n=1}^∞ must be eventually constant. Then x ∈ D. Furthermore, we have d(0, x^n) = 1 + 1/n for each n.

Complete Metric Spaces, and the Completion of a Metric Space

13.376 Let c be the space of all convergent sequences of real numbers endowed with the d∞ metric. Is it a complete metric space?
Hint. Yes. The proof is standard.

13.377 Let BC(0, 1) denote the metric space of bounded continuous real-valued functions on the interval (0, 1) endowed with the supremum metric d∞. Show that this is a complete nonseparable metric space.
Hint. To prove completeness, use Proposition 462 to show that a d∞-Cauchy sequence {gn}_{n=1}^∞ in BC(0, 1) is d∞-convergent to a function f : (0, 1) → R. Theorem 463 shows that f is continuous. Due to the fact that a d∞-Cauchy sequence {gn}_{n=1}^∞ in BC(0, 1) is necessarily d∞-bounded, we conclude that f is bounded. All together, we get f ∈ BC(0, 1). To show the nonseparability, given n ∈ N find a continuous function fn on (0, 1) whose support is contained in (1/(n+1), 1/n) and such that d∞(fn, 0) = 1. Define a mapping T : ℓ∞ → BC(0, 1) by T((an)) := Σ_{n=1}^∞ an fn. Prove that T(ℓ∞) is a subspace of (BC(0, 1), d∞) that is isometric to ℓ∞. Since this last space is nonseparable (see Example 586.2c), so is (BC(0, 1), d∞).
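The "direct calculation" in Exercise 13.374 amounts to d(x^n, 0) = (n · n^{−p})^{1/p} = n^{1/p − 1} in ℓp, which equals 1 for p = 1 and tends to 0 for p > 1. The following sketch simply evaluates this quantity for a few values of n; the function name is an illustrative choice, not notation from the text.

```python
def lp_distance_to_zero(n, p):
    """d(x^n, 0) in l_p for x^n = (1/n, ..., 1/n, 0, 0, ...), n nonzero entries."""
    return (n * (1.0 / n) ** p) ** (1.0 / p)

ns = (1, 10, 100, 1000)
l1_values = [lp_distance_to_zero(n, 1.0) for n in ns]  # constant, equal to 1
l2_values = [lp_distance_to_zero(n, 2.0) for n in ns]  # n**(-1/2), decreasing to 0
```

Comparing `l2_values` with `[n ** -0.5 for n in ns]` confirms the closed form n^{1/p − 1} in the case p = 2.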
Fig. 13.54 Some elements gδ that approximate √x (Exercise 13.379)
13.378 Let CU(0, 1) be the metric space of all uniformly continuous real-valued functions defined on the open interval (0, 1), endowed with the d∞ metric. Show that it is a complete separable metric space.
Hint. A standard argument shows that the d∞-limit of a d∞-Cauchy sequence of uniformly continuous functions on (0, 1) is uniformly continuous on (0, 1). This shows completeness. To prove separability, extend continuously each such function to a function on [0, 1], and use the fact that (C[0, 1], d∞) is separable (see Example 586.4).

13.379 Show in an elementary way that the function defined on [0, 1] by f(x) = √x is the uniform limit of a sequence of Lipschitz functions on [0, 1].
Hint. Given δ ∈ (0, 1), consider the function (see Fig. 13.54)

\[ g_\delta(x) = \begin{cases} \frac{1}{\sqrt{\delta}}\, x & \text{if } 0 \le x \le \delta, \\ \sqrt{x} & \text{if } \delta \le x \le 1, \end{cases} \]

and observe that √x is Lipschitz on every interval [δ, 1] for 0 < δ ≤ 1 (see Exercise 13.284). This exercise should be compared with Exercise 13.298, where it is proved that every real-valued continuous function on [a, b] is the uniform limit of a sequence of Lipschitz functions on [a, b] (there, piecewise linear functions were used). See also Exercise 13.297.

13.380 Let L[0, 1] be the metric space of all Lipschitz functions on [0, 1] endowed with the d∞ metric. Is it a complete space?
Hint. No, see Exercise 13.379 or, alternatively, Exercise 13.298.

13.381 Let Pn[0, 1] be the metric space of all polynomials on [0, 1] whose degree is less than or equal to n, endowed with the d∞ metric. Is it a complete space?
Hint. Yes. There are many approaches to get an affirmative answer.
(i) We claim first that the coefficients of the polynomial are continuous functions of the polynomial in the d∞ metric. A possible argument for proving the claim is the following: Let P(x) := a0 + a1 x + a2 x^2 + . . . + an x^n. Pick n + 1 distinct points x0, . . ., xn in [0, 1] and form a system of n + 1 equations in the n + 1 unknowns
a0, . . ., an:

\[ \begin{cases} a_0 + a_1 x_0 + a_2 x_0^2 + \dots + a_n x_0^n = P(x_0), \\ a_0 + a_1 x_1 + a_2 x_1^2 + \dots + a_n x_1^n = P(x_1), \\ \qquad \cdots \\ a_0 + a_1 x_n + a_2 x_n^2 + \dots + a_n x_n^n = P(x_n), \end{cases} \tag{13.29} \]

in short, XA = V, where

\[ X := \begin{pmatrix} 1 & x_0 & x_0^2 & \dots & x_0^n \\ 1 & x_1 & x_1^2 & \dots & x_1^n \\ \dots & \dots & \dots & \dots & \dots \\ 1 & x_n & x_n^2 & \dots & x_n^n \end{pmatrix}, \quad A := \begin{pmatrix} a_0 \\ a_1 \\ \dots \\ a_n \end{pmatrix}, \quad V := \begin{pmatrix} P(x_0) \\ P(x_1) \\ \dots \\ P(x_n) \end{pmatrix}. \]
Observe that the matrix X of the system (13.29) is nonsingular, or, in other words, that its columns form a system of n + 1 linearly independent vectors in R^{n+1}. Indeed, in the opposite case, we can find λ0, λ1, . . ., λn in R, not all zero, such that

\[ \lambda_0 \begin{pmatrix} 1 \\ 1 \\ \dots \\ 1 \end{pmatrix} + \lambda_1 \begin{pmatrix} x_0 \\ x_1 \\ \dots \\ x_n \end{pmatrix} + \dots + \lambda_n \begin{pmatrix} x_0^n \\ x_1^n \\ \dots \\ x_n^n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \dots \\ 0 \end{pmatrix}. \]

This amounts to (P(xi) =) Σ_{k=0}^n λk xi^k = 0, i = 0, 1, 2, . . ., n, a contradiction with the fact that the nonzero polynomial P, being of degree less than or equal to n, has no more than n zeros, according to the fundamental theorem of algebra (for a proof see, e.g., [Jam70]). This shows, then, that we can solve the system (13.29) in the unknowns a0, . . ., an, getting finally A = X^{−1}V. In particular, if {Pj} is a sequence of polynomials of degree less than or equal to n such that d∞(P, Pj) → 0, and if Aj denotes the vector of coefficients corresponding to Pj, then Aj → A coordinatewise (the corresponding right-hand sides Vj converge to V, and Aj = X^{−1}Vj). This proves the claim.
As a consequence of the former discussion, we get that if a sequence of polynomials {Pk}_{k=1}^∞ of degree less than or equal to n converges to a continuous function, then the sequence formed by their jth coefficients converges, say to aj. This clearly implies that the sequence {Pk}_{k=1}^∞ d∞-converges to Σ_{j=0}^n aj x^j. This proves that the polynomials of degree less than or equal to n form a closed set in the space of continuous functions endowed with the d∞ metric. Since this last space is complete (see Example 573.4), an appeal to Proposition 572 finalizes the argument.
(ii) Another approach, this time based on (finite-dimensional) functional-analytic arguments, is provided in Exercise 13.559.
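The system XA = V of approach (i) can be solved explicitly on a small example, recovering the coefficients of a polynomial from its values at n + 1 distinct points of [0, 1]. The sketch below uses exact rational arithmetic and plain Gauss–Jordan elimination; the function names, the chosen nodes, and the sample polynomial are assumptions made here for illustration.

```python
from fractions import Fraction

def vandermonde(nodes):
    """Matrix X of the system: row i is (1, x_i, x_i^2, ..., x_i^n)."""
    n = len(nodes)
    return [[Fraction(x) ** k for k in range(n)] for x in nodes]

def solve(matrix, rhs):
    """Gauss-Jordan elimination over the rationals; assumes a nonsingular matrix."""
    n = len(matrix)
    aug = [row[:] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalize the pivot row, then clear the column in all other rows.
        aug[col] = [entry / aug[col][col] for entry in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def coefficients_from_values(nodes, values):
    """Solve X A = V for the coefficient vector A = (a_0, ..., a_n)."""
    return solve(vandermonde(nodes), values)

nodes = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
P = lambda x: 2 - 3 * x + x ** 3          # sample polynomial of degree 3
coeffs = coefficients_from_values(nodes, [P(x) for x in nodes])
```

Here `coeffs` comes out as the coefficient list of the sample polynomial in increasing degree. Since A = X^{−1}V depends linearly on V, small changes of the values P(xi) produce small changes of the coefficients, which is the continuity used in the claim.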
13.382 Follow the hint to construct the completion of a metric space (recall that we provided in the text a more direct approach to this result, see Corollary 577).
Hint. Let (M, d) be a metric space. Let CS be the set of all Cauchy sequences in M. Given S := {sn}_{n=1}^∞ and S′ := {s′n}_{n=1}^∞ in CS we write S ∼ S′ whenever the following holds: Given ε > 0 there exists n0 ∈ N such that d(sn, s′n) < ε for n ≥ n0. This is an equivalence relation in the set CS. Put $\widetilde{CS}$ for the set of all equivalence classes defined by ∼. We shall endow $\widetilde{CS}$ with a metric $\tilde{d}$ that turns $(\widetilde{CS}, \tilde{d})$ into a complete metric space, and we shall define an isometry J from (M, d) into $(\widetilde{CS}, \tilde{d})$ in such a way that J(M) is dense in $(\widetilde{CS}, \tilde{d})$. This will show that $(\widetilde{CS}, \tilde{d})$ is a completion of (M, d).
If $\tilde{S}$ and $\tilde{S}'$ are elements in $\widetilde{CS}$ that contain S := {sn} and S′ := {s′n} in CS, respectively, we define $\tilde{d}(\tilde{S}, \tilde{S}') := \lim_{n\to\infty} d(s_n, s'_n)$. It is easy to prove that $\tilde{d}$ so defined is independent of the chosen representative in each class, and that $\tilde{d}$ is indeed a metric on $\widetilde{CS}$. Let us prove that $(\widetilde{CS}, \tilde{d})$ is complete. Let $\{\tilde{S}_k\}_{k=1}^{\infty}$ be a Cauchy sequence in $(\widetilde{CS}, \tilde{d})$, and let $S_k := \{s_{k,n}\}_{n=1}^{\infty} \in \tilde{S}_k$ for all k ∈ N (the choice of the particular representative in the class is irrelevant for the argument). Given n ∈ N, there exists kn ∈ N such that $\tilde{d}(\tilde{S}_k, \tilde{S}_j) < 1/2^n$ for each k, j ≥ kn. Without loss of generality, we may assume that k1 > 1 and that kn < kn+1 for all n ∈ N. This defines a strictly increasing sequence {kn}_{n=1}^∞ in N. Note that, for n ∈ N, each Sk with kn ≤ k ≤ kn+1 satisfies $\lim_{m\to\infty} d(s_{k_n,m}, s_{k,m}) < 1/2^n$ and, simultaneously, Sk is a Cauchy sequence; thus, we can find in ∈ N such that $d(s_{k,m}, s_{k_n,m}) < 1/2^n$ and $d(s_{k,m}, s_{k,r}) < 1/2^n$ for each r, m ≥ in and for kn ≤ k ≤ kn+1. Again, without loss of generality, we may assume in < in+1 for all n ∈ N. Define S := {sk}_{k=1}^∞ in the following way:

\[ s_k = \begin{cases} s_{1,1} & \text{if } 1 \le k < k_1, \\ s_{k,i_n} & \text{if } k_n \le k < k_{n+1}, \text{ for } n \in \mathbb{N}. \end{cases} \]

Let us prove first that S ∈ CS. For this, fix n ∈ N with n > 2, and k, j ≥ kn. Pick p, q ∈ N such that n ≤ p ≤ q, kp ≤ k < kp+1, kq ≤ j < kq+1. Then

\[ \begin{aligned} d(s_k, s_j) = d(s_{k,i_p}, s_{j,i_q}) \le{} & d(s_{k,i_p}, s_{k_{p+1},i_p}) + d(s_{k_{p+1},i_p}, s_{k_{p+1},i_{p+1}}) \\ & + d(s_{k_{p+1},i_{p+1}}, s_{k_{p+2},i_{p+1}}) + d(s_{k_{p+2},i_{p+1}}, s_{k_{p+2},i_{p+2}}) + \dots \\ & \dots + d(s_{k_{q-1},i_{q-1}}, s_{k_q,i_{q-1}}) + d(s_{k_q,i_{q-1}}, s_{k_q,i_q}) + d(s_{k_q,i_q}, s_{j,i_q}) \end{aligned} \]
0, there exists Pε ∈ P[a, b] such that |R(f, P, {ti}) − R(f, Q, {sj})| < ε for all tagged partitions (P, {ti}) and (Q, {sj}) of [a, b] such that P and Q are finer than Pε.
(iv) For each ε > 0, there exists Pε ∈ P[a, b] such that |R(f, Pε, {ti}) − R(f, Pε, {si})| < ε for all choices of tags {ti} and {si} associated to Pε.
Hint. (i)⇒(ii) follows from (i)⇒(ii) in Theorem 669. (ii)⇒(iii) is obvious, as well as (iii)⇒(iv). Let us prove (iv)⇒(iii). Assume that (iv) holds. Given ε > 0, find Pε = {a = x0 < x1 < . . . < xN = b} ∈ P[a, b] as in (iv). Put Wi := (xi − xi−1) f[xi−1, xi] for i = 1, 2, . . ., N, and let W := Σ_{i=1}^N Wi. Observe that every real number x in W − W satisfies |x| < ε, hence the same holds for each x that can be written as a convex combination of elements in W − W. Let P := {a = u0 < u1 < . . . < uM = b} ∈ P[a, b] be finer than Pε, and let x_{i−1} = u_{k_{i−1}} < u_{k_{i−1}+1} < . . . < u_{k_i} = xi for i = 1, 2, . . ., N. Then we have, for a tagged partition (P, {tk}),

\[ \begin{aligned} R(f, P_\varepsilon, \{x_i\}) - R(f, P, \{t_k\}) &= \sum_{i=1}^{N} \Big\{ f(x_i)(x_i - x_{i-1}) - \sum_{k=k_{i-1}+1}^{k_i} f(t_k)(u_k - u_{k-1}) \Big\} \\ &= \sum_{i=1}^{N} \sum_{k=k_{i-1}+1}^{k_i} \frac{u_k - u_{k-1}}{x_i - x_{i-1}} \big( f(x_i)(x_i - x_{i-1}) - f(t_k)(x_i - x_{i-1}) \big). \end{aligned} \tag{13.31} \]

Observe that, for each i = 1, 2, . . ., N, the inner sum in (13.31) is a convex combination of elements in Wi − Wi, so (13.31) is an element in Σ_{i=1}^N conv(Wi − Wi) = conv(W − W). This proves that |R(f, Pε, {xi}) − R(f, P, {tk})| < ε. Finally, if (P, {tk}) and (Q, {sj}) are two tagged partitions finer than Pε, we get |R(f, P, {tk}) − R(f, Q, {sj})| ≤ |R(f, P, {tk}) − R(f, Pε, {xi})| + |R(f, Pε, {xi}) − R(f, Q, {sj})| < 2ε. This shows (iii). The proof that (iii)⇒(i) is similar to (iii)⇒(i) in Theorem 669, and shall be omitted.

The Exercises 13.410–13.412 discuss the definition of the Henstock–Kurzweil integral, also called the gauge integral. In these few exercises, a function g on a bounded closed interval [a, b] will be called a gauge function or, simply, a gauge, if g(x) > 0 for all x ∈ [a, b]. Recall that a partition P of [a, b], say P := {a = x0 < x1 < x2 < . . . < xn = b}, together with points zi ∈ [xi−1, xi], i = 1, 2, . . . , n, is called a tagged partition of [a, b]. The Riemann sum associated with this tagged partition P is

\[ R(f, P) := \sum_{i=1}^{n} f(z_i)(x_i - x_{i-1}). \]

If g is a gauge on [a, b], then the tagged partition P is said to be g-fine if xi − xi−1 ≤ g(zi) for every i = 1, 2, . . . , n. An important thing here is the following, due to the French mathematician P. Cousin.

Lemma 1079 (Cousin's lemma) For every gauge g, there is a tagged partition P of [a, b] that is g-fine.

Definition 1080 A function f on the interval [a, b] is said to have a Henstock–Kurzweil integral I on [a, b], if for every ε > 0, there is a gauge g such that for every g-fine tagged partition P of [a, b], the Riemann sum R(f, P) satisfies |R(f, P) − I| ≤ ε.

13.410 Prove Lemma 1079.
Hint. Put S := {x ∈ [a, b] : there is a g-fine tagged partition on [a, x]} and show that, in the metric space ([a, b], dabs), where dabs is the restriction to [a, b] of the usual metric on R, the set S is simultaneously open and closed. Then use Corollary 104 to prove that S = [a, b], since S is nonempty.

13.411 Show that f has a Henstock–Kurzweil integral on [a, b] whenever f has a Riemann integral.
Hint. Note that the difference between the Riemann and Henstock–Kurzweil integrals consists of considering only constant gauge functions in the definition of the Henstock–Kurzweil integral.
We note that every Lebesgue integrable function as well as every improper Riemann integrable function on [a, b] is Henstock–Kurzweil integrable.
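The notions of a g-fine tagged partition and of its Riemann sum translate directly into code. The sketch below represents a tagged partition by a list of points x0 < . . . < xn and a list of tags z1, . . ., zn; this representation and the function names are assumptions of the example. With a constant gauge, as in the hint to Exercise 13.411, being g-fine only bounds the mesh of the partition.

```python
def is_g_fine(points, tags, gauge):
    """Check z_i in [x_{i-1}, x_i] and x_i - x_{i-1} <= gauge(z_i) for every i."""
    return all(
        left <= z <= right and right - left <= gauge(z)
        for left, right, z in zip(points, points[1:], tags)
    )

def riemann_sum(f, points, tags):
    """R(f, P) = sum of f(z_i) * (x_i - x_{i-1}) over the tagged partition."""
    return sum(
        f(z) * (right - left)
        for left, right, z in zip(points, points[1:], tags)
    )

n = 200
points = [i / n for i in range(n + 1)]                     # uniform partition of [0, 1]
tags = [(points[i] + points[i + 1]) / 2 for i in range(n)]  # midpoint tags

constant_gauge = lambda x: 0.01    # the partition above has mesh 0.005
tiny_gauge = lambda x: 0.001       # too small for this partition

value = riemann_sum(lambda x: x * x, points, tags)          # close to 1/3
```

The partition is fine for `constant_gauge` but not for `tiny_gauge`, and `value` approximates the integral of x^2 over [0, 1].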
13.412 Show that 1 − D, where D denotes the Dirichlet function (i.e., the characteristic function on the set P of all irrational numbers, see Definition 296) has a Henstock–Kurzweil integral on [0, 1]. Hint. The function 1−D is the characteristic function χQ of the rational numbers. Let {rk }∞ k=1 = Q ∩ (0, 1]. Given ε > 0, define a gauge function g on [0, 1] as follows: g(x) = 1 if x is irrational, and g(rk ) = 2rkk for k ∈ N. Let a P be a g-fine tagged partition P of [a, b]. Since χQ (x) = 0 is x is irrational, the Riemann sum for P equals the summation extended only over such intervals in P that have the corresponding zi rational. However, the total length of these intervals is at most 2ε, as each rk may be in a maximum of two intervals (as the endpoint). Some of the exercises in this subsection may have a shorter solution by using more advanced results—as we suggest in the corresponding hint. Solving them by using basic ingredients may help the reader to familiarize with the basic mechanisms of the theory. 13.413 Finda function on [0, 1] that is bounded on no subinterval. p Hint. f q := q. 13.414 Assume that a real-valued function f is unbounded on a bounded closed interval [a, b]. Show that there is a point c ∈ [a, b] such that f is unbounded on every neighborhood of c. Use this to see that the Riemann integral has to be defined for bounded functions only. Hint. Compactness of [a, b]. 13.415 Let (a, b) a general interval in R. We say that a function f : (a, b) → R is Newton integrable on (a, b), if f has a primitive function F (i.e., a function F that is differentiable in (a, b), F (x) = f (x) on (a, b), and the one-sided finite limits b F (a+) and F (b−) exist). The number (N) a f (t)dt := F (b−) − F (a+) is called the Newton integral of f on (a, b). Let the function f be defined for x ∈ [0, 1] by & 1 if x = n1 for some n ∈ N, f (x) = 0 otherwise. Show that f has a Riemann integral on [0, 1] (equal to 0) and that f does not have a Newton integral on (0, 1). 
1 Hint. First show that L(f , P ) = 0 for all partitions P ∈ P[a, b]. Thus f = 0. 0 Put S := {1/n : n ∈ N}. Given ε ∈ (0, 1), the set Sε := S ∩ [ε, 1] is finite. Then, clearly we can construct a partition P ∈ P[0, 1] of the form P := {a0 = 0, a1 = ε, a2 , . . ., an } such that the sum of the lengths of the intervals (ak−1 , ak ) that intersect Sε is less that ε. It follows that U (f , P ) < ε + ε = 2ε. Since ε ∈ (0, 1) was arbitrary, 1 we get 0 f = 0, and f is Riemann integrable in (0, 1) (with Riemann integral 0). The function f has no primitive, since it does not have the intermediate value property (see Corollary 449). A different (more elaborated) approach to the solution is to use the characterization of Riemann integrable functions (Theorem 777). Indeed,
13.5 Integration
747
the function $f$ has only a countable number of discontinuities, hence it is Riemann integrable on $(0,1)$. Since it differs from the function $0$ only at a countable number of points, the Lebesgue integral of $f$ is $0$ (see Remark 772). Finally, use Theorem 783 to conclude that the Riemann integral of $f$ on $(0,1)$ is also $0$.

13.416 Assume that $[a,b]$ is a closed and bounded interval in $\mathbb{R}$ and let $f \in \mathcal{R}[a,b]$. Show in an elementary way that $f$ has at least one point of continuity in $[a,b]$.
Hint. Since $U(f,P) - L(f,P) \le \varepsilon$ for some $P \in \mathcal{P}[a,b]$, we see that on some subinterval of $P$, $\sup f - \inf f$ must be small. Repeat this argument on that subinterval and use the Nested Interval Theorem 69 to locate a point of continuity. Note that in this way we get that any nondegenerate open subinterval of $[a,b]$ contains a point of continuity of $f$. Another argument uses Theorem 777: The complement in $[a,b]$ of the set of points of continuity of $f$ in $[a,b]$ is a null set. It follows again that every nondegenerate open subinterval of $[a,b]$ contains a point of continuity of $f$.

13.417 Assume that $f$ has a Riemann integral on $[a,b]$ and that $f(x) > 0$ for every $x \in [a,b]$. Show that $\int_a^b f > 0$.
Hint. Use Exercise 13.416: If $x_0$ is a point of continuity of $f$, show that there is $\delta > 0$ such that $f(x) > \delta$ on some interval $I \subset [a,b]$ around $x_0$ of positive length, say $d$. Then $\int_I f \ge \delta d > 0$, and $\int_a^b f \ge \int_I f$ because $f > 0$.

13.418 Let $f$ be a continuous real-valued function defined on an interval $[a,b]$, $a < b$. Assume that $f(x) \ge 0$ for all $x \in [a,b]$ and that $f$ is not identically zero. Show that $\int_a^b f(x)\,dx > 0$. Compare with Exercise 13.417.
Hint. If $f$ is not identically zero, there exists $c \in [a,b]$ such that $f(c) > 0$. By continuity, find $\delta > 0$ such that $f(x) > f(c)/2$ for all $x \in [a,b] \cap [c-\delta, c+\delta]$. Then $\int_a^b f(x)\,dx \ge \int_u^v f(x)\,dx > 0$, where $[u,v] := [a,b] \cap [c-\delta, c+\delta]$.

13.419 Let $f$ be the function defined on $[0,1]$ by $f(x) = xD(x)$, where $D$ is the Dirichlet function (see Definition 296). Show that $\int_0^1 f$ does not exist as a Riemann integral.
Hint. If it exists, $\int_{1/2}^1 f$ exists. Also, $\int_{1/2}^1 \frac{1}{x}\,dx$ exists, because $1/x$ is a continuous function on $[1/2,1]$ (see Proposition 674). Note also that the product of Riemann integrable functions is Riemann integrable (see Corollary 780). Hence $\int_{1/2}^1 D = \int_{1/2}^1 \frac{1}{x} f(x)\,dx$ exists, which is a contradiction (see Remark 673.2).
13.420 Let $f$ be a continuous function on $[0,1]$. Assume that $0 \le f(x) \le 1$ for every $x \in [0,1]$ and that $\max\{f(x) : x \in [0,1]\} = 1$. Show that $\lim_{n\to\infty} \left(\int_0^1 f^n\right)^{1/n} = 1$.
Hint. For every $0 < \delta < 1$ choose an interval $I$ with length $\varepsilon > 0$ such that $f > \delta$ on $I$. Then estimate $\int_0^1 f^n \ge \int_I f^n \ge \int_I \delta^n = \varepsilon\delta^n$. Finally, take the $n$th root and use that $\lim_{n\to\infty} \varepsilon^{1/n} = 1$; the upper bound follows from $f \le 1$.

13.421 Assume that the function $f$ is bounded on $[a,b]$ and continuous everywhere but at one point. Show that $f \in \mathcal{R}[a,b]$ without using the Riemann criterion (Theorem 777).
13 Exercises
Hint. Without loss of generality, we may assume that the point of discontinuity is $a$. Let $|f|$ be bounded by $M$. Given $\varepsilon > 0$ choose $0 < \delta < \frac{\varepsilon}{4M}$ such that $a + \delta < b$. Then, use the fact that $f$ is continuous on $[a+\delta, b]$ and thus is Riemann integrable there to find a partition $P \in \mathcal{P}[a+\delta, b]$ such that $U(f,P) - L(f,P) < \varepsilon/4$. Join to this partition $P$ the point $a$ to form a partition $P' \in \mathcal{P}[a,b]$, and then show that $U(f,P') - L(f,P') < \varepsilon$.

13.422 Show that if a real-valued function $f$ on $[a,b]$ has finite one-sided limits at every point, then its Riemann integral $\int_a^b f$ exists.
Hint. First show that $f$ is bounded. This can be done by contradiction, extracting a convergent subsequence from "bad" points. Precisely, if for every $n \in \mathbb{N}$ a point $x_n \in [a,b]$ exists such that $|f(x_n)| \ge n$, extract from the sequence $\{x_n\}_{n=1}^\infty$ a subsequence $\{x_{n_k}\}_{k=1}^\infty$ that converges to some $x \in [a,b]$. We may assume, without loss of generality, that $\{x_{n_k}\}_{k=1}^\infty$ is monotone. It follows that $f$ cannot have both one-sided limits finite at $x$. Note that the function $f$ has only jump discontinuities. Fix $n \in \mathbb{N}$ and let $F_n := \{x \in [a,b] : |f(x+) - f(x-)| \ge 1/n\}$. We claim that this set is finite. Indeed, if not, by Theorem 97 its closure has an accumulation point, say $x \in [a,b]$, and at this point $f$ fails the condition of having finite one-sided limits. Since this is true for every $n \in \mathbb{N}$ and the set of all discontinuity points of $f$ is $\bigcup_{n=1}^\infty F_n$, the Riemann integrability follows from the fact that this set is countable, and so null (see Theorem 777).

13.423 Assume that $f$ is a real-valued bounded function defined on $(0,1)$ and extend it by $0$ to a function on $[0,1]$. Assume that this extended function is Riemann integrable. Then, if the function is extended by any other value, it is again Riemann integrable and has the same Riemann integral. Prove this.
Hint. Look at Proposition 672.

13.424 Use Taylor's expansion to evaluate the integral
\[ \int_0^1 \frac{\ln(1+x)}{x}\,dx. \]
Hint. Recall that $\ln(1+x) = \sum_{n=1}^\infty (-1)^{n-1} x^n/n$ for $x \in (-1,1]$ (see formula (5.80)). It follows that $(\ln(1+x))/x = \sum_{n=1}^\infty (-1)^{n-1} x^{n-1}/n$ for all $x \in (-1,0) \cup (0,1]$. The series $\sum_{n=1}^\infty (-1)^{n-1}/n$ converges (see Corollary 183), hence Abel's Theorem 518 implies that the series
\[ \sum_{n=1}^\infty (-1)^{n-1} x^{n-1}/n \tag{13.32} \]
is uniformly convergent on $[0,1]$. In particular, its sum is a continuous function $f$ on $[0,1]$. On the other hand, the function $g(x) := (\ln(1+x))/x$ (see Fig. 13.58) is continuous on $(0,1]$, and it has a limit at $x = 0$ (use, for example, L'Hôspital Theorem 376). Denote this limit by $g(0)$; in this way $g$ becomes a continuous function on $[0,1]$ that agrees with the continuous function $f$ on $(0,1]$. It follows that $f(x) = g(x)$ for
Fig. 13.58 The graph of ln (1 + x)/x on [−1, 1] (Exercise 13.424)
all $x \in [0,1]$. The sequence of partial sums $\{S_n\}_{n=1}^\infty$ of the series (13.32) then converges uniformly to $(\ln(1+x))/x$ on $[0,1]$, and so, by Corollary 697, we have
\[ \int_0^1 S_n(x)\,dx \to \int_0^1 \frac{\ln(1+x)}{x}\,dx \quad \text{as } n \to \infty. \]
Note that
\[ \int_0^1 S_n(x)\,dx = \sum_{i=1}^n (-1)^{i-1}\frac{1}{i^2}. \tag{13.33} \]
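The sum of this series (and so the value of the integral) is $\pi^2/12$ (see Exercise 13.529).

13.425 Let $f$ be a continuous and Lebesgue integrable function on $\mathbb{R}$. Show that for almost all $x \in \mathbb{R}$, we have $\lim_{n\to\infty} \frac{f(nx)}{n} = 0$.
Hint. $\int_{\mathbb{R}} \frac{|f(nx)|}{n}\,dx = \frac{1}{n^2}\int_{\mathbb{R}} |f(x)|\,dx$. Therefore, $\sum_{n=1}^\infty \int_{\mathbb{R}} \frac{|f(nx)|}{n}\,dx < \infty$. It follows from Theorem 743 that the function $\sum_{n=1}^\infty \frac{|f(nx)|}{n}$ is Lebesgue integrable. Had the set
\[ \left\{ x \in \mathbb{R} : \sum_{n=1}^\infty \frac{|f(nx)|}{n} = +\infty \right\} \]
positive Lebesgue measure, it would contradict the Lebesgue integrability of the function $\sum_{n=1}^\infty \frac{|f(nx)|}{n}$. This shows that for almost all $x \in \mathbb{R}$ we have $\sum_{n=1}^\infty \frac{|f(nx)|}{n} < \infty$ (see, related to this, the argument in Exercise 13.505), and this implies that $\lim_{n\to\infty} \frac{f(nx)}{n} = 0$ for almost all $x \in \mathbb{R}$.

13.426 Prove the statement in Remark 701.2.
Hint. Assume first that $f \in \mathcal{R}[a,b]$. Then $f$ is bounded. Fix $\varepsilon, \eta > 0$. By Proposition 664, there exists $P := \{a = x_0 < x_1 < \ldots < x_n = b\} \in \mathcal{P}[a,b]$ such that $U(f,P) - L(f,P) < \varepsilon\eta$. Let $\{\Delta_i\}_{i=1}^j$ be the subintervals in $P$ for which $\omega(f,\Delta_i) > \eta$. Then, if $M_i := \sup\{f(x) : x \in \Delta_i\}$ and $m_i := \inf\{f(x) : x \in \Delta_i\}$ for $i = 1, 2, \ldots, n$,
\[ \eta \sum_{i=1}^j \lambda(\Delta_i) \le \sum_{i=1}^j (M_i - m_i)\lambda(\Delta_i) \le U(f,P) - L(f,P) < \varepsilon\eta, \]
so the total length of those subintervals is less than $\varepsilon$. Conversely, assume that $|f| \le K$ on $[a,b]$, fix $\varepsilon > 0$, and choose a partition $P$ such that the sum of the lengths of the subintervals on which the oscillation of $f$ exceeds $\varepsilon/(2(b-a))$ is less than $\varepsilon/(4K)$. Those subintervals are $\{\Delta_i : i = 1, 2, \ldots, j\}$. Thus, we have
\[ U(f,P) - L(f,P) = \sum_{i=1}^n (M_i - m_i)\lambda(\Delta_i) = \sum_{i=1}^j (M_i - m_i)\lambda(\Delta_i) + \sum_{i=j+1}^n (M_i - m_i)\lambda(\Delta_i) \le 2K\,\frac{\varepsilon}{4K} + \frac{\varepsilon}{2(b-a)}(b-a) = \varepsilon, \]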
and this shows, by Proposition 664, that $f \in \mathcal{R}[a,b]$.

Functions Defined by Integrals

13.427 Let the function $f$ be defined on $\mathbb{R}$ by
\[ f(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \text{ and} \\ 1 & \text{if } 1 < x. \end{cases} \]
Find an antiderivative $F$ of $f$ on $\mathbb{R}$.
Hint. See Fig. 13.59. Corollary 683 can be applied to justify that $f$ has an antiderivative on the interval, say, $[-1,2]$, since $f$ is a continuous function. Then, we may use the Fundamental Theorem of Calculus 685 to ensure that $F(x) = \int_{-1}^x f$ is an antiderivative. According to Definition 681, $F$ is differentiable on $(-1,2)$ and $F'(x) = f(x)$ on $(-1,2)$. The interval $[-1,2]$ was chosen in order to include the "pathological" points $0$, $1$, where $f$ is not differentiable. Carrying out the computation, we finally get
\[ F(x) := \int_{-1}^x f = \begin{cases} 0 & \text{if } x < 0, \\ \frac{1}{2}x^2 & \text{if } 0 \le x \le 1, \text{ and} \\ x - \frac{1}{2} & \text{if } x > 1. \end{cases} \]
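The formula for $F$ in Exercise 13.427 can be checked numerically. The following Python sketch (not part of the original hint; the helper name `central_difference` and the step size are illustrative assumptions) compares a central difference quotient of $F$ with $f$, including at the two points $0$ and $1$ where the definition of $f$ changes.

```python
def f(x):
    """The piecewise function of Exercise 13.427."""
    if x < 0:
        return 0.0
    if x <= 1:
        return float(x)
    return 1.0


def F(x):
    """Candidate antiderivative from the hint: the integral of f from -1 to x."""
    if x < 0:
        return 0.0
    if x <= 1:
        return 0.5 * x * x
    return x - 0.5


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


if __name__ == "__main__":
    for x in (-0.5, 0.0, 0.3, 1.0, 1.7):
        # The two printed values should agree up to the discretization error.
        print(x, central_difference(F, x), f(x))
```

A difference quotient only gives numerical evidence; the proof that $F' = f$ is the one given by the Fundamental Theorem of Calculus.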
Fig. 13.60 The function in Exercise 13.428
Fig. 13.61 The functions f and f 2 in Exercise 13.429
13.428 Let the function $f$ be defined on $[-1,1]$ by
\[ f(x) = \begin{cases} 2x\sin\frac{1}{x} - \cos\frac{1}{x} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases} \]
(See Fig. 13.60.) Show that
(a) $f$ is not continuous at $0$ but it is continuous at every other point.
(b) $f$ has an antiderivative on $[-1,1]$ (see Definition 681).
(c) $f \in \mathcal{R}[-1,1]$; calculate $\int_{-1}^1 f$.
Hint. Consider Example 4.5.8.3. For (a), check the points $x_n = \frac{1}{2\pi n}$, $n \in \mathbb{N}$. (b) Consider the function $g$ defined by $g(x) = x^2\sin\frac{1}{x}$ for $x \ne 0$ and $g(0) = 0$. (c) The function $f$ is bounded and is continuous everywhere but at one point. Use Exercise 13.421 and Theorem 685.

13.429 Show that there exists a differentiable function $F$ on $\mathbb{R}$ such that $F'(x) = f(x)$, where $f(x) := \sin(1/x)$ for all $x \in \mathbb{R} \setminus \{0\}$, and $f(0) := 0$, although this is not the case for the function $f^2$ (see Fig. 13.61).
Hint. The function $f$ is Riemann integrable on $[0,b]$, for any $b > 0$. Proposition 682 shows that the continuous function $F(x) := \int_0^x \sin(1/t)\,dt$, defined for $x \in [0,b]$, is differentiable on $(0,b]$, and $F'(x) = f(x)$ at every point $x \in (0,b]$. Let us prove that $F$ has a right-hand side derivative $0$ at $x = 0$. To this end, put
\[ Q(x) := \frac{F(x) - F(0)}{x}, \quad \text{for } x \in (0,b]. \]
We shall compute $\lim_{x\to 0+} Q(x)$. A change of variable shows that, for $y = 1/x$ and $x \in (0,b]$,
\[ Q(x) = y\int_y^{+\infty} \frac{\sin u}{u^2}\,du. \]
The function $u \mapsto 1/u^2$ is decreasing on $[1/b, +\infty)$, hence we can apply the Second Mean Value Theorem 694 for the Riemann integral to an interval $[y,Y]$, where $1/b \le y < Y$, to get $\xi(Y) \in [y,Y]$ such that
\[ y\int_y^Y \frac{\sin u}{u^2}\,du = \frac{1}{y}\int_y^{\xi(Y)} \sin u\,du + \frac{y}{Y^2}\int_{\xi(Y)}^Y \sin u\,du = \frac{1}{y}\bigl(\cos y - \cos\xi(Y)\bigr) + \frac{y}{Y^2}\bigl(\cos\xi(Y) - \cos Y\bigr). \tag{13.34} \]
Note that the improper Riemann integral $\int_y^{+\infty} (\sin u)/u^2\,du$ exists (see Corollary 712 and Theorem 718), hence the limit of the expression in (13.34) exists as $Y \to +\infty$. We can then find $\xi \ge y$ such that
\[ (Q(x) =)\ y\int_y^{+\infty} \frac{\sin u}{u^2}\,du = \frac{1}{y}(\cos y - \cos\xi). \tag{13.35} \]
If we now let $x \to 0+$ (i.e., $y \to +\infty$) in (13.35), we obtain $Q(x) \to 0$, so $F'_+(0) = 0$. The argument for proving that $F'_-(0) = 0$ is similar.

Regarding the function $f^2$, assume that there exists a differentiable function $G$ such that $G'(x) = f^2(x)$ for all $x \in \mathbb{R}$. In particular, the function $G|_{[-1,1]}$ is an antiderivative of $f^2|_{[-1,1]}$, so, by Theorem 685, $G(x) = \int_0^x f^2(t)\,dt + C$ for all $x \in [-1,1]$, where $C$ is a constant. In particular, $G'(0) = f^2(0) = 0$. However, we shall prove that $G'(0)$ exists and equals $1/2$, a contradiction. The argument is quite similar to the one used in the first part, so we shall just sketch it: By using $\sin^2\alpha = (1/2)(1 - \cos 2\alpha)$, we have, for $x > 0$,
\[ G(x) - G(0) = \frac{1}{2}x - \frac{1}{2}\int_0^x \cos\frac{2}{t}\,dt = \frac{1}{2}x - \int_0^{x/2} \cos\frac{1}{u}\,du = \frac{1}{2}x - \int_{2/x}^{+\infty} \frac{\cos s}{s^2}\,ds. \]
For $Y > 2/x$ we get, using Theorem 694,
\[ \int_{2/x}^Y \frac{\cos s}{s^2}\,ds = \frac{x^2}{4}\int_{2/x}^{\xi(Y)} \cos s\,ds + \frac{1}{Y^2}\int_{\xi(Y)}^Y \cos s\,ds = \frac{x^2}{4}\Bigl(\sin\xi(Y) - \sin\frac{2}{x}\Bigr) + \frac{1}{Y^2}\bigl(\sin Y - \sin\xi(Y)\bigr), \]
for some $\xi(Y) \in [2/x, Y]$. Letting $Y \to +\infty$, and noting that $\int_{2/x}^{+\infty} \frac{\cos s}{s^2}\,ds$ exists as an improper Riemann integral, we get
\[ \int_{2/x}^{+\infty} \frac{\cos s}{s^2}\,ds = \frac{x^2}{4}\Bigl(\sin\xi - \sin\frac{2}{x}\Bigr) \]
for some $\xi \ge 2/x$, hence
\[ \frac{G(x) - G(0)}{x} = \frac{1}{2} - \frac{x}{4}\Bigl(\sin\xi - \sin\frac{2}{x}\Bigr), \]
for some $\xi \ge 2/x$. Letting now $x \to 0$ we get $G'(0) = 1/2$, as we wanted to show.

13.430 Let $a$ and $b$ be real numbers such that $a < b$. Find
\[ \frac{d}{dx}\int_a^b \sin x^2\,dx, \qquad \frac{d}{da}\int_a^b \sin x^2\,dx, \qquad \frac{d}{db}\int_a^b \sin x^2\,dx. \]
Hint. $0$, $-\sin a^2$, $\sin b^2$.

13.431 Find
\[ \frac{d}{dx}\int_0^{x^2} \sqrt{1+t^2}\,dt. \]
Hint. First change the variable ($t = u^2$). Then, use the Fundamental Theorem of Calculus (Theorem 685) to get $2x\sqrt{1+x^4}$.

13.432 Find
\[ \lim_{x\to 0} \frac{\int_0^x \sin t^2\,dt}{x}. \]
Hint. $0$, by L'Hôspital's rule (Theorem 376) and the Fundamental Theorem 685 of calculus for the Riemann integral.
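The answers to Exercises 13.431 and 13.432 can be illustrated numerically. The sketch below is not taken from the book: it assumes a composite Simpson rule (the helper `simpson`) as the quadrature method, and the names `G`, `dG_numeric`, `dG_formula` and `quotient` are hypothetical.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def G(x):
    """G(x) = integral of sqrt(1 + t^2) over [0, x^2], as in Exercise 13.431."""
    return simpson(lambda t: math.sqrt(1 + t * t), 0.0, x * x)


def dG_numeric(x, h=1e-4):
    """Central difference quotient approximating G'(x)."""
    return (G(x + h) - G(x - h)) / (2 * h)


def dG_formula(x):
    """Value predicted by the hint: 2x * sqrt(1 + x^4)."""
    return 2 * x * math.sqrt(1 + x ** 4)


def quotient(x):
    """The quotient of Exercise 13.432 for x != 0."""
    return simpson(lambda t: math.sin(t * t), 0.0, x) / x


if __name__ == "__main__":
    print(dG_numeric(1.3), dG_formula(1.3))
    print([quotient(x) for x in (0.1, 0.01, 0.001)])  # values shrink towards 0
```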
Convergence Theorems

13.433 Show that there is a sequence $\{f_n\}$ of nonnegative real-valued Riemann integrable functions on $[0,1]$ such that $\lim_{n\to\infty} \int_0^1 f_n\,dx = 0$ and yet $\{f_n(x)\}$ converges for no $x \in [0,1]$.
Hint. Consider the characteristic functions of the following intervals: $[0,1]$, $[0,1/2]$, $[1/2,1]$, $[0,1/4]$, $[1/4,1/2]$, $[1/2,3/4]$, $[3/4,1]$, etc.

13.434 Show that there is a uniformly bounded sequence $\{f_n\}$ of real-valued continuous functions $f_n$ on $[0,1]$ that converges pointwise to a function $f$ that is not equivalent to any Riemann integrable function. This sequence $\{f_n\}$ thus converges to $f$ in $L_1[0,1]$ (by the Dominated Convergence Theorem 750) and is Cauchy in the normed space $R_1$ consisting of all the real-valued functions defined on $[0,1]$ that have a Riemann integral, endowed with the restriction
of the canonical norm on $L_1[0,1]$. This proves that $R_1$, with this norm, is not a Banach space, and shows the importance of the Lebesgue integral for completeness.
Hint. Fill in the details in the following argument: Let $C \subset [0,1]$ be a Cantor ternary set of positive measure (see Sect. 3.1.5), let $f$ be the characteristic function of $C$, and for $n \in \mathbb{N}$ and $x \in [0,1]$ let $f_n(x) = \frac{1}{1 + n\,d(x,C)}$, where $d$ denotes the distance function to $C$. Observe that each $f_n$ is a continuous function (see Proposition 557), and that the sequence $\{f_n\}$ converges pointwise to $f$. Let us see that $f$ is not equivalent to any Riemann integrable function. Arguing by contradiction, assume that $h$ is a Riemann integrable function that equals $f$ (a.e.). Let $N := \{x : f(x) \ne h(x), \text{ or } h \text{ is discontinuous at } x\}$. Then $\lambda(N) = 0$, hence $\lambda(C \setminus N) > 0$. Thus, there is $p \in C \setminus N$ such that $0 < p < 1$. In particular, $f(p) = h(p) = 1$ and $h$ is continuous at $p$. If $J$ is any open interval with $p \in J \subset [0,1]$, then the open set $V = J \cap ((0,1) \setminus C)$ is not empty (recall that $C$ does not contain any nondegenerate interval), and thus $0 < \lambda(V)$ $(= \lambda(V \setminus N))$. Thus, there is $x \in J$ such that $h(x) = f(x) = 0$. Thus, we can choose a sequence $\{x_k\}$ such that $\lim x_k = p$ and $\lim h(x_k) = 0 \ne 1 = h(p)$. This contradicts the continuity of $h$ at $p$.

13.435 Find an example of a uniformly bounded sequence of real-valued continuous functions on $[0,1]$ that converges pointwise to a function that is not Riemann integrable.
Hint. In Exercise 13.302 take for $C$ the Cantor ternary set of positive measure (see the paragraph after Remark 686).

Change of Variable. Integration by Parts

13.436 Calculate the area encircled by the ellipse
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \]
where $a$ and $b$ are two positive constants.
Hint. From the equation of the ellipse, $y = b\sqrt{1 - \left(\frac{x}{a}\right)^2}$ on the upper half. The area of a quarter of the ellipse, by two changes of variable, is
\[ \int_0^a b\sqrt{1 - \Bigl(\frac{x}{a}\Bigr)^2}\,dx = ba\int_0^1 \sqrt{1 - t^2}\,dt = ab\int_0^{\pi/2} \sqrt{1 - \sin^2 u}\,\cos u\,du = ab\int_0^{\pi/2} \cos^2 u\,du = ab\int_0^{\pi/2} \frac{1 + \cos 2u}{2}\,du = ab\,\frac{\pi}{4}. \]
So, the result is $\pi ab$.

13.437 Compute $\int_0^1 \frac{x}{\sqrt{x+1}}\,dx$.
Hint. Use the substitution $\sqrt{x+1} = t$ (see item 1 in Integration of Algebraic Irrationals, Subsection 13.5.2 below).
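The two changes of variable in the hint to Exercise 13.436 can be illustrated numerically. The following sketch is an illustration under stated assumptions, not the book's method: it uses a midpoint rule for the original integrand (whose derivative is unbounded at $x = a$) and a Simpson rule for the smooth integrand obtained after the substitution; all function names are hypothetical.

```python
import math


def midpoint(func, a, b, n):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def quarter_area_direct(a, b, n=20000):
    """Integral of b*sqrt(1 - (x/a)^2) over [0, a], before any substitution."""
    return midpoint(lambda x: b * math.sqrt(max(0.0, 1 - (x / a) ** 2)), 0.0, a, n)


def quarter_area_substituted(a, b):
    """The same quantity after x = a*t and t = sin u: a*b * integral of cos^2 u over [0, pi/2]."""
    return a * b * simpson(lambda u: math.cos(u) ** 2, 0.0, math.pi / 2)


if __name__ == "__main__":
    a, b = 3.0, 2.0
    print(4 * quarter_area_direct(a, b), 4 * quarter_area_substituted(a, b), math.pi * a * b)
```

The substituted form converges much faster, which is one practical reason for removing the square-root singularity before integrating numerically.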
Fig. 13.62 The function $x/\sqrt{x+1}$ on $[0,1]$ (Exercise 13.437)
Fig. 13.63 The function under the integral sign in Exercise 13.439, for several values of a
13.438 Let $f$ denote the Takagi–van der Waerden function introduced in Definition 481. Prove that $f \in \mathcal{R}[0,1]$ and compute $\int_0^1 f$.
Hint. This function is continuous (see Proposition 482); according to Proposition 674, it is Riemann integrable when restricted to any closed and bounded interval in $\mathbb{R}$. We shall show that $\int_0^1 f = \frac{1}{3}$. Indeed, for each $n$ consider the function $f_n(x) := \frac{1}{4^n}\varphi(4^n x)$, where the function $\varphi$ comes from the construction of the Takagi–van der Waerden function. By Theorem 702 we get
\[ \int_0^1 f_n = \frac{1}{16^n}\int_0^{4^n} \varphi = \frac{1}{4^n}. \]
It was proved in Proposition 482 that the series $\sum_{n=1}^\infty f_n$ converges uniformly to the function $f$. Obviously, the sequence $\{s_n\}_{n=1}^\infty$ of partial sums is uniformly bounded. We can then apply Corollary 697 to obtain
\[ \int_0^1 f = \sum_{n=1}^\infty \int_0^1 f_n = \sum_{n=1}^\infty \frac{1}{4^n} = \frac{1}{3}. \]

Parametric Integrals

13.439 For $a > -1$ evaluate
\[ H(a) := \int_0^{+\infty} \frac{1 - e^{-ax^2}}{x e^{x^2}}\,dx. \tag{13.36} \]
Hint. For a picture of the graph of the function under the integral sign for some values of $a$, see Fig. 13.63. Apply Theorem 802 in the following setting (we follow the notation there): Choose $-1 < A < 0$. Put $M := [0,+\infty)$, $I = (A,+\infty)$, $f(x,a) := (1 - e^{-ax^2})/(x e^{x^2})$ for $(x,a) \in M \times I$. Observe that the integral in (13.36) trivially exists for $a = 0$. Note too that $\partial f(x,a)/\partial a = x e^{-(a+1)x^2}$ for $(x,a) \in M \times I$. The function $g(x) := x e^{-(A+1)x^2}$ has a finite improper Riemann integral in $M$. Since $g$ is positive on $M$, we can apply Proposition 789 in order to conclude that $g$ is Lebesgue integrable on $M$. Clearly $g$ dominates all functions $\partial f(x,a)/\partial a$ on $M \times I$. By Theorem 802, the function $H$ is differentiable at each $a \in I$; moreover, $H'(a) = \frac{1}{2(a+1)}$. Thus, $H(a) = \frac{1}{2}\ln(a+1) + K$, where $K$ is a constant. Since $H(0) = 0$, we get $K = 0$. This argument applies to any $A \in (-1,0)$, so in fact we get $H(a) = (1/2)\ln(a+1)$ for all $a > -1$.
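The closed form $H(a) = \frac{1}{2}\ln(a+1)$ can be compared with a direct numerical evaluation of (13.36). The sketch below rests on two assumptions that are not in the book: the improper integral is truncated at $x = 8$ (the integrand is negligible beyond that point for the values of $a$ used), and a Simpson rule is used for the quadrature. The names `integrand` and `H_numeric` are hypothetical.

```python
import math


def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def integrand(x, a):
    """(1 - exp(-a x^2)) / (x exp(x^2)), extended by its limit 0 at x = 0."""
    if x == 0.0:
        return 0.0
    # -expm1(-a x^2) equals 1 - exp(-a x^2) but is more accurate for small x.
    return -math.expm1(-a * x * x) * math.exp(-x * x) / x


def H_numeric(a, upper=8.0):
    """Numerical approximation of H(a); the cutoff `upper` is an assumption."""
    return simpson(lambda x: integrand(x, a), 0.0, upper)


if __name__ == "__main__":
    for a in (-0.5, 0.0, 1.0, 3.0):
        print(a, H_numeric(a), 0.5 * math.log(1 + a))
```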
13.5.2 Review of Some Frequently Used Techniques for Calculating Antiderivatives
This subsection discusses some techniques for finding antiderivatives of some functions. These techniques are important for other purposes as well. For more in this direction see [Val74] and [Gou59].

Remark 1081 First of all, note that there are elementary functions (i.e., combinations of polynomials, roots, trigonometric functions and their inverses, power functions, logarithmic functions, etc.) that do not possess elementary antiderivatives. For example:
\[ e^{x^2}, \quad \frac{\sin x}{x}, \quad \text{etc.} \]
For a discussion of the impossibility of expressing some antiderivatives in terms of elementary functions see, e.g., [Con05].

Integration by Parts

As a reminder of the present technique, we write
\[ \int f'g = fg - \int fg'. \]
For the precise formulation, see Theorem 705.

13.440 $\int x\ln x\,dx$.
Hint. Put $f' = x$, $g = \ln x$. We get
\[ \int x\ln x\,dx = \frac{x^2}{2}\ln x - \frac{1}{2}\int x\,dx = \frac{x^2}{2}\ln x - \frac{1}{4}x^2 + C \]
for an arbitrary constant $C$.
13.441 $\int \arctan x\,dx$.
Hint. Put $f' = 1$, $g = \arctan x$. We get
\[ \int 1\cdot\arctan x\,dx = x\arctan x - \frac{1}{2}\int \frac{2x}{1+x^2}\,dx = x\arctan x - \frac{1}{2}\ln(1+x^2) + C \]
for an arbitrary constant $C$.

13.442 $\int x\sin x\,dx$.
Hint. Put $f' = \sin x$, $g = x$. We get
\[ \int x\sin x\,dx = -x\cos x + \int \cos x\,dx = -x\cos x + \sin x + C \]
for an arbitrary constant $C$.

13.443 $\int e^x\sin x\,dx$.
Hint. By parts, twice. First, $f' = \sin x$, $g = e^x$, then $f' = \cos x$, $g = e^x$. We get
\[ \int e^x\sin x\,dx = -e^x\cos x + \int e^x\cos x\,dx = -e^x\cos x + e^x\sin x - \int e^x\sin x\,dx. \]
Thus
\[ 2\int e^x\sin x\,dx = e^x(\sin x - \cos x) + C, \]
where $C$ is an arbitrary constant, so
\[ \int e^x\sin x\,dx = \frac{1}{2}e^x(\sin x - \cos x) + D \]
for an arbitrary constant $D$.
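The antiderivatives obtained by parts in Exercises 13.440 to 13.443 can be sanity-checked by numerical differentiation. The following sketch is only an illustration: the table `CASES`, the helper `central_difference` and the step size are assumptions made here, and a small difference at a few sample points is evidence, not a proof.

```python
import math


def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Each entry pairs an integrand with the antiderivative stated in the hints (with C = 0).
CASES = {
    "13.440": (lambda x: x * math.log(x),
               lambda x: x * x / 2 * math.log(x) - x * x / 4),
    "13.441": (lambda x: math.atan(x),
               lambda x: x * math.atan(x) - 0.5 * math.log(1 + x * x)),
    "13.442": (lambda x: x * math.sin(x),
               lambda x: -x * math.cos(x) + math.sin(x)),
    "13.443": (lambda x: math.exp(x) * math.sin(x),
               lambda x: 0.5 * math.exp(x) * (math.sin(x) - math.cos(x))),
}


def max_error(label, points=(0.7, 1.9)):
    """Largest gap between the integrand and the derivative of its antiderivative."""
    integrand, antiderivative = CASES[label]
    return max(abs(central_difference(antiderivative, x) - integrand(x)) for x in points)


if __name__ == "__main__":
    for label in CASES:
        print(label, max_error(label))
```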
13.444 Try to treat $\int x\tan x\,dx$ similarly and see the failure.
Hint. Neither $u = x$, $dv = \tan x\,dx$ nor $u = \tan x$, $dv = x\,dx$ works.

Rational Functions
\[ \int \frac{P(x)}{Q(x)}\,dx. \]
By a rational function we understand a function of the form $P(x)/Q(x)$, where $P$ and $Q$ are polynomials in the variable $x$. We say that the rational function is proper if the degree of the numerator is strictly less than the degree of the denominator. For a rational function, by performing the division algorithm we get a polynomial plus a proper rational function, hence for computing the primitive function we may always
assume that the rational function $P/Q$ is proper. The basis for finding a primitive function is the following decomposition method: Assume that $Q$ is a polynomial in the variable $x$, and that $\{\alpha, \ldots, \beta, (a+bi), (a-bi), \ldots, (c+di), (c-di)\}$ are the roots of $Q$, with multiplicity order $p, \ldots, q, r, \ldots, s$, respectively (where all constants $\alpha, \ldots, \beta, a, b, \ldots, c, d$ are real numbers). Then
\[ Q(x) = a_0 (x-\alpha)^p \cdots (x-\beta)^q \bigl((x-a)^2 + b^2\bigr)^r \cdots \bigl((x-c)^2 + d^2\bigr)^s \]
and we can write
\[ \begin{aligned} \frac{P(x)}{Q(x)} ={}& \frac{a_1}{(x-\alpha)^p} + \frac{a_2}{(x-\alpha)^{p-1}} + \ldots + \frac{a_p}{x-\alpha} + \ldots \\ &+ \frac{b_1}{(x-\beta)^q} + \frac{b_2}{(x-\beta)^{q-1}} + \ldots + \frac{b_q}{x-\beta} \\ &+ \frac{m_1x+n_1}{((x-a)^2+b^2)^r} + \frac{m_2x+n_2}{((x-a)^2+b^2)^{r-1}} + \ldots + \frac{m_rx+n_r}{(x-a)^2+b^2} + \ldots \\ &+ \frac{h_1x+k_1}{((x-c)^2+d^2)^s} + \frac{h_2x+k_2}{((x-c)^2+d^2)^{s-1}} + \ldots + \frac{h_sx+k_s}{(x-c)^2+d^2}, \end{aligned} \tag{13.37} \]
where $a_1, \ldots, a_p, \ldots, b_1, \ldots, b_q, m_1, n_1, \ldots, m_r, n_r, \ldots, h_1, k_1, \ldots, h_s, k_s$ are real numbers that can be obtained by identifying coefficients. We should consider three different cases.

1. The case that $Q$ has only real roots. It is enough to observe that
\[ \begin{cases} \int a(x-\alpha)^{-p}\,dx = a(x-\alpha)^{-p+1}(-p+1)^{-1} + C, & \text{if } p \ne 1, \\ \int a(x-\alpha)^{-1}\,dx = a\ln|x-\alpha| + C, & \text{otherwise,} \end{cases} \]
where $C$ is an arbitrary constant.

2. The case that $Q$ has only real and complex roots, the complex ones with multiplicity order 1. Use 1 above for real roots, write $mx + n = m(x-a) + (ma+n)$, and observe that
\[ \int \frac{mx+n}{(x-a)^2+b^2}\,dx = \frac{m}{2}\ln\bigl((x-a)^2+b^2\bigr) + \frac{ma+n}{b}\int \frac{1/b}{\left(\frac{x-a}{b}\right)^2+1}\,dx = \frac{m}{2}\ln\bigl((x-a)^2+b^2\bigr) + \frac{ma+n}{b}\arctan\frac{x-a}{b} + C, \]
where $C$ is an arbitrary constant.

3. The case that $Q$ has complex roots with multiplicity order greater than 1. Then, we can use the Hermite method (from the French mathematician C. Hermite):
\[ \int \frac{P(x)}{Q(x)}\,dx = \frac{U(x)}{R(x)} + \int \frac{V(x)}{S(x)}\,dx, \tag{13.38} \]
where $R(x)$ is a polynomial having the same roots as $Q(x)$, only each of them with an order of multiplicity diminished by 1, and $S(x)$ is a polynomial having the same roots as $Q(x)$, only each of them with multiplicity 1. Here, $U(x)$ and $V(x)$ are polynomials, chosen so that $U/R$ and $V/S$ are proper rational fractions, whose coefficients are found by identification after taking derivatives on both sides of (13.38). The primitive on the right-hand side of (13.38) is computed according to case 2.

13.445 $\int \frac{dx}{x^4+1}$.
Hint. We write
\[ \frac{1}{x^4+1} = \frac{1}{x^4+2x^2+1-2x^2} = \frac{1}{(x^2+1)^2 - (\sqrt{2}x)^2} = \frac{1}{(x^2+1+\sqrt{2}x)(x^2+1-\sqrt{2}x)}, \]
and proceed with the decomposition into partial fractions as in Case 2 above. Precisely,
\[ \frac{1}{1+x^4} = \frac{ax+b}{x^2+1+\sqrt{2}x} + \frac{cx+d}{x^2+1-\sqrt{2}x}, \]
where, after identifying coefficients, $a = 1/(2\sqrt{2})$, $b = 1/2$, $c = -1/(2\sqrt{2})$, and $d = 1/2$. It follows that
\[ \frac{1}{1+x^4} = \frac{1}{4\sqrt{2}}\left[\frac{2x+\sqrt{2}}{x^2+1+\sqrt{2}x} + \frac{\sqrt{2}}{x^2+1+\sqrt{2}x}\right] - \frac{1}{4\sqrt{2}}\left[\frac{2x-\sqrt{2}}{x^2+1-\sqrt{2}x} - \frac{\sqrt{2}}{x^2+1-\sqrt{2}x}\right], \]
hence
\[ \int \frac{dx}{1+x^4} = \frac{1}{4\sqrt{2}}\bigl[\ln(x^2+1+\sqrt{2}x) + 2\arctan(1+\sqrt{2}x)\bigr] - \frac{1}{4\sqrt{2}}\bigl[\ln(x^2+1-\sqrt{2}x) + 2\arctan(1-\sqrt{2}x)\bigr] + C, \]
where $C$ is an arbitrary constant.
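Both the partial fraction decomposition and the resulting primitive in Exercise 13.445 can be checked numerically. The sketch below is an illustration under stated assumptions: the names `partial_fractions` and `antiderivative` are hypothetical, and a Simpson rule stands in for the exact integral.

```python
import math

SQRT2 = math.sqrt(2.0)


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def partial_fractions(x):
    """Right-hand side of the decomposition with a, b, c, d as in the hint."""
    a, b = 1 / (2 * SQRT2), 0.5
    c, d = -1 / (2 * SQRT2), 0.5
    return (a * x + b) / (x * x + 1 + SQRT2 * x) + (c * x + d) / (x * x + 1 - SQRT2 * x)


def antiderivative(x):
    """Primitive of 1/(1 + x^4) stated in the hint (with C = 0)."""
    plus = math.log(x * x + 1 + SQRT2 * x) + 2 * math.atan(1 + SQRT2 * x)
    minus = math.log(x * x + 1 - SQRT2 * x) + 2 * math.atan(1 - SQRT2 * x)
    return (plus - minus) / (4 * SQRT2)


if __name__ == "__main__":
    numeric = simpson(lambda x: 1 / (1 + x ** 4), 0.0, 1.0)
    print(numeric, antiderivative(1.0) - antiderivative(0.0))
```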
13.446 $\int \frac{dx}{((x+1)^2+3)^2}$.
Hint. We are in Case 3 above. Hermite's method suggests writing
\[ \int \frac{dx}{((x+1)^2+3)^2} = \frac{ax+b}{(x+1)^2+3} + \int \frac{cx+d}{(x+1)^2+3}\,dx. \]
By taking derivatives on both sides of the previous equality, and then identifying coefficients, we finally get
\[ \frac{1+x}{6(3+(1+x)^2)} + \frac{\arctan\frac{1+x}{\sqrt{3}}}{6\sqrt{3}} + C, \]
where $C$ is an arbitrary constant. A different approach is the following: Put
\[ \int \frac{dx}{((x+1)^2+3)^2} = \frac{1}{9}\int \frac{dx}{\left(\left(\frac{x+1}{\sqrt{3}}\right)^2+1\right)^2}. \]
The substitution $\frac{x+1}{\sqrt{3}} = u$ leads to an integral of the type
\[ \int \frac{dx}{(x^2+1)^2}. \]
Here, we can proceed by parts: By letting $g = \frac{1}{1+x^2}$ and $f' = 1$, we have
\[ \arctan x = \int 1\cdot\frac{1}{1+x^2}\,dx = \frac{x}{1+x^2} + 2\int \frac{x^2}{(1+x^2)^2}\,dx = \frac{x}{1+x^2} + 2\int \frac{x^2+1-1}{(1+x^2)^2}\,dx = \frac{x}{1+x^2} + 2\arctan x - 2\int \frac{dx}{(1+x^2)^2}. \]
Thus
\[ \int \frac{dx}{(1+x^2)^2} = \frac{1}{2}\arctan x + \frac{1}{2}\,\frac{x}{1+x^2} + C, \]
where $C$ is an arbitrary constant.

Note also that, in general, for polynomials of degree 5 or higher, there do not exist formulas for the roots in terms of the coefficients of the polynomial. This result is the Abel–Ruffini theorem (from N. H. Abel and the Italian mathematician P. Ruffini), also known as Abel's impossibility theorem, stating that there is no general algebraic solution (i.e., a solution expressed by radicals in terms of the coefficients of the polynomial). For a proof see, e.g., [Al04, Problems 351 and 352].
Algebraic Irrationals

1. To compute the primitive function
\[ \int R\left(x, \Bigl(\frac{px+q}{rx+s}\Bigr)^m, \ldots, \Bigl(\frac{px+q}{rx+s}\Bigr)^n\right) dx, \]
where $R$ is a rational function, $p, q, r, s$ are real numbers such that $r$ and $s$ do not simultaneously vanish, and $m, \ldots, n$ are rational numbers, change the variable from $x$ to $t$ by putting $\frac{px+q}{rx+s} = t^M$, where $M$ is the lowest common multiple of the denominators of $m, \ldots, n$. This leads to the computation of a primitive of a rational function.

13.447 $\int x\sqrt{\frac{x}{2x-1}}\,dx$.
Hint. Put $\frac{x}{2x-1} = t^2$. We get the integral of a rational function. The answer is
\[ \frac{1}{32\sqrt{x}}\sqrt{\frac{x}{2x-1}}\left(2\sqrt{x}\,(-3+2x+8x^2) + 3\sqrt{-2+4x}\,\ln\bigl|2\sqrt{x}+\sqrt{-2+4x}\bigr|\right) + C, \]
where $C$ is an arbitrary constant.

2. To compute the primitive function
\[ \int R\bigl(x, \sqrt{ax^2+bx+c}\bigr)\,dx, \]
where $R$ is a rational function and $a \ne 0$, a change of variable leads to the computation of the primitive of a rational function. Precisely, change to the new variable $t$, according to the following cases:
a) If $a > 0$, put $\sqrt{ax^2+bx+c} = \pm\sqrt{a}\,x + t$.
b) If $c > 0$, put $\sqrt{ax^2+bx+c} = \pm\sqrt{c} + xt$.
c) If $a < 0$ and $\alpha, \beta$ are the two real roots of $ax^2+bx+c = 0$ (after all, $\sqrt{ax^2+bx+c}$ must be real), put $\sqrt{ax^2+bx+c} = t(x-\alpha)$.

13.448 $\int \frac{dx}{\sqrt{1+x^2}}$.
Hint. Put $\sqrt{1+x^2} = x + t$. Then, after substituting, $\int \frac{dx}{\sqrt{1+x^2}} = -\int \frac{dt}{t} = -\ln t + K = -\ln(\sqrt{1+x^2} - x) + K$, where $K$ is an arbitrary constant. A different change of variable, leading to the computation of the primitive function of $1/\sqrt{1+x^2}$, is also possible: Put
\[ x = \sinh t := \frac{e^t - e^{-t}}{2}. \tag{13.39} \]
Then, we get
\[ \int \frac{dx}{\sqrt{1+x^2}} = \int \frac{\cosh t}{\cosh t}\,dt = t + K = \operatorname{arcsinh} x + C, \]
where $C$ is an arbitrary constant. That the two families of primitive functions coincide can be seen easily: Putting $e^t = u$ in (13.39), we have $u - \frac{1}{u} = 2x$, so $u^2 - 2xu - 1 = 0$, hence $u = x \pm \sqrt{x^2+1} = x + \sqrt{x^2+1}$, as the other choice is negative and $u > 0$. Thus we have
\[ e^t = x + \sqrt{1+x^2} \quad \text{and} \quad t = \operatorname{arcsinh} x = \ln\bigl(x + \sqrt{1+x^2}\bigr). \tag{13.40} \]
Similarly we get
\[ \int \frac{dx}{\sqrt{x^2-1}} = \ln\left(x + \sqrt{x^2-1}\right) + C, \]
for an arbitrary constant $C$.

13.449 $\int \sqrt{1+x^2}\,dx$, $\int \sqrt{x^2-1}\,dx$.
Hint. For the first integral, the substitution suggested above uses $\sqrt{1+x^2} = x + t$. So
\[ \int \sqrt{1+x^2}\,dx = -\frac{1}{4}\int \frac{(1+t^2)^2}{t^3}\,dt = -\frac{1}{4}\int \Bigl(t + \frac{2}{t} + \frac{1}{t^3}\Bigr)dt = -\frac{1}{8}t^2 - \frac{1}{2}\ln t + \frac{1}{8}t^{-2} + K \]
\[ = -\frac{1}{8}\bigl(\sqrt{1+x^2}-x\bigr)^2 - \frac{1}{2}\ln\bigl(\sqrt{1+x^2}-x\bigr) + \frac{1}{8}\bigl(\sqrt{1+x^2}-x\bigr)^{-2} + C, \]
where $C$ is an arbitrary constant.
A different approach uses the "by parts" technique: Putting $f' = 1$ and $g = \sqrt{1+x^2}$, we get, by using Exercise 13.448,
\[ \int 1\cdot\sqrt{1+x^2}\,dx = x\sqrt{1+x^2} - \int \frac{x^2}{\sqrt{1+x^2}}\,dx = x\sqrt{1+x^2} - \int \frac{x^2+1-1}{\sqrt{1+x^2}}\,dx = x\sqrt{1+x^2} - \int \sqrt{1+x^2}\,dx + \ln\bigl(x+\sqrt{1+x^2}\bigr). \]
Thus, we have
\[ \int \sqrt{1+x^2}\,dx = \frac{1}{2}x\sqrt{1+x^2} + \frac{1}{2}\ln\bigl(x+\sqrt{1+x^2}\bigr) + C, \]
where $C$ is an arbitrary constant. In view of (13.40), this can be written also as $\frac{1}{2}x\sqrt{1+x^2} + \frac{1}{2}\operatorname{arcsinh} x + C$, where $C$ is an arbitrary constant.
For the second integral, proceed similarly. The result is $\frac{1}{2}x\sqrt{x^2-1} - \frac{1}{2}\ln\bigl(x+\sqrt{x^2-1}\bigr) + C$, where $C$ is an arbitrary constant.

13.450 $\int \sqrt{1-x^2}\,dx$.
Hint. The substitution suggested above uses $\sqrt{1-x^2} = t(x-1)$, and so
\[ \int \sqrt{1-x^2}\,dx = -8\int \frac{t^2}{(t^2+1)^3}\,dt. \]
This can be solved by Hermite's method by putting
\[ \int \frac{t^2}{(t^2+1)^3}\,dt = \frac{at^3+bt^2+ct+d}{(t^2+1)^2} + \int \frac{pt+q}{t^2+1}\,dt, \]
where the coefficients $a$, $b$, $c$, $d$, $p$, and $q$ can be obtained by identification after taking derivatives on both sides of the previous identity. A second method uses the substitution $x = \sin t$. It gives, in the interval $(0,\pi/2)$,
\[ \int \cos^2 t\,dt = \int \frac{1+\cos 2t}{2}\,dt = \frac{1}{2}t + \frac{1}{4}\sin 2t + C = \frac{1}{2}t + \frac{1}{4}\,2\sin t\cos t + C = \frac{1}{2}\arcsin x + \frac{1}{2}x\sqrt{1-x^2} + C, \]
where $C$ is an arbitrary constant.

13.451 $\int \sqrt{(x+1)(x+2)}\,dx$.
Hint. Note that $\sqrt{(x+1)(x+2)} = \sqrt{x^2+3x+2}$, and use the substitution $\sqrt{x^2+3x+2} = x + t$ to obtain $2\int \frac{(t^2-3t+2)^2}{(3-2t)^3}\,dt$. Perform the division to get a polynomial and a proper fraction and proceed to the computation of the primitive to get
\[ \frac{\sqrt{2+3x+x^2}\,\bigl(\sqrt{1+x}\,\sqrt{2+x}\,(3+2x) - \operatorname{arcsinh}\sqrt{1+x}\bigr)}{4\sqrt{1+x}\,\sqrt{2+x}} + C, \]
where $C$ is an arbitrary constant. Another approach is to write
\[ \sqrt{x^2+3x+2} = \sqrt{\Bigl(x+\frac{3}{2}\Bigr)^2 - \frac{1}{4}} = \frac{1}{2}\sqrt{4\Bigl(x+\frac{3}{2}\Bigr)^2 - 1} = \frac{1}{2}\sqrt{(2x+3)^2 - 1}. \]
Then, use the substitution $2x + 3 = t$ and proceed as above to get
\[ \Bigl(\frac{3}{4} + \frac{x}{2}\Bigr)\sqrt{2+3x+x^2} - \frac{1}{8}\ln\bigl(3+2x+2\sqrt{2+3x+x^2}\bigr) + C, \]
where $C$ is an arbitrary constant.

Binomial Integrals

Primitives of the form
\[ \int x^m (a+bx^n)^p\,dx \]
are called binomial integrals. If one of $\frac{m+1}{n}$, $p$, or $\frac{m+1}{n} + p$ is an integer, the change of variable from $x$ to $t$ given by $x = t^{1/n}$ leads to the computation of an integral of the form 1 in algebraic irrationals, above. (It was proved by the Russian mathematician P. Chebyshev that those are the only cases in which the function under the integral sign has a primitive that can be expressed in terms of "elementary functions".)

13.452 $\int \frac{x^2}{\sqrt{1+x^3}}\,dx$.
Hint. Put $x^3 = t$ to get
\[ \frac{1}{3}\int \frac{dt}{\sqrt{1+t}} = \frac{2}{3}\sqrt{1+t} + C = \frac{2}{3}\sqrt{1+x^3} + C, \]
where $C$ is an arbitrary constant.

Integration of Transcendental Functions

1. Primitives of the form
\[ \int R(\sin x, \cos x)\,dx, \]
where $R$ is a rational function. Change the variable $x$ to $t$ by letting
\[ \tan\frac{x}{2} = t, \]
leading to the computation of the primitive function of a rational function. In some instances, we can simplify the method by using a different change of variable:
(a) If $R(\sin x, \cos x) = -R(\sin x, -\cos x)$ then use $t = \sin x$.
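(b) If $R(\sin x, \cos x) = -R(-\sin x, \cos x)$ then use $t = \cos x$. Finally,
(c) if $R(\sin x, \cos x) = R(-\sin x, -\cos x)$ then use $t = \tan x$.
We use the following formulas, where $\tan(x/2) = t$:
\[ \sin^2\frac{x}{2} = \frac{\sin^2\frac{x}{2}}{\sin^2\frac{x}{2} + \cos^2\frac{x}{2}} = \frac{\tan^2\frac{x}{2}}{\tan^2\frac{x}{2} + 1} = \frac{t^2}{t^2+1}, \]
\[ \cos^2\frac{x}{2} = \frac{\cos^2\frac{x}{2}}{\sin^2\frac{x}{2} + \cos^2\frac{x}{2}} = \frac{1}{t^2+1}, \]
\[ \sin x = 2\sin\frac{x}{2}\cos\frac{x}{2} = 2\,\frac{\sin\frac{x}{2}}{\cos\frac{x}{2}}\cos^2\frac{x}{2} = \frac{2t}{t^2+1}, \]
\[ \cos x = \cos^2\frac{x}{2} - \sin^2\frac{x}{2} = 1 - 2\sin^2\frac{x}{2} = 1 - \frac{2t^2}{t^2+1} = \frac{1-t^2}{1+t^2}. \]

13.453 $\int \frac{dx}{\sin^3 x}$.
Hint. Use (b) above and put $\cos x = t$. We get
\[ -\int \frac{dt}{(1-t^2)^2}, \]
which is the integral of a rational function. The primitive function is
\[ -\frac{1}{8}\csc^2\frac{x}{2} - \frac{1}{2}\ln\cos\frac{x}{2} + \frac{1}{2}\ln\sin\frac{x}{2} + \frac{1}{8}\sec^2\frac{x}{2} + C, \]
where $C$ is an arbitrary constant.

13.454 $\int \frac{dx}{2+\cos x}$.
Hint. Put $\tan\frac{x}{2} = t$, i.e., $x = 2\arctan t$. Then $2 + \cos x = 2 + \frac{1-t^2}{1+t^2} = \frac{3+t^2}{1+t^2}$. Thus, using finally the substitution $\frac{t}{\sqrt{3}} = u$,
\[ \int \frac{dx}{2+\cos x} = 2\int \frac{dt}{3+t^2} = \frac{2}{3}\int \frac{dt}{1+\left(\frac{t}{\sqrt{3}}\right)^2} = \frac{2\sqrt{3}}{3}\int \frac{du}{1+u^2} = \frac{2\sqrt{3}}{3}\arctan u + C = \frac{2\sqrt{3}}{3}\arctan\Bigl(\frac{1}{\sqrt{3}}\tan\frac{x}{2}\Bigr) + C, \]
for an arbitrary constant $C$.

13.455 $\int \frac{\sin^2 x}{1+\sin^2 x}\,dx$.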
Hint. Put $\tan x = t$. We get
\[ \int \frac{\sin^2 x}{1+\sin^2 x}\,dx = \int \frac{\sin^2 x + 1 - 1}{1+\sin^2 x}\,dx = \int \Bigl(1 - \frac{1}{1+\sin^2 x}\Bigr)dx. \]
Now,
\[ \int \frac{dx}{1+\sin^2 x} = \int \frac{1}{1+\frac{t^2}{1+t^2}}\,\frac{1}{1+t^2}\,dt = \int \frac{dt}{1+(\sqrt{2}t)^2} = \frac{1}{\sqrt{2}}\int \frac{du}{1+u^2} = \frac{1}{\sqrt{2}}\arctan u + C = \frac{1}{\sqrt{2}}\arctan\bigl(\sqrt{2}\tan x\bigr) + C, \]
for an arbitrary constant $C$. Thus, the final result for the integral is
\[ \int \frac{\sin^2 x}{1+\sin^2 x}\,dx = x - \frac{1}{\sqrt{2}}\arctan\bigl(\sqrt{2}\tan x\bigr) + C, \]
for an arbitrary constant $C$.

13.456 $\int \frac{dx}{\sin x}$.
Hint. As suggested, put $\cos x = t$. So,
\[ \int \frac{dx}{\sin x} = \int \frac{dt}{t^2-1} = \frac{1}{2}\ln(1-t) - \frac{1}{2}\ln(1+t) + C = \frac{1}{2}\ln(1-\cos x) - \frac{1}{2}\ln(1+\cos x) + C = -\ln\cos\frac{x}{2} + \ln\sin\frac{x}{2} + C = \ln\tan\frac{x}{2} + C, \]
where $C$ is an arbitrary constant.
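The substitution $\tan\frac{x}{2} = t$ used in Exercise 13.454 can be followed numerically on the definite integral over $[0, \pi/2]$, where $t$ runs over $[0,1]$. The sketch below is an illustration under stated assumptions (Simpson quadrature, hypothetical function names); it compares the integral before the substitution, after it, and the closed form from the hint.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def direct():
    """Integral of 1/(2 + cos x) over [0, pi/2], in the original variable x."""
    return simpson(lambda x: 1.0 / (2.0 + math.cos(x)), 0.0, math.pi / 2)


def after_substitution():
    """Same integral after t = tan(x/2): dx = 2 dt/(1+t^2), 2 + cos x = (3+t^2)/(1+t^2)."""
    return simpson(lambda t: 2.0 / (3.0 + t * t), 0.0, 1.0)


def closed_form():
    """Difference of the primitive (2*sqrt(3)/3) * arctan(tan(x/2)/sqrt(3)) at pi/2 and 0."""
    def primitive(x):
        return (2 * math.sqrt(3) / 3) * math.atan(math.tan(x / 2) / math.sqrt(3))
    return primitive(math.pi / 2) - primitive(0.0)


if __name__ == "__main__":
    print(direct(), after_substitution(), closed_form())
```

On intervals containing odd multiples of $\pi$ the primitive has jumps (because $\tan\frac{x}{2}$ does), so the closed form should only be used on intervals where $\tan\frac{x}{2}$ is continuous.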
13.457 $\int \frac{\sin x + 2\cos x}{\sin x + \cos x}\,dx$.
Hint. Since we are in case (c), put $\tan x = t$ to get $\int \frac{t+2}{(t+1)(t^2+1)}\,dt$. This is a rational integral, and we obtain
\[ \frac{1}{2}\ln|t+1| - \frac{1}{4}\ln(t^2+1) + \frac{3}{2}\arctan t + C = \frac{1}{2}\ln|\tan x + 1| - \frac{1}{4}\ln(\tan^2 x + 1) + \frac{3}{2}x + C, \]
where $C$ is an arbitrary constant. For an alternative approach, we write $\sin x + 2\cos x = a(\sin x + \cos x) + b(\cos x - \sin x)$ (from linear algebra we know that such constants must exist as the summands are linearly independent). We calculate the coefficients by comparing, getting $a = \frac{3}{2}$, $b = \frac{1}{2}$. Thus, since the second summand is the derivative of the denominator in the integral, we have
\[ \int \frac{\sin x + 2\cos x}{\sin x + \cos x}\,dx = \frac{3}{2}x + \frac{1}{2}\int \frac{\cos x - \sin x}{\sin x + \cos x}\,dx = \frac{3}{2}x + \frac{1}{2}\ln|\sin x + \cos x| + C, \]
for an arbitrary constant $C$.

An Alternative Way to Roots

13.458 $\int \frac{dx}{\sqrt{x^2-1}}$.
Hint. We put $\frac{1}{x} = t$, and then $t = \sin u$. We get, for $x > 1$, by the above, and relying on Exercise 13.456,
\[ \int \frac{dx}{\sqrt{x^2-1}} = -\int \frac{dt}{t\sqrt{1-t^2}} = -\int \frac{du}{\sin u} = \ln\frac{1}{|\tan\frac{u}{2}|} + C = \ln\sqrt{\frac{\cos^2\frac{u}{2}}{\sin^2\frac{u}{2}}} + C = \ln\sqrt{\frac{1+\cos u}{1-\cos u}} + C = \ln\sqrt{\frac{(1+\cos u)^2}{\sin^2 u}} + C \]
\[ = \ln\frac{1+\cos u}{\sin u} + C = \ln\frac{1+\sqrt{1-\sin^2 u}}{\sin u} + C = \ln\frac{1}{t}\Bigl(1+\sqrt{1-t^2}\Bigr) + C = \ln x\Bigl(1+\sqrt{1-\frac{1}{x^2}}\Bigr) + C = \ln\bigl(x+\sqrt{x^2-1}\bigr) + C, \]
for an arbitrary constant $C$.

13.459 $\int \frac{dx}{\sqrt{x-x^2}}$.
Hint. We write, using finally the substitution $2x - 1 = t$,
\[ \int \frac{dx}{\sqrt{x-x^2}} = \int \frac{dx}{\sqrt{-(x^2-x)}} = \int \frac{dx}{\sqrt{-\bigl((x-\frac{1}{2})^2 - \frac{1}{4}\bigr)}} = \int \frac{dx}{\sqrt{\frac{1}{4} - (x-\frac{1}{2})^2}} = 2\int \frac{dx}{\sqrt{1-(2x-1)^2}} = \int \frac{dt}{\sqrt{1-t^2}} = \arcsin(2x-1) + C, \]
for an arbitrary constant $C$.

2. Reduction formulas for antiderivatives of the form
\[ I_{m,n} := \int \sin^m x\cos^n x\,dx, \quad \text{where } n, m \in \mathbb{Z}. \tag{13.41} \]
Exercise 13.460 provides a sample of reduction formulas.

13.460 Prove the following reduction formulas by integration by parts, where $I_{m,n}$ is given in (13.41):
(1) $I_{m,n} = -\dfrac{\sin^{m-1}x\cos^{n+1}x}{m+n} + \dfrac{m-1}{m+n}\,I_{m-2,n}$, if $m+n \ne 0$,
(2) $I_{m,n} = \dfrac{\sin^{m+1}x\cos^{n-1}x}{m+n} + \dfrac{n-1}{m+n}\,I_{m,n-2}$, if $m+n \ne 0$,
(3) $I_{m,n} = \dfrac{\sin^{m+1}x\cos^{n+1}x}{m+1} + \dfrac{m+n+2}{m+1}\,I_{m+2,n}$, if $m+1 \ne 0$,
(4) $I_{m,n} = -\dfrac{\sin^{m+1}x\cos^{n+1}x}{n+1} + \dfrac{m+n+2}{n+1}\,I_{m,n+2}$, if $n+1 \ne 0$.
Hint. For (1) put $(n+1)I_{m,n} = -\int \sin^{m-1}x\;d\cos^{n+1}x$. For (2) put $(m+1)I_{m,n} = \int \cos^{n-1}x\;d\sin^{m+1}x$. For (3) use (1) putting $m+2$ instead of $m$, and for (4) use (2) putting $n+2$ instead of $n$.
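Formulas (1) and (2) translate directly into a recursion for the definite integral $\int_a^b \sin^m x\cos^n x\,dx$ with nonnegative integers $m$, $n$. The following sketch is one possible arrangement, not the book's: the function name `I`, the order in which the two formulas are applied, and the base cases are choices made here, and a Simpson rule is used only as an independent cross-check.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def I(m, n, a, b):
    """Integral of sin^m x * cos^n x over [a, b] for integers m, n >= 0."""
    if m >= 2:
        # Formula (1): lower the power of the sine by 2.
        def boundary(x):
            return math.sin(x) ** (m - 1) * math.cos(x) ** (n + 1)
        return -(boundary(b) - boundary(a)) / (m + n) + (m - 1) / (m + n) * I(m - 2, n, a, b)
    if n >= 2:
        # Formula (2): lower the power of the cosine by 2.
        def boundary(x):
            return math.sin(x) ** (m + 1) * math.cos(x) ** (n - 1)
        return (boundary(b) - boundary(a)) / (m + n) + (n - 1) / (m + n) * I(m, n - 2, a, b)
    # Base cases with m, n in {0, 1}.
    if (m, n) == (0, 0):
        return b - a
    if (m, n) == (1, 0):
        return math.cos(a) - math.cos(b)
    if (m, n) == (0, 1):
        return math.sin(b) - math.sin(a)
    return (math.sin(b) ** 2 - math.sin(a) ** 2) / 2


if __name__ == "__main__":
    for m, n in ((4, 3), (5, 2), (2, 6)):
        numeric = simpson(lambda x: math.sin(x) ** m * math.cos(x) ** n, 0.2, 1.1)
        print((m, n), I(m, n, 0.2, 1.1), numeric)
```

Since $m + n \ge 2$ whenever one of the two formulas is applied, the condition $m + n \ne 0$ is automatically satisfied in this recursion.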
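13.461 Solve $\int \frac{dx}{\sin^4 x}$.
Hint. $\int \frac{dx}{\sin^4 x} = I_{-4,0}$. Then, by formula (3) of Exercise 13.460,
\[ I_{-4,0} = \frac{\sin^{-3}x\cos x}{-3} + \frac{-4+2}{-3}\,I_{-2,0} = -\frac{\cos x}{3\sin^3 x} + \frac{2}{3}\int \frac{dx}{\sin^2 x} = -\frac{\cos x}{3\sin^3 x} - \frac{2}{3}\cot x + C, \]
where $C$ is an arbitrary constant.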
A Special Case of Trigonometric Polynomials

13.462 $\int \sin 4x\cos 7x\,dx$.
Hint. We use $\sin(\alpha \pm \beta) = \sin\alpha\cos\beta \pm \sin\beta\cos\alpha$. By adding these two identities, we get
\[ \int \sin 4x\cos 7x\,dx = \frac{1}{2}\int (\sin 11x - \sin 3x)\,dx = -\frac{1}{22}\cos 11x + \frac{1}{6}\cos 3x + C, \]
for an arbitrary constant $C$.

13.463 Compute $\int_0^{\pi/2} \sin^2 x\,dx$ and $\int_0^{\pi/2} \cos^2 x\,dx$.
Hint. Note that $\sin^2 x + \cos^2 x = 1$ for all $x \in \mathbb{R}$, so
\[ \int_0^{\pi/2} \sin^2 x\,dx + \int_0^{\pi/2} \cos^2 x\,dx = \int_0^{\pi/2} dx = \pi/2. \]
Since $\sin(\pi/2 - x) = \cos x$, a change of variable gives $\int_0^{\pi/2} \sin^2 x\,dx = \int_0^{\pi/2} \cos^2 x\,dx$ $(= \pi/4)$.

13.464 Let the function $f$ be defined on $[-1,1]$ by
\[ f(x) = \begin{cases} 1 & \text{for } x \ge 0, \\ -1 & \text{for } x < 0. \end{cases} \]
Show that the Riemann integral of $f$ on $[-1,+1]$ exists and evaluate it.
Hint. Use Propositions 672 and 674. For computing the integral, split the interval into two subintervals. Draw a picture.
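For the step function of Exercise 13.464 the upper and lower sums can be computed exactly, which makes the integrability visible without any limiting argument. The sketch below is an illustration, not the approach of the hint: it assumes uniform partitions of $[-1,1]$ and uses exact rational arithmetic; the name `darboux_sums` is hypothetical.

```python
from fractions import Fraction


def darboux_sums(n):
    """Upper and lower sums of f (1 for x >= 0, -1 for x < 0) on [-1, 1], uniform partition into n cells."""
    upper = Fraction(0)
    lower = Fraction(0)
    width = Fraction(2, n)
    for k in range(n):
        left = -1 + k * width
        right = left + width
        if left >= 0:
            # The cell lies in [0, 1], where f is constantly 1.
            sup_f, inf_f = 1, 1
        elif right < 0:
            # The cell lies in [-1, 0), where f is constantly -1.
            sup_f, inf_f = -1, -1
        else:
            # The cell contains 0 and some negative points: f takes both values.
            sup_f, inf_f = 1, -1
        upper += sup_f * width
        lower += inf_f * width
    return upper, lower


if __name__ == "__main__":
    for n in (2, 5, 10, 101):
        upper, lower = darboux_sums(n)
        print(n, upper, lower, upper - lower)
```

The gap between the two sums is twice the length of the cells on which $f$ takes both values, so it tends to $0$ as the partition is refined, and both sums squeeze the value $0$ of the integral.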
Fig. 13.64 The function (sinx)/x in Exercise 13.467 on [0, 50]
13.465 Suppose $f(t, s)$ is a continuously twice differentiable function. Let
\[
g(t) = \int_0^t f(t, s) \, ds.
\]
Show
\[
\frac{d^2 g}{dt^2} = \frac{d}{dt} f(t, t) + \frac{\partial f}{\partial t}(t, t) + \int_0^t \frac{\partial^2 f}{\partial t^2}(t, s) \, ds.
\]
13.5.3
Improper Riemann Integral
13.466 Prove that the integral $\int_0^1 \ln x \, dx$ exists as an improper Riemann integral.
Hint. Compute $\int_\delta^1 \ln x \, dx$ integrating by parts, and take the limit as $\delta \downarrow 0$.
13.467 Prove that $\int_0^{+\infty} \frac{\sin x}{x} \, dx = \pi/2$ by using the concept of Dirichlet kernel (Definition 840).
Hint. See Fig. 13.64. Observe that the function $f(x) := \sin(x)/x$, $x > 0$, has limit 1 when $x \to 0+$, so if we define $f(0) = 1$, the resulting function is continuous on $[0, +\infty)$. Then, check Example 713 to see that this improper Riemann integral exists. Divide the proof into several parts:
(i) Prove first that (a) $\int_0^{\pi/2} \frac{\sin(2n+1)x}{\sin x} \, dx = \pi/2$. For this, use the equality $\sin(2k+1)x - \sin(2k-1)x = 2 \cos 2kx \sin x$. Summing from $k = 1$ to $n$ we get $\sin(2n+1)x = \sin x \left(1 + 2\sum_{k=1}^n \cos 2kx\right)$. This gives (a).
(ii) Now put $I_n := \int_0^{\pi/2} \frac{\sin(2n+1)x}{x} \, dx$. A simple change of variable gives $I_n = \int_0^{(2n+1)\pi/2} \frac{\sin x}{x} \, dx$. Put $\Delta_n := \pi/2 - I_n$. Use (a) to show that $\Delta_n = \int_0^{\pi/2} g(x) \sin(2n+1)x \, dx$, where $g(x) := \frac{1}{\sin x} - \frac{1}{x}$ for $x \neq 0$. If we put $g(0) := 0$, the function $g$ so defined has a continuous derivative on $[0, \pi/2]$. Use integration by parts to prove that $|\Delta_n| \leq K/n$ for a certain constant $K > 0$. This shows that $\Delta_n \to 0$ as $n \to \infty$, and this proves the result. For a different approach to the evaluation of this integral see Example 806.
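The two steps of the hint to Exercise 13.467 can be illustrated numerically. The following sketch approximates the integral in step (a) and the quantity $\pi/2 - I_n$ from step (ii) with a composite Simpson rule; the function names (`simpson`, `kernel_integral`, `sine_integral`, `delta`) and the number of quadrature steps are assumptions made for this illustration only.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def kernel_integral(n, steps=20000):
    """Step (a): integral over [0, pi/2] of sin((2n+1)x) / sin x."""
    def integrand(x):
        if x == 0.0:
            return float(2 * n + 1)  # limit of the quotient as x -> 0+
        return math.sin((2 * n + 1) * x) / math.sin(x)
    return simpson(integrand, 0.0, math.pi / 2, steps)


def sine_integral(n, steps=20000):
    """I_n from step (ii): integral over [0, pi/2] of sin((2n+1)x) / x."""
    def integrand(x):
        if x == 0.0:
            return float(2 * n + 1)  # limit of the quotient as x -> 0+
        return math.sin((2 * n + 1) * x) / x
    return simpson(integrand, 0.0, math.pi / 2, steps)


def delta(n):
    """pi/2 - I_n, the quantity shown to tend to 0 in step (ii)."""
    return math.pi / 2 - sine_integral(n)


if __name__ == "__main__":
    print(kernel_integral(5) - math.pi / 2)
    for n in (5, 20, 50):
        print(n, delta(n))
```

The first printed number should be negligible, and the values of `delta(n)` should shrink as `n` grows, in line with the bound $K/n$ from the hint.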
Fig. 13.65 The function in the Fresnel integral (Exercise 13.468) on [0, 10]
Fig. 13.66 Comparison of functions on [0, 1] (Exercise 13.469)
13.468 Show that the Fresnel integral (see Fig. 13.65)
\[
\int_0^{\infty} \sin x^2 \, dx
\]
exists as an improper Riemann integral.
Hint. The change of variable $x^2 = t$ leads to a Dirichlet-like integral (see Example 806).
13.469 Decide on the existence of the two following improper integrals:
\[
\text{(a)} \ \int_0^1 \frac{1}{\ln x} \, dx, \qquad \text{(b)} \ \int_1^{+\infty} \frac{dx}{\sqrt{x^3 + x}}.
\]
Hint. (a) Diverges "due to the point 1" (see Fig. 13.66): Put $y = 1 - x$ for $x \in (0, 1)$ to get that the first integral is $\int_0^1 (\ln(1 - y))^{-1} \, dy$. Note that $-\ln(1 - y) = y + y^2/2 + y^3/3 + \ldots < y + y^2 + y^3 + \ldots = y/(1 - y)$ for $y \in (0, 1)$ (see Eq. (5.80)). Then, for $y \in (0, 1)$,
\[
\frac{-1}{\ln(1 - y)} > \frac{1 - y}{y} = \frac{1}{y} - 1 > 0.
\]
Note that $\int_0^1 (1/y) \, dy$ diverges. Use, finally, Theorem 718.
(b) Converges. Compare with $1/x^{3/2}$ (see Fig. 13.66) and use Theorem 718.
13.470 Decide if the following improper Riemann integral exists:
\[
\int_2^{+\infty} \frac{dx}{x^5 - x^3 + x^2 - x + 1}.
\]
Hint. Converges. Compare with $x^{-5}$ and use Theorem 718.
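The comparison in Exercise 13.469 (b) can be watched numerically: the partial integrals of $1/\sqrt{x^3 + x}$ over $[1, X]$ increase with $X$ and stay below the partial integrals of the majorant $x^{-3/2}$, which equal $2(1 - X^{-1/2}) < 2$. The sketch below is a hedged illustration; `simpson`, `partial_integral` and `majorant_integral` are hypothetical helper names, and the step count is an arbitrary choice.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def partial_integral(X, steps=20000):
    """Approximate integral of 1/sqrt(x^3 + x) over [1, X]."""
    return simpson(lambda x: 1.0 / math.sqrt(x ** 3 + x), 1.0, X, steps)


def majorant_integral(X):
    """Exact integral of x^(-3/2) over [1, X]."""
    return 2.0 * (1.0 - 1.0 / math.sqrt(X))


if __name__ == "__main__":
    for X in (10.0, 100.0, 1000.0):
        print(X, partial_integral(X), majorant_integral(X))
```

A bounded increasing family of partial integrals is exactly what the comparison argument via Theorem 718 guarantees; the numbers only illustrate it.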
Fig. 13.67 An improper Riemann integrable non-Lebesgue integrable function
13.471 Assume that $f$ is a continuous positive function on $[0, +\infty)$ such that the improper Riemann integral
\[
\int_0^{+\infty} f(x) \, dx
\]
exists. Show that the Lebesgue integral $\int_{[0,+\infty)} f$ exists (and that both the improper Riemann integral and the Lebesgue integral of $f$ on $[0, +\infty)$ coincide). If we do not assume that $f$ is nonnegative, the statement on the existence of the Lebesgue integral may fail. Show an example.
Hint. Fatou Lemma 745. For the second question, see Fig. 13.67 or, alternatively, check Example 806.
13.5.4
Notes on Vector-Valued Riemann Integration
The purpose of this section is to briefly discuss, through some exercises, how the Riemann integral for real-valued functions is extended to vector-valued functions, and to propose in some of them examples that demonstrate the differences between the scalar- and the vector-valued case. First, we provide the basic definitions and simplest results. Details and examples are deferred to Exercises 13.472 to 13.478. The framework consists of a function $f$ defined on a closed and bounded interval $[a, b]$ in $\mathbb{R}$ and taking values in a Banach space $X$. Of course, it is enough to consider the case $[a, b] := [0, 1]$. The concepts of a partition and of a tagged partition were introduced in Sect. 1.1 and in the paragraph preceding Definition 667, respectively. Following the real case (see Definition 661), we say that $f$ is Riemann integrable on $[0, 1]$ if there is a point $z \in X$ (called the Riemann integral of $f$ in $[0, 1]$, and written $\int_0^1 f$) such that for every $\varepsilon > 0$, there is $\delta > 0$ with the property that for every tagged partition $(P, \{s_i\})$, where $P := \{0 = t_0 < t_1 < \ldots < t_n = 1\}$, we have
\[
\|R(f, P, \{s_i\}) - z\| := \Big\| \sum_{i=0}^{n-1} f(s_i)(t_{i+1} - t_i) - z \Big\| < \varepsilon, \tag{13.42}
\]
whenever the norm of the partition $P$ is less than $\delta$. Of course, if $f$ is Riemann integrable, then its Riemann integral is unique. Thus for real-valued functions, the vector-valued definition agrees with the standard definition of Riemann integrability (see the equivalence (i)⇔(ii) in Theorem 669).
As in the real case, $f$ has a Riemann integral if the following condition, called condition (D) (for J. G. Darboux), holds true: For every $\varepsilon > 0$, there is $\delta > 0$ such that
\[
\sum_{i=0}^{n-1} \sup_{s,u \in [t_i, t_{i+1}]} \|f(s) - f(u)\| \, (t_{i+1} - t_i) < \varepsilon
\]
whenever the norm of the partition $P := \{0 = t_0 < t_1 < \ldots < t_n = 1\}$ is less than $\delta$. It can be proved that $f$ satisfies (D) if and only if the set of all points of discontinuity of $f$ is null in $[0, 1]$ (see, e.g., [Go91]). In particular, every continuous function $f : [0, 1] \to X$ satisfies (D), hence it is Riemann integrable (see Exercise 13.472 below). However, condition (D) is not necessary for the existence of the Riemann integral, unlike in the real case (see Exercise 13.473 below).
13.472 Show that any Banach space-valued function that is continuous has a Riemann integral.
Hint. Due to the uniform continuity, condition (D) is satisfied.
13.473 Let $\{r_n\}_{n=1}^\infty$ be the sequence of all distinct rational numbers in $[0, 1]$ and define a function $f$ from $[0, 1]$ into $c_0$ as follows:
\[
f(t) = \begin{cases} 0 & \text{if } t \text{ is irrational}, \\ e_n & \text{if } t = r_n, \ n \in \mathbb{N}, \end{cases} \tag{13.43}
\]
where $\{e_n\}_{n=1}^\infty$ is the sequence of the standard unit vectors in $c_0$.
(i) Show that $f$ is Riemann integrable with integral 0.
(ii) Show that condition (D) is not satisfied.
(iii) Show that $f$ is continuous nowhere on $[0, 1]$.
Hint. (i) If $0 = t_0 < t_1 < \ldots < t_n = 1$ is a partition of $[0, 1]$ of norm less than $\varepsilon$ and $\{s_i\}$ are tags to this partition, then check that the Riemann sum (13.42) corresponding to this tagged partition is, in the norm $\|\cdot\|_\infty$ of $c_0$, not greater than $2\varepsilon$ (watch for the endpoints of the partition). (ii) Given a partition $0 = t_0 < t_1 < \ldots < t_n = 1$, check that $\sup_{u,v \in [t_i, t_{i+1}]} \|f(u) - f(v)\| = 1$ for each $i$. (iii) Any point of $[0, 1]$ is the limit of a sequence of distinct $r_n$'s.
13.474 Repeat Exercise 13.473 for $f : [0, 1] \to \ell_2$ defined as in (13.43), where $\{e_n\}_{n=1}^\infty$ is now the sequence of standard unit vectors in $\ell_2$.
Hint. Yes, the function is Riemann integrable. Use a similar proof, and consider the following computation (for the definition of Riemann sums $R(f, P, \{t_i\})$ see (7.2)):
\[
\|R(f, P, \{s_i\})\| = \Big\| \sum_{i=1}^n f(s_i)(t_i - t_{i-1}) \Big\| \leq \Big( \sum_{i=1}^n (t_i - t_{i-1})^2 \Big)^{1/2} \leq |P|^{1/2} \Big( \sum_{i=1}^n (t_i - t_{i-1}) \Big)^{1/2},
\]
where $|P|$ denotes the norm of the partition $P$.
13.475 Repeat Exercise 13.473 for $\{e_i\}$ lying in $\ell_1$.
Hint. The function is not Riemann integrable. To check this observe that if $P := \{0 = t_0 < t_1 < \ldots < t_n = 1\}$ is any partition, $\{r_i\}_{i=1}^n$ are rational tags, and $\{s_i\}_{i=1}^n$ are irrational tags, then $\|R(f, P, \{r_i\}) - R(f, P, \{s_i\})\| = 1$. The situation with $\ell_1$ is somewhat special: The space $\ell_1$ is an example of a Banach space $X$ having the so-called Lebesgue property, i.e., functions $f : [0, 1] \to X$ are Riemann integrable precisely when they are Darboux integrable, i.e., when they have property (D) (see, e.g., [Go91]).
13.476 Let the function $f$ from $[0, 1]$ into $c_0$ be defined by $f(t) = \{\frac{t}{2}, \frac{t^2}{2^2}, \ldots\}$. Show that $f$ is Riemann integrable and evaluate its Riemann integral.
Hint. Show that $f$ is continuous on $[0, 1]$. Considering the coordinates, we get that each coordinate function $f(t)_i$ has Riemann integral equal to $\frac{1}{(i+1)2^i}$, for $i \in \mathbb{N}$, so the result is $\{\frac{1}{(i+1)2^i}\}_{i=1}^\infty$.
13.477 Let $E$ be a Cantor set in $[0, 1]$ of positive measure (see Sect. 3.1.5). Let the function $f$ from $[0, 1]$ into $\ell_\infty[0, 1]$ be defined as follows: $f(t) = 0$ if $t \notin E$, and $f(t) = \chi_t$ if $t \in E$, where $\chi_t$ denotes the characteristic function of the singleton $\{t\}$. Show that $f$ is Riemann integrable on $[0, 1]$, while the function $\|f\|$ is not.
Hint. The first part as in Exercise 13.473. For the second part, note that $\|f\|$ is the characteristic function of the set $E$.
The version of a famous theorem of Pettis in this context says (see, e.g., [Di84, p. 25]): A function $f$ from $[0, 1]$ into a Banach space $X$ is measurable if and only if $x^* \circ f$ is a Lebesgue measurable function on $[0, 1]$ for each $x^* \in X^*$ and, at the same time, there is a set $A$ in $[0, 1]$ of Lebesgue measure 0 such that the closed linear hull of the set $\{f(t);\ t \notin A\}$ is a separable subspace of $X$. B. J. Pettis was an American mathematician. In this connection, we have the following exercise:
13.478 Let the function $f$ from $[0, 1]$ into $\ell_\infty$ be defined as follows: $f(t) = \chi_{[0,t]}$ for $t \in [0, 1]$. Show that $f$ is Riemann integrable (with integral 0) and that $f$ is not measurable.
Hint. For the first part, follow Exercise 13.477. For the second part, use Pettis' theorem preceding the statement of the actual exercise: Given a set $A \subset [0, 1]$ of measure 0, $[0, 1] \setminus A$ is uncountable. As $f$ is one-to-one, $f([0, 1] \setminus A)$ is uncountable. Note that $\|f(t_1) - f(t_2)\|_\infty = 2$ if $t_1 \neq t_2$. Thus $f([0, 1] \setminus A)$ is not separable.
Fig. 13.68 A step function (in bold) and its continuous approximation (dashed)
If $f$ is a simple measurable function from $[0, 1]$ into a Banach space $X$, $f = \sum x_i \chi_{A_i}$ with all $A_i$ having a finite measure $\mu(A_i)$, we define its Bochner integral $\int_{[0,1]} f$ as $\sum x_i \mu(A_i)$. If $X$ is a Banach space, a measurable function $f$ from $[0, 1]$ into $X$ is called Bochner integrable if there is a sequence of measurable simple functions $f_n$ from $[0, 1]$ into $X$ such that $f_n$ converges almost everywhere to $f$ and $\int_{[0,1]} \|f_n - f_m\| \to 0$. The Bochner integral $\int_{[0,1]} f$ is then defined as $\lim_n \int_{[0,1]} f_n$. The following result characterizes Bochner integrability: A function $f$ from $[0, 1]$ into a Banach space $X$ is Bochner integrable if and only if $f$ is measurable and the Lebesgue integral $\int_{[0,1]} \|f\|$ is finite. S. Bochner was an American mathematician. Therefore, for real-valued functions, the notions of Bochner and Lebesgue integrability coincide. Since not every Riemann integrable function is measurable (see Exercise 13.478), there are functions that are Riemann integrable but not Bochner integrable.
13.5.5
The Lebesgue Integral
Basics
13.479 Let $s$ be a real-valued step function on a general interval $I \subset \mathbb{R}$. Prove that for every $\varepsilon > 0$, there exists a continuous function $h \in C(I)$ such that $\int_I |s - h| < \varepsilon$.
Hint. Have a look at Fig. 13.68.
Convergence Theorems
13.480 Show, by giving an example, that the statement in Lebesgue Dominated Convergence Theorem 750 does not hold true if the assumption on the existence of an integrable majorant is dropped.
Hint. $n \chi_{[0, \frac{1}{n}]}$.
13.481 Show that the Lebesgue integral $\int_{[0,+\infty)} \frac{x \, dx}{1 + x^a}$ is finite for every $a > 2$, and that the function $F$ defined for $a > 2$ by
\[
F(a) = \int_{[0,+\infty)} \frac{x \, dx}{1 + x^a}
\]
is continuous on $(2, \infty)$.
Hint. Standard. The continuity follows from the Lebesgue Dominated Convergence Theorem 750. See Fig. 13.69.
Fig. 13.69 Several functions in Exercise 13.481
13.482 Show that, as a Lebesgue integral,
\[
\int_{[0,+\infty)} e^{-x} \cos\sqrt{x} \, dx = \sum_{n=0}^\infty (-1)^n \frac{n!}{(2n)!}.
\]
Hint. Observe first that the sequence of functions $\{f_n\}_{n=1}^\infty := \{\chi_{[0,n]} \cdot e^{-x} \cos\sqrt{x}\}_{n=1}^\infty$ on $[0, +\infty)$ satisfies $|f_n(x)| \leq e^{-x}$ for all $x \in [0, +\infty)$. Since the function $e^{-x}$ is clearly Lebesgue integrable on $[0, +\infty)$, and $\{f_n\}_{n=1}^\infty$ converges pointwise to the function $e^{-x} \cos\sqrt{x}$ on $[0, +\infty)$, this latter function is Lebesgue integrable on $[0, +\infty)$. The evaluation of the Taylor series for $\cos x$ at $\sqrt{x}$ for some $x \geq 0$, and its multiplication by $e^{-x}$, gives
\[
e^{-x} \cos\sqrt{x} = e^{-x} - \frac{e^{-x} x}{2!} + \frac{e^{-x} x^2}{4!} - \frac{e^{-x} x^3}{6!} + \ldots. \tag{13.44}
\]
For $n \in \mathbb{N}$, put $b_n(x) := e^{-x} x^n$ for $x \in [0, +\infty)$ (see Fig. 13.70) and $a_n := (-1)^{n+1}/(2n)!$, and apply Dirichlet's test ((i) in Theorem 477) in order to get that the convergence of the series (13.44) is uniform on $[0, +\infty)$. We can integrate both sides of formula (13.44) on $[0, +\infty)$, getting that the right-hand side is, due to the uniform convergence, a sum of integrals. Now, if $I_n := \int_{[0,+\infty)} x^n e^{-x} \, dx$, use integrations by parts to get $I_n = n I_{n-1}$. Since $I_0 = 1$, we get $I_n = n!$ for all $n \in \mathbb{N}$. The result follows.
Fig. 13.70 Some functions $b_n$ in Exercise 13.482
13.483 Let $f$ be a real-valued continuous positive decreasing function on $[0, \infty)$ such that its Lebesgue integral $\int_{[0,+\infty)} f$ is finite. Show that the Lebesgue integral $\int_{[0,+\infty)} f^2$ is finite. Compare with Exercise 13.484. Can we drop the assumption of monotonicity?
Hint. Note that $\lim_{x \to \infty} f(x) = 0$. So, for large $x$, we have $(0 \leq)\ f^2(x) \leq f(x)$. We cannot drop the monotonicity assumption. Indeed, consider a sequence $\{J_n\}_{n=1}^\infty$ of disjoint closed intervals in $[0, +\infty)$, where $\lambda(J_n) = 1/n^3$ for $n \in \mathbb{N}$, and let $f$ be a function defined on $[0, +\infty)$ that is 0 in the complement of $\bigcup_{n=1}^\infty J_n$ and whose value on $J_n$ is $n$, for $n \in \mathbb{N}$.
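Returning to the identity of Exercise 13.482, both sides can be compared numerically. The sketch below sums the series $\sum (-1)^n n!/(2n)!$ and approximates the integral of $e^{-x}\cos\sqrt{x}$ on a truncated interval by Simpson's rule. The truncation point, the number of series terms and the helper names (`simpson`, `series_value`, `integral_value`) are assumptions of this illustration, not part of the exercise.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def series_value(terms=30):
    """Partial sum of sum_{n>=0} (-1)^n n! / (2n)!."""
    return sum((-1) ** n * math.factorial(n) / math.factorial(2 * n)
               for n in range(terms))


def integral_value(upper=50.0, steps=20000):
    """Integral of exp(-x) cos(sqrt(x)) over [0, upper]; the tail beyond
    `upper` is bounded by exp(-upper) and is ignored here."""
    return simpson(lambda x: math.exp(-x) * math.cos(math.sqrt(x)),
                   0.0, upper, steps)


if __name__ == "__main__":
    print(series_value(), integral_value())
```

The two printed values should agree to many digits, which is consistent with the termwise integration justified in the hint.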
13.484 Find a function that is Lebesgue integrable on $[0, 1]$ but the square of it is not. Can such a function be bounded? Compare with Exercise 13.483.
Hint. Consider the function $f(x) = \frac{1}{\sqrt{x}}$ on $(0, 1)$. It cannot be bounded: Lebesgue's Dominated Convergence Theorem 750. See also Example 715.1.
13.485 Show, by using a method similar to the one in the proof of Proposition 965 (the completeness of $(L_2[0, 1], \|\cdot\|_2)$), that the space $(L_1[0, 1], \|\cdot\|_1)$ is complete, too. Note that the result extends to any measurable subset of $\mathbb{R}$ instead of the interval $[0, 1]$.
Hint. We first prove an analogue to Lemma 964. Let $\{f_n\}_{n=1}^\infty$ be a sequence in $L_1[0, 1]$ such that $\sum_{n=1}^\infty \|f_n\|_1 < +\infty$. Put $S_n := \sum_{k=1}^n |f_k|$ for $n \in \mathbb{N}$. We get an increasing sequence in $L_1[0, 1]$ such that
\[
\int_{[0,1]} S_n = \int_{[0,1]} \sum_{k=1}^n |f_k| = \sum_{k=1}^n \int_{[0,1]} |f_k| \leq \sum_{k=1}^\infty \|f_k\|_1 < +\infty.
\]
The Monotone Convergence Theorem 744 shows that there exists a Lebesgue integrable function $S$ in $[0, 1]$ such that $S_n \to S$ (a.e.), i.e., the series $\sum_{n=1}^\infty |f_n(x)|$ converges (a.e.) to $S(x)$. In particular, the series $\sum_{n=1}^\infty f_n(x)$ converges (to a real number $f(x)$) (a.e.). This defines (a.e.) a measurable function $f$. The Lebesgue Dominated Convergence Theorem 750 shows that $f$ is in $L_1[0, 1]$, due to the fact that $|\sum_{k=1}^n f_k| \leq S$ (a.e.). The proof of the completeness of $L_1[0, 1]$ uses the result just proved and the same argument as in the proof of Proposition 965.
Measure and Integration
13.486 Show that if for a real-valued measurable function $f$ on $\mathbb{R}$ we have $\int_I f = 0$ for every interval $I$, then $f = 0$ (a.e.).
Hint. This is proved in the text in Proposition 770. Here, we give a slightly different proof. Observe first that $\int_G f = 0$ for every open set $G$, and thus $\int_H f = 0$ for every $G_\delta$ set $H$, and thus $\int_M f = 0$ for every measurable set $M$ (Corollary 267). Therefore, $\int f^+ = 0$ and, because $f^+ \geq 0$, we have $f^+ = 0$ (a.e.) (see Corollary 761). Similarly for $f^-$, and thus $f = f^+ - f^- = 0$ (a.e.).
Functions Defined by Integrals
13.487 Follow the hint to give an alternative proof to Theorem 763 on the absolute continuity of a measure defined by an integral.
Hint. Assume first that $f$ is bounded on $S$ (say $|f(x)| \leq K$ for $x \in S$). Then, the result follows, since
\[
\Big| \int_E f \Big| \leq \int_E |f| \leq K \lambda(E). \tag{13.45}
\]
Assume now that $f$ is unbounded on $S$, and let $\hat{f}$ be the extension of $f$ to $\mathbb{R}$ that vanishes on $S^c$. Let $\nu$ be the measure defined by $|\hat{f}|$ as in Proposition 764. Put $S_n := \{x \in S : |f(x)| \in [n - 1, n)\}$ for $n \in \mathbb{N}$. Since $S = \bigcup_{n=1}^\infty S_n$, and the sequence $\{S_n\}_{n=1}^\infty$ consists of pairwise disjoint measurable subsets, Proposition 764 gives $\nu(S) = \sum_{n=1}^\infty \nu(S_n)$. Since $\nu(S) < +\infty$, given $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $\sum_{n=N+1}^\infty \nu(S_n) < \varepsilon/2$. Find $\delta > 0$ such that $\delta < \varepsilon/(2N)$, and let $E$ be a measurable subset of $S$ such that $\lambda(E) < \delta$. Then, from the fact that $\nu$ is a measure, and from (13.45), we get
\[
\Big| \int_E f \Big| \leq \int_E |f| = \nu(E) = \nu\Big( E \cap \bigcup_{n=1}^N S_n \Big) + \nu\Big( E \cap \bigcup_{n=N+1}^\infty S_n \Big) \leq N \lambda(E) + \sum_{n=N+1}^\infty \nu(S_n) < \varepsilon/2 + \varepsilon/2 = \varepsilon.
\]
Riemann Versus Lebesgue Integrability
13.488 Show that the integral
\[
\int_0^{\pi/2} \ln(\sin x) \, dx
\]
exists in the improper Riemann and in the Lebesgue sense, and evaluate it.
Hint. On the interval $[0, \pi/2]$ we have $\sin x \geq x - x^3/6 \geq 0$. This follows from the fact that the Taylor series of $\sin x$ expanded at $x_0 = 0$, when evaluated at $x \in [0, \pi/2]$, is alternating, and we can apply what was said in Remark 184. Since $\ln x$ is an increasing function on $(0, +\infty)$ we get $\ln(\sin x) \geq \ln(x - x^3/6)$. We have then
\[
0 \geq \ln(\sin x) \geq \ln(x - x^3/6) \geq \ln(x^3 - x^3/6) = \ln(5/6) + 3 \ln x, \quad \text{for } x \in (0, 1].
\]
Since $\int_0^1 \ln x \, dx$ exists as an improper Riemann integral (see Exercise 13.466) and the function $\ln x$ is negative on $(0, 1)$, we get from (i) in the comparison Theorem 718 that $\ln(\sin x)$ exists as an improper Riemann integral on $[0, \pi/2]$. By Proposition 789, $\ln(\sin x)$ is also Lebesgue integrable on $[0, \pi/2]$, and the two integrals coincide. The graph of the function $\ln(\sin x)$ on $(0, \pi/2]$ is depicted in Fig. 13.71.
Fig. 13.71 The function $\ln(\sin x)$ on $[0, \pi/2]$ (Exercise 13.488)
In order to compute the value of the integral, we can proceed as follows:
\[
\begin{aligned}
\int_0^{\pi/2} \ln(\sin x) \, dx &= 2 \int_0^{\pi/4} \ln(\sin 2t) \, dt = 2 \int_0^{\pi/4} \ln(2 \sin t \cos t) \, dt \\
&= \frac{\pi}{2} \ln 2 + 2 \int_0^{\pi/4} \ln(\sin t) \, dt + 2 \int_0^{\pi/4} \ln(\cos t) \, dt \\
&= \frac{\pi}{2} \ln 2 + 2 \int_0^{\pi/4} \ln(\sin t) \, dt - 2 \int_{\pi/2}^{\pi/4} \ln \cos\Big( \frac{\pi}{2} - u \Big) \, du \\
&= \frac{\pi}{2} \ln 2 + 2 \int_0^{\pi/4} \ln(\sin t) \, dt + 2 \int_{\pi/4}^{\pi/2} \ln(\sin t) \, dt \\
&= \frac{\pi}{2} \ln 2 + 2 \int_0^{\pi/2} \ln(\sin t) \, dt.
\end{aligned}
\]
Thus, by comparing the right- and the left-hand side of this equality we get $\int_0^{\pi/2} \ln(\sin x) \, dx = -\frac{\pi}{2} \ln 2$.
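The value $-\frac{\pi}{2}\ln 2$ obtained in Exercise 13.488 can be checked numerically. Since the integrand is unbounded near 0, the sketch below uses a midpoint rule, which never evaluates the function at the endpoint; the name `midpoint_log_sin` and the number of cells are assumptions of this illustration, and the accuracy near the singularity is only of the order of the cell width.

```python
import math


def midpoint_log_sin(cells=200000):
    """Midpoint-rule approximation of the integral of ln(sin x) over (0, pi/2].
    Midpoints avoid the logarithmic singularity at x = 0."""
    h = (math.pi / 2) / cells
    return h * sum(math.log(math.sin((k + 0.5) * h)) for k in range(cells))


if __name__ == "__main__":
    approx = midpoint_log_sin()
    exact = -math.pi / 2 * math.log(2)
    print(approx, exact, abs(approx - exact))
```

The agreement illustrates that the improper integral converges, although the numerical experiment proves nothing by itself.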
In [Foll99], the difference between the Riemann and the Lebesgue integral is summarized by saying that “to compute the Riemann integral of f , one partitions the domain [a, b] into subintervals,” while in the Lebesgue integral, “one is in effect partitioning the range of f.” However, not everybody agrees. For example, in ([Bo96], p. 208), the following remark is added: “This definition of the Lebesgue integral is superficially much like the definition of the Riemann integral. One of the most important differences is the use of measurable sets instead of intervals. In Lebesgue’s original definition of an integral, the range of the function was partitioned, instead of the domain. That often used to be thought (incorrectly) to be the essential difference between Lebesgue’s and Riemann’s definitions.” Both authors are partially right. We present in Exercise 13.489 below the precise statements, preceded by the introduction of the concepts needed. Let [a, b] and [c, d] be two closed and bounded intervals in R, and let f : [a, b] → [c, d] be a measurable function. Given a partition P ∈ P[c, d], say P := {c = y0
0. Let P := {c = y0 < y1 < . . . < yn = d} ∈ P[c, d] such that |P | < ε. Define l(f , P ) and u(f , P ) as in (13.48). Note that
$\big(u(f, P) - l(f, P)\big)(x) < \varepsilon$ for every $x \in [a, b]$. This proves that $\int_{[a,b]} u(f, P) - \int_{[a,b]} l(f, P) \leq \varepsilon(b - a)$. Since $\int_{[a,b]} l(f, P) \leq \int_{[a,b]} f \leq \int_{[a,b]} u(f, P)$ and $\varepsilon > 0$ was arbitrary, we obtain $LL(f) = LU(f) = \int_{[a,b]} f$. Conversely, assume that $LL(f) = LU(f)$. Given $n \in \mathbb{N}$, we can find $P_n, Q_n \in \mathcal{P}[c, d]$ such that $LU(f, Q_n) - LL(f, P_n) < 1/n$. Take $R_1 \in \mathcal{P}[c, d]$ such that, simultaneously, $P_1 \prec R_1$, $Q_1 \prec R_1$, and $|R_1| < 1$. Fix $n \in \mathbb{N}$ and assume that $R_k$ has been already defined, for $k = 1, 2, \ldots, n$. Then choose $R_{n+1} \in \mathcal{P}[c, d]$ such that, simultaneously, $R_n \prec R_{n+1}$, $P_{n+1} \prec R_{n+1}$, $Q_{n+1} \prec R_{n+1}$, and $|R_{n+1}| < 1/(n + 1)$. In this way, we define a sequence $\{R_n\}_{n=1}^\infty$ in $\mathcal{P}[c, d]$. We have, by (13.52), $l(f, R_n) \leq f \leq u(f, R_n)$, and $LU(f, R_n) - LL(f, R_n) < 1/n$, for all $n \in \mathbb{N}$. Due to the fact that $|R_n| < 1/n$ for all $n \in \mathbb{N}$, the (increasing) sequence $\{l(f, R_n)\}_{n=1}^\infty$ converges pointwise to $f$. Since $\{LL(f, R_n)\}_{n=1}^\infty$ converges (necessarily to $LL(f)$), the Monotone Convergence Theorem 744 ensures that $f \in L([a, b])$, and $\int_{[a,b]} f = LL(f) = LU(f)$.
The Fundamental Theorem of Calculus for the Lebesgue Integral
In Exercise 13.490, we reprove a result that is the Lipschitz case of the Fundamental Theorem of Calculus 791 for the Lebesgue integral. For the case of Lipschitz functions, its proof is considerably easier. Still, the broad scope of its applications makes the statement, and its proof, worth considering.
Fig. 13.73 The functions f and the first five ϕn in the proof of Theorem 1083 for a particular f
13.490 Prove the following result.
Theorem 1083 (Fundamental Theorem of Calculus for the Lebesgue Integral, Lipschitz Case) Let $F$ be a Lipschitz function on a closed and bounded interval $[a, b]$. Then $F'$ exists (a.e.) on $[a, b]$ and $F' \in L[a, b]$. Moreover,
\[
F(b) - F(a) = \int_{[a,b]} F',
\]
where the integral is understood in the Lebesgue sense.
Hint. Assume that $F$ is $M$-Lipschitz for some $M > 0$. Extend the definition of $F$ to $[a, b + 1]$ by putting $F(x) = F(b)$ for $x \in [b, b + 1]$, and call this extension again $F$. The function $F$ has then a finite derivative $F'$ almost everywhere on $[a, b + 1]$ as $F$ is $M$-Lipschitz (see Proposition 444). For $x \in [a, b]$ and $n \in \mathbb{N}$, put
\[
\varphi_n(x) = n \Big( F\Big( x + \frac{1}{n} \Big) - F(x) \Big).
\]
Each $\varphi_n$ is a continuous function, and the sequence $\{\varphi_n\}_{n=1}^\infty$ converges almost everywhere on $[a, b]$ to $F'$ (see Fig. 13.73); thus, $F'$ is Lebesgue measurable. The fact that $F$ is $M$-Lipschitz implies (i) that $F'$ is bounded almost everywhere by $M$, so $F'$ is Lebesgue integrable on $[a, b]$ (footnote 3), and (ii) that $|\varphi_n(x)| \leq M$ on $[a, b]$. By the Lebesgue Dominated Convergence Theorem, we have
\[
\int_{[a,b]} F' = \lim_{n \to \infty} \int_{[a,b]} \varphi_n. \tag{13.53}
\]
Footnote 3: That $F' \in L[a, b]$ is true for $F$ in the more general class of functions of bounded variation, see Corollary 749. The proof given there used the same idea as here, although Fatou's Lemma 745 was needed for getting the conclusion.
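We have, by easy substitutions, by the First Mean Value Theorem 692 for Riemann integrals, and by the continuity of $F$, that there are $\theta_n, \theta_n' \in [0, 1]$ such that
\[
\begin{aligned}
\int_{[a,b]} \varphi_n &= n \Big( \int_{a + \frac{1}{n}}^{b + \frac{1}{n}} F - \int_a^b F \Big) = n \Big( \int_b^{b + \frac{1}{n}} F - \int_a^{a + \frac{1}{n}} F \Big) \\
&= n \Big( \frac{1}{n} F\Big( b + \frac{\theta_n}{n} \Big) - \frac{1}{n} F\Big( a + \frac{\theta_n'}{n} \Big) \Big) \\
&= F\Big( b + \frac{\theta_n}{n} \Big) - F\Big( a + \frac{\theta_n'}{n} \Big) \longrightarrow F(b) - F(a).
\end{aligned} \tag{13.54}
\]
By (13.53) and (13.54), this means that
\[
\int_{[a,b]} F' = F(b) - F(a).
\]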
13.491 Wallis Infinite Product Use (1) in Exercise 13.460 for $n = 0$ in order to prove the so-called Wallis infinite product for $\pi$:
\[
\prod_{n=1}^\infty \frac{2n}{2n - 1} \cdot \frac{2n}{2n + 1} = \frac{\pi}{2}. \tag{13.55}
\]
Hint. Put $I(n) := \int_0^\pi \sin^n x \, dx$. Note that $I(0) = \pi$, and $I(1) = 2$. Use the reduction formula (1) in Exercise 13.460 to get, by finite induction,
\[
I(2n) = \pi \prod_{k=1}^n \frac{2k - 1}{2k}, \tag{13.56}
\]
and
\[
I(2n + 1) = 2 \prod_{k=1}^n \frac{2k}{2k + 1}, \tag{13.57}
\]
for $n \in \mathbb{N}$. Observe that $\sin^{2n+1} x \leq \sin^{2n} x \leq \sin^{2n-1} x$ for $x \in [0, \pi]$, hence $I(2n + 1) \leq I(2n) \leq I(2n - 1)$, and so
\[
1 \leq \frac{I(2n)}{I(2n + 1)} \leq \frac{I(2n - 1)}{I(2n + 1)} = \frac{2n + 1}{2n}. \tag{13.58}
\]
This shows that
\[
\lim_{n \to \infty} \frac{I(2n)}{I(2n + 1)} = 1. \tag{13.59}
\]
Since
\[
\frac{I(2n)}{I(2n + 1)} = \frac{\pi}{2} \prod_{k=1}^n \frac{2k - 1}{2k} \cdot \frac{2k + 1}{2k},
\]
we get from (13.59) formula (13.55).
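The convergence in (13.55) is slow, which a short computation makes visible. The sketch below evaluates partial products of the Wallis product and the integrals $I(n)$ through the recursion $I(n) = \frac{n-1}{n} I(n-2)$ that follows from formula (1) of Exercise 13.460 on $[0, \pi]$; the function names `wallis_partial_product` and `sine_power_integral` are hypothetical and chosen only for this illustration.

```python
import math


def wallis_partial_product(n):
    """Product over k = 1..n of (2k/(2k-1)) * (2k/(2k+1))."""
    p = 1.0
    for k in range(1, n + 1):
        p *= (2 * k / (2 * k - 1)) * (2 * k / (2 * k + 1))
    return p


def sine_power_integral(n):
    """I(n) = integral of sin^n x over [0, pi], computed from I(0) = pi,
    I(1) = 2 and the recursion I(j) = (j-1)/j * I(j-2)."""
    value = math.pi if n % 2 == 0 else 2.0
    for j in range(2 + n % 2, n + 1, 2):
        value *= (j - 1) / j
    return value


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, wallis_partial_product(n), math.pi / 2)
    n = 50
    ratio = sine_power_integral(2 * n) / sine_power_integral(2 * n + 1)
    print(ratio, (2 * n + 1) / (2 * n))  # compare with the bounds in (13.58)
```

The partial products increase towards $\pi/2$ with an error that decays roughly like $1/n$, and the printed ratio should lie between 1 and $(2n+1)/(2n)$ as stated in (13.58).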
Fig. 13.74 The function in Exercise 13.493 on [0, 3]
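13.492 (Wallis Formula) Use Exercise 13.491 to prove the so-called Wallis formula: If we put $I_n := \int_0^\pi \sin^n x \, dx$ for $n \in \mathbb{N}$, then
\[
\lim_{n \to \infty} \sqrt{n} \cdot I_n = \sqrt{2\pi}. \tag{13.60}
\]
Hint. Put $P_n := \prod_{k=1}^n \frac{2k}{2k - 1} \cdot \frac{2k}{2k + 1}$ for $n \in \mathbb{N}$. Formula (13.55) shows that $P_n \to \pi/2$ as $n \to \infty$. From (13.57) we get $P_n = (I(2n + 1)/2)^2 \cdot (2n + 1)$. This shows that $\sqrt{2n + 1}\, I(2n + 1) \to \sqrt{2\pi}$ as $n \to \infty$. That $\sqrt{2n}\, I(2n) \to \sqrt{2\pi}$ follows from this and from (13.58).
13.493 Integral of Probability Prove, by following the hint, that the Laplace integral $\int_0^{+\infty} e^{-x^2} \, dx$ (also called the probability integral) exists as an improper Riemann integral and a Lebesgue integral, and
\[
\int_0^{+\infty} e^{-x^2} \, dx = \frac{\sqrt{\pi}}{2} \tag{13.61}
\]
(see Fig. 13.74).
Hint. First, we note that for $0 < \varepsilon < 1$, we have $0 < (1 - \varepsilon)^{1/\varepsilon} < e^{-1}$. This follows from the fact that the function $\varepsilon \mapsto (1 - \varepsilon)^{1/\varepsilon}$ is decreasing on $(0, 1)$ (by using its derivative) and $\lim_{\varepsilon \to 0+} (1 - \varepsilon)^{1/\varepsilon} = e^{-1}$. Thus, given $x > 0$ and $n \in \mathbb{N}$ such that $n > x^2$,
\[
0 < \Big( 1 - \frac{x^2}{n} \Big)^{n x^{-2}} < e^{-1}, \quad \text{hence} \quad 0 < \Big( 1 - \frac{x^2}{n} \Big)^n < e^{-x^2}.
\]
For $n \in \mathbb{N}$ and $x \geq 0$, put
\[
f_n(x) := \begin{cases} \Big( 1 - \dfrac{x^2}{n} \Big)^n & \text{if } x^2 \leq n, \\ 0 & \text{if } x^2 > n. \end{cases}
\]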
Then, for all $x \geq 0$,
\[
f_n(x) \leq e^{-x^2} \quad \text{and} \quad \lim_{n \to \infty} f_n(x) = e^{-x^2}.
\]
By the Lebesgue Dominated Convergence Theorem 750,
\[
\int_0^{+\infty} e^{-x^2} \, dx = \lim_{n \to \infty} \int_0^{+\infty} f_n(x) \, dx = \lim_{n \to \infty} \int_0^{\sqrt{n}} \Big( 1 - \frac{x^2}{n} \Big)^n dx.
\]
We can use the change of variable formula in the Riemann integrability version for the continuous function $f_n|_{[0, \sqrt{n}]}$ (Theorem 702): put $\frac{x}{\sqrt{n}} = \cos t$ to get
\[
\int_0^{\sqrt{n}} \Big( 1 - \frac{x^2}{n} \Big)^n dx = \sqrt{n} \int_0^{\pi/2} \sin^{2n+1} t \, dt = \frac{\sqrt{n}}{\sqrt{2n + 1}} \cdot \sqrt{2n + 1} \int_0^{\pi/2} \sin^{2n+1} t \, dt.
\]
Observe that, due to the symmetry with respect to $\pi/2$ of $\sin x$ on $[0, \pi]$, we have $\int_0^\pi \sin^n x \, dx = 2 \int_0^{\pi/2} \sin^n x \, dx$ for all $n \in \mathbb{N}$. Use Wallis' formula (Exercise 13.492) to get finally
\[
\int_0^{+\infty} e^{-x^2} \, dx = \frac{\sqrt{\pi}}{2}.
\]
13.494 The hint below shows how to prove the following result:
Theorem 1084 (Fubini on Differentiation of Series) Let $s(x) := \sum_{n=1}^\infty f_n(x)$ be a pointwise convergent series of increasing functions on an interval $[a, b]$. Then, for almost all $x \in [a, b]$, we have that $s'(x) = \sum_{n=1}^\infty f_n'(x)$ is finite.
Hint. First, observe that we can assume $f_n(a) = 0$ for all $n$ (and then $f_n(x) \geq 0$ for all $x \in [a, b]$ and all $n \in \mathbb{N}$). Indeed, by considering functions $f_n(x) - f_n(a)$ instead, the resulting series of functions still converges pointwise, and the series of their derivatives does not change. For $x \in [a, b]$ and $n \in \mathbb{N}$, put
\[
s_n(x) = \sum_{k=1}^n f_k(x).
\]
Then, for all $m, n \in \mathbb{N}$, $m \geq n$, all the functions
\[
f_n, \quad s_n, \quad s_m - s_n, \quad s, \quad s - s_n \tag{13.62}
\]
are nonnegative increasing and all equal to 0 at the point $a$. By Lebesgue's Theorem 424, there is a set $N$ of measure 0 in $[a, b]$ such that all functions in (13.62) have finite derivative at each point of $[a, b] \setminus N$. Observe that $0 \leq s_1'(x) \leq s_2'(x) \leq \ldots \leq s'(x) < \infty$ for each $x \in [a, b] \setminus N$. Therefore, for $x \in [a, b] \setminus N$ we have
\[
\sum_{n=1}^\infty f_n'(x) = \lim_{n \to \infty} s_n'(x) \leq s'(x) < \infty, \tag{13.63}
\]
so the series $t(x) := \sum_{n=1}^\infty f_n'(x)$ is convergent. Choose $n_1 < n_2 < n_3 < \ldots$ in $\mathbb{N}$ so that $s(b) - s_{n_k}(b) < 2^{-k}$ for all $k \in \mathbb{N}$. Then $0 \leq s(x) - s_{n_k}(x) < 2^{-k}$ for all $k \in \mathbb{N}$ and for all $x \in [a, b]$. It follows that the series of increasing functions $\sum_{k=1}^\infty \big( s(x) - s_{n_k}(x) \big)$ is pointwise convergent on $[a, b]$. Apply to this series what has been already proved for any series of increasing functions: By (13.63), the series $\sum_{k=1}^\infty \big( s'(x) - s_{n_k}'(x) \big)$ is convergent for almost all $x \in [a, b]$. Therefore its $k$th term has limit 0 for almost all $x \in [a, b]$. Thus $\lim_{k \to \infty} s_{n_k}'(x) = s'(x)$ for almost all $x \in [a, b]$. Then, again by (13.63), we have that $t(x) = \lim_{n \to \infty} s_n'(x) = s'(x)$ for almost all $x \in [a, b]$.
The following exercise states a result of H. Lebesgue on what is called the density of a measurable set at a point, a concept that is defined below.
Definition 1085 Let $M$ be a measurable set in $\mathbb{R}$ and $x \in \mathbb{R}$. We say that $M$ has density $d_M(x)$ at $x \in \mathbb{R}$ if the following limit exists:
\[
d_M(x) := \lim_{[y,z] \to [x,x],\ y \leq x \leq z} \frac{\lambda([y, z] \cap M)}{z - y}, \tag{13.64}
\]
where $\lambda$ denotes the Lebesgue measure on $\mathbb{R}$. The limit (13.64) should be understood in the following way: For every $\varepsilon > 0$ there exists $\delta > 0$ such that, if $y < x < z$ and $z - y < \delta$, then $|\lambda([y, z] \cap M)/(z - y) - d_M(x)| < \varepsilon$.
13.495 Prove the so-called Lebesgue density theorem below by following the hint.
Theorem 1086 (Lebesgue Density Theorem) Let $M$ be a measurable set in $\mathbb{R}$. Then
\[
d_M(x) = \begin{cases} 1 & \text{for almost all } x \in M, \\ 0 & \text{for almost all } x \in \mathbb{R} \setminus M, \end{cases}
\]
where $d_M$ was introduced in Definition 1085.
Hint. Obviously, it is enough to prove the result for a bounded measurable set $M$. Thus, assume that $M \subset (a, b)$ for some interval $(a, b) \subset \mathbb{R}$. For $x \in (a, b)$ put $f_M(x) = \lambda(M \cap [a, x])$. Note that $\chi_M$ (i.e., the characteristic function of $M$) is a Lebesgue measurable function (Proposition 758). Since it is bounded, we have $\chi_M \in L[a, b]$ (Remark 752.2). Observe that $f_M$ is just the function $F$ defined by (7.65) for the function $f := \chi_M$, i.e., $f_M(x) = \int_{[a,x]} \chi_M(t) \, dt$ for $x \in (a, b)$. Thus, Proposition 768 shows that $f_M$ is absolutely continuous on $[a, b]$, hence of bounded variation (Proposition 436), and so it has a finite derivative (a.e.) on $(a, b)$ (Corollary 433). According to Corollary 749, this derivative $f_M'$ (defined a.e.) belongs to $L[a, b]$ and $f_M' = \chi_M$ (a.e. on $(a, b)$) (see Theorem 799). Let $x \in (a, b)$ be such that $f_M'(x)$ exists and is finite. Note that for $a < y < x < z < b$,
\[
\frac{\lambda([y, z] \cap M)}{z - y} = \frac{f_M(z) - f_M(y)}{z - y} = \frac{f_M(z) - f_M(x)}{z - x} \cdot \frac{z - x}{z - y} + \frac{f_M(y) - f_M(x)}{y - x} \cdot \frac{x - y}{z - y},
\]
and this is a convex combination of the two numbers $\frac{f_M(z) - f_M(x)}{z - x}$ and $\frac{f_M(y) - f_M(x)}{y - x}$. Each of them tends to $f_M'(x)$ as $z \to x$ and $y \to x$, so $\frac{f_M(z) - f_M(y)}{z - y} \to f_M'(x)$ as $y, z \to x$ (precisely, for every $\varepsilon > 0$ there exists $\delta > 0$ such that $a < y < x < z < b$, $z - y < \delta$, implies $\big| \frac{f_M(z) - f_M(y)}{z - y} - f_M'(x) \big| < \varepsilon$). This shows that for every $x \in (a, b)$ where $f_M'(x)$ exists and is finite, we have $d_M(x) = f_M'(x)$. This happens, according to our previous discussion, (a.e.) on $(a, b)$. Since $f_M'(x) = \chi_M(x)$ (a.e.), this shows the result.
Fig. 13.75 The function $\sin(1/x)/\sqrt{x}$ on $(0, 1]$ (Exercise 13.498)
13.496 A set $A \subset \mathbb{R}$ is called porous if there is a number $0 < \alpha < 1$ with the following property: For every $x \in A$ and for every $\delta > 0$ there is a $y \in \mathbb{R}$ such that $0 < |y - x| < \delta$ and $A \cap B(y, \alpha|y - x|) = \emptyset$. Show that every porous set is nowhere dense and that every porous set has measure 0.
Hint. Lebesgue's density Theorem 1086.
13.497 Explain in detail why the following integral exists in the improper Riemann and in the Lebesgue sense, and verify its value: $\int_{[0,1]} \frac{dx}{2\sqrt{x}} = 1$.
Hint. The function $f(x) := 1/(2\sqrt{x})$ is not bounded on $(0, 1]$, so it is not Riemann integrable there. However, the integral exists in the improper Riemann sense, since $\int_\varepsilon^1 f(x) \, dx = \big[ \sqrt{x} \big]_\varepsilon^1 = 1 - \sqrt{\varepsilon}$, and so $\int_0^1 f(x) \, dx = \lim_{\varepsilon \downarrow 0} \int_\varepsilon^1 f(x) \, dx = \lim_{\varepsilon \downarrow 0} (1 - \sqrt{\varepsilon}) = 1$. That the function $f$ is Lebesgue integrable on $[0, 1]$ (with the same integral value) is a consequence of Proposition 789 and the fact that $f$ does not change sign on $(0, 1]$.
13.498 Explain in detail why the integral $\int_{[0,1]} \frac{\sin(1/x)}{\sqrt{x}} \, dx$ exists as a Lebesgue integral.
Hint. The Lebesgue dominated convergence theorem for the sequence $\{f \cdot \chi_{[1/n,1]} : n \in \mathbb{N}\}$, where $f(x) := \sin(1/x)/\sqrt{x}$ for $x \in (0, 1]$, and Exercise 13.497. The function $f$ is plotted in Fig. 13.75.
Fig. 13.76 A Lebesgue integrable function not vanishing at ∞ (Exercise 13.499)
13.499 Is it true that if a real-valued function $f$ has finite Lebesgue integral on $\mathbb{R}$, then $\lim_{x \to +\infty} f(x) = 0$? Compare with Exercise 13.500.
Hint. No. Higher and higher "beaks" around integers (see Fig. 13.76) give an integrable function on $\mathbb{R}$.
13.500 If $f$ is a real-valued Lebesgue integrable and uniformly continuous function on $\mathbb{R}$, show that $\lim_{x \to +\infty} f(x) = 0$. Compare with Exercise 13.499.
Hint. Assume that $\{x_n\}$ is a sequence with $x_{n+1} > x_n + 1$ and such that $|f(x_n)| \geq \varepsilon$ for all $n \in \mathbb{N}$ and for some $\varepsilon > 0$. Then, from the uniform continuity, we get $\delta \in (0, 1/4)$ such that for every $n \in \mathbb{N}$ we have $\inf\{|f(x)| : x \in (x_n - \delta, x_n + \delta)\} > \varepsilon/2$. Therefore $\int_{[x_n - \delta, x_n + \delta]} |f| \geq \frac{\varepsilon}{2} 2\delta$ for all $n \in \mathbb{N}$. Since these intervals around $x_n$ are mutually disjoint, we get $\int_{\mathbb{R}} |f| = +\infty$, a contradiction (see Proposition 734).
13.501 Decide on the Riemann integrability of the characteristic function of the Cantor ternary set of measure zero and on the Riemann integrability of the characteristic function of the Cantor ternary set of positive measure (see Sect. 3.1.5 and Exercises 13.154 and 13.304).
Hint. Use the characterization of Riemann integrability (Theorem 777) and Exercise 13.185. Note that the set of points of discontinuity of the characteristic function of a Cantor ternary set $C$ (both in the case of Lebesgue measure zero or positive) is the set $C$ itself.
13.502 Let $\{f_n\}$ be a decreasing sequence of nonnegative Lebesgue integrable functions on $\mathbb{R}$ such that $\lim_{n \to \infty} \int f_n = 0$. Prove that $\lim f_n(x) = 0$ for almost every $x \in \mathbb{R}$.
Hint. The Monotone Convergence Theorem 744 applied to the sequence $\{-f_n\}_{n=1}^\infty$ and Corollary 761.
13.503 Give an example of a uniformly bounded sequence $\{f_n\}$ of Lebesgue integrable functions on $\mathbb{R}$ that converges (a.e.) to a function that is not Lebesgue integrable. Can you replace $\mathbb{R}$ by a bounded measurable set?
Hint. Let $f_n$ be defined by 0 on $(-\infty, -n)$, by $-1$ on $[-n, 0)$, by 1 on $[0, n]$, and by 0 on $(n, +\infty)$. Hint for the last part: No. Lebesgue Dominated Convergence Theorem 750.
13.504 Is the result analogous to that in Proposition 778 true for the Lebesgue integral?
Hint. No. Consider the function $f$ defined on $[-1, +1]$ by $\frac{1}{\sqrt{x}}$ on $(0, 1]$ and $\frac{1}{\sqrt{-x}}$ on $[-1, 0)$. Let then $\varphi$ be the function $x^2$.
13.5 Integration
13.505 Let $\{f_k\}$ be a sequence of measurable real-valued functions on a measurable set $E$. Show that $\sum f_n$ converges absolutely (a.e.) on $E$ if $\sum_{k} \int_E |f_k| < \infty$.
Hint. This is a straightforward consequence of Theorem 743. Indeed, consider the sequence $\{|f_n|\}_{n=1}^{\infty}$ of nonnegative measurable functions. Since $\sum_{n=1}^{\infty} \int_E |f_n| < +\infty$, the series $\sum_{n=1}^{\infty} |f_n|$ converges (a.e.) on $E$.

13.506 Use Exercise 13.505 to show the following fact: If $\{r_k\}_{k=1}^{\infty}$ is the sequence of all rational numbers in $[0, 1]$, then the series $\sum_{k=1}^{\infty} \frac{1}{2^k \sqrt{|x - r_k|}}$ converges a.e. on $[0, 1]$.
Hint. A straightforward consequence of the mentioned exercise.

13.507 Show that the statement in Fatou’s Lemma (Corollary 745) in general does not hold even if $\{f_n\}$ is an increasing sequence of real-valued measurable functions not necessarily nonnegative.
Hint. $f_n := -\chi_{[n,\infty)}$.

13.508 If $f$ and $g$ are two nonnegative integrable functions on $\mathbb{R}$, show that $\sqrt{fg}$ is integrable.
Hint. If $a$ and $b$ are two nonnegative numbers, then $ab \le \frac{a^2 + b^2}{2}$.

13.509 Show the following result of Vitali: If $p \in [1, +\infty)$ and $\{f_n\}$ is a sequence of real-valued functions that converges (a.e.) on $\mathbb{R}$ to a function $f$, and $\lim_{n\to\infty} \int |f_n|^p = \int |f|^p$, then $\lim_{n\to\infty} \int |f_n - f|^p = 0$.
Hint. $|f_n - f|^p \le (|f_n| + |f|)^p \le (2\max\{|f_n|, |f|\})^p = 2^p (\max\{|f_n|, |f|\})^p = 2^p \max\{|f_n|^p, |f|^p\} \le 2^p (|f_n|^p + |f|^p)$. Therefore, $2^p (|f_n|^p + |f|^p) - |f_n - f|^p \ge 0$. We have $2^p (|f_n|^p + |f|^p) - |f_n - f|^p \to 2^{p+1} |f|^p$ (a.e.) as $n \to \infty$. Therefore, by Fatou’s Lemma 745,
\[
2^{p+1} \int |f|^p \le \liminf \int \bigl( 2^p (|f_n|^p + |f|^p) - |f_n - f|^p \bigr) = 2^{p+1} \int |f|^p - \limsup \int |f_n - f|^p .
\]
It follows that $\limsup \int |f_n - f|^p = 0$.
13.510 In Remark 790.1c, we mentioned that absolutely continuous functions $F$ on an interval $[a, b]$ can be recovered from $F'$ (a Lebesgue integrable function) by the formula of the fundamental theorem of calculus (Theorem 791). However, for other functions $F$ (even everywhere differentiable), $F'$ may not be Lebesgue integrable. This is shown by the following example (fill the gaps in the argument). Let the function $F$ be defined on $[0, 1]$ by $F(x) = x^2 \cos(\pi/x^2)$ for $x \ne 0$ and $F(0) = 0$ (see Fig. 13.77).
13 Exercises
Fig. 13.77 The function F in Exercise 13.510
Then $F'$ exists at each point of $[0, 1]$. However, $F'$ is not Lebesgue integrable. To see this, note that if $0 < a < b \le 1$, then the function $F'$ is continuous on $[a, b]$ and we can thus use the regular fundamental theorem of calculus for the Riemann integral (Theorem 685) to find that, for example, $\int_{[\alpha_n, \beta_n]} F' = \int_{\alpha_n}^{\beta_n} F' = \frac{1}{2n}$ if $\alpha_n = \sqrt{\frac{2}{4n+1}}$ and $\beta_n = \frac{1}{\sqrt{2n}}$. Assume for a moment that $F'$ is Lebesgue integrable on $[0, 1]$. Then it is so on $M := \bigcup_{n=1}^{\infty} [\alpha_n, \beta_n]$. By using Proposition 762 for the measurable set $M$ and the sequence of pairwise disjoint intervals $\{[\alpha_n, \beta_n],\ n \in \mathbb{N}\}$, we arrive at a contradiction. Thus, the Lebesgue integral does not recover the function from its everywhere existing derivative, in general.
The integral that does this was discovered about 10 years after Lebesgue by O. Perron and, independently, with a different definition, by A. Denjoy. Perron used in his definition the lower derivatives, while Denjoy used a transfinite construction. It took about 10 more years until the German mathematician H. Hake, together with P. S. Alexándrov and the Dutch mathematician H. Looman, found that the Perron and Denjoy integrals coincide. In the 60s, a new integral that also has this property was discovered independently by the Irish mathematician R. Henstock and the Czech mathematician J. Kurzweil (see, e.g., [Sc09]).

13.511 Theorem 799 says that if $f$ is a real-valued Lebesgue integrable function defined on $[a, b]$, and $F$ is its indefinite integral on $[a, b]$, then, for almost all points $x \in [a, b]$, we have $F'(x) = f(x)$. This means that for such points $x$ we have
\[
\lim_{h\to 0} \frac{1}{h} \int_{[x,x+h]} f(y)\, dy = f(x),
\]
i.e.,
\[
\lim_{h\to 0} \frac{1}{h} \int_{[x,x+h]} (f(y) - f(x))\, dy = 0.
\]
If the following stronger statement holds for $x$, namely that
\[
\lim_{h\to 0} \frac{1}{h} \int_{[x,x+h]} |f(y) - f(x)|\, dy = 0,
\]
then the point $x$ is called a Lebesgue point of $f$. The collection of all such points is called the Lebesgue set of $f$.
Fig. 13.78 The function x/(1 + x) (Exercise 13.512)
Prove the following theorem of Lebesgue: If $f$ is a real-valued Lebesgue integrable function defined on $[a, b]$, then almost all points in $[a, b]$ are Lebesgue points of $f$.
Hint. Let $\mathbb{Q} = \{r_n : n \in \mathbb{N}\}$ be the set of all rational numbers in $\mathbb{R}$, and for $k \in \mathbb{N}$ let $Z_k$ be the set where the formula
\[
\lim_{h\to 0} \frac{1}{h} \int_{[x,x+h]} |f(y) - r_k|\, dy = |f(x) - r_k|
\]
is not valid. Since $|f(y) - r_k|$ is a Lebesgue integrable function on $[a, b]$, we have, by the Lebesgue differentiation Theorem 799, that $\lambda(Z_k) = 0$, where $\lambda$ denotes the Lebesgue measure on $\mathbb{R}$. Put $Z := \bigcup_{k=1}^{\infty} Z_k$. Then $\lambda(Z) = 0$. If $x \notin Z$, we have, for all $k$,
\[
\frac{1}{|h|} \int_{[x,x+h]} |f(y) - f(x)|\, dy
\le \frac{1}{|h|} \int_{[x,x+h]} |f(y) - r_k|\, dy + \frac{1}{|h|} \int_{[x,x+h]} |f(x) - r_k|\, dy
= \frac{1}{|h|} \int_{[x,x+h]} |f(y) - r_k|\, dy + |f(x) - r_k| .
\]
Therefore, if $x \notin Z$, we have
\[
\limsup_{h\to 0} \frac{1}{|h|} \int_{[x,x+h]} |f(y) - f(x)|\, dy \le 2 |f(x) - r_k|
\]
for every $k$. Due to the density of the set $\{r_k : k \in \mathbb{N}\}$ in the real line, we get that
\[
\limsup_{h\to 0} \frac{1}{|h|} \int_{[x,x+h]} |f(y) - f(x)|\, dy = 0
\]
if $x \notin Z$. This completes the proof.

13.512 Let $M$ be a measurable subset of $\mathbb{R}$ such that $\lambda(M) < \infty$. For $n \in \mathbb{N}$, let $f_n$ be a measurable function on $M$. Show that $f_n \to 0$ in measure if and only if
\[
\int_M \frac{|f_n|}{1 + |f_n|} \to 0.
\]
Hint. Note that the function $\tau(\varepsilon) = \frac{\varepsilon}{1+\varepsilon}$ is increasing and $\tau(\varepsilon) \le \varepsilon$ on $[0, +\infty)$ (see Fig. 13.78).
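Given $\varepsilon > 0$, put $F_n = \{x \in M : |f_n(x)| \ge \varepsilon\}$. Note that
\[
x \notin F_n \implies \frac{|f_n(x)|}{1 + |f_n(x)|} < \frac{\varepsilon}{1+\varepsilon} < \varepsilon,
\]
and
\[
x \in F_n \implies \frac{|f_n(x)|}{1 + |f_n(x)|} \ge \frac{\varepsilon}{1+\varepsilon}.
\]
Write
\[
\int_M \frac{|f_n|}{1 + |f_n|} = \int_{F_n} \frac{|f_n|}{1 + |f_n|} + \int_{M \setminus F_n} \frac{|f_n|}{1 + |f_n|}. \tag{13.65}
\]
If $f_n \to 0$ in measure, the integral on the left-hand side in (13.65) is less than or equal to $\lambda(F_n) + \varepsilon \lambda(M)$, as on $M \setminus F_n$ we have $\frac{|f_n|}{1+|f_n|} \le |f_n| < \varepsilon$. This implies that then the integral on the left-hand side in (13.65) tends to $0$ when $n \to \infty$.
Conversely, if the integral on the left-hand side tends to $0$, then the first integral on the right-hand side tends to $0$. Since, on $F_n$, $\frac{|f_n|}{1+|f_n|} \ge \frac{\varepsilon}{1+\varepsilon}$ for each $n$, we get that this first integral is greater than or equal to $\lambda(F_n) \frac{\varepsilon}{1+\varepsilon}$. Thus we get $\lambda(F_n) \to 0$ when $n \to \infty$.
Note that $\lambda(M) < \infty$ is needed: Put $f_n = \frac{1}{n}$ on $\mathbb{R}$.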
13.5.6 Convex Functions
13.513 Show directly that the function $f(x) := x^2$ is strictly convex on $\mathbb{R}$. Use alternatively Remark 821.
Hint. Fix $x < y$ in $\mathbb{R}$ and $\lambda \in (0, 1)$. We want to prove that $f(\lambda x + (1 - \lambda)y) < \lambda f(x) + (1 - \lambda) f(y)$. After squaring, collecting, and cancelling terms, this amounts to proving that $2xy < x^2 + y^2$, i.e., $(x - y)^2 > 0$, and this certainly holds. Since $f$ is continuous, we could have applied Proposition 818 from the beginning (adapted to the strict convexity case), jumping right to check the last requirement in the previous paragraph. Observe that $f''(x) = 2$ $(> 0)$ for all $x \in \mathbb{R}$. The result follows then from Remark 821.

13.514 Show that a subset $C$ of a vector space is convex if and only if every convex combination of its elements belongs to $C$. Given an arbitrary subset $S$ of a vector space $E$, show (i) that the definition of the convex hull $\operatorname{conv}(S)$ of $S$ (see Sect. 8.1) is unambiguous by proving that $\operatorname{conv}(S)$ is the intersection of all the convex subsets of $E$ that contain $S$; (ii) that $\operatorname{conv}(S)$ is the set of all convex combinations of elements in $S$.
Fig. 13.79 A convex function is above or on any tangent (Exercise 13.515)
Hint. For the first part, note that
\[
\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 = (\lambda_1 + \lambda_2) \left( \frac{\lambda_1}{\lambda_1 + \lambda_2} x_1 + \frac{\lambda_2}{\lambda_1 + \lambda_2} x_2 \right) + \lambda_3 x_3 .
\]
For the second part, it is easy to prove that the intersection of an arbitrary number of convex subsets of $E$ is again convex. This gives immediately (i). To prove (ii), it is enough to show that the set of all convex combinations of elements in $S$ is a convex set. For this, take two convex combinations $\sum_{i=1}^{n} \lambda_i x_i$ and $\sum_{j=1}^{m} \gamma_j y_j$ of elements in $S$, and $\alpha \in [0, 1]$. Then, $\alpha \sum_{i=1}^{n} \lambda_i x_i + (1 - \alpha) \sum_{j=1}^{m} \gamma_j y_j$ is clearly a convex combination of elements in $S$. Related to this exercise, see also Exercise 13.537.

13.515 Show that a real-valued differentiable function $f$ on a general open interval $I \subset \mathbb{R}$ is convex if and only if each point $(x, f(x))$ of its graph is above or on the graph of the tangent line at each point $x_0 \in I$ (see Fig. 13.79).
Hint. If $f$ is convex, and $x_0 < x$ in $I$, then $f(x) - f(x_0) = f'(\xi)(x - x_0)$ for some $\xi \in (x_0, x)$ by the Mean Value Theorem 365. Since $f'(x_0) \le f'(\xi)$ (see Proposition 811), we get $f(x) \ge f(x_0) + f'(x_0)(x - x_0)$, and the right-hand member of the previous inequality is the ordinate at $x$ of the tangent line at $x_0$. A similar argument applies for $x < x_0$. Assume now that $f$ is not convex. Then, we can find $x < z$ in $I$ and $\lambda \in (0, 1)$ such that if $y := \lambda x + (1 - \lambda)z$, then $f(y) > \lambda f(x) + (1 - \lambda) f(z)$ (see Fig. 8.9). Assume that each point $(x, f(x))$ is above or on the graph of any tangent line to $f$. Then, the slope of the tangent line at $(y, f(y))$ must be at the same time greater than the slope of the chord $[(x, f(x)), (y, f(y))]$ (in order to leave $(x, f(x))$ above) and smaller than the slope of the chord $[(y, f(y)), (z, f(z))]$ (in order to leave $(z, f(z))$ above), a contradiction.

13.516 Express the function $\sin x$ as the difference of two convex functions on the real line $\mathbb{R}$.
Hint. Consider $x^2 + \sin x$ and $x^2 - \sin x$ (see Fig. 13.80). For checking convexity, use Corollary 820.

13.517 Use Proposition 815 to show that the function $\sqrt{x}$ is absolutely continuous on $[0, 1]$ (see also Exercises 13.282, 13.284, and 13.285).
Fig. 13.80 The function sin x as the difference of two convex functions (Exercise 13.516)
Fig. 13.81 The functions $\sqrt{x}$ and $-\sqrt{x}$ on $[0, 1]$ (Exercise 13.517)
Hint. The function $-\sqrt{x}$ is continuous and convex on the interval $[0, 1]$ (see Fig. 13.81). Proposition 815 then proves that $-\sqrt{x}$ (and so $\sqrt{x}$) is absolutely continuous. Observe that we proved this result in Exercises 13.282 and 13.284 without mentioning convexity.

13.518 Prove that the indefinite integral of an increasing function $f$ in $L[a, b]$ is an (absolutely continuous) convex function $F$ on $[a, b]$.
Hint. The absolute continuity of $F$ is a consequence of Proposition 768. In particular, $F$ is continuous. It follows from Proposition 818 that it is enough to check midpoint convexity, i.e., that
\[
F\left(\frac{x+y}{2}\right) \le \frac{1}{2}\bigl(F(x) + F(y)\bigr) \tag{13.66}
\]
for every $x, y \in [a, b]$. Fix $x < y$ in $[a, b]$. Observe that
\[
F\left(\frac{x+y}{2}\right) = \int_a^{\frac{x+y}{2}} f = \int_a^x f + \int_x^{\frac{x+y}{2}} f, \tag{13.67}
\]
and that
\[
\frac{F(x) + F(y)}{2} = \frac{1}{2}\left(\int_a^x f + \int_a^y f\right) = \frac{1}{2}\left(2\int_a^x f + \int_x^y f\right) = \int_a^x f + \frac{1}{2}\int_x^y f. \tag{13.68}
\]
From (13.67) and (13.68) it follows that, in order to prove (13.66), it is enough to show that
\[
\int_x^{\frac{x+y}{2}} f \le \frac{1}{2}\int_x^y f = \frac{1}{2}\left(\int_x^{\frac{x+y}{2}} f + \int_{\frac{x+y}{2}}^y f\right),
\]
which in turn is equivalent to
\[
\int_x^{\frac{x+y}{2}} f \le \int_{\frac{x+y}{2}}^y f,
\]
and this holds due to the fact that $f$ is increasing on $[a, b]$.
Fig. 13.82 The series for f truncated at k = 4 (Exercise 13.519)
13.519 Let $\{r_n : n \in \mathbb{N}\}$ be an enumeration of all rational numbers in $(0, 1)$. Let the function $f$ be defined on $(0, 1)$ by
\[
f(x) = \sum_{k=1}^{\infty} \frac{|x - r_k|}{2^k}, \quad x \in (0, 1). \tag{13.69}
\]
Show that $f$ is a convex function that is differentiable exactly at irrational points of $(0, 1)$. This exercise should be related to Remarks 812.4 and 812.10. Observe that there is a real-valued function that is differentiable exactly at $\mathbb{Q}$ (a result by the Polish mathematician Z. Zahorski, see [Pira66]).
Hint. Use the fact that the expression $\frac{f(x+h) + f(x-h) - 2f(x)}{h}$ is nonnegative for all convex functions and that the summands are convex functions to prove, by using (8.7), that $f$ is not differentiable at rational points. Use the fact that for Lipschitz convex functions the expression
\[
\frac{f(x+h) + f(x-h) - 2f(x)}{h} = \frac{(f(x+h) - f(x)) - (f(x) - f(x-h))}{h}
\]
is nicely bounded and thus, by the uniform convergence of the series, this expression can be made uniformly small on a tail (not depending on $h$). Then deal with the first finitely many terms to make this expression small for small $h$. Fig. 13.82 hints at why $f$ is not differentiable at rational points. We plotted only the sum of the first four summands of the series in (13.69), taking $\mathbb{Q} \cap (0, 1) = \{1/2, 1/4, 3/4, 1/8, \ldots\}$. The reader should notice that $f$ has corners at those first four points. Observe that, due to Remark 812.4, the situation described in this exercise is “the worst” that may happen.

13.520 Let $f$ and $g$ be convex functions defined on $\mathbb{R}$ such that $f(x) \ge g(x)$ for every $x \in \mathbb{R}$. Assume that for some $x_0 \in \mathbb{R}$ we have $f(x_0) = g(x_0)$, and that the function $f$ is differentiable at $x_0$. Show that $g$ is differentiable at $x_0$.
Hint. Use (8.7) and estimate the quotient for $g$.
Remark. This simple result is in the foundation of many principles in the field called “Convex Optimization”.

13.521 Show that there is a real-valued increasing function $f$ on $[0, 1]$ such that $f$ is convex or concave on no subinterval.
Hint. Baire Category. Show that the space $M$ of increasing functions in the metric of the supremum is complete. If $I_n$ is an interval with rational endpoints, put $E_n := \{f \in M : f \text{ is either convex or concave on } I_n\}$. Show that each $E_n$ is closed in $M$ (best is to do it via the complements). Then show that each $E_n$ has empty interior, i.e., $E_n^{\circ} = \emptyset$. Since there are countably many intervals with rational endpoints, we get the result by the Baire category theorem.

13.522 Let $a, b \in \mathbb{R}$. Show that
\[
|a \cos x + b \sin x| \le \sqrt{a^2 + b^2}
\]
for all $x \in \mathbb{R}$.
Hint. Cauchy–Schwarz inequality (Proposition 829).

The next exercise introduces a “several variables” version of the concept of convexity.

13.523 We defined in Exercise 13.514 the concept of a convex set in a vector space. We say that a function $f : C \to \mathbb{R}$, where $C$ is a convex subset of a (real) vector space $E$, is (strictly) convex if it is (strictly) convex when restricted to any nondegenerate line segment in $C$.
(i) Prove that $f$ is convex if and only if $f\left(\sum_{i=1}^{n} \lambda_i x_i\right) \le \sum_{i=1}^{n} \lambda_i f(x_i)$ for all $x_1, \ldots, x_n$ in $C$, $\lambda_i \ge 0$ for $i = 1, 2, \ldots, n$, $\sum_{i=1}^{n} \lambda_i = 1$, and $n \in \mathbb{N}$, and that $f$ is strictly convex if the former inequality is strict whenever $n \ge 2$ and at least two $\lambda_i$’s are nonzero.
(ii) Prove that a convex continuous function defined on a nonempty convex compact subset $C$ of $\mathbb{R}^n$ attains its supremum at an extreme point of $C$. This is called the Bauer maximum principle (for the concept of extreme point, see Exercise 13.587).
Hint. The first part is similar to Exercise 13.514. For the second part (see Fig. 13.83), let $f : C \to \mathbb{R}$ be a continuous convex function on $C$. Prove first that a bounded-above convex function defined on an open interval $(a, b)$ in $\mathbb{R}$ that attains its supremum at some point of $(a, b)$ must be constant on $(a, b)$. As a consequence, prove that if the point $x_0 \in C$ where $f$ attains its supremum belongs to the relative interior of $C$ (i.e., the interior relative to the linear span of $C$), then $f$ must be constant on $C$, and we are done since $C$ has at least an extreme point (the Krein–Milman theorem, see Exercise 13.608 and Remark 1087). If $x_0$ is not in the relative interior, we can find a supporting hyperplane $H$ to $C$ at $x_0$. If $H \cap C = \{x_0\}$, then $x_0$ is an extreme point of $C$, and we are done again. If $H \cap C \ne \{x_0\}$, reduce the argument to $H \cap C$, where the dimension has been diminished by 1. Keep going.
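The Bauer maximum principle of Exercise 13.523(ii) can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the convex function `f`, the square $[-1, 1]^2$ playing the role of $C$, and the grid resolution `steps` are all hypothetical choices made for illustration and do not come from the text.

```python
import math

def f(x, y):
    # A convex function on the plane, built as a sum of convex pieces
    # (hypothetical example, chosen only for illustration).
    return (x - 0.3) ** 2 + (x + y) ** 2 + abs(y + 0.2)

steps = 40  # grid resolution per axis; the grid contains the four vertices
grid = [-1.0 + 2.0 * k / steps for k in range(steps + 1)]

# Maximum of f over the whole grid on the square C = [-1, 1]^2.
grid_max, grid_argmax = max((f(x, y), (x, y)) for x in grid for y in grid)

# Maximum of f over the extreme points of C, i.e., its four vertices.
vertices = [(sx, sy) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0)]
vertex_max, vertex_argmax = max((f(x, y), (x, y)) for x, y in vertices)

print(grid_argmax, grid_max)
print(vertex_argmax, vertex_max)
```

In this sketch the grid maximum is found at a vertex of the square, in agreement with the principle; a grid search is of course not a proof, only a sanity check.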
13.6 Fourier Series

13.524 Prove that the exponential system $\{e^{inx}/\sqrt{2\pi}\}_{n\in\mathbb{Z}}$ in $[0, 2\pi]$ is orthonormal for the scalar product $\langle f, g \rangle := \int_{[0,2\pi]} f \overline{g}$ (see Eq. (9.13)).
Fig. 13.83 A (i) strictly convex (ii) convex, continuous function on a convex compact set attains its maximum at an extreme point (Exercise 13.523)
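Hint. Put $e_n := e^{inx}/\sqrt{2\pi}$ for $n \in \mathbb{Z}$. Then, for $n, m \in \mathbb{Z}$,
\[
\langle e_n, e_n \rangle = \frac{1}{2\pi} \int_{[0,2\pi]} e^{inx}\, \overline{e^{inx}}\, dx = \frac{1}{2\pi} \int_{[0,2\pi]} e^{inx} e^{-inx}\, dx = \frac{1}{2\pi} \int_{[0,2\pi]} 1\, dx = 1,
\]
and, if $n \ne m$, then
\[
\langle e_n, e_m \rangle = \frac{1}{2\pi} \int_{[0,2\pi]} e^{inx}\, \overline{e^{imx}}\, dx = \frac{1}{2\pi} \int_{[0,2\pi]} e^{i(n-m)x}\, dx = 0,
\]
due to the fact that $e^{i(n-m)x}$ has a $2\pi$-periodic primitive function.

13.525 Let $f$ be a $2\pi$-periodic Lebesgue integrable function on $\mathbb{R}$. Show that
\[
\int_{[x,x+2\pi]} f = \int_{[0,2\pi]} f
\]
for every $x \in \mathbb{R}$.
Hint. Say that $x \in (0, 2\pi)$. Then, by a simple change of variable,
\[
\int_{[x,x+2\pi]} f = \int_{[x,2\pi]} f + \int_{[2\pi,2\pi+x]} f = \int_{[x,2\pi]} f + \int_{[0,x]} f = \int_{[0,2\pi]} f.
\]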
13.526 Expand the function $f(x) = \sin^4 x$ into its Fourier series.
Hint.
\[
\sin^4 x = (\sin^2 x)^2 = \left(\frac{1 - \cos 2x}{2}\right)^2 = \frac{1}{4}\left(1 - 2\cos 2x + \cos^2 2x\right) = \frac{1}{4}\left(1 - 2\cos 2x + \frac{1 + \cos 4x}{2}\right) = \frac{3}{8} - \frac{1}{2}\cos 2x + \frac{1}{8}\cos 4x.
\]

13.527 Expand the function $f(x) = x$ on $(-\pi, \pi)$ into its Fourier series and discuss its convergence.
Hint. Direct calculation gives
\[
f(x) \sim 2 \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\sin nx}{n}.
\]
See Fig. 13.84 for the graphs of $f$ and $S_4$ (i.e., the sum of the first four terms of its Fourier series). Regarding convergence, the function $f$ is odd, $2\pi$-periodic and
Fig. 13.84 The functions f and S4 in Exercise 13.527 on [−π, π]
Fig. 13.85 The functions f and S3 in Exercise 13.528 on [0, 2π]
continuous at every $x \ne (2n + 1)\pi$, for $n \in \mathbb{Z}$. Observe that it is a Lipschitz function on $[-\pi, \pi]$. According to Corollary 848, its Fourier series converges pointwise (alternatively, note that $f$ has finite one-sided derivatives at each $x \in \mathbb{R}$, so we could use Corollary 849 instead). At $x \ne (2n + 1)\pi$, $n \in \mathbb{Z}$, the Fourier series converges to $f(x)$, while at the remaining points $x = (2n + 1)\pi$, $n \in \mathbb{Z}$, it converges to $(1/2)(f(x+) + f(x-)) = 0$. Note that, in particular, $\sum_{n=1}^{\infty} (-1)^{n+1} \sin n / n = 1/2$. See Exercise 13.126 for another approach to the convergence of this numerical series. The $2\pi$-periodic extension of $f$ is not continuous, hence its Fourier series does not converge uniformly. On the other hand, $f \in L_2[-\pi, \pi]$, so its Fourier series converges to $f$ in the norm $\|\cdot\|_2$.

13.528 Let the function $f$ be defined by $f(x) = x^2$ for $0 \le x \le \pi$, $f(\pi + x) = f(\pi - x)$ for $0 \le x \le \pi$, and let $f$ be $2\pi$-periodic. Discuss the pointwise convergence of its Fourier series and compute its Fourier coefficients. Then, compute $f(\pi)$ in order to get the identity (9.43), i.e., $\frac{\pi^2}{6} = \sum_{k=1}^{\infty} \frac{1}{k^2}$.
Hint. The graphs of $f$ and $S_3$ (the sum of the first three terms of its Fourier series) are depicted in Fig. 13.85. The function $f$ is Lipschitz on $\mathbb{R}$ and we can use Corollary 848 to ensure the pointwise convergence of its Fourier series to $f$. The Fourier coefficients of $f$ can be computed directly (use integration by parts; note that $b_k = 0$ due to the fact that $f$ is an even function, see Remark 835), getting $a_0 = \frac{2\pi^2}{3}$ and, for $k \ge 1$, $a_k = (-1)^k \frac{4}{k^2}$. So,
\[
f(x) = \frac{\pi^2}{3} + 4 \sum_{k=1}^{\infty} \frac{(-1)^k}{k^2} \cos kx
\]
Fig. 13.86 The functions f and the first partial sums of its Fourier series (Exercise 13.529) on [−3π, 3π]
for every $x \in \mathbb{R}$ (this time we have equality, since $f$ is a continuous function). Thus, in particular, for $x = \pi$ we get
\[
\pi^2 = \frac{\pi^2}{3} + 4 \sum_{k=1}^{\infty} \frac{(-1)^k (-1)^k}{k^2} = \frac{\pi^2}{3} + 4 \sum_{k=1}^{\infty} \frac{1}{k^2}.
\]
This gives identity (9.43). This expression was computed in Example 850 by using a different function.

13.529 Expand the $2\pi$-periodic extension $\tilde{f}$ of the function defined on $[-\pi, \pi]$ as $f(x) := x^2$ into its Fourier series. As an application, find $\sum_{n=1}^{\infty} (-1)^{n-1} (1/n^2)$.
Hint. The function $f$ is even (see Fig. 13.86), so it has a Fourier series expansion of the form $\frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos nx$, where $a_n = \frac{2}{\pi} \int_0^{\pi} x^2 \cos nx\, dx$ for $n = 0, 1, 2, \ldots$. Using integration by parts we readily show
\[
x^2 \sim \frac{\pi^2}{3} + \sum_{n=1}^{\infty} \frac{4(-1)^n}{n^2} \cos nx. \tag{13.70}
\]
Since $\tilde{f}$ is clearly Lipschitz, we can use Theorem 861 to conclude that its Fourier series (13.70) converges uniformly to $\tilde{f}$ on $\mathbb{R}$. In particular, it converges to $\tilde{f}(0)$ $(= 0)$ at $x = 0$, and this gives $\sum_{n=1}^{\infty} (-1)^{n-1} (1/n^2) = \pi^2/12$.

13.530 Does there exist a continuous periodic function $f$ whose Fourier series converges at some point $x$ to a value that is different from $f(x)$?
Hint. No, Fejér’s Theorem 859.

13.531 Let the $2\pi$-periodic function $f$ be defined by
\[
f(x) = \begin{cases} \dfrac{\pi - x}{2}, & \text{for } 0 < x < 2\pi, \\[4pt] 0, & \text{for } x = 0. \end{cases}
\]
Show that $f \sim \sum_{n=1}^{\infty} \frac{\sin nx}{n}$, discuss the convergence of this series, and show that it converges uniformly on $[\delta, 2\pi - \delta]$ for every $\delta \in (0, \pi)$.
Hint. The function $f$ is odd, so it has a Fourier series of sines. Its $2\pi$-periodic extension is continuous at every $x \ne 2n\pi$, $n \in \mathbb{Z}$. The function $f$ is Lipschitz, so its Fourier series converges pointwise on $\mathbb{R}$ (see Corollary 848), and the limit is $f(x)$ at $x \ne 2n\pi$, and $0$ at any other point. In particular, for $x = 1$ we have $\sum_{n=1}^{\infty} (\sin n)/n = f(1) = (\pi - 1)/2$. For a plot of the function and of a partial sum of its Fourier series see Fig. 13.87.
Fig. 13.87 The functions f and S4 in Exercise 13.531
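The numerical series obtained in Exercises 13.527, 13.528, 13.529, and 13.531 can be checked against partial sums. The sketch below is only an illustration: the cutoff `N` and the variable names are arbitrary choices made here, and the slowly (conditionally) convergent sine series are reproduced only to a few decimal places.

```python
import math

N = 200_000  # number of terms; an arbitrary cutoff chosen for this illustration

# Exercise 13.531 at x = 1: sum of sin(n)/n, expected (pi - 1)/2.
sine_series = math.fsum(math.sin(n) / n for n in range(1, N + 1))

# Exercise 13.527 at x = 1: sum of (-1)^(n+1) sin(n)/n, expected 1/2.
alternating_sine = math.fsum((-1) ** (n + 1) * math.sin(n) / n for n in range(1, N + 1))

# Exercise 13.529: sum of (-1)^(n-1)/n^2, expected pi^2/12.
alternating_squares = math.fsum((-1) ** (n - 1) / n**2 for n in range(1, N + 1))

# Exercise 13.528: sum of 1/n^2, expected pi^2/6.
basel = math.fsum(1.0 / n**2 for n in range(1, N + 1))

print(sine_series, (math.pi - 1) / 2)
print(alternating_sine, 0.5)
print(alternating_squares, math.pi**2 / 12)
print(basel, math.pi**2 / 6)
```

Each printed pair should agree to roughly four decimal places with this cutoff; a larger `N` tightens the agreement at the cost of running time.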
13.532 Find $\sum_{n=1}^{\infty} \frac{\sin^2 n}{n^2}$.
Hint. Use Parseval identity (11.49) for the function defined on $[-\pi, \pi]$ by
\[
f(x) = \begin{cases} 1, & \text{if } -1 \le x \le 1, \\ 0, & \text{otherwise.} \end{cases}
\]
We get the result $(\pi - 1)/2$.

13.533 For $n \in \mathbb{N} \cup \{0\}$ let $D_n$ be the Dirichlet kernel (Definition 840). Show that the so-called Lebesgue constants $L_n := \|D_n\|_1$ satisfy $L_n = (4/\pi^2) \ln n + O(1)$.
Hint (see, e.g., [Katz76], p. 50).
\[
L_n = \frac{1}{\pi} \int_0^{\pi} \left| \frac{\sin (n + 1/2)t}{\sin (t/2)} \right| dt = \frac{2}{\pi} \sum_{j=1}^{n-1} \int_{\frac{j\pi}{n+1/2}}^{\frac{(j+1)\pi}{n+1/2}} \frac{|\sin (n + 1/2)t|}{t}\, dt + O(1).
\]
Note that
\[
\int_{\frac{j\pi}{n+1/2}}^{\frac{(j+1)\pi}{n+1/2}} |\sin (n + 1/2)t|\, dt = \frac{2}{n + 1/2}.
\]

13.7 Basics on Descriptive Statistics
13.534 In the text, we mentioned the “experiment” of throwing a dart onto a circular dartboard. Assume that the dartboard has radius 1, and that the dart always impacts the dartboard at a random point. Find the distribution function, its density, mean and variance, and the probability of impact on an annulus $\{(x, y) : \|(x, y)\|_2 \in [a, b]\}$, for $0 \le a \le b \le 1$.
Hint. The random variable $X$ (the distance from the point of impact to the center) is defined on the closed unit ball $B := B_{(\mathbb{R}^2, \|\cdot\|_2)}$ of the two-dimensional Euclidean space, and its range is $[0, 1]$. The distribution function $F(r) := P\{x \in B : X(x) \le r\}$ is $F(r) = \pi r^2/\pi = r^2$ for $r \in [0, 1]$, and so $f(r) = 2r$ for $r \in (0, 1)$. The mean is $\mu := E(X) = \int_0^1 r f(r)\, dr = \int_0^1 2r^2\, dr = 2/3$, and the variance is $V(X) = \int_0^1 (r - 2/3)^2 f(r)\, dr = 1/18$. The probability of impact on an annulus as mentioned is $F(b) - F(a) = b^2 - a^2$.
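The values found in Exercise 13.534 can be compared with a simulation of the experiment. The following is a minimal sketch, assuming that “at random” means uniformly distributed over the disc; the function name `simulate`, the sample size, the seed, and the annulus radii $a = 0.3$, $b = 0.8$ are hypothetical choices made for illustration.

```python
import math
import random

def simulate(n_darts=200_000, a=0.3, b=0.8, seed=0):
    """Throw n_darts uniformly on the unit disc (rejection sampling from the
    square [-1, 1]^2) and return the sample mean and variance of the distance
    to the center, plus the fraction of darts landing in the annulus
    a <= distance <= b. (Hypothetical helper, not from the text.)"""
    rng = random.Random(seed)
    radii = []
    while len(radii) < n_darts:
        u, v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        r2 = u * u + v * v
        if r2 <= 1.0:  # keep only points inside the dartboard
            radii.append(math.sqrt(r2))
    mean = sum(radii) / n_darts
    var = sum((r - mean) ** 2 for r in radii) / n_darts
    annulus = sum(1 for r in radii if a <= r <= b) / n_darts
    return mean, var, annulus

mean, var, annulus = simulate()
print(mean, 2 / 3)                 # sample mean vs. E(X) = 2/3
print(var, 1 / 18)                 # sample variance vs. V(X) = 1/18
print(annulus, 0.8**2 - 0.3**2)    # empirical frequency vs. b^2 - a^2
```

The simulated values fluctuate around the exact ones; with this sample size the agreement is typically to two or three decimal places.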
13.535 Prove that the function $F$ in (10.27) satisfies all requirements of a distribution function in Definition 890, and that the random variable $X$ of the normal distribution, whose density function $f$ is given by formula (10.26), has mean $\mu$ and variance $\sigma^2$.
Hint. Note first that the integral $\int_0^{+\infty} e^{-x^2}\, dx$ exists as an improper Riemann integral, that its value is $\sqrt{\pi}/2$ (see Exercise 13.493), and that the function $x \mapsto e^{-x^2}$ defined on $\mathbb{R}$ is positive and even. This shows that $F$ is increasing, that $\lim_{x\to-\infty} F(x) = 0$, and that $\lim_{x\to+\infty} F(x) = 1$. Moreover,
\[
E(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x e^{-(x-\mu)^2/(2\sigma^2)}\, dx
= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x - \mu) e^{-(x-\mu)^2/(2\sigma^2)}\, dx + \mu \int_{-\infty}^{\infty} f(x)\, dx.
\]
The first integral in the last expression vanishes, since its integrand is an odd function of $x - \mu$. That the second integral there is $1$ follows from (13.61) in Exercise 13.493. Again from (13.61), $\int_{-\infty}^{\infty} e^{-(x-\mu)^2/(2\sigma^2)}\, dx = \sigma \sqrt{2\pi}$. Compute the derivative with respect to $\sigma$ at both sides, to get
\[
\int_{-\infty}^{\infty} \frac{(x - \mu)^2}{\sigma^3} e^{-(x-\mu)^2/(2\sigma^2)}\, dx = \sqrt{2\pi},
\]
and so
\[
V(X) := \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x - \mu)^2 e^{-(x-\mu)^2/(2\sigma^2)}\, dx = \sigma^2.
\]

13.8 Excursion to Functional Analysis
13.536 Is it true that the adjoint of a linear isometry into is an isometry?
Hint. No. See the shifts in $\ell_2$ (Example 1010 and Exercise 13.614). Note that, after identifying $\ell_2^*$ and $\ell_2$ (see Theorem 993 and Remark 996), the adjoint of the right shift operator is the left shift one.
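The behaviour of the two shifts mentioned in the hint to Exercise 13.536 can be checked on finitely supported sequences. The sketch below is only an illustration under stated assumptions: sequences are represented as Python lists (trailing zeros omitted), and the helper names `right_shift`, `left_shift`, `dot`, and `norm2` are hypothetical.

```python
import math
from itertools import zip_longest

def right_shift(x):
    # (x1, x2, ...) -> (0, x1, x2, ...)
    return [0.0] + list(x)

def left_shift(x):
    # (x1, x2, x3, ...) -> (x2, x3, ...)
    return list(x[1:])

def dot(x, y):
    # Scalar product of two finitely supported real sequences;
    # the shorter list is padded with zeros.
    return sum(a * b for a, b in zip_longest(x, y, fillvalue=0.0))

def norm2(x):
    return math.sqrt(dot(x, x))

x = [3.0, -1.0, 2.0]
y = [1.0, 4.0, 0.5, -2.0]
e1 = [1.0]  # first unit vector

print(norm2(right_shift(x)), norm2(x))            # equal: the right shift is an isometry
print(dot(right_shift(x), y), dot(x, left_shift(y)))  # equal: left shift acts as the adjoint
print(norm2(left_shift(e1)), norm2(e1))           # 0.0 vs 1.0: the left shift is not an isometry
```

The left shift sends the first unit vector to zero, so it cannot preserve norms, although it is the adjoint of an isometry.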
13.8.1 Banach Spaces

Basics

13.537 Let $S$ be a subset of a Banach space $X$. Show that $\overline{\operatorname{conv}}(S)$, i.e., the closure of the convex hull of $S$, is the smallest (in the sense of inclusion) of all convex and closed subsets of $X$ that contain $S$. Show, too, that $\overline{\operatorname{conv}}(S)$ is the intersection of all closed and convex subsets of $X$ that contain $S$.
Hint. Prove that the closure of a convex set is a convex set. Use then Exercise 13.514.
13.538 Let $X$ be a Banach space, $B_X$ be the closed unit ball of $X$ and $S_X$ be the unit sphere of $X$.
(i) Show in detail that the interior $B_X^{\circ}$ of $B_X$ is the open unit ball $B(0; 1)$ of $X$, i.e., $B_X^{\circ} = B(0, 1) := \{x \in X : \|x\| < 1\}$.
(ii) If $x_0 \in B_X^{\circ}$, $v_0 \in S_X$, and $r$ is the ray from $x_0$ in the direction of $v_0$, i.e., $r := \{x_0 + t v_0 : t \ge 0\}$, show that $r \cap S_X$ is a singleton.
(iii) If $x_0 \in S_X$, show that $\operatorname{dist}(2x_0, B_X) = 1$.
Hint. (i) and (iii): the triangle inequality. (ii): For existence, use the Intermediate Value Theorem 339. For uniqueness: Use a convexity argument to prove that if a convex function of one variable has the same value at three different points, it must be constant on the segment between them.

13.539 Let a real-valued function $f$ be defined on $c_0$ by $f(x) = \sum_{n=1}^{\infty} x_n^n$, where $x = (x_n)$. Show that $f$ is a continuous function on $(c_0, \|\cdot\|_\infty)$. Is $f$ uniformly continuous on the unit ball of $(c_0, \|\cdot\|_\infty)$?
Hint. Note first that $f$ is well defined, since for every $x \in c_0$ there exists $n_0 \in \mathbb{N}$ such that $|x_n| < 1/2$ for all $n \ge n_0$, and the series $\sum (1/2^n)$ is convergent. Let us prove that $f$ is continuous at $x \in c_0$. Given $\varepsilon \in (0, 1/4)$, there is $n_0 \in \mathbb{N}$ such that $|x_n| < \varepsilon$ $(< 1/2)$ for each $n > n_0$ and $\sum_{n=n_0+1}^{\infty} (1/2^n) < \varepsilon$. Take $\delta \in (0, 1/4)$ small enough so that $\delta \cdot n_0^2 \cdot \max\{(\|x\|_\infty + \delta)^{n-1} : n = 1, 2, \ldots, n_0\} < \varepsilon$, and let $y \in B(x, \delta)$, so $\|y\|_\infty < \|x\|_\infty + \delta$ and $|y_n| < \varepsilon + \delta$ $(< 1/2)$ for each $n > n_0$. Observe that, given two real numbers $a$ and $b$, we have $a^n - b^n = (a - b)(a^{n-1} + a^{n-2} b + \ldots + a b^{n-2} + b^{n-1})$. Then, for $n = 1, 2, \ldots, n_0$ we have
\[
|x_n^n - y_n^n| = |x_n - y_n| \cdot |x_n^{n-1} + x_n^{n-2} y_n + \ldots + x_n y_n^{n-2} + y_n^{n-1}| < \delta \cdot n_0 \cdot \max\{(\|x\|_\infty + \delta)^{n-1} : n = 1, 2, \ldots, n_0\} < \frac{\varepsilon}{n_0}.
\]
We get
\[
|f(x) - f(y)| = \left| \sum_{n=1}^{\infty} (x_n^n - y_n^n) \right| \le \sum_{n=1}^{n_0} |x_n^n - y_n^n| + \sum_{n=n_0+1}^{\infty} |x_n|^n + \sum_{n=n_0+1}^{\infty} |y_n|^n \le \varepsilon + \sum_{n=n_0+1}^{\infty} \frac{1}{2^n} + \sum_{n=n_0+1}^{\infty} \frac{1}{2^n} < \varepsilon + \varepsilon + \varepsilon = 3\varepsilon.
\]
This shows the continuity of $f$ at $x$. Fix $\delta > 0$ and find $n \in \mathbb{N}$ such that $(1 - \delta)/\delta < n - 1$. For this $n$, put $x_n := (1, \overset{(n)}{\ldots}, 1, 0, 0, \ldots)$, and observe that $f(x_n) = n$. Note, too, that $f((1 - \delta)x_n) < (1 - \delta)/\delta$ $(< n - 1)$. This proves that $f(x_n) - f((1 - \delta)x_n) > 1$. Since $\delta > 0$ was
Fig. 13.88 Functions |t 2 − at| for some a’s (Exercise 13.540)
arbitrary, and $\|x_n - (1 - \delta)x_n\|_\infty = \delta$, the function $f$ is not uniformly continuous on $B_{c_0}$.

13.540 Calculate the distance from the function $f(t) = t^2$ to the line spanned by the function $g(t) = t$ in the space $(C[0, 1], \|\cdot\|_\infty)$.
Hint. We need to calculate $\min_{a\in\mathbb{R}} \{\max_{t\in[0,1]} |t^2 - at|\}$. The result is $(\sqrt{2} - 1)^2$. Fig. 13.88 depicts some of the functions $|t^2 - at|$ on $[0, 1]$, and also the graph of the function giving the minimum (a thick line).

13.541 Let $\{x_n\}$ be a sequence of real numbers with $\lim x_n = 1$. Put $x = (x_n) \in \ell_\infty$. Calculate the distance from $x$ to $c_0$ $(\subset \ell_\infty)$ in the space $(\ell_\infty, \|\cdot\|_\infty)$.
Hint. We calculate $\inf\{\|x - y\|_\infty : y \in c_0\}$. The result is $1$.

Equivalent Norms

The concept of equivalent norms was introduced after Definition 896.

13.542 Prove that, in $\mathbb{R}^n$,
\[
\frac{1}{\sqrt{n}} \|\cdot\|_1 \le \|\cdot\|_2 \le \|\cdot\|_1,
\]
where $\|\cdot\|_1$ and $\|\cdot\|_2$ were defined in Examples 897.3 and 963.1. As an application, show that $\sin x + \cos x \le \sqrt{2}$ (see also Exercise 13.252).
Hint. See Fig. 13.89 for a two-dimensional picture. Note that $(1/\sqrt{n}) B_{\ell_2^n} \subset B_{\ell_1^n} \subset B_{\ell_2^n}$ (compute the distance in $(\mathbb{R}^n, \|\cdot\|_2)$ from $0$ to the hyperplane $\sum_{i=1}^{n} x_i = 1$) and use formulas (11.4) and (11.5). For the second part, observe that the point $(\sin x, \cos x)$ belongs to $B_{\ell_2^2}$.

13.543 Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two norms on a Banach space $X$. Assume that one of the two following conditions holds: (i) The two norms define the same family of convergent sequences in $X$. (ii) The two norms define the same family of bounded subsets of $X$.
Fig. 13.89 The inclusion of the balls $B_1 := B_{\ell_1^2}$ and $B_2 := B_{\ell_2^2}$ in Exercise 13.542
Prove that the two norms are equivalent (the converse is obviously true: If the two norms are equivalent, then both (i) and (ii) hold).
Hint. Indeed, if the two norms are equivalent and $\{x_n\}$ is a sequence in $X$ that $\|\cdot\|_1$-converges to some $x \in X$, then $\|x - x_n\|_1 \to 0$. In view of (11.3), we have, too, $\|x - x_n\|_2 \to 0$, and so $\{x_n\}$ is $\|\cdot\|_2$-convergent to $x$. Assume now that the two norms are not equivalent. By Remark 898.1, for any $n \in \mathbb{N}$ there exists $x_n \in X$ such that (11.5) for $c := 1/n$ does not hold. Either $(1/n_k)\|x_{n_k}\|_1 > \|x_{n_k}\|_2$ or $\|x_{n_k}\|_2 > n_k \|x_{n_k}\|_1$ for some subsequence $\{x_{n_k}\}$ of $\{x_n\}$. In the first case,
\[
\left\| \frac{x_{n_k}}{\sqrt{n_k}\, \|x_{n_k}\|_2} \right\|_1 > \sqrt{n_k}
\]
for all $k$, so the sequence $\{y_k\}_{k=1}^{\infty}$, where $y_k := (n_k)^{-1/2} \|x_{n_k}\|_2^{-1} x_{n_k}$ for all $k$, is $\|\cdot\|_2$-null but not $\|\cdot\|_1$-null. In the second case, we proceed similarly. Note that the construction gives a sequence that is bounded in one of the norms and unbounded in the other.

13.544 Show in detail that if $(X, \|\cdot\|)$ is a Banach space and $|\cdot|$ is an equivalent norm on $X$, then $(X, |\cdot|)$ is a Banach space.
Hint. If $\{x_n\}$ is a $|\cdot|$-Cauchy sequence, it is also $\|\cdot\|$-Cauchy. Then it is $\|\cdot\|$-convergent, hence $|\cdot|$-convergent (see Exercise 13.543).

13.545 Define a norm $\|\cdot\|_\infty$ on $\ell_1$ by $\|x\|_\infty := \sup_i |x_i|$, whenever $x = (x_i) \in \ell_1$. Show that $\|\cdot\|_\infty$ is not equivalent to the norm $\|\cdot\|_1$ on $\ell_1$.
Hint. Check the points $(1, 1, \overset{(n)}{\ldots}, 1, 0, 0, \ldots)$ for these two norms.

13.546 Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two equivalent norms on a Banach space $X$. Show that both $\|\cdot\|_1 + \|\cdot\|_2$ and $\sqrt{\|\cdot\|_1^2 + \|\cdot\|_2^2}$ are equivalent norms on $X$.
Hint. The Hölder inequality (8.22) after using the triangle inequality.

13.547 Let the norm $\|\cdot\|_D$ be defined on $\mathbb{R}^2$ for $x = (x_1, x_2)$ by
\[
\|x\|_D^2 = \sup \left\{ \sum_{i=1}^{2} \frac{x_{\gamma_i}^2}{4^i} \right\},
\]
where the supremum is taken over all ordered pairs $(\gamma_1, \gamma_2)$ of distinct elements of $\{1, 2\}$.
Fig. 13.90 The two ellipses defining Day’s norm in R2 (Exercise 13.547)
Show that $\|\cdot\|_D$ is indeed a norm and that
\[
\|x\|_D^2 = \frac{x_{\gamma_1}^2}{4} + \frac{x_{\gamma_2}^2}{4^2},
\]
where $\gamma_1$ and $\gamma_2$ are such that $|x_{\gamma_1}| \ge |x_{\gamma_2}|$. Show that the unit ball of the norm $\|\cdot\|_D$ is the intersection of two ellipses and draw a picture of it (see Fig. 13.90). The norm $\|\cdot\|_D$ is a special case of the famous Day’s norm, important in the geometry of Banach spaces (see, e.g., [Di75], § 4.1).
Hint. First show that if $|x_1| \ge |x_2|$, then $\|(x_1, x_2)\|_D^2 = \frac{x_1^2}{4} + \frac{x_2^2}{4^2}$. This will hint at these ellipses. Then finish the proof in the standard way.
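The two descriptions of Day’s norm in Exercise 13.547 can be compared numerically. This is a minimal sketch, assuming the definition $\|x\|_D^2 = \sup \sum_{i=1}^{2} x_{\gamma_i}^2 / 4^i$ over the two orderings of $\{1, 2\}$; the function names and the test grid are hypothetical choices made for illustration.

```python
import math
from itertools import permutations

def day_norm_sq(x):
    # Square of Day's norm on R^2: maximum over the two orderings
    # (gamma_1, gamma_2) of the sum of x_{gamma_i}^2 / 4^i, i = 1, 2.
    return max(
        sum(x[g] ** 2 / 4**i for i, g in enumerate(perm, start=1))
        for perm in permutations(range(2))
    )

def day_norm_sq_sorted(x):
    # Closed formula: the coordinate of larger modulus gets the weight 1/4,
    # the other one the weight 1/16.
    big, small = sorted((abs(x[0]), abs(x[1])), reverse=True)
    return big**2 / 4 + small**2 / 16

def in_both_ellipses(x):
    # Intersection of the ellipses x1^2/4 + x2^2/16 <= 1 and x2^2/4 + x1^2/16 <= 1.
    return x[0] ** 2 / 4 + x[1] ** 2 / 16 <= 1 and x[1] ** 2 / 4 + x[0] ** 2 / 16 <= 1

# Sample points on a grid covering [-3, 3]^2.
points = [(-3 + 0.25 * i, -3 + 0.25 * j) for i in range(25) for j in range(25)]

formulas_agree = all(
    math.isclose(day_norm_sq(p), day_norm_sq_sorted(p), abs_tol=1e-12) for p in points
)
ball_is_intersection = all((day_norm_sq(p) <= 1) == in_both_ellipses(p) for p in points)
print(formulas_agree, ball_is_intersection)
```

On the sample grid both checks succeed, which is consistent with the unit ball being the intersection of the two ellipses of Fig. 13.90.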
13.8.2 Operators

Linear Functionals

13.548 Let $X$ be a Banach space, and let $f : X \to \mathbb{R}$ be a linear functional. Prove that the following statements are equivalent: (i) $f$ is continuous. (ii) $f$ is continuous at $0$. (iii) $f^{-1}(0)$ is closed in $X$. (iv) $f^{-1}(0)$ is not dense in $X$.
Hint. (i)⇒(ii) is obvious (observe that the equivalence between (i) and (ii) was proved for general operators between normed spaces in Theorem 900). To prove (ii)⇒(iii), take a sequence $\{x_n\}$ in $f^{-1}(0)$ that converges to $x_0$ and observe that $x_n - x_0 \to 0$. (iii)⇒(iv) is again obvious. To prove (iv)⇒(i), find a ball $B(x, r)$ that does not intersect $f^{-1}(0)$. By convexity, $B(x, r)$ must lie at one side of $f^{-1}(0)$, i.e., in $\{x \in X : f(x) < 0\}$ or $\{x \in X : f(x) > 0\}$. In both cases, $f$ is one-sided bounded on a neighborhood of some point. By translation, it is one-sided bounded on a neighborhood of $0$, and by symmetry it is bounded there. Use finally Proposition 900.

13.549 If $f$ is a discontinuous linear functional on a Banach space $X$, put $C_1 := \{x \in X : f(x) \ge 0\}$ and $C_2 := \{x \in X : f(x) < 0\}$. Show that $C_1$ and $C_2$ are convex disjoint sets that are both dense in $X$ and $C_1 \cup C_2 = X$.
Hint. Obviously, $C_1$ and $C_2$ are convex, disjoint, and together fill $X$. In order to prove density use Exercise 13.548. Precisely, $C_1$ contains $f^{-1}(0)$. This last set is
Fig. 13.91 Computing the distance to a hyperplane (Exercise 13.551)
already dense, so $C_1$ is dense in $X$. Fix $e \in X$ such that $f(e) = 1$. Choose any $x \in X$, and consider the vector $x + e$. By the density of $f^{-1}(0)$ we may find a sequence $\{x_n\}_{n=1}^{\infty}$ in $f^{-1}(0)$ such that $x_n \to x + e$. The sequence $\{y_n := x_n - e\}_{n=1}^{\infty}$ converges to $x$, and $y_n \in f^{-1}(-1)$. This proves that $f^{-1}(-1)$ is dense in $X$. Since $C_2$ contains this last set, it follows that $C_2$ is dense.

13.550 Let the linear functional $f$ on $(c_{00}, \|\cdot\|_\infty)$ (see Example 897.2) be defined by $f(x) = \sum x_i$. Show that $f$ is not continuous.
Hint. Check the points $(1, 1, \overset{(n)}{\ldots}, 1, 0, 0, \ldots)$. For another example of a discontinuous linear functional on every infinite dimensional normed space, see Remark 918.11.1.3.

13.551 Let $X$ be a normed space. Let $f$ be a nonzero continuous linear functional on $X$, and let $N := \{x \in X : f(x) = 0\}$ be its kernel. Show that given $x \in X$, $\operatorname{dist}(x, N) = |f(x)|/\|f\|$. In particular, $f$ attains its norm on $B_X$ if and only if there exists an element $x \in X \setminus N$ that has a closest element in $N$, if and only if every element in $X$ has a closest element in $N$.
Hint. It is enough to show the statement for $f \in S_{X^*}$ and, by translation, to prove that $\operatorname{dist}(0, H) = 1$, where $H := \{x \in X : f(x) = 1\}$. Given $x \in H$, we have $1 = f(x) \le \|x\|$, hence $\operatorname{dist}(0, H) \ge 1$. On the other hand, given $\varepsilon > 0$ find $x \in S_X$ such that $f(x) > 1 - \varepsilon$ (see Fig. 13.91). The element $x/f(x)$ belongs to $H$, and $\|x/f(x)\| = 1/f(x) < (1 - \varepsilon)^{-1}$. Since $\varepsilon > 0$ was arbitrary, this shows that $\operatorname{dist}(0, H) \le 1$, and the result follows. The remaining part follows from it.

13.552 Let $X$ be a Banach space and $\{f_n\}$ be a sequence in $B_{X^*}$ that converges to $0$ at every point of some set $D \subset X$, where $D$ is dense in $X$. Does $\{f_n\}$ converge to $0$ at each point of $X$? Is the assumption on boundedness of the sequence needed?
Hint. The answer to the first question is “yes”. Indeed, estimate $f_n(x)$ by using close points in $D$. For the second part, the answer is also “yes”. Use the space $c_0$ (see Example 565.18). Consider the sequence $\{f_n\}_{n=1}^{\infty}$ of continuous linear functionals on $c_0$ given by $f_n(x) = n x_n$, $n \in \mathbb{N}$, for $x = (x_n) \in c_0$, and take as $D$ the subspace $c_{00}$ of $c_0$ consisting of its finitely supported vectors.

Operators

13.553 (i) Compute the norm of an operator $T : (\mathbb{R}^n, \|\cdot\|_\infty) \to (\mathbb{R}^m, \|\cdot\|_\infty)$ in terms of its associated matrix in the canonical bases. (ii) Do the same for $T : (\mathbb{R}^n, \|\cdot\|_1) \to (\mathbb{R}^m, \|\cdot\|_1)$.
13.8 Excursion to Functional Analysis
Fig. 13.92 Decomposing T as T3 T2 T1 (Exercise 13.555)
Hint. (i) The mapping $x \mapsto \|Tx\|_\infty$ from $\mathbb{R}^n$ into $\mathbb{R}$ is convex and continuous on the compact convex set $B := B_{(\mathbb{R}^n, \|\cdot\|_\infty)}$, hence it attains its maximum (i.e., the norm $\|T\|$) at an extreme point of $B$ (see Exercise 13.523). Clearly, a point $x \in B$ is extreme precisely when $x = (\pm 1, \pm 1, \ldots, \pm 1) \in \mathbb{R}^n$ (see also Exercise 13.587), so it is enough to test $T$ on those points. This amounts to computing $\max_{i=1,\ldots,m} \sum_{j=1}^{n} |a_{ij}|$, and this is then the value of $\|T\|$. (ii) Now, if $B := B_{(\mathbb{R}^n, \|\cdot\|_1)}$, a point $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ is an extreme point of $B$ precisely when all its coordinates vanish but one, and this one is $\pm 1$ (see again Exercise 13.587). It follows that $\|T\| = \max_{j=1,\ldots,n} \sum_{i=1}^{m} |a_{ij}|$. Observe that computing the norm $\|T\|$ of $T : (\mathbb{R}^n, \|\cdot\|_2) \to (\mathbb{R}^m, \|\cdot\|_2)$ becomes more complicated, since $\operatorname{Ext}(B_{(\mathbb{R}^n, \|\cdot\|_2)}) = S_{(\mathbb{R}^n, \|\cdot\|_2)}$, and so we must check $T$ on all points of the Euclidean sphere. Computing $\|T\|$ then requires Lagrange multipliers (see Example 944 and Exercise 13.554 for the use of this technique, and for references see, e.g., [Edw95]).

13.554 Let the operator $T$ from $\mathbb{R}^2$ into $\mathbb{R}^2$ be defined by (i) $T(x, y) = (x + 2y, 2x + y)$. (ii) $T(x, y) = (x + 4y, x + y)$. (iii) $T(x, y) = (x + y, x + y)$. Find, in each case, the matrix of $T$, the eigenvalues and eigenvectors of $T$, the norm of $T$ in the Euclidean norm of $\mathbb{R}^2$, and the adjoint operator of $T$.
Hint. Let us work out (i) in detail. The rest are similar. The columns of the matrix $M$ associated to $T$ in a given basis are the images of the vectors of the basis. In this case, and choosing the canonical basis,
$$M = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.$$
Since we are in a finite-dimensional setting, the spectrum $\sigma(T)$ of the operator $T$ coincides with the set of eigenvalues, i.e., the set of roots of the polynomial $\det(\lambda I - M)$, where $I$ is here the identity matrix. In other words, $\sigma(T) = \sigma_p(T)$. Observe that $\det(\lambda I - M) = (\lambda - 1)^2 - 4$, hence $\sigma(T) = \{-1, 3\}$. Each of those eigenvalues defines a set of associated eigenvectors. Precisely, the set of all eigenvectors with eigenvalue $\lambda = -1$ is $\{(x, y) \in \mathbb{R}^2 : (x, y) \ne (0, 0),\ x = -y\}$, while those with eigenvalue $\lambda = 3$ form the set $\{(x, y) \in \mathbb{R}^2 : (x, y) \ne (0, 0),\ x = y\}$.
13 Exercises
In order to find $\|T\|$ we must maximize the function $f(x, y) := \|T(x, y)\|_2 = \sqrt{(x + 2y)^2 + (2x + y)^2}$ on $S_{(\mathbb{R}^2, \|\cdot\|_2)}$. It is true that $f$ is a convex function on the convex and compact subset $B_{(\mathbb{R}^2, \|\cdot\|_2)}$ of $\mathbb{R}^2$, and so it must attain its maximum at an extreme point of $B_{(\mathbb{R}^2, \|\cdot\|_2)}$ (see Exercise 13.523), i.e., at a point on its boundary $S_{(\mathbb{R}^2, \|\cdot\|_2)}$. However, this information is, in this case, irrelevant, since all points in $S_{(\mathbb{R}^2, \|\cdot\|_2)}$ are extreme points of $B_{(\mathbb{R}^2, \|\cdot\|_2)}$. As mentioned in Exercise 13.553, we need to use Lagrange multipliers (see, e.g., [Edw95]). It is easier (and, for the purpose of finding the point that maximizes $f$, equivalent) to maximize $F(x, y) := f^2(x, y) = (x + 2y)^2 + (2x + y)^2$ on $S_{(\mathbb{R}^2, \|\cdot\|_2)}$, i.e., on $\{(x, y) \in \mathbb{R}^2 : g(x, y) := x^2 + y^2 - 1 = 0\}$. The solution $(x, y)$ (it exists due to the continuity of $F$ and the fact that $S_{(\mathbb{R}^2, \|\cdot\|_2)}$ is compact) satisfies $\nabla F(x, y) = \lambda \nabla g(x, y)$ for some $\lambda$, together with $g(x, y) = 0$ (this is the Lagrange Multiplier Method). This is a nonlinear system that has four solutions:
$$\Big(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\Big),\quad \Big(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\Big),\quad \Big(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\Big),\quad \text{and}\quad \Big(-\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\Big).$$
The first and last of those clearly give the maximum value for $f$, namely $\|T\| = 3$. We compute in Exercise 13.557 the matrix $M^*$ associated to the adjoint (or transposed) operator $T^*$. In our case, the adjoint operator $T^*$ has, in the canonical basis, the matrix
$$M^* = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix},$$
which in this case coincides with $M$. The operator $T$ (or its matrix $M$) is then called self-adjoint.

13.555 Find a linear operator $T$ from $(\mathbb{R}^2, \|\cdot\|_2)$ into $(\mathbb{R}^2, \|\cdot\|_2)$ that maps the point $(1, 1)$ to the point $(1, 0)$ and the point $(1, -1)$ to the point $(0, 1)$. Find its norm.
Hint. $T$ has, in the canonical basis, a matrix $M := \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Since $M \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $M \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, we get $M = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}$. To compute $\|T\|$ we could proceed as in Exercise 13.554. However, this time, and just to stress the geometric point of view of the definitions, we may observe that $T$ is the composition of a number of simple transformations: ($T_1$) a reflection with respect to the $OX$ axis, ($T_2$) a counterclockwise rotation by angle $\pi/4$, and ($T_3$) a homothety of factor $1/\sqrt{2}$ (see Fig. 13.92). Now, it is clear that the image of $B_{(\mathbb{R}^2, \|\cdot\|_2)}$ under $T = T_3 T_2 T_1$ is the ball $(1/\sqrt{2}) B_{(\mathbb{R}^2, \|\cdot\|_2)}$, hence $\|T\| = 1/\sqrt{2}$. Observe that, this time, we are not computing explicitly the points at which the norm of $T$ is attained. However, it follows from the geometric description that every point $x \in S_{(\mathbb{R}^2, \|\cdot\|_2)}$ is such a point.
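The norms obtained in Exercises 13.553 to 13.555 can be checked numerically. The following is a minimal sketch in plain Python, not taken from the text: the function names are illustrative, and the Euclidean norm of a $2 \times 2$ matrix is computed from the largest eigenvalue of $M^{T}M$ (a standard fact assumed here) instead of Lagrange multipliers.

```python
import math

def norm_inf(M):
    """Operator norm for the sup-norm: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in M)

def norm_1(M):
    """Operator norm for the 1-norm: the largest absolute column sum."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))

def norm_2_2x2(M):
    """Euclidean operator norm of a 2x2 matrix (illustrative helper).

    Returns the square root of the largest eigenvalue of M^T M.
    """
    (a, b), (c, d) = M
    # Entries of the symmetric matrix M^T M = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    largest = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q)
    return math.sqrt(largest)

M_554 = [[1, 2], [2, 1]]            # matrix of Exercise 13.554 (i)
M_555 = [[0.5, 0.5], [0.5, -0.5]]   # matrix of Exercise 13.555

print(norm_inf(M_554), norm_1(M_554), norm_2_2x2(M_554))
print(norm_2_2x2(M_555), 1 / math.sqrt(2))
```

For the symmetric matrix of Exercise 13.554 (i) the three norms happen to coincide; this is not the case in general.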
13.556 Find a nonzero operator $T$ from $\mathbb{R}^2$ into itself such that $T^2 = TT = 0$ (the zero operator).
Hint. Put $M := \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ for the matrix associated to $T$ in the canonical basis, and solve $MM = 0$. This gives, for example, $M := \begin{pmatrix} 1 & -1/2 \\ 2 & -1 \end{pmatrix}$. Observe that if $\lambda$ is an eigenvalue of an operator $T$, then $\lambda^2$ must be an eigenvalue of the operator $T^2$. Since, in this case, $T^2 = 0$, the only eigenvalue of $T$ is $\lambda := 0$. Due to the fact that $T$ acts on a finite-dimensional space, the previous observation shows that $\sigma(T) = \{0\}$.

13.557 Let $T : X \to Y$ be a bounded operator from a Banach space $X$ into a Banach space $Y$. A natural construction that has its origin in analytic geometry defines the so-called adjoint or transposed operator $T^*$. In our context, it goes from the dual space $Y^*$ into the dual space $X^*$. Precisely, $(T^* y^*)(x) := y^*(Tx)$ for all $y^* \in Y^*$ and all $x \in X$. Prove that (i) $T^*$ really maps $Y^*$ into $X^*$; (ii) $\|T\| = \|T^*\|$; (iii) in case $X$ and $Y$ are finite-dimensional, define the matrix $M$ associated to $T$ (given algebraic bases in $X$ and $Y$) and compute the matrix associated to $T^*$ in the “dual” bases.
Hint. Observe, first, that $T^*$ maps $Y^*$ into $X^*$ as stated: Indeed, if $\|x\| \le 1$ and $\|y^*\| \le 1$, then $|(T^* y^*)(x)| = |y^*(Tx)| \le \|y^*\| \cdot \|Tx\| \le \|y^*\| \cdot \|T\| \cdot \|x\| \le \|T\|$, hence $T^* y^* \in X^*$ and $\|T^* y^*\| \le \|T\|$. This shows that $T^*$ is a bounded operator and that $\|T^*\| \le \|T\|$. In fact, $\|T\| = \|T^*\|$. Indeed, if $T = 0$ then $T^* = 0$ and there is nothing to prove. If $T \ne 0$ and $\varepsilon > 0$, there exists $x \in S_X$ such that $\|Tx\| > (1 - \varepsilon)\|T\|$. Find now $y^* \in S_{Y^*}$ such that $y^*(Tx) = \|Tx\|$. Then $|(T^* y^*)(x)| = |y^*(Tx)| = \|Tx\| > (1 - \varepsilon)\|T\|$, hence $\|T^* y^*\| > (1 - \varepsilon)\|T\|$, and so $\|T^*\| > (1 - \varepsilon)\|T\|$. Since $\varepsilon > 0$ is arbitrary, we get $\|T^*\| \ge \|T\|$. As we mentioned in Remark 352, every linear operator $T$ from an $n$-dimensional real vector space $E$ into an $m$-dimensional one $F$ is represented by an $m \times n$ matrix $M := (a_{ij})_{i=1,j=1}^{m,n}$. Precisely, having fixed two Hamel bases, $\{e_j\}_{j=1}^{n}$ of $E$ and $\{f_i\}_{i=1}^{m}$ of $F$, we can write
$$T(e_j) = \sum_{i=1}^{m} a_{ij} f_i, \quad \text{for } j = 1, 2, \ldots, n.$$
Observe that the coefficients of the expansion of $T(e_j)$ in the basis of $F$ form the $j$th column of the matrix $M$. In order to compute the matrix associated to the transposed mapping $T^* : F^* \to E^*$ (here, no matter which norm is placed on $E$ and $F$, the
mapping $T$ is automatically continuous, see Corollary 910), we need to fix Hamel bases on $F^*$ and $E^*$. It is natural to choose the functionals associated to the bases as the corresponding “dual bases”. Precisely, let $\{e_j^*\}_{j=1}^{n}$ and $\{f_i^*\}_{i=1}^{m}$ be defined by $e_j^*(e_k) = \delta_{jk}$, where $\delta_{jk}$ is 1 if $j = k$ and 0 otherwise, and $f_i^*(f_l) = \delta_{il}$ similarly. Then
$$T^*(f_i^*)(e_j) = f_i^*(T e_j) = f_i^*\Big(\sum_{k=1}^{m} a_{kj} f_k\Big) = a_{ij}, \quad \text{for all } i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, n.$$
It follows that the matrix associated to $T^*$ in the given bases is $M^* := (a^*_{ji})_{j=1,i=1}^{n,m}$, where $a^*_{ji} = a_{ij}$ for $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$. In other words, $M^*$ is the result of transposing (i.e., interchanging) rows and columns in the matrix $M$. Such a matrix $M^*$ is called in matrix theory the transpose of $M$, and so there is an agreement between the operator theory and the matrix theory terminologies.
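The defining identity $(T^* y^*)(x) = y^*(Tx)$ and the transposition rule can be tested on a small example. The sketch below is an illustration only: functionals are identified with their coordinate vectors in the dual bases, all helper names are hypothetical, and the last check uses the formulas of Exercise 13.553 together with the standard identification (assumed here) of the dual of $(\mathbb{R}^n, \|\cdot\|_\infty)$ with $(\mathbb{R}^n, \|\cdot\|_1)$.

```python
def transpose(M):
    """Interchange rows and columns of M."""
    return [list(col) for col in zip(*M)]

def apply(M, x):
    """Matrix-vector product M x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in M]

def pairing(y, x):
    """Value of the functional with dual-basis coordinates y at the vector x."""
    return sum(yi * xi for yi, xi in zip(y, x))

def max_row_sum(M):
    """Norm of M acting between sup-normed spaces (Exercise 13.553 (i))."""
    return max(sum(abs(a) for a in row) for row in M)

def max_col_sum(M):
    """Norm of M acting between 1-normed spaces (Exercise 13.553 (ii))."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))

M = [[1, -2, 0], [3, 4, -1]]   # a hypothetical T : R^3 -> R^2
x = [2, -1, 5]                 # a vector in R^3
y = [7, -3]                    # a functional on R^2, in dual-basis coordinates

lhs = pairing(apply(transpose(M), y), x)   # (T* y*)(x), computed with M transposed
rhs = pairing(y, apply(M, x))              # y*(T x)
print(lhs, rhs)
print(max_row_sum(M), max_col_sum(transpose(M)))   # ||T|| and ||T*||
```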
13.558 Show that $B(\ell_2)$, equipped with the operator norm, contains an isometric copy of $\ell_\infty$ and thus it is not separable.
Hint. Define a mapping $\varphi$ from $\ell_\infty$ into $B(\ell_2)$ as follows: If $(a_i) \in \ell_\infty$, let $\varphi((a_i))$ be the bounded operator from $\ell_2$ into $\ell_2$ defined for $(x_i) \in \ell_2$ by $\varphi((a_i)) : (x_i) \mapsto (a_i x_i)$. We claim that $\varphi$ is a linear isometry from $\ell_\infty$ into $B(\ell_2)$. It is enough to check that if $\|(a_i)\|_\infty = 1$, then the operator $\varphi((a_i))$ has norm 1 in $B(\ell_2)$. First note that if $x = (x_i) \in \ell_2$ then
$$\|(a_i x_i)\|_2 = \Big(\sum_i |a_i|^2 |x_i|^2\Big)^{1/2} \le \|(a_i)\|_\infty \Big(\sum_i |x_i|^2\Big)^{1/2} = \|(a_i)\|_\infty \|x\|_2,$$
and thus the operator $\varphi((a_i))$ has norm at most 1. On the other hand, given $\varepsilon > 0$, choose $n_0$ such that $|a_{n_0}| > 1 - \varepsilon$. Then $\varphi((a_i))(e_{n_0}) = a_{n_0} e_{n_0}$, which has norm $|a_{n_0}|$. Letting $\varepsilon \to 0$ we obtain $\|\varphi((a_i))\| = 1$.
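The two estimates in the hint to Exercise 13.558 can be observed on a finite truncation. This is only a sketch under stated assumptions: the multiplier sequence `a` is hypothetical data with sup-norm 1, vectors are truncated to four coordinates, and random sampling merely illustrates the upper bound, it does not prove it.

```python
import math
import random

def diag_apply(a, x):
    """Apply the diagonal operator (x_i) -> (a_i x_i)."""
    return [ai * xi for ai, xi in zip(a, x)]

def norm2(x):
    """Euclidean norm of a finite vector."""
    return math.sqrt(sum(t * t for t in x))

a = [0.5, -1.0, 0.25, 0.75]        # hypothetical multipliers, sup-norm 1
sup_a = max(abs(t) for t in a)

random.seed(0)
ratios = []
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in a]
    ratios.append(norm2(diag_apply(a, x)) / norm2(x))

e2 = [0.0, 1.0, 0.0, 0.0]          # unit vector at the index where |a_i| is largest
print(max(ratios) <= sup_a + 1e-12)     # upper bound ||(a_i x_i)||_2 <= ||a||_inf ||x||_2
print(norm2(diag_apply(a, e2)))         # the bound is attained at e2
```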
13.8.3
Finite-Dimensional Spaces
13.559 Let $P_n[0, 1]$ be the metric space of all polynomials on $[0, 1]$ whose degree is less than or equal to $n$, endowed with the $d_\infty$ metric. Is it a complete space?
Hint. This was proved in Exercise 13.381. Here we provide another approach: The space $P_n[0, 1]$ is a finite-dimensional normed space when equipped with the norm $\|\cdot\|_\infty$ (an algebraic basis is $\{1, x, x^2, \ldots, x^n\}$), and this norm induces the $d_\infty$-metric. The (affirmative) result follows from (iv) in Corollary 910.

13.560 Let $X$ be a Banach space and $Y$ be a finite-dimensional Banach space. Then, there is a bounded linear operator $T$ from $X$ onto $Y$.
Fig. 13.93 The intersection of a finite number of hyperplanes cuts SX∗ (Exercise 13.562)
Hint. By (i) in Corollary 910, any one-to-one linear operator $S$ from $Y$ into $X$ is an isomorphism onto its range $S(Y)$ (such an $S$ exists as soon as $\dim X \ge \dim Y$, in particular whenever $X$ is infinite-dimensional). Let $P$ be a linear continuous projection from $X$ onto $S(Y)$ (see Corollary 950). It is enough, then, to use the operator $S^{-1} P$ from $X$ onto $Y$.

13.561 Assume that $X$ is a Banach space and $Y$ is a finite-dimensional subspace of $X$. Show that every element of $X$ has a nearest point in $Y$. Compare with Exercise 13.551.
Hint. Bounded sets in finite-dimensional spaces are relatively compact (see (v) in Corollary 910). Observe that the result may fail if the subspace $Y$ is not finite-dimensional: Exercise 13.551 gives a necessary and sufficient condition for this to happen when $Y$ is a closed hyperplane; then use Example 922.
13.8.4
Infinite-Dimensional Spaces
13.562 If $X$ is an infinite-dimensional Banach space, show that 0 is in the closure of $S_{X^*}$ in the pointwise topology on the elements in $X$, i.e., in the $w^*$-topology (see Example 1024). In the case of a separable Banach space $X$, it gives that there is a sequence of points $\{f_n\}_{n=1}^{\infty}$ in $S_{X^*}$ such that $f_n(x) \to 0$ for each $x \in X$. However, such a sequence exists even if $X$ is nonseparable. This is a deep theorem due to Josefson and Nissenzweig, see, e.g., [Di84, Chapter 12].
Hint. If $x_1, x_2, \ldots, x_n \in X$, find a point $f \in S_{X^*}$ such that $f(x_i) = 0$ for $i = 1, 2, \ldots, n$ (see Fig. 13.93). This point always exists; consider an element $x_0 \notin \operatorname{span}\{x_1, x_2, \ldots, x_n\}$. Define a linear functional $g$ on $\operatorname{span}\{x_0, x_1, \ldots, x_n\}$ by putting $g(x_0) = 1$ and $g(x_i) = 0$ for $i = 1, 2, \ldots, n$. The function $g$ is continuous (every linear functional on a finite-dimensional space is continuous, see Corollary 910). The Hahn–Banach Theorem 925 gives a continuous linear extension, which after normalization is the required $f$.

13.563 Show that if $\{e_i : i \in \mathbb{N}\}$ denotes the set of standard unit vectors in $\ell_2$, and $f \in \ell_2^*$ ($= \ell_2$) (see Theorem 993 and Remark 996), then $f(e_i) \to 0$ as $i \to \infty$ (in other words, $e_n \to 0$ in the weak topology, see Example 1024). This is in contrast with the situation in $\ell_1$. In fact, there is no sequence $\{x_i\}_{i=1}^{\infty}$ in $S_{\ell_1}$ such that $f(x_i) \to 0$ for each $f \in \ell_1^*$. This is an important result due to the Estonian-German mathematician A. Schur (see, e.g., [FHHMZ11, Theorem 5.36]).
Hint. First show that if $x \in \ell_2$ then $\langle e_i, x \rangle \to 0$ when $i \to \infty$. Then use that $\ell_2^* = \ell_2$ (see Theorem 993). In $\ell_1$ use the functional $f := (1, 1, \ldots) \in \ell_\infty$.

13.564 Let $\{e_i\}$ be the sequence of the standard unit vectors in $\ell_1$. Show that $0 \notin \overline{\operatorname{conv}}\,\{e_i\}$. Compare with Exercise 13.611.
Hint. If $\lambda_1, \lambda_2, \ldots, \lambda_n \ge 0$ and $\sum \lambda_i = 1$, then $\|\sum \lambda_i e_i\|_1 = 1$ in $\ell_1$.

13.565 [A.J. Guirao, personal communication] This exercise illustrates Remark 625. Let $\Gamma$ be an uncountable set. The space $c_0(\Gamma)$ of all mappings $x : \Gamma \to \mathbb{R}$ with the property that for each $\varepsilon > 0$ the set $\{\gamma \in \Gamma : |x(\gamma)| > \varepsilon\}$ is finite, was defined in Example 565.18. We showed in Example 573.18 that, endowed with the norm $\|x\|_\infty := \sup\{|x(\gamma)| : \gamma \in \Gamma\}$, it is a Banach space. Given $\gamma \in \Gamma$, the characteristic function $e_\gamma$ of the set $\{\gamma\}$ is an element in $c_0(\Gamma)$ of norm 1. Observe that $\|e_\gamma - e_{\gamma'}\|_\infty = 1$ for every $\gamma, \gamma' \in \Gamma$, $\gamma \ne \gamma'$. The set $E := \{e_\gamma : \gamma \in \Gamma\}$, endowed with the metric $d_\infty$ induced by $\|\cdot\|_\infty$, is a metric space. By the previous observation, together with (v) in Theorem 582, the space $(E, d_\infty)$ is not separable. Show the preceding statements, as well as the fact that $(E, d_\infty)$ is not compact although every countable cover of $(E, d_\infty)$ by open balls has a finite subcover.
Hint. Standard, except for the last two statements. For showing noncompactness, consider the open cover $\{B(e_\gamma, 1/2) : \gamma \in \Gamma\}$, which has no finite subcover. Assume now that $\{B_n : n \in \mathbb{N}\}$ is a cover by open balls. Since $E$ is uncountable, one of them, say $B_n := B(e_\gamma, r)$, must have at least two elements, hence $r > 1$. It is enough to note that then $B(e_\gamma, r) = E$. A similar construction can be done by using $\mathbb{R}$ endowed with the discrete metric.

13.566 Theorem 977 shows that any orthogonal basis of a separable Hilbert space $H$ is a Schauder basis for $H$ (for the definition of a Schauder basis, see the paragraph after the proof of the aforesaid theorem). Show that the sequence of the standard unit vectors $\{e_i\}$ in $\ell_p$ or $c_0$ is a Schauder basis for $\ell_p$ ($1 \le p < \infty$) or $c_0$, respectively. There are separable Banach spaces that do not admit any Schauder basis. We mentioned earlier that this is a result due to P. Enflo.
Hint. Let $x = (x_n) \in \ell_p$, for $1 \le p < +\infty$. Note that $\big\|x - \sum_{k=1}^{n} x_k e_k\big\|_p^p \le \sum_{k=n+1}^{\infty} |x_k|^p$, which tends to 0 as $n \to \infty$, hence $x = \sum_{k=1}^{\infty} x_k e_k$, where the convergence is in the norm $\|\cdot\|_p$. Given $k \in \mathbb{N}$, the linear function $f_k : \ell_p \to \mathbb{R}$ that maps $x = (x_k)$ to $x_k$ is obviously $\|\cdot\|_p$-continuous; this shows that, in case $x = \sum_{k=1}^{\infty} a_k e_k$, we have $f_k(x) = a_k$ ($= x_k$), and the sequence $\{e_n\}$ is, indeed, a basis of $\ell_p$. The argument for $c_0$ is similar.

13.567 Show that a bounded subset $C$ of $c_0$ is relatively $\|\cdot\|_\infty$-compact if and only if for every $\varepsilon > 0$ there is $i_0 \in \mathbb{N}$ such that $|x_i| < \varepsilon$ for every $i \ge i_0$ and every $x = (x_i) \in C$.
Hint. There exists $M > 0$ such that $\|x\|_\infty \le M$ for all $x \in C$. Assume that $C$ satisfies the $\varepsilon$-$i_0$-condition. The space $(c_0, \|\cdot\|_\infty)$ is complete (see Example 897.1), hence for proving that $C$ is relatively compact it is enough to show that $C$ is totally bounded (Theorem 620). Given $\varepsilon > 0$, find $i_0$ according to the hypothesis and consider the set $K_\varepsilon := \{x = (x_i) \in c_0 : |x_i| \le M \text{ for } i = 1, 2, \ldots, i_0,\ x_i = 0 \text{ for } i > i_0\}$.
Fig. 13.94 A sketch of the construction in the hint to Exercise 13.567
The set $K_\varepsilon$ is $\|\cdot\|_\infty$-compact, since it is homeomorphic to the compact set $B_{(\mathbb{R}^{i_0}, \|\cdot\|_\infty)}$ of $\mathbb{R}^{i_0}$. Given $x = (x_i) \in C$, the $\|\cdot\|_\infty$-distance to $y := (x_1, \ldots, x_{i_0}, 0, 0, \ldots) \in K_\varepsilon$ is less than or equal to $\varepsilon$. The result follows from Corollary 617. Assume now that the $\varepsilon$-$i_0$-condition fails. There exists then $\varepsilon > 0$ such that for every $i \in \mathbb{N}$ we can find $j \ge i$ and $x \in C$ such that $|x_j| \ge \varepsilon$. We shall construct inductively an $\varepsilon/2$-separated sequence in $C$. To this end, start by taking an arbitrary $x^{(1)} \in C$, and find $j_1 = i_1 \in \mathbb{N}$ such that $|x^{(1)}_j| < \varepsilon/2$ for all $j \ge i_1$. Find now $x^{(2)} \in C$ and $j_2 \ge i_1$ such that $|x^{(2)}_{j_2}| \ge \varepsilon$. There exists $i_2 > j_2$ ($\ge i_1$) such that $|x^{(2)}_j| \le \varepsilon/2$ for all $j \ge i_2$. There exist $x^{(3)} \in C$ and $j_3 \ge i_2$ such that $|x^{(3)}_{j_3}| \ge \varepsilon$, and we can find $i_3 > j_3$ such that $|x^{(3)}_j| \le \varepsilon/2$ for all $j \ge i_3$. Proceed inductively. The sequence so obtained satisfies the requirements, hence $C$ is not totally bounded (and so, by Corollary 621, not relatively compact) in $(c_0, \|\cdot\|_\infty)$.

13.568 Let $C = \{(a_n) \in \ell_2 : \sum_{n=1}^{\infty} n a_n^2 \le 1\}$. Show that $C$ is totally bounded and closed in $\ell_2$ and that $C$ contains no interior point in $\ell_2$.
Hint. Use the $n$th term in the definition of $C$ to show that given $\varepsilon > 0$ there is $n_0$ such that $\sum_{n \ge n_0} a_n^2 < \varepsilon$ for every $(a_n) \in C$. It follows that there is a closed, bounded and finite-dimensional (hence compact) subset $K_\varepsilon$ of $\ell_2$ such that $C \subset K_\varepsilon + \varepsilon B_{\ell_2}$. This shows the statement (see Corollary 617). Note that $C$ cannot contain a translate of a multiple of $B_{\ell_2}$ since $B_{\ell_2}$ is not totally bounded (see (v) in Corollary 910).
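The uniform tail estimate used in the hint to Exercise 13.568 comes from the inequality $\sum_{n \ge n_0} a_n^2 \le \frac{1}{n_0} \sum_{n \ge n_0} n a_n^2 \le \frac{1}{n_0}$. The sketch below checks it on one sample element of $C$; the particular sequence (with $a_n^2$ proportional to $1/n^2$, cut off at `N` terms) is a hypothetical choice made only for illustration.

```python
N = 500
H = sum(1.0 / n for n in range(1, N + 1))      # normalizing constant

# a_n^2 = 1 / (n^2 H) for n <= N and 0 afterwards, so that sum n a_n^2 = 1.
a_sq = [1.0 / (n * n * H) for n in range(1, N + 1)]

# Value of the constraint defining C (should be 1 up to rounding).
weighted = sum(n * s for n, s in enumerate(a_sq, start=1))

def tail(n0):
    """Sum of a_n^2 over n >= n0."""
    return sum(a_sq[n0 - 1:])

# The tail is bounded by 1/n0 for every n0, uniformly over C.
checks = [tail(n0) <= 1.0 / n0 + 1e-12 for n0 in range(1, N + 1)]
print(weighted, all(checks))
```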
13.8.5
Operators II
Finite-Rank and Compact Operators

13.569 Let $K(x, y)$ be a continuous real-valued function on $[0, 1] \times [0, 1]$ and let the operator $T$ be defined on $C[0, 1]$ by
$$Tf(x) = \int_0^1 K(x, y) f(y)\, dy.$$
Show that $T$ is a compact linear operator from $(C[0, 1], \|\cdot\|_\infty)$ into itself.
Hint. That $T$ maps $C[0, 1]$ into $C[0, 1]$ has a standard proof (by the way, a more general statement was established at the beginning of the proof of Proposition 691). In order to show that $T(B_{C[0,1]})$ is relatively $\|\cdot\|_\infty$-compact it is enough to show that, if $\{f_n\}_{n=1}^{\infty}$ is a sequence in $B_{C[0,1]}$, then the sequence $\{Tf_n\}_{n=1}^{\infty}$ is pointwise bounded and equicontinuous. Indeed, the Arzelà–Ascoli Theorem 648 shows that $\{Tf_n\}_{n=1}^{\infty}$ has then a $\|\cdot\|_\infty$-convergent subsequence.
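The two properties needed for the Arzelà–Ascoli argument in Exercise 13.569, a uniform bound and a common modulus of continuity for $Tf$, can be seen on a discretized version of $T$. The following is a rough sketch, not the proof: the kernel $K(x, y) = xy$, the function $f \equiv 1$ and the midpoint quadrature rule are assumptions made only for the illustration.

```python
def apply_T(K, f, x, n=200):
    """Midpoint-rule approximation of (Tf)(x) = integral over [0, 1] of K(x, y) f(y) dy."""
    h = 1.0 / n
    return sum(K(x, (k + 0.5) * h) * f((k + 0.5) * h) for k in range(n)) * h

def K(x, y):
    """Hypothetical continuous kernel with sup |K| = 1 on the unit square."""
    return x * y

def f(y):
    """An element of the closed unit ball of C[0, 1]."""
    return 1.0

grid = [i / 50 for i in range(51)]
values = [apply_T(K, f, x) for x in grid]

# Uniform bound: |Tf(x)| <= sup|K| * ||f||_inf = 1.
bounded = all(abs(v) <= 1.0 for v in values)

# Equicontinuity: |Tf(x) - Tf(x')| <= sup_y |K(x, y) - K(x', y)| * ||f||_inf = |x - x'|.
steps_ok = all(abs(values[i + 1] - values[i]) <= 1 / 50 + 1e-12 for i in range(50))
print(bounded, steps_ok, apply_T(K, f, 0.5))
```

Both bounds depend only on $K$ and on $\|f\|_\infty \le 1$, not on the particular $f$, which is what the equicontinuity of $\{Tf_n\}$ requires.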
13.570 Show that the identity operator $T$ from $\ell_2$ into $c_0$ is a bounded linear operator. Find its norm and show that it is not compact.
Hint. $T$ is well defined, since every sequence in $\ell_2$ clearly belongs to $c_0$. The norm is 1. Use the standard unit vectors to show that $T$ is not compact.

13.571 Let $T$ be the identity operator from $\ell_1$ into $\ell_2$. Show that $T$ is a norm-one linear operator, and that $T(\ell_1)$ is a norm-dense proper subspace of $\ell_2$ (precisely, show that every element $x \in \ell_2$ is the limit of a sequence of elements in $T(\ell_1)$). Show too that $T$ is not a compact operator.
Hint. $T$ is well defined, since every sequence in $\ell_1$ clearly belongs to $\ell_2$. Consider finitely supported vectors. Observe that Theorem 953 implies that no continuous linear one-to-one operator from $\ell_1$ into $\ell_2$ can be onto, since $\ell_1$ and $\ell_2$ are not mutually linearly isomorphic (see Exercise 13.585). To see that $T$ is not compact, consider the sequence $\{e_n\}_{n=1}^{\infty}$ of all the canonical unit vectors.

13.572 Let $T$ be the operator from $c_0$ into $\ell_2$ defined by $T((x_i)) = ((1/2^i) x_i)$ for $x = (x_i) \in c_0$. Show that $T$ is a compact one-to-one continuous linear operator that maps $c_0$ onto a proper dense subset of $\ell_2$.
Hint. For every $\varepsilon > 0$, find a finite $\varepsilon$-net in $T(B_{c_0})$ using the fact that bounded sets in finite-dimensional spaces are relatively compact and the fact that $\sum (1/2^i)$ is a convergent series. The image of $T$ is dense in $\ell_2$ (compute the image of the canonical unit vectors in $c_0$). $T$ cannot be onto because of Theorem 953 (the spaces $c_0$ and $\ell_2$ are not mutually linearly isomorphic, see Exercise 13.606). We remark that Pitt’s theorem (see, e.g., [FHHMZ11, Proposition 4.49]) asserts that any bounded linear operator from $c_0$ into $\ell_2$ is necessarily compact.

13.573 Let $X$ be a separable Banach space. Let $\{x_i : i \in \mathbb{N}\}$ be a dense set in $S_X$. For each $i$, let $f_i$ be a continuous linear functional on $X$ with $f_i(x_i) = 1$ and $\|f_i\| = 1$. Let the operator $T$ from $X$ into $\ell_2$ be defined by $T(x) = \big(f_i(x)/2^i\big)$. Show that $T$ is a compact one-to-one operator from $X$ into $\ell_2$.
Hint. Note that $T$ clearly maps $X$ into $\ell_2$. To show that $T$ is one-to-one, consider $x \in S_X$. Pick $x_i$ such that $\|x - x_i\| < \frac{1}{2}$. Then $f_i(x) = f_i(x_i) - (f_i(x_i) - f_i(x)) \ge 1 - \|x - x_i\| > \frac{1}{2}$, so $Tx \ne 0$. To show that $T$ is compact, follow the hint in Exercise 13.572 or, alternatively, note that $K_n := \{l = (l_i) \in \ell_2 : |l_i| \le 1 \text{ for } i = 1, 2, \ldots, n,\ l_i = 0 \text{ for } i > n\}$ is compact in $\ell_2$, and that $T B_X$ is close to $K_n$. Apply then Corollary 617.

13.574 Let $P$ be a bounded projection from a Banach space $X$ onto an infinite-dimensional subspace $Y$. Show that $P$ is not a compact operator.
Hint. $B_Y$ is not compact.

13.575 Show that the closure cannot be omitted in the definition of a compact operator.
Hint. Look at Example 922. The operator there was even finite-rank. Related to this, observe Exercise 13.576.

13.576 Let $X$ be a finite-dimensional Banach space, $Y$ be a normed space and let $T$ be a linear operator from $X$ into $Y$. Show that $T$ is a compact operator.
Hint. Corollary 910 shows that $T$ is continuous. Since $B_X$ is compact (see (v) in Corollary 910), $T(B_X)$ is also compact (no need to take closures).

Sets of Operators

13.577 Let $X$ and $Y$ be Banach spaces and $T_n$ and $T$ be bounded linear operators from $X$ into $Y$ for $n \in \mathbb{N}$. Assume that $\lim T_n x = Tx$ for every $x \in X$. Let $K$ be a compact set in $X$. Show that $\lim T_n(x) = T(x)$ uniformly in $x \in K$.
Hint. Assume without loss of generality that $\|T_n - T\| \le 1$ for all $n$ (use the Banach–Steinhaus Theorem 951). Given $\varepsilon > 0$ there exists a finite $\varepsilon$-net $S$ for $K$. We can find $n_0 \in \mathbb{N}$ such that $\|(T_n - T)(x)\| \le \varepsilon$ for all $x \in S$ and all $n \ge n_0$. It follows that $\|(T_n - T)(x)\| \le 3\varepsilon$ for all $x \in K$ and all $n \ge n_0$.

13.578 Let $\{T_n\}$ be a sequence of continuous linear operators from a Banach space $X$ into a Banach space $Y$. Assume that the sequence $\{T_n\}$ converges to an operator $T \in B(X, Y)$ in the operator norm. Show that
(i) If $T_n$ are all finite-dimensional, then $T$ is a compact operator. Give an example showing that $T$ need not in general be finite-dimensional.
(ii) If $T_n$ are all compact operators, then $T$ is a compact operator.
Hint. (i) Note that $T(B_X)$ is as close as we wish (in the sense of Corollary 617) to a relatively compact set (precisely, $T_n(B_X)$). The aforementioned Corollary 617 shows that $T(B_X)$ is totally bounded, hence relatively compact (see Corollary 618 and Proposition 620). For an example as requested, take the compact operator defined in Exercise 13.572 and the sequence of its “truncations”. (ii) uses the same argument as in (i).

13.579 Show that in a separable Hilbert space $H$ every compact operator $T$ is the limit in the operator norm of a sequence $\{T_n\}$ of finite-rank operators (see Definition 921), i.e., $\|T_n - T\| \to 0$ when $n \to \infty$. Since the space of compact operators is closed in the operator norm (see Sect. 11.1.7), this shows that the closure in the operator norm of the space of all finite-rank operators is the space of all compact operators.
Hint. Let $\{e_n\}_{n=1}^{\infty}$ be an orthonormal basis of $H$ (see Theorem 977). Given $n \in \mathbb{N}$, let $P_n$ be the orthogonal projection from $H$ onto the finite-dimensional space $\operatorname{span}\{e_k : k \le n\}$. Observe that $\{P_n \circ T\}$ is a sequence of finite-dimensional operators that converges to $T$ uniformly on $B_H$ (use Exercise 13.577 for $K := \overline{T(B_H)}$).

13.580 Let $H$ be a separable Hilbert space and $T$ be a bounded linear operator from $H$ into $H$. (i) Does there exist a sequence $\{T_n\}_{n=1}^{\infty}$ of finite-rank operators from $H$ into $H$ such that $T_n(x) \to T(x)$ for all $x \in H$? (ii) Does there exist a sequence $\{T_n\}_{n=1}^{\infty}$ of finite-rank operators from $H$ into $H$ that converges to $T$ uniformly on $B_H$?
Hint. (i) Yes. If $P_n$ are the orthogonal projections on the first $n$ vectors of an orthogonal basis of $H$, consider $T_n := P_n \circ T$. (ii) No. The closure in the operator norm of the space of finite-rank operators is the space of compact operators (see
Exercise 13.579). There are bounded noncompact operators on a separable infinite-dimensional Hilbert space (the identity operator is such, see the paragraph after Example 922).
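The contrast between Exercises 13.579 and 13.580 (ii) is visible already for diagonal operators, for which $T - P_n T$ is again diagonal and, by Exercise 13.558, has norm equal to the supremum of the remaining diagonal entries. The sketch below works in a finite model with `N` coordinates, which is an assumption of the illustration; the entries $2^{-i}$ mimic the operator of Exercise 13.572, viewed here on $\ell_2$.

```python
def diag_norm(d):
    """Operator norm on l_2 of a diagonal operator: the largest |entry| (cf. Exercise 13.558)."""
    return max((abs(t) for t in d), default=0.0)

def truncation_error(d, n):
    """Norm of T - P_n T for a diagonal T, where P_n keeps the first n coordinates."""
    return diag_norm(d[n:])

N = 40                                                  # size of the finite model (assumption)
compact_diag = [2.0 ** -i for i in range(1, N + 1)]     # diagonal entries 1/2^i
identity_diag = [1.0] * N                               # the identity operator

errors_compact = [truncation_error(compact_diag, n) for n in range(1, N)]
errors_identity = [truncation_error(identity_diag, n) for n in range(1, N)]

print(errors_compact[:4])    # decreasing to 0: the truncations converge in norm
print(errors_identity[:4])   # constantly 1: no norm convergence for the identity
```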
13.8.6
Three Principles of Linear Analysis
Dual Spaces

13.581 Show that $(c_0, \|\cdot\|_\infty)^*$ is linearly isometrically isomorphic to $(\ell_1, \|\cdot\|_1)$.
Hint. Let $f \in (c_0, \|\cdot\|_\infty)^*$. For $n \in \mathbb{N}$, put $b_n := f(e_n)$, where $e_n$ is the $n$th canonical unit vector in $c_0$, and let $b := (b_n)$. Then $\sum_{i=1}^{n} |b_i| = f\big(\sum_{i=1}^{n} \varepsilon_i e_i\big) \le \|f\|$ for some choice of $\varepsilon_i \in \{-1, 1\}$, $i = 1, 2, \ldots, n$, and we get $\|b\|_1 \le \|f\|$ since $n$ was arbitrary. On the other hand, if $\|x\|_\infty \le 1$ in $c_0$, then $|f(x)| = \big|\sum_{i=1}^{\infty} b_i x_i\big| \le \sum_{i=1}^{\infty} |b_i| = \|b\|_1$. Thus, the mapping $f \mapsto (b_n)$ from $c_0^*$ into $\ell_1$ is a linear isometry. Clearly it is onto.

13.582 Let $\{a_i\}$ be a sequence of real numbers such that $\sup\{|a_i| : i \in \mathbb{N}\} = 1$. Let the functional $f$ on $(\ell_1, \|\cdot\|_1)$ be defined by $f(x) = \sum a_i x_i$, where $x = (x_i)$. Show that $f$ is continuous and its norm in $\ell_1^*$ is equal to 1.
Hint. Use Exercise 13.609.

13.583 Assume that $f$ is a continuous linear functional on $(\ell_1, \|\cdot\|_1)$ of norm 1. Put $a_i := f(e_i)$ for all $i \in \mathbb{N}$, where $e_i$ denotes the standard $i$th unit vector in $\ell_1$. Show that $\sup\{|a_i| : i \in \mathbb{N}\} = 1$.
Hint. Use Exercises 13.609 and 13.582.

13.584 Show that $(\ell_1, \|\cdot\|_1)^*$ is linearly isometrically isomorphic to $(\ell_\infty, \|\cdot\|_\infty)$.
Hint. Use Exercises 13.582 and 13.583.

13.585 Show that $\ell_1$ is not linearly isomorphic to $\ell_2$.
Hint. From Exercise 13.584 we know that $(\ell_1, \|\cdot\|_1)^*$ is linearly isometrically isomorphic to $(\ell_\infty, \|\cdot\|_\infty)$, a nonseparable space (see Example 586.2c). However, the dual space of a separable Hilbert space is again a separable Hilbert space (Theorem 993). Related to this exercise see also Exercise 13.606 below.

13.586 Use Exercise 13.581 and Example 905 to show that $c_0$ and $C[0, 1]$ are not linearly isomorphic.
Hint. Look at the separability of their dual spaces (for this, look at Exercise 13.581 and Example 905, respectively).

Extreme Points and Exposed Points in Banach Spaces

We collect here some exercises about extreme and exposed points in Banach spaces. For more on this subject, now in Hilbert spaces, see Remark 1087 and Exercise 13.608 below.
Fig. 13.95 How to put any of the elements of $B_{c_0}$ in between two of them
Fig. 13.96 How to put a function in $B_{C[0,1]}$ in between two of them
13.587 An extreme point of a closed and convex subset $C$ of a Banach space is a point in $C$ that is not the center of any nondegenerate line segment in $C$. Find the extreme points (if any) of the closed unit balls of (i) $c_0$, (ii) $\ell_1$, (iii) Hilbert spaces (in particular $\ell_2$), (iv) $\ell_\infty$, and (v) $C[0, 1]$.
Hint. (i) Given $x = (x_i) \in S_{(c_0, \|\cdot\|_\infty)}$, take $i \in \mathbb{N}$ so that $|x_i| < \frac{1}{2}$ and consider the points $x \pm \frac{1}{4} e_i$ (see Fig. 13.95). Thus $B_{c_0}$ has no extreme points. (ii) Let $x \in S_{\ell_1}$ be such that at least two of its coordinates do not vanish (and so each of them has an absolute value strictly less than 1). It is simple to write $x$ as the middle point of $y$ and $z$ in $B_{\ell_1}$, where $y$ and $z$ are obtained from $x$ by adding and subtracting (subtracting and adding, respectively) the same small constant to those coordinates. It follows that the extreme points of $B_{\ell_1}$ are those of the form $\pm e_n$, $n \in \mathbb{N}$. (iii) We noticed in Remark 960 that every point of the unit sphere of a Hilbert space is extreme. This applies, in particular, to $\ell_2$. (iv) Clearly, the extreme points of $B_{\ell_\infty}$ are the elements $x = (\pm 1, \pm 1, \ldots)$. (v) The closed unit ball $B_{C[0,1]}$ has only two extreme points: the constant function 1 and the constant function $-1$. Indeed, it is clear that those functions are extreme points. On the other hand, if for a continuous function $f$ in $B_{C[0,1]}$ we can find $x_0 \in [0, 1]$ such that $|f(x_0)| < 1$, it is clear that we can write $f$ as the middle point of two different continuous functions $g$ and $h$ in $B_{C[0,1]}$. See Fig. 13.96.

13.588 Let $C$ be a closed convex subset of a Banach space $X$, and assume that a continuous linear functional $f \in X^*$ attains its supremum $s$ on $C$. The set $F := \{x \in X : f(x) = s\} \cap C$ is thus a nonempty closed and convex subset of $C$. Prove that if $c \in F$ is an extreme point of $F$, then $c$ is an extreme point of $C$.
Hint. Assume $c = (c_1 + c_2)/2$, where $c_1, c_2 \in C$, $c \ne c_1$. Due to the fact that $f(c_1) \le f(c)$ and $f(c_2) \le f(c)$, and that $f((c_1 + c_2)/2) = f(c)$, we conclude $f(c_1) = f(c_2) = f(c)$, i.e., $c_1 \in F$ and $c_2 \in F$. This is a contradiction with the fact that $c$ is an extreme point of $F$.
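Returning to part (ii) of the hint to Exercise 13.587, the midpoint decomposition in $\ell_1$ can be written out explicitly. The sketch below uses exact rational arithmetic; the vector `x`, the indices and the size `delta` of the perturbation are hypothetical choices, and the construction assumes that both chosen coordinates are nonzero and that `delta` does not exceed their absolute values.

```python
from fractions import Fraction as Fr

def norm1(x):
    """The l_1 norm of a finitely supported vector."""
    return sum(abs(t) for t in x)

def split_as_midpoint(x, i, j, delta):
    """Return y, z with x = (y + z)/2 and the same l_1 norm as x.

    Moves mass delta from coordinate j to coordinate i (for y) and back (for z).
    Assumes x[i] != 0, x[j] != 0 and 0 < delta <= min(|x[i]|, |x[j]|).
    """
    def sign(t):
        return 1 if t > 0 else -1
    y, z = list(x), list(x)
    y[i] += sign(x[i]) * delta
    y[j] -= sign(x[j]) * delta
    z[i] -= sign(x[i]) * delta
    z[j] += sign(x[j]) * delta
    return y, z

x = [Fr(1, 2), Fr(-1, 4), Fr(1, 4)]          # a point of the unit sphere of l_1
y, z = split_as_midpoint(x, 0, 1, Fr(1, 8))
midpoint = [(p + q) / 2 for p, q in zip(y, z)]
print(norm1(y), norm1(z), midpoint == x, y != z)
```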
Fig. 13.97 The extreme points do not form a closed set (Exercise 13.589)
Fig. 13.98 (0, 0) is an extreme not exposed point of C (Exercise 13.591)
13.589 Find an example of a three-dimensional convex compact set the extreme points of which do not form a closed set. Can a similar example be found in two dimensions? Related to this, see Exercise 13.590.
Hint. In three dimensions, consider the segment $S$ joining the points $(0, 0, -1)$ and $(0, 0, 1)$ and the disc $D$ in the $x,y$-plane centered at $(1, 0, 0)$ and having radius 1. Take the convex hull of $S$ and $D$ and consider the point $(0, 0, 0)$ (see Fig. 13.97). This situation cannot occur in two dimensions since, if $x$ is not an extreme point, then points close to it are not extreme either.

13.590 Let $C$ be a compact convex set in a Banach space $X$. Show that the set of extreme points of $C$ is a $G_\delta$ set in $C$. Related to this, see Exercise 13.589.
Hint. For $n \in \mathbb{N}$, put $F_n = \{z \in C : z = \frac{1}{2}(x + y),\ x, y \in C,\ \|x - y\| \ge \frac{1}{n}\}$. Then $F_n$ is closed: if $z_j \in F_n$, $z_j \to z$ and $z_j = \frac{1}{2}(x_j + y_j)$ with $x_j, y_j \in C$ and $\|x_j - y_j\| \ge \frac{1}{n}$, then, by compactness, we may assume that $x_j \to x$ and $y_j \to y$, where $x, y \in C$. Then $\|x - y\| \ge \frac{1}{n}$ and $z = \frac{1}{2}(x + y)$. Note that $z$ is not extreme if and only if it belongs to some $F_n$.

13.591 A point $x$ in a closed convex set $C$ in a Banach space $X$ is called an exposed point of $C$ if there is an $f \in X^*$ such that $f$ is supporting $C$ exactly at $x$, i.e., there is $f \in X^*$ such that $f(x) = \sup\{f(y) : y \in C\}$, and for all $y \in C$ such that $y \ne x$ we have $f(y) < f(x)$. (i) Show that every exposed point of $C$ is an extreme point of $C$ (see Exercise 13.587 for the definition of extreme point). (ii) Find an example of an extreme point that is not an exposed point.
Hint. (i) The same hint as (ii) in Exercise 13.538. (ii) Consider in $\mathbb{R}^2$ the epigraph (i.e., the portion of the plane above the graph) of the function $f$ defined for $x \le 0$ by 0 and for $x > 0$ by $x^2$. Then consider the point $(0, 0)$ (see Fig. 13.98).
Fig. 13.99 The construction in Exercise 13.593
Fig. 13.100 The construction in the second part of Exercise 13.593
13.592 Show that any standard unit vector $e_i$ of $\ell_1$ is an exposed point of the unit ball of $\ell_1$.
Hint. The functional $f$ defined on $\ell_1$ by $f(x) = x_i$ is exposing the unit ball of $\ell_1$ at $e_i$.

13.593 Prove the following version of the so-called Krein–Milman theorem: If $C$ is a nonempty compact convex subset of a Banach space $X$, then $C$ has an extreme point, and in fact $C$ is the closed convex hull of the subset consisting of all its extreme points. (Compare with Exercise 13.608.)
Hint. [Due to the Polish mathematician S. Straszewicz] First of all, $C$ is separable (see Proposition 605). By taking the closed linear span of $C$, we may assume that $X$ is separable, too. Of course, the result does not depend on the particular equivalent norm on $X$, so we may use Proposition 935 to assume that the norm $\|\cdot\|$ on $X$ is strictly convex (i.e., its closed unit ball is a strictly convex set). Fix $x_0 \in X$ and define a function $r : C \to \mathbb{R}$ by $r(c) := \|x_0 - c\|$. Obviously, $r$ is continuous, hence it attains its maximum, say at $c_0 \in C$. Due to the fact that $\|\cdot\|$ is strictly convex, $c_0$ is an extreme point of $C$: if $c_0$ were the middle point of two different points $c_1, c_2 \in C$, then $x_0 - c_0$ would be the middle point of $x_0 - c_1$ and $x_0 - c_2$, two different vectors of norm at most $r(c_0)$, and so $\|x_0 - c_0\| < r(c_0)$, a contradiction (see Fig. 13.99). To prove the second part (see Fig. 13.100), assume that $D := \overline{\operatorname{conv}}(E) \subsetneq C$, where $E$ is the set of all extreme points of $C$. Use the Separation Theorem 926 to find $f \in X^*$ that strictly separates $\overline{\operatorname{conv}}(E)$ and some point $c \in C$, say $\sup\{f(x) : x \in D\} < f(c)$. Put $m := \max\{f(x) : x \in C\}$. The set $F := \{x \in C : f(x) = m\}$ is a convex compact subset of $X$ disjoint from $D$, so it has an extreme point $e$. By Exercise 13.588, $e$ is also an extreme point of $C$, a contradiction.

Remark 1087 An important result in Banach space theory is the following theorem, that extends the last part of Exercise 13.593 to the setting of what are called weakly compact (convex) sets in Banach spaces: [Krein–Milman] If $X$ is a Banach space and $C$ is a closed convex subset of $X$ with the property of being compact in the topology of the pointwise convergence on elements in $X^*$, then $C$ has an extreme point. It is equivalent to the seemingly stronger conclusion that $C$ is the closed convex hull of
the set of its extreme points (to obtain the second statement from the first, apply the technique in the Hint to Exercise 13.593). The version for finite-dimensional Banach spaces was due to Minkowski and Carathéodory (Theorem 916). Observe that this finite-dimensional version follows either from Exercise 13.593 or Exercise 13.608. 13.594 Prove that if C is a convex compact subset of a Banach space X and f is an element in X∗ , then f attains its supremum on C at an extreme point of C. Hint. It follows from the argument in the last part of the hint to Exercise 13.593. We note here that the result is true for a convex subset of a normed space X that is compact in the topology of the pointwise convergence on the elements of X ∗ .
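In finite dimensions, the statement of Exercise 13.594 can be watched at work on a polygon. The sketch below is an illustration under assumptions, not the argument of the hint: the functional `f` and the points generating $C$ are hypothetical, and random sampling only illustrates that no point of $C$ beats the best generating point (the underlying reason being that $f$ of a convex combination is the same convex combination of the values of $f$).

```python
import random

def f(p):
    """A hypothetical linear functional on R^2."""
    return 2 * p[0] - 3 * p[1]

# C is the convex hull of these points (all of them are vertices of the polygon).
vertices = [(0, 0), (4, 0), (5, 3), (1, 4), (-2, 2)]
best = max(f(v) for v in vertices)
maximizers = [v for v in vertices if f(v) == best]

random.seed(1)

def random_point_of_C():
    """A random convex combination of the generating points."""
    w = [random.random() for _ in vertices]
    s = sum(w)
    return tuple(sum(wi * v[k] for wi, v in zip(w, vertices)) / s for k in range(2))

samples = [random_point_of_C() for _ in range(2000)]
print(best, maximizers, all(f(p) <= best + 1e-9 for p in samples))
```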
Differentiability
13.595 (i) Show that the function $x \mapsto \|x\|$ on a Banach space is never Gâteaux differentiable at $0$. (ii) Show that the function $x \mapsto \|x\|^2$ is Fréchet differentiable at $0$ and compute its derivative there.
Hint. (i) Observe that the limit (11.15) does not exist for $h \ne 0$ (just let first $t \to 0+$ and then $t \to 0-$). (ii) Standard from the definition. The derivative $f'(0)$ is the $0$ function. Compare this exercise with Proposition 1000.
13.596 Let $f$ be a continuous linear functional on a Banach space $X$. Show that $f$ is a Fréchet differentiable function on $X$ and find its Fréchet derivative.
Hint. Use the definition of the Fréchet derivative. The Fréchet derivative at every point is just $f$.
13.597 Let a mapping $F$ on $C[0, 1]$ be defined by $F(f) = \int_0^1 f^2$, for $f \in C[0, 1]$. Show that $F$ is Fréchet differentiable and find its Fréchet derivative.
Hint. Direct calculation. The Fréchet derivative $F'(f)$ satisfies $F'(f)h = 2\int_0^1 f h$ for $h \in C[0, 1]$.
13.598 Let the function $f$ be defined for $x = (x_1, x_2, \ldots) \in \ell_2$ by $f(x) = x_1^2 + x_2^2$. Show that $f$ is Fréchet differentiable on $\ell_2$ and find its Fréchet derivative.
Hint. Use the definition. The Fréchet derivative at $x = (x_1, x_2, \ldots)$ is $f'(x)(h) = 2x_1 h_1 + 2x_2 h_2$, where $h = (h_1, h_2, \ldots) \in \ell_2$.
13.599 Show that the set of points where $\|\cdot\|_1$ on $\ell_1$ is Gâteaux differentiable is $\{x = (x_n) \in \ell_1 : x_n \ne 0 \text{ for all } n \in \mathbb{N}\}$, and compute its Gâteaux derivative at each of those points. Recall (Example 942) that the norm $\|\cdot\|_1$ on $\ell_1$ is nowhere Fréchet differentiable. For a more precise statement see Exercise 13.600.
Hint. Assume first that $x = (x_n) \in \ell_1$ satisfies $x_n = 0$ for some $n \in \mathbb{N}$. Then, if $e_n$ denotes the $n$th vector of the canonical basis of $\ell_1$,
$$\frac{\|x + t e_n\|_1 - \|x\|_1}{t} = \frac{|t|}{t} \quad \text{for all } t \ne 0,$$
and so the limit (11.15) clearly does not exist. Conversely, assume that $x_n \ne 0$ for all $n \in \mathbb{N}$. Let us prove that the Gâteaux derivative of $\|\cdot\|_1$ at $x$ is $(\operatorname{sign} x_n)_{n=1}^\infty$ ($\in \ell_\infty$), where $\operatorname{sign} r$ denotes the sign of a real number $r$. Fix $h = (h_n) \in \ell_1$. Without loss
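of generality we may assume that $\|h\|_1 \le 1$. Given $\varepsilon > 0$ find $n_0 \in \mathbb{N}$ such that $\sum_{n=n_0+1}^\infty |h_n| < \varepsilon$. Then find $\delta > 0$ such that $\delta < |x_n|$, $n = 1, 2, \ldots, n_0$. We have $\operatorname{sign}(x_n + t h_n) = \operatorname{sign} x_n$ for all $n = 1, 2, \ldots, n_0$ and all $t$ such that $|t| < \delta$. So, for $0 < |t| < \delta$, we have
$$\left| \frac{\|x + th\|_1 - \|x\|_1}{t} - \sum_{n=1}^\infty (\operatorname{sign} x_n) h_n \right| = \left| \frac{\sum_{n=n_0+1}^\infty |x_n + t h_n| - \sum_{n=n_0+1}^\infty |x_n|}{t} - \sum_{n=n_0+1}^\infty (\operatorname{sign} x_n) h_n \right| \le 2 \sum_{n=n_0+1}^\infty |h_n| < 2\varepsilon.$$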
This shows that $\|\cdot\|_1$ is Gâteaux differentiable at $x$.
13.600 Prove that the standard norm $\|\cdot\|_1$ on $\ell_1$ is 2-rough.
Hint. (Sketch) For $x = 0$ the limsup in (11.65) is clearly 2. For a nonzero vector $x = (x_i) \in \ell_1$, find a very small coordinate (say $i$) and let $h := \lambda e_i$. Then, $\|x + h\|_1$ and $\|x - h\|_1$ are both, roughly, $\|x\|_1 + \|h\|_1$. The result follows by letting $\lambda \to 0$.
13.601 Show that the space $\ell_1$ does not admit any real-valued $C^1$-smooth function with bounded nonempty support (i.e., a $C^1$-smooth bump function). This implies that
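$\ell_1$ does not admit any $C^1$-smooth equivalent norm.
Hint. First note that the standard norm $\|\cdot\|_1$ of $\ell_1$ is a 2-rough norm (see Exercise 13.600). Assume that $\ell_1$ admits a $C^1$-smooth bump function $g$. Let $f : \ell_1 \to \mathbb{R} \cup \{+\infty\}$ be defined by
$$f(x) = \begin{cases} g(x)^{-2} - \|x\|_1 & \text{if } g(x) \ne 0, \\ +\infty & \text{otherwise.} \end{cases}$$
By a variant of Ekeland's Theorem 1027, there is, for $0 < \varepsilon < 1$, a point $x_0$ in the domain of $f$ such that
$$f(x_0 + h) \ge f(x_0) - \frac{\varepsilon}{4} \|h\|_1 \quad \text{for every } h \in \ell_1.$$
So for $\|h\|_1$ small enough we have, by adding the inequalities for $h$ and for $-h$,
$$g(x_0 + h)^{-2} + g(x_0 - h)^{-2} - \|x_0 + h\|_1 - \|x_0 - h\|_1 \ge 2 g(x_0)^{-2} - 2\|x_0\|_1 - \frac{\varepsilon}{2} \|h\|_1.$$
Hence
$$\limsup_{\|h\|_1 \to 0} \frac{g(x_0 + h)^{-2} + g(x_0 - h)^{-2} - 2 g(x_0)^{-2}}{\|h\|_1} \ge \limsup_{\|h\|_1 \to 0} \frac{\|x_0 + h\|_1 + \|x_0 - h\|_1 - 2\|x_0\|_1}{\|h\|_1} - \frac{\varepsilon}{2} \ge 2 - \varepsilon/2 > 0.$$
This contradicts the Fréchet differentiability of $g$ (hence of $g^{-2}$) at $x_0$.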
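The 2-roughness used in Exercises 13.600 and 13.601 can be observed numerically. The following is only a sketch under stated assumptions: the test vector $x_k = 2^{-(k+1)}$ (truncated to 50 coordinates) and the helper names `l1_norm` and `roughness_quotient` are illustrative choices, not part of the text. For $h = \lambda e_i$ with $|x_i| \le \lambda$ the quotient equals $2 - 2|x_i|/\lambda$, which approaches 2 when $\lambda \to 0$ and $|x_i|/\lambda \to 0$.

```python
def l1_norm(v):
    """l1 norm of a finitely supported sequence given as a list."""
    return sum(abs(c) for c in v)


def roughness_quotient(x, i, lam):
    """(||x + h||_1 + ||x - h||_1 - 2||x||_1) / ||h||_1 for h = lam * e_i."""
    plus, minus = list(x), list(x)
    plus[i] += lam
    minus[i] -= lam
    return (l1_norm(plus) + l1_norm(minus) - 2 * l1_norm(x)) / abs(lam)


# Hypothetical test vector: x_k = 2^-(k+1), truncated to 50 coordinates.
x = [2.0 ** -(k + 1) for k in range(50)]

for i in (10, 20, 40):
    lam = 2.0 ** -(i // 2)  # lam -> 0 while |x_i| / lam -> 0 as i grows
    print(i, roughness_quotient(x, i, lam))
```

The printed quotients increase towards 2, in line with the sketch in the hint to Exercise 13.600.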
Fig. 13.101 The sequence {fn } in Exercise 13.605
13.602 Show that $\ell_1$ does not admit any equivalent Fréchet differentiable norm.
Hint. Assume that $\|\cdot\|$ is an equivalent Fréchet differentiable norm on $\ell_1$. Then, the function $\|\cdot\|^2$ is $C^1$-Fréchet smooth everywhere on $\ell_1$ (the point 0 can be treated separately). Let $\tau$ be a real-valued function on $\mathbb{R}$ that is $C^1$-smooth, $\tau(0) = 1$ and $\tau(t) = 0$ for every $|t| > 1$. Then, the composition $\tau(\|x\|^2)$ is a $C^1$-smooth bump on
$\ell_1$, a contradiction with Exercise 13.601.
13.603 Let $\|\cdot\|_1$ be the standard norm on $\ell_1$ and let the function $f$ be defined on the unit ball $B_{\ell_1}$ of $\ell_1$ by $f(x) = \|x\|_1$. Show that there does not exist any $C^1$-Fréchet smooth function $g$ defined on the open unit ball such that $|f(x) - g(x)| < \frac{1}{4}$.
Hint. Assume such $g$ exists. Extend $g$ to $\ell_1$ by putting $g(x) = 1$ for $\|x\|_1 \ge 1$. Let $\tau$ be a real-valued $C^1$-smooth function on $\mathbb{R}$ such that $\tau(g(0)) = 1$ and $\tau(t) = 0$ for $|t| \ge \frac{1}{2}$. Let the function $h$ be given by $h(x) = \tau(g(x))$ for $x \in \ell_1$. Then, the function $h$ has the following properties: $h(0) = 1$, $h(x) = 0$ if $\|x\|_1 \ge \frac{3}{4}$, and $h$ is $C^1$-smooth (to see it, consider points with $\|x\|_1 < \frac{3}{4}$ and points with $\|x\|_1 \ge \frac{3}{4}$). Thus $h$ is a $C^1$-smooth bump on $\ell_1$, which contradicts Exercise 13.601.
The Banach–Steinhaus Theorem
The Open Mapping and Closed Graph Theorems
13.604 Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two complete norms on a linear space $X$ such that $\|\cdot\|_1 \ge \|\cdot\|_2$ on $X$. Show that $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent.
Hint. The identity map $I : (X, \|\cdot\|_1) \to (X, \|\cdot\|_2)$ is continuous, and by the Banach Open Mapping Theorem 953, it is a linear isomorphism.
13.605 Let a norm on $C[0, 1]$ be defined by $\|f\|_1 := \int_0^1 |f|$. Use Exercise 13.604 to show that $(C[0, 1], \|\cdot\|_1)$ is not complete.
Hint. Observe that $\|f\|_1 \le \|f\|_\infty$ for all $f \in C[0, 1]$. If $f_n(x) := x^n - x^{2n}$ for $x \in [0, 1]$ and $n \in \mathbb{N}$, the sequence $\{\|f_n\|_1\}_{n=1}^\infty$ converges to 0, but $\{\|f_n\|_\infty\}_{n=1}^\infty$ does not (see Fig. 13.101).
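The behaviour of the sequence $f_n(x) = x^n - x^{2n}$ in the hint to Exercise 13.605 can be checked with a few lines of Python. This is a minimal sketch: the function names and the grid size are assumptions made for illustration. Since $f_n \ge 0$ on $[0, 1]$, the integral norm is $\frac{1}{n+1} - \frac{1}{2n+1}$, while the maximum of $u - u^2$ for $u = x^n \in [0, 1]$ is $\frac{1}{4}$.

```python
def f(n, x):
    """f_n(x) = x^n - x^(2n) on [0, 1]."""
    return x ** n - x ** (2 * n)


def l1_norm(n):
    """Exact integral of |f_n| = f_n over [0, 1]."""
    return 1.0 / (n + 1) - 1.0 / (2 * n + 1)


def sup_norm(n, points=10_001):
    """Maximum of f_n over a uniform grid (a lower estimate of the sup norm)."""
    return max(f(n, k / (points - 1)) for k in range(points))


for n in (1, 5, 50):
    print(n, l1_norm(n), sup_norm(n))
```

The first column of values tends to 0 while the second stays close to 1/4, so the two norms cannot be equivalent on $C[0, 1]$.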
13.8.7
Spaces with an Inner Product (Pre-Hilbertian and Hilbert spaces)
Basics
13.606 Prove that none of the Banach spaces $c_0(\mathbb{N})$, $\ell_p(\mathbb{N})$ for $p \ge 1$, $p \ne 2$, or $C[0, 1]$ is linearly isomorphic to a Hilbert space.
Hint. Note that if two Banach spaces are linearly isomorphic, then so are their dual spaces and their bidual spaces. For $C[0, 1]$, it is enough to note that it is a separable Banach space whose dual is nonseparable (see Example 905). However, the dual space of a separable Hilbert space $H$ is separable, as it is linearly isometric to $H$ (see Theorem 993). For $c_0(\mathbb{N})$ (a separable Banach space), note that its bidual is $\ell_\infty(\mathbb{N})$ (this follows from Exercises 13.581 and 13.584). This space is not separable (see Example 586.2c). However, the bidual of a separable Hilbert space $H$ is separable (in fact, it is linearly isometric to $H$, see Theorem 993). Another approach to the same result appears in Remark 918.5. Regarding $\ell_p(\mathbb{N})$, $p > 1$, $p \ne 2$: should this space be isomorphic to a Hilbert space, its dual space (a space isometric to $\ell_q$ for $q$ such that $(1/p) + (1/q) = 1$, see, e.g., [FHHMZ11, Proposition 2.17]) would be isomorphic to $\ell_p$ itself. This is false by a theorem of Pitt (see, e.g., [FHHMZ11, Proposition 4.49]). The case $p = 1$ is treated in Exercise 13.585.
13.607 Show that in a separable Hilbert space $H$, the series $\sum_{i=1}^\infty \frac{1}{i} e_i$ is unconditionally but not absolutely convergent, where $\{e_i\}_{i=1}^\infty$ is an orthonormal basis.
Hint. The first statement is a consequence of the Riesz–Fischer Theorem ((v) in Theorem 977), since the sequence $\{1/i\}_{i=1}^\infty$ is in $\ell_2$. Note that every reordering of the orthonormal basis is still an orthonormal basis, and the result follows. See also Remark 979.1. Finally, use the fact that the harmonic series is divergent (Proposition 161). We remark that in any infinite-dimensional Banach space such a series can be found; this is a result of Dvoretzky and Rogers (see, e.g., [FHHMZ11, Theorem 13.38]).
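The contrast in Exercise 13.607 can be made concrete by comparing two partial sums. The sketch below assumes nothing beyond orthonormality of $\{e_i\}$; the function names are hypothetical. By orthonormality, $\|\sum_{i \le n} \frac{1}{i} e_i\|^2 = \sum_{i \le n} 1/i^2$, which stays below $\pi^2/6$, whereas $\sum_{i \le n} \|\frac{1}{i} e_i\| = \sum_{i \le n} 1/i$ is a partial sum of the harmonic series.

```python
import math


def partial_norm_squared(n):
    """||sum_{i<=n} (1/i) e_i||^2, which equals sum_{i<=n} 1/i^2."""
    return sum(1.0 / i ** 2 for i in range(1, n + 1))


def partial_sum_of_norms(n):
    """sum_{i<=n} ||(1/i) e_i||, a partial sum of the harmonic series."""
    return sum(1.0 / i for i in range(1, n + 1))


for n in (10, 1_000, 100_000):
    print(n, partial_norm_squared(n), partial_sum_of_norms(n))

print("bound pi^2/6 =", math.pi ** 2 / 6)
```

The first quantity is bounded (so the series converges in norm), while the second keeps growing like $\log n$.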
Extreme Points and Exposed Points in Hilbert Spaces
13.608 An exposed point $e$ (exposed by a functional $f \in S_{X^*}$) of a closed convex set $C$ of a Banach space $X$ is said to be strongly exposed if $\operatorname{diam}\{x \in C : f(x) > f(e) - \delta\} \to 0$ as $\delta \to 0$. Let $X$ be a Banach space isomorphic to a Hilbert space. Show that any compact convex set $C$ in $X$ has a strongly exposed point (hence an exposed point, and so an extreme point). Note that this was proved in a more general context in Exercise 13.593.
Hint. [Straszewicz] Note first that the result does not depend on the (equivalent) norm chosen on $X$. We may then assume that the space is a Hilbert space $H$. Take $x_0 \in H$ arbitrary. Let $x$ be a farthest point to $x_0$ in $C$ (its existence is guaranteed by
Fig. 13.102 The construction in Exercise 13.608
Fig. 13.103 A functional attains its supremum on $B_{\ell_1^2}$ at an extreme point (Exercise 13.610)
compactness, since the mapping $d : C \to \mathbb{R}$ given by $d(x) := \|x - x_0\|$ for $x \in C$ is continuous). Let $B$ be the closed ball centered at $x_0$ and having radius $d(x)$. Let $f$ be the (unique) supporting functional to $B$ at $x$ (see Corollary 988 and Fig. 13.102). Then, by Proposition 990, the functional $f$ strongly exposes $B$ at $x$ and thus, since $C \subset B$, it also strongly exposes $C$ at $x$.
13.609 Show that given real numbers $a$ and $b$, we have $\max\{ax + by : |x| + |y| = 1\} = \max\{|a|, |b|\}$.
Hint. Standard. Related to this, see Exercise 13.610.
13.610 Give a geometric interpretation of Exercise 13.609 in terms of the extreme points of the closed unit ball of $(\ell_1^2, \|\cdot\|_1)$.
Hint. Let $a$ and $b$ be two real numbers. Let $f(x, y) := ax + by$ for $(x, y) \in \ell_1^2$. We are asking for the maximum of $f$ on the closed unit ball $B$ of $(\ell_1^2, \|\cdot\|_1)$. Since $f$ is a continuous linear functional on $\ell_1^2$, it attains its supremum on $B$ at an extreme point of $B$ (see Remark 1087). It is simple to prove that the set of extreme points of $B$ is $\{(0, 1), (0, -1), (1, 0), (-1, 0)\}$. The result follows (see Fig. 13.103).
13.611 Let $\{e_i\}_{i=1}^\infty$ be the sequence of standard unit vectors in $\ell_2$. Show that $0 \in C$, where $C := \overline{\mathrm{conv}}\{e_i : i \in \mathbb{N}\}$. Show that $0$ is an extreme point of $C$ (for the concept of extreme point see Exercise 13.587). Show that there is no sequence $\{\lambda_i\}$ of nonnegative numbers with sum equal to 1 such that $0 = \sum_i \lambda_i e_i$.
Hint. For the first assertion, we shall provide two hints: (i) First, show that $\left\|\frac{1}{n}(e_1 + e_2 + \cdots + e_n)\right\|_2 = \frac{1}{\sqrt{n}}$. This proves that $0 \in C$. (ii) Alternatively, use Exercise 13.563. To prove that $0$ is an extreme point of $C$, note that all the coordinates of all points in $C$ are nonnegative. So, if $0 = \frac{1}{2}(x_1 + x_2)$ for $x_1, x_2 \in C$, then both $x_1$ and $x_2$ have all coordinates 0. If $0 = \sum_i \lambda_i e_i$, then the $i$th coordinate $\lambda_i$ of the sum is 0 for every $i$, which is incompatible with $\sum_i \lambda_i = 1$.
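The identity of Exercise 13.609, and its interpretation in Exercise 13.610, can be sanity-checked by brute force. The following sketch assumes a uniform parametrization of the unit sphere of $(\ell_1^2, \|\cdot\|_1)$; the function name and the step count are illustrative. The grid contains the four extreme points $(\pm 1, 0)$ and $(0, \pm 1)$, and the maximum is found there.

```python
def max_on_l1_sphere(a, b, steps=1000):
    """Maximum of a*x + b*y over grid points with |x| + |y| = 1."""
    best = float("-inf")
    for k in range(steps + 1):
        t = k / steps  # |x| = t and |y| = 1 - t
        for sx in (1, -1):
            for sy in (1, -1):
                best = max(best, a * sx * t + b * sy * (1 - t))
    return best


for a, b in [(3.0, -5.0), (-2.0, 0.5), (1.0, 1.0)]:
    print((a, b), max_on_l1_sphere(a, b), max(abs(a), abs(b)))
```

Each row prints the sampled maximum next to $\max\{|a|, |b|\}$; the two agree up to rounding.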
Differentiability in Hilbert Spaces
13.612 Show directly that the norm of a real Hilbert space $H$ is Fréchet differentiable away from 0, and compute its Fréchet derivative.
Hint. Prove first that the function $x \mapsto \|x\|^2$ on $H$ is Fréchet differentiable away from 0. For this, note that, if $x$ is a nonzero element in $H$, then $(x + th, x + th) - (x, x) = (x, th) + (th, x) + t^2 (h, h)$, hence
$$\left| \frac{\|x + th\|^2 - \|x\|^2}{t} - (2x, h) \right| = |2(x, h) + t(h, h) - 2(x, h)| = |t| \cdot \|h\|^2$$
for all $h \in H$ and $t \ne 0$. This shows the statement by taking limits as $t \to 0$ (the convergence is uniform for $h$ in the unit ball), getting in particular that the derivative of $\|\cdot\|^2$ at $x \ne 0$ is the continuous linear mapping $h \mapsto (2x, h)$. Now, for the function $x \mapsto \|x\|$, compose with the function $s \mapsto \sqrt{s}$, which is differentiable away from 0. By the chain rule, the Fréchet derivative of the norm at $x \ne 0$ is $h \mapsto (x, h)/\|x\|$.
13.613 Show that there is no equivalent rough norm on a Hilbert space.
Hint. The square of the norm of a Hilbert space is Fréchet differentiable away from the origin. This directly follows from the definition of Fréchet derivative and the properties of the inner product. Exercise 13.602 then gives a $C^1$-smooth bump on a Hilbert space, and the contradiction with the assumption on the existence of a rough norm is reached similarly to the argument in Exercise 13.601.
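The formula for the derivative of the norm in Exercise 13.612 can be compared with a difference quotient in the finite-dimensional Hilbert space $\mathbb{R}^3$. This is a sketch only: the vectors `x` and `h`, the step `t`, and the helper names are arbitrary choices made for illustration.

```python
import math


def inner(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(inner(u, u))


def difference_quotient(x, h, t):
    """(||x + t h|| - ||x||) / t."""
    shifted = [a + t * b for a, b in zip(x, h)]
    return (norm(shifted) - norm(x)) / t


def derivative_of_norm(x, h):
    """Value at h of the derivative of the norm at x != 0, namely (x, h) / ||x||."""
    return inner(x, h) / norm(x)


x = [1.0, 2.0, 2.0]   # hypothetical base point, ||x|| = 3
h = [0.3, -0.1, 0.5]  # hypothetical direction
print(difference_quotient(x, h, 1e-6), derivative_of_norm(x, h))
```

Both printed numbers are close to $1.1/3$, the value of $(x, h)/\|x\|$ for this data.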
13.8.8
Spectral Theory
13.614 Let the mapping $L$ (called the left shift operator) from $\ell_2$ into $\ell_2$ be defined by $L(x) = (x_2, x_3, \ldots)$ for $x = (x_1, x_2, x_3, \ldots) \in \ell_2$. Show that $L$ is a bounded linear operator. Find its norm and all its eigenvalues. Find its spectrum. Prove that $\sigma_c(L) = S_{\mathbb{C}}$ (so $\sigma_r(L) = \emptyset$).
Hint. Clearly, $\|L\| = 1$ (although $L$ is not an isometry, since $\|L e_1 - L 0\|_2 = 0$). Regarding eigenvalues, note that $L(1, \lambda, \lambda^2, \ldots) = (\lambda, \lambda^2, \ldots) = \lambda (1, \lambda, \lambda^2, \ldots)$, so $\lambda$ is an eigenvalue for each $|\lambda| < 1$. If $|\lambda| \ge 1$, then the eigenvalue equation $Lx = \lambda x$ has no nonzero solution in $\ell_2$, since every solution is a multiple of $(1, \lambda, \lambda^2, \ldots)$. This shows that $\sigma_p(L) = \{\lambda \in \mathbb{C} : |\lambda| < 1\}$. Since $\|L\| = 1$, we get, by Lemma 1008, that $\sigma(L) \subset B_{\mathbb{C}}$. Due to the fact that $\sigma(L)$ is compact (see Theorem 1009) and that it contains $\{\lambda \in \mathbb{C} : |\lambda| < 1\}$, it follows that $\sigma(L) = B_{\mathbb{C}}$. Given $\lambda \in S_{\mathbb{C}}$, observe that $(\lambda I - L) e_1 = \lambda e_1$, and that $(\lambda I - L) e_n = \lambda e_n - e_{n-1}$ for $n \ge 2$. It follows that $\lambda^n e_n$ belongs to the range of $\lambda I - L$ for all $n \in \mathbb{N}$. Thus, this range is dense in $\ell_2$. It cannot be the whole of $\ell_2$, since in this case $\lambda \in \rho(L)$, violating that $\sigma(L) = B_{\mathbb{C}}$. This proves that $\sigma_c(L) = S_{\mathbb{C}}$ (and thus $\sigma_r(L) = \emptyset$).
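The eigenvector computation in the hint to Exercise 13.614 can be illustrated on truncated sequences. The sketch below is an illustration under assumptions, not a proof: sequences are modelled as finite lists with an implicit zero tail, and the value of `lam` and the length `N` are arbitrary. For the truncation $v_N = (1, \lambda, \ldots, \lambda^{N-1}, 0, \ldots)$ one has $\|L v_N - \lambda v_N\|_2 = |\lambda|^N$, which tends to 0 exactly when $|\lambda| < 1$.

```python
import math


def left_shift(x):
    """Left shift on a finitely supported sequence (list with zero tail)."""
    return x[1:] + [0.0]


def residual(lam, N):
    """||L v - lam v||_2 for v = (1, lam, ..., lam^(N-1), 0, 0, ...)."""
    v = [lam ** k for k in range(N)]
    Lv = left_shift(v)
    return math.sqrt(sum(abs(a - lam * b) ** 2 for a, b in zip(Lv, v)))


lam = 0.5 + 0.3j  # hypothetical eigenvalue with |lam| < 1
for N in (5, 20, 40):
    print(N, residual(lam, N), abs(lam) ** N)
```

The two printed columns coincide up to rounding, which reflects that $(1, \lambda, \lambda^2, \ldots)$ is an eigenvector in $\ell_2$ when $|\lambda| < 1$.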
13.615 Find the sets $\sigma_p(D)$ and $\sigma(D)$ for a diagonal operator $D$ on $\ell_2$, i.e., for an operator $D(x_i) = (c_i x_i)$, where $\{c_i\}$ is some bounded sequence of scalars. Calculate the norm $\|D\|$. For simplicity, assume that $c_i \ne 0$ for each $i \in \mathbb{N}$.
Hint. Put $C := \sup\{|c_i| : i \in \mathbb{N}\}$. Obviously, $\|D\| \le C$. On the other hand, given $\varepsilon > 0$ find $i \in \mathbb{N}$ such that $|c_i| > C - \varepsilon$. Note that $\|D e_i\|_2 = \|c_i e_i\|_2 = |c_i| > C - \varepsilon$. This shows, since $\varepsilon > 0$ is arbitrary, that $\|D\| = C$. Note that $D e_n = c_n e_n$ for all $n \in \mathbb{N}$. This shows that $c_n \in \sigma_p(D)$. On the other hand, if $\lambda \ne c_n$ for each $n \in \mathbb{N}$, and $Dx = \lambda x$, we get $c_n x_n = \lambda x_n$ for all $n \in \mathbb{N}$, hence $x_n = 0$ for all $n \in \mathbb{N}$, so $x = 0$. This shows that $\sigma_p(D) = \{c_n : n \in \mathbb{N}\}$. We claim that $\sigma(D) = \overline{\sigma_p(D)}$. Indeed, since $\sigma(D)$ is closed (see Theorem 1009) we have $\overline{\sigma_p(D)} \subset \sigma(D)$. On the other hand, if $\lambda \notin \overline{\sigma_p(D)}$, the distance $d$ from $\lambda$ to $\sigma_p(D)$ is strictly positive, i.e., $|\lambda - c_n| \ge d > 0$ for all $n \in \mathbb{N}$. Given $z = (z_n) \in \ell_2$, the vector $x = (x_n)$ such that $x_n := (\lambda - c_n)^{-1} z_n$ for all $n \in \mathbb{N}$ clearly belongs to $\ell_2$ and $(\lambda I - D) x = z$. This shows that $\lambda I - D$ is onto (and one-to-one, from the first part of the proof), hence an isomorphism by Theorem 953, so $\lambda \in \rho(D)$.
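The coordinatewise inversion in the hint to Exercise 13.615 is easy to try on a truncated example. The following sketch assumes the hypothetical diagonal $c_n = 1/n$ (cut off at 200 terms) and the point $\lambda = -1/2$; all names are illustrative. Here $\|D\| = \sup |c_n| = 1$, the resolvent at $\lambda$ is bounded by $1/d$ with $d = \operatorname{dist}(\lambda, \{c_n\}) \ge 1/2$, while at the limit point $0$ of $\{c_n\}$ the coordinatewise inverses are unbounded.

```python
def resolvent_apply(c, lam, z):
    """Solve (lam*I - D) x = z coordinatewise: x_n = z_n / (lam - c_n)."""
    return [zi / (lam - ci) for ci, zi in zip(c, z)]


c = [1.0 / n for n in range(1, 201)]       # hypothetical diagonal c_n = 1/n
z = [1.0 / n for n in range(1, 201)]       # hypothetical right-hand side
norm_D = max(abs(ci) for ci in c)          # sup |c_n|

lam = -0.5                                 # a point at distance >= 1/2 from {c_n}
x = resolvent_apply(c, lam, z)
residual = max(abs(lam * xi - ci * xi - zi) for ci, xi, zi in zip(c, x, z))
resolvent_bound = max(1.0 / abs(lam - ci) for ci in c)

# At lam = 0 (a limit point of {c_n}) the inverse multipliers grow without bound.
blowup = max(1.0 / abs(ci) for ci in c)

print(norm_D, residual, resolvent_bound, blowup)
```

Increasing the truncation length leaves `resolvent_bound` below 2 but makes `blowup` arbitrarily large, which matches the claim that $0$ belongs to $\sigma(D)$ for this diagonal although it is not an eigenvalue.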
13.8.9
Pointwise Topology and Product Spaces
13.616 Let $\{x^n\}_{n=1}^\infty$ be a bounded sequence in $c_0$ such that $x_i^n \to 0$ as $n \to \infty$ for each $i \in \mathbb{N}$. Is it true that $f(x^n) \to 0$ for each $f \in c_0^*$? Does the same hold true for $\ell_2$ and for $\ell_1$ instead of $c_0$?
Hint. True for $c_0$: Note that $c_0^* = \ell_1$ (see Exercise 13.8.6). We may assume $\|x^n\|_\infty \le 1$ for all $n \in \mathbb{N}$. Given $f \in c_0^*$ such that $\|f\|_1 = 1$ and $\varepsilon > 0$, get $i_0$ such that $\sum_{i=i_0+1}^\infty |f_i| < \varepsilon$. Let $n_0$ be such that $|x_i^n| < \varepsilon / i_0$ for all $n \ge n_0$ and all $i \le i_0$. Then for $n \ge n_0$ we have
$$|f(x^n)| = \Big| \sum_{i=1}^\infty f_i x_i^n \Big| \le \sum_{i \le i_0} |f_i x_i^n| + \sum_{i \ge i_0+1} |f_i x_i^n| \le \varepsilon + \varepsilon = 2\varepsilon.$$
True for $\ell_2$, by a similar proof. Not true for $\ell_1$: take $x^n = (0, \ldots, 1, 0, \ldots)$, where the 1 is in the $n$th position. This exercise illustrates an important result known as Rainwater's theorem: A bounded sequence in a Banach space $X$ that converges to an element $x$ pointwise on the set of all extreme points of $B_{X^*}$ is weakly convergent to $x$. See, e.g., [FHHMZ11, Theorem 3.134]. Observe that the set of all extreme points of $B_{c_0^*}$ is $\{\pm e_n : n \in \mathbb{N}\}$, where $e_n$ is the $n$th canonical unit vector in $c_0^*$ ($= \ell_1$), and the set of all extreme points of $B_{\ell_1^*}$ is the set of all vectors in $\ell_1^*$ ($= \ell_\infty$) whose coordinates are $\pm 1$.
13.617 Let $X$ be a Hilbert space, $x \in S_X$, and let $\{x_n\}$ be a sequence in $S_X$ such that $f(x_n) \to f(x)$ for every $f \in X^*$. Show that $\|x_n - x\| \to 0$. Show that such a statement does not hold true for the space $c_0$.
Hint. Choose $f \in S_{X^*}$ such that $f(x) = 1$ (see Corollary 931). Then $2 \ge \|x_n + x\| \ge |f(x_n) + f(x)| \to 2$. Recall the parallelogram equality (11.25): $\|x_n + x\|^2 + \|x - x_n\|^2 = 2\|x_n\|^2 + 2\|x\|^2$. Its right-hand side is equal to 4, while $\|x_n + x\|^2 \to 4$. Thus $\|x_n - x\| \to 0$. In the space $c_0$, put $x = (1, 0, \ldots)$, $x_n = (1, 0, \ldots, 1, 0, \ldots)$, the second 1 in the $n$th position. Then $\|x_n - x\| = 1$ for each $n \ge 2$ and $x_n \to x$ in each coordinate. Thus, by Exercise 13.616, $f(x_n) \to f(x)$ for each $f \in c_0^*$. This exercise illustrates a property common to Banach spaces where the norm is locally uniformly rotund, i.e., $x_n \to x$ if $\{x_n\}$ is a sequence in $S_X$, $x$ a point in $S_X$, and $\|x + x_n\| \to 2$. It is an important result (due to Kadets, see, e.g., [FHHMZ11, Thm. 8.1]) that every separable Banach space has a locally uniformly rotund equivalent norm.
13.618 In finite-dimensional spaces, a sequence $\{x_n\}_{n=1}^\infty$ converges to a point $x$ if and only if $x_n$ converges to $x$ pointwise on $X^*$, i.e., in the $w$-topology. Show that if $X$ is an infinite-dimensional Banach space and $\{f_i\}_{i=1}^\infty$ is any sequence of points of the dual space $X^*$, then there is a sequence $\{x_n\}_{n=1}^\infty$ of norm-one elements of $X$ such that $f_j(x_n) \to 0$ for every $j \in \mathbb{N}$. Compare this with the sequence $\{e_n\}$ of the standard unit vectors in $\ell_2$.
Hint. First note that, by linear algebra, the codimension of the intersection $A_n$ of the null spaces of $f_1, f_2, \ldots, f_n$ is at most $n$, so $A_n \ne \{0\}$ because $X$ is infinite-dimensional. So, pick $x_n \in A_n$ with $\|x_n\| = 1$. Keep going. You get a sequence $\{x_n\}_{n=1}^\infty$ of norm-one elements of $X$ such that $f_j(x_n) = 0$ for all $n \ge j$, hence $f_j(x_n) \to 0$ for each $j$.
13.619 Let $X$ be a Banach space. Let a bounded sequence $\{f_n\}_{n=1}^\infty$ in $X^*$ converge to $f \in X^*$ on a linearly dense subset $A$ of $X$. Then $\{f_n\}_{n=1}^\infty$ converges pointwise (i.e., at every $x \in X$) to $f$. Compare with Theorem 919.
Hint. Without loss of generality, we may assume that $\|f_n\| \le 1$ for all $n \in \mathbb{N}$. It is enough to show that if $x_j \to x$ in $X$ and $f_n(x_j) \to 0$ as $n \to \infty$ for each $j$, then $f_n(x) \to 0$. For this, let $\varepsilon > 0$ be given. Choose $j_0$ so that $\|x_{j_0} - x\| < \varepsilon$. Pick $n_0$ so that $|f_n(x_{j_0})| < \varepsilon$ for all $n \ge n_0$. Then for all $n \ge n_0$ we have $|f_n(x)| \le |f_n(x - x_{j_0})| + |f_n(x_{j_0})| \le \|x - x_{j_0}\| + |f_n(x_{j_0})| < 2\varepsilon$.
13.620 Show that $B_{c_0}$ is not compact in the metric of pointwise convergence.
Hint. Let $x^n \in B_{c_0}$ be defined by $x_i^n = 1$ if $i \le n$ and 0 elsewhere. Then $x^n \to (1, 1, \ldots)$ pointwise and every subsequence has the same property. However, $(1, 1, \ldots) \notin c_0$. Note that in the space $\ell_1$, the standard unit vectors $\{e_i\}$ all have norm 1 and they converge to 0 pointwise. However, $f(e_i) = 1$ for all $i$ for $f = (1, 1, \ldots) \in \ell_1^* = \ell_\infty$. Actually, there is no sequence $\{x_i\}$ of norm-one elements in $\ell_1$ such that $f(x_i) \to 0$ for all $f \in \ell_1^*$. This is the well-known theorem of Schur (see, e.g., [FHHMZ11, Theorem 5.36]). It was also mentioned in Exercise 13.563.
13.8.10
Periodic Distributions
13.621 We introduced the convolution F ∗ G of two periodic distributions F and G in Definition 1060. Prove that this definition agrees with the classical definition (see
Exercise 13.227) of the convolution $f * g$ of two continuous real-valued $2\pi$-periodic functions $f$ and $g$ on $\mathbb{R}$.
Hint. Assume that $f$ and $g$ are real-valued continuous $2\pi$-periodic functions on $\mathbb{R}$. In order to prove that $F_f * F_g = F_{f * g}$ (see Example 1053.1), it will be enough to show that $(F_f(e_n) \cdot F_g(e_n) =)\ (F_f * F_g)(e_n) = F_{f * g}(e_n)$ for all $n \in \mathbb{Z}$, where $e_n(t) = e^{-int}$ for $n \in \mathbb{Z}$. We have
$$\begin{aligned}
F_{f * g}(e_n) &= \frac{1}{2\pi} \int_0^{2\pi} (f * g)(t)\, e^{-int}\, dt \\
&= \frac{1}{2\pi} \int_0^{2\pi} \left( \frac{1}{2\pi} \int_0^{2\pi} f(s) g(t - s)\, ds \right) e^{-int}\, dt \\
&= \frac{1}{2\pi} \int_0^{2\pi} \left( \frac{1}{2\pi} \int_0^{2\pi} f(s) g(t - s)\, e^{-int}\, dt \right) ds \\
&= \frac{1}{2\pi} \int_0^{2\pi} f(s)\, e^{-ins} \left( \frac{1}{2\pi} \int_0^{2\pi} g(t - s)\, e^{-in(t - s)}\, dt \right) ds \\
&= \left( \frac{1}{2\pi} \int_0^{2\pi} f(s)\, e^{-ins}\, ds \right) \left( \frac{1}{2\pi} \int_0^{2\pi} g(t)\, e^{-int}\, dt \right) = F_f(e_n) \cdot F_g(e_n),
\end{aligned}$$
where the third equality is due to Corollary 864, and the last line follows because both $g$ and $e^{-int}$ are $2\pi$-periodic functions.
13.622 Solve (i.e., prove existence and uniqueness), by a method similar to the one used in 11.8.6, the following problem (the Wave Equation): Given $G$ and $H$ in PD, find a family $\{F_t\}_{t > 0}$ in PD such that, simultaneously,
(i) there exists $\frac{d}{dt} F_t$ for each $t > 0$,
(ii) there exists $\frac{d^2}{dt^2} F_t = D^2 F_t$ for each $t > 0$,
(iii) $F_t \xrightarrow{\mathrm{PD}} G$, as $t \downarrow 0$, and
(iv) $\frac{d}{dt} F_t \xrightarrow{\mathrm{PD}} H$, as $t \downarrow 0$.
Hint. Let $G \sim \sum_{n \in \mathbb{Z}} b_n e_n$ and $H \sim \sum_{n \in \mathbb{Z}} c_n e_n$. The ordinary differential equation to be solved for $n \in \mathbb{Z}$ is $D^2 a_n(t) = -n^2 a_n(t)$ for $t > 0$, together with the initial conditions $a_n(0) = b_n$ and $D a_n(0) = c_n$ for $n \in \mathbb{Z}$. This gives $F_t \sim \sum_{n \in \mathbb{Z}} a_n(t) e_n$ for $t > 0$, where $a_n(t) = b_n \cos nt + \frac{1}{n} c_n \sin nt$ for $n \ne 0$, and $a_0(t) = b_0 + c_0 t$, for $n \in \mathbb{Z}$ and $t > 0$.
13.623 Solve the Dirichlet problem of finding a function $u$ on the closed unit disc $D$ in $\mathbb{R}^2$ that satisfies the Laplace equation $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$ and that coincides with a given function $g$ on the boundary of $D$. In the language of distributions, and passing to polar coordinates, it can be formulated and solved in a more general way, leading to the following result: Given $G \in$ PD, there exists a unique function $u(r, \theta)$ defined
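and infinitely differentiable on $[0, 1) \times \mathbb{R}$ such that, if $u_r(\theta) := u(r, \theta)$,
$$\begin{cases} r^2 \dfrac{\partial^2 u}{\partial r^2} + r \dfrac{\partial u}{\partial r} + \dfrac{\partial^2 u}{\partial \theta^2} = 0, & 0 \le r < 1,\ \theta \in \mathbb{R}, \\ u_r \to G \text{ as } r \uparrow 1, & \text{in the sense of distributions.} \end{cases}$$
Prove that $u_r = G * P_r$, for $r \in [0, 1)$, where $P_r$ is the Poisson kernel, i.e., $P_r \in T$, $P_r(\theta) := \sum_{n \in \mathbb{Z}} r^{|n|} e^{in\theta}$ for $\theta \in \mathbb{R}$. Moreover, if $G = F_g$ for some $g \in CP[0, 2\pi]$, then $u_r \to g$ uniformly as $r \uparrow 1$.
Hint. Let $G \sim \sum_{n \in \mathbb{Z}} b_n e_n$. For uniqueness, let $F_r \sim \sum_{n \in \mathbb{Z}} a_n(r) e_n$ (where $e_n := e^{in\theta}$ for all $n \in \mathbb{Z}$). The partial differential equation leads to $r^2 D^2 a_n + r D a_n - n^2 a_n = 0$ for $0 < r < 1$, and the boundary condition to $a_n(r) \to b_n$ as $r \uparrow 1$ for $n \in \mathbb{Z}$. Moreover, $a_n(0) = 0$ for $n \ne 0$. For $n \in \mathbb{Z}$, $n \ne 0$, we find $a_n(r) := b_n r^{|n|}$, and $a_0(r) := b_0$. For existence, put $u_r := G * P_r$, $0 \le r < 1$. Observe that {Pr }0≤r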