UNITEXT 125
Stefano Gentili
Measure, Integration and a Primer on Probability Theory Volume 1
UNITEXT - La Matematica per il 3+2 Volume 125
Editor-in-Chief Alfio Quarteroni, Politecnico di Milano, Milan, Italy; EPFL, Lausanne, Switzerland Series Editors Luigi Ambrosio, Scuola Normale Superiore, Pisa, Italy Paolo Biscari, Politecnico di Milano, Milan, Italy Ciro Ciliberto, Università di Roma “Tor Vergata”, Rome, Italy Camillo De Lellis, Institute for Advanced Study, Princeton, NJ, USA Massimiliano Gubinelli, Hausdorff Center for Mathematics, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany Victor Panaretos, Institute of Mathematics, EPFL, Lausanne, Switzerland
The UNITEXT - La Matematica per il 3+2 series is designed for undergraduate and graduate academic courses, and also includes advanced textbooks at a research level. Originally released in Italian, the series now publishes textbooks in English addressed to students in mathematics worldwide. Some of the most successful books in the series have evolved through several editions, adapting to the evolution of teaching curricula. Submissions must include at least 3 sample chapters, a table of contents, and a preface outlining the aims and scope of the book, how the book fits in with the current literature, and which courses the book is suitable for. For any further information, please contact the Editor at Springer: francesca. [email protected] THE SERIES IS INDEXED IN SCOPUS
More information about this series at http://www.springer.com/subseries/5418
Stefano Gentili
Funzionario Dirigente P.A.
Tolentino, Macerata, Italy

Translated by
Simon G. Chiossi
Departamento de Matemática Aplicada
Universidade Federal Fluminense
Niterói, Rio de Janeiro, Brazil
ISSN 2038-5714 ISSN 2532-3318 (electronic) UNITEXT - La Matematica per il 3+2 ISSN 2038-5722 ISSN 2038-5757 (electronic) ISBN 978-3-030-54939-8 ISBN 978-3-030-54940-4 (eBook) https://doi.org/10.1007/978-3-030-54940-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Cover illustration: Construction of the Riemann integral of a function. © KBRUSH Agenzia Pubblicitaria. Reproduced with permission This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
“We may forgive a child that is afraid of the darkness. The real tragedy of life is when a man fears the light.” Plato

“Classes and concepts may, however, also be conceived as real objects, namely classes as “pluralities of things” or as structures consisting of a plurality of things, and concepts as the properties and relations of things existing independently of our definitions and constructions. It seems to me that the assumption of such objects is quite as legitimate as the assumption of physical bodies and there is quite as much reason to believe in their existence. They are in the same sense necessary to obtain a satisfactory system of mathematics as physical bodies are necessary for a satisfactory theory of our sense perceptions and in both cases it is impossible to interpret the propositions one wants to assert about these entities as propositions about the “data”, i.e., in the latter case the actually occurring sense perceptions. (1944)”¹
Kurt Gödel
This text presents, starting from simple notions, advanced topics in real analysis, measure theory and integration. The various subjects are introduced first in an informal, discursive manner, and then discussed from a technical and scientific point of view. The thorough and complete treatment thus provides the reader with an extensive and well-founded picture of the more complex, interesting and stimulating concepts of mathematical analysis.
¹ “Russell’s mathematical logic”, in: A. D. Irvine (ed), Bertrand Russell—Critical assessments, Volume II: logic and mathematics, Routledge, London-New York, 1999, p. 120.
The text’s approach, albeit prevailingly theoretical, leads up to the study of applied subjects such as probability, statistics and other areas of applied mathematics (including classical analysis and the dynamics of financial markets), plus many more. This aspect will become particularly relevant in Volume Two, which discusses, besides Fourier transforms, topics like characteristic functions and the various themes related to convergence, including the laws of large numbers and the central limit theorem. All of this clearly attests to the natural relationship between the notions elaborated in analysis and the possible applications of such theory to the solution of practical problems.

A second feature of the book, which I believe makes it more accessible to novices, is the numerous references to the historical-scientific context in which the topics developed. These tell of the often misguided and error-laden, yet ingenious, attempts made by the scholars who contributed to the birth of novel branches of mathematical analysis, while at the same time leading to a revision and deep reorganisation of the entire subject. Notions that earlier appeared to be rock-solid truths, part of a system of attuned laws without odd behaviours, were called into question by the critical investigations begun in the 1800s. The ensuing virtuous revision of several fields has made mathematics more rigorous as a whole, and has laid the foundations of the contemporary landscape’s various facets.

The focus of this book is the Lebesgue integral, created and studied by the French mathematician of the same name at the beginning of the 1900s in order to overcome the limitations of the Riemann integral, which hindered a satisfactory development of Fourier’s theory. In due course it became manifest that a reasonably general theory of integration, capable of outdoing Riemann’s theory, could only be attained by simultaneously strengthening the understanding of the measure of an object. In turn, this prompted the development of a full theory of measure able to interpret new aspects and conceptual differences that would otherwise not be detectable. Throughout the treatise a number of important contributions are presented, made by the likes of Lejeune Dirichlet, Émile Borel, Ulisse Dini, Giuseppe Peano, Georg Cantor, Camille Jordan, Giuseppe Vitali, Vito Volterra, Constantin Carathéodory, René-Louis Baire, Felix Hausdorff, Heinrich Tietze, Wacław Sierpiński, and Kazimierz Kuratowski, to name a few.

The book is meant for STEM and Economics students with a background in real analysis in one variable. Notwithstanding the rather elementary inception, the text allows the reader to learn the subject’s tools and understand its sophisticated techniques. It also introduces the reader to the more elaborate topics of contemporary mathematics as we know them today. Needless to say, the contents are the result of the work of several scholars, which the author has only strived to reproduce in order to contextualise, logically and historically, a rather profound subject. Due merits therefore belong exclusively to all those mathematicians who contributed to the development of the subject treated here.
Finally, my heartfelt thanks go to Francesca Bonadei and Francesca Ferrari at Springer Italy, who with tremendous professionalism shepherded the author to the final publication. Special thanks to Simon G. Chiossi (Fluminense Federal University—UFF) who, besides translating the book from Italian with unparalleled expertise and competence, called my attention to a few points that were eventually fleshed out. The author clearly remains solely responsible for the contents.

Tolentino, Italy
April 2019
Stefano Gentili
Contents

Part I  Sets

1 Round-Up of Topology  3
  1.1 Topology, Neighbourhoods, Closure, Interior and Frontier  3
  1.2 Limit Points and Isolated Points  10

2 Types of Sets  13
  2.1 Derived Set, Set of Isolated Points and Boundary  13
  2.2 Nowhere Dense, Dense and Perfect Sets  24
  2.3 Types of Sets in Metric Spaces  29
  2.4 Meagre and Residual Sets  48

Part II  Borel Sets and Baire Functions on R

3 Borel Sets in R  55
  3.1 Basics on Borel Sets  55
  3.2 Borel Fσ Sets and Gδ Sets in R  58
  3.3 Discontinuity Set of Real Functions over [a, b]  63
  3.4 Additive and Multiplicative Families of Sets  68

4 Baire Functions on R  85
  4.1 Introduction  85
  4.2 Baire Functions on R  89
  4.3 Real Functions over [a, b] in Class B1  96
  4.4 Cardinality of Classes of Baire Functions  102

5 Borel Functions and Baire Functions  105
  5.1 The Lebesgue–Hausdorff Theorem on R  105

Part III  Families of Sets

6 Semi-algebra and Algebra of Sets  121
  6.1 Semi-algebra of Subsets of a Set X  122
  6.2 Algebra of Sets Generated by a Semi-algebra  123

7 Monotone Classes and σ-Algebras  131
  7.1 Monotone Classes and Generation of σ-Algebras  131
  7.2 The Borel σ-Algebra and Restricted σ-Algebras  142

Part IV  Measure Theory

8 Set Functions and Measure  167
  8.1 Set Functions and Finitely Additive Measures  167
  8.2 Completely Additive Set Functions  185

9 The Lebesgue Measure  197
  9.1 The Outer Measure  197
  9.2 Carathéodory’s Criterion of Measurability  201
  9.3 Uniqueness of Extension m and Measure-Space Completion  207
  9.4 The Lebesgue Measure on R  218
  9.5 Cardinality of L(R) and B(R)  228
  9.6 The Lebesgue-Stieltjes Measure  234

10 Measurable Functions  241
  10.1 Introduction  241
  10.2 R-valued Measurable Functions  242
  10.3 Pointwise Convergence A.e. and Almost Uniform Convergence  256
  10.4 Approximating Measurable Functions  263
  10.5 The Vitali–Cantor Map  270
  10.6 Rudiments of the Theory of Random Variables  274

11 The Lebesgue Integral  285
  11.1 The Integral of Non-negative, Simple Measurable Maps  285
  11.2 The Integral of Non-negative Maps  290
  11.3 The Integral of Arbitrary Maps  303

Part V  Theory of Integration

12 Comparing Notions of Integral  323
  12.1 Comparing the Riemann and Lebesgue Integrals  323
  12.2 Completeness of L¹([a, b])  326
  12.3 Improper Integrals over Bounded Domains and the Lebesgue Integral  338
  12.4 Improper Integrals over Unbounded Domains and the Lebesgue Integral  344

Part VI  Fundamental Theorems of Integral Calculus on R

13 Functions with Bounded Variation and Absolutely Continuous Functions  355
  13.1 Functions with Bounded Variation  355
  13.2 Positive and Negative Total Variations  363
  13.3 Total Variations as Functions  365
  13.4 Differentiability of Monotone Functions  367
  13.5 Differentiability of Functions with Bounded Variation  375
  13.6 Absolutely Continuous Functions  376

14 Fundamental Theorems of Calculus for the Lebesgue Integral  383
  14.1 The 1st and 2nd Fundamental Theorems of Integral Calculus  383
  14.2 Integration by Parts and by Substitution  392
  14.3 Foundations of the Theory of Discrete and Continuous Random Variables  398
  14.4 A Miscellany of Discrete and Continuous Distributions  411

Appendix A: Compact and Totally Bounded Metric Spaces  429
Appendix B: Urysohn’s Lemma and Tietze’s Theorem  439
List of Figures  447
List of Symbols  449
References  455
Index  457
Part I
Sets
Chapter 1
Round-Up of Topology
1.1 Topology, Neighbourhoods, Closure, Interior and Frontier

The concept of a topological space took shape in the mid-1800s with the study of R and, more generally, Euclidean space, and the properties of continuous maps on such spaces. Associated to the notion of topological space is the primitive concept of open set, from which there descend neighbourhoods, closed sets, the closure, the interior, the frontier, limit points and isolated points, derived sets etc. The term “set”, in particular, is used in mathematics to indicate a collection or family of objects, and is therefore a primitive notion. The objects that form a set are called its elements, and for a set to be well defined, i.e. for it to have a meaning in mathematics, we must be able to decide without ambiguity whether an element belongs to it or not.

Definition 1.1.1 (Topology and topological spaces) A topology on a non-empty set X is a family T of subsets of X satisfying the following properties:
1. ∅ and X belong to T ;
2. arbitrary unions of elements of T belong to T ;
3. finite intersections of elements of T belong to T .
The set X is called space, while the pair (X, T ) is called topological space. The sets belonging to T are called open sets of the topological space (X, T ). ♦

Definition 1.1.2 (Closed sets) Given a topological space (X, T ), a subset F ⊆ X is said to be closed in X in the topology T , or in (X, T ), if X \ F is an open set of the topology T . ♦

The following fact holds:

Proposition 1.1.1 (Properties of closed sets in a topological space) Given a topological space (X, T ):
1. ∅ and X are closed sets;
2. arbitrary intersections of closed sets in (X, T ) are closed in (X, T );
3. finite unions of closed sets in (X, T ) are closed in (X, T ).

Proof
1. ∅ and X are closed sets, for their respective complements are the open sets X and ∅.
2. Given an arbitrary collection {Fα}_{α∈I} of closed sets in (X, T ), by De Morgan’s laws we have
   X \ ⋂_{α∈I} Fα = ⋃_{α∈I} (X \ Fα).
   As the X \ Fα are open sets, the expression on the right-hand side is an open set, so ⋂_{α∈I} Fα is closed.
3. Similarly, if Fi, for i = 1, . . . , n, are closed in (X, T ), still by De Morgan’s laws
   X \ ⋃_{i=1}^{n} Fi = ⋂_{i=1}^{n} (X \ Fi),
   where the term on the right is open, and hence ⋃_{i=1}^{n} Fi is closed in (X, T ).
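For a concrete check of Definition 1.1.1 and Proposition 1.1.1 in the finite case, here is a minimal Python sketch; the space X = {1, 2, 3} and the particular family T are arbitrary illustrative choices, not taken from the text.

```python
X = frozenset({1, 2, 3})
# A hand-picked topology on X (Definition 1.1.1): the open sets.
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def is_topology(X, T):
    # 1. The empty set and X belong to T.
    if frozenset() not in T or X not in T:
        return False
    # 2./3. For a finite family, closure under pairwise unions and intersections
    # already gives closure under arbitrary unions and finite intersections.
    return all(A | B in T and A & B in T for A in T for B in T)

# Closed sets (Definition 1.1.2): complements of open sets.
closed_sets = {X - G for G in T}

print(is_topology(X, T))          # True
print(closed_sets)                # X, {2, 3}, {3} and the empty set
# Proposition 1.1.1: intersections and unions of closed sets are again closed.
print(all(F1 & F2 in closed_sets and F1 | F2 in closed_sets
          for F1 in closed_sets for F2 in closed_sets))   # True
```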
We have seen that X and ∅ are simultaneously open and closed in every topological space. But there exist topological spaces admitting a larger selection of both open and closed sets. From now onwards, unless otherwise specified, we shall assume the general topological space (X, T ) given, and we shall denote it simply by X.

Definition 1.1.3 (Neighbourhoods) Given a set A ⊆ X, we call neighbourhood of x ∈ A any open set Ux ⊆ A such that x ∈ Ux. ♦

From the above definition it follows that if A is an open set, we may take Ux = A, so we may also say that A is open in T if A = ⋃_{x∈A} Ux for suitable neighbourhoods Ux of its points (property 2. in the definition of topology).

Definition 1.1.4 (Closure of a set) Let A ⊆ X be a subset and consider the collection F_A = {F} of all closed sets F containing A. We call closure of A the set Ā = ⋂_{F∈F_A} F. ♦

In practice the closure of A is the smallest closed set containing A, and a set A is closed if and only if it coincides with its closure. Note furthermore that the collection of all closed sets containing A is not empty, since the whole space X, which is closed, contains A.

Proposition 1.1.2 (Properties of the closure operator) Given subsets A, B in the space X, the closure operator enjoys the following properties:
1. ∅̄ = ∅ and X̄ = X;
2. A ⊆ Ā;
3. A ⊆ B ⇒ Ā ⊆ B̄;
4. \overline{A ∪ B} = Ā ∪ B̄;
5. \overline{Ā} = Ā;
6. \overline{A ∩ B} ⊆ Ā ∩ B̄;
7. ⋃_{α∈I} Āα ⊆ \overline{⋃_{α∈I} Aα}, where {Aα}_{α∈I} is an arbitrary family of subsets in X.
Proof
1. ∅ is both open and closed, as is the space X.
2. This is a direct consequence of the definition of closure.
3. Every closed set containing B also contains A, so F_B ⊆ F_A, whence Ā = ⋂_{F∈F_A} F ⊆ ⋂_{F∈F_B} F = B̄.
4. Since A ⊆ A ∪ B, then Ā ⊆ \overline{A ∪ B}, and similarly B̄ ⊆ \overline{A ∪ B}, so Ā ∪ B̄ ⊆ \overline{A ∪ B}. Conversely, since A ⊆ Ā and B ⊆ B̄, then A ∪ B ⊆ Ā ∪ B̄. The latter set is closed, being the union of two closed sets, so by definition of closure
   \overline{A ∪ B} ⊆ Ā ∪ B̄,
   whence the equality.
5. The closure of Ā is the smallest closed set containing Ā, so Ā ⊆ \overline{Ā}. Moreover Ā is closed and contains Ā, so \overline{Ā} ⊆ Ā, and the equality follows.
6. We know \overline{A ∩ B} is the smallest closed set containing A ∩ B. Since A ⊆ Ā and B ⊆ B̄, then A ∩ B ⊆ Ā ∩ B̄. The latter is closed and contains A ∩ B, hence it contains the smallest closed set \overline{A ∩ B} containing A ∩ B. Therefore
   \overline{A ∩ B} ⊆ Ā ∩ B̄.
7. For any J ⊆ I, let B_J = ⋃_{α∈J} Aα. Then Aα ⊆ B_J for every α ∈ J, so by item 3.
   Āα ⊆ \overline{B_J} = \overline{⋃_{α∈J} Aα} for every α ∈ J,
   whence ⋃_{α∈J} Āα ⊆ \overline{⋃_{α∈J} Aα}. As this holds for every J ⊆ I, and in particular for J = I, the claim follows.
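The closure operator can also be computed mechanically in a finite topological space, which gives a quick sanity check of identities such as items 4. and 6. above, and of the relation between the interior and the closure of the complement discussed in the next propositions. A minimal Python sketch, with an arbitrarily chosen finite topology:

```python
from functools import reduce

X = frozenset({1, 2, 3, 4})
# An illustrative topology on X: the open sets.
T = {frozenset(), frozenset({1}), frozenset({2, 3}), frozenset({1, 2, 3}), X}
closed_sets = {X - G for G in T}

def closure(A):
    # Definition 1.1.4: intersection of all closed sets containing A.
    return reduce(lambda F1, F2: F1 & F2, [F for F in closed_sets if A <= F])

def interior(A):
    # Definition 1.1.5: union of all open sets contained in A.
    return reduce(lambda G1, G2: G1 | G2, [G for G in T if G <= A], frozenset())

A, B = frozenset({1, 2}), frozenset({3, 4})
print(closure(A), interior(A))
# Item 4. of Proposition 1.1.2, and interior as complement of the closure of the complement:
print(closure(A | B) == closure(A) | closure(B))    # True
print(interior(A) == X - closure(X - A))            # True
```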
Definition 1.1.5 (Interior of a set) Take a set A ⊆ X and the family G_A = {G} of all open sets G contained in A. One calls interior of A the set Å = ⋃_{G∈G_A} G. ♦
Concretely, the interior of A is the largest open set contained in A, and a set A is open if and only if it equals its interior. The collection of open sets contained in A is non-empty, since ∅, which is open, is a subset of A. The following proposition is an immediate consequence of the definition of interior.

Proposition 1.1.3 (Interior points) Given a set A ⊆ X, a point x ∈ X belongs to the interior Å if and only if there exists a neighbourhood Ux such that Ux ⊆ A.

There are remarkable properties relating the operators of closure and interior. The first regards the open set Å and the closed set \overline{X \ A}.

Proposition 1.1.4 (Remarkable properties of closure and interior) For any subset A ⊆ X
1. Å = X \ \overline{X \ A};
2. \overline{X \ A} = (Å)^c.

Proof
1. Clearly (X \ A) ⊆ \overline{X \ A}, so
   X \ \overline{X \ A} ⊆ X \ (X \ A) = X \ A^c = A.
Furthermore, X \ \overline{X \ A} is an open set contained in A, whence
   X \ \overline{X \ A} ⊆ Å.
To show the opposite inclusion observe that for any open set U contained in A we have (X \ A) ⊆ (X \ U) = \overline{X \ U}, since X \ U = U^c is closed. Moreover¹ \overline{X \ A} ⊆ \overline{X \ U}. In addition
   X \ \overline{X \ U} ⊆ X \ \overline{X \ A} ⇔ U ⊆ X \ \overline{X \ A}.
◦
This is true for every open set U contained in A, so by taking U = A we obtain ◦
A ⊆ X \ (X \ A). ◦
◦
2. As (X \ A) = ( A)c , the set (X \ A) is the complement of the open set A, and as such is closed. ◦
c ¯ = X \ A¯ = (X \ A) is open since complementary to the closed set A. ¯ Hence ( A) its closure is 1 Note
¯ A ⊆ B ⇔ A¯ ⊆ B.
1.1 Topology, Neighbourhoods, Closure, Interior and Frontier
7
◦ ¯ c = (X \ A) = ( A)c . ( A)
The latter interesting relation, together with Proposition 1.1.2, implies the next property. Proposition 1.1.5 (Further property of the closure operator) Given subsets A, B of a space X , the closure operator satisfies the following property A¯ \ B¯ ⊆ A \ B. Proof A¯ \ B¯ is not closed, being the intersection of the closed set A¯ and the open ¯ c . The smallest closed set containing A¯ \ B¯ is set ( B) ◦ ◦ ¯ ⊆ ( A) ¯ ∩ ( B) ¯ c = A¯ ∩ ( B)c = A¯ \ B = (A \ B). ( A¯ \ B)
The claim then follows. Proposition 1.1.6 (Properties of the interior operator) Given subsets A, B in X , the interior operators satisfies: ◦
◦
1. X = X and ∅ = ∅; ◦
2. A ⊆ A;
◦
◦
3. A ⊆ B ⇒ A ⊆ B; ◦
◦ ◦ 4. (A ∩ B) = A ∩ B; ◦ ◦ ◦ 5. A = A; ◦
◦ ◦ 6. A ∪ B ⊆ A ∪ B; ◦
◦ 7. Aα . α∈I Aα ⊆ α∈I
Proof All claims follow directly from Propositions 1.1.2, 1.1.4 and De Morgan’s law. Definition 1.1.6 (Frontier of a set) We call frontier of A ⊆ X the set ◦
∂ A = A¯ ∩ (X \ A) = A¯ ∩ ( A)c . ♦
8
1 Round-Up of Topology
Proposition 1.1.7 (Boundary points) Given a set A ⊆ X , a point x ∈ X belongs to the frontier ∂ A if and only if for every neighbourhood Ux Ux ∩ A = ∅ and Ux ∩ Ac = ∅. Proof For the necessary implication let us assume x ∈ ∂ A, so that x is not an interior ◦
nor an exterior point of A. Then we claim x ∈ / A, so no neighbourhood Ux exists that is completely contained in A. This means Ux ∩ Ac = ∅. A similar reasoning shows ◦
that as x is not exterior to A, x cannot be interior to ( A)c , and therefore there is no neighbourhood Ux totally contained in Ac , meaning that Ux ∩ A = ∅. Suppose now x ∈ X is such that for every neighbourhood Ux we have Ux ∩ A = ∅ ◦ ◦ and Ux ∩ Ac = ∅. Take x ∈ A¯ = A ∪ ∂ A. If x ∈ A there exists a Ux contained in A such that Ux ∩ Ac = ∅, contradicting the assumption. Hence necessarily x ∈ ∂ A. Proposition 1.1.8 (Properties of the frontier operator) Let A and B be subsets in X . The frontier operator satisfies the following properties: ◦
A = A \ ∂ A; A¯ = A ∪ ∂ A; ∂(A ∪ B) ⊆ ∂ A ∪ ∂ B; ∂(A ∩ B) ⊆ ( A¯ ∩ ∂ B) ∪ ( B¯ ∩ ∂ A); ∂(X \ A) = ∂ A; ◦
◦ 6. X = A ∪ ∂ A ∪ ( X \ A); 7. A is open if and only if ∂ A = A¯ \ A;
1. 2. 3. 4. 5.
◦
8. A is closed if and only if ∂ A = A \ A; 9. A is open and closed if and only if ∂ A = ∅. Proof 1. By Definition 1.1.6 we obtain ◦ ◦ ◦ ◦ A \ ∂ A = A \ A¯ ∩ ( A)c = A ∩ A¯ c ∪ A = (A ∩ A¯ c ) ∪ (A ∩ A) = A. =∅
2. Still by definition of frontier ◦
∂ A ∪ A = A¯ ∩ ( A)c ∪ A = A¯ X
◦
=A
1.1 Topology, Neighbourhoods, Closure, Interior and Frontier
9
3. ∂(A ∪ B) = A ∪ B ∩ X \ (A ∪ B) ¯ ∩ (X \ A) ∩ (X \ B) = ( A¯ ∪ B) ¯ ∩ (X \ A) ∩ (X \ B) ⊆ ( A¯ ∪ B) ⊆ ( A¯ ∩ (X \ A)) ∪ ( B¯ ∩ (X \ B)) = ∂ A ∪ ∂ B. 4. ∂(A ∩ B) = A ∩ B ∩ X \ (A ∩ B) ¯ ∩ X ∩ (Ac ∪ B c ) ⊆ ( A¯ ∩ B) ¯ ∩ (X \ A) ∪ (X \ B) = ( A¯ ∩ B)
¯ ∩ (X \ A) ∪ (X \ B) = ( A¯ ∩ B) ¯ ∩ (X \ B)] ¯ ∩ (X \ A)] ∪ [( A¯ ∩ B) = [( A¯ ∩ B) = [ B¯ ∩ ( A¯ ∩ (X \ A))] ∪ [ A¯ ∩ ( B¯ ∩ (X \ B))] = ( B¯ ∩ ∂ A) ∪ ( A¯ ∩ ∂ B). 5. The frontier of the complement of a set equals ◦
∂(X \ A) = (X \ A) ∩ (X \ X \ A) = ( A)c ∩ (X \ Ac ) ◦
◦
= ( A)c ∩ (X ∩ A) = ( A)c ∩ A¯ = ∂ A. 6. Item 6. is self-evident. ◦ 7. Suppose A is open, hence A = A, and so ◦
∂ A = A¯ ∩ ( A)c = A¯ ∩ Ac = A¯ \ A. Vice versa let us assume ∂ A = A¯ \ A. Then ◦
◦
◦
A¯ \ A = ∂ A = A¯ ∩ ( A)c = A¯ \ A ⇔ A = A. ¯ so 8. If A is closed then A = A, ◦
◦
∂ A = A¯ ∩ (X \ A) = A ∩ (X \ A) = A ∩ ( A)c = A \ A. ◦
Conversely if ∂ A = A \ A,
10
1 Round-Up of Topology ◦
◦
◦
¯ A \ A = ∂ A = A¯ ∩ ( A)c = A¯ \ A ⇔ A = A. ◦
¯ whence 9. As A is open and closed A = A = A, ◦
◦
∂ A = A¯ \ A¯ = A \ A = ∅.
1.2 Limit Points and Isolated Points Let us now define limit points and isolated points. Definition 1.2.1 (Limit points and isolated points) Given a subset A ⊆ X : 1. a point x ∈ X is a limit point of A whenever x ∈ A \ {x}; 2. a point x ∈ X is an isolated point of A if there exists Ux such that Ux ∩ A = {x}. ♦ Proposition 1.2.1 (Characterisation of limit points) A point x ∈ X is a limit point of a given set A ⊆ X if and only if every neighbourhood Ux contains at least one point of A other than x. Formally: Ux ∩ (A \ {x}) = ∅. Proof For the necessary part we assume x is a limit point of A, so x is not isolated. Then for every neighbourhood Ux we have Ux ∩ (A \ {x}) = ∅, so Ux contains some point of A different from x. For the sufficient part, each neighbourhood Ux contains some point of A other than x. If x ∈ / A \ {x}, then x ∈ X \ (A \ {x}). As the latter set is open, there exists a neighbourhood Ux contained in it, but against the hypothesis Ux ∩ (A \ {x}) = ∅. The contradiction thus proves the claim. Proposition 1.2.2 (Characterisation of isolated points) Given A ⊆ X and a point x ∈ A, {x} is open in A if and only if x is isolated in A. Proof If {x} is open in A, there is Ux ⊆ X such that Ux ∩ A = {x}, i.e. there exists a neighbourhood of x in A only containing x, whence x is isolated in A. Suppose conversely x is an isolated point of A, so by definition there exists a neighbourhood Ux such that Ux ∩ A = {x}. Then {x} is open in A.
1.2 Limit Points and Isolated Points
11
Let us now apply the theory seen thus far to the number set R. By a linear set of points we shall mean a set of points chosen arbitrarily on the real line R. The simplest instance of linear set is an interval. If a and b are any two points on R with a ≤ b, we shall say: • • • • • • • • • •
[a, b] is a closed bounded interval; (a, b) is an open bounded interval; (a, b] is a left-open, right-closed bounded, interval; [a, b) is a right-open, left-closed bounded interval; [a, a] is a degenerate interval; the closed interval reduces to the set {a}, in the other cases we have ∅; [a, +∞) is a left-closed, right-unbounded interval; (a, +∞) is a left-open, right-unbounded interval; (−∞, b] is a left-unbounded, right-closed interval; (−∞, b) is a left-unbounded, right-open interval; R = (−∞, +∞) is the left- and right-unbounded interval.
Regarding the extended real numbers: • • • • •
[a, ∞] is a left-closed, right-unbounded interval; (a, ∞] is a left-open, right-unbounded interval; [−∞, b] is a left-unbounded, right-closed interval; [−∞, b) is a left-unbounded, right-open interval; R = [−∞, ∞] is the left- and right-unbounded closed interval, representing the set of extended real numbers.
Using the same process of Proposition 1.2.1 the reader should prove the characterisation of limit points in R. We may therefore adopt the following definition of limit point in R. Definition 1.2.2 (Limit points in R) Let A be a linear set of points. A point x ∈ R is a limit point of A if every open interval Ix contains at least one point of A distinct from x: Ix ∩ (A \ {x}) = ∅. ♦ Proposition 1.2.3 (Open intervals with limit points) Let x be a limit point of a linear set A. Then every open interval (a, b) containing x must contain infinitely many points of A. Proof Let us assume by contradiction that the interval (a, b), to which x belongs, only contains a finite number of points of A. Call x1 , x2 , . . . , xn the points of A ∩ (a, b) different from x. Indicate by δ the smallest among the positive numbers |x − x1 |, |x − x2 |, . . . , |x − xn |, |x − a|, |x − b|. The open interval (x − δ; x + δ) contains none of a, x1 , x2 , . . . , xn , b, so it cannot contain any point of A distinct from x. This contradicts the fact that x is a limit point of A, whence the claim.
12
1 Round-Up of Topology
Let us now examine the structure of open and closed, bounded linear sets in R.

Theorem 1.2.1 (Georg Cantor²—1882—Open subsets of R) Any open set G ⊆ R is a countable union (at most) of pairwise disjoint open intervals.

Proof Let G be an open linear set in R. Given x ∈ G, call U_x the family of neighbourhoods of x contained in G. The family is obviously non-empty since G is open. The union ⋃_{U∈U_x} U = Ix therefore exists and is the largest open interval containing x and contained in G. Take y ∈ G and let Iy ⊆ G be an interval containing y. If (Ix ∩ Iy) ≠ ∅, then (Ix ∪ Iy) is an open interval containing x and y, and Ix ⊆ (Ix ∪ Iy). Since, furthermore, Ix is the largest open interval containing x, clearly (Ix ∪ Iy) ⊆ Ix. Then Ix = (Ix ∪ Iy). So either Ix = Iy or (Ix ∩ Iy) = ∅, meaning the intervals contained in an open subset G of R either coincide or are pairwise disjoint. Let now
   { I_{q_i} : ⋃_{U∈U_{q_i}} U = I_{q_i}, ∀q_i ∈ (G ∩ Q) }
be the collection of intervals contained in G, each one containing a rational number of G. It is clear that this family is at most countable, since Q is at most countable in G and the intervals I_{q_i} are pairwise disjoint. Finally, G open implies that the union of these open intervals must coincide with G. The converse statement holds by definition of open set.

Knowing that the complement of an open set is closed, the previous theorem gives the following observation.

Remark 1.2.1 (Closed subsets of R) A non-empty bounded subset F ⊆ R is closed if either it is a closed interval [a, b], or it is obtained from a closed interval [a, b] by removing at most countably many pairwise disjoint open intervals with endpoints in F. ♣
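In the simplest situation, when the open set is presented as a finite union of open intervals, the decomposition of Theorem 1.2.1 can be computed directly by merging overlapping intervals into the maximal intervals Ix of the proof. A minimal Python sketch with illustrative data:

```python
def maximal_intervals(intervals):
    """Merge open intervals (a, b) into the pairwise disjoint maximal
    open intervals whose union is the same open set."""
    intervals = sorted(intervals)                 # sort by left endpoint
    merged = []
    for a, b in intervals:
        if merged and a < merged[-1][1]:          # overlaps the previous interval
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return [(a, b) for a, b in merged]

# G = (0, 2) ∪ (1, 3) ∪ (5, 6): overlapping and disjoint open intervals.
print(maximal_intervals([(0, 2), (1, 3), (5, 6)]))    # [(0, 3), (5, 6)]
```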
² G. Cantor, Über unendliche, lineare Punktmannigfaltigkeiten, Math. Ann. 20 (1882), 113–121.
Chapter 2
Types of Sets
2.1 Derived Set, Set of Isolated Points and Boundary

Definition 2.1.1 (Derived set and set of isolated points) Given a set A ⊆ X we call:
1. derived set of A the set of limit points of A, denoted by A′;
2. (A \ A′) the set of isolated points of A.
♦
Proposition 2.1.1 (Characterisation of closed sets) A set A ⊆ X is closed if and only if it contains all of its limit points, that is A′ ⊆ A.

Proof Let A ⊆ X be closed and x ∈ X a limit point of A, so x ∈ A′. If x ∉ A, since \overline{A \ {x}} ⊆ Ā = A then x ∉ \overline{A \ {x}}, so x is not a limit point of A. The contradiction thus proves the claim.
For the sufficient implication we suppose A′ ⊆ A. It suffices to show A^c is open. For this let y be an arbitrary point in A^c. Observe that y ∉ A′ since y ∉ A, so y ∉ \overline{A \ {y}}. Negating Proposition 1.2.1 gives a neighbourhood Uy that does not contain points of A different from y, i.e. Uy ∩ (A \ {y}) = ∅. Moreover, as y ∉ A,
   Uy ∩ (A \ {y}) = Uy ∩ A = ∅.
Therefore Uy ⊆ A^c and y is an internal point of A^c. But y ∈ A^c is arbitrary, so A^c is open.

Proposition 2.1.2 (Derived sets) All iterated derived sets A′, A′′, A′′′, . . . , A^(n), . . . of a given set A ⊆ X are closed and such that
   A′ ⊇ A′′ ⊇ A′′′ ⊇ · · · ⊇ A^(n) ⊇ · · · .
Proof We begin from the latter claim and suppose x ∈ A^(n), for n ≥ 2, is such that x ∉ A′. There is then a neighbourhood Ux containing at most a finite number of elements of A. Consequently Ux will not contain points of A′, and hence no points of A′′, A′′′, . . ., so no point of A^(n) for every n ≥ 2. This goes against the assumption and proves the statement.
Let us prove the first claim. To show A′ is closed it is enough to prove X \ A′ is open. Suppose x ∈ X \ A′, so x ∉ A′, and then there exists a neighbourhood Ux containing at most finitely many points of A. None of these can be a limit point of A, for otherwise Ux would contain infinitely many points of A. Hence Ux ⊆ X \ A′. As this is true for any x ∈ X \ A′, it follows X \ A′ is open and so A′ is closed.
By induction, we suppose X \ A^(n−1) is open and A^(n−1) closed. Take x ∈ X \ A^(n), so x ∉ A^(n), and then there is a neighbourhood Ux containing at most a finite number of points of A^(n−1). None is a limit point of A^(n−1), otherwise Ux would have infinitely many points in common with A^(n−1), thus violating the hypothesis. Therefore Ux ⊆ X \ A^(n). As this holds for any x ∈ X \ A^(n), necessarily X \ A^(n) is open, as requested.

From this proposition we infer that the set of isolated points of A is A \ A′, a set that is open in A. A set A ⊆ X with n derived sets, where the nth derived set is clearly finite, is said to be of first species of order n. In particular, all sets of first species of order zero contain only a finite number of points; all sets of first species of order 1 have a first derived set containing only a finite number of points. In general, for all n ∈ N, the sets of first species of order n have an nth derived set containing only a finite number of points. A set A ⊆ X is called of second species if it has infinitely many derived sets.¹

Example 2.1.1 (Derived sets of sets of first and second species in R)
1. An example² of a set of first species and order 1 is
   A = {1, 1/2, 1/3, . . . , 1/n, . . .},
   whose derived set A′ = {0} contains the element 0, which does not belong to A.
2. An example³ of a set of first species and order four is
   A = { 1/3^{n_1} + 1/5^{n_2} + 1/7^{n_3} + 1/11^{n_4} : ∀n_i ≥ 1, i = 1, 2, 3, 4 }.
¹ The notion of derived set was introduced in 1872 by Cantor. In 1874 Du Bois-Reymond created a hierarchy of limit points based on their order, including infinite order.
² Cantor, Math. Ann. vol. V (1872).
³ Ascoli, Ann. of Mat., Serie II, vol. VI, p. 56, 1875.
The first derived set of A is the set
   A′ = { 0; 1/3^{n_1} + 1/5^{n_2} + 1/7^{n_3}; 1/3^{n_1} + 1/5^{n_2} + 1/11^{n_4}; 1/3^{n_1} + 1/7^{n_3} + 1/11^{n_4}; 1/5^{n_2} + 1/7^{n_3} + 1/11^{n_4}; 1/3^{n_1} + 1/5^{n_2}; 1/3^{n_1} + 1/7^{n_3}; 1/3^{n_1} + 1/11^{n_4}; 1/5^{n_2} + 1/7^{n_3}; 1/5^{n_2} + 1/11^{n_4}; 1/7^{n_3} + 1/11^{n_4}; 1/3^{n_1}; 1/5^{n_2}; 1/7^{n_3}; 1/11^{n_4} : ∀n_1, n_2, n_3, n_4 ≥ 1 }.
The second derived set of A is
   A′′ = { 0; 1/3^{n_1} + 1/5^{n_2}; 1/3^{n_1} + 1/7^{n_3}; 1/3^{n_1} + 1/11^{n_4}; 1/5^{n_2} + 1/7^{n_3}; 1/5^{n_2} + 1/11^{n_4}; 1/7^{n_3} + 1/11^{n_4}; 1/3^{n_1}; 1/5^{n_2}; 1/7^{n_3}; 1/11^{n_4} : ∀n_1, n_2, n_3, n_4 ≥ 1 }.
The third derived set of A is
   A′′′ = { 0; 1/3^{n_1}; 1/5^{n_2}; 1/7^{n_3}; 1/11^{n_4} : ∀n_1, n_2, n_3, n_4 ≥ 1 }.
The fourth derived set consists of the singleton {0}.
3. In the light of the above example we may say that the set⁴
   A = { 1/a_1 + 1/a_2 + · · · + 1/a_n : ∀a_i ≥ 1, with i = 1, . . . , n }
   is of first species and order n.
4. Also the following set⁵
   A = { 1/2^{m_1} + 1/2^{m_1+m_2} + · · · + 1/2^{m_1+m_2+···+m_n} : ∀m_i ≥ 0, and n ∈ N },
is of first species and order n.
⁴ H. J. S. Smith, Proc. Lond. Math. Soc., vol. VI, p. 145, 1875.
⁵ Mittag-Leffler, Acta Math., vol. IV, p. 58, 1884.
5. Here is an example⁶ of second species, i.e. with derived sets up to the first infinite ordinal ω:
   A = { 1/2^{n} + 1/2^{n+m_1} + 1/2^{n+m_1+m_2} + · · · + 1/2^{n+m_1+m_2+···+m_n} : ∀n ≥ 0; ∀m_1, . . . , m_n ≥ 0 },
   where the m_i, for i = 1, . . . , n, have all positive integral values including zero, independently of one another. For every n the derived set A^(n) contains the numbers 1/2^n, 1/2^{n+1}, . . ., so the derived set A^(ω) exists and reduces to 0 only.
6. Another subset of second species in R is the set of rationals contained in the interval [0, 1]. The first derived set contains every real number in [0, 1], and each successive derived set coincides with the first.

Several statements in the next proposition are immediate from Proposition 1.1.2, but we reckon it is a useful exercise to prove them explicitly.

Proposition 2.1.3 (Properties of derived sets) Let A, B be subsets of X. Then:
1. Ā = A ∪ A′;
2. if A is a subset of B, every limit point of A is a limit point of B: A ⊆ B ⇒ A′ ⊆ B′;
3. (A ∪ B)′ = A′ ∪ B′;
4. (A ∩ B)′ ⊆ (A′ ∩ B′);
5. ⋃_{α∈I} A′α ⊆ (⋃_{α∈I} Aα)′.
Proof ¯ Hence 1. Regarding the first inclusion, A ⊆ A¯ and by Proposition 2.1.1 A ⊆ A. ¯ A ∪ A ⊆ A. For the opposite inclusion we need to show first A ∪ A is closed, i.e. (A ∪ A )c is / A , there exists U y such that U y ∩ A \ {x} = open. Since y ∈ (A ∪ A )c ⇒ y ∈ ∅. Moreover y ∈ / A, so U y ∩ A \ {x} = U y ∩ A = ∅. But A ⊆ A implies U y ∩ A = ∅. Overall U y ∩ (A ∪ A ) = ∅, and so U y ⊆ (A ∪ A )c . As y ∈ (A ∪ A )c is arbitrary (A ∪ A )c is open, and A ∪ A closed. Furthermore, A ∪ A is closed and contains A. As A¯ is the smallest closed set containing A, clearly A¯ ⊆ A ∪ A . 2. Now x ∈ A ⇔ Ux ∩ A \ {x} = ∅. As A ⊆ B, then Ux ∩ A \ {x} ⊆ Ux ∩ B \ {x} = ∅. Therefore x ∈ B , i.e. A ⊆ B . 3. Since A ⊆ A ∪ B and B ⊆ A ∪ B, by the previous item A ⊆ (A ∪ B) and B ⊆ (A ∪ B) , hence A ∪ B ⊆ (A ∪ B) . Conversely, suppose x ∈ (A ∪ B) , so every neighbourhood Ux contains infinitely many points of (A ∪ B). This means that every neighbourhood Ux contains infinitely many points of A, infinitely many points of B, or both. If x is a limit point of A and/or B, then x ∈ A and/or x ∈ B , whence x ∈ (A ∪ B ). Therefore (A ∪ B) ⊆ (A ∪ B ). From the latter two inclusions the assertion follows. 6 Mittag-Leffler,
Acta Math., vol. IV, p. 58, 1884.
2.1 Derived Set, Set of Isolated Points and Boundary
17
4. Now (A ∩ B) ⊆ A, so item 2. forces (A ∩ B) ⊆ A . In addition (A ∩ B) ⊆ B, so still by 2. we have (A ∩ B) ⊆ B . By these two we have (A ∩ B) ⊆ (A ∩ B ). 5. Take x ∈ α∈I Aα , so x ∈ Aα for some α ∈ I . Then there is a neighbourhood Ux such that Ux ∩ Aα \ {x} = ∅. A fortiori, then, Aα \ {x} = ∅ ⇔ x ∈ Aα , Ux ∩ α∈I
α∈I
proving the claim. Remark 2.1.1 1. Item 1. in Proposition 2.1.3 implies A \ {x} = (A \ {x}) ∪ (A \ {x}) and (A \ {x}) ⊆ A \ {x}. 2. Regarding the frontier of a set A ⊆ X :
• an isolated point x of A is not a limit point of A, so it does not belong to A . As internal points are limit points, x cannot belong in the interior of A, and must then belong to the frontier ∂ A; • an internal point x of A cannot belong to the frontier ∂ A; therefore ∂ A consists of the isolated points and the non-internal limit points of A: ◦
∂ A = (A \ A ) ∪ (A \ A). 3. The set of limit points A of A ⊆ X is made of the internal points of A and the frontier points that are not isolated: ◦
A = A ∪ (∂ A \ (A \ A )). 4. As the closure7 A¯ of A ⊆ X obviously contains the isolated points of A, it can be seen as the union of the derived set A and the set of isolated points (A \ A ) A¯ = A ∪ (A \ A ). ◦
Likewise, the union of the interior A and the frontier ∂ A is still the closure of A ◦
A¯ = A ∪ ∂ A.
♣
Informally speaking, the cardinality of the set of isolated points is at most countable. In fact if x is an isolated point of A, hence not a limit point, it is possible to choose a 7 Points
in the closure of a set are at times called adherent to the set.
18
2 Types of Sets
neighbourhood such that Ux ∩ A = {x}. By doing the same with every point of the set and choosing the size of the neighbourhoods Ux so that they are pairwise disjoint, we obtain a family of neighbourhoods that is at most countable. Yet not all countable set is made of isolated points. What we have seen goes to show that a closed subset F ⊆ X differs from its derived set at most by countably many points. In fact F\F does not contain limit points, and as such it consists of isolated points, whence it must be at most countable. Let us now examine a number of properties of R related to these issues. Theorem 2.1.1 (Bolzano–Weierstraß theorem) Every infinite bounded subset A ⊆ R possesses at least one limit point, which may or not belong to A. Proof As A is bounded, there exists a closed interval [a, b] containing it. Take the point c=
a+b . 2
At least one of the closed subintervals [a, c] and [c, b] must contain infinitely many points of A, otherwise A would be finite. Clearly, if both subintervals contain infinitely many points we simply choose one, and rename it [a1 , b1 ]. Now pick the point c1 =
a1 + b1 2
and call [a2 , b2 ] the subinterval, among [a1 , c1 ] and [c1 , b1 ], that contains infinitely many points of A. Iterating this process we generate a nested sequence [a, b] ⊃ [a1 , b1 ] ⊃ [a2 , b2 ] ⊃ · · · , where each interval contains infinitely many points of A. As bn − an =
b−a , 2n
the length of [an , bn ] tends to zero as n → ∞, and there exists a point x contained in all of the [an , bn ], and such that lim an = lim bn = x.
n→∞
n→∞
We claim x is a limit point of A. Pick an open interval (α, β) containing x. Clearly for n large enough [an , bn ] ⊂ (α, β).
2.1 Derived Set, Set of Isolated Points and Boundary
19
The open interval (α, β) contains infinitely many points of A, and the theorem is proved. The condition of I being bounded is essential for Theorem 2.1.1 to hold. To explain the importance of this observation think of N. It is infinite, but it has no limit points since it is unbounded. Now we shall introduce Cantor’s intersection theorem for R. Theorem 2.1.2 (Cantor’s intersection property for R) Let {Fn }∞ n=1 be a decreasing sequence of non-empty, closed and bounded subsets in R. Then there exists a unique point x ∈ R such that x ∈ Fn for every n ∈ N, i.e. F=
∞
Fn = ∅.
n=1
Therefore F is non-empty, closed and bounded in R. Proof Let {Fn }∞ n=1 be a decreasing sequence of non-empty, closed and bounded subsets of R. Clearly F=
∞
Fn
n=1
is closed, and since F ⊆ F1 , with F1 bounded, also F is bounded. There remains to prove F is not empty. For this pick x1 ∈ F1 , x2 ∈ F2 , . . . , xn ∈ Fn , . . . . The sequence {xn }n≥1 may be finite or infinite. The former situation occurs when starting from a certain n , for every n ≥ n the xn stabilise to a specific value ξ, so ξ ∈ Fn for every n ≥ n . But since the sequence of subsets {Fn } is decreasing, necessarily ξ ∈ Fn for every n < n as well, so eventually ξ ∈ F. In the second case A = x1 , xn 1 , . . . , xn k , . . . , with 1 < n 1 , . . . , n k , . . .. Clearly A is a subset of F1 so, besides infinite, it must be bounded. The Bolzano–Weierstraß theorem (Theorem 2.1.1) implies A has a limit point, say ξ. We claim that for every k ∈ N the set (A \ x1 , . . . , xn k ) has the same limit points as A. Let in fact y be a limit point of A, and take a neighbourhood U y , which contains an of points of A. At most finitely many elements in U y infinite number . The rest is still made ofinfinitely many points, all falling belong to x , . . . , x 1 n k , . . . , x point of A \ x . Analogously, in A \ x1 , . . . , xn k , making y a limit 1 n k one proves that if y is a limit point of A \ x1 , . . . , xn k , then it is a limit point of A, too. As all this holds for every k ∈ N, A = A \ x1 , . . . , xn k , ∀k ∈ N. But ξ for every k ∈ N is a limit point of A \ x1 , . . . , xn k , and
20
2 Types of Sets
A \ x1 , . . . , xn k ⊆ Fn k ,
so ξ is a limit point of Fn k . Now, Fn k closed implies ξ ∈ Fn k , for every k ∈ N. By assumption the sequence {Fn } decreases as n grows, so ξ ∈ Fn for every n and, ultimately, ξ ∈ F, meaning F is non-empty. Definition 2.1.2 (Open covers and finite subcovers) An open cover (or open covering) of a set A ⊆ R is a collection of open subsets G = {G α }α∈I in R such that for every x ∈ A there exists a G α ∈ {G α }α∈I such that x ∈ G α , so Gα. A⊆ α∈I
A finite subcover of an open cover G of A is a finite subcollection {G α1 , G α2 , . . . , G αn } of subsets in G, such that for every x ∈ A there exists an open set G αi , 1 ≤ i ≤ n, with x ∈ G αi . So again A⊆
n
G αi .
i=1
♦ We are now ready to prove the important covering theorem of Heine–Pincherle– Borel. The result states that if a closed bounded set F ⊆ R is covered by a family of open intervals, one can discard many intervals and still cover the set with the rest. As a matter of fact a finite number is sufficient to cover F. The theorem might seem trivial at first sight, but reveals its depth in its multiple and crucial applications. Theorem 2.1.3 (Heine–Pincherle–Borel theorem for closed bounded subsets of R) If a closed bounded set F ⊆ R is covered by a family of open intervals I, there exists a finite family of intervals I1 , I2 , . . . , In of I that still covers F. Proof We suppose by contradiction that there is no finite open subcover of I for F. As F is bounded there are finitely many closed intervals {Ji }, each of length less than 1, covering F. At least one of these contains the portion of F that cannot be covered by a finite number of open intervals of I. Let us call it J1 , and define F1 = J1 ∩ F. As F1 is bounded, there are finitely many closed intervals, each of length less than 1/2, that cover F1 . As before, one of these contains the part of F1 not covered by finitely many open intervals of I. Call it J2 and let F2 = J2 ∩ F1 . Proceeding thus we generate a decreasing sequence of closed bounded sets F1 ⊇ F2 ⊇ · · · ⊇ Fn ⊇ · · · such that for every n ∈ N, the set Fn is covered by a closed interval Jn of length less than 1/n, but cannot be covered by a finite number of open intervals from I.
2.1 Derived Set, Set of Isolated Points and Boundary
21
By Theorem 2.1.2 F=
∞
Fn
n=1
and x ∈ F. But I is an open cover, so there exists I = (a, b) ∈ I such that x ∈ I . There is a k ∈ N for which the length of Jk satisfies (Jk )
0, such that x ∈ U (y, r ) ⊆ G. Call G the collection of open sets just defined. Let us check that this defines a topology on R: • first of all, for every x ∈ ∅ there exists U (∅) ⊆ ∅, so ∅ is in G; moreover R = (−∞, ∞) is the neighbourhood U (y, ∞), whence R ∈ G; • now we show that arbitrary unions belong to G. Let {G i }i∈I be a family of open sets of G, where I is a collection of indices. If x ∈ i G i , then x ∈ G i for some i. ⊆ So, there exists a neighbourhood U (y, r ) such that x ∈ U (y, r ) ⊆ G i i Gi . Given x is arbitrary, i G i is in G; • finally, take G 1 , G 2 ∈ G. If x ∈ (G 1 ∩ G 2 ) then x ∈ G 1 , so there is a neighbourhood U (y, r1 ) such that x ∈ U (y, r1 ) ⊆ G 1 , and similarly for G 2 . Set U (y, r ) = U (y, r1 ) ∩ U (y, r2 ), from which x ∈ U (y, r ) ⊆ G 1 ∩ G 2 . But again x ∈ G 1 ∩ G 2 is arbitrary, and the claim holds. In a similar way one proves that the following class G of subsets of Rn is a topology on Rn : we call G ∈ G open iff, for every x ∈ G, there is an open ball B(y, r ) = x ∈ Rn : x − y < r n 2 1/2 , such that x ∈ B(y, r ) ⊆ G. where r > 0 and x − y = i=1 (x i − yi ) These topologies are called standard or Euclidean topologies of R and Rn , and they are actually special instances of metric topologies. Definition 2.3.1 (Metric spaces) A metric space is a set X equipped with a real function d : X × X → [0, ∞), called distance or metric, satisfying: 1. d(x, y) ≥ 0 for every x, y ∈ X ; 2. d(x, y) = 0 ⇔ x = y ∈ X ; 3. d(x, y) = d(y, x) for every x, y ∈ X ;
30
2 Types of Sets
4. d(x, z) ≤ d(x, y) + d(y, z) for every x, y, z ∈ X . In general one indicates a metric space as a pair (X, d).
♦
By this definition we see that the spaces R or R , whose distance function induces the Euclidean topology, are metric spaces: n
• (R, | · |), i.e. the real numbers with Euclidean distance d(x, y) = |x − y|; • (Rn , · n ), i.e. real n-tuples x = (x1 , . . . , xn ) with distance d(x, y) =
n
1/2 (xi − yi )2
.
i=1
Further instances of metric spaces are: • (Q, | · |), the rationals with distance d(x, y) = |x − y|; ∞ 2 xi < ∞, • (H, ·, ·), i.e. real sequences {x} = {x1 , x2 , . . . , xn , . . .} such that i=1 equipped with metric d(x, y) =
∞
1/2 (xi − yi )
.
2
i=1
Here one should prove that if
∞ i=1 ∞
xi2 < ∞ and
∞ i=1
yi2 < ∞ then
(xi − yi )2 < ∞,
i=1
besides proving the axioms of a metric space. Definition 2.3.2 (Balls in metric spaces) Given a metric space (X, d), a point x ∈ X and a number r > 0, we call open ball with centre x and radius r the set B(x, r ) = y ∈ X : d(x, y) < r , and closed ball with centre x and radius r the set ¯ B(x, r ) = y ∈ X : d(x, y) ≤ r .
♦
In general we may define the topology on X induced by the metric d as the collection Td of sets G ⊆ X such that, for every x ∈ G, there exists an open ball B(x, r ) = y ∈ X : d(y, x) < r of radius r > 0 and centre x such that B(x, r ) ⊆ G. The sets G are called open sets of the topology Td induced by d. Several theorems and results valid in (R, · ) generalise to arbitrary metric spaces, since metric spaces rely on properties of the real numbers that guarantee a similar behaviour in terms of distance, convergence of sequences, completeness, separability etc. Before to define the notions of limit and convergence in metric spaces, we define the concept of separable metric space.
2.3 Types of Sets in Metric Spaces
31
Definition 2.3.3 (Separable metric spaces) A metric space (X, d) is called separable if X contains a countable and dense subset. ♦ For example R is separable for it contains the rationals, which are countable and dense. Hence the metric space (R, | · |), where | · | is the Euclidean distance d(x, y) = |x − y|, is separable. Just as a real number can be approximated, as well as one wants, by rational numbers, a separable space possesses countable subsets using which one can get as close as one wishes—in the sense of limits—to any element. Definition 2.3.4 (Limits of sequences in metric spaces) Given a metric space (X, d), a sequence {xn }n≥1 ⊆ X converges to the limit x0 if for every > 0 there exists an n such that for every n ≥ n d(x0 , xn ) < .
♦
Theorem 2.3.1 (Cauchy condition for convergent sequences in metric spaces) Let (X, d) be a metric space and {xn }n≥1 ⊆ X a sequence converging to x0 . Then the sequence satisfies the Cauchy condition: for every > 0 there exists n ∈ N such that for every n, m ≥ n d(xn , xm ) < . Such a sequence is called a Cauchy sequence. Proof If {xn }n≥1 converges to x0 , for every > 0 there exists n such that d(x0 , xn ) < /2 for every n ≥ n . Take n, m larger than n . By the triangle inequality d(xn , xm ) ≤ d(x0 , xn ) + d(x0 , xm ) < , and the statement follows.
Not every Cauchy sequence converges in a given space X . Consider the metric space (X, d), where X = (0, 1) and d(x, y) = |x − y|. The sequence {xn }n>1 = {1/n} is Cauchy, but has no limit in X . It is a Cauchy sequence because it converges in (R, d), but not in X , since its limit 0 does not belong to X . The property of a space that every Cauchy sequence converges in the space is called completeness, and is detailed below. Definition 2.3.5 (Complete metric spaces) A metric space (X, d) is complete when every Cauchy sequence converges (has a limit) in X . ♦ All the examples of metric spaces presented earlier are complete metric spaces except for the rational numbers. In Q, in fact, sequences may converge to a limit in R. Complete separable metric spaces are also known as Polish spaces, because the first mathematicians to investigate their properties, at the beginning of the XX century, belonged to the Polish School. The main results on Polish spaces are linked to the names of Wacław Sierpi´nski, Kazimierz Kuratowski and Alfred Tarski.
32
2 Types of Sets
We shall prove a number of metric properties regarding the sets described earlier. The first result is due to Ulisse Dini. Dini’s theorem say that sets of first species are always nowhere dense and hence somehow neglectable. This happens when we can enclose points in finitely many neighbourhoods chosen so small that their measure may be neglected without consequence. Theorem 2.3.2 (Negligibility of sets of first species) Let (X, d) be a complete and separable metric space. A set A ⊆ X of first species is nowhere dense and negligible. Proof Let A be of first species of order n in X , and U ⊆ X an open set. If U contains no points of A and of each derived set, then the claim is proved. Suppose then the points x1 , . . . , xm n of A(n) belong to U . Given any > 0, let 0 < n ¯ in , rn ) ∩ such that {B(xin , rn )}imn =1 be a cover of these points, with B(x rn < [m n ·(n+1)] ¯ jn , rn ) = ∅ for i n = jn , i n , jn = 1, . . . , m n , and B(x
A¯ \ A(n) ∩ U ∩
mn
¯ B(xin , rn ) = ∅,
i n =1
where B(xin , rn ) = x ∈ X : d(x, xin ) < rn , for i n = 1, . . . , m n . Then we remove . Having taken from U the cover, where the total length of the radii is less than (n+1) (n) care of the points of A , suppose there are x1 , . . . , xm n−1 points from A(n−1) in Un−1 = U
mn
¯ in , rn ). B(x
i n =1 n−1 Let 0 < rn−1 < [m n−1 ·(n+1)] such that {B(xin−1 , rn−1 )}in−1 =1 be a cover of these points, ¯ in−1 , rn−1 )} pairwise disjoint, with { B(x
m
m n−1
¯ B(xin−1 , rn−1 ) ⊆ Un−1
i n−1 =1
and
A¯ \ A(n−1) ∩ Un−1 ∩
m n−1
¯ B(xin−1 , rn−1 ) = ∅.
i n−1 =1
The total length of the radii of eliminated balls at the second stage is still less than . If we continue the process and assume in all subsequent iterations U will (n+1) contain a finite number of points from A(n−2) , . . . , A(1) , A(0) , at the very end the sum of the radii of all removed balls will be less than . After the elimination process the set
2.3 Types of Sets in Metric Spaces
V =U
33
n mk
¯ B(xik , rk ) ⊆ U
k=0 i k =1
does not contain points of A. Hence in U ⊆ X there is at least one neighbourhood V ⊆ U without points of A. This means A is nowhere dense. Moreover as is arbitrary, all the points of the infinite set A can be put in a finite number of balls, whose radii can be made as small as one wants. We say that A is negligible. The first examples of non-negligible nowhere dense sets, which are also relevant from a historical perspective, were discovered around the end of the 1800s by the mathematicians Vito Volterra and Henry Smith. Example 2.3.1 (Vito Volterra’s non-negligible nowhere dense set—1881)9 To construct Volterra’s non-negligible nowhere dense set we proceed in steps. Step one is to take the interval [0, 1] and construct an infinite sequence 0 < · · · < an+1 < an < · · · < a2 < a1 < 1 accumulating at 0 lim an = 0.
n→∞
We also require that d(1, a1 ) =
1 , 22
and the distances between successive points
d(a1 , a2 ); d(a2 , a3 ); . . . ; d(an , an+1 ); . . . → 0 are decreasing and less than d(1, a1 ) = 212 . In step two we take out the open interval (a1 , 1), and apply to each remaining interval (an+1 , an ) the above procedure: we construct a sequence an+1 < · · · < an1k+1 < an1k < · · · < an12 < an11 < an converging to an+1 lim an1k = an+1 ,
k→∞
such that d(an , an11 ) = 9 Volterra
1 22·2
·
1 22·1
and the distances of consecutive points
was interested in constructing a nowhere dense perfect set with positive exterior measure, which would have enabled him to define an everywhere differentiable function with bounded derivative that is not Riemann integrable on any closed bounded interval. In turn, this would have given a counterexample to Hankel’s conjecture, whereby every function with dense discontinuity set is Riemann integrable. The example is taken from: V. Volterra, Opere Matematiche—Memorie and Note, Roma Accademia Nazionale dei Lincei (1954), Volume I—Alcune osservazioni sulle funzioni punteggiate discontinue (1881), p.11 ff.
34
2 Types of Sets
Fig. 2.2 Construction of Volterra’s set—the first three iterations
d(an11 , an12 ); d(an12 , an13 ); . . . ; d(an1k , an1k+1 ); . . . → an+1 are decreasing and less than d(an , an11 ) = 212·2 · 212·1 . Observe that whilst in step one there was a unique limit point (zero), now there are infinitely many limit points, namely {0, . . . , an+1 , an , . . . , a2 , }. In step three we take out the intervals (a111 , a1 ), (a211 , a2 ), . . . , (an11 , an ), . . . and again apply the procedure to each remaining interval (an1k+1 , an1k ): build an1k+1 < · · · < an2k
j+1
< an2k < · · · < an2k < an2k < an1k j
2
1
tending to an1k+1 and satisfying d(an1k , an2k ) = 1
1 1 1 · 2·2 · 2·1 2·3 2 2 2
and that the distances of consecutive points are decreasing and less than the above quantity (Fig. 2.2). If we suppose to go on indefinitely, eventually we will produce a linear set V¯ containing all limit points and 1. V¯ is nowhere dense in [0, 1], since A = [0, 1] \ V¯ is a disjoint union of open intervals with endpoints in V¯ . The open intervals of which A is made are precisely the intervals removed during the various steps of the construction of V¯ :
2.3 Types of Sets in Metric Spaces
35
V0 = (a1 , 1); V11 = (a111 , a1 ), V21 = (a211 , a2 ), . . . , Vn 1 = (an11 , an ), . . . In general, then, for every non-empty neighbourhood U ⊆ [0, 1], there exists a nonempty neighbourhood Vk ⊆ U such that Vk ∩ V¯ = ∅. The overall length of the intervals constituting A is l(A)
2 equal parts, and delete the last interval, of length 1/m, from each successive subdivision. Now take the (m − 1) remaining intervals and divide them in m 2 parts, deleting the last one produced in each subdivision. we are taking out (m − 1) At this stage pairwise disjoint intervals, all of length 1/m · 1/m 2 . Divide again the (m − 1)(m 2 − 1) intervals left in m 3 parts, and delete as before 2 the last one produced in each take out (m − 1)(m − 1) pairwise division. 2Thus we 3 disjoint intervals of length 1/m · 1/m · 1/m . And so on. At step k the number of pairwise-disjoint excluded intervals is N (k) = 1 + (m − 1) + (m − 1)(m 2 − 1) + · · · + (m − 1)(m 2 − 1) · · · (m k−1 − 1). Call I1 , . . . , I N (k−1) such intervals. The total length of these pairwise-disjoint intervals is computed by induction. When n = 1 the length equals 1 1 =1− 1− ; l(I1 ) = m m for n = 1, 2 l(I1 ∪ I2 ) = 1 − 1 − =1− 1− =1− 1−
1 1 1 + (m − 1) · · 2 m m m 1 1 1 + 1− · 2 m m m 1 1 · 1− 2 . m m
(2.3.1)
To use induction we suppose that for n = 1, 2, . . . , N (k − 1)
10 The collected mathematical papers of Henry John Stephen Smith, edited by J. W. L. Glaisher, Volume II—On the integration of discontinuous functions (1894), p.95 ff.
36
2 Types of Sets
l
N (k−1) n=1
In
1 1 1 =1− 1− · 1 − 2 · · · 1 − k−1 . m m m
Then for n = 1, 2, . . . , N (k) l
N (k) n=1
In
1 1 1 =1 − 1 − · 1 − 2 · · · 1 − k−1 m m m 1 1 1 1 + (m − 1)(m 2 − 1) · · · (m k−1 − 1) · 2 · · · k−1 · k m m m m 1 1 1 · 1 − 2 · · · 1 − k−1 =1 − 1 − m m m 1 1 1 1 · 1 − 2 · 1 − k−1 · k + 1− m m m m 1 1 1 1 · 1 − 2 · · · 1 − k−1 · 1 − k . =1 − 1 − m m m m
This sum, as k → ∞, tends to the finite limit 1 1− E , m where E
1 m
is Euler’s product
∞ k=1 1 −
0
0. For n large enough there exists a Cn containing a closed interval Cn,k , k = 1, . . . , 2n , of length 1/3n < δ, such that x ∈ Cn,k ⊂ Cn . Let a1 be the endpoint of the interval Cn,k distinct from x. Then a1 belongs to C and |x − a1 | ≤
1 < δ. 3n
As this holds for every n ∈ N, as n increases the remaining intervals’ endpoints, which belong to C, tend to x ∈ C. Hence every x ∈ C is a limit point of C, and C ⊆ C . Despite Cantor’s ternary set is a linear set of points with the cardinality of the continuum, not only it contains no interval, but it is not even dense in any segment of [0, 1], regardless of size. Sets of this sort, albeit of the cardinality of the continuum, do not possess a continuous structure to speak of. 14 Indeed,
it is perfect.
2.3 Types of Sets in Metric Spaces
45
Here is another perfect example not dissimilar to the Cantor set. Example 2.3.4 (A perfect set) Write numbers of [0, 1] in binary form, so that every x ∈ [0, 1], x=
∞ bn n=1
2n
,
reads x = 0.b1 b2 b3 . . . in binary form, with bn ∈ {0, 1} for every n. Every rational number in the interval of the form m/2n , n = 1, . . ., has a double binary expansion. These rationals are called dyadic since their denominator is a power of 2. Consider for instance 41 , 21 and 43 , which read: ⎧ 0 ⎪ ⎨ 0+ 1 x = = 0.25 = 2 ⎪ 4 ⎩ 0 + 20
0 0 1 1 ¯ + 2 + 3 + 4 + · · · ⇔ x2 = 0.00111; 21 2 2 2 0 1 0 0 + 2 + 3 + 4 + · · · ⇔ x2 = 0.01. 21 2 2 2
⎧ 0 ⎪ ⎨ 0+ 1 x = = 0.5 = 2 ⎪ 2 ⎩ 0 + 20
0 1 1 1 ¯ + 2 + 3 + 4 + · · · ⇔ x2 = 0.0111; 21 2 2 2 1 0 0 0 + 2 + 3 + 4 + · · · ⇔ x2 = 0.1; 21 2 2 2
⎧ 0 ⎪ ⎨ 0+ 3 x = = 0.75 = 2 ⎪ 4 ⎩ 0 + 20
1 0 1 1 ¯ + 2 + 3 + 4 + · · · ⇔ x2 = 0.10111; 1 2 2 2 2 1 1 0 0 + 2 + 3 + 4 + · · · ⇔ x2 = 0.11. 21 2 2 2
All other real numbers in [0, 1] have a unique binary representation. By viewing binary expansions as decimal strings we may imagine that every rational number in [0, 1] with double representation defines an open interval, complementary to the points with unique representation. The set of points with unique representation, as complement of the countable union of the aforementioned open intervals, is closed, more than countable, non-empty, perfect and nowhere dense in [0, 1], just like the Cantor set. Lemma 2.3.1 (Baire—countable intersections of open dense sets) Let (X, d) be a complete metric space and {G n }n≥1 a sequence of open dense subsets. Then the intersection ∞
n=1
is dense in X .
Gn = G
46
2 Types of Sets
Proof Take a sequence {G n }n≥1 of open dense subsets in X . Given x0 ∈ X and a neighbourhood U of radius r0 > 0, to prove the statement we should show U ∩ G = ∅. As G 1 is dense in X , U ∩ G 1 contains a point x1 . But U ∩ G 1 is open, so it contains a neighbourhood centred at x1 of radius r1 < r0 /2 such that B¯ 1 = x ∈ X : d(x, x1 ) ≤ r1 ⊆ (G 1 ∩ U ). Now take a set G 2 dense in X , so G 2 ∩ B1 contains a point x2 . As G 2 ∩ B1 is open, it contains a neighbourhood centred at x2 with radius r2 < r1 /2 such that B¯ 2 = x ∈ X : d(x, x2 ) ≤ r2 ⊆ (G 2 ∩ B1 ), where B1 denote the open ball centered at x1 with radius r1 . At step n, the set G n is dense in X , so G n ∩ Bn−1 contains a point xn , and G n ∩ Bn−1 being open means it contains a neighbourhood centred at xn of radius rn < rn−1 /2 such that B¯ n = x ∈ X : d(x, xn ) ≤ rn ⊆ (G n ∩ Bn−1 ), where Bn−1 denote the open ball centered at xn−1 with radius rn−1 . Continuing by induction we generate a sequence {xn } in X , which we claim is a Cauchy sequence.15 Note B¯ n is a decreasing sequence of closed sets and {rn } is a positive, real, infinitesimal sequence. Given > 0 and some n > 1 such that rn ≤ , for i, j > n there exist, by construction, xi , x j ∈ Bn = x ∈ X : d(xn , x) < rn such that d(xi , x j ) ≤ d(xi , xn ) + d(x j , xn ) < 2rn ≤ 2. Hence {xn } is Cauchy, and by virtue of the space’s completeness it converges to x ∈ X. Finally, we need to show x ∈ G ∩ U . Take n > 1, so for every m ≥ n xm ∈ B¯ n ⊆ B¯ 1 . As the above two sets are closed,
15 This
step relies on the axiom of dependent choice. If the complete metric space is also assumed to be separable, the lemma is provable in ZF with no additional choice hypotheses.
2.3 Types of Sets in Metric Spaces
47
x ∈ B¯ n ⊆ B¯ 1 ⇔ x ∈ B¯ n ∩ B¯ 1 . What is more, B¯ 1 ⊆ (G 1 ∩ U ) ⊆ U and B¯ n ⊆ (G n ∩ Bn−1 ) ⊆ G n , so in particular x ∈ G n ∩ U . But n > 1 was arbitrary, so x ∈ G ∩ U.
A rather direct consequence of Baire’s lemma is: Corollary 2.3.1 (Countable unions of closed sets with empty interior) Let (X, d) be a complete metric space and {Cn }n≥1 a sequence of closed sets with empty interior. Then the union ∞
Cn = C
n=1
has empty interior. Proof By the previous lemma a countable intersection of open dense sets is dense. Since a closed subset C ⊆ X has empty interior iff A = X \ C is open and dense, ∞
An = A is dense ⇔
n=1
∞
Acn = Ac has empty interior.
n=1
Remark 2.3.3 (A countable intersection of open dense subsets of R) In complete metric spaces the converse of Lemma 2.3.1 is false, for not every dense subset is a countable intersection of open dense sets. Take Q, and suppose one could express it as countable intersection of open dense sets Uqi , qi ∈ Q:
Uqi . Q= qi ∈Q
For every i = 1, 2, . . ., Uqi are open in R, as is R \ {qi }. The latter is open since {qi } but R does not have is closed (the singleton {qi } would be open if qi were isolated, isolated points). Hence for every i = 1, 2, . . . the Uqi ∩ R \ {qi }) are open dense subsets, and
Uqi ∩ R \ {qi } = Q ∩ R \ {qi } qi ∈Q
qi ∈Q
{qi } =Q∩ R\ qi ∈Q
= Q ∩ R \ Q = ∅.
48
2 Types of Sets
The countable intersection of these open dense sets is empty. As this contradicts Baire’s lemma above, the initial assumption is absurd and the claim follows. ♣ Remark 2.3.4 (A countable union of nowhere dense subsets of R) While finite unions of nowhere dense subsets is nowhere dense, for countable unions this might not be the case. Let q1 , q2 , . . . , qn , . . . be a complete enumeration of Q. Any singleton {qi }, i ∈ N, is nowhere dense in R, for {qi } coincides with both its closure and its boundary, and when this happens the closure’s interior is empty. On the other ∞ {qi } = Q is dense hand the countable union of these nowhere dense subsets i=1 in R. ♣
2.4 Meagre and Residual Sets The fact that a countable union of nowhere dense subsets of a complete and separable metric space is not nowhere dense leads to suspect we might need to define a new type of set. This is exactly what René-Louis Baire did in 1899 in his doctoral thesis “Sur les fonctions des variables réelles”. Baire defined countable unions of nowhere dense sets ‘meagre’. The first example of a meagre subset of R, by Remark 2.3.4, is Q. As we shall see, in general a meagre subset of a given closed set F ⊆ R is an insignificant, or at any rate small, portion of F. What remains in F after we take out a meagre set is called a residual, or comeagre, subset of F. A residual subset, in contrast to a meagre one, forms a substantial part of F. Definition 2.4.1 (Meagre sets and sets of second category) In a space X we say: 1. a set M ⊆ X is meagre, or of first category, in X if it can be written as a countable union of nowhere dense subsets: M=
∞
Ai ;
i=1 ◦
with A¯ i = ∅ for every i ∈ N; 2. a set A ⊆ X is of second category if it is not meagre.
♦
Definition 2.4.2 (Comeagre or residual sets) In a space X a set R is comeagre or residual in X if its complement is meagre in X . ♦ Sets of second category therefore include comeagre/residual sets. Proposition 2.4.1 (Properties of meagre sets) In a space X ∞ 1. the countable union of meagre subsets i=1 Mi is meagre; 2. a subset of a meagre set is meagre.
2.4 Meagre and Residual Sets
49
Proof As for 1., it suffices to observe that a countable union of countable unions of nowhere dense subsets is a countable union of nowhere dense subsets. Regarding 2., let B be a subset in the meagre set A, which is a countable union of meagre subsets {An }n≥1 . Clearly B=B∩A=B∩
∞
An =
n=1
∞
(B ∩ An ).
n=1
By 2. in Proposition 2.2.2 a subset in a nowhere dense set is nowhere dense, so for every n ∈ N the set B ∩ An ⊆ An is nowhere dense. Hence B = ∞ n=1 (B ∩ An ) is meagre. Proposition 2.4.2 (Characterisation of residual sets) Let (X, d) be a complete metric space, a set R ⊆ X is residual, or comeagre, if and only if R contains a countable intersection of open dense subsets of X . meagre. There exists a sequence of Proof Suppose R is residual, so R c = M is ∞ ∞ such that M = i=1 Ri , and Ri is nowhere dense in nowhere dense subsets {Ri }i=1 c X for every i ∈ N iff ( R¯ i ) = Di is dense and open in X . Therefore M=
∞ i=1
Ri ⊆
∞ i=1
R¯ i ⇔
∞ ∞
( R¯ i )c = Di ⊆ R. i=1
i=1
Proposition 2.4.3 (Countable intersections of residual sets) The countable intersec∞ tion of residual sets i=1 Ri in a space X is a residual set R. Proof The countable intersection of a countable intersection of open dense subsets is a countable intersection of open dense subsets. Theorem 2.4.1 (Baire’s category theorem) Any residual subset R in a complete metric space (X, d) is dense in X . ∞ Proof Since A = X \ R is of first category, then A = i=1 Ai , where each Ai ⊆ X ∞ is nowhere dense. Then A ⊆ Aˆ = i=1 A¯ i , where each A¯ i is closed with empty interior. By corollary 2.3.1 Aˆ has empty interior, and the inclusion forces A to have empty interior as well. Let us assume that R is not dense. Then there exists a nonempty open set U ⊆ X such that U ∩ R = ∅ and U ⊆ A, but A has empty interior. The contradiction proves the claim. Let us illustrate the notions of meagre set and second-category set by applying them to R. Proposition 2.4.4 (Baire category of intervals) Intervals in R are of second category. Proof Suppose the interval I ⊆ R were meagre, i.e. of first category. Then R \ I would be residual and Theorem 2.4.1 would make it dense in R. But (R \ I ) ∩ I = ∅, a contradiction, so the statement holds.
50
2 Types of Sets
¯ = R is of second In general the closure of a meagre set is not meagre. For example Q category, as the next proposition entails. Proposition 2.4.5 (Baire category of R) R is of second category. Proof If R were of first category, we could write R=
∞
Ri ,
i=1
with every Ri nowhere dense in R. Let I be an interval of R, so I =R∩ I =
∞
Ri ∩ I.
i=1
But now Ri ∩ I is nowhere dense for every i ∈ N, turning I in a countable union of nowhere dense subsets of R. It would follow I is of first category, i.e. meagre. The contradiction proves the claim. Proposition 2.4.6 (Baire category of residual subsets of R) Residual subsets in R are of second category. Proof Suppose a residual set R ⊆ R were of first category. Then both R and R \ R would be of first category. If so R ∪ R c = R would be a countable union of nowhere dense subsets, a contradiction. Remark 2.4.1 (The set R \ Q) We have seen that the rational numbers, being a countable union of nowhere dense subsets, form a meagre set (Remark 2.3.4). Consequently the set of irrationals R \ Q is residual and of second category in R, while R or every interval I in R is of second category in itself, but they are not residual. ♣ Finally we observe that I, I c ⊆ R are of second category in themselves too. A set of second category in a complete metric space (X, d) is always uncountable. As we shall explain in a short while, though, there exist uncountable sets of first category. Proposition 2.4.7 (Cardinality of residual sets in complete metric spaces) Let M be a meagre subset in X . Then for any set E of second category in X , the set E \ M is uncountable. Proof Let E ⊆ X be of second category and M a meagre subset in X , so we may write it as countable union of nowhere dense subsets Ai M=
∞
Ai .
i=1
If (E \ M) were countable, we would be able to express it as a countable union ∞ i=1 {x i } of singletons {x i }, i = 1, 2, . . .. Then
2.4 Meagre and Residual Sets
51
∞ ∞ ∞ E = E \ M ∪ (E ∩ M = {xi } ∪ E ∩ Ai = {xi } ∪ E ∩ Ai .
i=1
i=1
i=1
Since E ∩ Ai and {xi } are both nowhere dense for every i ∈ N, E would be of first category, a contradiction. Remark 2.4.2 (Meagre sets with the cardinality of the continuum) It was mentioned that there are meagre subsets whose cardinality is the continuum, just like for second category sets. An example is n∈Z Cn , where Cn = C + n = x + n : x ∈ C the Cantor set . We proved C is nowhere dense, while Z is countable. The union n∈Z Cn has the following structure. It resembles a countable replica of the Cantor set obtained by integer translation. To each n ∈ Z there corresponds a nowhere dense, closed set Cn = C + n embedded in the real line as a copy of the Cantor set. For n = m the sets Cn and Cm have the cardinality of the continuum. So in general Cn = (C + n), n∈Z
n∈Z
is a countable union of nowhere dense subsets, i.e. a meagre set. Furthermore its cardinality is the continuum, since 2ℵ0 · ℵ0 = 2ℵ0 . ♣
Part II
Borel Sets and Baire Functions on R
Chapter 3
Borel sets in R
3.1 Basics on Borel Sets Open and closed subsets of R are but special families of sets of a larger class, that of Borel sets. The Borel class consists of all subsets of R that can be obtained by union and intersection of countably many open, or closed, subsets. This brings to mind the definition of a topological space. For Borel sets, though, only countable operations are allowed, whereas in a topological space we may join and intersect sets arbitrarily. At any rate, starting from a topological space one can always obtain a family of Borel sets generated by the collection of open sets, or closed sets, of the space. Regarding the specific construction of a Borel family, we call: • G 0 set: any open subset of R; • G 1 set: a countable intersection of G 0 sets; a G 1 set is also known as a G δ set; • G 2 set: a countable union of G 1 sets; a G 2 set is also known as a G δ,σ set; .. . In general, for every countable ordinal α < ( being the first uncountable ordinal) we can define G α sets. Once G β sets have been defined for every β < α, then G α sets are the countable intersections or unions of G β sets, for β < α, according to whether α is odd or even. If α is a limit ordinal, i.e. with no immediate predecessor, we may assume it even, making odd the successor α + 1 to any limit ordinal. The first ordinal 1 is odd, and the successor of an odd/even ordinal is even/odd respectively. Let A be a G α set. Then A = A ∪ A ∪ A ∪ . . . and A = A ∩ A ∩ A ∩ . . . ; so irrespective of the parity of α, any G α set A is also a G α+1 set. We obtain thus the following sequence of types © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 S. Gentili, Measure, Integration and a Primer on Probability Theory, UNITEXT 125, https://doi.org/10.1007/978-3-030-54940-4_3
55
3 Borel sets in R
56
G 0 , G 1 , . . . , G α , . . . , α < , where α may be either a limit ordinal or one with immediate predecessor. G α sets, for α < , are usually called Borel sets, for they arise as unions or intersections of a countable number of open sets. To produce the whole Borel class the process must be repeated for every countable α less than (in other terms, by transfinite induction). If for instance {Ai : i ∈ N} is a countable family such that A1 is a G 1 \ G 0 set A2 is a G 2 \ G 1 set A3 is a G 3 \ G 2 set ......... ∞ ∞ Ai or i=1 Ai is one of the G β sets built previously. Only we have no guarantee i=1 at iteration we can be sure that those two are G sets. We may in fact prove that to obtain the entire Borel class over R—which as we shall see has important features, such as stability under countable intersections and unions—it is necessary to repeat the iterations for every countable ordinal α less than . The resulting family has cardinality 2ℵ0 . At this point the following recursive definition, for α ∈ (0, ), seems natural. Call 10 the class of open sets in R, i.e. the class of G 0 sets: 10 = E 1 ⊆ R : E 1 open in R . Call 02 the class of countable intersections of sets in 10 : 02
= H ⊆R:H = 2
2
∞
E i1 ,
E i1
∈
10 ,
∀i ∈ N .
i=1
As E 1 = E 1 ∩ E 1 ∩ E 1 ∩ . . . for every E 1 ∈ 10 , it follows 10 ⊆ 02 . If β < α is an odd ordinal with immediate predecessor, let β0 be the class of countable unions of sets in 0β−1 : ∞ β−1 β−1 β0 = E β ⊆ R : E β = Hi , Hi ∈ 0β−1 , ∀i ∈ N . i=1
Instead, if β < α is an even ordinal with immediate predecessor, let 0β denote the 0 class of countable intersections of sets in β−1 : 0β
β
β
= H ⊆R:H =
∞ i=1
β−1 Ei ,
β−1 Ei
∈
0 β−1 , ∀i
∈N .
3.1 Basics on Borel Sets
57
For the same reasons that 10 ⊆ 02 , for every non-limit β < α we have β0 ⊆ 0β+1 . Therefore when β ∈ (0, α), with α a limit ordinal, 10 ⊆ 02 ⊆ 30 ⊆ . . . ⊆ β0 ⊆ 0β+1 ⊆ . . . .
(3.1.1)
In general, for every even β < α the sets in 0β are countable intersections of sets in 0 β−1 :
0 . 0β = β−1 δ Once we have the classes β0 and 0β for every β < α, we can define 0α , i.e. the class of countable intersections of sets in β0 , β < α:
β0 . 0α = β 0, a point x ∈ [a, b] and a neighbourhood Uδ (x) = (x − δ, x + δ), the oscillation of f at the point x is the real number ω( f, x) ≥ 0 defined by
ω( f, x) = lim ω f, Uδ (x) ∩ [a, b] . δ→0
♦
Remark 3.3.1 (Comparing oscillations of real maps) Since ω( f, E) is the least upper bound of the image of E under f , for every E 1 ⊆ E 2 ⊆ [a, b] we have f (E 1 ) ⊆ f (E 2 ) and ω( f, E 1 ) ≤ ω( f, E 2 ).
(3.3.1) ♣
Theorem 3.3.1 (Oscillation of real maps at continuous points) A map f : [a, b] → R is continuous at x0 ∈ [a, b] if and only if its oscillation at x0 vanishes:
lim ω f, Uδ (x0 ) ∩ [a, b] = 0.
δ→0
(3.3.2)
3 Borel sets in R
64
Proof Necessary implication: suppose f continuous at x0 . For any > 0 there exists δ > 0 such that for every x ∈ Uδ (x0 ) ∩ [a, b] | f (x) − f (x0 )|
0 is arbitrary, f is continuous at x0 .
Proposition 3.3.1 (Discontinuity set of a real map) Given a map f : [a, b] → R and any α > 0, the subset on which the oscillation of f is greater than or equal to α,
3.3 Discontinuity Set of Real Functions over [a, b]
65
D( f ) = x ∈ [a, b] : ω( f, x) ≥ α , is closed. Proof Let x¯ be a limit point of D( f ). By definition there exists a sequence {xn }n≥1 in D( f ) such that xn → x. ¯ For every n ∈ N, ω( f, xn ) ≥ α.
c Suppose x¯ ∈ / D( f ). Then x¯ belongs in the continuity set D( f ) = C( f ): x¯ ∈ C( f ) = {x ∈ [a, b] : ω( f, x) < α}. ¯ ∩ Hence ω( f, x) ¯ < α. By definition there is a δ > 0 such that for every x ∈ Uδ (x) [a, b]
¯ ∩ [a, b] = ω f, Uδ (x)
sup
x∈Uδ (x)∩[a,b] ¯
| f (x) ¯ − f (x)| < α.
(3.3.4)
But x¯ is a limit point of D( f ), so there exists some xi ∈ D( f ) internal to the neigh¯ ∩ [a, b]. Therefore we can take a neighbourhood Uη (xi ) ∩ [a, b] bourhood Uδ (x) such that ¯ ∩ [a, b]. Uη (xi ) ∩ [a, b] ⊆ Uδ (x)
(3.3.5)
At the same time xi ∈ D( f ) implies
ω f, Uη (xi ) ∩ [a, b] =
sup
x∈Uη (xi )∩[a,b]
| f (xi ) − f (x)| ≥ ω( f, xi ) ≥ α.
Remark 3.3.1 tells
¯ ∩ [a, b] ≥ ω f, Uη (xi ) ∩ [a, b] ≥ α. ω f, Uδ (x)
The contradiction then allows to conclude.
Proposition 3.3.2 (G δ and Fσ subsets of closed real intervals) A subset A ⊆ [a, b] is a G δ set if and only if [a, b] \ A is an Fσ set. ∞ Proof Necessary part: let A ⊆ [a, b] be a G δ set, A = i=1 G i . Then [a, b] \ A = [a, b] \
∞ i=1
Gi =
∞ ∞
[a, b] \ G i = [a, b] ∩ G ic . i=1
i=1
Clearly [a, b] ∩ G ic , for every i ∈ N, is closed as intersection of closed sets, so [a, b] \ A is an Fσ set, being a countable union of closed sets. ∞ Fi , with Fi closed for Vice versa, let [a, b] \ A be an Fσ set, [a, b] \ A = i=1 every i ∈ N. Then
3 Borel sets in R
66
A = [a, b]
∞
Fi =
i=1
∞
[a, b] ∩ Fic . i=1
For every i ∈ N the set [a, b] ∩ Fic is open in [a, b], making A a G δ set.
Proposition 3.3.3 (Borel type of countable dense subsets of closed intervals) A countable set A ⊆ R that is at most dense in [a, b] cannot be a G δ set. Proof Let A be countable and dense in [a, b], and suppose by contradiction it is of ∞ type G δ , so A = i=1 G i . As A ⊆ G i for every i = 1, 2, . . ., every G i is dense in [a, b]. Hence A is the countable intersection of open dense sets in [a, b], and [a, b] \ A = [a, b]
∞
Gi =
i=1
∞
[a, b] \ G i i=1
the countable union of nowhere dense subsets in [a, b], i.e. of first category. Therefore A is residual in [a, b] and by virtue of Proposition 2.4.7 it cannot be countable. Remark 3.3.2 (Discontinuities of a function) 1. Combining Theorem 3.3.1 with Proposition 3.3.1 we may say a map f : [a, b] → R is discontinuous at x0 ∈ [a, b] iff its oscillation at that point is ω( f, x0 ) > 0. 2. Given f : [a, b] → R define for every n ∈ N 1 . Dn ( f ) = x ∈ [a, b] : ω( f, x) ≥ n The discontinuity set of f reads F=
∞
Dn ( f ) =
n=1
∞
x ∈ [a, b] : ω( f, x) ≥
n=1
1 , n
i.e. a countable union of closedsets, whence an Fσ set. c 3. On the other hand, any Dn ( f ) = Cn ( f ) is open in [a, b] and obviously G=
∞ n=1
Cn ( f ) =
∞ n=1
1 x ∈ [a, b] : ω( f, x) < n
represents the continuity set of f on [a, b], which is patently a G δ set. 4. Remarks 3.2.1 and 3.2.2 tell us there cannot exist a map f : R → R that is continuous at rational points and discontinuous at irrational points. Put equivalently, there is no map f : R → R whose discontinuity set are the irrational numbers. ♣
3.3 Discontinuity Set of Real Functions over [a, b]
67
Theorem 3.3.2 (Real-valued maps that are continuous on a residual set) A map f : [a, b] → R is continuous on a residual set R of [a, b] if and only if R is dense in [a, b]. Proof The necessary part is immediate because R is residual by hypothesis, and Baire’s theorem (Theorem 2.4.1) implies its density in [a, b]. Sufficient implication: let R denote the continuity set of f on [a, b], and suppose R dense in [a, b]. As R is a G δ set, R=
∞
Gi ,
i=1
where each G i is open in [a, b]. But R ⊆ G i for every i ∈ N, so every G i is dense in [a, b] and R is the countable intersection of open dense sets. Let us define ∞
R \ G i ), D =R\ R = i=1
a countable union of nowhere dense subsets. It is of first category and hence R is residual. The discontinuity set of a real function is not necessarily countable, as the next example shows. Example 3.3.1 (Discontinuity of the Cantor set’s characteristic function) Using Remark 3.2.3 we may also verify the characteristic (or indicator) function ϕC (x) of the Cantor set is discontinuous at every x ∈ C and continuous at every x ∈ G. For this, define 1 if x ∈ C ϕC (x) = 0 otherwise This map satisfies lim ϕC (x) = 0
x→x¯
for every x¯ ∈ / C. Hence the limit never exists in C, while it always exists in the complement. Consequently ϕC is continuous on G and discontinuous on C. The discontinuities are essential, in other words rather severe ones. Pick in fact an x ∈ [0, 1]. As C is closed with empty interior, if x ∈ C then any neighbourhood U of x has non-empty intersection with C c = G. Therefore for any > 0 less than 1/2 there exists x¯ ∈ U ∩ G such that ¯ = |1 − 0| = 1 > . |ϕC (x) − ϕC (x)|
3 Borel sets in R
68
Hence ϕC is discontinuous at any x ∈ C. If x ∈ G, we may find a neighbourhood U in the open set G. Given that ϕC is 0 on U , for any > 0 and every x¯ ∈ U ¯ = |0 − 0| = 0 < . |ϕC (x) − ϕC (x)| Hence ϕC is continuous at every point in G. The characteristic function of the Cantor set is then discontinuous on C, which has the cardinality of the continuum. What we have just proved allows us to say there exists real functions with uncountably many discontinuity points.
3.4 Additive and Multiplicative Families of Sets Until now we have examined the features of Borel subsets of R of type Fσ and G δ . As we have seen, they play an important role in classifying the discontinuities of real functions. Yet to acquire the tools necessary for the Lebesgue-Hausdorff theorem we must also study the properties of the additive class α0 and the multiplicative class 0α , where α is any ordinal smaller than the first uncountable ordinal . To that end we shall invoke a result about transfinite induction. The principle of transfinite induction is equivalent to finite induction on positive integers, but only the former generalises to well-ordered sets (like R). Relatively to the set of ordinals, transfinite induction consists in proving a certain property holds for every ordinal number. Any such proof proceeds in two steps. In the first one proves that a given property P(x) holds for the ordinal 0, i.e. P(0). In the second, inductive, step one proves P(x) for every ordinal α < , namely P(α), where α may either be a limit ordinal or one with immediate predecessor, and is the first uncountable ordinal. The inductive step is typically split into two. In part one assumes the property holds for some successor ordinal β, in which case it is possible to use classical induction. Part two consists in assuming the property for every ordinal β smaller than a limit ordinal α. Here classical induction does not work, for α has no predecessor. Theorem 3.4.1 (Weak principle of transfinite induction) Let A be the class of ordinals α < and call by P(x) a property such that: 1. for the first ordinal 0 ∈ A property P(0) holds; 2. for every successor ordinal β ∈
A, P(β − 1) ⇒ P(β); 3. for every limit ordinal α ∈ A, ∀β < α, P(β) ⇒ P(α) . Then P(α) holds for every ordinal α < . Proof In such a case {P(x)} must correspond to the class of ordinal numbers. To prove the result let us take the above hypotheses and suppose, by contradiction, that there existed an ordinal γ such that ¬P(γ) holds. Then
3.4 Additive and Multiplicative Families of Sets
69
B = β ≤ γ : ¬P(β) would not be empty, and therefore it would admit a minimum element, say β . Clearly β = 0 since P(0) holds. As β is the minimum of B, P(η) is true for every η < β . Assumptions 2. or 3. would give us P(β ), which cannot be. Hence the claim. Here is a variant of the above principle. Theorem 3.4.2 (Strong principle of transfinite induction) Let A be the class of ordinals α < and call by P(x) a property such that: 1. for the first ordinal 0 ∈ A property P(0) holds; 2. for every α ∈ A ∀β < α, P(β) ⇒ P(α). Then P(α) holds for every ordinal α < . Proof In analogy to the previous theorem suppose there existed an ordinal γ for which ¬P(γ) were true. Then B = {β < γ : ¬P(β)} would be non-empty, with minimum β ∈ B. This ordinal cannot be the first element of A, because of P(0). As every ordinal η < β is in A, by induction ∀η < β , P(η) ⇒ P(β ) and P(β ) would hold, against the assumption.
Definition 3.4.1 (Additive and multiplicative classes) Define in R the following classes of sets recursively: 1. for α = 1 let 10 = E 1 ⊆ R : E 1 open and 01 = H 1 ⊆ R : H 1 closed ; 2. for 0 < α < , if α is a successor ordinal
0 ; α0 = 0α−1 σ and 0α = α−1 δ if α is a limit ordinal, having defined β0 and 0β for every β < α, we set α0 =
β 0, take M ∈ R, and choose k, j ∈ Z so that k/2n < k < M and j/2n < j ≤ M. From Eq. (5.1.8) we infer x ∈ R : f n (x) < M = x ∈ R : f n (x) ≤ k = Ak x ∈ R : f n (x) > M = x ∈ R : f n (x) > j = E j .
The Ak are ambiguous of class α and the E j belong in α0 , so { f n }n≥1 is a Borel sequence of class α. The next proposition and lemma prepare the ground for the proof of Theorem 5.1.4. Proposition 5.1.3 (Borel class of simple pointwise limits of simple functions) Any simple Borel function s : R → A of class β > 2, where A = {a1 , . . . , ak } ⊆ R and β is a successor ordinal, is the pointwise limit of a sequence {sn }n≥1 of simple functions of Borel class β − 1 with values in A . If s belongs to class α + 1, with α a limit ordinal, then s is the pointwise limit of a sequence {sn }n≥1 of simple functions of class β < α. Proof By hypothesis s is of class β (or, respectively, α + 1), so Corollary 5.1.1 implies that any set Ai = s −1 ({ai }) is ambiguous of class β (or α + 1), and moreover R = A1 ∪ . . . ∪ Ak . By proposition 3.4.6, Ai = lim Ain , n→∞
(5.1.9)
where every Ain is ambiguous of class β − 1 (or β < α). For every n ∈ N define H1n = A1n , H2n = A2n \ A1n , . . . , Hkn = Akn \ A1n ∪ . . . ∪ Ak−1n . Clearly for every given n these sets are pairwise disjoint and in 0β−1 (or in β 0 and every x ∈ R. Then there exists a sequence {u i }i≥1 of simple functions in class < α such that lim u i (x) = g(x)
i→∞
and
|u i (x) − si (x)| ≤ k
(5.1.10)
for every x ∈ R. Proof Define di (x) = |ti (x) − si (x)|. The maps ti , si are of class < α. Corollary 5.1.1 says Aik = t −1 ({aik }) and Bi j = s −1 ({bi j }), k = 1, . . . , n, j = 1, . . . m, are ambiguous of class < α. Then the difference t −1 ({aik }) − s −1 ({bi j }) = Aik \ Bi j is ambiguous of class < α. The maps di are continuous, and by item 3. in proposition 3.4.2 Ai = di−1 [0, k] = {x ∈ R : |ti (x) − si (x)| ≤ k} is ambiguous of class < α. Hence di is in class < α and its image is finite. Set ti (x) if x ∈ Ai u i (x) = si (x) if x ∈ R \ Ai . By Proposition 5.1.1, 2., the map u i is in class < α, and for every x ∈ R |u i (x) − si (x)| ≤ k, proving the second relation in (5.1.10). In order to show the first relation in (5.1.10) take an arbitrary x ∈ R. As limi→∞ si (x) = f (x) and limi→∞ ti (x) = g(x), if i is large enough |ti (x ) − si (x )| ≤ k, so x ∈ Ai and u i (x ) = ti (x ). Then lim u i (x ) = lim ti (x ) = g(x ).
i→∞
i→∞
But this holds for any x ∈ R, so lim u i (x) = g(x).
i→∞
114
5 Borel Functions and Baire Functions
Theorem 5.1.4 (Borel class of sequences of functions and Borel class of pointwise limits) Any Borel function f : R → R of class β > 2, where β is a successor ordinal, is the pointwise limit of a sequence { f n }n≥1 of functions in class β − 1. If f is in Borel class α + 1 for some limit ordinal α, then f is the pointwise limit of a sequence { f n }n≥1 of functions in class β < α. Proof Since the interval (c, d), c < d, is homeomorphic to R we may as well take f : R → (c, d). By Theorem 5.1.3, for every x ∈ R f (x) = lim f n (x), n→∞
where { f n }n≥1 converges uniformly, and its terms f n are in class β (respectively α + 1) with finite image, so they are simple. By uniform convergence we may assume, without loss of generality, | f n+1 (x) − f n (x)|
0. Given n ∈ N such that 1/2n−1 < and | f n (x ) − f (x )| < ,
5.1 The Lebesgue–Hausdorff Theorem on R
115
call n 0 > n an index such that, for every i > n 0 , |h n,i (x ) − f n (x )| < . For i > n 0 |h i,i (x ) − f (x )| =|h i,i (x ) − h i−1,i (x ) + . . . + h n+1,i (x ) − h n,i (x ) + h n,i (x ) − f n (x ) + f n (x ) − f (x )| ≤|h i,i (x ) − h i−1,i (x )| + . . . + |h n+1,i (x ) − h n,i (x )| + |h n,i (x ) − f n (x )| + | f n (x ) − f (x )|
1 1 < i−1 + . . . + n + + . 2 2 The term in brackets is a partial geometric series, from index n to (i − 1), whence
i−1 1 − 21i 1 1 1 1 1 2n = = − i · 2 = n−1 − i−1 < . 1 k n 2 2 2 1− 2 2 2 k=n 0 were chosen arbitrarily. To show that the previous theorem holds for Borel functions of class β = 2 we shall rely on a result from functional analysis, Tietze’s extension theorem, which is reviewed in Appendix B. Theorem 5.1.5 (Borel functions of class 2) Any Borel function f : R → R of class 2 is the pointwise limit of a sequence of continuous maps. Proof Without loss of generality f can be thought of as valued in a bounded interval (c, d). To begin with we shall also assume f is simple of class 2, so we rename it s : R → A , with A = {y1 , . . . , yk } ⊆ (c, d). For every i = 1, . . . , k the set s −1 ({yi }) = Ai is an Fσ set, i.e. Ai = s −1 ({yi }) =
∞
Fi,n ,
n=1
where Fi,n ⊆ Fi,n+1 , and Fi,n = F¯i,n . By Tietze’s extension (Theorem B.0.2, Appendix B), for every n ∈ N there exists a continuous map f n : R → {y1 , . . . , yk }, hence of class 1, such that f n (x) = yi for every x ∈ Fi,n , i = 1, . . . , k, and whose values lie between the lower and upper bounds of s. As R = A1 ∪ . . . ∪ Ak and Ai ∩ A j = ∅ for i = j, any x ∈ R will belong in Ai , for some i = 1, . . . k. If n is sufficiently large, x ∈ Fi,n , so f n (x ) = yi , i.e. f n (x ) = s(x ). But x ∈ R is arbitrary, so for every x ∈ R
116
5 Borel Functions and Baire Functions
s(x) = lim f n (x). n→∞
Now we tackle the general case and take f : R → (c, d) of class 2. Just as in Theorem 5.1.3, let { f n }n≥1 be a sequence of functions in class 2, each with finite image inside (c, d), and uniformly convergent to f . Uniform convergence ensures we may suppose | f n+1 (x) − f n (x)|
0, by definition of outer measure m ∗ (E) + /2 is no longer the least upper bound, so there exists a sequence of pairwise-disjoint sets {Fn }n≥1 ⊆ A for which E ⊆ ∞ n=1 Fn and m ∗ (E) ≤
∞
m ∗ (Fn ) < m ∗ (E) +
n=1
Hence there is an n such that
∞
n=n +1
< ∞. 2
m ∗ (Fn ) < /2. Let us call F =
n n=1
Fn , so
9.3 Uniqueness of Extension m ∗ and Measure-Space Completion
E \ F = E
n
⊆
Fn
∞
n=1
Fn
n=1 ∞
⇒ m ∗ (E \ F ) ≤
n
217
Fn
n=n +1
Fn
n=n +1
n=1
m ∗ (Fn )
g(x)
= x ∈ X : f (x) ≤ g(x) ∈ Nm ;
this context imposing completeness is essential, because complete spaces contain all zeromeasure sets and all negligible subsets, which are measurable too.
218
9 The Lebesgue Measure
¯ is bounded m-a.e. on X if 4. a map f : X → R
c x ∈ X : | f (x)| < ∞ = x ∈ X : | f (x)| = ∞ ∈ Nm . ♣
¯ determines a partition of R ¯ X into equivaEquality m-a.e. among maps f i : X → R lence classes, where the representatives of one coset might be thought of as one map defined a.e. on X . In particular, Proposition 9.3.6 (A.e. equality of functions and equivalence relation) Let (X, S, m) ¯ X is an equivalence relation. be a complete measure space. Then m-a.e. equality on R Proof Reflexivity and symmetry are evident. Let us show transitivity. Take f, g, h : ¯ and X → R
N1 = x ∈ X : f (x) = g(x) and N2 = x ∈ X : g(x) = h(x) . Call N = N1 ∪ N2 , so N ∈ Nm and for every x ∈ N c = N1c ∩ N2c f (x) = g(x) = h(x). Then
N c ⊆ x ∈ X : f (x) = h(x) ⇔ x ∈ X : f (x) = h(x) ⊆ N . As (X, S, m) is complete,
x ∈ X : f (x) = h(x) ∈ Tm ,
whence f = h a.e. on X .
9.4 The Lebesgue Measure on R Now we specialise to X = R, and let A indicate the algebra A(I) generated by collection I of real intervals. We indicate by the length function of Sects. 8.1–8.2. The outer measure ∗ induced on subsets of R is called Lebesgue outer measure, and in analogy to Definition 9.1.1 and the formula of Proposition 9.1.1 it is defined as follows: for every E ⊆ R
∗
(E) = inf
∞ i=1
(Ii ) : E ⊆
∞ i=1
Ii , Ii ∈ I, ∀i and Ii ∩ I j = ∅, i = j .
9.4 The Lebesgue Measure on R
219
The σ-algebra of ∗ -measurable sets, obtained in Sect. 9.2, for X = R is called σalgebra of Lebesgue measurable sets and denoted L (R) or simply L . Clearly the σ-algebra S(I) generated by real intervals is the Borel σ-algebra on R, which we indicate with B(R). The measure is σ-finite on I, so Theorem 9.3.1 ensures its extension to B(R) is unique. Moreover, Borel sets in R are Lebesgue measurable. Theorem 9.4.1 (Measurability of Borel sets in R) Any Borel subset of R is Lebesgue measurable. Proof We begin by proving the measurability of (a, ∞). Given A ⊆ R and A1 = A ∩ (a, ∞) and A2 = A ∩ (−∞, a], we must show ∗ (A) ≥ ∗ (A1 ) + ∗ (A2 ). This is obvious if ∗ (A) = ∞, so suppose ∗ (A) < ∞. Given > 0, there is a family ∗ {Ii }i≥1 of intervals ∞ covering∗ A such that (A) + is not the greatest lower bound, so we have i=1 (Ii ) < (A) + . Define disjoint intervals (possibly empty) Ii1 = Ii ∩ (a, ∞) and Ii2 = Ii ∩ (−∞, a], so (Ii ) = (Ii1 ) + (Ii2 ) = ∗ (Ii1 ) + ∗ (Ii2 ). As A1 ⊆
∞
1 i=1 Ii ,
A2 ⊆
∞
2 i=1 Ii ,
∗ (A1 ) ≤ ∗
∞
i=1
∗ (A2 ) ≤ ∗
∞ i=1
≤
Ii1 Ii2
≤
∞ i=1 ∞
∗ (Ii1 ) ∗ (Ii2 )
i=1
and so ∞ ∞ ∗ 1 ∗ 2 (Ii ) < ∗ (A) + , (Ii ) + (Ii ) ≤ (A1 ) + (A2 ) ≤ ∗
∗
i=1
i=1
whence the claim. Now, (a, ∞) measurable implies its complement (−∞, a] is measurable. Observe (−∞, b) = ∞ n=1 (−∞, b − 1/n], so (−∞, b) is measurable. As (a, b) = (−∞, b) ∩ (a, ∞), any open interval (a, b) is measurable. Consequently any open set in R is
220
9 The Lebesgue Measure
measurable as a union (finite or countable) of open sets. Taking complements, any closed set is measurable, and so on. Now we know that B(R) ⊆ L , so L = S(I) ∪ T = B ∪ T . Given E ⊆ R such that ∗ (E) = 0, since zero-measure sets are measurable, E is Lebesgue measurable and has zero measure. Then, for every T ⊆ E, we have ∗ (T ) ≤ ∗ (E), so ∗ (T ) = 0 and T is also Lebesgue measurable of zero measure. But then the Lebesgue measure ∗ on R and the collection L of Lebesgue measurable sets are complete, so (R, L , ∗ ) is the completion of (R, B, ∗ ). We have said in earlier sections that any E ⊆ X can be approximated by a smaller set F and a larger set G ∈ S(A). Now we would like to know if there is a similar situation for sets of the class L (R). Theorem 9.4.2 (Approximation in L (R) by open and closed subsets of R) A set E ⊆ R is Lebesgue measurable (⇒ E ∈ L ) if and only if, for any > 0, there exists an open set G ⊇ E, or a closed set F ⊆ E, in R such that ∗ (G \ E) < , or ∗ (E \ F) < . Proof Let us prove the first one (necessary part). Let E be Lebesgue measurable and suppose for a moment ∗ (E) < ∞, so ∀Y ⊆ R ∗ (Y ) = ∗ (Y ∩ E) + ∗ (Y ∩ E c ). Given > 0 there certainly exists an open set G ⊆ R containing E, with ∗ (G ∩ E c ) < , and such that ∗ (G) = ∗ (G ∩ E) + ∗ (G ∩ E c ). ∞ E i , with E i ∩ E j = ∅, i = j and ∗ (E i ) < In case ∗ (E) = ∞, we can set E = i=1 ∞ for every i ∈ N. As before, since all E i are measurable, given > 0, for every i ∈ N there is an open set G i ⊆ R containing E i with ∗ (G i ∩ E ic ) < /2i and such that ∗ G i = ∗ G i ∩ E i + ∗ G i ∩ E ic for every i ∈ N. In general, setting G =
G \ E) ⊆
∞ i=1
∞ i=1
G i with E ⊆ G,
∞ ∗ G i ∩ E ic < . G i \ E i and ∗ G \ E) ≤ i=1
Vice versa, for any > 0 there exists an open set G containing E, with ∗ (E) < ∞, such that ∗ (G\E) < . Given Y ⊆ R,
9.4 The Lebesgue Measure on R
221
Y ∩ E c = (Y ∩ G c ) ∪ (Y ∩ (G\E)). Subadditivity and monotonicity of ∗ , together with the measurability of G, imply ∩ E) + ∗ (Y ∩ G c ) + ∗ (Y ∩ (G\E)) ∗ (Y ∩ E) + ∗ (Y ∩ E c ) ≤ ∗ (Y ⊆(Y ∩G)
∗
⊆(G\E)
∗
∗
≤ (Y ∩ G) + (Y ∩ G ) + (G \ E) < ∗ (Y ) + . Hence
c
∗ (Y ) ≥ ∗ (Y ∩ E) + ∗ (Y ∩ E c ),
i.e. E is Lebesgue measurable. The case ∗ (E) = ∞ is left as an exercise. Now we come to the second one (necessary part). The linear set E is in L (R), so E c ∈ L (R). Using the previous part there exists an open set G such that E c ⊆ G and ∗ (G \ E c ) < . Put F = G c , then F ⊆ E is a closed set, and since G \ E c = E \ F, we have ∗ (E \ F) < , i.e. the claim. Conversely for any > 0 there exists a closed set F contained in E, with ∗ (E) < ∞, such that ∗ (E\F) < . Given Y ⊆ R, Y ∩ E = (Y ∩ F) ∪ (Y ∩ (E\F)). But ∗ is subadditive and monotone, and F measurable, so c ∗ (Y ∩ E) + ∗ (Y ∩ E c ) ≤ ∗ (Y ∩ F) + ∗ (Y ∩ (E\F)) + ∗ (Y ∩ E) c ⊆(E\F)
⊆(Y ∩F )
≤ ∗ (Y ∩ F) + ∗ (Y ∩ F c ) + ∗ (E \ F) < ∗ (Y ) + . Hence ∗ (Y ) ≥ ∗ (Y ∩ E) + ∗ (Y ∩ E c ), i.e. E is Lebesgue measurable. The case ∗ (E) = ∞ is left as an exercise.
We have seen any E ∈ L (R) can be approximated by open and closed sets. This includes elements of L (R) with finite measure. In this case the closed sets F approximating E ∈ L (R), ∗ (E) < ∞, are compact. Proposition 9.4.1 (Approximation of sets in L (R) by compact sets) A set E ⊆ R, with ∗ (E) < ∞, is Lebesgue measurable (⇒ E ∈ L ) if and only if, for any > 0, there exists a compact set K ⊆ E such that
222
9 The Lebesgue Measure
∗ (E \ K ) < .
Proof Immediate corollary of the previous theorem.
Now it is clear we may approximate Lebesgue measurable subset of R by open sets, closed sets or compact sets up to sets with arbitrarily small measure. In the next theorem, using countable intersections of open sets or unions of closed sets, we approximate Lebesgue measurable subsets of R up to zero-measure sets. Theorem 9.4.3 (Approximation for sets in L (R) by G δ and Fσ sets) A set E ⊆ R is Lebesgue measurable if and only if there exists a G ∈ G δ (F ∈ Fσ ) such that E ⊆ G (F ⊆ E) and (9.4.1) ∗ (G\E) = 0(∗ (E\F) = 0). Proof Start with E Lebesgue measurable. By Theorem 9.4.2, for every n ∈ N there exist a closed set Fn and an open set G n such that Fn ⊆ E ⊆ G n plus ∗ (G n \E)
0 there exists an open set G ⊆ R, with E ⊆ G, such that (G \ E) < if and only if
9.4 The Lebesgue Measure on R
223
(E) = inf (G) : G is open in R and E ⊆ G . Proof Necessary implication: For any > 0, we have a G ⊆ R, E ⊆ G such that (G \ E) < . As E ⊆ G = E ∪ (G \ E), (E) ≤ (G ) = (E) + (G \ E) < (E) + . This is for any open G containing E such that (G \ E) < , so the inequality holds when taking the infimum inf{(G)}. Also > 0 is arbitrary, so we obtain the weak inequality
(E) ≤ inf (G) : G is open in R and E ⊆ G ≤ (E). Sufficient implication: take E ∈ L (R) such that (E) < ∞ and
(E) = inf (G) : G is open in R and E ⊆ G .
(9.4.2)
Pick > 0, so (E) + is not the greatest lower bound and there exists G ⊆ R open containing E such that (E) + > (G). As G = E ∪ (G \ E), (G) = (E) + (G \ E) < (E) + , and then ∞ (G \ E) < . In case (E) = ∞, the Eq. 9.4.2 is itrue and we can set E i where (E i ) < ∞ for all i, so (G i \ E i ) < /2 and E = i=1 (G \ E) ≤
∞
(G i \ E i ) < .
i=1
Remark 9.4.1 The dual statement, so to speak, to Proposition 9.4.2 goes like this: “Let E ∈ L (R). For every > 0 there exists a closed set F ⊆ R, with F ⊆ E, such that (E \ F) < if and only if
(E) = sup (F) : F is closed in R and F ⊆ E .” In the case of sets E ∈ L (R), with (E) < 0, the statement reads: “Let E ∈ L (R) satisfy 0 < (E) < ∞. Then for every > 0 there exists a compact subset K ⊆ E such that (E \ K ) < if and only if
(E) = sup (K ) : K is compact in R and K ⊆ E .” ♣
224
9 The Lebesgue Measure
The properties of on L (R) considered in Proposition 9.4.2 and Remark 9.4.1 highlight the relationship between L-measurable sets and elementary topological families, namely open, closed and compact subsets of R (provided (E) < ∞ for the latter). These considerations permit to generalize properties of on R to a measure m on a generic metric space (X, d). Definition 9.4.1 (Outer and inner regular measures) Given a metric space (X, d), a measure m on the collection of Borel sets B(X ) is called a Borel measure. We further say it is: 1. outer regular if, for every E ∈ B(X ),
m(E) = inf m(G) : G is open and E ⊆ G
= sup m(F) : F is closed and F ⊆ E 2. inner regular if, for every E ∈ B(X ), 0 < m(E) < ∞, one has
m(E) = sup m(K ) : K is compact and K ⊆ E . ♦ Proposition 9.4.2 and Remark 9.4.1 now tell us that the Lebesgue measure on R is both inner and outer regular, i.e. just regular. The next theorem explains that on complete and separable metric spaces (X, d), a finite measure m on B(X ) is (outer and inner) regular. Theorem 9.4.4 (Regularity conditions for finite Borel measures) On a metric space (X, d), a finite Borel measure on B(X ) is outer regular. If (X, d) is additionally complete and separable, a finite Borel measure on B(X ) is also inner regular, and hence regular. Proof For the first claim we must prove that, for every E ∈ B(X ),
m(E) = inf m(G) : G is open and E ⊆ G
= sup m(F) : F is closed and F ⊆ E .
(9.4.3)
By virtue of Proposition 9.4.2 and Remark 9.4.1 proving (9.4.3) amounts to showing that for every > 0 there exists an open set G and a closed set F such that4 F ⊆ E ⊆ G and m(G \ F ) < . 4 Note
.
G \ F = E ∪ G \ E) \ F = E \ F) ∪ G \ E \ F = E \ F ∪ G \ E .
9.4 The Lebesgue Measure on R
225
Define
S = E ∈ B(X ) : F closed, G open, F ⊆ E ⊆ G and m(G \ F ) < . It suffices to prove S is a σ-algebra containing all closed subsets, so that S = B(X ). For this, if E ∈ S then given > 0 there exist F, G such that F ⊆ E ⊆ G, and correspondingly G c ⊆ E c ⊆ F c and m(F c \ G c ) < η. Let {E i }i≥1 be a sequence in S. Given > 0, by assumption for every i we can choose an open set G i and a closed set Fi such that Fi ⊆ E i ⊆ G i and m(G i \ Fi )
0 we can choose n ∈ N such that m(G n ) − m(F) < . Clearly F ⊆ G n and m(G n \ F) < , i.e. F ∈ S. But F ∈ C is arbitrary, so S is a σ-algebra containing all closed subsets, i.e. S = B(X ). Now we come to the second statement and prove, for every E ∈ B(X ),
226
9 The Lebesgue Measure
m(E) = sup m(K ) : K is compact and K ⊆ E . This is equivalent to showing that for every > 0 there exists a compact set K such that m(E \ K ) < . Take > 0 and choose a closed set F ⊆ E such that m(E \ F) < /2. As X is separable, it contains a countable dense set D = {x1 , . . . , xk , . . .}. Then for every k ∈ N we can choose a sequence {xki }i≥1 ⊆ D such that the family of balls B = {B(xki , 1/k)}i≥1 , where
B(xki , 1/k) = y ∈ X : d(xki , y) < 1/k and xki ∈ D , is an open cover of F: F⊆
∞
B(xki , 1/k).
i=1
Define
¯ ki , 1/k) = y ∈ X : d(xki , y) ≤ 1/k and xki ∈ D . B(x As m is finite on B(X ), for every k ≥ 1 there exists n k such that nk ¯ ki , 1/k) < m F B(x i=1
2k+1
.
Now consider K =
nk ∞
¯ ki , 1/k); B(x
k=1 i=1
this is closed and totally bounded in X and, by completeness, Proposition A.0.5 guarantees K is compact. Furthermore, E \ K = (E \ F) ∪ (F \ K ), so nk ∞ ¯ m(E \ K ) ≤ m(E \ F) + m(F \ K ) < + m F B(xki , 1/k) 2 k=1 i=1 nk ∞ ¯ ki , 1/k) F B(x = +m 2 k=1 i=1 ∞ nk ¯ ≤ + m F < , B(xki , 1/k) 2 k=1 i=1 proving the claim.
9.4 The Lebesgue Measure on R
227
We close the section with a few results on the translation-invariance of the Lebesgue measure. We want to know whether, given a set E ⊆ R and x ∈ R, the shift E + x = {y + x : y ∈ E} has measure (E). Recall the Lebesgue measure is nothing but the restriction of the outer measure ∗ to Lebesgue measurable sets. Given E ⊆ R, (E) = inf
∞
(Ii ) : E ⊆
i=1
∞
Ii , Ii ∈ I .
i=1
We want to prove (E) = (E + x). If E can be covered by intervals Ii then (E + x) is covered by intervals (Ii + x). Equivalently, if (E + x) is covered by (Ii + x) then E is covered by (Ii + x − x) = Ii : E⊆
∞
Ii ⇔ (E + x) ⊆
i=1
∞ (Ii + x). i=1
Hence (E) = (E + x), i.e. the Lebesgue measure of a set E ⊆ R is invariant under translations. That said, it is important to know whether every translated-set of the family L (R) is measurable. Proposition 9.4.3 (L-measurability of L (R)-translated sets) If E ∈ L (R), for every x0 ∈ R the shifted set
E + x0 = (x + x0 ) ∈ R : x ∈ E is L-measurable. Proof E is Lebesgue measurable, so for every Y ⊆ R (Y ) = (Y ∩ E) + (Y ∩ E c ). But is translation-invariant, so (Y ∩ E) = ((Y ∩ E) + x0 ) ∀Y ⊆ R. In the right hand we can swap the order of intersection and translation, to the effect that (Y ∩ E) = ((Y + x0 ) ∩ (E + x0 )). Similarly (Y ∩ E c ) = ((Y ∩ E c ) + x0 ) = ((Y + x0 ) ∩ (E c + x0 )). As (E c + x0 ) = (E + x0 )c , then, (Y ∩ E c ) = ((Y ∩ E c ) + x0 ) = ((Y + x0 ) ∩ (E + x0 )c ). Hence (Y ) = ((Y + x0 ) ∩ (E + x0 )) + ((Y + x0 ) ∩ (E + x0 )c ).
228
9 The Lebesgue Measure
Since (Y ) = (Y − x0 ) for any x0 ∈ R, the previous relation reads (Y ) = ((Y + x0 − x0 ) ∩ (E + x0 )) + ((Y + x0 − x0 ) ∩ (E + x0 )c ) = (Y ∩ (E + x0 )) + (Y ∩ (E + x0 )c ),
and E + x0 is L-measurable.
Overall, the Lebesgue measure on R is invariant under translations, σ-finite and regular.
9.5 Cardinality of L (R) and B(R) We saw the Lebesgue σ-algebra L (R) is the completion of the Borel σ-algebra B(R). In this sense L (R) = B(R) ∪ T , so B(R) ⊆ L (R) ⊆ P(R). The aim of this section is to check whether B(R) is a proper subfamily of L (R), i.e. if there exist Lebesgue measurable sets that are not Borel sets. There are two ways to address the matter. The first is to build a Lebesgue measurable sets that is not Borel. The explicit construction is, alas, not that easy.5 The second consists in showing the cardinality of B(R) is less than the cardinality L (R). The first proposal was undertaken by the Russian mathematician Suslin (1894– 1919), who found the first example of Lebesgue measurable set that is not Borel. Suslin discovered a very important large family, the so-called analytical sets, which can be described as continuous images of Borel sets, each of which is Lebesgue measurable (assuming bounded). The family contains Borel sets, but is much bigger. It would be interesting to establish if there existed a bounded, non-Lebesgue measurable set. But it is impossible to answer this question with arguments that only rely on cardinality. What we will do is compare the cardinalities of B(R) and L (R). Let us begin with the latter, and take the Cantor set for guidance. Here 2 22 2n 1 2 n 1 + 2 + 3 + · · · + n+1 + · · · = = 1. 3 3 3 3 3 n=0 3 ∞
(G) =
Every Cn is measurable, hence also C = ∞ n=1 C n is measurable. More specifically, so the (C1 ) = 2/3; (C2 ) = (2/3)2 ; . . . , (Cn ) = (2/3)n , measure’s continuity from above (Theorem 8.2.4) gives: n ∞ 2 (C) = Cn = lim (Cn ) = lim = 0. n→∞ n→∞ 3 n=1 5 We
shall exhibit a non-constructive example of a set that is not Borel when we discuss the VitaliCantor map (Sect. 10.5).
9.5 Cardinality of L (R) and B (R)
229
C has the cardinality of the continuum (Example 2.3.3), and now we know it has measure zero, so C ∈ N . As the Lebesgue σ-algebra is complete, C ∈ L (R), any T ⊆ C has (T ) = 0, hence T ∈ L (R). Therefore P(C) ⊆ L (R), and |L (R)| ≥ 2c . But L (R) ⊆ P(R), so finally |L (R)| = 2c . That the cardinalities of L (R) and P(R) are the same does not say anything about the possible existence of non-Lebesgue measurable subsets. Ulam’s theorem6 sheds light on the matter: “assuming the continuum hypothesis, the Lebesgue measure cannot be extended to the collection of all subsets of R”. Hence not all subsets of R are Lebesgue measurable. Prior to this, in 1905, Giuseppe Vitali7 had clarified the issue by constructing, with the axiom of choice (AC), a non-Lebesgue measurable subset of the real numbers. Without the AC or the continuum hypothesis we cannot know if there exist subsets of R that are not Lebesgue measurable. If we replace the AC with the statement “every subsets of R is Lebesgue measurable” in ZF set theory, the new system stays consistent (as shown by R. Solovay in 1970). To the day, the question whether Lebesgue measurable sets exhaust all subsets of R remains open. Shifting the focus onto Borel sets, by the Lebesgue-Hausdorff theorem we know B(R) has cardinality c. In the next proposition we prove it directly. Proposition 9.5.1 (Cardinality of the Borel σ-algebra) The Borel σ-algebra B(R) has the cardinality of the continuum. Proof Recall 10 is the family of open sets. Since the Borel σ-algebra can be generated by 10 , we put S(10 ) = B(R), and by Theorem 7.2.1 α0 . S(10 ) = α 0, ⎪ ⎨ m((0, 0 if x = 0, F(x) = ⎪ ⎩ −m((x, ˜ 0]) if x < 0. We claim this choice satisfies m((a, ˜ b]) = F(b) − F(a) for every (a, b]. To see this, just compute F(b) − F(a) = m((0, ˜ b]) − m((0, ˜ a]) = F(b) − F(0) − F(a) + F(0) = m((a, ˜ b]). Now let us show F is increasing. Take x, y ∈ R such that x < y and suppose they are positive: ˜ y]) = m((0, ˜ x] ∪ (x, y]) = m((0, ˜ x]) + m((x, ˜ y]) ≥ F(x). F(y) = m((0, If both are negative, with y < x, F(y) = −m((y, ˜ 0]) = −m˜ (y, x] ∪ (x, 0] = −m((y, ˜ x]) − m((x, ˜ 0]) ≤ −m((y, ˜ x]) + F(x) ≤ F(x). Finally if y < 0 < x, 0 ≤ m((y, ˜ x]) = F(x) − F(y) ⇔ F(y) ≤ F(x) and F is indeed increasing. Now we prove F is right-continuous provided m˜ is completely additive. Take x ∈ R and a sequence {xn }n≥1 with xn ↓ x. It suffices to show F(xn ) → F(x). In case x ≥ 0, 0 ≤ x < · · · ≤ xn ≤ · · · ≤ x2 ≤ x1 , (0, x1 ] = (0, x] ∪
(x, x1 ] .
∞
n=2 (x n ,x n−1 ]
9.6 The Lebesgue-Stieltjes Measure
239
By countable additivity, ˜ x]) + m((x, ˜ x1 ]) m((0, ˜ x1 ]) = m((0, ∞ ∞ F(x1 ) =F(x) + m˜ (xn , xn−1 ] = F(x) + m((x ˜ n , xn−1 ]) n=2
=F(x) + lim
k
k→∞
=F(x) + lim
k→∞
n=2
m((x ˜ n , xn−1 ])
n=2
k [F(xn−1 ) − F(xn )]
n=2
F(x1 ) − F(x2 ) + F(x2 ) − . . . + F(xk−1 ) − F(xk )
=F(x) + lim
k→∞
=F(x) + F(x1 ) − lim F(xk ), k→∞
and hence limk→∞ F(xk ) = F(x) for every x ≥ 0. In case x < 0, x ≤ · · · ≤ xn ≤ · · · ≤ x2 ≤ x1 < 0, and (x, 0] =
(x, x ] ∪(x1 , 0]. 1
∞
n=2 (x n ,x n−1 ]
Countable additivity gives ˜ 1 , 0]) m((x, ˜ 0]) =m((x, ˜ x1 ]) + m((x ∞ ∞ −F(x) =m˜ (xn , xn−1 ] − F(x1 ) = m((x ˜ n , xn−1 ]) − F(x1 ) n=2
= lim
k
k→∞
= lim
n=2
k
k→∞
= lim
k→∞
n=2
m((x ˜ n , xn−1 ]) − F(x1 ) [F(xn−1 ) − F(xn )] − F(x1 )
n=2
F(x1 ) − F(x2 ) + F(x2 ) − . . . + F(xk−1 ) − F(xk ) − F(x1 )
=F(x1 ) − lim F(xk ) − F(x1 ), k→∞
and similarly, limk→∞ F(xk ) = F(x) for every x < 0.
We have shown one can construct completely additive set functions on right-closed intervals provided they arise through right-continuous, increasing maps. Moreover,
240
9 The Lebesgue Measure
the measure m˜ on B(R) such that m˜ (a, b] < ∞, ∀a, b ∈ R, a < b, can be viewed as the Lebesgue-Stieltjes measure for some increasing, right-continuous F. The measure m˜ is totally finite iff −∞ < F(−∞) < F(∞) < ∞. In that case, in Proposition 9.6.2 the map F : R → R will be given by F(x) = m˜ (−∞, x] . Right-continuous increasing maps F : R → R, as characterised by Propositions 9.6.1, 9.6.2, are called cumulative distribution functions of m˜ on R. In case m(R) ˜ = 1, m˜ is called probability measure and the corresponding F is the cumulative distribution function where ˜ = 1. lim F(x) − F(−x) = m(R)
x→∞
Chapter 10
Measurable Functions
10.1 Introduction For a function f to be measurable the pre-image of any measurable set in the codomain must be measurable. We met a similar situation when discussing the Lebesgue–Hausdorff theorem, and more specifically when we defined the class of Borel functions (Definition 5.1.1). On the basis of that definition and the subsequent discussion about measurable sets, we can say Borel functions represent a special instance of family of measurable functions. Now the time has come to address the matter from a broader angle. Definition 10.1.1 (Measurable functions1 ) Let (X, S(A)) and (X , S (A)) be measurable spaces. We shall say that a function f : (X, S(A)) → (X , S (A)) is (S(A), S (A))-measurable, or just measurable, if f −1 (A ) ∈ S(A) for every A ∈ S (A). ♦ Theorem 10.1.1 (Characterisation of measurable functions) Given measurable spaces (X, S(A)), (X , S(A )), a map f : X → X is (S(A), S(A ))-measurable if and only if, for every A ∈ A , f −1 (A ) ∈ S(A). Proof (⇒): trivial. (⇐): suppose f −1 (A ) ∈ S(A) for every A ∈ A . Take F = A ∈ A : f −1 (A ) ∈ S(A) .
1 Measurable
functions are called random variables in probability theory.
Clearly A′ ⊆ F. As f⁻¹(∅) = ∅ ∈ S(A), then ∅ ∈ F. Let A′ ∈ F, so f⁻¹(A′) ∈ S(A), and then (f⁻¹(A′))ᶜ = f⁻¹((A′)ᶜ) ∈ S(A), i.e. (A′)ᶜ ∈ F. Hence X′ ∈ F. Let {Aᵢ′}ᵢ≥₁ be a sequence in F, then by definition {f⁻¹(Aᵢ′)}ᵢ≥₁ ⊆ S(A), and so
f⁻¹(⋃_{i=1}^∞ Aᵢ′) = ⋃_{i=1}^∞ f⁻¹(Aᵢ′) ∈ S(A).
In other words F is closed under countable unions, whence it is a σ-algebra containing A′; but S(A′) is the σ-algebra generated by A′, so S(A′) ⊆ F, and therefore f⁻¹(A′) ∈ S(A) for every A′ ∈ S(A′).
Observe that f is measurable if to each set in S(A′) there corresponds under f⁻¹ at least one set in S(A). Therefore a measurable function f, from the measurable space (X, S(A)) to the measurable space (X′, S(A′)), is such that the σ-algebra f⁻¹(S(A′)) is contained in the σ-algebra S(A). Consequently the source σ-algebra contains as many (measurable) sets as the target σ-algebra.
Definition 10.1.2 (Borel and Lebesgue measurable functions) Let (X, T), (X′, T′) be topological spaces with respective Borel σ-algebras B(T), B(T′), and let L(T) denote the completion of B(T). A map f : X → X′ is called Borel measurable if it is (B(T), B(T′))-measurable, and we call it Lebesgue measurable if it is (L(T), B(T′))-measurable. ♦
It is straightforward that a Borel measurable map f : X → X′ is Lebesgue measurable, since B(T) ⊆ L(T).
Proposition 10.1.1 (Borel measurability of continuous maps) Given topological spaces (X, T), (X′, T′), a continuous map f : X → X′ is Borel measurable.
Proof By continuity, f⁻¹(A′) ∈ T for every A′ ∈ T′. As the open sets in T, T′ generate the Borel σ-algebras B(T) and B(T′), by Theorem 10.1.1 f is (B(T), B(T′))-measurable.
So f : X → X′ continuous implies f⁻¹(A′) ∈ B(T) for every Borel set A′ ∈ B(T′). In Example 10.5.1 we will see the same does not happen for Lebesgue measurable sets, because the continuous pre-image of a Lebesgue measurable set is not Lebesgue measurable, in general. Moreover, given topological spaces (X, T), (X′, T′), (X″, T″), if f : X → X′ and g : X′ → X″ are continuous and in addition f is Borel or Lebesgue measurable and g is Borel measurable, then the composite g ◦ f : X → X″ is of the same type as f.
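To see Theorem 10.1.1 at work on a toy example, the following Python sketch (an illustration of ours, with hypothetical helper names, not part of the text) generates the σ-algebra of a four-point space from a partition and verifies measurability of a map by testing pre-images of the generating sets only.

from itertools import chain, combinations

def sigma_algebra_from_partition(blocks):
    """All unions of blocks of a finite partition: the sigma-algebra they generate."""
    subsets = chain.from_iterable(combinations(blocks, r) for r in range(len(blocks) + 1))
    return {frozenset().union(*c) if c else frozenset() for c in subsets}

X = {1, 2, 3, 4}
S = sigma_algebra_from_partition([frozenset({1, 2}), frozenset({3}), frozenset({4})])

f = {1: 'a', 2: 'a', 3: 'b', 4: 'b'}               # f : X -> X' = {'a', 'b'}
generators = [frozenset({'a'}), frozenset({'b'})]  # a family generating P(X')

def preimage(f, B):
    return frozenset(x for x in f if f[x] in B)

# By Theorem 10.1.1 it suffices to test the generators.
print(all(preimage(f, B) in S for B in generators))   # True: f is measurable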
10.2 R̄-valued Measurable Functions
We turn to maps f : X → R̄ defined on a measurable space (X, S), where X is non-empty and S the σ-algebra of subsets of X. In analogy to Definition 10.1.1 we can introduce measurable functions with values in the extended reals.
Definition 10.2.1 (Measurable functions with values in R̄) Given a measurable space (X, S), a function f : X → R̄ is S-measurable or simply measurable if it is (S, B(R̄))-measurable. We indicate with M(X, S) the collection of S-measurable functions, with M⁺(X, S) that of non-negative S-measurable maps. ♦
The terminology leads to a result summarising the various equivalent ways to describe R̄-valued measurable maps.
Proposition 10.2.1 (Equivalent ways to express measurability of maps) Let (X, S) be a measurable space, f : X → R̄ a map. The following facts are equivalent:
1. f is S-measurable;
2. f⁻¹([α, ∞]) = {x ∈ X : f(x) ∈ [α, ∞]} ∈ S, ∀α ∈ R;
3. f⁻¹((α, ∞]) = {x ∈ X : f(x) ∈ (α, ∞]} ∈ S, ∀α ∈ R;
4. f⁻¹([−∞, α]) = {x ∈ X : f(x) ∈ [−∞, α]} ∈ S, ∀α ∈ R;
5. f⁻¹([−∞, α)) = {x ∈ X : f(x) ∈ [−∞, α)} ∈ S, ∀α ∈ R;
6. f⁻¹({+∞}) = {x ∈ X : f(x) ∈ ⋂_{n=1}^∞ (n, ∞]} ∈ S;
7. f⁻¹({−∞}) = {x ∈ X : f(x) ∈ ⋂_{n=1}^∞ [−∞, −n)} ∈ S.
Proof To show 1. ⇔ 2. observe that B(R̄) is generated by the family of rays {[α, ∞] ⊆ R̄ : α ∈ R}, so, by Theorem 10.1.1, the condition "the set f⁻¹([α, ∞]) ∈ S, ∀α ∈ R" is equivalent to "∀A′ ∈ B(R̄), the set f⁻¹(A′) ∈ S",
i.e. Definition 10.1.1. The equivalences 1. ⇔ 3., 1. ⇔ 4., 1. ⇔ 5. are completely similar. So let us assume f is measurable (as of Definition 10.1.1, or in the sense of 2., 3., 4. or 5.). Since f⁻¹((α, ∞]) is measurable, setting α = n gives
⋂_{n=1}^∞ f⁻¹((n, +∞]) = f⁻¹(⋂_{n=1}^∞ (n, +∞]) = f⁻¹({+∞}) ∈ S,
which is measurable. And since f⁻¹([−∞, α)) is measurable, using α = −n we find
⋂_{n=1}^∞ f⁻¹([−∞, −n)) = f⁻¹(⋂_{n=1}^∞ [−∞, −n)) = f⁻¹({−∞}) ∈ S,
which is measurable, too.
Remark 10.2.1 1. Based on the above proposition we can also say f : X → R̄ is S-measurable iff f⁻¹((α, β)) ∈ S for every α, β ∈ R, α ≤ β. In fact
f⁻¹((α, β)) = f⁻¹([−∞, β)) ∩ f⁻¹((α, ∞]) ∈ S,
and similarly,
f⁻¹([α, β)) = f⁻¹([−∞, β)) ∩ f⁻¹([α, ∞]) ∈ S;
f⁻¹((α, β]) = f⁻¹([−∞, β]) ∩ f⁻¹((α, ∞]) ∈ S;
f⁻¹([α, β]) = f⁻¹([−∞, β]) ∩ f⁻¹([α, ∞]) ∈ S.
In conclusion, f : X → R̄ is S-measurable iff f⁻¹(I) ∈ S for every interval I ⊆ R̄. But open sets are countable unions of open pairwise-disjoint intervals, so f : X → R̄ is S-measurable iff f⁻¹(G) ∈ S for every open G ∈ B(R̄): namely G = ⋃_{i=1}^∞ Iᵢ and
f⁻¹(G) = f⁻¹(⋃_{i=1}^∞ Iᵢ) = ⋃_{i=1}^∞ f⁻¹(Iᵢ) ∈ S.
Take X = R in Proposition 10.2.1: then f : R → R̄ is Lebesgue or Borel measurable iff a. f⁻¹(I) ∈ L(R), or b. f⁻¹(I) ∈ B(R), respectively, for every interval I ⊆ R̄.
2. For any A ∈ S the indicator function ϕ_A(x) is S-measurable. As ϕ_A = 1 for every x ∈ A and ϕ_A = 0 for every x ∉ A, then for all β, α ∈ R the pre-image ϕ_A⁻¹([β, α)) equals: X if β ≤ 0 and α > 1; A if β > 0 and α > 1; Aᶜ if β ≤ 0 and α ≤ 1; ∅ if α ≤ 0. This is measurable since X, A, Aᶜ, ∅ ∈ S.
3. Constant maps are S-measurable, as continuous. Let f : X → R̄ be the constant map f(x) = k. Then
f⁻¹((α, ∞]) = {x ∈ X : f(x) > α} = X if α < k, ∅ if α ≥ k,
and S-measurability follows.
4. If f : X → R̄ is continuous everywhere but for an enclosable set, it is S-measurable. Let E ∈ S be an enclosable set on which f is not continuous, and recall enclosable implies S-measurable, of zero measure. Given an interval I ⊆ R̄,
f⁻¹(I) = {x ∈ X : f(x) ∈ I} = ({x ∈ X : f(x) ∈ I} ∩ E) ∪ ({x ∈ X : f(x) ∈ I} ∩ Eᶜ).
So {x ∈ X : f(x) ∈ I} ∩ E is, as a subset of E, S-measurable of zero measure. But on Eᶜ f is continuous, so {x ∈ X : f(x) ∈ I} ∩ Eᶜ is S-measurable. Hence f⁻¹(I) = {x ∈ X : f(x) ∈ I} is S-measurable.
5. A monotone map f : R → R (e.g., increasing) preserves the linear order, so for any I ∈ {[α, ∞) : α ∈ R}, f⁻¹(I) is ∅ or an interval, hence in class B(R).
♣
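A quick numerical check of item 2. above, again as a Python sketch of ours (not the book's notation): the pre-image of a ray under an indicator function is always one of ∅, A or X, hence measurable.

def ray_preimage_indicator(A, X, alpha):
    """{x in X : phi_A(x) > alpha} for the indicator phi_A of A inside X."""
    if alpha < 0:
        return set(X)          # every value (0 or 1) exceeds alpha
    if alpha < 1:
        return set(A)          # only points of A, where phi_A = 1
    return set()               # no value exceeds alpha >= 1

X = {1, 2, 3, 4, 5}
A = {2, 4}
for alpha in (-0.5, 0.3, 1.7):
    print(alpha, ray_preimage_indicator(A, X, alpha))
# -0.5 -> X, 0.3 -> A, 1.7 -> empty set: always a measurable set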
¯ be in Proposition 10.2.2 (Comparing measurable functions) Let f, g : X → R M (X, S). The following sets are S-measurable 1. x ∈ X : f (x) < g(x); 2. x ∈ X : f (x) > g(x); 3. x ∈ X : f (x) = g(x); 4. x ∈ X : f (x) ≥ g(x); 5. x ∈ X : f (x) ≤ g(x) . Proof Let us commence with 1.. For every x ∈ X there is a q ∈ Q such that f (x) < q < g(x), so
x ∈ X : f (x) < g(x) = x ∈ X : f (x) < q < g(x) q∈Q
=
q∈Q
=
x ∈ X : f (x) < q ∩ x ∈ X : q < g(x)
f −1 ([−∞, q)) ∩ g −1 ((q, ∞]) .
q∈Q
q)) and g −1 ((q, ∞]) are S-measurable, and so is x ∈ X : Clearly f −1 ([−∞, f (x) < g(x) . From this, also
c
x ∈ X : f (x) < g(x)
is S-measurable. And similarly for
= x ∈ X : f (x) ≥ g(x)
x ∈ X : g(x) < f (x) ∈ S c ⇔ x ∈ X : g(x) < f (x) = x ∈ X : g(x) ≥ f (x) ∈ S and
x ∈ X : f (x) = g(x) = x ∈ X : f (x) ≥ g(x) ∩ x ∈ X : f (x) ≤ g(x) ∈ S.
Now we shall examine if and when measurability is preserved by operations between measurable functions.
Theorem 10.2.1 (Elementary operations between measurable functions) Let f, g : (X, S) → R̄ be in M(X, S) and c ∈ R. Define (a) 0 · (±∞) = 0, by convention; (b) f(x) + g(x) = 0 and f(x) · g(x) = 0 for all x ∈ E₁ ∪ E₂, where E₁ = {x ∈ X : f(x) = −∞} ∩ {x ∈ X : g(x) = +∞} and E₂ = {x ∈ X : f(x) = +∞} ∩ {x ∈ X : g(x) = −∞} are measurable (cf. 6.–7. in Proposition 10.2.1). Then the following belong to M(X, S):
c · f (x) = (c · f )(x); f (x) + g(x) = ( f + g)(x); f (x) · f (x) = f 2 (x); f (x) · g(x) = ( f · g)(x); f (x)/g(x) if g(x) = 0, ∀x ∈ X ; | f (x)|.
Proof 1. For c = 0 we have −1
(c · f ) ((α, ∞]) = {x ∈ X : c · f (x) > α} = =0
X if α < 0 ∅ if α ≥ 0,
whence 0 · f (x) is S-measurable. For c = 0, (c · f )
−1
((α, ∞]) = x ∈ X : c · f (x) > α =
x ∈ X : f (x) > α/c if c > 0 x ∈ X : f (x) < α/c if c < 0.
Now f measurable implies these sets are measurable, so c · f (x) ∈ M (X, S).
2. Without loss of generality take x ∈ X such that f (x) + g(x) > α, i.e. f (x) > α − g(x). Then there is a rational number q such that f (x) > q > α − g(x), so f (x) > q and g(x) > α − q. Immediately ( f + g)−1 ((α, ∞]) = x ∈ X : f (x) + g(x) > α x ∈ X : f (x) > q ∩ x ∈ X : α − q < g(x) . = q∈Q
By assumption f (x), g(x) are S-measurable so the last term is a countable union in S, making f (x) + g(x) S-measurable. 3. ( f 2 )−1 ((α, ∞]) = x ∈ X : f 2 (x) > α X if α < 0 = √ √ x ∈ X : f (x) > α ∪ x ∈ X : f (x) < − α
if α ≥ 0.
In either case f 2 (x) is S-measurable. 4. Since f 2 (x) + g 2 (x) − f 2 (x) − g 2 (x) + 2 f (x)g(x) + 2 f (x)g(x) 4 2 2 f (x) + g(x) − f (x) − g(x) , = 4
f (x) · g(x) =
by parts 2.–3. the product map is S-measurable. 5. If we show 1/g is S-measurable, the claim follows from 4. Clearly 1/g is defined on A = X \ {x ∈ X : g(x) = 0}. Note {x ∈ X : g(x) = 0} ∈ S, because in this case g is constant. To show 1/g is S-measurable we prove that for every α ∈ R
1 g(x)
−1
(α, ∞] = x ∈ A :
1 > α ∈ S. g(x)
Suppose α = 0, so 1/g(x) > 0. This means g(x) > 0 and g(x) = ∞, and then x∈A:
1 > 0 = x ∈ A : g(x) > 0 x ∈ A : g(x) = ∞ ∈ S. g(x)
If α > 0, because 1/g(x) > α, then g(x) > 0 and g(x) < 1/α, so
1 1 x∈A: > α = x ∈ A : 0 < g(x) < ∈ S. g(x) α
Finally if α < 0, as 1/g(x) > α, then we can have 1/g(x) > α and g(x) > 0 and/or 1/g(x) > α and g(x) < 0. Consequently
x∈A:
1 1 >α = x ∈ A: > α and g(x) > 0 ∪ g(x) g(x) 1 x∈A: > α and g(x) < 0 . g(x)
But α < 0 so regarding the first set
1 > α and g(x) > 0 = x ∈ A : g(x) > 0 ∈ S, x∈A: g(x)
while for the second set 1 1 x∈A: > α and g(x) < 0 = x ∈ A : g(x) < and g(x) < 0 g(x) α 1 ∈ S. = x ∈ A : g(x) < α 1 Eventually, x ∈ A : g(x) > α is S-measurable. 6. When α < 0, then | f (x)| > α, so (| f |)−1 ((α, ∞]) = x ∈ X : | f (x)| > α = X. When α ≥ 0, (| f |)−1 ((α, ∞]) = x ∈ X : | f (x)| > α = x ∈ X : f (x) > α ∪ x ∈ X : f (x) < −α . In either case (| f |)−1 ((α, ∞]) ∈ S.
Definition 10.2.2 (Positive and negative part of a map) We call positive part and negative part of f : X → R̄ the two non-negative maps
f⁺(x) = sup{f(x), 0} = f(x) if f(x) ≥ 0, and 0 if f(x) < 0;
f⁻(x) = sup{−f(x), 0} = −f(x) if f(x) < 0, and 0 if f(x) ≥ 0,
for every x ∈ X. It is well known any map can be expressed using these parts:
f(x) = f⁺(x) − f⁻(x),   |f(x)| = f⁺(x) + f⁻(x),
♦
combining which we obtain: f⁺(x) = ½ (|f(x)| + f(x)) and f⁻(x) = ½ (|f(x)| − f(x)).
Remark 10.2.2 The statements in Theorem 10.2.1 cannot be reversed. About 1., if (c · f )(x) is measurable then f is without doubt measurable when c = 0, but may be not measurable for c = 0. Regarding 2., ( f + g)(x) = 0 might be measurable and f, g not: take f not measurable and g = − f . 3.: let V ⊂ [0, 1] be the Vitali set and f : [0, 1] → R 1 + x if x ∈ V f (x) = (10.2.1) −(1 + x) if x ∈ [0, 1] \ V. It is not measurable, for f −1 (0, ∞) = {x ∈ [0, 1] : f (x) > 0} = V, while f 2 (x) = (1 + x)2 is continuous, hence Borel measurable by Proposition 10.1.1. In item 4. we can take g = 0 and f not measurable, while in 5. we choose f = 0 and 1/g not measurable. For 6. take f as in (10.2.1), from which f −1 ((0, ∞)) = {x ∈ [0, 1] : f (x) > 0} = V is not measurable, while | f | = f + (x) + f − (x) = 1 + x, ∀x ∈ [0, 1] = V ∪ [0, 1] \ V is continuous and, by Proposition 10.1.1, Borel measurable.
♣
Now we prove a result that fully describes measurable functions in terms of positive and negative parts. Corollary 10.2.1 (Characterisation of measurable functions in terms of positive and ¯ ∈ M (X, S) if and only if f + (x), f − (x) ∈ M + (X, S). negative parts) f : X → R Proof Suppose f measurable. Since f + (x) =
½ (|f(x)| + f(x)) and f⁻(x) = ½ (|f(x)| − f(x)),
by 1., 6. in Theorem 10.2.1 both f + , f − are measurable. Conversely if f + and f − are measurable, by 1.–2. in Theorem 10.2.1 f is measurable. Theorem 10.2.2 (Sequences of measurable functions: measurability of sup and inf) ¯ For any R-valued sequence { f n }n≥1 ⊆ M (X, S)
sup{fₙ} ∈ M(X, S) and inf{fₙ} ∈ M(X, S).
Proof Fix α ∈ R. Since
(sup_{n≥1} fₙ)⁻¹((α, ∞]) = {x ∈ X : sup_{n≥1} fₙ(x) > α} = ⋃_{n=1}^∞ {x ∈ X : fₙ(x) > α},
and fₙ ∈ M(X, S) for every n ∈ N, then {x ∈ X : fₙ(x) > α} ∈ S. But S is closed under countable unions, so sup_{n≥1} fₙ(x) ∈ M(X, S). Moreover
inf_{n≥1} fₙ(x) = − sup_{n≥1} (−fₙ(x)),
and similarly inf n≥1 f n (x) ∈ M (X, S). ¯ Clearly inf n≥1 f n (x) and supn≥1 f n (x) always exist in R.
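The identity used in this proof can be checked numerically; the following Python sketch (ours, with a finite grid standing in for X and an arbitrary choice of fₙ) verifies that the super-level set of supₙ fₙ is the union of the super-level sets of the fₙ.

# f_n(x) = x**(1/n) on a grid of [0, 1]; sup_n f_n and its super-level set.
grid = [i / 100 for i in range(101)]
fs = [lambda x, n=n: x ** (1.0 / n) for n in range(1, 20)]

alpha = 0.5
sup_set   = {x for x in grid if max(f(x) for f in fs) > alpha}
union_set = set().union(*({x for x in grid if f(x) > alpha} for f in fs))
print(sup_set == union_set)   # True: {sup f_n > alpha} = union of {f_n > alpha}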
Directly from the previous theorem, Corollary 10.2.2 (Measurability of min( f (x), g(x)) and max( f (x), g(x))) If f, g : ¯ belong in M (X, S), then (X, S) → R f (x) ∨ g(x) = max f (x), g(x) and f (x) ∧ g(x) = min f (x), g(x) are S-measurable.
To clarify, we plot in Fig. 10.1 the functions min{x, x 2 } and max{x, x 2 } and the corresponding pre-images of f (x) ≥ α . We can see x ∨ x 2 = max{x, x 2 } is simply 2 2 2 mapping any x in its domain to the largest x ∧ x = min{x, x } of x, x , whereas 2 returns the smallest of the two. The set x ∈ X : x ∨ x ≥ α , in red, consists of x ∈ X : x ∨ x 2 ≥ α = {x ∈ X : x ≥ α} ∪ {x ∈ X : x 2 ≥ α} ∈ S; the set x ∈ X : x ∧ x 2 ≥ α , in green, equals x ∈ X : x ∧ x 2 ≥ α = {x ∈ X : x ≥ α} ∩ {x ∈ X : x 2 ≥ α} ∈ S. ¯ For every Remark 10.2.3 Let { f n }n≥1 be a sequence of functions from X to R. x∈X lim inf f n (x) = lim sup f n (x) n→∞
n→∞
¯ The sequence converges (pointwise, uniformly etc.) to iff limn→∞ f n (x) exists in R. some f ∈ R if the limit (exists and) is finite, i.e. if limn→∞ f n (x) ∈ R. Note further that for monotone sequences (increasing or decreasing) lim inf f n (x) = lim sup f n (x). n→∞
n→∞
¯ always exists. Hence limn→∞ f n (x) ∈ R
♣
Theorem 10.2.3 (Sequences of measurable functions: measurability of lim inf and lim sup) Let {fₙ}ₙ≥₁ ⊆ M(X, S) with values in R̄. Then
lim inf_{n→∞} {fₙ} ∈ M(X, S) and lim sup_{n→∞} {fₙ} ∈ M(X, S).
Proof It is enough to observe
lim inf_{n→∞} fₙ(x) = sup_{n≥1} (inf_{k≥n} f_k(x)) and lim sup_{n→∞} fₙ(x) = inf_{n≥1} (sup_{k≥n} f_k(x)),
and the claim follows from Theorem 10.2.2. Here, too, lim inf_{n→∞} fₙ(x) and lim sup_{n→∞} fₙ(x) exist in R̄.
Theorem 10.2.4 (Sequences of measurable functions: measurability of pointwise ¯ in M (X, S) with pointwise limits) If { f n (x)}n≥1 is a sequence of maps from X to R limit f : X → R, then f ∈ M (X, S). Proof Observe that, for every x ∈ X , f (x) = lim f n (x) = lim inf f n (x). n→∞
n→∞
Theorem 10.2.5 (Monotone sequences of measurable functions: measurability of ¯ limits) If { f n (x)}n≥1 is a monotone sequence of measurable maps from X to R: 1. limn→∞ f n ∈ M (X, S); ¯ is measurable. 2. A = x ∈ X : limn→∞ f n (x) ∈ R Fig. 10.1 Graphs of y = min(x, x 2 ) and y = max(x, x 2 )
Proof 1. As { f n (x)}n≥1 is monotone, ∀x ∈ X lim inf f n (x) = lim f n (x) = lim sup f n (x); n→∞
n→∞
n→∞
by Theorem 10.2.3 limn→∞ f n (x) is measurable. 2. Regarding the latter fact,
A = x ∈ X : lim inf f n (x) = lim sup f n (x) . n→∞
n→∞
As both lim inf n→∞ f n (x) and lim supn→∞ f n (x) are measurable on X , by part 3. in Proposition 10.2.2
¯ A = x ∈ X : lim f n (x) ∈ R n→∞
is measurable.
Measurability is inherited also in presence of properties valid almost everywhere on X (such as equality a.e. of maps, convergence a.e. of sequences, and so on). Theorem 10.2.6 (Equality m-almost everywhere of measurable functions) Let ¯ ∈ M (X, S), and g : X → R ¯ (X, S, m) be a complete measure space. If f : X → R is m-almost everywhere equal to f on X , then g ∈ M (X, S). Proof N = {x ∈ X : f (x) = g(x)} belongs in Nm . Given α ∈ R {x ∈ X : g(x) > α} = {x ∈ X : g(x) > α} ∩ N ∪ {x ∈ X : g(x) > α} ∩ N c = {x ∈ X : g(x) > α} ∩ N ∪ {x ∈ X : f (x) > α} ∩ N c . Clearly {x ∈ X : f (x) > α} ∩ N c ∈ S, and by completeness {x ∈ X : g(x) > α} ∩ N ∈ Tm ⊆ S. ¯ m). Consider a measure space (X, S, m) and its completion (X, S, ¯ The next propo¯ and S-measurability. sition examines a notion tackling the relationship between S¯ and S-measurable maps) Let Proposition 10.2.3 (Equality m-a.e. on X of S¯ m), ¯ an S¯ (X, S, m) be a measure space with completion (X, S, ¯ f¯ : X → R ¯ such that measurable map. There exists an S-measurable map f : X → R f¯(x) = f (x) m-almost everywhere on X .
Proof Write Q = {qn : n ∈ N} and define for every n ∈ N E n = {x ∈ X : f¯(x) < qn }. ¯ ¯ E n is S-measurable. Thismeans E n = Fn ∪ Tn , where Fn ∈ S, As f¯ ∈ M (X, S), Tn ∈ Tm is negligible and Tn ⊆ Nn . Put T = ∞ n=1 Tn , still negligible. Pick N ∈ Nm such that T ⊆ N and define G = N c and f = f¯ · ϕG . Clearly f = f¯ m-a.e., ¯ and for every n ∈ N
x ∈ X : f (x) < qn = x ∈ X : f¯ · ϕG (x) < qn = E n ∩ G = (Fn ∩ G) ∪ (Tn ∩ G) = (Fn ∩ G) ∈ S,
so f ∈ M (X, S).
¯ meaProposition 10.2.4 (Measurability of Dirichlet-like maps) Take f, g : X → R ¯ surable, and measurable sets E 1 , E 2 partitioning X . Then the map h : X → R, f (x) if x ∈ E 1 h(x) = g(x) if x ∈ E 2 is measurable. Proof As E 1 , E 2 are measurable, their characteristic functions ϕ E1 , ϕ E2 are measurable as well. By Theorem 10.2.1 h is Borel measurable, because for every x ∈ X h(x) = f (x) · ϕ E1 (x) + g(x) · ϕ E2 (x). Among the maps like h are Dirichlet-like functions.
We come to investigate simple measurable functions, which represent a sort of core for all measurable functions, and are made of linear combinations of characteristic functions. Using these we will be able to approximate more complicated maps.
Definition 10.2.3 (R̄-valued simple functions) Consider subsets A₁, . . . , Aₙ of X such that Aᵢ ∩ Aⱼ = ∅ and X = ⋃_{i=1}^n Aᵢ. A simple function is a map s : X → R̄,
s(x) = Σ_{i=1}^n aᵢ ϕ_{Aᵢ}(x), x ∈ X,
where {a₁, . . . , aₙ} = A ⊆ R̄. ♦
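Before discussing non-uniqueness of the representation, here is a minimal Python sketch (ours, with made-up names) of a simple function stored as a finite list of pairs (aᵢ, Aᵢ) on a finite set X; evaluation looks up the unique block containing x.

# A simple function on X = {0,...,9}: s = 1*phi_A1 + 5*phi_A2 + 0*phi_A3,
# where the A_i are pairwise disjoint and cover X.
X = set(range(10))
s = [(1.0, frozenset({0, 1, 2})),
     (5.0, frozenset({3, 4})),
     (0.0, frozenset(range(5, 10)))]

def evaluate(simple, x):
    """Value of the simple function at x: the a_i of the unique block A_i containing x."""
    for value, block in simple:
        if x in block:
            return value
    raise ValueError("the blocks must cover X")

assert set().union(*(B for _, B in s)) == X          # the A_i cover X
print([evaluate(s, x) for x in sorted(X)])           # [1.0, 1.0, 1.0, 5.0, 5.0, 0.0, ...]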
A simple function is a (finite) linear combination of characteristic functions of certain sets Aᵢ ⊆ X, with X = ⋃_{i=1}^n Aᵢ, that takes values aᵢ ∈ R̄ for every x ∈ Aᵢ. Hence Aᵢ = {x ∈ X : s(x) = aᵢ}.
Remark 10.2.4 (Simple functions.) The most trivial simple function is the constant map s(x) = kϕ_X(x).
The representation of a simple function is not unique. Let ⋃_{i=1}^n Aᵢ = X = ⋃_{j=1}^m Bⱼ, with Aᵢ ∩ A_k = ∅ and Bⱼ ∩ B_h = ∅ for i ≠ k and j ≠ h, and set
s₁(x) = Σ_{i=1}^n aᵢ ϕ_{Aᵢ}(x) and s₂(x) = Σ_{j=1}^m bⱼ ϕ_{Bⱼ}(x),
such that aᵢ = bⱼ if x ∈ (Aᵢ ∩ Bⱼ). Then ∀x ∈ X
s₁(x) = Σ_{i=1}^n aᵢ ϕ_{Aᵢ}(x) = Σ_{i=1}^n Σ_{j=1}^m aᵢ ϕ_{(Aᵢ ∩ Bⱼ)}(x) = Σ_{i=1}^n Σ_{j=1}^m bⱼ ϕ_{(Aᵢ ∩ Bⱼ)}(x) = Σ_{j=1}^m bⱼ ϕ_{Bⱼ}(x) = s₂(x).
It follows s₁(x) is essentially the same as s₂(x), so any simple function admits several representations, including:
s(x) = Σ_{i=1}^n Σ_{j=1}^m aᵢ ϕ_{Aᵢ ∩ Bⱼ}(x).
♣
The next fact proves that elementary operations preserve simplicity.
Proposition 10.2.5 (Operations between R̄-valued simple functions) Let s, s₁, s₂ be simple functions. For any α, β ∈ R the following are simple functions:
1. (αs₁ + βs₂);
2. (s₁ · s₂);
3. |s|;
4. (s₁ ∨ s₂) = max(s₁, s₂) and (s₁ ∧ s₂) = min(s₁, s₂);
5. s · ϕ_E for every E ⊆ X.
Proof The proof is just a matter of computing:
1. n
m
αs1 (x) + βs2 (x) = α
ai ϕ Ai (x) + β
b j ϕ B j (x)
i=1
j=1
n
m
m
=α
n
ai ϕ Ai ∩B j (x) + β i=1 j=1
n
m
=
b j ϕ B j ∩Ai (x) j=1 i=1
αai + βb j ϕ Ai ∩B j (x) = (αs1 + βs2 )(x).
i=1 j=1
2. n
m
s1 (x) · s2 (x) =
ai ϕ Ai (x) · i=1
b j ϕ B j (x) j=1
m
n
m
n
=
ai ϕ Ai ∩B j (x) · i=1 j=1 n
b j ϕ Ai ∩B j (x) i=1 j=1
m
=
ai b j ϕ Ai ∩B j (x) = (s1 · s2 )(x). i=1 j=1
3.
n
|ai |ϕ Ai (x) = s + (x) + s − (x).
|s(x)| = i=1
4. n
m
s1 (x) ∨ s2 (x) =
ai ϕ Ai (x) ∨ i=1
b j ϕ B j (x) j=1
n
n
m
=
m
ai ϕ Ai ∩B j (x) ∨ i=1 j=1
(ai ∨ b j )ϕ Ai ∩B j (x) = s1 ∨ s2 (x).
m
n
b j ϕ Ai ∩B j (x) i=1 j=1
= i=1 j=1
The second expression is proved similarly. 5.
n
sϕ E =
n
ai ϕ Ai ϕ E = i=1
ai ϕ Ai ∩E . i=1
This last argument also shows that restrictions of simple functions are simple.
256
10 Measurable Functions
Proposition 10.2.6 (Measurability of simple functions) A simple function s is Smeasurable if Ai ∈ S for every i. Proof Under the given assumption, s −1 (I ) =
Ai ∈ S.
i:ai ∈I
On a measurable space (X, S) we indicate the class of S-measurable simple functions by M0 (X, S), and by M0+ (X, S) the subclass of non-negative ones. Obviously s + and s − belong to M0+ (X, S).
10.3 Pointwise Convergence A.e. and Almost Uniform Convergence In this section we study the concepts of pointwise convergence almost everywhere and almost uniform convergence. Relying on Definitions 4.2.1, 4.2.2 we can address pointwise convergence m-a.e. straightaway, and deal with almost uniform convergence later, provided the measure spaces are complete. ¯ Definition 10.3.1 (Pointwise convergence m-almost everywhere) A sequence of Rvalued maps { f n }n≥1 ⊆ M (X, S) defined on a complete measure space (X, S, m) converges pointwise m-a.e. to f : X → R whenever
x ∈ X : lim f n (x) = f (x) n→∞
c
∈ Nm . ♦
Theorem 10.3.1 (Pointwise convergence m-a.e. and measurability of limit) Let ¯ (X, S, m) be a complete measure space. A sequence { f n }n≥1 ⊆ M (X, S) of Rvalued maps that converges pointwise m-a.e. admits limit f : X → R ∈ M (X, S). Proof The set N = x ∈ X : lim f n (x) n→∞
belongs to Nm . Define f : X → R as follows: for every x ∈ X lim f n (x) if x ∈ (X \ N ) f (x) = n→∞ 0 if x ∈ N .
10.3 Pointwise Convergence A.e. and Almost Uniform Convergence
257
This is the well-defined map to which { f n }n≥1 converges m-almost everywhere. We claim f ∈ M (X, S). Recall (X, S, m) is complete, so N = x ∈ X : ¯ limn→∞ f n (x) is S-measurable. Given an interval I ⊆ R, f −1 (I ) = {x ∈ X : f (x) ∈ I } = {x ∈ X : f (x) ∈ I } ∩ N ∪ {x ∈ X : f (x) ∈ I } ∩ N c . As {x ∈ X : f (x) ∈ I } ∩ N ⊆ N , it is also S-measurable. Moreover, since f = limn→∞f n on N c , by 2. in Theorem 10.2.5 we have {x ∈ X : limn→∞ f n (x) ∈ I } ∩ N c ∈ S. Therefore f −1 (I ) = {x ∈ X : f (x) ∈ I } ∈ S. Remark 10.3.1 The limit of an m-a.e. convergent sequence is unique as a function defined m-a.e.. That is to say: lim f n = f 1 m-a.e. on X and
n→∞
lim f n = f 2 m-a.e. on X
n→∞
imply f 1 = f 2 m-a.e. on X .
♣
Here is a useful lemma for the sequel. Lemma 10.3.1 (Sequences of measurable functions: measurability of convergence set and of its complement) Let (X, S, m) be a complete measure space, { f n }n≥1 ⊆ ¯ M (X, S) a sequence m-a.e. to the S-measurable cmap of R-valued maps converging f : X → R. Then x ∈ X : limn→∞ f n = f and x ∈ X : limn→∞ f n = f are S-measurable. Proof By definition, given x ∈ X the sequence { f n (x)}n≥1 tends to f (x) iff, for every = 1/k > 0, there is an n ∈ N such that | f i (x) − f (x)|
n 0 ⇔ ∃n 0 ∈ N : | f n (x)| < n , ∀n > n 0 . Then for x ∈ X \ N ∞ n=1
∞
n0
| f n (x)| =
| f n (x)| + n=n 0 +1
n=1
∞
n0
| f n (x)|
0, there exists a set E ∈ S with m(E) < δ such { f n }n≥1 converges uniformly to f on (X \ E). ♦ The measure of E is as small as we like, but not zero. Hence a.u. convergence is uniform convergence up to a set of arbitrarily small, but positive, measure. The Severini–Egorov theorem explains that pointwise convergence m-a.e. implies m-almost uniform convergence on X provided m(X ) < ∞. First, though, we need a preliminary fact.
10.3 Pointwise Convergence A.e. and Almost Uniform Convergence
261
Proposition 10.3.2 (Relevant property of convergence m-a.e.) On a complete mea¯ sure space (X, S, m) with m(X ) < ∞, let { f n }n≥1 be a sequence of R-valued, S-measurable maps that pointwise converge m-a.e. to an S-measurable map f : X → R. Given any > 0 and δ > 0 there exist a measurable subset E ⊆ X with m(E ) ≤ δ, and an integer n such that sup | f n (x) − f (x)| ≤
x∈(X \E )
for every n ≥ n . Proof Consider N = x ∈ X : f (x) = lim f n (x) ∈ Nm n→∞
and A = (X \ N ), so m(A) = m(X ). Given n ∈ N we introduce E n = x ∈ A : | f n (x) − f (x)| ≥ . The measurability of f n (x) and f (x) on A implies the measurability of E n . So let us put, for every k ∈ N, Ak =
∞
En =
n=k
∞
x ∈ A : | f n (x) − f (x)| ≥ .
n=k
This, too, is measurable as countable union of measurable sets. Clearly {Ak }k≥1 is decreasing, with m(A1 ) ≤ m(A) < ∞, so by the measure’s continuity from above m
∞ k=1
Ak
= lim m(Ak ). k→∞
As f n (x) converges to f (x) pointwise for every x ∈ A, from a certain point onwards, i.e. eventually, | f n (x) − f (x)| < . Hence x eventually does not belong to Ak , say for n ≥ k . But then the intersection of the Ak is empty and the above limit is zero. Given then any δ > 0, there is a k such that m(Ak ) ≤ δ. Note Ak depends on , so what we are talking about is not the uniform convergence of f n (x) to f (x) on (X \ Ak ) (uniform convergence up to an arbitrarily small set). That settled, let us write E = Ak ∪ N . As Ak ⊆ A and (A ∩ N ) = ∅, the space’s / E , then x ∈ / E n for every n ≥ k , completeness implies m(E ) = m(Ak ) ≤ δ. If x ∈ and so x ∈ (X \ E ) and | f n (x) − f (x)| < ,
262
10 Measurable Functions
whence sup | f n (x) − f (x)| ≤ .
x∈(X \E )
Applying this result repeatedly we can attain almost uniform convergence starting from pointwise convergence m-almost everywhere. Theorem 10.3.5 (Severini–Egorov theorem, or Littlewood’s second principle— pointwise convergence m-a.e. and m-a.u. convergence) Let (X, S, m) be a complete ¯ S-measurable measure space with m(X ) < ∞, and { f n }n≥1 a sequence of R-valued, maps pointwise converging m-a.e. to an S-measurable map f (x) : X → R. Then the convergence is uniform on X up to a set of arbitrarily small measure. Proof Suppose { f n (x)}n≥1 converges m-a.e. By the previous result, given n = 1/n and δn = δ/2n there exists E n = Aδn , with m(E n ) < δn , such that sup
x∈(X \E n )
Consider Aδ =
∞ n=1
| f n (x) − f (x)| ≤
1 . n
E n . By subadditivity
m(Aδ ) = m
∞
En
∞
n=1
∞
m(E n )
= sn (x). 2n+1 2n
Recalling k0 is arbitrary, for every x ∈ f −1 ([0, n + 1)) and n ∈ N we have sn (x) ≤ sn+1 (x) and sn (x) ≤ f (x). Similarly, at step n: f (x) ∈ [n, ∞] ⇒ sn (x) = n, and at step n + 1: f (x) ∈ [n, n + 1) or [n + 1, ∞]. Therefore f (x) ∈ [n, n + 1) ⇒ sn+1 (x) = n = sn (x) and f (x) ∈ [n + 1, ∞] ⇒ sn+1 (x) = n + 1 > n = sn (x). In all cases, then, for every x ∈ f −1 ([0, ∞)) and every n ∈ N we obtain sn (x) ≤ sn+1 (x) and sn (x) ≤ f (x). Our next theorem shows instead how a measurable function can be approximated by continuous maps up to a set of arbitrarily small measure. Theorem 10.4.2 (Lusin–Littlewood’s third principle—approximation of measurable functions by continuous maps) Consider a complete measure space (X, L , m), where X ⊆ R is an L-measurable set such that m(X ) < ∞, L is the Lebesgue σ¯ is an m-a.e. bounded, algebra on R and m is a regular measure on X . If f : X → R Lebesgue measurable map, m {x ∈ X : f (x) = ±∞} = 0, then for every > 0: 1. there exists a compact set K ⊆ X with m(X \ K ) < ; 2. the restriction of f to K is continuous, i.e.: f is continuous on X up to a set of arbitrarily small measure. Proof To start with, take as f (x) the indicator function ϕ E (x), E ∈ L . By Proposition 9.4.1, given > 0 there exists a compact subset K ⊆ E such that m(E \ K ) < /2. As X \ E is measurable, there exists a compact subset K ⊆ (X \ E) such that
10.4 Approximating Measurable Functions
267
m((X \ E) \ K ) < /2. Setting K = K ∪ K we observe (X \ E) \ K = [(X \ E) \ (K ∪ K )] = [(X \ E) \ K ] ∩[(X \ E) \ K ] =(X \E)
= [(X \ E) \ K ] and (E \ K ) = [(E \ (K ∪ K )] = [(E \ K ) ∩ (E \ K )] = (E \ K ), =E
and so m(X \ K ) = m(((X \ E) ∪ E)) \ K ) = m(((X \ E) \ K ) ∪ (E \ K )) = m((X \ E) \ K ) + m(E \ K ) = m((X \ E) \ K ) + m(E \ K ) < .
Consider the restriction ϕ X |K (x) =
1 if x ∈ K 0 if x ∈ / K.
As K ∩ K = ∅ and ϕ X is constant on K , then ϕ X is continuous on K . Since is arbitrary, this means ϕ X |K (x) is continuous on X up to a set of arbitrarily small measure, proving the claim for characteristic functions. Now suppose f (x) is the simple function s : X → [0, ∞], n
s(x) = Σ_{i=1}^n aᵢ ϕ_{Eᵢ}(x), x ∈ X.
Given ε > 0 we can apply the previous part with ε/n to every characteristic function in the linear combination s(x). Pass to the case where f(x) is non-negative and bounded: 0 ≤ f(x) ≤ M. Fix n ∈ N and divide the codomain's interval [0, M) in n equal parts. This gives points yᵢ = i · (M/n), i = 0, . . . , n. For i = 0, . . . , n − 1 define sets Eᵢ = {x ∈ X : yᵢ ≤ f(x) < yᵢ₊₁}. As f(x) is Lebesgue measurable, the Eᵢ are measurable, and satisfy Eᵢ ∩ Eⱼ = ∅ for i ≠ j, plus ⋃_{i=0}^{n−1} Eᵢ = X. We use them to build the non-negative simple function
sₙ(x) = Σ_{i=0}^{n−1} yᵢ ϕ_{Eᵢ}(x).
For > 0 and every n ∈ N, from what we proved above there exists a compact set K n ⊆ X such that m X \ Kn ≤ n 2 and sn (x) is continuous on K n . As n ∈ N varies we obtain thus an increasing sequence of compact sets K n such that K =
∞
K n ⊆ X.
n=1
This K is compact, as countable intersection of compact sets, and further,
m X \ K) = m X ∩ =m
∞ n=1
∞
c
=m X∩
Kn
n=1
X \ Kn
∞
K nc
n=1 ∞
≤ n=1
= . 2n
To prove statement 2. take x ∈ X . Obviously x ∈ E i for some i, so | f (x) − sn (x)| = | f (x) − yi ϕ Ei (x)| = | f (x) − yi | ≤ yi+1 − yi M M M −i · = = (i + 1) · n n n and then # M # . sup # f (x) − sn (x)# ≤ n x∈X Hence {sn (x)}n≥1 converges uniformly to f (x) on K , and each sn (x) is continuous on K , so f (x) too, because the uniform limit of continuous maps is continuous. Now suppose f (x) ≥ 0. As f (x) is bounded m-a.e., m(E) = m x ∈ X : f (x) = ∞ = 0. For any n ∈ N define E n = x ∈ X : f (x) ≥ n .
The sequence is decreasing and such that m(E 1 ) ≤ m(X ) < ∞ and E = The measure’s continuity from above forces
∞ n=1
En .
m(E) = lim m(E n ) = 0; n→∞
Therefore for > 0 there exists an n¯ such that m(E n¯ ) = m x ∈ X : f (x) ≥ n¯ ≤ /2, and this is the case for every n ≥ n. ¯ Put X = (X \ E n¯ ) (measurable) and note 0 ≤ f (x) < n. ¯ By the previous part, for every > 0 there exists an K contained in X such that m(X \ K ) ≤ /2 and f (x) is continuous on K . But m(X \ K ) = m(E n¯ ) + m(X \ K ) ≤ , so the claim for f ≥ 0 is settled. At last, if f (x) is fully arbitrary, the general statement descends from the fact that f (x) = f + (x) − f − (x). The English mathematician J. E. Littlewood abridged Lusin’s theorem3 like this: “measurable functions are approximately continuous”. Remark 10.4.1 A Lebesgue measurable function may be discontinuous everywhere, as is the Dirichlet function. So it is interesting that the Dirichlet function satisfies Lusin’s theorem on [0, 1]. To see it, take a listing {qi }i≥1 of the rationals in [0, 1]. Fix > 0 and associate to qi the interval Q i = qi − i , qi + i . 2 2 The set Then
∞ i=1
Q i contains all the rationals in [0, 1], it is open, and
F = [0, 1] \
∞
!∞ i=1
(Q i ) < .
Qi
i=1
satisfies (0, 1) \ F < , and the restriction of the Dirichlet function f |F is identically zero because there are no rationals in F. Therefore the restriction is uniformly continuous on F. ♣ Eventually we come to the so-called Littlewood first principle, whereby a finitemeasure subset of R is approximately a finite union of pairwise-disjoint intervals. This fact follows directly from Proposition 9.3.5. 3 Also
known as Littlewood’s third principle.
Proposition 10.4.1 (Littlewood’s first principle) Let E ⊆ R be a set in L with measure m(E) < ∞. For any > 0 there n exists a finite union of pairwise-disjoint Ii , such that intervals I1 , . . . , In ∈ A(I), with F = i=1 m(EF) < .
10.5 The Vitali–Cantor Map We already know the Borel σ-algebra B(R) is contained in the Lebesgue σ-algebra L (R), and we have also seen that the axiom of choice brings into being subsets that cannot be Lebesgue measurable. Here we shall explain how to produce a Lebesgue measurable set that is not Borel measurable. For this purpose we use the Cantor set to construct a function which, among others, was discussed and popularized by Vitali (1905). For this reason we shall call the resulting function the Vitali–Cantor map. Example 10.5.1 (The Vitali–Cantor map) Consider the Cantor set of Example 2.3.3. There we saw that the residual sets produced whilst deleting are: C1 = 0, C2 = 0,
2 1 ∪ , 1 = [a1,1 , b1,1 ] ∪ [a1,2 , b1,2 ] 3 3 2 3 6 7 8 1 ∪ , ∪ , ∪ , 1 = [a2,1 , b2,1 ] ∪ [a2,2 , b2,2 ] ∪ [a2,3 , b2,3 ] ∪ [a2,22 , b2,22 ] 9 9 9 9 9 9
2 3 6 7 8 9 18 19 20 21 24 25 1 C3 = 0, ∪ , ∪ , ∪ , ∪ , ∪ , ∪ , 27 27 27 27 27 27 27 27 27 27 27 27 27 26 ∪ , 1 = [a3,1 , b3,1 ] ∪ [a3,2 , b3,2 ] ∪ [a3,3 , b3,3 ] ∪ [a3,4 , b3,4 ] ∪ [a3,5 , b3,5 ] ∪ [a3,6 , b3,6 ] 27 ∪ [a3,7 , b3,7 ] ∪ [a3,23 , b3,23 ] . . . Cn =[an,1 , bn,1 ] ∪ [an,2 , bn,2 ] ∪ · · · ∪ [an,k , bn,k ] ∪ · · · ∪ [an,2n , bn,2n ] . . .
At step n one deletes 2n − 1 middle intervals, leaving 2n closed disjoint intervals of length 1/3n . For every n ∈ N consider f n : [0, 1] → R,
Fig. 10.2 Graph of the Vitali–Cantor map
⎧k−1 k ⎪ ⎨ n ≤ f n (x) ≤ n ∀x ∈ [an,k , bn,k ], k = 1, . . . , 2n 2 2 ⎪ n ⎩ f (x) = k ∀x ∈ (b , a n n,k n,k+1 ) ⊆ [0, 1] \ [an,k , bn,k ] , k = 1, . . . , 2 − 1. n 2 This f n (x) is • linearly increasing, ranging between values k−1 and 2kn inside [an,k , bn,k ], for k = 2n 1, 2, . . . , 2n ; • constant, equal to 2kn , on deleted intervals (bn,k , an,k+1 ), for k = 1, . . . , 2n − 1. The graph of f n (x) at step n is a polygonal path made of as many segments as the overall number (2n − 1) + 2n of (deleted and) remaining intervals joining (an,k , (k − 1)/2n ) to (bn,k , k/2n ). Of these intervals, 2n − 1 are horizontal and 2n have slope (3/2)n . Every (upper-)slanted segment is followed by a horizontal one, then another slanted one and so on. This makes f n (x) increasing. Moreover, f n (x) is onto: this happens everywhere and for instance where f n is constant, for on x ∈ (bn,k , an,k+1 ) it has the value f n (x) = k/2n . Let us compare the various maps f n (x), i.e. the paths generated at the different steps. For every n > 1 the graph of f n (x) is obtained from that of f n−1 (x) by not touching the 2n−1 − 1 horizontal segments of step n − 1, but replacing the 2n−1 slanted bits with three segments: a slanted one, then a horizontal one in the middle, then another slanted one (Fig. 10.2). In general the open intervals along the x-axis corresponding to horizontal segments are pairwise disjoint, because any horizontal segment lies between two slanted bits and remains unchanged duringthe iterations. This means any two intervals (bn,k , an,k+1 ) and (bm,k , am,k +1 ) ⊆ [0, 1] \ C , for n, m ≥ 1, k = 1, . . . , 2n − 1, and k = 1, . . . , 2m − 1, are disjoint for k = k . We would like to know whether { f n (x)}n≥0 converges to some limit f (x), and how. Let us fix step n. On intervals where f n−1 is constant, f n has that same value, so | f n (x) − f n−1 (x)| = 0. On intervals [an−1,k , bn−1,k ], k = 1, . . . , 2n−1 ,
# # # k 1 k − 1# | f n (x) − f n−1 (x)| < | f n−1 (an−1,k ) − f n−1 (bn−1,k )| = ## n−1 − n−1 ## = n−1 , 2 2 2 which is, for n > 1, the largest increment of f n−1 (x) on each [an−1,k , bn−1,k ], k = 1, . . . , 2n−1 . When n = 1 the biggest increment of f 0 is naturally 1. Fix m > 1 and two positive integers i, j > m. The maximum increment on each of the 2i , 2 j slanted segments of f i (x), f j (x) respectively equals 1/2i , 1/2 j , both less than 1/2m . Suppose i > j without loss of generality. Then 1/2m > 1/2 j > 1/2i and the absolute maximum difference between f i (x) and f j (x) is less than every x ∈ [a j,k , b j,k ], k = 1, . . . , 2 j : | f i (x) − f j (x)|
m. Hence the sequence { f n (x)}n≥0 is Cauchy and, as such, it converges pointwise on R to some f : [0, 1] → [0, 1]. The sequence converges also uniformly to f on [0, 1]: given > 0 there exists an m such that, for j > m with 21j < , for every i ∈ N | f j+i (x) − f j (x)| < , ∀x ∈ [0, 1]. That is to say, for j > m and every x ∈ [0, 1] lim | f j+i (x) − f j (x)| < ,
i→∞
so for every j > m and x ∈ [0, 1] | f (x) − f j (x)| < . Note that each f n is continuous on [0, 1], so Theorem 4.2.1 forces f to be continuous on [0, 1].
Furthermore f is non decreasing. Taking x < y in [0, 1], • x, y ∈ C ⇒ f (x) < f (y); • if x, y ∈ [0, 1] \ C, suppose for some n ∈ N and k = 1, . . . , 2n − 1 we have x, y ∈ (bn,k , an,k+1 ) ⊆[0, 1] \ C,then f (x) = f (y), otherwise f (x) < f (y); • if x ∈ C ∧ y ∈ [0, 1] \ C or vice versa, then f (x) ≤ f (y). Hence f (x) ≤ f (y) whenever x < y ∈ [0, 1]. What is more, f (x) set has zero derivative on [0, 1] except for the zero-measure . Call (b C. Take in fact x ∈ [0, 1] \ C . For some n we have x ∈ [0, 1] \ C , n n,k an,k+1 ) the interval in [0, 1] \ Cn containing x. As f n is constant on (bn,k , an,k+1 ) and f m (x) = f n (x) on (bn,k , an,k+1 ) for every m ≥ n, passing to the limit as m →∞ gives f n (x) = f n+1 (x) = · · · = f (x) = 2kn on (bn,k , an,k+1 ). Our x ∈ [0, 1] \ C is arbitrary so we may set f (x) = k for every x ∈ [0, 1] \ C , and then f (x) = 0 on [0, 1] \ C . But (C) = 0, so f (x) is zero -a.e. on [0, 1]. We call f : [0, 1] → [0, 1] the Vitali–Cantor map. At this juncture consider g : [0, 1] → [0, 2], g(x) = f (x) + x, where f (x) is the Vitali–Cantor map. It is continuous, as sum of continuous maps, and strictly increasing, since g (x) = 1 -a.e. (in fact f (x) = 0 -a.e.), hence bijective. The inverse g −1 = h is continuous: if U ⊆ [0, 1] is open, then [0, 1] \ U is closed and bounded, i.e. compact. But g is continuous so g([0, 1] \ U ) is compact. As g([0, 1] \ U ) = h −1 ([0, 1] \ U ) = h −1 ([0, 1]) \ h −1 (U ) = [0, 2] \ h −1 (U ), [0, 2] \ h −1 (U ) is compact. So h −1 (U ) ⊆ [0, 2] is open, making h continuous. All in all g is a homeomorphism. We claim g(C) has measure 1. Now, g maps the deleted intervals in [0, 1] (those removed when constructing C) to intervals of the same length in [0, 2]. This is because f (a) = f (b) for every removed interval (a, b) belonging to the set [0, 1] \ C, so (g(a), g(b)) = g(b) − g(a) = f (b) + b − f (a) − a = b − a
(10.5.1)
and the latter implies g([0, 1] \ C) = [0, 1] \ C) = 1. As [0, 2] = g(C)
g([0, 1] \ C),
[0, 2] = g(C) + g([0, 1] \ C) = g(C) + 1 = 2, whence g(C) = 1. Since g(C) > 0, Remark 9.5.2 tells us there exists a non-measurable set V ⊂ −1 g(C). Then take E = h(V ) = g (V ). Clearly E ⊂ C has measure (C) = 0. But R, L (R), is complete, making E negligible, i.e. (E) = 0.
The map h, inverse of a homeomorphism, is continuous and hence Borel. Therefore for every Borel set B the pre-image h −1 (B) is measurable. But we know h −1 (E) = V is not measurable, implying E cannot be Borel, albeit measurable with measure zero. The conclusion is B(R) ⊂ L (R), and the pre-image under a measurable map of a non-Borel measurable set may not be measurable.
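The Vitali–Cantor map also admits a digit-by-digit description, which makes it easy to evaluate numerically. The Python sketch below (ours; it approximates f by truncating the ternary expansion rather than reproducing the polygonal construction above) returns the plateau value k/2ⁿ on every deleted interval, as described earlier.

def cantor_f(x, digits=40):
    """Approximate Vitali-Cantor function on [0, 1]: read ternary digits of x until
    a 1 is met; digits 0/2 contribute binary digits 0/1; a 1 ends with +1/2^k."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(digits):
        x *= 3
        d = int(x)
        x -= d
        if d == 1:
            return value + scale       # x lies in a deleted middle-third interval
        value += scale * (d // 2)      # ternary digit 0 or 2 -> binary digit 0 or 1
        scale /= 2
    return value

print(cantor_f(0.0), cantor_f(1.0))    # 0.0 and 1.0
print(cantor_f(0.4), cantor_f(0.45))   # both 0.5: constant on the removed (1/3, 2/3)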
10.6 Rudiments of the Theory of Random Variables A random quantity, or random variable, is a function of the outcome of an experiment. The outcome is uncertain before the experiment takes place or before certain informations on the phenomenon under exam become available. A random variable is typically denoted with a capital letter X, Y, . . ., and the outcome of a test, called elementary event, is indicated with ω ∈ , where is the sample space. Definition 10.6.1 (Random variables) Let (, S()) and (R, B(R)) be measurable spaces. A function X : → R is called random variable (RV) if E = ω ∈ : X (ω) ∈ B = X −1 (B) ∈ S() for every B ∈ B(R). A RV is therefore an S-measurable map defined on the measurable space (, S()) with values in Borel sets B(R). ♦ The main objective of the study of RVs is to compute the probability the variable X assumes values in some B ∈ B(R). As the probabilities with which the possible outcomes, or events, manifest themselves are by definition distributed on the sample space (the so-called probability distribution), to determine the probability that a variable X takes value in B we must consider the probability P(X ∈ B) = P ω ∈ : X (ω) ∈ B = P(X −1 (B)).
(10.6.1)
From formula (10.6.1) the pre-image of B ∈ B(R) under X belongs to the event σ-algebra S(), which and so X should be measurable and is assumed complete, the measure of E = ω ∈ : X (ω) ∈ B equals P(X −1 (B)). Definition 10.6.2 (Probability distribution functions) Let X : → R be a RV defined on the probability space (, S(), P). The probability distribution (function) of X , denoted PX , is the completely additive set function defined by P(E) = PX (B) = P(X −1 (B)), for every E = ω ∈ : X (ω) ∈ B ∈ S(), where B ∈ B(R).
(10.6.2) ♦
Proposition 10.6.1 (Functions of random variables) If X is a RV and f : R → R a Borel measurable map, then Y = f (X ) is a RV.
Proof The pre-image f −1 (B) of any B ∈ B(R), as f is Borel measurable, is a Borel set, so Y −1 (B) = {ω ∈ : f (X (ω)) ∈ B} = {ω ∈ : X (ω) ∈ f −1 (B)} = X −1 f −1 (B) ∈ S().
Hence Y is a RV. From the above proposition, given a RV X and α ∈ R, X + α, αX, |X |, X 2 , 1/ X, for X = 0, are all RVs.
Proposition 10.6.2 (Sum, difference, product and quotient of RVs) If X and Y are RVs then so are Z = X + Y, Z = X − Y, Z = X Y, Z = X/Y, for Y = 0. Proof To show Z = X + Y is a RV we must prove {ω ∈ : Z (ω) ≤ x} ∈ S() for every x ∈ R. Take any x ∈ R, so Z (ω) ≤ x iff there is a rational q such that X (ω) ≤ q and Y (ω) ≤ x − q. Hence {ω ∈ : Z (ω) ≤ x} =
{ω ∈ : X (ω) ≤ q, Y (ω) ≤ x − q}
q∈Q
=
{ω ∈ : X (ω) ≤ q} ∩ {ω ∈ : Y (ω) ≤ x − q} .
q∈Q
For every q ∈ Q {ω ∈ : X (ω) ≤ q} ∩ {ω ∈ : Y (ω) ≤ x − q} ∈ S() since X, Y are RVs. Hence {ω ∈ : Z (ω) ≤ x} ∈ S(). Now, since X − Y = X + (−Y ), by Proposition 10.6.1 X − Y is a RV. Furthermore, XY =
¼ [(X + Y)² − (X − Y)²]
shows X Y is a RV. Finally, Proposition 10.6.1 implies 1/Y is still a RV, and so X/Y is a RV. Sometimes we do know the probability distribution directly (for instance when we toss a coin), while in other cases we have to resort to Eq. (10.6.2). There are various
ways to identify the probability distribution of a RV X . The first is to determine the corresponding cumulative distribution function FX (x) . This is defined, for every x ∈ R, as (10.6.3) FX (x) = P(X ≤ x). Cumulative distribution functions allow to detect the probability distribution PX of the sample space and have the advantage of replacing functions defined on sets with functions defined on points. The latter are associated to the Lebesgue–Stieltjes measure of Sect. 9.6 in case " $ ˜ = 1. lim F(x) − F(−x) ) = m(R)
x→∞
Definition 10.6.3 (Cumulative distribution functions) We call (cumulative) distribution function of a RV X the map FX : R → [0, 1] given by FX (x) = P(X ≤ x), x ∈ R. ♦ In the next proposition we prove the properties of cumulative distribution functions. Proposition 10.6.3 (Properties of distribution functions) The distribution function of a RV X is an increasing, right-continuous map such that P(a < X ≤ b) = FX (b) − FX (a) for every a < b ∈ R, and lim FX (x) = 0 and
x→−∞
lim FX (x) = 1.
x→∞
Proof Let us start by observing that distributions functions (Sect. 9.6) satisfy P(a < X ≤ b) = m FX ((a, b]) = FX (b) − FX (a), for every a, b ∈ R, a < b. In fact, putting A = {ω ∈ : X (ω) ≤ a} and B = {ω ∈ : X (ω) ≤ b}, we have A ⊆ B and {ω ∈ : a < X (ω) ≤ b} = B ∩ Ac = B \ A. As P() = 1, the distribution function satisfies P(a < X ≤ b) = P(B \ A) = P(B) − P(A) = FX (b) − FX (a). If a < b, moreover, then Aa = {ω ∈ : X (ω) ≤ a} is contained in Bb = {ω ∈ : X (ω) ≤ b}, and hence FX (a) = P(Aa ) = P(X ≤ a) ≤ P(X ≤ b) = P(Bb ) = FX (b),
making the distribution function FX increasing. As PX is a measure on the σ-algebra S(), we claim FX is right-continuous. Take x ∈ R and {xi }i≥1 a decreasing sequence tending to x. Define events A = {ω ∈ : X (ω) ≤ x1 }, B1 = {ω ∈ : X (ω) ≤ x} and Bi = {ω ∈ : xi+1 < X (ω) ≤ xi }, ∀i ≥ 1, ∞ Bi . As the Bi are independent events so Bi ∩ B j = ∅ for every i = j, and A = i=1 we may set ∞ ∞ P(A) = P Bi = P(Bi ), (10.6.4) i=1
i=1
so P(A) = FX (x1 ), P(B1 ) = FX (x), P(Bi ) = FX (xi ) − FX (xi+1 ), ∀i ≥ 1. Hence (10.6.4) reads FX (x1 ) =FX (x) + [FX (x1 ) − FX (x2 )] + [FX (x2 ) − FX (x3 )] + · · · + [FX (xi ) − FX (xi+1 )] + · · · , and then FX (x) = limi→∞ FX (xi ), i.e. FX (x) = FX (x + ). Finally, we prove lim x→−∞ FX (x) = 0 and lim x→+∞ FX (x) = 1. Regarding the first, let {xi }i≥1 be a sequence decreasing to −∞. Define events A = {ω ∈ : X (ω) ≤ x1 }, Bi = {ω ∈ : xi+1 < X (ω) ≤ xi }, ∀i ≥ 1, so Bi ∩ B j = ∅, i = j, and A =
∞
P(A) = P
i=1
Bi . The independence of the Bi forces
∞ i=1
Bi
∞
=
P(Bi ), i=1
and P(A) = FX (x1 ), P(Bi ) = FX (xi ) − FX (xi+1 ), ∀i ≥ 1, so (10.6.5) becomes FX (x1 ) =[FX (x1 ) − FX (x2 )] + [FX (x2 ) − FX (x3 )] + · · · + [FX (xi ) − FX (xi+1 )] + · · ·
(10.6.5)
Therefore FX (x1 ) = FX (x1 ) − limi→∞ FX (xi ). But {xi } tends to −∞, so necessarily limi→∞ FX (xi ) = FX (−∞) = 0. As for the second limit, consider {xi }i≥1 increasing to ∞. Define = {ω ∈ : X (ω) < ∞}, B1 = {ω ∈ : X (ω) ≤ x1 }, and Bi = {ω ∈ : xi < X (ω) ≤ xi+1 }, ∀i ≥ 1, so Bi ∩ B j = ∅, i = j, and =
∞
P() = P
i=1
Bi . Once more, the Bi are independent so
∞ i=1
Bi
∞
=
P(Bi ),
(10.6.6)
i=1
and P() = 1, P(B1 ) = FX (x1 ), P(Bi ) = FX (xi+1 ) − FX (xi ), ∀i ≥ 1, and (10.6.6) reads 1 =FX (x1 ) + [FX (x2 ) − FX (x1 )] + [FX (x3 ) − FX (x2 )] + · · · + [FX (xi+1 ) − FX (xi )] + · · · , whence 1 = limi→∞ FX (xi ). As {xi } tends to ∞, lim FX (xi ) = FX (∞) = 1.
i→∞
Proposition 10.6.4 (Uniform continuity of continuous distribution functions) A continuous distribution function on R is uniformly continuous.
Proof For every x ∈ R we have 0 ≤ FX (x) ≤ 1. Given > 0 we can find real numbers a < b such that 0 ≤ FX (a)
b, |FX (x) − FX (b)| < . 2 2
and
Let us examine the situation in [a, b], where FX is continuous and hence uniformly continuous. Given x1 , x2 ∈ [a, b] there exists η() > 0 such that for every x1 , x2 ∈ [a, b] with |x1 − x2 | < η(),
|FX (x1 ) − FX (x2 )|
0. At last, calling FX the distribution function of the RV X , for any a < b we have P(a < X ≤ b) = FX (b) − FX (a) P(a < X < b) = P(a < X ≤ b) − P(X = b) = FX (b) − FX (a) − FX (b) + FX (b− ) = FX (b− ) − FX (a) P(a ≤ X < b) = P(a < X < b) + P(X = a) = FX (b− ) − FX (a) + FX (a) − FX (a − ) = FX (b− ) − FX (a − ) P(a ≤ X ≤ b) = P(a ≤ X < b) + P(X = b) = FX (b− ) − FX (a − ) + FX (b) − FX (b− ) = FX (b) − FX (a − ). ♣
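The four formulas above lend themselves to a quick numerical check. In the Python sketch below (ours; the mixture distribution is an arbitrary example, and F_left is a crude numerical stand-in for the left limit FX(x⁻)) the CDF has a jump at 0, and F(b) − F(b⁻) recovers P(X = b).

def F(x):
    """CDF of a mixture: P(X = 0) = 0.3, plus 0.7 * Uniform on (0, 2]."""
    if x < 0:
        return 0.0
    return 0.3 + 0.7 * min(max(x, 0.0), 2.0) / 2.0

def F_left(x, eps=1e-9):
    return F(x - eps)          # numerical stand-in for the left limit F(x-)

a, b = -1.0, 0.0
print(round(F(b) - F(a), 6))            # P(a < X <= b)  = 0.3  (the atom at 0)
print(round(F_left(b) - F(a), 6))       # P(a < X < b)   = 0.0
print(round(F(2.0) - F_left(0.0), 6))   # P(0 <= X <= 2) = 1.0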
Part V
Theory of integration
Introduction
To determine the area of the region below the graph of a map, what one usually does is divide the base, i.e. the interval of integration, in a suitable number of subintervals a = a0 < a1 < a2 < . . . < an = b, and for each (ai−1 , ai ) choose one of the infinitely many values f (x) attains on it. Then compute S = f (x1 )(a1 − a0 ) + f (x2 )(a2 − a1 ) + · · · + f (xn )(an − an−1 ) = f (x1 )(I1 ) + f (x2 )(I2 ) + · · · + f (xn )(In ) =
n
f (xi )(Ii ).
i=1
The roundoff errors made (both up and down) in doing so tend to become smaller, until they vanish, as the subintervals shrink to 0. This is the typical logic of a definition of integral such as the one adopted by Cauchy, which was justified under the assumption the map f to be integrated is continuous. In this case, in fact, to nearby x-values there correspond nearby images f (x). Choosing any one of the infinite values of f over a subinterval would work, because the range of f does not present irregular peaks, like jumps or discontinuities. As he formulated the notion of integral, Riemann wanted to set up a general theory to include totally discontinuous maps as well. Although his aim was different from Cauchy’s, however, he adopted a similar construction. In practice he divided the integration interval of f in a finite number of subintervals, and assumed that as these tended to zero f behaved, if not exactly as a continuous map, at least with a certain degree of regularity, i.e. with oscillations bounded by some threshold. In this way he managed to capture the integrability of only certain discontinuous maps, certainly not the majority.
Henri Lebesgue, in the 1926 paper “Sur le développment de la notion d’intégrale”, astonished by the fact that Riemann’s method would work for some discontinuous functions, wrote that when that happened it was by sheer accident. The reason for the Riemann integral’s failure to generalise as originally intended, as Lebesgue stressed in the aforementioned article, lies in insisting the oscillations of f are confined within certain limits. Put differently, the problem is applying Cauchy’s integrability criterion for continuous maps – or a petty modification of it – to discontinuous functions. Lebesgue therefore scrapped Riemann’s method altogether, considering it inadequate in view of generalisations, and adopted instead Cauchy’s guideline that close values should be gathered and multiplied by the measure of the corresponding set along the x-axis. Given the discontinuities of f , that set might not be one of the predetermined intervals, but might on the contrary be scattered over the entire domain of integration, to form a complicated linear set of points. Hence the need of a theory of measure capable establishing the ‘size’ of arbitrary linear subsets of the real line. The only way to group the values f (x) by proximity was to divide into a finite number of subintervals not the interval of integration, but the interval [c, d] of range values between the infimum and supremum of f : c = c0 < c1 < . . . < cn = d. Thus over each subinterval the values f (x) are certainly near one another, and one can make them as near as one wants by refining the partition, i.e. increasing the number of subintervals. Moreover, for every i = 1, 2, . . ., nearby values f (x) ∈ [ci−1 , ci ) will have pre-image Ai = {x ∈ [a, b] : ci−1 ≤ f (x) < ci } on the x-axis. The latter might not have a naive measure, like Riemann’s intervals. On the contrary, in view of the discontinuities of f the structure could turn out to be extremely intricate. Thanks to a suitable theory of measure we will be able to find the measure ∗ of any set Ai , and then compute the two sums S=
n
ci ∗ (Ai )
and
i=1
s=
n
ci−1 ∗ (Ai ).
i=1
The former is larger, the latter smaller, than the area of the region underneath f . As subintervals tend to zero S and s tend to the respective infimum S∗ and supremum s ∗ . ∗ Lebesgue proved that for bounded measurable functions ∞ S∗ = s , and this common Ai , denoted number is precisely the integral of f (x) over A = i=1 f (x)(d x). A
Needless to say, if A is the entire interval [a, b], then the notation used is slightly b different from the Riemann integral’s, i.e. [a,b] f (x)(d x) as opposed to a f (x)d x.
In this way the subintervals Ii appearing in the Riemann’s definition, whose measure is elementary, are replaced by sets Ai , which may be rather complicated sets. In the aforementioned survey paper Lebesgue explains the novelties of his integral with an effective analogy, which we transcribe faithfully because of its intelligibility and straightforwardness: “[…] with Riemann’s procedure, one attempted to sum the indivisibles by taking them in the order in which they are given by the variation of x, one would operate like a merchant with no method who counted coins and bills in the order they came into his hands; whereas we operate like the methodical tradesman who says: I have m(E 1 ) 1-crown coins with total value 1 · m(E 1 ), I have m(E 2 ) 2-crown coins with total value 2 · m(E 2 ), I have m(E 3 ) 5-crown coins with total value 5 · m(E 3 ) etc, so altogether I have S = 1 · m(E 1 ) + 2 · m(E 2 ) + 5 · m(E 3 ) + . . . The two processes will lead, certainly, the merchant to the same result because, irrespective of how rich he is, there is only a finite number of bills to count; but, for us, who have to sum infinitely many indivisibles, the difference of the two ways is crucial”. Integration theory therefore requires we keep conceptually separated the values of f from what we call integration sets Ai or Ii , which for the Lebesgue integral are pre-images under f . The structure of the Ai , which are spread across the x-axis in ways which can be more or less complex, is determined by range values of f close to one another. Roughly, we may say a map f is Lebesgue integrable if every set that f projects on the horizontal axis is measurable. Within this framework the study we have made of so-called measurable functions is fully justified.
Chapter 11
The Lebesgue Integral
11.1 The Integral of Non-negative, Simple Measurable Maps ¯ using its positive and negative parts f + , f − has a number Writing a map f : X → R of advantages. The first is one can then define integrals just for non-negativefunctions. The that is allows to view integration as a linear operator, i.e. f dm = +second is f dm − f − dm. Throughout this chapter the space (X, S, m) will be complete and with σ-finite measure, unless said otherwise. Let us remind M0+ (X, S) is the collection of non-negative S-measurable simple maps s : X → [0, ∞], s(x) =
Σ_{i=1}^n aᵢ ϕ_{Aᵢ}(x), x ∈ X.
n Ai = X . The ai ∈ [0, ∞] are distinct, and Ai ∈ S, Ai ∩ A j = ∅, ∀i = j and i=1 Given simple S-measurable maps s, s1 , s2 with values in [0, ∞], and a constant α ≥ 0, we know the following are simple and S-measurable (and [0, ∞]-valued): (αs), (s1 + s2 ), (s1 · s2 ), (s1 ∨ s2 ), (s1 ∧ s2 ), (sϕ E ) for any E ∈ S etc. We shall define the integral for such class of functions. Keeping in mind the integral of a characteristic function of an S-measurable set A ⊆ X is defined by ϕ A dm = 1 · m(A), X
we may define integrals of simple non-negative measurable maps in the following way.
Definition 11.1.1 (Integral of non-negative simple maps) On the measure space (X, S, m) consider s = Σ_{i=1}^n aᵢ ϕ_{Aᵢ} ∈ M₀⁺(X, S). We call Lebesgue integral of s with respect to measure m (or simply, in m) the expression
∫_X s dm = Σ_{i=1}^n aᵢ m(Aᵢ). ♦
When ai = 0 and m(Ai ) = ∞ the convention is to set 0 · (∞) = 0, so that the integral of 0 over a set of infinite measure vanishes. The integral might be ∞ in case ai ∈ (0, ∞] and m(Ai ) = ∞ for some i ≥ 1. This notion does not produce expressions like ±∞ · ∓∞,1 and is well defined because it does not depend on the particular representation of s. Suppose in fact s : X → [0, ∞] simple can be written s(x) =
n
ai ϕ Ai =
i=1
m
b j ϕB j ,
j=1
whereAi , B j ∈ S, i = 1, . . . , n, j = 1, . . . , m, are pairwise disjoint and X = mj=1 B j . The claim is that s dm = X
n
ai m(Ai ) =
i=1
m
n i=1
Ai =
b j m(B j ).
j=1
But by Remark 10.2.4 s dm = X
n
ai m(Ai ) =
i=1
=
m
bj
n
j=1
n
ai m
i=1
m
n m (Ai ∩ B j ) = ai m(Ai ∩ B j )
j=1
m(Ai ∩ B j ) =
i=1
i=1 m
j=1
b j m(B j ).
j=1
Definition 11.1.2 (Integral of non-negative simple maps on measurable sets) Given a measure space (X, S, m), the Lebesgue integral with respect to m of the simple map s = Σ_{i=1}^n aᵢ ϕ_{Aᵢ} ∈ M₀⁺(X, S) on E ∈ S is the quantity
∫_E s dm = ∫_X sϕ_E dm = Σ_{i=1}^n aᵢ m(Aᵢ ∩ E). ♦
1 As s ∈ M₀⁺ is [0, ∞]-valued the worst that can happen is ∫_X s dm = Σ_{i=1}^n aᵢ m(Aᵢ) = ∞ · ∞.
Recalling Definition 9.3.1 we immediately see s dm = E
n
ai m |(Ai ∩E) =
s dm |E .
(11.1.1)
X
i=1
Let us examine the properties of the integral of simple maps, from which the general theory of Lebesgue integration can be developed. Proposition 11.1.1 (Properties of the integral of non-negative simple maps) For any s, s1 , s2 ∈ M0+ (X, S) and α ≥ 0: s dm ∈ [0, ∞]; 1. X 2. (homogeneity) αs dm = α s dm X X 3. (finite additivity) (s1 + s2 ) dm = s1 dm + s2 dm; X X X 4. (monotonicity) s1 dm ≤ s2 dm, if s1 ≤ s2 in X ; X X 5. (monotonicity w.r.t. integration domain) s dm ≤ s dm, ∀E ∈ S; E X 6. (s1 ∧ s2 ) dm ≤ si dm ≤ (s1 ∨ s2 ) dm, i = 1, 2. X
X
X
Proof 1. Regarding 1. it suffices to note the integral is a non-negative extended real number, as sum of non-negative terms in [0, ∞]. 2. n n αs dm = αai m(Ai ) = α ai m(Ai ) = α s dm. X
i=1
X
i=1
3. Writing (s1 + s2 ) =
m n
(ai + b j )ϕ Ai ∩B j ,
i=1 j=1
we deduce
m n
(ai + b j )m(Ai ∩ B j ) s1 + s2 dm = X
i=1 j=1
=
n i=1
=
n i=1
ai
m
m(Ai ∩ B j ) +
j=1
ai m(Ai ) +
m j=1
m j=1
b j m(B j )
bj
n i=1
m(Ai ∩ B j )
=
s1 dm + X
s2 dm. X
Hence integration is a linear process. 4. Suppose s1 ≤ s2 , i.e. ai ≤ b j , ∀x ∈ (Ai ∩ B j ), i = 1, . . . , n, j = 1, . . . , m. Then n
ai ϕ Ai =
i=1
s1 dm = X
ai ϕ Ai ∩B j ≤
i=1 j=1
⇔
n m
m n
n m
b j ϕ Ai ∩B j =
i=1 j=1
ai m(Ai ∩ B j ) ≤
i=1 j=1
m
b j ϕB j
j=1
m n
b j m(Ai ∩ B j ) =
i=1 j=1
s2 dm. X
5. For every E ∈ S we have ϕ E ≤ ϕ X , and so sϕ E ≤ s. Definition 11.1.2 and the previous item give s dm = sϕ E dm ≤ s dm. E
X
X
6. As min{ai , b j } ≤ max{ai , b j }, ∀x ∈ (Ai ∩ B j ), then s1 ∧ s2 =
m n
min{ai , b j }ϕ(Ai ∩B j )
i=1 j=1
≤ si , i = 1, 2 ≤
n m
max{ai , b j }ϕ(Ai ∩B j ) = s1 ∨ s2 .
i=1 j=1
Using item 4. ends the proof. Proposition 11.1.2 (Measure with density s ∈ M0+ (X, S) with respect to a reference measure m) Consider a measure space (X, S, m) and s ∈ M0+ (X, S). The function μ : S → [0, ∞] defined as μ(E) = s dm, ∀E ∈ S E
is a completely additive measure, said to have density $s$ with respect to $m$.

Proof We have to prove $\mu$ is a measure. First, $\emptyset \in S$, so
$$\mu(\emptyset) = \int_{\emptyset} s\,dm = \sum_{i=1}^{n} a_i\,m(A_i \cap \emptyset) = 0.$$
Take a pairwise-disjoint sequence $\{E_k\}_{k \geq 1} \subseteq S$ such that $E = \bigcup_{k=1}^{\infty} E_k \in S$; we must show
$$\mu(E) = \sum_{k=1}^{\infty} \mu(E_k).$$
Pick $s = \sum_{i=1}^{n} a_i \varphi_{A_i}$. Since positive-term series can be rearranged, we have
$$\mu(E) = \int_E s\,dm = \sum_{i=1}^{n} a_i\,m(A_i \cap E) = \sum_{i=1}^{n} a_i \sum_{k=1}^{\infty} m(A_i \cap E_k) = \sum_{k=1}^{\infty}\sum_{i=1}^{n} a_i\,m(A_i \cap E_k) = \sum_{k=1}^{\infty} \int_{E_k} s\,dm = \sum_{k=1}^{\infty} \mu(E_k). \qquad \square$$
Proposition 11.1.3 (Integral of simple maps: a remarkable inequality) Given a measure space $(X, S, m)$, a simple map $s$ and an increasing sequence $\{s_n\}_{n \geq 1}$ of simple maps in $M_0^+(X, S)$ such that $s \leq \lim_{n \to \infty} s_n$, then
$$\int_X s\,dm \leq \lim_{n \to \infty} \int_X s_n\,dm.$$

Proof Let $\varepsilon \in (0, 1)$ and define, for any $n \in \mathbb{N}$,
$$E_n = \{x \in X : (1 - \varepsilon)s(x) \leq s_n(x)\}.$$
We claim $\{E_n\}_{n \geq 1} \subseteq S$: given $n \in \mathbb{N}$, since $s_n, s \in M_0^+(X, S)$ then $(1 - \varepsilon)s \in M_0^+(X, S)$, so Proposition 10.2.2 guarantees $E_n \in S$. Furthermore the sequence $\{E_n\}_{n \geq 1}$ is increasing: given $n \in \mathbb{N}$ and $x \in E_n$, $(1 - \varepsilon)s(x) \leq s_n(x) \leq s_{n+1}(x)$, i.e. $x \in E_{n+1}$. At last, $X = \bigcup_{n=1}^{\infty} E_n$. On the one hand $E_n \subseteq X$ for every $n \in \mathbb{N}$, so $\bigcup_{n=1}^{\infty} E_n \subseteq X$. Conversely, take $x \in X$ and set $t(x) = \lim_{n \to \infty} s_n(x)$. By hypothesis $s(x) \leq t(x)$. If $t(x) = \infty$, for $n$ large enough $s_n(x) \geq (1 - \varepsilon)s(x)$, whilst if $t(x) < \infty$ a large enough $n$ still implies $s_n(x) > (1 - \varepsilon)t(x) \geq (1 - \varepsilon)s(x)$. In either case there exists an $n \in \mathbb{N}$ such that $x \in E_n$, whence $X \subseteq \bigcup_{n=1}^{\infty} E_n$.
Take the measure $\mu : S \to [0, \infty]$ with density $s$ with respect to $m$ and obtain, from Theorem 8.2.3,
$$\mu(X) = \int_X s\,dm = \lim_{n \to \infty} \int_{E_n} s\,dm = \lim_{n \to \infty} \mu(E_n).$$
Since $(1 - \varepsilon)s(x) \leq s_n(x)$ for every $x \in E_n$ and every $n \in \mathbb{N}$, the integral's monotonicity implies
$$(1 - \varepsilon)\int_{E_n} s\,dm \leq \int_{E_n} s_n\,dm \leq \int_X s_n\,dm$$
$$\Rightarrow\; (1 - \varepsilon)\lim_{n \to \infty} \int_{E_n} s\,dm \leq \lim_{n \to \infty} \int_X s_n\,dm \;\Leftrightarrow\; (1 - \varepsilon)\int_X s\,dm \leq \lim_{n \to \infty} \int_X s_n\,dm.$$
To finish, remember $\varepsilon$ is arbitrary. $\square$
11.2 The Integral of Non-negative Maps

In view of the comments in the introductions to the present chapter and to Sect. 10.4, and of the previous section's results, it is only natural to define the Lebesgue integral of non-negative measurable functions in terms of non-negative simple maps.

Definition 11.2.1 (Integral of maps in $M^+(X, S)$) Given a measure space $(X, S, m)$, define $S_f = \{s \in M_0^+(X, S) : 0 \leq s \leq f\}$. The integral of $f \in M^+(X, S)$ with respect to $m$ is
$$\int_X f\,dm = \sup_{s \in S_f} \int_X s\,dm. \qquad \diamond$$

If $f \in M^+(X, S)$ and $E \in S$ then $f \cdot \varphi_E \in M^+(X, S)$, as a product of measurable functions. Therefore we may define integrals over sets $E \in S$.

Definition 11.2.2 (Integral of maps in $M^+(X, S)$ over $S$-sets) Given a measure space $(X, S, m)$, consider $f \in M^+(X, S)$ and $E \in S$. We call integral of $f$ with respect to $m$ over the domain $E \subseteq X$ the quantity
$$\int_E f\,dm = \int_X f \cdot \varphi_E\,dm. \qquad \diamond$$
Remark 11.2.1 By (11.1.1), immediately
$$\int_E f\,dm = \sup_{s \in S_f} \int_E s\,dm = \sup_{s \in S_f} \int_X s\,dm|_E = \int_X f\,dm|_E. \tag{11.2.1} \quad \clubsuit$$

Besides $f\varphi_E$, for any $E \in S$, the maps $\alpha f$, $f + g$, $f \cdot g$, $f \vee g$, $f \wedge g$, etc. are also measurable and $[0, \infty]$-valued whenever $\alpha \geq 0$ and $f, g$ are measurable and $[0, \infty]$-valued, by Theorem 10.2.1.

Next up is a result expressing the integral of a non-negative measurable function as a limit of integrals.

Proposition 11.2.1 (Integral of a map in $M^+(X, S)$ as limit of integrals of simple maps in $M_0^+(X, S)$) Given a measure space $(X, S, m)$, consider $f \in M^+(X, S)$ and an increasing sequence $\{s_n\}_{n \geq 1} \subseteq M_0^+(X, S)$ such that $f = \lim_{n \to \infty} s_n$. Then
$$\int_X f\,dm = \lim_{n \to \infty} \int_X s_n\,dm.$$
Proof When $\int_X f\,dm < \infty$, let us take any $\varepsilon > 0$. As $\lim_{n \to \infty} s_n = f$, any $s \in S_f$ satisfies $s \leq \lim_{n \to \infty} s_n$, and by Proposition 11.1.3
$$\int_X s\,dm \leq \lim_{n \to \infty} \int_X s_n\,dm < \sup_{s \in S_f} \int_X s\,dm + \varepsilon = \int_X f\,dm + \varepsilon.$$
This is true for all $s \in S_f$, and the arbitrariness of $\varepsilon$ gives the claim. In a similar way one proves the case $\int_X f\,dm = \infty$. $\square$

This proposition says, in other words, that the integral of a limit function in $M^+(X, S)$ equals the limit of the integrals of the (increasing) sequence of non-negative simple measurable maps.

Using what we have learnt thus far we shall prove the basic properties of integrals.

Proposition 11.2.2 (Properties of the integral of functions in $M^+(X, S)$) Given a measure space $(X, S, m)$, $f, g \in M^+(X, S)$ and $\alpha \geq 0$:
1. $\int_X f\,dm \in [0, \infty]$;
2. (homogeneity) $\int_X \alpha f\,dm = \alpha \int_X f\,dm$;
3. (finite additivity) $\int_X (f + g)\,dm = \int_X f\,dm + \int_X g\,dm$;
4. (monotonicity) if $f \leq g$ on $X$, then $\int_X f\,dm \leq \int_X g\,dm$;
5. (monotonicity w.r.t. the integration domain) for any $E \in S$, $\int_E f\,dm \leq \int_X f\,dm$.

Proof 1. This descends directly from Definition 11.2.1.
2. $\int_X \alpha f\,dm = \sup_{s \in S_f} \int_X \alpha s\,dm = \alpha \sup_{s \in S_f} \int_X s\,dm = \alpha \int_X f\,dm.$
3. Since $f + g \in M^+(X, S)$, pick an increasing sequence $\{s_n + t_n\}_{n \geq 1} \subseteq M_0^+(X, S)$ such that $f + g = \lim_{n \to \infty}(s_n + t_n)$. By Propositions 11.2.1 and 11.1.1 (item 3.) we have
$$\int_X (f + g)\,dm = \lim_{n \to \infty} \int_X (s_n + t_n)\,dm = \lim_{n \to \infty}\Bigl(\int_X s_n\,dm + \int_X t_n\,dm\Bigr) = \lim_{n \to \infty}\int_X s_n\,dm + \lim_{n \to \infty}\int_X t_n\,dm = \int_X f\,dm + \int_X g\,dm.$$
4. Since $f \leq g$ we have $S_f \subseteq S_g$, and Proposition 11.1.1, 4., implies
$$\int_X f\,dm = \sup_{s \in S_f} \int_X s\,dm \leq \sup_{t \in S_g} \int_X t\,dm = \int_X g\,dm.$$
5. Clearly $\varphi_E \leq \varphi_X$, so $f\varphi_E \leq f\varphi_X$, and the previous item gives $\int_E f\,dm \leq \int_X f\,dm$. $\square$
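The following Python sketch is an editorial illustration, not part of the original text: it builds the canonical increasing simple approximations $s_n = \min\{n, 2^{-n}\lfloor 2^n f \rfloor\} \leq f$ for $f(x) = \sqrt{x}$ on $[0,1]$ and checks, via a plain grid average standing in for the Lebesgue integral, that the integrals of the $s_n$ increase towards $\int_{[0,1]} f\,d\ell = 2/3$, in the spirit of Definition 11.2.1 and Proposition 11.2.1. The function names, the choice of $f$ and the grid size are assumptions of the illustration.

```python
import numpy as np

def dyadic_simple(f_vals: np.ndarray, n: int) -> np.ndarray:
    """Canonical simple approximation s_n = min(n, floor(2^n f) / 2^n) <= f."""
    return np.minimum(n, np.floor((2.0 ** n) * f_vals) / (2.0 ** n))

x = np.linspace(0.0, 1.0, 1_000_001)   # grid on [0,1]; the mean approximates the integral
f_vals = np.sqrt(x)
for n in (1, 2, 4, 8):
    s_vals = dyadic_simple(f_vals, n)
    print(n, s_vals.mean())             # increasing values, approaching 2/3
print("integral of f:", f_vals.mean())  # ~ 0.6667
```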
Proposition 11.2.3 (Chebyshev inequality) Given a measure space $(X, S, m)$, for any $f \in M^+(X, S)$ and $\alpha > 0$
$$m\bigl(\{x \in X : f(x) \geq \alpha\}\bigr) \leq \frac{1}{\alpha} \int_X f(x)\,m(dx).$$

Proof Observe first that the set $A_\alpha = \{x \in X : f(x) \geq \alpha\}$ is measurable, since $f$ is measurable on $X$. Moreover $f$ is non-negative, so by the integral's monotonicity
$$\int_X f\,dm \geq \int_{A_\alpha} f\,dm \geq \int_{A_\alpha} \alpha\,dm = \alpha \cdot m(A_\alpha) \;\Leftrightarrow\; m\bigl(\{x \in X : f(x) \geq \alpha\}\bigr) \leq \frac{1}{\alpha}\int_{A_\alpha} f\,dm \leq \frac{1}{\alpha}\int_X f\,dm. \qquad \square$$
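As a hedged numerical aside (not from the book): on $([0,1], \mathcal{L}, \ell)$ with $f(x) = x^2$ one can compare $\ell\{f \geq \alpha\} = 1 - \sqrt{\alpha}$ with the Chebyshev bound $\frac{1}{\alpha}\int f\,d\ell = \frac{1}{3\alpha}$. The grid averages below are only crude stand-ins for the measure and the integral.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1_000_001)
f = x ** 2
integral = f.mean()                      # ~ 1/3 = integral of x^2 over [0,1]
for alpha in (0.1, 0.25, 0.5):
    measure = float(np.mean(f >= alpha)) # ~ l{x : f(x) >= alpha} = 1 - sqrt(alpha)
    bound = integral / alpha             # Chebyshev bound (1/alpha) * integral of f
    print(alpha, measure, "<=", bound)
```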
Now we shall establish an important theorem for passing limits inside integrals. It was first proved in 1906 by Beppo Levi (1875–1961) and is known in the literature as the "monotone convergence theorem".

Theorem 11.2.1 (Beppo Levi's monotone convergence theorem) Let $\{f_i\}_{i \geq 1} \subseteq M^+(X, S)$ be an increasing sequence, and set $\lim_{i \to \infty} f_i = f$ on $X$. Then $f \in M^+(X, S)$ and
$$\lim_{i \to \infty} \int_X f_i\,dm = \int_X f\,dm.$$
Proof As $\{f_i\}_{i \geq 1} \subseteq M^+(X, S)$ is an increasing sequence of measurable, non-negative functions, by Theorem 10.2.5 so is $f$. Let us prove the second assertion. Since $f_i \leq f_{i+1} \leq f$ on $X$, item 4. in Proposition 11.2.2 tells us
$$\int_X f_i\,dm \leq \int_X f_{i+1}\,dm \leq \int_X f\,dm$$
for every $i \in \mathbb{N}$, i.e. $\bigl\{\int_X f_i\,dm\bigr\}_{i \geq 1}$ is an increasing sequence of numbers, so
$$\lim_{i \to \infty} \int_X f_i\,dm \leq \int_X f\,dm.$$
Hence we just need the opposite inequality. Pick $\alpha \in (0, 1)$ and $s(x) = \sum_{j=1}^{n} a_j \varphi_{E_j} \in M_0^+(X, S)$, with $\bigcup_{j=1}^{n} E_j = X$ and $0 \leq s(x) \leq f(x)$. Define
$$A_i = \{x \in X : f_i(x) \geq \alpha \cdot s(x)\}.$$
By 4. in Proposition 10.2.2 all sets $A_i$ are $S$-measurable, and they form an increasing sequence: given any $i \in \mathbb{N}$ and $x \in A_i$, we have $\alpha \cdot s(x) \leq f_i(x) \leq f_{i+1}(x)$ and so $x \in A_{i+1}$. Hence $A_i \subseteq A_{i+1}$.

We claim $\bigcup_{i=1}^{\infty} A_i = X$. Take $x \in X$, so that $\alpha \cdot s(x) < s(x) \leq f(x)$. There exists $i_0 \geq 1$ such that $\alpha \cdot s(x) \leq f_{i_0}(x)$, and then $x \in A_{i_0}$. Hence $X \subseteq \bigcup_{i=1}^{\infty} A_i$. Since $A_i \subseteq X$ for all $i$, in the end $\bigcup_{i=1}^{\infty} A_i = X$.

Moreover $\alpha \cdot s(x) \leq f_i(x)$ on $A_i$ and $f_i(x) \cdot \varphi_{A_i}(x) \leq f_i(x)$, so 4. in Proposition 11.2.2 gives
$$\alpha \int_{A_i} s\,dm \leq \int_{A_i} f_i\,dm \leq \int_X f_i\,dm.$$
Let $\mu : S \to [0, \infty]$ denote the measure with density $s$ with respect to $m$. Since $\{A_i\}_{i \geq 1}$ is increasing and $X = \bigcup_{i=1}^{\infty} A_i$, Theorem 8.2.3 tells us
$$\alpha \lim_{i \to \infty} \int_{A_i} s\,dm = \alpha \lim_{i \to \infty} \mu(A_i) = \alpha \mu(X) = \alpha \int_X s\,dm \leq \lim_{i \to \infty} \int_X f_i\,dm.$$
This inequality holds as $\alpha \to 1^-$, so
$$\int_X s\,dm \leq \lim_{i \to \infty} \int_X f_i\,dm.$$
But s ∈ M0+ (X, S) is arbitrary, so the above applies to sup s dm s∈S f
X
as well. Eventually, recalling Definition 11.2.1, f dm ≤ lim f i dm X
concluding the proof.
i→∞
X
11.2 The Integral of Non-negative Maps
295
Remark 11.2.2 In the monotone convergence theorem we cannot replace the increasing sequence with a decreasing one. Consider the measure space (R, L , ) and the decreasing sequence { f n (x)}n≥1 , ⎧ ⎨1 if x ≥ n f n (x) = n ⎩ 0 if x < n. Clearly f n ∈ M + (R, L ) for every n, and for any x ∈ R lim f n (x) = f (x) = 0,
n→∞
hence R
f d = 0.
At the same time R
f n d = 0 · ((−∞, n)) +
1 · ([n, ∞)) = ∞ n
and lim
n→∞ R
f n d = ∞,
and therefore
lim
n→∞ R
f n d =
f d.
♣
R
Proposition 11.2.4 (Consequences of Beppo Levi’s theorem) The following are corollaries to Beppo Levi’s theorem: ∞ Ai , and 1. if {Ai }i≥1 are pairwise-disjoint S-measurable sets such that X = i=1 ¯ belongs in M + (X, S), f : X → R f dm = X
∞ i=1
f dm; Ai
2. if { f i }i≥1 ⊆ M + (X, S) then ∞ X i=1
f i dm =
∞ i=1
f i dm. X
296
11 The Lebesgue Integral
k Proof 1. The sequence {gk }k≥1 , gk = i=1 f ϕ Ai ∈ M + (X, S) for every k ∈ N, is increasing, so by Theorem 10.2.5 g = f ϕ X ∈ M + (X, S). The integral’s additivity and monotone convergence imply
f dm = lim X
gk dm = lim
k→∞
= lim
k→∞
k
k→∞
X k
f ϕ Ai dm = X
i=1
f ϕ Ai dm
X i=1
∞
f dm. Ai
i=1
k 2. Write gk = i=1 f i . Every f i is non-negative, so {gk }k≥1 is an increasing sequence of non-negative measurable maps. Again, additivity and monotone convergence give ∞ X i=1
f i dm =
lim
k
X k→∞ i=1
= lim
k→∞
k i=1
f i dm = lim
k
k→∞
f i dm = X
∞ i=1
f i dm
X i=1
f i dm.
X
Let us extend Proposition 11.1.2 to non-negative measurable functions. Proposition 11.2.5 (Measures with density f ∈ M + (X, S) with respect to m) Given a measure space (X, S, m) and a map f ∈ M + (X, S), the function μ : S → [0, ∞], μ(E) = f dm, ∀E ∈ S E
is a completely additive measure. We say μ has density f with respect to m. Proof First, the set function μ : S → [0, ∞] is a measure on S. As ∅ ∈ S we know μ(∅) = f dm = sup s dm = 0. ∅
s∈S f
∅
Take a sequence {E i }i≥1 of pairwise-disjoint sets in S such that E = let us show μ(E) =
∞ i=1
μ(E i ),
∞ i=1
E i , and
i.e. f dm = E
Clearly ϕ E =
∞ i=1
f ϕ E dm = X
f X
∞
=
f dm. Ei
i=1
ϕ Ei . By part 1. in Proposition 11.2.4
f dm = E
∞
i=1
f ϕ Ei dm = X
∞
∞ ϕ Ei dm = f ϕ Ei dm
i=1 ∞ i=1
X
i=1
f dm.
Ei
From that, we can express the integral of any (non-negative) measurable map in the following terms. Proposition 11.2.6 (Integrals of measurable maps with respect to density measures) Given the measure space (X, S, m), let f ∈ M + (X, S) be the density of μ : S → [0, ∞] with respect to m. For every g ∈ M + (X, S) g dμ = g f dm. X
Proof For s =
n i=1
X
ai ϕ Ai
s dμ = X
=
n
ai μ(Ai ) i=1 n
=
n
ai
i=1
ai f ϕ Ai dm =
i=1
=
X
f dm Ai
n X
ai ϕ Ai
f dm
i=1
s f dm. X
If g ∈ M + (X, S), let {sn }n≥1 be an increasing sequence of simple, non-negative functions such that limn→∞ sn = g. Beppo Levi’s theorem and the above imply
g dμ = lim X
n→∞
sn dμ = lim X
n→∞
sn f dm = X
lim sn f dm =
X n→∞
g f dm. X
The following proposition is of interest because it highlights the semi-continuity of the Lebesgue integral of sequences of non-negative measurable maps.
Theorem 11.2.2 (Fatou’s lemma) Given the measure space (X, S, m) and a sequence { f i }i≥1 ⊆ M + (X, S), lim inf f i dm ≤ lim inf f i dm. X
i→∞
i→∞
X
Proof Recall the definition of limit inferior
lim inf f i = sup inf f k . i→∞
k≥i
i≥1
By Theorem 10.2.3, lim inf i→∞ f i ∈ M + (X, S). Since inf k≥i f k i≥1 is an increasing sequence in M + (X, S) such that
lim inf f k = lim inf f i ,
i→∞
k≥i
i→∞
applying the monotone convergence theorem gives inf f k dm. lim inf f i dm = lim X
i→∞
i→∞
(11.2.2)
k≥i
X
Moreover inf k≥i f k ≤ f i for every i ∈ N, and monotonicity implies inf f k dm ≤ f i dm X
k≥i
(11.2.3)
X
inf k≥i f k dm i≥1 : it for every i ∈ N. Consider the sequence of integrals X
is increasing and with values in [0, ∞]. Hence limi→∞
X infk≥i f k dm exists, belongs to [0, ∞], and actually equals lim inf i→∞ X inf k≥i f k dm. By (11.2.3) we deduce inf f k
lim
i→∞
X
k≥i
dm = lim inf i→∞
inf f k
X
k≥i
dm ≤ lim inf i→∞
The required inequality descends from (11.2.2), (11.2.4).
f i dm.
(11.2.4)
X
Remark 11.2.3 1. First of all Fatou’s lemma is applicable to sequences in M + (X, S) that are not necessarily decreasing or increasing: it suffices they are measurable and non-negative. 2. Here is an example fulfilling Fatou’s strict inequality: the sequence with general term f n (x) = ϕ[n,n+1] (x) ∈ M + (X, S) has values in [0, ∞) and
f (x) = lim ϕ[n,n+1] (x) = ϕ∅ (x) = 0, ∀x ∈ [0, ∞). n→∞
Moreover, ∀n ∈ N
[0,∞)
f n dm =
⇒ lim inf n→∞
[0,∞)
[0,∞)
ϕ[n,n+1] dm = m([n, n + 1] = 1
f n dm = 1 >
[0,∞)
f dm = 0.
3. If a sequence is made of non-positive measurable functions, applying Fatou’s lemma to {− f n }n≥1 gives X
⇔ X
lim inf (− f n ) dm ≤ lim inf (− f n ) dm n→∞ n→∞ X lim sup f n dm ≥ lim sup f n dm. n→∞
n→∞
(11.2.5)
X
4. If we remove the hypothesis of non-negativity or non-positivity, Fatou’s lemma becomes false, in general. Consider the real-valued sequence defined by f n (x) = (−1)n ϕ[n,n+1] (x). Clearly f (x) = lim f n (x) = (−1)n ϕ∅ (x) = 0, ∀x ∈ [0, ∞), n→∞
and ∀n ∈ N f n dm = (−1)n ϕ[n,n+1] dm = (−1)n m([n, n + 1] = (−1)n [0,∞) [0,∞) ⇒ lim inf f n dm = −1 < f dm = 0. n→∞
[0,∞)
[0,∞)
5. If there exists a measurable map h such that f n (x) ≥ h(x) or
f n (x) ≤ h(x), ∀x ∈ X, ∀n ∈ N,
then the measurable sequence { f n }n≥1 satisfies Fatou’s lemma or inequality (11.2.5), regardless of the sign of the f n . It is enough to consider {gn }n≥1 , where ♣ gn = f n − h, and apply Fatou’s lemma or (11.2.5). Here come two more useful properties. Theorem 11.2.3 (Vanishing of the integral) A function f ∈ M + (X, S) has vanishing integral,
f dm = 0, X
if and only if f = 0 m-almost everywhere. Proof Necessary implication: suppose f dm = 0 X
and define for every n ∈ N 1 . E n = x ∈ X : f (x) ≥ n Chebyshev’s inequality implies 1 ≤n m(E n ) = m x ∈ X : f (x) ≥ f dm = 0, n X so m(E n ) = 0, ∀n ∈ N. Since ∞ E = x ∈ X : f (x) > 0 = E n , and E 1 ⊆ E 2 ⊆ . . . , n=1
the complete subadditivity of m ensures that ∞ ∞
m(E) = m {x ∈ X : f (x) > 0} = m En ≤ m(E n ) = 0. n=1
n=1
Therefore the measure of (X \E), where f (x) = 0, equals m(X ) − m
∞
En
= m(X ),
n=1
i.e. f (x) = 0 up to a zero-measure set E ⊆ X . Sufficient implication: suppose f (x) is non-negative, S-measurable, and null ma.e., so
m(E) = m {x ∈ X : f (x) > 0} = 0. Pick the increasing sequence {sn (x)}n≥1 of non-negative simple measurable functions sn (x) = n · ϕ E (x). Clearly
f (x) ≤ lim inf sn (x).
(11.2.6)
n→∞
Moreover, for every n ∈ N sn dm = n ϕ E dm = n · m(E) = 0, X
X
and then sn dm = 0.
lim inf n→∞
X
Applying Fatou’s lemma to inequality (11.2.6) we obtain 0≤ f dm ≤ lim inf sn dm ≤ lim inf sn dm = 0. X
X
n→∞
n→∞
X
Theorem 11.2.4 (Finite integral implies bounded integrand) Any f ∈ M + (X, S) with f dm < ∞ X
is bounded m-almost everywhere. Conversely, if m(X ) < ∞ and f is bounded m-a.e., then f dm < ∞. X
Proof Define, for n ∈ N, E n = {x ∈ X : f (x) ≥ n}.
∞ Then m {x ∈ X : f (x) = ∞} = m n=1 E n , and by Chebyshev’s inequality 1 m(E n ) ≤ f dm ⇒ m(E 1 ) ≤ f dm < ∞. n X X As {E n }n≥1 is decreasing and m(E 1 ) < ∞, Theorem 8.2.4 forces m
∞ n=1
Therefore
En
1 n→∞ n
= lim m(E n ) ≤ lim n→∞
f dm = 0. X
∞
m {x ∈ X : f (x) = ∞} = m E n = 0. n=1
On the other hand if f (x) is non-negative, S-measurable, such that f (x) ≤ M m-a.e. for some M > 0, then
m(E) = m {x ∈ X : f (x) > M} = 0. The increasing sequence {sn (x)}n≥1 of non-negative simple measurable maps sn (x) = M · ϕ X \E (x) + n · ϕ E (x) is such that f (x) ≤ lim inf sn (x).
(11.2.7)
n→∞
For every n ∈ N, furthermore, sn dm = M ϕ X \E dm + n ϕ E dm = M · m(X ), X
X
but m(X ) < ∞, so
X
sn dm = M · m(X ) < ∞.
lim inf n→∞
X
Using inequality (11.2.7) and Fatou’s lemma, 0≤ f dm ≤ lim inf sn dm ≤ lim inf sn dm < ∞. X
X
n→∞
n→∞
X
Theorem 11.2.5 (Comparing integrals of maps in M + (X, S)) Consider f, g ∈ M + (X, S) and suppose f ≤ g, or f = g, or f ≥ g, m-a.e. Correspondingly, then,
f dm ≤ X
g dm; X
f dm = X
g dm; X
f dm ≥ X
Proof Define A = {x ∈ X : f (x) > g(x)}, so m(A) = 0. Moreover f = f ϕ X \A + f ϕ A and g = gϕ X \A + gϕ A , where f ϕ A = 0 = gϕ A m-a.e. By Theorem 11.2.3
g dm. X
f dm = X
f ϕ X \A dm and
g dm =
X
X
gϕ X \A dm. X
As f ϕ X \A ≤ gϕ X \A for every x ∈ X , by monotonicity f dm ≤ g dm. X
X
The cases f = g and f ≥ g are completely analogous.
11.3 The Integral of Arbitrary Maps

Now we pass to arbitrary functions. From here on, unless otherwise noted, we shall consider maps $f$ defined on a complete measurable space $(X, S)$ with a σ-finite measure $m$, and with values in $\bar{\mathbb{R}}$. Recall that $f$ can be written using its positive and negative parts $f^+, f^-$, which are non-negative; furthermore $f^+, f^-$ are $S$-measurable iff $f$ is $S$-measurable (Corollary 10.2.1). The difference $f^+ - f^-$, where $f^+, f^- \in M^+(X, S)$, is in $M(X, S)$. The integrals $\int f^+\,dm$ and $\int f^-\,dm$ are well defined and by linearity we can define $\int f\,dm = \int f^+\,dm - \int f^-\,dm$. Note, though, that this might produce the indeterminacy $\infty - \infty$ in case $\int f^+\,dm = \int f^-\,dm = \infty$. In defining integrable or summable functions we must then exclude those situations.

Definition 11.3.1 (Integrable and summable functions) If $f : X \to \bar{\mathbb{R}}$ belongs to $M(X, S)$ then:
1. $f$ is integrable with respect to $m$ (in $m$, for short) if the integrals $\int_X f^+\,dm$ and $\int_X f^-\,dm$ exist and at least one of them is finite.
X
If so, the Lebesgue integral of f on X is + f dm = f dm − f − dm ∈ [−∞, ∞]; X
X
X
2. f is summable (or Lebesgue integrable) in m if both integrals + f dm and f − dm X
X
exist and are finite, in which case the Lebesgue integral of f on X is + f dm = f dm − f − dm ∈ (−∞, ∞). X
X
♦
X
We shall denote by $L^1(X, S, m)$ the space of $m$-summable functions. For simplicity's sake we may write $L^1(X)$, $L^1(m)$ or just $L^1$ whenever no confusion can arise. The first immediate observation is that $f \in L^1 \Rightarrow f \in M(X, S)$.

Proposition 11.3.1 (Properties of $L^1$ functions) For any $f, g : X \to \bar{\mathbb{R}}$ in $M(X, S)$:
f ∈ L 1 ⇔ f +, f − ∈ L 1 ⇔ | f | ∈ L 1; f ∈ L 1 ⇒ | f | < ∞ m-a.e.; | f | ≤ g m-a.e. and g ∈ L 1 ⇒ f ∈ L 1 . m(X ) < ∞ and | f | ≤ M m-a.e. for some M ⇒ f ∈ L 1 ; f ∈ L 1 ⇒ f ϕ E ∈ L 1 for every E ∈ S, where f dm = f ϕ E dm. E
X
Proof 1. The equivalence f ∈ L 1 ⇔ f + , f − ∈ L 1 descends directly from the definition of summability. Let us show f + , f − ∈ L 1 ⇒ | f | ∈ L 1 . Since | f | = ( f + + f − ), + | f | dm = f dm + f − dm ∈ [0, ∞). X
X
X
Now to | f | ∈ L 1 ⇒ f + , f − ∈ L 1 . As f + , f − ≤ | f |, f
+
dm ≤
| f | dm ∈ [0, ∞) and
X
f
X
−
dm ≤
X
| f | dm ∈ [0, ∞). X
2. Because
f dm = X
f
+
f − dm ∈ (−∞, ∞),
dm −
X
X
Theorem 11.2.4 gives f + ∈ [0, ∞) and
f − ∈ [0, ∞)
m-almost everywhere, and hence | f | = ( f + + f − ) ∈ [0, ∞) m-a.e.
3. Since | f | = ( f + + f − ) ≤ g m-a.e. on X ⇔ f + ≤ g and f − ≤ g m-a.e. on X, by Theorem 11.2.5 f
+
dm ≤
X
g dm ∈ [0, ∞) and
f
X
−
dm ≤
X
g dm ∈ [0, ∞), X
and then X f dm ∈ (−∞, ∞). + − 4. From | f | = f + + f − we know
+ f −≤ M and f ≤ M m-a.e. Theorem 11.2.4 + − says f , f ∈ L 1 , so f = f − f ∈ L 1 . 5. As f ∈ L 1 , and
f + · ϕ E dm − f − · ϕ E dm X X + − = f dm − f dm E E = f dm E ≤ f dm ∈ (−∞, ∞);
f · ϕ E dm = X
X
we have f ϕ E ∈ L 1 . Recalling Remark 11.2.1 we obtain
f dm = E
E
=
f + dm −
f − dm
E
f
+
f − dm |E =
dm |E −
X
X
f dm |E .
(11.3.1)
X
Let us list integral properties of summable functions. Proposition 11.3.2 (Properties of the integral of L 1 functions) For any f , g ∈ M (X, S) and α ∈ R, 1. if f ∈ L 1 , then
≤ f dm | f | dm; X
X
2. if f = g m-a.e. and f ∈ L 1 , then g ∈ L 1 and f dm = g dm; X
X
3. if f ∈ L 1 , then α · f ∈ L 1 and α · f dm = α f dm; X
(11.3.2)
X
4. if f, g ∈ L 1 , then f + g ∈ L 1 and
f dm + g dm. f + g dm = X
X
X
Proof 1. Since f ∈ L 1 ⇔ | f | ∈ L 1 , then + − = f dm f dm − f dm X X X ≤ f + dm + f − dm X X = | f | dm. X
2. As f = g m-a.e., then m-almost everywhere we have f + = g+
f − = g− .
and
Moreover f ∈ L 1 , so f + , f − ∈ L 1 and by Theorem 11.2.5
f + dm = X
g + dm ∈ [0, ∞) and X
f − dm = X
Hence g = (g + − g − ) ∈ L 1 and
g − dm ∈ [0, ∞). X
f dm =
X
g dm. X
3. For α = 0 obviously α f ∈ L 1 , and (11.3.2) holds. For α > 0 we have (α f )+ = α f + , (α f )− = α f − and so
α f + dm − α f − dm X X + − f dm − f dm =α X X f dm ∈ (−∞, ∞). =α
α f dm = X
X
The case α < 0 is completely similar. 4. As f, g ∈ L 1 , by part 1. in Proposition 11.3.1 | f |, |g| ∈ L 1 . But | f + g| ≤ | f | + |g|, so by linearity and monotonicity | f + g| dm ≤ | f | dm + |g| dm ∈ [0, ∞). X
X
X
That same part 1. gives ( f + g) ∈ L 1 . Now, f, g ∈ L 1 implies f + , f − , g + , g − ∈ L 1 , and the latter are all m-almost everywhere finite, non-negative and measurable by part 2. in Proposition 11.3.1. Hence Proposition 11.2.2, 3., ensures
( f + g)+ dm + X
X
⇒
f − dm + g − dm = f + dm + g + dm + ( f + g)− dm X X X X
+ − f + g dm = f dm − f dm X X X + g + dm − g − dm X X = f dm + g dm. X
X
Remark 11.3.1 In contrast to what happened in M + (X, S) (Theorem 11.2.3), if f ∈ L 1 is such that f dm = 0, X
it may not vanish m-almost everywhere. Since f dm = f + dm − f − dm, X
X
X
the integral on X can vanish when f + dm = 0 = f − dm X
X
i.e. | f | dm = 0.
(11.3.3)
X
In this case f + = 0 = f − m-a.e. ⇒ f = 0 m-a.e. But + f dm = f − dm ∈ (0, ∞) X
X
does not necessarily imply f = 0 m-a.e. The only case, apart from that of (11.3.3), in which zero integral implies the map is m-a.e. zero, is encapsulated in the next result. It is useful to remember that for L 1 functions the implication f = 0 m-a.e. on X ⇒ X f dm = 0 remains valid. ♣ Proposition 11.3.3 (Vanishing integral of L 1 functions) Suppose f ∈ L 1 is such that f dm = 0 E
for every E ∈ S. Then f = 0 m-almost everywhere. Proof Write
X + = x ∈ X : f (x) ≥ 0 . For any E ∈ S
f + dm =
E
E∩X +
f dm = 0,
so Theorem 11.2.3 gives f + = 0 m-a.e. Similarly, using
X − = x ∈ X : f (x) < 0 one shows f − = 0 m-a.e. Now consider
A = x ∈ X : f + (x) > 0 and B = x ∈ X : − f − (x) < 0 . Clearly m(A) = m(B) = 0, and
x ∈ X : f (x) = 0 = A ∪ B and A ∩ B = ∅,
so m x ∈ X : f (x) = 0 = 0, i.e. f = 0 m-a.e.
The Lebesgue integral is finitely and countably additive in the integration domain, in the following sense. Proposition 11.3.4 (Finite and complete additivity of the integral with respect to the integration domain) If f ∈ L 1 1. for any A, B ∈ S such that A ∩ B = ∅, f dm = f dm + f dm; A∪B
A
B
2. if {E i }i≥1 ⊆ S is a pairwise-disjoint sequence, then the integrals over E i form an absolutely convergent series: ∞ i=1
Ei
f dm < ∞.
∞ Setting E = i=1 E i ⊆ X , the integral of f is completely additive in the integration domain:
∞ i=1
f dm =
f dm.
Ei
E
Proof 1. As A, B are disjoint,
f ϕ A + ϕ B dm = f ϕ A + f ϕ B dm X X = f ϕ A dm + f ϕ B dm = f dm + f dm.
f dm = A∪B
X
X
A
B
2. Item 1. in Proposition 11.3.2 says that for every i ≥ 1 f dm ≤ | f | dm, Ei
Ei
and | f | ∈ M + (X, S). Hence the set function μ : S → [0, ∞], μ(E) = | f | dm, ∀E ∈ S E
is a measure with density | f | with respect to m, which is completely additive (Proposition 11.2.5). Therefore ∞ i=1
μ(E i ) =
∞ i=1
| f | dm = Ei
| f | dm = μ(E) < ∞. E
But since ∞ i=1
Ei
∞ f dm ≤ | f | dm = | f | dm < ∞, Ei
i=1
E
the series converges absolutely. Consequently it converges, and then the sequence Sn =
n i=1
f dm Ei
converges, meaning f dm = lim E
n
n→∞
i=1
f dm Ei
=
∞ i=1
f dm. Ei
Here is a result used often, that allows to swap limits and integrals of L 1 functions. Theorem 11.3.1 (Lebesgue’s dominated convergence theorem - 1908) If a sequence { f i }i≥1 ⊆ M (X, S) converges to f = lim f i m-almost everywhere, and there exists i→∞
a map g ∈ L 1 such that
| f i | ≤ g m-a.e. (so-called Lebesgue condition) for every i ∈ N, then: 1. f, f i ∈ L 1 , ∀i ∈ N; 2. f dm = lim f i dm; i→∞ X X 3. lim | f i − f | dm = 0. i→∞ X
Proof Without loss of generality we may assume that everywhere on X lim f i = f
i→∞
and
| f i | ≤ g for every i ∈ N.
By Theorem 10.2.4 f ∈ M (X, S), and since ∀i ∈ N | f i | ≤ g, then for the limit f we also have | f | ≤ g. As g ∈ L 1 , by 3. in Proposition 11.3.1 every f i and f are L 1 . This ends claim 1.. For number 2. pick non-negative measurable functions {g + f i }i≥1 . Using Fatou’s lemma and the integral’s linearity we have
311
(g + f ) dm ≤ lim inf (g + f i ) dm i→∞ X X g dm + f i dm = lim inf i→∞ X X g dm + lim inf f i dm, =
f dm = X
i→∞
X
X
and so
f dm ≤ lim inf
f i dm.
i→∞
X
(11.3.4)
X
Now take the non-negative, measurable maps {g − f i }i≥1 . Using again Fatou’s lemma we obtain g dm − f dm = (g − f ) dm ≤ lim inf (g − f i ) dm i→∞ X X X X g dm − f i dm = lim inf i→∞ X X g dm − lim sup f i dm, = i→∞
X
X
thus
f dm ≥ lim sup
f i dm.
i→∞
X
Then (11.3.4) and (11.3.5) give
f dm = lim
X
(11.3.5)
X
i→∞
f i dm. X
Now 3.. For every i ∈ N | f i − f | ≤ 2g, so the sequence {| f i − f |}i≥1 , that converges to 0, is dominated by 2g. By dominated convergence we obtain lim | f i − f | dm = lim | f i − f | dm = 0. i→∞
X
X i→∞
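An illustrative numerical aside (not part of the original text) on the role of the Lebesgue condition in the theorem just proved: with the dominating map $g \equiv 1 \in L^1([0,1])$ the integrals of $f_n(x) = x^n$ converge to the integral of the limit function $0$, while for $f_n = n\varphi_{(0,1/n)}$, which admits no summable dominating function, the integrals stay equal to $1$ even though $f_n \to 0$ pointwise. Grid averages are used here as crude stand-ins for the Lebesgue integral on $[0,1]$.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 2_000_001)

def integral(values: np.ndarray) -> float:
    """Grid average: a crude numerical stand-in for the Lebesgue integral on [0,1]."""
    return float(values.mean())

# Dominated case: |f_n| <= g = 1 in L^1, f_n -> 0 a.e., so the integrals go to 0.
for n in (1, 10, 100, 1000):
    print("x^n          :", n, integral(x ** n))      # ~ 1/(n+1) -> 0

# No summable dominating map: f_n = n * phi_(0,1/n) -> 0 pointwise, yet each integral is 1.
for n in (10, 100, 1000):
    f_n = np.where((x > 0) & (x < 1.0 / n), float(n), 0.0)
    print("n*phi_(0,1/n):", n, integral(f_n))         # stays ~ 1: limit and integral differ
```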
Remark 11.3.2 In the above proof we simplified a little when we assumed { f i }i≥1 converged everywhere on X , and said all inequalities | f i | ≤ g held at all points of X . We shall explain why this does not affect the general statement’s proof. Call E the subset of X where convergence fails and some | f i | ≤ g does not hold. As m(E) = 0, we may redefine f i and f to be zero on E. Then f i ϕ(X \E) , f ϕ(X \E) ∈ M (X, S) for all i ∈ N, and f i ϕ(X \E) = f i and f ϕ(X \E) = f m-a.e. Theorem 10.2.6 implies f i , f ∈ M (X, S). Similarly, f i ϕ(X \E) , f ϕ(X \E) ∈ L 1 , and f i ϕ(X \E) = f i and f ϕ(X \E) = f m-a.e. Part 2. in Proposition 11.3.2 guarantees f i , f ∈ L 1 and
f i ϕ(X \E) dm =
X
f ϕ(X \E) dm =
X
(X \E)
(X \E)
f i dm =
f i dm, ∀i ∈ N
(11.3.6)
f dm,
(11.3.7)
X
f dm =
X
given that
f dm = E
f i dm = 0, ∀i ∈ N. E
Therefore the new statement of the theorem becomes f dm = lim f i dm, i→∞
X \E
X \E
which is indeed the same as the original because of (11.3.6), (11.3.7).
♣
Just as with monotone convergence, Lebesgue’s theorem has two corollaries: dominated convergence for series and bounded convergence in spaces with finite measure. Corollary 11.3.1 (Dominated convergence theorem for series) If { f i }i≥1 ⊆ L 1 is ∞ ∞ | f | dm < ∞, then f = such that i=1 i i=1 f i is well defined m-a.e., f ∈ L 1 X and
∞ i=1
f i dm = X
f dm. X
Proof Clearly {| f i |}i≥1 ⊆ M + (X, S), so 2. in Proposition 11.2.4 implies ∞ X
i=1
∞ | f i | dm = | f i | dm < ∞, i=1
X
∞ ∞ whence i=1 | f i | ∈ L 1 . Theorem 11.2.4 says i=1 | f i | is bounded m-a.e., hence ∞ i=1 f i converges absolutely and, in particular, it converges pointwise m-a.e. to f .
Furthermore
∞ X
fi
dm ≤
∞ X
i=1
| f i | dm < ∞,
i=1
n
∞ so f = n→∞ i=1 f i = i=1 f i ∈ L 1 . Since { f i }i≥1 ⊆ L 1 , by item 4. in lim n Proposition 11.3.2 i=1 f i ∈ L 1 for every n ∈ N. Moreover, ∞ n ≤ f | f i | ∀n ∈ N m-a.e. i i=1
∞ ∞ and | f | = f i ≤ | f i | m-a.e. .
i=1
i=1
i=1
By dominated convergence ∞ i=1
f i dm = lim
n→∞
X
=
n
n
X n→∞ i=1
n
n→∞
X
i=1
lim
f i dm = lim f i dm =
f i dm
X i=1
∞
f i dm =
X i=1
f dm. X
Corollary 11.3.2 (Bounded convergence) Suppose (X, S, m) is a finite-measure space (m(X ) < ∞) and take { f n }n≥1 ⊆ M (X, S). Suppose there exists an M ∈ (0, ∞) such that | f n | ≤ M m-a.e. for every n ∈ N and limn→∞ f n = f m-almost everywhere. Then f and every f n belong to L 1 , and f dm = lim f n dm. n→∞
X
Proof Here g = M, so
X
g dm =
X
M dm = Mm(X ) < ∞, X
and g ∈ L 1 . By dominated convergence f dm = lim f n dm. X
n→∞
X
In 1907 Giuseppe Vitali had written a paper2 proving when limits and integrals can be swapped. Before we prove Vitali’s theorem we need to introduce certain notions. Remark 11.3.3 Let us point out that in contrast to non-negative measurable functions, when we deal with L 1 maps the set function 2 Sull’integrazione
per serie, Rend. Circ. Mat. Palermo 23 (1907), 137–155.
μ(E) =
f dm ∈ (−∞, ∞),
E ∈S
E
is not a measure. But if f ∈ M (X, S) then μ¯ : S → [0, ∞], | f | dm ∈ [0, ∞], E ∈ S μ(E) ¯ = E
is a measure. It is the measure with density | f | with respect to m. Clearly if f ∈ L 1 , then μ¯ is finite. ♣ Now we want to examine the relationships among several measures on the same σ-algebra. Definition 11.3.2 (Absolutely continuous measures) Let (X, S) be a measurable space and m, μ two measures defined on the σ-algebra S. One says μ is absolutely continuous with respect to m, written μ m, if m(E) = 0 ⇒ μ(E) = 0 for every E ∈ S. ♦ In practice a measure μ is absolutely continuous with respect to another measure m if among its zero-measure sets are all those of m. The prototypical example of finite and absolutely continuous measure with respect to a given m is the integral of a non-negative summable function, which is absolutely continuous. Proposition 11.3.5 (Equivalent criteria for summability) If f ∈ L 1 , the following conditions are equivalent: 1. the measure μ : S → [0, ∞),
| f | dm, E ∈ S
μ(E) = E
is absolutely continuous with respect to m: ∀E ∈ S, m(E) = 0 ⇒ μ(E) = 0; 2. for each > 0 there exists δ > 0 such that μ(E) = | f | dm < E
for any E ∈ S with m(E) < δ. Proof 1.: given μ m, suppose by contradiction there was an > 0 and an Fn ∈ S, for every n ∈ N, such that m(Fn ) < 1/2n and μ(Fn ) = | f | dm ≥ . Fn
The sets E n =
∞ i=n
Fi form a decreasing sequence in S. Calling E=
∞
E n so E ⊆ E n , ∀n ∈ N,
n=1
by complete subadditivity m(E n ) ≤
∞
m(Fi )
0 there exists δ > 0 such that, for every E ∈ S with m(E) < δ, ♦ | f | dm < ∀ f ∈ F . E
Proposition 11.3.6 (Sequences of uniformly summable maps) Let (X, S, m) be a ¯ n≥1 a sequence of uniformly finite-measure space (m(X ) < ∞) and { f n : X → R} summable maps that converge pointwise m-a.e.. Then the limit f : X → R is summable on X . Proof We need m(X ) < ∞ to invoke Proposition 9.3.5, so that X can be approx k imated by a finite disjoint union E = i=1 E i of measurable sets, up to a set of arbitrarily small measure. Choosing = 1, by uniform summability the pairwisedisjoint sets E i and the leftover (X E) can be selected so that all of them have measure less than δ. Hence for every n ∈ N we have | f n | dm ≤ X
k
| f n | dm + Ei
i=1
(X E)
| f n | dm < k · + = (k + 1),
and applying Fatou’s lemma | f | dm = lim inf | f n | dm ≤ lim inf | f n | dm < (k + 1), X
X
n→∞
whence f is summable on X .
n→∞
X
At this point we are ready to prove the result guaranteeing when, for summable functions, limit and integral can be swapped. Theorem 11.3.2 (Vitali’s convergence theorem - 1907) Let (X, S, m) be a finite¯ n≥1 a sequence of uniformly summable maps that measure space and { f n : X → R} converges to f : X → R pointwise m-a.e. Then f is summable on X and lim f n dm = lim f n dm. X n→∞
n→∞
X
Proof By Proposition 11.3.6 f is summable on X , and further m-a.e. bounded because of Theorem 11.2.4. Part 1. in Proposition 11.3.2 implies f n dm − f dm = ( f n − f ) dm X X X ≤ | f n − f | dm X
for every n ∈ N. Then the uniform summability of { f n }n≥1 guarantees that given > 0 there is a δ > 0 such that | f n − f | dm = | f n − f | dm + | f n − f | dm X X \E E | f n − f | dm+ (11.3.9) ≤ X \E | f n | dm + | f | dm (11.3.10) E E 0 there is a δ > 0 such that | f n | dm ≤ g dm < E
E
for every E ∈ S, m(E) < δ, and n ∈ N. Immediately, then, the f n are uniformly summable and converge to the pointwise limit f m-a.e. Under these conditions Vitali’s theorem holds, since we are assuming m(X ) < ∞. ♣ Definition 11.3.4 (Tight families of functions) Given a measure space (X, S, m),3 ¯ is tight on X if for any > 0 there a family F of summable functions f : X → R exists a set K ∈ S, with m(K ) < ∞, such that | f | dm < ∀ f ∈ F . ♦ X \K
Proposition 11.3.7 (Uniformly summable and tight sequences of functions) Let ¯ n≥1 a uniformly summable and (X, S, m) be a measure space4 and { f n : X → R} tight sequence of maps. If { f n } converges pointwise to f : X → R m-a.e., then f is summable on X . Proof Fix > 0, than by assumption there is a set K ∈ S, with m(K ) < ∞, such that 3 Here 4 Here
X is allowed to have infinite measure. X is allowed to have infinite measure.
X \K
| f n |dm <
(11.3.11)
for every n ∈ N. Then take = 1. As m(K ) < ∞ and the { f n }n≥1 are uniformly summable, we apply Proposition 11.3.6 to the finite-measure set K , so | f |dm < (k + 1). K
Using Fatou’s lemma on (11.3.11) produces | f | dm = lim inf | f n | dm ≤ lim inf X \K
X \K
n→∞
n→∞
X \K
| f n | dm < 1.
Putting these together gives | f | dm = | f | dm + | f | dm < 1 + k + 1 = (k + 2), X
(X \K )
K
and the claim is proved.
With the help of Proposition 11.3.7 Vitali’s theorem extends to spaces of infinite measure provided we assume the sequence is tight. Theorem 11.3.3 (Extension of Vitali’s convergence theorem - 1907) Let (X, S, m) ¯ n≥1 a tight sequence of uniformly summable be a measure space and { f n : X → R} maps. If { f n } converges pointwise to f : X → R m-a.e., then f is summable on X and lim f n dm = lim f n dm. X n→∞
n→∞
X
Proof By Proposition 11.3.7 f ∈ L 1 , and by Theorem 11.2.4 f is bounded m-a.e. Part 1. in Proposition 11.3.2 says that for every n ∈ N = f dm − f dm ( f − f ) dm n n X X X ≤ | f n − f | dm. X
As the sequence is tight, given > 0 there is K ∈ S with m(K ) < ∞, such that | f n − f | dm = | f n − f | dm + | f n − f | dm (11.3.12) X
X \K
K
for every n ∈ N. Above, the first integral on the right is smaller than /4. Moreover, since m(K ) < ∞ and { f n }n≥1 is uniformly summable, we can apply Vitali’s theorem (the original version, Theorem 11.3.2) to the second integral, obtaining for every n≥k | f n − f | dm ≤ | f n − f | dm + | f n | dm + | f | dm K \E
K
E
E
m(K \ E) 3 < · + + < . 4 m(K ) 4 4 4 Putting all this together, f n dm − f dm ≤ X
X
X \K
| f n − f | dm +
| f n − f | dm K
m(K \ E) + + < . < + · 4 4 m(K ) 4 4
Remark 11.3.5 • Removing tightness in the case m(X ) = ∞ invalidates the extended version of Vitali’s convergence theorem. Consider the sequence of general term f n = ϕ[n,n+1] , so clearly lim f n = f = 0.
n→∞
The f n are uniformly summable, because for every E ∈ S(R+ ), m(E) < δ, we have
f n dm = ϕ[n,n+1] dm = m [n, n + 1] ∩ E ≤ m(E), E
E
for every n ∈ N. Yet the theorem’s conclusion is false, since for every n ∈ N
f n dm = ϕ[n,n+1] dm = m [n, n + 1] ∩ R+ = 1 R+ R+ ⇒ lim f n dm = 1;
n→∞ R+
whilst
lim f n dm =
R+ n→∞
R+
f dm = 0.
• Let us show that the dominated convergence theorem is a consequence of Vitali’s convergence result also in case m(X ) = ∞. Let { f n }n≥1 be a sequence of measurable functions converging pointwise m-a.e. to a measurable map f , and suppose g ∈ L 1 obeys | f n | ≤ g m-a.e., for every n ∈ N. The sequence is uniformly summable: g ∈ L 1 implies, by Proposition 11.3.5, that for every > 0 there is a δ > 0 such that, for every E ∈ S, m(E) < δ, | f n | dm ≤ g dm < E
E
for every n ∈ N. It is also tight, because for every > 0 exists a set K ∈ S, with m(K ) < ∞ such that | f n | dm ≤ g dm < , X \K
X \K
for every n ∈ N. Since { f n }n≥1 converges pointwise m-a.e. to f , the extended Vitali theorem holds. ♣ Remark 11.3.6 We remind that the proofs in this chapter hold under the premise the measure space (X, S, m) is complete and has σ-finite measure, except when we said otherwise. We have seen, with Proposition 9.6.1, that also the Lebesgue–
Stieltjes space R, S(R), m F is complete and with σ-finite measure. In particular,
the Lebesgue–Stieltjes measure of [0, 1], S([0, 1]), m F is totally finite. Let us call, for f ∈ L 1 , Lebesgue–Stieltjes integral the expression f d F, X
where X ∈ S(R) and dm F = d F. From this we deduce a useful observation, namely that the Lebesgue–Stieltjes integral enjoys all the nice properties of the Lebesgue integral if the measure space is complete and the measure is σ-finite. Under these conditions, therefore, the Lebesgue–Stieltjes integral satisfies Beppo Levi’s theorem, Fatou’s lemma and Lebesgue’s dominated convergence theorem, among others. ♣
Chapter 12
Comparing Notions of Integral
12.1 Comparing the Riemann and Lebesgue Integrals

This section is devoted to showing that the Lebesgue integral is an extension of the Riemann integral. Fix $X = \mathbb{R}$, $S = \mathcal{L}$ (the σ-algebra of Lebesgue measurable sets) and $m = \ell$ (the Lebesgue measure on $\mathbb{R}$). We indicate by $L^1[a, b]$ and $R[a, b]$ the classes of Lebesgue summable and of Riemann integrable bounded maps on the compact domain $[a, b]$, respectively. The first thing to notice is that there exist bounded functions $f : [a, b] \to \mathbb{R}$ that are not Riemann integrable but are Lebesgue summable. Indeed the bounded map $f : [a, b] \to \mathbb{R}$,
$$f(x) = \begin{cases} 1 & \text{if } x \in [a, b] \cap \mathbb{Q} \\ 2 & \text{if } x \in [a, b] \setminus \mathbb{Q} \end{cases}$$
is Dirichlet-like, and hence not Riemann integrable; but it is summable, with Lebesgue integral equal to $2(b - a)$. Hence, when we compare the two notions of integral, it suffices to show that any Riemann integrable bounded map $f : [a, b] \to \mathbb{R}$ is Lebesgue summable and that the integrals coincide, i.e.
$$\int_a^b f(x)\,dx = \int_{[a,b]} f(x)\,\ell(dx)$$
(the integral on the left is à la Riemann). To prove the aforementioned inclusion we must explain which quantities are at stake. For that, given $I = [a, b]$ we define
• $P_n = \{a = x_0 < x_1 < \cdots < x_{j_n} = b\}$, the $n$th partition of $I$ into $j_n$ parts, and $\mathcal{P}$ the family of all possible partitions of $I$;
• Ikn = [xkn −1 , xkn ] the kn th interval of Pn , kn = 1, . . . , jn ; • |Pn | = maxkn =1,..., jn m pj (Ikn ) the norm, or mesh, of Pn ; • m Ikn = inf x∈Ikn f (x), M Ikn = supx∈Ikn f (x), for a bounded f : [a, b] → R; j • s( f, Pn ) = knn =1 m Ikn m pj (Ikn ) lower sum (over Pn ); j • S( f, Pn ) = knn =1 M Ikn m pj (Ikn ) upper sum (over Pn ); • s( f ) = sup Pn ∈P s( f, Pn ) lower (Riemann) integral of f : [a, b] → R; • S( f ) = inf Pn ∈P S( f, Pn ) upper (Riemann) integral of f : [a, b] → R; b f (x)d x = S( f ) Riemann integral on I of the bounded map f , in • s( f ) = a
which case f is said to be Riemann integrable; • R(I ) the collection of Riemann integrable, bounded maps on I . So now we have to prove R(I ) ⊆ L 1 (I ). Theorem 12.1.1 (Summability of Riemann integrable maps) If f : [a, b] → R belongs to R([a, b]) then f ∈ L 1 ([a, b]) and b f (x) (d x) = f (x)d x. [a,b]
a
Proof Consider an increasing sequence {Pn }n≥1 of partitions of [a, b] such that max m pj (Ikn ) = 0. lim |Pn | = lim n→∞
n→∞
kn =1,..., jn
For every n ∈ N, moreover, set Pn = a = x0 < x1 < x2 < · · · < x jn −1 < x jn = b and Ikn = [xkn −1 , xkn ], M Ikn = sup f (x), m Ikn = inf f (x) for kn = 1, . . . , jn . x∈Ikn
x∈Ikn
Now, for any n ∈ N we can define the following step functions with respect to partition Pn of [a, b] sn (x) =
jn
m Ikn ϕ Ikn (x) and tn (x) =
kn =1
jn
M Ikn ϕ Ikn (x).
kn =1
The former is a lower bound for f (x), the latter an upper bound. Hence {sn }n≥1 is increasing, {tn }n≥1 decreasing, and for every n ∈ N s n ≤ f ≤ tn .
12.1 Comparing the Riemann and Lebesgue Integrals
325
We also have, for every n ∈ N,
b
sn (x)d x =
a
jn
m Ikn m pj (Ikn ) = s( f, Pn )
kn =1 b
tn (x)d x =
a
jn
M Ikn m pj (Ikn ) = S( f, Pn ),
kn =1
and then
b
lim
n→∞ a
n→∞
b
lim
n→∞ a
But f ∈ R(I ), so lim
n→∞ a
b
sn (x)d x = lim s( f, Pn ) = s( f ) tn (x)d x = lim S( f, Pn ) = S( f ). n→∞
sn (x)d x =
b
f (x)d x = lim
b
n→∞ a
a
tn (x)d x.
(12.1.1)
Observe that m pj (Ikn ) = (Ikn ) for every kn = 1, . . . , jn , n ∈ N. Besides, {sn }, {tn } ⊆ M0 ([a, b], L ) are bounded in [a, b]. Therefore the Riemann integrals on the far right and far left of (12.1.1) coincide with the Lebesgue integrals. Calling lim sn = s, lim tn = t,
n→∞
n→∞
by Corollary 11.3.2 s, sn , t, tn ∈ L 1 [a, b], L for every n ∈ N, and lim
n→∞ a
b
sn (x)d x =
[a,b]
s(x) (d x),
Formula (12.1.1) implies s(x) (d x) = [a,b]
a
b
b
lim
n→∞ a
tn (x)d x =
t (x) (d x).
f (x)d x =
[a,b]
t (x) (d x).
Since (t − s) ∈ M [a, b], L and (t − s)(x) (d x) = 0, [a,b]
[a,b]
(12.1.2)
Theorem 11.2.3 guarantees t = s -a.e. on [a, b], and consequently s = f = t -a.e. on [a, b]. By Theorem 10.2.6 f ∈ M [a, b], L , so part 2. of Proposition 11.3.2 gives f ∈ L 1 [a, b], L and s(x) (d x) = f (x) (d x) = t (x) (d x). (12.1.3) [a,b]
[a,b]
[a,b]
Eventually, (12.1.2) and (12.1.3) allow to conclude.
Remark 12.1.1 Using Theorem 10.2.6, item 2. in Proposition 11.3.2 and Theorem 12.1.1 we draw the following conclusion: “Let g : [a, b] → R be Riemann integrable and bounded, and suppose f : [a, b] → R coincides with g -almost everywhere on [a, b]. Then f (x) is Lebesgue summable, and b g(x)d x = g(x)(d x) = f (x)(d x)." a
[a,b]
[a,b]
Here is an example for this situation: take f : [0, 1] → R to be ⎧ ⎪ x 2 if x ∈ [0, 1] \ Q ⎪ ⎪ ⎨
1 f (x) = if x ∈ [0, 1] ∩ Q \ {0} . ⎪ x ⎪ ⎪ ⎩ 1 if x = 0 The function g : [0, 1] → R, g(x) = x 2 is Riemann integrable and equals f almost everywhere on [0, 1]. Hence f is Lebesgue summable and 1 1 f (x) (d x) = g(x)d x = . ♣ 3 0 [0,1]
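An illustrative aside (not part of the original text) making Remark 12.1.1 concrete in code: the rationals in $[0,1]$ can be covered by intervals of total length at most $\varepsilon$ for every $\varepsilon > 0$, so they form an $\ell$-null set, and consequently the Lebesgue integral of $f$ above coincides with that of $g(x) = x^2$, namely $1/3$. The covering scheme and the grid approximation of the integral are choices of the sketch.

```python
import numpy as np

def cover_length(eps: float, terms: int = 60) -> float:
    """Total length of the standard cover of a countable set {q_1, q_2, ...}
    by intervals of length eps/2^k around q_k: it never exceeds eps."""
    return sum(eps / 2 ** k for k in range(1, terms + 1))

for eps in (1.0, 0.1, 1e-6):
    print(eps, cover_length(eps))       # <= eps, so the rationals in [0,1] are a null set

# Since f = g = x^2 off an l-null set, their Lebesgue integrals agree:
x = np.linspace(0.0, 1.0, 1_000_001)
print((x ** 2).mean())                  # ~ 0.3333 = 1/3
```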
12.2 Completeness of L 1 ([a, b]) We have seen that if f, g are summable and α a real number, then f + g and α f
are summable (Proposition 11.3.2, 3.-4.). This makes L 1 [a, b], L , a real vector space, on which we set f 1 = | f |d ∀ f ∈ L 1 ([a, b]). [a,b]
This is a semi-norm in that it fulfils the following two properties on a norm (out of three): homogeneity
α f 1 = α f 1 ∀ f ∈ L 1 ([a, b]), α ∈ R and the triangle inequality f + g1 ≤ f 1 + g1 , ∀ f, g ∈ L 1 ([a, b]). It is not definite in general, though (see Theorem 11.2.3). To speak about a norm we need to identify -almost everywhere equal maps, i.e. put on L 1 ([a, b]) the equivalence relation f ∼ g iff f = g -a.e. on [a, b]. The ensuing quotient space of L 1 ([a, b]) will become a normed vector space. Rephrasing, we may write f ∼g⇔ | f − g|d = 0. [a,b]
The relation ∼ is reflexive, symmetric and transitive and hence an equivalence relation. L 1 ([a, b]) is thus partitioned in equivalence classes, i.e. pairwise-disjoint sets of functions such that for every f, g in one coset [a,b] | f − g|d = 0. Let us write [ f ] for the equivalence class of f and consider the quotient L 1 ([a, b])/ ∼, whose elements are the equivalence classes. It is a vector space, for [ f ] + [g] = [ f + g] α[ f ] = [α f ]. The coset [0] consists of the zero map and functions whose absolute value has vanishing integral. This equivalence class is the neutral element of the vector space [ f ] + [0] = [ f ]. On the vector space L 1 ([a, b]) we define the integral norm 0 ≤ [ f ]1 = f 1 = and [ f ]1 = f 1 = Furthermore,
[a,b]
| f |d < ∞, ∀ f ∈ L 1 ([a, b])
[a,b]
| f |d = 0 ⇔ f = 0 -a.e. on[a, b].
α f 1 =
[a,b]
|α f |d = |α| f 1 , ∀α ∈ R and f ∈ L 1 [a, b]
f + g1 = ≤
[a,b]
[a,b]
| f + g|d | f |d +
[a,b]
|g|d
= f 1 + g1 , ∀ f and g ∈ L 1 [a, b] . All this is coherent since if we take another representative g of [ f ], by the triangle inequality g1 ≤ g − f 1 + f 1 and f 1 ≤ f − g1 + g1 ; as f ∼ g, we have d(g, f ) = g − f 1 = 0 and then f 1 = g1 . In fact the integral norm on L 1 ([a, b]) stays the same if we modify functions on zero-measure sets. The quotient L 1 ([a, b])/ ∼ with integral norm · 1 is a normed vector space and we indicate it by
L 1 ([a, b]), · 1 .
Now we show R([a, b]), · 1 is incomplete, and L 1 ([a, b]), · 1 is actually its completion. For this we use an example generalising the Cantor set. Example 12.2.1 (Generalised Cantor set) Consider [0, 1] and fix a number α ∈ (0, 1/3). Step 1: delete the middle open interval of length α 1−α 1+α ; G α1 = 2 2 and call C1α what remains 1+α 1−α α ∪ , 1 , where m(C1α ) = 1 − α. C1 = 0; 2 2 Step 2: from each of the two intervals of C1α remove the middle open interval of length α2 , and name the remaining set C2α , which now has measure 1 − α − 2α2 . Iterating as above, at step n the set Cnα has measure 1 − [α + 2α + 2 α + · · · + 2 2
2 3
α ]=1−α
n−1 n
n−1
(2α)i .
i=0
∞
The generalised Cantor set is the intersection C α = n=1 Cnα of a decreasing sequence, and C1α has finite measure. Since the Lebesgue measure is continuous from above,
n−1 m(C α ) = lim m(Cnα ) = lim 1 − α (2α)i n→∞
n→∞
i=0
∞ =1−α (2α)i = 1 − i=0
1 − 3α α = . 1 − 2α 1 − 2α
(12.2.1)
For α = 1/3 we recover the ‘classical’ Cantor set, which has measure zero. If we choose α ∈ 0, 1/3 the generalised Cantor set has measure m(C α ) > 0.
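A hedged Python sketch (not part of the original text) of Example 12.2.1: the measure of the generalised Cantor set is computed step by step, removing $2^{n-1}$ middle intervals of length $\alpha^n$ at step $n$, and compared with the closed form $(1-3\alpha)/(1-2\alpha)$ of (12.2.1). The number of iterations is an arbitrary choice of the illustration.

```python
def cantor_measure(alpha: float, steps: int = 60) -> float:
    """Lebesgue measure of C^alpha, approximated by subtracting, at step n,
    the 2^(n-1) removed middle intervals of length alpha^n."""
    m = 1.0
    for n in range(1, steps + 1):
        m -= 2 ** (n - 1) * alpha ** n
    return m

for alpha in (1 / 3, 0.3, 0.25, 0.1):
    closed_form = (1 - 3 * alpha) / (1 - 2 * alpha)   # formula (12.2.1)
    print(alpha, cantor_measure(alpha), closed_form)   # 0, 0.25, 0.5, 0.875 respectively
```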
Proposition 12.2.1 (Incompleteness of R([a, b]), · 1 ) The vector space of bounded, Riemann integrable maps on an interval [a, b], equipped with norm f (x)1 = | f |d, [a,b]
is not complete. Proof Consider f : [0, 1] → R defined as f = ϕC α . It is the characteristic function of the generalised Cantor set, and as such it is discontinuous at the frontier of the set. Since its range does not contain intervals of any (positive) length, C α has no interior point. All points in C α lie on the frontier, so C α is perfect for it coincides with its derived set. Therefore the characteristic function ϕC α is discontinuous everywhere on C α , which has positive measure. It follows that the discontinuity set is not enclosable, i.e. not PJ measurable, making f not Riemann integrable. On the other hand, ϕC α is Lebesgue integrable and its integral equals the Lebesgue measure of C α . Different story if we consider the sets Cnα , n ≥ 1. For any such consider f n : [0, 1] → R, f n (x) = ϕCnα (x). Each f n is Riemann integrable, because Cnα is the finite union of closed intervals, and the frontier consists of finitely many points. Hence f n is discontinuous at a finite number of points and therefore Riemann integrable. The elements of {ϕCnα }n≥1 all belong to R [0, 1] (⇒ Riemann integrable), while lim f n = ϕC α
n→∞
is only Lebesgue integrable. Moreover, for every n ≥ 1 | f n | ≤ 1, and the constant map is integrable on [0, 1]. By dominated convergence we have α α lim ϕCn − ϕC 1 = lim |ϕCnα − ϕC α |d = 0. n→∞
n→∞ [a,b]
This means that for any > 0 there exists an n such that
ϕC αp − ϕCqα 1 < for every p, q ≥ n . Hence {ϕCnα }n≥1 is a Cauchy sequence in R([a, b]), but it does not converge in the space. It converges in L 1 ([a, b]), though, and so R([a, b]) is not complete. Next we show · 1 makes L 1 ([a, b]) complete.
Theorem 12.2.1 (Riesz–Fischer theorem—completeness of L 1 ([a, b]), · 1 )
The normed vector space L 1 ([a, b]), · 1 , defined identifying -almost everywhere equal functions, is complete. Proof In order to show (L 1 ([a, b]) is complete we must prove any Cauchy sequence { f n }n≥1 ⊆ L 1 ([a, b]) converges to some map f ∈ L 1 ([a, b]) -almost everywhere on [a, b]. It suffices to show that from every { f n }n≥1 ⊆ L 1 ([a, b]) that converges pointwise -a.e. we can extract a subsequence,1 similarly convergent -a.e. on [a, b], to some f in L 1 ([a, b]). So suppose { f n }n≥1 is Cauchy, so for every > 0 there is an n() such that for every n and m ≥ n() we have f n − f m 1 < . Pick = 1/2, so we have n
1 2
such that for every n 1 , n 2 ≥ n f n 2 − f n 1 1
1 and 0 < k < n, so
(−∞,−1)
|x|k d FX (x) +
(1,∞)
|x|k d FX (x)
0 we have t d FX (t) ≥ x (x,∞)
(x,∞)
d FX (t) = x 1 − FX (x) ≥ 0.
Hence 0 ≤ lim x 1 − F(x) ≤ lim x→∞
x→∞ (x,∞)
t d FX (t) = 0
and so lim x→∞ x 1 − F(x) = 0. For x < 0, we similarly have t d FX (t) ≤ x d FX (t) = x FX (x) ≤ 0, (−∞,x]
(−∞,x]
so 0 ≥ lim x FX (x) ≥ lim x→−∞
x→−∞ (−∞,x]
t d FX (t) = 0
and then lim x→−∞ x FX (x) = 0.
Proposition 14.3.5 (Schwarz inequality) If the random variables X, Y admit moments of order 2, the random variable X Y has first moment, and " E(|X Y |) ≤ E(X )2 E(Y )2 . Proof Define the random variable Z = (X − αY )2 , with α ∈ R, so that E(Z ) = E(X )2 − 2αE(X Y ) + α2 E(Y )2 . As Z ≥ 0, then E(Z ) ≥ 0 for every α ∈ R, and then E(X )2 − 2αE(X Y ) + α2 E(Y )2 ≥ 0. Choose
α=
E(X Y ) , E(Y )2
whence 0 ≤ E(X )2 − 2αE(X Y ) + α2 E(Y )2
E(X Y ) E(X Y ) 2 2 = E(X ) − 2 E(X Y ) + E(Y )2 E(Y )2 E(Y )2 [E(X Y )]2 = E(X )2 − E(Y )2 2 2 ⇔ [E(X Y )] ≤ E(X ) E(Y )2 " ⇔ E(|X Y |) ≤ E(X )2 E(Y )2 .
Proposition 14.3.6 (Markov inequality) If a random variable X with expectation m = E(X ) only assumes positive values, P(X > 0) = 1, then for every α > 0 P(X > α · E(X )) ≤
1 . α
(14.3.6)
Proof If 0 < α ≤ 1 then 1/α ≥ 1 and the inequality is trivial. Suppose α > 1. As X is positive, m= x d F(x) > 0. (0,∞)
Then m=
(0,αm)
x d F(x) +
≥ αm
(αm,∞)
(αm,∞)
x d F(x) ≥
x d F(x) (αm,∞)
d F(x) = αm 1 − F(αm) ,
and so
1 − P(X ≤ α · m) ≤
1 . α
Next up is the Bienaymé–Chebyshev inequality, according to which given an expectation m, a standard deviation σ and a number α > 0, the probability a RV lands between m ± ασ is bigger than 1 − 1/α2 . Hence the larger the interval (read: σ) the more certain the event.
Proposition 14.3.7 (Bienaymé–Chebyshev inequality) Suppose a random variable X has finite second moment, so in particular E(X ) and E(X − m)2 = σ 2 are well defined. Then 1 P |X − m| > α σ ≤ 2 . α Proof Just replace X in Markov’s inequality with |X − m|2 and α > 0 with α2 : 1 1 P |X − m|2 > α2 E |X − m|2 ≤ 2 ⇔ P |X − m| > ασ ≤ 2 . α α
What this means is that for any RV X with E(X ) = m and Var(x) = σ 2 < ∞, and any α > 0, 1 P |X − m| < ασ ≥ 1 − 2 . α The Bienaymé–Chebyshev inequality holds for any α > 0, but only the case α > 1 is useful. When α = 1 the inequality is trivial P |X − m| < ασ ≥ 0, and for α < 1 it states the even more obvious fact that probabilities are bigger than negative numbers. If we take ασ = k > 0, the inequality reads σ2 P |X − m| < k ≥ 1 − 2 . k To finish we shall discuss a few properties of discrete RVs. First of all, the expectation E(·) is a linear operator: E(αX + βY ) = αE(X ) + β E(Y ), for any α, β ∈ R. Proposition 14.3.8 (Linearity of a discrete RV’s expected value) Given random variables X, Y and α, β ∈ R, suppose E(X ), E(Y ) exist finite. Then so does E(αX + βY ) = αE(X ) + β E(Y ). Proof Let us call the events of X, Y respectively Ai = {ω ∈ : X (ω) = xi }, i ∈ I and B j = {ω ∈ : Y (ω) = y j }, j ∈ J , where Ai = = Bj. i∈I
j∈J
On we can define events Ci, j = {ω ∈ : X (ω) = xi , Y (ω) = y j }
where
P(Ci, j ) = P(B j ) and
i∈I
P(Ci, j ) = P(Ai ).
j∈J
Then
E(αX + βY ) =
(αxi + β y j )P(Ci, j )
i∈I, j∈J
=
i∈I, j∈J
⎡
= α⎣
⎛ xi ⎝
i∈I
=α
αxi P(Ci, j ) +
i∈I, j∈J
⎞⎤
β y j P(Ci, j ) ⎡
P(Ci, j )⎠⎦ + β ⎣
j∈J
⎛
xi P(Ai ) + β ⎝
i∈I
yj
j∈J
⎤ P(Ci, j ) ⎦
i∈I
⎞
y j P(B j )⎠ = α
j∈J
xi pi
⎛ +β⎝
i∈I
⎞ yj pj⎠
j∈J
= αE(X ) + β E(Y ).
What we have proved above can naturally be extended by induction to a finite, even ∞ countable, number of RVs, but in the latter case the series of expected values i=1 E(|X i |) should converge. Proposition 14.3.9 (Variance in terms of moments) For any discrete RV X with finite second moment, Var(X ) = E(X 2 ) − [E(X )]2 . Proof 2 Var(X ) = E X − E(X ) = E X 2 − 2X E(X ) + [E(X )]2 =E(X 2 ) − 2E(X E(X )) + E([E(X )]2 ) =
i∈I
i∈I
=
i∈I
P(X =
xi )xi2
−2
P(X = xi ) xi ·
i∈I
⎛' (2 ⎞ P(X = xi ) ⎝ P(X = xi )xi ⎠ i∈I
⎛'
P(X = xi )xi2 − ⎝
i∈I
i∈I
(2 ⎞ P(X = xi )xi
⎠
P(X = xi )xi +
=E(X 2 ) − [E(X )]2 .
We recall that two events A, B ∈ are said independent if the occurrence of A does not affect the probability of occurrence of B. Equivalently, since P(A) > 0, P(A ∩ B) P(A)P(B) = = P(B). P(A) P(A)
P(B|A) =
(14.3.7)
The definition requires P(A) > 0, which is why in general the notion of independent events relies on formula P(A ∩ B) = P(A)P(B), valid for P(A) ≥ 0, instead of Eq. (14.3.7). Definition 14.3.5 (Independent discrete random variables) Two discrete RVs X, Y are called independent if, for every Ai ∈ S( X ), B j ∈ S(Y ), i ∈ I, j ∈ J , P(Ai ∩ B j ) = P(Ai )P(B j ).
♦
Proposition 14.3.10 (Expected value of product of independent discrete RVs) If X, Y are independent discrete RVs with finite expectations E(X ), E(Y ), then E(X Y ) exists, is finite, and equals E(X Y ) = E(X )E(Y ). Proof In analogy to Proposition 14.3.8, E(X Y ) =
(xi y j )P(Ci, j )
i∈I, j∈J
⎡
=⎣
⎤⎡ xi P(Ci, j )⎦ ⎣
i∈I, j∈J
=
⎛
xi P(Ai ) ⎝
i∈I
= E(X )E(Y ).
i∈I, j∈J
⎤ y j P(Ci, j )⎦ ⎞
y j P(B j )⎠
j∈J
Definition 14.3.6 (Conditional expectation of discrete RVs) Given a discrete RV X and an event B with P(B) > 0, the conditional expectation of X with respect to B is
E(X |B) =
xi P(Ai |B) =
i∈I
assuming the series
i∈I
xi P (X = xi )|B ,
(14.3.8)
i∈I
xi P (X = xi )|B converges.
♦
Proposition 14.3.11 Let X be a discrete RV with finite expectation E(X ) and {Bi }i∈I a sequence of pairwise-disjoint events such that i∈I Bi = and P(Bi ) > 0. Then P(Bi )E(X |Bi ). E(X ) = i∈I
Proof Take any Ak ∈ S(). Then Ak = Ak ∩ = Ak ∩ Bi ), and also
∞ i=1
∞ Bi = i=1 (Ak ∩
(Ak ∩ Bi ) ∩ (Ak ∩ B j ) = Ak ∩ (Bi ∩ B j ) = ∅. ∞ P(Ak ∩ Bi ). Hence P(Ak ) = i=1 Substituting P(Ak ∩ Bi ) = P(Bi )P(Ak |Bi ) above gives P(Ak ) =
P(Bi )P(Ak |Bi )
(14.3.9)
i∈I
⇔ P(X = xk ) =
P(Bi )P((X = xk )|Bi ).
i∈I
By (14.3.8), then, E(X ) =
xk P(X = xk ) =
k∈K
=
' P(Bi )
i∈I
k∈K
k∈K
xk
' i∈I
( P(Bi )P((X = xk )|Bi ) (
xk P((X = xk )|Bi ) =
P(Bi )E(X |Bi ).
i∈I
Definition 14.3.7 (Almost constant RVs) A random variable X is almost constant if the event {ω ∈ : X (ω) = K ∈ R} is almost sure, i.e. its probability is 1 up to zero-measure sets: P(X = K ) = 1.
♦
Proposition 14.3.12 (Variance vanishing) The variance of a discrete RV X is zero if and only if X is almost constant. Proof If X is almost constant it assumes values {xi = K }i∈I with probability P(X = K ) = 0 on event B = {ω ∈ : X (ω) = K },
while it assumes value K with probability P(X = K ) = 1 on A = {ω ∈ : X (ω) = K }, and A ∩ B = ∅. Therefore E(X ) =
P(X = xi )xi + P(X = K )K = P(X = K )
i∈I
⇔
E(X 2 ) =
xi + K = K ,
i∈I
P(X = xi )xi2 + P(X = K )K 2 = P(X = K )
i∈I
xi2 + K 2 = K 2 ,
i∈I
and then 2 Var(X ) = E X − E(X ) = E(X − K )2 = E(X 2 − 2K X + K 2 ) = E(X 2 ) − 2K E(X ) + K 2 = E(X 2 ) − K 2 = 0. Conversely, suppose 2 2 xi − E(X ) P(X = xi ) = 0. Var(X ) = E X − E(X ) = i∈I
By definition P(X = xi ) ≥ 0 for every i ∈ I , and hence xk − E(X ) = 0 ⇒ P(X = xk ) > 0 , ∀k ∈ I1 , x j − E(X ) = 0 ⇒ P(X = x j ) = 0 , ∀ j ∈ I2 where I1 ∩ I2 = ∅ and I1 ∪ I2 = I . Consequently xk = E(X ) = K has P(X = xk ) > 0 for every k ∈ I1 , while x j = K has P(X = x j ) = 0 for every j ∈ I2 . In the end P(X = xk ) = 1 − P(X = x j ) = 1 − P(X = K ) = 1. P(X = K ) = k∈I1
j∈I2
14.4 A Miscellany of Discrete and Continuous Distributions Example 14.4.1 (Binomial random variable) Suppose we have an urn containing the same number of white and red balls. Five balls are drawn at random, one at a time, and after each extraction we put the draw back in the urn. We want to calculate the probability of picking two whites and three reds, in any order. First, we find the probability of extracting a white ball: 1 P [W ][ ][ ][ ][ ] = . 2
The probability the first ball is white and the second is also white equals P [W ][W ][ ][ ][ ] = P [W ][ ][ ][ ][ ] × P [ ][W ][ ][ ][ ] =
2 1 . 2
The probability of drawing three whites and two reds, in this order, is P [W ][W ][W ][R][R] =
3 2 1 1 × . 2 2
But extracting first three whites and then two reds is not the only way to obtain the required unordered outcome. The possible arrangements [R][R][W ][W ][W ] [R][W ][R][W ][W ] [R][W ][W ][R][W ]
[W ][R][W ][R][W ] [W ][R][W ][W ][R] [W ][W ][R][R][W ]
[R][W ][W ][W ][R] [W ][R][R][W ][W ]
[W ][W ][R][W ][R] [W ][W ][W ][R][R]
correspond, as a matter of fact, to the number of permutations with repetition of k ≤ n elements of one type and n − k of another type2 :
5 5! 5! = = = 10. 3 3!2! 3!(5 − 3)! Clearly each outcome has the same probability, e.g. P [R][W ][W ][R][W ] =
3 2 1 1 = P [R][R][W ][W ][W ] . 2 2
and so
² If k of the n elements are labelled W and n − k are labelled R, the general number of permutations with repetitions
$$P_n^r = \frac{n!}{k_1!\, k_2!\, k_3! \cdots k_i!}$$
reduces to the number of combinations of k objects out of n,
$$P_{k,n-k} = \frac{n!}{k!\,(n-k)!}.$$
$$P\big([R][W][W][R][W] \text{ or } [R][R][W][W][W]\big) = \left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{2} + \left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{2}.$$
Eventually
$$P\big([R][R][W][W][W] \text{ or } \ldots \text{ or } [W][W][W][R][R]\big) = \underbrace{\left(\tfrac{1}{2}\right)^{3}\left(\tfrac{1}{2}\right)^{2} + \cdots + \left(\tfrac{1}{2}\right)^{3}\left(\tfrac{1}{2}\right)^{2}}_{10 \text{ terms}} = \binom{5}{3}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{2}.$$
In general, then, if we draw n balls and reintroduce the extracted one every time, the probability (also known as the mass) of getting 0 ≤ k ≤ n white balls equals
$$p_k = P(X = k) = \binom{n}{k} p^{k}(1-p)^{n-k}.$$
We call binomial distribution, or binomial random variable, of order n and parameter p the RV
$$B(n,p) = \begin{pmatrix} 0, & 1, & 2, & \ldots, & n \\[2pt] (1-p)^n, & \binom{n}{1}p(1-p)^{n-1}, & \binom{n}{2}p^2(1-p)^{n-2}, & \ldots, & p^n \end{pmatrix}.$$
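A minimal computational sketch (an illustration of mine, not part of the text) confirms both the urn example above and the fact that the masses of B(n, p) form a probability distribution:

```python
from math import comb

def binom_pmf(k, n, p):
    """Mass p_k = C(n, k) p^k (1 - p)^(n - k) of the binomial B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Urn example: three whites (and two reds) in five draws with replacement, p = 1/2.
print(binom_pmf(3, 5, 0.5))                           # 0.3125 = 10 * (1/2)^5

# The masses sum to 1 (Newton's binomial formula), here for n = 20, p = 0.3.
print(sum(binom_pmf(k, 20, 0.3) for k in range(21)))  # 1.0 up to rounding
```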
The sum of all masses $p_k$ equals
$$\sum_{k=0}^{n} p_k = \underbrace{\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = \big[p + (1-p)\big]^n}_{\text{Newton's binomial formula}} = 1.$$
Set q = 1 − p to compute the expectation
$$E(X) = \sum_{k=0}^{n} k\,p_k = \sum_{k=1}^{n} k\binom{n}{k} p^k q^{n-k}$$
of the binomial distribution. Differentiating in t the identity
$$(pt + q)^n = \sum_{k=0}^{n} \binom{n}{k} p^k t^k q^{n-k}$$
gives
$$np(pt + q)^{n-1} = \sum_{k=1}^{n} k\binom{n}{k} p^k t^{k-1} q^{n-k}. \tag{14.4.1}$$
As p + q = 1, when t = 1 we find
$$np = \sum_{k=1}^{n} k\binom{n}{k} p^k q^{n-k},$$
so from (14.4.1) we deduce E(X) = np. Let us compute the binomial variance. Since
$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = E\big(X(X-1)\big) + E(X) - E(X)^2,$$
the only unknown term is $E\big(X(X-1)\big)$. We relabel terms by i = k − 2 and m = n − 2, so k = 2, …, n ⇒ i = 0, …, n − 2 = m. Then
$$E\big(X(X-1)\big) = \sum_{k=0}^{n} k(k-1)\binom{n}{k} p^k q^{n-k} = \sum_{k=0}^{n} k(k-1)\frac{n!}{k!(n-k)!}\, p^k q^{n-k}$$
$$= \sum_{k=2}^{n} \frac{n!}{(k-2)!(n-k)!}\, p^k q^{n-k} = n(n-1)p^2 \sum_{k=2}^{n} \frac{(n-2)!}{(k-2)!(n-k)!}\, p^{k-2} q^{n-k}$$
$$= n(n-1)p^2 \sum_{i=0}^{m} \binom{m}{i} p^i q^{m-i} = n(n-1)p^2 \sum_{i=0}^{m} \frac{m!}{i!(m-i)!}\, p^i q^{m-i} = n(n-1)p^2 (p+q)^m = n(n-1)p^2,$$
and finally
$$\mathrm{Var}(X) = E\big(X(X-1)\big) + E(X) - E(X)^2 = n(n-1)p^2 + np - n^2p^2 = n^2p^2 - np^2 + np - n^2p^2 = np - np^2 = np(1-p) = npq.$$
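The identities E(X) = np and Var(X) = npq can be verified directly from the masses; the following sketch (a hypothetical check of mine, not from the text) does so for n = 20, p = 0.3:

```python
from math import comb

n, p = 20, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k**2 * pk for k, pk in enumerate(pmf)) - mean**2

print(mean, n * p)             # both 6.0
print(var, n * p * (1 - p))    # both 4.2, up to rounding
```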
Example 14.4.2 (Poisson random variable) In contrast to many discrete random variables (e.g. the hypergeometric RV, or the negative binomial RV) the Poisson process is not generated by a sequence of Bernoulli trials, but concerns the number of events over a certain time interval in a spatial, social, or other setting. The events are completely independent in the sense that they occur at a constant rate, and same-setting event sequences over non-overlapping time intervals do not interact. Consider then a time interval (t, t + Δt] and suppose we want to register how many events of a certain type occur in a given spatial region. We speak of Poisson distributions when the following key conditions are met:
• the probability that an event occurs in (t, t + Δt] is independent of the instant t (stationary Poisson distribution), nor does it depend on earlier history, i.e. what happened before or at time t;
• between t and t + Δt only one event of the type considered can occur, with probability proportional to Δt by a factor λ. The probability that further events occur in
that interval is infinitesimal of order bigger than Δt, so that the probability of observing one or more events at time t + Δt equals $p_{k\geq 1}(t + \Delta t) = \lambda\Delta t + o(\Delta t)$;
• no event of the type examined has occurred at the initial time t = 0.
Due to this, we assume no event took place in the system at time 0, and that
$p_{k\geq 1}(t + \Delta t) = \lambda\Delta t + o(\Delta t)$   — probability that at least one event occurs in the period (t, t + Δt];
$p_{k>1}(t + \Delta t) = o(\Delta t)$   — probability that k ≥ 2 events occur in (t, t + Δt];
$p_0(t + \Delta t) = 1 - \lambda\Delta t - o'(\Delta t)$   — probability that no event occurs in (t, t + Δt],
where $o'(\Delta t)$ gathers all infinitesimals. We seek the probability that within (t, t + Δt] exactly n system events occur, i.e. $p_n(t + \Delta t)$. For this let us indicate by $S_n$, $S_{n-1}$, $S_{n-k}$ the system's state by the time n, n − 1, n − k events have already occurred, respectively. In the following table infinitesimals o(Δt) are added, rather than subtracted, since they are irrelevant, in practice:
state at t    events in (t, t + Δt]    state at t + Δt    probability
S_n           0                        S_n                p_n(t)[1 − λΔt + o'(Δt)]
S_{n−1}       1                        S_n                p_{n−1}(t)[λΔt + o(Δt)]
S_{n−k}       k                        S_n                p_{n−k}(t)[o(Δt)]
The probability that the system at time t + Δt is in state $S_n$ (n events have been witnessed) is
$$p_n(t + \Delta t) = p_n(t)\,[1 - \lambda\Delta t] + p_{n-1}(t)\,[\lambda\Delta t] + o^*(\Delta t),$$
where $o^*(\Delta t)$ collects all infinitesimals, and hence
$$\frac{p_n(t + \Delta t) - p_n(t)}{\Delta t} = -\lambda p_n(t) + \lambda p_{n-1}(t) + \frac{o^*(\Delta t)}{\Delta t}.$$
Since $o^*(\Delta t)$ is a higher-order infinitesimal than Δt, as the latter tends to 0 the ratio $o^*(\Delta t)/\Delta t$ goes to zero, too. Hence the above right-hand side converges, and
$$\frac{dp_n(t)}{dt} = -\lambda p_n(t) + \lambda p_{n-1}(t). \tag{14.4.2}$$
Thus we have tied $p_n(t)$ to its derivative and the preceding term in the sequence. This is a recurrence differential equation. To solve it we first determine a few values of $p_n(t)$ using the assumptions. In particular
• $p_0(0) = 1$, saying it is certain that no events occurred at time 0;
• $p_n(0) = 0$ for n ≥ 1, meaning it is impossible that any event has already occurred at time 0;
• $p_{-n}(\cdot) = 0$, expressing the fact that we admit only non-negative numbers n of events.
We may solve (14.4.2) for $p_0(t)$:
$$\frac{dp_0(t)}{dt} = -\lambda p_0(t) \iff \frac{1}{p_0(t)}\frac{dp_0(t)}{dt} = -\lambda,$$
integrating which
$$\int \frac{1}{p_0(t)}\frac{dp_0(t)}{dt}\,dt = \int -\lambda\,dt \iff \ln p_0(t) = -\lambda t + A_0.$$
$A_0$ is to all effects the initial condition of the ODE, and actually $A_0 = 0$ because nothing happened at the very beginning of the process, so
$$\ln p_0(t) = \ln e^{\ln p_0(t)} = \ln e^{-\lambda t} \iff p_0(t) = e^{-\lambda t}.$$
The probability of witnessing one event at time t, according to Eq. (14.4.2), satisfies
$$\frac{dp_1(t)}{dt} = \lambda p_0(t) - \lambda p_1(t)$$
$$\iff \frac{dp_1(t)}{dt} + \lambda p_1(t) = \lambda p_0(t) \iff \frac{dp_1(t)}{dt} + \lambda p_1(t) = \lambda e^{-\lambda t}.$$
Multiplying both sides by $e^{\lambda t}$ gives
$$\frac{dp_1(t)}{dt}\,e^{\lambda t} + \lambda e^{\lambda t} p_1(t) = \lambda.$$
Note that $\frac{dp_1(t)}{dt}e^{\lambda t} + \lambda e^{\lambda t} p_1(t)$ is the derivative of $p_1(t)\cdot e^{\lambda t}$, so integrating produces
$$\int\Big(\frac{dp_1(t)}{dt}\,e^{\lambda t} + \lambda e^{\lambda t} p_1(t)\Big)dt = \int \lambda\,dt \iff p_1(t)\,e^{\lambda t} = \lambda t + A_1,$$
and the initial condition $p_1(0) = 0$ forces $A_1 = 0$, and then $p_1(t)e^{\lambda t} = \lambda t$. Multiplying by $e^{-\lambda t}$,
$$p_1(t) = \frac{\lambda t}{1!}\,e^{-\lambda t}.$$
We may assume by induction that the probability of occurrence of n − 1 events at time t equals
$$p_{n-1}(t) = \frac{\lambda^{n-1} t^{n-1}}{(n-1)!}\,e^{-\lambda t}.$$
Then, by (14.4.2), the probability of having n events at time t satisfies
$$\frac{dp_n(t)}{dt} = \lambda p_{n-1}(t) - \lambda p_n(t) \iff \frac{dp_n(t)}{dt} + \lambda p_n(t) = \lambda p_{n-1}(t) \iff \frac{dp_n(t)}{dt} + \lambda p_n(t) = \lambda\,\frac{\lambda^{n-1} t^{n-1}}{(n-1)!}\,e^{-\lambda t}.$$
Multiplying by $e^{\lambda t}$,
$$\frac{dp_n(t)}{dt}\,e^{\lambda t} + \lambda e^{\lambda t} p_n(t) = \frac{\lambda^n t^{n-1}}{(n-1)!}.$$
As noticed earlier, the left side is the derivative of $e^{\lambda t} p_n(t)$, and integrating we obtain
$$\int\Big(\frac{dp_n(t)}{dt}\,e^{\lambda t} + \lambda e^{\lambda t} p_n(t)\Big)dt = \int \frac{\lambda^n t^{n-1}}{(n-1)!}\,dt \iff p_n(t)\,e^{\lambda t} = \frac{(\lambda t)^n}{n(n-1)!} + A_n.$$
Again, $p_n(0) = 0$ forces $A_n = 0$, so multiplying by $e^{-\lambda t}$ eventually gives
$$p_n(t) = \frac{(\lambda t)^n}{n!}\,e^{-\lambda t},$$
the so-called Poisson distribution, or Poisson random variable. In general this distribution is defined at every t, so we set, for k = 1, …, n, …,
$$p_k = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad \begin{pmatrix} 0, & 1, & 2, & \ldots, & n, & \ldots \\[2pt] e^{-\lambda}, & \frac{\lambda}{1!}e^{-\lambda}, & \frac{\lambda^2}{2!}e^{-\lambda}, & \ldots, & \frac{\lambda^n}{n!}e^{-\lambda}, & \ldots \end{pmatrix}.$$
The Poisson random variable is well defined, since P(X = k) ≥ 0 for any k and
$$\sum_{k=0}^{\infty} P(X = k) = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.$$
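A minimal numerical check (a sketch of mine, not from the text) confirms the normalisation just computed, and anticipates the values of the expectation and variance derived next — both equal to λ:

```python
from math import exp, factorial

lam = 3.5
pmf = [lam**k * exp(-lam) / factorial(k) for k in range(200)]  # truncated series

print(sum(pmf))                                        # ~1.0
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k**2 * pk for k, pk in enumerate(pmf)) - mean**2
print(mean, var)                                       # both ~3.5 = lambda
```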
Let us find its expectation and variance:
$$E(X) = \sum_{k=0}^{\infty} k\,\frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda}\Big(\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!}\Big) = \lambda e^{\lambda} e^{-\lambda} = \lambda;$$
and from
$$E(X^2) = \sum_{k=0}^{\infty} k^2\,\frac{\lambda^k e^{-\lambda}}{k!} = \sum_{k=0}^{\infty} (k^2 - k + k)\,\frac{\lambda^k e^{-\lambda}}{k!} = \sum_{k=0}^{\infty} k(k-1)\,\frac{\lambda^k e^{-\lambda}}{k!} + \sum_{k=0}^{\infty} k\,\frac{\lambda^k e^{-\lambda}}{k!} = \lambda^2 e^{-\lambda}\Big(\sum_{k=2}^{\infty}\frac{\lambda^{k-2}}{(k-2)!}\Big) + E(X) = \lambda^2 e^{-\lambda} e^{\lambda} + \lambda = \lambda^2 + \lambda,$$
we get
$$\mathrm{Var}(X) = \lambda^2 + \lambda - \lambda^2 = \lambda.$$
The parameter λ, initially taken as the proportionality factor of the event's occurrence during a lapse Δt, is therefore both the expected value and the variance of our event at time t. Next we want to explain how the binomial random variable under certain conditions has an asymptotic behaviour that turns it into a Poisson distribution. Suppose np = λ, in the sense that as n → ∞ the probability p tends to 0 and np becomes asymptotic to a constant λ. Let us examine the expansion
$$\lim_{\substack{n\to\infty\\ p\to 0}} \binom{n}{k} p^k (1-p)^{n-k} = \lim_{\substack{n\to\infty\\ p\to 0}} \frac{n!}{k!(n-k)!}\Big(\frac{\lambda}{n}\Big)^k\Big(1-\frac{\lambda}{n}\Big)^{n-k} = \frac{\lambda^k}{k!}\lim_{\substack{n\to\infty\\ p\to 0}} \frac{n!}{(n-k)!}\,\frac{1}{n^k}\Big(1-\frac{\lambda}{n}\Big)^{n}\Big(1-\frac{\lambda}{n}\Big)^{-k}$$
term by term, starting from $\big(1-\frac{\lambda}{n}\big)^{-k}$. Since $\lim np = \lambda$, the double limit boils down to $\lim_{n\to\infty}$, and for any k ∈ N we have
$$\lim_{\substack{n\to\infty\\ p\to 0}} \Big(1-\frac{\lambda}{n}\Big)^{-k} = 1^{-k} = 1.$$
Regarding $\lim\big(1-\frac{\lambda}{n}\big)^{n}$: because $1-\frac{\lambda}{n} > 0$ one has
$$\Big(1-\frac{\lambda}{n}\Big)^{n} = e^{\ln\big(1-\frac{\lambda}{n}\big)^{n}},$$
and we consider the limit of the exponent on the right:
$$\lim_{n\to\infty} \ln\Big(1-\frac{\lambda}{n}\Big)^{n} = \lim_{n\to\infty} n\ln\frac{n-\lambda}{n} = \lim_{n\to\infty} \frac{\ln\frac{n-\lambda}{n}}{1/n}.$$
As n → ∞ the indeterminacy 0/0 is sorted by de l'Hôpital's rule. For the numerator
$$\frac{d}{dn}\Big(\ln\frac{n-\lambda}{n}\Big) = \frac{n}{n-\lambda}\cdot\frac{\lambda}{n^2} = \frac{\lambda}{n(n-\lambda)},$$
while the denominator's derivative equals $-1/n^2$. Therefore
$$\lim_{n\to\infty} \ln\Big(1-\frac{\lambda}{n}\Big)^{n} = \lim_{n\to\infty} \frac{\frac{\lambda}{n(n-\lambda)}}{-\frac{1}{n^2}} = \lim_{n\to\infty} -\frac{n\lambda}{n-\lambda} = -\lambda,$$
and hence
$$\lim_{n\to\infty}\Big(1-\frac{\lambda}{n}\Big)^{n} = \lim_{n\to\infty} e^{\ln\big(1-\frac{\lambda}{n}\big)^{n}} = e^{-\lambda}.$$
Finally,
$$\lim_{\substack{n\to\infty\\ p\to 0}} \frac{n!}{(n-k)!}\,\frac{1}{n^k} = \lim_{n\to\infty} \frac{n!}{(n-k)!\,n^k} = \lim_{n\to\infty} \frac{n(n-1)\cdots(n-k+1)(n-k)(n-k-1)\cdots 1}{[(n-k)(n-k-1)\cdots 1]\,n^k} = \lim_{n\to\infty} \frac{n(n-1)\cdots(n-k+1)}{n\cdot n\cdots n} = \lim_{n\to\infty} 1\cdot\Big(1-\frac{1}{n}\Big)\cdots\Big(1-\frac{k-1}{n}\Big) = 1.$$
Putting everything back together,
$$\lim_{\substack{n\to\infty\\ p\to 0}} \binom{n}{k} p^k (1-p)^{n-k} = \frac{\lambda^k}{k!}\,e^{-\lambda}.$$
We have thus proved that as the number n of trials increases and the probability p of event E lowers in such a way that np is a constant λ, a binomial random variable tends to a Poisson random variable. The fact that a binomial distribution converges to a Poisson distribution whenever the probability of witnessing an event is close to zero induced some scholars to call the Poisson random variable, somehow misleadingly, "random variable of rare events". The Poisson random variable is well defined even when events are not rare at all (p ≫ 0). We have proved that if n → ∞ and p → 0 but np = λ, then $B(n,p) \to \frac{\lambda^k}{k!}e^{-\lambda}$. This is not the case when p ≫ 0, nor when n → ∞ alone, because then p > λ/n and (1 − p) < (1 − λ/n).
Now we want to examine a few classical continuous distributions.
Example 14.4.3 (Uniform random variable) A RV on [a, b] with probability density
$$f(x) = \begin{cases} \frac{1}{b-a} & \text{if } x \in [a,b] \\ 0 & \text{if } x \notin [a,b] \end{cases}$$
is called uniform, and denoted $U_{[a,b]}$ (Fig. 14.1). To start with, a uniform RV satisfies conditions (14.3.2). Let us find its cumulative distribution function. If x < a, f(t) = 0 for t ≤ x, and so F(x) = 0. If a ≤ x ≤ b, we have f(t) = 1/(b − a) and then (Fig. 14.1)
$$F(x) = \int_{(-\infty,a)} f(t)\,dt + \int_{[a,x]} f(t)\,dt = \frac{1}{b-a}\int_{[a,x]} dt = \frac{x-a}{b-a}.$$
If x > b,
$$F(x) = \int_{(-\infty,a)} f(t)\,dt + \int_{[a,b]} f(t)\,dt + \int_{(b,x]} f(t)\,dt = \frac{1}{b-a}\int_{[a,b]} dt = 1.$$
Altogether
$$F(x) = \begin{cases} 0 & \text{if } x < a \\ \frac{x-a}{b-a} & \text{if } a \le x \le b \\ 1 & \text{if } x > b. \end{cases}$$
Out of the distribution function we recover the mean and variance:
$$E(X) = \int_{(-\infty,a)} x\,dF(x) + \int_{[a,b]} x\,dF(x) + \int_{(b,\infty)} x\,dF(x) = \frac{1}{b-a}\int_{[a,b]} x\,dx = \frac{a+b}{2},$$
Fig. 14.1 Density and distribution functions of a uniform RV $U_{[a,b]}$
$$E(X^2) = \int_{(-\infty,a)} x^2\,dF(x) + \int_{[a,b]} x^2\,dF(x) + \int_{(b,\infty)} x^2\,dF(x) = \frac{1}{b-a}\int_{[a,b]} x^2\,dx = \frac{a^2+ab+b^2}{3},$$
so
$$\mathrm{Var}(X) = E(X^2) - \big(E(X)\big)^2 = \frac{(b-a)^2}{12}.$$
In particular, the distribution function of a uniform RV over [0, 1] equals
$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \le x \le 1 \\ 1 & \text{if } x > 1. \end{cases}$$
Taking linear combinations of independent uniform RVs generates random variables with density varying according to the number of RVs involved. In particular, for n = 2 we obtain a triangular RV; for higher n, putting $S_n = \sum_{i=1}^{n} U^i_{[0,1]}$, the density of
$$\frac{S_n - E(S_n)}{\sqrt{\mathrm{Var}(S_n)}}$$
quickly tends to a bell-shaped curve.
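A short Monte Carlo sketch (my own illustration under the stated assumptions E(S_n) = n/2 and Var(S_n) = n/12 for sums of independent U_{[0,1]}; the helper name is mine) makes the bell-shape remark tangible:

```python
import random

def standardized_uniform_sum(n):
    """One sample of (S_n - E(S_n)) / sqrt(Var(S_n)), with S_n a sum of n U[0,1]."""
    s = sum(random.random() for _ in range(n))
    mean = n * 0.5            # E(S_n) = n(a+b)/2 with a = 0, b = 1
    var = n / 12.0            # Var(S_n) = n(b-a)^2 / 12
    return (s - mean) / var**0.5

random.seed(0)
samples = [standardized_uniform_sum(12) for _ in range(100_000)]
# Empirical mean ~ 0 and variance ~ 1; a histogram of `samples` is already bell-shaped.
m = sum(samples) / len(samples)
v = sum((x - m)**2 for x in samples) / len(samples)
print(round(m, 3), round(v, 3))
```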
Notwithstanding its simplicity, the uniform RV is very useful in applications, because any RV can be transformed into a uniform one and vice versa. The random-number algorithms employed to generate uniform random values pass all standard randomness tests, hence these generators are suitable for producing random variables of other kinds as well. Sequences of random numbers have several applications in the fields of experimental simulation, numerical analysis involving Monte Carlo methods (evaluating integrals and solving systems of differential equations) and polling models. Example 14.4.4 (Negative-exponential random variable) The negative-exponential RV is deduced from memoryless probabilistic models. These are used to compute the lifetime of devices either not subject to decay, or subject to negligible deterioration.
In general a RV X is said to be without memory, or memoryless, if the probability that the phenomenon survives to time $t_0 + t$ (or the device does not break down before that instant), assuming it survives until $t_0$, is equal to the probability that the phenomenon survives during the entire lapse [0, t]:
$$P(X > t_0 + t \mid X > t_0) = P(X > t) \iff \frac{P(X > t_0 + t)}{P(X > t_0)} = P(X > t).$$
This relation formalises the lack of memory. If $F_X(t) = P(X \le t)$ denotes the cumulative distribution function of X, the survival function $S : [0,\infty) \to [0,1]$,
$$S(t) = 1 - F_X(t) = P(X > t), \quad t \ge 0,$$
encodes memorylessness, because
$$\frac{S(t_0 + t)}{S(t_0)} = S(t) \quad\text{for } t_0, t > 0.$$
The map S is differentiable on (0, ∞), it has a right derivative at 0 and solves the system
$$S(t_0 + t) = S(t_0)\cdot S(t), \qquad S(0) = 1 - F_X(0) = 1, \qquad S(\infty) = 1 - F_X(\infty) = 0.$$
When t > 0,
$$S'(t) = \lim_{t_1\to 0^+} \frac{S(t + t_1) - S(t)}{t_1} = \lim_{t_1\to 0^+} \frac{S(t)\,S(t_1) - S(t)}{t_1} = S(t)\cdot\lim_{t_1\to 0^+}\frac{S(t_1) - 1}{t_1} = S(t)\cdot S'(0^+).$$
As $F_X(t)$ is increasing, $1 - F_X(t) = S(t)$ is positive and decreasing, i.e. $S'(t) < 0$ for every $t \in [0,\infty)$. From
$$\frac{S'(t)}{S(t)} = S'(0^+) = -\lambda \le 0$$
we find
$$\big(\ln S(t)\big)' = \frac{S'(t)}{S(t)} = -\lambda \iff \ln S(t) = -\lambda t + a \iff S(t) = e^{-\lambda t + a}.$$
Now $S(0) = 1 \Rightarrow S(0) = e^{a} = 1 \Rightarrow a = 0 \Rightarrow S(t) = e^{-\lambda t}$, so $F_X(t) = 1 - e^{-\lambda t}$. In addition, $F_X$ vanishes for t < 0. The random variable X then has exponential distribution function with rate parameter λ > 0. The choice λ = 0 is excluded since
Fig. 14.2 Density function of a negative-exponential RV
S(t) = 1 is not the survival function of any RV (S must decrease from S(0) = 1 to S(∞) = 0). In summary, for λ > 0,
$$F_X(t) = \begin{cases} 1 - e^{-\lambda t} & \text{if } t \in [0,\infty) \\ 0 & \text{if } t \in (-\infty,0) \end{cases} \qquad\text{and}\qquad f_X(t) = \begin{cases} \lambda e^{-\lambda t} & \text{if } t \in [0,\infty) \\ 0 & \text{if } t \in (-\infty,0). \end{cases}$$
The density function of a negative-exponential RV looks like Fig. 14.2. The mean and variance are found integrating by parts:
$$E(X) = \int_{[0,\infty)} t\,\lambda e^{-\lambda t}\,dt = \Big[-t\,e^{-\lambda t}\Big]_0^{\infty} + \int_{[0,\infty)} e^{-\lambda t}\,dt = \Big[-t\,e^{-\lambda t} - \frac{1}{\lambda}e^{-\lambda t}\Big]_0^{\infty} = \frac{1}{\lambda},$$
$$\mathrm{Var}(X) = E(X^2) - \big(E(X)\big)^2 = \int_{[0,\infty)} t^2\lambda e^{-\lambda t}\,dt - \frac{1}{\lambda^2} = \Big[-t^2 e^{-\lambda t}\Big]_0^{\infty} + 2\int_{[0,\infty)} t\,e^{-\lambda t}\,dt - \frac{1}{\lambda^2} = \frac{2}{\lambda}\cdot\frac{1}{\lambda} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$
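The values E(X) = 1/λ, Var(X) = 1/λ² and the memoryless property can be checked by simulation; the sketch below (my own illustration, not part of the text) uses the standard-library exponential sampler:

```python
import random

lam = 2.0
random.seed(1)
xs = [random.expovariate(lam) for _ in range(200_000)]

mean = sum(xs) / len(xs)
var = sum((x - mean)**2 for x in xs) / len(xs)
print(round(mean, 3), 1 / lam)        # ~0.5  vs 1/lambda
print(round(var, 3), 1 / lam**2)      # ~0.25 vs 1/lambda^2

# Memorylessness: P(X > t0 + t | X > t0) should match P(X > t).
t0, t = 0.4, 0.7
p_cond = sum(x > t0 + t for x in xs) / sum(x > t0 for x in xs)
p_plain = sum(x > t for x in xs) / len(xs)
print(round(p_cond, 3), round(p_plain, 3))   # both close to exp(-lam*t) ~ 0.247
```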
Example 14.4.5 (Normal random variable) The normal, or Gaussian, random variable is supported on the entire real axis. Its density function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\frac{(x-m)^2}{\sigma^2}} \tag{14.4.3}$$
has the graph as in Fig. 14.3. We claim
Fig. 14.3 Density function of a normal RV
$$\int_{\mathbb{R}} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\frac{(x-m)^2}{\sigma^2}}\,dx = 1. \tag{14.4.4}$$
To see this set $y = \frac{x-m}{\sigma}$, so $dx = \sigma\,dy$ and
$$A = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-\frac{y^2}{2}}\,dy.$$
Moreover,
$$A^2 = \frac{1}{2\pi}\int_{\mathbb{R}^2} e^{-\frac{y^2+z^2}{2}}\,dy\,dz.$$
Let $C_r$ indicate the disc centred at the origin with radius 0 < r < ∞:
$$C_r = \big\{(y,z) \in \mathbb{R}^2 : y^2 + z^2 \le r^2\big\}.$$
Then
$$A^2 = \frac{1}{2\pi}\int_{\mathbb{R}^2} e^{-\frac{y^2+z^2}{2}}\,dy\,dz = \frac{1}{2\pi}\lim_{r\to\infty}\int_{C_r} e^{-\frac{y^2+z^2}{2}}\,dy\,dz.$$
Passing to polar coordinates (ρ, θ), since $\rho = \sqrt{y^2+z^2}$ we can write
$$A^2 = \frac{1}{2\pi}\lim_{r\to\infty}\int_0^{2\pi}\Big(\int_0^{r} \rho\,e^{-\frac{\rho^2}{2}}\,d\rho\Big)d\theta.$$
Substituting
$$\frac{d\big(e^{-\rho^2/2}\big)}{d\rho} = -\rho\,e^{-\frac{\rho^2}{2}} \iff -\frac{d\big(e^{-\rho^2/2}\big)}{d\rho} = \rho\,e^{-\frac{\rho^2}{2}}$$
in the integral we conclude
$$A^2 = -\frac{1}{2\pi}\lim_{r\to\infty}\int_0^{2\pi}\Big(\int_0^{r}\frac{d\big(e^{-\rho^2/2}\big)}{d\rho}\,d\rho\Big)d\theta = -\frac{1}{2\pi}\int_0^{2\pi}\Big[e^{-\frac{\rho^2}{2}}\Big]_0^{\infty}d\theta = -\frac{1}{2\pi}(0-1)\cdot 2\pi = 1,$$
i.e.
$$\int_{\mathbb{R}^2} e^{-\frac{y^2+z^2}{2}}\,dy\,dz = 2\pi.$$
Furthermore,
$$\int_{\mathbb{R}^2} e^{-\frac{x^2+y^2}{2}}\,dx\,dy = \int_{\mathbb{R}} e^{-\frac{x^2}{2}}\,dx\int_{\mathbb{R}} e^{-\frac{y^2}{2}}\,dy = \Big(\int_{\mathbb{R}} e^{-\frac{x^2}{2}}\,dx\Big)^2 \quad\text{(relabelling } y \text{ as } x\text{)},$$
so
$$\Big(\int_{\mathbb{R}} e^{-\frac{x^2}{2}}\,dx\Big)^2 = 2\pi \iff \int_{\mathbb{R}} e^{-\frac{x^2}{2}}\,dx = \sqrt{2\pi} \iff \int_{\mathbb{R}} e^{-x^2}\,dx = \sqrt{\pi}.$$
With the substitution $x = \sigma y + m$, $dx = \sigma\,dy$, the expectation of X is
$$E(X) = \int_{\mathbb{R}} x\cdot\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\frac{(x-m)^2}{\sigma^2}}\,dx = \int_{\mathbb{R}} (\sigma y + m)\cdot\frac{1}{\sqrt{2\pi}}\,e^{-\frac{y^2}{2}}\,dy = \frac{\sigma}{\sqrt{2\pi}}\int_{\mathbb{R}} y\,e^{-\frac{y^2}{2}}\,dy + m\cdot\underbrace{\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-\frac{y^2}{2}}\,dy}_{=\,\frac{\sqrt{2\pi}}{\sqrt{2\pi}}\,=\,1}.$$
Substituting
$$\frac{d\big(e^{-y^2/2}\big)}{dy} = -y\,e^{-\frac{y^2}{2}}, \quad\text{or}\quad -\frac{d\big(e^{-y^2/2}\big)}{dy} = y\,e^{-\frac{y^2}{2}},$$
gives
$$E(X) = \frac{\sigma}{\sqrt{2\pi}}\int_{\mathbb{R}}\Big(-\frac{d\big(e^{-y^2/2}\big)}{dy}\Big)dy + m = \frac{\sigma}{\sqrt{2\pi}}\Big[-e^{-\frac{y^2}{2}}\Big]_{-\infty}^{\infty} + m = m.$$
Choosing, instead,
$$\frac{(x-m)^2}{2\sigma^2} = t^2, \quad\text{so } x = \sigma\sqrt{2}\,t + m \text{ and } dx = \sigma\sqrt{2}\,dt,$$
the variance equals
$$\mathrm{Var}(X) = \frac{1}{\sigma\sqrt{2\pi}}\int_{\mathbb{R}} x^2 e^{-\frac{1}{2}\frac{(x-m)^2}{\sigma^2}}\,dx - \big(E(X)\big)^2 = \frac{\sigma\sqrt{2}}{\sigma\sqrt{2\pi}}\int_{\mathbb{R}} (\sigma\sqrt{2}\,t + m)^2 e^{-t^2}\,dt - m^2$$
$$= \frac{1}{\sqrt{\pi}}\Big(2\sigma^2\int_{\mathbb{R}} t^2 e^{-t^2}\,dt + 2\sigma\sqrt{2}\,m\underbrace{\int_{\mathbb{R}} t\,e^{-t^2}\,dt}_{=\,0} + m^2\underbrace{\int_{\mathbb{R}} e^{-t^2}\,dt}_{=\,\sqrt{\pi}}\Big) - m^2 = \frac{2\sigma^2}{\sqrt{\pi}}\int_{\mathbb{R}} t^2 e^{-t^2}\,dt.$$
Since
$$\frac{d\big(e^{-t^2}\big)}{dt} = -2t\,e^{-t^2}, \quad\text{so}\quad -\frac{t}{2}\cdot\frac{d\big(e^{-t^2}\big)}{dt} = t^2 e^{-t^2},$$
replacing in the expression and integrating by parts we obtain
$$\mathrm{Var}(X) = \frac{2\sigma^2}{\sqrt{\pi}}\int_{\mathbb{R}} t^2 e^{-t^2}\,dt = \frac{2\sigma^2}{\sqrt{\pi}}\int_{\mathbb{R}}\Big(-\frac{t}{2}\cdot\frac{d\big(e^{-t^2}\big)}{dt}\Big)dt = \frac{2\sigma^2}{\sqrt{\pi}}\Big(\Big[-\frac{t}{2}\,e^{-t^2}\Big]_{-\infty}^{\infty} + \frac{1}{2}\int_{\mathbb{R}} e^{-t^2}\,dt\Big) = \frac{2\sigma^2}{\sqrt{\pi}}\cdot\frac{\sqrt{\pi}}{2} = \sigma^2.$$
The RV X with density (14.4.3) is completely characterised by the two parameters m, σ. One writes X ∼ N(m, σ²), where ∼ should be understood as "has the same distribution as", to mean such an X is "normally distributed". To finish, a few observations concerning Gaussian random variables.
• The density function is symmetric about the straight line x = m, so the skewness vanishes. At x = m the function has an absolute maximum.
• The x-axis is a horizontal asymptote for the density function.
• The points x₁ = m − σ, x₂ = m + σ are the density's inflections.
• Almost the entire unit mass is contained in [m − 3σ, m + 3σ]:
$$P(m - 3\sigma \le X \le m + 3\sigma) \approx 0.99.$$
• The expected value m of the normal RV determines where the density is placed in the plane. Given two density functions with m₁ < m₂, the graph with mean m₁ sits to the left of the one with mean m₂.
• The variance σ² of the normal RV determines the density's shape. For smaller values of σ the density function is more concentrated around the expected value m, and hence its peak is higher than a density with larger σ.
• The so-called standard normal RV, obtained setting z = (x − m)/σ, has probability density
$$f(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}.$$
It enjoys all the properties of normal RVs but has mean 0 and variance 1. It is customary to call it Z ∼ N(0, 1). To transform a normal RV X into a standard normal RV Z it suffices to put
$$F_X(x) = F_Z\!\left(\frac{x-m}{\sigma}\right).$$
This formula allows one to compute the cumulative distribution function of N(m, σ) by knowing that of N(0, 1). By the aforementioned symmetry, moreover, the table of values of the standard normal distribution function only contains positive arguments.
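The standardisation can be checked numerically; the sketch below (my own illustration, not from the text, using the standard-library error function) evaluates F_X through the standard normal CDF and also reports the mass within three standard deviations of the mean:

```python
from math import erf, sqrt

def normal_cdf(x, m=0.0, sigma=1.0):
    """F_X(x) for X ~ N(m, sigma^2), via the standard normal CDF Phi((x - m)/sigma)."""
    z = (x - m) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

m, sigma = 10.0, 2.0
# F_X(x) = F_Z((x - m)/sigma): both lines below print the same value.
print(normal_cdf(12.0, m, sigma))
print(normal_cdf((12.0 - m) / sigma))     # standard normal at z = 1

# Mass within three standard deviations of the mean (about 0.997).
print(normal_cdf(m + 3*sigma, m, sigma) - normal_cdf(m - 3*sigma, m, sigma))
```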
Appendix A
Compact and Totally Bounded Metric Spaces
Definition A.0.1 (Diameter of a set) The diameter of a subset E in a metric space (X, d) is the quantity diam(E) = sup d(x, y). x,y∈E
♦ Definition A.0.2 (Bounded and totally bounded sets)A subset E in a metric space (X, d) is said to be 1. bounded, if there exists a number M such that E ⊆ B(x, M) for every x ∈ E, where B(x, M) is the ball centred at x of radius M. Equivalently, d(x1 , x2 ) ≤ M for every x1 , x2 ∈ E and some M ∈ R; 2. totally bounded, if for any > 0 there exists a finite subset {x1 , . . . , xn } of X such that the (open or closed) balls B(xi , ), i = 1, . . . , n cover E, i.e. E = n i=1 B(x i , ). ♦ Clearly a set is bounded if its diameter is finite. We will see later that a bounded set may not be totally bounded. Now we prove the converse is always true. Proposition A.0.1 (Total boundedness implies boundedness) In a metric space (X, d) any totally bounded set E is bounded. Proof Total boundedness means we may choose = 1 and find A = {x1 , . . . , xn } in X such that for every x ∈ E © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 S. Gentili, Measure, Integration and a Primer on Probability Theory, UNITEXT 125, https://doi.org/10.1007/978-3-030-54940-4
429
430
Appendix A: Compact and Totally Bounded Metric Spaces
inf d(xi , x) ≤ 1.
1≤i≤n
Set M = max1≤i≤n d(x1 , xi ). We know there is an index i ∈ {1, . . . , n} such that d(xi , x) ≤ 1, so the triangle inequality implies d(x1 , x) ≤ d(x1 , xi ) + d(xi , x) ≤ M + 1. Hence E ⊆ B(x1 , M + 1), and E is bounded. Example A.0.1 (Bounded versus totally bounded sets) The notions of boundedness and total boundedness differ, because the former is weaker. The discrete distance of distinct points is always 1, and equals 0 for coinciding points. • The space X = {a, b, c} with discrete metric is surely bounded. It is totally bounded, too, since {a}, {b}, {c} have zero diameter (less than any > 0), there is a finite number of them, and their union is X . • The space [0, 1] equipped with the discrete distance, rather than the usual Euclidean metric, is bounded: its diameter is 1, the distance of any pair of distinct points. But it is not totally bounded. The subsets of diameter less than 1 are the singletons {x}, of which there is no finite number covering the whole space. • N with the discrete metric is bounded (distinct natural numbers are all 1 apart), but not totally bounded. We cannot cover N by finitely many balls of radius < 1 (a ball consists of a single point, its centre, but N is infinite). A Cauchy sequence of natural numbers is eventually constant, hence it converges. N is therefore a complete metric space. Proposition A.0.2 (Separability of totally bounded metric spaces) Totally bounded metric spaces (X, d) are separable. Proof As X is totally bounded, for every n ∈ N there exists An = {x1 , . . . , xn } ⊆ X , finite, such that n 1 . B xi , X= n i=1 The union D = ∞ n=1 An is countable (as countable union of finite sets). We claim that D¯ = X . Let x ∈ X be an arbitrary point. For any n ∈ N, there exists some xin ∈ An such that x ∈ B xin , 1/n . This gives a sequence {xin }n≥1 ⊆ D
with d(x, xin ) < 1/n. Therefore lim xi = x. This shows that every x ∈ X lies in the closure of D and so D¯ = X . Intuitively a separable space is one where any element is close to one from some (dense) countable subset. In a compact space, instead, any element is close to another taken from a finite number of subsets, so compact spaces can be considered kind of “pseudo-finite”. In analogy to the notion of compactness in R we have the following definition for generic metric spaces. Definition A.0.3 (Compactness) A subset E in a metric space (X, d) is said to be compact if any open cover of E admits a finite subcover (so-called Heine–Pincherle– Borel property). ♦ The next results generalises the Bolzano–Weierstraß theorem to compact metric spaces. Theorem A.0.1 (Bolzano–Weierstraß theorem for metric spaces) If (X, d) is a compact metric space, any infinite subset E ⊆ X has a limit point. Proof Let (X, d) be compact and E ⊆ X infinite. Assume, by contradiction, E has no limit points. Then for every x ∈ X there is an open ball B(x, r x ) such that B(x, r x ) ∩ E ⊆ {x}. The family {B(x, r x )}x∈X is an open cover of X , so there exists a finite subcover B(xi , r xi ), i = 1, . . . , n. But then E=E∩X=E∩
n
B(xi , r xi ) ⊆ {x1 , . . . , xn },
i=1
i.e. E is finite. The contradiction proves the claim. Definition A.0.4 (Sequentially compact metric spaces) A metric space (X, d) is sequentially compact if every sequence in X admits a convergent subsequence. ♦ We will see straightaway that sequential compactness implies total boundedness. Proposition A.0.3 (Sequential compactness versus total boundedness) A sequentially compact metric space (X, d) is totally bounded.
Proof Suppose (X, d) is not totally bounded. Then there is an 0 such that X cannot be covered by balls centred at the points of any finite set A0 ⊆ X . Taking x1 ∈ X , then, we can find x2 ∈ X with d(x1 , x2 ) > 0 , for otherwise X = B(x1 , 0 ) and A0 = {x1 }. Next, take x1 , x2 ∈ X , and there is an x3 ∈ X with d(x3 , x1 ) > 0 and d(x3 , x2 ) > 0 , because otherwise X = B(x1 , 0 ) ∪ B(x2 , 0 ) and A0 = {x1 , x2 }. Inductively, at step k + 1 we have {x1 , . . . , xk } ⊆ X and there is an xk+1 ∈ X with d(xk+1 , xi ) > 0 for every i ∈ {1, . . . , k}, because if not, once again, X=
k
B(xi , 0 )
i=1
and A0 = {x1 , . . . , xk }. Overall we have a whole sequence {xk }k≥1 such that d(xk , x j ) > 0 , k = j. It must diverge, hence it certainly cannot have a convergent subsequence. But then X would not be sequentially compact, contradicting the hypothesis. Definition A.0.5 (Countably compact metric spaces) A metric space (X, d) is countably compact if any countable open cover of X has a finite subcover. ♦ Proposition A.0.4 (Sequentially compact metric spaces and countable compactness) A sequentially compact metric space (X, d) is countably compact. Proof Let {G n }n≥1 be a countable open cover of X X=
∞
Gn.
n=1
Suppose X not countably compact, so {G n }n≥1 does not have a finite subcover, meaning for every m ≥ 1 m
G n ⊆ X.
n=1
The proof is by induction. Take xm 1 ∈ X such that xm 1 ∈ / m n=1 G n , and since {G n }n≥1 covers X , there is an m such that x ∈ G . Choose xm 2 ∈ X such that xm 2 ∈ / 1 m m 1 1 m 1 . Then there is an m such that x ∈ G . Then pick x ∈ X such that n 2 m m m 2 2 3 n=1 G 2 / m xm 3 ∈ n=1 G n , so x m 3 ∈ G m 3 for some index m 3 . At step k
m k−1
xm k ∈ G m k
and
xm k ∈ /
Gn.
n=1
We end up with a sequence {xm k }k≥1 in the sequentially compact space X , so there is a subsequence {xkn }n≥1 with some limit point x ∈ X . But {G n }n≥1 is a cover, so x belongs in some G n 0 . Furthermore, lim xkn = x
n→∞
forces xkn ∈ G n 0 for all n ≥ n 0 . But because of how the sequence was constructed, m kn −1 / G n 0 ⊆ n=1 G n . Hence there exists an m kn > n 0 such that xkn ∈ G m kn , and xkn ∈ no subsequence of {xm k }k≥1 can converge to x. But this means X is not sequentially compact which is a contradiction. Therefore X is countably compact. In metric spaces compactness can be characterised by means of sequences. Namely, Theorem A.0.2 (Characterisation of compact metric spaces in terms of sequential compactness) A metric space (X, d) is compact if and only if it is sequentially compact. Proof (⇒) Let (X, d) be compact, and let us prove it is sequentially compact. Given a compact subset E ⊆ X , suppose there is a sequence {xn }n≥1 none of whose subsequences converge in E. Any such must have infinite range, otherwise it would become eventually constant and hence converge in E. Nor can it accumulate at a point in E, otherwise this point would be the limit of a subsequence. Take x ∈ E. This is not a limit point for {xn }n≥1 , so there is an x > 0 such that at most one term of the sequence. the ball B(x, x ) contains Clearly B(x, x ) x∈E is an open cover of E, and by compactness we extract a finite subcover such that E⊆
n
B(xi , xi ).
i=1
n B(xi , xi ) there are at most n distinct points, But {xn }n≥1 ⊆ E, and since in i=1 we deduce the sequence can only attain finitely many values. Hence it must have a convergent subsequence, contradicting the hypothesis. (⇐) Consider a set of indices I and a family of open sets G = {G i }i∈I with Gi . X= i∈I
As X is sequentially compact, it is totally bounded (Proposition A.0.3), and by Proposition A.0.2 also separable. Call D = {x1 , . . . , xn , . . .} ⊆ X the countable dense subset. Using this we construct the countable collection of balls B = B(xn , q) : xn ∈ D, q ∈ Q . Take the subcollection BG = B(xn , q) ∈ B : ∃G i ∈ G such that B(xn , q) ⊆ G i . Clearly the latter is at most countable. At this point it is enough to show BG is an open cover of X . If so, in fact,by Proposition A.0.4 and Definition A.0.5 we can extract from it a finite subcover B(x1 , q1 ), . . . , B(xk , qk ) . Each B(x j , q j ), j = 1, . . . , k, is contained in some G i of the original cover G. Renaming these G j gives a finite subcollection {G j : 1 ≤ j ≤ k} of G such that X=
k
G j,
j=1
whence compactness follows. So take an x in X . As G covers X , x belongs to some G i . The latter is open, so for > 0 there is an open ball B(x, ) ⊆ G i . But D is dense in X , so there exists xn ∈ D such that d(x, xn ) < /2, i.e. x ∈ B(xn , /2). Moreover, for every y ∈ B(xn , /2) we have d(y, xn ) < /2, so d(y, x) ≤ d(y, xn ) + d(x, xn ) < from the triangle inequality. Hence x ∈ B(xn , /2) ⊆ B(x, ) ⊆ G i . Pick q ∈ Q such that /2 < q < , so that x ∈ B(xn , q) ⊆ B(x, ) ⊆ G i . As B(xn , q) has rational radius and centre xn ∈ D, it belongs to the family B. But B(xn , q) is a subset of G i , so it also belong to BG . In conclusion, any x ∈ X belongs to a ball of BG , making the latter an open countable cover of X . There is a second description of compactness in metric spaces in terms of completeness and total boundedness.
Theorem A.0.3 (Characterisation of compact metric spaces in terms of completeness and total boundedness) A metric space (X, d) is compact if and only if it is complete and totally bounded. Proof If X is compact, it is sequentially compact by what we said previously. Then X is complete and by Proposition A.0.3 it is totally bounded as well. Sufficient implication: suppose X complete, totally bounded and not compact, by contradiction. Hence it has an open cover B(xα , rα ) α∈I without finite subcovers. Total boundedness guarantees that for = 1 there exists a finite set {x11 , . . . , xn 1 } such that the (open or closed) balls B(xi1 , 1), i = 1, . . . , n, cover X .Call x1 the element in {x11 , . . . , xn 1 } with the property that no finite subcollection of B(xα , rα ) α∈I is a cover of B(x1 , 1). Such an element must exist, because if the balls of radius 1 centred at x11 , . . . , xn 1 possessed a finite subcover extracted from B(xα , rα ) α∈I then X would be compact. Similarly, choosing = 1/2 we let x2 be the element among x11/2 , . . . , xn 1/2 for which B(x2 , 1/2) ∩ B(x1 , 1) = ∅ and B(x2 , 1/2) is not covered by any finite subfamily of the open cover of X . This x2 exists, too: if all balls of radius 1/2 centred at x11/2 , . . . , xn 1/2 and intersecting B(x1 , 1) had a finite subcover of the open cover, then B(x1 , 1) would admit a finite subcover of the original family. By induction, for = 1/2m we call xm the element among x11/2m , . . . , xn 1/2m such that B(xm , 1/2m ) ∩ B(xm−1 , 1/2m−1 ) = ∅ and with the property that no finite subfamily of the open cover of X is a cover of B(xm , 1/2m ). Take x ∈ B(xm , 1/2m ) ∩ B(xm−1 , 1/2m−1 ), so the triangle inequality forces d(xm−1 , xm ) ≤ d(xm−1 , x) + d(xm , x) ≤
1 2m−1
+
1 1 ≤ m−2 , m 2 2
and for k < m d(xk , xm ) ≤ d(xk , xk+1 ) + d(xk+1 , xk+2 ) + . . . + d(xm−1 , xm ) 1 1 1 1 1 ≤ k−1 + k + k+1 + . . . + m−2 ≤ k−2 . 2 2 2 2 2 Hence {xi }i≥1 is a Cauchy sequence in the complete space X , so it converges to a point x ∈ X . Call B(xα0 , rα0 ) the ball of the open cover B(xα , rα ) α∈I that contains x. There exists an > 0 such that B(x, ) ⊆ B(xα0 , rα0 ). By construction, moreover, there is an m such that 1/2m < /2 and d(xm , x) < /2, so finally B(xm , 1/2m ) ⊆ B(x, ) ⊆ B(xα0 , rα0 ). But then one ball from the open cover is enough to cover B(xm , 1/2m ), contradicting the assumption.
Theorems A.0.2 and A.0.3 imply that a metric space is sequentially compact iff it is complete and totally bounded. Proposition A.0.5 (Compactness in complete metric spaces) In a complete metric space (X, d) a subset F is compact if and only if it is closed and totally bounded. Proof The ⇒ part is self-evident: F is compact (also sequentially) and it contains all of its limit points. Conversely, take a sequence {xn }n≥1 in F. By assumption F is totally bounded, so for every = 1/m, m ≥ 1, F can be covered by finitely many balls B(xn , 1/m) = {y ∈ X : d(xn , y) ≤ 1/m}, n = 1, . . . , k. Choosing = 1, let {x11 , . . . , xk1 } denote the subset of X such that the closed balls B(xi1 , 1) of radius 1 finitely cover F. Among these there is at least one that contains infinitely many terms of {xn }n≥1 ⊆ F, call it B1 = B(xn 1 , 1). Then define N1 = {n : xn ∈ B1 }, an infinite set. Pick n 1 ∈ N1 and = 1/2. Let {x12 , . . . , xk2 } be the finite subset of X for which the closed balls B(xi2 , 1/2) cover F. One of these, say B2 = B(xn 2 , 1/2), contains infinitely many terms of {xn }n≥1 ⊆ F. Define N2 = {n > n 1 : xn ∈ B1 ∩ B2 } and so forth. Eventually we obtain a subsequence {xni }i≥1 of {xn }n≥1 such that, for every n j ≥ n i , the terms xn j belong to Bi = B(xni , 1/i), and therefore {xni }i≥1 is Cauchy in Bi ⊆ F. By completeness {xn }n≥1 converges in X , and as a matter of fact in F, since F is closed. Thus we proved {xn }n≥1 ⊆ F has a convergent subsequence, and F is compact. The Heine–Pincherle–Borel theorem descends in a straightforward manner from the the above result. Theorem A.0.4 (Heine–Pincherle–Borel theorem for complete metric spaces) Let (X, d) be a complete metric space and F ⊆ X a closed and totally bounded subset. Any cover of open balls {B(xα , rα )}α∈I of F admits a finite subcover. At last, here is the version of Cantor’s intersection theorem for complete metric spaces.
Theorem A.0.5 (Cantor’s intersection theorem in complete metric spaces) In a complete metric space (X, d) the intersection of a decreasing sequence of non-empty, closed and totally bounded subsets is non-empty. Theorem A.0.6 ( Cardinality of perfect sets) In a complete metric space (X, d) any non-empty, perfect and totally bounded set P ⊆ X has the cardinality of the continuum. Proof Let P be a perfect, non-empty and totally bounded set in X . As any point in P is a limit point of P, the set must be infinite. Suppose P is countable. Take one point x1 and B(x1 , r1 ) = x ∈ X : d(x, x1 ) < r1 . As x1 is a limit point of P, B(x1 , r1 ) contains infinitely many points of P different ¯ 1 , r1 ) ∩ P = ∅, where B(x ¯ 1 , r1 ) = x ∈ X : d(x, x1 ) ≤ r1 . from x1 , so K 1 = B(x Among those pick x2 = x1 and let r2 > 0 be such that r2 < min d(x1 , x2 ), r1 − d(x1 , x2 ) . Call B(x2 , r2 ) = x ∈ X : d(x, x2 ) < r2 , ¯ 2 , r2 ) ⊆ B(x ¯ 1 , r1 ). Since d(x1 , x2 ) > r2 , then x1 ∈ ¯ 2 , r2 ). Furthermore, x2 so B(x / B(x is a limit point of P, so B(x2 , r2 ) contains infinitely many points of P other than x2 , ¯ 2 , r2 ) ∩ P = ∅. Among them pick x3 = x2 , and so forth. and K 2 = B(x ¯ n , rn ) ∩ P = ∅ conSuppose, by induction, to have built B(xn , rn ) such that B(x tains infinitely many points of P different from xn . Choose xn+1 = xn and take rn+1 < min d(xn , xn+1 ), rn − d(xn , xn+1 ) . Then B(xn+1 , rn+1 ) = x ∈ X : d(x, xn+1 ) < rn+1 and 1. 2. 3.
¯ n , rn ); ¯ n+1 , rn+1 ) ⊆ B(x B(x ¯ / B(xn+1 , rn+1 ), because d(xn , xn+1 ) > rn+1 ; xn ∈ ¯ n+1 , rn+1 ) ∩ P = ∅. K n+1 = B(x
The set in 3. satisfies the induction hypothesis, so we can proceed with the construction. / K n+1 , at the end of the process
∞Since xn ∈
∞none of the points of P belongs in K . Moreover K ⊆ P for every n, so n n=1 n n=1 K n = ∅. But {K n } is a decreasing sequence of non-empty compact sets (Proposition 2.1.4) and with empty intersection. This violates Theorem A.0.5, and the claim follows.
Appendix B
Urysohn’s Lemma and Tietze’s Theorem
In a metric space (X, d) any two non-empty, closed and disjoint sets A, B ⊆ X can be separated by open disjoint sets U, V ⊆ X . This is called normality property. Proving that metric spaces are normal is rather easy. Take in fact a ∈ A, b ∈ B and · d(a, B) > 0, rb = (1/3) · d(b, A) > 0, where d(a, B) = inf d(a, b) : ra = (1/3) b ∈ B and d(b, A) = inf d(b, a) : a ∈ A . Then B(a, ra ) and V = B(b, rb ) U= a∈A
b∈B
are open, and disjoint since if y ∈ B(a, ra ) ∩ B(b, rb ) (assuming ra ≥ rb without loss of generality), the triangle inequality would give d(a, b) ≤ d(a, y) + d(y, b) ≤ ra + rb ≤ 2ra , whilst by construction d(a, b) ≥ 3ra . We set out to prove a general fact concerning normal topological spaces—hence valid for metric spaces—called Urysohn lemma. The objective is to build a continuous function f on a normal space X mapping non-empty closed subsets A, B to 0, 1 respectively. To do this we need a very large family of open sets containing A that do not meet B, and then decide which value f takes of a given x ∈ X by looking at which open sets it belongs. Theorem B.0.1 (Urysohn’s lemma1 ) Let X be a normal space and A, B non-empty, closed, disjoint subsets. Call U the family of neighbourhoods of A that do not intersect 1 Pavel
Samuilovich Urysohn (1898–1924) was born in Odessa. He studied under D.F. Egorov and N.N. Lusin in Moscow, where he was awarded a Ph.D. in 1921. Urysohn was one of the most promising Soviet mathematicians of his generation at the time of his death, at the age of 25, in a tragic accident while swimming off the coast of Brittany. Urysohn’s lemma, albeit trivial in metric spaces, allows to generalise Tietze’s extension theorem to normal spaces.
B, and D the set of rationals of the form m/2n , m, n ∈ N and 0 ≤ m/2n ≤ 1. Then there exists a continuous map f : X → [0, 1] such that f (A) = 0, f (B) = 1, and 0 < f (x) < 1 if x ∈ X \ (A ∪ B). Proof The set D=
m m 1 1 3 1 3 5 7 : m, n ∈ N and 0 ≤ n ≤ 1 = 0, 1, , , , , , , , . . . , 2n 2 2 4 4 8 8 8 8
consists of the dyadic numbers in [0, 1]. For each q ∈ D take a neighbourhood Uq ⊆ X . Initially, choose U1 = X . As X is normal, there are open disjoint neighbourhoods U A , U B such that U A ∩ U B = ∅, so U¯ A ∩ B = ∅ and U¯ A ⊆ (X \ B) ⊆ U1 = X . Put U0 = U A , and by normality we have U¯ 0 ⊆ U1/2 ⊆ U¯ 1/2 ⊆ (X \ B) ⊆ U1 = X. Iterating in this way we can find U1/4 such that U¯ 0 ⊆ U1/4 ⊆ U¯ 1/4 ⊆ U1/2 . Analogously, we can find U3/4 satisfying a similar relationship with U1/2 and (X \ B), and so on. By induction we end up with 1. 2. 3.
A ⊆ Uq for every q ∈ D; B ⊆ U1 and B ∩ Uq = ∅ for every q < 1; U¯ p ⊆ Uq for every p < q in D.
Now define f : X → [0, 1] by f (x) = inf t ∈ D : x ∈ Ut . This map equals 0 on A (condition 1. above) and 1 on B (condition 2.). We need to show f is continuous, i.e. that the pre-image of any open subset in [0, 1] is open in X . Suppose V = (a, b) intersects [0, 1]. Given x ∈ f −1 (V ), clearly f (x) ∈ (a, b), so there exist rational numbers p, q such that a < p < f (x) < q < b. As p < f (x) and f (x) = inf t ∈ D : x ∈ Ut , there exists some p such that p < ¯ / U p , and since / U¯ p . p < f (x). But p < f (x) implies x ∈ x∈ U p ⊆ U p then On the other hand, f (x) < q and f (x) = inf t ∈ D : x ∈ Ut force x ∈ Uq . Combining these facts leads to x ∈ Uq \ U¯ p = U , which is open in X . We claim f (U ) ⊆ (a, b).
Pick y ∈ U , so by construction y ∈ Uq ⊆ U¯ q and by definition f (y) ≤ q < b. On the other hand by construction y ∈ / U¯ p ⊇ U p , and by definition f (y) ≥ p > a. Hence f (y) ∈ [ p, q] ⊆ (a, b). Clearly f : X → [0, 1] is 0 on A and 1 on B, and it is continuous on X , so for x ∈ X \ (A ∪ B) it must assume all values in (0, 1).
Example B.0.1 (Example of Urysohn function on a metric space) Take A, B nonempty, closed and disjoint in a metric space (X, d), and let us construct f : X → [0, 1] such that f (x) = 0 for every x ∈ A, f (x) = 1 for every x ∈ B and 0 < f (x) < 1 for every x ∈ X \ (A ∪ B). Define continuous maps (see Lemma 5.1.2) d(x, A) = inf d(x, a) : a ∈ A and d(x, B) = inf d(x, b) : b ∈ B . Clearly
d(x, A) =
= 0 if x ∈ A > 0 if x ∈ A
c
and d(x, B) =
= 0 if x ∈ B > 0 if x ∈ B c .
Since d(x, A) + d(x, B) is always positive and continuous, we introduce the continuous map f (x) =
d(x, A) . d(x, A) + d(x, B)
In practice ⎧ ⎨= 0 f (x) = 1 ⎩ ∈ (0, 1)
if x ∈ A if x ∈ B , if x ∈ X \ (A ∪ B)
furnishing the required function. Theorem B.0.2 (Tietze’s extension theorem2 ) Let X be a normal topological space and E a closed subset. Then 2 Heinrich
Tietze (1880–1964) was born in Schleinz, today’s Austria. He studied in Vienna, where he obtained his Habilitation in 1908. From 1910 until 1919 he was professor in Brno, during which time he proved the extension theorem for metric spaces. He spent the rest of his career teaching at universities in Erlangen and Munich.
1. a continuous map f : E → [c, d] extends to a continuous map g : X → [c, d]; 2. a continuous map f : E → R extends to a continuous map g : X → R. ‘Extends’ means f (x) = g(x) for every x ∈ E. Proof We start by 1. Replacing f with ( f − c)/(d − c) allows us to prove the statement with respect to the standard interval [0, 1]. Pick α ∈ (0, 1] and a continuous mapping f : E → [0, α]. Divide [0, α] in three subintervals of length (1/3)α: 1 1 2 2 α, α I3 = α, α . I1 = 0, α , I2 = 3 3 3 3 Now set
1 A = x ∈ E : f (x) ≤ α 3
2 and B = x ∈ E : f (x) ≥ α . 3
These are disjoint, and because f ∈ C 0 (E) (the class of continuous maps on E) they are closed in E, hence in X . By Urysohn’s lemma there exists a continuous function gˆ : X → [0, 1] mapping ˆ x ∈ X , is continuous and A to 0 and B to 1. Then g1 (x) = (α/3)g(x), g1 (x) = 0, g1 (x) = α3 ,
∀x ∈ A; ∀x ∈ B; g1 (x) ∈ 0, α3 , ∀x ∈ X \ (A ∪ B). Furthermore, for x ∈ E 2 1 0 ≤ f (x) − g1 (x) ≤ α − α = α. 3 3 When α = 1 we have g1 : E → [0, 1/3]. Set 2 ( f − g1 ) = f 1 : E → 0, . 3 Repeating the previous step for α = 2/3 gives the three subintervals 2 1 2 1 2 2 2 2 2 , I3 = I1 = 0, · , I2 = · , , , 3 3 3 3 3 3 3 and we redefine
1 2 A = x ∈ E : f 1 (x) ≤ · 3 3
2 2 . and B = x ∈ E : f 1 (x) ≥ 3
still closed and disjoint in X . The Urysohn lemma gives a new continuous map gˆ : X → [0, 1] sending A to 0 and B to 1. The continuous map g2 (x) = (1/3) · (2/3)g(x) ˆ satisfies g2 (x) = 0,
∀x ∈ A; ∀x ∈ B; · , g2 (x) = g2 (x) ∈ 0, 13 · 23 , ∀x ∈ X \ (A ∪ B); 1 3
2 3
and for every x ∈ E 2 2 0 ≤ [ f (x) − (g1 (x) + g2 (x))] ≤ . 3 Next, consider 2 2 . [ f − (g1 + g2 )] = f 2 : E → 0, 3 By induction, at the nth iteration we have a continuous map gn such that gn (x) = 0, ∀x ∈ A; n−1 , ∀x ∈ B; gn (x) = 13 23 n−1 , ∀x ∈ X \ (A ∪ B); gn (x) ∈ 0, 13 · 23 and n 2 0 ≤ f (x) − gi (x) ≤ 3 i=1 n
for every x ∈ E. Taking the limit as n → ∞ we finally obtain g(x) =
∞
gi (x), x ∈ E.
i=1
The above turns into ∞the required extension g : X → [0, 1], provided we show it gi (x) converges uniformly on X , and that g(x) = f (x) for is continuous, i.e. i=1 every x ∈ E.
Define Sn (x) = g1 (x) + g2 (x) + . . . + gn (x). For any n ∈ N the partial sum Sn is continuous, and whenever m > n m m |Sm (x) − Sn (x)| = gk (x) ≤ |gk (x)| k=n+1 m
k=n+1
n m k−1 2 1 1 23 − 23 ≤ = · 3 k=n+1 3 3 1 − 23 n m n 2 2 2 = − ≤ 3 3 3
for every x ∈ X . Hence {Sn }n≥1 is a Cauchy sequence. Taking the limit as m → ∞, we know lim Sm (x) = g(x)
m→∞
so the above inequality reads n 2 |g(x) − Sn (x)| ≤ , x ∈ X. 3 Therefore {Sn }n≥1 converges uniformly on X to g(x), whence the continuity of g follows. Furthermore g(x) is bounded, since ∞ 1 1 2 i−1 1 gi (x) ≤ = · 0 ≤ g(x) = 3 3 3 1 − i=1 i=1 ∞
2 3
= 1.
At last, given x ∈ E, by construction n 2 . f (x ) − Sn (x ) = f (x ) − g1 (x ) + . . . + gn (x ) ≤ 3
As n → ∞ we obtain f (x ) − g(x ) = 0, so f (x) = g(x) for every x ∈ E. Proof of 2. Now we have a continuous extension g : X → [0, 1] of f : E → [0, 1]. Since R is homeomorphic to (0, 1), it will suffice to exhibit a continuous extension φ of f with image φ(X ) ⊆ (0, 1). Notice that by replacing f with (2 f − 1) and g with (2g − 1) we can prove 2. for the interval [−1, 1]. So consider F = g −1 ({−1, 1}) ⊆ X.
Clearly F is closed, and disjoint from E because g(E) ⊆ (−1, 1). Urysohn’s lemma guarantees there exists a continuous function h : X → [0, 1] such that h(F) = {0} and h(E) = {1}. The continuous map φ = h · g extends f , since for every x ∈ E φ(x) = h(x) · g(x) = g(x) = f (x); its image lies in (−1, 1), because for x ∈ F φ(x) = g(x) · h(x) = 0; and when x ∈ /F |φ(x)| = |g(x) · h(x)| ≤ |g(x)| < 1.
List of Figures
2.1 Table of rational numbers … 26
2.2 Construction of Volterra's set – the first three iterations … 34
3.1 Hierarchy of Borel classes … 76
IV.1 Construction of the Riemann integral of a function f(x) … 152
IV.2 Graph of y = (x) … 153
IV.3 Graph of Riemann's function R(x) for n = 2 … 154
IV.4 Graph of Riemann's function R(x) for n = 10 … 154
IV.5 Graph of f(x) = x² sin(1/x) as x → 0⁺ … 159
IV.6 Graph of f_{a,b}(x), part of Volterra's function … 160
IV.7 Graph of the Volterra function … 161
10.1 Graphs of y = min(x, x²) and y = max(x, x²) … 251
10.2 Graph of the Vitali–Cantor map … 271
12.1 Example of continuous extension … 332
12.2 Graph of f(x) = e^(−1/x) on x > 0 and f(x) = 0 on x ≤ 0 … 333
12.3 Graph of F(x) = f(x)/(f(x) + f(1 − x)) … 336
12.4 Graph of G(x) = g(x)/(g(x) + g(−1 − x)) … 336
12.5 Graph of bump function φ_n … 337
14.1 Density and distribution functions of a uniform RV U_{[a,b]} … 421
14.2 Density function of a negative-exponential RV … 423
14.3 Density function of a normal RV … 424
List of Symbols
A⊂B A⊆B A(C) A+ B(n, p) B(R) B(A) C c ℵ0 ϕA I 01 0α 0β α0 β0 M0+ (X, S) 10 A pj (I) I˜ M0 (X, S) N ¯ B(x, r) A¯
A is a proper subset of B, 19 A is a subset of B, 4 Algebra generated by a family C, 124 Algebra generated by pairwise-disjoint sets of a semi-algebra, 125 Binomial random variable of order n and parameter p, 413 Borel σ-algebra over R, 142 Boundary of a set A, 22 Cantor set, 40 Cardinality of the continuum, 27 Cardinality of the set N, 27 Characteristic function of a set A, 67 Class of all real intervals, 122 Class of closed subsets of R, 56 Class of countable intersections of sets in β