Measure-Theoretic Calculus in Abstract Spaces - On the Playground of Infinite-Dimensional Spaces [1 ed.] 9783031219115, 9783031219122

This monograph provides a rigorous, encyclopedic treatment of the fundamental topics in real analysis, functional analysis, …


English Pages 933 [951] Year 2024


Table of contents :
Preface
Contents
List of Figures
Notations
1 Introduction
1.1 The Tour of the Book
1.2 How to Use the Book
1.3 What This Book Does Not Include
2 Set Theory
2.1 Axiomatic Foundations of Set Theory
2.2 Relations and Equivalence
2.3 Function
2.4 Set Operations
2.5 Algebra of Sets
2.6 Partial Ordering and Total Ordering
2.7 Basic Principles
3 Topological Spaces
3.1 Fundamental Notions
3.2 Continuity
3.3 Basis and Countability
3.4 Products of Topological Spaces
3.5 The Separation Axioms
3.6 Category Theory
3.7 Connectedness
3.8 Continuous Real-Valued Functions
3.9 Nets and Convergence
4 Metric Spaces
4.1 Fundamental Notions
4.2 Convergence and Completeness
4.3 Uniform Continuity and Uniformity
4.4 Product Metric Spaces
4.5 Subspaces
4.6 Baire Category
4.7 Completion of Metric Spaces
4.8 Metrization of Topological Spaces
4.9 Interchange Limits
5 Compact and Locally Compact Spaces
5.1 Compact Spaces
5.2 Countable and Sequential Compactness
5.3 Real-Valued Functions and Compactness
5.4 Compactness in Metric Spaces
5.5 The Ascoli–Arzelà Theorem
5.6 Product Spaces
5.7 Locally Compact Spaces
5.7.1 Fundamental Notion
5.7.2 Partition of Unity
5.7.3 The Alexandroff One-point Compactification
5.7.4 Proper Functions
5.8 σ-Compact Spaces
5.9 Paracompact Spaces
5.10 The Stone–Čech Compactification
6 Vector Spaces
6.1 Group
6.2 Ring
6.3 Field
6.4 Vector Spaces
6.5 Product Spaces
6.6 Subspaces
6.7 Convex Sets
6.8 Linear Independence and Dimensions
7 Banach Spaces
7.1 Normed Linear Spaces
7.2 The Natural Metric
7.3 Product Spaces
7.4 Banach Spaces
7.5 Compactness
7.6 Quotient Spaces
7.7 The Stone–Weierstrass Theorem
7.8 Linear Operators
7.9 Dual Spaces
7.9.1 Basic Concepts
7.9.2 Duals of Some Common Banach Spaces
7.9.3 Extension Form of Hahn–Banach Theorem
7.9.4 Second Dual Space
7.9.5 Alignment and Orthogonal Complements
7.10 The Open Mapping Theorem
7.11 The Adjoints of Linear Operators
7.12 Weak Topology
8 Global Theory of Optimization
8.1 Hyperplanes and Convex Sets
8.2 Geometric Form of Hahn–Banach Theorem
8.3 Duality in Minimum Norm Problems
8.4 Convex and Concave Functionals
8.5 Conjugate Convex Functionals
8.6 Fenchel Duality Theorem
8.7 Positive Cones and Convex Mappings
8.8 Lagrange Multipliers
9 Differentiation in Banach Spaces
9.1 Fundamental Notion
9.2 The Derivatives of Some Common Functions
9.3 Chain Rule and Mean Value Theorem
9.4 Higher Order Derivatives
9.4.1 Basic Concept
9.4.2 Interchange Order of Differentiation
9.4.3 High Order Derivatives of Some Common Functions
9.4.4 Properties of High Order Derivatives
9.5 Mapping Theorems
9.6 Global Inverse Function Theorem
9.7 Interchange Differentiation and Limit
9.8 Tensor Algebra
9.9 Analytic Functions
9.10 Newton's Method
10 Local Theory of Optimization
10.1 Basic Notion
10.2 Unconstrained Optimization
10.3 Optimization with Equality Constraints
10.4 Inequality Constraints
11 General Measure and Integration
11.1 Measure Spaces
11.2 Outer Measure and the Extension Theorem
11.3 Measurable Functions
11.4 Integration
11.5 General Convergence Theorems
11.6 Banach Space Valued Measures
11.7 Calculation with Measures
11.8 The Radon–Nikodym Theorem
11.9 Lp Spaces
11.10 Dual of C(X,Y) and Cc(X,Y)
12 Differentiation and Integration
12.1 Carathéodory Extension Theorem
12.2 Change of Variable
12.3 Product Measure
12.4 Functions of Bounded Variation
12.5 Absolute and Lipschitz Continuity
12.6 Fundamental Theorem of Calculus
12.7 Representation of (Ck(Ω,Y))*
12.8 Sobolev Spaces
12.9 Integral Depending on a Parameter
12.10 Iterated Integrals
12.11 Manifold
12.11.1 Basic Notion
12.11.2 Tangent Vectors
12.11.3 Vector Fields
13 Hilbert Spaces
13.1 Fundamental Notions
13.2 Projection Theorems
13.3 Dual of Hilbert Spaces
13.4 Hermitian Adjoints
13.5 Approximation in Hilbert Spaces
13.6 Other Minimum Norm Problems
13.7 Positive Definite Operators on Hilbert Spaces
13.8 Pseudoinverse Operator
13.9 Spectral Theory of Linear Operators
14 Probability Theory
14.1 Fundamental Notions
14.2 Gaussian Random Variables and Vectors
14.3 Law of Large Numbers
14.4 Martingales Indexed by Z+
14.5 Banach Space Valued Martingales Indexed by Z+
14.6 Characteristic Functions
14.7 Convergence in Distribution
14.8 Central Limit Theorem
14.9 Uniform Integrability and Martingales
14.10 Existence of the Wiener Process
14.11 Martingales with General Index Set
14.12 Stochastic Integral
14.13 Itô Processes
14.14 Girsanov's Theorem
A Elements in Calculus
A.1 Some Formulas
A.2 Convergence of Infinite Sequences
A.3 Riemann-Stieltjes Integral
Bibliography
Index

Zigang Pan

Measure-Theoretic Calculus in Abstract Spaces: On the Playground of Infinite-Dimensional Spaces

Zigang Pan
Mason, OH, USA

ISBN 978-3-031-21911-5    ISBN 978-3-031-21912-2 (eBook)
https://doi.org/10.1007/978-3-031-21912-2

Mathematics Subject Classification: 28-02, 28B05, 28C15, 28C20, 46-02, 46G05, 46G10

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This book is published under the imprint Birkhäuser, www.birkhauser-science.com, by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Preface

This book started out as reading notes on Royden (1988) and on the MATH 441 & 442 notes of Professor Peter Loeb of the University of Illinois at Urbana-Champaign. I was inspired by Luenberger (1969), with its wide applicability of functional analysis tools and its unification of classical results in decision theory, optimization theory, control theory, and numerical approximation. When I took the course MATH 480 with Professor Sean Meyn at the University of Illinois at Urbana-Champaign, using Luenberger (1969) as the textbook, I was awestruck by the clean and concise theorems and the tight and elegant arguments that lead to their conclusions. My background is in control theory, and I found that the tools in Luenberger (1969) require deep knowledge of the Lebesgue integral, and therefore of measure theory. Thus, I started with Royden (1988) to study measure theory and general integration, with the goal of studying measure-theoretic calculus. Another objective of mine was to set a solid foundation for my control theory research. So began the decade-and-a-half-long study that has led to this book. During this study, I consulted a number of books that were the guiding lights shining upon my way. The foundation of mathematics is set theory, and I first found an axiomatic foundation for set theory in my high-school award book, Mathematics Handbook Editors Group (1979). The book Suppes (1972) cleared up most of the paradoxes in my mind about set theory. Professor Loeb's notes are excellent in their characterization of the equivalence between the Axiom of Choice, Zorn's Lemma, Hausdorff's Maximum Principle, and the Well-Ordering Principle. After the completion of the book, I reread the classic Halmos (1960) and found it to be very agreeable, which was not the case when I was writing Chap. 2 back then.
The concept of nets, as a generalization of sequences, is an essential tool for dealing with various convergence concepts, especially for Riemann and Lebesgue integrals. In Chap. 6, I have included some material from the book Maunder (1996). Chapters 6–10 include a significant amount of material from the book Luenberger (1969). Alaoglu's Theorem 7.122 in Chap. 7 is, in my view, the key to infinite-dimensional systems. Chapter 9 also references Bartle (1976). The book Bartle (1976) is solid in its mathematical rigor and served as the starting point of this endeavor. The Global Inverse Function Theorem in Sect. 9.6 is inspired by Ambrosetti and Prodi (1993).

Section 9.9 provides all the essential theorems on analytic functions. Chapter 11 includes a significant amount of self-developed material, owing to the lack of references on this subject. The proof of the Radon–Nikodym Theorem 11.169 was adapted from the MATH 442 notes of Professor Peck of the University of Illinois at Urbana-Champaign. The book Royden (1988) offers clues on how to invent the wheel. The calculation of measures (see Sect. 11.7) is a main contribution of this book. As I understand from this study, the definition of a measure, the calculation of new measures based on already defined measures, and working with the cumulative distribution functions of these measures are essential characteristics of the application of measure theory. In Sect. 11.10, I finally resolve the dual of the space of continuous functions from a compact space to a Banach space. I further correctly define the space Cc(X, Y) and find its dual. This result sets the foundation for the concept of convergence in distribution as weak* convergence in the dual of Cc(X, Y). I could further define Cc(X, Y) for X an open subset of a finite-dimensional Banach space; I chose to stop there and leave this as an exercise for the reader. In Chap. 12, differentiation and integration finally come together. Tonelli's Theorem is standard. Fubini's Theorem comes in two flavors: one for general (separable) Banach space valued integrands, and the other for σ-compact conic segment valued integrands. The latter is as simple as it can get: for absolutely integrable functions on the product measure space, the integral can be calculated as repeated integrals, as long as one sets the intermediate integral to the null vector whenever the intermediate integral is not absolutely integrable. This has direct application to finite-dimensional Banach space valued integrands. The concept of isomeasure is a great discovery. It allows me to equate integrations on different measurable spaces.
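The iterated-integral convention of the simpler Fubini flavor described above can be sketched in symbols. The following is a paraphrase, not the book's exact statement: f : X × Y → Z is absolutely integrable over the product measure μ × ν, and ϑ_Z denotes the null vector of Z.

```latex
\int_{X\times Y} f \,\mathrm{d}(\mu\times\nu)
  \;=\; \int_X \Big(\int_Y f(x,y)\,\mathrm{d}\nu(y)\Big)\,\mathrm{d}\mu(x),
\qquad\text{with the convention}\quad
\int_Y f(x,y)\,\mathrm{d}\nu(y) \;:=\; \vartheta_Z
\ \text{ whenever }\ \int_Y \|f(x,y)\|\,\mathrm{d}\nu(y) = \infty .
```

That is, any non-absolutely-integrable intermediate integral is simply set to the null vector, and the repeated integral still recovers the product-space integral.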
The isomeasure concept has direct application in the Change of Variable Theorem 12.91. There, all I need to do is show that a certain induced measure is absolutely continuous with respect to the standard Borel measure on R^m. The connection between the Riemann–Stieltjes integral and the Lebesgue–Stieltjes integral is established in Proposition 12.100 for the real-valued integrand case, and in Theorem A.10 for the general Banach space valued integrand case. I have attempted to define Sobolev spaces in a correct fashion, which differs from that of Zeidler (1995) when the dimension of the domain of the function is greater than 1. I feel that my definition is the correct way to go, owing to the characterization of absolutely continuous functions on multi-dimensional domains in the Fundamental Theorem of Calculus II (Theorem 12.88). Also, standard results on integrals depending on a parameter (Bartle, 1976) are generalized to a more general Banach space setting in Theorems 12.111, 12.112, 12.115, and 12.116. Here, it is observed that the best practice is to stack the measurable part of the integrand in a function inside the integrand; then the assumptions on the integrand are easily formulated, and the results are similar to Bartle (1976). The chapter ends with Sect. 12.11 on the topic of manifolds. In this section, I study manifolds whose local charts are modeled on some Banach space over the field K. This topic is included because of the need to discuss the complex logarithm in the proof of the Central Limit Theorem 14.63; for a rigorous presentation, one has to make a trip to manifolds. These results are motivated by Isidori (1995), which has a great introduction to differential manifolds in its Appendix A, and by Bishop and Goldberg (1980), which has a proof of the representation of tangent vectors of a finite-dimensional manifold.

In Chap. 13, the standard projection theorems are established, as well as the dual of a Hilbert space. Examples of complete orthonormal sequences are obtained. The Hermitian adjoint of a linear operator is introduced, which then leads to the definition of positive definite and positive semi-definite operators (including complex versions). Standard results on the pseudoinverse of a linear operator between Hilbert spaces are proved. Then, the spectral theory of compact Hermitian linear operators is established. The singular value decomposition is obtained for all compact linear operators between Hilbert spaces. In Chap. 14, standard results in probability theory are obtained. I consulted my MATH 451 & 452 notes from Professor Burkholder of the University of Illinois at Urbana-Champaign, as well as Stark and Woods (1986), Williams (1991), and Billingsley (1995). Professor Burkholder's classes were particularly helpful. The textbook used in the classes was Williams (1991), which tells one everything one needs to know about conditional expectations, which I generalize to Banach spaces in Proposition 14.11. Professor Burkholder then led us through the definition of the Wiener process, stochastic integration, and Markov processes. These topics would have been beyond my reach if not for MATH 452. After a few iterations, I finally obtained an elegant version of Itô's Formula (Theorem 14.122). It applies in infinite-dimensional spaces as well, but works like a charm in finite-dimensional spaces. Also, I proved Girsanov's Theorem 14.123 in April 2020, after consulting Gihman and Skorohod (1972), Stroock and Varadhan (1979), and Elliott (1982), and thus completed this decade-and-a-half-long study.
After I took my MATH 441 & 442 classes on real analysis, based on the book Royden (1988), at the University of Illinois at Urbana-Champaign, there was something fuzzy about the theory for me. I was confident that the results were correct, but the arbitrary assignment 0 × ∞ = 0 hung like a dark cloud on the horizon, and the lack of a Change of Variable Theorem in the theory for multi-dimensional measure spaces left me wanting more results in this field. I did not bother then to study measure theory myself, and pressed on with my PhD study in control theory without solidifying my background. That proved to be a major mistake. After I lost my teaching job for good, I finally settled down to study the basics that underpin my past research. My wife got a secure job and I became an at-home dad. I had all the time in the world. I really did not want to waste my brain away, and pressed on with research to keep my mind sharp and my resumé competitive. God put the thought in me that I could work on foundational mathematics rather than waving my hands at some advanced control problems. I could do both, but the urge to stand on my own feet finally won over my attention. I put down some fruitful control theory research and focused myself on the foundational mathematics that underpins my earlier research. I expected that it might take a few years, but it actually started me on this decade-and-a-half-long pursuit. In retrospect, I am happy with my decision to work in mathematics, a subject that has fascinated me since my youth. The end result is very satisfying for me, and I look back at this journey as something that I can be proud of. As a consequence of this study, I find that my past research on linear exponential-of-quadratic Gaussian control is correct, since I finally proved Girsanov's Theorem 14.123. My research in nonlinear stochastic systems can only be regarded as pointers to what could be done in those directions, rather than solid theory; key integrability assumptions are missing in that research. My research on jump linear systems is correct entirely.

I want to thank my Uncle Desheng Pan and Aunt Deyin Pan for my upbringing. Uncle Sheng gave me a rigorous mathematical mind when, at age 10, he drilled me in Euclidean geometry. I was not happy then, but it has given me a sharp mind ever since. All those classes and lectures in later life were worth much more to me than to any other person, since I constantly think about the "why" behind everything I learn. Aunt Yin and Uncle Sheng shouldered most of life's burdens when I was in China and gave me and my sister the freedom to study and excel. I thank Aunt Yin especially for all those chores that she did for us, all those clothes that she washed without a washing machine, and all those meals she cooked. One has to remember that she did all this in addition to a six-day-per-week job. I want to express my gratitude to the University of Illinois at Urbana-Champaign, which gave me a world-class education. The friendships I formed during my graduate study will last a lifetime. Most importantly, I am grateful to Professor Tamer Başar of the University of Illinois for his friendship throughout my adult life, first as my advisor and later as my friend. I am also grateful for his time and effort in reading and editing the Preface and Chap. 1 of this book, which resulted in a much more readable book overall. I am grateful to God for showing me the right ways of life at many instances, so that I could correct myself in many things and become an independent scholar, finish this foundational book, and work on many interesting and important adaptive control research problems along this journey of life.
God saw the best in me and provided me with a stable and delightful environment, the charming city of Mason, OH, in which I could work toward my goal. With God's gentle persuasion, I stayed on the correct path through this book-writing endeavor. God, the Almighty, is my guiding light and my rock. I am forever grateful for God's love. I want to thank Professor Wen-ben Jone of the University of Cincinnati for remaining my friend after I lost my last job. I want to thank Professor Gang Tao of the University of Virginia for helpful comments in reaching the current form of the book. I am grateful to the Birkhäuser (Springer Nature) staff for all of their effort and help in bringing this book to publication: Editors Mr. Thomas Hempfling, Ms. Sarah Annette Goob, Mrs. Dana Knowles, and Mr. Christopher Tominich; TeX Support Administrator Suresh Kumar; Project Coordinator Mr. Daniel Ignatius Jagadisan; and Production Contact Ms. Antje Endemann; as well as Production Manager Mrs. Kali Gayathri and Mr. Periyanayagam Leoselvakumar at Straive. I want to thank my wife for all the help she gave me. I also want to thank my parents for their help over the years, and my daughter for her patience (or lack of it) with dad's preachings.

Mason, OH, USA
July 2023

Zigang Pan

Contents

1

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 1.1 The Tour of the Book . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 1.2 How to Use the Book . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 1.3 What This Book Does Not Include . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

1 1 24 25

2

Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 2.1 Axiomatic Foundations of Set Theory . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 2.2 Relations and Equivalence.. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 2.3 Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 2.4 Set Operations.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 2.5 Algebra of Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 2.6 Partial Ordering and Total Ordering . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 2.7 Basic Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

27 27 28 29 30 31 32 34

3

Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.1 Fundamental Notions . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.2 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.3 Basis and Countability .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.4 Products of Topological Spaces . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.5 The Separation Axioms .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.6 Category Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.7 Connectedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.8 Continuous Real-Valued Functions . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.9 Nets and Convergence . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

41 41 44 46 48 52 53 56 60 64

4

Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.1 Fundamental Notions . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.2 Convergence and Completeness .. . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.3 Uniform Continuity and Uniformity . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.4 Product Metric Spaces .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.5 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.6 Baire Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

79 79 81 84 86 91 92 ix

x

Contents

4.7 4.8 4.9

Completion of Metric Spaces . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . Metrization of Topological Spaces . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . Interchange Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

93 98 99

5

Compact and Locally Compact Spaces . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.1 Compact Spaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.2 Countable and Sequential Compactness . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.3 Real-Valued Functions and Compactness.. . . . .. . . . . . . . . . . . . . . . . . . . 5.4 Compactness in Metric Spaces .. . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.5 The Ascoli–Arzelá Theorem . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.6 Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.7 Locally Compact Spaces . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.7.1 Fundamental Notion . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.7.2 Partition of Unity . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.7.3 The Alexandroff One-point Compactification .. . . . . . . . . 5.7.4 Proper Functions .. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.8 σ -Compact Spaces .. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.9 Paracompact Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . ˇ 5.10 The Stone–Cech Compactification .. . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

105 105 110 112 114 117 118 121 121 123 126 128 129 131 134

6

Vector Spaces .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.1 Group .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.2 Ring.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.3 Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.4 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.5 Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.6 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.7 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.8 Linear Independence and Dimensions . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

137 137 139 139 140 142 143 146 149

7

Banach Spaces .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.1 Normed Linear Spaces.. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.2 The Natural Metric. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.3 Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.4 Banach Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.5 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.6 Quotient Spaces .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.7 The Stone-Weierstrass Theorem . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.8 Linear Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.9 Dual Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.9.1 Basic Concepts .. . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.9.2 Duals of Some Common Banach Spaces . . . . . . . . . . . . . . . 7.9.3 Extension Form of Hahn–Banach Theorem . . . . . . . . . . . . 7.9.4 Second Dual Space . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.9.5 Alignment and Orthogonal Complements . . . . . . . . . . . . . . 7.10 The Open Mapping Theorem.. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

153 153 160 162 164 169 171 173 180 185 185 185 190 197 199 204

Contents

7.11 7.12

xi

The Adjoints of Linear Operators .. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 208 Weak Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 211

8

Global Theory of Optimization . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.1 Hyperplanes and Convex Sets . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.2 Geometric Form of Hahn–Banach Theorem.. .. . . . . . . . . . . . . . . . . . . . 8.3 Duality in Minimum Norm Problems .. . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.4 Convex and Concave Functionals .. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.5 Conjugate Convex Functionals .. . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.6 Fenchel Duality Theorem.. . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.7 Positive Cones and Convex Mappings .. . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.8 Lagrange Multipliers.. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

221 221 224 226 229 233 240 247 249

9

Differentiation in Banach Spaces . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.1 Fundamental Notion . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.2 The Derivatives of Some Common Functions .. . . . . . . . . . . . . . . . . . . . 9.3 Chain Rule and Mean Value Theorem . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.4 Higher Order Derivatives .. . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.4.1 Basic Concept .. . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.4.2 Interchange Order of Differentiation.. . . . . . . . . . . . . . . . . . . 9.4.3 High Order Derivatives of Some Common Functions .. . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.4.4 Properties of High Order Derivatives . . . . . . . . . . . . . . . . . . . 9.5 Mapping Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.6 Global Inverse Function Theorem.. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.7 Interchange Differentiation and Limit . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.8 Tensor Algebra .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.9 Analytic Functions .. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.10 Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .

257 257 262 265 271 271 278

10 Local Theory of Optimization .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 10.1 Basic Notion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 10.2 Unconstrained Optimization.. . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 10.3 Optimization with Equality Constraints . . . . . . .. . . . . . . . . . . . . . . . . . . . 10.4 Inequality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .


11 General Measure and Integration
11.1 Measure Spaces
11.2 Outer Measure and the Extension Theorem
11.3 Measurable Functions
11.4 Integration
11.5 General Convergence Theorems
11.6 Banach Space Valued Measures
11.7 Calculation with Measures
11.8 The Radon–Nikodym Theorem
11.9 Lp Spaces
11.10 Dual of C(X, Y) and Cc(X, Y)





12 Differentiation and Integration
12.1 Carathéodory Extension Theorem
12.2 Change of Variable
12.3 Product Measure
12.4 Functions of Bounded Variation
12.5 Absolute and Lipschitz Continuity
12.6 Fundamental Theorem of Calculus
12.7 Representation of (Ck(Ω, Y))∗
12.8 Sobolev Spaces
12.9 Integral Depending on a Parameter
12.10 Iterated Integrals
12.11 Manifold
12.11.1 Basic Notion
12.11.2 Tangent Vectors
12.11.3 Vector Fields


13 Hilbert Spaces
13.1 Fundamental Notions
13.2 Projection Theorems
13.3 Dual of Hilbert Spaces
13.4 Hermitian Adjoints
13.5 Approximation in Hilbert Spaces
13.6 Other Minimum Norm Problems
13.7 Positive Definite Operators on Hilbert Spaces
13.8 Pseudoinverse Operator
13.9 Spectral Theory of Linear Operators


14 Probability Theory
14.1 Fundamental Notions
14.2 Gaussian Random Variables and Vectors
14.3 Law of Large Numbers
14.4 Martingales Indexed by Z+
14.5 Banach Space Valued Martingales Indexed by Z+
14.6 Characteristic Functions
14.7 Convergence in Distribution
14.8 Central Limit Theorem
14.9 Uniform Integrability and Martingales
14.10 Existence of the Wiener Process
14.11 Martingales with General Index Set
14.12 Stochastic Integral
14.13 Itô Processes
14.14 Girsanov's Theorem



A Elements in Calculus
A.1 Some Formulas
A.2 Convergence of Infinite Sequences
A.3 Riemann-Stieltjes Integral



Bibliography
Index

List of Figures

Fig. 6.1 The sum of two sets
Fig. 6.2 Convex and nonconvex sets
Fig. 6.3 Convex hulls
Fig. 6.4 Cones
Fig. 7.1 Sequence for Example 7.28
Fig. 7.2 Modes of convergence in Y = X∗
Fig. 8.1 Fenchel duality
Fig. 11.1 Modes of convergence for functions from X
Fig. 13.1 Projection onto a subspace
Fig. 13.2 Relationship of a Hilbert space H and its dual H∗ and H as Banach space and its dual H∗
Fig. 13.3 Dual projection problems
Fig. 13.4 Reformulation of the Projection Theorem
Fig. 13.5 Projection to a convex set
Fig. 14.1 A typical cumulative distribution function Fn and the corresponding random variables xn+ and xn−
Fig. 14.2 The first eight Haar functions


Notations

N, Z, Q : the sets of natural numbers, integers, and rational numbers, respectively
R and C : the sets of real numbers and complex numbers, respectively
K : either R or C
Z+, Z− : N ∪ {0} and Z \ N, respectively
R+, R−, C+, C− : (0, ∞) ⊂ R, (−∞, 0) ⊂ R, the open right half of the complex plane, and the open left half of the complex plane, respectively
R̄+, R̄−, C̄+, C̄− : [0, ∞) ⊂ R, (−∞, 0] ⊂ R, the closed right half of the complex plane, and the closed left half of the complex plane, respectively
C0 : C \ {0}
∈ : belong to
∉ : not belong to
⊆ : contained in
⊇ : contains
⊂ : strict subset of
⊃ : strict superset of
∀ : for all
∃ : exists
∃! : exists a unique
∵ : because
∴ : therefore
∍ : such that
(xn)∞n=1 : the sequence x1, x2, . . .
(xα)α∈Λ : the ordered collection
idA : the identity map on a set A


e : the natural number
i : the complex unit
|λ| : the absolute value of a real or complex number λ
λ̄ : the complex conjugate of a complex number λ
Re(λ) : the real part of a complex number λ
Im(λ) : the imaginary part of a complex number λ
∠λ : the phase angle of a complex number λ
a ∨ b : the maximum of two real numbers a and b
a ∧ b : the minimum of two real numbers a and b
sgn(x) : sgn(x) = −1 if x < 0, 0 if x = 0, and 1 if x > 0, ∀x ∈ R
π : the ratio of the circumference of a circle to its diameter
∅ : the empty set; See Page 27.
{x, y} : an unordered pair; See Page 27.
2^X : the collection of all subsets of X; See Page 27.
∪ : the set union; See Page 27.
(x, y) : an ordered pair; See Page 28.
X × Y : the Cartesian or direct product of sets X and Y; See Page 28.
A = {x ∈ B | P(x)} : definition of a set A; See Page 28.
x ∼ y : x and y are related in a relation; See Page 28.
X/≡ : the quotient of the set X with respect to an equivalence relation ≡; See Page 28.
f : X → Y : a function of X to Y; {(x, f(x)) ∈ X × Y | ∀x ∈ X} is the graph of f; See Page 29.
graph(f) : the graph of a function f; See Page 29.
dom(f) : the domain of f; See Page 29.
f(A) : the image of A ⊆ X under f; See Page 29.
range(f) : the range of f, equal to f(X); See Page 29.
finv(A) : the preimage of A ⊆ Y under f; See Page 29.
onto, surjective : f(X) = Y; See Page 29.
1-1, injective : f(x1) ≠ f(x2) if x1, x2 ∈ X and x1 ≠ x2; See Page 29.
bijective : both surjective and injective; See Page 29.
finv : the inverse function of f; See Page 29.
g ∘ f : the composition of g : Y → Z with f : X → Y; See Page 29.
f|A : the restriction of f to A; See Page 29.
Y^X : the set of all functions of X to Y; See Page 29.
∩ : the set intersection; See Page 30.

∼A : the complement of a set A, where the whole set is clear from context; See Page 30.
\ : set minus; See Page 30.
A △ B : the symmetric difference of A and B, equal to (A \ B) ∪ (B \ A); See Page 30.
card(X) : the number of elements in the finite set X; See Page 31.
∏α∈Λ Xα : the Cartesian or direct product of the Xα's; See Page 40.
πα(x) : the projection of an element in a Cartesian product space to one of the coordinates; See Page 40.
Ā : the closure of a set A, where the whole set is clear from context; See Page 41.
A° : the interior of a set A, where the whole set is clear from context; See Page 41.
∂A : the boundary of a set A, where the whole set is clear from context; See Page 41.
∏α∈Λ (Xα, Oα) : the product topological space; See Page 48.
T1 : Tychonoff space; See Page 52.
T2 : Hausdorff space; See Page 52.
T3 : regular space; See Page 53.
T4 : normal space; See Page 53.
T3½ : completely regular space; See Page 63.
(xα)α∈A : a net; See Page 64.
limα∈A xα : the limit of a net; See Page 64.
limx→x0 f(x) : the limit of f(x) as x → x0; See Page 68.
R̄e : the set of extended real numbers, which equals R ∪ {−∞, +∞}; See Page 71.
lim supα∈A xα : the limit superior of a real-valued net; See Page 71.
lim infα∈A xα : the limit inferior of a real-valued net; See Page 71.
lim supx→x0 f(x) : the limit superior of f(x) as x → x0; See Page 73.
lim infx→x0 f(x) : the limit inferior of f(x) as x → x0; See Page 73.
BX(x0, r) : the open ball centered at x0 with radius r; See Page 79.
B̄X(x0, r) : the closed ball centered at x0 with radius r; See Page 79.
dist(x0, S) : the distance from a point x0 to a set S in a metric space; See Page 81.
(X, ρX) × (Y, ρY) : the finite product metric space; See Page 86.
(∏∞i=1 Xi, ρ) : the countably infinite product metric space; See Page 89.
lc(X) : the metric space consisting of nonempty closed subsets of X; See Page 101.
g◦f : the lifted composition of the function g and f; See Page 102.
supp(f) : the support of a function; See Page 123.


β(X) : the Stone–Čech compactification of a completely regular topological space X; See Page 134.
(M(A, Y), F) : the vector space of Y-valued functions of a set A over the field F; See Page 142.
ϑX : the null vector of a vector space X; See Page 144.
N(A) : the null space of a linear operator A; See Page 144.
R(A) : the range space of a linear operator A; See Page 144.
αS : the scalar multiplication by α of a set S in a vector space; See Page 144.
S + T : the sum of two sets S and T in a vector space; See Page 144.
span(A) : the subspace generated by the set A; See Page 145.
v(P) : the linear variety generated by a nonempty set P; See Page 146.
co(S) : the convex hull generated by S in a vector space; See Page 147.
‖x‖ : the norm of a vector x; See Page 153.
|x| : the Euclidean norm of a vector x; See Page 153.
C1([a, b]) : the normed linear space of continuously differentiable real-valued functions on the interval [a, b]; See Page 154.
lp : the normed linear space of real-valued sequences with finite p-norm, 1 ≤ p ≤ +∞; See Page 155.
lp(X) : the normed linear space of X-valued sequences with finite p-norm, 1 ≤ p ≤ +∞; See Page 159.
V(P) : the closed linear variety generated by a nonempty set P; See Page 162.
°P : the relative interior of a set P; See Page 162.
X × Y : the finite Cartesian product normed linear space; See Page 162.
C([a, b]) : the Banach space of continuous real-valued functions on the interval [a, b]; See Page 166.
C(K, X) : the normed linear space of continuous X-valued functions on a compact space K; See Page 166.
XR : the real normed linear space induced by a complex normed linear space X; See Page 170.
[x] : the coset of a vector x in a quotient space; See Page 171.
X/M : the quotient space of a vector space X modulo a subspace M; See Page 171.
X/M : the quotient space of a normed linear space X modulo a closed subspace M; See Page 171.
Cv(X, Y) : the vector space of continuous Y-valued functions on X; See Page 173.



B(X, Y) : the set of bounded linear operators of X to Y; See Page 181.
X∗ : the dual of X; See Page 185.
x∗ : a vector in the dual; See Page 185.
⟨⟨x∗, x⟩⟩ : the evaluation of a bounded linear functional x∗ at the vector x, that is, x∗(x); See Page 185.
c0(X) : the subspace of l∞(X) consisting of X-valued sequences with limit ϑX; See Page 189.
X∗∗ : the second dual of X; See Page 197.
S⊥ : the orthogonal complement of the set S; See Page 199.
⊥S : the pre-orthogonal complement of the set S; See Page 200.
A′ : the adjoint of a linear operator A; See Page 208.
A′′ : the adjoint of the adjoint of a linear operator A; See Page 209.
Oweak(X) : the weak topology on a normed linear space X; See Page 211.
Xweak : the weak topological space associated with a normed linear space X; See Page 211.
Oweak∗(X∗) : the weak∗ topology on the dual of a normed linear space X; See Page 214.
X∗weak∗ : the weak∗ topological space associated with the dual of a normed linear space X; See Page 214.
Ksupp : the support of a convex set K; See Page 226.
[f, C] : the epigraph of a convex function f : C → R; See Page 229.
Cconj : the conjugate convex set; See Page 233.
fconj : the conjugate convex functional; See Page 233.
[f, C]conj : the epigraph of the conjugate convex functional; See Page 233.
conjΓ : the pre-conjugate convex set; See Page 236.
conjϕ : the pre-conjugate convex functional; See Page 236.
conj[ϕ, Γ] : the epigraph of the pre-conjugate convex functional; See Page 236.
≧ : greater than or equal to (with respect to the positive cone); See Page 247.
≦ : less than or equal to (with respect to the positive cone); See Page 247.
> : greater than (with respect to the positive cone); See Page 247.
< : less than (with respect to the positive cone); See Page 247.


S⊕ : the positive conjugate cone of a set S; See Page 248.
S⊖ : the negative conjugate cone of a set S; See Page 248.
AD(x0) : the set of admissible deviations in D at x0; See Page 257.
f(1)(x0), Df(x0) : the Fréchet derivative of f at x0; See Page 258.
Df(x0; u) : the directional derivative of f at x0 along u; See Page 258.
∂f/∂y(x0, y0) : the partial derivative of f with respect to y at (x0, y0); See Page 259.
ro(A)(B) : right operate: ro(A)(B) = BA; See Page 264.
Bk(X, Y) : the set of bounded multi-linear Y-valued functions on X^k; See Page 271.
BSk(X, Y) : the set of symmetric bounded multi-linear Y-valued functions on X^k; See Page 271.
Dkf(x0), f(k)(x0) : the kth order Fréchet derivative of f at x0; See Page 272.
Ck, C∞ : k-times and infinite-times continuously differentiable functions, respectively; See Page 272.
∂kf/(∂xik · · · ∂xi1) : kth-order partial derivative of f; See Page 277.
Sm(D) : equals ⋃α∈K αD; See Page 277.
Ck(Ω, Y) : the normed linear space of k-times continuously differentiable Y-valued functions on a compact set Ω ⊆ X; See Page 313.
Cb(X, Y) : the normed linear space of bounded continuous Y-valued functions on a topological space X; See Page 314.
Cbk(Ω, Y) : the normed linear space of k-times bounded continuously differentiable Y-valued functions on a set Ω ⊆ X; See Page 315.
AT(n1,...,nm) : the transpose of an mth order tensor A with the permutation (n1, . . . , nm); See Page 316.
A ⊗ B : the outer product of two tensors; See Page 317.
0m1×···×mn : an nth order K-valued tensor in B(K^mn, . . . , B(K^m1, K) · · · ) with all elements equal to 0
1m1×···×mn : an nth order K-valued tensor in B(K^mn, . . . , B(K^m1, K) · · · ) with all elements equal to 1
exp(A) : the exponential function of a linear operator A on a Banach space; See Page 329.
S+X, SpsdX : the sets of positive definite and positive semi-definite operators over the real normed linear space X, respectively; See Page 335.
S−X, SnsdX : the sets of negative definite and negative semi-definite operators over the real normed linear space X, respectively; See Page 335.
SX : BS2(X, R); See Page 336.


(R, BL, μL) : the Lebesgue measure space; See Page 370.
μLo : the Lebesgue outer measure; See Page 370.
BB(X) : the Borel sets; See Page 371.
A(X) : the algebra generated by the topology of a topological space X; See Page 371.
μB : the Borel measure on R; See Page 371.
R : ((R, R, |·|), BB(R), μB); See Page 375.
P a.e. in X : P holds almost everywhere in X; See Page 379.
P(x) a.e. x ∈ X : P holds almost everywhere in X; See Page 379.
χA,X : the indicator function of A ⊆ X, a function of X to {0, 1}; See Page 387.
P∘f : the function P∘f : X → [0, ∞) ⊂ R defined by P∘f(x) = ‖f(x)‖, ∀x ∈ X; See Page 388.
R(X) : the collection of all representations of X; See Page 391.
I(X) : the integration system on X; See Page 391.
∫X f dμ : the integral of a function f on a set X with respect to measure μ; See Page 392.
∫X f(x) dμ(x) : the integral of a function f on a set X with respect to measure μ; See Page 392.
P∘μ : the total variation of a Banach space valued pre-measure μ; See Page 421.
P∘μ : the total variation of a Banach space valued measure μ; See Page 426.
μ1 + μ2 : the Y-valued measure that equals the sum of two Y-valued measures on the same measurable space; See Page 455.
αμ : the Y-valued measure that equals the scalar product of α ∈ K and a Y-valued measure μ; See Page 456.
μy : the Y-valued measure that equals the scalar product of a K-valued measure μ and y ∈ Y; See Page 456.
Aμ : the Z-valued measure that equals the product of a bounded linear operator A and a Y-valued measure μ; See Page 456.
Mσ(X, B, Y) : the vector space of σ-finite Y-valued measures on the measurable space (X, B); See Page 463.
Mf(X, B, Y) : the normed linear space of finite Y-valued measures on the measurable space (X, B); See Page 464.
limn∈N μn = ν : a sequence of σ-finite (Y-valued) measures (μn)∞n=1 converges to a σ-finite (Y-valued) measure ν; See Page 466.
μ1 ≤ μ2 : the measures μ1 and μ2 on the measurable space (X, B) can be compared if μ1(E) ≤ μ2(E), ∀E ∈ B; See Page 467.


[μ1,1 · · · μ1,m; . . . ; μn,1 · · · μn,m] : the vector measure; See Page 469.
μ1 ≪ μ2 : the measure μ1 is absolutely continuous with respect to the measure μ2; See Page 480.
μ1 ⊥ μ2 : the measures μ1 and μ2 are mutually singular; See Page 480.
dν/dμ : the Radon–Nikodym derivative of the σ-finite Y-valued measure ν with respect to the σ-finite K-valued measure μ; See Page 489.
Pp∘f : the function ‖f(·)‖^p; See Page 499.
ess sup : the essential supremum; See Page 501.
limn∈N z̄n = z̄ in L̄p : the sequence (z̄n)∞n=1 ⊆ L̄p converges to z̄ ∈ L̄p in the L̄p pseudo-norm; See Page 503.
Mft(X, Y) : the normed linear space of finite Y-valued topological measures on X; See Page 519.
Mσt(X, Y) : the vector space of σ-finite Y-valued topological measures on X; See Page 519.
Mσ(X, B) : the set of σ-finite measures on the measurable space (X, B); See Page 520.
Mf(X, B) : the set of finite measures on the measurable space (X, B); See Page 520.
Mσt(X) : the set of σ-finite topological measures on the topological space X; See Page 520.
Mft(X) : the set of finite topological measures on the topological space X; See Page 520.
Cc(X, Y) : the normed linear space of Y-valued continuous functions on a finite-dimensional Banach space that converge to ϑY as ‖x‖X → ∞; See Page 538.
⇒ : convergence in distribution; See Page 542.
∏mj=1 μj : the product measure of μ1, . . . , μm; See Page 560.
E⟨x1⟩, E⟨x2⟩ : for a set E ⊆ X1 × X2, E⟨x1⟩ is the section of the set E with the first coordinate fixed at x1, and E⟨x2⟩ is the section of the set E with the second coordinate fixed at x2; See Page 568.
f⟨x1⟩, f⟨x2⟩ : for a function f : X1 × X2 → Y, f⟨x1⟩ : X2 → Y is the section of the function f with the first coordinate fixed at x1, and f⟨x2⟩ : X1 → Y is the section of the function f with the second coordinate fixed at x2; See Page 570.
∏mi=1 Xi : the product topological measure space of X1, . . . , Xm, where m ∈ N; See Page 576.


rx1,x2 : the semi-open rectangle in Rm with corners x1 and x2; See Page 585.
r̄x1,x2 : the closed rectangle in Rm with corners x1 and x2; See Page 585.
r°x1,x2 : the open rectangle in Rm with corners x1 and x2; See Page 585.
P(Ω) : the principal of a region Ω; See Page 585.
VRecti,x1,x2 : the set of vertexes of rx1,x2 with i coordinates equal to that of x1; See Page 587.
ΔF(rx1,x2) : the increment of F on rx1,x2; See Page 587.
TF(rx1,x2) : the total variation of a function F on the semi-open rectangle rx1,x2; See Page 587.
(Rm, BLm, μLm), μLmo : the m-dimensional Lebesgue measure space and the m-dimensional Lebesgue outer measure; See Page 606.
∫U g(x) dF(x) : the integral of a function g with respect to the Y-valued measure space (P(Ω), BB(P(Ω)), μ) whose distribution function is F : Ω → Y, over the set U ∈ BB(P(Ω)); See Page 607.
πi0 : Rm → Rm−1 : πi0(x) = (π1(x), . . . , πi0−1(x), πi0+1(x), . . . , πm(x)), ∀x ∈ Rm; See Page 607.
∫a^b f(x) dx : the integral of a function f from a ∈ R to b ∈ R with respect to μB; See Page 618.
dia(S) : the diameter of a subset of a metric space; See Page 625.
em,i : the ith unit vector in Rm; See Page 628.
Sn−1 : the finite compact separable metric measure space on the n-dimensional unit sphere, where the measure of a set is equal to the "surface area" of the set; See Page 663.
Wp,1,x0(X, Y) : the 1st order Sobolev space of X to Y under p-norm with respect to x0 ∈ X; See Page 664.
Ln : C0 → {a + ib ∈ C | −π < b ≤ π} : the principal value of the logarithm function; See Page 691.
exp : the exponential function of a manifold to C0; See Page 692.
ln : the logarithm function of C0 to a manifold; See Page 692.
C∞(p, K) : the collection of all functionals that are smooth in a neighborhood of p on a manifold X; See Page 693.
TpX : the space of tangent vectors on a manifold X at p ∈ X; See Page 694.
F∗p : the differential of a mapping F at p on a manifold X; See Page 694.
T∗pX : the space of tangent covectors on a manifold X at p ∈ X; See Page 695.



Lfλ : the Lie derivative of λ along the vector field f; See Page 695.
V(X) : the set of all smooth vector fields on the manifold X; See Page 695.
[f, g] : the Lie bracket of vector fields f and g; See Page 697.
⟨x, y⟩ : the inner product of vectors x and y; See Page 699.
x ⊥ y : the vectors x and y are orthogonal in a pre-Hilbert space; See Page 701.
x ⊥ S : the vector x is orthogonal to the set S in a pre-Hilbert space; See Page 701.
x∗ : the x∗ ∈ X∗ that satisfies ⟨⟨x∗, y⟩⟩ = ⟨y, x⟩, ∀y ∈ X; See Page 705.
S⊥h : the orthogonal complement in Hilbert spaces; See Page 707.
M ⊕ N : the direct sum of M and N; See Page 707.
A∗ : the Hermitian adjoint of A; See Page 708.
Gram(y1, . . . , yn) : the Gram matrix of y1, . . . , yn; See Page 711.
gram(y1, . . . , yn) : the Gram determinant of y1, . . . , yn; See Page 711.
S̄X : the set of Hermitian operators on a real or complex Hilbert space; See Page 724.
S̄+X, S̄psdX : the sets of positive definite and positive semi-definite operators over a real or complex Hilbert space X, respectively; See Page 724.
S̄−X, S̄nsdX : the sets of negative definite and negative semi-definite operators over a real or complex Hilbert space X, respectively; See Page 724.
A† : the pseudoinverse of A ∈ B(X, Y); See Page 729.
K(X, Y) : the set of compact bounded linear operators of X to Y; See Page 735.
E(x) : the expectation of x; See Page 751.
E(x|B̂) : the conditional expectation of x given B̂; See Page 753.
P(F|B̂) : the conditional probability of the event F given B̂, which is defined to be E(χF,Ω|B̂); See Page 761.
P(F|E) : the conditional probability of the event F happening given that the event E happened; See Page 763.
Lx : the law for the random variable x; See Page 771.
x ∼ N(x̄, K) : a Gaussian (normal) random variable (vector) with mean x̄ and covariance K; See Page 776.
ℵ := (Z+, 2^Z+, μ) : the topological measure space on Z+ with the counting measure and the partial ordering of ≤; See Page 779.
Ψx : the characteristic function of the random variable x; See Page 791.

∫ra,b φs dws, ∫a^b φs dws : the Riemann-Stieltjes integral of φ with respect to the Wiener process w on the interval ra,b; See Page 843.
∫ra,b φ(s) dws, ∫a^b φ(s) dws : the Wiener integral of φ (with respect to the Wiener process w) on the interval ra,b; See Page 849.
Q̂(ra,b × Ω, Y) : the set of all Y-valued simple predictable step functions on ra,b × Ω; See Page 850.
∫ra,b φs dws, ∫a^b φs dws : the Itô integral of φ (with respect to the Wiener process w) on the interval ra,b; See Page 857.
U(ra,b × Ω, Y) : the set of Itô integrable processes whose Itô integral can be approximated by integrating the sampled-and-held process; See Page 860.
R̂(ra,b) : the collection of all sampled partitions of a compact interval ra,b ⊆ R; See Page 907.
Î(ra,b) : the Riemann integration system on a compact interval ra,b ⊂ R; See Page 907.
∫a^b f dg, ∫ra,b f(s) dg(s) : the Riemann-Stieltjes integral of f with respect to g on ra,b; See Page 907.

Chapter 1

Introduction

1.1 The Tour of the Book

When I started on this mathematics endeavor, I knew I had to start with set theory. Various paradoxes hang in the air, but I knew this subject was beyond my ability. I had to start with a set of axioms that are solid and extensive. I thought that, having studied so much mathematics and done so many problems, I would never have to question my own understanding of mathematics. So, what I know by heart should be correct. I just had to find an axiomatic system that conforms with my understanding and avoids the various paradoxes that signal warnings in certain directions. I found the Zermelo–Fraenkel set theory with the Axiom of Choice (ZFC) in the Chinese award book that I received during my high school years, Mathematics Handbook Editors Group (1979). This axiomatic system claims to be consistent. It agrees with my understanding of set theory. Most importantly, I see that it avoids the obvious pitfalls of the paradoxes. I took a leap of faith here and based my Chap. 2 on this axiomatic system. My understanding is that, in my entire life, all the sets I encounter are defined via these axioms except Axiom 4. Axiom 4 is something that I am not familiar with, and it may be the source of the paradoxes when violated. I feel safe defining sets using only the rest of the axioms. So, there it is. I had the understanding that in a set definition using Axiom 5, one must always clearly specify the set A, and then the paradoxes are avoided. I hope that I am correct in my decision. I am confident that I am correct. Throughout the book, I never had those weird set definitions that call for suspicion. So, even if this axiomatic system is not entirely consistent, I feel that my work will not be in vain. The major theorem in Chap. 2 is the equivalence of the Axiom of Choice, the Hausdorff Maximal Principle, Zorn's Lemma, and the Well-Ordering Principle.
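For ease of reference, the four principles can be stated informally as follows; these are paraphrases, and the precise formulations appear in Chap. 2:

```latex
\begin{itemize}
\item \emph{Axiom of Choice}: for any collection $(X_\alpha)_{\alpha \in \Lambda}$
  of nonempty sets, there exists a choice function
  $f : \Lambda \to \bigcup_{\alpha \in \Lambda} X_\alpha$
  such that $f(\alpha) \in X_\alpha$ for all $\alpha \in \Lambda$.
\item \emph{Hausdorff Maximal Principle}: every partially ordered set contains
  a maximal totally ordered subset.
\item \emph{Zorn's Lemma}: a nonempty partially ordered set in which every
  totally ordered subset has an upper bound possesses a maximal element.
\item \emph{Well-Ordering Principle}: every set admits a total ordering under
  which each nonempty subset has a least element.
\end{itemize}
```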
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. Z. Pan, Measure-Theoretic Calculus in Abstract Spaces, https://doi.org/10.1007/978-3-031-21912-2_1

The proof of this equivalence (Theorem 2.18) is based on the teaching notes of Professor Peter Loeb on Real Analysis at the University of Illinois at Urbana-Champaign. There is no more leap of faith in the rest of the book. Everything follows from reasoning, and Axiom 4 is never invoked. Then, the book Suppes (1972) clears up the mysteries in


the paradoxes and reinforced my confidence in my presentation. The only thing that I am not so confident about is that I may need to take the natural numbers as given objects rather than defining them using sets, which might cause some confusion if one thinks too much about it. But this approach has its own problems, in the sense that there are then objects that are not sets, which might cause a lot of trouble in establishing the consistency of the set theory. Anyway, I will stay with what I have in the book here. After I finished my book, I reread the classic book Halmos (1960) on set theory and found its treatment of set theory to be very agreeable with my understanding of the subject. This was not the case when I read it at the time of my writing Chap. 2. I think it is safe to say that Halmos (1960) can serve as the foundation for set theory as well.

Chapter 3 is a study of topological spaces. A topological space is a set together with a collection of subsets of the set, which is called the topology on the set. Any set in the topology is considered an open set. Any set whose complement is an open set is considered a closed set. Then, we can talk about set closure, interior, and boundary. Continuity is the key concept for a function between topological spaces. Fundamental notions such as a basis of a topological space, a basis at a particular point in the topological space, the first and second countability of a topological space, the various classes of topological spaces defined by the separation axioms, the category theory of topological spaces, and the connectedness concepts are important building blocks for this chapter. There is the proof of Urysohn's Lemma (Theorem 3.55), which establishes the existence of real-valued continuous functions with certain properties on a normal topological space. Furthermore, Tietze's Extension Theorem 3.57 allows us to extend a continuous real-valued function defined on a closed subset of a normal topological space continuously to the entire space.
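In symbols, the two results may be summarized as follows; this is a paraphrase, with the precise hypotheses given in Theorems 3.55 and 3.57:

```latex
% Urysohn's Lemma: in a normal topological space X, disjoint closed sets
% A and B are separated by a continuous function:
\exists\, f \in C(X, [0, 1]) \ \text{such that} \
f|_{A} \equiv 0 \ \text{and} \ f|_{B} \equiv 1.

% Tietze's Extension Theorem: a continuous real-valued function on a
% closed subset F of a normal space X extends continuously to all of X:
\forall\, g \in C(F, \mathbb{R}) \ \exists\, \bar{g} \in C(X, \mathbb{R})
\ \text{such that} \ \bar{g}|_{F} = g.
```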
At the end of the chapter, the key notion of a "net" is introduced, which is a generalization of the concept of a sequence. A sequence has only one direction: forward. A net is like the flow of water in a river: it has many local vortices, but in the big picture it has only one direction: forward. With the definition of a net, we can talk about limits and convergence. For a net of real (or extended real) numbers, one can talk about the limit superior and limit inferior. The concept of continuity is fundamentally linked to the concept of a net in topological spaces. It will be seen that the definitions of the Lebesgue and Riemann integrals are both based on the convergence of nets.

Metric spaces are studied in Chap. 4. A metric space is a set together with a function that defines the distance (metric) between each pair of points in the set. The metric induces a topology on the set, and therefore we can talk about open and closed sets, continuity, and convergence (in terms of the metric). The concept of completeness is essential in metric spaces. A complete metric space is one in which every Cauchy sequence converges, so that the space does not have "holes" in it. In addition to these topological concepts, we can further talk about uniform continuity, uniform convergence, equi-continuity, etc., and uniformity in general. The fundamental difference between a complete metric space and an incomplete one is delineated in the Baire Category Theorem 4.41. Any given metric space can be embedded as a dense subset of a complete metric space (Theorem 4.49). Because of the additional properties of a metric space over a topological space, we are
interested in the question of when a topological space is metrizable. This question is resolved through the Urysohn Metrization Theorem 4.53. Because of the ability to talk about uniform convergence, we provide conditions under which two limit operations in series can be interchanged without affecting the final answer. This is the topic of Sect. 4.9. The limit operations in series can be interchanged if the joint limit exists; the joint limit exists if certain convergence is uniform.

The concept of compactness of a set is pervasive in analysis. Chapter 5 is devoted to the study of compact spaces and locally compact spaces. The concept of compactness actually involves four notions for general topological spaces: countable compactness, the Bolzano–Weierstrass property, sequential compactness, and compactness. A space is countably compact if and only if it has the Bolzano–Weierstrass property. Otherwise, these compactness properties are distinct in general topological spaces. Here, metric spaces again demonstrate their superiority over topological spaces: all of these compactness notions are equivalent in metric spaces, as demonstrated in the Borel–Lebesgue Theorem 5.37. It is well-known that a continuous real-valued function on a nonempty (countably) compact space achieves its maximum and minimum on the space. The Ascoli–Arzelá Theorem 5.44 then says that for an equi-continuous family of metric-space-valued functions, pointwise convergence implies uniform convergence, which in turn implies that the limit function is continuous. In some sense, this is a sequential compactness theorem for equi-continuous families of functions. In Sect. 5.6, there is the Tychonoff Theorem 5.47, which states that an arbitrary product of compact spaces is compact. The chapter then turns to the study of locally compact spaces in Sect. 5.7.
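The interchange-of-limits caveat of Sect. 4.9 can be seen in a classical double sequence (a standard textbook example, not taken from the book): for a(m, n) = m/(m + n), both iterated limits exist but differ, and the joint limit fails to exist, because the inner convergence is not uniform.

```python
from fractions import Fraction

def a(m, n):
    # Double sequence a(m, n) = m / (m + n), computed exactly.
    return Fraction(m, m + n)

# Fix m, let n grow: a(m, n) -> 0, so lim_m lim_n a(m, n) = 0.
row_tails = [float(a(m, 10**8)) for m in (1, 2, 3)]

# Fix n, let m grow: a(m, n) -> 1, so lim_n lim_m a(m, n) = 1.
col_tails = [float(a(10**8, n)) for n in (1, 2, 3)]

# Along the diagonal a(n, n) = 1/2 for every n, so the joint limit
# does not exist and the two iterated limits cannot be interchanged.
diagonal = a(7, 7)

print(row_tails, col_tails, float(diagonal))
```

The uniform convergence hypothesis of Sect. 4.9 rules out exactly this kind of example.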
A main result for locally compact Hausdorff spaces is that a compact set in such a space admits a partition of unity with respect to any open covering among the class of functions that satisfy certain standard properties. Another main result for locally compact Hausdorff spaces is that they admit the Alexandroff one-point compactification. We make use of the Alexandroff one-point compactification later in Chap. 11 to define the space Cc(X, Y) and find its dual. Then, the concept of proper functions is introduced. The existence of a proper continuous function on a locally compact Hausdorff space is equivalent to σ-compactness of the space. Then, the concept of paracompactness is introduced, and it is proved that metric spaces are paracompact. This chapter ends with a discussion of the Stone–Čech compactification.

Starting in Chap. 6, we bring binary operations into the study of spaces. A set with a binary operation (addition) is called a group if the binary operation is associative, admits a unit element (0), and every element in the set admits an additive inverse. If the group operation is commutative, then we have an abelian group. On the other hand, a semi-group is something less than a group; it becomes a group if every element in the set admits an inverse. A step beyond the concept of a group is a ring. A ring is an abelian group together with another binary operation defined on the set: multiplication. On a ring, the multiplication must be associative and distributive with respect to addition (left and right). A ring is commutative if the multiplication operation is commutative. It has a 1 element if the multiplication operation admits an identity element. A commutative ring with an identity element (for multiplication) is further called a field, if every
element in the set except 0 has a multiplicative inverse. The concept of a field leads us to the familiar entities: Q, R, C. A vector space over a field is a set together with vector addition, defined on the product of the vector space with itself, and scalar multiplication, defined on the product of the vector space and the field, further satisfying a list of conditions for scalar multiplication and vector addition. A vector space reminds us of R^n or R^{n1×n2×···×np} (but with less technical capability). We can define linear operators from one vector space to another (or the same) vector space. The concepts of linear combination, subspace, product space, linear variety, linear independence, and dimension are introduced for vector spaces. For real or complex vector spaces (vector spaces over the field R or C), the concept of convexity can be introduced.

A vector space is a space with vector operations but without a topology. A normed linear space is a vector space together with a nonnegative real-valued function defined on the set, which is called the norm. The norm is like the concept of the length of an arrow: the arrow has a direction and a length. With the definition of a norm, the space admits a metric space structure, where the distance between two points is simply the length of the difference of the two points (vectors). Thus, we have a topology on the space and can talk about open and closed sets, continuity, and convergence. Since it is a metric space, we can talk about uniform convergence and uniformity in general. A complete normed linear space is called a Banach space. (Remember that completeness is a concept introduced for metric spaces.) The Banach space is the focus of study in Chap. 7. In R^n, compactness of a set is equivalent to the set being closed and bounded. For finite-dimensional Banach spaces, compactness of a set is again equivalent to the set being closed and bounded.
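The contrast with infinite dimensions can be made concrete in ℓ2 (a standard illustration, not from the book): the standard unit vectors all lie in the closed unit ball, yet any two of them are √2 apart, so no subsequence is Cauchy and the ball, though closed and bounded, is not compact.

```python
import math

def l2_dist(x, y):
    # l2 distance between two "sparse" vectors stored as index -> value dicts.
    keys = set(x) | set(y)
    return math.sqrt(sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys))

# The standard unit vectors e_1, ..., e_7 of l2, all in the closed unit ball.
e = [{n: 1.0} for n in range(1, 8)]

# Every pair of distinct unit vectors is exactly sqrt(2) apart, so (e_n)
# has no Cauchy (hence no convergent) subsequence: closed + bounded does
# not imply compact in infinite dimensions.
dists = [l2_dist(e[i], e[j]) for i in range(len(e)) for j in range(i + 1, len(e))]
print(min(dists), max(dists))  # both equal sqrt(2)
```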
But, for infinite-dimensional Banach spaces, compactness is hard to come by. Yet, a lot can be said for a set that is compact in a Banach space. (It is then countably compact, has the Bolzano–Weierstrass property, and is sequentially compact; remember the Borel–Lebesgue Theorem.) Since a Banach space has the vector space structure, we can talk about the quotient space with respect to a given subspace. The quotient space is a Banach space if the given subspace is closed and we define the norm on the quotient space in a specific way. The given subspace can be the nullspace of a pseudo-norm (which is less than a norm in the sense that a pseudo-norm can be 0 for nonzero vectors in the vector space). The vector space with a pseudo-norm becomes a normed linear space if we take the quotient of the vector space with respect to the nullspace of the pseudo-norm. Then, we switch gears and prove the Stone–Weierstrass Theorem 7.56 for the Banach space of real-valued functions on a compact space.

Starting in Sect. 7.8, we study linear operators defined on a normed linear space. The set of bounded linear operators from a normed linear space to a Banach space itself forms a Banach space. In particular, bounded linear operators from a Banach space to its associated field K (which stands for R or C) are called functionals on the original Banach space; they form a Banach space of special interest: it is called the dual of the original Banach space. A Banach space together with its dual allows many optimization problems to be solved. A fundamental result concerning a normed linear space and its dual is the Hahn–Banach Theorem. In this chapter (Chap. 7),
we discuss the extension form of the Hahn–Banach Theorem. In simple words, the Hahn–Banach Theorem guarantees the existence of an extension of a bounded linear functional defined on a subspace of a normed linear space to a bounded linear functional on the entire normed linear space without increasing the norm of the functional. This key existence result is proved via a direct application of the Axiom of Choice (or its equivalent). It allows for proving the existence of the optimum in many optimization problems formulated in the dual space. With this theorem, we know that not all formulations of an optimization problem are equal. A crafty formulation allows the solution to be found, while a naive formulation leads nowhere (literally). The dual of the dual of a Banach space in general contains the original Banach space as a subspace. It should be noted that there are some Banach spaces whose duals are unknown to us (even though they exist in theory); we cannot comprehend such dual spaces. The dual of the dual of a Banach space may be the original Banach space, in which case we say the original Banach space is reflexive. Reflexive Banach spaces are simpler to deal with. We will restrict ourselves to such spaces if certain proofs do not go through for general Banach spaces. In the space R^n, we have a powerful theorem called the projection theorem: the minimum distance from a vector to a subspace can be found by projection. In Banach spaces, the concept of orthogonality can be generalized, except that, for a set in the Banach space, its orthogonal complement is a subset of the dual of the Banach space. For a subset of the dual of a Banach space, its orthogonal complement is a subset of the dual of the dual of the Banach space. To simplify things, we define the pre-orthogonal complement of such a set (in the dual) to be a set in the original Banach space.
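The finite-dimensional picture behind these orthogonality notions can be sketched in R^2 (hypothetical numbers; Propositions 7.97 and 7.99 are the Banach-space versions): the minimum distance from a point to a subspace equals the absolute value, at that point, of a unit functional annihilating the subspace.

```python
import math

# Subspace M = span{a} in R^2; the distance from x to M equals |<x, n>|,
# where the unit vector n spans the orthogonal complement of M -- a
# finite-dimensional shadow of the minimum-norm duality described above.
a = (3.0, 4.0)            # spans the subspace M (hypothetical data)
x = (2.0, 1.0)            # the point to be approximated

norm_a = math.hypot(*a)
n = (-a[1] / norm_a, a[0] / norm_a)          # unit normal to M

dual_value = abs(x[0] * n[0] + x[1] * n[1])  # |<x, n>|

# Direct computation: minimize ||x - t a|| over t via the projection t*.
t_star = (x[0] * a[0] + x[1] * a[1]) / (norm_a ** 2)
direct = math.hypot(x[0] - t_star * a[0], x[1] - t_star * a[1])

print(dual_value, direct)  # the two values agree
```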
With these concepts, minimum norm problems from a point to a subspace can be characterized using the orthogonal complement or the pre-orthogonal complement (see Propositions 7.97 and 7.99). A strong result concerning a bijective bounded linear operator between two Banach spaces is that its inverse is a bounded linear operator. This is a deep result that is a consequence of the Baire Category Theorem. This theorem demonstrates the additional advantage of working in a Banach space, rather than a normed linear space. For a bounded linear operator A from a Banach space X to another Banach space Y, there is a corresponding bounded linear operator A∗ : Y∗ → X∗, called the adjoint of A, which maps the dual of Y to the dual of X. In terms of matrices, the adjoint of a matrix is its transpose, if the definition of linear functional is standard. The range spaces and null spaces of A and A∗ are closely related, as delineated in Propositions 7.112 and 7.114. These are the infinite-dimensional counterparts of the matrix fact dim(R(A)) = n − dim(N(A)), where A is an m × n-dimensional matrix, but are strictly weaker conclusions.

Chapter 7 concludes with a discussion of the weak topology on a Banach space and the weak∗ topology on the dual of a Banach space. On a Banach space, the norm induces a topology on the space, which is called the strong topology. Then, there is the weak topology, which is the weakest topology such that all bounded linear functionals of the Banach space are continuous. A weak open set is a strong open set, and a weak closed set is a strong closed set. But, a strong open set may or may not be a weak open set, and a strong closed set may or may not be a weak closed set. Remember that a topology is the collection of subsets of the space that we call open
sets. The weak topology is a subset of the strong topology. With the definition of the weak topology on the Banach space, we can talk about continuity and convergence in the weak topology. It is true that vector addition and scalar multiplication are continuous in the weak topology. For finite-dimensional Banach spaces, the weak topology is the same as the strong topology. So, the study of the weak topology is for infinite-dimensional Banach spaces only. On the dual of a Banach space (remember it is also a Banach space), we have the weak topology, and we will define a possibly even weaker topology, the weak∗ topology: it is the weakest topology on the dual such that all elements of the Banach space, viewed as bounded linear functionals on the dual, are continuous. When the Banach space is reflexive, the weak∗ topology coincides with the weak topology on the dual of the Banach space. Again, it is true that vector addition and scalar multiplication are continuous in the weak∗ topology. The main reason why we study these topologies is the Alaoglu Theorem 7.122, which says that the closed unit ball of the dual of a Banach space is a weak∗ compact set. This should not rush the reader to the wrong conclusion that every sequence in the closed unit ball has a weak∗ convergent subsequence! Remember this is the weak∗ topology, not a metric space, and thus weak∗ compactness does not imply weak∗ sequential compactness. The right conclusion that can be made here is that every sequence in the closed unit ball has a weak∗ cluster point (the Bolzano–Weierstrass property). Another useful result in the weak topology is that a bounded linear operator A between Banach spaces is continuous as an operator between the weak topologies of these Banach spaces. I think that the Alaoglu Theorem 7.122 holds the key to infinite-dimensional system theory, where one can have the tools of compactness available. Chapter 8 studies convex optimization problems in real Banach spaces.
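The gap between weak and strong convergence described above can be illustrated in ℓ2 (a standard example, not from the book): the unit vectors e_n satisfy ⟨e_n, y⟩ = y_n → 0 for every fixed y ∈ ℓ2, so e_n → 0 weakly, while ‖e_n‖ = 1 for every n, so there is no norm convergence.

```python
# A fixed element y of l2 (hypothetical choice): y_k = 1/(k+1), square-summable.
y = [1.0 / (k + 1) for k in range(100001)]

# Pairing the unit vector e_n with y gives the n-th coordinate y_n,
# which tends to 0: weak convergence of e_n to 0.
pairings = [y[n] for n in (10, 100, 1000, 10000, 100000)]

# The norm of every e_n is 1: no convergence in the norm (strong) topology.
norms = [1.0] * len(pairings)

print(pairings, norms)
```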
The concept of a hyperplane is defined to be a maximal proper linear variety in a normed linear space. Every closed hyperplane can be identified with a set {x ∈ X | f(x) = c}, where f is a non-null bounded linear functional in the dual of X and c ∈ R. A direct consequence of the Hahn–Banach Theorem is Mazur's Theorem 8.7, which says that given a linear variety and a convex set with nonempty interior in a real normed linear space, if the linear variety does not intersect the interior of the convex set, then there is a hyperplane containing the linear variety such that the convex set is contained in exactly one of the two closed half spaces associated with the hyperplane. Another direct consequence of the Hahn–Banach Theorem is the Eidelheit Separation Theorem 8.9, which says that given two nonempty convex sets K1 and K2 in a real normed linear space X, if the interior of K1 is nonempty and does not intersect K2, then we may find a hyperplane that separates K1 and K2. A direct consequence of these theorems is that a closed convex subset of a real normed linear space is weakly closed (remember the weak topology). This, combined with the Alaoglu Theorem, yields that, for a reflexive real Banach space X, a closed bounded convex subset of the dual of X is weak∗ (weakly) compact. The infimization of the distance from a point to a nonempty convex set in a real normed linear space is equivalent to a maximization problem involving the support functional of the convex set over the support of the convex set (Proposition 8.15). The same problem, if formulated in the dual of a reflexive real Banach space, admits
a minimizer. For more general problems where the optimizing functional is convex but not necessarily a norm expression, we introduce the concept of the conjugate convex functional. For convex functionals defined on a convex subset of the dual of a real normed linear space, we introduce the concept of the pre-conjugate convex functional. This culminates in Propositions 8.32 and 8.33, where the original functional is equal to the supremization, over bounded linear functionals in the conjugate convex set, of a functional involving the conjugate convex functional. Under stronger assumptions, the supremization becomes maximization in Proposition 8.34. A step further in this line of study is the Fenchel Duality Theorem 8.35. The key game theory result is obtained in Proposition 8.39. For constrained convex optimization problems, we first introduce the concept of the positive cone and its dual notion of the positive conjugate cone. The chapter ends with the presentation of two variants of the Lagrange multiplier theory (Propositions 8.57 and 8.58), as well as sufficient conditions for optimality.

In a Banach space, we can do vector addition and scalar multiplication and have the concept of distance. All the necessary ingredients are present for the study of calculus on the space. In Chap. 9, we embark on the study of differentiation on Banach spaces. This topic is relatively well understood and has a significant body of coverage in Luenberger (1969). I follow closely the results on Fréchet derivatives for functions between Banach spaces. I take extra care in the results for functions with general domains (not simply open domains). There, an extra condition is needed on the local shape of the domain at the point of differentiation. For high order derivatives (including partial derivatives), we inevitably deal with tensor operators. The Chain Rule (Theorem 9.18) and Mean Value Theorems are established for first-order differentiation.
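The symmetry of high order derivatives can be probed numerically for a concrete C^2 function (a hypothetical scalar example; Proposition 9.28 in the book is the general Banach-space statement): for f(x, y) = x³y + xy², the mixed second partial, in either order, equals 3x² + 2y.

```python
def f(x, y):
    # A C^2 function of two real variables (hypothetical example).
    return x ** 3 * y + x * y ** 2

def d2_dxdy(f, x, y, h=1e-4):
    # Central finite-difference estimate of the mixed second partial;
    # the stencil is symmetric in x and y, mirroring Clairaut/Schwarz.
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

x0, y0 = 1.5, -0.5
numeric = d2_dxdy(f, x0, y0)
analytic = 3 * x0 ** 2 + 2 * y0   # common value of both mixed partials

print(numeric, analytic)  # agree to finite-difference accuracy
```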
If a function f is Ck at x0 ∈ X and dom(f) is locally convex, then the kth order derivative f(k)(x0) is a k-fold symmetric operator on X (Proposition 9.28). The calculation of high order derivatives of a function can be carried out according to Proposition 9.27, once it is shown that the function is sufficiently many times differentiable. Interchange of the order of differentiation can be carried out according to Propositions 9.31 and 9.32. These results are standard (Bartle, 1976) if the domain of the function is open. When this assumption is not valid, the technical assumptions on the domain that allow the interchange of the order of differentiation are delineated in these propositions. The derivatives of useful functions are obtained, as well as some basic properties governing high order derivatives. A key takeaway from Sect. 9.4 is that a Ck function composed with another Ck function again yields a Ck function. In particular, this result holds at each point in the domain (see Proposition 9.45). For a point in the interior of the domain, the function is Ck at the point if all partial derivatives up to kth order are continuous at the point (see Proposition 9.47). The study of high order derivatives ends with the well-known Taylor's Theorem 9.48. The usefulness of differentiation in Banach spaces lies in the various mapping theorems of Sect. 9.5 and the analytic functions of Sect. 9.9. In Sect. 9.5, we present the Contraction Mapping Theorem, Injective Mapping Theorem, Surjective Mapping Theorem, Open Mapping Theorem, Inverse Function Theorem, and two versions of the Implicit Function Theorem (for the continuous and differentiable cases). These results have been well motivated in Luenberger (1969). The global version
of the Inverse Function Theorem is motivated by Ambrosetti and Prodi (1993). The result is that for a continuous and countably proper function F between Hausdorff topological spaces, if F is locally continuously invertible at each point in the domain, dom(F) is arcwise connected, and range(F) is simply connected, then F is a homeomorphism between dom(F) and range(F). (A continuous function is said to be countably proper if the inverse image of any compact set is countably compact.) The technicalities all disappear once we deal with metric spaces. In Sect. 9.7, we present sufficient conditions under which the limit operation and the differentiation operation in series can be interchanged. The result becomes standard (Bartle, 1976) if the domain of the function is an open set. In Sect. 9.8, we present the definition and properties of basic tensor operations. Section 9.9 includes all the essential ingredients of analytic functions on Banach spaces: the definition of an analytic function, the Taylor series expansion of an analytic function, the fact that the composition of analytic functions yields an analytic function, the frequently encountered analytic functions in a Banach space, the inversion of bounded linear operators between Banach spaces as an analytic function, and the analytic versions of the Inverse Function Theorem and Implicit Function Theorem. It also includes a discussion of a particular analytic function: the exponential of a bounded linear operator on a Banach space. The chapter ends with a section on Newton's Method for the numerical computation of the root of a function, motivated by Luenberger (1969). In Chap. 10, we derive local necessary, as well as sufficient, conditions for the unconstrained optimization of a functional and the constrained optimization of a functional subject to equality constraints, as well as inequality constraints, on real Banach spaces.
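Newton's Method mentioned above is easy to sketch in the scalar case (a minimal sketch; the book treats the iteration for functions between Banach spaces):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    # Newton's method: iterate x <- x - f(x)/f'(x) until |f(x)| is small.
    x = x0
    for _ in range(max_iter):
        x = x - f(x) / fprime(x)
        if abs(f(x)) < tol:
            break
    return x

# Root of f(x) = x^2 - 2 starting from x0 = 1: the iteration converges
# (quadratically, near the root) to sqrt(2).  Hypothetical illustration.
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
print(root)
```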
We define the concepts of positive definite, positive semi-definite, negative definite, and negative semi-definite operators on a real Banach space. We derive necessary and sufficient conditions for a Fréchet differentiable functional to be a convex functional. In constrained optimization problems, the constraints are mappings between Banach spaces, and the inequality constraints are in terms of positive cones. Lagrange multiplier theory is presented for constrained optimization problems. The generalized Kuhn–Tucker Theorem for optimization problems with infinite-dimensional inequality constraints is presented. The results are standard and follow Luenberger (1969).

The first five sections of Chap. 11 are devoted to the study of measure theory, measurable functions on a measure space, general integration theory on a measure space, and the interchange of integration and limit via the Lebesgue Dominated Convergence Theorems. As the study proceeded, I decided that it is unfruitful to deal primarily with complete measure spaces. Rather, we focus on general measure spaces that may not be complete. The focus here is on Borel measure spaces when dealing with measures on a topological space. The foundation of measure theory is the Carathéodory Extension Theorem. In the definition of the Lebesgue measure on R, we observe its key feature: any Lebesgue measurable set E can be contained in some open set U with arbitrarily small Lebesgue measure μL(U \ E); or, equivalently, it can contain some closed set F with arbitrarily small Lebesgue measure μL(E \ F). We restrict the coverage to only the Borel measurable sets BB(R). The restriction of the Lebesgue measure to these Borel measurable sets is
defined to be the Borel measure on R, μB. This measure clearly inherits the key feature of the Lebesgue measure. This leads us to define a topological measure space X on a topological space X := (X, O) to be the space ((X, O), B, μ), where μ is a measure on the measurable space (X, B := BB(X)), with B the σ-algebra generated by the open subsets of X, such that any (Borel) measurable set E ∈ B can be contained in some open set U ∈ O with arbitrarily small measure μ(U \ E). A metric measure space is a topological measure space on top of a metric space, a Banach measure space is a topological measure space on top of a Banach space, and so on and so forth. One simple relationship due to this definition is that if, for any set E ∈ B, we let (E, OE) be the topological subspace of X and (E, BE, μE) be the measure subspace of (X, B, μ), then BE is the σ-algebra generated by OE, and μE satisfies the preceding condition, so that E := ((E, OE), BE, μE) forms a topological measure space. Thus, we call E the topological measure subspace of X. On a topological measure space, all measurable subsets admit a topological measure structure. With the focus on Borel measurable sets, a function is measurable if the inverse image of any open set (and hence any Borel measurable set) is Borel measurable. These concepts are not changed. But, the concept of two functions being almost everywhere equal to each other is significantly changed. We say that a property holds almost everywhere if the set on which the property does not make sense or does not hold is a Borel measurable set of measure zero. The burden here is to guarantee that this negligible set is (Borel) measurable, which can be ensured when the functions on both sides of the equality are measurable and their range is a separable metric space. The pointwise limit of measurable functions is again measurable.
But for the almost everywhere limit of measurable functions, one has to be careful: the limiting function is measurable but not necessarily defined everywhere. To get a measurable function that is defined on the entire domain as an almost everywhere limit of a sequence of measurable functions, one can simply set the function to an arbitrary constant wherever the pointwise limit does not exist. By careful arguments in the proofs, the standard machinery of measure theory carries through to Borel measurable spaces. Littlewood's three principles are cleanly established and appropriately generalized to Borel measurable spaces.

The integration of a measurable function on a measure space is carefully defined. Thus, the convention 0 × ∞ = 0 is completely banished from the theory. First, we define a directed system, which is called the integration system, on the range of the integrand (which is a normed linear space). Next, integration on finite measure spaces is defined. This directed system results in a net for the corresponding integrand, whose convergence determines the integrability of the integrand. Special care is taken for cases when the integrand is possibly unbounded on the domain of integration, where the integration system has special rules for the sample point that represents a measurable set in the range of the integrand. Based on this definition, we further define the integration of a measurable function on a measure space of infinite measure to be the limit of the net of integrals on finite measure subspaces of the measure space, with the partial ordering being set containment for the finite measure subspaces. Then, we prove that these definitions lead to the expected results for the
integration of simple functions. Then, the Bounded Convergence Theorem 11.77 is established for general integrands over finite measure spaces. Then, we prove that our definition leads to the same integration formula for nonnegative real-valued measurable functions over any measure space, Proposition 11.78, as Royden (1988). Now, we are fully sure that our definition of integration is correct. We then prove the Monotone Convergence Theorem and establish all well-known properties for the Lebesgue integral of nonnegative extended real-valued functions. Finally, we proceed in Sect. 11.5 to establish the Lebesgue Dominated Convergence Theorem for Banach space valued measurable functions, which culminates in Theorem 11.91. Thus, under the assumptions that the integrands are absolutely integrable over the measure space and the Banach space is separable, all important properties of integration for Banach space valued functions are established in Proposition 11.92. The integral version of Jensen's Inequality is established in Theorem 11.98.

With these developments in integration theory, I couldn't help wondering whether a measure can be Banach space valued. The integrals on measurable subsets over a measure space of a Banach space valued measurable function seem to generate a Banach space valued measure on the original measurable space. The question is then how to define such a measure. The definition is relatively straightforward if, for every measurable set, the Banach space valued measure generates a vector in the Banach space. This leads to the definition of a pre-measure on a measurable space. For the pre-measure to be countably additive, one must assume that the measures of countable collections of disjoint measurable sets are absolutely summable. Taking this idea to the limit, when the collection of disjoint sets is small in size but covers the entire measurable space, we inevitably stumble upon the total variation of the Banach space valued pre-measure.
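The total variation intuition can be checked on a toy finite measurable space (hypothetical values): for an R-valued measure given on atoms, the supremum over partitions of Σ|μ(E_i)| is attained by the partition into atoms, while coarser partitions lose mass to cancellation.

```python
# A signed (R-valued) measure on X = {0, 1, 2, 3} with the power-set
# sigma-algebra, specified by its values on atoms (hypothetical values).
atom_measure = {0: 1.5, 1: -2.0, 2: 0.25, 3: -0.75}

def mu(E):
    # Additivity: the measure of a subset is the sum over its atoms.
    return sum(atom_measure[x] for x in E)

def total_variation(E):
    # |mu|(E) = sup over finite measurable partitions of E of sum |mu(E_i)|;
    # on a finite space the supremum is attained by the partition into atoms.
    return sum(abs(atom_measure[x]) for x in E)

X = {0, 1, 2, 3}
coarse = abs(mu({0, 1})) + abs(mu({2, 3}))  # a coarser partition of X

print(mu(X), total_variation(X), coarse)  # cancellation: coarse < |mu|(X)
```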
This total variation must be a finite measure on the measurable space. Now, the intuition is obtained and the rest is just working out the details. To define a Banach space valued measure μ on a measurable space, there must first be a measure ν on the measurable space, which is known as the total variation of μ, to be denoted by P ∘ μ. Whenever a measurable set E has ν(E) = ∞, μ(E) is undefined; on the other hand, when ν(E) < ∞, then μ(E) is a point in the Banach space and satisfies countable additivity for the subsets of E. ν(E) must then relate to μ(E) as the set E is broken down into ever smaller disjoint pieces (E_i)_{i=1}^n, and the sums of the norms of the μ(E_i)'s must give us back ν(E). This definition is very different from its obvious counterpart in Royden (1988), the signed measure. But, this is the correct path forward. We confirm the intuition that the integrals on measurable subsets over a measure space of a measurable function f(·) indeed generate a Banach space valued measure on the original measurable space, where the total variation is simply the integral on the measurable subsets over the measure space of the norm of the measurable function f(·) (see Proposition 11.116). To generate a σ-finite Banach space valued measure space, we may piece together countably many finite Banach space valued measure spaces as long as they are compatible on the common domain of definition. The resulting σ-finite Banach space valued measure space admits all of the finite Banach space valued measure spaces as measure subspaces. This is the so-called generation process (Proposition 11.118), and the resulting σ-finite Banach space
valued measure space is furthermore unique. There will be a few opportunities in the book to apply the generation process to yield desired σ-finite Banach space valued measure spaces. Following that, we define integration over Banach space valued measure spaces along the same lines as for measure spaces. The integral of simple functions over a Banach space valued measure space is determined and is as expected. Then, the Lebesgue Dominated Convergence Theorem 11.131 is established. We define a function to be absolutely integrable over a Banach space valued measure space if its norm function is integrable over the total variation of the Banach space valued measure. Under the assumption of absolute integrability of the integrands, the most important properties of integration hold for integration over Banach space valued measure spaces (see Proposition 11.132).

Now, we have Banach space valued measure spaces. For a fixed measurable space X := (X, B) and a fixed Banach space Y, the collection of all finite Y-valued measures on X forms a Banach space Mf(X, Y), where the norm of each finite Y-valued measure is its total variation of the set X. The key here is the definition of vector addition and scalar multiplication for finite Y-valued measures. Vector addition and scalar multiplication are defined for σ-finite Y-valued measures as well, via the generation process. Then, the collection of all σ-finite Y-valued measures on X forms a vector space Mσ(X, Y). We can further define a topology on Mσ(X, Y) whose restriction to Mf(X, Y) is exactly the one induced by the norm on Mf(X, Y). Since Mσ(X, Y) is then a vector space with a topology, we can talk about continuity and convergence in Mσ(X, Y).
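On a finite measurable space, the vector-space and norm structure of Mf(X, Y) just described reduces to setwise operations and an atom-wise total variation (a minimal sketch with K-valued measures and hypothetical values):

```python
def add(mu, nu):
    # Setwise vector addition of two measures given on atoms.
    return {x: mu[x] + nu[x] for x in mu}

def scale(c, mu):
    # Setwise scalar multiplication.
    return {x: c * mu[x] for x in mu}

def tv_norm(mu):
    # The norm on M_f: the total variation of the whole space,
    # which on a finite space is the sum of |mu({x})| over atoms.
    return sum(abs(v) for v in mu.values())

mu = {0: 1.0, 1: -2.0}       # hypothetical finite signed measures
nu = {0: -0.5, 1: 0.5}

# The total variation norm satisfies the triangle inequality.
print(tv_norm(add(mu, nu)), tv_norm(mu) + tv_norm(nu))
```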
Then, we prove the Lebesgue Dominated Convergence Theorem 11.151 in the general setting of Mσ(X, Y), where the underlying measure on the measurable space is also changing but converges to some σ-finite Y-valued measure μ according to the topology on Mσ(X, Y). The result of Proposition 11.116 is further generalized to σ-finite Y-valued measure spaces. The integrations on measurable subsets over a σ-finite Y-valued measure space of a measurable function f(·) generate a σ-finite Banach space valued measure on the original measurable space, where the total variation of the new measure is upper bounded by the integration of f(·) on the measurable subsets over the total variation of the Y-valued measure space (see Proposition 11.153). The upper bound on the total variation is exactly the total variation if we restrict Y to be K (see Proposition 11.154). Hence, up to this point, we have the following: given a measurable function over a σ-finite (K-valued) measure space, it is possible to integrate the function to generate a σ-finite Z-valued measure on the measurable space, where Z is a Banach space which contains the range of the measurable function. The inverse question of the above is interesting and is resolved by the Radon-Nikodym Theorem. The fundamental condition that allows this inverse operation (the Radon-Nikodym derivative) is that the σ-finite Z-valued measure must be absolutely continuous with respect to the σ-finite K-valued measure. Once the Radon-Nikodym derivative is known, integration with respect to the Z-valued measure can easily be carried out with respect to the K-valued measure. This is carried out in Sect. 11.8, which culminates in Theorems 11.169 and 11.171, where the first is for measures and the second is for Banach space valued measures. Section 11.9 studies


the spaces Lp(X, Y), where p ∈ [1, ∞] ⊂ Re, X is a σ-finite measure space, and Y is a separable Banach space. Minkowski's Inequality and Hölder's Inequality are then presented, which guarantee that L̄p(X, Y) is a pseudo-normed linear space. If we take the quotient space of L̄p(X, Y) with respect to the null space of the pseudo-norm, we obtain the normed linear space Lp(X, Y), which is a Banach space if Y is a Banach space. The main objective is to characterize the dual space of Lp(X, Y). This leads to the Riesz Representation Theorem 11.186, which says that (Lp(X, Y))* = Lq(X, Y*), where p ∈ [1, ∞) and q ∈ (1, ∞] with 1/p + 1/q = 1, X is a σ-finite measure space, and Y is a separable reflexive Banach space with Y* being separable. Chapter 11 ends with Sect. 11.10, which contains the study of the Riesz Representation Theorem on the characterization of the dual of C(X, Y), where X is a compact Hausdorff topological space and Y is a normed linear space. Under certain conditions, we have that (C(X, Y))* equals the space of all finite Y*-valued topological measures on X, Mft(X, Y*) (see Theorem 11.201). The condition in this theorem is easily satisfied if the measure space X is a compact rectangle in Rm (see Theorem 11.204) or if Y is finite-dimensional (see Theorem 11.205). Along the way, we provide sufficient conditions under which C(X, Y) is separable (see Proposition 11.206), and prove that for a locally compact separable metric space X, Mf(X, BB(X), Y) = Mft(X, Y), i.e., a finite Y-valued measure on the Borel measurable space is automatically a Y-valued metric (topological) measure (see Theorem 11.198). The notation Cc(X, Y) appears often in the literature; it is defined to be the space of continuous functions from a finite-dimensional Banach space X to a finite-dimensional Banach space Y that have compact support.
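For orientation, the classical one-line statements behind these results read as follows (generic scalar-pairing forms; the vector-valued versions in the text carry the additional hypotheses on Y stated above):

```latex
% H\"older's Inequality, with 1/p + 1/q = 1:
\int_X \|f(x)\|\,\|g(x)\| \, d\mu \;\le\; \|f\|_{p}\,\|g\|_{q};
% Riesz Representation: every bounded linear functional on L_p(X, Y)
% arises from some g \in L_q(X, Y^{*}) via the pairing
f \;\longmapsto\; \int_X \bigl\langle g(x), f(x) \bigr\rangle \, d\mu(x).
```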
This puzzle is resolved in this section, where it is clarified that Cc(X, Y) is the space of all Y-valued continuous functions on X that converge to ϑY as ‖x‖X → ∞. We characterize the dual (Cc(X, Y))* = Mft(X, Y*) = Mf(X, BB(X), Y*) and obtain the fact that Cc(X, Y) is separable in Theorem 11.209. This characterization allows for a definition of weak* convergence in Mft(X, Y*) as the notion of convergence in distribution on the space of measures. This definition (Definition 11.210) puts the concept of convergence in distribution on solid ground, as it is exactly weak* convergence in Mft(X, Y*). Note that the space of probability measures on X is a subset of Mft(X, R). Chapter 12 is the climax of this mathematical journey, linking integration, the Radon-Nikodym derivative, and the Fréchet derivative together in a way that allows us to go smoothly from one to the other, to solve the puzzle of the connection between the multi-dimensional Riemann integral and the Lebesgue integral that we have defined, and to derive, in the Lebesgue integration setting, the Change of Variable formula that we are so familiar with from Riemann integration. Since Banach space valued measures were introduced, to obtain product measure spaces it becomes necessary to derive a generalized Carathéodory Extension Theorem 12.4 to handle these measure spaces. Section 12.2 introduces the concept of isomeasure: there are two measurable spaces X and X̄, each endowed with a (Banach space valued) measure, μ and μ̄, respectively. The two measures are said to be isomeasuric if there is a bijective mapping g : X → X̄ such that the inverse image of any measurable set in X̄


is measurable in X; the image of any measurable set in X is measurable in X̄; the two total variations of the two measures agree (are equal to each other) on any measurable set in X̄ and its inverse image under g in X; and the two measures agree (are equal to each other whenever one of them is defined, or both remain undefined when the common total variation is ∞) on any measurable set in X̄ and its inverse image under g in X. Two (Banach space valued) measure spaces being isomeasuric implies that they are equivalent up to a relabeling of elements. Isomeasure preserves completeness, finiteness, and σ-finiteness of measure spaces. A step further, we introduce the concept of homeomorphic isomeasure between two (Banach space valued) topological measure spaces, which is basically a bijective mapping that is an isomeasure and a homeomorphism at the same time. Two topological measure spaces are equivalent up to a relabeling of elements if they are homeomorphically isomeasuric. Given a topological measure space, a Borel measurable space on another topological space, and a homeomorphism between the two topological spaces, the induced measure on the measurable space, based on the bijective mapping and the topological measure space, is such that the topological measure space and the topological measurable space with the induced measure are homeomorphically isomeasuric under the given mapping. It is straightforward to prove that integration on one of the measure spaces can be easily computed via integration on the other measure space after the integrand is composed with the bijective mapping. As a prelude to the ultimate Change of Variable Theorem, we obtain Theorem 12.16: we have two measure spaces and a bijective mapping between them; the mapping induces a measure on the second measure space based on the first measure space.
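The transfer of integration just described, together with the Radon-Nikodym step that the discussion turns to next, can be sketched as follows (generic notation, not the book's: g is the bijective mapping, μ_g the measure it induces on the second space, and h an integrand):

```latex
% Change of measure under a bijective mapping g : X \to \bar{X} (sketch):
\int_{X} h \, d\mu
  \;=\; \int_{\bar{X}} \bigl( h \circ g^{-1} \bigr) \, d\mu_{g},
\qquad \mu_{g} := \mu \circ g^{-1};
% if \mu_g is absolutely continuous with respect to \bar{\mu},
% the Radon--Nikodym derivative enters as a multiplicative factor:
\int_{X} h \, d\mu
  \;=\; \int_{\bar{X}} \bigl( h \circ g^{-1} \bigr)
        \,\frac{d\mu_{g}}{d\bar{\mu}} \, d\bar{\mu}.
```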
Now, if we know that the induced measure admits a Radon-Nikodym derivative with respect to the measure on the second measure space, then we may easily calculate the integral on the first measure space by calculating the integral on the second measure space with respect to the induced measure, which can be expressed as an integration over the second measure space as long as we multiply the Radon-Nikodym derivative into the integrand. The ultimate theorem on change of variables will delineate the conditions under which the induced measure admits the Radon-Nikodym derivative as desired. This task is postponed until Sect. 12.6. In Sect. 12.3, we establish the existence of the product (K-valued) measure space of finitely many (K-valued) measure spaces. These results are motivated by the Math 442 notes of Professor Peck of the University of Illinois at Urbana-Champaign. Special care is taken to deal with Borel measure spaces, rather than Lebesgue measure spaces. Here, things are actually simpler. A measurable set in the product measure space has all of its sections in the multiplier measure spaces measurable. Then, a measurable function on the product measure space has all of its sections in the multiplier measure spaces measurable. Tonelli's Theorem 12.29 allows the integral over product measure spaces of nonnegative real-valued integrands to be written as repeated integrals, where we have to allow integrals of nonnegative extended real-valued functions, introduced in Definition 11.79. Tonelli's Theorem is usually used to prove that a function is absolutely integrable over the product measure space. Under the assumption that the integrand is absolutely integrable over the


product measure space, Fubini's Theorem then gives conditions that allow the joint integral to be computed iteratively. It is observed that general integration can be done if the range of the integrand is a subset of a separable Banach space. But, for product measure spaces, if the integration system is set up on a separable Banach space, there is no guarantee that iterative integrals will make sense: first integrating with respect to one multiplier measure space may not yield a measurable function for the next integration. Here, we can say that the product integral can be written as repeated integrals if we appropriately define the intermediate integrand so that it is measurable and equals the integral with respect to the first multiplier measure space almost everywhere whenever the first iterative integral is absolutely integrable. Fubini's Theorem 12.30 promises that such an expression as iterated integrals always exists, and that if one can make the integrand for the second iterative integration step measurable and equal to the integral with respect to the first multiplier measure space almost everywhere, then one can write the product integral as repeated integrals using such an integrand. The picture for integrands whose range is a subset of a σ-compact conic segment of a Banach space is much simpler. All one needs to do (see Fubini's Theorem 12.31) is to write the product integration as a repeated integral and, in the integration with respect to the first multiplier measure space, simply set the integration result to the null vector wherever the integrand is not absolutely integrable (with respect to the first multiplier measure space); then the second integration automatically has an absolutely integrable integrand. This result readily applies to finite-dimensional Banach space valued integrands.
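The scalar prototype behind these Tonelli/Fubini statements is the familiar identity below (generic notation; for Tonelli, f is nonnegative and measurable on the product space, with values in the extended reals):

```latex
\int_{X \times Y} f \, d(\mu \times \nu)
  \;=\; \int_X \Bigl( \int_Y f(x, y)\, d\nu(y) \Bigr) d\mu(x)
  \;=\; \int_Y \Bigl( \int_X f(x, y)\, d\mu(x) \Bigr) d\nu(y).
```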
In this section, we further prove that the finite product measure space of second countable topological measure spaces is a second countable topological measure space (with respect to the product topological space). Things are nice here. The same can be said when the multiplier topological measure spaces are K-valued topological measure spaces. Fubini's Theorem 12.35 generalizes Theorem 12.30 to K-valued product measure spaces. We see that (Banach space valued) measures are important objects to deal with. The question is how to keep track of them, even when their domain is Rm. It is known that, in the one-dimensional case, a measure can be represented by its cumulative distribution function, which is a function of bounded variation when the measure is finite. In Sect. 12.4, we embark on a study to characterize so-called functions of locally bounded variation, which are functions on Rm that will correspond to the cumulative distribution functions of (Banach space valued) measures on Rm. The first observation is that, for a measure on the closed interval [1, 2], the cumulative distribution function must live on an interval (1 − ε, 2], unless the rule of the cumulative distribution function is modified at the end points of the interval. This modification seems difficult to characterize for measures on Rm. Also, things get weird if the domain of the cumulative distribution function is non-standard and too complex. As motivated by the one-dimensional case, we must define these cumulative distribution functions to be continuous on the right. (Continuous on the left works as well, if we insist that cumulative distribution functions are continuous on the left at every point in the domain. We just have to pick one and stick with it.) Then, we proceed first to define the notion of a region Ω, on which the cumulative distribution functions are defined, and its principal P(Ω), on which the measure is


defined. The region Ω is a countable union of closed rectangles in Rm, while the principal P(Ω) is the countable disjoint union of the corresponding semi-open rectangles in Rm that further satisfies the assumption that, for any closed rectangle contained in Ω, its corresponding semi-open rectangle must be contained in P(Ω). As defined, there is a unique P(Ω) for each region Ω, but not vice versa. It is easy to see that a region and its principal are (Borel) measurable sets. The principal enlarges if the region enlarges. A rectangle in Rm must be a region, and its principal is a rectangle that can be easily characterized (Proposition 12.39). A countable union of semi-open rectangles in Rm must be a region, whose principal is itself. A product of regions is a region, whose principal is the product of the principals of the regions. An open subset of Rm is a region whose principal is itself. Then, we proceed to define functions of locally bounded variation on a region. These are functions that are, first of all, continuous on the right. We define the increment of such a function on a semi-open rectangle (whose closure is a subset of the region) to be an alternating sum of the values of the function at the vertices of the rectangle. We define the total variation of such a function on a semi-open rectangle (whose closure is a subset of the region) to be the supremum of the sum of the norms of the increments of the function on semi-open rectangles whose disjoint union is the given semi-open rectangle. Second of all, for a function to be of locally bounded variation, it must have finite total variation on any semi-open rectangle whose closure is a subset of the region.
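For concreteness, in the planar case the increment of a function F on a semi-open rectangle is the familiar alternating sum over vertices (a sketch of the standard vertex formula; the sign of each term is determined by how many of its coordinates sit at the lower corner):

```latex
% Increment of F on (a, b] = (a_1, b_1] \times (a_2, b_2] \subset R^2:
\Delta_F\bigl((a, b]\bigr)
 \;=\; F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2);
% in general, summing over the 2^m vertices v of the rectangle in R^m:
\Delta_F\bigl((a, b]\bigr)
 \;=\; \sum_{v} (-1)^{\#\{\, j \,:\, v_j = a_j \,\}} \, F(v).
```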
Lastly, on any closed rectangle that is a subset of the domain (a region), we may define the total variation function on the semi-open rectangle with opposite corners consisting of the lower right corner of the rectangle and any point in the rectangle; this function must be continuous from the right. So, there are three conditions for a function to be of locally bounded variation. The total variation on semi-open rectangles reminds us of the total variation of a Banach space valued measure, and the last condition, on the right continuity of the total variation function, reminds us of the fact that the total variation must be a measure and therefore must have a right continuous cumulative distribution function. The function is said to be of bounded variation if its total variation on the domain (region) is finite. We hope that we have nailed this definition. To prove this, we define a cumulative distribution function for a σ-finite Banach space valued measure on the principal of the region to be one such that, for any semi-open rectangle whose closure is a subset of the region, the measure of the semi-open rectangle equals the increment of the cumulative distribution function on the semi-open rectangle. In Theorem 12.50, we establish the result that for any Banach space (Y) valued function of locally bounded variation defined on a region that is a subset of Rm, there exists a unique σ-finite Y-valued measure on the principal of the region such that the function is a cumulative distribution function of the Y-valued measure, the total variation of the measure on any semi-open rectangle whose closure is a subset of the region is equal to the total variation of the function on the semi-open rectangle, and the total variation of the measure on the entire principal of the region is equal to the total variation of the function. This gives one way of the correspondence.
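The defining identity of a cumulative distribution function, as described above, can be recorded in one line (a transcription of the condition just stated, in generic notation with Δ_F the increment of F):

```latex
% F is a cumulative distribution function of the measure \mu when
\mu\bigl((a, b]\bigr) \;=\; \Delta_F\bigl((a, b]\bigr)
\quad\text{for every semi-open rectangle } (a, b]
\text{ whose closure lies in the region.}
```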
For any finite Y-valued Borel measure on the principal of a region in Rm, we can always find a function of locally bounded variation defined on the region that is the cumulative


distribution function of the measure (see Proposition 12.51). So, the converse is valid for finite measures. For σ-finite Y-valued measures, a cumulative distribution function may fail to exist and need not be unique. The converse holds (see Proposition 12.52) if the region in Rm is a rectangle Ω (which must be a region by Proposition 12.39, as we have discussed) and the Y-valued measure is on the principal of Ω: if the total variation of the measure on any semi-open rectangle whose closure is a subset of the rectangle (region) is finite, then for any point x0 in the region we may find the cumulative distribution function of the measure with origin x0, defined on the region, which is a function of locally bounded variation; for the total variation of the Y-valued measure, we may find the cumulative distribution function of the total variation with origin x0, and this function is also of locally bounded variation, and its increment on any semi-open rectangle whose closure is a subset of the region equals the total variation of the cumulative distribution function for the Y-valued measure on the semi-open rectangle. For more general regions, a cumulative distribution function may not be unique, or may not even exist, since the increment of a function has many degrees of freedom that may not be captured by just specifying the origin x0. With these converses done, the chapter brings up the question of whether a function of locally bounded variation or a cumulative distribution function of a Y-valued measure is measurable on Rm. Propositions 12.54 and 12.55 show that the cumulative distribution function on a region defined for finite Y-valued measures and the cumulative distribution function with origin x0 defined on a rectangle for σ-finite Y-valued measures are measurable. For arbitrary functions of locally bounded variation, it should be clear that the one-dimensional case is fine.
For higher dimensional domains, there is too much freedom in the function to allow for a positive conclusion. A prime example is Rm, the m-dimensional product of the Borel measure space R; here the m-dimensional outer measure induced by the m-dimensional Borel measure of a set E is equal to the infimum over the sums of m-dimensional Borel measures of any countable open covering of E. The measurable sets with respect to this outer measure (the m-dimensional Lebesgue outer measure) are the m-dimensional Lebesgue measurable sets. Since the Borel measure μB on R is translation invariant, so are the m-dimensional Borel measure μBm and the m-dimensional Lebesgue measure μLm. Section 12.4 ends with the definition of the Lebesgue-Stieltjes Integral: the integral on a region in Rm of a measurable function with respect to some function of locally bounded variation is defined to be the integral on the principal of the region of the measurable function with respect to the unique σ-finite Banach space valued measure that corresponds to the function of locally bounded variation, as prescribed in Theorem 12.50. Section 12.5 investigates the properties of cumulative distribution functions for Banach space valued measures that are absolutely continuous (admit a Radon-Nikodym derivative) with respect to the m-dimensional Borel measure μBm. In essence, we seek to define absolutely continuous functions on Rm. On R, the corresponding definition is widely known. On higher dimensional spaces, the m-dimensional integral of the Radon-Nikodym derivative still allows additional degrees of freedom in the result, in terms of integration constants (functions) on the lower dimensional surfaces of the region. A function can be the cumulative


distribution function of an absolutely continuous measure (with respect to μBm) without being continuous. This is obviously an undesirable aspect. Thus, we define a function on a rectangle to be absolutely continuous at a point x0 if freezing any subset of the m coordinates yields an absolutely continuous function in a lower dimension, and if, in the m-dimensional space, for every ε > 0 there exists a δ > 0 such that, for any pairwise disjoint semi-open rectangles in a vicinity of x0 whose μBm measures sum to less than δ, the norms of the increments of the function on the semi-open rectangles sum to less than ε. This seems to be a natural way of defining absolutely continuous functions in higher dimensions. The reason why in this definition the domain is restricted to be a rectangle is that the ε-δ language does not make sense if the domain is full of holes or has weird boundaries. A possible alternative to this choice is to consider open domains, which should work just as well. But the former allows for piecing together absolutely continuous functions on closed rectangles, which results in an absolutely continuous function on the joint rectangle. The rectangular domain allows investigation of absolute continuity on the boundary of a set. In view of this, one just needs to have a big set on which the function is absolutely continuous. A smaller region in the big set might be investigated for physical phenomena. It is proved in the section that if a function is absolutely continuous at a point, then it is continuous at that point. Vectorizing two absolutely continuous functions on the same domain yields an absolutely continuous function. Absolute continuity is preserved under restriction of the domain. Piecing together absolutely continuous functions on two smaller rectangles yields an absolutely continuous function on the joint rectangle if the smaller rectangles are both relatively open or both relatively closed.
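The m-dimensional part of this definition can be written out symbolically (a transcription of the ε-δ condition above, in generic notation with Δ_F the increment of the function F on a semi-open rectangle):

```latex
\forall\, \varepsilon > 0 \;\; \exists\, \delta > 0 :\quad
R_1, \dots, R_n \text{ pairwise disjoint semi-open rectangles near } x_0,\;
\sum_{i=1}^{n} \mu_{Bm}(R_i) < \delta
\;\;\Longrightarrow\;\;
\sum_{i=1}^{n} \bigl\| \Delta_F(R_i) \bigr\| < \varepsilon .
```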
Closely related to absolutely continuous functions on intervals are Lipschitz continuous functions. A Lipschitz continuous function composed with an absolutely continuous function on an interval is again an absolutely continuous function. This fact is the key to the integrability of nonlinear ordinary differential equations. There is no counterpart of this result in higher dimensions, which accounts for the difficulty in the solvability of general partial differential equations. The product of two absolutely continuous functions on intervals remains absolutely continuous. This is the reason why the Integration by Parts Theorem exists. Again, there is no counterpart of this result in higher dimensions. The composition of a bounded linear operator with an absolutely continuous function on Rm is again absolutely continuous. The key word here is "linear", as this does not work for nonlinear operators. An absolutely continuous function is automatically of locally bounded variation. Therefore, there exists a unique σ-finite Y-valued measure that admits the function as a cumulative distribution function (see Proposition 12.73). This Y-valued measure is absolutely continuous with respect to μBm and therefore admits a Radon-Nikodym derivative f if Y is reflexive and separable with Y* being separable. Then, the Y-valued measure is the measure with kernel f over Rm. Then, the increment of the function on a semi-open rectangle is equal to the integral of f over the semi-open rectangle with respect to μBm. The converse direction of the preceding discussion works like this: start with a rectangle in Rm and a σ-finite Y-valued measure on its principal (as a topological subspace of Rm), and assume that the Y-valued measure is absolutely


continuous with respect to μBm on the principal; then the cumulative distribution function of the Y-valued measure with any origin x0 in the rectangle is absolutely continuous (Proposition 12.75). So far, we have been working only with integration and the Radon-Nikodym derivative. Section 12.6 studies the link between integration and the Fréchet derivative. The key here is Vitali's Lemma. For higher dimensional spaces (beyond one), the key is to adapt Vitali's Lemma to rectangles rather than balls. Then, we may define the Lebesgue points of an open set in Rm and show that the set of Lebesgue points is Lebesgue measurable (not Borel), and that its complement in the original open set has (m-dimensional) Lebesgue measure 0 (Proposition 12.80). Given a right continuous function on Rm, it follows directly that the partial derivatives with respect to the coordinate variables are μBm-measurable, and that the Fréchet derivative of the function is μBm-measurable. The questions are when these derivatives exist almost everywhere, and when the interchange of integration and differentiation is allowed. The standard result in the Riemann integral case directly carries over to the Lebesgue integral (only in one dimension; see Theorems 12.82 and 12.83). Here, I also consulted the book Spivak (1965). In Fundamental Theorem of Calculus I, Theorem 12.86, we show that a function defined as an m-dimensional integral of a measurable function that is absolutely integrable over any semi-open rectangle in the open rectangular domain must admit all partial derivatives along all coordinate variables almost everywhere μBm. The partial derivatives are equal (almost everywhere μBm) to the (m − 1)-dimensional integral, subject to measurability concerns.
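The one-dimensional prototype of Fundamental Theorem of Calculus I is worth keeping in mind (a standard statement for a locally absolutely integrable f; the m-dimensional version in Theorem 12.86 replaces the derivative by partial derivatives and (m − 1)-dimensional integrals):

```latex
F(x) \;=\; \int_{(x_0,\, x]} f \, d\mu_B
\quad\Longrightarrow\quad
F'(x) \;=\; f(x) \quad \text{for } \mu_B\text{-almost every } x .
```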
If the function is finite-dimensional Banach space valued, then the partial derivatives along the coordinate variables are equal (almost everywhere μBm) to the (m − 1)-dimensional integral whenever that integral is absolutely integrable, and are set to the null vector whenever it is not. In Fundamental Theorem of Calculus II, Theorem 12.88, a Y-valued absolutely continuous function defined on an open rectangle of Rm can be expressed as the sum of 2^m − 1 integrals of 2^m − 1 measurable functions and the function value at a point x0 in the domain. These measurable functions are unique up to a set of measure 0 on their domain of definition once x0 is chosen. Therefore, these 2^m − 1 measurable functions are called the stream functions of the original function with respect to the point x0. The partial derivatives along the coordinate variables are equal (almost everywhere μB) to some one-dimensional stream function with respect to some point. If the stream functions are locally bounded, then the Fréchet derivative of the original function exists almost everywhere in the open rectangle. The stream functions of the original function with respect to any point x̄ can be obtained from the stream functions with respect to x0. When Y is finite-dimensional, this transformation of stream functions can be calculated whenever the integrands in the transformation are absolutely integrable, with the result set to the null vector of Y if any of the integrands is not absolutely integrable over its domain of integration. The Integration by Parts Theorem 12.89 allows for absolutely continuous functions on intervals instead of requiring C1 functions. The Change of Variable Theorem 12.91 specifies the conditions under which the induced measure is absolutely continuous with respect to


μBm and proves the usual formula, which allows for absolutely integrable integrand functions and homeomorphisms that may not be C1. In Sect. 12.7, we include a somewhat incomplete characterization of the dual of Ck(X, Y). Then, we present the specialization of the Change of Variable Theorem to two-dimensional and three-dimensional common domain transformations (Examples 12.95–12.97), and to one-dimensional general domain transformations (Proposition 12.98). There is also a result that states the equivalence of the Riemann-Stieltjes and Lebesgue-Stieltjes integrals for bounded measurable real-valued integrands with respect to a monotone integrator (Proposition 12.100). The surface measure on the unit sphere in Rm is defined in Example 12.104. In Sect. 12.8, we introduce a new definition of Sobolev space that agrees with the classical definition (Zeidler, 1995) in the one-dimensional case. The driving reason here is that for functions with generalized derivatives, one must look at absolutely continuous functions. These functions can be expressed exactly as the sum of integrals of the stream functions with respect to some point x0 in the domain. If we define the norm of an absolutely continuous function to include the L̄p norms of the function as well as of its stream functions, then one can show that the space is a Banach space. It is a space of (absolutely) continuous functions with all integral norms that is complete. When p = 2, this space is a Hilbert space with integral inner products. In Sect. 12.9, we study continuity and Fréchet differentiability with respect to x ∈ D ⊆ X of a function defined as the integral of an integrand function, with x as a parameter, over a finite or σ-finite Banach space valued measure space. For the finite Banach space valued measure space case, continuity of the function is established in Theorem 12.111. Fréchet differentiability is established in Theorem 12.112, where we allow the domain D to be somewhat general.
The result becomes standard if we assume that the domain D is open. This then allows us to prove the Leibniz Formula, Theorem 12.113. For the σ-finite Banach space valued measure space case, continuity of the function is established in Theorems 12.114 and 12.115. Fréchet differentiability is established in Theorem 12.116. One takeaway from these results is that it is best to lump the measurable part of the integrand into a signal on which the integrand depends continuously; then the result follows elegantly, with easy generalization to the Ck case. We then switch gears to show that a Lipschitz continuous function composed with a function of locally bounded variation on an interval again yields a function of locally bounded variation (Proposition 12.117). Vectorizing two functions of locally bounded variation on an interval yields a function of locally bounded variation (Proposition 12.118). We prove that any function in L̄p(X, Y), where X is a compact interval in R, can be approximated by an absolutely continuous function to any prescribed precision in L̄p(X, Y). The section ends with Taylor's Theorem 12.122 with the integral form of the remainder. Section 12.10 studies the conditions under which an integration on a compact rectangle of Rm can be calculated as repeated integrals. Here, the goal is to obtain sufficient conditions for the result to hold that are easily checked and free of technicalities. Theorem 12.124 is an obvious answer to this quest. Then, going deeper, we arrive at the Iterated Integral Theorem 12.127, which is very general and practical for applied mathematics calculations.
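The integral form of the remainder in Taylor's Theorem mentioned above reads, in the one-dimensional scalar prototype (a standard statement, for f with an integrable (n + 1)-st derivative):

```latex
f(x) \;=\; \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}\,(x - x_0)^{k}
 \;+\; \int_{x_0}^{x} \frac{(x - s)^{n}}{n!} \, f^{(n+1)}(s) \, ds .
```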


The last section of the chapter is connected with the topic of manifolds, where local charts are modeled on some Banach space over the field K. This topic is included here due to the necessity of discussing complex logarithms later, in Chap. 14, in the proof of the Central Limit Theorem 14.63. Here, for a rigorous presentation, one has to make a trip to manifolds. These results are motivated by Isidori (1995), which has an excellent introduction to differential manifolds in its Appendix A. The book Bishop and Goldberg (1980) has a proof of the representation of a tangent vector at a point of a smooth manifold of finite dimension. In Sect. 12.11.1, we provide the basic definition of a smooth and analytic manifold of variant Y, where Y is a Banach space over K. We give a few examples of smooth and analytic manifolds. We then define the logarithm function on the space C0 := C \ {0}, which takes values in a manifold (Example 12.139). Properties of this ln function are obtained in Proposition 12.140. In Sect. 12.11.2, we define tangent vectors at a point of a smooth manifold. The collection of all such vectors forms the tangent space at the point of the manifold, which is isomorphic to the reflexive Banach space Y. We then define the differential of a smooth mapping between two smooth manifolds at a point and prove that the differential is a linear map (Theorem 12.144). This leads to the Chain Rule (Theorem 12.145) in the context of manifolds. At the end of this subsection, we define the tangent covectors at a point of a smooth manifold. In Sect. 12.11.3, we define vector fields, the Lie derivative, and the Lie bracket, and establish their properties in Theorem 12.150, which includes the Jacobi Identity. Due to my determination not to deal with the existence and uniqueness of solutions of ordinary differential equations in this book, we have to stop here in the pursuit of smooth manifolds. Chapter 13 contains key results that are specific to Hilbert spaces.
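As a reminder, the Jacobi Identity recorded in Theorem 12.150 takes its standard form for the Lie bracket of vector fields X, Y, Z:

```latex
\bigl[\, X, [Y, Z] \,\bigr] + \bigl[\, Y, [Z, X] \,\bigr]
 + \bigl[\, Z, [X, Y] \,\bigr] \;=\; 0 .
```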
A pre-Hilbert space is a vector space together with a binary operation, called the inner product, that induces the norm on the space. A Hilbert space is a pre-Hilbert space that is, in addition, complete; so it is a special kind of Banach space, and all of the results for Banach spaces are valid in Hilbert spaces. The special feature of Hilbert spaces is that the Projection Theorem 13.13 holds. The dual of a Hilbert space is easily characterized in the Riesz-Fréchet Theorem 13.15, and a Hilbert space is reflexive. The dual is isometrically isomorphic to the space itself if the underlying field is real. When the underlying field is complex, the dual is isometric (but not isomorphic) to the Hilbert space. There exists a norm-preserving conjugate linear mapping from a Hilbert space to its dual, which is further invertible. With this mapping, we can define two vectors in the Hilbert space to be orthogonal if one of the vectors is orthogonal (as defined for a Banach space and its dual) to the bounded linear functional that is the image of the other vector under the conjugate map. In Sect. 13.4, we define the Hermitian adjoint of a bounded linear operator between Hilbert spaces and obtain its properties. In Sect. 13.5, we present the Gram-Schmidt procedure and other results related to orthonormal sequences in Hilbert spaces. Legendre polynomials are discussed in Example 13.34. The L2 theory for Fourier series is presented in Example 13.36. The Projection Theorem is revisited in Sect. 13.6, and the minimum norm problem for convex sets is covered. Then, we define positive-definite and
positive semi-definite operators among Hermitian operators on a Hilbert space and derive their properties in Sect. 13.7. Section 13.8 is devoted to the study of projection operators on Hilbert spaces and the pseudoinverse of bounded linear operators between Hilbert spaces. The last section of this chapter, Sect. 13.9, presents the study of eigenvectors and eigenvalues for a bounded linear operator on a Hilbert space. In general, an eigenvector or eigenvalue may not exist. But once we restrict ourselves to compact Hermitian linear operators, this concept is very useful and generalizes common understandings for Hermitian matrices. In Spectral Theory, Theorem 13.52, the singular value decomposition of a compact linear operator between Hilbert spaces is obtained, as is the eigenstructure of a compact Hermitian linear operator. This topic is motivated by Zeidler (1995) and the well-known theory for matrices. It is shown in Example 13.53 that linear integral operators with continuous kernels are compact operators between L2 spaces. In Proposition 13.55, certain matrix-like properties are established for compact linear operators. For compact Hermitian linear operators, the Fredholm Alternative, Proposition 13.56, establishes certain matrix-like properties. In Example 13.57, we establish that the first-order Sobolev space W2,1,x0(X, Y) is a Hilbert space when X is a σ-finite metric measure subspace of R^m and Y is a separable Hilbert space with Y* being separable. Chapter 14 is devoted to probability theory. The material in this chapter is inspired mainly by Williams (1991) and the MATH 451 and 452 notes of Professor Burkholder of the University of Illinois at Urbana-Champaign. The fundamental notions in probability theory are introduced in Sect.
14.1: random variables, Banach space valued random variables, the expectation, independence of σ-algebras, independence of random variables, conditional expectation, the law of a random variable, and its probability density function. The conditional expectation of a random variable given a sub-σ-algebra is defined to be the Radon-Nikodym derivative of appropriately defined measures (see Definition 14.7). The main properties are listed in Proposition 14.11, which is motivated by Williams (1991). We then prove the Fundamental Theorem of Modeling, Theorem 14.15, which allows all probability problems to be set up as measure-theoretic problems. After a section on Gaussian random variables and vectors (Sect. 14.2), the Weak Law of Large Numbers, Theorem 14.29, is presented in Sect. 14.3. Then, we turn directly to the study of Martingale theory in Sect. 14.4, defining a stochastic process as a measurable function on the product measure space of the probability measure space and the σ-finite measure space of the time domain. Proofs of Doob's Optional Stopping Theorem 14.36, Doob's Upcrossing Lemma 14.39, and Doob's Forward Convergence Theorem 14.41 are given. Here, the Law of Large Numbers is revisited, and the Strong Law of Large Numbers (Theorem 14.44) is proved. We then turn in Sect. 14.5 to the study of Banach space valued Martingales indexed by Z+. We present the generalized Doob's Optional Stopping Theorem 14.48, the Kolmogorov 0-1 Law, Theorem 14.50, and some results on the tail σ-algebra in Proposition 14.51. Then, we present results (especially Lévy's Inversion Formula, Theorem 14.55) related to the characteristic function of a real-valued random variable in Sect. 14.6.


In Sect. 14.7, we discuss the modes of convergence and formally apply Definition 11.210 in the probability measure space context, Definition 14.56. Then, the Modes of Convergence Theorem 14.57 follows readily from the definition. We introduce the concept of uniformly growth boundedness in Definition 14.58, which is also known as the tightness condition in Williams (1991). The convergence in distribution notion we define is weaker than that of Williams (1991). Proposition 14.60 then gives the conditions that bridge the classical notion of convergence in distribution and our notion of convergence in distribution. The uniformly growth boundedness condition plays a key role in this result, which depends on the Skorokhod Representation Theorem 14.59. In Sect. 14.8, we proceed to present the Central Limit Theorem 14.63, which depends on Helly's Lemma 14.61 and Lévy's Convergence Theorem 14.62. Results on the characteristic function and its inversion formula have direct application to the Fourier transform. In Sect. 14.9, we present some results on uniformly integrable Martingales. One of my goals in this mathematics study has been to establish the existence of the Wiener process so as to provide a solid foundation for my past control research. It was sad that I was not able to do this before late July 2018, when I opened my MATH 452 notes of Professor Burkholder of the University of Illinois at Urbana-Champaign and found that the proof for the Wiener process was right there, which was not there the last time I checked the notes. So, finally, God opened the door to probability theory for me then. Section 14.10 contains a rigorous proof of the existence of the Wiener process. In Theorem 14.70, we prove that the conditional expectation is the least squares estimator of a random variable given any observation σ-algebra. Proposition 14.71 is an effort to generalize Proposition 14.60 to Rm-valued random variables.
It is used in the proof of Proposition 14.72, which allows us to form infinite vector sums of independent identically distributed Gaussian random variables and conclude that the sum is an Rm-valued Gaussian random variable. The construction of the Wiener process begins with the Haar functions (Example 14.73). We then construct the Wiener process on the unit interval, and then on [0, ∞) ⊂ R. Then, we define and construct the m-dimensional (Rm-valued) standard Wiener process (Definition 14.78 and Example 14.79). Also included is the proof that the Wiener process is nowhere differentiable with probability 1. In Sect. 14.11, we study continuous-time Martingale processes, showing that a particular transformation of the Wiener process yields a class of Martingale processes in the continuous time domain. Lemma 14.85 is a key technical lemma that allows us to analyze the maximum over time of the absolute value of the Wiener process. The main result of this section is the Law of Iterated Logarithms (Theorem 14.87). I then study continuous-time stopping times and prove a result on the stopped Wiener process, Proposition 14.94. The climax of this chapter is the last three sections, on stochastic integration. Here, stochastic integrals are defined with respect to the standard Wiener process when the integrand is of bounded variation, so that the integral exists in the Riemann-Stieltjes sense. The integrand can then be generalized to the L̄2 deterministic signal case, which yields the Wiener integral, Proposition 14.98; and also to the L̄2 random signal case, which yields the Itô integral, Definition 14.104. Yet, in these more general
integrals, the resulting function may not be a stochastic process. It was hinted in Professor Burkholder's notes that a modification of the resulting function on sets of measure zero may be performed that renders the resulting function a stochastic process, i.e., an Itô process. These results were beyond my grasp before June 2020. After reading Professor Burkholder's notes again, I understood his approach to the Itô process. Results on the resulting Itô process are summarized in Sect. 14.13. In these sections, Proposition 14.107 shows that if the integrand process is in the subspace U(ra,b × Ω, Y), then the integral can be approximated by integrals of the sample-and-hold version of the integrand process. As it turns out, the Itô process exists for general L̄2 random integrands that are Itô integrable, and it is unique up to a set of probability zero. But, for Itô's Formula, Theorem 14.122, to hold, we need to impose some mild integrability assumptions for various technical reasons. Thus, we need the integrand process to be in U(ra,b × Ω, Y). The main difficulty is with the quadratic variation term in the second-order Taylor expansion of the desired Itô process. For this term to converge to the desired formula, I have to impose growth conditions. The formula applies to general separable Hilbert space valued stochastic processes. The result is elegant and specializes nicely when the involved Hilbert spaces are finite-dimensional. The key here is the assumption that the integrand processes satisfy the Riemann Criterion for Integrability, which is necessary and sufficient in the finite-dimensional case but only sufficient in the general Hilbert space case. In April 2020, I finally proved Girsanov's Theorem 14.123 after consulting Gihman and Skorohod (1972), Stroock and Varadhan (1979), and Elliott (1982).
Thus, my earlier research on linear exponential of quadratic Gaussian control is on solid ground if Girsanov's Theorem holds, in particular, if the exponential formula ξ_0^t(f) is a Martingale, which is always a super-Martingale to begin with. I had trouble deriving a sufficient condition for ξ_0^t(f) to be a Martingale. Then, I spotted Theorem 13.27 and Example 13.32 in Elliott (1982). The proofs for these results require localization results for stochastic integrals that are beyond my grasp. So, I think that I will stop here in my pursuit of probability theory. My earlier research on jump linear systems is entirely correct. My earlier research on nonlinear stochastic systems is just a pointer to what one can do in these nonlinear systems but is not in general rigorous in terms of theory; key integrability assumptions are missing in those results. Appendix A includes some results in elementary calculus and some results on the Riemann-Stieltjes integral, generalized to Banach space valued integrands and integrators. These results are necessary for the discussion of stochastic integration and are motivated by the excellent results of Bartle (1976). We define the Riemann-Stieltjes integral as a limit of an appropriate net, and then prove the Cauchy Criterion for Integrability, the bilinearity of the Riemann-Stieltjes integral, and Integration by Parts (Theorem A.8). The Integration by Parts Theorem is where the Riemann-Stieltjes integral really shines: it holds under the single assumption that one of the integrals is well-defined. Then, we continue to prove the Riemann Criterion for Integration (Theorem A.9). This leads to the Integrability Theorem A.10, which connects the Riemann-Stieltjes integral to the Lebesgue-Stieltjes integral that we have been discussing in the main text of this book. It also establishes a sufficient
condition under which the Riemann-Stieltjes integral converges in gauge mode and gives bounds on the Riemann-Stieltjes integral. In Theorem A.11, we prove that the Riemann-Stieltjes integral must converge in gauge mode whenever it exists if the integrator is idR. In Theorem A.12, we give a sufficient condition under which the Riemann-Stieltjes integral is continuous with respect to the terminal time of the integral. In Theorem A.13, we give results on the absolute Riemann-Stieltjes integrability of the integrand function with respect to the integrator when the integrand is Riemann-Stieltjes integrable with respect to the integrator. Then, the Riemann Criterion for Integrability is revisited in Theorem A.14, on the necessity of this criterion when the integrand is finite-dimensional and the integrator is idR. In Theorem A.15, it is proved that if the integrand is bounded and BB(R)-measurable, the integrator is of bounded variation, and furthermore the Riemann-Stieltjes integral satisfies the Riemann Criterion for Integrability, then the Riemann-Stieltjes integral equals the corresponding Lebesgue-Stieltjes integral. Corollaries A.16 and A.17 are sufficiency results for certain Riemann-Stieltjes integrals satisfying the Riemann Criterion for Integrability. The appendix ends with Theorem A.18, which states that if two integrands both satisfy the Riemann Criterion for Integrability with respect to a K-valued integrator of bounded variation, then the product of the integrands also satisfies the Riemann Criterion for Integrability with respect to the K-valued integrator.

1.2 How to Use the Book

The book is written mostly in a linear order. Sections A.1 and A.2 can be read first. Then, one may proceed from Chap. 2 through Chaps. 11 and 12. At this point, Sect. A.3 becomes accessible; it is required reading before one can start reading Sect. 14.12. After Chap. 12, one can proceed to Chaps. 13 and 14 and thus complete the book. The book can also be used as a reference book. Every result is stated independently, so one can locate the result that he/she is interested in and then check for the required definitions and terminology. The proof of each result is self-contained, so one may dig deeper into a result of interest by reading its proof. These are the main ways of reading this book. This book can serve as a graduate-level textbook for mathematics-loving students in the subject of analysis. The prerequisite for the book is an undergraduate calculus course with some exposure to the Riemann integral. Since the book is self-contained and touches on most of the abstract spaces that are prevalent in analysis, a student completing this book will have a good understanding of what is available in analysis and will be able to tackle a range of applied mathematics problems.

1.3 What This Book Does Not Include


This book does not include many topics on manifolds, an interesting and useful subject with direct ramifications in our physical world. Interested readers are encouraged to explore this topic further. I personally would very much like to explore this field, but I have to go back to my controls research, which is my true calling. In probability theory, one may further investigate Markov processes and bring the results on Poisson processes and Markov chains to light.

Chapter 2

Set Theory

2.1 Axiomatic Foundations of Set Theory

We will list the nine axioms of the Zermelo-Fraenkel-Cantor axiom system. The ninth axiom, which is the axiom of choice, will be introduced in Sect. 2.7. Let A and B be sets and x and y be objects (which is another name for sets).

Axiom 1 (Axiom of Extensionality) A = B if ∀x ∈ A, we have x ∈ B; and ∀x ∈ B, we have x ∈ A.

Axiom 2 (Axiom of Empty Set) There exists an empty set ∅ that does not contain any element.

Axiom 3 (Axiom of Pairing) For any objects x and y, there exists a set {x, y} that contains only x and y.

Axiom 4 (Axiom of Regularity) For any nonempty set A ≠ ∅, there exists a ∈ A such that ∀b ∈ A, we have b ∉ a.

Axiom 5 (Axiom of Replacement) ∀x ∈ A, let there be one and only one y to form an ordered pair (x, y). Then, the collection of all such y's is a set B.

Axiom 6 (Axiom of Power Set) The collection of all subsets of A is a set, denoted by 2^A.

Axiom 7 (Axiom of Union) For any collection of sets (Aλ)λ∈Λ, where Λ is a set, ⋃λ∈Λ Aλ is a well-defined set.

Axiom 8 (Axiom of Infinity) There exists a set A such that ∅ ∈ A and, ∀x ∈ A, we have {∅, x} ∈ A.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Z. Pan, Measure-Theoretic Calculus in Abstract Spaces, https://doi.org/10.1007/978-3-031-21912-2_2

By Axiom 2, there exists the empty set ∅, which we may call 0. Now, by Axiom 3, there exists the set {∅}, which is nonempty and which we may call 1. Again, by Axiom 3,
there exists the set {∅, {∅}}, which we will call 2. After we define n, we may define n + 1 := {0, n}, which exists by the axiom of pairing. This allows us to define all natural numbers. By Axiom 8, these natural numbers form the set N := {1, 2, . . .}, which is the set of natural numbers. Furthermore, by Axiom 6, we may define the set of all real numbers, R. For any x ∈ A and y ∈ B, we may apply Axiom 3 to define the ordered pair (x, y) := {{{{x}}, 1}, {{{y}}, 2}}. Then, the set A × B is defined by ⋃x∈A ⋃y∈B {(x, y)}, which is a valid set by Axiom 7. By Axiom 5, any portion As of a well-defined set A is again a set, which is called a subset of A, and we will write As ⊆ A. Thus, the formula

{x ∈ A | p(x) is true}

defines a set as long as A is a set and p(x) is an unambiguous logic expression.
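The construction of the natural numbers above can be imitated concretely with nested frozensets. The following is an illustrative sketch of mine (not from the book), encoding 0 as the empty set and n + 1 as {0, n}:

```python
# Encode the book's natural numbers as hereditarily finite sets,
# using frozenset for the empty set and the pairing {0, n}.
zero = frozenset()           # 0 := the empty set

def succ(n):
    """n + 1 := {0, n}; for n = 0 the pair collapses to {0} = 1."""
    return frozenset({zero, n})

one = succ(zero)             # {0} = {emptyset}
two = succ(one)              # {0, 1} = {emptyset, {emptyset}}
three = succ(two)            # {0, 2}

print(one == frozenset({zero}))        # True: 1 is {emptyset}
print(two == frozenset({zero, one}))   # True: 2 is {emptyset, {emptyset}}
print(zero in three and two in three)  # True: 3 contains 0 and 2
```

Note how succ(zero) collapses to the singleton {∅}, exactly as in the text, because a set ignores duplicate elements.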

2.2 Relations and Equivalence

Definition 2.1 Let A and B be sets. A relation R from A to B is a subset of A × B. ∀x ∈ A, ∀y ∈ B, we say x ∼ y if (x, y) ∈ R. We will say that R is a relation on A if it is a relation from A to A. We define

dom (R) := {x ∈ A | ∃y ∈ B, such that x ∼ y}
range (R) := {y ∈ B | ∃x ∈ A, such that x ∼ y}

which are well-defined subsets. %

Definition 2.2 Let A be a set and R be a relation on A. ∀x, y, z ∈ A:
1. R is reflexive if x ∼ x.
2. R is symmetric if x ∼ y implies y ∼ x.
3. R is transitive if x ∼ y and y ∼ z implies x ∼ z.
4. R is an equivalence relationship if it is reflexive, symmetric, and transitive, and will be denoted by "≡."
5. R is antisymmetric if x ∼ y and y ∼ x implies x = y. %

Let ≡ be an equivalence relationship on A; then, it partitions A into disjoint equivalence classes Ax := {y ∈ A | x ≡ y}, ∀x ∈ A. The collection of all equivalence classes, A/≡ := {Ax ⊆ A | x ∈ A}, is called the quotient of A with respect to ≡.
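As a concrete illustration of mine (not from the book), congruence modulo 3 is an equivalence relationship on {0, 1, . . . , 8}, and its quotient consists of three disjoint classes:

```python
# Partition a finite set A into equivalence classes under an
# equivalence relation, here congruence modulo 3.
def quotient(A, equiv):
    """Return the quotient A/equiv as a set of frozenset classes."""
    return {frozenset(y for y in A if equiv(x, y)) for x in A}

A = set(range(9))
classes = quotient(A, lambda x, y: x % 3 == y % 3)

print(len(classes))                     # 3 disjoint classes
print(frozenset({0, 3, 6}) in classes)  # True: the class of 0
```

The set comprehension produces each class once even though it is built from every representative, because equal frozensets collapse in the outer set.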

2.3 Function


Definition 2.3 Let X and Y be sets and D ⊆ X. A function f of D to Y, denoted by f : D → Y, is a relation from D to Y such that ∀x ∈ D, there is exactly one y ∈ Y such that (x, y) ∈ f; we will denote that y as f(x). The graph of f is the set graph (f) := {(x, f(x)) ∈ X × Y | x ∈ D}. The domain of f is dom (f) = D. ∀A ⊆ X, the image under f of A is f(A) := {y ∈ Y | ∃x ∈ A ∩ D such that f(x) = y}, which is a subset of Y. The range of f is range (f) = f(X). ∀B ⊆ Y, the inverse image under f of B is finv(B) := {x ∈ D | f(x) ∈ B}, which is a subset of D. f is said to be surjective if f(X) = Y; f is said to be injective if f(x1) ≠ f(x2), ∀x1, x2 ∈ D with x1 ≠ x2; f is said to be bijective if it is both surjective and injective, in which case it is invertible and the inverse function is denoted by finv : Y → D. We will say that f is a function from X to Y. %

Let f : D → Y and g : Y → Z be functions. We may define a function h : D → Z by h(x) = g(f(x)); then h is called the composition of g with f and is denoted by g ◦ f. Let A ⊆ X. We may define a function l : A ∩ D → Y by l(x) = f(x), ∀x ∈ A ∩ D. This function is called the restriction of f to A and is denoted by f|A. Let f : D → Y, g : Y → Z, and h : Z → W; we have (h ◦ g) ◦ f = h ◦ (g ◦ f). Let f : D → D and k ∈ Z+; we will write f^k := f ◦ · · · ◦ f (k copies of f), where f^0 := idD.

A function f : X → Y is a subset of X × Y. Then, f ∈ 2^(X×Y). The collection of all functions of X to Y is then a set given by

Y^X := {f ∈ 2^(X×Y) | ∀x ∈ X, ∃! y ∈ Y · (x, y) ∈ f}

We have the following result concerning the inverse of a function.

Proposition 2.4 Let φ : X → Y, where X and Y are sets. Then, φ is bijective if, and only if, ∃ψi : Y → X, i = 1, 2, such that φ ◦ ψ1 = idY and ψ2 ◦ φ = idX. Furthermore, φinv = ψ1 = ψ2.

Proof "Sufficiency" Let ψi : Y → X, i = 1, 2, exist. ∀y ∈ Y, φ ◦ ψ1(y) = idY(y) = y, which implies that y ∈ range (φ), and hence φ is surjective. Suppose that φ is not injective; then ∃x1, x2 ∈ X with x1 ≠ x2 such that φ(x1) = φ(x2). Then, we have

x1 = idX(x1) = ψ2(φ(x1)) = ψ2(φ(x2)) = idX(x2) = x2

which is a contradiction. Hence, φ is injective. This proves that φ is bijective. “Necessity” Let φ be bijective. Then, φinv : Y → X exists. ∀x ∈ X, let y = φ(x), then x = φinv (y); hence, φinv (φ(x)) = x. Therefore, we have φinv ◦ φ = idX . ∀y ∈ Y , let x = φinv (y); then y = φ(x); hence, φ(φinv (y)) = y. Therefore, we have φ ◦ φinv = idY . Hence, ψ1 = ψ2 = φinv .


Let ψ1 and ψ2 satisfy the assumption of the proposition, and let φinv be the inverse function of φ. Then, we have

ψ1 = idX ◦ ψ1 = (φinv ◦ φ) ◦ ψ1 = φinv ◦ (φ ◦ ψ1) = φinv ◦ idY = φinv
ψ2 = ψ2 ◦ idY = ψ2 ◦ (φ ◦ φinv) = (ψ2 ◦ φ) ◦ φinv = idX ◦ φinv = φinv

This completes the proof of the proposition. ∎

For bijective functions f : X → Y and g : Y → Z, g ◦ f is also bijective and (g ◦ f )inv = finv ◦ ginv .
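Proposition 2.4 and the identity (g ◦ f)inv = finv ◦ ginv are easy to check on finite sets. In this illustrative sketch of mine (not from the book), dictionaries stand in for functions:

```python
# Represent functions on finite sets as dicts and verify that a
# two-sided inverse characterizes bijectivity (Proposition 2.4).
def compose(g, f):
    """Return g composed with f as a dict: x -> g(f(x))."""
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Invert a bijective dict; fails if f is not injective."""
    inv = {y: x for x, y in f.items()}
    assert len(inv) == len(f), "f is not injective"
    return inv

f = {1: 'a', 2: 'b', 3: 'c'}          # bijection X -> Y
g = {'a': 10, 'b': 20, 'c': 30}       # bijection Y -> Z

gf = compose(g, f)
assert compose(inverse(f), f) == {x: x for x in f}      # psi2 o phi = idX
assert compose(f, inverse(f)) == {y: y for y in g}      # phi o psi1 = idY
assert inverse(gf) == compose(inverse(f), inverse(g))   # (g o f)inv = finv o ginv
print("all identities hold")
```

The two composition checks are exactly the hypotheses of Proposition 2.4 with ψ1 = ψ2 = finv.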

2.4 Set Operations

Let X be a set and 2^X be the set consisting of all subsets of X. ∀A, B ⊆ X, we will define

A ∪ B := {x ∈ X | x ∈ A or x ∈ B}
A ∩ B := {x ∈ X | x ∈ A and x ∈ B}
Ã := {x ∈ X | x ∉ A}
A \ B := {x ∈ A | x ∉ B} = A ∩ B̃
A △ B := (A \ B) ∪ (B \ A)

We have the following results.

Proposition 2.5 Let A, B, D, Aλ ∈ 2^X, f : D → Y, C, E, Cλ ∈ 2^Y, where X and Y are sets, λ ∈ Λ, and Λ is an index set. Then, we have:

1. A ∪ B = B ∪ A and A ∩ B = B ∩ A.
2. A ⊆ A ∪ B and A = A ∪ B if, and only if, B ⊆ A.
3. A ∪ ∅ = A, A ∩ ∅ = ∅, A ∪ X = X, and A ∩ X = A.
4. ∅̃ = X, (Ã)∼ = A, A ∪ Ã = X, A ∩ Ã = ∅, and A ⊆ B if, and only if, B̃ ⊆ Ã.
5. The De Morgan's Laws: (⋃λ∈Λ Aλ)∼ = ⋂λ∈Λ Ãλ and (⋂λ∈Λ Aλ)∼ = ⋃λ∈Λ Ãλ.
6. B ∪ (⋂λ∈Λ Aλ) = ⋂λ∈Λ (B ∪ Aλ) and B ∩ (⋃λ∈Λ Aλ) = ⋃λ∈Λ (B ∩ Aλ).
7. f(⋃λ∈Λ Aλ) = ⋃λ∈Λ f(Aλ) and f(⋂λ∈Λ Aλ) ⊆ ⋂λ∈Λ f(Aλ).
8. finv(⋃λ∈Λ Cλ) = ⋃λ∈Λ finv(Cλ) and finv(⋂λ∈Λ Cλ) = ⋂λ∈Λ finv(Cλ).
9. finv(C \ E) = finv(C) \ finv(E), f(finv(C)) = C ∩ range (f), and finv(f(A)) ⊇ A ∩ dom (f) = A ∩ D.

The proof of the above results is standard and is therefore omitted.
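Although the proof is omitted, the identities in Proposition 2.5 can be spot-checked on small sets. The following is an illustrative sketch of mine verifying items 5, 7, and 8:

```python
# Spot-check Proposition 2.5 on a small universe X and map f.
X = set(range(10))
A1, A2 = {0, 1, 2, 3}, {2, 3, 4, 5}

def comp(S):
    """Complement relative to the universe X."""
    return X - S

# Item 5: De Morgan's Laws.
assert comp(A1 | A2) == comp(A1) & comp(A2)
assert comp(A1 & A2) == comp(A1) | comp(A2)

# Items 7 and 8: images distribute over unions only;
# inverse images distribute over both unions and intersections.
f = lambda x: x % 3
def image(S):    return {f(x) for x in S}
def preimage(C): return {x for x in X if f(x) in C}

assert image(A1 | A2) == image(A1) | image(A2)
assert image(A1 & A2) <= image(A1) & image(A2)        # only a subset in general
assert preimage({0} | {1}) == preimage({0}) | preimage({1})
assert preimage({0} & {1}) == preimage({0}) & preimage({1})
print("identities verified")
```

Note that the image of an intersection is only contained in the intersection of images, because f need not be injective; the inverse image has no such defect, which is why measurability is always phrased in terms of inverse images.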

2.5 Algebra of Sets

Definition 2.6 A set X is said to be finite if it is either empty or the range of a function of {1, 2, . . . , n}, with n ∈ N. In this case, card (X) denotes the number of elements in X. It is said to be countable if it is either empty or the range of a function of N. %

Definition 2.7 Let X be a set and A ⊆ 2^X. A is said to be an algebra of sets on X (or a Boolean algebra on X) if: (i) ∅, X ∈ A; (ii) ∀A, B ∈ A, A ∪ B ∈ A and Ã ∈ A. A is said to be a σ-algebra on X if it is an algebra on X and any countable union of sets in A is again in A. %

Let M ⊆ 2^X, where X is a set; then, there exists a smallest algebra on X, A0 ⊆ 2^X, containing M, which means that for any algebra on X, A1 ⊆ 2^X, that contains M, we have A0 ⊆ A1. This algebra A0 is said to be the algebra on X generated by M. Also, there exists a smallest σ-algebra on X, A ⊆ 2^X, containing M, which is said to be the σ-algebra on X generated by M.
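For a finite universe X, the generated algebra can be computed directly by closing M under complement and pairwise union. This is an illustrative sketch of mine, not the book's construction:

```python
# Compute the algebra on X generated by a collection M of subsets,
# by iterating closure under complement and finite union to a
# fixed point (termination is guaranteed since 2^X is finite).
def generated_algebra(X, M):
    X = frozenset(X)
    alg = {frozenset(), X} | {frozenset(S) for S in M}
    while True:
        new = {X - S for S in alg}                  # complements
        new |= {S | T for S in alg for T in alg}    # pairwise unions
        if new <= alg:                              # nothing new: done
            return alg
        alg |= new

X = {1, 2, 3, 4}
A = generated_algebra(X, [{1}, {1, 2}])
print(len(A))                        # 8 sets: atoms {1}, {2}, {3, 4}
assert frozenset({2}) in A           # reachable as the complement of {1, 3, 4}
assert frozenset({3, 4}) in A        # the complement of {1, 2}
```

The generators {1} and {1, 2} split X into the three atoms {1}, {2}, and {3, 4}, so the generated algebra has 2^3 = 8 elements; intersections come for free via De Morgan's Laws.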

Proposition 2.8 Let X be a set, E be a nonempty collection of subsets of X, A be the algebra on X generated by E, and

Ā := { A ⊆ X | ∃n, m ∈ N, ∀i1, . . . , i2n ∈ {1, . . . , m}, ∃F_{i1,...,i2n} ⊆ X with F_{i1,...,i2n} ∈ E or (F_{i1,...,i2n})∼ ∈ E, such that A = ⋃_{i1=1}^{m} ⋂_{i2=1}^{m} · · · ⋂_{i2n=1}^{m} F_{i1,...,i2n} }

Then, A = Ā.

Proof ∀E ∈ E, let n = 1, m = 1, and F_{1,1} = E. Then, E = ⋃_{i1=1}^{1} ⋂_{i2=1}^{1} F_{i1,i2} ∈ Ā. Hence, we have E ⊆ Ā. It is clear that Ā ⊆ A. All we need to show is that Ā is an algebra on X. Then, A ⊆ Ā and the result follows.

Fix E ∈ E ≠ ∅. E ⊆ X. Let n = 1, m = 2, F_{1,1} = E, F_{1,2} = Ẽ, F_{2,1} = E, and F_{2,2} = Ẽ. Then, ∅ = ⋃_{i1=1}^{2} ⋂_{i2=1}^{2} F_{i1,i2} ∈ Ā. Let n = 1, m = 2, F_{1,1} = E, F_{1,2} = E, F_{2,1} = Ẽ, and F_{2,2} = Ẽ. Then, X = ⋃_{i1=1}^{2} ⋂_{i2=1}^{2} F_{i1,i2} ∈ Ā.

∀A, B ∈ Ā, ∃nA, mA ∈ N, ∀i1, . . . , i2nA ∈ {1, . . . , mA}, ∃F^A_{i1,...,i2nA} ⊆ X with F^A_{i1,...,i2nA} ∈ E or (F^A_{i1,...,i2nA})∼ ∈ E such that A = ⋃_{i1=1}^{mA} ⋂_{i2=1}^{mA} · · · ⋂_{i2nA=1}^{mA} F^A_{i1,...,i2nA}, and ∃nB, mB ∈ N, ∀i1, . . . , i2nB ∈ {1, . . . , mB}, ∃F^B_{i1,...,i2nB} ⊆ X with F^B_{i1,...,i2nB} ∈ E or (F^B_{i1,...,i2nB})∼ ∈ E such that B = ⋃_{i1=1}^{mB} ⋂_{i2=1}^{mB} · · · ⋂_{i2nB=1}^{mB} F^B_{i1,...,i2nB}.

Note that Ã = ⋂_{i1=1}^{mA} ⋃_{i2=1}^{mA} · · · ⋃_{i2nA=1}^{mA} (F^A_{i1,...,i2nA})∼. Let n = nA + 1, m = mA, and, ∀i1, . . . , i2n ∈ {1, . . . , m}, G_{i1,...,i2n} = (F^A_{i2,...,i2n−1})∼. Then, Ã = ⋃_{i1=1}^{m} ⋂_{i2=1}^{m} · · · ⋂_{i2n=1}^{m} G_{i1,...,i2n} ∈ Ā.

Without loss of generality, assume nA ≥ nB. Let n = nA and m = mA + mB. Define ī = 1 + (i mod mA) and ĩ = 1 + (i mod mB), ∀i ∈ N. ∀i1, . . . , i2n ∈ {1, . . . , m}, let G_{i1,...,i2n} = F^A_{i1,ī2,...,ī2nA} if i1 ≤ mA, and G_{i1,...,i2n} = F^B_{i1−mA,ĩ2,...,ĩ2nB} if i1 > mA. Then, it is easy to check that A ∪ B = ⋃_{i1=1}^{m} ⋂_{i2=1}^{m} · · · ⋂_{i2n=1}^{m} G_{i1,...,i2n} ∈ Ā. Hence, Ā is an algebra on X. This completes the proof of the proposition. ∎

2.6 Partial Ordering and Total Ordering

Definition 2.9 Let A be a set and ≼ be a relation on A. ≼ will be called a partial ordering if it is reflexive and transitive. It will be called a total ordering if it is an antisymmetric partial ordering and satisfies: ∀x, y ∈ A with x ≠ y, we have either x ≼ y or y ≼ x (not both). %

As an example, the set containment "⊆" is a partial ordering on any collection of sets, while "≤" is a total ordering on any subset of R.

Definition 2.10 Let A be a set with a partial ordering "≼":
1. a ∈ A is said to be minimal if, ∀x ∈ A, x ≼ a implies a ≼ x.
2. a ∈ A is said to be the least element if, ∀x ∈ A, a ≼ x, and x ≼ a implies that x = a.
3. a ∈ A is said to be maximal if, ∀x ∈ A, a ≼ x implies x ≼ a.
4. a ∈ A is said to be the greatest element if, ∀x ∈ A, x ≼ a, and a ≼ x implies that x = a. %

Definition 2.11 Let A be a set with a partial ordering "≼" and E ⊆ A:
1. a ∈ A is said to be an upper bound of E if x ≼ a, ∀x ∈ E. It is the least upper bound of E if it is the least element in the set of all upper bounds of E.
2. a ∈ A is said to be a lower bound of E if a ≼ x, ∀x ∈ E. It is the greatest lower bound of E if it is the greatest element in the set of all lower bounds of E. %

We have the following results.

Proposition 2.12 Let A be a set with a partial ordering "≼." Then, the following holds:

(i) If a ∈ A is the least element, then it is minimal.
(ii) There is at most one least element in A.
(iii) Define a relation ≽ by ∀x, y ∈ A, x ≽ y if y ≼ x. Then, ≽ is a partial ordering on A. Furthermore, ≽ is antisymmetric if ≼ is antisymmetric:
  1. a ∈ A is the least element for (A, ≼) if, and only if, it is the greatest element for (A, ≽).
  2. a ∈ A is minimal for (A, ≼) if, and only if, it is maximal for (A, ≽).
(iv) If a ∈ A is the greatest element, then it is maximal.
(v) There is at most one greatest element in A.
(vi) If ≼ is antisymmetric, then a ∈ A is minimal if, and only if, there does not exist x ∈ A such that x ≼ a and x ≠ a.
(vii) If ≼ is antisymmetric, then a ∈ A is maximal if, and only if, there does not exist x ∈ A such that a ≼ x and x ≠ a.
(viii) If ≼ is antisymmetric, then it is a total ordering if, and only if, ∀x1, x2 ∈ A, we have x1 ≼ x2 or x2 ≼ x1.

Proof (i) is straightforward from Definition 2.10. For (ii), let a1 and a2 be least elements of A. By a1 being the least element, we have a1 ≼ a2. By a2 being the least element, we then have a1 = a2. Hence, the least element is unique if it exists.

For (iii), ∀x, y, z ∈ A, since x ≼ x implies x ≽ x, ≽ is reflexive. If x ≽ y and y ≽ z, we have y ≼ x and z ≼ y, which implies z ≼ x, and hence x ≽ z. This shows that ≽ is transitive. Hence, ≽ is a partial ordering on A. When ≼ is antisymmetric, x ≽ y and y ≽ x imply that x ≼ y and y ≼ x, and therefore x = y. Hence, ≽ is also antisymmetric. For 1, "only if," let a ∈ A be the least element in (A, ≼). ∀x ∈ A, we have a ≼ x. This implies x ≽ a, ∀x ∈ A. ∀x ∈ A with a ≽ x, we have x ≼ a; by a being the least element in (A, ≼), we have x = a. Hence, a is the greatest element in (A, ≽). The "if" part is similar to the "only if" part. For 2, "only if," let a ∈ A be a minimal element for (A, ≼). Then, ∀x ∈ A, a ≽ x implies x ≼ a, and hence a ≼ x, which yields x ≽ a. Hence, a is a maximal element for (A, ≽). The "if" part is similar to the "only if" part.

(iv) is straightforward from Definition 2.10. For (v), let a1 and a2 be greatest elements of A. By a1 being the greatest element, we have a2 ≼ a1. By a2 being the greatest element, we then have a1 = a2. Hence, the greatest element is unique if it exists.

For (vi), "if," ∀x ∈ A with x ≼ a, we have a = x, which means that a ≼ x; hence, a is minimal. "Only if," suppose that ∃x ∈ A such that x ≼ a and x ≠ a. Note that a ≼ x since a is minimal. Then, a = x, since ≼ is antisymmetric, which is a contradiction.

For (vii), "if," ∀x ∈ A with a ≼ x, we have a = x, which means that x ≼ a; hence, a is maximal. "Only if," suppose that ∃x ∈ A such that a ≼ x and x ≠ a. Note that x ≼ a since a is maximal. Then, a = x, since ≼ is antisymmetric, which is a contradiction.

For (viii), "if," ∀x1, x2 ∈ A with x1 ≠ x2, we must have x1 ≼ x2 or x2 ≼ x1. They cannot hold at the same time since, otherwise, x1 = x2, which is a contradiction. "Only if," ∀x1, x2 ∈ A, when x1 = x2, then x1 ≼ x2; when x1 ≠ x2, then x1 ≼ x2 or x2 ≼ x1; hence, in both cases, we have x1 ≼ x2 or x2 ≼ x1. This completes the proof of the proposition. ∎
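The distinctions among minimal, maximal, least, and greatest elements already show up for the divisibility ordering on a small set. An illustrative sketch of mine:

```python
# Divisibility partial ordering on E = {2, 3, 6, 12}: minimal vs.
# least, maximal vs. greatest (Definitions 2.10 and 2.11).
E = {2, 3, 6, 12}
leq = lambda x, y: y % x == 0          # x precedes y iff x divides y

minimal  = {a for a in E if all(not leq(x, a) or leq(a, x) for x in E)}
maximal  = {a for a in E if all(not leq(a, x) or leq(x, a) for x in E)}
least    = {a for a in E if all(leq(a, x) for x in E)}
greatest = {a for a in E if all(leq(x, a) for x in E)}

print(sorted(minimal))     # [2, 3]: two minimal elements, no least element
print(sorted(maximal))     # [12]
print(sorted(greatest))    # [12]: 12 is both maximal and greatest
assert least == set()
```

Both 2 and 3 are minimal, yet neither is a least element since they are incomparable; 12, on the other hand, is comparable with everything and is simultaneously maximal and the greatest element.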

2.7 Basic Principles

Now, we introduce the last axiom in the Zermelo-Fraenkel-Cantor axiom system.

Axiom 9 (Axiom of Choice) Let (Aλ)λ∈Λ be a collection of nonempty sets, where Λ is a set (this collection is a set by Axiom 5). Then, there exists a function f : Λ → ⋃λ∈Λ Aλ such that, ∀λ ∈ Λ, we have f(λ) ∈ Aλ.

With Axioms 1-8 holding, the axiom of choice is equivalent to the following three results.

Theorem 2.13 (Hausdorff Maximal Principle) Let ≼ be a partial ordering on a set E. Then, there exists a maximal (with respect to set containment ⊆) subset F ⊆ E such that ≼ is a total ordering on F.

Theorem 2.14 (Zorn's Lemma) Let ≼ be an antisymmetric partial ordering on a nonempty set E. If every nonempty totally ordered subset F of E has an upper bound in E, then there is a maximal element in E.

Definition 2.15 A well-ordering of a set is a total ordering such that every nonempty subset has a least element. %

Theorem 2.16 (Well-Ordering Principle) Every set can be well-ordered.

To prove the equivalence we described above, we need the following result.

Lemma 2.17 Let E be a nonempty set and ≼ be an antisymmetric partial ordering on E. Assume that every nonempty subset S of E, on which ≼ is a total ordering, has a least upper bound in E. Let f : E → E be a mapping such that x ≼ f(x), ∀x ∈ E. Then, f has a fixed point on E, i.e., ∃w ∈ E, f(w) = w.


Proof Fix a point a ∈ E, since E ≠ ∅. We define a collection of "good" sets:

ℬ := { B ⊆ E | (i) a ∈ B; (ii) f(B) ⊆ B; (iii) ∀F ⊆ B with F ≠ ∅ and F totally ordered by ≼, the least upper bound of F belongs to B }.

Consider the set B0 := {x ∈ E | a ≼ x}. Clearly, B0 is nonempty since a ∈ B0, and

f(B0) = {f(x) ∈ E | a ≼ x ≼ f(x)} ⊆ B0

since f satisfies x ≼ f(x), ∀x ∈ E. For any F ⊆ B0 such that F is totally ordered by ≼ and F ≠ ∅, let e0 be the least upper bound of F in E. Then, ∃x0 ∈ F such that a ≼ x0 ≼ e0, and therefore e0 ∈ B0. This shows that B0 ∈ ℬ and ℬ is nonempty. The following result holds for the collection ℬ.

Claim 2.17.1 Let {Bα | α ∈ Λ} be any nonempty subcollection of ℬ; then ⋂_{α∈Λ} Bα ∈ ℬ.

Proof of Claim (i) a ∈ Bα, ∀α ∈ Λ. This implies a ∈ ⋂_{α∈Λ} Bα.
(ii) By Proposition 2.5, we have f(⋂_{α∈Λ} Bα) ⊆ ⋂_{α∈Λ} f(Bα) ⊆ ⋂_{α∈Λ} Bα, where the last ⊆ follows from the fact f(Bα) ⊆ Bα, ∀α ∈ Λ.
(iii) Let F ⊆ ⋂_{α∈Λ} Bα be totally ordered by ≼ with F ≠ ∅. For any α ∈ Λ, F ⊆ Bα implies that the least upper bound of F is an element of Bα. Therefore, the least upper bound of F is in the intersection ⋂_{α∈Λ} Bα.
This establishes ⋂_{α∈Λ} Bα ∈ ℬ and completes the proof of the claim. □

The claim shows that the collection ℬ is closed under arbitrary intersection, as long as the subcollection is nonempty. Define A := ⋂_{B∈ℬ} B. By the above claim, we have A ∈ ℬ, i.e., A is the smallest set in ℬ. Hence, A ⊆ B0, i.e., the set A satisfies, in addition to (i)–(iii),

(iv) ∀x ∈ A, a ≼ x.

Define the relation ≺ on E by: ∀x, y ∈ E, x ≺ y if, and only if, x ≼ y and x ≠ y. Define the set

P := {x ∈ A | ∀y ∈ A, y ≺ x ⇒ f(y) ≼ x}.

Clearly, a ∈ P, since there does not exist any y ∈ A with y ≺ a: by (iv) we would have a ≼ y, y ≼ a, and y ≠ a, contradicting ≼ being antisymmetric. Therefore, P is nonempty.


We claim that:

Claim 2.17.2 (v) ∀x ∈ P, ∀z ∈ A, we have z ≼ x or f(x) ≼ z.

Proof of Claim Fix x ∈ P, and let

B := {z ∈ A | z ≼ x} ∪ {z ∈ A | f(x) ≼ z}.

We will show that B ∈ ℬ:
(i) a ∈ A and x ∈ P ⊆ A; by (iv), a ≼ x, which further implies that a ∈ B.
(ii) ∀z ∈ B ⊆ A, we have f(z) ∈ A since A ∈ ℬ. There are three exhaustive scenarios. If z ≺ x, since x ∈ P and z ∈ B ⊆ A, then f(z) ≼ x. This implies that f(z) ∈ B. If z = x, then f(x) ≼ f(x) = f(z). This implies that f(z) ∈ B. If f(x) ≼ z, then f(x) ≼ z ≼ f(z). This again implies that f(z) ∈ B. Hence, in all three scenarios, we have f(z) ∈ B. Then, f(B) ⊆ B by the arbitrariness of z ∈ B.
(iii) Let F ≠ ∅ be any totally ordered subset of B and e0 ∈ E be the least upper bound of F. Since F ⊆ B ⊆ A and A ∈ ℬ, then e0 ∈ A. There are two exhaustive scenarios. If there exists y ∈ F such that f(x) ≼ y, then f(x) ≼ y ≼ e0. This implies e0 ∈ B. If, for every y ∈ F, y ≼ x, then F ⊆ {z ∈ A | z ≼ x}. This implies that x is an upper bound of F, and e0 ≼ x since e0 is the least upper bound of F. Therefore, e0 ∈ B. In both cases, we have e0 ∈ B.
This establishes B ∈ ℬ. Since A is the smallest set in ℬ and B ⊆ A, we have A = B. Therefore, the claim is proven. □

Now, we show that P ∈ ℬ:
(i) a ∈ P, and therefore P ≠ ∅.
(ii) Fix an x ∈ P ⊆ A. Then, f(x) ∈ A. ∀y ∈ A such that y ≺ f(x), we need to show that f(y) ≼ f(x), which then implies f(x) ∈ P. By (v), there are two exhaustive scenarios: y ≼ x or f(x) ≼ y. If f(x) ≼ y, then f(x) ≼ y ≺ f(x) forms a contradiction by ≼ being antisymmetric. Therefore, we must have y ≼ x, which results in the following two exhaustive scenarios. If y ≺ x, then f(y) ≼ x since x ∈ P. This implies that f(y) ≼ x ≼ f(x). If y = x, then f(y) = f(x) ≼ f(x). In both cases, we have f(y) ≼ f(x). By the arbitrariness of y, we have f(x) ∈ P, which further implies f(P) ⊆ P by the arbitrariness of x ∈ P.
(iii) Let F ≠ ∅ be a totally ordered subset of P and e0 ∈ E be the least upper bound of F. Then, F ⊆ A implies that e0 ∈ A by A ∈ ℬ. ∀z ∈ A with z ≺ e0, z must not be an upper bound of F (otherwise, e0 ≼ z and z ≺ e0 would contradict ≼ being antisymmetric). Therefore, ∃x0 ∈ F such that x0 ≼ z does not hold. By (v) (applied to x0 ∈ P and z), either z ≼ x0 or f(x0) ≼ z; the latter would give x0 ≼ f(x0) ≼ z, which fails, so z ≼ x0; moreover, z ≠ x0 since x0 ≼ z fails; hence z ≺ x0. Then, by x0 ∈ F ⊆ P, z ∈ A, and z ≺ x0, we have f(z) ≼ x0. Therefore, f(z) ≼ e0 since e0 is an upper bound of F. This further implies that e0 ∈ P by the arbitrariness of z. This proves that P ∈ ℬ.

Since P ⊆ A and A is the smallest set in ℬ, then P = A. The set A satisfies properties (i)–(v). For any x1, x2 ∈ A, by (v), there are two exhaustive scenarios. If x1 ≼ x2, then x1 and x2 are related through ≼. If f(x2) ≼ x1, then x2 ≼ f(x2) ≼ x1, which implies that x1 and x2 are related through ≼. Therefore, x1 and x2 are related through ≼ in both cases. Then, by Proposition 2.12 (viii), A is totally ordered by ≼ and nonempty. Let w ∈ E be the least upper bound of A. Then, w ∈ A, since A ∈ ℬ. Therefore, f(w) ∈ A by f(A) ⊆ A, which implies that f(w) ≼ w. This coupled with w ≼ f(w) yields f(w) = w, since ≼ is antisymmetric. This completes the proof of the lemma. □

Theorem 2.18 Under Axioms 1–8, the following are equivalent:
1. Axiom of choice
2. Hausdorff maximal principle
3. Zorn's lemma
4. Well-ordering principle

Proof 1. ⇒ 2. Define the collection

ℰ := {A ⊆ E | ≼ defines a total ordering on A}.

Clearly, ∅ ∈ ℰ, so ℰ ≠ ∅. Define a partial ordering on ℰ by ⊆, which is set containment. This partial ordering ⊆ is clearly reflexive, transitive, and antisymmetric. ∀A ∈ ℰ, define a collection

𝒜_A := {B ∈ ℰ | A ⊂ B} if ∃B ∈ ℰ such that A ⊂ B;  𝒜_A := {A} otherwise.

Clearly, 𝒜_A ≠ ∅. By the axiom of choice, ∃T : ℰ → ℰ such that T(A) ∈ 𝒜_A, ∀A ∈ ℰ. We will show that T admits a fixed point by Lemma 2.17. Let ℬ ⊆ ℰ be any nonempty subcollection on which ⊆ is a total ordering. Let C := ⋃_{B∈ℬ} B. Clearly, C ⊆ E. We will show that ≼ is a total ordering on C. Since ≼ is a partial ordering on E, it is a partial ordering on C. ∀x1, x2 ∈ C, ∃B1, B2 ∈ ℬ such that x1 ∈ B1 and x2 ∈ B2. Since ⊆ is a total ordering on ℬ, we may without loss of generality assume B1 ⊆ B2. Then, x1, x2 ∈ B2. Since B2 ∈ ℬ ⊆ ℰ, then ≼ is a total ordering on B2, which means that we have x1 ≼ x2 or x2 ≼ x1. Furthermore, if x1 ≼ x2 and x2 ≼ x1, then x1 = x2 by ≼ being antisymmetric on B2. Therefore, by Proposition 2.12, ≼ is a total ordering on C. Hence, C ∈ ℰ. This shows that ℬ admits the least upper bound C in ℰ with respect to ⊆. By the definition of T, it is clear that A ⊆ T(A), ∀A ∈ ℰ. By Lemma 2.17, T has a fixed point in ℰ, i.e., ∃A0 ∈ ℰ such that T(A0) = A0. By the definitions of T and 𝒜_{A0}, there does not exist B ∈ ℰ such that A0 ⊂ B. Hence, by Proposition 2.12 (vii), A0 is maximal in ℰ with respect to ⊆.

2. ⇒ 3. Let E be a nonempty set with an antisymmetric partial ordering ≼. By the Hausdorff maximal principle, there exists a maximal (with respect to ⊆) totally ordered (with respect to ≼) subset F ⊆ E. We must have F ≠ ∅; otherwise, let x0 ∈ E (since E ≠ ∅); then F ⊂ {x0} ⊆ E and {x0} is totally ordered by ≼, which violates the fact that F is maximal (with respect to ⊆). Then, F has an upper bound e0 ∈ E.

Claim 2.18.1 e0 ∈ F.

Proof of Claim Suppose e0 ∉ F. Define A := F ∪ {e0} ⊆ E. Clearly, F ⊆ A and F ≠ A. We will show that ≼ is a total ordering on A. Clearly, ≼ is an antisymmetric partial ordering on A since it is an antisymmetric partial ordering on E. ∀x1, x2 ∈ A, we will distinguish 4 exhaustive and mutually exclusive cases: Case 1: x1, x2 ∈ F; Case 2: x1 ∈ F, x2 = e0; Case 3: x1 = e0, x2 ∈ F; Case 4: x1 = x2 = e0. In Case 1, we have x1 ≼ x2 or x2 ≼ x1 since ≼ is a total ordering on F. In Case 2, we have x1 ≼ x2 = e0 since e0 is an upper bound of F. In Case 3, we have x2 ≼ x1 = e0. In Case 4, we have x1 = e0 ≼ e0 = x2. Hence, ≼ is a total ordering on A. Note that F ⊆ A and F ≠ A. By Proposition 2.12 (vii), this contradicts the fact that F is maximal with respect to ⊆. Therefore, we must have e0 ∈ F. This completes the proof of the claim. □

∀e1 ∈ E such that e0 ≼ e1: ∀x ∈ F, we have x ≼ e0 ≼ e1. Hence, e1 is an upper bound of F. By Claim 2.18.1 (applied to e1), we must have e1 ∈ F. Then, e1 ≼ e0 since e0 is an upper bound of F. This shows that e0 is maximal in E with respect to ≼.

3. ⇒ 4. Let E be a set. It is clear that ∅ ⊆ E is well-ordered by the empty relation. Define

ℰ := {(A, ⊴) | A ⊆ E, A is well-ordered by ⊴}.

Then, ℰ ≠ ∅. Define an ordering ≼ on ℰ by: ∀(A1, ⊴1), (A2, ⊴2) ∈ ℰ, we say (A1, ⊴1) ≼ (A2, ⊴2) if the following three conditions hold: (i) A1 ⊆ A2; (ii) ⊴2 = ⊴1 on A1; (iii) ∀x1 ∈ A1, ∀x2 ∈ A2 \ A1, we have x1 ⊴2 x2.

Now, we will show that ≼ defines an antisymmetric partial ordering on ℰ. ∀(A1, ⊴1), (A2, ⊴2), (A3, ⊴3) ∈ ℰ. Clearly, (A1, ⊴1) ≼ (A1, ⊴1); hence, ≼ is reflexive. If (A1, ⊴1) ≼ (A2, ⊴2) and (A2, ⊴2) ≼ (A3, ⊴3), then: (i) A1 ⊆ A2 ⊆ A3, and (i) holds; (ii) ⊴3 = ⊴2 on A2 and ⊴2 = ⊴1 on A1 imply that ⊴3 = ⊴1 on A1; (iii) ∀x1 ∈ A1, ∀x2 ∈ A3 \ A1, we have 2 exhaustive scenarios: if x2 ∈ A2, then x2 ∈ A2 \ A1, which implies x1 ⊴2 x2 and hence x1 ⊴3 x2; if x2 ∈ A3 \ A2, then x1 ∈ A2 and hence x1 ⊴3 x2. Thus, we have x1 ⊴3 x2 in both scenarios. Therefore, (A1, ⊴1) ≼ (A3, ⊴3), and hence ≼ is transitive. If (A1, ⊴1) ≼ (A2, ⊴2) and (A2, ⊴2) ≼ (A1, ⊴1), then A1 ⊆ A2 ⊆ A1 ⇒ A1 = A2, and ⊴2 = ⊴1 on A1. Hence, (A1, ⊴1) = (A2, ⊴2), which shows that ≼ is antisymmetric. Therefore, ≼ defines an antisymmetric partial ordering on ℰ.

Let 𝒜 ⊆ ℰ be any nonempty subset totally ordered by ≼. Write 𝒜 = {(Aα, ⊴α) | α ∈ Λ}, where Λ ≠ ∅ is an index set. Define A := ⋃_{α∈Λ} Aα. Define an ordering ⊴ on A by: ∀x1, x2 ∈ A, ∃(A1, ⊴1), (A2, ⊴2) ∈ 𝒜 such that x1 ∈ A1 and x2 ∈ A2; without loss of generality, assume that (A1, ⊴1) ≼ (A2, ⊴2), since 𝒜 is totally ordered by ≼; then x1, x2 ∈ A2, and we say that x1 ⊴ x2 if x1 ⊴2 x2. We will now show that this ordering is well-defined, independent of the choice of (A2, ⊴2) ∈ 𝒜. Let (A3, ⊴3) ∈ 𝒜 be such that x1, x2 ∈ A3. Since 𝒜 is totally ordered by ≼, there are two exhaustive cases: Case 1: (A3, ⊴3) ≼ (A2, ⊴2); Case 2: (A2, ⊴2) ≼ (A3, ⊴3). In Case 1, we have A3 ⊆ A2 and ⊴3 = ⊴2 on A3, which implies that x1 ⊴ x2 ⇔ x1 ⊴2 x2 ⇔ x1 ⊴3 x2. In Case 2, we have A2 ⊆ A3 and ⊴3 = ⊴2 on A2, which implies that x1 ⊴ x2 ⇔ x1 ⊴2 x2 ⇔ x1 ⊴3 x2. Hence, the ordering ⊴ is well-defined on A.

Next, we will show that ⊴ is a total ordering on A. ∀x1, x2, x3 ∈ A, ∃(Ai, ⊴i) ∈ 𝒜 such that xi ∈ Ai, i = 1, 2, 3. Since 𝒜 is totally ordered by ≼, without loss of generality, assume that (A1, ⊴1) ≼ (A2, ⊴2) ≼ (A3, ⊴3). Then, x1, x2, x3 ∈ A3. Clearly, x1 ⊴ x1 since x1 ⊴3 x1, which implies that ⊴ is reflexive. If x1 ⊴ x2 and x2 ⊴ x3, then x1 ⊴3 x2 ⊴3 x3, which implies x1 ⊴3 x3 since ⊴3 is transitive on A3, and hence x1 ⊴ x3. This shows that ⊴ is transitive. If x1 ⊴ x2 and x2 ⊴ x1, then x1 ⊴3 x2 and x2 ⊴3 x1, which implies that x1 = x2 since ⊴3 is antisymmetric on A3. This shows that ⊴ is antisymmetric. Since ⊴3 is a well-ordering on A3, we must have x1 ⊴3 x2, i.e., x1 ⊴ x2, or x2 ⊴3 x1, i.e., x2 ⊴ x1. Hence, ⊴ defines a total ordering on A.

Next, we will show that ⊴ is a well-ordering on A. ∀B ⊆ A with B ≠ ∅, fix x0 ∈ B. Then, ∃(A1, ⊴1) ∈ 𝒜 such that x0 ∈ A1. Note that ∅ ≠ B ∩ A1 ⊆ A1. Since A1 is well-ordered by ⊴1, ∃e ∈ B ∩ A1, which is the least element of B ∩ A1. ∀y ∈ B ⊆ A, ∃(A2, ⊴2) ∈ 𝒜 such that y ∈ A2. We have 2 exhaustive and mutually exclusive cases: Case 1: y ∈ A1; Case 2: y ∈ A2 \ A1. In Case 1, e ⊴1 y since e is the least element of B ∩ A1, which implies that e ⊴ y. In Case 2, since 𝒜 is totally ordered by ≼, we must have (A1, ⊴1) ≼ (A2, ⊴2), which implies that e ⊴2 y, by (iii) in the definition of ≼, and hence e ⊴ y. In both cases, we have shown that e ⊴ y. Since ⊴ is a total ordering on A, e is the least element of B. Therefore, ⊴ is a well-ordering on A, which implies (A, ⊴) ∈ ℰ.

∀(A1, ⊴1) ∈ 𝒜: (i) A1 ⊆ A. (ii) ∀x1, x2 ∈ A1, x1 ⊴1 x2 ⇔ x1 ⊴ x2; hence, ⊴ = ⊴1 on A1. (iii) ∀x1 ∈ A1, ∀x2 ∈ A \ A1, ∃(A2, ⊴2) ∈ 𝒜 such that x2 ∈ A2 \ A1; since 𝒜 is totally ordered by ≼, we must have (A1, ⊴1) ≼ (A2, ⊴2); hence, x1 ⊴2 x2 and x1 ⊴ x2. Therefore, we have shown (A1, ⊴1) ≼ (A, ⊴). Hence, (A, ⊴) ∈ ℰ is an upper bound of 𝒜.

By Zorn's lemma, there is a maximal element (F, ⊴F) ∈ ℰ. We claim that F = E. We will prove this by an argument of contradiction. Suppose F ⊂ E; then ∃x0 ∈ E \ F. Let H := F ∪ {x0}. Define an ordering ⊴H on H by: ∀x1, x2 ∈ H, if x1, x2 ∈ F, we say x1 ⊴H x2 if x1 ⊴F x2; if x1 ∈ F and x2 = x0, then we let x1 ⊴H x2; if x1 = x2 = x0, we let x1 ⊴H x2.

Now, we will show that ⊴H is a well-ordering on H. ∀x1, x2, x3 ∈ H. If x1 ∈ F, then x1 ⊴F x1 and x1 ⊴H x1; if x1 = x0, then x1 ⊴H x1. Hence, ⊴H is reflexive. If x1 ⊴H x2 and x2 ⊴H x3, we have 4 exhaustive and mutually exclusive cases: Case 1: x1, x3 ∈ F; Case 2: x1 ∈ F and x3 = x0; Case 3: x3 ∈ F and x1 = x0; Case 4: x1 = x3 = x0. In Case 1, we must have x2 ∈ F, and then x1 ⊴F x2 and x2 ⊴F x3, which implies x1 ⊴F x3, and hence x1 ⊴H x3. In Case 2, we have x1 ⊴H x3. In Case 3, we must have x2 = x0, and then x2 ⊴H x3 means x0 ⊴H x3 with x3 ∈ F, which is impossible; hence, this case is impossible. In Case 4, we have x1 ⊴H x3. In all possible cases, we have x1 ⊴H x3. Hence, ⊴H is transitive. If x1 ⊴H x2 and x2 ⊴H x1, we have 4 exhaustive and mutually exclusive cases: Case 1: x1, x2 ∈ F; Case 2: x1 ∈ F and x2 = x0; Case 3: x2 ∈ F and x1 = x0; Case 4: x1 = x2 = x0. In Case 1, we have x1 ⊴F x2 and x2 ⊴F x1, which implies that x1 = x2 since ⊴F is antisymmetric on F. In Case 2, we have x0 ⊴H x1 with x1 ∈ F, which is a contradiction; hence, this case is impossible. In Case 3, we have x0 ⊴H x2 with x2 ∈ F, which is a contradiction; hence, this case is impossible. In Case 4, we have x1 = x2. In all possible cases, we have x1 = x2. Hence, ⊴H is antisymmetric. When x1, x2 ∈ F, we must have x1 ⊴F x2 or x2 ⊴F x1 since ⊴F is a well-ordering on F, and hence x1 ⊴H x2 or x2 ⊴H x1. When x1 ∈ F and x2 = x0, then x1 ⊴H x2. When x2 ∈ F and x1 = x0, then x2 ⊴H x1. When x1 = x2 = x0, then x1 ⊴H x2. This shows that ⊴H is a total ordering on H. ∀B ⊆ H with B ≠ ∅, we distinguish two exhaustive and mutually exclusive cases: Case 1: B = {x0}; Case 2: B ≠ {x0}. In Case 1, x0 is the least element of B. In Case 2, B \ {x0} ⊆ F is nonempty and hence admits a least element e0 ∈ B \ {x0} ⊆ F with respect to ⊴F. ∀x ∈ B, if x ∈ B \ {x0}, then e0 ⊴F x and hence e0 ⊴H x; if x = x0, then e0 ⊴H x. Hence, e0 is the least element of B since ⊴H is a total ordering on H. Therefore, ⊴H is a well-ordering on H and (H, ⊴H) ∈ ℰ.

Clearly, F ⊂ H, ⊴F = ⊴H on F, and, ∀x1 ∈ F, ∀x2 ∈ H \ F, we have x2 = x0 and x1 ⊴H x2. This implies that (F, ⊴F) ≼ (H, ⊴H). Since (F, ⊴F) is maximal in ℰ with respect to ≼, we must have (H, ⊴H) ≼ (F, ⊴F), and hence H ⊆ F. This is a contradiction. Therefore, F = E and E is well-ordered by ⊴F.

4. ⇒ 1. Let (Aλ)λ∈Λ be a collection of nonempty sets, where Λ is a set. Let A := ⋃_{λ∈Λ} Aλ. By the well-ordering principle, A may be well-ordered by ⊴. ∀λ ∈ Λ, Aλ ⊆ A is nonempty and admits a least element eλ ∈ Aλ. This defines a function f : Λ → A by f(λ) = eλ ∈ Aλ, ∀λ ∈ Λ. This completes the proof of the theorem. □

Example 2.19 Let Λ be an index set and (Aα)α∈Λ be a collection of sets. We will try to define the Cartesian (direct) product ∏_{α∈Λ} Aα. Let A = ⋃_{α∈Λ} Aα, which is a set by the axiom of union. Then, as we discussed in Sect. 2.3, A^Λ is a set, which consists of all functions of Λ to A. Define the projection functions πα : A^Λ → A, ∀α ∈ Λ, by, ∀f ∈ A^Λ, πα(f) = f(α). Then, we may define the set

∏_{α∈Λ} Aα := {f ∈ A^Λ | πα(f) ∈ Aα, ∀α ∈ Λ}

When all of the Aα's are nonempty, then, by the axiom of choice, the product ∏_{α∈Λ} Aα is also nonempty. ◊
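For finite data, the set defined in Example 2.19 can be enumerated explicitly; here is a Python sketch (with a hypothetical index set and factors, not from the book), representing each f ∈ ∏ Aα by its graph:

```python
from itertools import product as iproduct

def cartesian(family):
    """family: dict α -> finite set A_α. Return ∏ A_α as a set of functions,
    each function represented by its graph, a frozenset of (α, value) pairs."""
    alphas = sorted(family)
    return {frozenset(zip(alphas, combo))
            for combo in iproduct(*(sorted(family[a]) for a in alphas))}

P = cartesian({"x": {0, 1}, "y": {7}})
assert len(P) == 2                                    # |A_x| * |A_y|
assert all(dict(f)["y"] == 7 for f in P)              # π_y(f) ∈ A_y for each f
assert cartesian({"x": {0, 1}, "y": set()}) == set()  # one empty factor empties the product
```

The last assertion is the finite shadow of the remark above: no appeal to the axiom of choice is needed here, but the product is nonempty exactly when every factor is.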

Chapter 3

Topological Spaces

3.1 Fundamental Notions

Definition 3.1 A topological space (X, O) consists of a set X and a collection O of subsets (namely, open subsets) of X such that
(i) ∅, X ∈ O.
(ii) ∀O1, O2 ∈ O, we have O1 ∩ O2 ∈ O.
(iii) ∀(Oα)α∈Λ ⊆ O, where Λ is an index set, we have ⋃_{α∈Λ} Oα ∈ O.
The collection O is called a topology for the set X. ◊
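For finite examples, the three axioms can be verified mechanically. Below is a small Python sketch (the set X and the collections are hypothetical toy data, not from the book); for a finite collection O, the arbitrary unions in axiom (iii) reduce to unions of subfamilies:

```python
from itertools import chain, combinations

def is_topology(X, O):
    """Check the axioms of a topology for a finite set X and collection O."""
    X = frozenset(X)
    # axiom (iii): for finite O it suffices to test unions of all subfamilies
    fams = chain.from_iterable(combinations(O, r) for r in range(1, len(O) + 1))
    return (frozenset() in O and X in O                        # axiom (i)
            and all(A & B in O for A in O for B in O)          # axiom (ii)
            and all(frozenset().union(*f) in O for f in fams)) # axiom (iii)

X = {1, 2, 3}
O1 = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset(X)]
O2 = [frozenset(), frozenset({1}), frozenset({2}), frozenset(X)]
assert is_topology(X, O1)
assert not is_topology(X, O2)   # {1} ∪ {2} = {1, 2} is missing
```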

Definition 3.2 Let (X, O) be a topological space and F ⊆ X. The complement of F is $\widetilde{F} := X \setminus F$. F is said to be closed if $\widetilde{F} \in O$. The closure of F is given by $\overline{F} := \bigcap_{F \subseteq B,\ \widetilde{B} \in O} B$, which is clearly a closed set. The interior of F is given by $F^\circ := \bigcup_{B \subseteq F,\ B \in O} B$, which is clearly an open set. A point of closure of F is a point in $\overline{F}$. An interior point of F is a point in $F^\circ$. A boundary point of F is a point x ∈ X such that ∀O ∈ O with x ∈ O, we have $O \cap F \neq \emptyset$ and $O \cap \widetilde{F} \neq \emptyset$. The boundary of F, denoted by ∂F, is the set of all boundary points of F. An exterior point of F is a point in $(\widetilde{F})^\circ$, which is called the exterior of F. An accumulation point of F is a point x ∈ X such that ∀O ∈ O with x ∈ O, we have O ∩ (F \ {x}) ≠ ∅. ◊

Clearly, ∅ and X are both closed and open.

Proposition 3.3 Let (X, O) be a topological space and A, B, and E be subsets of X. Then,
(i) $E \subseteq \overline{E}$, $\overline{\overline{E}} = \overline{E}$, $E^\circ \subseteq E$, $(E^\circ)^\circ = E^\circ$, and $\widetilde{\overline{E}} = (\widetilde{E})^\circ$.
(ii) ∀x ∈ X, x is a point of closure of E if, and only if, ∀O ∈ O with x ∈ O, we have O ∩ E ≠ ∅.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. Z. Pan, Measure-Theoretic Calculus in Abstract Spaces, https://doi.org/10.1007/978-3-031-21912-2_3


(iii) ∀x ∈ X, x is an interior point of E if, and only if, ∃O ∈ O with x ∈ O such that O ⊆ E.
(iv) $\overline{A \cup B} = \overline{A} \cup \overline{B}$ and $(A \cap B)^\circ = A^\circ \cap B^\circ$.
(v) E is closed if, and only if, $E = \overline{E}$.
(vi) $\overline{E} = E^\circ \cup \partial E$.
(vii) X equals the disjoint union $E^\circ \cup \partial E \cup (\widetilde{E})^\circ$.

Proof (i) Clearly, $E \subseteq \overline{E}$ and $E^\circ \subseteq E$. Every closed C ⊇ E satisfies $C \supseteq \overline{E}$, and every closed $C \supseteq \overline{E}$ satisfies C ⊇ E; hence the two families of closed supersets coincide and

$$\overline{\overline{E}} = \bigcap_{C \supseteq \overline{E},\ \widetilde{C} \in O} C = \bigcap_{C \supseteq E,\ \widetilde{C} \in O} C = \overline{E}.$$

Note that

$$\widetilde{\overline{E}} = \Bigl(\,\bigcap_{B \supseteq E,\ \widetilde{B} \in O} B\Bigr)^{\sim} = \bigcup_{B \supseteq E,\ \widetilde{B} \in O} \widetilde{B} = \bigcup_{O_1 \subseteq \widetilde{E},\ O_1 \in O} O_1 = (\widetilde{E})^{\circ}.$$

Replacing E by $\widetilde{E}$ and taking complements yields $E^\circ = \widetilde{\overline{\widetilde{E}}}$ and $\widetilde{E^\circ} = \overline{\widetilde{E}}$. Furthermore,

$$(E^\circ)^\circ = \widetilde{\overline{\widetilde{E^\circ}}} = \widetilde{\overline{\overline{\widetilde{E}}}} = \widetilde{\overline{\widetilde{E}}} = E^\circ.$$

(ii) "Only if": ∀x ∈ $\overline{E}$, ∀O ∈ O with x ∈ O, suppose O ∩ E = ∅. Then $E \subseteq \widetilde{O}$ and $\widetilde{O}$ is closed, which implies $\overline{E} \subseteq \widetilde{O}$ and hence $x \in \widetilde{O}$. This contradicts x ∈ O. Hence, O ∩ E ≠ ∅.
"If": suppose $x \notin \overline{E}$. Then $O := \widetilde{\overline{E}} = (\widetilde{E})^\circ \in O$ satisfies x ∈ O and O ∩ E = ∅. Hence, the result holds.
(iii) "Only if": ∀x ∈ $E^\circ$, take $O := E^\circ \in O$; then x ∈ O ⊆ E.
"If": if ∃O ∈ O such that x ∈ O ⊆ E, then $x \in O \subseteq \bigcup_{B \subseteq E,\ B \in O} B = E^\circ$. Hence, the result holds.
(iv) Every open O ⊆ A ∩ B may be written O = O ∩ O with O ⊆ A and O ⊆ B; conversely, O1 ∩ O2 is an open subset of A ∩ B whenever O1, O2 ∈ O satisfy O1 ⊆ A and O2 ⊆ B. Then, by the distributivity of unions over intersections,

$$(A \cap B)^\circ = \bigcup_{O \subseteq A \cap B,\ O \in O} O = \bigcup_{O_1 \subseteq A,\ O_2 \subseteq B,\ O_1, O_2 \in O} (O_1 \cap O_2) = \Bigl(\,\bigcup_{O_1 \subseteq A,\ O_1 \in O} O_1\Bigr) \cap \Bigl(\,\bigcup_{O_2 \subseteq B,\ O_2 \in O} O_2\Bigr) = A^\circ \cap B^\circ.$$

We also have, using $\overline{E} = \bigl((\widetilde{E})^\circ\bigr)^{\sim}$ from (i),

$$\overline{A \cup B} = \bigl((\widetilde{A \cup B})^\circ\bigr)^{\sim} = \bigl((\widetilde{A} \cap \widetilde{B})^\circ\bigr)^{\sim} = \bigl(\widetilde{A}^{\,\circ} \cap \widetilde{B}^{\,\circ}\bigr)^{\sim} = \bigl(\widetilde{A}^{\,\circ}\bigr)^{\sim} \cup \bigl(\widetilde{B}^{\,\circ}\bigr)^{\sim} = \overline{A} \cup \overline{B}.$$

(v) "If": if $E = \overline{E}$, then E is closed since $\overline{E}$ is closed.
"Only if": since E is closed, E is itself a closed set containing E, so $\overline{E} \subseteq E$; with $E \subseteq \overline{E}$, we have $E = \overline{E}$. Hence, the result holds.
(vi) This result follows directly from (ii), (iii), and Definition 3.2.
(vii) Note that $X = \overline{E} \cup \widetilde{\overline{E}} = E^\circ \cup \partial E \cup (\widetilde{E})^\circ$, by (vi) and (i). By (iii) and Definition 3.2, $E^\circ$ and ∂E are disjoint. It is obvious that $\widetilde{\overline{E}} = (\widetilde{E})^\circ$ is disjoint from $\overline{E} = E^\circ \cup \partial E$. Hence, the result holds. □

To simplify notation in the theory, we will abuse the notation to write x ∈ 𝒳 when x ∈ X and A ⊆ 𝒳 when A ⊆ X for a topological space 𝒳 := (X, O). We will later simply discuss a topological space 𝒳 without further reference to the components of 𝒳, where the topology is understood to be O_𝒳. When it is clear from the context, we will neglect the subscript 𝒳.

Proposition 3.4 Let (X, O) be a topological space and A ⊆ X. A admits the subset topology OA := {O ∩ A | O ∈ O}.

Proof Clearly, OA is a collection of subsets of A. ∅ = ∅ ∩ A ∈ OA and A = X ∩ A ∈ OA. ∀OA1, OA2 ∈ OA, ∃O1, O2 ∈ O such that OA1 = O1 ∩ A and OA2 = O2 ∩ A. Then, O1 ∩ O2 ∈ O since O is a topology, and OA1 ∩ OA2 = (O1 ∩ O2) ∩ A ∈ OA. ∀(OAα)α∈Λ ⊆ OA, where Λ is an index set, we have, ∀α ∈ Λ, ∃Oα ∈ O such that OAα = Oα ∩ A. Then, ⋃_{α∈Λ} Oα ∈ O since O is a topology. Therefore, ⋃_{α∈Λ} OAα = (⋃_{α∈Λ} Oα) ∩ A ∈ OA. Hence, OA is a topology on A. □

Let (X, O) be a topological space and A ⊆ X. The property of a set E ⊆ A being open or closed is relative with respect to (X, O); that is, this property may change if we consider instead the subset topology (A, OA).

Proposition 3.5 Let X be a topological space, A ⊆ X be endowed with the subset topology OA, and E ⊆ A. Then,
(1) E is closed in OA if, and only if, E = A ∩ F, where F ⊆ X is closed in OX.
(2) The closure of E relative to (A, OA) (the closure of E in OA) is equal to $\overline{E} \cap A$, where $\overline{E}$ is the closure of E relative to X.

Proof Here, the set complementation and the set closure operation are relative to X.
(1) "If": $A \setminus E = A \setminus (A \cap F) = A \cap \widetilde{A \cap F} = A \cap (\widetilde{A} \cup \widetilde{F}) = A \cap \widetilde{F}$. Since F is closed in OX, $\widetilde{F} \in O_X$. Then, $A \setminus E = A \cap \widetilde{F} \in O_A$. Hence, E is closed in OA.


"Only if": A \ E ∈ OA. Then, ∃O ∈ OX such that A \ E = A ∩ O. Then, $E = A \setminus (A \setminus E) = A \cap \widetilde{A \cap O} = A \cap \widetilde{O}$, and $\widetilde{O}$ is closed in OX. Hence, the result holds.
(2) By (1), $\overline{E} \cap A$ is closed in OA. Then, the closure of E relative to (A, OA) is contained in $\overline{E} \cap A$. On the other hand, by Proposition 3.3, if x ∈ X is a point of closure of E relative to X and x ∈ A, then x is a point of closure of E in OA. Then, $\overline{E} \cap A$ is contained in the closure of E relative to (A, OA). Hence, the result holds.
This completes the proof of the proposition. □

Definition 3.6 For two topologies O1 and O2 over the same set X, we will say that O1 is stronger (finer) than O2 if O1 ⊃ O2, in which case O2 is said to be weaker (coarser) than O1. ◊

Proposition 3.7 Let X be a set and A ⊆ X2. Then, there exists the weakest topology O on X such that A ⊆ O. This topology is called the topology generated by A.

Proof Let M := {𝒯 ⊆ X2 | A ⊆ 𝒯 and 𝒯 is a topology on X} and O := ⋂_{𝒯∈M} 𝒯. Clearly, X2 ∈ M, and hence O is well-defined. Then: (i) ∅, X ∈ 𝒯, ∀𝒯 ∈ M; hence, ∅, X ∈ O. (ii) ∀A1, A2 ∈ O, we have A1, A2 ∈ 𝒯, ∀𝒯 ∈ M. Then, A1 ∩ A2 ∈ 𝒯, ∀𝒯 ∈ M. Hence, A1 ∩ A2 ∈ O. (iii) ∀(Aα)α∈Λ ⊆ O, where Λ is an index set, we have, ∀α ∈ Λ, ∀𝒯 ∈ M, Aα ∈ 𝒯. Then, ⋃_{α∈Λ} Aα ∈ 𝒯, ∀𝒯 ∈ M. Hence, ⋃_{α∈Λ} Aα ∈ O. Therefore, O is a topology on X. Clearly, A ⊆ O since A ⊆ 𝒯, ∀𝒯 ∈ M. Therefore, O is the weakest topology containing A. □
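For finite data, the generated topology of Proposition 3.7 can be computed by closing the collection under the two topology operations, after which the notions of Definition 3.2 become mechanical; a Python sketch (hypothetical sets, not the book's; for a finite collection, arbitrary unions reduce to iterated pairwise unions):

```python
def generate_topology(X, A):
    """Weakest topology on X containing A (Proposition 3.7), by fixed point."""
    X = frozenset(X)
    O = {frozenset(), X} | {frozenset(S) for S in A}
    changed = True
    while changed:                       # close under pairwise ∩ and ∪
        changed = False
        for U in list(O):
            for V in list(O):
                for W in (U & V, U | V):
                    if W not in O:
                        O.add(W)
                        changed = True
    return O

def interior(F, O):                      # largest open set contained in F
    return frozenset().union(*(B for B in O if B <= F))

def closure(F, X, O):                    # smallest closed set containing F
    out = frozenset(X)
    for B in O:
        C = frozenset(X) - B             # complements of open sets are closed
        if F <= C:
            out &= C
    return out

X = {1, 2, 3}
O = generate_topology(X, [{1}, {1, 2}])
E = frozenset({2})
boundary = closure(E, X, O) - interior(E, O)
# Proposition 3.3 (vii): X is the disjoint union of interior, boundary,
# and exterior of E.
parts = [interior(E, O), boundary, interior(frozenset(X) - E, O)]
assert frozenset().union(*parts) == frozenset(X)
assert sum(map(len, parts)) == len(X)    # the three parts are pairwise disjoint
```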

3.2 Continuity

Definition 3.8 Let (X, OX) and (Y, OY) be topological spaces, D ⊆ X with the subset topology OD, and f : D → Y (or f : (D, OD) → (Y, OY) to be more specific). Then, f is said to be continuous if, ∀OY ∈ OY, we have finv(OY) ∈ OD. f is said to be continuous at x0 ∈ D if, ∀OY ∈ OY with f(x0) ∈ OY, ∃U ∈ OX with x0 ∈ U such that f(U) ⊆ OY. f is said to be continuous on E ⊆ D if it is continuous at x, ∀x ∈ E. ◊

Proposition 3.9 Let X and Y be topological spaces, D ⊆ X with the subset topology OD, and f : D → Y. f is continuous if, and only if, ∀x0 ∈ D, f is continuous at x0.

Proof "If": ∀OY ∈ OY, ∀x ∈ finv(OY) ⊆ D. Since f is continuous at x, ∃Ux ∈ OX with x ∈ Ux such that f(Ux) ⊆ OY, which implies, by Proposition 2.5, that Ux ∩ D ⊆ finv(OY). Then, finv(OY) = ⋃_{x∈finv(OY)} (Ux ∩ D) = (⋃_{x∈finv(OY)} Ux) ∩ D ∈ OD. Hence, f is continuous.


"Only if": ∀x0 ∈ D, ∀OY ∈ OY with f(x0) ∈ OY, let U := finv(OY) ∈ OD. By Proposition 3.4, ∃Ū ∈ OX such that U = Ū ∩ D. Then, x0 ∈ U ⊆ Ū. By Proposition 2.5, f(Ū) = f(U) ⊆ OY. Hence, f is continuous at x0.
This completes the proof of the proposition. □

Proposition 3.10 Let X and Y be topological spaces and f : X → Y. f is continuous if, and only if, ∀B ⊆ Y with $\widetilde{B} \in O_Y$, we have $\bigl(f_{inv}(B)\bigr)^{\sim} \in O_X$; that is, the inverse image of any closed set in Y is closed in X.

Proof "If": ∀O ∈ OY, we have, by Proposition 2.5, $f_{inv}(O) = \bigl(f_{inv}(\widetilde{O})\bigr)^{\sim} \in O_X$. Hence, f is continuous.
"Only if": ∀B ⊆ Y with $\widetilde{B} \in O_Y$. Since f is continuous, then, by Proposition 2.5, $\bigl(f_{inv}(B)\bigr)^{\sim} = f_{inv}(\widetilde{B}) \in O_X$. Hence, the result holds.
This completes the proof of the proposition. □

Theorem 3.11 Let X and Y be topological spaces, f : X → Y, and X = X1 ∪ X2, where X1 and X2 are both open or both closed. Let X1 and X2 be endowed with the subset topologies OX1 and OX2, respectively. Assume that f|X1 : X1 → Y and f|X2 : X2 → Y are continuous. Then, f is continuous.

Proof Consider the case that X1 and X2 are both open. ∀x0 ∈ X, without loss of generality, assume x0 ∈ X1. ∀O ∈ OY with f(x0) ∈ O, since f|X1 is continuous, then, by Proposition 3.9, ∃U ∈ OX1 with x0 ∈ U such that f|X1(U) ⊆ O. Since X1 ∈ OX, then U ∈ OX. Note that f(U) = f|X1(U) ⊆ O, since U ⊆ X1. Hence, f is continuous at x0. By the arbitrariness of x0 and Proposition 3.9, f is continuous.

Consider the case that X1 and X2 are both closed. For every closed subset B ⊆ Y, we have finv(B) ⊆ X. Then, finv(B) ∩ X1 = (f|X1)inv(B) is closed in OX1, by Proposition 3.10 and the continuity of f|X1. Similarly, finv(B) ∩ X2 = (f|X2)inv(B) is closed in OX2. Since X1 and X2 are closed sets in OX, then finv(B) ∩ X1 and finv(B) ∩ X2 are closed in OX, by Proposition 3.5. Then, finv(B) = (finv(B) ∩ X1) ∪ (finv(B) ∩ X2) is closed in OX. By Proposition 3.10, f is continuous.
This completes the proof of the theorem. □

Proposition 3.12 Let X, Y, and Z be topological spaces, f : X → Y, g : Y → Z, and x0 ∈ X. Assume that f is continuous at x0 and g is continuous at y0 := f(x0). Then, g ◦ f : X → Z is continuous at x0.

Proof ∀OZ ∈ OZ with g(f(x0)) ∈ OZ, since g is continuous at f(x0), ∃OY ∈ OY with f(x0) ∈ OY such that g(OY) ⊆ OZ. Since f is continuous at x0, ∃OX ∈ OX with x0 ∈ OX such that f(OX) ⊆ OY. Then, g(f(OX)) ⊆ OZ. Hence, g ◦ f is continuous at x0. This completes the proof of the proposition. □

Definition 3.13 Let X and Y be topological spaces and f : X → Y. f is said to be a homeomorphism between X and Y if it is bijective and continuous and finv : Y → X is also continuous. The spaces X and Y are said to be homeomorphic if there exists a homeomorphism between them. ◊
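On finite spaces, the preimage characterizations of continuity can be checked directly; a Python sketch with hypothetical two-point (Sierpiński-like) spaces, not an example from the book:

```python
OX = {frozenset(), frozenset({1}), frozenset({1, 2})}   # topology on X = {1, 2}
OY = {frozenset(), frozenset({1}), frozenset({1, 2})}   # topology on Y = {1, 2}

def finv(f, B):
    """Preimage of B under f, with f given as a dict x -> f(x)."""
    return frozenset(x for x in f if f[x] in B)

def is_continuous(f, OX, OY):
    # Definition 3.8: preimages of open sets are open.
    return all(finv(f, O) in OX for O in OY)

f_id = {1: 1, 2: 2}
f_swap = {1: 2, 2: 1}
assert is_continuous(f_id, OX, OY)
assert not is_continuous(f_swap, OX, OY)   # finv({1}) = {2} is not open
# Proposition 3.10: equivalently, preimages of closed sets are closed.
closed_Y = [frozenset({1, 2}) - O for O in OY]
assert all(frozenset({1, 2}) - finv(f_id, C) in OX for C in closed_Y)
```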

46

3 Topological Spaces

Any properties invariant under homeomorphisms are called topological properties. Homeomorphisms preserve topological properties in topological spaces. Isomorphisms preserve algebraic properties in algebraic systems. Isometries preserve metric properties in metric spaces.

Definition 3.14 Let X be a topological space, D ⊆ X with the subset topology OD, and f : D → R. f is said to be upper semicontinuous if, ∀a ∈ R, finv((−∞, a)) ∈ OD. f is said to be upper semicontinuous at x0 ∈ D if, ∀ε ∈ (0, ∞) ⊂ R, ∃U ∈ OX with x0 ∈ U such that f(x) < f(x0) + ε, ∀x ∈ U ∩ D. f is said to be lower semicontinuous if −f is upper semicontinuous. ◊

Proposition 3.15 Let X be a topological space, D ⊆ X with the subset topology OD, and f : D → R. f is upper semicontinuous if, and only if, f is upper semicontinuous at x0, ∀x0 ∈ D.

Proof This is straightforward and is therefore omitted. □

Proposition 3.16 Let X and Y be topological spaces, f : X → R and g : X → R be upper semicontinuous at x0 ∈ X, h : Y → X be continuous at y0 ∈ Y, and h(y0) = x0. Then, f + g is upper semicontinuous at x0 and f ◦ h is upper semicontinuous at y0. Furthermore, if f is also lower semicontinuous at x0, then f is continuous at x0.

Proof ∀ε ∈ (0, ∞) ⊂ R, ∃Uf ∈ OX with x0 ∈ Uf such that f(x) < f(x0) + ε, ∀x ∈ Uf, by the upper semicontinuity of f. By the upper semicontinuity of g, ∃Ug ∈ OX with x0 ∈ Ug such that g(x) < g(x0) + ε, ∀x ∈ Ug. Then, x0 ∈ U := Uf ∩ Ug ∈ OX and (f + g)(x) = f(x) + g(x) < f(x0) + ε + g(x0) + ε = (f + g)(x0) + 2ε, ∀x ∈ U. Since ε is arbitrary, f + g is upper semicontinuous at x0.

∀ε ∈ (0, ∞) ⊂ R, ∃Uf ∈ OX with x0 ∈ Uf such that f(x) < f(x0) + ε, ∀x ∈ Uf, by the upper semicontinuity of f. By the continuity of h, ∃Uh ∈ OY with y0 ∈ Uh such that h(y) ∈ Uf, ∀y ∈ Uh. Then, (f ◦ h)(y) = f(h(y)) < f(h(y0)) + ε = (f ◦ h)(y0) + ε, ∀y ∈ Uh. Hence, f ◦ h is upper semicontinuous at y0.

∀ε ∈ (0, ∞) ⊂ R, by the upper semicontinuity of f, ∃U1 ∈ OX with x0 ∈ U1 such that f(x) < f(x0) + ε, ∀x ∈ U1. By the lower semicontinuity of f, ∃U2 ∈ OX with x0 ∈ U2 such that f(x) > f(x0) − ε, ∀x ∈ U2. Then, x0 ∈ U := U1 ∩ U2 ∈ OX and |f(x) − f(x0)| < ε, ∀x ∈ U. Hence, f is continuous at x0.
This completes the proof of the proposition. □
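Definition 3.14 can also be tested on a finite space, where f is upper semicontinuous exactly when every strict sublevel set is open; a sketch with hypothetical data (it suffices to test thresholds at the finitely many values of f, plus one larger value, since the sublevel sets change only at values of f):

```python
X = frozenset({1, 2, 3})
O = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def is_usc(f, O, X):
    """f (a dict x -> real) is upper semicontinuous iff {x | f(x) < a} ∈ O
    for every a; testing a at the values of f (and one beyond) is enough."""
    thresholds = sorted(set(f.values())) + [max(f.values()) + 1]
    return all(frozenset(x for x in X if f[x] < a) in O for a in thresholds)

f = {1: 0, 2: 1, 3: 2}   # sublevel sets ∅, {1}, {1, 2}, X are all open
g = {1: 2, 2: 1, 3: 0}   # {x | g(x) < 1} = {3} is not open
assert is_usc(f, O, X)
assert not is_usc(g, O, X)
```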

3.3 Basis and Countability

Definition 3.17 Let (X, O) be a topological space and B ⊆ O. B is said to be a basis of the topological space if, ∀O ∈ O, ∀x ∈ O, ∃B ∈ B such that x ∈ B ⊆ O. Bx ⊆ O with x ∈ B, ∀B ∈ Bx, is said to be a basis at x ∈ X if, ∀O ∈ O with x ∈ O, ∃B ∈ Bx such that x ∈ B ⊆ O. ◊


If B is a basis for the topology O, then the topology generated by B is O and, ∀O ∈ O, O = ⋃_{x∈O} Bx, where Bx ∈ B is such that x ∈ Bx ⊆ O.

Proposition 3.18 Let X be a set and B ⊆ X2. Then, B is a basis for the topology O generated by B if, and only if, the following two conditions hold:
(i) ∀x ∈ X, ∃B ∈ B such that x ∈ B (that is, ⋃_{B∈B} B = X).
(ii) ∀B1, B2 ∈ B, ∀x ∈ B1 ∩ B2, ∃B3 ∈ B such that x ∈ B3 ⊆ B1 ∩ B2.

Proof "Only if": let B be a basis for O. Since X ∈ O, then ∀x ∈ X, ∃B ∈ B such that x ∈ B ⊆ X. So (i) is true. ∀B1, B2 ∈ B ⊆ O, we have B1 ∩ B2 ∈ O. ∀x ∈ B1 ∩ B2, ∃B3 ∈ B such that x ∈ B3 ⊆ B1 ∩ B2. Hence, (ii) is also true.
"If": define Ō := {O ⊆ X | ∀x ∈ O, ∃B ∈ B such that x ∈ B ⊆ O}. Clearly, B ⊆ Ō and ∅ ∈ Ō; X ∈ Ō by (i). ∀O1, O2 ∈ Ō, ∀x ∈ O1 ∩ O2, we have x ∈ O1 and x ∈ O2. Then, by the definition of Ō, ∃B1, B2 ∈ B such that x ∈ B1 ⊆ O1 and x ∈ B2 ⊆ O2. Then, x ∈ B1 ∩ B2. By (ii), ∃B3 ∈ B such that x ∈ B3 ⊆ B1 ∩ B2 ⊆ O1 ∩ O2. Then, O1 ∩ O2 ∈ Ō. ∀(Oα)α∈Λ ⊆ Ō, where Λ is an index set, let O := ⋃_{α∈Λ} Oα. ∀x ∈ O, ∃α ∈ Λ such that x ∈ Oα. By the definition of Ō, ∃B ∈ B such that x ∈ B ⊆ Oα ⊆ O. Hence, O ∈ Ō. Therefore, Ō is a topology on X. By the definition of Ō, B is a basis for Ō. Note that O ⊆ Ō since O is the weakest topology containing B; conversely, each member of Ō is a union of members of B ⊆ O and hence belongs to O, so Ō = O. Then, B is a basis for O.
This completes the proof of the proposition. □

Example 3.19 For the real line R, let A := {(a, b) | a, b ∈ R, a < b}, the collection of open intervals. Then, the topology generated by A, OR, is the usual topology on R as we know it. By Proposition 3.18, A is a basis for this topology. ◊

Definition 3.20 A topological space (X, O) is said to satisfy the first axiom of countability if there exists a countable basis at each x ∈ X, i.e., ∀x ∈ X, ∃Bx ⊆ O countable such that, ∀O ∈ O with x ∈ O, we have ∃B ∈ Bx with x ∈ B ⊆ O, and, ∀B ∈ Bx, we have x ∈ B. In this case, we will say that the topological space is first countable. The topological space is said to satisfy the second axiom of countability if there exists a countable basis B for O, in which case we will say that it is second countable. ◊

Clearly, a second countable topological space is also first countable.

Example 3.21 The real line is first countable, where a countable basis at any x ∈ R consists of intervals of the form (x − r, x + r) with r ∈ Q and r > 0. The real line is also second countable, where a countable basis for the topology consists of intervals of the form (r1, r2) with r1, r2 ∈ Q and r1 < r2. ◊

When bases are available on topological spaces X and Y, in Definitions 3.8 and 3.14, we may restrict the open sets OY and U to be basis open sets without changing the meaning of the definition. In Proposition 3.3, we may restrict O to be a basis open set and the results still hold.
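The two conditions of Proposition 3.18 are easy to test mechanically; here is a sketch with hypothetical finite data (not an example from the book):

```python
def is_basis(X, B):
    """Check conditions (i) and (ii) of Proposition 3.18."""
    covers = all(any(x in Bs for Bs in B) for x in X)            # (i)
    refines = all(any(x in B3 and B3 <= B1 & B2 for B3 in B)     # (ii)
                  for B1 in B for B2 in B for x in B1 & B2)
    return covers and refines

X = {1, 2, 3}
B_good = [frozenset({1}), frozenset({2, 3}), frozenset({3})]
B_bad = [frozenset({1, 2}), frozenset({2, 3})]   # x = 2 lies in the
# intersection {2}, but no member of B_bad fits inside it
assert is_basis(X, B_good)
assert not is_basis(X, B_bad)
```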


Definition 3.22 A collection (Aα)α∈Λ of sets, where Λ is an index set, is said to be a covering of a set X if X ⊆ ⋃_{α∈Λ} Aα. It is an open covering if the Aα's are open sets in a specific topological space. ◊

Definition 3.23 A topological space X is said to be Lindelöf if any open covering of X has a countable subcovering. ◊

Proposition 3.24 A second countable topological space X is Lindelöf.

Proof Let (Bi)i∈N be a countable basis for X, where N is a countable index set. Let (Oα)α∈Λ be an open covering of X, where Λ is an index set. ∀α ∈ Λ, Oα = ⋃_{i∈Nα} Bi, where Nα ⊆ N. Then, X = ⋃_{α∈Λ} ⋃_{i∈Nα} Bi. We may determine a subcollection (Bi)i∈N̄, where N̄ ⊆ N, such that these Bi's appear at least once in the previous union. Then, X = ⋃_{i∈N̄} Bi. ∀i ∈ N̄, the collection Ai := {Oα | α ∈ Λ, Bi ⊆ Oα} is nonempty. Then, by the axiom of choice, there exists an assignment (Oαi)i∈N̄ such that Oαi ∈ Ai, ∀i ∈ N̄. Then, (Oαi)i∈N̄ is a countable subcover. This completes the proof of the proposition. □

3.4 Products of Topological Spaces

Proposition 3.25 Let Λ be an index set and (Xα, Oα) be a topological space, ∀α ∈ Λ. The product topology O on ∏_{α∈Λ} Xα is the topology generated by

B := {∏_{α∈Λ} Oα | Oα ∈ Oα, ∀α ∈ Λ, and Oα = Xα for all but a finite number of α's}.

The collection B forms a basis for the topology O. We will also write (∏_{α∈Λ} Xα, O) = ∏_{α∈Λ}(Xα, Oα). When (Xα, Oα) = (X, O) =: X, ∀α ∈ Λ, we will denote ∏_{α∈Λ}(Xα, Oα) by X^Λ.

Proof We will prove this by Proposition 3.18. Since ∏_{α∈Λ} Xα ∈ B, then (i) is true. ∀B1, B2 ∈ B, B1 = ∏_{α∈Λ} O1α, O1α ∈ Oα, ∀α ∈ Λ, and O1α = Xα, ∀α ∈ Λ \ Λ1, where Λ1 ⊆ Λ is a finite set; B2 = ∏_{α∈Λ} O2α, O2α ∈ Oα, ∀α ∈ Λ, and O2α = Xα, ∀α ∈ Λ \ Λ2, where Λ2 ⊆ Λ is a finite set. Let B3 = B1 ∩ B2 = ∏_{α∈Λ}(O1α ∩ O2α). Clearly, O1α ∩ O2α ∈ Oα, ∀α ∈ Λ, and O1α ∩ O2α = Xα, ∀α ∈ Λ \ (Λ1 ∪ Λ2), where Λ1 ∪ Λ2 ⊆ Λ is a finite set. Hence, B3 ∈ B. Then, (ii) also holds. By Proposition 3.18, B is a basis for O. This completes the proof of the proposition. □

Definition 3.26 Let Xα be a topological space, ∀α ∈ Λ, where Λ is an index set, and Y = ∏_{α∈Λ} Xα be the product topological space. Define a collection of projection functions πα : Y → Xα, ∀α ∈ Λ, by πα(y) = yα for y = (yα)α∈Λ ∈ Y. %


Proposition 3.27 Let (Xα, Oα) be a topological space, ∀α ∈ Λ, where Λ is an index set, and (Y, O) = ∏_{α∈Λ}(Xα, Oα) be the product topological space. Then, O is the weakest topology on which the projection functions πα, ∀α ∈ Λ, are continuous.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: Λ = ∅, and Case 2: Λ ≠ ∅.

Case 1: Λ = ∅. (Y, O) = ({∅}, {∅, {∅}}). Then, O is the only topology on Y. Hence, the result is true.

Case 2: Λ ≠ ∅. Define

A := {O ⊆ Y | O = ∏_{α∈Λ} Oα, ∃α0 ∈ Λ such that Oα0 ∈ Oα0 and Oα = Xα, ∀α ∈ Λ \ {α0}}.

Clearly, A ≠ ∅ is the collection of inverse images of open sets under πα, ∀α ∈ Λ. It is also clear that A ⊆ B ⊆ O, where B is the basis for O that we introduced in Proposition 3.25. ∀B ∈ B, we have B = ⋂_{i=1}^k Ai for some k ∈ N and A1, . . . , Ak ∈ A. Hence, the topology generated by A contains B. Then, the topology generated by A equals O. Hence, O is the weakest topology containing A. This completes the proof of the proposition. □

Proposition 3.28 Let Xi := (Xi, Oi) be a second countable topological space, ∀i ∈ N, where N is a countable index set. Then, the product topological space X := (X, O) := ∏_{i∈N} Xi is second countable.

Proof ∀i ∈ N, let Bi ⊆ Oi be a countable basis for Xi, and without loss of generality, assume Xi ∈ Bi. Let B be the basis for the product topological space X as defined in Proposition 3.25. Define

B̄ := {∏_{i∈N} Bi ⊆ X | Bi ∈ Bi, ∀i ∈ N, and Bi = Xi for all but a finite number of i's} ⊆ B ⊆ O.

Since the Bi's are countable, then B̄ is countable. We will show that B̄ is a basis for O. ∀O ∈ O, ∀x ∈ O, ∃OB ∈ B such that x ∈ OB ⊆ O, where OB = ∏_{i∈N} OBi, OBi ∈ Oi, ∀i ∈ N, OBi = Xi, ∀i ∈ N \ N̄, and N̄ ⊆ N is a finite set. ∀i ∈ N̄, πi(x) ∈ OBi. Then, ∃Bi ∈ Bi such that πi(x) ∈ Bi ⊆ OBi. Let Bi := Xi, ∀i ∈ N \ N̄. Let B := ∏_{i∈N} Bi ∈ B̄. Clearly, x ∈ B ⊆ OB ⊆ O. Hence, B̄ is a basis for X. Then, X is second countable. This completes the proof of the proposition. □


Proposition 3.29 Let Xα := (Xα, Oα) be a topological space, ∀α ∈ Λ, where Λ is an index set. Let X = (X, O) := ∏_{α∈Λ} Xα be the product space. Assume that Aα ⊆ Xα, ∀α ∈ Λ. Then, \overline{∏_{α∈Λ} Aα} = ∏_{α∈Λ} Āα.

Proof Note that

∏_{α∈Λ} Āα = ∼( ⋃_{α∈Λ} πα,inv(∼Āα) ),

where πα : X → Xα is the projection function, ∀α ∈ Λ. ∀α ∈ Λ, by Definition 3.2, ∼Āα ∈ Oα. By Proposition 3.27, πα is continuous. Then, πα,inv(∼Āα) ∈ O. Then, ∏_{α∈Λ} Āα is a closed set in X. Clearly, ∏_{α∈Λ} Aα ⊆ ∏_{α∈Λ} Āα. Then, we have \overline{∏_{α∈Λ} Aα} ⊆ ∏_{α∈Λ} Āα.

On the other hand, ∀x ∈ ∏_{α∈Λ} Āα, for all basis open sets O ∈ O with x ∈ O, by Proposition 3.25, we have O = ∏_{α∈Λ} Oα, where Oα ∈ Oα, ∀α ∈ Λ, and Oα = Xα for all but finitely many α's. ∀α ∈ Λ, πα(x) ∈ Oα and πα(x) ∈ Āα. By Proposition 3.3, Oα ∩ Aα ≠ ∅. Then, by Axiom of Choice, we have O ∩ (∏_{α∈Λ} Aα) = ∏_{α∈Λ}(Oα ∩ Aα) ≠ ∅. Thus, by Proposition 3.3, x ∈ \overline{∏_{α∈Λ} Aα}. Hence, we have ∏_{α∈Λ} Āα ⊆ \overline{∏_{α∈Λ} Aα}.

Therefore, \overline{∏_{α∈Λ} Aα} = ∏_{α∈Λ} Āα. This completes the proof of the proposition. □

An immediate consequence of the above proposition is that if Aα ⊆ Xα is closed, ∀α ∈ Λ, where the Xα's are topological spaces, then ∏_{α∈Λ} Aα is closed in the product topology.

Proposition 3.30 Let (Λβ)β∈Γ be a collection of pairwise disjoint sets, where Γ is an index set. Let (Xα, Oα) be a topological space, ∀α ∈ ⋃_{β∈Γ} Λβ. ∀β ∈ Γ, (∏_{α∈Λβ} Xα, O^β) is the product space with the product topology. (∏_{β∈Γ} ∏_{α∈Λβ} Xα, O^Γ) is the product space of product spaces with the product topology. Let (∏_{α∈⋃_{β∈Γ} Λβ} Xα, O) be the product space with the product topology. Then, (∏_{β∈Γ} ∏_{α∈Λβ} Xα, O^Γ) and (∏_{α∈⋃_{β∈Γ} Λβ} Xα, O) are homeomorphic.

Proof Define a mapping E : ∏_{β∈Γ} ∏_{α∈Λβ} Xα → ∏_{α∈⋃_{β∈Γ} Λβ} Xα by, ∀x ∈ ∏_{β∈Γ} ∏_{α∈Λβ} Xα, ∀α ∈ ⋃_{β∈Γ} Λβ, ∃! βα ∈ Γ · α ∈ Λ_{βα}, πα(E(x)) = π_α^{βα}(π_{βα}^{Γ}(x)).

∀y ∈ ∏_{α∈⋃_{β∈Γ} Λβ} Xα, ∀β ∈ Γ, define xβ ∈ ∏_{α∈Λβ} Xα to be such that π_α^{β}(xβ) = πα(y), ∀α ∈ Λβ. Define x ∈ ∏_{β∈Γ} ∏_{α∈Λβ} Xα by π_β^{Γ}(x) = xβ, ∀β ∈ Γ. Then, ∀α ∈ ⋃_{β∈Γ} Λβ, ∃! βα · α ∈ Λ_{βα}, and πα(E(x)) = π_α^{βα}(π_{βα}^{Γ}(x)) = πα(y), and hence E(x) = y. Hence, E is surjective.

∀x, z ∈ ∏_{β∈Γ} ∏_{α∈Λβ} Xα with E(x) = E(z). ∀α ∈ ⋃_{β∈Γ} Λβ, ∃! βα · α ∈ Λ_{βα}. Then, π_α^{βα}(π_{βα}^{Γ}(x)) = πα(E(x)) = πα(E(z)) = π_α^{βα}(π_{βα}^{Γ}(z)). Hence, ∀β ∈ Γ, ∀α ∈ Λβ, π_α^{β}(π_β^{Γ}(x)) = π_α^{β}(π_β^{Γ}(z)), which implies that π_β^{Γ}(x) = π_β^{Γ}(z). Hence, x = z. This shows that E is injective. Hence, E is bijective and admits an inverse Einv.

Next, we show that E is continuous by showing that E is continuous at x0, ∀x0 ∈ ∏_{β∈Γ} ∏_{α∈Λβ} Xα. Let y0 = E(x0). ∀B ∈ O which is a basis open set with y0 ∈ B. Then, B = ∏_{α∈⋃_{β∈Γ} Λβ} Bα, Bα ∈ Oα, ∀α ∈ ⋃_{β∈Γ} Λβ, and Bα = Xα for all α's except finitely many α's, say α ∈ ΛN. ∀α ∈ ⋃_{β∈Γ} Λβ, ∃! βα ∈ Γ · α ∈ Λ_{βα}. Let ΓN = {βα | α ∈ ΛN}. Then, ΓN is a finite set. ∀β ∈ ΓN, let ΛNβ := ΛN ∩ Λβ, which is a nonempty finite set. Then, ΛN equals the disjoint union ⋃_{β∈ΓN} ΛNβ. Define B^β := ∏_{α∈Λβ} Bα, ∀β ∈ Γ. Then, ∀β ∈ Γ, B^β is a basis open set in (∏_{α∈Λβ} Xα, O^β). Define B^Γ := ∏_{β∈Γ} B^β. ∀β ∈ Γ \ ΓN, B^β = ∏_{α∈Λβ} Xα. Clearly, B^Γ is a basis open set in O^Γ. ∀β ∈ ΓN, ∀α ∈ ΛNβ, π_α^{β}(π_β^{Γ}(x0)) = πα(y0) ∈ πα(B) = Bα. Then, x0 ∈ B^Γ. ∀x ∈ B^Γ, ∀β ∈ ΓN, ∀α ∈ ΛNβ, we have πα(E(x)) = π_α^{β}(π_β^{Γ}(x)) ∈ π_α^{β}(π_β^{Γ}(B^Γ)) = π_α^{β}(B^β) = Bα. Hence, E(x) ∈ B, which implies that E(B^Γ) ⊆ B. Hence, E is continuous at x0. Then, by the arbitrariness of x0 and Proposition 3.9, E is continuous.

Finally, we will show Einv is continuous by showing that, ∀y0 ∈ ∏_{α∈⋃_{β∈Γ} Λβ} Xα, Einv is continuous at y0. ∀y0 ∈ ∏_{α∈⋃_{β∈Γ} Λβ} Xα. Let x0 = Einv(y0) ∈ ∏_{β∈Γ} ∏_{α∈Λβ} Xα. For any basis open set B^Γ ∈ O^Γ with x0 ∈ B^Γ. Then, B^Γ = ∏_{β∈Γ} B̄^β, B̄^β ∈ O^β, ∀β ∈ Γ, and B̄^β = ∏_{α∈Λβ} Xα for all β ∈ Γ except finitely many β's, say β ∈ ΓN. ∀β ∈ ΓN, π_β^{Γ}(x0) ∈ B̄^β. Then, there exists a basis open set B^β ∈ O^β such that π_β^{Γ}(x0) ∈ B^β ⊆ B̄^β. Then, B^β = ∏_{α∈Λβ} Bα, Bα ∈ Oα, ∀α ∈ Λβ, and Bα = Xα for all α ∈ Λβ except finitely many α's, say α ∈ ΛNβ. ∀α ∈ ΛNβ, πα(y0) = π_α^{β}(π_β^{Γ}(x0)) ∈ Bα. Let ΛN = ⋃_{β∈ΓN} ΛNβ, which is a finite set and the union is pairwise disjoint. Let Bα = Xα, ∀α ∈ Λβ, ∀β ∈ Γ \ ΓN. Define B := ∏_{α∈⋃_{β∈Γ} Λβ} Bα. Clearly, B is a basis open set in O and y0 ∈ B. ∀y ∈ B. Let x = Einv(y); then, y = E(x). ∀β ∈ Γ, ∀α ∈ Λβ, πα(y) = πα(E(x)) = π_α^{β}(π_β^{Γ}(x)) ∈ Bα. Then, ∀β ∈ ΓN, π_β^{Γ}(x) ∈ ∏_{α∈Λβ} Bα = B^β ⊆ B̄^β. Then, x ∈ B^Γ. Hence, Einv(B) ⊆ B^Γ. Hence, Einv is continuous at y0. By the arbitrariness of y0 and Proposition 3.9, we have Einv is continuous.

This shows that E is a homeomorphism of (∏_{β∈Γ} ∏_{α∈Λβ} Xα, O^Γ) to (∏_{α∈⋃_{β∈Γ} Λβ} Xα, O) and completes the proof of the proposition. □

Proposition 3.31 Let Λ be an index set, and let Xα := (Xα, OXα) and Yα := (Yα, OY α) be topological spaces, ∀α ∈ Λ. Assume that Xα and Yα are homeomorphic, ∀α ∈ Λ. Then, the product topological spaces X = (∏_{α∈Λ} Xα, OX) := ∏_{α∈Λ} Xα and Y = (∏_{α∈Λ} Yα, OY) := ∏_{α∈Λ} Yα are homeomorphic.


Proof ∀α ∈ Λ, since Xα and Yα are homeomorphic, then ∃Hα : Xα → Yα such that Hα is bijective and both Hα and Hαinv are continuous. Define the mapping H : X → Y by, ∀x ∈ ∏_{α∈Λ} Xα, π_α^{Y}(H(x)) = Hα(π_α^{X}(x)), ∀α ∈ Λ.

∀y ∈ Y, define x ∈ X by π_α^{X}(x) = Hαinv(π_α^{Y}(y)), ∀α ∈ Λ. Then, H(x) = y. Hence, H is surjective.

∀x1, x2 ∈ X with x1 ≠ x2. Then, ∃α0 ∈ Λ such that π_{α0}^{X}(x1) ≠ π_{α0}^{X}(x2). Then, π_{α0}^{Y}(H(x1)) = Hα0(π_{α0}^{X}(x1)) ≠ Hα0(π_{α0}^{X}(x2)) = π_{α0}^{Y}(H(x2)), since Hα0 is injective. Hence, H(x1) ≠ H(x2). Therefore, H is injective. Therefore, H is invertible with inverse Hinv.

Next, we show that H is continuous. For any basis open set OY ∈ OY. Then, OY = ∏_{α∈Λ} OY α with OY α ∈ OY α, ∀α ∈ Λ, and OY α = Yα for all α's except finitely many α's, say α ∈ ΛN. Then, Hinv(OY) = ∏_{α∈Λ} Hαinv(OY α) ∈ OX, since Hαinv(OY α) ∈ OXα, ∀α ∈ Λ, and Hαinv(OY α) = Xα, ∀α ∈ Λ \ ΛN. Hence, H is continuous.

Finally, we show that Hinv is continuous. For any basis open set OX ∈ OX. Then, OX = ∏_{α∈Λ} OXα with OXα ∈ OXα, ∀α ∈ Λ, and OXα = Xα for all α's except finitely many α's, say α ∈ ΛN. Then, H(OX) = ∏_{α∈Λ} Hα(OXα) ∈ OY. Hence, Hinv is continuous.

Hence, H is a homeomorphism. This completes the proof of the proposition. □

Proposition 3.32 Let Λ be an index set, X be a topological space, Yα := (Yα, Oα) be a topological space, and fα : X → Yα, ∀α ∈ Λ. Let Y = (Y, O) := ∏_{α∈Λ} Yα be the product topological space and f : X → Y be given by, ∀x ∈ X, πα(f(x)) = fα(x), ∀α ∈ Λ. Then, f is continuous at x0 ∈ X if, and only if, fα is continuous at x0, ∀α ∈ Λ.

Proof "Sufficiency" ∀O ∈ O with f(x0) ∈ O. By Proposition 3.25 and Definition 3.17, ∃ a basis open set B = ∏_{α∈Λ} Bα ∈ O such that f(x0) ∈ B ⊆ O, where Bα ∈ Oα, ∀α ∈ Λ, and Bα = Yα for all α's except finitely many α's, say α ∈ ΛN. ∀α ∈ ΛN, fα(x0) = πα(f(x0)) ∈ πα(B) = Bα. By the continuity of fα at x0, ∃Uα ∈ OX with x0 ∈ Uα such that fα(Uα) ⊆ Bα. Let U := ⋂_{α∈ΛN} Uα ∈ OX. Clearly, x0 ∈ U. ∀x ∈ U, ∀α ∈ ΛN, πα(f(x)) = fα(x) ∈ fα(U) ⊆ fα(Uα) ⊆ Bα.
Hence, f (x) ∈ B and f (U ) ⊆ B ⊆ O. Then, f is continuous at x0 . “Necessity” ∀α ∈ Λ, the projection function πα is continuous, by Proposition 3.27. Note that fα = πα ◦ f . Then, fα is continuous at x0 by Proposition 3.12. This completes the proof of the proposition. ' &
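For finitely many factors, the componentwise criterion of Proposition 3.32 can be tested exhaustively on finite spaces. The sketch below is our illustration only (the helper `is_continuous` and the two-point spaces are hypothetical): a map into a product of two Sierpiński spaces is continuous exactly when both coordinate maps are.

```python
from itertools import combinations, product

def is_continuous(f, opens_dom, opens_cod):
    """Finite model of Definition 3.8: f (a dict from domain points to
    codomain points) is continuous iff every open preimage is open."""
    pre = lambda S: frozenset(x for x in f if f[x] in S)
    return all(pre(O) in opens_dom for O in opens_cod)

# Hypothetical two-point spaces, for illustration only.
OD = {frozenset(), frozenset({'a'}), frozenset({'a', 'b'})}  # domain opens
S = {frozenset(), frozenset({0}), frozenset({0, 1})}         # Sierpinski space
# Product topology on {0,1} x {0,1}: all unions of basic rectangles.
rect = [frozenset(product(O1, O2)) for O1 in S for O2 in S]
OP = {frozenset().union(*c) for r in range(len(rect) + 1)
      for c in combinations(rect, r)}

f1 = {'a': 0, 'b': 1}          # first coordinate map
f2 = {'a': 1, 'b': 1}          # second coordinate map
fpair = {x: (f1[x], f2[x]) for x in f1}
```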

3.5 The Separation Axioms

Definition 3.33 Let (X, O) be a topological space. It is said to be

T1 (Tychonoff): ∀x, y ∈ X with x ≠ y, ∃O ∈ O such that x ∈ O and y ∉ O.
T2 (Hausdorff): ∀x, y ∈ X with x ≠ y, ∃O1, O2 ∈ O such that x ∈ O1, y ∈ O2, and O1 ∩ O2 = ∅.


T3 (regular): it is Tychonoff and, ∀x ∈ X, ∀F ⊆ X with F being closed and x ∉ F, ∃O1, O2 ∈ O such that x ∈ O1, F ⊆ O2, and O1 ∩ O2 = ∅.
T4 (normal): it is Tychonoff and, ∀F1, F2 ⊆ X with F1 and F2 being closed and F1 ∩ F2 = ∅, ∃O1, O2 ∈ O such that F1 ⊆ O1, F2 ⊆ O2, and O1 ∩ O2 = ∅. %

Note that (X, O) being Tychonoff implies that, ∀x ∈ X, the singleton set {x} is closed, since ∼{x} = ⋃_{y∈X, y≠x} Oy, where Oy ∈ O with y ∈ Oy and x ∉ Oy. Then, it is clear that T4 ⇒ T3 ⇒ T2 ⇒ T1.

Proposition 3.34 A topological space (X, O) is Tychonoff if, and only if, ∀x ∈ X, the singleton set {x} is closed.

Proof "Only if" ∀x ∈ X, ∀y ∈ X with y ≠ x, we have ∃Oy ∈ O such that y ∈ Oy and x ∉ Oy. Then, ∼{x} = ⋃_{y∈∼{x}} Oy ∈ O. Hence, {x} is closed.

"If" ∀x, y ∈ X with x ≠ y, we have ∼{y} ∈ O. Then, x ∈ ∼{y} and y ∉ ∼{y}. Hence, (X, O) is Tychonoff. This completes the proof of the proposition. □

Proposition 3.35 Let (X, O) be a Tychonoff topological space. It is normal if, and only if, for all closed subsets F ⊆ X and any open subset O ∈ O with F ⊆ O, ∃U ∈ O such that F ⊆ U ⊆ Ū ⊆ O.

Proof "Necessity" Since (X, O) is normal, F and ∼O are closed, and F ∩ ∼O = ∅, then ∃O1, O2 ∈ O such that F ⊆ O1, ∼O ⊆ O2, and O1 ∩ O2 = ∅. Then, we have F ⊆ O1 ⊆ ∼O2 ⊆ O. Since ∼O2 is closed, then Ō1 ⊆ ∼O2. Therefore, we have F ⊆ O1 ⊆ Ō1 ⊆ ∼O2 ⊆ O. So, U = O1.

"Sufficiency" For any closed subsets F1, F2 ⊆ X with F1 ∩ F2 = ∅, we have F1 ⊆ ∼F2. Then, ∃U ∈ O such that F1 ⊆ U ⊆ Ū ⊆ ∼F2. Let O1 = U ∈ O and O2 = ∼Ū ∈ O. Clearly, O1 ∩ O2 = ∅, F1 ⊆ O1, and F2 ⊆ O2. Hence, (X, O) is normal. This completes the proof of the proposition. □

It is easy to show that the product topological space of Hausdorff topological spaces is Hausdorff.
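The characterization in Proposition 3.34 is easy to test on finite spaces. A minimal Python sketch (our illustration; `closed_sets` and `is_T1` are hypothetical names): the Sierpiński space fails T1 because one singleton is not closed, while the discrete two-point space satisfies it.

```python
def closed_sets(X, opens):
    """Closed sets are exactly the complements of open sets."""
    return {X - O for O in opens}

def is_T1(X, opens):
    """Proposition 3.34: (X, O) is Tychonoff iff each singleton is closed."""
    return all(frozenset({x}) in closed_sets(X, opens) for x in X)
```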

3.6 Category Theory

Definition 3.36 In a topological space (X, O), a subset D ⊆ X is said to be dense if D̄ = X. (X, O) is said to be separable if there exists a countable dense subset D ⊆ X. A subset M ⊆ X is said to be nowhere dense if ∼M̄ is dense in X. A subset F ⊆ X is said to be of first category (or meager) if F is the countable union of nowhere dense subsets of X. A subset S ⊆ X is said to be of second category (or nonmeager) if S is not meager. A subset H ⊆ X is said to be a residual set (or


comeager) if ∼H is meager. (X, O) is said to be second category everywhere if every nonempty open subset of X is of second category. %

Proposition 3.37 Let X be a topological space and Y ⊆ X be dense. Then, ∀O ∈ O, \overline{O ∩ Y} = Ō.

Proof Clearly, Ō ⊇ O ∩ Y is closed. Then, we have \overline{O ∩ Y} ⊆ Ō. ∀x ∈ Ō, ∀U ∈ O with x ∈ U, by Proposition 3.3, U ∩ O ≠ ∅. Then, (U ∩ O) ∩ Y ≠ ∅ since Ȳ = X and U ∩ O ∈ O. Hence, we have U ∩ (O ∩ Y) ≠ ∅. This implies that x ∈ \overline{O ∩ Y}, by Proposition 3.3. Hence, we have Ō ⊆ \overline{O ∩ Y}. Therefore, Ō = \overline{O ∩ Y}. This completes the proof of the proposition. □

Proposition 3.38 Let X be a topological space. Then, X is second category everywhere if, and only if, every countable intersection of open dense subsets is dense.

Proof "Only if" Let (On)_{n=1}^∞ be a sequence of open dense subsets of X. Let Fn := ∼On, ∀n ∈ N. Clearly, Fn is closed and nowhere dense, ∀n ∈ N, since ∼F̄n = ∼Fn = On and Ōn = X. Note that ∼⋃_{n=1}^∞ Fn = ⋂_{n=1}^∞ On. Now, suppose that ⋂_{n=1}^∞ On is not dense. Then, ∃O ∈ OX which is nonempty such that (⋂_{n=1}^∞ On) ∩ O = ∅. Then, O = (⋃_{n=1}^∞ Fn) ∩ O = ⋃_{n=1}^∞ (Fn ∩ O) and is of first category. This contradicts the fact that O is of second category. Therefore, ⋂_{n=1}^∞ On must be dense. The case of finite intersections can be converted to the above scenario by padding with X as additional open dense subsets.

"If" ∀O ∈ O with O ≠ ∅. Fix any countable collection (Eα)α∈Λ of nowhere dense subsets of X. ∀α ∈ Λ, ∼Ēα is open and dense. Then, ⋂_{α∈Λ} ∼Ēα is dense. Then, O ∩ (⋂_{α∈Λ} ∼Ēα) ≠ ∅, which further implies that O ⊄ ⋃_{α∈Λ} Ēα. Hence, O ⊄ ⋃_{α∈Λ} Eα. Hence, O is of second category by the arbitrariness of (Eα)α∈Λ. Hence, X is second category everywhere by the arbitrariness of O. This completes the proof of the proposition. □

Proposition 3.39 Let (X, O) be a topological space and F, O ⊆ X with O, ∼F ∈ O. Then,

(i) Ō \ O and F \ F° are nowhere dense.
(ii) If (X, O) is second category everywhere and F is of first category, then F is nowhere dense.

Proof (i) Ō \ O = Ō ∩ ∼O is closed. Note that

\overline{∼(Ō \ O)} = \overline{∼Ō ∪ O} = \overline{∼Ō} ∪ Ō ⊇ ∼Ō ∪ Ō = X,

where we have applied Proposition 3.3 in the above. Hence, Ō \ O is nowhere dense. F \ F° = F ∩ ∼(F°) is closed. Note that

\overline{∼(F \ F°)} = \overline{∼F ∪ F°} = \overline{∼F} ∪ \overline{F°} ⊇ ∼(F°) ∪ F° = X,

where we have applied Proposition 3.3 in the above. Hence, F \ F° is nowhere dense.

(ii) Let F = ⋃_{n=1}^∞ Fn, where the Fn's are nowhere dense subsets of X. Then, ∼F̄n is open and dense in (X, O), ∀n ∈ N. By Proposition 3.38, we have ⋂_{n=1}^∞ ∼F̄n is dense. Since F is closed, then F = F̄. ∀m ∈ N, note that F ⊇ ⋃_{n=1}^m Fn. Then, F = F̄ ⊇ \overline{⋃_{n=1}^m Fn} = ⋃_{n=1}^m F̄n, by Proposition 3.3. Therefore,

F = F̄ ⊇ ⋃_{n=1}^∞ F̄n ⊇ ⋃_{n=1}^∞ Fn = F.

This implies F = ⋃_{n=1}^∞ F̄n. Hence, we have ∼F = ⋂_{n=1}^∞ ∼F̄n, which is dense. Since F is closed, ∼F̄ = ∼F is dense. Hence, F is nowhere dense. This completes the proof of the proposition. □

Proposition 3.40 Let (X, O) be a topological space. Then,

(i) A closed set F ⊆ X is nowhere dense if, and only if, it contains no nonempty open subset.
(ii) A subset E ⊆ X is nowhere dense if, and only if, ∀O ∈ O with O ≠ ∅, ∃U ∈ O with U ≠ ∅ such that U ⊆ O \ Ē.

Proof (i) "Only if" Suppose ∃U ∈ O with U ≠ ∅ and U ⊆ F. Then, ∼F ∩ U = ∅. This contradicts the fact that \overline{∼F} = \overline{∼F̄} = X, by Proposition 3.3. Hence, the result holds.

"If" ∀x ∈ X, ∀U ∈ O with x ∈ U. Then, U ⊄ F implies U ∩ ∼F ≠ ∅. This implies that x is a point of closure of ∼F, by Proposition 3.3. Hence, we have \overline{∼F̄} = \overline{∼F} = X. Hence, F is nowhere dense.

(ii) "Only if" ∀O ∈ O with O ≠ ∅. Note that O \ Ē = O ∩ ∼Ē. Since E is nowhere dense, then ∼Ē is open and dense, and hence O ∩ ∼Ē is open and nonempty by the nonemptiness of O and Proposition 3.3. Then, ∃U = O ∩ ∼Ē.

"If" ∀x ∈ X, ∀O ∈ O with x ∈ O. Then, ∃U ∈ O with U ≠ ∅ such that U ⊆ O \ Ē. Note that U = U° ⊆ (O ∩ ∼Ē)° = O° ∩ (∼Ē)° = O ∩ ∼Ē, where we have made use of Proposition 3.3. Hence, O ∩ ∼Ē is nonempty. By the arbitrariness of O and Proposition 3.3, x is a point of closure of ∼Ē. By the arbitrariness of x, ∼Ē is dense, and hence E is nowhere dense. This completes the proof of the proposition. □
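Density and nowhere-density in Definition 3.36 are computable on finite models. The following Python sketch is our illustration only (`closure`, `is_dense`, and `is_nowhere_dense` are hypothetical helper names), using a small nested topology:

```python
def closure(A, X, opens):
    """Smallest closed set containing A (closed = complement of an open)."""
    return frozenset.intersection(*[X - O for O in opens if A <= X - O])

def is_dense(A, X, opens):
    return closure(A, X, opens) == X

def is_nowhere_dense(A, X, opens):
    """Definition 3.36: the complement of the closure is dense."""
    return is_dense(X - closure(A, X, opens), X, opens)
```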


Theorem 3.41 (Uniform Boundedness Principle) Let (X, O) be a topological space that is second category everywhere. Let F be a family of continuous real-valued functions on X. ∀x ∈ X, ∃Mx ∈ [0, ∞) ⊂ R such that |f(x)| ≤ Mx, ∀f ∈ F. Then, ∃U ∈ O with U ≠ ∅ and ∃M ∈ [0, ∞) ⊂ R such that |f(x)| ≤ M, ∀x ∈ U and ∀f ∈ F.

Proof Let Em,f := finv([−m, m]) ⊆ X, ∀f ∈ F and ∀m ∈ Z+. Since f is continuous and [−m, m] is a closed interval, then, by Proposition 3.10, the Em,f's are closed. ∀m ∈ Z+, let Em := ⋂_{f∈F} Em,f, which is closed. ∀x ∈ X, ∃m ∈ Z+ with m ≥ Mx such that |f(x)| ≤ m, ∀f ∈ F. Then, x ∈ Em,f, ∀f ∈ F, and hence x ∈ Em. Therefore, X = ⋃_{m=0}^∞ Em. Since X is second category everywhere, the Em's are not all nowhere dense. Then, ∃n ∈ Z+ such that En is not nowhere dense, that is, ∼Ēn = ∼En is not dense. Then, by Proposition 3.40, En contains a nonempty open subset U ∈ O. Then, U ⊆ En and U ≠ ∅. ∀x ∈ U, ∀f ∈ F, we have |f(x)| ≤ n. This completes the proof of the theorem. □

3.7 Connectedness

Definition 3.42 A topological space X is said to be connected if there do not exist nonempty open sets O1 and O2 such that X = O1 ∪ O2 and O1 ∩ O2 = ∅. Such a pair of O1 and O2 is called a separation of X if it exists. %

Proposition 3.43 Let X and Y be topological spaces and f : X → Y be a surjective continuous function. If X is connected, then Y is connected.

Proof Suppose Y is not connected. Then, ∃O1, O2 ∈ OY with O1 ≠ ∅ and O2 ≠ ∅ such that O1 ∪ O2 = Y and O1 ∩ O2 = ∅. Since f is surjective, then finv(O1) ≠ ∅ and finv(O2) ≠ ∅. Since f is continuous, then finv(O1), finv(O2) ∈ OX. By Proposition 2.5, finv(O1) ∪ finv(O2) = finv(O1 ∪ O2) = X and finv(O1) ∩ finv(O2) = finv(O1 ∩ O2) = ∅. This contradicts the assumption that X is connected. Therefore, Y is connected. This completes the proof of the proposition. □

Theorem 3.44 (Mean Value Theorem) Let X be a topological space and f : X → R be a continuous function. Assume that X is connected and ∃x, y ∈ X such that f(x) < c < f(y) for some c ∈ R. Then, ∃z ∈ X such that f(z) = c.

Proof Suppose the result is false. Then, finv({c}) = ∅. Let O1 := finv((c, ∞)) and O2 := finv((−∞, c)). Then, x ∈ O2 ∈ OX and y ∈ O1 ∈ OX, by the assumptions of the theorem. By Proposition 2.5, O1 ∩ O2 = ∅ and O1 ∪ O2 = X. This contradicts the assumption that X is connected. Hence, the result is true. This completes the proof of the theorem. □
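On the connected space [a, b] ⊂ R, Theorem 3.44 is the classical intermediate value property, and a crossing point can be located numerically by bisection. A minimal sketch (our illustration; `intermediate_value` is a hypothetical name, assuming f continuous with f(a) < c < f(b)):

```python
def intermediate_value(f, a, b, c, tol=1e-9):
    """Bisection on the connected interval [a, b]: locate z with
    f(z) = c, assuming f is continuous and f(a) < c < f(b)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < c:
            lo = mid        # the crossing lies in [mid, hi]
        else:
            hi = mid        # the crossing lies in [lo, mid]
    return (lo + hi) / 2
```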


Proposition 3.45 Let X be a topological space, U ⊆ V ⊆ Ū ⊆ X, and U be connected in the subset topology OU. Then, V is connected in the subset topology OV.

Proof Suppose V is not connected in its subset topology. Then, ∃OV1, OV2 ∈ OV with OV1 ≠ ∅ and OV2 ≠ ∅ such that OV1 ∪ OV2 = V and OV1 ∩ OV2 = ∅. By Proposition 3.4, ∃O1, O2 ∈ O such that OV1 = O1 ∩ V and OV2 = O2 ∩ V. Let x1 ∈ OV1 and x2 ∈ OV2. Then, xi ∈ Ū ∩ Oi, i = 1, 2. By Proposition 3.3, ∃x̄i ∈ U ∩ Oi =: OUi ≠ ∅, i = 1, 2. By Proposition 3.4, OU1, OU2 ∈ OU. Note that OU1 ∩ OU2 = U ∩ O1 ∩ O2 = U ∩ V ∩ O1 ∩ O2 = U ∩ (O1 ∩ V) ∩ (O2 ∩ V) = U ∩ OV1 ∩ OV2 = ∅ and OU1 ∪ OU2 = U ∩ (O1 ∪ O2) = U ∩ V ∩ (O1 ∪ O2) = U ∩ ((V ∩ O1) ∪ (V ∩ O2)) = U ∩ (OV1 ∪ OV2) = U ∩ V = U. Hence, the pair (OU1, OU2) is a separation of U. This implies that U is not connected in its subset topology. This contradicts the assumption. Hence, V is connected. □

Definition 3.46 Let X be a topological space, x0 ∈ X. Let

M := {M ⊆ X | x0 ∈ M, M is connected in the subset topology}.

The component containing x0 is defined by A := ⋃_{M∈M} M. %

Clearly, X is the union of its components.

Proposition 3.47 Let X be a topological space and x0 ∈ X. Then, the component A of X containing x0 is connected and closed.

Proof Let M be as defined in Definition 3.46. Suppose A is not connected. Let OA be the subset topology on A. Then, ∃O1, O2 ∈ OA with O1 ≠ ∅ and O2 ≠ ∅ such that O1 ∪ O2 = A and O1 ∩ O2 = ∅. Without loss of generality, assume that x0 ∈ O1. By the definition of A, ∃A0 ∈ M such that A0 ∩ O2 ≠ ∅. Note that A0 = (O1 ∩ A0) ∪ (O2 ∩ A0), and O1 ∩ A0 ∋ x0 and O2 ∩ A0 are nonempty and open in the subset topology on A0. Furthermore, (O1 ∩ A0) ∩ (O2 ∩ A0) = ∅. This shows that A0 is not connected, which contradicts the fact that A0 ∈ M. Hence, A is connected.

By Proposition 3.45, Ā is connected. Then, we have Ā ∈ M and Ā = A. By Proposition 3.3, A is closed in O. This completes the proof of the proposition. □

Proposition 3.48 Let X := (X, O) be a topological space and Aα ⊆ X be connected (in the subset topology), ∀α ∈ Λ, where Λ is an index set. Assume that Aα1 ∩ Aα2 ≠ ∅, ∀α1, α2 ∈ Λ. Then, A := ⋃_{α∈Λ} Aα is connected (in the subset topology).

Proof Suppose A is not connected. Then, ∃O1, O2 ∈ O such that O1 ∩ A ≠ ∅ ≠ O2 ∩ A, (O1 ∩ A) ∩ (O2 ∩ A) = ∅, and (O1 ∩ A) ∪ (O2 ∩ A) = (O1 ∪ O2) ∩ A = A. Let xi ∈ Oi ∩ A, i = 1, 2. Then, ∃αi ∈ Λ such that xi ∈ Aαi, i = 1, 2. By the assumption, let x0 ∈ Aα1 ∩ Aα2 ≠ ∅. Without loss of generality, assume x0 ∈ O1. Then, x0 ∈ O1 ∩ Aα2 ≠ ∅, x2 ∈ O2 ∩ Aα2 ≠ ∅, (O1 ∩ Aα2) ∩ (O2 ∩ Aα2) ⊆


(O1 ∩ A) ∩ (O2 ∩ A) = ∅, and (O1 ∩ Aα2) ∪ (O2 ∩ Aα2) = (O1 ∪ O2) ∩ Aα2 = Aα2. Hence, O1 ∩ Aα2 and O2 ∩ Aα2 form a separation of Aα2. This implies that Aα2 is not connected. This is a contradiction. Therefore, A is connected. This completes the proof of the proposition. □

Definition 3.49 Let X be a topological space. It is said to be locally connected if there exists a basis B such that B is connected (in the subset topology), ∀B ∈ B. %

Proposition 3.50 Any component of a locally connected topological space is open.

Proof Let X be a locally connected topological space and B be a basis made up of connected sets. ∀x0 ∈ X, let A be the component containing x0. By Proposition 3.47, A is closed and connected. ∀x ∈ A, ∃B ∈ B such that x ∈ B. Since B is connected and B ∩ A ∋ x, then, by Proposition 3.48 and Definition 3.46, B ⊆ A. Hence, A is open. This completes the proof of the proposition. □

Proposition 3.51 Let (Xα, Oα) be a connected topological space, ∀α ∈ Γ, where Γ is an index set. Let (X, O) be the product topological space ∏_{α∈Γ}(Xα, Oα). Then, (X, O) is connected.

Proof Suppose that X is not connected. Then, ∃O1, O2 ∈ O with O1 ≠ ∅ and O2 ≠ ∅ such that O1 ∪ O2 = X and O1 ∩ O2 = ∅. Let B be the basis defined in Proposition 3.25. Then, there exist nonempty disjoint index sets Λ1 and Λ2 such that O1 = ⋃_{λ∈Λ1} Bλ and O2 = ⋃_{λ∈Λ2} Cλ, where Bλ ∈ B, ∀λ ∈ Λ1, and Cλ ∈ B, ∀λ ∈ Λ2. ∀λ ∈ Λ1, Bλ = ∏_{α∈Γ} Bλα, where Bλα ∈ Oα, ∀α ∈ Γ, and Bλα = Xα for all except finitely many α's, say α ∈ Γλ. Note that Γλ ≠ ∅, since Bλ ⊆ O1 ⊂ X. ∀λ ∈ Λ2, Cλ = ∏_{α∈Γ} Cλα, where Cλα ∈ Oα, ∀α ∈ Γ, and Cλα = Xα for all except finitely many α's, say α ∈ Γλ. Note that Γλ ≠ ∅, since Cλ ⊆ O2 ⊂ X. Note that

∅ = O1 ∩ O2 = ⋃_{λ∈Λ1} ⋃_{γ∈Λ2} (Bλ ∩ Cγ).

Therefore, Bλ ∩ Cγ = ∅, ∀λ ∈ Λ1, ∀γ ∈ Λ2, which implies that ∃αλγ ∈ Γ · Bλαλγ ∩ Cγαλγ = ∅. Fix x1 ∈ O1 and x2 ∈ O2. Then, x1 ≠ x2. ∃λ1 ∈ Λ1 such that x1 ∈ Bλ1. Let Γ1 := Γλ1. ∃λ2 ∈ Λ2 such that x2 ∈ Cλ2. Let Γ2 := Γλ2. Note that ∀x ∈ X with πα(x) = πα(x1), ∀α ∈ Γ1, we have x ∈ Bλ1 ⊆ O1. Similarly, ∀x ∈ X with πα(x) = πα(x2), ∀α ∈ Γ2, we have x ∈ Cλ2 ⊆ O2. Therefore, starting with x1 ∈ O1 and switching, one by one, its coordinates πα(x1) to πα(x2), for all α ∈ Γ2, we will end up with a point x3 ∈ O2. Therefore, there must exist a step in this process such that switching one coordinate πα0(x1) to πα0(x2), for some α0 ∈ Γ2, leads to the change of set membership from x̄1 ∈ O1 before the switch to x̄2 ∈ O2 after the switch. In summary, there exist x̄1 ∈ O1, x̄2 ∈ O2, and α0 ∈ Γ such that πα(x̄1) = πα(x̄2), ∀α ∈ Γ \ {α0}. Since O1 ∩ O2 = ∅, we must have πα0(x̄1) ≠ πα0(x̄2).


Define

Λ1⁰ := {λ ∈ Λ1 | Bλ = ∏_{α∈Γ} Bλα, πα(x̄1) ∈ Bλα, ∀α ∈ Γ \ {α0}}

and

Λ2⁰ := {λ ∈ Λ2 | Cλ = ∏_{α∈Γ} Cλα, πα(x̄2) ∈ Cλα, ∀α ∈ Γ \ {α0}}.

Let M := ∏_{α∈Γ} Mα ⊆ X, where Mα = {πα(x̄1)}, ∀α ∈ Γ \ {α0}, and Mα0 = Xα0. Since every point of M agrees with x̄1 (and hence with x̄2) at every coordinate α ∈ Γ \ {α0}, the inclusion M ⊆ X = O1 ∪ O2 = (⋃_{λ∈Λ1} Bλ) ∪ (⋃_{γ∈Λ2} Cγ) implies that

M ⊆ (⋃_{λ∈Λ1⁰} Bλ) ∪ (⋃_{γ∈Λ2⁰} Cγ) = ⋃_{λ∈Λ1⁰} ⋃_{γ∈Λ2⁰} (Bλ ∪ Cγ).

Therefore, we have Xα0 = πα0(M) ⊆ ⋃_{λ∈Λ1⁰} ⋃_{γ∈Λ2⁰} (πα0(Bλ) ∪ πα0(Cγ)) = ⋃_{λ∈Λ1⁰} ⋃_{γ∈Λ2⁰} (Bλα0 ∪ Cγα0) ⊆ Xα0, by Proposition 2.5. Then,

Xα0 = ⋃_{λ∈Λ1⁰} ⋃_{γ∈Λ2⁰} (Bλα0 ∪ Cγα0) = (⋃_{λ∈Λ1⁰} Bλα0) ∪ (⋃_{γ∈Λ2⁰} Cγα0) =: D1 ∪ D2.

By the derivations in the first paragraph of this proof, we have, ∀λ ∈ Λ1⁰, ∀γ ∈ Λ2⁰, Bλ ∩ Cγ = ∏_{α∈Γ} (Bλα ∩ Cγα) = ∅. Note that, ∀α ∈ Γ \ {α0}, πα(x̄1) = πα(x̄2) ∈ Bλα ∩ Cγα ≠ ∅. Then, we must have Bλα0 ∩ Cγα0 = ∅. Therefore, we have

D1 ∩ D2 = ⋃_{λ∈Λ1⁰} ⋃_{γ∈Λ2⁰} (Bλα0 ∩ Cγα0) = ∅.

Clearly, D1, D2 ∈ Oα0, πα0(x̄1) ∈ D1 ≠ ∅, and πα0(x̄2) ∈ D2 ≠ ∅, since x̄1 ∈ O1 and x̄2 ∈ O2. This shows that D1 and D2 form a separation of Xα0. This contradicts the assumption that (Xα, Oα) is connected, ∀α ∈ Γ. Therefore, (X, O) is connected. This completes the proof of the proposition. □

Definition 3.52 Let X be a topological space and I := [0, 1] ⊂ R be endowed with the subset topology of R. A curve in X is a continuous mapping γ : I → X. γ(0) is called the beginning point and γ(1) is called the end point. A closed curve is such that γ(0) = γ(1). Two closed curves γ1 and γ2 are said to be homotopic to each other if there exists a continuous function φ : I × I → X such that φ(t, 0) = γ1(t), φ(t, 1) = γ2(t), and φ(0, t) = φ(1, t), ∀t ∈ I. %


Definition 3.53 Let X be a topological space. It is said to be arcwise connected if ∀x1, x2 ∈ X, there exists a curve γ in X such that γ(0) = x1 and γ(1) = x2. X is said to be simply connected if it is arcwise connected and any closed curve is homotopic to a single point (that is, a degenerate curve γ with γ(t) = x ∈ X, ∀t ∈ [0, 1] ⊂ R). %

Proposition 3.54 Let (X, O) be a topological space. Then, it is connected if it is arcwise connected.

Proof Suppose (X, O) is not connected. Then, ∃O1, O2 ∈ O with O1 ≠ ∅ and O2 ≠ ∅ such that O1 ∪ O2 = X and O1 ∩ O2 = ∅. Fix x1 ∈ O1 and x2 ∈ O2. Then, x1 ≠ x2. Since (X, O) is arcwise connected, then there exists a curve γ such that γ(0) = x1 and γ(1) = x2. Define S := {s ∈ [0, 1] ⊂ R | γ([0, s]) ⊆ O1} and t := sup S. Then, t ∈ [0, 1] ⊂ R.

Claim 3.54.1 γ(t) ∈ O1.

Proof of Claim Suppose γ(t) ∉ O1; then γ(t) ∈ O2. Then, t > 0, since γ(0) = x1 ∈ O1. Since γ is continuous, then ∃t1 ∈ [0, t) ⊂ R such that γ((t1, t]) ⊆ O2. ∀s ∈ (t1, 1] ⊂ R, γ([0, s]) ∩ O2 ≠ ∅. Then, s ∉ S. Hence, t = sup S ≤ t1. This contradicts t1 < t. Hence, γ(t) ∈ O1. This completes the proof of the claim. □

Since γ(1) = x2 ∈ O2, then t < 1. By the continuity of γ, ∃t2 ∈ (t, 1] ⊂ R such that γ([t, t2)) ⊆ O1. ∀s ∈ [0, t), ∃s1 ∈ (s, t] such that s1 ∈ S. Then, γ([0, s1]) ⊆ O1 and hence γ(s) ∈ O1. Therefore, γ([0, t)) ⊆ O1. This, coupled with γ([t, t2)) ⊆ O1, implies γ([0, t2)) ⊆ O1. Then, [0, t2) ⊆ S and hence t = sup S ≥ t2. This contradicts t < t2. Therefore, (X, O) is connected. This completes the proof of the proposition. □
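Definitions 3.42 and 3.46 can be realized by brute force on finite spaces: a set is connected iff its subspace topology admits no separation, and the component of a point is the union of all connected sets containing it. A minimal Python sketch (our illustration; `is_connected` and `component` are hypothetical names):

```python
from itertools import chain, combinations

def is_connected(A, opens):
    """A is connected in the subspace topology (Definition 3.42)."""
    A = frozenset(A)
    rel = {frozenset(O & A) for O in opens}     # subspace opens
    return not any(U and V and not (U & V) and (U | V) == A
                   for U in rel for V in rel)

def component(x0, X, opens):
    """Definition 3.46: union of all connected sets containing x0."""
    conn = [frozenset(M) for r in range(1, len(X) + 1)
            for M in combinations(X, r)
            if x0 in M and is_connected(M, opens)]
    return frozenset(chain.from_iterable(conn))
```

With the topology generated by the two "pieces" {1,2} and {3,4} on {1,2,3,4}, the whole space is disconnected and the component of the point 1 is {1,2}.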

3.8 Continuous Real-Valued Functions

Theorem 3.55 (Urysohn's Lemma) Let (X, O) be a normal topological space, A, B ⊆ X be closed subsets, and A ∩ B = ∅. Then, there exists a continuous real-valued function f : X → [0, 1] ⊂ R such that f(x) = 0, ∀x ∈ A, and f(x) = 1, ∀x ∈ B.

Proof Since the set Q := ℚ ∩ [0, 1] is countable, then, by recursively applying Proposition 3.35, we may find (Or)r∈Q ⊆ O such that the following two properties are satisfied:

1. ∀r ∈ Q, A ⊆ Or ⊆ Ōr ⊆ ∼B.
2. ∀r, s ∈ Q with r < s, Ōr ⊆ Os.

Define the real-valued function f : X → R by

f(x) = inf({r ∈ Q | x ∈ Or} ∪ {1}).


Clearly, f : X → [0, 1], f(x) = 0, ∀x ∈ O0, and f(x) = 1, ∀x ∈ ∼O1. By 1, we have A ⊆ O0 and Ō1 ⊆ ∼B. Hence, all we need to show is that f is continuous.

∀x0 ∈ X, we will show that f is continuous at x0. Let a0 = f(x0) ∈ [0, 1]. ∀U ⊆ R with U being open and a0 ∈ U, ∃a1, a2, a3, a4 ∈ ℚ such that a1 < a2 < a0 < a3 < a4 and (a1, a4) ⊆ U. Let ā2 = max{a2, 0} and ā3 = min{a3, 1}. Then, we must have a1 < ā2 ≤ a0 ≤ ā3 < a4 and ā2, ā3 ∈ Q. We will distinguish three exhaustive and mutually exclusive cases: Case 1: a0 ∈ (0, 1), Case 2: a0 = 0, and Case 3: a0 = 1.

Case 1: a0 ∈ (0, 1). Then, we must have a1 < ā2 < a0 < ā3 < a4. Let V = Oā3 ∩ ∼Ōā2 ∈ O. ∀x ∈ V, we have x ∈ Oā3 and f(x) ≤ ā3. Also, x ∈ ∼Ōā2 implies that f(x) ≥ ā2. Hence, f(V) ⊆ [ā2, ā3] ⊂ (a1, a4) ⊆ U. f(x0) = a0 < ā3 implies that x0 ∈ Oā3. f(x0) = a0 > ā2 implies that ∃â2 ∈ (ā2, a0) ∩ Q such that x0 ∉ Oâ2 ⊇ Ōā2. Therefore, x0 ∈ V. This shows that ∃V ∈ O with x0 ∈ V such that f(V) ⊆ U.

Case 2: a0 = 0. Then, we must have a1 < 0 = a0 < ā3 < a4. Take V = Oā3 ∈ O. We must have x0 ∈ V. ∀x ∈ V, 0 ≤ f(x) ≤ ā3. Hence, f(V) ⊆ [0, ā3] ⊂ (a1, a4) ⊆ U. Hence, ∃V ∈ O with x0 ∈ V such that f(V) ⊆ U.

Case 3: a0 = 1. Then, we must have a1 < ā2 < a0 = 1 < a4. Take V = ∼Ōā2 ∈ O. Since f(x0) = a0 = 1, then x0 ∈ ∼O_{(1+ā2)/2} ⊆ ∼Ōā2 = V. ∀x ∈ V, f(x) ≥ ā2. Hence, f(V) ⊆ [ā2, 1] ⊂ (a1, a4) ⊆ U. Hence, ∃V ∈ O with x0 ∈ V such that f(V) ⊆ U.

Therefore, in all cases, ∃V ∈ O with x0 ∈ V such that f(V) ⊆ U. Hence, f is continuous at x0. By the arbitrariness of x0 and Proposition 3.9, f is continuous. This completes the proof of the theorem. □

Proposition 3.56 Let X and Y be topological spaces with Y Hausdorff, and let f1 : X → Y and f2 : X → Y be continuous. Let D ⊆ X be dense. Assume that f1|D = f2|D. Then, f1 = f2.

Proof Suppose f1 ≠ f2. Then, ∃x ∈ X such that f1(x) ≠ f2(x). Since Y is Hausdorff, then ∃O1, O2 ∈ OY such that f1(x) ∈ O1, f2(x) ∈ O2, and O1 ∩ O2 = ∅. Since f1 and f2 are continuous, we have U1 := f1inv(O1) ∈ OX and U2 := f2inv(O2) ∈ OX. Note that x ∈ U1 ∩ U2 ∈ OX and x ∈ D̄; then, by Proposition 3.3, ∃x̄ ∈ D ∩ U1 ∩ U2. Then, f1(x̄) ∈ O1 and f2(x̄) ∈ O2, which implies that f1|D(x̄) ≠ f2|D(x̄). This is a contradiction. Hence, we must have f1 = f2. This completes the proof of the proposition. □

Theorem 3.57 (Tietze's Extension Theorem) Let (X, O) be a normal topological space, A ⊆ X be closed, and h : A → R. Let A be endowed with the subset topology OA. Assume that h is continuous. Then, there exists a continuous function k : X → R such that k|A = h.

Proof Let f := h/(1 + |h|). Then, |f(x)| < 1, ∀x ∈ A, and, by Proposition 3.12, f is continuous.


Claim 3.57.1 Let l : A → R be a continuous function such that |l(x)| ≤ c1 ∈ R, ∀x ∈ A, where c1 > 0. Then, there exists a continuous function g : X → R such that |g(x)| ≤ c1/3, ∀x ∈ X, and |l(x) − g(x)| ≤ 2c1/3, ∀x ∈ A.

Proof of Claim Let B := {x ∈ A | l(x) ≤ −c1/3} and C := {x ∈ A | l(x) ≥ c1/3}. Then, B and C are closed sets in OA, by the continuity of l and Proposition 3.10. Since A is closed, then B and C are closed in O, by Proposition 3.5. Clearly, B ∩ C = ∅. By Urysohn's Lemma, there exists a continuous function g : X → R such that |g(x)| ≤ c1/3, ∀x ∈ X, g(x) = −c1/3, ∀x ∈ B, and g(x) = c1/3, ∀x ∈ C. Hence, |l(x) − g(x)| ≤ 2c1/3, ∀x ∈ A. This completes the proof of the claim. □

By repeated application of Claim 3.57.1, we may define fi : X → R, ∀i ∈ N, such that fi is continuous, |fi(x)| ≤ 2^{i−1}/3^i, ∀x ∈ X, and |f(x) − Σ_{k=1}^{i} fk(x)| ≤ 2^i/3^i, ∀x ∈ A. Define g : X → R by g(x) = lim_{i∈N} Σ_{k=1}^{i} fk(x), ∀x ∈ X. Clearly, g is well-defined, g|A = f, and |g(x)| ≤ Σ_{i=1}^{∞} 2^{i−1}/3^i = 1, ∀x ∈ X. ∀x0 ∈ X. ∀ε ∈ (0, ∞) ⊂ R. ∃N ∈ N such that Σ_{i=N+1}^{∞} |fi(x)| < ε/3, ∀x ∈ X. By the continuity of f1, . . . , fN and Proposition 3.9, ∃U ∈ O with x0 ∈ U such that |Σ_{i=1}^{N} fi(x) − Σ_{i=1}^{N} fi(x0)| < ε/3, ∀x ∈ U. Then, we have, ∀x ∈ U,

|g(x) − g(x0)| ≤ |g(x) − Σ_{i=1}^{N} fi(x)| + |Σ_{i=1}^{N} fi(x) − Σ_{i=1}^{N} fi(x0)| + |Σ_{i=1}^{N} fi(x0) − g(x0)| < ε
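The geometric-series bookkeeping above can be checked numerically. The sketch below is an illustration only (not part of the proof); it verifies that the weights 2^{i−1}/3^i sum to 1 and that the error bound (2/3)^i on A vanishes.

```python
# Sanity check of the bounds in the Tietze construction:
# sum_{i>=1} 2^(i-1)/3^i = 1 and the residual bound (2/3)^i -> 0.

def weight(i):
    # bound on |f_i(x)| over X
    return 2 ** (i - 1) / 3 ** i

def residual(i):
    # bound on |f(x) - sum_{k<=i} f_k(x)| over A
    return (2.0 / 3.0) ** i

total = sum(weight(i) for i in range(1, 200))
print(abs(total - 1.0) < 1e-12, residual(50) < 1e-8)
```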

Therefore, g is continuous at x0. Then, g is continuous, by the arbitrariness of x0 and Proposition 3.9. Let D := {x ∈ X | |g(x)| = 1}. Clearly, D is a closed set, by Proposition 3.10. Note that A ∩ D = ∅, since g|A = f and |f(x)| < 1, ∀x ∈ A. Then, by Urysohn's Lemma, there exists a continuous function ḡ : X → [0, 1] such that ḡ|A = 1 and ḡ|D = 0. Define k : X → R by k(x) = g(x)/(1 − ḡ(x)|g(x)|), ∀x ∈ X. By Propositions 3.12 and 3.32 and the fact that 1 − ḡ(x)|g(x)| ≠ 0, ∀x ∈ X, we have

= 0, ∀x ∈ X, we have g(x) k is continuous. ∀x ∈ A, k(x) = = h(x). Hence, k|A = h. 1 − |g(x)| This completes the proof of the theorem. ' & Definition 3.58 Let X be a set and F be a collection of real-valued functions of X. Then, there is the weakest topology on X such that all functions in F are continuous. This topology is called the weak topology generated by F . % Let X be a set, I := [0, 1] ⊂ R, and F be a collection of functions of X to I such that ∀x, y ∈ X with x = y, ∃f ∈ F , we have f (x) = f (y). Each f ∈ F

3.8 Continuous Real-Valued Functions


is a point in I^X and F can be identified with a subset of I^X. The topology that F inherits as a subspace of I^X is called the topology of pointwise convergence. Now, X can be identified with a subset of I^F by, ∀x ∈ X, πf(x) = f(x), ∀f ∈ F. Then, the topology of X as a subset of I^F is the weak topology generated by F.

Proposition 3.59 Let X be a topological space, I := [0, 1] ⊂ R, and F be a collection of continuous functions of X to I such that ∀x, y ∈ X with x ≠ y, ∃f ∈ F with f(x) ≠ f(y). Let E : X → I^F be the equivalence map given by, ∀x ∈ X, πf(E(x)) = f(x), ∀f ∈ F. Then, E is continuous. Furthermore, if ∀ closed set F ⊆ X and ∀x ∈ X with x ∉ F, ∃f ∈ F with f(x) = 1 and f|F = 0, then E : X → E(X) is a homeomorphism.

Proof ∀x0 ∈ X. Fix a basis open set O in I^F with E(x0) ∈ O. By Proposition 3.25, O = ∏_{f∈F} Of, where Of ∈ OI with OI being the subset topology on I, ∀f ∈ F, and Of = I for all f's except finitely many f's, say f ∈ FN. Let U := ∩_{f∈FN} finv(Of) ∈ OX. By E(x0) ∈ O, we have x0 ∈ U. ∀x ∈ U, we have πf(E(x)) ∈ Of = I, ∀f ∈ F \ FN, and πf(E(x)) = f(x) ∈ Of, ∀f ∈ FN. Hence, E(x) ∈ O. Then, E(U) ⊆ O. Therefore, E is continuous at x0. By the arbitrariness of x0 and Proposition 3.9, E is continuous.

Under the additional assumption on F, we need to show that E is a homeomorphism between X and E(X). ∀x, y ∈ X with x ≠ y, ∃f ∈ F such that πf(E(x)) = f(x) ≠ f(y) = πf(E(y)). Then, E(x) ≠ E(y). Hence, E : X → E(X) is injective. Clearly, E : X → E(X) is surjective. Then, E : X → E(X) is bijective and admits inverse Einv : E(X) → X. ∀x0 ∈ X, we will show that Einv is continuous at E(x0). ∀O ∈ OX with x0 ∈ O. Õ := X \ O is closed and x0 ∉ Õ. Then, ∃f0 ∈ F such that f0(x0) = 1 and f0|Õ = 0. Define U := ∏_{f∈F} Uf ⊆ I^F by Uf = I, ∀f ∈ F \ {f0}, and Uf0 = (1/2, 1] ∈ OI. Clearly, U is open in I^F. Clearly, E(x0) ∈ U. ∀x ∈ X with E(x) ∈ U, we have πf0(E(x)) = f0(x) > 1/2. Then, x ∉ Õ and x ∈ O. This shows that Einv(E(X) ∩ U) ⊆ O. Hence, Einv is continuous at E(x0). By the arbitrariness of x0 and Proposition 3.9, Einv : E(X) → X is continuous. This implies that E : X → E(X) is a homeomorphism. This completes the proof of the proposition. □

Definition 3.60 A topological space X is said to be completely regular (or T3½) if it is Tychonoff and ∀x0 ∈ X and for all closed set F ⊆ X with x0 ∉ F, there exists a continuous real-valued function f : X → [0, 1] such that f(x0) = 1 and f|F = 0. %

Proposition 3.61 A normal topological space is completely regular. A completely regular topological space is regular.

Proof Let X be a normal topological space. Then, X is Tychonoff. ∀x0 ∈ X and for all closed set F ⊆ X with x0 ∉ F, we have that {x0} is closed, by Proposition 3.34. By Urysohn's Lemma, there exists a continuous real-valued function f : X → [0, 1] such that f(x0) = 1 and f|F = 0. Hence, X is completely regular.


Let X be a completely regular topological space. Then, X is Tychonoff. ∀x0 ∈ X and for all closed set F ⊆ X with x0 ∉ F, there exists a continuous real-valued function f : X → [0, 1] such that f(x0) = 1 and f|F = 0. Let O1 := {x ∈ X | f(x) > 1/2} and O2 := {x ∈ X | f(x) < 1/2}. Then, O1, O2 ∈ O by the continuity of f. Clearly, x0 ∈ O1, F ⊆ O2, and O1 ∩ O2 = ∅. Hence, X is regular. This completes the proof of the proposition. □

Corollary 3.62 Let X be a completely regular topological space, I := [0, 1] ⊂ R, and F := {f : X → I | f is continuous}. Then, the equivalence map E : X → I^F defined by πf(E(x)) = f(x), ∀x ∈ X, ∀f ∈ F, is a homeomorphism between X and E(X) ⊆ I^F.

Proof Since X is completely regular, then X is Tychonoff and every singleton subset of X is closed. Then, it is easy to check that all assumptions in Proposition 3.59 are satisfied. Then, the result follows. This completes the proof of the corollary. □

3.9 Nets and Convergence

Definition 3.63 A directed system is a nonempty set A together with a relation ≺ on A such that

(i) ≺ is transitive.
(ii) ∀α, β ∈ A, ∃γ ∈ A such that α ≺ γ and β ≺ γ.

A net is a mapping of a directed system A := (A, ≺) to a topological space X. ∀α ∈ A, the image is xα. The net is denoted by (xα)α∈A, where we have abused the notation to say α ∈ A when α ∈ A. It is understood that the relation for A is ≺A, where we will ignore the subscript A if no confusion arises. A point x ∈ X is a limit of the net (xα)α∈A if ∀O ∈ O with x ∈ O, ∃α0 ∈ A such that, ∀α ∈ A with α0 ≺ α, we have xα ∈ O. We also say that (xα)α∈A converges to x. A point x ∈ X is a cluster point of (xα)α∈A if ∀O ∈ O with x ∈ O and ∀α ∈ A, ∃β ∈ A with α ≺ β such that xβ ∈ O. %

Clearly, a limit point of a net is a cluster point of the net. In Definition 3.63, we may restrict O to be a basis open set without changing the meaning of the definition.

Example 3.64 (N, ≤) is a directed system. A net over (N, ≤) corresponds to a sequence. %

Proposition 3.65 Let X be a topological space. Then, the following statements hold:

(i) X is Hausdorff if, and only if, every net (xα)α∈A ⊆ X has at most one limit point. We then write x = limα∈A xα when the limit exists.


(ii) If X is Hausdorff, any convergent net (xα)α∈A ⊆ X with limit x ∈ X has exactly one cluster point, which is x.

Proof (i) "Only if" Suppose there exists a net (xα)α∈A ⊆ X such that ∃xA, xB ∈ X with xA ≠ xB and xA and xB are limit points of the net. Since X is Hausdorff, then ∃O1, O2 ∈ O such that xA ∈ O1, xB ∈ O2, and O1 ∩ O2 = ∅. Since xA is a limit of the net, then ∃α1 ∈ A, ∀α ∈ A with α1 ≺ α, we have xα ∈ O1. Similarly, since xB is a limit of the net, then ∃α2 ∈ A, ∀α ∈ A with α2 ≺ α, we have xα ∈ O2. Since A is a directed system, ∃α3 ∈ A such that α1 ≺ α3 and α2 ≺ α3. Then, we have xα3 ∈ O1 and xα3 ∈ O2, which implies that O1 ∩ O2 ≠ ∅. This is a contradiction. Therefore, every net in X has at most one limit point.

"If" Suppose X is not Hausdorff. Then, ∃xA, xB ∈ X with xA ≠ xB such that ∀OA, OB ∈ O with xA ∈ OA and xB ∈ OB, we have OA ∩ OB ≠ ∅. Let Λ := {(OA, OB) | xA ∈ OA ∈ O, xB ∈ OB ∈ O}. Clearly, (X, X) ∈ Λ, then Λ ≠ ∅. Define a relation ≺ on Λ by, ∀(OA1, OB1), (OA2, OB2) ∈ Λ, we say (OA1, OB1) ≺ (OA2, OB2) if OA1 ⊇ OA2 and OB1 ⊇ OB2. Clearly, ≺ is transitive on Λ. ∀(OA1, OB1), (OA2, OB2) ∈ Λ, we have xA ∈ OA3 := OA1 ∩ OA2 ∈ O and xB ∈ OB3 := OB1 ∩ OB2 ∈ O. Then, we have (OA3, OB3) ∈ Λ, (OA1, OB1) ≺ (OA3, OB3), and (OA2, OB2) ≺ (OA3, OB3). Hence, A := (Λ, ≺) is a directed system. ∀(OA, OB) ∈ Λ, OA ∩ OB ≠ ∅. By Axiom of Choice, we may pick x(OA,OB) ∈ OA ∩ OB, ∀(OA, OB) ∈ Λ. This yields the net (x(OA,OB))(OA,OB)∈A ⊆ X. ∀OA1 ∈ O with xA ∈ OA1. Fix OB1 := X ∈ O with xB ∈ OB1. Then, (OA1, OB1) ∈ Λ. ∀(OA2, OB2) ∈ Λ with (OA1, OB1) ≺ (OA2, OB2), we have x(OA2,OB2) ∈ OA2 ∩ OB2 ⊆ OA1 ∩ OB1 = OA1. Hence, xA is a limit point of (x(OA,OB))(OA,OB)∈A. ∀OB1 ∈ O with xB ∈ OB1. Fix OA1 := X ∈ O with xA ∈ OA1. Then, (OA1, OB1) ∈ Λ. ∀(OA2, OB2) ∈ Λ with (OA1, OB1) ≺ (OA2, OB2), we have x(OA2,OB2) ∈ OA2 ∩ OB2 ⊆ OA1 ∩ OB1 = OB1. Hence, xB is a limit point of (x(OA,OB))(OA,OB)∈A. This contradicts the assumption that every net has at most one limit point. Therefore, X is Hausdorff.

(ii) Let X be Hausdorff and the net (xα)α∈A ⊆ X satisfy limα∈A xα = x ∈ X. Clearly, x is a cluster point of the net (xα)α∈A. Let y ∈ X with y ≠ x be another cluster point of the net (xα)α∈A. Then, there exist O1, O2 ∈ O with x ∈ O1, y ∈ O2, and O1 ∩ O2 = ∅. Since x is the limit of the net, then ∃α1 ∈ A such that ∀α ∈ A with α1 ≺ α, we have xα ∈ O1. Then, xα ∉ O2, ∀α ∈ A with α1 ≺ α. This contradicts the definition of y as a cluster point of (xα)α∈A. Hence, the net has exactly one cluster point, which is x. This completes the proof of the proposition. □
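Beyond sequences (Example 3.64), a standard directed system is the collection of finite subsets of N ordered by inclusion. The sketch below is an assumed illustration (not from the text): condition (ii) of Definition 3.63 holds because two finite sets F and G are both contained in F ∪ G, and the net F ↦ Σ_{n∈F} 2^{−(n+1)} converges in the net sense to 1.

```python
# A net over the directed system of finite subsets of N, ordered by inclusion.
# Past the "index" alpha0 = {0,...,9}, every value of the net lies within
# 2^(-10) of the limit 1, which is exactly Definition 3.63's notion of limit.

from itertools import combinations

def net_value(F):
    # the net: a finite subset F of N maps to sum over F of 2^(-n-1)
    return sum(2.0 ** (-(n + 1)) for n in F)

alpha0 = set(range(10))
universe = range(14)

# check the defining property for all supersets alpha0 | extra that we can form
ok = all(
    abs(net_value(alpha0 | set(extra)) - 1.0) <= 2.0 ** (-10)
    for r in range(3)
    for extra in combinations(universe, r)
)
print(ok)
```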


Proof (i) ⇒ (ii). Fix a net (xα)α∈A ⊆ D with x0 ∈ D as a limit point. ∀OY ∈ OY with f(x0) ∈ OY. By the continuity of f at x0, ∃OX ∈ OX with x0 ∈ OX such that f(OX) ⊆ OY. Since x0 is a limit point of (xα)α∈A, then ∃α0 ∈ A such that, ∀α ∈ A with α0 ≺ α, we have xα ∈ OX. Then, f(xα) ∈ OY. Hence, f(x0) is a limit point of (f(xα))α∈A.

(ii) ⇒ (i). Suppose f is not continuous at x0 ∈ D. Then, ∃OY0 ∈ OY with f(x0) ∈ OY0 such that, ∀OX ∈ OX with x0 ∈ OX, we have f(OX) ⊈ OY0. Let M := {O ∈ OX | x0 ∈ O}. Clearly, X ∈ M and M ≠ ∅. Define a relation ≺ on M by, ∀O1, O2 ∈ M, we say O1 ≺ O2 if O1 ⊇ O2. Clearly, ≺ is transitive on M. ∀O1, O2 ∈ M, we have x0 ∈ O3 := O1 ∩ O2 ∈ OX. Then, O3 ∈ M, O1 ≺ O3, and O2 ≺ O3. Hence, A := (M, ≺) is a directed system. ∀O ∈ M, f(O) \ OY0 ≠ ∅. By Axiom of Choice, we may define a net (xO)O∈A by xO ∈ O ∩ D with f(xO) ∉ OY0. Clearly, x0 is a limit point of (xO)O∈A. Yet, f(x0) ∈ OY0 ∈ OY and f(xO) ∉ OY0, ∀O ∈ M. Then, f(x0) is not a limit point of the net (f(xO))O∈A. This contradicts the assumption. Therefore, f is continuous at x0.

(i) ⇒ (iii). Fix a net (xα)α∈A ⊆ D with x0 as a cluster point. ∀OY ∈ OY with f(x0) ∈ OY, by the continuity of f at x0, ∃U ∈ OX with x0 ∈ U such that f(U) ⊆ OY. By Definition 3.63, ∀α ∈ A, ∃α0 ∈ A with α ≺ α0 such that xα0 ∈ U. Then, f(xα0) ∈ OY. Hence, f(x0) is a cluster point of the net (f(xα))α∈A.

(iii) ⇒ (i). Suppose f is not continuous at x0. Let M := {O ∈ OX | x0 ∈ O}. Clearly, A := (M, ⊇) is a directed system. ∃OY0 ∈ OY with f(x0) ∈ OY0 such that ∀U ∈ M, we have f(U) ⊈ OY0. By Axiom of Choice, we may assign to each U ∈ M an xU ∈ U ∩ D such that f(xU) ∈ ÕY0 := Y \ OY0. Consider the net (xU)U∈A ⊆ D. Clearly, x0 is a limit point of the net and therefore is a cluster point of the net. Consider the net (f(xU))U∈A. For the open set OY0 ∋ f(x0), ∀U ∈ A, f(xU) ∈ ÕY0. Then, f(x0) is not a cluster point of (f(xU))U∈A. This contradicts the assumption. Therefore, f must be continuous at x0. This completes the proof of the proposition. □


a limit point of the net (πα0(xβ))β∈A. This contradicts the assumption. Hence, x0 is a limit point of (xβ)β∈A. This completes the proof of the proposition. □

Proposition 3.68 Let (X, O) be a topological space, E ⊆ X, and x ∈ X. x ∈ Ē if, and only if, ∃ a net (xα)α∈A ⊆ E such that x is a limit point of the net.

Proof "Only if" Let M := {O ∈ O | x ∈ O}. Clearly, X ∈ M, then M ≠ ∅. Clearly, A := (M, ⊇) is a directed system. Since x ∈ Ē, then, by Proposition 3.3, ∀O ∈ A, O ∩ E ≠ ∅. By Axiom of Choice, ∃ a net (xO)O∈A ⊆ E such that xO ∈ O ∩ E, ∀O ∈ A. ∀O ∈ O with x ∈ O, we have O ∈ A. ∀O1 ∈ A with O ⊇ O1, we have xO1 ∈ O1 ∩ E ⊆ O. Hence, x is a limit point of (xO)O∈A.

"If" Let (xα)α∈A ⊆ E be the net such that x is a limit point of the net. ∀O ∈ O with x ∈ O, ∃α0 ∈ A, ∀α ∈ A with α0 ≺ α, we have xα ∈ E ∩ O. Since (xα)α∈A is a net, then ∃α1 ∈ A with α0 ≺ α1. Then, xα1 ∈ O ∩ E ≠ ∅. By Proposition 3.3, x ∈ Ē. This completes the proof of the proposition. □

Definition 3.69 Let (X, O) be a topological space, A := (A, ≺) be a directed system, and (xα)α∈A ⊆ X be a net. Let As ⊆ A be a subset with the same relation ≺ as A such that ∀α ∈ A, ∃αs ∈ As such that α ≺ αs. Then, As := (As, ≺) is a directed system and (xα)α∈As is a net, which is called a subnet of (xα)α∈A. %

Proposition 3.70 Let (X, O) be a topological space and (xα)α∈A ⊆ X be a net. Then, x0 ∈ X is a limit point of (xα)α∈A if, and only if, every subnet (xα)α∈As has x0 as a limit point.

Proof "Only if" Since x0 ∈ X is a limit point of (xα)α∈A, then ∀O ∈ O with x0 ∈ O, ∃α0 ∈ A such that, ∀α ∈ A with α0 ≺ α, we have xα ∈ O. Let (xα)α∈As be a subnet. Then, ∃αs0 ∈ As such that α0 ≺ αs0. ∀αs ∈ As with αs0 ≺ αs, we have α0 ≺ αs and xαs ∈ O. Hence, x0 is a limit point of the subnet.

"If" Since (xα)α∈A is a subnet of itself, then it has limit point x0. This completes the proof of the proposition. □

A cluster point of a subnet is clearly a cluster point of the net.
Proposition 3.71 Let (X, O) be a topological space and (xα)α∈A ⊆ X be a net. Then, x0 ∈ X is a limit point of (xα)α∈A if, and only if, for every subnet (xα)α∈As of (xα)α∈A, there exists a subsubnet (xα)α∈Ass that has a limit point x0.

Proof "Sufficiency" We assume that, for every subnet (xα)α∈As of (xα)α∈A, there exists a subsubnet (xα)α∈Ass that has a limit point x0. We will prove the result using an argument of contradiction. Suppose x0 is not a limit point of (xα)α∈A. Then, ∃O0 ∈ O with x0 ∈ O0, ∀α0 ∈ A, ∃α ∈ A with α0 ≺ α such that xα ∈ Õ0 := X \ O0. Define As := ({α ∈ A | xα ∈ Õ0}, ≺). Clearly, (xα)α∈As is a subnet of (xα)α∈A. For any subsubnet (xα)α∈Ass of (xα)α∈As and ∀αss ∈ Ass, we have xαss ∈ Õ0. Then, x0 is not a limit point of (xα)α∈Ass. This contradicts the assumption. Therefore, x0 is a limit point of (xα)α∈A.


"Necessity" Let x0 be a limit point of (xα)α∈A and (xα)α∈As be a subnet. By Proposition 3.70, x0 is a limit point of (xα)α∈As, which is a subsubnet of itself. Then, the result holds. This completes the proof of the proposition. □

Definition 3.72 Let X := (X, OX) and Y := (Y, OY) be topological spaces, D ⊆ X, f : D → Y, and x0 ∈ X be an accumulation point of D. y0 ∈ Y is said to be a limit point of f(x) as x → x0 if ∀OY ∈ OY with y0 ∈ OY, ∃U ∈ OX with x0 ∈ U such that f(U \ {x0}) = f((D ∩ U) \ {x0}) ⊆ OY. We will also say that f(x) converges to y0 as x → x0. %

When bases are available on topological spaces X and Y, in Definition 3.72, we may restrict the open sets OY and U to be basis open sets without changing the meaning of the definition.

Proposition 3.73 Let X := (X, OX) and Y := (Y, OY) be topological spaces, D ⊆ X, f : D → Y, and x0 ∈ X be an accumulation point of D. If Y is Hausdorff, then there is at most one limit point of f(x) as x → x0. In this case, we will write limx→x0 f(x) = y0 ∈ Y when the limit exists.

Proof Suppose f(x) admits limit points yA, yB ∈ Y as x → x0 with yA ≠ yB. Since Y is Hausdorff, then ∃UA, UB ∈ OY such that yA ∈ UA, yB ∈ UB, and UA ∩ UB = ∅. Since yA is a limit point of f(x) as x → x0, then ∃VA ∈ OX with x0 ∈ VA such that f(VA \ {x0}) ⊆ UA. Since yB is a limit point of f(x) as x → x0, then ∃VB ∈ OX with x0 ∈ VB such that f(VB \ {x0}) ⊆ UB. Then, x0 ∈ V := VA ∩ VB ∈ O. Since x0 is an accumulation point of D, then ∃x ∈ (D ∩ V) \ {x0}. Then, we have f(x) ∈ UA since x ∈ (D ∩ VA) \ {x0} and f(x) ∈ UB since x ∈ (D ∩ VB) \ {x0}. Then, f(x) ∈ UA ∩ UB, which contradicts UA ∩ UB = ∅. Hence, the result holds. This completes the proof of the proposition. □

Proposition 3.74 Let X := (X, OX) and Y := (Y, OY) be topological spaces, D ⊆ X with subset topology OD, f : D → Y, and x0 ∈ D. Then, the following statements are equivalent:

(i) f is continuous at x0.
(ii) If x0 is an accumulation point of D, then f (x0 ) is a limit point of f (x) as x → x0 . Proof (i) ⇒ (ii). This is straightforward. (ii) ⇒ (i). We will distinguish two exhaustive and mutually exclusive cases: Case 1: x0 is not an accumulation point of D, and Case 2: x0 is an accumulation point of D. Case 1: x0 is not an accumulation point of D. ∃V ∈ OX with x0 ∈ V such that V ∩ D = {x0 }. ∀U ∈ OY with f (x0 ) ∈ U , we have f (V ) = {f (x0 )} ⊆ U . Hence, f is continuous at x0 . Case 2: x0 is an accumulation point of D. ∀U ∈ OY with f (x0 ) ∈ U , ∃V ∈ OX with x0 ∈ V such that f (V \ {x0 }) ⊆ U . Then, we have f (V ) ⊆ U . Hence, f is continuous at x0 . In both cases, f is continuous at x0 .
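A familiar instance of Definition 3.72 and Proposition 3.74 (an assumed example, not from the text) is f(x) = sin(x)/x on D = R \ {0}: the point x0 = 0 is an accumulation point of D, f is not defined there, yet f(x) converges to 1 as x → 0. The sketch below checks the defining condition on one punctured neighborhood.

```python
# On a small punctured neighborhood (-delta, delta) \ {0}, f stays inside
# (1 - eps, 1 + eps), so 1 is a limit point of f(x) as x -> 0 in the sense
# of Definition 3.72 (restricted to basis open sets of R).

import math

def f(x):
    return math.sin(x) / x

eps = 1e-3
delta = 0.01
samples = [k * delta / 1000 for k in range(-1000, 1001) if k != 0]
print(all(abs(f(x) - 1.0) < eps for x in samples))
```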

This completes the proof of the proposition. □

Proposition 3.75 Let X := (X, OX), Y := (Y, OY), and Z := (Z, OZ) be topological spaces, D ⊆ X, f : D → Y, x0 ∈ X be an accumulation point of D, y0 ∈ Y be a limit point of f(x) as x → x0, and g : Y → Z be continuous at y0. Then, g(y0) ∈ Z is a limit point of g(f(x)) as x → x0. When Y and Z are Hausdorff, we may write limx→x0 g(f(x)) = g(limx→x0 f(x)).

Proof ∀OZ ∈ OZ with g(y0) ∈ OZ, by the continuity of g at y0, ∃OY ∈ OY with y0 ∈ OY such that g(OY) ⊆ OZ. Since y0 is a limit point of f(x) as x → x0, then ∃OX ∈ OX with x0 ∈ OX such that f(OX \ {x0}) ⊆ OY. Then, g(f(OX \ {x0})) ⊆ OZ. Hence, g(f(x)) converges to g(y0) as x → x0. This completes the proof of the proposition. □

Proposition 3.76 Let X be a topological space, D̄ ⊆ X, x0 ∈ X be an accumulation point of D̄, Y and Z be Hausdorff topological spaces, D ⊆ Y, y0 ∈ Y be an accumulation point of D, f : D̄ → D, and g : D → Z. Assume that

(i) ∃O0 ∈ OX with x0 ∈ O0 such that f(O0 \ {x0}) ⊆ D \ {y0}.
(ii) limx→x0 f(x) = y0 and limy→y0 g(y) = z0 ∈ Z.

Then, limx→x0 g(f(x)) = z0.

Proof ∀OZ ∈ OZ with z0 ∈ OZ, by limy→y0 g(y) = z0, ∃OY ∈ OY with y0 ∈ OY such that g(OY \ {y0}) = g((OY ∩ D) \ {y0}) ⊆ OZ. By limx→x0 f(x) = y0, ∃OX ∈ OX with x0 ∈ OX such that f(OX \ {x0}) = f((OX ∩ D̄) \ {x0}) ⊆ OY. Let O1 := O0 ∩ OX ∈ OX. Clearly, x0 ∈ O1. Then, ∀x ∈ (O1 ∩ D̄) \ {x0}, we have f(x) ∈ OY ∩ (D \ {y0}) = (OY ∩ D) \ {y0} and g(f(x)) ∈ OZ. Hence, g(f((O1 ∩ D̄) \ {x0})) ⊆ OZ. Hence, limx→x0 g(f(x)) = z0. This completes the proof of the proposition. □

Proposition 3.77 Let X, Y, and Z be Hausdorff topological spaces, D̄ ⊆ X, x0 ∈ X be an accumulation point of D̄, D ⊆ Y, y0 ∈ Y be an accumulation point of D, f : D̄ → D be bijective, and g : D → Z. Assume that limx→x0 f(x) = y0 and limy→y0 finv(y) = x0. Then, limx→x0 g(f(x)) = limy→y0 g(y) whenever one of the limits exists in Z.
Proof We will prove the result by distinguishing two exhaustive cases: Case 1: limy→y0 g(y) = z0 ∈ Z, and Case 2: limx→x0 g(f(x)) = z0 ∈ Z.

Case 1: limy→y0 g(y) = z0 ∈ Z. We will further distinguish three exhaustive and mutually exclusive subcases: Case 1a: y0 ∉ D, Case 1b: y0 ∈ D and x0 = finv(y0), and Case 1c: y0 ∈ D and x0 ≠ x̄0 := finv(y0). Case 1a: y0 ∉ D. Then, (i) of Proposition 3.76 is satisfied with O0 := X. By Proposition 3.76, we have limx→x0 g(f(x)) = z0 ∈ Z. Case 1b: y0 ∈ D and x0 = finv(y0). Let O0 := X. ∀x ∈ (O0 ∩ D̄) \ {x0}, we have D ∋ f(x) ≠ f(x0) = y0 by f being bijective. This implies that f(O0 \ {x0}) ⊆ D \ {y0}. Then, limx→x0 g(f(x)) = z0 ∈ Z by Proposition 3.76. Case 1c: y0 ∈ D and x0 ≠ x̄0 := finv(y0). By X being Hausdorff, ∃O0 ∈ OX such that x0 ∈ O0 and x̄0 ∉ O0. This leads to: ∀x ∈ (O0 ∩ D̄) \ {x0}, we have D ∋ f(x) ≠ f(x̄0) = y0 by f being bijective. This implies that f(O0 \


{x0}) ⊆ D \ {y0}. Then, limx→x0 g(f(x)) = z0 ∈ Z by Proposition 3.76. Hence, limx→x0 g(f(x)) = limy→y0 g(y) = z0 ∈ Z in all three subcases. Hence, the result holds in this case.

Case 2: limx→x0 g(f(x)) = z0 ∈ Z. Define h : D̄ → Z by h(x) = g(f(x)), ∀x ∈ D̄. Then, limx→x0 h(x) = z0. By Case 1, we have limy→y0 h(finv(y)) = z0 ∈ Z. Then, limy→y0 g(f(finv(y))) = limy→y0 g(y) = z0. Hence, the result holds in this case. This completes the proof of the proposition. □

Example 3.78 Let g : R → R. It is desired to calculate limy→+∞ g(y). We will apply Proposition 3.77 to this calculation. Take X = R, Y = Re, and Z = Re. Let D̄ := (−∞, −1] ∪ (0, +∞) ⊂ R = X, x0 = 0, D := R ⊂ Re = Y, and y0 = +∞. Then, g : D → Z. Define f : D̄ → D by f(x) = 1/x if x > 0 and f(x) = x + 1 if x ≤ −1, ∀x ∈ D̄. Clearly, f is bijective with finv : D → D̄ given by finv(y) = 1/y if y > 0 and finv(y) = y − 1 if y ≤ 0, ∀y ∈ D. Clearly, limx→0 f(x) = +∞ and limy→+∞ finv(y) = 0. Then, by Proposition 3.77, limy→+∞ g(y) = limx→0 g(f(x)) whenever one of the limits exists in Z. %

Proposition 3.79 Let X := (X, OX) and Y := (Y, OY) be topological spaces, D ⊆ X, x0 ∈ X be an accumulation point of D, y0 ∈ Y, and f : D → Y. Then, the following statements are equivalent:

(i) y0 is a limit point of f(x) as x → x0.
(ii) For every net (xα)α∈A ⊆ D \ {x0} with x0 as a limit, y0 is a limit point of the net (f(xα))α∈A.

Proof (i) ⇒ (ii). Fix any net (xα)α∈A ⊆ D \ {x0} with x0 as a limit. ∀U ∈ OY with y0 ∈ U, by (i), ∃V ∈ OX with x0 ∈ V such that f(V \ {x0}) ⊆ U. ∃α0 ∈ A such that ∀α ∈ A with α0 ≺ α, we have xα ∈ V. Then, xα ∈ (D ∩ V) \ {x0} and f(xα) ∈ U. Hence, (f(xα))α∈A has the limit y0.

(ii) ⇒ (i). Suppose y0 is not a limit point of f(x) as x → x0. Then, ∃U0 ∈ OY with y0 ∈ U0, ∀V ∈ OX with x0 ∈ V, we have f(V \ {x0}) ⊈ U0. Then, ∃xV ∈ (D ∩ V) \ {x0} such that f(xV) ∉ U0. Let M := {V ∈ OX | x0 ∈ V} and A := (M, ⊇). Clearly, A is a directed system.
By Axiom of Choice, we may construct a net (xV)V∈A ⊆ D \ {x0} with f(xV) ∉ U0, ∀V ∈ A. Clearly, x0 is a limit point of this net. But, ∀V ∈ A, f(xV) ∉ U0. Hence, y0 is not a limit point of the net (f(xV))V∈A. This contradicts (ii). Therefore, y0 is a limit point of f(x) as x → x0. This completes the proof of the proposition. □
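The change of variables in Example 3.78 can be tried numerically. The function g below is an assumed stand-in (not from the text); with the branch f(x) = 1/x for x > 0 from the example, g(f(x)) for small positive x approaches the same value as g(y) for large y.

```python
# Numeric illustration of Example 3.78: lim_{y->+inf} g(y) equals
# lim_{x->0} g(f(x)) with f(x) = 1/x on the positive branch.
# Here g(y) = (2y + 1)/y, whose limit at +infinity is 2.

def g(y):
    return (2 * y + 1) / y

def f(x):
    # the x > 0 branch of f from Example 3.78
    return 1.0 / x

xs = [10.0 ** (-k) for k in range(3, 10)]
vals = [g(f(x)) for x in xs]   # g(f(x)) = 2 + x for x > 0
print(all(abs(v - 2.0) < 1e-2 for v in vals))
```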


Example 3.80 The extended real line is Re := R ∪ {∞} ∪ {−∞}. We assume that ∀x ∈ R, −∞ < x < ∞. We define the usual operations: ∀x ∈ R,

x + ∞ = ∞    x − ∞ = −∞
x · ∞ = ∞ and x · (−∞) = −∞ if x > 0
∞ + ∞ = ∞    −∞ − ∞ = −∞
∞ · ∞ = ∞    ∞ · (−∞) = −∞

The operations ∞ − ∞ and 0 · ∞ are undefined. On Re, we introduce the countable collection of subsets of Re, BRe, as follows. ∅, R, Re ∈ BRe; ∀r1, r2 ∈ Q with r1 < r2, [−∞, r1), (r1, r2), (r2, +∞] ∈ BRe. By Proposition 3.18, it is easy to show that BRe is a basis for a topology on Re. This topology is denoted ORe, which is the usual topology on Re. It is easy to show that (Re, ORe) is an arcwise connected second countable Hausdorff topological space. An important property of Re is that ∀E ⊆ Re, supx∈E x ∈ Re and infx∈E x ∈ Re. It is easy to see that R as a subset of Re admits the subset topology O that equals the usual topology OR on R. %

Proposition 3.81 Let X be a set, f1 : X → R, f2 : X → R, g1 : X → Re, and g2 : X → Re. Assume that g1(x) + g2(x) ∈ Re is well-defined, ∀x ∈ X. Then,

(i) supx∈X g1(x) ≤ M ∈ Re if, and only if, ∀x ∈ X, g1(x) ≤ M.
(ii) infx∈X g1(x) ≥ m ∈ Re if, and only if, ∀x ∈ X, g1(x) ≥ m.
(iii) supx∈X g1(x) > M ∈ Re if, and only if, ∃x ∈ X, g1(x) > M.
(iv) infx∈X g1(x) < m ∈ Re if, and only if, ∃x ∈ X, g1(x) < m.
(v) supx∈X(−g1)(x) = − infx∈X g1(x).
(vi) supx∈X(f1 + f2)(x) ≤ supx∈X f1(x) + supx∈X f2(x).
(vii) ∀α ∈ (0, ∞) ⊂ R, supx∈X(αg1(x)) = α supx∈X g1(x); ∀α ∈ [0, ∞) ⊂ R, supx∈X(αf1(x)) = α supx∈X f1(x) when supx∈X f1(x) ∈ R.
(viii) supx∈X(g1 + g2)(x) ≤ supx∈X g1(x) + supx∈X g2(x), when the right-hand side is well-defined.

Proof This is straightforward and is therefore omitted. □
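As an analogy only (not part of the text): IEEE 754 floating-point infinities, as exposed by Python's `math.inf`, happen to follow the same conventions as Example 3.80, including returning an undefined result (nan) for ∞ − ∞.

```python
# The extended-real arithmetic of Example 3.80, mirrored by IEEE floats.

import math

inf = math.inf
x = 5.0  # any real; x > 0 for the multiplication rules

print(x + inf == inf)         # x + inf = inf
print(x - inf == -inf)        # x - inf = -inf
print(x * inf == inf)         # x * inf = inf       (x > 0)
print(x * -inf == -inf)       # x * (-inf) = -inf   (x > 0)
print(inf + inf == inf)       # inf + inf = inf
print(inf * -inf == -inf)     # inf * (-inf) = -inf
print(math.isnan(inf - inf))  # inf - inf is undefined (nan)
```

Note the analogy is not perfect in general (IEEE also defines, e.g., division by infinity), but on the operations listed in the example the two agree.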

Definition 3.82 Let (xα)α∈A ⊆ Re be a net. The limit superior and limit inferior of the net are defined by

lim sup_{α∈A} xα = inf_{α∈A} sup_{β∈A with α≺β} xβ ∈ Re
lim inf_{α∈A} xα = sup_{α∈A} inf_{β∈A with α≺β} xβ ∈ Re    %
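For the directed system (N, ≤), Definition 3.82 reduces to the usual limit superior of a sequence: the infimum over tails of the tail suprema. A finite-horizon sketch (an assumed illustration, not from the text) computes both quantities for x_n = (−1)^n + 1/(n+1), and also exhibits a case where the subadditivity inequality for lim sup of sums is strict.

```python
# limsup/liminf of a sequence via tail suprema/infima over a finite horizon.
# x_n = (-1)^n + 1/(n+1) has limsup 1 and liminf -1.

def tail_sup(x, a):
    return max(x[a:])

def tail_inf(x, a):
    return min(x[a:])

N = 2000
x = [(-1) ** n + 1 / (n + 1) for n in range(N)]
limsup = min(tail_sup(x, a) for a in range(N - 1))
liminf = max(tail_inf(x, a) for a in range(N - 1))
print(abs(limsup - 1.0) < 1e-2, abs(liminf + 1.0) < 1e-2)

# strictness: y_n = (-1)^n, z_n = -(-1)^n gives limsup(y + z) = 0,
# strictly below limsup y + limsup z = 2
y = [(-1) ** n for n in range(N)]
z = [-y[n] for n in range(N)]
s = [y[n] + z[n] for n in range(N)]
limsup_s = min(tail_sup(s, a) for a in range(N - 1))
print(limsup_s == 0)
```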


Proposition 3.83 Let (xα)α∈A ⊆ Re, (yα)α∈A ⊆ R, and (zα)α∈A ⊆ R be nets over the same directed system. Then, we have

(i) lim inf_{α∈A} xα ≤ lim sup_{α∈A} xα.
(ii) − lim inf_{α∈A} xα = lim sup_{α∈A}(−xα).
(iii) lim inf_{α∈A} xα = lim sup_{α∈A} xα = L ∈ Re if, and only if, lim_{α∈A} xα = L.
(iv) lim sup_{α∈A}(yα + zα) ≤ lim sup_{α∈A} yα + lim sup_{α∈A} zα, when the right-hand side makes sense.
(v) If lim_{α∈A} yα = y ∈ R, then lim sup_{α∈A}(yα + zα) = y + lim sup_{α∈A} zα.
Proof Let Vα := {β ∈ A | α ≺ β}, ∀α ∈ A. Then, Vα ≠ ∅, ∀α ∈ A, since A is a directed system, and Vα ⊇ Vβ, ∀α, β ∈ A with α ≺ β.

(i) Let l := lim inf_{α∈A} xα ∈ Re and L := lim sup_{α∈A} xα ∈ Re. ∀m ∈ R with m < l, sup_{α∈A} inf_{β∈A with α≺β} xβ > m implies that, by Proposition 3.81, ∃α0 ∈ A such that inf_{β∈Vα0} xβ > m. Then, ∀α ∈ A, ∃α1 ∈ A such that α0 ≺ α1 and α ≺ α1. Then, m < inf_{β∈Vα0} xβ ≤ inf_{β∈Vα1} xβ ≤ sup_{β∈Vα1} xβ ≤ sup_{β∈Vα} xβ. Hence, L ≥ m. By the arbitrariness of m, we have L ≥ l.

(ii) Note that, by Proposition 3.81,

lim sup_{α∈A}(−xα) = inf_{α∈A} sup_{β∈Vα}(−xβ) = inf_{α∈A}(− inf_{β∈Vα} xβ) = − sup_{α∈A} inf_{β∈Vα} xβ = − lim inf_{α∈A} xα
(iii) "If" ∀m ∈ R with m > L, ∃α0 ∈ A such that xα ∈ (−∞, m), ∀α ∈ A with α0 ≺ α. Then, sup_{β∈Vα0} xβ ≤ m and lim sup_{α∈A} xα ≤ m. By the arbitrariness of m, we have lim sup_{α∈A} xα ≤ L. By (ii), we have −L = lim_{α∈A}(−xα) ≥ lim sup_{α∈A}(−xα) = − lim inf_{α∈A} xα. Then, by (i), we have L ≤ lim inf_{α∈A} xα ≤ lim sup_{α∈A} xα ≤ L.

"Only if" We will distinguish three exhaustive and mutually exclusive cases: Case 1: L = −∞, Case 2: L ∈ R, and Case 3: L = +∞. Case 1: L = −∞. ∀m ∈ R, lim sup_{α∈A} xα < m implies that ∃α0 ∈ A such that sup_{β∈Vα0} xβ < m. Then, xβ ∈ (−∞, m), ∀β ∈ Vα0. Hence, we have lim_{α∈A} xα = −∞ = L. Case 2: L ∈ R. ∀ε ∈ (0, ∞) ⊂ R, L = lim inf_{α∈A} xα implies that ∃α1 ∈ A such that inf_{β∈Vα1} xβ > L − ε. Then, xβ ∈ (L − ε, +∞) ⊂ R, ∀β ∈ Vα1. L = lim sup_{α∈A} xα implies that ∃α2 ∈ A such that sup_{β∈Vα2} xβ < L + ε. Then, xβ ∈ (−∞, L + ε) ⊂ R, ∀β ∈ Vα2. Let α0 ∈ A with α1 ≺ α0 and α2 ≺ α0. Then, xβ ∈ (L − ε, L + ε), ∀β ∈ A with α0 ≺ β. Therefore, lim_{α∈A} xα = L. Case 3: L = +∞. ∀M ∈ R, lim inf_{α∈A} xα > M implies that ∃α0 ∈ A such that inf_{β∈Vα0} xβ > M. Then, xβ ∈ (M, ∞), ∀β ∈ Vα0. Hence, we have lim_{α∈A} xα = +∞ = L.

(iv) Note that, ∀α ∈ A, by Proposition 3.81,

sup_{β∈Vα}(yβ + zβ) ≤ sup_{β∈Vα} yβ + sup_{β∈Vα} zβ

Then, by Proposition 3.81, we have, ∀α ∈ A,

lim sup_{α∈A}(yα + zα) ≤ sup_{β∈Vα} yβ + sup_{β∈Vα} zβ

∀γ ∈ A, ∃α0 ∈ A with α ≺ α0 and γ ≺ α0. Then,

lim sup_{α∈A}(yα + zα) ≤ sup_{β∈Vα0} yβ + sup_{β∈Vα0} zβ ≤ sup_{β∈Vα} yβ + sup_{β∈Vγ} zβ
Hence, we have lim sup_{α∈A}(yα + zα) ≤ lim sup_{α∈A} yα + lim sup_{α∈A} zα, when the right-hand side makes sense.

(v) By (iv) and (iii), we have lim sup_{α∈A}(yα + zα) ≤ y + lim sup_{α∈A} zα. Note that lim sup_{α∈A} zα = lim sup_{α∈A}(yα + zα − yα) ≤ lim sup_{α∈A}(yα + zα) + lim sup_{α∈A}(−yα). Then, we have lim sup_{α∈A}(yα + zα) ≥ y + lim sup_{α∈A} zα. Hence, we have lim sup_{α∈A}(yα + zα) = y + lim sup_{α∈A} zα. This completes the proof of the proposition. □

Definition 3.84 Let X := (X, OX) be a topological space, D ⊆ X, f : D → R, and x0 ∈ X be an accumulation point of D. Then the limit superior and limit inferior of f(x) as x → x0 are defined by

lim sup_{x→x0} f(x) = inf_{O∈O with x0∈O} sup_{x∈(D∩O)\{x0}} f(x) ∈ Re
lim inf_{x→x0} f(x) = sup_{O∈O with x0∈O} inf_{x∈(D∩O)\{x0}} f(x) ∈ Re    %

Proposition 3.85 Let X be a topological space, D ⊆ X, x0 ∈ X be an accumulation point of D, f : D → R, and g : D → R. Then, we have

(i) lim inf_{x→x0} f(x) ≤ lim sup_{x→x0} f(x).
(ii) − lim inf_{x→x0} f(x) = lim sup_{x→x0}(−f)(x).
(iii) lim inf_{x→x0} f(x) = lim sup_{x→x0} f(x) = L ∈ Re if, and only if, lim_{x→x0} f(x) = L.
(iv) lim sup_{x→x0}(f + g)(x) ≤ lim sup_{x→x0} f(x) + lim sup_{x→x0} g(x), when the right-hand side makes sense.
(v) If lim_{x→x0} f(x) = y ∈ R, then lim sup_{x→x0}(f + g)(x) = y + lim sup_{x→x0} g(x).
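Before the proof, a concrete instance of Definition 3.84 (an assumed example, not from the text): for f(x) = sin(1/x) on D = R \ {0} and x0 = 0, every punctured neighborhood of 0 contains points where f equals 1 (and, by the analogous trough points, points where f equals −1), so lim sup_{x→0} f(x) = 1 and lim inf_{x→0} f(x) = −1, while lim_{x→0} f(x) does not exist.

```python
# In every punctured interval (0, d) there is a point x with sin(1/x) = 1,
# namely x = 1/(pi/2 + 2*pi*k) for k large enough; hence the supremum of f
# over every punctured neighborhood of 0 is 1, so limsup_{x->0} f = 1.

import math

def peak_point(d):
    # an x in (0, d) with sin(1/x) = 1
    k = math.ceil((1 / d - math.pi / 2) / (2 * math.pi)) + 1
    return 1 / (math.pi / 2 + 2 * math.pi * k)

ok = True
for j in range(1, 10):
    d = 10.0 ** (-j)
    xp = peak_point(d)
    ok = ok and 0 < xp < d and abs(math.sin(1 / xp) - 1.0) < 1e-9
print(ok)
```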
Proof (i) Let l := lim inf_{x→x0} f(x) ∈ Re and L := lim sup_{x→x0} f(x) ∈ Re. ∀m ∈ R with m < l, sup_{O∈O with x0∈O} inf_{x∈(D∩O)\{x0}} f(x) > m implies that ∃U ∈ O with x0 ∈ U such that inf_{x∈(D∩U)\{x0}} f(x) > m. ∀O ∈ O with x0 ∈ O, we have x0 ∈ V := O ∩ U ∈ O and (D ∩ V) \ {x0} ≠ ∅, since x0 is an accumulation point of D. Then, m < inf_{x∈(D∩U)\{x0}} f(x) ≤ inf_{x∈(D∩V)\{x0}} f(x) ≤ sup_{x∈(D∩V)\{x0}} f(x) ≤ sup_{x∈(D∩O)\{x0}} f(x). Hence, L ≥ m. By the arbitrariness of m, we have L ≥ l.


(ii) Note that, by Proposition 3.81,

lim sup_{x→x0}(−f)(x) = inf_{O∈O with x0∈O} sup_{x∈(D∩O)\{x0}}(−f(x)) = inf_{O∈O with x0∈O}(− inf_{x∈(D∩O)\{x0}} f(x)) = − sup_{O∈O with x0∈O} inf_{x∈(D∩O)\{x0}} f(x) = − lim inf_{x→x0} f(x)

(iii) "If" ∀m ∈ R with m > L, ∃V ∈ O with x0 ∈ V such that f(V \ {x0}) ⊆ (−∞, m). Then, sup_{x∈(D∩V)\{x0}} f(x) ≤ m and lim sup_{x→x0} f(x) ≤ m. By the arbitrariness of m, we have lim sup_{x→x0} f(x) ≤ L. By (ii), we have −L = lim_{x→x0}(−f)(x) ≥ lim sup_{x→x0}(−f)(x) = − lim inf_{x→x0} f(x). Then, by (i), we have L ≤ lim inf_{x→x0} f(x) ≤ lim sup_{x→x0} f(x) ≤ L.
"Only if" We will distinguish three exhaustive and mutually exclusive cases: Case 1: L = −∞, Case 2: L ∈ R, and Case 3: L = +∞. Case 1: L = −∞. ∀m ∈ R, lim sup_{x→x0} f(x) < m implies that ∃V ∈ O with x0 ∈ V such that sup_{x∈(D∩V)\{x0}} f(x) < m. Then, f(V \ {x0}) ⊆ (−∞, m). Hence, we have lim_{x→x0} f(x) = −∞ = L. Case 2: L ∈ R. ∀ε ∈ (0, ∞) ⊂ R, L = lim inf_{x→x0} f(x) implies that ∃V1 ∈ O with x0 ∈ V1 such that inf_{x∈(D∩V1)\{x0}} f(x) > L − ε. Then, f(V1 \ {x0}) ⊆ (L − ε, +∞) ⊆ R. L = lim sup_{x→x0} f(x) implies that ∃V2 ∈ O with x0 ∈ V2 such that sup_{x∈(D∩V2)\{x0}} f(x) < L + ε. Then, f(V2 \ {x0}) ⊆ (−∞, L + ε). Let V := V1 ∩ V2 ∈ O. Clearly, x0 ∈ V, and, by Proposition 2.5, f(V \ {x0}) ⊆ f(V1 \ {x0}) ∩ f(V2 \ {x0}) ⊆ (L − ε, L + ε). Therefore, lim_{x→x0} f(x) = L. Case 3: L = +∞. ∀M ∈ R, lim inf_{x→x0} f(x) > M implies that ∃V ∈ O with x0 ∈ V such that inf_{x∈(D∩V)\{x0}} f(x) > M. Then, f(V \ {x0}) ⊆ (M, +∞). Hence, we have lim_{x→x0} f(x) = +∞ = L.

(iv) Note that, ∀O ∈ O with x0 ∈ O, by Proposition 3.81,

sup_{x∈(D∩O)\{x0}}(f + g)(x) ≤ (sup_{x∈(D∩O)\{x0}} f(x)) + (sup_{x∈(D∩O)\{x0}} g(x))
Then, by Proposition 3.81, we have, ∀O ∈ O with x0 ∈ O,

lim sup_{x→x0}(f + g)(x) ≤ (sup_{x∈(D∩O)\{x0}} f(x)) + (sup_{x∈(D∩O)\{x0}} g(x))

∀U ∈ O with x0 ∈ U, we have x0 ∈ V := U ∩ O ∈ O. Then,

lim sup_{x→x0}(f + g)(x) ≤ (sup_{x∈(D∩V)\{x0}} f(x)) + (sup_{x∈(D∩V)\{x0}} g(x)) ≤ (sup_{x∈(D∩O)\{x0}} f(x)) + (sup_{x∈(D∩U)\{x0}} g(x))
Hence, we have lim sup_{x→x0}(f + g)(x) ≤ lim sup_{x→x0} f(x) + lim sup_{x→x0} g(x), when the right-hand side makes sense.

(v) By (iv) and (iii), lim sup_{x→x0}(f + g)(x) ≤ y + lim sup_{x→x0} g(x). Note that lim sup_{x→x0} g(x) = lim sup_{x→x0}(f + g − f)(x) ≤ lim sup_{x→x0}(f + g)(x) + lim sup_{x→x0}(−f)(x). Then, we have lim sup_{x→x0}(f + g)(x) ≥ y + lim sup_{x→x0} g(x). Hence, we have lim sup_{x→x0}(f + g)(x) = y + lim sup_{x→x0} g(x). This completes the proof of the proposition. □

Proposition 3.86 Let X := (X, OX) be a topological space, D ⊆ X with the subset topology OD, f : D → R, and x0 ∈ D. Then, the following statements are equivalent:

(i) f is upper semicontinuous at x0.
(ii) If x0 is an accumulation point of D, then lim sup_{x→x0} f(x) ≤ f(x0).

Proof (i) ⇒ (ii). Let x0 be an accumulation point of D. By the upper semicontinuity of f at x0, ∀ε ∈ (0, ∞) ⊂ R, ∃O ∈ OX with x0 ∈ O such that f(x) < f(x0) + ε, ∀x ∈ O ∩ D. Then, sup_{x∈(O∩D)\{x0}} f(x) ≤ f(x0) + ε, and lim sup_{x→x0} f(x) ≤ f(x0) + ε. By the arbitrariness of ε, we have lim sup_{x→x0} f(x) ≤ f(x0).

(ii) ⇒ (i). We will distinguish two exhaustive and mutually exclusive cases: Case 1: x0 is not an accumulation point of D, and Case 2: x0 is an accumulation point of D. Case 1: x0 is not an accumulation point of D. ∃V ∈ OX with x0 ∈ V such that V ∩ D = {x0}. ∀ε ∈ (0, ∞) ⊂ R, we have f(x) = f(x0) < f(x0) + ε, ∀x ∈ V ∩ D. Hence, f is upper semicontinuous at x0. Case 2: x0 is an accumulation point of D. ∀ε ∈ R+, lim sup_{x→x0} f(x) < f(x0) + ε implies that ∃O ∈ O with x0 ∈ O such that sup_{x∈(O∩D)\{x0}} f(x) < f(x0) + ε. Then, f(x) < f(x0) + ε, ∀x ∈ O ∩ D. Hence, f is upper semicontinuous at x0. In both cases, f is upper semicontinuous at x0. This completes the proof of the proposition. □

Proposition 3.87 Let X be a topological space, D ⊆ X, x0 ∈ X be an accumulation point of D, and f : D → R. Assume there exists a net (xα)α∈A ⊆ D \ {x0} such that lim_{α∈A} xα = x0 and lim inf_{α∈A} f(xα) = c ∈ Re. Then, lim inf_{x→x0} f(x) ≤ c.


Proof By Definitions 3.84 and 3.82, we have

lim inf_{x→x0} f(x) = sup_{O∈O with x0∈O} inf_{x∈(D∩O)\{x0}} f(x)

lim inf_{α∈A} f(xα) = sup_{α∈A} inf_{β∈A with α≺β} f(xβ) = c

∀O ∈ O with x0 ∈ O, since limα∈A xα = x0, then ∃α0 ∈ A such that ∀α ∈ A with α0 ≺ α, we have xα ∈ O. Then, xα ∈ (O ∩ D) \ {x0}. This leads to

inf_{x∈(D∩O)\{x0}} f(x) ≤ inf_{β∈A with α0≺β} f(xβ) ≤ c

Hence, we have lim infx→x0 f(x) ≤ c. This completes the proof of the proposition. ' &

Definition 3.88 Let Ai := (Ai, ≺i) be a directed system, i = 1, 2, and X := (X, O) be a topological space. Define a relation ≺ on A1 × A2 by, ∀(α1, β1), (α2, β2) ∈ A1 × A2, we say (α1, β1) ≺ (α2, β2) if α1 ≺1 α2 and β1 ≺2 β2. It is easy to verify that A := (A1 × A2, ≺) =: A1 × A2 is a directed system. A joint net is a mapping of the directed system A1 × A2 to X, denoted by (x_{α,β})_{(α,β)∈A1×A2}. The joint net is said to admit a (joint) limit point x̂ ∈ X if it admits a limit point x̂ when viewed as a net over the directed system A. When X is Hausdorff, by Proposition 3.65, the joint net admits at most one joint limit point, which will be denoted by lim_{(α,β)∈A1×A2} x_{α,β} ∈ X if it exists. %

Proposition 3.89 Let X := [0, ∞] ⊂ Re with subset topology O, X := (X, O), and c ∈ (0, ∞) ⊂ R. Then, + : X × X → X and c· : X → X are continuous.

Proof We will prove this using Proposition 3.9. ∀(x1, x2) ∈ X × X. We will distinguish four exhaustive and mutually exclusive cases. Case 1: x1 < ∞ and x2 < ∞. Then, x1 + x2 < ∞. ∀ basis open set U := (r1, r2) ∩ X ∈ O with x1 + x2 ∈ U, take V1 := (x1 − (x1 + x2 − r1)/2, x1 + (r2 − x1 − x2)/2) ∩ X ∈ O and V2 := (x2 − (x1 + x2 − r1)/2, x2 + (r2 − x1 − x2)/2) ∩ X ∈ O. Then, V1 × V2 is an open set in X × X with (x1, x2) ∈ V1 × V2 and ∀(x̄1, x̄2) ∈ V1 × V2, we have x̄1 + x̄2 ∈ U. Hence, + : X × X → X is continuous at (x1, x2). Case 2: x1 < ∞ and x2 = ∞. Then, x1 + x2 = ∞. ∀ basis open set U := (r1, ∞] ∩ X ∈ O with x1 + x2 ∈ U, take V1 := (x1 − 1, x1 + 1) ∩ X ∈ O and V2 := (r1 − x1 + 1, ∞] ∩ X ∈ O. Then, V1 × V2 is an open set in X × X with (x1, x2) ∈ V1 × V2 and ∀(x̄1, x̄2) ∈ V1 × V2, we have x̄1 + x̄2 ∈ U. Hence, + : X × X → X is continuous at (x1, x2). Case 3: x1 = ∞ and x2 < ∞. By Case 2 and symmetry, + : X × X → X is continuous at (x1, x2). Case 4: x1 = ∞ and x2 = ∞. Then, x1 + x2 = ∞.
∀ basis open set U := (r1 , ∞] ∩ X ∈ O with x1 + x2 ∈ U , take V1 := (r1 /2, ∞] ∩ X ∈ O and V2 := (r1 /2, ∞] ∩ X ∈ O. Then, V1 × V2 is an open set in X × X with (x1 , x2 ) ∈ V1 × V2

3.9 Nets and Convergence


and ∀(x̄1, x̄2) ∈ V1 × V2, we have x̄1 + x̄2 ∈ U. Hence, + : X × X → X is continuous at (x1, x2). In all cases, we have + : X × X → X is continuous at (x1, x2). By Proposition 3.9, + : X × X → X is continuous. ∀x ∈ X. We will distinguish two exhaustive and mutually exclusive cases. Case 1: x < ∞. Then, cx < ∞. ∀ basis open set U := (r1, r2) ∩ X ∈ O with cx ∈ U, take V := (r1/c, r2/c) ∩ X ∈ O. Then, x ∈ V and ∀x̄ ∈ V, we have cx̄ ∈ U. Hence, c· : X → X is continuous at x. Case 2: x = ∞. Then, cx = ∞. ∀ basis open set U := (r1, ∞] ∩ X ∈ O with cx ∈ U, take V := (r1/c, ∞] ∩ X ∈ O. Then, x ∈ V and ∀x̄ ∈ V, we have cx̄ ∈ U. Hence, c· : X → X is continuous at x. In both cases, we have c· : X → X is continuous at x. By Proposition 3.9, c· : X → X is continuous. This completes the proof of the proposition. ' &

Proposition 3.90 Let X := (X, OX) and Y := (Y, OY) be topological spaces, X be separable, and f : X → Y be continuous. Then, f(X) ⊆ Y is separable (in its subset topology).

Proof Since X is separable, then it has a countable subset D ⊆ X such that D̄ = X. Then, f(D) ⊆ f(X) and f(D) is a countable set. ∀y0 ∈ f(X), ∃x0 ∈ X such that y0 = f(x0). By Proposition 3.68, there exists a net (xα)α∈A ⊆ D such that limα∈A xα = x0. By Proposition 3.66, we have the net (f(xα))α∈A ⊆ f(D) and limα∈A f(xα) = f(x0) = y0. Thus, by Proposition 3.68, we have y0 ∈ f(D), where the closure is with respect to Y. By the arbitrariness of y0, we have f(X) ⊆ f(D). By Proposition 3.5, we have f(D) is dense in f(X), which implies that f(X) is separable. This completes the proof of the proposition. ' &
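Proposition 3.90 has a concrete numerical face on the real line: the image of a countable dense set under a continuous map remains dense in the image. A minimal sketch (the particular map f, the grid sizes, and the tolerance are illustrative choices, not from the text):

```python
import math

def f(x):
    # a continuous map on [0, 1]; any continuous f would do
    return math.sin(3.0 * x) + x * x

# countable dense set D: rationals k/N in [0, 1], truncated to finitely many
N = 2000
dense_images = [f(k / N) for k in range(N + 1)]

def dist_to_f_of_D(y):
    # distance from y to the (truncated) image f(D)
    return min(abs(y - d) for d in dense_images)

# every sampled point of f([0, 1]) lies close to some f(q) with q in D
worst = max(dist_to_f_of_D(f(j / 499)) for j in range(500))
print(worst)  # small: f(D) is numerically dense in f([0, 1])
```

Since this f is Lipschitz on [0, 1], the gap shrinks as N grows; the point of Proposition 3.90 is that density survives any continuous map, not just Lipschitz ones.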

Chapter 4

Metric Spaces

4.1 Fundamental Notions

Definition 4.1 A metric space (X, ρ) is a set X together with a metric ρ : X × X → R such that, ∀x, y, z ∈ X,

(i) ρ(x, y) ≥ 0.
(ii) ρ(x, y) = 0 ⇔ x = y.
(iii) ρ(x, y) = ρ(y, x).
(iv) ρ(x, y) ≤ ρ(x, z) + ρ(z, y). %

Let S ⊆ X. Then, (S, ρ|S×S) is also a metric space.

Example 4.2 (R, ρ), with ρ(x, y) = |x − y|, ∀x, y ∈ R, is a metric space. (Rn, ρ), with ρ(x, y) = (Σ_{i=1}^n |xi − yi|²)^{1/2}, ∀x = (x1, . . . , xn), y = (y1, . . . , yn) ∈ Rn, is a metric space, where n ∈ N. (Rn, ρ), with ρ(x, y) = Σ_{i=1}^n |xi − yi|, ∀x = (x1, . . . , xn), y = (y1, . . . , yn) ∈ Rn, is a metric space, where n ∈ N. %

Proposition 4.3 Let X := (X, ρ) be a metric space, and an open ball centered at x0 ∈ X with radius r ∈ (0, ∞) ⊂ R is defined by BX(x0, r) := {x ∈ X | ρ(x, x0) < r}. The metric space generates a natural topology O on X with the basis collection B given by B := {BX(x0, r) | x0 ∈ X, r ∈ (0, ∞) ⊂ R}. The closed ball centered at x0 ∈ X with radius r ∈ [0, ∞) ⊂ R is defined by B̄X(x0, r) := {x ∈ X | ρ(x, x0) ≤ r}, which is closed.

Proof We will show that B is the basis for its generated topology by Proposition 3.18. (i) ∀x ∈ X, x ∈ BX(x, 1). (ii) ∀BX(x1, r1), BX(x2, r2) ∈ B, let x ∈ BX(x1, r1) ∩ BX(x2, r2). Then, we have ρ(x, x1) < r1 and ρ(x, x2) < r2. Let r := min{r1 − ρ(x, x1), r2 − ρ(x, x2)} ∈ (0, ∞) ⊂ R. Then, x ∈ BX(x, r) ∈ B. ∀x3 ∈ BX(x, r), we have ρ(x3, x1) ≤ ρ(x3, x) + ρ(x, x1) < r1 and ρ(x3, x2) ≤ ρ(x3, x) + ρ(x, x2) < r2. Hence, we have x3 ∈ BX(x1, r1) ∩ BX(x2, r2).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Pan, Measure-Theoretic Calculus in Abstract Spaces, https://doi.org/10.1007/978-3-031-21912-2_4



Therefore, we have BX(x, r) ⊆ BX(x1, r1) ∩ BX(x2, r2). Hence, the assumptions of Proposition 3.18 are satisfied, and then B is a basis for O. Next, we show that B̄X(x0, r) is a closed set. ∀x ∈ (B̄X(x0, r))∼, we have ρ(x, x0) > r. Let r1 := ρ(x, x0) − r ∈ (0, ∞) ⊂ R. ∀x1 ∈ BX(x, r1), we have ρ(x1, x0) ≥ ρ(x, x0) − ρ(x, x1) > r. Hence, we have x ∈ BX(x, r1) ⊆ (B̄X(x0, r))∼. Therefore, (B̄X(x0, r))∼ is open and B̄X(x0, r) is closed. This completes the proof of the proposition. ' &

We will sometimes talk about a metric space X := (X, ρ) without referring to the components of X, where the metric is understood to be ρX and the natural topology is understood to be OX. When it is clear from the context, we will also neglect the subscript X. We will abuse the notation and say x ∈ X and A ⊆ X when x ∈ X and A ⊆ X. On a metric space, we can talk about open and closed sets, and all those concepts defined in Chap. 3, all with respect to the natural topology. A metric space is clearly first countable, where a countable basis at x0 ∈ X is {B(x0, r) | r ∈ Q, r > 0}.

Proposition 4.4 A metric space (X, ρ) is separable if, and only if, it is second countable.

Proof “Only if” Let D ⊆ X be a countable dense set. Let M := {B(x, r) | x ∈ D, r ∈ Q, and r > 0}. Clearly, M is countable and M ⊆ O. ∀O ∈ O, ∀x ∈ O, ∃r ∈ (0, ∞) ∩ Q such that B(x, r) ⊆ O. Since D is dense, then ∃x1 ∈ B(x, r/2) ∩ D. Let r1 = r/2 ∈ (0, ∞) ∩ Q. Then, we have x ∈ B(x1, r1) ⊆ B(x, r) ⊆ O and B(x1, r1) ∈ M. Hence, M is a basis for O. Hence, (X, O) is second countable. “If” Let O have a countable basis B. By the Axiom of Choice, we may assign an xB ∈ B, ∀B ∈ B with B ≠ ∅. Let D = {xB ∈ X | B ∈ B, B ≠ ∅}. Then, D is countable. ∀x ∈ X, ∀O ∈ O with x ∈ O, ∃B ∈ B such that x ∈ B ⊆ O. Then, xB ∈ D ∩ B ⊆ D ∩ O ≠ ∅. Hence, by Proposition 3.3, we have x ∈ D̄. Therefore, by the arbitrariness of x, we have D is dense. Hence, (X, O) is separable. This completes the proof of the proposition.
' &

Proposition 4.5 Let X be a topological space and Y and Z be metric spaces. Let f : X → Y, g : Y → Z, h : Y → X, x0 ∈ X, and y0 ∈ Y. Then, the following statements hold:

1. f is continuous at x0 if, and only if, ∀ε ∈ (0, ∞) ⊂ R, ∃U ∈ OX with x0 ∈ U such that ρY(f(x), f(x0)) < ε, ∀x ∈ U.
2. g is continuous at y0 if, and only if, ∀ε ∈ (0, ∞) ⊂ R, ∃δ ∈ (0, ∞) ⊂ R such that ρZ(g(y), g(y0)) < ε, ∀y ∈ BY(y0, δ).
3. h is continuous at y0 if, and only if, ∀U ∈ OX with h(y0) ∈ U, ∃δ ∈ (0, ∞) ⊂ R such that h(y) ∈ U, ∀y ∈ BY(y0, δ).

Proof The proof is straightforward and is therefore omitted.

' &
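Definition 4.1's axioms for the metrics of Example 4.2 can be sanity-checked numerically on random triples. The sketch below is only an empirical check, with an arbitrary sample size and a small tolerance for floating-point rounding in the triangle inequality:

```python
import itertools
import random

def rho_euclid(x, y):
    # Euclidean metric on R^n
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def rho_taxi(x, y):
    # taxicab metric on R^n
    return sum(abs(a - b) for a, b in zip(x, y))

random.seed(0)
pts = [tuple(random.uniform(-5.0, 5.0) for _ in range(3)) for _ in range(15)]

for rho in (rho_euclid, rho_taxi):
    for x, y, z in itertools.product(pts, repeat=3):
        assert rho(x, y) >= 0.0                             # (i) nonnegativity
        assert rho(x, y) == rho(y, x)                       # (iii) symmetry
        assert rho(x, y) <= rho(x, z) + rho(z, y) + 1e-12   # (iv) triangle inequality
    assert all(rho(x, x) == 0.0 for x in pts)               # (ii), one direction
print("metric axioms hold on the sample")
```

A finite sample cannot prove axiom (iv), of course; it only illustrates what the axioms assert pointwise.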


Definition 4.6 Let X and Y be metric spaces and f : X → Y be a homeomorphism. f is said to be an isometry between X and Y if ρY (f (x1 ), f (x2 )) = ρX (x1 , x2 ), ∀x1 , x2 ∈ X . Then, the two metric spaces are said to be isometric. % Definition 4.7 Let X be a set and ρ1 and ρ2 be two metrics on X. ρ1 and ρ2 are said to be equivalent if the identity map from (X, ρ1 ) to (X, ρ2 ) is a homeomorphism. % When two metrics are equivalent, then the natural topologies generated by them are equal to each other. Clearly, a metric space is Hausdorff.

4.2 Convergence and Completeness

Proposition 4.8 Let X be a metric space and (xα)α∈A ⊆ X be a net. Then, limα∈A xα = x ∈ X if, and only if, ∀ε ∈ (0, ∞) ⊂ R, ∃α0 ∈ A, ∀α ∈ A with α0 ≺ α, we have ρ(xα, x) < ε.

Proof This is straightforward and is omitted.

' &

Since metric spaces are Hausdorff, then the limit is unique if it exists.

Definition 4.9 Let X be a metric space, x0 ∈ X, and S ⊆ X. The distance from x0 to S is dist(x0, S) := inf_{s∈S} ρ(x0, s) ∈ [0, ∞] ⊂ Re. %

dist(x0, S) = ∞ if, and only if, S = ∅.

Proposition 4.10 Let X be a metric space, x0 ∈ X, S ⊆ X, and S is closed. Then, x0 ∈ S if, and only if, dist(x0, S) = 0.

Proof “Only if” This is obvious. “If” By the fact that dist(x0, S) = 0, ∀n ∈ N, ∃xn ∈ S such that ρ(x0, xn) < 1/n. Then, limn∈N xn = x0. By Proposition 3.68, x0 ∈ S̄. Since S is closed, then, by Proposition 3.3, S̄ = S. Hence, x0 ∈ S. This completes the proof of the proposition. ' &

Proposition 4.11 A metric space with its natural topology is normal.

Proof Let X be the metric space. Clearly, X is Hausdorff. For all closed sets F1, F2 ⊆ X with F1 ∩ F2 = ∅. We will distinguish two exhaustive and mutually exclusive cases: Case 1: F1 = ∅ or F2 = ∅, and Case 2: F1 ≠ ∅ and F2 ≠ ∅. Case 1: F1 = ∅ or F2 = ∅. Without loss of generality, assume F1 = ∅. Take O1 = ∅ ∈ O and O2 = X ∈ O; then F1 ⊆ O1, F2 ⊆ O2, O1 ∩ O2 = ∅. Case 2: F1 ≠ ∅ and F2 ≠ ∅. ∀x ∈ F1, dist(x, F2) ∈ (0, ∞) ⊂ R, by Proposition 4.10. Define O1 ∈ O by O1 := ⋃_{x∈F1} B(x, dist(x, F2)/3). ∀x ∈ F2, dist(x, F1) ∈ (0, ∞) ⊂ R by Proposition 4.10. Define O2 ∈ O by O2 := ⋃_{x∈F2} B(x, dist(x, F1)/3). Clearly, F1 ⊆ O1 and F2 ⊆ O2. Note that O1 ∩ O2 = ∅, since otherwise, ∃x0 ∈ O1 ∩ O2, ∃x1 ∈ F1 such that x0 ∈ B(x1, dist(x1, F2)/3), ∃x2 ∈ F2 such that x0 ∈ B(x2, dist(x2, F1)/3), without loss


of generality, assume dist(x1, F2) ≤ dist(x2, F1); then dist(x2, F1) ≤ ρ(x2, x1) ≤ ρ(x2, x0) + ρ(x0, x1) < dist(x2, F1)/3 + dist(x1, F2)/3 ≤ 2 dist(x2, F1)/3, which is a contradiction. Hence, in both cases, ∃O1, O2 ∈ O such that F1 ⊆ O1, F2 ⊆ O2, O1 ∩ O2 = ∅. Hence, X is normal. This completes the proof of the proposition. ' &

Definition 4.12 Let X be a metric space and (xn)∞n=1 ⊆ X. The sequence is said to be a Cauchy sequence if ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N, ∀n, m ≥ N, ρ(xn, xm) < ε. %

Clearly, every convergent sequence in a metric space is a Cauchy sequence.

Proposition 4.13 Let X be a metric space, E ⊆ X, and x0 ∈ X. Then, x0 ∈ Ē if, and only if, ∃(xn)∞n=1 ⊆ E such that limn∈N xn = x0.

Proof “Only if” ∀n ∈ N, since x0 ∈ Ē, then, by Proposition 3.3, ∃xn ∈ E ∩ B(x0, 1/n). Clearly, (xn)∞n=1 ⊆ E and limn∈N xn = x0. “If” This is immediate by Proposition 3.68. This completes the proof of the proposition. & '

Definition 4.14 A metric space is said to be complete if every Cauchy sequence in the metric space converges to a point in the space. %

Proposition 4.15 Let X := (X, ρ) be a metric space, Y := (Y, O) be a topological space, f : X → Y, and x0 ∈ X. Then, the following statements are equivalent:

(i) f is continuous at x0.
(ii) If x0 is an accumulation point of X, then f(x) converges to f(x0) as x → x0.
(iii) ∀(xn)∞n=1 ⊆ X with limn∈N xn = x0, we have (f(xn))∞n=1 ⊆ Y converges to f(x0).
(iv) ∀(xn)∞n=1 ⊆ X with x0 as a cluster point, we have that (f(xn))∞n=1 ⊆ Y admits a cluster point f(x0).

Proof (i) ⇔ (ii). This follows from Proposition 3.74. (i) ⇒ (iii). This follows from Proposition 3.66. (iii) ⇒ (iv). ∀(xn)∞n=1 ⊆ X with x0 as a cluster point. Since X is a metric space and therefore first countable, then ∃ a subsequence (x_{ni})∞i=1 of (xn)∞n=1 such that limi∈N x_{ni} = x0. Then, by (iii), (f(x_{ni}))∞i=1 converges to f(x0).
Then, f(x0) is a cluster point of (f(x_{ni}))∞i=1 and therefore a cluster point of (f(xn))∞n=1. (iv) ⇒ (i). Suppose f is not continuous at x0. ∃OY0 ∈ OY with f(x0) ∈ OY0 such that ∀n ∈ N, we have f(B(x0, 1/n)) ⊄ OY0. Then, ∃xn ∈ B(x0, 1/n) such that f(xn) ∉ OY0. Consider the sequence (xn)∞n=1. Clearly, x0 = limn∈N xn and therefore is a cluster point of the sequence. Consider the sequence (f(xn))∞n=1. For the open set OY0 ∋ f(x0), ∀n ∈ N, f(xn) ∉ OY0. Then, f(x0) is not a cluster point of (f(xn))∞n=1. This contradicts the assumption. Therefore, f must be continuous at x0. This completes the proof of the proposition. ' &
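Proposition 4.15(iii) yields a practical test for discontinuity in metric spaces: it suffices to exhibit one sequence xn → x0 whose image does not converge to f(x0). A sketch with an illustrative step function:

```python
def f(x):
    # a step function, discontinuous at 0
    return 1.0 if x > 0 else 0.0

x0 = 0.0
fx0 = f(x0)

xs = [1.0 / n for n in range(1, 200)]   # xn -> 0
images = [f(x) for x in xs]             # f(xn) = 1 for every n

# f(xn) -> 1 != 0 = f(0), so (iii) fails and f is not continuous at 0
print(set(images), fx0)
```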


Proposition 4.16 Let X := (X, ρ) be a metric space, Y := (Y, O) be a topological space, D ⊆ X, f : D → Y, x0 ∈ X be an accumulation point of D, and y0 ∈ Y. Then, the following statements are equivalent:

(i) f(x) converges to y0 as x → x0.
(ii) ∀(xn)∞n=1 ⊆ D \ {x0} with limn∈N xn = x0, we have that y0 is a limit point of (f(xn))∞n=1.

Proof (i) ⇒ (ii). By (i), ∀O ∈ O with y0 ∈ O, ∃δ ∈ (0, ∞) ⊂ R such that ∀x ∈ (D ∩ BX(x0, δ)) \ {x0}, f(x) ∈ O. ∀(xn)∞n=1 ⊆ D \ {x0} with limn∈N xn = x0, ∃N ∈ N such that ∀n ≥ N, xn ∈ BX(x0, δ). Then, xn ∈ (D ∩ BX(x0, δ)) \ {x0} and f(xn) ∈ O. Hence, y0 is a limit point of (f(xn))∞n=1. (ii) ⇒ (i). We will show this by an argument of contradiction. Suppose (i) does not hold. Then, ∃O0 ∈ O with y0 ∈ O0, ∀n ∈ N, ∃xn ∈ (D ∩ BX(x0, 1/n)) \ {x0} such that f(xn) ∈ Y \ O0. Clearly, the sequence (xn)∞n=1 ⊆ D \ {x0} and limn∈N xn = x0. But, f(xn) ∉ O0, ∀n ∈ N. Then, y0 is not a limit point of (f(xn))∞n=1. This contradicts (ii). Hence, (i) must hold. This completes the proof of the proposition. ' &

Proposition 4.17 Let X be a metric space, D ⊆ X, f : D → R, and x0 ∈ X be an accumulation point of D. Then, we have

lim sup_{x→x0} f(x) = inf_{ε∈(0,∞)⊂R} sup_{x∈(D∩B(x0,ε))\{x0}} f(x)

lim inf_{x→x0} f(x) = sup_{ε∈(0,∞)⊂R} inf_{x∈(D∩B(x0,ε))\{x0}} f(x)

Proof Let L = lim supx→x0 f(x) ∈ Re and L̄ := inf_{ε∈(0,∞)⊂R} sup_{x∈(D∩B(x0,ε))\{x0}} f(x) ∈ Re. ∀m ∈ R with m < L, we have m < sup_{x∈(D∩V)\{x0}} f(x), ∀V ∈ OX with x0 ∈ V. Then, ∀ε ∈ (0, ∞) ⊂ R, sup_{x∈(D∩B(x0,ε))\{x0}} f(x) > m. Hence, m ≤ L̄. By the arbitrariness of m, we have L ≤ L̄. ∀m ∈ R with m > L, ∃V ∈ OX with x0 ∈ V such that sup_{x∈(D∩V)\{x0}} f(x) < m. Then, ∃ε ∈ (0, ∞) ⊂ R such that B(x0, ε) ⊆ V. Then, sup_{x∈(D∩B(x0,ε))\{x0}} f(x) ≤ sup_{x∈(D∩V)\{x0}} f(x) < m. Then, we have L̄ < m. By the arbitrariness of m, we have L̄ ≤ L. Hence, L = L̄. Note that, by Propositions 3.85 and 3.81,

lim inf_{x→x0} f(x) = − lim sup_{x→x0} (−f)(x) = − inf_{ε∈(0,∞)⊂R} sup_{x∈(D∩B(x0,ε))\{x0}} (−f)(x) = sup_{ε∈(0,∞)⊂R} inf_{x∈(D∩B(x0,ε))\{x0}} f(x)

This completes the proof of the proposition. ' &
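Proposition 4.17 also suggests a crude numerical approximation of lim sup_{x→x0} f(x): take the sup of f over sampled punctured balls B(x0, ε) \ {x0} for shrinking ε, then the inf over ε. A sketch for f(x) = sin(1/x) at x0 = 0, whose lim sup is 1 (the grid sizes are arbitrary, and sampling only approximates the true sup):

```python
import math

def f(x):
    return math.sin(1.0 / x)

def sup_on_punctured_ball(eps, samples=20001):
    # sample sup of f over (-eps, eps) \ {0}
    xs = (-eps + 2.0 * eps * k / samples for k in range(samples + 1))
    return max(f(x) for x in xs if x != 0.0)

# sup over each punctured ball is close to 1, so the inf over eps,
# i.e. lim sup_{x -> 0} sin(1/x), is approximately 1
sups = [sup_on_punctured_ball(10.0 ** (-k)) for k in range(1, 6)]
limsup_approx = min(sups)
print(limsup_approx)
```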


4.3 Uniform Continuity and Uniformity

Definition 4.18 Let X and Y be metric spaces and f : X → Y. f is said to be uniformly continuous if ∀ε ∈ (0, ∞) ⊂ R, ∃δ ∈ (0, ∞) ⊂ R such that ρY(f(x1), f(x2)) < ε, ∀x1, x2 ∈ X with ρX(x1, x2) < δ. %

Clearly, if a function f is uniformly continuous, then it is continuous.

Definition 4.19 Let X and Y be metric spaces and f : X → Y. f is said to be a uniform homeomorphism if f is bijective and both f and finv are uniformly continuous. %

Properties preserved under uniform homeomorphisms are called uniform properties. These include Cauchy sequences, completeness, uniform continuity, and total boundedness.

Definition 4.20 Let X be a set and ρ1 and ρ2 be two metrics defined on X. Then, the two metrics are said to be uniformly equivalent if the identity map from (X, ρ1) to (X, ρ2) is a uniform homeomorphism. %

Proposition 4.21 Let (X, ρ) be a metric space. Define σ : X × X → R by σ(x1, x2) = ρ(x1, x2)/(1 + ρ(x1, x2)), ∀x1, x2 ∈ X. Then, σ is a metric on X and ρ and σ are uniformly equivalent.

Proof ∀x1, x2, x3 ∈ X, σ(x1, x2) ∈ [0, 1); σ(x1, x2) = 0 ⇔ ρ(x1, x2) = 0 ⇔ x1 = x2; σ(x1, x2) = σ(x2, x1); by the monotonicity of s/(1 + s) on s > −1, we have

σ(x1, x2) ≤ (ρ(x1, x3) + ρ(x3, x2))/(1 + ρ(x1, x3) + ρ(x3, x2)) ≤ ρ(x1, x3)/(1 + ρ(x1, x3)) + ρ(x3, x2)/(1 + ρ(x3, x2)) = σ(x1, x3) + σ(x3, x2)

Hence, σ defines a metric on X. ∀ε ∈ (0, ∞) ⊂ R, ∀x1, x2 ∈ X with ρ(x1, x2) < ε, we have σ(x1, x2) ≤ ρ(x1, x2) < ε. Hence, idX : (X, ρ) → (X, σ) is uniformly continuous. On the other hand, ∀ε ∈ (0, ∞) ⊂ R, ∀x1, x2 ∈ X with σ(x1, x2) < ε/(1 + ε), we have ρ(x1, x2) < ε. Hence, idX : (X, σ) → (X, ρ) is uniformly continuous. Therefore, ρ and σ are uniformly equivalent. ' &

Definition 4.22 A metric space X is said to be totally bounded if ∀ε ∈ (0, ∞) ⊂ R, there exist finitely many open balls with radius ε that cover X. %

Proposition 4.23 Let X, Y, W, and Z be metric spaces, (xn)∞n=1 ⊆ X be a Cauchy sequence, f : X → Y and g : Y → Z be uniformly continuous functions, and h : X → W be a uniform homeomorphism. Then, the following statements hold:

(i) (f(xn))∞n=1 is a Cauchy sequence.
(ii) g ◦ f is uniformly continuous.


(iii) If X is complete, then W is complete.
(iv) If X is totally bounded and f is surjective, then Y is totally bounded.

Proof (i) ∀ε ∈ (0, ∞) ⊂ R, by the uniform continuity of f, ∃δ ∈ (0, ∞) ⊂ R such that ρY(f(xa), f(xb)) < ε, ∀xa, xb ∈ X with ρX(xa, xb) < δ. Since (xi)∞i=1 is Cauchy, then ∃N ∈ N such that ρX(xn, xm) < δ, ∀n, m ≥ N. Then, ρY(f(xn), f(xm)) < ε. Hence, (f(xi))∞i=1 is a Cauchy sequence. (ii) ∀ε ∈ (0, ∞) ⊂ R, by the uniform continuity of g, ∃δ1 ∈ (0, ∞) ⊂ R such that ρZ(g(y1), g(y2)) < ε, ∀y1, y2 ∈ Y with ρY(y1, y2) < δ1. By the uniform continuity of f, ∃δ ∈ (0, ∞) ⊂ R such that ρY(f(xa), f(xb)) < δ1, ∀xa, xb ∈ X with ρX(xa, xb) < δ. Then, we have ρZ(g(f(xa)), g(f(xb))) < ε. Hence, g ◦ f is uniformly continuous. (iii) ∀ Cauchy sequence (wi)∞i=1 ⊆ W. By (i), (hinv(wi))∞i=1 ⊆ X is a Cauchy sequence. Since X is complete, then limi∈N hinv(wi) = x0 ∈ X. By Proposition 3.66, we have limi∈N wi = limi∈N h(hinv(wi)) = h(x0) ∈ W. Hence, W is complete. (iv) ∀ε ∈ (0, ∞) ⊂ R, by the uniform continuity of f, ∃δ ∈ (0, ∞) ⊂ R such that ρY(f(xa), f(xb)) < ε, ∀xa, xb ∈ X with ρX(xa, xb) < δ. By the total boundedness of X, there exists a finite set XN ⊆ X such that ⋃_{x∈XN} BX(x, δ) = X. Then, by the surjectiveness of f and Proposition 2.5, we have ⋃_{x∈XN} f(BX(x, δ)) = Y. Note that f(BX(x, δ)) ⊆ BY(f(x), ε), ∀x ∈ X. Then, we have ⋃_{x∈XN} BY(f(x), ε) = Y. Hence, Y is totally bounded. This completes the proof of the proposition. ' &

Definition 4.24 Let X be a set and Y := (Y, ρ) be a metric space. Let (fα)α∈A be a net of functions of X to Y. Then, the net is said to converge uniformly to a function f : X → Y if ∀ε ∈ (0, ∞) ⊂ R, ∃α0 ∈ A, ∀α ∈ A with α0 ≺ α, we have ρ(fα(x), f(x)) < ε, ∀x ∈ X. %

Definition 4.25 Let X be a set and Y := (Y, ρ) be a metric space. Let (fn)∞n=1 be a sequence of functions of X to Y.
Then, the sequence is said to be a uniform Cauchy sequence if ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N, ∀n, m ∈ N with n, m ≥ N, we have ρ(fn(x), fm(x)) < ε, ∀x ∈ X. %

A uniformly convergent sequence (fn)∞n=1 is a uniform Cauchy sequence. A uniform Cauchy sequence in a complete metric space is uniformly convergent.

Proposition 4.26 Let X := (X, O) be a topological space and Y := (Y, ρ) be a metric space. Let (fn)∞n=1 be a uniformly convergent sequence of functions of X to Y whose limit is f : X → Y. Assume that, ∀n ∈ N, fn is continuous at x0 ∈ X. Then, f is continuous at x0.


Proof ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N, we have ρ(fN(x), f(x)) < ε/3, ∀x ∈ X. Since fN is continuous at x0, then ∃U ∈ O with x0 ∈ U, ∀x ∈ U, we have ρ(fN(x), fN(x0)) < ε/3. Then, ∀x ∈ U, we have

ρ(f(x), f(x0)) ≤ ρ(f(x), fN(x)) + ρ(fN(x), fN(x0)) + ρ(fN(x0), f(x0)) < ε

Hence, f is continuous at x0. This completes the proof of the proposition. ' &

((ρX(x1, y1))² + (ρX(x2, y2))²)^{1/2} ≥ (ρX(x1, y1) + ρX(x2, y2))/√2. Note that

ρX(x1, x2) ≤ ρX(x1, y1) + ρX(y1, x2)
ρX(y1, x2) ≤ ρX(x1, x2) + ρX(x1, y1)
ρX(y1, x2) ≤ ρX(y1, y2) + ρX(x2, y2)
ρX(y1, y2) ≤ ρX(x2, y1) + ρX(y2, x2)

This implies that

−ε = −√2 δ < −ρX(x1, y1) − ρX(x2, y2) ≤ ρX(x1, x2) − ρX(x2, y1) + ρX(x2, y1) − ρX(y1, y2) = ρX(x1, x2) − ρX(y1, y2) ≤ ρX(x1, y1) + ρX(x2, y1) + ρX(x2, y2) − ρX(x2, y1) = ρX(x1, y1) + ρX(x2, y2) < ε

Hence, we have |ρX(x1, x2) − ρX(y1, y2)| < ε. Hence, ρX is uniformly continuous on X × X. This completes the proof of the proposition. ' &

Proposition 4.31 Let X and Y be complete metric spaces and Z = X × Y be the product metric space with the Cartesian metric ρ. Then, Z is complete.

Proof Fix any Cauchy sequence ((xn, yn))∞n=1 ⊆ Z. ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N such that ρ((xn, yn), (xm, ym)) < ε, ∀n, m ≥ N. Then, we have ρX(xn, xm) < ε and ρY(yn, ym) < ε. Hence, (xn)∞n=1 ⊆ X and (yn)∞n=1 ⊆ Y are Cauchy sequences. By the completeness of X and Y, ∃x0 ∈ X and ∃y0 ∈ Y such that limn∈N xn = x0 and limn∈N yn = y0. By Proposition 3.67, we have limn∈N (xn, yn) = (x0, y0) ∈ Z. Hence, Z is complete. This completes the proof of the proposition. ' &
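The chain of triangle inequalities above amounts to the bound |ρX(x1, x2) − ρX(y1, y2)| ≤ ρX(x1, y1) + ρX(x2, y2), which is exactly what makes the metric itself uniformly continuous on the product space. A random numerical check of this bound in the Euclidean plane (the sample size is an arbitrary choice):

```python
import random

def rho(p, q):
    # Euclidean metric on R^2
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

random.seed(1)

def rand_pt():
    return (random.uniform(-10.0, 10.0), random.uniform(-10.0, 10.0))

for _ in range(10_000):
    x1, x2, y1, y2 = rand_pt(), rand_pt(), rand_pt(), rand_pt()
    lhs = abs(rho(x1, x2) - rho(y1, y2))
    rhs = rho(x1, y1) + rho(x2, y2)
    assert lhs <= rhs + 1e-9   # reverse-triangle-style bound
print("|rho(x1,x2) - rho(y1,y2)| <= rho(x1,y1) + rho(x2,y2) on all samples")
```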


Clearly, Definition 4.28 and Propositions 4.29 and 4.31 may be easily generalized to the case of X1 × · · · × Xn, where n ∈ N and Xi are metric spaces, i = 1, . . . , n. When n = 0, it should be noted that ∏_{α∈∅} Xα = ({∅}, ρ), where ρ(∅, ∅) = 0.

Proposition 4.32 Let Xα be a metric space, α ∈ Λ, where Λ is a finite set. Let Λ = ⋃_{β∈Γ} Λβ, where the Λβ's are pairwise disjoint and finite and Γ is also finite. ∀β ∈ Γ, let X^β := ∏_{α∈Λβ} Xα be the product metric space. Let X^⟨Γ⟩ := ∏_{β∈Γ} ∏_{α∈Λβ} Xα be the product metric space of product metric spaces, and X := ∏_{α∈Λ} Xα be the product metric space. Then, X and X^⟨Γ⟩ are isometric.

Proof Define E : ∏_{β∈Γ} ∏_{α∈Λβ} Xα → ∏_{α∈Λ} Xα by, ∀x ∈ X^⟨Γ⟩, ∀α ∈ Λ, ∃! βα ∈ Γ with α ∈ Λ_{βα}, πα(E(x)) = π^{βα}_α(π^⟨Γ⟩_{βα}(x)). By Proposition 3.30, E is a homeomorphism. ∀x, y ∈ X^⟨Γ⟩, we have

ρ(E(x), E(y)) = (Σ_{α∈Λ} (ρα(πα(E(x)), πα(E(y))))²)^{1/2}
= (Σ_{β∈Γ} Σ_{α∈Λβ} (ρα(πα(E(x)), πα(E(y))))²)^{1/2}
= (Σ_{β∈Γ} Σ_{α∈Λβ} (ρα(π^β_α(π^⟨Γ⟩_β(x)), π^β_α(π^⟨Γ⟩_β(y))))²)^{1/2}
= (Σ_{β∈Γ} ((Σ_{α∈Λβ} (ρα(π^β_α(π^⟨Γ⟩_β(x)), π^β_α(π^⟨Γ⟩_β(y))))²)^{1/2})²)^{1/2}
= (Σ_{β∈Γ} (ρ^β(π^⟨Γ⟩_β(x), π^⟨Γ⟩_β(y)))²)^{1/2} = ρ^⟨Γ⟩(x, y)

Hence, E is an isometry. This completes the proof of the proposition.

' &
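Proposition 4.32 says that regrouping coordinates does not change the Cartesian metric: computing the Euclidean-style metric on ∏_{α∈Λ} Xα directly, or first within each block Λβ and then across blocks, gives the same number. A numerical sketch for Λ = {1, 2, 3, 4} split into the blocks {1, 2} and {3, 4}:

```python
import math
import random

def cart(dists):
    # Cartesian metric built from a list of coordinate distances
    return math.sqrt(sum(d * d for d in dists))

random.seed(2)
x = [random.uniform(-1.0, 1.0) for _ in range(4)]
y = [random.uniform(-1.0, 1.0) for _ in range(4)]

flat = cart([abs(a - b) for a, b in zip(x, y)])       # metric on X
block1 = cart([abs(x[0] - y[0]), abs(x[1] - y[1])])   # metric on the first block
block2 = cart([abs(x[2] - y[2]), abs(x[3] - y[3])])   # metric on the second block
grouped = cart([block1, block2])                      # metric on the product of products

print(abs(flat - grouped))  # ~0: the regrouping map E is an isometry
```

This works because squaring undoes the inner square roots, which is the computation carried out in the proof.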

Proposition 4.33 Let Xα := (Xα, ρXα) and Yα := (Yα, ρYα) be uniformly homeomorphic metric spaces, ∀α ∈ Λ, where Λ is a finite index set. Define the product metric spaces X := (X, ρX) := ∏_{α∈Λ} Xα and Y := (Y, ρY) := ∏_{α∈Λ} Yα, where ρX and ρY are the Cartesian metrics. Then, X and Y are uniformly homeomorphic.

4.4 Product Metric Spaces


Proof Let Fα : Xα → Yα be a uniform homeomorphism, ∀α ∈ Λ. Define F : X → Y by, ∀x ∈ X, π^Y_α(F(x)) = Fα(π^X_α(x)), ∀α ∈ Λ. By Propositions 4.29 and 3.31, F is a homeomorphism between X and Y. We need only to show that F and Finv are uniformly continuous. Let m ∈ Z+ be the number of elements in Λ. ∀ε ∈ (0, ∞) ⊂ R, ∀α ∈ Λ, ∃δα ∈ (0, ∞) ⊂ R such that, ∀xα1, xα2 ∈ Xα with ρXα(xα1, xα2) < δα, we have ρYα(Fα(xα1), Fα(xα2)) < ε/√(1 + m), by the uniform continuity of Fα. Let δ = min{infα∈Λ δα, 1} ∈ (0, ∞) ⊂ R. ∀x1, x2 ∈ X with ρX(x1, x2) < δ, we have, ∀α ∈ Λ,

ρXα(π^X_α(x1), π^X_α(x2)) < δ ≤ δα

This implies that ρYα(Fα(π^X_α(x1)), Fα(π^X_α(x2))) < ε/√(1 + m). Hence, we have

ρY(F(x1), F(x2)) = (Σ_{α∈Λ} (ρYα(π^Y_α(F(x1)), π^Y_α(F(x2))))²)^{1/2} = (Σ_{α∈Λ} (ρYα(Fα(π^X_α(x1)), Fα(π^X_α(x2))))²)^{1/2} < ε

Hence, F is uniformly continuous.

∀n ∈ N with n > 1, ∃αn ∈ A with αn−1 ≺ αn, ∀ᾱ1, ᾱ2 ∈ A with αn ≺ ᾱ1 and αn ≺ ᾱ2, we have ρ(x_{ᾱ1}, x_{ᾱ2}) < 1/n. ∀n ∈ N, ∀m1, m2 ∈ N with m1, m2 > n, we have αn ≺ αn+1 ≺ · · · ≺ α_{m1} and αn ≺ αn+1 ≺ · · · ≺ α_{m2}. Then, ρ(x_{αm1}, x_{αm2}) < 1/n. Hence, (x_{αn})∞n=1 ⊆ X is a Cauchy sequence. By the completeness of X, ∃x0 ∈ X such that limn∈N x_{αn} = x0. ∀n ∈ N, ∀α ∈ A with αn ≺ α, ∀m ∈ N with m > n, we have αn ≺ αn+1 ≺ · · · ≺ αm and ρ(xα, x_{αm}) < 1/n. Then, by Propositions 4.30, 3.66, and 3.67, we have ρ(xα, x0) = limm∈N ρ(xα, x_{αm}) ≤ 1/n. Hence, we have limα∈A xα = x0. This completes the proof of the proposition. ' &

Proposition 4.45 Let X := (X, O) be a topological space, Y := (Y, ρ) be a complete metric space, D ⊆ X, x0 ∈ X be an accumulation point of D, and f : D → Y. Then, limx→x0 f(x) ∈ Y if, and only if, ∀ε ∈ (0, ∞) ⊂ R, ∃O ∈ O with x0 ∈ O, ∀x̄, x̂ ∈ (D ∩ O) \ {x0}, we have ρ(f(x̄), f(x̂)) < ε.

Proof “Sufficiency” Assume that ∀ε ∈ (0, ∞) ⊂ R, ∃O ∈ O with x0 ∈ O, ∀x̄, x̂ ∈ (D ∩ O) \ {x0}, we have ρ(f(x̄), f(x̂)) < ε. Define M := {O ∈ O | x0 ∈ O}. Clearly, X ∈ M and M ≠ ∅. It is easy to see that A := (M, ⊇) is a directed system. Since x0 is an accumulation point of D, then ∀O ∈ A, (D ∩ O) \ {x0} ≠ ∅. By the Axiom of Choice, ∃ a net (xO)O∈A ⊆ X such that xO ∈ (D ∩ O) \ {x0}, ∀O ∈ A. This also defines a net (f(xO))O∈A ⊆ Y by the Axiom of Replacement. ∀ε ∈ (0, ∞) ⊂ R, by the assumption, ∃Ô ∈ A, ∀x̄, x̂ ∈ (D ∩ Ô) \ {x0}, we have ρ(f(x̄), f(x̂)) < ε. ∀O1, O2 ∈ A with Ô ⊇ O1 and Ô ⊇ O2, xOi ∈ (D ∩ Oi) \ {x0} ⊆ (D ∩ Ô) \ {x0}, i = 1, 2. This implies that ρ(f(xO1), f(xO2)) < ε. This shows that the net (f(xO))O∈A ⊆ Y is Cauchy. By Proposition 4.44, limO∈A f(xO) = y0 ∈ Y. ∀ε ∈ (0, ∞) ⊂ R, ∃O1 ∈ A, ∀O ∈ A with O1 ⊇ O, we have ρ(y0, f(xO)) < ε/2. By the assumption, ∃O2 ∈ A, ∀x̄, x̂ ∈ (D ∩ O2) \ {x0}, we have ρ(f(x̄), f(x̂)) < ε/2. Let O3 := O1 ∩ O2 ∈ A. Then, O1 ⊇ O3 and xO3 ∈ (D ∩ O3) \ {x0} ⊆ (D ∩ O2) \ {x0}.
∀x ∈ (D ∩ O3) \ {x0}, we have ρ(y0, f(x)) ≤ ρ(y0, f(xO3)) + ρ(f(xO3), f(x)) < ε. Hence, we have limx→x0 f(x) = y0 ∈ Y. “Necessity” Let limx→x0 f(x) = y0 ∈ Y. Then, ∀ε ∈ (0, ∞) ⊂ R, ∃O ∈ O with x0 ∈ O, ∀x ∈ (D ∩ O) \ {x0}, we have ρ(f(x), y0) < ε/2. Hence, ∀x̄, x̂ ∈ (D ∩ O) \ {x0}, ρ(f(x̄), f(x̂)) ≤ ρ(f(x̄), y0) + ρ(y0, f(x̂)) < ε. This completes the proof of the proposition. ' &

Proposition 4.46 Let (X, ρX) and (Y, ρY) be metric spaces, (Y, ρY) be complete, E ⊆ X, and f : E → Y be uniformly continuous. Then, there is a unique continuous extension g : Ē → Y. Furthermore, g is uniformly continuous.

Proof ∀(xi)∞i=1 ⊆ E with limi∈N xi = x0 ∈ X, by Proposition 4.23 and the uniform continuity of f, (f(xi))∞i=1 is a Cauchy sequence in Y. Since (Y, ρY) is complete, then ∃y0 ∈ Y such that limi∈N f(xi) = y0. Let (x̄i)∞i=1 ⊆ E be any other sequence with limi∈N x̄i = x0. Then, the sequence (x1, x̄1, x2, x̄2, . . .) converges to x0. By

4.7 Completion of Metric Spaces


Proposition 4.23 and the uniform continuity of f, (f(x1), f(x̄1), f(x2), f(x̄2), . . .) is a Cauchy sequence in Y, which converges since (Y, ρY) is complete. The limit for this Cauchy sequence must be y0 by Proposition 3.70. Hence, we have limi∈N f(x̄i) = y0. Hence, y0 is dependent only on x0 but not on the sequence (xi)∞i=1. By Proposition 4.13, we may define a function g : Ē → Y by g(x0) = y0, ∀x0 ∈ Ē. ∀x0 ∈ E, choose a sequence (x0, x0, . . .), which converges to x0; then y0 = f(x0). Hence, we have g|E = f. Next, we show that g is uniformly continuous. ∀ε ∈ (0, ∞) ⊂ R, by the uniform continuity of f on E, ∃δ ∈ (0, ∞) ⊂ R such that ρY(f(x1), f(x2)) < ε, ∀x1, x2 ∈ E with ρX(x1, x2) < δ.

Let Vx0 := {x ∈ X | fx0(x) > 0}. Then, x0 ∈ Vx0 ∈ O since fx0 is continuous. Note that Vx0 ⊆ Uλ0, which implies that supp(fx0) = V̄x0 ⊆ Ūλ0 ⊆ Oλ0, which is compact by Proposition 5.5. Hence, fx0 has compact support. ∀x0 ∈ U \ K, K∼ ∈ O, by Proposition 5.5. By (iv), there exists a continuous function gx0 : X → [0, 1] in F such that gx0(x0) = 1 and gx0|K = 0. Let Wx0 := {x ∈ X | gx0(x) > 0}. Then, x0 ∈ Wx0 ∈ O since gx0 is continuous. Clearly, Wx0 ⊆ K∼. Note that U ⊆ (⋃_{x∈K} Vx) ∪ (⋃_{x∈U\K} Wx). By the compactness of Ū, there exist finite sets KN ⊆ K and UN ⊆ U \ K such that Ū ⊆ (⋃_{x∈KN} Vx) ∪ (⋃_{x∈UN} Wx). By the construction of the Wx's, K ⊆ ⋃_{x∈KN} Vx. Let f := Σ_{x∈KN} fx ∈ F and g := Σ_{x∈UN} gx ∈ F (here f, g ∈ F by (i) and (ii)). Then, f is nonnegative, f(x) > 0, ∀x ∈ K, g is nonnegative, and f(x) + g(x) > 0, ∀x ∈ U. Define Φ := {φ : X → [0, 1] | φ = fx/(f + g), x ∈ KN}, which is a finite set. Note that, ∀x ∈ KN, supp(fx) ⊆ Uλ0 ⊆ U ⊆ Ū ⊆ {x ∈ X | (f + g)(x) ≠ 0}, for some λ0 ∈ Λ. Then, fx/(f + g) ∈ F. Hence, Φ ⊆ F. ∀φ ∈ Φ, ∃x ∈ KN such that φ = fx/(f + g). Then, φ is continuous by the fact that φ ∈ F. supp(φ) = supp(fx) is compact and supp(φ) ⊆ Oλ, for some λ ∈ Λ. Hence, Φ is subordinate


5 Compact and Locally Compact Spaces

to (Oλ)λ∈Λ. Clearly, φ is nonnegative, ∀φ ∈ Φ. ∀x ∈ K, we have

Σ_{φ∈Φ} φ(x) = f(x)/(f(x) + g(x)) = f(x)/f(x) = 1

and ∀x ∈ X, we have

0 ≤ Σ_{φ∈Φ} φ(x) = (f/(f + g))(x) ≤ 1

This completes the proof of the theorem.

' &
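Theorem 5.63's construction can be imitated concretely on the real line: cover K = [0, 1] by two open intervals, pick continuous bumps supported in them, and normalize by the (strictly positive) sum. The tent-shaped bumps below are illustrative choices, not the functions from the proof:

```python
def bump(x, a, b):
    # continuous tent function: positive on (a, b), zero outside
    return min(x - a, b - x) if a < x < b else 0.0

# open cover of K = [0, 1] by O1 = (-0.5, 0.7) and O2 = (0.3, 1.5)
f1 = lambda x: bump(x, -0.5, 0.7)
f2 = lambda x: bump(x, 0.3, 1.5)

def phi(i, x):
    # normalized bumps phi_i = f_i / (f1 + f2), set to 0 where the sum vanishes
    s = f1(x) + f2(x)
    return (f1, f2)[i](x) / s if s > 0.0 else 0.0

grid = [k / 1000 for k in range(1001)]    # sample points of K
sums = [phi(0, x) + phi(1, x) for x in grid]
print(min(sums), max(sums))               # both ~1: a partition of unity on K
```

Each phi_i is continuous, takes values in [0, 1], vanishes outside its interval, and the family sums to 1 on K, mirroring the conclusion of the theorem.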

Corollary 5.65 Let X be a locally compact Hausdorff topological space, K ⊆ X be compact, and (Oλ)λ∈Λ ⊆ O be an open covering of K, where Λ is an index set. Then, there exists a finite collection Φ of continuous nonnegative real-valued functions on X, which is subordinate to (Oλ)λ∈Λ, such that (Σ_{φ∈Φ} φ)|_K = 1, Σ_{φ∈Φ} φ(x) ∈ [0, 1] ⊂ R, ∀x ∈ X, and supp(φ) is compact, ∀φ ∈ Φ.

Proof Let F be the collection of continuous real-valued functions on X. Clearly, F satisfies (i)–(iii) in Theorem 5.63. Note that {x0} is compact, and by Proposition 5.62, (iv) of Theorem 5.63 is also satisfied by F. Then, the result follows. & '

5.7.3 The Alexandroff One-point Compactification Theorem 5.66 (Alexandroff One-Point Compactification) Let X := (X, O) be a locally compact Hausdorff topological space. The Alexandroff one-point compactification of X is the set Xc := X ∪ {ω} with the topology Oc := {Oc ⊆ Xc | Oc ∈ O or Xc \ Oc is compact in X }. Then, Xc := (Xc , Oc ) is a compact Hausdorff space, and the identity map id : X → Xc \ {ω} is a homeomorphism. The element ω is called the point at infinity in Xc . Proof We first show that Oc is a topology on Xc . (i) ∅ ∈ O, and then ∅ ∈ Oc ; Xc \ Xc = ∅ is compact in X , and then Xc ∈ Oc . (ii) ∀Oc1 , Oc2 ∈ Oc , we will distinguish four exhaustive and mutually exclusive cases: Case 1: Oc1 , Oc2 ∈ O; Case 2: Xc \ Oc1 and Xc \ Oc2 are compact in X ; Case 3: Xc \ Oc1 is compact in X and Oc2 ∈ O; Case 4: Oc1 ∈ O and Xc \ Oc2 is compact in X . Case 1: Oc1 , Oc2 ∈ O. Then, Oc1 ∩ Oc2 ∈ O, and hence, Oc1 ∩ Oc2 ∈ Oc . Case 2: Xc \ Oc1 and Xc \ Oc2 are compact in X . Then, Xc \ (Oc1 ∩ Oc2 ) = (Xc \ Oc1 ) ∪ (Xc \ Oc2 ), which is compact in X . This implies that Oc1 ∩ Oc2 ∈ Oc . Case 3: Xc \ Oc1 is compact in X and Oc2 ∈ O. Let O¯ c1 := Oc1 \ {ω}; then O¯ c1 = X \ (Xc \ Oc1 ). Since Xc \ Oc1 is compact in X and X is Hausdorff, then, by Proposition 5.5, Xc \ Oc1 is closed in X . Then, O¯ c1 ∈ O. Note that Oc1 ∩ Oc2 = O¯ c1 ∩ Oc2 ∈ O. Then, Oc1 ∩ Oc2 ∈ Oc . Case 4: Oc1 ∈ O and Xc \ Oc2 is compact in X . By an

5.7 Locally Compact Spaces


argument that is similar to Case 3, we have Oc1 ∩ Oc2 ∈ Oc. Hence, in all four cases, we have Oc1 ∩ Oc2 ∈ Oc. (iii) ∀(Ocλ)λ∈Λ ⊆ Oc, where Λ is an index set, we will distinguish two exhaustive and mutually exclusive cases: Case A: (Ocλ)λ∈Λ ⊆ O; Case B: ∃λ0 ∈ Λ such that ω ∈ Ocλ0. Case A: (Ocλ)λ∈Λ ⊆ O. Then, ⋃_{λ∈Λ} Ocλ ∈ O, and hence, ⋃_{λ∈Λ} Ocλ ∈ Oc. Case B: ∃λ0 ∈ Λ such that ω ∈ Ocλ0. Then, Xc \ Ocλ0 is compact in X. We may partition Λ into two disjoint sets Λ1 and Λ2 such that Λ = Λ1 ∪ Λ2, Λ1 ∩ Λ2 = ∅, ∀λ ∈ Λ1, Ocλ ∈ O, ∀λ ∈ Λ2, Xc \ Ocλ is compact in X. Note that

Xc \ (⋃_{λ∈Λ} Ocλ) = (⋂_{λ∈Λ1} (X \ Ocλ)) ∩ (⋂_{λ∈Λ2} (Xc \ Ocλ)) = (⋂_{λ∈Λ1} (X \ Ocλ)) ∩ (⋂_{λ∈Λ2} (Xc \ Ocλ)) ∩ (Xc \ Ocλ0)

∀λ ∈ Λ1 , X \ Ocλ is a closed set in X . ∀λ ∈ Λ2 , X c \ Ocλ is compact in X and therefore closed in X by Proposition 5.5. Hence, Xc \ λ∈Λ Ocλ  is a closed subset in X by Proposition 5.5. Then, of Xc \ Ocλ0 and hence compact λ∈Λ Ocλ ∈ Oc .  Hence, in both cases, we have λ∈Λ Ocλ ∈ Oc . Summarizing the above, Oc is a topology on Xc . Next, we show that Xc is compact. Fix an open covering (Ocλ )λ∈Λ ⊆ Oc of Xc . We may partition Λ into two disjoint sets Λ1 and Λ2 such that Λ = Λ1 ∪ Λ2 , Λ1 ∩ Λ2 = ∅, ∀λ ∈ Λ1 , Ocλ ∈ O, ∀λ ∈ Λ2 , Xc \ Ocλ is compact in X . Since ω ∈ Xc , then ∃λ0 ∈ Λ2 such that ω ∈ Ocλ0 . Then, Xc \ Ocλ0 is compact in X . ∀λ ∈ Λ2 , let O¯ cλ := Ocλ \ {ω}, and then O¯ cλ = X \ (Xc \ Ocλ ). By Proposition 5.5 and the compactness of Xc \ Ocλ , O¯ cλ ∈ O. Note that ⎛ Xc = ⎝



.

λ∈Λ1





Ocλ ⎠ ∪ ⎝



⎞ Ocλ ⎠ ∪ Ocλ0 = Ocλ0 ∪ (Xc \ Ocλ0 )

λ∈Λ2

    ¯ Then, Xc \ Ocλ0 ⊆ λ∈Λ1 Ocλ ∪ λ∈Λ2 Ocλ . By the compactness of Xc \ Ocλ0 , there exist ⊆ Λ2 such # finite " sets ΛN1# ⊆ Λ1 and Λ"N2 # that " Xc \ Ocλ0# ⊆ "   ¯ λ∈ΛN1 Ocλ ∪ λ∈ΛN2 Ocλ . Then, Xc = λ∈ΛN1 Ocλ ∪ λ∈ΛN2 Ocλ ∪ Ocλ0 . Hence, Xc is compact. Next, we show that Xc is Hausdorff. ∀x1 , x2 ∈ Xc with x1 = x2 . We will distinguish two exhaustive and mutually exclusive cases: Case 1: x1 , x2 ∈ X ; Case 2: x1 = ω or x2 = ω. Case 1: x1 , x2 ∈ X . Since X is Hausdorff, then ∃O1 , O2 ∈ O such that x1 ∈ O1 , x2 ∈ O2 , and O1 ∩ O2 = ∅. Clearly, O1 , O2 ∈ Oc . Case 2: x1 = ω or x2 = ω. Without loss of generality, assume x2 = ω. Then, x1 ∈ X . By local compactness of X , ∃O1 ∈ O such that x1 ∈ O1 ⊆ O1 ⊆ X and O1 is compact. Then, O1 ∈ Oc and O2 := Xc \ O1 ∈ Oc . Clearly, x2 ∈ O2 and O1 ∩ O2 = ∅.


Hence, in both cases, we have obtained O1, O2 ∈ Oc such that x1 ∈ O1, x2 ∈ O2, and O1 ∩ O2 = ∅. Hence, Xc is Hausdorff.

Finally, we show that id : X → Xc \ {ω} is a homeomorphism. Clearly, id is bijective. ∀Oc ∈ Oc, we either have Oc ∈ O, which implies that Oc ∩ (Xc \ {ω}) = Oc ∈ O; or we have that Xc \ Oc is compact, which implies that Ōc := Oc \ {ω} = X \ (Xc \ Oc) ∈ O, by Proposition 5.5, and hence, Oc ∩ (Xc \ {ω}) = Ōc ∈ O. Hence, the subset topology on Xc \ {ω} with respect to Xc is contained in O. It is easy to see that it also contains O, so the two topologies coincide. Hence, id : X → Xc \ {ω} is a homeomorphism. This completes the proof of the theorem. □
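For X = R, the construction above yields the familiar circle: S¹ is (homeomorphic to) the one-point compactification of R, with the north pole playing the role of ω. The following numeric sketch of this standard picture uses the usual stereographic-projection formulas; it is our own illustration, not part of the text.

```python
# Illustration (ours): stereographic projection identifies the unit circle
# minus the north pole (0, 1) with R; the north pole is the point at infinity.

def to_line(px, py):
    """Project a circle point other than the north pole onto R."""
    return px / (1.0 - py)

def to_circle(x):
    """Inverse map: send x in R to a point of the circle below the pole."""
    d = 1.0 + x * x
    return (2.0 * x / d, (x * x - 1.0) / d)

for x in [-100.0, -2.5, 0.0, 0.1, 7.0, 1000.0]:
    px, py = to_circle(x)
    assert abs(px * px + py * py - 1.0) < 1e-9   # lands on the circle
    assert abs(to_line(px, py) - x) < 1e-6       # round trip is the identity

# Large |x| approaches the north pole (0, 1), i.e., the point at infinity.
assert abs(to_circle(1e6)[1] - 1.0) < 1e-6
```

The two maps realize the homeomorphism between X = R and Xc \ {ω} asserted by the theorem, with neighborhoods of ω corresponding to complements of compact (closed, bounded) subsets of R.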

5.7.4 Proper Functions

Definition 5.67 Let X and Y be topological spaces and f : X → Y be continuous. f is said to be proper if ∀ compact set K ⊆ Y, we have finv(K) ⊆ X is compact. f is said to be countably proper if ∀ compact set K ⊆ Y, we have finv(K) ⊆ X is countably compact. %

Proposition 5.68 Let X := (X, OX) and Y := (Y, OY) be locally compact Hausdorff topological spaces and f : X → Y be continuous. Let Xc := (Xc, OXc) and Yc := (Yc, OYc) be the Alexandroff one-point compactifications of X and Y, respectively, where Xc = X ∪ {ωx} and Yc = Y ∪ {ωy}. Define a function fc : Xc → Yc by fc(x) = f(x), ∀x ∈ X, and fc(ωx) = ωy. Then, f is proper if, and only if, fc is continuous.

Proof "Only if" Let f be proper. ∀OYc ∈ OYc, we will distinguish two exhaustive and mutually exclusive cases: Case 1: OYc ∈ OY; Case 2: Yc \ OYc is compact in Y.

Case 1: OYc ∈ OY. Then, fc inv(OYc) = finv(OYc) ∈ OX. This implies that fc inv(OYc) ∈ OXc.

Case 2: Yc \ OYc is compact in Y. Then, fc inv(Yc \ OYc) = finv(Yc \ OYc) is compact in X by the properness of f. Then, by Proposition 2.5, fc inv(OYc) = Xc \ fc inv(Yc \ OYc) ∈ OXc.

Hence, in both cases, we have fc inv(OYc) ∈ OXc. Hence, fc is continuous.

"If" Let fc be continuous. Fix a compact set K ⊆ Y. Then, ωy ∈ Yc \ K ∈ OYc. By the continuity of fc, we have ωx ∈ fc inv(Yc \ K) ∈ OXc. This implies that Xc \ fc inv(Yc \ K) is compact in X. By Proposition 2.5, finv(K) = fc inv(K) = Xc \ fc inv(Yc \ K). Hence, f is proper. This completes the proof of the proposition. □

Proposition 5.69 Let X be a topological space, Y be a metric space, F ⊆ X be closed, and f : X → Y be continuous and proper. Then, f(F) ⊆ Y is closed.

Proof Fix any y in the closure of f(F). By Proposition 4.13, ∃(yn)∞n=1 ⊆ f(F) such that limn∈N yn = y. Then, ∀n ∈ N, ∃xn ∈ F such that f(xn) = yn. Let K = {y} ∪ {yn | n ∈ N}. Then, K is compact. By the properness of f, finv(K) is compact.
By Propositions 3.5 and 5.5, F ∩ finv (K) is compact. Note that (xn )∞ n=1 ⊆ F ∩ finv (K). Then, by


Proposition 5.4, (xn)∞n=1 admits a cluster point x ∈ F ∩ finv(K). By the continuity of f and Proposition 3.66, the sequence (f(xn))∞n=1 = (yn)∞n=1 admits the cluster point f(x). Since Y is Hausdorff and limn∈N yn = y, then y = f(x) ∈ f(F). Therefore, the closure of f(F) is contained in f(F), and f(F) is closed. This completes the proof of the proposition. □

In the above proposition, the assumption on Y may be relaxed to first countable Hausdorff space.

Proposition 5.70 Let X be a topological space, Y be a locally compact Hausdorff topological space, F ⊆ X be closed, and f : X → Y be continuous and proper. Then, f(F) ⊆ Y is closed.

Proof Fix any y in the closure of f(F). By the local compactness of Y, ∃O ∈ OY such that y ∈ O and Ō is compact in Y. Then, y is in the closure of f(F) ∩ O: ∀U ∈ OY with y ∈ U, by Proposition 3.3, we have ∅ ≠ (U ∩ O) ∩ f(F) = U ∩ (O ∩ f(F)), and hence y is in the closure of f(F) ∩ O by Proposition 3.3. By Proposition 3.68, there exists a net (yα)α∈A ⊆ f(F) ∩ O such that limα∈A yα = y. ∀α ∈ A, ∃xα ∈ F such that yα = f(xα). Then, xα ∈ F ∩ finv(O), and the net (xα)α∈A ⊆ F ∩ finv(Ō). By the properness of f, we have finv(Ō) ⊆ X is compact. By Proposition 3.5, F ∩ finv(Ō) is closed relative to finv(Ō), which further implies that F ∩ finv(Ō) is compact by Proposition 5.5. By Proposition 5.4, the net (xα)α∈A admits a cluster point x ∈ F ∩ finv(Ō). By the continuity of f and Proposition 3.66, f(x) is a cluster point of the net (f(xα))α∈A = (yα)α∈A. Since Y is Hausdorff and limα∈A yα = y, then y = f(x) ∈ f(F). Hence, the closure of f(F) is contained in f(F), and f(F) is closed. This completes the proof of the proposition. □
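The distinction that Propositions 5.68–5.70 trade on can be made concrete numerically. The sketch below is our own hedged illustration (the function choices and the sampling grid are ours, not the book's): it contrasts a proper map on R with a continuous map that fails to be proper.

```python
import math

# Illustration (ours): properness of f : R -> R hinges on preimages of
# compact sets staying bounded.

def preimage_bound(f, k_lo, k_hi, xs):
    """Largest |x| among sample points whose image lands in [k_lo, k_hi]."""
    hits = [x for x in xs if k_lo <= f(x) <= k_hi]
    return max(abs(x) for x in hits) if hits else 0.0

xs = [x / 10.0 for x in range(-1000, 1001)]   # sample grid on [-100, 100]

# f(x) = x^2 is proper: the preimage of [0, 4] is [-2, 2], a compact set.
assert preimage_bound(lambda x: x * x, 0.0, 4.0, xs) <= 2.0

# g(x) = arctan(x) is NOT proper: [-pi/2, pi/2] is compact in R, but its
# preimage is all of R — ever larger points keep landing inside it.
for k in range(1, 7):
    assert -math.pi / 2 <= math.atan(10.0 ** k) <= math.pi / 2
```

The image of the closed set R under arctan is the non-closed interval (−π/2, π/2), showing why Propositions 5.69 and 5.70 require properness.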

5.8 σ-Compact Spaces

Definition 5.71 A topological space is said to be σ-compact if it is the union of countably infinitely many compact sets. %

Proposition 5.72 Let X be a locally compact topological space. Then, the following statements are equivalent:

(i) X is Lindelöf.
(ii) X is σ-compact.
(iii) ∃(On)∞n=1 ⊆ O such that ∀n ∈ N, Ōn ⊆ On+1, Ōn is compact, and X = ⋃∞n=1 On. This sequence is called an exhaustion of X.

Furthermore, if X is Hausdorff, then the above is equivalent to:

(iv) ∃φ : X → [0, ∞) ⊂ R that is proper and continuous.

Proof (i) ⇒ (ii). ∀x ∈ X, by local compactness of X, ∃Ox ∈ O such that x ∈ Ox ⊆ Ōx and Ōx is compact. X = ⋃x∈X Ox. Since X is Lindelöf, then ∃ a countable


set XC ⊆ X such that X = ⋃x∈XC Ox. We will distinguish three exhaustive and mutually exclusive cases: Case 1: XC = ∅; Case 2: XC ≠ ∅ is finite; Case 3: XC is countably infinite.

Case 1: XC = ∅. Let Kn = ∅, ∀n ∈ N, which are clearly compact. Then, X = ∅ = ⋃∞n=1 Kn. Hence, X is σ-compact.

Case 2: XC ≠ ∅ is finite. Without loss of generality, assume that XC = {x1, ..., xn} for some n ∈ N. Let Ki = Ōxi, i = 1, ..., n, and Ki = ∅, i = n+1, n+2, .... Clearly, the Ki's are compact and X = ⋃∞i=1 Ki. Hence, X is σ-compact.

Case 3: XC is countably infinite. Without loss of generality, assume that XC = {x1, x2, ...}. Let Ki = Ōxi, ∀i ∈ N. Clearly, the Ki's are compact and X = ⋃∞i=1 Ki. Hence, X is σ-compact. In all cases, X is σ-compact.

(ii) ⇒ (iii). Let X = ⋃∞n=1 Kn, where Kn is compact, ∀n ∈ N. Without loss of generality, we assume that Kn ⊆ Kn+1, ∀n ∈ N, since, otherwise, we may let K̄n = ⋃ni=1 Ki and consider X = ⋃∞n=1 K̄n instead. Let O0 := ∅ ∈ O. Then, Ō0 = ∅ is compact. ∀n ∈ N, Kn ∪ Ōn−1 is compact, and by Proposition 5.52, ∃On ∈ O such that Kn ∪ Ōn−1 ⊆ On ⊆ Ōn and Ōn is compact. Then, (On)∞n=1 is the exhaustion of X that we seek.

(iii) ⇒ (i). Let (Un)∞n=1 be an exhaustion of X. Fix any open covering (Oα)α∈Λ ⊆ O of X, where Λ is an index set. ∀n ∈ N, Ūn ⊆ ⋃α∈Λ Oα. By the compactness of Ūn, there exists a finite set ΛNn ⊆ Λ such that Ūn ⊆ ⋃α∈ΛNn Oα. Then, X = ⋃∞n=1 Un ⊆ ⋃∞n=1 ⋃α∈ΛNn Oα, which yields a countable subcovering. Hence, X is Lindelöf.

Now, assume that X is Hausdorff.

(iii) ⇒ (iv). Let (On)∞n=1 be an exhaustion of X. ∀n ∈ N, Ōn ⊆ On+1 is compact. Then, by Proposition 5.62, there exists a continuous function φn : X → [0, 1] such that φn|Ōn = 1 and φn|X\On+1 = 0. Let φ := Σ∞n=1 (1 − φn). Clearly, φ : X → [0, ∞) ⊂ R. ∀x0 ∈ X, ∃n0 ∈ N such that x0 ∈ On0. Then, φ|On0 = (Σn0−1 i=1 (1 − φi))|On0, which is clearly continuous. ∀U ∈ OR with φ(x0) ∈ U, ∃V ∈ O with x0 ∈ V ⊆ On0 such that φ|On0(V) ⊆ U. Then, φ(V) ⊆ U. Hence, φ is continuous at x0. By the arbitrariness of x0 and Proposition 3.9, φ is continuous. Fix any compact subset K ⊆ R. Then, K is closed and bounded, by the Heine–Borel theorem. ∃N ∈ N such that K ⊆ [−N, N] ⊂ R. ∀x ∈ X \ ON+2, we have x ∉ On, n = 1, ..., N+2, and hence, φn(x) = 0, n = 1, ..., N+1, and φ(x) ≥ N+1 > N. Then, φinv(K) ⊆ ON+2 ⊆ ŌN+2. By Proposition 3.10, φinv(K) is closed in X. By the compactness of ŌN+2 and Proposition 5.5, φinv(K) is compact. Hence, φ is proper.

(iv) ⇒ (ii). Let Kn = [−n, n] ⊂ R, ∀n ∈ N. Then, the Kn's are compact in R and R ⊆ ⋃∞n=1 Kn. By the properness of φ, φinv(Kn) is compact in X, ∀n ∈ N. By Proposition 2.5, X = ⋃∞n=1 φinv(Kn). Hence, X is σ-compact. This completes the proof of the proposition. □
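For X = R, the equivalences above are easy to exhibit concretely: On := (−n, n) is an exhaustion, and φ(x) := |x| is a proper continuous function as in (iv). The small sketch below is our own illustration of these two facts; the book proves the general statement.

```python
# Illustration (ours): an exhaustion of R and a proper map realizing (iv).
# O_n = (-n, n); its closure [-n, n] is compact and contained in O_{n+1}.

def closure_On(n):       # closure of O_n, as a closed interval [lo, hi]
    return (-n, n)

def On(n):               # O_n, as an open interval (lo, hi)
    return (-n, n)

def contained(inner, outer):
    """Is the closed interval `inner` inside the open interval `outer`?"""
    return outer[0] < inner[0] and inner[1] < outer[1]

# Property (iii): closure(O_n) is contained in O_{n+1} for the first few n.
assert all(contained(closure_On(n), On(n + 1)) for n in range(1, 50))

# Property (iv): phi(x) = |x| is proper — phi_inv([-N, N]) = [-N, N] is
# bounded, hence compact by Heine-Borel; we check this on a sample grid.
N = 7
pre = [x / 100.0 for x in range(-2000, 2001) if abs(x / 100.0) <= N]
assert max(abs(x) for x in pre) <= N
```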


5.9 Paracompact Spaces

Definition 5.73 Let X be a topological space and A be a collection of subsets of X. A is said to be locally finite if ∀x ∈ X, ∃U ∈ O with x ∈ U such that U meets only finitely many members of A, that is, U ∩ A ≠ ∅ for only finitely many A ∈ A. %

Proposition 5.74 Let X be a topological space and (Eλ)λ∈Λ be a locally finite collection of subsets of X, where Λ is an index set:

(i) Let E = ⋃λ∈Λ Eλ. Then, Ē = ⋃λ∈Λ Ēλ.
(ii) Let K ⊆ X be compact. Then, K meets only finitely many members of (Eλ)λ∈Λ.

Proof (i) ∀x ∈ Ē, by local finiteness of (Eλ)λ∈Λ, ∃U ∈ O with x ∈ U such that U meets only finitely many members of (Eλ)λ∈Λ. Let ΛN ⊆ Λ be the finite set such that U ∩ Eλ ≠ ∅, ∀λ ∈ ΛN. Suppose x ∉ ⋃λ∈Λ Ēλ. Then, ∀λ ∈ ΛN, x ∉ Ēλ, and ∃Uλ ∈ O with x ∈ Uλ such that Uλ ∩ Eλ = ∅, by Proposition 3.3. Let O := (⋂λ∈ΛN Uλ) ∩ U ∈ O. Then, x ∈ O and O ∩ Eλ = ∅, ∀λ ∈ Λ. Then, O ∩ E = ∅. This contradicts with x ∈ Ē, by Proposition 3.3. Hence, x ∈ ⋃λ∈Λ Ēλ. Then, Ē ⊆ ⋃λ∈Λ Ēλ. On the other hand, ∀λ ∈ Λ, Eλ ⊆ E. This implies that Ēλ ⊆ Ē. Therefore, we have ⋃λ∈Λ Ēλ ⊆ Ē. Hence, Ē = ⋃λ∈Λ Ēλ.

(ii) Let K ⊆ X be compact. ∀x ∈ K, by the local finiteness of (Eλ)λ∈Λ, ∃Ux ∈ O with x ∈ Ux such that Ux meets only finitely many members of (Eλ)λ∈Λ. Then, K ⊆ ⋃x∈K Ux. By the compactness of K, there exists a finite set KN ⊆ K such that K ⊆ ⋃x∈KN Ux. Clearly, ⋃x∈KN Ux meets only finitely many members of (Eλ)λ∈Λ. Hence, K meets only finitely many members of (Eλ)λ∈Λ. This completes the proof of the proposition. □

Definition 5.75 A topological space X is said to be paracompact if every open covering of X has a locally finite open refinement. %

Proposition 5.76 A closed subset of a paracompact space is paracompact in the subset topology.

Proof Let X be a paracompact space, F ⊆ X be closed, and OF be the subset topology on F. Let (OFα)α∈Λ ⊆ OF be any open covering of F (open in the subset topology OF), where Λ is an index set. By Proposition 3.4, ∀α ∈ Λ, ∃Oα ∈ O such that OFα = Oα ∩ F. Then, F ⊆ ⋃α∈Λ Oα and X ⊆ (⋃α∈Λ Oα) ∪ (X \ F). By the paracompactness of X, there exists a locally finite open refinement V of {Oα | α ∈ Λ} ∪ {X \ F}. Let V̄ := {V ∈ V | V ⊆ Oα for some α ∈ Λ}. Then, F ⊆ ⋃V∈V̄ V, since the other V's in V are subsets of X \ F. Clearly, V̄ is locally finite. Then,


{V ∩ F | V ∈ V̄} ⊆ OF is a locally finite open refinement of (OFα)α∈Λ (open in the subset topology OF). Hence, (F, OF) is paracompact. This completes the proof of the proposition. □

Theorem 5.77 Metric spaces are paracompact.

Proof Let X be a metric space. Let (Oα)α∈Λ be any open covering of X, where Λ is an index set. By the well-ordering principle, Λ may be well-ordered by ≼. We will construct an open refinement of (Oα)α∈Λ in the following steps. Let X0 = X.

Step n (n ∈ N): ∀α ∈ Λ, let Pαn := {x ∈ Oα ∩ Xn−1 | α is the least element of {β ∈ Λ | x ∈ Oβ}} and Qαn := {x ∈ Pαn | B(x, 3/2^n) ⊆ Oα}. Let Dαn := ⋃x∈Qαn B(x, 1/2^n) ∈ O. Let Xn := Xn−1 \ ⋃α∈Λ Dαn.

We will now show that V := {Dαn | α ∈ Λ, n ∈ N} is a locally finite open refinement of (Oα)α∈Λ. ∀x0 ∈ X, {β ∈ Λ | x0 ∈ Oβ} ≠ ∅. Let α0 be the least element of this set, which exists since Λ is well-ordered. Then, x0 ∈ Oα0. Then, ∃n0 ∈ N such that B(x0, 3/2^n0) ⊆ Oα0, since Oα0 is open. Then, x0 ∈ Dα0n0 or x0 ∈ Dβn for some β ∈ Λ and some n ∈ N with n < n0. Hence, V covers X. Clearly, Dαn ⊆ Oα and Dαn is open by construction, ∀α ∈ Λ and ∀n ∈ N. Then, V is an open refinement of (Oα)α∈Λ.

Fix any x0 ∈ X. Consider the set {β ∈ Λ | ∃n ∈ N, x0 ∈ Dβn} ≠ ∅. Let α1 be the least element of this set. Then, x0 ∈ Dα1n0 for some n0 ∈ N. Furthermore, ∃j0 ∈ N such that B(x0, 2^(−j0)) ⊆ Dα1n0. Consider the open set B(x0, 2^(−j0−n0)) ∋ x0. ∀n ≥ n0 + j0, ∀α ∈ Λ: since B(x0, 2^(−j0)) ⊆ Dα1n0 ⊆ Oα1, then ∀x ∈ B(x0, 2^(−j0−n0)), we have B(x, 2^(−j0) − 2^(−j0−n0)) ⊆ B(x0, 2^(−j0)) ⊆ Dα1n0. ∀y ∈ Dαn, since n > n0, ∃x̄ ∈ Qαn with x̄ ∉ Dα1n0 such that y ∈ B(x̄, 2^(−n)); that is, ∃x̄ ∈ B(y, 2^(−n)) such that x̄ ∈ Dαn \ Dα1n0. Note that 2^(−n) ≤ 2^(−j0−n0) ≤ 2^(−j0) − 2^(−j0−n0). Therefore, x ∉ Dαn. Hence, B(x0, 2^(−j0−n0)) ∩ Dαn = ∅. Thus, B(x0, 2^(−j0−n0)) does not intersect any Dαn, ∀α ∈ Λ, ∀n ≥ n0 + j0.

Claim 5.77.1 ∀n < n0 + j0, B(x0, 2^(−j0−n0)) intersects at most one of the sets Dαn, α ∈ Λ.
Proof of Claim Suppose the result is not true. Then, ∃α, β ∈ Λ such that B(x0, 2^(−j0−n0)) ∩ Dαn ≠ ∅, B(x0, 2^(−j0−n0)) ∩ Dβn ≠ ∅, and α ≠ β. Without loss of generality, assume that α ≺ β. Let p ∈ B(x0, 2^(−j0−n0)) ∩ Dαn and q ∈ B(x0, 2^(−j0−n0)) ∩ Dβn. By the definition of Dαn, ∃p̄ ∈ Qαn ⊆ Dαn such that p ∈ B(p̄, 2^(−n)) ⊆ Dαn. Similarly, ∃q̄ ∈ Qβn ⊆ Dβn such that q ∈ B(q̄, 2^(−n)) ⊆ Dβn. Then, by the definition of Qαn, we have B(p̄, 3/2^n) ⊆ Oα. Similarly, B(q̄, 3/2^n) ⊆ Oβ. Note that

    ρ(p̄, q̄) ≤ ρ(p̄, p) + ρ(p, x0) + ρ(x0, q) + ρ(q, q̄)
            < 2^(−n) + 2^(−j0−n0) + 2^(−j0−n0) + 2^(−n) ≤ 3/2^n


Then, q̄ ∈ Oα. But q̄ ∈ Qβn ⊆ Pβn, α ≺ β, and α ≠ β imply that q̄ ∉ Oα. This is a contradiction. Therefore, the result must hold. This completes the proof of the claim. □

Hence, B(x0, 2^(−j0−n0)) can meet at most n0 + j0 − 1 sets in V. Hence, V is locally finite. Therefore, we have obtained a locally finite open refinement of (Oα)α∈Λ. Hence, X is paracompact. This completes the proof of the theorem. □

Definition 5.78 Let X be a topological space and (Eα)α∈Λ be a collection of subsets of X, where Λ is an index set. (Eα)α∈Λ is said to be star-finite if ∀α0 ∈ Λ, Eα0 meets only finitely many members in the collection. %

It is easy to see that a star-finite open covering of X is locally finite. But the converse need not hold.

Proposition 5.79 A σ-compact locally compact space is paracompact.

Proof Let X be a σ-compact locally compact space. Fix any open covering (Oα)α∈Λ of X, where Λ is an index set. By Proposition 5.72, there exists an exhaustion (Un)∞n=1 of X. Let U−1 = ∅ and U0 = ∅. Let Vαn := Oα ∩ (Un+1 \ Ūn−2) ∈ O, ∀α ∈ Λ, ∀n ∈ N. It is easy to see that V := {Vαn | α ∈ Λ, n ∈ N} is an open refinement of (Oα)α∈Λ that covers X. ∀n ∈ N, by the compactness of Ūn and Proposition 5.5, Ūn \ Un−1 ⊆ ⋃α∈Λ Vαn is compact. Then, there exists a finite set Vn ⊆ {Vαn | α ∈ Λ} such that Ūn \ Un−1 ⊆ ⋃V∈Vn V. Then, VL := ⋃∞n=1 Vn is an open refinement of (Oα)α∈Λ that covers X, since ⋃∞n=1 (Ūn \ Un−1) = X. ∀V ∈ VL, ∃n0 ∈ N such that V ∈ Vn0. Then, V does not intersect any member of Vn with n ∈ N and |n − n0| > 2. Hence, VL is star-finite. Then, VL is a locally finite open refinement of (Oα)α∈Λ. Then, X is paracompact. This completes the proof of the proposition. □

Proposition 5.80 Let X be a locally compact Hausdorff topological space. X is paracompact if, and only if, any open covering of X has a star-finite open refinement (that covers X).

Proof "Only if" Let U be any open covering of X. ∀x ∈ X, ∃Ux ∈ U such that x ∈ Ux ∈ O.
By Proposition 5.53, ∃Ox ∈ O such that x ∈ Ox ⊆ Ōx ⊆ Ux and Ōx is compact. Hence, (Ox)x∈X is an open refinement of U. By the paracompactness of X, there exists a locally finite open refinement V of (Ox)x∈X that covers X. ∀V ∈ V, ∃x ∈ X such that V ⊆ Ox. Then, V̄ ⊆ Ōx. By Propositions 5.5 and 3.5, V̄ is compact. By Proposition 5.74, V̄ meets only finitely many members of V. Hence, V is star-finite. Therefore, V is a star-finite open refinement of U.

"If" Let U be any open covering of X. Then, there is a star-finite open refinement V of U. Then, V is a locally finite open refinement of U. Hence, X is paracompact. This completes the proof of the proposition. □

Proposition 5.81 A paracompact Hausdorff topological space is normal.


Proof Let X be a paracompact Hausdorff topological space. Clearly, X is Tychonoff. ∀x0 ∈ X and ∀ closed set F ⊆ X with x0 ∉ F, we will show that ∃O1, O2 ∈ O such that x0 ∈ O1, F ⊆ O2, and O1 ∩ O2 = ∅. ∀x ∈ F, we have x ≠ x0. Since X is Hausdorff, ∃Ox1, Ox2 ∈ O such that x0 ∈ Ox1, x ∈ Ox2, and Ox1 ∩ Ox2 = ∅. Then, X ⊆ (X \ F) ∪ (⋃x∈F Ox2). By the paracompactness of X, there exists a locally finite open refinement V ⊆ O of {X \ F} ∪ {Ox2 | x ∈ F}. Let V̄ := {V ∈ V | V ⊆ Ox2 for some x ∈ F}. Then, V̄ is an open covering of F, since ∀V ∈ V \ V̄ we have V ⊆ X \ F. Furthermore, V̄ is locally finite. ∀V ∈ V̄, ∃x ∈ F such that V ⊆ Ox2, x0 ∈ Ox1, and Ox2 ∩ Ox1 = ∅. Then, the closure of V is contained in the closure of Ox2, which is contained in X \ Ox1. Hence, x0 is not in the closure of V. Let O2 := ⋃V∈V̄ V ∈ O. By Proposition 5.74, the closure of O2 equals ⋃V∈V̄ (closure of V), so x0 is not in the closure of O2. Let O1 := X \ (closure of O2) ∈ O. Then, x0 ∈ O1, F ⊆ O2, and O1 ∩ O2 = ∅. Hence, ∃O1, O2 ∈ O such that x0 ∈ O1, F ⊆ O2, and O1 ∩ O2 = ∅. Then, X is regular.

Next, we show that X is normal. Fix any closed sets F1, F2 ⊆ X with F1 ∩ F2 = ∅. ∀x ∈ F2, we have x ∉ F1. Since X is regular, ∃Ox1, Ox2 ∈ O such that F1 ⊆ Ox1, x ∈ Ox2, and Ox1 ∩ Ox2 = ∅. Then, X ⊆ (X \ F2) ∪ (⋃x∈F2 Ox2). By the paracompactness of X, there exists a locally finite open refinement V ⊆ O of {X \ F2} ∪ {Ox2 | x ∈ F2}. Let V̄ := {V ∈ V | V ⊆ Ox2 for some x ∈ F2}. Then, V̄ is an open covering of F2, since ∀V ∈ V \ V̄ we have V ⊆ X \ F2. Furthermore, V̄ is locally finite. ∀V ∈ V̄, ∃x ∈ F2 such that V ⊆ Ox2, F1 ⊆ Ox1, and Ox2 ∩ Ox1 = ∅. Then, the closure of V is contained in the closure of Ox2, which is contained in X \ Ox1. Hence, F1 ∩ (closure of V) = ∅. Let O2 := ⋃V∈V̄ V ∈ O. By Proposition 5.74, the closure of O2 equals ⋃V∈V̄ (closure of V), so F1 does not meet the closure of O2. Let O1 := X \ (closure of O2) ∈ O. Then, F1 ⊆ O1, F2 ⊆ O2, and O1 ∩ O2 = ∅. Hence, ∃O1, O2 ∈ O such that F1 ⊆ O1, F2 ⊆ O2, and O1 ∩ O2 = ∅. Then, X is normal. This completes the proof of the proposition. □
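A concrete star-finite (hence locally finite) open cover of R, in the spirit of the annular sets used in Proposition 5.79, is the family Vn = (n − 1, n + 1), n ∈ Z: it covers R, and each member meets only its two neighbors and itself. The following finite-window check is our own illustration, not part of the text.

```python
# Illustration (ours): the open cover {(n-1, n+1) : n in Z} of R is
# star-finite — each member meets at most three members of the cover.

def intervals_meet(a, b):
    """Do the open intervals a = (a0, a1) and b = (b0, b1) intersect?"""
    return max(a[0], b[0]) < min(a[1], b[1])

window = range(-50, 51)                     # finite window into Z
V = {n: (n - 1, n + 1) for n in window}

for n in window:
    meets = [m for m in window if intervals_meet(V[n], V[m])]
    # V_n meets exactly V_{n-1}, V_n, V_{n+1} (fewer at the window edges).
    assert set(meets) <= {n - 1, n, n + 1}
    assert n in meets
```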

5.10 The Stone–Čech Compactification

Definition 5.82 Let X := (X, O) be a completely regular topological space, I = [0, 1] ⊆ R, and F be the family of continuous functions of X to I. By Corollary 3.62, X is homeomorphic to E(X) ⊆ I^F, where E is the equivalence map. Let F denote the closure of E(X) in I^F and OF be the subset topology on F. Then, β(X) := (F, OF) is a compact Hausdorff topological space, by the Tychonoff Theorem and Proposition 5.5. β(X) is said to be the Stone–Čech compactification of X. %

Proposition 5.83 Let X := (X, O) be a completely regular topological space, I = [0, 1] ⊆ R, F be the family of continuous functions of X to I, and E : X →


E(X) ⊆ I^F be the equivalence map. Then, there exists a unique compact Hausdorff topological space β(X) with the following properties:

(i) The space E(X) is dense in β(X).
(ii) Each bounded continuous real-valued function on E(X) extends to a bounded continuous real-valued function on β(X).
(iii) If X is a dense subset of a compact Hausdorff topological space Y, then there exists a unique continuous mapping φ : β(X) → Y such that φ is surjective and φ|E(X) = Einv.

Furthermore, if X is locally compact, then E(X) is an open subset of β(X).

Proof We will first show that β(X) defined in Definition 5.82 satisfies properties (i), (ii), and (iii).

(i) By Proposition 3.5, E(X) is dense in its closure F = β(X).

(ii) Fix any bounded continuous real-valued function g : E(X) → R. Then, ∃N ∈ N such that g : E(X) → [−N, N] ⊂ R. Then, ḡ := ((g + N)/(2N)) ∘ E ∈ F by Proposition 3.12. Now, consider the function h̄ := πḡ|β(X) : β(X) → [0, 1] ⊂ R, which is continuous by Proposition 3.27. Note that h̄|E(X) = ḡ ∘ Einv = (g + N)/(2N). Then, by Proposition 3.12, h := 2N h̄ − N is a continuous function of β(X) to [−N, N]. Furthermore, h|E(X) = g. Hence, h is the desired extension that we seek. Clearly, h : β(X) → [−N, N] ⊂ R.

(iii) Let Y := (Y, OY) be a compact Hausdorff space such that X is dense in Y. Let G be the family of continuous functions of Y to I. Let E⟨Y⟩ : Y → I^G be the equivalence map and πg⟨G⟩ : I^G → I be the projection function, ∀g ∈ G. By Propositions 5.14 and 3.61, Y is completely regular. By Corollary 3.62, E⟨Y⟩ : Y → E⟨Y⟩(Y) ⊆ I^G is a homeomorphism. Define a mapping ψ : G → F by ψ(g) = g|X, ∀g ∈ G. By Proposition 3.56, ψ is injective. By the Tychonoff theorem, I^G is a compact Hausdorff space. Define a mapping Ψ : I^F → I^G by πg⟨G⟩(Ψ(if)) = πψ(g)(if), ∀if ∈ I^F, ∀g ∈ G. ∀x ∈ X, E⟨Y⟩(x) ∈ I^G satisfies πg⟨G⟩(E⟨Y⟩(x)) = g(x) = g|X(x) = ψ(g)(x) = πψ(g)(E(x)), ∀g ∈ G. Then, we have E⟨Y⟩(x) = Ψ(E(x)), ∀x ∈ X. Hence, Ψ ∘ E = E⟨Y⟩|X.

Claim 5.83.1 Ψ is continuous.

Proof of Claim Fix any basis open set U ⊆ I^G. Then, U = ∏g∈G Ug, where Ug ⊆ I is open, ∀g ∈ G, and Ug = I for all g's except finitely many g's, say g ∈ GN ⊆ G. Let V := ∏f∈F Vf ⊆ I^F be given by Vf = I, ∀f ∉ range(ψ), and Vf = Ug, ∀f ∈ range(ψ) with f = ψ(g). The set V is well-defined since ψ is injective. ∀if ∈ V, ∀g ∈ G, πg⟨G⟩(Ψ(if)) = πψ(g)(if) ∈ Vψ(g) = Ug. Hence, we have Ψ(if) ∈ U. Then, V ⊆ Ψinv(U). ∀if ∈ I^F \ V, ∃f0 ∈ range(ψ) such that πf0(if) ∉ Vf0. Then, f0 = ψ(g0), where g0 ∈ GN. Note that πg0⟨G⟩(Ψ(if)) = πψ(g0)(if) ∉ Vψ(g0) = Ug0. Then, Ψ(if) ∉ U. This shows that I^F \ V ⊆ Ψinv(I^G \ U). Hence, we have V = Ψinv(U), by Proposition 2.5. Clearly, V is a basis open set in I^F. Hence, Ψ is continuous. This completes the proof of the claim. □


Since X is dense in Y and E⟨Y⟩ is a homeomorphism between Y and E⟨Y⟩(Y), then E⟨Y⟩(X) is dense in E⟨Y⟩(Y). Since Y is compact and E⟨Y⟩ is continuous, then E⟨Y⟩(Y) is compact in I^G, by Proposition 5.7. Furthermore, by Proposition 5.5, E⟨Y⟩(Y) is closed in I^G. Then, the closure of E⟨Y⟩(X) in I^G equals E⟨Y⟩(Y). By the compactness of β(X), the continuity of Ψ, and Proposition 5.7, Ψ(β(X)) ⊆ I^G is compact. Furthermore, by Proposition 5.5, Ψ(β(X)) is closed in I^G. Note that E⟨Y⟩(X) = Ψ(E(X)) ⊆ Ψ(β(X)). Then, Ψ(β(X)) ⊇ E⟨Y⟩(Y).

Claim 5.83.2 Ψ(β(X)) = E⟨Y⟩(Y).

Proof of Claim Suppose E⟨Y⟩(Y) ⊂ Ψ(β(X)). Then, β(X) ⊈ Ψinv(E⟨Y⟩(Y)) and β(X) ∩ (I^F \ Ψinv(E⟨Y⟩(Y))) ≠ ∅. Note that Ψinv(E⟨Y⟩(Y)) is closed in I^F by the closedness of E⟨Y⟩(Y), the continuity of Ψ, and Proposition 3.10, which further implies that I^F \ Ψinv(E⟨Y⟩(Y)) is open in I^F. By the denseness of E(X) in β(X), we have E(X) ∩ (I^F \ Ψinv(E⟨Y⟩(Y))) ≠ ∅, and hence, ∃x ∈ X such that E(x) ∈ I^F \ Ψinv(E⟨Y⟩(Y)). By Proposition 2.5, I^F \ Ψinv(E⟨Y⟩(Y)) = Ψinv(I^G \ E⟨Y⟩(Y)). Then, Ψ(E(x)) ∈ I^G \ E⟨Y⟩(Y). This contradicts with the fact that Ψ(E(x)) = E⟨Y⟩(x) ∈ E⟨Y⟩(Y). Therefore, we must have Ψ(β(X)) = E⟨Y⟩(Y). This completes the proof of the claim. □

Define φ : β(X) → Y by φ = E⟨Y⟩inv ∘ Ψ|β(X). Clearly, φ is continuous and surjective by Proposition 3.12. Clearly, φ ∘ E = E⟨Y⟩inv ∘ E⟨Y⟩|X = id|X. Hence, φ|E(X) = Einv.

Let φ̄ : β(X) → Y be any continuous and surjective mapping such that φ̄|E(X) = Einv = φ|E(X). By Proposition 3.56 and the denseness of E(X) in β(X), we have φ = φ̄. Hence, φ is unique.

Thus, we have shown that the Stone–Čech compactification β(X) satisfies (i), (ii), and (iii). Next, we will show that β(X) is unique. Let Y be any compact Hausdorff space that satisfies (i), (ii), and (iii). Since X and E(X) are homeomorphic, we may identify X with E(X). Then, by (iii), ∃! φ : β(X) → Y, which is continuous and surjective, such that φ|X = idX. Since Y satisfies (iii), then ∃! λ : Y → β(X), which is continuous and surjective, such that λ|X = idX. Then, φ ∘ λ : Y → Y is continuous and satisfies (φ ∘ λ)|X = idX. By Proposition 3.56, we have φ ∘ λ = idY. On the other hand, λ ∘ φ : β(X) → β(X) is continuous and satisfies (λ ∘ φ)|X = idX. Then, by Proposition 3.56, we have λ ∘ φ = idβ(X). By Proposition 2.4, we have λ = φinv. Hence, β(X) and Y are homeomorphic. Hence, β(X) is unique.

If, in addition, X is locally compact, then, by Proposition 5.57, X is open in β(X). This completes the proof of the proposition. □

Example 5.84 Re with the topology ORe introduced in Example 3.80 is Hausdorff and second countable. It is easy to show that Re is compact. By Proposition 5.14, Re is a normal topological space. By the Urysohn Metrization Theorem 4.53, Re is metrizable. Hence, Re is a second countable metrizable compact Hausdorff topological space. %

Chapter 6

Vector Spaces

6.1 Group

Definition 6.1 A group is a triple (G, +, e) that consists of a nonempty set G, an operation + : G × G → G, and a unit element e ∈ G, satisfying, ∀g1, g2, g3 ∈ G:

(i) (g1 + g2) + g3 = g1 + (g2 + g3) (associativeness).
(ii) e + g1 = g1 + e = g1 (unit element).
(iii) ∃(−g1) ∈ G such that g1 + (−g1) = e = (−g1) + g1 (existence of inverse, which is clearly unique). %

We have the following result.

Proposition 6.2 Let (G, +, e) be a group and H ⊆ G be nonempty. Then, (H, +, e) is a group (which will be called a subgroup of (G, +, e)) if, and only if, ∀g1, g2 ∈ H, we have g1 + (−g2) ∈ H.

Proof "Necessity" This is obvious. "Sufficiency" Since H ≠ ∅, then ∃g ∈ H. Then, e = g + (−g) ∈ H. ∀g1, g2 ∈ H, e + (−g2) = (−g2) ∈ H. Then, g1 + g2 = g1 + (−(−g2)) ∈ H. Hence, H is closed under +. Then, it is straightforward to check that (H, +, e) satisfies all the properties of Definition 6.1. Hence, it is a group. This completes the proof of the proposition. □

Proposition 6.3 Let (G, +, e) be a group and g1, g2, g3 ∈ G. If g1 + g2 = g1 + g3, then g2 = g3. On the other hand, if g1 + g3 = g2 + g3, then g1 = g2.

Proof If g1 + g2 = g1 + g3, then we have

    g2 = e + g2 = ((−g1) + g1) + g2 = (−g1) + (g1 + g2)
       = (−g1) + (g1 + g3) = ((−g1) + g1) + g3 = e + g3 = g3

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Pan, Measure-Theoretic Calculus in Abstract Spaces, https://doi.org/10.1007/978-3-031-21912-2_6


If g1 + g3 = g2 + g3, then we have

    g1 = g1 + e = g1 + (g3 + (−g3)) = (g1 + g3) + (−g3)
       = (g2 + g3) + (−g3) = g2 + (g3 + (−g3)) = g2 + e = g2

This completes the proof of the proposition. □

Definition 6.4 Let (G, +, e) be a group; the order of the group is the number of elements in G, if G is finite. ∀g ∈ G, the order of g is the smallest integer n > 0 such that g + · · · + g (n terms) = e. %
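The subgroup criterion of Proposition 6.2 and the order of an element (Definition 6.4) can be checked exhaustively in a small finite group. The following sketch, using the additive group Z_12, is our own illustration.

```python
# Illustration (ours): subgroup criterion and element orders in (Z_12, +, 0).

n = 12
G = set(range(n))
neg = lambda g: (-g) % n                    # the inverse -g in Z_12

H = {0, 3, 6, 9}                            # the multiples of 3
# Proposition 6.2: H is a subgroup iff g1 + (-g2) lies in H for g1, g2 in H.
assert all((g1 + neg(g2)) % n in H for g1 in H for g2 in H)

def order(g):
    """Smallest k > 0 with g added to itself k times equal to e = 0."""
    s, k = g % n, 1
    while s != 0:
        s, k = (s + g) % n, k + 1
    return k

assert order(3) == 4                        # 3 + 3 + 3 + 3 = 12 = 0 (mod 12)
assert order(1) == 12
assert order(0) == 1
```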

Definition 6.5 Let (G, +G , eG ) and (H, +H , eH ) be two groups, and T : G → H . T is said to be a homomorphism if, ∀g1 , g2 ∈ G, we have T (g1 ) +H T (g2 ) = T (g1 +G g2 ). T is said to be an isomorphism if it is bijective and a homomorphism; in this case, the two groups are said to be isomorphic. % Let T : G → H be a homomorphism; then T (eG ) = eH . Definition 6.6 Let (G, +, e) be a group. g1 , g2 ∈ G are said to be conjugate if ∃g3 ∈ G such that g2 = (−g3 ) + g1 + g3 . Let (H, +, e) be a subgroup of (G, +, e). It is said to be normal (self-conjugate) if, ∀h ∈ H , ∀g ∈ G, we have (−g) + h + g ∈ H . % Definition 6.7 Let (G, +, 0) be a group. It is said to be abelian if, ∀g1 , g2 ∈ G, we have g1 + g2 = g2 + g1 (commutativeness). Then, the unit element 0 is also called the zero element. % Sometimes, we are interested in an algebraic structure that is weaker than a group. For example, the structure of functions f : X → X with respect to the function composition operation. This leads us to the following definition. Definition 6.8 A semigroup is the triple (G, ◦, e) that consists of a nonempty set G, an operation ◦ : G×G → G, and a unit element e ∈ G, satisfying, ∀g1 , g2 , g3 ∈ G: (i) (g1 ◦ g2 ) ◦ g3 = g1 ◦ (g2 ◦ g3 ) (associativeness). (ii) e ◦ g1 = g1 = g1 ◦ e (unit element). Furthermore, a semigroup that satisfies g1 ◦ g2 = g2 ◦ g1 , ∀g1 , g2 ∈ G, is called an abelian semigroup. %
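A homomorphism in the sense of Definition 6.5 can also be verified exhaustively between small groups; the map below (our own example) reduces Z_6 onto Z_3.

```python
# Illustration (ours): T(x) = x mod 3 is a group homomorphism from
# (Z_6, +) onto (Z_3, +).

Z6, Z3 = range(6), range(3)
T = lambda g: g % 3

# Homomorphism property: T(g1) +_H T(g2) = T(g1 +_G g2).
assert all((T(g1) + T(g2)) % 3 == T((g1 + g2) % 6) for g1 in Z6 for g2 in Z6)

# A homomorphism maps the unit element to the unit element.
assert T(0) == 0

# T is surjective but not injective, so it is not an isomorphism.
assert {T(g) for g in Z6} == set(Z3)
assert T(0) == T(3)
```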

6.3 Field

139

6.2 Ring

Definition 6.9 A ring (R, +, ×, 0) is an abelian group (R, +, 0) with an operation × : R × R → R such that, ∀r1, r2, r3 ∈ R:

(i) (r1 × r2) × r3 = r1 × (r2 × r3) (associativeness).
(ii) r1 × (r2 + r3) = r1 × r2 + r1 × r3 (right distributiveness).
(iii) (r1 + r2) × r3 = r1 × r3 + r2 × r3 (left distributiveness).

A ring is commutative if, ∀r1, r2 ∈ R, we have r1 × r2 = r2 × r1. A ring is with identity element if ∃1 ∈ R such that, ∀r ∈ R, 1 × r = r × 1 = r. (The identity element is unique if it exists.) %

Proposition 6.10 Let (R, +, ×, 0) be a ring and S ⊆ R. Then, (S, +, ×, 0) is a ring (which will be called a subring) if (S, +, 0) is a subgroup of (R, +, 0) and s1 × s2 ∈ S, ∀s1, s2 ∈ S.

Proof Clearly, (S, +, 0) is an abelian group. Then, it is straightforward to show that (S, +, ×, 0) is a ring. □

Proposition 6.11 Let (R, +, ×, 0) be a ring and r1, r2 ∈ R; then, we have:

(i) r1 × 0 = 0 × r1 = 0.
(ii) r1 × (−r2) = (−r1) × r2 = −(r1 × r2).
(iii) (−r1) × (−r2) = r1 × r2.

Proof Note that 0 + 0 × r1 = 0 × r1 = (0 + 0) × r1 = 0 × r1 + 0 × r1. By Proposition 6.3, we have 0 × r1 = 0. Similarly, we can show that r1 × 0 = 0.

Note that r1 × r2 + r1 × (−r2) = r1 × (r2 + (−r2)) = r1 × 0 = 0. Then, we have r1 × (−r2) = −(r1 × r2). Similarly, we can show that (−r1) × r2 = −(r1 × r2).

Note that (−r1) × (−r2) + (−(r1 × r2)) = (−r1) × (−r2) + r1 × (−r2) = ((−r1) + r1) × (−r2) = 0 × (−r2) = 0. Then, we have (−r1) × (−r2) = r1 × r2. This completes the proof of the proposition. □

Definition 6.12 Let (R, +, ×, 0) be a ring and S ⊆ R. Then, S is said to be an ideal if (S, +, 0) is a subgroup of (R, +, 0) and r × s, s × r ∈ S, ∀r ∈ R and ∀s ∈ S. %

Definition 6.13 Let (R, +R, ×R, 0R) and (S, +S, ×S, 0S) be two rings and T : R → S. T is said to be a ring homomorphism if it is a homomorphism and T(r1) ×S T(r2) = T(r1 ×R r2), ∀r1, r2 ∈ R. T is said to be a ring isomorphism if it is a bijective ring homomorphism; in this case, the two rings are said to be isomorphic. %
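The ring axioms of Definition 6.9 are finitely checkable for Z_6; the exhaustive verification below is our own illustration.

```python
# Illustration (ours): (Z_6, +, x, 0) is a commutative ring with identity 1,
# checked exhaustively against Definition 6.9.

n = 6
R = range(n)
add = lambda a, b: (a + b) % n
mul = lambda a, b: (a * b) % n

for r1 in R:
    for r2 in R:
        for r3 in R:
            assert mul(mul(r1, r2), r3) == mul(r1, mul(r2, r3))            # (i)
            assert mul(r1, add(r2, r3)) == add(mul(r1, r2), mul(r1, r3))   # (ii)
            assert mul(add(r1, r2), r3) == add(mul(r1, r3), mul(r2, r3))   # (iii)
        assert mul(r1, r2) == mul(r2, r1)      # commutative
    assert mul(1, r1) == r1 == mul(r1, 1)      # identity element
    assert mul(0, r1) == 0                     # Proposition 6.11 (i)
```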

6.3 Field

Definition 6.14 Let (F, +, ×, 0) be a commutative ring with identity element 1. Then, the quintuple (F, +, ×, 0, 1) is said to be a field if (F \ {0}, ×, 1) forms an abelian group. %


Proposition 6.15 Let (F, +, ×, 0, 1) be a field, A ⊆ F, and 0, 1 ∈ A. Then, (A, +, ×, 0, 1) is a field (which will be called a subfield) if, and only if, ∀a1, a2 ∈ A, a1 + (−a2) ∈ A, and a1 × a2⁻¹ ∈ A when a2 ≠ 0, where a2⁻¹ denotes the multiplicative inverse of a2.

Proof "Necessity" This is straightforward. "Sufficiency" By Proposition 6.2, (A, +, 0) is a group. Since (F, +, 0) is abelian, then (A, +, 0) is also an abelian group. ∀a1 ∈ A with a1 ≠ 0: by the assumption of the proposition, we have 1 ∈ A, and by the property of a field, we have 1 ≠ 0. Then, a1⁻¹ = 1 × a1⁻¹ ∈ A. ∀a1, a2 ∈ A: if a2 = 0, then, by Proposition 6.11, a1 × a2 = 0 ∈ A; on the other hand, if a2 ≠ 0, we have a2⁻¹ ∈ A and a1 × a2 = a1 × (a2⁻¹)⁻¹ ∈ A. Hence, by Proposition 6.10, (A, +, ×, 0) is a ring. Since (F, +, ×, 0) is a commutative ring with identity element 1 and 1 ∈ A, then (A, +, ×, 0) is also a commutative ring with identity element 1. Note that 1 ∈ A \ {0}, and then A \ {0} ≠ ∅ and A \ {0} ⊆ F \ {0}. ∀a1, a2 ∈ A \ {0}, a1 × a2⁻¹ ∈ A. We claim that a1 × a2⁻¹ ≠ 0 since, otherwise, a1 = a1 × 1 = a1 × (a2⁻¹ × a2) = (a1 × a2⁻¹) × a2 = 0 by Proposition 6.11, which is a contradiction. Hence, a1 × a2⁻¹ ∈ A \ {0}. Since (F \ {0}, ×, 1) is an abelian group, then, by Proposition 6.2, (A \ {0}, ×, 1) is also a group and is further abelian. Therefore, (A, +, ×, 0, 1) is a field. This completes the proof of the proposition. □
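That Z_5 satisfies Definition 6.14 — every nonzero element has a multiplicative inverse — can be checked directly; the contrast with Z_6 shows the definition has bite. This sketch is our own illustration.

```python
# Illustration (ours): (Z_5, +, x, 0, 1) is a field — each nonzero element
# has a multiplicative inverse, so (Z_5 \ {0}, x, 1) is an abelian group.

p = 5
nonzero = range(1, p)

def inverse(a):
    """Multiplicative inverse of a in Z_p, found by exhaustive search."""
    return next(b for b in nonzero if (a * b) % p == 1)

assert [inverse(a) for a in nonzero] == [1, 3, 2, 4]

# By contrast, Z_6 is not a field: 2 has no multiplicative inverse mod 6.
assert all((2 * b) % 6 != 1 for b in range(6))
```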

6.4 Vector Spaces

Associated with every vector space is a set of scalars. This set of scalars can be any algebraic field F := (F, +, ·, 0, 1). Examples of fields are the rational numbers Q, the real numbers R, and the complex numbers C. Here, we will abuse the notation to say x ∈ F when x ∈ F.

Definition 6.16 A vector space X over a field F := (F, +, ·, 0, 1) is a set X of elements called vectors together with two operations ⊕ and ⊗. ⊕ : X × X → X is called vector addition. It associates any two vectors x, y ∈ X with a vector x ⊕ y ∈ X, the sum of x and y. ⊗ : F × X → X is called scalar multiplication. It associates a scalar α ∈ F and a vector x ∈ X with a vector α ⊗ x ∈ X, the scalar multiple of x by α. Furthermore, the following properties hold ∀x, y, z ∈ X and ∀α, β ∈ F:
(i) x ⊕ y = y ⊕ x (commutative law).
(ii) (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z) (associative law).
(iii) ∃ a null vector ϑ ∈ X such that x ⊕ ϑ = x, ∀x ∈ X.
(iv) α ⊗ (x ⊕ y) = (α ⊗ x) ⊕ (α ⊗ y) = α ⊗ x ⊕ α ⊗ y, where we neglect the parentheses in the last equality since we assume that ⊗ takes precedence over ⊕ (distributive law).
(v) (α + β) ⊗ x = α ⊗ x ⊕ β ⊗ x (distributive law).
(vi) (αβ) ⊗ x = α ⊗ (β ⊗ x) (associative law).
(vii) 0 ⊗ x = ϑ and 1 ⊗ x = x.




We thus denote the quadruple (X, ⊕, ⊗, ϑ) by X. The vector space is denoted by (X, F).


For convenience, we write (−1) ⊗ x =: ⊖x and call it the negative of the vector x. Note that

(⊖x) ⊕ x = x ⊕ (⊖x) = 1 ⊗ x ⊕ (−1) ⊗ x = (1 + (−1)) ⊗ x = 0 ⊗ x = ϑ

We will also denote x ⊖ y := x ⊕ (⊖y). We will abuse the notation to say x ∈ X when x ∈ X. Note that (X, ⊕, ϑ) forms an abelian group.

Proposition 6.17 Let X := (X, ⊕, ⊗, ϑ) be a vector space over the field F := (F, +, ·, 0, 1). ∀x, y, z ∈ X, ∀α, β ∈ F, we have:
1. x ⊕ y = x ⊕ z ⇒ y = z (cancellation law).
2. α ⊗ x = α ⊗ y and α ≠ 0 ⇒ x = y (cancellation law).
3. α ⊗ x = β ⊗ x and x ≠ ϑ ⇒ α = β (cancellation law).
4. (α − β) ⊗ x = α ⊗ x ⊖ β ⊗ x (distributive law).
5. α ⊗ (x ⊖ y) = α ⊗ x ⊖ α ⊗ y (distributive law).
6. α ⊗ ϑ = ϑ.

We will call ϑ the origin.

Example 6.18 X = {ϑ} with ϑ ⊕ ϑ = ϑ and α ⊗ ϑ = ϑ, ∀α ∈ F. Then, (X, ⊕, ⊗, ϑ) is a vector space over F.


Example 6.19 Let F := (F, +, ·, 0, 1) be a field. Then, (F, +, ·, 0) is a vector space over F. We will abuse the notation and say that F is a vector space over F.

Example 6.20 Let F := (F, +, ·, 0, 1) be a field, Y := (Y, ⊕Y, ⊗Y, ϑY) be a vector space over F, and A be a set. X = {f : A → Y}, that is, X is the set of all Y-valued functions on A. Define vector addition and scalar multiplication by, ∀x, y ∈ X, ∀α ∈ F: za := x ⊕ y ∈ X is given by za(u) = x(u) ⊕Y y(u), ∀u ∈ A; zs := α ⊗ x ∈ X is given by zs(u) = α ⊗Y x(u), ∀u ∈ A. Let ϑ ∈ X be given by ϑ(u) = ϑY, ∀u ∈ A. Now, we will show that X := (X, ⊕, ⊗, ϑ) is a vector space over F. ∀x, y, z ∈ X, ∀α, β ∈ F, ∀u ∈ A, we have:
(i) (x ⊕ y)(u) = x(u) ⊕Y y(u) = y(u) ⊕Y x(u) = (y ⊕ x)(u) ⇒ x ⊕ y = y ⊕ x.
(ii) ((x ⊕ y) ⊕ z)(u) = (x ⊕ y)(u) ⊕Y z(u) = (x(u) ⊕Y y(u)) ⊕Y z(u) = x(u) ⊕Y (y(u) ⊕Y z(u)) = x(u) ⊕Y (y ⊕ z)(u) = (x ⊕ (y ⊕ z))(u) ⇒ (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z).
(iii) (x ⊕ ϑ)(u) = x(u) ⊕Y ϑY = x(u) ⇒ x ⊕ ϑ = x.
(iv) (α ⊗ (x ⊕ y))(u) = α ⊗Y (x ⊕ y)(u) = α ⊗Y (x(u) ⊕Y y(u)) = α ⊗Y x(u) ⊕Y α ⊗Y y(u) = (α ⊗ x)(u) ⊕Y (α ⊗ y)(u) = (α ⊗ x ⊕ α ⊗ y)(u) ⇒ α ⊗ (x ⊕ y) = α ⊗ x ⊕ α ⊗ y.
(v) ((α + β) ⊗ x)(u) = (α + β) ⊗Y x(u) = α ⊗Y x(u) ⊕Y β ⊗Y x(u) = (α ⊗ x)(u) ⊕Y (β ⊗ x)(u) = (α ⊗ x ⊕ β ⊗ x)(u) ⇒ (α + β) ⊗ x = α ⊗ x ⊕ β ⊗ x.
(vi) ((αβ) ⊗ x)(u) = (αβ) ⊗Y x(u) = α ⊗Y (β ⊗Y x(u)) = α ⊗Y (β ⊗ x)(u) = (α ⊗ (β ⊗ x))(u) ⇒ (αβ) ⊗ x = α ⊗ (β ⊗ x).



(vii) (0 ⊗ x)(u) = 0 ⊗Y x(u) = ϑY = ϑ(u) ⇒ 0 ⊗ x = ϑ; (1 ⊗ x)(u) = 1 ⊗Y x(u) = x(u) ⇒ 1 ⊗ x = x.
Therefore, X is a vector space over F. This vector space will be denoted by (M(A, Y), F).


Example 6.21 Let F := (F, +, ·, 0, 1) be a field and X = F^n with n ∈ N. Define vector addition and scalar multiplication by x ⊕ y := (ξ1 + η1, . . . , ξn + ηn) ∈ X and α ⊗ x := (αξ1, . . . , αξn) ∈ X, ∀x = (ξ1, . . . , ξn), y = (η1, . . . , ηn) ∈ X, ∀α ∈ F. Let ϑ := (0, . . . , 0) ∈ X. Then, it is straightforward to check that F^n := (X, ⊕, ⊗, ϑ) is a vector space over F.

Example 6.22 Let F := (F, +, ·, 0, 1) be a field and X = F^{m×n} := {m × n-dimensional F-valued matrices} with m, n ∈ N. Define vector addition and scalar multiplication by x ⊕ y := (ξij + ηij)_{m×n} ∈ X and α ⊗ x := (αξij)_{m×n} ∈ X, ∀x = (ξij)_{m×n}, y = (ηij)_{m×n} ∈ X, ∀α ∈ F. Let ϑ := (0)_{m×n} ∈ X. Then, it is straightforward to check that F^{m×n} := (X, ⊕, ⊗, ϑ) is a vector space over F.

Example 6.23 Let F := (F, +, ·, 0, 1) be a field and X = {(ξk)_{k=1}^∞ | ξk ∈ F, ∀k ∈ N}. Define vector addition and scalar multiplication by x ⊕ y := (ξk + ηk)_{k=1}^∞ ∈ X and α ⊗ x := (αξk)_{k=1}^∞ ∈ X, ∀x = (ξk)_{k=1}^∞, y = (ηk)_{k=1}^∞ ∈ X, ∀α ∈ F. Let ϑ := (0, 0, . . .) ∈ X. Then, it is straightforward to check that (X, ⊕, ⊗, ϑ) is a vector space over F.
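The componentwise operations of Example 6.21 can be sketched directly. A minimal illustration, taking F = Q (via exact rational arithmetic) and n = 3; the names `vadd` and `smul` are ours, not the book's:

```python
from fractions import Fraction as Fr

def vadd(x, y):
    """Vector addition from Example 6.21: (xi1+eta1, ..., xin+etan)."""
    return tuple(a + b for a, b in zip(x, y))

def smul(alpha, x):
    """Scalar multiplication: (alpha*xi1, ..., alpha*xin)."""
    return tuple(alpha * a for a in x)

theta = (Fr(0),) * 3                      # the null vector
x = (Fr(1), Fr(-2), Fr(3, 4))
y = (Fr(5), Fr(0), Fr(1, 2))
a, b = Fr(2, 3), Fr(-1, 5)

assert vadd(x, y) == vadd(y, x)                              # (i)
assert vadd(x, theta) == x                                   # (iii)
assert smul(a, vadd(x, y)) == vadd(smul(a, x), smul(a, y))   # (iv)
assert smul(a + b, x) == vadd(smul(a, x), smul(b, x))        # (v)
assert smul(a * b, x) == smul(a, smul(b, x))                 # (vi)
assert smul(Fr(0), x) == theta and smul(Fr(1), x) == x       # (vii)
```

Exact rationals are used so the axiom checks hold as equalities rather than up to floating-point error.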

6.5 Product Spaces

Proposition 6.24 Let X := (X, ⊕X, ⊗X, ϑX) and Y := (Y, ⊕Y, ⊗Y, ϑY) be vector spaces over the field F := (F, +, ·, 0, 1). The Cartesian product of X and Y, denoted by X × Y, is the quadruple (X × Y, ⊕, ⊗, (ϑX, ϑY)), where the vector addition ⊕ : (X × Y) × (X × Y) → X × Y and the scalar multiplication ⊗ : F × (X × Y) → X × Y are given by, ∀(x1, y1), (x2, y2) ∈ X × Y, ∀α ∈ F, (x1, y1) ⊕ (x2, y2) := (x1 ⊕X x2, y1 ⊕Y y2) and α ⊗ (x1, y1) := (α ⊗X x1, α ⊗Y y1). Then, (X × Y, F) is a vector space.

Proof Let ϑ := (ϑX, ϑY) ∈ X × Y. ∀(x1, y1), (x2, y2), (x3, y3) ∈ X × Y, ∀α, β ∈ F:
(i) (x1, y1) ⊕ (x2, y2) = (x1 ⊕X x2, y1 ⊕Y y2) = (x2 ⊕X x1, y2 ⊕Y y1) = (x2, y2) ⊕ (x1, y1).
(ii) ((x1, y1) ⊕ (x2, y2)) ⊕ (x3, y3) = (x1 ⊕X x2, y1 ⊕Y y2) ⊕ (x3, y3) = ((x1 ⊕X x2) ⊕X x3, (y1 ⊕Y y2) ⊕Y y3) = (x1 ⊕X (x2 ⊕X x3), y1 ⊕Y (y2 ⊕Y y3)) = (x1, y1) ⊕ (x2 ⊕X x3, y2 ⊕Y y3) = (x1, y1) ⊕ ((x2, y2) ⊕ (x3, y3)).
(iii) (x1, y1) ⊕ ϑ = (x1 ⊕X ϑX, y1 ⊕Y ϑY) = (x1, y1).
(iv) α ⊗ ((x1, y1) ⊕ (x2, y2)) = α ⊗ (x1 ⊕X x2, y1 ⊕Y y2) = (α ⊗X (x1 ⊕X x2), α ⊗Y (y1 ⊕Y y2)) = (α ⊗X x1 ⊕X α ⊗X x2, α ⊗Y y1 ⊕Y α ⊗Y y2) = (α ⊗X x1, α ⊗Y y1) ⊕ (α ⊗X x2, α ⊗Y y2) = α ⊗ (x1, y1) ⊕ α ⊗ (x2, y2).



(v) (α + β) ⊗ (x1, y1) = ((α + β) ⊗X x1, (α + β) ⊗Y y1) = (α ⊗X x1 ⊕X β ⊗X x1, α ⊗Y y1 ⊕Y β ⊗Y y1) = (α ⊗X x1, α ⊗Y y1) ⊕ (β ⊗X x1, β ⊗Y y1) = α ⊗ (x1, y1) ⊕ β ⊗ (x1, y1).
(vi) (αβ) ⊗ (x1, y1) = ((αβ) ⊗X x1, (αβ) ⊗Y y1) = (α ⊗X (β ⊗X x1), α ⊗Y (β ⊗Y y1)) = α ⊗ (β ⊗X x1, β ⊗Y y1) = α ⊗ (β ⊗ (x1, y1)).
(vii) 0 ⊗ (x1, y1) = (0 ⊗X x1, 0 ⊗Y y1) = (ϑX, ϑY) = ϑ; 1 ⊗ (x1, y1) = (1 ⊗X x1, 1 ⊗Y y1) = (x1, y1).
Hence, X × Y is a vector space over F.

□

With the above definition, it is easy to generalize to ∏_{i=1}^n Xi = X1 × X2 × · · · × Xn, where n ∈ N. We will also write X^n = X × · · · × X (n copies), where n ∈ N. When n = 0, X^n is given by the vector space defined in Example 6.18.

6.6 Subspaces

Proposition 6.25 Let X := (X, ⊕, ⊗, ϑ) be a vector space over the field F := (F, +, ·, 0, 1) and M ⊆ X with M ≠ ∅. Then, M := (M, ⊕, ⊗, ϑ) is a vector space over F (which will be called a subspace of (X, F)) if, and only if, ∀x, y ∈ M, ∀α, β ∈ F, we have α ⊗ x ⊕ β ⊗ y ∈ M. We will also abuse the notation to say M is a subspace of (X, F). M is said to be a proper subspace of (X, F) if M ⊂ X.

Proof "If" Since M ≠ ∅, then ∃x0 ∈ M. ϑ = ϑ ⊕ ϑ = 0 ⊗ x0 ⊕ 0 ⊗ x0 ∈ M. ∀x, y ∈ M, ∀α, β ∈ F. x ⊕ y = 1 ⊗ x ⊕ 1 ⊗ y ∈ M. Hence, M is closed under vector addition. α ⊗ x = (α + 0) ⊗ x = α ⊗ x ⊕ 0 ⊗ x ∈ M. Hence, M is closed under scalar multiplication. Then, it is straightforward to show that M is a vector space over F.
"Only if" This is straightforward. This completes the proof of the proposition. □

Example 6.26 We present the following list of examples of subspaces:
1. Let (X, F) be a vector space. Then, the singleton set M = {ϑ} is a subspace.
2. Consider the vector space (R^3, R). Any straight line or plane that passes through the origin is a subspace.
3. Consider the vector space (R^n, R), n ∈ N. Let a ∈ R^n. The set M := {x ∈ R^n | ⟨a, x⟩ = 0} is a subspace.
4. Let X := {(ξk)_{k=1}^∞ | ξk ∈ R, k ∈ N}, ⊕ and ⊗ be the usual addition and scalar multiplication, and ϑ = (0, 0, . . .). By Example 6.23, X := (X, ⊕, ⊗, ϑ) is a vector space over R. Then, M := {(ξk)_{k=1}^∞ ∈ X | lim_{k→∞} ξk ∈ R} is a subspace.
5. Let X := {f : (0, 1] → R^n}, ⊕ and ⊗ be the usual addition and scalar multiplication, and ϑ : (0, 1] → R^n be given by ϑ(t) = ϑ_{R^n}, ∀t ∈ (0, 1]. By Example 6.20, X := (X, ⊕, ⊗, ϑ) is a vector space over R. Then, M := {f ∈ X | f is continuous} is a subspace.



To simplify notation in the theory, we will later simply discuss a vector space (X, F) without further reference to the components of X, where the operations are understood to be ⊕X and ⊗X and the null vector is understood to be ϑX. When it is clear from the context, we will neglect the subscript X. Also, we will write x1 + x2 for x1 ⊕ x2 and αx1 for α ⊗ x1, ∀x1, x2 ∈ X, ∀α ∈ F.

Definition 6.27 Let X be a vector space over the field F. f : X → F is said to be a functional.

Definition 6.28 Let X and Y be vector spaces over the field F. A : X → Y is said to be linear if ∀x1, x2 ∈ X, ∀α, β ∈ F, A(αx1 + βx2) = αA(x1) + βA(x2). Then, A is called a (vector space) homomorphism or a linear operator. Furthermore, if it is bijective, then A is said to be a (vector space) isomorphism. The null space of A is N(A) := {x ∈ X | A(x) = ϑY}. The range space of A is R(A) := range(A). B : X → Y is said to be an affine operator if B(x) = A(x) + y0, ∀x ∈ X, where A : X → Y is a linear operator and y0 ∈ Y.

Example 6.29 A row vector v ∈ R^{1×n} is a linear functional on R^n. A matrix A ∈ R^{m×n} is a linear function from R^n to R^m.

For linear operators, we will adopt the following convention. Let A : X → Y, B : Y → Z, and x ∈ X, where X, Y, Z are vector spaces and A and B are linear operators. We will write Ax for A(x) and BA for B ◦ A. Clearly, N(A) is a subspace of X and R(A) is a subspace of Y. When A and B are bijective, we will denote Ainv by A^{-1} and Binv by B^{-1}, which are linear operators; then BA is also bijective, and (BA)^{-1} = A^{-1}B^{-1}.

Definition 6.30 Let (X, F) be a vector space, α ∈ F, and S, T ⊆ X. The sets αS and S + T are defined by

αS := {αs ∈ X | s ∈ S};  S + T := {s + t ∈ X | s ∈ S, t ∈ T}

This concept is illustrated in Fig. 6.1. We should note that S + T = T + S, ∅ + S = ∅, {ϑ} + S = S, and α∅ = ∅. We also define S − T := S + (−T).

Fig. 6.1 The sum of two sets



Proposition 6.31 Let M and N be subspaces of a vector space (X, F) and ᾱ ∈ F. Then, M ∩ N, M + N, and ᾱM are subspaces of (X, F).

Proof Since M and N are subspaces, then ϑ ∈ M and ϑ ∈ N. Hence, ϑ ∈ M ∩ N ≠ ∅, ϑ ∈ M + N ≠ ∅, and ϑ = ᾱϑ ∈ ᾱM ≠ ∅. ∀x, y ∈ M ∩ N, ∀α, β ∈ F, αx + βy ∈ M and αx + βy ∈ N. Then, αx + βy ∈ M ∩ N. Hence, M ∩ N is a subspace.
∀x, y ∈ M + N, ∀α, β ∈ F, we have x = x1 + x2 and y = y1 + y2, where x1, y1 ∈ M and x2, y2 ∈ N. Then, αx1 + βy1 ∈ M and αx2 + βy2 ∈ N. This implies that αx + βy = (αx1 + βy1) + (αx2 + βy2) ∈ M + N. Hence, M + N is a subspace.
∀x, y ∈ ᾱM, ∀α, β ∈ F, we have x = ᾱx̄ and y = ᾱȳ, where x̄, ȳ ∈ M. Then, αx + βy = αᾱx̄ + βᾱȳ = ᾱ(αx̄ + βȳ) ∈ ᾱM. Hence, ᾱM is a subspace. This completes the proof of the proposition. □

We should note that M ∪ N is in general not a subspace.

Definition 6.32 A linear combination of vectors x1, . . . , xn, where n ∈ Z+, in a vector space (X, F) is a sum of the form ∑_{i=1}^n αi xi := α1 x1 + · · · + αn xn, where α1, . . . , αn ∈ F.

Note that + is defined for two vectors. To sum n vectors, one must add two at a time. By the definition of a vector space, the simplified notation is not ambiguous. When n = 0, we take the sum to be ϑX. A linear combination can only involve finitely many vectors.

Definition 6.33 Let (X, F) be a vector space and S ⊆ X.

span(S) := {∑_{i=1}^n αi xi | xi ∈ S, αi ∈ F, i = 1, . . . , n, n ∈ Z+}
         = {linear combinations of vectors in S}

is called the subspace generated by S.

Proposition 6.34 Let (X, F) be a vector space and S ⊆ X. Then, span(S) is the smallest subspace containing S.

Proof Clearly, ϑ ∈ span(S) ≠ ∅. ∀x, y ∈ span(S), ∀α, β ∈ F, it is easy to show that αx + βy ∈ span(S). Hence, span(S) is a subspace. Clearly, span(S) ⊇ S.
∀M ⊆ X such that S ⊆ M and M is a subspace. Clearly, ϑ ∈ M by the proof of Proposition 6.25. For any y equal to a linear combination of vectors in S, we have y = ∑_{i=1}^n αi xi with n ∈ Z+, x1, . . . , xn ∈ S, and α1, . . . , αn ∈ F. This implies that x1, . . . , xn ∈ M and y ∈ M. Hence, span(S) ⊆ M. Hence, span(S) is the smallest subspace containing S. This completes the proof of the proposition. □
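For a finite S in Q^m, membership in span(S) is decidable by linear algebra: y ∈ span(S) if, and only if, appending y to S does not increase the rank. A small sketch using exact rational Gaussian elimination; the helper names `rank` and `in_span` are ours, not the book's.

```python
from fractions import Fraction as Fr

def rank(rows):
    """Row-reduce a list of row vectors over Q and return the rank."""
    rows = [list(map(Fr, r)) for r in rows]
    r, ncols = 0, len(rows[0]) if rows else 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def in_span(S, y):
    """y is in span(S) iff rank of S with y appended equals rank of S."""
    return rank(list(S) + [list(y)]) == rank(list(S))

S = [(1, 0, 1), (0, 1, 1)]
assert in_span(S, (2, 3, 5))      # 2*(1,0,1) + 3*(0,1,1)
assert not in_span(S, (0, 0, 1))
assert in_span([], (0, 0, 0))     # span of the empty set is {null vector}
```

The empty-set case matches the convention above that the empty linear combination (n = 0) is ϑ.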



Definition 6.35 Let (X, F) be a vector space, M ⊆ X be a subspace, and x0 ∈ X. Then, V := {x0} + M is called a linear variety.

The translation of a subspace is a linear variety. We will abuse the notation to write x0 + M for {x0} + M. ∀x̄ ∈ V, V − x̄ := V − {x̄} is a subspace.

Definition 6.36 Let (X, F) be a vector space, S ⊆ X, and S ≠ ∅. The linear variety generated by S, denoted by v(S), is defined as the intersection of all linear varieties in X that contain S.

Proposition 6.37 Let (X, F) be a vector space and ∅ ≠ S ⊆ X. Then, v(S) is a linear variety given by v(S) = x0 + span(S − x0), where x0 is any vector in S.

Proof Note that S = S − x0 + x0 ⊆ x0 + span(S − x0) and x0 + span(S − x0) is a linear variety. Hence, v(S) ⊆ x0 + span(S − x0).
∀V ⊆ X such that S ⊆ V and V is a linear variety. Then, x0 ∈ V and V − x0 is a subspace. Clearly, S − x0 ⊆ V − x0, which implies that, by Proposition 6.34, span(S − x0) ⊆ V − x0. Therefore, x0 + span(S − x0) ⊆ V. Hence, x0 + span(S − x0) ⊆ v(S). Therefore, v(S) = x0 + span(S − x0). This completes the proof of the proposition. □

6.7 Convex Sets

Denote K to be either R or C.

Definition 6.38 Let (X, K) be a vector space and C ⊆ X. C is said to be convex if, ∀x1, x2 ∈ C, ∀α ∈ [0, 1] ⊂ R, we have αx1 + (1 − α)x2 ∈ C.

Subspaces and linear varieties are convex, and so is ∅. See Fig. 6.2 for an illustration.

Proposition 6.39 Let (X, K) be a vector space and K, G ⊆ X be convex sets. Then:
1. λK is convex, ∀λ ∈ K.
2. K + G is convex.

Proof ∀λ ∈ K. ∀x1, x2 ∈ λK, ∀α ∈ [0, 1] ⊂ R. ∃k1, k2 ∈ K such that x1 = λk1 and x2 = λk2. Then, αx1 + (1 − α)x2 = αλk1 + (1 − α)λk2 = λ(αk1 + (1 − α)k2). Since K is convex, then αk1 + (1 − α)k2 ∈ K. This implies that αx1 + (1 − α)x2 ∈ λK. Hence, λK is convex.
∀x1, x2 ∈ K + G, ∀α ∈ [0, 1] ⊂ R. ∃k1, k2 ∈ K and ∃g1, g2 ∈ G such that xi = ki + gi, i = 1, 2. Note that

αx1 + (1 − α)x2 = α(k1 + g1) + (1 − α)(k2 + g2)
                = αk1 + αg1 + (1 − α)k2 + (1 − α)g2
                = (αk1 + (1 − α)k2) + (αg1 + (1 − α)g2)



Fig. 6.2 Convex and nonconvex sets

Fig. 6.3 Convex hulls

Since K and G are convex, then (αk1 + (1 − α)k2) ∈ K and (αg1 + (1 − α)g2) ∈ G. This implies that αx1 + (1 − α)x2 ∈ K + G. Hence, K + G is convex. This completes the proof of the proposition. □

Proposition 6.40 Let (X, K) be a vector space and {Cλ}_{λ∈Λ} be a collection of convex subsets of X. Then, C := ∩_{λ∈Λ} Cλ is convex.

Proof ∀x1, x2 ∈ C, ∀α ∈ [0, 1] ⊂ R. ∀λ ∈ Λ, x1, x2 ∈ Cλ. Since Cλ is convex, then αx1 + (1 − α)x2 ∈ Cλ. This implies that αx1 + (1 − α)x2 ∈ C. Hence, C is convex. This completes the proof of the proposition. □

Definition 6.41 Let (X, K) be a vector space and S ⊆ X. The convex hull generated by S, denoted by co(S), is the smallest convex set containing S (Fig. 6.3).

Justification of the existence of the convex hull rests with Proposition 6.40.

Definition 6.42 Let (X, K) be a vector space. A convex combination of vectors x1, . . . , xn ∈ X, where n ∈ N, is a linear combination ∑_{i=1}^n αi xi with αi ∈ [0, 1] ⊂ R, ∀i = 1, . . . , n, and ∑_{i=1}^n αi = 1.

Proposition 6.43 Let (X, K) be a vector space and S ⊆ X. Then,

co(S) = {convex combinations of vectors in S}

Proof We need the following result.

Claim 6.43.1 Let G ⊆ X be a convex subset. Then, any convex combination of vectors in G belongs to G.



Proof of Claim We need to show: ∀n ∈ N, ∀x1, . . . , xn ∈ G, ∀α1, . . . , αn ∈ R such that αi ∈ [0, 1] ⊂ R, i = 1, . . . , n, and ∑_{i=1}^n αi = 1, we have ∑_{i=1}^n αi xi ∈ G. We will prove this by mathematical induction on n:
1° Consider n = 1. Then, α1 = 1 and ∑_{i=1}^n αi xi = x1 ∈ G. The result holds.
2° Assume that the result holds for n = k ∈ N.
3° Consider the case n = k + 1. Without loss of generality, assume α1 > 0. By the induction hypothesis, we have ∑_{i=1}^k [αi/(α1 + · · · + αk)] xi ∈ G. Then,

∑_{i=1}^{k+1} αi xi = (α1 + · · · + αk) ∑_{i=1}^k [αi/(α1 + · · · + αk)] xi + αk+1 xk+1
                  = αk+1 xk+1 + (1 − αk+1) ∑_{i=1}^k [αi/(α1 + · · · + αk)] xi

By the convexity of G, we have ∑_{i=1}^{k+1} αi xi ∈ G. Hence, the result holds for n = k + 1.
This completes the induction process and the proof of the claim. □

∀x1, x2 ∈ K := {convex combinations of vectors in S}, ∀α ∈ [0, 1] ⊂ R. By the definition of K, ∀i = 1, 2, ∃ni ∈ N, ∃yi,1, . . . , yi,ni ∈ S, ∃αi,1, . . . , αi,ni ∈ [0, 1] ⊂ R such that ∑_{j=1}^{ni} αi,j = 1 and xi = ∑_{j=1}^{ni} αi,j yi,j. Then,

αx1 + (1 − α)x2 = α ∑_{j=1}^{n1} α1,j y1,j + (1 − α) ∑_{j=1}^{n2} α2,j y2,j
                = ∑_{j=1}^{n1} αα1,j y1,j + ∑_{j=1}^{n2} (1 − α)α2,j y2,j

Note that αα1,j ≥ 0, j = 1, . . . , n1, and (1 − α)α2,j ≥ 0, j = 1, . . . , n2, and

∑_{j=1}^{n1} αα1,j + ∑_{j=1}^{n2} (1 − α)α2,j = α ∑_{j=1}^{n1} α1,j + (1 − α) ∑_{j=1}^{n2} α2,j = α + (1 − α) = 1

Hence, αx1 + (1 − α)x2 ∈ K. This shows that K is convex. Clearly, S ⊆ K. On the other hand, fix any convex set G in the vector space satisfying S ⊆ G. ∀p ∈ K, by Claim 6.43.1, p ∈ G since S ⊆ G. Then, K ⊆ G. The above implies that K is the smallest convex set containing S. Hence, K = co(S). This completes the proof of the proposition. □

Definition 6.44 Let (X, K) be a vector space and C ⊆ X. C is said to be a cone with vertex at the origin if ϑ ∈ C and, ∀x ∈ C, ∀α ∈ [0, ∞) ⊂ R, we have αx ∈ C. C is said to be a cone with vertex p ∈ X if C = p + D, where D is a cone with vertex at the origin. C is said to be a conic segment if ϑ ∈ C and, ∀x ∈ C, ∀α ∈ [0, 1] ⊂ R, we have αx ∈ C.

Fig. 6.4 Cones

If the vertex is not explicitly mentioned, it is assumed to be at the origin (Fig. 6.4). Convex cones arise in connection with positive vectors. In R^n with n ∈ N, the positive cone may be defined as

P = {x = (ξ1, . . . , ξn) ∈ R^n | ξi ≥ 0, i = 1, . . . , n}
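The positive cone ties Definition 6.44 and Proposition 6.43 together in a checkable way: P is convex and closed under nonnegative scaling, so convex combinations of its elements stay inside it. A small numerical sketch; `in_P` and `convex_comb` are our own helper names.

```python
import random

def in_P(x):
    """Membership in the positive cone P = {x | every coordinate >= 0}."""
    return all(xi >= 0 for xi in x)

def convex_comb(points, weights):
    """Sum of alpha_i * x_i with alpha_i >= 0 and sum alpha_i = 1."""
    assert abs(sum(weights) - 1.0) < 1e-9 and all(w >= 0 for w in weights)
    n = len(points[0])
    return tuple(sum(w * p[j] for w, p in zip(weights, points)) for j in range(n))

random.seed(0)
pts = [tuple(random.uniform(0, 5) for _ in range(3)) for _ in range(4)]
raw = [random.uniform(0, 1) for _ in range(4)]
wts = [r / sum(raw) for r in raw]        # normalized to sum to 1

assert all(in_P(p) for p in pts)
assert in_P(convex_comb(pts, wts))               # convexity of P
assert in_P(tuple(2.5 * c for c in pts[0]))      # cone property: alpha*x in P
```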


6.8 Linear Independence and Dimensions

Definition 6.45 Let (X, F) be a vector space, x ∈ X, and S ⊆ X. The vector x is said to be linearly dependent upon S if x ∈ span(S). Otherwise, x is said to be linearly independent of S. S is said to be a linearly independent set if, ∀y ∈ S, y is linearly independent of S \ {y}.

Note that ∅ is a linearly independent set; {x} is a linearly independent set if, and only if, x ≠ ϑ; and {x1, x2} is a linearly independent set if, and only if, x1 and x2 do not lie on a common line through the origin.

Theorem 6.46 Let X be a vector space over the field F := (F, +, ·, 0, 1) and S ⊆ X. Then, S is a linearly independent set if, and only if, ∀n ∈ N, ∀α1, . . . , αn ∈ F, ∀x1, . . . , xn ∈ S which are distinct, ∑_{i=1}^n αi xi = ϑ implies that αi = 0, i = 1, . . . , n.

Proof "Sufficiency" We will prove it using an argument of contradiction. Suppose S is not a linearly independent set. Then, ∃y ∈ S such that y is linearly dependent upon S \ {y}. So, y ∈ span(S \ {y}). ∃n ∈ N, ∃x2, . . . , xn ∈ S \ {y}, and ∃α2, . . . , αn ∈ F such that y = ∑_{i=2}^n αi xi (when n = 1, then y = ϑ). Without loss of generality, we may assume that x2, . . . , xn are distinct. Let x1 = y and α1 = −1 ≠ 0. Then, we have ∑_{i=1}^n αi xi = ϑ with α1 ≠ 0 and x1, . . . , xn distinct. This is a contradiction. Hence, the sufficiency result holds.
"Necessity" We again prove this by an argument of contradiction. Suppose the result does not hold. ∃n ∈ N, ∃α1, . . . , αn ∈ F, and ∃x1, . . . , xn ∈ S which are distinct such that ∑_{i=1}^n αi xi = ϑ and ∃i0 ∈ {1, . . . , n} such that αi0 ≠ 0. Without loss of generality, we may assume i0 = 1. Then, we have α1 x1 = −∑_{i=2}^n αi xi. Hence, x1 = ∑_{i=2}^n (−α1^{-1} αi) xi. Note that xi ≠ x1 implies that xi ∈ S \ {x1},



i = 2, . . . , n. Hence, x1 is linearly dependent upon S \ {x1}. This is a contradiction. Then, the necessity result holds. This completes the proof of this theorem. □

Corollary 6.47 Let X be a vector space over the field F := (F, +, ·, 0, 1), {x1, . . . , xn} ⊆ X be a linearly independent set, where the xi's are distinct, α1, . . . , αn, β1, . . . , βn ∈ F, and n ∈ Z+. Assume that ∑_{i=1}^n αi xi = ∑_{i=1}^n βi xi. Then, αi = βi, i = 1, . . . , n.

Proof The assumption implies ∑_{i=1}^n (αi − βi) xi = ϑ. When n = 0, clearly the result holds. When n ∈ N, by Theorem 6.46, we have αi − βi = 0, i = 1, . . . , n. This completes the proof of the corollary. □

Definition 6.48 Let X be a vector space over the field F := (F, +, ·, 0, 1), n ∈ Z+, and x1, . . . , xn ∈ X. The vectors x1, . . . , xn are linearly independent if, ∀α1, . . . , αn ∈ F, ∑_{i=1}^n αi xi = ϑ implies α1 = · · · = αn = 0. Otherwise, these vectors are said to be linearly dependent.

Lemma 6.49 Let X be a vector space over the field F := (F, +, ·, 0, 1). Then:
1. x1, . . . , xn ∈ X are linearly independent, where n ∈ Z+, if, and only if, they are distinct and the set {x1, . . . , xn} is a linearly independent set.
2. S ⊆ X is a linearly independent set if, and only if, ∀n ∈ N, all distinct x1, . . . , xn ∈ S are linearly independent.

Proof 1. "Sufficiency" When n = 0, x1, . . . , xn are linearly independent and ∅ is a linearly independent set. Hence, the result holds. When n ∈ N, this is straightforward from Theorem 6.46.
"Necessity" We will prove this using an argument of contradiction. Suppose the result does not hold. We will distinguish two exhaustive cases: Case 1: x1, . . . , xn are not distinct; Case 2: the set {x1, . . . , xn} is not a linearly independent set. Case 1. Without loss of generality, assume x1 = x2. Set α1 = 1, α2 = −1, and the rest of the αi's to 0. Then, we have ∑_{i=1}^n αi xi = ϑ, and hence, x1, . . . , xn are linearly dependent.
This is a contradiction. Case 2. By Theorem 6.46, ∃m ∈ N, ∃α1 , . . . , αm ∈F , which are not all 0’s, and ∃y1 , . . . , ym ∈ {x1 , . . . , xn } that are distinct, such that m i=1 αi yi = ϑ. Clearly, m ≤ n; otherwise, yi ’s are not distinct. Without loss of generality, - assume y1 = x1 , . . . , ym = xm . Set αm+1 = · · · = αn = 0. Then, we have ni=1 αi xi = ϑ with α1 , . . . , αn not all 0’s. This is a contradiction. Thus, we have arrived at a contradiction in every case. Hence, the necessity result holds. 2. This is straightforward from Theorem 6.46. This completes the proof of the lemma. ' & Definition 6.50 Let (X , F ) be a vector space and S ⊆ X be a linearly independent set with n ∈ Z+ elements. S is said to be a basis of the vector space if span (S) = X . In this case, the vector space is said to be finite-dimensional with dimension n. All other vector spaces are said to be infinite-dimensional. %

6.8 Linear Independence and Dimensions

151

Theorem 6.51 Let X be a finite-dimensional vector space over the field F := (F, +, ·, 0, 1). Then, the dimension n ∈ {0}∪N is unique. Furthermore, any linearly independent set of n vectors forms a basis of the vector space. Proof Let n ∈ Z+ be the minimum of dimensions for X . Then, there exists a set S1 ⊆ X with n elements such that S1 is a basis for the vector space. We will show that ∀y1 , . . . , ym ∈ X with m > n, then y1 , . . . , ym are linearly dependent. This implies that any subset with more than n elements cannot be a linearly independent set by Lemma 6.49. Henceforth, the dimension of the vector space is unique. ∀y1 , . . . , ym ∈ X with m > n. There are two exhaustive and mutually exclusive cases: Case 1: n = 0; Case 2: n > 0. Case 1: n = 0. Then, S1 = ∅, and X contains a single vector ϑ. Hence, y1 = · · · = ym = ϑ. Clearly, y1 , . . . , ym are linearly dependent. This case is proven. Case 2: n > 0. Take S1 = {x1 , . . . , xn }. Since S1 is a basis for the vector space, then, - ∀i ∈ {1, . . . , m}, yi ∈ span (S1 ), that is, ∃αij ∈ F , j = 1, . . . , n, such that yi = nj=1 αij xj . Consider the m × n-dimensional matrix A := (αij )m×n . Since m > n, then rank(A) < m, which implies that the row vectors of all 0’s, such that -mA are linearly dependent. ∃β1 , . . . , βm ∈-Fm, which are-not m -n β α = 0, j = 1, . . . , n. This implies β y = i ij i=1 i i i=1 j =1 βi αij xj = -i=1 n m j =1 ( i=1 βi αij )xj = ϑ. Hence, y1 , . . . , ym are linearly dependent. This case is also proven. Hence, the dimension of the vector space is unique. ∀S2 ⊆ X with n vectors that is a linearly independent set, we will show that S2 is a basis for the vector space. We will distinguish two exhaustive and mutually exclusive cases: Case 1: n = 0; Case 2: n > 0. Case 1: n = 0. Then, S2 = ∅, and X is a singleton set consisting of the null vector, which equals to span (S2 ). Hence, S2 is a basis for the vector space. Case 2: n > 0. Take S2 = {z1 , . . . , zn }. ∀x ∈ X , x, z1 , . . . 
, zn (for a total of n + 1 vectors), by the preceding proof, are linearly-dependent. Then, ∃α, β1 , . . . , βn ∈ F , which - are not all 0’s, such that αx + ni=1 βi zi = ϑ. Suppose α = 0. Then, ni=1 βi zi = ϑ. Since S2 is a linearly independent set, by Theorem 6.46, then β1 = · · · = βn = 0. This contradicts the fact that α, β1 , . . . , βn are not all 0’s. Hence, α = 0. Then, -n with−1 x = i=1 (−α βi )zi and x ∈ span (S2 ). Therefore, we have X = span (S2 ). Hence, S2 is a basis of the vector space. This completes the proof of the theorem. ' & Finite-dimensional spaces are simpler to analyze. Many results of finitedimensional spaces may be generalized to infinite-dimensional spaces. We endeavor to stress the similarity between the finite- and infinite-dimensional spaces.

Chapter 7

Banach Spaces

7.1 Normed Linear Spaces

Vector spaces admit algebraic properties, but they lack topological properties. Denote K to be either R or C.

Definition 7.1 Let (X, K) be a vector space. A norm on the vector space is a real-valued function ‖·‖ : X → [0, ∞) ⊂ R that satisfies the following properties ∀x, y ∈ X, ∀α ∈ K:
(i) 0 ≤ ‖x‖ < +∞ and ‖x‖ = 0 ⇔ x = ϑ.
(ii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).
(iii) ‖αx‖ = |α|‖x‖.
A real (complex) normed linear space is a vector space over the field R (or C) together with a norm defined on it. A normed linear space consists of the triple (X, K, ‖·‖).

To simplify notation in the theory, we may later simply discuss a normed linear space X := (X, K, ‖·‖) without further reference to the components of X, where the operations are understood to be ⊕X and ⊗X, the null vector is understood to be ϑX, and the norm is understood to be ‖·‖X. When it is clear from the context, we will neglect the subscript X.

Example 7.2 Let n ∈ Z+. The Euclidean space (R^n, R, |·|) is a normed linear space, where the norm is defined by |(ξ1, . . . , ξn)| = √(ξ1² + · · · + ξn²), ∀(ξ1, . . . , ξn) ∈ R^n. Note that we specifically denote the Euclidean norm as |·|, rather than ‖·‖, to distinguish it from other norms on R^n.
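The norm axioms of Definition 7.1 can be sanity-checked numerically for the Euclidean norm of Example 7.2. A minimal sketch; `enorm` is our own helper name:

```python
import math

def enorm(x):
    """Euclidean norm |x| = sqrt(xi_1^2 + ... + xi_n^2)."""
    return math.sqrt(sum(xi * xi for xi in x))

x, y, alpha = (3.0, -4.0, 12.0), (1.0, 2.0, -2.0), -2.5

assert enorm((0.0, 0.0, 0.0)) == 0.0          # (i): |theta| = 0
assert enorm(x) == 13.0                        # sqrt(9 + 16 + 144)
# (ii) triangle inequality:
assert enorm(tuple(a + b for a, b in zip(x, y))) <= enorm(x) + enorm(y)
# (iii) absolute homogeneity, up to floating-point rounding:
assert math.isclose(enorm(tuple(alpha * a for a in x)), abs(alpha) * enorm(x))
```

Floating-point arithmetic makes (iii) an approximate identity, which is why `math.isclose` is used rather than strict equality.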

Example 7.3 Let n ∈ Z+. The space (C^n, C, |·|) is a normed linear space, where the norm is defined by |(ξ1, . . . , ξn)| = √(∑_{i=1}^n |ξi|²), ∀(ξ1, . . . , ξn) ∈ C^n. Note that

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Pan, Measure-Theoretic Calculus in Abstract Spaces, https://doi.org/10.1007/978-3-031-21912-2_7




we specifically denote the norm as |·|, rather than ‖·‖, to distinguish it from other norms on C^n.

Example 7.4 Let X := {f : [a, b] → R}, where a, b ∈ R with a ≤ b, ⊕ and ⊗ be the usual vector addition and scalar multiplication, and ϑ : [a, b] → R be given by ϑ(t) = 0, ∀t ∈ [a, b]. By Example 6.20, X := (X, ⊕, ⊗, ϑ) is a vector space over R. M := {f ∈ X | f is continuous} is a subspace by Proposition 6.25. Let M := (M, ⊕, ⊗, ϑ). Then, (M, R) is a vector space. Introduce a norm on this space by ‖x‖ = max_{a≤t≤b} |x(t)|, ∀x ∈ M. Now, we verify the properties of the norm. ∀x, y ∈ M, ∀α ∈ R:
(i) Since [a, b] is a nonempty compact set and x is continuous, ‖x‖ = |x(t)| ∈ [0, ∞) ⊂ R for some t ∈ [a, b]. Clearly, ‖x‖ = 0 ⇔ x = ϑ.
(ii) ‖x + y‖ = max_{a≤t≤b} |x(t) + y(t)| ≤ max_{a≤t≤b} (|x(t)| + |y(t)|) ≤ max_{a≤t≤b} |x(t)| + max_{a≤t≤b} |y(t)| = ‖x‖ + ‖y‖.
(iii) ‖αx‖ = max_{a≤t≤b} |αx(t)| = max_{a≤t≤b} |α||x(t)| = |α| max_{a≤t≤b} |x(t)| = |α|‖x‖, where we have made use of Proposition 3.81 and the fact that ‖x‖ ∈ R in the third equality.
Hence, C([a, b]) := (M, R, ‖·‖) is a normed linear space.
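The max norm of Example 7.4 can be approximated in practice by sampling the continuous function on a fine grid. A sketch under that assumption; `sup_norm` and the grid size are our own choices, not part of the book's definition:

```python
import math

def sup_norm(f, a, b, samples=100_001):
    """Approximate ||f|| = max over [a, b] of |f(t)| by dense sampling.

    For a continuous f, the sampled maximum converges to the true maximum
    as the grid is refined (the maximum is attained by compactness)."""
    return max(abs(f(a + (b - a) * i / (samples - 1))) for i in range(samples))

# ||sin|| over [0, 2*pi] is 1, attained at t = pi/2.
assert abs(sup_norm(math.sin, 0.0, 2 * math.pi) - 1.0) < 1e-6

# Triangle inequality (ii) on the same grid: ||f + g|| <= ||f|| + ||g||.
f, g = math.sin, math.cos
h = lambda t: f(t) + g(t)
assert sup_norm(h, 0.0, 2 * math.pi) <= (sup_norm(f, 0.0, 2 * math.pi)
                                         + sup_norm(g, 0.0, 2 * math.pi) + 1e-12)
```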

Example 7.5 Let X := {f : [a, b] → R}, where a, b ∈ R with a < b, ⊕ and ⊗ be the usual vector addition and scalar multiplication, and ϑ : [a, b] → R be given by ϑ(t) = 0, ∀t ∈ [a, b]. By Example 6.20, X := (X, ⊕, ⊗, ϑ) is a vector space over R. M := {f ∈ X | f is continuously differentiable} is a subspace by Proposition 6.25. Let M := (M, ⊕, ⊗, ϑ). Then, (M, R) is a vector space. Introduce a norm on this space by ‖x‖ = max_{a≤t≤b} |x(t)| + max_{a≤t≤b} |x^(1)(t)|, ∀x ∈ M. Now, we verify the properties of the norm. ∀x, y ∈ M, ∀α ∈ R:
(i) Since [a, b] is a nonempty compact set and x is continuously differentiable, ‖x‖ = |x(t1)| + |x^(1)(t2)| ∈ [0, ∞) ⊂ R for some t1, t2 ∈ [a, b]. Clearly, ‖x‖ = 0 ⇔ x = ϑ.
(ii) ‖x + y‖ = max_{a≤t≤b} |x(t) + y(t)| + max_{a≤t≤b} |x^(1)(t) + y^(1)(t)| ≤ max_{a≤t≤b} |x(t)| + max_{a≤t≤b} |y(t)| + max_{a≤t≤b} |x^(1)(t)| + max_{a≤t≤b} |y^(1)(t)| = ‖x‖ + ‖y‖.
(iii) ‖αx‖ = max_{a≤t≤b} |αx(t)| + max_{a≤t≤b} |αx^(1)(t)| = |α|‖x‖.
Hence, C1([a, b]) := (M, R, ‖·‖) is a normed linear space.

Example 7.6 Let X := {(ξk)_{k=1}^∞ | ξk ∈ R, ∀k ∈ N}, ⊕ and ⊗ be the usual vector addition and scalar multiplication, and ϑ := (0, 0, . . .) ∈ X. By Example 6.23, X := (X, ⊕, ⊗, ϑ) is a vector space over R. For p ∈ [1, ∞) ⊂ R, let Mp := {(ξk)_{k=1}^∞ ∈ X | ∑_{k=1}^∞ |ξk|^p < +∞}. Define the norm ‖x‖p = (∑_{k=1}^∞ |ξk|^p)^{1/p}, ∀x = (ξk)_{k=1}^∞ ∈ Mp. Let M∞ := {(ξk)_{k=1}^∞ ∈ X | sup_{k≥1} |ξk| < +∞}. Define the norm ‖x‖∞ = sup_{k≥1} |ξk|, ∀x = (ξk)_{k=1}^∞ ∈ M∞. ∀p ∈ [1, +∞] ⊂ Re, we will show that Mp := (Mp, ⊕, ⊗, ϑ) is a subspace of (X, R) and lp := (Mp, R, ‖·‖p) is a normed linear space.
∀p ∈ [1, +∞] ⊂ Re. ϑ ∈ lp ≠ ∅. ∀x, y ∈ Mp, ∀α, β ∈ R. It is easy to check that ‖αx‖p = |α|‖x‖p. Then, by Theorem 7.9, ‖αx + βy‖p ≤ ‖αx‖p + ‖βy‖p = |α|‖x‖p + |β|‖y‖p < +∞. This implies that αx + βy ∈ Mp. Hence, Mp is a subspace of (X, R). It is easy to check that ‖x‖p ∈ [0, ∞) ⊂ R, ∀x ∈ lp, and ‖x‖p = 0 ⇔ x = ϑ. Therefore, lp is a normed linear space.

Lemma 7.7 ∀a, b ∈ [0, ∞) ⊂ R, ∀λ ∈ (0, 1) ⊂ R, we have

a^λ b^{1−λ} ≤ λa + (1 − λ)b


where equality holds if, and only if, a = b.

Proof Define f : [0, ∞) → R by f(t) = t^λ − λt + λ − 1, ∀t ∈ [0, ∞) ⊂ R. Then, f is continuous and is differentiable on (0, ∞). f^(1)(t) = λt^{λ−1} − λ, ∀t ∈ (0, ∞). Then, f^(1)(t) > 0, ∀t ∈ (0, 1), and f^(1)(t) < 0, ∀t ∈ (1, ∞). Then, f(t) ≤ f(1) = 0, ∀t ∈ [0, ∞), where equality holds if, and only if, t = 1. We will distinguish two exhaustive and mutually exclusive cases: Case 1: b = 0; Case 2: b > 0. Case 1: b = 0. We have a^λ b^{1−λ} = 0 ≤ λa = λa + (1 − λ)b, where equality holds if, and only if, a = b = 0. This case is proved. Case 2: b > 0. Since f(a/b) ≤ 0, multiplying through by b yields the desired inequality, where equality holds if, and only if, a/b = 1 ⇔ a = b. This case is also proved. This completes the proof of the lemma. □

Theorem 7.8 (Hölder's Inequality) Let p ∈ [1, +∞) ⊂ R and q ∈ (1, +∞] ⊂ Re with 1/p + 1/q = 1. Then, ∀x = (ξk)_{k=1}^∞ ∈ lp, ∀y = (ηk)_{k=1}^∞ ∈ lq, we have

∑_{k=1}^∞ |ξk ηk| ≤ ‖x‖p ‖y‖q

When q < ∞, equality holds if, and only if, ∃α, β ∈ R, which are not both zeros, such that α|ξ_k|^p = β|η_k|^q, k = 1, 2, .... When q = ∞, equality holds if, and only if, |η_k| = ‖y‖_∞ for any k ∈ N with |ξ_k| > 0.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: q = ∞; Case 2: 1 < q < +∞. Case 1: q = ∞. Then, p = 1. We have

    ∑_{k=1}^∞ |ξ_k η_k| = ∑_{k=1}^∞ |ξ_k||η_k| ≤ ∑_{k=1}^∞ |ξ_k| ‖y‖_∞ = ‖x‖_1 ‖y‖_∞

where equality holds if, and only if, |η_k| = ‖y‖_∞ for any k ∈ N with |ξ_k| > 0. This case is proved.


Case 2: 1 < q < +∞. Then, 1 < p < +∞. We will further distinguish two exhaustive and mutually exclusive cases: Case 2a: ‖x‖_p ‖y‖_q = 0; Case 2b: ‖x‖_p ‖y‖_q > 0. Case 2a: ‖x‖_p ‖y‖_q = 0. Without loss of generality, assume ‖y‖_q = 0. Then, y = ϑ and

    ∑_{k=1}^∞ |ξ_k η_k| = 0 = ‖x‖_p ‖y‖_q

Equality holds, and α = 0 and β = 1 satisfy α|ξ_k|^p = β|η_k|^q, k = 1, 2, .... This subcase is proved. Case 2b: ‖x‖_p ‖y‖_q > 0. Then, ‖x‖_p > 0 and ‖y‖_q > 0. ∀k ∈ N, by Lemma 7.7 with a = (|ξ_k|/‖x‖_p)^p, b = (|η_k|/‖y‖_q)^q, and λ = 1/p, we have

    |ξ_k η_k| / (‖x‖_p ‖y‖_q) ≤ (1/p) |ξ_k|^p/‖x‖_p^p + (1/q) |η_k|^q/‖y‖_q^q

with equality if, and only if, |ξ_k|^p/‖x‖_p^p = |η_k|^q/‖y‖_q^q. Summing the above inequality for all k ∈ N, we have

    (∑_{k=1}^∞ |ξ_k η_k|) / (‖x‖_p ‖y‖_q) ≤ 1/p + 1/q = 1

with equality if, and only if, |ξ_k|^p/‖x‖_p^p = |η_k|^q/‖y‖_q^q, k = 1, 2, .... Equality ⇒ α = 1/‖x‖_p^p and β = 1/‖y‖_q^q satisfy α|ξ_k|^p = β|η_k|^q, k = 1, 2, .... On the other hand, if ∃α, β ∈ R, which are not both zeros, such that α|ξ_k|^p = β|η_k|^q, k = 1, 2, ..., then, without loss of generality, assume β ≠ 0. Let α₁ = α/β. Then, α₁|ξ_k|^p = |η_k|^q, k = 1, 2, .... Hence, α₁‖x‖_p^p = ‖y‖_q^q, which further implies that α₁ = ‖y‖_q^q/‖x‖_p^p. Hence, |ξ_k|^p/‖x‖_p^p = |η_k|^q/‖y‖_q^q, k = 1, 2, .... This implies equality. Therefore, equality holds if, and only if, ∃α, β ∈ R, which are not both zeros, such that α|ξ_k|^p = β|η_k|^q, k = 1, 2, .... This subcase is proved. This completes the proof of the theorem. □
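Hölder's inequality can be observed numerically on truncated sequences. The following Python sketch is mine, not the book's; the helper names `p_norm` and `holder_sides` are hypothetical, and the equality check instantiates the condition α|ξ_k|^p = β|η_k|^q for p = q = 2.

```python
import random

def p_norm(seq, p):
    """l_p norm of a finite (truncated) sequence; p = float('inf') gives the sup norm."""
    if p == float('inf'):
        return max(abs(t) for t in seq)
    return sum(abs(t) ** p for t in seq) ** (1.0 / p)

def holder_sides(x, y, p):
    """Return (sum_k |xi_k * eta_k|, ||x||_p * ||y||_q), where 1/p + 1/q = 1."""
    q = float('inf') if p == 1 else p / (p - 1.0)
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    return lhs, p_norm(x, p) * p_norm(y, q)

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(50)]
y = [random.uniform(-1.0, 1.0) for _ in range(50)]
for p in (1.0, 1.5, 2.0, 3.0):
    lhs, rhs = holder_sides(x, y, p)
    assert lhs <= rhs + 1e-12        # Hoelder's inequality holds for each exponent pair

# Equality case for p = q = 2: take eta_k proportional to |xi_k|,
# so that alpha*|xi_k|^2 = beta*|eta_k|^2 with alpha = 9, beta = 1.
lhs, rhs = holder_sides(x, [3.0 * abs(a) for a in x], 2.0)
assert abs(lhs - rhs) < 1e-9
```

Truncation is harmless here: both sides of the inequality are limits of the truncated quantities.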

When p = 2 = q, Hölder's inequality becomes the well-known Cauchy–Schwarz Inequality:

    ∑_{k=1}^∞ |ξ_k η_k| ≤ (∑_{k=1}^∞ |ξ_k|²)^{1/2} (∑_{k=1}^∞ |η_k|²)^{1/2}


Theorem 7.9 (Minkowski's Inequality) Let p ∈ [1, ∞] ⊂ R_e. ∀x, y ∈ l_p, we have x + y ∈ l_p and

    ‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p

When 1 < p < ∞, equality holds if, and only if, ∃α, β ∈ [0, ∞) ⊂ R, which are not both zeros, such that αx = βy.

Proof ∀x = (ξ_k)_{k=1}^∞, y = (η_k)_{k=1}^∞ ∈ l_p. We will distinguish three exhaustive and mutually exclusive cases: Case 1: p = 1; Case 2: p = ∞; Case 3: 1 < p < ∞.

Case 1: p = 1. ‖x + y‖_1 = ∑_{k=1}^∞ |ξ_k + η_k| ≤ ∑_{k=1}^∞ (|ξ_k| + |η_k|) = ‖x‖_1 + ‖y‖_1 < +∞. Equality holds if, and only if, ξ_k η_k ≥ 0, k = 1, 2, .... Hence, x + y ∈ l_1.

Case 2: p = ∞. ‖x + y‖_∞ = sup_{k∈N} |ξ_k + η_k| ≤ sup_{k∈N} (|ξ_k| + |η_k|) ≤ sup_{k∈N} |ξ_k| + sup_{k∈N} |η_k| = ‖x‖_∞ + ‖y‖_∞ < +∞. Hence, x + y ∈ l_∞.

Case 3: 1 < p < ∞. We will further distinguish two exhaustive and mutually exclusive cases: Case 3a: ‖x‖_p ‖y‖_p = 0; Case 3b: ‖x‖_p ‖y‖_p > 0. Case 3a: ‖x‖_p ‖y‖_p = 0. Without loss of generality, assume ‖y‖_p = 0. Then, y = ϑ and η_k = 0, k = 1, 2, .... Hence, x + y = x ∈ l_p and ‖x + y‖_p = ‖x‖_p = ‖x‖_p + ‖y‖_p. Let α = 0 and β = 1; then αx = βy. Hence, equality holds if, and only if, ∃α, β ∈ [0, ∞) ⊂ R, which are not both zeros, such that αx = βy. This subcase is proved. Case 3b: ‖x‖_p ‖y‖_p > 0. Then, ‖x‖_p > 0 and ‖y‖_p > 0. Let λ = ‖x‖_p/(‖x‖_p + ‖y‖_p) ∈ (0, 1) ⊂ R. ∀k ∈ N,

    |ξ_k + η_k|^p / (‖x‖_p + ‖y‖_p)^p ≤ ((|ξ_k| + |η_k|)/(‖x‖_p + ‖y‖_p))^p = (λ |ξ_k|/‖x‖_p + (1 − λ) |η_k|/‖y‖_p)^p

Since 1 < p < ∞, the function t^p is strictly convex on [0, ∞) ⊂ R. Then, we have

    |ξ_k + η_k|^p / (‖x‖_p + ‖y‖_p)^p ≤ λ |ξ_k|^p/‖x‖_p^p + (1 − λ) |η_k|^p/‖y‖_p^p

where equality holds ⇔ ξ_k η_k ≥ 0 and |ξ_k|/‖x‖_p = |η_k|/‖y‖_p ⇔ ξ_k/‖x‖_p = η_k/‖y‖_p.


Summing the above inequalities for all k ∈ N, we have

    ‖x + y‖_p^p / (‖x‖_p + ‖y‖_p)^p = ∑_{k=1}^∞ |ξ_k + η_k|^p / (‖x‖_p + ‖y‖_p)^p ≤ λ ∑_{k=1}^∞ |ξ_k|^p/‖x‖_p^p + (1 − λ) ∑_{k=1}^∞ |η_k|^p/‖y‖_p^p = 1

Therefore,

    ‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p < +∞

where equality holds ⇔ ξ_k/‖x‖_p = η_k/‖y‖_p, k = 1, 2, ... ⇔ (1/‖x‖_p)x = (1/‖y‖_p)y. Equality implies that α = 1/‖x‖_p and β = 1/‖y‖_p satisfy αx = βy. On the other hand, if ∃α, β ∈ [0, ∞) ⊂ R, which are not both zeros, such that αx = βy, then, without loss of generality, assume β ≠ 0. Then, y = α₁x with α₁ = α/β. It is easy to show that ‖y‖_p = α₁‖x‖_p. Then, α₁ = ‖y‖_p/‖x‖_p. This implies that (1/‖x‖_p)x = (1/‖y‖_p)y, which further implies equality. Hence, equality holds if, and only if, ∃α, β ∈ [0, ∞), which are not both zeros, such that αx = βy. Clearly, x + y ∈ l_p. This subcase is proved. This completes the proof of the theorem. □
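Minkowski's inequality and its equality case can likewise be checked on truncated sequences. This is a minimal sketch of mine (the helper name `p_norm` is hypothetical), exercising Theorem 7.9 for several exponents including p = ∞.

```python
import random

def p_norm(seq, p):
    """l_p norm of a finite (truncated) sequence; p = float('inf') gives the sup norm."""
    if p == float('inf'):
        return max(abs(t) for t in seq)
    return sum(abs(t) ** p for t in seq) ** (1.0 / p)

random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(40)]
y = [random.gauss(0.0, 1.0) for _ in range(40)]
for p in (1.0, 1.5, 2.0, 4.0, float('inf')):
    s = [a + b for a, b in zip(x, y)]
    # Triangle inequality ||x + y||_p <= ||x||_p + ||y||_p
    assert p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12

# Equality case for 1 < p < infinity: y = alpha * x with alpha >= 0 (here alpha = 2),
# i.e. the condition "alpha*x = beta*y with alpha, beta >= 0, not both zero".
y2 = [2.0 * a for a in x]
s2 = [a + b for a, b in zip(x, y2)]
assert abs(p_norm(s2, 3.0) - (p_norm(x, 3.0) + p_norm(y2, 3.0))) < 1e-9
```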

Example 7.10 Let X be a real (complex) normed linear space. Let Y := {(ξ_k)_{k=1}^∞ | ξ_k ∈ X, ∀k ∈ N}. By Example 6.20, (Y, ⊕, ⊗, ϑ) is a vector space over K, where ⊕, ⊗, and ϑ are defined in the example. For p ∈ [1, ∞) ⊂ R, let M_p := {(ξ_k)_{k=1}^∞ ∈ Y | ∑_{k=1}^∞ ‖ξ_k‖_X^p < +∞}. Define the norm ‖x‖_p = (∑_{k=1}^∞ ‖ξ_k‖_X^p)^{1/p}, ∀x = (ξ_k)_{k=1}^∞ ∈ M_p. Let M_∞ := {(ξ_k)_{k=1}^∞ ∈ Y | sup_{k≥1} ‖ξ_k‖_X < +∞}. Define the norm ‖x‖_∞ = sup_{k≥1} ‖ξ_k‖_X, ∀x = (ξ_k)_{k=1}^∞ ∈ M_∞.

∀p ∈ [1, ∞] ⊂ R_e, ∀x = (ξ_k)_{k=1}^∞, y = (η_k)_{k=1}^∞ ∈ M_p, we have, for p ∈ [1, ∞),

    ‖x + y‖_p = (∑_{k=1}^∞ ‖ξ_k + η_k‖_X^p)^{1/p} ≤ (∑_{k=1}^∞ (‖ξ_k‖_X + ‖η_k‖_X)^p)^{1/p} ≤ (∑_{k=1}^∞ ‖ξ_k‖_X^p)^{1/p} + (∑_{k=1}^∞ ‖η_k‖_X^p)^{1/p} = ‖x‖_p + ‖y‖_p < +∞

and, for p = ∞,

    ‖x + y‖_∞ = sup_{k≥1} ‖ξ_k + η_k‖_X ≤ sup_{k≥1} (‖ξ_k‖_X + ‖η_k‖_X) ≤ sup_{k≥1} ‖ξ_k‖_X + sup_{k≥1} ‖η_k‖_X = ‖x‖_∞ + ‖y‖_∞ < +∞


where we have made use of Minkowski's Inequality. In the preceding inequality, when 1 < p < ∞, equality holds if, and only if, ‖ξ_k + η_k‖_X = ‖ξ_k‖_X + ‖η_k‖_X, ∀k ∈ N, and ∃α, β ∈ [0, ∞) ⊂ R, which are not both zeros, such that α‖ξ_k‖_X = β‖η_k‖_X, ∀k ∈ N.

∀p ∈ [1, ∞] ⊂ R_e, note that ϑ = (ϑ_X, ϑ_X, ...) ∈ M_p ≠ ∅. ∀x = (ξ_k)_{k=1}^∞ ∈ M_p, ∀α ∈ K, we have, for p ∈ [1, ∞),

    ‖αx‖_p = (∑_{k=1}^∞ ‖αξ_k‖_X^p)^{1/p} = (∑_{k=1}^∞ |α|^p ‖ξ_k‖_X^p)^{1/p} = |α| (∑_{k=1}^∞ ‖ξ_k‖_X^p)^{1/p} = |α|‖x‖_p < +∞

and, for p = ∞,

    ‖αx‖_∞ = sup_{k≥1} ‖αξ_k‖_X = sup_{k≥1} |α|‖ξ_k‖_X = |α| sup_{k≥1} ‖ξ_k‖_X = |α|‖x‖_∞ < +∞

where we have made use of Proposition 3.81. Then, ∀x, y ∈ M_p, ∀α, β ∈ K, we have ‖αx + βy‖_p ≤ ‖αx‖_p + ‖βy‖_p = |α|‖x‖_p + |β|‖y‖_p < +∞. Then, αx + βy ∈ M_p. Hence, M_p := (M_p, ⊕, ⊗, ϑ) is a subspace in (Y, ⊕, ⊗, ϑ). It is easy to check that, ∀x ∈ M_p, ‖x‖_p ∈ [0, ∞) ⊂ R, and ‖x‖_p = 0 ⇔ x = (ϑ_X, ϑ_X, ...) = ϑ. Therefore, l_p(X) := (M_p, K, ‖·‖_p) is a real (complex) normed linear space, ∀p ∈ [1, ∞] ⊂ R_e.

Example 7.11 Let X := {f : [a, b] → R}, where a, b ∈ R with a < b, ⊕ and ⊗ be the usual vector addition and scalar multiplication, and ϑ : [a, b] → R be given by ϑ(t) = 0, ∀t ∈ [a, b]. By Example 6.20, X := (X, ⊕, ⊗, ϑ) is a vector space over R. The set M := {f ∈ X | f is continuous} is a subspace by Proposition 6.25. Let M := (M, ⊕, ⊗, ϑ). Then, (M, R) is a vector space. Introduce a norm on this space by ‖x‖ = ∫_a^b |x(t)| dt, ∀x ∈ M. Now, we verify the properties of the norm. ∀x, y ∈ M, ∀α ∈ R:

(i) Since x is continuous, |x(·)| is continuous and therefore integrable on [a, b]. Hence, 0 ≤ ‖x‖ < +∞. x = ϑ ⇒ ‖x‖ = 0. On the other hand, x ≠ ϑ ⇒ ∃t₁ ∈ [a, b] such that |x(t₁)| > 0. By continuity of x, ∃δ ∈ (0, ∞) ⊂ R such that |x(t)| > |x(t₁)|/2, ∀t ∈ [t₁ − δ, t₁ + δ] ∩ [a, b], which implies ‖x‖ > 0. Hence, ‖x‖ = 0 ⇔ x = ϑ.
(ii) ‖x + y‖ = ∫_a^b |x(t) + y(t)| dt ≤ ∫_a^b (|x(t)| + |y(t)|) dt = ‖x‖ + ‖y‖.
(iii) ‖αx‖ = ∫_a^b |αx(t)| dt = |α|‖x‖.

Hence, (M, R, ‖·‖) is a normed linear space.
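The integral norm of Example 7.11 can be approximated numerically. The following Python sketch is my own (the function name `l1_norm` and the midpoint-rule quadrature are choices of mine, not from the text); it checks the triangle inequality (ii) on two simple continuous functions.

```python
def l1_norm(x, a, b, n=100000):
    """Approximate ||x|| = integral from a to b of |x(t)| dt by the midpoint rule."""
    h = (b - a) / n
    return sum(abs(x(a + (i + 0.5) * h)) for i in range(n)) * h

# Triangle inequality ||x + y|| <= ||x|| + ||y|| on [0, 1]:
x = lambda t: t - 0.5
y = lambda t: 0.25 - t
s = lambda t: x(t) + y(t)
nx, ny, ns = l1_norm(x, 0.0, 1.0), l1_norm(y, 0.0, 1.0), l1_norm(s, 0.0, 1.0)
assert ns <= nx + ny + 1e-9
# The inequality is strict here, since x and y change sign at different points:
assert ns < nx + ny
```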

Definition 7.12 Let X be a set, Y be a normed linear space, f : X → Y. f is said to be bounded if ∃M ∈ [0, ∞) ⊂ R such that ‖f(x)‖ ≤ M, ∀x ∈ X. S ⊆ Y is said to be bounded if ∃M ∈ [0, ∞) ⊂ R such that ‖s‖ ≤ M, ∀s ∈ S.


Proposition 7.13 Let X := (X, K, ‖·‖) be a normed linear space and M ⊆ X be a subspace. Then, (M, K, ‖·‖) is also a normed linear space.

Proof This is straightforward and is therefore omitted. □

7.2 The Natural Metric

A normed linear space is actually a metric space.

Proposition 7.14 A normed linear space X := (X, K, ‖·‖) admits the natural metric ρ : X × X → [0, ∞) ⊂ R given by ρ(x, y) = ‖x − y‖, ∀x, y ∈ X.

Proof ∀x, y, z ∈ X, we have (i) ρ(x, y) ∈ [0, ∞); (ii) ρ(x, y) = ‖x − y‖ = 0 ⇔ x − y = ϑ ⇔ x = y; (iii) ρ(x, y) = ‖x − y‖ = ‖(−1)(x − y)‖ = ‖y − x‖ = ρ(y, x); and (iv) note that

    ρ(x, z) = ‖x − z‖ = ‖(x − y) + (y − z)‖ ≤ ‖x − y‖ + ‖y − z‖ = ρ(x, y) + ρ(y, z)

Hence, ρ is a metric. This completes the proof of the proposition. □
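The construction of Proposition 7.14 is mechanical: any norm induces a metric. A minimal Python sketch of mine (the helper name `norm_metric` is hypothetical) checks the metric axioms on sample vectors:

```python
def norm_metric(norm):
    """Turn a norm on a vector space into its natural metric rho(x, y) = ||x - y||."""
    def rho(x, y):
        return norm([a - b for a, b in zip(x, y)])
    return rho

sup_norm = lambda v: max(abs(t) for t in v)
rho = norm_metric(sup_norm)

x, y, z = [1.0, -2.0], [0.5, 3.0], [-1.0, 0.0]
assert rho(x, x) == 0.0                      # rho(x, y) = 0 iff x = y
assert rho(x, y) == rho(y, x)                # symmetry
assert rho(x, z) <= rho(x, y) + rho(y, z)    # triangle inequality
```

The same wrapper works for any of the norms of this section, since only the norm axioms are used.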

Now we can talk about topological properties and metric properties of a normed linear space. When we refer to these properties of a normed linear space, we are referring to the above metric specifically.

Proposition 7.15 Let X be a normed linear space and C ⊆ X be convex. Then, C̄ and C° are convex.

Proof ∀x₁, x₂ ∈ C̄, ∀α ∈ [0, 1] ⊂ R, we need to show that αx₁ + (1 − α)x₂ ∈ C̄. By Proposition 3.3, ∀r ∈ (0, ∞) ⊂ R, B(x_i, r) ∩ C ≠ ∅, i = 1, 2. Let p_i ∈ B(x_i, r) ∩ C, i = 1, 2. Then, by convexity of C, we have αp₁ + (1 − α)p₂ ∈ C. Note that

    ‖(αx₁ + (1 − α)x₂) − (αp₁ + (1 − α)p₂)‖ = ‖α(x₁ − p₁) + (1 − α)(x₂ − p₂)‖ ≤ α‖x₁ − p₁‖ + (1 − α)‖x₂ − p₂‖ < r

Then, αp₁ + (1 − α)p₂ ∈ B(αx₁ + (1 − α)x₂, r) ∩ C ≠ ∅. Hence, αx₁ + (1 − α)x₂ ∈ C̄ by the arbitrariness of r and Proposition 3.3. Then, C̄ is convex.

∀x₁, x₂ ∈ C°, ∀α ∈ [0, 1] ⊂ R, we need to show that αx₁ + (1 − α)x₂ ∈ C°. ∃r_i ∈ (0, ∞) ⊂ R such that B(x_i, r_i) ⊆ C, i = 1, 2. Let r := min{r₁, r₂} > 0. ∀p ∈ B(αx₁ + (1 − α)x₂, r), let w := p − αx₁ − (1 − α)x₂. Then, ‖w‖ < r and x_i + w ∈ B(x_i, r_i) ⊆ C, i = 1, 2. By the convexity of C, we have C ∋ α(x₁ + w) + (1 − α)(x₂ + w) = p. Hence, we have B(αx₁ + (1 − α)x₂, r) ⊆ C and αx₁ + (1 − α)x₂ ∈ C°. Hence, C° is convex. This completes the proof of the proposition. □


Proposition 7.16 Let X be a normed linear space, x₀ ∈ X, S ⊆ X, and P = x₀ + S. Then, P̄ = x₀ + S̄ and P° = x₀ + S°.

Proof ∀x ∈ P̄, by Proposition 3.3, ∀r ∈ (0, ∞) ⊂ R, ∃p₀ ∈ P ∩ B(x, r). Then, p₀ − x₀ ∈ S ∩ B(x − x₀, r) ≠ ∅. Hence, by Proposition 3.3, x − x₀ ∈ S̄ and x ∈ x₀ + S̄. Hence, P̄ ⊆ x₀ + S̄. On the other hand, ∀x ∈ x₀ + S̄, we have x − x₀ ∈ S̄ and, by Proposition 3.3, ∀r ∈ (0, ∞) ⊂ R, ∃s₀ ∈ S ∩ B(x − x₀, r) ≠ ∅. Then, x₀ + s₀ ∈ P ∩ B(x, r) ≠ ∅. By Proposition 3.3, x ∈ P̄. Hence, x₀ + S̄ ⊆ P̄. Therefore, we have P̄ = x₀ + S̄.

∀x ∈ P°, ∃r ∈ (0, ∞) ⊂ R such that B(x, r) ⊆ P. Then, B(x − x₀, r) = B(x, r) − x₀ ⊆ P − x₀ = S. Hence, x − x₀ ∈ S°, x ∈ x₀ + S°, and P° ⊆ x₀ + S°. On the other hand, ∀x ∈ x₀ + S°, we have x − x₀ ∈ S°. Then, ∃r ∈ (0, ∞) ⊂ R such that B(x − x₀, r) ⊆ S. Hence, B(x, r) = B(x − x₀, r) + x₀ ⊆ x₀ + S = P. Therefore, x ∈ P° and x₀ + S° ⊆ P°. Hence, we have P° = x₀ + S°. This completes the proof of the proposition. □

Proposition 7.17 Let X be a normed linear space over K. Then, the following statements hold:

(i) If M ⊆ X is a subspace, then M̄ is a subspace.
(ii) If V ⊆ X is a linear variety, then V̄ is a linear variety.
(iii) If C ⊆ X is a cone with vertex p ∈ X, then C̄ is a cone with vertex p.
(iv) Let x₀ ∈ X and r ∈ (0, ∞) ⊂ R. Then, cl(B(x₀, r)) = B̄(x₀, r), where B̄(x₀, r) denotes the closed ball.

Proof (i) Clearly, we have ϑ ∈ M ⊆ M̄ ≠ ∅. ∀x, y ∈ M̄, ∀α ∈ K, we will show that x + y, αx ∈ M̄; then M̄ is a subspace. ∀r ∈ (0, ∞) ⊂ R, by Proposition 3.3, ∃x₀ ∈ B(x, r/2) ∩ M ≠ ∅ and ∃y₀ ∈ B(y, r/2) ∩ M ≠ ∅. Since M is a subspace, x₀ + y₀ ∈ M and x₀ + y₀ ∈ B(x + y, r). Hence, we have M ∩ B(x + y, r) ≠ ∅, which implies that x + y ∈ M̄. ∀r ∈ (0, ∞) ⊂ R, by Proposition 3.3, ∃x₀ ∈ B(x, r/(1 + |α|)) ∩ M ≠ ∅. Since M is a subspace, αx₀ ∈ M and αx₀ ∈ B(αx, r). Hence, we have M ∩ B(αx, r) ≠ ∅, which implies that αx ∈ M̄. Hence, M̄ is a subspace.

(ii) Note that V = x₀ + M, where x₀ ∈ X and M ⊆ X is a subspace. By Proposition 7.16, V̄ = x₀ + M̄. By (i), M̄ is a subspace. Then, V̄ is a linear variety.

(iii) First, we will prove the special case p = ϑ. Clearly, ϑ ∈ C ⊆ C̄. ∀x ∈ C̄, ∀α ∈ [0, ∞) ⊂ R, we will show that αx ∈ C̄. ∀r ∈ (0, ∞) ⊂ R, by Proposition 3.3, ∃x₀ ∈ C ∩ B(x, r/(1 + |α|)) ≠ ∅. Then, we have αx₀ ∈ C ∩ B(αx, r) ≠ ∅. By Proposition 3.3, αx ∈ C̄. Hence, C̄ is a cone with vertex at the origin. For general p ∈ X, we have C = p + C₀, where C₀ is a cone with vertex at the origin. By Proposition 7.16, C̄ = p + C̄₀. By the special case we have shown, C̄₀ is a cone with vertex at the origin. Hence, C̄ is a cone with vertex p.

(iv) First, we will prove the special case x₀ = ϑ. By Proposition 4.3, B̄(ϑ, r) is a closed set and B̄(ϑ, r) ⊇ B(ϑ, r). Then, we have cl(B(ϑ, r)) ⊆ B̄(ϑ, r). On the other hand, ∀x ∈ B̄(ϑ, r), define (x_k)_{k=1}^∞ ⊆ X by x_k := (k/(1 + k))x, ∀k ∈ N. Then, ‖x_k‖ = (k/(1 + k))‖x‖ ≤ (k/(1 + k))r < r, ∀k ∈ N. Therefore, (x_k)_{k=1}^∞ ⊆ B(ϑ, r). It is obvious that lim_{k∈N} x_k = x; then, by Proposition 4.13, x ∈ cl(B(ϑ, r)). Then, we have B̄(ϑ, r) ⊆ cl(B(ϑ, r)). Hence, the result holds for x₀ = ϑ. For arbitrary x₀ ∈ X, by Proposition 7.16, cl(B(x₀, r)) = cl(x₀ + B(ϑ, r)) = x₀ + cl(B(ϑ, r)) = x₀ + B̄(ϑ, r) = B̄(x₀, r). Hence, the result holds. This completes the proof of the proposition. □

Proposition 7.18 Let X be a normed linear space, P ⊆ X, and P ≠ ∅. The closed linear variety generated by P, denoted by V̄(P), is the intersection of all closed linear varieties containing P. Then, V̄(P) = cl(v(P)), where v(P) is the linear variety generated by P.

Proof By Proposition 7.17, cl(v(P)) is a closed linear variety containing P. Then, we have V̄(P) ⊆ cl(v(P)). On the other hand, V̄(P) is a closed linear variety containing P, and then v(P) ⊆ V̄(P). Hence, cl(v(P)) ⊆ V̄(P). Therefore, the result holds. This completes the proof of the proposition. □

The justification of the definition of V̄(P) is that the intersection of linear varieties is a linear variety when the intersection is nonempty.

Definition 7.19 Let X be a normed linear space and P ⊆ X be nonempty. x ∈ P is said to be a relative interior point of P if it is an interior point of P relative to the subset topology of V̄(P). The set of all relative interior points of P is called the relative interior of P, denoted by °P. P is said to be relatively open if P is open in the subset topology of V̄(P).

Proposition 7.20 Let X be a normed linear space and (x_α)_{α∈A} ⊆ X be a net. Then, lim_{α∈A} x_α = x ∈ X if, and only if, ∀ε ∈ (0, ∞) ⊂ R, ∃α₀ ∈ A, ∀α ∈ A with α₀ ≺ α, we have ‖x − x_α‖ < ε.

Proposition 7.21 Let X be a normed linear space. Then, ‖·‖ is a uniformly continuous function on X.

Proof This follows directly from Propositions 4.30 and 7.14. □

7.3 Product Spaces

Proposition 7.22 Let X := (X, K, ‖·‖_X) and Y := (Y, K, ‖·‖_Y) be normed linear spaces. By Proposition 6.24, X × Y is a vector space over K. Define a function ‖·‖ : X × Y → [0, ∞) ⊂ R by ‖(x, y)‖ := (‖x‖_X² + ‖y‖_Y²)^{1/2}, ∀(x, y) ∈ X × Y. Then, (X × Y, K, ‖·‖) is a normed linear space. This normed linear space will be called the Cartesian product of X and Y and be denoted by X × Y.

Proof ∀(x₁, y₁), (x₂, y₂) ∈ X × Y, ∀α ∈ K, we have ‖(x₁, y₁)‖ = (‖x₁‖_X² + ‖y₁‖_Y²)^{1/2} ∈ [0, ∞) ⊂ R and ‖(x₁, y₁)‖ = 0 ⇔ ‖x₁‖_X = 0 and ‖y₁‖_Y = 0 ⇔ x₁ = ϑ_X and y₁ = ϑ_Y ⇔ (x₁, y₁) = ϑ. Next, ‖(x₁, y₁) + (x₂, y₂)‖ = ‖(x₁ + x₂, y₁ + y₂)‖ = (‖x₁ + x₂‖_X² + ‖y₁ + y₂‖_Y²)^{1/2} ≤ ((‖x₁‖_X + ‖x₂‖_X)² + (‖y₁‖_Y + ‖y₂‖_Y)²)^{1/2} ≤ (‖x₁‖_X² + ‖y₁‖_Y²)^{1/2} + (‖x₂‖_X² + ‖y₂‖_Y²)^{1/2} = ‖(x₁, y₁)‖ + ‖(x₂, y₂)‖, where the first inequality follows from the fact that X and Y are normed linear spaces and the second inequality follows from Minkowski's Inequality. Finally, ‖α(x₁, y₁)‖ = ‖(αx₁, αy₁)‖ = (‖αx₁‖_X² + ‖αy₁‖_Y²)^{1/2} = (|α|²‖x₁‖_X² + |α|²‖y₁‖_Y²)^{1/2} = |α|(‖x₁‖_X² + ‖y₁‖_Y²)^{1/2} = |α|‖(x₁, y₁)‖. Hence, ‖·‖ is a norm on X × Y. This completes the proof of the proposition. □

Clearly, the natural metric for the Cartesian product X × Y is the Cartesian metric defined in Definition 4.28. The above proposition may also be generalized to the case of X₁ × X₂ × ⋯ × X_n, where n ∈ N and the X_i's are normed linear spaces over the same field K. When n = 0, it should be noted that ∏_{α∈∅} X_α = ({∅ =: ϑ}, K, ‖·‖), where ‖ϑ‖ = 0.

Proposition 7.23 Let X := (X, K, ‖·‖) be a normed linear space. Then, the vector addition ⊕_X : X × X → X is uniformly continuous, and the scalar multiplication ⊗_X : K × X → X is continuous.

Proof ∀ε ∈ (0, ∞) ⊂ R, ∀(x₁, x₂), (y₁, y₂) ∈ X × X with ‖(x₁, x₂) − (y₁, y₂)‖_{X×X} < ε/√2, we have

    ‖(x₁ + x₂) − (y₁ + y₂)‖ ≤ ‖x₁ − y₁‖ + ‖x₂ − y₂‖ ≤ √2 (‖x₁ − y₁‖² + ‖x₂ − y₂‖²)^{1/2} = √2 ‖(x₁, x₂) − (y₁, y₂)‖_{X×X} < ε

Hence, the vector addition ⊕_X is uniformly continuous.

∀α₀ ∈ K, ∀x₀ ∈ X, ∀ε ∈ (0, ∞) ⊂ R, let δ = ε/(1 + ε + |α₀| + ‖x₀‖) ∈ (0, 1) ⊂ R. ∀(α, x) ∈ B_{K×X}((α₀, x₀), δ), we have

    ‖αx − α₀x₀‖ ≤ ‖αx − αx₀‖ + ‖αx₀ − α₀x₀‖ = |α|‖x − x₀‖ + ‖x₀‖|α − α₀| ≤ (|α|² + ‖x₀‖²)^{1/2} (‖x − x₀‖² + |α − α₀|²)^{1/2} ≤ (|α| + ‖x₀‖)δ ≤ (1 + |α₀| + ‖x₀‖)δ < ε

where we have made use of the Cauchy–Schwarz Inequality in the second inequality. Hence, ⊗_X is continuous at (α₀, x₀). By the arbitrariness of (α₀, x₀), ⊗_X is continuous. This completes the proof of the proposition. □


Definition 7.24 Let X := (X, K, ‖·‖_X) and Y := (Y, K, ‖·‖_Y) be two normed linear spaces over the same field K and A : X → Y be a vector space isomorphism. A is said to be an isometrical isomorphism if ‖Ax‖_Y = ‖x‖_X, ∀x ∈ X. Then X and Y are said to be isometrically isomorphic.

Let A : X → Y be an isometrical isomorphism between X and Y. Then, A is an isometry between X and Y. Both A and A_inv are uniformly continuous. X and Y are equal to each other up to a relabeling of their vectors.

Proposition 7.25 Let X_α, α ∈ Λ, be normed linear spaces over K, where Λ is a finite index set. Let Λ = ⋃_{β∈Γ} Λ_β, where the Λ_β's are pairwise disjoint and finite and Γ is also finite. ∀β ∈ Γ, let X_β := ∏_{α∈Λ_β} X_α be the Cartesian product space. Let X_Γ := ∏_{β∈Γ} ∏_{α∈Λ_β} X_α be the Cartesian product space of product spaces, and X := ∏_{α∈Λ} X_α be the Cartesian product space. Then, X and X_Γ are isometrically isomorphic.

Proof Define E : ∏_{β∈Γ} ∏_{α∈Λ_β} X_α → ∏_{α∈Λ} X_α by, ∀x ∈ X_Γ, ∀α ∈ Λ, ∃! β_α ∈ Γ with α ∈ Λ_{β_α}, π_α(E(x)) = π_α^{β_α}(π_{β_α}^Γ(x)). By Proposition 4.32, E is an isometry. It is clear that E is a linear operator since the projection functions are linear for vector spaces. Hence, E is a vector space isomorphism. Then, E is an isometrical isomorphism. This completes the proof of the proposition. □

7.4 Banach Spaces

Definition 7.26 If a normed linear space is complete with respect to the natural metric, then it is called a Banach space.

Clearly, a Cauchy sequence in a normed linear space is bounded.

Proposition 7.27 A normed linear space X is complete if, and only if, every absolutely summable series is summable, that is, ∀(x_n)_{n=1}^∞ ⊆ X, ∑_{n=1}^∞ ‖x_n‖ ∈ R ⇒ ∑_{n=1}^∞ x_n ∈ X.

Proof "Only if" Let X be complete and (x_n)_{n=1}^∞ ⊆ X be absolutely summable. Then, ∑_{n=1}^∞ ‖x_n‖ ∈ R and, ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N such that, ∀n, m ∈ N with n ≥ m ≥ N, we have ∑_{i=m}^n ‖x_i‖ < ε. Then, ‖∑_{i=m}^n x_i‖ < ε. Let s_n := ∑_{i=1}^n x_i, ∀n ∈ N, be the partial sum. Then, (s_n)_{n=1}^∞ ⊆ X is a Cauchy sequence. By the completeness of X, ∑_{n=1}^∞ x_n = lim_{n∈N} s_n ∈ X.

"If" Let (x_n)_{n=1}^∞ ⊆ X be a Cauchy sequence. Let n₀ = 0. ∀k ∈ N, ∃n_k ∈ N with n_k > n_{k−1} such that ‖x_n − x_m‖ < 2^{−k}, ∀n, m ≥ n_k. Then, (x_{n_k})_{k=1}^∞ is a subsequence of (x_n)_{n=1}^∞. Let y₁ := x_{n₁} and y_k := x_{n_k} − x_{n_{k−1}}, ∀k ≥ 2. Then, the sequence (y_k)_{k=1}^∞ ⊆ X and its kth partial sum is x_{n_k}. Note that ‖y_k‖ < 2^{−k+1}, ∀k ≥ 2. Then, ∑_{k=1}^∞ ‖y_k‖ < ‖y₁‖ + 1, which implies that ∑_{k=1}^∞ y_k = x₀ ∈ X. Hence, we have lim_{k∈N} x_{n_k} = x₀ ∈ X.

We will now show that lim_{n∈N} x_n = x₀. ∀k ∈ N, ∃N ∈ N with N ≥ k such that ‖x_{n_i} − x₀‖ < 2^{−k}, ∀i ≥ N. ∀n ≥ n_N, we have ‖x_n − x₀‖ ≤ ‖x_n − x_{n_N}‖ + ‖x_{n_N} − x₀‖ < 2^{−N} + 2^{−k} ≤ 2^{−k+1}. Hence, lim_{n∈N} x_n = x₀. Therefore, X is complete. This completes the proof of the proposition. □

We frequently take great care to formulate problems arising in applications as equivalent problems in Banach spaces rather than in incomplete spaces. The principal advantage of Banach spaces in optimization problems is that, when seeking an optimal vector, we often construct a sequence (net) of vectors, each member of which is superior to the preceding ones. The desired optimal vector is then the limit of the sequence (net). In order for this scheme to be effective, there must be available a test for convergence which can be applied when the limit is unknown. The Cauchy criterion for convergence meets this requirement provided the space is complete.

Example 7.28 Consider Example 7.11 with a = 0 and b = 1. We will show that the space (M, R, ‖·‖) is incomplete. Take (x_n)_{n=1}^∞ ⊆ M to be, ∀n ∈ N,

    x_n(t) = 0                          for 0 ≤ t ≤ 1/2 − 1/(n+1)
    x_n(t) = (n+1)t − (n+1)/2 + 1       for 1/2 − 1/(n+1) < t ≤ 1/2
    x_n(t) = 1                          for 1/2 < t ≤ 1

Clearly, x_n is continuous (see Fig. 7.1) and ‖x_n‖ < 1, ∀n ∈ N. ∀n, m ∈ N, we have

    ‖x_n − x_m‖ = ∫₀¹ |x_n(t) − x_m(t)| dt = |∫₀¹ (x_n(t) − x_m(t)) dt| = |1/(2(n+1)) − 1/(2(m+1))|

Then, (x_n)_{n=1}^∞ is Cauchy. Yet, it is obvious that there is no continuous function x ∈ M such that lim_{n∈N} x_n = x, i.e., lim_{n∈N} ∫₀¹ |x_n(t) − x(t)| dt = 0. Hence, the space is incomplete.

Fig. 7.1 Sequence for Example 7.28

Example 7.29 K^n with norm defined as in Example 7.2 or Example 7.3 is a Banach space.

Example 7.30 Consider the real normed linear space C([a, b]) defined in Example 7.4, where a, b ∈ R and a ≤ b. We will show that it is complete. Take a Cauchy sequence (x_n)_{n=1}^∞ ⊆ C([a, b]). ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N such that ∀n, m ≥ N, 0 ≤ |x_n(t) − x_m(t)| ≤ ‖x_n − x_m‖ < ε, ∀t ∈ [a, b]. This shows that, ∀t ∈ [a, b], (x_n(t))_{n=1}^∞ ⊆ R is a Cauchy sequence, which converges to x(t) ∈ R since R is complete. This defines a function x : [a, b] → R. It is easy to show that (x_n)_{n=1}^∞, viewed as a sequence of functions from [a, b] to R, converges uniformly to x. By Proposition 4.26, x is continuous. Hence, x ∈ C([a, b]). It is easy to see that lim_{n∈N} ‖x_n − x‖ = 0. Hence, lim_{n∈N} x_n = x. Hence, C([a, b]) is a Banach space.

Example 7.31 Let K be a countably compact topological space, X be a normed linear space over the field K, Y := {f : K → X}. Define the usual vector addition ⊕ and scalar multiplication ⊗ and null vector ϑ on Y as in Example 6.20. Then, Y := (Y, ⊕, ⊗, ϑ) is a vector space over K. Let M := {f ∈ Y | f is continuous}. Then, by Propositions 3.32, 6.25, and 7.23, M := (M, ⊕, ⊗, ϑ) is a subspace of (Y, K). Define a function ‖·‖ : M → [0, ∞) ⊂ R by ‖f‖ = max{sup_{k∈K} ‖f(k)‖_X, 0}, ∀f ∈ M. This function is well-defined by Propositions 7.21, 3.12, and 5.29. We will show that ‖·‖ defines a norm on M.

We will distinguish two exhaustive and mutually exclusive cases: Case 1: K = ∅; Case 2: K ≠ ∅. Case 1: K = ∅. Then, M is a singleton set {∅} and ‖∅‖ = 0. Clearly, (M, K, ‖·‖) is a normed linear space. Case 2: K ≠ ∅. ∀f, g ∈ M, ∀α ∈ K, ‖f‖ = max_{k∈K} ‖f(k)‖_X by Propositions 7.21, 3.12, and 5.29. ‖f‖ = 0 ⇔ ‖f(k)‖_X = 0, ∀k ∈ K ⇔ f(k) = ϑ_X, ∀k ∈ K ⇔ f = ϑ. ‖f + g‖ = max_{k∈K} ‖f(k) + g(k)‖_X ≤ max_{k∈K} (‖f(k)‖_X + ‖g(k)‖_X) ≤ max_{k∈K} ‖f(k)‖_X + max_{k∈K} ‖g(k)‖_X = ‖f‖ + ‖g‖. ‖αf‖ = max_{k∈K} ‖αf(k)‖_X = max_{k∈K} |α|‖f(k)‖_X = |α|‖f‖. Hence, (M, K, ‖·‖) is a normed linear space.

In both cases, we have shown that C(K, X) := (M, K, ‖·‖) is a normed linear space.

Example 7.32 Let K be a countably compact topological space, X be a Banach space over the field K (with norm ‖·‖_X). Consider the normed linear space C(K, X) (with norm ‖·‖) defined in Example 7.31. We will show that this space is a Banach space. We will distinguish two exhaustive and mutually exclusive cases: Case 1: K = ∅; Case 2: K ≠ ∅. Case 1: K = ∅. Then, C(K, X) is a singleton set. Hence, any Cauchy sequence must converge. Thus, C(K, X) is a Banach space. Case 2: K ≠ ∅. Take a Cauchy sequence (x_n)_{n=1}^∞ ⊆ C(K, X). ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N such that ∀n, m ≥ N, 0 ≤ ‖x_n(k) − x_m(k)‖_X ≤ ‖x_n − x_m‖ < ε, ∀k ∈ K.
This shows that, ∀k ∈ K, (x_n(k))_{n=1}^∞ ⊆ X is a Cauchy sequence, which converges to x(k) ∈ X since X is complete. This defines a function x : K → X. It is easy to show that (x_n)_{n=1}^∞, viewed as a sequence of functions from K to X, converges uniformly to x. By Proposition 4.26, x is continuous. Hence, x ∈ C(K, X). It is easy to see that lim_{n∈N} ‖x_n − x‖ = 0. Hence, lim_{n∈N} x_n = x. Hence, C(K, X) is a Banach space.
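The contrast between Example 7.28 and Examples 7.30–7.32 can be examined numerically: the sequence of Example 7.28 is Cauchy in the integral norm of Example 7.11 but not in the sup norm, consistent with C([0, 1]) being complete under the sup norm and incomplete under the integral norm. A Python sketch of mine (the helper names `x` and `l1_dist` and the quadrature are my own choices):

```python
def x(n, t):
    """The piecewise-linear function x_n of Example 7.28 on [0, 1]."""
    if t <= 0.5 - 1.0 / (n + 1):
        return 0.0
    if t <= 0.5:
        return (n + 1) * t - (n + 1) / 2.0 + 1.0
    return 1.0

def l1_dist(n, m, steps=100000):
    """Midpoint-rule approximation of the integral norm ||x_n - x_m||."""
    h = 1.0 / steps
    return sum(abs(x(n, (i + 0.5) * h) - x(m, (i + 0.5) * h)) for i in range(steps)) * h

# Integral norm: ||x_n - x_m|| = |1/(2(n+1)) - 1/(2(m+1))| -> 0, so (x_n) is Cauchy.
assert abs(l1_dist(10, 1000) - abs(1.0 / 22 - 1.0 / 2002)) < 1e-3
# Sup norm: the distance stays bounded away from 0, so (x_n) is NOT sup-norm Cauchy.
grid = [i / 10000.0 for i in range(10001)]
assert max(abs(x(10, t) - x(1000, t)) for t in grid) > 0.9
```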


In both cases, we have shown that C(K, X) is a Banach space when X is a Banach space.

Example 7.33 Let X be a Banach space over the field K. Consider the normed linear space l_p(X) (with norm ‖·‖_p) defined in Example 7.10, where p ∈ [1, ∞] ⊂ R_e. We will show that this space is a Banach space. We will distinguish two exhaustive and mutually exclusive cases: Case 1: p ∈ [1, ∞); Case 2: p = ∞.

Case 1: p ∈ [1, ∞). Take a Cauchy sequence (x_n)_{n=1}^∞ ⊆ l_p(X), where x_n = (ξ_{n,k})_{k=1}^∞ ⊆ X, ∀n ∈ N. ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N such that ∀n, m ≥ N, we have ‖x_n − x_m‖_p < ε. ∀k ∈ N, ‖ξ_{n,k} − ξ_{m,k}‖_X ≤ (∑_{i=1}^∞ ‖ξ_{n,i} − ξ_{m,i}‖_X^p)^{1/p} = ‖x_n − x_m‖_p < ε. Hence, ∀k ∈ N, (ξ_{n,k})_{n=1}^∞ ⊆ X is a Cauchy sequence, which converges to some ξ_k ∈ X since X is a Banach space. Let x := (ξ_k)_{k=1}^∞ ⊆ X. We will show that x ∈ l_p(X) and lim_{n∈N} x_n = x. Since (x_n)_{n=1}^∞ is a Cauchy sequence, it is bounded, that is, ∃M ∈ [0, ∞) ⊂ R such that ‖x_n‖_p ≤ M, ∀n ∈ N. Then, we have ‖x_n‖_p^p = ∑_{i=1}^∞ ‖ξ_{n,i}‖_X^p ≤ M^p, ∀n ∈ N. Then, ∀n, k ∈ N, we have ∑_{i=1}^k ‖ξ_{n,i}‖_X^p ≤ M^p. By Propositions 7.21 and 3.66, we have ∑_{i=1}^k ‖ξ_i‖_X^p = lim_{n∈N} ∑_{i=1}^k ‖ξ_{n,i}‖_X^p ≤ M^p. Hence, we have ∑_{i=1}^∞ ‖ξ_i‖_X^p ≤ M^p and ‖x‖_p = (∑_{i=1}^∞ ‖ξ_i‖_X^p)^{1/p} ≤ M. Hence, we have x ∈ l_p(X). ∀ε ∈ (0, ∞) ⊂ R, ∀n, m ∈ N with n, m ≥ N, we have ‖x_n − x_m‖_p < ε. Then, ∀k ∈ N, ∑_{i=1}^k ‖ξ_{n,i} − ξ_{m,i}‖_X^p < ε^p. Taking the limit as m → ∞, by Propositions 7.21, 3.66, and 7.23, we have ∑_{i=1}^k ‖ξ_{n,i} − ξ_i‖_X^p = lim_{m∈N} ∑_{i=1}^k ‖ξ_{n,i} − ξ_{m,i}‖_X^p ≤ ε^p, ∀k ∈ N. Hence, we have ‖x_n − x‖_p = (∑_{i=1}^∞ ‖ξ_{n,i} − ξ_i‖_X^p)^{1/p} ≤ ε. This shows that lim_{n∈N} x_n = x. Hence, l_p(X) is complete and therefore a Banach space.

Case 2: p = ∞. Take a Cauchy sequence (x_n)_{n=1}^∞ ⊆ l_∞(X), where x_n = (ξ_{n,k})_{k=1}^∞ ⊆ X, ∀n ∈ N. ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N such that ∀n, m ≥ N, we have ‖x_n − x_m‖_∞ < ε. ∀k ∈ N, ‖ξ_{n,k} − ξ_{m,k}‖_X ≤ sup_{i∈N} ‖ξ_{n,i} − ξ_{m,i}‖_X = ‖x_n − x_m‖_∞ < ε. Hence, ∀k ∈ N, (ξ_{n,k})_{n=1}^∞ ⊆ X is a Cauchy sequence, which converges to some ξ_k ∈ X since X is a Banach space. Let x := (ξ_k)_{k=1}^∞ ⊆ X. We will show that x ∈ l_∞(X) and lim_{n∈N} x_n = x. Since (x_n)_{n=1}^∞ is a Cauchy sequence, it is bounded, that is, ∃M ∈ [0, ∞) ⊂ R such that ‖x_n‖_∞ ≤ M, ∀n ∈ N. Then, we have ‖ξ_{n,k}‖_X ≤ sup_{i∈N} ‖ξ_{n,i}‖_X = ‖x_n‖_∞ ≤ M, ∀n, k ∈ N. By Propositions 7.21 and 3.66, we have ‖ξ_k‖_X = lim_{n∈N} ‖ξ_{n,k}‖_X ≤ M. Hence, ‖x‖_∞ = sup_{i∈N} ‖ξ_i‖_X ≤ M. Therefore, x ∈ l_∞(X). ∀ε ∈ (0, ∞) ⊂ R, ∀n, m ∈ N with n, m ≥ N, we have ‖x_n − x_m‖_∞ < ε. Then, ∀k ∈ N, ‖ξ_{n,k} − ξ_{m,k}‖_X < ε. Taking the limit as m → ∞, by Propositions 7.21, 3.66, and 7.23, we have ‖ξ_{n,k} − ξ_k‖_X = lim_{m∈N} ‖ξ_{n,k} − ξ_{m,k}‖_X ≤ ε, ∀k ∈ N. Hence, we have ‖x_n − x‖_∞ = sup_{i∈N} ‖ξ_{n,i} − ξ_i‖_X ≤ ε. This shows that lim_{n∈N} x_n = x. Hence, l_∞(X) is complete and therefore a Banach space.

In summary, we have shown that l_p(X) is a Banach space, ∀p ∈ [1, ∞] ⊂ R_e, when X is a Banach space.

Definition 7.34 Let X be a normed linear space and S ⊆ X. S is said to be complete if S with the natural metric forms a complete metric space.


By Proposition 4.39, a subset of a Banach space is complete if, and only if, it is closed.

Proposition 7.35 Let Y be a normed linear space over K, S₁, S₂ ⊆ Y be separable subsets, and α ∈ K. Then, span(S₁), cl(span(S₁)), S₁ + S₂, αS₁, S₁ ∩ S₂, and S₁ ∪ S₂ are separable subsets of Y.

Proof Let K_Q := Q if K = R, and K_Q := {a + ib ∈ C | a, b ∈ Q} if K = C. Clearly, K_Q is a countable dense set in K. Let D ⊆ S₁ be a countable dense subset. Let D̂ := {∑_{i=1}^n α_i y_i ∈ Y | n ∈ Z₊, α_i ∈ K_Q, y_i ∈ D, i = 1, ..., n}, which is a countable set. Clearly, D̂ ⊆ span(S₁) is a dense subset. Hence, span(S₁) ⊆ Y is separable. By Proposition 4.38, cl(span(S₁)) is separable. It is straightforward to show that S₁ + S₂, αS₁, S₁ ∩ S₂, and S₁ ∪ S₂ are separable. □

Theorem 7.36 In a normed linear space X, any finite-dimensional subspace M ⊆ X is complete.

Proof Let X be a normed linear space over the field K. Let n̄ ∈ Z₊ be the dimension of M, which is well-defined by Theorem 6.51. We will prove the theorem by mathematical induction on n̄.

1° n̄ = 0. Then, M = {ϑ}. Clearly, any Cauchy sequence in M must converge to ϑ ∈ M. Hence, M is complete. n̄ = 1. Let {e₁} ⊆ M be a basis for M. Clearly, e₁ ≠ ϑ and ‖e₁‖ > 0. Fix any Cauchy sequence (x_n)_{n=1}^∞ ⊆ M. Then, x_n = α_n e₁ for some α_n ∈ K, ∀n ∈ N. Since (x_n)_{n=1}^∞ is Cauchy, ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N, ∀n, m ≥ N, we have ‖x_n − x_m‖ < ε‖e₁‖. Then, we have |α_n − α_m| = ‖x_n − x_m‖/‖e₁‖ < ε. Hence, (α_n)_{n=1}^∞ is a Cauchy sequence in K. Then, lim_{n∈N} α_n = α ∈ K since K is complete. It is easy to show that lim_{n∈N} x_n = αe₁ ∈ M. Hence, M is complete.

2° Assume M is complete when n̄ = k − 1 ∈ N.

3° Consider the case n̄ = k ∈ {2, 3, ...} ⊂ N. Let {e₁, ..., e_k} ⊆ M be a basis for M. Define M_i := span({e₁, ..., e_k} \ {e_i}) and δ_i := dist(e_i, M_i), i = 1, ..., k. ∀i = 1, ..., k, we have δ_i ∈ [0, ∞) ⊂ R. M_i is a (k − 1)-dimensional subspace of X. By the inductive assumption, M_i is complete. By Proposition 4.39, M_i is closed. Clearly, e_i ∉ M_i. By Proposition 4.10, δ_i > 0. Let δ := min{δ₁, ..., δ_k} > 0. Fix any Cauchy sequence (x_n)_{n=1}^∞ ⊆ M. ∀n ∈ N, x_n admits a unique representation x_n = ∑_{i=1}^k λ_{n,i} e_i, where λ_{n,i} ∈ K, i = 1, ..., k, by Definition 6.50 and Corollary 6.47. ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N such that ∀n, m ≥ N, we have ‖x_n − x_m‖ < εδ. ∀i ∈ {1, ..., k}, we have ε > ‖x_n − x_m‖/δ = ‖∑_{j=1}^k (λ_{n,j} − λ_{m,j})e_j‖/δ ≥ |λ_{n,i} − λ_{m,i}|δ_i/δ ≥ |λ_{n,i} − λ_{m,i}|. Hence, (λ_{n,i})_{n=1}^∞ ⊆ K is a Cauchy sequence, ∀i ∈ {1, ..., k}. Then, lim_{n∈N} λ_{n,i} = λ_i ∈ K. Let x := ∑_{i=1}^k λ_i e_i ∈ M. Now, it is straightforward to show that lim_{n∈N} ‖x_n − x‖ = 0. Then, lim_{n∈N} x_n = x ∈ M. Hence, M is complete. This completes the induction process. This completes the proof of the theorem. □


Definition 7.37 Let $\mathcal{X}$ be a vector space over the field $\mathbb{K}$ and $\|\cdot\|_1$ and $\|\cdot\|_2$ be two norms defined on $\mathcal{X}$. These two norms are said to be equivalent if $\exists K \in (0,\infty) \subset \mathbb{R}$ such that $\|x\|_1/K \le \|x\|_2 \le K\|x\|_1$, $\forall x \in \mathcal{X}$. %

Clearly, if two norms are equivalent, then the natural metrics they induce are uniformly equivalent.

Theorem 7.38 Let $\mathcal{X}$ be a finite-dimensional vector space over the field $\mathbb{K}$. Any two norms on $\mathcal{X}$ are equivalent.

Proof Let $n \in \mathbb{Z}_+$ be the dimension of $\mathcal{X}$, which is well-defined by Theorem 6.51. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two norms defined on $\mathcal{X}$. We will distinguish three exhaustive and mutually exclusive cases: Case 1: $n = 0$; Case 2: $n = 1$; Case 3: $n \ge 2$.

Case 1: $n = 0$. Then, $\mathcal{X} = \{\vartheta\}$ and $\|\vartheta\|_1 = 0 = \|\vartheta\|_2$. Hence, the two norms are equivalent.

Case 2: $n = 1$. Let $\{e_1\}$ be a basis for $\mathcal{X}$. Clearly, $e_1 \ne \vartheta$. Let $\delta := \|e_1\|_1/\|e_1\|_2 \in (0,\infty) \subset \mathbb{R}$ and $K = \max\{\delta, \delta^{-1}\} \in (0,\infty) \subset \mathbb{R}$. $\forall x \in \mathcal{X}$, $\exists!\, \alpha \in \mathbb{K}$ such that $x = \alpha e_1$. Then, $\|x\|_1 = |\alpha| \|e_1\|_1 = |\alpha| \|e_1\|_2\, \delta = \delta \|x\|_2$ and $\|x\|_1/K \le \|x\|_2 \le K\|x\|_1$. Hence, the two norms are equivalent.

Case 3: $n \ge 2$. Let $\{e_1, \dots, e_n\} \subseteq \mathcal{X}$ be a basis for $\mathcal{X}$. $\forall x \in \mathcal{X}$, by Definition 6.50 and Corollary 6.47, $\exists!\, \alpha_1, \dots, \alpha_n \in \mathbb{K}$ such that $x = \sum_{i=1}^n \alpha_i e_i$. We will show that $\exists K_1 \in (0,\infty) \subset \mathbb{R}$ such that $(\sum_{i=1}^n |\alpha_i|)/K_1 \le \|x\|_1 \le K_1 (\sum_{i=1}^n |\alpha_i|)$. By a similar argument, $\exists K_2 \in (0,\infty) \subset \mathbb{R}$ such that $(\sum_{i=1}^n |\alpha_i|)/K_2 \le \|x\|_2 \le K_2 (\sum_{i=1}^n |\alpha_i|)$. Then, $\|x\|_1/K \le \|x\|_2 \le K\|x\|_1$, where $K := K_1 K_2$. Hence, the two norms are equivalent.

Clearly, $e_i \ne \vartheta$ and $\|e_i\|_1 > 0$, $i = 1, \dots, n$. Let $\delta := \max_{1 \le i \le n} \|e_i\|_1 \in (0,\infty) \subset \mathbb{R}$. Then, $\|x\|_1 \le \sum_{i=1}^n |\alpha_i| \|e_i\|_1 \le \delta (\sum_{i=1}^n |\alpha_i|)$. Define $M_i = \mathrm{span}(\{e_1, \dots, e_n\} \setminus \{e_i\})$, $i = 1, \dots, n$, and $\delta_i = \mathrm{dist}(e_i, M_i)$ (with respect to $\|\cdot\|_1$). Since $\{e_1, \dots, e_n\}$ is a basis for $\mathcal{X}$, then $e_i \notin M_i$. $\forall i \in \{1, \dots, n\}$, by Theorem 7.36, $M_i$ is complete (with respect to $\|\cdot\|_1$). By Proposition 4.39, $M_i$ is closed (with respect to $\|\cdot\|_1$). By Proposition 4.10, we have $\delta_i \in (0,\infty) \subset \mathbb{R}$. Let $\bar\delta := \min\{\delta_1, \dots, \delta_n\} \in (0,\infty) \subset \mathbb{R}$. Then, $\|x\|_1 = \|\sum_{i=1}^n \alpha_i e_i\|_1 \ge |\alpha_i| \delta_i \ge |\alpha_i| \bar\delta$, $\forall i \in \{1, \dots, n\}$. Hence, we have $\|x\|_1 \ge (\bar\delta/n)(\sum_{i=1}^n |\alpha_i|)$. Let $K_1 = \max\{\delta, n/\bar\delta\} > 0$. Then, we have $(\sum_{i=1}^n |\alpha_i|)/K_1 \le \|x\|_1 \le K_1 (\sum_{i=1}^n |\alpha_i|)$. This completes the proof of the theorem. ' &
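As an informal numerical illustration of Theorem 7.38 (a Python sketch, not part of the formal development; all names are our own): on $\mathbb{R}^n$ the $\ell^1$ and $\ell^2$ norms are equivalent, and one admissible constant is $K = \sqrt{n}$, since $\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2$.

```python
import math
import random

def norm1(x):
    # l^1 norm: sum of absolute values of the coordinates
    return sum(abs(t) for t in x)

def norm2(x):
    # l^2 (Euclidean) norm
    return math.sqrt(sum(t * t for t in x))

n = 3
K = math.sqrt(n)  # one admissible equivalence constant for this pair of norms

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-10.0, 10.0) for _ in range(n)]
    # the two-sided bound of Definition 7.37: norm1/K <= norm2 <= K*norm1
    assert norm1(x) / K <= norm2(x) <= K * norm1(x)
```

The constant $K$ produced by the proof (via $K_1 K_2$) need not be the best one; the sketch only verifies that some finite constant works.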

7.5 Compactness

Definition 7.39 Let $\mathcal{X}$ be a normed linear space and $S \subseteq \mathcal{X}$. $S$ is said to be compact if $S$ together with the natural metric forms a compact metric space. %

Proposition 5.29 says that a continuous real-valued function achieves its minimum and maximum on a nonempty countably compact space. This generalizes immediately to infinite-dimensional spaces. Yet the compactness restriction is so severe in infinite-dimensional spaces that it applies to only a minority of problems.
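To see informally why compactness is so restrictive in infinite dimensions, consider the "unit vectors" $e_1, e_2, \dots$ in a sequence space with the sup norm: they all lie in the closed unit ball, yet are pairwise at distance 1, so no subsequence is Cauchy and the ball is not compact. A small Python sketch (finite truncations of the standard argument; all names are ours):

```python
# e_i are the standard "unit vectors", truncated to finitely many coordinates;
# in the sup norm they all lie in the closed unit ball.
N = 50

def e(i, n=N):
    return [1.0 if k == i else 0.0 for k in range(n)]

def sup_dist(x, y):
    # distance induced by the sup norm
    return max(abs(a - b) for a, b in zip(x, y))

# Every pairwise distance equals 1, so (e_i) has no Cauchy (hence no
# convergent) subsequence: the closed unit ball fails to be compact.
dists = {sup_dist(e(i), e(j)) for i in range(N) for j in range(i + 1, N)}
assert dists == {1.0}
```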


7 Banach Spaces

Lemma 7.40 Let $(\mathcal{X}, \mathbb{C})$ be a vector space. Then $\mathcal{X}$ is also a vector space over the field $\mathbb{R}$. Furthermore, if $X := (\mathcal{X}, \mathbb{C}, \|\cdot\|)$ is a normed linear space, then $X_{\mathbb{R}} := (\mathcal{X}, \mathbb{R}, \|\cdot\|)$ is also a normed linear space. $X$ and $X_{\mathbb{R}}$ are isometric and admit the same metric space properties. In particular, $X$ is a Banach space if, and only if, $X_{\mathbb{R}}$ is a Banach space.

Proof Let $(\mathcal{X}, \mathbb{C})$ be a vector space. Note that $\mathbb{R} \subset \mathbb{C}$. Then, $\forall x, y, z \in \mathcal{X}$, $\forall \alpha, \beta \in \mathbb{R}$, we have (i) $x + y = y + x$; (ii) $(x + y) + z = x + (y + z)$; (iii) $\vartheta_{\mathcal{X}} + x = x$; (iv) $\alpha(x + y) = \alpha x + \alpha y$; (v) $(\alpha + \beta)x = \alpha x + \beta x$; (vi) $(\alpha\beta)x = \alpha(\beta x)$; (vii) $0 \cdot x = \vartheta_{\mathcal{X}}$ and $1 \cdot x = x$. Hence, $(\mathcal{X}, \mathbb{R})$ is a vector space.

Let $X := (\mathcal{X}, \mathbb{C}, \|\cdot\|)$ be a normed linear space. Then, $(\mathcal{X}, \mathbb{R})$ is a vector space. $\forall x, y \in \mathcal{X}$, $\forall \alpha \in \mathbb{R}$, we have (i) $\|x\| \in [0, \infty) \subset \mathbb{R}$ and $\|x\| = 0 \Leftrightarrow x = \vartheta_{\mathcal{X}}$; (ii) $\|x + y\| \le \|x\| + \|y\|$; (iii) $\|\alpha x\| = |\alpha| \|x\|$. Hence, $X_{\mathbb{R}} := (\mathcal{X}, \mathbb{R}, \|\cdot\|)$ is a normed linear space. Clearly, $\mathrm{id}_{\mathcal{X}} : X \to X_{\mathbb{R}}$ is an isometry and the natural metrics induced by $X$ and $X_{\mathbb{R}}$ on $\mathcal{X}$ are identical. Hence, $X$ and $X_{\mathbb{R}}$ admit the same metric space properties. Then, $X$ is a Banach space if, and only if, $X$ is complete if, and only if, $X_{\mathbb{R}}$ is complete if, and only if, $X_{\mathbb{R}}$ is a Banach space. This completes the proof of the lemma. ' &

Proposition 7.41 Let $K \subseteq \mathbb{C}^n$ with $n \in \mathbb{Z}_+$. Then, $K$ is compact if, and only if, $K$ is closed and bounded.

Proof "Necessity" By Proposition 5.38, $K$ is complete and totally bounded. Then, $K$ is bounded. By Proposition 4.39, $K$ is closed. "Sufficiency" Let $|\cdot|$ be the norm on $\mathbb{C}^n$. By Lemma 7.40, $X := (\mathbb{C}^n, \mathbb{R}, |\cdot|)$ admits the same metric space properties as $\mathbb{C}^n$. Then, $K \subseteq X$ is closed and bounded. Note that $X$ is isometrically isomorphic to $\mathbb{R}^{2n}$. Hence, $K \subseteq X$ is compact. Then, $K \subseteq \mathbb{C}^n$ is compact. This completes the proof of the proposition. ' &

Proposition 7.42 Let $X$ be a finite-dimensional normed linear space over the field $\mathbb{K}$. $K \subseteq X$ is compact if, and only if, $K$ is closed and bounded.

Proof "Necessity" By Proposition 5.38, $K$ is complete and totally bounded. Then, $K$ is bounded. By Proposition 4.39, $K$ is closed. "Sufficiency" Let $S \subseteq X$ be a basis for $X$, that is, $S$ is linearly independent and $\mathrm{span}(S) = X$. Since $X$ is finite-dimensional, let $n \in \mathbb{Z}_+$ be the dimension of $X$, which is well-defined by Theorem 6.51. We will distinguish two exhaustive and mutually exclusive cases: Case 1: $n = 0$; Case 2: $n \in \mathbb{N}$. Case 1: $n = 0$. Then, $X$ is a singleton set. Clearly, $K$ is compact. Case 2: $n \in \mathbb{N}$. Let $S = \{e_1, \dots, e_n\}$. $\forall x \in X$, $x$ can be uniquely expressed as $\sum_{i=1}^n \alpha_i e_i$ for some $\alpha_1, \dots, \alpha_n \in \mathbb{K}$. This allows us to define a bijective mapping $\psi : X \to \mathbb{K}^n$ by $\psi(x) = (\alpha_1, \dots, \alpha_n)$, $\forall x = \sum_{i=1}^n \alpha_i e_i \in X$. Define an alternative norm $\|\cdot\|_1$ on $X$ by $\|x\|_1 = \|\sum_{i=1}^n \alpha_i e_i\|_1 = (\sum_{i=1}^n |\alpha_i|^2)^{1/2}$, $\forall x \in X$. It is easy to show that $\|\cdot\|_1$ is a norm. By Theorem 7.38, there exists $\xi \in [1, \infty) \subset \mathbb{R}$ such that $\|x\|_1/\xi \le \|x\| \le \xi \|x\|_1$, $\forall x \in X$. Hence, $\psi$ is a homeomorphism. By Proposition 3.10, $\psi(K)$ is closed. By the equivalence of


the two norms, $\psi(K)$ is bounded. By Proposition 5.40 or Proposition 7.41, $\psi(K)$ is compact. By Proposition 5.7, $K$ is compact. This completes the proof of the proposition. ' &

7.6 Quotient Spaces

Proposition 7.43 Let $M \subseteq \mathcal{X}$ be a subspace of a vector space $\mathcal{X}$ over a field $F := (F, +, \cdot, 0, 1)$. $x_1, x_2 \in \mathcal{X}$ are said to be equivalent modulo $M$ if $x_1 - x_2 \in M$. This is easily shown to be an equivalence relation; it partitions the space $\mathcal{X}$ into disjoint subsets, or classes, of equivalent elements: namely, the linear varieties that are distinct translations of the subspace $M$. These classes are often called the cosets of $M$. $\forall x \in \mathcal{X}$, there is a unique coset of $M$, $[x]\,(= x + M)$, such that $x \in [x]$. The quotient of $\mathcal{X}$ modulo $M$ is defined to be the set $N$ of all cosets of $M$. Define vector addition and scalar multiplication on $N$ by, $\forall [x_1], [x_2] \in N$, $\forall \alpha \in F$, $[x_1] + [x_2] := [x_1 + x_2]$ and $\alpha[x_1] := [\alpha x_1]$. Let the null vector of $N$ be $[\vartheta_{\mathcal{X}}]$. Then, $N$ together with the vector addition, scalar multiplication, and the null vector forms a vector space over $F$. This vector space will be called the quotient space of $\mathcal{X}$ modulo $M$ and denoted by $\mathcal{X}/M$. Define a function $\phi : \mathcal{X} \to \mathcal{X}/M$ by $\phi(x) = [x]$, $\forall x \in \mathcal{X}$. Then, $\phi$ is a linear function and will be called the natural homomorphism.

Proof Fix any $[x_1], [x_2], [x_3] \in \mathcal{X}/M$ and any $\alpha, \beta \in F$. We will first show that vector addition and scalar multiplication are well defined. $\forall y_1 \in [x_1]$, $\forall y_2 \in [x_2]$, we have $x_1 - y_1, x_2 - y_2 \in M$; and $(x_1 + x_2) - (y_1 + y_2) \in M$, since $M$ is a subspace. Then, $[x_1] + [x_2] = [x_1 + x_2] = [y_1 + y_2] = [y_1] + [y_2]$. Hence, the vector addition is well defined. $\alpha x_1 - \alpha y_1 = \alpha(x_1 - y_1) \in M$. Then, $\alpha[x_1] = [\alpha x_1] = [\alpha y_1] = \alpha[y_1]$. Hence, the scalar multiplication is well defined.
(i) $[x_1] + [x_2] = [x_1 + x_2] = [x_2 + x_1] = [x_2] + [x_1]$; (ii) $([x_1] + [x_2]) + [x_3] = [(x_1 + x_2) + x_3] = [x_1 + (x_2 + x_3)] = [x_1] + ([x_2] + [x_3])$; (iii) $[x_1] + [\vartheta_{\mathcal{X}}] = [x_1 + \vartheta_{\mathcal{X}}] = [x_1]$; (iv) $\alpha([x_1] + [x_2]) = \alpha[x_1 + x_2] = [\alpha(x_1 + x_2)] = [\alpha x_1 + \alpha x_2] = [\alpha x_1] + [\alpha x_2] = \alpha[x_1] + \alpha[x_2]$; (v) $(\alpha + \beta)[x_1] = [(\alpha + \beta)x_1] = [\alpha x_1 + \beta x_1] = [\alpha x_1] + [\beta x_1] = \alpha[x_1] + \beta[x_1]$; (vi) $(\alpha\beta)[x_1] = [(\alpha\beta)x_1] = [\alpha(\beta x_1)] = \alpha[\beta x_1] = \alpha(\beta[x_1])$; (vii) $0[x_1] = [0 x_1] = [\vartheta_{\mathcal{X}}]$; $1[x_1] = [x_1]$. Hence, $\mathcal{X}/M$ is a vector space over $F$. Clearly, $\phi$ is a linear function. This completes the proof of the proposition. ' &

Proposition 7.44 Let $X := (\mathcal{X}, \mathbb{K}, \|\cdot\|_X)$ be a normed linear space, $M \subseteq \mathcal{X}$ be a closed subspace, and $\mathcal{X}/M$ be the quotient space of $\mathcal{X}$ modulo $M$. Define a norm $\|\cdot\|$ on $\mathcal{X}/M$ by, $\forall [x] \in \mathcal{X}/M$, $\|[x]\| := \inf_{m \in M} \|x - m\|_X = \mathrm{dist}(x, M)$. Then, $X/M := (\mathcal{X}/M, \mathbb{K}, \|\cdot\|)$ is a normed linear space, which will be called the quotient space of $X$ modulo $M$.

Proof By Proposition 7.43, $\mathcal{X}/M$ is a vector space over $\mathbb{K}$. Here, we only need to show that $\|\cdot\|$ defines a norm on $\mathcal{X}/M$. First, we show that $\|\cdot\| : \mathcal{X}/M \to [0, \infty) \subset \mathbb{R}$ is uniquely defined. $\forall [x] \in \mathcal{X}/M$, $\forall y \in [x]$, $y - x \in M$. Then, we have $\|[y]\| =$


$\inf_{m \in M} \|y - m\|_X = \inf_{m \in M} \|x - (m - (y - x))\|_X = \inf_{m \in M} \|x - m\|_X = \|[x]\| \le \|x\|_X < +\infty$, since $y - x \in M$. Hence, $\|\cdot\|$ is uniquely defined. Next, we show that $\|\cdot\|$ defines a norm on $\mathcal{X}/M$. $\forall [x_1], [x_2] \in \mathcal{X}/M$, $\forall \alpha \in \mathbb{K}$, we have:

(i) $\|[x_1]\| \in [0, \infty) \subset \mathbb{R}$, since $\|[x_1]\| \le \|x_1\|_X < +\infty$. $[x_1] = [\vartheta_{\mathcal{X}}]$ implies $\|[x_1]\| = \inf_{m \in M} \|m\|_X = 0$; and $\|[x_1]\| = 0$ implies $\mathrm{dist}(x_1, M) = 0$, which, by Proposition 4.10 and the closedness of $M$, implies $x_1 \in M$ and hence $[x_1] = [\vartheta_{\mathcal{X}}]$. Hence, $[x_1] = [\vartheta_{\mathcal{X}}] \Leftrightarrow \|[x_1]\| = 0$.

(ii) $\|[x_1] + [x_2]\| = \|[x_1 + x_2]\| = \inf_{m \in M} \|x_1 + x_2 - m\|_X$. Note that $\|[x_i]\| = \inf_{m \in M} \|x_i - m\|_X$, $i = 1, 2$. $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, $\exists m_i \in M$, $i = 1, 2$, such that $\|x_i - m_i\|_X < \|[x_i]\| + \epsilon$. Then, we have $\|x_1 + x_2 - m_1 - m_2\|_X \le \|x_1 - m_1\|_X + \|x_2 - m_2\|_X \le \|[x_1]\| + \|[x_2]\| + 2\epsilon$. Hence, we have $\|[x_1] + [x_2]\| \le \|[x_1]\| + \|[x_2]\| + 2\epsilon$. By the arbitrariness of $\epsilon$, we have $\|[x_1] + [x_2]\| \le \|[x_1]\| + \|[x_2]\|$.

(iii) $\|\alpha[x_1]\| = \|[\alpha x_1]\| = \inf_{m \in M} \|\alpha x_1 - m\|_X$. We will distinguish two exhaustive and mutually exclusive cases: Case 1: $\alpha = 0$; Case 2: $\alpha \ne 0$. Case 1: $\alpha = 0$. $\|\alpha[x_1]\| = \inf_{m \in M} \|m\|_X = 0 = |\alpha| \|[x_1]\|$. Case 2: $\alpha \ne 0$. $\|\alpha[x_1]\| = \inf_{m \in M} \|\alpha x_1 - \alpha m\|_X = \inf_{m \in M} |\alpha| \|x_1 - m\|_X = |\alpha| \|[x_1]\|$. Hence, in both cases, we have $\|\alpha[x_1]\| = |\alpha| \|[x_1]\|$.

This shows that $\|\cdot\|$ is a norm on $\mathcal{X}/M$. Hence, $X/M$ is a normed linear space. This completes the proof of the proposition. ' &

Proposition 7.45 Let $X$ be a Banach space and $M \subseteq X$ be a closed subspace. Then, the quotient space $X/M$ is a Banach space.

Proof By Proposition 7.44, $X/M$ is a normed linear space. Here, we need only show that $X/M$ is complete. Let $\|\cdot\|_X$ be the norm on $X$ and $\|\cdot\|$ be the norm on $X/M$. We will prove this using Proposition 7.27. Fix any absolutely summable series $([x_n])_{n=1}^\infty \subseteq X/M$, so that $\sum_{n=1}^\infty \|[x_n]\| < +\infty$. $\forall n \in \mathbb{N}$, $\exists y_n \in [x_n]$ such that $\|y_n\|_X < \|[x_n]\| + 2^{-n}$. Then, $\sum_{i=1}^\infty \|y_i\|_X < \sum_{i=1}^\infty (\|[x_i]\| + 2^{-i}) = \sum_{i=1}^\infty \|[x_i]\| + 1 < +\infty$. Then, $(y_n)_{n=1}^\infty \subseteq X$ is absolutely summable. By Proposition 7.27, $\sum_{n=1}^\infty y_n = y \in X$, since $X$ is complete. Now, it is easy to show that $\sum_{n=1}^\infty [x_n] = \sum_{n=1}^\infty [y_n] = [y] \in X/M$. Hence, $X/M$ is complete by Proposition 7.27. This completes the proof of the proposition.
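The quotient norm of Proposition 7.44 can be computed concretely. In the following Python sketch (an informal illustration under conventions of our own choosing), $X = \mathbb{R}^2$ with the Euclidean norm and $M = \mathrm{span}\{(1,1)\}$; then $\|[x]\| = \mathrm{dist}(x, M)$ is attained at the orthogonal projection of $x$ onto $M$ and is constant on each coset $x + M$.

```python
import math

def quotient_norm(x):
    # dist(x, M) for M = span{(1, 1)} in R^2 with the Euclidean norm,
    # computed via the orthogonal projection of x onto M
    mx, my = 1.0, 1.0
    t = (x[0] * mx + x[1] * my) / (mx * mx + my * my)  # projection coefficient
    return math.hypot(x[0] - t * mx, x[1] - t * my)

assert quotient_norm((1.0, 1.0)) == 0.0                 # x in M  =>  [x] = [theta]
assert abs(quotient_norm((1.0, -1.0)) - math.sqrt(2)) < 1e-12
# the value depends only on the coset: (4, 2) = (1, -1) + 3*(1, 1)
assert abs(quotient_norm((1.0, -1.0)) - quotient_norm((4.0, 2.0))) < 1e-12
```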
' &

Definition 7.46 Let $(\mathcal{X}, \mathbb{K})$ be a vector space and $\|\cdot\| : \mathcal{X} \to [0, \infty) \subset \mathbb{R}$. Assume that $\|\cdot\|$ satisfies (ii) and (iii) of Definition 7.1 and $\|\vartheta\| = 0$, but not necessarily (i). Then, $\|\cdot\|$ is called a pseudo-norm. %

Proposition 7.47 Let $(\mathcal{X}, \mathbb{K})$ be a vector space and $\|\cdot\| : \mathcal{X} \to [0, \infty) \subset \mathbb{R}$ be a pseudo-norm on $\mathcal{X}$. Then, the set $M := \{x \in \mathcal{X} \mid \|x\| = 0\}$ is a subspace of $(\mathcal{X}, \mathbb{K})$. On the quotient space $\mathcal{X}/M$, define a norm $\|\cdot\|_1 : \mathcal{X}/M \to [0, \infty) \subset \mathbb{R}$ by $\|[x]\|_1 = \|x\|$, $\forall [x] \in \mathcal{X}/M$. Then, the space $(\mathcal{X}/M, \mathbb{K}, \|\cdot\|_1)$ is a normed linear space, which will be called the quotient space of $(\mathcal{X}, \mathbb{K})$ modulo $\|\cdot\|$.

Proof We need the following claim.

Claim 7.47.1 $\forall x \in \mathcal{X}$, $\forall m \in M$, we have $\|x\| = \|x + m\|$.


Proof of Claim Note that $\|x\| \le \|x + m\| + \|{-m}\| = \|x + m\| \le \|x\| + \|m\| = \|x\|$, since $\|m\| = \|{-m}\| = 0$. This completes the proof of the claim. ' &

Clearly, $\vartheta_{\mathcal{X}} \in M \ne \emptyset$. $\forall m_1, m_2 \in M$, $\forall \alpha \in \mathbb{K}$, we have $\alpha m_1 \in M$, by the properties of the pseudo-norm, and $m_1 + m_2 \in M$ by Claim 7.47.1. Hence, $M$ is a subspace of $(\mathcal{X}, \mathbb{K})$. By Proposition 7.43, $\mathcal{X}/M$ is a vector space over $\mathbb{K}$. By Claim 7.47.1, $\|\cdot\|_1 : \mathcal{X}/M \to [0, \infty) \subset \mathbb{R}$ is uniquely defined. Next, we show that $\|\cdot\|_1$ is a norm on $\mathcal{X}/M$. $\forall [x_1], [x_2] \in \mathcal{X}/M$, $\forall \alpha \in \mathbb{K}$: $\|[x_1]\|_1 = \|x_1\| \in [0, \infty) \subset \mathbb{R}$. $[x_1] = [\vartheta_{\mathcal{X}}]$ implies that $\|[x_1]\|_1 = \|[\vartheta_{\mathcal{X}}]\|_1 = \|\vartheta_{\mathcal{X}}\| = 0$. $\|[x_1]\|_1 = 0$ implies that $\|x_1\| = 0$ and $x_1 \in M$, which further implies that $x_1 \in [\vartheta_{\mathcal{X}}]$ and $[x_1] = [\vartheta_{\mathcal{X}}]$. $\|[x_1] + [x_2]\|_1 = \|[x_1 + x_2]\|_1 = \|x_1 + x_2\| \le \|x_1\| + \|x_2\| = \|[x_1]\|_1 + \|[x_2]\|_1$. $\|\alpha[x_1]\|_1 = \|[\alpha x_1]\|_1 = \|\alpha x_1\| = |\alpha| \|x_1\| = |\alpha| \|[x_1]\|_1$. Hence, $\|\cdot\|_1$ is a norm on $\mathcal{X}/M$. Therefore, $(\mathcal{X}/M, \mathbb{K}, \|\cdot\|_1)$ is a normed linear space. This completes the proof of the proposition. ' &
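A concrete instance of Proposition 7.47 (an informal sketch, with notation of our own choosing): on $\mathbb{R}^2$, $p(x) = |x_1|$ is a pseudo-norm; $M = \{x \mid p(x) = 0\}$ is the $x_2$-axis, and $p$ descends to a genuine norm on the quotient, which may be identified with $\mathbb{R}$.

```python
# p is a pseudo-norm on R^2: it satisfies the triangle inequality and
# homogeneity, but vanishes on the whole subspace M = {(0, t) : t real}.
p = lambda x: abs(x[0])

assert p((0.0, 5.0)) == 0.0             # a nonzero vector with p = 0: p is not a norm
assert p((3.0, 1.0)) == p((3.0, -7.0))  # p is constant on each coset x + M ...
assert p((3.0, 1.0)) == 3.0             # ... so ||[x]||_1 := p(x) is well defined
```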

7.7 The Stone-Weierstrass Theorem

Definition 7.48 Let $X$ be a set, $f : X \to \mathbb{R}$, and $g : X \to \mathbb{R}$. $f \vee g : X \to \mathbb{R}$ and $f \wedge g : X \to \mathbb{R}$ are defined by $(f \vee g)(x) = \max\{f(x), g(x)\}$ and $(f \wedge g)(x) = \min\{f(x), g(x)\}$, $\forall x \in X$. %

Proposition 7.49 Let $\mathcal{X}$ be a topological space and $f : \mathcal{X} \to \mathbb{R}$ and $g : \mathcal{X} \to \mathbb{R}$ be continuous at $x_0 \in \mathcal{X}$. Then, $f \vee g$ and $f \wedge g$ are continuous at $x_0$. If furthermore $f$ and $g$ are continuous, then $f \vee g$ and $f \wedge g$ are continuous.

Proof This is straightforward.

' &

Example 7.50 Let $\mathcal{X}$ be a topological space and $\mathcal{Y}$ be a normed linear space over the field $\mathbb{K}$. Let $(\mathrm{M}(\mathcal{X}, \mathcal{Y}), \mathbb{K})$ be the vector space defined in Example 6.20. Let $V := \{f \in \mathrm{M}(\mathcal{X}, \mathcal{Y}) \mid f \text{ is continuous}\}$. Then, by Propositions 3.12, 3.32, 6.25, and 7.23, $V$ is a subspace of $(\mathrm{M}(\mathcal{X}, \mathcal{Y}), \mathbb{K})$. This subspace will be denoted by $\mathrm{C_v}(\mathcal{X}, \mathcal{Y})$. %

Definition 7.51 Let $\mathcal{X}$ be a topological space and $\mathrm{C_v}(\mathcal{X}, \mathbb{R})$ be the vector space defined in Example 7.50. $L \subseteq \mathrm{C_v}(\mathcal{X}, \mathbb{R})$ is said to be a lattice if $L \ne \emptyset$ and, $\forall f, g \in L$, $f \vee g \in L$ and $f \wedge g \in L$. A subspace $M \subseteq \mathrm{C_v}(\mathcal{X}, \mathbb{R})$ is said to be an algebra if $\forall f, g \in M$, $fg \in M$, where $fg$ is the product of the functions $f$ and $g$. %

Proposition 7.52 Let $\mathcal{X} := (X, \mathcal{O})$ be a compact space, $\mathrm{C}(\mathcal{X}, \mathbb{R})$ be the Banach space as defined in Example 7.32, and $L \subseteq \mathrm{C}(\mathcal{X}, \mathbb{R})$ be a lattice. Assume that $h : X \to \mathbb{R}$, defined by $h(x) = \inf_{f \in L} f(x)$, $\forall x \in X$, is continuous. Then, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, $\exists g \in L$ such that $0 \le g(x) - h(x) < \epsilon$, $\forall x \in X$.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: $X = \emptyset$; Case 2: $X \ne \emptyset$. Case 1: $X = \emptyset$. $\mathrm{C}(\mathcal{X}, \mathbb{R})$ is a singleton set and $L = \mathrm{C}(\mathcal{X}, \mathbb{R})$ since $L$ is a lattice and therefore nonempty. Then, the result holds. Case 2: $X \ne \emptyset$. $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, $\forall x \in X$, since $h(x) \in \mathbb{R}$, then $\exists f_x \in L$ such that


$0 \le f_x(x) - h(x) < \epsilon/3$. Since $f_x$ and $h$ are continuous, then $\exists O_x \in \mathcal{O}$ with $x \in O_x$ such that $\forall y \in O_x$, we have $|f_x(y) - f_x(x)| < \epsilon/3$ and $|h(y) - h(x)| < \epsilon/3$. Then, $|f_x(y) - h(y)| \le |f_x(y) - f_x(x)| + |f_x(x) - h(x)| + |h(x) - h(y)| < \epsilon$, $\forall y \in O_x$. Then, $X \subseteq \bigcup_{x \in X} O_x$. Since $\mathcal{X}$ is compact, then there exists a finite set $X_N \subseteq X$ such that $X \subseteq \bigcup_{x \in X_N} O_x$. Since $X \ne \emptyset$, then $X_N$ must be nonempty. Let $g := \bigwedge_{x \in X_N} f_x \in L$. $\forall x \in X$, $\exists x_0 \in X_N$ such that $x \in O_{x_0}$ and $0 \le g(x) - h(x) \le f_{x_0}(x) - h(x) < \epsilon$. This completes the proof of the proposition. ' &

Proposition 7.53 Let $\mathcal{X} := (X, \mathcal{O})$ be a compact space, $\mathrm{C}(\mathcal{X}, \mathbb{R})$ be the Banach space as defined in Example 7.32, and $L \subseteq \mathrm{C}(\mathcal{X}, \mathbb{R})$ be a lattice satisfying the following conditions:

(i) $L$ separates points, that is, $\forall x, y \in X$ with $x \ne y$, $\exists f \in L$ such that $f(x) \ne f(y)$.
(ii) $\forall f \in L$, $\forall c \in \mathbb{R}$, $c + f, cf \in L$.

Then, $\forall h \in \mathrm{C}(\mathcal{X}, \mathbb{R})$, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, there exists $g \in L$ such that $0 \le g(x) - h(x) < \epsilon$, $\forall x \in X$.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: $X = \emptyset$; Case 2: $X \ne \emptyset$. Case 1: $X = \emptyset$. $\mathrm{C}(\mathcal{X}, \mathbb{R})$ is a singleton set and $L = \mathrm{C}(\mathcal{X}, \mathbb{R})$ since $L$ is a lattice and therefore nonempty. Then, the result holds. Case 2: $X \ne \emptyset$. We need the following two results.

Claim 7.53.1 $\forall a, b \in \mathbb{R}$, $\forall x_1, x_2 \in X$ with $x_1 \ne x_2$, $\exists f \in L$ such that $f(x_1) = a$ and $f(x_2) = b$.

Proof of Claim Let $g \in L$ be such that $g(x_1) \ne g(x_2)$, which exists by (i). Let
\[
f = \frac{a - b}{g(x_1) - g(x_2)}\, g + \frac{b\, g(x_1) - a\, g(x_2)}{g(x_1) - g(x_2)} \in L.
\]

Then, f is the desired function.

' &

Claim 7.53.2 $\forall a, b \in \mathbb{R}$ with $a \le b$, $\forall$ closed set $F \subseteq X$, $\forall x_0 \in X$ with $x_0 \notin F$, $\exists f \in L$ such that $f(x_0) = a$, $f(x) \ge a$, $\forall x \in X$, and $f(x) > b$, $\forall x \in F$.

Proof of Claim $\forall x \in F$, we have $x \ne x_0$. By Claim 7.53.1, $\exists f_x \in L$ such that $f_x(x_0) = a$ and $f_x(x) = b + 1$. Let $O_x := \{\bar{x} \in X \mid f_x(\bar{x}) > b\}$. Then, $O_x \in \mathcal{O}$ since $f_x$ is continuous. Clearly, $x \in O_x$ and $F \subseteq \bigcup_{x \in F} O_x$. By Proposition 5.5, $F$ is compact. Then, there exists a finite set $F_N \subseteq F$ such that $F \subseteq \bigcup_{x \in F_N} O_x$. Take $g \in L \ne \emptyset$. Then, $a = 0 \cdot g + a \in L$ by (ii). Let $f := a \vee (\bigvee_{x \in F_N} f_x) \in L$. Clearly, $f(x_0) = a$ and $f(x) \ge a$, $\forall x \in X$. $\forall x \in F$, $\exists \bar{x}_0 \in F_N$ such that $x \in O_{\bar{x}_0}$. Then, $f(x) \ge f_{\bar{x}_0}(x) > b$. Hence, $f$ is the desired function. This completes the proof of the claim. ' &


¯ Hence, L¯ is a lattice. We will show to show that f1 ∨ f2 ∈ L¯ and f1 ∧ f2 ∈ L. that h(x) = inff ∈L¯ f (x), ∀x ∈ X . Then, the result follows from Proposition 7.52. ∀x0 ∈ X , ∀η ∈ (0, ∞) ⊂ R, let Fx0 ,η := {x ∈ X | h(x) ≥ h(x0 ) + η}. By the continuity of h and Proposition 3.10, Fx0 ,η is closed. Clearly, x0 ∈ / Fx0 ,η . Now, by Claim 7.53.2, ∃fx0 ,η ∈ L such that fx0 ,η (x0 ) = h(x0 ) + η, fx0 ,η (x) ≥ h(x0 ) + η, ∀x ∈ X , and fx0 ,η (x) > b, ∀x ∈ Fx0 ,η . It is clear that fx0 ,η (x) > h(x), ∀x ∈ X . ¯ Then, inf ¯ f (x0 ) ≤ fx0 ,η (x0 ) = h(x0 ) + η. By the definition Hence, fx0 ,η ∈ L. f ∈L ¯ of L and the arbitrariness of η, we have h(x0 ) ≤ inff ∈L¯ f (x0 ) ≤ h(x0 ). Hence, we have h(x0 ) = inff ∈L¯ f (x0 ). This completes the proof of the proposition. ' & Lemma 7.54 ∀ ∈ (0, ∞) ⊂ R, there exists a polynomial P (s) in one variable such that |P (s) − |s|| < , ∀s ∈ [−1, 1] ⊂ R. Proof This result is a special case of Bernsteˇın Approximation Theorem (Bartle, 1976, pg. 171). For a generalization of the Bernsteˇın Approximation Theorem to multi-variable case, see Page 532. ' & Lemma 7.55 Let X be a nonempty compact space and A ⊆ C(X , R) be an algebra. Then, A is an algebra. Proof Note that A is a subspace of C(X , R) by Definition 7.51. Then, by Proposition 7.17, A is a6 subspace 1) ⊂ R, 6 of C(X , R). ∀f, g ∈ A, ∀ ∈ (0,  ∃f¯, g¯ ∈ A such that 6f − f¯6 < 2(1+g) and g − g ¯ < 2(1+f ) . Then, 6 6  ¯(x)g(x) − f ¯  f¯g¯ ∈ A since A is an algebra, and 6fg − f¯g¯ 6 = maxx∈X f (x)g(x)     ¯ ≤ maxx∈X |f (x)| ≤ maxx∈X |f (x)g(x) − f (x)g(x)| ¯  − f (x)g(x) ¯ ¯  + f (x)g(x) |g(x) − g(x)| ¯ + maxx∈X |g(x)| ¯ · f (x) − f¯(x) ≤ f g − g ¯ + g + g¯ − g 6 6 ¯ 6f − f¯6 < f  + (g+g−g) < . Hence, fg ∈ A by the arbitrariness of . 2(1+f ) 2(1+g) This shows that A is an algebra. This completes the proof of the lemma.

' &

Theorem 7.56 (Stone-Weierstrass Theorem) Let $\mathcal{X} := (X, \mathcal{O})$ be a compact space and $\mathrm{C}(\mathcal{X}, \mathbb{R})$ be the Banach space defined in Example 7.32. Assume that $A \subseteq \mathrm{C}(\mathcal{X}, \mathbb{R})$ is an algebra and satisfies

(i) $A$ separates points, that is, $\forall x_1, x_2 \in X$ with $x_1 \ne x_2$, $\exists f \in A$ such that $f(x_1) \ne f(x_2)$.
(ii) $A$ contains all constant functions.

Then, $\bar{A} = \mathrm{C}(\mathcal{X}, \mathbb{R})$, that is, $A$ is dense in $\mathrm{C}(\mathcal{X}, \mathbb{R})$.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: $X = \emptyset$; Case 2: $X \ne \emptyset$. Case 1: $X = \emptyset$. $\mathrm{C}(\mathcal{X}, \mathbb{R})$ is a singleton set and $\bar{A} = \mathrm{C}(\mathcal{X}, \mathbb{R})$ since $A$ is a subspace and therefore nonempty. Then, the result holds.


Case 2: $X \ne \emptyset$. By Lemma 7.55, $\bar{A}$ is an algebra. We need the following claim.

Claim 7.56.1 $\forall f \in \bar{A}$, $|f| \in \bar{A}$.

Proof of Claim Fix $f \in \bar{A}$. $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, $\exists g \in \bar{A}$ such that $\|f - g\| < \epsilon/2$. By Proposition 5.29 and Definitions 5.1 and 5.21, $\exists N \in (0, \infty) \subset \mathbb{R}$ such that $\|g\| < N$. Then, $g/N \in \bar{A}$ since $\bar{A}$ is an algebra. By Lemma 7.54, there exists a polynomial $P(s)$ such that $|P(s) - |s|| < \frac{\epsilon}{2N}$, $\forall s \in [-1, 1] \subset \mathbb{R}$. Hence, we have $P \circ (g/N) \in \bar{A}$ since $\bar{A}$ is an algebra. Furthermore, $|P(g(x)/N) - |g(x)|/N| < \frac{\epsilon}{2N}$, $\forall x \in X$. Let $h = N \cdot P \circ (g/N) \in \bar{A}$. Then, we have $\|h - |g|\| < \epsilon/2$. Note that $\||f| - h\| \le \||f| - |g|\| + \||g| - h\| \le \|f - g\| + \||g| - h\| < \epsilon$. Hence, $|f| \in \bar{\bar{A}} = \bar{A}$, by the arbitrariness of $\epsilon$. This completes the proof of the claim. ' &

$\forall f, g \in \bar{A}$, we have
\[
f \vee g = \frac{1}{2}(f + g) + \frac{1}{2}|f - g| \in \bar{A}, \qquad
f \wedge g = \frac{1}{2}(f + g) - \frac{1}{2}|f - g| \in \bar{A}.
\]

Hence, $\bar{A}$ is a lattice. Clearly, $\bar{A}$ separates points since $A$ separates points. By Proposition 7.53, $\forall h \in \mathrm{C}(\mathcal{X}, \mathbb{R})$, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, there exists $g \in \bar{A}$ such that $0 \le g(x) - h(x) < \epsilon$, $\forall x \in X$. Then, $\|g - h\| \le \epsilon$, and $h \in \bar{\bar{A}} = \bar{A}$, where the last equality follows from Proposition 3.3. Hence, we have $\bar{A} = \mathrm{C}(\mathcal{X}, \mathbb{R})$. This completes the proof of the theorem. ' &

Corollary 7.57 Let $X \subseteq \mathbb{R}^n$ be a closed and bounded set with the subset topology $\mathcal{O}$, where $n \in \mathbb{Z}_+$, $\mathcal{X} := (X, \mathcal{O})$, and $f : \mathcal{X} \to \mathbb{R}$ be a continuous function. Then, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, there exists a polynomial $P : X \to \mathbb{R}$ in $n$ variables such that $|f(x) - P(x)| < \epsilon$, $\forall x \in X$.

Proof By Proposition 5.40, $\mathcal{X}$ is a compact space. Let $\mathrm{C}(\mathcal{X}, \mathbb{R})$ be the Banach space defined in Example 7.32. Then, $f \in \mathrm{C}(\mathcal{X}, \mathbb{R})$. Let $A$ be the set of all polynomials in $n$ variables on $X$. Clearly, $A \subseteq \mathrm{C}(\mathcal{X}, \mathbb{R})$ and is a linear subspace of $\mathrm{C}(\mathcal{X}, \mathbb{R})$. Then, it is easy to show that $A$ is an algebra. $\forall x_1, x_2 \in X$ with $x_1 \ne x_2$, there exists a coordinate $i_0 \in \{1, \dots, n\}$ such that $\pi_{i_0}(x_1) \ne \pi_{i_0}(x_2)$. Then, the polynomial $p \in A$ given by $p(x) = \pi_{i_0}(x)$, $\forall x \in X$, separates $x_1$ and $x_2$. Hence, $A$ separates points. Clearly, $A$ contains all the constant functions. By the Stone-Weierstrass Theorem, $\bar{A} = \mathrm{C}(\mathcal{X}, \mathbb{R})$. This completes the proof of the corollary. ' &

Corollary 7.58 Let $S_1 = [0, 2\pi] \subset \mathbb{R}$ and $f \in \mathrm{C}(S_1, \mathbb{R})$. Assume that $f(0) = f(2\pi)$. Let $M = \{g \in \mathrm{C}(S_1, \mathbb{R}) \mid g(x) = \cos(nx), \forall x \in S_1, \text{ or } g(x) = \sin(nx), \forall x \in S_1, \text{ where } n \in \mathbb{Z}_+\}$. Let $A = \mathrm{span}(M) \subseteq \mathrm{C}(S_1, \mathbb{R})$. Then, $f \in \bar{A}$.

Proof Let $S_2 \subset \mathbb{R}^2$ be the unit circle: $S_2 := \{(y_1, y_2) \in \mathbb{R}^2 \mid y_1^2 + y_2^2 = 1\}$. Define a mapping $\Psi : S_1 \to S_2$ by $\Psi(x) = (\cos(x), \sin(x))$, $\forall x \in S_1$. Clearly, $\Psi$ is surjective and continuous. Note that $S_1$ is compact and $S_2$ is Hausdorff. Since $f(0) = f(2\pi)$, we may define a function $\Phi : S_2 \to \mathbb{R}$ such that $\Phi \circ \Psi = f$, which


is continuous. By Proposition 5.18, $\Phi$ is continuous. Note that $S_2$ is closed and bounded in $\mathbb{R}^2$; then $S_2$ is compact by Proposition 5.40. Hence, $\Phi \in \mathrm{C}(S_2, \mathbb{R})$. By Corollary 7.57, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, there exists a polynomial $P : S_2 \to \mathbb{R}$ such that $|\Phi(y_1, y_2) - P(y_1, y_2)| < \epsilon$, $\forall (y_1, y_2) \in S_2$. Then, $|f(x) - P \circ \Psi(x)| = |\Phi \circ \Psi(x) - P \circ \Psi(x)| < \epsilon$, $\forall x \in S_1$. Note that $P \circ \Psi \in A$, since, $\forall \gamma, \theta \in \mathbb{R}$,
\[
(\sin\theta)^2 = \tfrac{1}{2}(1 - \cos(2\theta)); \qquad (\cos\theta)^2 = \tfrac{1}{2}(1 + \cos(2\theta));
\]
\[
\sin\gamma \cos\theta = \tfrac{1}{2}(\sin(\gamma + \theta) + \sin(\gamma - \theta));
\]
\[
\sin\gamma \sin\theta = \tfrac{1}{2}(\cos(\gamma - \theta) - \cos(\gamma + \theta));
\]
\[
\cos\gamma \cos\theta = \tfrac{1}{2}(\cos(\gamma - \theta) + \cos(\gamma + \theta)).
\]

Hence, $f \in \bar{A}$, by the arbitrariness of $\epsilon$. This completes the proof of the corollary. ' &

For ease of presentation below, we will define two functions: $\mathrm{sqr} : \mathbb{R} \to \mathbb{R}$ by $\mathrm{sqr}(x) = x^2$, $\forall x \in \mathbb{R}$, and $\mathrm{sqrt} : [0, \infty) \to [0, \infty)$ by $\mathrm{sqrt}(x) = \sqrt{x}$, $\forall x \in [0, \infty)$.

Corollary 7.59 Let $\mathcal{X} := (X, \mathcal{O})$ be a compact space and $A \subseteq \mathrm{C}(\mathcal{X}, \mathbb{R})$ be an algebra that satisfies

(i) $A$ separates points.
(ii) $\exists f_0 \in A$ such that $f_0(x) \ne 0$, $\forall x \in X$.

Then, the constant function $1 \in \bar{A} = \mathrm{C}(\mathcal{X}, \mathbb{R})$.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: $X = \emptyset$; Case 2: $X \ne \emptyset$. Case 1: $X = \emptyset$. $\mathrm{C}(\mathcal{X}, \mathbb{R})$ is a singleton set and $\bar{A} = \mathrm{C}(\mathcal{X}, \mathbb{R})$ since $A$ is a subspace and therefore nonempty. Then, the result holds. Case 2: $X \ne \emptyset$. By Lemma 7.55, $\bar{A}$ is an algebra. Since $\bar{A}$ is an algebra, then $\mathrm{sqr} \circ f_0 \in \bar{A}$, which satisfies $\mathrm{sqr} \circ f_0(x) > 0$, $\forall x \in X$. Furthermore, $g_0 := \mathrm{sqr} \circ f_0 / \|\mathrm{sqr} \circ f_0\| \in \bar{A}$, which satisfies $g_0 : X \to (0, 1] \subset \mathbb{R}$. By Corollary 7.57, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, there exists a polynomial $Q_\epsilon$ in one variable such that $|\sqrt{s} - Q_\epsilon(s)| < \epsilon$, $\forall s \in [0, 1] \subset \mathbb{R}$, and $Q_\epsilon(0) = 0$. $\forall f : X \to [0, 1] \subset \mathbb{R}$ with $f \in \bar{A}$, $Q_\epsilon \circ f \in \bar{A}$ since $\bar{A}$ is an algebra and $Q_\epsilon(0) = 0$. Then, $\mathrm{sqrt} \circ f \in \bar{\bar{A}} = \bar{A}$. Recursively, we may conclude that $\mathrm{sqrt}^n \circ g_0 \in \bar{A}$, $\forall n \in \mathbb{N}$. By Proposition 5.29, $\exists x_m \in X$ such that $g_0(x) \ge g_0(x_m) =: \gamma > 0$, $\forall x \in X$. $\forall \epsilon \in (0, 1) \subset \mathbb{R}$, $\exists n_0 \in \mathbb{N}$ such that $|1 - \gamma^{2^{-n_0}}| < \epsilon$. Then, $\|1 - \mathrm{sqrt}^{n_0} \circ g_0\| < \epsilon$. Hence, the constant function $1 \in \bar{\bar{A}} = \bar{A}$.


Clearly, $\bar{A}$ separates points since $A$ does. Hence, by Theorem 7.56, $\bar{A} = \bar{\bar{A}} = \mathrm{C}(\mathcal{X}, \mathbb{R})$. This completes the proof of the corollary. ' &

Proposition 7.60 Let $\mathcal{X}$ be a compact space and $A \subseteq \mathrm{C}(\mathcal{X}, \mathbb{R})$ be an algebra that separates points on $\mathcal{X}$. Then, either $\bar{A} = \mathrm{C}(\mathcal{X}, \mathbb{R})$ or $\exists x_0 \in X$ such that $\bar{A} = \{f \in \mathrm{C}(\mathcal{X}, \mathbb{R}) \mid f(x_0) = 0\}$.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: $X = \emptyset$; Case 2: $X \ne \emptyset$. Case 1: $X = \emptyset$. $\mathrm{C}(\mathcal{X}, \mathbb{R})$ is a singleton set and $\bar{A} = \mathrm{C}(\mathcal{X}, \mathbb{R})$ since $A$ is a subspace and therefore nonempty. Then, the result holds. Case 2: $X \ne \emptyset$. By Lemma 7.55, $\bar{A}$ is an algebra. We will further distinguish two exhaustive and mutually exclusive cases: Case 2a: $1 \in \bar{A}$; Case 2b: $1 \notin \bar{A}$. Case 2a: $1 \in \bar{A}$. Clearly, $\bar{A}$ is an algebra that separates points. By the Stone-Weierstrass Theorem, $\mathrm{C}(\mathcal{X}, \mathbb{R}) = \bar{\bar{A}} = \bar{A}$. Then, the result holds. Case 2b: $1 \notin \bar{A}$. Then, $\bar{A} \subset \mathrm{C}(\mathcal{X}, \mathbb{R})$. By Corollary 7.59, $\forall f \in \bar{A}$, there exists $x \in X$ such that $f(x) = 0$. Then, we have the following claims.

Claim 7.60.1 $\forall f \in \bar{A}$, $|f| \in \bar{A}$. Hence, $\bar{A}$ is a lattice.

Proof of Claim $\forall f \in \bar{A}$, by the compactness of $\mathcal{X}$ and Proposition 5.29, $\exists M \in (0, \infty) \subset \mathbb{R}$ such that $\|f\| \le M$. Let $g := f/M \in \bar{A}$. Then, $g : X \to [-1, 1] \subset \mathbb{R}$. $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, by Lemma 7.54, there exists a polynomial $P : [-1, 1] \to \mathbb{R}$ with $P(0) = 0$ such that $|P(s) - |s|| < \epsilon/M$, $\forall s \in [-1, 1] \subset \mathbb{R}$. Since $\bar{A}$ is an algebra, then $M \cdot P \circ g \in \bar{A}$ and $\||f| - M \cdot P \circ g\| = M \||g| - P \circ g\| < \epsilon$. Hence, $|f| \in \bar{\bar{A}} = \bar{A}$. $\forall f, g \in \bar{A}$, we have
\[
f \vee g = \frac{1}{2}(f + g) + \frac{1}{2}|f - g| \in \bar{A}, \qquad
f \wedge g = \frac{1}{2}(f + g) - \frac{1}{2}|f - g| \in \bar{A}.
\]

Hence, A is a lattice. This completes the proof of the claim.

' &

Claim 7.60.2 $\exists x_0 \in X$ such that $f(x_0) = 0$, $\forall f \in \bar{A}$.

Proof of Claim We will prove this using an argument of contradiction. Suppose the claim is false. $\forall x \in X$, $\exists f_x \in \bar{A}$ such that $f_x(x) \ne 0$. By Claim 7.60.1, $|f_x| \in \bar{A}$ and $|f_x(x)| > 0$. Let $O_x := \{\bar{x} \in X \mid |f_x(\bar{x})| > 0\}$. Since $f_x \in \bar{A} \subseteq \mathrm{C}(\mathcal{X}, \mathbb{R})$, then $x \in O_x \in \mathcal{O}_{\mathcal{X}}$. Hence, we have $X \subseteq \bigcup_{x \in X} O_x$. By the compactness of $\mathcal{X}$, there exists a finite set $X_N \subseteq X$ such that $X \subseteq \bigcup_{x \in X_N} O_x$. Clearly, $X_N \ne \emptyset$ since $X \ne \emptyset$. Let $f := \sum_{x \in X_N} |f_x| \in \bar{A}$. Hence, $f(x) > 0$, $\forall x \in X$. By Corollary 7.59, we have $1 \in \bar{\bar{A}} = \bar{A}$, which is a contradiction. Hence, the claim is true. This completes the proof of the claim. ' &


Claim 7.60.3 $\forall$ closed set $F \subseteq X$ with $x_0 \notin F$, we have:

(i) Let $A_F := \{h \in \mathrm{C}(F, \mathbb{R}) \mid \exists f \in \bar{A} \text{ such that } h = f|_F\}$; then $\bar{A}_F = \mathrm{C}(F, \mathbb{R})$.
(ii) $\forall \epsilon \in (0, 1) \subset \mathbb{R}$, $\exists g \in \bar{A}$ such that $g : X \to [0, 1] \subset \mathbb{R}$ and $g(x) \ge 1 - \epsilon$, $\forall x \in F$.

Proof of Claim Clearly, $A_F$ is an algebra since $\bar{A}$ is an algebra. Furthermore, $A_F$ separates points on $F$ since $\bar{A}$ separates points on $X$. By Proposition 5.5, $F$ with the subset topology is compact. $\forall x \in F$, we have $x \ne x_0$. $\exists f_x \in \bar{A}$ such that $f_x(x) \ne f_x(x_0) = 0$. By Claim 7.60.1, $|f_x| \in \bar{A}$ and $|f_x(x)| > 0$. Let $O_x := \{\bar{x} \in X \mid |f_x(\bar{x})| > 0\}$. Then, we have $x \in O_x \in \mathcal{O}_{\mathcal{X}}$. Hence, $F \subseteq \bigcup_{x \in F} O_x$. By the compactness of $F$, there exists a finite set $F_N \subseteq F$ such that $F \subseteq \bigcup_{x \in F_N} O_x$. Let $f := \sum_{x \in F_N} |f_x|$. Then, $f \in \bar{A}$ and $f(x) > 0$, $\forall x \in F$. Then, $h := f|_F \in A_F$ and $h(x) > 0$, $\forall x \in F$. By Corollary 7.59, we have $\bar{A}_F = \mathrm{C}(F, \mathbb{R})$. Hence, (i) is true.

$\exists M \in (0, \infty) \subset \mathbb{R}$ such that $\|f\| \le M$. Then, $g_0 := f/M \in \bar{A}$ and $g_0 : X \to [0, 1] \subset \mathbb{R}$. By Corollary 7.57, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, there exists a polynomial $Q_\epsilon$ in one variable such that $|\sqrt{s} - Q_\epsilon(s)| < \epsilon$, $\forall s \in [0, 1] \subset \mathbb{R}$, and $Q_\epsilon(0) = 0$. $\forall \bar{f} : X \to [0, 1] \subset \mathbb{R}$ with $\bar{f} \in \bar{A}$, $Q_\epsilon \circ \bar{f} \in \bar{A}$ since $\bar{A}$ is an algebra and $Q_\epsilon(0) = 0$. Then, $\mathrm{sqrt} \circ \bar{f} \in \bar{\bar{A}} = \bar{A}$. Recursively, we may conclude that $\mathrm{sqrt}^n \circ g_0 \in \bar{A}$, $\forall n \in \mathbb{N}$. By the compactness of $F$ and Proposition 5.29, $\exists \gamma \in (0, 1] \subset \mathbb{R}$ such that $g_0(x) \ge \gamma$, $\forall x \in F$. $\forall \epsilon \in (0, 1) \subset \mathbb{R}$, $\exists n_0 \in \mathbb{N}$ such that $\gamma^{2^{-n_0}} \ge 1 - \epsilon$. Then, $g := \mathrm{sqrt}^{n_0} \circ g_0 \in \bar{A}$, $g : X \to [0, 1] \subset \mathbb{R}$ and $g(x) \ge 1 - \epsilon$, $\forall x \in F$. Hence, (ii) is true. This completes the proof of the claim. ' &

Claim 7.60.4 $\forall g \in \mathrm{C}(\mathcal{X}, \mathbb{R})$ with $g(x_0) = 0$ and $g(x) \ge 0$, $\forall x \in X$, we have $g \in \bar{A}$.

Proof of Claim $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, let $O := \{x \in X \mid g(x) < \epsilon/2\}$. Then, $x_0 \in O \in \mathcal{O}_{\mathcal{X}}$ since $g$ is continuous. Let $M := \|g\| \in [0, \infty) \subset \mathbb{R}$ and $\tilde{O} := X \setminus O$, which is closed with $x_0 \notin \tilde{O}$. By Claim 7.60.3, $\exists h_1 \in \bar{A}$ such that $h_1 : X \to [0, 1] \subset \mathbb{R}$ and $h_1(x) \ge 2/3$, $\forall x \in \tilde{O}$. Note that $h_1(x_0) = 0$ by Claim 7.60.2. Let $h_2 := 3M h_1/2 \in \bar{A}$. Then, we have $0 \le h_2(x) \le 3M/2$, $\forall x \in X$, and $h_2(x) \ge M \ge g(x)$, $\forall x \in \tilde{O}$. Let $U := \{x \in O \mid h_2(x) < \epsilon/2\}$. Then, $x_0 \in U \in \mathcal{O}_{\mathcal{X}}$ since $O \in \mathcal{O}_{\mathcal{X}}$, $h_2$ is continuous, and $h_2(x_0) = 0$. Let $\tilde{U} := X \setminus U$, which is closed with $x_0 \notin \tilde{U}$. Furthermore, $g|_{\tilde{U}} \in \mathrm{C}(\tilde{U}, \mathbb{R})$. By Claim 7.60.3, $\exists h_3 \in \bar{A}$ such that $|h_3|_{\tilde{U}}(x) - g|_{\tilde{U}}(x)| < \epsilon$, $\forall x \in \tilde{U}$. Define $h_4 := (h_3 \vee 0) \wedge h_2$. Then, $h_4 \in \bar{A}$ by Claim 7.60.1. $\forall x \in X$, we will show that $|g(x) - h_4(x)| < \epsilon$ by distinguishing three exhaustive and mutually exclusive cases: Case A: $x \in U$; Case B: $x \in O \setminus U$; Case C: $x \in \tilde{O}$.


Case A: $x \in U$. Then, we have $0 \le h_2(x) < \epsilon/2$ and $0 \le g(x) < \epsilon/2$. Then, $0 \le h_4(x) = (h_3(x) \vee 0) \wedge h_2(x) \le h_2(x) < \epsilon/2$. Therefore, $|g(x) - h_4(x)| < \epsilon$. Case B: $x \in O \setminus U$. Then, we have $|h_3(x) - g(x)| < \epsilon$, $h_2(x) \ge 0$, and $0 \le g(x) < \epsilon/2$. Hence, $0 \le h_3(x) \vee 0 < g(x) + \epsilon$. Then, $0 \le h_4(x) < g(x) + \epsilon$ and $-\epsilon/2 < -g(x) \le h_4(x) - g(x) < \epsilon$. Therefore, we have $|g(x) - h_4(x)| < \epsilon$. Case C: $x \in \tilde{O}$. Then, we have $|h_3(x) - g(x)| < \epsilon$ and $h_2(x) \ge M \ge g(x) \ge 0$. Hence, $g(x) - \epsilon < h_3(x) < g(x) + \epsilon$, $g(x) - \epsilon < h_3(x) \vee 0 < g(x) + \epsilon$, and $g(x) - \epsilon < h_4(x) < g(x) + \epsilon$. Therefore, $|g(x) - h_4(x)| < \epsilon$. Hence, in all three cases, we have $|g(x) - h_4(x)| < \epsilon$, and hence $\|g - h_4\| < \epsilon$. By the arbitrariness of $\epsilon$, we have $g \in \bar{\bar{A}} = \bar{A}$. ' &

$\forall f \in \mathrm{C}(\mathcal{X}, \mathbb{R})$ with $f(x_0) = 0$, let $f_+ := f \vee 0$ and $f_- := (-f) \vee 0$. Clearly, $f_+, f_- \in \mathrm{C}(\mathcal{X}, \mathbb{R})$, $f_+(x_0) = 0$, $f_-(x_0) = 0$, and $f = f_+ - f_-$. By Claim 7.60.4, $f_+, f_- \in \bar{A}$. Then, $f \in \bar{A}$ since $\bar{A}$ is a subspace. Combining this with Claim 7.60.2, we have $\bar{A} = \{f \in \mathrm{C}(\mathcal{X}, \mathbb{R}) \mid f(x_0) = 0\}$. Hence, the result holds in this case. This completes the proof of the proposition. ' &
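Corollary 7.58 can be observed numerically. The sketch below (all choices are our own: the function $f$, the trapezoid-rule quadrature, and the truncation order; it is an illustration, not the corollary's proof) builds an element of $\mathrm{span}\{\cos(nx), \sin(nx)\}$ that approximates a smooth periodic $f$ uniformly on $[0, 2\pi]$:

```python
import math

f = lambda x: math.exp(math.cos(x))   # continuous with f(0) = f(2*pi)
M, N = 512, 10                        # quadrature points and harmonics (our choices)
xs = [2 * math.pi * j / M for j in range(M)]

a0 = sum(f(x) for x in xs) / M        # mean value of f (the constant term)

def coeff(g):
    # (1/pi) * integral_0^{2pi} f(x) g(x) dx, via the trapezoid rule,
    # which is highly accurate for smooth periodic integrands
    return (2.0 / M) * sum(f(x) * g(x) for x in xs)

a = [coeff(lambda x, n=n: math.cos(n * x)) for n in range(1, N + 1)]
b = [coeff(lambda x, n=n: math.sin(n * x)) for n in range(1, N + 1)]

def trig_poly(x):
    return a0 + sum(a[n - 1] * math.cos(n * x) + b[n - 1] * math.sin(n * x)
                    for n in range(1, N + 1))

err = max(abs(f(x) - trig_poly(x)) for x in [2 * math.pi * k / 997 for k in range(998)])
assert err < 1e-6
```

For merely continuous $f$ the corollary still guarantees uniform approximation by some trigonometric polynomial, but the truncated Fourier series used here is only one convenient choice for smooth $f$.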

7.8 Linear Operators

Definition 7.61 Let $X$ and $Y$ be normed linear spaces over the field $\mathbb{K}$. A linear operator $A : X \to Y$ is said to be bounded if $\exists M \in [0, \infty) \subset \mathbb{R}$ such that $\|Ax\|_Y \le M\|x\|_X$, $\forall x \in X$. %

Here, the boundedness concept is different from that of Definition 7.12. We are talking about bounded linear operators here, not bounded operators. Definition 7.61 applies only to linear operators.

Proposition 7.62 Let $X$ and $Y$ be normed linear spaces over $\mathbb{K}$ and $A : X \to Y$ be a linear operator. Then,

(i) if $A$ is bounded, then $A$ is uniformly continuous;
(ii) if $A$ is continuous at some $x_0 \in X$, then $A$ is bounded.

Proof (i) $\exists M \in [0, \infty) \subset \mathbb{R}$ such that $\|Ax\|_Y \le M\|x\|_X$, $\forall x \in X$. $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, let $\delta = \epsilon/(1 + M) \in (0, \infty) \subset \mathbb{R}$. $\forall x_1, x_2 \in X$ with $\|x_1 - x_2\|_X < \delta$, we have $\|Ax_1 - Ax_2\|_Y = \|A(x_1 - x_2)\|_Y \le M\|x_1 - x_2\|_X < \epsilon$. Hence, $A$ is uniformly continuous.

(ii) $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, $\exists \delta \in (0, \infty) \subset \mathbb{R}$, $\forall x \in B_X(x_0, \delta)$, we have $\|Ax - Ax_0\|_Y < \epsilon$. $\forall \bar{x} \in B_X(\vartheta_X, \delta)$, we have $\|A\bar{x}\|_Y = \|A(x_0 + \bar{x}) - Ax_0\|_Y < \epsilon$. Hence, $A$ is continuous at $\vartheta_X$. Then, $\exists \delta_0 \in (0, \infty) \subset \mathbb{R}$ such that $\|Ax\|_Y < 1$, $\forall x \in B_X(\vartheta_X, \delta_0)$. Then, $\forall x \in X$, we have two possibilities: (a) $x = \vartheta_X$; then $\|Ax\|_Y = 0 \le \frac{2}{\delta_0}\|x\|_X$. (b) $x \ne \vartheta_X$; then $\frac{\delta_0}{2\|x\|_X}\, x \in B_X(\vartheta_X, \delta_0)$ and we have $\|Ax\|_Y = \frac{2\|x\|_X}{\delta_0} \big\| A\big( \frac{\delta_0}{2\|x\|_X}\, x \big) \big\|_Y < \frac{2}{\delta_0}\|x\|_X$. Hence, $A$ is bounded. This completes the proof of the proposition. ' &


Proposition 7.63 Let $X$ and $Y$ be normed linear spaces over the field $\mathbb{K}$. Let $(\mathrm{M}(X, Y), \mathbb{K})$ be the vector space defined in Example 6.20 with the null vector $\vartheta$. Let $P := \{A \in \mathrm{M}(X, Y) \mid A \text{ is linear}\}$. Then, $P$ is a subspace of $\mathrm{M}(X, Y)$. Define a functional $\|\cdot\|$ on $P$ by $\|A\| := \inf\{M \in [0, \infty) \subset \mathbb{R} \mid \|Ax\|_Y \le M\|x\|_X, \forall x \in X\}$, $\forall A \in P$. Then, $\forall A \in P$, $\|A\|$ may be equivalently expressed as
\[
\|A\| = \sup_{x \in X,\, \|x\|_X \le 1} \|Ax\|_Y
= \max\Big\{ \sup_{x \in X,\, x \ne \vartheta_X} \frac{\|Ax\|_Y}{\|x\|_X},\, 0 \Big\}
= \max\Big\{ \sup_{x \in X,\, \|x\|_X = 1} \|Ax\|_Y,\, 0 \Big\}.
\]

Let $N := \{A \in P \mid \|A\| < +\infty\}$. Then, $(N, \mathbb{K}, \|\cdot\|) =: \mathrm{B}(X, Y)$ is a normed linear space.

Proof It is easy to check that $P$ is closed under vector addition and scalar multiplication in $\mathrm{M}(X, Y)$. Clearly, $\vartheta \in P \ne \emptyset$. Hence, $P$ is a subspace of $\mathrm{M}(X, Y)$. Next, we prove the equivalent expressions for $\|A\|$. $\forall A \in P$, define the set $B_A := \{M \in [0, \infty) \subset \mathbb{R} \mid \|Ax\|_Y \le M\|x\|_X, \forall x \in X\}$. Let $\delta_{A1} := \sup_{x \in X, \|x\|_X \le 1} \|Ax\|_Y$. Since $\|\vartheta_X\|_X = 0$ and $\|A\vartheta_X\|_Y = \|\vartheta_Y\|_Y = 0$, then $\delta_{A1} \ge 0$. $\forall M \in B_A$, we have $\|Ax\|_Y \le M\|x\|_X \le M$, $\forall x \in X$ with $\|x\|_X \le 1$. Hence, we have $\delta_{A1} \le M$ and $\delta_{A1} \le \|A\|$. Let $\delta_{A2} := \max\{\sup_{x \in X, x \ne \vartheta_X} \|Ax\|_Y/\|x\|_X,\, 0\}$. $\forall x \in X$ with $x \ne \vartheta_X$, we have $\delta_{A1} \ge \|A(x/\|x\|_X)\|_Y = \|Ax\|_Y/\|x\|_X$. Hence, we have $\delta_{A2} \le \delta_{A1}$. Let $\delta_{A3} := \max\{\sup_{x \in X, \|x\|_X = 1} \|Ax\|_Y,\, 0\}$. Clearly, we have $0 \le \delta_{A3} \le \delta_{A2}$. On the other hand, $\forall x \in X$, we have either $x = \vartheta_X$, and then $\|Ax\|_Y = 0 \le \delta_{A3}\|x\|_X$; or $x \ne \vartheta_X$, and then $\|x\|_X > 0$ and $\delta_{A3} \ge \|A(x/\|x\|_X)\|_Y = \|Ax\|_Y/\|x\|_X$, and hence $\|Ax\|_Y \le \delta_{A3}\|x\|_X$. Thus, $\forall x \in X$, we have $\|Ax\|_Y \le \delta_{A3}\|x\|_X$. If $\delta_{A3} = +\infty$, then $\|A\| \le \delta_{A3}$. If $\delta_{A3} < +\infty$, then $\delta_{A3} \in B_A$ and $\|A\| \le \delta_{A3}$. Therefore, $\|A\| \le \delta_{A3}$. Hence, $\|A\| = \delta_{A1} = \delta_{A2} = \delta_{A3}$.

It is easy to check that $N$ is closed under vector addition and scalar multiplication in $P$. Clearly, $\vartheta \in N \ne \emptyset$. Hence, $N$ is a subspace of $P$. Finally, we will show that $\|\cdot\|$ defines a norm on $N$. $\forall A_1, A_2 \in N$, $\forall \alpha \in \mathbb{K}$: (i) $\|A_1\| \in [0, \infty) \subset \mathbb{R}$. If $A_1 = \vartheta$, then $\|A_1\| = \sup_{x \in X, \|x\|_X \le 1} \|A_1 x\|_Y = \sup_{x \in X, \|x\|_X \le 1} 0 = 0$. On the other hand, if $\|A_1\| = 0$, then, $\forall x \in X$, $0 \le \|A_1 x\|_Y \le \|A_1\| \|x\|_X = 0$, which implies that $A_1 x = \vartheta_Y$ and $A_1 = \vartheta$. Hence, $\|A_1\| = 0 \Leftrightarrow A_1 = \vartheta$. (ii) $\|A_1 + A_2\| = \sup_{x \in X, \|x\|_X \le 1} \|(A_1 + A_2)x\|_Y = \sup_{x \in X, \|x\|_X \le 1} \|A_1 x + A_2 x\|_Y \le \sup_{x \in X, \|x\|_X \le 1} (\|A_1 x\|_Y + \|A_2 x\|_Y) \le \sup_{x \in X, \|x\|_X \le 1} \|A_1 x\|_Y + \sup_{x \in X, \|x\|_X \le 1} \|A_2 x\|_Y = \|A_1\| + \|A_2\|$.


(iii) $\|\alpha A_1\| = \sup_{x \in X, \|x\|_X \le 1} \|(\alpha A_1)x\|_Y = \sup_{x \in X, \|x\|_X \le 1} \|\alpha(A_1 x)\|_Y = \sup_{x \in X, \|x\|_X \le 1} |\alpha| \|A_1 x\|_Y = |\alpha| \sup_{x \in X, \|x\|_X \le 1} \|A_1 x\|_Y = |\alpha| \|A_1\|$, where we have made use of Proposition 3.81 in the fourth equality. Hence, $\mathrm{B}(X, Y)$ is a normed linear space. ' &

Proposition 7.64 Let $X$, $Y$, and $Z$ be normed linear spaces over the field $\mathbb{K}$, $A \in \mathrm{B}(X, Y)$, and $B \in \mathrm{B}(Y, Z)$. Then, $\forall x \in X$, we have $\|Ax\| \le \|A\| \|x\|$, where the three norms are on three different normed linear spaces. Furthermore, $\|BA\| \le \|B\| \|A\|$, where the three norms are on three different normed linear spaces.

Proof This is straightforward and is therefore omitted.

' &
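In the finite-dimensional case, the operator norm of Proposition 7.63 and the submultiplicativity of Proposition 7.64 can be checked directly for matrices. In the Python sketch below (an informal illustration; the sampled supremum is in general only a lower bound for $\|A\|$, though for the diagonal matrix chosen here it is attained at a sample point):

```python
import math

def matvec(A, x):
    # apply the linear operator represented by matrix A to the vector x
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

def op_norm(A, samples=10000):
    # sup of ||Ax|| over the Euclidean unit circle, estimated by sampling
    return max(norm2(matvec(A, [math.cos(t), math.sin(t)]))
               for t in [2 * math.pi * k / samples for k in range(samples)])

A = [[3.0, 0.0], [0.0, 2.0]]   # diagonal, so ||A|| = 3, attained at t = 0
B = [[0.0, 1.0], [1.0, 0.0]]   # an isometry, ||B|| = 1

assert abs(op_norm(A) - 3.0) < 1e-6
x = [1.0, 1.0]
assert norm2(matvec(A, x)) <= op_norm(A) * norm2(x) + 1e-9   # ||Ax|| <= ||A|| ||x||
BA = [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
assert op_norm(BA) <= op_norm(B) * op_norm(A) + 1e-6         # ||BA|| <= ||B|| ||A||
```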

Proposition 7.65 Let $X$ and $Y$ be normed linear spaces over the field $\mathbb{K}$ and $\mathrm{B}(X, Y)$ be the normed linear space of bounded linear operators from $X$ to $Y$. Define an operator $\psi : \mathrm{B}(X, Y) \times X \to Y$ by $\psi(A, x) = Ax$, $\forall A \in \mathrm{B}(X, Y)$, $\forall x \in X$. Then, $\psi$ is continuous.

Proof $\forall (A_0, x_0) \in \mathrm{B}(X, Y) \times X$, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, let $\delta = \frac{\epsilon}{\epsilon + \|x_0\|} \wedge \frac{\epsilon}{\epsilon + \|A_0\|} \wedge \epsilon \in (0, \infty) \subset \mathbb{R}$. $\forall (A, x) \in B_{\mathrm{B}(X,Y) \times X}((A_0, x_0), \delta)$, we have $\|\psi(A, x) - \psi(A_0, x_0)\| = \|Ax - A_0 x_0\| = \|Ax - Ax_0 + Ax_0 - A_0 x_0\| \le \|A(x - x_0)\| + \|(A - A_0)x_0\| \le \|A\| \|x - x_0\| + \|A - A_0\| \|x_0\| \le (\|A_0\| + \|A - A_0\|)\delta + \|x_0\|\delta \le (\delta + \|A_0\|)\delta + \|x_0\|\delta \le \epsilon + \epsilon = 2\epsilon$. Hence, $\psi$ is continuous. This completes the proof of the proposition. ' &

Proposition 7.66 Let $X$ and $Y$ be normed linear spaces over the field $\mathbb{K}$. Let $\mathrm{B}(X, Y)$ be the normed linear space defined in Proposition 7.63. If $Y$ is a Banach space, then $\mathrm{B}(X, Y)$ is also a Banach space.

Proof All we need to show is that $\mathrm{B}(X, Y)$ is complete. Take a Cauchy sequence $(A_n)_{n=1}^\infty \subseteq \mathrm{B}(X, Y)$. $\forall x \in X$, we have $\|A_n x - A_m x\| \le \|A_n - A_m\| \|x\|$, $\forall n, m \in \mathbb{N}$, by Proposition 7.64. Hence, $(A_n x)_{n=1}^\infty \subseteq Y$ is a Cauchy sequence. Since $Y$ is a Banach space, then $\exists!\, y_x \in Y$ such that $\lim_{n \in \mathbb{N}} A_n x = y_x$. Hence, we may define a function $f : X \to Y$ by $f(x) = y_x$, $\forall x \in X$. $\forall x_1, x_2 \in X$, $\forall \alpha, \beta \in \mathbb{K}$, by Propositions 7.23, 4.15, and 3.67, we have $f(\alpha x_1 + \beta x_2) = \lim_{n \in \mathbb{N}} A_n(\alpha x_1 + \beta x_2) = \lim_{n \in \mathbb{N}} (\alpha A_n x_1 + \beta A_n x_2) = \alpha \lim_{n \in \mathbb{N}} A_n x_1 + \beta \lim_{n \in \mathbb{N}} A_n x_2 = \alpha f(x_1) + \beta f(x_2)$. Hence, $f$ is linear. Since $(A_n)_{n=1}^\infty$ is Cauchy, then, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, $\exists N \in \mathbb{N}$, $\forall n, m \ge N$, $\|A_n - A_m\| < \epsilon$. $\forall x \in X$, $\|A_n x - A_m x\| \le \|A_n - A_m\| \|x\| \le \epsilon \|x\|$. Then, by Propositions 3.66, 7.21, and 7.23, $\|f(x) - A_m x\| = \lim_{n \in \mathbb{N}} \|A_n x - A_m x\| \le \epsilon \|x\|$. Therefore, $\|f(x)\| \le \|f(x) - A_N x\| + \|A_N x\| \le \epsilon \|x\| + \|A_N\| \|x\| = (\epsilon + \|A_N\|)\|x\|$. This implies that $f$ is bounded. Hence, $f \in \mathrm{B}(X, Y)$. Note that $0 \le \lim_{m \in \mathbb{N}} \|A_m - f\| = \lim_{m \in \mathbb{N}} \sup_{x \in X, \|x\| \le 1} \|(A_m - f)(x)\| = \lim_{m \in \mathbb{N}} \sup_{x \in X, \|x\| \le 1} \|A_m x - f(x)\| \le \epsilon$. By the arbitrariness of $\epsilon$, we have $\lim_{m \in \mathbb{N}} \|A_m - f\| = 0$ and hence $\lim_{m \in \mathbb{N}} A_m = f \in \mathrm{B}(X, Y)$. Hence, $\mathrm{B}(X, Y)$ is complete. This completes the proof of the proposition. ' &


Proposition 7.67 Let X and Y be normed linear spaces over the field K, X be finite-dimensional with dimension n ∈ Z+, and A : X → Y be a linear operator. Then, A ∈ B(X, Y).

Proof Let XN ⊆ X be a basis for X, which then contains exactly n vectors. We will distinguish two exhaustive and mutually exclusive cases: Case 1: n = 0; Case 2: n ∈ N. Case 1: n = 0. Then, X = {ϑX}. Then, by Proposition 7.63, we have ‖A‖ = sup_{x∈X, ‖x‖≤1} ‖Ax‖ = 0. Hence, A ∈ B(X, Y).

Case 2: n ∈ N. Let XN = {x1, …, xn}. ∀x ∈ X, by Corollary 6.47 and Definition 6.50, ∃! α1, …, αn ∈ K such that x = Σ_{i=1}^n αi xi. Then, we may define an alternative norm ‖·‖1 on X by ‖x‖1 = (Σ_{i=1}^n |αi|²)^{1/2}. It is easy to check that ‖·‖1 defines a norm on X. By Theorem 7.38, ∃M ∈ (0, ∞) ⊂ R such that ‖x‖/M ≤ ‖x‖1 ≤ M‖x‖, ∀x ∈ X. Define ri := ‖Axi‖, i = 1, …, n. Then, by Proposition 7.63,

‖A‖ = sup_{x∈X, ‖x‖≤1} ‖Ax‖ = sup_{α1,…,αn∈K, ‖Σ_{i=1}^n αi xi‖≤1} ‖A(Σ_{i=1}^n αi xi)‖ ≤ sup_{α1,…,αn∈K, ‖Σ_{i=1}^n αi xi‖≤1} Σ_{i=1}^n |αi| ri ≤ sup_{α1,…,αn∈K, ‖Σ_{i=1}^n αi xi‖≤1} (Σ_{i=1}^n ri²)^{1/2} (Σ_{i=1}^n |αi|²)^{1/2} = (Σ_{i=1}^n ri²)^{1/2} sup_{x∈X, ‖x‖≤1} ‖x‖1 ≤ (Σ_{i=1}^n ri²)^{1/2} sup_{x∈X, ‖x‖≤1} M‖x‖ ≤ M (Σ_{i=1}^n ri²)^{1/2}

where we have applied the Cauchy–Schwarz Inequality in the second inequality. Hence, A ∈ B(X, Y). This completes the proof of the proposition. □

Proposition 7.68 Let X and Y be normed linear spaces over the field K and A : X → Y be a linear operator. If A is bounded, then N(A) ⊆ X is closed. On the other hand, if N(A) is closed and R(A) ⊆ Y is finite-dimensional, then A is bounded.

Proof Let A be bounded. By Proposition 7.62, A is continuous. Note that the set {ϑY} ⊆ Y is closed by Proposition 3.34. By Proposition 3.10, N(A) = Ainv({ϑY}) is closed.

Let N(A) be closed and R(A) be finite-dimensional. Let n ∈ Z+ be the dimension of R(A). We will distinguish two exhaustive and mutually exclusive cases: Case 1: n = 0; Case 2: n ∈ N. Case 1: n = 0. Then, R(A) = {ϑY}. Hence, A = ϑB(X,Y), which is clearly bounded. Case 2: n ∈ N. Then, ∃XN = {x1, …, xn} ⊆ X with exactly n elements such that A(XN) ⊆ Y is a basis of R(A). Let X/N(A) be the quotient normed linear space as defined in Proposition 7.44. ∀x ∈ X, Ax ∈ R(A) = span(A(XN)). Then, by Definition 6.50 and Corollary 6.47, ∃! α1, …, αn ∈ K such that Ax = Σ_{i=1}^n αi Axi. Then, we have A(x − Σ_{i=1}^n αi xi) = ϑY and x − Σ_{i=1}^n αi xi ∈ N(A). Hence, [x] = [Σ_{i=1}^n αi xi] = Σ_{i=1}^n αi [xi]. This implies that X/N(A) ⊆ span({[x1], …, [xn]}). Hence, X/N(A) is finite-dimensional.

Define a mapping Ā : X/N(A) → Y by Ā[x] = Ax, ∀[x] ∈ X/N(A). Note that, ∀x̄ ∈ [x], we have x − x̄ ∈ N(A) and Ax = Ax̄. Hence, Ā is uniquely defined. ∀[x1], [x2] ∈ X/N(A), ∀α, β ∈ K, we have Ā(α[x1] + β[x2]) = Ā([αx1 + βx2]) = A(αx1 + βx2) = αAx1 + βAx2 = αĀ([x1]) + βĀ([x2]). Hence, Ā is a linear operator. By Proposition 7.67, Ā is bounded. ∀x ∈ X, ‖Ax‖ = ‖Ā[x]‖ ≤ ‖Ā‖‖[x]‖ ≤ ‖Ā‖‖x‖. Hence, A ∈ B(X, Y). This completes the proof of the proposition. □

Proposition 7.69 Let X be a normed linear space and M ⊆ X be a closed subspace. Let X/M be the quotient normed linear space and φ : X → X/M be the natural homomorphism. Then, ‖φ‖ ≤ 1.

Proof By Proposition 7.43, φ is linear. ∀x ∈ X, ‖φ(x)‖ = inf_{m∈M} ‖x − m‖ ≤ ‖x‖. Then, φ ∈ B(X, X/M) and ‖φ‖ ≤ 1. This completes the proof of the proposition. □

Proposition 7.70 Let X and Y be normed linear spaces over the field K, A ∈ B(X, Y), and φ : X → X/N(A) be the natural homomorphism. Then, ∃B ∈ B(X/N(A), Y) such that A = B ◦ φ, B is injective, and ‖A‖ = ‖B‖.

Proof By Proposition 7.68, N(A) is closed. Then, by Proposition 7.44, X/N(A) is a normed linear space. Define B : X/N(A) → Y by B([x]) = Ax, ∀[x] ∈ X/N(A). Note that, ∀x̄ ∈ [x], we have x − x̄ ∈ N(A) and Ax = Ax̄. Hence, B is uniquely defined. ∀[x1], [x2] ∈ X/N(A), ∀α, β ∈ K, we have B(α[x1] + β[x2]) = B([αx1 + βx2]) = A(αx1 + βx2) = αAx1 + βAx2 = αB([x1]) + βB([x2]). Hence, B is linear. ∀[x1], [x2] ∈ X/N(A), B[x1] = B[x2] implies that Ax1 = Ax2, x1 − x2 ∈ N(A), and hence, [x1] = [x2]. This shows that B is injective. ∀x ∈ X, B ◦ φ(x) = B[x] = Ax. Hence, A = B ◦ φ. ∀[x] ∈ X/N(A), ∀m ∈ N(A), ‖B[x]‖ = ‖Ax‖ = ‖A(x − m)‖ ≤ ‖A‖‖x − m‖. Taking the infimum over m ∈ N(A) yields ‖B[x]‖ ≤ ‖A‖‖[x]‖. Hence, B ∈ B(X/N(A), Y) and ‖B‖ ≤ ‖A‖. By Propositions 7.64 and 7.69, ‖A‖ = ‖B ◦ φ‖ ≤ ‖B‖‖φ‖ ≤ ‖B‖. Then, ‖A‖ = ‖B‖. This completes the proof of the proposition. □
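In Kⁿ with the Euclidean norm and the standard basis, one may take ‖·‖1 = ‖·‖ (so M = 1 in Theorem 7.38), and the estimate in the proof of Proposition 7.67 specializes to ‖A‖ ≤ (Σ_{i=1}^n ‖Aei‖²)^{1/2}, i.e., the operator norm is dominated by the Frobenius norm of the representing matrix. The following numerical check is an illustration only, not part of the text:

```python
import math
import random

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

random.seed(1)
A = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]

# (sum_i ||A e_i||^2)^{1/2} is exactly the Frobenius norm of A
bound = math.sqrt(sum(a * a for row in A for a in row))

# estimate ||A|| = sup_{||x|| <= 1} ||A x|| by sampling the unit sphere
worst = 0.0
for _ in range(5000):
    x = [random.gauss(0.0, 1.0) for _ in range(4)]
    n = norm(x)
    x = [v / n for v in x]
    worst = max(worst, norm(matvec(A, x)))

assert worst <= bound + 1e-12  # the sampled sup never exceeds the proof's bound
```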

7.9 Dual Spaces

7.9.1 Basic Concepts

Definition 7.71 Let X be a normed linear space over the field K. Then, the space B(X, K), which consists of all bounded linear functionals on X, is called the dual of X and denoted by X∗. We will denote the vectors in X∗ by x∗ and denote x∗(x) by ⟨x∗, x⟩. %

Proposition 7.72 Let X be a normed linear space over the field K and X∗ be its dual. Then, the following statements hold.

(i) If a linear functional f : X → K is continuous at some x0 ∈ X, then f ∈ X∗.
(ii) ∀x∗ ∈ X∗, x∗ is uniformly continuous.
(iii) For any linear functional f : X → K, we have ‖f‖ = inf{M ∈ [0, ∞) ⊂ R | |f(x)| ≤ M‖x‖X, ∀x ∈ X} = sup_{x∈X, ‖x‖X≤1} |f(x)| = max{sup_{x∈X, x≠ϑX} |f(x)|/‖x‖X, 0} = max{sup_{x∈X, ‖x‖X=1} |f(x)|, 0}.
(iv) |⟨x∗, x⟩| ≤ ‖x∗‖‖x‖, ∀x ∈ X, ∀x∗ ∈ X∗.
(v) ⟨·, ·⟩ is a continuous function on X∗ × X.
(vi) X∗ is a Banach space.
(vii) A linear functional f : X → K is bounded if, and only if, N(f) is closed.

Proof (i) and (ii) are direct consequences of Proposition 7.62. (iii) is a direct consequence of Proposition 7.63. (iv) follows from Proposition 7.64. (v) follows from Proposition 7.65. (vi) follows from Proposition 7.66 since K is a Banach space. Finally, (vii) follows from Proposition 7.68. This completes the proof of the proposition. □
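The four expressions for ‖f‖ in (iii) can be compared numerically. The sketch below is an illustration under stated assumptions, not part of the text: it takes X = R² with the ℓ¹ norm ‖x‖ = |ξ1| + |ξ2| and f(x) = η1ξ1 + η2ξ2, for which all four expressions give max_i |ηi|.

```python
import random

eta = (2.0, -5.0)                       # f(x) = 2*x1 - 5*x2; expect ||f|| = 5
f = lambda x: eta[0] * x[0] + eta[1] * x[1]
norm1 = lambda x: abs(x[0]) + abs(x[1])

exact = max(abs(e) for e in eta)        # dual norm of the l1 norm is the sup norm

random.seed(2)
sup_ball = 0.0                          # sup over ||x||_1 <= 1 of |f(x)|
sup_ratio = 0.0                         # sup over x != 0 of |f(x)| / ||x||_1
for _ in range(20000):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    n = norm1(x)
    if n == 0.0:
        continue
    sup_ratio = max(sup_ratio, abs(f(x)) / n)
    x = (x[0] / n, x[1] / n)            # rescale onto the unit sphere
    sup_ball = max(sup_ball, abs(f(x)))

assert sup_ball <= exact + 1e-9 and sup_ratio <= exact + 1e-9
assert exact - sup_ball < 0.05          # the sup is (nearly) attained by sampling
assert abs(f((0.0, -1.0))) == exact     # ... and exactly at the unit vector -e2
```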

7.9.2 Duals of Some Common Banach Spaces

Example 7.73 Let us consider the dual of X := ({ϑX}, K, ‖·‖). Clearly, there is a single linear functional on X, given by f : X → K with f(ϑX) = 0. Clearly, this linear functional is bounded with norm 0. Hence, X∗ = ({f}, K, ‖·‖1), which is isometrically isomorphic to X. %

Example 7.74 Let us consider the dual of X := Kⁿ, n ∈ N. ∀x := (ξ1, …, ξn) ∈ Kⁿ, any functional of the form f(x) = Σ_{i=1}^n ηi ξi with η1, …, ηn ∈ K is clearly linear. By the Cauchy–Schwarz Inequality, we have |f(x)| = |Σ_{i=1}^n ηi ξi| ≤ Σ_{i=1}^n |ηi||ξi| ≤ (Σ_{i=1}^n |ηi|²)^{1/2} (Σ_{i=1}^n |ξi|²)^{1/2} = (Σ_{i=1}^n |ηi|²)^{1/2} |x|. Hence, f is bounded and ‖f‖ ≤ (Σ_{i=1}^n |ηi|²)^{1/2}. Since the equality is achieved at x = (η̄1, …, η̄n) in the above inequality, where η̄i denotes the complex conjugate of ηi, then ‖f‖ = (Σ_{i=1}^n |ηi|²)^{1/2}. Clearly, for different n-tuples (η1, …, ηn), the linear functionals f are distinct. Now, let f be a bounded linear functional on X. Let ei ∈ X be the ith unit vector (all components of ei are zero except a 1 at the ith component), i = 1, …, n. Let ηi = f(ei) ∈ K, i = 1, …, n. Then, ∀x = (ξ1, …, ξn) ∈ X, we have x = Σ_{i=1}^n ξi ei and f(x) = Σ_{i=1}^n ηi ξi. Define a function Ψ : X∗ → Kⁿ by Ψ(f) = (η1, …, ηn), ∀f ∈ X∗. Clearly, Ψ is linear, bijective, and norm preserving. Hence, the dual of Kⁿ is (isometrically isomorphic to) Kⁿ. %
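In the real case, the computation in Example 7.74 can be checked numerically: the norm of f(x) = Σ ηi ξi on (Rⁿ, |·|) is the Euclidean norm of η, with the supremum attained at x = η/|η|. A small sketch (illustration only, not from the text):

```python
import math
import random

eta = [3.0, -4.0, 12.0]                 # expect ||f|| = |eta| = 13
f = lambda x: sum(e * v for e, v in zip(eta, x))
norm = lambda x: math.sqrt(sum(v * v for v in x))

exact = norm(eta)
assert abs(exact - 13.0) < 1e-12

# equality in Cauchy-Schwarz at the unit vector eta / |eta|
x_star = [e / exact for e in eta]
assert abs(f(x_star) - exact) < 1e-12

# and no sampled unit vector does better
random.seed(3)
for _ in range(5000):
    x = [random.gauss(0.0, 1.0) for _ in range(3)]
    n = norm(x)
    x = [v / n for v in x]
    assert abs(f(x)) <= exact + 1e-9
```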

Lemma 7.75 Let X be a normed linear space over K, X∗ be its dual, and x∗ ∈ X∗. Then, ∀ε ∈ (0, 1) ⊂ R, ∃x ∈ X with ‖x‖ ≤ 1 such that ⟨x∗, x⟩ ∈ R and ⟨x∗, x⟩ ≥ (1 − ε)‖x∗‖.

Proof ∀ε ∈ (0, 1) ⊂ R. We will distinguish two exhaustive and mutually exclusive cases: Case 1: ‖x∗‖ = 0; Case 2: ‖x∗‖ > 0. Case 1: ‖x∗‖ = 0. Then, x∗ = ϑ∗. Take x = ϑ and the result holds. Case 2: ‖x∗‖ > 0. Note that ‖x∗‖ = sup_{x∈X, ‖x‖≤1} |⟨x∗, x⟩|. Then, ∃x̄ ∈ X with ‖x̄‖ ≤ 1 such that |⟨x∗, x̄⟩| ≥ (1 − ε)‖x∗‖ > 0. Take x := (|⟨x∗, x̄⟩|/⟨x∗, x̄⟩) x̄ ∈ X. Then, ‖x‖ = ‖x̄‖ ≤ 1 and ⟨x∗, x⟩ = (|⟨x∗, x̄⟩|/⟨x∗, x̄⟩)⟨x∗, x̄⟩ = |⟨x∗, x̄⟩| ∈ R and ⟨x∗, x⟩ ≥ (1 − ε)‖x∗‖. This completes the proof of the lemma. □

Example 7.76 Let Yi be a normed linear space over the field K and Y∗i be its dual, i = 1, …, n, n ∈ N. We will study the dual of X := ∏_{i=1}^n Yi. We will show that X∗ = ∏_{i=1}^n Y∗i isometrically isomorphically.

Let f ∈ X∗. ∀i = 1, …, n, define Ai : Yi → X by Ai y = x = (y1, …, yn), ∀y ∈ Yi, where yj = ϑYj, ∀j ∈ {1, …, n} with j ≠ i, and yi = y. Clearly, Ai is well-defined, linear, and bounded with ‖Ai‖ ≤ 1. Define fi : Yi → K by fi = f ◦ Ai. Then, fi is a bounded linear functional on Yi with ‖fi‖ ≤ ‖f‖‖Ai‖ ≤ ‖f‖. Hence, fi ∈ Y∗i. Denote fi =: y∗i. ∀x = (y1, …, yn) ∈ X, we have yi ∈ Yi, i = 1, …, n. By the linearity of f, we have

f(x) = f(Σ_{i=1}^n Ai yi) = Σ_{i=1}^n f(Ai yi) = Σ_{i=1}^n ⟨y∗i, yi⟩    (7.1)

Let x∗ := (y∗1, …, y∗n) ∈ ∏_{i=1}^n Y∗i. ∀ε ∈ (0, 1) ⊂ R, ∀i = 1, …, n, by Lemma 7.75, ∃ȳi ∈ Yi with ‖ȳi‖ ≤ 1 such that ⟨y∗i, ȳi⟩ ∈ R and ⟨y∗i, ȳi⟩ ≥ (1 − ε)‖y∗i‖. Let ŷi = ‖y∗i‖ȳi. Then, ⟨y∗i, ŷi⟩ ≥ (1 − ε)‖y∗i‖². Then, we have f(Σ_{i=1}^n Ai ŷi) = Σ_{i=1}^n ⟨y∗i, ŷi⟩ ≥ (1 − ε) Σ_{i=1}^n ‖y∗i‖² = (1 − ε)‖x∗‖². Since f ∈ X∗, then, by Proposition 7.72, we have (1 − ε)‖x∗‖² ≤ ‖f‖‖Σ_{i=1}^n Ai ŷi‖ = ‖f‖(Σ_{i=1}^n ‖ŷi‖²)^{1/2} ≤ ‖f‖(Σ_{i=1}^n ‖y∗i‖²)^{1/2} = ‖f‖‖x∗‖. By the arbitrariness of ε, we have ‖x∗‖ ≤ ‖f‖.

Based on the preceding analysis, we may define a function ψ : X∗ → ∏_{i=1}^n Y∗i by ψ(f) = (f ◦ A1, …, f ◦ An), ∀f ∈ X∗. Then, ‖ψ(f)‖ ≤ ‖f‖. Clearly, ψ is linear. ∀f1, f2 ∈ X∗ with ψ(f1) = ψ(f2), we have f1 ◦ Ai = f2 ◦ Ai, ∀i ∈ {1, …, n}. ∀x := (y1, …, yn) ∈ X, by (7.1), we have f1(x) = Σ_{i=1}^n f1(Ai yi) = Σ_{i=1}^n f2(Ai yi) = f2(x). Then, we have f1 = f2. Hence, ψ is injective. ∀x∗ := (y∗1, …, y∗n) ∈ ∏_{i=1}^n Y∗i, define φ(x∗) : X → K by φ(x∗)(x) = Σ_{i=1}^n ⟨y∗i, yi⟩, ∀x := (y1, …, yn) ∈ X. Note that, ∀x := (y1, …, yn) ∈ X, Σ_{i=1}^n |⟨y∗i, yi⟩| ≤ Σ_{i=1}^n ‖y∗i‖‖yi‖ ≤ ‖x∗‖‖x‖, where we have made use of Proposition 7.72 in the first inequality and the Cauchy–Schwarz Inequality in the second inequality. Clearly, φ(x∗) is linear and bounded with

‖φ(x∗)‖ = sup_{x∈X, ‖x‖≤1} |φ(x∗)(x)| ≤ sup_{x=(y1,…,yn)∈X, ‖x‖≤1} Σ_{i=1}^n |⟨y∗i, yi⟩| ≤ sup_{x∈X, ‖x‖≤1} ‖x∗‖‖x‖ ≤ ‖x∗‖    (7.2)

Hence, φ(x∗) ∈ X∗ and ψ(φ(x∗)) = (φ(x∗) ◦ A1, …, φ(x∗) ◦ An) = (y∗1, …, y∗n) = x∗. Then, ψ ◦ φ = id_{∏_{i=1}^n Y∗i}. Hence, ψ is surjective. Therefore, ψ is bijective and admits inverse ψinv. By Proposition 2.4, φ = ψinv. ∀f ∈ X∗, by (7.2), we have ‖f‖ = ‖φ(ψ(f))‖ ≤ ‖ψ(f)‖ ≤ ‖f‖. Then, ‖ψ(f)‖ = ‖f‖ and ψ is an isometry. Hence, ψ is an isometrical isomorphism. %

Example 7.77 Let us consider the dual of X := lp(Y), p ∈ [1, ∞) ⊂ R, where Y is a normed linear space over K. Let Y∗ be the dual of Y and q ∈ (1, ∞] ⊂ Re with 1/p + 1/q = 1. We will show that X∗ is isometrically isomorphic to lq(Y∗).

Let f ∈ X∗. ∀i ∈ N, define Ai : Y → X by Ai y = x = (y1, y2, …), ∀y ∈ Y, where yj = ϑY, ∀j ∈ N with j ≠ i, and yi = y. Clearly, Ai is well-defined, linear, and bounded with ‖Ai‖ ≤ 1. Define fi : Y → K by fi = f ◦ Ai. Then, fi is a bounded linear functional on Y with ‖fi‖ ≤ ‖f‖‖Ai‖ ≤ ‖f‖. Hence, fi ∈ Y∗. Denote fi =: y∗i. ∀x = (y1, y2, …) ∈ X, we have yi ∈ Y, ∀i ∈ N, and Σ_{i=1}^∞ ‖yi‖^p < +∞. Then, lim_{n∈N} Σ_{i=n+1}^∞ ‖yi‖^p = 0. This implies that lim_{n∈N} ‖x − Σ_{i=1}^n Ai yi‖ = lim_{n∈N} (Σ_{i=n+1}^∞ ‖yi‖^p)^{1/p} = 0. Hence, lim_{n∈N} Σ_{i=1}^n Ai yi = x. By the continuity of f and Proposition 3.66, we have

f(x) = lim_{n∈N} f(Σ_{i=1}^n Ai yi) = lim_{n∈N} Σ_{i=1}^n f(Ai yi) = lim_{n∈N} Σ_{i=1}^n ⟨y∗i, yi⟩    (7.3)

Claim 7.77.1 x∗ := (y∗1, y∗2, …) ∈ lq(Y∗) and ‖x∗‖ ≤ ‖f‖.

Proof of Claim We will distinguish two exhaustive and mutually exclusive cases: Case 1: 1 < p < +∞; Case 2: p = 1.

Case 1: 1 < p < +∞. Then, 1 < q < +∞. ∀ε ∈ (0, 1) ⊂ R, ∀i ∈ N, by Lemma 7.75, ∃ȳi ∈ Y with ‖ȳi‖ ≤ 1 such that ⟨y∗i, ȳi⟩ ∈ R and ⟨y∗i, ȳi⟩ ≥ (1 − ε)‖y∗i‖. Let ŷi = ‖y∗i‖^{q/p} ȳi; then ‖ŷi‖ ≤ ‖y∗i‖^{q/p}, ⟨y∗i, ŷi⟩ ∈ R, and ⟨y∗i, ŷi⟩ ≥ (1 − ε)‖y∗i‖^{q/p+1} = (1 − ε)‖y∗i‖^q. Then, ∀n ∈ N, f(Σ_{i=1}^n Ai ŷi) = Σ_{i=1}^n ⟨y∗i, ŷi⟩ ≥ (1 − ε) Σ_{i=1}^n ‖y∗i‖^q. Since f ∈ X∗, then, by Proposition 7.72, we have (1 − ε) Σ_{i=1}^n ‖y∗i‖^q ≤ ‖f‖‖Σ_{i=1}^n Ai ŷi‖ = ‖f‖(Σ_{i=1}^n ‖ŷi‖^p)^{1/p} ≤ ‖f‖(Σ_{i=1}^n ‖y∗i‖^q)^{1/p}. Then, we have (Σ_{i=1}^n ‖y∗i‖^q)^{1/q} ≤ ‖f‖/(1 − ε). By the arbitrariness of n, we have (Σ_{i=1}^∞ ‖y∗i‖^q)^{1/q} ≤ ‖f‖/(1 − ε). By the arbitrariness of ε, we have (Σ_{i=1}^∞ ‖y∗i‖^q)^{1/q} ≤ ‖f‖. Hence, the result holds in this case.

Case 2: p = 1. Then, q = +∞. ∀ε ∈ (0, 1) ⊂ R, ∀i ∈ N, by Lemma 7.75, ∃ŷi ∈ Y with ‖ŷi‖ ≤ 1 such that ⟨y∗i, ŷi⟩ ∈ R and ⟨y∗i, ŷi⟩ ≥ (1 − ε)‖y∗i‖. Then, (1 − ε)‖y∗i‖ ≤ ⟨y∗i, ŷi⟩ = f(Ai ŷi) ≤ ‖f‖‖Ai‖‖ŷi‖ ≤ ‖f‖. By the arbitrariness of i, we have sup_{i≥1} ‖y∗i‖ ≤ ‖f‖/(1 − ε). By the arbitrariness of ε, we have sup_{i≥1} ‖y∗i‖ ≤ ‖f‖. Hence, the result holds in this case. This completes the proof of the claim. □

The preceding analysis shows that we may define a function ψ : X∗ → lq(Y∗) by ψ(f) = (f ◦ A1, f ◦ A2, …), ∀f ∈ X∗. Clearly, ψ is linear. ∀f1, f2 ∈ X∗ with ψ(f1) = ψ(f2), we have f1 ◦ Ai = f2 ◦ Ai, ∀i ∈ N. ∀x := (y1, y2, …) ∈ X, by (7.3), we have f1(x) = lim_{n∈N} Σ_{i=1}^n f1(Ai yi) = lim_{n∈N} Σ_{i=1}^n f2(Ai yi) = f2(x). Then, we have f1 = f2. Hence, ψ is injective. ∀x∗ := (y∗1, y∗2, …) ∈ lq(Y∗), define φ(x∗) : X → K by φ(x∗)(x) = Σ_{i=1}^∞ ⟨y∗i, yi⟩, ∀x := (y1, y2, …) ∈ X. Note that, ∀x := (y1, y2, …) ∈ X, Σ_{i=1}^∞ |⟨y∗i, yi⟩| ≤ Σ_{i=1}^∞ ‖y∗i‖‖yi‖ ≤ ‖x∗‖‖x‖, where we have made use of Proposition 7.72 in the first inequality and Hölder's Inequality in the second inequality. Hence, φ(x∗)(x) is well-defined, and therefore φ(x∗) is well-defined. Clearly, φ(x∗) is linear and bounded with

‖φ(x∗)‖ = sup_{x∈X, ‖x‖≤1} |φ(x∗)(x)| ≤ sup_{x=(y1,y2,…)∈X, ‖x‖≤1} Σ_{i=1}^∞ |⟨y∗i, yi⟩| ≤ sup_{x∈X, ‖x‖≤1} ‖x∗‖‖x‖ ≤ ‖x∗‖    (7.4)

Hence, φ(x∗) ∈ X∗ and ψ(φ(x∗)) = (φ(x∗) ◦ A1, φ(x∗) ◦ A2, …) = (y∗1, y∗2, …) = x∗. Then, ψ ◦ φ = id_{lq(Y∗)}. Hence, ψ is surjective. Therefore, ψ is bijective and admits inverse ψinv. By Proposition 2.4, φ = ψinv. ∀f ∈ X∗, by Claim 7.77.1 and (7.4), we have ‖f‖ = ‖φ(ψ(f))‖ ≤ ‖ψ(f)‖ ≤ ‖f‖. Then, ‖ψ(f)‖ = ‖f‖ and ψ is an isometry. Hence, ψ is an isometrical isomorphism. %

Example 7.78 Let Y be a normed linear space over the field K, Y∗ be its dual, and M = {x = (y1, y2, …) ∈ l∞(Y) | lim_{n∈N} yn = ϑY}. Clearly, M is a subspace of l∞(Y). By Proposition 7.13, M is a normed linear space over K, which will be denoted by c0(Y). Next, we will study the dual of X := c0(Y). We will show that X∗ = l1(Y∗) isometrically isomorphically.

Let f ∈ X∗. ∀i ∈ N, define Ai : Y → X by Ai y = x = (y1, y2, …), ∀y ∈ Y, where yj = ϑY, ∀j ∈ N with j ≠ i, and yi = y. Clearly, Ai is well-defined, linear, and bounded with ‖Ai‖ ≤ 1. Define fi : Y → K by fi = f ◦ Ai. Then, fi is a bounded linear functional on Y with ‖fi‖ ≤ ‖f‖‖Ai‖ ≤ ‖f‖. Hence, fi ∈ Y∗. Denote fi =: y∗i. ∀x = (y1, y2, …) ∈ X, we have yi ∈ Y, ∀i ∈ N, lim_{n∈N} yn = ϑY, and lim_{n∈N} ‖yn‖ = 0 (by Proposition 7.21). Then, lim_{n∈N} sup_{i≥n+1} ‖yi‖ = 0. This implies that lim_{n∈N} ‖x − Σ_{i=1}^n Ai yi‖ = lim_{n∈N} sup_{k≥n+1} ‖yk‖ = 0. Hence, lim_{n∈N} Σ_{i=1}^n Ai yi = x. By the continuity of f and Proposition 3.66, we have

f(x) = lim_{n∈N} f(Σ_{i=1}^n Ai yi) = lim_{n∈N} Σ_{i=1}^n f(Ai yi) = lim_{n∈N} Σ_{i=1}^n ⟨y∗i, yi⟩    (7.5)

Let x∗ := (y∗1, y∗2, …). ∀ε ∈ (0, 1) ⊂ R, ∀i ∈ N, by Lemma 7.75, ∃ŷi ∈ Y with ‖ŷi‖ ≤ 1 such that ⟨y∗i, ŷi⟩ ∈ R and ⟨y∗i, ŷi⟩ ≥ (1 − ε)‖y∗i‖. Then, ∀n ∈ N, (1 − ε) Σ_{i=1}^n ‖y∗i‖ ≤ Σ_{i=1}^n ⟨y∗i, ŷi⟩ = Σ_{i=1}^n f(Ai ŷi) = f(Σ_{i=1}^n Ai ŷi) ≤ ‖f‖‖Σ_{i=1}^n Ai ŷi‖ ≤ ‖f‖. By the arbitrariness of n, we have Σ_{i=1}^∞ ‖y∗i‖ ≤ ‖f‖/(1 − ε). By the arbitrariness of ε, we have

Σ_{i=1}^∞ ‖y∗i‖ ≤ ‖f‖    (7.6)

Hence, x∗ ∈ l1(Y∗). Based on the preceding analysis, we may define a function ψ : X∗ → l1(Y∗) by ψ(f) = (f ◦ A1, f ◦ A2, …), ∀f ∈ X∗. Clearly, ψ is linear. ∀f1, f2 ∈ X∗ with ψ(f1) = ψ(f2), we have f1 ◦ Ai = f2 ◦ Ai, ∀i ∈ N. ∀x := (y1, y2, …) ∈ X, by (7.5), we have f1(x) = lim_{n∈N} Σ_{i=1}^n f1(Ai yi) = lim_{n∈N} Σ_{i=1}^n f2(Ai yi) = f2(x). Then, we have f1 = f2. Hence, ψ is injective. ∀x∗ := (y∗1, y∗2, …) ∈ l1(Y∗), define φ(x∗) : X → K by φ(x∗)(x) = Σ_{i=1}^∞ ⟨y∗i, yi⟩, ∀x := (y1, y2, …) ∈ X. Note that, ∀x := (y1, y2, …) ∈ X, Σ_{i=1}^∞ |⟨y∗i, yi⟩| ≤ Σ_{i=1}^∞ ‖y∗i‖‖yi‖ ≤ ‖x∗‖‖x‖, where we have made use of Proposition 7.72 in the first inequality and Hölder's Inequality in the second inequality. Hence, φ(x∗)(x) is well-defined, and therefore φ(x∗) is well-defined. Clearly, φ(x∗) is linear and bounded with

‖φ(x∗)‖ = sup_{x∈X, ‖x‖≤1} |φ(x∗)(x)| ≤ sup_{x=(y1,y2,…)∈X, ‖x‖≤1} Σ_{i=1}^∞ |⟨y∗i, yi⟩| ≤ sup_{x∈X, ‖x‖≤1} ‖x∗‖‖x‖ ≤ ‖x∗‖    (7.7)

Hence, φ(x∗) ∈ X∗ and ψ(φ(x∗)) = (φ(x∗) ◦ A1, φ(x∗) ◦ A2, …) = (y∗1, y∗2, …) = x∗. Then, ψ ◦ φ = id_{l1(Y∗)}. Hence, ψ is surjective. Therefore, ψ is bijective and admits inverse ψinv. By Proposition 2.4, φ = ψinv. ∀f ∈ X∗, by (7.6) and (7.7), we have ‖f‖ = ‖φ(ψ(f))‖ ≤ ‖ψ(f)‖ ≤ ‖f‖. Then, ‖ψ(f)‖ = ‖f‖ and ψ is an isometry. Hence, ψ is an isometrical isomorphism. %
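The two estimates at the heart of Examples 7.77 and 7.78 — Hölder's Inequality Σ‖y∗i‖‖yi‖ ≤ ‖x∗‖‖x‖ and the near-attainment achieved by the test vectors ŷi with ‖ŷi‖ ≤ ‖y∗i‖^{q/p} — can be illustrated numerically with scalar sequences (Y = R). In the sketch below (illustration only, with p = 3/2, q = 3), the analogous test vector ξ̂i = sign(ηi)|ηi|^{q−1} attains equality exactly, since (q − 1)p = q.

```python
import math
import random

p, q = 1.5, 3.0                        # conjugate exponents: 1/p + 1/q = 1
assert abs(1.0 / p + 1.0 / q - 1.0) < 1e-12

def pnorm(v, r):
    return sum(abs(t) ** r for t in v) ** (1.0 / r)

random.seed(4)
eta = [random.uniform(-1.0, 1.0) for _ in range(8)]  # plays the role of (y_*i)
y = [random.uniform(-1.0, 1.0) for _ in range(8)]

# Hoelder's Inequality on a random pair
pairing = sum(abs(a * b) for a, b in zip(eta, y))
assert pairing <= pnorm(eta, q) * pnorm(y, p) + 1e-12

# the test vector from the duality argument attains equality
xhat = [math.copysign(abs(a) ** (q - 1.0), a) for a in eta]
lhs = sum(a * b for a, b in zip(eta, xhat))           # = ||eta||_q ^ q
rhs = pnorm(eta, q) * pnorm(xhat, p)                  # same value, since (q-1)p = q
assert abs(lhs - rhs) < 1e-9
```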

7.9.3 Extension Form of Hahn–Banach Theorem

Definition 7.79 Let X be a vector space over K. A sublinear functional is a functional p : X → R satisfying, ∀x1, x2 ∈ X, ∀α ∈ R with α ≥ 0, (i) p(x1 + x2) ≤ p(x1) + p(x2); and (ii) p(αx1) = αp(x1). %

Note that any norm ‖·‖ on X is a sublinear functional. We introduce the above definition to illustrate the full generality of the Hahn–Banach Theorem.

Theorem 7.80 (Extension Form of Hahn–Banach Theorem) Let X be a vector space over the field R, p : X → R be a sublinear functional, M ⊆ X be a subspace, and f : M → R be a linear functional on M satisfying f(m) ≤ p(m), ∀m ∈ M. Then, ∃ a linear functional F : X → R such that F|M = f and F(x) ≤ p(x), ∀x ∈ X. Furthermore, if X is normed with ‖·‖ and p is continuous at ϑX, then F is continuous.

Proof We will prove the theorem using Zorn's Lemma. Define a collection of extensions of f, E, by

E := {(g, N) | N ⊆ X is a subspace, M ⊆ N, g : N → R is a linear functional such that g|M = f and g(n) ≤ p(n), ∀n ∈ N}

Clearly, (f, M) ∈ E ≠ ∅. Define a relation ⪯ on E by: ∀(g1, N1), (g2, N2) ∈ E, we say (g1, N1) ⪯ (g2, N2) if N1 ⊆ N2 and g2|N1 = g1. Clearly, ⪯ is reflexive, antisymmetric, and transitive. Hence, ⪯ is an antisymmetric partial ordering on E.

For any nonempty subcollection C ⊆ E such that ⪯ is a total ordering on C, let Nc = ∪_{(g,N)∈C} N. Then, ϑX ∈ M ⊆ Nc ⊆ X. ∀x1, x2 ∈ Nc, ∀α, β ∈ R, ∃(g1, N1), (g2, N2) ∈ C such that x1 ∈ N1 and x2 ∈ N2. Since C is totally ordered, by Proposition 2.12, without loss of generality, we may assume that (g1, N1) ⪯ (g2, N2). Then, N1 ⊆ N2 and x1, x2 ∈ N2. This implies that αx1 + βx2 ∈ N2 ⊆ Nc, since N2 is a subspace. The above shows that Nc is a subspace of X.

Define a functional gc : Nc → R by: ∀x ∈ Nc, ∃(g, N) ∈ C such that x ∈ N, and we assign gc(x) := g(x). Such a functional is uniquely defined because of the following reasoning. ∀x ∈ Nc, ∀(g1, N1), (g2, N2) ∈ C such that x ∈ N1 ∩ N2, by the total ordering of C and Proposition 2.12, we may assume, without loss of generality, that (g1, N1) ⪯ (g2, N2). Then, x ∈ N1 ⊆ N2 and g2|N1 = g1. Hence, g1(x) = g2(x). Therefore, gc is well-defined. ∀x1, x2 ∈ Nc, ∀α, β ∈ R, ∃(g1, N1), (g2, N2) ∈ C such that x1 ∈ N1 and x2 ∈ N2. By the total ordering of C and Proposition 2.12, we may assume, without loss of generality, that (g1, N1) ⪯ (g2, N2). Then, x1, x2 ∈ N2 and gc(αx1 + βx2) = g2(αx1 + βx2) = αg2(x1) + βg2(x2) = αgc(x1) + βgc(x2). Hence, gc is a linear functional. Moreover, gc(x1) = g2(x1) ≤ p(x1). ∀m ∈ M, gc(m) = g2(m) = f(m). Therefore, (gc, Nc) ∈ E. ∀(g, N) ∈ C, we have N ⊆ Nc and, ∀n ∈ N, gc(n) = g(n). Then, (g, N) ⪯ (gc, Nc). Hence, (gc, Nc) is an upper bound of C.

By Zorn's Lemma, ∃(gM, NM) ∈ E, which is maximal with respect to ⪯. Now, we are going to show that NM = X. Suppose NM ⊂ X. Then, ∃x0 ∈ X \ NM. Let Ne = {x ∈ X | ∃α ∈ R, ∃n ∈ NM such that x = αx0 + n}. Clearly, NM ⊂ Ne, x0 ∈ Ne, and Ne is a subspace of X. ∀x ∈ Ne, ∃! α ∈ R and ∃! n ∈ NM such that x = αx0 + n (otherwise, we may deduce that x0 ∈ NM). Define ge : Ne → R by ge(x) = ge(αx0 + n) = gM(n) + αge(x0), ∀x = αx0 + n ∈ Ne, α ∈ R, n ∈ NM, where ge(x0) ∈ R is a constant to be determined. Clearly, ge is well-defined and is a linear functional on Ne. ∀n ∈ NM, we have ge(n) = gM(n), and hence, ge|NM = gM. Then, ge|M = f.

We will show that (ge, Ne) ∈ E by finding an admissible constant ge(x0) such that ge(x) ≤ p(x), ∀x ∈ Ne. ∀n1, n2 ∈ NM, we have gM(n1) + gM(n2) = gM(n1 + n2) ≤ p(n1 + n2) ≤ p(n1 + x0) + p(n2 − x0). Then, gM(n2) − p(n2 − x0) ≤ p(n1 + x0) − gM(n1). By the arbitrariness of n1 and n2, we have sup_{n2∈NM}(gM(n2) − p(n2 − x0)) ≤ inf_{n1∈NM}(p(n1 + x0) − gM(n1)). Since ϑX ∈ NM, then ∃c ∈ R such that

−∞ < sup_{n2∈NM}(gM(n2) − p(n2 − x0)) ≤ c ≤ inf_{n1∈NM}(p(n1 + x0) − gM(n1)) < +∞

Let ge(x0) := c. ∀x ∈ Ne, x = αx0 + n, where α ∈ R and n ∈ NM. We will distinguish three exhaustive and mutually exclusive cases: Case 1: α > 0; Case 2: α = 0; Case 3: α < 0. Case 1: α > 0. ge(x) = αc + gM(n) = α(c + gM(n/α)) ≤ α(p(n/α + x0) − gM(n/α) + gM(n/α)) = αp(x/α) = p(x). Case 2: α = 0. Then, x ∈ NM and ge(x) = gM(n) ≤ p(n) = p(x). Case 3: α < 0. ge(x) = αc + gM(n) = α(c + gM(n/α)) ≤ α(gM(−n/α) − p(−n/α − x0) + gM(n/α)) = −αp(−x/α) = p(x). Hence, we have ge(x) ≤ p(x) in all three cases. Therefore, (ge, Ne) ∈ E. We have shown that (gM, NM) ⪯ (ge, Ne) and (gM, NM) ≠ (ge, Ne). This contradicts the fact that (gM, NM) is maximal in E, by Proposition 2.12. Hence, NM = X and F = gM.

If, in addition, X is normed with ‖·‖ and p is continuous at ϑX, then, ∀ε ∈ (0, ∞) ⊂ R, ∃δ ∈ (0, ∞) ⊂ R such that |p(x)| < ε, ∀x ∈ B(ϑX, δ). ∀x ∈ B(ϑX, δ), we will distinguish two exhaustive and mutually exclusive cases: Case 1: F(x) ≥ 0; Case 2: F(x) < 0. Case 1: F(x) ≥ 0. Then, 0 ≤ F(x) ≤ p(x) < ε and |F(x)| < ε. Case 2: F(x) < 0. Then, |F(x)| = −F(x) = F(−x) ≤ p(−x) < ε. Hence, in both cases, we have |F(x)| < ε. Then, F is continuous at ϑX. Thus, by Proposition 7.72, F is a bounded linear functional, and therefore continuous. This completes the proof of the theorem. □

To obtain the counterpart result for complex vector spaces, we need the following result.

Lemma 7.81 Let (X, C) be a vector space, f : X → C be a functional on (X, C), and g : X → R be given by g(x) = Re(f(x)), ∀x ∈ X. Then, f is a linear functional on (X, C) if, and only if, g is a linear functional on (X, R) and f(x) = g(x) − ig(ix), ∀x ∈ X.

Proof By Lemma 7.40, (X, R) is a vector space. "Necessity" ∀x1, x2 ∈ X, ∀α, β ∈ R, we have g(αx1 + βx2) = Re(f(αx1 + βx2)) = Re(αf(x1) + βf(x2)) = α Re(f(x1)) + β Re(f(x2)) = αg(x1) + βg(x2). Hence, g is a linear functional on (X, R). ∀x ∈ X, write f(x) = g(x) + i Im(f(x)). Since f is a linear functional on (X, C), then f(ix) = g(ix) + i Im(f(ix)) = if(x) = −Im(f(x)) + ig(x). Then, Im(f(x)) = −g(ix) and f(x) = g(x) − ig(ix).

"Sufficiency" ∀x1, x2 ∈ X, ∀α := αr + iαi, β := βr + iβi ∈ C, where αr, αi, βr, βi ∈ R, we have

f(αx1 + βx2) = g(αx1 + βx2) − ig(iαx1 + iβx2)
= g(αr x1 + iαi x1 + βr x2 + iβi x2) − ig(αr ix1 − αi x1 + βr ix2 − βi x2)
= αr g(x1) + αi g(ix1) + βr g(x2) + βi g(ix2) − iαr g(ix1) + iαi g(x1) − iβr g(ix2) + iβi g(x2)
= αr f(x1) + iαi f(x1) + βr f(x2) + iβi f(x2)
= αf(x1) + βf(x2)

Hence, f is a linear functional on (X, C). This completes the proof of the lemma. □

Theorem 7.82 (Extension Form of Hahn–Banach Theorem) Let X be a vector space over the field C, p : X → R be a sublinear functional, M ⊆ X be a subspace (of (X, C)), and f : M → C be a linear functional satisfying Re(f(m)) ≤ p(m), ∀m ∈ M. Then, there exists a linear functional F : X → C such that F|M = f and Re(F(x)) ≤ p(x), ∀x ∈ X. Furthermore, if X is normed with ‖·‖ and p is continuous at ϑX, then F is continuous.

Proof By Lemma 7.40, (X, R) is a vector space and (M, R) is also a vector space. It is easy to show that (M, R) is a subspace of (X, R). Define g : M → R by g(m) = Re(f(m)), ∀m ∈ M. By Lemma 7.81, g is a linear functional on (M, R) and f(m) = g(m) − ig(im), ∀m ∈ M. Note that p is a sublinear functional on (X, C); then it is a sublinear functional on (X, R). Furthermore, g(m) = Re(f(m)) ≤ p(m), ∀m ∈ M. Then, by the Hahn–Banach Theorem, Theorem 7.80, ∃ a linear functional G on (X, R) such that G|M = g and G(x) ≤ p(x), ∀x ∈ X.

Define a functional F on (X, C) by F(x) = G(x) − iG(ix), ∀x ∈ X. By Lemma 7.81, F is a linear functional on (X, C). ∀m ∈ M, im ∈ M, since M is a subspace of (X, C). Then, F(m) = G(m) − iG(im) = g(m) − ig(im) = f(m). Hence, we have F|M = f. ∀x ∈ X, Re(F(x)) = G(x) ≤ p(x). Hence, F is the functional we seek.

Furthermore, if (X, C) is normed with ‖·‖ and p is continuous at ϑX, then, by Lemma 7.40, XR := (X, R, ‖·‖) is a normed linear space. By Theorem 7.80, G : XR → R is continuous. By Propositions 3.12, 3.32, and 7.23, F : XR → C is continuous. By Lemma 7.40, F is continuous on X := (X, C, ‖·‖). This completes the proof of the theorem. □

Theorem 7.83 (Simple Version of Hahn–Banach Theorem) Let X be a normed linear space over K, M ⊆ X be a subspace, and f : M → K be a linear functional which is bounded (on M), that is, ‖f‖M := sup_{m∈M, ‖m‖≤1} |f(m)| ∈ [0, ∞) ⊂ R. Then, there exists an F ∈ X∗ such that F|M = f and ‖F‖ = ‖f‖M.

Proof Define a functional p : X → R by p(x) = ‖f‖M‖x‖, ∀x ∈ X. It is easy to check that p is a sublinear functional. We are going to distinguish two exhaustive and mutually exclusive cases: Case 1: K = R; Case 2: K = C.

Case 1: K = R. Clearly, by Proposition 7.72, we have f(m) ≤ |f(m)| ≤ p(m), ∀m ∈ M, and p is continuous. By the Hahn–Banach Theorem, Theorem 7.80, ∃ a linear functional F : X → R such that F|M = f and F(x) ≤ p(x), ∀x ∈ X. Furthermore, F is continuous. Hence, F ∈ X∗, by Proposition 7.72. Note that ‖f‖M = sup_{m∈M, ‖m‖≤1} |F(m)| ≤ sup_{x∈X, ‖x‖≤1} |F(x)| = ‖F‖. On the other hand, we have, ∀x ∈ X, −F(x) = F(−x) ≤ p(−x) = ‖f‖M‖x‖ and F(x) ≤ p(x) = ‖f‖M‖x‖. Hence, |F(x)| ≤ ‖f‖M‖x‖. Then, ‖F‖ ≤ ‖f‖M. Hence, ‖F‖ = ‖f‖M.

Case 2: K = C. Note that, ∀m ∈ M, Re(f(m)) ≤ |Re(f(m))| ≤ |f(m)| ≤ p(m), by Proposition 7.72. By the Hahn–Banach Theorem, Theorem 7.82, there exists a linear functional F : X → C such that F|M = f and Re(F(x)) ≤ p(x), ∀x ∈ X. Furthermore, since p is continuous, F is continuous. By Proposition 7.72, F ∈ X∗. Note that ‖f‖M = sup_{m∈M, ‖m‖≤1} |F(m)| ≤ sup_{x∈X, ‖x‖≤1} |F(x)| = ‖F‖. On the other hand, ∀x ∈ X, we have either F(x) = 0, in which case |F(x)| ≤ ‖f‖M‖x‖; or F(x) ≠ 0, in which case |F(x)| = (|F(x)|/F(x)) F(x) = F((|F(x)|/F(x)) x) = Re(F((|F(x)|/F(x)) x)) ≤ p((|F(x)|/F(x)) x) = ‖f‖M ‖(|F(x)|/F(x)) x‖ = ‖f‖M ||F(x)|/F(x)| ‖x‖ = ‖f‖M‖x‖. Thus, we have |F(x)| ≤ ‖f‖M‖x‖, ∀x ∈ X. Therefore, ‖F‖ ≤ ‖f‖M. Hence, ‖F‖ = ‖f‖M. This completes the proof of the theorem. □
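A concrete finite-dimensional instance of Theorem 7.83 (an illustration only, not from the text): in X = R² with the Euclidean norm, take M = span{v} and f(αv) = αc, so ‖f‖M = |c|/‖v‖. Here a norm-preserving extension can be written down explicitly as F = ⟨·, g⟩ with g = (c/‖v‖²)v, whose norm is ‖g‖ by the Cauchy–Schwarz Inequality.

```python
import math

v = [3.0, 4.0]
c = 10.0                                   # f(alpha * v) = alpha * c on M = span{v}

norm = lambda x: math.sqrt(sum(t * t for t in x))

f_M = abs(c) / norm(v)                     # ||f||_M = |c| / ||v|| = 2.0

g = [c * t / norm(v) ** 2 for t in v]      # representer of the extension F = <., g>
F = lambda x: sum(a * b for a, b in zip(g, x))

assert abs(F(v) - c) < 1e-12               # F agrees with f on M
assert abs(norm(g) - f_M) < 1e-12          # ||F|| = ||g|| = ||f||_M: norm preserved
```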


Corollary 7.84 Let X be a vector space over the field K, M ⊆ X be a subspace, p : X → R be a sublinear functional, f : M → K be a linear functional on M, and (G, ◦, E) be an abelian semigroup of linear operators of X to X. Assume that the following conditions hold.

(i) Am ∈ M, ∀A ∈ G and ∀m ∈ M.
(ii) p(Ax) ≤ p(x), ∀x ∈ X and ∀A ∈ G.
(iii) f(Am) = f(m), ∀A ∈ G and ∀m ∈ M.
(iv) Re(f(m)) ≤ p(m), ∀m ∈ M.

Then, there exists a linear functional F : X → K such that F|M = f, F(Ax) = F(x), ∀A ∈ G and ∀x ∈ X, and Re(F(x)) ≤ p(x), ∀x ∈ X. Furthermore, if, in addition, X is normed with ‖·‖ and p is continuous at ϑX, then F is continuous.

Proof Define a functional q : X → Re by

q(x) = inf_{n∈N, A1,…,An∈G} (1/n) p(Σ_{i=1}^n Ai x), ∀x ∈ X

Since p(ϑX) = 0, then q(ϑX) = 0. Since E ∈ G, then, ∀x ∈ X, q(x) ≤ p(Ex) ≤ p(x) < +∞. ∀x1, x2 ∈ X, ∀n, k ∈ N, ∀A1, …, An ∈ G, ∀B1, …, Bk ∈ G, we have

q(x1 + x2) ≤ (1/(nk)) p(Σ_{i=1}^n Σ_{j=1}^k Ai ◦ Bj (x1 + x2))
= (1/(nk)) p(Σ_{i=1}^n Σ_{j=1}^k Ai ◦ Bj (x1) + Σ_{i=1}^n Σ_{j=1}^k Ai ◦ Bj (x2))
= (1/(nk)) p(Σ_{i=1}^n Ai(Σ_{j=1}^k Bj(x1)) + Σ_{j=1}^k Bj(Σ_{i=1}^n Ai(x2)))
≤ (1/(nk)) p(Σ_{i=1}^n Ai(Σ_{j=1}^k Bj(x1))) + (1/(nk)) p(Σ_{j=1}^k Bj(Σ_{i=1}^n Ai(x2)))
≤ (1/(nk)) Σ_{i=1}^n p(Ai(Σ_{j=1}^k Bj(x1))) + (1/(nk)) Σ_{j=1}^k p(Bj(Σ_{i=1}^n Ai(x2)))
≤ (1/(nk)) Σ_{i=1}^n p(Σ_{j=1}^k Bj(x1)) + (1/(nk)) Σ_{j=1}^k p(Σ_{i=1}^n Ai(x2))
= (1/k) p(Σ_{j=1}^k Bj(x1)) + (1/n) p(Σ_{i=1}^n Ai(x2))

where the first two equalities follow from the fact that G is an abelian semigroup of linear operators on X, the second and third inequalities follow from the fact that p is sublinear, and the fourth inequality follows from (ii) in the assumption. Then, by the definition of q, we have

q(x1 + x2) ≤ q(x1) + q(x2)

and the right-hand side makes sense since q(x1) < +∞ and q(x2) < +∞. Note that 0 = q(ϑX) ≤ q(x1) + q(−x1). Then, q(x1) > −∞. Hence, q : X → R. ∀α ∈ [0, ∞) ⊂ R, ∀x ∈ X, we have

q(αx) = inf_{n∈N, A1,…,An∈G} (1/n) p(Σ_{i=1}^n Ai(αx)) = inf_{n∈N, A1,…,An∈G} (1/n) p(α Σ_{i=1}^n Ai x) = inf_{n∈N, A1,…,An∈G} α (1/n) p(Σ_{i=1}^n Ai x) = αq(x)

where the fourth equality follows from Proposition 3.81 and the fact that q(x) ∈ R. Hence, we have shown that q is a sublinear functional. ∀m ∈ M, ∀n ∈ N, ∀A1, …, An ∈ G, we have

Re(f(m)) = Re((1/n) Σ_{i=1}^n f(Ai m)) = Re((1/n) f(Σ_{i=1}^n Ai m)) ≤ (1/n) p(Σ_{i=1}^n Ai m)

where the first equality follows from (i) and (iii) in the assumptions, the second equality follows from the linearity of f, and the inequality follows from (iv) in the assumptions. Hence, we have Re(f(m)) ≤ q(m), ∀m ∈ M. By the extension forms of the Hahn–Banach Theorem (Theorem 7.80 for K = R and Theorem 7.82 for K = C), there exists a linear functional F : X → K such that F|M = f and Re(F(x)) ≤ q(x), ∀x ∈ X.

∀x ∈ X, ∀A ∈ G, ∀n ∈ N, we have

q(x − Ax) ≤ (1/n) p(E(x − Ax) + A(x − Ax) + ⋯ + A^{n−1}(x − Ax)) = (1/n) p(Ex − A^n x) ≤ (1/n)(p(Ex) + p(A^n(−x))) ≤ (1/n)(p(x) + p(−x))

where the first inequality follows from the definition of q, the second inequality follows from the fact that p is sublinear, and the third inequality follows from (ii) in the assumption. Hence, by the arbitrariness of n, we have q(x − Ax) ≤ 0.

We will show that F(Ax) = F(x) by distinguishing two exhaustive and mutually exclusive cases: Case 1: K = R; Case 2: K = C. Case 1: K = R. We have F(x) − F(Ax) = F(x − Ax) ≤ q(x − Ax) ≤ 0. On the other hand, we have F(Ax) − F(x) = F(−x) − F(A(−x)) = F((−x) − A(−x)) ≤ q((−x) − A(−x)) ≤ 0. Hence, F(x) = F(Ax). Case 2: K = C. Define H : X → R by H(x) = Re(F(x)), ∀x ∈ X. By Lemma 7.40, (X, R) is a vector space. By Lemma 7.81, H is a linear functional on (X, R) and F(x) = H(x) − iH(ix), ∀x ∈ X. Note that H(x) − H(Ax) = H(x − Ax) = Re(F(x − Ax)) ≤ q(x − Ax) ≤ 0. On the other hand, we have H(Ax) − H(x) = H(−x) − H(A(−x)) = H((−x) − A(−x)) = Re(F((−x) − A(−x))) ≤ q((−x) − A(−x)) ≤ 0. Hence, H(x) = H(Ax). Then, F(Ax) = H(Ax) − iH(iAx) = H(Ax) − iH(A(ix)) = H(x) − iH(ix) = F(x).

Furthermore, if, in addition, X is normed with ‖·‖ and p is continuous at ϑX, then, ∀ε ∈ (0, ∞) ⊂ R, ∃δ ∈ (0, ∞) ⊂ R, ∀x ∈ B(ϑX, δ), we have |p(x) − p(ϑX)| = |p(x)| < ε. Note that q(x) ≤ p(x) < ε and q(−x) ≤ p(−x) < ε. Since q is a sublinear functional, we have 0 = q(ϑX) ≤ q(x) + q(−x). Hence, we have q(x) ≥ −q(−x) > −ε. Therefore, |q(x) − q(ϑX)| = |q(x)| < ε. Hence, q is continuous at ϑX. Therefore, by the extension forms of the Hahn–Banach Theorem, F is continuous. This completes the proof of the corollary. □

Proposition 7.85 Let X be a normed linear space over the field K and x0 ∈ X with x0 ≠ ϑX. Then, ∃x∗ ∈ X∗ with ‖x∗‖ = 1 such that ⟨x∗, x0⟩ = ‖x∗‖‖x0‖.

Proof Consider the subspace M := span({x0}). Define f : M → K by f(αx0) = α‖x0‖, ∀α ∈ K. Clearly, f is uniquely defined, by Proposition 6.17. Clearly, f is a linear functional on M and ‖f‖M := sup_{m∈M, ‖m‖≤1} |f(m)| = sup_{α∈K, ‖αx0‖≤1} |α|‖x0‖ = 1.
By the simple version of the Hahn–Banach Theorem, there exists x∗ ∈ X∗ such that x∗|_M = f and ‖x∗‖ = ‖f‖_M = 1. Then, ⟨x∗, x0⟩ = f(x0) = ‖x0‖ = ‖x∗‖‖x0‖. This completes the proof of the proposition. ∎

Example 7.86 Let Y be a normed linear space over the field K, X := l∞(Y) be the normed linear space as defined in Example 7.10, and y0 ∈ Y with y0 ≠ ϑY. We will show that X∗ ≠ l1(Y∗) using the Hahn–Banach Theorem.


Let M := {x := (ξ1, ξ2, . . .) ∈ X | lim_{n∈N} ξn ∈ Y}. Clearly, M is a subspace of X. By Proposition 7.85, there exists y∗ ∈ Y∗ such that ‖y∗‖ = 1 and ⟨y∗, y0⟩ = ‖y0‖ > 0. Define f : M → K by f(m) = ⟨y∗, lim_{n∈N} ξn⟩, ∀m := (ξ1, ξ2, . . .) ∈ M. It is easy to show that f is a linear functional on M. ∀m = (ξ1, ξ2, . . .) ∈ M with ‖m‖ ≤ 1, we have

|f(m)| = |⟨y∗, lim_{n∈N} ξn⟩| ≤ ‖lim_{n∈N} ξn‖ = lim_{n∈N} ‖ξn‖ ≤ sup_{n≥1} ‖ξn‖ = ‖m‖ ≤ 1

where we have applied Propositions 7.72, 7.21, and 3.66. Hence, we have ‖f‖_M := sup_{m∈M, ‖m‖≤1} |f(m)| ≤ 1. Consider m0 = (y0, y0, . . .) ∈ M; we have |f(m0)| = |⟨y∗, y0⟩| = ‖y0‖ = ‖m0‖ > 0. Hence, by Proposition 7.72, ‖f‖_M = 1. By the simple version of the Hahn–Banach Theorem, there exists x∗ ∈ X∗ such that x∗|_M = f and ‖x∗‖ = ‖f‖_M = 1. Clearly, there does not exist (η∗1, η∗2, . . .) ∈ l1(Y∗) such that x∗ is given by ⟨x∗, x⟩ = Σ_{n=1}^∞ ⟨η∗n, ξn⟩, ∀x = (ξ1, ξ2, . . .) ∈ X. Hence, X∗ ≠ l1(Y∗). %
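In finite dimensions, the aligned functional guaranteed by Proposition 7.85 can be written down explicitly. The sketch below (our illustration, not part of the book's development) verifies the alignment identity ⟨x∗, x0⟩ = ‖x∗‖‖x0‖ numerically on R³: for the Euclidean norm one may take x∗ = x0/‖x0‖₂, and for the ℓ¹ norm, whose dual norm is the sup norm, one may take x∗ = sign(x0).

```python
import numpy as np

x0 = np.array([3.0, -4.0, 12.0])

# Euclidean norm: x* = x0/||x0||_2 has dual (Euclidean) norm 1 and is aligned with x0.
x_star = x0 / np.linalg.norm(x0)
pairing = float(x_star @ x0)
assert abs(pairing - np.linalg.norm(x_star) * np.linalg.norm(x0)) < 1e-12

# l1 norm on X: the dual norm is the sup norm; x* = sign(x0) has sup norm 1 and
# <x*, x0> = sum_i |x0_i| = ||x*||_inf * ||x0||_1, i.e., x* is aligned with x0.
y_star = np.sign(x0)
assert abs(float(y_star @ x0) - np.linalg.norm(x0, 1)) < 1e-12
```

The same recipe underlies Example 7.86: each summable sequence of dual vectors pairs boundedly with a bounded sequence, but the "limit functional" built there is not of this form.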

7.9.4 Second Dual Space

Definition 7.87 Let X be a normed linear space over the field K and X∗ be its dual. The dual space of X∗ is called the second dual of X and is denoted by X∗∗. %

Remark 7.88 Let X be a normed linear space over the field K, X∗ be its dual, and X∗∗ be its second dual. Then, X is isometrically isomorphic to a dense subset of a Banach space, which can be taken as a subspace of X∗∗. This can be proved as follows. By Proposition 7.72, X∗∗ is a Banach space over K. ∀x ∈ X, define a functional f : X∗ → K by f(x∗) = ⟨x∗, x⟩, ∀x∗ ∈ X∗. Clearly, f is a linear functional on X∗. Note that |f(x∗)| = |⟨x∗, x⟩| ≤ ‖x‖‖x∗‖, ∀x∗ ∈ X∗, by Proposition 7.72. Then, ‖f‖ ≤ ‖x‖. When x = ϑX, then ‖f‖ = 0 = ‖x‖. When x ≠ ϑX, then, by Proposition 7.85, ∃x∗0 ∈ X∗ with ‖x∗0‖ = 1 such that ⟨x∗0, x⟩ = ‖x‖. Then, ‖f‖ = sup_{x∗∈X∗, ‖x∗‖≤1} |f(x∗)| ≥ |f(x∗0)| = ‖x‖. Hence, we have ‖f‖ = ‖x‖. Hence, f ∈ X∗∗. Thus, we may define a natural mapping φ : X → X∗∗ by φ(x) = f, ∀x ∈ X. ∀x1, x2 ∈ X, ∀α, β ∈ K, ∀x∗ ∈ X∗, we have φ(αx1 + βx2)(x∗) = ⟨x∗, αx1 + βx2⟩ = α⟨x∗, x1⟩ + β⟨x∗, x2⟩ = αφ(x1)(x∗) + βφ(x2)(x∗). Hence, φ is a linear function. ∀x ∈ X, we have ‖φ(x)‖ = ‖x‖. Hence, φ is an isometry and φ is injective. Therefore, φ is an isometrical isomorphism between X and φ(X). Clearly, φ(X) ⊆ X∗∗ is a subspace. Then, by Proposition 7.17, cl(φ(X)) ⊆ X∗∗ is a closed subspace. By Proposition 4.39, cl(φ(X)) is complete. Hence, cl(φ(X)) is a Banach space. By Proposition 3.5, φ(X) is dense in cl(φ(X)). This completes the proof of the remark. %


Definition 7.89 Let X be a normed linear space over the field K, X∗∗ be its second dual, and φ : X → X∗∗ be the natural mapping as defined in Remark 7.88. X is said to be reflexive if φ(X) = X∗∗, that is, X and X∗∗ are isometrically isomorphic. Then, we may label X∗∗ such that X∗∗ = X and φ = idX. %

A reflexive normed linear space is clearly a Banach space.

Proposition 7.90 Let X be a Banach space over the field K and X∗ be its dual. Then, X is reflexive if, and only if, X∗ is reflexive.

Proof Let X∗∗ be the second dual of X and X∗∗∗ be the dual of X∗∗. “Necessity” Let X be reflexive. Then, X∗∗ = X isometrically isomorphically. Then, X∗ = (X∗∗)∗ = X∗∗∗ isometrically isomorphically. Hence, X∗ is reflexive. “Sufficiency” Let X∗ be reflexive. Then, X∗∗∗ = φ∗(X∗), where φ∗ : X∗ → X∗∗∗ is the natural mapping on X∗. We will show that X is reflexive by an argument of contradiction. Suppose X is not reflexive. Let φ : X → X∗∗ be the natural mapping. φ is an isometrical isomorphism between X and φ(X). Since X is complete, then φ(X) is also complete. Then, by Proposition 4.39, φ(X) is a closed subspace of X∗∗. Since X is not reflexive, then there exists y∗∗ ∈ X∗∗ \ φ(X). Then, y∗∗ ≠ ϑ∗∗ = φ(ϑ). Let δ := inf_{m∗∗∈φ(X)} ‖y∗∗ − m∗∗‖. By Proposition 4.10, δ ∈ (0, ∞) ⊂ R. Consider the subspace N := {m∗∗ + αy∗∗ ∈ X∗∗ | α ∈ K, m∗∗ ∈ φ(X)}. Since y∗∗ ∉ φ(X), then ∀n∗∗ ∈ N, ∃! α ∈ K and ∃! m∗∗ ∈ φ(X) such that n∗∗ = αy∗∗ + m∗∗. Define a linear functional f : N → K by f(n∗∗) = α, ∀n∗∗ = αy∗∗ + m∗∗ ∈ N, where m∗∗ ∈ φ(X) and α ∈ K. ∀n∗∗ ∈ N, let n∗∗ = αy∗∗ + m∗∗, where m∗∗ ∈ φ(X) and α ∈ K. We have |f(n∗∗)| = |α| = |α|δ/δ ≤ (1/δ)‖αy∗∗ + m∗∗‖ = (1/δ)‖n∗∗‖. Hence, ‖f‖_N ≤ 1/δ. By the simple version of the Hahn–Banach Theorem, there exists F ∈ X∗∗∗ such that F|_N = f and ‖F‖ = ‖f‖_N. Since X∗ is reflexive, then φ∗(X∗) = X∗∗∗. Then, ∃y∗ ∈ X∗ such that F = φ∗(y∗). Then, ∀x∗∗ ∈ X∗∗, we have F(x∗∗) = φ∗(y∗)(x∗∗) = ⟨x∗∗, y∗⟩. ∀m∗∗ ∈ φ(X), we have ⟨m∗∗, y∗⟩ = F(m∗∗) = f(m∗∗) = 0.
Hence, ∀x ∈ X, ⟨y∗, x⟩ = φ(x)(y∗) = ⟨φ(x), y∗⟩ = 0. Hence, y∗ = ϑ∗. Then, F = φ∗(ϑ∗) = ϑ∗∗∗. This contradicts the fact that F(y∗∗) = f(y∗∗) = 1 ≠ 0. Therefore, X must be reflexive. This completes the proof of the proposition. ∎

Example 7.91 Clearly, the normed linear space ({ϑX}, K, ‖·‖) as defined in Example 7.73 is reflexive. Clearly, Kⁿ, n ∈ N, are reflexive. Let Y be a reflexive Banach space. Then, lp(Y), p ∈ (1, ∞) ⊂ R, are reflexive. l1(Y) is not reflexive by Example 7.86 (when Y is nontrivial). Then, by Proposition 7.90, l∞(Y∗) is not reflexive either. %

Proposition 7.92 Let X be a finite-dimensional normed linear space over the field K. Then, X is reflexive.

Proof Let n ∈ Z+ be the dimension of X and φ : X → X∗∗ be the natural mapping. We will distinguish two exhaustive and mutually exclusive cases: Case 1: n = 0; Case 2: n ∈ N. Case 1: n = 0. Then, X = ({ϑ}, K, ‖·‖). By Example 7.73, X∗ = ({ϑ∗}, K, ‖·‖∗). Then, X∗∗ = ({ϑ∗∗}, K, ‖·‖∗∗). Clearly, φ(ϑ) = ϑ∗∗. Hence, φ(X) = X∗∗ and X is reflexive.


Case 2: n ∈ N. Let {e1, . . . , en} ⊆ X be a basis of X. ∀x ∈ X, by Corollary 6.47, x can be uniquely expressed as Σ_{j=1}^n αj ej for some α1, . . . , αn ∈ K. ∀i = 1, . . . , n, define fi : X → K by fi(x) = αi, ∀x = Σ_{j=1}^n αj ej ∈ X. Clearly, fi is well-defined and a linear functional. By Proposition 7.67, fi is continuous and fi ∈ X∗. Denote fi by e∗i ∈ X∗. ∀x∗ ∈ X∗, ∀i = 1, . . . , n, let βi = ⟨x∗, ei⟩ ∈ K. ∀x = Σ_{j=1}^n αj ej ∈ X, we have

⟨x∗, x⟩ = Σ_{j=1}^n αj⟨x∗, ej⟩ = Σ_{j=1}^n βj αj = Σ_{j=1}^n βj⟨e∗j, x⟩ = ⟨Σ_{j=1}^n βj e∗j, x⟩

Hence, x∗ = Σ_{j=1}^n βj e∗j. Therefore, X∗ = span({e∗1, . . . , e∗n}). ∀x∗∗ ∈ X∗∗, ∀i = 1, . . . , n, let γi = ⟨x∗∗, e∗i⟩ ∈ K. ∀x∗ = Σ_{j=1}^n βj e∗j ∈ X∗, we have

⟨x∗∗, x∗⟩ = Σ_{j=1}^n βj⟨x∗∗, e∗j⟩ = Σ_{j=1}^n βj γj = ⟨x∗, Σ_{k=1}^n γk ek⟩ = ⟨φ(Σ_{k=1}^n γk ek), x∗⟩

Hence, x∗∗ = φ(Σ_{k=1}^n γk ek) and X∗∗ ⊆ φ(X). This shows that X is reflexive. This completes the proof of the proposition. ∎
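The dual basis {e∗1, . . . , e∗n} constructed in Case 2 is easy to exhibit numerically. In the sketch below (our illustration; the basis vectors are taken as the columns of an invertible matrix E, so the coordinate functionals are the rows of E⁻¹), we check the biorthogonality ⟨e∗i, ej⟩ = δij and the expansion identities used in the proof.

```python
import numpy as np

# Basis e_1, e_2, e_3 of R^3: the columns of an invertible matrix E.
E = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
E_inv = np.linalg.inv(E)           # row i of E_inv is the coordinate functional e*_i

# Biorthogonality: <e*_i, e_j> = delta_ij.
assert np.allclose(E_inv @ E, np.eye(3))

x = np.array([1.0, 2.0, 3.0])
alpha = E_inv @ x                  # alpha_i = <e*_i, x> = f_i(x)
assert np.allclose(E @ alpha, x)   # x = sum_j alpha_j e_j

x_star = np.array([0.5, -1.0, 2.0])  # a functional, acting by the dot product
beta = E.T @ x_star                  # beta_i = <x_star, e_i>
# <x_star, x> = sum_j beta_j alpha_j, exactly as in the proof of Proposition 7.92:
assert np.isclose(beta @ alpha, x_star @ x)
```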

7.9.5 Alignment and Orthogonal Complements

Definition 7.93 Let X be a normed linear space, x ∈ X, X∗ be the dual, and x∗ ∈ X∗. We say that x∗ is aligned with x if ⟨x∗, x⟩ = ‖x∗‖‖x‖. %

Clearly, ϑ is aligned with every vector in the dual and ϑ∗ is aligned with every vector in X. By Proposition 7.85, ∀x ∈ X with x ≠ ϑ, there exists an x∗ ∈ X∗ with x∗ ≠ ϑ∗ that is aligned with it.

Definition 7.94 Let X be a normed linear space, x ∈ X, X∗ be the dual, and x∗ ∈ X∗. We say that x and x∗ are orthogonal if ⟨x∗, x⟩ = 0. %

Definition 7.95 Let X be a normed linear space, S ⊆ X, and X∗ be the dual. The orthogonal complement of S, denoted by S⊥, consists of all x∗ ∈ X∗ that are orthogonal to every vector in S, that is, S⊥ := {x∗ ∈ X∗ | ⟨x∗, x⟩ = 0, ∀x ∈ S}. %


Definition 7.96 Let X be a normed linear space, X∗ be the dual, and S ⊆ X∗. The pre-orthogonal complement of S, denoted by ⊥S, consists of all x ∈ X that are orthogonal to every vector in S, that is, ⊥S := {x ∈ X | ⟨x∗, x⟩ = 0, ∀x∗ ∈ S}. %

Proposition 7.97 Let X be a normed linear space over the field K, M ⊆ X be a subspace, and y ∈ X. Then, the following statements hold.

(i) δ := inf_{m∈M} ‖y − m‖ = max_{x∗∈M⊥, ‖x∗‖≤1} Re(⟨x∗, y⟩), where the maximum is achieved at some x∗0 ∈ M⊥ with ‖x∗0‖ ≤ 1. If the infimum is achieved at m0 ∈ M, then y − m0 is aligned with x∗0.
(ii) If ∃m0 ∈ M and ∃x∗0 ∈ M⊥ with ‖x∗0‖ = 1 such that y − m0 is aligned with x∗0, then the infimum is achieved at m0 and the maximum is achieved at x∗0, that is, δ = ‖y − m0‖ = ⟨x∗0, y⟩ = Re(⟨x∗0, y⟩).

Proof (i) ∀x∗ ∈ M⊥ with ‖x∗‖ ≤ 1, ∀m ∈ M, we have

Re(⟨x∗, y⟩) = Re(⟨x∗, y − m⟩) ≤ |Re(⟨x∗, y − m⟩)| ≤ |⟨x∗, y − m⟩| ≤ ‖x∗‖‖y − m‖ ≤ ‖y − m‖

where we have applied Proposition 7.72 in the third inequality. Then, we have

sup_{x∗∈M⊥, ‖x∗‖≤1} Re(⟨x∗, y⟩) ≤ inf_{m∈M} ‖y − m‖ = δ

We will distinguish two exhaustive and mutually exclusive cases: Case 1: δ = 0; Case 2: δ > 0. Case 1: δ = 0. Take x∗0 := ϑ∗. Clearly, x∗0 ∈ M⊥ and ‖x∗0‖ = 0 ≤ 1. Then, δ = Re(⟨x∗0, y⟩) ≤ sup_{x∗∈M⊥, ‖x∗‖≤1} Re(⟨x∗, y⟩). Therefore, we have δ = inf_{m∈M} ‖y − m‖ = max_{x∗∈M⊥, ‖x∗‖≤1} Re(⟨x∗, y⟩) and the maximum is achieved at some x∗0 ∈ M⊥ with ‖x∗0‖ ≤ 1. If the infimum is achieved at m0 ∈ M, then 0 = δ = ‖y − m0‖ and y = m0. Then, clearly y − m0 is aligned with x∗0. This case is proved. Case 2: δ > 0. Then, y ∈ X \ cl(M). Consider the subspace span(M ∪ {y}) =: M̄. ∀x ∈ M̄, there exists a unique α ∈ K and a unique m ∈ M such that x = αy + m. Then, we may define a functional f : M̄ → K by f(x) = f(αy + m) = αδ. Clearly, f is a linear functional on M̄. ‖f‖_M̄ := sup_{x∈M̄, ‖x‖≤1} |f(x)| ≤ sup_{α∈K, m∈M, ‖αy+m‖≤1} |α|δ ≤ sup_{α∈K, m∈M, ‖αy+m‖≤1} ‖αy + m‖ ≤ 1. By the simple version of the Hahn–Banach Theorem, ∃x∗0 ∈ X∗ such that x∗0|_M̄ = f and ‖x∗0‖ = ‖f‖_M̄ ≤ 1. ∀m ∈ M, we have ⟨x∗0, m⟩ = f(m) = 0. Hence, x∗0 ∈ M⊥. Note that δ = f(y) = ⟨x∗0, y⟩. Hence, δ = inf_{m∈M} ‖y − m‖ = max_{x∗∈M⊥, ‖x∗‖≤1} Re(⟨x∗, y⟩) and the maximum is achieved at some x∗0 ∈ M⊥ with ‖x∗0‖ ≤ 1. If the infimum is achieved at m0 ∈ M, then δ = ‖y − m0‖ = ⟨x∗0, y⟩ = ⟨x∗0, y − m0⟩ ≤ ‖x∗0‖‖y − m0‖ ≤ ‖y − m0‖, where the first inequality follows from Proposition 7.72. Hence, ⟨x∗0, y − m0⟩ = ‖x∗0‖‖y − m0‖ and x∗0 is aligned with y − m0. This case is proved.


(ii) Note that δ ≤ ‖y − m0‖ = ‖x∗0‖‖y − m0‖ = ⟨x∗0, y − m0⟩ = ⟨x∗0, y⟩ = Re(⟨x∗0, y⟩) ≤ δ. Then, the result follows. This completes the proof of the proposition. ∎

Proposition 7.98 Let X be a normed linear space over K, A, C ⊆ X, and B, D ⊆ X∗. Then, the following statements hold.

(i) If A ⊆ C, then C⊥ ⊆ A⊥. Similarly, if B ⊆ D, then ⊥D ⊆ ⊥B.
(ii) A⊥ ⊆ X∗ is a closed subspace.
(iii) ⊥B ⊆ X is a closed subspace.
(iv) ⊥(A⊥) = cl(span(A)).

Proof (i) This follows directly from Definition 7.95. (ii) ϑ∗ ∈ A⊥ ≠ ∅. ∀x∗1, x∗2 ∈ A⊥, ∀α, β ∈ K, ∀x ∈ A, we have ⟨αx∗1 + βx∗2, x⟩ = α⟨x∗1, x⟩ + β⟨x∗2, x⟩ = 0. Hence, αx∗1 + βx∗2 ∈ A⊥ and A⊥ is a subspace. ∀x∗ ∈ cl(A⊥), by Proposition 4.13, ∃(x∗n)_{n=1}^∞ ⊆ A⊥ such that lim_{n∈N} x∗n = x∗. Then, ∀x ∈ A, by Propositions 7.72 and 3.66, we have ⟨x∗, x⟩ = lim_{n∈N} ⟨x∗n, x⟩ = 0. Then, x∗ ∈ A⊥ and cl(A⊥) ⊆ A⊥. By Proposition 3.3, cl(A⊥) = A⊥ and A⊥ is closed. Hence, (ii) follows. (iii) This can be shown by a similar argument as (ii). (iv) Clearly, A ⊆ ⊥(A⊥). Then, by (iii), we have M := cl(span(A)) ⊆ ⊥(A⊥). On the other hand, A ⊆ M; then, by (i), we have A⊥ ⊇ M⊥ and ⊥(A⊥) ⊆ ⊥(M⊥). ∀x0 ∈ ⊥(M⊥), by Proposition 7.97, we have

inf_{m∈M} ‖x0 − m‖ = max_{x∗∈M⊥, ‖x∗‖≤1} Re(⟨x∗, x0⟩) = 0

Since M is closed, by Proposition 4.10, x0 ∈ M. Hence, we have ⊥(M⊥) ⊆ M and ⊥(A⊥) ⊆ M. Hence, M = ⊥(A⊥). Hence, (iv) follows. This completes the proof of the proposition. ∎
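Over Rⁿ with the dot-product pairing, A⊥ is the null space of the matrix whose rows are the vectors of A, and part (iv) reduces to the classical identity ⊥(A⊥) = span(A) (the span is automatically closed in finite dimensions). A numerical sketch, with helper names of our own choosing:

```python
import numpy as np

def null_space(M, tol=1e-10):
    """Orthonormal basis (as columns) of {x : Mx = 0}, via the SVD."""
    _, s, vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0]])   # rows span a 2-dimensional subspace of R^4

perp = null_space(A)                   # A-perp: all functionals vanishing on span(A)
assert perp.shape[1] == 2              # dim A-perp = 4 - 2

double = null_space(perp.T)            # perp(A-perp)
# double spans the same subspace as the rows of A: compare orthogonal projectors.
P1 = double @ np.linalg.pinv(double)   # projector onto perp(A-perp)
P2 = A.T @ np.linalg.pinv(A.T)         # projector onto span(A)
assert np.allclose(P1, P2)
```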

Proposition 7.99 Let X be a normed linear space over the field K, M ⊆ X be a subspace, and y∗ ∈ X∗. Then, the following statements hold.

(i) δ := min_{x∗∈M⊥} ‖y∗ − x∗‖ = sup_{m∈M, ‖m‖≤1} Re(⟨y∗, m⟩) =: δ̄, where the minimum is achieved at some x∗0 ∈ M⊥. If the supremum is achieved at m0 ∈ M with ‖m0‖ ≤ 1, then m0 is aligned with y∗ − x∗0.
(ii) If ∃m0 ∈ M with ‖m0‖ = 1 and ∃x∗0 ∈ M⊥ such that y∗ − x∗0 is aligned with m0, then the minimum is achieved at x∗0 and the supremum is achieved at m0, that is, δ = ‖y∗ − x∗0‖ = ⟨y∗, m0⟩ = Re(⟨y∗, m0⟩).

Proof (i) ∀x∗ ∈ M⊥, ∀m ∈ M with ‖m‖ ≤ 1, we have

Re(⟨y∗, m⟩) = Re(⟨y∗ − x∗, m⟩) ≤ |Re(⟨y∗ − x∗, m⟩)| ≤ |⟨y∗ − x∗, m⟩| ≤ ‖y∗ − x∗‖‖m‖ ≤ ‖y∗ − x∗‖


where we have applied Proposition 7.72 in the third inequality. Then, we have

δ = inf_{x∗∈M⊥} ‖y∗ − x∗‖ ≥ sup_{m∈M, ‖m‖≤1} Re(⟨y∗, m⟩) = δ̄

We will distinguish two exhaustive and mutually exclusive cases: Case 1: δ = 0; Case 2: δ > 0. Case 1: δ = 0. By Proposition 7.98, M⊥ is a closed subspace. By Proposition 4.10, we have y∗ ∈ M⊥ and the minimum is achieved at the unique vector x∗0 := y∗. Take m0 = ϑ with ‖m0‖ = 0 ≤ 1. Then, δ = Re(⟨y∗, m0⟩) ≤ sup_{m∈M, ‖m‖≤1} Re(⟨y∗, m⟩) = δ̄ ≤ δ. Hence, δ = min_{x∗∈M⊥} ‖y∗ − x∗‖ = sup_{m∈M, ‖m‖≤1} Re(⟨y∗, m⟩) = δ̄ and the minimum is achieved at x∗0 = y∗ ∈ M⊥. If the supremum is achieved at m0 ∈ M with ‖m0‖ ≤ 1, then clearly m0 is aligned with y∗ − x∗0. This case is proved. Case 2: δ > 0. Then, y∗ ∉ M⊥. Consider the subspace M. Then, we may define a functional f : M → K by f(m) = ⟨y∗, m⟩. Clearly, f is a linear functional on M. ∀m ∈ M with ‖m‖ ≤ 1, we have either f(m) = 0, then |f(m)| = 0 ≤ sup_{m∈M, ‖m‖≤1} Re(⟨y∗, m⟩) = δ̄; or f(m) ≠ 0, then |f(m)| = (|f(m)|/f(m)) f(m) = Re((|f(m)|/f(m)) f(m)) = Re(f((|f(m)|/f(m)) m)) = Re(⟨y∗, (|f(m)|/f(m)) m⟩) ≤ δ̄, since ‖(|f(m)|/f(m)) m‖ = ‖m‖ ≤ 1. Hence, ‖f‖_M := sup_{m∈M, ‖m‖≤1} |f(m)| ≤ δ̄ ≤ δ < +∞. By the simple version of the Hahn–Banach Theorem, ∃y∗0 ∈ X∗ such that y∗0|_M = f and ‖y∗0‖ = ‖f‖_M ≤ δ̄. Let x∗0 := y∗ − y∗0. ∀m ∈ M, we have ⟨x∗0, m⟩ = ⟨y∗, m⟩ − ⟨y∗0, m⟩ = 0. Hence, x∗0 ∈ M⊥. Note that δ ≥ δ̄ ≥ ‖f‖_M = ‖y∗0‖ = ‖y∗ − x∗0‖ ≥ inf_{x∗∈M⊥} ‖y∗ − x∗‖ = δ. Hence, δ = min_{x∗∈M⊥} ‖y∗ − x∗‖ = sup_{m∈M, ‖m‖≤1} Re(⟨y∗, m⟩) = δ̄ and the minimum is achieved at some x∗0 ∈ M⊥. If the supremum is achieved at m0 ∈ M with ‖m0‖ ≤ 1, then δ = ‖y∗ − x∗0‖ ≥ ‖y∗ − x∗0‖‖m0‖ ≥ |⟨y∗ − x∗0, m0⟩| = |⟨y∗, m0⟩| ≥ Re(⟨y∗, m0⟩) = δ, where the second inequality follows from Proposition 7.72. Hence, ⟨y∗ − x∗0, m0⟩ = ‖y∗ − x∗0‖‖m0‖ and y∗ − x∗0 is aligned with m0. This case is proved. (ii) Note that δ ≤ ‖y∗ − x∗0‖ = ‖y∗ − x∗0‖‖m0‖ = ⟨y∗ − x∗0, m0⟩ = ⟨y∗, m0⟩ = Re(⟨y∗, m0⟩) ≤ δ. Hence, the result follows. This completes the proof of the proposition. ∎

Proposition 7.100 Let X be a normed linear space over the field K and S ⊆ X be a subspace. By Proposition 7.13, S is a normed linear space over K.
Then, the following statements hold. (i) S∗ is isometrically isomorphic to X∗/S⊥. (ii) If X is reflexive and S is closed, then S is reflexive.

Proof (i) Define a mapping A : X∗ → S∗ by ⟨A(x∗), s⟩ = ⟨x∗, s⟩, ∀x∗ ∈ X∗ and ∀s ∈ S. ∀x∗ ∈ X∗, we will show that A(x∗) ∈ S∗. Clearly, A(x∗) is a linear functional on S. ‖A(x∗)‖ := sup_{s∈S, ‖s‖≤1} |⟨A(x∗), s⟩| = sup_{s∈S, ‖s‖≤1} |⟨x∗, s⟩| ≤ sup_{s∈S, ‖s‖≤1} ‖x∗‖‖s‖ ≤ ‖x∗‖ < +∞. Hence, A(x∗) ∈ S∗ and A is well-defined.


∀x∗1, x∗2 ∈ X∗, ∀α, β ∈ K, ∀s ∈ S, we have

⟨A(αx∗1 + βx∗2), s⟩ = ⟨αx∗1 + βx∗2, s⟩ = α⟨x∗1, s⟩ + β⟨x∗2, s⟩ = α⟨A(x∗1), s⟩ + β⟨A(x∗2), s⟩ = ⟨αA(x∗1) + βA(x∗2), s⟩

Hence, A is a linear function. Since ‖Ax∗‖ ≤ ‖x∗‖, ∀x∗ ∈ X∗, then A ∈ B(X∗, S∗) with ‖A‖ ≤ 1. By Proposition 7.68, N(A) ⊆ X∗ is a closed subspace. ∀x∗ ∈ N(A), we have Ax∗ = ϑS∗. ∀s ∈ S, 0 = ⟨Ax∗, s⟩ = ⟨x∗, s⟩. Hence, x∗ ∈ S⊥. Therefore, N(A) ⊆ S⊥. On the other hand, ∀x∗ ∈ S⊥, ∀s ∈ S, we have 0 = ⟨x∗, s⟩ = ⟨Ax∗, s⟩. Then, Ax∗ = ϑS∗. Hence, x∗ ∈ N(A). Thus, S⊥ ⊆ N(A). In conclusion, S⊥ = N(A). By Proposition 7.45, X∗/S⊥ is a Banach space. Let φ : X∗ → X∗/S⊥ be the natural homomorphism. By Proposition 7.70, there exists AD ∈ B(X∗/S⊥, S∗) such that A = AD ◦ φ, AD is injective, and ‖AD‖ = ‖A‖ ≤ 1. ∀s∗ ∈ S∗, by the simple version of the Hahn–Banach Theorem, there exists x∗ ∈ X∗ such that x∗|_S = s∗ and ‖x∗‖ = ‖s∗‖. ∀s ∈ S, we have ⟨s∗, s⟩ = ⟨x∗, s⟩ = ⟨Ax∗, s⟩ = ⟨AD(φ(x∗)), s⟩ = ⟨AD([x∗]), s⟩. Hence, s∗ = AD([x∗]). Then, AD is surjective. Thus, AD is bijective. ∀[x∗] ∈ X∗/S⊥, ‖[x∗]‖ = inf_{y∗∈S⊥} ‖x∗ − y∗‖, by Proposition 7.44. By Proposition 7.99, we have

‖[x∗]‖ = min_{y∗∈S⊥} ‖x∗ − y∗‖ = sup_{s∈S, ‖s‖≤1} Re(⟨x∗, s⟩) = sup_{s∈S, ‖s‖≤1} Re(⟨Ax∗, s⟩) = sup_{s∈S, ‖s‖≤1} Re(⟨AD[x∗], s⟩) ≤ sup_{s∈S, ‖s‖≤1} |Re(⟨AD[x∗], s⟩)| ≤ sup_{s∈S, ‖s‖≤1} |⟨AD[x∗], s⟩| ≤ sup_{s∈S, ‖s‖≤1} ‖AD[x∗]‖‖s‖ ≤ ‖AD[x∗]‖ = ‖AD(φ(x∗))‖ = ‖Ax∗‖ = inf_{y∗∈N(A)} ‖A(x∗ − y∗)‖ = inf_{y∗∈S⊥} ‖A(x∗ − y∗)‖ ≤ inf_{y∗∈S⊥} ‖A‖‖x∗ − y∗‖ ≤ ‖[x∗]‖

where we have applied Proposition 7.72 in the third inequality and Proposition 7.64 in the fifth inequality. Therefore, we have ‖[x∗]‖ = ‖AD[x∗]‖ and AD is an isometry. Thus, AD is an isometrical isomorphism between X∗/S⊥ and S∗. Hence, (i) is established. (ii) Let X be reflexive and S be a closed subspace. Let ψ : S → S∗∗ be the natural mapping. All we need to show is that ψ(S) = S∗∗. Fix an s∗∗ ∈ S∗∗. Define a functional τ : X∗ → K by τ(x∗) = ⟨s∗∗, Ax∗⟩, ∀x∗ ∈ X∗. It is easy to show


that τ is a linear functional on X∗. ∀x∗ ∈ X∗ with ‖x∗‖ ≤ 1, we have |τ(x∗)| = |⟨s∗∗, Ax∗⟩| ≤ ‖s∗∗‖‖Ax∗‖ ≤ ‖s∗∗‖‖A‖‖x∗‖ ≤ ‖s∗∗‖ < +∞, where we have applied Propositions 7.72 and 7.64. Hence, τ ∈ X∗∗. Since X is reflexive, then, by Remark 7.88 and Definition 7.89, ∃x0 ∈ X such that τ(x∗) = ⟨x∗, x0⟩, ∀x∗ ∈ X∗. ∀y∗ ∈ S⊥, we have ⟨y∗, x0⟩ = τ(y∗) = ⟨s∗∗, Ay∗⟩ = ⟨s∗∗, ϑS∗⟩ = 0, where the third equality follows from the fact that S⊥ = N(A). Hence, x0 ∈ ⊥(S⊥) = S, by Proposition 7.98. Now, ∀s∗ ∈ S∗, ∃x∗ ∈ X∗ such that Ax∗ = AD[x∗] = s∗, since AD is an isometrical isomorphism. Then, we have ⟨s∗∗, s∗⟩ = ⟨s∗∗, Ax∗⟩ = τ(x∗) = ⟨x∗, x0⟩ = ⟨Ax∗, x0⟩ = ⟨s∗, x0⟩. This implies that s∗∗ = ψ(x0) and ψ(S) = S∗∗. Hence, (ii) is established. This completes the proof of the proposition. ∎
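Part (i) of Proposition 7.100 can be checked concretely for a subspace S of Euclidean R⁴: the norm of the restricted functional x∗|_S equals the quotient norm inf_{y∗∈S⊥} ‖x∗ − y∗‖. The sketch below is our illustration only; it uses the Hilbert-space fact that the optimal y∗ is the component of x∗ orthogonal to S, and samples other points of S⊥ to confirm none does better.

```python
import numpy as np

# Subspace S = column space of B in R^4, Euclidean norm throughout.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
P_S = B @ np.linalg.pinv(B)                  # orthogonal projector onto S
x_star = np.array([1.0, 2.0, 0.0, 5.0])

# Norm of the restriction x*|_S = sup{ <x*, s> : s in S, ||s|| <= 1 } = ||P_S x*||.
restricted_norm = np.linalg.norm(P_S @ x_star)
s_opt = P_S @ x_star / restricted_norm       # the maximizing unit vector of S
assert np.isclose(x_star @ s_opt, restricted_norm)

# Quotient norm of [x*] in X*/S-perp: the best y* in S-perp is (I - P_S)x*, and
# no sampled point of S-perp beats it.
best = np.linalg.norm(x_star - (np.eye(4) - P_S) @ x_star)
assert np.isclose(best, restricted_norm)
rng = np.random.default_rng(0)
for _ in range(200):
    y_star = (np.eye(4) - P_S) @ rng.standard_normal(4)  # a point of S-perp
    assert np.linalg.norm(x_star - y_star) >= best - 1e-9
```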

7.10 The Open Mapping Theorem

Definition 7.101 Let X := (X, OX) and Y := (Y, OY) be topological spaces and A : X → Y. A is called an open mapping if ∀O ∈ OX, we have A(O) ∈ OY; that is, the image of each open set is open. %

Proposition 7.102 Let X be a normed linear space over the field K, S, T ⊆ X, and α ∈ K. Then, the following statements hold.

(i) cl(αS) = α cl(S).
(ii) If α ≠ 0, then ∼(αS) = α(∼S), where ∼S denotes the complement of S.
(iii) If α ≠ 0, then (αS)° = αS°.
(iv) cl(S) + cl(T) ⊆ cl(S + T).
(v) S° + T° ⊆ (S + T)°.

Proof (i) We will distinguish three exhaustive and mutually exclusive cases: Case 1: α = 0 and S = ∅; Case 2: α = 0 and S ≠ ∅; Case 3: α ≠ 0. Case 1: α = 0 and S = ∅. Then, cl(S) = ∅ and cl(αS) = ∅ = α cl(S). Case 2: α = 0 and S ≠ ∅. Then, cl(S) ≠ ∅ and cl(αS) = {ϑ} = α cl(S). Case 3: α ≠ 0. ∀x ∈ cl(αS), by Proposition 4.13, ∃(xn)_{n=1}^∞ ⊆ αS such that lim_{n∈N} xn = x. Note that (α⁻¹xn)_{n=1}^∞ ⊆ S and lim_{n∈N} α⁻¹xn = α⁻¹x by Propositions 3.66 and 7.23. By Proposition 4.13, α⁻¹x ∈ cl(S). Then, x ∈ α cl(S). This shows that cl(αS) ⊆ α cl(S). On the other hand, ∀x ∈ α cl(S), then α⁻¹x ∈ cl(S). By Proposition 4.13, ∃(xn)_{n=1}^∞ ⊆ S such that lim_{n∈N} xn = α⁻¹x. Note that (αxn)_{n=1}^∞ ⊆ αS and lim_{n∈N} αxn = x by Propositions 3.66 and 7.23. By Proposition 4.13, x ∈ cl(αS). Then, α cl(S) ⊆ cl(αS). Hence, cl(αS) = α cl(S). (ii) x ∈ ∼(αS) if, and only if, α⁻¹x ∈ ∼S, if, and only if, x ∈ α(∼S). Hence, ∼(αS) = α(∼S). (iii) ∀x ∈ (αS)°, ∃δ ∈ (0, ∞) ⊂ R such that B(x, δ) ⊆ αS. Then, B(α⁻¹x, δ/|α|) = α⁻¹B(x, δ) ⊆ S. Hence, α⁻¹x ∈ S°. Then, x ∈ αS°. This shows that (αS)° ⊆ αS°.


On the other hand, ∀x ∈ αS°, we have α⁻¹x ∈ S°. ∃δ ∈ (0, ∞) ⊂ R such that B(α⁻¹x, δ) ⊆ S. Then, B(x, |α|δ) = αB(α⁻¹x, δ) ⊆ αS. Hence, x ∈ (αS)°. This shows that αS° ⊆ (αS)°. Hence, we have (αS)° = αS°. (iv) ∀x̄ ∈ cl(S) + cl(T), ∃s̄ ∈ cl(S) and ∃t̄ ∈ cl(T) such that x̄ = s̄ + t̄. By Proposition 4.13, ∃(sn)_{n=1}^∞ ⊆ S and ∃(tn)_{n=1}^∞ ⊆ T such that lim_{n∈N} sn = s̄ and lim_{n∈N} tn = t̄. Then, (sn + tn)_{n=1}^∞ ⊆ S + T and lim_{n∈N}(sn + tn) = s̄ + t̄ = x̄, by Propositions 7.23, 3.66, and 3.67. By Proposition 4.13, x̄ ∈ cl(S + T). Hence, cl(S) + cl(T) ⊆ cl(S + T). (v) ∀x ∈ S° + T°, ∃s0 ∈ S° and ∃t0 ∈ T° such that x = s0 + t0. Then, ∃rs, rt ∈ (0, ∞) ⊂ R such that B(s0, rs) ⊆ S and B(t0, rt) ⊆ T. Thus, we have B(x, rs + rt) = B(s0, rs) + B(t0, rt) ⊆ S + T and x ∈ (S + T)°. Hence, S° + T° ⊆ (S + T)°. This completes the proof of the proposition. ∎

Theorem 7.103 (Open Mapping Theorem) Let X and Y be Banach spaces over the field K and A ∈ B(X, Y) be surjective. Then, A is an open mapping. Furthermore, if A is injective, then Ainv ∈ B(Y, X).

Proof We need the following claim.

Claim 7.103.1 The image of the unit ball in X under A contains an open ball centered at the origin in Y, that is, ∃δ ∈ (0, ∞) ⊂ R such that BY(ϑY, δ) ⊆ A(BX(ϑX, 1)).

Proof of Claim Let Sn := BX(ϑX, 2⁻ⁿ), ∀n ∈ Z+. Since A is linear and surjective and ∪_{k=1}^∞ kS1 = X, then, by Proposition 2.5, we have Y = ∪_{k=1}^∞ A(kS1) = ∪_{k=1}^∞ kA(S1). Since Y is a complete metric space, then, by the Baire Category Theorem, Y is second category everywhere. Then, Y is not of first category, that is, Y is not a countable union of nowhere dense sets. Then, by Proposition 7.102, A(S1) ⊆ Y is not nowhere dense. Then, cl(A(S1)) ⊆ Y is not nowhere dense. By Proposition 3.40, ∃ȳ ∈ Y and ∃δ̄ ∈ (0, ∞) ⊂ R such that BY(ȳ, δ̄) ⊆ cl(A(S1)). Then, we have BY(ϑY, δ̄) = BY(ȳ, δ̄) − ȳ ⊆ cl(A(S1)) − cl(A(S1)). Note that, by Proposition 7.102 and the linearity of A, −cl(A(S1)) = cl(−A(S1)) = cl(A(−S1)) = cl(A(S1)).
Then, again by Proposition 7.102 and the linearity of A, BY(ϑY, δ̄) ⊆ cl(A(S1)) + cl(A(S1)) ⊆ cl(A(S1) + A(S1)) = cl(A(S1 + S1)) = cl(A(S0)). Thus, cl(A(S0)) ⊆ Y contains a ball centered at the origin with radius δ̄. By Proposition 7.102 and the linearity of A, BY(ϑY, 2⁻ⁿδ̄) = 2⁻ⁿBY(ϑY, δ̄) ⊆ 2⁻ⁿcl(A(S0)) = cl(A(2⁻ⁿS0)) = cl(A(Sn)) ⊆ Y, ∀n ∈ Z+. Now, we proceed to show that BY(ϑY, δ̄/2) ⊆ A(S0). Fix an arbitrary vector y ∈ BY(ϑY, δ̄/2). Then, y ∈ cl(A(S1)) and ∃x1 ∈ S1 such that ‖y − Ax1‖ < 2⁻²δ̄. This implies that y − Ax1 ∈ BY(ϑY, 2⁻²δ̄) ⊆ cl(A(S2)). Recursively, ∀n ∈ N with n ≥ 2, y − Σ_{k=1}^{n−1} Axk ∈ cl(A(Sn)). Then, ∃xn ∈ Sn such that ‖y − Σ_{k=1}^n Axk‖ < 2⁻ⁿ⁻¹δ̄. This implies that y − Σ_{k=1}^n Axk ∈ BY(ϑY, 2⁻ⁿ⁻¹δ̄) ⊆ cl(A(Sn+1)). Since xn ∈ Sn and ‖xn‖ < 2⁻ⁿ, ∀n ∈ N, then Σ_{n=1}^∞ ‖xn‖ < 1. By Proposition 7.27, we have x = Σ_{n=1}^∞ xn ∈ X. It is easy to show that x ∈ S0. Then, y = lim_{n∈N} Σ_{k=1}^n Axk = lim_{n∈N} A(Σ_{k=1}^n xk) = Ax, where we have applied Proposition 3.66. Thus, y ∈ A(S0) and BY(ϑY, δ̄/2) ⊆ A(S0). This completes the proof of the claim. ∎


Fix any open set O ⊆ X and any y ∈ A(O). Let x ∈ O be such that Ax = y. Then, there exists r ∈ (0, ∞) ⊂ R such that BX(x, r) ⊆ O. By Claim 7.103.1, ∃δ ∈ (0, ∞) ⊂ R such that BY(ϑY, δ) ⊆ A(BX(ϑX, r)). By the linearity of A, we have BY(y, δ) = y + BY(ϑY, δ) ⊆ Ax + A(BX(ϑX, r)) = A(BX(x, r)) ⊆ A(O). By the arbitrariness of y, A(O) ⊆ Y is open. If, in addition, A is injective, then A is bijective and Ainv exists. Since A is an open mapping, then Ainv is continuous. It is obvious that Ainv : Y → X is linear since A is linear. Therefore, Ainv ∈ B(Y, X). This completes the proof of the theorem. ∎

Proposition 7.104 Let X be a vector space over K and ‖·‖1 and ‖·‖2 be two norms on X such that X1 := (X, K, ‖·‖1) and X2 := (X, K, ‖·‖2) are Banach spaces. If ∃M ∈ [0, ∞) ⊂ R such that ‖x‖2 ≤ M‖x‖1, ∀x ∈ X, then the two norms are equivalent.

Proof Consider the mapping A := idX : X1 → X2. Clearly, A is a linear bijective function. By the assumption of the proposition, A ∈ B(X1, X2). By the Open Mapping Theorem, Ainv ∈ B(X2, X1). Then, ∃M̄ ∈ [0, ∞) ⊂ R such that ‖x‖1 = ‖Ainv x‖1 ≤ M̄‖x‖2, ∀x ∈ X. Now, take K = max{M, M̄} + 1 ∈ (0, ∞) ⊂ R. Then, we have ‖x‖1/K ≤ ‖x‖2 ≤ K‖x‖1, ∀x ∈ X. Hence, the two norms are equivalent. This completes the proof of the proposition. ∎

Theorem 7.105 (Closed Graph Theorem) Let X := (X, K, ‖·‖) and Y be Banach spaces over the field K and A : X → Y be a linear operator. Assume that A satisfies: ∀(xn)_{n=1}^∞ ⊆ X with lim_{n∈N} xn = x0 ∈ X and lim_{n∈N} Axn = y0 ∈ Y, we have y0 = Ax0. Then, A ∈ B(X, Y).

Proof Define a functional ‖·‖1 : X → R by ‖x‖1 := ‖x‖ + ‖Ax‖, ∀x ∈ X. It is easy to show that ‖·‖1 defines a norm on X. Let X1 := (X, K, ‖·‖1) be the normed linear space. We will show that X1 is complete. Fix any Cauchy sequence (xn)_{n=1}^∞ ⊆ X1. It is easy to see that (xn)_{n=1}^∞ ⊆ X is a Cauchy sequence and that (Axn)_{n=1}^∞ ⊆ Y is a Cauchy sequence.
By the completeness of X and Y, there exist x0 ∈ X and y0 ∈ Y such that lim_{n∈N} ‖xn − x0‖ = 0 and lim_{n∈N} ‖Axn − y0‖ = 0. By the assumption of the theorem, we have y0 = Ax0. Hence, we have lim_{n∈N} ‖xn − x0‖1 = 0. Then, lim_{n∈N} xn = x0 in X1. Thus, X1 is a Banach space. Clearly, ∀x ∈ X, we have ‖x‖ ≤ ‖x‖1. By Proposition 7.104, ‖·‖ and ‖·‖1 are equivalent. Then, ∃M ∈ [0, ∞) ⊂ R such that ‖x‖1 ≤ M‖x‖, ∀x ∈ X. Then, we have ‖Ax‖ ≤ M‖x‖, ∀x ∈ X. Hence, A ∈ B(X, Y). This completes the proof of the theorem. ∎

Recall that the graph of a function f : X → Y is the set {(x, y) ∈ X × Y | x ∈ X, y = f(x)}. Then, we have the following alternative statement of the Closed Graph Theorem.

Theorem 7.106 (Closed Graph Theorem) Let X and Y be Banach spaces over the field K and A : X → Y be a linear operator. Then, A ∈ B(X, Y) if, and only if, the graph of A is a closed set in X × Y.

Proof “Sufficiency” Let the graph of A be graph(A). Fix any sequence (xn)_{n=1}^∞ ⊆ X with lim_{n∈N} xn = x0 ∈ X and lim_{n∈N} Axn = y0 ∈ Y. Note that ((xn, Axn))_{n=1}^∞ ⊆


graph(A) ⊆ X × Y and lim_{n∈N}(xn, Axn) = (x0, y0), by Proposition 3.67. Then, by Proposition 4.13, we have (x0, y0) ∈ cl(graph(A)) = graph(A), where the equality follows from Proposition 3.3. Hence, we have y0 = Ax0. By the Closed Graph Theorem, Theorem 7.105, we have A ∈ B(X, Y). “Necessity” ∀(x0, y0) ∈ cl(graph(A)), by Proposition 4.13, there exists ((xn, Axn))_{n=1}^∞ ⊆ graph(A) such that lim_{n∈N}(xn, Axn) = (x0, y0). By Proposition 3.67, we have lim_{n∈N} xn = x0 and lim_{n∈N} Axn = y0. By Proposition 3.66, we have y0 = lim_{n∈N} Axn = Ax0. Hence, (x0, y0) ∈ graph(A). By Proposition 3.3, graph(A) is closed in X × Y. This completes the proof of the theorem. ∎

Proposition 7.107 Let X be a Banach space over the field K, Y be a normed linear space over the same field, and F ⊆ B(X, Y). Assume that ∀x ∈ X, ∃Mx ∈ [0, ∞) ⊂ R such that ‖T x‖ ≤ Mx, ∀T ∈ F. Then, ∃M ∈ [0, ∞) ⊂ R such that ‖T‖ ≤ M, ∀T ∈ F.

Proof By the Baire Category Theorem, X is second category everywhere. ∀T ∈ F, let f : X → R be given by f(x) = ‖T x‖, ∀x ∈ X. By Propositions 7.21 and 3.12, f is a continuous real-valued function. By the Uniform Boundedness Principle, Theorem 3.41, there exist an open set O ⊆ X with O ≠ ∅ and M̄ ∈ [0, ∞) ⊂ R such that ‖T x‖ ≤ M̄, ∀T ∈ F, ∀x ∈ O. Since O is nonempty and open, then ∃BX(x0, r) ⊆ O for some x0 ∈ X and some r ∈ (0, ∞) ⊂ R. ∀x ∈ X with ‖x‖ < r, x + x0 ∈ BX(x0, r) ⊆ O and ∀T ∈ F, we have

‖T x‖ = ‖T(x + x0) − T x0‖ ≤ ‖T(x + x0)‖ + ‖T x0‖ ≤ M̄ + Mx0

Hence, ∀T ∈ F, we have, ∀ε ∈ (0, r) ⊂ R,

‖T‖ = sup_{x∈X, ‖x‖≤1} ‖T x‖ = sup_{x∈X, ‖x‖≤1} (r − ε)⁻¹‖T((r − ε)x)‖ ≤ (M̄ + Mx0)/(r − ε)

By the arbitrariness of ε, we have ‖T‖ ≤ (M̄ + Mx0)/r =: M < +∞. This completes the proof of the proposition. ∎

Proposition 7.108 Let X be a Banach space over the field K, Y be a normed linear space over the same field, and (Tn)_{n=1}^∞ ⊆ B(X, Y). Assume that ∀x ∈ X, lim_{n∈N} Tn x = T(x) ∈ Y. Then, T ∈ B(X, Y).

Proof ∀x1, x2 ∈ X, ∀α, β ∈ K, we have

T(αx1 + βx2) = lim_{n∈N} Tn(αx1 + βx2) = lim_{n∈N} (αTn x1 + βTn x2) = αT x1 + βT x2

by Propositions 7.23 and 3.66. Hence, T is linear. ∀x ∈ X, by Propositions 7.21 and 3.66, lim_{n∈N} ‖Tn x‖ = ‖T x‖ < +∞. Then, ∃Mx ∈ [0, ∞) ⊂ R such that ‖Tn x‖ ≤ Mx, ∀n ∈ N. By Proposition 7.107, ∃M ∈ [0, ∞) ⊂ R such that ‖Tn‖ ≤ M, ∀n ∈ N. ∀x ∈ X with ‖x‖ ≤ 1, by Proposition 7.64, ‖T x‖ = lim_{n∈N} ‖Tn x‖ ≤


lim sup_{n∈N} ‖Tn‖‖x‖ ≤ M < +∞. Hence, ‖T‖ ≤ M < +∞. Therefore, T ∈ B(X, Y). This completes the proof of the proposition. ∎
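Propositions 7.107 and 7.108 can be illustrated with matrices, where pointwise convergence and operator norms are easy to compute. The sketch below is our own example (the choice Tn := T + (1/n)I is merely convenient): the family is pointwise bounded at each sample vector, the operator norms are uniformly bounded, and the pointwise limit T is again a bounded operator.

```python
import numpy as np

# T_n := T + (1/n) I converges pointwise (indeed in norm) to T on R^3.
T = np.array([[0.0, 2.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 3.0]])
Ts = [T + np.eye(3) / n for n in range(1, 101)]

# Pointwise boundedness at a sample x, as assumed in Proposition 7.107:
x = np.array([1.0, -1.0, 2.0])
Mx = max(np.linalg.norm(Tn @ x) for Tn in Ts)
assert all(np.linalg.norm(Tn @ x) <= Mx for Tn in Ts)

# Conclusion: a uniform bound M on the operator norms, and the pointwise limit
# T is itself bounded, with ||T|| <= M here (Proposition 7.108).
M = max(np.linalg.norm(Tn, 2) for Tn in Ts)
assert np.linalg.norm(T, 2) <= M + 1e-12
```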

7.11 The Adjoints of Linear Operators

Proposition 7.109 Let X and Y be normed linear spaces over the field K and A ∈ B(X, Y). The adjoint operator of A is A′ : Y∗ → X∗ defined by

⟨A′(y∗), x⟩ = ⟨y∗, Ax⟩;  ∀x ∈ X, ∀y∗ ∈ Y∗

Then, A′ ∈ B(Y∗, X∗) with ‖A′‖ = ‖A‖.

Proof First, we will show that A′ is well-defined. ∀y∗ ∈ Y∗, ∀x ∈ X, ⟨y∗, Ax⟩ ∈ K. Hence, f : X → K defined by f(x) = ⟨y∗, Ax⟩, ∀x ∈ X, is a functional on X. By the linearity of A and y∗, f is a linear functional. ∀x ∈ X, we have, by Proposition 7.64,

|f(x)| ≤ ‖y∗‖‖Ax‖ ≤ ‖y∗‖‖A‖‖x‖

Hence, f is a bounded linear functional with ‖f‖ ≤ ‖A‖‖y∗‖. The above shows that A′(y∗) = f ∈ X∗. Hence, A′ is well-defined. It is straightforward to show that A′ is a linear operator. By the fact that ‖A′y∗‖ = ‖f‖ ≤ ‖A‖‖y∗‖, ∀y∗ ∈ Y∗, we have A′ ∈ B(Y∗, X∗) and ‖A′‖ ≤ ‖A‖. On the other hand, ∀x ∈ X, we have either Ax = ϑY, then ‖Ax‖ = 0 ≤ ‖A′‖‖x‖; or Ax ≠ ϑY, then, by Proposition 7.85, ∃y∗ ∈ Y∗ with ‖y∗‖ = 1 such that ‖Ax‖ = ⟨y∗, Ax⟩, which implies that ‖Ax‖ = ⟨A′y∗, x⟩ ≤ ‖A′y∗‖‖x‖ ≤ ‖A′‖‖y∗‖‖x‖ = ‖A′‖‖x‖. Hence, we must have ‖Ax‖ ≤ ‖A′‖‖x‖. This implies that ‖A‖ ≤ ‖A′‖. Then, ‖A‖ = ‖A′‖. This completes the proof of the proposition. ∎

Proposition 7.110 Let X, Y, and Z be normed linear spaces over the field K. Then, the following statements hold.

(i) id′X = idX∗.
(ii) If A1, A2 ∈ B(X, Y), then (A1 + A2)′ = A′1 + A′2.
(iii) If A ∈ B(X, Y) and α ∈ K, then (αA)′ = αA′.
(iv) If A1 ∈ B(X, Y) and A2 ∈ B(Y, Z), then (A2 A1)′ = A′1 A′2.
(v) If A ∈ B(X, Y) and A has a bounded inverse, then (A⁻¹)′ = (A′)⁻¹.

Proof (i) ∀x∗ ∈ X∗, ∀x ∈ X, ⟨id′X(x∗), x⟩ = ⟨x∗, idX(x)⟩ = ⟨x∗, x⟩ = ⟨idX∗(x∗), x⟩. Hence, the result follows. (ii) ∀y∗ ∈ Y∗, ∀x ∈ X, we have ⟨(A1 + A2)′y∗, x⟩ = ⟨y∗, (A1 + A2)x⟩ = ⟨y∗, A1x⟩ + ⟨y∗, A2x⟩ = ⟨A′1y∗, x⟩ + ⟨A′2y∗, x⟩ = ⟨(A′1 + A′2)y∗, x⟩. Hence, the result follows.


(iii) ∀y∗ ∈ Y∗, ∀x ∈ X, we have ⟨(αA)′y∗, x⟩ = ⟨y∗, (αA)x⟩ = α⟨y∗, Ax⟩ = α⟨A′y∗, x⟩ = ⟨(αA′)y∗, x⟩. Hence, the result follows. (iv) ∀z∗ ∈ Z∗, ∀x ∈ X, we have ⟨(A2 A1)′z∗, x⟩ = ⟨z∗, (A2 A1)x⟩ = ⟨z∗, A2(A1x)⟩ = ⟨A′2 z∗, A1x⟩ = ⟨A′1(A′2 z∗), x⟩ = ⟨(A′1 A′2)z∗, x⟩. Hence, the result follows. (v) By (i) and (iv), we have (A⁻¹)′A′ = (AA⁻¹)′ = id′Y = idY∗ and A′(A⁻¹)′ = (A⁻¹A)′ = id′X = idX∗. By Proposition 2.4, we have (A⁻¹)′ = (A′)⁻¹. This completes the proof of the proposition. ∎

Proposition 7.111 Let X and Y be normed linear spaces over the field K, A ∈ B(X, Y), φX : X → X∗∗ and φY : Y → Y∗∗ be the natural mappings on X and Y, respectively, and A′′ : X∗∗ → Y∗∗ be the adjoint of the adjoint of A. Then, we have A′′ ◦ φX = φY ◦ A.

Proof ∀x ∈ X, ∀y∗ ∈ Y∗, we have

⟨A′′(φX(x)), y∗⟩ = ⟨φX(x), A′y∗⟩ = ⟨A′y∗, x⟩ = ⟨y∗, Ax⟩ = ⟨φY(Ax), y∗⟩

Hence, the desired result follows. This completes the proof of the proposition. ∎

By Proposition 7.111 and Remark 7.88, ∀A, B ∈ B(X, Y), we have A′ = B′ if, and only if, A = B.

Proposition 7.112 Let X and Y be normed linear spaces over the field K and A ∈ B(X, Y). Then, the following statements hold.

(i) (R(A))⊥ = N(A′).
(ii) ⊥(R(A′)) = N(A).
(iii) If R(A) is closed, then R(A) = ⊥(N(A′)).

Proof (i) Fix a vector y∗ ∈ N(A′). ∀y ∈ R(A), ∃x ∈ X such that y = Ax. Then, ⟨y∗, y⟩ = ⟨y∗, Ax⟩ = ⟨A′y∗, x⟩ = ⟨ϑX∗, x⟩ = 0. Hence, y∗ ∈ (R(A))⊥. This shows that N(A′) ⊆ (R(A))⊥. On the other hand, fix a vector y∗ ∈ (R(A))⊥. ∀x ∈ X, we have ⟨A′y∗, x⟩ = ⟨y∗, Ax⟩ = 0. Hence, we have A′y∗ = ϑX∗. This shows that (R(A))⊥ ⊆ N(A′). Hence, we have (R(A))⊥ = N(A′). (ii) Fix a vector x ∈ N(A). ∀x∗ ∈ R(A′), ∃y∗ ∈ Y∗ such that x∗ = A′y∗. Then, ⟨x∗, x⟩ = ⟨y∗, Ax⟩ = ⟨y∗, ϑY⟩ = 0. Hence, x ∈ ⊥(R(A′)). This shows that N(A) ⊆ ⊥(R(A′)). On the other hand, fix a vector x ∈ ⊥(R(A′)). ∀y∗ ∈ Y∗, we have 0 = ⟨A′y∗, x⟩ = ⟨y∗, Ax⟩. Hence, by Proposition 7.85, we have Ax = ϑY and x ∈ N(A). This shows that ⊥(R(A′)) ⊆ N(A). Hence, we have ⊥(R(A′)) = N(A).


  (iii) Fix y ∈ R(A). Then ∃x ∈ X such that y = Ax. ∀y∗ ∈ N A , we have    BB  CC y∗ , y = A y∗ , x = ϑX∗ , x = 0. Hence, we have y ∈ ⊥ N A and    R(A) ⊆ ⊥ N A .    On the other hand, fix y ∈ ⊥ N A . By Proposition 7.97, δ := infk∈R(A) y − k = maxy∗ ∈(R(A))⊥ , y∗ ≤1 Re (y∗ , y), where the maximum ⊥ is achieved at some  y∗0 ∈ (R(A)) with y∗0  ≤ 1. By (i), we have ⊥  Proposition 4.10, (R(A)) = N A y∗0 . Then, δ = Re (y∗0 , y)= 0. By  y ∈ R(A) since R(A) ⊆ Y is closed. This shows that ⊥ N A ⊆ R(A).    Hence, we have R(A) = ⊥ N A . This completes the proof of the proposition. ' & The dual version of (ii) in the above proposition is deeper, which requires both Open Mapping Theorem and Hahn–Banach Theorem. Toward this end, we need the following result. Proposition 7.113 Let X and Y be Banach spaces over the field K and A ∈ B(X, Y). Assume that R(A) ⊆ Y is closed. Then, there exists K ∈ [0, ∞) ⊂ R such that, ∀y ∈ R(A), there exists x ∈ X such that y = Ax and x ≤ Ky. Proof Since A is continuous, then N (A) ⊆ X is closed. By Proposition 7.45, X/N (A) is a Banach space. Let φ : X → X/N (A) be the natural homomorphism, which is a bounded linear function by Proposition 7.69. By Proposition 7.70, there exists a bounded linear function AD : X/N (A) → R(A) such that A = AD ◦ φ, A = AD  and AD is injective. By Proposition 4.39, R(A) is complete. Then, by Proposition 7.13, R(A) is a Banach space. The mapping AD is surjective to R(A). Hence, AD : X/N (A) → R(A) is a bijective bounded linear operator. By Open Mapping Theorem, A−1 Let [x] := D ∈ B(R(A), X/N (A)). ∀y 6∈ R(A), 6 6A−1 6y. We will [x] A−1 y ∈ X/N Then, by Proposition 7.64, ≤ (A). D D distinguish two exhaustive and mutually exclusive cases: Case 1: [x] = 0; Case 2: [x] > 0. Case 1: [x] = 0. Then, [x] =6 [ϑX6]. Take x = ϑX , we have 6 y = AD [x] = Ax = ϑY and x = 0 = 26A−1 D y. Case 2: [x] > 0. that x ≤ 2[x]. Then, Note that [x] = infx∈[x] x. Then, ∃x ∈6 [x] such 6 −1 6 6 y = AD [x] = Ax and x ≤ 2[x] ≤ 2 AD y. 
Hence, the desired result holds in both cases with K = 2‖A_D⁻¹‖. This completes the proof of the proposition. □

Proposition 7.114 Let X and Y be Banach spaces over the field K and A ∈ B(X, Y). Assume that R(A) is closed in Y. Then, R(A′) = (N(A))⊥.


  Proof Fix an x∗ ∈ R(A′); then ∃y∗ ∈ Y∗ such that x∗ = A′y∗. ∀x ∈ N(A), we have ⟨x∗, x⟩ = ⟨A′y∗, x⟩ = ⟨y∗, Ax⟩ = ⟨y∗, ϑ_Y⟩ = 0. Then, x∗ ∈ (N(A))⊥. This shows that R(A′) ⊆ (N(A))⊥.


On the other hand, fix an x∗ ∈ (N(A))⊥. Let K ∈ [0, ∞) ⊂ R be the constant described in Proposition 7.113. ∀y ∈ R(A) and ∀x ∈ X with Ax = y, ⟨x∗, x⟩ assumes a constant value that depends on y only. So, we may define a functional f : R(A) → K by f(y) = ⟨x∗, x⟩, ∀y ∈ R(A) and ∀x ∈ X with Ax = y. Clearly, f is a linear functional. By Proposition 7.113, ∃x0 ∈ X with Ax0 = y such that ‖x0‖ ≤ K‖y‖. Then, by Proposition 7.64, |f(y)| = |⟨x∗, x0⟩| ≤ ‖x∗‖‖x0‖ ≤ ‖x∗‖K‖y‖. Hence, ‖f‖_{R(A)} := sup_{y∈R(A), ‖y‖≤1} |f(y)| ≤ K‖x∗‖ < +∞. By the simple version of Hahn–Banach Theorem 7.83, ∃y∗ ∈ Y∗ such that y∗|_{R(A)} = f and ‖y∗‖ = ‖f‖_{R(A)}. ∀x ∈ X, we have

⟨A′y∗, x⟩ = ⟨y∗, Ax⟩ = f(Ax) = ⟨x∗, x⟩.

This implies that A′y∗ = x∗ ∈ R(A′). Hence, we have (N(A))⊥ ⊆ R(A′). Therefore, we have R(A′) = (N(A))⊥. This completes the proof of the proposition. □
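The identity R(A′) = (N(A))⊥ and the preimage bound of Proposition 7.113 can be checked concretely in finite dimensions, where the adjoint is the transpose and the closed-range hypothesis holds automatically. The sketch below (an illustration with an invented rank-one matrix, not from the text) verifies both for A : R³ → R²:

```python
import math

# A : R^3 -> R^2 with matrix [[1, 2, 0], [2, 4, 0]]; rank 1, so R(A) is closed.
def A(x):
    return (x[0] + 2*x[1], 2*x[0] + 4*x[1])

def A_adj(y):  # the adjoint A' is the transpose here
    return (y[0] + 2*y[1], 2*y[0] + 4*y[1], 0.0)

def dot(u, v):
    return sum(a*b for a, b in zip(u, v))

# N(A) = span{(2, -1, 0), (0, 0, 1)}; R(A') = {A'(y) : y in R^2}.
null_basis = [(2.0, -1.0, 0.0), (0.0, 0.0, 1.0)]
for y in [(1.0, 0.0), (0.0, 1.0), (3.0, -2.0)]:
    x_star = A_adj(y)  # an element of R(A')
    # x_star annihilates N(A), so R(A') lies in (N(A))^perp
    assert all(abs(dot(x_star, n)) < 1e-12 for n in null_basis)

# Proposition 7.113: y = (a, 2a) in R(A) has the preimage x = (a/5, 2a/5, 0)
# with ||x|| = ||y||/5, so the bound ||x|| <= K ||y|| holds with K = 1/5.
a = 7.0
y, x = (a, 2*a), (a/5, 2*a/5, 0.0)
assert all(abs(p - q) < 1e-9 for p, q in zip(A(x), y))
assert math.sqrt(dot(x, x)) <= 0.2*math.sqrt(dot(y, y)) + 1e-12
```

Here only the containment R(A′) ⊆ (N(A))⊥ is sampled; equality follows from the dimension count rank(A) + dim N(A) = 3.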

7.12 Weak Topology

Definition 7.115 Let X be a normed linear space over the field K. The weak topology on X, denoted by O_weak(X) ⊆ 2^X, is the weak topology generated by X∗, that is, the weakest topology on X such that x∗ : X → K is continuous, ∀x∗ ∈ X∗. %

For a normed linear space X, let O_X be the natural topology induced by the norm on X. Then, O_weak(X) ⊆ O_X. We usually call the topology O_X the strong topology. If a set S ⊆ X is weakly open, that is, S ∈ O_weak(X), then S is strongly open, that is, S ∈ O_X; and if S is weakly closed, then it is strongly closed. The converse implications do not hold in general. We will denote the topological space (X, O_weak(X)) by X_weak.

Proposition 7.116 Let X be a normed linear space over the field K. Then, the following statements hold.

(i) A basis for O_weak(X) consists of all sets of the form

{x ∈ X | ⟨x∗i, x⟩ ∈ Oi, i = 1, . . . , n}   (7.8)

where n ∈ N, Oi ∈ O_K, and x∗i ∈ X∗, i = 1, . . . , n, and O_K is the natural topology on K. A basis at x0 ∈ X for O_weak(X) consists of all sets of the form

{x ∈ X | |⟨x∗i, x − x0⟩| < ε, i = 1, . . . , n}   (7.9)

where ε ∈ (0, ∞) ⊂ R, n ∈ N, and x∗i ∈ X∗, i = 1, . . . , n.


(ii) X_weak is completely regular (T_3½).
(iii) For a sequence (xk)∞_{k=1} ⊆ X_weak, x0 ∈ X is the limit point of the sequence in the weak topology if, and only if, lim_{k∈N} ⟨x∗, xk⟩ = ⟨x∗, x0⟩, ∀x∗ ∈ X∗. In this case, we will write lim_{k∈N} xk = x0 weakly and say that (xk)∞_{k=1} converges weakly to x0.

Proof (i) By Definition 7.115, O_weak(X) is the topology generated by sets of the form (7.8). We will show that these sets form a basis for the topology by Proposition 3.18. Take B = {x ∈ X | ⟨ϑ∗, x⟩ ∈ K}, which is of the form (7.8), and B = X. ∀x ∈ X, we have x ∈ B. For any B1, B2 ⊆ X of the form (7.8), B1 ∩ B2 is clearly again of the form (7.8). Hence, Proposition 3.18 applies and the sets of the form (7.8) form a basis for O_weak(X).

Let B ⊆ X be any set of the form (7.9). Clearly, x0 ∈ B ∈ O_weak(X). ∀O ∈ O_weak(X) with x0 ∈ O, by Definition 3.17, there exists a basis open set B1 := {x ∈ X | ⟨x∗i, x⟩ ∈ Oi, i = 1, . . . , n}, for some n ∈ N and some x∗i ∈ X∗ and Oi ∈ O_K, i = 1, . . . , n, such that x0 ∈ B1 ⊆ O. For each i = 1, . . . , n, ci := ⟨x∗i, x0⟩ ∈ Oi ∈ O_K. Then, ∃εi ∈ (0, ∞) ⊂ R such that B_K(ci, εi) ⊆ Oi. Take ε = min_{1≤i≤n} εi ∈ (0, ∞) ⊂ R and B2 := {x ∈ X | |⟨x∗i, x − x0⟩| < ε, i = 1, . . . , n}. Clearly, B2 is of the form (7.9) and x0 ∈ B2 ⊆ B1 ⊆ O. Hence, sets of the form (7.9) form a basis at x0 for O_weak(X). Thus, (i) holds.

(ii) ∀x1, x2 ∈ X with x1 ≠ x2, we have x1 − x2 ≠ ϑ. By Proposition 7.85, ∃x∗ ∈ X∗ with ‖x∗‖ = 1 such that ⟨x∗, x1 − x2⟩ = ‖x1 − x2‖ > 0. Let O1 := {a ∈ K | Re(a) > Re(⟨x∗, x2⟩) + ‖x1 − x2‖/2} and O2 := {a ∈ K | Re(a) < Re(⟨x∗, x2⟩) + ‖x1 − x2‖/2}. Then, O1, O2 ∈ O_K and O1 ∩ O2 = ∅. Let B1 := {x ∈ X | ⟨x∗, x⟩ ∈ O1} and B2 := {x ∈ X | ⟨x∗, x⟩ ∈ O2}. Then, B1, B2 ∈ O_weak(X), x1 ∈ B1, x2 ∈ B2, and B1 ∩ B2 = ∅. This shows that X_weak is Hausdorff.

Next, we show that X_weak is completely regular. Fix any weakly closed set F ⊆ X_weak and x0 ∈ F̃ ∈ O_weak(X). Then, there exists a basis open set B = {x ∈ X | ⟨x∗i, x⟩ ∈ Oi, i = 1, . . . , n}, for some n ∈ N and some x∗i ∈ X∗ and Oi ∈ O_K, i = 1, . . . , n, such that x0 ∈ B ⊆ F̃. Let O := ∏ⁿ_{i=1} Oi ⊆ Kⁿ, which is open. Let p0 := (⟨x∗1, x0⟩, . . . , ⟨x∗n, x0⟩) ∈ Kⁿ. Then, p0 ∈ O. By Propositions 4.11 and 3.61, Kⁿ is normal and therefore completely regular. Then, there exists a continuous function f : Kⁿ → [0, 1] ⊂ R such that f|_Õ = 0 and f(p0) = 1. Define g : X_weak → [0, 1] ⊂ R by g(x) = f(⟨x∗1, x⟩, . . . , ⟨x∗n, x⟩), ∀x ∈ X_weak. By Propositions 3.12 and 3.32, g is a continuous real-valued function on X_weak, g(x0) = f(p0) = 1, and g|_F = 0. Hence, X_weak is completely regular. Thus, (ii) holds.

(iii) "Only if": ∀x∗ ∈ X∗, x∗ : X → K is weakly continuous. By Proposition 3.66, we have lim_{k∈N} ⟨x∗, xk⟩ = ⟨x∗, x0⟩.
"If": Let (xk)∞_{k=1} satisfy lim_{k∈N} ⟨x∗, xk⟩ = ⟨x∗, x0⟩, ∀x∗ ∈ X∗. For any basis open set B = {x ∈ X | ⟨x∗i, x⟩ ∈ Oi, i = 1, . . . , n}, for some n ∈ N and some x∗i ∈ X∗ and Oi ∈ O_K, i = 1, . . . , n, with x0 ∈ B, we have that, ∀i = 1, . . . , n, ∃Ni ∈ N such that ⟨x∗i, xk⟩ ∈ Oi, ∀k ≥ Ni. Take N = max_{1≤i≤n} Ni ∈


N. ∀k ≥ N, xk ∈ B. This shows that (xk)∞_{k=1} converges weakly to x0. Hence, (iii) holds. This completes the proof of the proposition. □

Proposition 7.117 Let X be a normed linear space over the field K and X_weak be the topological space of X endowed with the weak topology. Then, vector addition ⊕ : X_weak × X_weak → X_weak is continuous, and scalar multiplication ⊗ : K × X_weak → X_weak is continuous.

Proof Fix (x0, y0) ∈ X_weak × X_weak. We will show that ⊕ is continuous at (x0, y0). Fix a basis open set B ∈ O_weak(X) with x0 + y0 ∈ B. Then, by Proposition 7.116, B = {z ∈ X | |⟨x∗i, z − x0 − y0⟩| < ε, i = 1, . . . , n}, for some n ∈ N, some ε ∈ (0, ∞) ⊂ R, and some x∗i ∈ X∗, i = 1, . . . , n. Let B1 := {x ∈ X | |⟨x∗i, x − x0⟩| < ε/2, i = 1, . . . , n} and B2 := {y ∈ X | |⟨x∗i, y − y0⟩| < ε/2, i = 1, . . . , n}. Clearly, B1, B2 ∈ O_weak(X), B1 × B2 ∈ O_{X_weak×X_weak}, and (x0, y0) ∈ B1 × B2. ∀(x, y) ∈ B1 × B2, ∀i = 1, . . . , n, |⟨x∗i, x + y − x0 − y0⟩| ≤ |⟨x∗i, x − x0⟩| + |⟨x∗i, y − y0⟩| < ε. Hence, x + y ∈ B. This shows that ⊕ is continuous at (x0, y0). By the arbitrariness of (x0, y0) and Proposition 3.9, ⊕ is continuous.

Fix (α0, x0) ∈ K × X_weak. We will show that ⊗ is continuous at (α0, x0). Fix a basis open set B ∈ O_weak(X) with α0x0 ∈ B. Then, by Proposition 7.116, B = {z ∈ X | |⟨x∗i, z − α0x0⟩| < ε, i = 1, . . . , n}, for some n ∈ N, some ε ∈ (0, ∞) ⊂ R, and some x∗i ∈ X∗, i = 1, . . . , n. Let M := max_{1≤i≤n} ‖x∗i‖ ∈ [0, ∞) ⊂ R. Let B1 := {α ∈ K | |α − α0| < ε/(ε + 2M‖x0‖)} and B2 := {x ∈ X | |⟨x∗i, x − x0⟩| < ε/(2(1 + |α0|)), i = 1, . . . , n}. Clearly, B1 ∈ O_K, B2 ∈ O_weak(X), B1 × B2 ∈ O_{K×X_weak}, and (α0, x0) ∈ B1 × B2. ∀(α, x) ∈ B1 × B2, ∀i = 1, . . . , n, |⟨x∗i, αx − α0x0⟩| ≤ |⟨x∗i, αx − αx0⟩| + |⟨x∗i, αx0 − α0x0⟩| = |α||⟨x∗i, x − x0⟩| + |α − α0||⟨x∗i, x0⟩| ≤ (|α0| + |α − α0|)|⟨x∗i, x − x0⟩| + |α − α0|‖x∗i‖‖x0‖ ≤ (|α0| + 1)|⟨x∗i, x − x0⟩| + |α − α0|M‖x0‖ < ε. Hence, αx ∈ B. This shows that ⊗ is continuous at (α0, x0). By the arbitrariness of (α0, x0) and Proposition 3.9, ⊗ is continuous. This completes the proof of the proposition. □

Proposition 7.118 Let X be a finite-dimensional normed linear space over the field K, O_X be the strong topology on X, and O_weak(X) be the weak topology on X. Then, O_X = O_weak(X).

Proof Clearly, O_weak(X) ⊆ O_X. Fix any basis open set O = B_X(x0, r) ∈ O_X with x0 ∈ X and r ∈ (0, ∞) ⊂ R. We will show that O ∈ O_weak(X). Let n ∈ Z+ be the dimension of X. We will distinguish two exhaustive and mutually exclusive cases: Case 1: n = 0; Case 2: n ∈ N. Case 1: n = 0. Then, X is a singleton set and O = X ∈ O_weak(X). This case is proved. Case 2: n ∈ N. Let {e1, . . . , en} ⊆ X be a basis of X with ‖ei‖ = 1, i = 1, . . . , n. ∀i = 1, . . . , n, let fi : X → K be defined by fi(x) = αi, ∀x = Σⁿ_{j=1} αj ej ∈ X. Clearly, fi is well-defined and is a linear functional. By Proposition 7.67, fi is continuous. Denote fi by e∗i ∈ X∗. ∀x1 ∈ O, let δ = (r − ‖x1 − x0‖)/n ∈ (0, ∞) ⊂


R. Let B := {x ∈ X | |⟨e∗i, x − x1⟩| < δ, i = 1, . . . , n}. Clearly, B ∈ O_weak(X) and x1 ∈ B. ∀x ∈ B, we have ‖x − x1‖ = ‖Σⁿ_{i=1} ⟨e∗i, x − x1⟩ ei‖ ≤ Σⁿ_{i=1} |⟨e∗i, x − x1⟩| < nδ = r − ‖x1 − x0‖, and hence ‖x − x0‖ < r and x ∈ O. Then, x1 ∈ B ⊆ O, so O is a union of weakly open sets and O ∈ O_weak(X). Hence, O_X ⊆ O_weak(X) and O_X = O_weak(X). This case is proved. This completes the proof of the proposition. □

‖yn − zn‖ > 0, ∀n ∈ N. Hence, xn := (yn − zn)/‖yn − zn‖ ∈ X is well-defined, ∀n ∈ N. Clearly, ‖xn‖ = 1 and xn ∈ K, ∀n ∈ N. By K being compact and the Borel–Lebesgue Theorem 5.37, ∃x0 ∈ K and a subsequence (x_{n_i})∞_{i=1} of (xn)∞_{n=1} such that lim_{i∈N} x_{n_i} = x0. Since x_{n_i} ∈ span(E), ∀i ∈ N, then x0 ∈ cl(span(E)). This implies that x0 = Σ∞_{j=1} aj xj, where aj ∈ K, ∀j ∈ N. Note that x0 = lim_{i∈N} x_{n_i} = lim_{i∈N} Σ^{n_i−1}_{j=1} aj xj, and thus lim_{i∈N} (x_{n_i} − Σ^{n_i−1}_{j=1} aj xj) = ϑ_X. But, ∀i ∈ N,

‖x_{n_i} − Σ^{n_i−1}_{j=1} aj xj‖ = ‖ y_{n_i}/‖y_{n_i} − z_{n_i}‖ − z_{n_i}/‖y_{n_i} − z_{n_i}‖ − Σ^{n_i−1}_{j=1} aj xj ‖ = (1/‖y_{n_i} − z_{n_i}‖) ‖ y_{n_i} − (z_{n_i} + ‖y_{n_i} − z_{n_i}‖ Σ^{n_i−1}_{j=1} aj xj) ‖.

The vector z_{n_i} + ‖y_{n_i} − z_{n_i}‖ Σ^{n_i−1}_{j=1} aj xj belongs to span{x1, . . . , x_{n_i−1}}. By the definition of z_{n_i}, we then have ‖x_{n_i} − Σ^{n_i−1}_{j=1} aj xj‖ ≥ (1/‖y_{n_i} − z_{n_i}‖) ‖y_{n_i} − z_{n_i}‖ = 1. This contradicts lim_{i∈N} (x_{n_i} − Σ^{n_i−1}_{j=1} aj xj) = ϑ_X. Hence, the hypothesis does not hold, and X must be finite-dimensional. This completes the proof of the proposition. □

Proposition 7.126 Let X := (X, O) be a separable topological space, Y be a Banach space, and f : X → Y be continuous. Then, W := span(f(X)) ⊆ Y is a separable normed linear subspace of Y, and W̄ ⊆ Y is a separable Banach subspace of Y.

Proof W is a subspace of Y and hence a normed linear subspace of Y. W̄ is a closed subspace of Y and hence, by Proposition 4.39, a complete normed linear subspace of Y. Then, W̄ is a Banach subspace of Y. By Proposition 3.90, f(X) is separable. Then, by Proposition 7.35, W and W̄ are separable subsets of Y. This completes the proof of the proposition. □

Proposition 7.127 Let Y := (Y, ρ) be a metric space and Si ⊆ Y be a separable subset, ∀i ∈ N, where N is a nonempty countable index set. Then, U := ⋂_{i∈N} Si and V := ⋃_{i∈N} Si are separable subsets of Y.

Proof Fix any i1 ∈ N. Since U ⊆ S_{i1} and S_{i1} is separable, by Proposition 4.38, U is separable. Let Di ⊆ Si be a countable dense subset of Si, ∀i ∈ N.
Then, D := ⋃_{i∈N} Di ⊆ V, and D is countable, being a countable union of countable sets. ∀y0 ∈ V, y0 ∈ S_{i0} for some i0 ∈ N. By Proposition 4.13, ∃(yj)∞_{j=1} ⊆ D_{i0} ⊆ D such that y0 = lim_{j∈N} yj. Then, again by Proposition 4.13, y0 ∈ D̄. By the arbitrariness of y0, we have V ⊆ D̄. Then, D is dense in V, which implies that V is separable. This completes the proof of the proposition. □

Proposition 7.128 Let X := (X, O) be a separable topological space, Y be a Banach space, and fn : X → Y be continuous, ∀n ∈ N. Then, Wn := span(fn(X)) ⊆ Y is a separable normed linear subspace of Y, and W := cl(span(⋃_{n∈N} Wn)) ⊆ Y is a separable Banach subspace of Y.

Proof This follows directly from Propositions 7.126, 7.127, 7.35, and 4.39. □

Here, we summarize the different modes of convergence in a Banach space. Let X be a Banach space over K, Y := X∗, and Z := Y∗ = X∗∗. A sequence (yn)∞_{n=1} ⊆ Y can converge to some y0 ∈ Y in norm, or weakly, or weak∗; the implications among these modes are collected in Fig. 7.2.

Fig. 7.2 Modes of convergence in Y = X∗: lim_{n∈N} yn = y0 in norm ⇒ lim_{n∈N} yn = y0 weakly ⇒ lim_{n∈N} yn = y0 weak∗, where the last implication is reversible when X is reflexive.
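The gap between the modes in Fig. 7.2 is genuine. The following standard example (a supplementary sketch, not taken from the text) exhibits a sequence that converges weakly but not in norm:

```latex
% In $X = \ell^2$ (so $X^* \cong \ell^2$), consider the unit vectors $e_n$.
% For any $x_* \in X^*$ we have $\sum_n |\langle x_*, e_n \rangle|^2 < \infty$, hence
\[
  \lim_{n \in \mathbb{N}} \langle x_*, e_n \rangle = 0,
  \qquad \forall x_* \in X^*,
\]
% so $\lim_{n \in \mathbb{N}} e_n = \vartheta$ weakly (Proposition 7.116 (iii)); yet
\[
  \| e_n \| = 1 \quad \text{and} \quad \| e_n - e_m \| = \sqrt{2} \ \ (n \neq m),
\]
% so $(e_n)_{n=1}^{\infty}$ converges to no point in norm.
```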

Chapter 8

Global Theory of Optimization

In this chapter, we develop a number of tools for optimization in real normed linear spaces, including the geometric form of the Hahn–Banach Theorem, minimum norm duality for convex sets, the Fenchel Duality Theorem, and Lagrange multiplier theory for convex programming. Throughout, we restrict our attention to real spaces rather than complex ones.

8.1 Hyperplanes and Convex Sets

Definition 8.1 Let X be a real vector space. A hyperplane H is a maximal proper linear variety in X; that is, H ⊂ X is a linear variety such that, if V ⊇ H is a linear variety, then either V = H or V = X. %

Proposition 8.2 Let X be a real vector space. H ⊆ X is a hyperplane if, and only if, there exist a linear functional f : X → R and c ∈ R, with f not identically equal to zero, such that H = {x ∈ X | f(x) = c}.

Proof "Necessity" Let H be a hyperplane. Then, H is a linear variety. There exist a subspace M and x0 ∈ X such that H = x0 + M. We will distinguish two exhaustive and mutually exclusive cases: Case 1: x0 ∉ M; Case 2: x0 ∈ M. Case 1: x0 ∉ M. Let M̄ := span(M ∪ {x0}). Clearly, M̄ ⊃ H and M̄ is a linear variety. Then, M̄ = X by Definition 8.1. ∀x ∈ X, x can be uniquely written as αx0 + m, where α ∈ R and m ∈ M. Define f : X → R by f(x) = f(αx0 + m) = α, ∀x ∈ X. Clearly, f is a linear functional and is not identically equal to zero. It is straightforward to verify that H = {x ∈ X | f(x) = 1}. Case 2: x0 ∈ M. Then, H = M. By Definition 8.1, ∃x1 ∈ X \ H. Let M̄ := span(M ∪ {x1}). Clearly, M̄ ⊃ H and M̄ is a linear variety. Then, M̄ = X by Definition 8.1. ∀x ∈ X, x can be uniquely written as αx1 + m, where α ∈ R and m ∈ M. Define f : X → R by f(x) = f(αx1 + m) = α,

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Pan, Measure-Theoretic Calculus in Abstract Spaces, https://doi.org/10.1007/978-3-031-21912-2_8



∀x ∈ X. Clearly, f is a linear functional and is not identically equal to zero. It is straightforward to verify that H = {x ∈ X | f(x) = 0}.
"Sufficiency" Let H = {x ∈ X | f(x) = c}, where f : X → R is a linear functional, c ∈ R, and f is not identically equal to zero. Let M := N(f). Clearly, M is a proper subspace of X. Since f is not identically equal to zero, ∃x0 ∈ X \ M such that f(x0) = 1. ∀x ∈ X, f(x − f(x)x0) = 0. Then, x − f(x)x0 ∈ M and x ∈ span(M ∪ {x0}). Hence, X = span(M ∪ {x0}) and M is a maximal proper subspace. Then, H = cx0 + M is a hyperplane. This completes the proof of the proposition. □

Consider a hyperplane H in a real normed linear space X. By Proposition 7.17, H̄ is a linear variety. By Definition 8.1, H̄ = X or H̄ = H. Thus, a hyperplane in a real normed linear space must be either dense or closed.

Proposition 8.3 Let X be a real normed linear space and H ⊆ X. H is a closed hyperplane if, and only if, there exist x∗ ∈ X∗ and c ∈ R with x∗ ≠ ϑ∗ such that H = {x ∈ X | ⟨x∗, x⟩ = c}.

Proof "Necessity" Let H be a closed hyperplane. By Proposition 8.2, there exist a linear functional f : X → R and c ∈ R, with f not identically equal to zero, such that H = {x ∈ X | f(x) = c}. All we need to show is that f ∈ X∗. Since H is a linear variety, H ≠ ∅. Fix x0 ∈ H. It is easy to show that M := H − x0 = N(f). By Proposition 7.16, M is closed. By Proposition 7.72, f ∈ X∗.
"Sufficiency" By Proposition 8.2, H is a hyperplane. By the continuity of x∗ and Proposition 3.10, H is closed. Hence, H is a closed hyperplane. This completes the proof of the proposition. □

Let X be a real normed linear space and H ⊂ X a closed hyperplane. Then, H = {x ∈ X | ⟨x∗, x⟩ = c}, where ϑ∗ ≠ x∗ ∈ X∗ and c ∈ R. We associate four sets with H: (a) {x ∈ X | ⟨x∗, x⟩ ≤ c}; (b) {x ∈ X | ⟨x∗, x⟩ < c}; (c) {x ∈ X | ⟨x∗, x⟩ ≥ c}; (d) {x ∈ X | ⟨x∗, x⟩ > c}, which are called half-spaces. The first two are negative half-spaces; the last two are positive half-spaces. The first and third are closed; the second and fourth are open.

Definition 8.4 Let X be a real normed linear space and K ⊆ X be convex with ϑ ∈ K°. The Minkowski functional p : X → R of K is defined by p(x) = inf{r ∈ R | r⁻¹x ∈ K, r > 0}, ∀x ∈ X. %

Proposition 8.5 Let X be a real normed linear space and K ⊆ X be convex with ϑ ∈ K°. Then, the Minkowski functional p : X → R of K satisfies:

(i) 0 ≤ p(x) < +∞, ∀x ∈ X.
(ii) p(αx) = αp(x), ∀x ∈ X, ∀α ∈ [0, ∞) ⊂ R.
(iii) p(x1 + x2) ≤ p(x1) + p(x2), ∀x1, x2 ∈ X.
(iv) p is uniformly continuous.
(v) K̄ = {x ∈ X | p(x) ≤ 1}; K° = {x ∈ X | p(x) < 1}.

Furthermore, (ii) and (iii) imply that p is a sublinear functional.


Proof (i) Since ϑ ∈ K°, ∃ε0 ∈ (0, ∞) ⊂ R such that B(ϑ, ε0) ⊆ K. ∀x ∈ X, we have either x = ϑ, in which case p(ϑ) = 0, or x ≠ ϑ, in which case 0 ≤ p(x) ≤ ‖x‖/ε0 < +∞.

(ii) ∀x ∈ X, ∀α ∈ [0, ∞) ⊂ R, we will distinguish three exhaustive and mutually exclusive cases: Case 1: x = ϑ; Case 2: x ≠ ϑ and α = 0; Case 3: x ≠ ϑ and α > 0. Case 1: x = ϑ. We have p(αx) = p(ϑ) = 0 = αp(x). Case 2: x ≠ ϑ and α = 0. Then, p(αx) = p(ϑ) = 0 = αp(x). Case 3: x ≠ ϑ and α > 0. ∀r ∈ {r ∈ R | r⁻¹x ∈ K, r > 0}, we have αr > 0 and αr ∈ {r ∈ R | r⁻¹(αx) ∈ K, r > 0}. Hence, αp(x) ≥ p(αx). On the other hand, ∀r ∈ {r ∈ R | r⁻¹(αx) ∈ K, r > 0}, we have r/α ∈ {r ∈ R | r⁻¹x ∈ K, r > 0}. Hence, α⁻¹p(αx) ≥ p(x). Therefore, we have αp(x) = p(αx).

(iii) ∀x1, x2 ∈ X, ∀r1 ∈ {r ∈ R | r⁻¹x1 ∈ K, r > 0}, ∀r2 ∈ {r ∈ R | r⁻¹x2 ∈ K, r > 0}, we have r1⁻¹x1, r2⁻¹x2 ∈ K. By the convexity of K, we have

(r1 + r2)⁻¹(x1 + x2) = (r1/(r1 + r2)) r1⁻¹x1 + (r2/(r1 + r2)) r2⁻¹x2 ∈ K.

Then, r1 + r2 ∈ {r ∈ R | r⁻¹(x1 + x2) ∈ K, r > 0}. Hence, we have r1 + r2 ≥ p(x1 + x2) and p(x1) + p(x2) ≥ p(x1 + x2).

(iv) ∀x ∈ X, 0 ≤ p(x) ≤ ‖x‖/ε0. ∀ε ∈ (0, ∞) ⊂ R, let δ = ε0ε ∈ (0, ∞) ⊂ R. ∀x1, x2 ∈ X with ‖x1 − x2‖ < δ, we have p(x1) ≤ p(x2) + p(x1 − x2) ≤ p(x2) + ‖x1 − x2‖/ε0 < p(x2) + ε and p(x2) ≤ p(x1) + p(x2 − x1) ≤ p(x1) + ‖x2 − x1‖/ε0 < p(x1) + ε. Hence, |p(x1) − p(x2)| < ε. This shows that p is uniformly continuous.

(v) ∀x ∈ K°, we have either x = ϑ, in which case p(x) = 0 < 1, or x ≠ ϑ, in which case ∃ε ∈ (0, ∞) ⊂ R such that B(x, ε‖x‖) ⊆ K, which implies that p(x) ≤ 1/(1 + ε) < 1. Therefore, K° ⊆ {x ∈ X | p(x) < 1}. ∀x ∈ X with p(x) < 1, by the continuity of p, ∃δ ∈ (0, ∞) ⊂ R such that p(y) < 1, ∀y ∈ B(x, δ). ∀y ∈ B(x, δ), y ∈ K by the convexity of K, p(y) < 1, and the fact that ϑ ∈ K. Then, B(x, δ) ⊆ K and x ∈ K°. Therefore, {x ∈ X | p(x) < 1} ⊆ K°. Thus, K° = {x ∈ X | p(x) < 1}.

∀x ∈ K, we have p(x) ≤ 1. Then, K ⊆ {x ∈ X | p(x) ≤ 1}. By the continuity of p and Proposition 3.10, {x ∈ X | p(x) ≤ 1} is closed. Then, K̄ ⊆ {x ∈ X | p(x) ≤ 1}. ∀x ∈ X with p(x) ≤ 1, ∀ρ ∈ (0, 1) ⊂ R, p(ρx) = ρp(x) < 1. Then, ρx ∈ K°. Then, by Proposition 4.13, x ∈ cl(K°) ⊆ K̄. Hence, {x ∈ X | p(x) ≤ 1} ⊆ K̄. Therefore, K̄ = {x ∈ X | p(x) ≤ 1}. This completes the proof of the proposition. □

Proposition 8.6 Let X be a real normed linear space and K ⊆ X be convex with K° ≠ ∅. Then, K̄ = cl(K°).

Proof Clearly, K° ⊆ K. Then, cl(K°) ⊆ K̄. To show that K̄ ⊆ cl(K°), we will distinguish two exhaustive and mutually exclusive cases: Case 1: ϑ ∈ K°; Case 2: ϑ ∉ K°. Case 1: ϑ ∈ K°. Let p : X → [0, ∞) ⊂ R be the Minkowski functional of K. By Proposition 8.5, p is a continuous sublinear functional, K° = {x ∈ X | p(x) < 1}, and K̄ = {x ∈ X | p(x) ≤ 1}. ∀x ∈ K̄,

∀ε ∈ (0, ∞) ⊂ R, we have p(x) ≤ 1. Let y := (‖x‖/(ε + ‖x‖))x. Then, p(y) = (‖x‖/(ε + ‖x‖))p(x) < 1. Then, y ∈ K°. Note that ‖x − y‖ = (ε/(ε + ‖x‖))‖x‖ < ε. Hence, x ∈ cl(K°). Therefore, K̄ ⊆ cl(K°). This case is proved. Case 2: ϑ ∉ K°. Let x0 ∈ K° ≠ ∅. Let K1 = K − x0. By Propositions 7.16 and 6.39, K1° = K° − x0 ∋ ϑ and K1 is convex. By Case 1, K̄1 ⊆ cl(K1°). By Proposition 7.16, K̄ = cl(x0 + K1) = x0 + K̄1 ⊆ x0 + cl(K1°) = cl(x0 + K1°) = cl((x0 + K1)°) = cl(K°). This case is proved. Hence, K̄ = cl(K°). This completes the proof of the proposition. □
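The Minkowski functional of Definition 8.4 can be computed directly from its defining infimum. The sketch below (a hypothetical example, not from the text) takes K to be the closed ℓ¹ unit ball in R², for which p recovers the ℓ¹ norm, and approximates the infimum by bisection on r:

```python
# K = closed l^1 unit ball in R^2; its Minkowski functional is p(x) = |x1| + |x2|.
def in_K(x):
    return abs(x[0]) + abs(x[1]) <= 1.0 + 1e-12

def minkowski(x):
    """p(x) = inf{r > 0 : x/r in K}, approximated by bisection on r."""
    if x == (0.0, 0.0):
        return 0.0
    lo, hi = 0.0, 1e6  # membership of x/r in K is monotone in r, so bisection applies
    for _ in range(200):
        mid = (lo + hi) / 2
        if in_K((x[0]/mid, x[1]/mid)):
            hi = mid
        else:
            lo = mid
    return hi

for x in [(1.0, 0.0), (0.5, -0.5), (3.0, 4.0)]:
    assert abs(minkowski(x) - (abs(x[0]) + abs(x[1]))) < 1e-6
# subadditivity, Proposition 8.5 (iii), on a sample pair: (3.5, 3.5) = (0.5, -0.5) + (3.0, 4.0)
assert minkowski((3.5, 3.5)) <= minkowski((0.5, -0.5)) + minkowski((3.0, 4.0)) + 1e-6
```

The monotonicity used by the bisection is exactly the convexity of K together with ϑ ∈ K°: if x/r ∈ K then x/r′ ∈ K for every r′ ≥ r.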

8.2 Geometric Form of Hahn–Banach Theorem

The next theorem is the geometric form of the Hahn–Banach Theorem.

Theorem 8.7 (Mazur's Theorem) Let X be a real normed linear space, K ⊆ X be a convex set with nonempty interior, and V ⊆ X be a linear variety with V ∩ K° = ∅. Then, there exists a closed hyperplane H containing V but no interior point of K, and K is contained in one of the closed half-spaces associated with H; that is, ∃c ∈ R and ∃x∗ ∈ X∗ with x∗ ≠ ϑ∗ such that ⟨x∗, v⟩ = c, ∀v ∈ V, ⟨x∗, k⟩ < c, ∀k ∈ K°, and ⟨x∗, k⟩ ≤ c, ∀k ∈ K.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: ϑ ∈ K°; Case 2: ϑ ∉ K°. Case 1: ϑ ∈ K°. Let p : X → [0, ∞) ⊂ R be the Minkowski functional of K. By Proposition 8.5, p is a continuous sublinear functional. Let M := span(V). Since V is a linear variety and ϑ ∉ V, ∃x1 ∈ V such that V − x1 =: M̄ is a subspace in M and x1 ∉ M̄. ∀m ∈ M, ∃! α ∈ R and ∃! m̄ ∈ M̄ such that m = αx1 + m̄. Then, we may define a functional f : M → R by f(m) = f(αx1 + m̄) = α, ∀m ∈ M. Clearly, f is a linear functional on M and f(v) = 1, ∀v ∈ V. ∀v ∈ V, we have v ∉ K° and p(v) ≥ 1 = f(v). ∀m = αx1 + m̄ ∈ M, we have either α > 0, in which case f(m) = αf(x1 + α⁻¹m̄) = α ≤ αp(x1 + α⁻¹m̄) = p(m), where the inequality follows from the fact that x1 + α⁻¹m̄ ∈ V; or α ≤ 0, in which case f(m) = α ≤ 0 ≤ p(m). Thus, ∀m ∈ M, we have f(m) ≤ p(m). By the extension form of Hahn–Banach Theorem, there exists x∗ ∈ X∗ such that x∗|_M = f and ⟨x∗, x⟩ ≤ p(x), ∀x ∈ X. Clearly, x∗ ≠ ϑ∗. ∀v ∈ V, ⟨x∗, v⟩ = f(v) = 1 =: c ∈ R. ∀k ∈ K, we have ⟨x∗, k⟩ ≤ p(k) ≤ 1. ∀k ∈ K°, we have ⟨x∗, k⟩ ≤ p(k) < 1. Hence, the closed hyperplane H := {x ∈ X | ⟨x∗, x⟩ = c} is the one that we seek.
Case 2: ϑ ∉ K°. Let x0 ∈ K°. Then, by Proposition 6.39, K1 := K − x0 is a convex set, and V1 := V − x0 is a linear variety. By Proposition 7.16, K1° = K° − x0. Then, ϑ ∈ K1° and V1 ∩ K1° = ∅. By Case 1, there exist c1 ∈ R and x∗ ∈ X∗ with x∗ ≠ ϑ∗ such that ⟨x∗, v1⟩ = c1, ∀v1 ∈ V1, ⟨x∗, k1⟩ < c1, ∀k1 ∈ K1°, and ⟨x∗, k1⟩ ≤ c1, ∀k1 ∈ K1. Let c := c1 + ⟨x∗, x0⟩ ∈ R. Then, ∀v ∈ V, ⟨x∗, v⟩ = c. ∀k ∈ K°, ⟨x∗, k⟩ < c. By Proposition 7.16, K1 = K − x0. Then,


∀k ∈ K, ⟨x∗, k⟩ ≤ c. Thus, the closed hyperplane H := {x ∈ X | ⟨x∗, x⟩ = c} is the one that we seek. This completes the proof of the theorem. □

Definition 8.8 Let X be a real normed linear space. A closed hyperplane H := {x ∈ X | ⟨x∗, x⟩ = c} with x∗ ∈ X∗, x∗ ≠ ϑ∗, and c ∈ R is called a supporting hyperplane of a convex set K ⊆ X if either inf_{k∈K} ⟨x∗, k⟩ = c or sup_{k∈K} ⟨x∗, k⟩ = c. %

Clearly, for a convex set K in a real normed linear space, if K admits interior points, then, by Mazur's Theorem, there exists a supporting hyperplane of K passing through each boundary point of K.

Theorem 8.9 (Eidelheit Separation Theorem) Let X be a real normed linear space and K1, K2 ⊆ X be nonempty convex sets with K1° ≠ ∅ and K1° ∩ K2 = ∅. Then, there exists a closed hyperplane H that separates K1 and K2; that is, ∃x∗ ∈ X∗ with x∗ ≠ ϑ∗ and ∃c ∈ R such that

sup_{k1∈K1} ⟨x∗, k1⟩ ≤ c ≤ inf_{k2∈K2} ⟨x∗, k2⟩

Proof Let K := K1° − K2. By Propositions 7.15 and 6.39, K is convex. Clearly, K° ≠ ∅ since K1° ≠ ∅ and K2 ≠ ∅. Since K1° ∩ K2 = ∅, then ϑ ∉ K. Let V = {ϑ}. Then, V is a linear variety and V ∩ K° = ∅. By Mazur's Theorem, ∃x∗ ∈ X∗ with x∗ ≠ ϑ∗ such that ⟨x∗, k⟩ ≤ ⟨x∗, ϑ⟩ = 0, ∀k ∈ K. ∀k1 ∈ K1°, ∀k2 ∈ K2, k1 − k2 ∈ K and ⟨x∗, k1 − k2⟩ ≤ 0. Then, ⟨x∗, k1⟩ ≤ ⟨x∗, k2⟩. Hence, −∞ < sup_{k1∈K1°} ⟨x∗, k1⟩ ≤ c ≤ inf_{k2∈K2} ⟨x∗, k2⟩ < +∞ for some c ∈ R. By Proposition 8.6, K̄1 = cl(K1°). By Proposition 4.13, ∀k1 ∈ K̄1, ⟨x∗, k1⟩ ≤ sup_{k̄1∈K1°} ⟨x∗, k̄1⟩ ≤ c. Hence, we have

sup_{k1∈K1} ⟨x∗, k1⟩ = sup_{k1∈K1°} ⟨x∗, k1⟩ ≤ c ≤ inf_{k2∈K2} ⟨x∗, k2⟩

Thus, the closed hyperplane H := {x ∈ X | ⟨x∗, x⟩ = c} is the one we seek. This completes the proof of the theorem. □

Proposition 8.10 Let X be a real normed linear space and K ⊆ X be a closed convex set. Then, ∀x0 ∈ X \ K, there exists x∗ ∈ X∗ such that ⟨x∗, x0⟩ < inf_{k∈K} ⟨x∗, k⟩. Hence, K is weakly closed.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: K = ∅; Case 2: K ≠ ∅. Case 1: K = ∅. Let x∗ = ϑ∗. Then, ⟨x∗, x0⟩ = 0 < +∞ = inf_{k∈K} ⟨x∗, k⟩. Clearly, K is weakly closed. This case is proved. Case 2: K ≠ ∅. ∀x0 ∈ X \ K, by Proposition 4.10, dist(x0, K) = inf_{k∈K} ‖x0 − k‖ =: d ∈ (0, ∞) ⊂ R. Then, K1 := B_X(x0, d) is a convex set with nonempty interior and K1 ∩ K = ∅. By Eidelheit Separation Theorem, there exist x∗ ∈ X∗ with x∗ ≠ ϑ∗ and c ∈ R such that sup_{k1∈K1} ⟨x∗, k1⟩ ≤ c ≤

inf_{k∈K} ⟨x∗, k⟩. By Lemma 7.75, ⟨x∗, x0⟩ < sup_{k1∈K1} ⟨x∗, k1⟩ ≤ c. Then, ⟨x∗, x0⟩ < inf_{k∈K} ⟨x∗, k⟩.

Let K̄ be the closure of K in the weak topology O_weak(X). x0 ∈ {x ∈ X | ⟨x∗, x⟩ < c} =: O. Clearly, O is a weakly open set and O ∩ K = ∅. Then, by Proposition 3.3, x0 ∉ K̄. Thus, we have shown that x0 ∉ K̄ for every x0 in the complement of K. Hence, K̄ ⊆ K, and K is weakly closed by Proposition 3.3. This case is proved. This completes the proof of the proposition. □

8.3 Duality in Minimum Norm Problems Definition 8.12 Let X be a real normed linear space and K ⊆ X be a nonempty convex set. The support of K is the set K supp := {x∗ ∈ X∗ | supk∈K x∗ , k < +∞}. % Proposition 8.13 Let X be a real normed linear space and K, G ⊆ X be nonempty convex sets. Then, the following statements hold. (i) K supp ⊆ X∗ is a convex cone. (ii) If K is closed, then K = =

 x∗ ∈K supp, x∗ =ϑ∗



{x ∈ X | x∗ , x ≤ sup x∗ , k} k∈K

x∗ ∈K supp

{x ∈ X | x∗ , x ≤ sup x∗ , k}, that is, K equals to the k∈K

intersection of all closed half-spaces containing K. (iii) (K + G)supp = K supp ∩ Gsupp. (iv) (K ∩ G)supp ⊇ K supp + Gsupp, if K ∩ G = ∅. Proof (i) Clearly, ϑ∗ ∈ K supp. ∀x∗1 , x∗2 ∈ K supp, ∀α ∈ [0, ∞) ⊂ R, we have either α = 0, then αx∗1 = ϑ∗ ∈ K supp; or α > 0, then supk∈K αx∗1 , k = supk∈K αx∗1 , k = α supk∈K x∗1 , k < +∞, where we have applied Proposition 3.81, which further implies that αx∗1 ∈ K supp. This shows that K supp is a cone. Note that supk∈K x∗1 + x∗2 , k = supk∈K (x∗1 , k + x∗2 , k) ≤ supk∈K x∗1 , k + supk∈K x∗2 , k < +∞, where we have applied Proposition 3.81. Then, x∗1 + x∗2 ∈ K supp. This coupled with the fact that K supp is a cone implies that K supp is a convex cone.


(ii) Clearly, K ⊆ ⋂_{x∗∈K^supp} {x ∈ X | ⟨x∗, x⟩ ≤ sup_{k∈K} ⟨x∗, k⟩} =: K̄. ∀x ∈ X \ K, by Proposition 8.10, ∃x∗0 ∈ X∗ such that ⟨x∗0, x⟩ < inf_{k∈K} ⟨x∗0, k⟩. Then, sup_{k∈K} ⟨−x∗0, k⟩ < ⟨−x∗0, x⟩ < +∞. Hence, −x∗0 ∈ K^supp and x ∈ X \ K̄. This shows that X \ K ⊆ X \ K̄ and K̄ ⊆ K. Hence, K = K̄. Note that, for ϑ∗ ∈ K^supp, {x ∈ X | ⟨ϑ∗, x⟩ ≤ sup_{k∈K} ⟨ϑ∗, k⟩} = X. Then, K = ⋂_{x∗∈K^supp, x∗≠ϑ∗} {x ∈ X | ⟨x∗, x⟩ ≤ sup_{k∈K} ⟨x∗, k⟩}. Let K´ be the intersection of all closed half-spaces containing K. Clearly, K ⊆ K´. Any closed half-space containing K can be expressed as {x ∈ X | ⟨x∗, x⟩ ≤ c} for some x∗ ∈ X∗ with x∗ ≠ ϑ∗ and for some c ∈ R. Then, c ≥ sup_{k∈K} ⟨x∗, k⟩ ∈ R. Hence, x∗ ∈ K^supp and K´ ⊆ ⋂_{x∗∈K^supp, x∗≠ϑ∗} ⋂_{c≥sup_{k∈K}⟨x∗,k⟩} {x ∈ X | ⟨x∗, x⟩ ≤ c} = ⋂_{x∗∈K^supp, x∗≠ϑ∗} {x ∈ X | ⟨x∗, x⟩ ≤ sup_{k∈K} ⟨x∗, k⟩} = K. Hence, K = K´.

(iii) ∀x∗ ∈ K^supp ∩ G^supp, we have sup_{k∈K} ⟨x∗, k⟩ =: c1 < +∞ and sup_{g∈G} ⟨x∗, g⟩ =: c2 < +∞. ∀x ∈ K + G, x = k + g for some k ∈ K and some g ∈ G. Then, ⟨x∗, x⟩ = ⟨x∗, k⟩ + ⟨x∗, g⟩ ≤ c1 + c2 < +∞. This shows that x∗ ∈ (K + G)^supp. Hence, K^supp ∩ G^supp ⊆ (K + G)^supp. On the other hand, ∀x∗ ∈ (K + G)^supp, sup_{x∈K+G} ⟨x∗, x⟩ =: c < +∞. Fix k0 ∈ K and g0 ∈ G, since K and G are nonempty. Then, sup_{k∈K} ⟨x∗, k⟩ = sup_{x∈g0+K} ⟨x∗, x⟩ − ⟨x∗, g0⟩ ≤ c − ⟨x∗, g0⟩ < +∞ and sup_{g∈G} ⟨x∗, g⟩ = sup_{x∈k0+G} ⟨x∗, x⟩ − ⟨x∗, k0⟩ ≤ c − ⟨x∗, k0⟩ < +∞. Hence, x∗ ∈ K^supp ∩ G^supp. This shows that (K + G)^supp ⊆ K^supp ∩ G^supp. Therefore, we have (K + G)^supp = K^supp ∩ G^supp.

(iv) ∀x∗ ∈ K^supp + G^supp, let x∗ = x∗1 + x∗2 with x∗1 ∈ K^supp and x∗2 ∈ G^supp. ∀x ∈ K ∩ G, we have ⟨x∗, x⟩ = ⟨x∗1, x⟩ + ⟨x∗2, x⟩ ≤ sup_{k∈K} ⟨x∗1, k⟩ + sup_{g∈G} ⟨x∗2, g⟩ < +∞. Hence, x∗ ∈ (K ∩ G)^supp. Hence, we have K^supp + G^supp ⊆ (K ∩ G)^supp. This completes the proof of the proposition. □
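Statement (iii) can be checked numerically for polytopes, where sup_{k∈K} ⟨x∗, k⟩ is a maximum over vertices and every x∗ supports. The sketch below (an illustration with invented sets, not from the text) verifies the additivity of the support values over a Minkowski sum:

```python
import itertools, math

K = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]                  # a triangle
G = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]  # a square

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1]

def h(vertices, x_star):
    # support value sup_{k in K} <x*, k>; a max over vertices for a polytope
    return max(dot(x_star, v) for v in vertices)

# candidate points of K + G: all pairwise sums (these include its vertices)
KG = [(a[0] + b[0], a[1] + b[1]) for a, b in itertools.product(K, G)]
for t in range(0, 360, 7):
    x_star = (math.cos(math.radians(t)), math.sin(math.radians(t)))
    # h_{K+G} = h_K + h_G: the support-value face of Proposition 8.13 (iii)
    assert abs(h(KG, x_star) - (h(K, x_star) + h(G, x_star))) < 1e-12
```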

Definition 8.14 Let X be a real normed linear space and K ⊆ X be a nonempty convex set. The support functional of K is h : K^supp → R given by h(x∗) = sup_{k∈K} ⟨x∗, k⟩, ∀x∗ ∈ K^supp. %

In the above definition, h takes values in R since K ≠ ∅ and x∗ ∈ K^supp.

Proposition 8.15 Let X be a real normed linear space, K ⊆ X be a nonempty convex set, h : K^supp → R be the support functional of K, and y ∈ X. Then, the following statements hold.

(i) δ := inf_{k∈K} ‖y − k‖ = max_{x∗∈K^supp, ‖x∗‖≤1} (⟨x∗, y⟩ − h(x∗)), where the maximum is achieved at some x∗0 ∈ K^supp with ‖x∗0‖ ≤ 1. If the infimum is achieved at k0 ∈ K, then y − k0 is aligned with x∗0 and ⟨x∗0, k0⟩ = h(x∗0).
(ii) If ∃k0 ∈ K and ∃x∗0 ∈ K^supp with ‖x∗0‖ = 1 such that y − k0 is aligned with x∗0 and ⟨x∗0, k0⟩ = h(x∗0), then the infimum is achieved at k0 and the maximum is achieved at x∗0; that is, δ = ‖y − k0‖ = ⟨x∗0, y⟩ − h(x∗0).


Proof (i) ∀x∗ ∈ K^supp with ‖x∗‖ ≤ 1, ∀k ∈ K, we have ‖y − k‖ ≥ ‖x∗‖‖y − k‖ ≥ ⟨x∗, y − k⟩ = ⟨x∗, y⟩ − ⟨x∗, k⟩ ≥ ⟨x∗, y⟩ − h(x∗). Hence, δ = inf_{k∈K} ‖y − k‖ ≥ sup_{x∗∈K^supp, ‖x∗‖≤1} (⟨x∗, y⟩ − h(x∗)). We will distinguish two exhaustive and mutually exclusive cases: Case 1: δ = 0; Case 2: δ > 0. Case 1: δ = 0. Take x∗0 = ϑ∗ ∈ K^supp. ‖x∗0‖ = 0 ≤ 1. Then, δ = 0 = ⟨x∗0, y⟩ − h(x∗0), and δ := inf_{k∈K} ‖y − k‖ = max_{x∗∈K^supp, ‖x∗‖≤1} (⟨x∗, y⟩ − h(x∗)). If the infimum is achieved at k0 ∈ K, then y − k0 is aligned with x∗0 and ⟨x∗0, k0⟩ = 0 = h(x∗0). This case is proved. Case 2: δ > 0. Since K ≠ ∅, δ < +∞. Take K1 = B_X(y, δ). Then, K1 is a convex set with nonempty interior and K ∩ K1° = ∅. By Eidelheit Separation Theorem, there exist x∗ ∈ X∗ with x∗ ≠ ϑ∗ and c ∈ R such that sup_{k1∈K1} ⟨x∗, k1⟩ ≤ c ≤ inf_{k∈K} ⟨x∗, k⟩. Take x∗0 = −‖x∗‖⁻¹x∗. Then, by Proposition 3.81, we have inf_{k1∈K1} ⟨x∗0, k1⟩ ≥ −c/‖x∗‖ ≥ sup_{k∈K} ⟨x∗0, k⟩. This shows that x∗0 ∈ K^supp and ‖x∗0‖ = 1. The above inequality is equivalent to ⟨x∗0, y⟩ + inf_{x∈B_X(ϑ,δ)} ⟨x∗0, x⟩ ≥ h(x∗0). By Lemma 7.75, we have ⟨x∗0, y⟩ − h(x∗0) ≥ δ. Hence, we have δ = ⟨x∗0, y⟩ − h(x∗0) = max_{x∗∈K^supp, ‖x∗‖≤1} (⟨x∗, y⟩ − h(x∗)). If the infimum is achieved at some k0 ∈ K, then δ = ‖y − k0‖ = ⟨x∗0, y⟩ − h(x∗0) ≤ ⟨x∗0, y⟩ − ⟨x∗0, k0⟩ = ⟨x∗0, y − k0⟩ ≤ ‖x∗0‖‖y − k0‖ ≤ ‖y − k0‖, where the second inequality follows from Proposition 7.72. Hence, ⟨x∗0, y − k0⟩ = ‖x∗0‖‖y − k0‖, x∗0 is aligned with y − k0, and ⟨x∗0, k0⟩ = h(x∗0). This case is proved.

(ii) Note that δ ≤ ‖y − k0‖ = ‖x∗0‖‖y − k0‖ = ⟨x∗0, y − k0⟩ = ⟨x∗0, y⟩ − h(x∗0) ≤ δ. Then, the result follows. This completes the proof of the proposition. □

Now, we state a proposition that guarantees the existence of a minimizing solution to a minimum norm problem. This proposition is based on the following result.

Proposition 8.16 Let X be a real normed linear space. Then, ‖·‖ : X → [0, ∞) ⊂ R is weakly lower semicontinuous.

Proof By Definition 3.14, all we need to show is that, ∀a ∈ R, the set Sa := {x ∈ X | ‖x‖ > a} is weakly open. Note that S̃a is strongly closed and convex. Then, by Proposition 8.10, S̃a is weakly closed. Then, Sa is weakly open. Hence, the norm is a weakly lower semicontinuous functional on X. This completes the proof of the proposition. □

Proposition 8.17 Let X be a reflexive real normed linear space, x∗ ∈ X∗, and K ⊆ X∗ be a nonempty closed convex set. Then, δ := inf_{k∗∈K} ‖x∗ − k∗‖ = min_{k∗∈K} ‖x∗ − k∗‖; that is, the infimum is achieved at some k∗0 ∈ K.

Proof Fix a k∗1 ∈ K ≠ ∅. Let μ := ‖x∗ − k∗1‖ ∈ [0, ∞) ⊂ R and d = μ + ‖x∗‖ + 1 ∈ (0, ∞) ⊂ R. Then, ∀k∗ ∈ K with ‖k∗‖ > d, we have ‖x∗ − k∗‖ ≥ ‖k∗‖ − ‖x∗‖ > μ + 1. Thus, δ = inf_{k∗∈K} ‖x∗ − k∗‖ = inf_{k∗∈K∩B̄_{X∗}(ϑ∗,d)} ‖x∗ − k∗‖, and any k∗0 achieving the infimum for the original problem must be in the set K ∩ B̄_{X∗}(ϑ∗, d) =: K1. By Proposition 6.40, K1 is

8.4 Convex and Concave Functionals

229

bounded closed and convex. Note that k∗1 ∈ K1 = ∅. Then, by Proposition 8.11, K1 is weak∗ compact. By Propositions 7.121, 8.16, and 3.16, we have f : X∗ → R given by f (k∗ ) = x∗ − k∗ , ∀k∗ ∈ X∗ , is weakly lower semicontinuous. By Proposition 7.90, K1 is weakly compact since X is reflexive. By Proposition 5.30 and Definition 3.14, there exists k∗0 ∈ K1 that achieves the infimum on K1 . Such k∗0 achieves the infimum on K. This completes the proof of the proposition. ' &
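As a hypothetical finite-dimensional illustration of the minimum-norm duality above (not part of the text), take K to be the closed Euclidean unit ball in R², whose support functional is h(x∗) = sup_{k∈K} ⟨x∗, k⟩ = ‖x∗‖₂. For y outside K the primal distance is ‖y‖ − 1, and a grid search over the dual ball recovers the same value from sup_{‖x∗‖≤1} (⟨x∗, y⟩ − h(x∗)):

```python
import numpy as np

# K = closed Euclidean unit ball in R^2; h(x*) = ||x*||_2 is its support
# functional.  For y outside K, the projection gives delta = ||y|| - 1.
y = np.array([3.0, 4.0])                        # ||y|| = 5, so delta = 4
delta = np.linalg.norm(y) - 1.0                 # inf_{k in K} ||y - k||

# Dual value: maximize <x*, y> - h(x*) over the dual ball ||x*|| <= 1.
thetas = np.linspace(0.0, 2.0 * np.pi, 20001)
best = -np.inf
for r in np.linspace(0.0, 1.0, 101):
    xs = r * np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    vals = xs @ y - np.linalg.norm(xs, axis=1)  # <x*, y> - h(x*)
    best = max(best, float(vals.max()))

print(delta, best)   # the two values agree (both 4) up to grid resolution
```

The dual maximum is attained at x∗0 = y/‖y‖, which is aligned with y − k0 for the minimizer k0 = y/‖y‖, matching the alignment condition in the proposition.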

8.4 Convex and Concave Functionals

Definition 8.18 Let X be a real vector space, C ⊆ X be a convex set, f : C → R and g : C → R. f is said to be convex if, ∀x1, x2 ∈ C, ∀α ∈ [0, 1] ⊂ R, we have f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2). f is said to be strictly convex if ∀x1, x2 ∈ C with x1 ≠ x2 and ∀α ∈ (0, 1) ⊂ R, the strict inequality holds in the above. g is said to be (strictly) concave if −g is (strictly) convex. %

Definition 8.19 Let X be a real vector space, C ⊆ X, and f : C → R. The epigraph of f over C is the set [f, C] := {(r, x) ∈ R × X | x ∈ C, f(x) ≤ r}. %
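A quick finite-dimensional sanity check (a hypothetical illustration, not part of the text): for a convex f the epigraph is convex, so convex combinations of sampled epigraph points stay in the epigraph. Below this is checked for f(x) = |x| on C = [−1, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)

def in_epigraph(r, x):
    # membership test for [f, C] with f(x) = |x| on C = [-1, 1]
    return -1.0 <= x <= 1.0 and abs(x) <= r

ok = True
for _ in range(1000):
    x1, x2 = rng.uniform(-1, 1, size=2)
    r1 = abs(x1) + rng.uniform(0, 2)       # (r1, x1) in [f, C]
    r2 = abs(x2) + rng.uniform(0, 2)
    a = rng.uniform(0, 1)
    ok &= in_epigraph(a * r1 + (1 - a) * r2, a * x1 + (1 - a) * x2)

print(ok)  # prints True
```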

Proposition 8.20 Let X be a real vector space, C ⊆ X be convex, and f : C → R. Then, f is convex if, and only if, the epigraph [f, C] is convex.

Proof "Sufficiency" ∀x1, x2 ∈ C, ∀α ∈ [0, 1] ⊂ R, we have (f(x1), x1), (f(x2), x2) ∈ [f, C]. By the convexity of the epigraph, α(f(x1), x1) + (1 − α)(f(x2), x2) = (αf(x1) + (1 − α)f(x2), αx1 + (1 − α)x2) ∈ [f, C]. Then, αf(x1) + (1 − α)f(x2) ≥ f(αx1 + (1 − α)x2). Hence, f is convex. "Necessity" Let f be convex. ∀(r1, x1), (r2, x2) ∈ [f, C], ∀α ∈ [0, 1] ⊂ R, we have r1 ≥ f(x1) and r2 ≥ f(x2). By the convexity of f and C, we have αx1 + (1 − α)x2 ∈ C and αr1 + (1 − α)r2 ≥ αf(x1) + (1 − α)f(x2) ≥ f(αx1 + (1 − α)x2). Then, we have α(r1, x1) + (1 − α)(r2, x2) = (αr1 + (1 − α)r2, αx1 + (1 − α)x2) ∈ [f, C]. Hence, the epigraph is convex. This completes the proof of the proposition. □

Proposition 8.21 Let X be a real normed linear space, C ⊆ X be nonempty, and f : C → R. Then, V([f, C]) = R × V(C).

Proof We will first show that v([f, C]) = R × v(C). Fix x0 ∈ C. Then, (f(x0), x0) ∈ [f, C]. ∀(r, x) ∈ v([f, C]) = (f(x0), x0) + span([f, C] − (f(x0), x0)), where the equality follows from Proposition 6.37,


then ∃n ∈ Z+, ∃(r1, x1), …, (rn, xn) ∈ [f, C], and ∃α1, …, αn ∈ R such that (r − f(x0), x − x0) = Σ_{i=1}^n αi (ri − f(x0), xi − x0). Then, x ∈ v(C) = x0 + span(C − x0) and (r, x) ∈ R × v(C). Hence, v([f, C]) ⊆ R × v(C). On the other hand, ∀(r, x) ∈ R × v(C), by Proposition 6.37, x ∈ v(C) = x0 + span(C − x0). Then, ∃n ∈ Z+, ∃x1, …, xn ∈ C, and ∃α1, …, αn ∈ R such that x − x0 = Σ_{i=1}^n αi (xi − x0). ∃r0 ∈ R such that r0 > f(x0). Then, (r0, x0) ∈ [f, C]. Let α0 = (r − f(x0) − Σ_{i=1}^n αi (f(xi) − f(x0)))/(r0 − f(x0)). Now it is easy to check that (r − f(x0), x − x0) = Σ_{i=1}^n αi (f(xi) − f(x0), xi − x0) + α0 (r0 − f(x0), x0 − x0). Note that (f(xi), xi) ∈ [f, C], ∀i = 1, …, n. Then, we have (r, x) ∈ (f(x0), x0) + span([f, C] − (f(x0), x0)) = v([f, C]). Hence, R × v(C) ⊆ v([f, C]). Therefore, we have v([f, C]) = R × v(C). By Proposition 7.18, V([f, C]) is the closure of v([f, C]) = R × v(C). By Proposition 3.29, the closure of R × v(C) equals R × V(C). Therefore, we have V([f, C]) = R × V(C). This completes the proof of the proposition. □

Proposition 8.22 Let X be a real normed linear space, C ⊆ X be convex, f : C → R be convex, and ◦C ≠ ∅. Then, [f, C] has a relative interior point (r0, x0) if, and only if, f is continuous at x0, x0 ∈ ◦C, and r0 ∈ (f(x0), +∞) ⊂ R.

Proof "Sufficiency" Fix x0 ∈ ◦C and r0 ∈ (f(x0), +∞) ⊂ R. Let f be continuous at x0. Then, for ε = (r0 − f(x0))/2 ∈ (0, ∞) ⊂ R, ∃δ ∈ (0, r0 − f(x0) − ε] ⊂ R such that ∀x ∈ B_X(x0, δ) ∩ V(C), we have x ∈ C and |f(x) − f(x0)| < ε. Clearly, (r0, x0) ∈ [f, C]. ∀(r, x) ∈ B_{R×X}((r0, x0), δ) ∩ V([f, C]) = B_{R×X}((r0, x0), δ) ∩ (R × V(C)), we have x ∈ B_X(x0, δ) ∩ V(C) and r ∈ B_R(r0, δ). Then, x ∈ C and r > r0 − δ ≥ f(x0) + ε > f(x). Hence, (r, x) ∈ [f, C]. This shows that (r0, x0) ∈ ◦[f, C].

"Necessity" Let (r0, x0) ∈ ◦[f, C]. Then, ∃ε0 ∈ (0, ∞) ⊂ R such that (B_R(r0, ε0) × B_X(x0, ε0)) ∩ V([f, C]) ⊆ [f, C]. By Proposition 8.21, we have B_R(r0, ε0) × (B_X(x0, ε0) ∩ V(C)) ⊆ [f, C]. Therefore, B_X(x0, ε0) ∩ V(C) ⊆ C, x0 ∈ ◦C, f(x) ≤ r0 − ε0, ∀x ∈ B_X(x0, ε0) ∩ V(C), and f(x0) ≤ r0 − ε0 < r0. ∀ε ∈ (0, ∞) ⊂ R, let δ = min{ε/(r0 − f(x0)), 1} ∈ (0, 1] ⊂ R. ∀x ∈ B_X(x0, δε0) ∩ C, we have x, x0, x0 + δ⁻¹(x − x0), x0 − δ⁻¹(x − x0) ∈ B_X(x0, ε0) ∩ V(C) ⊆ C. By the convexity of f, we have

f(x) = f((1 − δ)x0 + δ(x0 + δ⁻¹(x − x0))) ≤ (1 − δ)f(x0) + δf(x0 + δ⁻¹(x − x0)) ≤ f(x0) + (r0 − ε0 − f(x0))δ < f(x0) + ε

f(x0) = f(1/(1 + δ) x + δ/(1 + δ) (x0 − δ⁻¹(x − x0))) ≤ 1/(1 + δ) f(x) + δ/(1 + δ) f(x0 − δ⁻¹(x − x0)) ≤ 1/(1 + δ) f(x) + δ/(1 + δ) (r0 − ε0)

⇒ (1 + δ)f(x0) ≤ f(x) + (r0 − ε0)δ ⇒ f(x) ≥ f(x0) − (r0 − f(x0) − ε0)δ > f(x0) − ε

Hence, |f(x) − f(x0)| < ε. This shows that f is continuous at x0. This completes the proof of the proposition. □
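The interior-point hypothesis in the last proposition matters. A hypothetical scalar illustration (not from the text): on C = [0, 1], the function with f(x) = 0 for x < 1 and f(1) = 1 is convex and continuous on C° = (0, 1), yet discontinuous at the boundary point 1:

```python
import numpy as np

f = lambda x: 1.0 if x == 1.0 else 0.0

rng = np.random.default_rng(2)
x1 = rng.uniform(0, 1, 5000)       # samples in [0, 1)
x2 = rng.uniform(0, 1, 5000)
a = rng.uniform(0, 1, 5000)
# convexity inequality f(a*u + (1-a)*v) <= a*f(u) + (1-a)*f(v) on samples
convex_ok = all(f(ai * u + (1 - ai) * v) <= ai * f(u) + (1 - ai) * f(v)
                for ai, u, v in zip(a, x1, x2))
jump = f(1.0) - f(1.0 - 1e-12)     # = 1.0: discontinuity at the boundary
print(convex_ok, jump)
```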

Proposition 8.23 Let X be a real normed linear space, C ⊆ X be convex, x0 ∈ ◦C ≠ ∅, and f : C → R be convex. If f is continuous at x0, then f is continuous at x, ∀x ∈ ◦C.

Proof Fix any x ∈ ◦C. We will distinguish two exhaustive and mutually exclusive cases: Case 1: x = x0; Case 2: x ≠ x0. Case 1: x = x0. Then f is continuous at x. Case 2: x ≠ x0. ∀ε ∈ (0, ∞) ⊂ R, by the continuity of f at x0 and the fact that x0 ∈ ◦C, ∃δ ∈ (0, ∞) ⊂ R such that |f(y) − f(x0)| < ε, ∀y ∈ B_X(x0, δ) ∩ V(C) ⊆ C. Since x ∈ ◦C, then ∃δ1 ∈ (0, δ] ⊂ R such that B_X(x, δ1) ∩ V(C) ⊆ C. Take β = (‖x − x0‖ + δ1/2)/‖x − x0‖ ∈ (1, ∞) ⊂ R. Then, β(x − x0) + x0 ∈ B_X(x, δ1) ∩ V(C) ⊆ C. ∀y ∈ B_X(x, (β − 1)δ1/β) ∩ C, y = (1 − 1/β)(β/(β − 1)(y − x) + x0) + (1/β)(β(x − x0) + x0). Note that β/(β − 1)(y − x) + x0 ∈ B_X(x0, δ) ∩ V(C) ⊆ C. By the convexity of f, we have

f(y) ≤ (1 − 1/β)f(β/(β − 1)(y − x) + x0) + (1/β)f(β(x − x0) + x0) < (1 − 1/β)(f(x0) + ε) + (1/β)f(β(x − x0) + x0) =: r1

Define r := r1 + (β − 1)δ1/β. Then, by Proposition 8.21, B_{R×X}((r, x), (β − 1)δ1/β) ∩ V([f, C]) = B_{R×X}((r, x), (β − 1)δ1/β) ∩ (R × V(C)) ⊆ [f, C]. Hence, (r, x) ∈ ◦[f, C]. By Proposition 8.22, f is continuous at x. This completes the proof of the proposition. □

Proposition 8.24 Let X be a finite-dimensional real normed linear space, C ⊆ X be a convex set, and f : C → R be convex. Then, ∀x0 ∈ ◦C, f is continuous at x0.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: ◦C = ∅; Case 2: ◦C ≠ ∅. Case 1: ◦C = ∅. This is trivial. Case 2: ◦C ≠ ∅. Fix any x0 ∈ ◦C. Then, ∃δ ∈ (0, ∞) ⊂ R such that K̄ := B̄_X(x0, δ) ∩ V(C) ⊆ C. Let M := V(C) − x0, which is a closed subspace. Let n ∈ Z+ be the dimension of M, which is well-defined by Theorem 6.51. We will further distinguish two exhaustive and mutually exclusive cases: Case 2a: n = 0; Case 2b: n ∈ N. Case 2a: n = 0. Then, C = {x0}. Clearly, f is continuous at x0. Case 2b: n ∈ N. Let {e1, …, en} ⊆ M be a basis in M such that ‖ei‖ = 1, ∀i = 1, …, n. Then, K := co({x0 ± δei | i = 1, …, n}) ⊆ co(K̄) ⊆ C since C is convex. ∀x ∈ K, by Proposition 6.43, x = Σ_{i=1}^n αi(x0 + δei) + Σ_{i=1}^n βi(x0 − δei) for some αi, βi ∈ [0, 1] ⊂ R, i = 1, …, n, with Σ_{i=1}^n (αi + βi) = 1. By the

convexity of f, we have

f(x) ≤ Σ_{i=1}^n αi f(x0 + δei) + Σ_{i=1}^n βi f(x0 − δei) ≤ Σ_{i=1}^n |f(x0 + δei)| + Σ_{i=1}^n |f(x0 − δei)| =: r1
It is easy to see that p(m) := Σ_{i=1}^n |αi|, ∀m = Σ_{i=1}^n αi ei ∈ M, defines a norm on M. By Theorem 7.38, ∃ξ ∈ [1, ∞) ⊂ R such that p(m)/ξ ≤ ‖m‖ ≤ ξp(m), ∀m ∈ M. ∀x ∈ B_X(x0, δ/ξ) ∩ V(C), we have x = x0 + Σ_{i=1}^n αi δei for some α1, …, αn ∈ R. Then, δ/ξ > ‖x − x0‖ ≥ p(Σ_{i=1}^n αi δei)/ξ = (δ/ξ) Σ_{i=1}^n |αi|. Then, we have Σ_{i=1}^n |αi| < 1. Then, x can be expressed as a convex combination of vectors in {x0 ± δei | i = 1, …, n} and x ∈ K. Hence, B_X(x0, δ/ξ) ∩ V(C) ⊆ K ⊆ C. Take r = δ/ξ + r1. It is easy to show that B_{R×X}((r, x0), δ/ξ) ∩ V([f, C]) ⊆ [f, C], by Proposition 8.21. Then, (r, x0) ∈ ◦[f, C]. By Proposition 8.22, f is continuous at x0. This completes the proof of the proposition. □

Proposition 8.25 Let X be a real normed linear space, C ⊆ X, and f : C → R. Then, the following statements hold.

(i) If [f, C] is closed, then f is lower semicontinuous.
(ii) If C is closed and f is lower semicontinuous, then [f, C] is closed.

Proof (i) ∀a ∈ R, Va := {(a, x) ∈ R × X | x ∈ X} is closed. Then, [f, C] ∩ Va = {(a, x) ∈ R × X | x ∈ C, f(x) ≤ a} is closed. Hence, by Proposition 4.13, Ta := {x ∈ C | f(x) ≤ a} is closed. Note that {x ∈ C | −f(x) < a} = C \ {x ∈ C | f(x) ≤ −a} = C \ T₋ₐ. Clearly, C \ T₋ₐ is open in the subset topology of C. Then, −f is upper semicontinuous and f is lower semicontinuous. (ii) ∀(r0, x0) in the closure of [f, C], by Proposition 4.13, ∃((rn, xn))_{n=1}^∞ ⊆ [f, C] such that lim_{n∈N} (rn, xn) = (r0, x0). By Proposition 3.67, we have lim_{n∈N} rn = r0 and lim_{n∈N} xn = x0. By Definition 8.19, we have (xn)_{n=1}^∞ ⊆ C and rn ≥ f(xn), ∀n ∈ N. By Proposition 4.13, x0 lies in the closure of C, which equals C. We will distinguish two exhaustive and mutually exclusive cases: Case 1: there exist infinitely many n ∈ N such that xn = x0; Case 2: there exist only finitely many n ∈ N such that xn = x0. Case 1: without loss of generality, assume xn = x0, ∀n ∈ N. Then, rn ≥ f(x0), ∀n ∈ N. Hence, f(x0) ≤ lim_{n∈N} rn = r0. Then, (r0, x0) ∈ [f, C]. Case 2: without loss of generality, assume xn ≠ x0, ∀n ∈ N. Then, x0 is an accumulation point of C. By Propositions 3.86 and 3.85 and Definition 3.14, −f(x0) ≥


lim sup_{x→x0} (−f(x)) = −lim inf_{x→x0} f(x). By Proposition 3.87, we have f(x0) ≤ lim inf_{x→x0} f(x) ≤ lim inf_{n∈N} f(xn) ≤ lim inf_{n∈N} rn = r0. Then, (r0, x0) ∈ [f, C]. In both cases, we have (r0, x0) ∈ [f, C]. Then, the closure of [f, C] is contained in [f, C]. By Proposition 3.3, [f, C] is closed. This completes the proof of the proposition. □

Proposition 8.26 Let X be a real normed linear space, C ⊆ X be convex, and f : C → R be convex. Assume that [f, C] is closed. Then, f is weakly lower semicontinuous.

Proof ∀a ∈ R, Va := {(a, x) ∈ R × X | x ∈ X} is closed. Then, [f, C] ∩ Va = {(a, x) ∈ R × X | x ∈ C, f(x) ≤ a} is closed. Hence, by Proposition 4.13, Ta := {x ∈ C | f(x) ≤ a} is closed. Since f is convex, then Ta is convex. By Proposition 8.10, Ta is weakly closed. Note that {x ∈ C | −f(x) < a} = C \ {x ∈ C | f(x) ≤ −a} = C \ T₋ₐ. Clearly, C \ T₋ₐ is weakly open in the subset topology of C. Then, −f is weakly upper semicontinuous and f is weakly lower semicontinuous. This completes the proof of the proposition. □
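A hypothetical one-dimensional illustration of the epigraph criterion (Proposition 8.25): the step function jumping "upward" is lower semicontinuous and has a closed epigraph, while its mirror image fails on both counts at 0:

```python
# f is l.s.c. at 0 (jumps upward); g is not l.s.c. at 0.
f = lambda x: 0.0 if x <= 0 else 1.0
g = lambda x: 0.0 if x < 0 else 1.0

xs = [-1.0 / n for n in range(1, 200)]     # sequence x_n -> 0 from the left
liminf_f = min(f(x) for x in xs)           # = 0 >= f(0) = 0: consistent with l.s.c.
liminf_g = min(g(x) for x in xs)           # = 0 <  g(0) = 1: l.s.c. fails

in_epi_g = lambda r, x: g(x) <= r          # epigraph membership for g
# (0.5, -1/n) lies in [g, R] for every n, but the limit (0.5, 0) does not,
# so the epigraph of g is not closed:
all_in = all(in_epi_g(0.5, x) for x in xs)
limit_in = in_epi_g(0.5, 0.0)
print(liminf_f >= f(0), liminf_g >= g(0), all_in, limit_in)  # True False True False
```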

8.5 Conjugate Convex Functionals

Definition 8.27 Let X be a real normed linear space, C ⊆ X be a nonempty convex set, and f : C → R be convex. The conjugate set C^conj is defined as

C^conj := {x∗ ∈ X∗ | sup_{x∈C} (⟨x∗, x⟩ − f(x)) < +∞}

and the functional f^conj : C^conj → R conjugate to f is defined by

f^conj(x∗) = sup_{x∈C} (⟨x∗, x⟩ − f(x)),  ∀x∗ ∈ C^conj

We will use the compact notation [f, C]^conj for [f^conj, C^conj]. %
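A hypothetical scalar sketch (X = R, so X∗ = R and ⟨x∗, x⟩ = x∗x, not part of the text): the conjugate of f(x) = x² on C = R is f^conj(x∗) = x∗²/4, which a grid search over C recovers:

```python
import numpy as np

# Conjugate of f(x) = x^2 on C = R (so C^conj = R): f^conj(y) = sup_x (y*x - x^2).
# Analytically the sup is attained at x = y/2, giving f^conj(y) = y^2 / 4.
xs = np.linspace(-20, 20, 800001)          # dense grid standing in for C

def f_conj(y):
    return float(np.max(y * xs - xs ** 2))  # sup_{x in C} (<y, x> - f(x))

for y in (-3.0, 0.0, 1.0, 4.0):
    assert abs(f_conj(y) - y ** 2 / 4) < 1e-6
print("grid conjugate matches y**2/4")
```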

In the above definition, f^conj takes value in R since x∗ ∈ C^conj and C ≠ ∅.

Proposition 8.28 Let X be a real normed linear space, C ⊆ X be a nonempty convex set, and f : C → R be convex. Then, C^conj ⊆ X∗ is convex, f^conj : C^conj → R is convex, and [f^conj, C^conj] ⊆ R × X∗ is a closed convex set.

Proof ∀x∗1, x∗2 ∈ C^conj, ∀α ∈ [0, 1] ⊂ R, let Mi := f^conj(x∗i) = sup_{x∈C} (⟨x∗i, x⟩ − f(x)) ∈ R, i = 1, 2. ∀x ∈ C, we have ⟨αx∗1 + (1 − α)x∗2, x⟩ − f(x) = α(⟨x∗1, x⟩ − f(x)) + (1 − α)(⟨x∗2, x⟩ − f(x)) ≤ αM1 + (1 − α)M2. Hence, αx∗1 + (1 − α)x∗2 ∈ C^conj and f^conj(αx∗1 + (1 − α)x∗2) ≤ αM1 + (1 − α)M2 = αf^conj(x∗1) + (1 − α)f^conj(x∗2). Therefore, C^conj is convex and f^conj is convex. By Proposition 8.20, [f^conj, C^conj] is convex. ∀(s0, x∗0) in the closure of [f^conj, C^conj], by Proposition 4.13, there exists ((sk, x∗k))_{k=1}^∞ ⊆ [f^conj, C^conj] such that lim_{k∈N} (sk, x∗k) = (s0, x∗0). By Proposition 3.67, we have lim_{k∈N} sk = s0 and lim_{k∈N} x∗k = x∗0. ∀x ∈ C, ∀k ∈ N, since (sk, x∗k) ∈ [f^conj, C^conj], then ⟨x∗k, x⟩ − f(x) ≤ f^conj(x∗k) ≤ sk. By Propositions 7.72 and 3.66, we have ⟨x∗0, x⟩ − f(x) = lim_{k∈N} (⟨x∗k, x⟩ − f(x)) ≤ lim_{k∈N} sk = s0 < +∞. Hence, x∗0 ∈ C^conj and f^conj(x∗0) ≤ s0. Then, (s0, x∗0) ∈ [f^conj, C^conj]. This shows that the closure of [f^conj, C^conj] is contained in [f^conj, C^conj]. By Proposition 3.3, [f^conj, C^conj] is closed. This completes the proof of the proposition. □

Geometric interpretation of f^conj : C^conj → R. Fix x∗ ∈ C^conj. ∀(r, x) ∈ [f, C], we have f(x) ≤ r. Then, ⟨(−1, x∗), (r, x)⟩ = ⟨x∗, x⟩ − r ≤ f^conj(x∗). It is easy to recognize that sup_{(r,x)∈[f,C]} ⟨(−1, x∗), (r, x)⟩ = f^conj(x∗). Hence, the closed hyperplane H := {(r, x) ∈ R × X | ⟨(−1, x∗), (r, x)⟩ = f^conj(x∗)} is a supporting hyperplane of [f, C].

Proposition 8.29 Let X be a real normed linear space, C ⊆ X be a nonempty convex set, and f : C → R be a convex functional. Assume that [f, C] =: K ⊆ R × X is closed. Then, ∃x∗0 ∈ X∗ such that

sup_{(r,x)∈K} ⟨(−1, x∗0), (r, x)⟩ < +∞

Then, x∗0 ∈ C^conj ≠ ∅.

Proof Fix x0 ∈ C ≠ ∅. By Proposition 8.20, K is convex. Clearly, K ≠ ∅. By the assumption of the proposition, K is closed. Let r0 := f(x0) − 1 ∈ R. Then, (r0, x0) ∉ K. By Proposition 8.10 and Example 7.76, ∃(s̄0, x̄∗0) ∈ R × X∗ such that

⟨(s̄0, x̄∗0), (r0, x0)⟩ < inf_{(r,x)∈K} ⟨(s̄0, x̄∗0), (r, x)⟩  (8.1)

Since (f(x0) + n, x0) ∈ K, ∀n ∈ Z+, the right-hand side of (8.1) forces s̄0 ≥ 0; and s̄0 = 0 would contradict (8.1), since (f(x0), x0) ∈ K would then yield ⟨x̄∗0, x0⟩ < ⟨x̄∗0, x0⟩. Hence, s̄0 > 0. Let x∗0 = −s̄0⁻¹ x̄∗0 ∈ X∗. Then, (8.1) is equivalent to

⟨(−1, x∗0), (r0, x0)⟩ > sup_{(r,x)∈K} ⟨(−1, x∗0), (r, x)⟩

The above implies that ⟨(−1, x∗0), (r0, x0)⟩ > sup_{x∈C} (⟨x∗0, x⟩ − f(x)). Hence, x∗0 ∈ C^conj ≠ ∅. This completes the proof of the proposition. □

Proposition 8.30 Let X be a real normed linear space and K ⊆ R × X =: W be a closed convex set.


Assume that there exists a nonvertical hyperplane such that K is contained in one of the half-spaces associated with the hyperplane, that is, ∃(s1, x∗1) ∈ W∗ = R × X∗ with s1 ≠ 0 such that sup_{(r,x)∈K} ⟨(s1, x∗1), (r, x)⟩ < +∞. Then, ∀(r0, x0) ∈ W \ K, there exists a nonvertical hyperplane separating (r0, x0) and K, that is, ∃x∗0 ∈ X∗ such that either

sup_{(r,x)∈K} ⟨(−1, x∗0), (r, x)⟩ < ⟨(−1, x∗0), (r0, x0)⟩  (8.2a)

or

inf_{(r,x)∈K} ⟨(−1, x∗0), (r, x)⟩ > ⟨(−1, x∗0), (r0, x0)⟩  (8.2b)

Proof Fix (r0, x0) ∈ W \ K. By Proposition 8.10, ∃(s̄0, x̄∗0) ∈ R × X∗ such that

⟨(s̄0, x̄∗0), (r0, x0)⟩ < inf_{(r,x)∈K} ⟨(s̄0, x̄∗0), (r, x)⟩  (8.3)

We will distinguish three exhaustive and mutually exclusive cases: Case 1: s̄0 > 0; Case 2: s̄0 < 0; Case 3: s̄0 = 0. Case 1: s̄0 > 0. Let x∗0 = −s̄0⁻¹ x̄∗0 ∈ X∗. Then, (8.3) is equivalent to

−r0 + ⟨x∗0, x0⟩ > sup_{(r,x)∈K} (−r + ⟨x∗0, x⟩)

Hence, (8.2a) holds. Case 2: s̄0 < 0. Let x∗0 = −s̄0⁻¹ x̄∗0 ∈ X∗. Then, (8.3) is equivalent to

−r0 + ⟨x∗0, x0⟩ < inf_{(r,x)∈K} (−r + ⟨x∗0, x⟩)

Hence, (8.2b) holds. Case 3: s̄0 = 0. We will further distinguish two exhaustive and mutually exclusive cases: Case 3a: K = ∅; Case 3b: K ≠ ∅. Case 3a: K = ∅. Take x∗0 = ϑ∗. Then, ⟨(−1, x∗0), (r0, x0)⟩ = −r0 > −∞ = sup_{(r,x)∈K} ⟨(−1, x∗0), (r, x)⟩. Hence, (8.2a) holds.


Case 3b: K ≠ ∅. Define

M1 := sup_{(r,x)∈K} ⟨(s1, x∗1), (r, x)⟩ − ⟨(s1, x∗1), (r0, x0)⟩ ∈ R
M2 := inf_{(r,x)∈K} ⟨(s̄0, x̄∗0), (r, x)⟩ − ⟨(s̄0, x̄∗0), (r0, x0)⟩ ∈ (0, ∞) ⊂ R

Let δ := M2/(1 + |M1|) ∈ (0, ∞) ⊂ R and (s̃0, x̃∗0) = (s̄0, x̄∗0) − δ(s1, x∗1) ∈ R × X∗. Then, we have

inf_{(r,x)∈K} ⟨(s̃0, x̃∗0), (r, x)⟩ − ⟨(s̃0, x̃∗0), (r0, x0)⟩
= inf_{(r,x)∈K} (⟨(s̄0, x̄∗0), (r, x)⟩ − δ⟨(s1, x∗1), (r, x)⟩) − ⟨(s̃0, x̃∗0), (r0, x0)⟩
≥ inf_{(r,x)∈K} ⟨(s̄0, x̄∗0), (r, x)⟩ + inf_{(r,x)∈K} (−δ⟨(s1, x∗1), (r, x)⟩) − ⟨(s̃0, x̃∗0), (r0, x0)⟩
= inf_{(r,x)∈K} ⟨(s̄0, x̄∗0), (r, x)⟩ − δ sup_{(r,x)∈K} ⟨(s1, x∗1), (r, x)⟩ − ⟨(s̃0, x̃∗0), (r0, x0)⟩
= M2 − δM1 > 0

where we have applied Proposition 3.81 in the above. Hence, we have obtained an alternative pair (s̄0, x̄∗0) := (s̃0, x̃∗0) such that (8.3) holds. For this alternative pair, s̃0 = −δs1 ≠ 0. Hence, this case can be solved by Case 1 or Case 2 with the alternative pair. This completes the proof of the proposition. □

Definition 8.31 Let X be a real normed linear space, Γ ⊆ X∗ be a nonempty convex set, and ϕ : Γ → R be convex. The pre-conjugate set ^conj Γ is defined as

^conj Γ := {x ∈ X | sup_{x∗∈Γ} (⟨x∗, x⟩ − ϕ(x∗)) < +∞}

and the functional ^conj ϕ : ^conj Γ → R pre-conjugate to ϕ is defined by

^conj ϕ(x) = sup_{x∗∈Γ} (⟨x∗, x⟩ − ϕ(x∗)),  ∀x ∈ ^conj Γ

We will use the compact notation ^conj[ϕ, Γ] for [^conj ϕ, ^conj Γ]. %

In the above definition, ^conj ϕ takes value in R since x ∈ ^conj Γ and Γ ≠ ∅.


Next, we state three duality results regarding conjugate convex functionals.

Proposition 8.32 Let X be a real normed linear space, C ⊆ X be a nonempty convex set, and f : C → R be a convex functional. Assume that [f, C] is closed. Then, [f, C] = ^conj([f, C]^conj). Therefore, ∀x0 ∈ C, f(x0) = sup_{x∗∈C^conj} (⟨x∗, x0⟩ − f^conj(x∗)).

Proof Since C ≠ ∅, then C^conj and f^conj are well-defined. By Proposition 8.29, C^conj ≠ ∅. By Proposition 8.28, C^conj is convex and f^conj is convex. Then, ^conj(C^conj) and ^conj(f^conj) are well-defined. Thus, ^conj([f, C]^conj) is a well-defined set.

We will first show that [f, C] ⊆ ^conj[f^conj, C^conj] = ^conj([f, C]^conj). Fix any (r, x) ∈ [f, C]. ∀x∗ ∈ C^conj, we have f^conj(x∗) ≥ ⟨x∗, x⟩ − f(x). Hence, we have r ≥ f(x) ≥ ⟨x∗, x⟩ − f^conj(x∗). Thus, we have

r ≥ sup_{x∗∈C^conj} (⟨x∗, x⟩ − f^conj(x∗)) = ^conj(f^conj)(x)

Hence, (r, x) ∈ ^conj[f^conj, C^conj] and [f, C] ⊆ ^conj[f^conj, C^conj]. On the other hand, fix any (r0, x0) ∈ (R × X) \ [f, C]. By Proposition 8.29, ∃x̄∗0 ∈ X∗ such that

sup_{(r,x)∈[f,C]} ⟨(−1, x̄∗0), (r, x)⟩ < +∞

Note that [f, C] is convex, by Proposition 8.20. By Proposition 8.30, there exists a nonvertical hyperplane separating (r0, x0) and K := [f, C], that is, ∃x∗0 ∈ X∗ such that either

sup_{(r,x)∈K} ⟨(−1, x∗0), (r, x)⟩ < ⟨(−1, x∗0), (r0, x0)⟩  (8.4a)

or

inf_{(r,x)∈K} ⟨(−1, x∗0), (r, x)⟩ > ⟨(−1, x∗0), (r0, x0)⟩  (8.4b)

Since K is the epigraph of f, (8.4b) is impossible, since its left-hand side equals −∞. Therefore, (8.4a) must hold. Then, c := sup_{(r,x)∈K} ⟨(−1, x∗0), (r, x)⟩ = sup_{x∈C} (⟨x∗0, x⟩ − f(x)) = f^conj(x∗0) ∈ R and x∗0 ∈ C^conj. Then, (c, x∗0) ∈ [f^conj, C^conj]. But c < −r0 + ⟨x∗0, x0⟩ implies that r0 < ⟨x∗0, x0⟩ − f^conj(x∗0) ≤ sup_{x∗∈C^conj} (⟨x∗, x0⟩ − f^conj(x∗)). Therefore, (r0, x0) ∉ ^conj[f^conj, C^conj]. Hence, ^conj[f^conj, C^conj] ⊆ [f, C]. This yields [f, C] = ^conj([f, C]^conj), which implies that f and ^conj(f^conj) admit the same domain of definition and equal each other on C. Therefore, ∀x0 ∈ C, f(x0) = ^conj(f^conj)(x0) = sup_{x∗∈C^conj} (⟨x∗, x0⟩ − f^conj(x∗)). □


Proposition 8.33 Let X be a real normed linear space, C ⊆ X be a nonempty convex set, f : C → R be a convex functional, f be lower semicontinuous at x0 ∈ C, C^conj ⊆ X∗ be the conjugate set, and f^conj : C^conj → R be the conjugate functional. Then, f(x0) = sup_{x∗∈C^conj} (⟨x∗, x0⟩ − f^conj(x∗)).

Proof ∀x∗ ∈ C^conj, we have ⟨x∗, x0⟩ − f^conj(x∗) = ⟨x∗, x0⟩ + inf_{x∈C} (f(x) − ⟨x∗, x⟩) ≤ f(x0). Then,

f(x0) ≥ sup_{x∗∈C^conj} (⟨x∗, x0⟩ − f^conj(x∗))

∀ε ∈ (0, ∞) ⊂ R, by the lower semicontinuity of f at x0, ∃δ ∈ (0, ε/2] ⊂ R such that f(x) > f(x0) − ε/2, ∀x ∈ B_X(x0, δ) ∩ C. Consider the real normed linear space W := R × X. Define K2 := [f, C], which is clearly nonempty. By Proposition 8.20, K2 is convex. Define K1 := B_W((f(x0) − ε, x0), δ). Then, K1 is convex and K1° = K1 ≠ ∅.

Claim 8.33.1 K1° ∩ K2 = ∅.

Proof of Claim Suppose ∃(r, x) ∈ K1° ∩ K2. Then, x ∈ B_X(x0, δ) ∩ C, r < f(x0) − ε + δ ≤ f(x0) − ε/2, and r ≥ f(x). This leads to f(x) < f(x0) − ε/2, which is a contradiction. Hence, K1° ∩ K2 = ∅. This completes the proof of the claim. □

By the Eidelheit Separation Theorem and Example 7.76, ∃c ∈ R and ∃(s̄0, x̄∗0) ∈ W∗ = R × X∗ with (s̄0, x̄∗0) ≠ (0, ϑX∗) such that

sup_{(r,x)∈K1} ⟨(s̄0, x̄∗0), (r, x)⟩ ≤ c ≤ inf_{(r,x)∈K2} ⟨(s̄0, x̄∗0), (r, x)⟩  (8.5)

The second inequality in (8.5) implies that −∞ < c ≤ inf_{(r,x)∈K2} (s̄0 r + ⟨x̄∗0, x⟩). Since K2 = [f, C], then s̄0 ≥ 0.

Claim 8.33.2 s̄0 > 0.

Proof of Claim Suppose s̄0 = 0. By the fact that (s̄0, x̄∗0) ≠ (0, ϑX∗), we have x̄∗0 ≠ ϑX∗. Then, (8.5) implies that ⟨x̄∗0, x0⟩ ≥ c ≥ sup_{(r,x)∈K1} ⟨x̄∗0, x⟩ = sup_{x∈B_X(x0,δ)} ⟨x̄∗0, x⟩. This is impossible since x̄∗0 ≠ ϑX∗. Therefore, s̄0 > 0. This completes the proof of the claim. □

Let x∗0 := −s̄0⁻¹ x̄∗0 ∈ X∗. Then, (8.5) is equivalent to

inf_{(r,x)∈K1} (⟨x∗0, x⟩ − r) ≥ −s̄0⁻¹ c ≥ sup_{(r,x)∈K2} (⟨x∗0, x⟩ − r) = sup_{x∈C} (⟨x∗0, x⟩ − f(x))


Hence, x∗0 ∈ C^conj and f^conj(x∗0) ≤ ⟨x∗0, x0⟩ − f(x0) + ε. Therefore, we have

⟨x∗0, x0⟩ − f^conj(x∗0) ≥ f(x0) − ε

By the arbitrariness of ε, f(x0) = sup_{x∗∈C^conj} (⟨x∗, x0⟩ − f^conj(x∗)). This completes the proof of the proposition. □

Proposition 8.34 Let X be a real normed linear space, C ⊆ X be a nonempty convex set, C° ≠ ∅, f : C → R be a convex functional, f be continuous at x̄ ∈ C°, C^conj ⊆ X∗ be the conjugate set, and f^conj : C^conj → R be the conjugate functional. Then, ∀x0 ∈ C°, f(x0) = max_{x∗∈C^conj} (⟨x∗, x0⟩ − f^conj(x∗)).

Proof ∀x∗ ∈ C^conj, we have ⟨x∗, x0⟩ − f^conj(x∗) = ⟨x∗, x0⟩ + inf_{x∈C} (f(x) − ⟨x∗, x⟩) ≤ f(x0). Then,

f(x0) ≥ sup_{x∗∈C^conj} (⟨x∗, x0⟩ − f^conj(x∗))

Consider the real normed linear space W := R × X. Let V = {(f(x0), x0)} ⊆ W. Clearly, V is a linear variety. Define K := [f, C]. By Proposition 8.20, K is convex. By Proposition 8.22, K admits a relative interior point (r̄, x̄) for some r̄ > f(x̄). By Proposition 8.21, V([f, C]) = R × V(C) = R × X = W since C° ≠ ∅. Hence, (r̄, x̄) ∈ K° ≠ ∅. Note that (f(x0) − δ, x0) ∉ [f, C], ∀δ ∈ (0, ∞) ⊂ R. Then, (f(x0), x0) ∉ K°. Hence, V ∩ K° = ∅. By Mazur's Theorem and Example 7.76, ∃c ∈ R and ∃(s̄0, x̄∗0) ∈ W∗ = R × X∗ with (s̄0, x̄∗0) ≠ (0, ϑX∗) such that

⟨(s̄0, x̄∗0), (f(x0), x0)⟩ = c ≥ sup_{(r,x)∈K} ⟨(s̄0, x̄∗0), (r, x)⟩

which is equivalent to

s̄0 f(x0) + ⟨x̄∗0, x0⟩ ≥ sup_{(r,x)∈K} (s̄0 r + ⟨x̄∗0, x⟩)  (8.6)

Since K = [f, C], then s̄0 ≤ 0; otherwise, the right-hand side of (8.6) equals +∞.

Claim 8.34.1 s̄0 < 0.

Proof of Claim Suppose s̄0 = 0. By the fact that (s̄0, x̄∗0) ≠ (0, ϑX∗), we have x̄∗0 ≠ ϑX∗. Then, (8.6) implies that ⟨x̄∗0, x0⟩ ≥ sup_{x∈C} ⟨x̄∗0, x⟩. Note that x0 ∈ C° and x̄∗0 ≠ ϑX∗ imply that sup_{x∈C} ⟨x̄∗0, x⟩ > ⟨x̄∗0, x0⟩. This is a contradiction. Therefore, s̄0 < 0. This completes the proof of the claim. □

Let x∗0 := |s̄0|⁻¹ x̄∗0 ∈ X∗. Then, (8.6) is equivalent to

−f(x0) + ⟨x∗0, x0⟩ ≥ sup_{(r,x)∈K} (−r + ⟨x∗0, x⟩) = sup_{x∈C} (⟨x∗0, x⟩ − f(x))

Hence, x∗0 ∈ C^conj and −f(x0) + ⟨x∗0, x0⟩ ≥ f^conj(x∗0).


Therefore, we have

f(x0) ≤ ⟨x∗0, x0⟩ − f^conj(x∗0) ≤ sup_{x∗∈C^conj} (⟨x∗, x0⟩ − f^conj(x∗)) ≤ f(x0)

Hence, f(x0) = max_{x∗∈C^conj} (⟨x∗, x0⟩ − f^conj(x∗)), where the maximum is achieved at x∗0. □
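That the supremum is actually attained can be seen in a hypothetical scalar case (not from the text): for f(x) = |x| on C = R, one has C^conj = [−1, 1] with f^conj = 0 there, and for every x0 the value sup_{x∗∈C^conj} (x∗x0 − f^conj(x∗)) = |x0| is achieved at x∗ = sign(x0):

```python
import numpy as np

ys = np.linspace(-1.0, 1.0, 20001)          # grid over C^conj = [-1, 1]
for x0 in (-1.5, -0.3, 0.0, 2.0):
    vals = ys * x0                          # <x*, x0> - f^conj(x*), f^conj = 0
    assert abs(vals.max() - abs(x0)) < 1e-12
print("maximum attained for every x0")
```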

8.6 Fenchel Duality Theorem

Let X be a real normed linear space, C1, C2 ⊆ X be nonempty convex sets, and f1 : C1 → R and f2 : C2 → R be convex functionals. We consider the problem of

μ := inf_{x∈C1∩C2} (f1(x) + f2(x))

We assume that f1^conj, C1^conj, f2^conj, C2^conj are easily characterized, yet (f1 + f2)^conj and (C1 ∩ C2)^conj are difficult to determine. Then, the above infimum can be equivalently calculated by

μ = inf_{(r1,x)∈[f1,C1], (r2,x)∈[f2,C2]} (r1 + r2)

The idea of the Fenchel Duality Theorem is illustrated in Fig. 8.1.

[Fig. 8.1 Fenchel duality. The sketch in R × X shows the epigraph [f1, C1] and the set [f2, C2], together with the vertical-axis values −f1^conj(x∗) and f2^conj(−x∗) and the gap μ between them.]
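A hypothetical one-dimensional instance (not from the text) makes the primal–dual match concrete: with C1 = C2 = R, f1(x) = x², f2(x) = |x|, one has f1^conj(y) = y²/4 on C1^conj = R and f2^conj = 0 on C2^conj = [−1, 1], and both the primal infimum and the dual maximum equal 0:

```python
import numpy as np

xs = np.linspace(-10, 10, 400001)
mu = float(np.min(xs ** 2 + np.abs(xs)))       # primal value, attained at x = 0

ys = np.linspace(-1.0, 1.0, 2001)              # x* in C1^conj with -x* in C2^conj
dual = float(np.max(-(ys ** 2) / 4.0 - 0.0))   # -f1^conj(x*) - f2^conj(-x*)
print(mu, dual)                                # both are 0.0
```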

Theorem 8.35 (Fenchel Duality Theorem) Let X be a real normed linear space, C1, C2 ⊆ X be nonempty convex sets, f1 : C1 → R and f2 : C2 → R be convex functionals, and C2 ∩ C1° ≠ ∅. Assume that f1 is continuous at x̄ ∈ C1° and μ := inf_{x∈C1∩C2} (f1(x) + f2(x)) ∈ R. Let f1^conj : C1^conj → R and f2^conj : C2^conj → R be conjugate functionals of f1 and f2, respectively. Then, the following statements hold.

(i) μ = inf_{x∈C1∩C2} (f1(x) + f2(x)) = max_{x∗∈C1^conj∩(−C2^conj)} (−f1^conj(x∗) − f2^conj(−x∗)), where the maximum is achieved at some x∗0 ∈ C1^conj ∩ (−C2^conj). If the infimum is achieved by some x0 ∈ C1 ∩ C2, then we have

f1^conj(x∗0) = ⟨x∗0, x0⟩ − f1(x0)  (8.7a)
f2^conj(−x∗0) = ⟨−x∗0, x0⟩ − f2(x0)  (8.7b)

(ii) If there exist x∗0 ∈ C1^conj ∩ (−C2^conj) and x0 ∈ C1 ∩ C2 such that (8.7) holds, then the infimum is achieved at x0 and the maximum is achieved at x∗0.

Proof ∀x∗ ∈ C1^conj ∩ (−C2^conj), we have

−f1^conj(x∗) = −sup_{x∈C1} (⟨x∗, x⟩ − f1(x)) = inf_{x∈C1} (−⟨x∗, x⟩ + f1(x)) ≤ inf_{x∈C1∩C2} (−⟨x∗, x⟩ + f1(x))

−f2^conj(−x∗) = −sup_{x∈C2} (⟨−x∗, x⟩ − f2(x)) = inf_{x∈C2} (⟨x∗, x⟩ + f2(x)) ≤ inf_{x∈C1∩C2} (⟨x∗, x⟩ + f2(x))

Then, we have

−f1^conj(x∗) − f2^conj(−x∗) ≤ inf_{x∈C1∩C2} (−⟨x∗, x⟩ + f1(x)) + inf_{x∈C1∩C2} (⟨x∗, x⟩ + f2(x)) ≤ inf_{x∈C1∩C2} (f1(x) + f2(x)) = μ  (8.8)

(i) Consider the sets K1 := [f1 − μ, C1] and K2 := {(r, x) ∈ R × X | x ∈ C2, r ≤ −f2(x)} = {(r, x) ∈ R × X | (−r, x) ∈ [f2, C2]}. By Proposition 8.20, K1 and K2 are nonempty convex sets. By Proposition 8.22, K1 admits a relative interior point (r̄, x̄) for some r̄ ∈ R. By Proposition 8.21, V([f1 − μ, C1]) = R × V(C1) = R × X. Then, (r̄, x̄) ∈ K1° ≠ ∅. We will show that K1° ∩ K2 = ∅ by an argument of contradiction. Suppose (r, x) ∈ K1° ∩ K2 ≠ ∅. Then, we have (r, x) ∈ K1°, which implies that x ∈ C1 and r > f1(x) − μ; and (r, x) ∈ K2, which implies that x ∈ C2 and r ≤ −f2(x). Then, x ∈ C1 ∩ C2 and f1(x) + f2(x) < μ. This contradicts the definition of μ. Hence, K1° ∩ K2 = ∅.


By the Eidelheit Separation Theorem and Example 7.76, ∃(s̄0, x̄∗0) ∈ R × X∗ with (s̄0, x̄∗0) ≠ ϑ_{R×X∗} such that

−∞ < sup_{(r,x)∈K1} ⟨(s̄0, x̄∗0), (r, x)⟩ ≤ inf_{(r,x)∈K2} ⟨(s̄0, x̄∗0), (r, x)⟩ < +∞  (8.9)

We will first show that s̄0 < 0 by an argument of contradiction. Suppose s̄0 ≥ 0. We will distinguish two exhaustive and mutually exclusive cases: Case 1: s̄0 > 0; Case 2: s̄0 = 0. Case 1: s̄0 > 0. Then, sup_{(r,x)∈K1} (⟨x̄∗0, x⟩ + r s̄0) = +∞, which contradicts (8.9). Case 2: s̄0 = 0. Then, x̄∗0 ≠ ϑX∗. By (8.9), we have

−∞ < sup_{x∈C1} ⟨x̄∗0, x⟩ ≤ inf_{x∈C2} ⟨x̄∗0, x⟩ < +∞

Let x′ ∈ C1° ∩ C2 ≠ ∅. Then, sup_{x∈C1} ⟨x̄∗0, x⟩ ≤ ⟨x̄∗0, x′⟩. This is not possible since x′ ∈ C1° and x̄∗0 ≠ ϑX∗. Hence, we have a contradiction in both cases. Therefore, s̄0 < 0. Let x∗0 = |s̄0|⁻¹ x̄∗0. Then, (8.9) is equivalent to

−∞ < sup_{(r,x)∈K1} (⟨x∗0, x⟩ − r) ≤ inf_{(r,x)∈K2} (⟨x∗0, x⟩ − r) < +∞

Here, sup_{(r,x)∈K1} (⟨x∗0, x⟩ − r) = sup_{x∈C1} (⟨x∗0, x⟩ − f1(x) + μ) ≥ sup_{x∈C1∩C2} (⟨x∗0, x⟩ − f1(x)) + μ =: d1 > −∞. Hence, x∗0 ∈ C1^conj. Note also that −∞
−∞. Hence, the desired equality holds. Case 2: X × Y ≠ ∅. Then, X ≠ ∅ and Y ≠ ∅. This implies that inf_{x∈X} g1(x) = inf_{(x,y)∈X×Y} g1(x) and inf_{y∈Y} g2(y) = inf_{(x,y)∈X×Y} g2(y). By Proposition 3.81, we have

μl ≥ inf_{(x,y)∈X×Y} g1(x) + inf_{(x,y)∈X×Y} g2(y) = μr

We will further distinguish two exhaustive and mutually exclusive cases: Case 2a: μr = +∞; Case 2b: μr < +∞. Case 2a: μr = +∞. Then, +∞ ≥ μl ≥ μr = +∞. Hence, the desired equality holds. Case 2b: μr < +∞. ∀m ∈ R with m > μr, by Proposition 3.81, ∃x0 ∈ X and ∃y0 ∈ Y such that g1(x0) + g2(y0) < m. Then, again by Proposition 3.81, we have μl < m. This implies that μl ≤ μr. Hence, the desired equality holds. This completes the proof of the proposition. □


Proposition 8.38 Let X be a topological space, Y be a set, and V : X × Y → R. Assume that V satisfies the following two conditions.

(i) ∀x1 ∈ X, W(x1) := inf_{y∈Y} V(x1, y) ∈ R. This defines the function W : X → R.
(ii) ∀y ∈ Y, define the function fy : X → R by fy(x) = V(x, y), ∀x ∈ X. The collection of functions {fy | y ∈ Y} is equicontinuous.

Then, W is continuous.

Proof ∀x0 ∈ X, ∀ε ∈ (0, ∞) ⊂ R, by (ii), ∃U ∈ O_X with x0 ∈ U such that, ∀x̄ ∈ U, ∀y ∈ Y, we have |fy(x̄) − fy(x0)| = |V(x̄, y) − V(x0, y)| < ε. By (i), ∃ȳ, y0 ∈ Y such that |W(x̄) − V(x̄, ȳ)| < ε and |W(x0) − V(x0, y0)| < ε. Note that

−2ε < −|W(x̄) − V(x̄, ȳ)| − |V(x̄, ȳ) − V(x0, ȳ)| ≤ W(x̄) − V(x0, ȳ) ≤ W(x̄) − V(x0, ȳ) + V(x0, ȳ) − W(x0) = W(x̄) − W(x0) = W(x̄) − V(x̄, y0) + V(x̄, y0) − W(x0) ≤ V(x̄, y0) − W(x0) ≤ |V(x̄, y0) − V(x0, y0)| + |V(x0, y0) − W(x0)| < 2ε

Hence, we have |W(x̄) − W(x0)| < 2ε and W is continuous at x0. By Proposition 3.9, W is continuous. This completes the proof of the proposition. □

Next, we present a result on game theory.

Proposition 8.39 Let X be a reflexive real normed linear space and A ⊆ X and

.

min_{x∈A} max_{x∗∈B} ⟨x∗, x⟩ = max_{x∗∈B} min_{x∈A} ⟨x∗, x⟩

Proof Let μ := inf_{x∈A} sup_{x∗∈B} ⟨x∗, x⟩. Let h : X × B → R be given by h(x, x∗) = ⟨x∗, x⟩, ∀x ∈ X, ∀x∗ ∈ B. By Proposition 7.72, h is continuous. Define f1 : X → Re by f1(x) = sup_{x∗∈B} h(x, x∗), ∀x ∈ X. ∀x ∈ X, define hx : X∗ → R by hx(x∗) = h(x, x∗) = ⟨x∗, x⟩, ∀x∗ ∈ X∗. Then, hx is weak∗ continuous. By Proposition 8.11, B is weak∗ compact. Then, by Proposition 5.29, ∃x̄∗ ∈ B such that f1(x) = hx(x̄∗) = max_{x∗∈B} hx(x∗) = max_{x∗∈B} h(x, x∗) ∈ R. Hence, f1 : X → R takes value in R. Since B is bounded, then ∃MB ∈ [0, ∞) ⊂ R such that ‖x∗‖ ≤ MB, ∀x∗ ∈ B. ∀x∗ ∈ B, define hx∗ : X → R by hx∗(x) = h(x, x∗), ∀x ∈ X. ∀x0 ∈ X, ∀ε ∈ (0, ∞) ⊂ R, let δ = ε/(1 + MB) ∈ (0, ∞) ⊂ R. ∀x ∈ B_X(x0, δ), ∀x∗ ∈ B, we have |hx∗(x) − hx∗(x0)| = |h(x, x∗) − h(x0, x∗)| = |⟨x∗, x − x0⟩| ≤ ‖x∗‖‖x − x0‖ ≤ MB δ < ε, where we have applied Proposition 7.72. Hence, {hx∗ | x∗ ∈ B} is equicontinuous. By Proposition 8.38, f1 : X → R is continuous.


∀x1, x2 ∈ X, ∀α ∈ [0, 1] ⊂ R, we have

f1(αx1 + (1 − α)x2) = max_{x∗∈B} ⟨x∗, αx1 + (1 − α)x2⟩ ≤ sup_{x∗∈B} α⟨x∗, x1⟩ + sup_{x∗∈B} (1 − α)⟨x∗, x2⟩ = α max_{x∗∈B} ⟨x∗, x1⟩ + (1 − α) max_{x∗∈B} ⟨x∗, x2⟩

= αf1 (x1 ) + (1 − α)f1 (x2 ) where we have applied Proposition 3.81 in the second equality. Hence, .f1 is convex. Since A is bounded, then, .∃MA ∈ [0, ∞) ⊂ R such that .x ≤ MA , .∀x ∈ A. .∀x ∈ A, .∀x∗ ∈ B, by Proposition 7.72, we have .x∗ , x ≥ −x∗ x ≥ −MA MB . Then, .f1 (x) ≥ −MA MB since .B = ∅. Then, .μ ≥ −MA MB . Since .A = ∅, then .μ < +∞. Hence, .μ is finite. Define .f2 : A → R by .f2 (x) = 0, .∀x ∈ A. Clearly, .f2 is convex. Note .μ = infx∈X∩A (f1 (x) + f2 (x)). Now, it is easy to check that all assumptions for the Fenchel Duality Theorem are satisfied. Then, μ=

.

max

x∗ ∈Xconj∩(−Aconj)

(−f1conj(x∗ ) − f2conj(−x∗ ))

Claim 8.39.1 .Xconj = B and .f1conj(x∗ ) = 0, .∀x∗ ∈ B. Proof of Claim .Xconj = {x∗ ∈ X∗ | supx∈X (x∗ , x − f1 (x)) < +∞}. .∀x∗0 ∈ B, .∀x ∈ X, we have .f1 (x) = maxx∗ ∈B x∗ , x ≥ x∗0 , x. Then, .x∗0 , x − f1 (x) ≤ 0. Hence, .x∗0 ∈ Xconj. Hence, .B ⊆ Xconj. The above also implies that .f1conj(x∗0 ) = supx∈X (x∗0 , x − f1 (x)) = 0. On the other hand, .∀x∗0 ∈ X∗ \ B, by Proposition 8.10, .∃x∗∗0 ∈ X∗∗ such that .x∗∗0 , x∗0  < infx∗ ∈B x∗∗0 , x∗ . Since .X is reflexive, then .x∗∗0 = φ(x0 ) for some .x0 ∈ X, where .φ : X → X∗∗ is the natural mapping. Then, .x∗0 , x0  < infx∗ ∈B x∗ , x0  and .x∗0 , −x0  > supx∗ ∈B x∗ , −x0  = f1 (−x0 ). Then, .x∗0 , −αx0  − f1 (−αx0 ) = α (x∗0 , −x0  − f1 (−x0 )) > 0, .∀α ∈ (0, ∞) ⊂ R. Hence, .supx∈X (x∗0 , x − f1 (x)) = +∞ and .x∗0 ∈ X∗ \ Xconj. This implies that .Xconj ⊆ B. Therefore, we have .Xconj = B. This completes the proof of the claim. ' & Claim 8.39.2 .Aconj = X∗ and .f2conj(x∗ ) = maxx∈A x∗ , x, .∀x∗ ∈ X∗ . Proof of Claim .Aconj = {x∗ ∈ X∗ | supx∈A x∗ , x < +∞}. Since .X is reflexive, then, by Definition 7.89, .φ : X → X∗∗ is an isometrical isomorphism. Then, .φ(A) ⊆ X∗∗ is a nonempty bounded closed convex set. By Proposition 7.90, .Y := X∗ is a reflexive real normed linear space. By Proposition 8.11, .φ(A) is weak∗ compact. .∀x∗ ∈ X∗ , .hx∗ ◦ φinv : X∗∗ → R is


weak∗ continuous. By Proposition 5.29, we have

sup_{x∈A} ⟨x∗, x⟩ = sup_{x∗∗∈φ(A)} ⟨x∗∗, x∗⟩ = ⟨x∗∗0, x∗⟩ = ⟨x∗, φinv(x∗∗0)⟩ = ⟨x∗, x0⟩ ∈ R

for some x∗∗0 ∈ φ(A) and x0 = φinv(x∗∗0) ∈ A. Then, x∗ ∈ A^conj and A^conj = X∗. Clearly, f2^conj(x∗) = ⟨x∗, x0⟩ = max_{x∈A} ⟨x∗, x⟩. This completes the proof of the claim. □

Then,

μ = inf_{x∈A} max_{x∗∈B} ⟨x∗, x⟩ = max_{x∗∈B∩X∗} (−max_{x∈A} ⟨−x∗, x⟩) = max_{x∗∈B} min_{x∈A} ⟨x∗, x⟩

To show that the infimum in the above equation is actually achieved, we note that .

− μ = min max x∗ , x = min x∗ ∈B x∈(−A)

max

x∗ ∈B x∗∗ ∈φ(−A)

x∗∗ , x∗ 

By Proposition 7.90, .Y := X∗ is a reflexive real normed linear space and .φ(X) = X∗∗ . Since .φ is an isometrical isomorphism, then .φ(−A) is a nonempty bounded closed convex set. Applying the result that we have obtained in this proof to the above, we have .

−μ=

max

min x∗∗ , x∗  = max min x∗ , x

x∗∗ ∈φ(−A) x∗ ∈B

x∈(−A) x∗ ∈B

which is equivalent to .μ = minx∈A maxx∗ ∈B x∗ , x. This completes the proof of the proposition.

' &
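The minimax equality just proved can be checked numerically in a hypothetical one-dimensional instance: take X = X∗ = R with A = [1, 3] and B = [−1, 2], both nonempty bounded closed convex sets, and approximate the inf/max and max/min by grid search. The grids and sets below are illustrative assumptions, not part of the text.

```python
import numpy as np

# A = [1, 3] (bounded closed convex subset of X = R),
# B = [-1, 2] (bounded closed convex subset of X* = R).
A = np.linspace(1.0, 3.0, 201)
B = np.linspace(-1.0, 2.0, 301)
pairing = np.outer(B, A)              # pairing[i, j] = <x*_i, x_j> = x*_i * x_j

minimax = pairing.max(axis=0).min()   # min over x in A of max over x* in B
maximin = pairing.min(axis=1).max()   # max over x* in B of min over x in A
print(minimax, maximin)               # both equal 2.0, as the proposition predicts
```

Here min_{x∈A} max_{x∗∈B} x∗x = min_{x∈[1,3]} 2x = 2 and max_{x∗∈B} min_{x∈A} x∗x = 2, achieved at x = 1, x∗ = 2, so the two values coincide.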

8.7 Positive Cones and Convex Mappings

Definition 8.40 Let X be a normed linear space and P ⊆ X be a closed convex cone. For x, y ∈ X, we will write x ≥ y (with respect to P) if x − y ∈ P. The cone P defining this relation is called the positive cone in X. The cone N = −P is called the negative cone in X, and we write y ≤ x if y − x ∈ N. We will write x > y (y < x) if x − y ∈ P° (y − x ∈ N° = −P°).

It is easy to check that the relations ≥ and ≤ are reflexive and transitive.

Proposition 8.41 Let X be a normed linear space with positive cone P. ∀x1, x2, x3, x4 ∈ X, ∀α ∈ [0, ∞) ⊂ R, we have

(i) x1, x2 ∈ P implies x1 + x2 ∈ P, αx1 ∈ P, and −αx1 ≤ ϑ.
(ii) x1 ≥ x1.
(iii) x1 ≥ x2 and x2 ≥ x3 implies x1 ≥ x3.
(iv) x1 ≥ x2 and x3 ≥ x4 implies x1 + x3 ≥ x2 + x4.
(v) x1 ≥ x2 implies αx1 ≥ αx2.
(vi) x1 > ϑ and x2 ≥ ϑ implies x1 + x2 > ϑ.
(vii) x1 > x2 and α > 0 implies αx1 > αx2.

Proof This is straightforward. □
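As a concrete sketch of Definition 8.40, take X = R² with P the nonnegative orthant (a closed convex cone whose interior is the strictly positive orthant). The helper names below are assumptions made for illustration.

```python
import numpy as np

# X = R^2 with P = nonnegative orthant: x >= y (w.r.t. P) iff x - y is
# componentwise nonnegative; x > y iff x - y lies in the interior of P,
# i.e. is componentwise strictly positive.
geq = lambda x, y: bool(np.all(np.asarray(x) - np.asarray(y) >= 0))
gt = lambda x, y: bool(np.all(np.asarray(x) - np.asarray(y) > 0))

print(geq([2, 1], [1, 1]))   # True
print(gt([2, 1], [1, 1]))    # False: the difference (1, 0) is on the boundary of P
print(geq([2, 0], [1, 1]))   # False: the cone ordering is only a partial ordering
```

The last line illustrates why ≥ is reflexive and transitive but not total: (2, 0) and (1, 1) are incomparable.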

Definition 8.42 Let X be a real normed linear space and S ⊆ X. The set S⊕ := {x∗ ∈ X∗ | ⟨x∗, x⟩ ≥ 0, ∀x ∈ S} is called the positive conjugate cone of S. The set S⊖ := {x∗ ∈ X∗ | ⟨x∗, x⟩ ≤ 0, ∀x ∈ S} is called the negative conjugate cone of S. Clearly, S⊖ = −S⊕.

Proposition 8.43 Let X be a real normed linear space and S, T ⊆ X. Then,

(i) S⊕ ⊆ X∗ is a closed convex cone.
(ii) If S ⊆ T, then T⊕ ⊆ S⊕.

Proof (i) Clearly, ϑ∗ ∈ S⊕. ∀x∗ ∈ S⊕, ∀α ∈ [0, ∞) ⊂ R, ∀x ∈ S, we have ⟨αx∗, x⟩ = α⟨x∗, x⟩ ≥ 0. Hence, αx∗ ∈ S⊕. Therefore, S⊕ is a cone with vertex at the origin. ∀x∗1, x∗2 ∈ S⊕, ∀x ∈ S, we have ⟨x∗1 + x∗2, x⟩ = ⟨x∗1, x⟩ + ⟨x∗2, x⟩ ≥ 0. Hence, x∗1 + x∗2 ∈ S⊕. Therefore, S⊕ is a convex cone.

∀x∗ in the closure of S⊕, by Proposition 4.13, ∃(x∗n)∞n=1 ⊆ S⊕ such that lim_{n∈N} x∗n = x∗. ∀x ∈ S, by Propositions 7.72 and 3.66, ⟨x∗, x⟩ = lim_{n∈N} ⟨x∗n, x⟩ ≥ 0. Therefore, x∗ ∈ S⊕, so the closure of S⊕ is contained in S⊕. By Proposition 3.3, S⊕ is closed.

(ii) This is straightforward. This completes the proof of the proposition. □

Proposition 8.44 Let X and Y be real normed linear spaces, A ∈ B(X, Y), and S ⊆ X. Then, (A(S))⊕ = (A′)inv(S⊕), where A′ ∈ B(Y∗, X∗) is the adjoint of A.

Proof ∀y∗ ∈ (A(S))⊕ ⊆ Y∗, ∀s ∈ S, we have ⟨A′y∗, s⟩ = ⟨y∗, As⟩ ≥ 0. Then, A′y∗ ∈ S⊕ and y∗ ∈ (A′)inv(S⊕). Hence, (A(S))⊕ ⊆ (A′)inv(S⊕). On the other hand, ∀y∗ ∈ (A′)inv(S⊕), A′y∗ ∈ S⊕. ∀s ∈ S, we have ⟨y∗, As⟩ = ⟨A′y∗, s⟩ ≥ 0. Therefore, we have y∗ ∈ (A(S))⊕ and (A′)inv(S⊕) ⊆ (A(S))⊕. Hence, (A(S))⊕ = (A′)inv(S⊕). This completes the proof of the proposition. □

Definition 8.45 Let X be a real normed linear space and P ⊆ X be the positive cone. We will define P⊕ ⊆ X∗ to be the positive cone in the dual.

Proposition 8.46 Let X be a real normed linear space with the positive cone P. If x0 ∈ X satisfies ⟨x∗, x0⟩ ≥ 0, ∀x∗ ∈ P⊕ (that is, ∀x∗ ≥ ϑ∗), then x0 ∈ P.

Proof Suppose x0 ∉ P. By Proposition 8.10, ∃x∗ ∈ X∗ such that −∞ < ⟨x∗, x0⟩ < inf_{x∈P} ⟨x∗, x⟩. Since P is a cone, inf_{x∈P} ⟨x∗, x⟩ = 0 (it must be greater than or equal to 0, since otherwise the infimum would be −∞; and it must be less than or equal to 0, since ϑX ∈ P). Hence, x∗ ∈ P⊕ and ⟨x∗, x0⟩ < 0. This contradicts the assumption. Therefore, we must have x0 ∈ P. This completes the proof of the proposition. □

Proposition 8.47 Let X be a real normed linear space with positive cone P. ∀x∗ ∈ X∗, ∀x ∈ X, we have

(i) x ≥ ϑ and x∗ ≥ ϑ∗ implies ⟨x∗, x⟩ ≥ 0.
(ii) x ≤ ϑ and x∗ ≥ ϑ∗ implies ⟨x∗, x⟩ ≤ 0.
(iii) x > ϑ, x∗ ≥ ϑ∗, and x∗ ≠ ϑ∗ implies ⟨x∗, x⟩ > 0.
(iv) x ≥ ϑ, x ≠ ϑ, and x∗ > ϑ∗ implies ⟨x∗, x⟩ > 0.

Proof This is straightforward. □

Definition 8.48 Let X be a real vector space, Ω ⊆ X, and Z be a real normed linear space with the positive cone P ⊆ Z. A mapping G : Ω → Z is said to be convex if Ω is convex and ∀x1, x2 ∈ Ω, ∀α ∈ [0, 1] ⊂ R, we have G(αx1 + (1 − α)x2) ≤ αG(x1) + (1 − α)G(x2).

We note that the convexity of a mapping depends on the definition of the positive cone P.

Proposition 8.49 Let X be a real vector space, Ω ⊆ X be convex, Z be a real normed linear space with the positive cone P ⊆ Z, and G : Ω → Z be a convex mapping. Then, ∀z ∈ Z, the set Ωz := {x ∈ Ω | G(x) ≤ z} is convex.

Proof Fix any z ∈ Z. ∀x1, x2 ∈ Ωz, ∀α ∈ [0, 1] ⊂ R, we have G(x1) ≤ z and G(x2) ≤ z. Since P is a convex cone, αG(x1) + (1 − α)G(x2) ≤ αz + (1 − α)z = z. By the convexity of G, we have G(αx1 + (1 − α)x2) ≤ αG(x1) + (1 − α)G(x2). By Proposition 8.41, we have G(αx1 + (1 − α)x2) ≤ z. Hence, αx1 + (1 − α)x2 ∈ Ωz. Hence, Ωz is convex. This completes the proof of the proposition. □
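Definition 8.42 can be illustrated in a hypothetical finite-dimensional case. For a cone S ⊆ R² generated by finitely many vectors, membership in S⊕ reduces to checking the pairing against the generators, since ⟨x∗, ·⟩ is linear and S consists of nonnegative combinations of them. The generators and function name below are assumptions for the sketch.

```python
import numpy as np

# S = closed convex cone in R^2 generated by (1, 0) and (1, 1).
# x* belongs to S+ (positive conjugate cone) iff <x*, s> >= 0 for every
# generator s, which extends to all of S by linearity and nonnegativity.
gens = np.array([[1.0, 0.0], [1.0, 1.0]])

def in_pos_conjugate(x_star):
    return bool(np.all(gens @ x_star >= 0))

print(in_pos_conjugate(np.array([0.0, 1.0])))   # True
print(in_pos_conjugate(np.array([1.0, -1.0])))  # True: pairings are 1 and 0
print(in_pos_conjugate(np.array([-1.0, 0.0])))  # False
```

Note that S⊕ is itself a closed convex cone, consistent with Proposition 8.43(i).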

8.8 Lagrange Multipliers

The basic problem to be considered in this section is

μ0 := inf_{x∈Ω, G(x)≤ϑZ} f(x)    (8.11)

where X is a real vector space, Ω ⊆ X is a nonempty convex set, f : Ω → R is a convex functional, Z is a real normed linear space with positive cone P ⊆ Z, and G : Ω → Z is a convex mapping.

Toward a solution to the above problem, we consider the following class of problems:

Γ := {z ∈ Z | ∃x ∈ Ω · G(x) ≤ z}    (8.12a)

ω(z) := inf_{x∈Ω, G(x)≤z} f(x);  ∀z ∈ Γ    (8.12b)

where ω : Γ → Re is the primal functional. To guarantee that ω is real-valued, we make the following assumption.

Assumption 8.50 ϑZ ∈ Γ and (i) ∃z̄ ∈ °Γ such that ω(z̄) ∈ R or (ii) μ := inf_{x∈Ω} f(x) > −∞ holds.

Fact 8.51 Γ ⊆ Z is convex.

Proof ∀z1, z2 ∈ Γ, ∀α ∈ [0, 1] ⊂ R, there exist xi ∈ Ω such that G(xi) ≤ zi, i = 1, 2. By the convexity of Ω, we have αx1 + (1 − α)x2 ∈ Ω. Then, by the convexity of G and Proposition 8.41, G(αx1 + (1 − α)x2) ≤ αG(x1) + (1 − α)G(x2) ≤ αz1 + (1 − α)z2. Hence, αz1 + (1 − α)z2 ∈ Γ. This completes the proof of the fact. □

Fact 8.52 Under Assumption 8.50, ω : Γ → R is real-valued, convex, and nonincreasing, that is, ∀z1, z2 ∈ Γ with z1 ≤ z2, we have ω(z1) ≥ ω(z2).

Proof We will first show that ω(z) ∈ R, ∀z ∈ Γ. Let (i) in Assumption 8.50 hold. Fix any z ∈ Γ. We will distinguish two exhaustive and mutually exclusive cases: Case 1: z = z̄; Case 2: z ≠ z̄. Case 1: z = z̄. Then, ω(z) = ω(z̄) ∈ R. Case 2: z ≠ z̄. By the definition of Γ and the fact that f is real-valued, we have ω(z) < +∞. By (i) of Assumption 8.50, ∃δ ∈ (0, ∞) ⊂ R such that BZ(z̄, δ) ∩ V(Γ) ⊆ Γ. Let z̄1 := z̄ + (δ/(2‖z̄ − z‖))(z̄ − z) ∈ BZ(z̄, δ) ∩ V(Γ) ⊆ Γ and ᾱ = ‖z̄ − z‖/(‖z̄ − z‖ + δ/2) ∈ (0, 1) ⊂ R. It is easy to verify that z̄ = ᾱz̄1 + (1 − ᾱ)z. Then, we have

−∞ < ω(z̄) = inf_{x̄∈Ω, G(x̄)≤ᾱz̄1+(1−ᾱ)z} f(x̄)
≤ inf_{x̄=ᾱx̄1+(1−ᾱ)x, x̄1∈Ω, G(x̄1)≤z̄1, x∈Ω, G(x)≤z} f(x̄)
≤ inf_{x̄1∈Ω, G(x̄1)≤z̄1, x∈Ω, G(x)≤z} (ᾱf(x̄1) + (1 − ᾱ)f(x))
= inf_{x̄1∈Ω, G(x̄1)≤z̄1} ᾱf(x̄1) + inf_{x∈Ω, G(x)≤z} (1 − ᾱ)f(x)
= ᾱ inf_{x̄1∈Ω, G(x̄1)≤z̄1} f(x̄1) + (1 − ᾱ) inf_{x∈Ω, G(x)≤z} f(x)
= ᾱω(z̄1) + (1 − ᾱ)ω(z)

where the second equality follows from Proposition 8.37, and the third equality follows from Proposition 3.81. Then, ω(z) > −∞. Hence, ω(z) ∈ R. In both cases, we have ω(z) ∈ R. Therefore, ω : Γ → R is real-valued.

Let (ii) in Assumption 8.50 hold. Fix any z ∈ Γ. Then, ω(z) ≥ μ > −∞. By the definition of Γ, we have ω(z) < +∞. Hence, ω(z) ∈ R. Thus, under Assumption 8.50, ω : Γ → R is real-valued.

∀z1, z2 ∈ Γ, ∀α ∈ [0, 1] ⊂ R, we have

ω(αz1 + (1 − α)z2) = inf_{x∈Ω, G(x)≤αz1+(1−α)z2} f(x)
≤ inf_{x=αx1+(1−α)x2, x1∈Ω, G(x1)≤z1, x2∈Ω, G(x2)≤z2} f(x)
≤ inf_{x1∈Ω, G(x1)≤z1, x2∈Ω, G(x2)≤z2} (αf(x1) + (1 − α)f(x2))
= inf_{x1∈Ω, G(x1)≤z1} αf(x1) + inf_{x2∈Ω, G(x2)≤z2} (1 − α)f(x2)
= α inf_{x1∈Ω, G(x1)≤z1} f(x1) + (1 − α) inf_{x2∈Ω, G(x2)≤z2} f(x2)
= αω(z1) + (1 − α)ω(z2)

where the second equality follows from Proposition 8.37, and the third equality follows from Proposition 3.81. Hence, ω is convex. It is obvious that ω is nonincreasing. This completes the proof of the fact. □

Now, let the functional ωconj : Γconj → R be conjugate to ω : Γ → R.

Fact 8.53 Under Assumption 8.50, we have Γconj ⊆ P⊖ and −Γconj ⊆ P⊕.

Proof Since ϑZ ∈ Γ implies that ∃x1 ∈ Ω such that G(x1) ≤ ϑZ, then Γ ⊇ G(x1) + P. ∀z∗ ∈ Γconj, by Definition 8.27, we have

+∞ > sup_{z∈Γ} (⟨z∗, z⟩ − ω(z)) ≥ sup_{z∈G(x1)+P} (⟨z∗, z⟩ − ω(z))
≥ sup_{z∈G(x1)+P} (⟨z∗, z⟩ − ω(G(x1)))
= sup_{z̄∈P} ⟨z∗, z̄⟩ + ⟨z∗, G(x1)⟩ − ω(G(x1))

where the third inequality follows from Fact 8.52, and the equality follows from Proposition 8.37. Hence, sup_{z̄∈P} ⟨z∗, z̄⟩ < +∞. Since P is a cone, then sup_{z̄∈P} ⟨z∗, z̄⟩ = 0. This implies that z∗ ∈ P⊖, Γconj ⊆ P⊖, and −Γconj ⊆ P⊕. This completes the proof of the fact. □

Fact 8.54 Let Assumption 8.50 hold. Define ω̄ : P⊖ → Re by

ω̄(z∗) = sup_{z∈Γ} (⟨z∗, z⟩ − ω(z));  ∀z∗ ∈ P⊖

Then, sup_{z∗∈Γconj} −ωconj(z∗) = sup_{z∗∈P⊖} −ω̄(z∗). ∀z∗ ∈ P⊕, we have that

−ω̄(−z∗) = inf_{x∈Ω} (f(x) + ⟨z∗, G(x)⟩) =: ϕ(z∗)

where ϕ : P⊕ → Re is called the dual functional.

Proof Clearly, ∀z∗ ∈ Γconj, ω̄(z∗) = ωconj(z∗) and, by the definition of Γconj, ∀z∗ ∈ P⊖ \ Γconj, ω̄(z∗) = +∞. Hence, sup_{z∗∈Γconj} −ωconj(z∗) = sup_{z∗∈P⊖} −ω̄(z∗).

∀z∗ ∈ P⊕, −z∗ ≤ ϑZ∗, and

−ω̄(−z∗) = −sup_{z∈Γ} (⟨−z∗, z⟩ − ω(z)) = inf_{z∈Γ} (⟨z∗, z⟩ + ω(z))
= inf_{z∈Γ} (⟨z∗, z⟩ + inf_{x∈Ω, G(x)≤z} f(x)) = inf_{z∈Γ} inf_{x∈Ω, G(x)≤z} (⟨z∗, z⟩ + f(x))
= inf_{x∈Ω, z∈Γ, G(x)≤z} (⟨z∗, z⟩ + f(x)) = inf_{x∈Ω} inf_{z∈Γ, z≥G(x)} (⟨z∗, z⟩ + f(x))
= inf_{x∈Ω} inf_{z≥G(x)} (⟨z∗, z⟩ + f(x)) = inf_{x∈Ω} (f(x) + inf_{z≥G(x)} ⟨z∗, z⟩)
= inf_{x∈Ω} (f(x) + ⟨z∗, G(x)⟩)

where the second equality follows from Proposition 3.81; the fourth equality follows from Proposition 8.37; the fifth and sixth equalities follow from Proposition 8.36; and the eighth equality follows from Proposition 8.37. This completes the proof of the fact. □

The desired theory follows by applying either of the two duality results for convex functionals, Propositions 8.33 and 8.34. This leads to two different regularity conditions. To apply Proposition 8.33, we assume:

Assumption 8.55 ω is lower semicontinuous at ϑZ.

To apply Proposition 8.34, we assume:

Assumption 8.56 ∃x1 ∈ Ω such that G(x1) < ϑZ.

Now, we state the two Lagrange duality results.

Proposition 8.57 Let X be a real vector space, Ω ⊆ X be nonempty and convex, f : Ω → R be a convex functional, Z be a real normed linear space with the positive cone P ⊆ Z, G : Ω → Z be a convex mapping, and μ0 be defined as in (8.11). Let Assumptions 8.50 and 8.55 hold. Then,

μ0 = sup_{z∗≥ϑZ∗} inf_{x∈Ω} (f(x) + ⟨z∗, G(x)⟩) = sup_{z∗≥ϑZ∗} ϕ(z∗)    (8.13)


Furthermore, if the supremum in (8.13) is achieved at z∗0 ∈ P⊕, that is,

μ0 = inf_{x∈Ω} (f(x) + ⟨z∗0, G(x)⟩)    (8.14)

then the following statement holds: the infimum in (8.11) is achieved at x0 ∈ Ω with G(x0) ≤ ϑZ if, and only if, the infimum in (8.14) is achieved at x0 ∈ Ω with G(x0) ≤ ϑZ and ⟨z∗0, G(x0)⟩ = 0.

Proof By Facts 8.51 and 8.52, Assumption 8.55, and Proposition 8.33, we have

μ0 = ω(ϑZ) = sup_{z∗∈Γconj} −ωconj(z∗)

Then, by Facts 8.53 and 8.54, we have

μ0 = sup_{z∗∈P⊖} −ω̄(z∗) = sup_{z∗≥ϑZ∗} −ω̄(−z∗) = sup_{z∗≥ϑZ∗} ϕ(z∗)

Therefore, (8.13) holds.

Let the supremum in (8.13) be achieved at z∗0 ∈ P⊕, that is, (8.14) holds. If the infimum in (8.11) is achieved at x0 ∈ Ω with G(x0) ≤ ϑZ, then we have μ0 = f(x0) ≥ f(x0) + ⟨z∗0, G(x0)⟩ ≥ inf_{x∈Ω} (f(x) + ⟨z∗0, G(x)⟩) = μ0. Hence, the infimum in (8.14) is achieved at x0 and ⟨z∗0, G(x0)⟩ = 0. On the other hand, if the infimum in (8.14) is achieved at x0 ∈ Ω with G(x0) ≤ ϑZ and ⟨z∗0, G(x0)⟩ = 0, then μ0 = f(x0) + ⟨z∗0, G(x0)⟩ = f(x0). Hence, the infimum in (8.11) is achieved at x0. This completes the proof of the proposition. □

Proposition 8.58 Let X be a real vector space, Ω ⊆ X be nonempty and convex, f : Ω → R be a convex functional, Z be a real normed linear space with the positive cone P ⊆ Z, P° ≠ ∅, and G : Ω → Z be a convex mapping. Let Assumptions 8.50 and 8.56 hold. Then,

μ0 = max_{z∗≥ϑZ∗} inf_{x∈Ω} (f(x) + ⟨z∗, G(x)⟩) = max_{z∗≥ϑZ∗} ϕ(z∗)    (8.15)

where the maximum is achieved at z∗0 ∈ P⊕, that is,

μ0 = inf_{x∈Ω} (f(x) + ⟨z∗0, G(x)⟩)    (8.16)

Furthermore, the infimum in (8.11) is achieved at x0 ∈ Ω with G(x0) ≤ ϑZ if, and only if, the infimum in (8.16) is achieved at x0 ∈ Ω with G(x0) ≤ ϑZ and ⟨z∗0, G(x0)⟩ = 0.

Proof By Assumption 8.56, G(x1) < ϑZ. Then, −G(x1) ∈ P° and G(x1) + P ⊆ Γ. By Proposition 7.16, Γ° ⊇ G(x1) + P° and ϑZ ∈ Γ°. ∀z ∈ G(x1) + P, we have z ≥ G(x1) and, by Fact 8.52, ω(z) ≤ ω(G(x1)). Since G(x1) < ϑZ, then ∃δ ∈ (0, ∞) ⊂ R such that BZ(ϑZ, δ) ⊆ G(x1) + P. Take r0 := ω(G(x1)) + δ ∈ R. It is easy to check that B_{R×Z}((r0, ϑZ), δ) ⊆ [ω, Γ]. Hence, (r0, ϑZ) ∈ [ω, Γ]°. By Proposition 8.22, ω is continuous at ϑZ. By Facts 8.51 and 8.52 and Proposition 8.34, we have

μ0 = ω(ϑZ) = max_{z∗∈Γconj} −ωconj(z∗)

Then, by Facts 8.53 and 8.54, we have

μ0 = max_{z∗∈P⊖} −ω̄(z∗) = max_{z∗≥ϑZ∗} −ω̄(−z∗) = max_{z∗≥ϑZ∗} ϕ(z∗)

Therefore, (8.15) holds and the maximum is achieved at z∗0 ∈ P⊕.

If the infimum in (8.11) is achieved at x0 ∈ Ω with G(x0) ≤ ϑZ, then we have μ0 = f(x0) ≥ f(x0) + ⟨z∗0, G(x0)⟩ ≥ inf_{x∈Ω} (f(x) + ⟨z∗0, G(x)⟩) = μ0. Hence, the infimum in (8.16) is achieved at x0 and ⟨z∗0, G(x0)⟩ = 0. On the other hand, if the infimum in (8.16) is achieved at x0 ∈ Ω with G(x0) ≤ ϑZ and ⟨z∗0, G(x0)⟩ = 0, then μ0 = f(x0) + ⟨z∗0, G(x0)⟩ = f(x0). Hence, the infimum in (8.11) is achieved at x0. This completes the proof of the proposition. □

In Propositions 8.57 and 8.58, z∗0 is called the Lagrange multiplier. Assumption 8.50 guarantees that the primal functional is real-valued and convex. Assumption 8.56 guarantees the existence of a Lagrange multiplier. This assumption is restrictive. On the other hand, Assumption 8.55 guarantees the duality but not the existence of a Lagrange multiplier. This condition is more relaxed.

Corollary 8.59 Let X be a real vector space, Ω ⊆ X be nonempty and convex, f : Ω → R be a convex functional, Z be a real normed linear space with the positive cone P ⊆ Z, P° ≠ ∅, and G : Ω → Z be a convex mapping. Let Assumptions 8.50 and 8.56 hold and x0 ∈ Ω with G(x0) ≤ ϑZ achieve the infimum in (8.11). Then, there exists z∗0 ∈ Z∗ with z∗0 ≥ ϑZ∗ such that the Lagrangian L : Ω × P⊕ → R defined by

L(x, z∗) := f(x) + ⟨z∗, G(x)⟩;  ∀x ∈ Ω, ∀z∗ ∈ P⊕

admits a saddle point at (x0, z∗0), i.e.,

L(x0, z∗) ≤ L(x0, z∗0) ≤ L(x, z∗0);  ∀x ∈ Ω, ∀z∗ ∈ P⊕

Proof By Proposition 8.58, there exists z∗0 ∈ Z∗ with z∗0 ≥ ϑZ∗ such that L(x0, z∗0) ≤ L(x, z∗0), ∀x ∈ Ω, and ⟨z∗0, G(x0)⟩ = 0. Then, ∀z∗ ∈ P⊕, we have L(x0, z∗) − L(x0, z∗0) = ⟨z∗, G(x0)⟩ − ⟨z∗0, G(x0)⟩ = ⟨z∗, G(x0)⟩ ≤ 0. Hence, the saddle-point condition holds. This completes the proof of the corollary. □

Next, we present two sufficiency results on Lagrange multipliers.

Proposition 8.60 Let X be a real vector space, Ω ⊆ X be nonempty, f : Ω → R, Z be a real normed linear space with the positive cone P ⊆ Z, and G : Ω → Z. Assume that there exist x0 ∈ Ω and z∗0 ∈ Z∗ with z∗0 ≥ ϑZ∗ such that

f(x0) + ⟨z∗0, G(x0)⟩ ≤ f(x) + ⟨z∗0, G(x)⟩;  ∀x ∈ Ω

Then,

f(x0) = inf_{x∈Ω, G(x)≤G(x0)} f(x)

Proof ∀x ∈ Ω with G(x) ≤ G(x0), we have ⟨z∗0, G(x)⟩ ≤ ⟨z∗0, G(x0)⟩, since z∗0 ∈ P⊕. By the assumption of the proposition, f(x0) + ⟨z∗0, G(x0)⟩ ≤ f(x) + ⟨z∗0, G(x)⟩. Then, f(x0) ≤ f(x). This completes the proof of the proposition. □

Proposition 8.61 Let X be a real vector space, Ω ⊆ X be nonempty, f : Ω → R, Z be a real normed linear space with the positive cone P ⊆ Z, and G : Ω → Z. Assume that there exist x0 ∈ Ω and z∗0 ∈ Z∗ with z∗0 ≥ ϑZ∗ such that the Lagrangian L : Ω × P⊕ → R given by

L(x, z∗) = f(x) + ⟨z∗, G(x)⟩;  ∀x ∈ Ω, ∀z∗ ∈ P⊕

admits a saddle point at (x0, z∗0), i.e.,

L(x0, z∗) ≤ L(x0, z∗0) ≤ L(x, z∗0);  ∀x ∈ Ω, ∀z∗ ∈ P⊕

Then, G(x0) ≤ ϑZ and

f(x0) = L(x0, z∗0) = inf_{x∈Ω, G(x)≤ϑZ} f(x)

Proof By the first inequality in the saddle-point condition, we have

⟨z∗, G(x0)⟩ ≤ ⟨z∗0, G(x0)⟩;  ∀z∗ ∈ P⊕

∀z∗ ∈ P⊕, we have z∗ ≥ ϑZ∗ and z∗ + z∗0 ≥ ϑZ∗. Then, ⟨z∗ + z∗0, G(x0)⟩ ≤ ⟨z∗0, G(x0)⟩ and ⟨z∗, G(x0)⟩ ≤ 0. By Proposition 8.46, G(x0) ∈ (−P) and G(x0) ≤ ϑZ. Furthermore, 0 = ⟨ϑZ∗, G(x0)⟩ ≤ ⟨z∗0, G(x0)⟩ ≤ 0 implies that ⟨z∗0, G(x0)⟩ = 0.

∀x ∈ Ω with G(x) ≤ ϑZ, we have f(x) ≥ f(x) + ⟨z∗0, G(x)⟩ ≥ f(x0) + ⟨z∗0, G(x0)⟩ = f(x0). This completes the proof of the proposition. □
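The duality, multiplier, and complementary-slackness statements above can be sketched in a hypothetical one-dimensional instance: X = Z = R, Ω = R, P = [0, ∞), f(x) = x², G(x) = 1 − x, so that the constraint G(x) ≤ 0 means x ≥ 1 and μ0 = 1 at x0 = 1. All concrete names and grids below are assumptions for the illustration.

```python
import numpy as np

def phi(z_star):
    # Dual functional: phi(z*) = inf_x (x^2 + z*(1 - x)); the minimizer is
    # x = z*/2, giving the closed form z* - z*^2/4.
    return z_star - z_star**2 / 4.0

z_grid = np.linspace(0.0, 4.0, 4001)   # z* >= 0, i.e. z* in P+
duals = phi(z_grid)
z0 = z_grid[duals.argmax()]            # Lagrange multiplier z*0 (approx. 2)
x0 = z0 / 2.0                          # achieves the infimum in (8.16) (approx. 1)
print(z0, duals.max())                 # dual optimum approx. (2, 1) = (z*0, mu0)
print(x0, z0 * (1.0 - x0))             # complementary slackness: <z*0, G(x0)> approx. 0
```

Here the dual maximum equals the primal value μ0 = 1 (no duality gap, as in Proposition 8.58, since G(1.5) = −0.5 < 0 satisfies Assumption 8.56), and ⟨z∗0, G(x0)⟩ = 0 as required.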

Next, we present a result on the sensitivity of the infimization problem.

Proposition 8.62 Let X be a real vector space, Ω ⊆ X be nonempty, f : Ω → R, Z be a real normed linear space with the positive cone P ⊆ Z, and G : Ω → Z. Let zi ∈ Z, μi = inf_{x∈Ω, G(x)≤zi} f(x), i = 0, 1. Let z∗0 ∈ P⊕ ⊆ Z∗ be the Lagrange multiplier associated with μ0, that is,

μ0 = inf_{x∈Ω} (f(x) + ⟨z∗0, G(x) − z0⟩)

Assume that μ0 ∈ R. Then, we have

μ1 − μ0 ≥ ⟨−z∗0, z1 − z0⟩

Proof ∀x ∈ Ω with G(x) ≤ z1, we have μ0 ≤ f(x) + ⟨z∗0, G(x) − z0⟩, which implies that

f(x) ≥ μ0 − ⟨z∗0, G(x) − z0⟩ ≥ μ0 − ⟨z∗0, z1 − z0⟩

where the second inequality holds since G(x) − z1 ≤ ϑZ and z∗0 ∈ P⊕ yield ⟨z∗0, G(x) − z0⟩ ≤ ⟨z∗0, z1 − z0⟩. Hence, μ1 ≥ μ0 + ⟨−z∗0, z1 − z0⟩. This completes the proof of the proposition. □
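The sensitivity bound can be checked on the same assumed one-dimensional problem: f(x) = x², G(x) = 1 − x, where the constraint G(x) ≤ z means x ≥ 1 − z, so μ(z) = max(1 − z, 0)², μ0 = μ(0) = 1, and the multiplier at z0 = 0 is z∗0 = 2.

```python
import numpy as np

def mu(z):
    # Primal value of inf x^2 subject to 1 - x <= z, i.e. x >= 1 - z.
    return max(1.0 - z, 0.0) ** 2

mu0, z_star0 = mu(0.0), 2.0
for z1 in np.linspace(-1.0, 2.0, 31):
    # Proposition 8.62: mu1 - mu0 >= <-z*0, z1 - z0>, here -2 * z1.
    assert mu(z1) - mu0 >= -z_star0 * (z1 - 0.0) - 1e-12
print("sensitivity bound holds on the grid")
```

Geometrically, −z∗0 is a subgradient of the convex primal functional μ(·) at z0, so the bound is the supporting-hyperplane inequality.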

Chapter 9

Differentiation in Banach Spaces

In this chapter, we are going to develop the concept of derivative in normed linear spaces.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. Z. Pan, Measure-Theoretic Calculus in Abstract Spaces, https://doi.org/10.1007/978-3-031-21912-2_9

9.1 Fundamental Notion

Definition 9.1 Let X be a normed linear space over the field K, D ⊆ X, and x0 ∈ D̄. u ∈ X is said to be an admissible deviation in D at x0 if ∀ε ∈ (0, ∞) ⊂ R, we have {x0 + ru̅ | r ∈ (0, ε) ⊂ R, u̅ ∈ B(u, ε)} ∩ D ≠ ∅. Let AD(x0) be the set of admissible deviations in D at x0.

Clearly, if x0 ∈ D°, then AD(x0) = X. Another fact is that when D1 ⊆ D2 ⊆ X and x0 ∈ D̄1, then AD1(x0) ⊆ AD2(x0). Yet another fact is that when x0 ∈ D̄1, D1, D2 ⊆ X, and ∃δ ∈ (0, ∞) ⊂ R such that D1 ∩ B(x0, δ) = D2 ∩ B(x0, δ), then AD1(x0) = AD2(x0).

Proposition 9.2 Let X be a normed linear space over the field K, D ⊆ X, and x0 ∈ D̄. Then, AD(x0) ⊆ X is a closed cone.

Proof Clearly, if x0 ∈ D, then ϑ ∈ AD(x0). On the other hand, if x0 ∈ D̄ \ D, then, by Proposition 4.13, x0 is an accumulation point of D. ∀ε ∈ (0, ∞) ⊂ R, ∃x̄ ∈ (B(x0, ε²) ∩ D) \ {x0}. Let v := x̄ − x0. Then, δ := √‖v‖ ∈ (0, ε) ⊂ R. Let v̄ := δ⁻¹v. Then, v̄ ∈ B(ϑ, ε) and x̄ = x0 + δv̄ ∈ {x0 + ru̅ | r ∈ (0, ε) ⊂ R, u̅ ∈ B(ϑ, ε)} ∩ D ≠ ∅. Hence, ϑ ∈ AD(x0). Therefore, ϑ ∈ AD(x0) whenever x0 ∈ D̄.

∀u ∈ AD(x0), ∀α ∈ [0, ∞) ⊂ R, we will show that αu ∈ AD(x0) by distinguishing two exhaustive and mutually exclusive cases: Case 1: α = 0; Case 2: α > 0. Case 1: α = 0. Then, αu = ϑ ∈ AD(x0). Case 2: α > 0. ∀ε ∈ (0, ∞) ⊂ R, let ε̄ = min{εα, ε/α} ∈ (0, ∞) ⊂ R. By u ∈ AD(x0), ∃x̄ ∈ {x0 + ru̅ | r ∈ (0, ε̄) ⊂ R, u̅ ∈ B(u, ε̄)} ∩ D. Hence, x̄ ∈ D and x̄ = x0 + ru̅ = x0 + (r/α)(αu̅) with r ∈ (0, ε̄) ⊂ R and u̅ ∈ B(u, ε̄). Then, r/α ∈ (0, ε) ⊂ R and αu̅ ∈ B(αu, ε). Hence, x̄ ∈ {x0 + r̃ũ | r̃ ∈ (0, ε) ⊂ R, ũ ∈ B(αu, ε)} ∩ D ≠ ∅. This implies that αu ∈ AD(x0). Hence, AD(x0) is a cone.

∀u in the closure of AD(x0), ∀ε ∈ (0, ∞) ⊂ R, by Proposition 3.3, ∃u1 ∈ AD(x0) ∩ B(u, ε/2). By u1 ∈ AD(x0), ∃x̄ ∈ {x0 + ru̅ | r ∈ (0, ε/2) ⊂ R, u̅ ∈ B(u1, ε/2)} ∩ D. Then, x̄ ∈ {x0 + ru̅ | r ∈ (0, ε) ⊂ R, u̅ ∈ B(u, ε)} ∩ D ≠ ∅. Hence, u ∈ AD(x0). Then, the closure of AD(x0) is contained in AD(x0), and AD(x0) is closed. This completes the proof of the proposition. □

Definition 9.3 Let X and Y be normed linear spaces over K, D ⊆ X, f : D → Y, and x0 ∈ D. Assume that span(AD(x0)) = X. Let L ∈ B(X, Y) be such that ∀ε ∈ (0, ∞) ⊂ R, ∃δ ∈ (0, ∞) ⊂ R, ∀x ∈ D ∩ BX(x0, δ), we have

‖f(x) − f(x0) − L(x − x0)‖ ≤ ε‖x − x0‖

Then, L is called the (Fréchet) derivative of f at x0 and denoted by f⁽¹⁾(x0) or Df(x0). When L exists, we will say that f is (Fréchet) differentiable at x0. Df or f⁽¹⁾ will denote the B(X, Y)-valued function whose domain of definition is dom f⁽¹⁾ := {x ∈ D | Df(x) ∈ B(X, Y) exists}. If f is differentiable at x0, ∀x0 ∈ D, we say f is (Fréchet) differentiable. In this case, Df : D → B(X, Y) or f⁽¹⁾ : D → B(X, Y).

Clearly, when X = Y = R and D = [a, b] ⊆ X with a < b, Df(t) is simply the derivative of f at t ∈ [a, b], as we know before.

Definition 9.4 Let X and Y be normed linear spaces over K, D ⊆ X, f : D → Y, x0 ∈ D, and u ∈ AD(x0). Let MD(x0) := {(r, u̅) ∈ R × X | r ∈ (0, +∞) ⊂ R, u̅ ∈ X, x0 + ru̅ ∈ D}, and ḡ : MD(x0) → Y be given by ḡ(r, u̅) = r⁻¹(f(x0 + ru̅) − f(x0)), ∀(r, u̅) ∈ MD(x0). Clearly, (0, u) is an accumulation point of MD(x0) since u ∈ AD(x0). The directional derivative of f at x0 along u, denoted by Df(x0; u), is the limit lim_{(r,u̅)→(0,u)} ḡ(r, u̅), when it exists.

Clearly, the directional derivative is unique when it exists, since Y is Hausdorff.

Proposition 9.5 Let X and Y be normed linear spaces over K, D ⊆ X, f : D → Y, x0 ∈ D, span(AD(x0)) = X, L ∈ B(X, Y) be the Fréchet derivative of f at x0, and u ∈ AD(x0). Then, Df(x0; u) = Lu.

Proof ∀ε ∈ (0, ∞) ⊂ R, let ε̄ = ε/(2 + 2‖u‖) ∈ (0, ∞) ⊂ R. By Df(x0) = L, ∃δ ∈ (0, ∞) ⊂ R such that ‖f(x) − f(x0) − L(x − x0)‖ ≤ ε̄‖x − x0‖, ∀x ∈ BX(x0, δ) ∩ D. Let MD(x0) and ḡ : MD(x0) → Y be as defined in Definition 9.4. Let δ̄ = min{ε/(2 + 2‖L‖), δ/(1 + ‖u‖), 1} ∈ (0, ∞) ⊂ R. ∀(r, u̅) ∈ MD(x0) ∩ (B_{R×X}((0, u), δ̄) \ {(0, u)}), we have x̄ := x0 + ru̅ ∈ D ∩ BX(x0, δ). This implies that

‖ḡ(r, u̅) − Lu‖ = ‖f(x0 + ru̅) − f(x0) − rLu‖/r
= ‖f(x̄) − f(x0) − L(x̄ − x0) + rL(u̅ − u)‖/r
≤ ‖f(x̄) − f(x0) − L(x̄ − x0)‖/r + ‖L(u̅ − u)‖
≤ ε̄‖x̄ − x0‖/r + ‖L‖‖u̅ − u‖ = ε̄‖u̅‖ + ‖L‖‖u̅ − u‖
≤ ε̄(‖u‖ + δ̄) + ‖L‖δ̄ ≤ ε

Hence, Df(x0; u) = Lu. This completes the proof of the proposition. □

9.2 The Derivatives of Some Common Functions

Proposition 9.14 Let X be a normed linear space over K and f : X × X → X be given by f(x1, x2) = x1 + x2, ∀(x1, x2) ∈ X × X. Then, f is Fréchet differentiable and Df(x1, x2)(h1, h2) = h1 + h2, ∀(h1, h2) ∈ X × X. In "matrix" notation, we have Df(x1, x2) = [idX idX].

Proof ∀(x1, x2) ∈ X × X, let L : X × X → X be given by L(h1, h2) = h1 + h2, ∀(h1, h2) ∈ X × X. Clearly, L is a linear operator. Note that

‖L‖ = sup_{(h1,h2)∈X×X, ‖(h1,h2)‖≤1} ‖L(h1, h2)‖ ≤ sup_{(h1,h2)∈X×X, ‖(h1,h2)‖≤1} (‖h1‖ + ‖h2‖)
≤ sup_{(h1,h2)∈X×X, ‖(h1,h2)‖≤1} √2 (‖h1‖² + ‖h2‖²)^{1/2} ≤ √2

where the second inequality follows from the Cauchy–Schwarz inequality. Hence, L ∈ B(X × X, X). Clearly, A_{X×X}(x1, x2) = X × X since (x1, x2) ∈ (X × X)°. ∀ε ∈ (0, ∞) ⊂ R, set δ = 1 ∈ R. ∀(x̄1, x̄2) ∈ (X × X) ∩ B_{X×X}((x1, x2), δ), we have ‖f(x̄1, x̄2) − f(x1, x2) − L(x̄1 − x1, x̄2 − x2)‖ = 0 ≤ ε‖(x̄1 − x1, x̄2 − x2)‖. By Definition 9.3, Df(x1, x2) = L. This completes the proof of the proposition. □

Proposition 9.15 Let X and Y be normed linear spaces over K, D ⊆ X, f1 : D → Y, f2 : D → Y, x0 ∈ D, α1, α2 ∈ K, and g : D → Y be given by g(x) = α1f1(x) + α2f2(x), ∀x ∈ D. Assume that f1⁽¹⁾(x0) and f2⁽¹⁾(x0) exist. Then, g is Fréchet differentiable at x0 and g⁽¹⁾(x0) = α1f1⁽¹⁾(x0) + α2f2⁽¹⁾(x0).

Proof Define L := α1f1⁽¹⁾(x0) + α2f2⁽¹⁾(x0) ∈ B(X, Y). By assumption, span(AD(x0)) = X. ∀ε ∈ (0, ∞) ⊂ R, by the differentiability of f1 at x0, ∃δ1 ∈ (0, ∞) ⊂ R such that ∀x ∈ D ∩ BX(x0, δ1), we have ‖f1(x) − f1(x0) − f1⁽¹⁾(x0)(x − x0)‖ ≤ ε‖x − x0‖. By the differentiability of f2 at x0, ∃δ2 ∈ (0, ∞) ⊂ R such that ∀x ∈ D ∩ BX(x0, δ2), we have ‖f2(x) − f2(x0) − f2⁽¹⁾(x0)(x − x0)‖ ≤ ε‖x − x0‖. Let δ := min{δ1, δ2} > 0. ∀x ∈ D ∩ BX(x0, δ), we have

‖g(x) − g(x0) − L(x − x0)‖
= ‖α1(f1(x) − f1(x0) − f1⁽¹⁾(x0)(x − x0)) + α2(f2(x) − f2(x0) − f2⁽¹⁾(x0)(x − x0))‖
≤ (|α1| + |α2|)ε‖x − x0‖

Hence, g⁽¹⁾(x0) = L. This completes the proof of the proposition. □

Proposition 9.16 Let X be a normed linear space over K and f : K × X → X be given by f(α, x) = αx, ∀(α, x) ∈ K × X. Then, f is Fréchet differentiable and Df : K × X → B(K × X, X) is given by Df(α, x)(d, h) = αh + dx, ∀(α, x) ∈ K × X, ∀(d, h) ∈ K × X. Thus, Df(α, x) = [x  α idX] in "matrix" notation.

Proof ∀(α, x) ∈ K × X, let L : K × X → X be given by L(d, h) = αh + dx, ∀(d, h) ∈ K × X. Clearly, L is a linear operator. Note that

‖L‖ = sup_{(d,h)∈K×X, ‖(d,h)‖≤1} ‖L(d, h)‖ ≤ sup_{(d,h)∈K×X, ‖(d,h)‖≤1} (|α|‖h‖ + |d|‖x‖)
≤ sup_{(d,h)∈K×X, ‖(d,h)‖≤1} (|α|² + ‖x‖²)^{1/2} (|d|² + ‖h‖²)^{1/2} ≤ ‖(α, x)‖ < +∞

where the second inequality follows from the Cauchy–Schwarz inequality. Hence, L ∈ B(K × X, X). Clearly, A_{K×X}(α, x) = K × X since (α, x) ∈ (K × X)°. ∀ε ∈ (0, ∞) ⊂ R, set δ = 2ε ∈ (0, ∞) ⊂ R. ∀(ᾱ, x̄) ∈ (K × X) ∩ B_{K×X}((α, x), δ), we have

‖f(ᾱ, x̄) − f(α, x) − L(ᾱ − α, x̄ − x)‖ = ‖ᾱx̄ − αx − α(x̄ − x) − (ᾱ − α)x‖
= ‖(ᾱ − α)(x̄ − x)‖ ≤ (1/2)(|ᾱ − α|² + ‖x̄ − x‖²) = (1/2)‖(ᾱ − α, x̄ − x)‖²
≤ ε‖(ᾱ − α, x̄ − x)‖

By Definition 9.3, Df(α, x) = L. This completes the proof of the proposition. □

In the previous proposition, we have abused the "matrix" notation by identifying xd with dx. To state the next proposition, we will introduce a new notation. Let A ∈ B(X, Y) and B ∈ B(Y, Z), where X, Y, and Z are normed linear spaces over K. Clearly, f(A, B) := BA ∈ B(X, Z). Let g(A) : B(Y, Z) → B(X, Z) be given by g(A)(B) = BA, ∀B ∈ B(Y, Z). Clearly, g(A) is a bounded linear operator. It is easy to see that g : B(X, Y) → B(B(Y, Z), B(X, Z)) is a bounded linear operator with ‖g‖ ≤ 1. This operator g is needed for a compact "matrix" notation for many linear operators. We will use ro to denote g for any normed linear spaces X, Y, and Z. The meaning of ro is "right operate." For x ∈ X, we will identify X with B(R, X) and write ro(x)(A) = Ax. This brings us to the next proposition.

Proposition 9.17 Let X and Y be normed linear spaces over K and f : B(X, Y) × X → Y be given by f(A, x) = Ax, ∀(A, x) ∈ B(X, Y) × X. Then, f is Fréchet differentiable, and Df : B(X, Y) × X → B(B(X, Y) × X, Y) is given by Df(A, x)(Δ, h) = Ah + Δx, ∀(A, x) ∈ B(X, Y) × X, ∀(Δ, h) ∈ B(X, Y) × X. In "matrix" notation, Df(A, x) = [ro(x)  A].

Proof ∀(A, x) ∈ B(X, Y) × X, let L : B(X, Y) × X → Y be given by L(Δ, h) = Ah + Δx, ∀(Δ, h) ∈ B(X, Y) × X. Clearly, L is a linear operator. Note that

‖L‖ = sup_{(Δ,h)∈B(X,Y)×X, ‖(Δ,h)‖≤1} ‖L(Δ, h)‖ ≤ sup_{(Δ,h), ‖(Δ,h)‖≤1} (‖A‖‖h‖ + ‖Δ‖‖x‖)
≤ sup_{(Δ,h), ‖(Δ,h)‖≤1} (‖A‖² + ‖x‖²)^{1/2} (‖Δ‖² + ‖h‖²)^{1/2} ≤ ‖(A, x)‖ < +∞

where the first inequality follows from Proposition 7.64, and the second inequality follows from the Cauchy–Schwarz inequality. Hence, L ∈ B(B(X, Y) × X, Y). Clearly, A_{B(X,Y)×X}(A, x) = B(X, Y) × X since (A, x) ∈ (B(X, Y) × X)° = B(X, Y) × X. ∀ε ∈ (0, ∞) ⊂ R, set δ = 2ε ∈ (0, ∞) ⊂ R. ∀(Ā, x̄) ∈ (B(X, Y) × X) ∩ B_{B(X,Y)×X}((A, x), δ), we have

‖f(Ā, x̄) − f(A, x) − L(Ā − A, x̄ − x)‖ = ‖Āx̄ − Ax − A(x̄ − x) − (Ā − A)x‖
= ‖(Ā − A)(x̄ − x)‖ ≤ (1/2)(‖Ā − A‖² + ‖x̄ − x‖²) = (1/2)‖(Ā − A, x̄ − x)‖²
≤ ε‖(Ā − A, x̄ − x)‖

where the first inequality follows from Proposition 7.64. By Definition 9.3, Df(A, x) = L. This completes the proof of the proposition. □
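Proposition 9.17 can be checked numerically in an assumed finite-dimensional concretization with X = R³ and Y = R²: the remainder f(A + Δ, x + h) − f(A, x) − (Ah + Δx) equals Δh, which vanishes faster than the step size, exactly as Definition 9.3 requires.

```python
import numpy as np

rng = np.random.default_rng(0)
A, x = rng.standard_normal((2, 3)), rng.standard_normal(3)

ratios = []
for t in [1e-2, 1e-4, 1e-6]:
    Delta, h = t * rng.standard_normal((2, 3)), t * rng.standard_normal(3)
    # remainder of the first-order expansion; algebraically equal to Delta @ h
    remainder = (A + Delta) @ (x + h) - A @ x - (A @ h + Delta @ x)
    step = np.sqrt(np.linalg.norm(Delta) ** 2 + np.linalg.norm(h) ** 2)
    ratios.append(np.linalg.norm(remainder) / step)
print(ratios)  # the ratio ||remainder|| / ||(Delta, h)|| shrinks like O(t)
```

The quadratic smallness of the term (Ā − A)(x̄ − x) is precisely what the ε-δ estimate in the proof exploits.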

9.3 Chain Rule and Mean Value Theorem

Theorem 9.18 (Chain Rule) Let X, Y, and Z be normed linear spaces over K, Dx ⊆ X, Dy ⊆ Y, f : Dx → Dy, g : Dy → Z, x0 ∈ Dx, y0 := f(x0) ∈ Dy, and h := g ◦ f : Dx → Z. Assume that f is Fréchet differentiable at x0 with Df(x0) ∈ B(X, Y) and g is Fréchet differentiable at y0 with Dg(y0) ∈ B(Y, Z). Then, h is differentiable at x0 and Dh(x0) ∈ B(X, Z) is given by

Dh(x0) = Dg(f(x0)) ◦ Df(x0) = Dg(y0)Df(x0)

Proof Define L : X → Z by L(x̄) = Dg(y0)Df(x0)x̄ = Dg(y0)(Df(x0)(x̄)), ∀x̄ ∈ X. Clearly, L is a linear operator and, by Proposition 7.64, L ∈ B(X, Z). ∀ε ∈ (0, ∞) ⊂ R, let

ε̄ = (√(4ε + (‖Df(x0)‖ + ‖Dg(y0)‖)²) − ‖Df(x0)‖ − ‖Dg(y0)‖)/2 ∈ (0, ∞) ⊂ R

By the differentiability of g at y0, ∃δ1 ∈ (0, ∞) ⊂ R such that ∀y ∈ Dy ∩ BY(y0, δ1), we have

‖g(y) − g(y0) − Dg(y0)(y − y0)‖ ≤ ε̄‖y − y0‖

By the differentiability of f at x0, we have span(A_{Dx}(x0)) = X, and ∃δ2 ∈ (0, δ1/(ε̄ + ‖Df(x0)‖)] ⊂ R such that ∀x ∈ Dx ∩ BX(x0, δ2), we have

‖f(x) − f(x0) − Df(x0)(x − x0)‖ ≤ ε̄‖x − x0‖

Then, by Proposition 7.64, ‖f(x) − y0‖ = ‖f(x) − f(x0)‖ ≤ ε̄‖x − x0‖ + ‖Df(x0)‖‖x − x0‖ ≤ (ε̄ + ‖Df(x0)‖)‖x − x0‖ < δ1. This implies that f(x) ∈ Dy ∩ BY(y0, δ1). Then, we have

‖h(x) − h(x0) − L(x − x0)‖ = ‖g(f(x)) − g(y0) − L(x − x0)‖
≤ ‖g(f(x)) − g(y0) − Dg(y0)(f(x) − y0)‖ + ‖Dg(y0)(f(x) − f(x0) − Df(x0)(x − x0))‖
≤ ε̄‖f(x) − y0‖ + ‖Dg(y0)‖‖f(x) − f(x0) − Df(x0)(x − x0)‖
≤ ε̄(ε̄ + ‖Df(x0)‖)‖x − x0‖ + ε̄‖Dg(y0)‖‖x − x0‖
= ε̄(ε̄ + ‖Df(x0)‖ + ‖Dg(y0)‖)‖x − x0‖ = ε‖x − x0‖

where the second inequality follows from Proposition 7.64. By Definition 9.3, Dh(x0) = L. This completes the proof of the theorem. □

Proposition 9.19 Let X, Y, and Z be normed linear spaces over K, D ⊆ X, f1 : D → Y, f2 : D → Z, x0 ∈ D, and g : D → Y × Z be given by g(x) = (f1(x), f2(x)), ∀x ∈ D. Then, the following statement holds: g is Fréchet differentiable at x0 if, and only if, f1 and f2 are Fréchet differentiable at x0. In this case, Dg(x0)(h) = (Df1(x0)(h), Df2(x0)(h)), ∀h ∈ X. In "matrix" notation, Dg(x0) = [Df1(x0); Df2(x0)].

Proof "Sufficiency" By the differentiability of f1 and f2 at x0, we have span(AD(x0)) = X. Define L : X → Y × Z by L(h) = (Df1(x0)(h), Df2(x0)(h)), ∀h ∈ X. Clearly, L is a linear operator. Note that

‖L‖ = sup_{h∈X, ‖h‖≤1} ‖L(h)‖ = sup_{h∈X, ‖h‖≤1} √(‖Df1(x0)(h)‖² + ‖Df2(x0)(h)‖²)
≤ sup_{h∈X, ‖h‖≤1} √(‖Df1(x0)‖²‖h‖² + ‖Df2(x0)‖²‖h‖²)
≤ √(‖Df1(x0)‖² + ‖Df2(x0)‖²) < +∞

where the first inequality follows from Proposition 7.64, and the last inequality follows from the fact that Df1(x0) ∈ B(X, Y) and Df2(x0) ∈ B(X, Z). Hence, L ∈ B(X, Y × Z). Since f1 and f2 are differentiable at x0, ∀ε ∈ (0, ∞) ⊂ R, ∃δ1 ∈ (0, ∞) ⊂ R such that ∀x ∈ D ∩ BX(x0, δ1), we have ‖f1(x) − f1(x0) − Df1(x0)(x − x0)‖ ≤ (ε/√2)‖x − x0‖; ∃δ2 ∈ (0, ∞) ⊂ R such that ∀x ∈ D ∩ BX(x0, δ2), we have ‖f2(x) − f2(x0) − Df2(x0)(x − x0)‖ ≤ (ε/√2)‖x − x0‖. Let δ = min{δ1, δ2} ∈ (0, ∞) ⊂ R. ∀x ∈ D ∩ BX(x0, δ), we have

‖g(x) − g(x0) − L(x − x0)‖
= ‖(f1(x) − f1(x0) − Df1(x0)(x − x0), f2(x) − f2(x0) − Df2(x0)(x − x0))‖
= (‖f1(x) − f1(x0) − Df1(x0)(x − x0)‖² + ‖f2(x) − f2(x0) − Df2(x0)(x − x0)‖²)^{1/2}
≤ ε‖x − x0‖

Hence, g is differentiable at x0 and Dg(x0) = L.

"Necessity" By the differentiability of g at x0, we have span(AD(x0)) = X. Note that f1 = πY ◦ g. By the chain rule and Proposition 9.13, f1 is Fréchet differentiable at x0 and

Df1(x0) = [idY  ϑ_{B(Z,Y)}] Dg(x0)

By symmetry, f2 is Fréchet differentiable at x0 and

Df2(x0) = [ϑ_{B(Y,Z)}  idZ] Dg(x0)

Then, Dg(x0) = [Df1(x0); Df2(x0)]. This completes the proof of the proposition. □

Theorem 9.20 (Mean Value Theorem) Let X be a real normed linear space, D ⊆ X, f : D → R, x1, x2 ∈ D, and ϕ : [0, 1] → D be given by ϕ(t) = tx1 + (1 − t)x2, ∀t ∈ I := [0, 1] ⊂ R. Assume that f is continuous at ϕ(t), ∀t ∈ I, and f is Fréchet differentiable at ϕ(t), ∀t ∈ I◦. Then, there exists t0 ∈ I◦ such that

  f(x1) − f(x2) = Df(ϕ(t0))(x1 − x2).

Proof By Propositions 9.19, 9.16, and 9.15 and chain rule, ϕ is Fréchet differentiable and Dϕ(t)(h) = h(x1 − x2 ), ∀h ∈ R, ∀t ∈ I . Define g : I → R by g(t) = f (ϕ(t)), ∀t ∈ I . By chain rule, g is Fréchet differentiable at t, ∀t ∈ I ◦ , and Dg(t)(h) = Df (ϕ(t))(Dϕ(t)(h)) = hDf (ϕ(t))(x1 − x2 ), ∀h ∈ R, ∀t ∈ I ◦ . Then, Dg(t) = Df (ϕ(t))(x1 − x2 ), ∀t ∈ I ◦ . By Proposition 3.12, g is continuous. By


9 Differentiation in Banach Spaces

mean value theorem (Bartle, 1976, see Theorem 27.6), we have

  f(x1) − f(x2) = g(1) − g(0) = Dg(t0)(1 − 0) = Df(t0x1 + (1 − t0)x2)(x1 − x2)

for some t0 ∈ I◦. This completes the proof of the theorem. ∎
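For concreteness, the following worked instance of Theorem 9.20 may help (the illustration is ours and is not part of the original text):

```latex
% X = \mathbb{R}^2 with the Euclidean norm, f(x) = x_1^2 + x_2^2,
% x_1 = (1,0), x_2 = (0,0), so \varphi(t) = (t,0) and g(t) = f(\varphi(t)) = t^2.
f(x_1) - f(x_2) = 1, \qquad Df(\varphi(t))(x_1 - x_2) = 2t .
% The conclusion f(x_1) - f(x_2) = Df(\varphi(t_0))(x_1 - x_2) forces
% 2 t_0 = 1, i.e. t_0 = \tfrac{1}{2} \in I^\circ, as the theorem guarantees.
```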

Toward a general mean value theorem for vector-valued functions on possibly complex normed linear spaces, we present the following two lemmas.

Lemma 9.21 Let D := {a + i0 | a ∈ I := [0, 1] ⊂ R} ⊂ C, x0 := xr + i xi ∈ D, where xr ∈ I and xi = 0, f : D → C be Fréchet differentiable at x0, Df(x0) = dr + i di ∈ C, where dr, di ∈ R, and g : I → R be given by g(t) = Re(f(t + i0)), ∀t ∈ I. Then, g is Fréchet differentiable at xr and Dg(xr) = dr = Re(Df(x0)) ∈ R.

Proof Note that span(AI(xr)) = R. ∀ε ∈ (0, ∞) ⊂ R, by the differentiability of f at x0, ∃δ ∈ (0, ∞) ⊂ R such that ∀x̄ = x̄r + i x̄i ∈ D ∩ BC(x0, δ), we have |f(x̄) − f(x0) − Df(x0)(x̄ − x0)| ≤ ε|x̄ − x0|. Note that x̄r ∈ I and x̄i = 0. Then, the above implies that |(Re(f(x̄r)) − Re(f(xr)) − (x̄r − xr)dr) + i(Im(f(x̄r)) − Im(f(xr)) − (x̄r − xr)di)| ≤ ε|x̄r − xr|. This further implies that |Re(f(x̄r)) − Re(f(xr)) − (x̄r − xr)dr| ≤ ε|x̄r − xr|. Then, ∀t ∈ I ∩ BR(xr, δ), x̄ := t + i0 ∈ D ∩ BC(x0, δ) and |g(t) − g(xr) − (t − xr)dr| ≤ ε|t − xr|. Hence, Dg(xr) = dr. This completes the proof of the lemma. ∎

Lemma 9.22 Let D := {a + i0 | a ∈ I := [0, 1] ⊂ R} ⊂ C, f : D → C be continuous, and f be Fréchet differentiable at a + i0, ∀a ∈ I◦. Then, ∃t0 ∈ I◦ such that Re(f(1) − f(0)) = Re(Df(t0)).

Proof Let g : I → R be given by g(t) = Re(f(t + i0)), ∀t ∈ I. Clearly, g is continuous since f is continuous. By Lemma 9.21, g is Fréchet differentiable at t, ∀t ∈ I◦, and Dg(t) = Re(Df(t + i0)), ∀t ∈ I◦. By mean value theorem (Bartle, 1976, see Theorem 27.6), ∃t0 ∈ I◦ such that g(1) − g(0) = Dg(t0) = Re(Df(t0 + i0)). Then, Re(f(1) − f(0)) = g(1) − g(0) = Re(Df(t0 + i0)) = Re(Df(t0)). This completes the proof of the lemma. ∎

Theorem 9.23 (Mean Value Theorem) Let X and Y be normed linear spaces over K, D ⊆ X, f : D → Y, x1, x2 ∈ D, and the line segment connecting x1 and x2 be contained in D.
Assume that f is continuous at x = tx1 + (1 − t)x2, ∀t ∈ I := [0, 1] ⊂ R, and Df(x) ∈ B(X, Y) exists at x = tx1 + (1 − t)x2, ∀t ∈ I◦. Then, ∃t0 ∈ I◦ such that

  ‖f(x1) − f(x2)‖ ≤ ‖Df(t0x1 + (1 − t0)x2)(x1 − x2)‖.

Proof We will distinguish three exhaustive and mutually exclusive cases: Case 1: f(x1) = f(x2); Case 2: f(x1) ≠ f(x2) and K = R; Case 3: f(x1) ≠ f(x2) and K = C.
Case 1: f(x1) = f(x2). Take t0 to be any point in I◦. The desired result follows.
Case 2: f(x1) ≠ f(x2) and K = R. By Proposition 7.85, ∃y∗ ∈ Y∗ with ‖y∗‖ = 1 such that ⟨⟨y∗, f(x1) − f(x2)⟩⟩ = ‖f(x1) − f(x2)‖. Define ϕ : I → D by



ϕ(t) = tx1 + (1 − t)x2, ∀t ∈ I. By Propositions 9.19, 9.16, and 9.15 and chain rule, ϕ is Fréchet differentiable. Define g : I → R by g(t) = ⟨⟨y∗, f(ϕ(t))⟩⟩, ∀t ∈ I. By Proposition 3.12, g is continuous. By chain rule and Propositions 9.17 and 9.19, g is Fréchet differentiable at t, ∀t ∈ I◦, and Dg(t)(d) = ⟨⟨y∗, Df(ϕ(t))(Dϕ(t)(d))⟩⟩ = ⟨⟨y∗, Df(ϕ(t))(d(x1 − x2))⟩⟩ = ⟨⟨y∗, Df(ϕ(t))(x1 − x2)⟩⟩ d, ∀d ∈ R and ∀t ∈ I◦. Hence, Dg(t) = ⟨⟨y∗, Df(ϕ(t))(x1 − x2)⟩⟩, ∀t ∈ I◦. By mean value theorem (Bartle, 1976, see Theorem 27.6), there exists t0 ∈ I◦ such that g(1) − g(0) = Dg(t0). Then, we have ‖f(x1) − f(x2)‖ = ⟨⟨y∗, f(x1) − f(x2)⟩⟩ = g(1) − g(0) = Dg(t0) ≤ |Dg(t0)| = |⟨⟨y∗, Df(ϕ(t0))(x1 − x2)⟩⟩| ≤ ‖Df(ϕ(t0))(x1 − x2)‖, where the last inequality follows from Proposition 7.72. The desired result follows.
Case 3: f(x1) ≠ f(x2) and K = C. By Proposition 7.85, ∃y∗ ∈ Y∗ with ‖y∗‖ = 1 such that ⟨⟨y∗, f(x1) − f(x2)⟩⟩ = ‖f(x1) − f(x2)‖. Let D̄ := {a + i0 | a ∈ I} ⊂ C. Define ϕ : D̄ → D by ϕ(t) = tx1 + (1 − t)x2, ∀t ∈ D̄. By Propositions 9.19, 9.16, and 9.15 and chain rule, ϕ is Fréchet differentiable and Dϕ(t)(d) = d(x1 − x2), ∀d ∈ C and ∀t ∈ D̄. Define g : D̄ → C by g(t) = ⟨⟨y∗, f(ϕ(t))⟩⟩, ∀t ∈ D̄. By Proposition 3.12, g is continuous. By chain rule and Propositions 9.17 and 9.19, g is Fréchet differentiable at a + i0, ∀a ∈ I◦, and Dg(a + i0)(d) = ⟨⟨y∗, Df(ϕ(a + i0))(Dϕ(a + i0)(d))⟩⟩ = ⟨⟨y∗, Df(ϕ(a))(d(x1 − x2))⟩⟩ = ⟨⟨y∗, Df(ϕ(a))(x1 − x2)⟩⟩ d, ∀d ∈ C and ∀a ∈ I◦. Hence, Dg(a + i0) = ⟨⟨y∗, Df(ϕ(a + i0))(x1 − x2)⟩⟩, ∀a ∈ I◦. Note that g(1) − g(0) = ⟨⟨y∗, f(x1)⟩⟩ − ⟨⟨y∗, f(x2)⟩⟩ = ‖f(x1) − f(x2)‖ ∈ R. By Lemma 9.22, there exists t0 ∈ I◦ such that g(1) − g(0) = Re(Dg(t0)). Then, we have ‖f(x1) − f(x2)‖ = g(1) − g(0) ≤ |Dg(t0)| = |⟨⟨y∗, Df(ϕ(t0))(x1 − x2)⟩⟩| ≤ ‖Df(ϕ(t0))(x1 − x2)‖, where the last inequality follows from Proposition 7.72. The desired result follows.
This completes the proof of the theorem.
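The equality form of Theorem 9.20 genuinely fails for vector-valued maps, which is why Theorem 9.23 asserts only a norm inequality. A standard example (ours, not from the original text):

```latex
% f : [0,1] \to \mathbb{R}^2, \quad f(t) = (\cos 2\pi t, \sin 2\pi t).
f(1) - f(0) = (0,0), \qquad
Df(t)(1) = 2\pi(-\sin 2\pi t, \cos 2\pi t) \neq \vartheta_{\mathbb{R}^2}
\quad \forall t \in (0,1).
% No t_0 satisfies f(1) - f(0) = Df(t_0)(1 - 0), yet the inequality of
% Theorem 9.23 holds trivially: \|f(1) - f(0)\| = 0 \le \|Df(t_0)(1)\| = 2\pi.
```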
Proposition 9.24 Let X, Y, and Z be normed linear spaces over K, D ⊆ X × Y, f : D → Z, and (x0, y0) ∈ D. Assume that the following conditions hold:

(i) ∃δ0 ∈ (0, ∞) ⊂ R such that ∀(x, y) ∈ D ∩ BX×Y((x0, y0), δ0), we have (x, ty + (1 − t)y0) ∈ D, ∀t ∈ I := [0, 1] ⊂ R.
(ii) f is partial differentiable with respect to x at (x0, y0) and (∂f/∂x)(x0, y0) ∈ B(X, Z).
(iii) ∀(x, y) ∈ D ∩ BX×Y((x0, y0), δ0), f is partial differentiable with respect to y at (x, y), and ∂f/∂y is continuous at (x0, y0).

Then, f is Fréchet differentiable at (x0, y0), and Df(x0, y0) ∈ B(X × Y, Z) is given by Df(x0, y0)(h, k) = (∂f/∂x)(x0, y0)(h) + (∂f/∂y)(x0, y0)(k), ∀(h, k) ∈ X × Y. In "matrix" notation, Df(x0, y0) = [ (∂f/∂x)(x0, y0)  (∂f/∂y)(x0, y0) ].
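A minimal finite-dimensional instance of Proposition 9.24 (our illustration, not from the original text): take X = Y = Z = R and f(x, y) = xy, so both partial derivatives exist and ∂f/∂y is continuous everywhere.

```latex
\frac{\partial f}{\partial x}(x_0,y_0)(h) = y_0 h, \qquad
\frac{\partial f}{\partial y}(x_0,y_0)(k) = x_0 k, \qquad
Df(x_0,y_0)(h,k) = y_0 h + x_0 k .
% Direct check of Fr\'echet differentiability: the remainder is
% f(x_0+h, y_0+k) - f(x_0,y_0) - (y_0 h + x_0 k) = hk = o(\|(h,k)\|).
```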



Proof We will first show that span(AD(x0, y0)) = X × Y. Define Dx0 := {y ∈ Y | (x0, y) ∈ D} and Dy0 := {x ∈ X | (x, y0) ∈ D}. By the partial differentiability of f with respect to x at (x0, y0), we have span(ADy0(x0)) = X. By the partial differentiability of f with respect to y at (x0, y0), we have span(ADx0(y0)) = Y. ∀u ∈ ADy0(x0), ∀ε ∈ (0, ∞) ⊂ R, ∃x̄ := x0 + rū ∈ Dy0 with 0 < r < ε and ū ∈ BX(u, ε). Then, (x̄, y0) = (x0, y0) + r(ū, ϑY) ∈ D and (ū, ϑY) ∈ BX×Y((u, ϑY), ε). Hence, (u, ϑY) ∈ AD(x0, y0). Then, ADy0(x0) × {ϑY} ⊆ AD(x0, y0), which implies that X × {ϑY} ⊆ span(AD(x0, y0)). By symmetry, we have {ϑX} × Y ⊆ span(AD(x0, y0)). By Proposition 7.17, span(AD(x0, y0)) = X × Y.
Define L : X × Y → Z by L(h, k) = (∂f/∂x)(x0, y0)(h) + (∂f/∂y)(x0, y0)(k), ∀(h, k) ∈ X × Y. Clearly, L is a linear operator. Note that

  ‖L‖ = sup_{(h,k)∈X×Y, ‖(h,k)‖≤1} ‖L(h, k)‖
      ≤ sup_{(h,k)∈X×Y, ‖(h,k)‖≤1} (‖(∂f/∂x)(x0, y0)‖‖h‖ + ‖(∂f/∂y)(x0, y0)‖‖k‖)
      ≤ sup_{(h,k)∈X×Y, ‖(h,k)‖≤1} (‖(∂f/∂x)(x0, y0)‖² + ‖(∂f/∂y)(x0, y0)‖²)^{1/2} (‖h‖² + ‖k‖²)^{1/2}
      ≤ (‖(∂f/∂x)(x0, y0)‖² + ‖(∂f/∂y)(x0, y0)‖²)^{1/2} < +∞

where the first inequality follows from Proposition 7.64 and the second inequality follows from the Cauchy–Schwarz inequality. Hence, L ∈ B(X × Y, Z).
∀ε ∈ (0, ∞) ⊂ R, by the partial differentiability of f with respect to x at (x0, y0), ∃δ1 ∈ (0, δ0] ⊂ R such that ∀(x, y) ∈ D ∩ BX×Y((x0, y0), δ1), we have ‖f(x, y0) − f(x0, y0) − (∂f/∂x)(x0, y0)(x − x0)‖ ≤ (ε/√2)‖x − x0‖. By the continuity of ∂f/∂y at (x0, y0), ∃δ2 ∈ (0, δ1] ⊂ R such that ∀(x̄, ȳ) ∈ D ∩ BX×Y((x0, y0), δ2), we have ‖(∂f/∂y)(x̄, ȳ) − (∂f/∂y)(x0, y0)‖ < ε/√2.

Claim 9.24.1 ‖f(x, y) − f(x, y0) − (∂f/∂y)(x0, y0)(y − y0)‖ ≤ (ε/√2)‖y − y0‖, ∀(x, y) ∈ D ∩ BX×Y((x0, y0), δ2).

Proof of Claim Fix any (x, y) ∈ D ∩ BX×Y((x0, y0), δ2). Let D̄ = I if K = R; or D̄ = {a + i0 | a ∈ I} ⊂ C if K = C. Define ψ : D̄ → Z by ψ(t) = f(x, ty + (1 − t)y0) − (∂f/∂y)(x0, y0)(t(y − y0)), ∀t ∈ D̄. By chain rule, each term in the definition of ψ is Fréchet differentiable. By Proposition 9.15, ψ is Fréchet differentiable. By



mean value theorem, ∃t0 ∈ I◦,

  ‖f(x, y) − f(x, y0) − (∂f/∂y)(x0, y0)(y − y0)‖ = ‖ψ(1) − ψ(0)‖ ≤ ‖Dψ(t0)‖
    = ‖(∂f/∂y)(x, t0y + (1 − t0)y0)(y − y0) − (∂f/∂y)(x0, y0)(y − y0)‖
    ≤ ‖(∂f/∂y)(x, t0y + (1 − t0)y0) − (∂f/∂y)(x0, y0)‖‖y − y0‖ ≤ (ε/√2)‖y − y0‖

where the second inequality follows from Proposition 7.64. This completes the proof of the claim. ∎
Therefore, ∀(x, y) ∈ D ∩ BX×Y((x0, y0), δ2), we have

  ‖f(x, y) − f(x0, y0) − L(x − x0, y − y0)‖
    ≤ ‖f(x, y) − f(x, y0) − (∂f/∂y)(x0, y0)(y − y0)‖ + ‖f(x, y0) − f(x0, y0) − (∂f/∂x)(x0, y0)(x − x0)‖
    ≤ (ε/√2)‖y − y0‖ + (ε/√2)‖x − x0‖ ≤ ε‖(x − x0, y − y0)‖

where the last inequality follows from the Cauchy–Schwarz inequality. Hence, Df(x0, y0) = L. This completes the proof of the proposition. ∎
We observe that Condition (i) of Proposition 9.24 is easily satisfied when (x0, y0) ∈ D◦.

9.4 Higher Order Derivatives

9.4.1 Basic Concept

We introduce the following notation. Let X and Y be normed linear spaces over K. Denote B(X, Y) by B1(X, Y). Recursively, denote B(X, Bk(X, Y)) by Bk+1(X, Y), ∀k ∈ N. Note that Bk(X, Y) is the set of bounded multi-linear Y-valued functions on X^k, ∀k ∈ N. Define the subset of symmetric functions by BSk(X, Y) := {L ∈ Bk(X, Y) | L(hk)⋯(h1) = L(vk)⋯(v1), ∀(h1, …, hk) ∈ X^k, ∀(v1, …, vk) = a permutation of (h1, …, hk)}. Note that BSk(X, Y) is a closed subspace of Bk(X, Y). Then, by Proposition 7.13, BSk(X, Y) is a normed linear space over K. If Y is a Banach space, then, by Proposition 7.66, Bk(X, Y) is a Banach space. Then, by Proposition 4.39, BSk(X, Y) is a Banach space. For notational consistency, we will denote BS0(X, Y) := B0(X, Y) := Y.



Definition 9.25 Let X and Y be normed linear spaces over K, D ⊆ X, f : D → Y, and x0 ∈ D. Let f^(1) be defined with domain of definition dom(f^(1)). We may consider the derivative of f^(1). If f^(1) is differentiable at x0 ∈ dom(f^(1)), then f is said to be twice Fréchet differentiable at x0. The second order derivative of f at x0 is D(Df)(x0) =: D²f(x0) =: f^(2)(x0) ∈ B(X, B(X, Y)) = B2(X, Y). D²f or f^(2) will denote the B2(X, Y)-valued function whose domain of definition is dom(f^(2)) := {x ∈ dom(f^(1)) | f^(2)(x) ∈ B2(X, Y) exists}. Recursively, if f^(k) is Fréchet differentiable at x0 ∈ dom(f^(k)), then f is said to be (k + 1)-times Fréchet differentiable at x0, and the (k + 1)th order derivative of f at x0 is Df^(k)(x0) =: D^(k+1)f(x0) =: f^(k+1)(x0) ∈ Bk+1(X, Y), where k ∈ N. D^(k+1)f or f^(k+1) will denote the Bk+1(X, Y)-valued function whose domain of definition is dom(f^(k+1)) := {x ∈ dom(f^(k)) | f^(k+1)(x) ∈ Bk+1(X, Y) exists}. For notational consistency, we will let f^(0) = f. %

Note that dom(f^(k+1)) ⊆ dom(f^(k)) ⊆ D, ∀k ∈ N.

Definition 9.26 Let X and Y be normed linear spaces over K, D ⊆ X, f : D → Y, and x0 ∈ D. Assume that ∃δ0 ∈ (0, ∞) ⊂ R such that f is k-times Fréchet differentiable at x, ∀x ∈ D ∩ BX(x0, δ0), where k ∈ N, that is, D ∩ BX(x0, δ0) ⊆ dom(f^(k)), and f^(k) is continuous at x0. Then, we say that f is Ck at x0. If f is Ck at x, ∀x ∈ D, then we say f is Ck. If f is Ck at x, ∀x ∈ D ∩ BX(x0, δ0), ∀k ∈ N, then we say that f is C∞ at x0. If f is C∞ at x, ∀x ∈ D, then we say that f is C∞. %

Note that f being (k + 1)-times differentiable at x0 ∈ D does not imply that f is Ck at x0, since dom(f^(k)) may not contain D ∩ BX(x0, δ), ∀δ ∈ (0, ∞) ⊂ R. When dom(f^(k)) ⊇ D ∩ BX(x0, δ0) for some δ0 ∈ (0, ∞) ⊂ R and f is (k + 1)-times differentiable at x0, then f is Ck at x0. In particular, if f is Ck+1 at x0, then f is Ck at x, ∀x ∈ D ∩ BX(x0, δ0), for some δ0 ∈ (0, ∞) ⊂ R. If f is infinitely many times differentiable at x, ∀x ∈ D, then f is C∞.
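As a simple illustration of these definitions (ours, not part of the original text), consider a quadratic form on X = Rⁿ:

```latex
% f(x) = \langle x, A x \rangle with A \in \mathbb{R}^{n \times n}.
f^{(1)}(x)(h) = \langle (A + A^{\mathsf T})x, h \rangle, \qquad
f^{(2)}(x)(h)(k) = \langle (A + A^{\mathsf T})h, k \rangle, \qquad
f^{(k)} \equiv \vartheta \ \ (k \ge 3).
% Here dom(f^{(k)}) = \mathbb{R}^n for every k and each f^{(k)} is continuous,
% so f is C^\infty; note that f^{(2)}(x) \in BS_2(\mathbb{R}^n, \mathbb{R})
% since A + A^{\mathsf T} is symmetric.
```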

Proposition 9.27 Let X and Y be normed linear spaces over K, D ⊆ X, and f : D → Y be Cn+m−1 at x0 and (n + m)-times Fréchet differentiable at x0 ∈ D, where n, m ∈ N. Fix (h1, …, hn) ∈ X^n. Define the function g : dom(f^(n)) → Y by g(x) = f^(n)(x)(hn)⋯(h1), ∀x ∈ Dn := dom(f^(n)). Then, the following statements hold:

(i) g is m-times Fréchet differentiable at x0, and g^(m)(x0) ∈ Bm(X, Y) is given by g^(m)(x0)(hn+m)⋯(hn+1) = f^(n+m)(x0)(hn+m)⋯(h1), ∀(hn+1, …, hn+m) ∈ X^m.
(ii) If f is Cn+m at x0, then g is Cm at x0.

Proof We will first prove (i) using mathematical induction on m.
1◦ m = 1. Since f is Cn at x0, then ∃δ0 ∈ (0, ∞) ⊂ R such that D ∩ BX(x0, δ0) ⊆ Dn. By the (n + 1)-times differentiability of f at x0, we have span(ADn(x0)) = X. Define L : X → Y by, ∀h ∈ X, L(h) = f^(n+1)(x0)(h)(hn)⋯(h1). Clearly, L is a



linear operator. Note that

  ‖L‖ = sup_{h∈X, ‖h‖≤1} ‖L(h)‖ = sup_{h∈X, ‖h‖≤1} ‖f^(n+1)(x0)(h)(hn)⋯(h1)‖
      ≤ sup_{h∈X, ‖h‖≤1} ‖f^(n+1)(x0)‖‖h‖‖hn‖⋯‖h1‖ ≤ ‖f^(n+1)(x0)‖‖hn‖⋯‖h1‖ < +∞

where the first inequality follows from Proposition 7.64. Hence, L ∈ B(X, Y). ∀ε ∈ (0, ∞) ⊂ R, by the differentiability of f^(n) at x0, ∃δ ∈ (0, δ0] ⊂ R such that ∀x ∈ Dn ∩ BX(x0, δ), we have ‖f^(n)(x) − f^(n)(x0) − f^(n+1)(x0)(x − x0)‖ ≤ (ε/(1 + ‖hn‖⋯‖h1‖))‖x − x0‖. Then, we have

  ‖g(x) − g(x0) − L(x − x0)‖
    = ‖f^(n)(x)(hn)⋯(h1) − f^(n)(x0)(hn)⋯(h1) − f^(n+1)(x0)(x − x0)(hn)⋯(h1)‖
    = ‖(f^(n)(x) − f^(n)(x0) − f^(n+1)(x0)(x − x0))(hn)⋯(h1)‖
    ≤ ‖f^(n)(x) − f^(n)(x0) − f^(n+1)(x0)(x − x0)‖‖hn‖⋯‖h1‖ ≤ ε‖x − x0‖

where the first inequality follows from Proposition 7.64. Hence, Dg(x0) = L and g is Fréchet differentiable at x0.
2◦ Assume that (i) holds for m ≤ k, where k ∈ N.
3◦ Consider the case m = k + 1. Since f is Cn+k at x0, then ∃δ0 ∈ (0, ∞) ⊂ R such that f is (n + k)-times Fréchet differentiable at x̄ and is Cn+k−1 at x̄, ∀x̄ ∈ D ∩ BX(x0, δ0). By inductive assumption, g is k-times Fréchet differentiable at x̄ and g^(k)(x̄)(hn+k)⋯(hn+1) = f^(n+k)(x̄)(hn+k)⋯(h1), ∀(hn+1, …, hn+k) ∈ X^k. Hence, x̄ ∈ D̄n+k := dom(g^(k)). Then, we have D ∩ BX(x0, δ0) ⊆ D̄n+k ⊆ Dn ⊆ D. Note that D̄n+k ∩ BX(x0, δ0) = D ∩ BX(x0, δ0). Then, span(AD̄n+k(x0)) = span(AD(x0)) = X, since f is differentiable at x0. Define L : X → Bk(X, Y) by, ∀h ∈ X, ∀(hn+1, …, hn+k) ∈ X^k, L(h)(hn+k)⋯(hn+1) = f^(n+k+1)(x0)(h)(hn+k)⋯(h1). Clearly, L is a linear operator. Note that

  ‖L‖ = sup_{h∈X, ‖h‖≤1} ‖L(h)‖
      = sup_{h∈X, ‖h‖≤1, hn+i∈X, ‖hn+i‖≤1, i=1,…,k} ‖L(h)(hn+k)⋯(hn+1)‖
      = sup_{h∈X, ‖h‖≤1, hn+i∈X, ‖hn+i‖≤1, i=1,…,k} ‖f^(n+k+1)(x0)(h)(hn+k)⋯(h1)‖
      ≤ ‖f^(n+k+1)(x0)‖‖hn‖⋯‖h1‖ < +∞

where the first inequality follows from Proposition 7.64. Then, L ∈ Bk+1(X, Y). ∀ε ∈ (0, ∞) ⊂ R, by the differentiability of f^(n+k) at x0, ∃δ ∈ (0, δ0] ⊂ R such that ∀x ∈ dom(f^(n+k)) ∩ BX(x0, δ) = D ∩ BX(x0, δ), we have ‖f^(n+k)(x) − f^(n+k)(x0) − f^(n+k+1)(x0)(x − x0)‖ ≤ (ε/(1 + ‖hn‖⋯‖h1‖))‖x − x0‖. Then, ∀x ∈ D̄n+k ∩ BX(x0, δ) = D ∩ BX(x0, δ),

  ‖g^(k)(x) − g^(k)(x0) − L(x − x0)‖
    = sup_{hn+i∈X, ‖hn+i‖≤1, i=1,…,k} ‖(g^(k)(x) − g^(k)(x0) − L(x − x0))(hn+k)⋯(hn+1)‖
    = sup_{hn+i∈X, ‖hn+i‖≤1, i=1,…,k} ‖f^(n+k)(x)(hn+k)⋯(h1) − f^(n+k)(x0)(hn+k)⋯(h1) − f^(n+k+1)(x0)(x − x0)(hn+k)⋯(h1)‖
    ≤ ‖f^(n+k)(x) − f^(n+k)(x0) − f^(n+k+1)(x0)(x − x0)‖‖hn‖⋯‖h1‖ ≤ ε‖x − x0‖

where the first inequality follows from Proposition 7.64. Hence, g^(k+1)(x0) = Dg^(k)(x0) = L. Therefore, g is (k + 1)-times differentiable at x0. This completes the induction process and the proof of (i).
For (ii), let f be Cn+m at x0. By Definition 9.26, ∃δ0 ∈ (0, ∞) ⊂ R such that D ∩ BX(x0, δ0) ⊆ Dn+m := dom(f^(n+m)) ⊆ D and f^(n+m) is continuous at x0. ∀x ∈ Dn+m ∩ BX(x0, δ0) = D ∩ BX(x0, δ0), f is Cn+m−1 at x and (n + m)-times differentiable at x. By (i), g is m-times Fréchet differentiable at x, and ∀(hn+1, …, hn+m) ∈ X^m, we have g^(m)(x)(hn+m)⋯(hn+1) = f^(n+m)(x)(hn+m)⋯(h1). Then, D ∩ BX(x0, δ0) ⊆ dom(g^(m)) ⊆ Dn ⊆ D. ∀ε ∈ (0, ∞) ⊂ R, by the continuity of f^(n+m) at x0, ∃δ ∈ (0, δ0] ⊂ R such that ‖f^(n+m)(x) − f^(n+m)(x0)‖ < ε/(1 + ‖hn‖⋯‖h1‖), ∀x ∈ Dn+m ∩ BX(x0, δ) = D ∩ BX(x0, δ). ∀x ∈ dom(g^(m)) ∩ BX(x0, δ) = D ∩ BX(x0, δ), we have

  ‖g^(m)(x) − g^(m)(x0)‖
    = sup_{hn+i∈X, ‖hn+i‖≤1, i=1,…,m} ‖(g^(m)(x) − g^(m)(x0))(hn+m)⋯(hn+1)‖
    = sup_{hn+i∈X, ‖hn+i‖≤1, i=1,…,m} ‖f^(n+m)(x)(hn+m)⋯(h1) − f^(n+m)(x0)(hn+m)⋯(h1)‖
    ≤ ‖f^(n+m)(x) − f^(n+m)(x0)‖‖hn‖⋯‖h1‖ < ε

where the first inequality follows from Proposition 7.64. Hence, g^(m) is continuous at x0. Therefore, g is Cm at x0. This completes the proof of the proposition. ∎

Proposition 9.28 Let X and Y be normed linear spaces over K, D ⊆ X, x0 ∈ D, and f : D → Y be Cn at x0, where n ∈ N. Assume that D is locally convex at x0. Then, f^(n)(x0) ∈ BSn(X, Y).

Proof By the assumption, ∃δ0 ∈ (0, ∞) ⊂ R such that D ∩ BX(x0, δ0) is convex. Without loss of generality, assume f is n-times differentiable at x, ∀x ∈ D ∩ BX(x0, δ0). We will prove the proposition by mathematical induction on n.
1◦ n = 1. Clearly, f^(1)(x0) ∈ B(X, Y) = B1(X, Y) = BS1(X, Y).
Next, we consider n = 2. We will prove this case using an argument of contradiction. Suppose f^(2)(x0) is not symmetric. Then, ∃h̄0, l̄0 ∈ X such that f^(2)(x0)(h̄0)(l̄0) ≠ f^(2)(x0)(l̄0)(h̄0). By the differentiability of f at x0, we have span(AD(x0)) = X. Then, by f^(2)(x0) ∈ B2(X, Y) and Proposition 3.56, ∃h̃0, l̃0 ∈ span(AD(x0)) such that f^(2)(x0)(h̃0)(l̃0) ≠ f^(2)(x0)(l̃0)(h̃0). Since f^(2)(x0) is multi-linear, then ∃ĥ0, l̂0 ∈ AD(x0) such that f^(2)(x0)(ĥ0)(l̂0) ≠ f^(2)(x0)(l̂0)(ĥ0). By continuity of f^(2)(x0), ∃ε1 ∈ (0, ∞) ⊂ R such that ∀ȟ ∈ BX(ĥ0, ε1), ∀ľ ∈ BX(l̂0, ε1), we have f^(2)(x0)(ȟ)(ľ) ≠ f^(2)(x0)(ľ)(ȟ). By ĥ0, l̂0 ∈ AD(x0), ∃rh, rl ∈ (0, ε1) ⊂ R, ∃ȟ0 ∈ BX(ĥ0, ε1), ∃ľ0 ∈ BX(l̂0, ε1), such that x0 + rhȟ0, x0 + rlľ0 ∈ D ∩ BX(x0, δ0). Clearly, we have f^(2)(x0)(rhȟ0)(rlľ0) = rhrl f^(2)(x0)(ȟ0)(ľ0) ≠ rhrl f^(2)(x0)(ľ0)(ȟ0) = f^(2)(x0)(rlľ0)(rhȟ0). Let h0 := (rh/2)ȟ0 and l0 := (rl/2)ľ0. Then, by the convexity of the set D ∩ BX(x0, δ0), we have x0, x0 + h0, x0 + l0, x0 + h0 + l0 ∈ D ∩ BX(x0, δ0), and f^(2)(x0)(h0)(l0) ≠ f^(2)(x0)(l0)(h0). Clearly, h0 ≠ ϑX and l0 ≠ ϑX. Let ε0 := ‖f^(2)(x0)(h0)(l0) − f^(2)(x0)(l0)(h0)‖/(‖h0‖‖l0‖) ∈ (0, ∞) ⊂ R. Since f is C2 at x0, then ∃δ1 ∈ (0, δ0] ⊂ R such that ∀x ∈ D ∩ BX(x0, δ1), we have ‖f^(2)(x) − f^(2)(x0)‖ < ε0/2. By proper scaling of h0 and l0, we may assume that ∀t1, t2 ∈ I := [0, 1] ⊂ R, x0 + t1h0 + t2l0 ∈ D ∩ BX(x0, δ1).
In summary, ∃h0, l0 ∈ X \ {ϑX} such that ε0 := ‖f^(2)(x0)(h0)(l0) − f^(2)(x0)(l0)(h0)‖/(‖h0‖‖l0‖) ∈ (0, ∞) ⊂ R, and ∀t1, t2 ∈ I, f is twice differentiable at x0 + t1h0 + t2l0 ∈ D ∩ BX(x0, δ0) and ‖f^(2)(x0 + t1h0 + t2l0) − f^(2)(x0)‖ < ε0/2.



Let D̄ := I if K = R; or D̄ := {a + i0 | a ∈ I} ⊂ C if K = C. ∀t1 ∈ I, define ψt1 : D̄ → B(X, Y) by ψt1(t2) = f^(1)(x0 + t1h0 + t2l0) − f^(1)(x0 + t1h0) − f^(2)(x0)(t2l0), ∀t2 ∈ D̄. By Propositions 9.19, 9.16, and 9.15 and chain rule, each term in the definition of ψt1 is Fréchet differentiable. Then, by Proposition 9.15, ψt1 is Fréchet differentiable at t2, ∀t2 ∈ D̄. By mean value theorem, ∃t̄2 ∈ I◦ such that

  ‖f^(1)(x0 + t1h0 + l0) − f^(1)(x0 + t1h0) − f^(2)(x0)(l0)‖
    = ‖ψt1(1) − ψt1(0)‖ ≤ ‖Dψt1(t̄2)‖
    = ‖f^(2)(x0 + t1h0 + t̄2l0)(l0) − f^(2)(x0)(l0)‖
    ≤ ‖f^(2)(x0 + t1h0 + t̄2l0) − f^(2)(x0)‖‖l0‖ < ε0‖l0‖/2

where the second inequality follows from Proposition 7.64.
Define γ : D̄ → Y by γ(t1) = f(x0 + l0 + t1h0) − f(x0 + t1h0) − f^(2)(x0)(l0)(t1h0), ∀t1 ∈ D̄. By Propositions 9.19, 9.16, and 9.15 and chain rule, each term in the definition of γ is Fréchet differentiable. Then, by Proposition 9.15, γ is Fréchet differentiable at t1, ∀t1 ∈ D̄. By mean value theorem, ∃t̄1 ∈ I◦ such that

  ‖f(x0 + l0 + h0) − f(x0 + h0) − f(x0 + l0) + f(x0) − f^(2)(x0)(l0)(h0)‖
    = ‖γ(1) − γ(0)‖ ≤ ‖Dγ(t̄1)‖
    = ‖f^(1)(x0 + l0 + t̄1h0)(h0) − f^(1)(x0 + t̄1h0)(h0) − f^(2)(x0)(l0)(h0)‖
    ≤ ‖f^(1)(x0 + l0 + t̄1h0) − f^(1)(x0 + t̄1h0) − f^(2)(x0)(l0)‖‖h0‖
    = ‖ψt̄1(1) − ψt̄1(0)‖‖h0‖ < ε0‖l0‖‖h0‖/2

where the second inequality follows from Proposition 7.64. By symmetry, we have ‖f(x0 + h0 + l0) − f(x0 + l0) − f(x0 + h0) + f(x0) − f^(2)(x0)(h0)(l0)‖ < ε0‖h0‖‖l0‖/2. Then, ‖f^(2)(x0)(h0)(l0) − f^(2)(x0)(l0)(h0)‖ < ε0‖h0‖‖l0‖. This leads to the contradiction ε0 := ‖f^(2)(x0)(h0)(l0) − f^(2)(x0)(l0)(h0)‖/(‖h0‖‖l0‖) < ε0. Hence, f^(2)(x0) must be symmetric and f^(2)(x0) ∈ BS2(X, Y).
2◦ Assume that the result holds ∀n ≤ k, k ∈ {2, 3, …}.
3◦ Consider the case n = k + 1. ∀(h1, …, hk+1) ∈ X^(k+1), let (v1, …, vk+1) be a permutation of (h1, …, hk+1). We need to show that

  f^(k+1)(x0)(hk+1)⋯(h1) = f^(k+1)(x0)(vk+1)⋯(v1)



Since any permutation can be arrived at in a finite number of steps by interchanging two consecutive elements, all we need to show is that, ∀i = 1, …, k,

  f^(k+1)(x0)(hk+1)⋯(hi+1)(hi)⋯(h1) = f^(k+1)(x0)(hk+1)⋯(hi)(hi+1)⋯(h1)

We will distinguish two exhaustive and mutually exclusive cases: Case 1: 1 ≤ i < k; Case 2: i = k.
Case 1: 1 ≤ i < k. Define g : dom(f^(k)) → Y by g(x) = f^(k)(x)(hk)⋯(hi+1)(hi)⋯(h1), ∀x ∈ dom(f^(k)). ∀x ∈ D ∩ BX(x0, δ0) = dom(f^(k+1)) ∩ BX(x0, δ0), f is (k + 1)-times differentiable at x, and dom(f^(k)) ⊇ D ∩ BX(x, δx), where δx := δ0 − ‖x − x0‖ ∈ (0, ∞) ⊂ R. Then, f is Ck at x. Clearly, D ∩ BX(x, δx) = (D ∩ BX(x0, δ0)) ∩ BX(x, δx) is convex. By inductive assumption, g(x) = f^(k)(x)(hk)⋯(hi)(hi+1)⋯(h1). By Proposition 9.27, g is C1 at x0 and f^(k+1)(x0)(hk+1)⋯(hi+1)(hi)⋯(h1) = g^(1)(x0)(hk+1) = f^(k+1)(x0)(hk+1)⋯(hi)(hi+1)⋯(h1). This case is proved.
Case 2: i = k. Define g : dom(f^(k−1)) → Y by, ∀x ∈ dom(f^(k−1)), g(x) = f^(k−1)(x)(hk−1)⋯(h1). Then, by Proposition 9.27, g is C2 at x0 and g^(2)(x0)(u)(v) = f^(k+1)(x0)(u)(v)(hk−1)⋯(h1), ∀u, v ∈ X. Note that dom(f^(k−1)) ∩ BX(x0, δ0) = D ∩ BX(x0, δ0) is convex. By the case of n = 2, we have f^(k+1)(x0)(hk+1)(hk)⋯(h1) = g^(2)(x0)(hk+1)(hk) = g^(2)(x0)(hk)(hk+1) = f^(k+1)(x0)(hk)(hk+1)⋯(h1). This case is proved.
Hence, f^(k+1)(x0) ∈ BSk+1(X, Y). This completes the induction process and the proof of the proposition. ∎

Definition 9.29 Let X1, …, Xp, and Y be normed linear spaces over K, where p ∈ {2, 3, …}, D ⊆ X := ∏_{i=1}^{p} Xi, f : D → Y, and xo := (x1o, …, xpo) ∈ D. Assume that ∂f/∂xi1 : dom(∂f/∂xi1) → B(Xi1, Y) is partial differentiable with respect to xi2 at xo ∈ dom(∂f/∂xi1), where i1, i2 ∈ {1, …, p}. Then, this partial derivative is denoted by (∂²f/∂xi2∂xi1)(xo) ∈ B(Xi2, B(Xi1, Y)), which is one of the second order partial derivatives of f. We will let ∂²f/∂xi2∂xi1 denote the B(Xi2, B(Xi1, Y))-valued function whose domain of definition is dom(∂²f/∂xi2∂xi1) := {x ∈ dom(∂f/∂xi1) | (∂²f/∂xi2∂xi1)(x) ∈ B(Xi2, B(Xi1, Y)) exists}. There are a total of p² second order partial derivatives of f. Recursively, we may define kth order partial derivatives, k ∈ {3, 4, …}. There are a total of p^k kth order partial derivatives. A typical such derivative is denoted by ∂^k f/∂xik⋯∂xi1, where i1, …, ik ∈ {1, …, p}. %

Let X be a normed linear space over K and D ⊆ X. Define Sm(D) := ⋃_{α∈K} αD. Clearly, Sm(D) ⊆ span(D).
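The inclusion Sm(D) ⊆ span(D) is strict in general; a two-point example (ours, not from the original text):

```latex
% In X = \mathbb{R}^2 take D = \{e_1, e_2\}. Then
S_m(D) = \{\alpha e_1 \mid \alpha \in \mathbb{R}\} \cup \{\alpha e_2 \mid \alpha \in \mathbb{R}\},
\qquad \operatorname{span}(D) = \mathbb{R}^2 ,
% so S_m(D) is the union of the two coordinate axes,
% a proper subset of \operatorname{span}(D).
```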



Proposition 9.30 Let X be a normed linear space over K, D ⊆ X, and x0 ∈ D. Then, span(AD(x0)) ⊆ cl(span((D ∩ B(x0, δ)) − x0)) and AD(x0) ⊆ cl(Sm((D ∩ B(x0, δ)) − x0)), ∀δ ∈ (0, ∞) ⊂ R, where cl(·) denotes closure.

Proof ∀u ∈ AD(x0), we will show that u ∈ cl(Sm(D − x0)). ∀ε ∈ (0, ∞) ⊂ R, by u ∈ AD(x0), we have ∃x̄ := x0 + rū ∈ D such that 0 < r < ε and ū ∈ B(u, ε). Then, ū ∈ Sm(D − x0) ∩ B(u, ε) ≠ ∅. Hence, by Proposition 3.3, u ∈ cl(Sm(D − x0)). By the arbitrariness of u, we have AD(x0) ⊆ cl(Sm(D − x0)) ⊆ cl(span(D − x0)). By Proposition 7.17 and Definition 3.2, span(AD(x0)) ⊆ cl(span(D − x0)).
∀δ ∈ (0, ∞) ⊂ R, we have x0 ∈ D ∩ B(x0, δ) and AD(x0) = AD∩B(x0,δ)(x0). Then, AD(x0) ⊆ cl(Sm((D ∩ B(x0, δ)) − x0)) and cl(span((D ∩ B(x0, δ)) − x0)) ⊇ span(AD(x0)). This completes the proof of the proposition. ∎

9.4.2 Interchange Order of Differentiation

Next, we present two results on the interchangeability of the order of differentiation. The first result does not assume the existence of the derivative after the interchange, which then requires a stronger assumption on the set D = dom(f). The second result assumes the existence of the derivative after the interchange, where the stronger assumption on D can be removed.

Proposition 9.31 Let X, Y, and Z be normed linear spaces over K, D ⊆ X × Y, f : D → Z, and (x0, y0) ∈ D. Assume that the following conditions are satisfied:


(i) D is locally convex at (x0, y0), that is, ∃δ0 ∈ (0, ∞) ⊂ R such that D ∩ BX×Y((x0, y0), δ0) =: D̃ is convex. Furthermore, ∀(x, y) ∈ D̃, we have (x, y0), (x0, y) ∈ D̃.
(ii) (∂f/∂x)(x, y), (∂f/∂y)(x, y), and (∂²f/∂x∂y)(x, y) exist, ∀(x, y) ∈ D̃, and ∂²f/∂x∂y is continuous at (x0, y0).

Then, ∀ε ∈ (0, ∞) ⊂ R, ∃δ ∈ (0, δ0] ⊂ R such that ∀(x, y) ∈ D ∩ BX×Y((x0, y0), δ) =: D̃δ, we have ‖f(x, y) − f(x, y0) − f(x0, y) + f(x0, y0) − (∂²f/∂x∂y)(x0, y0)(x − x0)(y − y0)‖ ≤ ε‖x − x0‖‖y − y0‖.
Furthermore, if, in addition, the following condition is satisfied:

(iii) ∃M ∈ [0, ∞) ⊂ R, ∀h ∈ X, ∀ε ∈ (0, ∞) ⊂ R, ∀(x, y) ∈ D̃, ∀δ̄ ∈ (0, ∞) ⊂ R, ∃h1, h2 ∈ Sm((Dy ∩ BX(x0, δ̄)) − x0) such that ‖h − h1 + h2‖ ≤ ε‖h‖, ‖h1‖ ≤ M‖h‖, and ‖h2‖ ≤ M‖h‖, where Dy := {x ∈ X | (x, y) ∈ D}, ∀y ∈ Y.

Then, (∂²f/∂y∂x)(x0, y0) exists and is given by

  (∂²f/∂y∂x)(x0, y0)(k)(h) = (∂²f/∂x∂y)(x0, y0)(h)(k);  ∀h ∈ X, ∀k ∈ Y

Proof ∀ε ∈ (0, ∞) ⊂ R, by the continuity of ∂²f/∂x∂y at (x0, y0), ∃δ ∈ (0, δ0] ⊂ R such that ∀(x, y) ∈ D ∩ BX×Y((x0, y0), δ) =: D̃δ ⊆ D̃, we have ‖(∂²f/∂x∂y)(x, y) − (∂²f/∂x∂y)(x0, y0)‖ < ε/3. Note that, by (i), D̃δ is convex and, ∀(x, y) ∈ D̃δ, we have (x0, y), (x, y0) ∈ D̃δ.

Claim 9.31.1 ∀(x, y) ∈ D̃δ, ‖f(x, y) − f(x, y0) − f(x0, y) + f(x0, y0) − (∂²f/∂x∂y)(x0, y0)(x − x0)(y − y0)‖ ≤ (ε/3)‖x − x0‖‖y − y0‖.

Proof of Claim Fix (x, y) ∈ D̃δ. Let I := [0, 1] ⊂ R. Define D̄ := I if K = R, or D̄ := {a + i0 | a ∈ I} ⊆ C if K = C.
∀t2 ∈ I, define ψt2 : D̄ → B(Y, Z) by ψt2(t1) = (∂f/∂y)(t1x + (1 − t1)x0, t2y + (1 − t2)y0) − (∂f/∂y)(x0, t2y + (1 − t2)y0) − (∂²f/∂x∂y)(x0, y0)(t1(x − x0)), ∀t1 ∈ D̄. By Propositions 9.19, 9.16, and 9.15 and chain rule, each term in the definition of ψt2 is differentiable. By Proposition 9.15, ψt2 is differentiable. By mean value theorem, ∃t̄1 ∈ I◦ such that

  ‖(∂f/∂y)(x, t2y + (1 − t2)y0) − (∂f/∂y)(x0, t2y + (1 − t2)y0) − (∂²f/∂x∂y)(x0, y0)(x − x0)‖
    = ‖ψt2(1) − ψt2(0)‖ ≤ ‖Dψt2(t̄1)‖
    = ‖(∂²f/∂x∂y)(t̄1x + (1 − t̄1)x0, t2y + (1 − t2)y0)(x − x0) − (∂²f/∂x∂y)(x0, y0)(x − x0)‖
    ≤ (ε/3)‖x − x0‖

where the last inequality follows from Proposition 7.64.
Define γ : D̄ → Z by γ(t2) = f(x, t2y + (1 − t2)y0) − f(x0, t2y + (1 − t2)y0) − (∂²f/∂x∂y)(x0, y0)(x − x0)(t2(y − y0)), ∀t2 ∈ D̄. By Propositions 9.19, 9.16, and 9.15 and chain rule, each term in the definition of γ is differentiable. By Proposition 9.15, γ is differentiable. By mean value theorem, ∃t̄2 ∈ I◦ such that

  ‖f(x, y) − f(x, y0) − f(x0, y) + f(x0, y0) − (∂²f/∂x∂y)(x0, y0)(x − x0)(y − y0)‖
    = ‖γ(1) − γ(0)‖ ≤ ‖Dγ(t̄2)‖



    = ‖(∂f/∂y)(x, t̄2y + (1 − t̄2)y0)(y − y0) − (∂f/∂y)(x0, t̄2y + (1 − t̄2)y0)(y − y0) − (∂²f/∂x∂y)(x0, y0)(x − x0)(y − y0)‖
    ≤ ‖(∂f/∂y)(x, t̄2y + (1 − t̄2)y0) − (∂f/∂y)(x0, t̄2y + (1 − t̄2)y0) − (∂²f/∂x∂y)(x0, y0)(x − x0)‖‖y − y0‖
    = ‖ψt̄2(1) − ψt̄2(0)‖‖y − y0‖ ≤ (ε/3)‖x − x0‖‖y − y0‖

where the second inequality follows from Proposition 7.64. This completes the proof of the claim. ∎
Define Dx0 := {y ∈ Y | (x0, y) ∈ D} and D̂ := {y ∈ Y | (x0, y) ∈ dom(∂f/∂x)}. By (ii), we have span(ADx0(y0)) = Y and span(ADy(x)) = X, ∀(x, y) ∈ D̃. Clearly, we have D̂ ∩ BY(y0, δ0) = Dx0 ∩ BY(y0, δ0). This implies that span(AD̂(y0)) = Y.
∀k ∈ Y, define Lk : X → Z by Lk(h) = (∂²f/∂x∂y)(x0, y0)(h)(k), ∀h ∈ X. Clearly, Lk is a linear operator. Note that ‖Lk‖ = sup_{h∈X, ‖h‖≤1} ‖Lk(h)‖ ≤ ‖(∂²f/∂x∂y)(x0, y0)‖‖k‖ < +∞, where the first inequality follows from Proposition 7.64. Hence, Lk ∈ B(X, Z). Define L : Y → B(X, Z) by L(k) = Lk, ∀k ∈ Y. Clearly, L is a linear operator. Note that ‖L‖ = sup_{k∈Y, ‖k‖≤1} ‖L(k)‖ ≤ ‖(∂²f/∂x∂y)(x0, y0)‖ < +∞. Hence, L ∈ B(Y, B(X, Z)).
Now, fix any y ∈ D̂ ∩ BY(y0, δ/√2); we will show that ‖Δy‖ ≤ ε(2M + 1)‖y − y0‖, where Δy := (∂f/∂x)(x0, y) − (∂f/∂x)(x0, y0) − L(y − y0) ∈ B(X, Z). This immediately implies that (∂²f/∂y∂x)(x0, y0) = L and completes the proof of the proposition.
We will distinguish two exhaustive and mutually exclusive cases: Case 1: y = y0; Case 2: y ≠ y0. Case 1: y = y0. The result is immediate. Case 2: y ≠ y0. By the existence of (∂f/∂x)(x0, y), ∃δy1 ∈ (0, δ/√2] ⊂ R such that ∀x ∈ Dy ∩ BX(x0, δy1), we have ‖f(x, y) − f(x0, y) − (∂f/∂x)(x0, y)(x − x0)‖ ≤ (ε/3)‖y − y0‖‖x − x0‖. By the existence of (∂f/∂x)(x0, y0), ∃δy2 ∈ (0, δ/√2] ⊂ R such that ∀x ∈ Dy0 ∩ BX(x0, δy2), we have ‖f(x, y0) − f(x0, y0) − (∂f/∂x)(x0, y0)(x − x0)‖ ≤ (ε/3)‖y − y0‖‖x − x0‖. Let δy = min{δy1, δy2} ∈ (0, δ/√2] ⊂ R. ∀x ∈ Dy ∩ BX(x0, δy), we have (x, y), (x, y0), (x0, y) ∈ D̃δ. Then, x ∈ Dy0 ∩ BX(x0, δy). By Claim 9.31.1 and the



preceding argument, we have

  ‖Δy(x − x0)‖ = ‖(∂f/∂x)(x0, y)(x − x0) − (∂f/∂x)(x0, y0)(x − x0) − L(y − y0)(x − x0)‖
    ≤ ‖(∂f/∂x)(x0, y)(x − x0) − f(x, y) + f(x0, y)‖ + ‖f(x, y0) − f(x0, y0) − (∂f/∂x)(x0, y0)(x − x0)‖
      + ‖f(x, y) − f(x, y0) − f(x0, y) + f(x0, y0) − (∂²f/∂x∂y)(x0, y0)(x − x0)(y − y0)‖
    ≤ ε‖x − x0‖‖y − y0‖

∀h ∈ X, by (iii) and the continuity of Δy, ∃h1, h2 ∈ Sm((Dy ∩ BX(x0, δy)) − x0) such that ‖Δy(h − h1 + h2)‖ ≤ ε‖h‖‖y − y0‖, ‖h1‖ ≤ M‖h‖, and ‖h2‖ ≤ M‖h‖. Then, ∃α1, α2 ∈ K and ∃x1, x2 ∈ Dy ∩ BX(x0, δy) such that hi = αi(xi − x0), i = 1, 2. Then, we have

  ‖Δy(h)‖ ≤ ‖Δy(h1 − h2)‖ + ε‖h‖‖y − y0‖
    = ‖α1Δy(x1 − x0) − α2Δy(x2 − x0)‖ + ε‖h‖‖y − y0‖
    ≤ |α1|‖Δy(x1 − x0)‖ + |α2|‖Δy(x2 − x0)‖ + ε‖h‖‖y − y0‖
    ≤ (|α1|‖x1 − x0‖ + |α2|‖x2 − x0‖ + ‖h‖)ε‖y − y0‖
    = (‖h1‖ + ‖h2‖ + ‖h‖)ε‖y − y0‖ ≤ ε(2M + 1)‖y − y0‖‖h‖

Then, we have ‖Δy‖ ≤ ε(2M + 1)‖y − y0‖. This case is proved. This completes the proof of the proposition.

In the preceding proposition, conditions (i) and (iii) are assumptions on the set D. It is clear that (i) and (iii) are satisfied if (x0, y0) ∈ D°.

Proposition 9.32 Let X, Y, and Z be normed linear spaces over K, D ⊆ X × Y, f : D → Z, and (x0, y0) ∈ D. Assume that the following conditions are satisfied:

(i) D is locally convex at (x0, y0), that is, ∃δ0 ∈ (0, ∞) ⊂ R such that D ∩ B_{X×Y}((x0, y0), δ0) =: D̃ is convex. Furthermore, ∀(x, y) ∈ D̃, we have (x, y0), (x0, y) ∈ D̃.
(ii) (∂f/∂x)(x, y), (∂f/∂y)(x, y), (∂²f/∂x∂y)(x, y), and (∂²f/∂y∂x)(x, y) exist, ∀(x, y) ∈ D̃, and ∂²f/∂x∂y and ∂²f/∂y∂x are continuous at (x0, y0).

Then, (∂²f/∂y∂x)(x0, y0)(k)(h) = (∂²f/∂x∂y)(x0, y0)(h)(k), ∀h ∈ X, ∀k ∈ Y.


9 Differentiation in Banach Spaces

Proof Define Dx0 := {y ∈ Y | (x0, y) ∈ D} and Dy0 := {x ∈ X | (x, y0) ∈ D}. By (ii), we have span(A_{Dx0}(y0)) = Y and span(A_{Dy0}(x0)) = X.

∀ε ∈ (0, ∞) ⊂ R, by Proposition 9.31, ∃δ1 ∈ (0, δ0] ⊂ R such that ∀(x, y) ∈ D ∩ B_{X×Y}((x0, y0), δ1) =: D̃_{δ1} ⊆ D̃, we have ‖f(x, y) − f(x, y0) − f(x0, y) + f(x0, y0) − (∂²f/∂x∂y)(x0, y0)(x − x0)(y − y0)‖ ≤ (ε/2)‖x − x0‖ ‖y − y0‖. By symmetry, ∃δ2 ∈ (0, δ0] ⊂ R such that ∀(x, y) ∈ B_{X×Y}((x0, y0), δ2) ∩ D =: D̃_{δ2} ⊆ D̃, we have ‖f(x, y) − f(x, y0) − f(x0, y) + f(x0, y0) − (∂²f/∂y∂x)(x0, y0)(y − y0)(x − x0)‖ ≤ (ε/2)‖x − x0‖ ‖y − y0‖. Therefore, let δ := min{δ1, δ2} ∈ (0, δ0] ⊂ R, and ∀(x, y) ∈ D ∩ B_{X×Y}((x0, y0), δ) =: D̃_δ = D̃_{δ1} ∩ D̃_{δ2} ⊆ D̃, we have ‖(∂²f/∂y∂x)(x0, y0)(y − y0)(x − x0) − (∂²f/∂x∂y)(x0, y0)(x − x0)(y − y0)‖ ≤ ε‖x − x0‖ ‖y − y0‖. By the arbitrariness of ε, we have (∂²f/∂y∂x)(x0, y0)(y − y0)(x − x0) = (∂²f/∂x∂y)(x0, y0)(x − x0)(y − y0). Clearly, by (i), D̃_δ is convex.

Suppose that the result of the proposition is not true; then ∃h0 ∈ X and ∃k0 ∈ Y such that (∂²f/∂y∂x)(x0, y0)(k0)(h0) ≠ (∂²f/∂x∂y)(x0, y0)(h0)(k0). By the fact that span(A_{Dx0}(y0)) = Y and span(A_{Dy0}(x0)) = X and the continuity of (∂²f/∂y∂x)(x0, y0) and (∂²f/∂x∂y)(x0, y0), ∃h̄0 ∈ span(A_{Dy0}(x0)) and ∃k̄0 ∈ span(A_{Dx0}(y0)) such that (∂²f/∂y∂x)(x0, y0)(k̄0)(h̄0) ≠ (∂²f/∂x∂y)(x0, y0)(h̄0)(k̄0). By the multi-linearity of these two second order partial derivatives, ∃ĥ0 ∈ A_{Dy0}(x0) and ∃k̂0 ∈ A_{Dx0}(y0) such that (∂²f/∂y∂x)(x0, y0)(k̂0)(ĥ0) ≠ (∂²f/∂x∂y)(x0, y0)(ĥ0)(k̂0). By Proposition 9.30, we have

A_{Dy0}(x0) ⊆ Sm((Dy0 ∩ B_X(x0, δ)) − x0)
A_{Dx0}(y0) ⊆ Sm((Dx0 ∩ B_Y(y0, δ)) − y0)

This implies that, by the continuity of these two second order partial derivatives, ∃ȟ0 ∈ Sm((Dy0 ∩ B_X(x0, δ)) − x0) and ∃ǩ0 ∈ Sm((Dx0 ∩ B_Y(y0, δ)) − y0) such that (∂²f/∂y∂x)(x0, y0)(ǩ0)(ȟ0) ≠ (∂²f/∂x∂y)(x0, y0)(ȟ0)(ǩ0). Then, ∃x̄ ∈ Dy0 ∩ B_X(x0, δ), ∃ȳ ∈ Dx0 ∩ B_Y(y0, δ), and ∃αh, αk ∈ K such that ȟ0 = αh(x̄ − x0) and ǩ0 = αk(ȳ − y0). By the multi-linearity of these two second order partial derivatives, (∂²f/∂y∂x)(x0, y0)(ȳ − y0)(x̄ − x0) ≠ (∂²f/∂x∂y)(x0, y0)(x̄ − x0)(ȳ − y0). Clearly, (x̄, y0), (x0, ȳ) ∈ D̃_δ. By the convexity of D̃_δ, we have (x̂, ŷ) := 0.5(x̄, y0) + 0.5(x0, ȳ) ∈ D̃_δ. By the multi-linearity of these two second order partial derivatives, we have (∂²f/∂y∂x)(x0, y0)(ŷ − y0)(x̂ − x0) ≠ (∂²f/∂x∂y)(x0, y0)(x̂ − x0)(ŷ − y0). This is a contradiction. Hence, the result of the proposition must hold. This completes the proof of the proposition. □
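As a concrete sanity check (not part of the text), the symmetry conclusion of Proposition 9.32 can be observed numerically for a smooth f : R × R → R; the function below and the evaluation point are illustrative choices, and the two mixed partials are estimated by central differences taken in the two stated orders.

```python
# Numeric illustration of Proposition 9.32: for a smooth f the two mixed
# second order partial derivatives agree.  f(x, y) = x^2 y + exp(x y) is C-infinity,
# so d2f/dxdy = d2f/dydx = 2x + exp(xy)(1 + xy).
import math

def f(x, y):
    return x * x * y + math.exp(x * y)

def mixed_partial(f, x0, y0, order, h=1e-4):
    # Central-difference estimate of the mixed partial, differentiating
    # in the order given by `order` ("xy" = y first, then x).
    if order == "xy":
        g = lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h)   # df/dy first
        return (g(x0 + h, y0) - g(x0 - h, y0)) / (2 * h)         # then d/dx
    else:
        g = lambda x, y: (f(x + h, y) - f(x - h, y)) / (2 * h)   # df/dx first
        return (g(x0, y0 + h) - g(x0, y0 - h)) / (2 * h)         # then d/dy

a = mixed_partial(f, 0.7, -0.3, "xy")
b = mixed_partial(f, 0.7, -0.3, "yx")
exact = 2 * 0.7 + math.exp(0.7 * -0.3) * (1 + 0.7 * -0.3)
print(a, b, exact)
```

Both difference quotients reduce to the same symmetric four-point stencil, mirroring the rectangle increment f(x, y) − f(x, y0) − f(x0, y) + f(x0, y0) that drives the proof.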


9.4.3 High Order Derivatives of Some Common Functions

Proposition 9.33 Let X and Y be normed linear spaces over K, D ⊆ X, f : D → Y, and x0 ∈ D. Assume that ∃δ0 ∈ (0, ∞) ⊂ R and ∃y0 ∈ Y such that f(x) = y0 and span(A_D(x)) = X, ∀x ∈ D ∩ B_X(x0, δ0). Then, f is C∞ at x0 and f^(i)(x) = ϑ_{BS_i(X,Y)}, ∀x ∈ D ∩ B_X(x0, δ0), ∀i ∈ N.

Proof ∀x ∈ D ∩ B_X(x0, δ0), let δx := δ0 − ‖x − x0‖ > 0. Then, ∀x̄ ∈ D ∩ B_X(x, δx) ⊆ D ∩ B_X(x0, δ0), we have f(x̄) = y0. Note that span(A_D(x)) = X. By Proposition 9.10, f^(1)(x) = ϑ_{B(X,Y)}. Let D1 := dom(f^(1)). Then, D1 ∩ B_X(x0, δ0) = D ∩ B_X(x0, δ0). Then, span(A_{D1}(x)) = X, ∀x ∈ D1 ∩ B_X(x0, δ0). By recursively applying the above argument, we have f^(i)(x) = ϑ_{B_i(X,Y)} = ϑ_{BS_i(X,Y)}, ∀x ∈ D ∩ B_X(x0, δ0), ∀i ∈ N. This completes the proof of the proposition. □

Proposition 9.34 Let X and Y be normed linear spaces over K, D2 ⊆ D1 ⊆ X, x0 ∈ D2, f : D1 → Y, g := f|_{D2}, k ∈ N, and n ∈ N ∪ {∞}. Then, the following statements hold:

(i) If, ∀x ∈ D2, f^(k)(x) exists and span(A_{D2}(x)) = X, then g is k-times Fréchet differentiable and g^(i)(x) = f^(i)(x), ∀x ∈ D2, ∀i ∈ {1, . . . , k}.
(ii) If, ∀x ∈ D2, g^(k)(x) exists and ∃δx ∈ (0, ∞) ⊂ R such that D1 ∩ B_X(x, δx) = D2 ∩ B_X(x, δx), then f^(k)(x) exists and f^(i)(x) = g^(i)(x), ∀x ∈ D2, ∀i ∈ {1, . . . , k}.
(iii) If f is Cn at x0 and ∃δ ∈ (0, ∞) ⊂ R such that span(A_{D2}(x)) = X, ∀x ∈ D2 ∩ B_X(x0, δ), then g is Cn at x0.
(iv) If g is Cn at x0 and ∃δ ∈ (0, ∞) ⊂ R such that D1 ∩ B_X(x0, δ) = D2 ∩ B_X(x0, δ), then f is Cn at x0.

Proof (i) We will use mathematical induction on k to prove this statement.
1° k = 1. ∀x ∈ D2, by Proposition 9.11, we have g^(1)(x) exists and g^(1)(x) = f^(1)(x). Hence, the result holds in this case.
2° Assume that the result holds for k ≤ k̄ ∈ N.
3° Consider the case k = k̄ + 1. By the inductive assumption, g^(k̄)(x) exists and g^(i)(x) = f^(i)(x), ∀x ∈ D2, ∀i ∈ {1, . . . , k̄}. Then, dom(g^(k̄)) = D2 ⊆ dom(f^(k̄)) =: D̄1. ∀x ∈ D2, by the assumption, f^(k̄) is differentiable at x and span(A_{D2}(x)) = X. By Proposition 9.11, g^(k̄) is Fréchet differentiable at x and g^(k̄+1)(x) = Dg^(k̄)(x) = Df^(k̄)(x) = f^(k̄+1)(x). This completes the induction process.
(ii) We will use mathematical induction on k to prove this statement.
1° k = 1. ∀x ∈ D2, by Proposition 9.11, f^(1)(x) exists and f^(1)(x) = g^(1)(x). Hence, the result holds.
2° Assume that the result holds for k ≤ k̄ ∈ N.


3° Consider the case k = k̄ + 1. By the inductive assumption, f^(k̄)(x) exists and f^(i)(x) = g^(i)(x), ∀x ∈ D2, ∀i ∈ {1, . . . , k̄}. Then, dom(g^(k̄)) = D2 ⊆ dom(f^(k̄)) =: D̄1 ⊆ D1. ∀x ∈ D2, by the assumption, g^(k̄) is differentiable at x and ∃δx ∈ (0, ∞) ⊂ R such that D̄1 ∩ B_X(x, δx) = D2 ∩ B_X(x, δx). Then, by Proposition 9.11, f^(k̄) is Fréchet differentiable at x and f^(k̄+1)(x) = Df^(k̄)(x) = Dg^(k̄)(x) = g^(k̄+1)(x). This completes the induction process.
(iii) We will distinguish two exhaustive and mutually exclusive cases: Case 1: n ∈ N; Case 2: n = ∞. Case 1: n ∈ N. Without loss of generality, assume f is n-times differentiable at x, ∀x ∈ D1 ∩ B_X(x0, δ). Let D̄2 := D2 ∩ B_X(x0, δ) and ḡ := f|_{D̄2}. ∀x ∈ D̄2, f^(n)(x) exists. Let δx := δ − ‖x − x0‖ ∈ (0, ∞) ⊂ R. Then, D̄2 ∩ B_X(x, δx) = D2 ∩ B_X(x, δx). Hence, span(A_{D̄2}(x)) = span(A_{D2}(x)) = X. Then, by (i), ḡ is n-times differentiable and ḡ^(i)(x) = f^(i)(x), ∀x ∈ D̄2, ∀i ∈ {1, . . . , n}. By (ii), g^(n)(x) exists and g^(i)(x) = ḡ^(i)(x) = f^(i)(x), ∀x ∈ D̄2, ∀i ∈ {1, . . . , n}. By the continuity of f^(n) at x0, ∀ε ∈ (0, ∞) ⊂ R, ∃δ̄ ∈ (0, δ] ⊂ R, and ∀x ∈ dom(f^(n)) ∩ B_X(x0, δ̄) = D1 ∩ B_X(x0, δ̄), we have ‖f^(n)(x) − f^(n)(x0)‖ …

… f^(1)(x1, x2) = [idX  idX], and f^(i+1)(x1, x2) = ϑ_{BS_{i+1}(X×X,X)}, ∀(x1, x2) ∈ X × X, ∀i ∈ N.

Proof This is straightforward and is therefore omitted. □

Proposition 9.39 Let X be a normed linear space over K and f : K × X → X be given by f(α, x) = αx, ∀(α, x) ∈ K × X. Then, f is C∞, f^(1)(α, x) = [x  α idX], f^(2)(α, x)(d2, h2)(d1, h1) = d1 h2 + d2 h1, and f^(i+2)(α, x) = ϑ_{BS_{i+2}(K×X,X)}, ∀(α, x) ∈ K × X, ∀i ∈ N, ∀(d1, h1) ∈ K × X, ∀(d2, h2) ∈ K × X.

Proof This is straightforward and is therefore omitted. □

Proposition 9.40 Let X and Y be normed linear spaces over K, D ⊆ X, f1 : D → Y, f2 : D → Y, x0 ∈ D, α1, α2 ∈ K, n ∈ N, k ∈ N ∪ {∞}, and g : D → Y be given by g(x) = α1 f1(x) + α2 f2(x), ∀x ∈ D. If f1 and f2 are n-times differentiable, then g is n-times differentiable and g^(i)(x) = α1 f1^(i)(x) + α2 f2^(i)(x), ∀x ∈ D, ∀i ∈ {1, . . . , n}. If f1 and f2 are Ck at x0, then g is Ck at x0 and g^(i)(x) = α1 f1^(i)(x) + α2 f2^(i)(x), ∀x ∈ D ∩ B_X(x0, δ0), ∀i ∈ N with i ≤ k, for some δ0 ∈ (0, ∞) ⊂ R.

Proof This is straightforward and is therefore omitted. □

Proposition 9.41 Let X and Y be normed linear spaces over K and f : B(X, Y) × X → Y be given by f(A, x) = Ax, ∀(A, x) ∈ B(X, Y) × X. Then, f is C∞, f^(1)(A, x) = [ro(x)  A], f^(2)(A, x)(Δ2, h2)(Δ1, h1) = Δ1 h2 + Δ2 h1, and f^(i+2)(A, x) = ϑ_{BS_{i+2}(B(X,Y)×X,Y)}, ∀(A, x) ∈ B(X, Y) × X, ∀i ∈ N, ∀(Δ1, h1) ∈ B(X, Y) × X, ∀(Δ2, h2) ∈ B(X, Y) × X.

Proof This is straightforward and is therefore omitted. □

Proposition 9.42 Let X, Y, and Z be normed linear spaces over K and f : B(Y, Z) × B(X, Y) → B(X, Z) be given by f(Ayz, Axy) = Ayz Axy, ∀(Ayz, Axy) ∈ B(Y, Z) × B(X, Y). Then, f is C∞, f^(1)(Ayz, Axy) = [ro(Axy)  Ayz], f^(2)(Ayz, Axy)(Δyz2, Δxy2)(Δyz1, Δxy1) = Δyz1 Δxy2 + Δyz2 Δxy1, and f^(i+2)(Ayz, Axy) = ϑ_{BS_{i+2}(B(Y,Z)×B(X,Y),B(X,Z))}, ∀(Ayz, Axy) ∈ B(Y, Z) × B(X, Y), ∀i ∈ N, ∀(Δyz1, Δxy1) ∈ B(Y, Z) × B(X, Y), ∀(Δyz2, Δxy2) ∈ B(Y, Z) × B(X, Y).

Proof This is straightforward and is therefore omitted. □
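The derivative formulas of Propositions 9.41 and 9.42 are just the product rule for a bilinear map. As an illustration (not from the text), for 2×2 matrices the exact expansion f(A + Δ1, B + Δ2) = AB + (Δ1 B + A Δ2) + Δ1 Δ2 exhibits the first derivative (Δ1, Δ2) ↦ Δ1 B + A Δ2 and a quadratic term Δ1 Δ2 that agrees with ½ f^(2)((Δ1, Δ2), (Δ1, Δ2)) = ½(Δ1 Δ2 + Δ1 Δ2); the matrices below are arbitrary sample data.

```python
# Exact check of the bilinear expansion f(A+D1, B+D2) = AB + (D1 B + A D2) + D1 D2
# for f(A, B) = A B on 2x2 real matrices.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matadd(P, Q):
    return [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]

A  = [[1.0, 2.0], [3.0, 4.0]]
B  = [[0.5, -1.0], [2.0, 0.0]]
D1 = [[0.1, 0.2], [-0.3, 0.4]]
D2 = [[-0.2, 0.1], [0.0, 0.3]]

lhs = matmul(matadd(A, D1), matadd(B, D2))
first  = matadd(matmul(D1, B), matmul(A, D2))   # f^(1)(A,B)(D1,D2)
second = matmul(D1, D2)                          # (1/2) f^(2)(A,B)((D1,D2),(D1,D2))
rhs = matadd(matadd(matmul(A, B), first), second)
ok = all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(ok)
```

Since f is bilinear, the expansion terminates at second order, which is why all higher derivatives vanish in the propositions above.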

Proposition 9.43 Let X1, . . . , Xp and Y1, . . . , Ym be normed linear spaces over K, where p, m ∈ N, Zji := B(Xi, Yj), i = 1, . . . , p, j = 1, . . . , m, Z := ∏_{j=1}^m ∏_{i=1}^p Zji, and f : Z → B(∏_{i=1}^p Xi, ∏_{j=1}^m Yj) be given by

    f(A11, . . . , Amp) = [ A11 · · · A1p ]
                         [  ⋮         ⋮  ] ,   ∀(A11, . . . , Amp) ∈ Z
                         [ Am1 · · · Amp ]

Then, f is C∞,

    f^(1)(A11, . . . , Amp)(Δ11, . . . , Δmp) = [ Δ11 · · · Δ1p ]
                                               [  ⋮         ⋮  ]
                                               [ Δm1 · · · Δmp ]

and f^(l+1)(A11, . . . , Amp) = ϑ_{BS_{l+1}(Z, B(∏_{i=1}^p Xi, ∏_{j=1}^m Yj))}, ∀(A11, . . . , Amp) ∈ Z, ∀l ∈ N, ∀(Δ11, . . . , Δmp) ∈ Z.

Proof This is straightforward and is therefore omitted. □
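Since the block-assembly map f of Proposition 9.43 is linear, its first derivative at any point is f itself. A tiny illustration (not from the text) with p = m = 2 and Xi = Yj = R, where each block is a scalar and f assembles the 2×2 matrix acting on R²:

```python
# f(A + D) = f(A) + f(D) for the linear block-assembly map of Proposition 9.43,
# specialized to scalar blocks (p = m = 2, Xi = Yj = R).

def assemble(a11, a12, a21, a22):
    return [[a11, a12], [a21, a22]]

def addm(P, Q):
    return [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]

A = (1.0, 2.0, 3.0, 4.0)
D = (0.5, -0.5, 0.25, 0.0)
lhs = assemble(*(a + d for a, d in zip(A, D)))
rhs = addm(assemble(*A), assemble(*D))   # f(A) + f^(1)(A)(D), with f^(1)(A) = f
print(lhs == rhs)
```

Linearity is also why every derivative of order two and higher vanishes in the proposition.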

9.4.4 Properties of High Order Derivatives

Proposition 9.44 Let X, Y, and Z be normed linear spaces over K, D ⊆ X, f1 : D → Y, f2 : D → Z, x0 ∈ D, k ∈ N, and g : D → Y × Z be given by g(x) = (f1(x), f2(x)), ∀x ∈ D. Then, the following statements hold:

(i) ∃δ0 ∈ (0, ∞) ⊂ R such that f1^(k)(x) and f2^(k)(x) exist, ∀x ∈ D ∩ B_X(x0, δ0) if, and only if, ∃δ0 ∈ (0, ∞) ⊂ R such that g^(k)(x) exists, ∀x ∈ D ∩ B_X(x0, δ0). In this case, g^(i)(x) = (f1^(i)(x), f2^(i)(x)), ∀x ∈ D ∩ B_X(x0, δ0), ∀i ∈ {1, . . . , k}.
(ii) Let n ∈ N ∪ {∞}. Then, f1 and f2 are Cn at x0 if, and only if, g is Cn at x0.

Proof (i) We will use mathematical induction on k to prove this statement.
1° k = 1. The statement holds by Proposition 9.19.
2° Assume that the result holds for k = k̄ ∈ N.
3° Consider the case k = k̄ + 1. "Necessity" Let f1^(k̄+1)(x) and f2^(k̄+1)(x) exist, ∀x ∈ D ∩ B_X(x0, δ0). By the inductive assumption, g^(k̄)(x) exists and g^(i)(x) = (f1^(i)(x), f2^(i)(x)), ∀x ∈ D ∩ B_X(x0, δ0), ∀i ∈ {1, . . . , k̄}. Let D̄1 := dom(f1^(k̄)), D̄2 := dom(f2^(k̄)), and D̄ := dom(g^(k̄)). Then, D̄1 ⊆ D, D̄2 ⊆ D, D̄ ⊆ D, D̄ ⊇ D ∩ B_X(x0, δ0), and D̄ ∩ B_X(x0, δ0) ⊆ D̄1 ∩ D̄2. This implies that D̄ ∩ B_X(x0, δ0) = D ∩ B_X(x0, δ0) =: D̂ and span(A_{D̂}(x)) = span(A_D(x)) = X, ∀x ∈ D̂. By the assumption, f1^(k̄) and f2^(k̄) are differentiable at x, ∀x ∈ D̂. By Propositions 9.19 and 9.11, ∀x ∈ D̂, g^(k̄) is differentiable at x and g^(k̄+1)(x) = D(g^(k̄)|_{D̂})(x) = (D(f1^(k̄)|_{D̂})(x), D(f2^(k̄)|_{D̂})(x)) = (f1^(k̄+1)(x), f2^(k̄+1)(x)).

"Sufficiency" Let g^(k̄+1)(x) exist, ∀x ∈ D ∩ B_X(x0, δ0). By the inductive assumption, f1^(k̄)(x) and f2^(k̄)(x) exist and g^(i)(x) = (f1^(i)(x), f2^(i)(x)), ∀x ∈ D ∩ B_X(x0, δ0), ∀i ∈ {1, . . . , k̄}. Let D̄1 := dom(f1^(k̄)), D̄2 := dom(f2^(k̄)), and D̄ := dom(g^(k̄)). Then, D̄1 ⊆ D, D̄2 ⊆ D, D̄ ⊆ D, D̄ ⊇ D ∩ B_X(x0, δ0), and D ∩ B_X(x0, δ0) ⊆ D̄1 ∩ D̄2. This implies that D̄ ∩ B_X(x0, δ0) = D ∩ B_X(x0, δ0) = D̄1 ∩ B_X(x0, δ0) = D̄2 ∩ B_X(x0, δ0) =: D̂. Then, span(A_{D̂}(x)) = span(A_D(x)) = X, ∀x ∈ D̂. By Propositions 9.11 and 9.19, we have, ∀x ∈ D̂, f1^(k̄)|_{D̂} and f2^(k̄)|_{D̂} are differentiable at x and g^(k̄+1)(x) = D(g^(k̄)|_{D̂})(x) = (D(f1^(k̄)|_{D̂})(x), D(f2^(k̄)|_{D̂})(x)). By Proposition 9.11, we have D(f1^(k̄)|_{D̂})(x) = Df1^(k̄)(x) = f1^(k̄+1)(x) and D(f2^(k̄)|_{D̂})(x) = Df2^(k̄)(x) = f2^(k̄+1)(x), ∀x ∈ D̂. Then, g^(k̄+1)(x) = (f1^(k̄+1)(x), f2^(k̄+1)(x)), ∀x ∈ D̂.

This completes the induction process.
(ii) We will distinguish two exhaustive and mutually exclusive cases: Case 1: n ∈ N; Case 2: n = ∞.


Case 1: n ∈ N. "Sufficiency" Let g be Cn at x0. Then, ∃δ0 ∈ (0, ∞) ⊂ R such that g is n-times differentiable at x, ∀x ∈ D ∩ B_X(x0, δ0) =: D̄. By (i), f1 and f2 are n-times differentiable at x, ∀x ∈ D̄, and g^(n)(x) = (f1^(n)(x), f2^(n)(x)), ∀x ∈ D̄. By the continuity of g^(n) at x0 and Proposition 3.32, f1^(n) and f2^(n) are continuous at x0. Then, f1 and f2 are Cn at x0.
"Necessity" Let f1 and f2 be Cn at x0. Then, ∃δ0 ∈ (0, ∞) ⊂ R such that f1 and f2 are n-times differentiable at x, ∀x ∈ D ∩ B_X(x0, δ0) =: D̄. By (i), g is n-times differentiable at x, ∀x ∈ D̄, and g^(n)(x) = (f1^(n)(x), f2^(n)(x)), ∀x ∈ D̄. By the continuity of f1^(n) and f2^(n) at x0 and Proposition 3.32, g^(n) is continuous at x0. Then, g is Cn at x0.
Case 2: n = ∞. "Sufficiency" Let g be C∞ at x0. Then, ∃δ ∈ (0, ∞) ⊂ R such that g is Ci at x, ∀x ∈ D ∩ B_X(x0, δ), ∀i ∈ N. By Case 1, f1 and f2 are Ci at x, ∀x ∈ D ∩ B_X(x0, δ), ∀i ∈ N. Then, f1 and f2 are C∞ at x0. "Necessity" Let f1 and f2 be C∞ at x0. Then, ∃δ ∈ (0, ∞) ⊂ R such that f1 and f2 are Ci at x, ∀x ∈ D ∩ B_X(x0, δ), ∀i ∈ N. By Case 1, g is Ci at x, ∀x ∈ D ∩ B_X(x0, δ), ∀i ∈ N. Then, g is C∞ at x0.
This completes the proof of the proposition. □

Proposition 9.45 Let X, Y, and Z be normed linear spaces over K, D1 ⊆ X, D2 ⊆ Y, f : D1 → D2, g : D2 → Z, x0 ∈ D1, and y0 := f(x0) ∈ D2. Then, the following statements hold:

(i) Assume that f is Ck at x0 and g is Ck at y0, for some k ∈ N ∪ {∞}. Then, h := g ◦ f is Ck at x0.
(ii) Let k ∈ N. Assume that f is k-times differentiable and g is k-times differentiable. Then, h is k-times differentiable.

Proof (i) We will first use mathematical induction on k to show that the result holds if k ∈ N.
1° k = 1. By g being C1 at y0, ∃δ1 ∈ (0, ∞) ⊂ R such that g^(1)(y) exists, ∀y ∈ D2 ∩ B_Y(y0, δ1), and g^(1) is continuous at y0. By f being C1 at x0 and Proposition 9.7, ∃δ ∈ (0, ∞) ⊂ R such that f(x) ∈ D2 ∩ B_Y(y0, δ1) and f^(1)(x) exists, ∀x ∈ D1 ∩ B_X(x0, δ), and f^(1) is continuous at x0. ∀x ∈ D1 ∩ B_X(x0, δ), by the chain rule, h^(1)(x) exists and h^(1)(x) = g^(1)(f(x)) f^(1)(x). By Propositions 3.12, 9.7, 3.32, and 9.42, h^(1) is continuous at x0. Hence, h is C1 at x0.
2° Assume that the result holds for k ≤ k̄ ∈ N.
3° Consider the case k = k̄ + 1. By g being Ck̄+1 at y0, ∃δ1 ∈ (0, ∞) ⊂ R such that g^(1)(y) exists, ∀y ∈ D2 ∩ B_Y(y0, δ1) =: D̄, and g^(1) is Ck̄ at y0. By f being Ck̄+1 at x0 and Proposition 9.7, ∃δ ∈ (0, ∞) ⊂ R such that f(x) ∈ D̄ and f^(1)(x) exists, ∀x ∈ D1 ∩ B_X(x0, δ) =: D̂, and f^(1) is Ck̄ at x0. ∀x ∈ D̂, by the chain rule, h^(1)(x) exists and h^(1)(x) = g^(1)(f(x)) f^(1)(x). Then, h^(1)|_{D̂} : D̂ → B(X, Z) is given by h^(1)|_{D̂}(x) = g^(1)|_{D̄}(f|_{D̂}(x)) f^(1)|_{D̂}(x), ∀x ∈ D̂. By Proposition 9.34, g^(1)|_{D̄} is Ck̄ at y0 and f|_{D̂} and f^(1)|_{D̂} are Ck̄ at x0. By the inductive assumption and Propositions 9.44, 9.42, and 3.32, h^(1)|_{D̂} is Ck̄ at x0. By Proposition 9.34, h^(1) is Ck̄ at x0. Hence, h is Ck̄+1 at x0.
This completes the induction process. When k = ∞, ∃δ ∈ (0, ∞) ⊂ R such that g is Ci at y, ∀y ∈ D2 ∩ B_Y(y0, δ), and f is Ci at x, ∀x ∈ D1 ∩ B_X(x0, δ), ∀i ∈ N, which further implies, by the induction conclusion, that h is Ci at x, ∀x ∈ D1 ∩ B_X(x0, δ), ∀i ∈ N. Hence, h is C∞ at x0.
(ii) We will use mathematical induction on k to show that the result holds.
1° k = 1. By the chain rule, h^(1)(x) exists and h^(1)(x) = g^(1)(f(x)) f^(1)(x), ∀x ∈ D1. Hence, the result holds.
2° Assume that the result holds for k ≤ k̄ ∈ N.
3° Consider the case k = k̄ + 1. By the chain rule, h^(1)(x) exists and h^(1)(x) = g^(1)(f(x)) f^(1)(x), ∀x ∈ D1. By the inductive assumption and Propositions 9.44 and 9.42, h^(1) is k̄-times differentiable. Hence, h is (k̄ + 1)-times differentiable.
This completes the induction process and the proof of the proposition. □

Proposition 9.46 Let X, Y, and Z be normed linear spaces over K, D ⊆ X × Y, and f : D → Z be partial differentiable with respect to x and partial differentiable with respect to y. Then, the following statements hold:

(i) If f is (n + 1)-times differentiable, where n ∈ N, then ∂f/∂x and ∂f/∂y are n-times differentiable.
(ii) If f is C1 at (x0, y0) ∈ D, then ∂f/∂x and ∂f/∂y are continuous at (x0, y0).
(iii) If f is Cn at (x0, y0) ∈ D, where n ∈ {2, 3, . . .} ∪ {∞}, then ∂f/∂x and ∂f/∂y are Cn−1 at (x0, y0).

Proof By Proposition 9.9, f^(1)(x, y) = [(∂f/∂x)(x, y)  (∂f/∂y)(x, y)], ∀(x, y) ∈ D. Define g : X × Y → B(X, X × Y) by g(x, y) = [idX ; ϑ_{B(X,Y)}]. By Proposition 9.33, g is C∞. It is clear that (∂f/∂x)(x, y) = f^(1)(x, y) g(x, y), ∀(x, y) ∈ D:
(i) Since f is (n + 1)-times differentiable, f^(1) is n-times differentiable. By Propositions 9.42, 9.44, and 9.45, ∂f/∂x is n-times differentiable. By symmetry, ∂f/∂y is n-times differentiable.
(ii) Since f is C1 at (x0, y0), f^(1) is continuous at (x0, y0). By Propositions 9.42, 9.7, 3.32, and 3.12, ∂f/∂x is continuous at (x0, y0). By symmetry, ∂f/∂y is continuous at (x0, y0).
(iii) Since f is Cn at (x0, y0), f^(1) is Cn−1 at (x0, y0). By Propositions 9.42, 9.44, and 9.45, ∂f/∂x is Cn−1 at (x0, y0). By symmetry, ∂f/∂y is Cn−1 at (x0, y0).
This completes the proof of the proposition. □


Proposition 9.47 Let X1, . . . , Xp and Y be normed linear spaces over K, where p ∈ {2, 3, . . .}, D ⊆ X := ∏_{i=1}^p Xi, f : D → Y, xo ∈ D°, and k ∈ N. Assume that ∃δ0 ∈ (0, ∞) ⊂ R such that all partial derivatives of f up to kth order exist and are continuous at x, ∀x ∈ D̃ := B_X(xo, δ0) ⊆ D. Then, f is Ck at x, ∀x ∈ D̃.

Proof We will prove this using mathematical induction on k.
1° k = 1. ∀x ∈ D̃, by repeated application of Proposition 9.24, we have f^(1)(x) exists and f^(1)(x) = [(∂f/∂x1)(x) · · · (∂f/∂xp)(x)].
2° Assume that the result holds for k = k̄ ∈ N.
3° Consider the case k = k̄ + 1. ∀x ∈ D̃, by the case k = 1, f^(1)(x) exists and f^(1)(x) = [(∂f/∂x1)(x) · · · (∂f/∂xp)(x)]. ∀i ∈ {1, . . . , p}, by the assumption, all partial derivatives of ∂f/∂xi up to k̄th order exist and are continuous at x. Then, by the inductive assumption, ∂f/∂xi is Ck̄ at x. Define the function g : D̂ → ∏_{i=1}^p B(Xi, Y) by g(x) = ((∂f/∂x1)(x), . . . , (∂f/∂xp)(x)), ∀x ∈ D̂ := ∩_{i=1}^p dom(∂f/∂xi). Clearly, D̃ ⊆ D̂ ⊆ D. Then, by Proposition 9.44, g is k̄-times differentiable at x and g^(j)(x) = (Dj(∂f/∂x1)(x), . . . , Dj(∂f/∂xp)(x)), ∀j = 1, . . . , k̄. By Proposition 3.32, g is Ck̄ at x. By Propositions 9.45 and 9.43, f^(1) is Ck̄ at x. Therefore, f is Ck̄+1 at x. This completes the induction process and the proof of the proposition. □

Theorem 9.48 (Taylor's Theorem) Let X and Y be normed linear spaces over K, D ⊆ X, f : D → Y, x0, x1 ∈ D, and n ∈ Z+. Let D̄ := I := [0, 1] ⊂ R and D̃̄ := I° if K = R, or D̄ := {a + i0 | a ∈ I} ⊂ C and D̃̄ := {a + i0 | a ∈ I°} ⊂ C if K = C. Let φ : D̄ → D be given by φ(t) = t x1 + (1 − t) x0, ∀t ∈ D̄. Assume that dom(f^(n+1)) ⊇ φ(D̃̄), dom(f^(n)) ⊇ φ(D̄), and f^(n) is continuous at x = φ(t), ∀t ∈ D̄. Let Rn ∈ Y be given by

Rn := f(x1) − [ f(x0) + (1/1!) f^(1)(x0)(x1 − x0) + · · · + (1/n!) f^(n)(x0) (x1 − x0) · · · (x1 − x0) ]

with (x1 − x0) repeated n times in the last term. Then, the following statements hold:

(i) If Y = R and K = R, then ∃t̄0 ∈ I° such that

Rn = (1/(n + 1)!) f^(n+1)(φ(t̄0)) (x1 − x0) · · · (x1 − x0)

with (x1 − x0) repeated (n + 1) times.


(ii) ∃t̄0 ∈ I° such that

‖Rn‖ ≤ (1/(n + 1)!) ‖f^(n+1)(φ(t̄0))‖ ‖x1 − x0‖^{n+1}

Proof (i) Let Y = R and K = R. Define F : I → R by

F(t) = f(φ(1)) − f(φ(1 − t)) − (t/1!) f^(1)(φ(1 − t))(x1 − x0) − · · · − (tⁿ/n!) f^(n)(φ(1 − t)) (x1 − x0) · · · (x1 − x0) − Rn t^{n+1},  ∀t ∈ I,

with (x1 − x0) repeated n times in the nth derivative term. By Propositions 3.12, 3.32, 9.7, 7.23, and 7.65, F is continuous. Clearly, φ is differentiable. By the chain rule and Propositions 9.10, 9.15–9.17, and 9.19, F is differentiable at t, ∀t ∈ I°. Clearly, F(0) = F(1) = 0. By the mean value theorem 9.20, ∃t0 ∈ I° such that 0 = F(1) − F(0) = DF(t0). In DF(t0), each term −(t^i/i!) f^(i)(φ(1 − t))(x1 − x0) · · · (x1 − x0) contributes −(t0^{i−1}/(i − 1)!) f^(i)(φ(1 − t0))(x1 − x0) · · · (x1 − x0) + (t0^i/i!) f^(i+1)(φ(1 − t0))(x1 − x0) · · · (x1 − x0), so the sum telescopes against the derivative of −f(φ(1 − t)) and

0 = (t0ⁿ/n!) f^(n+1)(φ(1 − t0)) (x1 − x0) · · · (x1 − x0) − (n + 1) Rn t0ⁿ

with (x1 − x0) repeated (n + 1) times. Since t0 > 0, we have

Rn = (1/(n + 1)!) f^(n+1)(φ(t̄0)) (x1 − x0) · · · (x1 − x0)

where t̄0 = 1 − t0 ∈ I°.


(ii) By Proposition 7.85, ∃y∗ ∈ Y∗ with ‖y∗‖ ≤ 1 such that ‖Rn‖ = ⟨⟨y∗, Rn⟩⟩. Define G : D̄ → K by

G(t) = ⟨⟨y∗, f(x1) − f(φ(1 − t)) − Σ_{i=1}^n (t^i/i!) f^(i)(φ(1 − t)) (x1 − x0) · · · (x1 − x0)⟩⟩ − ‖Rn‖ t^{n+1},  ∀t ∈ D̄,

with (x1 − x0) repeated i times in the ith term. By Propositions 3.12, 3.32, 9.7, 7.23, and 7.65, G is continuous. Clearly, φ is differentiable. By the chain rule and Propositions 9.10, 9.15–9.17, and 9.19, G is differentiable at t, ∀t ∈ D̃̄. Clearly, G(0) = G(1) = 0.

We will distinguish two exhaustive and mutually exclusive cases: Case 1: K = R; Case 2: K = C.

Case 1: K = R. By the mean value theorem 9.20, ∃t0 ∈ I° such that 0 = G(1) − G(0) = DG(t0). As in (i), the sum telescopes, and we have

0 = ⟨⟨y∗, (t0ⁿ/n!) f^(n+1)(φ(1 − t0)) (x1 − x0) · · · (x1 − x0)⟩⟩ − (n + 1) ‖Rn‖ t0ⁿ

with (x1 − x0) repeated (n + 1) times. Then, we have

‖Rn‖ = (1/(n + 1)!) ⟨⟨y∗, f^(n+1)(φ(1 − t0)) (x1 − x0) · · · (x1 − x0)⟩⟩
     ≤ (1/(n + 1)!) ‖y∗‖ ‖f^(n+1)(φ(1 − t0)) (x1 − x0) · · · (x1 − x0)‖
     ≤ (1/(n + 1)!) ‖f^(n+1)(φ(t̄0))‖ ‖x1 − x0‖^{n+1}

where the last two inequalities follow from Proposition 7.64 and t̄0 = 1 − t0 ∈ I°.


Case 2: K = C. By Lemma 9.22, ∃t0 ∈ I° such that Re(G(1) − G(0)) = Re(DG(t0)) = 0. As in Case 1, the sum telescopes, and we have

0 = Re⟨⟨y∗, (t0ⁿ/n!) f^(n+1)(φ(1 − t0)) (x1 − x0) · · · (x1 − x0)⟩⟩ − (n + 1) ‖Rn‖ t0ⁿ

with (x1 − x0) repeated (n + 1) times. Hence,

‖Rn‖ = (1/(n + 1)!) Re⟨⟨y∗, f^(n+1)(φ(1 − t0)) (x1 − x0) · · · (x1 − x0)⟩⟩
     ≤ (1/(n + 1)!) ‖y∗‖ ‖f^(n+1)(φ(1 − t0)) (x1 − x0) · · · (x1 − x0)‖
     ≤ (1/(n + 1)!) ‖f^(n+1)(φ(t̄0))‖ ‖x1 − x0‖^{n+1}

where the last two inequalities follow from Proposition 7.64 and t̄0 = 1 − t0 ∈ I°. This completes the proof of the theorem. □
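As a numeric illustration (not part of the text) of the remainder bound in statement (ii), take X = Y = R, f = sin, x0 = 0, x1 = 0.5, and n = 2; the choice of function and points is arbitrary. The bound on the right uses the supremum of |f^(3)| over the segment, which dominates ‖f^(n+1)(φ(t̄0))‖ for every t̄0 ∈ I°.

```python
# Taylor remainder check: |R_n| <= sup|f^(n+1)| * |x1 - x0|^(n+1) / (n+1)!
# for f = sin, x0 = 0, x1 = 0.5, n = 2 (the degree-2 polynomial of sin at 0 is x).
import math

x0, x1, n = 0.0, 0.5, 2
taylor2 = x1                      # degree-2 Taylor polynomial of sin at 0
Rn = math.sin(x1) - taylor2       # the remainder R_n

# f^(3)(t) = -cos(t); bound sup |f^(3)| over the segment [x0, x1] on a fine grid.
sup3 = max(abs(-math.cos(x0 + s / 1000.0 * (x1 - x0))) for s in range(1001))
bound = sup3 * abs(x1 - x0) ** (n + 1) / math.factorial(n + 1)
print(Rn, bound)
```

Here |R_n| ≈ 0.0206 against a bound of 0.125/6 ≈ 0.0208, so the estimate of the theorem is nearly tight for this example.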

9.5 Mapping Theorems

Definition 9.49 Let X := (X, ρ) be a metric space, S ⊆ X, and T : S → X. T is said to be a contraction mapping on S if T(S) ⊆ S and ∃α ∈ [0, 1) ⊂ R such that ∀x1, x2 ∈ S, we have ρ(T(x1), T(x2)) ≤ α ρ(x1, x2). Then, α is called a contraction index for T.

Theorem 9.50 (Contraction Mapping Theorem) Let S ≠ ∅ be a closed subset of a complete metric space X := (X, ρ) and T be a contraction mapping on S with contraction index α ∈ [0, 1) ⊂ R. Then, the following statements hold:


(i) ∃! x0 ∈ S such that x0 = T(x0).
(ii) ∀x1 ∈ S, recursively define xn+1 = T(xn), ∀n ∈ N. Then, lim_{n∈N} xn = x0.
(iii) ρ(xn, x0) ≤ (α^{n−1}/(1 − α)) ρ(x2, x1), ρ(xn, x0) ≤ (α/(1 − α)) ρ(xn, xn−1), and ρ(xn, x0) ≤ α ρ(xn−1, x0), ∀n ∈ {2, 3, . . .}.

Proof Fix any x1 ∈ S ≠ ∅. Recursively define xn+1 = T(xn) ∈ S, ∀n ∈ N. Then, ρ(xn, xn−1) = ρ(T(xn−1), T(xn−2)) ≤ α ρ(xn−1, xn−2) ≤ · · · ≤ α^{n−2} ρ(x2, x1), ∀n ∈ {3, 4, . . .}. Therefore, (xn)_{n=1}^∞ ⊆ S is a Cauchy sequence and converges to x0 ∈ S by X being complete, S being closed, and Proposition 4.39. Clearly, T is continuous. Then, by Proposition 3.66, x0 = T(x0). ∀x̄ ∈ S such that x̄ = T(x̄), we have ρ(x0, x̄) ≤ α ρ(x0, x̄) and hence ρ(x0, x̄) = 0. Hence, x̄ = x0. Hence, statements (i) and (ii) are true.

Note that, by Propositions 3.66 and 4.30, ∀n ∈ {2, 3, . . .},

ρ(xn, x0) = lim_{m∈N} ρ(xn, xm) ≤ Σ_{i=n}^∞ ρ(xi+1, xi) ≤ Σ_{i=n+1}^∞ α^{i−2} ρ(x2, x1) = (α^{n−1}/(1 − α)) ρ(x2, x1)

ρ(xn, x0) ≤ Σ_{i=n}^∞ ρ(xi+1, xi) ≤ Σ_{i=1}^∞ α^i ρ(xn, xn−1) = (α/(1 − α)) ρ(xn, xn−1)

ρ(xn, x0) = ρ(T(xn−1), T(x0)) ≤ α ρ(xn−1, x0)

This completes the proof of the theorem. □
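The iteration and the a priori error bound (iii) are easy to watch numerically. The sketch below (not from the text) uses T(x) = cos x on S = [0, 1], a contraction with index α = sin 1 < 1 since |T′(x)| = |sin x| ≤ sin 1 on S; the starting point and iteration count are arbitrary choices.

```python
# Fixed-point iteration for T(x) = cos(x) on S = [0, 1], checking the
# a priori bound of Theorem 9.50(iii):
#   rho(x_n, x0*) <= alpha^(n-1) / (1 - alpha) * rho(x_2, x_1).
import math

alpha = math.sin(1.0)          # a contraction index for cos on [0, 1]
x = [0.2]                      # x_1 in S
for _ in range(60):
    x.append(math.cos(x[-1]))  # x_{k+1} = T(x_k)
fixed = x[-1]                  # essentially the unique fixed point x0*

n = 10                                                  # check the bound at n = 10
apriori = alpha ** (n - 1) / (1 - alpha) * abs(x[1] - x[0])
print(abs(x[n - 1] - fixed), apriori)
```

After 60 iterations the residual |x − cos x| is below 1e-8, and the actual error at n = 10 sits comfortably inside the a priori estimate.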

Lemma 9.51 Let X and Y be normed linear spaces over K, D ⊆ X, x0 ∈ D, and f : D → Y be C1 at x0. Assume that D is locally convex at x0. Then, ∀ε ∈ (0, ∞) ⊂ R, ∃δ ∈ (0, ∞) ⊂ R such that, ∀x1, x2 ∈ D ∩ B_X(x0, δ), we have ‖f(x1) − f(x2) − f^(1)(x0)(x1 − x2)‖ ≤ ε‖x1 − x2‖.

Proof By the assumption, ∃δ0 ∈ (0, ∞) ⊂ R such that D ∩ B_X(x0, δ0) =: D̃ is convex. ∀ε ∈ (0, ∞) ⊂ R, by f being C1 at x0, ∃δ ∈ (0, δ0] ⊂ R such that ∀x ∈ D ∩ B_X(x0, δ) =: D̄, we have ‖f^(1)(x) − f^(1)(x0)‖ < ε. Define γ : D → Y by γ(x) = f(x) − f^(1)(x0)(x − x0), ∀x ∈ D. By Propositions 9.45, 9.38, 9.41, and 9.44, γ is C1 at x0 and γ^(1)(x) = f^(1)(x) − f^(1)(x0), ∀x ∈ D̄. Clearly, D̄ is convex. Then, ∀x1, x2 ∈ D̄, by the mean value theorem 9.23 and Proposition 7.64, we have ‖f(x1) − f(x2) − f^(1)(x0)(x1 − x2)‖ = ‖γ(x1) − γ(x2)‖ ≤ ‖γ^(1)(t0 x1 + (1 − t0)x2)(x1 − x2)‖ ≤ ε‖x1 − x2‖, for some t0 ∈ (0, 1) ⊂ R. This completes the proof of the lemma. □

Theorem 9.52 (Injective Mapping Theorem) Let X and Y be Banach spaces over K, D ⊆ X, and F : D → Y be C1 at x0 ∈ D°. Assume that F^(1)(x0) ∈ B(X, Y) is injective and M := R(F^(1)(x0)) ⊆ Y is closed. Then, ∃δ ∈ (0, ∞) ⊂ R with U := B_X(x0, δ) ⊆ D such that F|_U : U → F(U) is bijective and admits a continuous inverse Fi : F(U) → U.

Proof By Propositions 7.13 and 4.39, M ⊆ Y is a Banach space. Then, F^(1)(x0) : X → M is bijective. By the Open Mapping Theorem 7.103, the inverse A of F^(1)(x0) : X → M belongs to B(M, X). ∀h ∈ X, we have ‖h‖ = ‖A F^(1)(x0) h‖ ≤ ‖A‖ ‖F^(1)(x0) h‖. Then, ∃r ∈ (0, ∞) ⊂ R such that r‖A‖ ≤ 1 and r‖h‖ ≤ ‖F^(1)(x0) h‖, ∀h ∈ X. By Lemma 9.51, ∃δ ∈ (0, ∞) ⊂ R with U := B_X(x0, δ) ⊆ D such that ∀x1, x2 ∈ U, we have ‖F(x1) − F(x2) − F^(1)(x0)(x1 − x2)‖ ≤ r‖x1 − x2‖/2. ∀x1, x2 ∈ U, r‖x1 − x2‖ ≤ ‖F^(1)(x0)(x1 − x2)‖ ≤ ‖F(x1) − F(x2)‖ + ‖F(x1) − F(x2) − F^(1)(x0)(x1 − x2)‖ ≤ ‖F(x1) − F(x2)‖ + r‖x1 − x2‖/2. This implies that ‖F(x1) − F(x2)‖ ≥ r‖x1 − x2‖/2. Hence, F|_U : U → F(U) is injective and surjective. Then, F|_U : U → F(U) is bijective and admits an inverse Fi : F(U) → U. ∀y1, y2 ∈ F(U), we have ‖Fi(y1) − Fi(y2)‖ ≤ (2/r)‖F(Fi(y1)) − F(Fi(y2))‖ = (2/r)‖y1 − y2‖. Hence, Fi is uniformly continuous. This completes the proof of the theorem. □

Theorem 9.53 (Surjective Mapping Theorem) Let X and Y be Banach spaces over K, D ⊆ X, F : D → Y, x0 ∈ D°, and y0 := F(x0) ∈ Y. Assume that F is C1 at x0 and F^(1)(x0) ∈ B(X, Y) is surjective. Then, ∃r ∈ (0, ∞) ⊂ R, ∃δ ∈ (0, ∞) ⊂ R, and ∃c1 ∈ [0, ∞) ⊂ R with c1 δ ≤ r such that ∀ȳ ∈ B_Y(y0, δ/2), ∀x̄ ∈ B_X(x0, r/2) with ȳ = F(x̄), ∀y ∈ B_Y(ȳ, δ/2), ∃x ∈ B_X(x0, r) ⊆ D with ‖x − x̄‖ ≤ c1‖y − ȳ‖, we have y = F(x).

Proof Let M := N(F^(1)(x0)), which is a closed subspace by Proposition 7.68. By Proposition 7.45, the quotient space X/M is a Banach space. By Proposition 7.70, F^(1)(x0) = A ◦ φ, where φ : X → X/M is the natural homomorphism and A ∈ B(X/M, Y) is injective. Since F^(1)(x0) is surjective, A is bijective and, by the Open Mapping Theorem 7.103, A⁻¹ ∈ B(Y, X/M). Let c1 := 4‖A⁻¹‖ ∈ [0, ∞) ⊂ R.
Define γ : D → X/M by γ (x) = A−1 (F (x) − F (1) (x0 )(x − x0 )), ∀x ∈ D. By Propositions 9.45, 9.38, 9.41, 9.34, and 9.44, γ is C1 at x0 and γ (1)(x) = A−1 (F (1) (x) − F (1) (x0 )), ∀x ∈ BX (x0 , r0 ) ⊆ D, for some r0 ∈ (0, ∞) ⊂ R. Clearly, γ (1) (x0 ) = ϑB(X,X/M) . Then, by Lemma 9.51, ∃r ∈ (0, r0 ] ⊂ R such that ∀x1 , x2 ∈ BX (x0 , r) ⊆ D, we have γ (x1 ) − γ (x2 ) ≤ x1 − x2 /4. Let δ ∈ (0, ∞) ⊂ R be such that c1 δ ≤ r. Fix any y¯ ∈ BY (y0 , δ/2) and any x¯ ∈ BX (x0 , r/2) with y¯ = F (x). ¯ Fix any y ∈ BY (y, ¯ δ/2). ¯ [xk+1 ] = [xk ] + A−1 (y − F (xk )) and select Recursively define x1 := x, xk+1 ∈ [xk+1 ] such that xk+1 − xk  ≤ 2[xk+1 ] − [xk ], ∀k ∈ N, where xk . Clearly, [xk ] = φ(xk ) = xk + M is the coset containing 6 6 6x1 ∈6 BX (x0 , r). Note that x2 − x1  ≤ 2[x2 ] − [x1 ] = 26A−1 (y − y) ¯ 6 ≤ 26A−1 6y − y ¯ = ¯ < r/4, where the second inequality follows from Proposition 7.64. c1 y − y/2 Then, x2 ∈ BX (x0 , r). Assume that x1 , . . . , xk ∈ BX (x0 , r) for some k 6 ∈ {2, 3, . . .}. Note that ∀i ∈ {2, . . . , k}, xi+1 − xi  ≤ 2[x 6 i+1 ] − [xi ] = 26A−1 (y − F (xi ) + Aφ(xi )) − A−1 (y − F (xi−1 ) + Aφ(xi−1 ))6 = 2γ (xi ) −γ (xi−1 ) ≤ xi − xi−1 /2. Then, xi+1 − xi  ≤ x2 − x1 /2i−1 . Hence,

296

9 Differentiation in Banach Spaces

-k -k i−1 = (2 − 1/2k−1 ) xk+1 − x ¯ ≤ i=1 xi+1 − xi  ≤ i=1 x2 − x1 /2 x2 − x1  < r/2. This implies that xk+1 ∈ BX (x0 , r). Inductively, we have k−1 , ∀k ∈ N. Hence, (x )∞ (xk )∞ k k=0 k=0 ⊆ BX (x0 , r) and xk+1 − xk  ≤ x2 − x1 /2 is a Cauchy sequence. By the completeness of X, we have limk∈N xk = x ∈ X. Note that, by Propositions 3.66 and 4.30, x − x ¯ = limk∈N xk − x1  ≤ limk∈N (2 − 1/2k−2 )x2 − x1  = 2x2 − x1  < r/2. Hence, x ∈ BX (x0 , r). By the differentiability of F and Propositions 9.7, 3.66, and 7.69, we have φ(x) = limk∈N φ(xk+1 ) = limk∈N (A−1 (y − F (xk )) + φ(xk )) = φ(x) + A−1 (y − F (x)). ¯ This This implies that y = F (x). Note that x − x ¯ ≤ 2x2 − x1  ≤ c1 y − y. completes the proof of the theorem. ' & Theorem 9.54 (Open Mapping Theorem) Let X and Y be Banach spaces over K, D ⊆ X be open, and F : D → Y be C1 . Assume that F (1) (x) ∈ B(X, Y) is surjective, ∀x ∈ D. Then, F is an open mapping. Proof Fix any open subset U ⊆ D, where U is open in the subset topology of D. Since D is open, then U is open in X. We will show that F (U ) is an open set in Y. Fix any y0 ∈ F (U ); there exists x0 ∈ U such that y0 = F (x0 ). Then, ∃r ∈ (0, ∞) ⊂ R such that BX (x0 , r) ⊆ U . It is easy to check that all assumptions of Surjective Mapping Theorem are satisfied at x0 . Then, there exist an open set V ⊆ Y with y0 ∈ V and c1 ∈ [0, ∞) ⊂ R such that ∀y¯ ∈ V , ∃x¯ ∈ D with x¯ − x0  ≤ c1 y¯ − y0 , we have y¯ = F (x). ¯ Take δ ∈ (0, ∞) ⊂ R such that c1 δ ≤ r and BY (y0 , δ) ⊆ V . Then, ∀y¯ ∈ BY (y0 , δ), x¯ − x0  ≤ c1 y¯ − y0  < r. Then, x¯ ∈ BX (x0 , r) ⊆ U and y¯ ∈ F (U ). Hence, BY (y0 , δ) ⊆ F (U ). Therefore, y0 ∈ (F (U ))◦ . By the arbitrariness of y0 , we have F (U ) is open in Y. By the arbitrariness of U , F is an open mapping. This completes the proof of the theorem. ' & Proposition 9.55 Let X and Y be Banach spaces 6 6over K and A ∈ B(X, Y) be bijective. 
Then, ∀T ∈ B(X, Y) with T − A6A−1 6 < 1, we have T is bijective 6 −1 62 6 6 6A 6 T −A and 6T −1 − A−1 6 ≤ 1−A−1 T −A . Proof By Open Mapping Theorem 7.103, A−1 ∈ B(Y, X). We will first prove the result for the special case Y = X and A = idX . We will distinguish two exhaustive and mutually exclusive cases: Case 1: X is a singleton set; Case 2: ∃x¯ ∈ X such that x¯ = ϑX . 6 −1 6 6 6 = A = 0. ∀T ∈ B(X, X), we have Case 1: X is a singleton set. Then, 6 −1A 6 6 T = idX . Then, T is bijective and T − idX 6 = 0. The 6 result 6 holds for this case. Case 2: ∃x¯ ∈ X such that x¯ = ϑX . Then, A = 6A−1 6 = 1. ∀T ∈ B(X, X) with T − idX  < 1, let Δ := T − idX . We will show that T is bijective. ∀x1 , x2 ∈ X with T (x1 ) = T (x2 ), we have x1 + Δ(x1 ) = x2 + Δ(x2 ), which implies that Δ(x1 − x2 ) = x2 − x1 . By Proposition 7.64, we have Δx1 − x2  ≥ x1 − x2 . Since Δ < 1, then x1 − x2  = 0 and x1 = x2 . Therefore, T is injective. ∀x0 ∈ X, define φ : X → X by φ(x) = x0 − Δ(x), ∀x ∈ X. Clearly, φ is a contraction mapping on X with contraction index Δ. By Contraction Mapping Theorem, there


exists a unique x̄ ∈ X such that x̄ = φ(x̄). Then, x0 = x̄ + Δ(x̄) = T(x̄). Hence, T is surjective. Then, T is bijective. By Open Mapping Theorem 7.103, T⁻¹ ∈ B(X, X). ∀y ∈ Y, let x = T⁻¹y. Then, y = Tx = x + Δx and x = y − Δx. By Proposition 7.64, we have ‖x‖ ≤ ‖y‖ + ‖Δ‖‖x‖ and ‖x‖ ≤ ‖y‖/(1 − ‖Δ‖). By the arbitrariness of y, we have ‖T⁻¹‖ ≤ 1/(1 − ‖Δ‖). This further implies that ‖T⁻¹ − idX‖ = ‖T⁻¹(idX − T)‖ ≤ ‖T⁻¹‖‖Δ‖ ≤ ‖Δ‖/(1 − ‖Δ‖). The result holds in this case. Hence, the result holds for the special case Y = X and A = idX.

Now consider the general case. ∀T ∈ B(X, Y) with ‖T − A‖‖A⁻¹‖ < 1, we have T̄ := A⁻¹T ∈ B(X, X). Note that ‖T̄ − idX‖ = ‖A⁻¹(T − A)‖ ≤ ‖A⁻¹‖‖T − A‖ < 1. By the special case, we have T̄ is bijective and ‖T̄⁻¹ − idX‖ ≤ ‖T̄ − idX‖/(1 − ‖T̄ − idX‖). Then, T is bijective and, by Proposition 7.64,

‖T⁻¹ − A⁻¹‖ = ‖(T̄⁻¹ − idX)A⁻¹‖ ≤ ‖A⁻¹‖ · ‖T̄ − idX‖/(1 − ‖T̄ − idX‖) ≤ ‖A⁻¹‖²‖T − A‖/(1 − ‖A⁻¹‖‖T − A‖)

where the last inequality follows from ‖T̄ − idX‖ ≤ ‖A⁻¹‖‖T − A‖. This completes the proof of the proposition. ⊓⊔
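The perturbation bound of Proposition 9.55 can be checked numerically in the finite-dimensional case B(Rⁿ, Rⁿ), where the operator norm is the spectral norm. The following sketch (assuming NumPy; the particular matrices and sizes are illustrative, not from the text) verifies both the invertibility hypothesis and the bound on ‖T⁻¹ − A⁻¹‖:

```python
import numpy as np

# Proposition 9.55, finite-dimensional sketch: if ||T - A|| * ||A^{-1}|| < 1,
# then T is invertible and
#   ||T^{-1} - A^{-1}|| <= ||A^{-1}||^2 ||T - A|| / (1 - ||A^{-1}|| ||T - A||).
# The operator norm on B(R^n, R^n) is the spectral norm (ord=2).

def op_norm(M):
    return np.linalg.norm(M, ord=2)

rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # a small perturbation of I, so A is invertible
A_inv = np.linalg.inv(A)

# Perturb A gently enough that the hypothesis ||T - A|| ||A^{-1}|| < 1 holds.
T = A + 0.01 * rng.standard_normal((3, 3))
assert op_norm(T - A) * op_norm(A_inv) < 1

T_inv = np.linalg.inv(T)   # T is invertible, so this succeeds
lhs = op_norm(T_inv - A_inv)
rhs = op_norm(A_inv) ** 2 * op_norm(T - A) / (1 - op_norm(A_inv) * op_norm(T - A))
assert lhs <= rhs
```

The same inequality underlies the openness of the set of invertible operators in Proposition 9.56.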

Proposition 9.56 Let X and Y be Banach spaces over K, D := {L ∈ B(X, Y) | L is bijective}, and f : D → B(Y, X) be given by f(A) = A⁻¹, ∀A ∈ D. Then, D is open in B(X, Y), f is C∞, and f^(1)(A)(Δ) = −A⁻¹ΔA⁻¹, ∀A ∈ D, ∀Δ ∈ B(X, Y).

Proof By Proposition 9.55 and Open Mapping Theorem 7.103, D is open and f is continuous. ∀A ∈ D, span(A_D(A)) = B(X, Y) since D is open and A ∈ D°. Define L : B(X, Y) → B(Y, X) by L(Δ) = −A⁻¹ΔA⁻¹, ∀Δ ∈ B(X, Y). Clearly, L is a linear operator. Note that

‖L‖ = sup_{Δ∈B(X,Y), ‖Δ‖≤1} ‖L(Δ)‖ ≤ ‖A⁻¹‖²

where the inequality follows from Proposition 7.64. Hence, L is a bounded linear operator.


∀ε ∈ (0, ∞) ⊂ R, by the continuity of f, ∃δ ∈ (0, ∞) ⊂ R such that ‖f(Ā) − f(A)‖ < ε, ∀Ā ∈ B_{B(X,Y)}(A, δ). Then, ∀Ā ∈ B_{B(X,Y)}(A, δ), we have

‖f(Ā) − f(A) − L(Ā − A)‖ = ‖Ā⁻¹ − A⁻¹ + A⁻¹(Ā − A)A⁻¹‖ = ‖−A⁻¹(Ā − A)Ā⁻¹ + A⁻¹(Ā − A)A⁻¹‖ ≤ ‖A⁻¹‖‖Ā − A‖‖f(Ā) − f(A)‖ ≤ ε‖A⁻¹‖‖Ā − A‖

where the first inequality follows from Proposition 7.64. Hence, we have f^(1)(A) = L. Then, f is differentiable. Note that f^(1)(A) = −f(A)ro(f(A)), ∀A ∈ D. By Propositions 9.42, 9.7, 3.12, and 3.32, f^(1) is continuous. Hence, f is C1. Assume that f is Ck for some k ∈ N. We will show that f is Ck+1; then f is C∞. Note that f^(1)(A) = −f(A)ro(f(A)), ∀A ∈ D. By Propositions 9.45, 9.42, and 9.44, f^(1) is Ck. Then, f is Ck+1. This completes the proof of the proposition. ⊓⊔

Theorem 9.57 (Inverse Function Theorem) Let X and Y be Banach spaces over K, D ⊆ X, and F : D → Y be C1 at x0 ∈ D°. Assume that F^(1)(x0) ∈ B(X, Y) is bijective. Then, ∃ open set U ⊆ D with x0 ∈ U and ∃ open set V ⊆ Y with y0 := F(x0) ∈ V such that:

(i) F|U : U → V is bijective.
(ii) The inverse mapping Fi : V → U of F|U is differentiable, Fi^(1) : V → B(Y, X) is given by Fi^(1)(y) = (F^(1)(Fi(y)))⁻¹, ∀y ∈ V, and Fi^(1) is continuous at y0.
(iii) If F is k-times differentiable for some k ∈ N, then Fi is k-times differentiable.
(iv) If F is Ck at x0 for some k ∈ N ∪ {∞}, then Fi is Ck at y0.

Proof By Open Mapping Theorem 7.103, (F^(1)(x0))⁻¹ ∈ B(Y, X). By F being C1 at x0 ∈ D°, ∃r̄ ∈ (0, ∞) ⊂ R such that F^(1)(x) exists, ∀x ∈ BX(x0, r̄) ⊆ D. Define T : BX(ϑX, r̄) → X by T(x) = (F^(1)(x0))⁻¹(F(x + x0) − y0), ∀x ∈ BX(ϑX, r̄) ⊆ D − x0. Clearly, T(ϑX) = ϑX. By Propositions 9.45, 9.38, 9.34, and 9.44, T is C1 at ϑX, T is differentiable, and T^(1)(x) = (F^(1)(x0))⁻¹F^(1)(x + x0), ∀x ∈ BX(ϑX, r̄). Clearly, T^(1)(ϑX) = idX. Define ψ : BX(ϑX, r̄) → X by ψ(x) = T(x) − x. Then, by Propositions 9.38, 9.45, and 9.44, ψ is differentiable, ψ is C1 at ϑX, and ψ^(1)(x) = T^(1)(x) − idX, ∀x ∈ BX(ϑX, r̄). Clearly, ψ(ϑX) = ϑX and ψ^(1)(ϑX) = ϑ_{B(X,X)}. Fix any α ∈ (0, 1) ⊂ R. Then, ∃r1 ∈ (0, r̄) ⊂ R such that B̄X(ϑX, r1) ⊆ D − x0 and ‖ψ^(1)(x)‖ ≤ α, ∀x ∈ B̄X(ϑX, r1). ∀x1, x2 ∈ B̄X(ϑX, r1), by mean value theorem, ‖ψ(x1) − ψ(x2)‖ ≤ sup_{t0∈(0,1)⊂R} ‖ψ^(1)(t0x1 + (1 − t0)x2)(x1 − x2)‖ ≤ α‖x1 − x2‖, where the last inequality follows from Proposition 7.64. ∀x̄ ∈ BX(ϑX, (1 − α)r1), define φ : B̄X(ϑX, r1) → X by φ(x) = x̄ − ψ(x), ∀x ∈ B̄X(ϑX, r1). ∀x ∈ B̄X(ϑX, r1), ‖φ(x)‖ ≤ ‖x̄‖ + ‖ψ(x)‖ < (1 − α)r1 + ‖ψ(x) − ψ(ϑX)‖ ≤ (1 − α)r1 + α‖x‖ ≤ r1. Hence, φ : B̄X(ϑX, r1) → BX(ϑX, r1) ⊆ B̄X(ϑX, r1). It is easy to see that φ is a contraction mapping


with contraction index α. By Contraction Mapping Theorem, ∃! x̂ ∈ B̄X(ϑX, r1) such that x̂ = φ(x̂) ∈ BX(ϑX, r1), which is equivalent to x̄ = T(x̂). Let Ū := T_inv(BX(ϑX, (1 − α)r1)) ∩ BX(ϑX, r1) and V̄ := BX(ϑX, (1 − α)r1). Note that Ū and V̄ are open sets in X since T is continuous by Proposition 9.7. Then, T|Ū : Ū → V̄ is bijective. Since T(ϑX) = ϑX, then ϑX ∈ Ū. Hence, there exists an inverse mapping Ti : V̄ → Ū of T|Ū. Let U := Ū + x0 and V := F^(1)(x0)(V̄) + y0. Clearly, U and V are open sets in X and Y, respectively. Note that F|U(x) = F^(1)(x0) T|Ū(x − x0) + y0, ∀x ∈ U. Then, F|U : U → V is bijective, whose inverse function is Fi : V → U. The inverse function Fi is given by Fi(y) = Ti((F^(1)(x0))⁻¹(y − y0)) + x0, ∀y ∈ V. Hence, the statement (i) holds.

Next, we will show that Fi : V → U is differentiable. Note that ∀x ∈ V̄, x = T(Ti(x)) = Ti(x) + ψ(Ti(x)) and Ti(x) = x − ψ(Ti(x)). ∀x1, x2 ∈ V̄, we have Ti(x1), Ti(x2) ∈ Ū ⊆ BX(ϑX, r1) and

‖Ti(x1) − Ti(x2)‖ = ‖x1 − x2 − ψ(Ti(x1)) + ψ(Ti(x2))‖ ≤ ‖x1 − x2‖ + ‖ψ(Ti(x1)) − ψ(Ti(x2))‖ ≤ ‖x1 − x2‖ + α‖Ti(x1) − Ti(x2)‖

which further implies that ‖Ti(x1) − Ti(x2)‖ ≤ ‖x1 − x2‖/(1 − α). Therefore, Ti is continuous. By Propositions 3.12, 7.23, and 3.32, Fi is continuous. We need the following intermediate result.

Claim 9.57.1 ∀x ∈ U, let y = F(x). Then, Fi is differentiable at F(x) and Fi^(1)(y) = (F^(1)(x))⁻¹.

Proof of Claim Fix any x ∈ U and let y = F(x). Then, x − x0 ∈ Ū ⊆ BX(ϑX, r1) and ‖ψ^(1)(x − x0)‖ ≤ α. Note that T^(1) = idX + ψ^(1) and T^(1)(x − x0) = idX + ψ^(1)(x − x0). By Proposition 9.55, T^(1)(x − x0) is bijective and ‖(T^(1)(x − x0))⁻¹‖ < ∞. Note that F^(1)(x) = F^(1)(x0)T^(1)(x − x0). Then, F^(1)(x) is bijective and c1 := ‖(F^(1)(x))⁻¹‖ < ∞. ∀ε ∈ (0, ∞) ⊂ R with c1ε < 1, by the differentiability of F at x, ∃δ1 ∈ (0, ∞) ⊂ R such that ∀h ∈ X with ‖h‖ < δ1, we have ‖F(x + h) − F(x) − F^(1)(x)(h)‖ ≤ ε‖h‖. By the continuity of Fi, ∃δ ∈ (0, ∞) ⊂ R such that ∀u ∈ Y with ‖u‖ < δ, we have y + u ∈ V and ‖Fi(y + u) − Fi(y)‖ < δ1. ∀u ∈ Y with ‖u‖ < δ, let β := ‖Fi(y + u) − Fi(y) − (F^(1)(x))⁻¹u‖ ≥ 0. Let h := Fi(y + u) − x ∈ X. Then, ‖h‖ = ‖Fi(y + u) − Fi(y)‖ < δ1. Note that

β = ‖x + h − x − (F^(1)(x))⁻¹u‖ = ‖(F^(1)(x))⁻¹(u − F^(1)(x)h)‖ ≤ ‖(F^(1)(x))⁻¹‖‖u − F^(1)(x)h‖ = c1‖F(x + h) − F(x) − F^(1)(x)h‖ ≤ c1ε‖h‖ = c1ε‖Fi(y + u) − Fi(y)‖


≤ c1ε(‖Fi(y + u) − Fi(y) − (F^(1)(x))⁻¹u‖ + ‖(F^(1)(x))⁻¹u‖) ≤ c1ε(β + c1‖u‖)

Then, β ≤ (c1²ε/(1 − c1ε))‖u‖. Hence, Fi is differentiable at y and Fi^(1)(y) = (F^(1)(x))⁻¹. This completes the proof of the claim. ⊓⊔

Then, ∀y ∈ V, Fi^(1)(y) = (F^(1)(Fi(y)))⁻¹. By Propositions 9.56 and 3.12, the continuity of Fi, and the continuity of F^(1) at x0, we have Fi^(1) is continuous at y0. Then, the statement (ii) holds.

For (iii), we will use mathematical induction on k. 1° k = 1. The result holds by (ii). 2° Assume that the result holds for k = k̄ ∈ N. 3° Consider the case k = k̄ + 1. By inductive assumption, Fi is k̄-times differentiable. Clearly, F^(1) is k̄-times differentiable. By (ii) and Propositions 9.45 and 9.56, Fi^(1) is k̄-times differentiable. Then, Fi is (k̄ + 1)-times differentiable. This completes the induction process and the proof of the statement (iii).

For (iv), we will use mathematical induction on k to show that the result holds when k ∈ N. 1° k = 1. The result holds by (ii). 2° Assume that the result holds for k = k̄ ∈ N. 3° Consider the case k = k̄ + 1. Clearly, F^(1) is Ck̄ at x0. By the inductive assumption, Fi is Ck̄ at y0. By (ii) and Propositions 9.45 and 9.56, Fi^(1) is Ck̄ at y0. Hence, Fi is Ck̄+1 at y0. This completes the induction process.

When k = ∞: ∃δ ∈ (0, ∞) ⊂ R such that F is Ci at x, ∀x ∈ BX(x0, δ) ⊆ D, ∀i ∈ N. By taking r1 < δ in the proof of (i), we have U ⊆ BX(x0, δ). Then, by the induction conclusion, Fi is Ci at y, ∀y ∈ F(U) = V, ∀i ∈ N. Hence, Fi is C∞ at y0. This completes the proof of the theorem. ⊓⊔

Theorem 9.58 (Implicit Function Theorem) Let X := (X, O) be a topological space, Y and Z be Banach spaces over K, D ⊆ X × Y, and F : D → Z be continuous. Assume that F is partial differentiable with respect to y, ∂F/∂y is continuous at (x0, y0) ∈ D°, F(x0, y0) = ϑZ, and ∂F/∂y(x0, y0) ∈ B(Y, Z) is bijective. Then, the following statements hold:

(i) There exists an open set U0 ∈ O with x0 ∈ U0 and r1 ∈ (0, ∞) ⊂ R such that U0 × BY(y0, r1) ⊆ D and ∀x ∈ U0, ∃! y ∈ BY(y0, r1) satisfying F(x, y) = ϑZ. This defines a function φ : U0 → BY(y0, r1) by φ(x) = y, ∀x ∈ U0. Then, φ is continuous.
(ii) ∀(x, y) ∈ U0 × BY(y0, r1), ∂F/∂y(x, y) is bijective and ‖(∂F/∂y(x, y))⁻¹‖ < +∞.


Proof By Open Mapping Theorem 7.103, (∂F/∂y(x0, y0))⁻¹ ∈ B(Z, Y). Define a mapping ψ : D → Y by

ψ(x, y) = y − (∂F/∂y(x0, y0))⁻¹ F(x, y);  ∀(x, y) ∈ D

Note that ψ(x0, y0) = y0. Then, by Propositions 7.23, 3.12, 3.27, and 3.32, ψ is continuous. By the partial differentiability of F with respect to y, chain rule, and Propositions 9.41, 9.15, and 9.19, ψ is partial differentiable with respect to y and

∂ψ/∂y(x, y) = idY − (∂F/∂y(x0, y0))⁻¹ ∂F/∂y(x, y);  ∀(x, y) ∈ D

By the continuity of ∂F/∂y at (x0, y0), then ∂ψ/∂y is continuous at (x0, y0) ∈ D° and ∂ψ/∂y(x0, y0) = ϑ_{B(Y,Y)}. Fix any α ∈ (0, 1) ⊂ R. Then, ∃U1 ∈ O with x0 ∈ U1 and ∃r1 ∈ (0, ∞) ⊂ R such that U1 × B̄Y(y0, r1) ⊆ D and ‖∂ψ/∂y(x, y)‖ ≤ α, ∀(x, y) ∈ U1 × B̄Y(y0, r1). By the continuity of ψ, ∃U0 ∈ O with x0 ∈ U0 ⊆ U1 such that ‖ψ(x, y0) − y0‖ = ‖ψ(x, y0) − ψ(x0, y0)‖ < (1 − α)r1, ∀x ∈ U0. Fix any x ∈ U0, define mapping γx : B̄Y(y0, r1) → Y by γx(y) = ψ(x, y), ∀y ∈ B̄Y(y0, r1). We will show that γx is a contraction mapping with contraction index α. ∀y ∈ B̄Y(y0, r1), we have

‖γx(y) − y0‖ = ‖ψ(x, y) − ψ(x0, y0)‖ ≤ ‖ψ(x, y) − ψ(x, y0)‖ + ‖ψ(x, y0) − ψ(x0, y0)‖ < sup_{t∈(0,1)⊂R} ‖∂ψ/∂y(x, ty + (1 − t)y0)(y − y0)‖ + (1 − α)r1 ≤ sup_{t∈(0,1)⊂R} ‖∂ψ/∂y(x, ty + (1 − t)y0)‖‖y − y0‖ + (1 − α)r1 ≤ αr1 + (1 − α)r1 = r1

where the second inequality follows from mean value theorem 9.23 and the third inequality follows from Proposition 7.64. Then, γx : B̄Y(y0, r1) → BY(y0, r1) ⊆ B̄Y(y0, r1). ∀y1, y2 ∈ B̄Y(y0, r1), we have

‖γx(y1) − γx(y2)‖ = ‖ψ(x, y1) − ψ(x, y2)‖ ≤ sup_{t∈(0,1)⊂R} ‖∂ψ/∂y(x, ty1 + (1 − t)y2)(y1 − y2)‖
≤ sup_{t∈(0,1)⊂R} ‖∂ψ/∂y(x, ty1 + (1 − t)y2)‖‖y1 − y2‖ ≤ α‖y1 − y2‖

where the first inequality follows from mean value theorem 9.23 and the second inequality follows from Proposition 7.64. Hence, γx is a contraction mapping with contraction index α. By Contraction Mapping Theorem, ∃! y ∈ B̄Y(y0, r1) such that y = γx(y) ∈ BY(y0, r1) and y = lim_{n∈N} γx,n(y0), where γx,n(y0) is recursively defined by γx,1(y0) = y0 and γx,k+1(y0) = γx(γx,k(y0)), ∀k ∈ N, and ‖γx,n(y0) − y‖ ≤ (α^{n−1}/(1 − α))‖γx(y0) − y0‖ < r1α^{n−1}, ∀n ∈ {2, 3, …}. By γx(y) = y, we conclude that F(x, y) = ϑZ. Hence, ∀x ∈ U0, ∃! y ∈ BY(y0, r1) such that F(x, y) = ϑZ, since F(x, y) = ϑZ ⇔ y = γx(y). Then, we may define φ : U0 → BY(y0, r1) by φ(x) = y = lim_{n∈N} γx,n(y0), ∀x ∈ U0. Hence, F(x, φ(x)) = ϑZ, ∀x ∈ U0.

Next, we show that φ is continuous. Fix any x̄ ∈ U0. ∀ε ∈ (0, ∞) ⊂ R, ∃n0 ∈ N with n0 > 1 such that α^{n0−1}r1 < ε/3. By the continuity of ψ, we have that γx,n0(y0) is continuous with respect to x, that is, ∃Ū ∈ O with x̄ ∈ Ū ⊆ U0 such that ∀x1 ∈ Ū, we have ‖γ_{x1,n0}(y0) − γ_{x̄,n0}(y0)‖ < ε/3. Then,

‖φ(x1) − φ(x̄)‖ ≤ ‖φ(x1) − γ_{x1,n0}(y0)‖ + ‖γ_{x1,n0}(y0) − γ_{x̄,n0}(y0)‖ + ‖γ_{x̄,n0}(y0) − φ(x̄)‖ ≤ r1α^{n0−1} + ε/3 + r1α^{n0−1} < ε

Hence, φ is continuous. Thus, the statement (i) is proved.

(ii) Note that, ∀(x, y) ∈ U1 × B̄Y(y0, r1), we have ‖∂ψ/∂y(x, y)‖ ≤ α. By Proposition 9.55, idY − ∂ψ/∂y(x, y) is bijective with continuous inverse. Note that idY − ∂ψ/∂y(x, y) = (∂F/∂y(x0, y0))⁻¹ ∂F/∂y(x, y). Therefore, ∂F/∂y(x, y) is bijective with continuous inverse. This completes the proof of the theorem. ⊓⊔

Theorem 9.59 (Implicit Function Theorem) Let X be a normed linear space over K, Y and Z be Banach spaces over K, D ⊆ X × Y, and F : D → Z be continuous. Assume that F is partial differentiable with respect to y, ∂F/∂y is continuous at (x0, y0) ∈ D°, F(x0, y0) = ϑZ, and ∂F/∂y(x0, y0) ∈ B(Y, Z) is bijective. Then, the following statements hold:

(i) There exist r0, r1 ∈ (0, ∞) ⊂ R such that BX(x0, r0) × BY(y0, r1) ⊆ D and ∀x ∈ BX(x0, r0), ∃! y ∈ BY(y0, r1) satisfying F(x, y) = ϑZ. This defines a function φ : BX(x0, r0) → BY(y0, r1) by φ(x) = y, ∀x ∈ BX(x0, r0). Then, φ is continuous.


(ii) If F is Fréchet differentiable at (x, φ(x)) ∈ BX(x0, r0) × BY(y0, r1) for some x ∈ BX(x0, r0), then φ is Fréchet differentiable at x and

φ^(1)(x) = −(∂F/∂y(x, φ(x)))⁻¹ ∂F/∂x(x, φ(x))

(iii) Let n ∈ N. If F is n-times Fréchet differentiable, then φ is n-times Fréchet differentiable.
(iv) Let n ∈ N ∪ {∞} and x̄ ∈ BX(x0, r0). If F is Cn at (x̄, φ(x̄)), then φ is Cn at x̄.

Proof By Implicit Function Theorem 9.58, the statement (i) holds. Furthermore, ∀(x, y) ∈ BX(x0, r0) × BY(y0, r1), ∂F/∂y(x, y) is bijective and ‖(∂F/∂y(x, y))⁻¹‖ < +∞.

(ii) Fix some x ∈ BX(x0, r0) such that F is differentiable at (x, φ(x)). Let y := φ(x) ∈ BY(y0, r1). By Proposition 9.9, F^(1)(x, y) = [∂F/∂x(x, y)  ∂F/∂y(x, y)]. Let L := −(∂F/∂y(x, φ(x)))⁻¹ ∂F/∂x(x, φ(x)) ∈ B(X, Y) and c1 := ‖(∂F/∂y(x, y))⁻¹‖ < +∞. ∀ε ∈ (0, ∞) ⊂ R with c1ε < 1, by the differentiability of F at (x, y), ∃δ1 ∈ (0, ∞) ⊂ R such that ∀(h, k) ∈ B_{X×Y}((ϑX, ϑY), δ1), we have (x + h, y + k) ∈ D and ‖F(x + h, y + k) − F(x, y) − ∂F/∂x(x, y)h − ∂F/∂y(x, y)k‖ ≤ ε‖(h, k)‖. By the continuity of φ at x, ∃δ ∈ (0, min{r0 − ‖x − x0‖, δ1/√2}] ⊂ R such that ∀h ∈ BX(ϑX, δ), we have ‖φ(x + h) − φ(x)‖ = ‖φ(x + h) − y‖ < δ1/√2. ∀h ∈ BX(ϑX, δ), let β := ‖φ(x + h) − φ(x) − Lh‖ ≥ 0. Then, ‖(h, φ(x + h) − y)‖ < δ1. Note that, by Proposition 7.64,

β = ‖(∂F/∂y(x, y))⁻¹(∂F/∂y(x, y)(φ(x + h) − φ(x)) + ∂F/∂x(x, y)h)‖ ≤ c1‖F(x + h, φ(x + h)) − F(x, y) − ∂F/∂x(x, y)h − ∂F/∂y(x, y)(φ(x + h) − y)‖ ≤ c1ε‖(h, φ(x + h) − y)‖ ≤ c1ε(‖h‖ + ‖φ(x + h) − y‖) ≤ c1ε(‖h‖ + ‖φ(x + h) − φ(x) − Lh‖ + ‖Lh‖) ≤ c1ε(β + (1 + ‖L‖)‖h‖)

Then, we have β ≤ (c1ε(1 + ‖L‖)/(1 − c1ε))‖h‖. Hence, φ^(1)(x) = L. Then, the statement (ii) holds. For (iii), we will use mathematical induction on n to prove this result.


1° n = 1. The result follows from (ii).
2° Assume that the result holds for n = n̄ ∈ N.
3° Consider the case n = n̄ + 1. By (ii), we have φ is Fréchet differentiable and φ^(1)(x) = −(∂F/∂y(x, φ(x)))⁻¹ ∂F/∂x(x, φ(x)), ∀x ∈ BX(x0, r0). By inductive assumption, φ is n̄-times differentiable. By Proposition 9.46, ∂F/∂x and ∂F/∂y are n̄-times differentiable. By Propositions 9.45, 9.56, 9.44, and 9.42, φ^(1) is n̄-times differentiable. Then, φ is (n̄ + 1)-times differentiable. This completes the induction process.

For (iv), let ȳ = φ(x̄). We will first use mathematical induction on n to prove the result for n ∈ N.
1° n = 1. By F being C1 at (x̄, ȳ), ∃r̄ ∈ (0, ∞) ⊂ R such that F is differentiable at (x, y), ∀(x, y) ∈ D̄ := B_{X×Y}((x̄, ȳ), r̄) ⊆ D. By Propositions 9.34 and 9.9, (F|D̄)^(1)(x, y) = [∂F/∂x(x, y)  ∂F/∂y(x, y)], ∀(x, y) ∈ D̄. By Proposition 9.46, ∂F/∂x|D̄ and ∂F/∂y|D̄ are continuous at (x̄, ȳ). Since D̄ is open, then ∂F/∂x and ∂F/∂y are continuous at (x̄, ȳ). By the continuity of φ, ∃δ̄ ∈ (0, min{r0 − ‖x̄ − x0‖, r̄}] ⊂ R such that F is differentiable at (x, φ(x)), ∀x ∈ BX(x̄, δ̄). By (ii), φ is differentiable at x and φ^(1)(x) = −(∂F/∂y(x, φ(x)))⁻¹ ∂F/∂x(x, φ(x)), ∀x ∈ BX(x̄, δ̄). By Propositions 3.12, 3.32, 9.56, 9.42, and 9.7, φ^(1) is continuous at x̄. Hence, φ is C1 at x̄.
2° Assume that the result holds for n = n̄ ∈ N.
3° Consider the case n = n̄ + 1. By F being Cn̄+1 at (x̄, ȳ), ∃r̄ ∈ (0, ∞) ⊂ R such that F is differentiable at (x, y), ∀(x, y) ∈ D̄ := B_{X×Y}((x̄, ȳ), r̄) ⊆ D. By Propositions 9.34 and 9.9, (F|D̄)^(1)(x, y) = [∂F/∂x(x, y)  ∂F/∂y(x, y)], ∀(x, y) ∈ D̄. By Proposition 9.46, ∂F/∂x|D̄ and ∂F/∂y|D̄ are Cn̄ at (x̄, ȳ). Since D̄ is open, then, by Proposition 9.34, ∂F/∂x and ∂F/∂y are Cn̄ at (x̄, ȳ). By the inductive assumption, φ is Cn̄ at x̄. By the continuity of φ, ∃δ̄ ∈ (0, min{r0 − ‖x̄ − x0‖, r̄}] ⊂ R such that F is differentiable at (x, φ(x)), ∀x ∈ BX(x̄, δ̄). By (ii), φ is differentiable at x and φ^(1)(x) = −(∂F/∂y(x, φ(x)))⁻¹ ∂F/∂x(x, φ(x)), ∀x ∈ BX(x̄, δ̄). By Propositions 9.45, 9.44, 9.56, and 9.42, φ^(1) is Cn̄ at x̄. Hence, φ is Cn̄+1 at x̄. This completes the induction process. Hence, (iv) holds when n ∈ N.

When n = ∞: there exists δ ∈ (0, ∞) ⊂ R such that F is Ci at (x̂, ŷ), ∀(x̂, ŷ) ∈ B_{X×Y}((x̄, ȳ), δ) ⊆ BX(x0, r0) × BY(y0, r1), ∀i ∈ N. By the induction result, we have φ is Ci at x̂, ∀x̂ ∈ BX(x̄, δ/√2) ∩ φinv(BY(ȳ, δ/√2)), ∀i ∈ N. Hence, φ is C∞ at x̄. Hence, (iv) holds. This completes the proof of the proposition. ⊓⊔

Proposition 9.60 Let X := (X, O) be a topological space, Y and Z be normed linear spaces over K, D ⊆ X × Y, f : D → Z be partial differentiable with respect to y, and ∂f/∂y(x, y) = ϑ_{B(Y,Z)}, ∀(x, y) ∈ D. Let D̄ := πX(D), where πX


¯ the set Dx := {y ∈ is the projection function of X × Y to X . Assume that ∀x ∈ D, Y | (x, y) ∈ D} ⊆ Y is convex. Then, there exists a function φ : D¯ → Z such that f (x, y) = φ(x), ∀(x, y) ∈ D. Furthermore, the following statements hold: (i) If f is continuous at (x0 , y0 ) ∈ D ◦ , then φ is continuous at x0 . (ii) If X is a normed linear space X over K and f is Ck at (x0 , y0 ) ∈ D ◦ , where k ∈ N ∪ {∞}, then φ is Ck at x0 . Proof ∀x ∈ D¯ = πX (D), Dx = ∅. By axiom of choice, ∃g : D¯ → Y such ¯ Define φ : D¯ → X by φ(x) = f (x, g(x)), ∀x ∈ D. ¯ that g(x) ∈ Dx , ∀x ∈ D. ∀(x, y) ∈ D, we have x ∈ D¯ and y, g(x) ∈ Dx . By the convexity of Dx , the line segment connecting y and g(x) is contained in Dx . By mean value theorem 9.23, ∃t0 ∈ (0, 1) ⊂ R such that f (x,6y) − φ(x) = f (x, y) − f (x, g(x)) ≤ 6 6 ∂f 6 6 ∂y (x, t0 y + (1 − t0 )g(x))(y − g(x))6 = 0. Hence, f (x, y) = φ(x). (i) Let f be continuous at (x0 , y0 ) ∈ D ◦ . Then, ∃U ∈ O with x0 ∈ U and ∃δ ∈ (0, ∞) ⊂ R such that U ×BY (y0 , δ) ⊆ D. ∀x ∈ U , we have φ(x) = f (x, y0 ). Hence, φ is continuous at x0 . (ii) Let X be a normed linear space X over K and f be Ck at (x0 , y0 ) ∈ D ◦ ,where k ∈ N∪{∞}. Then, ∃δx , δy ∈ (0, ∞) ⊂ R such that BX (x0 , δx )×BY y0 , δy ⊆ D. ∀x ∈ BX (x0 , δx ), we have φ(x) = f (x, y0 ). By Proposition 9.45, φ|BX (x0 ,δx ) is Ck at x0 . By Proposition 9.34, φ is Ck at x0 . This completes the proof of the proposition. ' &

9.6 Global Inverse Function Theorem

Definition 9.61 Let X, Y, and Z be topological spaces, F : X → Y, and σ : Z → Y. We will say θ : Z → X inverts F along σ if σ = F ∘ θ.

Lemma 9.62 Let X and Y be Hausdorff topological spaces, F : X → Y be continuous and countably proper, x0 ∈ X, and y0 := F(x0) ∈ Y. Assume that ∀x ∈ X, ∃U ∈ OX with x ∈ U and ∃V ∈ OY with F(x) ∈ V such that F|U : U → V is a homeomorphism. Then, given any continuous mapping σ : [a, b] → Y with σ(t0) = y0, where a, t0, b ∈ R and a ≤ t0 ≤ b, there exists a unique continuous mapping θ : [a, b] → X with θ(t0) = x0 that inverts F along σ.

Proof We will distinguish three exhaustive and mutually exclusive cases: Case 1: a = b = t0; Case 2: a = t0 < b; Case 3: a < t0 ≤ b.

Case 1: a = b = t0. Clearly, θ exists and is unique. This case is proved.

Case 2: a = t0 < b. "Uniqueness" Let θ1 : [a, b] → X and θ2 : [a, b] → X be continuous mappings that invert F along σ with θ1(a) = θ2(a) = x0. Let S := {s ∈ [a, b] ⊂ R | θ1(t) = θ2(t), ∀t ∈ [a, s] ⊂ R} and ξ := sup S. Clearly, a ∈ S and a ≤ ξ ≤ b. It is easy to show that θ1(t) = θ2(t), ∀t ∈ R with a ≤ t < ξ. There exists (tn)_{n=1}^∞ ⊆ S such that lim_{n∈N} tn = ξ. By Proposition 3.66 and the continuity


of θ1 and θ2, we have θ1(ξ) = lim_{n∈N} θ1(tn) = lim_{n∈N} θ2(tn) = θ2(ξ), where the limit operator makes sense since X is Hausdorff. Then, ξ ∈ S. We will next show that ξ = b by an argument of contradiction. Suppose that ξ < b. Let x := θ1(ξ) = θ2(ξ) and y := F(x). Then, ∃U ∈ OX with x ∈ U and ∃V ∈ OY with y ∈ V such that F|U : U → V is a homeomorphism. By the continuity of θ1 and θ2, ∃ξ̄ ∈ (ξ, b] such that θ1(t), θ2(t) ∈ U, ∀t ∈ [ξ, ξ̄] ⊂ R. Then, σ(t) = F(θ1(t)) = F(θ2(t)) ∈ V, ∀t ∈ [ξ, ξ̄] ⊂ R. Since F|U : U → V is a homeomorphism, then θ1(t) = θ2(t), ∀t ∈ [ξ, ξ̄] ⊂ R. Then, ξ̄ ∈ S and ξ < ξ̄ ≤ sup S = ξ. This is a contradiction. Therefore, we must have ξ = b. Therefore, θ1(t) = θ2(t), ∀t ∈ [a, ξ] = [a, b] ⊂ R, since ξ ∈ S. This shows that θ1 = θ2. Hence, if θ exists, then it must be unique.

"Existence" Let S := {s ∈ [a, b] ⊂ R | there exists a continuous θ : [a, s] → X that inverts F along σ|[a,s] with θ(a) = x0} ⊂ R and ξ := sup S. Clearly, a ∈ S and a ≤ ξ ≤ b. We will show that ξ ∈ S by an argument of contradiction. Suppose ξ ∉ S. Then, a < ξ ≤ b and ∃(tn)_{n=1}^∞ ⊆ S, which is nondecreasing, such that lim_{n∈N} tn = ξ. ∀n ∈ N, there exists a continuous θn : [a, tn] → X that inverts F along σ|[a,tn] with θn(a) = x0. By the uniqueness property that we have shown, we have θn = θ_{n+1}|[a,tn]. Hence, we may define θ : [a, ξ) → X such that θ(t) = θn(t), ∀t ∈ [a, tn] ⊂ R, ∀n ∈ N. Then, θ is continuous and inverts F along σ|[a,ξ) with θ(a) = x0. Note that σ(tn) = F(θ(tn)), ∀n ∈ N. By continuity of σ and Proposition 3.66, we have lim_{n∈N} σ(tn) = σ(ξ) ∈ Y, where the limit operator makes sense since Y is Hausdorff. Then, (θ(tn))_{n=1}^∞ ⊆ Finv(M), where M := {σ(tn) ∈ Y | n ∈ N} ∪ {σ(ξ)} ⊆ Y. Clearly, M is compact in Y. Since F is countably proper, then Finv(M) is countably compact. By Proposition 5.26, Finv(M) has the Bolzano–Weierstrass property, and (θ(tn))_{n=1}^∞ admits a cluster point x ∈ Finv(M). By the continuity of F and Proposition 3.66, (F(θ(tn)))_{n=1}^∞ = (σ(tn))_{n=1}^∞ admits a cluster point F(x). Since Y is Hausdorff, then F(x) = σ(ξ) =: y, by Proposition 3.65. By the assumption of the lemma, ∃U ∈ OX with x ∈ U and ∃V ∈ OY with y ∈ V such that F|U : U → V is a homeomorphism. Since σ is continuous, then σinv(V) is open in [a, b] ⊂ R. Since ξ ∈ σinv(V), then ∃ξ̂ ∈ [a, ξ) ⊂ R such that [ξ̂, ξ] ⊆ σinv(V). By lim_{n∈N} tn = ξ, ∃N ∈ N such that ∀n ∈ N with n ≥ N, tn ∈ [ξ̂, ξ] ⊂ R. Since (θ(tn))_{n=1}^∞ admits a cluster point x ∈ U, then ∃n0 ∈ N with n0 ≥ N such that θ(tn0) ∈ U. Clearly, σ([tn0, ξ]) ⊆ V. Define θ1 := (F|U)inv ∘ σ|[tn0,ξ] : [tn0, ξ] → U. Clearly, θ1 is continuous and inverts F along σ|[tn0,ξ] with θ1(tn0) = θ(tn0). Define θ̄ : [a, ξ] → X by θ̄(t) = θ(t), ∀t ∈ [a, tn0] ⊂ R, θ̄(t) = θ1(t), ∀t ∈ [tn0, ξ]. By Theorem 3.11, θ̄ is continuous. Clearly, θ̄ inverts F along σ|[a,ξ] with θ̄(a) = θ(a) = x0. Hence, ξ ∈ S. This is a contradiction. Therefore, we must have ξ ∈ S.

Next, we will show that ξ = b by an argument of contradiction. Suppose ξ < b. Since ξ ∈ S, then ∃ a continuous function θ : [a, ξ] → X that inverts F along σ|[a,ξ] with θ(a) = x0. Let x := θ(ξ) ∈ X and y := F(x) = σ(ξ) ∈ Y. Then, ∃U ∈ OX with x ∈ U and ∃V ∈ OY with y ∈ V such that F|U : U → V is a homeomorphism. By the continuity of σ, ∃ξ̄ ∈ (ξ, b] ⊂ R such that σ(t) ∈ V,


¯ ∀t ∈ [ξ, ξ¯ ] ⊂ R. Define θ¯ : [a, ξ¯ ] → X by θ¯ (t) = θ (t), ∀t ∈ [a, ξ ], θ(t) = ( F |U )inv (σ (t)), ∀t ∈ [ξ, ξ¯ ] ⊂ R. By Theorem 3.11, θ¯ is continuous. It is clear that θ¯ inverts F along σ |[a,ξ] ¯ with θ¯ (a) = x0 . Then, ξ¯ ∈ S. This implies that ¯ ξ < ξ ≤ sup S = ξ , which is a contradiction. Hence, ξ = b ∈ S. This case is proved. Case 3: a < t0 ≤ b. By Cases 1 and 2, ∃! θ1 : [t0 , b] → X that is continuous and inverts F along σ |[t0 ,b] with θ1 (t0 ) = x0 . Define σ¯ : [t0 , 2t0 − a] → Y by σ¯ (t) = σ (2t0 − t), ∀t ∈ [t0 , 2t0 − a] ⊂ R. By Proposition 3.12, σ¯ is continuous and σ¯ (t0 ) = y0 . By Case 2, ∃! θ2 : [t0 , 2t0 − a] → X that is continuous and inverts F ¯ = θ1 (t), ∀t ∈ [t0 , b] ⊂ R, along σ¯ with θ2 (t0 ) = x0 . Define θ¯ : [a, b] → X by θ(t) ¯θ (t) = θ2 (2t0 − t), ∀t ∈ [a, t0 ] ⊂ R. By Theorem 3.11, θ¯ is continuous and inverts ¯ 0 ) = x0 . The uniqueness of θ¯ follows from the uniqueness of θ1 F along σ with θ(t and θ2 . This case is proved. This completes the proof of the lemma. ' & Lemma 9.63 Let X and Y be Hausdorff topological spaces, F : X → Y be continuous and countably proper, x0 ∈ X , and y0 = F (x0 ) ∈ Y. Assume that ∀x ∈ X , ∃U ∈ OX with x ∈ U and ∃V ∈ OY with F (x) ∈ V such that F |U : U → V is a homeomorphism. Then, given any continuous mapping σ : [a, b] × [c, d] → Y with σ (a, c) = y0 , where a, b, c, d ∈ R, a ≤ b, and c ≤ d, there exists a unique continuous mapping θ : [a, b] × [c, d] → X with θ (a, c) = x0 that inverts F along σ . Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: a = b; Case 2: a < b. Case 1: a = b. The result holds by Lemma 9.62. Case 2: a < b. “Uniqueness” Let θ1 : [a, b] × [c, d] → X and θ2 : [a, b] × [c, d] → X be continuous mappings that invert F along σ with θ1 (a, c) = θ2 (a, c) = x0 . Fix any (t, r) ∈ [a, b] × [c, d] ⊂ R2 . Let σ¯ : [0, 1] → Y, θ¯1 : [0, 1] → X , and θ¯2 : [0, 1] → X be defined by, ∀λ ∈ [0, 1] ⊂ R, σ¯ (λ) = σ (λt + (1 − λ)a, λr + (1 − λ)c)
θ¯1 (λ) = θ1 (λt + (1 − λ)a, λr + (1 − λ)c) θ¯2 (λ) = θ2 (λt + (1 − λ)a, λr + (1 − λ)c) Since σ = F ◦ θ1 and σ = F ◦ θ2 , then σ¯ = F ◦ θ¯1 and σ¯ = F ◦ θ¯2 . By Proposition 3.12, σ¯ , θ¯1 , and θ¯2 are continuous. This implies that θ¯1 inverts F along σ¯ with θ¯1 (0) = θ1 (a, c) = x0 and θ¯2 inverts F along σ¯ with θ¯2 (0) = x0 . By Lemma 9.62, we have θ¯1 = θ¯2 . Then, θ1 (t, r) = θ¯1 (1) = θ¯2 (1) = θ2 (t, r). Hence, θ1 = θ2 . This shows that θ : [a, b] × [c, d] → X is unique when it exists. “Existence” Define σ,c : [a, b] → Y by σ,c (t) = σ (t, c), ∀t ∈ [a, b] ⊂ R. By Proposition 3.12, σ,c is continuous with σ,c (a) = y0 . By Lemma 9.62, there exists a unique continuous mapping θ,c : [a, b] → X that inverts F along σ,c with θ,c (a) = x0 . Fix any t ∈ [a, b] ⊂ R. Define σt : [c, d] → Y by σt (r) = σ (t, r), ∀r ∈ [c, d] ⊂ R. By Proposition 3.12, σt is continuous with σt (c) = σ (t, c) =


σ,c (t) = F (θ,c (t)). By Lemma 9.62, there exists a unique continuous mapping θt : [c, d] → X that inverts F along σt with θt (c) = θ,c (t). Define θ : [a, b] × [c, d] → X by θ (t, r) = θt (r), ∀(t, r) ∈ [a, b] × [c, d]. Clearly, ∀(t, r) ∈ [a, b] × [c, d], σ (t, r) = σt (r) = F (θt (r)) = F (θ (t, r)). Hence, θ inverts F along σ . θ (a, c) = θa (c) = θ,c (a) = x0 . All we need to show is that θ is continuous to complete the proof of the lemma. Define S := {r ∈ [c, d] ⊂ R | θ |[a,b]×[c,r] is continuous} ⊂ R and ξ = sup S. Clearly, c ∈ S and c ≤ ξ ≤ d. We will show that ξ ∈ S by an argument of contradiction. Suppose ξ ∈ / S. Then, c < ξ ≤ d and ∃(rn )∞ ⊆ S, which is nondecreasing, such that lim rn = ξ . n∈N n=1 ∀(t, r) ∈ [a, b] × [c, ξ ) ⊂ R2 , there exists n0 ∈ N such that ∀n ≥ n0 , we have rn > r. Then, θ |[a,b]×[c,rn ] is continuous implies that θ is continuous at (t, r). Hence, θ |[a,b]×[c,ξ ) is continuous. Fix any t ∈ [a, b], and let x = θ (t, ξ ) ∈ X , and y = F (x) = σ (t, ξ ) ∈ Y. Then, ∃U ∈ OX with x ∈ U and ∃V ∈ OY with y ∈ V such that F |U : U → V is a homeomorphism. Since σ is continuous, then ∃at , bt , ct , dt ∈ R with at < t < bt and c ≤ ct < ξ < dt such that σ (Dt ) ⊆ V , where Dt := ((at , bt ) × [ct , dt )) ∩ ([a, b]×[c, d]) ⊂ R2 . Define θ¯ : Dt → U by θ¯ (t¯, r¯ ) = ( F |U )inv (σ (t¯, r¯ )), ∀(t¯, r¯ ) ∈ Dt . By Proposition 3.12, θ¯ is continuous. Claim 9.63.1 θ (t¯, r¯ ) = θ¯ (t¯, r¯ ), ∀(t¯, r¯ ) ∈ Dt . Proof of Claim Fix any (t¯, r¯ ) ∈ Dt . Note that θ¯ inverts F along σ |Dt . Define θ¯t : Dt,2 → X by θ¯t (ˆr ) = θ¯ (t, rˆ ), ∀ˆr ∈ Dt,2 := [ct , dt ) ∩ [c, d] ⊂ R. Then, θ¯t is continuous and inverts F along σt |Dt,2 with θ¯t (ξ ) = x. Note that θt |Dt,2 is also continuous and inverts F along σt |Dt,2 with θt (ξ ) = x. By Lemma 9.62, we have θ¯t = θt |Dt,2 and, in particular, θ¯ (t, ct ) = θ¯t (ct ) = θt (ct ) = θ (t, ct ). Define θ¯,ct : Dt,1 → X by θ¯,ct (tˆ) = θ¯ (tˆ, ct ), ∀tˆ ∈ Dt,1 := (at , bt ) ∩ [a, b] ⊂ R. 
Define σ,ct : [a, b] → Y by σ,ct(t̂) = σ(t̂, ct), ∀t̂ ∈ [a, b]. Define θ,ct : [a, b] → X by θ,ct(t̂) = θ(t̂, ct), ∀t̂ ∈ [a, b]. Then, θ̄,ct is continuous and inverts F along σ,ct|Dt,1 with θ̄,ct(t) = θ(t, ct). Since ct ∈ [c, ξ) ⊂ R, then θ,ct|Dt,1 is continuous and inverts F along σ,ct|Dt,1 with θ,ct(t) = θ(t, ct). By Lemma 9.62, we have θ,ct|Dt,1 = θ̄,ct and, in particular, θ(t̄, ct) = θ,ct(t̄) = θ̄,ct(t̄) = θ̄(t̄, ct). Define θ̄t̄ : Dt,2 → X by θ̄t̄(r̂) = θ̄(t̄, r̂), ∀r̂ ∈ Dt,2. Then, θ̄t̄ is continuous and inverts F along σt̄|Dt,2 with θ̄t̄(ct) = θ̄(t̄, ct) = θ(t̄, ct). Note that θt̄|Dt,2 is also continuous and inverts F along σt̄|Dt,2 with θt̄(ct) = θ(t̄, ct). By Lemma 9.62, we have θ̄t̄ = θt̄|Dt,2 and, in particular, θ̄(t̄, r̄) = θ̄t̄(r̄) = θt̄(r̄) = θ(t̄, r̄). This completes the proof of the claim. ⊓⊔

Then, θ|Dt is continuous. Then, θ is continuous at (t, ξ). Then, θ|[a,b]×[c,ξ] is continuous and ξ ∈ S. This contradicts the hypothesis ξ ∉ S. Therefore, ξ ∈ S. Now, we show ξ = d by an argument of contradiction. Suppose ξ < d. Fix any t ∈ [a, b], and let x = θ(t, ξ) ∈ X and y = F(x) = σ(t, ξ) ∈ Y. Then, ∃U ∈ OX with x ∈ U and ∃V ∈ OY with y ∈ V such that F|U : U → V is a homeomorphism. Since σ is continuous, then ∃at, bt, ct, dt ∈ R with at < t < bt


and ct < ξ < dt ≤ d such that σ (Dt ) ⊆ V , where Dt := ((at , bt ) × (ct , dt )) ∩ ([a, b]×[c, d]) ⊂ R2 . Define θ¯ : Dt → U by θ¯ (t¯, r¯ ) = ( F |U )inv (σ (t¯, r¯ )), ∀(t¯, r¯ ) ∈ Dt . By Proposition 3.12, θ¯ is continuous. Claim 9.63.2 θ (t¯, r¯ ) = θ¯ (t¯, r¯ ), ∀(t¯, r¯ ) ∈ Dt . Proof of Claim Fix any (t¯, r¯ ) ∈ Dt . Note that θ¯ inverts F along σ |Dt . Define θ¯,ξ : Dt,1 → X by θ¯,ξ (tˆ) = θ¯ (tˆ, ξ ), ∀tˆ ∈ Dt,1 := (at , bt ) ∩ [a, b] ⊂ R. Define σ,ξ : [a, b] → Y by σ,ξ (tˆ) = σ (tˆ, ξ ), ∀tˆ ∈ [a, b]. Define θ,ξ : [a, b] → X by θ,ξ (tˆ) = θ (tˆ, ξ ), ∀tˆ ∈ [a, b]. Then, θ¯,ξ is continuous and inverts F along σ,ξ D t,1  with θ¯,ξ (t) = x. Since ξ ∈ S, then θ,ξ D is continuous and inverts F along t,1   σ,ξ D with θ,ξ (t) = θ (t, ξ ) = x. By Lemma 9.62, we have θ,ξ D = θ¯,ξ t,1 t,1 ¯ t¯, ξ ). Define θ¯t¯ : Dt,2 → X and, in particular, θ (t¯, ξ ) = θ,ξ (t¯) = θ¯,ξ (t¯) = θ( by θ¯t¯(ˆr ) = θ¯ (t¯, rˆ ), ∀ˆr ∈ Dt,2 := (ct , dt ) ∩ [c, d] ⊂ R. Then, θ¯t¯ is continuous and inverts F along σt¯|Dt,2 with θ¯t¯(ξ ) = θ¯ (t¯, ξ ) = θ (t¯, ξ ). Note that θt¯|Dt,2 is also continuous and inverts F along σt¯|Dt,2 with θt¯(ξ ) = θ (t¯, ξ ). By Lemma 9.62, we have θ¯t¯ = θt¯|Dt,2 and, in particular, θ¯ (t¯, r¯ ) = θ¯t¯(¯r ) = θt¯(¯r ) = θ (t¯, r¯ ). This completes the proof of the claim. ' &  Then, θ |Dt is continuous. Note that [a, b] ⊆ t ∈[a,b]⊂R (at , bt ). By the compactness of [a, b] ⊆ R, there exists a finite set T N ⊆ [a, b] ⊂ R such that  [a, b] ⊆ t ∈TN (at , bt ). Note that θ |Dt , ∀t ∈ TN , and θ |[a,b]×[c,ξ ) are continuous. Then,  by Theorem 3.11, θ |D is continuous, where D := ([a, b] × [c, ξ )) ∪ ( t ∈TN Dt ) ⊆ [a, b] × [c, d] ⊆ R2 and all of the sets are relatively open. Set ¯ ⊆ D ⊆ R2 . Then, d¯ = (mint ∈TN dt + ξ )/2 ∈ (ξ, d]. It is clear that [a, b] × [c, d] ¯ θ |[a,b]×[c,d] ¯ is continuous and d ∈ S. This leads to the contradiction ξ < d¯ ≤ sup S = ξ . Therefore, ξ = d. Then, θ is continuous. This completes the proof of Case 2. 
This completes the proof of the lemma. ' & Theorem 9.64 (Global Inverse Function Theorem) Let X and Y be Hausdorff topological spaces, X = ∅, and F : X → Y be continuous and countably proper. Assume that ∀x ∈ X , ∃U ∈ OX with x ∈ U and ∃V ∈ OY with F (x) ∈ V such that F |U : U → V is a homeomorphism, X is arcwise connected, and Y is simply connected. Then, F : X → Y is a homeomorphism. Proof Fix x0 ∈ X = ∅, and let y0 = F (x0 ) ∈ Y. ∀y ∈ Y, since Y is simply connected, then Y is arcwise connected by Definition 3.53. Then, there exists a curve σ : I → Y, where I := [0, 1] ⊂ R, such that σ (0) = y0 and σ (1) = y. By Lemma 9.62, there exists a continuous mapping θ : I → X that inverts F along σ with θ (0) = x0 . Then, y = σ (1) = F (θ (1)). Hence, F is surjective. Fix x1 , x2 ∈ X such that F (x1 ) = F (x2 ) = y. Since X is arcwise connected, then there exists a curve δ : I → X such that δ(0) = x1 and δ(1) = x2 . Consider the curve η := F ◦ δ, which is continuous by Proposition 3.12. η is a closed curve since η(0) = F (x1 ) = y = F (x2 ) = η(1). Since Y is simply connected, then η is homotopic to a single point y¯ ∈ Y. Then, there exists a continuous mapping


9 Differentiation in Banach Spaces

γ : I × I → Y such that γ(t, 0) = η(t), γ(t, 1) = ȳ, and γ(0, t) = γ(1, t), ∀t ∈ I. By Lemma 9.63, there exists a continuous function ζ : I × I → X that inverts F along γ with ζ(0, 0) = x1. Define γ_0 : I → Y by γ_0(t) = γ(0, t), γ_1 : I → Y by γ_1(t) = γ(1, t), γ_{·,0} : I → Y by γ_{·,0}(t) = γ(t, 0), and γ_{·,1} : I → Y by γ_{·,1}(t) = γ(t, 1), ∀t ∈ I. Then, γ_0 = γ_1. Define ζ_0 : I → X by ζ_0(t) = ζ(0, t), ζ_1 : I → X by ζ_1(t) = ζ(1, t), ζ_{·,0} : I → X by ζ_{·,0}(t) = ζ(t, 0), and ζ_{·,1} : I → X by ζ_{·,1}(t) = ζ(t, 1), ∀t ∈ I. Then, ζ_0 is continuous and inverts F along γ_0. Set x̄ = ζ_0(1) = ζ(0, 1) ∈ X, and then ȳ = F(x̄). ζ_{·,1} is continuous and inverts F along γ_{·,1} with ζ_{·,1}(0) = ζ(0, 1) = x̄. Since γ_{·,1} is a constant function with value ȳ, the constant mapping λ : I → X given by λ(t) = x̄, ∀t ∈ I, is continuous and inverts F along γ_{·,1} with λ(0) = x̄. By Lemma 9.62, we have λ = ζ_{·,1} and, in particular, x̄ = λ(1) = ζ_{·,1}(1) = ζ(1, 1). Note that ζ_1 is continuous and inverts F along γ_1 with ζ_1(1) = ζ(1, 1) = x̄. Since γ_1 = γ_0, then ζ_0 is continuous and inverts F along γ_1 with ζ_0(1) = x̄. By Lemma 9.62, we have ζ_0 = ζ_1 and, in particular, ζ(1, 0) = ζ_1(0) = ζ_0(0) = ζ(0, 0) = x1. Note that ζ_{·,0} is continuous and inverts F along γ_{·,0} = η with ζ_{·,0}(0) = ζ(0, 0) = x1. By construction, δ is continuous and inverts F along η with δ(0) = x1. By Lemma 9.62, δ = ζ_{·,0} and, in particular, x2 = δ(1) = ζ_{·,0}(1) = ζ(1, 0) = x1. Hence, x1 = x2. Therefore, F is injective. Hence, F is bijective and admits inverse F_i : Y → X. ∀y ∈ Y, let x = F_i(y) ∈ X. Then, ∃U ∈ O_X with x ∈ U and ∃V ∈ O_Y with y ∈ V such that F|_U : U → V is a homeomorphism. Then, F_i|_V is the inverse of F|_U and is continuous. Then, F_i is continuous at y since V is open. By the arbitrariness of y, F_i is continuous. Hence, F is a homeomorphism. This completes the proof of the theorem. □

Theorem 9.65 (Global Inverse Function Theorem) Let X and Y be Hausdorff topological spaces and F : X → Y be continuous and countably proper. Let H := {x ∈ X | ∃U ∈ O_X with x ∈ U such that F|_U : U → F(U) ∈ O_Y is a homeomorphism} ⊆ X, Σ := X \ H, Σ_0 := F_inv(F(Σ)), X_0 := X \ Σ_0, and Y_0 := Y \ F(Σ). Assume that X_0 ≠ ∅ is arcwise connected and Y_0 is simply connected. Then, G := F|_{X_0} : X_0 → Y_0 is a homeomorphism.

Proof Let O_{X_0} and O_{Y_0} be the subset topologies on X_0 and Y_0, respectively.

Claim 9.65.1 Σ ⊆ X is closed.

Proof of Claim ∀x ∈ H, ∃U_x ∈ O_X with x ∈ U_x such that F|_{U_x} : U_x → F(U_x) ∈ O_Y is a homeomorphism. ∀x̄ ∈ U_x, x̄ ∈ H. Then, U_x ⊆ H and H = ⋃_{x ∈ H} U_x. Hence, H ∈ O_X. Then, Σ := X \ H is closed. □

Claim 9.65.2 F_inv(Y_0) = X_0, and G : X_0 → Y_0 is continuous and countably proper.

Proof of Claim By Proposition 2.5, F_inv(Y_0) = F_inv(Y \ F(Σ)) = F_inv(Y) \ F_inv(F(Σ)) = X \ Σ_0 = X_0 and F(X_0) = F(F_inv(Y_0)) ⊆ Y_0. Hence, G is a function of X_0 to Y_0. Fix any K ⊆ Y_0 such that K is compact in O_{Y_0}. Then, K is a compact set in O_Y. Then, G_inv(K) = F_inv(K) ⊆ X_0. By the countable properness of F, we have
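The hypothesis that Y be simply connected cannot be dropped from Theorem 9.64. A standard counterexample (a numerical sketch, not from the text) is exp : C → C \ {0}: it is continuous and a local homeomorphism everywhere (its local inverses are branches of the logarithm), and C is arcwise connected, but the codomain C \ {0} is not simply connected, and exp indeed fails to be injective:

```python
import cmath

# exp(z) is periodic with period 2*pi*i, so two distinct points of the
# domain share one image: injectivity fails, and the only hypothesis of
# Theorem 9.64 that exp violates is simple connectedness of C \ {0}.
z1 = 1.0 + 2.0j
z2 = z1 + 2.0j * cmath.pi          # a different point of the domain

w1, w2 = cmath.exp(z1), cmath.exp(z2)
assert abs(w1 - w2) < 1e-12        # same image under exp
assert abs(z1 - z2) > 1.0          # yet z1 != z2
```

(exp is also not countably proper on C, so this example violates more than one hypothesis; it is offered only to show that the injectivity conclusion genuinely needs the topology of the codomain.)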


F_inv(K) ⊆ X is countably compact in O_X. Then, it is easy to show that G_inv(K) is countably compact in O_{X_0}. Hence, G is countably proper. ∀V_0 ∈ O_{Y_0}, V_0 = Y_0 ∩ V, where V ∈ O_Y. G_inv(V_0) = F_inv(V_0) = F_inv(Y_0) ∩ F_inv(V) = X_0 ∩ F_inv(V). Since F is continuous, F_inv(V) ∈ O_X. Thus, G_inv(V_0) ∈ O_{X_0}. Hence, G is continuous. □

By Proposition 2.5, Σ_0 = F_inv(F(Σ)) ⊇ Σ ∩ dom(F) = Σ. Then, X_0 = X \ Σ_0 ⊆ X \ Σ = H. ∀x ∈ X_0 ⊆ H, ∃U ∈ O_X with x ∈ U such that F|_U : U → F(U) ∈ O_Y is a homeomorphism. Let U_0 := X_0 ∩ U ∈ O_{X_0}, V := F(U) ∈ O_Y, and V_0 := F(U) ∩ Y_0 ∈ O_{Y_0}. Clearly, x ∈ U_0. By Proposition 2.5, G(U_0) = F(U_0) ⊆ F(X_0) ∩ F(U) ⊆ Y_0 ∩ V = V_0. Then, G|_{U_0} : U_0 → V_0. Note that G|_{U_0} = (F|_U)|_{U_0}. Since F|_U is injective, then G|_{U_0} is injective. ∀ȳ ∈ V_0, ȳ ∈ V and then ∃x̄ ∈ U such that ȳ = F(x̄). Note that ȳ ∈ Y_0 and F_inv(Y_0) = X_0, and then x̄ ∈ X_0. Hence, x̄ ∈ U_0 and G(x̄) = F(x̄) = ȳ. Then, G|_{U_0} : U_0 → V_0 is surjective. Hence, G|_{U_0} is bijective with inverse G_i = F_i|_{V_0}, where F_i is the inverse of F|_U : U → V. Since F|_U is a homeomorphism, then F|_U and F_i are continuous. Then, G|_{U_0} and G_i are continuous. This shows that G|_{U_0} : U_0 → V_0 is a homeomorphism. By Global Inverse Function Theorem 9.64, G : X_0 → Y_0 is a homeomorphism. This completes the proof of the theorem. □

Proposition 9.66 Let X := (X, O) be a topological space and A_α ⊆ X be arcwise connected (in subset topology), ∀α ∈ Λ, where Λ is an index set. Assume that A_{α1} ∩ A_{α2} ≠ ∅, ∀α1, α2 ∈ Λ. Then, A := ⋃_{α ∈ Λ} A_α is arcwise connected (in subset topology).

Proof ∀x1, x2 ∈ A, ∃α1, α2 ∈ Λ such that x_i ∈ A_{αi}, i = 1, 2. By the assumption, let x0 ∈ A_{α1} ∩ A_{α2} ≠ ∅. ∀i ∈ {1, 2}, since A_{αi} is arcwise connected, ∃ curve γ_i : [0, 1] ⊂ R → A_{αi} such that γ_i(0) = x0 and γ_i(1) = x_i. Define γ : [0, 1] ⊂ R → A by

  γ(t) = γ_1(1 − 2t), 0 ≤ t ≤ 1/2;  γ(t) = γ_2(2t − 1), 1/2 < t ≤ 1

∀t ∈ [0, 1] ⊂ R. Then, γ(0) = γ_1(1) = x1 and γ(1) = γ_2(1) = x2. Note that γ|_{[0,1/2]}(t) = γ_1(1 − 2t), ∀t ∈ [0, 1/2] ⊂ R, and γ|_{[1/2,1]}(t) = γ_2(2t − 1), ∀t ∈ [1/2, 1] ⊂ R, are continuous functions. By Theorem 3.11, γ is continuous. Therefore, γ is a curve connecting x1 and x2. By the arbitrariness of x1 and x2, we have that A is arcwise connected. This completes the proof of the proposition. □

Proposition 9.67 Let X be a normed linear space and O ⊆ X be open and connected. Then, O is arcwise connected.

Proof The result is trivial if O = ∅. Let x0 ∈ O ≠ ∅. Let M := {A ⊆ O | x0 ∈ A and A is arcwise connected} and A_0 := ⋃_{A ∈ M} A. Then, x0 ∈ A_0 ⊆ O and, by Proposition 9.66, A_0 is arcwise connected. ∀x ∈ A_0 ⊆ O, ∃δ ∈ (0, ∞) ⊂ R such that B_X(x, δ) ⊆ O. Clearly, B_X(x, δ) is arcwise connected. Then, by Proposition 9.66, A_0 ∪ B_X(x, δ) is arcwise connected. This implies that A_0 ∪ B_X(x, δ) ∈ M. Then, B_X(x, δ) ⊆ A_0. By the arbitrariness of x, we have A_0


is open in X. We will show that A_0 = O by an argument of contradiction. Suppose A_0 ⊊ O. Let E := O \ A_0 ≠ ∅. Let ∂E be the boundary of E in X.

Claim 9.67.1 ∂E ∩ E ≠ ∅.

Proof of Claim Suppose ∂E ∩ E = ∅. By Proposition 3.3, E = E ∩ Ē = E ∩ (∂E ∪ E°) = E ∩ E° = E°. Hence, E is open in X. Then, A_0 and E form a separation of O. This contradicts the assumption that O is connected. Hence, the claim holds. □

Let x1 ∈ ∂E ∩ E ⊆ O. Then, ∃δ_1 ∈ (0, ∞) ⊂ R such that B_X(x1, δ_1) ⊆ O. x1 ∈ ∂E implies that B_X(x1, δ_1) ∩ Ẽ ≠ ∅, where Ẽ is the complement of E. This implies that ∃x2 ∈ B_X(x1, δ_1) ∩ (O \ E) = B_X(x1, δ_1) ∩ A_0. By Proposition 9.66, A_0 ∪ B_X(x1, δ_1) is arcwise connected. Then, by the definition of A_0, we have B_X(x1, δ_1) ⊆ A_0 and x1 ∈ A_0. This contradicts the fact that x1 ∈ E = O \ A_0. Therefore, A_0 = O and O is arcwise connected. This completes the proof of the proposition. □
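The gluing step in the proof of Proposition 9.66 (run γ_1 backwards on [0, 1/2], then γ_2 forwards on [1/2, 1]) can be sketched numerically for paths in R². The helper name `concat_paths` below is illustrative, not notation from the text:

```python
def concat_paths(gamma1, gamma2):
    """Given paths with gamma1(0) == gamma2(0) == x0, build the curve of
    Proposition 9.66's proof: gamma(t) = gamma1(1 - 2t) on [0, 1/2] and
    gamma(t) = gamma2(2t - 1) on (1/2, 1], a path from gamma1(1) to
    gamma2(1) through the common point x0."""
    def gamma(t):
        if t <= 0.5:
            return gamma1(1.0 - 2.0 * t)
        return gamma2(2.0 * t - 1.0)
    return gamma

# Two segments in R^2 meeting at the common point x0 = (0, 0):
g1 = lambda t: (t, 0.0)        # from x0 to x1 = (1, 0)
g2 = lambda t: (0.0, t)        # from x0 to x2 = (0, 1)
g = concat_paths(g1, g2)
assert g(0.0) == (1.0, 0.0)    # starts at x1
assert g(0.5) == (0.0, 0.0)    # passes through x0 at the seam
assert g(1.0) == (0.0, 1.0)    # ends at x2
```

Continuity of the joined curve is exactly the content of the pasting step via Theorem 3.11: the two halves agree at t = 1/2 because both equal x0 there.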

9.7 Interchange Differentiation and Limit

Proposition 9.68 Let X and Y be normed linear spaces over K, D ⊆ X, x0 ∈ D, and F_n : D → Y, ∀n ∈ N. Assume that:

(i) ∃δ_0 ∈ (0, ∞) ⊂ R such that D̄ := D ∩ B_X(x0, δ_0) − x0 is a conic segment.
(ii) F_n is differentiable, ∀n ∈ N.
(iii) (F_n^(1))_{n=1}^∞ converges uniformly to G : D → B(X, Y).
(iv) ∀x ∈ D, lim_{n∈N} F_n(x) = F(x), where F : D → Y.

Then, F is differentiable at x0 and F^(1)(x0) = G(x0) = lim_{n∈N} F_n^(1)(x0).

Proof By the differentiability of F_1, we have span(A_D(x)) = X, ∀x ∈ D. ∀ε ∈ (0, ∞) ⊂ R, by (iii), ∃n_0 ∈ N such that ∀n ∈ N with n ≥ n_0, ‖F_n^(1)(x) − G(x)‖ < ε/4, ∀x ∈ D. ∀n, m ∈ N with n ≥ n_0 and m ≥ n_0, by Proposition 9.15, g_{n,m} : D → Y, defined by g_{n,m}(x) = F_n(x) − F_m(x), ∀x ∈ D, is differentiable and g_{n,m}^(1)(x) = F_n^(1)(x) − F_m^(1)(x), ∀x ∈ D. By mean value theorem 9.23, ∀x ∈ D̄ + x0 and ∀n, m ∈ N with n ≥ n_0 and m ≥ n_0, ‖F_n(x) − F_m(x) − F_n(x0) + F_m(x0)‖ = ‖g_{n,m}(x) − g_{n,m}(x0)‖ ≤ ‖g_{n,m}^(1)(x̄)(x − x0)‖ ≤ ‖F_n^(1)(x̄) − F_m^(1)(x̄)‖‖x − x0‖ ≤ ε‖x − x0‖/2, where x̄ = t_0 x + (1 − t_0)x0 ∈ D̄ + x0 and t_0 ∈ (0, 1) ⊂ R. Since F_{n_0} is differentiable at x0 with F_{n_0}^(1)(x0) ∈ B(X, Y), ∃δ ∈ (0, δ_0] ⊂ R such that, ∀x ∈ D̂ := D ∩ B_X(x0, δ) ⊆ D̄ + x0, we have ‖F_{n_0}(x) − F_{n_0}(x0) − F_{n_0}^(1)(x0)(x − x0)‖ ≤ ε‖x − x0‖/4. Fix any x ∈ D̂. ∀m ∈ N with m ≥ n_0, we have ‖F_m(x) − F_m(x0) − G(x0)(x − x0)‖ ≤ ‖F_m(x) − F_m(x0) − F_{n_0}(x) + F_{n_0}(x0)‖ + ‖F_{n_0}(x) − F_{n_0}(x0) − F_{n_0}^(1)(x0)(x − x0)‖ + ‖F_{n_0}^(1)(x0)(x − x0) − G(x0)(x − x0)‖ ≤ ε‖x − x0‖/2 + ε‖x − x0‖/4 + ε‖x − x0‖/4 = ε‖x − x0‖. Taking the limit as m → ∞, we have ‖F(x) − F(x0) − G(x0)(x − x0)‖ ≤ lim_{m∈N} ‖F_m(x) − F_m(x0) − G(x0)(x − x0)‖ ≤ ε‖x − x0‖. By the arbitrariness of x, F is differentiable at x0 and F^(1)(x0) = G(x0) = lim_{n∈N} DF_n(x0). This completes the proof of the proposition. □

Example 9.69 Let X and Y be normed linear spaces over K, Ω ⊆ X be a compact set, which satisfies span(A_Ω(x)) = X, ∀x ∈ Ω, k ∈ N, and W := C(Ω, Y) be the normed linear space defined in Example 7.31. Define Z := {f ∈ C(Ω, Y) | f is C_k}. By Proposition 9.40, Z is a subspace of C(Ω, Y). Then, Z := (Z, ⊕_W, ⊗_W, ϑ_W) is a vector space over the field K. ∀f ∈ Z, ∀i ∈ {0, . . . , k}, f^(i) ∈ C(Ω, B_i(X, Y)) with norm ‖f^(i)‖_C ∈ R. Now, define a norm on Z by ‖f‖_{C_k} := (Σ_{i=0}^{k} ‖f^(i)‖_C²)^{1/2} ∈ [0, ∞) ⊂ R, ∀f ∈ Z. If f = ϑ_W, by Proposition 9.33, f is C_∞ and f^(i)(x) = ϑ_{B_i(X,Y)}, ∀x ∈ Ω, ∀i ∈ N; then ‖f‖_{C_k} = 0. On the other hand, if ‖f‖_{C_k} = 0, then f = ϑ_W. ∀f_1, f_2 ∈ Z, by Proposition 9.40, ‖f_1 + f_2‖_{C_k} = (Σ_{i=0}^{k} ‖f_1^(i) + f_2^(i)‖_C²)^{1/2} ≤ (Σ_{i=0}^{k} ‖f_1^(i)‖_C²)^{1/2} + (Σ_{i=0}^{k} ‖f_2^(i)‖_C²)^{1/2} = ‖f_1‖_{C_k} + ‖f_2‖_{C_k}, where the inequality follows from Minkowski's inequality, Theorem 7.9. ∀α ∈ K, ∀f ∈ Z, by Proposition 9.40, ‖αf‖_{C_k} = (Σ_{i=0}^{k} ‖αf^(i)‖_C²)^{1/2} = (Σ_{i=0}^{k} |α|²‖f^(i)‖_C²)^{1/2} = |α|‖f‖_{C_k}. This shows that (Z, K, ‖·‖_{C_k}) is a normed linear space, which will be denoted C_k(Ω, Y). %

Example 9.70 Let X be a normed linear space over K and Y be a Banach space over K, Ω ⊆ X be a compact set, and k ∈ N. Assume that, ∀x ∈ Ω, ∃δ_x ∈ (0, ∞) ⊂ R such that Ω ∩ B_X(x, δ_x) − x is a conic segment and span(A_Ω(x)) = X. Let C_k(Ω, Y) be the normed linear space defined in Example 9.69. We will show that C_k(Ω, Y) is also a Banach space over K. Fix any Cauchy sequence (f_n)_{n=1}^∞ ⊆ C_k(Ω, Y). By the definition of the norm ‖·‖_{C_k}, (f_n^(i))_{n=1}^∞ ⊆ C(Ω, B_i(X, Y)) =: W_i is a Cauchy sequence, ∀i ∈ {0, . . . , k}. ∀i ∈ {0, . . . , k}, by Example 7.32 and Proposition 7.66, C(Ω, B_i(X, Y)) is a Banach space. Then, ∃g_i ∈ C(Ω, B_i(X, Y)) such that lim_{n∈N} f_n^(i) = g_i in C(Ω, B_i(X, Y)). Then, (f_n^(i))_{n=1}^∞ converges uniformly to g_i and g_i is continuous. By Proposition 9.68, we have g_i^(1) = g_{i+1}, ∀i ∈ {0, . . . , k − 1}. Then, we have g_0^(i) = g_i, ∀i ∈ {0, . . . , k}, and g_0 ∈ C_k(Ω, Y). Then, ‖f_n − g_0‖_{C_k} = (Σ_{i=0}^{k} ‖f_n^(i) − g_0^(i)‖_C²)^{1/2} = (Σ_{i=0}^{k} ‖f_n^(i) − g_i‖_C²)^{1/2} → 0 as n → ∞, where the first equality follows from Proposition 9.40. Then, lim_{n∈N} f_n = g_0 in C_k(Ω, Y). Hence, C_k(Ω, Y) is complete and therefore a Banach space. %
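The mechanism of Proposition 9.68 can be checked numerically in the scalar case (a sketch, not from the text): take F_n to be the Taylor partial sums of e^x on [−1, 1]; then F_n' = F_{n−1}, so the derivatives converge uniformly, and the derivative of the limit equals the limit of the derivatives:

```python
import math

def partial_exp(n, x):
    """F_n(x) = sum_{k=0}^{n} x^k / k!, the n-th Taylor partial sum of e^x."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def partial_exp_deriv(n, x):
    """F_n'(x); note F_n' = F_{n-1}, so (F_n') converges uniformly on
    [-1, 1] to exp, which is hypothesis (iii) of Proposition 9.68."""
    return partial_exp(n - 1, x)

x0 = 0.5
# lim_n F_n'(x0) equals (lim_n F_n)'(x0) = exp(x0), as the proposition asserts:
assert abs(partial_exp_deriv(25, x0) - math.exp(x0)) < 1e-12
# sampled sup-distance of F_25' from exp on [-1, 1] is already tiny:
sup_err = max(abs(partial_exp_deriv(25, x) - math.exp(x))
              for x in [i / 100.0 for i in range(-100, 101)])
assert sup_err < 1e-12
```

Uniform convergence of the derivatives is the essential hypothesis: pointwise convergence of (F_n) alone would not license the interchange.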


Sometimes, we need to consider a normed linear space of continuous functions on a topological space that is not necessarily compact. This leads us to the following examples.

Example 9.71 Let X := (X, O) be a topological space, Y be a normed linear space over the field K, and C_v(X, Y) be the vector space of all continuous functions of X to Y as defined in Example 7.50 with null vector ϑ. Define a function ‖·‖ : C_v(X, Y) → R_e by ‖f‖ = max{sup_{x∈X} ‖f(x)‖_Y, 0}, ∀f ∈ C_v(X, Y). Consider the set M := {f ∈ C_v(X, Y) | ‖f‖ < +∞}. Clearly, ϑ ∈ M. ∀f_1, f_2 ∈ M, ∀α, β ∈ K, ‖αf_1 + βf_2‖ = max{sup_{x∈X} ‖αf_1(x) + βf_2(x)‖_Y, 0} ≤ max{sup_{x∈X} (|α|‖f_1(x)‖_Y + |β|‖f_2(x)‖_Y), 0} < +∞. Then, αf_1 + βf_2 ∈ M. Hence, M is a subspace of C_v(X, Y). Clearly, ∀f ∈ M, ‖f‖ ∈ [0, ∞) ⊂ R and ‖f‖ = 0 ⇔ f = ϑ. ∀f_1, f_2 ∈ M, ∀α ∈ K, ‖f_1 + f_2‖ = max{sup_{x∈X} ‖f_1(x) + f_2(x)‖_Y, 0} ≤ max{sup_{x∈X} ‖f_1(x)‖_Y + sup_{x∈X} ‖f_2(x)‖_Y, 0} = ‖f_1‖ + ‖f_2‖, where the first inequality follows from Proposition 3.81, and

  ‖αf_1‖ = max{sup_{x∈X} ‖αf_1(x)‖_Y, 0} = max{sup_{x∈X} |α|‖f_1(x)‖_Y, 0}
     = 0 if α = 0;  = max{|α| sup_{x∈X} ‖f_1(x)‖_Y, 0} if α ≠ 0
     = |α|‖f_1‖

where the third equality follows from Proposition 3.81. Hence, (M, K, ‖·‖) is a normed linear space, which will be denoted by C_b(X, Y). %

Example 9.72 Let X := (X, O) be a topological space and Y be a Banach space over the field K (with norm ‖·‖_Y). Consider the normed linear space C_b(X, Y) (with norm ‖·‖) defined in Example 9.71. We will show that this space is a Banach space. We will distinguish two exhaustive and mutually exclusive cases: Case 1: X = ∅; Case 2: X ≠ ∅. Case 1: X = ∅. Then, C_b(X, Y) is a singleton set. Hence, any Cauchy sequence in C_b(X, Y) must converge. Thus, C_b(X, Y) is a Banach space. Case 2: X ≠ ∅. Take a Cauchy sequence (f_n)_{n=1}^∞ ⊆ C_b(X, Y). ∀ε ∈ (0, ∞) ⊂ R, ∃N ∈ N such that ∀n, m ≥ N, 0 ≤ ‖f_n(x) − f_m(x)‖_Y ≤ ‖f_n − f_m‖ < ε, ∀x ∈ X. This shows that, ∀x ∈ X, (f_n(x))_{n=1}^∞ ⊆ Y is a Cauchy sequence, which converges to f(x) ∈ Y since Y is complete. This defines a function f : X → Y. It is easy to show that (f_n)_{n=1}^∞, viewed as a sequence of functions of X to Y, converges uniformly to f. By Proposition 4.26, f is continuous. ∀x ∈ X, ‖f(x)‖_Y ≤ ‖f_N(x) − f(x)‖_Y + ‖f_N(x)‖_Y = lim_{m∈N} ‖f_N(x) − f_m(x)‖_Y + ‖f_N(x)‖_Y ≤ ε + ‖f_N‖. Hence, ‖f‖ ≤ ‖f_N‖ + ε. Then, f ∈ C_b(X, Y). It is easy to show that lim_{n∈N} ‖f_n − f‖ = 0. Hence, lim_{n∈N} f_n = f in C_b(X, Y). Hence, C_b(X, Y) is a Banach space. In both cases, we have shown that C_b(X, Y) is a Banach space when Y is a Banach space. %

Example 9.73 Let X and Y be normed linear spaces over K, Ω ⊆ X be endowed with the subset topology, which satisfies span(A_Ω(x)) = X, ∀x ∈ Ω, k ∈ N, and W := C_b(Ω, Y) be the normed linear space defined in Example 9.71 with null vector ϑ_W. Define Z := {f ∈ C_b(Ω, Y) | f is C_k and f^(i) ∈ C_b(Ω, B_i(X, Y)), i = 1, . . . , k}. By Proposition 9.40, Z is a subspace of W. Then, Z := (Z, ⊕_W, ⊗_W, ϑ_W) is a vector space over the field K. ∀f ∈ Z, ∀i ∈ {0, . . . , k}, f^(i) ∈ C_b(Ω, B_i(X, Y)) with norm ‖f^(i)‖_{C_b} ∈ R. Now, define a norm on Z by ‖f‖_{C_bk} := (Σ_{i=0}^{k} ‖f^(i)‖_{C_b}²)^{1/2} ∈ [0, ∞) ⊂ R, ∀f ∈ Z. If f = ϑ_W, by Proposition 9.33, f is C_∞ and f^(i)(x) = ϑ_{B_i(X,Y)}, ∀x ∈ Ω, ∀i ∈ N; then ‖f‖_{C_bk} = 0. On the other hand, if ‖f‖_{C_bk} = 0, then f = ϑ_W. ∀f_1, f_2 ∈ Z, by Proposition 9.40, ‖f_1 + f_2‖_{C_bk} = (Σ_{i=0}^{k} ‖f_1^(i) + f_2^(i)‖_{C_b}²)^{1/2} ≤ (Σ_{i=0}^{k} ‖f_1^(i)‖_{C_b}²)^{1/2} + (Σ_{i=0}^{k} ‖f_2^(i)‖_{C_b}²)^{1/2} = ‖f_1‖_{C_bk} + ‖f_2‖_{C_bk}, where the inequality follows from Minkowski's Inequality, Theorem 7.9. ∀α ∈ K, ∀f ∈ Z, by Proposition 9.40, ‖αf‖_{C_bk} = (Σ_{i=0}^{k} |α|²‖f^(i)‖_{C_b}²)^{1/2} = |α|‖f‖_{C_bk}. This shows that (Z, K, ‖·‖_{C_bk}) is a normed linear space, which will be denoted C_bk(Ω, Y). %
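The norm of Example 9.71 can be sampled numerically in the scalar case. The sketch below uses a finite grid as a stand-in for X (a grid cannot certify a true supremum, so this is only an illustration, and `cb_norm` is a hypothetical helper name):

```python
import math

def cb_norm(f, samples):
    """Sampled version of the norm of Example 9.71:
    ||f|| = max(sup_x ||f(x)||_Y, 0), here with Y = R and a finite grid
    standing in for X."""
    return max(max(abs(f(x)) for x in samples), 0.0)

grid = [i / 200.0 for i in range(-200, 201)]   # stand-in for X = [-1, 1]
f1, f2 = math.sin, math.cos
n1, n2 = cb_norm(f1, grid), cb_norm(f2, grid)
n_sum = cb_norm(lambda x: f1(x) + f2(x), grid)
assert n_sum <= n1 + n2 + 1e-12                # triangle inequality
a = -3.0                                       # homogeneity, |a| = 3
assert abs(cb_norm(lambda x: a * f1(x), grid) - abs(a) * n1) < 1e-12
```

The `max(..., 0)` in the definition matters only when X is empty, where the supremum over no points is −∞; the grid version never exercises that branch.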

Example 9.74 Let X be a normed linear space over K and Y be a Banach space over K, Ω ⊆ X be endowed with the subset topology, and k ∈ N. Assume that, ∀x ∈ Ω, ∃δ_x ∈ (0, ∞) ⊂ R such that Ω ∩ B_X(x, δ_x) − x is a conic segment and span(A_Ω(x)) = X. Let C_bk(Ω, Y) be the normed linear space defined in Example 9.73. We will show that C_bk(Ω, Y) is also a Banach space over K. Fix any Cauchy sequence (f_n)_{n=1}^∞ ⊆ C_bk(Ω, Y). By the definition of the norm ‖·‖_{C_bk}, (f_n^(i))_{n=1}^∞ ⊆ C_b(Ω, B_i(X, Y)) =: W_i is a Cauchy sequence, ∀i ∈ {0, . . . , k}. ∀i ∈ {0, . . . , k}, by Example 9.72 and Proposition 7.66, C_b(Ω, B_i(X, Y)) is a Banach space. Then, ∃g_i ∈ C_b(Ω, B_i(X, Y)) such that lim_{n∈N} f_n^(i) = g_i in C_b(Ω, B_i(X, Y)). Then, (f_n^(i))_{n=1}^∞ converges uniformly to g_i and g_i is continuous. By Proposition 9.68, we have g_i^(1) = g_{i+1}, ∀i ∈ {0, . . . , k − 1}. Then, we have g_0^(i) = g_i, ∀i ∈ {0, . . . , k}, and g_0 ∈ C_bk(Ω, Y). Furthermore, ‖f_n − g_0‖_{C_bk} = (Σ_{i=0}^{k} ‖f_n^(i) − g_0^(i)‖_{C_b}²)^{1/2} = (Σ_{i=0}^{k} ‖f_n^(i) − g_i‖_{C_b}²)^{1/2} → 0 as n → ∞, where the first equality follows from Proposition 9.40. Then, lim_{n∈N} f_n = g_0 in C_bk(Ω, Y). Hence, C_bk(Ω, Y) is complete and therefore a Banach space. %
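The ℓ²-style aggregation of the derivative norms in ‖·‖_{C_bk} (Examples 9.69 and 9.73) is easy to evaluate once the component sup norms are known. A small sketch, with the hypothetical helper `cbk_norm` taking the list (‖f‖, ‖f'‖, ..., ‖f^(k)‖) of sup norms:

```python
import math

def cbk_norm(derivative_sup_norms):
    """||f||_{C_bk} = ( sum_{i=0}^{k} ||f^(i)||_{C_b}^2 )^{1/2},
    given the sup norms of f, f', ..., f^(k)."""
    return math.sqrt(sum(s * s for s in derivative_sup_norms))

# f = sin on R: f = sin, f' = cos, f'' = -sin, each with sup norm 1,
# so ||sin||_{C_b2} = sqrt(1 + 1 + 1) = sqrt(3).
assert abs(cbk_norm([1.0, 1.0, 1.0]) - math.sqrt(3.0)) < 1e-12

# Triangle inequality via Minkowski (Theorem 7.9), using the worst case
# ||(f+g)^(i)|| = ||f^(i)|| + ||g^(i)|| componentwise:
f_norms, g_norms = [1.0, 1.0, 1.0], [2.0, 0.5, 0.25]
sum_norms = [a + b for a, b in zip(f_norms, g_norms)]
assert cbk_norm(sum_norms) <= cbk_norm(f_norms) + cbk_norm(g_norms) + 1e-12
```

This is the same two-step estimate as in the text: the sup norm handles each derivative order, and Minkowski's inequality in R^{k+1} handles the aggregation.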


9.8 Tensor Algebra

Definition 9.75 Let m ∈ Z_+, X_i be a normed linear space over K, i = 1, . . . , m, and Z be a normed linear space over K. A bounded linear operator A ∈ B(X_m, B(X_{m−1}, . . . , B(X_1, Z) · · · )) is said to be an mth order Z-valued tensor. Let B ∈ B(Y_n, B(Y_{n−1}, . . . , B(Y_1, X_m) · · · )) be another nth order X_m-valued tensor. We define

  AB := A · B ∈ B(Y_n, . . . , B(Y_1, B(X_{m−1}, . . . , B(X_1, Z) · · · )) · · · )

to be an (n + m − 1)th order Z-valued tensor such that

  (AB)(y_n) · · · (y_1)(x_{m−1}) · · · (x_1) = A(B(y_n) · · · (y_1))(x_{m−1}) · · · (x_1) ∈ Z

∀y_i ∈ Y_i, i = 1, . . . , n, ∀x_j ∈ X_j, j = 1, . . . , m − 1. Let (n_1, . . . , n_m) be any permutation of (1, . . . , m). Then, we may define the transpose of the tensor A with permutation (n_1, . . . , n_m) to be A^{T_{n_1,...,n_m}} ∈ B(X_{n_m}, . . . , B(X_{n_1}, Z) · · · ) such that

  A^{T_{n_1,...,n_m}}(x_{n_m}) · · · (x_{n_1}) = A(x_m) · · · (x_1)  ∀x_i ∈ X_i, i = 1, . . . , m  %

Proposition 9.76 Let m, n ∈ Z_+, X_i, i = 1, . . . , m, Y_j, j = 1, . . . , n, Z be normed linear spaces over K, A, A_k ∈ W_1 := B(X_m, . . . , B(X_1, Z) · · · ), k = 1, 2, be mth order Z-valued tensors, B, B_l ∈ W_2 := B(Y_n, . . . , B(Y_1, X_m) · · · ) be nth order X_m-valued tensors, l = 1, 2, and W_3 := B(Y_n, . . . , B(Y_1, B(X_{m−1}, . . . , B(X_1, Z) · · · )) · · · ). Then, the following statements hold:

(i) ‖AB‖ ≤ ‖A‖‖B‖.
(ii) ∀α_k, β_l ∈ K, k = 1, 2, l = 1, 2, we have (α_1 A_1 + α_2 A_2)(β_1 B_1 + β_2 B_2) = α_1 β_1 A_1 B_1 + α_1 β_2 A_1 B_2 + α_2 β_1 A_2 B_1 + α_2 β_2 A_2 B_2.
(iii) Let A ∈ B(X_2, B(X_1, K)) be a second order K-valued tensor. Then, A^{T_{2,1}} = A′ φ_{X_1}, where A′ is the adjoint of A and φ_{X_1} : X_1 → X_1** is the natural mapping as defined in Remark 7.88.
(iv) Let f : W_1 × W_2 → W_3 be defined by f(A, B) = AB ∈ W_3, ∀A ∈ W_1, ∀B ∈ W_2. Then, f is C_∞, f^(1)(A_0, B_0)(Δ_{1,1}, Δ_{2,1}) = A_0 Δ_{2,1} + Δ_{1,1} B_0, f^(2)(A_0, B_0)(Δ_{1,1}, Δ_{2,1})(Δ_{1,2}, Δ_{2,2}) = Δ_{1,2} Δ_{2,1} + Δ_{1,1} Δ_{2,2}, and f^(i+2)(A_0, B_0) = ϑ_{B_{S i+2}(W_1×W_2, W_3)}, ∀(A_0, B_0) ∈ W_1 × W_2, ∀(Δ_{1,1}, Δ_{2,1}) ∈ W_1 × W_2, ∀(Δ_{1,2}, Δ_{2,2}) ∈ W_1 × W_2, ∀i ∈ N.
(v) Let X be a normed linear space over K, x0 ∈ D ⊆ X, A : D → W_1 and B : D → W_2 be tensor-valued functions that are Fréchet differentiable at x0, and C : D → W_3 be defined by C(x) = A(x)B(x), ∀x ∈ D. Then,

  C^(1)(x0) = ((A^(1)(x0))^{T_{1,...,m−1,m+1,m}} B(x0))^{T_{1,...,m−1,m+1,...,n+m,m}} + A(x0) B^(1)(x0)

(vi) Let X be a normed linear space over K, x0 ∈ D ⊆ X, A : D → W_1 be a tensor-valued function that is Fréchet differentiable at x0, (n_1, . . . , n_m) be a permutation of (1, . . . , m), and C : D → W_p := B(X_{n_m}, . . . , B(X_{n_1}, Z) · · · ) be defined by C(x) = (A(x))^{T_{n_1,...,n_m}}, ∀x ∈ D. Then, we have C^(1)(x0) = (A^(1)(x0))^{T_{n_1,...,n_m,m+1}}.

Proof (i) and (ii) These are straightforward and are therefore omitted. (iii) Note that A ∈ B(X_2, X_1*). Then, A′ ∈ B(X_1**, X_2*) and A^{T_{2,1}} ∈ B(X_1, B(X_2, K)) = B(X_1, X_2*). Then, A^{T_{2,1}}(x_1)(x_2) = A(x_2)(x_1) = ⟨⟨A(x_2), x_1⟩⟩ = ⟨⟨φ_{X_1}(x_1), A(x_2)⟩⟩ = ⟨⟨A′(φ_{X_1}(x_1)), x_2⟩⟩ = A′(φ_{X_1}(x_1))(x_2), ∀x_i ∈ X_i, i = 1, 2. Then, we have A^{T_{2,1}} = A′ φ_{X_1}. (iv) is straightforward and is therefore omitted. (v) follows directly from (iv), the chain rule, and Proposition 9.19. (vi) is straightforward and is therefore omitted. □

Definition 9.77 Let X_i, i = 1, . . . , m, Z, Y_j, j = 1, . . . , n, be normed linear spaces over K, A ∈ B(X_m, . . . , B(X_1, Z) · · · ) be an mth order Z-valued tensor, and B ∈ B(Y_n, . . . , B(Y_1, K) · · · ) be an nth order K-valued tensor. Define the outer product of A and B to be an (n + m)th order Z-valued tensor C := A ⊗ B ∈ B(Y_n, . . . , B(Y_1, B(X_m, . . . , B(X_1, Z) · · · )) · · · ) such that C(y_n) · · · (y_1)(x_m) · · · (x_1) = B(y_n) · · · (y_1) A(x_m) · · · (x_1) ∈ Z, ∀x_i ∈ X_i, i = 1, . . . , m, ∀y_j ∈ Y_j, j = 1, . . . , n. Similarly, we may define the outer product of B and A to be an (n + m)th order Z-valued tensor C̄ := B ⊗ A ∈ B(X_m, . . . , B(X_1, B(Y_n, . . . , B(Y_1, Z) · · · )) · · · ) such that C̄(x_m) · · · (x_1)(y_n) · · · (y_1) = B(y_n) · · · (y_1) A(x_m) · · · (x_1) ∈ Z, ∀x_i ∈ X_i, i = 1, . . . , m, ∀y_j ∈ Y_j, j = 1, . . . , n. %

Proposition 9.78 Let m, n ∈ Z_+, X_i, i = 1, . . . , m, Y_j, j = 1, . . . , n, Z_τ, τ = 1, 2, be normed linear spaces over K with Z_1 = K or Z_2 = K, Z = Z_1 if Z_2 = K, Z = Z_2 if Z_1 = K, A, A_k ∈ W_1 := B(X_m, . . . , B(X_1, Z_1) · · · ), k = 1, 2, be mth order Z_1-valued tensors, B, B_l ∈ W_2 := B(Y_n, . . . , B(Y_1, Z_2) · · · ) be nth order Z_2-valued tensors, l = 1, 2, and W_3 := B(Y_n, . . . , B(Y_1, B(X_m, . . . , B(X_1, Z) · · · )) · · · ). Then, the following statements hold:

(i) ‖A ⊗ B‖ ≤ ‖A‖‖B‖.
(ii) ∀α_k, β_l ∈ K, k = 1, 2, l = 1, 2, we have (α_1 A_1 + α_2 A_2) ⊗ (β_1 B_1 + β_2 B_2) = α_1 β_1 A_1 ⊗ B_1 + α_1 β_2 A_1 ⊗ B_2 + α_2 β_1 A_2 ⊗ B_1 + α_2 β_2 A_2 ⊗ B_2.
(iii) Let f : W_1 × W_2 → W_3 be defined by f(A, B) = A ⊗ B ∈ W_3, ∀A ∈ W_1, ∀B ∈ W_2. Then, f is C_∞, f^(1)(A_0, B_0)(Δ_{1,1}, Δ_{2,1}) = A_0 ⊗ Δ_{2,1} + Δ_{1,1} ⊗ B_0, f^(2)(A_0, B_0)(Δ_{1,1}, Δ_{2,1})(Δ_{1,2}, Δ_{2,2}) = Δ_{1,2} ⊗ Δ_{2,1} + Δ_{1,1} ⊗ Δ_{2,2}, and f^(i+2)(A_0, B_0) = ϑ_{B_{S i+2}(W_1×W_2, W_3)}, ∀(A_0, B_0) ∈ W_1 × W_2, ∀(Δ_{1,1}, Δ_{2,1}) ∈ W_1 × W_2, ∀(Δ_{1,2}, Δ_{2,2}) ∈ W_1 × W_2, ∀i ∈ N.
(iv) Let X be a normed linear space over K, x0 ∈ D ⊆ X, A : D → W_1 and B : D → W_2 be tensor-valued functions that are Fréchet differentiable at x0, and C : D → W_3 be defined by C(x) = A(x) ⊗ B(x), ∀x ∈ D. Then,

  C^(1)(x0) = (A^(1)(x0) ⊗ B(x0))^{T_{1,...,m,m+2,...,n+m+1,m+1}} + A(x0) ⊗ B^(1)(x0)

Proof These are straightforward and are therefore omitted. □
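Definitions 9.75 and 9.77 can be made concrete in the finite-dimensional case (a sketch outside the text, with hypothetical helper names `apply2`, `transpose`, `outer`): a second order K-valued tensor on R^n is a bilinear form, representable by a matrix, and the tensor transpose with permutation (2, 1) is then the matrix transpose:

```python
# A second order K-valued tensor A on R^n, via a matrix M:
# A(x2)(x1) = sum_{i,j} x2_i M_ij x1_j.
def apply2(M, x2, x1):
    return sum(x2[i] * M[i][j] * x1[j]
               for i in range(len(M)) for j in range(len(M[0])))

def transpose(M):
    return [[M[i][j] for i in range(len(M))] for j in range(len(M[0]))]

M = [[1.0, 2.0], [3.0, 4.0]]
x1, x2 = [1.0, -1.0], [2.0, 0.5]
# Definition 9.75: A^{T_{2,1}}(x1)(x2) = A(x2)(x1), i.e. the matrix transpose:
assert abs(apply2(transpose(M), x1, x2) - apply2(M, x2, x1)) < 1e-12

# Definition 9.77, outer product of two first order K-valued tensors a, b:
# (A ⊗ B)(y)(x) = B(y) * A(x), a rank-one bilinear form with matrix b_i a_j.
def outer(a, b):
    return [[b[i] * a[j] for j in range(len(a))] for i in range(len(b))]

a, b = [1.0, 2.0], [3.0, -1.0]
C = outer(a, b)
y, x = [0.5, 2.0], [1.0, 1.0]
b_y = sum(b[i] * y[i] for i in range(2))
a_x = sum(a[j] * x[j] for j in range(2))
assert abs(apply2(C, y, x) - b_y * a_x) < 1e-12
```

In this finite-dimensional picture, Proposition 9.76(iii) is the familiar statement that transposing a bilinear form swaps its two argument slots.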

9.9 Analytic Functions

Definition 9.79 Let X and Y be normed linear spaces over K, D ⊆ X, f : D → Y, and x0 ∈ D°. f is said to be analytic at x0 if f is C_∞ at x0 and ∃δ ∈ (0, ∞) ⊂ R, ∃c ∈ [0, ∞) ⊂ R, and ∃M ∈ (0, ∞) ⊂ R such that ‖f^(n)(x)‖ ≤ c n! M^n, ∀n ∈ Z_+, ∀x ∈ B_X(x0, δ) ⊆ D. In this case, any δ̄ ∈ (0, δ] ⊂ R with M δ̄ < 1 is called an analytic radius of f at x0. If D ⊇ D_1 ∈ O_X and f is analytic at x, ∀x ∈ D_1, then we say that f is analytic on D_1. When f is analytic on D, then we say that f is analytic or an analytic function. %

Theorem 9.80 (Taylor Series) Let X and Y be normed linear spaces over K, D ⊆ X, f : D → Y, and x0 ∈ D°. Assume that f is analytic at x0 with an analytic radius δ ∈ (0, ∞) ⊂ R. Then, ∀x ∈ B_X(x0, δ),

  f(x) = Σ_{n=0}^{∞} (1/n!) f^(n)(x0) (x − x0) · · · (x − x0)   (n times)

Proof Since f is analytic at x0 ∈ D° with an analytic radius δ ∈ (0, ∞) ⊂ R, then ∃c ∈ [0, ∞) ⊂ R and ∃M ∈ (0, ∞) ⊂ R such that ‖f^(n)(x)‖ ≤ c n! M^n, ∀n ∈ Z_+, ∀x ∈ B_X(x0, δ) ⊆ D, and Mδ < 1. Then, ∀x ∈ B_X(x0, δ), ∀n ∈ Z_+, let

  R_n(x, x0; f) := f(x) − Σ_{i=0}^{n} (1/i!) f^(i)(x0) (x − x0) · · · (x − x0)   (i times)

By Taylor Theorem 9.48, ∃t_0 ∈ (0, 1) ⊂ R such that ‖R_n(x, x0; f)‖ ≤ (1/(n+1)!) ‖f^(n+1)(t_0 x + (1 − t_0)x0)‖ ‖x − x0‖^{n+1} < c(Mδ)^{n+1}. Clearly, lim_{n∈N} R_n(x, x0; f) = 0. Hence, the desired equality holds. This completes the proof of the theorem. □

Next, we establish a result that is needed in the further analysis of analytic functions.

Proposition 9.81 Let X, Y, Z, and W be normed linear spaces over K, D_1 ⊆ X, f : D_1 → B(Y, Z), g : D_1 → B(Z, W), n ∈ Z_+, and x0 ∈ D_1°. Assume that f and g are C_n at x0. Define h : D_1 → B(Y, W) by h(x) = g(x)f(x), ∀x ∈ D_1. Then, h is C_n at x0 such that

  h^(n)(x0)(ξ_{x_n}) · · · (ξ_{x_1})(ξ_y) = Σ_{i=0}^{n} Σ_{Λ_1} g^(i)(x0)(ξ_{x_{j_1}}) · · · (ξ_{x_{j_i}}) (f^(n−i)(x0)(ξ_{x_{j_{i+1}}}) · · · (ξ_{x_{j_n}})(ξ_y))   (9.1)

∀ξ_{x_l} ∈ X, l = 1, . . . , n, ∀ξ_y ∈ Y, where the inner sum is over all i-element subsets Λ_1 := {j_1, . . . , j_i} of Λ := {1, . . . , n}, with Λ_2 := {j_{i+1}, . . . , j_n} := Λ \ Λ_1.

Proof By Propositions 9.45, 9.44, and 9.42, h is C_n at x0. Since f and g are C_n at x0, by Proposition 9.28, then ∃δ ∈ (0, ∞) ⊂ R such that f^(n)(x0) ∈ B_{S n}(X, B(Y, Z)) and g^(n)(x0) ∈ B_{S n}(X, B(Z, W)), and f^(i)(x) ∈ B_{S i}(X, B(Y, Z)) and g^(i)(x) ∈ B_{S i}(X, B(Z, W)), i = 1, . . . , n − 1, ∀x ∈ B_X(x0, δ) ⊆ D_1. We will use mathematical induction on n to prove the result.

1° n = 0. We have, ∀ξ_y ∈ Y, LHS = h^(0)(x0)(ξ_y) = h(x0)(ξ_y) = g(x0)f(x0)ξ_y = g(x0)(f(x0)(ξ_y)) = RHS.

This case is proved.

2° Assume that the result holds for n ≤ l ∈ Z_+.

3° Consider the case when n = l + 1. Since f and g are C_{l+1} at x0, then ∃δ ∈ (0, ∞) ⊂ R such that f and g are C_l at x, ∀x ∈ B_X(x0, δ) ⊆ D_1. By the inductive assumption, ∀x ∈ B_X(x0, δ), ∀ξ_{x_s} ∈ X, s = 1, . . . , l, ∀ξ_y ∈ Y,

  h^(l)(x)(ξ_{x_l}) · · · (ξ_{x_1})(ξ_y) = Σ_{i=0}^{l} Σ_{Λ_1} g^(i)(x)(ξ_{x_{j_1}}) · · · (ξ_{x_{j_i}}) (f^(l−i)(x)(ξ_{x_{j_{i+1}}}) · · · (ξ_{x_{j_l}})(ξ_y))

where the inner sum is over all i-element subsets Λ_1 := {j_1, . . . , j_i} of Λ := {1, . . . , l}, with Λ_2 := {j_{i+1}, . . . , j_l} := Λ \ Λ_1. Then, by Proposition 9.27, we have, ∀ξ_{x_s} ∈ X, s = 1, . . . , l + 1, ∀ξ_y ∈ Y, h^(l+1)(x0)(ξ_{x_{l+1}}) · · · (ξ_{x_1}) = h̄^(1)(x0)(ξ_{x_{l+1}}), where h̄ : B_X(x0, δ) → B(Y, W) is defined by h̄(x) = h^(l)(x)(ξ_{x_l}) · · · (ξ_{x_1}), ∀x ∈ B_X(x0, δ). This implies that, by the chain rule (Theorem 9.18) and Proposition 9.42,

  h^(l+1)(x0)(ξ_{x_{l+1}}) · · · (ξ_{x_1})(ξ_y)
   = Σ_{i=0}^{l} Σ_{Λ_1 ⊆ {1,...,l}, |Λ_1|=i} [ g^(i+1)(x0)(ξ_{x_{l+1}})(ξ_{x_{j_1}}) · · · (ξ_{x_{j_i}}) (f^(l−i)(x0)(ξ_{x_{j_{i+1}}}) · · · (ξ_{x_{j_l}})(ξ_y))
     + g^(i)(x0)(ξ_{x_{j_1}}) · · · (ξ_{x_{j_i}}) (f^(l−i+1)(x0)(ξ_{x_{l+1}})(ξ_{x_{j_{i+1}}}) · · · (ξ_{x_{j_l}})(ξ_y)) ]
   = Σ_{i=0}^{l+1} [ Σ_{Λ_1 ⊆ {1,...,l}, |Λ_1|=i} g^(i)(x0)(ξ_{x_{j_1}}) · · · (ξ_{x_{j_i}}) (f^(l+1−i)(x0)(ξ_{x_{l+1}})(ξ_{x_{j_{i+1}}}) · · · (ξ_{x_{j_l}})(ξ_y))
     + Σ_{Λ_1 ⊆ {1,...,l}, |Λ_1|=i−1} g^(i)(x0)(ξ_{x_{l+1}})(ξ_{x_{j_1}}) · · · (ξ_{x_{j_{i−1}}}) (f^(l+1−i)(x0)(ξ_{x_{j_i}}) · · · (ξ_{x_{j_l}})(ξ_y)) ]
   = Σ_{i=0}^{l+1} Σ_{Λ_1 ⊆ {1,...,l+1}, |Λ_1|=i} g^(i)(x0)(ξ_{x_{j_1}}) · · · (ξ_{x_{j_i}}) (f^(l+1−i)(x0)(ξ_{x_{j_{i+1}}}) · · · (ξ_{x_{j_{l+1}}})(ξ_y))


where the last equality follows since all i-element subsets of {1, . . . , l + 1} are divided into two disjoint classes: one with l + 1 as an element; and the other without l + 1 as an element. This completes the induction process. This completes the proof of the proposition. ' & Now, we establish the important result that the composition of analytic functions is an analytic function. Theorem 9.82 Let X, Y, and Z be normed linear spaces over K, D1 ⊆ X, D2 ⊆ Y, f : D1 → D2 , g : D2 → Z, x0 ∈ D1◦ , and y0 := f (x0 ) ∈ D2◦ . Assume that f and g are analytic at x0 and y0 , respectively. Then, h := g ◦ f : D1 → Z is analytic at x0 . Proof Since g is analytic at y06, then ∃δ6g ∈ (0, ∞) ⊂ R, ∃cg ∈ [0, ∞) ⊂ R, and ∃Mg ∈ (0, ∞) ⊂ R such that 6g (n) (y)6 ≤ cg n!Mgn , ∀n ∈ Z+ , ∀y ∈ BY y0 , δg ⊆ D2 . Since f is analytic at x0 , then 6 ∃δf ∈ 6(0, ∞) ⊂ R, ∃cf ∈ [0, ∞) ⊂ R, and  ∃Mf ∈ (0, ∞) ⊂ R such that 6f (n) (x)6 ≤ cf n!Mfn and f (x) ∈ BY y0 , δg ,     ∀n ∈ Z+ , ∀x ∈ BX x0 , δf . By Proposition 9.45, h is C∞6 at ∀x ∈ BX x0 ,6δf . Define nonnegative constants Kn,i := supx∈BX (x0 ,δf ) 6Dn (g (i) ◦ f )(x)6, ∀i, n ∈ Z+ . Clearly, ∀i ∈ Z+ , K0,i =

.

6 6 (i) 6 (i) 6 6g ◦ f (x)6 ≤ 6g (y)6 ≤ cg i!M i sup sup g x∈BX (x0 ,δf ) y∈BY (y0 ,δg )

Note that, ∀i, n ∈ Z+ , by Chain Rule (Theorem 9.18), 6 6 n+1 (i) 6D (g ◦ f )(x)6 sup x∈BX (x0 ,δf ) 6 6 n (i+1) 6D ((g = sup ◦ f )(f (1) ))(x)6 x∈BX (x0 ,δf )

Kn+1,i =

.

  (i+1) (f (x)), ∀x ∈ Define  li+1 : BX x0 , δf → BS i+1 (Y, Z) by li+1 (x) := g BX x0 , δf , ∀i ∈ Z+ . Then, ∀i, n ∈ Z+ , .

6 6 n (i+1) 6D ((g sup ◦ f )(f (1) ))(x)6 x∈BX (x0 ,δf ) 6 6 n 6D (li+1 f (1) )(x)6 = sup x∈BX (x0 ,δf ) 6 6 n 6D (li+1 f (1) )(x)(ξx ) · · · (ξx )(ξx )6 sup = sup n 1 x∈BX (x0 ,δf ) ξxs ∈X,s=1,...,n,ξx ∈X, ≤1

Kn+1,i =

ξxs ≤1,ξx

=

sup x∈BX (x0 ,δf )

sup

6. 6 n 6 6 ∈X,

ξxs ∈X,s=1,...,n,ξx ξxs ≤1,ξx ≤1

k=0

322

9 Differentiation in Banach Spaces

. Λ1 :={j1 ,...,jk } is any k-element subset of Λ:={1,...,n};Λ2 :={jk+1 ,...,jn }:=Λ\Λ1

(f ≤

(n−k+1)

(k) li+1 (x)(ξxj1 ) · · · (ξxjk )

6 6 (x)(ξxjk+1 ) · · · (ξxjn )(ξx ))6 6

n 6 6 . . 6 6 (k) sup 6li+1 (x)6 · x∈BX (x0 ,δf ) k=0 Λ1 :={j1 ,...,jk } is any k-element subset of

6 (n−k+1) 6 6f (x)6

Λ:={1,...,n};Λ2 :={jk+1 ,...,jn }:=Λ\Λ1

=

n " #6 66 6 . n 6 k (i+1) 66 6 sup ◦ f )(x)66f (n−k+1) (x)6 6D (g x∈BX (x0 ,δf ) k=0 k



n " # . n Kk,i+1 cf (n + 1 − k)!Mfn+1−k k k=0

where fourth equality follows from Proposition 9.81. Hence, 0 ≤ Kn+1,i ≤ -n the n n−k+1 , ∀i, n ∈ Z+ . k=0 k Kk,i+1 cf (n − k + 1)!Mf We will use mathematical induction on n ∈ N to show that ∀n ∈ N, ∀i ∈ Z+ , 0 ≤ Kn,i ≤

.

(n + i)!cg Mgi+1 Mfn

n−1 .

n j +1

j =0

1◦

!

n−1 j

!

n+i n−j −1

!−1

j +1

cf

j

Mg

n = 1. By the recursive formula and the inequality for K0,i , ∀i ∈ Z+ , K1,i ≤ K0,i+1 cf Mf ≤ (i + 1)!cf Mf cg Mgi+1 = RHS

.

This case is proved.

2◦ Assume that the result holds for n = 1, . . . , l ∈ N.

3◦ Consider the case n = l + 1. ∀i ∈ Z+, by the recursive formula,

\begin{align*}
K_{l+1,i} &\le \sum_{k=0}^{l} \binom{l}{k} K_{k,i+1}\, c_f (l-k+1)!\, M_f^{l-k+1}\\
&\le c_f (l+1)!\, M_f^{l+1} c_g (i+1)!\, M_g^{i+1} + \sum_{k=1}^{l} \binom{l}{k} c_f (l-k+1)!\, M_f^{l-k+1} (k+i+1)!\, c_g M_g^{i+2} M_f^{k} \sum_{j=0}^{k-1} \binom{k}{j+1}\binom{k-1}{j}\binom{k+i+1}{k-j-1}^{-1} c_f^{j+1} M_g^{j}\\
&= c_f c_g M_g^{i+1} M_f^{l+1} (l+i+1)!\,(l+1) \binom{l+i+1}{l}^{-1} + \sum_{k=1}^{l} \sum_{j=0}^{k-1} \frac{l!\,(l-k+1)!\,k!\,(k-1)!\,(i+j+2)!}{k!\,(l-k)!\,j!\,(j+1)!\,(k-j-1)!}\, c_f^{j+2} c_g M_f^{l+1} M_g^{j+i+2}\\
&= c_f c_g M_g^{i+1} M_f^{l+1} (l+i+1)!\,(l+1) \binom{l+i+1}{l}^{-1} + \sum_{j=0}^{l-1} \sum_{k=j+1}^{l} \frac{l!\,(l-k+1)!\,(k-1)!\,(i+j+2)!}{(l-k)!\,j!\,(j+1)!\,(k-j-1)!}\, c_f^{j+2} c_g M_f^{l+1} M_g^{j+i+2}\\
&= (l+i+1)!\, c_g M_g^{i+1} M_f^{l+1} \Bigl( c_f (l+1) \binom{l+i+1}{l}^{-1} + \sum_{j=0}^{l-1} \sum_{k=j+1}^{l} (l-k+1)\binom{k-1}{j} \binom{l}{j+1}\binom{l+i+1}{l-j-1}^{-1} c_f^{j+2} M_g^{j+1} \Bigr)\\
&= (l+i+1)!\, c_g M_g^{i+1} M_f^{l+1} \Bigl( c_f (l+1) \binom{l+i+1}{l}^{-1} + \sum_{j=1}^{l} \binom{l}{j}\binom{l+i+1}{l-j}^{-1} c_f^{j+1} M_g^{j} \sum_{k=j}^{l} (l-k+1)\binom{k-1}{j-1} \Bigr)
\end{align*}

where the second inequality follows from the inductive assumption and the bound for K_{0,i}. Note that

\begin{align*}
\sum_{k=j}^{l} (l-k+1)\binom{k-1}{j-1} &= \sum_{k=j}^{l} (l-k+1)\Bigl(\binom{k}{j} - \binom{k-1}{j}\Bigr) = \sum_{k=j}^{l} (l-k+1)\binom{k}{j} - \sum_{k=j-1}^{l-1} (l-k)\binom{k}{j}\\
&= \sum_{k=j}^{l} \binom{k}{j} = \sum_{k=j}^{l} \Bigl(\binom{k+1}{j+1} - \binom{k}{j+1}\Bigr) = \binom{l+1}{j+1}, \qquad \forall 1 \le j \le l
\end{align*}

This leads to

\begin{align*}
K_{l+1,i} &\le (l+i+1)!\, c_g M_g^{i+1} M_f^{l+1} \Bigl( c_f (l+1) \binom{l+i+1}{l}^{-1} + \sum_{j=1}^{l} \binom{l}{j}\binom{l+1}{j+1}\binom{l+i+1}{l-j}^{-1} c_f^{j+1} M_g^{j} \Bigr)\\
&= (l+i+1)!\, c_g M_g^{i+1} M_f^{l+1} \sum_{j=0}^{l} \binom{l}{j}\binom{l+1}{j+1}\binom{l+i+1}{l-j}^{-1} c_f^{j+1} M_g^{j} = \mathrm{RHS}
\end{align*}

This completes the induction process. Hence, ∀n ∈ N,

\begin{align*}
\sup_{x \in B_X(x_0,\delta_f)} \bigl\| h^{(n)}(x) \bigr\| &= \sup_{x \in B_X(x_0,\delta_f)} \bigl\| D^n (g \circ f)(x) \bigr\| = K_{n,0}
\le n!\, c_g M_g M_f^{n} \sum_{j=0}^{n-1} \binom{n-1}{j}\binom{n}{j+1}\binom{n}{n-j-1}^{-1} c_f^{j+1} M_g^{j}\\
&= n!\, c_g M_g M_f^{n} \sum_{j=0}^{n-1} \binom{n-1}{j} c_f^{j+1} M_g^{j} = n!\, c_f c_g M_g M_f^{n} (1 + c_f M_g)^{n-1}
= n!\, \frac{c_f c_g M_g}{1 + c_f M_g} \bigl( M_f (1 + c_f M_g) \bigr)^{n}
\end{align*}

where the second equality in the last chain uses \(\binom{n}{j+1} = \binom{n}{n-j-1}\).

Hence, h is analytic at x0 . This completes the proof of the theorem.

⊓⊔
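The two combinatorial facts driving the induction above can be spot-checked numerically; a minimal sketch (the values of c_f and M_g below are arbitrary test data, not from the text):

```python
from math import comb

# Telescoping identity from the inductive step:
# sum_{k=j}^{l} (l-k+1)*C(k-1, j-1) == C(l+1, j+1) for 1 <= j <= l.
for l in range(1, 12):
    for j in range(1, l + 1):
        lhs = sum((l - k + 1) * comb(k - 1, j - 1) for k in range(j, l + 1))
        assert lhs == comb(l + 1, j + 1), (l, j)

# Closed form behind the final growth estimate:
# sum_{j=0}^{n-1} C(n-1, j) * cf**(j+1) * Mg**j == cf * (1 + cf*Mg)**(n-1).
cf, Mg = 0.7, 1.3
for n in range(1, 10):
    s = sum(comb(n - 1, j) * cf ** (j + 1) * Mg ** j for j in range(n))
    assert abs(s - cf * (1 + cf * Mg) ** (n - 1)) < 1e-9
print("identities verified")
```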

Proposition 9.83 The constant mapping as defined in Proposition 9.33, the identity mapping as delineated in Proposition 9.36, the projection mapping as delineated in Proposition 9.37, the vector addition, the scalar multiplication, the operation of a linear operator on a vector as delineated in Proposition 9.41, the operation of composition of two linear operators as delineated in Proposition 9.42, and the building up of a linear operator on the product space using linear operators on individual spaces as delineated in Proposition 9.43 are analytic on the interior of the domain of the mapping.

Proof This is immediate from Definition 9.79.

⊓⊔

Proposition 9.84 Let X and Y be Banach spaces over K, D := {L ∈ B(X, Y) | L is bijective}, and f : D → B(Y, X) be defined by f(A) = A⁻¹, ∀A ∈ D ⊆ B(X, Y). Then, f is analytic.
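The derivative formula f⁽¹⁾(A)(Δ) = −f(A)Δf(A) used in the proof below can be sanity-checked by finite differences; a minimal sketch for 2×2 matrices (the test matrices and step size are our own arbitrary choices):

```python
# Finite-difference check of D(A -> A^{-1})(Delta) = -A^{-1} Delta A^{-1}.
def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A = [[2.0, 1.0], [0.5, 3.0]]
D = [[0.3, -0.2], [0.1, 0.4]]
t = 1e-6
Ainv = inv2(A)
At = [[A[i][j] + t * D[i][j] for j in range(2)] for i in range(2)]
# directional finite difference of the inversion map at A along D
fd = [[(inv2(At)[i][j] - Ainv[i][j]) / t for j in range(2)] for i in range(2)]
exact = [[-v for v in row] for row in mul2(mul2(Ainv, D), Ainv)]
err = max(abs(fd[i][j] - exact[i][j]) for i in range(2) for j in range(2))
assert err < 1e-4
print("max deviation:", err)
```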

9.9 Analytic Functions


Proof By Proposition 9.56, D is open in B(X, Y), f is C∞, and f⁽¹⁾(A)(Δ) = −f(A)Δf(A), ∀A ∈ D, ∀Δ ∈ B(X, Y). Fix any A ∈ D. Since D is open, then ∃δ ∈ (0, ∞) ⊂ R such that U := B_{B(X,Y)}(A, δ) ⊆ D. Without loss of generality, assume that δ‖f(A)‖ < 1. By Proposition 9.55, ∀T ∈ U, we have ‖f(T) − f(A)‖ ≤ ‖f(A)‖²‖T − A‖/(1 − ‖f(A)‖‖T − A‖) ≤ ‖f(A)‖²δ(1 − ‖f(A)‖δ)⁻¹. This implies that ‖f(T)‖ ≤ ‖f(A)‖ + ‖f(A)‖²δ(1 − ‖f(A)‖δ)⁻¹ = ‖f(A)‖(1 − δ‖f(A)‖)⁻¹ =: α ∈ [0, ∞) ⊂ R. This leads to

$$\bigl\|f^{(1)}(T)\bigr\| = \sup_{\Delta \in B(X,Y),\ \|\Delta\| \le 1} \bigl\|f^{(1)}(T)(\Delta)\bigr\| \le \|f(T)\|^2 \le \alpha^2$$

We will use mathematical induction on n ∈ Z+ to show that ‖f⁽ⁿ⁾(T)‖ ≤ n!αⁿ⁺¹, ∀n ∈ Z+, ∀T ∈ U.

1◦ Clearly, the n = 0 and n = 1 cases are proved.

2◦ Assume that the result holds for n = 0, . . . , k ∈ N.

3◦ Consider the case when n = k + 1. ∀T ∈ U, ∀Δ₁ ∈ B(X, Y), f⁽¹⁾(T)(Δ₁) = −f(T)Δ₁f(T) =: g_{Δ₁}(T), where g_{Δ₁} : U → B(Y, X). By Proposition 9.27, we have, ∀Δᵢ ∈ B(X, Y), i = 2, . . . , k + 1,

$$g_{\Delta_1}^{(k)}(T)(\Delta_{k+1}) \cdots (\Delta_2) = f^{(k+1)}(T)(\Delta_{k+1}) \cdots (\Delta_2)(\Delta_1)$$

This implies that

$$\bigl\|f^{(k+1)}(T)\bigr\| = \sup_{\substack{\Delta_i \in B(X,Y),\ \|\Delta_i\| \le 1\\ i=1,\dots,k+1}} \bigl\|f^{(k+1)}(T)(\Delta_{k+1}) \cdots (\Delta_1)\bigr\| = \sup_{\substack{\Delta_i \in B(X,Y),\ \|\Delta_i\| \le 1\\ i=1,\dots,k+1}} \bigl\|g_{\Delta_1}^{(k)}(T)(\Delta_{k+1}) \cdots (\Delta_2)\bigr\|$$

By Proposition 9.81, we have the following equality

$$g_{\Delta_1}^{(k)}(T)(\Delta_{k+1}) \cdots (\Delta_2) = -\sum_{i=0}^{k}\ \sum_{\substack{\Lambda_1 := \{j_1,\dots,j_i\} \text{ is any } i\text{-element subset of } \Lambda := \{2,\dots,k+1\}\\ \Lambda_2 := \{j_{i+1},\dots,j_k\} := \Lambda \setminus \Lambda_1}} f^{(i)}(T)(\Delta_{j_1}) \cdots (\Delta_{j_i})\, \Delta_1\, f^{(k-i)}(T)(\Delta_{j_{i+1}}) \cdots (\Delta_{j_k})$$


Then,

$$\bigl\|f^{(k+1)}(T)\bigr\| \le \sum_{i=0}^{k}\ \sum_{\Lambda_1, \Lambda_2} \bigl\|f^{(i)}(T)\bigr\|\,\bigl\|f^{(k-i)}(T)\bigr\| = \sum_{i=0}^{k} \binom{k}{i} \bigl\|f^{(i)}(T)\bigr\|\,\bigl\|f^{(k-i)}(T)\bigr\| \le \sum_{i=0}^{k} \binom{k}{i}\, i!\,\alpha^{i+1}\,(k-i)!\,\alpha^{k+1-i} = \sum_{i=0}^{k} k!\,\alpha^{k+2} = (k+1)!\,\alpha^{k+2}$$

where the last inequality follows from the inductive assumption. This completes the induction process. Hence, f is analytic at T. By the arbitrariness of T, f is an analytic function. This completes the proof of the proposition. ⊓⊔

Finally, we establish the analytic versions of the Inverse Function Theorem and the Implicit Function Theorem.

Theorem 9.85 (Inverse Function Theorem) Let X and Y be Banach spaces over K, D ⊆ X, F : D → Y be analytic at x₀ ∈ D°, and y₀ = F(x₀) ∈ Y. Assume that F⁽¹⁾(x₀) ∈ B(X, Y) is bijective. Then, ∃δ ∈ (0, ∞) ⊂ R, ∃U ⊆ D with x₀ ∈ U ∈ O_X such that:

(i) F|_U : U → B_Y(y₀, δ) =: V ⊆ Y is bijective.
(ii) The inverse mapping Fi : V → U is C∞, and Fi⁽¹⁾ : V → B(Y, X) is given by Fi⁽¹⁾(y) = (F⁽¹⁾(Fi(y)))⁻¹, ∀y ∈ V.
(iii) The inverse mapping Fi is analytic at y₀ ∈ V.

Proof Results (i) and (ii) are implied by the Inverse Function Theorem 9.57. All we need to show is (iii). Since F is analytic at x₀, then ∃δ₁ ∈ (0, ∞) ⊂ R, ∃c ∈ [0, ∞) ⊂ R, and ∃M ∈ (0, ∞) ⊂ R such that Ū := B_X(x₀, δ₁) ⊆ U and ‖F⁽ⁿ⁾(x)‖ ≤ cn!Mⁿ, ∀n ∈ Z+, ∀x ∈ Ū. Then, ∃δ₂ ∈ (0, δ] ⊂ R such that Fi(B_Y(y₀, δ₂)) ⊆ Ū. Define h : Ū → B(Y, X) by h(x) = (F⁽¹⁾(x))⁻¹, ∀x ∈ Ū. Since F|_Ū is an analytic function, then F⁽¹⁾|_Ū is an analytic function. By Proposition 9.84 and Theorem 9.82, h is an analytic function. Then, ∃δ̄₁ ∈ (0, δ₁] ⊂ R, ∃c_h ∈ [0, ∞) ⊂ R, and ∃M_h ∈ (0, ∞) ⊂ R such that ‖h⁽ⁿ⁾(x)‖ ≤ c_h n! M_hⁿ, ∀n ∈ Z+, ∀x ∈ B_X(x₀, δ̄₁). By choosing δ₁ sufficiently small, without loss of generality, we may take δ̄₁ = δ₁.

Note that Fi⁽¹⁾ ∘ F(x) = h(x), ∀x ∈ Ū. Then, h⁽¹⁾(x) = D(Fi⁽¹⁾ ∘ F)(x) = (Fi⁽²⁾ ∘ F(x))F⁽¹⁾(x) and Fi⁽²⁾ ∘ F(x) = (h⁽¹⁾(x))h(x), ∀x ∈ Ū. Define h_k : Ū → B_{Sᵏ}(Y, X) by h_k(x) = Fi⁽ᵏ⁾ ∘ F(x), ∀x ∈ Ū, ∀2 ≤ k ∈ N. This leads to the recursive


relationship

$$h_{k+1}(x) = Fi^{(k+1)} \circ F(x) = (Dh_k(x))\,h(x), \qquad \forall x \in \bar{U},\ \forall 2 \le k \in \mathbb{N}$$

with the initial condition h₂(x) = h⁽¹⁾(x)h(x), ∀x ∈ Ū. Define the nonnegative constants K_{k,j} := sup_{x∈Ū} ‖Dʲh_k(x)‖, ∀j ∈ Z+, ∀2 ≤ k ∈ N. We will use mathematical induction on 2 ≤ k ∈ N to show that

$$0 \le K_{k,j} \le \frac{(j+2k-2)!}{(k-1)!\,2^{k-1}}\, c_h^{k} M_h^{j+k-1}, \qquad \forall j \in \mathbb{Z}_+,\ \forall 2 \le k \in \mathbb{N}$$

1◦

k = 2. ∀j ∈ Z+, by the initial condition of the recursion,

\begin{align*}
K_{2,j} &= \sup_{x \in \bar{U}} \bigl\|D^j h_2(x)\bigr\| = \sup_{x \in \bar{U}} \bigl\|D^j (h^{(1)} h)(x)\bigr\|
= \sup_{x \in \bar{U}}\ \sup_{\substack{\xi_{x_l} \in X,\ \|\xi_{x_l}\| \le 1,\ l=1,\dots,j\\ \xi_y \in Y,\ \|\xi_y\| \le 1}} \bigl\|D^j (h^{(1)} h)(x)(\xi_{x_j}) \cdots (\xi_{x_1})(\xi_y)\bigr\|\\
&= \sup_{x \in \bar{U}}\ \sup_{\substack{\xi_{x_l} \in X,\ \|\xi_{x_l}\| \le 1,\ l=1,\dots,j\\ \xi_y \in Y,\ \|\xi_y\| \le 1}} \Bigl\| \sum_{l=0}^{j}\ \sum_{\substack{\Lambda_1 := \{s_1,\dots,s_l\} \text{ is any } l\text{-element subset of } \Lambda := \{1,\dots,j\}\\ \Lambda_2 := \{s_{l+1},\dots,s_j\} := \Lambda \setminus \Lambda_1}} h^{(l+1)}(x)(\xi_{x_{s_1}}) \cdots (\xi_{x_{s_l}}) \bigl( h^{(j-l)}(x)(\xi_{x_{s_{l+1}}}) \cdots (\xi_{x_{s_j}})(\xi_y) \bigr) \Bigr\|\\
&\le \sup_{x \in \bar{U}} \sum_{l=0}^{j}\ \sum_{\Lambda_1, \Lambda_2} \bigl\|h^{(l+1)}(x)\bigr\|\,\bigl\|h^{(j-l)}(x)\bigr\| = \sup_{x \in \bar{U}} \sum_{l=0}^{j} \binom{j}{l} \bigl\|h^{(l+1)}(x)\bigr\|\,\bigl\|h^{(j-l)}(x)\bigr\|\\
&\le \sum_{l=0}^{j} \frac{j!}{l!\,(j-l)!}\, c_h (l+1)!\, M_h^{l+1}\, c_h (j-l)!\, M_h^{j-l} = j!\, c_h^2 M_h^{j+1} \sum_{l=0}^{j} (l+1) = \frac{(j+2)!}{2}\, c_h^2 M_h^{j+1} = \mathrm{RHS}
\end{align*}

where the fourth equality follows from Proposition 9.81. This case is proved.


2◦ Assume that the result holds for k = n ≥ 2 with n ∈ N.

3◦ Consider the case when k = n + 1. ∀j ∈ Z+, by the recursive relationship,

\begin{align*}
K_{n+1,j} &= \sup_{x \in \bar{U}} \bigl\|D^j h_{n+1}(x)\bigr\| = \sup_{x \in \bar{U}} \bigl\|D^j (h_n^{(1)} h)(x)\bigr\|
= \sup_{x \in \bar{U}}\ \sup_{\substack{\xi_{x_l} \in X,\ \|\xi_{x_l}\| \le 1,\ l=1,\dots,j\\ \xi_y \in Y,\ \|\xi_y\| \le 1}} \bigl\|D^j (h_n^{(1)} h)(x)(\xi_{x_j}) \cdots (\xi_{x_1})(\xi_y)\bigr\|\\
&= \sup_{x \in \bar{U}}\ \sup_{\substack{\xi_{x_l} \in X,\ \|\xi_{x_l}\| \le 1,\ l=1,\dots,j\\ \xi_y \in Y,\ \|\xi_y\| \le 1}} \Bigl\| \sum_{l=0}^{j}\ \sum_{\substack{\Lambda_1 := \{s_1,\dots,s_l\} \text{ is any } l\text{-element subset of } \Lambda := \{1,\dots,j\}\\ \Lambda_2 := \{s_{l+1},\dots,s_j\} := \Lambda \setminus \Lambda_1}} h_n^{(l+1)}(x)(\xi_{x_{s_1}}) \cdots (\xi_{x_{s_l}}) \bigl( h^{(j-l)}(x)(\xi_{x_{s_{l+1}}}) \cdots (\xi_{x_{s_j}})(\xi_y) \bigr) \Bigr\|\\
&\le \sup_{x \in \bar{U}} \sum_{l=0}^{j}\ \sum_{\Lambda_1, \Lambda_2} \bigl\|h_n^{(l+1)}(x)\bigr\|\,\bigl\|h^{(j-l)}(x)\bigr\| = \sup_{x \in \bar{U}} \sum_{l=0}^{j} \binom{j}{l} \bigl\|h_n^{(l+1)}(x)\bigr\|\,\bigl\|h^{(j-l)}(x)\bigr\|
\le \sup_{x \in \bar{U}} \sum_{l=0}^{j} \binom{j}{l} K_{n,l+1} \bigl\|h^{(j-l)}(x)\bigr\|\\
&\le \sum_{l=0}^{j} \frac{j!}{l!\,(j-l)!} \frac{(l+2n-1)!}{(n-1)!\,2^{n-1}}\, c_h^{n} M_h^{l+n}\, c_h (j-l)!\, M_h^{j-l}
= c_h^{n+1} M_h^{j+n} \frac{j!}{(n-1)!\,2^{n-1}} \sum_{l=0}^{j} \prod_{s=1}^{2n-1} (l+s)\\
&= c_h^{n+1} M_h^{j+n} \frac{j!}{(n-1)!\,2^{n-1}} \cdot \frac{1}{2n} \cdot \frac{(j+2n)!}{j!} = \frac{(j+2n)!}{n!\,2^{n}}\, c_h^{n+1} M_h^{j+n} = \mathrm{RHS}
\end{align*}

where the fourth equality follows from Proposition 9.81; the last inequality follows from the inductive assumption and the bound ‖h⁽ʲ⁻ˡ⁾(x)‖ ≤ c_h(j−l)!M_h^{j−l}; and the evaluation of \(\sum_{l=0}^{j} \prod_{s=1}^{2n-1}(l+s)\) follows from Proposition A.1. This completes the induction process.


Based on this inequality, we have the following bounds for the high order derivatives of Fi around y₀. ∀y ∈ B_Y(y₀, δ₂), ∀2 ≤ n ∈ N,

$$\bigl\|Fi^{(n)}(y)\bigr\| \le \sup_{x \in \bar{U}} \bigl\|Fi^{(n)} \circ F(x)\bigr\| = \sup_{x \in \bar{U}} \|h_n(x)\| \le K_{n,0} \le \frac{(2n-2)!}{(n-1)!\,2^{n-1}}\, c_h^{n} M_h^{n-1} \le \frac{1}{2M_h}\, n!\, (2 c_h M_h)^{n}$$

Hence, Fi is analytic at y₀. This completes the proof of the theorem.

⊓⊔
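A scalar illustration of formula (ii): with the invertible analytic function F(x) = eˣ − 1 (our own illustrative choice), the inverse is Fi(y) = log(1 + y), and Fi⁽¹⁾(y) = (F⁽¹⁾(Fi(y)))⁻¹ can be checked numerically:

```python
from math import exp, log

F = lambda x: exp(x) - 1.0
Fprime = lambda x: exp(x)
Fi = lambda y: log(1.0 + y)      # inverse of F on (-1, inf)

for y in [-0.5, 0.0, 0.3, 1.7]:
    t = 1e-6
    fd = (Fi(y + t) - Fi(y - t)) / (2 * t)     # central difference for Fi'
    assert abs(fd - 1.0 / Fprime(Fi(y))) < 1e-6
    assert abs(F(Fi(y)) - y) < 1e-12           # Fi really inverts F
print("inverse-derivative formula verified")
```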

Theorem 9.86 (Implicit Function Theorem) Let X, Y, and Z be Banach spaces over K, D ⊆ X × Y, F : D → Z, and (x₀, y₀) ∈ D°. Assume that F is analytic at (x₀, y₀), F(x₀, y₀) = ϑ_Z, and ∂F/∂y(x₀, y₀) ∈ B(Y, Z) is bijective. Then, the following statements hold:

(i) There exist r₀, r₁ ∈ (0, ∞) ⊂ R such that U × V := B_X(x₀, r₀) × B_Y(y₀, r₁) ⊆ D and, ∀x ∈ U, ∃! y ∈ V satisfying F(x, y) = ϑ_Z. This defines a function φ : U → V by φ(x) = y, ∀x ∈ U. Then, φ is C∞.
(ii) φ⁽¹⁾(x) = −(∂F/∂y(x, φ(x)))⁻¹ ∂F/∂x(x, φ(x)), ∀x ∈ U.
(iii) φ is analytic at x₀.

Proof The results (i) and (ii) follow immediately from the Implicit Function Theorem 9.59. All we need to show is (iii). Since F is analytic at (x₀, y₀), we may choose r₀ and r₁ to be sufficiently small such that ∃c ∈ [0, ∞) ⊂ R and ∃M ∈ (0, ∞) ⊂ R with ‖F⁽ⁿ⁾(x, y)‖ ≤ cn!Mⁿ, ∀n ∈ Z+, ∀(x, y) ∈ U × V. Define the mapping F̄ : D → X × Z by F̄(x, y) = (x, F(x, y)), ∀(x, y) ∈ U × V. By Proposition 9.83 and Theorem 9.82, F̄ is analytic at (x₀, y₀) ∈ D°. By Propositions 9.44, 9.13, and 9.24,

$$\bar{F}^{(1)}(x_0, y_0) = \begin{bmatrix} \mathrm{id}_X & \vartheta_{B(Y,X)} \\ \frac{\partial F}{\partial x}(x_0, y_0) & \frac{\partial F}{\partial y}(x_0, y_0) \end{bmatrix}$$

Clearly, F̄⁽¹⁾(x₀, y₀) is bijective. By Proposition 4.31, X × Y and X × Z are Banach spaces over K. By the analytic version of the Inverse Function Theorem 9.85, ∃Ū ⊆ D with (x₀, y₀) ∈ Ū ∈ O_{X×Y}, ∃δ ∈ (0, ∞) ⊂ R such that F̄|_Ū : Ū → B_{X×Z}((x₀, ϑ_Z), δ) is bijective, whose inverse function F̄i : B_{X×Z}((x₀, ϑ_Z), δ) → Ū is analytic at (x₀, ϑ_Z). By taking r₀ and r₁ sufficiently small, we may assume that U × V ⊆ Ū. Then, ∃δ₁ ∈ (0, min{δ, r₀}] ⊂ R, ∃W ⊆ U × V such that F̄|_W : W → B_{X×Z}((x₀, ϑ_Z), δ₁) is bijective. By the uniqueness of y ∈ V that solves F(x, y) = ϑ_Z, we have φ(x) = π_Y ∘ F̄i(x, ϑ_Z), ∀x ∈ B_X(x₀, δ₁). By Theorem 9.82 and Proposition 9.83, φ is analytic at x₀. This completes the proof of the theorem. ⊓⊔

Definition 9.87 Let X be a Banach space over K and A ∈ B(X, X). We will define exp(A) ∈ B(X, X) by exp(A) := Σ_{n=0}^{∞} (1/n!) Aⁿ, where A⁰ := id_X by notation. %


This function is well-defined since B(X, X) is a Banach space over K by Proposition 7.66, and the series is absolutely summable since ‖A‖_{B(X,X)} < ∞. Thus, the series converges in B(X, X) by Proposition 7.27. The following proposition lists a number of properties of the exponential function.

Proposition 9.88 Let X be a Banach space over K and A ∈ B(X, X). ∀t, τ ∈ K, we have:

(i) h : K → B(X, X) defined by h(t) = exp(At), ∀t ∈ K, is an analytic function. ∀t ∈ K, h admits arbitrarily large analytic radius at t.
(ii) exp(A(t + τ)) = exp(At) exp(Aτ) and exp(ϑ_{B(X,X)}) = id_X.
(iii) d/dt exp(At) = A exp(At) = exp(At)A.
(iv) If B ∈ B(X, X) and AB = BA, then exp(At)B = B exp(At), and exp(A) exp(B) = exp(B) exp(A) = exp(A + B).
(v) (exp(A))⁻¹ = exp(−A) and (exp(A))′ = exp(A′).
(vi) ‖exp(A)‖_{B(X,X)} ≤ exp(‖A‖_{B(X,X)}) < ∞.
(vii) The function exp : B(X, X) → B(X, X) is Fréchet differentiable with ‖D exp(A)‖_{B(B(X,X),B(X,X))} ≤ exp(‖A‖_{B(X,X)}) < ∞, ∀A ∈ B(X, X).

Proof (iii) exp(At) = Σ_{n=0}^{∞} (1/n!) Aⁿtⁿ. By Propositions 9.16 and 9.68, it can be differentiated term by term, and d/dt exp(At) = d/dt Σ_{n=0}^{∞} (1/n!) Aⁿtⁿ = Σ_{n=1}^{∞} (1/(n−1)!) Aⁿtⁿ⁻¹ = A exp(At) = exp(At)A.

(i) By repeated application of (iii), we have h⁽ⁿ⁾(t) = dⁿ/dtⁿ exp(At) = Aⁿ exp(At), ∀n ∈ N. Then, h is an analytic function by Definition 9.79. Furthermore, ∀t ∈ K, h admits arbitrarily large analytic radius at t.

(iv) exp(At)B = (Σ_{n=0}^{∞} (1/n!) Aⁿtⁿ)B = Σ_{n=0}^{∞} (1/n!) AⁿBtⁿ = Σ_{n=0}^{∞} (1/n!) BAⁿtⁿ = B exp(At), where the third equality follows from AB = BA. Furthermore, exp(A) exp(B) = exp(A)(Σ_{n=0}^{∞} (1/n!) Bⁿ) = Σ_{n=0}^{∞} (1/n!) exp(A)Bⁿ = Σ_{n=0}^{∞} (1/n!) Bⁿ exp(A) = exp(B) exp(A), where the third equality follows from exp(A)B = B exp(A). Moreover, exp(A + B) = Σ_{n=0}^{∞} (1/n!)(A + B)ⁿ = Σ_{n=0}^{∞} (1/n!)(Σ_{i=0}^{n} \binom{n}{i} AⁱBⁿ⁻ⁱ) = Σ_{n=0}^{∞} Σ_{i=0}^{n} (1/(i!(n−i)!)) AⁱBⁿ⁻ⁱ = Σ_{i=0}^{∞} Σ_{n=i}^{∞} (1/(i!(n−i)!)) AⁱBⁿ⁻ⁱ = Σ_{i=0}^{∞} (1/i!) Aⁱ Σ_{n=i}^{∞} (1/(n−i)!) Bⁿ⁻ⁱ = Σ_{i=0}^{∞} (1/i!) Aⁱ Σ_{n=0}^{∞} (1/n!) Bⁿ = exp(A) exp(B), where the second equality follows from AB = BA, and the third through sixth equalities follow from the absolute summability of the series involved.

(ii) This directly follows from (iv) and Definition 9.87.

(v) The inverse relationship follows from (ii). (exp(A))′ = (Σ_{n=0}^{∞} (1/n!) Aⁿ)′ = Σ_{n=0}^{∞} (1/n!)(Aⁿ)′ = Σ_{n=0}^{∞} (1/n!)(A′)ⁿ = exp(A′), where the second and third equalities follow from Proposition 7.110.

(vi) ‖exp(A)‖_{B(X,X)} = ‖Σ_{n=0}^{∞} (1/n!) Aⁿ‖_{B(X,X)} ≤ Σ_{n=0}^{∞} (1/n!) ‖A‖ⁿ_{B(X,X)} = exp(‖A‖_{B(X,X)}) < ∞, where the first equality follows from Definition 9.87, the first inequality follows from Definition 7.1 and Proposition 7.64, and the second equality follows from standard algebra.

(vii) Fix any A₀ ∈ B(X, X); we will show that the function exp : B(X, X) → B(X, X) is Fréchet differentiable at A₀. Note that exp(A₀) = lim_{n→∞} Σ_{i=0}^{n} (1/i!) A₀ⁱ =: lim_{n→∞} Fₙ(A₀). Then, DFₙ(A)(B) = B + Σ_{i=2}^{n} (1/i!)(Σ_{j=0}^{i−1} AʲBAⁱ⁻ʲ⁻¹), ∀A, B ∈ B(X, X). Define G : B(X, X) → B(B(X, X), B(X, X)) by G(A)(B) = B + lim_{n→∞} Σ_{i=2}^{n} (1/i!)(Σ_{j=0}^{i−1} AʲBAⁱ⁻ʲ⁻¹), ∀A, B ∈ B(X, X). ∀A ∈ B(X, X), G(A) is a linear function of B with norm ‖G(A)‖_{B(B(X,X),B(X,X))} ≤ 1 + lim_{n→∞} Σ_{i=2}^{n} (1/i!) i ‖A‖ⁱ⁻¹_{B(X,X)} = exp(‖A‖_{B(X,X)}) < ∞. Hence, G(A) ∈ B(B(X, X), B(X, X)) and G is well-defined. Clearly, DFₙ converges uniformly to G as n → ∞ over the open set B_{B(X,X)}(A₀, 1). Then, by Proposition 9.68, we have D exp(A₀) = G(A₀). Clearly,

$$\|D \exp(A_0)\|_{B(B(X,X),B(X,X))} = \|G(A_0)\|_{B(B(X,X),B(X,X))} \le \exp(\|A_0\|_{B(X,X)}) < \infty$$

This completes the proof of the proposition. ⊓⊔
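Parts (v) and (vi) are easy to observe numerically on a small matrix; a minimal sketch (the test matrix and the induced ∞-norm are our own choices; any induced operator norm works for (vi)):

```python
from math import factorial, exp

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(a, terms=30):
    n = len(a)
    p = [[float(i == j) for j in range(n)] for i in range(n)]
    s = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                s[i][j] += p[i][j] / factorial(k)
        p = matmul(p, a)
    return s

A = [[0.2, 0.5], [-0.3, 0.1]]
negA = [[-v for v in row] for row in A]
P = matmul(mat_exp(A), mat_exp(negA))
# (v): exp(A) exp(-A) = id
assert all(abs(P[i][j] - (i == j)) < 1e-9 for i in range(2) for j in range(2))
# (vi): norm bound with the induced infinity-norm (max absolute row sum)
norm = lambda m: max(sum(abs(v) for v in row) for row in m)
assert norm(mat_exp(A)) <= exp(norm(A)) + 1e-12
print("exp properties verified")
```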

9.10 Newton's Method

Let X and Y be Banach spaces and F : X → Y. To solve the equation F(x) = ϑ_Y, we may use Newton's method. Assume that F is twice Fréchet differentiable, and we set x_{n+1} = x_n − (F⁽¹⁾(x_n))⁻¹F(x_n), ∀n ∈ N. Then, when x₁ is sufficiently close to a solution, the sequence (x_n)_{n=1}^{∞} converges to a solution x_opt with F(x_opt) = ϑ_Y. This result is formalized in the following theorem.

Theorem 9.89 (Newton's Method) Let X and Y be Banach spaces over K, D₁ ⊆ X, F : D₁ → Y, and x₁ ∈ X. Assume that ∃β₁, η₁, h₁, K ∈ [0, ∞) ⊂ R such that:

(i) h₁ := β₁η₁K ≤ 1/2.
(ii) dom(F⁽²⁾) ⊇ B̄_X(x₁, ((2 − 2h₁)/(2 − 3h₁))η₁) =: D ⊆ B̄_X(x₁, 2η₁) and, ∀x ∈ D, ‖F⁽²⁾(x)‖ ≤ K.
(iii) F⁽¹⁾(x₁) is bijective, ‖(F⁽¹⁾(x₁))⁻¹‖ ≤ β₁, and ‖(F⁽¹⁾(x₁))⁻¹F(x₁)‖ ≤ η₁.

Then, ∃(x_n)_{n=1}^{∞} ⊆ D, defined by x_{n+1} = x_n − (F⁽¹⁾(x_n))⁻¹F(x_n), ∀n ∈ N, that converges to x_opt ∈ D with F(x_opt) = ϑ_Y.

Proof We will use mathematical induction to prove the following claim.

Claim 9.89.1 ∀n ∈ N, we have:

(a) F⁽¹⁾(x_n) is bijective and ‖(F⁽¹⁾(x_n))⁻¹‖ ≤ β_n := β_{n−1}/(1 − h_{n−1}) ∈ [0, ∞) ⊂ R.
(b) ‖(F⁽¹⁾(x_n))⁻¹F(x_n)‖ ≤ η_n := h_{n−1}η_{n−1}/(2(1 − h_{n−1})) ∈ [0, ∞) ⊂ R.


" " " #n # # h1 1 1 − η1 . (c) .xn+1 ∈ B Y x1 , 2−2h 2−3h1 2(1−h1 ) (d) .0 ≤ hn := βn ηn K ≤ hn−1 ≤ 12 . Proof of Claim 1◦ Clearly, (a)–(d) are satisfied for .n = 1. 2◦ Assume that (a)–(d) are satisfied for .n ≤ k ∈ N. 3◦ Consider = k + 61. By6mean value theorem 9.23, .∃t0 ∈ (0, 1) ⊂ 6R 6 (1)the case of .n (1) 6 ≤ 6F (2) (t0 xk+1 + (1 − t0 )xk )(xk+1 − xk )6 ≤ such that .6F (x ) − F (x ) k+1 k 6 (2) 6 6 6 6F (t0 xk+1 + (1 − t0 )xk )6xk+1 − xk  ≤ Kηk . Then, .6(F (1) (xk ))−1 6 6 (1) 6 6F (xk+1 ) − F (1) (xk )6 ≤ βk ηk K = hk < 1. By Proposition 9.55, .F (1) (xk+1 ) is bijective and .

6 (1) 6 6(F (xk+1 ))−1 − (F (1) (xk ))−1 6 6 (1) 6 6 6 6(F (xk ))−1 62 6F (1) (xk+1 ) − F (1)(xk )6 6 6 6 6 ≤ 1 − 6(F (1) (xk ))−1 6 6F (1) (xk+1 ) − F (1) (xk )6 ≤

βk2 Kηk hk βk = 1 − βk ηk K 1 − hk

6 6 6 6 6 This leads to6 .6(F (1) (xk+1 ))−1 6 ≤ 6(F (1) (xk ))−1 6 + 6(F (1) (xk+1 ))−1 (1) (x ))−1 6 ≤ β + h β /(1 − h ) = β /(1 − h ) = β . −(F k k k k k k k k+1 . Hence, (a) holds. Note that .F (xk+1 ) = F (xk+1 ) − F (xk ) − F (1) (xk )(xk+1 − xk ). By Taylor’s Theorem 9.48, .∃t61 ∈ (0, 1) ⊂ R such that 6.F (xk+1 ) ≤ 6 1 6 (2) 2 6 F (t ≤ Kηk2 /2. Then, .6(F (1) (xk+1 ))−1 1 xk+1 + (1 − t1 )xk ) xk+1 − xk  2 6 hk ηk 2 . F (xk+1 )6 ≤ βk Kη /(2 − 2hk ) = k 2(1−hk )6 = ηk+1 . Hence, (b) holds. 6 Note also that .xk+2 − xk+1  = 6(F (1) (xk+1 ))−1 F (xk+1 )6 ≤ ηk+1 = " #k hk h1 h1 η ≤ η ≤ k k 2(1−hk ) 2(1−h1 ) 2(1−h1 ) η1 . Then, .xk+2 − x1  ≤ xk+2 − xk+1  + " #k #k ! " h1 2−2h1 h1 2−2h1 xk+1 − x1  ≤ 1 − η1 η + = 1 2(1−h1 ) 2−3h1 2(1−h1 ) 2−3h1 #k+1 ! " h1 1 − 2(1−h η1 . Hence, (c) holds. 1) Note that .hk+1 = βk+1 ηk+1 K = hk 2(1−hk )2

h2k . 2(1−hk )2

Since .0 ≤ hk ≤ 1/2, then .0 ≤

≤ 1. This implies that .0 ≤ hk+1 ≤ hk ≤ 1/2. Hence, (d) holds. This completes the induction process and the proof of the claim.

⊓⊔

Then, the sequence (x_n)_{n=1}^{∞} ⊆ D is well-defined. ∀n ∈ N, ‖x_{n+2} − x_{n+1}‖ = ‖(F⁽¹⁾(x_{n+1}))⁻¹F(x_{n+1})‖ ≤ η_{n+1} ≤ (h₁/(2(1 − h₁)))ⁿη₁. Thus, (x_n)_{n=1}^{∞} is a Cauchy sequence since 0 ≤ h₁/(2(1 − h₁)) ≤ 1/2 < 1. It must converge to x_opt ∈ D since D is closed and X is complete. Note that F(x_n) = F⁽¹⁾(x_n)(x_n − x_{n+1}), ∀n ∈ N. By Proposition 9.7, F and F⁽¹⁾ are continuous at x_opt. This leads to F(x_opt) = lim_{n∈N} F(x_n) = lim_{n∈N} F⁽¹⁾(x_n)(x_n − x_{n+1}) = F⁽¹⁾(x_opt)(x_opt − x_opt) = ϑ_Y, where the first equality follows from Proposition 3.66 and the third equality follows from Propositions 3.66, 3.67, 7.23, and 7.65. This completes the proof of the theorem. ⊓⊔

It is numerically expensive to use Newton's method due to the necessity of inverting F⁽¹⁾(x_n) at each step of the iteration. But the benefit of this method is the quadratic convergence of (x_n)_{n=1}^{∞} to x_opt, which is summarized in the following proposition.

Proposition 9.90 Let X and Y be Banach spaces over K, D₁ ⊆ X, F : D₁ → Y, and x₁ ∈ X. Assume that ∃β₁, η₁, h₁, K, M₁, M₂, M₄ ∈ [0, ∞) ⊂ R such that:

(i) h₁ := β₁η₁K ≤ 1/2.
(ii) dom(F⁽³⁾) ⊇ D₂ ⊇ B̄_X(x₁, ((2 − 2h₁)/(2 − 3h₁))η₁) =: D, where D₂ ⊆ D₁ is an open set in X; ∀x ∈ D₂, F⁽¹⁾(x) is bijective; and, ∀x ∈ D, ‖F⁽²⁾(x)‖ ≤ K, ‖(F⁽¹⁾(x))⁻¹‖ ≤ M₁, ‖F⁽³⁾(x)‖ ≤ M₂, and ‖F(x)‖ ≤ M₄.
(iii) ‖(F⁽¹⁾(x₁))⁻¹‖ ≤ β₁ and ‖(F⁽¹⁾(x₁))⁻¹F(x₁)‖ ≤ η₁.

Then, ∃(x_n)_{n=1}^{∞} ⊆ D, defined by x_{n+1} = x_n − (F⁽¹⁾(x_n))⁻¹F(x_n), ∀n ∈ N, that converges to x_opt ∈ D with F(x_opt) = ϑ_Y. Furthermore, ∀n ∈ N, ‖x_{n+1} − x_opt‖ ≤ c‖x_n − x_opt‖², where c := (M₁K + 2M₁³K²M₄ + M₁²M₂M₄)/2.

Proof By Newton's method (Theorem 9.89), (x_n)_{n=1}^{∞} ⊆ D is well-defined and converges to x_opt ∈ D with F(x_opt) = ϑ_Y. Define T : D₂ → X by T(x) = x − (F⁽¹⁾(x))⁻¹F(x), ∀x ∈ D₂. Then, by Propositions 9.34, 9.44, 9.45, 9.41, 9.40, and 9.55, T is twice Fréchet differentiable. This leads to, ∀x ∈ D₂, ∀h₁ ∈ X,

$$T^{(1)}(x)(h_1) = h_1 - (F^{(1)}(x))^{-1} F^{(1)}(x)(h_1) + (F^{(1)}(x))^{-1} F^{(2)}(x)(h_1)\bigl((F^{(1)}(x))^{-1} F(x)\bigr) = (F^{(1)}(x))^{-1} F^{(2)}(x)(h_1)\bigl((F^{(1)}(x))^{-1} F(x)\bigr)$$

Then, T⁽¹⁾(x_opt)(h₁) = ϑ_X, ∀h₁ ∈ X, since F(x_opt) = ϑ_Y. This implies that T⁽¹⁾(x_opt) = ϑ_{B(X,X)}. The second order derivative of T is given by, ∀x ∈ D₂, ∀h₁, h₂ ∈ X,

\begin{align*}
T^{(2)}(x)(h_2)(h_1) &= -(F^{(1)}(x))^{-1} F^{(2)}(x)(h_2)\bigl((F^{(1)}(x))^{-1} F^{(2)}(x)(h_1)((F^{(1)}(x))^{-1} F(x))\bigr)\\
&\quad + (F^{(1)}(x))^{-1} F^{(3)}(x)(h_2)(h_1)\bigl((F^{(1)}(x))^{-1} F(x)\bigr)\\
&\quad + (F^{(1)}(x))^{-1} F^{(2)}(x)(h_1)\Bigl( -(F^{(1)}(x))^{-1} F^{(2)}(x)(h_2)\bigl((F^{(1)}(x))^{-1} F(x)\bigr) + (F^{(1)}(x))^{-1} F^{(1)}(x)(h_2) \Bigr)\\
&= -(F^{(1)}(x))^{-1} F^{(2)}(x)(h_2)\bigl((F^{(1)}(x))^{-1} F^{(2)}(x)(h_1)((F^{(1)}(x))^{-1} F(x))\bigr)\\
&\quad + (F^{(1)}(x))^{-1} F^{(3)}(x)(h_2)(h_1)\bigl((F^{(1)}(x))^{-1} F(x)\bigr)\\
&\quad - (F^{(1)}(x))^{-1} F^{(2)}(x)(h_1)\bigl((F^{(1)}(x))^{-1} F^{(2)}(x)(h_2)((F^{(1)}(x))^{-1} F(x))\bigr)\\
&\quad + (F^{(1)}(x))^{-1} F^{(2)}(x)(h_1)(h_2)
\end{align*}

Then, ∀x ∈ D, ‖T⁽²⁾(x)‖ = sup_{h₁,h₂∈X, ‖h₁‖≤1, ‖h₂‖≤1} ‖T⁽²⁾(x)(h₂)(h₁)‖ ≤ 2M₁³K²M₄ + M₁²M₂M₄ + M₁K = 2c. ∀n ∈ N, by Taylor's Theorem 9.48, ∃t₀ ∈ (0, 1) ⊂ R such that ‖x_{n+1} − x_opt‖ = ‖T(x_n) − T(x_opt)‖ = ‖T(x_n) − T(x_opt) − T⁽¹⁾(x_opt)(x_n − x_opt)‖ ≤ (1/2)‖T⁽²⁾(t₀x_n + (1 − t₀)x_opt)‖‖x_n − x_opt‖² ≤ c‖x_n − x_opt‖². This completes the proof of the proposition. ⊓⊔
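The quadratic error law of Proposition 9.90 is easy to observe numerically; a minimal scalar sketch (the equation x³ = 2 and the starting point are our own illustrative choices):

```python
# Newton iteration x_{n+1} = x_n - F(x_n)/F'(x_n) for F(x) = x**3 - 2,
# recording |x_n - x*| and checking |x_{n+1} - x*| <= c |x_n - x*|**2.
F = lambda x: x ** 3 - 2.0
Fp = lambda x: 3.0 * x ** 2
xstar = 2.0 ** (1.0 / 3.0)

x = 1.5
errs = []
for _ in range(6):
    errs.append(abs(x - xstar))
    x = x - F(x) / Fp(x)
errs.append(abs(x - xstar))

assert errs[-1] < 1e-12                                   # convergence
ratios = [errs[n + 1] / errs[n] ** 2 for n in range(3) if errs[n] > 0]
assert max(ratios) < 2.0                                  # bounded quadratic constant
print(errs)
```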

Chapter 10

Local Theory of Optimization

In this chapter, we will develop a number of tools for optimization of sufficiently many times differentiable functions. As in Chap. 8, we will be mainly concerned with real spaces, rather than complex ones.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. Z. Pan, Measure-Theoretic Calculus in Abstract Spaces, https://doi.org/10.1007/978-3-031-21912-2_10

10.1 Basic Notion

Definition 10.1 Let X := (X, O) be a topological space, f : X → R, and x₀ ∈ X. x₀ is said to be a point of minimum for f if f(x₀) ≤ f(x), ∀x ∈ X. It is said to be the point of strict minimum for f if f(x₀) < f(x), ∀x ∈ X \ {x₀}. It is said to be a point of relative minimum for f if ∃O ∈ O with x₀ ∈ O such that f(x₀) ≤ f(x), ∀x ∈ O. It is said to be a point of relative strict minimum for f if ∃O ∈ O with x₀ ∈ O such that f(x₀) < f(x), ∀x ∈ O \ {x₀}. Similar definitions apply for points of maxima. Moreover, x₀ is said to be a point of relative extremum if it is a point of relative minimum or relative maximum. It is said to be a point of relative strict extremum if it is a point of relative strict minimum or relative strict maximum. %

Proposition 10.2 Let X be a real normed linear space, D ⊆ X, x₀ ∈ D, f : D → R, and u ∈ A_D(x₀). Assume that the directional derivative of f at x₀ along u exists and x₀ is a point of relative minimum for f. Then, Df(x₀; u) ≥ 0.

Proof This is immediate from Definition 9.4. ⊓⊔

Definition 10.3 Let X be a real normed linear space and A ∈ B_{S²}(X, R). A is said to be positive definite if ∃m ∈ (0, ∞) ⊂ R such that A(x)(x) = ⟨Ax, x⟩ ≥ m‖x‖², ∀x ∈ X. Let the set of all such positive definite operators be denoted by S₊X. A is said to be positive semi-definite if A(x)(x) ≥ 0, ∀x ∈ X. Let the set of all such positive semi-definite operators be denoted by S_psd X. A is said to be negative definite if ∃m ∈ (0, ∞) ⊂ R such that A(x)(x) ≤ −m‖x‖², ∀x ∈ X. Let the set of all such negative definite operators be denoted by S₋X. A is said to be

negative semi-definite if A(x)(x) ≤ 0, ∀x ∈ X. Let the set of all such negative semi-definite operators be denoted by S_nsd X. We will denote B_{S²}(X, R) by S_X, which is clearly a Banach subspace of B₂(X, R). %

Proposition 10.4 Let X be a real normed linear space. Then,

(i) S₋X = −S₊X and S_nsd X = −S_psd X.
(ii) S₊X and S₋X are open sets in B_{S²}(X, R) = S_X.
(iii) S_psd X and S_nsd X are closed convex cones in S_X.
(iv) S₊X ⊆ (S_psd X)° and S₋X ⊆ (S_nsd X)°.

Proof (i) This is clear.

(ii) Fix A ∈ S₊X. Then, ∃m ∈ (0, ∞) ⊂ R such that A(x)(x) ≥ m‖x‖², ∀x ∈ X. ∀B ∈ B_{S²}(X, R) with ‖B − A‖ < m/2, ∀x ∈ X, we have B(x)(x) = A(x)(x) + (B − A)(x)(x) ≥ m‖x‖² − ‖B − A‖‖x‖² ≥ m‖x‖²/2, where the first inequality follows from Proposition 7.64. This implies that B ∈ S₊X. Then, A is an interior point of S₊X. Hence, S₊X is open in S_X. Therefore, S₋X = −S₊X is also open in S_X.

(iii) Clearly, ϑ_{B_{S²}(X,R)} ∈ S_psd X. ∀A ∈ S_psd X, ∀α ∈ [0, ∞) ⊂ R, αA ∈ S_psd X. Hence, S_psd X is a cone. ∀A, B ∈ S_psd X, ∀x ∈ X, (A + B)(x)(x) = A(x)(x) + B(x)(x) ≥ 0. Hence, A + B ∈ S_psd X. Then, S_psd X is a convex cone. Let M := S_X \ S_psd X. ∀A ∈ M, ∃x₀ ∈ X such that A(x₀)(x₀) < 0. Then, there exists m ∈ (0, ∞) ⊂ R such that A(x₀)(x₀) < −m‖x₀‖² < 0. ∀B ∈ B_{S²}(X, R) with ‖B − A‖ < m/2, we have B(x₀)(x₀) = A(x₀)(x₀) + (B − A)(x₀)(x₀) < −m‖x₀‖² + ‖B − A‖‖x₀‖² ≤ −m‖x₀‖²/2 < 0, where the first inequality follows from Proposition 7.64. This implies that B ∈ M and A ∈ M°. Then, M = M° is open. Therefore, S_psd X is closed. S_nsd X = −S_psd X is clearly also a closed convex cone.

(iv) Clearly, S₊X ⊆ S_psd X. Then, since S₊X is open, S₊X ⊆ (S_psd X)°. This further implies that S₋X ⊆ (S_nsd X)°. This completes the proof of the proposition. ⊓⊔

Definition 10.5 Let (X, K) be a vector space and K ⊆ X be convex. M ⊆ K is said to be an extreme subset of K if M ≠ ∅ and ∀x₁, x₂ ∈ K, ∀α ∈ (0, 1) ⊂ R, αx₁ + (1 − α)x₂ ∈ M implies that x₁, x₂ ∈ M. If a singleton set {x₀} ⊆ K is an extreme subset, then x₀ is called an extreme point of K. %

Proposition 10.6 Let X be a real normed linear space, K ⊆ X be a nonempty convex set, and H := {x ∈ X | ⟨x∗₀, x⟩ = c} be a supporting hyperplane of K, where x∗₀ ∈ X∗ with x∗₀ ≠ ϑ_{X∗} and c ∈ R. Then, any extreme subset of K₁ := K ∩ H is also an extreme subset of K.

Proof Without loss of generality, assume that inf_{x∈K} ⟨x∗₀, x⟩ = c. Let M ⊆ K₁ be an extreme subset of K₁. Then, M ≠ ∅. ∀x₁, x₂ ∈ K, ∀α ∈ (0, 1) ⊂ R, let x̄ := αx₁ + (1 − α)x₂ ∈ M ⊆ H. Then, ⟨x∗₀, x₁⟩ ≥ c, ⟨x∗₀, x₂⟩ ≥ c, and ⟨x∗₀, x̄⟩ = c. This implies that α(⟨x∗₀, x₁⟩ − c) + (1 − α)(⟨x∗₀, x₂⟩ − c) = 0 and ⟨x∗₀, x₁⟩ = c = ⟨x∗₀, x₂⟩. Hence, x₁, x₂ ∈ K₁. Since M is an extreme subset of K₁, then x₁, x₂ ∈ M. Therefore, M is an extreme subset of K. This completes the proof of the proposition. ⊓⊔

Proposition 10.7 (Krein–Milman) Let X be a real reflexive Banach space, K ⊆ X be a nonempty bounded closed convex set, and M ⊆ K be a weakly compact extreme subset of K. Then, M contains at least one extreme point of K.

Proof By Proposition 8.11, K is compact in X_weak. Let M := {E ⊆ M | E is a weakly compact extreme subset of K}. Clearly, M ∈ M, so M ≠ ∅. Clearly, ⊇ defines an antisymmetric partial ordering on M, where smaller sets are further down the stream. Next, we will use Zorn's Lemma to show that M admits a maximal element. Let E ⊆ M be a nonempty totally ordered (by ⊇) subcollection. Let E₀ := ⋂_{E∈E} E. ∀E ∈ E, E ⊆ M is a weakly compact extreme subset of K. Then, by Propositions 7.116, 5.5, and 3.61, E is weakly closed. By Proposition 5.5, E₀ is weakly compact. ∀x₁, x₂ ∈ K, ∀α ∈ (0, 1) ⊂ R, let αx₁ + (1 − α)x₂ ∈ E₀. ∀E ∈ E, αx₁ + (1 − α)x₂ ∈ E. Since E is an extreme subset of K, then x₁, x₂ ∈ E. By the arbitrariness of E, we have x₁, x₂ ∈ E₀. Hence, E₀ is an extreme subset of K if E₀ ≠ ∅. ∀E ∈ E, E is nonempty. Since E is totally ordered by ⊇, the intersection of finitely many sets in E is again in E, and hence nonempty. By Proposition 5.12, E₀ ≠ ∅. Then, E₀ is a weakly compact extreme subset of K and E₀ ∈ M. Clearly, E₀ is an upper bound of E (in terms of ⊇). Then, by Zorn's Lemma, M admits a maximal element E_M. Then, E_M ⊆ M is a weakly compact extreme subset of K and E_M ≠ ∅.

We will show that E_M is a singleton set, which then proves that M contains an extreme point of K. Suppose that ∃x₁, x₂ ∈ E_M with x₁ ≠ x₂. Let N := span({x₂ − x₁}) and define a functional f : N → R by f(α(x₂ − x₁)) = α, ∀α ∈ R. Clearly, f is a linear functional on N, and ‖f‖_N = 1/‖x₂ − x₁‖ < ∞. By the Hahn–Banach Theorem 7.83, there exists x∗₀ ∈ X∗ with ‖x∗₀‖ = 1/‖x₂ − x₁‖ such that ⟨x∗₀, α(x₂ − x₁)⟩ = α, ∀α ∈ R. Clearly, ⟨x∗₀, x₁⟩ ≠ ⟨x∗₀, x₂⟩. Note that E_M is nonempty and compact in X_weak and x∗₀ is weakly continuous. By Proposition 5.29, c := ⟨x∗₀, x₀⟩ = inf_{x∈E_M} ⟨x∗₀, x⟩ ∈ R for some x₀ ∈ E_M. Define H := {x ∈ X | ⟨x∗₀, x⟩ = c}. Let E_m := E_M ∩ H. Then, at least one of x₁ and x₂ is not in E_m. Hence, E_M ⊃ E_m. Clearly, E_m ∋ x₀, so E_m is nonempty. Note that H is weakly closed. Then, by Proposition 5.5, E_m ⊆ M is weakly compact. ∀x̄₁, x̄₂ ∈ K, ∀ᾱ ∈ (0, 1) ⊂ R, let ᾱx̄₁ + (1 − ᾱ)x̄₂ ∈ E_m ⊆ E_M. Since E_M is an extreme subset of K, we have x̄₁, x̄₂ ∈ E_M. This further implies that ⟨x∗₀, x̄₁⟩ ≥ c and ⟨x∗₀, x̄₂⟩ ≥ c. Note that ⟨x∗₀, ᾱx̄₁ + (1 − ᾱ)x̄₂⟩ = c. Then, we must have ⟨x∗₀, x̄₁⟩ = c = ⟨x∗₀, x̄₂⟩ and x̄₁, x̄₂ ∈ E_m. This shows that E_m is an extreme subset of K. Then, E_m ∈ M. This contradicts the fact that E_M is maximal with respect to ⊇. Therefore, E_M is a singleton set. This completes the proof of the proposition. ⊓⊔

Proposition 10.8 Let X be a real reflexive Banach space, K ⊆ X be a nonempty bounded closed convex set, and E be the set of extreme points of K. Then, K = co(E).

Proof Let C := co(E). By Proposition 7.15, C is closed and convex and C ⊆ K. We will prove the result by an argument of contradiction. Suppose K ⊃ C.

Then, ∃x₀ ∈ K \ C. By Proposition 8.10, there exists x∗₀ ∈ X∗ such that ⟨x∗₀, x₀⟩ < inf_{x∈C} ⟨x∗₀, x⟩. By Proposition 8.11, K is compact in the weak topology, and x∗₀ is continuous in the weak topology. Then, by Proposition 5.29, c₀ := ⟨x∗₀, x₁⟩ = inf_{x∈K} ⟨x∗₀, x⟩ ∈ R for some x₁ ∈ K. Clearly, c₀ ≤ ⟨x∗₀, x₀⟩ < inf_{x∈C} ⟨x∗₀, x⟩. We will distinguish two exhaustive and mutually exclusive cases: Case 1: x∗₀ = ϑ_{X∗}; Case 2: x∗₀ ≠ ϑ_{X∗}.

Case 1: x∗₀ = ϑ_{X∗}. Then, c₀ = 0 and C = ∅. By Proposition 10.7, E ≠ ∅. Then, C ≠ ∅. This is a contradiction.

Case 2: x∗₀ ≠ ϑ_{X∗}. Let H := {x ∈ X | ⟨x∗₀, x⟩ = c₀}. H is a supporting hyperplane of K and H ∩ C = ∅. Let C_m := K ∩ H ≠ ∅. Clearly, C_m is bounded, closed, and convex. By Proposition 10.7, there is an extreme point x_m ∈ C_m of C_m. Then, {x_m} ⊆ C_m is an extreme subset of C_m. By Proposition 10.6, {x_m} is an extreme subset of K. Then, x_m is an extreme point of K and x_m ∈ E. This leads to the contradiction: x_m ∈ C_m ⊆ H, x_m ∈ E ⊆ C, and C ∩ H = ∅.

Thus, in both cases, we have arrived at a contradiction. Then, the hypothesis must be false. Hence, K = co(E). This completes the proof of the proposition. ⊓⊔

Proposition 10.9 Let X be a real vector space, Ω ⊆ X be a convex set, and f₁ : Ω → R and f₂ : Ω → R be convex functionals. Then, the following statements hold.

(i) ∀n ∈ N, ∀x₁, . . . , xₙ ∈ Ω, ∀α₁, . . . , αₙ ∈ [0, 1] ⊂ R with Σ_{i=1}^{n} αᵢ = 1, we have f₁(Σ_{i=1}^{n} αᵢxᵢ) ≤ Σ_{i=1}^{n} αᵢf₁(xᵢ). If, in addition, f₁ is strictly convex, x₁, . . . , xₙ are distinct, and α₁, . . . , αₙ ∈ (0, 1) ⊂ R, then f₁(Σ_{i=1}^{n} αᵢxᵢ) < Σ_{i=1}^{n} αᵢf₁(xᵢ).
(ii) ∀α₁, α₂ ∈ [0, ∞) ⊂ R, α₁f₁ + α₂f₂ is convex. If, in addition, f₁ is strictly convex and α₁ ∈ (0, ∞) ⊂ R, then α₁f₁ + α₂f₂ is strictly convex.
(iii) ∀c ∈ R, {x ∈ Ω | f₁(x) ≤ c} is convex.
(iv) Let Y be a real vector space, A : Y → X be an affine operator, D ⊆ Y be convex, and A(D) ⊆ Ω. Then, f₁ ∘ A|_D : D → R is convex.

Proof This is straightforward and is therefore omitted. ⊓⊔
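Statement (i) above is the finite Jensen inequality; a quick numerical sketch with the strictly convex f(x) = x² on R (the points and weights are arbitrary test data):

```python
# Convex-combination inequality f(sum a_i x_i) <= sum a_i f(x_i),
# strict for a strictly convex f, distinct points, and weights in (0,1).
f = lambda x: x * x
xs = [-1.0, 0.5, 2.0, 3.5]
alphas = [0.1, 0.2, 0.3, 0.4]
assert abs(sum(alphas) - 1.0) < 1e-12

lhs = f(sum(a * x for a, x in zip(alphas, xs)))
rhs = sum(a * f(x) for a, x in zip(alphas, xs))
assert lhs <= rhs + 1e-12
assert lhs < rhs          # strict inequality in this configuration
print(lhs, rhs)
```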

Proposition 10.10 Let f : [a, b] → R, where a, b ∈ R with a < b. Then, the following statements hold.

(i) f is convex if, and only if, ∀x₁, x₂, x₃ ∈ [a, b] ⊂ R with x₁ < x₂ < x₃, we have

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} \le \frac{f(x_3) - f(x_2)}{x_3 - x_2}$$

if, and only if, ∀x₁, x₂, x₃ ∈ [a, b] ⊂ R with x₁ < x₂ < x₃, we have

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} \le \frac{f(x_3) - f(x_1)}{x_3 - x_1}$$

(ii) f is strictly convex if, and only if, ∀x₁, x₂, x₃ ∈ [a, b] ⊂ R with x₁ < x₂ < x₃, we have

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} < \frac{f(x_3) - f(x_2)}{x_3 - x_2}$$

if, and only if, ∀x₁, x₂, x₃ ∈ [a, b] ⊂ R with x₁ < x₂ < x₃, we have

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} < \frac{f(x_3) - f(x_1)}{x_3 - x_1}$$

(iii) If f is convex and c ∈ (a, b) ⊂ R, then the one-sided derivatives

$$\lim_{x \to c+} \frac{f(x) - f(c)}{x - c}; \qquad \lim_{x \to c-} \frac{f(x) - f(c)}{x - c}$$

exist.

Proof This is straightforward and is therefore omitted. ⊓⊔
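The slope characterization in (i) can be spot-checked on a concrete convex function; a minimal sketch with f = exp on a small grid (both are our own illustrative choices):

```python
from math import exp

# For convex f, the difference quotients are monotone in the endpoints:
# slope(x1, x2) <= slope(x2, x3) and slope(x1, x2) <= slope(x1, x3) for x1 < x2 < x3.
f = exp
pts = [0.0, 0.3, 0.8, 1.1, 1.7, 2.0]

def slope(a, b):
    return (f(b) - f(a)) / (b - a)

for i in range(len(pts)):
    for j in range(i + 1, len(pts)):
        for k in range(j + 1, len(pts)):
            x1, x2, x3 = pts[i], pts[j], pts[k]
            assert slope(x1, x2) <= slope(x2, x3)
            assert slope(x1, x2) <= slope(x1, x3)
print("slope monotonicity verified")
```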

Proposition 10.11 Let X be a real normed linear space, Ω ⊆ X be convex, and f : Ω → R be differentiable. Then, the following statements hold. (i) f is convex if, and only if, ∀x, y ∈ Ω, we have f (y) ≥ f (x) + f (1)(x)(y − x). (ii) f is strictly convex if, and only if, ∀x, y ∈ Ω with x = y, we have f (y) > f (x) + f (1) (x)(y − x). Proof (i) “Necessity” Let f be convex. ∀x, y ∈ Ω, by the convexity of Ω, we have y−x ∈ AΩ (x). ∀α ∈ (0, 1] ⊂ R, we have f (αy+(1−α)x) ≤ αf (y)+(1−α)f (x), which further implies that f (y) − f (x) ≥ (f (x + α (y − x)) − f (x))/α. Then, we have f (y) − f (x) ≥ lim (f (x + α (y − x)) − f (x))/α = Df (x; y − x)

.

α→0+

= f (1) (x)(y − x) where the first equality follows from Definition 9.4 and the last equality follows from Proposition 9.5. “Sufficiency” Let f (y) ≥ f (x) + f (1)(x)(y − x), ∀x, y ∈ Ω. ∀x1 , x2 ∈ Ω, ∀α ∈ [0, 1] ⊂ R, let x := αx1 + (1 − α)x2 ∈ Ω. Then, f (x2 ) = f (x + α (x2 − x1 )) ≥ f (x) + αf (1)(x)(x2 − x1 ). Note also, f (x1 ) = f (x + (1 − α) (x1 − x2 )) ≥ f (x) + (1 − α)f (1) (x)(x1 − x2 ). Then, αf (x1 ) + (1 − α)f (x2 ) ≥ f (x). Hence, f is convex. (ii) “Necessity” Let f be strictly convex. ∀x, y ∈ Ω with x = y, by the convexity of Ω, we have y − x ∈ AΩ (x). ∀α ∈ (0, 1) ⊂ R, we have f (αy + (1 − α)x) < αf (y) + (1 − α)f (x), which further implies that f (y) − f (x) > (f (x + α (y −


10 Local Theory of Optimization

x)) − f(x))/α. Define A : R → X by A(β) = x + β(y − x), ∀β ∈ R. Clearly, A is an affine operator and A(I) ⊆ Ω, where I := [0, 1] ⊂ R. By Proposition 10.9, g := f ∘ A|I is convex. Then, we have

    f(y) − f(x) > 2(f(x + 0.5(y − x)) − f(x)) = (g(0.5) − g(0))/0.5
        ≥ lim_{α→0+} (g(α) − g(0))/α = lim_{α→0+} (f(x + α(y − x)) − f(x))/α
        = Df(x; y − x) = f(1)(x)(y − x)

where the second inequality follows from Proposition 10.10, the third equality follows from Definition 9.4, and the last equality follows from Proposition 9.5.
“Sufficiency” Let f(y) > f(x) + f(1)(x)(y − x), ∀x, y ∈ Ω with x ≠ y. ∀x1, x2 ∈ Ω with x1 ≠ x2, ∀α ∈ (0, 1) ⊂ R, let x := αx1 + (1 − α)x2 ∈ Ω. Then, f(x2) = f(x + α(x2 − x1)) > f(x) + αf(1)(x)(x2 − x1). Note also, f(x1) = f(x + (1 − α)(x1 − x2)) > f(x) + (1 − α)f(1)(x)(x1 − x2). Then, αf(x1) + (1 − α)f(x2) > f(x). Hence, f is strictly convex.
This completes the proof of the proposition. □

Proposition 10.12 Let X be a real normed linear space, Ω ⊆ X be convex, and f : Ω → R be twice differentiable. Then, the following statements hold.

(i) If f(2)(x) is positive semi-definite, ∀x ∈ Ω, then f is convex.
(ii) If f is convex and C2, then f(2)(x) is positive semi-definite, ∀x ∈ cl(Ω◦) ∩ Ω.
(iii) If f is convex and f(2)(x) is positive definite for all x ∈ Ω \ E, where E ⊆ Ω does not contain any line segment, then f is strictly convex.

Proof (i) ∀x, y ∈ Ω, by Taylor’s Theorem, ∃t0 ∈ (0, 1) ⊂ R such that

    f(y) = f(x) + f(1)(x)(y − x) + (1/2) f(2)(t0 y + (1 − t0)x)(y − x)(y − x)

By the assumption, f(2)(t0 y + (1 − t0)x)(y − x)(y − x) ≥ 0. Then, f(y) ≥ f(x) + f(1)(x)(y − x). By the arbitrariness of x, y and Proposition 10.11, f is convex.
(ii) We will prove this statement by an argument of contradiction. Suppose ∃x0 ∈ cl(Ω◦) ∩ Ω such that f(2)(x0) is not positive semi-definite. Then, ∃h ∈ X such that f(2)(x0)(h)(h) < 0. We will distinguish two exhaustive and mutually exclusive cases: Case 1: x0 ∈ Ω◦; Case 2: x0 ∉ Ω◦.
Case 1: x0 ∈ Ω◦. By the continuity of f(2) and x0 ∈ Ω◦, ∃δ ∈ (0, ∞) ⊂ R such that x1 := x0 + δh ∈ Ω and f(2)(x0 + αδh)(h)(h) < 0, ∀α ∈ (0, 1) ⊂ R. By Taylor’s Theorem, ∃t0 ∈ (0, 1) ⊂ R such that

    f(x1) = f(x0) + f(1)(x0)(x1 − x0) + (δ²/2) f(2)(x0 + t0δh)(h)(h)

Then, f(x1) < f(x0) + f(1)(x0)(x1 − x0). This contradicts with the fact that f is convex and Proposition 10.11.
Case 2: x0 ∉ Ω◦. Then, x0 ∈ (cl(Ω◦) ∩ Ω) \ Ω◦. Then, by Proposition 4.13, there exists (xn)∞n=1 ⊆ Ω◦ such that limn∈N xn = x0. By the continuity of f(2), we have ∃n0 ∈ N such that f(2)(xn0)(h)(h) < 0. Hence, f(2)(xn0) is not positive semi-definite and xn0 ∈ Ω◦. By Case 1, there is a contradiction. Hence, the statement must be true.
(iii) We will prove this statement by an argument of contradiction. Suppose f is not strictly convex. By Proposition 10.11, ∃x, y ∈ Ω with x ≠ y such that f(y) = f(x) + f(1)(x)(y − x). By the convexity of f and Proposition 10.11, ∀α ∈ [0, 1] ⊂ R, we have f(x + α(y − x)) ≥ f(x) + αf(1)(x)(y − x) = f(x) + α(f(y) − f(x)). By the convexity of f, we have f(x + α(y − x)) = f(x) + α(f(y) − f(x)). Define g : [0, 1] → R by g(α) = f(αy + (1 − α)x) − αf(y) − (1 − α)f(x), ∀α ∈ I := [0, 1] ⊂ R. Then, g(α) = 0, ∀α ∈ I. This implies that g(2)(α) = 0, ∀α ∈ I, and f(2)(αy + (1 − α)x)(y − x)(y − x) = 0, ∀α ∈ I. Hence, f(2)(αy + (1 − α)x) is not positive definite, ∀α ∈ I. Hence, E contains the line segment connecting x and y. This contradicts with the assumption. Therefore, the statement must be true. This completes the proof of the proposition. □
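In the one-dimensional case X = R, the characterizations of Propositions 10.11 and 10.12 can be verified directly. The sketch below uses f(x) = exp(x), whose first and second derivatives are hand-coded; the sample points are arbitrary illustrative choices, not from the text.

```python
import math

# Finite-dimensional illustration of Propositions 10.11 and 10.12 with
# X = R, Omega = R, and f(x) = exp(x), so f'(x) = f''(x) = exp(x) > 0.
def f(x):
    return math.exp(x)

def df(x):       # f^(1)
    return math.exp(x)

def d2f(x):      # f^(2)
    return math.exp(x)

pts = [-2.0, -0.5, 0.0, 1.0, 3.0]
# Proposition 10.12 (i): f''(x) >= 0 everywhere, hence f is convex.
assert all(d2f(x) >= 0 for x in pts)
# Proposition 10.11 (i): convexity <=> f(y) >= f(x) + f'(x)(y - x).
for x in pts:
    for y in pts:
        assert f(y) >= f(x) + df(x) * (y - x) - 1e-12
print("first- and second-order convexity conditions hold at the sample points")
```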

10.2 Unconstrained Optimization

The basic problem to be considered in this section is

    μ0 := inf_{x∈Ω} f(x)                                        (10.1)

where .X is a real normed linear space, .Ω ⊆ X is a set, and .f : Ω → R is a functional. Proposition 10.13 Let .X be a real normed linear space, .Ω ⊆ X be convex, .f : Ω → R be a convex functional, and .μ0 := infx∈Ω f (x) ∈ R. Then, the following statements hold. (i) The set of all points of minimum for f , which is given by .{x ∈ Ω | f (x) = μ0 }, is convex. (ii) Any point of relative minimum for f is a point of minimum for f . (iii) Any point of relative strict minimum for f is the point of strict minimum for f .


Proof (i) Note that .{x ∈ Ω | f (x) = μ0 } = {x ∈ Ω | f (x) ≤ μ0 }, which is convex by Proposition 10.9. (ii) Fix any .x0 ∈ Ω that is a point of relative minimum for f . Then, .∃ ∈ (0, ∞) ⊂ R such that .f (x) ≥ f (x0 ), .∀x ∈ Ω ∩ BX (x0 , ). .∀y ∈ Ω, .∃α ∈ (0, 1) ⊂ R such that .x0 + α (y − x0 ) ∈ BX (x0 , ). Then, .f (x0 ) ≤ f (x0 + α (y − x0 )) = f (αy + (1 − α)x0 ) ≤ αf (y) + (1 − α)f (x0 ). This implies that .f (x0 ) ≤ f (y). Hence, .x0 is a point of minimum for f . (iii) Fix any .x0 ∈ Ω that is a point of relative strict minimum for f . Then, .∃ ∈ (0, ∞) ⊂ R such that .f (x) > f (x0 ), .∀x ∈ (Ω ∩ BX (x0 , )) \ {x0 }. .∀y ∈ Ω with .y = x0 , .∃α ∈ (0, 1) ⊂ R such that .x0 + α (y − x0 ) ∈ BX (x0 , ). Then, .f (x0 ) < f (x0 + α (y − x0 )) = f (αy + (1 − α)x0 ) ≤ αf (y) + (1 − α)f (x0 ). This implies that .f (x0 ) < f (y). Hence, .x0 is a point of strict minimum for f . This completes the proof of the proposition. ' & Proposition 10.14 Let .X be a real reflexive Banach space, .Ω ⊆ X be a nonempty bounded closed convex set, and .f : Ω → R be a weakly upper semicontinuous convex functional. Then, there exists an extreme point .x0 ∈ Ω of .Ω such that .x0 is a point of maximum for f . Proof By Proposition 8.11, .Ω is weakly compact. Let .Oweak (X) be the weak topology on .X and .Oweak,Ω be the subset topology on .Ω with respect to .Oweak (X). By Proposition 5.30, there exists .x1 ∈ Ω that is a point of maximum for f . Let .M := {x ∈ Ω | f (x) ≥ f (x1 )}. Then .x1 ∈ M = ∅. Note that, by Proposition 2.5, .M = finv (R \ I ) = Ω \ finv (I ), where .I = (−∞, f (x1 )) ⊂ R. Since I is open in .R and f is weakly upper semicontinuous, then, .finv (I ) ∈ Oweak,Ω . Then, M is closed in .Oweak,Ω . By Proposition 5.5, M is weakly compact. Since .f (x1 ) = maxx∈Ω f (x), then M is an extreme subset of .Ω. By Proposition 10.7, there exists .x0 ∈ M that is an extreme point of .Ω. This completes the proof of the proposition. 
□

Proposition 10.15 Let X be a real normed linear space, Ω ⊆ X, x0 ∈ Ω◦, and f : Ω → R be differentiable at x0. Assume that x0 is a point of relative extremum of f . Then, f(1)(x0) = ϑX∗.

Proof Without loss of generality, assume that x0 is a point of relative minimum for f . The case when x0 is a point of relative maximum for f can be proved similarly. Since x0 ∈ Ω◦, then AΩ(x0) = X. ∀u ∈ X, by Propositions 10.2 and 9.5, Df(x0; u) = f(1)(x0)u ≥ 0 and Df(x0; −u) = −f(1)(x0)u ≥ 0. Then, f(1)(x0)u = 0. By the arbitrariness of u, we have f(1)(x0) = ϑX∗. This completes the proof of the proposition. □
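Proposition 10.15 can be illustrated with a central finite difference at a known interior minimizer; the quadratic below and the step size are arbitrary illustrative choices, not from the text.

```python
# Illustration of Proposition 10.15: at an interior point of relative
# extremum the derivative vanishes.  f(x) = (x - 1)**2 + 3 has its
# minimum at x0 = 1, and a central difference approximates f'(x0) = 0.
def f(x):
    return (x - 1.0) ** 2 + 3.0

x0 = 1.0
h = 1e-6
approx_deriv = (f(x0 + h) - f(x0 - h)) / (2.0 * h)
assert abs(approx_deriv) < 1e-6
print("f'(x0) is numerically zero at the interior minimizer")
```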


Proposition 10.16 Let X be a real normed linear space, Ω ⊆ X, x0 ∈ Ω◦, and f : Ω → R be C2 at x0. Assume that x0 is a point of relative minimum for f . Then, f(1)(x0) = ϑX∗ and f(2)(x0) is positive semi-definite.

Proof By Proposition 10.15, f(1)(x0) = ϑX∗. We will show that f(2)(x0) is positive semi-definite by an argument of contradiction. Suppose f(2)(x0) is not positive semi-definite. Then, ∃h ∈ X such that f(2)(x0)(h)(h) < 0. By the continuity of f(2) at x0, there exists δ ∈ (0, ∞) ⊂ R with D := BX(x0, δ) ⊆ Ω such that f(2)(x)(h)(h) < 0, ∀x ∈ D. Let x1 := x0 + αh ∈ D, where α ∈ (0, ∞) ⊂ R is an arbitrary constant. By Taylor’s Theorem 9.48, there exists t0 ∈ (0, 1) ⊂ R such that

    f(x1) = f(x0) + (1/2) f(2)(x0 + t0αh)(x1 − x0)(x1 − x0)
          = f(x0) + (α²/2) f(2)(x0 + t0αh)(h)(h) < f(x0)

This contradicts with the fact that x0 is a point of relative minimum for f . Hence, the result must be true. This completes the proof of the proposition. □

Proposition 10.17 Let X be a real normed linear space, Ω ⊆ X, x0 ∈ Ω◦, and f : Ω → R be twice differentiable at x, ∀x ∈ D := BX(x0, δ0) ⊆ Ω, where δ0 ∈ (0, ∞) ⊂ R is some constant. Assume that f(1)(x0) = ϑX∗ and f(2)(x) is positive semi-definite, ∀x ∈ D. Then, x0 is a point of relative minimum for f .

Proof ∀x ∈ D, by Taylor’s Theorem 9.48, there exists t0 ∈ (0, 1) ⊂ R such that

    f(x) = f(x0) + (1/2) f(2)(x0 + t0(x − x0))(x − x0)(x − x0) ≥ f(x0)

Hence, x0 is a point of relative minimum for f . This completes the proof of the proposition. □

Proposition 10.18 Let X be a real normed linear space, Ω ⊆ X, x0 ∈ Ω◦, and f : Ω → R be C2 at x0. Assume that f(1)(x0) = ϑX∗ and f(2)(x0) is positive definite. Then, x0 is a point of relative strict minimum for f .

Proof By Proposition 10.4 and the continuity of f(2) at x0, ∃δ ∈ (0, ∞) ⊂ R such that f(2)(x) is positive definite, ∀x ∈ D := BX(x0, δ) ⊆ Ω. ∀x ∈ D \ {x0}, by Taylor’s Theorem, there exists t0 ∈ (0, 1) ⊂ R such that

    f(x) = f(x0) + (1/2) f(2)(x0 + t0(x − x0))(x − x0)(x − x0) > f(x0)

Hence, x0 is a point of relative strict minimum for f . This completes the proof of the proposition. □
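In R^n, Propositions 10.16–10.18 reduce to the familiar gradient and Hessian tests. The sketch below checks the sufficient condition of Proposition 10.18 for a two-variable quadratic; the function and test grid are arbitrary illustrative choices, not from the text.

```python
import itertools

# Second-order sufficient condition (Proposition 10.18) for
# f(x, y) = 2x**2 + y**2 + x*y on R^2: the gradient vanishes at (0, 0)
# and the (constant) Hessian [[4, 1], [1, 2]] is positive definite,
# so (0, 0) is a strict relative minimum.
def f(x, y):
    return 2.0 * x * x + y * y + x * y

hess = [[4.0, 1.0], [1.0, 2.0]]
# Positive definiteness via leading principal minors (Sylvester's criterion).
det1 = hess[0][0]
det2 = hess[0][0] * hess[1][1] - hess[0][1] * hess[1][0]
assert det1 > 0 and det2 > 0

# f therefore exceeds f(0, 0) = 0 at nearby nonzero points.
for x, y in itertools.product([-0.1, -0.01, 0.01, 0.1], repeat=2):
    assert f(x, y) > 0.0
print("(0, 0) is a strict relative minimum, as Proposition 10.18 predicts")
```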


10.3 Optimization with Equality Constraints

The basic problem to be considered in this section is

    μ0 := inf f(x)  subject to  x ∈ Ω, H(x) = ϑY                (10.2)

where X and Y are real normed linear spaces, Ω ⊆ X is a set, f : Ω → R is a functional, and H : Ω → Y is a function.

Definition 10.19 Let X and Y be real normed linear spaces, Ω ⊆ X, and H : Ω → Y be C1 at x0 ∈ Ω◦. x0 is said to be a regular point of H if H(1)(x0) ∈ B(X, Y) is surjective.

Lemma 10.20 Let X and Y be real Banach spaces, Ω ⊆ X, f : Ω → R be C1 at x0 ∈ Ω◦, and H : Ω → Y be C1 at x0. Consider the optimization problem (10.2). Assume that x0 is a point of relative minimum for f on the set Ωc := {x ∈ Ω | H(x) = ϑY}; and x0 is a regular point of H . Then, ∀u ∈ X with H(1)(x0)u = ϑY, we have f(1)(x0)u = 0.

Proof Define T : Ω → R × Y by T(x) = (f(x), H(x)), ∀x ∈ Ω. By Proposition 9.44, T is C1 at x0 and T(1)(x0)u = (f(1)(x0)u, H(1)(x0)u), ∀u ∈ X.
We will prove the result using an argument of contradiction. Suppose the result is not true. Then, ∃u0 ∈ X with H(1)(x0)u0 = ϑY such that f(1)(x0)u0 ≠ 0. We will show that T(1)(x0) is surjective. ∀(r, y) ∈ R × Y, by x0 being a regular point of H, ∃u1 ∈ X such that H(1)(x0)u1 = y. Let

    u = ((r − f(1)(x0)u1)/(f(1)(x0)u0)) u0 + u1

Then, T(1)(x0)u = (r, y). Hence, T(1)(x0) is surjective. Note that T(x0) = (f(x0), ϑY) and R × Y is a real Banach space, by Propositions 7.22 and 4.31. By Surjective Mapping Theorem 9.53, ∃δr ∈ (0, ∞) ⊂ R, ∃δ ∈ (0, ∞) ⊂ R, and c1 ∈ [0, ∞) ⊂ R with c1δ ≤ δr such that ∀(r, y) ∈ BR×Y(T(x0), δ/2), ∃x ∈ BX(x0, δr) ⊆ Ω with ‖x − x0‖ ≤ c1‖(r, y) − T(x0)‖, we have T(x) = (r, y). Then, ∀δ̄r ∈ (0, δ/2) ⊂ R, let r1 = f(x0) − δ̄r ∈ R. (r1, ϑY) ∈ BR×Y(T(x0), δ/2). Then, ∃x1 ∈ Ω with ‖x1 − x0‖ ≤ c1‖(r1, ϑY) − T(x0)‖ = c1|r1 − f(x0)| = c1δ̄r < (1 + c1)δ̄r such that T(x1) = (r1, ϑY). Then, H(x1) = ϑY and x1 ∈ Ωc ∩ BX(x0, (1 + c1)δ̄r). Furthermore, f(x1) = r1 < f(x0). This contradicts with the assumption that x0 is a point of relative minimum for f on Ωc. Therefore, the result must be true. This completes the proof of the lemma. □


Proposition 10.21 (Lagrange Multiplier) Let X and Y be real Banach spaces, Ω ⊆ X, f : Ω → R be C1 at x0 ∈ Ω◦, and H : Ω → Y be C1 at x0. Consider the optimization problem (10.2). Assume that x0 is a point of relative minimum for f on the set Ωc := {x ∈ Ω | H(x) = ϑY}; and x0 is a regular point of H . Then, there exists a Lagrange multiplier y∗0 ∈ Y∗ such that the Lagrangian L : Ω × Y∗ → R defined by L(x, y∗) = f(x) + ⟨⟨y∗, H(x)⟩⟩, ∀(x, y∗) ∈ Ω × Y∗, is stationary at (x0, y∗0), that is, L(1)(x0, y∗0) = ϑB(X×Y∗,R).

Proof By Lemma 10.20, ∀u ∈ X with H(1)(x0)u = ϑY, we have f(1)(x0)u = 0. Then, f(1)(x0) ∈ (N(H(1)(x0)))⊥. Since x0 is a regular point of H, then R(H(1)(x0)) = Y is closed. By Proposition 7.114, we have (N(H(1)(x0)))⊥ = R((H(1)(x0))′). Then, there exists y∗0 ∈ Y∗ such that f(1)(x0) = −(H(1)(x0))′ y∗0. By Propositions 9.34, 9.41, 9.38, 9.37, 9.44, and 9.45, the Lagrangian L is C1 at (x0, y∗0) and, ∀(u, v∗) ∈ X × Y∗,

    L(1)(x0, y∗0)(u, v∗) = f(1)(x0)u + ⟨⟨v∗, H(x0)⟩⟩ + ⟨⟨y∗0, H(1)(x0)u⟩⟩
        = f(1)(x0)u + ⟨⟨(H(1)(x0))′ y∗0, u⟩⟩ = 0

where the second equality follows from the fact that x0 ∈ Ωc. Hence, L(1)(x0, y∗0) = ϑB(X×Y∗,R). This completes the proof of the proposition. □

Proposition 10.22 (Generalized Lagrange Multiplier) Let X and Y be real Banach spaces, Ω ⊆ X, f : Ω → R be C1 at x0 ∈ Ω◦, and H : Ω → Y be C1 at x0. Consider the optimization problem (10.2). Assume that x0 is a point of relative minimum for f on the set Ωc := {x ∈ Ω | H(x) = ϑY}; and R(H(1)(x0)) ⊆ Y is closed. Then, there exists a Lagrange multiplier (r0, y∗0) ∈ R × Y∗ with (r0, y∗0) ≠ (0, ϑY∗) such that the Lagrangian L : Ω × Y∗ → R defined by L(x, y∗) = r0 f(x) + ⟨⟨y∗, H(x)⟩⟩, ∀(x, y∗) ∈ Ω × Y∗, is stationary at (x0, y∗0), that is, L(1)(x0, y∗0) = ϑB(X×Y∗,R).

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: R(H(1)(x0)) = Y; Case 2: R(H(1)(x0)) ⊂ Y.
Case 1: R(H(1)(x0)) = Y. Take r0 = 1. Clearly, x0 is a regular point of H . The result follows from Proposition 10.21.
Case 2: R(H(1)(x0)) ⊂ Y. Take r0 = 0. Clearly, x0 is not a regular point of H . Then, ∃y0 ∈ Y \ R(H(1)(x0)). Let M = R(H(1)(x0)), which is a closed subspace of Y by the assumption. Then, by Proposition 4.10, δ := inf_{m∈M} ‖y0 − m‖ > 0. By Proposition 7.97, we have δ = max_{y∗∈M⊥, ‖y∗‖≤1} ⟨⟨y∗, y0⟩⟩, where the maximum is achieved at y∗0 ∈ M⊥. Then, y∗0 ≠ ϑY∗ and, by Proposition 7.112, y∗0 ∈ N((H(1)(x0))′). The Lagrangian L is given by L(x, y∗) = ⟨⟨y∗, H(x)⟩⟩, ∀(x, y∗) ∈ Ω × Y∗. By Propositions 9.34, 9.41, 9.37, 9.44, and 9.45, L is C1 at (x0, y∗0) and, ∀(u, v∗) ∈ X × Y∗,

    L(1)(x0, y∗0)(u, v∗) = ⟨⟨v∗, H(x0)⟩⟩ + ⟨⟨y∗0, H(1)(x0)u⟩⟩
        = ⟨⟨(H(1)(x0))′ y∗0, u⟩⟩ = ⟨⟨ϑX∗, u⟩⟩ = 0

where the second equality follows from the fact that x0 ∈ Ωc. Hence, L(1)(x0, y∗0) = ϑB(X×Y∗,R). This case is proved.
This completes the proof of the proposition. □

Proposition 10.23 Let .X and .Y be real Banach spaces, .Ω ⊆ X, .f : Ω → R be C2 at .x0 ∈ Ω ◦ , .H : Ω → Y be .C2 at .x0 . Consider the optimization problem (10.2). Assume that   (i) .H (x0) = ϑY and .R H (1)(x0 ) ⊆ Y is closed. (ii) the Lagrangian .L : Ω × Y∗ → R defined by .L(x, y∗ ) = f (x) + y∗ , H (x), ∗ ∗ .∀(x, y∗ ) ∈ Ω × Y , is stationary at .(x0 , y∗0 ), where .y∗0 ∈ Y is a Lagrange multiplier.   2 (iii) . ∂∂xL2 (x0 , y∗0 ) is positive definite on the subspace .M := N H (1)(x0 ) , that

.

is, .∃m ∈ (0, ∞) ⊂ R such that   N H (1)(x0 ) .

.

∂2L (x , y )(h)(h) ∂x 2 0 ∗0

≥ mh2 , .∀h ∈

Then, .x0 is a point of relative strict minimum for f on the set .Ωc := {x ∈ Ω | H (x) = ϑY }. Proof By Propositions 9.34, 9.37, 9.38, 9.41, 9.44, and 9.45, L is .C2 at .(x0 , y∗0 ). By Proposition 9.9 and (ii), L(1) (x0 , y∗0 ) =

.

F

∂L ∂L ∂x (x0 , y∗0 ) ∂y∗ (x0 , y∗0 )

G

= ϑB(X×Y∗ ,R)

Then, .∃δ0 ∈ (0, ∞) ⊂ R such that .f (2) (x), .H (2)(x), and

.

∂2L (x, y∗ ) ∂x 2

exists, 2

∀x ∈ BX (x0 , δ0 ) ⊆ Ω and .∀y∗ ∈ BY∗ (y∗0 , δ0 ), and, by Proposition 9.46, . ∂∂xL2 is continuous 6 2 at .(x0 , y∗02). Then, .∃δ61 ∈ (0, δ0 ] ⊂ 6R and .∃c61 ∈ [0, ∞) ⊂ R such 6 6 that .6 ∂∂xL2 (x, y∗0 ) − ∂∂xL2 (x0 , y∗0 )6 < m/5 and .6H (2)(x)6 ≤ c1 , .∀x ∈ D1 := "  # BX (x0 , δ1 ). By Propositions 7.114 and 7.98 and (i), .M ⊥ = R H (1) (x0 ) is "closed. Then, by Proposition 7.113, .∃c2 ∈ [0, ∞) ⊂ R such that, .∀x∗ ∈  #    (1) R H (x0 ) , there exists .y∗ ∈ Y∗ such that .x∗ = H (1)(x0 ) y∗ and .y∗  ≤ c2 x∗ .

.

Claim 10.23.1 .∀x ∈ Ωc ∩D1 , .∃h0 ∈ M such that .x − x0 − h0  ≤ c1 c2 x − x0 2 , 2 2 .x − x0  − c1 c2 x − x0  ≤ h0  ≤ x − x0  + c1 c2 x − x0  .



Proof of Claim Fix any .x ∈ Ωc ∩ D1 . Then, .H (x) 6= H (x0 ) = ϑY6. By Taylor’s Theorem 9.48, .∃t0 ∈ (0, 1) ⊂ R such that .6H (1)(x0 )(x − x0 )6 = 6 6 6 6 6H (x) − H (x0 ) − H (1)(x0 )(x − x0 )6 ≤ 1 6H (2)(t0 x + (1 − t0 )x0 )6x − x0 2 ≤ 2 = c1 x − x0 2 /2. By Proposition 7.97, we have .infh∈M x − x0 − h maxx∗ ∈M ⊥ , x∗ ≤1 x∗ , x − x0 , where the maximum is achieved at .x∗1 ∈ "  # M ⊥ = R H (1)(x0 ) with .x∗1  ≤ 1. Then, .∃y∗1 ∈ Y∗ such that  (1)  .x∗1 = H (x0 ) y∗1 and .y∗1  ≤ c2 . By Proposition 7.68, M is closed. Then, .∃h0 ∈ M such that .x − x0 − h0  ≤ 2 infh∈M x − x0 − h = 2x∗1 , x − x0  = 6 6 BB CC 2 y∗1 , H (1)(x0 )(x − x0 ) ≤ 2y∗1 6H (1)(x0 )(x − x0 )6 ≤ c1 c2 x − x0 2 . Then, .h0  ≥ x − x0  − x − x0 − h0  ≥ x − x0  − c1 c2 x − x0 2 ; and 2 .h0  ≤ x − x0  + x − x0 − h0  ≤ x − x0  + c1 c2 x − x0  . This completes the proof of the claim. ' & 6 2 6 6 6 Let .c3 = 6 ∂∂xL2 (x0 , y∗0 )6. Let .δ ∈ (0, δ1 ] ⊂ R such that .c1 c2 δ ≤ 1/4, c12 c22 (c3 +m/5)δ 2 ≤ m/5, and .5c1 c2 (c3 +m/5)δ/2 ≤ m/5. .∀x ∈ Ωc ∩BX (x0 , δ), by Claim 10.23.1, .∃h0 ∈ M such that .x − x0 − h0  ≤ c1 c2 x − x0 2 ≤ x − x0 /4, .3x − x0 /4 ≤ h0  ≤ 5x − x0 /4. By Taylor’s Theorem 9.48, .∃t1 ∈ (0, 1) ⊂ R such that .

f (x) − f (x0 ) = L(x, y∗0 ) − L(x0 , y∗0 ) −

.

=

∂L (x0 , y∗0 )(x − x0 ) ∂x

1 ∂ 2L (t1 x + (1 − t1 )x0 , y∗0 )(x − x0 )(x − x0 )   2 ∂x 2  x¯

=

1  ∂ 2L 2 ∂x +

(x, ¯ y∗0 )(h0 )(h0 ) + 2

∂ 2L (x, ¯ y∗0 )(x − x0 − h0 )(h0 ) ∂x 2

∂ 2L (x, ¯ y∗0 )(h0 )(x − x0 − h0 ) ∂x 2

 ∂ 2L (x, ¯ y∗0 )(x − x0 − h0 )(x − x0 − h0 ) 2 ∂x  1 ∂ 2L ∂ 2L ∂ 2L ≥ (x0 , y∗0 )(h0 )(h0 ) + ( 2 (x, ¯ y∗0 ) − (x0 , y∗0 ))(h0 )(h0 ) 2 2 ∂x ∂x ∂x 2 6 ∂ 2L 6  6 6 −6 2 (x, ¯ y∗0 )6(2x − x0 − h0 h0  + x − x0 − h0 2 ) ∂x 6 2  6 1 ∂ 2L 6∂ L 6 ≥ mh0 2 − mh0 2 /5 − 6 2 (x, ¯ y∗0 ) − (x , y ) 0 ∗0 6 2 ∂x ∂x 2 6 6 ∂ 2L  6 6 +6 2 (x0 , y∗0 )6 (2x − x0 − h0 h0  + x − x0 − h0 2 ) ∂x +



# 1 " 4m 5 h0 2 − (c3 + m/5) ( c1 c2 x − x0 3 + c12 c22 x − x0 4 ) 2 5 2   m m m 1 4m 9 x − x0 2 = x − x0 2 − − ≥ 2 5 16 5 5 40



Hence, .x0 is a point of relative strict minimum for f on the set .Ωc . This completes the proof of the proposition. ' & Proposition 10.24 Let .X and .Y be real Banach spaces, .Ω ⊆ X, .f : Ω → R be C2 at .x0 ∈ Ω ◦ , .H : Ω → Y be .C2 at .x0 . Consider the optimization problem (10.2). Assume that .x0 is a regular point of H and is a point of relative minimum for f on the set .Ωc := {x ∈ Ω | H (x) = ϑY }. Then, there exists a Lagrange multiplier ∗ .y∗0 ∈ Y such that .

(i) The Lagrangian .L : Ω × Y∗ → R defined by .L(x, y∗ ) = f (x) + y∗ , H (x), ∗ .∀(x, y∗ ) ∈ Ω × Y , is stationary at .(x0 , y∗0 ).   2 (ii) . ∂∂xL2 (x0 , y∗0 ) is positive semi-definite on the subspace .N H (1)(x0 ) =: M, that   2 is, . ∂∂xL2 (x0 , y∗0 )(h)(h) ≥ 0, .∀h ∈ N H (1) (x0 ) . Proof Under the assumption of the proposition, by Proposition 10.21, there exists a Lagrange multiplier .y∗0 ∈ Y∗ such that (i) holds. We will show that (ii) also holds by an argument of contradiction. Suppose (ii) does not hold. Then, .∃h0 ∈ M with ∂2L .h0  = 1 such that . (x , y )(h0 )(h0 ) < −m < 0 for some .m ∈ (0, ∞) ⊂ R. ∂x 2 0 ∗0 By Surjective Mapping Theorem 9.53, .∃r1 ∈ (0, ∞) ⊂ R, .∃δ1 ∈ (0, ∞) ⊂ R, and .∃c1 ∈ [0, ∞) ⊂ R with .c1 δ1 ≤ r1 such that .∀y ¯ ∈ BY (ϑY , δ1 /2), .∀x¯ ∈ BX (x0 , r1 /2) ¯ ≤ with .y¯ = H (x), ¯ .∀y ∈ BY (y, ¯ δ1 /2), .∃x ∈ BX (x0 , r1 ) ⊆ Ω with .x − x c1 y − y, ¯ we have .y = H (x). By Propositions 9.34, 9.37, 9.38, 9.41, 9.44, and 9.45, L is .C2 at .(x0 , y∗0 ). By Proposition 9.9 and (i), L(1) (x0 , y∗0 ) =

F

.

∂L ∂L ∂x (x0 , y∗0 ) ∂y∗ (x0 , y∗0 )

G

= ϑB(X×Y∗ ,R) 2

By Proposition 9.46, .∃r2 ∈ (0, r1 ] ⊂ R such that . ∂∂xL2 (x, y∗0 ) exists and 6 6 2 6 6∂ L ∂2L .6 (x, y ) − (x , y ) 6 < m/5, .∀x ∈ BX (x0 , r2 ). Since H is .C2 at .x0 , ∗0 0 ∗0 ∂x 2 ∂x 2 6 (2) 6 6 6 then .∃c2 ∈ [0, ∞) ⊂ R and 6 2 .∃r3 ∈ (0, 6 r2 ] ⊂ R such that . H (x) ≤ c2 , 6∂ L 6 .∀x ∈ BX (x0 , r3 ). Let .c3 := 6 (x , y )6. ∂x 2 0 ∗0 ∀δ ∈ (0, r3 /2) ⊂ R such that .c2 δ 2 < δ1 , .c1 c2 δ/2 ≤ 1/4, .c12 c22 (c3 + m/5)δ 2/4 ≤ m/5, and .c1 c2 (c3 + m/5)δ ≤ m/5. By Taylor’s Theorem 9.48, .∃t0 ∈ (0, 1) ⊂ R such that .

6 6 H (x0 + δh0 ) = 6H (x0 + δh0 ) − H (x0) − δH (1)(x0 )h0 6

.



6 16 6H (2)(x0 + t0 δh0 )6δ 2 ≤ c2 δ 2 /2 < δ1 /2 2



Let .y¯δ := H (x0 + δh0 ) ∈ BY (ϑY , δ1 /2) and .x¯δ := x0 + δh0 ∈ BX (x0 , r1 /2). Note that .y¯δ = H (x¯δ ) and .ϑY ∈ BY (y¯δ , δ1 /2). Then, .∃xδ ∈ BX (x0 , r1 ) with .xδ − x¯δ  ≤ c1 y¯δ  ≤ c1 c2 δ 2 /2 such that .H (xδ ) = ϑY . Then, we have .xδ − x0  ≥ δh0  − xδ − x¯δ  ≥ δ − c1 c2 δ 2 /2 ≥ 3δ/4; and .xδ − x0  ≤ δh0  + xδ − x¯δ  ≤ δ + c1 c2 δ 2 /2 ≤ 5δ/4 < r3 . Hence, .xδ ∈ Ωc ∩ BX (x0 , r3 ). By Taylor’s Theorem 9.48, .∃t1 ∈ (0, 1) ⊂ R such that f (xδ ) − f (x0 ) = L(xδ , y∗0 ) − L(x0 , y∗0 ) −

.

=

∂L (x0 , y∗0 )(xδ − x0 ) ∂x

1 ∂ 2L (t1 xδ + (1 − t1 )x0 , y∗0 )(xδ − x0 )(xδ − x0 )   2 ∂x 2  xˆ

=

1  ∂ 2L (x, ˆ y∗0 )(x¯δ − x0 )(x¯δ − x0 ) 2 ∂x 2 ∂ 2L ∂ 2L ( x, ˆ y )(x − x ¯ )( x ¯ − x ) + (x, ˆ y∗0 )(x¯δ − x0 )(xδ − x¯δ ) ∗0 δ δ δ 0 ∂x 2 ∂x 2  ∂ 2L + 2 (x, ˆ y∗0 )(xδ − x¯δ )(xδ − x¯δ ) ∂x  2 1  2 ∂ 2L 2 ∂ L δ (x , y )(h )(h ) + δ (x, ˆ y∗0 ) 0 ∗0 0 0 2 ∂x 2 ∂x 2 6 ∂ 2L 6  ∂ 2L 6 6 − 2 (x0 , y∗0 ) (h0 )(h0 ) + 6 2 (x, ˆ y∗0 )6(2xδ − x¯δ δh0  ∂x ∂x  +xδ − x¯δ 2 ) 6 2 6 1 ∂ 2L 6∂ L 6 − mδ 2 + mδ 2 /5 + 6 2 (x, ˆ y∗0 ) − (x , y ) 6 0 ∗0 2 2 ∂x ∂x 6 ∂ 2L 6  6 6 +6 2 (x0 , y∗0 )6 (c1 c2 δ 3 + c12 c22 δ 4 /4) ∂x  1  4m 2 − δ + (c3 + m/5) (c1 c2 δ 3 + c12 c22 δ 4 /4) 2 5 m 1  4m 2 m 2 m 2  − δ + δ + δ = − δ2 < 0 2 5 5 5 5 +




◦ [ω, Z]. Hence, .(r0 , ϑZ ) ∈ ω, Γ¯ . By Proposition 8.22, .ω is continuous at .ϑZ . By Proposition 8.23, .ω is continuous.



By Proposition 8.34, we have

    0 = μ̄0 = ω(ϑZ) = max_{z∗ ∈ Γ̄conj} (−ωconj(z∗))

where .Γ¯ conj and .ωconj : Γ¯ conj → R are the conjugate set and conjugate functional of .ω. By Fact 8.53, .Γ¯ conj ⊆ P  . Let the maximum be achieved at .−z∗0 ∈ Γ¯ conj, 

¯ ∗ ) = supz∈Γ¯ (z∗ , z − ω(z)), where .z∗0 = ϑZ∗ . Define .ω¯ : P  → Re by .ω(z  .∀z∗ ∈ P . By Fact 8.54, we have .maxz ∈Γ¯ (−ω ¯ ∗ )), conj(z∗ )) = maxz∗ ∈P  (−ω(z ∗ conj where both maximums are achieved at .−z∗0 , and .∀z∗ ∈ P ⊕ , .

 (1) BB CC − ω(−z ¯ (x0 )u + z∗ , G(x0 ) + G(1) (x0 )u ∗ ) = inf f u∈X

Then, we have  BB CC 0 = μ¯ 0 = max inf f (1) (x0 )u + z∗ , G(x0 ) + G(1) (x0 )u

.

 z∗ =ϑZ∗ u∈X

where @@ the maximum is achieved at AA .z∗0 . This implies that .z∗0 , G(x0 ) +  (1)  (1) infu∈X f (x0 ) + G (x0 ) z∗0 , u = 0 by Proposition 8.37. For the infimum   to be finite, we must have .f (1) (x0 ) + G(1) (x0 ) z∗0 = ϑX∗ . Hence, the Lagrangian L is stationary at .x0 . Then, .0 = z∗0 , G(x0 ). This completes the proof of the proposition. ' & Proposition 10.26 Let .X be a real Banach space, .Y be a real Banach space with a positive cone .P ⊆ Y, and .A ∈ B(X, Y) be surjective. Then, (Ainv(P ))⊕ = A (P ⊕ )

.

 Proof .∀x∗ ∈ A (P ⊕ ), .∃y∗ BB∈ P ⊕ such CC that .x∗ = A y∗ . .∀x ∈ Ainv(P ), we have  .Ax ∈ P . Then, .x∗ , x = A y∗ , x = y∗ , Ax ≥ 0. By the arbitrariness of x, ⊕ ⊕  ⊕ .x∗ ∈ (Ainv (P )) . By the arbitrariness of .x∗ , we have .A (P ) ⊆ (Ainv (P )) . ⊕ On the other hand, fix any .x∗ ∈ (Ainv (P )) . .∀x ∈ N (A), .x ∈ Ainv (P ), since .ϑY ∈ P . Then, .x∗ , x ≥ 0. Since .−x ∈ N (A) as well, then .x∗ , x = 0.   Hence, .x∗ ∈ (N (A))⊥ . By Proposition 7.114, .(N (A))⊥ = R A . Then, .∃y∗ ∈ Y∗ such that .x∗ = A y∗ . .∀y ∈ P , since A is surjective, .∃x ∈ Ainv (P ) such that BB then CC  .y = Ax. Then, we have .y∗ , y = y∗ , Ax = A y∗ , x = x∗ , x ≥ 0. By the arbitrariness of y, we have .y∗ ∈ P ⊕ . Then, .x∗ ∈ A (P ⊕ ). By the arbitrariness of .x∗ , we have .(Ainv (P ))⊕ ⊆ A (P ⊕ ). Therefore, we have .A (P ⊕ ) = (Ainv (P ))⊕ . This completes the proof of the proposition. ' &
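Proposition 10.26 can be checked by hand in the toy case A : R² → R, A(x) = x1 + x2 (surjective), with positive cone P = [0, ∞): then Ainv(P) is the half-space {x | x1 + x2 ≥ 0}, and its dual cone should coincide with A′(P⊕) = {(t, t) | t ≥ 0}. The sampling-based sketch below uses an arbitrary grid and arbitrary test functionals, and is an illustration only.

```python
# Sampling check of Proposition 10.26 for A : R^2 -> R, A(x) = x1 + x2,
# P = [0, inf).  Ainv(P) is the half-space {x : x1 + x2 >= 0}, and
# (Ainv(P))+ should equal A'(P+) = {(t, t) : t >= 0}.
halfspace = [(x1, x2)
             for x1 in [i / 4.0 for i in range(-8, 9)]
             for x2 in [i / 4.0 for i in range(-8, 9)]
             if x1 + x2 >= 0.0]

def in_dual_cone(xs):
    """xs lies in (Ainv(P))+ iff <xs, x> >= 0 for every sampled x."""
    return all(xs[0] * x1 + xs[1] * x2 >= -1e-12 for x1, x2 in halfspace)

# Elements of A'(P+) = {(t, t) : t >= 0} lie in the dual cone ...
assert in_dual_cone((0.0, 0.0)) and in_dual_cone((1.0, 1.0)) and in_dual_cone((2.5, 2.5))
# ... while candidates off the diagonal, or negative multiples, do not.
assert not in_dual_cone((1.0, 0.0))
assert not in_dual_cone((1.0, 2.0))
assert not in_dual_cone((-1.0, -1.0))
print("(Ainv(P))+ = A'(P+) confirmed on the sampled grid")
```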

Next, we present the second-order sufficient condition for a relative strict minimum point in the optimization problem (10.3) with inequality constraints.

10.4 Inequality Constraints


Proposition 10.27 Let .X be a real Banach space, .Ω ⊆ X, .x0 ∈ Ω ◦ , .Z be a real Banach space with a positive cone .P ⊆ Z, and .f : Ω → R and .G : Ω → Z be .C2 at .x0 . Consider the optimization problem (10.3). Assume that 

(i) .G(x0 ) = ϑZ and .x0 is regular point of G. (ii) .z∗0 ∈ P ⊕ is the Lagrange multiplier, the Lagrangian .L : Ω → R defined by .L(x) = f (x) + z∗0 , G(x), .∀x ∈ Ω, is stationary at .x0 . (iii) .z∗0 , G(x0 ) = 0. 2 (2) (iv) .∃m * ∈ (0, ∞) ⊂ R such that .L+ (x0 )(u)(u) ≥ mu , .∀u ∈ Mc :=  u ∈ X  G(x0 ) + G(1) (x0 )u = ϑZ , that is, .L(2) (x0 ) is positive definite on the set .Mc . Then, .x0 is a point of relative strict minimum of f on the set .Ωc * +   x ∈ X  G(x) = ϑZ .

:=

Proof By Propositions 9.34, 9.37, 9.40, 9.41, 9.44, and 9.45, L is .C2 at .x0 . By (ii), .L(1) (x0 ) = ϑB(X,R) = ϑX∗ . Then, .∃δ0 ∈ (0, ∞) ⊂ R such that .f (2)(x), (2) (x), and .L(2) (x) exist, .∀x ∈ B (2) is continuous at .x . .G X (x0 , δ0 ) ⊆ Ω and .L6 6 0 Then, .∃δ16∈ (0, δ06] ⊂ R and .∃c1 ∈ [0, ∞) ⊂ R such that .6L(2) (x) − L(2) (x0 )6 < m/5 and .6G(2)(x)6 ≤ c1 , .∀x ∈ D1 := BX (x0 , δ1 ). By Propositions 7.114 and 7.98 "    # and (i), .(N G(1) (x0 ) )⊥ = R G(1) (x0 ) is closed. Then, by Proposition 7.113, "  # .∃c2 ∈ [0, ∞) ⊂ R such that, .∀x∗ ∈ R G(1) (x0 ) , there exists .z∗ ∈ Z∗ such that  (1)  .x∗ = G (x0 ) z∗ and .z∗  ≤ c2 x∗ . Claim 10.27.1 .∀x ∈ Ωc ∩D1 , .∃h0 ∈ Mc such that .x − x0 − h0  ≤ c1 c2 x − x0 2 and .x − x0  − c1 c2 x − x0 2 ≤ h0  ≤ x − x0  + c1 c2 x − x0 2 . 

Proof of Claim Fix any .x ∈ Ωc ∩ D1 . Then, .G(x) = ϑZ . It is easy to see that .Mc ϑX is a nonempty closed convex set. Then, by Proposition 8.15, we have ¯ := infh∈Mc x − x0 − h = maxx∗ ∈Mc supp, x∗ ≤1 (x∗ , x − x0  − g(x∗ )) ≥ .0 ≤ δ 0, where .g : Mcsupp → R is the support functional of .Mc , .Mcsupp := {x∗ ∈ X∗ | supu∈Mc x∗ , u < +∞}, and .g(x∗ ) = supu∈Mc x∗ , u, .∀x∗ ∈ Mcsupp. ⊕

We will show that .Mcsupp ⊆ ((G(1) (x0 ))inv (P )) . Fix any .x∗ ∈ Mcsupp. 

∀u ∈ (G(1) (x0 ))inv (P ), we have .G(1)(x0 )u = ϑZ . .∀α ∈ [0, ∞) ⊂ R, it is easy x∗ , u to show that .−αu ∈ Mc . Suppose .x∗ , u < 0. Then, .supu∈M ¯ ≥ ¯ c supα∈[0,∞)⊂R x∗ , −αu = +∞. This implies that .x∗ ∈ / Mcsupp, which is a contradiction. Hence, we must have .x∗ , u ≥ 0. By the arbitrariness of u, we ⊕ have .x∗ ∈ ((G(1)(x0 ))inv (P )) . By the arbitrariness of .x∗ , we have .Mcsupp ⊆ ⊕ ((G(1)(x0 ))inv (P )) .

.



Then, .δ¯ = x∗0 , x − x0 −g(x∗0 ) for some .x∗0 ∈ Mcsupp ⊆ ((G(1)(x0 ))inv (P )) (1)  ⊕ with ∗0  ≤ 1. By (i) and Proposition 10.26, we have .x∗0 ∈ (G (x0 )) (P ) ⊆  .x (1)  ∗ (1)  R (G (x0 )) . Then, .∃z∗1 ∈ Z such that .x∗0 = (G (x0 )) z∗1 and .z∗1  ≤ c2 x∗0  ≤ c2 . This further implies that BB CC CC BB δ¯ = z∗1 , G(1) (x0 )(x − x0 ) − sup z∗1 , G(1) (x0 )u

.

u∈Mc

BB CC = z∗1 , −G(x) + G(x0 ) + G(1) (x0 )(x − x0 ) CC BB − sup z∗1 , −G(x) + G(x0 ) + G(1) (x0 )u u∈Mc

where the second equality follows from Proposition 8.37. By (i), we have  R G(1)(x0 ) = Z and .∃u1 ∈ X such that .G(1) (x0 )u1 = G(x) − G(x0 ).

.



It isBB easy to see that .u1 ∈ Mc since .x CC∈ Ωc and .G(x) = ϑZ . Then, ¯ ≤ z∗1 , −G(x) + G(x0 ) + G(1)(x0 )(x − x0 ) . .δ 9.48, 6.∃t0 6∈ (0, 1) ⊂ R 6 such that 6 By Taylor’s Theorem (1) (x )(x − x )6 ≤ 1 6G(2) (t x + (1 − t )x )6x − x 2 ≤ .6G(x) − G(x0 ) − G 0 0 0 0 0 0 6 2 6 c1 x − x0 2 /2. Then, we have .δ¯ ≤ z∗1 6G(x) − G(x0 ) − G(1) (x0 )(x − x0 )6 ≤ c1 c2 x − x0 2 /2. Since .Mc is closed, then .∃h0 ∈ Mc such that .x − x0 − h0  ≤ 2δ¯ ≤ c1 c2 x − x0 2 . Then, .h0  ≥ x − x0  − x − x0 − h0  ≥ x − x0  − c1 c2 x − x0 2 ; and 2 .h0  ≤ x − x0  + x − x0 − h0  ≤ x − x0  + c1 c2 x − x0  . This completes the proof of the claim. ' & 6 (2) 6 2 2 Let .c3 = 6L (x0 )6. Let .δ ∈ (0, δ1 ] ⊂ R such that .c1 c2 δ ≤ 1/4, .c1 c2 (c3 + m/5)δ 2 ≤ m/5, and .5c1 c2 (c3 + m/5)δ/2 ≤ m/5. .∀x ∈ Ωc ∩ BX (x0 , δ), by Claim 10.27.1, .∃h0 ∈ Mc such that .x − x0 − h0  ≤ c1 c2 x − x0 2 ≤ x − x0 /4 and .3x − x0 /4 ≤ h0  ≤ 5x − x0 /4. By Taylor’s Theorem 9.48, .∃t1 ∈ (0, 1) ⊂ R such that f (x) − f (x0 ) ≥ L(x) − L(x0 ) − L(1) (x0 )(x − x0 )

.

=

1 (2) L (t1 x + (1 − t1 )x0 )(x − x0 )(x − x0 )    2 x¯

=

1  (2) ¯ 0 )(h0 ) + L(2) (x)(x ¯ − x0 − h0 )(h0 ) L (x)(h 2 ¯ 0 )(x − x0 − h0 ) + L(2) (x)(x ¯ − x0 − h0 )(x − x0 − h0 ) +L(2) (x)(h



1  (2) ¯ − L(2) (x0 ))(h0 )(h0 ) L (x0 )(h0 )(h0 ) + (L(2) (x) 2 6 6  ¯ 6(2x − x0 − h0 h0  + x − x0 − h0 2 ) −6L(2) (x)





6 6 6 6 1 mh0 2 − mh0 2 /5 − 6L(2) (x) ¯ − L(2) (x0 )6 + 6L(2) (x0 )6 2  ·(2x − x0 − h0 h0  + x − x0 − h0 2 ) # 5 1 " 4m h0 2 − (c3 + m/5) ( c1 c2 x − x0 3 + c12 c22 x − x0 4 ) ≥ 2 5 2   m m 1 4m 9 m x − x0 2 = x − x0 2 − − ≥ 2 5 16 5 5 40



Hence, .x0 is a point of relative strict minimum for f on the set .Ωc . This completes the proof of the proposition. ' & Proposition 10.28 Let .X be a real Banach space, .Ω ⊆ X, .x0 ∈ Ω ◦ , .Z be a real Banach space with a positive cone .P ⊆ Z, and .f : Ω → R and .G : Ω → Z be .C2 at .x0 . Consider the optimization problem (10.3). Assume that .x0 is a regular point  of G and a point of relative minimum for f on the set .Ωc := {x ∈ X | G(x) = ϑZ }. ⊕ Then, there exists a Lagrange multiplier .z∗0 ∈ P such that (i) The Lagrangian .L : Ω → R defined by .L(x) = f (x)+z∗0 , G(x), .∀x ∈ Ω, is stationary at .x0 and .z∗0 , G(x0 ) = 0.   (ii) .L(2) (x0 ) is positive semi-definite on the subspace .N G(1) (x0 ) =: M, that is,   (2) (x )(h)(h) ≥ 0, .∀h ∈ N G(1) (x ) . .L 0 0 Proof By the Generalized Kuhn–Tucker Theorem 10.25, there exists a Lagrange  multiplier .z∗0 ∈ Z∗ with .z∗0 = ϑZ∗ such that (i) holds. We will show that (ii) also holds by an argument of contradiction. Suppose (ii) does not hold. Then, (2) (x )(h )(h ) < −m < 0 for some .∃h0 ∈ M with .h0  = 1 such that .L 0 0 0 .m ∈ (0, ∞) ⊂ R. Let .z0 := G(x0 ) ∈ Z. By Surjective Mapping Theorem 9.53, .∃r1 ∈ (0, ∞) ⊂ R, .∃δ1 ∈ (0, ∞) ⊂ R, and .∃c1 ∈ [0, ∞) ⊂ R with .c1 δ1 ≤ r1 such that .∀¯ z ∈ BZ (z0 , δ1 /2), .∀x¯ ∈ BX (x0 , r1 /2) with .z¯ = G(x), ¯ ¯ ≤ c1 z − z¯ , we have .∀z ∈ BZ (¯ z, δ1 /2), .∃x ∈ BX (x0 , r1 ) ⊆ Ω with .x − x .z = G(x). By Propositions 9.34, 9.37, 9.40, 9.41, 9.44, and 9.45, L is .C2 at .x0 . By (i), .L(1)6(x0 ) = ϑB(X,R) =6ϑX∗ . Then, .∃r2 ∈ (0, r1 ] ⊂ R such that .L(2) (x) exists and .6L(2) (x) − L(2) (x0 )6 < m/5, .∀x ∈ BX (x0 , r2 ). Since G is .C 62 at .x0 , 6 (2) 6 ≤ c2 , then .∃c2 ∈ [0, ∞) ⊂ R and .∃r3 ∈ (0, r2 ] ⊂ R such that .6G (x) 6 (2) 6 6 6 .∀x ∈ BX (x0 , r3 ). Let .c3 := L (x0 ) . 2 2 2 2 .∀δ ∈ (0, r3 /2) ⊂ R such that .c2 δ < δ1 , .c1 c2 δ/2 ≤ 1/4, .c c (c3 + m/5)δ /4 ≤ 1 2 m/5, and .c1 c2 (c3 + m/5)δ ≤ m/5. 
By Taylor’s Theorem 9.48, .∃t0 ∈ (0, 1) ⊂ R such that 6 6 G(x0 + δh0 ) − z0  = 6G(x0 + δh0 ) − G(x0 ) − δG(1) (x0 )h0 6

.



6 16 6G(2) (x0 + t0 δh0 )6δ 2 ≤ c2 δ 2 /2 < δ1 /2 2

356

10 Local Theory of Optimization

Let z̄δ := G(x0 + δh0) ∈ B_Z(z0, δ1/2) and x̄δ := x0 + δh0 ∈ B_X(x0, r1/2). Note that z̄δ = G(x̄δ) and z0 ∈ B_Z(z̄δ, δ1/2). Then, ∃xδ ∈ B_X(x0, r1) with ‖xδ − x̄δ‖ ≤ c1‖z0 − z̄δ‖ ≤ c1c2δ²/2 such that G(xδ) = z0. Then, we have ‖xδ − x0‖ ≥ ‖δh0‖ − ‖xδ − x̄δ‖ ≥ δ − c1c2δ²/2 ≥ 3δ/4; and ‖xδ − x0‖ ≤ ‖δh0‖ + ‖xδ − x̄δ‖ ≤ δ + c1c2δ²/2 ≤ 5δ/4 < r3. Hence, xδ ∈ Ωc ∩ B_X(x0, r3). By Taylor's Theorem 9.48, ∃t1 ∈ (0, 1) ⊂ ℝ such that

f(xδ) − f(x0) = L(xδ) − L(x0) − L(1)(x0)(xδ − x0) − ⟨⟨z∗0, G(xδ)⟩⟩ + ⟨⟨z∗0, G(x0)⟩⟩
= L(xδ) − L(x0) − L(1)(x0)(xδ − x0) = (1/2) L(2)(x̂)(xδ − x0)(xδ − x0), where x̂ := t1xδ + (1 − t1)x0
= (1/2) [L(2)(x̂)(x̄δ − x0)(x̄δ − x0) + L(2)(x̂)(xδ − x̄δ)(x̄δ − x0) + L(2)(x̂)(x̄δ − x0)(xδ − x̄δ) + L(2)(x̂)(xδ − x̄δ)(xδ − x̄δ)]
≤ (1/2) [δ²(L(2)(x0)(h0)(h0) + m/5) + 2(c3 + m/5)δ‖xδ − x̄δ‖ + (c3 + m/5)‖xδ − x̄δ‖²]
≤ (1/2) δ²(−m + m/5 + m/5 + m/5) = −mδ²/5 < 0

where the bounds use ‖L(2)(x̂) − L(2)(x0)‖ < m/5, ‖L(2)(x̂)‖ ≤ c3 + m/5, ‖xδ − x̄δ‖ ≤ c1c2δ²/2, and the choice of δ. Hence, f(xδ) < f(x0) with xδ ∈ Ωc ∩ B_X(x0, r3), and since δ ∈ (0, r3/2) may be taken arbitrarily small, this contradicts the assumption that x0 is a point of relative minimum for f on Ωc. Therefore, (ii) holds. This completes the proof of the proposition. □

Let E = ⋃_{n=1}^∞ En := ⋃_{n=1}^∞ {x ∈ X | f2(x) − f1(x) ≤ −1/n, |f1(x)| ≤ n, |f2(x)| ≤ n}. By Proposition 11.7, μ(E) = lim_{n∈ℕ} μ(En) and ∃n ∈ ℕ such that μ(En) > 0. Since X is σ-finite, then ∃(Xm)_{m=1}^∞ ⊆ B such that X = ⋃_{m=1}^∞ Xm and μ(Xm) < +∞, ∀m ∈ ℕ. Without loss of generality, we may assume that Xm ⊆ Xm+1, ∀m ∈ ℕ. Then, by Proposition 11.7, μ(En) = lim_{m∈ℕ} μ(En ∩ Xm). Then, ∃m ∈ ℕ such that μ(Ē) := μ(En ∩ Xm) ∈ (0, +∞) ⊂ ℝ, where Ē := En ∩ Xm. By Definition 11.79, ∫_Ē fi dμ ≤ n μ(Ē) < ∞, i = 1, 2. Then, by Propositions 11.75 and 11.92,


11 General Measure and Integration



∫_Ē f1 dμ − ∫_Ē f2 dμ − μ(Ē)/n = ∫_Ē (f1 − f2 − 1/n) dμ ≥ 0. This contradicts the assumption. Hence, μ(E) = 0 and f1 ≤ f2 a.e. in X. □
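The mechanism of the argument above (integral domination over every measurable set forces a pointwise a.e. inequality, cited later as Proposition 11.96) can be exercised by brute force on a finite measure space. A minimal numerical sketch; the space, measure, and functions below are illustrative, not from the text:

```python
from itertools import chain, combinations

# Finite measure space X = {0, 1, 2} given by point masses.
points = [0, 1, 2]
mu = {0: 0.5, 1: 0.3, 2: 0.2}

def integral(f, E):
    # Integral of a simple function over a measurable set E.
    return sum(f[x] * mu[x] for x in E)

f1 = {0: 1.0, 1: 2.0, 2: 3.0}
f2 = {0: 1.5, 1: 2.0, 2: 3.1}   # f1 <= f2 pointwise

# All measurable sets of this finite space.
subsets = list(chain.from_iterable(combinations(points, r) for r in range(4)))

# Domination of the integrals on every E holds when f1 <= f2 mu-a.e.
dominated = all(integral(f1, E) <= integral(f2, E) for E in subsets)
assert dominated

# Perturb f1 above f2 on a set of positive measure: some E now violates
# the integral inequality, mirroring the contradiction set E-bar above.
f1_bad = dict(f1); f1_bad[1] = 2.5
violated = [E for E in subsets if integral(f1_bad, E) > integral(f2, E)]
assert violated
```

The set that witnesses the violation plays the role of Ē in the proof: on it the defect f1 − f2 is bounded below by a positive constant, so the integrals cannot be ordered.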

Proposition 11.97 Let X := (X, B, μ) be a σ-finite measure space and fi : X → [0, ∞) ⊂ ℝ be B-measurable, i = 1, 2. Assume that f1 ≤ f2 a.e. in X, μ({x ∈ X | f1(x) < f2(x)}) > 0, and 0 ≤ ∫_X f1 dμ < +∞. Then, ∫_X f1 dμ < ∫_X f2 dμ.

Proof By Proposition 11.83, ∫_X f1 dμ ≤ ∫_X f2 dμ. Suppose ∫_X f2 dμ = ∫_X f1 dμ ∈ ℝ. Then, f1 and f2 are absolutely integrable over X. Let g := f2 − f1. Then, by Propositions 7.23, 11.38, and 11.39, we have g : X → ℝ is B-measurable and g ≥ 0 a.e. in X. By Proposition 11.92, g is absolutely integrable over X and ∫_X g dμ = 0. ∀E ∈ B, by Proposition 11.92, we have ∫_E g dμ + ∫_{X\E} g dμ = ∫_X g dμ = 0, and both of the summands on the left-hand side are nonnegative. Then, ∫_E g dμ = 0. By Proposition 11.96, we have g = 0 a.e. in X. This contradicts the fact that μ({x ∈ X | g(x) > 0}) = μ({x ∈ X | f1(x) < f2(x)}) > 0. Therefore, we must have ∫_X f1 dμ < ∫_X f2 dμ. This completes the proof of the proposition. □

Theorem 11.98 (Jensen's Inequality) Let X := (X, B, μ) be a finite measure space with μ(X) = 1, Y be a real Banach space, W be a separable subspace of Y, Ω ⊆ Y be a nonempty closed convex set, f : X → Ω ∩ W be absolutely integrable over X, and G : Ω → ℝ be a convex functional. Assume that G ∘ f is absolutely integrable over X and the epigraph [G, Ω] is closed. Then, ∫_X f dμ ∈ Ω and G(∫_X f dμ) ≤ ∫_X (G ∘ f) dμ ∈ ℝ.

Proof By Proposition 11.92, y0 := ∫_X f dμ ∈ Y. We will first show that y0 ∈ Ω by distinguishing two exhaustive and mutually exclusive cases: Case 1: ϑY ∈ Ω; Case 2: ϑY ∉ Ω.

Case 1: ϑY ∈ Ω. Then, Ω ∩ W is a conic segment. By Lemma 11.87, there exists a sequence of simple functions (ϕn)_{n=1}^∞, ϕn : X → Ω ∩ W, ∀n ∈ ℕ, such that lim_{n∈ℕ} ϕn = f a.e. in X, P ∘ ϕn(x) ≤ P ∘ f(x), ∀x ∈ X, ∀n ∈ ℕ, and y0 = ∫_X f dμ = lim_{n∈ℕ} ∫_X ϕn dμ ∈ Y. Fix any n ∈ ℕ. Let ϕn admit the canonical representation ϕn = Σ_{i=1}^{n̄} yi χ_{Ai,X}. Let y_{n̄+1} := ϑY ∈ Ω ∩ W and A_{n̄+1} := X \ (⋃_{i=1}^{n̄} Ai) ∈ B. Then, by Proposition 11.75, ∫_X ϕn dμ = Σ_{i=1}^{n̄+1} yi μ(Ai). Note that 1 = μ(X) = Σ_{i=1}^{n̄+1} μ(Ai) and the summands are nonnegative. Since Ω is convex, then ∫_X ϕn dμ ∈ Ω. Since Ω is closed, then, by Proposition 4.13, y0 = lim_{n∈ℕ} ∫_X ϕn dμ ∈ Ω.

Case 2: ϑY ∉ Ω. Note that μ(X) = 1 implies that X ≠ ∅. Then, Ω ∩ W ≠ ∅. Let ȳ ∈ Ω ∩ W and Ω̄ := Ω − ȳ. Then, ϑY ∈ Ω̄ and, by Propositions 7.16 and 6.39, Ω̄ is a closed convex set. Let f̄ := f − ȳ. Then, we have f̄ : X → Ω̄ ∩ W. By Propositions 11.38, 11.39, and 7.23, f̄ is B-measurable. Note that P ∘ f̄(x) = ‖f(x) − ȳ‖ ≤ P ∘ f(x) + ‖ȳ‖, ∀x ∈ X. Then, by Propositions 11.83 and 11.75, 0 ≤ ∫_X (P ∘ f̄) dμ ≤ ∫_X (P ∘ f + ‖ȳ‖) dμ = ∫_X (P ∘ f) dμ + ∫_X ‖ȳ‖ dμ = ∫_X (P ∘ f) dμ + ‖ȳ‖ < +∞. Hence, f̄ is absolutely integrable over X. By Case 1, we have ∫_X f̄ dμ ∈ Ω̄. By Propositions 11.92 and 11.75, we have ∫_X f dμ = ∫_X f̄ dμ + ȳ ∈ Ω.

Hence, in both cases, we have y0 ∈ Ω.

By Proposition 8.32, we have G(y0) = sup_{y∗∈Ω_conj} (⟨⟨y∗, y0⟩⟩ − G_conj(y∗)), where G_conj : Ω_conj → ℝ is the conjugate functional to G. ∀ε ∈ (0, ∞) ⊂ ℝ, ∃y∗ ∈ Ω_conj such that G(y0) − ε ≤ ⟨⟨y∗, y0⟩⟩ − G_conj(y∗) ≤ ⟨⟨y∗, y0⟩⟩ − ⟨⟨y∗, y⟩⟩ + G(y) = G(y) − ⟨⟨y∗, y − y0⟩⟩, ∀y ∈ Ω. Then, ∀x ∈ X, we have G(y0) − ε ≤ G(f(x)) − ⟨⟨y∗, f(x) − y0⟩⟩. By Proposition 11.92, we have

G(y0) − ε = ∫_X (G(y0) − ε) dμ ≤ ∫_X (G ∘ f − ⟨⟨y∗, f − y0⟩⟩) dμ
= ∫_X (G ∘ f) dμ − ∫_X ⟨⟨y∗, f⟩⟩ dμ + ∫_X ⟨⟨y∗, y0⟩⟩ dμ
= ∫_X (G ∘ f) dμ − ⟨⟨y∗, ∫_X f dμ⟩⟩ + ⟨⟨y∗, y0⟩⟩ = ∫_X (G ∘ f) dμ ∈ ℝ

By the arbitrariness of ε, we have G(∫_X f dμ) ≤ ∫_X (G ∘ f) dμ ∈ ℝ. This completes the proof of the theorem. □

11.6 Banach Space Valued Measures

Definition 11.99 Let (X, B) be a measurable space and Y be a normed linear space. A Y-valued pre-measure μ on (X, B) is a function μ : B → Y such that:

(i) μ(∅) = ϑY.
(ii) ∀(Ei)_{i=1}^∞ ⊆ B, which is pairwise disjoint, we have Σ_{i=1}^∞ ‖μ(Ei)‖ < +∞ and μ(⋃_{i=1}^∞ Ei) = Σ_{i=1}^∞ μ(Ei).

Then, the triple (X, B, μ) is said to be a Y-valued pre-measure space. Define P ∘ μ : B → [0, ∞] ⊂ ℝe by, ∀E ∈ B,

P ∘ μ(E) := sup_{n∈ℤ+, (Ei)_{i=1}^n ⊆ B, ⋃_{i=1}^n Ei = E, Ei∩Ej = ∅, ∀1≤i<j≤n} Σ_{i=1}^n ‖μ(Ei)‖

∀i ∈ ℕ with P ∘ μ(Ei) > 0, ∀0 ≤ ti < P ∘ μ(Ei), ∃ni ∈ ℤ+ and ∃ pairwise disjoint (Ei,j)_{j=1}^{ni} ⊆ B with Ei = ⋃_{j=1}^{ni} Ei,j such that ti < Σ_{j=1}^{ni} ‖μ(Ei,j)‖ ≤ P ∘ μ(Ei). ∀i ∈ ℕ with P ∘ μ(Ei) = 0, let ti = 0, ni = 1, Ei,1 = Ei. Then, 0 = ti ≤ ‖μ(Ei,1)‖ ≤ P ∘ μ(Ei) = 0.

Claim 11.100.1 0 ≤ t := Σ_{i=1}^∞ ti ≤ P ∘ μ(E).


Proof of Claim Clearly, t ≥ 0 since ti ≥ 0, ∀i ∈ ℕ. We will distinguish two exhaustive and mutually exclusive cases: Case 1: t < +∞; Case 2: t = +∞.

11.7 Calculation with Measures

Let A0 := {x ∈ X | φ(x) > 0}. Then, A0 ∈ B and P ∘ μ(A0) < ∞. Note that, by Propositions 11.83 and 11.75, 0 ≤ ∫_{X\A0} g dP∘μ = ∫_X g dP∘μ − ∫_{A0} g dP∘μ ≤ ∫_X g dP∘μ − ∫_{A0} φ dP∘μ = ∫_X g dP∘μ − ∫_X φ dP∘μ < ε/4. This implies that 0 ≤ ‖∫_X f dμ − ∫_{A0} f dμ‖ = ‖∫_{X\A0} f dμ‖ ≤ ∫_{X\A0} P∘f dP∘μ ≤ ∫_{X\A0} g dP∘μ < ε/4, where the first equality and the second inequality follow from Proposition 11.132, and the third inequality follows from Proposition 11.83. By Lebesgue Dominated Convergence Theorem 11.150, we have ∫_{X\A0} g dP∘μ = lim_{n∈ℕ} ∫_{X\A0} gn dP∘μn. By Case 1, we have ∫_{A0} f dμ = lim_{n∈ℕ} ∫_{A0} fn dμn ∈ W̄. Then, ∃n0 ∈ ℕ, ∀n ∈ ℕ with n0 ≤ n, we have ‖∫_{A0} f dμ − ∫_{A0} fn dμn‖ < ε/4 and ∫_{X\A0} gn dP∘μn < ε/2. This leads to ‖∫_X f dμ − ∫_X fn dμn‖ ≤ ‖∫_X f dμ − ∫_{A0} f dμ‖ + ‖∫_{A0} f dμ − ∫_{A0} fn dμn‖ + ‖∫_{A0} fn dμn − ∫_X fn dμn‖ < ε/4 + ε/4 + ‖∫_{X\A0} fn dμn‖ ≤ ε/2 + ∫_{X\A0} gn dP∘μn < ε, where the second and third inequalities follow from Proposition 11.132. Hence, ∫_X f dμ = lim_{n∈ℕ} ∫_X fn dμn ∈ W̄. This case is proved. This completes the proof of the theorem. □

Proposition 11.152 Let X := (X, B, μ) be a Y-valued measure space, where Y is a normed linear space over K, Z be a Banach space over K, W be a separable subspace of B(Y, Z), and f : X → W be absolutely integrable over X. Define ν : B → Z by ν(E) = ∫_E f dμ, ∀E ∈ B. Then, X̄ := (X, B, ν) is a finite Z-valued measure space and 0 ≤ P∘ν(E) ≤ ∫_E P∘f dP∘μ ≤ ∫_X P∘f dP∘μ < ∞, ∀E ∈ B.

Proof By Proposition 11.125, ν(∅) = ∫_∅ f dμ = ϑZ. ∀E ∈ B, by Proposition 11.132, ν(E) = ∫_E f dμ ∈ Z and 0 ≤ ‖ν(E)‖ ≤ ∫_E P∘f dP∘μ ≤ ∫_X P∘f dP∘μ < ∞. ∀ pairwise disjoint (En)_{n=1}^∞ ⊆ B, let E := ⋃_{n=1}^∞ En ∈ B. Then, Σ_{n=1}^∞ ‖ν(En)‖ ≤ Σ_{n=1}^∞ ∫_{En} P∘f dP∘μ = ∫_E P∘f dP∘μ < ∞, where the equality follows from Proposition 11.83. By Proposition 11.132, we have ν(E) = ∫_E f dμ = Σ_{n=1}^∞ ∫_{En} f dμ = Σ_{n=1}^∞ ν(En) ∈ Z. Hence, ν is a Z-valued pre-measure on (X, B). ∀E ∈ B,

P∘ν(E) = sup_{n∈ℤ+, (Ei)_{i=1}^n ⊆ B, E = ⋃_{i=1}^n Ei, Ei∩Ej = ∅, ∀1≤i<j≤n} Σ_{i=1}^n ‖ν(Ei)‖

and, for each admissible partition, Σ_{i=1}^n ‖ν(Ei)‖ ≤ Σ_{i=1}^n ∫_{Ei} P∘f dP∘μ = ∫_E P∘f dP∘μ. Hence, 0 ≤ P∘ν(E) ≤ ∫_E P∘f dP∘μ ≤ ∫_X P∘f dP∘μ < ∞.

When ‖y∗‖ > 0, P∘νy∗(E) ≤ (‖y∗‖ P∘ν)(E) = ‖y∗‖ P∘ν(E) = ‖y∗‖ ∫_E f̂ dμ = ∫_E (‖y∗‖ f̂) dμ, where the first equality follows from Proposition 11.136 and the third equality follows from Proposition 11.83. When ‖y∗‖ = 0, P∘νy∗(E) ≤ 0 = ∫_E (‖y∗‖ f̂) dμ, where the inequality follows from Proposition 11.138 and the equality follows from Proposition 11.75. Hence, P∘νy∗(E) = ∫_E (P∘fy∗) dμ ≤ ∫_E (‖y∗‖ f̂) dμ, ∀E ∈ B, ∀y∗ ∈ Y∗. Then, by Proposition 11.96, P∘fy∗ ≤ ‖y∗‖ f̂ a.e. in X, ∀y∗ ∈ Y∗.
Let K_Q := ℚ if K = ℝ and K_Q := {α + iβ ∈ ℂ | α, β ∈ ℚ} if K = ℂ. Clearly, K_Q is a countable dense set in K. Let D ⊆ Y∗ be a countable dense set and D̂ := {Σ_{i=1}^n αi y∗i ∈ Y∗ | n ∈ ℤ+, α1, …, αn ∈ K_Q, y∗1, …, y∗n ∈ D}. Then, D̂ ⊆ Y∗ is also a countable dense set. Note that, ∀y∗1, y∗2 ∈ D̂, ∀α, β ∈ K_Q, we have αy∗1 + βy∗2 ∈ D̂. Define

E0 := {x ∈ X | ∃y∗1, y∗2 ∈ D̂, ∃α, β ∈ K_Q · f_{αy∗1+βy∗2}(x) ≠ αf_{y∗1}(x) + βf_{y∗2}(x) or |f_{y∗1}(x)| > ‖y∗1‖ f̂(x)}

Then, E0 ∈ B and μ(E0) = 0. We have the following results:

f_{αy∗1+βy∗2}(x) = αf_{y∗1}(x) + βf_{y∗2}(x), ∀x ∈ X \ E0, ∀y∗1, y∗2 ∈ D̂, ∀α, β ∈ K_Q    (11.4)

and

|f_{y∗}(x)| ≤ ‖y∗‖ f̂(x), ∀x ∈ X \ E0, ∀y∗ ∈ D̂    (11.5)

∀x ∈ X \ E0, ∀y∗ ∈ Y∗, by Proposition 4.13, ∃(y∗i)_{i=1}^∞ ⊆ D̂ that converges to y∗. Then, this sequence is a Cauchy sequence. By (11.4) and (11.5), (f_{y∗i}(x))_{i=1}^∞ ⊆ K is a Cauchy sequence and admits limit F(x, y∗) ∈ K. It should be clear that F(x, y∗) is independent of the choice of the sequence (y∗i)_{i=1}^∞, since, letting (ȳ∗i)_{i=1}^∞ ⊆ D̂ be any sequence converging to y∗, the combined sequence (y∗1, ȳ∗1, y∗2, ȳ∗2, …) ⊆ D̂ converges to y∗. Therefore, the sequence (f_{y∗1}(x), f_{ȳ∗1}(x), f_{y∗2}(x), f_{ȳ∗2}(x), …) ⊆ K is a Cauchy sequence and admits a limit, which has to be equal to F(x, y∗). Then, ∀x ∈ X \ E0, ∀y∗ ∈ Y∗, F(x, y∗) ∈ K. ∀x ∈ X \ E0, ∀ȳ∗ ∈ D̂, F(x, ȳ∗) = f_{ȳ∗}(x).

∀x ∈ X \ E0, ∀y∗1, y∗2 ∈ Y∗, ∀α, β ∈ K, ∃(ȳ∗1,i)_{i=1}^∞ ⊆ D̂ that converges to y∗1, ∃(ȳ∗2,i)_{i=1}^∞ ⊆ D̂ that converges to y∗2, ∃(αi)_{i=1}^∞ ⊆ K_Q that converges to α, and ∃(βi)_{i=1}^∞ ⊆ K_Q that converges to β. By Propositions 7.23, 3.66, and 3.67, we have lim_{i∈ℕ}(αi ȳ∗1,i + βi ȳ∗2,i) = αy∗1 + βy∗2, where the sequence on the left-hand side is contained in D̂. This implies that F(x, αy∗1 + βy∗2) = lim_{i∈ℕ} f_{αi ȳ∗1,i + βi ȳ∗2,i}(x) = lim_{i∈ℕ}(αi f_{ȳ∗1,i}(x) + βi f_{ȳ∗2,i}(x)) = αF(x, y∗1) + βF(x, y∗2), where the second equality follows from (11.4) and the last equality follows from Propositions 7.23, 3.66, and 3.67. Furthermore, |F(x, y∗1)| = lim_{i∈ℕ} |f_{ȳ∗1,i}(x)| ≤ lim_{i∈ℕ} ‖ȳ∗1,i‖ f̂(x) = ‖y∗1‖ f̂(x), where the inequality follows from (11.5) and the last equality follows from Propositions 3.66 and 7.21. Hence, ∀x ∈ X \ E0, F(x, ·) : Y∗ → K is a bounded linear functional on Y∗. Since Y is reflexive, then ∃f(x) ∈ Y such that F(x, y∗) = ⟨⟨y∗, f(x)⟩⟩, ∀y∗ ∈ Y∗, and ‖f(x)‖ ≤ f̂(x). Therefore, we may define f : X → Y by assigning f(x) = ϑY, ∀x ∈ E0. ∀y∗ ∈ Y∗, define F_{y∗} : X → K by F_{y∗}(x) = ⟨⟨y∗, f(x)⟩⟩, ∀x ∈ X. Fix any ȳ∗ ∈ D̂. Then, ∀x ∈ X \ E0, F_{ȳ∗}(x) = ⟨⟨ȳ∗, f(x)⟩⟩ = F(x, ȳ∗) = f_{ȳ∗}(x) and, ∀x ∈ E0, F_{ȳ∗}(x) = ⟨⟨ȳ∗, f(x)⟩⟩ = 0.
By Proposition 11.41, F_{ȳ∗} is B-measurable. By the arbitrariness of ȳ∗ and Proposition 11.170, f is B-measurable.

Claim 11.171.1 ∀y∗0 ∈ Y∗, f_{y∗0}(x) = ⟨⟨y∗0, f(x)⟩⟩ a.e. x ∈ X.

Proof of Claim Let D̄ := D ∪ {y∗0} and D̄̂ := {Σ_{i=1}^n αi y∗i ∈ Y∗ | n ∈ ℤ+, α1, …, αn ∈ K_Q, y∗1, …, y∗n ∈ D̄}. Then, D̄̂ ⊆ Y∗ is also a countable dense set and D̂ ⊆ D̄̂. Define

E1 := {x ∈ X | ∃y∗1, y∗2 ∈ D̄̂, ∃α, β ∈ K_Q · f_{αy∗1+βy∗2}(x) ≠ αf_{y∗1}(x) + βf_{y∗2}(x) or |f_{y∗1}(x)| > ‖y∗1‖ f̂(x)}

Then, E1 ∈ B, μ(E1) = 0, and E0 ⊆ E1.

11.8 The Radon–Nikodym Theorem


∀x ∈ X \ E1, ∀y∗1, y∗2 ∈ D̄̂, ∀α, β ∈ K_Q, we have f_{αy∗1+βy∗2}(x) = αf_{y∗1}(x) + βf_{y∗2}(x) and |f_{y∗1}(x)| ≤ ‖y∗1‖ f̂(x). By Proposition 4.13, ∃(y∗i)_{i=1}^∞ ⊆ D̄̂ that converges to y∗0. Then, this sequence is a Cauchy sequence. By the above, (f_{y∗i}(x))_{i=1}^∞ ⊆ K is a Cauchy sequence and admits limit F̄(x, y∗0) ∈ K. It should be clear that F̄(x, y∗0) is independent of the choice of the sequence (y∗i)_{i=1}^∞. Two particular sequences (ȳ∗i)_{i=1}^∞ ⊆ D̂ converging to y∗0 and (y∗0, y∗0, …) ⊆ D̄̂ lead to f_{y∗0}(x) = F̄(x, y∗0) = F(x, y∗0). Therefore, f_{y∗0}(x) = F(x, y∗0) = ⟨⟨y∗0, f(x)⟩⟩ = F_{y∗0}(x), ∀x ∈ X \ E1. Note that f_{y∗0} is B-measurable and, by Propositions 7.72 and 11.38, F_{y∗0} is B-measurable. Then, E1 ⊇ {x ∈ X | f_{y∗0}(x) ≠ F_{y∗0}(x)} = {x ∈ X | |f_{y∗0}(x) − F_{y∗0}(x)| > 0} ∈ B. Hence, f_{y∗0} = F_{y∗0} a.e. in X. Therefore, the claim holds. □

Note that P∘f(x) ≤ f̂(x), ∀x ∈ X, and P∘f is B-measurable, by Propositions 7.21 and 11.38. ∀E ∈ dom(ν), we have P∘ν(E) = ∫_E f̂ dμ < ∞. This implies that ∫_E P∘f dμ < ∞. By Proposition 11.92, we have ∫_E f dμ ∈ Y. ∀y∗ ∈ Y∗, P∘ν_{y∗}(E) ≤ (‖y∗‖ P∘ν)(E) < ∞, E ∈ dom(ν_{y∗}), and ⟨⟨y∗, ν(E)⟩⟩ = ν_{y∗}(E) = ∫_E f_{y∗} dμ = ∫_E F_{y∗} dμ = ∫_E ⟨⟨y∗, f(x)⟩⟩ dμ(x) = ⟨⟨y∗, ∫_E f dμ⟩⟩, where the last three equalities follow from Proposition 11.92. Hence, by Proposition 7.85, ν(E) = ∫_E f dμ ∈ Y, ∀E ∈ dom(ν). Hence, by Propositions 11.116 and 11.137, f is a Radon–Nikodym derivative of ν with respect to μ. By Proposition 11.167, f is unique as desired and dν/dμ = f a.e. in X.

Finally, we consider the general case. Let X̄ := (X, B, P∘μ), which is a σ-finite measure space. By the second special case, ∃! f1 : X → Y such that dν/d(P∘μ) = f1 a.e. in X̄. Again by the second special case, ∃! f2 : X → K such that dμ/d(P∘μ) = f2 a.e. in X̄. By Definition 11.166, d(P∘μ)/d(P∘μ) = P∘f2 a.e. in X̄. Clearly, the constant function 1 is the Radon–Nikodym derivative of P∘μ with respect to P∘μ. Then, P∘f2 = 1 a.e. in X̄ and ∃! f3 : X → K such that f2 f3 = 1 a.e. in X̄. By Proposition 11.168, d(P∘μ)/dμ = f3 a.e. in X and dν/dμ = (dν/d(P∘μ)) (d(P∘μ)/dμ) = f1 f3 =: f : X → Y a.e. in X. This completes the proof of the theorem. □
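On a purely atomic finite measure space the Radon–Nikodym derivative reduces to a ratio of atom masses, which makes the theorem easy to sanity-check. A minimal sketch under that simplifying assumption; the measures and the candidate density below are illustrative:

```python
# mu: a finite measure with three atoms; nu defined by nu(E) = ∫_E f dμ.
mu_atoms = {0: 0.5, 1: 0.25, 2: 0.25}
f_true = {0: 2.0, 1: 0.0, 2: 4.0}               # a candidate density
nu_atoms = {x: f_true[x] * mu_atoms[x] for x in mu_atoms}

# Recover f = dnu/dmu on the atoms of positive mu-measure: the ratio of
# atom masses, unique up to mu-null sets.
f_recovered = {x: nu_atoms[x] / mu_atoms[x]
               for x in mu_atoms if mu_atoms[x] > 0}
assert f_recovered == f_true

# Absolute continuity is essential: if nu charges a mu-null atom, no
# density can reproduce that mass, so nu << mu fails.
mu_bad = dict(mu_atoms); mu_bad[3] = 0.0
nu_bad = dict(nu_atoms); nu_bad[3] = 1.0
assert any(mu_bad[x] == 0.0 and nu_bad[x] != 0.0 for x in mu_bad)
```

The failure case is exactly what the Lebesgue decomposition (next theorem) isolates into the singular part of ν.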

Theorem 11.172 (Lebesgue Decomposition) Let X := (X, B, μ) be a σ-finite measure space, Y be a Banach space, and ν be a σ-finite Y-valued measure on (X, B). Then, there exists a unique pair of σ-finite Y-valued measures ν0 and ν1 on (X, B) such that P∘ν0 ⊥ μ, P∘ν1 ≪ μ, ν = ν0 + ν1, and P∘ν = P∘ν0 + P∘ν1. Furthermore, the following statements hold:

(i) If ν is a finite Y-valued measure on (X, B), then ν0 and ν1 are finite Y-valued measures on (X, B).
(ii) If ν is a σ-finite measure on (X, B), then ν0 and ν1 are σ-finite measures on (X, B).
(iii) If ν is a finite measure on (X, B), then ν0 and ν1 are finite measures on (X, B).


Proof Let λ := μ + P∘ν. By Proposition 11.136, λ is a σ-finite measure on (X, B). Let X̄ := (X, B, λ). Clearly, we have μ ≪ λ and P∘ν ≪ λ. By Radon–Nikodym Theorem 11.169, ∃f : X → [0, ∞) ⊂ ℝ such that f is the Radon–Nikodym derivative of μ with respect to λ. Then, μ(E) = ∫_E f dλ, ∀E ∈ B. Let A := {x ∈ X | f(x) > 0} ∈ B and B := {x ∈ X | f(x) = 0} ∈ B. Then, A ∩ B = ∅ and A ∪ B = X. By Proposition 11.75, μ(B) = ∫_B f dλ = 0. Define ν0 to be a function from B to Y by: ν0(E) is undefined, ∀E ∈ B with E ∩ B ∈ B \ dom(ν); and ν0(E) = ν(E ∩ B) ∈ Y, ∀E ∈ B with E ∩ B ∈ dom(ν). Define ν1 to be a function from B to Y by: ν1(E) is undefined, ∀E ∈ B with E ∩ A ∈ B \ dom(ν); and ν1(E) = ν(E ∩ A) ∈ Y, ∀E ∈ B with E ∩ A ∈ dom(ν). By Proposition 11.157, ν0 and ν1 are σ-finite Y-valued measures on (X, B) such that ν = ν0 + ν1, P∘ν = P∘ν0 + P∘ν1, P∘ν0(E) = P∘ν(E ∩ B), and P∘ν1(E) = P∘ν(E ∩ A), ∀E ∈ B. This implies that P∘ν0(A) = P∘ν(A ∩ B) = 0. Then, P∘ν0 ⊥ μ. ∀E ∈ B with μ(E) = 0, we have ∫_E f dλ = 0. Let Ē := (E, B_E, λ_E) be the σ-finite measure subspace of X̄ as defined in Proposition 11.13. Then, by Proposition 11.96, we have f = 0 a.e. in Ē. Then, λ(E ∩ A) = 0 and P∘ν1(E) = P∘ν(E ∩ A) = 0, since P∘ν ≪ λ. Hence, P∘ν1 ≪ μ. This shows that ν0 and ν1 is the pair of σ-finite Y-valued measures on (X, B) that we seek.

Next, we show the uniqueness of the pair ν0 and ν1. Let ν̂0 and ν̂1 be another pair of σ-finite Y-valued measures on (X, B) such that P∘ν̂0 ⊥ μ, P∘ν̂1 ≪ μ, ν = ν̂0 + ν̂1, and P∘ν = P∘ν̂0 + P∘ν̂1. Then, ∃Â, B̂ ∈ B with Â = X \ B̂ such that P∘ν̂0(Â) = μ(B̂) = 0. ∀E ∈ B with λ(E) < ∞, we have μ(E) < ∞, P∘ν(E) = P∘ν0(E) + P∘ν1(E) = P∘ν̂0(E) + P∘ν̂1(E) < ∞. Then, ν̂0(E) = ν̂0(E ∩ Â) + ν̂0(E ∩ B̂) = ν̂0(E ∩ B̂) = ν̂0(E ∩ B̂) + ν̂1(E ∩ B̂) = ν(E ∩ B̂) = ν0(E ∩ B̂) + ν1(E ∩ B̂) + ν̂0(E ∩ Â ∩ B) + ν̂1(E ∩ Â ∩ B) = ν0(E ∩ B̂) + ν(E ∩ Â ∩ B) = ν0(E ∩ B̂) + ν0(E ∩ Â) = ν0(E) ∈ Y, where the second equality follows from the fact that P∘ν̂0(Â) = 0, the third equality follows from μ(B̂) = 0 and P∘ν̂1 ≪ μ, the fifth equality follows from the fact that P∘ν̂0(Â) = 0, μ(B) = 0, and P∘ν̂1 ≪ μ, and the sixth equality follows from the fact that μ(B̂) = 0 and P∘ν1 ≪ μ. We also have ν̂1(E) = ν̂1(E ∩ Â) + ν̂1(E ∩ B̂) = ν̂1(E ∩ Â) = ν̂1(E ∩ Â) + ν̂0(E ∩ Â) = ν(E ∩ Â) = ν0(E ∩ Â) + ν1(E ∩ Â) = ν(E ∩ Â ∩ B) + ν1(E ∩ Â) = ν̂0(E ∩ Â ∩ B) + ν̂1(E ∩ Â ∩ B) + ν1(E ∩ Â) = ν1(E ∩ Â) = ν1(E ∩ Â) + ν1(E ∩ B̂) = ν1(E) ∈ Y, where the second equality follows from the fact that μ(B̂) = 0 and P∘ν̂1 ≪ μ; the third equality follows from P∘ν̂0(Â) = 0; the eighth equality follows from the fact that P∘ν̂0(Â) = 0, μ(B) = 0, and P∘ν̂1 ≪ μ; and the ninth equality follows from the fact that μ(B̂) = 0 and P∘ν1 ≪ μ. By Proposition 11.137, we have ν̂0 = ν0 and ν̂1 = ν1. Hence, the pair ν0 and ν1 is unique.

(i) If ν is finite, then P∘ν(X) = P∘ν0(X) + P∘ν1(X) < ∞. Then, ν0 and ν1 are finite.

(ii) If ν is a σ-finite measure on (X, B), by Proposition 11.135, it is identified with a σ-finite ℝ-valued measure ν̄ on (X, B) such that ν̄(E) ∈ [0, ∞) ⊂ ℝ, ∀E ∈ dom(ν̄). By the general case, there exists a pair of σ-finite ℝ-valued measures


ν̄0 and ν̄1 on (X, B) that satisfies the desired properties. By the proof of the general case, ν̄0(E) ∈ [0, ∞) ⊂ ℝ, ∀E ∈ dom(ν̄0), and ν̄1(E) ∈ [0, ∞) ⊂ ℝ, ∀E ∈ dom(ν̄1). By Proposition 11.135, ν̄0 and ν̄1 are identified with σ-finite measures ν0 and ν1, respectively. Then, ν0 = P∘ν̄0 ⊥ μ, ν1 = P∘ν̄1 ≪ μ, and ν = P∘ν̄ = P∘ν̄0 + P∘ν̄1 = ν0 + ν1. Then, ν0 and ν1 are σ-finite measures on (X, B).

(iii) If ν is a finite measure on (X, B), then, by (i) and (ii), ν0 and ν1 are finite measures on (X, B).

This completes the proof of the theorem. □
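For purely atomic measures on a finite set, the decomposition in the proof is transparent: A collects the atoms that μ charges, B the atoms it does not, and ν splits accordingly. A minimal sketch with illustrative data:

```python
# Discrete Lebesgue decomposition sketch: split nu into nu1 << mu
# (carried by A, where mu has mass) and nu0 ⟂ mu (carried by B).
mu = {0: 1.0, 1: 2.0, 2: 0.0, 3: 0.0}
nu = {0: 0.5, 1: 0.0, 2: 3.0, 3: 1.5}

A = {x for x in mu if mu[x] > 0}    # analogue of A = {f > 0} in the proof
B = set(mu) - A                     # mu-null part, analogue of B = {f = 0}

nu1 = {x: (nu[x] if x in A else 0.0) for x in nu}  # absolutely continuous part
nu0 = {x: (nu[x] if x in B else 0.0) for x in nu}  # singular part

assert all(nu0[x] + nu1[x] == nu[x] for x in nu)       # nu = nu0 + nu1
assert all(nu1[x] == 0.0 for x in nu if mu[x] == 0.0)  # nu1 << mu on atoms
assert sum(mu[x] for x in B) == 0.0                    # mu(B) = 0
assert all(nu0[x] == 0.0 for x in A)                   # nu0 lives on B only
```

Uniqueness is also visible here: any other split respecting ν1 ≪ μ and ν0 ⊥ μ must agree atom by atom, which is the discrete shadow of the equality chains in the uniqueness argument.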

11.9 Lp Spaces

Example 11.173 Let p ∈ [1, ∞) ⊂ ℝ, X := (X, B, μ) be a σ-finite measure space, Y be a separable normed linear space over K, and (M(X, Y), K) be the vector space of functions of X to Y as defined in Example 6.20 with the usual vector addition, scalar multiplication, and the null vector ϑ. Define l : [0, ∞) ⊂ ℝ → [0, ∞) ⊂ ℝ by l(t) = t^p, ∀t ∈ [0, ∞) ⊂ ℝ. We will introduce the notation Pp ∘ f to denote l ∘ P ∘ f. Let Z_p := {f ∈ M(X, Y) | f is B-measurable and Pp ∘ f is integrable over X}. Define ‖·‖_p : Z_p → [0, ∞) ⊂ ℝ by ‖f‖_p = (∫_X (Pp ∘ f) dμ)^{1/p}, ∀f ∈ Z_p. We will next show that Z_p is a subspace of (M(X, Y), K) and ‖·‖_p defines a pseudo-norm on Z_p. Clearly, ϑ ∈ Z_p ≠ ∅. ∀f, g ∈ Z_p, ∀α ∈ K, by Propositions 7.23, 11.38, and 11.39, f + g and αf are B-measurable. By Minkowski's Inequality 11.174, f + g ∈ Z_p and ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p. ‖αf‖_p = (∫_X (Pp ∘ (αf)) dμ)^{1/p} = (|α|^p ∫_X (Pp ∘ f) dμ)^{1/p} = |α| (∫_X (Pp ∘ f) dμ)^{1/p} = |α| ‖f‖_p ∈ ℝ, where the second equality follows from Proposition 11.92. Hence, αf ∈ Z_p. The above shows that Z_p is a subspace of (M(X, Y), K). Hence, (Z_p, K) is a vector space. Clearly, ‖ϑ‖_p = 0. Then, combined with the above, we have that ‖·‖_p defines a pseudo-norm on (Z_p, K). By Proposition 7.47, the quotient space of (Z_p, K) modulo ‖·‖_p is a normed linear space, to be denoted L_p(X, Y). We will denote the vector space (Z_p, K) with the pseudo-norm ‖·‖_p by L̄_p(X, Y). ∀f ∈ Z_p with ‖f‖_p = 0, we have ∫_X Pp ∘ f dμ = 0. By Proposition 11.96, we have Pp ∘ f = 0 a.e. in X and f = ϑY a.e. in X. On the other hand, ∀f ∈ M(X, Y) with f = ϑY a.e. in X and f being B-measurable, then, by Proposition 11.83, ‖f‖_p = 0 and f ∈ Z_p. Hence, ∀f ∈ M(X, Y), f ∈ Z_p and ‖f‖_p = 0 if, and only if, f = ϑY a.e. in X and f is B-measurable. We will denote the norm in L_p(X, Y) by ‖·‖_p and elements in L_p(X, Y) by [f], where f ∈ L̄_p(X, Y). %
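The pseudo-norm of Example 11.173 is easy to compute on a finite measure space, and the reason L_p must be a quotient space is equally easy to exhibit: a function supported on a μ-null set has pseudo-norm zero without being the null vector. A minimal sketch with illustrative data and Y = ℝ:

```python
# Finite measure space and the L_p pseudo-norm ||f||_p = (∫ |f|^p dμ)^{1/p}.
mu = {0: 0.5, 1: 0.25, 2: 0.25}
f = {0: 2.0, 1: -4.0, 2: 0.0}    # Y = R, so ||f(x)|| = |f(x)|

def lp_norm(f, p):
    return sum(abs(f[x]) ** p * mu[x] for x in mu) ** (1.0 / p)

# p = 2: (4*0.5 + 16*0.25 + 0)^{1/2} = 6^{1/2}.
assert abs(lp_norm(f, 2) - 6 ** 0.5) < 1e-12

# A nonzero function living on a mu-null atom has pseudo-norm 0: the
# pseudo-norm separates points only modulo a.e. equality.
mu_null = dict(mu); mu_null[2] = 0.0
h = {0: 0.0, 1: 0.0, 2: 7.0}
lp_null = sum(abs(h[x]) ** 2 * mu_null[x] for x in mu_null) ** 0.5
assert lp_null == 0.0 and any(v != 0.0 for v in h.values())
```

This is exactly the equivalence [f] = [g] ⟺ f = g a.e. in X used to pass from L̄_p(X, Y) to L_p(X, Y).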
Theorem 11.174 (Minkowski's Inequality) Let p ∈ [1, ∞) ⊂ ℝ, X := (X, B, μ) be a σ-finite measure space, Y be a separable normed linear space over K, and L̄_p(X, Y) be the vector space over K with the pseudo-norm ‖·‖_p as defined in Example 11.173. ∀f, g ∈ L̄_p(X, Y), then, f + g ∈ L̄_p(X, Y) and

‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p

When 1 < p < ∞, equality holds if, and only if, ∃α, β ∈ [0, ∞) ⊂ ℝ, which are not both zeros, such that α P∘f = β P∘g a.e. in X and P∘(f + g) = P∘f + P∘g a.e. in X.

Proof Clearly, f and g are B-measurable. By Propositions 7.23, 11.38, and 11.39, f + g is B-measurable. We will distinguish two exhaustive and mutually exclusive cases: Case 1: ‖f‖_p ‖g‖_p = 0; Case 2: ‖f‖_p ‖g‖_p > 0.

Case 1: ‖f‖_p ‖g‖_p = 0. Without loss of generality, assume that ‖g‖_p = 0. Then, ∫_X (Pp ∘ g) dμ = 0. By Proposition 11.96, we have Pp ∘ g = 0 a.e. in X. Thus, g = ϑY a.e. in X and f = f + g a.e. in X. By Proposition 11.83, the integral ∫_X (Pp ∘ (f + g)) dμ = ∫_X (Pp ∘ f) dμ ∈ ℝ. Hence, ‖f + g‖_p = ‖f‖_p = ‖f‖_p + ‖g‖_p ∈ ℝ and f + g ∈ L̄_p(X, Y). Clearly, equality holds. Let α = 0, β = 1; we have α P∘f = β P∘g a.e. in X and P∘(f + g) = P∘f + P∘g a.e. in X.

Case 2: ‖f‖_p ‖g‖_p > 0. Let λ := ‖f‖_p/(‖f‖_p + ‖g‖_p) ∈ (0, 1) ⊂ ℝ. ∀x ∈ X, we have

‖f(x) + g(x)‖^p/(‖f‖_p + ‖g‖_p)^p ≤ ((‖f(x)‖ + ‖g(x)‖)/(‖f‖_p + ‖g‖_p))^p
= (λ ‖f(x)‖/‖f‖_p + (1 − λ) ‖g(x)‖/‖g‖_p)^p ≤ λ ‖f(x)‖^p/‖f‖_p^p + (1 − λ) ‖g(x)‖^p/‖g‖_p^p

where the second inequality follows from the convexity of the function t^p on [0, ∞) ⊂ ℝ. In the above, when p > 1, equality holds if, and only if, ‖f(x) + g(x)‖ = ‖f(x)‖ + ‖g(x)‖ and ‖f(x)‖/‖f‖_p = ‖g(x)‖/‖g‖_p. By Proposition 11.83, we have ∫_X (Pp ∘ (f + g))/(‖f‖_p + ‖g‖_p)^p dμ ≤ (λ/‖f‖_p^p) ∫_X (Pp ∘ f) dμ + ((1 − λ)/‖g‖_p^p) ∫_X (Pp ∘ g) dμ = 1. This implies that ∫_X (Pp ∘ (f + g)) dμ ≤ (‖f‖_p + ‖g‖_p)^p < +∞ and ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p. Therefore, f + g ∈ L̄_p(X, Y).

Fix any p ∈ (1, +∞) ⊂ ℝ. By Proposition 11.97, equality holds if, and only if, P∘(f + g) = P∘f + P∘g a.e. in X and (P∘f)/‖f‖_p = (P∘g)/‖g‖_p a.e. in X. Equality thus implies that α := 1/‖f‖_p and β := 1/‖g‖_p are such that α P∘f = β P∘g a.e. in X and P∘(f + g) = P∘f + P∘g a.e. in X. On the other hand, if ∃α, β ∈ [0, ∞) ⊂ ℝ, which are not both zeros, such that α P∘f = β P∘g a.e. in X and P∘(f + g) = P∘f + P∘g a.e. in X, then, without loss of generality, assume that β ≠ 0. Then, P∘g = α1 P∘f a.e. in X with α1 := α/β. It is easy to show that ‖g‖_p = α1 ‖f‖_p. This implies that α1 = ‖g‖_p/‖f‖_p. Then, equality holds. Hence, equality holds if, and only if, ∃α, β ∈ [0, ∞) ⊂ ℝ, which are not both zeros, such that α P∘f = β P∘g a.e. in X and P∘(f + g) = P∘f + P∘g a.e. in X.

Hence, the result holds in both cases. This completes the proof of the theorem. □
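Both the inequality and its equality case can be checked numerically on a finite measure space. A minimal sketch with Y = ℝ and illustrative data; the equality case takes g proportional to f, which makes both a.e. conditions of the theorem hold everywhere:

```python
# Numerical check of Minkowski's inequality on a two-atom measure space.
mu = {0: 0.4, 1: 0.6}
p = 3.0

def norm_p(f):
    return sum(abs(f[x]) ** p * mu[x] for x in mu) ** (1.0 / p)

f = {0: 1.0, 1: -2.0}
g = {0: 3.0, 1: 0.5}
# Triangle inequality for the pseudo-norm.
assert norm_p({x: f[x] + g[x] for x in mu}) <= norm_p(f) + norm_p(g) + 1e-12

# Equality case: g = 2 f gives P∘(f+g) = P∘f + P∘g and 2 P∘f = P∘g
# pointwise, hence a.e.
g2 = {x: 2.0 * f[x] for x in mu}
lhs = norm_p({x: f[x] + g2[x] for x in mu})
assert abs(lhs - (norm_p(f) + norm_p(g2))) < 1e-9
```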


Definition 11.175 Let X := (X, B, μ) be a measure space and f : X → ℝ be B-measurable. The essential supremum of f is

ess sup_{x∈X} f(x) := inf{M ∈ ℝ | μ({x ∈ X | f(x) > M}) = 0} ∈ ℝe

%
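On a finite measure space the infimum in Definition 11.175 is attained, and the definition's point — that values on μ-null sets are invisible — can be shown directly. A minimal sketch with illustrative data:

```python
# Essential supremum on a finite measure space: a spike carried by a
# mu-null atom does not register.
mu = {0: 1.0, 1: 0.5, 2: 0.0}
f = {0: 1.0, 1: 3.0, 2: 100.0}   # the value at x = 2 sits on a null atom

def ess_sup(f):
    # inf over M with mu({f > M}) = 0 reduces, on a finite atomic space,
    # to the max of f over the atoms of positive measure.
    support = [x for x in mu if mu[x] > 0]
    return max(f[x] for x in support)

assert ess_sup(f) == 3.0          # 100.0 is essentially invisible
assert max(f.values()) == 100.0   # the ordinary supremum differs
```

This gap between sup and ess sup is what makes ‖·‖_∞ in Example 11.177 below a pseudo-norm on functions rather than a norm.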

Proposition 11.176 Let X := (X, B, μ) be a measure space and f : X → ℝ and g : X → ℝ be B-measurable. Then:

(i) ess sup_{x∈X}(f(x) + g(x)) ≤ ess sup_{x∈X} f(x) + ess sup_{x∈X} g(x).
(ii) If f ≤ g a.e. in X, then ess sup_{x∈X} f(x) ≤ ess sup_{x∈X} g(x).
(iii) ∀α ∈ (0, ∞) ⊂ ℝ, ess sup_{x∈X}(αf(x)) = α ess sup_{x∈X} f(x); and ∀α ∈ [0, ∞) ⊂ ℝ, ess sup_{x∈X}(αf(x)) = α ess sup_{x∈X} f(x) if ess sup_{x∈X} f(x) ∈ ℝ.
(iv) Let λ := ess sup_{x∈X} f(x) ∈ ℝe. Then, f(x) ≤ λ a.e. x ∈ X.

Proof (i) We will distinguish two exhaustive and mutually exclusive cases: Case 1: μ(X) = 0; Case 2: μ(X) > 0. Case 1: μ(X) = 0. Then, ∀M ∈ ℝ, 0 ≤ μ({x ∈ X | f(x) > M}) ≤ μ(X) = 0. Then, ess sup_{x∈X} f(x) = −∞. Similarly, ess sup_{x∈X} g(x) = ess sup_{x∈X}(f(x) + g(x)) = −∞. Then, (i) holds.

Case 2: μ(X) > 0. Note that ⋃_{n=1}^∞ An := ⋃_{n=1}^∞ {x ∈ X | f(x) > −n} = X. By Proposition 11.7, ∃n ∈ ℕ such that μ(An) > 0. Then, ess sup_{x∈X} f(x) ≥ −n > −∞. Similarly, ess sup_{x∈X} g(x) > −∞ and ess sup_{x∈X}(f(x) + g(x)) > −∞. Then, λ := ess sup_{x∈X} f(x) + ess sup_{x∈X} g(x) ∈ (−∞, +∞] ⊂ ℝe. If λ = +∞, then (i) holds. On the other hand, if λ ∈ ℝ, ∀M ∈ ℝ with λ < M, ∃M1, M2 ∈ ℝ with ess sup_{x∈X} f(x) < M1 and ess sup_{x∈X} g(x) < M2 and M1 + M2 = M. Then, μ({x ∈ X | f(x) > M1}) = 0 = μ({x ∈ X | g(x) > M2}). By Propositions 7.23, 11.38, and 11.39, f + g is B-measurable. Hence, μ({x ∈ X | f(x) + g(x) > M}) = 0 and ess sup_{x∈X}(f(x) + g(x)) ≤ M. By the arbitrariness of M, (i) holds.

(ii) Let λ := ess sup_{x∈X} g(x) ∈ ℝe. We will distinguish two exhaustive and mutually exclusive cases: Case 1: λ = +∞; Case 2: λ < +∞. Case 1: λ = +∞. The result holds. Case 2: λ < +∞. ∀M ∈ ℝ with λ < M, μ({x ∈ X | g(x) > M}) = 0. Then, {x ∈ X | f(x) > M} ⊆ {x ∈ X | g(x) > M} ∪ {x ∈ X | f(x) − g(x) > 0}. By Propositions 7.23, 11.38, and 11.39, f − g is B-measurable. Then, 0 ≤ μ({x ∈ X | f(x) > M}) ≤ μ({x ∈ X | g(x) > M}) + μ({x ∈ X | f(x) − g(x) > 0}) = 0. Hence, ess sup_{x∈X} f(x) ≤ M. By the arbitrariness of M, the result holds.

(iii) Let α ∈ (0, +∞) ⊂ ℝ. Then, ess sup_{x∈X}(αf(x)) = inf{M ∈ ℝ | μ({x ∈ X | αf(x) > M}) = 0} = inf{αM ∈ ℝ | μ({x ∈ X | αf(x) > αM}) = 0} = α inf{M ∈ ℝ | μ({x ∈ X | f(x) > M}) = 0} = α ess sup_{x∈X} f(x), where the third equality follows from Proposition 3.81.


Let α = 0 and ess sup_{x∈X} f(x) ∈ ℝ. Then, μ(X) > 0. Thus,

ess sup_{x∈X}(αf(x)) = ess sup_{x∈X} 0 = 0 = α ess sup_{x∈X} f(x)

(iv) We will distinguish three exhaustive and mutually exclusive cases: Case 1: μ(X) = 0; Case 2: μ(X) > 0 and λ < +∞; Case 3: μ(X) > 0 and λ = +∞. Case 1: μ(X) = 0. Then, λ = −∞. Clearly, the result holds. Case 2: μ(X) > 0 and λ < +∞. Then, λ ∈ ℝ. ∀n ∈ ℕ, μ(En) := μ({x ∈ X | f(x) > λ + 1/n}) = 0. By Proposition 11.7, μ(E) := μ({x ∈ X | f(x) > λ}) = μ(⋃_{n=1}^∞ En) = lim_{n∈ℕ} μ(En) = 0. Hence, f(x) ≤ λ a.e. x ∈ X. Case 3: μ(X) > 0 and λ = +∞. Clearly, the result holds. Hence, (iv) holds in all three cases. This completes the proof of the proposition. □

Example 11.177 Let X := (X, B, μ) be a measure space, Y be a separable normed linear space over K, and (M(X, Y), K) be the vector space of functions of X to Y as defined in Example 6.20 with the usual vector addition, scalar multiplication, and the null vector ϑ. Let Z_∞ := {f ∈ M(X, Y) | f is B-measurable and ess sup_{x∈X} ‖f(x)‖ < +∞}. Define ‖·‖_∞ : Z_∞ → [0, ∞) ⊂ ℝ by ‖f‖_∞ = max{ess sup_{x∈X} ‖f(x)‖, 0}, ∀f ∈ Z_∞. We will next show that Z_∞ is a subspace of (M(X, Y), K) and ‖·‖_∞ defines a pseudo-norm on Z_∞. Clearly, ϑ ∈ Z_∞ ≠ ∅. ∀f, g ∈ Z_∞, ∀α ∈ K, by Propositions 7.23, 11.38, and 11.39, f + g and αf are B-measurable. By Proposition 11.176,

‖f + g‖_∞ = max{ess sup_{x∈X} ‖f(x) + g(x)‖, 0}
≤ max{ess sup_{x∈X}(‖f(x)‖ + ‖g(x)‖), 0}
≤ max{ess sup_{x∈X} ‖f(x)‖ + ess sup_{x∈X} ‖g(x)‖, 0}
= { ess sup_{x∈X} ‖f(x)‖ + ess sup_{x∈X} ‖g(x)‖ if μ(X) > 0; 0 if μ(X) = 0 }
≤ ‖f‖_∞ + ‖g‖_∞ < +∞

and

‖αf‖_∞ = max{ess sup_{x∈X} ‖αf(x)‖, 0} = max{ess sup_{x∈X} |α| ‖f(x)‖, 0}
= { |α| ess sup_{x∈X} ‖f(x)‖ if μ(X) > 0; 0 if μ(X) = 0 } = |α| ‖f‖_∞ < +∞

Then, f + g, αf ∈ Z_∞. Hence, Z_∞ is a subspace of (M(X, Y), K). This implies that (Z_∞, K) is a vector space. Clearly, ‖ϑ‖_∞ = 0. Therefore, ‖·‖_∞ defines a pseudo-norm on (Z_∞, K). By Proposition 7.47, the quotient space of (Z_∞, K)


modulo ‖·‖_∞ is a normed linear space, to be denoted L_∞(X, Y). We will denote the vector space (Z_∞, K) with the pseudo-norm ‖·‖_∞ by L̄_∞(X, Y). ∀f ∈ Z_∞ with ‖f‖_∞ = 0, we have ess sup_{x∈X} ‖f(x)‖ ≤ 0. Then, by Proposition 11.176, P ∘ f = 0 a.e. in X. Hence, f = ϑY a.e. in X. On the other hand, ∀f ∈ M(X, Y) with f = ϑY a.e. in X and f being B-measurable, we have ‖f‖_∞ = 0 and f ∈ Z_∞. Hence, ∀f ∈ M(X, Y), f ∈ Z_∞ and ‖f‖_∞ = 0 if, and only if, f = ϑY a.e. in X and f is B-measurable. We will denote the norm in L_∞(X, Y) by ‖·‖_∞ and elements in L_∞(X, Y) by [f], where f ∈ L̄_∞(X, Y). %

In the following, we will write lim_{n∈ℕ} zn = z in L̄_p(X, Y) when the sequence (zn)_{n=1}^∞ ⊆ L̄_p(X, Y) converges to z ∈ L̄_p(X, Y) in the L̄_p(X, Y) pseudo-norm. We will simply write lim_{n∈ℕ} zn = z if there is no confusion in which pseudo-norm convergence occurs. For z ∈ L̄_p(X, Y), we will denote the corresponding equivalence class in L_p(X, Y) by [z]. Then, for (zn)_{n=1}^∞ ⊆ L̄_p(X, Y), lim_{n∈ℕ} zn = z in L̄_p(X, Y) if, and only if, lim_{n∈ℕ} [zn] = [z] in L_p(X, Y) (or simply lim_{n∈ℕ} [zn] = [z] when there is no confusion in which norm convergence occurs).

Theorem 11.178 (Hölder's Inequality) Let p ∈ [1, +∞) ⊂ ℝ and q ∈ (1, +∞] ⊂ ℝe with 1/p + 1/q = 1, X := (X, B, μ) be a σ-finite measure space, and Y be a separable normed linear space over K with Y∗ being separable. Then, ∀f ∈ L̄_p(X, Y), ∀g ∈ L̄_q(X, Y∗), the function r : X → K, defined by r(x) = ⟨⟨g(x), f(x)⟩⟩, ∀x ∈ X, is absolutely integrable over X and

∫_X |⟨⟨g(x), f(x)⟩⟩| dμ(x) ≤ ‖f‖_p ‖g‖_q

When q < ∞, equality holds if, and only if, |⟨⟨g(x), f(x)⟩⟩| = ‖f(x)‖ · ‖g(x)‖ a.e. x ∈ X and ∃α, β ∈ ℝ, which are not both zeros, such that α Pp∘f = β Pq∘g a.e. in X.

Proof By Propositions 7.65, 11.38, 11.39, and 7.21, r and P ∘ r are B-measurable. We will distinguish two exhaustive and mutually exclusive cases: Case 1: q = ∞; Case 2: 1 < q < +∞.

Case 1: q = ∞. Then, p = 1. By Proposition 11.176, ‖g(x)‖ ≤ ‖g‖_∞ a.e. x ∈ X. Note that |⟨⟨g(x), f(x)⟩⟩| ≤ ‖g(x)‖ ‖f(x)‖ ≤ ‖g‖_∞ ‖f(x)‖ a.e. x ∈ X. Then, by Propositions 11.83 and 11.92,

∫_X |⟨⟨g(x), f(x)⟩⟩| dμ(x) ≤ ‖g‖_∞ ∫_X P ∘ f dμ = ‖f‖_1 ‖g‖_∞

Case 2: 1 < q < +∞. Then, 1 < p < +∞. We will further distinguish two exhaustive and mutually exclusive cases: Case 2a: ‖f‖_p ‖g‖_q = 0; Case 2b: ‖f‖_p ‖g‖_q > 0.

Case 2a: ‖f‖_p ‖g‖_q = 0. Without loss of generality, assume ‖g‖_q = 0. Then, g = ϑY∗ a.e. in X. This implies that |⟨⟨g(x), f(x)⟩⟩| = 0 a.e. x ∈ X and, by Propositions 11.83 and 11.75,

∫_X |⟨⟨g(x), f(x)⟩⟩| dμ(x) = 0 = ‖f‖_p ‖g‖_q

Equality holds ⇒ α = 0, β = 1, α Pp∘f = 0 = β Pq∘g a.e. in X, and |⟨⟨g(x), f(x)⟩⟩| = 0 = ‖f(x)‖ ‖g(x)‖ a.e. x ∈ X. This subcase is proved.

Case 2b: ‖f‖_p ‖g‖_q > 0. Then, ‖f‖_p > 0 and ‖g‖_q > 0. ∀x ∈ X, by Lemma 7.7 with a = (‖f(x)‖/‖f‖_p)^p, b = (‖g(x)‖/‖g‖_q)^q, and λ = 1/p, we have

|⟨⟨g(x), f(x)⟩⟩|/(‖f‖_p ‖g‖_q) ≤ (‖g(x)‖ ‖f(x)‖)/(‖g‖_q ‖f‖_p) ≤ (1/p) ‖f(x)‖^p/‖f‖_p^p + (1/q) ‖g(x)‖^q/‖g‖_q^q

with equality holding if, and only if, |⟨⟨g(x), f(x)⟩⟩| = ‖g(x)‖ ‖f(x)‖ and ‖f(x)‖^p/‖f‖_p^p = ‖g(x)‖^q/‖g‖_q^q. Integrating the above inequality over X, we have, by Propositions 11.83 and 11.97,

(∫_X |⟨⟨g(x), f(x)⟩⟩| dμ(x))/(‖f‖_p ‖g‖_q) ≤ 1/p + 1/q = 1

with equality holding if, and only if, |⟨⟨g(x), f(x)⟩⟩| = ‖g(x)‖ ‖f(x)‖ a.e. x ∈ X and (Pp∘f)/‖f‖_p^p = (Pq∘g)/‖g‖_q^q a.e. in X. Equality ⇒ α = 1/‖f‖_p^p, β = 1/‖g‖_q^q, α Pp∘f = β Pq∘g a.e. in X and |⟨⟨g(x), f(x)⟩⟩| = ‖g(x)‖ ‖f(x)‖ a.e. x ∈ X. On the other hand, if |⟨⟨g(x), f(x)⟩⟩| = ‖g(x)‖ ‖f(x)‖ a.e. x ∈ X and ∃α, β ∈ ℝ, which are not both zeros, such that α Pp∘f = β Pq∘g a.e. in X, then, without loss of generality, assume β ≠ 0. Let α1 = α/β. Then, α1 Pp∘f = Pq∘g a.e. in X. Hence, α1 ‖f‖_p^p = ‖g‖_q^q, which further implies that α1 = ‖g‖_q^q/‖f‖_p^p. Hence, (Pp∘f)/‖f‖_p^p = (Pq∘g)/‖g‖_q^q a.e. in X. This implies equality. Therefore, equality holds if, and only if, |⟨⟨g(x), f(x)⟩⟩| = ‖g(x)‖ ‖f(x)‖ a.e. x ∈ X and ∃α, β ∈ ℝ, which are not both zeros, such that α Pp∘f = β Pq∘g a.e. in X. This subcase is proved.

This completes the proof of the theorem. □

When p = 2 = q, Hölder's inequality becomes the well-known Cauchy–Schwarz inequality:

∫_X |⟨⟨g(x), f(x)⟩⟩| dμ(x) ≤ (∫_X P2 ∘ f dμ)^{1/2} (∫_X P2 ∘ g dμ)^{1/2}
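Both Hölder's inequality and its Cauchy–Schwarz special case can be checked numerically on a finite measure space. A minimal sketch with Y = ℝ, so ⟨⟨g(x), f(x)⟩⟩ = g(x) f(x); all data below is illustrative:

```python
# Numerical check of Hölder's inequality on a three-atom measure space.
mu = {0: 0.25, 1: 0.25, 2: 0.5}
f = {0: 1.0, 1: -2.0, 2: 3.0}
g = {0: 0.5, 1: 4.0, 2: -1.0}
p, q = 3.0, 1.5                      # conjugate exponents: 1/3 + 1/1.5 = 1

def norm(h, r):
    return sum(abs(h[x]) ** r * mu[x] for x in mu) ** (1.0 / r)

lhs = sum(abs(g[x] * f[x]) * mu[x] for x in mu)   # ∫ |<g, f>| dμ
assert lhs <= norm(f, p) * norm(g, q) + 1e-12

# p = q = 2 recovers the Cauchy–Schwarz inequality.
assert lhs <= norm(f, 2) * norm(g, 2) + 1e-12
```

Taking g proportional to a power of |f| (so that α Pp∘f = β Pq∘g) would produce the equality case characterized in the theorem.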

Example 11.179 Let p ∈ [1, ∞) ⊂ R, X := (X, B, μ) be a σ -finite measure space, Y be a separable Banach space over K, and Lp (X , Y) be the normed linear space over K as defined in Example 11.173. We will show that Lp (X , Y) is a Banach


space by Proposition 7.27. Define l : [0, ∞) ⊂ R → [0, ∞) ⊂ R by l(t) = t p , ¯ ∀t ∈-[0, ∞) ⊂ R. Fix any ([fn ])∞ n=1 ⊆ Lp (X , Y) with fn ∈ Lp (X , Y), ∀n ∈ N, ∞ and n=1 [f -n ]p =: M ∈ [0, ∞) ⊂ R. ∀n ∈ N, define gn : X → [0, ∞) ⊂ R by gn (x) = ni=1 fi (x), ∀x ∈ X. By Propositions 7.23, 7.21, 11.38, and 11.39, gn is B-measurable. Note that P ◦ fn ∈ L¯ p (X, R). Then, gn ∈ L¯ p (X , R) and -n n n gn  p ≤ i=1 P ◦ fi p = i=1 fi p = i=1 [fi ]p ≤ M. This implies p p that X (l ◦ gn ) dμ ≤ M . Clearly, (gn (x)) ≤ (gn+1 (x))p , ∀x ∈ X, ∀n ∈ N. By Proposition 11.82 and Monotone Convergence Theorem 11.81, ∃g : X → [0, ∞) ⊂ R, which is B-measurable, such that limn∈N l◦gn = l◦g a.e. in X and X (l◦g) dμ = limn∈N X (l ◦ gn ) dμ ≤ M p . Let E := {x ∈ X | (gn (x))∞ n=1 does not converge to g(x)}. Then, E ∈ B and μ(E) = 0. ∀x ∈ X \ E, limn∈N gn (x) = g(x) ∈ R and limn∈N ni=1 fi (x) =-g(x) ∈ R. By Proposition 7.27 and the completeness of Y, we have limn∈N ni=1 fi (x) =: limn∈N sn (x) =: s(x) ∈ Y. Define s(x) = sn (x) = ϑY , ∀x ∈ E and ∀n ∈ N. Then, by Proposition 11.41, sn is Bmeasurable, ∀n ∈ N. Clearly, limn∈N sn (x) = s(x), -n ∀x ∈ X. By Proposition 11.48, s is B-measurable. By Lemma 11.43, s = n i=1 fi a.e. in X . This yields that > =[sn ] = - ni=1 fi = ni=1 [fi ] and sn ∈ L¯ p (X , Y). Then, by Proposition 11.50, limn∈N ni=1 fi = s a.e. in X . Note that limn∈N Pp ◦ (sn (x) − s(x)) = 0, ∀x ∈ X. Note also sn (x) − s(x)p ≤ (sn (x)+s(x))p ≤ (gn (x)+g(x))p ≤ 2p (g(x))p , ∀x ∈ X \ E, ∀n ∈ N, and sn (x) − s(x)p = 0 ≤ 2p (g(x))p , ∀x ∈ E, ∀n ∈ N. Hence, sn (x) − s(x)p ≤ 2p (g(x))p , ∀x ∈ X, ∀n ∈ N. By Lebesgue Dominated Convergence Theorem 11.91, limn∈N X (Pp ◦ (sn − s)) dμ = 0. This . implies that limn∈N sn − sp = 0. Hence, s ∈ L¯ p (X , Y) and limn∈N ni=1 fi = . ∞ limn∈N sn = s. Hence, ([fn ])n=1 is summable in Lp (X , Y). By Proposition 7.27 and the arbitrariness of ([fn ])∞ n=1 , we have Lp (X , Y) is complete. 
This shows that Lp(X, Y) is a Banach space when X is a σ-finite measure space and Y is a separable Banach space. %

Example 11.180 Let X := (X, B, μ) be a measure space, Y be a separable Banach space over K, and L∞(X, Y) be the normed linear space over K as defined in Example 11.177. We will show that L∞(X, Y) is a Banach space. Fix any Cauchy sequence ([fn])_{n=1}^∞ ⊆ L∞(X, Y) with fn ∈ L̄∞(X, Y), ∀n ∈ N. ∀k ∈ N, ∃Nk ∈ N, ∀n, m ∈ N with n ≥ Nk and m ≥ Nk, we have ‖fn − fm‖∞ < 1/k. Then, by Proposition 11.176, A_{n,m,k} := {x ∈ X | ‖fn(x) − fm(x)‖ ≥ 1/k} ∈ B and μ(A_{n,m,k}) = 0. Let A := ∪_{k=1}^∞ ∪_{n=Nk}^∞ ∪_{m=Nk}^∞ A_{n,m,k} ∈ B. Clearly, μ(A) = 0. ∀x ∈ X \ A, (fn(x))_{n=1}^∞ ⊆ Y is a Cauchy sequence, which converges to f(x) ∈ Y by the completeness of Y. Define f(x) = ϑ_Y, ∀x ∈ A. Then, f : X → Y is well-defined, and lim_{n∈N} fn(x) = f(x), ∀x ∈ X \ A. By Propositions 11.48 and 11.41, f is B-measurable. ∀k ∈ N, ∀n ∈ N with n ≥ Nk, ∀x ∈ X \ A, by Propositions 3.66, 3.67, 7.21, and 7.23, we have ‖fn(x) − f(x)‖ = lim_{m∈N} ‖fn(x) − fm(x)‖ ≤ 1/k. Then, 0 ≤ μ({x ∈ X | ‖f(x) − fn(x)‖ > 1/k}) ≤ μ(A) = 0. This shows that ‖fn − f‖∞ ≤ 1/k. Then, lim_{n∈N} ‖fn − f‖∞ = 0, lim_{n∈N} fn = f, and f ∈ L̄∞(X, Y). Hence, lim_{n∈N} [fn] = [f] in L∞(X, Y). Therefore, L∞(X, Y) is a Banach space when X is a measure space and Y is a separable Banach space. %

Proposition 11.181 Let p ∈ [1, ∞) ⊂ R, X := (X, B, μ) be a σ-finite measure space, Y be a separable normed linear space over K, Lp(X, Y) be the normed linear space over K as defined in Example 11.173, and f ∈ L̄p(X, Y). Then, there exists a sequence of simple functions (φi)_{i=1}^∞, φi : X → Y, ∀i ∈ N, such that lim_{i∈N} φi = f a.e. in X, ‖φi(x)‖ ≤ ‖f(x)‖, ∀x ∈ X, ∀i ∈ N, lim_{i∈N} ∫_X Pp ∘ (φi − f) dμ = 0, and lim_{i∈N} φi = f in L̄p(X, Y).

Proof Since f ∈ L̄p(X, Y), then f is B-measurable and Pp ∘ f is integrable over X. By Proposition 11.66, there exists a sequence of simple functions (φi)_{i=1}^∞, φi : X → Y, ∀i ∈ N, such that ‖φi(x)‖ ≤ ‖f(x)‖, ∀x ∈ X, ∀i ∈ N, and lim_{i∈N} φi = f a.e. in X. By Propositions 7.23, 7.21, 11.38, and 11.39, Pp ∘ (φi − f) is B-measurable, ∀i ∈ N. Note that, by Propositions 7.23, 7.21, 11.52, and 11.53, lim_{i∈N} Pp ∘ (φi − f) = 0 a.e. in X and Pp ∘ (φi − f)(x) ≤ 2^p Pp ∘ f(x), ∀x ∈ X, ∀i ∈ N. By Lebesgue Dominated Convergence Theorem 11.91, we have lim_{i∈N} ∫_X Pp ∘ (φi − f) dμ = 0. Hence, lim_{i∈N} φi = f in L̄p(X, Y). This completes the proof of the proposition. □

Proposition 11.182 Let p ∈ [1, ∞) ⊂ R, X := (X, B, μ) be a σ-finite normal topological measure space, Y be a separable normed linear space over K, and f ∈ L̄p(X, Y). Then, ∀ε ∈ (0, ∞) ⊂ R, ∃ a continuous function g : X → Y such that g ∈ L̄p(X, Y) and ‖g − f‖_p < ε.

Proof Fix ε ∈ (0, ∞) ⊂ R. By Proposition 11.181, there exists a simple function φ : X → Y such that ‖φ(x)‖ ≤ ‖f(x)‖, ∀x ∈ X, and ‖φ − f‖_p < ε/2. Let φ admit the canonical representation φ = Σ_{i=1}^n yi χ_{Ai,X}, where n ∈ Z+, y1, . . . , yn ∈ Y are distinct and none equals ϑ_Y, and A1, . . . , An ∈ B are pairwise disjoint, nonempty, and of finite measure. ∀i ∈ {1, . . . , n}, by X being a topological measure space and Proposition 11.27, ∃Ui, X \ Fi ∈ O_X such that Fi ⊆ Ai ⊆ Ui and μ(Ui \ Fi) = μ(Ui \ Ai) + μ(Ai \ Fi) < ε^p/(2^p (n+1)^p ‖yi‖^p). By X being a normal topological space and Urysohn's Lemma 3.55, there exists a continuous function gi : X → [0, 1] ⊂ R such that gi(x) = 1, ∀x ∈ Fi, and gi(x) = 0, ∀x ∈ X \ Ui. By Proposition 11.37, gi is B-measurable. By Proposition 11.83, ‖gi − χ_{Ai,X}‖_p ≤ ε/(2(n+1)‖yi‖). Define g : X → Y by g(x) = Σ_{i=1}^n yi gi(x), ∀x ∈ X. By Propositions 7.23, 3.12, and 3.32, g is continuous. By Proposition 11.37, g is B-measurable. Then, ‖g − f‖_p ≤ ‖g − φ‖_p + ‖φ − f‖_p ≤ Σ_{i=1}^n ‖yi (gi − χ_{Ai,X})‖_p + ε/2 = Σ_{i=1}^n ‖yi‖ ‖gi − χ_{Ai,X}‖_p + ε/2 ≤ Σ_{i=1}^n ε ‖yi‖/(2(n+1)‖yi‖) + ε/2 < ε. Hence, g ∈ L̄p(X, Y). This completes the proof of the proposition. □
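The mechanism of this proof, sandwiching each Ai between a closed Fi and an open Ui and replacing χ_{Ai,X} by a Urysohn function, can be illustrated on the line. The following is a minimal numerical sketch (an assumption-laden illustration, not the book's construction): on X = [0, 1] with Lebesgue measure, the indicator of A = [0.3, 0.6] is approximated in L_p by a continuous piecewise-linear ramp that is 1 on the shrunken closed set and 0 outside a slightly enlarged open set.

```python
# Illustrative sketch of Proposition 11.182 on X = [0, 1] (grid approximation
# of Lebesgue measure; the ramp plays the role of the Urysohn function g_i).

def lp_dist(u, v, p, dx):
    return sum(abs(a - b) ** p * dx for a, b in zip(u, v)) ** (1.0 / p)

def ramp(x, a, b, delta):
    """Continuous: 1 on [a+delta, b-delta], 0 outside (a-delta, b+delta)."""
    if x <= a - delta or x >= b + delta:
        return 0.0
    if a + delta <= x <= b - delta:
        return 1.0
    if x < a + delta:
        return (x - (a - delta)) / (2 * delta)
    return ((b + delta) - x) / (2 * delta)

N, p = 4000, 2.0
dx = 1.0 / N
grid = [i * dx for i in range(N)]
f = [1.0 if 0.3 <= x <= 0.6 else 0.0 for x in grid]

# The mismatch lives on a set of measure <= 4*delta, so the L_p distance is
# at most (4*delta)^(1/p); shrinking delta makes the approximation arbitrarily good.
for delta in (0.1, 0.01, 0.001):
    g = [ramp(x, 0.3, 0.6, delta) for x in grid]
    assert lp_dist(g, f, p, dx) <= (4 * delta) ** (1.0 / p) + 1e-6
```

Shrinking the open set Ui toward Ai corresponds here to letting delta → 0.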

Proposition 11.183 Let X := (X, B, μ) be a σ-finite measure space, Y be a separable Banach space, f : X → Y be B-measurable such that, ∀E ∈ B with μ(E) < ∞, f|_E is absolutely integrable over E := (E, B_E, μ_E), which is the finite measure subspace of X, and let M ∈ [0, ∞) ⊂ R. Assume that, ∀E ∈ B with 0 < μ(E) < ∞, we have ‖(1/μ(E)) ∫_E f dμ‖ ≤ M. Then, P ∘ f ≤ M a.e. in X.

Proof Consider the open set O := {y ∈ Y | ‖y‖ > M} ⊆ Y. Since Y is separable, by Propositions 4.38 and 4.4, O is second countable and separable. Let D ⊆ O be a countable dense set in O (the relative closure of D with respect to O equals O). It is easy to show that M := {B_Y(y, r) ⊆ O | y ∈ D, r ∈ Q, r > 0} is a countable basis for O. Let E := f_inv(O) = f_inv(∪_{B_Y(y,r)∈M} B_Y(y, r)) = ∪_{B_Y(y,r)∈M} f_inv(B_Y(y, r)). We will show that μ(f_inv(B_Y(y, r))) = 0, ∀B_Y(y, r) ∈ M, by an argument of contradiction. Suppose that ∃y ∈ D, ∃r ∈ Q with r > 0 such that B_Y(y, r) ⊆ O and μ(Ē) := μ(f_inv(B_Y(y, r))) > 0. Since X is σ-finite, then ∃Ê ∈ B with Ê ⊆ Ē such that 0 < μ(Ê) < +∞. ∀x ∈ Ê, we have x ∈ Ē and ‖f(x) − y‖ < r. Then, ‖(1/μ(Ê)) ∫_Ê f dμ − y‖ = ‖(1/μ(Ê)) ∫_Ê (f − y) dμ‖ ≤ (1/μ(Ê)) ∫_Ê ‖f(x) − y‖ dμ(x) < r. Hence, (1/μ(Ê)) ∫_Ê f dμ ∈ B_Y(y, r) ⊆ O and therefore ‖(1/μ(Ê)) ∫_Ê f dμ‖ > M. This contradicts the assumption. Hence, μ(f_inv(B_Y(y, r))) = 0, ∀B_Y(y, r) ∈ M. Then, 0 ≤ μ(E) ≤ Σ_{B_Y(y,r)∈M} μ(f_inv(B_Y(y, r))) = 0. Hence, P ∘ f ≤ M a.e. in X. □

Lemma 11.184 Let p, q ∈ (1, ∞) ⊂ R with 1/p + 1/q = 1, X := (X, B, μ) be a σ-finite measure space, Y be a separable normed linear space over K with Y∗ being separable, Z := Lp(X, Y) be the normed linear space over K as defined in Example 11.173, and g : X → Y∗ be B-measurable. Assume that:

(i) ∀E ∈ B with μ(E) < +∞, g is absolutely integrable over E := (E, B_E, μ_E), which is the finite measure subspace of X.
(ii) ∃M ∈ [0, ∞) ⊂ R such that, for every simple function φ : X → Y (φ ∈ L̄p(X, Y)), the function ⟨⟨g(·), φ(·)⟩⟩ : X → K is absolutely integrable over X and |∫_X ⟨⟨g(x), φ(x)⟩⟩ dμ(x)| ≤ M ‖φ‖_p.

Then, g ∈ L̄q(X, Y∗) and ‖g‖_q ≤ M.

Proof By Proposition 11.116, define a σ-finite Y∗-valued measure ν on (X, B) by: ν(E) is undefined, ∀E ∈ B with ∫_E P ∘ g dμ = ∞; ν(E) = ∫_E g dμ ∈ Y∗, ∀E ∈ B with ∫_E P ∘ g dμ < ∞. Then, P ∘ ν(E) = ∫_E P ∘ g dμ, ∀E ∈ B. By (i), P ∘ ν(E) < ∞, ∀E ∈ B with μ(E) < ∞. By Proposition 11.92, ν(E) = ϑ_{Y∗}, ∀E ∈ B with μ(E) = 0.
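The algebra that drives the estimates below is the conjugate-exponent relation 1/p + 1/q = 1, which yields p + q = pq, (q − 1)p = q, and (1 − q)p = −q; these identities are why ‖φ‖_p collapses to q-th power sums in the next step. A quick numerical check (illustrative only, not part of the book's text):

```python
# Verify the conjugate-exponent identities used in the proof of Lemma 11.184.

def conjugate(p):
    """Return q with 1/p + 1/q = 1, for p in (1, infinity)."""
    return p / (p - 1.0)

for p in (1.5, 2.0, 3.0, 7.0):
    q = conjugate(p)
    assert abs(1.0 / p + 1.0 / q - 1.0) < 1e-12   # conjugacy
    assert abs(p + q - p * q) < 1e-12             # p + q = pq
    assert abs((q - 1.0) * p - q) < 1e-12         # (q-1)p = q
    assert abs((1.0 - q) * p + q) < 1e-12         # (1-q)p = -q
```

In particular p = 2 is self-conjugate, the Hilbert-space case.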
Fix any E ∈ B with μ(E) < ∞. Define Pq ∘ ν(E) ∈ [0, ∞] ⊂ R_e by

  Pq ∘ ν(E) := sup { Σ_{i=1}^n (‖ν(Ei)‖^q/(μ(Ei))^q) μ(Ei) | n ∈ Z+, (Ei)_{i=1}^n ⊆ B pairwise disjoint, ∪_{i=1}^n Ei = E, μ(Ei) > 0, ∀i ∈ {1, . . . , n} }

∀n ∈ Z+, ∀ pairwise disjoint (Ei)_{i=1}^n ⊆ B with E = ∪_{i=1}^n Ei, ∀ε ∈ (0, 1) ⊂ R, ∀i ∈ {1, . . . , n} with μ(Ei) > 0, by Lemma 7.75, ∃yi ∈ Y such that ‖yi‖ = ‖ν(Ei)‖^{q−1}/(μ(Ei))^{q−1} and ⟨⟨ν(Ei), yi⟩⟩ ≥ (1 − ε) ‖ν(Ei)‖ ‖yi‖ = (1 − ε) ‖ν(Ei)‖^q/(μ(Ei))^{q−1}. Define a simple function φ = Σ_{i=1, μ(Ei)>0}^n yi χ_{Ei,X}. Then,

  ‖φ‖_p = ( Σ_{i=1, μ(Ei)>0}^n ‖ν(Ei)‖^{(q−1)p} (μ(Ei))^{(1−q)p} μ(Ei) )^{1/p} = ( Σ_{i=1, μ(Ei)>0}^n ‖ν(Ei)‖^q (μ(Ei))^{−q} μ(Ei) )^{1/p}

By (ii), we have

  (1 − ε) Σ_{i=1, μ(Ei)>0}^n ‖ν(Ei)‖^q/(μ(Ei))^{q−1} ≤ Σ_{i=1, μ(Ei)>0}^n ⟨⟨ν(Ei), yi⟩⟩ = Σ_{i=1, μ(Ei)>0}^n ⟨⟨∫_{Ei} g dμ, yi⟩⟩ = Σ_{i=1, μ(Ei)>0}^n ∫_{Ei} ⟨⟨g(x), yi⟩⟩ dμ(x) = ∫_X ⟨⟨g(x), φ(x)⟩⟩ dμ(x) ≤ M ( Σ_{i=1, μ(Ei)>0}^n ‖ν(Ei)‖^q (μ(Ei))^{1−q} )^{1/p}

where the second and third equalities follow from Proposition 11.92. Hence, we have Σ_{i=1, μ(Ei)>0}^n ‖ν(Ei)‖^q (μ(Ei))^{−q} μ(Ei) ≤ M^q/(1 − ε)^q. By the arbitrariness of ε, we have Σ_{i=1, μ(Ei)>0}^n ‖ν(Ei)‖^q (μ(Ei))^{−q} μ(Ei) ≤ M^q. Then, Pq ∘ ν(E) ≤ M^q.

Since X is σ-finite, then ∃(Xn)_{n=1}^∞ ⊆ B such that X = ∪_{n=1}^∞ Xn and μ(Xn) < ∞, ∀n ∈ N. Without loss of generality, we may assume that Xn ⊆ X_{n+1}, ∀n ∈ N. By Proposition 11.66, there exists a sequence of simple functions (ψi)_{i=1}^∞, ψi : X → Y∗, ∀i ∈ N, such that ‖ψi(x)‖ ≤ ‖g(x)‖, ∀x ∈ X, ∀i ∈ N, and lim_{i∈N} ψi = g a.e. in X. Fix any n ∈ N; let En := {x ∈ Xn | ‖g(x)‖ ≤ n} ∈ B and E_n := (En, B_{En}, μ_{En}) be the finite measure subspace of X. Then, (P ∘ g)|_{En} and (Pq ∘ g)|_{En} are integrable over E_n. By Propositions 7.21, 7.23, 11.52, and 11.53, lim_{i∈N} (P ∘ (ψi − g))|_{En} = 0 a.e. in E_n. Note that (P ∘ (ψi − g))|_{En}(x) ≤ 2 (P ∘ g)|_{En}(x), ∀x ∈ En, ∀i ∈ N. By Lebesgue Dominated Convergence Theorem 11.91, we have lim_{i∈N} ∫_{En} P ∘ (ψi − g) dμ = 0. Note also that (Pq ∘ ψi)|_{En}(x) ≤ (Pq ∘ g)|_{En}(x), ∀x ∈ En, ∀i ∈ N. By Propositions 7.21 and 11.52, lim_{i∈N} (Pq ∘ ψi)|_{En} = (Pq ∘ g)|_{En} a.e. in E_n. Again, by Lebesgue Dominated Convergence Theorem 11.91, we have lim_{i∈N} ∫_{En} Pq ∘ ψi dμ = ∫_{En} Pq ∘ g dμ. ∀ε ∈ (0, ∞) ⊂ R, ∃i0 ∈ N such that 0 ≤ ∫_{En} P ∘ (ψ_{i0} − g) dμ < ε/(2qn^{q−1}) and |∫_{En} Pq ∘ ψ_{i0} dμ − ∫_{En} Pq ∘ g dμ| < ε/2. Let ψ_{i0} admit the canonical representation ψ_{i0} = Σ_{j=1}^{n̄} y∗j χ_{Aj,X}, and let Āj := Aj ∩ En, j = 1, . . . , n̄. Then,

  ∫_{En} Pq ∘ g dμ < ∫_{En} Pq ∘ ψ_{i0} dμ + ε/2 = Σ_{j=1, μ(Āj)>0}^{n̄} ‖y∗j‖^q μ(Āj) + ε/2
   = Σ_{j=1, μ(Āj)>0}^{n̄} ‖ν(Āj)‖^q (μ(Āj))^{1−q} + Σ_{j=1, μ(Āj)>0}^{n̄} (μ(Āj))^{1−q} ( ‖∫_{Āj} ψ_{i0} dμ‖^q − ‖∫_{Āj} g dμ‖^q ) + ε/2
   ≤ Pq ∘ ν(En) + Σ_{j=1, μ(Āj)>0}^{n̄} (μ(Āj))^{1−q} ( ‖∫_{Āj} ψ_{i0} dμ‖^q − ‖∫_{Āj} g dμ‖^q ) + ε/2
   ≤ M^q + Σ_{j=1, μ(Āj)>0}^{n̄} (μ(Āj))^{1−q} q ( tj ‖∫_{Āj} ψ_{i0} dμ‖ + (1 − tj) ‖∫_{Āj} g dμ‖ )^{q−1} | ‖∫_{Āj} ψ_{i0} dμ‖ − ‖∫_{Āj} g dμ‖ | + ε/2
   ≤ M^q + Σ_{j=1, μ(Āj)>0}^{n̄} (μ(Āj))^{1−q} q (nμ(Āj))^{q−1} ‖∫_{Āj} (ψ_{i0} − g) dμ‖ + ε/2
   ≤ M^q + Σ_{j=1, μ(Āj)>0}^{n̄} qn^{q−1} ∫_{Āj} P ∘ (ψ_{i0} − g) dμ + ε/2
   ≤ M^q + qn^{q−1} ∫_{En} P ∘ (ψ_{i0} − g) dμ + ε/2 < M^q + ε

where tj ∈ (0, 1) ⊂ R, j = 1, . . . , n̄, arise from the Mean Value Theorem 9.20 applied to t → t^q; the bound tj ‖∫_{Āj} ψ_{i0} dμ‖ + (1 − tj) ‖∫_{Āj} g dμ‖ ≤ nμ(Āj) follows from Proposition 11.92 and the fact that ‖ψ_{i0}(x)‖ ≤ ‖g(x)‖ ≤ n, ∀x ∈ En; the remaining equalities and inequalities follow from Propositions 11.92 and 11.83. By the arbitrariness of ε, we have ∫_{En} Pq ∘ g dμ ≤ M^q. Clearly, we have En ⊆ E_{n+1}, ∀n ∈ N, and ∪_{n=1}^∞ En = X. Then, by Monotone Convergence Theorem 11.81, we have ∫_X Pq ∘ g dμ = lim_{n∈N} ∫_X (Pq ∘ g) χ_{En,X} dμ = lim_{n∈N} ∫_{En} Pq ∘ g dμ ≤ M^q. Hence, ‖g‖_q ≤ M and g ∈ L̄q(X, Y∗). This completes the proof of the lemma. □

Lemma 11.185 Let p ∈ [1, ∞) ⊂ R, X := (X, B, μ) be a σ-finite measure space, Y be a separable reflexive Banach space over K with Y∗ being separable, and Z := Lp(X, Y) be the Banach space over K as defined in Example 11.179. Then, ∀f ∈ Z∗, ∃g : X → Y∗, which is B-measurable, such that:

(i) ∀E ∈ B with μ(E) < +∞, g is absolutely integrable over E := (E, B_E, μ_E), which is the finite measure subspace of X.
(ii) ∀ simple function φ : X → Y (φ ∈ Z̄ := L̄p(X, Y)), the function ⟨⟨g(·), φ(·)⟩⟩ : X → K is absolutely integrable over X, f([φ]) = ∫_X ⟨⟨g(x), φ(x)⟩⟩ dμ(x), and |∫_X ⟨⟨g(x), φ(x)⟩⟩ dμ(x)| ≤ ‖f‖ ‖φ‖_p.

Furthermore, g is unique in the sense that g̃ : X → Y∗ is another function with the above properties if, and only if, g̃ is B-measurable and g = g̃ a.e. in X.

Proof Fix any f ∈ Z∗. ∀E ∈ B with μ(E) < ∞, ∀y ∈ Y, let z_{E,y} := yχ_{E,X} ∈ Z̄. Define fE : Y → K by fE(y) = f([z_{E,y}]), ∀y ∈ Y. Since f ∈ Z∗, then fE is

linear and continuous, and fE =: y∗E ∈ Y∗. Note that z_{∅,y} = ϑ_Z, ∀y ∈ Y, and f([z_{∅,y}]) = 0. Then, f∅ = ϑ_{Y∗} and y∗∅ = ϑ_{Y∗}.

Claim 11.185.1 ∀E ∈ B with μ(E) < ∞, ∀ pairwise disjoint (Ei)_{i=1}^∞ ⊆ B with E = ∪_{i=1}^∞ Ei, we have Σ_{i=1}^∞ ‖y∗Ei‖ ≤ ‖f‖ (μ(E))^{1/p} < ∞.

Proof of Claim Σ_{i=1}^∞ ‖y∗Ei‖ = Σ_{i=1}^∞ sup_{y∈Y, ‖y‖≤1} |fEi(y)|. ∀i ∈ N, by Propositions 7.85 and 7.90, ∃yi ∈ Y with ‖yi‖ ≤ 1 such that ‖y∗Ei‖ = fEi(yi) = f([z_{Ei,yi}]). Then, ∀n ∈ N, Σ_{i=1}^n ‖y∗Ei‖ = Σ_{i=1}^n f([z_{Ei,yi}]) = f([Σ_{i=1}^n z_{Ei,yi}]) ≤ ‖f‖ ‖[Σ_{i=1}^n z_{Ei,yi}]‖_p = ‖f‖ ‖Σ_{i=1}^n z_{Ei,yi}‖_p ≤ ‖f‖ (μ(E))^{1/p} < ∞, where the first inequality follows from Proposition 7.72 and the second inequality follows from Propositions 11.83 and 11.75. By the arbitrariness of n, we have Σ_{n=1}^∞ ‖y∗En‖ ≤ ‖f‖ (μ(E))^{1/p}. This completes the proof of the claim. □

Claim 11.185.2 ∀E ∈ B with μ(E) < ∞, ∀ pairwise disjoint (En)_{n=1}^∞ ⊆ B with E = ∪_{n=1}^∞ En, we have y∗E = Σ_{n=1}^∞ y∗En ∈ Y∗.

Proof of Claim By Claim 11.185.1 and Propositions 7.27 and 7.72, Σ_{n=1}^∞ y∗En ∈ Y∗. ∀y ∈ Y,

  ⟨⟨Σ_{n=1}^∞ y∗En, y⟩⟩ = lim_{n∈N} ⟨⟨Σ_{i=1}^n y∗Ei, y⟩⟩ = lim_{n∈N} Σ_{i=1}^n ⟨⟨y∗Ei, y⟩⟩ = lim_{n∈N} Σ_{i=1}^n f([z_{Ei,y}]) = lim_{n∈N} f([Σ_{i=1}^n z_{Ei,y}])

where the first equality follows from Propositions 7.72 and 3.66 and the last equality follows from the linearity of f. Note that lim_{n∈N} Σ_{i=1}^n z_{Ei,y}(x) = z_{E,y}(x), ∀x ∈ X, and ‖Σ_{i=1}^n z_{Ei,y}(x) − z_{E,y}(x)‖^p ≤ ‖y‖^p χ_{E,X}(x), ∀x ∈ X, ∀n ∈ N. Then, by Lebesgue Dominated Convergence Theorem 11.91, lim_{n∈N} ∫_X Pp ∘ (Σ_{i=1}^n z_{Ei,y} − z_{E,y}) dμ = 0 and lim_{n∈N} Σ_{i=1}^n z_{Ei,y} = z_{E,y} in Z̄. By Propositions 7.72 and 3.66, lim_{n∈N} f([Σ_{i=1}^n z_{Ei,y}]) = f([z_{E,y}]) = ⟨⟨y∗E, y⟩⟩. Hence, ⟨⟨y∗E, y⟩⟩ = Σ_{n=1}^∞ ⟨⟨y∗En, y⟩⟩, ∀y ∈ Y. This implies that, by Proposition 7.85, y∗E = Σ_{n=1}^∞ y∗En. □

Since X is σ-finite, then ∃(Xn)_{n=1}^∞ ⊆ B such that X = ∪_{n=1}^∞ Xn and μ(Xn) < ∞, ∀n ∈ N. Without loss of generality, we may assume that (Xn)_{n=1}^∞ is pairwise disjoint. Fix any n ∈ N; let X_n := (Xn, Bn, μn) be the finite measure subspace of X. We may define a function νn : Bn → Y∗ by νn(E) = fE = y∗E, ∀E ∈ Bn. Clearly, νn(∅) = y∗∅ = ϑ_{Y∗}. ∀ pairwise disjoint (Ei)_{i=1}^∞ ⊆ Bn, let E := ∪_{i=1}^∞ Ei ∈ Bn. By Claim 11.185.1, Σ_{i=1}^∞ ‖νn(Ei)‖ ≤ ‖f‖ (μ(E))^{1/p} ≤ ‖f‖ (μ(Xn))^{1/p} < ∞. By Claim 11.185.2, νn(E) = Σ_{i=1}^∞ νn(Ei) ∈ Y∗. This shows that νn is a Y∗-valued pre-measure on (Xn, Bn). By Claim 11.185.1, P ∘ νn(Xn) ≤ ‖f‖ (μ(Xn))^{1/p} < +∞. Then, νn is finite. Hence, (Xn, Bn, νn) is a finite Y∗-valued measure space.

By Proposition 11.118, the generation process on ((Xn, Bn, νn))_{n=1}^∞ yields a unique σ-finite Y∗-valued measure space (X, B, ν) on X. Next, we will show that P ∘ ν(E) ≤ ‖f‖ (μ(E))^{1/p} < ∞ and ν(E) = y∗E, ∀E ∈ B with μ(E) < ∞. Fix any E ∈ B with μ(E) < ∞. Let En := Xn ∩ E ∈ Bn, ∀n ∈ N. By Proposition 11.118, P ∘ ν(E) = Σ_{n=1}^∞ P ∘ νn(En). ∀ε ∈ (0, +∞) ⊂ R, ∀n ∈ N, ∃mn ∈ Z+, ∃ pairwise disjoint (E_{n,i})_{i=1}^{mn} ⊆ Bn with En = ∪_{i=1}^{mn} E_{n,i}, such that P ∘ νn(En) < Σ_{i=1}^{mn} ‖νn(E_{n,i})‖ + 2^{−n}ε < ∞. Then, P ∘ ν(E)
define f̄ : X → Y1 by f̄(x) = f(x), ∀x ∈ Xn, and f̄(x) = ϑ_{Y1}, ∀x ∈ X \ Xn. Let D := {[f̄] ∈ Lp(X, Y1) | ∃n ∈ N, [f] ∈ Dn}. Clearly, D ⊆ Lp(X, Y1) is countable.

We will show that D is dense in Lp(X, Y1). Fix any f ∈ L̄p(X, Y1). We have ∫_X (Pp ∘ f) dμ < ∞. By Monotone Convergence Theorem 11.81, ∫_X (Pp ∘ f) dμ = lim_{n∈N} ∫_X Pp ∘ (f χ_{Xn,X}) dμ. Let fn := f χ_{Xn,X} ∈ L̄p(X, Y1), ∀n ∈ N. Then,

  lim_{n∈N} ‖fn − f‖_p = lim_{n∈N} ( ∫_X Pp ∘ (fn − f) dμ )^{1/p} = lim_{n∈N} ( ∫_X Pp ∘ (f χ_{X\Xn,X}) dμ )^{1/p} = lim_{n∈N} ( ∫_X (Pp ∘ f − Pp ∘ fn) dμ )^{1/p} = lim_{n∈N} ( ∫_X Pp ∘ f dμ − ∫_X Pp ∘ fn dμ )^{1/p} = 0

where the fourth equality follows from Proposition 11.83. Hence, lim_{n∈N} fn = f in L̄p(X, Y1). ∀ε ∈ (0, ∞) ⊂ R, ∃n0 ∈ N such that ‖f_{n0} − f‖_p < ε/2. Let f̂_{n0} := f_{n0}|_{X_{n0}}. Then, f̂_{n0} ∈ L̄p(X_{n0}, Y1). This implies that ∃[g] ∈ D_{n0} with g ∈ L̄p(X_{n0}, Y1) such that ‖f̂_{n0} − g‖_p < ε/2. Let ḡ ∈ L̄p(X, Y1) be defined as in the second paragraph from last. Then, [ḡ] ∈ D and

  ‖f − ḡ‖_p ≤ ‖f − f_{n0}‖_p + ‖f_{n0} − ḡ‖_p < ε/2 + ( ∫_X Pp ∘ (f_{n0} − ḡ) dμ )^{1/p} = ε/2 + ( ∫_X Pp ∘ ((f_{n0} − ḡ) χ_{X_{n0},X}) dμ )^{1/p} = ε/2 + ( ∫_{X_{n0}} Pp ∘ (f̂_{n0} − g) dμ_{n0} )^{1/p} = ε/2 + ‖f̂_{n0} − g‖_p < ε

where the second equality follows from Proposition 11.83. Hence, D is dense in Lp(X, Y1). Then, by Example 11.179, Lp(X, Y1) is a separable Banach space.

(v) Since X2 is σ-compact, then ∃ compact sets (Kn)_{n=1}^∞ such that X2 = ∪_{n=1}^∞ Kn. ∀n ∈ N, ∀i ∈ N, Kn ⊆ ∪_{x∈Kn} B_{X2}(x, 1/i). By the compactness of Kn, ∃ a finite subset D_{n,i} ⊆ Kn such that Kn ⊆ ∪_{x∈D_{n,i}} B_{X2}(x, 1/i). Then, D := ∪_{n=1}^∞ ∪_{i=1}^∞ D_{n,i} ⊆ X2 is a countable dense set. Hence, X2 is separable. This completes the proof of the proposition. □

11.10 Dual of C(X, Y) and Cc(X, Y)

Definition 11.191 Let X := (X, O) be a topological space, Y be a normed linear space, and (X, B, μ) be a Y-valued measure space on the same set X. The triple X := (X, B, μ) is said to be a Y-valued topological measure space if B = B_B(X) and X̄ := (X, B_B(X), P ∘ μ) is a topological measure space. We will say that X is finite, σ-finite, or locally finite if X̄ is so. We will say that X is Tychonoff, Hausdorff, regular, completely regular, normal, first countable, second countable, separable, second category everywhere, connected, locally connected, compact, countably compact, sequentially compact, locally compact, σ-compact, or paracompact if X is so.

Let X := (X, ρ) be a metric space with the natural topology O, Y be a normed linear space, and (X, B_B(X), μ) be a Y-valued measure space on the same set X. The triple X := (X, B_B(X), μ) is said to be a Y-valued metric measure space if ((X, O), B_B(X), μ) is a Y-valued topological measure space. X is said to be complete or totally bounded if X is so.

Let X := (X, K, ‖·‖) be a normed linear space over the field K, O be the natural topology on X generated by the norm ‖·‖, Y be a normed linear space, and (X, B_B(X), μ) be a Y-valued measure space on the same set X. The triple X := (X, B_B(X), μ) is said to be a Y-valued normed linear measure space if ((X, O), B_B(X), μ) is a Y-valued topological measure space. When X is a Banach space, then X is said to be a Y-valued Banach measure space. Depending on whether K = R or K = C, we will say that X is a Y-valued real or complex Banach measure space. %

Proposition 11.192 Let X := (X, O) be a topological space and Y be a normed linear space over K. Define Z̄ := {μ ∈ M_f(X, B_B(X), Y) | (X, B_B(X), μ) is a finite Y-valued topological measure space}.
Then, Z̄ is a closed subspace of M_f(X, B_B(X), Y) and (Z̄, K, ‖·‖_{M_f(X,B_B(X),Y)}) =: M_{ft}(X, Y) is a normed linear space. If, in addition, Y is a Banach space, then M_{ft}(X, Y) is a Banach space. Furthermore, define Z := {μ ∈ M_σ(X, B_B(X), Y) | (X, B_B(X), μ) is a σ-finite Y-valued topological measure space} ⊆ M_σ(X, B_B(X), Y). Let Z admit the subset topology O_Z. Then, Z =: M_{σt}(X, Y) is a subspace of M_σ(X, B_B(X), Y). We will abuse the notation and denote the topological space (Z, O_Z) by M_{σt}(X, Y).

Proof Let Y be a normed linear space. We will show that Z̄ is a subspace of M_f(X, B_B(X), Y). ∀α1, α2 ∈ K, ∀μ1, μ2 ∈ Z̄, by Proposition 11.136, μ := α1 μ1 + α2 μ2 ∈ M_f(X, B_B(X), Y). ∀E ∈ B_B(X), ∀ε ∈ (0, +∞) ⊂ R, ∀i = 1, 2, by μi ∈ Z̄, ∃Oi ∈ O with E ⊆ Oi such that P ∘ μi(Oi \ E) < ε/(1 + 2|αi|). Let O := O1 ∩ O2 ∈ O. Clearly, E ⊆ O and P ∘ μ(O \ E) ≤ (Σ_{i=1}^2 |αi| P ∘ μi)(O \ E) = Σ_{i=1}^2 |αi| P ∘ μi(O \ E) ≤ Σ_{i=1}^2 |αi| P ∘ μi(Oi \ E) < ε, where the first inequality and the first equality follow from Proposition 11.136. This shows that (X, B_B(X), P ∘ μ) is a topological measure space. Then, (X, B_B(X), μ) is a finite Y-valued topological measure space and μ ∈ Z̄. Clearly, ϑ_{M_f(X,B_B(X),Y)} ∈ Z̄ ≠ ∅. Hence, Z̄ is a subspace of M_f(X, B_B(X), Y). Then, M_{ft}(X, Y) is a normed linear space since M_f(X, B_B(X), Y) is a normed linear space.

Next, we will show that Z̄ is closed. Fix any μ in the closure of Z̄ in M_f(X, B_B(X), Y). By Proposition 4.13, ∃(μn)_{n=1}^∞ ⊆ Z̄ such that lim_{n∈N} μn = μ. Then, lim_{n∈N} ‖μn − μ‖_{M_f(X,B_B(X),Y)} = lim_{n∈N} P ∘ (μn − μ)(X) = 0. ∀E ∈ B_B(X), ∀ε ∈ (0, +∞) ⊂ R, ∃n ∈ N such that P ∘ (μn − μ)(X) < ε/2. By μn ∈ Z̄, ∃O ∈ O with E ⊆ O such that P ∘ μn(O \ E) < ε/2. Then, P ∘ μ(O \ E) = P ∘ (μn − μn + μ)(O \ E) ≤ (P ∘ μn + P ∘ (μn − μ))(O \ E) = P ∘ μn(O \ E) + P ∘ (μn − μ)(O \ E) < ε/2 + P ∘ (μn − μ)(X) < ε, where the first equality and the first inequality follow from Proposition 11.136. Hence, μ ∈ Z̄. By the arbitrariness of μ, the closure of Z̄ equals Z̄, and Z̄ is closed.

Let Y be a Banach space. By Proposition 11.142, M_f(X, B_B(X), Y) is a Banach space. By Proposition 4.39, M_{ft}(X, Y) is a Banach space.

Finally, we will show that Z is a subspace of M_σ(X, B_B(X), Y). ∀α1, α2 ∈ K, ∀μ1, μ2 ∈ Z, by Proposition 11.138, μ := α1 μ1 + α2 μ2 ∈ M_σ(X, B_B(X), Y). ∀E ∈ B_B(X), ∀ε ∈ (0, +∞) ⊂ R, ∀i = 1, 2, by μi ∈ Z, ∃Oi ∈ O with E ⊆ Oi such that P ∘ μi(Oi \ E) < ε/(1 + 2|αi|). Let O := O1 ∩ O2 ∈ O. Clearly, E ⊆ O and P ∘ μ(O \ E) ≤ (Σ_{i=1}^2 |αi| P ∘ μi)(O \ E) = Σ_{i=1}^2 |αi| P ∘ μi(O \ E) ≤ Σ_{i=1}^2 |αi| P ∘ μi(Oi \ E) < ε, where the first inequality and the first equality follow from Proposition 11.138. This shows that (X, B_B(X), P ∘ μ) is a topological measure space. Then, (X, B_B(X), μ) is a σ-finite Y-valued topological measure space and μ ∈ Z. Clearly, ϑ_{M_σ(X,B_B(X),Y)} ∈ Z ≠ ∅. Hence, Z is a subspace of M_σ(X, B_B(X), Y). This completes the proof of the proposition. □

A bit of notation to simplify our presentation. Let M_σ(X, B) denote the set of σ-finite measures on the measurable space (X, B); M_f(X, B) denote the set of finite measures on the measurable space (X, B); M_{σt}(X) denote the set of σ-finite topological measures on the topological space X; and M_{ft}(X) denote the set of finite topological measures on the topological space X.

Proposition 11.193 Let X := (X, O) be a topological space and μo : O → [0, ∞) ⊂ R. Assume that:

(i) μo(∅) = 0.
(ii) μo(O1) ≤ μo(O2), ∀O1, O2 ∈ O with O1 ⊆ O2.
(iii) μo(∪_{i=1}^∞ Oi) ≤ Σ_{i=1}^∞ μo(Oi), ∀(Oi)_{i=1}^∞ ⊆ O.
(iv) μo(O1 ∪ O2) = μo(O1) + μo(O2), ∀O1, O2 ∈ O with O1 ∩ O2 = ∅.
(v) μo(O) = sup_{U∈O, Ū⊆O} μo(U), ∀O ∈ O.

Define μ̄o : 2^X → [0, ∞) ⊂ R by μ̄o(E) = inf_{O∈O, E⊆O} μo(O), ∀E ⊆ X. Then, the following statements hold:

1. μ̄o is an outer measure. It induces a finite complete measure space (X, B̄, μ̄), where B̄ := {E ⊆ X | E is measurable with respect to μ̄o} and μ̄ := μ̄o|_{B̄}.

2. B_B(X) ⊆ B̄, and the triple X := (X, B_B(X), μ := μ̄|_{B_B(X)}) is a finite topological measure space with μ(O) = μo(O), ∀O ∈ O.
3. The measure μ is unique in the sense that if μ̂ is another measure on (X, B_B(X)) satisfying μ̂(O) = μo(O) = μ(O), ∀O ∈ O, then μ̂ = μ.

Proof 1. By (i), we have μ̄o(∅) = 0. ∀A ⊆ B ⊆ X, we have 0 ≤ μ̄o(A) = inf_{O∈O, A⊆O} μo(O) ≤ inf_{O∈O, B⊆O} μo(O) = μ̄o(B) ≤ μo(X) < ∞. ∀E ⊆ X and ∀(Ei)_{i=1}^∞ ⊆ 2^X with E ⊆ ∪_{i=1}^∞ Ei, ∀ε ∈ (0, +∞) ⊂ R, ∀i ∈ N, ∃Oi ∈ O with Ei ⊆ Oi such that μo(Oi) < μ̄o(Ei) + 2^{−i}ε. Then, μ̄o(E) = inf_{O∈O, E⊆O} μo(O) ≤ μo(∪_{i=1}^∞ Oi) ≤ Σ_{i=1}^∞ μo(Oi) < Σ_{i=1}^∞ μ̄o(Ei) + ε, where the second inequality follows from (iii). By the arbitrariness of ε, we have μ̄o(E) ≤ Σ_{i=1}^∞ μ̄o(Ei). Hence, μ̄o : 2^X → [0, ∞) ⊂ R is an outer measure. It is easy to see that μ̄o(O) = μo(O), ∀O ∈ O. By Theorem 11.17, (X, B̄, μ̄) is a finite complete measure space.

2. ∀O ∈ O, ∀E ⊆ X, ∀ε ∈ (0, +∞) ⊂ R, ∃O1 ∈ O with E ⊆ O1 such that μo(O1) < μ̄o(E) + ε/2. By (v), ∃U ∈ O with Ū ⊆ O1 ∩ O such that μo(O ∩ O1) < μo(U) + ε/2. Then, μ̄o(E) > μo(O1) − ε/2 ≥ μo((O1 \ Ū) ∪ U) − ε/2 = μo(O1 \ Ū) + μo(U) − ε/2 = μ̄o(O1 \ Ū) + μo(U) − ε/2 > μ̄o(O1 \ (O1 ∩ O)) + μo(O ∩ O1) − ε = μ̄o(O1 \ O) + μ̄o(O ∩ O1) − ε ≥ μ̄o(E \ O) + μ̄o(E ∩ O) − ε ≥ μ̄o(E) − ε, where the second inequality follows from (ii), the first equality follows from (iv), and the third, the fourth, and the last inequalities follow from the fact that μ̄o is an outer measure. By the arbitrariness of ε, we have μ̄o(E) = μ̄o(E \ O) + μ̄o(E ∩ O). By the arbitrariness of E, O is measurable with respect to μ̄o and O ∈ B̄. By the arbitrariness of O, we have O ⊆ B̄. Since B̄ is a σ-algebra on X, then B_B(X) ⊆ B̄. By Proposition 11.13, (X, B_B(X), μ) is a measure space. Clearly, μ(O) = μ̄(O) = μ̄o(O) = μo(O), ∀O ∈ O. This, coupled with the definition of μ̄o, leads to the conclusion that X is a topological measure space. Clearly, μ(X) = μo(X) < ∞. Hence, X is a finite topological measure space.

3. Let μ̂ be another measure on (X, B_B(X)) satisfying μ̂(O) = μo(O) = μ(O), ∀O ∈ O. Then, μ̂ is finite. Suppose μ ≠ μ̂. Then, ∃E ∈ B_B(X) such that μ̂(E) ≠ μ(E). Since μ(E) = inf_{O∈O, E⊆O} μo(O) = inf_{O∈O, E⊆O} μ̂(O) and μ̂(E) ≤ μ̂(O), ∀O ∈ O with E ⊆ O, then we must have μ̂(E) < μ(E). Since X is a topological measure space, then, by Proposition 11.27, ∃X \ F ∈ O with F ⊆ E such that μ(E \ F) < (μ(E) − μ̂(E))/2. This implies that μ(F) = μ(E) − μ(E \ F) > μ(E)/2 + μ̂(E)/2 > μ̂(E) ≥ μ̂(F), where the equality follows from μ being a finite measure and the last inequality follows from E ⊇ F and μ̂ being a measure. Therefore, μo(X \ F) = μ(X \ F) = μ(X) − μ(F) < μ̂(X) − μ̂(F) = μ̂(X \ F) = μo(X \ F), where the first equality follows from X \ F ∈ O, the second equality follows from μ being a finite measure, the inequality follows from X ∈ O and μ(F) > μ̂(F), the third equality follows from μ̂ being a measure, and the last equality follows from X \ F ∈ O. This is a contradiction. Hence, we have μ̂ = μ and μ is unique. This completes the proof of the proposition. □

Proposition 11.194 Let X := (X, O) be a compact Hausdorff topological space, Y be a Banach space, μ̄ be a function that assigns a vector μ̄(F) ∈ Y for each

closed subset F ⊆ X, and X := (X, B_B(X), ν) be a finite topological measure space. Assume that:

(i) μ̄(F1 ∪ F2) = μ̄(F1) + μ̄(F2), ∀X \ F1, X \ F2 ∈ O with F1 ∩ F2 = ∅.
(ii) ‖μ̄(F1) − μ̄(F2)‖ ≤ ν(O), ∀X \ F1, X \ F2, O ∈ O with F1 △ F2 ⊆ O.

Then, there exists a unique μ ∈ M_{ft}(X, Y) such that μ(F) = μ̄(F), ∀X \ F ∈ O. Furthermore, P ∘ μ ≤ ν.

Proof Fix any E ∈ B_B(X). Let Â_E := {F ⊆ E | X \ F ∈ O} and Ā_E := (Â_E, ⊆). Clearly, Ā_E is a directed system.

Claim 11.194.1 The net (μ̄(F))_{F∈Ā_E} is Cauchy.

Proof of Claim ∀ε ∈ (0, +∞) ⊂ R, by X being a topological measure space, ∃V ∈ O with E ⊆ V such that ν(V \ E) < ε/2. By Proposition 11.27, ∃X \ F ∈ O with F ⊆ E such that ν(E \ F) < ε/2. ∀F1 ∈ Â_E with F ⊆ F1, we have F ⊆ F1 ⊆ E ⊆ V and F △ F1 = F1 \ F ⊆ V \ F ∈ O. By (ii), ‖μ̄(F1) − μ̄(F)‖ ≤ ν(V \ F) = ν(V \ E) + ν(E \ F) < ε. Hence, the net is Cauchy. This completes the proof of the claim. □

By Proposition 4.44, we may define μ(E) = lim_{F∈Ā_E} μ̄(F) ∈ Y. Thus, we have defined a function μ : B_B(X) → Y. We will show that μ is the Y-valued measure we seek. Clearly, μ(F) = μ̄(F), ∀X \ F ∈ O. By (i), μ̄(∅) + μ̄(∅) = μ̄(∅), and then μ(∅) = μ̄(∅) = ϑ_Y.

∀E ∈ B_B(X), ∀ε ∈ (0, +∞) ⊂ R, ∃V ∈ O with E ⊆ V such that ν(V \ E) < ε/2. By μ(E) = lim_{F∈Ā_E} μ̄(F), ∃F ∈ Â_E such that ‖μ(E) − μ̄(F)‖ < ε/2. Clearly, F ⊆ E ⊆ V. Then, ‖μ(E)‖ < ‖μ̄(F)‖ + ε/2 ≤ ν(V) + ε/2 = ν(E) + ν(V \ E) + ε/2 < ν(E) + ε, where the second inequality follows from (ii). By the arbitrariness of ε, we have ‖μ(E)‖ ≤ ν(E).

Fix any pairwise disjoint (En)_{n=1}^∞ ⊆ B_B(X); let E := ∪_{n=1}^∞ En ∈ B_B(X). ∀ε ∈ (0, +∞) ⊂ R, ∀n ∈ N, ∃Vn ∈ O with En ⊆ Vn such that ν(Vn \ En) < 2^{−n−1}ε/5. By Proposition 11.27, ∃X \ Fn ∈ O with Fn ⊆ En such that ν(En \ Fn) < 2^{−n−1}ε/5. By μ(En) = lim_{F∈Ā_{En}} μ̄(F), ∃F̂n ∈ Â_{En} with Fn ⊆ F̂n such that ‖μ(En) − μ̄(F̂n)‖ < 2^{−n}ε/5. Clearly, Fn ⊆ F̂n ⊆ En ⊆ Vn. ∃V ∈ O with E ⊆ V such that ν(V \ E) < ε/5. By Proposition 11.27, ∃X \ F ∈ O with F ⊆ E such that ν(E \ F) < ε/5. By μ(E) = lim_{F∈Ā_E} μ̄(F), ∃F̂ ∈ Â_E with F ⊆ F̂ such that ‖μ(E) − μ̄(F̂)‖ < ε/5. Clearly, F ⊆ F̂ ⊆ E ⊆ V.

By Proposition 5.5, F̂ is compact. Since F̂ ⊆ E = ∪_{n=1}^∞ En ⊆ ∪_{n=1}^∞ Vn, then ∃n0 ∈ N such that F̂ ⊆ ∪_{i=1}^{n0} Vi. ∀n ∈ N with n0 ≤ n, ‖μ(E) − Σ_{i=1}^n μ(Ei)‖ ≤ ‖μ(E) − μ̄(F̂)‖ + ‖μ̄(F̂) − Σ_{i=1}^n μ̄(F̂i)‖ + Σ_{i=1}^n ‖μ̄(F̂i) − μ(Ei)‖ < ε/5 + ‖μ̄(F̂) − μ̄(∪_{i=1}^n F̂i)‖ + ε/5, where the second inequality follows from (i). Clearly, F̂ and ∪_{i=1}^n F̂i are closed sets, and F̂ △ (∪_{i=1}^n F̂i) = (F̂ \ (∪_{i=1}^n F̂i)) ∪ ((∪_{i=1}^n F̂i) \ F̂) ⊆ ((∪_{i=1}^n Vi) \ (∪_{i=1}^n F̂i)) ∪ (E \ F̂) ⊆ (∪_{i=1}^n (Vi \ Fi)) ∪ (V \ F) ∈ O. By (ii), ‖μ(E) − Σ_{i=1}^n μ(Ei)‖ < 2ε/5 + ν((∪_{i=1}^n (Vi \ Fi)) ∪ (V \ F)) ≤ 2ε/5 + Σ_{i=1}^n ν(Vi \ Fi) + ν(V \ F) = 2ε/5 + Σ_{i=1}^n (ν(Vi \ Ei) + ν(Ei \ Fi)) + ν(V \ E) + ν(E \ F) < ε. Then, μ(E) = Σ_{n=1}^∞ μ(En) ∈ Y.

∀ pairwise disjoint (En)_{n=1}^∞ ⊆ B_B(X), Σ_{n=1}^∞ ‖μ(En)‖ ≤ Σ_{n=1}^∞ ν(En) = ν(∪_{n=1}^∞ En) ≤ ν(X) < ∞. This shows that μ is a Y-valued pre-measure on (X, B_B(X)). ∀E ∈ B_B(X), ∀n ∈ Z+, ∀ pairwise disjoint (Ei)_{i=1}^n ⊆ B_B(X) with E = ∪_{i=1}^n Ei, Σ_{i=1}^n ‖μ(Ei)‖ ≤ Σ_{i=1}^n ν(Ei) = ν(E). Hence, P ∘ μ(E) ≤ ν(E). By the arbitrariness of E, we have P ∘ μ ≤ ν. Hence, μ is a finite Y-valued measure on (X, B_B(X)). It is easy to show that (X, B_B(X), μ) is a Y-valued topological measure space. Then, μ ∈ M_{ft}(X, Y).

Finally, we need to show that μ ∈ M_{ft}(X, Y) is unique. Let μ̂ ∈ M_{ft}(X, Y) be such that μ̂(F) = μ̄(F) = μ(F), ∀X \ F ∈ O. ∀E ∈ B_B(X), ∀ε ∈ (0, +∞) ⊂ R, by Proposition 11.27, ∃X \ F ∈ O with F ⊆ E such that P ∘ μ(E \ F) < ε/2. Again by Proposition 11.27, ∃X \ F1 ∈ O with F1 ⊆ E such that P ∘ μ̂(E \ F1) < ε/2. Let F̄ := F ∪ F1 ⊆ E, which is clearly closed. Then, ‖μ(E) − μ̂(E)‖ ≤ ‖μ(E) − μ(F̄)‖ + ‖μ(F̄) − μ̂(F̄)‖ + ‖μ̂(F̄) − μ̂(E)‖ = ‖μ(E \ F̄)‖ + 0 + ‖μ̂(E \ F̄)‖ ≤ P ∘ μ(E \ F̄) + P ∘ μ̂(E \ F̄) ≤ P ∘ μ(E \ F) + P ∘ μ̂(E \ F1) < ε. By the arbitrariness of ε, we have μ(E) = μ̂(E). Hence, μ = μ̂. This completes the proof of the proposition. □

Definition 11.195 Let X := (X, O) be a topological space. E ⊆ X is said to be a Gδ if ∃(Oi)_{i=1}^∞ ⊆ O such that E = ∩_{i=1}^∞ Oi. E ⊆ X is said to be an Fσ if ∃(Fi)_{i=1}^∞ with X \ Fi ∈ O, ∀i ∈ N, such that E = ∪_{i=1}^∞ Fi. %

Proposition 11.196 Let X := (X, ρ) be a locally compact separable metric space with the natural topology O, E := {E ⊆ X | E is a compact Gδ}, and B_a(X) be the σ-algebra generated by E. Then, B_a(X) = B_B(X).

Proof ∀E ∈ E, by Proposition 5.5, E is closed. Then, E ∈ B_B(X). Hence, E ⊆ B_B(X). Since B_B(X) is a σ-algebra, then B_a(X) ⊆ B_B(X).

∀x ∈ X, by Definition 5.49, ∃Ox ∈ O such that x ∈ Ox and Ōx is compact. Then, X = ∪_{x∈X} Ox. By Propositions 4.4 and 3.24, ∃ a countable set D ⊆ X such that X = ∪_{x∈D} Ox = ∪_{x∈D} Ōx.

∀F ⊆ X with X \ F ∈ O, ∀x ∈ D, Fx := F ∩ Ōx is compact and closed by Propositions 5.5 and 3.5. Then, Fx = ∩_{n=1}^∞ ∪_{x̄∈Fx} B_X(x̄, 1/n), by Proposition 4.10. This implies that Fx is a Gδ in addition to being compact. Then, Fx ∈ E ⊆ B_a(X). This implies that F = ∪_{x∈D} Fx ∈ B_a(X) and X \ F ∈ B_a(X). By the arbitrariness of F, we have O ⊆ B_a(X). By B_a(X) being a σ-algebra, we have B_B(X) ⊆ B_a(X). Therefore, B_a(X) = B_B(X). This completes the proof of the proposition. □

Lemma 11.197 Let X := (X, O) be a normal topological space. Then, ∀E ⊆ X with X \ E ∈ O and E being a Gδ, there exists a continuous function φ : X → [0, 1] ⊂ R such that E = {x ∈ X | φ(x) = 1}.

524

11 General Measure and Integration

Proof Since E is a Gδ, then ∃(O_i)_{i=1}^∞ ⊆ O such that E = ∩_{i=1}^∞ O_i. ∀i ∈ N, by Urysohn's Lemma 3.55, there exists a continuous function φ_i : X → [0, 1] ⊂ R such that φ_i|_E = 1 and φ_i|_{Õ_i} = 0. Let φ := Σ_{i=1}^∞ 2^{-i} φ_i. By Proposition 4.26, φ : X → [0, 1] ⊂ R is continuous. Clearly, φ|_E = 1 and φ(x) < 1, ∀x ∈ Ẽ, since ∩_{i=1}^∞ O_i = E. Hence, the result holds. This completes the proof of the lemma. ∎

Theorem 11.198 Let X := (X, ρ) be a locally compact separable metric space with the natural topology O and μ be a finite measure on (X, B_B(X)). Then, X := (X, B_B(X), μ) is a finite metric measure space. Thus, M_ft(X) = M_f(X, B_B(X)). As a consequence, M_f(X, B_B(X), Y) = M_ft(X, Y), where Y is any normed linear space.

Proof By Propositions 4.4, 3.24, and 5.72, X is σ-compact. Let R ⊆ B_B(X) be such that E ∈ R if ∀ε ∈ (0, +∞) ⊂ R, we have:

(i) ∃O ∈ O with O being σ-compact and E ⊆ O such that μ(O \ E) < ε.
(ii) ∃F with F̃ ∈ O, with F being a compact Gδ and F ⊆ E, such that μ(E \ F) < ε.

We will show that R = B_B(X).

Claim 11.198.1 ∀F with F̃ ∈ O, F = ∩_{n=1}^∞ ∪_{x∈F} B_X(x, 1/n) =: F̆, and F is a Gδ.

Proof of Claim Clearly, F ⊆ F̆. ∀x_0 ∈ F̃, by Proposition 4.10, ε_0 := dist(x_0, F) > 0. Then, ∀n ∈ N with 1/n < ε_0, x_0 ∉ ∪_{x∈F} B_X(x, 1/n), and hence x_0 ∉ F̆. Hence, F̆ ⊆ F and F = F̆. Clearly, F̆ is a Gδ. This completes the proof of the claim. ∎

Claim 11.198.2 ∀O ∈ O, O is σ-compact.

Proof of Claim By Claim 11.198.1, Õ is a Gδ. Then, ∃(O_i)_{i=1}^∞ ⊆ O such that Õ = ∩_{i=1}^∞ O_i and O = ∪_{i=1}^∞ Õ_i. By X being σ-compact, ∃ compact sets (K_i)_{i=1}^∞ such that X = ∪_{i=1}^∞ K_i. Then, O = O ∩ X = ∪_{i=1}^∞ ∪_{j=1}^∞ (Õ_i ∩ K_j). ∀i, j ∈ N, by Propositions 3.5 and 5.5, Õ_i ∩ K_j is compact. Hence, O is σ-compact. ∎

∀(E_i)_{i=1}^∞ ⊆ R, let E := ∪_{i=1}^∞ E_i ∈ B_B(X). ∀ε ∈ (0, +∞) ⊂ R, ∀i ∈ N, ∃O_i ∈ O with O_i being σ-compact and E_i ⊆ O_i such that μ(O_i \ E_i) < ε 2^{-i}; ∃F_i with F̃_i ∈ O, with F_i being a compact Gδ and F_i ⊆ E_i, such that μ(E_i \ F_i) < ε 2^{-i-1}. Let O := ∪_{i=1}^∞ O_i ∈ O, which is σ-compact by Claim 11.198.2. Then, E ⊆ O and μ(O \ E) = μ((∪_{i=1}^∞ O_i) \ (∪_{i=1}^∞ E_i)) ≤ μ(∪_{i=1}^∞ (O_i \ E_i)) ≤ Σ_{i=1}^∞ μ(O_i \ E_i) < ε. Hence, E satisfies (i). By Proposition 11.7, μ(E) = lim_{n∈N} μ(∪_{i=1}^n E_i). Then, ∃n_0 ∈ N such that μ(E) < μ(∪_{i=1}^{n_0} E_i) + ε/2. Let F := ∪_{i=1}^{n_0} F_i. Clearly, F is compact and closed. By Claim 11.198.1, F is a Gδ. Then, F ⊆ E and μ(E \ F) = μ(E) − μ(F) < μ(∪_{i=1}^{n_0} E_i) − μ(∪_{i=1}^{n_0} F_i) + ε/2 = μ((∪_{i=1}^{n_0} E_i) \ (∪_{i=1}^{n_0} F_i)) + ε/2 ≤ μ(∪_{i=1}^{n_0} (E_i \ F_i)) + ε/2 ≤ Σ_{i=1}^{n_0} μ(E_i \ F_i) + ε/2 < ε. Hence, E satisfies (ii). Then, E ∈ R. Thus, R is closed under countable unions.

∀E ∈ R, ∀ε ∈ (0, +∞) ⊂ R, ∃O ∈ O with O being σ-compact and E ⊆ O such that μ(O \ E) < ε; ∃F with F̃ ∈ O, with F being a compact Gδ and F ⊆ E, such that μ(E \ F) < ε. Then, Õ is a closed set, Õ ⊆ Ẽ, and μ(Ẽ \ Õ) = μ(O \ E) < ε. By X being σ-compact, ∃ compact sets (K_i)_{i=1}^∞ such that X = ∪_{i=1}^∞ K_i. Without loss of generality, we may assume that K_i ⊆ K_{i+1}, ∀i ∈ N. Note that ε > μ(Ẽ \ Õ) = μ(Ẽ \ (Õ ∩ ∪_{i=1}^∞ K_i)) = μ(∩_{i=1}^∞ (Ẽ \ (Õ ∩ K_i))) = lim_{i∈N} μ(Ẽ \ (Õ ∩ K_i)), where the last equality follows from Proposition 11.5. Then, ∃n ∈ N such that μ(Ẽ \ (Õ ∩ K_n)) < ε. By Propositions 3.5 and 5.5, Õ ∩ K_n is closed and compact; then it is a Gδ by Claim 11.198.1. Clearly, Õ ∩ K_n ⊆ Ẽ. Hence, Ẽ satisfies (ii). F̃ ∈ O and is σ-compact by Claim 11.198.2. Note that Ẽ ⊆ F̃ and μ(F̃ \ Ẽ) = μ(E \ F) < ε. Then, Ẽ satisfies (i). Hence, Ẽ ∈ R. R is closed under set complements. Clearly, ∅ ∈ R. This proves that R is a σ-algebra.

∀E with Ẽ ∈ O and with E being a compact Gδ, ∀ε ∈ (0, +∞) ⊂ R, E satisfies (ii) trivially. By Lemma 11.197, there exists a continuous function φ : X → [0, 1] ⊂ R such that E = {x ∈ X | φ(x) = 1}. ∀i ∈ N, let O_i := {x ∈ X | φ(x) > 1 − 1/i} ∈ O. Clearly, E = ∩_{i=1}^∞ O_i and O_{i+1} ⊆ O_i, ∀i ∈ N. By Proposition 11.5, μ(E) = lim_{i∈N} μ(O_i). ∃n ∈ N such that μ(O_n) − ε < μ(E) ≤ μ(O_n). Clearly, we have E ⊆ O_n. Then, μ(O_n \ E) = μ(O_n) − μ(E) < ε. By Claim 11.198.2, O_n is σ-compact. Then, E satisfies (i) and E ∈ R. Thus, R is a σ-algebra and E := {E ∈ B_B(X) | E is a compact Gδ} ⊆ R. By Proposition 11.196, R = B_B(X). Then, X is a finite metric measure space. Thus, we have M_f(X, B_B(X)) ⊆ M_ft(X). Clearly, M_f(X, B_B(X)) ⊇ M_ft(X). Hence, M_f(X, B_B(X)) = M_ft(X).

∀μ̄ ∈ M_f(X, B_B(X), Y), then P ∘ μ̄ ∈ M_f(X, B_B(X)) = M_ft(X). This implies that μ̄ ∈ M_ft(X, Y). Hence, M_f(X, B_B(X), Y) ⊆ M_ft(X, Y). Clearly, M_f(X, B_B(X), Y) ⊇ M_ft(X, Y). Then, M_f(X, B_B(X), Y) = M_ft(X, Y). This completes the proof of the theorem. ∎

Definition 11.199 Let X := (X, O) be a topological space, Y be a normed linear space, and f : X → Y. The support of f is the set supp(f) := cl({x ∈ X | f(x) ≠ ϑ_Y}). %

Lemma 11.200 Let X := (X, O) be a compact Hausdorff topological space, Z := C(X, R), and f ∈ Z∗.
A functional f̄ ∈ Z∗ is said to be a positive linear functional if f̄(z) ≥ 0, ∀z ∈ P := {h ∈ Z | h : X → [0, ∞) ⊂ R}. Then, f = f_+ − f_−, where f_+, f_− ∈ Z∗ are positive linear functionals.

Proof ∀z ∈ P, define f_+(z) := sup_{φ∈Z, 0≤φ(x)≤z(x), ∀x∈X} f(φ). Then, 0 ≤ f_+(z) ≤ ‖f‖ ‖z‖ < ∞ and f_+(z) ≥ f(z), ∀z ∈ P. Clearly:

(i) f_+(αz) = αf_+(z), ∀z ∈ P and ∀α ∈ [0, ∞) ⊂ R.

∀z_1, z_2 ∈ P, ∀i ∈ {1, 2}, ∀φ_i ∈ Z with 0 ≤ φ_i(x) ≤ z_i(x), ∀x ∈ X, we have f(φ_1) + f(φ_2) = f(φ_1 + φ_2) ≤ f_+(z_1 + z_2). Then, f_+(z_1) + f_+(z_2) ≤ f_+(z_1 + z_2). On the other hand, ∀φ ∈ Z with 0 ≤ φ(x) ≤ z_1(x) + z_2(x), ∀x ∈ X, we have f(φ) = f(φ ∧ z_1) + f(φ − φ ∧ z_1) ≤ f_+(z_1) + f_+(z_2). Then, f_+(z_1 + z_2) ≤ f_+(z_1) + f_+(z_2). Therefore:

(ii) f_+(z_1) + f_+(z_2) = f_+(z_1 + z_2), ∀z_1, z_2 ∈ P.

∀z ∈ Z, define z_+ := z ∨ 0 ∈ P and z_− := (−z) ∨ 0 ∈ P. Clearly, z = z_+ − z_−. Define f_+(z) := f_+(z_+) − f_+(z_−) ∈ R. Then, f_+ : Z → R is well-defined. We will show that f_+ ∈ Z∗.


∀z ∈ Z, |f_+(z)| = |f_+(z_+) − f_+(z_−)| ≤ f_+(z_+) + f_+(z_−) = f_+(z_+ + z_−) ≤ ‖f‖ ‖z_+ + z_−‖ = ‖f‖ ‖z‖.

∀z_1, z_2 ∈ Z, let z := z_1 + z_2. Then, f_+(z_1) + f_+(z_2) = f_+(z_{1+}) − f_+(z_{1−}) + f_+(z_{2+}) − f_+(z_{2−}) = f_+(z_{1+} + z_{2+}) − f_+(z_{1−} + z_{2−}) = f_+(z_+) + f_+(z_{1+} + z_{2+} − z_+) − f_+(z_−) − f_+(z_{1−} + z_{2−} − z_−) = f_+(z) + f_+(z_{1+} + z_{2+} − z_+) − f_+(z_{1−} + z_{2−} − z_−), where the first equality follows from the definition of f_+, the second and third equalities follow from (ii), and the last equality follows from the definition of f_+. Note that, ∀x ∈ X,

  (z_{1+} + z_{2+} − z_+)(x) = z_1(x) ∨ 0 + z_2(x) ∨ 0 − (z_1(x) + z_2(x)) ∨ 0
    = 0         if z_1(x) ≥ 0 and z_2(x) ≥ 0
    = −z_2(x)   if z_1(x) ≥ 0 > z_2(x) ≥ −z_1(x)
    = z_1(x)    if z_1(x) ≥ 0 ≥ −z_1(x) > z_2(x)
    = −z_1(x)   if z_2(x) ≥ 0 > z_1(x) ≥ −z_2(x)
    = z_2(x)    if z_2(x) ≥ 0 ≥ −z_2(x) > z_1(x)
    = 0         if z_1(x) < 0 and z_2(x) < 0
  = (−z_1(x)) ∨ 0 + (−z_2(x)) ∨ 0 − (−z_1(x) − z_2(x)) ∨ 0
  = z_{1−}(x) + z_{2−}(x) − z_−(x).

Hence:

(iii) f_+(z_1) + f_+(z_2) = f_+(z) = f_+(z_1 + z_2), ∀z_1, z_2 ∈ Z.

∀z ∈ Z, ∀α ∈ R: If α = 0, then f_+(αz) = 0 = αf_+(z). If α > 0, then f_+(αz) = f_+(αz_+) − f_+(αz_−) = αf_+(z_+) − αf_+(z_−) = αf_+(z), where the first equality follows from the definition of f_+, the second follows from (i), and the third follows from the definition of f_+. If α < 0, then f_+(αz) = f_+(−αz_−) − f_+(−αz_+) = −αf_+(z_−) + αf_+(z_+) = αf_+(z), where the first equality follows from the definition of f_+, the second follows from (i), and the third follows from the definition of f_+. Hence, we have:

(iv) f_+(αz) = αf_+(z), ∀z ∈ Z and ∀α ∈ R.

Hence, f_+ ∈ Z∗ and is a positive linear functional. Let f_− := f_+ − f ∈ Z∗. Clearly, f_−(z) ≥ 0, ∀z ∈ P. Hence, f_− is also a positive linear functional. This completes the proof of the lemma. ∎

Theorem 11.201 (Riesz Representation Theorem) Let X := (X, O) be a compact Hausdorff topological space, Y be a normed linear space over K, Z := C(X, Y), and Z̄ := {z ∈ Z | z = hy, h ∈ C(X, R), y ∈ Y}. Assume that span(Z̄) = Z. Then, ∀f ∈ Z∗, ∃! μ ∈ M_ft(X, Y∗) such that f(z) = ∫_X ⟨⟨dμ(x), z(x)⟩⟩ =: ⟨⟨μ, z⟩⟩, ∀z ∈ Z. Furthermore, Z∗ = M_ft(X, Y∗) isometrically isomorphically.

Proof By Proposition 5.14, X is normal. By Example 7.31, Z is a normed linear space over K. Fix any f ∈ Z∗. Define ν_o : O → [0, ‖f‖] ⊂ R by ν_o(O) := sup_{z∈Z, ‖z‖≤1, supp(z)⊆O} |f(z)|, ∀O ∈ O. Clearly, we have:

(i) ν_o(∅) = 0.
(ii) 0 ≤ ν_o(O) ≤ ‖f‖ < ∞, ∀O ∈ O.


(iii) ν_o(O_1) ≤ ν_o(O_2), ∀O_1, O_2 ∈ O with O_1 ⊆ O_2.

∀(O_i)_{i=1}^∞ ⊆ O, let O := ∪_{i=1}^∞ O_i ∈ O. ∀z ∈ Z with K := supp(z) ⊆ O and ‖z‖ ≤ 1, by Proposition 5.5, K is compact. By Corollary 5.65 of Partition of Unity, ∃m ∈ Z_+, ∃n_1, . . . , n_m ∈ N, which may be taken to be distinct, ∃(φ_{n_i})_{i=1}^m ⊆ C(X, R) such that φ_{n_i} : X → [0, 1] ⊂ R, supp(φ_{n_i}) ⊆ O_{n_i}, i = 1, . . . , m, Σ_{i=1}^m φ_{n_i}(x) = 1, ∀x ∈ K, and 0 ≤ Σ_{i=1}^m φ_{n_i}(x) ≤ 1, ∀x ∈ X. Then, z = Σ_{i=1}^m zφ_{n_i}. ∀i ∈ {1, . . . , m}, zφ_{n_i} ∈ Z, ‖zφ_{n_i}‖ ≤ 1, and supp(zφ_{n_i}) ⊆ supp(z) ∩ supp(φ_{n_i}) ⊆ supp(φ_{n_i}) ⊆ O_{n_i}. Then, |f(z)| = |Σ_{i=1}^m f(zφ_{n_i})| ≤ Σ_{i=1}^m |f(zφ_{n_i})| ≤ Σ_{i=1}^m ν_o(O_{n_i}) ≤ Σ_{i=1}^∞ ν_o(O_i). By the arbitrariness of z, we have:

(iv) ν_o(O) = ν_o(∪_{i=1}^∞ O_i) ≤ Σ_{i=1}^∞ ν_o(O_i), ∀(O_i)_{i=1}^∞ ⊆ O.

∀O_1, O_2 ∈ O with O_1 ∩ O_2 = ∅, ∀ε ∈ (0, +∞) ⊂ R, ∃z̄_i ∈ Z with supp(z̄_i) ⊆ O_i and ‖z̄_i‖ ≤ 1, such that |f(z̄_i)| > ν_o(O_i) − ε/2, i = 1, 2. ∀i ∈ {1, 2}, when f(z̄_i) ≠ 0, take z_i := (|f(z̄_i)|/f(z̄_i)) z̄_i ∈ Z; and when f(z̄_i) = 0, take z_i := z̄_i ∈ Z. Then, z_i ∈ Z, ‖z_i‖ = ‖z̄_i‖ ≤ 1, supp(z_i) = supp(z̄_i) ⊆ O_i, and f(z_i) = |f(z̄_i)| > ν_o(O_i) − ε/2. Let z := z_1 + z_2 ∈ Z. Clearly, ‖z‖ ≤ 1 since O_1 ∩ O_2 = ∅. Note that supp(z) ⊆ cl(supp(z_1) ∪ supp(z_2)) = supp(z_1) ∪ supp(z_2) ⊆ O_1 ∪ O_2, where the equality follows from Proposition 3.3. Then, ν_o(O_1) + ν_o(O_2) − ε < f(z_1) + f(z_2) = f(z) ≤ ν_o(O_1 ∪ O_2) ≤ ν_o(O_1) + ν_o(O_2), where the last inequality follows from (iv). By the arbitrariness of ε, we have:

(v) ν_o(O_1 ∪ O_2) = ν_o(O_1) + ν_o(O_2), ∀O_1, O_2 ∈ O with O_1 ∩ O_2 = ∅.

∀O ∈ O, ∀ε ∈ (0, +∞) ⊂ R, ∃z ∈ Z with K := supp(z) ⊆ O and ‖z‖ ≤ 1, such that |f(z)| > ν_o(O) − ε. By Proposition 5.5, K is compact. By Proposition 3.35, ∃U ∈ O such that K ⊆ U ⊆ cl(U) ⊆ O. Then, ν_o(O) ≥ ν_o(U) ≥ |f(z)| > ν_o(O) − ε, where the first inequality follows from (iii). Hence, we have:

(vi) ν_o(O) = sup_{U∈O, cl(U)⊆O} ν_o(U), ∀O ∈ O.
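The decomposition f = f_+ − f_− of Lemma 11.200, which sits behind functional-analytic constructions such as ν_o, can be sanity-checked on a finite set, where C(X, R) is just R^X, every bounded functional has the form f(z) = Σ_x c(x) z(x), and the supremum defining f_+ is attained explicitly. The set X and the coefficients c below are illustrative assumptions:

```python
# Discrete sanity check of Lemma 11.200: on a finite set X, a functional
# f(z) = sum_x c[x] * z(x) decomposes as f = f_plus - f_minus with both
# parts positive, where f_plus(z) = sup{ f(phi) : 0 <= phi <= z } for z >= 0.
X = ["a", "b", "c"]
c = {"a": 2.0, "b": -3.0, "c": 1.0}   # assumed coefficients of f

def f(z):
    return sum(c[x] * z[x] for x in X)

def f_plus_sup(z):
    # For z >= 0, the sup over {phi : 0 <= phi <= z} is attained by taking
    # phi = z where c > 0 and phi = 0 elsewhere.
    return f({x: (z[x] if c[x] > 0 else 0.0) for x in X})

def f_plus(z):
    return sum(max(c[x], 0.0) * z[x] for x in X)

def f_minus(z):
    return f_plus(z) - f(z)           # f_minus := f_plus - f, as in the lemma

z = {"a": 1.0, "b": 2.0, "c": 0.5}    # a nonnegative "function" on X
assert f_plus_sup(z) == f_plus(z)     # sup formula agrees with the positive part
assert f(z) == f_plus(z) - f_minus(z) # f = f_plus - f_minus
assert f_plus(z) >= 0.0 and f_minus(z) >= 0.0
```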
By Proposition 11.193, there exists a finite topological measure space X̄ := (X, B_B(X), ν) such that ν(O) = ν_o(O), ∀O ∈ O, and ν(E) = inf_{O∈O, E⊆O} ν_o(O), ∀E ∈ B_B(X). Furthermore, ν is unique among measures ν̂ on (X, B_B(X)) with ν̂|_O = ν_o. Clearly, ν ∈ M_ft(X).

Fix any F with F̃ ∈ O. Let A_F := {(U, V, h) ∈ O × O × C(X, R) | F ⊆ U ⊆ cl(U) ⊆ V, h : X → [0, 1] ⊂ R, h|_{cl(U)} = 1, supp(h) ⊆ V}. Define a relation ≺ on A_F by (U_1, V_1, h_1) ≺ (U_2, V_2, h_2) if V_1 ⊇ V_2. By Proposition 3.35 and Urysohn's Lemma 3.55, Ā_F := (A_F, ≺) is a directed system. ∀(U, V, h) ∈ Ā_F, the function f_h : Y → K, defined by f_h(y) = f(hy), ∀y ∈ Y, is a bounded linear functional since f ∈ Z∗. Then, f_h ∈ Y∗. This defines a net (f_h)_{(U,V,h)∈Ā_F} ⊆ Y∗.

Claim 11.201.1 The net (f_h)_{(U,V,h)∈Ā_F} ⊆ Y∗ is Cauchy.

Proof of Claim ∀ε ∈ (0, +∞) ⊂ R, ∃V ∈ O with F ⊆ V such that ν(V \ F) < ε. By Proposition 3.35, ∃U, Ǔ ∈ O such that F ⊆ U ⊆ cl(U) ⊆ Ǔ ⊆ cl(Ǔ) ⊆ V. By Urysohn's Lemma 3.55, ∃h ∈ C(X, R) with h : X → [0, 1] ⊂ R such that h|_{cl(U)} = 1 and h|_{X\Ǔ} = 0. Then, supp(h) ⊆ cl(Ǔ) ⊆ V and (U, V, h) ∈ Ā_F. ∀(U_1, V_1, h_1) ∈ Ā_F with (U, V, h) ≺ (U_1, V_1, h_1), we have V_1 ⊆ V and ‖f_h − f_{h_1}‖ = sup_{y∈Y, ‖y‖≤1} |⟨⟨f_h, y⟩⟩ − ⟨⟨f_{h_1}, y⟩⟩| = sup_{y∈Y, ‖y‖≤1} |f(hy) − f(h_1 y)| = sup_{y∈Y, ‖y‖≤1} |f((h − h_1)y)|. Note that, ∀y ∈ Y with ‖y‖ ≤ 1, (h − h_1)y ∈ Z, ‖(h − h_1)y‖ ≤ 1, and

  supp((h − h_1)y) ⊆ supp(h − h_1)
    ⊆ (supp(h) ∪ supp(h_1)) \ {x ∈ X | h(x) = 1 = h_1(x)}
    ⊆ (supp(h) ∪ supp(h_1)) \ (U ∩ U_1)
    ⊆ cl((supp(h) \ (U ∩ U_1)) ∪ (supp(h_1) \ (U ∩ U_1)))
    = cl(supp(h) \ (U ∩ U_1)) ∪ cl(supp(h_1) \ (U ∩ U_1))
    ⊆ (V \ F) ∪ (V_1 \ F) ⊆ V \ F ∈ O

where the third containment follows from the fact that h|U = 1 and h1 |U1 = 1 and 6 6 the first equality follows from Proposition 3.3. Then, 6fh − fh1 6 ≤ νo (V \ F ) = ν(V \ F ) < . Hence, the net is Cauchy. This completes the proof of the claim. & ' By Propositions 4.44 and 7.72:  ∈ O. (vii) lim(U,V ,h)∈A¯ F fh =: μ(F ¯ ) ∈ Y∗ , ∀F Thus, we have defined a function μ¯ that assigns a vector μ(F ¯ ) ∈ Y∗ to each closed subset F ⊆ X. We will next show that μ¯ satisfies the assumption of Proposition 11.194. Fix any F1 , F2 ∈ O with F1 ∩ F2 = ∅. Fix any  ∈ (0, ∞) ⊂ R. By the normality of X , ∃V1 , V2 ∈ O with V1 ∩ V2 = ∅ such that Fi ⊆ Vi , i = 1, 2. Without loss of generality, we may assume that ν(Vi \ Fi ) < /4, i = 1, 2. Fix any i ∈ {1, 2}. By lim(U,V ,h)∈A¯ F fh = μ(F ¯ i ), ∃(Uˆ i , Vˆi , hˆ i ) ∈ A¯ Fi with Vˆi ⊆ Vi such i 6 6 6 6 ˆ ∈ that 6μ(F ¯ 1 ∪ F2 ), ∃(Uˆ , Vˆ , h) ¯ i ) − fhˆ i 6 < /6. By lim(U,V ,h)∈A¯ F ∪F fh = μ(F 1 2 6 6 ¯ 1 ∪ F2 ) − fhˆ 6 g∗ (P ◦μ)(E)−. By the arbitrariness of , we have sE¯ ≥ g∗ (P ◦μ)(E) ¯ and sE¯ = g∗ (P ◦ μ)(E). ¯ g∗ μ) is a Y-valued measure space with ¯ B, Therefore, by Definition 11.108, (X, P ◦(g∗ μ) = g∗ (P ◦μ). It is straightforward to show that g is an isomeasure between X and X¯ . This completes the proof of the proposition. ' &

12.2 Change of Variable


Proposition 12.11 Let X := (X, O) and X̄ := (X̄, Ō) be topological spaces, g : X → X̄ be a homeomorphism, (Y be a normed linear space,) and X := (X, B_B(X), μ) be a (Y-valued) topological measure space. Then, X̄ := (X̄, B_B(X̄), g∗μ) is a (Y-valued) topological measure space and g is a homeomorphical isomeasure between X and X̄.

Proof Clearly, g is bijective and g and g_inv are continuous. By Proposition 11.34, ∀E ∈ B_B(X), we have g(E) ∈ B_B(X̄), and ∀Ē ∈ B_B(X̄), we have g_inv(Ē) ∈ B_B(X). Then, by Propositions 12.9 and 12.10, the (Y-valued) induced measure g∗μ is well-defined, (X̄, B_B(X̄), g∗μ) is a (Y-valued) measure space, and g is an isomeasure between X and X̄.

We will show that X̄ is a (Y-valued) topological measure space. ∀Ē ∈ B_B(X̄), ∀ε ∈ (0, ∞) ⊂ R, we have g_inv(Ē) ∈ B_B(X). Since X is a (Y-valued) topological measure space, then ∃U ∈ O with g_inv(Ē) ⊆ U such that μ(U \ g_inv(Ē)) < ε (P ∘ μ(U \ g_inv(Ē)) < ε). Let Ū := g(U) ∈ Ō. Then, Ē ⊆ g(U) = Ū and g∗μ(Ū \ Ē) = μ(g_inv(Ū \ Ē)) = μ(g_inv(Ū) \ g_inv(Ē)) = μ(U \ g_inv(Ē)) < ε (in the case when μ is a Y-valued measure, we have P ∘ (g∗μ)(Ū \ Ē) = g∗(P ∘ μ)(Ū \ Ē) = P ∘ μ(g_inv(Ū \ Ē)) = P ∘ μ(g_inv(Ū) \ g_inv(Ē)) = P ∘ μ(U \ g_inv(Ē)) < ε, where the first equality follows from Proposition 12.10). Hence, X̄ is a (Y-valued) topological measure space. Then, g is a homeomorphical isomeasure between X and X̄. ∎

Proposition 12.12 Let X := (X, B, μ) be a measure space, X̄ := (X̄, B̄, ν) be a measure space, Y be a normed linear space, f : X̄ → Y be B̄-measurable, and g : X → X̄ be an isomeasure between X and X̄. Then, ∫_X (f ∘ g) dμ = ∫_X̄ f dν, whenever one of the integrals exists.

Proof Clearly, f ∘ g is B-measurable. We will distinguish two exhaustive and mutually exclusive cases: Case 1: μ(X) < +∞; Case 2: μ(X) = +∞.

Case 1: μ(X) < +∞. ν(X̄) = μ(g_inv(X̄)) = μ(X) < +∞. Let I(Y) be the integration system on Y. ∀R = {(y_α, U_α) | α ∈ Λ} ∈ I(Y), let F̄_R := Σ_{α∈Λ} y_α ν(f_inv(U_α)) ∈ Y and F_R := Σ_{α∈Λ} y_α μ((f ∘ g)_inv(U_α)) ∈ Y. Note that F̄_R = Σ_{α∈Λ} y_α μ(g_inv(f_inv(U_α))) = F_R. Then, by Definition 11.70, we have ∫_X̄ f dν = lim_{R∈I(Y)} F̄_R = lim_{R∈I(Y)} F_R = ∫_X (f ∘ g) dμ, whenever one of the integrals exists.

Case 2: μ(X) = +∞. ν(X̄) = μ(g_inv(X̄)) = μ(X) = +∞. Let (F̄_Ā)_{Ā∈M(X̄)} be the net for ∫_X̄ f dν and (F_A)_{A∈M(X)} be the net for ∫_X (f ∘ g) dμ as defined in Definition 11.71. Without loss of generality, assume that ∫_X̄ f dν exists. Then,

  ∫_X̄ f dν = lim_{Ā∈M(X̄)} F̄_Ā = lim_{Ā∈M(X̄)} ∫_Ā f|_Ā dν_Ā = lim_{Ā∈M(X̄)} ∫_{g_inv(Ā)} (f ∘ g)|_{g_inv(Ā)} dμ_{g_inv(Ā)}

where the first two equalities follow from Definition 11.71 and the third equality follows from Case 1. Fix any open set U (U ⊆ R_e if Y = R, or U ⊆ Y if Y ≠ R) with ∫_X̄ f dν ∈ U. ∃Ā_0 ∈ M(X̄), ∀Ā ∈ M(X̄) with Ā_0 ⊆ Ā, F̄_Ā = ∫_{g_inv(Ā)} (f ∘ g)|_{g_inv(Ā)} dμ_{g_inv(Ā)} ∈ U. ∀A ∈ M(X) with g_inv(Ā_0) ⊆ A, we have A ∈ B and μ(A) < +∞.

P ∘ μ_j(E_j) = Σ_{n_j=1}^∞ P ∘ μ_j(E_j ∩ X_{j,n_j}) = Σ_{n_j=1}^∞ P ∘ μ_{j,n_j}(E_j ∩ X_{j,n_j}) and μ_j(E_j) = Σ_{n_j=1}^∞ μ_j(E_j ∩ X_{j,n_j}) = Σ_{n_j=1}^∞ μ_{j,n_j}(E_j ∩ X_{j,n_j}). Then,

  P ∘ λ(∏_{j=1}^k E_j) = Σ_{n_1=1}^∞ ··· Σ_{n_k=1}^∞ P ∘ λ_{n_1,...,n_k}(∏_{j=1}^k (E_j ∩ X_{j,n_j}))
                       = Σ_{n_1=1}^∞ ··· Σ_{n_k=1}^∞ ∏_{j=1}^k P ∘ μ_{j,n_j}(E_j ∩ X_{j,n_j})
                       = ∏_{j=1}^k ( Σ_{n_j=1}^∞ P ∘ μ_{j,n_j}(E_j ∩ X_{j,n_j}) )
                       = ∏_{j=1}^k P ∘ μ_j(E_j)
(i) When m > 1, ∀i_0 ∈ J̄ := {1, . . . , m}, let D_{i_0} := {x ∈ R^{m−1} | ∃x̄ ∈ Ω · π_{i_0}(x̄) = π_{i_0}(x_0) and π̆_{i_0}(x̄) = x}, where π̆_{i_0} : R^m → R^{m−1} is defined by π̆_{i_0}(x) = (π_1(x), . . . , π_{i_0−1}(x), π_{i_0+1}(x), . . . , π_m(x)) ∈ R^{m−1}, ∀x ∈ R^m, and let F_{i_0} : D_{i_0} → Y be defined by F_{i_0}(x) = F(x̄), ∀x ∈ D_{i_0}, ∀x̄ ∈ Ω with π_{i_0}(x̄) = π_{i_0}(x_0) and π̆_{i_0}(x̄) = x. The function F_{i_0} is absolutely continuous at π̆_{i_0}(x_0).

(ii) ∃a ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δ(ε) ∈ (0, ∞) ⊂ R, ∀n ∈ N, ∀(x̄_i)_{i=1}^n, (x̃_i)_{i=1}^n ⊆ r_{x_0−a1_m, x_0+a1_m} ∩ Ω with x̄_i ≤ x̃_i, ∀i ∈ {1, . . . , n}, (r_{x̄_i,x̃_i})_{i=1}^n being pairwise disjoint, and Σ_{i=1}^n μ_{Bm}(r_{x̄_i,x̃_i}) < δ(ε), we have Σ_{i=1}^n ‖Δ_F(r_{x̄_i,x̃_i})‖ < ε.

F is said to be absolutely continuous if it is absolutely continuous at any x_0 ∈ Ω. F is said to be absolutely continuous on E ⊆ Ω if it is absolutely continuous at any x_0 ∈ E. %

In the above definition, the definition for the general R^m case subsumes the definition for the R case. We explicitly state the R case for its direct implications.

Proposition 12.59 Let m ∈ N, R^m be endowed with the usual positive cone, Ω ∈ B_B(R^m) be a rectangle with subset topology O, Y be a normed linear space over K, and F : Ω → Y be absolutely continuous at x_0 ∈ Ω. Then, F is continuous at x_0.

Proof We will prove the result using mathematical induction on m.

1° m = 1. By Definition 12.58, F is continuous at x_0. This case is proved.

2° Assume that the result holds for m ≤ k − 1 ∈ N.

3° Consider the case when m = k ∈ {2, 3, . . .}. By (i) of Definition 12.58, ∀i_0 ∈ J̄ := {1, . . . , m}, F_{i_0} : D_{i_0} → Y is absolutely continuous at π̆_{i_0}(x_0), where F_{i_0} and D_{i_0} are as defined in Definition 12.58. Clearly, D_{i_0} ∈ B_B(R^{m−1}) is a rectangle since Ω is a rectangle. By the inductive assumption, F_{i_0} is continuous at π̆_{i_0}(x_0). ∀ε ∈ (0, ∞) ⊂ R, ∃δ_{i_0}(ε) ∈ (0, ∞) ⊂ R such that ‖F_{i_0}(x) − F_{i_0}(π̆_{i_0}(x_0))‖ < ε, ∀x ∈ B_{R^{m−1}}(π̆_{i_0}(x_0), δ_{i_0}(ε)) ∩ D_{i_0}.

By (ii) of Definition 12.58, ∃a ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δ̄(ε) ∈ (0, ∞) ⊂ R, ∀n ∈ N, ∀(x̄_i)_{i=1}^n, (x̃_i)_{i=1}^n ⊆ r_{x_0−a1_m, x_0+a1_m} ∩ Ω with x̄_i ≤ x̃_i, ∀i ∈ {1, . . . , n}, (r_{x̄_i,x̃_i})_{i=1}^n being pairwise disjoint, and Σ_{i=1}^n μ_{Bm}(r_{x̄_i,x̃_i}) < δ̄(ε), we have Σ_{i=1}^n ‖Δ_F(r_{x̄_i,x̃_i})‖ < ε. Take δ(ε) := min{a, δ̄(2^{−m}ε), δ_1(2^{−m}ε), . . . , δ_m(2^{−m}ε)} ∈ R_+. ∀x ∈ B_{R^m}(x_0, δ(ε)) ∩ Ω, we have ‖F(x) − F(x_0)‖ ≤ ‖Δ_G(r_{x̌,x̂})‖ + Σ_{J⊂J̄} ‖F(x_J) − F(x_0)‖, where x̌ := x ∧ x_0 and x̂ := x ∨ x_0, G : Ω → Y is defined by G(x) = F(x) − F(x_0), ∀x ∈ Ω, and x_J ∈ r_{x̌,x̂} is defined by π_i(x_J) = π_i(x) ∀i ∈ J and π_i(x_J) = π_i(x_0) ∀i ∈ J̄ \ J, ∀J ⊂ J̄. Clearly, x̌, x̂ ∈ r_{x_0−a1_m, x_0+a1_m} ∩ Ω since a > |x − x_0| ≥ max{|x̌ − x_0|, |x̂ − x_0|} ≥ 0. Since μ_{Bm}(r_{x̌,x̂}) ≤ |x − x_0|^m < δ̄(2^{−m}ε), then ‖Δ_F(r_{x̌,x̂})‖ = ‖Δ_G(r_{x̌,x̂})‖ < 2^{−m}ε. ∀J ⊂ J̄, fix any i_0 ∈ J̄ \ J, and we have ‖F(x_J) − F(x_0)‖ = ‖F_{i_0}(π̆_{i_0}(x_J)) − F_{i_0}(π̆_{i_0}(x_0))‖. Clearly, π̆_{i_0}(x_J) ∈ D_{i_0} and |π̆_{i_0}(x_J) − π̆_{i_0}(x_0)| = |x_J − x_0| ≤ |x − x_0| < δ_{i_0}(2^{−m}ε). Then, ‖F(x_J) − F(x_0)‖ = ‖F_{i_0}(π̆_{i_0}(x_J)) − F_{i_0}(π̆_{i_0}(x_0))‖ < 2^{−m}ε. Hence, ‖F(x) − F(x_0)‖ < ε. This shows that F is continuous at x_0.

This completes the induction process. This completes the proof of the proposition. ∎
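In one dimension, Definition 12.58 reduces to the classical ε-δ notion of absolute continuity, and for a Lipschitz map with constant L the choice δ(ε) = ε/L works, since Σ_i ‖F(x̄_i) − F(x_i)‖ ≤ L Σ_i (x̄_i − x_i). A hedged numeric sketch: the map F(x) = x², which is 2-Lipschitz on [0, 1], and the randomly sampled intervals are assumed examples, not from the text.

```python
# 1-D instance of Definition 12.58: for F(x) = x**2 on [0, 1] (Lipschitz with
# constant L = 2 there), disjoint subintervals of total length < delta = eps/L
# have total variation sum < eps.
import random

random.seed(0)
F = lambda x: x * x
L, eps = 2.0, 1e-3
delta = eps / L

# Build disjoint subintervals [x_i, xbar_i] of [0, 1] whose total length stays
# safely below delta (each of the 5 intervals is capped at length delta / 10).
pts = sorted(random.uniform(0.0, 1.0) for _ in range(10))
pairs = [(a, min(b, a + delta / 10)) for a, b in zip(pts[0::2], pts[1::2])]

total_len = sum(b - a for a, b in pairs)
total_var = sum(abs(F(b) - F(a)) for a, b in pairs)

assert total_len < delta
assert total_var <= L * total_len < eps
```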

12.5 Absolute and Lipschitz Continuity


Proposition 12.60 Let m ∈ N, R^m be endowed with the usual positive cone, Ω ∈ B_B(R^m) be a rectangle with subset topology O, Y be a normed linear space over K, F : Ω → Y, and x_0 ∈ Ω. Then, F is absolutely continuous at x_0 if, and only if:

(a) ∀i ∈ {0, . . . , m − 1}, ∀J ⊂ J̄ := {1, . . . , m} with card(J) = i, let π_J : R^m → R^i be such that π_J(x̄) = (π_l(x̄))_{l∈J}, ∀x̄ ∈ R^m, π_{J̄\J} : R^m → R^{m−i} be such that π_{J̄\J}(x̄) = (π_l(x̄))_{l∈J̄\J}, ∀x̄ ∈ R^m, M : R^i × R^{m−i} → R^m be such that M(x_a, x_b) = x̄ with x_a = π_J(x̄) and x_b = π_{J̄\J}(x̄), ∀x_a ∈ R^i and ∀x_b ∈ R^{m−i}, x_f := π_J(x_0), x̃_f := π_{J̄\J}(x_0), Ω̃_J := π_{J̄\J}(Ω), and F_J : Ω̃_J → Y be defined by F_J(x̄) = F(M(x_f, x̄)), ∀x̄ ∈ Ω̃_J. Then, ∃a_J ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δ_J(ε) ∈ (0, ∞) ⊂ R, ∀n ∈ N, ∀(x̄_j)_{j=1}^n, (x̃_j)_{j=1}^n ⊆ r_{x̃_f−a_J 1_{m−i}, x̃_f+a_J 1_{m−i}} ∩ Ω̃_J with x̄_j ≤ x̃_j, ∀j ∈ {1, . . . , n}, (r_{x̄_j,x̃_j})_{j=1}^n being pairwise disjoint, and Σ_{j=1}^n μ_{B(m−i)}(r_{x̄_j,x̃_j}) < δ_J(ε), we have Σ_{j=1}^n ‖Δ_{F_J}(r_{x̄_j,x̃_j})‖ < ε.

Proof We will prove the result using mathematical induction on m.

1° m = 1. The result is clearly a restatement of Definition 12.58. Hence, the case is proved.

2° Assume that the result holds for m ≤ k − 1 ∈ N.

3° Consider the case when m = k ∈ {2, 3, . . .}.

"Sufficiency" We need to show that F is absolutely continuous at x_0 assuming (a). Let i = 0; then (a) implies (ii) of Definition 12.58. ∀i_0 ∈ J̄, let π_{i_0}, π̆_{i_0}, D_{i_0}, and F_{i_0} be as defined in (i) of Definition 12.58. Since (a) holds for any i ∈ {1, . . . , m − 1} and any J ⊂ J̄ with card(J) = i such that i_0 ∈ J, we have that (a) holds for F_{i_0} at π̆_{i_0}(x_0). By the inductive assumption, F_{i_0} is absolutely continuous at π̆_{i_0}(x_0). Hence, (i) of Definition 12.58 holds for F at x_0. Then, F is absolutely continuous at x_0.

"Necessity" We need to show that (a) holds assuming that F is absolutely continuous at x_0. By (ii) of Definition 12.58, (a) holds when i = 0. By (i) of Definition 12.58, ∀i_0 ∈ J̄, let π_{i_0}, π̆_{i_0}, D_{i_0}, and F_{i_0} be as defined in (i) of Definition 12.58, and we have that F_{i_0} is absolutely continuous at π̆_{i_0}(x_0). By the inductive assumption, (a) holds for F_{i_0}. Then, (a) holds for F, ∀i ∈ {1, . . . , m − 1}, ∀J ⊂ J̄ with card(J) = i and i_0 ∈ J. Hence, (a) holds for F. This completes the proof for the necessity.

This completes the induction process. This completes the proof of the proposition. ∎

Proposition 12.61 Let m ∈ N, R^m be endowed with the usual positive cone, Ω ∈ B_B(R^m) be a rectangle with subset topology O, Y and Z be normed linear spaces over K, f : Ω → Y, g : Ω → Z, x_0 ∈ Ω, and h : Ω → Y × Z be defined by h(x) = (f(x), g(x)), ∀x ∈ Ω. Then, h is absolutely continuous at x_0 if, and only if, f and g are absolutely continuous at x_0.
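Both directions of the proof below rest on the elementary product-norm bounds max(‖a‖, ‖b‖) ≤ (‖a‖² + ‖b‖²)^{1/2} ≤ ‖a‖ + ‖b‖, applied with a = Δ_f(r) and b = Δ_g(r). A quick numerical check of these bounds (the sampled magnitudes are assumptions):

```python
# Product-norm bounds used in both directions of Proposition 12.61:
# max(||a||, ||b||) <= (||a||**2 + ||b||**2) ** 0.5 <= ||a|| + ||b||.
import math
import random

random.seed(1)
for _ in range(1000):
    na = abs(random.gauss(0.0, 1.0))   # stand-in for ||Delta_f(r)||
    nb = abs(random.gauss(0.0, 1.0))   # stand-in for ||Delta_g(r)||
    pair_norm = math.hypot(na, nb)     # Euclidean norm on the product space
    assert max(na, nb) <= pair_norm <= na + nb + 1e-12
```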


Proof We will prove the result using mathematical induction on m.

1° m = 1. "Sufficiency" By f being absolutely continuous at x_0, ∃a_f ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δ_f(ε) ∈ (0, a_f] ⊂ R, ∀n ∈ N, ∀x_0 − a_f ≤ x_1 ≤ x̄_1 ≤ x_2 ≤ x̄_2 ≤ ··· ≤ x_n ≤ x̄_n ≤ x_0 + a_f with x_i, x̄_i ∈ I, i = 1, . . . , n, and Σ_{i=1}^n (x̄_i − x_i) < δ_f(ε), we have Σ_{i=1}^n ‖f(x̄_i) − f(x_i)‖ < ε. By g being absolutely continuous at x_0, ∃a_g ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δ_g(ε) ∈ (0, a_g] ⊂ R, ∀n ∈ N, ∀x_0 − a_g ≤ x_1 ≤ x̄_1 ≤ ··· ≤ x_n ≤ x̄_n ≤ x_0 + a_g with x_i, x̄_i ∈ I, i = 1, . . . , n, and Σ_{i=1}^n (x̄_i − x_i) < δ_g(ε), we have Σ_{i=1}^n ‖g(x̄_i) − g(x_i)‖ < ε.

Let a := min{a_f, a_g} ∈ (0, +∞) ⊂ R. ∀ε ∈ (0, ∞) ⊂ R, let δ(ε) := min{δ_f(ε/2), δ_g(ε/2)} ∈ (0, a] ⊂ R. ∀n ∈ N, ∀x_0 − a ≤ x_1 ≤ x̄_1 ≤ ··· ≤ x_n ≤ x̄_n ≤ x_0 + a with x_i, x̄_i ∈ I, i = 1, . . . , n, and Σ_{i=1}^n (x̄_i − x_i) < δ(ε), we have Σ_{i=1}^n ‖h(x̄_i) − h(x_i)‖ = Σ_{i=1}^n ‖(f(x̄_i) − f(x_i), g(x̄_i) − g(x_i))‖ = Σ_{i=1}^n (‖f(x̄_i) − f(x_i)‖² + ‖g(x̄_i) − g(x_i)‖²)^{1/2} ≤ Σ_{i=1}^n (‖f(x̄_i) − f(x_i)‖ + ‖g(x̄_i) − g(x_i)‖) < ε. Hence, h is absolutely continuous at x_0.

"Necessity" By h being absolutely continuous at x_0, ∃a ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δ(ε) ∈ (0, a] ⊂ R, ∀n ∈ N, ∀x_0 − a ≤ x_1 ≤ x̄_1 ≤ ··· ≤ x_n ≤ x̄_n ≤ x_0 + a with x_i, x̄_i ∈ I, i = 1, . . . , n, and Σ_{i=1}^n (x̄_i − x_i) < δ(ε), we have ε > Σ_{i=1}^n ‖h(x̄_i) − h(x_i)‖ = Σ_{i=1}^n (‖f(x̄_i) − f(x_i)‖² + ‖g(x̄_i) − g(x_i)‖²)^{1/2} ≥ Σ_{i=1}^n ‖f(x̄_i) − f(x_i)‖. Hence, f is absolutely continuous at x_0. By symmetry, g is absolutely continuous at x_0. This case is proved.

2° Assume that the result holds for m ≤ k − 1 ∈ N.

3° Consider the case m = k ∈ {2, 3, . . .}.

"Sufficiency" Assume that f and g are absolutely continuous at x_0. ∀i_0 ∈ J̄ := {1, . . . , m}, let D_{i_0} ⊆ R^{m−1}, π̆_{i_0} : R^m → R^{m−1}, f_{i_0} : D_{i_0} → Y, g_{i_0} : D_{i_0} → Z, and h_{i_0} : D_{i_0} → Y × Z be defined as in Definition 12.58. By (i) of Definition 12.58, f_{i_0} and g_{i_0} are absolutely continuous at π̆_{i_0}(x_0). By the inductive assumption, h_{i_0} is absolutely continuous at π̆_{i_0}(x_0). Then, h satisfies (i) of Definition 12.58.

By (ii) of Definition 12.58, ∃a_f ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δ_f(ε) ∈ (0, ∞) ⊂ R, ∀n ∈ N, ∀(x̄_i)_{i=1}^n, (x̃_i)_{i=1}^n ⊆ r_{x_0−a_f 1_m, x_0+a_f 1_m} ∩ Ω with x̄_i ≤ x̃_i, ∀i ∈ {1, . . . , n}, (r_{x̄_i,x̃_i})_{i=1}^n being pairwise disjoint, and Σ_{i=1}^n μ_{Bm}(r_{x̄_i,x̃_i}) < δ_f(ε), we have Σ_{i=1}^n ‖Δ_f(r_{x̄_i,x̃_i})‖ < ε; ∃a_g ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δ_g(ε) ∈ (0, ∞) ⊂ R, ∀n ∈ N, ∀(x̄_i)_{i=1}^n, (x̃_i)_{i=1}^n ⊆ r_{x_0−a_g 1_m, x_0+a_g 1_m} ∩ Ω with x̄_i ≤ x̃_i, ∀i ∈ {1, . . . , n}, (r_{x̄_i,x̃_i})_{i=1}^n being pairwise disjoint, and Σ_{i=1}^n μ_{Bm}(r_{x̄_i,x̃_i}) < δ_g(ε), we have Σ_{i=1}^n ‖Δ_g(r_{x̄_i,x̃_i})‖ < ε. Let a := a_f ∧ a_g ∈ (0, ∞) ⊂ R. ∀ε ∈ (0, ∞) ⊂ R, let δ(ε) := min{δ_f(ε/2), δ_g(ε/2)} ∈ (0, ∞) ⊂ R. ∀n ∈ N, ∀(x̄_i)_{i=1}^n, (x̃_i)_{i=1}^n ⊆ r_{x_0−a1_m, x_0+a1_m} ∩ Ω with x̄_i ≤ x̃_i, ∀i ∈ {1, . . . , n}, (r_{x̄_i,x̃_i})_{i=1}^n being pairwise disjoint, and Σ_{i=1}^n μ_{Bm}(r_{x̄_i,x̃_i}) < δ(ε), we have Σ_{i=1}^n ‖Δ_h(r_{x̄_i,x̃_i})‖ = Σ_{i=1}^n ‖(Δ_f(r_{x̄_i,x̃_i}), Δ_g(r_{x̄_i,x̃_i}))‖ = Σ_{i=1}^n (‖Δ_f(r_{x̄_i,x̃_i})‖² + ‖Δ_g(r_{x̄_i,x̃_i})‖²)^{1/2} ≤ Σ_{i=1}^n ‖Δ_f(r_{x̄_i,x̃_i})‖ + Σ_{i=1}^n ‖Δ_g(r_{x̄_i,x̃_i})‖ < ε. Hence, h satisfies (ii) of Definition 12.58. Then, h is absolutely continuous at x_0.

"Necessity" Let h be absolutely continuous at x_0. ∀i_0 ∈ J̄, let D_{i_0}, π̆_{i_0}, f_{i_0}, g_{i_0}, and h_{i_0} be defined as in Definition 12.58. By (i) of Definition 12.58, h_{i_0} is absolutely continuous at π̆_{i_0}(x_0). By the inductive assumption, f_{i_0} and g_{i_0} are absolutely continuous at π̆_{i_0}(x_0). Hence, f and g satisfy (i) of Definition 12.58. By (ii) of Definition 12.58, ∃a ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δ(ε) ∈ (0, ∞) ⊂ R, ∀n ∈ N, ∀(x̄_i)_{i=1}^n, (x̃_i)_{i=1}^n ⊆ r_{x_0−a1_m, x_0+a1_m} ∩ Ω with x̄_i ≤ x̃_i, ∀i ∈ {1, . . . , n}, (r_{x̄_i,x̃_i})_{i=1}^n being pairwise disjoint, and Σ_{i=1}^n μ_{Bm}(r_{x̄_i,x̃_i}) < δ(ε), we have ε > Σ_{i=1}^n ‖Δ_h(r_{x̄_i,x̃_i})‖ ≥ Σ_{i=1}^n ‖Δ_f(r_{x̄_i,x̃_i})‖. Hence, f satisfies (ii) of Definition 12.58. Then, f is absolutely continuous at x_0. By symmetry, g is absolutely continuous at x_0.

This completes the induction process. This completes the proof of the proposition. ∎

Proposition 12.62 Let m ∈ N, R^m be endowed with the usual positive cone, Ω ∈ B_B(R^m) be a rectangle with subset topology O, Y be a normed linear space over K, F : Ω → Y, and Ω_1, Ω_2 ⊆ Ω be rectangles in R^m such that Ω_1 ∪ Ω_2 = Ω. Then, the following statements hold:

(i) If F is absolutely continuous on E ⊆ Ω, then F|_{Ω_1} is absolutely continuous on E ∩ Ω_1.
(ii) If F|_{Ω_1} and F|_{Ω_2} are absolutely continuous and Ω_1 and Ω_2 are both relatively open or both relatively closed (with respect to the subset topology O), then F is absolutely continuous.

Proof (i) ∀x_0 ∈ E ∩ Ω_1, since F is absolutely continuous at x_0, then F|_{Ω_1} is absolutely continuous at x_0. By the arbitrariness of x_0, F|_{Ω_1} is absolutely continuous on E ∩ Ω_1.

(ii) We will distinguish two exhaustive cases: Case 1: Ω_1 and Ω_2 are relatively open in Ω; Case 2: Ω_1 and Ω_2 are relatively closed in Ω.

Case 1: Ω_1 and Ω_2 are relatively open in Ω. Fix any x_0 ∈ Ω. We will prove the statement that "if F|_{Ω_i} is absolutely continuous at x_0 when x_0 ∈ Ω_i, and Ω_1 and Ω_2 are relatively open in Ω, then F is absolutely continuous at x_0" by mathematical induction on m.

1° m = 1. Without loss of generality, assume x_0 ∈ Ω_1. Since Ω_1 is open relative to Ω, then ∃a ∈ (0, ∞) ⊂ R such that (x_0 − a, x_0 + a) ∩ Ω_1 = (x_0 − a, x_0 + a) ∩ Ω.
By F |Ω1 being absolutely continuous at x0 , we can easily show that that F is absolutely continuous at x0 . This completes the first step in the induction process. 2◦ Assume that the result holds for m ≤ k − 1 ∈ N. 3◦ Consider the case when m = k ∈ {2, 3, . . .}. Without loss of generality, assume x0 ∈ Ω1 . ∀i0 ∈ J¯ := {1, . . . , m}, Fi0 : Di0 → Y be as defined in * let Di0 and  m−1  Definition 12.58. Let Di0 ,j := x ∈ R ¯ = πi0 (x0 ) and ∃x¯ ∈ Ωj · πi0 (x) + πi0 (x) ¯ = x and Fi0 ,j : Di0 ,j → Y be defined by Fi0 ,j (x) = F |Ωj (x), ¯ ∀x ∈ Di0 ,j , ¯ = πi0 (x0 ) and πi0 (x) ¯ = x, j = 1, 2. Fix any where x¯ ∈ Ωj is such that πi0 (x) j ∈ {1, 2}. Since Ωj ⊆ Ω is a relatively open rectangle, then Di0 ,j ⊆ Di0 is a relatively open rectangle. By F |Ωj being absolutely continuous at x0 when x0 ∈ Ωj , we have Fi0 ,j is absolutely continuous at πi0 (x0 ) when πi0 (x0 ) ∈ Di0 ,j . It is easy to

612

12 Differentiation and Integration

 see that Fi0 ,j = Fi0 D and Di0 ,1 ∪ Di0 ,2 = Di0 . By the inductive assumption, Fi0 i0 ,j is absolutely continuous at πi0 (x0 ) ∈ Di0 . Hence, F satisfies (i) of Definition 12.58. Since Ω1 is relative open in Ω, then ∃a ∈ (0, ∞) ⊂ R such that r◦x0 −a1m ,x0 +a1m ∩ Ω1 = r◦x0 −a1m ,x0 +a1m ∩ Ω. By F |Ω1 being absolutely continuous at x0 and (ii) of Definition 12.58, ∃aF ∈ (0, a) ⊂ R, ∀ ∈ (0, ∞) ⊂ R, ∃δ() ∈ (0, ∞) ⊂ R, ∀n ∈ N, ∀(x¯i )ni=1 , (x˜i )ni=1 ⊆ rx0 −aF 1m ,x0 +aF 1m ∩ Ω1 = rx0 −aF 1m ,x0 +aF 1m ∩ 

Ω with x¯i = x˜ i , ∀i ∈ {1, . . . , n}, (rx¯i ,x6˜i )ni=1 being6 pairwise disjoint, and -n -n 6 6 i=1 μBm (rx¯ i ,x˜ i ) < δ(), and we have i=1 ΔF (rx¯ i ,x˜ i ) < . Hence, F satisfies (ii) of Definition 12.58. Hence, F is absolutely continuous at x0 . This completes the induction process. By the arbitrariness of x0 , F is absolutely continuous. This completes the proof for Case 1. Case 2: Ω1 and Ω2 are relatively closed in Ω. Fix any x0 ∈ Ω. We will prove the statement that “if F |Ωi is absolutely continuous at x0 when x0 ∈ Ωi , and Ω1 and Ω2 are relatively closed in Ω, then F is absolutely continuous at x0 ” by mathematical induction on m. 1◦ m = 1. Without loss of generality, assume x0 ∈ Ω1 . We will further distinguish two exhaustive and mutually exclusive cases: Case 2a: x0 ∈ / Ω2 ; Case 2b: x0 ∈ Ω2 . Case 2a: x0 ∈ / Ω2 . Since Ω2 is relatively closed in Ω and x0 ∈ Ω \ Ω2 , then, by Proposition 4.10, ∃a ∈ (0, ∞) ⊂ R such that (x0 − a, x0 + a) ∩ Ω ⊆ Ω \ Ω2 ⊆ Ω1 . F |Ω1 is absolutely continuous at x0 implies that F is absolutely continuous at x0 . This subcase is proved. Case 2b: x0 ∈ Ω2 . Then, x0 ∈ Ω1 ∩ Ω2 . By F |Ωi being absolutely continuous at x0 , ∃ai ∈ (0, ∞) ⊂ R, ∀ ∈ (0, ∞) ⊂ R, ∃δi () ∈ (0, ai ] ⊂ R, ∀n ∈ N, ∀x0 − ai ≤ x1 ≤ x¯1 ≤ · · · ≤ xn ≤ x¯n ≤ x0 + ai with xj , x¯j ∈ Ωi , j = 1, . . . , n, and nj=1 (x¯j − xj ) < δi (), and we have nj=1 F (x¯j ) − F (xj ) < , i = 1, 2. Take a := min{a1 , a2 } ∈ (0, ∞) ⊂ R, ∀ ∈ (0, ∞) ⊂ R, and let δ() := min{δ1 (/2), δ2 (/2)} ∈ (0, a] ⊂ R. ∀n ∈ N, ∀x0 − a ≤ -x1 ≤ x¯1 ≤ · · · ≤ xn ≤ x¯n ≤ x0 + a with xj , x¯j ∈ Ω, j = 1, . . . , n, and nj=1 (x¯j − xj ) < δ(). Without loss of generality, assume ∃j0 ∈ {1, . . . , n − 1} such that x¯j0 ≤ x0 ≤ xj0 +1 . (In case, j0 does not exist, and add a pair of x0 ’s into the collection of xj ’s and x¯j ’s.) Note that x1 ∈ Ω = Ω1 ∪ Ω2 . Without loss of generality, assume x1 ∈ Ω1 . Then x1 , x¯1 , . . . 
, xj0 , x¯j0 ∈ Ω1 , -j since Ω1 is a rectangle (interval, since m = 1), and j0=1 (x¯j − xj ) < δ() ≤ -j δ1 (/2). Thus, j0=1 F (x¯j ) − F (xj ) < /2. By a similar argument, we have -n -n j =j0 +1 F (x¯ j ) − F (xj ) < /2. This implies that j =1 F (x¯ j ) − F (xj ) < . Hence, F is absolutely continuous at x0 . This subcase is proved. Hence, in both subcases, F is absolutely continuous at x0 . This completes the proof for the first step of the induction process. 2◦ Assume that the result holds for m ≤ k − 1 ∈ N. 3◦ Consider the case when m = k ∈ {2, 3, . . .}. Without loss of generality, assume ¯ x0 ∈ Ω1 . ∀i  i0 and Fi0 : Di0 → Y be as defined in Definition * 0 ∈ J , let D + 12.58. Let  m−1 Di0 ,j := x ∈ R ¯ = πi0 (x0 ) and πi0 (x) ¯ = x and Fi0 ,j :  ∃x¯ ∈ Ωj · πi0 (x)

12.5 Absolute and Lipschitz Continuity

613

Di0 ,j → Y be defined by Fi0 ,j (x) = F |Ωj (x), ¯ ∀x ∈ Di0 ,j , where x¯ ∈ Ωj is such that πi0 (x) ¯ = πi0 (x0 ) and πi0 (x) ¯ = x, j = 1, 2. Fix any j ∈ {1, 2}. Since Ωj ⊆ Ω is a relatively closed rectangle, then Di0 ,j ⊆ Di0 is a relatively closed rectangle. By F |Ωj being absolutely continuous at x0 when x0 ∈ Ωj , we have Fi0 ,j is absolutely  continuous at π (x0 ) when π (x0 ) ∈ Di ,j . It is easy to see that Fi ,j = Fi  i0

i0

0

0

0

Di0 ,j

and Di0 ,1 ∪ Di0 ,2 = Di0 . By the inductive assumption, Fi0 is absolutely continuous at πi0 (x0 ) ∈ Di0 . Hence, F satisfies (i) of Definition 12.58. We will further distinguish two exhaustive and mutually exclusives subcases: Case 2α: x0 ∈ / Ω2 ; Case 2β: x0 ∈ Ω2 . Case 2α: x0 ∈ / Ω2 . Then, x0 ∈ Ω \ Ω2 . Since Ω2 is relatively closed in Ω, by Proposition 4.10, ∃a ∈ (0, ∞) ⊂ R such that r◦x0 −a1m ,x0 +a1m ∩ Ω ⊆ Ω \ Ω2 ⊆ Ω1 . By F |Ω1 being absolutely continuous at x0 , ∃a¯ ∈ (0, a) ⊂ R, ∀ ∈ (0, ∞) ⊂ R, ∃δ() ∈ (0, ∞) ⊂ R, ∀n ∈ N, ∀(x¯i )ni=1 , (x˜i )ni=1 ⊆ rx0 −a1 ¯ m ,x0 +a1 ¯ m ∩ 

Ω1 = rx0 −a1 ∩ Ω with x¯i = x˜i , ∀i ∈ {1, . . . , n}, and (rx¯i ,x˜i )ni=1 being ¯ m ,x0 +a1 ¯ mn pairwise disjoint, and i=1 μBm (rx¯i ,x˜i ) < δ(), and we have ni=1 ΔF (rx¯i ,x˜i ) < . Hence, F satisfies (ii) of Definition 12.58. Then, F is absolutely continuous at x0 . This subcase is proved. Case 2β: x0 ∈ Ω2 . Then, x0 ∈ Ω1 ∩ Ω2 . Fix any i ∈ {1, 2}. By F |Ωi being absolutely continuous at x0 and (ii) of Definition 12.58, ∃ai ∈ (0,∞) ⊂ R, n ∀ ∈ (0, ∞) ⊂ R, ∃δi () ∈ (0, ∞) ⊂ R, ∀n ∈ N, ∀(x¯j )nj=1 , x˜j j =1 ⊆ #n "  rx0 −ai 1m ,x0 +ai 1m ∩ Ωi with x¯j = x˜j , ∀j ∈ {1, . . . , n}, rx¯j ,x˜j being pairwise j =1 6 -n -n 6 disjoint, and j =1 μBm (rx¯j ,x˜j ) < δi (), we have j =1 6ΔF (rx¯j ,x˜j )6 < . Take a := min{a1 , a2 } ∈ (0, ∞) ⊂ R, ∀ ∈ (0, n ∞) ⊂ R, let δ() := min{δ1 (/2), δ2 (/2)} ∈ R+ , ∀n ∈ N, ∀(x¯j )nj=1 , x˜j j =1 ⊆ rx0 −a1m ,x0 +a1m ∩ 

Ω with x¯ j = x˜j , ∀j ∈ {1, . . . , n}, (rx¯j ,x˜j )nj=1 being pairwise disjoint, and -n j =1 μBm (rx¯j ,x˜ j ) < δ(). ∀j ∈ {1, . . . , n}, since Ω = Ω1 ∪ Ω2 and Ω1 and Ω2 are relatively closed rectangles in the rectangle Ω, then rx¯j ,x˜j = rx¯j,1 ,x˜j,1 ∪ rx¯j,2 ,x˜j,2 , 

where x¯j,k , x˜j,k ∈ rx0 −a1m ,x0 +a1m ∩ Ωk , x¯j,k = x˜j,k , k = 1, 2, and the two sets in the union are disjoint. To see the above, note that rx¯j ,x˜j ∩ Ωk is a rectangle, k = 1, 2. If rx¯j ,x˜j ⊆ Ωl , for some l ∈ {1, 2}, then let x¯j,l = x¯j , x˜j,l = x˜j , x¯j,3−l = x0 , and x˜j,3−l = x0 . The desired result holds. Consider the remaining possibility, we must have x¯ j ∈ Ωl \ Ω3−l and x˜j ∈ Ω3−l \ Ωl , for some l ∈ {1, 2}. Without loss of generality, assume l = 1. Ω1 and Ω2 are m-dimensional rectangles and none contains the other, their union is a m-dimensional rectangle Ω, in which both Ω1 and Ω2 are relatively closed. The only possibility for this to happen is that πi (Ω1 ) = πi (Ω2 ) = πi (Ω), ∀i ∈ J¯ \ {i0 }, for some i0 ∈ J¯, and πi0 (Ω1 ) and πi0 (Ω2 ) are overlapping relatively closed intervals in πi0 (Ω). Let x¯j,1 = x¯j ∈ Ω1 \ Ω2 . Since rx¯j ,x˜j ⊆ Ω, then rx¯j ,x˜j ∩ Ω1 ⊆ Ω is a relatively closed rectangle with respect to rx¯j ,x˜j . Then, ∅ = rx¯j ,x˜j ∩ Ω1 =: rx¯j,1 ,x˜j,1 ⊆ rx¯j ,x˜j ⊆ Ω. This 

implies that x¯j,1 , x˜j,1 ∈ rx0 −a1m ,x0 +a1m ∩ Ω1 and x¯ j,1 = x˜ j,1 . Since x˜j ∈ Ω2 \ Ω1 , then ∅ = rx¯j ,x˜j ∩ Ω2 =: rxˆj ,x˜j ⊆ rx¯j ,x˜j ⊆ Ω, where xˆj ∈ rx0 −a1m ,x0 +a1m ∩ Ω2


and x̂j ≤ x̃j. Clearly, x̃j,1 ∈ Ω1 ∩ Ω2 since Ω2 is relatively closed in Ω, and x̂j ∈ Ω1 ∩ Ω2 since Ω1 is relatively closed in Ω. Then, x̂j ∈ rx̄j,x̃j ∩ Ω1 = rx̄j,1,x̃j,1 and x̂j ≤ x̃j,1. This implies that rx̂j,x̃j,1 ⊆ Ω1 ∩ Ω2. Let x̃j,2 = x̃j and let x̄j,2 ∈ rx̂j,x̃j,1 be such that, ∀l ∈ J̄,

πl(x̄j,2) = πl(x̂j) if πl(x̃j,1) = πl(x̃j), and πl(x̄j,2) = πl(x̃j,1) if πl(x̃j,1) < πl(x̃j).

Then, x̄j,2 ≤ x̃j,2 and x̄j,2, x̃j,2 ∈ Ω2 ∩ rx0−a1m,x0+a1m. Clearly, rx̄j,1,x̃j,1 ∩ rx̄j,2,x̃j,2 = ∅ and rx̄j,1,x̃j,1 ∪ rx̄j,2,x̃j,2 = rx̄j,x̃j, since rx̄j,1,x̃j,1 ∪ rx̂j,x̃j = rx̄j,x̃j. These x̄j,1, x̄j,2, x̃j,1, and x̃j,2 satisfy the conditions specified above.
Then, Σ_{j=1}^n μBm(rx̄j,x̃j) = Σ_{k=1}^2 Σ_{j=1}^n μBm(rx̄j,k,x̃j,k) < δ(ε), and hence Σ_{j=1}^n μBm(rx̄j,k,x̃j,k) < δ(ε) ≤ δk(ε/2), k = 1, 2, so that Σ_{j=1}^n ‖ΔF(rx̄j,k,x̃j,k)‖ < ε/2, k = 1, 2. This implies that Σ_{j=1}^n ‖ΔF(rx̄j,x̃j)‖ = Σ_{j=1}^n ‖ΔF(rx̄j,1,x̃j,1) + ΔF(rx̄j,2,x̃j,2)‖ ≤ Σ_{k=1}^2 Σ_{j=1}^n ‖ΔF(rx̄j,k,x̃j,k)‖ < ε, where the first

equality follows from Proposition 12.46. Hence, F satisfies (ii) of Definition 12.58. Therefore, F is absolutely continuous at x0 . This subcase is proved. In both subcases, F is absolutely continuous at x0 . This completes the induction process. By the arbitrariness of x0 , F is absolutely continuous. This completes the proof for Case 2. In both cases, F is absolutely continuous. This completes the proof of the proposition. ' & Definition 12.63 Let X := (X, ρX ) and Y := (Y, ρY ) be metric spaces, x0 ∈ X , and f : X → Y. f is said to be locally Lipschitz continuous at x0 (with Lipschitz constant L0 ∈ R+ ) if ∃δ ∈ R+ , ∀x1 , x2 ∈ BX (x0 , δ), we have ρY (f (x1 ), f (x2 )) ≤ L0 ρX (x1 , x2 ). f is said to be locally Lipschitz continuous on X if it is locally Lipschitz continuous at x, ∀x ∈ X . f is said to be Lipschitz continuous on X (with Lipschitz constant L ∈ R+ ) if ∀x1 , x2 ∈ X , we have ρY (f (x1 ), f (x2 )) ≤ LρX (x1 , x2 ). % Clearly, if f is locally Lipschitz continuous at x0 , then f is continuous at x0 ; if f is Lipschitz continuous on X , then it is uniformly continuous; if f is Lipschitz continuous on X , then it is locally Lipschitz continuous on X . Definition 12.64 Let X := (X, ρX ) and Z := (Z, ρZ ) be metric spaces, Y be a set, f : X × Y → Z, and x0 ∈ X . f is said to be locally Lipschitz continuous at x0 (with Lipschitz constant L0 ∈ R+ ) uniformly over Y if ∃δ ∈ R+ , ∀x1 , x2 ∈ BX (x0 , δ), ∀y ∈ Y , we have ρZ (f (x1 , y), f (x2 , y)) ≤ L0 ρX (x1 , x2 ). f is said to be locally Lipschitz continuous on X uniformly over Y if it is locally Lipschitz at x uniformly over Y , ∀x ∈ X . f is said to be Lipschitz continuous on X (with Lipschitz constant L ∈ R+ ) uniformly over Y if ∀x1 , x2 ∈ X , ∀y ∈ Y , we have ρZ (f (x1 , y), f (x2 , y)) ≤ LρX (x1 , x2 ). %
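The gap between the two notions in Definition 12.63 can be observed numerically. A minimal sketch (the map f(x) = x², the grid, and the helper function are illustrative assumptions, not part of the text):

```python
import itertools

# f(x) = x**2 is locally Lipschitz continuous at every x0 in R
# (Definition 12.63) but not Lipschitz continuous on all of R:
# the best local constant near x0 is about 2|x0|, which is unbounded.
def local_lipschitz_estimate(f, x0, delta, n=200):
    """Largest observed ratio |f(x1)-f(x2)| / |x1-x2| on B(x0, delta)."""
    pts = [x0 - delta + 2 * delta * i / n for i in range(n + 1)]
    return max(abs(f(x1) - f(x2)) / (x2 - x1)
               for x1, x2 in itertools.combinations(pts, 2))

f = lambda x: x * x
L_near_0 = local_lipschitz_estimate(f, 0.0, 0.5)
L_near_10 = local_lipschitz_estimate(f, 10.0, 0.5)
# The difference quotient of x**2 equals |x1 + x2| <= 2(|x0| + delta).
assert L_near_0 <= 2 * (0.0 + 0.5)
assert L_near_10 <= 2 * (10.0 + 0.5)
assert L_near_10 > 10 * L_near_0  # local constants grow with |x0|
```

The observed ratios stay below the analytic bound 2(|x0| + δ) on each ball while blowing up as the base point moves out, which is exactly the local/global distinction.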


Clearly, if f is Lipschitz continuous on X uniformly over Y, then it is locally Lipschitz continuous on X uniformly over Y.
Proposition 12.65 Let X := (X, ρX) be a compact metric space, Y be a set, y0 ∈ Y, Z := (Z, ρZ) be a metric space, and f : X × Y → Z be locally Lipschitz continuous on X uniformly over Y. Assume that ∀x ∈ X, 0 ≤ sup_{y∈Y} ρZ(f(x, y), f(x, y0)) < +∞. Then, f is Lipschitz continuous on X uniformly over Y.
Proof ∀x ∈ X, by the local Lipschitz continuity of f, ∃Lx ∈ R+, ∃δx ∈ R+, ∀x1, x2 ∈ Ox := BX(x, δx), ∀y ∈ Y, we have ρZ(f(x1, y), f(x2, y)) ≤ Lx ρX(x1, x2). Let Ôx := BX(x, δx/2). Then, X ⊆ ∪_{x∈X} Ôx, which is an open covering of X. By the compactness of X, there exists a finite set XN ⊆ X such that X ⊆ ∪_{x∈XN} Ôx. Let

LM := max{0, max_{x∈XN} Lx, (2/min_{x∈XN} δx) · max_{x̂1,x̂2∈XN} (sup_{y∈Y} ρZ(f(x̂1, y), f(x̂1, y0)) + ρZ(f(x̂1, y0), f(x̂2, y0)) + sup_{y∈Y} ρZ(f(x̂2, y), f(x̂2, y0)) + Lx̂1 δx̂1/2 + Lx̂2 δx̂2/2)} ∈ R+.

∀x1, x2 ∈ X, ∀y ∈ Y, ∃x̂1 ∈ XN such that x1 ∈ Ôx̂1. We will distinguish two exhaustive and mutually exclusive cases: Case 1: x2 ∈ Ox̂1; Case 2: x2 ∉ Ox̂1.
Case 1: x2 ∈ Ox̂1. Thus, x1, x2 ∈ Ox̂1 and ρZ(f(x1, y), f(x2, y)) ≤ Lx̂1 ρX(x1, x2) ≤ LM ρX(x1, x2).
Case 2: x2 ∉ Ox̂1. Thus, ρX(x2, x̂1) ≥ δx̂1 and ρX(x2, x1) ≥ ρX(x2, x̂1) − ρX(x1, x̂1) > δx̂1/2 ≥ (min_{x∈XN} δx)/2 > 0. ∃x̂2 ∈ XN such that x2 ∈ Ôx̂2. This implies that ρZ(f(x1, y), f(x2, y)) ≤ ρZ(f(x1, y), f(x̂1, y)) + ρZ(f(x̂1, y), f(x̂1, y0)) + ρZ(f(x̂1, y0), f(x̂2, y0)) + ρZ(f(x̂2, y0), f(x̂2, y)) + ρZ(f(x̂2, y), f(x2, y)) ≤ Lx̂1 ρX(x1, x̂1) + sup_{y∈Y} ρZ(f(x̂1, y), f(x̂1, y0)) + ρZ(f(x̂1, y0), f(x̂2, y0)) + sup_{y∈Y} ρZ(f(x̂2, y), f(x̂2, y0)) + Lx̂2 ρX(x2, x̂2) ≤ Lx̂1 δx̂1/2 + sup_{y∈Y} ρZ(f(x̂1, y), f(x̂1, y0)) + ρZ(f(x̂1, y0), f(x̂2, y0)) + sup_{y∈Y} ρZ(f(x̂2, y), f(x̂2, y0)) + Lx̂2 δx̂2/2 ≤ LM (min_{x∈XN} δx)/2 ≤ LM ρX(x1, x2).
Hence, in both cases, we have ρZ(f(x1, y), f(x2, y)) ≤ LM ρX(x1, x2). Therefore, f is Lipschitz continuous on X uniformly over Y. This completes the proof of the proposition. ' &
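Proposition 12.65's patching of local constants into one global constant can be sanity-checked in one dimension. The sketch below (with Y a singleton, f(x) = x² on the compact set [0, 1], and an assumed finite subcover of four balls) compares local difference-quotient bounds on the subcover against the global bound:

```python
# On the compact set [0, 1], f(x) = x**2 is globally Lipschitz even
# though its local constants vary; the global constant is dominated by
# the data gathered on a finite subcover, as in Proposition 12.65.
f = lambda x: x * x
pts = [i / 400 for i in range(401)]
global_L = max(abs(f(x1) - f(x2)) / (x2 - x1)
               for i, x1 in enumerate(pts) for x2 in pts[i + 1:])

# centers of an assumed finite subcover of [0, 1] by balls of radius 1/8
centers = [0.125, 0.375, 0.625, 0.875]
local_Ls = []
for c in centers:
    ball = [x for x in pts if abs(x - c) <= 0.125 + 1e-12]
    local_Ls.append(max(abs(f(x1) - f(x2)) / (x2 - x1)
                        for i, x1 in enumerate(ball) for x2 in ball[i + 1:]))

assert global_L <= 2.0            # sup |f'| = 2 on [0, 1]
assert max(local_Ls) <= global_L + 1e-9
```

Here the largest local constant on the subcover already matches the global one; in the general metric-space proof the extra terms in LM account for pairs of points lying in different balls.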

Proposition 12.66 Let X be a normed linear space over K, Y be a set, Z be a normed linear space over K, U ⊆ X be convex, and f : U × Y → Z be partial differentiable with respect to x. Assume that ∃L ∈ [0, ∞) ⊂ R such that ‖(∂f/∂x)(x, y)‖ ≤ L, ∀(x, y) ∈ U × Y. Then, f is Lipschitz continuous on U with Lipschitz constant L uniformly over Y.
Proof ∀x1, x2 ∈ U, ∀y ∈ Y, by Mean Value Theorem 9.23, ∃t ∈ (0, 1) ⊂ R such that ‖f(x1, y) − f(x2, y)‖ ≤ ‖(∂f/∂x)(tx1 + (1 − t)x2, y)‖ ‖x1 − x2‖ ≤ L‖x1 − x2‖. Hence, f is Lipschitz continuous on U with Lipschitz constant L uniformly over Y. This completes the proof of the proposition. ' &
Proposition 12.67 Let I ⊆ R be an interval, Y be a normed linear space over K1, Z be a normed linear space over K2, U ⊆ Y, f : I → U be absolutely continuous


at x0 ∈ I, and g : U → Z be locally Lipschitz continuous at y0 := f(x0) ∈ U. Then, g ∘ f is absolutely continuous at x0.
Proof By g being locally Lipschitz at y0, then ∃L0 ∈ [0, ∞) ⊂ R, ∃δg ∈ (0, ∞) ⊂ R, ∀y1, y2 ∈ U ∩ BY(y0, δg), we have ‖g(y1) − g(y2)‖ ≤ L0‖y1 − y2‖. Since f is absolutely continuous at x0, then it is continuous at x0 by Proposition 12.59. Then, ∃a0 ∈ (0, ∞) ⊂ R such that ∀x ∈ BR(x0, a0) ∩ I, we have f(x) ∈ U ∩ BY(y0, δg). By the absolute continuity of f at x0, ∃af ∈ (0, a0) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δf(ε) ∈ (0, af] ⊂ R, ∀n ∈ N, ∀x0 − af ≤ x1 ≤ x̄1 ≤ x2 ≤ x̄2 ≤ ··· ≤ xn ≤ x̄n ≤ x0 + af with xi, x̄i ∈ I, i = 1, . . . , n, and Σ_{i=1}^n (x̄i − xi) < δf(ε), we have Σ_{i=1}^n ‖f(x̄i) − f(xi)‖ < ε. For the function g ∘ f, take a = af ∈ (0, ∞) ⊂ R, and ∀ε ∈ (0, ∞) ⊂ R, let δ(ε) = δf(ε/(1 + L0)) ∈ (0, a] ⊂ R. ∀n ∈ N, ∀x0 − a ≤ x1 ≤ x̄1 ≤ x2 ≤ x̄2 ≤ ··· ≤ xn ≤ x̄n ≤ x0 + a with xi, x̄i ∈ I, i = 1, . . . , n, and Σ_{i=1}^n (x̄i − xi) < δ(ε), we have f(x̄i), f(xi) ∈ U ∩ BY(y0, δg), and Σ_{i=1}^n ‖g ∘ f(x̄i) − g ∘ f(xi)‖ ≤ Σ_{i=1}^n L0‖f(x̄i) − f(xi)‖ ≤ L0 ε/(1 + L0) < ε. Hence, g ∘ f is absolutely continuous at x0. This completes the proof of the proposition. ' &
Proposition 12.68 Let I ⊆ R be an interval, Y be a normed linear space over K, Z be a normed linear space over K, f : I → Y and g : I → Y be absolutely continuous at x0 ∈ I, A : I → B(Y, Z) be absolutely continuous at x0, and h : I → Z be defined by h(x) = A(x)f(x), ∀x ∈ I. Then, f + g and h are absolutely continuous at x0.
Proof By Propositions 12.61, 9.14, 12.67, and 12.66, we have that f + g is absolutely continuous at x0. By Propositions 12.61, 9.17, 12.67, and 12.66, we have that h is absolutely continuous at x0. This completes the proof of the proposition.
' & Proposition 12.69 Let m ∈ N, Rm be endowed with the usual positive cone, Ω ∈ BB(Rm) be a rectangle, Y be a normed linear space over K, Z be a normed linear space over K, F : Ω → Y be absolutely continuous at x0 ∈ Ω, and A ∈ B(Y, Z). Then, h : Ω → Z, defined by h(x) = AF(x), ∀x ∈ Ω, is absolutely continuous at x0.
Proof We will prove this result using mathematical induction on m:
1◦ m = 1. Clearly, the result follows from Proposition 12.68.
2◦ Assume that the result holds for m ≤ k ∈ N.
3◦ Consider the case when m = k + 1. We will prove that h is absolutely continuous at x0 by Definition 12.58. Fix any i0 ∈ J̄ := {1, . . . , m}, and let Di0, πi0, and Fi0 be defined as in (i) of Definition 12.58. Let hi0 : Di0 → Z be defined by hi0(x) = h(x̄) = AF(x̄) = AFi0(x), ∀x ∈ Di0 ⊆ Rk, ∀x̄ ∈ Ω with πi0(x̄) = πi0(x0) and πi0(x̄) = x. By the assumption, F is absolutely continuous at x0, and then Fi0 is absolutely continuous at πi0(x0). By the inductive assumption, hi0 is absolutely continuous at πi0(x0). Thus, (i) of Definition 12.58 holds.


Since F is absolutely continuous at x0, by (ii) of Definition 12.58, ∃a ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δ(ε) ∈ R+, ∀n ∈ N, ∀(x̄i)_{i=1}^n, (x̃i)_{i=1}^n ⊆ rx0−a1m,x0+a1m ∩ Ω with x̄i ≤ x̃i, ∀i ∈ {1, . . . , n}, (rx̄i,x̃i)_{i=1}^n being pairwise disjoint, and Σ_{i=1}^n μBm(rx̄i,x̃i) < δ(ε), we have Σ_{i=1}^n ‖ΔF(rx̄i,x̃i)‖ < ε/(1 + ‖A‖). It is easy to see that Δh(rx̄i,x̃i) = AΔF(rx̄i,x̃i), ∀i ∈ {1, . . . , n}. Then, Σ_{i=1}^n ‖Δh(rx̄i,x̃i)‖ ≤ ‖A‖ Σ_{i=1}^n ‖ΔF(rx̄i,x̃i)‖ < ε. Hence, (ii) of Definition 12.58 holds. Thus, h is absolutely continuous at x0. This completes the induction process and therefore the proof of the proposition. ' &
Proposition 12.70 Let X be a normed linear space over K1, Y be a set, Z be a normed linear space over K2, U ⊆ X be endowed with subset topology O, f : U × Y → Z, g : U → R be continuous, (ci)_{i∈Z} ⊂ R be a sequence of strictly increasing real numbers with lim_{i→−∞} ci = −∞ and lim_{i→∞} ci = ∞, and Ui := {x ∈ U | ci ≤ g(x) ≤ ci+1}, ∀i ∈ Z. Assume that:
(i) U is locally convex according to Definition 7.124.
(ii) f|Ui×Y is locally Lipschitz continuous on Ui uniformly over Y, ∀i ∈ Z.
Then, f is locally Lipschitz continuous on U uniformly over Y.
Proof Fix any x0 ∈ U = ∪_{i∈Z} Ui. Then, g(x0) ∈ R. We will distinguish two exhaustive and mutually exclusive cases: Case 1: ∃i ∈ Z such that ci < g(x0) < ci+1; Case 2: ∃i ∈ Z such that g(x0) = ci.
Case 1: ∃i ∈ Z such that ci < g(x0) < ci+1. Then, x0 ∈ Ui. Since g is continuous, then ∃δ1 ∈ (0, ∞) ⊂ R such that BX(x0, δ1) ∩ U ⊆ Ui. By f|Ui×Y being locally Lipschitz continuous on Ui uniformly over Y, we have f|Ui×Y is locally Lipschitz continuous at x0 uniformly over Y. By Definition 12.64, ∃L1 ∈ [0, ∞) ⊂ R, ∃δ ∈ (0, δ1] ⊂ R, ∀x1, x2 ∈ BX(x0, δ) ∩ Ui = BX(x0, δ) ∩ U, ∀y ∈ Y, we have ‖f(x1, y) − f(x2, y)‖ ≤ L1‖x1 − x2‖. Hence, f is locally Lipschitz continuous at x0 uniformly over Y.
Case 2: ∃i ∈ Z such that g(x0) = ci. Then, x0 ∈ Ui−1 ∩ Ui. By (i) and the continuity of g, ∃δx0 ∈ (0, ∞) ⊂ R such that U ∩ BX(x0, δx0) ⊆ Ui−1 ∪ Ui is convex.
By f |Uj ×Y being locally Lipschitz continuous on Uj uniformly over Y , we have f |Uj ×Y is locally Lipschitz continuous at x0 uniformly over Y , j = i −1, i. By   Definition 12.64, ∃Lj ∈ [0, ∞) ⊂ R, ∃δj ∈ (0, δx0 ] ⊂ R, ∀x1 , x2 ∈ BX x0 , δj ∩ Uj , ∀y ∈ Y , we have f (x1 , y) − f (x2 , y) ≤ Lj x1 − x2 , j = i − 1, i. Take L := Li−1 ∨ Li ∈ [0, ∞) ⊂ R and δ := δi−1 ∧ δi ∈ (0, δx0 ] ⊂ R. ∀x1 , x2 ∈ BX (x0 , δ) ∩ U =: D, ∀y ∈ Y , by Proposition 6.40, D is convex. If ∃j ∈ {i − 1, i} such that x1 , x2 ∈ Uj , then f (x1 , y) − f (x2 , y) ≤ Lj x1 − x2  ≤ Lx1 − x2 . On the other hand, without loss of generality, assume that x1 ∈ Ui−1 and x2 ∈ Ui . Consider the line segment connecting x1 and x2 . Claim 12.70.1 ∃t0 ∈ [0, 1] ⊂ R such that xt0 := t0 x1 + (1 − t0)x2 ∈ Ui−1 ∩ Ui ∩ D.


Proof of Claim Since D is convex, then xt := tx1 + (1 − t)x2 ∈ D ⊆ Ui−1 ∪ Ui, ∀t ∈ [0, 1] ⊂ R. Define t0 := sup{t ∈ [0, 1] ⊂ R | xt ∈ Ui} =: sup A. Clearly, 0 ∈ A and 0 ≤ t0 ≤ 1. Suppose t0 ∉ A. This leads to t0 ∈ (0, 1] ⊂ R. Then, ∃(tn)_{n=1}^∞ ⊆ [0, t0) ∩ A such that lim_{n∈N} tn = t0. ∀n ∈ N, by tn ∈ A, we have g(xtn) ≥ ci. Then, g(xt0) = lim_{n∈N} g(xtn) ≥ ci, xt0 ∈ Ui, and t0 ∈ A. This is a contradiction. Hence, t0 ∈ A and xt0 ∈ Ui. If t0 = 1, then xt0 = x1 ∈ Ui−1 and xt0 ∈ Ui−1 ∩ Ui. Hence, the claim holds. On the other hand, if t0 < 1, then ∀t ∈ (t0, 1] ⊂ R, t ∉ A and xt ∈ Ui−1. This implies that g(xt) ≤ ci, ∀t ∈ (t0, 1] ⊂ R. Then, g(xt0) = lim_{t→t0+} g(xt) ≤ ci and xt0 ∈ Ui−1. Hence, xt0 ∈ Ui−1 ∩ Ui. The claim holds as well. This completes the proof of the claim. ' &
By Claim 12.70.1, ‖f(x1, y) − f(x2, y)‖ ≤ ‖f(x1, y) − f(xt0, y)‖ + ‖f(xt0, y) − f(x2, y)‖ ≤ Li−1‖x1 − xt0‖ + Li‖x2 − xt0‖ = Li−1(1 − t0)‖x1 − x2‖ + Li t0‖x1 − x2‖ ≤ L‖x1 − x2‖. Hence, we have that ‖f(x1, y) − f(x2, y)‖ ≤ L‖x1 − x2‖ in either situation. Then, f is locally Lipschitz continuous at x0 uniformly over Y.
In both cases, we have that f is locally Lipschitz continuous at x0 uniformly over Y. By the arbitrariness of x0, f is locally Lipschitz continuous on U uniformly over Y. This completes the proof of the proposition. ' &
Definition 12.71 Let a, b ∈ R, I := ra∧b,a∨b ⊂ R be the semi-open interval with a and b as end points, I := ((I, |·|), B, μ) be the finite metric measure subspace of R, Y be a normed linear space, and f : I → Y be B-measurable. We will write

∫_a^b f(x) dx := ∫_a^b f dμB := ∫_I f dμB if b ≥ a, and := −∫_I f dμB if b < a,

whenever the right-hand side makes sense. %
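The sign convention of Definition 12.71, and the endpoint additivity it yields in Fact 12.72 below, can be illustrated for Y = R with a midpoint Riemann sum standing in for the integral with respect to μB (the quadrature and the choice f(t) = t² are assumptions):

```python
# Oriented integral in the sense of Definition 12.71: int_a^b f is the
# measure-theoretic integral over the interval when b >= a, and its
# negative when b < a, so reversing the endpoints flips the sign.
def oriented_integral(f, a, b, n=10000):
    lo, hi = min(a, b), max(a, b)
    h = (hi - lo) / n
    s = sum(f(lo + (i + 0.5) * h) for i in range(n)) * h
    return s if b >= a else -s

f = lambda t: t * t
fwd = oriented_integral(f, 0.0, 1.0)
bwd = oriented_integral(f, 1.0, 0.0)
assert abs(fwd - 1.0 / 3.0) < 1e-6   # int_0^1 t^2 dt = 1/3
assert abs(fwd + bwd) < 1e-12        # reversal flips the sign exactly

# additivity int_a^c + int_c^b = int_a^b even with c outside [a, b]
lhs = oriented_integral(f, 0.0, 2.0) + oriented_integral(f, 2.0, 0.5)
assert abs(lhs - oriented_integral(f, 0.0, 0.5)) < 1e-6
```

The last assertion corresponds to the case a ≤ b ≤ c of Fact 12.72, where the identity holds only because the backward integral carries a negative sign.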

By Lemma 11.73, the interval I may be ra∧b,a∨b, or the closed interval, or the open interval r°a∧b,a∨b, which will not change the value of the integral.
Fact 12.72 Let a, b, c ∈ R, I := ra∧b∧c,a∨b∨c ⊂ R, I := ((I, |·|), B, μ) be the finite metric measure subspace of R, Y be a separable Banach space, and f : I → Y be absolutely integrable over I. Then, ∫_a^c f(x) dx + ∫_c^b f(x) dx = ∫_a^b f(x) dx ∈ Y.
Proof We will distinguish six exhaustive and mutually exclusive cases: Case 1: a ≤ c ≤ b; Case 2: a ≤ b ≤ c; Case 3: b ≤ a ≤ c; Case 4: b ≤ c ≤ a; Case 5: c ≤ a ≤ b; Case 6: c ≤ b ≤ a.
Case 1: a ≤ c ≤ b. Let J1 := ra,c ⊂ R, J2 := rc,b ⊂ R. By Proposition 11.92, ∫_a^b f(x) dx = ∫_I f dμB = ∫_{J1} f dμB + ∫_{J2} f dμB = ∫_a^c f(x) dx + ∫_c^b f(x) dx ∈ Y.
Case 2: a ≤ b ≤ c. By Case 1, ∫_a^c f(x) dx = ∫_a^b f(x) dx + ∫_b^c f(x) dx. Then, ∫_a^b f(x) dx = ∫_a^c f(x) dx − ∫_b^c f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx.


Case 3: b ≤ a ≤ c. By Case 1, ∫_b^c f(x) dx = ∫_b^a f(x) dx + ∫_a^c f(x) dx. Then, ∫_a^b f(x) dx = −∫_b^a f(x) dx = ∫_a^c f(x) dx − ∫_b^c f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx.
Case 4: b ≤ c ≤ a. By Case 1, ∫_b^a f(x) dx = ∫_b^c f(x) dx + ∫_c^a f(x) dx. Then, ∫_a^b f(x) dx = −∫_b^a f(x) dx = −∫_b^c f(x) dx − ∫_c^a f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx.
Case 5: c ≤ a ≤ b. By Case 1, ∫_c^b f(x) dx = ∫_c^a f(x) dx + ∫_a^b f(x) dx. Then, ∫_a^b f(x) dx = ∫_c^b f(x) dx − ∫_c^a f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx.
Case 6: c ≤ b ≤ a. By Case 1, ∫_c^a f(x) dx = ∫_c^b f(x) dx + ∫_b^a f(x) dx. Then, ∫_a^b f(x) dx = −∫_b^a f(x) dx = ∫_c^b f(x) dx − ∫_c^a f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx.
This completes the proof of the fact. ' &
Proposition 12.73 Let m ∈ N, Rm be endowed with the usual positive cone, Ω ∈ BB(Rm) be a rectangle with the subset topology O, ((P(Ω), |·|), B, μ) be the σ-finite metric measure subspace of Rm, Y be a normed linear space over K, and F : Ω → Y be absolutely continuous. Then, the following statements hold:
(i) F is of locally bounded variation.
(ii) If, in addition, Y is a Banach space over K, then there exists a unique σ-finite Y-valued measure space (P(Ω), B, ν) such that ν admits F as a cumulative distribution function. Furthermore, P ∘ ν(rx1,x2) = TF(rx1,x2), ∀x1, x2 ∈ Ω with x1 ≤ x2, and P ∘ ν(P(Ω)) = TF(P(Ω)) and P ∘ ν ≪ μ.
Proof (i) By Proposition 12.52, Ω is a region and P(Ω) ∈ BB(Rm) is a rectangle. By Proposition 12.59, F is continuous. Then, (i) of Definition 12.41 holds. ∀x1, x2 ∈ Ω with x1 ≤ x2, consider the closed rectangle rx1,x2 ⊆ Ω. ∀x0 ∈ rx1,x2, F is absolutely continuous at x0. By (ii) of Definition 12.58, ∃ax0 ∈ (0, ∞) ⊂ R, ∀ε ∈ (0, ∞) ⊂ R, ∃δx0(ε) ∈ (0, ∞) ⊂ R, ∀n ∈ N, ∀(x̄i)_{i=1}^n, (x̃i)_{i=1}^n ⊆

rx0 −ax0 1m ,x0 +ax0 1m ∩ Ω with x¯i = x˜i , ∀i ∈ {1, . . . , n}, (rx¯i ,x˜i )ni=1 being pairwise -n -n disjoint, and  i=1 μ◦Bm (rx¯i ,x˜i ) < δx0 (), we have i=1 ΔF (rx¯i ,x˜i ) < . Then, rx1 ,x2 ⊆ x∈rx1 ,x2 rx−ax 1m ,x+ax 1m . By the compactness of rx1 ,x2 , ∃ a finite set  N ⊆ rx1 ,x2 such that rx1 ,x2 ⊆ x∈N r◦x−ax 1m ,x+ax 1m . Let 0 := 1 ∈ R, a := maxx∈N ax ∈ (0, ∞) ⊂ R, and δ := minx∈N δx (0 ) ∈ (0, ∞) ⊂ R. ∀n ∈ Z+ ,   ∀(xˇi )ni=1 , (xˆi )ni=1 ⊆ rx1 ,x2 with xˇi = xˆi , ∀i ∈ {1, . . . , n}, ni=1 rxˇi ,xˆi = rx1 ,x2 , and sets in the union being pairwise disjoint, we have, by Proposition 12.46, .

nx n n 6.. 6 . 6 6 . 6 6 6ΔF (rxˇ ,xˆ )6 = Δ (r ) 6 F x ˇ , x ˆ i i i,x,j i,x,j 6 i=1

i=1



x∈N j =1

nx 6 n . 6 .. 6 6 6ΔF (rxˇi,x,j ,xˆi,x,j )6 x∈N i=1 j =1


where nx ∈ Z+ , ∀x ∈ N, (xˇi,x,j )x∈N, j ∈{1,...,nx } , (xˆi,x,j )x∈N, j ∈{1,...,nx } ⊆ rxˇi ,xˆi ,  x   xˇi,x,j = xˆ i,x,j , ∀x ∈ N, ∀j ∈ {1, . . . , nx }, x∈N nj =1 rxˇi,x,j ,xˆi,x,j = rxˇi ,xˆi , the sets nx in the union are pairwise disjoint, and j =1 rxˇi,x,j ,xˆi,x,j ⊆ rx−ax 1m ,x+ax 1m , ∀x ∈ N, i = 1, . . . , n. ∀x ∈ N, by further partition each rxˇi,x,j ,xˆi,x,j into n¯ x , where n¯ x = P Q 1 + 2m axm /δx (0 ) , equal measure and pairwise disjoint rectangles rxˇi,x,j ,xˆi,x,j = n¯ x k=1 rxˇ i,x,j,k ,xˆ i,x,j,k , we have .

nx 6 n n . 6 . 6 .. 6 6 6 6ΔF (rxˇ ,xˆ )6 ≤ (r ) 6Δ F x ˇ , x ˆ i i i,x,j i,x,j 6 x∈N i=1 j =1

i=1

=

n¯ x nx 6 . n . 6 .. 6 6 ΔF (rxˇi,x,j,k ,xˆi,x,j,k )6 6 x∈N i=1 j =1



k=1

nx . n¯ x 6 n . 6 .. 6 6 6ΔF (rxˇi,x,j,k ,xˆi,x,j,k )6 x∈N i=1 j =1 k=1

=

n¯ x . nx 6 n . 6 .. 6 6 6ΔF (rxˇi,x,j,k ,xˆi,x,j,k )6 x∈N k=1 i=1 j =1

Note that, ∀k = 1, . . . , n¯ x , nx n . . .

i=1 j =1

μBm (rxˇi,x,j,k ,xˆi,x,j,k )
0, ∀U ∈ I. We will say that I covers E in the sense of Vitali with index c ∈ (0, √1m ] ⊂ R if ∀ ∈ (0, ∞) ⊂ R, ∀x ∈ E, ∃U ∈ I such that x ∈ U , dia(U ) < , and the shortest side of the rectangle U has length at least c dia(U ). % Lemma 12.78 (Vitali) Let m ∈ N, Rm be endowed with the usual positive cone, E ⊆ Rm with μLmo (E) < +∞, and I ⊆ BB (Rm ) be a collection of nondegenerate rectangles in Rm that covers E in the sense of Vitali with index c ∈ (0, √1m ] ⊂ R. Then, ∀ ∈ (0, ∞) ⊂ R, there exists a finite pairwise  disjoint subcollection {U1 , . . . , Un } ⊆ I, where n ∈ Z+ , such that μLmo (E \ ( ni=1 Ui )) < . Proof Let Iˆ := {U ⊆ Rm | U ∈ I}. Then, Iˆ is a collection of nondegenerate closed rectangles in Rm that covers E in the sense of Vitali with index c. We will ˆ Fix any  ∈ (0, ∞) ⊂ R. By Example 12.56, first show that the result holds for I. -∞  μLmo (E) = inf(Oi )∞ ∈ORm , E⊆ ∞ i=1 μBm (Oi ) < +∞. Then, ∃O ∈ ORm i=1 Oi i=1 ˆ we may, such that E ⊆ O and μBm (O) < +∞. By neglecting rectangles in I, without loss of generality, assume that U ⊆ O and the shortest side of U has length ˆ ∀k ∈ Z+ , assume that pairwise disjoint U1 , . . . , Uk ∈ Iˆ at least c dia(U ), ∀U ∈ I. has already be chosen.  We will distinguish two  exhaustive and mutually exclusive  cases: Case 1: E ⊆ ki=1 Ui ; Case 2: E \ ( ki=1 Ui ) = ∅. Case 1: E ⊆ ki=1 Ui . k k ˆ Case 2: E \( Then, μLmo (E \( i=1 Ui )) = 0. The result holds for I. i=1 Ui ) = ∅. ˆ Note that, for any closed rectangle U ∈ I with dia(U ) =: p¯ > 0, let x0 ∈ U be the center of U , such that rx0 − 1 cp1 1 ¯ m ⊆ U ⊆ rx0 − 12 p1 ˆ m ,x0 + 12 p1 ˆ m , where 2 ¯ m ,x0 + 2 cp1 J 2 pˆ := 1 − (m − 1)c p¯ is the upper bound for the longest side of the rectangle U . This implies that cm p¯ m ≤ μBm (U ) ≤ pˆ m = (1 − (m − 1)c2 )m/2 p¯ m . Let lk be the supremum of the diameters of rectangles in Iˆ that do not intersect U1 , . . . , Uk . √ m   μBm (O) < +∞. Then, ∃x0 ∈ E \ ( ki=1 Ui ). 
Since ki=1 Ui is Clearly, lk ≤ c closed, then, by Proposition 4.10, d := infx∈k Ui |x − x0 | > 0. Then, by the i=1  assumption, ∃Uˆ ∈ Iˆ such that x0 ∈ Uˆ and dia(Uˆ ) < d. Then, Uˆ ∩ ( ki=1 Ui ) = ∅. This shows that lk ≥ dia(Uˆ ) > 0. Hence, ∃Uk+1 ∈ Iˆ such that U1 , . . . , Uk+1 ∈ Iˆ are pairwise disjoint and dia(Uk+1 ) ≥ lk /2. Inductively, we either have the result ˆ holds for Iˆ or ∃(Uk )∞ k=1 ⊆ I such that dia(Uk+1 ) ≥ lk /2, ∀k ∈ Z+ , and the sequence is pairwise disjoint.


∞ In the latter case, we have O ⊇ k=1 Uk and +∞ > μBm (O) ≥ ∞ ∞ ∞  . . Uk ) = μBm (Uk ) ≥ 2−m cm lkm . Note that (lk )∞ μBm ( k=0 is nonincreasing. k=1 k=1 k=0 -∞ Hence, limk∈N lk = 0. Then, ∃n ∈ N such that i=n+1 μBm (Ui ) < m ( +1), where we have referred to the special Gamma function. π −m/2 2m 5−m cm Γ  2 Let R := E \ ( nk=1 Uk ). Without loss of generality, assume  dia(Uk ) =: pk ∈ (0, ∞) ⊂ R and the center of Uk is xk , ∀k ∈ N. ∀x¯ ∈ R, since nk=1 Uk is closed, ¯ > 0. By the assumption, then, by Proposition 4.10, d := infx∈nk=1 Uk |x − x|  ∃U ∈ Iˆ such that x¯ ∈ U and 0 < dia(U ) < d. Then, U ∩ ( nk=1 Uk ) = ∅. By the fact that limk∈N lk = 0, we have ∃i0 ∈ {n + 1, n + 2, . . .} such that U ∩ Ui0 = ∅ and U ∩ Uk = ∅, ∀k ∈ {1, . . . , i0 − 1}. Then, dia(U ) ≤ li0 −1 ≤ 2 dia(Ui0 ) = 2pi0 . Let xˆ ∈ U ∩ Ui0 . Then, |x¯ − xi0 | ≤ |x¯ − x| ˆ + |xˆ − xi0 | ≤ dia(U ) + 12 pi0 ≤ 52 pi0 . Then, x¯ ∈ Vi0 := B Rm (xi0 , 52 pi0 ) ⊂ Rm . Then,  ∞ ∞ 5 m x¯ ∈ ∞ k=n+1 Vk := k=n+1 B Rm (xk , 2 pk ) ⊂ R . Hence, R ⊆ k=n+1 Vk . Then, ∞ -∞ -∞ π m/2 2−m 5m p m μLmo (R) ≤ μLmo ( k=n+1 Vk ) ≤ k=n+1 μBm (Vk ) = k=n+1 Γ ( m +1) k = 2 -∞ π m/2 2−m 5m c−m -∞ m pm ≤ π m/2 2−m 5m c−m · c μ (U ) < , where m m Bm k k=n+1 k=n+1 k Γ ( 2 +1) Γ ( 2 +1) the equality follows from (Mathematics Handbook Editors Group, 1979, pp. 320). ˆ Hence, the result holds for I.  Then, ∃n ∈ Z+ , ∃{U1 , . . . , Un } ⊆ I such that μLmo (E \ ( ni=1 Ui )) < .  Note that μ (J ) := μLmo (( ni=1Ui ) \ ( ni=1 Ui )) = 0. Then, Lmo  J ∈ BL and n μLmo (E \(i=1 Ui )) = μLmo ((E \( ni=1 Ui ))∩J )+μLmo ((E \( ni=1 Ui ))\J ) ≤ μLmo (E \( ni=1 Ui )) < , where the equality follows  from Definition 11.15 and the first inequality follows from fact that μLmo ((E\ ( ni=1 Ui )) ∩ J ) ≤ μLmo (J ) = 0 and the fact that (E \ ( ni=1 Ui )) \ J = E \ ( ni=1 Ui ). This completes the proof of the lemma. ' &
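The greedy selection step inside the proof of Lemma 12.78 can be sketched in one dimension. The cover by closed dyadic intervals below is an illustrative assumption; with only finitely many levels the selection covers E = [0, 1] only up to the small gaps left at interval endpoints, mirroring the role of ε in the lemma:

```python
# Greedy Vitali-style selection: repeatedly pick an interval, disjoint
# from those already chosen, whose length is at least half the largest
# length still available (as in the choice dia(U_{k+1}) >= l_k / 2).
intervals = [(k * 2.0 ** -j, (k + 1) * 2.0 ** -j)
             for j in range(1, 9) for k in range(2 ** j)]

chosen = []

def disjoint_from_chosen(iv):
    # closed intervals: sharing an endpoint counts as intersecting
    return all(iv[1] < a or b < iv[0] for a, b in chosen)

avail = intervals[:]
while avail:
    sup_len = max(b - a for a, b in avail)
    pick = next(iv for iv in avail if iv[1] - iv[0] >= sup_len / 2)
    chosen.append(pick)
    avail = [iv for iv in avail if disjoint_from_chosen(iv)]

chosen.sort()
assert all(b1 < a2 for (_, b1), (a2, _) in zip(chosen, chosen[1:]))
covered = sum(b - a for a, b in chosen)  # chosen are pairwise disjoint
assert covered > 0.75   # e.g. [0, 1/2] and [3/4, 1] are selected early
assert covered < 1.0    # finitely many levels leave small gaps
```

Because the cover here stops at a finite level, the leftover measure cannot be driven below every ε; the lemma's hypothesis of arbitrarily small sets at every point is what removes this limitation.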

Definition 12.79 Let m ∈ N, Rm be endowed with the usual positive cone, E ∈ ORm with μBm(E) > 0, E := ((E, |·|), B, μ) be the σ-finite metric measure subspace of Rm as defined in Proposition 11.29, Y be a separable normed linear space over K, c ∈ (0, 1/√m] ⊂ R, and f ∈ L̄1(E, Y). A point x ∈ E is said to be a rectangular Lebesgue point with regularity c of f if

lim sup_{r→0+} sup_{rx1,x2 ⊂ Rm with (1/2)(x1+x2) = x, |x2−x1| = r, min(x2−x1) ≥ cr} (1/μ(rx1,x2 ∩ E)) ∫_{rx1,x2∩E} ‖f(y) − f(x)‖ dμ(y) = 0. %

Proposition 12.80 Let m ∈ N, Rm be endowed with the usual positive cone, E ∈ ORm with μBm (E) > 0, E := ((E, |·|), B, μ) be the σ -finite metric measure subspace of Rm as defined in Proposition 11.29, Y be a separable normed linear space over K, c ∈ (0, √1m ] ⊂ R, f ∈ L¯ 1 (E, Y), and A := {x ∈ E | x is a rectangular Lebesgue point of f with regularity c}. Then, A ∈ BLm and μLm (E \ A) = 0.

12.6 Fundamental Theorem of Calculus


Proof Since E is σ-finite, then ∃(En)_{n=1}^∞ ⊆ B such that ∪_{n=1}^∞ En = E and μ(En) < +∞, ∀n ∈ N. ∀r ∈ (0, ∞) ⊂ R, define Tr,c f : E → [0, ∞) ⊂ R and Mr,c f : E → [0, ∞) ⊂ R by, ∀x ∈ E,

(Tr,c f)(x) = sup_{rx1,x2 ⊂ Rm with (1/2)(x1+x2) = x, |x2−x1| = r, min(x2−x1) ≥ cr} (1/μ(rx1,x2 ∩ E)) ∫_{rx1,x2∩E} ‖f(y) − f(x)‖ dμ(y),
(Mr,c f)(x) = sup_{rx1,x2 ⊂ Rm with (1/2)(x1+x2) = x, |x2−x1| = r, min(x2−x1) ≥ cr} (1/μ(rx1,x2 ∩ E)) ∫_{rx1,x2∩E} P ∘ f dμ.

∀x ∈ E, since E ∈ ORm, then ∃δ ∈ (0, ∞) ⊂ R such that r°x−δ1m,x+δ1m ⊆ E. ∀rx1,x2 ⊂ Rm with x = (1/2)(x1 + x2), |x2 − x1| = r > 0 and min(x2 − x1) ≥ cr, we have μ(rx1,x2 ∩ E) = μBm(rx1,x2 ∩ E) ≥ μBm(rx1,x2 ∩ r°x−δ1m,x+δ1m) ≥ 2^{−m}((2δ) ∧ (cr))^m > 0. Then, (Mr,c f)(x) ≤ 2^m((2δ) ∧ (cr))^{−m}‖f‖1 < ∞ and (Tr,c f)(x) ≤ (Mr,c f)(x) + ‖f(x)‖ < ∞. Hence, Tr,c f and Mr,c f are well-defined. Define Tc f : E → [0, ∞] ⊂ Re and Mc f : E → [0, ∞] ⊂ Re by

(Tc f)(x) = lim sup_{r→0+} (Tr,c f)(x),  (Mc f)(x) = lim sup_{r→0+} (Mr,c f)(x);  ∀x ∈ E.

Clearly, Tc f and Mc f are well-defined. Then, Tr,c and Mr,c are functions of L̄1(E, Y) to nonnegative real-valued functions of E, and Tc and Mc are functions of L̄1(E, Y) to nonnegative extended real-valued functions of E. Then, A = {x ∈ E | (Tc f)(x) = 0} and E \ A = {x ∈ E | (Tc f)(x) > 0} = ∪_{n=1}^∞ ∪_{k=1}^∞ An,k := ∪_{n=1}^∞ ∪_{k=1}^∞ {x ∈ En | (Tc f)(x) > 1/k}.
We will show that μLmo(An,k) = 0, ∀n ∈ N, ∀k ∈ N, which implies that An,k ∈ BLm and μLm(An,k) = 0, ∀n ∈ N, ∀k ∈ N. This further implies that E \ A ∈ BLm, μLm(E \ A) = 0, and A ∈ BLm.
Fix any n ∈ N and any k ∈ N. Suppose μLmo(An,k) > 0. By monotonicity of outer measures, we have μLmo(An,k) ≤ μLmo(En) = μLm(En) = μ(En) < +∞. Let ε0 = μLmo(An,k)/(5k) ∈ (0, +∞) ⊂ R. By Propositions 11.182 and 4.11, there exists a continuous function g : E → Y such that g ∈ L̄1(E, Y) and ‖g − f‖1 < ε0. Let h := f − g. Then, ‖h‖1 < ε0. Note that (Tr,c f)(x) ≤ (Tr,c g)(x) + (Tr,c h)(x), ∀r ∈ (0, ∞) ⊂ R, ∀x ∈ E. Then, by Proposition 3.85, (Tc f)(x) ≤ (Tc g)(x) + (Tc h)(x), ∀x ∈ E. Since g is continuous, we have (Tc g)(x) = 0, ∀x ∈ E. Then, (Tc f)(x) ≤ (Tc h)(x), ∀x ∈ E. Note that

(Tr,c h)(x) ≤ (Mr,c h)(x) + ‖h(x)‖;  ∀r ∈ (0, ∞) ⊂ R, ∀x ∈ E


Then, by Proposition 3.85, we have (Tc h)(x) ≤ (Mc h)(x) + ‖h(x)‖, ∀x ∈ E. Hence, (Tc f)(x) ≤ (Mc h)(x) + ‖h(x)‖, ∀x ∈ E. Then, An,k ⊆ {x ∈ En | (Mc h)(x) > 1/(2k)} ∪ {x ∈ En | ‖h(x)‖ > 1/(2k)} =: Ãn,k ∪ Ān,k. By monotonicity of outer measures, we have μLmo(Ān,k) ≤ μLmo({x ∈ E | ‖h(x)‖ > 1/(2k)}) =: μLmo(Âk) = μBm(Âk) = ∫_E χ_{Âk,E} dμ ≤ ∫_E (2k P ∘ h) dμ = 2k‖h‖1 < 2kε0, where the second equality follows from the fact that Âk ∈ B since h is B-measurable. By countable subadditivity of outer measures, we have μLmo(An,k) ≤ μLmo(Ãn,k) + μLmo(Ān,k) < μLmo(Ãn,k) + 2kε0. Hence, μLmo(Ãn,k) > 3kε0. Clearly, μLmo(Ãn,k) ≤ μLmo(En) < +∞.
Define I := {rx1,x2 ⊆ Rm | x = (1/2)(x1 + x2) ∈ E, |x2 − x1| =: r > 0, min(x2 − x1) ≥ cr, ∫_{rx1,x2∩E} P ∘ h dμ > (1/(2k)) μ(rx1,x2 ∩ E)}. Then, I covers Ãn,k in the sense of Vitali with index c. By Vitali's Lemma 12.78, there exists pairwise disjoint rectangles (rxi,1,xi,2)_{i=1}^{m̄} ⊆ I with m̄ ∈ Z+ such that μLmo(Ãn,k \ (∪_{i=1}^{m̄} rxi,1,xi,2)) < ε0. Let B = (∪_{i=1}^{m̄} rxi,1,xi,2) ∩ E ∈ B. Then, μ(B) = μLmo(B) ≥ μLmo(B ∩ Ãn,k) = μLmo(Ãn,k) − μLmo(Ãn,k \ B) > (3k − 1)ε0, where the first inequality follows from the monotonicity of outer measures and the second equality follows from the measurability of B. Then, ‖h‖1 = ∫_E P ∘ h dμ ≥ ∫_B P ∘ h dμ = Σ_{i=1}^{m̄} ∫_{rxi,1,xi,2∩E} P ∘ h dμ > Σ_{i=1}^{m̄} (1/(2k)) μ(rxi,1,xi,2 ∩ E) = (1/(2k)) μ(B) > ((3k − 1)/(2k)) ε0 ≥ ε0 > ‖h‖1. This is a contradiction. Therefore, we must have μLmo(An,k) = 0. This completes the proof of the proposition.

' &
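Definition 12.79 specializes in one dimension, and for intervals centered at x (a special case of the rectangles allowed there), to the statement that averaged deviations from f(x) shrink to 0. A numerical sketch at a continuity point (the cubic f, the shrinking radii, and the midpoint grids are assumptions):

```python
# At a continuity point x of f, the average of |f(y) - f(x)| over
# [x - r, x + r] tends to 0 as r -> 0+, so x is a (rectangular)
# Lebesgue point; here the average is approximated by a midpoint sum.
def avg_deviation(f, x, r, n=2000):
    h = 2.0 * r / n
    return sum(abs(f(x - r + (i + 0.5) * h) - f(x)) for i in range(n)) / n

f = lambda t: t ** 3
devs = [avg_deviation(f, 0.5, 10.0 ** -k) for k in range(1, 5)]
assert all(d2 < d1 for d1, d2 in zip(devs, devs[1:]))  # averages shrink
assert devs[-1] < 1e-3  # essentially 0 at radius r = 1e-4
```

For discontinuous f ∈ L̄1 the same limit still holds at almost every x, which is the content of Proposition 12.80; the continuity assumed here only makes the convergence visible at every point.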

In the rest of the book, we will denote the ith unit vector in Rm by em,i, ∀m ∈ N, ∀i ∈ {1, . . . , m}.
Proposition 12.81 Let m ∈ N, Rm be endowed with the usual positive cone, Y be a separable normed linear space over K, Ω ∈ BB(Rm) be an open set, and F : Ω → Y be continuous on the right (or continuous on the left) and BB(Rm)-measurable. Then, the following statements hold:
(i) ∀i ∈ {1, . . . , m}, let DiF : Ui → B(K, Y) denote the partial derivative of F with respect to its ith coordinate in its domain and Ui := dom(DiF). Then, Ui ∈ BB(Rm) and DiF is BB(Rm)-measurable.
(ii) Let F(1) : U → B(Rm, Y) be the Fréchet derivative of F, whose domain is U := dom(F(1)) ⊆ ∩_{i=1}^m Ui ⊆ Ω. Then, U ∈ BB(Rm) and F(1) is BB(Rm)-measurable.
Proof (i) Fix any i ∈ {1, . . . , m}. Consider the case that F is continuous on the right. Let X := K, and ∀x := (x1, . . . , xm) ∈ Ω, let Di,x := {y ∈ R | (x1, . . . , xi−1, y, xi+1, . . . , xm) ∈ Ω} ∈ BB(R), which is an open set. ∀s ∈ Di,x, clearly span(ADi,x(s)) = K = X. Then, DiF : Ui → B(K, Y), where Ui := dom(DiF) ⊆ Ω. Clearly, B(K, Y) is isometrically isomorphic to Y. Then, DiF : Ui → Y. Clearly, ∀x ∈ Ui, DiF(x) = lim_{h→0} (F(x + h em,i) − F(x))/h ∈ Y.


Claim 12.81.1 Ui

=

∞   n=1



629



Ui,n,h1 ,h2

=: U¯ i

  ∈ B B Rm ,

δ∈Q h1 ∈Q h2 ∈Q δ>0 0 = > = F (1) (x) = D1 F (x) · · · Dm F (x) = (D1 F )|U (x) · · · (Dm F )|U (x)

.

By Propositions 11.38, 11.39, 11.41, and 11.139, F (1) is BB (Rm )-measurable. The case when F is continuous on the left can be proved similarly. This completes the proof of the proposition.

' &

Theorem 12.82 Let I := [a, b] ⊂ R with a, b ∈ R and a < b, Y be a separable Banach space over K, f : I → Y be absolutely integrable over I with respect to μB, and F : I → Y be defined by F(x) = ∫_a^x f(t) dt, ∀x ∈ I. Assume that f is continuous at x0 ∈ I. Then, F is Fréchet differentiable at x0 and F(1)(x0) = f(x0). (Note that, when K = C, I is viewed as a subset of C in calculations of F(1).)
Proof Clearly, F is well-defined by Proposition 11.92. Note that span(AI(x0)) = K. ∀ε ∈ (0, ∞) ⊂ R, by the continuity of f at x0, ∃δ ∈ (0, ∞) ⊂ R, ∀x ∈ I ∩ BR(x0, δ), we have ‖f(x) − f(x0)‖ < ε. Then,

‖F(x) − F(x0) − f(x0)(x − x0)‖ = ‖∫_a^x f(t) dt − ∫_a^{x0} f(t) dt − ∫_{x0}^x f(x0) dt‖
= ‖∫_{x0}^x f(t) dt − ∫_{x0}^x f(x0) dt‖ = ‖∫_{x0}^x (f(t) − f(x0)) dt‖
= ‖∫_{rx0∧x,x0∨x} (f − f(x0)) dμB‖ ≤ ∫_{rx0∧x,x0∨x} P ∘ (f − f(x0)) dμB
≤ ε μB(rx0∧x,x0∨x) = ε|x − x0|

where the second equality follows from Fact 12.72, the third equality and the first inequality follow from Proposition 11.92, and the second inequality follows from Proposition 11.78. Hence, F(1)(x0) = f(x0). This completes the proof of the theorem. ' &
Theorem 12.83 Let I := [a, b] ⊂ R with a, b ∈ R and a < b, I := ((I, |·|), B, μ) be the finite complete metric measure subspace of R, Y be a Banach space over K, and F : I → Y be C1. (Note that, when K = C, I is viewed as a subset of C in calculations of F(1).) Then, F(1) : I → Y is absolutely integrable over I and F(b) − F(a) = ∫_a^b F(1)(t) dt = ∫_a^b F(1) dμB.


Proof Since F(1) : I → Y is continuous, then, by Proposition 11.37, F(1) is B-measurable. Since I is compact, then ∃M ∈ [0, ∞) ⊂ R such that ‖F(1)(x)‖ ≤ M, ∀x ∈ I. By Proposition 11.78, we have F(1) is absolutely integrable over I. By Proposition 7.126, F(1) : I → N := span(F(1)(I)) ⊆ Y and N is a separable Banach subspace of Y. Define Fa : I → N ⊆ Y by Fa(x) = ∫_a^x F(1)(t) dt = ∫_a^x F(1) dμB, ∀x ∈ I. By Proposition 11.92, Fa is well-defined. By Theorem 12.82, Fa is Fréchet differentiable and Fa(1) = F(1). Then, define g : I → Y by g = F − Fa. Then, by Proposition 9.15, g is Fréchet differentiable and g(1)(x) = ϑY, ∀x ∈ I. By Mean Value Theorem 9.23, ‖g(x) − g(a)‖ ≤ 0, ∀x ∈ I. Hence, g(x) = g(a) = F(a), ∀x ∈ I. Then, g(b) = F(a) = F(b) − Fa(b) = F(b) − ∫_a^b F(1)(t) dt. Therefore, F(b) − F(a) = ∫_a^b F(1)(t) dt. This completes the proof of the theorem. ' &
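Theorems 12.82 and 12.83 can be checked numerically for Y = R. The sketch below (with f = cos and midpoint-rule quadrature as assumptions) verifies that differentiating F(x) = ∫_0^x f(t) dt recovers f at a continuity point, and that F(b) − F(a) equals the integral of the derivative:

```python
import math

# F(x) = int_0^x cos(t) dt; a central difference approximates F'(x0),
# which Theorem 12.82 says equals f(x0); Theorem 12.83 says
# F(b) - F(a) = int_a^b F'(t) dt for a C1 function F.
def integral(f, a, b, n=20000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = math.cos
F = lambda x: integral(f, 0.0, x)

x0, h = 0.7, 1e-4
central_diff = (F(x0 + h) - F(x0 - h)) / (2 * h)
assert abs(central_diff - f(x0)) < 1e-5          # Theorem 12.82

a, b = 0.2, 1.5
assert abs((F(b) - F(a)) - integral(f, a, b)) < 1e-6  # Theorem 12.83
```

Both identities hold exactly for the Bochner integral; the tolerances above only absorb quadrature and finite-difference error.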

Proposition 12.84 Let m ∈ N, R^m be endowed with the usual positive cone, Ω ∈ B_B(R^m) be a region, X := ((P(Ω), |·|), B, μ) be the σ-finite metric measure subspace of R^m, Y be a finite-dimensional Banach space over K, f : P(Ω) → Y be B_B(R^m)-measurable, J̄ := {1, . . . , m}, and x0 := (x_{0,1}, . . . , x_{0,m}) ∈ Ω. Then, we may define the σ-finite Y-valued measure ν with kernel f over X according to Proposition 11.116. By Proposition 11.167, f is the unique Radon–Nikodym derivative of ν with respect to μ. Assume that, ∀x_1, x_2 ∈ Ω with x_1 ≠ x_2 and r_{x_1,x_2} ⊆ Ω, we have f is bounded over r_{x_1,x_2} ⊆ P(Ω). Let F : Ω → Y be a cumulative distribution function of ν. Then, ∀x := (x_1, . . . , x_m) ∈ Ω with r_{x̂,x̌} ⊆ Ω, where x̂ := x0 ∧ x and x̌ := x0 ∨ x, we have

F(x) = ∫_{x_{0,1}}^{x_1} · · · ∫_{x_{0,m}}^{x_m} f(s_1, . . . , s_m) ds_m · · · ds_1 − Σ_{J⊆J̄, J≠∅} (−1)^{card(J)} F(x_J)    (12.1)

and the order of integration can be arbitrary, where x_J ∈ r_{x̂,x̌} ⊆ Ω is defined by

π_i(x_J) = { π_i(x0) = x_{0,i},  i ∈ J
           { π_i(x) = x_i,      i ∈ J̄ \ J,     ∀i ∈ J̄.

Proof Y is separable and is σ-compact, since it is finite-dimensional. Fix any x ∈ Ω with r_{x̂,x̌} ⊆ Ω. Then, r_{x̂,x̌} ⊆ P(Ω). By Definition 12.42 and Definition 11.166, we have ΔF(r_{x̂,x̌}) = ν(r_{x̂,x̌}) = ∫_{r_{x̂,x̌}} f dμ = ∫_{r_{x̂,x̌}} f dμ_{B^m}. By the assumption and Fubini's Theorem 12.31, we have

∫_{r_{x̂,x̌}} f dμ_{B^m} = ∫_{x_{0,1}∧x_1}^{x_{0,1}∨x_1} · · · ∫_{x_{0,m}∧x_m}^{x_{0,m}∨x_m} f(s_1, . . . , s_m) ds_m · · · ds_1
  = (−1)^{n(x)} ∫_{x_{0,1}}^{x_1} · · · ∫_{x_{0,m}}^{x_m} f(s_1, . . . , s_m) ds_m · · · ds_1 = ΔF(r_{x̂,x̌})

12.6 Fundamental Theorem of Calculus


and the order of integration can be arbitrary, where n(x) = card({i ∈ J̄ | π_i(x0) > π_i(x)}) =: card(J̌). By Definition 12.41,

ΔF(r_{x̂,x̌}) = Σ_{Ĵ⊆J̄} (−1)^{card(Ĵ)} F(x̄_Ĵ) = (−1)^{n(x)} ∫_{x_{0,1}}^{x_1} · · · ∫_{x_{0,m}}^{x_m} f(s_1, . . . , s_m) ds_m · · · ds_1

where x̄_Ĵ ∈ r_{x̂,x̌} ⊆ R^m is defined by, ∀i ∈ J̄ (with △ denoting the symmetric difference),

π_i(x̄_Ĵ) = { π_i(x̂),  ∀i ∈ Ĵ          = { π_i(x),   ∀i ∈ (Ĵ ∩ J̌) ∪ ((J̄ \ Ĵ) ∩ (J̄ \ J̌))
           { π_i(x̌),  ∀i ∈ J̄ \ Ĵ        { π_i(x0),  otherwise
          = { π_i(x),   ∀i ∈ J̄ \ (Ĵ △ J̌)     = π_i(x_{Ĵ△J̌})
            { π_i(x0),  ∀i ∈ Ĵ △ J̌

This leads to

ΔF(r_{x̂,x̌}) = Σ_{Ĵ⊆J̄} (−1)^{card(Ĵ)} F(x_{Ĵ△J̌}) = (−1)^{card(J̌)} Σ_{Ĵ⊆J̄} (−1)^{card(Ĵ)+card(J̌)} F(x_{Ĵ△J̌})
  = (−1)^{n(x)} Σ_{Ĵ⊆J̄} (−1)^{card(Ĵ△J̌)} F(x_{Ĵ△J̌}) = (−1)^{n(x)} Σ_{J⊆J̄} (−1)^{card(J)} F(x_J)
  = (−1)^{n(x)} ∫_{x_{0,1}}^{x_1} · · · ∫_{x_{0,m}}^{x_m} f(s_1, . . . , s_m) ds_m · · · ds_1

Then, we can conclude

Σ_{J⊆J̄} (−1)^{card(J)} F(x_J) = ∫_{x_{0,1}}^{x_1} · · · ∫_{x_{0,m}}^{x_m} f(s_1, . . . , s_m) ds_m · · · ds_1

The above equation is equivalent to (12.1). This completes the proof of the proposition. □

Proposition 12.85 Let m ∈ N, R^m be endowed with the usual positive cone, Ω ∈ B_B(R^m) be a nondegenerate rectangle, X := ((Ω, |·|), B, μ) be the σ-finite metric measure subspace of R^m, Y be a separable Banach space, f : Ω → Y be B-measurable, x0 ∈ Ω, and J̄ := {1, . . . , m}. Assume that f is absolutely integrable over r_{x_a,x_b} with respect to μ, ∀x_a, x_b ∈ Ω with x_a ≠ x_b. Define π_J : R^m → R^{card(J)} by π_J(x) = (π_i(x))_{i∈J}, ∀x ∈ R^m, ∀J ⊆ J̄; let (π_J(Ω), B_J, μ_J) be the measure subspace of R^{card(J)}, ∀J ⊆ J̄ with J ≠ ∅; define M_J : R^{card(J)} × R^{m−card(J)} → R^m by M_J(s, t) ∈ R^m being such that π_J(M_J(s, t)) = s and π_{J̄\J}(M_J(s, t)) = t, ∀s ∈ R^{card(J)}, ∀t ∈ R^{m−card(J)}, ∀J ⊆ J̄. Then, the following statements hold:

(i) ∀∅ ≠ J ⊆ J̄, ∃U_J ∈ B with μ(Ω \ U_J) = 0 such that f(M_J(·, π_{J̄\J}(x))) : π_J(Ω) → Y is absolutely integrable over π_J(r_{x̂,x̌}) with respect to μ_J, ∀x ∈ U_J; and F : Ω → Y, defined by, ∀x ∈ Ω,

F(x) = { (−1)^{card(J̌∩J)} ∫_{π_J(r_{x̂,x̌})} f(M_J(s, π_{J̄\J}(x))) dμ_J(s)   if x ∈ U_J
       { ϑ_Y                                                           if x ∈ Ω \ U_J

where x̂ := x0 ∧ x, x̌ := x0 ∨ x, and J̌ := {i ∈ J̄ | π_i(x0) > π_i(x)}, is B-measurable.

(ii) Furthermore, if, in addition, Y is finite-dimensional, then U_J = {x ∈ Ω | f(M_J(·, π_{J̄\J}(x))) : π_J(Ω) → Y is absolutely integrable over π_J(r_{x̂,x̌}) with respect to μ_J}.

Proof First consider the special case when μ(Ω) < ∞ and f is absolutely integrable over X. Since Ω is a nondegenerate rectangle, then Ω is bounded. Define f̄ : π_J(Ω) × Ω → Y by f̄(s, x) = (−1)^{card(J̌∩J)} f(M_J(s, π_{J̄\J}(x))) · χ_{π_J(r_{x̂,x̌}),π_J(Ω)}(s), ∀(s, x) ∈ π_J(Ω) × Ω. By Propositions 7.23, 11.39, and 11.38, f̄ is B_B(R^{card(J)+m})-measurable. Note the following derivation,

∫_{π_J(Ω)×Ω} P∘f̄(s, x) d(μ_J × μ)(s, x) ≤ ∫_{π_J(Ω)} ∫_Ω ‖f(M_J(s, π_{J̄\J}(x)))‖ dμ(x) dμ_J(s)
  = ∫_{π_J(Ω)} ∫_{π_{J̄\J}(Ω)} ∫_{π_J(Ω)} ‖f(M_J(s, t))‖ dμ_J(τ) dμ_{J̄\J}(t) dμ_J(s)
  = μ_J(π_J(Ω)) ∫_{π_J(Ω)} ∫_{π_{J̄\J}(Ω)} ‖f(x)‖ dμ_{J̄\J}(π_{J̄\J}(x)) dμ_J(π_J(x))
  = μ_J(π_J(Ω)) ∫_Ω P∘f dμ < ∞

where the first inequality follows from Proposition 11.83, and the equalities follow from Tonelli's Theorem 12.29. Then, we have that f̄ is absolutely integrable over π_J(Ω) × Ω with respect to μ_J × μ. By Fubini's Theorem 12.30, ∃p_J : Ω → Y and U_J ∈ B such that μ(Ω \ U_J) = 0; f̄(·, x) : π_J(Ω) → Y is absolutely integrable over (π_J(Ω), B_J, μ_J), ∀x ∈ U_J; p_J : Ω → Y, defined by, ∀x ∈ Ω,

p_J(x) = { ∫_{π_J(Ω)} f̄(s, x) dμ_J(s)   if x ∈ U_J
         { ϑ_Y                        if x ∈ Ω \ U_J

is B-measurable; p_J is absolutely integrable over X; and ∫_{π_J(Ω)×Ω} f̄(s, x) d(μ_J × μ)(s, x) = ∫_Ω p_J(x) dμ(x). Clearly, F(x) = p_J(x), ∀x ∈ Ω. Hence, F is B-measurable. Then, f(M_J(·, π_{J̄\J}(x))) : π_J(Ω) → Y is absolutely integrable over π_J(r_{x̂,x̌}) ⊆ π_J(Ω) with respect to μ_J, ∀x ∈ U_J.

If, in addition, Y is finite-dimensional, let U_J be defined as in the statement (ii). ∀x ∈ Ω, f(M_J(·, π_{J̄\J}(x))) : π_J(Ω) → Y is absolutely integrable over π_J(r_{x̂,x̌}) with respect to μ_J if, and only if, f̄(·, x) is absolutely integrable over (π_J(Ω), B_J, μ_J). Then, U_J := {x ∈ Ω | f̄(·, x) : π_J(Ω) → Y is absolutely integrable over (π_J(Ω), B_J, μ_J)}. By Fubini's Theorem 12.31, this choice of U_J is admissible for the definition of p_J and p_J is B-measurable. This proves the result in this special case.

Now consider the general case. Since Ω is a nondegenerate rectangle and x0 ∈ Ω, then there exists x0 ∈ Ω_1 ⊆ Ω_2 ⊆ · · · ⊆ Ω_j ⊆ Ω_{j+1} ⊆ · · · ⊆ Ω with Ω_j being a nondegenerate closed rectangle and μ(Ω_j) < ∞, ∀j ∈ N, such that Ω = ⋃_{j=1}^∞ Ω_j. Fix any j ∈ N. By the assumption of the proposition, we can apply the result of the special case on Ω_j. There exists U_{J,j} ∈ B with μ(Ω_j \ U_{J,j}) = 0 such that f(M_J(·, π_{J̄\J}(x))) : π_J(Ω_j) → Y is absolutely integrable over π_J(r_{x̂,x̌}) with respect to μ_J, ∀x ∈ U_{J,j}; and F_j : Ω_j → Y, defined by, ∀x ∈ Ω_j,

F_j(x) = { (−1)^{card(J̌∩J)} ∫_{π_J(r_{x̂,x̌})} f(M_J(s, π_{J̄\J}(x))) dμ_J(s)   if x ∈ U_{J,j}
         { ϑ_Y                                                           if x ∈ Ω_j \ U_{J,j}

is B-measurable. Let U_J := ⋃_{j=1}^∞ U_{J,j} ∈ B. Then, 0 ≤ μ(Ω \ U_J) = μ(⋃_{j=1}^∞ (Ω_j \ U_J)) ≤ μ(⋃_{j=1}^∞ (Ω_j \ U_{J,j})) ≤ 0. Define F according to the statement of the proposition, and we observe that F(x) = F_j(x), ∀x ∈ U_{J,j} \ (⋃_{l=1}^{j−1} U_{J,l}), if x ∈ U_J; F(x) = ϑ_Y, ∀x ∈ Ω \ U_J. By Proposition 11.41, F is B-measurable.
Clearly, f(M_J(·, π_{J̄\J}(x))) : π_J(Ω) → Y is absolutely integrable over π_J(r_{x̂,x̌}) ⊆ π_J(Ω) with respect to μ_J, ∀x ∈ U_J. If, in addition, Y is finite-dimensional, then, by the special case, U_{J,j} = {x ∈ Ω_j | f(M_J(·, π_{J̄\J}(x))) : π_J(Ω_j) → Y is absolutely integrable over π_J(r_{x̂,x̌}) with respect to μ_J}, ∀j ∈ N. Then, U_J = ⋃_{j=1}^∞ U_{J,j} = {x ∈ Ω | f(M_J(·, π_{J̄\J}(x))) : π_J(Ω) → Y is absolutely integrable over π_J(r_{x̂,x̌}) with respect to μ_J}. This completes the proof of the proposition. □

Theorem 12.86 (Fundamental Theorem of Calculus I) Let m ∈ N, R^m be endowed with the usual positive cone, Ω ∈ B_B(R^m) be an open rectangle, X := ((Ω, |·|), B, μ) be the σ-finite metric measure subspace of R^m, Y be a separable Banach space over K, f : Ω → Y be B-measurable, J̄ := {1, . . . , m}, and x0 ∈ Ω. Assume that f is absolutely integrable over r_{x_a,x_b} with respect to μ, ∀x_a, x_b ∈ Ω with x_a ≠ x_b. Let ν be the σ-finite Y-valued measure with kernel f over X according to Proposition 11.116. Define F : Ω → Y to be the cumulative distribution function of ν with origin at x0 according to Proposition 12.52. F is absolutely continuous by Proposition 12.75, and f = dν/dμ a.e. in X. Let D_iF : U_i → Y, ∀i ∈ J̄, where D_i is the derivative of a function with respect to the ith coordinate variable of Ω. ∀i ∈ J̄, define π̄_i : R^m → R^{m−1} by π̄_i(x) = (π_j(x))_{j∈J̄\{i}}, ∀x ∈ R^m; also define M_i : R × R^{m−1} → R^m by π_i(M_i(x, y)) = x and π̄_i(M_i(x, y)) = y, ∀x ∈ R, ∀y ∈ R^{m−1}. Then, the following statements hold:

(i) F(x) = (−1)^{card(J̌)} ∫_{r_{x̂,x̌}} f dμ, ∀x ∈ Ω, where x̂ := x0 ∧ x, x̌ := x0 ∨ x, and J̌ := {i ∈ J̄ | π_i(x0) > π_i(x)}.

(ii) U_i ∈ B, μ(Ω \ U_i) = 0, D_iF is B-measurable; ∃Ū_i ∈ B with μ(Ω \ Ū_i) = 0, ∀x ∈ Ū_i, f(M_i(π_i(x), ·)) : π̄_i(Ω) → Y is absolutely integrable over π̄_i(r_{x̂,x̌}) with respect to μ̄_i, and ∃p_i : Ω → Y defined by, ∀x ∈ Ω,

p_i(x) = { (−1)^{card(J̌\{i})} ∫_{π̄_i(r_{x̂,x̌})} f(M_i(π_i(x), s)) dμ̄_i(s)   if x ∈ Ū_i
         { ϑ_Y                                                       if x ∈ Ω \ Ū_i

when m > 1; p_i(x) = f(x), ∀x ∈ Ω, and Ū_i = ∅, when m = 1; where (π̄_i(Ω), B̄_i, μ̄_i) is the measure subspace of R^{m−1}; such that p_i is B-measurable, and D_iF(x) = p_i(x) a.e. x ∈ X, ∀i ∈ J̄.

(iii) If, in addition, Y is finite-dimensional,¹ then Ū_i = {x ∈ Ω | f(M_i(π_i(x), ·)) : π̄_i(Ω) → Y is absolutely integrable over π̄_i(r_{x̂,x̌}) with respect to μ̄_i}.

¹ The finite-dimensionality assumption might possibly be relaxed to σ-compact conic segment.

Proof By Proposition 12.75, F is absolutely continuous. By Definition 12.41 and Definition 11.166, we have ΔF(r_{x̂,x̌}) = ν(r_{x̂,x̌}) = ∫_{r_{x̂,x̌}} f dμ. Since F is the cumulative distribution function of ν with origin at x0, then ΔF(r_{x̂,x̌}) = (−1)^{card(J̌)} F(x). Thus, we have F(x) = (−1)^{card(J̌)} ∫_{r_{x̂,x̌}} f dμ, ∀x ∈ Ω. This establishes (i).

(ii) The result is trivial when Ω = ∅. Let Ω ≠ ∅. By Proposition 12.81, D_iF is B-measurable, ∀i ∈ J̄. We will distinguish two exhaustive and mutually exclusive cases: Case 1: m = 1; Case 2: m > 1.

Case 1: m = 1. Then, i = 1 and F(x) = (−1)^{card(J̌)} ∫_{r_{x̂,x̌}} f dμ = ∫_{x0}^x f dμ_B, ∀x ∈ Ω. Since Ω is a nonempty open interval, then ∃(E_j)_{j=1}^∞ := ((a_j, b_j))_{j=1}^∞ ⊆ O_R such that Ω = ⋃_{j=1}^∞ E_j and −∞ < · · · < a_n < · · · < a_2 < a_1 < x0 < b_1 < b_2 < · · · < b_n < · · · < ∞. Fix any j ∈ N. Let E_j := (E_j, B_j, μ_j) be the finite measure subspace of (Ω, B, μ). Then, f ∈ L̄_1(E_j, Y). Let c ∈ (0, 1) be a constant. Define A_j := {x ∈ E_j | x is a rectangular Lebesgue point of f|_{E_j} with regularity c}. By Proposition 12.80, A_j ∈ B_L and μ_L(E_j \ A_j) = 0. Fix any x ∈ A_j. We have

lim sup r→0+

637

7

1

sup rx˜ ,x˜ ⊂R with 21 (x˜1 +x˜2 )=x, 1 2 |x˜2 −x˜1 |=r,(x˜2 −x˜1 )≥cr

μ(rx˜1 ,x˜2 ∩ Ej )

rx˜1 ,x˜2 ∩Ej

f (y) − f (x) · dμ(y) = 0.

∀ ∈ (0, ∞) ⊂ R, ∃r0 ∈ (0, ∞) 7⊂ R, ∀r ∈ (0, r0 ) ⊂ R, we have 1 f (y) − f (x) dμ(y) < . Let 0≤ sup μ(r 1 x˜ 1 ,x˜ 2 ∩ Ej ) rx˜ ,x˜ ∩Ej r ⊂R with (x˜ +x˜ )=x, x˜1 ,x˜2 2 1 2 |x˜2 −x˜1 |=r,(x˜2 −x˜1 )≥cr

1 2

rx˜1 ,x˜2 ⊆ R be such that 12 (x˜1 + x˜2 ) = x, x˜2 − x < r0 /2, and rx˜1 ,x˜2 ⊆ Ej . ∀x¯ ∈ rx˜1 ,x˜2 with x¯ = x, we have, where h := x¯ − x, 0 ≤ F (x) ¯ − F (x) − f (x) (x¯ − x) 7 x+h 6 6 7 x+h 6 6 f (y) dμ(y) − f (x) dμ(y)6 =6

.

67 6 =6 67 6 =6 7

x

x

x+h x

6 6 (f (y) − f (x)) dμ(y)6

rx∧(x+h),x∨(x+h)

6 6 (f (y) − f (x)) dμ(y)6

f (y) − f (x) dμ(y)

≤ rx−|h|,x+|h|

= 2 |h|

7

1 μB (rx−|h|,x+|h| )

f (y) − f (x) dμB (y) rx−|h|,x+|h|

≤ 2 |x¯ − x| where the first equality follows from Fact 12.72 and Proposition 11.75, the second equality follows from Proposition 11.92, the third equality follows from Definition 12.71, the second inequality follows from Proposition 11.92, and the last inequality follows from the fact x ∈ Aj and the choice of r0 . Then, x ∈ dom (D1 F ) = U1 and  D1 F (x) = f (x). By the arbitrariness of x, we have Aj ⊆ U1 . Clearly, ∞ implies j =1 Aj ⊆ U1 ⊆ Ω. This ∞ that 0 ≤ μ(Ω \ U1 ) = μB (Ω \ U1 ) = μL (Ω \ U1 ) ≤ μL ( j =1 (Ej \ ∞  ( ∞ l=1 Al ))) ≤ μL ( j =1 (Ej \ Aj )) ≤ 0. Hence, μ(Ω \ U1 ) = 0. By D1 F being B-measurable and f being B-measurable, we have D1 F (x) = f (x) a.e. x ∈ X. This case is proved. Case 2: m > 1. Without loss of generality, consider i = 1. Let X1 := ¯ μ) (π1 (Ω), B, ¯ be the measure subspace of R. By Proposition 12.85, ∃U¯ 1 ∈ B with μ(Ω \ U¯ 1 ) = 0, and p1 as defined in the statement of the theorem, such that p1 is B-measurable. Let E¯ := Ω \ U¯ 1 . By Lemma 12.27, ∃U¯ 1,1 ∈ B1 with μ1 (π1 (Ω) \ U¯ 1,1 ) = 0 such that μ( ¯ E¯ π(x) ) = 0, ∀π1 (x) ∈ U¯ 1,1 , where  1  E¯ π(x) := s¯ ∈ π1 (Ω)  M1 (¯s , π1 (x)) ∈ E¯ . By Fubini’s Theorem 12.30, we 1 π1 (x) have F (x) = p1 (M1 (s, π(x))) dμ(s), ¯ ∀x ∈ Ω with π(x) ∈ U¯ . Then, π1 (x0 )

1

1

1,1

638

12 Differentiation and Integration

by Case 1, we have D1 F (x) = p1 (x) a.e. π1 (x) ∈ X1 , ∀π1 (x) ∈ U¯ 1,1 . Since D1 F and p1 are B-measurable and Y is separable, then, by Propositions 7.23, 11.39, and 11.38, D1 F − p1 is B-measurable. Then, Uˆ 1 := {x ∈ Ω | D1 F (x) = p1 (x)} ∈ B. By Lemma 12.27, μ(Ω \ Uˆ 1 ) =: μ(E) = π(Ω) μ(E ¯ π1 (x) ) dμ1 (π1 (x)) = 1 ¯ π1 (x) ) dμ1 (π1 (x)) = U¯  0 dμ1 = 0, where Eπ1 (x) := {¯s ∈ U¯ 1,1 μ(E 1,1 π1 (Ω) | M1 (¯s , π1 (x)) ∈ E}; and μ(E ¯ π(x) ) = 0 follows from D1 F (x) = 1 p1 (x) a.e. π1 (x) ∈ X1 , ∀π1 (x) ∈ U¯ 1,1 . Hence, D1 F = p1 a.e. in X, and U1 ∈ B with μ(Ω \ U1 ) = 0. This case is also proved. In both cases, (ii) holds. Hence, (ii) is true. (iii) This follows immediately from (ii) and Proposition 12.85. This completes the proof of the theorem. ' & Proposition 12.87 Let m ∈ N, Rm be endowed with the usual positive cone, Ω ∈ BB (Rm ) be a region, Z be a normed linear space, G : Ω → Z, J¯ := {1, . . . , m}, and x0 ∈ Ω. ∀J ⊆ J¯, define πJ : Rm → Rcard(J ) by πJ (x) = (πi (x))i∈J , ∀x ∈ Rm ; define MJ : Rcard(J ) × Rm−card(J ) → Rm by MJ (s, t) ∈ Rm such that πJ (MJ (s, t)) = s and πJ¯\J (MJ (s, t)) = t, ∀s ∈ Rcard(J ) , ∀t ∈ Rm−card(J ) ; also define GJ : πJ (Ω) → Y by GJ (s) = G(MJ (s, πJ¯\J (x0 ))), ∀s ∈ πJ (Ω). Then, we have, ∀x ∈ Ω, ˇ

(−1)card(J ) ΔG (rx, ˆ xˇ ) = G(x) − G(x0 ) . ˇ (−1)card(J ∩J ) ΔGJ (πJ (rx, − ˆ xˇ ))

.

(12.2)

J ⊂J¯,J =∅

  where xˆ := x ∧ x0 , xˇ := x ∨ x0 ; and Jˇ := i ∈ J¯ | πi (x0 ) > πi (x) . Proof We will prove the result using mathematical induction on m: 1◦ m = 1. This is obvious. 2◦ Assume the claim holds for m ≤ k ∈ N. ¯ : 3◦ Consider the case when m = k + 1 ∈ {2, 3, . . .}. Fix any x ∈ Ω. Define G ¯ πJ¯\{k+1} (Ω) → Z by G(s) = G(MJ¯\{k+1} (s, πk+1 (x))), ∀s ∈ πJ¯\{k+1} (Ω). Define ¯ J : πJ (Ω) → Z in terms of G ¯ in a similar manner as GJ in terms of G, ∀J ⊆ G J¯ \ {k + 1}. Then, we have the following sequence of arguments. ˇ

(−1)card(J ) ΔG (rx, ˆ xˇ ) " # ˇ = (−1)card(J \{k+1}) ΔG¯ (πJ¯\{k+1} (rx, ˆ xˇ )) − ΔG (πJ¯\{k+1} (rx, ˆ xˇ ))

.

¯ ¯ ¯ = G(π J \{k+1} (x)) − G(πJ¯\{k+1} (x0 )) . ˇ (−1)card((J \{k+1})∩J ) ΔG¯ J (πJ (rx, − ˆ xˇ )) J ⊂J¯\{k+1},J =∅ ˇ

−(−1)card(J \{k+1}) ΔG (πJ¯\{k+1} (rx, ˆ xˇ ))

12.6 Fundamental Theorem of Calculus

639 ˇ

= G(x) − G{k+1} (πk+1 (x)) − (−1)card(J \{k+1}) ΔG (πJ¯\{k+1} (rx, ˆ xˇ )) . ˇ − (−1)card((J \{k+1})∩J ) ΔGJ (πJ (rx, ˆ xˇ )) J ⊂J¯\{k+1},J =∅



.

ˇ

(−1)card((J \{k+1})∩J ) ΔG¯ J (πJ (rx, ˆ xˇ )) − ΔGJ (πJ (rx, ˆ xˇ ))

J ⊂J¯\{k+1} J =∅

= G(x) − G{k+1} (πk+1 (x)) −



.

ˇ

(−1)card(J ∩J ) ΔGJ (πJ (rx, ˆ xˇ ))

= G(x) − G{k+1} (πk+1 (x)) −

.

ˇ

(−1)card(J ∩J ) ΔGJ (πJ (rx, ˆ xˇ ))

J ⊆J¯,k+1∈J / =∅

.

ˇ

(−1)card(J ∩J ) ΔGJ (πJ (rx, ˆ xˇ ))

J ⊂J¯,J ⊃{k+1}

= G(x) − G{k+1} (πk+1 (x)) − −

ˇ

(−1)card(J ∩J ) ΔGJ (πJ (rx, ˆ xˇ ))

J ⊆J¯\{k+1} J =∅

J ⊂J¯,J ⊃{k+1}



.

!

.

ˇ

(−1)card(J ∩J ) ΔGJ (πJ (rx, ˆ xˇ ))

J ⊆J¯,k+1∈J / =∅

.

ˇ

(−1)card(J ∩J ) ΔGJ (πJ (rx, ˆ xˇ ))

J ⊂J¯,J ⊇{k+1} ˇ

+(−1)card(J ∩{k+1}) ΔG{k+1} (π{k+1} (rx, ˆ xˇ )) . ˇ = G(x) − G{k+1} (πk+1 (x)) − (−1)card(J ∩J ) ΔGJ (πJ (rx, ˆ xˇ )) J ⊂J¯,J =∅ ˇ

+(−1)card(J ∩{k+1}) ΔG{k+1} (π{k+1} (rx, ˆ xˇ )) . ˇ = G(x) − G(x0 ) − (−1)card(J ∩J ) ΔGJ (πJ (rx, ˆ xˇ )) J ⊂J¯,J =∅

where the first equality follows from Definition 12.41, the second equality follows from the inductive assumption 2◦ , and the fourth equality follows from Definition 12.41. This establishes the claim in this step and completes the induction process. This completes the proof of the proposition. ' &
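For m = 2, identity (12.2) is a purely combinatorial statement about rectangle increments and holds for an arbitrary G, so it is easy to sanity-check numerically. In the sketch below, Z = R, and the particular G, x0, and x are arbitrary illustrative choices with x ≥ x0 componentwise, so that J̌ = ∅:

```python
import math

def G(u, v):
    # An arbitrary G : R^2 -> R (Z = R); (12.2) does not depend on its form.
    return math.sin(u) * math.exp(v) + u * v**2

x0 = (0.1, -0.4)
x = (0.8, 0.5)                 # x >= x0 componentwise, so J_check is empty

# Delta_G over the rectangle r_{x_hat, x_check} (Definition 12.41 for m = 2):
dG = G(x[0], x[1]) - G(x0[0], x[1]) - G(x[0], x0[1]) + G(x0[0], x0[1])

# G_J freezes the coordinates outside J at x0:
# J = {1}: G_{1}(s) = G(s, x0[1]);  J = {2}: G_{2}(t) = G(x0[0], t).
dG1 = G(x[0], x0[1]) - G(x0[0], x0[1])     # increment of G_{1} over pi_{1}(r)
dG2 = G(x0[0], x[1]) - G(x0[0], x0[1])     # increment of G_{2} over pi_{2}(r)

# (12.2) with J_check empty reads Delta_G(r) = G(x) - G(x0) - (dG1 + dG2):
lhs = dG
rhs = G(*x) - G(*x0) - (dG1 + dG2)
assert abs(lhs - rhs) < 1e-12
```

Repeating the check with x and x0 in general position would also exercise the sign factors (−1)^{card(J̌∩J)}.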


Theorem 12.88 (Fundamental Theorem of Calculus II) Let m ∈ N, Rm be endowed with the usual positive cone, Ω ∈ BB (Rm ) be an open rectangle, X := ((Ω, |·|), B, μ) be the σ -finite metric measure subspace of Rm , Y be a separable reflexive Banach space over K with Y∗ being separable, F : Ω → Y be absolutely continuous, J¯ := {1, . . . , m}, F (1) : U → B (Rm , Y), Di F : Ui → Y, ∀i ∈ J¯, where Di is the derivative of a function with respect to the ith coordinate variable of Ω, and x0 ∈ Ω. ∀J ⊆ J¯, define πJ : Rm → Rcard(J ) by πJ (x) = (πi (x))i∈J , ∀x ∈ Rm ; define MJ : Rcard(J ) × Rm−card(J ) → Rm by MJ (s, t) ∈ Rm such that πJ (MJ (s, t)) = s and πJ¯\J (MJ (s, t)) = t, ∀s ∈ Rcard(J ) , ∀t ∈ Rm−card(J ) ; also define FJ : πJ (Ω) → Y by FJ (s) = F (MJ (s, πJ¯\J (x0 ))), ∀s ∈ πJ (Ω); and let XJ := (πJ (Ω), BJ , μJ ) be the measure subspace of Rcard(J ) when J = ∅. Then, the following statements hold: (i) There exists a set of functions fJ,x0 : πJ (Ω) → Y, which is BJ -measurable, ∀J ⊆ J¯ with J = ∅, such that fJ,x0 ’s are absolutely integrable on πJ (rxa ,xb ) 

with respect to μJ , ∀xa , xb ∈ Ω with xa = xb , and we have, ∀x ∈ Ω, F (x) =

.

.

(−1)

card(Jˇ∩J )

7 πJ (rx, ˆ xˇ )

J ⊆J¯,J =∅

fJ,x0 dμJ + F (x0 )

(12.3)

  where xˆ := x ∧ x0 , xˇ := x ∨ x0 ; and Jˇ := i ∈ J¯ | πi (x0 ) > πi (x) . The set of functions (fJ,x0 )J ⊆J¯,J =∅ is unique in the sense that if (gJ,x0 )J ⊆J¯,J =∅ is another set of such functions satisfying (12.3), then fJ,x0 = gJ,x0 a.e. in XJ , ∀J ⊆ J¯ with J = ∅. (ii) If we expand absolutely continuous function F at x¯ ∈ Ω, instead of x0 , then we will have a set of functions fJ,x¯ : πJ (Ω) → Y, which is BJ -measurable, ∀J ⊆ J¯ with J = ∅, satisfying the conclusion of (i) with x0 replaced by x. ¯ The fJ,x¯ ’s relate to fJ,x0 ’s according to the following formula, ∀J ⊆ J¯ with J = ∅, ∃U¯ J ∈ BJ with μJ (πJ (Ω) \ U¯ J ) = 0, such that fJ,x¯ (s) =

.

.

J ⊂J˜⊆J¯

(−1)

card(J¯ˇ ∩J˜\J )

7 πJ˜\J (rx, ¯ˆ x¯ˇ )

fJ˜,x0 (πJ˜ (MJ (s,

πJ¯\J (t)))) dμJ˜\J (πJ˜\J (t)) + fJ,x0 (s),

∀s ∈ U¯ J

(12.4)

and fJ,x¯ (s) = ϑY , ∀s ∈ πJ (Ω) \ U¯ J , where x¯ˆ := x¯ ∧ x0 , x¯ˇ := x¯ ∨ x0 , and   ¯ Jˇ := i ∈ J¯ | πi (x0 ) > πi (x) ¯ ; and all involved integrations are such that the integrands are absolutely integrable over the specified integration domain with respect to the given measures, respectively. (iii) Ui ∈ B, μ(Ω \ Ui ) = 0, Di F is B-measurable, and, ∀i ∈ J¯, Di F (x) = f{i},xi (π{i} (x)) a.e. π{i} (x) ∈ X{i} , ∀πJ¯\{i} (x) ∈ XJ¯\{i} , where π (x) xi := M{i} (π{i} (x0 ), πJ¯\{i} (x)); and F (x) − F (xi ) = π{i}{i}(x0 ) f{i},xi dμ{i} , ∀x ∈ X.

12.6 Fundamental Theorem of Calculus

641

∞ (1) is B-measurable, and, ∀x ∈ U , we have (iv) U ⊆ i=1=Ui , U ∈ B, F > (1) F (x) = D1 F (x) · · · Dm F (x) . If, in addition, ∀J ⊆ J¯ with J = ∅, ∀x ∈ Ω, ∃δ(J, x) ∈ 6 (0, ∞) ⊂ 6R, ∃c(J, x) ∈ [0, ∞) ⊂ R such that ess supy∈BRm (x,δ(J,x))6fJ,x0 (πJ (y))6 ≤ c(J, x) and BRm (x, δ(J, x)) ⊆ Ω, then μ(Ω \ U ) = 0.  *  (v) If, in addition, Y is finite-dimensional, then U¯ J = s ∈ πJ (Ω)  fJ˜,x0 (πJ˜ (MJ (s, πJ¯\J (t)))) is absolutely integrable over πJ˜\J (rx, ¯ˆ x¯ˇ ) withrespect to + μJ˜\J (πJ˜\J (t)), ∀J˜ ⊆ J¯ with J ⊂ J˜ , ∀J ⊆ J¯ with J = ∅. In the rest of this book, fJ,x0 will be called the stream of F in J coordinate subspace with respect to x0 . Proof (i) By F = FJ¯ being absolutely continuous and Definition 12.58 and Proposition 12.60, ∀J ⊆ J¯ with J = ∅, FJ is absolutely continuous. By Proposition 12.73, there exists a unique σ -finite Y-valued measure νJ on (πJ (Ω), BJ ) such that FJ is a cumulative distribution function of νJ . Furthermore, P ◦ νJ " μJ . Since Y is a separable reflexive Banach space with Y∗ being separable, then, by Radon–Nikodym Theorem 11.171, there exists a unique fJ,x0 : πJ (Ω) → Y, which 

dνJ is BJ -measurable, such that fJ,x0 = dμ a.e. in XJ . ∀xa , xb ∈ Ω with xa = xb , J we have P ◦ νJ (πJ (rxa ,xb )) = πJ (rx ,x ) P ◦ fJ,x0 dμJ = TFJ (πJ (rxa ,xb )) < a b ∞, where the first equality follows from Definition 11.166, the second equality follows from Proposition 12.73, and the inequality follows from the fact FJ is absolutely continuous and therefore of locally bounded variation, which implies this according to Definition 12.41. Hence, fJ,x0 is absolutely integrable over πJ (rxa ,xb ) ˆ xˇ )) = νJ (πJ (rx, ˆ xˇ )) = with respect to μJ . Then, ∀x ∈ Ω, we have ΔFJ (πJ (rx, f (s) dμ (s). By Proposition 12.87, we have J πJ (r ) J,x0 x, ˆ xˇ

ˇ

(−1)card(J ) ΔF (rx, ˆ xˇ ) = F (x) − F (x0 ) . ˇ (−1)card(J ∩J ) ΔFJ (πJ (rx, − ˆ xˇ ))

.

J ⊂J¯,J =∅

which implies that ˇ

7

(−1)card(J )

.

πJ¯ (rx, ˆ xˇ )

fJ¯,x0 (s) dμJ¯ (s)

= F (x) − F (x0 ) −

.

ˇ

(−1)card(J ∩J )

7 πJ (rx, ˆ xˇ )

J ⊂J¯,J =∅

fJ,x0 (s) dμJ (s)

Reorganizing the above formula, we have F (x) = F (x0 ) +

.

.

(−1)

J ⊆J¯,J =∅

card(Jˇ∩J )

7 πJ (rx, ˆ xˇ )

fJ,x0 (s) dμJ (s)

642

12 Differentiation and Integration

  This establishes (12.3). The uniqueness of the set of functions fJ,x0 J ⊆J¯,J =∅ is clear from the uniqueness of FJ ’s and the uniqueness of Radon–Nikodym derivatives. Hence, (i) holds. (ii) By (i), there exists a set of functions fJ,x¯ : πJ (Ω) → Y, which is Bmeasurable, ∀J ⊆ J¯ with J = ∅, such that fJ,x¯ ’s are absolutely integrable on  πJ (rxa ,xb ) with respect to μJ , ∀xa , xb ∈ Ω with xa = xb , and we have F (x) = card(Jˇ¯∩J ) fJ,x¯ dμJ + F (x), ¯ where x¯ˆ := x¯ ∧ x, x¯ˇ := x¯ ∨ x, J ⊆J¯,J =∅ (−1) πJ (rx, ˆ¯ xˇ¯ )    and Jˇ¯ := i ∈ J¯  πi (x) ¯ > πi (x) . Now, we let x = x¯ + MJ (s, 0m−card(J ) ), ∀s ∈ Rcard(J ) , ∀J ⊆ J¯ with J = ∅. By the preceding formula, we have F (x) = card(Jˇ¯ ∩Jˆ) f ˆ dμ ˆ + F (x). ¯ This is because ∀Jˆ ⊆ J , the ˆ ˆ (−1) J ⊆J,J =∅

πJˆ (rx, ˆ¯ xˇ¯ ) J ,x¯

J

domain of integration will be an empty set and that leads to ϑY for that integral. Now, substitute the formula for F (x) and F (x) ¯ using (12.3), and we have .

7

ˇ ˆ

(−1)card(J ∩J )

.

πJˆ (rx, ˆ xˇ )

Jˆ⊆J¯,Jˆ =∅

.

fJˆ,x¯ dμJˆ +

fJˆ,x0 dμJˆ =

¯ˇ ˆ

(−1)card(J ∩J )

7

.

7 πJˆ (rx, ˆ¯ xˇ¯ )

Jˆ⊆J,Jˆ =∅

fJˆ,x0 dμJˆ

πJˆ (rx, ¯ˆ x¯ˇ )

Jˆ⊆J¯,Jˆ =∅

ˇ¯ ˆ

(−1)card(J ∩J )

Rearrange terms in the above, we obtain .

ˇ¯ ˆ

(−1)card(J ∩J )

7 πJˆ (rx, ˆ¯ xˇ¯ )

Jˆ⊆J,Jˆ =∅

fJˆ,x0 dμJˆ −

=

. Jˆ⊆J¯ Jˆ =∅



(−1)

πj (x)

.

.

ˇ ˆ

(−1)card(J ∩J )

7

Jˆ⊆J¯,Jˆ =∅

πJˆ (rx, ¯ˆ x¯ˇ )

πJˆ (rx, ˆ xˇ )

fJˆ,x0 dμJˆ

fJˆ,x0 (s1 , . . . , scard(Jˆ) ) dμB (scard(Jˆ) ) · · · dμB (s1 )

πj (x0 )

fJˆ,x0 (s1 , . . . , scard(Jˆ) ) dμB (scard(Jˆ) ) · · · dμB (s1 )

/ ¯  7 πj (x)

Jˆ⊆J¯,Jˆ =∅ j ∈Jˆ

7

0

/ 0 ¯ .  7 πj (x) j ∈Jˆ

¯ card(Jˇ ∩Jˆ)

Jˆ⊆J¯,Jˆ =∅

πj (x0 )

j ∈Jˆ

Jˆ⊆J¯ Jˆ =∅

=

/7

.

fJˆ,x¯ dμJˆ =

πj (x0 )

+

7 πj (x) 0 ¯ πj (x)

· dμB (scard(Jˆ) ) · · · dμB (s1 ) −

.

fJˆ,x0 (s1 , . . . , scard(Jˆ) ) / 0 ¯  7 πj (x)

Jˆ⊆J¯,Jˆ =∅ j ∈Jˆ

πj (x0 )

12.6 Fundamental Theorem of Calculus

=

643

fJˆ,x0 (s1 , . . . , scard(Jˆ) ) dμB (scard(Jˆ) ) · · · dμB (s1 ) / / 0 0 ¯ . .  7 πj (x)  7 πl (x) Jˆ⊆J¯,Jˆ =∅ J˜⊆Jˆ,J˜ =∅ j ∈J˜

πj (x) ¯

l∈Jˆ\J˜

πl (x0 )

fJˆ,x0 (s1 , . . . , scard(Jˆ) ) dμB (scard(Jˆ) ) · · · dμB (s1 ) 7 .. ¯ˇ ˆ ˜ ˇ¯ ˜ = fJˆ,x0 dμJˆ (−1)card(J ∩J )+card(J ∩J \J ) (r ) π ˆ x ˆ , x ˇ ˆ ¯ ˜ ˆ J ˜ ˜ J ⊆J J ⊆J J ,x,x,x ¯ J ,x,x,x ¯ 0

Jˆ =∅ J˜ =∅

=

.

.

¯ˇ ˆ ˜

ˇ¯ ˜

(−1)card(J ∩J )+card(J ∩J \J )

0

7

Jˆ⊆J¯,Jˆ =∅ J˜⊆Jˆ∩J,J˜ =∅

fJˆ,x0 dμJˆ =

.

.

¯ˇ ˜ ˆ

ˇ¯ ˆ

(−1)card(J ∩J )+card(J ∩J \J )

=

..

ˇ¯ ˆ

¯ˇ ˜ ˆ

(−1)card(J ∩J )+card(J ∩J \J )

=

.

.

Jˆ⊆J,Jˆ =∅

+(−1) =

. Jˆ⊆J Jˆ =∅

(−1)

¯ˇ ˜ ˆ

ˇ¯ ˆ

(−1)card(J ∩J )+card(J ∩J \J )

J˜⊆J¯,J˜⊃Jˆ

card(Jˇ¯ ∩Jˆ)

!

7

card(Jˇ¯ ∩Jˆ)

πJˆ (rx, ˆ¯ xˇ¯ ) 7

,xˇ J˜,x,x,x ¯

0

πJ˜ (rxˆJˆ,x,x,x ¯

0

,xˇ Jˆ,x,x,x ¯

0

)

)

7 πJ˜ (rxˆJˆ,x,x,x ¯

Jˆ⊆J J˜⊆J¯ Jˆ =∅ J˜⊇Jˆ

0

7

J˜⊆J¯,J˜ =∅ Jˆ⊆J˜∩J,Jˆ =∅

fJ˜,x0 dμJ˜

πJˆ (rxˆJ˜,x,x,x ¯

,xˇ Jˆ,x,x,x ¯

0

)

fJ˜,x0 dμJ˜

0

7 πJˆ (rx, ˆ¯ xˇ¯ )

pJ˜,Jˆ dμJˆ

fJˆ,x0 dμJˆ

πJˆ (rx, ˆ¯ xˇ¯ )

. J˜⊆J¯ J˜⊃Jˆ

(−1)

¯ card(Jˇ ∩J˜\Jˆ)

! pJ˜,Jˆ + fJˆ,x0 dμJˆ

where xˆJ˜,x,x,x ¯ 0 := x1,J˜,x,x,x ¯ 0 := x1,J˜,x,x,x ¯ 0 ∧ x2,J˜,x,x,x ¯ 0 , xˇ J˜,x,x,x ¯ 0 ∨ x2,J˜,x,x,x ¯ 0; x1,J˜,x,x,x ¯ x2,J˜,x,x,x ¯ πJ¯\J˜ (x0 )); the ¯ 0 := MJ˜ (πJ˜ (x), πJ¯\J˜ (x)), ¯ 0 := MJ˜ (πJ˜ (x), second to fifth equalities follow from intuition; we have not used rigorous notation in these equations in expressing the joint integrals as repeated integrals and the other way around; this should not cause concern since from the first equality to ˇ ˆ the fifth, all that we have done is breaking down (−1)card(J ∩J ) π (r ) fJˆ,x0 dμJˆ ˆ xˇ Jˆ x, into pieces along the intermediate point x; ¯ the sixth equality follows from the idea that the corresponding integral will be zero if its domain of integration is an empty set, which will happen since πi (x) = πi (x), ¯ ∀i ∈ J¯ \ J ; the seventh equality follows from the interchange of summing variables Jˆ and J˜;

644

12 Differentiation and Integration

the ninth equality follows from Fubini’s Theorem 12.30, and U¯ J˜,Jˆ ∈ BJˆ with μJˆ (πJˆ (Ω) \ U¯ J˜,Jˆ ) = 0, and pJ˜,Jˆ : πJˆ (Ω) → Y, ∀Jˆ ⊂ J˜, defined by pJ˜,Jˆ (s) := 3 fJ˜,x0 (πJ˜ (MJˆ (s, πJ¯\Jˆ (t)))) dμJ˜\Jˆ (πJ˜\Jˆ (t)) s ∈ U¯ J˜,Jˆ πJ˜\Jˆ (rx, ˆ¯ xˇ¯ ) is BJˆ ϑY s ∈ πJˆ (Ω) \ U¯ J˜,Jˆ measurable; and the tenth equality follows from Proposition 11.92. Let U¯ Jˆ :=  ˆ ¯ ˆ ¯ ¯ ¯ Jˆ⊂J˜⊆J¯ UJ˜,Jˆ , ∀J ⊆ J with J = ∅. Clearly, UJˆ ∈ BJˆ and μJˆ (πJˆ (Ω) \ UJˆ ) = 0. ¯ The definition of fJ,x¯ ’s (12.4) satisfies   the above equalities, ∀J ⊆ J with J = ∅. Then, by (i), this set of functions fJ,x¯ J ⊆J¯,J =∅ is the unique one we seek. (iii) Fix any i ∈ J¯. Without loss of generality, let i = 1. By Proposition 12.81, U1 ∈ B and D1 F is B-measurable. By Fundamental Theorem of Calculus 12.86, every term on the right-hand side of (12.3) is differentiable with respect to the ith coordinate variable of Ω almost everywhere in "X. ∀J ⊆ J¯ with 1 ∈ J ,# ∃UJ,1 ∈ B with μ(Ω \ UJ,1 ) = 0, ˇ D1 (−1)card(J ∩J ) fJ,x0 dμJ : UJ,1 → Y is B-measurable, ∃U¯ J,1 ∈ πJ (rx, ˆ xˇ )

B with μ(Ω \ U¯ J,1 ) = 0, and ∃pJ,1 : Ω → Y defined by pJ,1 = ⎧ card(Jˇ∩J \{1}) ⎪ fJ,x0 (πJ (M{1} (π1 (x) ⎨ (−1) πJ \{1} (rx, ˆ xˇ ) x ∈ U¯ J,1 , ∀x ∈ Ω, when , πJ¯\{1} (s)))) dμJ \{1} (πJ \{1} (s)) ⎪ ⎩ ϑY x ∈ Ω \ U¯ J,1 ¯ J,1 = Ω, when J = {1}, such that ∀x ∈ Ω, U J ⊃ {1}, pJ,1 = fJ,x0 (πJ (x)), " # ˇ pJ,1 is B-measurable and D1 (−1)card(J ∩J ) πJ (r ) fJ,x0 dμJ = pJ,1 (x) a.e. x x, ˆ xˇ " # ˇ∩J ) card( J ¯ · πJ (r ) fJ,x0 dμJ = ϑY , ∈ X. ∀J ⊆ J with 1 ∈ / J , we have D1 (−1) x, ˆ xˇ  ∀x ∈ Ω. In this case, we have U¯ J,1 = Ω. Let U¯ 1 := U¯ J,1 . ¯ - J ⊆J ,J =∅ Then, U¯ 1 ∈ B and μ(Ω \ U¯ 1 ) = 0. Hence, D1 F (x) = J ⊆J¯,J =∅ pJ,1 (x), p (x) a.e. x ∈ X. This implies ∀x ∈ U¯ 1 , and D1 F (x) = J,1 ¯ J ⊆J ,J =∅ that μ(Ω \ U1 ) = 0. Now, expand the function F at x1 (which is as defined in the statement of the theorem.) By (12.3), we have F (x) = π (x) F (x1 ) + π11(x0 ) f{1},x1 dμ{1} , ∀x ∈ Ω. By Fundamental Theorem of Calculus I, Theorem 12.86, we have D1 F (x) = f{1},x1 (π{1} (x)) a.e. π{1} (x) ∈ X{1} , ∀πJ¯\{1} (x) ∈ XJ¯\{1} . Thus, (iii) holds.  (1) 12.81, U ⊆ m (iv) By Proposition i=1 = > Ui , F is B-measurable, and ∀x ∈ U , we (1) have F (x) = D1 F (x) · · · Dm F (x) . All we need to show is that μ(Ω \ U ) = 0 under the additional assumption that fJ,x0 ’s are locally bounded. ∀i ∈ J¯, ∀J ⊆ J¯ with i ∈" J , by the proof of (iii), ∃U˜ J,i ∈ B# with μ(Ω\U˜ J,i ) = 0, ˇ such that Di HJ (x) := Di (−1)card(J ∩J ) πJ (r ) fJ,x0 dμJ (x) = pJ,i (x), ∀x ∈ x, ˆ xˇ U˜ J,i . Then, let Uˆ J,i := UJ,i ∩ U¯ J,i ∩ U˜ J,i ∈ B, we have μ(Ω \ Uˆ J,i ) = 0 ˇ and Di HJ (x) = (−1)card(J ∩J \{i}) · πJ \{i} (r ) fJ,x0 (πJ (M{i} (πi (x), πJ¯\{i} (s)))) · x, ˆ xˇ dμJ \{i} (πJ \{i} (s)) =: HJ,i (x), ∀x ∈ Uˆ J,i .

12.6 Fundamental Theorem of Calculus

#   m ˇ ˆ ∩ U J,i i=1 i=1 Ui ∈ B. Clearly, μ(Ω \ U ) = 0.   Fix any x¯ ∈ Uˇ . Then, by (ii), there exists a unique set of functions fJ,x¯ J ⊆J¯,J =∅ card(Jˇ¯ ∩J ) such that F (x) = ¯ ∀x ∈ Ω. J ⊆J¯,J =∅ (−1) πJ (r ) fJ,x¯ dμJ + F (x), Let Uˇ :=

"  m

645

i∈J ⊆J¯

ˆ¯ xˇ¯ x,

Furthermore, fJ,x¯ ’s are related to fJ,x0 ’s by (12.4). Fix any  ∈ (0, ∞) ⊂ R. By fJ,x0 ’s being locally bounded on their domain of definition and (12.4), we can conclude that fJ,x¯ ’s are also locally bounded on their domain of definition. ∀J ⊆ J¯ ¯ x) ∈ (0, ∞) ⊂ R, ∃c(J, with J = ∅, ∀x ∈ Ω, ∃δ(J, ¯ x) ∈ [0, ∞) ⊂  R such ¯ x) ⊆ Ω. that ess supy∈BRm (x,δ(J,x) (π (y)) ≤ c(J, ¯ x) and BRm x, δ(J, f ¯ J, x ¯ J ) ¯ x) Let δ¯0 := minJ ⊆J¯,J =∅ δ(J, ¯ x) ¯ ∈ ¯ ∈ (0, ∞) ⊂ R and c¯0 := maxJ ⊆J¯,J =∅ c(J,   ¯ ¯ δ0 ⊆ Ω. [0, ∞) ⊂ R. Then, ess supy∈BRm (x, ¯ δ¯0 ) fJ,x¯ (πJ (y)) ≤ c¯0 and BRm x, ¯ ∀i ∈ J , we have .

Di F (x) ¯ = f{i},x0 (πi (x)) ¯ +

(−1)

.

¯ card(Jˇ ∩J \{i})

7 πJ \{i} (rx, ¯ˆ x¯ˇ )

J ⊆J¯,{i}⊂J

fJ,x0 (πJ (M{i} (πi (x), ¯ πJ¯\{i} (s)))) dμJ \{i} (πJ \{i} (s)) . = HJ,i (x) ¯ i∈J ⊆J¯

where the equality follows from the preceding discussion. ∀i ∈ J¯, ∀J ⊆ J¯ with i ∈ J , ∃δJ,i (x) ¯ ∈ (0, ∞) ⊂ R such that HJ (x¯ + hi e m,i ) − HJ (x) ¯ − HJ,i (x)h ¯ i ≤ m−1 2−m |hi |. Let δ1 := mini∈J¯ mini∈J ⊆J¯ δJ,i (x) ¯ ∧ δ¯0 ∧ 2(1+c¯0 )(2m −1−m) ∧ 1 ∈ (0, ∞) ⊂ R. This leads to, ∀x ∈ BRm (x, ¯ δ1 ), 6 6 m . 6 6 6 ¯ − . F (x) − F (x) Di F (x)π ¯ i (x − x) ¯ 6 6 6 6 6 =6 6 6 6 =6 6

i=1

.

(−1)

card(Jˇ¯∩J )

7 πJ (rx, ˆ¯ xˇ¯ )

J ⊆J¯,J =∅

.

ˇ¯

(−1)card(J ∩J )

m .

ˇ¯

(−1)card(J ∩{i})



i=1

(−1)

card(Jˇ¯∩{i})

fJ,x¯ dμJ

7

i=1 m .

i=1

7 πJ (rx, ˆ¯ xˇ¯ )

J ⊆J¯,card(J )≥2

+

fJ,x¯ dμJ −

m .

7

π{i} (rx, ˆ¯ xˇ¯ ) π{i} (rx, ˆ¯ xˇ¯ )

f{i},x¯ dμ{i} 6 6 ¯ dμ{i} (t)6 Di F (x) 6

6 6 Di F (x)π ¯ i (x − x) ¯ 6 6

646

12 Differentiation and Integration

7

.



πJ (rx, ˆ¯ xˇ¯ )

J ⊆J¯,card(J )≥2

6 m 7 6. card(Jˇ¯ ∩{i}) 6 +6 (−1) i=1

7

.



J ⊆J¯,card(J )≥2

+

πJ (rx, ˆ¯ xˇ¯ )

P ◦ fJ,x¯ dμJ  π{i} (rx, ˆ¯ xˇ¯ )

6 6 ¯ dμ{i} (t)6 f{i},x¯ (t) − Di F (x) 6 

P ◦ fJ,x¯ dμJ

m . . 6 6 6H ˜ (x¯ + πi (x − x)e ¯ m,i ) − HJ˜ (x) ¯ − HJ˜,i (x)π ¯ i (x − x) ¯ 6 J i=1 i∈J˜⊆J¯

.



J ⊆J¯,card(J )≥2

.



c¯0

J ⊆J¯,card(J )≥2

m . .

c¯0 μJ (πJ (rx, ˆ¯ xˇ¯ )) +

m−1 2−m |πi (x − x)| ¯

i=1 i∈J˜⊆J¯

   πj (x − x) ¯  + |x − x| ¯ 2

j ∈J

 ¯ + |x − x| ¯ ≤ |x − x| ¯ ≤ (2m − 1 − m)c¯0 δ1card(J )−1 |x − x| 2 where the first equality follows from (12.4); the first inequality follows from Proposition 11.92, the second inequality follows from the fact that 7 .

πi (rx, ˆ¯ xˇ¯ )

f{i},x¯ dμ{i} =

.

(HJ (x¯ + πi (x − x)e ¯ m,i ) − HJ (x)) ¯

J ⊆J¯,J =∅

and Proposition 11.92, and the third inequality  follows from choice of δ1 and the  ¯ we have preceding discussion. Then, x¯ ∈ dom F (1) . By the arbitrariness of x,   Uˇ ⊆ dom F (1) = U . Hence, μ(Ω \ U ) = 0. (v) This is a direct consequence of Fubini’s Theorem 12.31 and the proof of (ii). This completes the proof of the theorem. ' & Theorem 12.89 (Integration by Parts) Let I ⊆ R be a nonempty open or closed interval, I := ((P(I ), |·|), B, μ) be the σ -finite metric measure subspace of R, Y be a separable Banach space over K, Z be a separable reflexive Banach space over K with Z∗ being separable, and W ⊆ B(Y, Z) be a separable Banach subspace, a : P(I ) → W be B-measurable and absolutely integrable over I, νA be the Wvalued measure with kernel a over I, and A : I → W be a cumulative distribution function for νA , f : P(I ) → Y be B-measurable and absolutely integrable over I, νF be the Y-valued measure with kernel f over I, and F : I → Y be a cumulative distribution function for νF . Then, H : I → Z defined by H (x) = A(x)F (x), ∀x ∈ I , is absolutely continuous and is a cumulative distribution function of νH that is the Z-valued measure with kernel h : P(I ) → Z over I, where h is defined

12.6 Fundamental Theorem of Calculus

647

by h(x) = A(x)f(x) + a(x)F(x), ∀x ∈ P(I), and Af, aF, and h are absolutely integrable over I. As a consequence, ∀b, c ∈ I with b ≤ c, we have

$$\int_b^c A(x)f(x)\,dx+\int_b^c a(x)F(x)\,dx=A(c)F(c)-A(b)F(b)\tag{12.5}$$
Proof By Proposition 7.66, B(Y, Z) is a Banach space. Fix any x₀ ∈ I. Let F₀ : I → Y be the cumulative distribution function of ν_F with origin x₀, and A₀ : I → W be the cumulative distribution function of ν_A with origin x₀. By Proposition 12.75, F₀ and A₀ are absolutely continuous and therefore continuous. Since f and a are absolutely integrable over I, ∃M ∈ [0, ∞) ⊂ ℝ such that ‖F₀(x)‖_Y ≤ M and ‖A₀(x)‖_W ≤ M, ∀x ∈ I. By the assumption, we have F(x) = F(x₀) + F₀(x) and A(x) = A(x₀) + A₀(x), ∀x ∈ I. Then, F and A are absolutely continuous, therefore continuous, and ∃M̄ ∈ [0, ∞) ⊂ ℝ such that ‖F(x)‖_Y ≤ M̄ and ‖A(x)‖_W ≤ M̄, ∀x ∈ I. By Proposition 11.37, F and A are B-measurable. By Propositions 7.23, 7.65, 11.38, and 11.39, Af, aF, and h are B-measurable. Then, Af and aF are absolutely integrable over I. Hence, h is absolutely integrable over I by Proposition 11.83. By Proposition 12.68, H is absolutely continuous.

We first consider the case where I is an open interval. By Fundamental Theorem of Calculus I 12.86, F⁽¹⁾ = DF : U_F → Y and A⁽¹⁾ = DA : U_A → W are such that U_F, U_A ∈ B, μ(I \ U_F) = 0 = μ(I \ U_A), F⁽¹⁾ and A⁽¹⁾ are B-measurable, F⁽¹⁾ = f a.e. in I, and A⁽¹⁾ = a a.e. in I. Then, ∃Û_F, Û_A ∈ B such that μ(I \ Û_F) = 0 = μ(I \ Û_A), F⁽¹⁾(x) = f(x), ∀x ∈ Û_F, and A⁽¹⁾(x) = a(x), ∀x ∈ Û_A. ∀x ∈ Û_F ∩ Û_A, by Propositions 9.17 and 9.19 and the Chain Rule (Theorem 9.18), we have H⁽¹⁾(x) = A(x)f(x) + a(x)F(x) = h(x). By Fundamental Theorem of Calculus II 12.88, H⁽¹⁾ = DH : U_H → Z is such that U_H ∈ B, μ(I \ U_H) = 0, and H⁽¹⁾ is B-measurable. Then, E := (I \ U_H) ∪ {x ∈ U_H | H⁽¹⁾(x) ≠ h(x)} ∈ B and E ⊆ (I \ Û_F) ∪ (I \ Û_A). Hence, μ(E) = 0 and h = H⁽¹⁾ a.e. in I. By Fundamental Theorem of Calculus II 12.88, H(x) = ∫_{x₀}^{x} h̄ dμ + H(x₀), ∀x ∈ I, where h̄ is B-measurable and absolutely integrable over r_{x_a,x_b}, ∀x_a, x_b ∈ I with x_a ≤ x_b, and H⁽¹⁾(x) = h̄(x) a.e. x ∈ I. Thus, by Lemma 11.44, h = h̄ a.e. in I. Then, by Proposition 11.92, we have H(x) − H(x₀) = ∫_{x₀}^{x} h(s) ds, ∀x ∈ I. By Proposition 11.92 and Fact 12.72, ∀b, c ∈ I with b ≤ c, we have A(c)F(c) − A(b)F(b) = H(c) − H(x₀) − (H(b) − H(x₀)) = ∫_{x₀}^{c} h(x) dx − ∫_{x₀}^{b} h(x) dx = ∫_{b}^{c} (A(x)f(x) + a(x)F(x)) dx = ∫_{b}^{c} A(x)f(x) dx + ∫_{b}^{c} a(x)F(x) dx. Define ν_H to be the Z-valued measure with kernel h over I. By the preceding discussion, H is the cumulative distribution function of ν_H. This completes the proof for this case.

Next, we consider the case when I is a closed interval. Define Î to be an open interval such that I ⊆ Î. Define f̂ : Î → Y by f̂(x) = f(x) for x ∈ P(I) and f̂(x) = ϑ_Y for x ∈ Î \ P(I). Similarly define â in terms of a. Define F̂ : Î → Y by


12 Differentiation and Integration

F̂(x) = F(x) for x ∈ I, F̂(x) = F(min_{s∈I} s) for x < min_{s∈I} s, and F̂(x) = F(max_{s∈I} s) for x > max_{s∈I} s. Similarly define Â in terms of A. Then, the result can be applied for Î := ((Î, |·|), B̂, μ̂) to the functions f̂, F̂, â, Â, yielding Ĥ : Î → Z and ĥ : Î = P(Î) → Z. Restricting these functions to I yields the desired result. This completes the proof of the theorem. □

Lemma 12.90 (Vitali) Let m ∈ ℕ, ℝᵐ be endowed with the usual positive cone, E ⊆ ℝᵐ with μ_{Lmo}(E) < +∞, I ⊆ B_B(ℝᵐ) be a collection of nondegenerate closed rectangles in ℝᵐ, and Ī := {V ⊆ ℝᵐ | V is a nondegenerate closed rectangle in ℝᵐ with center x ∈ ℝᵐ, such that the rectangle U := (5/c)(V − x) + x ∈ I}, and assume that Ī covers E in the sense of

Vitali with index c ∈ (0, 1/√m] ⊂ ℝ. Then, there exists a countable pairwise disjoint subcollection (Vᵢ)_{i∈N} ⊆ Ī with N ⊆ ℕ (each Vᵢ corresponds to Uᵢ ∈ I as outlined in the definition of Ī) such that, ∀ε ∈ (0, ∞) ⊂ ℝ, ∃n ∈ ℤ₊ with {1, …, n} ⊆ N, {V₁, …, Vₙ} ⊆ Ī is pairwise disjoint with μ_{Lmo}(E \ (⋃_{i=1}^{n} Vᵢ)) < ε, and E ⊆ (⋃_{i=1}^{n} Vᵢ) ∪ (⋃_{i∈N, i>n} Uᵢ) with Σ_{i∈N, i>n} μ_{Bm}(Uᵢ) ≤ ε.

Proof By Example 12.56,

$$\mu_{Lmo}(E)=\inf_{(O_i)_{i=1}^{\infty}\subseteq\mathcal O_{\mathbb R^m},\ E\subseteq\bigcup_{i=1}^{\infty}O_i}\ \sum_{i=1}^{\infty}\mu_{Bm}(O_i)<+\infty$$

Then, ∃O ∈ O_{ℝᵐ} such that E ⊆ O and μ_{Bm}(O) < +∞. By neglecting rectangles in Ī, we may without loss of generality assume that V ⊆ O and the shortest side of V has length at least c dia(V), ∀V ∈ Ī. ∀k ∈ ℤ₊, assume that pairwise disjoint V₁, …, V_k ∈ Ī have already been chosen. We will distinguish two exhaustive and mutually exclusive cases: Case 1: E ⊆ ⋃_{i=1}^{k} Vᵢ; Case 2: E \ (⋃_{i=1}^{k} Vᵢ) ≠ ∅. Case 1: E ⊆ ⋃_{i=1}^{k} Vᵢ. Then, we let n = k and N = {1, …, n}. Clearly, μ_{Lmo}(E \ (⋃_{i=1}^{k} Vᵢ)) = μ_{Lmo}(∅) = 0, E ⊆ ⋃_{i=1}^{n} Vᵢ, and Σ_{i∈N, i>n} μ_{Bm}(Uᵢ) = 0. The procedure terminates. Case 2: E \ (⋃_{i=1}^{k} Vᵢ) ≠ ∅. Note that, for any closed rectangle V ∈ Ī with dia(V) =: p̄ > 0 and center x₀ ∈ V, we have r_{x₀ − ½cp̄1_m, x₀ + ½cp̄1_m} ⊆ V ⊆ r_{x₀ − ½p̂1_m, x₀ + ½p̂1_m}, where p̂ := √(1 − (m−1)c²) p̄ is the upper bound for the longest side of the rectangle V. This implies that cᵐp̄ᵐ ≤ μ_{Bm}(V) ≤ p̂ᵐ = (1 − (m−1)c²)^{m/2} p̄ᵐ. Let l_k be the supremum of the diameters of rectangles in Ī that do not intersect V₁, …, V_k. Clearly, l_k ≤ (μ_{Bm}(O))^{1/m}/c < +∞. Note that ∃x₀ ∈ E \ (⋃_{i=1}^{k} Vᵢ). Since ⋃_{i=1}^{k} Vᵢ is closed, then, by Proposition 4.10, d := inf_{x ∈ ⋃_{i=1}^{k} Vᵢ} |x − x₀| > 0. Then, by the assumption, ∃V̂ ∈ Ī such that x₀ ∈ V̂, dia(V̂) < d, and the shortest side of the rectangle V̂ has length at least c dia(V̂). Then, V̂ ∩ (⋃_{i=1}^{k} Vᵢ) = ∅. This shows that l_k ≥ dia(V̂) > 0. Hence, ∃V_{k+1} ∈ Ī such that V₁, …, V_{k+1} ∈ Ī are pairwise disjoint and dia(V_{k+1}) ≥ l_k/2. Inductively,


we either have Case A: ∃ a finite pairwise disjoint collection {V₁, …, V_{n₀}} ⊆ Ī with n₀ ∈ ℤ₊ and N = {1, …, n₀} such that E ⊆ ⋃_{i=1}^{n₀} Vᵢ, or Case B: ∃(V_k)_{k=1}^{∞} ⊆ Ī with N = ℕ such that dia(V_{k+1}) ≥ l_k/2, ∀k ∈ ℤ₊, and the sets in the sequence are pairwise disjoint. Fix any ε ∈ ℝ₊. If Case A holds, then choose n = n₀ ∈ ℤ₊ and the result holds. If Case B holds, then N = ℕ and we have O ⊇ ⋃_{k=1}^{∞} V_k and +∞ > μ_{Bm}(O) ≥ μ_{Bm}(⋃_{k=1}^{∞} V_k) = Σ_{k=1}^{∞} μ_{Bm}(V_k) ≥ Σ_{k=0}^{∞} 2^{−m} cᵐ l_kᵐ. Note that (l_k)_{k=0}^{∞} is nonincreasing. Hence, lim_{k∈ℕ} l_k = 0. Thus, ∃n ∈ ℕ such that Σ_{i=n+1}^{∞} μ_{Bm}(Vᵢ) < 5^{−m} cᵐ ε. Let R := E \ (⋃_{k=1}^{n} V_k). Without loss of generality, assume dia(V_k) =: p_k ∈ (0, ∞) ⊂ ℝ and the center of V_k is x_k, ∀k ∈ ℕ. ∀x̄ ∈ R, since ⋃_{k=1}^{n} V_k is closed, then, by Proposition 4.10, d := inf_{x ∈ ⋃_{k=1}^{n} V_k} |x − x̄| > 0. By the assumption, ∃V ∈ Ī such that x̄ ∈ V, 0 < dia(V) < d, and the shortest side of the rectangle V has length at least c dia(V). Then, V ∩ (⋃_{k=1}^{n} V_k) = ∅. By the fact that lim_{k∈ℕ} l_k = 0, ∃i₀ ∈ {n + 1, n + 2, …} such that V ∩ V_{i₀} ≠ ∅ and V ∩ V_k = ∅, ∀k ∈ {1, …, i₀ − 1}. Then, dia(V) ≤ l_{i₀−1} ≤ 2 dia(V_{i₀}) = 2p_{i₀}. Let x̂ ∈ V ∩ V_{i₀}. Then, |x̄ − x_{i₀}| ≤ |x̄ − x̂| + |x̂ − x_{i₀}| ≤ dia(V) + ½p_{i₀} ≤ (5/2)p_{i₀}. Then, x̄ ∈ r_{x_{i₀} − (5/2)p_{i₀}1_m, x_{i₀} + (5/2)p_{i₀}1_m} ⊆ (5/c)(V_{i₀} − x_{i₀}) + x_{i₀} = U_{i₀} ∈ I. Then, x̄ ∈ ⋃_{k=n+1}^{∞} U_k. Hence, R ⊆ ⋃_{k=n+1}^{∞} U_k. Then, μ_{Lmo}(R) ≤ μ_{Lmo}(⋃_{k=n+1}^{∞} U_k) ≤ Σ_{k=n+1}^{∞} μ_{Bm}(U_k) = Σ_{k=n+1}^{∞} 5ᵐ c^{−m} μ_{Bm}(V_k) < ε. Note that E ⊆ (⋃_{i=1}^{n} Vᵢ) ∪ (⋃_{i=n+1}^{∞} Uᵢ) and Σ_{i∈N, i>n} μ_{Bm}(Uᵢ) < ε. Hence, the result holds. This completes the proof of the lemma.
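The selection rule in the proof — always pick a set, disjoint from those already chosen, whose diameter is at least half of the supremum diameter l_k among the remaining disjoint candidates — can be sketched in one dimension. The interval data and helper names below are made up for illustration.

```python
# Greedy Vitali-type selection in R^1 (illustrative data): repeatedly pick an
# interval disjoint from those already chosen whose length is at least half
# of the supremum length among the remaining disjoint candidates.
def disjoint(iv, chosen):
    a, b = iv
    return all(b < c or d < a for (c, d) in chosen)

def vitali_greedy(intervals):
    chosen = []
    while True:
        candidates = [iv for iv in intervals if disjoint(iv, chosen)]
        if not candidates:
            return chosen
        sup_len = max(b - a for (a, b) in candidates)
        # any candidate of length >= sup_len / 2 is acceptable; take one
        pick = next(iv for iv in candidates if iv[1] - iv[0] >= sup_len / 2)
        chosen.append(pick)

cover = [(0.0, 1.0), (0.5, 0.9), (2.0, 2.2), (0.95, 1.5), (3.0, 3.1)]
picked = vitali_greedy(cover)
# the picked intervals are pairwise disjoint
assert all(disjoint(iv, [jv for jv in picked if jv is not iv]) for iv in picked)
```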
Theorem 12.91 (Change of Variable) Let m ∈ ℕ, ℝᵐ be endowed with the usual positive cone, Ω ∈ O_{ℝᵐ} be an open subset, Ω̄ ∈ O_{ℝᵐ} be an open subset, F : Ω → Ω̄ be a homeomorphism, Fᵢ := F_inv : Ω̄ → Ω, X := ((Ω, |·|), B, μ) be the σ-finite metric measure subspace of ℝᵐ, X̄ := ((Ω̄, |·|), B̄, μ̄) be the σ-finite metric measure subspace of ℝᵐ, and Fᵢ∗μ̄ be the induced measure on (Ω, B) and F∗μ be the induced measure on (Ω̄, B̄) as defined in Proposition 12.9. Then, X̄ and ((Ω̄, |·|), B̄, F∗μ) are homeomorphically isomeasuric under F, and X and ((Ω, |·|), B, Fᵢ∗μ̄) are homeomorphically isomeasuric under Fᵢ. Assume that F⁽¹⁾ : U → B(ℝᵐ, ℝᵐ) with μ(Ω \ U) = 0 satisfies: ∀x₁, x₂ ∈ Ω with x₁ ≠ x₂ and r_{x₁,x₂} ⊆ Ω, we have sup_{x ∈ r_{x₁,x₂} ∩ U} |det(F⁽¹⁾(x))| ≤ M_{x₁,x₂} ∈ (0, ∞) ⊂ ℝ; Fᵢ⁽¹⁾ : Ū → B(ℝᵐ, ℝᵐ) with μ̄(Ω̄ \ Ū) = 0 satisfies: ∀x̄₁, x̄₂ ∈ Ω̄ with x̄₁ ≠ x̄₂ and r_{x̄₁,x̄₂} ⊆ Ω̄, we have sup_{x̄ ∈ r_{x̄₁,x̄₂} ∩ Ū} |det(Fᵢ⁽¹⁾(x̄))| ≤ M̄_{x̄₁,x̄₂} ∈ (0, ∞) ⊂ ℝ; and Û := {x ∈ U | det(F⁽¹⁾(x)) ≠ 0} and Ũ̄ := {x̄ ∈ Ū | det(Fᵢ⁽¹⁾(x̄)) ≠ 0} are such that μ̄(Ω̄ \ Ũ̄) = 0 = μ(Ω \ Ũ) := μ(Ω \ Fᵢ(Ũ̄)) = μ(Ω \ Û) = μ̄(Ω̄ \ F(Û)). Then, F∗μ ≪ μ̄ with dF∗μ/dμ̄ = g a.e. x̄ ∈ X̄, and Fᵢ∗μ̄ ≪ μ with dFᵢ∗μ̄/dμ = g₁ a.e. in X, where g : Ω̄ → (0, ∞) ⊂ ℝ is given by g(x̄) = |det(Fᵢ⁽¹⁾(x̄))| for x̄ ∈ Ũ̄ and g(x̄) = 1 for x̄ ∈ Ω̄ \ Ũ̄,


and g₁(x) = 1/g(F(x)), ∀x ∈ Ω. Furthermore, let Y be a separable Banach space. ∀f : Ω → Y that is absolutely integrable over X, we have

$$\int_{\Omega} f\,d\mu=\int_{\bar\Omega} f(F_i(\bar x))\,g(\bar x)\,d\bar\mu(\bar x)\tag{12.6a}$$

and the right-hand side integrand is absolutely integrable over X̄; ∀f̄ : Ω̄ → Y that is absolutely integrable over X̄, we have

$$\int_{\bar\Omega}\bar f\,d\bar\mu=\int_{\Omega}\bar f(F(x))\,g_1(x)\,d\mu(x)\tag{12.6b}$$

and the right-hand side integrand is absolutely integrable over X.

Proof Fix c ∈ (0, 1/√m) ⊂ ℝ. The result is trivial if Ω̄ = ∅. Consider the case Ω̄ ≠ ∅. Then, Ω = Fᵢ(Ω̄) ≠ ∅, μ(Ω) > 0, and μ̄(Ω̄) > 0. First, we need the following intermediate result.

Claim 12.91.1 ∀x̄ ∈ Ũ̄, ∀ε ∈ (0, ∞) ⊂ ℝ, ∃δ ∈ (0, ∞) ⊂ ℝ such that, ∀h̄ ∈ B_{ℝᵐ}(0_m, δ) with h̄ > 0_m and min h̄ ≥ c|h̄|, we have r_{x̄−h̄, x̄+h̄} ⊆ Ω̄ and F∗μ(r_{x̄−h̄, x̄+h̄}) ≤ (1 + ε) g(x̄) μ̄(r_{x̄−h̄, x̄+h̄}). Symmetrically, ∀x ∈ Ũ = Fᵢ(Ũ̄), ∀ε ∈ (0, ∞) ⊂ ℝ, ∃δ ∈ (0, ∞) ⊂ ℝ such that, ∀h ∈ B_{ℝᵐ}(0_m, δ) with h > 0_m and min h ≥ c|h|, we have r_{x−h, x+h} ⊆ Ω and Fᵢ∗μ̄(r_{x−h, x+h}) ≤ (1 + ε)(g(F(x)))^{−1} μ(r_{x−h, x+h}) = (1 + ε) g₁(x) μ(r_{x−h, x+h}).

Proof of Claim ∀x̄ ∈ Ũ̄, we have that Fᵢ⁽¹⁾(x̄) exists and det(Fᵢ⁽¹⁾(x̄))

≠ 0. This implies that F⁽¹⁾(Fᵢ(x̄)) = (Fᵢ⁽¹⁾(x̄))^{−1} and det(F⁽¹⁾(Fᵢ(x̄))) = (det(Fᵢ⁽¹⁾(x̄)))^{−1}. By Definition 9.3, ∀ε̄ ∈ (0, ∞) ⊂ ℝ, ∃δ₁ ∈ (0, ∞) ⊂ ℝ such that, ∀h̄ ∈ B_{ℝᵐ}(0_m, δ₁), we have x̄ + h̄ ∈ Ω̄ and

$$\bigl\|F_i(\bar x+\bar h)-F_i(\bar x)-F_i^{(1)}(\bar x)\bar h\bigr\|\le\bar\epsilon\,|\bar h|$$

Then, ∀h̄ ∈ B_{ℝᵐ}(0_m, δ₁) with h̄ > 0_m and min h̄ ≥ c|h̄|, we have Fᵢ(r_{x̄−h̄, x̄+h̄}) ⊆ Fᵢ(x̄) + Fᵢ⁽¹⁾(x̄)(r_{−h̄, h̄}) + B̄_{ℝᵐ}(0_m, ε̄|h̄|). This implies that F∗μ(r_{x̄−h̄, x̄+h̄}) = μ(Fᵢ(r_{x̄−h̄, x̄+h̄})) ≤ μ_{Bm}(Fᵢ⁽¹⁾(x̄)(r_{−h̄, h̄}) + B̄_{ℝᵐ}(0_m, ε̄|h̄|)). Clearly, μ_{Bm}(Fᵢ⁽¹⁾(x̄)(r_{−h̄, h̄})) = g(x̄) μ_{Bm}(r_{−h̄, h̄}), which can be proved by decomposing the matrix Fᵢ⁽¹⁾(x̄) into basic row operations, where the equality holds for each row operation. Then, for sufficiently small ε̄ > 0, we have F∗μ(r_{x̄−h̄, x̄+h̄}) ≤ (1 + ε) g(x̄) μ_{Bm}(r_{−h̄, h̄}) = (1 + ε) g(x̄) μ̄(r_{x̄−h̄, x̄+h̄}), since Fᵢ⁽¹⁾(x̄)(r_{−h̄, h̄}) is a closed parallelepiped all of whose sides have length at least some constant proportion of |h̄|. This proves the first paragraph of the claim.

∀x ∈ Ũ, we have F(x) ∈ F(Ũ) = Ũ̄. Then, F⁽¹⁾(x) = (Fᵢ⁽¹⁾(F(x)))^{−1} and |det(F⁽¹⁾(x))| = |det(Fᵢ⁽¹⁾(F(x)))|^{−1} = (g(F(x)))^{−1}. By symmetry, ∀ε ∈ (0, ∞) ⊂ ℝ, ∃δ ∈ (0, ∞) ⊂ ℝ, ∀h ∈ B_{ℝᵐ}(0_m, δ) with h > 0_m and min h ≥ c|h|, we have r_{x−h, x+h} ⊆ Ω and μ̄(F(r_{x−h, x+h})) = Fᵢ∗μ̄(r_{x−h, x+h}) ≤ (1 + ε)|det(F⁽¹⁾(x))| μ(r_{x−h, x+h}) = (1 + ε)(g(F(x)))^{−1} μ(r_{x−h, x+h}). This proves the second paragraph of the claim. This completes the proof of the claim. □

We first show that F∗μ ≪ μ̄ under the condition that Ω̄ is a nonempty bounded open set and |det(Fᵢ⁽¹⁾(x̄))| ≤ M̄ ∈ (0, ∞) ⊂ ℝ, ∀x̄ ∈ Ũ̄. Then, 0 < μ̄(Ω̄) < ∞. We need the following intermediate result.

Claim 12.91.2 ∀Ō ∈ B̄ with Ō being open and μ̄(Ō) > 0, we have μ̄(Ō) ≥ μ(O)/M̄ := μ(Fᵢ(Ō))/M̄.

Proof of Claim Fix any Ō ∈ B̄ with Ō being open and μ̄(Ō) > 0. Then, O := Fᵢ(Ō) ∈ B and is open. ∀x̄ ∈ Ȭ := Ō ∩ Ũ̄, by Claim 12.91.1, ∀ε ∈ (0, ∞) ⊂ ℝ, ∃δ(x̄, ε) ∈ (0, ∞) ⊂ ℝ such that, ∀h̄ ∈ B_{ℝᵐ}(0_m, δ(x̄, ε)) with h̄ > 0_m and min h̄ ≥ c|h̄|, we have r_{x̄−h̄, x̄+h̄} ⊆ Ω̄ and μ(Fᵢ(r_{x̄−h̄, x̄+h̄})) = F∗μ(r_{x̄−h̄, x̄+h̄}) ≤ (1 + ε) g(x̄) μ̄(r_{x̄−h̄, x̄+h̄}). The collection of such r_{x̄−h̄, x̄+h̄} is denoted I; we restrict I to consist of all of the above-mentioned r_{x̄−h̄, x̄+h̄}'s that are subsets of Ō. Define Ī := {V ⊆ ℝᵐ | V is a nondegenerate rectangle in ℝᵐ with center x ∈ ℝᵐ, such that the rectangle (5/c)(V − x) + x ∈ I}. Clearly, Ī ⊆ I. It is easy to see that Ī covers Ȭ in the sense of Vitali with index c. By Vitali's Lemma 12.90, ∃N ⊆ ℕ, ∃(Vᵢ)_{i∈N} ⊆ Ī, ∃n ∈ ℤ₊ such that {1, …, n} ⊆ N, {V₁, …, Vₙ} is pairwise disjoint, μ̄(Ȭ \ (⋃_{i=1}^{n} Vᵢ)) < (ε/4) μ̄(Ō), Ȭ ⊆ (⋃_{i=1}^{n} Vᵢ) ∪ (⋃_{i∈N, i>n} Uᵢ), and Σ_{i∈N, i>n} μ̄(Uᵢ) ≤ (ε/4) μ̄(Ō), where Vᵢ = r_{x̄ᵢ−h̄ᵢ, x̄ᵢ+h̄ᵢ} ∈ Ī ⊆ I and Uᵢ = (5/c)(Vᵢ − x̄ᵢ) + x̄ᵢ = r_{x̄ᵢ−(5/c)h̄ᵢ, x̄ᵢ+(5/c)h̄ᵢ} ∈ I, ∀i ∈ N. Then, we have the following line of arguments.

$$\begin{aligned}
\mu(O)&=\mu(\tilde O\cup(O\setminus\tilde O))=\mu(\tilde O)+\mu(O\setminus\tilde O)=\mu(F_i(\tilde{\bar O}))\le\mu\Bigl(F_i\Bigl(\bigl(\bigcup_{i=1}^{n}V_i\bigr)\cup\bigl(\bigcup_{i\in N,\,i>n}U_i\bigr)\Bigr)\Bigr)\\
&\le\sum_{i=1}^{n}\mu(F_i(V_i))+\sum_{i\in N,\,i>n}\mu(F_i(U_i))=\sum_{i=1}^{n}F_*\mu(V_i)+\sum_{i\in N,\,i>n}F_*\mu(U_i)\\
&\le\sum_{i=1}^{n}(1+\epsilon)\,g(\bar x_i)\,\bar\mu(r_{\bar x_i-\bar h_i,\bar x_i+\bar h_i})+\sum_{i\in N,\,i>n}(1+\epsilon)\,g(\bar x_i)\,\bar\mu(r_{\bar x_i-\frac{5}{c}\bar h_i,\bar x_i+\frac{5}{c}\bar h_i})\\
&\le(1+\epsilon)\bar M\sum_{i=1}^{n}\bar\mu(r_{\bar x_i-\bar h_i,\bar x_i+\bar h_i})+(1+\epsilon)\bar M\sum_{i\in N,\,i>n}\bar\mu(r_{\bar x_i-\frac{5}{c}\bar h_i,\bar x_i+\frac{5}{c}\bar h_i})\\
&=(1+\epsilon)\bar M\sum_{i=1}^{n}\bar\mu(V_i)+(1+\epsilon)\bar M\sum_{i\in N,\,i>n}\bar\mu(U_i)\\
&\le(1+\epsilon)\bar M\,\bar\mu\Bigl(\bigcup_{i=1}^{n}V_i\Bigr)+(1+\epsilon)\bar M\,\frac{\epsilon}{4}\,\bar\mu(\bar O)
\le(1+\epsilon)\bar M\,\bar\mu(\bar O)+(1+\epsilon)\bar M\,\frac{\epsilon}{4}\,\bar\mu(\bar O)=(1+\epsilon)\Bigl(1+\frac{\epsilon}{4}\Bigr)\bar M\,\bar\mu(\bar O)
\end{aligned}$$

where Õ := O ∩ Ũ = Fᵢ(Ō ∩ Ũ̄) = Fᵢ(Ȭ); the third equality follows from the fact that O \ Õ ⊆ Ω \ Ũ; the first inequality follows from the monotonicity of measure; the second inequality follows from the subadditivity of measure; the fourth equality follows from Proposition 12.9; the third inequality follows from the choice of I and Ī; the fifth inequality follows from the countable additivity of measure and from the preceding discussion; and the last inequality follows from the monotonicity of measure. Since ε > 0 is arbitrary, we have μ(O) ≤ M̄ μ̄(Ō). This completes the proof of the claim. □

∀Ē ∈ B̄ with μ̄(Ē) = 0, ∀ε > 0, by the fact that X̄ is a metric measure space, there exists Ō ∈ B̄ with Ē ⊆ Ō and Ō being open, such that 0 < μ̄(Ō) < ε/M̄. Then, F∗μ(Ē) = μ(Fᵢ(Ē)) ≤ μ(Fᵢ(Ō)) ≤ M̄ μ̄(Ō) < ε. By the arbitrariness of ε, we have F∗μ(Ē) = 0. By the arbitrariness of Ē, we have F∗μ ≪ μ̄. This proves that F∗μ ≪ μ̄ in the special case.

Now, consider the general case as stipulated in the theorem statement. Since Ω̄ is a nonempty open set and ℝᵐ is second countable, there exist (x̂̄ᵢ)_{i=1}^{∞}, (x̌̄ᵢ)_{i=1}^{∞} ⊆ Ω̄ such that x̂̄ᵢ < x̌̄ᵢ and r_{x̂̄ᵢ, x̌̄ᵢ} ⊆ Ω̄, ∀i ∈ ℕ, and Ω̄ = ⋃_{i=1}^{∞} r°_{x̂̄ᵢ, x̌̄ᵢ}. ∀Ē ∈ B̄ with μ̄(Ē) = 0, we have μ̄(Ē ∩ r°_{x̂̄ᵢ, x̌̄ᵢ}) = 0, ∀i ∈ ℕ. By the special case, F∗μ(Ē ∩ r°_{x̂̄ᵢ, x̌̄ᵢ}) = 0, ∀i ∈ ℕ. Then, 0 ≤ F∗μ(Ē) ≤ Σ_{i=1}^{∞} F∗μ(Ē ∩ r°_{x̂̄ᵢ, x̌̄ᵢ}) = 0. Hence, F∗μ ≪ μ̄. By symmetry, Fᵢ∗μ̄ ≪ μ.

By Radon–Nikodym Theorem 11.169, dF∗μ/dμ̄ =: ĝ exists and is unique almost everywhere on X̄, and dFᵢ∗μ̄/dμ =: ǧ exists and is unique almost everywhere on X, where ĝ : Ω̄ → [0, ∞) ⊂ ℝ is B̄-measurable (or simply B_B(ℝᵐ)-measurable) and ǧ : Ω → [0, ∞) ⊂ ℝ is B-measurable (or simply B_B(ℝᵐ)-measurable). By Definitions 12.7 and 12.8 and Proposition 11.168, it is clear that ǧ(x) = 1/ĝ(F(x)) a.e. x ∈ X and ĝ(x̄) = 1/ǧ(Fᵢ(x̄)) a.e. x̄ ∈ X̄. Then, by changing ĝ on a set of measure zero in μ̄ and changing ǧ on a set of measure zero in μ, we have ǧ(x) = 1/ĝ(F(x)), ∀x ∈ Ω.


By Propositions 11.92, 12.11, and 11.168, we have that (12.6) holds with g substituted by ĝ and g₁ substituted by ǧ. We only need to show that ĝ = g a.e. x̄ ∈ X̄; then the theorem is proved. First, consider the special case when Ω is bounded. Then, χ_{Ω,Ω} is absolutely integrable over X. By the preceding paragraph, we have ∞ > μ(Ω) = ∫_Ω χ_{Ω,Ω} dμ = ∫_{Ω̄} χ_{Ω,Ω}(Fᵢ(x̄)) ĝ(x̄) dμ̄(x̄) = ∫_{Ω̄} χ_{Ω̄,Ω̄}(x̄) ĝ(x̄) dμ̄(x̄). Then, we have ∫_{Ω̄} ĝ dμ̄ < ∞. This implies that ĝ ∈ L̄₁(Ω̄, ℝ). Let Ā := {x̄ ∈ Ω̄ | x̄ is a rectangular Lebesgue point with regularity c of ĝ}. By Proposition 12.80, Ā ∈ B_{Lm} and μ_{Lm}(Ω̄ \ Ā) = 0. Define the set B̄ := {x̄ ∈ Ω̄ | ĝ(x̄) ≤ g(x̄)}. By Propositions 12.81, 11.38, 11.39, and 11.139, g and g₁ are B_B(ℝᵐ)-measurable. By the measurability of g and ĝ and Propositions 7.23, 11.38, 11.39, and 11.35, we have B̄ ∈ B̄. We need the following intermediate result.

Claim 12.91.3 B̄ ⊇ Ā ∩ Ũ̄.

Proof of Claim Fix any x̄ ∈ Ā ∩ Ũ̄ and any ε ∈ (0, ∞) ⊂ ℝ. By Claim 12.91.1, ∃δ ∈ (0, ∞) ⊂ ℝ such that, ∀h̄ ∈ B_{ℝᵐ}(0_m, δ) with h̄ > 0_m and min h̄ ≥ c|h̄|, we have r_{x̄−h̄, x̄+h̄} ⊆ Ω̄ and F∗μ(r_{x̄−h̄, x̄+h̄}) ≤ (1 + ε) g(x̄) μ̄(r_{x̄−h̄, x̄+h̄}). By Definition 12.79, ∃r₀ ∈ (0, ∞) ⊂ ℝ such that, ∀r ∈ (0, r₀), we have B_{ℝᵐ}(x̄, r) ⊆ Ω̄ and

Example 12.95 Let ℝ² be endowed with the usual positive cone, Ω := {(x₁, x₂) ∈ ℝ² | x₁² + x₂² > 0, arg(x₁ + ix₂) ≠ 0}, Ω̄ := {(ρ, θ) ∈ ℝ² | ρ > 0, θ ∈ (0, 2π)}, F : Ω → Ω̄ be defined by F(x₁, x₂) := (√(x₁² + x₂²), arg(x₁ + ix₂)), and Fᵢ : Ω̄ → Ω be defined by Fᵢ(ρ, θ) = F_inv(ρ, θ) = (ρ cos(θ), ρ sin(θ)). It is easy to check that F⁽¹⁾ : Ω → B(ℝ², ℝ²) and Fᵢ⁽¹⁾ : Ω̄ → B(ℝ², ℝ²), with

$$F_i^{(1)}(\rho,\theta)=\begin{bmatrix}\cos(\theta)&-\rho\sin(\theta)\\ \sin(\theta)&\rho\cos(\theta)\end{bmatrix},\qquad\det\bigl(F_i^{(1)}(\rho,\theta)\bigr)=\rho>0,\quad\forall(\rho,\theta)\in\bar\Omega.$$

Hence, Ũ̄ = Ω̄ as defined in Change of Variable Theorem 12.91. Then, all assumptions of that theorem are satisfied.
Let Y be a separable Banach space, and ∀f : Ω → Y that is absolutely integrable over Ω, which is the measure subspace of ℝ², we have ∫_Ω f(x₁, x₂) dμ_{B2}(x₁, x₂) = ∫_{Ω̄} f(Fᵢ(ρ, θ)) ρ dμ_{B2}(ρ, θ). Furthermore, ∀f : ℝ² → Y that is absolutely integrable over ℝ², we have ∫_{ℝ²} f(x₁, x₂) dμ_{B2}(x₁, x₂) = ∫_Ω f(x₁, x₂) dμ_{B2}(x₁, x₂) = ∫_{Ω̄} f(Fᵢ(ρ, θ)) ρ dμ_{B2}(ρ, θ), where the first equality follows from the fact that μ_{B2}(ℝ² \ Ω) = 0 and Proposition 11.92. ∀f̄ : Ω̄ → Y that is absolutely integrable over Ω̄, which is the measure subspace of ℝ², we have ∫_{Ω̄} f̄(ρ, θ) dμ_{B2}(ρ, θ) = ∫_Ω f̄(F(x₁, x₂)) (x₁² + x₂²)^{−1/2} dμ_{B2}(x₁, x₂). ◊
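The polar-coordinate formula of Example 12.95 can be sanity-checked numerically; the Gaussian integrand and the grid sizes below are illustrative choices, not from the text.

```python
# Numerical sanity check of Example 12.95 (polar coordinates) with the
# illustrative integrand f(x1, x2) = exp(-x1^2 - x2^2): in polar form the
# integral over R^2 becomes ∫_0^{2π} ∫_0^∞ exp(-rho^2) rho drho dtheta = pi.
import math

n = 20000
rho_max = 10.0              # exp(-rho^2) is negligible beyond this radius
d_rho = rho_max / n
inner = 0.0                 # midpoint rule for the radial integral
for k in range(n):
    rho = (k + 0.5) * d_rho
    inner += math.exp(-rho * rho) * rho * d_rho
integral = 2.0 * math.pi * inner   # the theta-integral contributes 2*pi
assert abs(integral - math.pi) < 1e-6
```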

Example 12.96 Let m ∈ ℕ with m ≥ 3, ℝᵐ be endowed with the usual positive cone, Ω := {(x₁, …, x_m) ∈ ℝᵐ | x₁² + x₂² > 0, arg(x₁ + ix₂) ≠ 0}, Ω̄ := {(ρ, θ₁, …, θ_{m−2}, ϕ) ∈ ℝᵐ | ρ > 0, θᵢ ∈ (0, π), i = 1, …, m − 2, ϕ ∈ (0, 2π)},

12.7 Representation of (Ck (Ω, Y))∗


and F : Ω → Ω̄ be defined by F(x₁, …, x_m) := (√(Σ_{i=1}^{m} xᵢ²), arccos(x_m/√(Σ_{i=1}^{m} xᵢ²)), …, arccos(x₃/√(x₁² + x₂² + x₃²)), arg(x₁ + ix₂)), and Fᵢ : Ω̄ → Ω be defined by Fᵢ(ρ, θ₁, …, θ_{m−2}, ϕ) = F_inv(ρ, θ₁, …, θ_{m−2}, ϕ) = (ρ sin(θ₁)⋯sin(θ_{m−2})cos(ϕ), ρ sin(θ₁)⋯sin(θ_{m−2})sin(ϕ), ρ sin(θ₁)⋯sin(θ_{m−3})cos(θ_{m−2}), …, ρ sin(θ₁)cos(θ₂), ρ cos(θ₁)). It is easy to check that F⁽¹⁾ : Ω → B(ℝᵐ, ℝᵐ) and Fᵢ⁽¹⁾ : Ω̄ → B(ℝᵐ, ℝᵐ), and det(Fᵢ⁽¹⁾(ρ, θ₁, …, θ_{m−2}, ϕ)) = ρ^{m−1}(sin(θ₁))^{m−2}(sin(θ₂))^{m−3}⋯sin(θ_{m−2}) > 0, ∀(ρ, θ₁, …, θ_{m−2}, ϕ) ∈ Ω̄. Hence, Ũ̄ = Ω̄ as defined in Change of Variable Theorem 12.91. Then, all assumptions of that theorem are satisfied. Let Y be a separable Banach space, and ∀f : Ω → Y that is absolutely integrable over Ω, which is the measure subspace of ℝᵐ, we have ∫_Ω f(x₁, …, x_m) dμ_{Bm}(x₁, …, x_m) = ∫_{Ω̄} f(Fᵢ(ρ, θ₁, …, θ_{m−2}, ϕ)) ρ^{m−1}(sin(θ₁))^{m−2}(sin(θ₂))^{m−3}⋯sin(θ_{m−2}) dμ_{Bm}(ρ, θ₁, …, θ_{m−2}, ϕ). Furthermore, ∀f : ℝᵐ → Y that is absolutely integrable over ℝᵐ, we have ∫_{ℝᵐ} f(x₁, …, x_m) dμ_{Bm}(x₁, …, x_m) = ∫_Ω f(x₁, …, x_m) dμ_{Bm}(x₁, …, x_m) = ∫_{Ω̄} f(Fᵢ(ρ, θ₁, …, θ_{m−2}, ϕ)) ρ^{m−1}(sin(θ₁))^{m−2}(sin(θ₂))^{m−3}⋯sin(θ_{m−2}) dμ_{Bm}(ρ, θ₁, …, θ_{m−2}, ϕ), where the first equality follows from the fact that μ_{Bm}(ℝᵐ \ Ω) = 0 and Proposition 11.92. ∀f̄ : Ω̄ → Y that is absolutely integrable over Ω̄, which is the measure subspace of ℝᵐ, we have ∫_{Ω̄} f̄(ρ, θ₁, …, θ_{m−2}, ϕ) dμ_{Bm}(ρ, θ₁, …, θ_{m−2}, ϕ) = ∫_Ω f̄(F(x₁, …, x_m)) (Σ_{i=1}^{m} xᵢ²)^{−1/2} (Σ_{i=1}^{m−1} xᵢ²)^{−1/2} ⋯ (Σ_{i=1}^{2} xᵢ²)^{−1/2} dμ_{Bm}(x₁, …, x_m). ◊
Example 12.97 Let ℝ³ be endowed with the usual positive cone, Ω := {(x₁, x₂, x₃) ∈ ℝ³ | x₁² + x₂² > 0, arg(x₁ + ix₂) ≠ 0}, Ω̄ := {(ρ, ϕ, z) ∈ ℝ³ | ρ > 0, ϕ ∈ (0, 2π)}, F : Ω → Ω̄ be defined by F(x₁, x₂, x₃) := (√(x₁² + x₂²), arg(x₁ + ix₂), x₃), and Fᵢ : Ω̄ → Ω be defined by Fᵢ(ρ, ϕ, z) = F_inv(ρ, ϕ, z) = (ρ cos(ϕ), ρ sin(ϕ), z). It is easy to check that F⁽¹⁾ : Ω → B(ℝ³, ℝ³) and Fᵢ⁽¹⁾ : Ω̄ → B(ℝ³, ℝ³), with

$$F_i^{(1)}(\rho,\varphi,z)=\begin{bmatrix}\cos(\varphi)&-\rho\sin(\varphi)&0\\ \sin(\varphi)&\rho\cos(\varphi)&0\\ 0&0&1\end{bmatrix},\qquad\det\bigl(F_i^{(1)}(\rho,\varphi,z)\bigr)=\rho>0,\quad\forall(\rho,\varphi,z)\in\bar\Omega.$$

Hence, Ũ̄ = Ω̄ as defined in Change of Variable Theorem 12.91. Then, all assumptions of that theorem are satisfied. Let Y be a separable Banach space, and ∀f : Ω → Y that is absolutely integrable over Ω, which is the measure subspace of ℝ³, we have ∫_Ω f(x₁, x₂, x₃) dμ_{B3}(x₁, x₂, x₃) = ∫_{Ω̄} f(Fᵢ(ρ, ϕ, z)) ρ dμ_{B3}(ρ, ϕ, z). Furthermore, ∀f : ℝ³ → Y that is absolutely integrable over ℝ³, we have ∫_{ℝ³} f(x₁, x₂, x₃) dμ_{B3}(x₁, x₂, x₃) = ∫_Ω f(x₁, x₂, x₃) dμ_{B3}(x₁, x₂, x₃) = ∫_{Ω̄} f(Fᵢ(ρ, ϕ, z)) ρ dμ_{B3}(ρ, ϕ, z), where the first equality follows from the fact that μ_{B3}(ℝ³ \ Ω) = 0 and Proposition 11.92. ∀f̄ : Ω̄ → Y that is


absolutely integrable over Ω̄, which is the measure subspace of ℝ³, we have ∫_{Ω̄} f̄(ρ, ϕ, z) dμ_{B3}(ρ, ϕ, z) = ∫_Ω f̄(F(x₁, x₂, x₃)) (x₁² + x₂²)^{−1/2} dμ_{B3}(x₁, x₂, x₃). ◊
Proposition 12.98 Let I ⊆ ℝ be an open interval, Ī ⊆ ℝ be an open interval, F : I → Ī be a homeomorphism, Fᵢ := F_inv : Ī → I, F and Fᵢ be absolutely continuous, I := ((I, |·|), B, μ) be the σ-finite metric measure subspace of ℝ and Ī := ((Ī, |·|), B̄, μ̄) be the σ-finite metric measure subspace of ℝ, and Y be a separable Banach space. Assume that F⁽¹⁾ : U → ℝ satisfies: sup_{x ∈ r_{x₁,x₂} ∩ U} |F⁽¹⁾(x)| < ∞, ∀x₁, x₂ ∈ I with x₁ ≤ x₂; Fᵢ⁽¹⁾ : Ū → ℝ satisfies: sup_{x̄ ∈ r_{x̄₁,x̄₂} ∩ Ū} |Fᵢ⁽¹⁾(x̄)| < ∞, ∀x̄₁, x̄₂ ∈ Ī with x̄₁ ≤ x̄₂; and Û := {x ∈ U | F⁽¹⁾(x) ≠ 0} and Ũ̄ := {x̄ ∈ Ū | Fᵢ⁽¹⁾(x̄) ≠ 0} are such that μ̄(Ī \ Ũ̄) = 0 = μ(I \ Fᵢ(Ũ̄)) = μ(I \ Û) = μ̄(Ī \ F(Û)). Then, ∀f : I → Y with f being

absolutely integrable over I, we have

$$\int_I f\,d\mu=\int_{\bar I} f(F_i(\bar x))\,g(\bar x)\,d\bar\mu(\bar x)=\int_{F(I)} f(F_i(\bar x))\,g(\bar x)\,d\bar\mu(\bar x)$$

and ∀f̄ : Ī → Y with f̄ being absolutely integrable over Ī, we have

$$\int_{\bar I}\bar f\,d\bar\mu=\int_{F_i(\bar I)}\bar f(F(x))\,g_1(x)\,d\mu(x)=\int_{I}\bar f(F(x))\,g_1(x)\,d\mu(x)$$

where g : Ī → (0, ∞) ⊂ ℝ and g₁ : I → (0, ∞) ⊂ ℝ are given by g(x̄) = |Fᵢ⁽¹⁾(x̄)| for x̄ ∈ Ũ̄, g(x̄) = 1 for x̄ ∈ Ī \ Ũ̄, and g₁(x) = 1/g(F(x)), ∀x ∈ I.

Proof The proposition is a direct consequence of Theorem 12.91. □
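Proposition 12.98 is the one-dimensional substitution rule; it can be sanity-checked numerically. The particular substitution below, F(x) = √x on I = (0, 1) with Fᵢ(x̄) = x̄² and g(x̄) = |Fᵢ⁽¹⁾(x̄)| = 2x̄, and the integrand f(x) = x² are illustrative choices.

```python
# Numerical sanity check of Proposition 12.98 with F(x) = sqrt(x) on (0, 1),
# so F_i(xbar) = xbar^2 and g(xbar) = |F_i^(1)(xbar)| = 2*xbar:
#   ∫_0^1 f(x) dx  =  ∫_0^1 f(xbar^2) * 2*xbar dxbar   for f(x) = x^2.
def midpoint(func, lo, hi, n=100000):
    h = (hi - lo) / n
    return sum(func(lo + (k + 0.5) * h) for k in range(n)) * h

f = lambda x: x * x
lhs = midpoint(f, 0.0, 1.0)                                  # approximately 1/3
rhs = midpoint(lambda xb: f(xb * xb) * 2.0 * xb, 0.0, 1.0)   # approximately 1/3
assert abs(lhs - rhs) < 1e-8
```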

Theorem 12.99 (Riemann–Lebesgue) Let Y be a separable Banach space over 𝕂 and f ∈ L̄₁(ℝ, Y). Then,

$$\lim_{|p|\to\infty}\int_{\mathbb R}f(t)\sin(pt)\,d\mu_B(t)=\vartheta_{\mathrm Y}\tag{12.7}$$
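The decay in (12.7) can be observed numerically. For the illustrative choice f = χ_{[0,1]}, the integral equals (1 − cos p)/p, which is bounded by 2/|p|.

```python
# Numerical illustration of (12.7): for f = indicator of [0, 1], the integral
# ∫ f(t) sin(p t) dt = (1 - cos(p)) / p tends to 0 as |p| grows.
import math

def osc_integral(p, n=200000):
    h = 1.0 / n
    return sum(math.sin(p * (k + 0.5) * h) for k in range(n)) * h

vals = [abs(osc_integral(p)) for p in (10.0, 100.0, 1000.0)]
assert vals[0] > vals[2]          # the oscillatory integral decays
assert vals[2] < 2.0 / 1000.0     # consistent with |(1 - cos p)/p| <= 2/|p|
```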

Proof Since f ∈ L̄₁(ℝ, Y), f is absolutely integrable over ℝ. Then, f̄_p : ℝ → Y defined by f̄_p(t) := f(t) sin(pt), ∀t ∈ ℝ, is absolutely integrable over ℝ, ∀p ∈ ℝ. First, we consider the special case when f = yχ_{I,ℝ} with I being an interval and y ∈ Y with y ≠ ϑ_Y. Then, I must be a finite interval, since f ∈ L̄₁(ℝ, Y). The case when I = ∅ is trivial. Let I := [a, b] with a, b ∈ ℝ and a ≤ b. Then, lim_{|p|→∞} ∫_ℝ f(t) sin(pt) dμ_B(t) = lim_{|p|→∞} ∫_a^b y sin(pt) dμ_B(t) = lim_{|p|→∞} y (1/p)(cos(pa) − cos(pb)) = ϑ_Y, where the first equality follows


from I being an interval and I = [a, b], and the second equality follows from Proposition 11.92 and Theorem 12.83. This special case is proved.

Next, we consider the general case when f ∈ L̄₁(ℝ, Y). By Proposition 11.66, there exists a sequence of simple functions (ϕₙ)_{n=1}^{∞}, ϕₙ : ℝ → Y, ∀n ∈ ℕ, such that limₙ∈ℕ ϕₙ(x) = f(x) a.e. x ∈ ℝ and ‖ϕₙ(x)‖ ≤ ‖f(x)‖, ∀x ∈ ℝ, ∀n ∈ ℕ. Then, limₙ∈ℕ ∫_ℝ P ∘ (f − ϕₙ) dμ_B = 0, by Lebesgue Dominated Convergence Theorem 11.91. ∀ε ∈ (0, ∞) ⊂ ℝ, ∃n₀ ∈ ℕ such that ∫_ℝ P ∘ (f − ϕ_{n₀}) dμ_B < ε/4. Let ϕ_{n₀} admit the canonical representation ϕ_{n₀} = Σ_{i=1}^{n} yᵢ χ_{Eᵢ,ℝ}, where n ∈ ℤ₊, y₁, …, yₙ ∈ Y are distinct and none equals ϑ_Y, and E₁, …, Eₙ ∈ B_B(ℝ) are nonempty, pairwise disjoint, and of finite measure. ∫_ℝ P ∘ ϕ_{n₀} dμ_B = Σ_{i=1}^{n} ‖yᵢ‖ μ_B(Eᵢ). By Proposition 11.7, ∃a, b ∈ ℝ with a < b such that ∫_{ℝ\[a,b]} P ∘ ϕ_{n₀} dμ_B < ε/4. Let f₁ := ϕ_{n₀} χ_{[a,b],ℝ}. Then, ∫_ℝ P ∘ (f − f₁) dμ_B < ε/2. By Proposition 11.182, ∃ a continuous function g : [a, b] → Y such that g ∈ L̄₁([a, b], Y) and ‖g − f₁|_{[a,b]}‖_{L̄₁([a,b],Y)} < ε/4. By Proposition 5.39, g is uniformly continuous. ∃m ∈ ℕ, ∀x₁, x₂ ∈ [a, b] with |x₁ − x₂| ≤ (b − a)/m, we have ‖g(x₁) − g(x₂)‖ < ε/(4(b − a)). Construct a step function h : ℝ → Y by h(x) = g(a + ⌊(x − a)m/(b − a)⌋ (b − a)/m) for x ∈ [a, b] and h(x) = 0 for x ∈ ℝ \ [a, b]. Then, we have ∫_{[a,b]} P ∘ (h|_{[a,b]} − g) dμ_B < ε/4. Hence, ∫_ℝ P ∘ (f − h) dμ_B ≤ ∫_ℝ P ∘ (f − f₁) dμ_B + ∫_ℝ P ∘ (f₁ − h) dμ_B < ε/2 + ∫_{[a,b]} P ∘ (f₁|_{[a,b]} − g) dμ_B + ∫_{[a,b]} P ∘ (g − h|_{[a,b]}) dμ_B < ε. Then, 0 ≤ lim inf_{|p|→∞} ‖∫_ℝ f̄_p dμ_B‖ ≤ lim sup_{|p|→∞} ‖∫_ℝ f̄_p dμ_B‖ ≤ lim sup_{|p|→∞} (‖∫_ℝ h(t) sin(pt) dμ_B(t)‖ + ‖∫_ℝ (f(t) − h(t)) sin(pt) dμ_B(t)‖) ≤ lim_{|p|→∞} ‖∫_ℝ h(t) sin(pt) dμ_B(t)‖ + lim sup_{|p|→∞} ∫_ℝ ‖f(t) − h(t)‖ dμ_B(t) < 0 + ε, where the first three inequalities follow from Proposition 3.83 and Lebesgue Dominated Convergence Theorem 11.91, the fourth inequality follows from Propositions 3.83 and 11.92, and the last inequality follows from the preceding discussion and the special case. Hence, by the arbitrariness of ε, we have lim_{|p|→∞} ‖∫_ℝ f(t) sin(pt) dμ_B(t)‖ = 0, by Proposition 3.83. Then, lim_{|p|→∞} ∫_ℝ f(t) sin(pt) dμ_B(t) = ϑ_Y. This completes the proof of the theorem. □

Proposition 12.100 Let J := r_{a,b} = [a, b] ⊂ ℝ be a compact interval (a ≤ b), g : J → ℝ be of bounded variation (as defined in Definition 12.41) and nondecreasing, and f : J → ℝ be B_B(ℝ)-measurable. Assume that f is bounded, that is, ∃M ≥ 0 such that |f(x)| ≤ M, ∀x ∈ J. If f is Riemann–Stieltjes integrable with respect to g over J (as defined in Definition 29.2 of Bartle (1976), or equivalently in Definition A.4), and the Riemann–Stieltjes integral is denoted by I, then I = ∫_{r_{a,b}} f dg, where the right-hand side is the Lebesgue–Stieltjes integral defined in Definition 12.57.

Proof By Proposition 12.39, P(J) = r_{a,b}. By Proposition 12.49, there exists a unique finite measure μ on the measurable space (P(J), B_B(P(J))) such that g is a cumulative distribution function of μ, μ(r_{x₁,x₂}) = g(x₂) − g(x₁), ∀a ≤ x₁ ≤ x₂ ≤ b,


and μ(P(J)) = μ(r_{a,b}) = g(b) − g(a) = T_g < ∞. Since f is bounded, by Bounded Convergence Theorem 11.77, f is absolutely integrable over P(J) with respect to μ. Then, by Proposition 11.92, Ī := ∫_{r_{a,b}} f dg ∈ ℝ. By the Riemann Criterion for Integrability (Theorem 30.1 of Bartle (1976)), ∀n ∈ ℕ, there exists a partition Pₙ of J such that, for every refinement P := (x₀, …, x_{kₙ}) of Pₙ, we have

$$\Bigl|\sum_{j=1}^{k_n}M_j\bigl(g(x_j)-g(x_{j-1})\bigr)-I\Bigr|<2^{-n};\qquad\Bigl|\sum_{j=1}^{k_n}m_j\bigl(g(x_j)-g(x_{j-1})\bigr)-I\Bigr|<2^{-n}$$

where M_j := sup{f(x) ∈ ℝ | x ∈ [x_{j−1}, x_j]} and m_j := inf{f(x) ∈ ℝ | x ∈ [x_{j−1}, x_j]}. Without loss of generality, we may assume that P_{n+1} is a refinement of Pₙ (otherwise, redefine P_{n+1} to be the partition that contains all of the points in Pₙ and P_{n+1}). Then,

$$\begin{aligned}I&=\lim_{n\in\mathbb N}\sum_{j=1}^{k_n}M_j\bigl(g(x_j)-g(x_{j-1})\bigr)=\lim_{n\in\mathbb N}\int_{r_{a,b}}\sum_{j=1}^{k_n}M_j\,\chi_{r_{x_{j-1},x_j},J}\,d\mu=:\lim_{n\in\mathbb N}\int_{r_{a,b}}\bar h_n\,d\mu\\
&=\lim_{n\in\mathbb N}\sum_{j=1}^{k_n}m_j\bigl(g(x_j)-g(x_{j-1})\bigr)=\lim_{n\in\mathbb N}\int_{r_{a,b}}\sum_{j=1}^{k_n}m_j\,\chi_{r_{x_{j-1},x_j},J}\,d\mu=:\lim_{n\in\mathbb N}\int_{r_{a,b}}h_n\,d\mu\end{aligned}$$

where hₙ : r_{a,b} → ℝ and h̄ₙ : r_{a,b} → ℝ, ∀n ∈ ℕ; and the second and fourth equalities follow from Proposition 11.75. Clearly, −M ≤ hₙ(x) ≤ h_{n+1}(x) ≤ f(x) ≤ h̄_{n+1}(x) ≤ h̄ₙ(x) ≤ M, ∀x ∈ r_{a,b}, ∀n ∈ ℕ. By Proposition 11.48, limₙ∈ℕ hₙ =: h : r_{a,b} → ℝ and limₙ∈ℕ h̄ₙ =: h̄ : r_{a,b} → ℝ are B_B(r_{a,b})-measurable. By Lebesgue Dominated Convergence Theorem 11.91, h and h̄ are absolutely integrable over (r_{a,b}, B_B(r_{a,b}), μ) and ∫_{r_{a,b}} h dμ = limₙ∈ℕ ∫_{r_{a,b}} hₙ dμ = I = limₙ∈ℕ ∫_{r_{a,b}} h̄ₙ dμ = ∫_{r_{a,b}} h̄ dμ. Note that h(x) ≤ f(x) ≤ h̄(x), ∀x ∈ r_{a,b}, and f is B_B(ℝ)-measurable by assumption; then, by Proposition 11.92, I = ∫_{r_{a,b}} h dμ ≤ ∫_{r_{a,b}} f dμ = Ī ≤ ∫_{r_{a,b}} h̄ dμ = I. This completes the proof of the proposition. □

The above can be generalized further. Let g : J → ℝ be of bounded variation, but not necessarily nondecreasing, and everything else be as assumed in the preceding proposition. Then, by Theorem 12.50, there exists a unique finite ℝ-valued measure μ on the measurable space (P(J), B_B(P(J))) such that g is a cumulative distribution function of μ, P ∘ μ(r_{x₁,x₂}) = T_g(r_{x₁,x₂}), ∀a ≤ x₁ ≤ x₂ ≤ b, and P ∘ μ(P(J)) = P ∘ μ(r_{a,b}) = T_g < ∞. By Jordan Decomposition Theorem 11.162, there exists a unique pair of mutually singular finite measures μ₊ and μ₋ on (r_{a,b}, B_B(r_{a,b})),


such that μ = μ₊ − μ₋ and P ∘ μ = μ₊ + μ₋. Let G be a cumulative distribution function for P ∘ μ as delineated in Proposition 12.51; G is of bounded variation. We need to assume that f is Riemann–Stieltjes integrable with respect to g and G over J, respectively, and denote these integrals by I and I_G, respectively. It is easy to see that ½(g + G) and ½(G − g) are cumulative distribution functions for μ₊ and μ₋, respectively. Then, they are nondecreasing since μ₊ and μ₋ are measures. Furthermore, they are continuous on the right since g and G are of bounded variation. Hence, ½(g + G) and ½(G − g) are of bounded variation. Thus, by Theorem 29.5 of Bartle (1976) and Proposition 12.100, we have ½(I + I_G) = ∫_{r_{a,b}} f dμ₊ and ½(I_G − I) = ∫_{r_{a,b}} f dμ₋. This leads to I = ½(I + I_G) − ½(I_G − I) = ∫_{r_{a,b}} f dμ₊ − ∫_{r_{a,b}} f dμ₋ = ∫_{r_{a,b}} f dμ = Ī, where

the third equality follows from Propositions 11.145 and 11.146.

Theorem 12.101 (Integration by Parts) Let J := r_{a,b} ⊆ ℝ be a compact interval, I := ((P(J), |·|), B, μ) be the finite metric measure subspace of ℝ, f : J → ℝ be of bounded variation and a cumulative distribution function of the finite ℝ-valued measure space (P(J), B, ν_f), F : J → ℝ be the cumulative distribution function for the finite measure space (P(J), B, P ∘ ν_f) as delineated in Proposition 12.51 and of bounded variation, g : J → ℝ be of bounded variation and a cumulative distribution function of the finite ℝ-valued measure space (P(J), B, ν_g), and G : J → ℝ be the cumulative distribution function for the finite measure space (P(J), B, P ∘ ν_g) as delineated in Proposition 12.51 and of bounded variation. Assume that f is Riemann–Stieltjes integrable with respect to g and G over J, respectively, with the Riemann–Stieltjes integral of f with respect to g over J denoted by I₁; and g is Riemann–Stieltjes integrable with respect to f and F over J, respectively, with the Riemann–Stieltjes integral of g with respect to f over J denoted by I₂. Then, we have

$$\int_{r_{a,b}}f(x)\,dg(x)+\int_{r_{a,b}}g(x)\,df(x)=I_1+I_2=f(b)g(b)-f(a)g(a)\tag{12.8}$$
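The identity (12.8) can be sanity-checked with Riemann–Stieltjes sums; the functions f(x) = x and g(x) = x² below are illustrative choices, not from the text.

```python
# Numerical sanity check of (12.8) with f(x) = x and g(x) = x^2 on [0, 1]:
# ∫ f dg + ∫ g df = f(1)*g(1) - f(0)*g(0) = 1, approximated by
# Riemann–Stieltjes sums  sum_j f(xi_j) * (g(x_j) - g(x_{j-1}))  with midpoint tags.
def rs_sum(f, g, a, b, n):
    h = (b - a) / n
    total = 0.0
    for j in range(1, n + 1):
        x_prev, x_j = a + (j - 1) * h, a + j * h
        total += f(0.5 * (x_prev + x_j)) * (g(x_j) - g(x_prev))
    return total

f = lambda x: x
g = lambda x: x * x
i1 = rs_sum(f, g, 0.0, 1.0, 100000)   # ∫ f dg, approximately 2/3
i2 = rs_sum(g, f, 0.0, 1.0, 100000)   # ∫ g df, approximately 1/3
assert abs((i1 + i2) - (f(1.0) * g(1.0) - f(0.0) * g(0.0))) < 1e-6
```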

Proof By Proposition 12.54, f − f(a) and g − g(a) are B_B(ℝ)-measurable. Hence, f and g are B_B(ℝ)-measurable, by Propositions 11.38, 7.23, and 11.39. Since f and g are of bounded variation, f and g are bounded. The first equality in the theorem statement follows from Proposition 12.100 and the discussion immediately after it. The second equality in the theorem statement follows from Integration by Parts 29.7 of Bartle (1976) (or Theorem A.8 in the appendix). This completes the proof of the theorem. □

Proposition 12.102 Let I ⊆ ℝ be an open interval, X := ((I, |·|), B, μ) be the σ-finite metric measure subspace of ℝ, Y be a separable reflexive Banach space over 𝕂 with Y∗ being separable, (I, B, ν) be a σ-finite Y-valued measure space, F : I → Y be a cumulative distribution function of ν, and f : I → Y be B-measurable. Assume that F is absolutely continuous and DF(x) = f(x) a.e. x ∈ X. Then, dν/dμ = f a.e. in X.


12 Differentiation and Integration

Proof By Fundamental Theorem of Calculus II, Theorem 12.88, F(x) = F(a) + ∫_a^x f(t) dt, ∀a, x ∈ I. By Proposition 11.116, we may define ν̄ to be the σ-finite Y-valued measure with kernel f over X. Then, by Definition 11.166, f is the Radon–Nikodym derivative of ν̄ with respect to μ. By Definition 12.42, F is a cumulative distribution function of (I, B, ν̄). By Theorem 12.50 and Proposition 12.73, we have ν = ν̄. Then, we have f = dν/dμ a.e. in X. □

Lemma 12.103 Let Ω be a set, S be a π-system on Ω, B be the σ-algebra on Ω generated by S, Y be a normed linear space, and μ_1 and μ_2 be finite Y-valued measures on (Ω, B). Assume that μ_1(E) = μ_2(E), ∀E ∈ S. Then, μ_1(E) = μ_2(E), ∀E ∈ B.

Proof Define D := {E ∈ B | μ_1(E) = μ_2(E)}. Clearly, S ⊆ D ⊆ B. We will show that D is a monotone class on Ω. Then, by Monotone Class Lemma 12.19, we have D = B. Hence, the result holds. Clearly, (i) ∅, Ω ∈ S ⊆ D. (ii) ∀E_1, E_2 ∈ D with E_1 ⊆ E_2, we have μ_1(E_2 \ E_1) = μ_1(E_2) − μ_1(E_1) = μ_2(E_2) − μ_2(E_1) = μ_2(E_2 \ E_1), where the first and last equalities follow from Definitions 11.108 and 11.109 and the second equality follows from the fact that E_1, E_2 ∈ D. This implies that E_2 \ E_1 ∈ D. (iii) ∀(E_i)_{i=1}^∞ ⊆ D with E_i ⊆ E_{i+1}, ∀i ∈ N, we have ⋃_{i=1}^∞ E_i = ⋃_{i=1}^∞ A_i, where A_1 := E_1, A_2 := E_2 \ E_1, A_i := E_i \ E_{i−1}, ∀i ∈ {3, 4, . . .}. Clearly, the A_i's are pairwise disjoint. By (ii), we have A_i ∈ D, ∀i ∈ N. Then, μ_1(⋃_{i=1}^∞ E_i) = μ_1(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ μ_1(A_i) = Σ_{i=1}^∞ μ_2(A_i) = μ_2(⋃_{i=1}^∞ A_i) = μ_2(⋃_{i=1}^∞ E_i). Hence, ⋃_{i=1}^∞ E_i ∈ D. The preceding discussion implies that D is a monotone class on Ω. This completes the proof of the lemma. □

Example 12.104 Let n ∈ N with n ≥ 2, and S^{n−1} := {x ∈ R^n | |x| = 1} ⊆ R^n be the unit sphere in R^n with the subset topology O. Clearly, S^{n−1} ∈ B_B(R^n). Clearly, (S^{n−1}, |·|) is a metric subspace of R^n. Let B := {E ⊆ S^{n−1} | E ∈ B_B(R^n)}. Then, B = B_B((S^{n−1}, O)) by Proposition 11.25.
On S^{n−1}, we introduce the function μ : O → [0, ∞) ⊂ R by μ(O) := nμ_{B^n}(V_O), where V_O := {rx ∈ R^n | x ∈ O, r ∈ (0, 1) ⊂ R} ⊆ R^n, ∀O ∈ O. Clearly, μ(∅) = 0, and ∀O ∈ O, V_O ∈ B_B(R^n) since V_O is an open set in R^n. Hence, μ(O) is well-defined. Clearly, μ(S^{n−1}) = nμ_{B^n}(B_{R^n}(0_n, 1) \ {0_n}) = nμ_{B^n}(B_{R^n}(0_n, 1)) = nπ^{n/2}/Γ(n/2 + 1), where we have referred to the Γ function and the formula for the volume of the n-dimensional ball in (Mathematics Handbook Editors Group, 1979, pp. 320). Clearly, μ is countably additive on O, since μ_{B^n} is countably additive. Let A be the algebra on S^{n−1} generated by O. By Proposition 2.8, ∀A ∈ A, ∃n̄, m ∈ N, ∀i_1, . . . , i_{2n̄} ∈ {1, . . . , m}, ∃F_{i_1,...,i_{2n̄}} ⊆ S^{n−1} with F_{i_1,...,i_{2n̄}} ∈ O or S^{n−1} \ F_{i_1,...,i_{2n̄}} ∈ O such that A = ⋃_{i_1=1}^m ⋂_{i_2=1}^m · · · ⋂_{i_{2n̄}=1}^m F_{i_1,...,i_{2n̄}}. Then, V_A := {rx ∈ R^n | x ∈ A, r ∈ (0, 1) ⊂ R} can be expressed as V_A = ⋃_{i_1=1}^m ⋂_{i_2=1}^m · · · ⋂_{i_{2n̄}=1}^m V_{F_{i_1,...,i_{2n̄}}} ∈ B_B(R^n) since V_{F_{i_1,...,i_{2n̄}}} ∈ B_B(R^n). Thus, we may define μ̂ : A → [0, ∞) ⊂ R by μ̂(A) := nμ_{B^n}(V_A), ∀A ∈ A. Clearly, μ̂ is an extension of μ to A, and it is a finite measure on the algebra A. By Carathéodory Extension Theorem 11.19, μ̂ admits a unique


extension to a finite measure μ̄ on the σ-algebra B. Then, μ̄ : B → [0, ∞) ⊂ R and μ̄|_O = μ. Note that (S^{n−1}, |·|) is second countable by Propositions 4.4 and 4.38. By Proposition 5.40, it is compact. Thus, (S^{n−1}, |·|) is a compact (therefore locally compact) separable metric space and μ̄ is a finite measure on B = B_B((S^{n−1}, O)). By Theorem 11.198, S^{n−1} := ((S^{n−1}, |·|), B, μ̄) is a finite compact separable metric measure space. This is basically the surface area measure space on the unit sphere in R^n. %

Example 12.105 Let I := ((r_{−π,π}, |·|), B, μ) be the finite metric measure subspace of R, and S^1 := ((S^1, |·|), B̄, μ̄) be the finite metric measure space as defined in Example 12.104. Define a mapping g : r_{−π,π} → S^1 by g(x) = (cos(x), sin(x)) ∈ S^1, ∀x ∈ r_{−π,π}. It is easy to see that g is bijective and continuous. But g_inv is not continuous, and therefore g is not a homeomorphism. ∀O ∈ O_{S^1}, we have g_inv(O) ∈ O_I since g is continuous. On the other hand, ∀O ∈ O_I, we will distinguish two exhaustive and mutually exclusive cases: Case 1: π ∈ O; and Case 2: π ∉ O. Case 1: π ∈ O. Then, Ō := O \ {π} ∈ O_I, Ō ∈ O_R, and g(Ō) ∈ O_{S^1}. Then, g(O) = g(Ō) ∪ {g(π)} ∈ B̄. Case 2: π ∉ O. Then, O ∈ O_R and g(O) ∈ O_{S^1} ⊆ B̄. Hence, we have shown that g(O) ∈ B̄, ∀O ∈ O_I. By Proposition 11.34, we have g(B) ∈ B̄, ∀B ∈ B, and g_inv(B̄) ∈ B, ∀B̄ ∈ B̄. Hence, g_inv is an isomeasure between S^1 and (r_{−π,π}, B, g_inv∗μ̄). It is easy to see that μ(I_1) = μ̄(g(I_1)) = g_inv∗μ̄(I_1), ∀ interval I_1 ⊆ I. Clearly, the collection of all intervals that are subsets of I forms a π-system on r_{−π,π}. By Lemma 12.103, μ = g_inv∗μ̄. Hence, g is an isomeasure between I and S^1. ∀p ∈ [1, ∞) ⊂ R, let X be a separable Banach space over K. By Example 11.179, we have L_p(I, X) ≡ L_p(S^1, X) in the sense that ∀z ∈ L̄_p(I, X), we have z ∘ g_inv ∈ L̄_p(S^1, X), and ∀z̄ ∈ L̄_p(S^1, X), we have z̄ ∘ g ∈ L̄_p(I, X). %
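The closed-form value of the total surface measure obtained in Example 12.104, μ̄(S^{n−1}) = nπ^{n/2}/Γ(n/2 + 1), can be checked against the familiar low-dimensional cases; the helper below is our own illustration, not part of the text:

```python
# The surface measure of the unit sphere S^{n-1} from Example 12.104:
# mu(S^{n-1}) = n * pi^(n/2) / Gamma(n/2 + 1), i.e. n times the volume
# of the unit ball in R^n.  For n = 2 this is the circumference 2*pi of
# the unit circle; for n = 3 it is the surface area 4*pi of the sphere.
import math

def sphere_surface_measure(n):
    return n * math.pi ** (n / 2) / math.gamma(n / 2 + 1)

print(abs(sphere_surface_measure(2) - 2 * math.pi) < 1e-12)
print(abs(sphere_surface_measure(3) - 4 * math.pi) < 1e-12)
```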

12.8 Sobolev Spaces

In this section, we will be concerned with absolutely continuous Y-valued functions on an open rectangle Ω of R^m, with m ∈ N, where Y is a separable reflexive Banach space over K with Y∗ being separable. We restrict ourselves to such functions that further have a finite L̄_p(X, Y) pseudo-norm, where p ∈ [1, ∞) ⊂ R. We will use the notation of the Fundamental Theorem of Calculus II, Theorem 12.88, without repeating the definitions. For any such function f, there exists a set of functions f_{J,x_0} : π_J(Ω) → Y, which are B_J-measurable and absolutely integrable over any bounded rectangle of X_J, ∀J ⊆ J̄ := {1, . . . , m} with J ≠ ∅, and any fixed x_0 ∈ Ω. These are the stream functions of f. They form a unique set of functions, except on a set of measure zero in X_J for each of the stream functions. Together they satisfy (12.3). Theorem 12.88 clearly states that these functions determine the partial derivative functions D_i f, ∀i ∈ J̄. We will further restrict ourselves to absolutely continuous functions f such that f_{J,x_0} is in L̄_p(X_J, Y),


∀J ⊆ J̄ with J ≠ ∅. It should be clear that (f + g)_{J,x_0} = f_{J,x_0} + g_{J,x_0} a.e. in X_J and (αf)_{J,x_0} = αf_{J,x_0} a.e. in X_J, ∀J ⊆ J̄ with J ≠ ∅, ∀f, g under consideration, and ∀α ∈ K. This shows that such functions form a vector space over K. We will define a norm on this space as

$$\|f\|_{x_0} := \bigg( \int_\Omega P_p\circ f(x)\,d\mu(x) + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} \int_{\pi_J(\Omega)} P_p\circ f_{J,x_0}(s)\,d\mu_J(s) \bigg)^{1/p}$$

The value of the norm does depend on the point x_0, but the functions included in Z are independent of the choice of x_0 (see Example 12.107). We will show that this defines a Banach space over K, which will be called the 1st order Sobolev space of X to Y.

Example 12.106 Let m ∈ N, R^m be endowed with the usual positive cone, p ∈ [1, ∞) ⊂ R, Ω ∈ B_B(R^m) be a nonempty open rectangle, X := ((Ω, |·|), B, μ) be the σ-finite metric measure subspace of R^m, and Y be a separable reflexive Banach space over K with Y∗ being separable. Fix any x_0 ∈ Ω. Let Z := {f ∈ L̄_p(X, Y) | f is absolutely continuous, and the stream functions of f with respect to x_0 are in L̄_p(X_J, Y), ∀J ⊆ J̄ with J ≠ ∅}, where J̄ := {1, . . . , m} and X_J := (π_J(Ω), B_J, μ_J) is as defined in Theorem 12.88. On the set Z, we define the norm ‖·‖_{Z,x_0} : Z → [0, ∞) ⊂ R by

$$\|f\|_{Z,x_0} := \bigg( \int_\Omega P_p\circ f(x)\,d\mu(x) + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} \int_{\pi_J(\Omega)} P_p\circ f_{J,x_0}(s)\,d\mu_J(s) \bigg)^{1/p} \tag{12.9}$$

We will show that (Z, ‖·‖_{Z,x_0}, K) forms a Banach space over K, which will be called the 1st order Sobolev space of X to Y under the p-norm with respect to x_0 ∈ X, and denote it by W_{p,1,x_0}(X, Y). ∀f_l ∈ Z, l = 1, 2, 3, we let f_{l,J,x_0} be its stream functions with respect to x_0. Then, f_l ∈ L̄_p(X, Y) and f_{l,J,x_0} ∈ L̄_p(X_J, Y), ∀J ⊆ J̄ with J ≠ ∅. Then, f_1 + f_2 ∈ L̄_p(X, Y) and (f_1 + f_2)_{J,x_0} = f_{1,J,x_0} + f_{2,J,x_0} a.e. in X_J, ∀J ⊆ J̄ with J ≠ ∅. This implies that (f_1 + f_2)_{J,x_0} ∈ L̄_p(X_J, Y). By Propositions 12.69 and 12.61, f_1 + f_2 is absolutely continuous. Hence, f_1 + f_2 ∈ Z. ∀α ∈ K, by Proposition 12.69, αf_1 is absolutely continuous. It is easy to see that (αf_1)_{J,x_0} = αf_{1,J,x_0} a.e. in X_J. Thus, we have (αf_1)_{J,x_0} ∈ L̄_p(X_J, Y), ∀J ⊆ J̄ with J ≠ ∅. Hence, αf_1 ∈ Z. This paragraph then proves that (Z, K) is a subspace of L̄_p(X, Y) and hence a vector space over K. Clearly, ‖·‖_{Z,x_0} : Z → [0, ∞) ⊂ R and ‖ϑ_Z‖_{Z,x_0} = 0. If ‖f_1‖_{Z,x_0} = 0, we have f_1(x) = ϑ_Y a.e. x ∈ X and f_{1,J,x_0}(x) = ϑ_Y a.e. x ∈ X_J. Since f_1 is absolutely continuous, then f_1(x) = ϑ_Y, ∀x ∈ X. Thus, f_1 = ϑ_Z.


Note that, ∀α ∈ K,

$$\|\alpha f_1\|_{Z,x_0} = \bigg( \int_\Omega P_p\circ(\alpha f_1)(x)\,d\mu(x) + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} \int_{\pi_J(\Omega)} P_p\circ(\alpha f_{1,J,x_0})(s)\,d\mu_J(s) \bigg)^{1/p} = |\alpha|\,\|f_1\|_{Z,x_0}$$

To prove the triangle inequality, we will distinguish two exhaustive and mutually exclusive cases: Case 1: ‖f_1‖_{Z,x_0}‖f_2‖_{Z,x_0} = 0; Case 2: ‖f_1‖_{Z,x_0}‖f_2‖_{Z,x_0} > 0.

Case 1: ‖f_1‖_{Z,x_0}‖f_2‖_{Z,x_0} = 0. Without loss of generality, assume ‖f_2‖_{Z,x_0} = 0. Then, f_2 = ϑ_Z and f_2(x) = ϑ_Y, ∀x ∈ Ω. Then, we have ‖f_1 + f_2‖_{Z,x_0} = ‖f_1‖_{Z,x_0} = ‖f_1‖_{Z,x_0} + ‖f_2‖_{Z,x_0}. This case is proved.

Case 2: ‖f_1‖_{Z,x_0}‖f_2‖_{Z,x_0} > 0. Then, ‖f_1‖_{Z,x_0} > 0 and ‖f_2‖_{Z,x_0} > 0. Let λ := ‖f_1‖_{Z,x_0}/(‖f_1‖_{Z,x_0} + ‖f_2‖_{Z,x_0}) ∈ (0, 1) ⊂ R. Then, ∀x ∈ X, we have

$$\frac{\|f_1(x)+f_2(x)\|_Y^p}{(\|f_1\|_{Z,x_0}+\|f_2\|_{Z,x_0})^p} \le \bigg(\frac{\|f_1(x)\|_Y+\|f_2(x)\|_Y}{\|f_1\|_{Z,x_0}+\|f_2\|_{Z,x_0}}\bigg)^p = \bigg(\lambda\,\frac{\|f_1(x)\|_Y}{\|f_1\|_{Z,x_0}} + (1-\lambda)\,\frac{\|f_2(x)\|_Y}{\|f_2\|_{Z,x_0}}\bigg)^p \le \lambda\,\frac{\|f_1(x)\|_Y^p}{\|f_1\|_{Z,x_0}^p} + (1-\lambda)\,\frac{\|f_2(x)\|_Y^p}{\|f_2\|_{Z,x_0}^p}$$

where the first inequality follows from Definition 7.1 and the second inequality follows from the convexity of the function l^p, ∀l ∈ [0, ∞) ⊂ R. By Proposition 11.83, we have

$$\frac{\int_\Omega P_p\circ(f_1+f_2)(x)\,d\mu(x)}{(\|f_1\|_{Z,x_0}+\|f_2\|_{Z,x_0})^p} \le \lambda\,\frac{\int_\Omega P_p\circ f_1(x)\,d\mu(x)}{\|f_1\|_{Z,x_0}^p} + (1-\lambda)\,\frac{\int_\Omega P_p\circ f_2(x)\,d\mu(x)}{\|f_2\|_{Z,x_0}^p}$$

Fix any J ⊆ J̄ with J ≠ ∅. We have, ∀s ∈ π_J(Ω),

$$\frac{\|f_{1,J,x_0}(s)+f_{2,J,x_0}(s)\|_Y^p}{(\|f_1\|_{Z,x_0}+\|f_2\|_{Z,x_0})^p} \le \bigg(\frac{\|f_{1,J,x_0}(s)\|_Y+\|f_{2,J,x_0}(s)\|_Y}{\|f_1\|_{Z,x_0}+\|f_2\|_{Z,x_0}}\bigg)^p = \bigg(\lambda\,\frac{\|f_{1,J,x_0}(s)\|_Y}{\|f_1\|_{Z,x_0}}+(1-\lambda)\,\frac{\|f_{2,J,x_0}(s)\|_Y}{\|f_2\|_{Z,x_0}}\bigg)^p \le \lambda\,\frac{\|f_{1,J,x_0}(s)\|_Y^p}{\|f_1\|_{Z,x_0}^p}+(1-\lambda)\,\frac{\|f_{2,J,x_0}(s)\|_Y^p}{\|f_2\|_{Z,x_0}^p}$$

where the first inequality follows from Definition 7.1 and the second inequality follows from the convexity of the function l^p, ∀l ∈ [0, ∞) ⊂ R. By Proposition 11.83, we have

$$\frac{\int_{\pi_J(\Omega)} P_p\circ(f_{1,J,x_0}+f_{2,J,x_0})(s)\,d\mu_J(s)}{(\|f_1\|_{Z,x_0}+\|f_2\|_{Z,x_0})^p} \le \lambda\,\frac{\int_{\pi_J(\Omega)} P_p\circ f_{1,J,x_0}(s)\,d\mu_J(s)}{\|f_1\|_{Z,x_0}^p} + (1-\lambda)\,\frac{\int_{\pi_J(\Omega)} P_p\circ f_{2,J,x_0}(s)\,d\mu_J(s)}{\|f_2\|_{Z,x_0}^p}$$


Thus, we have

$$\frac{\|f_1+f_2\|_{Z,x_0}^p}{(\|f_1\|_{Z,x_0}+\|f_2\|_{Z,x_0})^p} = \frac{1}{(\|f_1\|_{Z,x_0}+\|f_2\|_{Z,x_0})^p}\bigg( \int_\Omega P_p\circ(f_1+f_2)(x)\,d\mu(x) + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} \int_{\pi_J(\Omega)} P_p\circ(f_{1,J,x_0}+f_{2,J,x_0})(s)\,d\mu_J(s) \bigg)$$
$$\le \lambda\,\frac{\int_\Omega P_p\circ f_1(x)\,d\mu(x)}{\|f_1\|_{Z,x_0}^p} + (1-\lambda)\,\frac{\int_\Omega P_p\circ f_2(x)\,d\mu(x)}{\|f_2\|_{Z,x_0}^p} + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}}\bigg( \lambda\,\frac{\int_{\pi_J(\Omega)} P_p\circ f_{1,J,x_0}(s)\,d\mu_J(s)}{\|f_1\|_{Z,x_0}^p} + (1-\lambda)\,\frac{\int_{\pi_J(\Omega)} P_p\circ f_{2,J,x_0}(s)\,d\mu_J(s)}{\|f_2\|_{Z,x_0}^p} \bigg)$$
$$= \frac{\lambda}{\|f_1\|_{Z,x_0}^p}\bigg( \int_\Omega P_p\circ f_1(x)\,d\mu(x) + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} \int_{\pi_J(\Omega)} P_p\circ f_{1,J,x_0}(s)\,d\mu_J(s) \bigg) + \frac{1-\lambda}{\|f_2\|_{Z,x_0}^p}\bigg( \int_\Omega P_p\circ f_2(x)\,d\mu(x) + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} \int_{\pi_J(\Omega)} P_p\circ f_{2,J,x_0}(s)\,d\mu_J(s) \bigg) = 1$$

Hence, we have ‖f_1 + f_2‖_{Z,x_0} ≤ ‖f_1‖_{Z,x_0} + ‖f_2‖_{Z,x_0}. This case is proved. Based on the two cases, we have the triangle inequality for ‖·‖_{Z,x_0}. So far, we have shown that (Z, ‖·‖_{Z,x_0}, K) is a normed linear space.

Finally, we will show that the space is complete. Fix any Cauchy sequence (f_l)_{l=1}^∞ ⊆ Z. By the definition of the norm ‖·‖_{Z,x_0}, (f_l)_{l=1}^∞ ⊆ L̄_p(X, Y) is a Cauchy sequence, and (f_{l,J,x_0})_{l=1}^∞ ⊆ L̄_p(X_J, Y) is a Cauchy sequence, ∀J ⊆ J̄ with J ≠ ∅. By the completeness of L_p as proved in Example 11.179, ∃F̄ ∈ L̄_p(X, Y) and ∃F_{J,x_0} ∈ L̄_p(X_J, Y) such that lim_{l∈N} f_l = F̄ in L̄_p(X, Y) and lim_{l∈N} f_{l,J,x_0} = F_{J,x_0} in L̄_p(X_J, Y), ∀J ⊆ J̄ with J ≠ ∅. We need to construct a function F ∈ Z such that its stream functions with respect to x_0 are the F_{J,x_0}'s. It should be noted that the F_{J,x_0}'s are specified exactly except on a set of measure zero,


which is sufficient for them to serve as stream functions. Note that ∀l ∈ N, ∀x ∈ Ω, we have

$$f_l(x) = \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} (-1)^{\operatorname{card}(\check J\cap J)} \int_{\pi_J(r_{\hat x,\check x})} f_{l,J,x_0}\,d\mu_J + f_l(x_0)$$

where x̂ := x ∧ x_0, x̌ := x ∨ x_0, and J̌ := {i ∈ J̄ | π_i(x_0) > π_i(x)}. (This is exactly (12.3).) As l → ∞, we have

$$\lim_{l\in\mathbb N}\big(f_l(x) - f_l(x_0)\big) = \lim_{l\in\mathbb N} \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} (-1)^{\operatorname{card}(\check J\cap J)} \int_{\pi_J(r_{\hat x,\check x})} f_{l,J,x_0}\,d\mu_J = \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} (-1)^{\operatorname{card}(\check J\cap J)} \int_{\pi_J(r_{\hat x,\check x})} F_{J,x_0}\,d\mu_J, \quad \forall x\in\Omega$$

where the last equality follows from the fact that lim_{l∈N} f_{l,J,x_0} = F_{J,x_0} in L̄_p(X_J, Y) and Proposition 11.213. This formula determines the function F up to F(x_0). Since lim_{l∈N} f_l = F̄ in L̄_p(X, Y), then by Propositions 11.211 and 11.57, there exists a subsequence (f_{l_k})_{k∈N} such that lim_{k∈N} f_{l_k} = F̄ a.e. in X. Then, ∃x_1 ∈ Ω (possibly different from x_0) such that lim_{k∈N} f_{l_k}(x_1) = F̄(x_1). Then, we may define

$$F(x_0) := \bar F(x_1) - \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} (-1)^{\operatorname{card}(\check J_1\cap J)} \int_{\pi_J(r_{\hat x_1,\check x_1})} F_{J,x_0}\,d\mu_J$$

where x̂_1 := x_1 ∧ x_0, x̌_1 := x_1 ∨ x_0, and J̌_1 := {i ∈ J̄ | π_i(x_0) > π_i(x_1)}. Therefore, we may define F : Ω → Y by, ∀x ∈ Ω,

$$F(x) = F(x_0) + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} (-1)^{\operatorname{card}(\check J\cap J)} \int_{\pi_J(r_{\hat x,\check x})} F_{J,x_0}\,d\mu_J$$

By Fundamental Theorem of Calculus I, Theorem 12.86, each term in the above sum is an absolutely continuous function on π_J(Ω) and therefore absolutely continuous on Ω by Definition 12.58. By Propositions 12.69 and 12.61, F is absolutely continuous on Ω. It is easy to see that F = F̄ a.e. in X, and the F_{J,x_0}'s are the stream functions of F with respect to x_0 by Fundamental Theorem of Calculus II, Theorem 12.88. Hence, F ∈ Z. Then, it is straightforward to check that lim_{l∈N} ‖f_l − F‖_{Z,x_0} = 0 and lim_{l∈N} f_l = F in W_{p,1,x_0}(X, Y). Hence, any Cauchy sequence in Z converges to an element of Z. This proves that W_{p,1,x_0}(X, Y) is complete, and therefore it is a Banach space over K. %

We note that the above definition of the Sobolev space agrees with that in Zeidler (1995) when m = 1. In this case, the norm does not depend on the choice of x_0 ∈ X.
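For m = 1, the only nonempty index set is J = {1} = J̄ and the single stream function is an a.e. derivative of f, so ‖·‖_{Z,x_0} reduces to the classical W^{1,p} norm (‖f‖_p^p + ‖f′‖_p^p)^{1/p}. A minimal numerical sketch on Ω = (0, 1); the grid quadrature and the choice f(x) = x² are our own illustration, not part of the text:

```python
# For m = 1 the W_{p,1} norm of Example 12.106 reduces to
# (int_0^1 |f|^p + int_0^1 |f'|^p)^(1/p): the single stream function
# is the a.e. derivative of f.  Sketch on Omega = (0, 1) with
# f(x) = x^2, whose exact norm for p = 2 is (1/5 + 4/3)^(1/2).
import math

def w1p_norm(f, df, p=2, n=10_000):
    """Discretized (||f||_p^p + ||f'||_p^p)^(1/p) on (0, 1), midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += (abs(f(x)) ** p + abs(df(x)) ** p) * h
    return total ** (1.0 / p)

approx = w1p_norm(lambda x: x * x, lambda x: 2 * x)
exact = math.sqrt(1.0 / 5.0 + 4.0 / 3.0)
print(abs(approx - exact) < 1e-6)
```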


Example 12.107 We will now show that f ∈ W_{p,1,x_0}(X, Y) if, and only if, f ∈ W_{p,1,x̄}(X, Y), ∀x̄ ∈ X, under the assumptions of Example 12.106. Furthermore, the norms ‖·‖_{Z,x_0} and ‖·‖_{Z,x̄} are equivalent to each other.

Fix any f ∈ W_{p,1,x_0}(X, Y). We have that f ∈ L̄_p(X, Y), f is absolutely continuous, and its stream functions f_{J,x_0} ∈ L̄_p(X_J, Y), ∀J ⊆ J̄ with J ≠ ∅. Fix any x̄ ∈ X. The stream functions of f with respect to x̄ are given by (12.4): ∀J ⊆ J̄ with J ≠ ∅,

$$f_{J,\bar x}(s) = \begin{cases} \displaystyle\sum_{J\subset\tilde J\subseteq\bar J} (-1)^{\operatorname{card}(\check{\bar J}\cap\tilde J\setminus J)} \int_{\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}})} f_{\tilde J,x_0}\big(\pi_{\tilde J}(M_J(s,\pi_{\bar J\setminus J}(t)))\big)\,d\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(t)) + f_{J,x_0}(s) & s\in\bar U_J\\[1ex] \vartheta_Y & s\in\pi_J(\Omega)\setminus\bar U_J \end{cases}$$

where J̄̌ := {i ∈ J̄ | π_i(x_0) > π_i(x̄)}, x̄̂ := x̄ ∧ x_0, and x̄̌ := x̄ ∨ x_0; and Ū_J ∈ B_J with μ_J(π_J(Ω) \ Ū_J) = 0. All we need to do is show that f_{J,x̄} ∈ L̄_p(X_J, Y), ∀J ⊆ J̄ with J ≠ ∅, under the assumption that f_{J,x_0} ∈ L̄_p(X_J, Y), ∀J ⊆ J̄ with J ≠ ∅. Fix any J ⊆ J̄ with J ≠ ∅. Then, we have

$$\|f_{J,\bar x}\|_p \le \|f_{J,x_0}\|_p + \sum_{J\subset\tilde J\subseteq\bar J} \bigg( \int_{\bar U_J} \Big\| \int_{\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}})} f_{\tilde J,x_0}\big(\pi_{\tilde J}(M_J(s,\pi_{\bar J\setminus J}(t)))\big)\,d\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(t)) \Big\|_Y^p\,d\mu_J(s) \bigg)^{1/p}$$
$$\le \|f_{J,x_0}\|_p + \sum_{J\subset\tilde J\subseteq\bar J} \bigg( \int_{\bar U_J} \Big( \int_{\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}})} \big\|f_{\tilde J,x_0}\big(\pi_{\tilde J}(M_J(s,\pi_{\bar J\setminus J}(t)))\big)\big\|_Y\,d\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(t)) \Big)^p\,d\mu_J(s) \bigg)^{1/p}$$
$$\le \|f_{J,x_0}\|_p + \sum_{J\subset\tilde J\subseteq\bar J} \bigg( \int_{\bar U_J} \Big( \int_{\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}})} 1^q\,d\mu_{\tilde J\setminus J} \Big)^{p/q} \int_{\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}})} \big\|f_{\tilde J,x_0}\big(\pi_{\tilde J}(M_J(s,\pi_{\bar J\setminus J}(t)))\big)\big\|_Y^p\,d\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(t))\,d\mu_J(s) \bigg)^{1/p}$$
$$= \|f_{J,x_0}\|_p + \sum_{J\subset\tilde J\subseteq\bar J} \bigg( \int_{\bar U_J} \big(\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}}))\big)^{p/q} \int_{\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}})} \big\|f_{\tilde J,x_0}\big(\pi_{\tilde J}(M_J(s,\pi_{\bar J\setminus J}(t)))\big)\big\|_Y^p\,d\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(t))\,d\mu_J(s) \bigg)^{1/p}$$
$$= \|f_{J,x_0}\|_p + \sum_{J\subset\tilde J\subseteq\bar J} \big(\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}}))\big)^{1/q} \bigg( \int_{\bar U_J} \int_{\pi_{\tilde J\setminus J}(\Omega)} \big\|f_{\tilde J,x_0}\big(\pi_{\tilde J}(M_J(s,\pi_{\bar J\setminus J}(t)))\big)\big\|_Y^p\,d\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(t))\,d\mu_J(s) \bigg)^{1/p}$$
$$= \|f_{J,x_0}\|_p + \sum_{J\subset\tilde J\subseteq\bar J} \big(\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}}))\big)^{1/q} \bigg( \int_{\pi_{\tilde J}(\Omega)} \big\|f_{\tilde J,x_0}(s)\big\|_Y^p\,d\mu_{\tilde J}(s) \bigg)^{1/p} = \|f_{J,x_0}\|_p + \sum_{J\subset\tilde J\subseteq\bar J} \big(\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}}))\big)^{1/q}\,\|f_{\tilde J,x_0}\|_p < \infty$$

where the first inequality follows from Minkowski's Inequality, Theorem 11.174, the second inequality follows from Proposition 11.92, the third inequality follows from Hölder's Inequality, Theorem 11.178, the first equality follows from Proposition 11.75, the second equality follows from Proposition 11.83, the third equality follows from Tonelli's Theorem 12.29, and the last inequality follows from our assumption. This shows that f_{J,x̄} ∈ L̄_p(X_J, Y). By the arbitrariness of J, we have ‖f‖_{Z,x̄} < ∞ and f ∈ W_{p,1,x̄}(X, Y). By symmetry, ∀f ∈ W_{p,1,x̄}(X, Y), we have f ∈ W_{p,1,x_0}(X, Y). Next, we prove that the two norms are equivalent. Note that

$$\|f\|_{Z,\bar x} = \bigg( \|f\|_p^p + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} \|f_{J,\bar x}\|_p^p \bigg)^{1/p} \le \bigg( \|f\|_p^p + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} \Big( \|f_{J,x_0}\|_p + \sum_{J\subset\tilde J\subseteq\bar J} \big(\mu_{\tilde J\setminus J}(\pi_{\tilde J\setminus J}(r_{\hat{\bar x},\check{\bar x}}))\big)^{1/q}\,\|f_{\tilde J,x_0}\|_p \Big)^p \bigg)^{1/p}$$
$$\le \bigg( \|f\|_p^p + \sum_{\substack{J\subseteq\bar J\\ J\neq\emptyset}} \Big( \|f_{J,x_0}\|_p + \sum_{J\subset\tilde J\subseteq\bar J} l^{\operatorname{card}(\tilde J\setminus J)/q}\,\|f_{\tilde J,x_0}\|_p \Big)^p \bigg)^{1/p} \le \cdots \le K_{m,p}(l)\,\|f\|_{Z,x_0}$$

where the first equality follows from (12.9), the first inequality follows from the previous paragraph, the second inequality follows from the definition l := max(x̄̌ − x̄̂) ∈ R_+, and the remaining steps, which follow from Hölder's Inequality, Theorem 7.8, (12.9), and simple algebra, bound the right-hand side by an explicit constant K_{m,p}(l) ∈ R_+, depending only on m, p, and l, times ‖f‖_{Z,x_0}. By symmetry, we have ‖f‖_{Z,x_0} ≤ K_{m,p}(l)‖f‖_{Z,x̄}, where l is the same quantity as before. Hence, the norms are equivalent on W_{p,1,x_0}(X, Y). %

Proposition 12.108 Let m ∈ N, Ω ∈ B_B(R^m) be a nonempty open rectangle, X := ((Ω, |·|), B, μ) be the σ-finite metric measure subspace of R^m, J̄ := {1, . . . , m}, Y be a separable reflexive Banach space over K with Y∗ being separable, x_0 ∈


Ω, .Z := Wp,1,x0 (X, Y), .f ∈ Z, .Di f : Ui → Y be the partial derivative of f along the ith coordinate direction, .i ∈ J¯, and .fJ,x0 : πJ (Ω) → Y be the stream functions of f with respect to .x0 , .∀J ⊆ J¯ with .J = ∅. Then, we have the following results: (i) .∀i ∈ J¯, .Ui ∈ B with .μ(Ω \ Ui ) = 0. (ii) .∀t ∈ XJ¯\{i} , .∀s1 , s2 ∈ X{i} , we have Di f (M{i} (s, t)) = f{i},M{i} (π{i} (x0 ),t )(s) a.e. s ∈ X{i}

.

f (M{i} (s1 , t)) − f (M{i} (s2 , t)) ¯ p (X{i} , Y). .f{i},M{i} (π{i} (x0 ),t ) ∈ L .

=

s1 s2

f{i},M{i} (π{i} (x0 ),t )(s) ds,

and

Proof The results are direct consequences of (iii) of Fundamental Theorem of Calculus II, Theorem 12.88, and Example 12.107. & '

12.9 Integral Depending on a Parameter

Proposition 12.109 Let X := (X, ρ) be a metric space, D ⊆ X be open, and K ⊆ D be compact. Then, there exists δ ∈ R_+ such that ∀x ∈ K, B_X(x, δ) ⊆ D.

Proof Since D is open, then ∀x ∈ K, ∃δ_x ∈ R_+ such that B_X(x, δ_x) ⊆ D. This implies K ⊆ ⋃_{x∈K} B_X(x, δ_x/2). By the compactness of K, there exist N ∈ Z_+ and (x_i)_{i=1}^N ⊆ K such that K ⊆ ⋃_{i=1}^N B_X(x_i, δ_{x_i}/2). Let δ := min{1, min_{i∈{1,...,N}} δ_{x_i}/2} ∈ R_+. Then, ∀x ∈ K, ∀y ∈ B_X(x, δ), ∃i_0 ∈ {1, . . . , N} such that x ∈ B_X(x_{i_0}, δ_{x_{i_0}}/2).

Thus, ρ(y, x_{i_0}) ≤ ρ(y, x) + ρ(x_{i_0}, x) < δ + δ_{x_{i_0}}/2 ≤ δ_{x_{i_0}}. This leads to y ∈ B_X(x_{i_0}, δ_{x_{i_0}}) ⊆ D. By the arbitrariness of y, we have B_X(x, δ) ⊆ D. This completes the proof of the proposition. □

Proposition 12.110 Let X := (X, ρ) be a metric space, D ⊆ X be open, K ⊆ D be compact, Y be a normed linear space, and f : D → Y. Assume that f is continuous at x and ‖f(x)‖_Y ≤ M, ∀x ∈ K, for some M ∈ R_+. Then, ∀ε ∈ R_+, ∃δ ∈ R_+ such that ∀x ∈ X with dist(x, K) < δ, we have ‖f(x)‖_Y < M + ε.

Proof By Proposition 12.109, there exists δ̄ ∈ R_+ such that ∀x ∈ K, B_X(x, δ̄) ⊆ D. Fix any ε ∈ R_+. By the assumptions, ∀x ∈ K, ∃δ_x ∈ r_{0,δ̄} ⊂ R_+ such that ∀y ∈ B_X(x, δ_x), we have ‖f(y) − f(x)‖_Y < ε. This implies that ‖f(y)‖_Y < ε + ‖f(x)‖_Y ≤ M + ε. Now, K ⊆ ⋃_{x∈K} B_X(x, δ_x/2). By the compactness of K, there exist N ∈ Z_+ and (x_i)_{i=1}^N ⊆ K such that K ⊆ ⋃_{i=1}^N B_X(x_i, δ_{x_i}/2). Let δ := min{1, min_{i∈{1,...,N}} δ_{x_i}/2} ∈ R_+. Then, ∀x ∈ X with dist(x, K) < δ, ∃y ∈ K such that x ∈ B_X(y, δ). This further implies that ∃i_0 ∈ {1, . . . , N} such that y ∈ B_X(x_{i_0}, δ_{x_{i_0}}/2). Thus, ρ(x, x_{i_0}) ≤ ρ(y, x) + ρ(x_{i_0}, y) < δ + δ_{x_{i_0}}/2 ≤ δ_{x_{i_0}}. This leads


to x ∈ B_X(x_{i_0}, δ_{x_{i_0}}) ⊆ D and ‖f(x)‖_Y < M + ε. This completes the proof of the proposition. □

Theorem 12.111 Let f : X × Y → B(Z, W), where X := (X, ρ) is a metric space, Y is a compact metric space, Z is a normed linear space over K, W is a Banach space over K, J := (J, B, μ) be a finite Z-valued measure space, and w : J → Y be B-measurable. Assume that f is continuous. Define F : X → W by F(x) := ∫_J f(x, w(t)) dμ(t) ∈ W, ∀x ∈ X. Then, F is continuous, f(x, w(·)) : J → U_x ⊆ B(Z, W), where U_x is a separable Banach subspace of B(Z, W), and F(x) ∈ W_x := span({w̄ ∈ W | ∃z ∈ Z, ∃A ∈ U_x · w̄ = Az}).

Proof By Proposition 11.190, Y is separable. Since f is continuous, by Propositions 7.126 and 5.7, ∀x ∈ X, the function h(x) := f(x, ·) : Y → V_x ⊆ B(Z, W) satisfies that V_x is a (separable) compact subset of B(Z, W). By Proposition 11.38, f(x, w(·)) : J → V_x is B-measurable. By Bounded Convergence Theorem 11.77, f(x, w(·)) : J → V_x is absolutely integrable over J. By Proposition 7.35, U_x := span(V_x) is a separable Banach subspace of B(Z, W). By Proposition 11.130, F(x) ∈ W_x ⊆ W, ∀x ∈ X, and F is well-defined.

Fix any x_0 ∈ X. Fix any ε ∈ R_+. ∀y ∈ Y, by the continuity of f, ∃δ_y ∈ R_+ such that, ∀(x, ȳ) ∈ B_{X×Y}((x_0, y), δ_y) =: D̄_{x_0,y}, we have ‖f(x, ȳ) − f(x_0, y)‖_{B(Z,W)} < ε/2. Now, ∀ȳ ∈ B_Y(y, δ_y/2), ∀x ∈ B_X(x_0, δ_y/2), we have (x, ȳ), (x_0, ȳ) ∈ D̄_{x_0,y} and ‖f(x, ȳ) − f(x_0, ȳ)‖_{B(Z,W)} ≤ ‖f(x, ȳ) − f(x_0, y)‖_{B(Z,W)} + ‖f(x_0, y) − f(x_0, ȳ)‖_{B(Z,W)} < ε. Clearly, Y ⊆ ⋃_{y∈Y} B_Y(y, δ_y/2), which is an open covering. By the compactness of Y, ∃N ∈ Z_+ and ∃(y_i)_{i=1}^N ⊆ Y such that Y ⊆ ⋃_{i=1}^N B_Y(y_i, δ_{y_i}/2). Define δ_m := min{min_{i∈{1,...,N}} δ_{y_i}/2, 1} ∈ R_+. This δ_m has the following property. Fix any x ∈ B_X(x_0, δ_m). ∀y ∈ Y, ∃i_0 ∈ {1, . . . , N} such that y ∈ B_Y(y_{i_0}, δ_{y_{i_0}}/2). Then, (x, y), (x_0, y) ∈ D̄_{x_0,y_{i_0}} and ‖f(x, y) − f(x_0, y)‖_{B(Z,W)} < ε. This further implies that ‖F(x) − F(x_0)‖_W = ‖∫_J f(x, w(t)) dμ(t) − ∫_J f(x_0, w(t)) dμ(t)‖_W = ‖∫_J (f(x, w(t)) − f(x_0, w(t))) dμ(t)‖_W ≤ ∫_J ‖f(x, w(t)) − f(x_0, w(t))‖_{B(Z,W)} dP∘μ(t) ≤ ε P∘μ(J), where the first equality follows from the definition of F, the second equality follows from Proposition 11.130 and the fact that span(U_{x_0} ∪ U_x) ⊆ B(Z, W) is a separable Banach subspace by Proposition 7.35, the first inequality follows from Proposition 11.130, and the last inequality follows from Propositions 11.83 and 11.75. This proves that F is continuous at x_0. By the arbitrariness of x_0, F is continuous. This completes the proof of the theorem. □

The previous theorem is a true generalization of Theorem 31.6 of Bartle (1976).
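Theorem 12.111 can be illustrated numerically in the scalar case B(Z, W) = R with Lebesgue measure on J = [0, 1]; the concrete choice f(x, y) = sin(xy), w(t) = t, and the quadrature below are our own illustration, not part of the text:

```python
# Theorem 12.111 specialized to scalars: for continuous f and finite
# Lebesgue measure on J = [0, 1], F(x) = int_0^1 f(x, w(t)) dt is
# continuous in the parameter x.  Here f(x, y) = sin(x*y), w(t) = t,
# so F(x) = (1 - cos(x)) / x in closed form.
import math

def F(x, n=20_000):
    """Midpoint-rule approximation of int_0^1 sin(x*t) dt."""
    h = 1.0 / n
    return sum(math.sin(x * ((k + 0.5) * h)) * h for k in range(n))

closed = lambda x: (1.0 - math.cos(x)) / x
print(abs(F(1.3) - closed(1.3)) < 1e-6)
# continuity at x0 = 0.5: shrinking parameter steps shrink |F(x) - F(x0)|
gaps = [abs(F(0.5 + d) - F(0.5)) for d in (0.1, 0.01, 0.001)]
print(gaps[0] > gaps[1] > gaps[2])
```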


Theorem 12.112 Let f : D × Y → B(Z, W), where D ⊆ X, X is a normed linear space over K, Y is a compact metric space, Z is a normed linear space over K, W is a Banach space over K, x_0 ∈ D, and J := (J, B, μ) be a finite Z-valued measure space. Assume that the following conditions hold:

(i) span(A_D(x_0)) = X.
(ii) ∃δ_{x_0} ∈ R_+ such that the set (D ∩ B_X(x_0, δ_{x_0})) − x_0 is a conic segment.
(iii) ∂f/∂x : D × Y → B(X, B(Z, W)) exists, and ∂f/∂x is continuous at (x_0, y), ∀y ∈ Y.
(iv) w : J → Y is B-measurable and f is continuous.

Define F : D → W by F(x) := ∫_J f(x, w(t)) dμ(t) ∈ W, ∀x ∈ D. Then, F is Fréchet differentiable at x_0 and DF(x_0) = ∫_J (∂f/∂x (x_0, w(t)))^{T2,1} dμ(t) ∈ B(X, W).

Proof By Theorem 12.111, F(x) ∈ W and ∫_J (∂f/∂x (x_0, w(t)))^{T2,1} dμ(t) ∈ B(X, W), ∀x ∈ D, are well-defined. By (ii), ∃δ̄_{x_0} ∈ R_+ such that the set (D ∩ B_X(x_0, δ̄_{x_0})) − x_0 is a conic segment. Fix any ε ∈ R_+. By the continuity of ∂f/∂x, Proposition 12.110, and the compactness of {x_0} × Y, ∃δ ∈ r_{0,δ̄_{x_0}} ⊂ R_+, ∀x ∈ B_X(x_0, δ) ∩ D, ∀y ∈ Y, we have dist((x, y), {x_0} × Y) < δ and ‖∂f/∂x (x, y) − ∂f/∂x (x_0, y)‖ < ε/(1 + P∘μ(J)). Fix any x ∈ B_X(x_0, δ) ∩ D. ∀y ∈ Y, by Mean Value Theorem 9.23, ‖f(x, y) − f(x_0, y) − ∂f/∂x (x_0, y)(x − x_0)‖_{B(Z,W)} ≤ ‖∂f/∂x (x_0 + α_{x,y}(x − x_0), y) − ∂f/∂x (x_0, y)‖_{B(X,B(Z,W))} ‖x − x_0‖_X ≤ ε‖x − x_0‖_X/(1 + P∘μ(J)), where α_{x,y} ∈ (0, 1) ⊂ R and x_0 + α_{x,y}(x − x_0) ∈ D ∩ B_X(x_0, δ); the second inequality follows from the choice of δ. This further yields

$$\Big\| F(x) - F(x_0) - \Big( \int_J \Big(\frac{\partial f}{\partial x}(x_0, w(t))\Big)^{T2,1} d\mu(t) \Big)(x - x_0) \Big\|_W = \Big\| \int_J f(x, w(t))\,d\mu(t) - \int_J f(x_0, w(t))\,d\mu(t) - \int_J \frac{\partial f}{\partial x}(x_0, w(t))(x - x_0)\,d\mu(t) \Big\|_W$$
$$= \Big\| \int_J \Big( f(x, w(t)) - f(x_0, w(t)) - \frac{\partial f}{\partial x}(x_0, w(t))(x - x_0) \Big)\,d\mu(t) \Big\|_W \le \int_J \Big\| f(x, w(t)) - f(x_0, w(t)) - \frac{\partial f}{\partial x}(x_0, w(t))(x - x_0) \Big\|_{B(Z,W)} dP\circ\mu(t)$$
$$\le \int_J \frac{\varepsilon\,\|x - x_0\|_X}{1 + P\circ\mu(J)}\,dP\circ\mu(t) = \frac{\varepsilon\,P\circ\mu(J)}{1 + P\circ\mu(J)}\,\|x - x_0\|_X \le \varepsilon\,\|x - x_0\|_X$$

where the first equality follows from the definition of F and Proposition 11.130, the second equality and the first inequality follow from Proposition 11.130, the second inequality follows from Proposition 11.83, and the last equality follows from Proposition 11.75. This, coupled with (i), proves that F is Fréchet differentiable at x_0 and DF(x_0) = ∫_J (∂f/∂x (x_0, w(t)))^{T2,1} dμ(t). This completes the proof of the theorem. □
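In the scalar case, the conclusion of Theorem 12.112 reads DF(x_0) = ∫_J (∂f/∂x)(x_0, w(t)) dμ(t), which can be compared against a finite-difference quotient of F; the concrete data and quadrature below are our own illustration, not part of the text:

```python
# Theorem 12.112 in the scalar case: F'(x0) = int_J (df/dx)(x0, w(t)) dmu(t).
# With f(x, y) = exp(x * y), w(t) = t, and Lebesgue measure on [0, 1],
# F(x) = int_0^1 exp(x*t) dt and (df/dx)(x, w(t)) = t * exp(x*t).
import math

def integrate(func, n=20_000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return sum(func((k + 0.5) * h) * h for k in range(n))

x0 = 0.7
dF_formula = integrate(lambda t: t * math.exp(x0 * t))   # int of df/dx
F = lambda x: integrate(lambda t: math.exp(x * t))
eps = 1e-5                                               # finite-difference step
dF_numeric = (F(x0 + eps) - F(x0 - eps)) / (2 * eps)
print(abs(dF_formula - dF_numeric) < 1e-6)
```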


In the above theorem, the assumptions (i) and (ii) are conditions on the domain D, which are easily satisfied if x_0 ∈ D°. Theorem 12.112 is a true generalization of Theorem 31.7 of Bartle (1976).

Theorem 12.113 (Leibniz's Formula) Let X be a real normed linear space, a, b ∈ R with a < b, Z be a real Banach space, D ⊆ X, x_0 ∈ D, f : D × r_{a,b} → Z, α : D → r_{a,b}, and β : D → r_{a,b}. Assume that the following conditions hold:

(i) ∀x_1 ∈ D, we have span(A_D(x_1)) = X.
(ii) The domain D is locally convex, i.e., ∀x_1 ∈ D, ∃δ_{x_1} ∈ (0, ∞) ⊂ R such that D ∩ B_X(x_1, δ_{x_1}) is convex.
(iii) ∂f/∂x : D × r_{a,b} → B(X, Z), Dα(x_0) ∈ B(X, R), and Dβ(x_0) ∈ B(X, R) exist, and f and ∂f/∂x are continuous.

Define F : D → Z by F(x) := ∫_{α(x)}^{β(x)} f(x, t) dt ∈ Z, ∀x ∈ D. Then, F is Fréchet differentiable at x_0 and

$$DF(x_0) = f(x_0, \beta(x_0))D\beta(x_0) - f(x_0, \alpha(x_0))D\alpha(x_0) + \int_{\alpha(x_0)}^{\beta(x_0)} \frac{\partial f}{\partial x}(x_0, t)\,dt$$
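The formula can be verified numerically in the scalar case X = Z = R before turning to the proof; the integrand, limits, and quadrature below are our own illustration, not part of the text:

```python
# Numerical check of Leibniz's formula (Theorem 12.113) in the scalar case:
# F(x) = int_{alpha(x)}^{beta(x)} f(x, t) dt with f(x, t) = x * t^2,
# alpha(x) = x, beta(x) = x^2, evaluated at x0 = 1.5 where x < x^2.
import math

def integrate(func, a, b, n=20_000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) * h for k in range(n))

f = lambda x, t: x * t * t
alpha, beta = lambda x: x, lambda x: x * x
dalpha, dbeta = lambda x: 1.0, lambda x: 2.0 * x
df_dx = lambda x, t: t * t          # partial derivative of f in x

x0 = 1.5
leibniz = (f(x0, beta(x0)) * dbeta(x0) - f(x0, alpha(x0)) * dalpha(x0)
           + integrate(lambda t: df_dx(x0, t), alpha(x0), beta(x0)))
F = lambda x: integrate(lambda t: f(x, t), alpha(x), beta(x))
eps = 1e-5
numeric = (F(x0 + eps) - F(x0 - eps)) / (2 * eps)
print(abs(leibniz - numeric) < 1e-5)
```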

Proof Let H : D × r_{a,b} × r_{a,b} → Z be defined by H(x, u, v) := ∫_u^v f(x, t) dt, ∀(x, u, v) ∈ D × r_{a,b} × r_{a,b}. Then, F(x) = H(x, α(x), β(x)), ∀x ∈ D. Note that, by Theorem 12.112, ∂H/∂x (x, u, v) = ∫_u^v ∂f/∂x (x, t) dt; by Theorem 12.82 and Proposition 7.126, ∂H/∂u (x, u, v) = −f(x, u); and, by Theorem 12.82 and Proposition 7.126, ∂H/∂v (x, u, v) = f(x, v); ∀(x, u, v) ∈ D × r_{a,b} × r_{a,b}. By Proposition 9.24, we have, ∀(x, u, v) ∈ D × r_{a,b} × r_{a,b},

$$DH(x, u, v) = \begin{bmatrix} \frac{\partial H}{\partial x}(x,u,v) & \frac{\partial H}{\partial u}(x,u,v) & \frac{\partial H}{\partial v}(x,u,v) \end{bmatrix} = \begin{bmatrix} \int_u^v \frac{\partial f}{\partial x}(x,t)\,dt & -f(x,u) & f(x,v) \end{bmatrix}$$

Now, by Chain Rule (Theorem 9.18) and Proposition 9.19, we have

$$DF(x_0) = DH(x_0, \alpha(x_0), \beta(x_0)) \begin{bmatrix} \mathrm{id}_X \\ D\alpha(x_0) \\ D\beta(x_0) \end{bmatrix} = \int_{\alpha(x_0)}^{\beta(x_0)} \frac{\partial f}{\partial x}(x_0, t)\,dt - f(x_0, \alpha(x_0))D\alpha(x_0) + f(x_0, \beta(x_0))D\beta(x_0)$$

This completes the proof of the theorem. □

Theorem 12.114 Let Z be a normed linear space over K, W be a Banach space over K, J := (J, B, μ) be a σ-finite Z-valued measure space, X := (X, ρ) be a

12.9 Integral Depending on a Parameter

675

metric space, f : X × J → B(Z, W), and x_0 ∈ X. Assume that the following conditions hold:

(i) f(x, ·) : J → U_x ⊆ B(Z, W) is B-measurable and U_x is a separable Banach subspace of B(Z, W).
(ii) f(·, t) : X → B(Z, W) is continuous at x_0, ∀t ∈ J.
(iii) There exists a B-measurable function g : J → R_+, which is integrable on J with respect to P∘μ, such that, ∀x ∈ X, ‖f(x, t)‖_{B(Z,W)} ≤ g(t), ∀t ∈ J.

Define F : X → W by F(x) := ∫_J f(x, t) dμ(t) ∈ W, ∀x ∈ X. Then, F is continuous at x_0, and F(x) ∈ span({w ∈ W | ∃z ∈ Z, ∃A ∈ U_x · w = Az}) =: W_x.

Proof Fix any x ∈ X. By Lebesgue Dominated Convergence Theorem 11.131, F is well-defined. By Proposition 11.132, F(x) ∈ W_x ⊆ W, ∀x ∈ X. We will prove that lim_{x→x_0} F(x) = F(x_0) by Proposition 4.15. By the continuity of f, we have lim_{x→x_0} f(x, t) = f(x_0, t), ∀t ∈ J. Fix any sequence (x_i)_{i=1}^∞ ⊆ X with lim_{i∈N} x_i = x_0. Then, by Proposition 4.15, lim_{i∈N} f(x_i, t) = f(x_0, t), ∀t ∈ J. This implies that lim_{i∈N} F(x_i) = lim_{i∈N} ∫_J f(x_i, t) dμ(t) = ∫_J f(x_0, t) dμ(t) = F(x_0), where the second equality follows from Lebesgue Dominated Convergence Theorem 11.131. By Proposition 4.15, we have lim_{x→x_0} F(x) = F(x_0). This proves that F is continuous at x_0. This completes the proof of the theorem. □

Theorem 12.115 Let X := (X, ρ) be a metric space, Y be a separable metric space, Z be a normed linear space over K, W be a Banach space over K, J := (J, B, μ) be a σ-finite Z-valued measure space, f : X × Y → B(Z, W), and w : J → Y be B-measurable. Assume that the following conditions hold:

(i) f is continuous.
(ii) There exists a B-measurable function g : J → R_+, which is integrable on J with respect to P∘μ, such that, ∀x ∈ X, ‖f(x, w(t))‖_{B(Z,W)} ≤ g(t), ∀t ∈ J.

Define F : X → W by F(x) := ∫_J f(x, w(t)) dμ(t) ∈ W, ∀x ∈ X. Then, F is continuous, f(x, w(·)) : J → U_x ⊆ B(Z, W), where U_x is a separable Banach subspace of B(Z, W), and F(x) ∈ W_x := span({w ∈ W | ∃z ∈ Z, ∃A ∈ U_x · w = Az}).

Proof Since f is continuous, by Proposition 7.126, ∀x ∈ X, h(x) := f(x, ·) : Y → V_x ⊆ B(Z, W) satisfies that V_x is a separable subset of B(Z, W). By Proposition 11.38, f(x, w(·)) : J → V_x is B-measurable. By Lebesgue Dominated Convergence Theorem 11.131, F is well-defined. By Proposition 7.35, U_x := span(V_x) is a separable Banach subspace of B(Z, W). By Proposition 11.132, F(x) ∈ W_x ⊆ W, ∀x ∈ X. Fix any x_0 ∈ X. We will prove that lim_{x→x_0} F(x) = F(x_0) by Proposition 4.15. By the continuity of f, we have lim_{x→x_0} f(x, w(t)) = f(x_0, w(t)), ∀t ∈ J. Fix any sequence (x_i)_{i=1}^∞ ⊆ X with lim_{i∈N} x_i = x_0. Then, by Proposition 4.15, lim_{i∈N} f(x_i, w(t)) = f(x_0, w(t)), ∀t ∈ J. This implies that lim_{i∈N} F(x_i) =


lim_{i∈N} ∫_J f(x_i, w(t)) dμ(t) = ∫_J f(x_0, w(t)) dμ(t) = F(x_0), where the second equality follows from Lebesgue Dominated Convergence Theorem 11.131. By Proposition 4.15, we have lim_{x→x_0} F(x) = F(x_0). This proves that F is continuous at x_0. By the arbitrariness of x_0, F is continuous. This completes the proof of the theorem. □

Theorem 12.116 Let f : D × Y → B(Z, W), where D ⊆ X, X is a normed linear space over K, Y is a separable metric space, Z is a normed linear space over K, W is a Banach space over K, and J := (J, B, μ) be a σ-finite Z-valued measure space. Assume that the following conditions hold:

(i) ∀x_0 ∈ D, span(A_D(x_0)) = X.
(ii) ∀x_0 ∈ D, ∃δ_{x_0} ∈ R_+ such that the set (D ∩ B_X(x_0, δ_{x_0})) − x_0 is a conic segment.
(iii) w : J → Y is B-measurable.
(iv) f is continuous, and there exists a measurable function g : J → R_+, which is integrable over J with respect to P∘μ, such that, ∀x ∈ D, ‖f(x, w(t))‖_{B(Z,W)} ≤ g(t), ∀t ∈ J.
(v) ∂f/∂x exists and is continuous, and there exists a measurable function ḡ : J → R_+, which is integrable over J with respect to P∘μ, such that, ∀x ∈ D, ‖∂f/∂x (x, w(t))‖_{B(X,B(Z,W))} ≤ ḡ(t), ∀t ∈ J.

Define F : D → W by F(x) := ∫_J f(x, w(t)) dμ(t) ∈ W, ∀x ∈ D. Then, F is continuously Fréchet differentiable and DF(x) = ∫_J (∂f/∂x (x, w(t)))^{T2,1} dμ(t) ∈ B(X, W), ∀x ∈ D.

Proof By Theorem 12.115, F(x) ∈ W and ∫_J (∂f/∂x (x, w(t)))^{T2,1} dμ(t) ∈ B(X, W), ∀x ∈ D, are well-defined and are continuous. Fix any x_0 ∈ D. Denote ∫_J (∂f/∂x (x_0, w(t)))^{T2,1} dμ(t) =: L_0 ∈ B(X, W). By (ii), ∃δ̄_{x_0} ∈ R_+ such that the set (D ∩ B_X(x_0, δ̄_{x_0})) − x_0 =: D_0 is a conic segment. Define the function G : D_0 \ {ϑ_X} → R by G(h) := ‖F(x_0 + h) − F(x_0) − L_0(h)‖_W / ‖h‖_X ∈ R, ∀h ∈ D_0 \ {ϑ_X}. We will show that lim_{h→ϑ_X} G(h) = 0 by Proposition 4.16. Fix any sequence (h_i)_{i=1}^∞ ⊆ D_0 \ {ϑ_X} with lim_{i∈N} h_i = ϑ_X. Fix any i ∈ N. F(x_0 + h_i) − F(x_0) − L_0(h_i) ∈ W.
Then, we have

$$\begin{aligned}
G(h_i) &= \frac{\|F(x_0+h_i)-F(x_0)-L_0(h_i)\|_{W}}{\|h_i\|_{X}}\\
&= \frac{1}{\|h_i\|_{X}}\,\Big\| \int_J \Big(f(x_0+h_i,w(t))-f(x_0,w(t))-\frac{\partial f}{\partial x}(x_0,w(t))(h_i)\Big)\, d\mu(t)\Big\|_{W}\\
&= \frac{1}{\|h_i\|_{X}}\,\Big\| \int_J \int_0^1 \Big[\frac{\partial f}{\partial x}(x_0+s h_i,w(t))-\frac{\partial f}{\partial x}(x_0,w(t))\Big](h_i)\, ds\, d\mu(t)\Big\|_{W}\\
&\le \frac{1}{\|h_i\|_{X}} \int_J \Big\|\int_0^1 \Big[\frac{\partial f}{\partial x}(x_0+s h_i,w(t))-\frac{\partial f}{\partial x}(x_0,w(t))\Big](h_i)\, ds\Big\|_{B(Z,W)}\, dP\circ\mu(t)\\
&\le \frac{1}{\|h_i\|_{X}} \int_J \int_0^1 \Big\|\Big[\frac{\partial f}{\partial x}(x_0+s h_i,w(t))-\frac{\partial f}{\partial x}(x_0,w(t))\Big](h_i)\Big\|_{B(Z,W)}\, ds\, dP\circ\mu(t)\\
&\le \frac{1}{\|h_i\|_{X}} \int_J \int_0^1 \Big\|\frac{\partial f}{\partial x}(x_0+s h_i,w(t))-\frac{\partial f}{\partial x}(x_0,w(t))\Big\|_{B(X,B(Z,W))}\, \|h_i\|_{X}\, ds\, dP\circ\mu(t)\\
&= \int_J \int_0^1 \Big\|\frac{\partial f}{\partial x}(x_0+s h_i,w(t))-\frac{\partial f}{\partial x}(x_0,w(t))\Big\|_{B(X,B(Z,W))}\, ds\, dP\circ\mu(t)\\
&= \int_0^1 \int_J \Big\|\frac{\partial f}{\partial x}(x_0+s h_i,w(t))-\frac{\partial f}{\partial x}(x_0,w(t))\Big\|_{B(X,B(Z,W))}\, dP\circ\mu(t)\, ds
\end{aligned}$$

where the first equality follows from the definition of G, the second equality follows from the definitions of F and L0 and Proposition 11.132, the third equality follows from Theorem 12.83, the first inequality follows from Proposition 11.132, the second inequality follows from Proposition 11.92, the third inequality follows from Proposition 7.63, the fourth equality follows from simple algebra, and the last equality follows from Tonelli's Theorem 12.29 and Propositions 11.38 and 11.39. Now, by Lebesgue Dominated Convergence Theorem 11.91, we have lim_{i∈N} G(h_i) = 0. This further implies that lim_{h→ϑ_X} G(h) = 0 by Proposition 4.16. Hence, ∀ε ∈ R₊, ∃δ ∈ r_{0,δ̄_{x0}}, ∀x ∈ D ∩ B_X(x0, δ), we have 0 ≤ G(x − x0) < ε. This is equivalent to the inequality ‖F(x) − F(x0) − L0(x − x0)‖_W ≤ ε ‖x − x0‖_X. This shows that DF(x0) = L0. The result follows by the arbitrariness of x0 ∈ D. This completes the proof of the theorem. □

Proposition 12.117 Let I ⊆ R be an interval, Y and Z be normed linear spaces over K, D ⊆ Y, f : I → D ⊆ Y be of locally bounded variation, and g : D → Z be Lipschitz on D with Lipschitz constant L. Then, the following results hold:

(i) h := g ∘ f : I → Z is of locally bounded variation.
(ii) If f is of bounded variation, then h is of bounded variation.

Proof (i) By Definition 12.41, f is continuous on the right. By Propositions 4.15 and 12.37, h is continuous on the right. Fix any x1, x2 ∈ I with x1 ≤ x2. Then, we have

r_{x1,x2} ⊆ I since I is an interval. By Definition 12.41, T_f(r_{x1,x2}) < ∞. Then,

$$\begin{aligned}
T_h(r_{x_1,x_2}) &= \sup_{n\in\mathbb Z_+,\; x_1=\check x_0\le \check x_1\le\cdots\le \check x_n=x_2}\; \sum_{i=1}^n \big\|\Delta h(r_{\check x_{i-1},\check x_i})\big\|_Z\\
&= \sup_{n\in\mathbb Z_+,\; x_1=\check x_0\le\cdots\le \check x_n=x_2}\; \sum_{i=1}^n \big\|h(\check x_i)-h(\check x_{i-1})\big\|_Z\\
&= \sup\; \sum_{i=1}^n \big\|g(f(\check x_i))-g(f(\check x_{i-1}))\big\|_Z\\
&\le \sup\; \sum_{i=1}^n L\,\big\|f(\check x_i)-f(\check x_{i-1})\big\|_Y\\
&= L\, \sup\; \sum_{i=1}^n \big\|f(\check x_i)-f(\check x_{i-1})\big\|_Y\\
&= L\, T_f(r_{x_1,x_2}) < \infty
\end{aligned}$$

(each supremum being taken over all n ∈ Z₊ and all partitions x1 = x̌0 ≤ x̌1 ≤ ··· ≤ x̌n = x2)

where the first two equalities follow from Definition 12.41, the third equality follows from the definition of h, the first inequality follows from the Lipschitz continuity of g, the fourth equality follows from Proposition 3.81, and the last equality follows from Definition 12.41. Hence, h satisfies (i) and (ii) of Definition 12.41.

Fix any x1, x2 ∈ I with x1 < x2. Then, r_{x1,x2} ⊆ I. Define H : r_{x1,x2} → [0, ∞) ⊂ R by H(x) := T_h(r_{x1,x}), ∀x ∈ r_{x1,x2}. Fix any x ∈ r_{x1,x2}, and let δ ∈ [0, x2 − x] ⊆ [0, ∞) ⊂ R. Then, we have

$$|H(x+\delta)-H(x)| = \big|T_h(r_{x_1,x+\delta}) - T_h(r_{x_1,x})\big| = T_h(r_{x,x+\delta}) \le L\, T_f(r_{x,x+\delta})$$

where the first equality follows from the definition of H, the second equality follows from Proposition 12.47, and the inequality follows from the previous paragraph. By Definition 12.41, we have lim_{δ→0+} T_f(r_{x,x+δ}) = lim_{δ→0+} (T_f(r_{x1,x+δ}) − T_f(r_{x1,x})) = 0. Then, we have lim_{δ→0+} |H(x + δ) − H(x)| = 0. Hence, H is continuous on the right at x. By the arbitrariness of x, H is continuous on the right. Then, h is of locally bounded variation.

(ii) Clearly, T_h ≤ L T_f < ∞ if f is of bounded variation. Then, h is of bounded variation. This completes the proof of the proposition. □

Proposition 12.118 Let I ⊆ R be an interval, Y and Z be Banach spaces over K, f : I → Y and g : I → Z be of locally bounded variation, and h : I → Y × Z be

defined by h(x) = (f(x), g(x)) ∈ Y × Z, ∀x ∈ I. Then, the following statements hold:

(i) h is of locally bounded variation, and T_h(r_{x1,x2}) ≤ T_f(r_{x1,x2}) + T_g(r_{x1,x2}) < ∞, ∀x1, x2 ∈ I with x1 ≤ x2.
(ii) If, in addition, f and g are of bounded variation, then h is of bounded variation.

Proof (i) By Theorem 12.50, there exists a σ-finite Y-valued measure μ_f on the measurable space (P(I), BB(P(I))) such that f is a cumulative distribution function of μ_f, and there exists a σ-finite Z-valued measure μ_g on the measurable space (P(I), BB(P(I))) such that g is a cumulative distribution function of μ_g. By Proposition 11.147, there exists a σ-finite Y × Z-valued measure μ := (μ_f, μ_g) such that ∀E ∈ dom(μ_f) ∩ dom(μ_g), we have μ(E) = (μ_f(E), μ_g(E)) ∈ Y × Z and P∘μ ≤ P∘μ_f + P∘μ_g. Fix any x1, x2 ∈ I with x1 ≤ x2. Then, r_{x1,x2} ⊆ I, and T_f(r_{x1,x2}) < ∞ and T_g(r_{x1,x2}) < ∞ by Definition 12.41. By Theorem 12.50, we have P∘μ_f(r_{x1,x2}) = T_f(r_{x1,x2}) < ∞ and P∘μ_g(r_{x1,x2}) = T_g(r_{x1,x2}) < ∞. This implies that r_{x1,x2} ∈ dom(μ_f) ∩ dom(μ_g). Hence,

$$\mu(r_{x_1,x_2}) = (\mu_f(r_{x_1,x_2}),\, \mu_g(r_{x_1,x_2})) = (f(x_2)-f(x_1),\, g(x_2)-g(x_1)) = (f(x_2), g(x_2)) - (f(x_1), g(x_1))$$

and P∘μ(r_{x1,x2}) ≤ T_f(r_{x1,x2}) + T_g(r_{x1,x2}) < ∞. This leads to the conclusion that h is a cumulative distribution function of μ. If I = ∅, then the desired result holds trivially. On the other hand, if I ≠ ∅, then we let x0 ∈ I. This implies that h̄ := h − h(x0) : I → Y × Z is a cumulative distribution function for μ with origin at x0. By Proposition 12.52, h̄ is of locally bounded variation and T_h̄(r_{x1,x2}) = P∘μ(r_{x1,x2}), ∀x1, x2 ∈ I with x1 ≤ x2. Then, h is of locally bounded variation by Definition 12.41, since T_h̄(r_{x1,x2}) = T_h(r_{x1,x2}), ∀x1, x2 ∈ I with x1 ≤ x2. Furthermore, we have T_h(r_{x1,x2}) = T_h̄(r_{x1,x2}) = P∘μ(r_{x1,x2}) ≤ T_f(r_{x1,x2}) + T_g(r_{x1,x2}) < ∞, ∀x1, x2 ∈ I with x1 ≤ x2.

(ii) Since f and g are of bounded variation, we have T_f < ∞ and T_g < ∞. Then, by (i), T_h ≤ T_f + T_g < ∞. Hence, h is of bounded variation. This completes the proof of the proposition. □

Proposition 12.119 Let I := [a, b] ⊂ R be a compact interval with a ≤ b, X and Y be Banach spaces over K, p ∈ [1, ∞) ⊂ R, f : I → B(X, Y) be of bounded variation, and g : I → X be continuous. Then, the following results hold:

(i) h := P_p ∘ f : I → [0, ∞) ⊂ R is of bounded variation.
(ii) h is Riemann–Stieltjes integrable with respect to g on I, g is Riemann–Stieltjes integrable with respect to h on I, and the Riemann–Stieltjes integrals converge in gauge mode.
(iii) If, in addition, X = K and g is of bounded variation and monotonically nondecreasing, then the Riemann–Stieltjes integral ∫ₐᵇ h dg equals the Lebesgue–Stieltjes integral ∫ₐᵇ h dg ∈ R.
(iv) In particular, if g = id_I, then ∫ₐᵇ h dg = ∫ₐᵇ P_p ∘ f dt = ‖f‖^p_{L̄_p(r_{a,b},Y)} ∈ R.

Proof (i) Clearly, the norm function ‖·‖_{B(X,Y)} : B(X, Y) → [0, ∞) ⊂ R is Lipschitz on B(X, Y) with Lipschitz constant 1. By Proposition 12.117, the function h̄ := P ∘ f : I → [0, ∞) ⊂ R is of bounded variation. Note that, since f is of bounded variation, f is bounded. Then, ∃c ∈ [0, ∞) ⊂ R such that h̄ : I → [0, c] ⊂ R. The function l : r_{0,c} → r_{0,c^p} ⊂ R, defined by l(t) = t^p, ∀t ∈ r_{0,c}, is Lipschitz on r_{0,c} with Lipschitz constant p c^{p−1}. Then, by Proposition 12.117, h = l ∘ h̄ is of bounded variation.

(ii) By Integrability Theorem A.10, g is Riemann–Stieltjes integrable with respect to h on I, h is Riemann–Stieltjes integrable with respect to g on I, and these Riemann–Stieltjes integrals converge in the gauge mode.

(iii) Let X = K and g be of bounded variation and monotonically nondecreasing (hence real-valued). By Proposition 12.55 and Theorem 12.50, h is BB(R)-measurable. By Proposition 12.100, the Riemann–Stieltjes integral ∫ₐᵇ h dg equals the Lebesgue–Stieltjes integral ∫ₐᵇ h dg ∈ R.

(iv) Let g = id_I, which is continuous, of bounded variation, and monotonically nondecreasing. By (iii), we have ∫ₐᵇ h dg = ∫ₐᵇ P_p ∘ f dt = ‖f‖^p_{L̄_p(r_{a,b},Y)} ∈ R, where the last equality follows from Lemma 11.73. This completes the proof of the proposition. □
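The bound in Proposition 12.117, which the proof of Proposition 12.119 relies on, is easy to sanity-check numerically in the scalar case. The sketch below is an illustration only — the particular f, g, Lipschitz constant, and sampling grid are ad hoc choices, not from the text — approximating total variation by a sum over a fixed uniform partition and verifying T_{g∘f} ≤ L·T_f:

```python
import math

def total_variation(fn, a, b, n=2000):
    """Approximate T_fn(r_{a,b}) by summing |fn(x_i) - fn(x_{i-1})| over a uniform partition."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(fn(xs[i]) - fn(xs[i - 1])) for i in range(1, n + 1))

# Illustrative choices: f is of bounded variation on [0, 1]; g is Lipschitz with L = 2.
f = lambda x: math.sin(3 * x)
L = 2.0
g = lambda y: L * abs(y)          # |g(y1) - g(y2)| <= L |y1 - y2|
h = lambda x: g(f(x))             # h = g o f, as in Proposition 12.117

Tf = total_variation(f, 0.0, 1.0)
Th = total_variation(h, 0.0, 1.0)
assert Th <= L * Tf + 1e-9        # T_h(r_{0,1}) <= L * T_f(r_{0,1})
```

Since |g(f(x̌_i)) − g(f(x̌_{i−1}))| ≤ L |f(x̌_i) − f(x̌_{i−1})| holds partition by partition, the inequality survives every refinement, mirroring the supremum argument in the proof.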

Proposition 12.120 Let p ∈ [1, ∞) ⊂ R, X := (r_{a,b}, B, μ) be the finite metric measure subspace of R with a ≤ b, Y be a separable normed linear space over K, and f ∈ L̄_p(X, Y). Then, ∀ε ∈ (0, ∞) ⊂ R, there exists an absolutely continuous function g : X → Y such that g ∈ L̄_p(X, Y) and ‖g − f‖_{L̄_p(X,Y)} < ε.

Proof Fix any ε ∈ (0, ∞) ⊂ R. By Propositions 4.11 and 11.182, there exists a continuous function h ∈ L̄_p(X, Y) such that ‖h − f‖_{L̄_p(X,Y)} < ε/2. Now, by Bernsteĭn Approximation Theorem 11.203, there exists a Bernsteĭn function g : X → Y such that ‖g − h‖_{C(X,Y)} < ε/(2(1 + b − a)^{1/p}). By Definition 11.202, each summand in the function g is absolutely continuous. Note that the vector addition is Lipschitz on the domain Y × Y. Then, by Propositions 12.61 and 12.67, g is absolutely continuous. By Propositions 12.59 and 11.37, g is B-measurable. Then, g ∈ L̄_p(X, Y) since g is then bounded. Clearly,

$$\|g-f\|_{\bar L_p(\mathcal X,\mathrm Y)} \le \|g-h\|_{\bar L_p(\mathcal X,\mathrm Y)} + \|h-f\|_{\bar L_p(\mathcal X,\mathrm Y)} < \Big(\int_{\mathcal X} P_p\circ(g-h)\, d\mu\Big)^{1/p} + \frac{\varepsilon}{2} \le \Big(\int_{\mathcal X} \frac{\varepsilon^p}{2^p(1+b-a)}\, d\mu\Big)^{1/p} + \frac{\varepsilon}{2} < \varepsilon$$

where the first inequality follows from the triangle inequality, the second inequality follows from the choice of h, the third inequality follows from Proposition 11.83, and the last inequality follows from Proposition 11.75. This completes the proof of the proposition. □

Theorem 12.121 Let X_i := (X_i, B_i, μ_i) be a finite measure space, i = 1, 2, X := X_1 × X_2 =: (X_1 × X_2, B, μ) be the finite product measure space, Y be a separable Banach space, W ⊆ Y be a σ-compact conic segment, h : X_1 × X_2 → W be B-measurable, h_{x_2} : X_1 → W be absolutely integrable over X_1, ∀x_2 ∈ X_2, and g : X_2 → Y be defined by g(x_2) := ∫_{X_1} h_{x_2} dμ_1 ∈ Y, ∀x_2 ∈ X_2. Then, g is B_2-measurable.

Proof Fix any n ∈ N. Define h_n : X_1 × X_2 → W by

$$h_n(x_1,x_2) = \begin{cases} h(x_1,x_2), & \|h(x_1,x_2)\|_{\mathrm Y} \le n\\[4pt] \dfrac{n\, h(x_1,x_2)}{\|h(x_1,x_2)\|_{\mathrm Y}}, & \|h(x_1,x_2)\|_{\mathrm Y} > n \end{cases}\qquad \forall (x_1,x_2)\in X_1\times X_2$$

By Propositions 7.21 and 11.38, we have P ∘ h : X_1 × X_2 → R is B-measurable. By Propositions 11.39 and 11.38, we have h_n is B-measurable. Define g_n : X_2 → Y by g_n(x_2) := ∫_{X_1} h_{n,x_2} dμ_1 ∈ Y, ∀x_2 ∈ X_2, which is well-defined by Bounded Convergence Theorem 11.77. Again by Bounded Convergence Theorem 11.77, we have h_n is absolutely integrable over X. By Fubini's Theorem 12.31, we have ∫_{X_1×X_2} h_n dμ = ∫_{X_2} g_n dμ_2, and g_n is B_2-measurable. Fix any x_2 ∈ X_2. Note that lim_{n∈N} h_{n,x_2}(x_1) = h_{x_2}(x_1), ∀x_1 ∈ X_1, and ‖h_{n,x_2}(x_1)‖_Y ≤ ‖h_{x_2}(x_1)‖_Y, ∀x_1 ∈ X_1. By Lebesgue Dominated Convergence Theorem 11.91, g(x_2) = ∫_{X_1} h_{x_2} dμ_1 = lim_{n∈N} ∫_{X_1} h_{n,x_2} dμ_1 = lim_{n∈N} g_n(x_2). By Proposition 11.48, g is B_2-measurable. This completes the proof of the theorem. □

Theorem 12.122 (Taylor's Theorem) Let X be a normed linear space over K, Y be a Banach space over K, D ⊆ X, f : D → Y, x0, x1 ∈ D, and n ∈ Z₊. Let D̄ := I := [0, 1] ⊂ R, or D̄ := {a + i0 | a ∈ I} ⊂ C if K = C. Let φ : D̄ → D be given by φ(t) = t x1 + (1 − t) x0, ∀t ∈ D̄. Assume that dom(f^{(n+1)}) ⊇ φ(D̄) and f^{(n+1)} is continuous at x = φ(t), ∀t ∈ D̄. Let R_n ∈ Y be given by

$$R_n := f(x_1) - \Big( f(x_0) + \frac{1}{1!}\, f^{(1)}(x_0)(x_1-x_0) + \cdots + \frac{1}{n!}\, f^{(n)}(x_0)\underbrace{(x_1-x_0)\cdots(x_1-x_0)}_{n\text{-times}} \Big)$$

Then,

$$R_n = \int_0^1 \frac{s^n}{n!}\, f^{(n+1)}(\varphi(1-s))\underbrace{(x_1-x_0)\cdots(x_1-x_0)}_{(n+1)\text{-times}}\, ds$$

Proof Define F : I → Y by

$$F(t) = f(\varphi(1)) - \Big( f(\varphi(1-t)) + \frac{t}{1!}\, f^{(1)}(\varphi(1-t))(x_1-x_0) + \cdots + \frac{t^n}{n!}\, f^{(n)}(\varphi(1-t))\underbrace{(x_1-x_0)\cdots(x_1-x_0)}_{n\text{-times}} \Big) - R_n\, t^{n+1}, \qquad \forall t\in I$$

By Propositions 3.12, 3.32, 9.7, 7.23, and 7.65, F is continuous. Clearly, φ is differentiable. By Chain Rule and Propositions 9.10, 9.15–9.17 and 9.19, F is differentiable at t, ∀t ∈ D̄. By Propositions 3.12, 3.32, 9.7, 7.23, and 7.65, F^{(1)} is continuous. Clearly, F(0) = F(1) = ϑ_Y. By Theorem 12.83, we have F(1) − F(0) = ∫₀¹ DF(s) ds. Upon differentiating term by term, the derivatives of consecutive summands cancel in pairs (the term −(s^{k−1}/(k−1)!) f^{(k)}(φ(1−s))(x1−x0)···(x1−x0) produced by the k-th summand cancels the like term produced by the (k−1)-th), and we have

$$\vartheta_Y = \int_0^1 DF(s)\, ds = \int_0^1 \Big( \frac{s^n}{n!}\, f^{(n+1)}(\varphi(1-s))\underbrace{(x_1-x_0)\cdots(x_1-x_0)}_{(n+1)\text{-times}} - (n+1)\, R_n\, s^n \Big)\, ds = \int_0^1 \frac{s^n}{n!}\, f^{(n+1)}(\varphi(1-s))\underbrace{(x_1-x_0)\cdots(x_1-x_0)}_{(n+1)\text{-times}}\, ds - R_n$$

Hence,

$$R_n = \int_0^1 \frac{s^n}{n!}\, f^{(n+1)}(\varphi(1-s))\underbrace{(x_1-x_0)\cdots(x_1-x_0)}_{(n+1)\text{-times}}\, ds$$

This completes the proof of the theorem. □
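For a concrete check of the integral form of the remainder, take the scalar case X = Y = R with f = exp, so that f^(k) = exp for every k. This is a numerical sketch only — the point x0, x1, the order n, and the midpoint Riemann sum standing in for the integral are all illustrative choices, not part of the text:

```python
import math

# Verify R_n = ∫_0^1 (s^n / n!) f^{(n+1)}(φ(1-s)) (x1-x0)^{n+1} ds for f = exp,
# where φ(t) = t*x1 + (1-t)*x0, hence φ(1-s) = x1 - s*(x1 - x0).
x0, x1, n = 0.3, 1.1, 3
f = math.exp                      # f and all of its derivatives

# Direct remainder: f(x1) minus the degree-n Taylor polynomial centered at x0.
poly = sum(f(x0) * (x1 - x0) ** k / math.factorial(k) for k in range(n + 1))
Rn_direct = f(x1) - poly

# Integral form, via a midpoint Riemann sum over s in [0, 1].
steps = 20000
Rn_integral = 0.0
for i in range(steps):
    s = (i + 0.5) / steps
    phi = (1 - s) * x1 + s * x0   # φ(1 - s)
    Rn_integral += (s ** n / math.factorial(n)) * f(phi) * (x1 - x0) ** (n + 1) / steps

assert abs(Rn_direct - Rn_integral) < 1e-8
```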

12.10 Iterated Integrals

Proposition 12.123 Let X be a normed linear space and S1, S2 ⊆ X be compact. Then, S := {x ∈ X | x = αx1 + (1 − α)x2, x1 ∈ S1, x2 ∈ S2, α ∈ r_{0,1}} is compact.

Proof Define a mapping F : S1 × S2 × r_{0,1} → X by F(x1, x2, α) = αx1 + (1 − α)x2. By Tychonoff Theorem 5.47, S1 × S2 × r_{0,1} is compact. Clearly, F is continuous and F(S1 × S2 × r_{0,1}) = S. Then, S is compact by Proposition 5.7. □

Theorem 12.124 Let m ∈ N with m ≥ 2, R^m be equipped with the usual positive cone, a := (a1, ..., am), b := (b1, ..., bm) ∈ R^m with a ≤ b, Y be a Banach space, and f : r_{a,b} → Y. Assume that:

(i) f is bounded, i.e., ∃M ∈ R₊ such that ‖f(x)‖_Y ≤ M, ∀x ∈ r_{a,b}.
(ii) f(r_{a,b}) ⊆ H ⊆ Y, where H is a σ-compact conic segment, and f is BB(R^m)-measurable.

Then,

$$\int_{r_{a,b}} f(x_1,\ldots,x_m)\, d\mu_{Bm}(x_1,\ldots,x_m) = \int_{a_1}^{b_1}\cdots\int_{a_m}^{b_m} f(x_1,\ldots,x_m)\, d\mu_B(x_m)\cdots d\mu_B(x_1)$$

In particular, (ii) holds if the following conditions hold for f:

(a) r_{a,b} = ∪_{i=1}^∞ C_i, where the sets in the union are BB(R^m)-measurable.
(b) f|_{C_i} is uniformly continuous, ∀i ∈ N.
(c) C_i is compact, ∀i ∈ N.

Proof By (i) and (ii) and Bounded Convergence Theorem 11.77, f is absolutely integrable over r_{a,b}. By Fubini Theorem 12.31, we have

$$\int_{r_{a,b}} f(x_1,\ldots,x_m)\, d\mu_{Bm}(x_1,\ldots,x_m) = \int_{a_1}^{b_1} \int_{r_{\bar a,\bar b}} f(x_1,\ldots,x_m)\, d\mu_{Bm-1}(x_2,\ldots,x_m)\, d\mu_B(x_1)$$

where ā := (a2, ..., am) and b̄ := (b2, ..., bm). Fix any x1 ∈ r_{a1,b1}. The function f(x1, ·) : r_{ā,b̄} → H ⊆ Y is BB(R^{m−1})-measurable by Proposition 12.28, and, by (i) and Bounded Convergence Theorem 11.77, it is absolutely integrable over r_{ā,b̄}. It is clear that the range of the function is contained in H, which is a σ-compact conic segment. Then, we can apply Fubini Theorem 12.31 repeatedly to arrive at

$$\int_{r_{a,b}} f(x_1,\ldots,x_m)\, d\mu_{Bm}(x_1,\ldots,x_m) = \int_{a_1}^{b_1} \int_{r_{\bar a,\bar b}} f(x_1,\ldots,x_m)\, d\mu_{Bm-1}(x_2,\ldots,x_m)\, d\mu_B(x_1) = \int_{a_1}^{b_1}\cdots\int_{a_m}^{b_m} f(x_1,\ldots,x_m)\, d\mu_B(x_m)\cdots d\mu_B(x_1)$$

Let (a)–(c) hold. Note that r_{a,b} ⊆ R^m is compact. By (a) and (b) and Proposition 11.37, f|_{C_i} is BB(R^m)-measurable, ∀i ∈ N. By Proposition 11.41, f is BB(R^m)-measurable. Fix any i ∈ N. By (b) and Proposition 4.46, there exists a unique continuous f_i : C̄_i → Y such that f_i|_{C_i} = f|_{C_i}, and f_i is uniformly continuous. Let K_i := f_i(C̄_i) ⊆ Y. K_i is compact by Proposition 5.7. Then, f(r_{a,b}) = ∪_{i=1}^∞ f(C_i) ⊆ ∪_{i=1}^∞ K_i. Let K_0 := {ϑ_Y}, which is clearly compact. By Proposition 12.123, K̄_i := {y ∈ Y | y = αy1 + (1 − α)y2, y1 ∈ K_i, y2 ∈ K_0, α ∈ r_{0,1}} is compact. It is straightforward to show that K̄_i is a conic segment. This implies that

f(r_{a,b}) ⊆ ∪_{i=1}^∞ K_i ⊆ ∪_{i=1}^∞ K̄_i =: H, where the right-hand side is a σ-compact conic segment. Then, (ii) holds. This completes the proof of the theorem. □

Theorem 12.125 Let R² be equipped with the usual positive cone, a := (a1, a2), b := (b1, b2) ∈ R² with a ≤ b, X be a separable Banach space over K, Y and Z be Banach spaces over K, x : r_{a2,b2} → X be bounded and BB(R)-measurable, A : r_{a,b} → B(X, Y) be continuous, and C : r_{a1,b1} → W ⊆ B(Y, Z) be bounded and BB(R)-measurable. Assume that W is a separable subspace of B(Y, Z). Then,

$$\int_{r_{a,b}} C(\tau)A(\tau,s)x(s)\, d\mu_{B2}(\tau,s) = \int_{a_2}^{b_2}\!\int_{a_1}^{b_1} C(\tau)A(\tau,s)x(s)\, d\tau\, ds = \int_{a_1}^{b_1}\!\int_{a_2}^{b_2} C(\tau)A(\tau,s)x(s)\, ds\, d\tau \in \mathrm Z \qquad (12.10)$$

Proof Clearly, r_{a,b} is compact. Since A : r_{a,b} → B(X, Y) is continuous, A is bounded by Propositions 5.7 and 5.38. A is BB(R²)-measurable by Proposition 11.37. By the boundedness of C, A, and x, and Proposition 7.64, CAx is bounded. By the measurability of C, A, and x, and Propositions 11.38 and 11.39, we have CAx is BB(R²)-measurable. By Proposition 7.126, the function (τ, s) ↦ C(τ)A(τ, s)x(s) has its range contained in a separable Banach subspace of Z. By Bounded Convergence Theorem 11.77, CAx is absolutely integrable over r_{a,b}, and therefore the left-hand side of (12.10) is well-defined and is an element of Z.

Fix any τ ∈ r_{a1,b1}; it is easy to see that C(τ)A(τ, ·)x(·) : r_{a2,b2} → Z is bounded and BB(R)-measurable. Then, this function is absolutely integrable over r_{a2,b2}. Note that

$$\int_{a_2}^{b_2} C(\tau)A(\tau,s)x(s)\, ds = C(\tau)\int_{a_2}^{b_2} A(\tau,s)x(s)\, ds = C(\tau)\int_{a_2}^{b_2} A(\tau,s)\, d\mu(s)$$

where the first equality follows from Proposition 11.92 and the second equality follows from the definition μ(E) = ∫_E x(s) ds, ∀E ∈ BB(R) with E ⊆ r_{a2,b2}, which, by Proposition 11.116, is a finite X-valued measure, and Proposition 11.168. By Theorem 12.111, the mapping τ ↦ ∫_{a2}^{b2} A(τ, s) dμ(s) ∈ Y is continuous. Then, it is BB(R)-measurable by Proposition 11.37, and its range is contained in a compact, and therefore separable, subset of Y. By Propositions 11.38 and 11.39, we have C(·) ∫_{a2}^{b2} A(·, s) dμ(s) : r_{a1,b1} → Z is BB(R)-measurable. By (iii) of Fubini Theorem 12.30, we have

$$\int_{r_{a,b}} C(\tau)A(\tau,s)x(s)\, d\mu_{B2}(\tau,s) = \int_{a_1}^{b_1}\!\int_{a_2}^{b_2} C(\tau)A(\tau,s)x(s)\, ds\, d\tau \in \mathrm Z$$

Fix any s ∈ r_{a2,b2}; it is easy to see that C(·)A(·, s)x(s) : r_{a1,b1} → Z is bounded and BB(R)-measurable. Then, this function is absolutely integrable over r_{a1,b1}. Note that

$$\int_{a_1}^{b_1} C(\tau)A(\tau,s)x(s)\, d\tau = \Big(\int_{a_1}^{b_1} C(\tau)A(\tau,s)\, d\tau\Big)\, x(s) = \Big(\int_{a_1}^{b_1} d\bar\mu(\tau)\, A(\tau,s)\Big)\, x(s)$$

where the first equality follows from Proposition 11.92 and the second equality follows from the definition μ̄(E) = ∫_E C(τ) dτ, ∀E ∈ BB(R) with E ⊆ r_{a1,b1}, which, by Proposition 11.116, is a finite B(Y, Z)-valued measure, and Proposition 11.168. By Theorem 12.111, the mapping s ↦ ∫_{a1}^{b1} dμ̄(τ) A(τ, s) ∈ B(X, Z) is continuous. Then, it is BB(R)-measurable by Proposition 11.37, and its range is contained in a compact, and therefore separable, subset of B(X, Z). By Propositions 11.38 and 11.39, we have (∫_{a1}^{b1} dμ̄(τ) A(τ, ·)) x(·) : r_{a2,b2} → Z is BB(R)-measurable. By (iv) of Fubini Theorem 12.30, we have

$$\int_{r_{a,b}} C(\tau)A(\tau,s)x(s)\, d\mu_{B2}(\tau,s) = \int_{a_2}^{b_2}\!\int_{a_1}^{b_1} C(\tau)A(\tau,s)x(s)\, d\tau\, ds \in \mathrm Z$$

This completes the proof of the theorem. □
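In finite dimensions the interchange in (12.10) can be illustrated with Riemann sums. The sketch below uses scalar-valued stand-ins for C, A, and x (every function and limit here is an illustrative choice, not from the text), so C(τ)A(τ, s)x(s) is an ordinary product and the two iterated midpoint sums must agree:

```python
import math

a1, b1 = 0.0, 1.0   # range of tau
a2, b2 = 0.0, 2.0   # range of s
C = lambda t: math.cos(t)
A = lambda t, s: math.exp(-t * s)
x = lambda s: 1.0 + s * s

def iterate_ts(n=200):
    # ∫_{a2}^{b2} ∫_{a1}^{b1} C(τ) A(τ, s) x(s) dτ ds  (midpoint rule)
    dt, ds = (b1 - a1) / n, (b2 - a2) / n
    total = 0.0
    for j in range(n):
        s = a2 + (j + 0.5) * ds
        for i in range(n):
            t = a1 + (i + 0.5) * dt
            total += C(t) * A(t, s) * x(s) * dt * ds
    return total

def iterate_st(n=200):
    # ∫_{a1}^{b1} ∫_{a2}^{b2} C(τ) A(τ, s) x(s) ds dτ  (opposite order)
    dt, ds = (b1 - a1) / n, (b2 - a2) / n
    total = 0.0
    for i in range(n):
        t = a1 + (i + 0.5) * dt
        for j in range(n):
            s = a2 + (j + 0.5) * ds
            total += C(t) * A(t, s) * x(s) * ds * dt
    return total

assert abs(iterate_ts() - iterate_st()) < 1e-9
```

The discrete sums commute exactly; the theorem's content is that the agreement survives the limit, which is where the measurability, boundedness, and separability hypotheses enter.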

The preceding theorem can be generalized to the m-dimensional case.

Theorem 12.126 Let m ∈ N with m ≥ 2, R^m be equipped with the usual positive cone, a := (a1, ..., am), b := (b1, ..., bm) ∈ R^m with a ≤ b, X_i be a separable Banach space over K, i = 1, ..., m, Y be a Banach space over K, x_i : r_{ai,bi} → X_i be bounded and BB(R)-measurable, i = 1, ..., m, and A : r_{a,b} → B(X_m, B(X_{m−1}, ···, B(X_1, Y)···)) be continuous. Then,

$$\int_{r_{a,b}} A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m) = \int_{a_1}^{b_1}\cdots\int_{a_m}^{b_m} A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, ds_m\cdots ds_1 \in \mathrm Y \qquad (12.11)$$

Furthermore, the order of the iterated integration on the right-hand side of the equality can be arbitrary.

Proof Clearly, r_{a,b} is compact. By the continuity of A and Propositions 5.7 and 5.38, A(r_{a,b}) =: H ⊆ B(X_m, B(X_{m−1}, ···, B(X_1, Y)···)) is a compact set and A is bounded. By Proposition 11.37, A is BB(R^m)-measurable. By the boundedness of x_i, i = 1, ..., m, and A, and Proposition 7.64, we have f is bounded, where f : r_{a,b} → Y is defined by f(s1, ..., sm) = A(s1, ..., sm)(x_m(s_m))···(x_1(s_1)), ∀(s1, ..., sm) ∈ r_{a,b}. By Proposition 11.190, H is separable. By the measurability of A and x_i, i = 1, ..., m, and Propositions 11.38 and 11.39, we have f is BB(R^m)-measurable. By Proposition 7.126, f(r_{a,b}) is contained in a separable Banach subspace of Y. By Bounded Convergence Theorem 11.77, f is absolutely integrable over r_{a,b}, and therefore the left-hand side of (12.11) is well-defined and is an element of Y. We will prove the theorem using mathematical induction on m ∈ N:

1◦ m = 1. This is trivial.
2◦ Assume that the result holds for m = k − 1 ∈ N.

3◦ Consider the case m = k ∈ {2, 3, ...} ⊂ N. Define π_m : R^m → R^{m−1} by π_m(s1, ..., sm) = (s1, ..., s_{m−1}) ∈ R^{m−1}, ∀(s1, ..., sm) ∈ R^m. Fix any (s1, ..., s_{m−1}) ∈ π_m(r_{a,b}). It is easy to see that ∫_{am}^{bm} A(s1, ..., sm)(x_m(s_m)) ds_m is well-defined, the integrand being absolutely integrable and integrable. Define f1 : π_m(r_{a,b}) → B(X_{m−1}, B(X_{m−2}, ···, B(X_1, Y)···)) by f1(s1, ..., s_{m−1}) := ∫_{am}^{bm} A(s1, ..., sm)(x_m(s_m)) ds_m, ∀(s1, ..., s_{m−1}) ∈ π_m(r_{a,b}). Then, f1 is continuous by Theorem 12.111. By the inductive assumption, ∫_{π_m(r_{a,b})} f1(s1, ..., s_{m−1})(x_{m−1}(s_{m−1}))···(x_2(s_2))(x_1(s_1)) dμ_{Bm−1}(s1, ..., s_{m−1}) is well-defined, the integrand being absolutely integrable and integrable. Then,

$$\begin{aligned}
&\int_{r_{a,b}} A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m)\\
&\quad= \int_{\pi_m(r_{a,b})} f_1(s_1,\ldots,s_{m-1})(x_{m-1}(s_{m-1}))\cdots(x_2(s_2))(x_1(s_1))\, d\mu_{Bm-1}(s_1,\ldots,s_{m-1})\\
&\quad= \int_{a_1}^{b_1}\cdots\int_{a_{m-1}}^{b_{m-1}} f_1(s_1,\ldots,s_{m-1})(x_{m-1}(s_{m-1}))\cdots(x_2(s_2))(x_1(s_1))\, ds_{m-1}\cdots ds_1\\
&\quad= \int_{a_1}^{b_1}\cdots\int_{a_{m-1}}^{b_{m-1}} \Big(\int_{a_m}^{b_m} A(s_1,\ldots,s_m)(x_m(s_m))\, ds_m\Big)(x_{m-1}(s_{m-1}))\cdots(x_1(s_1))\, ds_{m-1}\cdots ds_1\\
&\quad= \int_{a_1}^{b_1}\cdots\int_{a_m}^{b_m} A(s_1,\ldots,s_m)(x_m(s_m))(x_{m-1}(s_{m-1}))\cdots(x_1(s_1))\, ds_m\, ds_{m-1}\cdots ds_1 \in \mathrm Y
\end{aligned}$$

where the first equality follows from (iii) of Fubini Theorem 12.30, the second equality follows from the inductive assumption, the third equality follows from the definition of f1, and the last equality follows from Proposition 11.92. This completes the induction process.

Clearly, the order of the iterated integration can be arbitrary by the symmetry in the result. This completes the proof of the theorem. □

We will now state and prove the following Iterated Integral Theorem.

Theorem 12.127 (Iterated Integral Theorem) Let m ∈ N with m ≥ 2, R^m be equipped with the usual positive cone, a := (a1, ..., am), b := (b1, ..., bm) ∈ R^m with a ≤ b, X_i be a separable Banach space over K, i = 1, ..., m, Y be a Banach space over K, x_i : r_{ai,bi} → X_i be bounded and BB(R)-measurable, i = 1, ..., m, Ω ⊆ r_{a,b} be a region, and A : Ω → B(X_m, B(X_{m−1}, ···, B(X_1, Y)···)) be bounded and continuous. Let A̅ : r_{a,b} → B(X_m, B(X_{m−1}, ···, B(X_1, Y)···)) be

such that A̅|_{P(Ω)} = A and A̅(s) = ϑ_{B(X_m,B(X_{m−1},···,B(X_1,Y)···))}, ∀s ∈ r_{a,b} \ P(Ω). Then,

$$\int_{r_{a,b}} \bar A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m) = \int_{a_1}^{b_1}\cdots\int_{a_m}^{b_m} \bar A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, ds_m\cdots ds_1 \in \mathrm Y \qquad (12.12)$$

Furthermore, the order of the iterated integration on the right-hand side of the equality can be arbitrary.

Proof By Definition 12.38, there exist N ⊆ N and (a^j)_{j∈N}, (b^j)_{j∈N} ⊆ Ω with a^j ≤ b^j, ∀j ∈ N, such that Ω = ∪_{j∈N} r_{a^j,b^j}, (r_{a^j,b^j})_{j∈N} is pairwise disjoint, P(Ω) = ∪_{j∈N} r_{a^j,b^j}, and ∀ā, b̄ ∈ Ω with ā ≤ b̄ and r_{ā,b̄} ⊆ Ω, we have r_{ā,b̄} ⊆ P(Ω). Note that Ω ⊆ r_{a,b} implies, by Proposition 12.39, that P(Ω) ⊆ P(r_{a,b}) = r_{a,b}. Fix any j ∈ N. Define A̅_j : r_{a,b} → B(X_m, B(X_{m−1}, ···, B(X_1, Y)···)) and A_j : r_{a,b} → B(X_m, B(X_{m−1}, ···, B(X_1, Y)···)) such that A̅_j|_{r_{a^j,b^j}} = A|_{r_{a^j,b^j}} and A_j|_{r_{a^j,b^j}} = A|_{r_{a^j,b^j}}, A̅_j(s) = ϑ_{B(X_m,B(X_{m−1},···,B(X_1,Y)···))}, ∀s ∈ r_{a,b} \ r_{a^j,b^j}, and A_j(s) = ϑ_{B(X_m,B(X_{m−1},···,B(X_1,Y)···))}, ∀s ∈ r_{a,b} \ r_{a^j,b^j}. Then,

$$\begin{aligned}
&\int_{r_{a,b}} A_j(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m)\\
&\quad= \int_{r_{a^j,b^j}} A_j(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m)\\
&\quad= \int_{r_{a^j,b^j}} A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m)\\
&\quad= \int_{a_1^j}^{b_1^j}\cdots\int_{a_m^j}^{b_m^j} A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, ds_m\cdots ds_1\\
&\quad= \int_{a_1}^{b_1}\cdots\int_{a_m}^{b_m} A_j(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, ds_m\cdots ds_1
\end{aligned}$$

where the first equality follows from the definition of A_j and Proposition 11.92, the second equality follows from the definition of A_j, the third equality follows from Theorem 12.126, with a^j =: (a_1^j, ..., a_m^j) and b^j =: (b_1^j, ..., b_m^j), and the fact that A̅_j|_{r_{a^j,b^j}} is continuous, and the fourth equality follows from the definitions of A_j and Definition 12.71. This implies that

$$\begin{aligned}
&\int_{r_{a,b}} \bar A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m)\\
&\quad= \int_{P(\Omega)} \bar A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m)\\
&\quad= \int_{P(\Omega)} A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m)\\
&\quad= \sum_{j\in N} \int_{r_{a^j,b^j}} A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m)\\
&\quad= \sum_{j\in N} \int_{r_{a,b}} A_j(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, d\mu_{Bm}(s_1,\ldots,s_m)\\
&\quad= \sum_{j\in N} \int_{a_1}^{b_1}\cdots\int_{a_m}^{b_m} A_j(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, ds_m\cdots ds_1\\
&\quad= \int_{a_1}^{b_1}\cdots\int_{a_m}^{b_m} \bar A(s_1,\ldots,s_m)(x_m(s_m))\cdots(x_1(s_1))\, ds_m\cdots ds_1
\end{aligned}$$

where the first equality follows from P(Ω) ⊆ r_{a,b} and the definition of A̅, the second equality follows from the definition of A̅, the third equality follows from Proposition 11.92, the fourth and fifth equalities follow from the preceding discussion, and the last equality follows from the repeated application of Proposition 11.92 and the Bounded Convergence Theorem 11.77 and the fact that Σ_{j∈N} A_j(s1, ..., sm) = A̅(s1, ..., sm), ∀(s1, ..., sm) ∈ r_{a,b}.

Clearly, the order of the iterated integration can be arbitrary by the symmetry in the result. This completes the proof of the theorem. □
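The zero-extension device of Theorem 12.127 is how one computes integrals over a non-rectangular region in practice. A numerical sketch (the region Ω, the integrand, and the grid below are illustrative choices, not from the text): extend the integrand by zero outside the triangle Ω = {(s1, s2) | 0 ≤ s2 ≤ s1 ≤ 1} and iterate over the enclosing square [0, 1]², then compare with the exact value.

```python
# Integrate f(s1, s2) = s1 * s2 over the triangle {0 <= s2 <= s1 <= 1}
# by zero-extending it to the square [0, 1]^2 (the role of A-bar in Theorem 12.127)
# and computing the iterated integral with a midpoint rule.
def f_bar(s1, s2):
    return s1 * s2 if s2 <= s1 else 0.0   # zero extension outside the region

def iterated(n=500):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s1 = (i + 0.5) * h
        row = 0.0
        for j in range(n):
            s2 = (j + 0.5) * h
            row += f_bar(s1, s2) * h      # inner integral over s2
        total += row * h                  # outer integral over s1
    return total

exact = 1.0 / 8.0   # ∫_0^1 s1 (∫_0^{s1} s2 ds2) ds1 = ∫_0^1 s1^3 / 2 ds1 = 1/8
assert abs(iterated() - exact) < 1e-3
```

The discontinuity of the zero extension along the boundary of Ω is a null set, which is why the extension does not change the value of the integral.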

12.11 Manifold

12.11.1 Basic Notion

Definition 12.128 Let X be a nonempty Hausdorff topological space and Y be a Banach space over K. X is said to be a manifold of variant Y if ∀x ∈ X, there exist an open set U ∈ O_X with x ∈ U and an open set V ⊆ Y such that U and V are homeomorphic. For each such open set U and homeomorphism φ : U → V ⊆ Y, the pair (U, φ) is said to be a chart. Let (U1, φ1) and (U2, φ2) be two charts with U3 := U1 ∩ U2 ≠ ∅. Then, U3 ∈ O_X, and φ1(U3) ⊆ Y and φ2(U3) ⊆ Y are open subsets of Y. Then, (φ2 ∘ (φ1)inv)|_{φ1(U3)} : φ1(U3) → φ2(U3) is a homeomorphism.

Let A := ((U_α, φ_α))_{α∈A} be a collection of charts on the manifold X. A is said to be an atlas if ∪_{α∈A} U_α = X. %

Definition 12.129 Let Y be a Banach space over K, X be a manifold of variant Y, and A := ((U_α, φ_α))_{α∈A} be an atlas. A is said to be a C∞ atlas if, for any two charts (U1, φ1) and (U2, φ2) in A with U3 := U1 ∩ U2 ≠ ∅, the mapping (φ2 ∘ (φ1)inv)|_{φ1(U3)} : φ1(U3) → φ2(U3) is a C∞ diffeomorphism, i.e., the bijective mapping together with its inverse mapping is C∞. Any two charts in A satisfying the above assumption are said to be C∞ compatible. The C∞ atlas A is said to be complete if it is not a proper subset of another C∞ atlas. The manifold X of variant Y together with a complete C∞ atlas A_c is said to be a smooth manifold. %

To specify a C∞ manifold, one must specify the topological space X, the Banach space Y, and a C∞ atlas.

Definition 12.130 Let Y be a Banach space over K, X be a manifold of variant Y, and A := ((U_α, φ_α))_{α∈A} be an atlas. A is said to be an analytic atlas if, for any two charts (U1, φ1) and (U2, φ2) in A with U3 := U1 ∩ U2 ≠ ∅, the mapping (φ2 ∘ (φ1)inv)|_{φ1(U3)} : φ1(U3) → φ2(U3) is an analytic diffeomorphism, i.e., the bijective mapping together with its inverse mapping is analytic on its domain of definition. Any two charts in A satisfying the above assumption are said to be analytically compatible. The analytic atlas A is said to be complete if it is not a proper subset of another analytic atlas. The manifold X of variant Y together with a complete analytic atlas A_c is said to be an analytic manifold. %

To specify an analytic manifold, one must specify the topological space X, the Banach space Y, and an analytic atlas.

Example 12.131 Let Y be a Banach space over K and X ⊆ Y be an open subset with the subset topology. Clearly, X is a manifold of variant Y. It is also a smooth manifold with the C∞ atlas that contains the singleton atlas {(X, id_X)}.
It is also an analytic manifold with the analytic atlas that contains the singleton atlas {(X, id_X)}. %

Example 12.132 Let X and Y be Banach spaces over K, U ⊆ X × Y be an open set, and F : U → X be a C∞ (analytic) function with ∂F/∂x : U → B(X, X) being such that (∂F/∂x)(x, y) is bijective, ∀(x, y) ∈ M, where the set M := {(x, y) ∈ U | F(x, y) = ϑ_X}. Then, the set M together with the subset topology of X × Y is a smooth (analytic) manifold of variant Y.

Since (∂F/∂x)(x, y) is bijective, it is invertible and the inverse ((∂F/∂x)(x, y))^{−1} ∈ B(X, X), by Open Mapping Theorem 7.103. The rest of the proof of this fact follows immediately from the Implicit Function Theorems 9.59 and 9.86. ∀(x0, y0) ∈ M, there exists an open set U1 × V1 ⊆ X × Y with (x0, y0) ∈ U1 × V1 such that ∀y ∈ V1, ∃! x(y) ∈ U1 that satisfies F(x(y), y) = ϑ_X. Thus, M ∩ (U1 × V1) is an open subset of M, and the homeomorphism is φ(x, y) = y, which maps it onto the open set V1 ⊆ Y. The inverse map is φinv(y) = (x(y), y). By the Implicit Function Theorems, x(y) is C∞ (analytic). Then, this is one of the charts: (M ∩ (U1 × V1), φ).

Collecting all of these charts for M forms a C∞ (analytic) atlas. Therefore, M is a smooth (analytic) manifold of variant Y. %

Example 12.133 Let X be a smooth (analytic) manifold of variant Y, and E ⊆ X be an open subset. Then, E with the subset topology is a smooth (analytic) manifold of variant Y. The proof of this fact is as follows. For any chart (U, φ) in the complete C∞ (analytic) atlas of X, we take Ǔ := U ∩ E and φ̌ := φ|_Ǔ. Then, take (Ǔ, φ̌) as one of the charts for the manifold E, if Ǔ ≠ ∅. This defines a C∞ (analytic) atlas. Hence, the result follows. %

Example 12.134 Let Y_i be a Banach space over K, i = 1, 2, and X_i be a smooth (analytic) manifold of variant Y_i, i = 1, 2. Then, X1 × X2 is a smooth (analytic) manifold of variant Y1 × Y2. The proof of this fact is as follows. For any chart (U, φ) of X1 and any chart (V, ψ) of X2, (U × V, block diagonal (φ, ψ)) is a chart for X1 × X2. Hence, the result follows. %

Definition 12.135 Let Y and Z be Banach spaces over K, M and N be two smooth (analytic) manifolds of variants Y and Z, respectively, and F : M → N. F is said to be C∞ or smooth (analytic) if ∀p ∈ M, there exist a chart (U, φ) of M (among its complete atlas) with p ∈ U and a chart (V, ψ) of N (among its complete atlas) with F(p) ∈ V such that the expression of F in the local coordinates is C∞ (analytic), where the expression of F in local coordinates is the mapping F̂ := (ψ ∘ F ∘ φinv)|_{φ(Finv(V)∩U)}. %

Clearly, the fact that a mapping F is smooth (analytic) is a property that is independent of the choice of the coordinate charts.

Definition 12.136 Let Y be a Banach space over K, M and N be two smooth (analytic) manifolds of variant Y, and F : M → N. F is said to be a C∞ (analytic) diffeomorphism if F is bijective and both F and Finv are C∞ (analytic) mappings. Two smooth (analytic) manifolds M and N of the same variant are said to be C∞ (analytically) diffeomorphic if there exists a C∞ (analytic) diffeomorphism F : M → N. %

Theorem 12.137 Let Y be a Banach space over K, M and N be smooth (analytic) manifolds of variant Y, and F : M → N be a smooth (analytic) mapping. Then, F is a smooth (analytic) diffeomorphism if, and only if, F is bijective and DF̂(φ(p)) ∈ B(Y, Y) is bijective, ∀p ∈ M, where F̂ is the expression of F in local coordinates and (U, φ) is a chart in the complete C∞ (analytic) atlas of M with p ∈ U.

Proof "Sufficiency" Let A_M and A_N be the complete C∞ (analytic) atlases for M and N, respectively. Fix any p ∈ M. Then, there exists a chart (U, φ) ∈ A_M with p ∈ U and another chart (V, ψ) ∈ A_N with F(p) ∈ V. The expression of F in local coordinates is F̂ := (ψ ∘ F ∘ φinv)|_{φ(Finv(V)∩U)}. By the assumption, DF̂(φ(p)) ∈ B(Y, Y) is bijective. By Open Mapping Theorem 7.103, DF̂(φ(p)) is invertible and (DF̂(φ(p)))^{−1} ∈ B(Y, Y). By Inverse Function Theorems 9.57

12.11 Manifold


and 9.85, there exists an open subset Ûp ⊆ φ(Finv(V) ∩ U) ⊆ Y with φ(p) ∈ Ûp such that F̂|Ûp is a C∞ (analytic) diffeomorphism between Ûp and F̂(Ûp). Then, Up := φinv(Ûp) is an open subset of M and Vp := ψinv(F̂(Ûp)) is an open subset of N, and F|Up : Up → Vp is bijective and smooth (analytic) with a smooth (analytic) inverse. Since F is bijective, Finv : N → M is well-defined. ∀q ∈ N, (Finv)|V_{Finv(q)} = (F|U_{Finv(q)})inv and is smooth (analytic) by our preceding discussion. Hence, Finv is a smooth (analytic) mapping. Then, F is a smooth (analytic) diffeomorphism. This completes the sufficiency part of the proof.

“Necessity” Let F be a smooth (analytic) diffeomorphism between M and N. Then, F is bijective and Finv is smooth (analytic). Then, idM = Finv ◦ F. ∀p ∈ M, there exists a chart (U, φ) ∈ AM with p ∈ U, and a chart (V, ψ) ∈ AN with F(p) ∈ V, such that the expression of F in local coordinates F̂ : φ(U) → ψ(V) is bijective and smooth (analytic) with a smooth (analytic) inverse. Then, idφ(U) = F̂inv ◦ F̂ and idψ(V) = F̂ ◦ F̂inv. By the Chain Rule (Theorem 9.18), we have idY = D(F̂inv)(F̂(y)) DF̂(y), ∀y ∈ φ(U); and idY = DF̂(F̂inv(z)) D(F̂inv)(z), ∀z ∈ ψ(V). Let z = F̂(y); by Proposition 2.4, we have D(F̂inv)(F̂(y)) = (DF̂(y))−1, ∀y ∈ φ(U). Hence, DF̂(φ(p)) ∈ B(Y, Y) and is bijective. This completes the necessity part of the proof. This completes the proof of the theorem. ⊓⊔

We are particularly interested in smooth (analytic) manifolds of variant Kn.

Theorem 12.138 Let n ∈ Z+, M and N be smooth (analytic) manifolds of variant Kn, and F : M → N be a smooth (analytic) mapping. Then, F is a smooth (analytic) diffeomorphism if, and only if, F is bijective and rank(DF̂(φ(p))) = n, ∀p ∈ M, where F̂ is the expression of F in local coordinates. Proof This is immediate from Theorem 12.137.

' &
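Bijectivity of F alone is not enough in Theorems 12.137 and 12.138; the derivative condition is essential. A one-dimensional counterexample (my illustration, with M = N = R and the identity charts) is F(t) = t³: a smooth bijection of R onto R, yet DF(0) = 0 has rank 0 < 1, and correspondingly the inverse fails to be differentiable at 0.

```python
# F(t) = t^3 is a smooth bijection of R onto R, but DF(0) = 0 is not
# bijective, so F is not a diffeomorphism: the difference quotients of
# the inverse t -> t^(1/3) at 0 blow up.
def F(t):
    return t**3

def Finv(t):
    return abs(t) ** (1.0 / 3.0) * (1 if t >= 0 else -1)

# (Finv(h) - Finv(0)) / h = h^(-2/3) grows without bound as h -> 0+.
quotients = [Finv(h) / h for h in (1e-2, 1e-4, 1e-6)]
assert quotients[0] < quotients[1] < quotients[2]
```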

Example 12.139 We present here some results on the logarithm on the complex plane. Note that the exponential function (with base e) maps the complex plane C onto C \ {0} =: C0. It is not injective, but periodic with period i2π. If we restrict the domain of the exponential function to the strip Eexp := {a + ib ∈ C | −π < b ≤ π}, then exp|Eexp : Eexp → C0 is bijective. Define Ln : C0 → Eexp by Ln = (exp|Eexp)inv. By the Inverse Function Theorem, Ln is analytic at every z0 ∈ C0, and Ln(1)(z0) = 1/z0. The natural logarithm function ln : C0 → C is a multi-valued function that takes the values Ln(x) + i2kπ, k ∈ Z, depending on the restriction of the range of the function. We will use the concept of manifold to resolve the ln function in this example. Let X denote the surface of an infinitely long cylinder with radius 1. We parameterize X by the cylindrical polar coordinates (r, θ, h), where r denotes the radius of the projection of the vector to the x-y plane; θ denotes the angle of that projection from the positive x-axis direction; and h is the height of the vector above the x-y plane. Then, for the infinitely long cylinder, we have r = 1, θ ∈ r−π,π, h ∈ R. The topology of X is the subset topology it


inherits from R3. On X, we will introduce an analytic atlas that makes it an analytic manifold of variant C. The atlas has two charts: (U1, φ1) and (U2, φ2), where U1 := {(1, θ, h) ∈ X | h ∈ R and θ ∈ r◦−π,π}, φ1 : U1 → V1 ⊆ C is defined by φ1(1, θ, h) := h + iθ, ∀(1, θ, h) ∈ U1, and V1 := {a + ib ∈ C | a ∈ R and b ∈ r◦−π,π}; and U2 := {(1, θ, h) ∈ X | h ∈ R and θ ∈ r0,π ∪ r◦−π,0}, φ2 : U2 → V2 ⊆ C is defined by φ2(1, θ, h) := h + iθ if θ > 0, and φ2(1, θ, h) := h + iθ + i2π if θ < 0, ∀(1, θ, h) ∈ U2, and V2 := {a + ib ∈ C | a ∈ R and b ∈ r◦0,2π}. It is easy to check that the two charts are analytically compatible. Then, X together with the atlas A := {(U1, φ1), (U2, φ2)} defines an analytic manifold of variant C. Define ⊞ : R × R → r−π,π by θ1 ⊞ θ2 = θ1 + θ2 − 2kπ, where k ∈ Z is such that θ1 + θ2 − 2kπ ∈ r−π,π. Define on the manifold X an operation ⊕ by (1, θ1, h1) ⊕ (1, θ2, h2) := (1, θ1 ⊞ θ2, h1 + h2), ∀(1, θ1, h1), (1, θ2, h2) ∈ X. Clearly, ⊕ is an analytic function on the product manifold X × X, and ⊕ is commutative and associative. Here, what we have done is simply to roll the complex plane onto the cylinder, identifying the real axis of the complex plane with the line {(1, 0, h) ∈ X | h ∈ R} on the cylinder and identifying the overlapping points of the complex plane with the corresponding point on the cylinder. Define the function exp : X → C0 by exp(1, θ, h) = exp(h + iθ), ∀(1, θ, h) ∈ X. On the chart (U1, φ1), the function exp admits the expression exp : V1 → C0. On the chart (U2, φ2), the function exp admits the expression exp : V2 → C0. Then, exp is an analytic function on the manifold X of variant C with atlas A. Clearly, exp : X → C0 is bijective, and the derivative of its expression in local coordinates is nonzero ∀p ∈ X. By Theorem 12.138, exp : X → C0 is an analytic diffeomorphism and admits an analytic inverse, denoted by ln. On the chart (U1, φ1), the function ln admits the expression ln : C0 \ {a + ib ∈ C | a < 0, b = 0} → V1.
On the chart (U2, φ2), the function ln admits the expression ln : C0 \ {a + ib ∈ C | a > 0, b = 0} → V2. Here, the ln's are the usual multi-valued logarithm, but they are now ordinary (single-valued) functions, since the range of each expression is limited to a set on which the logarithm has only one value. On X, we define the metric ρ : X × X → [0, ∞) ⊂ R by ρ(p1, p2) = ρ((1, θ1, h1), (1, θ2, h2)) = ((h1 − h2)² + min{(|θ1 − θ2| − 2π)², |θ1 − θ2|²})^{1/2}, ∀p1, p2 ∈ X. It is easy to show that ρ defines a metric on X and that its induced topology is the same as the subset topology inherited from R3. %

Proposition 12.140 Let x, y ∈ C0 and z1, z2, z ∈ X, where X is the analytic manifold of variant C defined in Example 12.139. Using the notations introduced in Example 12.139, we have:

(i) exp(ln(x)) = x.
(ii) ln(exp(z)) = z.
(iii) exp(z1 ⊕ z2) = exp(z1) exp(z2).
(iv) ln(x) ⊕ ln(y) = ln(xy).
(v) ρ(z1 ⊕ z2, p0) ≤ ρ(z1, p0) + ρ(z2, p0), where p0 := (1, 0, 0) ∈ X.
(vi) ρ(z1, z2) ≤ |φ1(z1) − φ1(z2)|, ∀z1, z2 ∈ U1, where (U1, φ1) is the chart defined in Example 12.139; and ρ(z1, z2) ≤ |φ2(z1) − φ2(z2)|, ∀z1, z2 ∈ U2, where (U2, φ2) is the chart defined in Example 12.139.
(vii) ρ(ln(1 + w) ⊕ ln(exp(−w)), p0) < 2|w|², ∀w ∈ BC(0, 1/2).


Proof (i) and (ii) These are direct consequences of Example 12.139. (iii) By Example 12.139, we have exp(z1 ⊕ z2) = exp(z1) exp(z2). (iv) Let p1 := ln(x) ∈ X and p2 := ln(y) ∈ X. By (i), exp(p1) = x and exp(p2) = y, which implies that exp(p1 ⊕ p2) = exp(p1) exp(p2) = xy, where the first equality follows from (iii). Thus, by (ii), we have p1 ⊕ p2 = ln(xy).
(v) Let zi := (1, θi, hi), i = 1, 2, and z̄2 := (1, −θ2, −h2) ∈ X. Then, ρ(z1 ⊕ z2, p0) = ((h1 + h2)² + min{(θ1 ⊞ θ2)², (|θ1 ⊞ θ2| − 2π)²})^{1/2} = ((h1 + h2)² + (θ1 ⊞ θ2)²)^{1/2} ≤ ((h1 + h2)² + min{(θ1 + θ2)², (|θ1 + θ2| − 2π)²})^{1/2} = ρ(z1, z̄2) ≤ ρ(z1, p0) + ρ(p0, z̄2) = ρ(z1, p0) + ρ(z2, p0), where the second inequality follows from the triangle inequality and the remaining steps follow from Example 12.139.
(vi) ∀z1, z2 ∈ U1, let zi := (1, θi, hi), i = 1, 2. ρ(z1, z2) = ((h1 − h2)² + min{(|θ1 − θ2| − 2π)², |θ1 − θ2|²})^{1/2} ≤ ((h1 − h2)² + |θ1 − θ2|²)^{1/2} = |h1 + iθ1 − (h2 + iθ2)| = |φ1(z1) − φ1(z2)|. ∀z1, z2 ∈ U2, let zi := (1, θi, hi), i = 1, 2. ρ(z1, z2) = ((h1 − h2)² + min{(|θ1 − θ2| − 2π)², |θ1 − θ2|²})^{1/2} ≤ |φ2(z1) − φ2(z2)|.
(vii) ∀w ∈ BC(0, 1/2), we have ρ(ln(1 + w) ⊕ ln(exp(−w)), p0) ≤ |φ1(ln(1 + w) ⊕ ln(exp(−w))) − φ1(p0)| = |φ1(ln((1 + w) exp(−w))) − 0| = |φ1(ln((1 + w) exp(−w)))| = |ln((1 + w) exp(−w))| = |ln(1 + w) − w| ≤ 2|w|², where the first inequality follows from (vi), the first equality follows from (iv), the third equality follows from Example 12.139 and the fact that ln admits the expression ln on the chart (U1, φ1), the fourth equality follows from the property of ln and the fact that it is single-valued on the chart (U1, φ1), and the last inequality follows from Taylor's Theorem 9.48. This completes the proof of the proposition. ⊓⊔
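The estimate in (vii) can be probed numerically in the chart (U1, φ1), where the manifold logarithm reduces to the principal branch. The following check (my illustration; `cmath.log` is Python's principal logarithm) verifies |ln(1 + w) − w| ≤ 2|w|² for a few sample points with |w| < 1/2.

```python
import cmath

# In the chart (U1, phi1), rho(ln(1+w) (+) ln(exp(-w)), p0) is bounded by
# |Ln(1+w) - w| with Ln the principal logarithm; the Taylor estimate gives
# |Ln(1+w) - w| <= |w|^2 / (2(1 - |w|)) <= 2|w|^2 for |w| < 1/2.
for w in [0.3, -0.4, 0.1j, -0.2 + 0.1j, 0.25 - 0.3j]:
    assert abs(w) < 0.5
    assert abs(cmath.log(1 + w) - w) <= 2 * abs(w) ** 2
```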

12.11.2 Tangent Vectors

Starting in this section, we will not pursue results on analytic manifolds. As one can see, the results on smooth manifolds have ramifications for analytic manifolds, and we leave it to the reader to pursue these results.

Definition 12.141 Let Y be a reflexive Banach space over K, X be a smooth manifold of variant Y, and p ∈ X. Let t := (U, φ) be a chart in the complete C∞ atlas of X, denoted by AX, with p ∈ U, and let λ : U → K be a functional such that λ ◦ φinv : φ(U) → K is C∞. Such a functional λ is said to be smooth in a neighborhood of p. The collection of all functionals that are smooth in a neighborhood of p is denoted by C∞(p, K). %

Note that C∞(p, K) forms a vector space over K and further satisfies that, for any two functionals in C∞(p, K), their product is also a member of C∞(p, K).

Definition 12.142 Let Y be a reflexive Banach space over K, X be a smooth manifold of variant Y, and p ∈ X. A tangent vector v at p is an assignment of


a vector vt ∈ Y to every chart t := (U, φ) in the complete smooth atlas of X, denoted by AX, with p ∈ U, such that, ∀h ∈ C∞(p, K), v(h) ∈ K, satisfying:

(i) v(h) = D(h ◦ φinv)(φ(p))(vt).
(ii) vt1 = D(φ1 ◦ (φ2)inv)(φ2(p))(vt2), ∀t1 := (U1, φ1), t2 := (U2, φ2) ∈ AX with p ∈ U1 ∩ U2.
(iii) v(αh1 + βh2) = αv(h1) + βv(h2), ∀α, β ∈ K and ∀h1, h2 ∈ C∞(X, K).
(iv) (αv1 + βv2)(h) = αv1(h) + βv2(h), ∀α, β ∈ K, and for any two tangent vectors v1 and v2 at p.
(v) v(h1 h2) = v(h1)h2(p) + v(h2)h1(p), ∀h1, h2 ∈ C∞(X, K) (Leibniz rule).

The collection of all tangent vectors at p ∈ X is the tangent space to X at p, denoted by Tp X. %

Note that Tp X is a vector space over K that is isomorphic to Y.

Definition 12.143 Let Y and Z be reflexive Banach spaces over K, X1 be a smooth manifold of variant Y with p ∈ X1, X2 be a smooth manifold of variant Z, and F : X1 → X2 be a smooth mapping. The differential of F at p is the mapping Fp : Tp X1 → TF(p) X2 defined by: ∀v ∈ Tp X1, ∀λ ∈ C∞(F(p), K), we have (Fp(v))(λ) = v(λ ◦ F). Let t1 := (U, φ) ∈ AX1 be a chart at p and t2 := (V, ψ) ∈ AX2 be a chart at F(p); then, the expression of Fp in local coordinates is vt1 ↦ D(ψ ◦ F ◦ φinv)(φ(p))(vt1), ∀vt1 ∈ Y. Then, Fp is well-defined. %

Theorem 12.144 Let Y and Z be reflexive Banach spaces over K, X1 be a smooth manifold of variant Y with p ∈ X1, X2 be a smooth manifold of variant Z, and F : X1 → X2 be a smooth mapping. Then, the differential of F at p, Fp : Tp X1 → TF(p) X2, is a linear map.
Proof ∀α, β ∈ K, ∀v1, v2 ∈ Tp X1, ∀λ ∈ C∞(F(p), K), we have (Fp(αv1 + βv2))(λ) = (αv1 + βv2)(λ ◦ F) = αv1(λ ◦ F) + βv2(λ ◦ F) = α(Fp(v1))(λ) + β(Fp(v2))(λ) = (αFp(v1) + βFp(v2))(λ), where the first equality follows from Definition 12.143, the second equality follows from Definition 12.142, the third equality follows from Definition 12.143, and the last equality follows from Definition 12.142. By the arbitrariness of λ, we have Fp(αv1 + βv2) = αFp(v1) + βFp(v2). This completes the proof of the theorem. ⊓⊔

Theorem 12.145 (Chain Rule) Let Yi be a reflexive Banach space over K, Xi be a smooth manifold of variant Yi, i = 1, 2, 3, p ∈ X1, F : X1 → X2 be a smooth mapping, and G : X2 → X3 be a smooth mapping. Then, we have (G ◦ F)p = GF(p) ◦ Fp.

Proof ∀v ∈ Tp X1, ∀λ ∈ C∞(G(F(p)), K), we have ((G ◦ F)p(v))(λ) = v(λ ◦ G ◦ F) = (Fp(v))(λ ◦ G) = (GF(p)(Fp(v)))(λ). By the arbitrariness of λ, we have (G ◦ F)p(v) = GF(p)(Fp(v)). By the arbitrariness of v, we have (G ◦ F)p = GF(p) ◦ Fp. This completes the proof of the theorem. ⊓⊔


Example 12.146 Let Y be a reflexive Banach space over K, Ω ⊆ K be an open set, X be a smooth manifold of variant Y, σ : Ω → X be a smooth mapping, and t0 ∈ Ω. Then, σt0(d/dt) ∈ Tσ(t0) X satisfies: ∀λ ∈ C∞(σ(t0), K), we have (σt0(d/dt))(λ) = d/dt (λ ◦ σ)(t0). %

Definition 12.147 Let Y be a reflexive Banach space over K, X be a smooth manifold of variant Y, and p ∈ X. A tangent covector v∗ at p is an assignment of a vector v∗t ∈ Y∗ to every chart t := (U, φ) in the complete smooth atlas of X, denoted by AX, with p ∈ U, such that, ∀v ∈ Tp X, v∗(v) ∈ K, satisfying:

(i) v∗(αv1 + βv2) = αv∗(v1) + βv∗(v2), ∀α, β ∈ K and ∀v1, v2 ∈ Tp X.
(ii) (αv∗1 + βv∗2)(v) = αv∗1(v) + βv∗2(v), ∀α, β ∈ K, and for any two tangent covectors v∗1 and v∗2 at p.
(iii) v∗t1 = v∗t2 ◦ (D(φ2 ◦ (φ1)inv)(φ1(p))) = (D(φ2 ◦ (φ1)inv)(φ1(p)))′ v∗t2, ∀t1 := (U1, φ1), t2 := (U2, φ2) ∈ AX with p ∈ U1 ∩ U2.

The collection of all tangent covectors at p ∈ X is the cotangent space to X at p, denoted by T∗p X. If there exists h ∈ C∞(p, K) such that v∗(v) = v(h), ∀v ∈ Tp X, then v∗t := D(h ◦ φinv)(φ(p)) ∈ Y∗, and v∗ is denoted by (dh)p. %

Note that T∗p X is a vector space over K that is isomorphic to Y∗.

12.11.3 Vector Fields

Definition 12.148 Let Y be a reflexive Banach space over K and X be a smooth manifold of variant Y. A vector field f is a mapping assigning to each point p ∈ X a tangent vector f(p) ∈ Tp X. A vector field is said to be smooth if, ∀p ∈ X, there exists a chart t := (U, φ) ∈ AX with p ∈ U such that, on the chart t, the expression of f in local coordinates, φ(p̄) ∈ φ(U) ⊆ Y ↦ (f(p̄))t ∈ Y, is a C∞ map. Necessarily, ∀λ ∈ C∞(X, K), the function h : X → K defined by h(p) = f(p)(λ), ∀p ∈ X, is C∞. This function h is said to be the Lie derivative of λ along the vector field f, and is denoted by Lf λ. The set of all smooth vector fields on X is denoted by V(X). %

V(X) is a vector space over K, since ∀f, g ∈ V(X), ∀α, β ∈ K, αf + βg is the vector field defined by (αf + βg)(p) = αf(p) + βg(p), ∀p ∈ X, which is clearly smooth. ∀a, b ∈ C∞(X, K), we may still define a linear combination af + bg by (af + bg)(p) = a(p)f(p) + b(p)g(p), ∀p ∈ X, which is clearly smooth. This makes V(X) a module over the ring C∞(X, K).

Example 12.149 Let Y be a reflexive Banach space over K, X be a smooth manifold of variant Y, p ∈ X, f be a smooth vector field, and ti := (Ui, φi), i = 1, 2, be two charts in AX with p ∈ U1 ∩ U2. Note that f(p) ∈ Tp X and its expression in chart ti is (f(p))ti =: fi(φi(p)) ∈ Y, i = 1, 2. Then, by (ii) of Definition 12.142, in chart t2, f2(φ2(p)) = D(φ2 ◦ (φ1)inv)(φ1(p))(f1(φ1(p))) =


D(φ2 ◦ (φ1)inv)(φ1 ◦ (φ2)inv(φ2(p)))(f1(φ1 ◦ (φ2)inv(φ2(p)))). This is equivalent to f2(y2) = (DT(y1) f1(y1))|y1=Tinv(y2), where T := (φ2 ◦ (φ1)inv)|φ1(U1∩U2). We recognize that this is simply the coordinate transformation formula for a differential equation on Y. Thus, the notion of vector field allows us to define the notion of differential equation on the manifold X of variant Y. Let f be a vector field on X, and σ : ra,b → X be a smooth mapping with a, b ∈ R and a < b. σ is said to be an integral curve of f if σt0(d/dt) = f(σ(t0)), ∀t0 ∈ ra,b. Let c := (U, φ) ∈ AX with σ(t0) ∈ U. Then, by Example 12.146, we have, ∀λ ∈ C∞(X, K), σt0(d/dt)(λ) = f(σ(t0))(λ) = d/dt (λ ◦ σ)(t0). On the chart c, by (i) of Definition 12.142, we have D(λ ◦ φinv)(φ(σ(t0)))(f(σ(t0))c) = d/dt (λ ◦ σ)(t0) = d/dt (λ ◦ φinv ◦ φ ◦ σ)(t0) = D(λ ◦ φinv)(φ(σ(t0))) D(φ ◦ σ)(t0). By the arbitrariness of λ, we have D(φ ◦ σ)(t0) = f(σ(t0))c. This is the desired differential equation formulation: on the left-hand side is the derivative of the expression of σ in local coordinates, and on the right-hand side is the assigned vector f(σ(t0))c ∈ Y for the chart c of the tangent vector f(σ(t0)) ∈ Tσ(t0) X. Thus, we can say that σ is a solution to the differential equation D(φ ◦ σ)(t0) = f(σ(t0))c in a neighborhood of t0. %

Let f and g be smooth vector fields on the manifold X. ∀λ ∈ C∞(X, K), we may define the smooth function Lf Lg λ − Lg Lf λ. (Note that for repeated Lie derivatives, we take Lf Lg λ := Lf(Lg λ).) In a chart t := (U, φ) ∈ AX, we have Lg λ(p) = D(λ ◦ φinv)(φ(p)) g(φ(p)), ∀p ∈ U, where g admits the expression g(x) in the chart t. Let f admit the expression f(x) in the chart t. Then, Lf Lg λ(p) = Lf(Lg λ)(p) = D(D(λ ◦ φinv)(x) g(x))(φ(p)) f(φ(p)) = (λ ◦ φinv)(2)(φ(p))(g(φ(p)))(f(φ(p))) + D(λ ◦ φinv)(φ(p))(Dg(φ(p))(f(φ(p)))).
By symmetry, we have Lg Lf λ(p) = (λ ◦ φinv)(2)(φ(p))(f(φ(p)))(g(φ(p))) + D(λ ◦ φinv)(φ(p))(Df(φ(p))(g(φ(p)))). Then,

(Lf Lg λ − Lg Lf λ)(p) = (λ ◦ φinv)(2)(φ(p))(g(φ(p)))(f(φ(p))) + D(λ ◦ φinv)(φ(p))(Dg(φ(p))(f(φ(p)))) − (λ ◦ φinv)(2)(φ(p))(f(φ(p)))(g(φ(p))) − D(λ ◦ φinv)(φ(p))(Df(φ(p))(g(φ(p)))) = D(λ ◦ φinv)(φ(p))(Dg(φ(p))(f(φ(p)))) − D(λ ◦ φinv)(φ(p))(Df(φ(p))(g(φ(p)))) = D(λ ◦ φinv)(φ(p))(Dg(φ(p))(f(φ(p))) − Df(φ(p))(g(φ(p)))) =: L[f,g] λ(p),

where the first equality follows from the preceding derivation and the second equality follows from Proposition 9.28. Thus, we observe that f and g determine


another vector field [f, g], which in the chart t admits the expression Dg(x)(f(x)) − Df(x)(g(x)). This is a smooth vector field according to Definition 12.148. We will call [f, g] the Lie bracket of the vector fields f and g.

Theorem 12.150 Let Y be a reflexive Banach space over K, X be a smooth manifold of variant Y, p ∈ X, and f and g be smooth vector fields. Then, there exists a smooth vector field, denoted by [f, g], that assigns a tangent vector [f, g](p) ∈ Tp X to every p ∈ X. In any chart (U, φ) ∈ AX, the expression for [f, g] is given by Dg(x)(f(x)) − Df(x)(g(x)), where f(x) and g(x) are the expressions of f and g in the chart (U, φ). ∀f, g, h ∈ V(X), ∀α, β ∈ K, we have:

(i) [f, g] = −[g, f] (skew commutativity).
(ii) [f, αg + βh] = α[f, g] + β[f, h] (bilinearity).
(iii) [αg + βh, f] = α[g, f] + β[h, f] (bilinearity).
(iv) Lf Lg λ − Lg Lf λ = L[f,g] λ, ∀λ ∈ C∞(X, K).
(v) The Jacobi identity holds: [f, [g, h]] + [g, [h, f]] + [h, [f, g]] = ϑV(X).

Proof (i)–(iii) These are immediate from the expression of [f, g] in local coordinates. (iv) This follows from the derivation preceding the theorem. (v) Fix any p ∈ X and any λ ∈ C∞(p, K). Note that

L[f,[g,h]] λ = Lf (L[g,h] λ) − L[g,h] (Lf λ) = Lf (Lg Lh λ − Lh Lg λ) − Lg Lh (Lf λ) + Lh Lg (Lf λ) = Lf Lg Lh λ + Lh Lg Lf λ − Lf Lh Lg λ − Lg Lh Lf λ,

where the first equality follows from (iv), the second equality follows from (iv) and Definition 12.142, and the third equality follows from Definition 12.142. Thus, we have, by symmetry,

L[f,[g,h]]+[g,[h,f]]+[h,[f,g]] λ = L[f,[g,h]] λ + L[g,[h,f]] λ + L[h,[f,g]] λ = Lf Lg Lh λ + Lh Lg Lf λ − Lf Lh Lg λ − Lg Lh Lf λ + Lg Lh Lf λ + Lf Lh Lg λ − Lg Lf Lh λ − Lh Lf Lg λ + Lh Lf Lg λ + Lg Lf Lh λ − Lh Lg Lf λ − Lf Lg Lh λ = 0 ∈ C∞(p, K),

where the first equality follows from Definition 12.142 and the second equality follows from the preceding discussion. Thus, by the arbitrariness of λ and p, we have [f, [g, h]] + [g, [h, f]] + [h, [f, g]] = ϑV(X). This completes the proof of the theorem. ⊓⊔
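A concrete finite-dimensional instance (my illustration, assuming X = R² with the identity chart) makes the chart expression of the Lie bracket and the identities of Theorem 12.150 tangible: for linear vector fields f(x) = Ax and g(x) = Bx, we get Dg(x)f(x) − Df(x)g(x) = (BA − AB)x, so the bracket reduces to a matrix commutator, and (i) and (v) become matrix identities.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matsub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def bracket(A, B):
    # For f(x) = Ax, g(x) = Bx, the chart expression
    # Dg(x) f(x) - Df(x) g(x) of [f, g] is (BA - AB) x.
    return matsub(matmul(B, A), matmul(A, B))

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[1.0, 0.0], [0.0, -1.0]]
C = [[0.0, 0.0], [1.0, 0.0]]

# Skew commutativity, Theorem 12.150(i): [f, g] = -[g, f].
assert bracket(A, B) == [[-b for b in row] for row in bracket(B, A)]

# Jacobi identity, Theorem 12.150(v).
S = matadd(matadd(bracket(A, bracket(B, C)),
                  bracket(B, bracket(C, A))),
           bracket(C, bracket(A, B)))
assert S == [[0.0, 0.0], [0.0, 0.0]]
```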

Chapter 13

Hilbert Spaces

Hilbert spaces are vector spaces equipped with inner products. They possess a wealth of structural properties generalizing basic geometric insights. The concepts of orthonormal basis, Fourier series, and least-squares minimization all have a natural setting in Hilbert spaces.

13.1 Fundamental Notions

Definition 13.1 A pre-Hilbert space X over K is a vector space X over K, together with an inner product ⟨·, ·⟩ : X × X → K satisfying, ∀x, y, z ∈ X, ∀λ ∈ K,

(i) ⟨x, y⟩ = conj(⟨y, x⟩), where conj(·) denotes complex conjugation.
(ii) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
(iii) ⟨λx, y⟩ = λ⟨x, y⟩.
(iv) ⟨x, x⟩ ∈ [0, ∞) ⊂ R, and ⟨x, x⟩ = 0 if, and only if, x = ϑX.

%

Proposition 13.2 Let X be a pre-Hilbert space over K. Define ‖·‖ : X → [0, ∞) ⊂ R by ‖x‖ = (⟨x, x⟩)^{1/2}, ∀x ∈ X. Then, ∀x, y, z ∈ X, ∀α, β ∈ K, we have:

(i) ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩.
(ii) ⟨z, αx + βy⟩ = conj(α)⟨z, x⟩ + conj(β)⟨z, y⟩.
(iii) ⟨ϑX, x⟩ = 0 = ⟨x, ϑX⟩.
(iv) |⟨x, y⟩| ≤ ‖x‖‖y‖, where equality holds if, and only if, y = ϑX or x = λy for some λ ∈ K. (Cauchy–Schwarz Inequality)
(v) ‖x + y‖ ≤ ‖x‖ + ‖y‖, where equality holds if, and only if, y = ϑX or x = λy for some λ ∈ [0, ∞) ⊂ R.
(vi) ‖·‖ is a norm on X. (Induced norm)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Pan, Measure-Theoretic Calculus in Abstract Spaces, https://doi.org/10.1007/978-3-031-21912-2_13


Proof (i) ⟨αx + βy, z⟩ = ⟨αx, z⟩ + ⟨βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩, where the first equality follows from (ii) of Definition 13.1 and the second equality follows from (iii) of Definition 13.1.
(ii) ⟨z, αx + βy⟩ = conj(⟨αx + βy, z⟩) = conj(α⟨x, z⟩ + β⟨y, z⟩) = conj(α) conj(⟨x, z⟩) + conj(β) conj(⟨y, z⟩) = conj(α)⟨z, x⟩ + conj(β)⟨z, y⟩, where the first equality follows from (i) of Definition 13.1; the second equality follows from (i); and the fourth equality follows from (i) of Definition 13.1.
(iii) ⟨ϑX, x⟩ = ⟨y − y, x⟩ = ⟨y, x⟩ − ⟨y, x⟩ = 0, where the second equality follows from (i). Then, ⟨x, ϑX⟩ = conj(⟨ϑX, x⟩) = 0.
(iv) We will distinguish two exhaustive and mutually exclusive cases: Case 1: y = ϑX; Case 2: y ≠ ϑX. Case 1: y = ϑX. Then, |⟨x, y⟩| = 0 = ‖x‖ · 0 = ‖x‖‖y‖. This case is proved. Case 2: y ≠ ϑX. By (iv) of Definition 13.1, we have ⟨y, y⟩ > 0. ∀λ ∈ K, we have 0 ≤ ⟨x − λy, x − λy⟩ = ⟨x, x − λy⟩ − λ⟨y, x − λy⟩ = ⟨x, x⟩ − conj(λ)⟨x, y⟩ − λ⟨y, x⟩ + λ conj(λ)⟨y, y⟩, where the inequality follows from (iv) of Definition 13.1, the first equality follows from (i), and the second equality follows from (ii). Take λ := ⟨x, y⟩/⟨y, y⟩. Then, 0 ≤ ⟨x, x⟩ − conj(λ)⟨x, y⟩ − λ conj(⟨x, y⟩) + |λ|²⟨y, y⟩ = ⟨x, x⟩ − conj(⟨x, y⟩)⟨x, y⟩/⟨y, y⟩ = ⟨x, x⟩ − |⟨x, y⟩|²/⟨y, y⟩, where the equality follows from (i) of Definition 13.1. Hence, |⟨x, y⟩| ≤ ‖x‖‖y‖. Equality holds if, and only if, x − λy = ϑX. This case is also proved. Hence, (iv) holds.
(v) Note that ‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ = ‖x‖² + 2 Re(⟨x, y⟩) + ‖y‖² ≤ ‖x‖² + 2|⟨x, y⟩| + ‖y‖² ≤ ‖x‖² + 2‖x‖‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)², where the first equality follows from (i) and (ii); the second equality follows from (i) of Definition 13.1; and the second inequality follows from (iv). Hence, ‖x + y‖ ≤ ‖x‖ + ‖y‖. Equality holds if, and only if, Re(⟨x, y⟩) = |⟨x, y⟩| = ‖x‖‖y‖. By (iv), equality holds if, and only if, y = ϑX or x = λy for some λ ∈ [0, ∞) ⊂ R.
(vi) Clearly, ‖x‖ ∈ [0, ∞) ⊂ R, and ‖x‖ = 0 ⇔ ⟨x, x⟩ = 0 ⇔ x = ϑX. Hence, (i) of Definition 7.1 holds. By (v), (ii) of Definition 7.1 holds. Note also that ‖αx‖ = (⟨αx, αx⟩)^{1/2} = (α conj(α)⟨x, x⟩)^{1/2} = |α|‖x‖, where the second equality follows from (i) and (ii), and the third equality follows from the fact that α conj(α) = |α|². Hence, (iii) of Definition 7.1 holds. Therefore, ‖·‖ defines a norm on X. ⊓⊔

Lemma 13.3 (Parallelogram Law) Let X be a pre-Hilbert space. ∀x, y ∈ X, we have ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².

Proof This is straightforward, and is therefore omitted.

' &
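The inequalities of Proposition 13.2 and Lemma 13.3 are easy to sanity-check numerically. The sketch below (my illustration, assuming X = C³ with ⟨x, y⟩ = Σ ξi conj(ηi)) verifies the Cauchy–Schwarz inequality, the triangle inequality, and the Parallelogram Law for sample vectors.

```python
# X = C^3 with the standard inner product <x, y> = sum_i x_i * conj(y_i).
def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

x = [1 + 2j, 0.5, -1j]
y = [2 - 1j, 3j, 1.0]

# Cauchy-Schwarz, Proposition 13.2(iv).
assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12

# Triangle inequality, Proposition 13.2(v).
xy = [a + b for a, b in zip(x, y)]
assert norm(xy) <= norm(x) + norm(y) + 1e-12

# Parallelogram Law, Lemma 13.3.
xmy = [a - b for a, b in zip(x, y)]
assert abs(norm(xy)**2 + norm(xmy)**2 - 2*norm(x)**2 - 2*norm(y)**2) < 1e-9
```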

Pre-Hilbert space is a special kind of normed linear space. The concept of convergence, continuity, topology, and completeness, etc., apply in pre-Hilbert spaces. Proposition 13.4 Let X be a pre-Hilbert space. Then, the inner product is a continuous function of the product space X × X. Proof We will show that the inner product is continuous at any (x0 , y0 ) ∈ X × X. ∀ ∈ (0, ∞) ⊂ R, let M := max{x0 , y0 } + 1 ∈ (0, ∞) ⊂ R


and δ := min{ε, 1}/M ∈ (0, 1] ⊂ R. Fix any (x, y) ∈ BX×X((x0, y0), δ). We have |⟨x, y⟩ − ⟨x0, y0⟩| = |⟨x − x0, y⟩ + ⟨x0, y − y0⟩| ≤ |⟨x − x0, y⟩| + |⟨x0, y − y0⟩| ≤ ‖x − x0‖‖y‖ + ‖x0‖‖y − y0‖ ≤ ‖x − x0‖(‖y0‖ + ‖y − y0‖) + ‖x0‖‖y − y0‖ ≤ M(‖x − x0‖ + ‖y − y0‖) < 2Mδ ≤ 2ε, where the second inequality follows from Proposition 13.2. Hence, the inner product is continuous at (x0, y0). By the arbitrariness of (x0, y0), the inner product is continuous. This completes the proof of the proposition. ⊓⊔ Definition 13.5 A complete pre-Hilbert space is called a Hilbert space.

%

Definition 13.6 Let X be a pre-Hilbert space, x, y ∈ X, and S ⊆ X. The vectors x and y are said to be orthogonal if ⟨x, y⟩ = 0. We then write x ⊥ y. The vector x is said to be orthogonal to the set S if x ⊥ s, ∀s ∈ S. We then write x ⊥ S. %

Theorem 13.7 (Pythagorean) Let X be a pre-Hilbert space over K and x, y ∈ X. If x ⊥ y, then ‖x + y‖² = ‖x‖² + ‖y‖². Furthermore, if K = R and ‖x + y‖² = ‖x‖² + ‖y‖², then x ⊥ y.

Proof Let x ⊥ y; then ⟨x, y⟩ = 0. This implies that ‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ = ‖x‖² + ‖y‖², where the second equality follows from Proposition 13.2 and the third equality follows from Definition 13.1. Let K = R and ‖x + y‖² = ‖x‖² + ‖y‖². Then, we have 0 = ⟨x, y⟩ + ⟨y, x⟩ = 2⟨x, y⟩, where the first equality follows from Proposition 13.2 and the second equality follows from Definition 13.1 and K = R. Hence, x ⊥ y. This completes the proof of the theorem. ⊓⊔

Example 13.8 Let n ∈ Z+ and X = Kn. Define the inner product by ⟨x, y⟩ = Σ_{i=1}^{n} ξi conj(ηi) ∈ K, ∀x := (ξ1, . . . , ξn) ∈ Kn and ∀y := (η1, . . . , ηn) ∈ Kn. It is easy to check that the pair X := (X, ⟨·, ·⟩) is a pre-Hilbert space over K. The induced norm is clearly the Euclidean norm. By Example 7.29, X is complete. Therefore, X is a Hilbert space, which will be denoted simply by Kn. %

Example 13.9 Let Xi be a Hilbert space over K with inner product ⟨·, ·⟩i, i = 1, 2. Let Z := X1 × X2. Define the inner product ⟨·, ·⟩Z : Z × Z → K by ⟨(x1, x2), (y1, y2)⟩Z = ⟨x1, y1⟩1 + ⟨x2, y2⟩2, ∀(x1, x2), (y1, y2) ∈ Z. It is easy to show that Z with this inner product is a pre-Hilbert space over K. It is also easy to see that the induced norm of Z is equal to the norm of Z as defined in Proposition 7.22. By Proposition 4.31, Z is complete. Therefore, Z is a Hilbert space over K. %

The above example can easily be generalized to the case ∏_{i=1}^{n} Xi, where n ∈ Z+ and Xi is a Hilbert space over K, ∀i ∈ {1, . . . , n}.

Example 13.10 Let X be a Hilbert space over K with inner product ⟨·, ·⟩X and Z := l2(X).
We will show that Z is a Hilbert space over K with inner product ⟨·, ·⟩Z defined by ⟨x, y⟩Z = Σ_{i=1}^{∞} ⟨ξi, ηi⟩, ∀x := (ξ1, ξ2, . . .) ∈ Z and ∀y := (η1, η2, . . .) ∈ Z. To see that ⟨x, y⟩Z ∈ K, we note that x ∈ Z implies that Σ_{i=1}^{∞} ‖ξi‖² < ∞. Fix any ε ∈ (0, ∞) ⊂ R. Then, ∃n1 ∈ N such that


Σ_{i=n1}^{∞} ‖ξi‖² < ε. By y ∈ Z, ∃n2 ∈ N such that Σ_{i=n2}^{∞} ‖ηi‖² < ε. Let n0 := max{n1, n2} ∈ N. ∀n, m ∈ N with n0 ≤ n ≤ m, we have |Σ_{i=n}^{m} ⟨ξi, ηi⟩| ≤ Σ_{i=n}^{m} ‖ξi‖‖ηi‖ ≤ (Σ_{i=n}^{m} ‖ξi‖²)^{1/2} (Σ_{i=n}^{m} ‖ηi‖²)^{1/2} ≤ (Σ_{i=n1}^{∞} ‖ξi‖²)^{1/2} (Σ_{i=n2}^{∞} ‖ηi‖²)^{1/2} < ε, where the first inequality follows from Hölder's Inequality (Theorem 7.8). Hence, (Σ_{i=1}^{n} ⟨ξi, ηi⟩)_{n=1}^{∞} ⊆ K is a Cauchy sequence, which converges since K is complete. Hence, ⟨x, y⟩Z ∈ K. It is easy to show that (i)–(iv) of Definition 13.1 are satisfied. Hence, Z is a pre-Hilbert space. It is easy to see that the induced norm is equal to the norm of Z as defined in Example 7.10. By Example 7.33, Z is complete and therefore a Hilbert space. %
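The absolute-convergence argument of Example 13.10 can be seen numerically for scalar sequences (my illustration, with X = K = R and sequences truncated at N terms): the partial inner product is dominated by the product of the l² norms, exactly as in the Hölder estimate.

```python
# Truncated l^2 sequences xi = 1/i, eta_i = 1/(i+1); the partial sum of
# <xi, eta_i> is dominated by the product of the l^2 norms.
N = 10000
x = [1.0 / i for i in range(1, N + 1)]
y = [1.0 / (i + 1) for i in range(1, N + 1)]

s = sum(a * b for a, b in zip(x, y))   # partial sum of the inner product
nx = sum(a * a for a in x) ** 0.5      # l^2 norm of the truncation of x
ny = sum(b * b for b in y) ** 0.5      # l^2 norm of the truncation of y
assert abs(s) <= nx * ny
```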

Example 13.11 Let X := (X, B, μ) be a σ-finite measure space, Y be a separable Hilbert space over K with inner product ⟨·, ·⟩Y, and Z := L2(X, Y). We will show that Z is a Hilbert space over K with inner product ⟨·, ·⟩Z defined by ⟨[z1], [z2]⟩Z = ∫X ⟨z1(x), z2(x)⟩Y dμ(x), ∀z1, z2 ∈ L̄2(X, Y). First, we show that ⟨·, ·⟩Z is well-defined. ∀z1, z2 ∈ L̄2(X, Y), z1 and z2 are B-measurable. By Propositions 11.39, 11.38, and 13.4, the function h : X → K defined by h(x) = ⟨z1(x), z2(x)⟩Y, ∀x ∈ X, is B-measurable. P ◦ h(x) = |h(x)| ≤ ‖z1(x)‖‖z2(x)‖ = P ◦ z1(x) P ◦ z2(x), ∀x ∈ X, where the inequality follows from Proposition 13.2. Then, ∫X P ◦ h dμ ≤ ∫X (P ◦ z1 P ◦ z2) dμ ≤ (∫X (P ◦ z1)² dμ)^{1/2} (∫X (P ◦ z2)² dμ)^{1/2} = ‖z1‖2 ‖z2‖2 < ∞, where the first inequality follows from Definition 11.79 and the second inequality follows from Hölder's Inequality 11.178. Hence, h is absolutely integrable over X. By Proposition 11.92, h is integrable over X and ∫X ⟨z1(x), z2(x)⟩Y dμ(x) ∈ K. ∀z̄1, z̄2 ∈ L̄2(X, Y) with [z1] = [z̄1] and [z2] = [z̄2], by Example 11.173, we have z1 = z̄1 a.e. in X and z2 = z̄2 a.e. in X. Define h̄ : X → K by h̄(x) = ⟨z̄1(x), z̄2(x)⟩Y, ∀x ∈ X. By Lemmas 11.45 and 11.46, h = h̄ a.e. in X. By Proposition 11.92, ∫X h dμ = ∫X h̄ dμ. Hence, ⟨[z1], [z2]⟩Z ∈ K is well-defined. Next, we show that ⟨·, ·⟩Z satisfies (i)–(iv) of Definition 13.1. Fix any z1, z2, z3 ∈ L̄2(X, Y) and any λ ∈ K. (i) ⟨[z1], [z2]⟩Z = ∫X ⟨z1(x), z2(x)⟩Y dμ(x) = ∫X conj(⟨z2(x), z1(x)⟩Y) dμ(x) = conj(∫X ⟨z2(x), z1(x)⟩Y dμ(x)) = conj(⟨[z2], [z1]⟩Z), where the second equality follows from the fact that Y is a Hilbert space over K and the third equality follows from Proposition 11.92.
(ii) ⟨[z1] + [z2], [z3]⟩Z = ⟨[z1 + z2], [z3]⟩Z = ∫X ⟨z1(x) + z2(x), z3(x)⟩Y dμ(x) = ∫X (⟨z1(x), z3(x)⟩Y + ⟨z2(x), z3(x)⟩Y) dμ(x) = ∫X ⟨z1(x), z3(x)⟩Y dμ(x) + ∫X ⟨z2(x), z3(x)⟩Y dμ(x) = ⟨[z1], [z3]⟩Z + ⟨[z2], [z3]⟩Z, where the first equality follows from Proposition 7.43, the third equality follows from the fact that Y is a Hilbert space, and the fourth equality follows from Proposition 11.92. (iii) ⟨λ[z1], [z2]⟩Z = ⟨[λz1], [z2]⟩Z = ∫X ⟨λz1(x), z2(x)⟩Y dμ(x) = ∫X λ⟨z1(x), z2(x)⟩Y dμ(x) = λ ∫X ⟨z1(x), z2(x)⟩Y dμ(x) = λ⟨[z1], [z2]⟩Z, where the first equality follows from Proposition 7.43, the third equality follows from the fact that Y is a Hilbert space, and the fourth equality follows from Proposition 11.92. (iv) ⟨[z1], [z1]⟩Z = ∫X ⟨z1(x), z1(x)⟩Y dμ(x) = ∫X ‖z1(x)‖²Y dμ(x) = ‖z1‖²2 ∈ [0, ∞) ⊂ R. Clearly, ⟨[z1], [z1]⟩Z = 0 ⇔ ‖z1‖2 = 0 ⇔ [z1] = ϑZ. Hence, Z is a pre-Hilbert space over K.


Note that the induced norm of Z is equal to the norm of Z as defined in Example 11.173. By Example 11.179, Z is complete. Hence, Z is a Hilbert space over K. %

13.2 Projection Theorems

The basic concept of the Projection Theorem is illustrated in Fig. 13.1.

Theorem 13.12 Let X be a pre-Hilbert space, M ⊆ X be a subspace, and x ∈ X. Consider the problem min_{m∈M} ‖x − m‖. If there exists m0 ∈ M such that ‖x − m0‖ ≤ ‖x − m‖, ∀m ∈ M, then m0 is the unique vector in M that minimizes ‖x − m‖. A necessary and sufficient condition for m0 ∈ M being the unique minimizing vector is (x − m0) ⊥ M.

Proof We first show that if m0 ∈ M is a minimizing vector, then (x − m0) ⊥ M, by an argument of contradiction. Suppose that (x − m0) is not orthogonal to M. Then, ∃m ∈ M such that ⟨x − m0, m⟩ =: δ ≠ 0. Without loss of generality, assume ‖m‖ = 1. Define m1 := m0 + δm ∈ M. Then, by Proposition 13.2, ‖x − m1‖² = ⟨x − m0 − δm, x − m0 − δm⟩ = ‖x − m0‖² − ⟨x − m0, δm⟩ − ⟨δm, x − m0⟩ + ‖δm‖² = ‖x − m0‖² − conj(δ)δ − δ conj(δ) + |δ|²‖m‖² = ‖x − m0‖² − |δ|² < ‖x − m0‖². This contradicts the assumption that m0 is a minimizing vector. Hence, (x − m0) ⊥ M. Next, we show that if (x − m0) ⊥ M, where m0 ∈ M, then m0 is the unique minimizing vector. ∀m ∈ M, by the Pythagorean Theorem 13.7, we have ‖x − m‖² =

Fig. 13.1 Projection onto a subspace


‖x − m0 − (m − m0)‖² = ‖x − m0‖² + ‖m − m0‖². Then, ‖x − m‖ > ‖x − m0‖ if m ≠ m0. This completes the proof of the theorem. ⊓⊔

Theorem 13.13 (The Classical Projection Theorem) Let X be a Hilbert space and M ⊆ X be a closed subspace. ∀x ∈ X, there exists a unique vector m0 ∈ M such that ‖x − m0‖ = min_{m∈M} ‖x − m‖. Furthermore, a necessary and sufficient condition for m0 ∈ M being the unique minimizing vector is (x − m0) ⊥ M.

Proof The uniqueness and orthogonality are immediate consequences of Theorem 13.12. We are only required to establish the existence of m0. Let δ := inf_{m∈M} ‖x − m‖ ∈ [0, ∞) ⊂ R. Then, ∃(mn)_{n=1}^{∞} ⊆ M such that limn∈N ‖x − mn‖ = δ. ∀ε ∈ (0, ∞) ⊂ R, ∃n0 ∈ N such that, ∀n ∈ N with n0 ≤ n, we have δ ≤ ‖x − mn‖ < (δ² + ε²/4)^{1/2}. By the Parallelogram Law 13.3, we have, ∀i, j ∈ N with n0 ≤ i and n0 ≤ j, ‖mi − x + x − mj‖² + ‖x − mi + x − mj‖² = 2‖x − mj‖² + 2‖x − mi‖². Then, ‖mi − mj‖² = 2‖x − mj‖² + 2‖x − mi‖² − 4‖x − (mi + mj)/2‖² < 4(δ² + ε²/4) − 4δ² = ε², so ‖mi − mj‖ < ε. This implies that (mn)_{n=1}^{∞} is a Cauchy sequence, which must converge to some m0 ∈ M, since M is a complete subspace by Proposition 4.39. By Proposition 7.21, we have ‖x − m0‖ = limn∈N ‖x − mn‖ = δ = inf_{m∈M} ‖x − m‖. This completes the proof of the theorem. ⊓⊔

In the above proof, we observe that the key to the existence of the minimizing vector is that M is complete. Hence, we have the following modified version of the projection theorem.

Theorem 13.14 Let X be a pre-Hilbert space and M ⊆ X be a complete subspace. ∀x ∈ X, there exists a unique vector m0 ∈ M such that ‖x − m0‖ = min_{m∈M} ‖x − m‖. Furthermore, a necessary and sufficient condition for m0 ∈ M being the unique minimizing vector is (x − m0) ⊥ M.

Proof This is straightforward from the proof of the Projection Theorem 13.13, and is therefore omitted. ⊓⊔
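A finite-dimensional sketch of the Projection Theorem (my illustration, with X = R³ and M the plane spanned by the first two coordinate vectors): the minimizer is the orthogonal projection, the residual x − m0 is orthogonal to M, and any other m ∈ M gives a larger distance, as the Pythagorean argument shows.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = (1.0, 2.0, 3.0)
m0 = (1.0, 2.0, 0.0)  # orthogonal projection of x onto M = span{e1, e2}
residual = tuple(a - b for a, b in zip(x, m0))

# (x - m0) is orthogonal to the spanning vectors of M.
assert dot(residual, (1.0, 0.0, 0.0)) == 0.0
assert dot(residual, (0.0, 1.0, 0.0)) == 0.0

# Any other m in M does at least as badly (Pythagorean Theorem 13.7):
# ||x - m||^2 = ||x - m0||^2 + ||m0 - m||^2 >= ||x - m0||^2.
for _ in range(100):
    m = (random.uniform(-10, 10), random.uniform(-10, 10), 0.0)
    d = tuple(a - b for a, b in zip(x, m))
    assert dot(d, d) >= dot(residual, residual)
```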

13.3 Dual of Hilbert Spaces

Theorem 13.15 (Riesz–Fréchet) Let X be a Hilbert space over K. Then, the following statements hold.

(i) ∀f ∈ X∗, there exists a unique y0 ∈ X such that f(x) = ⟨x, y0⟩, ∀x ∈ X, and ‖f‖_{X∗} = ‖y0‖_X. Therefore, we may define a mapping Φ : X∗ → X by Φ(f) = y0.
(ii) ∀y ∈ X, define g : X → K by g(x) = ⟨x, y⟩, ∀x ∈ X; then g ∈ X∗.


(iii) The mapping Φ is bijective, uniformly continuous, norm preserving, and conjugate linear (that is, Φ(αf1 + βf2) = ᾱΦ(f1) + β̄Φ(f2), ∀f1, f2 ∈ X∗, ∀α, β ∈ K).
(iv) If K = ℝ, then Φ is an isometrical isomorphism between X∗ and X.
(v) If K = ℂ, let φ : X → X∗∗ be the natural mapping as defined in Remark 7.88; then φ is surjective and X is reflexive.
(vi) If K = ℂ, then X∗ with the inner product ⟨·, ·⟩_{X∗}, defined by ⟨f, g⟩_{X∗} := ⟨Φ(g), Φ(f)⟩, ∀f, g ∈ X∗, is a Hilbert space, and it is reflexive. Henceforth, we will denote Φinv(x) =: x∗, ∀x ∈ X. Furthermore, the following statements hold.
(vii) When K = ℂ, let Φ∗ : X∗∗ = X → X∗ be the mapping Φ with X replaced by X∗. Then, Φ∗ = Φinv. This leads to the identity (x∗)∗ = x, ∀x ∈ X.
(viii) If X is separable, then X∗ is separable.

Proof We will first show that, ∀y ∈ X, the functional g defined in statement (ii) of the theorem is a bounded linear functional with ‖g‖_{X∗} = ‖y‖_X. By Definition 13.1, g is a linear functional. By Proposition 13.2, we have |g(x)| = |⟨x, y⟩| ≤ ‖x‖_X ‖y‖_X. Hence, g ∈ X∗ and ‖g‖_{X∗} ≤ ‖y‖_X. Note that |g(y)| = |⟨y, y⟩| = ‖y‖²_X. Hence, ‖g‖_{X∗} ≥ ‖y‖_X, and therefore ‖g‖_{X∗} = ‖y‖_X.

Next, we will show that, for any f ∈ X∗, there exists y0 ∈ X such that f(x) = ⟨x, y0⟩, ∀x ∈ X. Let M := N(f). We will distinguish two exhaustive and mutually exclusive cases: Case 1: M = X; Case 2: M ⊂ X. Case 1: M = X. Take y0 = ϑ_X. Then, f(x) = 0 = ⟨x, y0⟩, ∀x ∈ X. This case is proved. Case 2: M ⊂ X. Then, ∃x0 ∈ X \ M. Since f ∈ X∗, M is a closed subspace of X by Proposition 7.68. By Projection Theorem 13.13, ∃m0 ∈ M such that (x0 − m0) ⊥ M. Let z0 := x0 − m0 ∈ X \ M. Then, f(z0) ≠ 0. Without loss of generality, we may assume that f(z0) = 1. ∀x ∈ X, by the linearity of f, we have f(x − f(x)z0) = f(x) − f(x)f(z0) = 0 and x − f(x)z0 ∈ M. Since z0 ⊥ M, we have 0 = ⟨x − f(x)z0, z0⟩ = ⟨x, z0⟩ − f(x)‖z0‖². Then, f(x) = ⟨x, z0⟩/‖z0‖² = ⟨x, z0/‖z0‖²⟩. Let y0 := z0/‖z0‖² ∈ X; we have f(x) = ⟨x, y0⟩, ∀x ∈ X. This case is also proved.

Next, we will show that, for any f ∈ X∗, the vector y0 ∈ X is unique. Let y ∈ X be another vector such that f(x) = ⟨x, y⟩, ∀x ∈ X. Then, 0 = f(x) − f(x) = ⟨x, y0⟩ − ⟨x, y⟩ = ⟨x, y0 − y⟩, ∀x ∈ X. Taking x = y0 − y yields 0 = ⟨y0 − y, y0 − y⟩ = ‖y0 − y‖². Thus, y = y0, and y0 is unique. Based on the above, (i) and (ii) hold.

(iii) Note that, ∀f1, f2 ∈ X∗, ∀α, β ∈ K, (αf1 + βf2)(x) = αf1(x) + βf2(x) = α⟨x, Φ(f1)⟩ + β⟨x, Φ(f2)⟩ = ⟨x, ᾱΦ(f1) + β̄Φ(f2)⟩, ∀x ∈ X, where the second equality follows from (i); and the third equality follows from Proposition 13.2. Then, by (i), we have Φ(αf1 + βf2) = ᾱΦ(f1) + β̄Φ(f2). By (i), ‖Φ(f)‖_X = ‖f‖_{X∗}, ∀f ∈ X∗. Therefore, Φ is norm preserving. By (ii), Φ is surjective. ∀f1, f2 ∈ X∗, Φ(f1) = Φ(f2) implies that 0 = ‖Φ(f1) − Φ(f2)‖_X = ‖Φ(f1 − f2)‖_X = ‖f1 − f2‖_{X∗}, which further implies that f1 = f2. Hence, Φ is injective. Hence, Φ


is bijective. Clearly, Φ is uniformly continuous: since it is norm preserving and conjugate linear, ‖Φ(f1) − Φ(f2)‖_X = ‖Φ(f1 − f2)‖_X = ‖f1 − f2‖_{X∗}, ∀f1, f2 ∈ X∗.

(iv) Let K = ℝ. By (iii), Φ is linear. Then, by Definitions 6.28 and 7.24, Φ is an isometrical isomorphism.

(v) Let K = ℂ. By Remark 7.88, ∀x∗ ∈ X∗, ∀x ∈ X, we have ⟨⟨x∗, x⟩⟩ = ⟨⟨φ(x), x∗⟩⟩. Then, ∀y ∈ X, we have ⟨x, y⟩ = ⟨⟨Φinv(y), x⟩⟩ = ⟨⟨φ(x), Φinv(y)⟩⟩. We will show that φ is surjective, which then implies that φ is an isometrical isomorphism between X and X∗∗ and that X is reflexive. ∀x∗∗ ∈ X∗∗, ⟨⟨x∗∗, x∗⟩⟩ = ⟨⟨x∗∗, Φinv(Φ(x∗))⟩⟩, ∀x∗ ∈ X∗. Define G : X → ℂ by G(y) := conj(⟨⟨x∗∗, Φinv(y)⟩⟩), ∀y ∈ X (the conjugation makes G linear, since Φinv is conjugate linear). By (iii), it is easy to show that G ∈ X∗. By (i), G(y) = ⟨y, Φ(G)⟩, ∀y ∈ X. Then, ⟨⟨x∗∗, x∗⟩⟩ = ⟨⟨x∗∗, Φinv(Φ(x∗))⟩⟩ = conj(G(Φ(x∗))) = conj(⟨Φ(x∗), Φ(G)⟩) = ⟨Φ(G), Φ(x∗)⟩ = ⟨⟨x∗, Φ(G)⟩⟩ = ⟨⟨φ(Φ(G)), x∗⟩⟩, ∀x∗ ∈ X∗. Thus, we have x∗∗ = φ(Φ(G)) ∈ R(φ). By the arbitrariness of x∗∗, φ is surjective.

(vi) We will show that ⟨·, ·⟩_{X∗} is an inner product on X∗. ∀f, g, h ∈ X∗, ∀λ ∈ ℂ. (a) ⟨g, f⟩_{X∗} = ⟨Φ(f), Φ(g)⟩ = conj(⟨Φ(g), Φ(f)⟩) = conj(⟨f, g⟩_{X∗}), where the second equality follows from (i) of Definition 13.1. (b) ⟨f + g, h⟩_{X∗} = ⟨Φ(h), Φ(f + g)⟩ = ⟨Φ(h), Φ(f) + Φ(g)⟩ = ⟨Φ(h), Φ(f)⟩ + ⟨Φ(h), Φ(g)⟩ = ⟨f, h⟩_{X∗} + ⟨g, h⟩_{X∗}, where the second equality follows from (iii); and the third equality follows from Definition 13.1. (c) ⟨λf, g⟩_{X∗} = ⟨Φ(g), Φ(λf)⟩ = ⟨Φ(g), λ̄Φ(f)⟩ = λ⟨Φ(g), Φ(f)⟩ = λ⟨f, g⟩_{X∗}, where the second equality follows from (iii); and the third equality follows from Definition 13.1. (d) ⟨f, f⟩_{X∗} = ⟨Φ(f), Φ(f)⟩ = ‖Φ(f)‖²_X = ‖f‖²_{X∗} ∈ [0, ∞) ⊂ ℝ, where the second equality follows from Definition 13.1; and the third equality follows from (iii). Clearly, ⟨f, f⟩_{X∗} = 0 if, and only if, ‖f‖_{X∗} = 0 if, and only if, f = ϑ_{X∗}. Hence, X∗ with the inner product ⟨·, ·⟩_{X∗} forms a pre-Hilbert space, whose induced norm equals ‖·‖_{X∗}. By Proposition 7.72, X∗ is complete and hence a Hilbert space. By Proposition 7.90 and (v), X∗ is reflexive.

(vii) Let K = ℂ. ∀x ∈ X, let f := Φ∗(x) = Φ∗(φ(x)). Then, f(x) = ⟨x, Φ(f)⟩ = (φ(x))(f) = ⟨f, Φ∗(φ(x))⟩_{X∗} = ⟨f, Φ∗(x)⟩_{X∗} = ⟨f, f⟩_{X∗} = ‖f‖²_{X∗}, where the first equality follows from (i); the second equality follows from Remark 7.88; the third equality follows from (i) as applied to X∗; and the fourth equality follows from (v). By (iii), ‖f‖_{X∗} = ‖x‖_X and ‖f‖_{X∗} = ‖Φ(f)‖_X. This implies that ⟨x, Φ(f)⟩ = ‖x‖_X ‖Φ(f)‖_X. By (iv) of Proposition 13.2, Φ(f) = ϑ_X or x = λΦ(f) for some λ ∈ ℂ. If Φ(f) = ϑ_X, then by (iii), f = ϑ_{X∗}, and thus x = ϑ_X = Φ(f), since f = Φ∗(x). If Φ(f) ≠ ϑ_X, we must have x = λΦ(f) for some λ ∈ ℂ. Then, ‖f‖²_{X∗} = ⟨x, Φ(f)⟩ = ⟨λΦ(f), Φ(f)⟩ = λ‖Φ(f)‖²_X = λ‖f‖²_{X∗}. Thus, λ = 1, and x = Φ(f). In both cases, we have x = Φ(f) = Φ(Φ∗(x)). By the arbitrariness of x, we have Φ ∘ Φ∗ = id_X. By (iii) and Proposition 2.4, we have Φ∗ = Φinv.

(viii) Let D ⊆ X be a countable dense set. Then, Φinv(D) ⊆ X∗ is a countable dense set by (iii). Hence, X∗ is separable. This completes the proof of the theorem. □
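A minimal coordinate sketch of the Riesz–Fréchet representation (the functional below is an assumed example, not from the text): on ℂ³ with ⟨x, y⟩ = Σᵢ xᵢȳᵢ, a bounded linear functional f(x) = a·x is represented by y0 = ā, and ‖f‖_{X∗} = ‖y0‖.

```python
import numpy as np

# Assumed coefficient vector defining the functional f(x) = sum_i a_i x_i.
a = np.array([1.0 + 2.0j, -1.0j, 0.5])
f = lambda x: a @ x

# Riesz representative: f(x) = <x, y0> with <x, y> = sum_i x_i * conj(y_i).
y0 = np.conj(a)
inner = lambda u, v: np.sum(u * np.conj(v))

x = np.array([0.3 - 1.0j, 2.0, 1.0 + 1.0j])
lhs, rhs = f(x), inner(x, y0)

# ||f|| = sup_{||x||<=1} |a.x| = ||a||_2, which equals ||y0||.
norm_f, norm_y0 = np.linalg.norm(a), np.linalg.norm(y0)
```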

Fig. 13.2 Relationship of a Hilbert space H and its dual H∗, and of H as a Banach space and its dual H∗ (the maps Φ and Φinv are conjugate linear)

Here, we summarize the relationship between a complex Hilbert space H and its dual H∗, and the Hilbert space H viewed as a complex Banach space together with its dual H∗, in the diagram of Fig. 13.2.

Definition 13.16 Let X be a Hilbert space over K and S ⊆ X. By Definition 7.95, S⊥ ⊆ X∗ is the set S⊥ = {x∗ ∈ X∗ | ⟨⟨x∗, x⟩⟩ = 0, ∀x ∈ S}. We will denote S^{⊥h} := {y ∈ X | ⟨x, y⟩ = ⟨⟨y∗, x⟩⟩ = 0, ∀x ∈ S}, which is said to be the H-orthogonal complement of S. Note that S^{⊥h} = Φ_X(S⊥), S⊥ = Φ_{Xinv}(S^{⊥h}) = (Φ_{Xinv}(S))^{⊥h}, and ⊥T = Φ_X(T^{⊥h}) = (Φ_X(T))^{⊥h}, ∀T ⊆ X∗.

Proposition 13.17 Let X be a Hilbert space over K and S, T ⊆ X. Then,

(i) S^{⊥h} ⊆ X is a closed subspace.
(ii) If S ⊆ T, then T^{⊥h} ⊆ S^{⊥h}.
(iii) S^{⊥h⊥h⊥h} = S^{⊥h}.
(iv) S^{⊥h⊥h} = \overline{span}(S), the closed span of S.

Proof (i) and (ii) follow immediately from Proposition 7.98. (iv) By Riesz–Fréchet Theorem 13.15, X is reflexive. Then, X = X∗∗. Then, by Proposition 7.98, we have \overline{span}(S) = ⊥(S⊥) = (Φ_X(S⊥))^{⊥h} = S^{⊥h⊥h}. (iii) By (iv), we have S ⊆ \overline{span}(S) = S^{⊥h⊥h}. By (ii), we have S^{⊥h} ⊇ S^{⊥h⊥h⊥h}. Again by (iv), we have S^{⊥h} ⊆ \overline{span}(S^{⊥h}) = S^{⊥h⊥h⊥h}. Therefore, S^{⊥h} = S^{⊥h⊥h⊥h}. This completes the proof of the proposition. □

Definition 13.18 Let X be a vector space over the field F and M, N ⊆ X be subspaces. We say that X is the direct sum of M and N if, ∀x ∈ X, ∃! m ∈ M and ∃! n ∈ N such that x = m + n. In this case, we write X = M ⊕ N.

Theorem 13.19 Let M be a closed subspace of a Hilbert space X. Then, X = M ⊕ M^{⊥h}.

Proof ∀x ∈ X, by Projection Theorem 13.13, ∃! m0 ∈ M such that n0 := (x − m0) ⊥ M. Hence, x = m0 + n0 with m0 ∈ M and n0 ∈ M^{⊥h}. The pair (m0, n0) is unique since m0 is unique. Hence, X = M ⊕ M^{⊥h}. This completes the proof of the theorem. □
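Theorem 13.19 can be checked in coordinates: with M the column span of an (assumed example) matrix Y, the orthogonal projector splits every x uniquely as m0 + n0 with m0 ∈ M and n0 ⊥ M.

```python
import numpy as np

# Assumed basis of M = range(Y) in R^4.
Y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
x = np.array([2.0, -1.0, 0.5, 3.0])

# Orthogonal projector onto M: P = Y (Y^T Y)^{-1} Y^T.
P = Y @ np.linalg.solve(Y.T @ Y, Y.T)
m0 = P @ x          # component in M
n0 = x - m0         # component in the orthogonal complement M^{⊥h}
```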


Proposition 13.20 Let X be a real Hilbert space and f : X × X → ℝ be given by f(x, y) = ⟨x, y⟩, ∀(x, y) ∈ X × X. Then, f is analytic with arbitrarily large analytic radius, f⁽¹⁾(x, y)(h_x, h_y) = ⟨h_x, y⟩ + ⟨x, h_y⟩, f⁽²⁾(x, y)(h_{x2}, h_{y2})(h_{x1}, h_{y1}) = ⟨h_{x2}, h_{y1}⟩ + ⟨h_{x1}, h_{y2}⟩, and f⁽ⁱ⁺²⁾(x, y) = ϑ_{BSⁱ⁺²(X×X,ℝ)}, ∀(x, y) ∈ X × X, ∀i ∈ ℕ, ∀(h_x, h_y) ∈ X × X, ∀(h_{x1}, h_{y1}) ∈ X × X, ∀(h_{x2}, h_{y2}) ∈ X × X.

Proof This follows immediately from Propositions 9.41 and 9.27 and Riesz–Fréchet Theorem 13.15. □

13.4 Hermitian Adjoints

Definition 13.21 Let X and Y be Hilbert spaces over K, A ∈ B(X, Y), and ΦX : X∗ → X and ΦY : Y∗ → Y as defined in Riesz–Fréchet Theorem 13.15. Define the Hermitian adjoint of A by A∗ := ΦX ∘ A′ ∘ ΦYinv ∈ B(Y, X), where A′ is the adjoint of Definition 7.107. When Y = X, A is said to be Hermitian if A∗ = A.

Proposition 13.22 Let X, Y, and Z be Hilbert spaces over K, A, B ∈ B(X, Y), C ∈ B(Y, Z), x ∈ X, y ∈ Y, and λ ∈ K. Then,

(i) id∗_X = id_X.
(ii) (A + B)∗ = A∗ + B∗.
(iii) (λA)∗ = λ̄A∗.
(iv) (CA)∗ = A∗C∗.
(v) If A is bijective, then (A∗)⁻¹ = (A⁻¹)∗ =: A⁻∗.
(vi) ⟨Ax, y⟩_Y = ⟨x, A∗y⟩_X.
(vii) ⟨y, Ax⟩_Y = ⟨A∗y, x⟩_X.
(viii) (A∗)∗ = A.
(ix) ‖A∗‖_{B(Y,X)} = ‖A‖_{B(X,Y)}.
(x) (R(A))^{⊥h} = N(A∗) and (R(A∗))^{⊥h} = N(A).
(xi) If R(A) ⊆ Y is closed, then R(A) = (N(A∗))^{⊥h} and R(A∗) = (N(A))^{⊥h}.
(xii) Let (X, B) be a measurable space and μ be a B(Y, Z)-valued measure on (X, B). Define μ∗ to be a mapping from B to B(Z, Y) by: μ∗(E) = (μ(E))∗, if E ∈ dom(μ); and μ∗(E) is undefined, if E ∈ B \ dom(μ). Then, μ∗ is a B(Z, Y)-valued measure on (X, B) with P ∘ μ∗ = P ∘ μ.
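Before turning to the proof, note that in the coordinate model Kⁿ the Hermitian adjoint is simply the conjugate transpose. A quick numerical check of (vi) and (ix) (random matrices, an assumed illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A_star = A.conj().T                      # Hermitian adjoint in coordinates

# <u, v> = sum_i u_i * conj(v_i), linear in the first slot as in the text.
inner = lambda u, v: np.sum(u * np.conj(v))

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)
lhs = inner(A @ x, y)                    # <Ax, y>
rhs = inner(x, A_star @ y)               # <x, A*y>
```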

Proof Let ΦX : X∗ → X, ΦY : Y∗ → Y, and ΦZ : Z∗ → Z be as defined in Riesz–Fréchet Theorem 13.15. (i) id∗_X = ΦX ∘ id′_X ∘ ΦXinv = ΦX ∘ id_{X∗} ∘ ΦXinv = id_X, where the first equality follows from Definition 13.21; and the second equality follows from Proposition 7.110. (ii) (A + B)∗ = ΦX ∘ (A + B)′ ∘ ΦYinv = ΦX ∘ (A′ + B′) ∘ ΦYinv = ΦX ∘ A′ ∘ ΦYinv + ΦX ∘ B′ ∘ ΦYinv = A∗ + B∗, where the first equality follows from


Definition 13.21; the second equality follows from Proposition 7.110; and the third equality follows from Riesz–Fréchet Theorem 13.15. (iii) (λA)∗ = ΦX ∘ (λA)′ ∘ ΦYinv = ΦX ∘ (λA′) ∘ ΦYinv = λ̄ ΦX ∘ A′ ∘ ΦYinv = λ̄A∗, where the first equality follows from Definition 13.21; the second equality follows from Proposition 7.110; and the third equality follows from the conjugate linearity of ΦX in Riesz–Fréchet Theorem 13.15. (iv) (CA)∗ = ΦX ∘ (CA)′ ∘ ΦZinv = ΦX ∘ (A′C′) ∘ ΦZinv = ΦX ∘ A′ ∘ C′ ∘ ΦZinv = ΦX ∘ A′ ∘ ΦYinv ∘ ΦY ∘ C′ ∘ ΦZinv = A∗ ∘ C∗ = A∗C∗, where the first equality follows from Definition 13.21; and the second equality follows from Proposition 7.110. (v) By Open Mapping Theorem 7.103, A⁻¹ ∈ B(Y, X). Then, we have (A⁻¹)∗ = ΦY ∘ (A⁻¹)′ ∘ ΦXinv = ΦY ∘ (A′)⁻¹ ∘ ΦXinv = (ΦX ∘ A′ ∘ ΦYinv)⁻¹ = (A∗)⁻¹, where the first equality follows from Definition 13.21; and the second equality follows from Proposition 7.110. (vi) ⟨Ax, y⟩_Y = ⟨⟨y∗, Ax⟩⟩ = ⟨⟨A′y∗, x⟩⟩ = ⟨x, ΦX(A′y∗)⟩_X = ⟨x, ΦX(A′(ΦYinv(y)))⟩_X = ⟨x, A∗y⟩_X, where the first equality follows from Riesz–Fréchet Theorem 13.15; the second equality follows from Proposition 7.109; the third and fourth equalities follow from Riesz–Fréchet Theorem 13.15; and the last equality follows from Definition 13.21. (vii) ⟨y, Ax⟩_Y = conj(⟨Ax, y⟩_Y) = conj(⟨x, A∗y⟩_X) = ⟨A∗y, x⟩_X, where the first equality follows from Definition 13.1; the second equality follows from (vi); and the third equality follows from Definition 13.1. (viii) ⟨y, Ax⟩_Y = ⟨A∗y, x⟩_X = ⟨y, (A∗)∗x⟩_Y, where the first equality follows from (vii); and the second equality follows from (vi) as applied to A∗. By the arbitrariness of x and y, we have A = (A∗)∗. (ix) ‖A∗‖_{B(Y,X)} = sup_{y∈Y, ‖y‖_Y≤1} ‖A∗y‖_X = sup_{y∈Y, ‖y‖_Y≤1} ‖ΦX ∘ A′ ∘ ΦYinv(y)‖_X = ‖A′‖_{B(Y∗,X∗)} = ‖A‖_{B(X,Y)}, where the first equality follows from Proposition 7.63; the second equality follows from Definition 13.21; the third equality follows from the fact that the Φ's are bijective and norm preserving; and the fourth equality follows from Proposition 7.109.
(x) By Proposition 7.112 and Definition 13.16, we have (R(A))^{⊥h} = ΦY(N(A′)) = N(A∗). Then, (R(A∗))^{⊥h} = N((A∗)∗) = N(A), where the first equality follows from the previous sentence as applied to A∗; and the second equality follows from (viii). (xi) By Propositions 7.112 and 7.114 and Definition 13.16, we have R(A) = ⊥(N(A′)) = (ΦY(N(A′)))^{⊥h} = (N(A∗))^{⊥h}. Furthermore, R(A∗) = (N((A∗)∗))^{⊥h} = (N(A))^{⊥h}, where the first equality follows from the previous sentence as applied to A∗; and the second equality follows from (viii). (xii) We will prove the result using Definition 11.108. Let ν := P ∘ μ. Clearly, ν is a measure on (X, B). (a) ν(∅) = 0 and μ(∅) = ϑ_{B(Y,Z)}. Then, μ∗(∅) = (μ(∅))∗ = ϑ∗_{B(Y,Z)} = ϑ_{B(Z,Y)}, where the last equality follows from (ii). (b) ∀E ∈ B with ν(E) = ∞, μ(E) is undefined and E ∈ B \ dom(μ). Then, μ∗(E) is undefined.


(c) ∀E ∈ B with ν(E) < ∞, we have μ(E) ∈ B(Y, Z) and E ∈ dom(μ). Then, μ∗(E) = (μ(E))∗ ∈ B(Z, Y). ∀ pairwise disjoint (Ei)_{i=1}^∞ ⊆ B with E = ∪_{i=1}^∞ Ei, we have Σ_{i=1}^∞ ‖μ(Ei)‖_{B(Y,Z)} < ∞ and μ(E) = Σ_{i=1}^∞ μ(Ei). Then, Σ_{i=1}^∞ ‖μ∗(Ei)‖_{B(Z,Y)} = Σ_{i=1}^∞ ‖(μ(Ei))∗‖_{B(Z,Y)} = Σ_{i=1}^∞ ‖μ(Ei)‖_{B(Y,Z)} < ∞, where the second equality follows from (ix). Furthermore, Σ_{i=1}^∞ μ∗(Ei) = Σ_{i=1}^∞ (μ(Ei))∗ = (Σ_{i=1}^∞ μ(Ei))∗ = (μ(E))∗ = μ∗(E), where the second equality follows from Propositions 3.66, 7.109, and 7.110, Riesz–Fréchet Theorem 13.15, and Definition 13.21. (d) ∀E ∈ B with ν(E) < ∞, we have

ν(E) = sup_{n∈ℤ₊, (Ei)_{i=1}^n ⊆ B, E = ∪_{i=1}^n Ei, Ei∩Ej = ∅, ∀1≤i<j≤n} Σ_{i=1}^n ‖μ(Ei)‖_{B(Y,Z)} = sup_{n∈ℤ₊, (Ei)_{i=1}^n ⊆ B, E = ∪_{i=1}^n Ei, Ei∩Ej = ∅, ∀1≤i<j≤n} Σ_{i=1}^n ‖μ∗(Ei)‖_{B(Z,Y)} = (P ∘ μ∗)(E),

where the second equality follows from (ix). Hence, μ∗ is a B(Z, Y)-valued measure on (X, B) with P ∘ μ∗ = P ∘ μ.

follow from the preceding discussion. By the arbitrariness of ε, we have [F|_I] = ϑ_Z. Then, F|_I = 0 a.e. in I. Since F is absolutely continuous, it is continuous by Proposition 12.59, and then F(t) = 0, ∀t ∈ [−π, π] (otherwise, we could find a nonempty interval, a subset of I, on which F is nonzero). Then, by Fundamental Theorem of Calculus I, Theorem 12.86, z(t) = 0 for a.e. t ∈ I. Hence, [z] = ϑ_Z.


We have shown that (([en])_{n=0}^∞)^{⊥h} = {ϑ_Z}. Then, by Proposition 13.33, \overline{span}(([en])_{n=0}^∞) = Z and ([en])_{n=0}^∞ is a complete orthonormal sequence. Then, [f] = Σ_{n=0}^∞ an[en] in Z.

Given independent vectors y1, ..., yn generating a subspace M of a Hilbert space X, we wish to find the m0 ∈ M that minimizes ‖x − m0‖, for some given x ∈ X. Rather than solving the normal equation (13.1), we may employ the Gram–Schmidt orthogonalization procedure together with Proposition 13.31. First use the Gram–Schmidt procedure to obtain an orthonormal sequence (ei)_{i=1}^n; then m0 = Σ_{i=1}^n ⟨x, ei⟩ei. Thus, the optimization problem is easily solved once the vectors yi are orthogonalized. Since the solution of the approximation problem is equivalent to the solution of the normal equation, it is clear that the Gram–Schmidt procedure can be interpreted as a procedure for inverting the Gram matrix. It is interesting to note that the Gram–Schmidt procedure may itself be viewed as an approximation problem:

e1 = y1/‖y1‖;   ek = (yk − Σ_{i=1}^{k−1} ⟨yk, ei⟩ei) / ‖yk − Σ_{i=1}^{k−1} ⟨yk, ei⟩ei‖, ∀k > 1.

The vector yk − Σ_{i=1}^{k−1} ⟨yk, ei⟩ei is the optimal error for the problem min_{m∈span({y1,...,y_{k−1}})} ‖yk − m‖. Thus, the Gram–Schmidt procedure consists of solving a series of minimum norm approximation problems by use of the Projection Theorem. Alternatively, the minimum norm approximation of x can be found by applying the Gram–Schmidt procedure to the sequence (y1, ..., yn, x). The optimal error x − m0 is found at the last step.
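The procedure above can be sketched numerically (the vectors here are assumed examples): gram_schmidt orthonormalizes, and the best approximation is then m0 = Σᵢ ⟨x, eᵢ⟩eᵢ as in Proposition 13.31.

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize a list of independent vectors."""
    es = []
    for yk in vectors:
        # subtract the projection onto span of the previously built e_i
        r = yk - sum(np.vdot(e, yk) * e for e in es)
        es.append(r / np.linalg.norm(r))
    return es

y1 = np.array([1.0, 1.0, 0.0])
y2 = np.array([1.0, 0.0, 1.0])
e1, e2 = gram_schmidt([y1, y2])

x = np.array([1.0, 2.0, 3.0])
m0 = np.vdot(e1, x) * e1 + np.vdot(e2, x) * e2   # best approximation in M
```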

13.6 Other Minimum Norm Problems

In approximation problems, finite dimensionality of the subspace allows the reduction of the problem to a finite-dimensional normal equation, which leads to a feasible computation procedure. In many important and interesting practical problems, the subspace is not finite-dimensional, and it is generally not possible to reduce the problem to a finite-dimensional normal equation. However, there is an important class of such problems that can be reduced by the projection theorem to finite-dimensional equations. The motivation for this class of problems is illustrated in Fig. 13.3. In this class of problems, the subspace M^{⊥h} is finite-dimensional, which leads to feasible computation procedures. Next, we state a modified version of the Projection Theorem 13.13.

Theorem 13.37 (Restatement of Projection Theorem) Let X be a Hilbert space, M ⊆ X be a closed subspace, x, y ∈ X, and V = y + M be


Fig. 13.3 Dual projection problems

a closed linear variety. Then, there is a unique vector v0 ∈ V such that ‖x − v0‖ = min_{v∈V} ‖x − v‖. A necessary and sufficient condition for v0 ∈ V to be the unique minimizing vector is that (x − v0) ⊥ M.

Proof This is straightforward and is therefore omitted. □

A point of caution is necessary here: (x − v0) ⊥ M but not (x − v0) ⊥ V. The concept of the above theorem is illustrated in Fig. 13.4. A special type of linear variety is V = {v ∈ X | ⟨v, yi⟩ = ci, i = 1, ..., n}, where y1, ..., yn ∈ X are fixed vectors and c1, ..., cn ∈ K are fixed scalars. Let M = span({y1, ..., yn}). When c1 = ··· = cn = 0, we have V = M^{⊥h}. For arbitrary ci's, V equals y + M^{⊥h} for some y ∈ X, provided V ≠ ∅. A linear variety of this form is said to be of finite co-dimension, since M^{⊥h⊥h} = M is finite-dimensional by Proposition 13.17.

Theorem 13.38 Let X be a Hilbert space over K, n ∈ ℕ, y1, ..., yn ∈ X, c1, ..., cn ∈ K, V = {v ∈ X | ⟨v, yi⟩ = ci, i = 1, ..., n} ≠ ∅, and x ∈ X. Then, the equation


Fig. 13.4 Reformulation of the Projection Theorem

Gram(y1, ..., yn) (β1, ..., βn)ᵀ = (⟨x, y1⟩ − c1, ..., ⟨x, yn⟩ − cn)ᵀ    (13.2)

admits at least one solution (β1, ..., βn) ∈ Kⁿ. For any such solution, v0 = x − Σ_{i=1}^n βi yi ∈ V is the unique minimizing vector for min_{v∈V} ‖x − v‖.

Proof Let M := span({y1, ..., yn}). By Theorem 7.36 and Proposition 4.39, M is a closed subspace of X. Then, M^{⊥h} = {x̄ ∈ X | ⟨x̄, yi⟩ = 0, i = 1, ..., n}. Fix any y0 ∈ V ≠ ∅; then V = y0 + M^{⊥h}. By Theorem 13.37, ∃! v0 ∈ V such that ‖x − v0‖ = min_{v∈V} ‖x − v‖, and a necessary and sufficient condition for v ∈ V to be v0 is that (x − v0) ⊥ M^{⊥h}. Then, (x − v0) ∈ M^{⊥h⊥h} = M, by Proposition 13.17. Then, ∃(β1, ..., βn) ∈ Kⁿ such that x − v0 = Σ_{i=1}^n βi yi, that is, v0 = x − Σ_{i=1}^n βi yi. Since v0 ∈ V, we must have ⟨v0, yi⟩ = ci, i = 1, ..., n. This leads to (13.2). Conversely, for any solution (β1, ..., βn) ∈ Kⁿ to (13.2), we have v0 := x − Σ_{i=1}^n βi yi ∈ V and (x − v0) ∈ M = M^{⊥h⊥h}. By Theorem 13.37, v0 is the unique minimizing vector for min_{v∈V} ‖x − v‖. This completes the proof of the theorem. □

Much of the above discussion can be generalized from linear varieties to convex sets.

Theorem 13.39 Let X be a real Hilbert space, x ∈ X, and K ⊆ X be a nonempty closed convex subset. Then, there is a unique vector k0 ∈ K such that ‖x − k0‖ = min_{k∈K} ‖x − k‖. Furthermore, a necessary and sufficient condition for k0 ∈ K to be the unique minimizing vector is that ⟨x − k0, k − k0⟩ ≤ 0, ∀k ∈ K.
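The recipe of Theorem 13.38 is easy to carry out numerically. In this sketch (vectors and constraint values are assumed examples), we form the Gram system (13.2), solve for β, and verify that v0 lies in V.

```python
import numpy as np

# Assumed data: constraints <v, y_i> = c_i in R^3, and the point x to project.
ys = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
c = np.array([1.0, 2.0])
x = np.array([0.0, 0.0, 0.0])

# Gram system (13.2): Gram(y1,...,yn) beta = (<x,y_i> - c_i).
G = np.array([[np.dot(yi, yj) for yj in ys] for yi in ys])
rhs = np.array([np.dot(x, yi) - ci for yi, ci in zip(ys, c)])
beta = np.linalg.solve(G, rhs)

# Minimizing vector of the variety V.
v0 = x - sum(b * yi for b, yi in zip(beta, ys))
```

Any competing element of V differs from v0 by a vector orthogonal to every yi, so by the Pythagorean identity it can only be farther from x.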


Fig. 13.5 Projection to a convex set

The main idea of this theorem is illustrated in Fig. 13.5, which shows that the angle between x − k0 and k − k0 is greater than or equal to 90°.

Proof First, we show the existence of k0. Let δ := inf_{k∈K} ‖x − k‖ ∈ [0, ∞) ⊂ ℝ, since K ≠ ∅. Then, ∃(ki)_{i=1}^∞ ⊆ K such that lim_{i∈ℕ} ‖x − ki‖ = δ. ∀ε ∈ (0, ∞) ⊂ ℝ, ∃n0 ∈ ℕ such that, ∀i ∈ ℕ with n0 ≤ i, we have δ ≤ ‖x − ki‖ < √(δ² + ε²/4). ∀i, j ∈ ℕ with n0 ≤ i ≤ j, by Parallelogram Law 13.3, ‖(x − ki) − (x − kj)‖² + ‖(x − ki) + (x − kj)‖² = 2‖x − ki‖² + 2‖x − kj‖². This implies that ‖ki − kj‖² = 2‖x − ki‖² + 2‖x − kj‖² − 4‖x − (ki + kj)/2‖² < 4(δ² + ε²/4) − 4δ² = ε², where the inequality follows from the fact that K is convex, so (ki + kj)/2 ∈ K and ‖x − (ki + kj)/2‖ ≥ δ. Hence, (ki)_{i=1}^∞ ⊆ K is a Cauchy sequence. By Proposition 4.39, K is complete and lim_{i∈ℕ} ki = k0 ∈ K. By Propositions 3.66 and 7.21, δ = lim_{i∈ℕ} ‖x − ki‖ = ‖x − k0‖.

Next, we show the uniqueness of k0. Let k̄ ∈ K be such that δ = ‖x − k̄‖. The sequence (ki)_{i=1}^∞ := (k0, k̄, k0, k̄, ...) ⊆ K satisfies lim_{i∈ℕ} ‖x − ki‖ = δ. By the proof of the existence, this sequence is convergent. Then, we must have k̄ = k0. Hence, k0 is unique.

Next, we show that ⟨x − k0, k − k0⟩ ≤ 0, ∀k ∈ K, by an argument of contradiction. Suppose this is not true. Then, ∃k1 ∈ K such that ⟨x − k0, k1 − k0⟩ =: λ > 0. Consider the vector kα := (1 − α)k0 + αk1, where α ∈ (0, 1) ⊂ ℝ is to be determined. Since K is convex, kα ∈ K. Note that ‖x − kα‖² = ‖(x − k0) − α(k1 − k0)‖² = ‖x − k0‖² − 2α⟨x − k0, k1 − k0⟩ + α²‖k1 − k0‖² =


δ² − 2αλ + α²‖k1 − k0‖². Then, for sufficiently small α ∈ (0, 1) ⊂ ℝ, we have ‖x − kα‖² < δ², which contradicts the definition of δ. Hence, ⟨x − k0, k − k0⟩ ≤ 0, ∀k ∈ K.

Finally, we show that if k̄ ∈ K satisfies ⟨x − k̄, k − k̄⟩ ≤ 0, ∀k ∈ K, then ‖x − k̄‖ = min_{k∈K} ‖x − k‖. ∀k ∈ K, we have ‖x − k‖² = ‖(x − k̄) − (k − k̄)‖² = ‖x − k̄‖² − 2⟨x − k̄, k − k̄⟩ + ‖k − k̄‖² ≥ ‖x − k̄‖². Hence, ‖x − k̄‖ ≤ ‖x − k‖, ∀k ∈ K. This completes the proof of the theorem. □
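For a concrete closed convex set, the projection of Theorem 13.39 is often available in closed form; for a coordinate box it is componentwise clipping. The sketch below (an assumed example, not from the text) checks the variational inequality ⟨x − k0, k − k0⟩ ≤ 0 against random points of K.

```python
import numpy as np

# K = [0,1]^3, a nonempty closed convex set; projection = clipping.
x = np.array([1.7, -0.4, 0.5])
k0 = np.clip(x, 0.0, 1.0)        # unique nearest point of K

rng = np.random.default_rng(1)
ks = rng.uniform(0.0, 1.0, size=(100, 3))   # random elements of K
vals = (ks - k0) @ (x - k0)                  # <x - k0, k - k0> for each k
```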

13.7 Positive Definite Operators on Hilbert Spaces

Let X be a real Hilbert space. In Chap. 10, we defined the symmetric operators S_X to be BS²(X, ℝ) ⊆ B(X, X∗). By Riesz–Fréchet Theorem 13.15, Φ_X(X∗) = X, where Φ_X : X∗ → X is the isometrical isomorphism. Then, S̄_X := Φ_X(S_X) ⊆ B(X, X). Fix any A = Φ_X ∘ Â ∈ S̄_X, where Â ∈ S_X. ∀x, y ∈ X, we have ⟨y, Ax⟩ = Â(x)(y) = Â(y)(x) = ⟨x, Ay⟩ = ⟨A∗x, y⟩ = ⟨y, A∗x⟩, where the first equality follows since Â(x) is a linear functional on X and by Riesz–Fréchet Theorem 13.15; the second equality follows since Â ∈ BS²(X, ℝ); the third equality follows from the same argument as the first equality; the fourth equality follows from Proposition 13.22; and the last equality follows from Definition 13.1 and the fact that X is a real Hilbert space. By the arbitrariness of x and y, we have A∗ = A, and A is Hermitian. By Definition 13.21, A = A∗ = Φ_X ∘ A′ ∘ Φ_{Xinv} and Â = A′ ∘ Φ_{Xinv}, since X is a real Hilbert space.

Here, we generalize the definitions of S_X, S_{+X}, S_{psd X}, S_{−X}, and S_{nsd X} to the case where X is a Hilbert space over K.

Definition 13.40 Let X be a Hilbert space over K and A ∈ B(X, X). We will write A ∈ S̄_X if A is Hermitian. We will write A ∈ S̄_{+X} if A ∈ S̄_X and ∃m ∈ (0, ∞) ⊂ ℝ such that ⟨x, Ax⟩ ≥ m‖x‖², ∀x ∈ X. We will write A ∈ S̄_{psd X} if A ∈ S̄_X and ∃m ∈ [0, ∞) ⊂ ℝ such that ⟨x, Ax⟩ ≥ m‖x‖², ∀x ∈ X. We will write A ∈ S̄_{−X} if A ∈ S̄_X and ∃m ∈ (0, ∞) ⊂ ℝ such that ⟨x, Ax⟩ ≤ −m‖x‖², ∀x ∈ X. We will write A ∈ S̄_{nsd X} if A ∈ S̄_X and ∃m ∈ [0, ∞) ⊂ ℝ such that ⟨x, Ax⟩ ≤ −m‖x‖², ∀x ∈ X.

Proposition 13.41 Let X be a Hilbert space and A ∈ B(X, X) be Hermitian (that is, A ∈ S̄_X). Assume that ∃δ ∈ (0, ∞) ⊂ ℝ such that ⟨x, Ax⟩ ≥ δ‖x‖², ∀x ∈ X. Then, A is bijective and A⁻¹ ∈ B(X, X). Furthermore, A⁻¹ is Hermitian and, ∀x ∈ X, we have ⟨x, A⁻¹x⟩ ≥ δ/(ε + ‖A‖²_{B(X,X)}) · ‖x‖², ∀ε ∈ (0, ∞) ⊂ ℝ.

Proof By the assumption, ∀x ∈ X with x ≠ ϑ_X, A(x) ≠ ϑ_X since ⟨x, Ax⟩ ≥ δ‖x‖² > 0. Hence, N(A) = {ϑ_X} and A is injective. By Proposition 13.22, (R(A))^{⊥h} = N(A∗) = N(A) = {ϑ_X}, where the second equality follows from the fact that A is Hermitian. This shows that ((R(A))^{⊥h})^{⊥h} = X. By Proposition 13.17, \overline{R(A)} = X. Hence, R(A) is dense in X. Clearly, ϑ_X ∈ R(A). ∀x ∈ X with x ≠ ϑ_X,


there exists (xi)_{i=1}^∞ ⊆ R(A) such that x = lim_{i∈ℕ} xi. Without loss of generality, we may assume that xi ≠ ϑ_X, ∀i ∈ ℕ. Let xi = A(x̄i), ∀i ∈ ℕ, where x̄i ∈ X. Clearly, x̄i ≠ ϑ_X since xi ≠ ϑ_X, ∀i ∈ ℕ. Then, we have

‖x̄i‖‖xi‖ ≥ |⟨x̄i, xi⟩| = |⟨x̄i, Ax̄i⟩| ≥ δ‖x̄i‖², ∀i ∈ ℕ,

where the first inequality follows from the Cauchy–Schwarz Inequality. This implies that ‖xi‖ ≥ δ‖x̄i‖ and ‖x̄i‖ ≤ ‖xi‖/δ, ∀i ∈ ℕ. Since (xi)_{i=1}^∞ is convergent, there exists c ∈ (0, ∞) ⊂ ℝ such that ‖xi‖ ≤ c, ∀i ∈ ℕ. Then, ‖x̄i‖ ≤ c/δ =: c1 ∈ (0, ∞) ⊂ ℝ, ∀i ∈ ℕ. This shows that (x̄i)_{i=1}^∞ ⊆ B̄_X(ϑ_X, c1) =: S1 ⊆ X = X∗∗, where the last equality follows from Riesz–Fréchet Theorem 13.15. By Alaoglu Theorem 7.122, S1 ⊆ X∗∗ is weak∗ compact. By Propositions 5.22 and 5.26, (x̄i)_{i=1}^∞ has a cluster point x̄ ∈ X∗∗ = X in the weak∗ topology. Since X is reflexive, the weak∗ topology on X∗∗ = X is identical to the weak topology on X. By Proposition 7.123, A ∈ B(X, X) is continuous as a mapping A : X_weak → X_weak. By Proposition 3.66, (xi)_{i=1}^∞ = (A(x̄i))_{i=1}^∞ has the cluster point A(x̄) ∈ X_weak in the weak topology. By Proposition 7.116, X_weak is completely regular, and hence Hausdorff by Proposition 3.61. By Proposition 3.65, A(x̄) = x, since x = lim_{i∈ℕ} xi in norm and hence also weakly. This shows that x ∈ R(A). By the arbitrariness of x, we have R(A) = X and A is surjective. Hence, A is bijective.

By Open Mapping Theorem 7.103, A⁻¹ ∈ B(X, X). Then, A⁻¹ is Hermitian by Proposition 13.22 and the fact that A is Hermitian. ∀x ∈ X, let x̄ = A⁻¹(x). We have

⟨x, A⁻¹x⟩ = ⟨Ax̄, x̄⟩ = ⟨x̄, Ax̄⟩ ≥ δ‖x̄‖².

Then, x = Ax̄ and ‖x‖ ≤ ‖A‖_{B(X,X)}‖x̄‖. We will distinguish two exhaustive and mutually exclusive cases: Case 1: ‖A‖_{B(X,X)} = 0; Case 2: ‖A‖_{B(X,X)} > 0. Case 1: ‖A‖_{B(X,X)} = 0. Then, ‖x‖ = 0, ∀x ∈ X, and hence X = {ϑ_X}. Then, ⟨x, A⁻¹x⟩ = 0 ≥ δ/(ε + ‖A‖²_{B(X,X)}) · ‖x‖² = 0, ∀ε ∈ (0, ∞) ⊂ ℝ. Case 2: ‖A‖_{B(X,X)} > 0. Then, we have

⟨x, A⁻¹x⟩ ≥ δ‖x̄‖² ≥ (δ/‖A‖²_{B(X,X)}) ‖x‖² ≥ δ/(ε + ‖A‖²_{B(X,X)}) · ‖x‖², ∀ε ∈ (0, ∞) ⊂ ℝ.

Hence, we have shown, in both cases, that ⟨x, A⁻¹x⟩ ≥ δ/(ε + ‖A‖²_{B(X,X)}) · ‖x‖², ∀x ∈ X and ∀ε ∈ (0, ∞) ⊂ ℝ. This completes the proof of the proposition. □
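A numerical illustration of the bound in Proposition 13.41 (the matrix is an assumed example): for a symmetric positive definite A, the best coercivity constant is δ = λ_min(A), and the inverse obeys ⟨x, A⁻¹x⟩ ≥ δ/(ε + ‖A‖²)·‖x‖².

```python
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # symmetric, positive definite
delta = np.linalg.eigvalsh(A).min()        # coercivity constant
opnorm = np.linalg.norm(A, 2)              # ||A|| (spectral norm)
Ainv = np.linalg.inv(A)

eps = 1e-9                                 # any eps > 0 works in the bound
rng = np.random.default_rng(2)
xs = rng.standard_normal((200, 2))
quad = np.einsum('ij,jk,ik->i', xs, Ainv, xs)          # <x, A^{-1} x>
bound = delta / (eps + opnorm**2) * np.einsum('ij,ij->i', xs, xs)
```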


Proposition 13.42 Let X be a Hilbert space over K and A ∈ S̄_X. Then, the following statements hold.

(i) ∀x ∈ X, ⟨x, Ax⟩ ∈ ℝ.
(ii) S̄_X is a closed subset of B(X, X). ∀A1, A2 ∈ S̄_X, ∀λ ∈ ℝ, we have A1 + A2 ∈ S̄_X and λA1 ∈ S̄_X. If K = ℝ, S̄_X is a closed subspace of B(X, X), and hence a Banach subspace. If K = ℂ, no further conclusion can be made about S̄_X except that it is a closed convex cone in B(X, X).
(iii) S̄_{+X} and S̄_{−X} are open subsets of S̄_X (in the subspace topology of S̄_X ⊆ B(X, X)); and S̄_{+X} = −S̄_{−X}.
(iv) S̄_{psd X} and S̄_{nsd X} are closed convex cones in S̄_X (in the subspace topology of S̄_X ⊆ B(X, X)); and S̄_{psd X} = −S̄_{nsd X}.
(v) A ∈ S̄_{+X} if, and only if, A⁻¹ ∈ S̄_{+X}.
(vi) A ∈ S̄_{−X} if, and only if, A⁻¹ ∈ S̄_{−X}.
(vii) The interior of S̄_{psd X} relative to S̄_X is S̄_{+X}, and the interior of S̄_{nsd X} relative to S̄_X is S̄_{−X}.
(viii) A ∈ S̄_{+X}, B, C ∈ S̄_{psd X}, α ∈ (0, ∞) ⊂ ℝ, and β ∈ [0, ∞) ⊂ ℝ imply that A + B ∈ S̄_{+X}, B + C ∈ S̄_{psd X}, αA ∈ S̄_{+X}, and βB ∈ S̄_{psd X}.
(ix) Let Y be a Hilbert space over K and B ∈ B(X, Y). Then, BB∗ ∈ S̄_{psd Y} and B∗B ∈ S̄_{psd X}.

Proof (i) ∀x ∈ X, ⟨x, Ax⟩ = ⟨A∗x, x⟩ = ⟨Ax, x⟩ = conj(⟨x, Ax⟩), where the first equality follows from Proposition 13.22; the second equality follows from A ∈ S̄_X; and the last equality follows from Definition 13.1. Hence, ⟨x, Ax⟩ ∈ ℝ.

(ii) ∀A ∈ \overline{S̄_X}, by Proposition 4.13, there exists (Ai)_{i=1}^∞ ⊆ S̄_X such that lim_{i∈ℕ} Ai = A in B(X, X). Then, ∀x, y ∈ X, we have ⟨Ax, y⟩ = lim_{i∈ℕ} ⟨Ai x, y⟩ = lim_{i∈ℕ} ⟨x, A∗i y⟩ = lim_{i∈ℕ} ⟨x, Ai y⟩ = ⟨x, Ay⟩, and also ⟨Ax, y⟩ = ⟨x, A∗y⟩, where the first equality follows from Propositions 7.65, 13.4, 3.12, and 3.66; the second equality follows from Proposition 13.22; the third equality follows from Ai ∈ S̄_X, so that each Ai is Hermitian; the fourth equality follows from Propositions 7.65, 13.4, 3.12, and 3.66; and the last equality follows from Proposition 13.22. By the arbitrariness of x and y, we have A = A∗ and therefore A ∈ S̄_X. Thus, we have shown that S̄_X ⊆ \overline{S̄_X} ⊆ S̄_X.


(iii) It is clear that S¯+ X = −S¯− X . We will show that S¯+ X is open in S¯X . Fix 2 any B ∈ S¯+ X . Then, ∃m ∈ (0,  ∞) ⊂ R such that x, Bx ≥ mx , ∀x ∈ X. 1 ∀C ∈ S¯X ∩ BB(X,X) B, 2 m , we have x, Cx ∈ R, ∀x ∈ X, by (i). Then, x, Cx ≥ x, Bx − |x, Bx − x, Cx| ≥ mx2 − |x, (B − C)x| ≥ mx2 − x(B − C)x ≥ mx2 − xB − CB(X,X) x ≥ mx2 − 12 mx2 = 1 2 2 mx , ∀x ∈ X, where the second inequality follows from the previous discussion and Definition 13.1; the third inequality follows from Cauchy–Schwarz Inequality; the fourth inequality follows from Proposition 7.64; and the fifth inequality follows from the fact that B − CB(X,X) < 12 m. By the arbitrariness of x, C ∈ S¯+ X . By   the arbitrariness of C, we have BB(X,X) B, 12 m ∩ S¯X ⊆ S¯+ X . Hence, S¯+ X is an open subset of S¯X . By the relation S¯+ X = −S¯− X , we have S¯− X is also an open subset of S¯X . (iv) It is clear that S¯psd X = −S¯nsd X . We will show that S¯psd X is a closed subset ¯ of S¯X . ∀B ∈ S¯psd X , by Proposition 4.13, there exists (Bi )∞ i=1 ⊆ Spsd X such that ¯ limi∈N Bi = B in B(X, X). By (ii), B ∈ SX . ∀x ∈ X, x, Bx = limi∈N x, Bi x ≥ 0, where the equality follows from Propositions 7.65, 13.4, 3.12, and 3.66; and the inequality follows from Bi ∈ S¯psd X . By the arbitrariness of x, B ∈ S¯psd X . Thus, we have shown that S¯psd X ⊆ S¯psd X ⊆ S¯psd X . By Proposition 3.3, S¯psd X is a closed subset of S¯X . ∀A1 , A2 ∈ S¯psd X , ∀α ∈ [0, 1] ⊂ R. Let B := αA1 +(1−α)A2. By (ii), B ∈ S¯X . ∀x ∈ X, x, Bx = x, αA1 x + (1 − α)A2 x = x, αA1 x + x, (1 − α)A2 x = αx, A1 x + (1 − α)x, A2 x = αx, A1 x + (1 − α)x, A2 x ≥ 0 = 0x2 , where the second and third equalities follow from Definition 13.1; and the inequality follows from A1 , A2 ∈ S¯psd X . By the arbitrariness of x, we have B ∈ S¯psd X . Hence, S¯psd X is convex. Obviously, ϑB(X,X) ∈ S¯psd X . ∀A1 ∈ S¯psd X , ∀λ ∈ [0, ∞) ⊂ R, we have λA1 ∈ ¯ SX by (ii). 
∀x ∈ X, x, λA1 x = λx, A1 x ≥ 0, where the first equality follows from Definition 13.1 and the fact λ ∈ R; and the inequality follows from λ ≥ 0 and A1 ∈ S¯psd X . By the arbitrariness of x, we have λA1 ∈ S¯psd X . Thus, S¯psd X is a cone with vertex at origin in S¯X ⊆ B(X, X). Hence, S¯psd X is a closed convex cone in S¯X . By the relation S¯psd X = −S¯nsd X , we have S¯nsd X is a closed convex cone in S¯X . (v) This follows directly from Proposition 13.41 and Definition 13.40. (vi) This follows immediately from (v) and (iii). (vii) Clearly, S¯+ X is an open subset of S¯X and is contained in S¯psd X , which is a closed subset of S¯X . Let P be the interior of S¯psd X relative S¯X . Then, P ⊇ S¯+ X . We will show that P ⊆ S¯+ X . This will imply that P = S¯+ X . Suppose P ⊆ S¯+ X . Then, there exists B ∈ P \ S¯+ X . B ∈ P implies that ∃δ ∈ (0, ∞) ⊂ R such that BB(X,X) (B, δ) ∩ S¯X ⊆ P ⊆ S¯psd X . Since B ∈ / S¯+ X , then ∃x0 ∈ X such that δ 2 x0 , Bx0  < 4 x0  . Clearly, x0 = ϑX . Consider the operator Bˆ := B − 2δ idX . Clearly, Bˆ ∈ BB(X,X) (B, δ). By Proposition 13.22, Bˆ ∗ = B ∗ − 2δ id∗X = B − δ ¯ ¯ ˆ ˆ ˆ 2 idX = B. Hence, B ∈ SX . By our earlier discussion, we must have B ∈ Spsd X .


Yet, ⟨x0, B̂x0⟩ = ⟨x0, Bx0⟩ − (δ/2)⟨x0, id_X x0⟩ < (δ/4)‖x0‖² − (δ/2)‖x0‖² = −(δ/4)‖x0‖² < 0, where the first equality follows from Definition 13.1; the first inequality follows from our earlier discussion; and the last inequality follows from the fact that x0 ≠ ϑ_X. This contradicts the fact that B̂ ∈ S̄_{psd X}. Hence, the hypothesis does not hold, and P ⊆ S̄_{+X}. This proves that P = S̄_{+X}. The statement that the interior of S̄_{nsd X} relative to S̄_X is S̄_{−X} can be proved by a similar argument.

(viii) This follows immediately from Definition 13.40.

(ix) Clearly, B∗ ∈ B(Y, X). Then, BB∗ ∈ B(Y, Y). By Proposition 13.22, we have (BB∗)∗ = (B∗)∗B∗ = BB∗. Hence, BB∗ ∈ S̄_Y. ∀y ∈ Y, ⟨y, BB∗y⟩_Y = ⟨B∗y, B∗y⟩_X = ‖B∗y‖²_X ≥ 0, where the first equality follows from Proposition 13.22; and the second equality and the inequality follow from Proposition 13.2. Hence, BB∗ ∈ S̄_{psd Y}. By similar arguments, we have B∗B ∈ S̄_{psd X}. This completes the proof of the proposition. □

13.8 Pseudoinverse Operator

Proposition 13.43 Let $\mathbb{X}$ be a Hilbert space and $M \subseteq \mathbb{X}$ be a closed subspace. Then, we may define a mapping $P : \mathbb{X} \to M$ by $P(x) = x_0$, $\forall x \in \mathbb{X}$, where $x_0$ is the unique solution to $\|x - x_0\| = \min_{m \in M} \|x - m\|$, by the Classical Projection Theorem 13.13. $P$ is said to be the projection operator of $\mathbb{X}$ onto $M$. Then, we have the following properties for the projection operator.

(i) $P \in B(\mathbb{X}, \mathbb{X})$.
(ii) $\|P\|_{B(\mathbb{X},\mathbb{X})} \leq 1$.
(iii) $P^2 = P$ (idempotent).
(iv) $P^* = P$ (Hermitian).
(v) Every $A \in B(\mathbb{X}, \mathbb{X})$ that satisfies (iii) and (iv) is a projection operator.
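In $\mathbb{R}^n$ the projection onto the subspace spanned by the columns of a full-column-rank matrix $V$ has the closed form $P = V(V^\top V)^{-1}V^\top$. The sketch below is an illustrative finite-dimensional check of properties (ii), (iii), and (iv) (the dimensions and seed are arbitrary choices), not part of the proof.

```python
import numpy as np

# Orthogonal projection onto M = range(V) in R^5, with V of full column rank.
rng = np.random.default_rng(1)
V = rng.standard_normal((5, 2))
P = V @ np.linalg.inv(V.T @ V) @ V.T

assert np.allclose(P @ P, P)                  # (iii) idempotent
assert np.allclose(P, P.T)                    # (iv) Hermitian
assert np.linalg.norm(P, 2) <= 1 + 1e-12      # (ii) operator norm at most 1

# Px is the closest point of M to x: the residual x - Px is orthogonal to M
x = rng.standard_normal(5)
assert np.allclose(V.T @ (x - P @ x), 0)
```

The last assertion is exactly the geometric content of the Classical Projection Theorem: the minimizing $x_0 = Px$ is characterized by $x - x_0 \in M^{\perp_h}$.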

Proof (i) By Theorem 13.19, $\mathbb{X} = M \oplus M^{\perp_h}$. Then, $\forall x \in \mathbb{X}$, there exists a unique pair $x_1 \in M$ and $x_2 \in M^{\perp_h}$ such that $x_1 + x_2 = x$. Then, $x_1 = Px$. We will show that $P$ is a linear operator. $\forall x, y \in \mathbb{X}$, $\forall \lambda \in \mathbb{K}$, we have $x = x_1 + x_2$ and $y = y_1 + y_2$ with $x_1, y_1 \in M$ and $x_2, y_2 \in M^{\perp_h}$. Then, $x + y = (x_1 + y_1) + (x_2 + y_2)$ with $x_1 + y_1 \in M$ and $x_2 + y_2 \in M^{\perp_h}$. Hence, $P(x + y) = x_1 + y_1 = P(x) + P(y)$. $\lambda x = \lambda x_1 + \lambda x_2$ with $\lambda x_1 \in M$ and $\lambda x_2 \in M^{\perp_h}$. Then, $P(\lambda x) = \lambda x_1 = \lambda P(x)$. Hence, $P$ is linear. Next, we show that $P$ is bounded. $\|P\|_{B(\mathbb{X},\mathbb{X})} = \sup_{x \in \mathbb{X},\, \|x\| \leq 1} \|Px\|$. $\|Px\|^2 = \|x_1\|^2 = \langle x_1, x_1\rangle = \langle x - x_2, x - x_2\rangle = \langle x, x\rangle - \langle x, x_2\rangle - \langle x_2, x\rangle + \langle x_2, x_2\rangle = \|x\|^2 - \langle x_1 + x_2, x_2\rangle - \langle x_2, x_1 + x_2\rangle + \langle x_2, x_2\rangle = \|x\|^2 - \langle x_2, x_2\rangle = \|x\|^2 - \|x_2\|^2 \leq \|x\|^2$, where the first equality follows from the definition of $P$; the second equality follows from Proposition 13.2; the third through fifth equalities follow from basic


algebra; the sixth equality follows from the fact $x_1 \in M$ and $x_2 \in M^{\perp_h}$. Hence, $\|P\|_{B(\mathbb{X},\mathbb{X})} \leq 1$ and $P$ is bounded. Then, $P \in B(\mathbb{X}, \mathbb{X})$.

(ii) This follows from the preceding paragraph.

(iii) This is obvious.

(iv) $\forall x, y \in \mathbb{X}$, $\langle P^* x, y\rangle = \langle x, Py\rangle = \langle x_1 + x_2, y_1\rangle = \langle x_1, y_1\rangle = \langle x_1, y_1 + y_2\rangle = \langle Px, y\rangle$, where the first equality follows from Proposition 13.22; and the remaining equalities follow from $x = x_1 + x_2$, $y = y_1 + y_2$, $x_1, y_1 \in M$, and $x_2, y_2 \in M^{\perp_h}$. By the arbitrariness of $x$ and $y$, we have $P^* = P$.

(v) Let $A \in B(\mathbb{X}, \mathbb{X})$ satisfy (iii) and (iv). Let $M := R(A)$. Then, $M^{\perp_h} = N(A^*) = N(A)$, by Proposition 13.22 and (iv). By Theorem 13.19, $\mathbb{X} = M^{\perp_h} \oplus (M^{\perp_h})^{\perp_h}$. By Proposition 13.17, $(M^{\perp_h})^{\perp_h} = \overline{R(A)}$. Let $P : \mathbb{X} \to \overline{R(A)}$ be the projection operator. We will show that $P = A$. $\forall x, y \in \mathbb{X}$, we have

$\langle Ax, y - Ay\rangle = \langle Ax, y\rangle - \langle Ax, Ay\rangle = \langle Ax, y\rangle - \langle A^* A x, y\rangle = \langle Ax, y\rangle - \langle A^2 x, y\rangle = \langle Ax, y\rangle - \langle Ax, y\rangle = 0$

where the second equality follows from Proposition 13.22; the third equality follows from (iv); and the fourth equality follows from (iii). Thus, $Ax \perp y - Ay$. By the arbitrariness of $x$, we have $y - Ay \in M^{\perp_h}$. Note that $Ay \in R(A)$. Then, we have $Ay = Py$. By the arbitrariness of $y$, we have $A = P$. This completes the proof of the proposition. □

Proposition 13.44 Let $\mathbb{X}$ and $\mathbb{Y}$ be Hilbert spaces over $\mathbb{K}$, $A \in B(\mathbb{X}, \mathbb{Y})$, and $y \in \mathbb{Y}$. Assume that $R(A) \subseteq \mathbb{Y}$ is closed. Then, there exists a unique vector $x_0 \in \mathbb{X}$ such that $x_0 = \arg\min_{x_1} \|x_1\|_{\mathbb{X}}$, where $x_1$ satisfies $\|A x_1 - y\|_{\mathbb{Y}} = \min_{x \in \mathbb{X}} \|A x - y\|_{\mathbb{Y}}$. We will denote $x_0 =: A^\dagger y$, where $A^\dagger : \mathbb{Y} \to \mathbb{X}$ is called the pseudoinverse of $A$. Furthermore, we have the following properties for $A^\dagger$.

(i) $A^\dagger \in B(\mathbb{Y}, \mathbb{X})$.
(ii) $(A^\dagger)^\dagger = A$.
(iii) $(A^*)^\dagger = (A^\dagger)^*$.
(iv) $A^\dagger A A^\dagger = A^\dagger$.
(v) $A A^\dagger A = A$.
(vi) $(A^\dagger A)^* = A^\dagger A$.
(vii) $(A A^\dagger)^* = A A^\dagger$.
(viii) $A^\dagger = (A^* A)^\dagger A^*$.
(ix) $A^\dagger = A^* (A A^*)^\dagger$.
(x) $A^\dagger = \lim_{\epsilon \to 0^+} A^* (\epsilon\,\mathrm{id}_{\mathbb{Y}} + A A^*)^{-1} = \lim_{\epsilon \to 0^+} (\epsilon\,\mathrm{id}_{\mathbb{X}} + A^* A)^{-1} A^*$.
(xi) $A$ can be uniquely expressed as $A = P_1 A_r P_2$, where $P_1 : \mathbb{Y} \to R(A)$ and $P_2 : \mathbb{X} \to (N(A))^{\perp_h}$ are projection operators, and $A_r \in B\big((N(A))^{\perp_h}, R(A)\big)$ is bijective with $A_r^{-1} \in B\big(R(A), (N(A))^{\perp_h}\big)$; and $A^\dagger = P_2 A_r^{-1} P_1$.
(xii) $A A^\dagger = P_1$ and $A^\dagger A = P_2$.
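In finite dimensions every operator has closed range, and `numpy.linalg.pinv` computes $A^\dagger$ via the singular value decomposition. The sketch below is an illustrative check (shapes and seed are arbitrary choices), not part of the proof: it verifies properties (iv) through (vii), the minimum-norm least-squares characterization of $A^\dagger y$, and the regularized limit in (x).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
A[3] = A[0] + A[1]                 # make A rank deficient, so A† is not a one-sided inverse
Ad = np.linalg.pinv(A)             # the pseudoinverse A†

# Properties (iv)–(vii), i.e., the Moore–Penrose identities
assert np.allclose(Ad @ A @ Ad, Ad)       # (iv)  A† A A† = A†
assert np.allclose(A @ Ad @ A, A)         # (v)   A A† A = A
assert np.allclose((Ad @ A).T, Ad @ A)    # (vi)  (A† A)* = A† A
assert np.allclose((A @ Ad).T, A @ Ad)    # (vii) (A A†)* = A A†

# x0 = A† y is the minimum-norm least-squares solution
y = rng.standard_normal(4)
x0 = Ad @ y
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)   # lstsq also returns the min-norm solution
assert np.allclose(x0, x_ls)

# Property (x): A† = lim_{eps -> 0+} A* (eps id + A A*)^{-1}
eps = 1e-8
assert np.allclose(A.T @ np.linalg.inv(eps * np.eye(4) + A @ A.T), Ad, atol=1e-6)
```

The regularized formula in (x) is the numerically robust route when $AA^*$ itself is singular, as it is here: the inverse exists for every $\epsilon > 0$ even though $(AA^*)^{-1}$ does not.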


Proof Since $R(A)$ is closed, by the Classical Projection Theorem 13.13, $\exists!\, y_0 \in R(A)$ such that $\|y - y_0\|_{\mathbb{Y}} = \min_{\bar y \in R(A)} \|y - \bar y\|_{\mathbb{Y}}$; furthermore, $y_0 \in R(A)$ is the unique vector such that $y - y_0 \in (R(A))^{\perp_h}$. Then, the set of $x_1$'s defined in the proposition statement is given by $N(A) + x_1$, where $x_1$ is any vector such that $A x_1 = y_0$. Then, by the Classical Projection Theorem 13.13 and Proposition 13.22, $\exists!\, x_0 \in (N(A))^{\perp_h} = \overline{R(A^*)}$ such that $\|x_0\|_{\mathbb{X}} = \min_{x \in N(A)} \|x_1 - x\|_{\mathbb{X}}$. This proves the existence and uniqueness of $x_0 \in \mathbb{X}$ for any given $y \in \mathbb{Y}$. Thus, $A^\dagger$ is well defined.

(i) By Theorem 13.19, $\mathbb{X} = N(A) \oplus (N(A))^{\perp_h} =: M \oplus M^{\perp_h}$. Define the projection mapping $P_2 : \mathbb{X} \to M^{\perp_h}$. $\forall x \in \mathbb{X}$, there is a unique pair $x_1 \in M$ and $x_2 \in M^{\perp_h}$ such that $x = x_1 + x_2 = x_1 + P_2 x$. Then, $A x = A P_2 x$. This allows us to define a mapping $A_r : M^{\perp_h} \to N := R(A)$ by $A_r(x_2) = A(x_2)$, $\forall x_2 \in M^{\perp_h}$. Clearly, $A_r \in B(M^{\perp_h}, N)$ and $A_r$ is surjective. It is easy to show that $N(A_r) = \{\vartheta_{\mathbb{X}}\}$ since the domain of $A_r$ is $M^{\perp_h}$. Then, $A_r$ is bijective. Note that $N$ is a closed subspace of $\mathbb{Y}$ by assumption. Then, by Proposition 4.39, $N$ is a Banach subspace of $\mathbb{Y}$. By Proposition 13.17, $M^{\perp_h}$ is a closed subspace of $\mathbb{X}$. Then, by Proposition 4.39, $M^{\perp_h}$ is a Banach subspace of $\mathbb{X}$. Then, $A_r^{-1} \in B(N, M^{\perp_h})$ by the Open Mapping Theorem 7.103, and $A = A_r P_2$. By Theorem 13.19, $\mathbb{Y} = N \oplus N^{\perp_h}$. Define $P_1 : \mathbb{Y} \to N$ to be the projection mapping. Clearly, $P_1 \in B(\mathbb{Y}, N)$. $\forall y \in \mathbb{Y}$, $y_0 = P_1 y$. Let $x_1 := A_r^{-1} y_0 \in M^{\perp_h} \subseteq \mathbb{X}$. Then, $A x_1 = A_r P_2 A_r^{-1} y_0 = A_r A_r^{-1} y_0 = y_0$. Then, $x_0 = P_2 x_1 = x_1 = A_r^{-1} P_1 y$. Hence, $A^\dagger = A_r^{-1} P_1 = P_2 A_r^{-1} P_1 \in B(\mathbb{Y}, \mathbb{X})$.

(xi) This follows immediately from the discussion in the preceding paragraph.

(ii) For $A \in B(\mathbb{X}, \mathbb{Y})$, $A_r \in B(M^{\perp_h}, N)$, $A = A_r P_2 = P_1 A_r P_2$, and $A^\dagger = A_r^{-1} P_1 = P_2 A_r^{-1} P_1$. Then, $(A^\dagger)^\dagger = P_1 A_r P_2 = A$.
(iii) $(A^\dagger)^* = (P_2 A_r^{-1} P_1)^* = P_1^* (A_r^{-1})^* P_2^* = P_1 A_r^{-*} P_2$, where the first equality follows from (xi); the second equality follows from Proposition 13.22; and the last equality follows from Proposition 13.43. Now, $(N(A^*))^{\perp_h} = \overline{R(A)} = N$ and $\overline{R(A^*)} = (N(A))^{\perp_h} = M^{\perp_h}$, by Proposition 13.22. Therefore, $(A^*)^\dagger = (P_2^* A_r^* P_1^*)^\dagger = (P_2 A_r^* P_1)^\dagger = P_1 (A_r^*)^{-1} P_2 = P_1 A_r^{-*} P_2$, where the first equality follows from Proposition 13.22; the second equality follows from Proposition 13.43; the third equality follows from (xi); and the last equality follows from Proposition 13.22. Hence, $(A^\dagger)^* = (A^*)^\dagger$.

(xii) By (xi), we have $A^\dagger A = P_2 A_r^{-1} P_1 P_1 A_r P_2 = P_2 A_r^{-1} A_r P_2 = P_2$, where the second and third equalities follow from the fact that $P_1$ and $P_2$ are projection operators. Similarly, $A A^\dagger = P_1 A_r P_2 P_2 A_r^{-1} P_1 = P_1 A_r A_r^{-1} P_1 = P_1$, where the second and third equalities follow from the fact that $P_1$ and $P_2$ are projection operators.

(iv) By (xi) and (xii), we have $A^\dagger A A^\dagger = P_2 A_r^{-1} P_1 P_1 = P_2 A_r^{-1} P_1 = A^\dagger$, where the second equality follows from Proposition 13.43.

(v) By (xi) and (xii), we have $A A^\dagger A = P_1 P_1 A_r P_2 = P_1 A_r P_2 = A$, where the second equality follows from Proposition 13.43.

(vi) By (xi) and (xii), we have $(A^\dagger A)^* = P_2^* = P_2 = A^\dagger A$, where the second equality follows from Proposition 13.43.


(vii) By (xi) and (xii), we have $(A A^\dagger)^* = P_1^* = P_1 = A A^\dagger$, where the second equality follows from Proposition 13.43.

(viii) By (xi), we have $A^* A = (P_1 A_r P_2)^* P_1 A_r P_2 = P_2^* A_r^* P_1^* P_1 A_r P_2 = P_2 A_r^* A_r P_2$, where the second equality follows from Proposition 13.22; and the third equality follows from Proposition 13.43 and the fact that $P_1$ is a projection operator. Then, we have $(A^* A)^\dagger A^* = P_2 (A_r^* A_r)^{-1} P_2 (P_1 A_r P_2)^* = P_2 A_r^{-1} A_r^{-*} P_2 P_2 A_r^* P_1 = P_2 A_r^{-1} A_r^{-*} A_r^* P_1 = P_2 A_r^{-1} P_1 = A^\dagger$, where the first equality follows from (xi) applied to the operator $A^* A$; the second equality follows from Propositions 13.22 and 13.43; the third equality follows from the fact that $P_2$ is a projection operator; and the last equality follows from (xi).

(ix) By (xi), we have $A A^* = P_1 A_r P_2 (P_1 A_r P_2)^* = P_1 A_r P_2 P_2^* A_r^* P_1^* = P_1 A_r A_r^* P_1$, where the second equality follows from Proposition 13.22; and the third equality follows from Proposition 13.43 and the fact that $P_2$ is a projection operator. Then, we have $A^* (A A^*)^\dagger = (P_1 A_r P_2)^* P_1 (A_r A_r^*)^{-1} P_1 = P_2 A_r^* P_1 P_1 A_r^{-*} A_r^{-1} P_1 = P_2 A_r^* A_r^{-*} A_r^{-1} P_1 = P_2 A_r^{-1} P_1 = A^\dagger$, where the first equality follows from (xi) applied to the operator $A A^*$; the second equality follows from Propositions 13.22 and 13.43; the third equality follows from the fact that $P_1$ is a projection operator; and the last equality follows from (xi).

(x) Note that $\epsilon\,\mathrm{id}_{\mathbb{X}} + A^* A = \epsilon (P_2 + \bar P_2) + P_2 A_r^* A_r P_2 = \bar P_2 (\epsilon\,\mathrm{id}_M) \bar P_2 + P_2 (\epsilon\,\mathrm{id}_{M^{\perp_h}} + A_r^* A_r) P_2$, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, where $\bar P_2 : \mathbb{X} \to M$ is the projection operator; the first equality follows from Theorem 13.19 and the proof of (viii); and the second equality follows from Proposition 13.43.
Then, $(\epsilon\,\mathrm{id}_{\mathbb{X}} + A^* A)^{-1} = \bar P_2 \big(\tfrac{1}{\epsilon}\,\mathrm{id}_M\big) \bar P_2 + P_2 (\epsilon\,\mathrm{id}_{M^{\perp_h}} + A_r^* A_r)^{-1} P_2$, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$, where the equality follows from the fact $P_2 \bar P_2 = \vartheta_{B(\mathbb{X},\mathbb{X})}$ and Proposition 13.43; and the invertibility of $\epsilon\,\mathrm{id}_{M^{\perp_h}} + A_r^* A_r$ follows from Proposition 13.41. This yields $(\epsilon\,\mathrm{id}_{\mathbb{X}} + A^* A)^{-1} A^* = \big(\bar P_2 \tfrac{1}{\epsilon}\,\mathrm{id}_M \bar P_2 + P_2 (\epsilon\,\mathrm{id}_{M^{\perp_h}} + A_r^* A_r)^{-1} P_2\big) P_2 A_r^* P_1 = P_2 (\epsilon\,\mathrm{id}_{M^{\perp_h}} + A_r^* A_r)^{-1} A_r^* P_1$, $\forall \epsilon \in (0, \infty) \subset \mathbb{R}$. Since $A_r^* A_r$ is invertible, by Proposition 9.55, we have $\lim_{\epsilon \to 0^+} (\epsilon\,\mathrm{id}_{M^{\perp_h}} + A_r^* A_r)^{-1} = (A_r^* A_r)^{-1} = A_r^{-1} A_r^{-*}$. Then, $\lim_{\epsilon \to 0^+} (\epsilon\,\mathrm{id}_{\mathbb{X}} + A^* A)^{-1} A^* = \lim_{\epsilon \to 0^+} P_2 (\epsilon\,\mathrm{id}_{M^{\perp_h}} + A_r^* A_r)^{-1} A_r^* P_1 = P_2 A_r^{-1} A_r^{-*} A_r^* P_1 = P_2 A_r^{-1} P_1 = A^\dagger$. The other equality in (x) can be proved by symmetry. This completes the proof of the proposition. □

Proposition 13.45 Let $\mathbb{X}$ and $\mathbb{Y}$ be Hilbert spaces over $\mathbb{K}$, and $B \in B(\mathbb{X}, \mathbb{Y})$. If $B$ is surjective, then $B B^* \in \bar S_{+\mathbb{Y}}$. If $B$ is injective and $R(B^*) \subseteq \mathbb{X}$ is closed, then $B^* B \in \bar S_{+\mathbb{X}}$.

Proof If $B$ is surjective, then $R(B) = \mathbb{Y}$ and it is a closed set in $\mathbb{Y}$. By Proposition 13.44, we have $B = P_1 B_r P_2$, where $P_1 : \mathbb{Y} \to R(B)$ and $P_2 : \mathbb{X} \to (N(B))^{\perp_h}$ are projection operators, and $B_r \in B\big((N(B))^{\perp_h}, R(B)\big)$ is bijective. Clearly, $P_1 = \mathrm{id}_{\mathbb{Y}}$. Then, $B B^* = B_r B_r^*$ is bijective, and then $(B B^*)^{-1} \in B(\mathbb{Y}, \mathbb{Y})$ by the Open Mapping Theorem 7.103. Fix any $y \in \mathbb{Y}$ and let $\bar y := (B B^*)^{-1} y$. This yields $\langle y, (B B^*)^{-1} y\rangle_{\mathbb{Y}} = \langle B B^* \bar y, \bar y\rangle_{\mathbb{Y}} = \langle B^* \bar y, B^* \bar y\rangle_{\mathbb{X}} = \|B^* \bar y\|_{\mathbb{X}}^2$. Define


$\bar x := B^* \bar y = B^* (B B^*)^{-1} y = B^\dagger y \in \mathbb{X}$, where the second equality follows from Proposition 13.44. This leads to $B \bar x = B B^\dagger y = P_1 y = y$, where the second equality follows from (xii) of Proposition 13.44. This yields $\|y\|_{\mathbb{Y}} \leq \|B\|_{B(\mathbb{X},\mathbb{Y})} \|\bar x\|_{\mathbb{X}}$. This implies that $\langle y, (B B^*)^{-1} y\rangle_{\mathbb{Y}} = \|\bar x\|_{\mathbb{X}}^2 \geq \frac{\|B\|_{B(\mathbb{X},\mathbb{Y})}^2}{\epsilon + \|B\|_{B(\mathbb{X},\mathbb{Y})}^2} \|\bar x\|_{\mathbb{X}}^2 \geq \frac{1}{\epsilon + \|B\|_{B(\mathbb{X},\mathbb{Y})}^2} \|y\|_{\mathbb{Y}}^2$, where the first equality follows from the preceding discussion; the first inequality holds for any $\epsilon \in (0, \infty) \subset \mathbb{R}$; and the last inequality follows from the preceding discussion. This yields that $(B B^*)^{-1} \in \bar S_{+\mathbb{Y}}$ (clearly, $(B B^*)^{-1} \in \bar S_{\mathbb{Y}}$). By (v) of Proposition 13.42, $B B^* \in \bar S_{+\mathbb{Y}}$.

If $B$ is injective and $R(B^*) \subseteq \mathbb{X}$ is closed, then $N(B) = \{\vartheta_{\mathbb{X}}\}$. By Proposition 13.22, we have $R(B^*) = (N(B))^{\perp_h} = \mathbb{X}$. Hence, $B^*$ is surjective. By the preceding paragraph, we have $B^* (B^*)^* = B^* B \in \bar S_{+\mathbb{X}}$, where the equality follows from Proposition 13.22. This completes the proof of the proposition. □
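A finite-dimensional illustration of the proposition (an independent numerical sketch over $\mathbb{R}$, with arbitrary shapes and seed): a full-row-rank matrix is surjective, so $BB^*$ is positive definite, and a full-column-rank matrix is injective with (automatically closed, in finite dimensions) range of its adjoint, so $C^*C$ is positive definite.

```python
import numpy as np

rng = np.random.default_rng(3)

B = rng.standard_normal((3, 5))   # full row rank (almost surely), hence surjective
# BB* is Hermitian positive definite: all eigenvalues strictly positive
assert np.min(np.linalg.eigvalsh(B @ B.T)) > 0

C = rng.standard_normal((5, 3))   # full column rank (almost surely), hence injective;
                                  # in finite dimensions R(C*) is automatically closed
assert np.min(np.linalg.eigvalsh(C.T @ C)) > 0
```

Contrast this with $B^*B \in \mathbb{R}^{5\times 5}$ above, which has rank 3 and is therefore only positive semidefinite: injectivity/surjectivity is exactly what upgrades semidefiniteness to definiteness.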

13.9 Spectral Theory of Linear Operators

Definition 13.46 Let $\mathbb{X}$ be a normed linear space over $\mathbb{K}$, $A \in B(\mathbb{X}, \mathbb{X})$, $\lambda \in \mathbb{K}$, and $x_0 \in \mathbb{X}$ with $x_0 \neq \vartheta_{\mathbb{X}}$. If $\lambda x_0 = A x_0$, or equivalently $(\lambda\,\mathrm{id}_{\mathbb{X}} - A) x_0 = \vartheta_{\mathbb{X}}$, we will say that $\lambda$ is an eigenvalue of $A$ and $x_0$ is an eigenvector of $A$ associated with the eigenvalue $\lambda$. If $(\lambda\,\mathrm{id}_{\mathbb{X}} - A)^{k-1} x_0 \neq \vartheta_{\mathbb{X}}$ and $(\lambda\,\mathrm{id}_{\mathbb{X}} - A)^k x_0 = \vartheta_{\mathbb{X}}$, for some $k \in \mathbb{N}$ with $k \geq 2$, we will say that $x_0$ is a generalized eigenvector of grade $k$ of $A$ associated with the eigenvalue $\lambda$. %

Proposition 13.47 Let $\mathbb{X}$ be a Hilbert space over $\mathbb{K}$, $Q \in B(\mathbb{X}, \mathbb{X})$ be Hermitian, and $A \in B(\mathbb{X}, \mathbb{X})$. Then, the following statements hold.

(i) If $\lambda \in \mathbb{K}$ is an eigenvalue of $Q$, then $\lambda \in \mathbb{R}$.
(ii) If $S_\gamma \subseteq \mathbb{X}$ is a linearly independent set of eigenvectors of $Q$ associated with the eigenvalue $\lambda_\gamma \in \mathbb{R}$, $\forall \gamma \in \Gamma$, and $(\lambda_\gamma)_{\gamma \in \Gamma}$ are pairwise distinct, then the set $S := \bigcup_{\gamma \in \Gamma} S_\gamma$ is a linearly independent set.
(iii) If $x_i$ is an eigenvector of $Q$ associated with the eigenvalue $\lambda_i \in \mathbb{K}$, $i = 1, 2$, and $\lambda_1 \neq \lambda_2$, then $\langle x_1, x_2\rangle = 0$ and $x_1 \perp x_2$.
(iv) $Q$ has no generalized eigenvectors of grade $k \geq 2$ associated with any eigenvalue of $Q$ (i.e., $Q$ has only eigenvectors but not generalized eigenvectors).
(v) If $R(Q) \subseteq \mathbb{X}$ is closed, then $0 \in \mathbb{K}$ is not an eigenvalue of $Q$ if, and only if, $Q$ is bijective and $Q^{-1} \in B(\mathbb{X}, \mathbb{X})$.
(vi) If $Q \in \bar S_{+\mathbb{X}}$, then $\exists m \in (0, \infty) \subset \mathbb{R}$ such that $\forall \lambda \in \mathbb{R}$ with $\lambda$ being an eigenvalue of $Q$, we have $\lambda \geq m$.
(vii) If $\lambda \in \mathbb{K}$ is an eigenvalue of $A$, then $|\lambda| \leq \|A\|_{B(\mathbb{X},\mathbb{X})} \|\mathrm{id}_{\mathbb{X}}\|_{B(\mathbb{X},\mathbb{X})} \leq \|A\|_{B(\mathbb{X},\mathbb{X})}$.
(viii) $Q \in \bar S_{\mathrm{psd}\,\mathbb{X}}$ implies that all of the eigenvalues of $Q$ are nonnegative.
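A finite-dimensional illustration of (i), (iii), and (vii) (an independent numerical sketch, not part of the proof; the dimension and seed are arbitrary choices): a complex Hermitian matrix has real eigenvalues and orthogonal eigenvectors for distinct eigenvalues, and every eigenvalue of a bounded operator is dominated in modulus by the operator norm.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q = M + M.conj().T                      # Hermitian: Q* = Q

lam, U = np.linalg.eig(Q)
assert np.allclose(lam.imag, 0)         # (i) eigenvalues of a Hermitian operator are real

# (iii) eigenvectors for distinct eigenvalues are orthogonal; a random Hermitian
# matrix has distinct eigenvalues almost surely, so U is (numerically) unitary
assert np.allclose(U.conj().T @ U, np.eye(4), atol=1e-8)

# (vii) |lambda| <= ||A|| for any bounded operator A (norm(. , 2) is the operator norm)
A = rng.standard_normal((4, 4))
assert np.max(np.abs(np.linalg.eigvals(A))) <= np.linalg.norm(A, 2) + 1e-12
```

For Hermitian input, `numpy.linalg.eigh` exploits the structure directly and returns the eigenvalues as a real array, which is the practical counterpart of statement (i).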


Proof (i) Let $x_0 \in \mathbb{X}$ be an eigenvector of $Q$ associated with the eigenvalue $\lambda$. Then, $\lambda x_0 = Q x_0$. This implies that $\bar\lambda \|x_0\|^2 = \bar\lambda \langle x_0, x_0\rangle = \langle x_0, \lambda x_0\rangle = \langle x_0, Q x_0\rangle = \langle Q^* x_0, x_0\rangle = \langle Q x_0, x_0\rangle = \langle \lambda x_0, x_0\rangle = \lambda \|x_0\|^2$, where the first equality follows from Proposition 13.2; the second equality follows from Definition 13.1; the third equality follows from the fact that $x_0$ is an eigenvector of $Q$ associated with the eigenvalue $\lambda$; the fourth equality follows from Proposition 13.22; the fifth equality follows from the assumption that $Q$ is Hermitian; the sixth equality follows from the fact that $x_0$ is an eigenvector of $Q$ associated with the eigenvalue $\lambda$; and the seventh equality follows from Definition 13.1 and Proposition 13.2. Note that $x_0 \neq \vartheta_{\mathbb{X}}$ since it is an eigenvector; then $\|x_0\|^2 > 0$. Thus, the above implies that $\lambda = \bar\lambda$. Hence, $\lambda \in \mathbb{R}$.

(ii) We will use mathematical induction on $n$ to show that there does not exist an $n \in \mathbb{Z}_+$ such that $\exists x_0 \in S$, $\exists x_1, \ldots, x_n \in S \setminus \{x_0\}$, which are distinct, and $\exists \alpha_1, \ldots, \alpha_n \in \mathbb{K} \setminus \{0\}$ such that $x_0 = \sum_{i=1}^n \alpha_i x_i$.

1° $n = 0$. Suppose that the result does not hold; then $\exists x_0 \in S$ such that $x_0 = \vartheta_{\mathbb{X}}$. Note that $x_0 \in S_{\gamma_0}$, for some $\gamma_0 \in \Gamma$. Then, $x_0$ is an eigenvector of $Q$ associated with the eigenvalue $\lambda_{\gamma_0}$. By Definition 13.46, $x_0 \neq \vartheta_{\mathbb{X}}$. This is a contradiction. This case is proved.

2° Assume the claim holds for $n \leq k \in \mathbb{Z}_+$.

3° Consider the case $n = k + 1 \in \mathbb{N}$. Suppose that the result does not hold. Then, $\exists x_0 \in S$ and $\exists x_1, \ldots, x_n \in S \setminus \{x_0\}$, which are distinct, and $\exists \alpha_1, \ldots, \alpha_n \in \mathbb{K} \setminus \{0\}$ such that $x_0 = \sum_{i=1}^n \alpha_i x_i$. We will distinguish two exhaustive and mutually exclusive cases: Case 1: $x_0, \ldots, x_n \in S_{\gamma_0}$ for some $\gamma_0 \in \Gamma$; Case 2: $x_0 \in S_{\gamma_0}$ with $\gamma_0 \in \Gamma$, and $\exists i_1 \in \{1, \ldots, n\}$ such that $x_{i_1} \in S_{\gamma_{i_1}}$ with $\gamma_{i_1} \in \Gamma$ and $\gamma_{i_1} \neq \gamma_0$. Case 1: $x_0, \ldots, x_n \in S_{\gamma_0}$ for some $\gamma_0 \in \Gamma$. Then, $x_0, \ldots, x_n$ are distinct vectors in $S_{\gamma_0}$. By assumption, $S_{\gamma_0}$ is a linearly independent set.
Then, $x_0 = \sum_{i=1}^n \alpha_i x_i$ contradicts the fact that $S_{\gamma_0}$ is a linearly independent set. Case 2: $x_0 \in S_{\gamma_0}$ with $\gamma_0 \in \Gamma$, and $\exists i_1 \in \{1, \ldots, n\}$ such that $x_{i_1} \in S_{\gamma_{i_1}}$ with $\gamma_{i_1} \in \Gamma$ and $\gamma_{i_1} \neq \gamma_0$. Without loss of generality, assume that $x_i \in S_{\gamma_i}$, $i = 1, \ldots, n$, where $\gamma_i \in \Gamma$. Then, $\vartheta_{\mathbb{X}} = (\lambda_{\gamma_0}\,\mathrm{id}_{\mathbb{X}} - Q) x_0 = \sum_{i=1}^n (\lambda_{\gamma_0}\,\mathrm{id}_{\mathbb{X}} - Q)(\alpha_i x_i) = \sum_{i=1}^n \alpha_i (\lambda_{\gamma_0} - \lambda_{\gamma_i}) x_i$. By the preceding discussion and (i), $0 \neq \lambda_{\gamma_0} - \lambda_{\gamma_{i_1}} \in \mathbb{R}$. Then, we have $\vartheta_{\mathbb{X}} = \sum_{i=1}^n \frac{(\lambda_{\gamma_0} - \lambda_{\gamma_i}) \alpha_i}{(\lambda_{\gamma_0} - \lambda_{\gamma_{i_1}}) \alpha_{i_1}} x_i$. Rearranging the equation, we have $x_{i_1} = \sum_{i=1,\, i \neq i_1}^n -\frac{(\lambda_{\gamma_0} - \lambda_{\gamma_i}) \alpha_i}{(\lambda_{\gamma_0} - \lambda_{\gamma_{i_1}}) \alpha_{i_1}} x_i$. This contradicts the inductive assumption.

Hence, the result holds in this case as well. This completes the inductive process. Hence, $S$ is a linearly independent set.

(iii) By (i), $\lambda_1, \lambda_2 \in \mathbb{R}$. We have $\lambda_1 \langle x_1, x_2\rangle = \langle \lambda_1 x_1, x_2\rangle = \langle Q x_1, x_2\rangle = \langle x_1, Q^* x_2\rangle = \langle x_1, Q x_2\rangle = \langle x_1, \lambda_2 x_2\rangle = \bar\lambda_2 \langle x_1, x_2\rangle = \lambda_2 \langle x_1, x_2\rangle$, where the first equality follows from Definition 13.1; the second equality follows from the fact that $x_1$ is an eigenvector of $Q$ associated with the eigenvalue $\lambda_1$; the third equality follows from Proposition 13.22; the fourth equality follows from the fact that $Q$ is Hermitian; the fifth equality follows from the fact that $x_2$ is an eigenvector of $Q$ associated with the eigenvalue $\lambda_2$; the sixth equality follows from Definition 13.1; and the last equality follows from the fact $\lambda_2 \in \mathbb{R}$. Since $\lambda_1 \neq \lambda_2$, then $\langle x_1, x_2\rangle = 0$ and $x_1 \perp x_2$.


(iv) Suppose the result does not hold. Let $x_0 \in \mathbb{X}$ be a generalized eigenvector of grade $k \geq 2$ of $Q$ associated with the eigenvalue $\lambda \in \mathbb{R}$. Then, $B^{k-1} x_0 \neq \vartheta_{\mathbb{X}}$ and $B^k x_0 = \vartheta_{\mathbb{X}}$, where $B := \lambda\,\mathrm{id}_{\mathbb{X}} - Q$. Let $x_{k-2} := B^{k-2} x_0$ and $x_{k-1} := B^{k-1} x_0$. Then, $B^2 x_{k-2} = \vartheta_{\mathbb{X}}$. This implies that $0 = \langle x_{k-2}, B^2 x_{k-2}\rangle = \langle B^* x_{k-2}, B x_{k-2}\rangle = \langle B x_{k-2}, B x_{k-2}\rangle = \langle x_{k-1}, x_{k-1}\rangle = \|x_{k-1}\|^2$, where the second equality follows from Proposition 13.22; the third equality follows from the assumption that $Q$ is Hermitian (so that $B^* = B$, since $\lambda \in \mathbb{R}$); and the fifth equality follows from Proposition 13.2. Then, $x_{k-1} = \vartheta_{\mathbb{X}}$. This contradicts $B^{k-1} x_0 \neq \vartheta_{\mathbb{X}}$. Hence, the result holds.

(v) "Necessity" Since $0$ is not an eigenvalue of $Q$, then $N(Q) = \{\vartheta_{\mathbb{X}}\}$. By Proposition 13.22, $R(Q) = (N(Q^*))^{\perp_h} = (N(Q))^{\perp_h} = \mathbb{X}$. Then, $Q$ is bijective and invertible. By the Open Mapping Theorem 7.103, $Q^{-1} \in B(\mathbb{X}, \mathbb{X})$. "Sufficiency" Since $Q$ is bijective, then $N(Q) = \{\vartheta_{\mathbb{X}}\}$. Then, $0 \in \mathbb{K}$ is not an eigenvalue of $Q$, since there is no eigenvector associated with $0$.

(vi) By Definition 13.40, $\exists m \in (0, \infty) \subset \mathbb{R}$ such that $\langle x, Q x\rangle \geq m \|x\|^2$, $\forall x \in \mathbb{X}$. Let $x_0 \in \mathbb{X}$ be an eigenvector of $Q$ associated with the eigenvalue $\lambda \in \mathbb{R}$. Then, we have $\lambda \|x_0\|^2 = \langle x_0, \lambda x_0\rangle = \langle x_0, Q x_0\rangle \geq m \|x_0\|^2$. Since $x_0$ is an eigenvector, then $x_0 \neq \vartheta_{\mathbb{X}}$ and $\|x_0\| > 0$. Then, we have $\lambda \geq m$.

(vii) Fix any $\lambda \in \mathbb{K}$ with $|\lambda| > \|A\|_{B(\mathbb{X},\mathbb{X})} \|\mathrm{id}_{\mathbb{X}}\|_{B(\mathbb{X},\mathbb{X})} \geq 0$. Then, $\big\|\mathrm{id}_{\mathbb{X}} - \big(\mathrm{id}_{\mathbb{X}} - \tfrac{1}{\lambda} A\big)\big\|_{B(\mathbb{X},\mathbb{X})} \leq \tfrac{1}{|\lambda|} \|A\|_{B(\mathbb{X},\mathbb{X})} \|\mathrm{id}_{\mathbb{X}}\|_{B(\mathbb{X},\mathbb{X})} < 1$. By

Proposition 9.55, $\mathrm{id}_{\mathbb{X}} - \frac{1}{\lambda} A$ is bijective and admits a continuous inverse. Then, $\lambda\,\mathrm{id}_{\mathbb{X}} - A$ is bijective and admits a continuous inverse since $\lambda \neq 0$. This implies that $N(\lambda\,\mathrm{id}_{\mathbb{X}} - A) = \{\vartheta_{\mathbb{X}}\}$. Hence, $\lambda$ is not an eigenvalue of $A$. Hence, the result holds, noting $\|\mathrm{id}_{\mathbb{X}}\|_{B(\mathbb{X},\mathbb{X})} \leq 1$.

(viii) Since $Q \in \bar S_{\mathrm{psd}\,\mathbb{X}}$, then $\langle x, Q x\rangle \geq 0$, $\forall x \in \mathbb{X}$. By (i), all eigenvalues of $Q$ are real. Suppose that the result does not hold. Then, $\exists \lambda \in (-\infty, 0) \subset \mathbb{R}$ and $\exists x_0 \in \mathbb{X}$ such that $x_0$ is an eigenvector of $Q$ associated with the eigenvalue $\lambda$. Then, $0 \leq \langle x_0, Q x_0\rangle = \langle x_0, \lambda x_0\rangle = \lambda \|x_0\|^2 < 0$, where the first equality follows from the hypothesis; the second equality follows from Definition 13.1 and Proposition 13.2; and the last inequality follows from $x_0 \neq \vartheta_{\mathbb{X}}$ since it is an eigenvector. This is a contradiction. Hence, all eigenvalues of $Q$ are nonnegative. This completes the proof of the proposition. □

Proposition 13.48 Let $\mathbb{X}$ be a Hilbert space over $\mathbb{K}$ and $A \in B(\mathbb{X}, \mathbb{X})$ be a Hermitian operator. Then, $\lambda := \sup_{x \in \mathbb{X},\, \|x\| \leq 1} |\langle Ax, x\rangle| = \|A\|_{B(\mathbb{X},\mathbb{X})}$.

Proof Note that $|\langle Ax, x\rangle| \leq \|Ax\| \|x\| \leq \|A\|_{B(\mathbb{X},\mathbb{X})} \|x\|^2$, $\forall x \in \mathbb{X}$, where the first inequality follows from the Cauchy–Schwarz Inequality; and the second inequality follows from Proposition 7.64. Hence, $\lambda \leq \|A\|_{B(\mathbb{X},\mathbb{X})}$. Fix any $u \in \bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1)$ and let $v_1 := \alpha u + \alpha^{-1} A u$ and $v_2 := \alpha u - \alpha^{-1} A u$, where $\alpha \in (0, \infty) \subset \mathbb{R}$. Note also that

$\|Au\|^2 = \langle Au, Au\rangle = \langle A^* A u, u\rangle = \langle A^2 u, u\rangle = \langle u, A^2 u\rangle = \frac{1}{4}\big(\langle A v_1, v_1\rangle - \langle A v_2, v_2\rangle\big) \leq \frac{1}{4}\big(\lambda \|v_1\|^2 + \lambda \|v_2\|^2\big) = \frac{1}{2}\lambda\big(\alpha^2 \|u\|^2 + \alpha^{-2} \|Au\|^2\big) \leq \frac{1}{2}\lambda\big(\alpha^2 + \alpha^{-2} \|Au\|^2\big)$


where the first equality follows from Proposition 13.2; the second equality follows from Proposition 13.22; the third equality follows from the fact $A$ is Hermitian; the fourth equality follows from Proposition 13.22 and $A$ being Hermitian; the fifth equality follows from straightforward algebra; the first inequality follows from the definition of $\lambda$; the sixth equality follows from straightforward algebra; and the last inequality follows from $u \in \bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1)$. If $\|Au\| = 0$, then $\|Au\| \leq \lambda$ trivially. On the other hand, if $\|Au\| > 0$, let $\alpha^2 := \|Au\|$; then we have $\|Au\|^2 \leq \frac{1}{2}\lambda \cdot 2\|Au\| = \lambda \|Au\|$, which leads to $\|Au\| \leq \lambda$. In both cases, we have $\|Au\| \leq \lambda$, $\forall u \in \bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1)$. Hence, $\|A\|_{B(\mathbb{X},\mathbb{X})} \leq \lambda$. This completes the proof of the proposition. □

Definition 13.49 Let $\mathbb{X}$ be a Banach space over $\mathbb{K}_1$, $\mathbb{Y}$ be a Banach space over $\mathbb{K}_2$, $D \subseteq \mathbb{X}$, and $f : D \to \mathbb{Y}$ be continuous. $f$ is said to be a compact operator if $\forall r \in [0, \infty) \subset \mathbb{R}$, $\overline{f(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, r))} \subseteq \mathbb{Y}$ is a compact set. %

It is then clear that if $f : \mathbb{X} \to \mathbb{Y}$ is a compact operator, then, for every bounded sequence $(x_i)_{i=1}^\infty \subseteq \mathbb{X}$, $(f(x_i))_{i=1}^\infty \subseteq \mathbb{Y}$ has a subsequence that converges to some $y_0 \in \mathbb{Y}$, by the Borel–Lebesgue Theorem 5.37.

Proposition 13.50 Let $\mathbb{X}$ and $\mathbb{Y}$ be Banach spaces over $\mathbb{K}$ and $A \in B(\mathbb{X}, \mathbb{Y})$. $A$ is a compact operator if, and only if, $\overline{A(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1))} \subseteq \mathbb{Y}$ is a compact set. We will denote the set of such $A$'s by $K(\mathbb{X}, \mathbb{Y}) := \{A \in B(\mathbb{X}, \mathbb{Y}) \mid A \text{ is a compact operator}\}$.

Proof "Necessity" is immediate. "Sufficiency" Let $\Theta := \overline{A(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1))} \subseteq \mathbb{Y}$. Then, by assumption, $\Theta$ is a compact set. $\forall r \in [0, \infty) \subset \mathbb{R}$, $\overline{A(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, r))} = r\Theta \subseteq \mathbb{Y}$ since $A$ is a linear operator. Then, by Proposition 7.102, $r\Theta$ is sequentially compact since $\Theta$ is. By the Borel–Lebesgue Theorem 5.37, $\overline{A(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, r))}$ is compact. Hence, $A$ is a compact operator. This completes the proof of the proposition. □

Proposition 13.51 Let $\mathbb{X}$, $\mathbb{Y}$, and $\mathbb{Z}$ be Banach spaces over $\mathbb{K}$. Then, the following statements hold.

(i) Let $A \in K(\mathbb{X}, \mathbb{Y})$. $\forall C \in B(\mathbb{Z}, \mathbb{X})$, $\forall D \in B(\mathbb{Y}, \mathbb{Z})$, we have $AC \in K(\mathbb{Z}, \mathbb{Y})$ and $DA \in K(\mathbb{X}, \mathbb{Z})$.
(ii) (Fredholm Alternative) Let $A \in K(\mathbb{X}, \mathbb{X})$. $\forall \lambda \in \mathbb{K}$ with $\lambda \neq 0$, we have $E := (\lambda\,\mathrm{id}_{\mathbb{X}} - A)(F) \subseteq \mathbb{X}$ is a closed set, $\forall F \subseteq \mathbb{X}$ that is a closed and bounded set. (The operator $\lambda\,\mathrm{id}_{\mathbb{X}} - A$ is called the Fredholm alternative of $A$.)

Proof (i) Note that

$E_1 := \overline{(AC)(\bar B_{\mathbb{Z}}(\vartheta_{\mathbb{Z}}, 1))} \subseteq \overline{A\big(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, \|C\|_{B(\mathbb{Z},\mathbb{X})})\big)} =: F_1 \subseteq \mathbb{Y}$


where the set containment follows from the fact that $C \in B(\mathbb{Z}, \mathbb{X})$. By Definition 13.49, $F_1$ is a compact set. Then, $E_1$ is a compact set by Proposition 5.5. Hence, $AC \in K(\mathbb{Z}, \mathbb{Y})$ by Proposition 13.50. Note that $G_1 := \overline{(DA)(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1))} \subseteq \overline{D\big(\overline{A(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1))}\big)} =: \overline{D(F_2)} \subseteq \mathbb{Z}$. By Proposition 13.50, $F_2$ is compact. By Proposition 5.7, $D(F_2)$ is compact. By Proposition 5.5, $\overline{D(F_2)} = D(F_2)$ and hence $\overline{D(F_2)}$ is compact. Then, $G_1$ is compact by Proposition 5.5. Hence, $DA \in K(\mathbb{X}, \mathbb{Z})$ by Proposition 13.50.

(ii) Fix any $y_0 \in \bar E$. By Proposition 4.13, $\exists (y_n)_{n=1}^\infty \subseteq E$ such that $y_0 = \lim_{n \in \mathbb{N}} y_n$. Then, by the definition of $E$, there exists $(x_n)_{n=1}^\infty \subseteq F \subseteq \bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, m)$, for some $m \in \mathbb{N}$ since $F$ is a bounded set, such that $y_n = (\lambda\,\mathrm{id}_{\mathbb{X}} - A) x_n = \lambda x_n - A x_n$, $\forall n \in \mathbb{N}$. Since $A$ is a compact operator, by Definition 13.49, we have $\overline{A(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, m))} \subseteq \mathbb{X}$ is a compact set. By the Borel–Lebesgue Theorem 5.37, there exists a subsequence $(x_{n_k})_{k=1}^\infty$ such that $\lim_{k \in \mathbb{N}} A x_{n_k} = \bar x \in \overline{A(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, m))}$. Thus, we have $\lambda \lim_{k \in \mathbb{N}} x_{n_k} = \lim_{k \in \mathbb{N}} y_{n_k} + \lim_{k \in \mathbb{N}} A x_{n_k} = y_0 + \bar x \in \mathbb{X}$. Since $\lambda \neq 0$, we have $\lim_{k \in \mathbb{N}} x_{n_k} = \lambda^{-1}(y_0 + \bar x) =: x_0 \in F$, where the last set membership follows since $F$ is a closed subset of $\mathbb{X}$. Then, we have $(\lambda\,\mathrm{id}_{\mathbb{X}} - A)(x_0) = \lim_{k \in \mathbb{N}} (\lambda\,\mathrm{id}_{\mathbb{X}} - A)(x_{n_k}) = \lim_{k \in \mathbb{N}} (\lambda x_{n_k} - A x_{n_k}) = \lambda x_0 - \bar x = y_0$. Hence, $y_0 \in (\lambda\,\mathrm{id}_{\mathbb{X}} - A)(F) = E$. By the arbitrariness of $y_0$, we have $\bar E \subseteq E \subseteq \bar E$. Hence, $E = \bar E$ and $E$ is closed by Proposition 3.3. □

Theorem 13.52 (Spectral Theory) Let $\mathbb{X}$ and $\mathbb{Y}$ be Hilbert spaces over $\mathbb{K}$, $A \in B(\mathbb{X}, \mathbb{Y})$, and $Q \in \bar S_{\mathbb{X}}$. Then, the following results hold.

(i) Let $A \in K(\mathbb{X}, \mathbb{Y})$. Then, $A \neq \vartheta_{B(\mathbb{X},\mathbb{Y})}$ if, and only if, $\|A\|_{B(\mathbb{X},\mathbb{Y})} =: \sigma > 0$. In this case, $\exists x \in \mathbb{X}$ with $\|x\|_{\mathbb{X}} = 1$ such that $\|Ax\|_{\mathbb{Y}} = \sigma$. Let $y := \frac{1}{\|Ax\|_{\mathbb{Y}}} Ax \in \mathbb{Y}$; then $\|y\|_{\mathbb{Y}} = 1$ and $A = \sigma y x^* + \bar A$, where $\bar A \in K(\mathbb{X}, \mathbb{Y})$ and $\bar A = \bar P_1 A \bar P_2$, with $P_1 : \mathbb{Y} \to \mathrm{span}(\{y\}) =: N$, $P_2 : \mathbb{X} \to \mathrm{span}(\{x\}) =: M$, $\bar P_1 = \mathrm{id}_{\mathbb{Y}} - P_1 : \mathbb{Y} \to N^{\perp_h}$, and $\bar P_2 = \mathrm{id}_{\mathbb{X}} - P_2 : \mathbb{X} \to M^{\perp_h}$ projection operators.

(ii) $A \in K(\mathbb{X}, \mathbb{Y})$ if, and only if, $\exists \bar n \in \mathbb{Z}_+ \cup \{\infty\}$, $\exists (x_i)_{i=1}^{\bar n} \subseteq \mathbb{X}$, which is an orthonormal sequence, $\exists (y_i)_{i=1}^{\bar n} \subseteq \mathbb{Y}$, which is an orthonormal sequence, and $\sigma =: \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_i \in (0, \infty) \subset \mathbb{R}$, $\forall i \in \mathbb{N}$ with $1 \leq i < \bar n + 1$, such that $A = \sum_{i=1}^{\bar n} \sigma_i y_i x_i^*$; and in case $\bar n = \infty$, we have $\lim_{i \in \mathbb{N}} \sigma_i = 0$.

(iii) Let $Q \in \bar S_{\mathbb{X}} \cap K(\mathbb{X}, \mathbb{X})$. Then, $Q \neq \vartheta_{B(\mathbb{X},\mathbb{X})}$ if, and only if, $Q$ admits an eigenvector $x_0 \in \mathbb{X}$ with $\|x_0\| = 1$ associated with an eigenvalue $\lambda \in \mathbb{R} \setminus \{0\}$ with $|\lambda| = \|Q\|_{B(\mathbb{X},\mathbb{X})} \in (0, \infty) \subset \mathbb{R}$. Then, $Q = \lambda x_0 x_0^* + \bar Q$, where $\bar Q = \bar P_1 Q \bar P_1 \in \bar S_{\mathbb{X}} \cap K(\mathbb{X}, \mathbb{X})$, and $P_1 : \mathbb{X} \to \mathrm{span}(\{x_0\}) =: M$ and $\bar P_1 : \mathbb{X} \to M^{\perp_h}$ are projection operators with $P_1 + \bar P_1 = \mathrm{id}_{\mathbb{X}}$.

(iv) $Q \in \bar S_{\mathbb{X}} \cap K(\mathbb{X}, \mathbb{X})$ if, and only if, $\exists n \in \mathbb{Z}_+ \cup \{\infty\}$ such that $Q$ has $n$ eigenvectors $x_i \in \mathbb{X}$ with $\|x_i\| = 1$, each associated with an eigenvalue $\lambda_i \in \mathbb{R} \setminus \{0\}$, with $\|Q\|_{B(\mathbb{X},\mathbb{X})} = |\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_i| \in (0, \infty) \subset \mathbb{R}$, $i \in \mathbb{N}$ with $1 \leq i < n + 1$, where $(x_i)_{i=1}^n \subseteq \mathbb{X}$ is an orthonormal sequence and $Q = \sum_{i=1}^n \lambda_i x_i x_i^*$; and in case $n = \infty$, we have $\lim_{i \in \mathbb{N}} |\lambda_i| = 0$.

(v) Let $Q \in \bar S_{\mathbb{X}} \cap K(\mathbb{X}, \mathbb{X})$ and the results of (iv) hold. $\forall \lambda \in \mathbb{K}$ with $\lambda \neq 0$, $\lambda$ is an eigenvalue of $Q$ if, and only if, $\exists i_0 \in \mathbb{N}$ with $1 \leq i_0 < n + 1$ such that


$\lambda_{i_0} = \lambda$. In this case, $\exists m \in \mathbb{N}$ with $m < n + 1$ such that $\dim(N(\lambda\,\mathrm{id}_{\mathbb{X}} - Q)) = m = \max_{s \in \mathbb{N}} \dim(N((\lambda\,\mathrm{id}_{\mathbb{X}} - Q)^s))$, which is called the multiplicity of the eigenvalue $\lambda$ for $Q$, and there are exactly $m$ $\lambda_i$'s such that $\lambda = \lambda_i$.

(vi) Let $A \in K(\mathbb{X}, \mathbb{Y})$ and the results of (ii) hold. Then, the values $(\sigma_i)_{i=1}^{\bar n}$ are uniquely defined, independently of the choice of $(y_i)_{i=1}^{\bar n}$ and $(x_i)_{i=1}^{\bar n}$. These $(\sigma_i)_{i=1}^{\bar n}$ are said to be the singular values of the operator $A$. The formula $A = \sum_{i=1}^{\bar n} \sigma_i y_i x_i^*$ is said to be the singular value decomposition of the operator $A \in K(\mathbb{X}, \mathbb{Y})$.

In the above theorem, we have established that $A \in K(\mathbb{X}, \mathbb{Y})$ if, and only if, it admits the infinite-dimensional version of the singular value decomposition for matrices: the $\sigma_i$'s are the singular values, $1 \leq i < \bar n + 1$, $(y_i)_{i=1}^{\bar n}$ replaces the unitary matrix $U$, and $(x_i)_{i=1}^{\bar n}$ replaces the unitary matrix $V$. A Hermitian compact operator $Q$ can always be diagonalized by the eigenvectors of all of its nonzero eigenvalues. Its singular value decomposition is given in terms of its eigenvectors and eigenvalues, where $y_i = \mathrm{sgn}(\lambda_i) x_i$ and $x_i$ is the eigenvector associated with the eigenvalue $\lambda_i \neq 0$ with $\sigma_i = |\lambda_i| > 0$, $\forall i \in \mathbb{N}$ with $1 \leq i < n + 1$.

Proof (i) Let $A \in K(\mathbb{X}, \mathbb{Y})$. Note that $\|A\|_{B(\mathbb{X},\mathbb{Y})} = \sup_{x \in \mathbb{X},\, \|x\|_{\mathbb{X}} \leq 1} \|Ax\|_{\mathbb{Y}} =: \sigma > 0$ if, and only if, $A \neq \vartheta_{B(\mathbb{X},\mathbb{Y})}$. In case $\sigma > 0$, $\exists (z_n)_{n=1}^\infty \subseteq \bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1)$

such that $\lim_{n \in \mathbb{N}} \|A z_n\|_{\mathbb{Y}} = \sigma$. Then, $(A z_n)_{n=1}^\infty \subseteq \overline{A(\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1))} =: E \subseteq \mathbb{Y}$. By $A \in K(\mathbb{X}, \mathbb{Y})$ and Definition 13.49, we have that $E$ is compact. Then, $E$ is sequentially compact by the Borel–Lebesgue Theorem 5.37. Thus, there exists a subsequence $(A z_{n_k})_{k=1}^\infty$ of $(A z_n)_{n=1}^\infty$ such that $\lim_{k \in \mathbb{N}} A z_{n_k} = \bar y \in E$. Thus, we have $\|\bar y\|_{\mathbb{Y}} = \lim_{k \in \mathbb{N}} \|A z_{n_k}\|_{\mathbb{Y}} = \sigma > 0$. By the Alaoglu Theorem 7.122 and the Riesz–Fréchet Theorem 13.15, $\bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1) =: F \subseteq \mathbb{X}$ is weakly compact (since $\mathbb{X}$ is a Hilbert space). Then, $(z_{n_k})_{k=1}^\infty \subseteq F$ has the Bolzano–Weierstrass property by Propositions 5.22 and 5.26. Then, $\exists \bar z \in F$ such that $(z_{n_k})_{k=1}^\infty$ has $\bar z$ as a weak cluster point. By Proposition 7.123, $A \in B(\mathbb{X}, \mathbb{Y})$ implies that $A : \mathbb{X}_{\mathrm{weak}} \to \mathbb{Y}_{\mathrm{weak}}$ is continuous. Then, the sequence $(A z_{n_k})_{k=1}^\infty$ has a weak cluster point at $A \bar z \in \mathbb{Y}_{\mathrm{weak}}$, by Proposition 3.66. By $\lim_{k \in \mathbb{N}} A z_{n_k} = \bar y$, we have $\lim_{k \in \mathbb{N}} A z_{n_k} = \bar y$ weakly. Then, by Proposition 7.116, $\mathbb{Y}_{\mathrm{weak}}$ is Hausdorff, which further implies, by Proposition 3.65, $\bar y = A \bar z$. Thus, we have $\|A \bar z\|_{\mathbb{Y}} = \sigma > 0$ and $\|\bar z\|_{\mathbb{X}} \leq 1$. By the fact that $\|A\|_{B(\mathbb{X},\mathbb{Y})} = \sigma$, we have $\|\bar z\|_{\mathbb{X}} = 1$ (otherwise, $\|A\|_{B(\mathbb{X},\mathbb{Y})} > \sigma$, which leads to a contradiction). Let $x := \bar z$ and $y := \frac{1}{\sigma} A \bar z = \frac{1}{\sigma} \bar y$. Then, we have $\|x\|_{\mathbb{X}} = 1 = \|y\|_{\mathbb{Y}}$. Since $\mathbb{X}$ is a Hilbert space, we have $x^* \in \mathbb{X}^*$, $\|x^*\|_{\mathbb{X}^*} = 1$, and $\langle\!\langle x^*, x\rangle\!\rangle = \langle x, x\rangle_{\mathbb{X}} = 1$. Let $\bar A := A - \sigma y x^*$. By Theorem 13.19, we have $P_1 + \bar P_1 = \mathrm{id}_{\mathbb{Y}}$ and $P_2 + \bar P_2 = \mathrm{id}_{\mathbb{X}}$. $\forall \tilde x \in \mathbb{X}$, we have $\bar A \tilde x = A \tilde x - \sigma y \langle\!\langle x^*, \tilde x\rangle\!\rangle = A(P_2 + \bar P_2)\tilde x - \sigma y \langle \tilde x, x\rangle_{\mathbb{X}} = A P_2 \tilde x + A \bar P_2 \tilde x - \sigma y \langle P_2 \tilde x, x\rangle_{\mathbb{X}} - \sigma y \langle \bar P_2 \tilde x, x\rangle_{\mathbb{X}}$, where the first equality follows from the definition of $\bar A$; and the last two equalities follow from the preceding discussion. Note that $P_2 \tilde x \in M = \mathrm{span}(\{x\})$ and then $P_2 \tilde x = \tilde\alpha x$, for some $\tilde\alpha \in \mathbb{K}$. Then, $\bar A \tilde x = \tilde\alpha A x + A \bar P_2 \tilde x - \sigma y \langle \tilde\alpha x, x\rangle_{\mathbb{X}} - \sigma y \langle \bar P_2 \tilde x, x\rangle_{\mathbb{X}} = \tilde\alpha \sigma y + A \bar P_2 \tilde x - \tilde\alpha \sigma y - \sigma y \langle \tilde x, \bar P_2^* x\rangle_{\mathbb{X}} = A \bar P_2 \tilde x$, $\forall \tilde x \in \mathbb{X}$, where the second equality follows from the fact $\sigma y = A x = A \bar z$;


and the last equality follows from Propositions 13.22 and 13.43 and the fact that $\bar P_2 x = \vartheta_{\mathbb{X}}$. By the arbitrariness of $\tilde x$, we have $\bar A = A \bar P_2$.

Note that $A = \sigma y x^* + \bar A = \sigma y x^* + A \bar P_2 = \sigma y x^* + P_1 A \bar P_2 + \bar P_1 A \bar P_2$, where the first equality follows from the definition of $\bar A$; the second equality follows from the preceding paragraph; and the last equality follows from $P_1 + \bar P_1 = \mathrm{id}_{\mathbb{Y}}$. We need the following result.

Claim 13.52.1 $P_1 A \bar P_2 = \vartheta_{B(M^{\perp_h}, N)}$.

Proof of Claim We will prove this by an argument of contradiction. Suppose $P_1 A \bar P_2 \neq \vartheta_{B(M^{\perp_h}, N)}$. Then, $0 < \beta := \|P_1 A \bar P_2\|_{B(M^{\perp_h}, N)} = \|P_1 A \bar P_2\|_{B(\mathbb{X},\mathbb{Y})}$. Therefore, $\exists s \in M^{\perp_h}$ with $\|s\|_{\mathbb{X}} = 1$ such that $\|P_1 A \bar P_2 s\|_{\mathbb{Y}} > \frac{\beta}{2} > 0$. Note that $P_1 A \bar P_2 s \in N$ and hence $P_1 A \bar P_2 s = t y$, for some $t \in \mathbb{K}$ with $t \neq 0$. $\forall \alpha \in \mathbb{K}$, $\|A(x + \alpha s)\|_{\mathbb{Y}}^2 = \|\sigma y + \alpha A s\|_{\mathbb{Y}}^2 = \|\sigma y + \alpha P_1 A \bar P_2 s + \alpha \bar P_1 A \bar P_2 s\|_{\mathbb{Y}}^2 = \|(\sigma + \alpha t) y + \alpha \bar P_1 A \bar P_2 s\|_{\mathbb{Y}}^2$. Choose $\alpha = \bar t \delta$, where $\delta = \frac{1}{2\sigma} \in (0, \infty) \subset \mathbb{R}$. Thus, $\|A(x + \alpha s)\|_{\mathbb{Y}}^2 = \|(\sigma + |t|^2 \delta) y + \bar t \delta \bar P_1 A \bar P_2 s\|_{\mathbb{Y}}^2 = (\sigma + |t|^2 \delta)^2 + |t|^2 \delta^2 \|\bar P_1 A \bar P_2 s\|_{\mathbb{Y}}^2 \geq (\sigma + |t|^2 \delta)^2 = \sigma^2 \big(1 + \tfrac{|t|^2 \delta}{\sigma}\big)^2 = \sigma^2 \big(1 + \tfrac{|t|^2}{2\sigma^2}\big)^2 > \sigma^2 \big(1 + \tfrac{|t|^2}{4\sigma^2}\big)^2 = \sigma^2 (1 + |t|^2 \delta^2)^2 > \sigma^2 (1 + |t|^2 \delta^2)$, where the second equality follows from $\bar P_1 y = \vartheta_{\mathbb{Y}}$ and Proposition 13.2. Note that $\|x + \alpha s\|_{\mathbb{X}}^2 = \|x + \bar t \delta s\|_{\mathbb{X}}^2 = \|x\|_{\mathbb{X}}^2 + \|\bar t \delta s\|_{\mathbb{X}}^2 = 1 + |t|^2 \delta^2$. These imply that $\|A(x + \alpha s)\|_{\mathbb{Y}}^2 > \sigma^2 \|x + \alpha s\|_{\mathbb{X}}^2$ and $\|A\|_{B(\mathbb{X},\mathbb{Y})} > \sigma$. This is a contradiction. Hence, the hypothesis is invalid. So, $P_1 A \bar P_2 = \vartheta_{B(M^{\perp_h}, N)}$. This completes the proof of the claim. □

Then, we have $A = \sigma y x^* + \bar P_1 A \bar P_2$ and hence $\bar A = \bar P_1 A \bar P_2$. By Proposition 13.51, $\bar A \in K(\mathbb{X}, \mathbb{Y})$. Furthermore, $\|\bar A\|_{B(\mathbb{X},\mathbb{Y})} \leq \|\bar P_1\|_{B(\mathbb{Y},\mathbb{Y})} \|A\|_{B(\mathbb{X},\mathbb{Y})} \|\bar P_2\|_{B(\mathbb{X},\mathbb{X})} \leq \|A\|_{B(\mathbb{X},\mathbb{Y})} = \sigma$. Clearly, $\bar A \in K(M^{\perp_h}, N^{\perp_h})$. This proves (i).

(ii) "Necessity" Let $A \in K(\mathbb{X}, \mathbb{Y})$. Then, (i) holds. Repeat (i) for $\bar A$. For notational consistency, we let $\sigma_1 = \sigma$, $N_0 := \{\vartheta_{\mathbb{Y}}\}$, $N_1 := N$, $M_0 := \{\vartheta_{\mathbb{X}}\}$, $M_1 := M$, $x_1 := x$, $y_1 := y$, $P_{1,1} := P_1$, $\bar P_{1,1} := \bar P_1$, $P_{1,2} := P_2$, $\bar P_{1,2} := \bar P_2$, $A_1 := A$, and $A_2 := \bar A$. Recursively, assume that we have completed $i$ steps and arrived at $A_{i+1} = \bar P_{i,1} A_i \bar P_{i,2}$, where $(x_j)_{j=1}^i \subseteq \mathbb{X}$ is an orthonormal sequence, $(y_j)_{j=1}^i \subseteq \mathbb{Y}$ is an orthonormal sequence, $P_{j,1} : N_{j-1}^{\perp_h} \to \mathrm{span}(\{y_j\})$ and $P_{j,2} : M_{j-1}^{\perp_h} \to \mathrm{span}(\{x_j\})$ are projection operators, $\bar P_{j,1} : N_{j-1}^{\perp_h} \to N_j^{\perp_h} := (\mathrm{span}(\{y_1, \ldots, y_j\}))^{\perp_h}$ and $\bar P_{j,2} : M_{j-1}^{\perp_h} \to M_j^{\perp_h} := (\mathrm{span}(\{x_1, \ldots, x_j\}))^{\perp_h}$ are projection operators, $P_{j,1} + \bar P_{j,1} = \mathrm{id}_{N_{j-1}^{\perp_h}}$, $P_{j,2} + \bar P_{j,2} = \mathrm{id}_{M_{j-1}^{\perp_h}}$, $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_i > 0$, $\sigma_j = \|A_j\|_{B(\mathbb{X},\mathbb{Y})}$, $j = 1, \ldots, i$, and $A = \sum_{j=1}^i \sigma_j y_j x_j^* + A_{i+1}$. Consider $\sigma_{i+1} := \|A_{i+1}\|_{B(\mathbb{X},\mathbb{Y})}$. If $\sigma_{i+1} = 0$, then we have $\bar n = i$, $A_{i+1} = \vartheta_{B(\mathbb{X},\mathbb{Y})}$, and $A = \sum_{j=1}^{\bar n} \sigma_j y_j x_j^*$; the desired result holds. In case $\sigma_{i+1} > 0$, we apply (i) to $A_{i+1}$. By (i), $\exists x_{i+1} \in M_i^{\perp_h}$ and $y_{i+1} := \frac{1}{\sigma_{i+1}} A x_{i+1} \in N_i^{\perp_h}$ with $\|x_{i+1}\|_{\mathbb{X}} = 1 = \|y_{i+1}\|_{\mathbb{Y}}$ such that $A_{i+1} = \sigma_{i+1} y_{i+1} x_{i+1}^* + A_{i+2}$, where $A_{i+2} = \bar P_{i+1,1} A_{i+1} \bar P_{i+1,2}$,


$P_{i+1,1} : N_i^{\perp_h} \to \mathrm{span}(\{y_{i+1}\})$ and $P_{i+1,2} : M_i^{\perp_h} \to \mathrm{span}(\{x_{i+1}\})$ are projection operators, and $\bar P_{i+1,1} = \mathrm{id}_{N_i^{\perp_h}} - P_{i+1,1} : N_i^{\perp_h} \to N_{i+1}^{\perp_h} := (\mathrm{span}(\{y_1, \ldots, y_{i+1}\}))^{\perp_h}$ and $\bar P_{i+1,2} = \mathrm{id}_{M_i^{\perp_h}} - P_{i+1,2} : M_i^{\perp_h} \to M_{i+1}^{\perp_h} := (\mathrm{span}(\{x_1, \ldots, x_{i+1}\}))^{\perp_h}$ are projection operators. Clearly, $(x_j)_{j=1}^{i+1}$ is orthonormal and so is $(y_j)_{j=1}^{i+1}$. Clearly, $\sigma = \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{i+1} > 0$. Recursively, either the above process stops in a finite number of steps, say $\bar n \in \mathbb{N}$, with $\sigma_{\bar n + 1} = 0$, and then (ii) holds with a finite $\bar n$; or the above process continues indefinitely, and we have $\bar n = \infty$ and $\sigma_i > 0$, $\forall i \in \mathbb{N}$. In the latter case, clearly $(x_j)_{j=1}^\infty$ and $(y_j)_{j=1}^\infty$ are orthonormal sequences. We need the following result.

Claim 13.52.2 $\lim_{i \in \mathbb{N}} \sigma_i = 0$.

Proof of Claim Clearly, $\lim_{i \in \mathbb{N}} \sigma_i =: \bar\sigma \in [0, \infty) \subset \mathbb{R}$ exists, since $(\sigma_i)_{i=1}^\infty$ is nonincreasing and bounded below by $0$. We will show the claim by an argument of contradiction. Suppose $\lim_{i \in \mathbb{N}} \sigma_i = \bar\sigma > 0$. Then, we have $A x_i = \sigma_i y_i$, $\forall i \in \mathbb{N}$. $(x_i)_{i=1}^\infty \subseteq \bar B_{\mathbb{X}}(\vartheta_{\mathbb{X}}, 1)$, but $(\sigma_i y_i)_{i=1}^\infty \subseteq E$ does not have any convergent subsequence, since $(y_i)_{i=1}^\infty$ is orthonormal and $\sigma_i \geq \bar\sigma > 0$, $\forall i \in \mathbb{N}$. This contradicts the fact that $A$ is a compact operator and $E$ is sequentially compact. Hence, the hypothesis is invalid, and we must have $\bar\sigma = 0 = \lim_{i \in \mathbb{N}} \sigma_i$. This completes the proof of the claim. □

Then, $A_{n+1} = A - \sum_{i=1}^n \sigma_i y_i x_i^*$, $\forall n \in \mathbb{N}$. We have $\lim_{n \in \mathbb{N}} \|A_{n+1}\|_{B(\mathbb{X},\mathbb{Y})} = \lim_{n \in \mathbb{N}} \sigma_{n+1} = 0$. This implies that $A = \sum_{i=1}^\infty \sigma_i y_i x_i^*$. This completes the necessity part of the proof.

"Sufficiency" Let $\bar n \in \mathbb{Z}_+ \cup \{\infty\}$, let $(x_i)_{i=1}^{\bar n} \subseteq \mathbb{X}$ and $(y_i)_{i=1}^{\bar n} \subseteq \mathbb{Y}$ be orthonormal sequences, and let $\sigma =: \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_i \in (0, \infty) \subset \mathbb{R}$, $\forall i \in \mathbb{N}$ with $1 \leq i < \bar n + 1$, be such that $A = \sum_{i=1}^{\bar n} \sigma_i y_i x_i^*$; and in case $\bar n = \infty$, $\lim_{i \in \mathbb{N}} \sigma_i = 0$. Fix any $(\tilde x_i)_{i=1}^\infty \subseteq F$; we will show that there exists a subsequence $(A \tilde x_{i_k})_{k=1}^\infty \subseteq E$ that converges to some $y_0 \in \mathbb{Y}$, or equivalently, that the subsequence $(A \tilde x_{i_k})_{k=1}^\infty \subseteq E$ is a Cauchy sequence. This shows that $E$ is sequentially compact. By Proposition 13.50, $A \in K(\mathbb{X}, \mathbb{Y})$. This then completes the sufficiency part of the proof. We need the following intermediate result.

Claim 13.52.3 There exists a subsequence $(\tilde x_{i_k})_{k=1}^\infty$ of $(\tilde x_i)_{i=1}^\infty$ such that $\lim_{k \in \mathbb{N}} \langle \tilde x_{i_k}, x_s\rangle_{\mathbb{X}} = \alpha_s \in \bar B_{\mathbb{K}}(0, 1) =: \bar F$, $\forall s \in \mathbb{N}$ with $1 \leq s < \bar n + 1$.

Proof of Claim: Note that $(\langle\tilde x_i, x_1\rangle_{\mathcal X})_{i=1}^\infty \subseteq \bar F$ and $\bar F \subseteq \mathbb K$ is compact. Then, there exists a subsequence $(\tilde x_{i_l})_{l=1}^\infty$ such that $\lim_{l\in\mathbb N}\langle\tilde x_{i_l}, x_1\rangle_{\mathcal X} = \alpha_1 \in \bar F$. Note that $(\langle\tilde x_{i_l}, x_2\rangle_{\mathcal X})_{l=1}^\infty \subseteq \bar F$. Then, there exists a subsubsequence $(\tilde x_{i_{l_j}})_{j=1}^\infty$ such that $\lim_{j\in\mathbb N}\langle\tilde x_{i_{l_j}}, x_2\rangle_{\mathcal X} = \alpha_2 \in \bar F$. Clearly, $\lim_{j\in\mathbb N}\langle\tilde x_{i_{l_j}}, x_1\rangle_{\mathcal X} = \alpha_1 \in \bar F$. Note that $(\langle\tilde x_{i_{l_j}}, x_3\rangle_{\mathcal X})_{j=1}^\infty \subseteq \bar F$. Then, there exists a subsubsubsequence $(\tilde x_{i_{l_{j_k}}})_{k=1}^\infty$ such that $\lim_{k\in\mathbb N}\langle\tilde x_{i_{l_{j_k}}}, x_3\rangle_{\mathcal X} = \alpha_3 \in \bar F$. Clearly, $\lim_{k\in\mathbb N}\langle\tilde x_{i_{l_{j_k}}}, x_s\rangle_{\mathcal X} = \alpha_s \in \bar F$, $s = 1, 2, 3$. Recursively, continue this process till $\bar n$.


13 Hilbert Spaces

If $\bar n$ is finite, we may obtain (after finitely many recursions) a subsequence $(\tilde x_{i_k})_{k=1}^\infty$ of $(\tilde x_i)_{i=1}^\infty$ such that $\lim_{k\in\mathbb N}\langle\tilde x_{i_k}, x_s\rangle_{\mathcal X} = \alpha_s \in \bar F$, $\forall s \in \mathbb N$ with $1 \le s \le \bar n$. If $\bar n = \infty$, we may obtain (after countably infinitely many recursions) infinitely many subsequences by the construction procedure of the second-to-last paragraph. Then, we take Cantor's diagonal sequence $(\tilde x_{i_1}, \tilde x_{i_{l_2}}, \tilde x_{i_{l_{j_3}}}, \ldots)$ and denote it by $(\tilde x_{i_k})_{k=1}^\infty$, which is a subsequence of $(\tilde x_i)_{i=1}^\infty$. Then, we have $\lim_{k\in\mathbb N}\langle\tilde x_{i_k}, x_s\rangle_{\mathcal X} = \alpha_s \in \bar F$, $\forall s \in \mathbb N$. This completes the proof of the claim. □

Then, we have the subsequence $(A\tilde x_{i_k})_{k=1}^\infty \subseteq E \subseteq \mathcal Y$. Note that $A\tilde x_{i_k} = \sum_{s=1}^{\bar n}\sigma_s y_s\langle\tilde x_{i_k}, x_s\rangle_{\mathcal X}$. We will show that this is a Cauchy sequence. Fix any $\epsilon \in (0,\infty) \subset \mathbb R$. $\exists n_1 \in \mathbb N$ with $1 \le n_1 < \bar n + 1$ such that $\frac{\epsilon}{2\sqrt 2} \ge \sigma_{n_1+1} \ge \sigma_{n_1+2} \ge \cdots \ge 0$. Then, $\exists n_2 \in \mathbb N$ such that $|\langle\tilde x_{i_k}, x_s\rangle_{\mathcal X} - \alpha_s| < \frac{\epsilon}{2\sqrt 2\,n_1(\sigma_1+1)}$, $\forall s \in \mathbb N$ with $1 \le s \le n_1$, $\forall k \in \mathbb N$ with $k \ge n_2$. $\forall k, j \in \mathbb N$ with $k, j \ge n_2$, we have

\[
\begin{aligned}
\bigl\|A\tilde x_{i_k} - A\tilde x_{i_j}\bigr\|_{\mathcal Y}^2
&= \Bigl\|\sum_{s=1}^{\bar n}\sigma_s y_s\bigl\langle\tilde x_{i_k}-\tilde x_{i_j}, x_s\bigr\rangle_{\mathcal X}\Bigr\|_{\mathcal Y}^2
= \sum_{s=1}^{\bar n}\sigma_s^2\bigl|\bigl\langle\tilde x_{i_k}-\tilde x_{i_j}, x_s\bigr\rangle_{\mathcal X}\bigr|^2 \\
&= \sum_{s=1}^{n_1}\sigma_s^2\bigl|\bigl\langle\tilde x_{i_k}-\tilde x_{i_j}, x_s\bigr\rangle_{\mathcal X}\bigr|^2
+ \sum_{s=n_1+1}^{\bar n}\sigma_s^2\bigl|\bigl\langle\tilde x_{i_k}-\tilde x_{i_j}, x_s\bigr\rangle_{\mathcal X}\bigr|^2 \\
&\le \sum_{s=1}^{n_1}\sigma_1^2\Bigl[\frac{\epsilon}{\sqrt 2\,n_1(1+\sigma_1)}\Bigr]^2
+ \frac{\epsilon^2}{8}\bigl\|\tilde x_{i_k}-\tilde x_{i_j}\bigr\|_{\mathcal X}^2
\le \frac{\epsilon^2}{2} + \frac{\epsilon^2}{2} = \epsilon^2
\end{aligned}
\]

where the second equality follows from the fact that $(y_i)_{i=1}^{\bar n}$ is orthonormal; the first inequality follows from the assumption and the preceding discussion; and the second inequality follows from Bessel's Inequality, Proposition 13.30. This proves that $(A\tilde x_{i_k})_{k=1}^\infty \subseteq E \subseteq \mathcal Y$ is a Cauchy sequence. This completes the sufficiency proof.
(iii) Let $Q \in \bar S_{\mathcal X} \cap K(\mathcal X, \mathcal X)$. "Necessity" $\|Q\|_{B(\mathcal X,\mathcal X)} > 0$ since $Q \ne \vartheta_{B(\mathcal X,\mathcal X)}$. By Proposition 13.48, $\exists(u_i)_{i=1}^\infty \subseteq \mathcal X$ with $\|u_i\| = 1$ such that $\lim_{i\in\mathbb N}|\langle Q(u_i), u_i\rangle| = \|Q\|_{B(\mathcal X,\mathcal X)} > 0$. Since $Q$ is a compact operator, then

$(Q(u_i))_{i=1}^\infty \subseteq \overline{Q(\overline B_{\mathcal X}(\vartheta_{\mathcal X}, 1))}$, which is a compact set. By the Borel–Lebesgue Theorem 5.37, the set $\overline{Q(\overline B_{\mathcal X}(\vartheta_{\mathcal X}, 1))} \subseteq \mathcal X$ is sequentially compact. Then, there exists a subsequence $(Q(u_{i_k}))_{k=1}^\infty \subseteq \overline{Q(\overline B_{\mathcal X}(\vartheta_{\mathcal X}, 1))}$ such that $\lim_{k\in\mathbb N} Q(u_{i_k}) = u_0$. Then, $\lim_{k\in\mathbb N}|\langle Q(u_{i_k}), u_{i_k}\rangle| = \|Q\|_{B(\mathcal X,\mathcal X)}$. Let $\lambda := \lim_{k\in\mathbb N}\langle Q(u_{i_k}), u_{i_k}\rangle \in$

13.9 Spectral Theory of Linear Operators


$\mathbb K$, and we have $|\lambda| = \|Q\|_{B(\mathcal X,\mathcal X)}$. Note that $\lambda = \lim_{k\in\mathbb N}\langle Q(u_{i_k}), u_{i_k}\rangle = \lim_{k\in\mathbb N}\langle u_{i_k}, Q^*(u_{i_k})\rangle = \lim_{k\in\mathbb N}\langle u_{i_k}, Q(u_{i_k})\rangle = \lim_{k\in\mathbb N}\overline{\langle Q(u_{i_k}), u_{i_k}\rangle} = \bar\lambda$, where the second equality follows from Proposition 13.22; the third equality follows from $Q$ being Hermitian; and the fourth equality follows from Definition 13.1. Hence, we have $\lambda \in \mathbb R\setminus\{0\}$ ($\lambda = \|Q\|_{B(\mathcal X,\mathcal X)}$ or $\lambda = -\|Q\|_{B(\mathcal X,\mathcal X)}$). Note also that $0 \le \|Q(u_{i_k}) - \lambda u_{i_k}\|^2 = \langle Q(u_{i_k}) - \lambda u_{i_k}, Q(u_{i_k}) - \lambda u_{i_k}\rangle = \langle Q(u_{i_k}), Q(u_{i_k})\rangle - \lambda\langle Qu_{i_k}, u_{i_k}\rangle - \lambda\langle u_{i_k}, Qu_{i_k}\rangle + \lambda^2\langle u_{i_k}, u_{i_k}\rangle = \|Q(u_{i_k})\|^2 - \lambda^2 - \lambda(\langle Q(u_{i_k}), u_{i_k}\rangle + \langle u_{i_k}, Q(u_{i_k})\rangle - 2\lambda) \le \|Q\|_{B(\mathcal X,\mathcal X)}^2\|u_{i_k}\|^2 - \lambda^2 - \lambda(\langle Q(u_{i_k}), u_{i_k}\rangle + \langle Q^*u_{i_k}, u_{i_k}\rangle - 2\lambda) = -2\lambda(\langle Q(u_{i_k}), u_{i_k}\rangle - \lambda) \to 0$, where the first equality follows from Proposition 13.2; the second equality follows from Definition 13.1; the second inequality follows from simple algebra and Proposition 13.22; and the last equality follows from $Q$ being Hermitian. Then, we have $0 = \lim_{k\in\mathbb N}\|Q(u_{i_k}) - \lambda u_{i_k}\| = \lim_{k\in\mathbb N}\|u_0 - \lambda u_{i_k}\|$. Hence, $\lim_{k\in\mathbb N} u_{i_k} = \frac1\lambda u_0$. By continuity of $Q$, we have $Q(\frac1\lambda u_0) = \lim_{k\in\mathbb N} Q(u_{i_k}) = u_0$. Furthermore, $\|\frac1\lambda u_0\| = \lim_{k\in\mathbb N}\|u_{i_k}\| = 1$. Hence, $\frac1\lambda u_0 =: x_0$ is an eigenvector of $Q$ associated with the eigenvalue $\lambda$. Let $\mathrm{span}(\{x_0\}) =: M \subseteq \mathcal X$. It is a closed subspace in $\mathcal X$ by Theorem 7.36 and Proposition 4.39. By Theorem 13.19, $\mathcal X = M \oplus M^{\perp_h}$. Let $P_1 : \mathcal X \to M$ and $\bar P_1 : \mathcal X \to M^{\perp_h}$ be the projection operators. We have $P_1 + \bar P_1 = \mathrm{id}_{\mathcal X}$. This implies that $Q = (P_1 + \bar P_1)Q(P_1 + \bar P_1) = P_1QP_1 + P_1Q\bar P_1 + \bar P_1QP_1 + \bar P_1Q\bar P_1$. Note that $P_1 = x_0x_0^*$: $\forall x \in \mathcal X$, $P_1(x) = x_0\langle\langle x_0^*, x\rangle\rangle = \langle x, x_0\rangle x_0$. This implies that $P_1QP_1(x) = \langle x, x_0\rangle P_1Q(x_0) = \langle x, x_0\rangle P_1(\lambda x_0) = \lambda\langle x, x_0\rangle x_0 = \lambda x_0\langle\langle x_0^*, x\rangle\rangle$ and $\bar P_1QP_1(x) = \langle x, x_0\rangle\bar P_1(\lambda x_0) = \vartheta_{\mathcal X}$. Thus, $Q(x) = \lambda x_0\langle\langle x_0^*, x\rangle\rangle + P_1Q\bar P_1(x) + \bar P_1Q\bar P_1(x)$. Hence, $Q = \lambda x_0x_0^* + P_1Q\bar P_1 + \bar P_1Q\bar P_1$. By Propositions 13.22 and 13.43, we have $Q = Q^* = \lambda x_0x_0^* + \bar P_1Q^*P_1 + \bar P_1Q^*\bar P_1 = \lambda x_0x_0^* + \bar P_1QP_1 + \bar P_1Q\bar P_1 = \lambda x_0x_0^* + \bar P_1Q\bar P_1$. Hence, we have $Q = \lambda x_0x_0^* + \bar P_1Q\bar P_1$.
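The deflation identity $Q = \lambda x_0x_0^* + \bar P_1Q\bar P_1$, with $\bar P_1$ the projection onto $\{x_0\}^{\perp_h}$, can be observed numerically for a real symmetric matrix. The following is only an illustrative finite-dimensional sketch (the matrix $Q$ below is an arbitrary example, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
Q = (B + B.T) / 2                      # Hermitian (real symmetric) operator

w, V = np.linalg.eigh(Q)
i = np.argmax(np.abs(w))               # |lambda| = ||Q|| is attained at an eigenvalue
lam, x0 = w[i], V[:, i]
assert np.isclose(abs(lam), np.linalg.norm(Q, ord=2))

P1bar = np.eye(5) - np.outer(x0, x0)   # projection onto the orthogonal complement of x0
Q2 = P1bar @ Q @ P1bar                 # deflated operator
assert np.allclose(Q, lam * np.outer(x0, x0) + Q2)
# The deflated operator's norm is the next-largest |eigenvalue| of Q.
rest = np.delete(w, i)
assert np.isclose(np.linalg.norm(Q2, ord=2), np.max(np.abs(rest)))
```

Iterating this deflation is exactly the recursion used in part (iv) below.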
By Propositions 13.51, 13.43, and 13.22, $\bar Q := \bar P_1Q\bar P_1 \in \bar S_{\mathcal X} \cap K(\mathcal X, \mathcal X)$.
"Sufficiency" Let $Q$ admit an eigenvector $x_0$ associated with an eigenvalue $\lambda \in \mathbb R\setminus\{0\}$ with $|\lambda| = \|Q\|_{B(\mathcal X,\mathcal X)} > 0$. Then, $\langle x_0, Qx_0\rangle = \langle x_0, \lambda x_0\rangle = \lambda\|x_0\|^2 \ne 0$. Hence, $Q \ne \vartheta_{B(\mathcal X,\mathcal X)}$. This proves (iii).
(iv) "Necessity" If $Q = \vartheta_{B(\mathcal X,\mathcal X)}$, the result is trivial with $n = 0$. Consider the case $Q \ne \vartheta_{B(\mathcal X,\mathcal X)}$. By (iii), $Q = \lambda x_0x_0^* + Q_2$, where $Q_2 = \bar P_1Q\bar P_1$. Denote $\lambda_1 := \lambda \in \mathbb R\setminus\{0\}$ with $|\lambda_1| = \|Q\|_{B(\mathcal X,\mathcal X)} > 0$, $x_1 := x_0$, $Q_1 := Q$, and $\bar P_{1,1} := \bar P_1$. Now, we will repeat (iii) for the operator $Q_2$. Clearly, $Q_2 \in \bar S_{\mathcal X}$. We claim that
Claim 13.52.4 Any eigenvector $x_2 \in \mathcal X$ with $\|x_2\| = 1$ of $Q_2$ associated with an eigenvalue $\lambda_2 \in \mathbb R\setminus\{0\}$ must be an eigenvector of $Q_1$ associated with the eigenvalue $\lambda_2$, and $x_2 \perp x_1$.
Proof of Claim: $Q_2x_2 = \lambda_2x_2$ and $Q_2 = \bar P_1Q\bar P_1$. Then, $\lambda_2\langle x_1, x_2\rangle = \langle x_1, \lambda_2x_2\rangle = \langle x_1, \bar P_1Q\bar P_1x_2\rangle = \langle\bar P_1^*(x_1), Q\bar P_1(x_2)\rangle = \langle\bar P_1(x_1), Q\bar P_1(x_2)\rangle = 0$, where the first equality follows from Definition 13.1 and the fact that $\lambda_2 \in \mathbb R$; the second equality follows from our assumption that $x_2$ is an eigenvector of $Q_2$ associated with the eigenvalue $\lambda_2$; the third equality follows from the


expression for $Q_2$ and Proposition 13.22; the fourth equality follows from Proposition 13.43; and the last equality follows from $\bar P_1(x_0) = \vartheta_{\mathcal X}$. Since $\lambda_2 \ne 0$, then $x_1 \perp x_2$. Then, $x_2 \in M^{\perp_h}$. It is then easy to check that $Q(x_2) = Q_1(x_2) = \lambda x_0\langle\langle x_0^*, x_2\rangle\rangle + Q_2(x_2) = \lambda x_0\langle x_2, x_0\rangle + \lambda_2x_2 = \lambda x_0\langle x_2, x_1\rangle + \lambda_2x_2 = \lambda_2x_2$, where the first equality follows from our notation; the second equality follows from the expression for $Q$; the third equality follows from the Riesz–Fréchet Theorem 13.15; the fourth equality follows from our notation; and the last equality follows from the preceding discussion. Hence, $x_2$ is an eigenvector of $Q$ associated with the eigenvalue $\lambda_2$. This completes the proof of the claim. □
Repeat (iii) until $Q_{n+1} = \vartheta_{B(\mathcal X,\mathcal X)}$ or indefinitely. After any step $k \in \mathbb N$ (with $k < n$), we assume that we have found an orthonormal sequence $(x_i)_{i=1}^k \subseteq \mathcal X$, where $x_i$ is an eigenvector of $Q_i$ (and $Q$) associated with the eigenvalue $\lambda_i \in \mathbb R\setminus\{0\}$ with $|\lambda_i| = \|Q_i\|_{B(\mathcal X,\mathcal X)} \in (0,\infty) \subset \mathbb R$, $i = 1, \ldots, k$, and $Q = \sum_{j=1}^i\lambda_jx_jx_j^* + Q_{i+1}$, $i = 1, \ldots, k$. At step $k+1$, since $Q_{k+1} \ne \vartheta_{B(\mathcal X,\mathcal X)}$, then $\|Q_{k+1}\|_{B(\mathcal X,\mathcal X)} > 0$; by (iii), there exists an eigenvector $x_{k+1} \in \mathcal X$ with $\|x_{k+1}\| = 1$ of $Q_{k+1}$ associated with the eigenvalue $\lambda_{k+1} \in \mathbb R\setminus\{0\}$ with $|\lambda_{k+1}| = \|Q_{k+1}\|_{B(\mathcal X,\mathcal X)} > 0$. Furthermore, $Q_{k+1} = \lambda_{k+1}x_{k+1}x_{k+1}^* + \bar P_{1,k+1}Q_{k+1}\bar P_{1,k+1}$. Now, by Claim 13.52.4, $x_{k+1}$ is an eigenvector of $Q_k$ associated with the eigenvalue $\lambda_{k+1}$, and $x_{k+1} \perp x_k$. Recursively, by Claim 13.52.4, $x_{k+1}$ is an eigenvector of $Q_i$, $i = k-1, \ldots, 1$, associated with the eigenvalue $\lambda_{k+1}$, and $x_{k+1} \perp x_i$. Hence, the sequence $(x_i)_{i=1}^{k+1}$ is an orthonormal sequence. Thus, either the process stops at $n$, for some $n \in \mathbb Z_+$, when $Q_{n+1} = \vartheta_{B(\mathcal X,\mathcal X)}$, or we have $n = \infty$ and an orthonormal sequence $(x_i)_{i=1}^\infty$. In the latter case, $Q = \sum_{j=1}^i\lambda_jx_jx_j^* + Q_{i+1}$, $\forall i \in \mathbb N$, where $Q_{i+1} := \bar P_{1,i}Q_i\bar P_{1,i}$. Clearly, $Q_{i+1} \in \bar S_{\mathcal X} \cap K(\mathcal X, \mathcal X)$, $\forall i \in \mathbb N$.
Moreover, $|\lambda_i| = \|Q_i\|_{B(\mathcal X,\mathcal X)} \ge \|Q_i\|_{B(\mathcal X,\mathcal X)}\|\bar P_{1,i}\|_{B(\mathcal X,\mathcal X)}^2 \ge \|\bar P_{1,i}Q_i\bar P_{1,i}\|_{B(\mathcal X,\mathcal X)} = \|Q_{i+1}\|_{B(\mathcal X,\mathcal X)} = |\lambda_{i+1}| > 0$, $\forall i \in \mathbb N$ with $1 \le i < n$. Consider the case $n = \infty$. Suppose that $\lim_{i\in\mathbb N}|\lambda_i| > 0$. Then, $Qx_i = \lambda_ix_i$, $\forall i \in \mathbb N$, and the sequence $(Qx_i)_{i=1}^\infty = (\lambda_ix_i)_{i=1}^\infty$ does not have a converging subsequence. This contradicts the assumption that $Q$ is a compact operator. Therefore, $\lim_{i\in\mathbb N}|\lambda_i| = 0$. Then, $\lim_{k\in\mathbb N}\|Q - \sum_{i=1}^k\lambda_ix_ix_i^*\|_{B(\mathcal X,\mathcal X)} = \lim_{k\in\mathbb N}\|Q_{k+1}\|_{B(\mathcal X,\mathcal X)} = \lim_{k\in\mathbb N}|\lambda_{k+1}| = 0$, where the first two equalities follow from the recursive construction, and the last equality follows from the preceding discussion. Hence, $Q = \sum_{i=1}^n\lambda_ix_ix_i^*$. This completes the necessity part of the proof.
"Sufficiency" By the expression $Q = \sum_{i=1}^n|\lambda_i|(\mathrm{sgn}(\lambda_i)x_i)x_i^*$ and (ii), $Q \in K(\mathcal X, \mathcal X) \cap \bar S_{\mathcal X}$. This completes the sufficiency part of the proof.
(v) Let $Q \in \bar S_{\mathcal X} \cap K(\mathcal X, \mathcal X)$. Fix any $\lambda \in \mathbb K$ with $\lambda \ne 0$. If there exists an $i_0 \in \mathbb N$ with $1 \le i_0 < n + 1$ such that $\lambda = \lambda_{i_0}$, then, by (iv), $Q = \sum_{i=1}^n\lambda_ix_ix_i^*$, and it is easy to show that $Q(x_{i_0}) = \lambda_{i_0}x_{i_0}$. Then, $x_{i_0}$ is an eigenvector of $Q$ associated with the eigenvalue $\lambda = \lambda_{i_0}$, and $\lambda$ is an eigenvalue of $Q$. On the other hand, suppose $\lambda$ is an eigenvalue of $Q$. By Proposition 13.47, $\lambda \in \mathbb R\setminus\{0\}$. Then, $M := N(\lambda\,\mathrm{id}_{\mathcal X} - Q) \supset \{\vartheta_{\mathcal X}\}$. We need the following intermediate results.
Claim 13.52.5 $M = N(\lambda\,\mathrm{id}_{\mathcal X} - Q)$ is finite-dimensional.


Proof of Claim: Suppose $M$ is infinite-dimensional. Then, $\exists(\bar x_i)_{i=1}^\infty \subseteq M$ that is orthonormal. $Q\bar x_i = \lambda\bar x_i$, $\forall i \in \mathbb N$. The sequence $(Q\bar x_i)_{i=1}^\infty = (\lambda\bar x_i)_{i=1}^\infty$ does not have a convergent subsequence. This contradicts the assumption $Q \in K(\mathcal X, \mathcal X)$. Hence, the hypothesis is invalid. $M$ must be finite-dimensional. □
Claim 13.52.6 $M \supset \{\vartheta_{\mathcal X}\}$ implies that $\lambda = \lambda_{i_0}$ for some $i_0 \in \mathbb N$ with $1 \le i_0 < n + 1$.
Proof of Claim: Let $N := \overline{\mathrm{span}}((x_i)_{i=1}^n)$, and let $P : \mathcal X \to N$ and $\bar P : \mathcal X \to N^{\perp_h}$ be the projection operators. By Theorem 13.19, we have $P + \bar P = \mathrm{id}_{\mathcal X}$. $\forall x \in N(\lambda\,\mathrm{id}_{\mathcal X} - Q) = M$ with $x \ne \vartheta_{\mathcal X}$, write $x = Px + \bar Px =: \bar x + \tilde x$. $\lambda x = Qx$ implies that $\lambda\bar x + \lambda\tilde x = \sum_{i=1}^n\lambda_ix_i\langle x, x_i\rangle \in N$. This implies that $\lambda\tilde x \in N \cap N^{\perp_h}$ and hence $\tilde x = \vartheta_{\mathcal X}$. Hence, we have $\lambda\bar x = \sum_{i=1}^n\lambda_ix_i\langle\bar x, x_i\rangle = \lambda\sum_{i=1}^nx_i\langle\bar x, x_i\rangle$, where the last equality follows from $\bar x \in N$ and $(x_i)_{i=1}^n$ being orthonormal. Therefore, we must have $\lambda\langle\bar x, x_i\rangle = \lambda_i\langle\bar x, x_i\rangle$, $\forall i \in \mathbb N$ with $1 \le i < n + 1$; that is, for each such $i$, either $\lambda = \lambda_i$ or $\langle\bar x, x_i\rangle = 0$. Since $\vartheta_{\mathcal X} \ne x = \bar x \in N$, $\exists i_0 \in \mathbb N$ with $1 \le i_0 < n + 1$ such that $\langle\bar x, x_{i_0}\rangle \ne 0$. Then, $\lambda = \lambda_{i_0}$. This completes the proof of the claim. □
By the preceding two claims, we have proved that if $\lambda \in \mathbb K$ with $\lambda \ne 0$ is an eigenvalue of $Q$, then $\lambda = \lambda_{i_0}$ for some $i_0 \in \mathbb N$ with $1 \le i_0 < n + 1$. Let $\lambda \in \mathbb R\setminus\{0\}$ be an eigenvalue of $Q$. By Claim 13.52.5, $M$ is finite-dimensional. Let $\dim M =: m \in \mathbb N$. We will show that there are exactly $m$ $\lambda_i$'s such that $\lambda = \lambda_i$. Let $M$ admit an orthonormal basis $(\bar x_i)_{i=1}^m$. By Claim 13.52.6 and its proof, each $\bar x_j \in N := \overline{\mathrm{span}}((x_i)_{i=1}^n)$ and $\langle\bar x_j, x_i\rangle = 0$ whenever $\lambda_i \ne \lambda$; hence each $\bar x_j$ lies in the span of those $x_i$'s with $\lambda_i = \lambda$. Then, there are at least $m$ $\lambda_i$'s such that $\lambda = \lambda_i$. Then, there must be exactly $m$ such $\lambda_i$'s. This is because $x_i$ is an eigenvector of $Q$ associated with $\lambda_i$, $\forall i \in \mathbb N$ with $1 \le i < n + 1$; then $x_i \in M$ whenever $\lambda_i = \lambda$; those $x_i$'s for such $\lambda_i$'s form a linearly independent subset of $M$; and $M$ has dimension $m$, and therefore such $\lambda_i$'s cannot be more than $m$.
This shows that there are exactly $m$ $\lambda_i$'s such that $\lambda = \lambda_i$. We need the following intermediate result.
Claim 13.52.7 Let $\lambda \in \mathbb R\setminus\{0\}$ be an eigenvalue of $Q$. Then,
\[
\dim(N(\lambda\,\mathrm{id}_{\mathcal X} - Q)) = \max_{s\in\mathbb N}\dim(N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^s)) \in \mathbb N
\]

Proof of Claim: We will show that $N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^{s+1}) \setminus N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^s) = \emptyset$, $\forall s \in \mathbb N$. Suppose we have $x_0 \in N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^{s+1}) \setminus N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^s)$, for some $s \in \mathbb N$. Then, $(\lambda\,\mathrm{id}_{\mathcal X} - Q)^{s+1}x_0 = \vartheta_{\mathcal X}$ and $(\lambda\,\mathrm{id}_{\mathcal X} - Q)^sx_0 =: \xi \ne \vartheta_{\mathcal X}$. Clearly, $(\lambda\,\mathrm{id}_{\mathcal X} - Q)\xi = \vartheta_{\mathcal X}$ and $\xi \in M$. But, $\xi \in R(\lambda\,\mathrm{id}_{\mathcal X} - Q) \subseteq \overline{R(\lambda\,\mathrm{id}_{\mathcal X} - Q)} = ((R(\lambda\,\mathrm{id}_{\mathcal X} - Q))^{\perp_h})^{\perp_h} = (N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^*))^{\perp_h} = M^{\perp_h}$, where the first equality follows from Proposition 13.17, and the second equality follows from Proposition 13.22. Thus, $\xi \in M \cap M^{\perp_h}$ and hence $\xi = \vartheta_{\mathcal X}$. This contradicts $\xi \ne \vartheta_{\mathcal X}$. Hence, the hypothesis is invalid. Therefore, we must have $N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^{s+1}) \setminus$


$N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^s) = \emptyset$, $\forall s \in \mathbb N$. Clearly, $N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^{s+1}) \supseteq N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^s)$, $\forall s \in \mathbb N$. Then, we must have $N(\lambda\,\mathrm{id}_{\mathcal X} - Q) = N((\lambda\,\mathrm{id}_{\mathcal X} - Q)^s)$, $\forall s \in \mathbb N$. Then, the result holds, by Claim 13.52.5. This completes the proof of the claim. □
This completes the proof for (v).
(vi) Let $A \in K(\mathcal X, \mathcal Y)$. Thus, (ii) holds. Then, $A^*A \in \bar S_{\mathrm{psd}\,\mathcal X} \cap K(\mathcal X, \mathcal X)$ by Propositions 13.42 and 13.51. Then, we have $A^*A = \sum_{i=1}^{\bar n}\sigma_i^2x_ix_i^*$. By (v), the $\sigma_i^2$'s are the nonzero eigenvalues of $A^*A$ and are independent of the choice of $(y_i)_{i=1}^{\bar n}$ and $(x_i)_{i=1}^{\bar n}$. Hence, $\sigma_i = \sqrt{\lambda_i}$, $\forall i \in \mathbb N$ with $1 \le i < \bar n + 1$, where $(\lambda_i)_{i=1}^{\bar n}$ are the nonzero eigenvalues of $A^*A$ (including multiplicity), which are independent of the choice of $(y_i)_{i=1}^{\bar n}$ and $(x_i)_{i=1}^{\bar n}$. This completes the proof of the theorem. □

Example 13.53 Let $\mathcal X$ be a separable Hilbert space over $\mathbb K$, $\mathcal Y$ be a finite-dimensional Hilbert space over $\mathbb K$, $I_i \subseteq \mathbb R$ be compact intervals, $\mathcal I_i := ((I_i, |\cdot|), B_i, \mu_i)$ be the compact finite metric measure subspace of $\mathbb R$, $i = 1, 2$, $H_1 := L_2(\mathcal I_1, \mathcal X)$ and $H_2 := L_2(\mathcal I_2, \mathcal Y)$ be Hilbert spaces as defined in Example 13.11, and $K : I_2 \times I_1 \to B(\mathcal X, \mathcal Y)$ be a continuous function. Define the mapping $\bar T : H_1 \to H_2$ by $\bar T([h]) = [T(h)]$, where $T : \bar L_2(\mathcal I_1, \mathcal X) \to \bar L_2(\mathcal I_2, \mathcal Y)$ is defined by
\[
T(h)(t) = \int_{I_1} K(t, s)h(s)\,d\mu_1(s), \qquad \forall t \in I_2,\ \forall h \in \bar L_2(\mathcal I_1, \mathcal X)
\]
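In the scalar case $\mathcal X = \mathcal Y = \mathbb K = \mathbb R$, discretizing the integral operator on a uniform grid produces a matrix whose singular values decay rapidly when the kernel is smooth, consistent with the compactness of $\bar T$ shown below. This is only a numerical sketch under illustrative assumptions (the kernel $K(t,s) = e^{-(t-s)^2}$ and the grid are made up for the demonstration):

```python
import numpy as np

n = 200
t = np.linspace(0.0, 1.0, n)                   # grid on I2 = I1 = [0, 1]
w = 1.0 / n                                    # quadrature weight for d mu_1(s)
K = np.exp(-np.subtract.outer(t, t) ** 2)      # smooth kernel K(t, s)
T = K * w                                      # (T h)(t_i) ~ sum_j K(t_i, s_j) h(s_j) w

s_vals = np.linalg.svd(T, compute_uv=False)
# Rapid singular value decay: far down the spectrum the values are negligible
# relative to sigma_1, reflecting that T is well approximated by finite rank.
assert s_vals[50] < 1e-6 * s_vals[0]
```

The finite-rank truncations of this matrix play the role of the operators $\sum_{i=1}^k \sigma_i y_i x_i^*$ in Theorem 13.52.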

We will show that $\bar T \in B(H_1, H_2)$ and is a compact operator. $\forall[h] \in H_1$, $\forall t \in I_2$, $K(t,\cdot)h(\cdot) : I_1 \to \mathcal Y$ is $B_1$-measurable, by Propositions 11.37, 11.39, 7.126, and 7.65. By Proposition 5.29, $\exists M \in [0,\infty) \subset \mathbb R$ such that $\|K(t,s)\|_{B(\mathcal X,\mathcal Y)} \le M$, $\forall(t,s) \in I_2 \times I_1$. Then, $\forall[h] \in H_1$, $\forall t \in I_2$, we have $\int_{I_1}\|K(t,s)h(s)\|_{\mathcal Y}\,d\mu_1(s) \le \int_{I_1}M\|h(s)\|_{\mathcal X}\,d\mu_1(s) \le M\sqrt{\mu_1(I_1)}\,\|[h]\|_{H_1} < \infty$, where the second inequality follows from the Cauchy–Schwarz Inequality. By Proposition 11.89, $T(h)(t) \in \mathcal Y$ and $\|T(h)(t)\|_{\mathcal Y} \le M\sqrt{\mu_1(I_1)}\,\|[h]\|_{H_1}$, $\forall t \in I_2$, $\forall[h] \in H_1$. Next, we will show that $T(h) \in C(\mathcal I_2, \mathcal Y)$. By Proposition 5.39, $K$ is uniformly continuous. $\forall\epsilon \in (0,\infty) \subset \mathbb R$, $\exists\delta(\epsilon) \in (0,\infty) \subset \mathbb R$ such that $\forall(t_1,s_1), (t_2,s_2) \in I_2 \times I_1$ with $|(t_1,s_1) - (t_2,s_2)| < \delta(\epsilon)$, we have $\|K(t_1,s_1) - K(t_2,s_2)\|_{B(\mathcal X,\mathcal Y)} < \epsilon$. $\forall[h] \in H_1$, $\forall t_1, t_2 \in I_2$ with $|t_1 - t_2| < \delta\bigl(\frac{\epsilon}{1+\sqrt{\mu_1(I_1)}\,\|[h]\|_{H_1}}\bigr)$, we have

\[
\begin{aligned}
\|T(h)(t_1) - T(h)(t_2)\|_{\mathcal Y}
&= \Bigl\|\int_{I_1}(K(t_1,s) - K(t_2,s))h(s)\,d\mu_1(s)\Bigr\|_{\mathcal Y} \\
&\le \int_{I_1}\|(K(t_1,s) - K(t_2,s))h(s)\|_{\mathcal Y}\,d\mu_1(s) \\
&\le \int_{I_1}\|K(t_1,s) - K(t_2,s)\|_{B(\mathcal X,\mathcal Y)}\|h(s)\|_{\mathcal X}\,d\mu_1(s) \\
&\le \int_{I_1}\frac{\epsilon\,\|h(s)\|_{\mathcal X}}{1+\sqrt{\mu_1(I_1)}\,\|[h]\|_{H_1}}\,d\mu_1(s)
\le \frac{\epsilon\sqrt{\mu_1(I_1)}\,\|[h]\|_{H_1}}{1+\sqrt{\mu_1(I_1)}\,\|[h]\|_{H_1}} < \epsilon
\end{aligned}
\]

where the first equality and the first inequality follow from Proposition 11.89; the third inequality follows from the preceding discussion; and the fourth inequality follows from the Cauchy–Schwarz Inequality. Then, by Proposition 11.37, $T(h)$ is $B_2$-measurable. Furthermore, $T(h) \in \bar L_2(\mathcal I_2, \mathcal Y)$ since $\int_{I_2}\|T(h)(t)\|_{\mathcal Y}^2\,d\mu_2(t) \le \int_{I_2}M^2\mu_1(I_1)\|[h]\|_{H_1}^2\,d\mu_2 = M^2\mu_1(I_1)\mu_2(I_2)\|[h]\|_{H_1}^2 < \infty$. Then, $\|\bar T([h])\|_{H_2} = \|[T(h)]\|_{H_2} \le M\sqrt{\mu_1(I_1)\mu_2(I_2)}\,\|[h]\|_{H_1} < \infty$ and $\bar T([h]) \in H_2$. It is clear that $\bar T$ is a linear operator. The preceding shows that $\|\bar T\|_{B(H_1,H_2)} \le M\sqrt{\mu_1(I_1)\mu_2(I_2)}$. Hence, $\bar T \in B(H_1, H_2)$.
Finally, we will show that $\bar T$ is a compact operator. Let $(y_i)_{i=1}^{\hat n} \subseteq \mathcal Y$ be a set of orthonormal basis vectors of $\mathcal Y$, where $\hat n \in \mathbb Z_+$ is the dimension of $\mathcal Y$. Fix any $([h_i])_{i=1}^\infty \subseteq \overline B_{H_1}(\vartheta_{H_1}, 1) =: F_1$. By the Alaoglu Theorem 7.122 and the Riesz–Fréchet Theorem 13.15, $F_1$ is weakly compact. By Propositions 5.22 and 5.26, $\exists[h_0] \in F_1$ such that it is a weak cluster point of $([h_i])_{i=1}^\infty$. Let $v_0 := T(h_0) \in \bar L_2(\mathcal I_2, \mathcal Y)$. By the preceding paragraph, we have $v_i := T(h_i) \in C(\mathcal I_2, \mathcal Y)$, $\forall i \in \mathbb Z_+$; furthermore, $\forall\epsilon \in (0,\infty) \subset \mathbb R$, $\forall t_1, t_2 \in I_2$ with $|t_1 - t_2| < \delta\bigl(\frac{\epsilon}{1+\sqrt{\mu_1(I_1)}}\bigr)$, we have $\|T(h_i)(t_1) - T(h_i)(t_2)\|_{\mathcal Y} < \epsilon$. Let $i_0 := 0 \in \mathbb Z_+$. Fix any $n \in \mathbb N$, and let $\bar\delta := \delta\bigl(\frac{2^{-n}/3}{1+\sqrt{\mu_1(I_1)}}\bigr) > 0$. Then, there exist $\bar n \in \mathbb N$ and $(t_i)_{i=1}^{\bar n} \subseteq I_2$ with $\min I_2 = t_1 < t_2 < \cdots < t_{\bar n} = \max I_2$ and $t_j - t_{j-1} < \bar\delta$, $\forall j = 2, \ldots, \bar n$. Define $x^*_{n,j,k} \in H_1^*$ by $\langle\langle x^*_{n,j,k}, [h]\rangle\rangle := \langle\int_{I_1}K(t_j, s)h(s)\,d\mu_1(s), y_k\rangle \in \mathbb K$, $\forall[h] \in H_1$, $\forall j = 1, \ldots, \bar n$, $\forall k = 1, \ldots, \hat n$. It is easy to see that the $x^*_{n,j,k}$'s are well-defined. Let $O_n := \{[h] \in H_1 \mid |\langle\langle x^*_{n,j,k}, h - h_0\rangle\rangle| < \frac{2^{-n}}{3\sqrt{\hat n}},\ j = 1, \ldots, \bar n,\ k = 1, \ldots, \hat n\} \subseteq H_1$. Clearly, $[h_0] \in O_n$ and $O_n$ is a weakly open subset of $H_1$ by Proposition 7.116. Since $[h_0]$ is a weak cluster point of $([h_i])_{i=1}^\infty$, there exists $i_n \in \mathbb N$ with $i_n > i_{n-1}$ such that $[h_{i_n}] \in O_n$. Thus, we have obtained a subsequence $(\bar T([h_{i_n}]))_{n=1}^\infty \subseteq \bar T(F_1) \subseteq H_2$. By $[h_{i_n}] \in O_n$, we have $|\langle T(h_{i_n})(t_j) - v_0(t_j), y_k\rangle| = |\langle\langle x^*_{n,j,k}, h_{i_n} - h_0\rangle\rangle| < \frac{2^{-n}}{3\sqrt{\hat n}}$, $\forall j = 1, \ldots, \bar n$, $\forall k = 1, \ldots, \hat n$. This implies that $\|T(h_{i_n})(t_j) - v_0(t_j)\|_{\mathcal Y} = (\sum_{k=1}^{\hat n}|\langle T(h_{i_n})(t_j) - v_0(t_j), y_k\rangle|^2)^{1/2} < \frac{2^{-n}}{3}$. $\forall t \in I_2$, $\exists j_0 \in \{1, \ldots, \bar n - 1\}$ such that $t_{j_0} \le t \le t_{j_0+1}$. This yields $\|T(h_{i_n})(t) - v_0(t)\|_{\mathcal Y} \le \|T(h_{i_n})(t) - T(h_{i_n})(t_{j_0})\|_{\mathcal Y} + \|T(h_{i_n})(t_{j_0}) - v_0(t_{j_0})\|_{\mathcal Y} + \|v_0(t_{j_0}) - v_0(t)\|_{\mathcal Y} \le \frac{2^{-n}}{3} + \frac{2^{-n}}{3} + \frac{2^{-n}}{3} = 2^{-n}$. Thus, we have $\|T(h_{i_n}) - v_0\|_{C(\mathcal I_2,\mathcal Y)} < 2^{-n}$, and hence $\|\bar T([h_{i_n}]) - [v_0]\|_{H_2} < 2^{-n}\sqrt{\mu_2(I_2)}$. This shows that $\lim_{n\in\mathbb N}\bar T([h_{i_n}]) = [v_0]$ in $H_2$. Thus, we have shown that $\forall(v_i)_{i=1}^\infty \subseteq \bar T(F_1)$, there exists a subsequence $(v_{i_n})_{n=1}^\infty$ that converges to an element of $\overline{\bar T(F_1)}$. This shows that $\overline{\bar T(F_1)}$ is sequentially compact. By the Borel–Lebesgue Theorem 5.37, $\overline{\bar T(F_1)}$ is


compact and then closed. Hence, $\overline{\bar T(F_1)} = \bar T(F_1)$ is compact. By Proposition 13.50, $\bar T$ is a compact operator. This completes this example. %

Proposition 13.54 Let $T \in (0,\infty) \subset \mathbb R$, $I := [0, T] \subset \mathbb R$ with the subset topology $\mathcal O$, $\mathcal I := (I, \mathcal O)$, and $z \in C(\mathcal I, \mathbb K) =: \mathcal Z$. Then, the following statements hold.
(i) Assume that $z(0) = z(T)$. Let $\mathcal M = \{\bar z \in \mathcal Z \mid \bar z(x) = \cos(\frac{2\pi n}{T}x), \forall x \in I, \forall n \in \mathbb Z_+$; or $\bar z(x) = \sin(\frac{2\pi n}{T}x), \forall x \in I, \forall n \in \mathbb N\} \subseteq \mathcal Z$. Let $M = \overline{\mathrm{span}}(\mathcal M) \subseteq \mathcal Z$. Then, $z \in M$.
(ii) Assume that $z(0) = 0$. Let $\mathcal M = \{\bar z \in \mathcal Z \mid \bar z(x) = \sin(\frac{(n-\frac12)\pi}{T}x), \forall x \in I, \forall n \in \mathbb N\} \subseteq \mathcal Z$. Let $M = \overline{\mathrm{span}}(\mathcal M) \subseteq \mathcal Z$. Then, $z \in M$.

Proof (i) This follows readily from Corollary 7.58 and Proposition 13.35. (ii) Clearly, we have only to prove this for $T = \frac\pi2$. For general $T \in (0,\infty) \subset \mathbb R$, we may work with $\hat z \in C([0,\frac\pi2], \mathbb K)$ with $\hat z(t) := z(\frac{2T}{\pi}t)$, $\forall t \in [0,\frac\pi2]$. Let $T = \frac\pi2$. Let $\bar I := [-\pi, \pi] \subset \mathbb R$ and $\bar{\mathcal I} := (\bar I, |\cdot|)$ be the complete metric subspace of $\mathbb R$. Define $h \in C(\bar{\mathcal I}, \mathbb K) =: \mathcal Y$ by

T < t ≤ 2T 0≤t ≤T , −T ≤ t < 0 −2T ≤ t < −T

∀t ∈ I¯

¯ Hence, h is an By Theorem 3.11, h ∈ Y. Clearly, h(−t) = −h(t), ∀t ∈ I. ¯ odd function. Clearly, = z(0) = 0 = h(−π). Hence, by (i), h ∈ M,  h(π)  ¯ ∀n ∈ ¯ := span M¯ and M¯ := z¯ ∈ Yz¯ (x) = cos(nx), ∀x ∈ I, where M  ¯ ∀n ∈ N ⊆ Y. Let Sn ∈ C(I, ¯ K) be defined Z+ ; or z¯ (x) = sin(nx), ∀x ∈ I, -n 1 ¯ by Sn (t) := 2 a0 + i=1 (ai cos(it) + bi sin(it)), ∀t ∈ I, ∀n ∈ Z+ , be the partial π sum of the Fourier series for h, where ai := π1 −π h(t) cos(it) dt ∈ K, ∀i ∈ Z+ ; π and bi := π1 −π h(t) sin(it) dt ∈ K, ∀i ∈ N. It is straight forward to show that ai = 0, ∀i ∈ Z+ , since h is an odd function. Now, we will show that b2k = 0, ∀k ∈ N. Fix any k ∈ N. We have b2k =

1 π

=

1 π

.

7

π −π

7

h(t) sin(2kt) dt − π2

−π

7

π 2

+ 0

7 −z(π + t) sin(2kt) dt + 7

z(t) sin(2kt) dt +

π π 2

0 − π2

−z(−t) sin(2kt) dt

z(π − t) sin(2kt) dt

!

13.9 Spectral Theory of Linear Operators

=

7

1 π

π 2



747

z(t¯) sin(2k t¯ − 2kπ) dt¯ +

0

7

π 2

+

7 z(t) sin(2kt) dt −

π 2

0

7

1 = π

π 2

− π 2

+ 0

0 π 2

z(tˆ) sin(−2k tˆ) dtˆ

! z(t˜) sin(2kπ − 2k t˜) dt˜

z(t¯) sin(2k t¯) dt¯ +

0

7

0

7

7

π 2

z(tˆ) sin(2k tˆ) dtˆ

0

7

π 2

z(t) sin(2kt) dt −

! z(t˜) sin(2k t˜) dt˜ = 0

0

where the second equality follows from Definition 12.71; the third equality follows from the Change of Variable Theorem 12.91 and $\bar t = t + \pi$, $\hat t = -t$, and $\tilde t = \pi - t$; and the fourth equality follows from Definition 12.71. Thus, $S_n(t) = \sum_{k=1}^{\lfloor(n+1)/2\rfloor}b_{2k-1}\sin((2k-1)t)$, $\forall t \in \bar I$, $\forall n \in \mathbb Z_+$. By Fejér's Theorem 38.12 of Bartle (1976), the Cesàro means of the Fourier series for $h$ converge to $h$ in $\mathcal Y$ (working with the real part and the imaginary part separately when $\mathbb K = \mathbb C$); that is, $\lim_{n\in\mathbb N}K_n = h$ in $\mathcal Y$, where $K_n \in \mathcal Y$ is defined by $K_n = \frac1n\sum_{i=0}^{n-1}S_i$, $\forall n \in \mathbb N$. Since the $S_i$'s only involve terms with $\sin((2k-1)t)$, then so do the $K_n$'s. Hence, we have $h \in \hat M$, where $\hat M := \overline{\mathrm{span}}(\hat{\mathcal M})$ and $\hat{\mathcal M} := \{\bar z \in \mathcal Y \mid \bar z(x) = \sin((2k-1)x), \forall x \in \bar I, \forall k \in \mathbb N\} \subseteq \mathcal Y$. Note that $z = h|_I$. Then, $z \in M$, where $M$ is as defined in (ii). This completes the proof of the proposition. □

Proposition 13.55 Let $\mathcal X$ and $\mathcal Y$ be Hilbert spaces over $\mathbb K$ and $A \in K(\mathcal X, \mathcal Y)$. Then, the following statements hold.
(i) $\forall\lambda \in (0,\infty) \subset \mathbb R$, $\lambda$ is an eigenvalue of $AA^*$ with a corresponding eigenvector $y$ if, and only if, $\lambda$ is an eigenvalue of $A^*A$ with a corresponding eigenvector $A^*y$.
(ii) $AA^* \in \bar S_{\mathrm{psd}\,\mathcal Y}$ and $A^*A \in \bar S_{\mathrm{psd}\,\mathcal X}$ are both compact operators.
(iii) $A^* \in K(\mathcal Y, \mathcal X)$.
(iv) There exists $\bar n \in \mathbb Z_+ \cup \{\infty\}$ such that $A^*A$ admits eigenvectors $x_i \in \mathcal X$ with $\|x_i\|_{\mathcal X} = 1$, each associated with an eigenvalue $\lambda_i \in (0,\infty) \subset \mathbb R$, $i = 1, \ldots, \bar n$, and $(x_i)_{i=1}^{\bar n}$ is an orthonormal sequence, $A^*A = \sum_{i=1}^{\bar n}\lambda_ix_ix_i^*$, and $\lim_{i\in\mathbb N}\lambda_i = 0$ if $\bar n = \infty$. Then, $AA^*$ admits eigenvectors $y_i := \frac{1}{\|Ax_i\|_{\mathcal Y}}Ax_i \in \mathcal Y$ with $\|y_i\|_{\mathcal Y} = 1$, each associated with an eigenvalue

$\lambda_i \in (0,\infty) \subset \mathbb R$, $i = 1, \ldots, \bar n$, and $(y_i)_{i=1}^{\bar n}$ is an orthonormal sequence, and $AA^* = \sum_{i=1}^{\bar n}\lambda_iy_iy_i^*$.
(v) $\|AA^*\|_{B(\mathcal Y,\mathcal Y)} = \|A^*A\|_{B(\mathcal X,\mathcal X)} = \|A\|_{B(\mathcal X,\mathcal Y)}^2$.
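A quick finite-dimensional sanity check of statements (i), (iv), and (v) (an illustrative sketch with an arbitrary matrix, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 4))        # A : X -> Y with dim X = 4, dim Y = 7

AtA = A.T @ A                          # A*A on X
AAt = A @ A.T                          # AA* on Y
ev_x = np.sort(np.linalg.eigvalsh(AtA))[::-1]
ev_y = np.sort(np.linalg.eigvalsh(AAt))[::-1]

# (i)/(iv): the nonzero eigenvalues of A*A and AA* coincide ...
assert np.allclose(ev_x[:4], ev_y[:4])
# ... and the remaining eigenvalues of AA* vanish.
assert np.allclose(ev_y[4:], 0.0)
# (v): ||AA*|| = ||A*A|| = ||A||^2 = sigma_1^2.
sigma1 = np.linalg.norm(A, ord=2)
assert np.isclose(ev_x[0], sigma1 ** 2)
```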

Proof (i) "Necessity" We have $\lambda y = AA^*y$ and $\|y\| > 0$. Then, $\lambda A^*y = A^*(\lambda y) = A^*AA^*y = (A^*A)(A^*y)$. Now, $y \ne \vartheta_{\mathcal Y}$ and $\lambda \ne 0$ imply that $\lambda y \ne \vartheta_{\mathcal Y}$ and then


$AA^*y \ne \vartheta_{\mathcal Y}$. Hence, $A^*y \ne \vartheta_{\mathcal X}$. This shows that $A^*y$ is an eigenvector of $A^*A$ associated with eigenvalue $\lambda$. "Sufficiency" This follows from symmetry and the "Necessity" proof.
(ii) By (ix) of Proposition 13.42, $AA^* \in \bar S_{\mathrm{psd}\,\mathcal Y}$ and $A^*A \in \bar S_{\mathrm{psd}\,\mathcal X}$. By Proposition 13.51, $A^*A$ and $AA^*$ are compact operators.
(iii) By the Spectral Theory Theorem 13.52, $\exists\bar n \in \mathbb Z_+ \cup \{\infty\}$, an orthonormal sequence $(x_i)_{i=1}^{\bar n} \subseteq \mathcal X$, an orthonormal sequence $(y_i)_{i=1}^{\bar n} \subseteq \mathcal Y$, and $\sigma =: \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_i \in (0,\infty) \subset \mathbb R$, $\forall i \in \mathbb N$ with $1 \le i < \bar n + 1$, such that $A = \sum_{i=1}^{\bar n}\sigma_iy_ix_i^*$, and in case $\bar n = \infty$, we have $\lim_{i\in\mathbb N}\sigma_i = 0$. Then, $A^* = \sum_{i=1}^{\bar n}\sigma_ix_iy_i^*$. By the Spectral Theory Theorem 13.52, $A^* \in K(\mathcal Y, \mathcal X)$.
(iv) $A^*A = (\sum_{i=1}^{\bar n}\sigma_ix_iy_i^*)(\sum_{i=1}^{\bar n}\sigma_iy_ix_i^*) = \sum_{i=1}^{\bar n}\sigma_i^2x_ix_i^*$, where the second equality follows from the fact that $(y_i)_{i=1}^{\bar n}$ is orthonormal. It is easy to check that $\sigma_i^2x_i = A^*Ax_i$, $\forall i \in \mathbb N$ with $1 \le i < \bar n + 1$. Similarly, $AA^* = \sum_{i=1}^{\bar n}\sigma_i^2y_iy_i^*$. By (i) of the Spectral Theory Theorem 13.52, we have $y_i = \frac{1}{\sigma_i}Ax_i = \frac{1}{\|Ax_i\|_{\mathcal Y}}Ax_i$. Then, (iv) follows.
(v) By the Spectral Theory Theorem 13.52, we have $\|AA^*\|_{B(\mathcal Y,\mathcal Y)} = \sigma_1^2 = \|A^*A\|_{B(\mathcal X,\mathcal X)} = \|A\|_{B(\mathcal X,\mathcal Y)}^2$. This completes the proof of the proposition. □

Proposition 13.56 (Fredholm Alternative) Let $\mathcal X$ be a Hilbert space over $\mathbb K$, $A \in \bar S_{\mathcal X} \cap K(\mathcal X, \mathcal X)$, and $\lambda \in \mathbb R\setminus\{0\}$. Then, the following statements hold.
(i) $N(\lambda\,\mathrm{id}_{\mathcal X} - A) = \{\vartheta_{\mathcal X}\}$ if, and only if, $R(\lambda\,\mathrm{id}_{\mathcal X} - A) = \mathcal X$.
(ii) $R(\lambda\,\mathrm{id}_{\mathcal X} - A) = (N(\lambda\,\mathrm{id}_{\mathcal X} - A))^{\perp_h}$.

Proof (i) "Sufficiency" Let $R(\lambda\,\mathrm{id}_{\mathcal X} - A) = \mathcal X$. By Proposition 13.22, $N(\lambda\,\mathrm{id}_{\mathcal X} - A) = (R((\lambda\,\mathrm{id}_{\mathcal X} - A)^*))^{\perp_h} = (R(\lambda\,\mathrm{id}_{\mathcal X} - A))^{\perp_h} = \mathcal X^{\perp_h} = \{\vartheta_{\mathcal X}\}$.
"Necessity" Let $N(\lambda\,\mathrm{id}_{\mathcal X} - A) = \{\vartheta_{\mathcal X}\}$. By Proposition 13.22, $(R(\lambda\,\mathrm{id}_{\mathcal X} - A))^{\perp_h} = (R((\lambda\,\mathrm{id}_{\mathcal X} - A)^*))^{\perp_h} = N(\lambda\,\mathrm{id}_{\mathcal X} - A) = \{\vartheta_{\mathcal X}\}$. This further implies that $\overline{R(\lambda\,\mathrm{id}_{\mathcal X} - A)} = ((R(\lambda\,\mathrm{id}_{\mathcal X} - A))^{\perp_h})^{\perp_h} = \{\vartheta_{\mathcal X}\}^{\perp_h} = \mathcal X$. Fix any $x_0 \in \mathcal X = \overline{R(\lambda\,\mathrm{id}_{\mathcal X} - A)}$. Then, there exists $(x_i)_{i=1}^\infty \subseteq \mathcal X$ such that $\lim_{i\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)x_i = x_0$.
We will distinguish two exhaustive and mutually exclusive cases: Case 1: $\exists r \in [0,\infty) \subset \mathbb R$ such that $(x_i)_{i=1}^\infty \subseteq \overline B_{\mathcal X}(\vartheta_{\mathcal X}, r)$; Case 2: $\forall r \in [0,\infty) \subset \mathbb R$, $\exists i \in \mathbb N$ such that $\|x_i\| > r$.
Case 1: $\exists r \in [0,\infty) \subset \mathbb R$ such that $(x_i)_{i=1}^\infty \subseteq \overline B_{\mathcal X}(\vartheta_{\mathcal X}, r)$. By $A \in K(\mathcal X, \mathcal X)$ and Definition 13.49, there exists a subsequence $(x_{i_k})_{k=1}^\infty$ such that $(Ax_{i_k})_{k=1}^\infty \subseteq \mathcal X$ converges to $\bar x \in \mathcal X$. Then, we have $\lim_{k\in\mathbb N}\lambda x_{i_k} = \lim_{k\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)(x_{i_k}) + \lim_{k\in\mathbb N}Ax_{i_k} = x_0 + \bar x$ and $\lim_{k\in\mathbb N}x_{i_k} = \frac1\lambda(x_0 + \bar x) =: \bar x_0$. Then, $(\lambda\,\mathrm{id}_{\mathcal X} - A)(\bar x_0) = \lim_{k\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)(x_{i_k}) = \lim_{k\in\mathbb N}\lambda x_{i_k} - \lim_{k\in\mathbb N}Ax_{i_k} = \lambda\bar x_0 - \bar x = x_0$. Hence, $x_0 \in R(\lambda\,\mathrm{id}_{\mathcal X} - A)$.
Case 2: $\forall r \in [0,\infty) \subset \mathbb R$, $\exists i \in \mathbb N$ such that $\|x_i\| > r$. By possibly further restricting to a subsequence if necessary, we may assume, without loss of generality, that $\lim_{i\in\mathbb N}\|x_i\| =: \lim_{i\in\mathbb N}r_i = \infty$. Then, we have $\lim_{i\in\mathbb N}\frac{1}{r_i}(\lambda\,\mathrm{id}_{\mathcal X} - A)x_i = \lim_{i\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)(\frac{1}{r_i}x_i) = \vartheta_{\mathcal X}$. Note that $\|\frac{1}{r_i}x_i\| = 1$, $\forall i \in \mathbb N$. By $A \in$


$K(\mathcal X, \mathcal X)$ and Definition 13.49, there exists a subsequence $(\frac{1}{r_{i_k}}x_{i_k})_{k=1}^\infty$ such that $\lim_{k\in\mathbb N}A(\frac{1}{r_{i_k}}x_{i_k}) = \bar x \in \mathcal X$. Then, $\lim_{k\in\mathbb N}\lambda\frac{1}{r_{i_k}}x_{i_k} = \lim_{k\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)(\frac{1}{r_{i_k}}x_{i_k}) + \lim_{k\in\mathbb N}A(\frac{1}{r_{i_k}}x_{i_k}) = \bar x$. This implies that $\lim_{k\in\mathbb N}\frac{1}{r_{i_k}}x_{i_k} = \frac1\lambda\bar x$ and $A(\frac1\lambda\bar x) = \lim_{k\in\mathbb N}A(\frac{1}{r_{i_k}}x_{i_k}) = \bar x$. We can conclude that $\bar x \in N(\lambda\,\mathrm{id}_{\mathcal X} - A) = \{\vartheta_{\mathcal X}\}$. Hence, $\bar x = \vartheta_{\mathcal X}$. But $\|\frac1\lambda\bar x\| = \lim_{k\in\mathbb N}\|\frac{1}{r_{i_k}}x_{i_k}\| = 1 > 0$. This is a contradiction! Hence, this case is impossible.
In the above, we have shown that $x_0 \in R(\lambda\,\mathrm{id}_{\mathcal X} - A)$. By the arbitrariness of $x_0$, we have $R(\lambda\,\mathrm{id}_{\mathcal X} - A) = \mathcal X$. This completes the proof of (i).
(ii) Let $N(\lambda\,\mathrm{id}_{\mathcal X} - A) =: M$. If $M = \{\vartheta_{\mathcal X}\}$, then by (i) we have $R(\lambda\,\mathrm{id}_{\mathcal X} - A) = \mathcal X = M^{\perp_h}$. This case is proved. If $M \supset \{\vartheta_{\mathcal X}\}$, then $\lambda$ is an eigenvalue of $A$. By the Spectral Theory Theorem 13.52, $M$ is finite-dimensional. By Proposition 13.22, we have $(R(\lambda\,\mathrm{id}_{\mathcal X} - A))^{\perp_h} = N(\lambda\,\mathrm{id}_{\mathcal X} - A) = M$. Thus, by Proposition 13.17, we have $\overline{R(\lambda\,\mathrm{id}_{\mathcal X} - A)} = M^{\perp_h}$. Fix any $x_0 \in M^{\perp_h}$. Then, there exists $(\tilde x_i)_{i=1}^\infty \subseteq \mathcal X$ such that $\lim_{i\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)\tilde x_i = x_0$. Note that $M$ being finite-dimensional implies that $M$ is closed, by Theorem 7.36 and Proposition 4.39. Let $P : \mathcal X \to M$ and $\bar P : \mathcal X \to M^{\perp_h}$ be the projection operators. By Theorem 13.19, we have $P + \bar P = \mathrm{id}_{\mathcal X}$. Then, $x_0 = \lim_{i\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)\tilde x_i = \lim_{i\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)(P\tilde x_i + \bar P\tilde x_i) = \lim_{i\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)(\bar P\tilde x_i)$. Hence, the sequence $(\hat x_i)_{i=1}^\infty := (\bar P\tilde x_i)_{i=1}^\infty \subseteq M^{\perp_h}$ satisfies $x_0 = \lim_{i\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)\hat x_i$. We will distinguish two exhaustive and mutually exclusive cases: Case 1: $\exists r \in [0,\infty) \subset \mathbb R$ such that $(\hat x_i)_{i=1}^\infty \subseteq \overline B_{\mathcal X}(\vartheta_{\mathcal X}, r)$; Case 2: $\forall r \in [0,\infty) \subset \mathbb R$, there exists $i_0 \in \mathbb N$ such that $\|\hat x_{i_0}\| > r$.
Case 1: $\exists r \in [0,\infty) \subset \mathbb R$ such that $(\hat x_i)_{i=1}^\infty \subseteq \overline B_{\mathcal X}(\vartheta_{\mathcal X}, r)$. By $A \in K(\mathcal X, \mathcal X)$, there exists a subsequence $(\hat x_{i_k})_{k=1}^\infty$ such that $\lim_{k\in\mathbb N}A\hat x_{i_k} = \bar x \in \mathcal X$. Then, we have $\lim_{k\in\mathbb N}\lambda\hat x_{i_k} = \lim_{k\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)\hat x_{i_k} + \lim_{k\in\mathbb N}A\hat x_{i_k} = x_0 + \bar x =: \lambda\hat x_0$. This implies that $(\lambda\,\mathrm{id}_{\mathcal X} - A)\hat x_0 = \lim_{k\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)\hat x_{i_k} = x_0 \in R(\lambda\,\mathrm{id}_{\mathcal X} - A)$.
Case 2: $\forall r \in [0,\infty) \subset \mathbb R$, there exists $i_0 \in \mathbb N$ such that $\|\hat x_{i_0}\| > r$. By possibly further restricting to a subsequence, we may, without loss of generality, assume that $\lim_{i\in\mathbb N}\|\hat x_i\| =: \lim_{i\in\mathbb N}r_i = \infty$. Then, we have $\lim_{i\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)(\frac{1}{r_i}\hat x_i) = \vartheta_{\mathcal X}$. By $A \in K(\mathcal X, \mathcal X)$ and Definition 13.49, there exists a subsequence $(\frac{1}{r_{i_k}}\hat x_{i_k})_{k=1}^\infty \subseteq M^{\perp_h}$ such that $\lim_{k\in\mathbb N}A(\frac{1}{r_{i_k}}\hat x_{i_k}) =: \bar x \in \mathcal X$. This implies that $\lim_{k\in\mathbb N}\lambda(\frac{1}{r_{i_k}}\hat x_{i_k}) = \lim_{k\in\mathbb N}(\lambda\,\mathrm{id}_{\mathcal X} - A)(\frac{1}{r_{i_k}}\hat x_{i_k}) + \lim_{k\in\mathbb N}A(\frac{1}{r_{i_k}}\hat x_{i_k}) = \vartheta_{\mathcal X} + \bar x = \bar x$ and $\lim_{k\in\mathbb N}\frac{1}{r_{i_k}}\hat x_{i_k} = \frac1\lambda\bar x \in M^{\perp_h}$. Furthermore, $\|\frac1\lambda\bar x\| = \lim_{k\in\mathbb N}\|\frac{1}{r_{i_k}}\hat x_{i_k}\| = 1$, so $\bar x \ne \vartheta_{\mathcal X}$, and $A\bar x = \lim_{k\in\mathbb N}A(\lambda\frac{1}{r_{i_k}}\hat x_{i_k}) = \lambda\lim_{k\in\mathbb N}A(\frac{1}{r_{i_k}}\hat x_{i_k}) = \lambda\bar x$. This implies that $\bar x$ is an eigenvector of $A$ associated with the eigenvalue $\lambda$, and $\bar x \in M$. Thus, we have $\bar x \in M \cap M^{\perp_h}$ and hence $\bar x = \vartheta_{\mathcal X}$. This is a contradiction with $\bar x \ne \vartheta_{\mathcal X}$. Thus, this case is impossible.


Based on the preceding analysis, we have $x_0 \in R(\lambda\,\mathrm{id}_{\mathcal X} - A)$. By the arbitrariness of $x_0$, we have $M^{\perp_h} = R(\lambda\,\mathrm{id}_{\mathcal X} - A)$. This completes the proof of the proposition. □

Example 13.57 Let $m \in \mathbb N$, $\mathbb R^m$ be endowed with the usual positive cone, $\Omega \in B_B(\mathbb R^m)$ be a nonempty open rectangle, $\mathcal X := ((\Omega, |\cdot|), B, \mu)$ be the σ-finite metric measure subspace of $\mathbb R^m$, and $\mathcal Y$ be a separable Hilbert space over $\mathbb K$. (Note that $\mathcal Y^*$ is separable by the Riesz–Fréchet Theorem 13.15.) Fix $x_0 \in \Omega$. Then, $W_{2,1,x_0}(\mathcal X, \mathcal Y)$ (as defined in Example 12.106) is a Hilbert space over $\mathbb K$ with inner product defined by
\[
\langle u, v\rangle_{W_{2,1,x_0}(\mathcal X,\mathcal Y)}
:= \int_\Omega\langle u(x), v(x)\rangle_{\mathcal Y}\,d\mu(x)
+ \sum_{\substack{J \subseteq \bar J \\ J \ne \emptyset}}\int_{\pi_J(\Omega)}\langle u_{J,x_0}(s), v_{J,x_0}(s)\rangle_{\mathcal Y}\,d\mu_J(s) \in \mathbb K
\]

$\forall u, v \in W_{2,1,x_0}(\mathcal X, \mathcal Y)$, where the $u_{J,x_0}$'s are the stream functions of $u$ with respect to $x_0$, and the $v_{J,x_0}$'s are the stream functions of $v$ with respect to $x_0$. (Notations are as defined in Example 12.106.)
It is easy to show that $\langle\cdot,\cdot\rangle_{W_{2,1,x_0}(\mathcal X,\mathcal Y)}$ defines an inner product on $W_{2,1,x_0}(\mathcal X, \mathcal Y)$. Then, $W_{2,1,x_0}(\mathcal X, \mathcal Y)$ is a pre-Hilbert space. By Example 12.106, it is a Hilbert space. Then, by the Riesz–Fréchet Theorem 13.15, $W_{2,1,x_0}(\mathcal X, \mathcal Y)$ is reflexive and its dual is also a Hilbert space. $\forall f \in (W_{2,1,x_0}(\mathcal X, \mathcal Y))^*$, there exists a unique $u_0 \in W_{2,1,x_0}(\mathcal X, \mathcal Y)$ such that $f(u) = \langle u, u_0\rangle_{W_{2,1,x_0}(\mathcal X,\mathcal Y)}$, $\forall u \in W_{2,1,x_0}(\mathcal X, \mathcal Y)$, and $\|f\|_{(W_{2,1,x_0}(\mathcal X,\mathcal Y))^*} = \|u_0\|_{W_{2,1,x_0}(\mathcal X,\mathcal Y)}$. $f$ is denoted by $u_0^*$.
The tools of this chapter are applicable to $W_{2,1,x_0}(\mathcal X, \mathcal Y)$. %

Chapter 14

Probability Theory

14.1 Fundamental Notions

Definition 14.1 Let $\Omega := (\Omega, B, P)$ be a finite measure space with $P(\Omega) = 1$. Then, it is said to be a probability measure space. Let $\mathcal Y$ be a topological space. A $\mathcal Y$-valued random variable is a $B$-measurable function $x : \Omega \to \mathcal Y$. When $\mathcal Y \subseteq \mathbb R$, we will simply say $x$ is a random variable. When $\mathcal Y \subseteq \mathcal Z$ and $\mathcal Z$ is a separable Banach space over $\mathbb K$, the integral $\int_\Omega x\,dP$ is said to be the expectation of $x$, when it makes sense, which will be denoted by $E(x)$. %

Definition 14.2 Let $(\Omega, B)$ be a measurable space, $\mathcal Y$ be a topological space, $S$ be a set, $F \subseteq B$, $x : \Omega \to \mathcal Y$ be $B$-measurable, and $x_t : \Omega \to \mathcal Y$ be $B$-measurable, $\forall t \in S$. We will denote the σ-algebra generated by $F$ by $\sigma(F) \subseteq B$, denote the σ-algebra generated by $\{E \in B \mid E = x_{\mathrm{inv}}(O), O \in \mathcal O_{\mathcal Y}\}$ by $\sigma(x) \subseteq B$, and denote the σ-algebra generated by $\{E \in B \mid E = x_{t\,\mathrm{inv}}(O), O \in \mathcal O_{\mathcal Y}, t \in S\}$ by $\sigma((x_t)_{t\in S}) \subseteq B$. Clearly, $\sigma(x)$ is the smallest σ-algebra on which $x$ is measurable, and $\sigma((x_t)_{t\in S})$ is the smallest σ-algebra on which the $x_t$'s, $\forall t \in S$, are measurable. %

Proposition 14.3 Let $\Omega := (\Omega, B, P)$ be a probability measure space, $n \in \mathbb N$, and $F_1, \ldots, F_n \in B$. Then,
\[
P\Bigl(\bigcup_{i=1}^n F_i\Bigr) = \sum_{l=1}^n(-1)^{l-1}\sum_{1\le i_1<\cdots<i_l\le n}P\Bigl(\bigcap_{j=1}^l F_{i_j}\Bigr)
\]
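The inclusion–exclusion formula of Proposition 14.3 can be verified exactly on a finite sample space. A minimal sketch with uniform probabilities and arbitrary illustrative events:

```python
import itertools
from fractions import Fraction

# Uniform probability on a 12-point sample space; F1, F2, F3 are events.
P = lambda S: Fraction(len(S), 12)
F = [{0, 1, 2, 3, 4}, {3, 4, 5, 6}, {0, 4, 6, 7, 8}]
n = len(F)

lhs = P(set().union(*F))
rhs = sum(
    (-1) ** (l - 1)
    * sum(P(set.intersection(*(F[i] for i in idx)))
          for idx in itertools.combinations(range(n), l))
    for l in range(1, n + 1)
)
assert lhs == rhs   # both equal 9/12
```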

and $P(E) > 0$, we have $\int_E P(F|B)(\omega)\,d\tilde P(\omega) = P(F|E)P(E)$ by the previous paragraph.
Proposition 14.14 Let $\mathcal X_i := (\Omega_i, B_i, P_i)$ be a probability measure space, $i = 1, 2$, $\mathcal X := \mathcal X_1 \times \mathcal X_2 =: (\Omega, B, P)$ be the product probability measure space, $\mathcal Y$ be a finite-dimensional Banach space over $\mathbb K$, $x : \Omega_1 \times \Omega_2 \to \mathcal Y$ be $B$-measurable, $x \in \bar L_1(\mathcal X, \mathcal Y)$, and $\exists M \in [0,\infty) \subset \mathbb R$ such that $\|x(\omega_1, \omega_2)\|_{\mathcal Y} \le M$, $\forall(\omega_1, \omega_2) \in \Omega_1 \times \Omega_2 = \Omega$. Define $g : \Omega \to \mathcal Y$ by $g(\hat\omega_1, \hat\omega_2) := \int_\Omega x(\hat\omega_1, \omega_2)\,dP(\omega_1, \omega_2)$, $\forall(\hat\omega_1, \hat\omega_2) \in \Omega$. Then, the following statements hold:
(i) The collection of σ-algebras $(\hat B_1, \hat B_2)$ is independent, where $\hat B_i := \{B_1 \times B_2 \in B \mid B_j = \Omega_j, j = 1, 2$ with $j \ne i\}$.
(ii) There exists $g_1 : \Omega_1 \to \mathcal Y$ such that $g(\omega_1, \omega_2) = g_1(\omega_1)$, $\forall(\omega_1, \omega_2) \in \Omega$, $g_1 \in \bar L_1(\mathcal X_1, \mathcal Y)$, and $g \in E(x|\hat B_1)$.
Proof (i) This holds by Proposition 12.21. (ii) Clearly, $g(\hat\omega_1, \hat\omega_2) = g_1(\hat\omega_1)$, $\forall(\hat\omega_1, \hat\omega_2) \in \Omega$, and, $\forall\hat\omega_1 \in \Omega_1$,

\[
g_1(\hat\omega_1) := \int_\Omega x(\hat\omega_1, \omega_2)\,dP(\omega_1, \omega_2)
= \int_{\Omega_2}\int_{\Omega_1}x(\hat\omega_1, \omega_2)\,dP_1(\omega_1)\,dP_2(\omega_2)
= \int_{\Omega_2}x(\hat\omega_1, \omega_2)\Bigl[\int_{\Omega_1}dP_1(\omega_1)\Bigr]dP_2(\omega_2)
= \int_{\Omega_2}x(\hat\omega_1, \omega_2)\,dP_2(\omega_2) \in \mathcal Y
\]

where the first equality follows from Proposition 12.28 and Fubini’s Theorem 12.31 and the assumptions, the second equality follows from Propositions 11.92, the third equality follows from 11.75, and the set membership follows from Proposition 11.92. By Fubini’s Theorem 12.31, g1 is B1 -measurable and g1 ∈ L¯ 1 (X1 , Y).
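On finite product spaces statement (ii) reduces to averaging out the second coordinate, and the defining property of the conditional expectation can be checked directly. A small discrete sketch (the weights and values below are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
p1 = np.array([0.2, 0.5, 0.3])            # P1 on Omega_1 = {0, 1, 2}
p2 = np.array([0.1, 0.4, 0.25, 0.25])     # P2 on Omega_2 = {0, 1, 2, 3}
x = rng.standard_normal((3, 4))           # bounded random variable x(w1, w2)

# g1(w1) = integral of x(w1, .) with respect to P2 (the second coordinate
# is averaged out, as in the displayed computation above).
g1 = x @ p2

# Defining property of E(x | B^_1): integrals over E = E1 x Omega_2 agree.
for E1 in ([0], [1, 2], [0, 1, 2]):
    lhs = sum(p1[a] * p2[b] * x[a, b] for a in E1 for b in range(4))
    rhs = sum(p1[a] * g1[a] for a in E1)
    assert np.isclose(lhs, rhs)
```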


Hence, $g$ is $\hat B_1$-measurable and $g \in \bar L_1(\mathcal X, \mathcal Y)$. $\forall E \in \hat B_1$, we have $E = E_1 \times \Omega_2$, where $E_1 \in B_1$. Then,
\[
\begin{aligned}
\int_E x\,dP &= \int_\Omega x\chi_{E,\Omega}\,dP
= \int_{\Omega_1}\int_{\Omega_2}x(\omega_1, \omega_2)\chi_{E_1,\Omega_1}(\omega_1)\,dP_2(\omega_2)\,dP_1(\omega_1) \\
&= \int_{\Omega_1}\chi_{E_1,\Omega_1}(\omega_1)\int_{\Omega_2}x(\omega_1, \omega_2)\,dP_2(\omega_2)\,dP_1(\omega_1)
= \int_{\Omega_1}\chi_{E_1,\Omega_1}(\omega_1)g_1(\omega_1)\,dP_1(\omega_1) \\
&= \int_{E_1}g_1\,dP_1
= \int_{E_1}g_1(\omega_1)\int_{\Omega_2}dP_2(\omega_2)\,dP_1(\omega_1)
= \int_E g\,dP
\end{aligned}
\]

where the first equality follows from Proposition 11.92; the second equality follows from Fubini's Theorem 12.31; the third equality follows from Proposition 11.92; the fourth equality follows from the expression for $g_1$; and the last equality follows from Fubini's Theorem 12.31. By the arbitrariness of $E$ and Proposition 14.8, we have $g \in E(x|\hat B_1)$. This completes the proof of the proposition. □

Theorem 14.15 (Fundamental Theorem on Modeling) Let $\Lambda$ be a set, $\mathcal X_\alpha := (X_\alpha, B_\alpha, P_\alpha)$ be a probability measure space, $\forall\alpha \in \Lambda$, $F := \{\prod_{\alpha\in\Lambda}B_\alpha \subseteq X := \prod_{\alpha\in\Lambda}X_\alpha \mid B_\alpha \in B_\alpha, \forall\alpha \in \Lambda$, and $B_\alpha = X_\alpha, \forall\alpha \in \Lambda$ except a finite number of $\alpha$'s$\}$, and $\mu : F \to [0,1] \subset \mathbb R_e$ be defined by $\mu(\prod_{\alpha\in\Lambda}B_\alpha) = \prod_{\alpha\in\Lambda}P_\alpha(B_\alpha) = \prod_{\alpha\in\Lambda,\,B_\alpha\subset X_\alpha}P_\alpha(B_\alpha)$, $\forall\prod_{\alpha\in\Lambda}B_\alpha \in F$. Then, $F$ is a semialgebra on $X$ and there exists a unique probability measure space $(X, B := \sigma(F), P)$ such that $P|_F = \mu$. Furthermore, the collection of σ-algebras $(\bar B_\alpha)_{\alpha\in\Lambda}$ is independent, where $\bar B_\alpha := \{\prod_{\lambda\in\Lambda}B_\lambda \in F \mid B_\lambda = X_\lambda, \forall\lambda \in \Lambda$ with $\lambda \ne \alpha\}$, $\forall\alpha \in \Lambda$.

Proof Note that the result holds if $\Lambda$ is a finite set, by Proposition 12.21. We need only to prove the result when $\Lambda$ is not a finite set. We will first show that $F$ is a semialgebra on $X$. Clearly, $\emptyset, X \in F$. $\forall B := \prod_{\alpha\in\Lambda}B_\alpha, C := \prod_{\alpha\in\Lambda}C_\alpha \in F$, $B \cap C = \prod_{\alpha\in\Lambda}(B_\alpha \cap C_\alpha)$. Clearly, $B_\alpha \cap C_\alpha \in B_\alpha$, $\forall\alpha \in \Lambda$, since $B_\alpha, C_\alpha \in B_\alpha$. Furthermore, $B_\alpha \cap C_\alpha = X_\alpha$, $\forall\alpha \in \Lambda$ except a finite number of $\alpha$'s, since $B_\alpha = X_\alpha$, $\forall\alpha \in \Lambda$ except for a finite number of $\alpha$'s, and $C_\alpha = X_\alpha$, $\forall\alpha \in \Lambda$ except for a finite number of $\alpha$'s. Hence, $B \cap C \in F$. Fix any $B := \prod_{\alpha\in\Lambda}B_\alpha \in F$. Then, $B_\alpha \in B_\alpha$, $\forall\alpha \in \Lambda$, and $B_\alpha = X_\alpha$, $\forall\alpha \in \Lambda\setminus\Lambda_B$, where $\Lambda_B = \{\alpha_1, \ldots, \alpha_n\} \subseteq \Lambda$, $n \in \mathbb Z_+$. Then, $X\setminus B = \bigcup_{j=1}^{n}\bigcup_{1\le i_1<\cdots<i_j\le n}\cdots$ … $\hat P_{n,1}(\hat B_{n,1}) \ge \epsilon_0 + \sum_{i=1}^{n}\mu(C_i) = \epsilon_0 + \sum_{i=1}^{n}\hat P_{n,1}(\hat C_{i,n,1})$, where the equalities follow from the fact that $\Lambda_B \subseteq \hat\Lambda_n$ and $\Lambda_i \subseteq \hat\Lambda_n$, $\forall i \in \{1, \ldots, n\}$. Define $\hat H_{n,1} := \hat B_{n,1}\setminus(\bigcup_{i=1}^n\hat C_{i,n,1})$. Then, $\hat P_{n,1}(\hat H_{n,1}) \ge \epsilon_0$. Define $H_{n,1} := B\setminus(\bigcup_{i=1}^nC_i)$. Clearly, $H_{n,1} = \hat H_{n,1}\times\prod_{\alpha\in\Lambda\setminus\hat\Lambda_n}X_\alpha$. Obviously, $H_{n,1} \supseteq H_{n+1,1}$ and $\bigcap_{n=1}^\infty H_{n,1} = \emptyset$.

By Radon–Nikodym Theorem 11.169 and Definition 14.7, $\hat{E}_{n,1}(\chi_{\hat{H}_{n,1},Y_{n,1}} | \tilde{\mathcal{B}}_{\alpha_1,n,1})$ exists, where $\hat{E}_{n,1}$ is with respect to measure $\hat{P}_{n,1}$. Define $g_{n,1} : Y_{n,1} = \prod_{\alpha \in \hat{\Lambda}_n} X_\alpha \to [0,1] \subset \mathbb{R}$ by, $\forall (\hat{\omega}_{\alpha_1}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) \in Y_{n,1}$,

$$g_{n,1}(\hat{\omega}_{\alpha_1}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) = \int_{Y_{n,1}} \chi_{\hat{H}_{n,1},Y_{n,1}}(\hat{\omega}_{\alpha_1}, \omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,1}(\omega_{\alpha_1}, \ldots, \omega_{\alpha_{r(n)}})$$

By Proposition 14.14, $g_{n,1} \in \hat{E}_{n,1}(\chi_{\hat{H}_{n,1},Y_{n,1}} | \tilde{\mathcal{B}}_{\alpha_1,n,1})$, $g_{n,1}(\hat{\omega}_{\alpha_1}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) = \bar{g}_{n,1}(\hat{\omega}_{\alpha_1})$, $\forall (\hat{\omega}_{\alpha_1}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) \in Y_{n,1}$, and $\bar{g}_{n,1} : X_{\alpha_1} \to [0,1] \subset \mathbb{R}$


14 Probability Theory

is $\mathcal{B}_{\alpha_1}$-measurable. Since $\hat{P}_{n,1}(\hat{H}_{n,1}) > \epsilon_0$, then $E_{\alpha_1}(\bar{g}_{n,1}) = \hat{E}_{n,1}(\hat{E}_{n,1}(\chi_{\hat{H}_{n,1},Y_{n,1}} | \tilde{\mathcal{B}}_{\alpha_1,n,1})) = \hat{E}_{n,1}(\chi_{\hat{H}_{n,1},Y_{n,1}}) = \hat{P}_{n,1}(\hat{H}_{n,1}) > \epsilon_0$, where $E_{\alpha_1}$ is with respect to $P_{\alpha_1}$, and the second equality follows from (a) of Proposition 14.11. This leads to the following inequalities:

$$\epsilon_0 < E_{\alpha_1}(\bar{g}_{n,1}) \le P_{\alpha_1}(\{\omega_{\alpha_1} \in X_{\alpha_1} \mid \bar{g}_{n,1}(\omega_{\alpha_1}) > 2^{-1}\epsilon_0\}) + 2^{-1}\epsilon_0 \, P_{\alpha_1}(\{\omega_{\alpha_1} \in X_{\alpha_1} \mid \bar{g}_{n,1}(\omega_{\alpha_1}) \le 2^{-1}\epsilon_0\}) \le P_{\alpha_1}(\{\omega_{\alpha_1} \in X_{\alpha_1} \mid \bar{g}_{n,1}(\omega_{\alpha_1}) > 2^{-1}\epsilon_0\}) + 2^{-1}\epsilon_0$$

Thus, we have $P_{\alpha_1}(\{\omega_{\alpha_1} \in X_{\alpha_1} \mid \bar{g}_{n,1}(\omega_{\alpha_1}) > 2^{-1}\epsilon_0\}) > 2^{-1}\epsilon_0$. Now, since $H_{n,1} \supseteq H_{n+1,1}$, by the definition of $\bar{g}_{n,1}$, Proposition 11.92, and Fubini's Theorem 12.31, we have $0 \le \bar{g}_{n+1,1}(\omega_{\alpha_1}) \le \bar{g}_{n,1}(\omega_{\alpha_1})$, $\forall \omega_{\alpha_1} \in X_{\alpha_1}$ (which requires us to work in the product measure space $\mathcal{Y}_{n+1,1}$ that includes both $\hat{H}_{n,1} \times \prod_{i=r(n)+1}^{r(n+1)} X_{\alpha_i}$ and $\hat{H}_{n+1,1}$). Therefore, we have $P_{\alpha_1}(\Gamma_{n,1}) > 2^{-1}\epsilon_0$, where $\Gamma_{n,1} := \{\omega_{\alpha_1} \in X_{\alpha_1} \mid \bar{g}_{n,1}(\omega_{\alpha_1}) > 2^{-1}\epsilon_0\}$. Furthermore, $\Gamma_{n,1} \supseteq \Gamma_{n+1,1}$. By Proposition 11.5, we have $P_{\alpha_1}(\bigcap_{n=1}^\infty \Gamma_{n,1}) = \lim_{n \in \mathbb{N}} P_{\alpha_1}(\Gamma_{n,1}) \ge 2^{-1}\epsilon_0 > 0$. Then, $\exists \bar{\omega}_{\alpha_1} \in X_{\alpha_1}$ such that $\bar{\omega}_{\alpha_1} \in \bigcap_{n=1}^\infty \Gamma_{n,1}$. This implies that $\bar{g}_{n,1}(\bar{\omega}_{\alpha_1}) > 2^{-1}\epsilon_0$, $\forall n \in \mathbb{N}$. By the definition of $\bar{g}_{n,1}$, we have $\int_{Y_{n,1}} \chi_{\hat{H}_{n,1},Y_{n,1}}(\bar{\omega}_{\alpha_1}, \omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,1}(\omega_{\alpha_1}, \ldots, \omega_{\alpha_{r(n)}}) > 2^{-1}\epsilon_0$, $\forall n \in \mathbb{N}$.

Fix any $n \in \mathbb{N}$. By Proposition 12.21, we have $\mathcal{Y}_{n,2} := \prod_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1\}} \mathcal{X}_\alpha =: (Y_{n,2}, \hat{\mathcal{B}}_{n,2}, \hat{P}_{n,2})$ is a probability measure space, on which the collection of $\sigma$-algebras $(\tilde{\mathcal{B}}_{\alpha,n,2})_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1\}}$ is independent, where $\tilde{\mathcal{B}}_{\alpha,n,2} := \{ \prod_{\bar{\alpha} \in \hat{\Lambda}_n \setminus \{\alpha_1\}} \tilde{B}_{\bar{\alpha}} \in \hat{\mathcal{B}}_{n,2} \mid \tilde{B}_{\bar{\alpha}} = X_{\bar{\alpha}}$, $\forall \bar{\alpha} \in \hat{\Lambda}_n \setminus \{\alpha_1, \alpha\} \}$, $\forall \alpha \in \hat{\Lambda}_n \setminus \{\alpha_1\}$. Let

$$\hat{B}_{n,2} := \begin{cases} \prod_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1\}} B_\alpha & \bar{\omega}_{\alpha_1} \in B_{\alpha_1} \\ \emptyset & \bar{\omega}_{\alpha_1} \notin B_{\alpha_1} \end{cases} \qquad \text{and} \qquad \hat{C}_{i,n,2} := \begin{cases} \prod_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1\}} C_{i,\alpha} & \bar{\omega}_{\alpha_1} \in C_{i,\alpha_1} \\ \emptyset & \bar{\omega}_{\alpha_1} \notin C_{i,\alpha_1} \end{cases}$$

$\forall i \in \{1, \ldots, n\}$. Clearly, $\hat{B}_{n,2} \supseteq \bigcup_{i=1}^n \hat{C}_{i,n,2}$, the sets in the union are pairwise disjoint, and all sets involved are in $\hat{\mathcal{B}}_{n,2}$, by Proposition 12.25. Then, we have

$$\begin{aligned} \hat{P}_{n,2}(\hat{B}_{n,2}) &= \int_{Y_{n,2}} \chi_{\hat{B}_{n,2},Y_{n,2}}(\omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,2}(\omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \\ &= \int_{Y_{n,1}} \chi_{\hat{B}_{n,1},Y_{n,1}}(\bar{\omega}_{\alpha_1}, \omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,1}(\omega_{\alpha_1}, \omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \\ &= \int_{Y_{n,1}} \Big( \chi_{\hat{H}_{n,1},Y_{n,1}} + \sum_{i=1}^n \chi_{\hat{C}_{i,n,1},Y_{n,1}} \Big)(\bar{\omega}_{\alpha_1}, \omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,1}(\omega_{\alpha_1}, \omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \\ &= \int_{Y_{n,1}} \chi_{\hat{H}_{n,1},Y_{n,1}}(\bar{\omega}_{\alpha_1}, \omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,1} + \int_{Y_{n,1}} \sum_{i=1}^n \chi_{\hat{C}_{i,n,1},Y_{n,1}}(\bar{\omega}_{\alpha_1}, \omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,1} \\ &> 2^{-1}\epsilon_0 + \int_{Y_{n,1}} \sum_{i=1}^n \chi_{\hat{C}_{i,n,1},Y_{n,1}}(\bar{\omega}_{\alpha_1}, \omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,1} \\ &= 2^{-1}\epsilon_0 + \int_{Y_{n,2}} \sum_{i=1}^n \chi_{\hat{C}_{i,n,2},Y_{n,2}}(\omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,2} = 2^{-1}\epsilon_0 + \sum_{i=1}^n \hat{P}_{n,2}(\hat{C}_{i,n,2}) \end{aligned}$$

where the first equality follows from Proposition 11.75, the second equality follows from the definition of $\hat{B}_{n,2}$, the fact that $\mathcal{X}_{\alpha_1}$ is a probability measure space, and Fubini's Theorem 12.31, the third equality follows from the fact that $\hat{B}_{n,1} = \hat{H}_{n,1} \cup (\bigcup_{i=1}^n \hat{C}_{i,n,1})$ and the sets in the union are pairwise disjoint, the fourth equality follows from Proposition 11.92, the inequality follows from the conclusion of the previous paragraph, the fifth equality follows from the definition of the $\hat{C}_{i,n,2}$'s and Fubini's Theorem 12.31, and the last equality follows from Proposition 11.75. Thus, we have $\hat{P}_{n,2}(\hat{H}_{n,2}) := \hat{P}_{n,2}(\hat{B}_{n,2} \setminus (\bigcup_{i=1}^n \hat{C}_{i,n,2})) > 2^{-1}\epsilon_0$. Define $H_{n,2} := \{ (\omega_\alpha)_{\alpha \in \Lambda \setminus \{\alpha_1\}} \in \prod_{\alpha \in \Lambda \setminus \{\alpha_1\}} X_\alpha \mid (\hat{\omega}_\alpha)_{\alpha \in \Lambda} \in H_{n,1}$, where $\hat{\omega}_\alpha = \omega_\alpha$, $\forall \alpha \in \Lambda \setminus \{\alpha_1\}$, $\hat{\omega}_{\alpha_1} = \bar{\omega}_{\alpha_1} \}$. It is obvious that $\hat{H}_{n,2} = \{ (\omega_\alpha)_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1\}} \in \prod_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1\}} X_\alpha \mid (\hat{\omega}_\alpha)_{\alpha \in \hat{\Lambda}_n} \in \hat{H}_{n,1}$, where $\hat{\omega}_\alpha = \omega_\alpha$, $\forall \alpha \in \hat{\Lambda}_n \setminus \{\alpha_1\}$, $\hat{\omega}_{\alpha_1} = \bar{\omega}_{\alpha_1} \}$. Clearly, $H_{n,2} = \hat{H}_{n,2} \times \prod_{\alpha \in \Lambda \setminus \hat{\Lambda}_n} X_\alpha$, $H_{n,2} \supseteq H_{n+1,2}$, and $\bigcap_{n=1}^\infty H_{n,2} = \emptyset$. Obviously, $\hat{H}_{n,2} \in \hat{\mathcal{B}}_{n,2}$.

By Radon–Nikodym Theorem 11.169 and Definition 14.7, $\hat{E}_{n,2}(\chi_{\hat{H}_{n,2},Y_{n,2}} | \tilde{\mathcal{B}}_{\alpha_2,n,2})$ exists, where $\hat{E}_{n,2}$ is with respect to measure $\hat{P}_{n,2}$. Define $g_{n,2} : Y_{n,2} = \prod_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1\}} X_\alpha \to [0,1] \subset \mathbb{R}$ by, $\forall (\hat{\omega}_{\alpha_2}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) \in Y_{n,2}$,

$$g_{n,2}(\hat{\omega}_{\alpha_2}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) = \int_{Y_{n,2}} \chi_{\hat{H}_{n,2},Y_{n,2}}(\hat{\omega}_{\alpha_2}, \omega_{\alpha_3}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,2}(\omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}})$$

By Proposition 14.14, $g_{n,2} \in \hat{E}_{n,2}(\chi_{\hat{H}_{n,2},Y_{n,2}} | \tilde{\mathcal{B}}_{\alpha_2,n,2})$, $g_{n,2}(\hat{\omega}_{\alpha_2}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) = \bar{g}_{n,2}(\hat{\omega}_{\alpha_2})$, $\forall (\hat{\omega}_{\alpha_2}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) \in Y_{n,2}$, and $\bar{g}_{n,2} : X_{\alpha_2} \to [0,1] \subset \mathbb{R}$ is $\mathcal{B}_{\alpha_2}$-measurable. Since $\hat{P}_{n,2}(\hat{H}_{n,2}) > 2^{-1}\epsilon_0$, then $E_{\alpha_2}(\bar{g}_{n,2}) =$


$\hat{E}_{n,2}(\hat{E}_{n,2}(\chi_{\hat{H}_{n,2},Y_{n,2}} | \tilde{\mathcal{B}}_{\alpha_2,n,2})) = \hat{E}_{n,2}(\chi_{\hat{H}_{n,2},Y_{n,2}}) = \hat{P}_{n,2}(\hat{H}_{n,2}) > 2^{-1}\epsilon_0$, where $E_{\alpha_2}$ is with respect to $P_{\alpha_2}$, and the second equality follows from (a) of Proposition 14.11. This leads to the following inequalities:

$$2^{-1}\epsilon_0 < E_{\alpha_2}(\bar{g}_{n,2}) \le P_{\alpha_2}(\{\omega_{\alpha_2} \in X_{\alpha_2} \mid \bar{g}_{n,2}(\omega_{\alpha_2}) > 2^{-2}\epsilon_0\}) + 2^{-2}\epsilon_0 \, P_{\alpha_2}(\{\omega_{\alpha_2} \in X_{\alpha_2} \mid \bar{g}_{n,2}(\omega_{\alpha_2}) \le 2^{-2}\epsilon_0\}) \le P_{\alpha_2}(\{\omega_{\alpha_2} \in X_{\alpha_2} \mid \bar{g}_{n,2}(\omega_{\alpha_2}) > 2^{-2}\epsilon_0\}) + 2^{-2}\epsilon_0$$

Thus, we have $P_{\alpha_2}(\{\omega_{\alpha_2} \in X_{\alpha_2} \mid \bar{g}_{n,2}(\omega_{\alpha_2}) > 2^{-2}\epsilon_0\}) > 2^{-2}\epsilon_0$. Now, since $H_{n,2} \supseteq H_{n+1,2}$, by the definition of $\bar{g}_{n,2}$, Proposition 11.92, and Fubini's Theorem 12.31, we have $0 \le \bar{g}_{n+1,2}(\omega_{\alpha_2}) \le \bar{g}_{n,2}(\omega_{\alpha_2})$, $\forall \omega_{\alpha_2} \in X_{\alpha_2}$ (which requires us to work in the product measure space $\mathcal{Y}_{n+1,2}$ that includes both $\hat{H}_{n,2} \times \prod_{i=r(n)+1}^{r(n+1)} X_{\alpha_i}$ and $\hat{H}_{n+1,2}$). Therefore, we have $P_{\alpha_2}(\Gamma_{n,2}) > 2^{-2}\epsilon_0$, where $\Gamma_{n,2} := \{\omega_{\alpha_2} \in X_{\alpha_2} \mid \bar{g}_{n,2}(\omega_{\alpha_2}) > 2^{-2}\epsilon_0\}$. Furthermore, $\Gamma_{n,2} \supseteq \Gamma_{n+1,2}$. By Proposition 11.5, we have $P_{\alpha_2}(\bigcap_{n=1}^\infty \Gamma_{n,2}) = \lim_{n \in \mathbb{N}} P_{\alpha_2}(\Gamma_{n,2}) \ge 2^{-2}\epsilon_0 > 0$. Then, $\exists \bar{\omega}_{\alpha_2} \in X_{\alpha_2}$ such that $\bar{\omega}_{\alpha_2} \in \bigcap_{n=1}^\infty \Gamma_{n,2}$. This implies that $\bar{g}_{n,2}(\bar{\omega}_{\alpha_2}) > 2^{-2}\epsilon_0$, $\forall n \in \mathbb{N}$. By the definition of $\bar{g}_{n,2}$, we have $\int_{Y_{n,2}} \chi_{\hat{H}_{n,2},Y_{n,2}}(\bar{\omega}_{\alpha_2}, \omega_{\alpha_3}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,2}(\omega_{\alpha_2}, \ldots, \omega_{\alpha_{r(n)}}) > 2^{-2}\epsilon_0$, $\forall n \in \mathbb{N}$.

Recursively, assume we have completed index $l \in \mathbb{N}$. Fix any $n \in \mathbb{N}$. By Proposition 12.21, we have $\mathcal{Y}_{n,l+1} := \prod_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l\}} \mathcal{X}_\alpha =: (Y_{n,l+1}, \hat{\mathcal{B}}_{n,l+1}, \hat{P}_{n,l+1})$ is a probability measure space, on which the collection of $\sigma$-algebras $(\tilde{\mathcal{B}}_{\alpha,n,l+1})_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l\}}$ is independent, where $\tilde{\mathcal{B}}_{\alpha,n,l+1} := \{ \prod_{\bar{\alpha} \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l\}} \tilde{B}_{\bar{\alpha}} \in \hat{\mathcal{B}}_{n,l+1} \mid \tilde{B}_{\bar{\alpha}} = X_{\bar{\alpha}}$, $\forall \bar{\alpha} \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l, \alpha\} \}$, $\forall \alpha \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l\}$. Let

$$\hat{B}_{n,l+1} := \begin{cases} \prod_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l\}} B_\alpha & \bar{\omega}_{\alpha_j} \in B_{\alpha_j}, \; j = 1, \ldots, l \\ \emptyset & \exists j \in \{1, \ldots, l\} \cdot \bar{\omega}_{\alpha_j} \notin B_{\alpha_j} \end{cases}$$

and

$$\hat{C}_{i,n,l+1} := \begin{cases} \prod_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l\}} C_{i,\alpha} & \bar{\omega}_{\alpha_j} \in C_{i,\alpha_j}, \; j = 1, \ldots, l \\ \emptyset & \exists j \in \{1, \ldots, l\} \cdot \bar{\omega}_{\alpha_j} \notin C_{i,\alpha_j} \end{cases}$$

$\forall i \in \{1, \ldots, n\}$. Clearly, $\hat{B}_{n,l+1} \supseteq \bigcup_{i=1}^n \hat{C}_{i,n,l+1}$, the sets in the union are pairwise disjoint, and all sets involved are in $\hat{\mathcal{B}}_{n,l+1}$, by Proposition 12.25. Then, we have

$$\begin{aligned} \hat{P}_{n,l+1}(\hat{B}_{n,l+1}) &= \int_{Y_{n,l+1}} \chi_{\hat{B}_{n,l+1},Y_{n,l+1}}(\omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,l+1}(\omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) \\ &= \int_{Y_{n,l}} \chi_{\hat{B}_{n,l},Y_{n,l}}(\bar{\omega}_{\alpha_l}, \omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,l}(\omega_{\alpha_l}, \omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) \\ &= \int_{Y_{n,l}} \Big( \chi_{\hat{H}_{n,l},Y_{n,l}} + \sum_{i=1}^n \chi_{\hat{C}_{i,n,l},Y_{n,l}} \Big)(\bar{\omega}_{\alpha_l}, \omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,l}(\omega_{\alpha_l}, \omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) \\ &= \int_{Y_{n,l}} \chi_{\hat{H}_{n,l},Y_{n,l}}(\bar{\omega}_{\alpha_l}, \omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,l} + \int_{Y_{n,l}} \sum_{i=1}^n \chi_{\hat{C}_{i,n,l},Y_{n,l}}(\bar{\omega}_{\alpha_l}, \omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,l} \\ &> 2^{-l}\epsilon_0 + \int_{Y_{n,l}} \sum_{i=1}^n \chi_{\hat{C}_{i,n,l},Y_{n,l}}(\bar{\omega}_{\alpha_l}, \omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,l} \\ &= 2^{-l}\epsilon_0 + \int_{Y_{n,l+1}} \sum_{i=1}^n \chi_{\hat{C}_{i,n,l+1},Y_{n,l+1}}(\omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,l+1} = 2^{-l}\epsilon_0 + \sum_{i=1}^n \hat{P}_{n,l+1}(\hat{C}_{i,n,l+1}) \end{aligned}$$

where the first equality follows from Proposition 11.75, the second equality follows from the definition of $\hat{B}_{n,l+1}$, the fact that $\mathcal{X}_{\alpha_l}$ is a probability measure space, and Fubini's Theorem 12.31, the third equality follows from the fact that $\hat{B}_{n,l} = \hat{H}_{n,l} \cup (\bigcup_{i=1}^n \hat{C}_{i,n,l})$ and the sets in the union are pairwise disjoint, the fourth equality follows from Proposition 11.92, the inequality follows from the conclusion of the previous paragraph, the fifth equality follows from the definition of the $\hat{C}_{i,n,l+1}$'s and Fubini's Theorem 12.31, and the last equality follows from Proposition 11.75. Thus, we have $\hat{P}_{n,l+1}(\hat{H}_{n,l+1}) := \hat{P}_{n,l+1}(\hat{B}_{n,l+1} \setminus (\bigcup_{i=1}^n \hat{C}_{i,n,l+1})) > 2^{-l}\epsilon_0$. Define $H_{n,l+1} := \{ (\omega_\alpha)_{\alpha \in \Lambda \setminus \{\alpha_1, \ldots, \alpha_l\}} \in \prod_{\alpha \in \Lambda \setminus \{\alpha_1, \ldots, \alpha_l\}} X_\alpha \mid (\hat{\omega}_\alpha)_{\alpha \in \Lambda} \in H_{n,1}$, where $\hat{\omega}_\alpha = \omega_\alpha$, $\forall \alpha \in \Lambda \setminus \{\alpha_1, \ldots, \alpha_l\}$, $\hat{\omega}_{\alpha_j} = \bar{\omega}_{\alpha_j}$, $j = 1, \ldots, l \}$. It is obvious that $\hat{H}_{n,l+1} = \{ (\omega_\alpha)_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l\}} \in \prod_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l\}} X_\alpha \mid (\hat{\omega}_\alpha)_{\alpha \in \hat{\Lambda}_n} \in \hat{H}_{n,1}$, where $\hat{\omega}_\alpha = \omega_\alpha$, $\forall \alpha \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l\}$, $\hat{\omega}_{\alpha_j} = \bar{\omega}_{\alpha_j}$, $j = 1, \ldots, l \}$. Clearly, $H_{n,l+1} = \hat{H}_{n,l+1} \times \prod_{\alpha \in \Lambda \setminus \hat{\Lambda}_n} X_\alpha$, $H_{n,l+1} \supseteq H_{n+1,l+1}$, and $\bigcap_{n=1}^\infty H_{n,l+1} = \emptyset$. Obviously, $\hat{H}_{n,l+1} \in \hat{\mathcal{B}}_{n,l+1}$.


By Radon–Nikodym Theorem 11.169 and Definition 14.7, $\hat{E}_{n,l+1}(\chi_{\hat{H}_{n,l+1},Y_{n,l+1}} | \tilde{\mathcal{B}}_{\alpha_{l+1},n,l+1})$ exists, where $\hat{E}_{n,l+1}$ is with respect to measure $\hat{P}_{n,l+1}$. Define $g_{n,l+1} : Y_{n,l+1} = \prod_{\alpha \in \hat{\Lambda}_n \setminus \{\alpha_1, \ldots, \alpha_l\}} X_\alpha \to [0,1] \subset \mathbb{R}$ by, $\forall (\hat{\omega}_{\alpha_{l+1}}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) \in Y_{n,l+1}$,

$$g_{n,l+1}(\hat{\omega}_{\alpha_{l+1}}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) = \int_{Y_{n,l+1}} \chi_{\hat{H}_{n,l+1},Y_{n,l+1}}(\hat{\omega}_{\alpha_{l+1}}, \omega_{\alpha_{l+2}}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,l+1}(\omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}})$$

By Proposition 14.14, $g_{n,l+1} \in \hat{E}_{n,l+1}(\chi_{\hat{H}_{n,l+1},Y_{n,l+1}} | \tilde{\mathcal{B}}_{\alpha_{l+1},n,l+1})$, $g_{n,l+1}(\hat{\omega}_{\alpha_{l+1}}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) = \bar{g}_{n,l+1}(\hat{\omega}_{\alpha_{l+1}})$, $\forall (\hat{\omega}_{\alpha_{l+1}}, \ldots, \hat{\omega}_{\alpha_{r(n)}}) \in Y_{n,l+1}$, and $\bar{g}_{n,l+1} : X_{\alpha_{l+1}} \to [0,1] \subset \mathbb{R}$ is $\mathcal{B}_{\alpha_{l+1}}$-measurable. Since $\hat{P}_{n,l+1}(\hat{H}_{n,l+1}) > 2^{-l}\epsilon_0$, then $E_{\alpha_{l+1}}(\bar{g}_{n,l+1}) = \hat{E}_{n,l+1}(\hat{E}_{n,l+1}(\chi_{\hat{H}_{n,l+1},Y_{n,l+1}} | \tilde{\mathcal{B}}_{\alpha_{l+1},n,l+1})) = \hat{E}_{n,l+1}(\chi_{\hat{H}_{n,l+1},Y_{n,l+1}}) = \hat{P}_{n,l+1}(\hat{H}_{n,l+1}) > 2^{-l}\epsilon_0$, where $E_{\alpha_{l+1}}$ is with respect to $P_{\alpha_{l+1}}$, and the second equality follows from (a) of Proposition 14.11. This leads to the following inequalities:

$$2^{-l}\epsilon_0 < E_{\alpha_{l+1}}(\bar{g}_{n,l+1}) \le P_{\alpha_{l+1}}(\{\omega_{\alpha_{l+1}} \in X_{\alpha_{l+1}} \mid \bar{g}_{n,l+1}(\omega_{\alpha_{l+1}}) > 2^{-l-1}\epsilon_0\}) + 2^{-l-1}\epsilon_0 \, P_{\alpha_{l+1}}(\{\omega_{\alpha_{l+1}} \in X_{\alpha_{l+1}} \mid \bar{g}_{n,l+1}(\omega_{\alpha_{l+1}}) \le 2^{-l-1}\epsilon_0\}) \le P_{\alpha_{l+1}}(\{\omega_{\alpha_{l+1}} \in X_{\alpha_{l+1}} \mid \bar{g}_{n,l+1}(\omega_{\alpha_{l+1}}) > 2^{-l-1}\epsilon_0\}) + 2^{-l-1}\epsilon_0$$

Thus, we have $P_{\alpha_{l+1}}(\{\omega_{\alpha_{l+1}} \in X_{\alpha_{l+1}} \mid \bar{g}_{n,l+1}(\omega_{\alpha_{l+1}}) > 2^{-l-1}\epsilon_0\}) > 2^{-l-1}\epsilon_0$. Now, since $H_{n,l+1} \supseteq H_{n+1,l+1}$, by the definition of $\bar{g}_{n,l+1}$, Proposition 11.92, and Fubini's Theorem 12.31, we have $0 \le \bar{g}_{n+1,l+1}(\omega_{\alpha_{l+1}}) \le \bar{g}_{n,l+1}(\omega_{\alpha_{l+1}})$, $\forall \omega_{\alpha_{l+1}} \in X_{\alpha_{l+1}}$ (which requires us to work in the product measure space $\mathcal{Y}_{n+1,l+1}$ that includes both $\hat{H}_{n,l+1} \times \prod_{i=r(n)+1}^{r(n+1)} X_{\alpha_i}$ and $\hat{H}_{n+1,l+1}$). Therefore, we have $P_{\alpha_{l+1}}(\Gamma_{n,l+1}) > 2^{-l-1}\epsilon_0$, where $\Gamma_{n,l+1} := \{\omega_{\alpha_{l+1}} \in X_{\alpha_{l+1}} \mid \bar{g}_{n,l+1}(\omega_{\alpha_{l+1}}) > 2^{-l-1}\epsilon_0\}$. Furthermore, $\Gamma_{n,l+1} \supseteq \Gamma_{n+1,l+1}$. By Proposition 11.5, we have $P_{\alpha_{l+1}}(\bigcap_{n=1}^\infty \Gamma_{n,l+1}) = \lim_{n \in \mathbb{N}} P_{\alpha_{l+1}}(\Gamma_{n,l+1}) \ge 2^{-l-1}\epsilon_0 > 0$. Then, $\exists \bar{\omega}_{\alpha_{l+1}} \in X_{\alpha_{l+1}}$ such that $\bar{\omega}_{\alpha_{l+1}} \in \bigcap_{n=1}^\infty \Gamma_{n,l+1}$. This implies that $\bar{g}_{n,l+1}(\bar{\omega}_{\alpha_{l+1}}) > 2^{-l-1}\epsilon_0$, $\forall n \in \mathbb{N}$. By the definition of $\bar{g}_{n,l+1}$, we have $\int_{Y_{n,l+1}} \chi_{\hat{H}_{n,l+1},Y_{n,l+1}}(\bar{\omega}_{\alpha_{l+1}}, \omega_{\alpha_{l+2}}, \ldots, \omega_{\alpha_{r(n)}}) \, d\hat{P}_{n,l+1}(\omega_{\alpha_{l+1}}, \ldots, \omega_{\alpha_{r(n)}}) > 2^{-l-1}\epsilon_0$, $\forall n \in \mathbb{N}$.

Thus, we may obtain $\bar{\omega}_{\alpha_i} \in \bigcap_{n=1}^\infty \Gamma_{n,i} \subseteq X_{\alpha_i}$, $\forall i \in \mathbb{N}$ with $\alpha_i \in \hat{\Lambda} := \bigcup_{n=1}^\infty \hat{\Lambda}_n$. By Axiom of Choice, there exists $\bar{\omega} \in X = \prod_{\alpha \in \Lambda} X_\alpha$ such that $\pi_{\alpha_i}(\bar{\omega}) = \bar{\omega}_{\alpha_i}$, $\forall \alpha_i \in \hat{\Lambda}$. Fix any $n \in \mathbb{N}$. $\bar{\omega}_{\alpha_{r(n)}} \in \bigcap_{m=1}^\infty \Gamma_{m,r(n)}$. This implies that $\bar{\omega}_{\alpha_{r(n)}} \in \Gamma_{n,r(n)} = \{\omega_{\alpha_{r(n)}} \in X_{\alpha_{r(n)}} \mid \bar{g}_{n,r(n)}(\omega_{\alpha_{r(n)}}) > 2^{-r(n)}\epsilon_0\}$, and $\bar{g}_{n,r(n)}(\bar{\omega}_{\alpha_{r(n)}}) > 2^{-r(n)}\epsilon_0 > 0$. By definition of $\bar{g}_{n,r(n)}$, we have $0 < \bar{g}_{n,r(n)}(\bar{\omega}_{\alpha_{r(n)}}) =$

$\int_{Y_{n,r(n)}} \chi_{\hat{H}_{n,r(n)},Y_{n,r(n)}}(\bar{\omega}_{\alpha_{r(n)}}) \, d\hat{P}_{n,r(n)}(\omega_{\alpha_{r(n)}}) = \chi_{\hat{H}_{n,r(n)},X_{\alpha_{r(n)}}}(\bar{\omega}_{\alpha_{r(n)}}) = 1$, where the last equality follows since it is the value of an indicator function. Hence, $\bar{\omega}_{\alpha_{r(n)}} \in \hat{H}_{n,r(n)}$. By the definition of $\hat{H}_{n,r(n)}$, we have $(\bar{\omega}_{\alpha_i})_{\alpha_i \in \hat{\Lambda}_n} \in \hat{H}_{n,1}$. This yields $\bar{\omega} \in H_{n,1}$. By the arbitrariness of $n$, we have $\bar{\omega} \in \bigcap_{n=1}^\infty H_{n,1} = \emptyset$. This is a contradiction. Hence, the hypothesis does not hold. We must have $\mu(B) \le \sum_{i=1}^\infty \mu(C_i)$. Hence, (ii) of Proposition 11.32 holds. By Proposition 11.32, $\mu$ admits a unique extension to a measure $\bar{\mu}$ on the algebra $\mathcal{A}$ on $X$ generated by $\mathcal{F}$. Clearly, $\bar{\mu}(X) = \mu(X) = 1$, since $X \in \mathcal{F}$. Then, $\bar{\mu}$ is finite. By Carathéodory Extension Theorem 11.19, there exists a unique measure $P$ on the measurable space $(X, \sigma(\mathcal{F}))$ that is an extension of $\bar{\mu}$. Then, we have a probability measure space $(X, \mathcal{B} := \sigma(\mathcal{F}), P)$ as we seek. It is straightforward to prove that the collection of $\sigma$-algebras $(\bar{\mathcal{B}}_\alpha)_{\alpha \in \Lambda}$ is independent by Definition 14.5. This completes the proof of the theorem. ∎

Definition 14.16 Let $\Omega := (\Omega, \mathcal{B}, P)$ be a probability measure space, $m \in \mathbb{N}$, $x : \Omega \to \mathbb{R}^m$ be an $\mathbb{R}^m$-valued random variable. Define a mapping $\mathcal{L}_x : \mathcal{B}_B(\mathbb{R}^m) \to [0,1] \subset \mathbb{R}$ by $\mathcal{L}_x(E) := P(x_{\mathrm{inv}}(E))$, $\forall E \in \mathcal{B}_B(\mathbb{R}^m)$. Then, $\mathcal{L}_x$ is a probability measure on the measurable space $(\mathbb{R}^m, \mathcal{B}_B(\mathbb{R}^m))$ and is said to be the law of $x$. Since $\mathcal{L}_x$ is a probability measure, we may define $F : \mathbb{R}^m \to [0,1] \subset \mathbb{R}$ by $F(z) = \mathcal{L}_x(\{\bar{x} \in \mathbb{R}^m \mid \bar{x} \le z\})$, $\forall z \in \mathbb{R}^m$. Then, by Proposition 12.51, $F$ is a cumulative distribution function of $\mathcal{L}_x$. If $\mathcal{L}_x \ll \mu_{Bm}$, we will let $f = \frac{d\mathcal{L}_x}{d\mu_{Bm}}$ a.e. in $\mathbb{R}^m$ and say $f$ is the probability density function of $\mathcal{L}_x$. %

Clearly, the probability density function $f$ is determined almost everywhere on $\mathbb{R}^m$, if it exists, and is a version of $\frac{d\mathcal{L}_x}{d\mu_{Bm}}$. If the probability density function exists, then, $\forall E \in \mathcal{B}_B(\mathbb{R}^m)$, we have $P(x_{\mathrm{inv}}(E)) = \mathcal{L}_x(E) = \int_E f \, d\mu_{Bm}$.
The relationship between the probability density function $f$ and the cumulative distribution function $F$ is that, $\forall s_1, s_2 \in \mathbb{R}^m$ with $s_1 \le s_2$, we have $P(x_{\mathrm{inv}}(r_{s_1,s_2})) = \mathcal{L}_x(r_{s_1,s_2}) = \int_{r_{s_1,s_2}} f \, d\mu_{Bm} = \Delta F(r_{s_1,s_2})$, and $F(z) = \lim_{k \to -\infty} \int_{r_{k\mathbf{1},z}} f(x) \, d\mu_{Bm}(x)$, $\forall z \in \mathbb{R}^m$.

Example 14.17 Let $\mathcal{I} := (((0,1), |\cdot|), \mathcal{B}, \mu)$ be the finite metric measure subspace of $\mathbb{R}$. Let $x : (0,1) \to [0,1] \subset \mathbb{R}$ be a random variable given by $x(\omega) = \omega$, $\forall \omega \in \mathcal{I}$. Given any cumulative distribution function $F : \mathbb{R} \to [0,1] \subset \mathbb{R}$ with $F$ being of bounded variation, $\lim_{z \to -\infty} F(z) = 0$, $\lim_{z \to \infty} F(z) = 1$, and $T_F = 1$, we seek a function $h : (0,1) \to \mathbb{R}$ such that the random variable $y := h \circ x : \mathcal{I} \to \mathbb{R}$ admits $F$ as a cumulative distribution function of $\mathcal{L}_y$, which is a probability measure on the measurable space $(\mathbb{R}, \mathcal{B}_B(\mathbb{R}))$. Since $T_F = 1$, $\lim_{z \to -\infty} F(z) = 0$, and $\lim_{z \to \infty} F(z) = 1$, then $\forall z_1, z_2 \in \mathbb{R}$ with $z_1 \le z_2$, we have $\Delta F(r_{z_1,z_2}) \ge 0$ and


hence $F(z_1) \le F(z_2)$. Since $F$ is of bounded variation, then $F$ is continuous on the right. We select $h$ to be $h(\alpha) = \inf\{z \in \mathbb{R} \mid F(z) \ge \alpha\} \in \mathbb{R}$, $\forall \alpha \in \mathcal{I}$. Then, it is straightforward to prove that $\mu(\{\omega \in \mathcal{I} \mid y(\omega) \le z\}) = \mu(\{\omega \in \mathcal{I} \mid h(x(\omega)) \le z\}) = \mu(\{\omega \in \mathcal{I} \mid \omega \le F(z)\}) = F(z)$. Hence, the random variable $y = h(x)$ admits $F$ as a cumulative distribution function for its law $\mathcal{L}_y$. Thus, for any cumulative distribution function $F : \mathbb{R} \to [0,1] \subset \mathbb{R}$ satisfying the stated assumptions, there exists a probability measure space $\mathcal{I}$ and a random variable $y : \mathcal{I} \to \mathbb{R}$ such that $y$ admits $F$ as a cumulative distribution function for its law $\mathcal{L}_y$. %

Proposition 14.18 Let $\Omega := (\Omega, \mathcal{B}, P)$ be a probability measure space, $x_i : \Omega \to \mathbb{R}^{n_i}$ be an $\mathbb{R}^{n_i}$-valued random variable, $n_i \in \mathbb{N}$, $i = 1, 2$, and $x := (x_1, x_2)$ be an $\mathbb{R}^n$-valued random variable, $n := n_1 + n_2$, with probability density function $f_x : \mathbb{R}^n \to [0,\infty) \subset \mathbb{R}$. Then, $x_1$ admits probability density function $f_{x_1} : \mathbb{R}^{n_1} \to [0,\infty) \subset \mathbb{R}$, where

$$f_{x_1}(z_1) := \begin{cases} \int_{\mathbb{R}^{n_2}} f_x(z_1, z_2) \, d\mu_{Bn_2}(z_2) & \text{if } \int_{\mathbb{R}^{n_2}} f_x(z_1, z_2) \, d\mu_{Bn_2}(z_2) \in \mathbb{R} \\ 0 & \text{if } \int_{\mathbb{R}^{n_2}} f_x(z_1, z_2) \, d\mu_{Bn_2}(z_2) = \infty \end{cases}$$

$\forall z_1 \in \mathbb{R}^{n_1}$, and $z := (z_1, z_2)$ is partitioned as is for $x$.

Proof Let $\mathcal{L}_{x_1}$ be the law for $x_1$. Then,

$$\mathcal{L}_{x_1}(E_1) = P(\{\omega \in \Omega \mid (x_1(\omega), x_2(\omega)) \in E_1 \times \mathbb{R}^{n_2}\}) = \int_{E_1 \times \mathbb{R}^{n_2}} f_x(z) \, d\mu_{Bn}(z) = \int_{E_1} f_{x_1}(z_1) \, d\mu_{Bn_1}(z_1)$$
$\forall E_1 \in \mathcal{B}_B(\mathbb{R}^{n_1})$, where the first and second equalities follow from Definition 14.16 and the third equality follows from Fubini's Theorem 12.31. By Proposition 11.116, we have that $\mathcal{L}_{x_1}$ is the measure with kernel $f_{x_1}$ over $\mathbb{R}^{n_1}$. By Proposition 12.75, we have $\frac{d\mathcal{L}_{x_1}}{d\mu_{Bn_1}} = f_{x_1}$ a.e. in $\mathbb{R}^{n_1}$, and hence $x_1$ admits probability density function $f_{x_1}$.

∎

Proposition 14.19 Let $\Omega := (\Omega, \mathcal{B}, P)$ be a probability measure space, $m \in \mathbb{N}$, $X$ be an $\mathbb{R}^m$-valued random variable with probability density function $f_X : \mathbb{R}^m \to [0,\infty) \subset \mathbb{R}$, and $V \in B(\mathbb{R}^m, \mathbb{R}^m)$ be invertible. Then, $Y := VX$ is an $\mathbb{R}^m$-valued random variable with probability density function $f_Y : \mathbb{R}^m \to [0,\infty) \subset \mathbb{R}$, where $f_Y(y) = \frac{1}{|\det(V)|} f_X(V^{-1} y)$.

Proof Let the law for $X$ be $\mathcal{L}_X$ and the law for $Y$ be $\mathcal{L}_Y$. Then, $\forall E \in \mathcal{B}_B(\mathbb{R}^m)$,

$$\begin{aligned} \mathcal{L}_Y(E) &= P(\{\omega \in \Omega \mid Y(\omega) \in E\}) = P(\{\omega \in \Omega \mid X(\omega) \in V^{-1}(E)\}) = \mathcal{L}_X(V^{-1}(E)) = \int_{V^{-1}(E)} f_X(x) \, d\mu_{Bm}(x) \\ &= \int_E \big|\det(V^{-1})\big| \, f_X(V^{-1} y) \, d\mu_{Bm}(y) = \int_E \frac{1}{|\det(V)|} f_X(V^{-1} y) \, d\mu_{Bm}(y) = \int_E f_Y(y) \, d\mu_{Bm}(y) \end{aligned}$$
where the first equality follows from Definition 14.16, the second equality follows from the fact that $Y = VX$, the third equality follows from Definition 14.16, the fourth equality follows from Definition 14.16 and Proposition 11.167, the fifth equality follows from the Change of Variable Theorem 12.91 with $y = Vx$, the sixth equality follows from properties of matrices, and the last equality follows from the definition of $f_Y$. Hence, we observe that $\mathcal{L}_Y$ is the measure with kernel $f_Y$ over $\mathbb{R}^m$. By Propositions 11.116 and 12.75, we have $\frac{d\mathcal{L}_Y}{d\mu_{Bm}} = f_Y$ a.e. in $\mathbb{R}^m$, and hence $Y$ admits probability density function $f_Y$. This completes the proof of the proposition. ∎

Proposition 14.20 Let $\Omega := (\Omega, \mathcal{B}, P)$ be a probability measure space, $m \in \mathbb{N}$, $n_1, \ldots, n_m \in \mathbb{N}$, $x_i : \Omega \to \mathbb{R}^{n_i}$ be an $\mathbb{R}^{n_i}$-valued random variable with probability density function $f_i : \mathbb{R}^{n_i} \to [0,\infty) \subset \mathbb{R}$, $i = 1, \ldots, m$. Then, the random variables $x_1, \ldots, x_m$ are independent if, and only if, the $\mathbb{R}^n$-valued random variable $x := (x_1, \ldots, x_m) : \Omega \to \mathbb{R}^n$ with $n := \sum_{i=1}^m n_i \in \mathbb{N}$ admits probability density function $f : \mathbb{R}^n \to [0,\infty) \subset \mathbb{R}$, defined by $f(z) = \prod_{i=1}^m f_i(z_i)$, $\forall z := (z_1, \ldots, z_m) \in \mathbb{R}^n$ with $z_i \in \mathbb{R}^{n_i}$, $i = 1, \ldots, m$.

Proof "Sufficiency" By Proposition 11.39, $x$ is an $\mathbb{R}^n$-valued random variable. $\forall E_i \in \mathcal{B}_B(\mathbb{R}^{n_i})$, $i = 1, \ldots, m$, let $\bar{E}_i := \prod_{j=1}^m B_{j,i}$, where $B_{i,i} = E_i$ and $B_{j,i} = \mathbb{R}^{n_j}$, $\forall j \in \bar{J} := \{1, \ldots, m\}$ with $j \neq i$. Then, with $\bar{E} := \bigcap_{i=1}^m \bar{E}_i = E_1 \times \cdots \times E_m$,

$$P\Big(\bigcap_{i=1}^m x_{\mathrm{inv}}(\bar{E}_i)\Big) = P(x_{\mathrm{inv}}(\bar{E})) = \int_{\bar{E}} f(z) \, d\mu_{Bn}(z) = \int_{\bar{E}} \prod_{i=1}^m f_i(z_i) \, d\mu_{Bn}(z_1, \ldots, z_m) = \int_{E_1} \cdots \int_{E_m} \prod_{i=1}^m f_i(z_i) \, d\mu_{Bn_m}(z_m) \cdots d\mu_{Bn_1}(z_1) = \prod_{i=1}^m \int_{E_i} f_i(z_i) \, d\mu_{Bn_i}(z_i) = \prod_{i=1}^m P(x_{\mathrm{inv}}(\bar{E}_i))$$

where the second equality follows from Definition 14.16, the fourth equality follows from Fubini's Theorem 12.31, the fifth equality follows from Proposition 11.92, and the last equality follows from Definition 14.16. Since $x_{\mathrm{inv}}(\bar{E}_i) = x_{i,\mathrm{inv}}(E_i)$, $i = 1, \ldots, m$, the random variables $x_1, \ldots, x_m$ are independent.


"Necessity" By Proposition 11.39, $x$ is an $\mathbb{R}^n$-valued random variable. Let $\mathcal{L}_x$ be the law of $x$ and $\mathcal{L}_{x_i}$ be the law of $x_i$, $i = 1, \ldots, m$. $\forall E_i \in \mathcal{B}_B(\mathbb{R}^{n_i})$, $i = 1, \ldots, m$, let $\bar{E}_i := \prod_{j=1}^m B_{j,i}$, where $B_{i,i} = E_i$ and $B_{j,i} = \mathbb{R}^{n_j}$, $\forall j \in \bar{J} := \{1, \ldots, m\}$ with $j \neq i$. By the independence of the random variables $x_1, \ldots, x_m$, we have, with $\bar{E} := \bigcap_{i=1}^m \bar{E}_i$,

$$\mathcal{L}_x(\bar{E}) = \prod_{i=1}^m \mathcal{L}_x(\bar{E}_i) = \prod_{i=1}^m \mathcal{L}_{x_i}(E_i) = \prod_{i=1}^m \int_{E_i} f_i(z_i) \, d\mu_{Bn_i}(z_i) = \int_{\bar{E}} f(z) \, d\mu_{Bn}(z)$$

where the first equality follows from the independence assumption, the second equality follows from Definition 14.16, the third equality follows from the assumption that $x_i$ admits probability density function $f_i$, $i = 1, \ldots, m$, and the last equality follows from Fubini's Theorem 12.31. By Lemma 12.103, $\mathcal{L}_x$ equals the finite measure on $\mathbb{R}^n$ with kernel $f$. Then, the $\mathbb{R}^n$-valued random variable $x$ admits probability density function $f$, by Definition 14.16 and Proposition 12.75. This completes the proof of the proposition. ∎

Proposition 14.21 Let $p \in [1,\infty) \subset \mathbb{R}$, $\Omega := (\Omega, \mathcal{B}, P)$ be a probability measure space, $x : \Omega \to [0,\infty] \subset \mathbb{R}_e$ be a nonnegative extended real-valued function that is $\mathcal{B}$-measurable. Then, we have $E(x^p) := \int_\Omega (x(\omega))^p \, dP(\omega) = \int_{(0,\infty)} p z^{p-1} P(\{\omega \in \Omega \mid x(\omega) > z\}) \, d\mu_B(z)$.

Proof We will distinguish two exhaustive and mutually exclusive cases: Case 1: $P(\{\omega \in \Omega \mid x(\omega) = \infty\}) > 0$, and Case 2: $P(\{\omega \in \Omega \mid x(\omega) = \infty\}) = 0$.

Case 1: $P(\{\omega \in \Omega \mid x(\omega) = \infty\}) > 0$. Note that $E(x^p) := \int_\Omega (x(\omega))^p \, dP(\omega) = \infty = \int_{(0,\infty)} p z^{p-1} P(\{\omega \in \Omega \mid x(\omega) > z\}) \, d\mu_B(z)$, where the equalities follow from Definition 11.79.

Case 2: $P(\{\omega \in \Omega \mid x(\omega) = \infty\}) = 0$. Let $\bar{x} : \Omega \to [0,\infty) \subset \mathbb{R}$ be the random variable defined by $\bar{x}(\omega) = x(\omega)$ if $x(\omega) < \infty$ and $\bar{x}(\omega) = 0$ if $x(\omega) = \infty$. Then, we have $E(x^p) := \int_\Omega (x(\omega))^p \, dP(\omega) = \int_\Omega (\bar{x}(\omega))^p \, dP(\omega) = E(\bar{x}^p)$, where the first equality follows from Definition 11.79 and the second equality follows from Definition 14.1.

Fix any $n \in \mathbb{N}$. Let $\bar{x}_n := \bar{x} \wedge n$, $\mathcal{L}_n : \mathcal{B}_B(\mathbb{R}) \to [0,1] \subset \mathbb{R}$ be the law of $\bar{x}_n$, and $F_n : \mathbb{R} \to [0,1] \subset \mathbb{R}$ be the cumulative distribution function of $\mathcal{L}_n$ as defined in Definition 14.16. Note that

$$\infty > n^p \ge E(\bar{x}_n^p) = \int_\Omega (\bar{x}_n(\omega))^p \, dP(\omega) = \int_{\mathbb{R}} z^p \, dF_n(z) = \int_{(-\infty,0]} 0 \, dF_n(z) + \int_{(0,\infty)} z^p \, dF_n(z) = \int_{(0,\infty)} z^p \, dF_n(z) = \int_{(0,\infty)} z^p \, d(F_n(z) - 1) = -\int_{(0,\infty)} p \cdot z^{p-1} (F_n(z) - 1) \, d\mu_B(z) = \int_{(0,\infty)} p z^{p-1} (1 - F_n(z)) \, d\mu_B(z) = \int_{(0,\infty)} p z^{p-1} \cdot P(\{\omega \in \Omega \mid \bar{x}_n(\omega) > z\}) \, d\mu_B(z)$$

where the second inequality follows from the fact that $\bar{x}_n(\omega) \le n$, $\forall \omega \in \Omega$, the first equality follows from Definition 14.1, the second equality follows from Definition 11.70, the third equality follows from Proposition 11.89 and Definition 12.57, the fourth equality follows from Proposition 11.75, the fifth equality follows from the fact that $F_n|_{[0,\infty)}$


and $(F_n - 1)|_{[0,\infty)}$ are cumulative distribution functions for the same measure space, which is the measure subspace of $(\mathbb{R}, \mathcal{B}_B(\mathbb{R}), \mathcal{L}_{\bar{x}})$ to the interval $(0,\infty)$, the sixth equality follows from Integration by Parts Theorem 12.101, Integrability Theorem A.10, and Theorem 29.8 of Bartle (1976), and the eighth equality follows from Definition 14.16.

Therefore, we have $E(x^p) := \int_\Omega (x(\omega))^p \, dP(\omega) = \int_\Omega (\bar{x}(\omega))^p \, dP(\omega) = \lim_{n \in \mathbb{N}} \int_\Omega (\bar{x}_n(\omega))^p \, dP(\omega) = \lim_{n \in \mathbb{N}} \int_{(0,\infty)} p z^{p-1} P(\{\omega \in \Omega \mid \bar{x}_n(\omega) > z\}) \, d\mu_B(z) = \int_{(0,\infty)} p z^{p-1} P(\{\omega \in \Omega \mid \bar{x}(\omega) > z\}) \, d\mu_B(z) = \int_{(0,\infty)} p z^{p-1} \cdot P(\{\omega \in \Omega \mid x(\omega) > z\}) \, d\mu_B(z)$, where the first equality follows from the second to last paragraph, the second equality follows from Monotone Convergence Theorem 11.81, the third equality follows from the last paragraph, the fourth equality follows from Monotone Convergence Theorem 11.81, and the last equality follows from the fact that $P(\{\omega \in \Omega \mid x(\omega) = \infty\}) = 0$. This completes the proof of the proposition. ∎

Definition 14.22 Let $\mathcal{X} := (X, \mathcal{B})$ and $\mathcal{Y} := (Y, \hat{\mathcal{B}})$ be measurable spaces, $D \subseteq X$, and $f : D \to \mathcal{Y}$. $f$ is said to be $\hat{\mathcal{B}}$-in-$\mathcal{B}$-measurable if, $\forall E \in \hat{\mathcal{B}}$, we have $f_{\mathrm{inv}}(E) \in \mathcal{B}$. %

Here, the concept of measurable function is extended to functions between measurable spaces. Relating to the original concept of measurable functions, Definition 11.33, where $f : D \to \mathcal{Z}$ with $\mathcal{Z} := (Z, \mathcal{O}_Z)$ a topological space, $f$ is $\mathcal{B}$-measurable if, and only if, it is $\mathcal{B}_B(\mathcal{Z})$-in-$\mathcal{B}$-measurable.

Proposition 14.23 Let $\mathcal{X} := (X, \mathcal{B})$ and $\mathcal{Y} := (Y, \hat{\mathcal{B}})$ be measurable spaces, $D \subseteq X$, $f : D \to \mathcal{Y}$, $\mathcal{F} \subseteq \hat{\mathcal{B}}$ be a $\pi$-system on $Y$, and $\sigma(\mathcal{F}) = \hat{\mathcal{B}}$ (where $\sigma(\mathcal{F})$ is the $\sigma$-algebra generated by $\mathcal{F}$ as defined in Definition 14.2). Then, $f$ is $\hat{\mathcal{B}}$-in-$\mathcal{B}$-measurable if, and only if, $\forall E \in \mathcal{F}$, $f_{\mathrm{inv}}(E) \in \mathcal{B}$.

Proof "Necessity" is obvious. "Sufficiency" Let $f$ satisfy: $\forall E \in \mathcal{F}$, $f_{\mathrm{inv}}(E) \in \mathcal{B}$. Define $\mathcal{B}_Y := \{E \subseteq Y \mid f_{\mathrm{inv}}(E) \in \mathcal{B}\}$. Clearly, $\mathcal{F} \subseteq \mathcal{B}_Y$. We will show that $\mathcal{B}_Y$ is a $\sigma$-algebra on $Y$. Then, $\hat{\mathcal{B}} = \sigma(\mathcal{F}) \subseteq \mathcal{B}_Y$ and the result is established.

Clearly, $\emptyset, Y \in \mathcal{F}$ by the definition of $\pi$-system (Definition 12.17), which implies that $f_{\mathrm{inv}}(\emptyset) = \emptyset \in \mathcal{B}$ and $f_{\mathrm{inv}}(Y) = D \in \mathcal{B}$; then $\emptyset, Y \in \mathcal{B}_Y$. $\forall E \in \mathcal{B}_Y$, by Proposition 2.5, $f_{\mathrm{inv}}(Y \setminus E) = f_{\mathrm{inv}}(Y) \setminus f_{\mathrm{inv}}(E) = D \setminus f_{\mathrm{inv}}(E) \in \mathcal{B}$. Hence, $Y \setminus E \in \mathcal{B}_Y$. $\forall (E_i)_{i=1}^\infty \subseteq \mathcal{B}_Y$, we have $f_{\mathrm{inv}}(E_i) \in \mathcal{B}$, $\forall i \in \mathbb{N}$. Then, by Proposition 2.5, $f_{\mathrm{inv}}(\bigcup_{i=1}^\infty E_i) = \bigcup_{i=1}^\infty f_{\mathrm{inv}}(E_i) \in \mathcal{B}$. Then, $\bigcup_{i=1}^\infty E_i \in \mathcal{B}_Y$. This shows that $\mathcal{B}_Y$ is a $\sigma$-algebra on $Y$. This completes the proof of the proposition. ∎

Proposition 14.24 Let $\mathcal{X} := (X, \mathcal{B})$ and $\mathcal{Y} := (Y, \hat{\mathcal{B}})$ be measurable spaces, $\mathcal{Z} := (Z, \mathcal{O}_Z)$ be a topological space, $D_1 \subseteq X$, $D_2 \subseteq Y$, $f : D_1 \to D_2$ be $\hat{\mathcal{B}}$-in-$\mathcal{B}$-measurable, and $g : D_2 \to \mathcal{Z}$ be $\hat{\mathcal{B}}$-measurable. Then, $g \circ f : D_1 \to \mathcal{Z}$ is $\mathcal{B}$-measurable.


Proof $\forall O \in \mathcal{O}_Z$, $g_{\mathrm{inv}}(O) \in \hat{\mathcal{B}}$, since $g$ is $\hat{\mathcal{B}}$-measurable. By $f$ being $\hat{\mathcal{B}}$-in-$\mathcal{B}$-measurable, we have $f_{\mathrm{inv}}(g_{\mathrm{inv}}(O)) \in \mathcal{B}$. By the arbitrariness of $O$, $g \circ f$ is $\mathcal{B}$-measurable. ∎

14.2 Gaussian Random Variables and Vectors

Definition 14.25 Let $\Omega := (\Omega, \mathcal{B}, P)$ be a probability measure space, $m \in \mathbb{N}$, $x : \Omega \to \mathbb{R}^m$ be an $\mathbb{R}^m$-valued random variable. We say that $x$ is an $\mathbb{R}^m$-valued Gaussian (normal) random variable with mean $\bar{x} \in \mathbb{R}^m$ and covariance $K \in B(\mathbb{R}^m, \mathbb{R}^m)$, where $K \in \bar{S}_+ \mathbb{R}^m$, if $E(x) = \bar{x}$ and $E((x - \bar{x}) \otimes (x - \bar{x})) = K$ and a cumulative distribution function for the law $\mathcal{L}_x$ of $x$ is

$$F(z) = \lim_{k \to -\infty} \int_{r_{k\mathbf{1},z}} \frac{1}{(\sqrt{2\pi})^m (\det(K))^{1/2}} \exp\Big(-\frac{1}{2} \big\langle x - \bar{x}, K^{-1}(x - \bar{x}) \big\rangle_{\mathbb{R}^m}\Big) \, d\mu_{Bm}(x), \quad \forall z \in \mathbb{R}^m$$

or the probability density function for $\mathcal{L}_x$ is $f : \mathbb{R}^m \to [0,\infty) \subset \mathbb{R}$, defined by

$$f(z) = \frac{1}{(\sqrt{2\pi})^m (\det(K))^{1/2}} \exp\Big(-\frac{1}{2} \big\langle z - \bar{x}, K^{-1}(z - \bar{x}) \big\rangle_{\mathbb{R}^m}\Big), \quad \forall z \in \mathbb{R}^m$$

We will write $x \sim N(\bar{x}, K)$ to denote that $x$ is an $\mathbb{R}^m$-valued Gaussian random variable with mean $\bar{x}$ and covariance $K$. %

Example 14.26 Let $m \in \mathbb{N}$, $\bar{x} \in \mathbb{R}^m$, and $K \in B(\mathbb{R}^m, \mathbb{R}^m)$, where $K \in \bar{S}_+ \mathbb{R}^m$. By Spectral Theory Theorem 13.52, there exist a unitary matrix $V \in B(\mathbb{R}^m, \mathbb{R}^m)$ and a diagonal matrix $D = \mathrm{block\ diagonal}\big(\sigma_1^2, \ldots, \sigma_m^2\big) \in \bar{S}_+ \mathbb{R}^m$ such that $K = V^\top D V$. If $x$ is an $\mathbb{R}^m$-valued Gaussian random variable with mean $\bar{x}$ and covariance $K$, then $y = Vx$ is an $\mathbb{R}^m$-valued Gaussian random variable with mean $\bar{y} := V\bar{x} =: (\bar{y}_1, \ldots, \bar{y}_m)$ and covariance $VKV^\top = D$. Then, by Example 33.15 of Bartle (1976), Example 14.17, and Fundamental Theorem on Modeling 14.15, there exist a probability measure space $\Omega := (\Omega, \mathcal{B}, P)$ and $m$ independent random variables $y_i : \Omega \to \mathbb{R}$, $i = 1, \ldots, m$, such that $y_i$ is a Gaussian (normal) random variable with mean $\bar{y}_i$ and variance $\sigma_i^2$, $i = 1, \ldots, m$: $y_i \sim N(\bar{y}_i, \sigma_i^2)$. Then, by Proposition 14.20, the $\mathbb{R}^m$-valued random variable $y := (y_1, \ldots, y_m)$


admits the probability density function $f_y : \mathbb{R}^m \to [0,\infty) \subset \mathbb{R}$, given by

$$f_y(z) = \prod_{i=1}^m \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\Big(-\frac{1}{2} \sigma_i^{-2} (z_i - \bar{y}_i)^2\Big) = \frac{1}{(\sqrt{2\pi})^m (\det(D))^{1/2}} \exp\Big(-\frac{1}{2} \big\langle z - \bar{y}, D^{-1}(z - \bar{y}) \big\rangle_{\mathbb{R}^m}\Big), \quad \forall z \in \mathbb{R}^m$$

Thus, by Proposition 14.19, $x = V^\top y \sim N(\bar{x}, K)$ as desired.

%

Proposition 14.27 Let $\Omega := (\Omega, \mathcal{B}, P)$ be a probability measure space, $m \in \mathbb{N}$, $x_i : \Omega \to \mathbb{R}^{n_i}$ be an $\mathbb{R}^{n_i}$-valued random variable, $n_i \in \mathbb{N}$, $i = 1, \ldots, m$, and $x := (x_1, \ldots, x_m) \sim N(\bar{x}, K)$, where $n := \sum_{i=1}^m n_i \in \mathbb{N}$, $\bar{x} \in \mathbb{R}^n$, and $K \in \bar{S}_+ \mathbb{R}^n$. Then, $x_1, \ldots, x_m$ are independent if, and only if, $K = \mathrm{block\ diagonal}(K_1, \ldots, K_m) \in \bar{S}_+ \mathbb{R}^n$ with $K_i \in \bar{S}_+ \mathbb{R}^{n_i}$, $i = 1, \ldots, m$.

Proof "Sufficiency" Note that $x$ admits the probability density function $f_x : \mathbb{R}^n \to \mathbb{R}_+$, defined by, $\forall z \in \mathbb{R}^n$,

$$f_x(z) = \frac{1}{(\sqrt{2\pi})^n (\det(K))^{1/2}} \exp\Big(-\frac{1}{2} \big\langle z - \bar{x}, K^{-1}(z - \bar{x}) \big\rangle_{\mathbb{R}^n}\Big) = \prod_{i=1}^m \frac{1}{(\sqrt{2\pi})^{n_i} (\det(K_i))^{1/2}} \exp\Big(-\frac{1}{2} \big\langle z_i - \bar{x}_i, K_i^{-1}(z_i - \bar{x}_i) \big\rangle_{\mathbb{R}^{n_i}}\Big)$$

where $z := (z_1, \ldots, z_m)$ and $\bar{x} := (\bar{x}_1, \ldots, \bar{x}_m)$ are partitioned as is for $x$, and the second equality follows from the block diagonal structure of $K$. By Proposition 14.18, $x_i$ admits probability density function $f_{x_i} : \mathbb{R}^{n_i} \to [0,\infty) \subset \mathbb{R}$, where

$$f_{x_i}(z_i) := \frac{1}{(\sqrt{2\pi})^{n_i} (\det(K_i))^{1/2}} \exp\Big(-\frac{1}{2} \big\langle z_i - \bar{x}_i, K_i^{-1}(z_i - \bar{x}_i) \big\rangle_{\mathbb{R}^{n_i}}\Big)$$

Clearly, we have $f_x(z) = \prod_{i=1}^m f_{x_i}(z_i)$, $\forall z = (z_1, \ldots, z_m) \in \mathbb{R}^n$. By Proposition 14.20, $x_1, \ldots, x_m$ are independent.

"Necessity" Let $x_1, \ldots, x_m$ be independent. Then,

$$K = E((x - \bar{x}) \otimes (x - \bar{x})) = \begin{bmatrix} K_{1,1} & \cdots & K_{1,m} \\ \vdots & \ddots & \vdots \\ K_{m,1} & \cdots & K_{m,m} \end{bmatrix}$$

where $K_{i,j} := E((x_i - \bar{x}_i) \otimes (x_j - \bar{x}_j))$, $\forall 1 \le i, j \le m$. $\forall 1 \le i, j \le m$ with $i \neq j$, we have $K_{i,j} = E((x_i - \bar{x}_i) \otimes (x_j - \bar{x}_j)) = E(x_i - \bar{x}_i) \otimes E(x_j - \bar{x}_j) = 0_{n_i \times n_j}$, where the second equality follows from Proposition 14.9. Hence, $K = \mathrm{block\ diagonal}\big(K_{1,1}, \ldots, K_{m,m}\big)$. This completes the proof of the proposition. ∎


Proposition 14.28 Let $\Omega := (\Omega, \mathcal{B}, P)$ be a probability measure space, $m, p \in \mathbb{N}$, $(X_i)_{i=1}^p$ be random variables such that $X := (X_1, \ldots, X_p) \sim N(\bar{x}, K)$, $\bar{x} \in \mathbb{R}^p$, $K \in B(\mathbb{R}^p, \mathbb{R}^p)$ with $K \in \bar{S}_+ \mathbb{R}^p$, and $(Y_i)_{i=1}^m$ be linear combinations of the $X_i$'s: $Y_i = \sum_{j=1}^p \alpha_{i,j} X_j$, $i = 1, \ldots, m$, where $\alpha_{i,j} \in \mathbb{R}$, $\forall i = 1, \ldots, m$, $\forall j = 1, \ldots, p$. Assume that $V := (\alpha_{i,j})_{m \times p}$ is of full row rank. Then, $Y := (Y_1, \ldots, Y_m)$ is an $\mathbb{R}^m$-valued random variable with distribution $N(V\bar{x}, VKV^\top)$.

Proof Clearly, $m \le p$ and $\mathrm{rank}(V) = m$. Then, $\mathcal{N}(V)$ is $(p-m)$-dimensional. Let $f_1, \ldots, f_{p-m}$ be an orthonormal basis of $\mathcal{N}(V)$, and $F := \begin{bmatrix} f_1 & \cdots & f_{p-m} \end{bmatrix}$. Then, $T := \begin{bmatrix} V \\ F^\top K^{-1} \end{bmatrix}$ is an invertible matrix. By Proposition 14.19, $Z := TX = \begin{bmatrix} Y \\ Z_2 \end{bmatrix}$ is an $\mathbb{R}^p$-valued random variable with distribution $N(T\bar{x}, TKT^\top)$. Note that $TKT^\top = \begin{bmatrix} VKV^\top & VF \\ F^\top V^\top & F^\top K^{-1} F \end{bmatrix} = \mathrm{block\ diagonal}\big(VKV^\top, F^\top K^{-1} F\big)$, since the columns of $F$ lie in $\mathcal{N}(V)$, so that $VF = 0$. By Proposition 14.27, $Y$ and $Z_2$ are independent Gaussian random vectors with $Y \sim N(V\bar{x}, VKV^\top)$. This completes the proof of the proposition. ∎

14.3 Law of Large Numbers

Theorem 14.29 (Weak Law of Large Numbers) Let $\Omega := (\Omega, \mathcal{B}, P)$ be a probability measure space, $\mathcal{Z}$ be a separable Hilbert space over $\mathbb{K}$, and $(X_n)_{n=1}^\infty$ be a sequence of independent $\mathcal{Z}$-valued random variables, $X_n : \Omega \to \mathcal{Z}$, with $E(X_n) = \bar{x} \in \mathcal{Z}$ and $E(\|X_n - \bar{x}\|_{\mathcal{Z}}^2) \le b \in \mathbb{R}_+$, $\forall n \in \mathbb{N}$. Then, $\lim_{n \in \mathbb{N}} \frac{1}{n} \sum_{i=1}^n X_i = \bar{x}$ in measure in $\Omega$.

Proof Let $S_n := \frac{1}{n} \sum_{i=1}^n X_i$, which is a $\mathcal{Z}$-valued random variable by Propositions 11.38 and 11.39. Thus, $E(S_n) = \bar{x}$, and

$$E(\|S_n - \bar{x}\|_{\mathcal{Z}}^2) = \frac{1}{n^2} E\Big(\Big\langle \sum_{i=1}^n (X_i - \bar{x}), \sum_{j=1}^n (X_j - \bar{x}) \Big\rangle_{\mathcal{Z}}\Big) = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n E(\langle X_i - \bar{x}, X_j - \bar{x} \rangle_{\mathcal{Z}}) = \frac{1}{n^2} \sum_{i=1}^n E(\|X_i - \bar{x}\|_{\mathcal{Z}}^2) \le \frac{1}{n^2} \, n b = \frac{b}{n} \to 0$$

as $n \to \infty$, where the first two equalities follow from Proposition 11.92 and the third equality follows from the assumption that $(X_n)_{n=1}^\infty$ is independent and Proposition 14.9. Then, $\lim_{n \in \mathbb{N}} E(\|S_n - \bar{x}\|_{\mathcal{Z}}^2) = 0$. This implies that $\lim_{n \in \mathbb{N}} \|S_n - \bar{x}\|_{\bar{L}_2(\Omega, \mathcal{Z})}^2 = 0$. Hence, by Proposition 11.211, we have $\lim_{n \in \mathbb{N}} S_n = \bar{x}$ in measure in $\Omega$. This completes the proof of the theorem. ∎

14.4 Martingales Indexed by Z+ Definition 14.30 Let X := (X , BB (X ) , μ) be a nonempty σ -finite topological measure space with a partial ordering ) defined on it, Ω := (Ω, B, P ) be a ¯ ν) be the σ -finite product probability measure space, and X × Ω := (X × Ω, B,

14.4 Martingales Indexed by Z+

779

measure space (as defined in Proposition 12.22), Y be a topological space, and ¯ W : X × Ω → Y be B-measurable. Then, W is said to be a Y-valued stochastic process on X × Ω. Then, ∀t ∈ X, Wt : Ω → Y defined by Wt (ω) = W (t, ω), ∀ω ∈ Ω, is a Y-valued random variable (i.e., B-measurable by Proposition 12.28). We will sometimes write (Wt )t ∈X to denote the stochastic process W . When Y ⊆ R, we will simply say that W is a stochastic process on X × Ω. (Bx )x∈X is said to be a filtration on Ω, if Bx ⊆ Bz ⊆ B and Bx and Bz are sub-σ -algebras of B on Ω, ∀x, z ∈ X with x ) z. Then, the quadruple (Ω, B, (Bx )x∈X , P ) is said to be a filtered probability measure space. We say that (Wx )x∈X is adapted to the filtration (Bx )x∈X if Wx is Bx -measurable, ∀x ∈ X. The stochastic process (Wx )x∈X defines the natural filtration of (Wx )x∈X , B˜x x∈X , where B˜x := σ ( y∈X,y)z σ (Wy )), ∀x ∈ X. % It is easy to check that B˜x ⊆ B˜z ⊆ B, ∀x, z ∈ Z with x ) z. Hence, the natural filtration of a stochastic process is indeed a filtration. In this section and the next five sections, we will be primarily concerned with stochastic processes indexed by Z+ . In this case, we will define ℵ := (Z+ , Z+2, μ) to be the topological measure space on Z+ with the counting measure and the partial ordering of ≤. (It is easy to check that it is a topological measure space, where the topology is given by the subset topology of R.) In this case, a Y-valued stochastic process W on ℵ × Ω adapted to some filtration (Bn )n∈ℵ is equivalent to that Wn : Ω → Y being Bn -measurable, ∀n ∈ ℵ. In this case, we will simply say ∞ a Y-valued stochastic process (Wn )∞ n=0 on Ω that is adapted to (Bn )n=0 to save us from cumbersome notations. Definition 14.31 Let Ω := (Ω, B, (Bn )∞ n=0 , P ) be a filtered probability measure space and (Xn )∞ be a real-valued stochastic process on Ω. 
Then, the stochastic process (X_n)_{n=0}^∞ is said to be a Martingale if:
(i) (X_n)_{n=0}^∞ is an adapted process (that is, it is adapted to the filtration (B_n)_{n=0}^∞).
(ii) X_n ∈ L̄_1(Ω, R), ∀n ∈ Z+.
(iii) X_{n−1} ∈ E(X_n | B_{n−1}), ∀n ∈ N.

The stochastic process (X_n)_{n=0}^∞ is said to be a super Martingale if it satisfies (i) and (ii) and f_{n−1}(ω) ≤ X_{n−1}(ω) a.e. ω ∈ Ω, ∀f_{n−1} ∈ E(X_n | B_{n−1}), ∀n ∈ N. The stochastic process (X_n)_{n=0}^∞ is said to be a sub Martingale if it satisfies (i) and (ii) and f_{n−1}(ω) ≥ X_{n−1}(ω) a.e. ω ∈ Ω, ∀f_{n−1} ∈ E(X_n | B_{n−1}), ∀n ∈ N. %
Clearly, a stochastic process is a super Martingale if, and only if, its negative process is a sub Martingale. A stochastic process is a Martingale if, and only if, it is both a super Martingale and a sub Martingale.
Theorem 14.32 Let Ω := (Ω, B, (B_n)_{n=0}^∞, P) be a filtered probability measure space, (X_n)_{n=0}^∞ be a super Martingale (or a Martingale), and (C_n)_{n=1}^∞ be such that:
(i) C_n : Ω → [0, α] ⊂ R, ∀n ∈ N, where α ∈ R+.
(ii) C_n is B_{n−1}-measurable, ∀n ∈ N.


Then, the stochastic process (Y_n)_{n=0}^∞ defined by Y_0 = X_0 and Y_n = X_0 + Σ_{i=1}^n C_i (X_i − X_{i−1}), ∀n ∈ N, is a super Martingale (or a Martingale).

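As a concrete illustration of Theorem 14.32 (not part of the text), the following Python sketch runs a Monte Carlo check: X_n is a fair ±1 random walk (a Martingale), and the bet C_n ∈ [0, 1] is chosen from the past only (here, a hypothetical strategy depending on the previous step). The sample mean of Y_n should stay near E(X_0) = 0.

```python
import random

def martingale_transform_mean(n_steps, n_paths, seed=0):
    """Monte Carlo check that Y_n = X_0 + sum_i C_i (X_i - X_{i-1}) is again
    a Martingale when X_n is a fair +/-1 walk and each bet C_i in [0, alpha]
    is previsible (depends only on the past).  The strategy below is an
    arbitrary illustrative choice."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        y, last_step = 0.0, 1
        for _ in range(n_steps):
            c = 1.0 if last_step < 0 else 0.5   # uses only past information
            step = rng.choice((-1, 1))           # X_i - X_{i-1}, mean zero
            y += c * step                        # Y_i - Y_{i-1} = C_i (X_i - X_{i-1})
            last_step = step
        total += y
    return total / n_paths

mean_y = martingale_transform_mean(50, 20000)
```

Whatever previsible bounded strategy is substituted for `c`, the empirical mean of Y_n stays near zero, which is the "cannot beat the system" content of the theorem.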
Proof Since, ∀n ∈ N, X_n is B_n-measurable and C_n is B_{n−1}-measurable, then C_n (X_n − X_{n−1}) is B_n-measurable by Propositions 11.38, 7.35, and 11.39 and Definition 14.31. Hence, Y_n is B_n-measurable, ∀n ∈ Z+. Clearly, |Y_n| ≤ |X_0| + Σ_{i=1}^n (|C_i| |X_i| + |C_i| |X_{i−1}|) ≤ (1 + α) |X_0| + Σ_{i=1}^{n−1} 2α |X_i| + α |X_n|, ∀n ∈ Z+. Then, by Definition 14.31, X_n ∈ L̄_1(Ω, R), ∀n ∈ Z+, implies that Y_n ∈ L̄_1(Ω, R) by Proposition 11.83. Note that E(Y_n − Y_{n−1} | B_{n−1}) = E(C_n (X_n − X_{n−1}) | B_{n−1}) = C_n E(X_n − X_{n−1} | B_{n−1}), ∀n ∈ N, where the first equality follows from the definition of Y_n and the second equality follows from (h) of Proposition 14.11. This is less than or equal to zero almost everywhere if (X_n)_{n=0}^∞ is a super Martingale, or is equal to zero almost everywhere if (X_n)_{n=0}^∞ is a Martingale. Hence, (Y_n)_{n=0}^∞ is a super Martingale if (X_n)_{n=0}^∞ is so, and it is a Martingale if (X_n)_{n=0}^∞ is a Martingale. This completes the proof of the theorem. ⊓⊔
This result implies that one cannot beat the system in a gambling scenario by varying one's bet after observing the outcomes of the past dice rolls.
Definition 14.33 Let X be a nonempty topological subspace of R, Ω := (Ω, B, (B_x)_{x∈X}, P) be a filtered probability measure space, and T : Ω → X ∪ {−∞, +∞} ⊆ R_e be a random variable (i.e., it is B-measurable). T is said to be a stopping time if {ω ∈ Ω | T(ω) ≤ x} ∈ B_x, ∀x ∈ X. %
Example 14.34 Let Ω := (Ω, B, (B_n)_{n=0}^∞, P) be a filtered probability measure space, Y be a normed linear space over K, (X_n)_{n=0}^∞ be a Y-valued adapted stochastic process, and B ∈ B_B(Y). Then, T : Ω → Z+ ∪ {∞} defined by T(ω) := inf{n ∈ Z+ | X_n(ω) ∈ B}, the time of first entry of (X_n)_{n=0}^∞ into the set B, is a stopping time. This is because {ω ∈ Ω | T(ω) ≤ n} = ∪_{i=0}^n {ω ∈ Ω | X_i(ω) ∈ B} ∈ B_n. %
Theorem 14.35 Let Ω := (Ω, B, (B_n)_{n=0}^∞, P) be a filtered probability measure space, (X_n)_{n=0}^∞ be a real-valued adapted stochastic process, and T : Ω → Z+ ∪ {∞} be a stopping time.
Assume that (Xn )∞ n=0 is a super Martingale (or a Martingale), then the stopped stochastic process (Yn )∞ n=0 defined by Yn (ω) = XT (ω)∧n (ω), ∀n ∈ Z+ , is a super Martingale (or a Martingale). Proof Define (Cn )∞ n=1 by Cn : Ω → [0, 1] ⊂ R with Cn (ω) = χ{T (ω)≥n},Ω , ∀ω ∈ Ω, ∀n ∈ N. Clearly, Cn is Bn−1 -measurable. It is easy to see that Yn as defined in the statement of the theorem is exactly the same as that of Theorem 14.32. Then, the result follows immediately from Theorem 14.32. ' & Theorem 14.36 (Doob’s Optional Stopping) Let (Ω, B, (Bn )∞ n=0 , P ) =: Ω be a filtered probability measure space, (Xn )∞ be a super Martingale adapted to the n=0


filtration (B_n)_{n=0}^∞, and T : Ω → Z+ ∪ {∞} be a stopping time. Then, X_T : Ω → R defined by, ∀ω ∈ Ω,

X_T(ω) := X_{T(ω)}(ω),        if T(ω) < ∞;
X_T(ω) := lim_{n∈N} X_n(ω),   if T(ω) = ∞ and lim_{n∈N} X_n(ω) ∈ R;
X_T(ω) := 0,                  if T(ω) = ∞ and lim_{n∈N} X_n(ω) does not exist in R,

is absolutely integrable over Ω and E(XT ) ≤ E(X0 ) if any of the following conditions holds: (i) Xn (ω) ≥ 0, ∀ω ∈ Ω, ∀n ∈ N, and T (ω) < ∞ a.e. ω ∈ Ω. (ii) ∃N ∈ N such that T (ω) ≤ N, ∀ω ∈ Ω. (iii) ∃M ∈ [0, ∞) ⊂ R such that |Xn (ω)| ≤ M, ∀ω ∈ Ω, ∀n ∈ Z+ , and T (ω) < ∞ a.e. ω ∈ Ω. (iv) E(T ) < ∞ and ∃M ∈ [0, ∞) ⊂ R such that |Xn (ω) − Xn−1 (ω)| ≤ M, ∀ω ∈ Ω, ∀n ∈ N. ∞ If (Xn )∞ n=0 is a Martingale adapted to the filtration (Bn )n=0 and any of the conditions (ii)–(iv) holds, then XT is absolutely integrable over Ω and E(XT ) = E(X0 ).

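Condition (ii) of Doob's Optional Stopping Theorem (a bounded stopping time) is easy to test numerically. The following Python sketch (our own illustration, with arbitrarily chosen barriers and cap) stops a fair walk at the first exit from (−a, b), truncated at time `cap`, and checks that the empirical E(X_T) stays near E(X_0) = 0.

```python
import random

def stopped_mean(a, b, cap, n_paths, seed=1):
    """Empirical E(X_T) for a fair +/-1 walk X_n stopped at the first exit
    from (-a, b); the truncation at `cap` makes T bounded, so condition (ii)
    of Theorem 14.36 applies and E(X_T) = E(X_0) = 0 for the Martingale case."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_paths):
        x = 0
        for _ in range(cap):
            if x <= -a or x >= b:
                break                      # already stopped
            x += rng.choice((-1, 1))
        total += x                         # x == X_{T ∧ cap}
    return total / n_paths

m = stopped_mean(5, 5, 200, 20000)
```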
Proof We first consider the first paragraph of the theorem statements. By Propositions 11.51, 11.48, and 11.41, X_T is B-measurable.
Let (i) hold. Then, the stopped stochastic process (Y_n)_{n=0}^∞ defined in Theorem 14.35 satisfies Y_n(ω) ≥ 0, ∀ω ∈ Ω, ∀n ∈ Z+. By Theorem 14.35, (Y_n)_{n=0}^∞ is a super Martingale. By (i), T(ω) < ∞ a.e. ω ∈ Ω. Then, lim_{n∈N} Y_n(ω) = X_T(ω) a.e. ω ∈ Ω. Clearly, X_T(ω) ≥ 0, ∀ω ∈ Ω. By Fatou's Lemma 11.80, we have E(X_T) = ∫_Ω X_T(ω) dP(ω) ≤ lim inf_{n∈N} ∫_Ω Y_n(ω) dP(ω) = lim inf_{n∈Z+} E(Y_n) = lim inf_{n∈Z+} E(E(Y_n | B_{n−1})) ≤ lim inf_{n∈Z+} E(Y_{n−1}) ≤ · · · ≤ E(Y_0) = E(X_0), where the first equality follows from Definition 14.1, the second equality follows from Definition 14.1, the third equality follows from (j) of Proposition 14.11, the second inequality follows from Theorem 14.35, and the last equality follows from the definition of (Y_n)_{n=0}^∞ in Theorem 14.35. Hence, the result holds.
Let (ii) hold. Then, the stopped stochastic process (Y_n)_{n=0}^∞ satisfies Y_N(ω) = X_T(ω), ∀ω ∈ Ω. Hence, by Theorem 14.35, X_T is absolutely integrable over Ω and E(X_T) = E(Y_N) ≤ E(Y_{N−1}) ≤ · · · ≤ E(Y_0) = E(X_0). Hence, the result holds.
Let (iii) hold. Then, (Y_n)_{n=0}^∞ satisfies |Y_n(ω)| ≤ M, ∀ω ∈ Ω, ∀n ∈ Z+, and lim_{n∈Z+} Y_n(ω) = X_T(ω) a.e. ω ∈ Ω. By Lebesgue Dominated Convergence Theorem 11.91, X_T is absolutely integrable over Ω and E(X_T) = lim_{n∈Z+} E(Y_n) ≤ E(Y_0) = E(X_0). Hence, the result holds.
Let (iv) hold. Then, the stopped stochastic process (Y_n)_{n=0}^∞ defined in Theorem 14.35 satisfies Y_n = X_0 + Σ_{i=1}^n C_i (X_i − X_{i−1}), ∀n ∈ N, where (C_n)_{n=1}^∞


is as defined in the proof of Theorem 14.35. Then, ∀n ∈ Z+,

|Y_n(ω)| ≤ |X_0(ω)| + Σ_{i=1}^n χ_{{T(ω)≥i},Ω} |X_i(ω) − X_{i−1}(ω)|
         ≤ |X_0(ω)| + Σ_{i=1}^n M χ_{{T(ω)≥i},Ω} ≤ |X_0(ω)| + M T(ω)

where the first inequality follows from the definition of (C_n)_{n=1}^∞ and the second inequality follows from (iv). Then, E(|Y_n|) ≤ E(|X_0|) + M ∫_Ω T(ω) dP(ω) = E(|X_0|) + M E(T) < ∞, ∀n ∈ Z+, where the last inequality follows from (iv). Then, Y_n is absolutely integrable over Ω, ∀n ∈ Z+. Since E(T) < ∞, then T(ω) < ∞ a.e. ω ∈ Ω, and lim_{n∈Z+} Y_n(ω) = X_T(ω) a.e. ω ∈ Ω. By Lebesgue Dominated Convergence Theorem 11.91, E(X_T) = lim_{n∈Z+} E(Y_n) ≤ E(Y_0) = E(X_0), where the inequality follows from Theorem 14.35. Hence, the result holds.
Now, consider the second paragraph of the theorem statements. Let (X_n)_{n=0}^∞ be a Martingale. Clearly, it is a super Martingale. Let (ii), (iii), or (iv) hold. By the super Martingale case, we have E(X_T) ≤ E(X_0). Clearly, (−X_n)_{n=0}^∞ is also a super Martingale. By the super Martingale case, we have E(−X_T) ≤ E(−X_0). Then, we have E(X_T) = E(X_0). Hence, the result holds.
This completes the proof of the theorem. ⊓⊔
Lemma 14.37 Let Ω := (Ω, B, (B_n)_{n=0}^∞, P) be a filtered probability measure space and T : Ω → Z+ ∪ {∞} be a stopping time. Assume that ∃N ∈ N and ∃ε ∈ (0, 1) ⊂ R such that, ∀n ∈ N, P({ω ∈ Ω | T(ω) ≤ n + N} | B_n)(ω) > ε a.e. ω ∈ Ω. Then, E(T) < ∞.
Proof Note that

E(T) = ∫_Ω T(ω) dP(ω) = Σ_{n=0}^∞ n P({ω ∈ Ω | T(ω) = n})
     = Σ_{k=0}^∞ Σ_{n=kN}^{(k+1)N−1} n P({ω ∈ Ω | T(ω) = n})
     ≤ Σ_{k=0}^∞ Σ_{n=kN}^{(k+1)N−1} ((k + 1)N − 1) P({ω ∈ Ω | T(ω) = n})
     = Σ_{k=0}^∞ ((k + 1)N − 1) P({ω ∈ Ω | kN ≤ T(ω) < (k + 1)N})
     ≤ Σ_{k=0}^∞ ((k + 1)N − 1) P({ω ∈ Ω | T(ω) > kN − 1})


Let α_k := P({ω ∈ Ω | T(ω) ≤ kN − 1}) ∈ [0, 1] ⊂ R, ∀k ∈ N. Then, α_k = P({ω ∈ Ω | T(ω) ≤ kN − 1 − N}) + P({ω ∈ Ω | kN − 1 − N < T(ω) ≤ kN − 1}) = α_{k−1} + P({ω ∈ Ω | kN − 1 − N < T(ω)}) P({ω ∈ Ω | T(ω) ≤ kN − 1} | {ω ∈ Ω | kN − 1 − N < T(ω)}). The last term is a conditional probability of an event given another event. We will treat this term in detail in the following paragraph.
Let E_1 := {ω ∈ Ω | T(ω) ≤ kN − 1} ∈ B and E_2 := {ω ∈ Ω | kN − 1 − N < T(ω)} ∈ B_{kN−1−N}. Let F := {∅, E_2, Ω \ E_2, Ω} ⊆ B_{kN−1−N}, where F is a σ-algebra on Ω. Then, the notion P(E_1 | F) is well defined, which is a function on Ω that is F-measurable. Then, P(E_1 | F)(ω) = P(E_1 | E_2) ∈ [0, 1] ⊂ R, ∀ω ∈ E_2, and P(E_1 | F)(ω) = P(E_1 | Ω \ E_2) ∈ [0, 1] ⊂ R, ∀ω ∈ Ω \ E_2. Then, we have the following identities:

P(E_1 | E_2) = (1/P̂(E_2)) ∫_{E_2} P(E_1 | F)(ω) dP̂(ω) = (1/P(E_2)) ∫_{E_2} P(E_1 | F)(ω) dP̂(ω)
             = (1/P(E_2)) ∫_{E_2} E(χ_{E_1,Ω} | F)(ω) dP̂(ω)
             = (1/P(E_2)) ∫_{E_2} E(E(χ_{E_1,Ω} | B_{kN−1−N}) | F)(ω) dP̂(ω)
             = (1/P(E_2)) ∫_{E_2} E(χ_{E_1,Ω} | B_{kN−1−N})(ω) dP̄(ω)
             ≥ (1/P(E_2)) ∫_{E_2} ε dP̄ = (ε/P(E_2)) P̄(E_2) = ε

where P̂ = P|_F and P̄ = P|_{B_{kN−1−N}}, the first equality follows from the preceding discussion, the second equality follows from E_2 ∈ F, the third equality follows from the definition of P̂ and Definition 14.12, the fourth equality follows from (j) of Proposition 14.11, the fifth equality follows from (14.2a) of Definition 14.7, the inequality follows from the assumption of the lemma, the sixth equality follows from E_2 ∈ B_{kN−1−N}, and the last equality follows from the definition of P̄.
Then, we have α_k ≥ α_{k−1} + ε(1 − α_{k−1}), ∀k ∈ {2, 3, . . .}. Therefore, P({ω ∈ Ω | T(ω) > kN − 1}) = 1 − α_k ≤ (1 − α_{k−1})(1 − ε), ∀k ∈ {2, 3, . . .}. Recursively, we have P({ω ∈ Ω | T(ω) > kN − 1}) ≤ P({ω ∈ Ω | T(ω) > N − 1})(1 − ε)^{k−1} ≤ (1 − ε)^{k−1}, ∀k ∈ {2, 3, . . .}. Hence, E(T) ≤ (N − 1) + (2N − 1) + Σ_{k=2}^∞ ((k + 1)N − 1)(1 − ε)^{k−1} < ∞, where the last inequality follows from Proposition A.2.
This completes the proof of the lemma. ⊓⊔
Definition 14.38 Let Ω := (Ω, B, (B_n)_{n=0}^∞, P) be a filtered probability measure space, (X_n)_{n=0}^∞ be an adapted R-valued random process on Ω, and a, b ∈ R with a < b. For any N ∈ Z+ and any ω ∈ Ω, we will say that the sample path of the random process (X_n(ω))_{n=0}^∞ has U_N(a, b)(ω) ∈ Z+ upcrossings of the closed interval [a, b] ⊂ R by the time N, if we can find k(ω) := U_N(a, b)(ω) ∈ Z+


that is the largest nonnegative integer with the property that there exist 0 ≤ s_1 < t_1 < s_2 < t_2 < · · · < s_k < t_k ≤ N such that X_{s_i}(ω) < a and X_{t_i}(ω) > b, i = 1, . . . , k. Clearly, (U_N(a, b))_{N=0}^∞ is an adapted Z+-valued random process, i.e., U_N(a, b) : Ω → Z+ and is B_N-measurable, ∀N ∈ Z+. %
Lemma 14.39 (Doob's Upcrossing Lemma) Let Ω := (Ω, B, (B_n)_{n=0}^∞, P) be a filtered probability measure space, (X_n)_{n=0}^∞ be an adapted R-valued random process on Ω, a, b ∈ R with a < b, and (U_N(a, b))_{N=0}^∞ be the adapted Z+-valued random process defined in Definition 14.38. Assume that (X_n)_{n=0}^∞ is a super Martingale. Then,

(b − a) E(U_N(a, b)) ≤ E(−((X_N − a) ∧ 0))                    (14.4)

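The upcrossing count of Definition 14.38 is a simple scan over a sample path: look for a value strictly below a, then for a later value strictly above b, and repeat. The following Python sketch (our own helper, not from the text) implements this counter for a finite path.

```python
def upcrossings(path, a, b):
    """Count U_N(a, b): completed upcrossings of [a, b] by a finite sample
    path, per Definition 14.38 (strictly below a, then strictly above b)."""
    count, looking_for_low = 0, True
    for x in path:
        if looking_for_low and x < a:
            looking_for_low = False      # found X_{s_i} < a
        elif not looking_for_low and x > b:
            count += 1                   # found the matching X_{t_i} > b
            looking_for_low = True
    return count

u = upcrossings([0, -1, 2, -2, 3, 0, -1, 4], 0, 1)   # three completed upcrossings of [0, 1]
```

Averaging such counts over simulated super Martingale paths gives an empirical check of the upcrossing inequality (14.4).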
Proof We will prove the lemma using Theorem 14.32. Define an R-valued random process (Cn )∞ := χ{ω∈Ω | X0 (ω) Z∞ (ω)} ∈ B. ∀ω ∈ Λ, ∃a, b ∈ Q with a < b such that Y∞ (ω) > b > a > Z∞ (ω). ∞ Then, let (Un (a, b))∞ n=0 be the adapted Z+ -valued random process for (Xn )n=0 as defined in Definition 14.38. Then, U∞ (a, b)(ω) := limn∈N   Un (a, b)(ω) = ∞. ¯ a,b ∈ B. By Thus, Λ ⊆ a,b∈Q {ω ∈ Ω | U∞ (a, b)(ω) = ∞} =: a,b∈Q Λ a 0, ∀(a1 , b1 ) ∈ BR2 ((a, b), δ), we have

|K(a_1, b_1) − K(a, b)|
 = |(1/2π) ∫_R [(exp(−iθa_1) − exp(−iθa) + exp(−iθb) − exp(−iθb_1))/(iθ)] Ψ_X(θ) dμ_B(θ)|
 ≤ (1/2π) ∫_R |(exp(−iθa_1) − exp(−iθa) + exp(−iθb) − exp(−iθb_1))/(iθ)| |Ψ_X(θ)| dμ_B(θ)
 ≤ (1/2π) ∫_R [|(exp(−iθa_1) − exp(−iθa))/(iθ)| + |(exp(−iθb) − exp(−iθb_1))/(iθ)|] |Ψ_X(θ)| dμ_B(θ)
 ≤ (1/2π) (|a − a_1| + |b − b_1|) ∫_R |Ψ_X(θ)| dμ_B(θ)
 ≤ (1/2π) (2((a − a_1)^2 + (b − b_1)^2))^{1/2} c < (√2/(2π)) δ c < ε

where the first inequality follows from Proposition 11.92 and the third inequality follows from Claim 14.55.1. This shows that K(a, b) is continuous at (a, b), ∀a, b ∈


R with a < b. Thus, we may conclude that F is continuous. Hence, we have

K(a, b) = F(b) − F(a) = (1/2π) ∫_R [(exp(−iθa) − exp(−iθb))/(iθ)] Ψ_X(θ) dμ_B(θ)

Fix any a_0 ∈ R, and we have

lim_{b→a_0+} (F(b) − F(a_0))/(b − a_0) = lim_{b→a_0+} (1/2π) ∫_R [(exp(−iθa_0) − exp(−iθb))/(iθ(b − a_0))] Ψ_X(θ) dμ_B(θ) = (1/2π) ∫_R exp(−iθa_0) Ψ_X(θ) dμ_B(θ)

where the second equality follows by the Lebesgue Dominated Convergence Theorem 11.91. Similarly,

lim_{a→a_0−} (F(a_0) − F(a))/(a_0 − a) = lim_{a→a_0−} (1/2π) ∫_R [(exp(−iθa) − exp(−iθa_0))/(iθ(a_0 − a))] Ψ_X(θ) dμ_B(θ) = (1/2π) ∫_R exp(−iθa_0) Ψ_X(θ) dμ_B(θ)

Therefore, we have DF(a_0) = (1/2π) ∫_R exp(−iθa_0) Ψ_X(θ) dμ_B(θ). By the arbitrariness of a_0, we have DF(x) = (1/2π) ∫_R exp(−iθx) Ψ_X(θ) dμ_B(θ) = f(x), ∀x ∈ R. Again, by Lebesgue Dominated Convergence Theorem 11.91, f is continuous. Hence, F is C1, which establishes (14.7b).
(iii) By Definition 14.52, we have Ψ_X(θ) = ∫_R exp(iθx) dF(x) = ∫_R exp(iθx) f(x) dμ_B(x), where the second equality follows from (ii) and Propositions 12.102 and 11.168.
This completes the proof of the theorem. ⊓⊔
By Lévy Inversion Formula, Theorem 14.55, if the characteristic function g : R → C for a random variable X with law μ and cumulative distribution function F (as defined in Definition 14.16) is known, then (i) of Theorem 14.55 implies that the value (1/2)μ({a}) + μ(r◦_{a,b}) + (1/2)μ({b}) is uniquely determined, ∀a, b ∈ R with a < b. Then, g uniquely determines μ(r_{a,b}) = F(b) − F(a), ∀a, b ∈ R with a < b and F being continuous at a and b. Since μ is a probability measure, g uniquely determines F(b), ∀b ∈ R with F being continuous at b. By Proposition 12.51, F is a function of bounded variation, and it is continuous on the right and monotonically nondecreasing. Thus, F has countably many points of discontinuity (the total number of discontinuities with a jump of at least 1/n is at most n), which means that the set of points in R where F is continuous is dense in R. Hence, F is uniquely determined by g. This further implies that μ is uniquely determined by g by Theorem 12.50.

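The inversion integral of Theorem 14.55 can be evaluated numerically. The following Python sketch (our own illustration, not from the text; the truncation half-width and grid size are arbitrary choices) applies a trapezoid rule to recover F(b) − F(a) from the standard normal characteristic function Ψ_X(θ) = exp(−θ²/2).

```python
import cmath
import math

def levy_inversion(a, b, psi, half_width=40.0, n=4001):
    """Numerically evaluate (1/2pi) Int (e^{-i t a} - e^{-i t b})/(i t) psi(t) dt,
    which by the Levy Inversion Formula equals F(b) - F(a) at continuity points."""
    total = 0.0
    h = 2 * half_width / (n - 1)
    for k in range(n):
        t = -half_width + k * h
        if abs(t) < 1e-12:
            val = complex((b - a) * psi(t))          # limit of the kernel at t = 0
        else:
            val = (cmath.exp(-1j*t*a) - cmath.exp(-1j*t*b)) / (1j*t) * psi(t)
        w = 0.5 if k in (0, n - 1) else 1.0          # trapezoid end-point weights
        total += w * val.real * h
    return total / (2 * math.pi)

# standard normal: characteristic function exp(-t^2/2); F(1) - F(-1) ~ 0.6827
p = levy_inversion(-1.0, 1.0, lambda t: math.exp(-t*t/2))
```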

14.7 Convergence in Distribution

Definition 14.56 Let Ω := (Ω, B, P) be a probability measure space, m ∈ N, (x_n)_{n=1}^∞ be a sequence of R^m-valued random variables, where x_n : Ω → R^m and admits law L_n, ∀n ∈ N, and x_0 be an R^m-valued random variable with law L_0. We will say that (x_n)_{n=1}^∞ converges to x_0 in distribution if lim_{n∈N} L_n = L_0 weak∗. In this case, we will write x_n ⇝ x_0 as n → ∞, or lim_{n∈N} x_n ⇝ x_0. Let F_n : R^m → [0, 1] ⊂ R be the cumulative distribution function of x_n as defined in Definition 14.16, ∀n ∈ Z+; we will write lim_{n∈N} F_n ⇝ F_0 or F_n ⇝ F_0 as n → ∞. %
In the above definition, L_n is a probability measure on the measurable space (R^m, B_B(R^m)), ∀n ∈ Z+. Then, L_n ∈ M_f(R^m, B_B(R^m), R) = M_{ft}(R^m, R) = (C_c(R^m, R))∗ by Riesz Representation Theorem 11.209, ∀n ∈ Z+. Thus, lim_{n∈N} L_n = L_0 weak∗ makes sense. Therefore, the definition makes sense. We will note that the convergence in distribution defined above is actually weaker than its classical definition (Williams, 1991, see Definition 17.1). The classical definition implies our definition; on the other hand, our definition, when coupled with the uniform growth boundedness assumption, yields the classical definition.
Theorem 14.57 (Modes of Convergence) Let Ω := (Ω, B, P) be a probability measure space, m ∈ N, and (x_n)_{n=0}^∞ be a sequence of R^m-valued random variables, where x_n : Ω → R^m and admits law L_n, ∀n ∈ Z+. Then, the following statements hold:
(i) If lim_{n∈N} x_n = x_0 a.e. in Ω, then lim_{n∈N} x_n = x_0 in measure in Ω.
(ii) If lim_{n∈N} x_n = x_0 in measure in Ω, then lim_{n∈N} x_n ⇝ x_0.
Proof (i) Let lim_{n∈N} x_n = x_0 a.e. in Ω. Then, by Definition 11.42, there exists E ∈ B with P(E) = 0 such that lim_{n∈N} x_n(ω) = x_0(ω), ∀ω ∈ Ω \ E. Fix any ε ∈ R+. Let E_n := {ω ∈ Ω | |x_n(ω) − x_0(ω)| > ε}, ∀n ∈ N. Clearly, ∩_{n=1}^∞ ∪_{j=n}^∞ E_j ⊆ E. Hence, we have 0 ≤ P(∩_{n=1}^∞ ∪_{j=n}^∞ E_j) ≤ P(E) = 0.
This implies that, by Proposition 11.5 and P(Ω) = 1 < ∞, lim_{n→∞} P(∪_{j=n}^∞ E_j) = 0, which further implies that lim_{n∈N} P(E_n) = 0. Hence, ∃N ∈ N, ∀n ∈ N with n ≥ N, we have P(E_n) < ε. Therefore, lim_{n∈N} x_n = x_0 in measure in Ω.
(ii) Let lim_{n∈N} x_n = x_0 in measure in Ω. Fix any z ∈ C_c(R^m, R). Let ‖z‖_{C_c(R^m,R)} =: c ∈ R+. Fix any ε ∈ R+. Since z ∈ C_c(R^m, R), then ∃M ∈ R+ such that ∀s ∈ R^m with |s| > M, we have |z(s)| < ε/4. By Proposition 5.39, z|_K is uniformly continuous, where K := B̄_{R^m}(0_m, M + 1) ⊆ R^m is a compact set by Proposition 5.40. Then, ∃δ ∈ (0, ε/2 ∧ 1) ⊂ R such that |z(s_1) − z(s_2)| < ε/4, ∀s_1, s_2 ∈ K with |s_1 − s_2| < δ. By Definition 11.56, there exists N ∈ N, ∀n ∈ N


with n ≥ N, we have P({ω ∈ Ω | |x_n(ω) − x_0(ω)| > δ/(4(1 + c))}) < δ/(4(1 + c)). Fix any n ∈ N with n ≥ N. Then, we have

|⟨L_n, z⟩ − ⟨L_0, z⟩| = |∫_{R^m} z(s) dL_n(s) − ∫_{R^m} z(s) dL_0(s)|
 = |∫_Ω z(x_n(ω)) dP(ω) − ∫_Ω z(x_0(ω)) dP(ω)|
 = |∫_Ω (z(x_n(ω)) − z(x_0(ω))) dP(ω)| ≤ ∫_Ω |z(x_n(ω)) − z(x_0(ω))| dP(ω)
 = [∫_{E_1} + ∫_{E_2} + ∫_{E_3}] |z(x_n(ω)) − z(x_0(ω))| dP(ω)

where the first equality follows from Riesz Representation Theorem 11.209, the second equality follows from Definition 14.16, the third equality, the inequality, and the fourth equality follow from Proposition 11.92, E_1 := {ω ∈ Ω | |x_n(ω) − x_0(ω)| ≤ δ/(4(1 + c)) and |x_0(ω)| ∨ |x_n(ω)| < M + 1}, E_2 := {ω ∈ Ω | |x_n(ω) − x_0(ω)| ≤ δ/(4(1 + c)) and |x_0(ω)| ∨ |x_n(ω)| ≥ M + 1}, and E_3 := {ω ∈ Ω | |x_n(ω) − x_0(ω)| > δ/(4(1 + c))}. Note that ∫_{E_1} |z(x_n(ω)) − z(x_0(ω))| dP(ω) < ∫_{E_1} (ε/4) dP ≤ ε/4, since |x_n(ω) − x_0(ω)| < δ and x_0(ω), x_n(ω) ∈ K, ∀ω ∈ E_1; ∫_{E_2} |z(x_n(ω)) − z(x_0(ω))| dP(ω) ≤ ∫_{E_2} (|z(x_n(ω))| + |z(x_0(ω))|) dP ≤ ∫_{E_2} (ε/4 + ε/4) dP ≤ ε/2, since |x_n(ω)| > M and |x_0(ω)| > M, ∀ω ∈ E_2; and ∫_{E_3} |z(x_n(ω)) − z(x_0(ω))| dP(ω) ≤ ∫_{E_3} (|z(x_n(ω))| + |z(x_0(ω))|) dP(ω) ≤ 2c P(E_3) < 2c δ/(4(1 + c)) < δ/2 < ε/4. Thus, we have |⟨L_n, z⟩ − ⟨L_0, z⟩| < ε.
Hence, lim_{n∈N} ⟨L_n, z⟩ = ⟨L_0, z⟩. By the arbitrariness of z, we have lim_{n∈N} L_n = L_0 weak∗. This proves that lim_{n∈N} x_n ⇝ x_0.
This completes the proof of the proposition. ⊓⊔
Definition 14.58 Let Ω := (Ω, B, P) be a probability measure space, Y be a separable Banach space, and M be a set of Y-valued random variables. M is said to be uniformly growth bounded if ∀ε ∈ R+, ∃K ∈ R+, ∀x ∈ M, we have x : Ω → Y satisfies P({ω ∈ Ω | ‖x(ω)‖_Y > K}) < ε. %
Theorem 14.59 (Skorokhod Representation) Let μ_n be a probability measure on (R, B_B(R)) with cumulative distribution function F_n : R → [0, 1] ⊂ R as defined in Definition 14.16, ∀n ∈ Z+. Assume that ∀b ∈ R with F_0 being continuous at b, we have lim_{n∈N} F_n(b) = F_0(b). Then, there exist a probability measure space Ω := (Ω, B, P) and a sequence of random variables (x_n)_{n=0}^∞ such that x_n : Ω → R admits law μ_n, ∀n ∈ Z+, and lim_{n∈N} x_n(ω) = x_0(ω) a.e. ω ∈ Ω.
Proof Define E := {b ∈ R | F_0 is continuous at b} ⊆ R. Note that F_0 is monotonically nondecreasing and is of bounded variation.
Then, F0 has countably many points of discontinuity (the total number of discontinuities with a jump of at least 1/n is at most n).


Take Ω := ((0, 1), B, P ) to be the finite metric measure subspace of R. Fix any n ∈ Z+ , define xn+ (ω) := inf{z ∈ R | Fn (z) > ω} and xn− (ω) := inf{z ∈ R | Fn (z) ≥ ω}, ∀ω ∈ (0, 1) ⊂ R. By Proposition 12.54, Fn is BB (R)-measurable. Since μn is a probability measure, then limb→−∞ Fn (b) = 0 and limb→∞ Fn (b) = 1. This implies that xn+ (ω) ∈ R and xn− (ω) ∈ R, ∀ω ∈ (0, 1) =: Ω. See Fig. 14.1 for an illustration of the function Fn , xn+ , and xn− . It is easy to show that {ω ∈ Ω | xn− (ω) ≤ z} = {ω ∈ Ω | ω ≤ Fn (z)} ∈ B, ∀z ∈ R. Then, by Proposition 11.35, xn− is B-measurable. Thus, xn− is an R-valued random variable on Ω. The preceding argument further implies that xn− admits law μn since its cumulative distribution function as defined in Definition 14.16 is Fn . It is easy to show that {ω ∈ Ω | xn+ (ω) ≤ z} ⊇ {ω ∈ Ω | ω < Fn (z)} ∈ B, ∀z ∈ R. Note that xn+ (ω) ≥ xn− (ω), ∀ω ∈ Ω. Then, {ω ∈ Ω | xn+ (ω) ≤ z} ⊆ {ω ∈ Ω | xn− (ω) ≤ z} = {ω ∈ Ω | ω ≤ Fn (z)}, ∀z ∈ R. Thus, we have (0, F (z)) ⊆ {ω ∈ Ω | xn+ (ω) ≤ z} ⊆ (0, F (z)]. Then, {ω ∈ Ω | xn+ (ω) ≤ z} equals either (0, F (z)) or (0, F (z)], which is B-measurable in either case. By the arbitrariness of z and Proposition 11.35, xn+ is B-measurable. Hence, xn+ is an R-valued random variables on Ω. The preceding argument further implies that xn+ admits law μn and xn+ = xn− a.e. in Ω. Fix ω ∈ Ω. Let z ∈ E with ω < F0 (z). By the assumption, we have limn∈N Fn (z) = F0 (z) > ω. Then, for sufficiently large n, Fn (z) > ω, which implies that xn+ (ω) ≤ z by the discussion in the preceding paragraph. Then, we have lim supn∈N xn+ (ω) ≤ z. By the arbitrariness of z, we have lim supn∈N xn+ (ω) ≤ inf{z ∈ E | ω < F0 (z)}. Since E is dense in R, it is straightforward to show that lim supn∈N xn+ (ω) ≤ inf{z ∈ E | ω < F0 (z)} = inf{z ∈ R | ω < F0 (z)} = x0+ (ω). Let z ∈ E and z < x0− (ω). Then, ω > F0 (z), by the second preceding paragraph. By the assumption, we have limn∈N Fn (z) = F0 (z) < ω. 
Then, for sufficiently large n, we have Fn (z) < ω. This implies that xn− (ω) > z for sufficiently large n, by the second preceding paragraph. Then, we have lim infn∈N xn− (ω) ≥ z. Since E is dense in R, lim infn∈N xn− (ω) ≥ sup{z ∈ E | z < x0− (ω)} = x0− (ω). Combining the result of the above two paragraphs, we have x0− (ω) ≤ lim infn∈N xn− (ω) ≤ lim supn∈N xn− (ω) ≤ lim supn∈N xn+ (ω) ≤ x0+ (ω), ∀ω ∈ Ω. When x0− (ω) = x0+ (ω), we have x0− (ω) = limn∈N xn− (ω). Since x0− (ω) = x0+ (ω) a.e. ω ∈ Ω, we have x0− (ω) = limn∈N xn− (ω) a.e. ω ∈ Ω, by  ∞ Proposition 11.48. Hence, xn− n=0 is the sequence we seek. This completes the proof of the theorem. ' & Now, we prove a result that connects our definition of convergence in distribution to that of Williams (1991). Proposition 14.60 Let Ω := (Ω, B, P ) be a probability measure space, (xn )∞ n=0 be a sequence of (R-valued) random variables with xn : Ω → R with law μn , ∀n ∈ Z+ , and Fn : R → [0, 1] ⊂ R be the cumulative distribution function for μn as defined in Definition 14.16, ∀n ∈ Z+ . Then, the following statements hold: (i) If limn∈N R dμn (x), z(x) = R dμ0 (x), z(x) ∈ R, ∀z ∈ Cb (R, R), then limn∈N μn  μ0 .


Fig. 14.1 A typical cumulative distribution function Fn and the corresponding random variables xn+ and xn−

(ii) limn∈N R dμn (x), z(x) = R dμ0 (x), z(x) ∈ R, ∀z ∈ Cb (R, R), if, and only if, limn∈N Fn (b) = F0 (b), ∀b ∈ R with F0 being continuous at b. ∞ (iii) If limn∈N μn  μ0 and (x n )n=1 is uniformly growth bounded, then limn∈N R dμn (x), z(x) = R dμ0 (x), z(x) ∈ R, ∀z ∈ Cb (R, R).

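The generalized inverse x_n^−(ω) = inf{z ∈ R | F_n(z) ≥ ω} used in the proof of Theorem 14.59 can be computed directly for discrete laws. The following Python sketch (a hypothetical example of our own, not from the text) builds x_n^− for a small family of three-point distributions converging in distribution, exhibiting the pointwise convergence of the representing variables on (0, 1).

```python
import bisect

def quantile(omega, xs, cdf_vals):
    """x^-(omega) = inf{z : F(z) >= omega} for a discrete law with sorted
    support xs and cumulative probabilities cdf_vals."""
    i = bisect.bisect_left(cdf_vals, omega)   # first index with F >= omega
    return xs[i]

# F_n: law of (1 + 1/n) * Z with Z uniform on {-1, 0, 1}; F_n ⇝ law of Z.
def sample_path(omega, n):
    s = 1.0 + 1.0 / n
    return quantile(omega, [-s, 0.0, s], [1/3, 2/3, 1.0])

x_limit = quantile(0.5, [-1.0, 0.0, 1.0], [1/3, 2/3, 1.0])   # x_0^-(0.5)
x_10 = sample_path(0.5, 10)                                   # x_10^-(0.5)
```

For each fixed ω ∈ (0, 1), `sample_path(omega, n)` converges to the limiting quantile as n grows, which is exactly the almost-everywhere convergence constructed in the proof.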

Proof (i) Note that limn∈N μn  μ0 means that limn∈N μn = μ0 weak∗ , that is limn∈N μn , z = μ0 , z, ∀z ∈ Cc (R, R). By Riesz Representation Theorem 11.209, this is equivalent to lim n∈N R dμn (x), z(x) = dμ (x), z(x), ∀z ∈ C (R, R) ⊆ C (R, R). Thus, (i) 0 c b R follows readily. (ii) “Necessity” Let limn∈N R dμn (x), z(x) = R dμ0 (x), z(x) ∈ R, ∀z ∈ Cb (R, R). Fix any b ∈ R and δ ∈ R+ . Define hb,δ : R → ⎧ x≤b ⎨1 [0, 1] ⊂ R by hb,δ (x) = 1 − x−b , ∀x ∈ R. Clearly, hb,δ ∈ δ b 2δ . Hence, (Xn )∞ n=1 is uniformly growth bounded. BB CC By Proposition 14.60, we have liml∈N μnl , z BB= μ,CC z, ∀z ∈ Cb (R, R). Let z = χR,R , we have μ(R) = μ, z = liml∈N μnl , z = liml∈N μnl (R) = 1. Combined with P ◦ μ(R) ≤ 1, we conclude that P ◦ μ(R) = 1 and μ is a probability measure on (R, BB (R)). Therefore, by Example 14.17, there exists a ¯ := (Ω, ¯ P¯ ) and a random variable X : Ω¯ → R ¯ B, probability measure space Ω with law μ. Let F : R → [0, 1] ⊂ R be the cumulative distribution function of μ as defined in Definition 14.16. Then, we have g(θ ) = limBBn∈N ΨXn (θCC) = limn∈N R exp(iθ x) dμn (x) = limn∈N μn , z0 (θ ) = liml∈N μnl , z0 (θ ) = μ, z0 (θ ) = R exp(iθ x) dμ(x) = ΨX (θ ), ∀θ ∈ R, where the first equality follows from the assumption of the theorem, the second equality follows from Definition 14.52, the third equality follows from the definition z0 (θ ) : R → C is defined by z0 (θ )(x) = exp(iθ x), ∀x ∈ R, the fourth equality follows from Proposition 3.70, the fifth equality follows from the preceding discussion and Re (z0 (θ )) ∈ Cb (R, R) and Im (z0 (θ )) ∈ Cb (R, R), and the last equality follows from Definition 14.52. By (ii) of Proposition 14.60, liml→N Fnl (b) = F (b), ∀b ∈ R with F being continuous at b. By Lévy Inversion Formula, Theorem 14.55, and the discussion immediately after the theorem, g uniquely determines μ. 
∞ we have shown that for the  Thus, sequence (μn )∞ , there is a subsequence μ n l l=1 such that liml∈N μnl  μ, n=1 where μ is the probability measure on (R, BB (R)) that is the law of a random variable X whose characteristic function  ∞ is given by g. Note that g is independent of the choice of the subsequence μnl l=1 . This clearly holds for any subsequence  ∞ ∞ of (μn )∞ n=1 . Thus, ∀ subsequence μni i=1 of (μn )n=1 , there exists a further


subsubsequence (μ_{n_{i_l}})_{l=1}^∞ such that lim_{l∈N} μ_{n_{i_l}} ⇝ μ. By the uniqueness of the weak∗ limit (Propositions 7.120 and 3.71), we have lim_{n∈N} μ_n ⇝ μ. Hence, lim_{n∈N} X_n ⇝ X. This completes the proof of the theorem. ⊓⊔
Theorem 14.63 (Central Limit Theorem) Let Ω := (Ω, B, P) be a probability measure space, b ∈ R+, (X_n)_{n=1}^∞ be a sequence of independent (R-valued) random variables, X_n : Ω → R, with E(X_n) = 0, E(X_n^2) ≤ b, and E(|X_n|^3) ≤ b, ∀n ∈ N, and S_n := Σ_{i=1}^n X_i. Assume that lim_{n∈N} (E(S_n^2))^3 / n^2 = ∞. Then, (S_n / √(E(S_n^2)))_{n=1}^∞ is uniformly growth bounded and lim_{n∈N} S_n / √(E(S_n^2)) ⇝ X, where X is a random variable with cumulative distribution function N(0, 1).
Proof Clearly, E(S_n) = 0 and b_n := E(S_n^2) = Σ_{i=1}^n E(X_i^2) ≤ nb, ∀n ∈ N. Fix any n ∈ N. Let Ψ_{X_n} : R → C be the characteristic function of X_n and Ψ_{V_n} : R → C be the characteristic function of V_n := S_n / √(E(S_n^2)). Then, by Propositions 14.53 and 14.54, Ψ_{V_n}(θ) = Π_{i=1}^n Ψ_{X_i}(θ/√b_n), ∀θ ∈ R. By Proposition 14.53, we have Ψ_{X_n}(0) = 1, Ψ_{X_n}^{(1)}(0) = i E(X_n) = 0, Ψ_{X_n}^{(2)}(0) = −E(X_n^2), and |Ψ_{X_n}^{(3)}(θ)| = |E((iX_n)^3 exp(iθX_n))| ≤ E(|X_n|^3) ≤ b. By Taylor's Theorem 9.48, we have

R_n(θ) := Ψ_{X_n}(θ) − (1 − (E(X_n^2)/2) θ^2),   |R_n(θ)| ≤ (1/6) E(|X_n|^3) |θ|^3,   ∀θ ∈ R

Let h_{n,i}(θ) := −(E(X_i^2)/b_n)(θ^2/2) + R_i(θ/√b_n), ∀n ∈ N, ∀i ∈ N. Fix any θ ∈ R. Clearly, we

have lim_{n∈N} h_{n,i}(θ) = 0, lim_{n∈N} R_i(θ/√b_n) = 0, and the convergence is uniform in i ∈ N. For sufficiently large n, we have |h_{n,i}(θ)| < 1/2 and |R_i(θ/√b_n)| < 1/2, ∀i ∈ N. Then, we have, using the notation of Example 12.139,

|Ψ_{V_n}(θ) − exp(−θ^2/2)| = |Π_{i=1}^n Ψ_{X_i}(θ/√b_n) − exp(−θ^2/2)|
 = exp(−θ^2/2) |Π_{i=1}^n [1 − (E(X_i^2)/b_n)(θ^2/2) + R_i(θ/√b_n)] exp((E(X_i^2)/b_n)(θ^2/2)) − 1|
 = exp(−θ^2/2) |Π_{i=1}^n (1 + h_{n,i}(θ)) exp(−h_{n,i}(θ)) exp(R_i(θ/√b_n)) − 1|
 = exp(−θ^2/2) |exp(Σ⊕_{i=1}^n [ln(1 + h_{n,i}(θ)) ⊕ ln(exp(−h_{n,i}(θ))) ⊕ ln(exp(R_i(θ/√b_n)))]) − 1|
 =: exp(−θ^2/2) |exp(u_n(θ)) − 1|

where the second equality follows from the definition of b_n, the third equality follows from the definition of h_{n,i}(θ), and the fourth equality follows from Proposition 12.140. Note that

ρ(u_n(θ), p_0) = ρ(Σ⊕_{i=1}^n [ln(1 + h_{n,i}(θ)) ⊕ ln(exp(−h_{n,i}(θ))) ⊕ ln(exp(R_i(θ/√b_n)))], p_0)
 ≤ Σ_{i=1}^n ρ(ln(1 + h_{n,i}(θ)) ⊕ ln(exp(−h_{n,i}(θ))) ⊕ ln(exp(R_i(θ/√b_n))), p_0)
 ≤ Σ_{i=1}^n [ρ(ln(1 + h_{n,i}(θ)) ⊕ ln(exp(−h_{n,i}(θ))), p_0) + ρ(ln(exp(R_i(θ/√b_n))), p_0)]
 ≤ Σ_{i=1}^n [2 |h_{n,i}(θ)|^2 + |R_i(θ/√b_n)|]
 ≤ Σ_{i=1}^n [(b|θ|^3/6) b_n^{−3/2} + θ^4 b^2 / b_n^2 + b^2 θ^6 / (9 b_n^3)]
 = (n b |θ|^3 / 6) b_n^{−3/2} + n θ^4 b^2 / b_n^2 + (n / (9 b_n^3)) b^2 θ^6 → 0 as n → ∞

where the first equality follows from the previous assignment, the first and second inequalities follow from (v) of Proposition 12.140, the third inequality follows from the (vi) and (vii) of Proposition 12.140, the fourth inequality follows from the definition of hn,i (θ ) and Ri (θ ), and the convergence statement follows from the assumption of the theorem. Hence, by Example 12.139, limn∈N exp (un (θ )) = exp(p0 ) = 1 and limn∈N ΨVn (θ ) = exp(−θ 2 /2) =: g(θ ). Note that g(θ ) is the characteristic function of a random variable X ∼ N(0, 1). Then, the result follows from Lévy Convergence Theorem 14.62. This completes the proof of the theorem. ' &

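The conclusion of Theorem 14.63 is easy to probe by simulation. The following Python sketch (our own illustration, not from the text) samples the normalized sums S_n/√(E(S_n²)) for centered uniform summands and checks that roughly 68.3% of the mass lies in [−1, 1], as for N(0, 1).

```python
import math
import random

def normalized_sums(n_terms, n_paths, seed=2):
    """Sample S_n / sqrt(E(S_n^2)) for S_n a sum of i.i.d. centered uniforms
    on [-1, 1]; by the Central Limit Theorem the law is approximately N(0, 1)."""
    rng = random.Random(seed)
    scale = math.sqrt(n_terms * (1.0 / 3.0))   # Var(U[-1, 1]) = 1/3
    out = []
    for _ in range(n_paths):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n_terms))
        out.append(s / scale)
    return out

vals = normalized_sums(100, 20000)
frac_in_band = sum(1 for v in vals if -1.0 <= v <= 1.0) / len(vals)
```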

14.9 Uniform Integrability and Martingales

Definition 14.64 Let Ω := (Ω, B, P) be a probability measure space, Y be a separable Banach space, and F be a collection of Y-valued random variables. F is said to be uniformly integrable if ∀ε ∈ (0, ∞) ⊂ R, ∃K ∈ [0, ∞) ⊂ R such that, ∀X ∈ F, we have ∫_{{ω∈Ω | ‖X(ω)‖_Y > K}} ‖X(ω)‖_Y dP(ω) < ε. %

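A standard counterexample shows that L̄_1-boundedness alone does not give uniform integrability. The following Python sketch (our own illustration, not from the text) computes the tail mass in Definition 14.64 for the family X_n = n·χ_{[0,1/n]} on ([0, 1], Lebesgue): every X_n has E|X_n| = 1, yet for each K the tail mass is 1 for all n > K, so no single K works for the whole family.

```python
def tail_mass(n, K):
    """Integral of |X_n| over {|X_n| > K} for X_n = n * chi_[0, 1/n] on [0, 1].
    The family is L1-bounded (E|X_n| = 1) but not uniformly integrable:
    the tail mass is 1 whenever n > K."""
    if n > K:
        return n * (1.0 / n)   # the whole spike of height n lies above level K
    return 0.0

masses = [tail_mass(n, 10) for n in (1, 5, 100, 10**6)]
```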
Clearly, for a uniformly integrable (UI) family F of Y-valued random variables, let ε = 1 and K_1 ∈ [0, ∞) ⊂ R be as in Definition 14.64. Then, ∀X ∈ F, we have

E(P ◦ X) = ∫_Ω ‖X(ω)‖_Y dP(ω) = ∫_{{‖X‖_Y > K_1}} ‖X(ω)‖_Y dP(ω) + ∫_{{‖X‖_Y ≤ K_1}} ‖X(ω)‖_Y dP(ω)
         < 1 + ∫_{{‖X‖_Y ≤ K_1}} K_1 dP(ω) ≤ 1 + K_1

where the first equality follows from Definition 14.1, the second equality follows from Proposition 11.89 and Bounded Convergence Theorem 11.77, and the first inequality follows from Proposition 11.89 and Definition 14.64. Hence, E(P ◦ X) < K_1 + 1, ∀X ∈ F, and F ⊆ B_{L̄_1(Ω,Y)}(ϑ_{L̄_1(Ω,Y)}, 1 + K_1) ⊆ L̄_1(Ω, Y).
Proposition 14.65 Let Ω := (Ω, B, P) be a probability measure space, Y be a separable Banach space, and F be a collection of Y-valued random variables. Then, the following statements hold:
(i) If F is bounded in L̄_p(Ω, Y) with p ∈ (1, ∞) ⊂ R, that is, ∃A ∈ [0, ∞) ⊂ R such that E(P_p ◦ X) ≤ A, ∀X ∈ F, then F is uniformly integrable.
(ii) If there exists g : Ω → [0, ∞) ⊂ R such that g is B-measurable, E(g) < ∞, and ‖X(ω)‖_Y ≤ g(ω) a.e. ω ∈ Ω, ∀X ∈ F, then F is uniformly integrable.
Proof (i) Fix any ε ∈ (0, ∞) ⊂ R. Let K := (ε/(1 + A))^{−1/(p−1)}. ∀X ∈ F, by Propositions 11.38 and 7.21, P_q ◦ X is B-measurable, ∀q ∈ [1, ∞) ⊂ R. Then,

∫_{{‖X‖_Y > K}} ‖X(ω)‖_Y dP(ω) ≤ ∫_{{‖X‖_Y > K}} ‖X(ω)‖_Y^p K^{1−p} dP(ω)
 ≤ K^{1−p} ∫_{{‖X‖_Y > K}} ‖X(ω)‖_Y^p dP(ω) ≤ K^{1−p} ∫_Ω ‖X(ω)‖_Y^p dP(ω) ≤ (ε/(1 + A)) A < ε

where the inequalities follow from Proposition 11.83. Hence, F is uniformly integrable.
(ii) Fix any ε ∈ (0, ∞) ⊂ R. By Proposition 11.84, ∃δ ∈ (0, ∞) ⊂ R such that ∀E ∈ B with P(E) < δ, we have ∫_E g dP < ε, since E(g) < ∞. Note that P(∩_{K∈N} {ω ∈ Ω | g(ω) > K}) = P(∅) = 0. By Proposition 11.5, we have 0 = lim_{K∈N} P({ω ∈ Ω | g(ω) > K}). Then, ∃K ∈ N such that E := {ω ∈ Ω | g(ω) > K} ∈ B and P(E) < δ. ∀X ∈ F,

∫_{{ω∈Ω | ‖X(ω)‖_Y > K}} ‖X(ω)‖_Y dP(ω) ≤ ∫_E P ◦ X dP ≤ ∫_E g dP < ε

where the first inequality follows from Proposition 11.83, the second inequality follows from Proposition 11.83, and the last inequality follows from P(E) < δ and the preceding discussion. Hence, F is uniformly integrable.
This completes the proof of the proposition. ⊓⊔
Theorem 14.66 Let Ω := (Ω, B, P) be a probability measure space, Y be a separable Banach space, and X ∈ L̄_1(Ω, Y) be a Y-valued random variable. Then, the set F := {f ∈ L̄_1(Ω, Y) | ∃B̄ ⊆ B which is a sub-σ-algebra on Ω, f ∈ E(X|B̄)} is uniformly integrable.
Proof Fix any ε ∈ (0, ∞) ⊂ R. By Proposition 11.84, ∃δ ∈ (0, ∞) ⊂ R such that ∀E ∈ B with P(E) < δ, we have ∫_E P ◦ X dP < ε. Let K ∈ [0, ∞) ⊂ R be such that K^{−1} E(P ◦ X) < δ. Fix any B̄ ⊆ B with B̄ being a σ-algebra on Ω. Let Ω̄ := (Ω, B̄, P̄ := P|_{B̄}). Fix any f ∈ E(X|B̄). By (g) of Proposition 14.11, we have P ◦ f(ω) ≤ f̄(ω) a.e. ω ∈ Ω̄, where f̄ ∈ E(P ◦ X|B̄). Let Ē := {ω ∈ Ω | P ◦ f(ω) > K} ∈ B̄, since f is B̄-measurable. Then, K P̄(Ē) = K ∫_Ē 1 dP̄ ≤ ∫_Ē P ◦ f dP̄ ≤ ∫_Ω P ◦ f dP̄ ≤ ∫_Ω f̄ dP̄ = ∫_Ω P ◦ X dP = E(P ◦ X), where the first equality follows from Proposition 11.75, the first, second, and third inequalities follow from Proposition 11.83, and the second equality follows from Definition 14.7. This implies that P(Ē) = P̄(Ē) ≤ K^{−1} E(P ◦ X) < δ. Therefore, ∫_Ē P ◦ f dP = ∫_Ē P ◦ f dP̄ ≤ ∫_Ē f̄ dP̄ = ∫_Ē P ◦ X dP < ε, where the first equality follows from the fact that f is B̄-measurable and Proposition 11.72, the first inequality follows from Proposition 11.83, and the second equality follows from Definition 14.7. Hence, F is uniformly integrable.
This completes the proof of the theorem. ⊓⊔
Proposition 14.67 (Chebyshev Inequality) Let Ω := (Ω, B, P) be a probability measure space and X ∈ L̄_1(Ω, R) be an R-valued random variable. Assume that



X(ω) ≥ 0 a.e. in Ω. Then, ∀c ∈ (0, ∞) ⊂ R, we have P({ω ∈ Ω | X(ω) ≥ c}) ≤ c⁻¹ E(X).

Proof We have the following derivations:

    P({ω ∈ Ω | X(ω) ≥ c}) = ∫_{{ω∈Ω | X(ω)≥c}} 1 dP ≤ ∫_{{ω∈Ω | X(ω)≥c}} c⁻¹ X(ω) dP(ω)
        = c⁻¹ ∫_{{ω∈Ω | X(ω)≥c}} X(ω) dP(ω) ≤ c⁻¹ ∫_Ω X(ω) dP(ω) = c⁻¹ E(X)

where the first equality follows from Proposition 11.75, the first inequality follows from Proposition 11.89, the second equality follows from Proposition 11.89, and the second inequality follows from the fact that X(ω) ≥ 0 a.e. in Ω. □

Theorem 14.68 Let Ω := (Ω, B, P) be a probability measure space, Y be a separable Banach space, (X_n)_{n=1}^∞ ⊆ L̄₁(Ω, Y) =: Z̄ be a sequence of Y-valued random variables, and X ∈ L̄₁(Ω, Y) be a Y-valued random variable. Then, lim_{n∈N} X_n = X in Z̄, or equivalently lim_{n∈N} E(P∘(X_n − X)) = 0, implies that the following two conditions are satisfied:

(i) lim_{n∈N} X_n = X in measure in Ω.
(ii) The sequence (X_n)_{n=1}^∞ is uniformly integrable.

The converse of the above holds if Y is a separable real Hilbert space.

Proof "Necessity" Let lim_{n∈N} E(P∘(X_n − X)) = 0. By Proposition 11.211, we have (i). Fix any ε ∈ (0, 1) ⊂ R. Then, ∃n₀ ∈ N, ∀n ∈ N with n₀ ≤ n, we have E(P∘(X_n − X)) < ε² < ε. By Chebyshev Inequality (Proposition 14.67), we have P({ω ∈ Ω | P∘(X_n − X)(ω) > ε}) ≤ ε⁻¹ E(P∘(X_n − X)) < ε, ∀n ∈ N with n₀ ≤ n. By Proposition 11.84, ∃δ ∈ (0, ∞) ⊂ R such that ∀E ∈ B with P(E) < δ, we have ∫_E P∘X_n dP < ε, n = 1, …, n₀, and ∫_E P∘X dP < ε. Clearly, sup_{n∈N} E(P∘X_n) =: M ∈ [0, ∞) ⊂ R. Choose K > 0 such that K⁻¹M < δ. Then, ∀n ∈ N, by Chebyshev Inequality (Proposition 14.67), we have P({ω ∈ Ω | P∘X_n(ω) > K}) ≤ K⁻¹ E(P∘X_n) ≤ K⁻¹M < δ. If n ≥ n₀, then

    ∫_{{ω∈Ω | P∘X_n(ω)>K}} P∘X_n dP ≤ ∫_{{ω∈Ω | P∘X_n(ω)>K}} P∘X dP + ∫_{{ω∈Ω | P∘X_n(ω)>K}} P∘(X_n − X) dP
        < ε + ∫_Ω P∘(X_n − X) dP < 2ε

where the first inequality follows from Definition 7.1 and Proposition 11.83, the second inequality follows from the preceding discussion and Proposition 11.83, and the last inequality follows from the preceding discussion. On the other hand, if n < n₀, we have ∫_{{ω∈Ω | P∘X_n(ω)>K}} P∘X_n dP < ε by the preceding discussion. Thus, (X_n)_{n=1}^∞ is uniformly integrable.



"Sufficiency" Let (i) and (ii) hold and Y be a separable real Hilbert space. ∀K ∈ (0, ∞) ⊂ R, define ϕ_K : Y → B̄_Y(ϑ_Y, K) by

    ϕ_K(y) = { (K/‖y‖_Y) y   if ‖y‖_Y > K
             { y              if ‖y‖_Y ≤ K        ∀y ∈ Y.

Clearly, ϕ_K is continuous. Fix any ε ∈ R₊. By (ii) and the fact that X ∈ L̄₁(Ω, Y), we choose K ∈ R₊ such that ∫_{{ω∈Ω | P∘X_n(ω)>K}} P∘X_n dP < ε/3, ∀n ∈ N, and ∫_{{ω∈Ω | P∘X(ω)>K}} P∘X dP < ε/3. ∀n ∈ N, we have

    E(P∘(ϕ_K∘X_n − X_n)) = ∫_{{ω∈Ω | P∘X_n(ω)>K}} ‖X_n(ω) − (K/‖X_n(ω)‖_Y) X_n(ω)‖_Y dP(ω)
        = ∫_{{ω∈Ω | P∘X_n(ω)>K}} (1 − K/‖X_n(ω)‖_Y) ‖X_n(ω)‖_Y dP(ω)
        ≤ ∫_{{ω∈Ω | P∘X_n(ω)>K}} P∘X_n dP < ε/3

and, similarly, E(P∘(ϕ_K∘X − X)) < ε/3. We need the following intermediate result.

Claim ∀x, y ∈ Y, ‖ϕ_K(x) − ϕ_K(y)‖_Y ≤ ‖x − y‖_Y.

Proof of Claim We distinguish four exhaustive cases: Case 1: ‖x‖_Y ≤ K and ‖y‖_Y ≤ K; Case 2: ‖x‖_Y ≤ K and ‖y‖_Y > K; Case 3: ‖x‖_Y > K and ‖y‖_Y ≤ K; Case 4: ‖x‖_Y > K and ‖y‖_Y > K.

Case 1: x, y ∈ B̄_Y(ϑ_Y, K). Then, we have ‖ϕ_K(x) − ϕ_K(y)‖_Y = ‖x − y‖_Y. This case is proved.

Case 2: ‖x‖_Y ≤ K and ‖y‖_Y > K. Let C := B̄_Y(ϑ_Y, K), which is clearly a closed convex set. The minimization problem min_{v∈C} ‖y − v‖_Y admits minimum ‖y‖_Y − K and is achieved at v₀ = (K/‖y‖_Y) y =: αy ∈ C. This is because

    ‖y − v‖²_Y = ⟨y − v, y − v⟩_Y = ‖y‖²_Y − 2⟨v, y⟩_Y + ‖v‖²_Y
        ≥ ‖y‖²_Y − 2‖v‖_Y ‖y‖_Y + ‖v‖²_Y = (‖y‖_Y − ‖v‖_Y)² ≥ (‖y‖_Y − K)²,  ∀v ∈ C

where the first equality follows from Proposition 13.2, the second equality follows from Definition 13.1, the first inequality follows from Proposition 13.2, and the second inequality follows from the fact that v ∈ C and ‖y‖_Y > K. Thus, by Theorem 13.39, we have ⟨(1 − α)y, v − αy⟩_Y = ⟨y − v₀, v − v₀⟩_Y ≤ 0, ∀v ∈ C. Thus, we have ‖ϕ_K(x) − ϕ_K(y)‖_Y = ‖x − αy‖_Y ≤ ‖x − y‖_Y, where the inequality follows since, with t := x − αy,

    ‖x − y‖²_Y − ‖x − αy‖²_Y = ‖t − (1 − α)y‖²_Y − ‖t‖²_Y = −2⟨t, (1 − α)y⟩_Y + (1 − α)²‖y‖²_Y ≥ 0.

This case is proved.

Case 3: ‖x‖_Y > K and ‖y‖_Y ≤ K. This follows from symmetry and Case 2.

Case 4: ‖x‖_Y > K and ‖y‖_Y > K. Note that ‖ϕ_K(x) − ϕ_K(y)‖_Y = ‖(K/‖x‖_Y) x − (K/‖y‖_Y) y‖_Y =: ‖α_x x − α_y y‖_Y. Without loss of generality, assume ‖y‖_Y ≥ ‖x‖_Y. Then, α_x ≥ α_y > 0. Now, ‖α_x x − α_y y‖_Y = α_x ‖x − (α_y/α_x) y‖_Y ≤ α_x ‖x − y‖_Y ≤ ‖x − y‖_Y, where the first inequality follows from Case 2 with K set to ‖x‖_Y. This proves the case.

In all cases, we have ‖ϕ_K(x) − ϕ_K(y)‖_Y ≤ ‖x − y‖_Y. This completes the proof of the claim. □

Fix any subsequence (X_{n_k})_{k=1}^∞ of (X_n)_{n=1}^∞. By (i), we have lim_{k∈N} X_{n_k} = X in measure in Ω. By Proposition 11.57, there exists a subsubsequence (X_{n_{k_l}})_{l=1}^∞ of (X_{n_k})_{k=1}^∞ such that lim_{l∈N} X_{n_{k_l}} = X a.e. in Ω. Then, by Proposition 11.52, we have lim_{l∈N} ϕ_K∘X_{n_{k_l}} = ϕ_K∘X a.e. in Ω. Again by Proposition 11.52, we have lim_{l∈N} P∘(ϕ_K∘X_{n_{k_l}} − ϕ_K∘X) = 0 a.e. in Ω. By Bounded Convergence Theorem 11.77, we have lim_{l∈N} ∫_Ω P∘(ϕ_K∘X_{n_{k_l}} − ϕ_K∘X) dP = 0. Since the above holds for any subsequence of (X_n)_{n=1}^∞, by Proposition 3.71, we have lim_{n∈N} ∫_Ω P∘(ϕ_K∘X_n − ϕ_K∘X) dP = 0. Then, ∃n₀ ∈ N, ∀n ∈ N with n₀ ≤ n, we have E(P∘(ϕ_K∘X_n − ϕ_K∘X)) < ε/3. This leads to E(P∘(X_n − X)) ≤ E(P∘(ϕ_K∘X_n − X_n)) + E(P∘(ϕ_K∘X_n − ϕ_K∘X)) + E(P∘(ϕ_K∘X − X)) < ε, ∀n ∈ N with n₀ ≤ n. Hence, we have lim_{n∈N} E(P∘(X_n − X)) = 0, i.e., lim_{n∈N} X_n = X in Z̄. This completes the proof of the theorem. □
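The Chebyshev inequality of Proposition 14.67 is easy to sanity-check by simulation. The following is a minimal sketch using only the Python standard library; the choice of an exponential distribution (so that E(X) = 1) is ours and purely illustrative:

```python
import random

random.seed(0)

# Sample a nonnegative random variable X ~ Exp(1), so E(X) = 1.
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]

c = 3.0
mean = sum(samples) / n                       # empirical E(X)
tail = sum(1 for x in samples if x >= c) / n  # empirical P(X >= c)

# Chebyshev bound of Proposition 14.67: P(X >= c) <= E(X)/c.
print(tail, mean / c)
assert tail <= mean / c
```

The bound is loose here (the true tail probability e⁻³ ≈ 0.05 is well below E(X)/c = 1/3), which is typical: the inequality trades sharpness for complete generality.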

14.10 Existence of the Wiener Process

Proposition 14.69 Let Ω := (Ω, B, P) be a probability measure space, Y be a separable reflexive Banach space with Y* being separable, B̂ ⊆ B be a sub-σ-algebra, Ω̂ := (Ω, B̂, P̂ := P|_B̂) be the probability measure space, p ∈ [1, ∞) ⊂ R, f ∈ L̄_p(Ω, Y) be a Y-valued random variable, g ∈ E(P_p∘f | B̂) ∈ L₁(Ω̂, R), and h ∈ E(f | B̂) ∈ L̄₁(Ω̂, Y). Then, h ∈ L̄_p(Ω̂, Y) and P_p∘h ≤ g a.e. in Ω̂.

Proof Note that f ∈ L̄_p(Ω, Y) implies that f ∈ L̄₁(Ω, Y), since Ω is finite and p ≥ 1. By Definition 14.7, g ∈ L̄₁(Ω̂, [0, ∞)) and h ∈ L̄₁(Ω̂, Y) exists. Then, by (g) of Proposition 14.11, we have P∘h ≤ l a.e. in Ω̂, where l ∈ E(P∘f | B̂) ∈ L̄₁(Ω̂, [0, ∞)). Thus, P_p∘h = (P∘h)^p ≤ l^p ≤ g a.e. in Ω̂, where the last inequality follows from (i) of Propositions 14.11 and 8.25. Then,

    ‖h‖^p_{L̄_p(Ω̂,Y)} = Ê(P_p∘h) = E(P_p∘h) ≤ E(g) = E(P_p∘f) < ∞

where the first equality follows from Definition 14.1, the second equality follows from Proposition 11.72, the first inequality follows from Proposition 11.83 and the preceding discussion, the third equality follows from (a) of Proposition 14.11, and the last inequality follows from f ∈ L̄_p(Ω, Y). Hence, h ∈ L̄_p(Ω̂, Y). This completes the proof of the proposition. □

Theorem 14.70 Let Ω := (Ω, B, P) be a probability measure space, Y be a separable real Hilbert space, B̂ ⊆ B be a sub-σ-algebra, Ω̂ := (Ω, B̂, P̂ := P|_B̂) be the probability measure space, f ∈ L̄₂(Ω, Y) be a Y-valued random variable, and h ∈ E(f | B̂) ∈ L₁(Ω̂, Y). Then, h ∈ L̄₂(Ω̂, Y) and ‖f − h‖²_{L̄₂(Ω,Y)} =



E(P₂∘(f − h)) = E(P₂∘f) − E(P₂∘h) ≤ ‖f − g‖²_{L̄₂(Ω,Y)}, ∀g : Ω → Y that is B̂-measurable, with equality holding if, and only if, g = h a.e. in Ω̂.

Proof By Definition 14.7, E(f | B̂) ∈ L₁(Ω̂, Y) exists. Hence, h ∈ L̄₁(Ω̂, Y) exists. By Proposition 14.69, h ∈ L̄₂(Ω̂, Y). This implies that

    λ := inf_{g:Ω→Y being B̂-measurable} ‖f − g‖²_{L̄₂(Ω,Y)} ∈ [0, ∞) ⊂ R

and the infimum must be such that g ∈ L̄₂(Ω̂, Y). Then,

    λ = inf_{g∈L̄₂(Ω̂,Y)} ‖f − g‖²_{L̄₂(Ω,Y)}
      = inf_{g∈L̄₂(Ω̂,Y)} E(P₂∘(f − g))
      = inf_{g∈L̄₂(Ω̂,Y)} E(E(P₂∘(f − g) | B̂))
      = inf_{g∈L̄₂(Ω̂,Y)} E(E(⟨f, f⟩_Y − 2⟨f, g⟩_Y + ⟨g, g⟩_Y | B̂))
      = inf_{g∈L̄₂(Ω̂,Y)} E(E(⟨f, f⟩_Y | B̂) − 2E(⟨f, g⟩_Y | B̂) + E(⟨g, g⟩_Y | B̂))
      = inf_{g∈L̄₂(Ω̂,Y)} E(E(⟨f, f⟩_Y | B̂) − 2E(⟨⟨Φinv(g), f⟩⟩ | B̂) + ⟨g, g⟩_Y)
      = inf_{g∈L̄₂(Ω̂,Y)} E(E(⟨f, f⟩_Y | B̂) − 2⟨⟨Φinv(g), E(f | B̂)⟩⟩ + ⟨g, g⟩_Y)
      = inf_{g∈L̄₂(Ω̂,Y)} (E(⟨f, f⟩_Y) − 2E(⟨⟨Φinv(g), h⟩⟩) + E(⟨g, g⟩_Y))
      = inf_{g∈L̄₂(Ω̂,Y)} E(⟨f, f⟩_Y − 2⟨⟨Φinv(g), h⟩⟩ + ⟨g, g⟩_Y)
      = E(⟨f, f⟩_Y) + inf_{g∈L̄₂(Ω̂,Y)} E(−2⟨h, g⟩_Y + ⟨g, g⟩_Y)
      = E(⟨f, f⟩_Y) + inf_{g∈L̄₂(Ω̂,Y)} E(−⟨h, h⟩_Y + ⟨g − h, g − h⟩_Y)
      = E(⟨f, f⟩_Y) − E(⟨h, h⟩_Y) + inf_{g∈L̄₂(Ω̂,Y)} E(⟨g − h, g − h⟩_Y)
      = ‖f‖²_{L̄₂(Ω,Y)} − ‖h‖²_{L̄₂(Ω̂,Y)} + inf_{g∈L̄₂(Ω̂,Y)} ‖g − h‖²_{L̄₂(Ω̂,Y)}
      = ‖f‖²_{L̄₂(Ω,Y)} − ‖h‖²_{L̄₂(Ω̂,Y)}
      = ‖f − h‖²_{L̄₂(Ω,Y)}

where the first equality follows from the preceding discussion, the second equality follows from Definition 14.1, the third equality follows from (a) of Proposition 14.11, the fourth equality follows from Proposition 13.2 and Definition 13.1, the fifth equality follows from (c) of Proposition 14.11, the sixth equality follows from (iv) of Theorem 13.15 and (b) of Proposition 14.11, where Φ : Y* → Y is the mapping defined in Theorem 13.15, the seventh equality follows from (h) of Proposition 14.11, the eighth and ninth equalities follow from Proposition 11.92, the eleventh equality follows from Definition 13.1, the twelfth equality follows from Definition 14.1 and Proposition 13.2, the fourteenth equality follows since the infimum is attained whenever g = h a.e. in Ω̂, and the last equality follows from the fact that we may substitute g = h a.e. in Ω̂ into the original infimization problem. This completes the proof of the theorem. □

The above theorem says that the minimization problem

    min_{g:Ω→Y being B̂-measurable} ‖f − g‖²_{L̄₂(Ω,Y)}

admits a unique minimizer (modulo a set of measure 0) given by a version of E(f | B̂).

Proposition 14.71 Let Ω := (Ω, B, P) be a probability measure space, m ∈ N, (x_n)_{n=0}^∞ be a sequence of R^m-valued random variables with x_n : Ω → R^m with law μ_n, ∀n ∈ Z₊, and F_n : R^m → [0, 1] ⊂ R be the cumulative distribution function for μ_n as defined in Definition 14.16, ∀n ∈ Z₊. Then, the following statements hold:

(i) If lim_{n∈N} ∫_{R^m} z(x) dμ_n(x) = ∫_{R^m} z(x) dμ_0(x) ∈ R, ∀z ∈ C_b(R^m, R), then μ_n ⇀ μ_0.
(ii) lim_{n∈N} ∫_{R^m} z(x) dμ_n(x) = ∫_{R^m} z(x) dμ_0(x) ∈ R, ∀z ∈ C_b(R^m, R), implies that lim_{n∈N} F_n(b) = F_0(b), ∀b ∈ R^m with F_0 being continuous at b.
(iii) If μ_n ⇀ μ_0 and (x_n)_{n=1}^∞ is uniformly growth bounded, then lim_{n∈N} F_n(b) = F_0(b), ∀b ∈ R^m with F_0 being continuous at b.
(iv) If μ_n ⇀ μ_0, (x_n)_{n=1}^∞ is uniformly growth bounded, lim_{n∈N} F_n(b) =: F̄(b) ∈ [0, 1] ⊂ R, ∀b ∈ R^m, and F̄ : R^m → [0, 1] ⊂ R is a valid cumulative distribution function for some R^m-valued random variable, then F̄ = F_0.

Proof (i) Note that μ_n ⇀ μ_0 means that lim_{n∈N} μ_n = μ_0 weak*, that is, lim_{n∈N} ⟨⟨μ_n, z⟩⟩ = ⟨⟨μ_0, z⟩⟩, ∀z ∈ C_c(R^m, R). By Riesz Representation Theorem 11.209, this is equivalent to lim_{n∈N} ∫_{R^m} z(x) dμ_n(x) = ∫_{R^m} z(x) dμ_0(x), ∀z ∈ C_c(R^m, R) ⊆ C_b(R^m, R). Thus, (i) follows readily.

(ii) Let lim_{n∈N} ∫_{R^m} z(x) dμ_n(x) = ∫_{R^m} z(x) dμ_0(x) ∈ R, ∀z ∈ C_b(R^m, R). Fix any b ∈ R^m and any δ ∈ R₊. Define h_{b,δ} : R^m → [0, 1] ⊂ R by

    h_{b,δ}(x) = { 1                     x ≼ b
                 { 1 − max(x − b)/δ      x ≼ b + δ1_m and x ⋠ b        ∀x ∈ R^m
                 { 0                     x ⋠ b + δ1_m

where ≼ denotes the componentwise partial ordering on R^m and max(x − b) is the largest component of x − b. Clearly, h_{b,δ} ∈ C_b(R^m, R). Then, lim_{n∈N} ⟨⟨μ_n, h_{b,δ}⟩⟩ = ⟨⟨μ_0, h_{b,δ}⟩⟩, F_n(b) ≤ ⟨⟨μ_n, h_{b,δ}⟩⟩, and ⟨⟨μ_0, h_{b,δ}⟩⟩ ≤ F_0(b + δ1_m). Then, taking limit as n → ∞ in the above three relations, we have lim sup_{n∈N} F_n(b) ≤ F_0(b + δ1_m). Since F_0 is a cumulative distribution function, it is continuous on the right. Taking limit as δ → 0⁺, we have lim sup_{n∈N} F_n(b) ≤ F_0(b). Similarly, we must have lim_{n∈N} ⟨⟨μ_n, h_{b,δ}⟩⟩ = ⟨⟨μ_0, h_{b,δ}⟩⟩, F_n(b + δ1_m) ≥ ⟨⟨μ_n, h_{b,δ}⟩⟩, and ⟨⟨μ_0, h_{b,δ}⟩⟩ ≥ F_0(b). Taking limit as n → ∞ in the above three relations, we have lim inf_{n∈N} F_n(b + δ1_m) ≥ F_0(b). Letting b̄ := b + δ1_m, we have lim inf_{n∈N} F_n(b̄) ≥ F_0(b̄ − δ1_m), ∀b̄ ∈ R^m, ∀δ ∈ R₊. Now, fix any b ∈ R^m with F_0 being continuous at b. We have F_0(b − δ1_m) ≤ lim inf_{n∈N} F_n(b) ≤ lim sup_{n∈N} F_n(b) ≤ F_0(b). Taking δ → 0⁺, we have F_0(b) ≤ lim inf_{n∈N} F_n(b) ≤ lim sup_{n∈N} F_n(b) ≤ F_0(b), and hence lim_{n∈N} F_n(b) = F_0(b). Hence, (ii) is proved.

(iii) Let μ_n ⇀ μ_0. Fix any z ∈ C_b(R^m, R). By Bounded Convergence Theorem 11.77, ∫_{R^m} z(x) dμ_n(x) ∈ R, ∀n ∈ Z₊. ∀a, b ∈ R^m with a ≺ b, ∀k ∈ N with min(b − a) > 1/k, by Proposition 5.62, there exists a continuous function h_{k,a,b} : R^m → [0, 1] ⊂ R such that h_{k,a,b}(x) = 1, ∀x ∈ E_{k,1} := {x̄ ∈ R^m | a + (1/k)1_m ≼ x̄ ≼ b} ⊂ R^m, and h_{k,a,b}(x) = 0, ∀x ∈ E_{k,2} := R^m \ {x̄ ∈ R^m | a ≺ x̄ ≺ b + (1/k)1_m}. Clearly, h_{k,a,b} ∈ C_c(R^m, R). Then,

    ΔF_n(r_{a+(1/k)1_m, b}) = ∫_{R^m} χ_{r_{a+(1/k)1_m, b}, R^m} dμ_n ≤ ∫_{R^m} h_{k,a,b} dμ_n = ⟨⟨μ_n, h_{k,a,b}⟩⟩
        ≤ ∫_{R^m} χ_{r_{a, b+(1/k)1_m}, R^m} dμ_n = ΔF_n(r_{a, b+(1/k)1_m});
        ∀n ∈ Z₊, ∀a, b ∈ R^m, ∀k ∈ N with a + (1/k)1_m ≺ b          (14.13)

where the first and last equalities follow from Definition 14.16 and the first and last inequalities follow from Proposition 11.92. Taking limit as n → ∞,

    lim sup_{n∈N} ΔF_n(r_{a+(1/k)1_m, b}) ≤ lim_{n∈N} ⟨⟨μ_n, h_{k,a,b}⟩⟩ = ⟨⟨μ_0, h_{k,a,b}⟩⟩
        ≤ lim inf_{n∈N} ΔF_n(r_{a, b+(1/k)1_m});  ∀a, b ∈ R^m, ∀k ∈ N with a + (1/k)1_m ≺ b          (14.14)

where the equality follows from μ_n ⇀ μ_0 and h_{k,a,b} ∈ C_c(R^m, R), and the inequalities follow from (14.13). Since (x_n)_{n=1}^∞ is uniformly growth bounded by assumption, (x_n)_{n=0}^∞ is uniformly growth bounded. Thus, we have lim_{min d→−∞} F_n(d) = 0, and the convergence is uniform with respect to n ∈ Z₊. It is easy to prove, using ε–δ arguments, that

    0 ≤ lim_{min d→−∞} lim inf_{n∈N} F_n(d) ≤ lim_{min d→−∞} lim sup_{n∈N} F_n(d) = lim_{n∈N} lim_{min d→−∞} F_n(d) = 0          (14.15)

This implies that, ∀b ∈ R^m and ∀k ∈ N,

    lim sup_{n∈N} F_n(b) = lim sup_{n∈N} lim_{max a→−∞} ΔF_n(r_{a+(1/k)1_m, b})
        = lim_{max a→−∞} lim sup_{n∈N} ΔF_n(r_{a+(1/k)1_m, b}) ≤ lim_{max a→−∞} ⟨⟨μ_0, h_{k,a,b}⟩⟩
        ≤ lim inf_{max a→−∞} ΔF_0(r_{a, b+(1/k)1_m}) = F_0(b + (1/k)1_m)          (14.16)

where the first equality follows from (14.15), the second equality follows from (14.15), the first inequality follows from (14.14) and the fact that all limits are real numbers in the closed interval [0, 1] ⊂ R, the second inequality follows from (14.13) with n = 0, and the last equality follows from (14.15) and Definition 12.41. We also have, ∀b ∈ R^m and ∀k ∈ N,

    F_0(b) = lim_{max a→−∞} ΔF_0(r_{a+(1/k)1_m, b}) ≤ lim inf_{max a→−∞} ⟨⟨μ_0, h_{k,a,b}⟩⟩
        ≤ lim inf_{max a→−∞} lim inf_{n∈N} ΔF_n(r_{a, b+(1/k)1_m})
        = lim inf_{n∈N} lim_{max a→−∞} ΔF_n(r_{a, b+(1/k)1_m}) = lim inf_{n∈N} F_n(b + (1/k)1_m)          (14.17)

where the first equality follows from (14.15) and Definition 12.41, the first inequality follows from (14.13) with n = 0, the second inequality follows from (14.14), the second equality follows from (14.15), and the last equality follows from (14.15). Then, let b ∈ R^m be any point such that F_0 is continuous at b. We have lim sup_{n∈N} F_n(b) ≤ lim_{k∈N} F_0(b + (1/k)1_m) = F_0(b) = lim_{k∈N} F_0(b − (1/k)1_m) ≤ lim_{k∈N} lim inf_{n∈N} F_n(b) = lim inf_{n∈N} F_n(b) ≤ lim sup_{n∈N} F_n(b), where the first inequality follows from (14.16), the first and second equalities follow from the fact that F_0 is continuous at b, the second inequality follows from (14.17) with b substituted by b − (1/k)1_m, and the last equality follows from Proposition 3.83. Thus, we have lim_{n∈N} F_n(b) = F_0(b). This establishes (iii).

(iv) ∀a, b ∈ R^m with a ≺ b, ∀k ∈ N with min(b − a) > 1/k, let h_{k,a,b} : R^m → [0, 1] ⊂ R be as defined in (iii). Clearly, h_{k,a,b} ∈ C_c(R^m, R). Then, (14.13)–(14.17) hold. Taking limit as n → ∞ in (14.13),

    ΔF̄(r_{a+(1/k)1_m, b}) = lim_{n∈N} ΔF_n(r_{a+(1/k)1_m, b}) ≤ lim_{n∈N} ⟨⟨μ_n, h_{k,a,b}⟩⟩ = ⟨⟨μ_0, h_{k,a,b}⟩⟩
        ≤ lim_{n∈N} ΔF_n(r_{a, b+(1/k)1_m}) = ΔF̄(r_{a, b+(1/k)1_m});
        ∀a, b ∈ R^m, ∀k ∈ N with a + (1/k)1_m ≺ b          (14.18)

where the first equality follows from the assumption, the second equality follows from μ_n ⇀ μ_0 and h_{k,a,b} ∈ C_c(R^m, R), the inequalities follow from (14.13), and the last equality follows from the assumption. This further implies that, ∀b ∈ R^m and ∀k ∈ N,

    F̄(b) = lim_{n∈N} F_n(b) = lim_{n∈N} lim_{max a→−∞} ΔF_n(r_{a+(1/k)1_m, b})
        = lim_{max a→−∞} lim_{n∈N} ΔF_n(r_{a+(1/k)1_m, b}) ≤ lim inf_{max a→−∞} ⟨⟨μ_0, h_{k,a,b}⟩⟩
        ≤ lim_{max a→−∞} ΔF_0(r_{a, b+(1/k)1_m}) = F_0(b + (1/k)1_m)          (14.19)

where the first equality follows from the assumption, the second equality follows from (14.15), the third equality follows from (14.15), the first inequality follows from (14.18), the second inequality follows from (14.13) with n = 0, and the last equality follows from (14.15) and Definition 12.41. Taking limit k → ∞, we have F̄(b) ≤ F_0(b), ∀b ∈ R^m, since F_0 is continuous on the right. We also have, ∀b ∈ R^m and ∀k ∈ N,

    F_0(b) = lim_{max a→−∞} ΔF_0(r_{a+(1/k)1_m, b}) ≤ lim inf_{max a→−∞} ⟨⟨μ_0, h_{k,a,b}⟩⟩
        ≤ lim inf_{max a→−∞} lim_{n∈N} ΔF_n(r_{a, b+(1/k)1_m}) = lim_{n∈N} lim_{max a→−∞} ΔF_n(r_{a, b+(1/k)1_m})
        = lim_{n∈N} F_n(b + (1/k)1_m) = F̄(b + (1/k)1_m)          (14.20)

where the first equality follows from (14.15) and Definition 12.41, the first inequality follows from (14.13) with n = 0, the second inequality follows from (14.14), the second and third equalities follow from (14.15), and the last equality follows from the assumption. Then, again taking limit k → ∞, we have F_0(b) ≤ F̄(b), ∀b ∈ R^m, since F̄ is continuous on the right. Thus, we have F̄(b) = F_0(b), ∀b ∈ R^m. This establishes (iv). This completes the proof of the proposition. □

Proposition 14.72 Let Ω := (Ω, B, P) be a probability measure space, m ∈ N, (X_i)_{i=1}^∞ be a sequence of independent identically distributed random variables with X_i ∼ N(0, 1), ∀i ∈ N, α₁, …, α_m ∈ l_∞(R) be linearly independent, and (Y_j)_{j=1}^m be random variables defined by Y_j(ω) :=

    { Σ_{l=1}^∞ α_{j,l} X_l(ω)   ∀ω ∈ F_j
    { 0                          ∀ω ∈ Ω \ F_j        ∀ω ∈ Ω

where F_j := {ω ∈ Ω | Σ_{l=1}^∞ α_{j,l} X_l(ω) ∈ R}, j = 1, …, m; α_j = (α_{j,1}, α_{j,2}, …), j = 1, …, m. Then, Y := (Y₁, …, Y_m) is an R^m-valued Gaussian random variable with distribution N(0_m, K), where K = (Σ_{l=1}^∞ α_{i,l} α_{j,l})_{m×m}, and P(Ω \ F_j) = 0, j = 1, …, m, if K ∈ S̄₊R^m.

Proof Let K ∈ S̄₊R^m. Then, α_j ∈ l₂(R) =: X, j = 1, …, m. By Proposition 13.42, S̄₊R^m is an open subset of S̄R^m; then, ∃n₁ ∈ N, ∀n ∈ N with n ≥ n₁, we have K_n := (Σ_{l=1}^n α_{i,l} α_{j,l})_{m×m} ∈ S̄₊R^m. Then, the matrix V_n := (α_{i,j})_{m×n} is of full row rank, ∀n ∈ N with n ≥ n₁, since K_n = V_n V_nᵀ. By Proposition 14.28, we have that Y^n := V_n X^n is an R^m-valued Gaussian random variable with distribution N(0_m, K_n), ∀n ∈ N with n ≥ n₁, where X^n := (X₁, …, X_n) is an R^n-valued Gaussian random variable with distribution N(0_n, id_{R^n}). Fix any j ∈ {1, …, m}. Let Y_j^n(ω) := Σ_{l=1}^n α_{j,l} X_l(ω), ∀n ∈ N. Then, Y_j^n ∼ N(0, Σ_{l=1}^n α_{j,l}²). Thus, (Y_j^n)_{n=1}^∞ is a Martingale and is bounded in L̄₁(Ω, R). By Doob's Forward Convergence Theorem 14.41, lim_{n∈N} Y_j^n = Y_j a.e. in Ω. This implies that P(Ω \ F_j) = 0. By the arbitrariness of j, we have lim_{n∈N} Y^n = Y a.e. in Ω. By Mode of Convergence (Theorem 14.57), Y^n ⇀ Y. By Chebyshev Inequality (Proposition 14.67), we have that (Y^n)_{n=1}^∞ is uniformly growth bounded. By Lebesgue Dominated Convergence Theorem 11.91 and Definition 14.25, we have lim_{n∈N} F_{Y^n}(z) = F̄(z), ∀z ∈ R^m, where F_{Y^n} : R^m → [0, 1] ⊂ R is the cumulative distribution function of Y^n, ∀n ∈ N, and F̄ : R^m → [0, 1] ⊂ R is the cumulative distribution function of some R^m-valued Gaussian random variable with distribution N(0_m, K). By Proposition 14.71, Y ∼ N(0_m, K). This completes the proof of the proposition. □

Example 14.73 Let I := [0, 1] ⊂ R and I := ((P(I) = (0, 1], |·|), B, P) be the finite metric measure subspace of R, which is a probability measure space. Define a sequence of simple functions (h_n)_{n=1}^∞, h_n : I → [−1, 1] ⊂ R, by

    h_n(x) = { 1     2^{−m}·2i < x ≤ 2^{−m}(2i + 1)
             { −1    2^{−m}(2i + 1) < x ≤ 2^{−m}(2i + 2)        ∀x ∈ I, ∀n ∈ N
             { 0     elsewhere

where m := ⌈log₂ n⌉ ∈ Z₊ and i := (n − 1 − 2^{m−1}) ∨ 0 ∈ Z₊. This sequence of functions is said to be the Haar functions. See Fig. 14.2 for the plot of the first eight of the Haar functions. Clearly, (h_n)_{n=1}^∞ is an orthogonal sequence in Z̄ := L̄₂(I, R). We further define a sequence of simple functions (H_n)_{n=1}^∞ by H_n := (1/‖h_n‖_Z̄) h_n, ∀n ∈ N, which is then an orthonormal sequence in Z̄. We will show that ([H_n])_{n=1}^∞ is a complete orthonormal sequence in Z := L₂(I, R).

View each H_n as a random variable, ∀n ∈ N. Let B_m := σ((H_n)_{n=1}^{2^m}), ∀m ∈ Z₊. We observe that B_m = σ(H_m), where H₀ := {∅, (0, 1]} and H_{m+1} := H_m ∪ {(0, 2^{−m−1}], (2^{−m−1}, 2·2^{−m−1}], …, ((2^{m+1} − 1)2^{−m−1}, 1]}, ∀m ∈ Z₊. It should be clear that H_m is a π-system, ∀m ∈ Z₊, by Definition 12.17. We need the following intermediate result.

Claim 14.73.1 ∀E ∈ B_m, ∃a₁, …, a_{2^m} ∈ R such that χ_{E,I} = Σ_{i=1}^{2^m} a_i h_i.
Proof of Claim We will prove the claim using mathematical induction on m ∈ Z₊.
1° m = 0. This case is straightforward since B₀ = H₀.
2° Assume that the result holds for 0 ≤ m ≤ k ∈ Z₊.
3° Consider the case m = k + 1 ∈ N. Since B_{k+1} = σ(H_{k+1}), we just need to show the result ∀E ∈ H_{k+1}. Clearly, B₀ ⊆ B₁ ⊆ ⋯ ⊆ B_{k+1} ⊆ ⋯, since H₀ ⊆ H₁ ⊆ ⋯ ⊆ H_{k+1} ⊆ ⋯. Then, if E ∈ H_k, the result holds by 2°. On the other hand, if E ∈ H_{k+1} \ H_k, then E = (2^{−k−1}i, 2^{−k−1}(i + 1)] for some i ∈ {0, 1, …, 2^{k+1} − 1}. Let j = ⌊i/2⌋ ∈ {0, 1, …, 2^k − 1} and Ē := (2^{−k}j, 2^{−k}(j + 1)] ∈ H_k. Then, χ_{E,I} = ½(χ_{Ē,I} + (−1)^i h_{2^k+1+j}). By 2°, χ_{Ē,I} = Σ_{i=1}^{2^k} a_i h_i for


Fig. 14.2 The first eight Haar functions


some a₁, …, a_{2^k} ∈ R. Then, we have χ_{E,I} = Σ_{i=1}^{2^{k+1}} ā_i h_i for some ā₁, …, ā_{2^{k+1}} ∈ R. This completes the proof for m = k + 1 and completes the inductive process. This completes the proof of the claim. □

Fix any f ∈ Z̄. Fix m ∈ Z₊. By Theorem 14.70, we have, ∀g_m ∈ E(f | B_m), g_m = arg min_{g:I→R being B_m-measurable} ‖f − g‖²_Z̄. By B_m = σ(H_m) = σ((h_n)_{n=1}^{2^m}), we have that g_m is a real-valued simple function on the measurable space ((0, 1], B_m). By Claim 14.73.1, we have that g_m is a linear combination of (h_n)_{n=1}^{2^m}. Therefore, g_m = arg min_{g∈span((H_n)_{n=1}^{2^m})} ‖f − g‖²_Z̄. By Proposition 13.31, we have g_m = Σ_{i=1}^{2^m} ⟨f, H_i⟩ H_i. Fix any ε ∈ (0, ∞) ⊂ R. By Proposition 11.182, ∃ḡ ∈ C(I, R) ∩ Z̄ such that ‖f − ḡ‖_Z̄ < ε/2. Since ḡ ∈ Z̄, ∃m₁ ∈ Z₊ such that ∫_0^{2^{−m₁}} P₂∘ḡ dP < ε²/8. ḡ|_{[2^{−m₁},1]} is continuous and hence uniformly continuous, by Proposition 5.39. Then, ∃m₂ ∈ Z₊ with m₂ ≥ m₁ such that |ḡ(t₁) − ḡ(t₂)| < ε/4, ∀t₁, t₂ ∈ [2^{−m₁}, 1] ⊂ R with |t₁ − t₂| ≤ 2^{−m₂}. Then, ∀m ∈ Z₊ with m ≥ m₂, there exists a function ḡ_m ∈ span((H_n)_{n=1}^{2^m}), defined by

    ḡ_m(t) = { 0               0 < t ≤ 2^{−m₁}
             { ḡ(2^{−m₂}i)     2^{−m₂}i < t ≤ 2^{−m₂}(i + 1), i = 2^{m₂−m₁}, …, 2^{m₂} − 1        ∀t ∈ I

such that ‖ḡ_m − ḡ‖_Z̄ < ε/2. (Note that ḡ_m ∈ span((H_n)_{n=1}^{2^{m₂}}).) Then, we have ‖f − ḡ_m‖_Z̄ < ε. By the previous paragraph, we have ‖f − g_m‖_Z̄ ≤ ‖f − ḡ_m‖_Z̄ < ε. Hence, lim_{m∈N} g_m = f in Z̄, and [f] ∈ span(([H_n])_{n=1}^∞)‾. By the arbitrariness of f, we have Z = span(([H_n])_{n=1}^∞)‾ and ([H_n])_{n=1}^∞ is a complete orthonormal sequence in Z. %
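As a sanity check on Example 14.73, the following sketch builds the first eight Haar functions directly from the displayed formula and verifies numerically that the normalized family is orthonormal. The grid resolution 2⁻⁶ is our choice; the check is exact here because h₁, …, h₈ are constant on dyadic intervals of length 2⁻³:

```python
import math

def haar(n, x):
    # h_n of Example 14.73: m = ceil(log2 n), i = (n - 1 - 2^(m-1)) v 0
    m = math.ceil(math.log2(n))
    i = int(max(n - 1 - 2 ** (m - 1), 0))
    if 2 ** (-m) * 2 * i < x <= 2 ** (-m) * (2 * i + 1):
        return 1.0
    if 2 ** (-m) * (2 * i + 1) < x <= 2 ** (-m) * (2 * i + 2):
        return -1.0
    return 0.0

# Midpoint grid on (0, 1]; integrals of the step functions below are exact.
N = 64
grid = [(k + 0.5) / N for k in range(N)]

def inner(f, g):
    # approximates the L2(I, R) inner product by a midpoint rule
    return sum(f(x) * g(x) for x in grid) / N

norms = [math.sqrt(inner(lambda x, n=n: haar(n, x), lambda x, n=n: haar(n, x)))
         for n in range(1, 9)]
H = [lambda x, n=n, c=norms[n - 1]: haar(n, x) / c for n in range(1, 9)]

for p in range(8):
    for q in range(8):
        expected = 1.0 if p == q else 0.0
        assert abs(inner(H[p], H[q]) - expected) < 1e-9
```

The same loop with larger n and a correspondingly finer grid checks any finite initial segment of the sequence.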

¯ w is B-measurable. w(0, ω) = 0, ∀ω ∈ Ω. ∀ω ∈ Ω, w(·, ω) : I → R is continuous (i.e., the sample paths are continuous). The random process is of independent increment, i.e., ∀n ∈ N, ∀0 ≤ s1 < t1 ≤ s2 < t2 ≤ · · · ≤ sn < tn ≤ T , we have wti − wsi is a Gaussian random variable with mean 0 and covariance ti − si , i = 1, . . . , n, and the random variables wti − wsi , i = 1, . . . , n, are independent, where (wt )t ∈I is defined by wt (ω) = w(t, ω), ∀ω ∈ Ω, ∀t ∈ I. %

ˆ μ) Example 14.75 Let I¯ := (([0, 1], |·|), B, ˆ be the finite compact metric measure ¯ |·|), B, ˜ μ) be the finite metric subspace of R with partial ordering ≤, I := ((P(I), ∞ measure subspace of R, (Hn )∞ and be the Haar functions as defined in (h ) n n=1 n=1 ¯ ¯ Example 14.73, Z := L2 (I, R), and Z := L2 (I, R). Define (an )∞ n=1 be a sequence

822

14 Probability Theory

t of functions: an : I¯ → R with an (t) := 0 Hn (s) ds, ∀t ∈ I¯ and ∀n ∈ N. B= > C ¯ ∀n ∈ N. By Fundamental Theorem on Clearly, an (t) = χ(0,t ],I , [Hn ] Z , ∀t ∈ I, Modeling 14.15 and Example 14.17, there exists a probability measure space Ω := (Ω, B, P ) and random variables (Xi )∞ i=1 , Xi : Ω → R, such that Xi ∼ N(0, 1), ¯ ¯ ν) be are independent. Let I×Ω := ([0, 1]×Ω, B, and the random variables (Xi )∞ i=1 the finite product measure space, which happens to be a probability measure space. -n ¯ Define the partial sum wn : I¯ × Ω → R by wn (t, ω) = 2i=1 Xi (ω)ai (t), ∀t ∈ I, ¯ ∀ω ∈ Ω, ∀n ∈ N. Clearly, wn is B-measurable, ∀n ∈ N, by Propositions 11.38 and 11.39, and wn (·, ω) : I¯ → R is continuous, ∀ω ∈ Ω. C 6 6 B= > −m+1 Note that ak (t) = hk1 ¯ χ(0,t ],I , [hk ] Z , 6hk 6Z¯ = 2 2 , ∀k ∈ N, Z N O and m := log2 k , and the function a¯ k : I¯ → [0, ∞) ⊂ R defined by C > B= ¯ has the same support as hk and has maximum a¯ k (t) := χ(0,t ],I , [hk ] Z , ∀t ∈ I, absolute value 2−m . Therefore, ak has the same support as hk , is nonnegative −m+1 m+1 real-valued, and has maximum aˆ k = 2−m 2− 2 = 2− 2 , where m := O N -n -2j log2 k . Therefore, wn (t, ω) = X1 (ω)a1 (t) + j =1 i=2j−1 +1 Xi (ω)ai (t) =: ¯ ∀ω ∈ Ω, ∀n ∈ N. wn (·, ω) will X1 (ω)a1 (t) + nj=1 Bj (t, ω), ∀t ∈ I,   -∞   ¯ R) if by Proposition 7.27 and converge in C(I, j =1 maxt ∈I¯ Bj (t, ω) < ∞, - j      Example 7.30. Note that max ¯ Bj (t, ω) = max ¯  2 j−1 Xi (ω)ai (t) = t ∈I

− j+1 2

t ∈I

i=2

+1

maxi∈{2j−1 +1,...,2j } |Xi (ω)| =: bj Mj (ω), ∀j ∈ N, ∀ω ∈ Ω.  limn→∞ wn (t, ω) ω ∈ F ¯ ¯ Define w : I × Ω → R by w(t, ω) = , ∀t ∈ I, 0 ω ∈Ω \F ∀ω ∈ Ω, where F := {ω ∈ Ω | limn→∞ nj=1 bj Mj (ω) < ∞}. We will show that w is a Wiener process on I¯ × Ω. First, we show that P (F ) = 1 by proving E( ∞ j =1 bj Mj ) < ∞. Note that #1/4 "- j 2 4 (X (ω)) =: Yj (ω), Mj (ω) = maxi∈{2j−1 +1,...,2j } |Xi (ω)| ≤ j−1 i i=2 +1 "- j #1/4 2 4) ∀ω ∈ Ω, ∀j ∈ N. Then, E(Mj ) ≤ E(Yj ) ≤ E(X = j−1 i i=2 +1 √ j−1 1 (3 · 2j −1 ) 4 = 4 3 · 2 4 , where the second-inequality follows -∞from Jensen’s Inequality Theorem 11.98. This implies that E( ∞ b M ) = j j j =1 j =1 bj E(Mj ) ≤ √ -∞ − j+1 √ -∞ √ -∞ √ j−1 j+3 4 4 4 − 2 4 · 3·2 4 = = 24 · j =1 ( 4 2)−j < ∞. j =1 2 j =1 3 · 2 Hence, we have P (F ) = 1 and w(t, ω) = limn∈N wn (t, ω) a.e. (t, ω) ∈ I¯ × Ω. ¯ By Propositions 11.48 and 11.41, w is B-measurable. Hence, w satisfies (a) of Definition 14.74 We will check the rest of the conditions for w to be a Wiener process on I¯ × Ω.  (b) Clearly, wn (0, ω) = 0, ∀n ∈ N, ∀ω ∈ Ω. Then, w(0, ω) = limn∈N wn (0, ω) = 0 ω ∈ F = 0, ∀ω ∈ Ω. 0 ω ∈Ω \F (c) Clearly, w(·, ω) : I¯ → R is continuous, ∀ω ∈ Ω. 2

14.10 Existence of the Wiener Process

823

(d) Since the Gaussian random variables (Xi )∞ i=1 are independent, then ∀n ∈ N, ∀0 ≤ s1 < t1 ≤ s2 < t2 ≤ · · · ≤ sn < tn ≤ 1, we will prove that (wt1 − ws1 , . . . , wtn − wsn ) is Rn -valued Gaussian random variable with distribution N(0n , Q) by Proposition 14.72, where Q := block diagonal (t1 − s1 , . . . , tn − sn ) ∈ S¯+ Rn , and (wt )t ∈I¯ is defined by wt (ω) = ¯ Clearly, E(wtj − wsj ) = E( ∞ (ai (tj ) − ai (sj ))Xi ) = w(t, ω), ∀ω ∈ Ω, ∀t ∈ I. i=1 0, j = 1, . . . , n. Furthermore, E((wtj − wsj )2 ) = E

.

∞ " . 2 # (ai (tj ) − ai (sj ))Xi i=1

=

∞ ∞ " . . B= > C #2 (ai (tj ) − ai (sj ))2 = χ(sj ,tj ],I , [Hi ] Z i=1

=

i=1

∞ @. B=

∞ A . C C > B= > χ(sj ,tj ],I , [Hi ] Z [Hi ], χ(sj ,tj ],I , [Hi ] Z [Hi ]

i=1

i=1

Z

6 62 > = >C B= = χ(sj ,tj ],I , χ(sj ,tj ],I Z = 6χ(sj ,tj ],I 6Z¯ = tj − sj where the second equality follows from the independence of the random variables ∞ (Xi )∞ i=1 and the fourth and fifth equalities follow from the fact that ([Hi ])i=1 is a complete orthonormal sequence in Z. It is easy to check that E((wtk − wsk )(wtl − wsl ))

.

=E

∞ " .

Xi (ai (tk ) − ai (sk ))

∞  .

i=1

=

∞ .

Xj (aj (tl ) − aj (sl ))

#

j =1

E(Xi2 (ai (tk ) − ai (sk ))(ai (tl ) − ai (sl )))

i=1

=

∞ .

(ai (tk ) − ai (sk ))(ai (tl ) − ai (sl ))

i=1

=

∞ . B=

C B= C > > χ(sk ,tk ],I , [Hi ] Z · χ(sl ,tl ],I , [Hi ] Z

i=1

=

∞ @. B= i=1

∞ A . C C > B= > χ(sk ,tk ],I , [Hi ] Z [Hi ], χ(sl ,tl ],I , [Hi ] Z [Hi ]

> = >C B= = χ(sk ,tk ],I , χ(sl ,tl ],I Z = 0,

i=1

Z

∀1 ≤ k, l ≤ n with l = k

824

14 Probability Theory

where the second equality follows from the independence of the random variables (Xi )∞ i=1 , the fourth equality follows from the definition of ak ’s, and the fifth and sixth equalities follow from the fact that ([Hi ])∞ i=1 is a complete orthonormal sequence in Z. Hence, (wt1 − ws1 , . . . , wtn − wsn ) is Rn -valued Gaussian random variable with distribution N(0n , Q). Then,wt1 −ws1 , . . . , wtn −wsn are independent by Proposition 14.27. This completes the proof of existence of the Wiener process on I¯ × Ω. % ˆ μ) be the σ -finite metric measure Definition 14.76 Let I := (([0, ∞), |·|), B, subspace of R with partial ordering ≤, Ω := (Ω, B, P ) be a probability measure ¯ ν) be the σ -finite product measure space. A space, and I × Ω := ([0, ∞) × Ω, B, standard Wiener process on I × Ω is a mapping w : I × Ω → R that satisfies (a) (b) (c) (d)

¯ w is B-measurable. w(0, ω) = 0, ∀ω ∈ Ω. ∀ω ∈ Ω, w(·, ω) : I → R is continuous (i.e., the sample paths are continuous). The random process is of independent increment, i.e., ∀n ∈ N, ∀0 ≤ s1 < t1 ≤ s2 < t2 ≤ · · · ≤ sn < tn < ∞, we have wti − wsi is a Gaussian random variable with mean 0 and covariance ti − si , i = 1, . . . , n, and the random variables wti − wsi , i = 1, . . . , n, are independent, where (wt )t ∈I is defined by wt (ω) = w(t, ω), ∀ω ∈ Ω, ∀t ∈ I. %

ˆ μ) Example 14.77 Let Iˆ := (([0, 1], |·|), B, ˆ be the finite compact metric measure subspace of R with partial ordering ≤. By Example 14.75, there exist probability measure spaces Ω n := (Ωn , Bn , P n ), ∀n ∈ N, each with a Wiener process wn : Iˆ × Ω n → R. By Fundamental Theorem on Modeling 14.15, there exists a unique   probability measure space Ω := (Ω, B, P ) = ( n∈N Ωn , B, P := n∈N P n ), on  which Bˆn := { i∈N Bi ∈ B | Bi ∈ Bi , ∀i ∈ N; and Bi = Ωi , ∀i ∈ N with i = ˜ μ) n} ⊆ B, n ∈ N are independent σ -algebras. Let I := (([0, ∞), |·|), B, ˜ be the σ -finite metric measure subspace of R with partial ordering ≤ and I × Ω := ¯ ([0, ∞) × Ω, B, - ν) be the σ -finite product measure space. Define w : I × Ω → R by w(t, ω) = i−1 j =1 wj (1, ω) + wi (t − i + 1, ω), ∀t ∈ I, and i := 1t2 + 1 ∈ N, ∀ω ∈ Ω. It is easy to check that w is a standard Wiener process on I × Ω. % ˆ μ) be the σ -finite metric measure Definition 14.78 Let I := (([0, ∞), |·|), B, subspace of R, Ω := (Ω, B, P ) be a probability measure space, m ∈ N, and ¯ ν) be the σ -finite product measure space. An Rm -valued I × Ω := ([0, ∞) × Ω, B, standard Wiener process on I × Ω is a mapping w : I × Ω → Rm that satisfies ¯ (a) w is B-measurable. (b) w(0, ω) = 0m , ∀ω ∈ Ω. (c) ∀ω ∈ Ω, w(·, ω) : I → Rm is continuous (i.e., the sample paths are continuous).


(d) The random process is of independent increment, i.e., ∀n ∈ N, ∀0 ≤ s₁ < t₁ ≤ s₂ < t₂ ≤ ⋯ ≤ s_n < t_n < ∞, we have that w_{t_i} − w_{s_i} is an R^m-valued Gaussian random vector with mean 0_m and covariance (t_i − s_i) id_{R^m}, i = 1, …, n, and the random vectors w_{t_i} − w_{s_i}, i = 1, …, n, are independent, where (w_t)_{t∈I} is defined by w_t(ω) = w(t, ω), ∀ω ∈ Ω, ∀t ∈ I. %

Example 14.79 Let I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of R with partial ordering ≤, m ∈ N, and J̄ := {1, …, m}. By Example 14.77, there exist probability measure spaces Ω_i := (Ω_i, B_i, P_i), ∀i ∈ J̄, each with a standard Wiener process w_i defined on I × Ω_i. By Fundamental Theorem on Modeling 14.15, there exists a unique probability measure space Ω := (Ω, B, P) = (∏_{i=1}^m Ω_i, B, P := ∏_{i=1}^m P_i), on which B̂_i := {∏_{j=1}^m B_j ∈ B | B_j ∈ B_j, ∀j ∈ J̄; and B_j = Ω_j, ∀j ∈ J̄ with j ≠ i} ⊆ B, i ∈ J̄, are independent σ-algebras. Let I × Ω := (I × Ω, B̄, ν) be the σ-finite product measure space, and define w : I × Ω → R^m by w(t, ω) = (w₁(t, ω), …, w_m(t, ω)) ∈ R^m, ∀t ∈ I, ∀ω ∈ Ω. It is easy to check that w is an R^m-valued standard Wiener process on I × Ω. %

Proposition 14.80 Let m ∈ N, I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of R with partial ordering ≤, Ω := (Ω, B, P) be a probability measure space, I × Ω := ([0, ∞) × Ω, B̄, ν) be the σ-finite product measure space, w : I × Ω → R^m be an R^m-valued standard Wiener process (as defined in Definition 14.78), and T ∈ B(R^m, R^m) be invertible. Then, w̃ := T w : I × Ω → R^m is an R^m-valued stochastic process on I × Ω satisfying:

(i) w̃ is B̄-measurable.
(ii) w̃(0, ω) = 0_m, ∀ω ∈ Ω.
(iii) ∀ω ∈ Ω, w̃(·, ω) : I → R^m is continuous (i.e., the sample paths are continuous).
(iv) The R^m-valued random variables w̃_t : Ω → R^m, defined by w̃_t(ω) = w̃(t, ω) = T w(t, ω), ∀ω ∈ Ω, ∀t ∈ I, are of independent increment: i.e., ∀0 ≤ s₁ < t₁ ≤ s₂ < t₂ ≤ ⋯ ≤ s_n < t_n < ∞, w̃_{t₁} − w̃_{s₁}, …, w̃_{t_n} − w̃_{s_n} are independent Gaussian random variables with mean 0_m and covariance (t_i − s_i) T Tᵀ, i = 1, …, n.
(v) The auto-covariance function satisfies E(w̃_{t₁} ⊗ w̃_{t₂}) = min{t₁, t₂} T Tᵀ, ∀t₁, t₂ ∈ I.
(vi) If, in addition, T is a unitary matrix, i.e., T Tᵀ = id_{R^m}, then w̃ is an R^m-valued standard Wiener process on I × Ω.

Proof (i) This is immediate from Definition 14.78 and Proposition 11.38.
(ii) w̃(0, ω) = T w(0, ω) = T 0_m = 0_m, ∀ω ∈ Ω.
(iii) This is immediate from Definition 14.78 and Proposition 3.12.


14 Probability Theory

(iv) Since w is an ℝ^m-valued standard Wiener process, ∀0 ≤ s_1 < t_1 ≤ s_2 < t_2 ≤ ⋯ ≤ s_n < t_n < ∞, w_{t_1} − w_{s_1}, . . ., w_{t_n} − w_{s_n} are independent ℝ^m-valued Gaussian random variables with mean 0_m and covariance (t_i − s_i) id_{ℝ^m}. Then, W := (w_{t_1} − w_{s_1}, . . . , w_{t_n} − w_{s_n}) ∼ N(0_{mn}, Q), where Q := block diagonal((t_1 − s_1) id_{ℝ^m}, . . . , (t_n − s_n) id_{ℝ^m}). Then, we have W̃ := (w̃_{t_1} − w̃_{s_1}, . . . , w̃_{t_n} − w̃_{s_n}) = (T(w_{t_1} − w_{s_1}), . . . , T(w_{t_n} − w_{s_n})) = block diagonal(T, . . . , T)W =: T_nW. By the invertibility of T, T_n is invertible. By Proposition 14.28, we have W̃ ∼ N(0_{mn}, T_nQT_n′ =: Q̃). Clearly, Q̃ = block diagonal((t_1 − s_1)TT′, . . . , (t_n − s_n)TT′). Hence, w̃_{t_1} − w̃_{s_1}, . . ., w̃_{t_n} − w̃_{s_n} are independent ℝ^m-valued Gaussian random variables with mean 0_m and covariance (t_i − s_i)TT′, i = 1, . . . , n, by Proposition 14.27.
(v) Without loss of generality, assume 0 ≤ t_1 ≤ t_2. The result is trivial if t_1 = 0. If t_1 > 0, then E(w̃_{t_1} ⊗ w̃_{t_2}) = E(w̃_{t_1} ⊗ (w̃_{t_2} − w̃_{t_1})) + E(w̃_{t_1} ⊗ w̃_{t_1}) = 0_{m×m} + t_1TT′ = min{t_1, t_2}TT′, where the first equality follows from Proposition 11.92, and the second equality follows from (iv).
(vi) The result follows directly from (i)-(iv) and Definition 14.78. This completes the proof of the proposition. ⊓⊔

Proposition 14.81 Let m ∈ ℕ, I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of ℝ with partial ordering ≤, Ω := (Ω, B, P) be a probability measure space, I × Ω := ([0, ∞) × Ω, B̄, ν) be the σ-finite product measure space, w : I × Ω → ℝ^m be an ℝ^m-valued standard Wiener process (as defined in Definition 14.78), and α ∈ ℝ with α ≠ 0. Then, w̃ : I × Ω → ℝ^m, defined by w̃(t, ω) = (1/α)w(α²t, ω), ∀ω ∈ Ω, ∀t ∈ I, is an ℝ^m-valued standard Wiener process.

Proof This is immediate from Proposition 14.80 and Definition 14.78. ⊓⊔
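Proposition 14.81 lends itself to a quick numerical illustration: under the scaling w̃(t, ω) = (1/α)w(α²t, ω), the sample mean of w̃(t) should be near 0 and its sample variance near t. The sketch below is a hedged check only; the random-walk approximation of the Wiener process, the grid size, and the path count are assumptions of this sketch, not part of the book's construction.

```python
import math
import random

random.seed(0)

def wiener_path(n_steps, dt):
    # Approximate a standard Wiener sample path by summing independent
    # N(0, dt) increments on a uniform grid.
    w, path = 0.0, [0.0]
    sd = math.sqrt(dt)
    for _ in range(n_steps):
        w += random.gauss(0.0, sd)
        path.append(w)
    return path

alpha, t, dt, n_paths = 2.0, 1.0, 0.01, 4000
n_steps = int(alpha * alpha * t / dt)   # w must be simulated up to alpha^2 * t
samples = [wiener_path(n_steps, dt)[-1] / alpha for _ in range(n_paths)]
mean = sum(samples) / n_paths
var = sum((x - mean) ** 2 for x in samples) / n_paths
print(mean, var)   # mean should be near 0, variance near t = 1
```

Since Var((1/α)w(α²t)) = α²t/α² = t, the scaled process reproduces the covariance of a standard Wiener process, consistent with Proposition 14.81.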

Proposition 14.82 Let I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of ℝ with partial ordering ≤, Ω := (Ω, B, P) be a probability measure space, I × Ω := ([0, ∞) × Ω, B̄, ν) be the σ-finite product measure space, and w : I × Ω → ℝ be a standard Wiener process (as defined in Definition 14.76). Then, there exists F ∈ B with P(F) = 1 such that ∀ω ∈ F, lim sup_{h→0+} (1/h)|w(t + h, ω) − w(t, ω)| = ∞, ∀t ∈ I.

Proof Fix any m ∈ ℤ_+, and consider the interval t ∈ [m, m + 1). Let F_m := {ω ∈ Ω | lim sup_{h→0+} (1/h)|w(t + h, ω) − w(t, ω)| = ∞, ∀t ∈ [m, m + 1)}. Note that F_m may not be B-measurable. Define F̄ := ∩_{m=0}^∞ F_m. Then, ∀ω ∈ F̄, lim sup_{h→0+} (1/h)|w(t + h, ω) − w(t, ω)| = ∞, ∀t ∈ I.

Lemma 14.85 Let I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of ℝ with partial ordering ≤, Ω := (Ω, B, P) be a probability measure space, I × Ω := ([0, ∞) × Ω, B̄, ν) be the σ-finite product measure space, w : I × Ω → ℝ be a standard Wiener process on I × Ω, and α, β ∈ (0, ∞) ⊂ ℝ. Then, P({ω ∈ Ω | w(t, ω) > (α/2)t + β, for some t ∈ I}) ≤ exp(−αβ).

Proof Let (B_t)_{t∈I} be the natural filtration of (w_t)_{t∈I}. Let A := {ω ∈ Ω | w(t, ω) > (α/2)t + β, for some t ∈ I} and M : I × Ω → ℝ be defined by M(t, ω) =

exp(αw(t, ω) − (α²/2)t), ∀ω ∈ Ω, ∀t ∈ I. By Proposition 14.84, the stochastic process (M_t)_{t∈I}, defined by M_t(ω) = M(t, ω), ∀ω ∈ Ω, ∀t ∈ I, is a Martingale adapted to the filtration (B_t)_{t∈I} and E(M_t) = 1, ∀t ∈ I. Then, A = {ω ∈ Ω | sup_{t∈I} M_t(ω) > exp(αβ)}. Let M̄ : Ω → [0, ∞] ⊂ ℝe be defined by M̄(ω) := sup_{t∈I} M_t(ω), ∀ω ∈ Ω. Then, A = {ω ∈ Ω | M̄(ω) > exp(αβ)}. Fix any N ∈ ℕ, and define (f_n^N)_{n∈ℤ_+} by f_n^N(ω) = M(n/2^N, ω), ∀n ∈ ℤ_+ and ∀ω ∈ Ω. Then, (f_n^N)_{n∈ℤ_+} is a martingale indexed by ℤ_+. By the continuity of M(·, ω) : I → ℝ, ∀ω ∈ Ω, we have A = {ω ∈ Ω | M̄(ω) > exp(αβ)} = ∪_{N∈ℕ}{ω ∈ Ω | sup_{n∈ℤ_+} f_n^N(ω) > exp(αβ)} ∈ B, where the set membership follows from Proposition 11.40.

Fix any N ∈ ℕ, and consider the Martingale (f_n^N)_{n∈ℤ_+}. Let T^N : Ω → ℤ_+ ∪ {∞} be defined by T^N(ω) := inf{n ∈ ℤ_+ | f_n^N(ω) > exp(αβ)}, ∀ω ∈ Ω. By Example 14.34, T^N is a stopping time for the Martingale (f_n^N)_{n∈ℤ_+}. By Theorem 14.35, the stopped stochastic process (Y_n^N)_{n∈ℤ_+}, defined by Y_n^N(ω) = f_{T^N(ω)∧n}^N(ω), ∀ω ∈ Ω, ∀n ∈ ℤ_+, is a Martingale. Note that Y_0^N(ω) = M_0(ω) = 1, ∀ω ∈ Ω. Then, E(Y_n^N) = 1, ∀n ∈ ℤ_+. By Doob's Forward Convergence Theorem 14.41, Y_∞^N(ω) := lim_{n∈ℕ} Y_n^N(ω) if the limit exists in ℝ, and Y_∞^N(ω) := 0 otherwise, satisfies Y_∞^N = lim_{n∈ℕ} Y_n^N a.e. ω ∈ Ω, Y_∞^N is B-measurable, and E(Y_∞^N) ≤ sup_{n∈ℤ_+} E(Y_n^N) = 1. This implies that, by Proposition 11.7, P(A) = lim_{N∈ℕ} P({ω ∈ Ω | sup_{n∈ℤ_+} f_n^N(ω) > exp(αβ)}) = lim_{N∈ℕ} P({ω ∈ Ω | Y_∞^N(ω) > exp(αβ)}) ≤ lim inf_{N∈ℕ} exp(−αβ) = exp(−αβ), where the second equality follows from the definition of Y_∞^N and (f_n^N)_{n∈ℤ_+}, and the inequality follows from Chebyshev Inequality, Proposition 14.67. This completes the proof of the lemma. ⊓⊔
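The bound of Lemma 14.85 can be probed by Monte Carlo: any finite-horizon, discrete-grid estimate of the left-hand side can only undershoot it, so the estimate should stay below exp(−αβ). In the sketch below, the parameters α = 2, β = 1, the horizon T = 10, the grid size, and the path count are assumptions of this illustration.

```python
import math
import random

random.seed(0)

alpha, beta = 2.0, 1.0
T, dt, n_paths = 10.0, 0.01, 4000
sd = math.sqrt(dt)
hits = 0
for _ in range(n_paths):
    w, t = 0.0, 0.0
    while t < T:
        w += random.gauss(0.0, sd)
        t += dt
        if w > 0.5 * alpha * t + beta:   # the event of Lemma 14.85
            hits += 1
            break
estimate = hits / n_paths
bound = math.exp(-alpha * beta)
print(estimate, bound)   # the estimate should not exceed exp(-alpha*beta)
```

For this drifted boundary the exponential bound is in fact sharp in continuous time, so the estimate typically lands only slightly below exp(−2) ≈ 0.135.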


Lemma 14.86 Let α ∈ ℝ_+, V : (α, ∞) → ℝ be continuous, and V̄ : (0, α) → ℝ be continuous. Then,

lim sup_{t→∞} V(t) = inf_{t∈ℝ, t>α} sup_{s∈ℝ, s>t} V(s) = inf_{t∈ℚ, t>α} sup_{s∈ℚ, s>t} V(s)
lim sup_{t→0} V̄(t) = inf_{t∈ℝ_+, t<α} sup_{s∈ℝ_+, s<t} V̄(s) = inf_{t∈ℚ, 0<t<α} sup_{s∈ℚ, 0<s<t} V̄(s)

Proof Define H : (α, ∞) → ℝe by H(t) := sup_{s∈ℝ, s>t} V(s), ∀t ∈ ℝ with t > α. Since V is continuous, we have H(t) = sup_{s∈ℚ, s>t} V(s) ∈ ℝe, ∀t ∈ ℝ with t > α. We will distinguish two exhaustive and mutually exclusive cases: Case 1: lim sup_{t→∞} V(t) < +∞ and Case 2: lim sup_{t→∞} V(t) = +∞.

Case 1: lim sup_{t→∞} V(t) < +∞. Then, inf_{t∈ℝ, t>α} H(t) < ∞. Clearly, H is monotonically nonincreasing on (α, ∞). If there existed t_0 ∈ (α, ∞) with H(t_0) = ∞, we could show that H(t) = ∞, ∀t ∈ (α, ∞), which contradicts inf_{t∈ℝ, t>α} H(t) < ∞. Then, we have H : (α, ∞) → ℝ, since V : (α, ∞) → ℝ. We need the following intermediate result:

Claim 14.86.1 H : (α, ∞) → ℝ is continuous.

Proof of Claim Fix any t ∈ (α, ∞). H(t) = sup_{s∈ℝ, s>t} V(s) ∈ ℝ. Fix any ε ∈ (0, ∞) ⊂ ℝ. Clearly, V(t) ≤ H(t), since V is continuous. We will distinguish two exhaustive and mutually exclusive cases: Case A: H(t) > V(t) and Case B: H(t) = V(t). Case A: H(t) > V(t). Then, by continuity of V, ∃δ ∈ (0, ∞) ⊂ ℝ such that |V(l) − V(t)| < (1/2)(H(t) − V(t)), ∀l ∈ B_ℝ(t, δ) ∩ (α, ∞) =: D. This implies that H(l) = H(t), ∀l ∈ D. Then, |H(l) − H(t)| = 0 < ε, ∀l ∈ D. Case B: H(t) = V(t). By continuity of V, ∃δ ∈ (0, ∞) ⊂ ℝ such that |V(l) − V(t)| < ε, ∀l ∈ B_ℝ(t, δ) ∩ (α, ∞) =: D. Fix any l ∈ D. If l ≥ t, we have H(t) − ε = V(t) − ε < V(l) ≤ sup_{s∈ℝ, s>l} V(s) = H(l) ≤ H(t). Thus, we have |H(l) − H(t)| ≤ ε. On the other hand, if l < t, we have H(t) ≤ H(l) = sup_{s∈ℝ, s>l} V(s) = max{sup_{s∈ℝ, s>t} V(s), sup_{s∈[l,t]} V(s)} = max{H(t), sup_{s∈[l,t]} V(s)} ≤ H(t) + ε. This yields |H(l) − H(t)| ≤ ε. Hence, we have |H(l) − H(t)| ≤ ε, ∀l ∈ D. In both cases, we have found δ ∈ (0, ∞) ⊂ ℝ such that |H(l) − H(t)| ≤ ε, ∀l ∈ B_ℝ(t, δ) ∩ (α, ∞) =: D. Hence, H is continuous at t. By the arbitrariness of t, we conclude that H is continuous. This completes the proof of the claim. ⊓⊔

By Claim 14.86.1, we have inf_{t∈ℝ, t>α} H(t) = inf_{t∈ℚ, t>α} H(t). This leads to lim sup_{t→∞} V(t) = inf_{t∈ℝ, t>α} H(t) = inf_{t∈ℚ, t>α} H(t) = inf_{t∈ℚ, t>α} sup_{s∈ℚ, s>t} V(s). Hence, the result holds in this case.

Case 2: lim sup_{t→∞} V(t) = +∞. Then, inf_{t∈ℝ, t>α} H(t) = ∞. This implies that H(t) = ∞, ∀t ∈ (α, ∞) ⊂ ℝ. This leads to lim sup_{t→∞} V(t) = inf_{t∈ℝ, t>α} H(t) = inf_{t∈ℚ, t>α} H(t) = inf_{t∈ℚ, t>α} sup_{s∈ℚ, s>t} V(s). Hence, the result holds in this case. In both cases, we have shown that the desired result holds.

14.11 Martingales with General Index Set


Next, consider the function V̄. By Definition 3.84, lim sup_{t→0} V̄(t) = inf_{t∈ℝ_+, t<α} sup_{s∈ℝ_+, s<t} V̄(s) = inf_{t∈ℝ, t>1/α} sup_{s∈ℝ, s>t} V̄(1/s) = inf_{t∈ℚ, t>1/α} sup_{s∈ℚ, s>t} V̄(1/s) = inf_{t∈ℚ, 0<t<α} sup_{s∈ℚ, 0<s<t} V̄(s), where the third equality follows from the first part of the proof applied to the continuous function s ∈ (1/α, ∞) ↦ V̄(1/s) ∈ ℝ. Hence, the result holds for V̄. This completes the proof of the lemma. ⊓⊔

Theorem 14.87 (Law of the Iterated Logarithm) Let I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of ℝ with partial ordering ≤, Ω := (Ω, B, P) be a probability measure space, I × Ω := ([0, ∞) × Ω, B̄, ν) be the σ-finite product measure space, w : I × Ω → ℝ be a standard Wiener process (as defined in Definition 14.76), and h : (e, ∞) → (0, ∞) ⊂ ℝ be defined by h(t) = √(2t ln(ln(t))). Then, the following statements hold:

(i) lim sup_{t→∞} w_t/h(t) = 1 a.e. in Ω.
(ii) lim inf_{t→∞} w_t/h(t) = −1 a.e. in Ω.
(iii) lim sup_{t→0+} w_t/h̄(t) = 1 a.e. in Ω, where h̄ : (0, 1/e) → (0, ∞) ⊂ ℝ is defined by h̄(t) = √(2t ln(ln(1/t))).
(iv) lim inf_{t→0+} w_t/h̄(t) = −1 a.e. in Ω.
(v) Let E ∈ B with P(E) = 1 be the set on which the limits in (i) and (ii) hold, and define w̃ : I × Ω → ℝ by w̃(t, ω) := tw(1/t, ω), ∀t ∈ (0, ∞) ⊂ ℝ, ∀ω ∈ E, and w̃(t, ω) := 0 otherwise. Then, w̃ is a standard Wiener process on I × Ω.

Proof We first record a standard estimate on Gaussian tails.

Claim 14.87.1 ∀x ∈ (0, ∞) ⊂ ℝ, we have (1/√(2π))(x/(1 + x²)) exp(−x²/2) < (1/√(2π)) ∫_x^∞ exp(−y²/2) dy < (1/√(2π))(1/x) exp(−x²/2).

Proof of Claim The upper bound follows from ∫_x^∞ exp(−y²/2) dy < ∫_x^∞ (y/x) exp(−y²/2) dy = (1/x) exp(−x²/2). For the lower bound, note that (1 + x^{−2}) ∫_x^∞ exp(−y²/2) dy > ∫_x^∞ (1 + y^{−2}) exp(−y²/2) dy = (1/x) exp(−x²/2), which yields ∫_x^∞ exp(−y²/2) dy > (x/(1 + x²)) exp(−x²/2). This completes the proof of the claim. ⊓⊔

(i) We first show that lim sup_{t→∞} w_t/h(t) ≤ 1 a.e. in Ω. Fix any θ, δ ∈ ℚ with θ > 1 and δ > 0, and let n_{0,θ} ∈ ℕ be such that θ^{n_{0,θ}−1} > e. Let α_n := (1 + δ)θ^{−n}h(θ^n) > 0 and β_n := (1/2)h(θ^n) > 0, ∀n ∈ ℕ with n ≥ n_{0,θ}. Then, α_nβ_n = (1/2)(1 + δ)θ^{−n}(h(θ^n))² = (1 + δ) ln(ln(θ^n)) = (1 + δ) ln(n ln(θ)), and therefore Σ_{n=n_{0,θ}}^∞ exp(−α_nβ_n) < +∞. By Lemma 14.85 and Borel-Cantelli Lemma 14.6, a.e. ω ∈ Ω, ∃N(ω) ∈ ℕ with N(ω) ≥ n_{0,θ} such that w(t, ω) ≤ (α_n/2)t + β_n, ∀t ∈ I, ∀n > N(ω). Fix any such n. ∀t ∈ [θ^n, θ^{n+1}], we have w(t, ω) ≤ (α_n/2)θ^{n+1} + β_n = ((1 + δ)θ/2)h(θ^n) + (1/2)h(θ^n) ≤ (((1 + δ)θ + 1)/2)h(t), since h is increasing on (e, ∞) and h(θ^n) ≤ h(t). Hence, lim sup_{t→∞} w_t/h(t) ≤ ((1 + δ)θ + 1)/2 a.e. in Ω, and, by the arbitrariness of θ and δ, lim sup_{t→∞} w_t/h(t) ≤ 1 a.e. in Ω.

Next, we show that lim sup_{t→∞} w_t/h(t) ≥ 1 a.e. in Ω. Fix any θ ∈ ℚ with θ > 1, and let A_n := {ω ∈ Ω | w_{θ^n}(ω) − w_{θ^{n−1}}(ω) > √(1 − θ^{−1})h(θ^n)}, ∀n ∈ ℕ with n ≥ n_{0,θ}. By (d) of Definition 14.76, (A_n)_{n=n_{0,θ}}^∞ are independent events and

P(A_n) = P({ω ∈ Ω | (w_{θ^n}(ω) − w_{θ^{n−1}}(ω))/√(θ^n − θ^{n−1}) > √(1 − θ^{−1})h(θ^n)/√(θ^n − θ^{n−1})}) = (1/√(2π)) ∫_{θ^{−n/2}h(θ^n)}^∞ exp(−y²/2) dy < (1/√(2π))(√(θ^n)/h(θ^n)) exp(−(h(θ^n))²/(2θ^n)), ∀n ∈ ℕ with n ≥ n_{0,θ},

where the second equality follows from the facts that (w_{θ^n} − w_{θ^{n−1}})/√(θ^n − θ^{n−1}) ∼ N(0, 1) and √(1 − θ^{−1})h(θ^n)/√(θ^n − θ^{n−1}) = θ^{−n/2}h(θ^n), and the inequality follows from Claim 14.87.1. Again by Claim 14.87.1, we have P(A_n) > (1/√(2π))(x/(1 + x²)) exp(−x²/2), ∀n ∈ ℕ with n ≥ n_{0,θ}, where x := θ^{−n/2}h(θ^n) = √(2 ln(n ln(θ))) = √(2 ln(n) + 2 ln(ln(θ))). This implies that ∃n_{1,θ} ∈ ℕ with n_{1,θ} ≥ n_{0,θ} such that 1 < x < 2√(ln(n)), ∀n ∈ ℕ with n ≥ n_{1,θ}. Then, we have P(A_n) > (1/√(2π))(1/(2x)) exp(−x²/2) > (1/√(2π))(1/(4√(ln(n)))) exp(−ln(n ln(θ))) = 1/(4√(2π) ln(θ) n √(ln(n))), ∀n ∈ ℕ with n ≥ n_{1,θ}. This yields that Σ_{n=n_{0,θ}}^∞ P(A_n) = ∞, by Problem 34.M of Bartle (1976). By Borel-Cantelli Lemma 14.6, P(∩_{m=n_{0,θ}}^∞ ∪_{n=m}^∞ A_n) = 1; that is, a.e. ω ∈ Ω, (w_{θ^n}(ω) − w_{θ^{n−1}}(ω))/√(θ^n − θ^{n−1}) > √(1 − θ^{−1})h(θ^n)/√(θ^n − θ^{n−1}) for infinitely many n ≥ n_{0,θ}. Thus, we have equivalently that √(1 − θ^{−1}) ≤ lim sup_{n∈ℕ} (w_{θ^n} − w_{θ^{n−1}})/h(θ^n) ≤ lim sup_{n∈ℕ} w_{θ^n}/h(θ^n) + lim sup_{n∈ℕ} ((−w_{θ^{n−1}})/h(θ^{n−1}))(h(θ^{n−1})/h(θ^n)) ≤ lim sup_{n∈ℕ} w_{θ^n}/h(θ^n) + 1/√θ a.e. ω ∈ Ω, where the second inequality follows from Proposition 3.83 and the last inequality follows from the facts that lim sup_{n∈ℕ} (−w_{θ^{n−1}})/h(θ^{n−1}) ≤ lim sup_{t→∞} (−w_t)/h(t) ≤ 1 (since −w_t is a standard Wiener process by Proposition 14.80) and lim_{n∈ℕ} h(θ^{n−1})/h(θ^n) = 1/√θ. This leads to lim sup_{t→∞} w_t/h(t) ≥ lim sup_{n∈ℕ} w_{θ^n}/h(θ^n) ≥ √(1 − θ^{−1}) − 1/√θ a.e. ω ∈ Ω. By Lemma 14.86, lim sup_{t→∞} w_t/h(t) : Ω → ℝ is B-measurable. Then, we have lim sup_{t→∞} w_t/h(t) ≥ √(1 − θ^{−1}) − 1/√θ a.e. in Ω. By the arbitrariness of θ, we have lim sup_{t→∞} w_t/h(t) ≥ 1 a.e. in Ω. Together with the upper bound, this proves (i).

(ii) By Proposition 14.80, (−w_t)_{t∈I} is also a standard Wiener process. Then, by (i), we have −lim inf_{t→∞} w_t/h(t) = lim sup_{t→∞} (−w_t)/h(t) = 1 a.e. in Ω, where the first equality follows from Proposition 3.85 and the second equality follows from (i). This proves (ii).

(iii) Let h̄ : (0, 1/e) → (0, ∞) ⊂ ℝ be defined by h̄(t) = √(2t ln(ln(1/t))), ∀t ∈ (0, 1/e) ⊂ ℝ.

Clearly, lim_{t→0} h̄(t) = 0 and h̄ is strictly increasing on (0, t_0) ⊂ ℝ, for some t_0 ∈ (0, 1/e). Fix any θ, δ ∈ ℚ with θ, δ ∈ (0, 1) ⊂ ℝ. Let n_{0,θ} ∈ ℕ be such that θ^{n_{0,θ}} < t_0, and α_n := (1 + δ)θ^{−n}h̄(θ^n) > 0 and β_n := (1/2)h̄(θ^n) > 0, ∀n ∈ ℕ with n ≥ n_{0,θ}. Then, α_nβ_n = (1/2)(1 + δ)θ^{−n}(h̄(θ^n))² = (1 + δ) ln(ln(θ^{−n})) = (1 + δ) ln(n ln(1/θ)) = ln((n ln(1/θ))^{1+δ}) and exp(−α_nβ_n) = c(θ)/n^{1+δ}, ∀n ∈ ℕ with n ≥ n_{0,θ}, where c(θ) := (ln(1/θ))^{−1−δ}. Therefore, we have Σ_{n=n_{0,θ}}^∞ exp(−α_nβ_n) < +∞. By Lemma 14.85, we have P({ω ∈ Ω | w(t, ω) > (α_n/2)t + β_n for some t ∈ I}) ≤ exp(−α_nβ_n). This implies that Σ_{n=n_{0,θ}}^∞ P({ω ∈ Ω | w(t, ω) > (α_n/2)t + β_n for some t ∈ I}) < ∞. By Borel-Cantelli Lemma 14.6, P(∩_{m=n_{0,θ}}^∞ ∪_{n=m}^∞ {ω ∈ Ω | w(t, ω) > (α_n/2)t + β_n for some t ∈ I}) = 0 = P({ω ∈ Ω | w(t, ω) > (α_n/2)t + β_n for some t ∈ I and for infinitely many n ≥ n_{0,θ}}) =: P(Ω \ Ω_{θ,δ}). Then, P(Ω_{θ,δ}) = 1. Fix any ω ∈ Ω_{θ,δ}; ∃N_{θ,δ}(ω) ∈ ℕ with N_{θ,δ}(ω) ≥ n_{0,θ} such that w(t, ω) ≤ (α_n/2)t + β_n, ∀t ∈ I and ∀n > N_{θ,δ}(ω). Fix any n > N_{θ,δ}(ω). ∀t ∈ [θ^n, θ^{n−1}] ⊂ ℝ, we have w(t, ω) ≤ (α_n/2)θ^{n−1} + β_n = ((1 + δ)/(2θ))h̄(θ^n) + (1/2)h̄(θ^n) ≤ ((1 + δ)/θ)h̄(θ^n) ≤ ((1 + δ)/θ)h̄(t), since h̄(θ^n) ≤ h̄(t). Hence, we have sup_{0<t≤θ^{N_{θ,δ}(ω)}} w(t, ω)/h̄(t) ≤ (1 + δ)/θ. This implies that lim sup_{t→0+} w_t/h̄(t) ≤ (1 + δ)/θ a.e. in Ω, and, by the arbitrariness of θ and δ, lim sup_{t→0+} w_t/h̄(t) ≤ 1 a.e. in Ω.

Next, fix any θ ∈ ℚ with θ ∈ (0, 1) ⊂ ℝ, and let B_n := {ω ∈ Ω | w_{θ^n}(ω) − w_{θ^{n+1}}(ω) > √(1 − θ)h̄(θ^n)}, ∀n ∈ ℕ with n ≥ n_{0,θ}. By (d) of Definition 14.76, (B_n)_{n=n_{0,θ}}^∞ are independent events, and, arguing as in the lower bound of (i) with Claim 14.87.1 (note that √(1 − θ)h̄(θ^n)/√(θ^n − θ^{n+1}) = θ^{−n/2}h̄(θ^n) = √(2 ln(n ln(1/θ)))), we have Σ_{n=n_{0,θ}}^∞ P(B_n) = ∞. By Borel-Cantelli Lemma 14.6, a.e. ω ∈ Ω, (w_{θ^n}(ω) − w_{θ^{n+1}}(ω))/√(θ^n − θ^{n+1}) > √(1 − θ)h̄(θ^n)/√(θ^n − θ^{n+1}) for infinitely many n ≥ n_{0,θ}. Thus, we have equivalently that √(1 − θ) ≤ lim sup_{n∈ℕ} (w_{θ^n} − w_{θ^{n+1}})/h̄(θ^n) ≤ lim sup_{n∈ℕ} w_{θ^n}/h̄(θ^n) + lim sup_{n∈ℕ} ((−w_{θ^{n+1}})/h̄(θ^{n+1}))(h̄(θ^{n+1})/h̄(θ^n)) ≤ lim sup_{n∈ℕ} w_{θ^n}/h̄(θ^n) + √θ a.e. ω ∈ Ω, where the second inequality follows from Proposition 3.83 and the last inequality follows from the facts that lim sup_{n∈ℕ} (−w_{θ^{n+1}})/h̄(θ^{n+1}) ≤ lim sup_{t→0+} (−w_t)/h̄(t) ≤ 1 (since −w_t is a standard Wiener process by Proposition 14.80) and lim_{n∈ℕ} h̄(θ^{n+1})/h̄(θ^n) = √θ. This leads to lim sup_{t→0+} w_t/h̄(t) ≥ lim sup_{n∈ℕ} w_{θ^n}/h̄(θ^n) ≥ √(1 − θ) − √θ a.e. ω ∈ Ω. By Lemma 14.86, lim sup_{t→0+} w_t/h̄(t) : Ω → ℝ is B-measurable. Then, we have lim sup_{t→0+} w_t/h̄(t) ≥ √(1 − θ) − √θ a.e. in Ω. By the arbitrariness of θ, we have lim sup_{t→0+} w_t/h̄(t) ≥ 1 a.e. in Ω.


This proves (iii).

(iv) By Proposition 14.80, (−w_t)_{t∈I} is also a standard Wiener process. Then, by (iii), we have −lim inf_{t→0+} w_t/h̄(t) = lim sup_{t→0+} (−w_t)/h̄(t) = 1 a.e. in Ω, where the first equality follows from Proposition 3.85 and the second equality follows from (iii). This proves (iv).

(v) By (i) and (ii), E ∈ B and P(E) = 1. Then, (0, ∞) × E ∈ B̄. Let F denote the set of measurable rectangles in the product space (0, ∞) × Ω, and define the mapping Ψ : (0, ∞) × Ω → (0, ∞) × Ω by Ψ(t, ω) = (1/t, ω), ∀(t, ω) ∈ (0, ∞) × Ω. For any set G_1 × G_2 ∈ F, we have Ψ_inv(G_1 × G_2) ∈ F ⊆ B̄. Considering the collection of all subsets G ⊆ (0, ∞) × Ω such that Ψ_inv(G) ∈ B̄, it is easy to show that this collection is a σ-algebra containing F. Then, the mapping ŵ : (0, ∞) × Ω → ℝ, defined by ŵ(t, ω) = w(1/t, ω), ∀(t, ω) ∈ (0, ∞) × Ω, is B̄-measurable. By Propositions 11.38, 11.39, and 11.41, we have that w̃ : I × Ω → ℝ is B̄-measurable. This proves (a) of Definition 14.76. Clearly, w̃(0, ω) = 0, ∀ω ∈ Ω. This establishes (b) of Definition 14.76. ∀ω ∈ Ω, if ω ∈ E, then w̃(·, ω) : I → ℝ is continuous at any t ∈ I, and if ω ∈ Ω \ E, then w̃(·, ω) : I → ℝ is the identically zero function and is continuous. Hence, (c) of Definition 14.76 is established.

Fix any 0 = s_1 < t_1 ≤ s_2 < t_2 ≤ ⋯ ≤ s_n < t_n < ∞. Then, we have w̃_{t_1} − w̃_{s_1} = t_1w_{1/t_1}, . . ., w̃_{t_n} − w̃_{s_n} = t_nw_{1/t_n} − s_nw_{1/s_n}. Since (w_t)_{t∈I} is a standard Wiener process, we have E(t_1w_{1/t_1}) = 0, E(t_1²w_{1/t_1}²) = t_1, E(t_iw_{1/t_i} − s_iw_{1/s_i}) = 0, and E((t_iw_{1/t_i} − s_iw_{1/s_i})²) = t_i + s_i − 2t_is_iE(w_{1/t_i}w_{1/s_i}) = t_i + s_i − 2t_is_i(1/t_i) = t_i − s_i, i = 2, . . . , n. Furthermore, E(t_1w_{1/t_1}(t_iw_{1/t_i} − s_iw_{1/s_i})) = t_1t_iE(w_{1/t_1}w_{1/t_i}) − t_1s_iE(w_{1/t_1}w_{1/s_i}) = t_1 − t_1 = 0, i = 2, . . . , n, and E((t_iw_{1/t_i} − s_iw_{1/s_i})(t_jw_{1/t_j} − s_jw_{1/s_j})) = t_it_j(1/t_j) − s_it_j(1/t_j) − t_is_j(1/s_j) + s_is_j(1/s_j) = t_i − s_i − t_i + s_i = 0, ∀2 ≤ i < j ≤ n. Hence, (w̃_{t_1} − w̃_{s_1}, . . . , w̃_{t_n} − w̃_{s_n}) admits mean 0_n and covariance matrix K = block diagonal(t_1 − s_1, t_2 − s_2, . . . , t_n − s_n) ∈ S̄_+ℝ^n. By Proposition 14.72 and the fact that (w_t)_{t∈I} is a standard Wiener process, (w̃_{t_1} − w̃_{s_1}, . . . , w̃_{t_n} − w̃_{s_n}) ∼ N(0_n, K). By Proposition 14.27, w̃_{t_1} − w̃_{s_1}, . . ., w̃_{t_n} − w̃_{s_n} are independent. This establishes (d) of Definition 14.76. Hence, the process (w̃_t)_{t∈I}, defined by w̃_t(ω) = w̃(t, ω), ∀ω ∈ Ω, ∀t ∈ I, is a standard Wiener process. This completes the proof of the theorem. ⊓⊔

Example 14.88 Let I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of ℝ with the partial ordering ≤, Y be a normed linear space, Ω := (Ω, B, (B_t)_{t∈I}, P) be a filtered probability measure space, I × Ω = ([0, ∞) × Ω, B̄, ν) be the σ-finite product measure space, (X_t)_{t∈I} be a Y-valued adapted stochastic process on Ω with continuous sample path ∀ω ∈ Ω, F ⊆ Y be a closed set, and T : Ω → I ∪ {+∞} be defined by T(ω) := inf{t ∈ I | X_t(ω) ∈ F}, ∀ω ∈ Ω. Then, T is a stopping time. This is because, ∀t ∈ I, {ω ∈ Ω | T(ω) ≤ t} = ∪_{s∈[0,t]} {ω ∈ Ω | X_s(ω) ∈ F} = ∪_{s∈[0,t]} (X_s)_inv(F) = {ω ∈ Ω | inf_{s∈[0,t]} dist(X_s(ω), F) = 0} = {ω ∈ Ω | inf_{s∈[0,t]∩ℚ} dist(X_s(ω), F) = 0} ∈ B_t, where the first equality follows from the fact that (X_t)_{t∈I} has continuous sample path, ∀ω ∈ Ω, which further implies that X_{T(ω)}(ω) ∈ F if T(ω) < ∞, the second equality follows from the fact that


(X_t)_{t∈I} is adapted to the filtration (B_t)_{t∈I}, the third and fourth equalities follow from the fact that (X_t)_{t∈I} has continuous sample path, ∀ω ∈ Ω, and dist(·, F) : Y → [0, ∞] ⊂ ℝe is a continuous function, and the set membership follows from the fact that dist(X_s(·), F) : Ω → [0, ∞] ⊂ ℝe is B_s-measurable by Proposition 11.38, ∀s ∈ [0, t] ∩ ℚ, and the set [0, t] ∩ ℚ is countable. %

Proposition 14.89 Let I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of ℝ, Ω := (Ω, B, (B_t)_{t∈I}, P) be a filtered probability measure space, and τ and κ be stopping times with values in I ∪ {∞}. Then, τ ∨ κ, τ ∧ κ, and τ + κ are stopping times.

Proof By Definition 14.33, τ : Ω → I ∪ {∞} = [0, ∞] ⊆ ℝe is B-measurable and {ω ∈ Ω | τ(ω) ≤ t} ∈ B_t, ∀t ∈ I. The same holds for κ. By Propositions 11.36 and 11.40, τ ∨ κ, τ ∧ κ, and τ + κ are functions of Ω to [0, ∞] ⊆ ℝe and are B-measurable. Fix any t ∈ I; we have {ω ∈ Ω | τ ∨ κ ≤ t} = {ω ∈ Ω | τ ≤ t} ∩ {ω ∈ Ω | κ ≤ t} ∈ B_t. This implies that τ ∨ κ is a stopping time. In addition, {ω ∈ Ω | τ ∧ κ ≤ t} = {ω ∈ Ω | τ ≤ t} ∪ {ω ∈ Ω | κ ≤ t} ∈ B_t. This yields that τ ∧ κ is a stopping time. Furthermore, {ω ∈ Ω | τ + κ ≤ t} = Ω \ {ω ∈ Ω | τ + κ > t} = Ω \ (∪_{r∈ℚ, 0≤r≤t} ({ω ∈ Ω | τ > r} ∩ {ω ∈ Ω | κ > t − r}) ∪ {ω ∈ Ω | τ > t} ∪ {ω ∈ Ω | κ > t}) ∈ B_t. This shows that τ + κ is a stopping time. This completes the proof of the proposition. ⊓⊔

Example 14.90 Let I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of ℝ with the partial ordering ≤, Ω := (Ω, B, (B_t)_{t∈I}, P) be a filtered probability measure space, I × Ω = ([0, ∞) × Ω, B̄, ν) be the σ-finite product measure space, w : I × Ω → ℝ be the standard Wiener process on I × Ω (as defined in Definition 14.76), and the stochastic process (w_t)_{t∈I}, where w_t(ω) = w(t, ω), ∀ω ∈ Ω, ∀t ∈ I, be adapted to the filtration (B_t)_{t∈I}. Assume that B_t and σ((w_{t+s} − w_t)_{s∈I}) are independent σ-algebras, ∀t ∈ I. Expand exp(αx − (α²/2)t) as an analytic function of α to obtain

exp(αx − (α²/2)t) =: Σ_{n=0}^∞ H_n(x, t)(α^n/n!),  ∀α ∈ ℝ   (14.21)

where H_n(x, t) : ℝ × ℝ → ℝ is C∞, ∀n ∈ ℕ. Clearly, H_n(x, t) is an nth order polynomial in x and a ⌊n/2⌋th order polynomial in t. Note that exp(|α||x| + (α²/2)|t|) ≥ Σ_{n=0}^∞ |H_n(x, t)|(|α|^n/n!), ∀(α, x, t) ∈ ℝ³. Then, (14.21) holds ∀(α, x, t) ∈ ℝ³, with the right-hand side converging absolutely. The sequence (H_n(x, t))_{n=0}^∞ is called the Hermite polynomials. The leading Hermite polynomials are H_0(x, t) = 1, H_1(x, t) = x, H_2(x, t) = x² − t, H_3(x, t) = x³ − 3xt, . . ., ∀(x, t) ∈ ℝ².
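The Hermite polynomials of (14.21) can be generated by the three-term recursion H_{n+1}(x, t) = xH_n(x, t) − ntH_{n−1}(x, t), a standard identity obtained by differentiating the generating function in α (it is not stated in the text, but it is consistent with the leading polynomials above). The short sketch below generates them this way and checks (14.21) numerically at one arbitrarily chosen point (x, t, α); the specific values are assumptions of the sketch.

```python
import math

def hermite_values(x, t, n_max):
    # H_0..H_{n_max} at (x, t) via H_{n+1}(x,t) = x*H_n(x,t) - n*t*H_{n-1}(x,t).
    vals = [1.0, x]
    for n in range(1, n_max):
        vals.append(x * vals[n] - n * t * vals[n - 1])
    return vals

x, t, a = 0.7, 1.3, 0.5
H = hermite_values(x, t, 30)
series = sum(H[n] * a ** n / math.factorial(n) for n in range(31))
closed_form = math.exp(a * x - 0.5 * a * a * t)
print(H[2], H[3])            # x^2 - t and x^3 - 3*x*t
print(series, closed_form)   # partial sum of (14.21) vs. exp(alpha*x - (alpha^2/2)*t)
```

In Example 14.90, these polynomials evaluated along the Wiener process, H_n(w_t, t), are exactly the processes shown to be martingales.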


Now, consider the adapted stochastic process (M_t)_{t∈I}, defined by M_t(ω) = exp(αw_t(ω) − (α²/2)t) = Σ_{n=0}^∞ H_n(w_t(ω), t)(α^n/n!), ∀ω ∈ Ω, ∀t ∈ I. By Proposition 14.84, (M_t)_{t∈I} is a Martingale. Then, we have E(M_t|B_s) = M_s a.e. in Ω_s, ∀s, t ∈ I with s ≤ t, where Ω_s := (Ω, B_s, P_s := P|_{B_s}) is a probability measure space. Since exp(|α||w_t(ω)| + (α²/2)|t|) ≥ Σ_{n=0}^∞ (|α|^n/n!)|H_n(w_t(ω), t)|, ∀(α, t, ω) ∈ ℝ × I × Ω, then E(Σ_{n=0}^∞ |H_n(w_t(ω), t)|(|α|^n/n!)) ≤ E(exp(|α||w_t(ω)| + (α²/2)|t|)) < ∞, ∀(α, t) ∈ ℝ × I. This yields E(M_t|B_s) = Σ_{n=0}^∞ E(H_n(w_t, t)|B_s)(α^n/n!) = M_s = Σ_{n=0}^∞ H_n(w_s, s)(α^n/n!) a.e. in Ω_s, ∀s, t ∈ I with s ≤ t, ∀α ∈ ℝ, where the first equality follows from (g) of Proposition 14.11, the second equality follows from (M_t)_{t∈I} being a Martingale, and the last equality follows from the definition of (M_t)_{t∈I}. By the arbitrariness of α, we have E(H_n(w_t, t)|B_s) = H_n(w_s, s) a.e. in Ω_s, ∀t, s ∈ I with s ≤ t, ∀n ∈ ℤ_+. Hence, (H_n(w_t, t))_{t∈I} is a Martingale, ∀n ∈ ℤ_+. %

Proposition 14.91 Let I := [0, ∞) ⊂ ℝ, Ω := (Ω, B, (B_t)_{t∈I}, P) be a filtered probability measure space, and τ : Ω → [0, ∞] ⊂ ℝe be a stopping time. Define B_τ := {A ∈ B | ∀t ∈ I, A ∩ {ω ∈ Ω | τ(ω) ≤ t} ∈ B_t}. Then, the following statements hold:

(i) B_τ is a σ-algebra on Ω.
(ii) τ is B_τ-measurable.
(iii) Ω̃ := {ω ∈ Ω | τ(ω) < ∞} ∈ B_τ.
(iv) Let B̃_τ := {A ∈ B_τ | A ⊆ Ω̃}. Then, B̃_τ is a σ-algebra on the set Ω̃.

Proof (i) Clearly, Ω ∈ B_τ, since, ∀t ∈ I, E_t := {ω ∈ Ω | τ(ω) ≤ t} ∈ B_t by the assumption that τ is a stopping time. Clearly, ∅ ∈ B_τ since ∅ ∈ B_t, ∀t ∈ I. ∀A ∈ B_τ, ∀t ∈ I, we have (Ω \ A) ∩ E_t = E_t \ A = E_t \ (E_t ∩ A) = E_t ∩ (Ω \ (E_t ∩ A)) ∈ B_t, since E_t, E_t ∩ A, Ω ∈ B_t. By the arbitrariness of t, we have Ω \ A ∈ B_τ. ∀(A_n)_{n=1}^∞ ⊆ B_τ, ∀t ∈ I, we have (∪_{n=1}^∞ A_n) ∩ E_t = ∪_{n=1}^∞ (A_n ∩ E_t) ∈ B_t. By the arbitrariness of t, we have ∪_{n=1}^∞ A_n ∈ B_τ. This shows that B_τ is a σ-algebra on Ω.
(ii) It follows immediately from Proposition 11.35.
(iii) Clearly, E_t ∈ B_τ, ∀t ∈ I. Note that Ω̃ = ∪_{n=1}^∞ E_n ∈ B_τ.
(iv) It is easy to show that B̃_τ is a σ-algebra on Ω̃ by Proposition 11.13. This completes the proof of the proposition. ⊓⊔

Proposition 14.92 Let m ∈ ℕ, I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of ℝ with the partial ordering ≤, Ω := (Ω, B, (B_t)_{t∈I}, P) be a filtered probability measure space, I × Ω =: ([0, ∞) × Ω, B̄, ν) be the σ-finite product measure space, w : I × Ω → ℝ^m be the ℝ^m-valued standard Wiener process that is adapted to the filtration (B_t)_{t∈I}, τ : Ω → [0, ∞] ⊂ ℝe be a stopping time, and w_t : Ω → ℝ^m be defined by w_t(ω) = w(t, ω), ∀ω ∈ Ω, ∀t ∈ I. Assume that B_t and σ((w_{t+s} − w_t)_{s∈I}) are independent σ-algebras, ∀t ∈ I. Let Ω̃


and B̃_τ be as defined in Proposition 14.91, and B̃ := {E ∈ B | E ⊆ Ω̃}. Assume that P(Ω̃) > 0. Define P̃ : B̃ → [0, 1] ⊂ ℝ by P̃(A) := P(A|Ω̃), ∀A ∈ B̃. Then, the following statements hold:

(i) Ω̃ := (Ω̃, B̃, P̃) is a probability measure space.
(ii) Let I × Ω̃ =: ([0, ∞) × Ω̃, B̄̃, ν̄) be the σ-finite product measure space and define y : I × Ω̃ → ℝ^m by y(t, ω) := w(τ(ω) + t, ω) − w(τ(ω), ω), ∀(t, ω) ∈ I × Ω̃. Then, y is an ℝ^m-valued standard Wiener process on I × Ω̃.
(iii) The σ-algebras B̃_τ and σ((y_t)_{t∈I}) are independent, i.e., the process y is independent of B̃_τ, where y_t : Ω̃ → ℝ^m is defined by y_t(ω) = y(t, ω), ∀ω ∈ Ω̃, ∀t ∈ I.

Proof (i) By Proposition 14.91, (Ω̃, B̃_τ) is a measurable space. By Proposition 11.13, (Ω̃, B̃) is a measurable space. By the discussion on Page 763, P̃(A) = P(A ∩ Ω̃)/P(Ω̃), ∀A ∈ B̃. Hence, it is easy to show that Ω̃ is a probability measure space.

(ii) We will first show that y is B̄̃-measurable. Note that y(t, ω) = w(h(t, ω)) − w(h(0, ω)), ∀(t, ω) ∈ I × Ω̃, where h : I × Ω̃ → I × Ω̃ is defined by h(t, ω) = (g(t, ω), ω) ∈ I × Ω̃, ∀(t, ω) ∈ I × Ω̃, and g : I × Ω̃ → I is defined by g(t, ω) = t + τ(ω) ∈ I, ∀(t, ω) ∈ I × Ω̃. By Propositions 11.38 and 11.39, we have that g is B̄̃-measurable. We will show that h is B̄̃-in-B̄̃-measurable. Let F := {J × E ⊆ I × Ω̃ | J ∈ B̂, E ∈ B̃} be the set of measurable rectangles in I × Ω̃. Clearly, σ(F) = B̄̃ and F is a π-system. ∀J × E ∈ F, h_inv(J × E) = {(t, ω) ∈ I × Ω̃ | ω ∈ E, g(t, ω) ∈ J} = {(t, ω) ∈ I × Ω̃ | (t, ω) ∈ I × E, (t, ω) ∈ g_inv(J)} = g_inv(J) ∩ (I × E). By g being B̄̃-measurable, we have g_inv(J) ∈ B̄̃. Clearly, I × E ∈ F ⊆ B̄̃. Thus, we conclude that h_inv(J × E) ∈ B̄̃. By the arbitrariness of J × E and Proposition 14.23, we have that h is B̄̃-in-B̄̃-measurable. Note that w : I × Ω → ℝ^m is B̄-measurable. By Proposition 12.23, w|_{I×Ω̃} : I × Ω̃ → ℝ^m is B̄̃-measurable. Then, by Propositions 14.24, 11.38, and 11.39, we have that y is B̄̃-measurable. This establishes (a) of Definition 14.78. (b) and (c) of Definition 14.78 are straightforward for the stochastic process y. For (d) of Definition 14.78, we need the following intermediate result:

Claim 14.92.1 ∀j ∈ ℕ, ∀s_1, . . . , s_j ∈ (0, ∞) ⊂ ℝ, ∀A ∈ B̃_τ, and ∀ bounded B_B(ℝ^{mj})-measurable function φ : ℝ^m × ⋯ × ℝ^m → ℝ (with j factors), we have E(χ_{A,Ω̃}φ(y_{s_1}, . . . , y_{s_j})) = P(A)E(φ(w_{s_1}, . . . , w_{s_j})).

Proof of Claim Fix any n ∈ ℕ. Define τ_n : Ω̃ → I by τ_n(ω) = k/2^n if (k − 1)/2^n < τ(ω) ≤ k/2^n, k ∈ ℕ. Then, we have lim_{n∈ℕ} τ_n(ω) ↓ τ(ω), ∀ω ∈ Ω̃. Fix any bounded B_B(ℝ^{mj})-measurable function φ : ℝ^m × ⋯ × ℝ^m → ℝ. Since y is B̄̃-measurable, the function under the expectation on the left-hand side of the desired equality is B̃-measurable. Since it is further bounded, it is


an element of L̄_1(Ω̃, ℝ). We have the following line of arguments:

E(χ_{A,Ω̃}φ(y_{s_1}, . . . , y_{s_j}))
= lim_{n∈ℕ} E(χ_{A,Ω̃}φ(w_{τ_n+s_1} − w_{τ_n}, . . . , w_{τ_n+s_j} − w_{τ_n}))
= lim_{n∈ℕ} Σ_{k=1}^∞ E(χ_{A,Ω̃}χ_{{ω∈Ω̃ | τ_n(ω)=k/2^n},Ω̃} · φ(w_{τ_n+s_1} − w_{τ_n}, . . . , w_{τ_n+s_j} − w_{τ_n}))
= lim_{n∈ℕ} Σ_{k=1}^∞ E(χ_{A∩{ω∈Ω̃ | τ_n(ω)=k/2^n},Ω̃} · φ(w_{k/2^n+s_1} − w_{k/2^n}, . . . , w_{k/2^n+s_j} − w_{k/2^n}))
= lim_{n∈ℕ} Σ_{k=1}^∞ P(A ∩ {ω ∈ Ω̃ | τ_n(ω) = k/2^n})E(φ(w_{k/2^n+s_1} − w_{k/2^n}, . . . , w_{k/2^n+s_j} − w_{k/2^n}))
= lim_{n∈ℕ} Σ_{k=1}^∞ P(A ∩ {ω ∈ Ω̃ | τ_n(ω) = k/2^n})E(φ(w_{s_1}, . . . , w_{s_j}))
= P(A)E(φ(w_{s_1}, . . . , w_{s_j}))

where the first equality follows from the continuity of the sample paths of w and Lebesgue Dominated Convergence Theorem 11.128 (first for bounded continuous φ; the general case follows by a monotone class argument); the fourth equality follows from the facts that A ∩ {ω ∈ Ω̃ | τ_n(ω) = k/2^n} ∈ B_{k/2^n} (since A ∈ B̃_τ) and that B_{k/2^n} and σ((w_{k/2^n+s} − w_{k/2^n})_{s∈I}) are independent; and the fifth equality follows from the fact that (w_{k/2^n+s} − w_{k/2^n})_{s∈I} and (w_s)_{s∈I} are identically distributed. This completes the proof of the claim. ⊓⊔

By Claim 14.92.1 with A = Ω̃ and the definition of P̃, the finite-dimensional distributions of (y_t)_{t∈I} under P̃ coincide with those of (w_t)_{t∈I} under P. This establishes (d) of Definition 14.78, and hence y is an ℝ^m-valued standard Wiener process on I × Ω̃. This proves (ii).

(iii) ∀A ∈ B̃_τ, ∀j ∈ ℕ, ∀s_1, . . . , s_j ∈ (0, ∞) ⊂ ℝ, and ∀ bounded B_B(ℝ^{mj})-measurable φ, Claim 14.92.1 yields E(χ_{A,Ω̃}φ(y_{s_1}, . . . , y_{s_j})) = P(A)E(φ(w_{s_1}, . . . , w_{s_j})), so that the probabilities of events of B̃_τ and of σ((y_t)_{t∈I}) factorize. Hence, B̃_τ and σ((y_t)_{t∈I}) are independent. This completes the proof of the proposition. ⊓⊔

Lemma 14.93 Let I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of ℝ with partial ordering ≤, Ω := (Ω, B, P) be a probability measure space, I × Ω := ([0, ∞) × Ω, B̄, ν) be the σ-finite product measure space, w : I × Ω → ℝ be a standard Wiener process on I × Ω, and t, β ∈ (0, ∞) ⊂ ℝ. Then, 0 < P(E) ≤ 2 exp(−β²/(2t)), where E := {ω ∈ Ω | max_{0≤s≤t}|w(s, ω)| > β}.

Proof Let α := β/t, A_1 := {ω ∈ Ω | w(s, ω) > (α/2)s + β/2, for some s ∈ I}, and A_2 := {ω ∈ Ω | −w(s, ω) > (α/2)s + β/2, for some s ∈ I}. Since (α/2)s + β/2 ≤ (α/2)t + β/2 = β, ∀s ∈ [0, t], we have E ⊆ A_1 ∪ A_2. By Lemma 14.85 (applied to the standard Wiener processes w and −w), we have
P(A_1) = P(A_2) ≤ exp(−αβ/2) = exp(−β²/(2t)).
Define κ : Ω → [0, ∞] ⊂ ℝe by κ(ω) := inf{t ∈ I | |w(τ(ω) ∧ t, ω)| ≥ ((β + 1)/2)λ}, ∀ω ∈ Ω.


By Example 14.88, κ is a stopping time. Then, we have {ω ∈ Ω | w_{τ,M}(ω) > ((β + 1)/2)λ} ⊆ {ω ∈ Ω | κ(ω) < ∞} ⊆ {ω ∈ Ω | w_{τ,M}(ω) ≥ ((β + 1)/2)λ}. Clearly, we have w_{τ,M}(ω) = sup_{t∈I} W(t, ω), ∀ω ∈ Ω. By the continuity of W(·, ω) : I → [0, ∞) ⊂ ℝ, we can conclude that w_{τ,M} is B-measurable. Similarly, we may conclude that w_{t,M} is B_t-measurable, ∀t ∈ I. Thus, we have the following line of arguments:

P({ω ∈ Ω | w_{τ,M}(ω) > βλ, (τ(ω))^{1/2} ≤ δλ})
= P({ω ∈ Ω | κ(ω) < ∞, τ(ω) ≤ δ²λ², sup_{κ(ω)≤s≤τ(ω)} |w(s, ω)| > βλ})
≤ P({ω ∈ Ω | κ(ω) < ∞, sup_{0≤s≤δ²λ²} |w(κ(ω) + s, ω) − w(κ(ω), ω)| > ((β − 1)/2)λ})
= P({ω ∈ Ω | κ(ω) < ∞})P({ω ∈ Ω | sup_{0≤s≤δ²λ²} |w(s, ω)| > ((β − 1)/2)λ})
= P({ω ∈ Ω | κ(ω) < ∞})P({ω ∈ Ω | w_{1,M}(ω) > (β − 1)/(2δ)})
≤ P({ω ∈ Ω | w_{τ,M}(ω) ≥ ((β + 1)/2)λ})P({ω ∈ Ω | w_{1,M}(ω) > (β − 1)/(2δ)})
≤ P({ω ∈ Ω | w_{τ,M}(ω) > λ})P({ω ∈ Ω | w_{1,M}(ω) > (β − 1)/(2δ)})

where the first equality follows from the equality of the two sets on both sides and the fact that the set on the left-hand side is B-measurable, the first inequality follows from the fact that the set on its left-hand side is a subset of the set on its right-hand side, which is B-measurable by Proposition 14.92, the second equality follows from Proposition 14.92, in particular Claim 14.92.1, the third equality follows from Proposition 14.81, and the last two inequalities follow from the preceding discussion and the fact that w_{τ,M} is B-measurable. This completes the proof of the claim. ⊓⊔


Note that we have the following line of arguments, ∀β ∈ (1, ∞) ⊂ ℝ, ∀δ ∈ (0, ∞) ⊂ ℝ:

‖w_{τ,M}‖_p^p = β^p‖β^{−1}w_{τ,M}‖_p^p = β^p E((β^{−1}w_{τ,M})^p)
= β^p ∫_{(0,∞)} pz^{p−1}P({ω ∈ Ω | β^{−1}w_{τ,M}(ω) > z}) dμ_B(z)
= β^p ∫_{(0,∞)} pz^{p−1}P({ω ∈ Ω | w_{τ,M}(ω) > βz}) dμ_B(z)
= β^p ∫_{(0,∞)} pz^{p−1}(P({ω ∈ Ω | w_{τ,M}(ω) > βz, (τ(ω))^{1/2} ≤ δz}) + P({ω ∈ Ω | w_{τ,M}(ω) > βz, (τ(ω))^{1/2} > δz})) dμ_B(z)
≤ β^p ∫_{(0,∞)} pz^{p−1}(ε(β, δ)P({ω ∈ Ω | w_{τ,M}(ω) > z}) + P({ω ∈ Ω | δ^{−1}(τ(ω))^{1/2} > z})) dμ_B(z)
= β^p ε(β, δ)E(w_{τ,M}^p) + β^p E((δ^{−1}τ^{1/2})^p)
= β^p ε(β, δ)‖w_{τ,M}‖_p^p + β^p δ^{−p}‖τ^{1/2}‖_p^p

where the third equality follows from Proposition 14.21, the first inequality follows from Claim 14.94.1 with ε(β, δ) := P({ω ∈ Ω | w_{1,M}(ω) > (β − 1)/(2δ)}) ∈ (0, 1] ⊂ ℝ, and the sixth equality follows from Proposition 14.21.

By Lemma 14.93, the constant ε(β, δ) satisfies 0 < ε(β, δ) ≤ 2 exp(−(β − 1)²/(8δ²)). Then, we may fix β > 1 first and then choose δ = (β − 1)/√(8 ln(4β^p)) > 0 such that 0 < ε(β, δ) ≤ 1/(2β^p). Then, the preceding paragraph implies that ‖w_{τ,M}‖_p ≤ C_p‖τ^{1/2}‖_p with C_p := 2^{1/p}β/δ = 2^{1/p}β√(8 ln(4β^p))/(β − 1) ∈ (0, ∞) ⊂ ℝ. This completes the proof of the proposition. ⊓⊔
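The moment bound of Proposition 14.94 is far from tight, but it is easy to probe numerically. In the sketch below, the stopping time (first time |w| reaches 1, capped at T = 5), the grid, and the path count are assumptions of the illustration; the constant is the proof's C_p = 2^{1/p}β√(8 ln(4β^p))/(β − 1) evaluated at p = 2 and β = 2.

```python
import math
import random

random.seed(1)

def one_path(dt=0.01, cap=5.0):
    # Run a random-walk approximation of w until |w| >= 1 or the time cap;
    # return (sup of |w| up to tau, tau).
    w, t, sup_abs = 0.0, 0.0, 0.0
    sd = math.sqrt(dt)
    while t < cap:
        w += random.gauss(0.0, sd)
        t += dt
        sup_abs = max(sup_abs, abs(w))
        if abs(w) >= 1.0:
            break
    return sup_abs, t

n = 2000
data = [one_path() for _ in range(n)]
lhs = math.sqrt(sum(s * s for s, _ in data) / n)   # ||w_{tau,M}||_2
rhs = math.sqrt(sum(tau for _, tau in data) / n)   # ||tau^{1/2}||_2
beta = 2.0
C2 = 2 ** 0.5 * beta * math.sqrt(8.0 * math.log(4.0 * beta ** 2)) / (beta - 1.0)
print(lhs, rhs, C2)   # lhs stays comfortably below C2 * rhs
```

For this stopping time both norms are close to 1 (E(τ) = 1 for the first exit of the Wiener process from (−1, 1)), so the empirical ratio is near 1 while C_2 ≈ 13.3, illustrating the slack in the constant.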

14.12 Stochastic Integral

Definition 14.95 Let I := (([0, ∞), |·|), B̂, μ) be the σ-finite metric measure subspace of ℝ with partial ordering ≤, Ω := (Ω, B, (B_t)_{t∈I}, P) be a filtered probability measure space, I × Ω = ([0, ∞) × Ω, B̄, ν) be the σ-finite product measure space, w : I × Ω → ℝ be a standard Wiener process on I × Ω with (w_t)_{t∈I} adapted to (B_t)_{t∈I}, Y be a Banach space, and φ : r_{a,b} × Ω → Y be a Y-valued stochastic process that is adapted to (B_t)_{t∈I}, where a, b ∈ I with a ≤ b. Assume that φ_ω := φ(·, ω) : r_{a,b} → Y is of bounded variation, ∀ω ∈ Ω. (Use notations w_t and w_ω, etc., as in Proposition 12.28.) Define I : Ω → Y by


I(ω) := w(b, ω)φ(b, ω) − w(a, ω)φ(a, ω) − ∫_{r_{a,b}} w(s, ω) dφ_ω(s), ∀ω ∈ Ω, where the integral is the Lebesgue-Stieltjes integral as introduced in Definition 12.57. Note that, by Definition 14.76, w_ω : I → ℝ is continuous, ∀ω ∈ Ω. Then, by Proposition 7.126, Lebesgue Dominated Convergence Theorems 11.128 and 12.50, and Definition 12.57, ∫_{r_{a,b}} w(s, ω) dφ_ω(s) ∈ Y, ∀ω ∈ Ω. Thus, the integral I is well-defined. By Theorem A.10, we have ∫_{r_{a,b}} w(s, ω) dφ_ω(s) = ∫_a^b w(s, ω) dφ_ω(s), ∀ω ∈ Ω, which further implies that I(ω) = w(b, ω)φ(b, ω) − w(a, ω)φ(a, ω) − ∫_a^b w(s, ω) dφ_ω(s) = ∫_a^b φ(s, ω) dw_ω(s) ∈ Y, ∀ω ∈ Ω. Thus, the integral is simply the Riemann-Stieltjes integral carried out sample path-wise, when the integrand is of bounded variation sample path-wise. We will denote I by ∫_{r_{a,b}} φ_s dw_s or ∫_a^b φ_s dw_s,

a

which is called the Riemann-Stieltjes integral of φ with respect to the Wiener % process w on the interval ra,b . Based on Definition 14.95, we wish to expand our definition of integral to the more general class of L2 (ra,b × Ω, Y) integrands, where Y is a Hilbert space. First, we present a result that deduces the measurability of a function with domain of a product measure space. Proposition 14.96 Let Xi := (Xi , Bi , μi ) be a σ -finite measure space, i = 1, 2, ¯ μ) be the σ -finite product measure space, Y := X := X1 × X2 =: (X1 × X2 , B, ¯ (Y, O) be a topological space, f : X1 × X2 → Y be B-measurable, g : X1 × X2 → ¯ ¯ X1 be B1 -in-B-measurable, h : X1 × X2 → X2 be B2 -in-B-measurable, and l : X1 ×X2 → X1 ×X2 be defined by l(x1 , x2 ) = (g(x1 , x2 ), h(x1 , x2 )), ∀(x1 , x2 ) ∈ X . ¯ B-measurable, ¯ ¯ Then, we have l is B-inand the function f ◦ l : X → Y is Bmeasurable. ¯ B-measurable. ¯ Proof We will first show that l is B-inLet F := {D × E ⊆ X1 × X2 | D ∈ B1 , E ∈ B2 } be the set of measurable rectangles in X . Clearly, σ (F ) = B¯ and F is a π-system. ∀D × E ∈ F , linv (D × E) = {(x1, x2 ) ∈ X1 × X2 | h(x1 , x2 ) ∈ E, g(x1 , x2 ) ∈ D} = {(x1 , x2 ) ∈ X1 × X2 | (x1 , x2 ) ∈ ¯ hinv (E), (x1 , x2 ) ∈ ginv (D)} = ginv (D) ∩ hinv (E). By g being B1 -in-B-measurable, ¯ By h being B2 -in-B-measurable, ¯ ¯ we have ginv (D) ∈ B. we have hinv (E) ∈ B. ¯ By the arbitrariness of D × E and Thus, we conclude that linv (D × E) ∈ B. ¯ B-measurable. ¯ Proposition 14.23, we have l is B-in¯ By Proposition 14.24, we have f ◦ l is B-measurable. This completes the proof of the proposition. ' & We next present the following result on the important properties of Riemann– Stieltjes integral with respect to the Wiener process integrator that readily applies to deterministic integrands. 
Lemma 14.97 Let $\mathcal{I} := (([0,\infty), |\cdot|), \hat{\mathcal{B}}, \mu)$ be the σ-finite metric measure subspace of $\mathbb{R}$ with partial ordering $\le$, $\Omega := (\Omega, \mathcal{B}, (\mathcal{B}_t)_{t\in I}, P)$ be a filtered probability measure space, $\mathcal{I} \times \Omega = ([0,\infty)\times\Omega, \bar{\mathcal{B}}, \nu)$ be the σ-finite product measure space, $w : I\times\Omega \to \mathbb{R}$ be a standard Wiener process on $\mathcal{I}\times\Omega$ that is adapted to the filtration $(\mathcal{B}_t)_{t\in I}$, the σ-algebras $\mathcal{B}_t$ and $\sigma((w_{t+s}-w_t)_{s\in I})$ be independent, $\forall t \in I$, $\mathbb{Y}$ and $\mathbb{Z}$ be Banach spaces over $\mathbb{K}$, and $a, b \in I$ with $a \le b$. Then, the following statements hold:

(i) Let $\phi : [a,b]\times\Omega \to \mathbb{Y}$ be a $\mathbb{Y}$-valued stochastic process that is adapted to $(\mathcal{B}_t)_{t\in I}$, $\phi_\omega : [a,b] \to \mathbb{Y}$ be of bounded variation, $\forall\omega\in\Omega$, and $A \in \mathcal{B}(\mathbb{Y},\mathbb{Z})$. Then, $\big(\int_{[a,b]} A\phi_s\,dw_s\big)(\omega) = A\big(\big(\int_{[a,b]} \phi_s\,dw_s\big)(\omega)\big) \in \mathbb{Z}$, $\forall\omega\in\Omega$.

(ii) Let $\phi_i : [a,b]\times\Omega \to \mathbb{Y}$ be a $\mathbb{Y}$-valued stochastic process that is adapted to $(\mathcal{B}_t)_{t\in I}$ and $\phi_{i\,\omega} : [a,b] \to \mathbb{Y}$ be of bounded variation, $\forall\omega\in\Omega$, $i = 1,2$. Then, $\big(\int_{[a,b]} (\phi_1+\phi_2)_s\,dw_s\big)(\omega) = \big(\int_{[a,b]} \phi_{1\,s}\,dw_s\big)(\omega) + \big(\int_{[a,b]} \phi_{2\,s}\,dw_s\big)(\omega) \in \mathbb{Y}$, $\forall\omega\in\Omega$.

(iii) Let $\phi : [a,b]\times\Omega \to \mathbb{Y}$ be a $\mathbb{Y}$-valued stochastic process that is adapted to $(\mathcal{B}_t)_{t\in I}$, and $c \in [a,b]$. Then, $\forall\omega\in\Omega$, $\phi_\omega : [a,b] \to \mathbb{Y}$ is of bounded variation if, and only if, $\phi_\omega|_{[a,c]}$ and $\phi_\omega|_{[c,b]}$ are of bounded variation. In this case, we have $\big(\int_{[a,c]} \phi_s\,dw_s\big)(\omega) + \big(\int_{[c,b]} \phi_s\,dw_s\big)(\omega) = \big(\int_{[a,b]} \phi_s\,dw_s\big)(\omega) \in \mathbb{Y}$, $\forall\omega\in\Omega$.

(iv) Let $\mathbb{Y}$ be a separable Banach space, $\phi : [a,b]\times\Omega \to \mathbb{Y}$ be a $\mathbb{Y}$-valued stochastic process that is adapted to $(\mathcal{B}_t)_{t\in I}$, and $\phi_\omega : [a,b] \to \mathbb{Y}$ be of bounded variation, $\forall\omega\in\Omega$. Define $x : [a,b]\times\Omega \to \mathbb{Y}$ by $x(t,\omega) = \big(\int_a^t \phi_s\,dw_s\big)(\omega) \in \mathbb{Y}$, $\forall(t,\omega)\in[a,b]\times\Omega$. Then, $x$ is a $\mathbb{Y}$-valued stochastic process that is adapted to $(\mathcal{B}_t)_{t\in I}$ with continuous sample paths.

(v) Let $\mathbb{Y}$ be a separable Hilbert space, $\phi : [a,b]\times\Omega \to \mathbb{Y}$ be a $\mathbb{Y}$-valued stochastic process that is adapted to $(\mathcal{B}_t)_{t\in I}$, and $\phi_\omega : [a,b] \to \mathbb{Y}$ be of bounded variation, $\forall\omega\in\Omega$. Assume that there exists a random variable $c : \Omega \to [0,\infty) \subset \mathbb{R}$ such that $(\|\phi(a,\omega)\|_{\mathbb{Y}} + T_{\phi_\omega}([a,b]))^2 \le c(\omega)$, $\forall\omega\in\Omega$, and $E(c\,(1 + w_{a,b,M}^2)) < \infty$, where $w_{a,b,M} : \Omega \to [0,\infty)\subset\mathbb{R}$ is defined by $w_{a,b,M}(\omega) := \max_{a\le t\le b} |w(b,\omega)-w(t,\omega)|$. Then, we have
$$\Big(E\Big(\Big\|\int_{[a,b]}\phi_s\,dw_s\Big\|_{\mathbb{Y}}^2\Big)\Big)^{1/2} = \Big\|\int_{[a,b]}\phi_s\,dw_s\Big\|_{L_2(\Omega,\mathbb{Y})} = \|\phi\|_{L_2([a,b]\times\Omega,\mathbb{Y})} = \Big(\int_\Omega\int_{[a,b]}\|\phi(s,\omega)\|_{\mathbb{Y}}^2\,ds\,dP(\omega)\Big)^{1/2} \in [0,\infty)\subset\mathbb{R}.$$

When $\mathbb{Y}$ is a separable Hilbert space, define $\hat{V}([a,b]\times\Omega,\mathbb{Y}) := \{\phi : [a,b]\times\Omega \to \mathbb{Y} \mid \phi$ is a $\mathbb{Y}$-valued stochastic process that is adapted to $(\mathcal{B}_t)_{t\in I}$; $\phi_\omega : [a,b]\to\mathbb{Y}$ is of bounded variation, $\forall\omega\in\Omega$; there exists a random variable $c_\phi : \Omega\to[0,\infty)\subset\mathbb{R}$ such that $(\|\phi(a,\omega)\|_{\mathbb{Y}} + T_{\phi_\omega}([a,b]))^2 \le c_\phi(\omega)$, $\forall\omega\in\Omega$, and $E(c_\phi\,(1+w_{a,b,M}^2)) < \infty\}$. Then, $\hat V([a,b]\times\Omega,\mathbb{Y})$ is a subspace of $\bar L_2([a,b]\times\Omega,\mathbb{Y})$ and the Riemann–Stieltjes integral with Wiener process integrator is a pseudo-norm preserving homomorphism between $\hat V([a,b]\times\Omega,\mathbb{Y})$ and its image in $\bar L_2(\Omega,\mathbb{Y})$.

Proof (i) By Proposition 12.117, $A\phi_\omega : [a,b]\to\mathbb{Z}$ is also of bounded variation, $\forall\omega\in\Omega$. Then, $\big(\int_{[a,b]} A\phi_s\,dw_s\big)(\omega) = \int_a^b A\phi(s,\omega)\,dw_\omega(s) = A\big(\int_a^b \phi(s,\omega)\,dw_\omega(s)\big) = A\big(\big(\int_{[a,b]}\phi_s\,dw_s\big)(\omega)\big) \in \mathbb{Z}$, $\forall\omega\in\Omega$, where the first equality follows from Definition 14.95, the second equality follows from Definition A.4, and the last equality follows from Definition 14.95.

(ii) By Propositions 12.118 and 12.117 and the fact that vector addition is Lipschitz on $\mathbb{Y}\times\mathbb{Y}$, $(\phi_1+\phi_2)_\omega : [a,b]\to\mathbb{Y}$ is of bounded variation, $\forall\omega\in\Omega$. Then, $\big(\int_{[a,b]}(\phi_1+\phi_2)_s\,dw_s\big)(\omega) = \int_a^b(\phi_1(s,\omega)+\phi_2(s,\omega))\,dw_\omega(s) = \int_a^b\phi_1(s,\omega)\,dw_\omega(s) + \int_a^b\phi_2(s,\omega)\,dw_\omega(s) = \big(\int_{[a,b]}\phi_{1\,s}\,dw_s\big)(\omega) + \big(\int_{[a,b]}\phi_{2\,s}\,dw_s\big)(\omega) \in \mathbb{Y}$, $\forall\omega\in\Omega$, where the first equality follows from Definition 14.95, the second equality follows from Theorem A.6, and the last equality follows from Definition 14.95.

(iii) Let $\phi_\omega : [a,b]\to\mathbb{Y}$ be of bounded variation, $\forall\omega\in\Omega$. By Definition 12.41, $\phi_\omega|_{[a,c]}$ and $\phi_\omega|_{[c,b]}$ are of bounded variation, $\forall\omega\in\Omega$. In this case, $\big(\int_{[a,c]}\phi_s\,dw_s\big)(\omega) + \big(\int_{[c,b]}\phi_s\,dw_s\big)(\omega) = \int_{[a,c]}\phi(s,\omega)\,dw_\omega(s) + \int_{[c,b]}\phi(s,\omega)\,dw_\omega(s) = \int_{[a,b]}\phi(s,\omega)\,dw_\omega(s) = \big(\int_{[a,b]}\phi_s\,dw_s\big)(\omega) \in \mathbb{Y}$, $\forall\omega\in\Omega$, where the first equality follows from Definition 14.95, the second equality follows from Theorem A.7, and the last equality follows from Definition 14.95. On the other hand, let $\phi_\omega|_{[a,c]}$ and $\phi_\omega|_{[c,b]}$ be of bounded variation, $\forall\omega\in\Omega$. Fix any $\omega\in\Omega$. By Theorem 12.50, there exist unique finite measures $\mu_1^\omega$ and $\mu_2^\omega$ on the measurable spaces $([a,c], \mathcal{B}_B([a,c]))$ and $([c,b], \mathcal{B}_B([c,b]))$, respectively, such that $\phi_\omega|_{[a,c]}$ is a cumulative distribution function of $\mu_1^\omega$ and $\phi_\omega|_{[c,b]}$ is a cumulative distribution function of $\mu_2^\omega$. By the generation process of Proposition 11.118, there exists a unique finite measure $\mu^\omega$ on the measurable space $([a,b], \mathcal{B}_B([a,b]))$ such that $([a,c], \mathcal{B}_B([a,c]), \mu_1^\omega)$ and $([c,b], \mathcal{B}_B([c,b]), \mu_2^\omega)$ are its measure subspaces. By Definition 12.42, $\phi_\omega$ is a cumulative distribution function for $\mu^\omega$. If $[a,b] = \emptyset$, then $\phi_\omega$ is of bounded variation. If $[a,b] \ne \emptyset$, let $x_0 \in [a,b]$. Then, $\bar\phi_\omega := \phi_\omega - \phi(x_0,\omega) : [a,b]\to\mathbb{Y}$ is a cumulative distribution function of $\mu^\omega$, and it is the cumulative distribution function of $\mu^\omega$ with origin at $x_0$. By Proposition 12.52, $\bar\phi_\omega$ is of locally bounded variation. Then, it is of bounded variation, since $[a,b]$ is a compact interval and $\mu^\omega$ is finite. Then, $\phi_\omega$ is of bounded variation by Definition 12.41, since $T_{\phi_\omega}([x_1,x_2]) = T_{\bar\phi_\omega}([x_1,x_2]) = P\circ\mu^\omega([x_1,x_2]) \le P\circ\mu^\omega([a,b]) < \infty$, $\forall x_1,x_2 \in [a,b]$ with $x_1 \le x_2$. This completes the proof of (iii).

(iv) Note that $x(t,\omega) := \big(\int_{[a,t]}\phi_s\,dw_s\big)(\omega) = \int_a^t\phi(s,\omega)\,dw_\omega(s) = \lim_{n\in\mathbb{N}}\sum_{i=1}^n\phi(s_{i-1}^n(t),\omega)\big(w(s_i^n(t),\omega)-w(s_{i-1}^n(t),\omega)\big) =: \lim_{n\in\mathbb{N}} Y_n(t,\omega)$, $\forall(t,\omega)\in[a,b]\times\Omega$, where the first equality follows from Definition 14.95 and the second equality follows from Integrability Theorem A.10 with $s_i^n(t) = a + i\frac{t-a}{n}$, $i = 0,\dots,n$. By Propositions 14.96, 11.38, and 11.39, $Y_n$ is $\bar{\mathcal{B}}$-measurable, $\forall n\in\mathbb{N}$. By Proposition 11.48, $x : [a,b]\times\Omega\to\mathbb{Y}$ is $\bar{\mathcal{B}}$-measurable. Thus, $x$ is a $\mathbb{Y}$-valued stochastic process. Again by Propositions 11.38, 11.39, and 11.48, $x_t : \Omega\to\mathbb{Y}$ is $\mathcal{B}_t$-measurable, $\forall t\in[a,b]$. Hence, $x$ is adapted to $(\mathcal{B}_t)_{t\in I}$. By Theorem A.12, $x_\omega : [a,b]\to\mathbb{Y}$ is continuous, $\forall\omega\in\Omega$.


(v) By (iv), $\int_{[a,b]}\phi_s\,dw_s$ is $\mathcal{B}_b$-measurable and therefore a $\mathbb{Y}$-valued random variable. Then, we have the following line of argument:

$$\Big\|\int_{[a,b]}\phi_s\,dw_s\Big\|^2_{L_2(\Omega,\mathbb{Y})} = E\Big(\Big\|\int_{[a,b]}\phi_s\,dw_s\Big\|^2_{\mathbb{Y}}\Big) = \int_\Omega\Big\|\int_a^b\phi(s,\omega)\,dw_\omega(s)\Big\|^2_{\mathbb{Y}}\,dP(\omega) = \int_\Omega \lim_{n\in\mathbb{N}}\Big\|\sum_{i=1}^n\phi(s_{i-1}^n,\omega)\big(w(s_i^n,\omega)-w(s_{i-1}^n,\omega)\big)\Big\|^2_{\mathbb{Y}}\,dP(\omega)$$

where the first equality follows from Definition 14.1, the second equality follows from Definition 14.95, and the third equality follows from Integrability Theorem A.10 with $s_i^n = a + i\frac{b-a}{n}$, $i = 0,\dots,n$. Fix any $n\in\mathbb{N}$. $\forall\omega\in\Omega$, we have, by summation by parts,

$$\Big\|\sum_{i=1}^n\phi(s_{i-1}^n,\omega)\big(w(s_i^n,\omega)-w(s_{i-1}^n,\omega)\big)\Big\|^2_{\mathbb{Y}} = \Big\|\sum_{i=1}^n\phi(s_{i-1}^n,\omega)\big({-}(w(b,\omega)-w(s_i^n,\omega)) + (w(b,\omega)-w(s_{i-1}^n,\omega))\big)\Big\|^2_{\mathbb{Y}}$$
$$= \Big\|\phi(a,\omega)(w(b,\omega)-w(a,\omega)) + \sum_{i=1}^n(w(b,\omega)-w(s_i^n,\omega))\big(\phi(s_i^n,\omega)-\phi(s_{i-1}^n,\omega)\big)\Big\|^2_{\mathbb{Y}}$$
$$\le \Big(w_{a,b,M}(\omega)\|\phi(a,\omega)\|_{\mathbb{Y}} + w_{a,b,M}(\omega)\sum_{i=1}^n\big\|\phi(s_i^n,\omega)-\phi(s_{i-1}^n,\omega)\big\|_{\mathbb{Y}}\Big)^2 \le (w_{a,b,M}(\omega))^2\big(\|\phi(a,\omega)\|_{\mathbb{Y}} + T_{\phi_\omega}([a,b])\big)^2 \le c(\omega)\,(w_{a,b,M}(\omega))^2$$

where the first two equalities follow from simple algebra, the first inequality follows from Definition 7.1 and the definition of $w_{a,b,M} : \Omega\to[0,\infty)\subset\mathbb{R}$, the second inequality follows from Definition 12.41, and the last inequality follows from the definition of $c : \Omega\to[0,\infty)\subset\mathbb{R}$.


By the assumption, $E(c\,w_{a,b,M}^2) < \infty$. Now, by Lebesgue Dominated Convergence Theorem 11.91, we have

$$\Big\|\int_{[a,b]}\phi_s\,dw_s\Big\|^2_{L_2(\Omega,\mathbb{Y})} = \int_\Omega\lim_{n\in\mathbb{N}}\Big\|\sum_{i=1}^n\phi(s_{i-1}^n,\omega)\big(w(s_i^n,\omega)-w(s_{i-1}^n,\omega)\big)\Big\|^2_{\mathbb{Y}}\,dP(\omega) = \lim_{n\in\mathbb{N}} E\Big(\Big\|\sum_{i=1}^n\phi_{s_{i-1}^n}\big(w_{s_i^n}-w_{s_{i-1}^n}\big)\Big\|^2_{\mathbb{Y}}\Big)$$
$$= \lim_{n\in\mathbb{N}}\sum_{i=1}^n E\big(P_2\circ\phi_{s_{i-1}^n}\big)\,E\big((w_{s_i^n}-w_{s_{i-1}^n})^2\big) = \lim_{n\in\mathbb{N}}\sum_{i=1}^n\int_\Omega\|\phi(s_{i-1}^n,\omega)\|^2_{\mathbb{Y}}\,dP(\omega)\,(s_i^n-s_{i-1}^n) = \lim_{n\in\mathbb{N}}\int_\Omega\sum_{i=1}^n\|\phi(s_{i-1}^n,\omega)\|^2_{\mathbb{Y}}\,(s_i^n-s_{i-1}^n)\,dP(\omega)$$

where the set membership and the second equality follow from Lebesgue Dominated Convergence Theorem 11.91, the third equality follows from Definition 14.76, the independence assumption, and Definition 13.1, and the last equality follows from Proposition 11.92. $\forall\omega\in\Omega$, $\phi_\omega$ is of bounded variation, and the function $g : [a,b]\to\mathbb{R}$, defined by $g(t) = t$, $\forall t\in[a,b]$, is continuous, strictly increasing, and of bounded variation. By Proposition 12.119, $P_2\circ\phi_\omega$ is Riemann–Stieltjes integrable with respect to $g$ on $[a,b]$, $\int_a^b P_2\circ\phi_\omega\,dg = \int_{[a,b]} P_2\circ\phi_\omega\,dg = \int_a^b P_2\circ\phi_\omega\,dt = \|\phi_\omega\|^2_{\bar L_2([a,b],\mathbb{Y})}$, and the Riemann–Stieltjes integral $\int_a^b P_2\circ\phi_\omega\,dg$ converges in gauge mode. Thus, $\lim_{n\in\mathbb{N}}\sum_{i=1}^n\|\phi(s_{i-1}^n,\omega)\|^2_{\mathbb{Y}}(s_i^n-s_{i-1}^n) = \|\phi_\omega\|^2_{\bar L_2([a,b],\mathbb{Y})}$, $\forall\omega\in\Omega$. Fix any $n\in\mathbb{N}$. Then $0 \le \sum_{i=1}^n\|\phi(s_{i-1}^n,\omega)\|^2_{\mathbb{Y}}(s_i^n-s_{i-1}^n) \le \max_{a\le t\le b}\|\phi_\omega(t)\|^2_{\mathbb{Y}}\,(b-a) \le (b-a)\big(\|\phi(a,\omega)\|_{\mathbb{Y}} + T_{\phi_\omega}([a,b])\big)^2 \le (b-a)\,c(\omega)$, where the second inequality follows from simple algebra, the third inequality follows from Definition 12.41, and the last inequality follows from the definition of $\hat V([a,b]\times\Omega,\mathbb{Y})$. By the assumption, $E(c) < \infty$, and by Lebesgue Dominated Convergence Theorem 11.91, we have

$$\Big\|\int_{[a,b]}\phi_s\,dw_s\Big\|^2_{L_2(\Omega,\mathbb{Y})} = \lim_{n\in\mathbb{N}}\int_\Omega\sum_{i=1}^n\|\phi(s_{i-1}^n,\omega)\|^2_{\mathbb{Y}}(s_i^n-s_{i-1}^n)\,dP(\omega) = \int_\Omega\lim_{n\in\mathbb{N}}\sum_{i=1}^n\|\phi(s_{i-1}^n,\omega)\|^2_{\mathbb{Y}}(s_i^n-s_{i-1}^n)\,dP(\omega) = \int_\Omega\|\phi_\omega\|^2_{\bar L_2([a,b],\mathbb{Y})}\,dP(\omega) = \|\phi\|^2_{\bar L_2([a,b]\times\Omega,\mathbb{Y})}$$

where the second equality follows from Lebesgue Dominated Convergence Theorem 11.91, the third equality follows from the preceding discussion, and the fourth equality follows from Tonelli's Theorem 12.29. This completes the proof of (v).

Let $\mathbb{Y}$ be a separable Hilbert space. $\forall\phi\in\hat V([a,b]\times\Omega,\mathbb{Y})$, by (v), we have $\phi\in\bar L_2([a,b]\times\Omega,\mathbb{Y})$. Then, $\hat V([a,b]\times\Omega,\mathbb{Y}) \subseteq \bar L_2([a,b]\times\Omega,\mathbb{Y})$. $\forall\phi\in\hat V([a,b]\times\Omega,\mathbb{Y})$, $\forall\alpha\in\mathbb{K}$, let $\bar\phi := \alpha\phi$. Then, $T_{\bar\phi_\omega}([a,b]) \le |\alpha|\,T_{\phi_\omega}([a,b])$, $\forall\omega\in\Omega$, by Proposition 12.117. Letting $c_\phi : \Omega\to[0,\infty)\subset\mathbb{R}$ be the random variable for $\phi$, we may set $c_{\bar\phi} := |\alpha|^2 c_\phi$. Thus, $\bar\phi\in\hat V([a,b]\times\Omega,\mathbb{Y})$. $\forall\phi_i\in\hat V([a,b]\times\Omega,\mathbb{Y})$, $i = 1,2$, let $\bar\phi := \phi_1+\phi_2$. Then, by Propositions 12.117 and 12.118, $T_{\bar\phi_\omega}([a,b]) \le T_{\phi_{1\,\omega}}([a,b]) + T_{\phi_{2\,\omega}}([a,b])$, $\forall\omega\in\Omega$. Letting $c_{\phi,i}$ be the random variable for $\phi_i$, $i = 1,2$, we may set $c_{\bar\phi} := 2(c_{\phi,1}+c_{\phi,2})$. Thus, $\bar\phi\in\hat V([a,b]\times\Omega,\mathbb{Y})$. This shows that $\hat V([a,b]\times\Omega,\mathbb{Y})$ is a subspace of $\bar L_2([a,b]\times\Omega,\mathbb{Y})$. By (i) and (ii), the Riemann–Stieltjes integral with Wiener process integrator is a homomorphism between $\hat V([a,b]\times\Omega,\mathbb{Y})$ and its image in $\bar L_2(\Omega,\mathbb{Y})$. By (v), it is a pseudo-norm preserving homomorphism between $\hat V([a,b]\times\Omega,\mathbb{Y})$ and its image in $\bar L_2(\Omega,\mathbb{Y})$. This completes the proof of the lemma. □

When $\mathbb{Y}$ is a separable Hilbert space, by Lemma 14.97, if $\phi\in\bar L_2([a,b],\mathbb{Y})$ is deterministic and of bounded variation, then $c_\phi$ can be chosen as a constant (independent of $\omega$). Then, by Proposition 14.94, $E(c_\phi(1+w_{a,b,M}^2)) = c_\phi(1+E(w_{a,b,M}^2)) \le c_\phi(1+4C_2 b) < \infty$, where $C_2\in(0,\infty)\subset\mathbb{R}$ is the constant defined in Proposition 14.94. Thus, $\phi\in\hat V([a,b]\times\Omega,\mathbb{Y})$, i.e., $\hat V([a,b]\times\Omega,\mathbb{Y})$ contains all the $\mathbb{Y}$-valued deterministic square-integrable functions on $[a,b]$ that are of bounded variation. Note that $c_\phi$ is an upper bound of $\|\phi\|^2_{BV}$ as defined in (h) of Problem 29.β of Bartle (1976).
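The isometry in (v), specialized to a deterministic bounded-variation integrand as in the remark above, can be checked by Monte Carlo simulation. This is only an illustrative sketch: the left-endpoint sums below stand in for the sample-path-wise Riemann–Stieltjes integrals, and `phi(t) = t` on `[0, 1]` is an assumed example integrand, for which $E\|\int_0^1 \phi_s\,dw_s\|^2 = \int_0^1 s^2\,ds = 1/3$:

```python
import numpy as np

rng = np.random.default_rng(42)

# phi(t) = t on [0, 1]: deterministic, square integrable, of bounded variation.
a, b, n, paths = 0.0, 1.0, 200, 40000
t = np.linspace(a, b, n + 1)
phi = t[:-1]  # integrand sampled at left endpoints

# Independent Wiener increments for many sample paths.
dw = rng.normal(0.0, np.sqrt((b - a) / n), size=(paths, n))

# One Riemann-Stieltjes sum per path: sum_i phi(s_{i-1}) (w(s_i) - w(s_{i-1}))
integrals = dw @ phi

mc = np.mean(integrals ** 2)  # Monte Carlo estimate of E ||integral||^2
exact = 1.0 / 3.0             # ||phi||^2 in L2([0,1] x Omega)
print(mc, exact)
```

With this many paths the relative Monte Carlo error is well under one percent, so the estimate lands close to $1/3$, as the pseudo-norm preservation predicts.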
Proposition 14.98 Let $\mathcal{I} := (([0,\infty),|\cdot|), \hat{\mathcal{B}}, \mu)$ be the σ-finite metric measure subspace of $\mathbb{R}$ with partial ordering $\le$, $\Omega := (\Omega, \mathcal{B}, (\mathcal{B}_t)_{t\in I}, P)$ be a filtered probability measure space, $\mathcal{I}\times\Omega = ([0,\infty)\times\Omega, \bar{\mathcal{B}}, \nu)$ be the σ-finite product measure space, $w : I\times\Omega\to\mathbb{R}$ be a standard Wiener process on $\mathcal{I}\times\Omega$ adapted to the filtration $(\mathcal{B}_t)_{t\in I}$, the σ-algebras $\mathcal{B}_t$ and $\sigma((w_{t+s}-w_t)_{s\in I})$ be independent, $\forall t\in I$, $\mathbb{Y}$ be a separable Hilbert space, and $\phi\in\bar L_2([a,b],\mathbb{Y})$, where $a,b\in I$ with $a\le b$. Then, there exists a unique $[I]\in L_2(\Omega,\mathbb{Y})$, where $I : \Omega\to\mathbb{Y}$ is a $\mathbb{Y}$-valued random variable that is $\mathcal{B}_b$-measurable, such that for every sequence $(\phi_n)_{n=1}^\infty$ with $\phi_n : [a,b]\to\mathbb{Y}$ of bounded variation and $\lim_{n\in\mathbb{N}}\phi_n = \phi$ in $\bar L_2([a,b],\mathbb{Y})$, we have $\lim_{n\in\mathbb{N}}\big[\int_{[a,b]}\phi_n(s)\,dw_s\big] = [I]$ in $L_2(\Omega,\mathbb{Y})$. We will denote $[I]$ by $\int_{[a,b]}\phi_s\,dw_s$ or $\int_a^b\phi_s\,dw_s$, which is called the Wiener integral of $\phi$ (with respect to the Wiener process $w$) on the interval $[a,b]$.

Proof By Proposition 12.120, $\forall n\in\mathbb{N}$, there exists an absolutely continuous function $g_n : [a,b]\to\mathbb{Y}$ such that $g_n\in\bar L_2([a,b],\mathbb{Y})$ and $\|g_n-\phi\|_{\bar L_2([a,b],\mathbb{Y})} < 2^{-n}$. By Proposition 12.73, $g_n$ is of locally bounded variation, which further implies that $g_n$ is of bounded variation by Definition 12.41 and the fact that the domain of $g_n$ is a compact interval, $\forall n\in\mathbb{N}$. Then, $\lim_{n\in\mathbb{N}}[g_n] = [\phi]$ in $L_2([a,b],\mathbb{Y})$, so the sequence $([g_n])_{n\in\mathbb{N}}\subseteq L_2([a,b],\mathbb{Y})$ is a Cauchy sequence. By Lemma 14.97, the sequence $\big(\big[\int_a^b g_{n\,s}\,dw_s\big]\big)_{n\in\mathbb{N}}\subseteq L_2(\Omega,\mathbb{Y})$ is a Cauchy sequence. By Example 11.179, $L_2(\Omega,\mathbb{Y})$ is a Hilbert space. Then, there exists a unique $[I]\in L_2(\Omega,\mathbb{Y})$ such that $\lim_{n\in\mathbb{N}}\big[\int_{[a,b]}g_n(s)\,dw_s\big] = [I]$ in $L_2(\Omega,\mathbb{Y})$. For any $(\phi_n)_{n=1}^\infty$ with $\phi_n : [a,b]\to\mathbb{Y}$ of bounded variation and $\lim_{n\in\mathbb{N}}\phi_n = \phi$ in $\bar L_2([a,b],\mathbb{Y})$, it is easy to see that the interleaved sequence $([\phi_1],[g_1],[\phi_2],[g_2],\dots)$ is a Cauchy sequence in $L_2([a,b],\mathbb{Y})$ that converges to $[\phi]$. By Lemma 14.97, the sequence
$$\Big(\Big[\int_a^b\phi_1(s)\,dw_s\Big], \Big[\int_a^b g_1(s)\,dw_s\Big], \Big[\int_a^b\phi_2(s)\,dw_s\Big], \Big[\int_a^b g_2(s)\,dw_s\Big], \dots\Big)$$
is a Cauchy sequence in $L_2(\Omega,\mathbb{Y})$. Then, $\lim_{n\in\mathbb{N}}\big[\int_{[a,b]}\phi_n(s)\,dw_s\big] = [I]$ in $L_2(\Omega,\mathbb{Y})$. By (iv) of Lemma 14.97, $\int_a^b\phi_{n\,s}\,dw_s$ is $\mathcal{B}_b$-measurable, $\forall n\in\mathbb{N}$. Then, by Example 11.179, we may find $I : \Omega\to\mathbb{Y}$, a $\mathbb{Y}$-valued random variable that is $\mathcal{B}_b$-measurable, such that $I\in[I]$. This completes the proof of the proposition. □

We next present the following result on the important properties of the Riemann–Stieltjes integral with Wiener process integrator on another subspace of $\bar L_2([a,b]\times\Omega,\mathbb{Y})$.

Lemma 14.99 Let $\mathcal{I} := (([0,\infty),|\cdot|), \hat{\mathcal{B}}, \mu)$ be the σ-finite metric measure subspace of $\mathbb{R}$ with partial ordering $\le$, $\Omega := (\Omega, \mathcal{B}, (\mathcal{B}_t)_{t\in I}, P)$ be a filtered probability measure space, $\mathcal{I}\times\Omega = ([0,\infty)\times\Omega, \bar{\mathcal{B}}, \nu)$ be the σ-finite product measure space, $w : I\times\Omega\to\mathbb{R}$ be a standard Wiener process on $\mathcal{I}\times\Omega$ that is adapted to the filtration $(\mathcal{B}_t)_{t\in I}$, the σ-algebras $\mathcal{B}_t$ and $\sigma((w_{t+s}-w_t)_{s\in I})$ be independent, $\forall t\in I$, $\mathbb{Y}$ be a separable Banach space over $\mathbb{K}$, $a,b\in I$ with $a\le b$,


and $\hat Q([a,b]\times\Omega,\mathbb{Y}) := \{\phi : [a,b]\times\Omega\to\mathbb{Y} \mid \exists n\in\mathbb{Z}_+$, $\exists a = t_0 \le t_1 \le t_2 \le \dots \le t_n = b$ such that $\phi(t,\omega) = \sum_{i=1}^n A_{i-1}(\omega)\chi_{[t_{i-1},t_i),[a,b]}(t)$, $\forall(t,\omega)\in[a,b]\times\Omega$, where $A_{i-1} : \Omega\to\mathbb{Y}$ is $\mathcal{B}_{t_{i-1}}$-measurable and $E(P_2\circ A_{i-1}) < \infty$, $i = 1,\dots,n\}$. The set $\hat Q([a,b]\times\Omega,\mathbb{Y})$ is the set of all $\mathbb{Y}$-valued simple predictable step functions on $[a,b]\times\Omega$. Then, the following statements hold:

(i) $\forall\phi\in\hat Q([a,b]\times\Omega,\mathbb{Y})$, we have $\|\phi\|^2_{\bar L_2([a,b]\times\Omega,\mathbb{Y})} = \sum_{i=1}^n E(P_2\circ A_{i-1})(t_i-t_{i-1}) < \infty$. Furthermore, $\hat Q([a,b]\times\Omega,\mathbb{Y})\subseteq\bar L_2([a,b]\times\Omega,\mathbb{Y})$ is a subspace.

(ii) $\forall\phi\in\hat Q([a,b]\times\Omega,\mathbb{Y})$, $\forall\omega\in\Omega$, we have that $\phi_\omega : [a,b]\to\mathbb{Y}$ is of bounded variation and $T_{\phi_\omega}([a,b]) \le \sum_{i=1}^n\|A_i(\omega)-A_{i-1}(\omega)\|_{\mathbb{Y}} < \infty$, where $A_n(\omega) := \vartheta_{\mathbb{Y}}$.

(iii) $\forall\phi := \sum_{i=1}^n A_{i-1}\chi_{[t_{i-1},t_i),[a,b]}\in\hat Q([a,b]\times\Omega,\mathbb{Y})$, we have $\big(\int_a^b\phi_s\,dw_s\big)(\omega) = \sum_{i=1}^n A_{i-1}(\omega)\big(w(t_i,\omega)-w(t_{i-1},\omega)\big)$, $\forall\omega\in\Omega$.

(iv) $\forall\phi\in\hat Q([a,b]\times\Omega,\mathbb{Y})$, define $x : [a,b]\times\Omega\to\mathbb{Y}$ by $x(t,\omega) := \big(\int_a^t\phi_s\,dw_s\big)(\omega)\in\mathbb{Y}$. Then, $x$ is a $\mathbb{Y}$-valued stochastic process that is adapted to $(\mathcal{B}_t)_{t\in I}$ with continuous sample paths.

(v) Let $\mathbb{Y}$ be a separable Hilbert space and $\phi\in\hat Q([a,b]\times\Omega,\mathbb{Y})$. Then, $\big(E\big(\big\|\int_{[a,b]}\phi_s\,dw_s\big\|^2_{\mathbb{Y}}\big)\big)^{1/2} = \big\|\int_{[a,b]}\phi_s\,dw_s\big\|_{\bar L_2(\Omega,\mathbb{Y})} = \|\phi\|_{\bar L_2([a,b]\times\Omega,\mathbb{Y})}\in[0,\infty)\subset\mathbb{R}$.

When $\mathbb{Y}$ is a separable Hilbert space, $\hat Q([a,b]\times\Omega,\mathbb{Y})$ is a subspace of $\bar L_2([a,b]\times\Omega,\mathbb{Y})$, and the Riemann–Stieltjes integral with Wiener process integrator is a pseudo-norm preserving homomorphism between $\hat Q([a,b]\times\Omega,\mathbb{Y})$ and its image in $\bar L_2(\Omega,\mathbb{Y})$.

Proof (i) $\forall\phi := \sum_{i=1}^n A_{i-1}\chi_{[t_{i-1},t_i),[a,b]}\in\hat Q([a,b]\times\Omega,\mathbb{Y})$, we have

$$\|\phi\|^2_{\bar L_2([a,b]\times\Omega,\mathbb{Y})} = \int_\Omega\int_a^b\|\phi(t,\omega)\|^2_{\mathbb{Y}}\,dt\,dP(\omega) = \int_\Omega\int_a^b\sum_{i=1}^n\|A_{i-1}(\omega)\|^2_{\mathbb{Y}}\chi_{[t_{i-1},t_i),[a,b]}(t)\,dt\,dP(\omega) = \int_\Omega\sum_{i=1}^n\|A_{i-1}(\omega)\|^2_{\mathbb{Y}}(t_i-t_{i-1})\,dP(\omega) = \sum_{i=1}^n E(P_2\circ A_{i-1})(t_i-t_{i-1}) < \infty$$

where the first equality follows from Tonelli's Theorem 12.29, the third equality follows from Proposition 11.75, the last equality follows from Proposition 11.92, and the inequality follows from the definition of $\hat Q([a,b]\times\Omega,\mathbb{Y})$. Hence, $\phi\in\bar L_2([a,b]\times\Omega,\mathbb{Y})$. By the arbitrariness of $\phi$, we have $\hat Q([a,b]\times\Omega,\mathbb{Y})\subseteq\bar L_2([a,b]\times\Omega,\mathbb{Y})$. It is easy to see that $\forall\phi_1,\phi_2\in\hat Q([a,b]\times\Omega,\mathbb{Y})$ and $\forall\alpha\in\mathbb{K}$, we have $\phi_1+\phi_2\in\hat Q([a,b]\times\Omega,\mathbb{Y})$ and $\alpha\phi_1\in\hat Q([a,b]\times\Omega,\mathbb{Y})$. Hence, $\hat Q([a,b]\times\Omega,\mathbb{Y})$ is a subspace of $\bar L_2([a,b]\times\Omega,\mathbb{Y})$.

(ii) $\forall\phi := \sum_{i=1}^n A_{i-1}\chi_{[t_{i-1},t_i),[a,b]}\in\hat Q([a,b]\times\Omega,\mathbb{Y})$, $\forall\omega\in\Omega$, it is easy to see that $\phi_\omega = \sum_{i=1}^n A_{i-1}(\omega)\chi_{[t_{i-1},t_i),[a,b]}$ is a simple function and a step function. Clearly, it is continuous on the right. By Definition 12.41, we may easily show that $\phi_\omega$ is of bounded variation and that $T_{\phi_\omega}([a,b]) \le \sum_{i=1}^n\|A_i(\omega)-A_{i-1}(\omega)\|_{\mathbb{Y}} < \infty$.

(iii) $\forall\phi := \sum_{i=1}^n A_{i-1}\chi_{[t_{i-1},t_i),[a,b]}\in\hat Q([a,b]\times\Omega,\mathbb{Y})$, $\forall\omega\in\Omega$, we have $\big(\int_a^b\phi_s\,dw_s\big)(\omega) = \int_a^b\phi(s,\omega)\,dw_\omega(s) = \sum_{i=1}^n A_{i-1}(\omega)\big(w(t_i,\omega)-w(t_{i-1},\omega)\big)$, where the first equality follows from Definition 14.95 and the last equality follows from Definition A.4.

(iv) This follows immediately from (iv) of Lemma 14.97.

(v) Let $\mathbb{Y}$ be a separable Hilbert space. Fix any $\phi := \sum_{i=1}^n A_{i-1}\chi_{[t_{i-1},t_i),[a,b]}\in\hat Q([a,b]\times\Omega,\mathbb{Y})$. Then, $\|\phi\|^2_{\bar L_2([a,b]\times\Omega,\mathbb{Y})} = \sum_{i=1}^n E(P_2\circ A_{i-1})(t_i-t_{i-1})\in[0,\infty)\subset\mathbb{R}$, by (i). We also have $\big(\int_a^b\phi_s\,dw_s\big)(\omega) = \sum_{i=1}^n A_{i-1}(\omega)\big(w(t_i,\omega)-w(t_{i-1},\omega)\big)$, $\forall\omega\in\Omega$, by (iii). Then, we have

$$\Big\|\int_{[a,b]}\phi_s\,dw_s\Big\|^2_{\bar L_2(\Omega,\mathbb{Y})} = \int_\Omega\Big\|\sum_{i=1}^n A_{i-1}(\omega)\big(w(t_i,\omega)-w(t_{i-1},\omega)\big)\Big\|^2_{\mathbb{Y}}\,dP(\omega) = \sum_{i=1}^n\sum_{j=1}^n\int_\Omega\big\langle A_{i-1}(\omega), A_{j-1}(\omega)\big\rangle\big(w(t_j,\omega)-w(t_{j-1},\omega)\big)\big(w(t_i,\omega)-w(t_{i-1},\omega)\big)\,dP(\omega)$$
$$= \sum_{i=1}^n\int_\Omega\big\langle A_{i-1}(\omega), A_{i-1}(\omega)\big\rangle\big(w(t_i,\omega)-w(t_{i-1},\omega)\big)^2\,dP(\omega) = \sum_{i=1}^n E(P_2\circ A_{i-1})\,E\big((w_{t_i}-w_{t_{i-1}})^2\big) = \sum_{i=1}^n E(P_2\circ A_{i-1})\,(t_i-t_{i-1}) = \|\phi\|^2_{\bar L_2([a,b]\times\Omega,\mathbb{Y})}$$

where the second equality follows from Proposition 13.2, the third equality follows from Definition 14.76 and the independence assumption of $\mathcal{B}_t$ and $\sigma((w_{t+s}-w_t)_{s\in I})$, $\forall t\in I$, the fourth equality follows from Proposition 14.9, the fifth equality follows from Definition 14.76, and the last equality follows from (i). This completes the proof of (v).

When $\mathbb{Y}$ is a separable Hilbert space, by (i), $\hat Q([a,b]\times\Omega,\mathbb{Y})$ is a subspace of $\bar L_2([a,b]\times\Omega,\mathbb{Y})$. By (iii), Definition 14.95, and Theorem A.6, the Riemann–Stieltjes integral with Wiener process integrator is a homomorphism between $\hat Q([a,b]\times\Omega,\mathbb{Y})$ and its image in $\bar L_2(\Omega,\mathbb{Y})$. By (v), it is a pseudo-norm preserving homomorphism between $\hat Q([a,b]\times\Omega,\mathbb{Y})$ and its image in $\bar L_2(\Omega,\mathbb{Y})$. This completes the proof of the lemma. □

Proposition 14.100 Let $\mathbb{Y}$ be a separable Banach space, $p\in[1,\infty)\subset\mathbb{R}$, and $f\in\bar L_p(\mathbb{R},\mathbb{Y})$. Then, the following statements hold:

(i) Let $f_\delta : \mathbb{R}\to\mathbb{Y}$ be defined by $f_\delta(t) = f(t-\delta)$, $\forall t\in\mathbb{R}$, $\forall\delta\in(0,\infty)\subset\mathbb{R}$. Then, $\lim_{\delta\to0^+}\|f-f_\delta\|_{\bar L_p(\mathbb{R},\mathbb{Y})} = 0$.

(ii) Let $f^n : \mathbb{R}\to\mathbb{Y}$ be defined by $f^n(t) = 2^n\int_{(k-2)2^{-n}}^{(k-1)2^{-n}} f(s)\,ds$, $\forall\,(k-1)2^{-n}\le t < k\,2^{-n}$, $\forall k\in\mathbb{Z}$, $\forall n\in\mathbb{N}$. Then, $\|f-f^n\|_{\bar L_p(\mathbb{R},\mathbb{Y})} \le \|f-f_{2^{-n}}\|_{\bar L_p(\mathbb{R},\mathbb{Y})} + 2^{1/p}\big(\int_0^1\|f-f_{2^{-n}\tau}\|^p_{\bar L_p(\mathbb{R},\mathbb{Y})}\,d\tau\big)^{1/p}$ and $\lim_{n\in\mathbb{N}}\|f-f^n\|_{\bar L_p(\mathbb{R},\mathbb{Y})} = 0$.

Proof (i) Fix any $\epsilon\in(0,\infty)\subset\mathbb{R}$. By Proposition 11.182, there exists a continuous function $g : \mathbb{R}\to\mathbb{Y}$ such that $\|g-f\|_{\bar L_p(\mathbb{R},\mathbb{Y})} < \epsilon/3$. Then, clearly, $\|g_\delta-f_\delta\|_{\bar L_p(\mathbb{R},\mathbb{Y})} < \epsilon/3$, where $g_\delta : \mathbb{R}\to\mathbb{Y}$ is defined by $g_\delta(t) = g(t-\delta)$, $\forall t\in\mathbb{R}$. Clearly, $\lim_{\delta\to0^+}\|g(t)-g_\delta(t)\|_{\mathbb{Y}} = 0$, $\forall t\in\mathbb{R}$. Fix any $(\delta_n)_{n=1}^\infty\subseteq\mathbb{R}_+$ with $\lim_{n\in\mathbb{N}}\delta_n = 0$. By Lebesgue Dominated Convergence Theorem 11.91, $\lim_{n\in\mathbb{N}}\|g-g_{\delta_n}\|_{\bar L_p(\mathbb{R},\mathbb{Y})} = 0$. By Proposition 4.16, $\lim_{\delta\to0^+}\|g-g_\delta\|_{\bar L_p(\mathbb{R},\mathbb{Y})} = 0$. Then, $\exists\delta_0\in(0,\infty)\subset\mathbb{R}$ such that $\forall\delta\in(0,\delta_0]\subset\mathbb{R}$, we have $\|g-g_\delta\|_{\bar L_p(\mathbb{R},\mathbb{Y})} < \epsilon/3$. Then, $\|f-f_\delta\|_{\bar L_p(\mathbb{R},\mathbb{Y})} \le \|f-g\|_{\bar L_p(\mathbb{R},\mathbb{Y})} + \|g-g_\delta\|_{\bar L_p(\mathbb{R},\mathbb{Y})} + \|g_\delta-f_\delta\|_{\bar L_p(\mathbb{R},\mathbb{Y})} < \epsilon$. By the arbitrariness of $\epsilon$, $\lim_{\delta\to0^+}\|f-f_\delta\|_{\bar L_p(\mathbb{R},\mathbb{Y})} = 0$.

(ii) Clearly, $f^n(t) = 2^n\int_{(k-1)2^{-n}}^{k2^{-n}} f_{2^{-n}}(s)\,ds$, $\forall\,(k-1)2^{-n}\le t < k2^{-n}$, $\forall k\in\mathbb{Z}$, $\forall n\in\mathbb{N}$, where $f_{2^{-n}} : \mathbb{R}\to\mathbb{Y}$ is as defined in (i). Then, $\|f-f^n\|_{\bar L_p(\mathbb{R},\mathbb{Y})} \le \|f-f_{2^{-n}}\|_{\bar L_p(\mathbb{R},\mathbb{Y})} + \|f_{2^{-n}}-f^n\|_{\bar L_p(\mathbb{R},\mathbb{Y})}$. Fix any $\epsilon\in(0,\infty)\subset\mathbb{R}$. By (i), $\exists\delta_0\in(0,\infty)\subset\mathbb{R}$ such that $\forall\delta\in(0,\delta_0]\subset\mathbb{R}$, we have $\|f-f_\delta\|_{\bar L_p(\mathbb{R},\mathbb{Y})} < \epsilon/2$. We have the following line of argument, writing $I_k := [(k-1)2^{-n}, k2^{-n})$ and taking $q\in(1,\infty]\subseteq\mathbb{R}_e$ with $1/p+1/q = 1$:

$$\|f_{2^{-n}}-f^n\|^p_{\bar L_p(\mathbb{R},\mathbb{Y})} = \sum_{k=-\infty}^{\infty}\int_{I_k}\Big\|f_{2^{-n}}(t) - 2^n\int_{I_k}f_{2^{-n}}(s)\,ds\Big\|^p_{\mathbb{Y}}\,dt = \sum_{k=-\infty}^{\infty}\int_{I_k}2^{pn}\Big\|\int_{I_k}\big(f_{2^{-n}}(t)-f_{2^{-n}}(s)\big)\,ds\Big\|^p_{\mathbb{Y}}\,dt$$
$$\le \sum_{k=-\infty}^{\infty}\int_{I_k}2^{pn}\Big(\int_{I_k}\big\|f_{2^{-n}}(t)-f_{2^{-n}}(s)\big\|_{\mathbb{Y}}\,ds\Big)^p\,dt \le 2^n\sum_{k=-\infty}^{\infty}\int_{I_k}\int_{I_k}\big\|f_{2^{-n}}(t)-f_{2^{-n}}(s)\big\|^p_{\mathbb{Y}}\,ds\,dt$$
$$= 2^n\sum_{k=-\infty}^{\infty}\int_{(k-2)2^{-n}}^{(k-1)2^{-n}}\int_{(k-2)2^{-n}}^{(k-1)2^{-n}}\big\|f(\bar t)-f(\bar s)\big\|^p_{\mathbb{Y}}\,d\bar s\,d\bar t = 2^{n+1}\sum_{k=-\infty}^{\infty}\int_{(k-2)2^{-n}}^{(k-1)2^{-n}}\int_{(k-2)2^{-n}}^{t}\big\|f(t)-f(s)\big\|^p_{\mathbb{Y}}\,ds\,dt$$
$$= 2^{n+1}\sum_{k=-\infty}^{\infty}\int_{(k-2)2^{-n}}^{(k-1)2^{-n}}\int_0^{t-(k-2)2^{-n}}\big\|f(t)-f_\tau(t)\big\|^p_{\mathbb{Y}}\,d\tau\,dt \le 2^{n+1}\sum_{k=-\infty}^{\infty}\int_{(k-2)2^{-n}}^{(k-1)2^{-n}}\int_0^{2^{-n}}\big\|f(t)-f_\tau(t)\big\|^p_{\mathbb{Y}}\,d\tau\,dt$$
$$= 2^{n+1}\int_0^{2^{-n}}\int_{-\infty}^{\infty}\big\|f(t)-f_\tau(t)\big\|^p_{\mathbb{Y}}\,dt\,d\tau = 2^{n+1}\int_0^{2^{-n}}\|f-f_\tau\|^p_{\bar L_p(\mathbb{R},\mathbb{Y})}\,d\tau = 2\int_0^1\|f-f_{2^{-n}\bar\tau}\|^p_{\bar L_p(\mathbb{R},\mathbb{Y})}\,d\bar\tau$$

where the first equality follows from Example 11.173, the second equality follows from Proposition 11.92, the first inequality follows from Proposition 11.92, the second inequality follows from Hölder's Inequality (Theorem 11.178), the third equality follows from the change of variables $\bar s := s-2^{-n}$ and $\bar t := t-2^{-n}$, the fourth equality follows from symmetry, the fifth equality follows from the change of variable $\tau := t-s$ and Proposition 11.83, the sixth equality follows from Tonelli's Theorem 12.29, and the last equality follows from Change of Variable Theorem 12.91 with $\bar\tau := 2^n\tau$. This establishes the bound asserted in the statement. Let $n_0\in\mathbb{N}$ be such that $2^{-n_0}\le\delta_0$. Then, $\forall n\in\mathbb{N}$ with $n_0\le n$, $\forall\tau\in(0,2^{-n}]\subset\mathbb{R}$, we have $\|f-f_\tau\|_{\bar L_p(\mathbb{R},\mathbb{Y})} < \epsilon/2$. Then, $\|f_{2^{-n}}-f^n\|^p_{\bar L_p(\mathbb{R},\mathbb{Y})} \le 2^{n+1}\int_0^{2^{-n}}\frac{\epsilon^p}{2^p}\,d\tau = 2^{1-p}\epsilon^p$ and $\|f_{2^{-n}}-f^n\|_{\bar L_p(\mathbb{R},\mathbb{Y})} \le 2^{-1/q}\epsilon$. This shows that $\|f-f^n\|_{\bar L_p(\mathbb{R},\mathbb{Y})} \le \epsilon/2 + 2^{-1/q}\epsilon < \frac{3\epsilon}{2}$. Hence, $\lim_{n\in\mathbb{N}}\|f-f^n\|_{\bar L_p(\mathbb{R},\mathbb{Y})} = 0$.

This completes the proof of the proposition. □
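The dyadic averaging of Proposition 14.100(ii) can be observed numerically. The sketch below is illustrative only (the triangular bump `f`, the grid, and the midpoint-rule cell averaging are assumptions for the demonstration): on each dyadic cell, $f^n$ takes the average of $f$ over the previous cell, and the $L_2$ error shrinks as the resolution $n$ grows.

```python
import numpy as np

# f: a continuous, compactly supported function on R (a triangular bump).
f = lambda t: np.maximum(0.0, 1.0 - np.abs(t - 1.0))

def fn(t, n):
    """Dyadic-average approximation of Proposition 14.100(ii):
    on [(k-1)/2^n, k/2^n), f^n equals the average of f over the
    PREVIOUS cell [(k-2)/2^n, (k-1)/2^n)."""
    k = np.floor(t * 2 ** n) + 1        # index of the cell containing t
    lo = (k - 2) / 2 ** n               # left endpoint of the previous cell
    m = 64                              # midpoint-rule subsamples per cell
    s = lo[:, None] + (np.arange(m)[None, :] + 0.5) / (m * 2 ** n)
    return f(s).mean(axis=1)

# L2 error on a fine grid covering the support (plus a margin).
grid = np.linspace(-1.0, 3.0, 2 ** 14, endpoint=False)
h = grid[1] - grid[0]
err = [np.sqrt(np.sum((f(grid) - fn(grid, n)) ** 2) * h) for n in (2, 4, 6)]
print(err)
```

The computed errors decrease monotonically in `n`, consistent with $\lim_n \|f - f^n\|_{\bar L_p} = 0$; since $f$ here is Lipschitz, the decay is roughly proportional to $2^{-n}$.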


Definition 14.101 Let $\mathcal{I} := (([0,\infty),|\cdot|), \hat{\mathcal{B}}, \mu)$ be the σ-finite metric measure subspace of $\mathbb{R}$ with partial ordering $\le$, $\Omega := (\Omega, \mathcal{B}, (\mathcal{B}_t)_{t\in I}, P)$ be a filtered probability measure space, $\mathcal{I}\times\Omega = ([0,\infty)\times\Omega, \bar{\mathcal{B}}, \nu)$ be the σ-finite product measure space, $\mathcal{Y}$ be a topological space, and $x : I\times\Omega\to Y$ be a $Y$-valued stochastic process. Then, $x$ is said to be progressively measurable if, $\forall t\in I$, the function $x|_{[0,t]\times\Omega}$ is $\tilde{\mathcal{B}}_t$-measurable, where $([0,t], \hat{\mathcal{B}}_t, \mu_t)$ is the finite metric measure subspace of $\mathcal{I}$ (and therefore of $\mathbb{R}$), and $([0,t]\times\Omega, \tilde{\mathcal{B}}_t, \tilde\nu)$ is the finite product measure space of $([0,t], \hat{\mathcal{B}}_t, \mu_t)$ and $(\Omega, \mathcal{B}_t, P|_{\mathcal{B}_t})$.

Proposition 14.102 Let $\mathcal{I} := (([0,\infty),|\cdot|), \hat{\mathcal{B}}, \mu)$ be the σ-finite metric measure subspace of $\mathbb{R}$ with partial ordering $\le$, $\Omega := (\Omega, \mathcal{B}, (\mathcal{B}_t)_{t\in I}, P)$ be a filtered probability measure space, $\mathcal{I}\times\Omega = ([0,\infty)\times\Omega, \bar{\mathcal{B}}, \nu)$ be the σ-finite product measure space, $\mathbb{Y}$ be a Banach space over $\mathbb{K}$, and $x : I\times\Omega\to\mathbb{Y}$ be adapted to $(\mathcal{B}_t)_{t\in I}$. Assume that $x_\omega : I\to\mathbb{Y}$ is continuous on the right, $\forall\omega\in\Omega$. Then, $x$ is a progressively measurable stochastic process.

Proof Fix any $t\in I$. Define a sequence $(X_n)_{n=1}^\infty$ with $X_n : [0,t]\times\Omega\to\mathbb{Y}$, $\forall n\in\mathbb{N}$, and $X_n(s,\omega) = x(0,\omega)\chi_{\{0\}}(s) + \sum_{k=1}^{2^n} x\big(\tfrac{k}{2^n}t,\omega\big)\chi_{(\frac{k-1}{2^n}t,\frac{k}{2^n}t],I}(s)$, $\forall(s,\omega)\in[0,t]\times\Omega$. Fix any $n\in\mathbb{N}$. By our assumption, $A_k := x_{\frac{k}{2^n}t} : \Omega\to\mathbb{Y}$ is $\mathcal{B}_{\frac{k}{2^n}t}$-measurable (since $x$ is adapted to $(\mathcal{B}_t)_{t\in I}$) and therefore $\mathcal{B}_t$-measurable (since $(\mathcal{B}_t)_{t\in I}$ is a filtration), $k = 0,\dots,2^n$. Then, $\forall O\in\mathcal{O}_{\mathbb{Y}}$, $X_{n\,\mathrm{inv}}(O) = \big(\{0\}\times A_{0\,\mathrm{inv}}(O)\big)\cup\big(\bigcup_{k=1}^{2^n}\big((\tfrac{k-1}{2^n}t,\tfrac{k}{2^n}t]\times A_{k\,\mathrm{inv}}(O)\big)\big)$. Clearly, each set in the union on the right-hand side is in $\tilde{\mathcal{B}}_t$, where $\tilde{\mathcal{B}}_t$ is as defined in Definition 14.101. Then, $X_{n\,\mathrm{inv}}(O)\in\tilde{\mathcal{B}}_t$, $\forall O\in\mathcal{O}_{\mathbb{Y}}$. By the arbitrariness of $O$, $X_n$ is $\tilde{\mathcal{B}}_t$-measurable. By the assumption that $x_\omega : I\to\mathbb{Y}$ is continuous on the right, $\forall\omega\in\Omega$, we have $\lim_{n\in\mathbb{N}} X_n(s,\omega) = x(s,\omega)$, $\forall(s,\omega)\in[0,t]\times\Omega$. By Proposition 11.48, $x|_{[0,t]\times\Omega}$ is $\tilde{\mathcal{B}}_t$-measurable. By the arbitrariness of $t$, $x$ is progressively measurable. Note that $x(s,\omega) = \lim_{n\in\mathbb{N}} x|_{[0,n]\times\Omega}(s,\omega)$, $\forall(s,\omega)\in I\times\Omega$; $x|_{[0,n]\times\Omega}$ is $\tilde{\mathcal{B}}_n$-measurable and therefore $\bar{\mathcal{B}}$-measurable. By Proposition 11.48, $x$ is $\bar{\mathcal{B}}$-measurable. Hence, $x$ is a $\mathbb{Y}$-valued stochastic process. This completes the proof of the proposition. □

Proposition 14.103 Let $\mathcal{I} := (([0,\infty),|\cdot|), \hat{\mathcal{B}}, \mu)$ be the σ-finite metric measure subspace of $\mathbb{R}$ with partial ordering $\le$, $\Omega := (\Omega, \mathcal{B}, (\mathcal{B}_t)_{t\in I}, P)$ be a filtered probability measure space, $\mathcal{I}\times\Omega = ([0,\infty)\times\Omega, \bar{\mathcal{B}}, \nu)$ be the σ-finite product measure space, $[a,b]\subset I$ with $a\le b$, $\mathbb{Y}$ be a separable Banach space, $W\subseteq\mathbb{Y}$ be a σ-compact conic segment, and $\phi : [a,b]\times\Omega\to W$ be a $W$-valued stochastic process adapted to the filtration $(\mathcal{B}_t)_{t\in I}$. Assume that

(i) $\phi\in\bar L_2([a,b]\times\Omega,\mathbb{Y})$ and $\phi_\omega : [a,b]\to W$ is continuous on the right, $\forall\omega\in\Omega$;
(ii) every $E\in\mathcal{B}$ with $P(E) = 0$ satisfies $E\in\mathcal{B}_0$.

Then, there exists a sequence of $\mathbb{Y}$-valued simple predictable step functions $(\phi_n)_{n=1}^\infty\subseteq\hat Q([a,b]\times\Omega,\mathbb{Y})$, where $\hat Q([a,b]\times\Omega,\mathbb{Y})$ is defined in Lemma 14.99, such that $\lim_{n\in\mathbb{N}}\phi_n = \phi$ in $\bar L_2([a,b]\times\Omega,\mathbb{Y})$.




φ(t, ω) a ≤ t < b , t < a or t ≥ b ϑY ∀(t, ω) ∈ I × Ω. Clearly, φ¯ ∈ L¯ 2 (I × Ω, Y) and φ¯ ω : I → W is continuous on the right, ∀ω ∈ Ω. Then, φ¯ is a W -valued stochastic process that is adapted to the filtration (Bt )t∈I . By Proposition 14.102, φ¯ is progressively measurable. Construct ∞ the sequence φ¯ n n=1 according to Proposition 14.100 sample path-wise. Fix any n ∈ N. Define φ¯ n : I × Ω → Y by ¯ ω) := Proof Let φ¯ : I × Ω → W be defined by φ(t,

φ¯ n (t, ω) ⎧ ⎨ ϑY (∀0 ≤ t < 21n ) or φ¯ ω ∈ / L¯ 2 (I, Y) k := n k k+1 2 ¯ ω) ds (∀ n ≤ t < n , ∀k ∈ N) and φ¯ ω ∈ L¯ 2 (I, Y) ⎩ 2n k−1 φ(s, 2 2

.

2n

=:

∞ .

An,k−1 (ω)χ[ k−1 n , 2

k=1

k ),I 2n

(t),

∀(t, ω) ∈ I × Ω

 6 Define φn : ra,b × Ω → Y by φn = φ¯ n r ×Ω . Then, we have limn∈N 6φ¯ ω − a,b 6 ˆ where Ωˆ := {ω ∈ Ω | φ¯ ω ∈ L¯ 2 (I, Y)}. Since (φ¯ n )ω 6L¯ (I,Y) = 0, ∀ω ∈ Ω, 2 ˆ = 0, φ¯ ∈ L¯ 2 (I × Ω, Y), by Tonelli’s Theorem 12.29, we have Ωˆ ∈ B and P (Ω \ Ω) 62 6 62 6 6 6 6 6 ˆ ¯ ¯ and φ L¯ (I×Ω,Y) = Ωˆ φω L¯ (I,Y) dP (ω). By (ii), we have Ω \ Ω ∈ B0 and 2 2  ¯ 2 (r0,b+1 × Ω, Y). Since ν(r0,b+1 × Ω) = b + 1 < ∞, Ωˆ ∈ B0 . Then, φ¯ r ∈ L 0,b+1 ×Ω  then by Hölder’s Inequality (Theorem 11.178), φ¯ r ∈ L¯ 1 (r0,b+1 × Ω, Y). 0,b+1 ×Ω  ˆ we have φ¯ω  ∈ L¯ 2 (r0,b+1 , Y). Again, by Hölder’s Inequality Fix any ω ∈ Ω, r0,b+1  (Theorem 11.178), we have φ¯ ω  ∈ L¯ 1 (r0,b+1 , Y). Then, by Fubini’s Theor0,b+1

rem 12.31, An,k−1 : Ω → Y is B-measurable, ∀n ∈ N, ∀k ∈ {1, . . . , 12n (b + 1)2}. Since φ¯ is progressively measurable, then, by Theorem 12.121, An,k−1 is B k−1 2n n measurable, ∀n ∈ N, ∀k ∈ {1, . . . , 12 (b + 1)2}. Note that, ∀k = 2, 3, . . ., 67 k−1 62 6 2n 6 6 ¯ ω) ds 6 .E(P2 ◦ An,k−1 ) = 22n 6 φ(s, 6 dP (ω) 6 k−2 6 Ω n 7

Y

2

7

7



2n Ω

k−1 2n k−2 2n

6 6 6φ(s, ¯ ω)62 ds dP (ω) Y

6 62 6  6 6 6 = 2n 6 φ¯ r 6 k−1 ×Ω 6 6 k−2 , 2n 2n ¯

= b (ii) a y dws = [y(wb − wa )] = a y dws ∈ L2 (Ω, Y). b b (iii) A( a φ1 s dws ) = a (Aφ1 )s dws ∈ L2 (Ω, Z). c b b (iv) Let c ∈ ra,b . Then, a φ1 s dws + c φ1 s dws = a φ1 s dws ∈ L2 (Ω, Y). 6 6 6 b 6 (v) 6 a φ1 s dws 6L (Ω,Y) = 6φ1 6L¯ (r ×Ω,Y) . 2 2 a,b    b b (vi) Re ra,b ×Ω φ1 (s, ω), φ2 (s, ω)Y dν(s, ω) = Re E( a φ1 s dws , a φ2 s  dws Y ) . b (vii) E( a φ1 s dws ) = ϑY . Proof (i) Fix any i ∈ {1, 2}. By Definition 14.104, there exists a sequence ˆ a,b × Ω, Y), of Y-valued simple predictable step functions (φi,n )n∈N ⊆ Q(r . ˆ a,b × Ω, Y) is defined in Lemma 14.99, such that limn∈N φi,n = where Q(r φi in L¯ 2 (ra,b × Ω, Y). For each n ∈ N, let Ii,n : Ω → Y be defined by   b Ii,n (ω) :=  (φi,n ) dws (ω) ∈ Y. Then, Ii,n is Bb -measurable, ∀n ∈ N, by a

s

Lemma 14.99. Then, there exists a unique [Ii ] ∈ L2 (Ω, Y), where Ii : Ω → Y is a Y-valued random variable that is further Bb -measurable, such that [Ii ] = b = > ra,b φi s dws = a φi s dws = limn∈N Ii,n in L2 (Ω, Y). By Lemma 14.99, we . ˆ a,b × Ω, Y), and clearly limn∈N (φ1,n + φ2,n ) = have (φ1,n + φ2,n )n∈N ⊆ Q(r (φ1 + φ2 ) in L¯ 2 (ra,b × Ω, Y). By Definition 14.104, φ1 + φ2 is Itô integrable b = b > on ra,b and we have a (φ1 + φ2 )s dws = limn∈N a (φ1,n + φ2,n )s dws = b = b > limn∈N a φ1,n s dws + a φ2,n s dws = limn∈N [I1,n + I2,n ] = limn∈N ([I1,n ] + [I2,n ]) = [I1 ] + [I2 ] in L2 (Ω, Y), where the second equality follows from Theorem A.6. This proves (i). (ii) This follows immediately from Lemma 14.99. (iii) Note that Aφ1 : ra,b × Ω → Z is a Z-valued stochastic process adapted to the filtration (Bt )t ∈I , by Propositions 11.38. Then, the sequence of Z-valued simple . ˆ a,b × Ω, Z) satisfies limn∈N Aφ1,n = predictable step functions (Aφ1,n )n∈N ⊆ Q(r Aφ1 in L¯ 2 (ra,b × Ω, Z). By Definition 14.104, Aφ1 is Itô integrable on ra,b and we

14.12 Stochastic Integral

859

= b > = b > dws = limn∈N a (Aφ1,n )s dws = limn∈N A(a φ1,n s dws ) = b limn∈N [AI1,n ] = A(limn∈N [I1,n ]) = A( a φ1 s · dws ) in L2 (Ω, Z), where the first equality follows from Definition 14.104, the second equality follows from Definition A.4, the third equality follows from the definition of I1,n , and the fourth equality follows from A ∈ B (Y, Z). This proves (iii). c = c > (iv) By Definition 14.104, we have a φ1 s dws = limn∈N a φ1,n s dws b = b > c b and c φ1 s dws = limn∈N c φ1,n s dws . Then, a φ1 s dws + c φ1 s dws = = c > = b > = c > b limn∈N a φ1,n s dws + c φ1,n s dws = limn∈N a φ1,n s dws +c φ1,n s dws = b = b > limn∈N  φ1,n dws = φ1 dws in L2 (Ω, Y), where the third equality follows

have

b

a (Aφ1 )s

s

a

s

a

from Theorem A.7. This proves (iv). 6 b 6 6 = b >6 (v) Note that 6 a φ1 s dws 6L (Ω,Y) = 6 limn∈N a φ1,n s dws 6L (Ω,Y) = 2 6 6 2 6 b 6 6 6 limn∈N 6a φ1,n s dws 6L¯ (Ω,Y) = limn∈N 6φ1,n 6L¯ (r ×Ω,Y) = 6φ1 6L¯ (r ×Ω,Y) , 2 2 a,b 2 a,b where the second equality follows from Proposition 7.21 and the third equality follows from Lemma 14.99. (vi) Note that 7 b " B7 b C # .Re E φ1 s dws , φ2 s dws Y a

= Re

"B7

a

= = = = = = =

a

7

b

φ1 s dws ,

b a

φ2 s dws

#

C L2 (Ω,Y)

" # = Re I1 , I2 L¯ 2 (Ω,Y)

 1 I1 , I2 L¯ 2 (Ω,Y) + I2 , I1 L¯ 2 (Ω,Y) 2  1 I1 + I2 , I1 + I2 L¯ 2 (Ω,Y) − I1 − I2 , I1 − I2 L¯ 2 (Ω,Y) 4  1 I1 + I2 2L¯ (Ω,Y) − I1 − I2 2L¯ (Ω,Y) 2 2 4  1 φ1 + φ2 L¯ 2 (ra,b ×Ω,Y) − φ1 − φ2 L¯ 2 (ra,b ×Ω,Y) 4  1 φ1 , φ2 L¯ 2 (ra,b ×Ω,Y) + φ2 , φ1 L¯ 2 (ra,b ×Ω,Y) 2   Re φ1 , φ2 L¯ 2 (ra,b ×Ω,Y) "7 # φ1 (s, ω), φ2 (s, ω)Y dν(s, ω) Re ra,b ×Ω

where the first equality follows from Example 13.11, the third and fourth equalities follow from Definition 13.1, the fifth equality follows from Proposition 13.2, the sixth equality follows from (v), the seventh equality follows from Definition 13.1, Proposition 13.2, and simple algebra, the eighth equality follows from Definition 13.1, and the ninth equality follows from Example 13.11.

(vii) Note that $\mathrm{E}\bigl(\int_a^b (\phi_1)_s\,dw_s\bigr) = \mathrm{E}(I_1) = \lim_{n\in\mathbb{N}} \mathrm{E}(I_{1,n}) = \vartheta_{\mathcal{Y}}$, where the first equality follows from (i), the second equality follows from Proposition 11.213, and the last equality follows from the fact that $I_{1,n}(\omega) = \bigl(\int_a^b (\phi_{1,n})_s\,dw_s\bigr)(\omega)$, $\forall n\in\mathbb{N}$, where $\phi_{1,n} \in \hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$, which implies that $\mathrm{E}(I_{1,n}) = \vartheta_{\mathcal{Y}}$, $\forall n\in\mathbb{N}$, by Lemma 14.99. This completes the proof of the proposition. □
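Statements (v) and (vii), the Itô isometry and the zero-mean property, admit a quick numerical sanity check in the simplest setting: $\mathcal{Y} = \mathbb{R}$ with a deterministic step integrand, a special case of a simple predictable step process. The sketch below is illustrative only; the partition, the step values, and the sample size are arbitrary choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Partition of [a, b] = [0, 1] and a deterministic step integrand
# phi(t) = c_k on [t_k, t_{k+1}).
t = np.linspace(0.0, 1.0, 11)
c = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0])

n_paths = 200_000
# Independent Wiener increments w_{t_{k+1}} - w_{t_k} ~ N(0, t_{k+1} - t_k)
dw = rng.normal(0.0, np.sqrt(np.diff(t)), size=(n_paths, 10))
ito = dw @ c                          # sum_k c_k (w_{t_{k+1}} - w_{t_k})

mean_hat = ito.mean()                 # statement (vii): should be near 0
var_hat = (ito ** 2).mean()           # statement (v): near the squared L2 norm
var_true = np.sum(c ** 2 * np.diff(t))

print(mean_hat, var_hat, var_true)
assert abs(mean_hat) < 0.05
assert abs(var_hat - var_true) < 0.2
```

The empirical mean should be close to zero and the empirical second moment close to $\sum_k c_k^2\,(t_{k+1}-t_k)$, the squared $\bar{L}_2$ norm of the integrand.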

The following result shows that, under the condition that the integrand satisfies the Riemann Criterion for Integrability and some mild growth assumptions, the Itô integral of an $\bar{L}_2$ function can be written as the limit of any sequence of Itô integrals of simple predictable step functions, whose values at the sample times are simply the values of the integrand at those sample times, as the maximum sample period converges to zero. This result allows for easy analysis of the Itô integral when the prescribed conditions are satisfied.

Proposition 14.107 Let $\mathcal{I} := (([0,\infty),|\cdot|),\hat{\mathcal{B}},\mu)$ be the σ-finite metric measure subspace of $\mathbb{R}$ with partial ordering $\le$, $\Omega := (\Omega,\mathcal{B},(\mathcal{B}_t)_{t\in\mathcal{I}},P)$ be a filtered probability measure space, $\mathcal{I}\times\Omega = ([0,\infty)\times\Omega,\bar{\mathcal{B}},\nu)$ be the σ-finite product measure space, $w : \mathcal{I}\times\Omega\to\mathbb{R}$ be a standard Wiener process on $\mathcal{I}\times\Omega$, which is adapted to the filtration $(\mathcal{B}_t)_{t\in\mathcal{I}}$, $r_{a,b}\subset\mathcal{I}$ with $a\le b$, $\mathcal{Y}$ be a separable Hilbert space over $\mathbb{K}$, and the σ-algebras $\mathcal{B}_t$ and $\sigma((w_{t+s}-w_t)_{s\in\mathcal{I}})$ be independent, $\forall t\in\mathcal{I}$. Define

$\mathcal{U}(r_{a,b}\times\Omega,\mathcal{Y}) := \{\phi\in\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y}) \mid \phi$ is a $\mathcal{Y}$-valued stochastic process that is adapted to the filtration $(\mathcal{B}_t)_{t\in\mathcal{I}}$, $\phi_\omega(s)$ is right continuous, and $\int_a^b \phi_\omega(s)\,ds$ satisfies the Riemann Criterion for Integrability, Theorem A.9, $\forall\omega\in\Omega$; there exists a random variable $c_\phi : \Omega\to\mathbb{R}_+$ such that $\|\phi(t,\omega)\|^2_{\mathcal{Y}} \le c_\phi(\omega)$, $\forall t\in r_{a,b}$, $\forall\omega\in\Omega$; and $\mathrm{E}(c_\phi) < \infty\}$.

Let $\phi\in\mathcal{U}(r_{a,b}\times\Omega,\mathcal{Y})$. Then, the following statements hold:

(i) $\mathcal{U}(r_{a,b}\times\Omega,\mathcal{Y})$ is a subspace of $\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})$.

(ii) Define $(\bar{\phi}_n)_{n=1}^\infty$ by $\bar{\phi}_n(t,\omega) := \sum_{i=1}^n \phi(s^n_{i-1},\omega)\,\chi_{[s^n_{i-1},s^n_i),r_{a,b}}(t)$, $\forall t\in r_{a,b}$, $\forall\omega\in\Omega$, where $a = s^n_0 < s^n_1 < \cdots < s^n_n = b$ and $\max_{i=1,\dots,n}(s^n_i - s^n_{i-1}) < \frac{b-a+1}{n}$. Then, we have $\lim_{n\in\mathbb{N}} \bar{\phi}_n = \phi$ in $\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})$ and $(\bar{\phi}_n)_{n=1}^\infty \subseteq \hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$.

(iii) $\phi$ is Itô-integrable on $r_{a,b}$,
$$\int_a^b \phi_s\,dw_s = \lim_{n\in\mathbb{N}} \int_a^b (\bar{\phi}_n)_s\,dw_s = \lim_{n\in\mathbb{N}} \sum_{i=1}^n \phi_{s^n_{i-1}}\bigl(w_{s^n_i} - w_{s^n_{i-1}}\bigr)$$
in $L_2(\Omega,\mathcal{Y})$, and $\|\phi\|_{\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})} = \bigl\|\int_a^b \phi_s\,dw_s\bigr\|_{\bar{L}_2(\Omega,\mathcal{Y})} = \lim_{n\in\mathbb{N}} \bigl\|\int_a^b (\bar{\phi}_n)_s\,dw_s\bigr\|_{\bar{L}_2(\Omega,\mathcal{Y})}$.
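Statement (iii) of Proposition 14.107 expresses the Itô integral as the $L_2$ limit of left-endpoint sampled sums. A small numerical sketch (illustrative only; it takes $\phi = w$ on $r_{0,1}$, whose sample paths are continuous, so the left-endpoint sums are the ones statement (iii) prescribes) compares these sums against the closed form $\int_0^1 w_s\,dw_s = \frac{1}{2}(w_1^2 - 1)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n = 10_000, 256

# Brownian paths on [0, 1] via cumulative sums of increments
dw = rng.normal(0.0, np.sqrt(1.0 / n), size=(n_paths, n))
w = np.cumsum(dw, axis=1)
w_left = np.hstack([np.zeros((n_paths, 1)), w[:, :-1]])  # values at left endpoints

# Left-endpoint sums  sum_i w_{s_{i-1}} (w_{s_i} - w_{s_{i-1}})
ito_sum = np.sum(w_left * dw, axis=1)
closed_form = 0.5 * (w[:, -1] ** 2 - 1.0)   # int_0^1 w dw = (w_1^2 - 1)/2

err = np.mean((ito_sum - closed_form) ** 2)  # mean-square error shrinks like 1/n
print(err)
assert err < 0.01
```

The mean-square discrepancy here is of order $1/(2n)$, consistent with convergence in $L_2(\Omega,\mathbb{R})$ as the maximum sample period tends to zero.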

Proof (i) This is straightforward.
(ii) Clearly, $(\bar{\phi}_n)_{n=1}^\infty \subseteq \hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$. By the Riemann Criterion for Integrability (Theorem A.14), $\forall\epsilon\in(0,\infty)\subset\mathbb{R}$, $\forall\omega\in\Omega$, $\exists\delta(\omega,\epsilon)\in(0,\infty)\subset\mathbb{R}$ such that $\forall\hat{R} := ((s_i)_{i=0}^n,(\xi_i)_{i=1}^n)\in\hat{I}(\mathcal{I})$ with $\mathrm{Gauge}(\hat{R}) < \delta(\omega,\epsilon)$, we have $\|\phi(v_1,\omega)-\phi(v_2,\omega)\|_{\mathcal{Y}}$, $M(\omega)(s_i-s_{i-1})$ …

$P(E_n) := P(\{\omega\in\Omega \mid \|x_n(\omega)\|_{\mathcal{X}} \ge K\}) < \frac{\epsilon}{2}$, $\forall n\in\mathbb{N}$. By the assumption that $\lim_{n\in\mathbb{N}} y_n = \vartheta_{\bar{L}_1(\Omega,\mathcal{W})}$ in measure in $\Omega$, we have $\exists N\in\mathbb{N}$, $\forall n\in\mathbb{N}$ with $N\le n$, $P(F_n) := P(\{\omega\in\Omega \mid \|y_n(\omega)\|_{\mathrm{B}(\mathcal{X},\mathcal{Y})} \ge \frac{\epsilon}{K+2}\}) < \frac{\epsilon}{K+2}$. Fix any $n\in\mathbb{N}$ with $N\le n$. Let $A_n := E_n\cup F_n\in\mathcal{B}$. Then, $P(A_n) \le P(E_n)+P(F_n) < \frac{\epsilon}{2}+\frac{\epsilon}{K+2} \le \epsilon$. $\forall\omega\in\Omega\setminus A_n$, we have $\|y_n(\omega)x_n(\omega)\|_{\mathcal{Y}} \le \|y_n(\omega)\|_{\mathrm{B}(\mathcal{X},\mathcal{Y})}\|x_n(\omega)\|_{\mathcal{X}} \le \frac{\epsilon}{K+2}\cdot K < \epsilon$. Hence, $P(\{\omega\in\Omega \mid \|y_n(\omega)x_n(\omega)\|_{\mathcal{Y}} \ge \epsilon\}) \le P(A_n) < \epsilon$. This proves that $\lim_{n\in\mathbb{N}} y_n x_n = \vartheta_{\bar{L}_1(\Omega,\mathcal{Y})}$ in measure in $\Omega$. This completes the proof of the proposition. □
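The product argument just concluded is easy to visualize in the scalar case: when $(x_n)$ is bounded in probability and $y_n \to 0$ in measure, the products $y_n x_n \to 0$ in measure. The following is a hedged illustration; the Gaussian choices are hypothetical and made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 100_000
x = rng.normal(size=m)   # x_n: standard normal, bounded in probability
z = rng.normal(size=m)

def frac_large(n, eps=0.1):
    """Empirical P(|y_n * x_n| >= eps) with y_n = z / n -> 0 in measure."""
    y_n = z / n
    return np.mean(np.abs(y_n * x) >= eps)

fracs = [frac_large(n) for n in (1, 10, 100)]
print(fracs)             # shrinks toward 0 as n grows
assert fracs[0] > fracs[1] > fracs[2]
assert fracs[2] < 0.01
```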

14.13 Itô Processes

Definition 14.112 Let $\Omega := (\Omega,\mathcal{B},P)$ be a probability measure space, $\mathcal{X}$ be a separable Banach space over $\mathbb{K}$, and $y : \Omega\to\mathcal{X}$ be an $\mathcal{X}$-valued random variable. Define $|||y||| := \mathrm{E}(1\wedge(P\circ y)) = \int_\Omega (1\wedge\|y(\omega)\|_{\mathcal{X}})\,dP(\omega) \in r_{0,1}$. %

Lemma 14.113 Let $\Omega := (\Omega,\mathcal{B},P)$ be a probability measure space, $\mathcal{X}$ be a separable Banach space over $\mathbb{K}$, $y : \Omega\to\mathcal{X}$ and $z : \Omega\to\mathcal{X}$ be two $\mathcal{X}$-valued random variables, and $(y_n)_{n=1}^\infty$ be a sequence of $\mathcal{X}$-valued random variables. Then, we have

(i) $|||y+z||| \le |||y||| + |||z|||$.
(ii) $\lim_{n\in\mathbb{N}} y_n = y$ in measure in $\Omega$ if, and only if, $\lim_{n\in\mathbb{N}} |||y-y_n||| = 0$.

Proof (i) Note that $|||y+z||| = \mathrm{E}(1\wedge(P\circ(y+z))) \le \mathrm{E}(1\wedge(P\circ y + P\circ z)) \le \mathrm{E}(1\wedge(P\circ y) + 1\wedge(P\circ z)) = \mathrm{E}(1\wedge(P\circ y)) + \mathrm{E}(1\wedge(P\circ z)) = |||y|||+|||z|||$, where the first equality follows from Definition 14.112, the first inequality follows from the triangular inequality for norms, the second inequality follows from simple algebra, the second equality follows from Proposition 11.92, and the last equality follows from Definition 14.112.

(ii) "Necessity" Let $\lim_{n\in\mathbb{N}} y_n = y$ in measure in $\Omega$. Then, $\forall\epsilon\in r^{\circ}_{0,1}$, $\exists n(\epsilon)\in\mathbb{N}$, such that $\forall n\in\mathbb{N}$ with $n(\epsilon)\le n$, we have $P(A_{n,\epsilon}) := P(\{\omega\in\Omega \mid \|y_n(\omega)-y(\omega)\|_{\mathcal{X}} > \frac{\epsilon}{2}\}) < \frac{\epsilon}{2}$. Then, $|||y_n-y||| = \mathrm{E}(1\wedge(P\circ(y_n-y))) = \int_\Omega (1\wedge\|y_n(\omega)-y(\omega)\|_{\mathcal{X}})\,dP(\omega) \le \int_{A_{n,\epsilon}} 1\,dP(\omega) + \int_{\Omega\setminus A_{n,\epsilon}} \frac{\epsilon}{2}\,dP(\omega) \le P(A_{n,\epsilon}) + \int_\Omega \frac{\epsilon}{2}\,dP(\omega) < \frac{\epsilon}{2}+\frac{\epsilon}{2} = \epsilon$, where the first equality follows from Definition 14.112, the second equality follows from Definition 14.1, the first inequality follows from the definition of $A_{n,\epsilon}\in\mathcal{B}$, the second inequality follows from Definition 14.1, and the last inequality follows from the property of $A_{n,\epsilon}$. By the arbitrariness of $\epsilon$, we have $\lim_{n\in\mathbb{N}} |||y_n-y||| = 0$.

"Sufficiency" Let $\lim_{n\in\mathbb{N}} |||y_n-y||| = 0$. $\forall\epsilon\in r^{\circ}_{0,1}$, $\exists n(\epsilon)\in\mathbb{N}$, such that $\forall n\in\mathbb{N}$ with $n(\epsilon)\le n$, we have $|||y_n-y||| < \epsilon^2$. Then, $P(\bar{A}_{n,\epsilon}) := P(\{\omega\in\Omega \mid \|y_n(\omega)-y(\omega)\|_{\mathcal{X}} > \epsilon\}) = P(\{\omega\in\Omega \mid 1\wedge\|y_n(\omega)-y(\omega)\|_{\mathcal{X}} > \epsilon\}) \le \frac{1}{\epsilon}\mathrm{E}(1\wedge(P\circ(y_n-y))) = \frac{1}{\epsilon}|||y_n-y||| < \epsilon$, where the first equality follows from simple algebra, the first inequality follows from Chebyshev's Inequality, Proposition 14.67, the second equality follows from Definition 14.112, and the last inequality follows from $n\ge n(\epsilon)$. By the arbitrariness of $\epsilon$, we have $\lim_{n\in\mathbb{N}} y_n = y$ in measure in $\Omega$. □

Definition 14.114 Let $\Omega := (\Omega,\mathcal{B},(\mathcal{B}_n)_{n=0}^\infty,P)$ be a filtered probability measure space, $\mathcal{Y}$ be a separable Hilbert space over $\mathbb{K}$, $X := (X_n)_{n=0}^\infty$ be an adapted $\mathcal{Y}$-valued Martingale, and $d_n := X_n - X_{n-1}$ be a $\mathcal{Y}$-valued random variable, $n\in\mathbb{N}$. Define $S(X) := (\|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^\infty \|d_k\|_{\mathcal{Y}}^2)^{1/2}$, $s(X) := (\|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^\infty \mathrm{E}(\|d_k\|_{\mathcal{Y}}^2|\mathcal{B}_{k-1}))^{1/2}$, $S_n(X) := (\|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^n \|d_k\|_{\mathcal{Y}}^2)^{1/2}$, and $s_n(X) := (\|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^n \mathrm{E}(\|d_k\|_{\mathcal{Y}}^2|\mathcal{B}_{k-1}))^{1/2}$, $\forall n\in\mathbb{Z}_+$. $S(X) : \Omega\to[0,\infty]\subset\mathbb{R}_e$ is the square function for the Martingale $X$, and $S_n(X) : \Omega\to\mathbb{R}_+$ is the corresponding partial sum. $s(X) : \Omega\to[0,\infty]\subset\mathbb{R}_e$ is the conditional square function for the Martingale $X$, and $s_n(X) : \Omega\to\mathbb{R}_+$ is the corresponding partial sum. %

Proposition 14.115 Let $\Omega := (\Omega,\mathcal{B},(\mathcal{B}_n)_{n=0}^\infty,P)$ be a filtered probability measure space, $(Y_n)_{n=0}^\infty$ be a sequence of adapted nonnegative real-valued random variables that is a sub Martingale, and $Y_M : \Omega\to[0,\infty]\subset\mathbb{R}_e$ be defined by $Y_M(\omega) := \sup_{n\in\mathbb{Z}_+} Y_n(\omega)$, $\forall\omega\in\Omega$. Then, $Y_M$ is a nonnegative extended real-valued function that is $\mathcal{B}$-measurable, and $P(\{\omega\in\Omega \mid Y_M(\omega) > \lambda\}) \le \frac{1}{\lambda}\lim_{n\in\mathbb{N}}\mathrm{E}(Y_n)$, $\forall\lambda\in\mathbb{R}_+$.

Proof Define $Y_{M,n} : \Omega\to\mathbb{R}_+$ by $Y_{M,n}(\omega) := \sup_{k\in\mathbb{Z}_+,\,k\le n} Y_k(\omega)$, $\forall\omega\in\Omega$, $\forall n\in\mathbb{Z}_+$. By Proposition 11.40, $Y_{M,n}$, $\forall n\in\mathbb{Z}_+$, and $Y_M$ are $\mathcal{B}$-measurable. Then, $Y_{M,n}$ is a nonnegative real-valued random variable, $\forall n\in\mathbb{Z}_+$. Clearly, we have $Y_{M,n}(\omega)\uparrow Y_M(\omega)$ as $n\to\infty$, $\forall\omega\in\Omega$. Then, we have $A_{n,\lambda}\subseteq A_{n+1,\lambda}$ and $\bigcup_{n\in\mathbb{Z}_+} A_{n,\lambda} = A_\lambda := \{\omega\in\Omega \mid Y_M(\omega) > \lambda\}$, where $A_{n,\lambda} := \{\omega\in\Omega \mid Y_{M,n}(\omega) > \lambda\}$, $\forall n\in\mathbb{Z}_+$, $\forall\lambda\in\mathbb{R}_+$.

Claim 14.115.1 $P(A_{n,1}) \le \mathrm{E}(Y_n)$, $\forall n\in\mathbb{Z}_+$.
Proof of Claim Note that, $\forall n\in\mathbb{Z}_+$,
$$P(A_{n,1}) = P(\{\omega\in\Omega \mid Y_0(\omega) > 1\}) + P(\{\omega\in\Omega \mid Y_0(\omega)\le 1,\ Y_1(\omega) > 1\}) + \cdots + P(\{\omega\in\Omega \mid Y_k(\omega)\le 1,\ k=0,\dots,n-1,\ Y_n(\omega) > 1\})$$
$$=: \sum_{k=0}^n P(B_k) = \sum_{k=0}^n \int_{B_k} 1\,dP \le \sum_{k=0}^n \int_{B_k} Y_k\,dP \le \sum_{k=0}^n \int_{B_k} \mathrm{E}(Y_n|\mathcal{B}_k)\,dP = \sum_{k=0}^n \mathrm{E}\bigl(\chi_{B_k,\Omega}\,\mathrm{E}(Y_n|\mathcal{B}_k)\bigr)$$
$$= \sum_{k=0}^n \mathrm{E}\bigl(\mathrm{E}(\chi_{B_k,\Omega} Y_n|\mathcal{B}_k)\bigr) = \sum_{k=0}^n \mathrm{E}(\chi_{B_k,\Omega} Y_n) = \sum_{k=0}^n \int_{B_k} Y_n\,dP = \int_{A_{n,1}} Y_n\,dP \le \int_\Omega Y_n\,dP = \mathrm{E}(Y_n)$$

where the first equality follows from the fact that $A_{n,1}$ equals the disjoint union $\bigcup_{k=0}^n B_k$, the second equality follows from Definition 14.1, the first inequality follows from the fact $Y_k(\omega) > 1$, $\forall\omega\in B_k$, $k=0,\dots,n$, the second inequality follows from the fact that $(Y_n)_{n=0}^\infty$ is a sub Martingale, the third equality follows from Definition 14.1, the fourth equality follows from (h) of Proposition 14.11, the fifth equality follows from (a) of Proposition 14.11, the sixth equality follows from Definition 14.1, the seventh equality follows from Proposition 11.92, the third inequality follows from the fact that $Y_n$ is nonnegative real-valued, and the last equality follows from Definition 14.1. This completes the proof of the claim. □

By Claim 14.115.1, we have that $P(A_{n,\lambda}) = P(\{\omega\in\Omega \mid \frac{1}{\lambda}Y_{M,n} > 1\}) \le \mathrm{E}(\frac{1}{\lambda}Y_n) = \frac{1}{\lambda}\mathrm{E}(Y_n)$, $\forall\lambda\in\mathbb{R}_+$, where the first equality follows from simple set equality, the inequality follows from Claim 14.115.1, and the last equality follows from Proposition 11.92. Therefore, we have $P(A_\lambda) = P(\bigcup_{n\in\mathbb{Z}_+} A_{n,\lambda}) = \lim_{n\in\mathbb{N}} P(A_{n,\lambda}) \le \liminf_{n\in\mathbb{N}} \frac{1}{\lambda}\mathrm{E}(Y_n) = \frac{1}{\lambda}\lim_{n\in\mathbb{N}}\mathrm{E}(Y_n)$, $\forall\lambda\in\mathbb{R}_+$, where the first equality follows from the first paragraph in the proof of this proposition, the second equality follows from Proposition 11.7, the inequality follows from the preceding discussion, and the last equality follows from the fact $\mathrm{E}(Y_n)\le\mathrm{E}(Y_{n+1})$, $\forall n\in\mathbb{Z}_+$. This completes the proof of the proposition. □

Lemma 14.116 Let $\Omega := (\Omega,\mathcal{B},(\mathcal{B}_n)_{n=0}^\infty,P)$ be a filtered probability measure space, $\mathcal{Y}$ be a separable Hilbert space over $\mathbb{K}$, $X := (X_n)_{n=0}^\infty$ be an adapted $\mathcal{Y}$-valued Martingale, $d_n := X_n - X_{n-1}$ be a $\mathcal{Y}$-valued random variable, $n\in\mathbb{N}$, and $X_M : \Omega\to[0,\infty]\subset\mathbb{R}_e$ be defined by $X_M(\omega) := \sup_{n\in\mathbb{Z}_+}\|X_n(\omega)\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$. Then, $|||X_M||| \le 3|||s(X)|||$.

Proof Since $X$ is an adapted $\mathcal{Y}$-valued Martingale, then $X_n : \Omega\to\mathcal{Y}$ is $\mathcal{B}_n$-measurable, and $X_n\in\mathrm{E}(X_{n+1}|\mathcal{B}_n)$, $\forall n\in\mathbb{Z}_+$. By Proposition 11.38, $P\circ X_n$ is $\mathcal{B}_n$-measurable. By Proposition 11.40, $X_M$ is $\mathcal{B}$-measurable.
Hence, $1\wedge X_M$ is a nonnegative real-valued random variable. We need the following intermediate result:

Claim 14.116.1 $\forall\lambda\in\mathbb{R}_+$, let $A_\lambda := \{\omega\in\Omega \mid s(X) > \lambda\}\in\mathcal{B}$; then
$$P(\{\omega\in\Omega \mid X_M > \lambda,\ s(X)\le\lambda\}) \le P(A_\lambda) + \frac{1}{\lambda^2}\int_{\Omega\setminus A_\lambda} (s(X))^2\,dP$$

Proof of Claim Note that $s_n(X) = (\|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^n \mathrm{E}(\|d_k\|_{\mathcal{Y}}^2|\mathcal{B}_{k-1}))^{1/2}$ is $\mathcal{B}_{n-1}$-measurable, $\forall n\in\mathbb{N}$. Let $\tau : \Omega\to\mathbb{Z}_+\cup\{\infty\}$ be defined by $\tau(\omega) := \inf\{n\in\mathbb{Z}_+ \mid s_{n+1}(X)(\omega) > \lambda\}$, $\forall\omega\in\Omega$. Clearly, $\forall n\in\mathbb{Z}_+$, we have $\{\omega\in\Omega \mid \tau(\omega)\le n\}\in\mathcal{B}_n$. Hence, $\tau$ is a stopping time. Let $Y := (Y_n)_{n=0}^\infty$ be defined by $Y_n = X_{n\wedge\tau}$, $\forall n\in\mathbb{Z}_+$. Clearly, $Y$ is the stopped $\mathcal{Y}$-valued process. By Theorem 14.47, $Y$ is a Martingale. Clearly, $Y_n = X_0 + \sum_{k=1}^n \chi_{\{\omega\in\Omega\mid\tau(\omega)\ge k\},\Omega}\,d_k$, $\forall n\in\mathbb{Z}_+$. By the definition of $\tau$, we have $s_\tau(X)\le\lambda$. Let $Y_M : \Omega\to[0,\infty]\subset\mathbb{R}_e$ be defined by $Y_M(\omega) := \sup_{n\in\mathbb{Z}_+}\|Y_n(\omega)\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$. Note that, $\forall n\in\mathbb{N}$,
$$\mathrm{E}(\|Y_n\|_{\mathcal{Y}}^2|\mathcal{B}_{n-1}) = \mathrm{E}\bigl(\|Y_{n-1} + \chi_{\{\omega\in\Omega\mid\tau(\omega)\ge n\},\Omega}\,d_n\|_{\mathcal{Y}}^2 \,\big|\, \mathcal{B}_{n-1}\bigr)$$
$$= \mathrm{E}\bigl(\|Y_{n-1}\|_{\mathcal{Y}}^2 + \chi_{\{\tau\ge n\},\Omega}\langle Y_{n-1},d_n\rangle_{\mathcal{Y}} + \chi_{\{\tau\ge n\},\Omega}\langle d_n,Y_{n-1}\rangle_{\mathcal{Y}} + \chi_{\{\tau\ge n\},\Omega}\|d_n\|_{\mathcal{Y}}^2 \,\big|\, \mathcal{B}_{n-1}\bigr)$$
$$= \|Y_{n-1}\|_{\mathcal{Y}}^2 + \mathrm{E}\bigl(2\chi_{\{\tau\ge n\},\Omega}\operatorname{Re}\langle Y_{n-1},d_n\rangle_{\mathcal{Y}} \,\big|\, \mathcal{B}_{n-1}\bigr) + \mathrm{E}\bigl(\|\chi_{\{\tau\ge n\},\Omega}\,d_n\|_{\mathcal{Y}}^2 \,\big|\, \mathcal{B}_{n-1}\bigr)$$
$$= \|Y_{n-1}\|_{\mathcal{Y}}^2 + 2\chi_{\{\tau\ge n\},\Omega}\,\mathrm{E}\bigl(\operatorname{Re}\langle Y_{n-1},d_n\rangle_{\mathcal{Y}} \,\big|\, \mathcal{B}_{n-1}\bigr) + \mathrm{E}(\|Y_n-Y_{n-1}\|_{\mathcal{Y}}^2|\mathcal{B}_{n-1})$$
$$= \|Y_{n-1}\|_{\mathcal{Y}}^2 + 2\chi_{\{\tau\ge n\},\Omega}\operatorname{Re}\bigl(\mathrm{E}(\langle Y_{n-1},d_n\rangle_{\mathcal{Y}}|\mathcal{B}_{n-1})\bigr) + \mathrm{E}(\|Y_n-Y_{n-1}\|_{\mathcal{Y}}^2|\mathcal{B}_{n-1})$$
$$= \|Y_{n-1}\|_{\mathcal{Y}}^2 + \mathrm{E}(\|Y_n-Y_{n-1}\|_{\mathcal{Y}}^2|\mathcal{B}_{n-1}) \ge \|Y_{n-1}\|_{\mathcal{Y}}^2 \quad \text{a.e. in } \Omega$$
where the first equality follows from the preceding discussion, the second equality follows from Definition 13.1 and Proposition 13.2, the third equality follows from (c), (b), (h) of Proposition 14.11, the fourth equality follows from (h) of Proposition 14.11, the fifth equality follows from Proposition 11.92, the last equality follows from (h) of Proposition 14.11 and the fact that $\mathrm{E}(d_n|\mathcal{B}_{n-1}) = \vartheta_{\mathcal{Y}}$, and the inequality follows from (d) of Proposition 14.11. Thus, $(\|Y_n\|_{\mathcal{Y}}^2)_{n=0}^\infty$ is a sub Martingale. Let $g_M : \Omega\to[0,\infty]\subset\mathbb{R}_e$ be defined by $g_M(\omega) := \sup_{n\in\mathbb{Z}_+}\|Y_n(\omega)\|_{\mathcal{Y}}^2$, $\forall\omega\in\Omega$. Then, $P(\{\omega\in\Omega \mid Y_M(\omega) > \lambda\}) = P(\{\omega\in\Omega \mid g_M(\omega) > \lambda^2\}) \le \frac{1}{\lambda^2}\lim_{n\in\mathbb{N}}\mathrm{E}(\|Y_n\|_{\mathcal{Y}}^2) \le \frac{1}{\lambda^2}\mathrm{E}((s(Y))^2)$, where the first equality follows from the equality of the two sets involved, the first inequality follows from Proposition 14.115, and the last inequality follows from the fact that $\mathrm{E}(\|Y_{n+1}\|_{\mathcal{Y}}^2) = \mathrm{E}(\mathrm{E}(\|Y_{n+1}\|_{\mathcal{Y}}^2|\mathcal{B}_n)) = \mathrm{E}(\|Y_n\|_{\mathcal{Y}}^2) + \mathrm{E}(\mathrm{E}(\|Y_{n+1}-Y_n\|_{\mathcal{Y}}^2|\mathcal{B}_n))$, $\forall n\in\mathbb{Z}_+$. Then, we have the following line of reasoning:
$$P(\{\omega\in\Omega \mid X_M > \lambda,\ s(X)\le\lambda\}) = P(\{\omega\in\Omega \mid X_M > \lambda,\ \tau(\omega)=\infty\}) = P(\{\omega\in\Omega \mid Y_M > \lambda,\ \tau(\omega)=\infty\})$$
$$\le P(\{\omega\in\Omega \mid Y_M > \lambda\}) \le \frac{1}{\lambda^2}\mathrm{E}((s(Y))^2) = \frac{1}{\lambda^2}\int_{\Omega\setminus A_\lambda}(s(Y))^2\,dP + \frac{1}{\lambda^2}\int_{A_\lambda}(s(Y))^2\,dP$$
where the first equality follows from the definition of $\tau$, the second equality follows from the fact $Y_M(\omega) = X_M(\omega)$ when $\tau(\omega)=\infty$, the first inequality follows from Proposition 11.4, the second inequality follows from the previous paragraph, and the last equality follows from Proposition 11.92.

Note that $(s(Y))^2 = \|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^\infty \mathrm{E}(\chi_{\{\omega\in\Omega\mid\tau(\omega)\ge k\},\Omega}\|d_k\|_{\mathcal{Y}}^2|\mathcal{B}_{k-1}) = \|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^\tau \mathrm{E}(\|d_k\|_{\mathcal{Y}}^2|\mathcal{B}_{k-1}) = (s_\tau(X))^2 \le \lambda^2$, where the first equality follows from Definition 14.114, the second equality follows from simple algebra, the third equality follows from Definition 14.114, and the inequality follows from the first paragraph in this proof of the claim. Then, it follows that
$$P(\{\omega\in\Omega \mid X_M > \lambda,\ s(X)\le\lambda\}) \le \frac{1}{\lambda^2}\int_{\Omega\setminus A_\lambda}(s(Y))^2\,dP + \frac{1}{\lambda^2}\int_{A_\lambda}\lambda^2\,dP = P(A_\lambda) + \frac{1}{\lambda^2}\int_{\Omega\setminus A_\lambda}(s(X))^2\,dP$$
where the first inequality follows from the last paragraph and the preceding discussion and the equality follows from the fact that $s(Y)(\omega) = s(X)(\omega)$ if $s(X)\le\lambda$. This completes the proof of the claim. □

Note the following line of reasoning:



$$|||X_M||| = \mathrm{E}(1\wedge X_M) = \int_0^\infty P(\{\omega\in\Omega \mid (1\wedge X_M(\omega)) > \lambda\})\,d\lambda = \int_0^1 P(\{\omega\in\Omega \mid (1\wedge X_M(\omega)) > \lambda\})\,d\lambda = \int_0^1 P(\{\omega\in\Omega \mid X_M(\omega) > \lambda\})\,d\lambda$$
$$= \int_0^1 P(\{\omega\in\Omega \mid X_M(\omega) > \lambda,\ s(X)\le\lambda\})\,d\lambda + \int_0^1 P(\{\omega\in\Omega \mid X_M(\omega) > \lambda,\ s(X) > \lambda\})\,d\lambda$$
$$\le \int_0^1 \Bigl(P(A_\lambda) + \frac{1}{\lambda^2}\int_{\Omega\setminus A_\lambda}(s(X))^2\,dP\Bigr)\,d\lambda + \int_0^1 P(\{\omega\in\Omega \mid s(X) > \lambda\})\,d\lambda$$
$$= \int_0^1 \frac{1}{\lambda^2}\int_{\Omega\setminus A_\lambda}(s(X))^2\,dP\,d\lambda + 2\int_0^1 P(\{\omega\in\Omega \mid s(X) > \lambda\})\,d\lambda$$
$$= \int_0^1 \frac{1}{\lambda^2}\int_{\{\omega\in\Omega \mid (1\wedge s(X))\le\lambda\}}(1\wedge s(X))^2\,dP\,d\lambda + 2\int_0^1 P(\{\omega\in\Omega \mid (1\wedge s(X)) > \lambda\})\,d\lambda$$
$$= \int_{\{(\omega,\lambda)\in\Omega\times r_{0,1} \mid Z(\omega)\le\lambda\}} \frac{1}{\lambda^2}(Z(\omega))^2\,d(P\times\mu_B)(\omega,\lambda) + 2\mathrm{E}(1\wedge s(X))$$
$$= 2|||s(X)||| + \int_\Omega \int_{Z(\omega)}^1 \frac{1}{\lambda^2}(Z(\omega))^2\,d\lambda\,dP(\omega) = 2|||s(X)||| - \int_\Omega (Z(\omega))^2\Bigl(\frac{1}{\lambda}\Bigr)\Big|_{Z(\omega)}^1\,dP(\omega)$$
$$\le 2|||s(X)||| + \int_\Omega Z(\omega)\,dP(\omega) = 2|||s(X)||| + \mathrm{E}(Z) = 3|||s(X)|||$$

where the first equality follows from Definition 14.112, the second equality follows from Proposition 14.21, the third equality follows from Proposition 11.92, the fourth equality follows from simple set equality for $0\le\lambda<1$ and Lemma 11.73, the fifth equality follows from Fact 11.3, the first inequality follows from Claim 14.116.1, the sixth equality follows from the definition of $A_\lambda$ and Proposition 11.92, the seventh equality follows from simple set equality for $0\le\lambda<1$ and Lemma 11.73, the eighth equality follows from the definition $Z := 1\wedge s(X)$, Tonelli's Theorem 12.29, and Proposition 14.21, the ninth equality follows from Definition 14.112 and Tonelli's Theorem 12.29, the tenth equality follows from Theorem 12.83, the second inequality follows from Proposition 11.83, the eleventh equality follows from Definition 14.1, and the last equality follows from Definition 14.112. This completes the proof of the lemma. □

Lemma 14.117 Let $\Omega := (\Omega,\mathcal{B},(\mathcal{B}_n)_{n=0}^\infty,P)$ be a filtered probability measure space, $\mathcal{Y}$ be a separable Hilbert space over $\mathbb{K}$, and $X := (X_n)_{n=0}^\infty$ be an adapted $\mathcal{Y}$-valued Martingale. Then, $\lim_{n\in\mathbb{N}}\mathrm{E}(\|X_n\|_{\mathcal{Y}}^2) = \mathrm{E}((S(X))^2) = \mathrm{E}((s(X))^2)$.

Proof Note that $\mathrm{E}((S(X))^2) = \mathrm{E}(\|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^\infty\|d_k\|_{\mathcal{Y}}^2) = \mathrm{E}(\lim_{n\in\mathbb{N}}(\|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^n\|d_k\|_{\mathcal{Y}}^2)) = \lim_{n\in\mathbb{N}}\mathrm{E}(\|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^n\|d_k\|_{\mathcal{Y}}^2) = \lim_{n\in\mathbb{N}}\mathrm{E}(\|X_n\|_{\mathcal{Y}}^2)$, where the first equality follows from Definition 14.114, the second equality follows from simple algebra, the third equality follows from the Monotone Convergence Theorem 11.81, and the fourth equality follows from the assumption that $X$ is a Martingale, which implies that $\mathrm{E}(\|X_n\|_{\mathcal{Y}}^2) = \mathrm{E}(\|X_{n-1}+d_n\|_{\mathcal{Y}}^2) = {}$


$\mathrm{E}(\|X_{n-1}\|_{\mathcal{Y}}^2) + \mathrm{E}(\|d_n\|_{\mathcal{Y}}^2)$, $\forall n\in\mathbb{N}$. Clearly, we have $\mathrm{E}((s(X))^2) = \mathrm{E}(\|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^\infty \mathrm{E}(\|d_k\|_{\mathcal{Y}}^2|\mathcal{B}_{k-1})) = \mathrm{E}(\|X_0\|_{\mathcal{Y}}^2 + \sum_{k=1}^\infty \|d_k\|_{\mathcal{Y}}^2) = \mathrm{E}((S(X))^2)$, where the first equality follows from Definition 14.114, the second equality follows from Proposition 11.92 and (a) of Proposition 14.11, and the last equality follows from Definition 14.114. This completes the proof of the lemma. □

Lemma 14.118 Let $\mathcal{I} := ((\mathbb{R}_+,|\cdot|),\hat{\mathcal{B}},\mu)$ be the σ-finite metric measure subspace of $\mathbb{R}$ with partial ordering $\le$, $\Omega := (\Omega,\mathcal{B},(\mathcal{B}_t)_{t\in\mathcal{I}},P)$ be a filtered probability measure space, $\mathcal{I}\times\Omega = ([0,\infty)\times\Omega,\bar{\mathcal{B}},\nu)$ be the σ-finite product measure space, $w : \mathcal{I}\times\Omega\to\mathbb{R}$ be a standard Wiener process on $\mathcal{I}\times\Omega$ such that $(w_t)_{t\in\mathcal{I}}$ is adapted to the filtration $(\mathcal{B}_t)_{t\in\mathcal{I}}$, $r_{a,b}\subset\mathcal{I}$ with $a\le b$, $\mathcal{Y}$ be a separable Hilbert space over $\mathbb{K}$, $\phi\in\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$, where $\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$ is defined in Lemma 14.99, and the σ-algebras $\mathcal{B}_t$ and $\sigma((w_{t+s}-w_t)_{s\in\mathcal{I}})$ be independent, $\forall t\in\mathcal{I}$. Define $h : \Omega\to\mathbb{R}_+$ by $h(\omega) := \sup_{t\in r_{a,b}}\bigl\|\int_a^t \phi_s\,dw_s\bigr\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$. Then, we have $|||h||| \le 3\,\bigl|\bigl|\bigl|\,\|\phi_\omega\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$.

Proof Let $\phi$ be given by $\phi(t,\omega) := \sum_{k=1}^n A_{k-1}(\omega)\chi_{[t_{k-1},t_k),\mathcal{I}}(t)$, where $n\in\mathbb{Z}_+$, $a = t_0 < t_1 < \cdots < t_n = b$, and $A_k\in\bar{L}_2(\Omega,\mathcal{Y})$ is $\mathcal{B}_{t_k}$-measurable, $k = 0,\dots,n-1$. $\forall m\in\mathbb{N}$, let $a = \bar{t}_0 < \bar{t}_1 < \cdots < \bar{t}_{nm} = b$ be such that $\bar{t}_{im+j} - \bar{t}_{im+j-1} = \frac{1}{m}(t_{i+1}-t_i)$, $i = 0,\dots,n-1$, $j = 1,\dots,m$. Thus, $h(\omega) = \lim_{m\in\mathbb{N}}\sup_{j=1,\dots,mn}\bigl\|\sum_{k=1}^j A_{\lfloor (k-1)/m\rfloor}(\omega)\bigl(w_{\bar{t}_k}(\omega) - w_{\bar{t}_{k-1}}(\omega)\bigr)\bigr\|_{\mathcal{Y}} =: \lim_{m\in\mathbb{N}} h_m(\omega)$, $\forall\omega\in\Omega$, by the continuity of $\int_a^t \phi_s\,dw_s$. Clearly, $h_m$ is $\mathcal{B}_b$-measurable. Then, by Proposition 11.48, $h$ is $\mathcal{B}_b$-measurable. By Lemma 14.116, we have

$$|||h_m||| = \Bigl|\Bigl|\Bigl|\,\sup_{j=1,\dots,mn}\Bigl\|\sum_{k=1}^j A_{\lfloor (k-1)/m\rfloor}\bigl(w_{\bar{t}_k} - w_{\bar{t}_{k-1}}\bigr)\Bigr\|_{\mathcal{Y}}\,\Bigr|\Bigr|\Bigr| \le 3\,\Bigl|\Bigl|\Bigl|\,\Bigl(\sum_{j=1}^{mn}\mathrm{E}\bigl(\bigl\|A_{\lfloor (j-1)/m\rfloor}\bigl(w_{\bar{t}_j}-w_{\bar{t}_{j-1}}\bigr)\bigr\|_{\mathcal{Y}}^2\,\big|\,\mathcal{B}_{\bar{t}_{j-1}}\bigr)\Bigr)^{1/2}\,\Bigr|\Bigr|\Bigr|$$
$$= 3\,\Bigl|\Bigl|\Bigl|\,\Bigl(\sum_{j=1}^{mn}\bigl\|A_{\lfloor (j-1)/m\rfloor}\bigr\|_{\mathcal{Y}}^2\,\mathrm{E}\bigl((w_{\bar{t}_j}-w_{\bar{t}_{j-1}})^2\,\big|\,\mathcal{B}_{\bar{t}_{j-1}}\bigr)\Bigr)^{1/2}\,\Bigr|\Bigr|\Bigr| = 3\,\Bigl|\Bigl|\Bigl|\,\Bigl(\sum_{j=1}^{mn}\bigl\|A_{\lfloor (j-1)/m\rfloor}\bigr\|_{\mathcal{Y}}^2\bigl(\bar{t}_j-\bar{t}_{j-1}\bigr)\Bigr)^{1/2}\,\Bigr|\Bigr|\Bigr|$$
$$= 3\,\Bigl|\Bigl|\Bigl|\,\Bigl(\sum_{j=1}^{n}\|A_{j-1}\|_{\mathcal{Y}}^2\,(t_j-t_{j-1})\Bigr)^{1/2}\,\Bigr|\Bigr|\Bigr| = 3\,\bigl|\bigl|\bigl|\,\|\phi_\omega\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$$

where the first equality follows from the definition of $h_m$, the inequality follows from Lemma 14.116, the second equality follows from Definition 7.1 and (h)


of Proposition 14.11, the third equality follows from Definition 14.76, the fourth equality follows from simple algebra, and the last equality follows from Proposition 11.75. Then, we have $|||h||| = \lim_{m\in\mathbb{N}}|||h_m||| \le \lim_{m\in\mathbb{N}} 3\,\bigl|\bigl|\bigl|\,\|\phi_\omega\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr| = 3\,\bigl|\bigl|\bigl|\,\|\phi_\omega\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, where the first equality follows from the Bounded Convergence Theorem 11.77, the inequality follows from the preceding discussion, and the last equality follows from simple algebra. This completes the proof of the lemma. □

Theorem 14.119 Let $\mathcal{I} := ((\mathbb{R}_+,|\cdot|),\hat{\mathcal{B}},\mu)$ be the σ-finite metric measure subspace of $\mathbb{R}$ with partial ordering $\le$, $\Omega := (\Omega,\mathcal{B},(\mathcal{B}_t)_{t\in\mathcal{I}},P)$ be a filtered probability measure space, $\mathcal{I}\times\Omega = (\mathbb{R}_+\times\Omega,\bar{\mathcal{B}},\nu)$ be the σ-finite product measure space, $w : \mathcal{I}\times\Omega\to\mathbb{R}$ be a standard Wiener process on $\mathcal{I}\times\Omega$ and be adapted to the filtration $(\mathcal{B}_t)_{t\in\mathcal{I}}$, $r_{a,b}\subset\mathcal{I}$ with $a\le b$, $\mathcal{Y}$ be a separable Hilbert space, and $\phi\in\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})$ be a $\mathcal{Y}$-valued stochastic process adapted to the filtration $(\mathcal{B}_t)_{t\in\mathcal{I}}$. Assume that:

(a) The σ-algebras $\mathcal{B}_t$ and $\sigma((w_{t+s}-w_t)_{s\in\mathcal{I}})$ are independent, $\forall t\in\mathcal{I}$.
(b) $\forall E\in\mathcal{B}$, if $P(E) = 0$ then $E\in\mathcal{B}_0$.
(c) $\phi$ is Itô integrable with respect to the Wiener process $w$ on the interval $r_{a,b}$.

Then, there exists $J : r_{a,b}\times\Omega\to\mathcal{Y}$ that is a $\mathcal{Y}$-valued stochastic process adapted to the filtration $(\mathcal{B}_t)_{t\in\mathcal{I}}$ with continuous sample paths, such that, $\forall\psi\in\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$, we have $|||h||| \le 3\,\bigl|\bigl|\bigl|\,\|\psi-\phi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, where $h : \Omega\to\mathbb{R}_+$ is defined by $h(\omega) = \sup_{t\in r_{a,b}}\bigl\|\bigl(\int_a^t \psi_s\,dw_s\bigr)(\omega) - J(t,\omega)\bigr\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$. Such $J$ is unique in the sense that, if $\bar{J} : r_{a,b}\times\Omega\to\mathcal{Y}$ is another such process, then $\bar{J}_\omega = J_\omega$ a.e. $\omega\in\Omega$. We will say that $J$ is a $\mathcal{Y}$-valued Itô process representation of the Itô integrals $\int_a^t \phi_s\,dw_s$, $t\in r_{a,b}$.

Proof By (c) and Definition 14.104, there exists a sequence of $\mathcal{Y}$-valued simple predictable step functions $(\phi_n)_{n=1}^\infty\subseteq\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$, where $\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$ is defined in Lemma 14.99, such that $\lim_{n\in\mathbb{N}}\phi_n = \phi$ in $\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})$.
For each $n\in\mathbb{N}$, let $I_n : r_{a,b}\times\Omega\to\mathcal{Y}$ be defined by $I_n(t,\omega) := \bigl(\int_a^t (\phi_n)_s\,dw_s\bigr)(\omega)\in\mathcal{Y}$. Then, $I_n$ is an adapted $\mathcal{Y}$-valued stochastic process and has continuous sample paths, $\forall n\in\mathbb{N}$. By Definition 14.104, $\lim_{n\in\mathbb{N}}[I_n(t)] = I(t) := \int_a^t \phi_s\,dw_s$ in $L_2(\Omega,\mathcal{Y})$, $\forall t\in r_{a,b}$. By Lemma 14.118, we have $|||h_{n,m}||| \le 3\,\bigl|\bigl|\bigl|\,\|\phi_n-\phi_m\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, $\forall n,m\in\mathbb{N}$, where $h_{n,m} : \Omega\to\mathbb{R}_+$ is defined by $h_{n,m}(\omega) := \sup_{t\in r_{a,b}}\|I_n(t,\omega)-I_m(t,\omega)\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$, and $h_{n,m}$ is $\mathcal{B}_b$-measurable. $\forall\epsilon\in\mathbb{R}_+$, $\exists N(\epsilon)\in\mathbb{N}$ such that $\forall n,m\in\mathbb{N}$ with $N(\epsilon)\le n$ and $N(\epsilon)\le m$, we have $\bigl\|\,\|\phi_n-\phi_m\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\,\bigr\|_{\bar{L}_2(\Omega,\mathbb{R})} = \|\phi_n-\phi_m\|_{\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})} < \frac{\epsilon}{3}$, where the equality follows from Tonelli's Theorem 12.29. Then, we have $|||Z_{n,m}||| = \mathrm{E}(1\wedge Z_{n,m}) = \int_\Omega (1\wedge Z_{n,m})\,dP \le \int_\Omega Z_{n,m}\,dP = \|Z_{n,m}\|_{\bar{L}_1(\Omega,\mathbb{R})} \le \bigl(\int_\Omega Z_{n,m}^2\,dP\bigr)^{1/2} = \|Z_{n,m}\|_{\bar{L}_2(\Omega,\mathbb{R})} < \frac{\epsilon}{3}$, where $Z_{n,m} := \|\phi_n-\phi_m\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}$, the first equality follows from Definition 14.112, the second equality follows from Definition 14.1, the first inequality follows from


Proposition 11.83, the third equality follows from Example 11.173, the second inequality follows from the Cauchy–Schwarz Inequality, and the last equality follows from Example 11.173. This implies that $|||h_{n,m}||| < \epsilon$. Fix any $k\in\mathbb{N}$; $\exists n_k\in\mathbb{N}$ with $n_{k-1} < n_k$ and $N(2^{-2k})\le n_k$, where $n_0 := 0$. $\forall n,m\in\mathbb{N}$ with $n_k\le n$ and $n_k\le m$, we have $|||h_{n,m}||| < 2^{-2k}$. Then, $P(\{\omega\in\Omega \mid h_{n,m}(\omega)\ge 2^{-k}\}) < 2^{-k}$, by Chebyshev's Inequality (Proposition 14.67). Define $A_k := \{\omega\in\Omega \mid h_{n_k,n_{k+1}}(\omega)\ge 2^{-k}\}\in\mathcal{B}$, $\forall k\in\mathbb{N}$. Then, $P(A_k) < 2^{-k}$. Consider the sequence $(I_{n_k})_{k=1}^\infty$. Let $A := \bigcap_{k=1}^\infty\bigcup_{j=k}^\infty A_j\in\mathcal{B}$. By the Borel–Cantelli Lemma 14.6, $P(A) = 0$. $\forall\omega\in\Omega\setminus A$, we have $\forall l\in\mathbb{N}$, $\exists k_l\in\mathbb{N}$ such that $\omega\in\Omega\setminus(\bigcup_{j=k_l}^\infty A_j)$. Then, $\forall k\in\mathbb{N}$ with $k\ge k_l$, $\omega\in\Omega\setminus(\bigcup_{j=k}^\infty A_j)$. Without loss of generality, take $k_l\ge l$. Thus, we have $\forall k,j\in\mathbb{N}$ with $k_l\le k\le j$, $h_{n_k,n_j}(\omega) = \sup_{t\in r_{a,b}}\|I_{n_k}(t,\omega)-I_{n_j}(t,\omega)\|_{\mathcal{Y}} \le \sum_{t=k}^{j-1} h_{n_t,n_{t+1}}(\omega) < \sum_{t=k}^{j-1} 2^{-t} < 2^{-k+1} \le 2^{-k_l+1} \le 2^{-l+1}$. By the arbitrariness of $l$, we have that $((I_{n_k})_\omega)_{k=1}^\infty$ is a uniformly Cauchy sequence, uniformly in $t\in r_{a,b}$. Let $J : r_{a,b}\times\Omega\to\mathcal{Y}$ be defined by
$$J(t,\omega) = \begin{cases}\lim_{k\in\mathbb{N}} I_{n_k}(t,\omega) & \omega\in\Omega\setminus A\\ \vartheta_{\mathcal{Y}} & \omega\in A\end{cases}\qquad \forall\omega\in\Omega,\ \forall t\in r_{a,b}.$$
Then, $J$ is $\bar{\mathcal{B}}$-measurable and has continuous sample paths. By Assumption (b), $J$ is a $\mathcal{Y}$-valued stochastic process adapted to the filtration $(\mathcal{B}_t)_{t\in\mathcal{I}}$. By Proposition 14.102, $J$ is progressively measurable.

Fix any $t\in r_{a,b}$; we have $\lim_{k\in\mathbb{N}} I_{n_k}(t) = J(t)$ a.e. in $\Omega$. By the first paragraph of the proof, we have $\lim_{k\in\mathbb{N}}[I_{n_k}(t)] = I(t)$ in $L_2(\Omega,\mathcal{Y})$. By Propositions 11.211 and 11.57, we have $J(t)\in I(t)$. Hence, we have shown that there exists a $\mathcal{Y}$-valued stochastic process $J$ that is adapted to $(\mathcal{B}_t)_{t\in\mathcal{I}}$, whose sample paths are continuous, such that $J(t)\in\int_a^t \phi_s\,dw_s$, $\forall t\in r_{a,b}$. Define $h_n : \Omega\to\mathbb{R}_+$ by $h_n(\omega) = \sup_{t\in r_{a,b}}\|I_n(t,\omega)-J(t,\omega)\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$, $\forall n\in\mathbb{N}$. Then, we have that $h_n$ is $\mathcal{B}_b$-measurable and $h_n = \lim_{k\in\mathbb{N}} h_{n,n_k}$ a.e. in $\Omega$, $\forall n\in\mathbb{N}$.
Then, $|||h_n||| = \lim_{k\in\mathbb{N}}|||h_{n,n_k}||| \le \lim_{k\in\mathbb{N}} 3\,\bigl|\bigl|\bigl|\,\|\phi_n-\phi_{n_k}\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, $\forall n\in\mathbb{N}$, where the first equality follows from the Bounded Convergence Theorem 11.77 and the inequality follows from the second paragraph of the proof. Note that $0 = \lim_{k\in\mathbb{N}}\|\phi-\phi_{n_k}\|_{\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})} = \lim_{k\in\mathbb{N}}\bigl\|\,\|\phi-\phi_{n_k}\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\,\bigr\|_{\bar{L}_2(\Omega,\mathbb{R})} = \lim_{k\in\mathbb{N}}\bigl\|\,\|\phi-\phi_n+\phi_n-\phi_{n_k}\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\,\bigr\|_{\bar{L}_2(\Omega,\mathbb{R})} \ge \lim_{k\in\mathbb{N}}\bigl\|\,\|\phi-\phi_n\|_{\bar{L}_2(r_{a,b},\mathcal{Y})} - \|\phi_n-\phi_{n_k}\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\,\bigr\|_{\bar{L}_2(\Omega,\mathbb{R})} \ge 0$, $\forall n\in\mathbb{N}$, where the first equality follows from the choice of $(\phi_{n_k})_{k=1}^\infty$, the second equality follows from Tonelli's Theorem 12.29, the third equality follows from simple algebra, the first inequality follows from Definition 7.1, and the last inequality follows from Definition 7.1. This implies that $\lim_{k\in\mathbb{N}}\|\phi_n-\phi_{n_k}\|_{\bar{L}_2(r_{a,b},\mathcal{Y})} = \|\phi-\phi_n\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}$ in $\bar{L}_2(\Omega,\mathbb{R})$. By Proposition 11.211 and Lemma 14.113, we have $\lim_{k\in\mathbb{N}}\bigl|\bigl|\bigl|\,\|\phi_n-\phi_{n_k}\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr| = \bigl|\bigl|\bigl|\,\|\phi_n-\phi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, $\forall n\in\mathbb{N}$. Thus, we have shown that $|||h_n||| \le 3\,\bigl|\bigl|\bigl|\,\|\phi_n-\phi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, $\forall n\in\mathbb{N}$.


$\forall\psi\in\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$, without loss of generality, we assume that $\phi_1 = \psi$. Then, by the previous paragraph, we have $|||h||| \le 3\,\bigl|\bigl|\bigl|\,\|\psi-\phi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, where $h : \Omega\to\mathbb{R}_+$ is defined by $h(\omega) = \sup_{t\in r_{a,b}}\bigl\|\bigl(\int_a^t \psi_s\,dw_s\bigr)(\omega) - J(t,\omega)\bigr\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$.

Finally, we will show the uniqueness of $J$. Let $\bar{J}$ be another such process; then, we have $|||\bar{h}_n||| \le 3\,\bigl|\bigl|\bigl|\,\|\phi_n-\phi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, where $\bar{h}_n : \Omega\to\mathbb{R}_+$ is defined by $\bar{h}_n(\omega) = \sup_{t\in r_{a,b}}\|I_n(t,\omega)-\bar{J}(t,\omega)\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$. Let $\tilde{h} : \Omega\to\mathbb{R}_+$ be defined by $\tilde{h}(\omega) = \sup_{t\in r_{a,b}}\|J(t,\omega)-\bar{J}(t,\omega)\|_{\mathcal{Y}}$. Then, $\tilde{h}(\omega) \le h_n(\omega) + \bar{h}_n(\omega)$, $\forall n\in\mathbb{N}$, $\forall\omega\in\Omega$. Thus, we have $|||\tilde{h}||| \le \lim_{n\in\mathbb{N}}(|||\bar{h}_n||| + |||h_n|||) = 0$, by Lemma 14.113. This shows that $\bar{J}_\omega = J_\omega$ a.e. $\omega\in\Omega$. This completes the proof of the theorem. □

Proposition 14.120 Let $\mathcal{I} := ((\mathbb{R}_+,|\cdot|),\hat{\mathcal{B}},\mu)$ be the σ-finite metric measure subspace of $\mathbb{R}$ with partial ordering $\le$, $\Omega := (\Omega,\mathcal{B},(\mathcal{B}_t)_{t\in\mathcal{I}},P)$ be a filtered probability measure space, $\mathcal{I}\times\Omega = (\mathbb{R}_+\times\Omega,\bar{\mathcal{B}},\nu)$ be the σ-finite product measure space, $w : \mathcal{I}\times\Omega\to\mathbb{R}$ be a standard Wiener process on $\mathcal{I}\times\Omega$ and be adapted to the filtration $(\mathcal{B}_t)_{t\in\mathcal{I}}$, $r_{a,b}\subset\mathcal{I}$ with $a\le b$, $\mathcal{Y}$ and $\mathcal{Z}$ be separable Hilbert spaces over $\mathbb{K}$, and $\phi,\psi\in\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})$ be $\mathcal{Y}$-valued stochastic processes adapted to the filtration $(\mathcal{B}_t)_{t\in\mathcal{I}}$. Assume that the following hold:

(a) The σ-algebras $\mathcal{B}_t$ and $\sigma((w_{t+s}-w_t)_{s\in\mathcal{I}})$ are independent, $\forall t\in\mathcal{I}$.
(b) $\forall E\in\mathcal{B}$, if $P(E) = 0$ then $E\in\mathcal{B}_0$.
(c) $\phi$ and $\psi$ are Itô integrable with respect to the Wiener process $w$ on the interval $r_{a,b}$, respectively.
(d) $I$ is a $\mathcal{Y}$-valued Itô process representation of the Itô integrals $\int_a^t \phi_s\,dw_s$, $t\in r_{a,b}$, and $J$ is a $\mathcal{Y}$-valued Itô process representation of the Itô integrals $\int_a^t \psi_s\,dw_s$, $t\in r_{a,b}$.
(e) $h : \Omega\to\mathbb{R}_+$ is defined by $h(\omega) = \sup_{t\in r_{a,b}}\|I(t,\omega)-J(t,\omega)\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$.

Then, the following statements hold:

(i) $|||h||| \le 3\,\bigl|\bigl|\bigl|\,\|\phi-\psi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$.

(ii) $I+J$ is an Itô process representation of the Itô integrals $\int_a^t (\psi_s+\phi_s)\,dw_s$, $t\in r_{a,b}$.
(iii) $\forall\alpha\in\mathbb{K}$, we have that $\alpha I$ is an Itô process representation of the Itô integrals $\int_a^t (\alpha\phi_s)\,dw_s$, $t\in r_{a,b}$.
(iv) $\forall t\in r_{a,b}$, $\|I_t\|_{\bar{L}_2(\Omega,\mathcal{Y})} = \bigl\|\phi|_{r_{a,t}\times\Omega}\bigr\|_{\bar{L}_2(r_{a,t}\times\Omega,\mathcal{Y})}$.
(v) $(I_t)_{t\in r_{a,b}}$ is a $\mathcal{Y}$-valued Martingale adapted to $(\mathcal{B}_t)_{t\in r_{a,b}}$.
(vi) $\forall A : \Omega\to\mathcal{W}\subseteq\mathrm{B}(\mathcal{Y},\mathcal{Z})$ that is $\mathcal{B}_a$-measurable with $\mathrm{E}(\|A\|^2_{\mathrm{B}(\mathcal{Y},\mathcal{Z})}) < \infty$, where $\mathcal{W}$ is a separable subspace of $\mathrm{B}(\mathcal{Y},\mathcal{Z})$, if $\phi\in\mathcal{U}(r_{a,b}\times\Omega,\mathcal{Y})$ with the random variable $c_\phi : \Omega\to\mathbb{R}_+$ as defined in Proposition 14.107 and $\mathrm{E}(\|A\|^2_{\mathrm{B}(\mathcal{Y},\mathcal{Z})}\,c_\phi) < \infty$, then $AI$ is a $\mathcal{Z}$-valued Itô process representation of the Itô integrals $\int_a^t (A\phi_s)\,dw_s$, $t\in r_{a,b}$.


(vii) $\forall c\in r_{a,b}$, $I|_{r_{c,b}\times\Omega} - I_c$ is an Itô process representation of the Itô integrals $\int_c^t \phi_s\,dw_s$, $t\in r_{c,b}$.
(viii) $\operatorname{Re}\bigl(\mathrm{E}(\langle I_t,J_t\rangle_{\mathcal{Y}})\bigr) = \operatorname{Re}\bigl(\int_{r_{a,t}\times\Omega}\langle\phi(s,\omega),\psi(s,\omega)\rangle_{\mathcal{Y}}\,d\nu(s,\omega)\bigr)$, $\forall t\in r_{a,b}$.

Proof (i) By (c) and Definition 14.104, there exists a sequence of $\mathcal{Y}$-valued simple predictable step functions $(\phi_n)_{n=1}^\infty\subseteq\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$, where $\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$ is defined in Lemma 14.99, such that $\lim_{n\in\mathbb{N}}\phi_n = \phi$ in $\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})$. For each $n\in\mathbb{N}$, let $I_n : r_{a,b}\times\Omega\to\mathcal{Y}$ be defined by $I_n(t,\omega) := \bigl(\int_a^t (\phi_n)_s\,dw_s\bigr)(\omega)\in\mathcal{Y}$. Then, $I_n$ is an adapted $\mathcal{Y}$-valued stochastic process and has continuous sample paths, $\forall n\in\mathbb{N}$. By Theorem 14.119, $|||h_n||| \le 3\,\bigl|\bigl|\bigl|\,\|\phi_n-\psi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, $\forall n\in\mathbb{N}$, where $h_n : \Omega\to\mathbb{R}_+$ is defined by $h_n(\omega) = \sup_{t\in r_{a,b}}\|I_n(t,\omega)-J(t,\omega)\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$, $\forall n\in\mathbb{N}$, and $h_n$ is $\mathcal{B}_b$-measurable. By Theorem 14.119, $I$ is a $\mathcal{Y}$-valued stochastic process that is adapted to $(\mathcal{B}_t)_{t\in\mathcal{I}}$, whose sample paths are continuous, such that $I(t)\in\int_a^t \phi_s\,dw_s$, $\forall t\in r_{a,b}$. Furthermore, $|||\bar{h}_n||| \le 3\,\bigl|\bigl|\bigl|\,\|\phi_n-\phi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, $\forall n\in\mathbb{N}$, where $\bar{h}_n : \Omega\to\mathbb{R}_+$ is defined by $\bar{h}_n(\omega) = \sup_{t\in r_{a,b}}\|I_n(t,\omega)-I(t,\omega)\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$, $\forall n\in\mathbb{N}$, and $\bar{h}_n$ is $\mathcal{B}_b$-measurable.

Clearly, $h \le h_n + \bar{h}_n$, $\forall n\in\mathbb{N}$. Then, we have $|||h||| \le \liminf_{n\in\mathbb{N}}|||h_n+\bar{h}_n||| \le \lim_{n\in\mathbb{N}}(|||h_n|||+|||\bar{h}_n|||) = \lim_{n\in\mathbb{N}}|||h_n||| + \lim_{n\in\mathbb{N}}|||\bar{h}_n||| \le \lim_{n\in\mathbb{N}} 3\,\bigl|\bigl|\bigl|\,\|\phi_n-\psi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr| + \lim_{n\in\mathbb{N}} 3\,\bigl|\bigl|\bigl|\,\|\phi_n-\phi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$, where the first inequality follows from Proposition 11.83, the second inequality follows from Lemma 14.113, the equality follows from Proposition 3.83, and the last inequality follows from the last paragraph. Note that $0 = \lim_{n\in\mathbb{N}}\|\phi-\phi_n\|_{\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})} = \lim_{n\in\mathbb{N}}\bigl\|\,\|\phi-\phi_n\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\,\bigr\|_{\bar{L}_2(\Omega,\mathbb{R})}$, where the first equality follows from the choice of $(\phi_n)_{n=1}^\infty$ and the second equality follows from Tonelli's Theorem 12.29. By Proposition 11.211 and Lemma 14.113, we have $\lim_{n\in\mathbb{N}}\bigl|\bigl|\bigl|\,\|\phi_n-\phi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr| = 0$. Note that $\lim_{n\in\mathbb{N}}\bigl\|\,\|\psi-\phi_n\|_{\bar{L}_2(r_{a,b},\mathcal{Y})} - \|\psi-\phi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\,\bigr\|_{\bar{L}_2(\Omega,\mathbb{R})} \le \lim_{n\in\mathbb{N}}\bigl\|\,\|\psi-\phi_n+\phi-\psi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\,\bigr\|_{\bar{L}_2(\Omega,\mathbb{R})} = \lim_{n\in\mathbb{N}}\bigl\|\,\|\phi-\phi_n\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\,\bigr\|_{\bar{L}_2(\Omega,\mathbb{R})} = \lim_{n\in\mathbb{N}}\|\phi-\phi_n\|_{\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})} = 0$, where the first inequality follows from Definition 7.1, the first equality follows from Tonelli's Theorem 12.29, and the second equality follows from the choice of $(\phi_n)_{n=1}^\infty$. Thus, by Proposition 11.211 and Lemma 14.113, we have $\lim_{n\in\mathbb{N}}\bigl|\bigl|\bigl|\,\|\phi_n-\psi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr| = \bigl|\bigl|\bigl|\,\|\phi-\psi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$. This further implies that $|||h||| \le 3\,\bigl|\bigl|\bigl|\,\|\phi-\psi\|_{\bar{L}_2(r_{a,b},\mathcal{Y})}\bigr|\bigr|\bigr|$.

(ii) By (c) and Definition 14.104, there exists a sequence of $\mathcal{Y}$-valued simple predictable step functions $(\psi_n)_{n=1}^\infty\subseteq\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$ such that $\lim_{n\in\mathbb{N}}\psi_n = \psi$ in $\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})$. For each $n\in\mathbb{N}$, let $J_n : r_{a,b}\times\Omega\to\mathcal{Y}$ be defined by $J_n(t,\omega) := \bigl(\int_a^t (\psi_n)_s\,dw_s\bigr)(\omega)\in\mathcal{Y}$. Then, $J_n$ is an adapted $\mathcal{Y}$-valued stochastic process and has continuous sample paths, $\forall n\in\mathbb{N}$. Now, $(\phi_n+\psi_n)_{n=1}^\infty\subseteq\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$, by Lemma 14.99, and $\lim_{n\in\mathbb{N}}(\psi_n+\phi_n) = \psi+\phi$ in $\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})$. Thus, $\psi+\phi$ is Itô integrable with respect to the Wiener process $w$ on the interval $r_{a,b}$, by Definition 14.104. Note that $(I_n+J_n)(t,\omega) = \bigl(\int_a^t ((\phi_n)_s+(\psi_n)_s)\,dw_s\bigr)(\omega)\in\mathcal{Y}$,


$\forall(t,\omega)\in r_{a,b}\times\Omega$. Let $\tilde{h}_n(\omega) := \sup_{t\in r_{a,b}}\|I_n(t,\omega)+J_n(t,\omega)-I(t,\omega)-J(t,\omega)\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$. Then, we have $|||\tilde{h}_n||| \le |||\bar{h}_n+\hat{h}_n||| \le |||\bar{h}_n|||+|||\hat{h}_n||| \to 0$ as $n\to\infty$, where $\hat{h}_n(\omega) := \sup_{t\in r_{a,b}}\|J_n(t,\omega)-J(t,\omega)\|_{\mathcal{Y}}$, $\forall\omega\in\Omega$; the first inequality follows from the definitions of $\tilde{h}_n$, $\bar{h}_n$, and $\hat{h}_n$ and the monotonicity of the triple norm for nonnegative real-valued random variables; and the second inequality follows from Lemma 14.113. Thus, $\lim_{n\in\mathbb{N}}\tilde{h}_n = 0$ in measure in $\Omega$. By Theorem 14.119 and its proof, we have that $I+J$ is an Itô process representation of the Itô integrals $\int_a^t(\psi_s+\phi_s)\,dw_s$, $t\in r_{a,b}$.

(iii) Note that, $\forall n\in\mathbb{N}$, $\alpha I_n(t,\omega) = \bigl(\int_a^t(\alpha(\phi_n)_s)\,dw_s\bigr)(\omega)\in\mathcal{Y}$, $\forall(t,\omega)\in r_{a,b}\times\Omega$. Then, $\alpha I_n : r_{a,b}\times\Omega\to\mathcal{Y}$ is an adapted $\mathcal{Y}$-valued stochastic process and has continuous sample paths, $\forall n\in\mathbb{N}$. Now, $(\alpha\phi_n)_{n=1}^\infty\subseteq\hat{\mathcal{Q}}(r_{a,b}\times\Omega,\mathcal{Y})$, by Lemma 14.99, and $\lim_{n\in\mathbb{N}}(\alpha\phi_n) = \alpha\phi$ in $\bar{L}_2(r_{a,b}\times\Omega,\mathcal{Y})$. Thus, $\alpha\phi$ is Itô integrable with respect to the Wiener process $w$ on the interval $r_{a,b}$, by Definition 14.104. By Proposition 14.111, we have $\lim_{n\in\mathbb{N}}|\alpha|\,\bar{h}_n = \lim_{n\in\mathbb{N}}\sup_{t\in r_{a,b}}\|\alpha I_n(t,\omega)-\alpha I(t,\omega)\|_{\mathcal{Y}} = 0$ in measure in $\Omega$. By Theorem 14.119 and its proof, we have that $\alpha I$ is an Itô process representation of the Itô integrals $\int_a^t(\alpha\phi_s)\,dw_s$, $t\in r_{a,b}$.

(iv) This follows immediately from (v) of Proposition 14.106 and the fact that $I_t\in\int_a^t \phi_s\,dw_s$, $\forall t\in r_{a,b}$.

(v) Clearly, $(I_t)_{t\in r_{a,b}}$ is a $\mathcal{Y}$-valued stochastic process that is adapted to the filtration $(\mathcal{B}_t)_{t\in\mathcal{I}}$. By (iv), $I_t\in\bar{L}_2(\Omega,\mathcal{Y})$, which implies that $I_t\in\bar{L}_1(\Omega,\mathcal{Y})$, $\forall t\in r_{a,b}$. Fix any $t,s\in r_{a,b}$ with $s\le t$. By (i) of Proposition 14.11 and the convexity of the norm, we have $\bigl\|\mathrm{E}\bigl(\int_s^t(\phi_\tau-(\phi_n)_\tau)\,dw_\tau|\mathcal{B}_s\bigr)\bigr\|_{\mathcal{Y}} \le \mathrm{E}\bigl(\bigl\|\int_s^t(\phi_\tau-(\phi_n)_\tau)\,dw_\tau\bigr\|_{\mathcal{Y}}\,\big|\,\mathcal{B}_s\bigr)$ and $\bigl\|\mathrm{E}\bigl(\int_s^t(\phi_\tau-(\phi_n)_\tau)\,dw_\tau|\mathcal{B}_s\bigr)\bigr\|_{\mathcal{Y}}^2 \le \bigl(\mathrm{E}\bigl(\bigl\|\int_s^t(\phi_\tau-(\phi_n)_\tau)\,dw_\tau\bigr\|_{\mathcal{Y}}\,\big|\,\mathcal{B}_s\bigr)\bigr)^2 \le \mathrm{E}\bigl(\bigl\|\int_s^t(\phi_\tau-(\phi_n)_\tau)\,dw_\tau\bigr\|_{\mathcal{Y}}^2\,\big|\,\mathcal{B}_s\bigr)$. Then, $0 \le \mathrm{E}\bigl(\bigl\|\mathrm{E}\bigl(\int_s^t(\phi_\tau-(\phi_n)_\tau)\,dw_\tau|\mathcal{B}_s\bigr)\bigr\|_{\mathcal{Y}}^2\bigr) \le \mathrm{E}\bigl(\mathrm{E}\bigl(\bigl\|\int_s^t(\phi_\tau-(\phi_n)_\tau)\,dw_\tau\bigr\|_{\mathcal{Y}}^2\,\big|\,\mathcal{B}_s\bigr)\bigr) = \mathrm{E}\bigl(\bigl\|\int_s^t(\phi_\tau-(\phi_n)_\tau)\,dw_\tau\bigr\|_{\mathcal{Y}}^2\bigr) = \bigl\|\int_s^t(\phi_\tau-(\phi_n)_\tau)\,dw_\tau\bigr\|_{\bar{L}_2(\Omega,\mathcal{Y})}^2 = \|\phi-\phi_n\|_{\bar{L}_2(r_{s,t}\times\Omega,\mathcal{Y})}^2 \to 0$ as $n\to\infty$, where the first inequality follows from (d) of Proposition 14.11, the second inequality follows from the preceding discussion, the first equality follows from (a) of Proposition 14.11, the second equality follows from Example 11.173 and Definition 14.1, the last equality follows from Proposition 14.106, and the convergence statement follows from the choice of $(\phi_n)_{n=1}^\infty$. This implies that $\lim_{n\in\mathbb{N}}\mathrm{E}\bigl(\bigl\|\mathrm{E}\bigl(\int_s^t(\phi_\tau-(\phi_n)_\tau)\,dw_\tau|\mathcal{B}_s\bigr)\bigr\|_{\mathcal{Y}}^2\bigr) = 0$. Therefore, by Propositions 11.211 and 11.57, there exists a subsequence such that $\lim_{k\in\mathbb{N}}\mathrm{E}\bigl(\int_s^t(\phi_\tau-(\phi_{n_k})_\tau)\,dw_\tau|\mathcal{B}_s\bigr) = \vartheta_{\mathcal{Y}}$ a.e. in $\Omega$. Thus, we have $\mathrm{E}(I_t|\mathcal{B}_s) = \mathrm{E}(I_s+I_t-I_s|\mathcal{B}_s) = I_s + \mathrm{E}\bigl(\int_s^t \phi_\tau\,dw_\tau|\mathcal{B}_s\bigr) = I_s + \lim_{k\in\mathbb{N}}\mathrm{E}\bigl(\int_s^t \phi_\tau\,dw_\tau - \int_s^t(\phi_{n_k})_\tau\,dw_\tau\,\big|\,\mathcal{B}_s\bigr) + \lim_{k\in\mathbb{N}}\mathrm{E}\bigl(\int_s^t(\phi_{n_k})_\tau\,dw_\tau|\mathcal{B}_s\bigr) = I_s + \lim_{k\in\mathbb{N}}\mathrm{E}\bigl(\int_s^t(\phi_\tau-(\phi_{n_k})_\tau)\,dw_\tau|\mathcal{B}_s\bigr) = I_s$ a.e. in $\Omega$, where the first equality follows from simple algebra, the second equality follows from (b) and (c) of Proposition 14.11, the third equality follows from simple algebra, the fourth equality

14 Probability Theory

follows from φ_{n_k} ∈ Q̂(r_{a,b} × Ω, Y), and the last equality follows from the preceding discussion. Hence, (I_t)_{t∈r_{a,b}} is a martingale.
(vi) Note that Aφ_n ∈ Q̂(r_{a,b} × Ω, Z) and A(ω)I_n(t, ω) = A(ω)(∫_a^t (φ_n)_s dw_s)(ω) = (∫_a^t (A (φ_n)_s) dw_s)(ω), ∀(t, ω) ∈ r_{a,b} × Ω, ∀n ∈ N, where the second equality follows from Definition A.4. Note also that lim_{n∈N} Aφ_n = Aφ in L̄_2(r_{a,b} × Ω, Z). Hence, by Theorem 14.119 and its proof, AI is a Z-valued Itô process representation of the Itô integrals ∫_a^t (Aφ_s) dw_s, t ∈ r_{a,b}.
(vii) Fix any c ∈ r_{a,b}. Let J̄ := I|_{r_{c,b}×Ω} − I_c. Clearly, J̄ is a Y-valued stochastic process that is adapted to the filtration (B_t)_{t∈I} and admits continuous sample paths. By assumption (d) and (iv) of Proposition 14.106, we have J̄_t ∈ ∫_c^t φ_s dw_s, ∀t ∈ r_{c,b}. ∀φ̄ ∈ Q̂(r_{c,b} × Ω, Y), let Φ̄ : r_{a,b} × Ω → Y be defined by Φ̄(t, ω) := I(t, ω) if t ∈ r_{a,c} and Φ̄(t, ω) := I(c, ω) + ∫_c^t φ̄_s dw_s if t ∈ r_{c,b}, and let ψ̄ : r_{a,b} × Ω → Y be defined by ψ̄(t, ω) := φ(t, ω) if t ∈ r_{a,c} and ψ̄(t, ω) := φ̄(t, ω) if t ∈ r_{c,b}. Clearly, |||h̄||| = |||h̃||| ≤ 3||| ‖ψ̄ − φ‖_{L̄_2(r_{a,b},Y)} ||| = 3||| ‖φ̄ − φ|_{r_{c,b}×Ω}‖_{L̄_2(r_{c,b},Y)} |||, where h̄(ω) := sup_{t∈r_{c,b}} ‖(∫_c^t φ̄_s dw_s)(ω) − J̄(t, ω)‖_Y and h̃(ω) := sup_{t∈r_{a,b}} ‖Φ̄(t, ω) − I(t, ω)‖_Y, and the inequality follows from (i). Hence, J̄ is a Y-valued Itô process representation of the Itô integrals ∫_c^t φ_s dw_s, t ∈ r_{c,b}.
(viii) This follows immediately from (vi) of Proposition 14.106. This completes the proof of the proposition. □

Proposition 14.121 Let I := ((R_+, |·|), B̂, μ) be the σ-finite metric measure subspace of R with partial ordering ≤, Ω := (Ω, B, (B_t)_{t∈I}, P) be a filtered probability measure space, I × Ω = (R_+ × Ω, B̄, ν) be the σ-finite product measure space, r_{a,b} ⊂ I with a ≤ b, Y be a separable Banach space over K, and g : r_{a,b} × Ω → Y be a Y-valued stochastic process that is progressively measurable such that ∫_a^b g_ω(s) ds satisfies the Riemann Criterion for Integrability, Theorem A.9, ∀ω ∈ Ω.
Then, I : r_{a,b} × Ω → Y, defined by I(t, ω) := ∫_a^t g(s, ω) ds, ∀(t, ω) ∈ r_{a,b} × Ω, is a Y-valued stochastic process that is adapted to (B_t)_{t∈I}, has continuous sample paths, and is progressively measurable.

Proof Fix any ω ∈ Ω. By the assumption and Corollary A.16, g_ω is Riemann–Stieltjes integrable with respect to id_I on the interval r_{a,b}, is absolutely Riemann–Stieltjes integrable over r_{a,b}, and the Riemann integrals equal the corresponding Lebesgue integrals. By Theorem A.11 and Proposition 11.48, I_t is B_t-measurable, ∀t ∈ r_{a,b}. By Proposition 11.84, I has continuous sample paths. By Proposition 14.102, I is a Y-valued stochastic process that is progressively measurable. This completes the proof of the proposition. □

14.13 Itô Processes

Next, we present the infinite-dimensional Itô formula.

Theorem 14.122 (Itô's Formula) Let I := ((R_+, |·|), B̂, μ) be the σ-finite metric measure subspace of R with partial ordering ≤, (Ω, B, (B_t)_{t∈I}, P) =: Ω be a filtered probability measure space, I × Ω = (R_+ × Ω, B̄, ν) be the σ-finite product measure space, m ∈ N, w : I × Ω → R^m be an R^m-valued standard Wiener process on I × Ω, where the components of w are (w^{(1)}, ..., w^{(m)}), w^{(j)} is a standard Wiener process, j = 1, ..., m, and (w_t)_{t∈I} is adapted to the filtration (B_t)_{t∈I}, r_{a,b} ⊂ I with a ≤ b, Y and Z be separable Hilbert spaces over K, and D ⊆ Y be a convex open set. Assume that:

(a) The σ-algebras B_t and σ((w_{t+s} − w_t)_{s∈I}) are independent, ∀t ∈ I.
(b) ∀E ∈ B, if P(E) = 0, then E ∈ B_0.
(c) g ∈ L̄_2(r_{a,b} × Ω, Y) is a Y-valued stochastic process that is adapted to the filtration (B_t)_{t∈I} and has right-continuous sample paths, ∫_a^b g_ω(s) ds satisfies the Riemann Criterion for Integrability, Theorem A.9, ∀ω ∈ Ω, and h^{(j)} ∈ U(r_{a,b} × Ω, Y) with the random variable c_h^{(j)} : Ω → R_+, j = 1, ..., m, where U(r_{a,b} × Ω, Y) is defined in Proposition 14.107.
(d) φ^{(j)} : r_{a,b} × Ω → Y is the Itô process representation of the Itô integral ∫_a^t h_s^{(j)} dw_s^{(j)}, t ∈ r_{a,b}, j = 1, ..., m, and φ : r_{a,b} × Ω → D is defined by

  φ(t, ω) := Σ_{j=1}^m φ^{(j)}(t, ω) + ∫_a^t g_s ds + φ(a, ω); ∀t ∈ r_{a,b}   (14.22)

where φ(a, ·) ∈ L̄_2(Ω, Y) is a Y-valued B_a-measurable random variable.

Let U : D → Z be C_2 and X(t, ω) := U(φ(t, ω)), ∀(t, ω) ∈ r_{a,b} × Ω. Assume further that:

(i) h̄^{(j)} ∈ L̄_2(r_{a,b} × Ω, Z) and is Itô integrable, j = 1, ..., m, where

  ḡ(t, ω) := Σ_{j=1}^m (1/2) U^{(2)}(φ(t, ω))(h^{(j)}(t, ω))(h^{(j)}(t, ω)) + DU(φ(t, ω))(g(t, ω)); ∀(t, ω) ∈ r_{a,b} × Ω   (14.23a)

  h̄^{(j)}(t, ω) := DU(φ(t, ω))(h^{(j)}(t, ω)); ∀(t, ω) ∈ r_{a,b} × Ω, j = 1, ..., m   (14.23b)

(ii) U^{(2)}(φ)(h^{(j)})(h^{(k)}) ∈ U(r_{a,b} × Ω, Z), j = 1, ..., m, k = 1, ..., m.
(iii) U^{(1)}(φ_t) ∈ L̄_2(Ω, B(Y, Z)) and E(‖U^{(1)}(φ_t)‖_{B(Y,Z)}^2 c_h) < ∞, ∀t ∈ r_{a,b}, where c_h : Ω → R_+ is defined by c_h := Σ_{j=1}^m c_h^{(j)}.

Then, we have the following Itô formula: ∀t ∈ r_{a,b},

  X(t, ω) − X(a, ω) = ∫_a^t ḡ(s, ω) ds + Σ_{j=1}^m l^{(j)}(t, ω)  a.e. ω ∈ Ω   (14.24a)

where l^{(j)} : r_{a,b} × Ω → Z is a Z-valued Itô process representation of the Itô integrals ∫_a^t h̄_s^{(j)} dw_s^{(j)}, t ∈ r_{a,b}, ∀j = 1, ..., m. In this case, we will abuse the notation and write, ∀t ∈ r_{a,b},

  dX_t = DU(φ_t)(g_t) dt + Σ_{j=1}^m (1/2) U^{(2)}(φ_t)(h_t^{(j)})(h_t^{(j)}) dt + Σ_{j=1}^m DU(φ_t)(h_t^{(j)}) dw_t^{(j)}  a.e. in Ω   (14.24b)
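In the simplest scalar specialization of (14.24b) (Y = Z = R, m = 1, U(x) = x^2, φ = w, so g ≡ 0 and h ≡ 1), the formula reduces to the classical identity w_t^2 = t + 2 ∫_0^t w_s dw_s. The following Monte Carlo sketch (an illustration, not part of the text; the function name and parameters are ours) checks this identity against left-endpoint Itô sums:

```python
import random

def check_ito_square(n_steps=2000, n_paths=100, t=1.0, seed=7):
    """Check w_t^2 = t + 2*int_0^t w dw (scalar Ito formula with U(x) = x^2).

    Returns the largest pathwise discrepancy |w_t^2 - t - 2*sum(w_i * dw_i)|,
    which shrinks as the mesh t/n_steps is refined.
    """
    random.seed(seed)
    dt = t / n_steps
    max_err = 0.0
    for _ in range(n_paths):
        w = 0.0
        ito_sum = 0.0
        for _ in range(n_steps):
            dw = random.gauss(0.0, dt ** 0.5)
            ito_sum += 2.0 * w * dw  # integrand sampled at the left endpoint
            w += dw
        max_err = max(max_err, abs(w * w - t - ito_sum))
    return max_err

print(check_ito_square())
```

Pathwise the discrepancy equals Σ_i (Δ_i w)^2 − t, so it vanishes in probability as the mesh refines — the same quadratic-variation effect that produces the dt correction term in (14.24b).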

Proof By (c) and Corollary A.16, g_ω is absolutely integrable over r_{a,b}. By Theorem 14.119, (a), and (b), φ^{(j)} is a Y-valued stochastic process with continuous sample paths that is adapted to (B_t)_{t∈I}, j = 1, ..., m. By Proposition 14.121, φ has continuous sample paths and is progressively measurable. By Proposition 14.102, φ, h^{(j)}, j = 1, ..., m, and g are progressively measurable. By Propositions 7.126, 11.38, and 11.39, each term in ḡ is a Z-valued stochastic process that is further adapted to (B_t)_{t∈I} and admits right-continuous sample paths. By Propositions 7.126, 11.38, and 11.39, h̄^{(j)}, j = 1, ..., m, are Z-valued stochastic processes that are adapted to the filtration (B_t)_{t∈I} and admit right-continuous sample paths. By Proposition 14.102, ḡ and h̄^{(j)}, j = 1, ..., m, are progressively measurable.

Fix any t ∈ r_{a,b} and any j = 1, ..., m. By Theorem A.18 and Corollaries A.16 and A.17, ∫_a^b h̄^{(j)}(s, ω) ds ∈ Z satisfies the Riemann Criterion for Integrability, Theorem A.9, ∀ω ∈ Ω. By (c) and Proposition 14.107, h^{(j)} is Itô integrable on r_{a,b}. By (i), h̄^{(j)} is Itô integrable on r_{a,b}. By Theorem 14.119, (a), and (b), l^{(j)} is a Z-valued stochastic process with continuous sample paths that is adapted to (B_t)_{t∈I}. Let T̂(ω) := X(t, ω) − X(a, ω) − ∫_a^t ḡ(s, ω) ds − Σ_{j=1}^m l^{(j)}(t, ω), ∀ω ∈ Ω. Fix any ω ∈ Ω; by Theorem A.18, each term in ḡ satisfies the Riemann Criterion for Integrability, Theorem A.9, with respect to the integrator id_I on the interval r_{a,b}. By Proposition 14.121, the assumption (i), and Theorem 14.119, T̂ is B_t-measurable.

Let s_i^n := a + (i/2^n)(t − a), i = 0, ..., 2^n, and, ∀ω ∈ Ω, ∀n ∈ N,

  S_{2,n}(ω) := (1/2) Σ_{j=1}^m Σ_{i=1}^{2^n} U^{(2)}(φ(s_{i−1}^n, ω))(h^{(j)}(s_{i−1}^n, ω))(h^{(j)}(s_{i−1}^n, ω)) ((w^{(j)}(s_i^n, ω) − w^{(j)}(s_{i−1}^n, ω))^2 − (s_i^n − s_{i−1}^n)) ∈ Z.

We will first calculate lim_{n∈N} E(‖S_{2,n}‖_Z^2). (Clearly, S_{2,n} is B_t-measurable.) To compress the notation, write Δ_i := s_i^n − s_{i−1}^n, Δ_i w^{(k)} := w^{(k)}_{s_i^n} − w^{(k)}_{s_{i−1}^n}, and A_{k,i} := U^{(2)}(φ_{s_{i−1}^n})(h^{(k)}_{s_{i−1}^n})(h^{(k)}_{s_{i−1}^n}). Then,

  lim_{n∈N} E(‖S_{2,n}‖_Z^2)
  = lim_{n∈N} E(⟨S_{2,n}, S_{2,n}⟩_Z)
  = lim_{n∈N} (1/4) E(⟨Σ_{k_1=1}^m Σ_{i=1}^{2^n} A_{k_1,i} ((Δ_i w^{(k_1)})^2 − Δ_i), Σ_{k_2=1}^m Σ_{j=1}^{2^n} A_{k_2,j} ((Δ_j w^{(k_2)})^2 − Δ_j)⟩_Z)
  = lim_{n∈N} (1/4) Σ_{k_1=1}^m Σ_{i=1}^{2^n} Σ_{k_2=1}^m Σ_{j=1}^{2^n} E(⟨A_{k_1,i}, A_{k_2,j}⟩_Z ((Δ_i w^{(k_1)})^2 − Δ_i)((Δ_j w^{(k_2)})^2 − Δ_j))
  = lim_{n∈N} (1/4) Σ_{k_1=1}^m Σ_{k_2=1}^m Σ_{i=1}^{2^n} Σ_{j=1}^{2^n} E(E(⟨A_{k_1,i}, A_{k_2,j}⟩_Z ((Δ_i w^{(k_1)})^2 − Δ_i)((Δ_j w^{(k_2)})^2 − Δ_j) | B_{s^n_{max{i,j}−1}}))
  = lim_{n∈N} (1/4) Σ_{k_1=1}^m Σ_{k_2=1}^m Σ_{i=1}^{2^n} E(E(⟨A_{k_1,i}, A_{k_2,i}⟩_Z ((Δ_i w^{(k_1)})^2 − Δ_i)((Δ_i w^{(k_2)})^2 − Δ_i) | B_{s^n_{i−1}}))
  = lim_{n∈N} (1/4) Σ_{k_1=1}^m Σ_{k_2=1}^m Σ_{i=1}^{2^n} E(⟨A_{k_1,i}, A_{k_2,i}⟩_Z E(((Δ_i w^{(k_1)})^2 − Δ_i)((Δ_i w^{(k_2)})^2 − Δ_i) | B_{s^n_{i−1}}))
  = lim_{n∈N} (1/4) Σ_{k=1}^m Σ_{i=1}^{2^n} E(⟨A_{k,i}, A_{k,i}⟩_Z E(((Δ_i w^{(k)})^2 − Δ_i)^2 | B_{s^n_{i−1}}))
  = lim_{n∈N} (1/2) Σ_{k=1}^m Σ_{i=1}^{2^n} E(‖A_{k,i}‖_Z^2) Δ_i^2
  = lim_{n∈N} ((t − a)/2^{n+1}) Σ_{k=1}^m Σ_{i=1}^{2^n} E(‖A_{k,i}‖_Z^2) Δ_i
  = 0

where the first equality follows from Proposition 13.2, the second equality follows from the definition of S_{2,n}, the third equality follows from Definition 13.1 and simple algebra, the fourth equality follows from (a) of Proposition 14.11, the fifth equality follows from assumption (a) (the cross terms with i ≠ j vanish), the sixth equality follows from (h) of Proposition 14.11, the seventh equality follows from Definition 14.78 (the cross terms with k_1 ≠ k_2 vanish), the eighth equality follows from Proposition 13.2 and the fact that w^{(k)}_{t+h} − w^{(k)}_t ∼ N(0, h) together with assumption (a), which give E(((Δ_i w^{(k)})^2 − Δ_i)^2 | B_{s^n_{i−1}}) = 2Δ_i^2, the ninth equality follows from the definition of s_i^n, and the last equality follows from assumption (ii), Proposition 14.107, and Lebesgue Dominated Convergence Theorem 11.91. Hence, we have lim_{n∈N} S_{2,n} = ϑ_{L̄_2(Ω,Z)} in L̄_2(Ω, Z). By Proposition 11.211, lim_{n∈N} S_{2,n} = ϑ_Z in measure in Ω.
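The collapse of E(‖S_{2,n}‖_Z^2) computed above hinges on an elementary moment identity: if Δw ∼ N(0, Δ), then E(((Δw)^2 − Δ)^2) = Var((Δw)^2) = 2Δ^2, one order smaller than Δ. A Monte Carlo sanity check of this identity (a sketch, not part of the text; names are ours):

```python
import random

def second_moment_of_centered_square(delta, n_samples=200_000, seed=1):
    """Estimate E(((dw)^2 - delta)^2) for dw ~ N(0, delta); theory gives 2*delta^2."""
    random.seed(seed)
    sd = delta ** 0.5
    total = 0.0
    for _ in range(n_samples):
        dw = random.gauss(0.0, sd)
        total += (dw * dw - delta) ** 2
    return total / n_samples

est = second_moment_of_centered_square(0.01)
print(est, 2 * 0.01 ** 2)
```

Since each of the 2^n conditioned summands contributes O(Δ_i^2) = O((t − a)^2/4^n), the whole double sum is O(2^{-n}), which is why S_{2,n} → ϑ_Z in L̄_2(Ω, Z).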

Note that, by Taylor's Theorem 12.122, ∀x_1, x_2 ∈ D,

  U(x_1) − U(x_2)
  = U^{(1)}(x_2)(x_1 − x_2) + ∫_0^1 τ U^{(2)}(x_2 + (1 − τ)(x_1 − x_2))(x_1 − x_2)(x_1 − x_2) dτ
  = U^{(1)}(x_2)(x_1 − x_2) + ∫_0^1 (∫_0^τ ds) U^{(2)}(x_2 + (1 − τ)(x_1 − x_2))(x_1 − x_2)(x_1 − x_2) dτ
  = U^{(1)}(x_2)(x_1 − x_2) + ∫_0^1 ∫_s^1 U^{(2)}(x_2 + (1 − τ)(x_1 − x_2))(x_1 − x_2)(x_1 − x_2) dτ ds
  = U^{(1)}(x_2)(x_1 − x_2) + (1/2) U^{(2)}(x_2)(x_1 − x_2)(x_1 − x_2) + ∫_0^1 ∫_s^1 (U^{(2)}(x_2 + (1 − τ)(x_1 − x_2)) − U^{(2)}(x_2))(x_1 − x_2)(x_1 − x_2) dτ ds
  = U^{(1)}(x_2)(x_1 − x_2) + (1/2) U^{(2)}(x_2)(x_1 − x_2)(x_1 − x_2) + ∫_0^1 ∫_0^s (U^{(2)}(x_2 + τ(x_1 − x_2)) − U^{(2)}(x_2))(x_1 − x_2)(x_1 − x_2) dτ ds

where the second equality follows from Proposition 11.75, the third equality follows from Fubini's Theorem 12.31, the fourth equality follows from Proposition 11.75, and the last equality follows from Change of Variable, Theorem 12.91.

Note the following line of arguments, in which we abbreviate Δφ_i := φ_{s_i^n} − φ_{s_{i−1}^n}, Δφ^{(j)}_i := φ^{(j)}_{s_i^n} − φ^{(j)}_{s_{i−1}^n}, Δ_i w^{(j)} := w^{(j)}_{s_i^n} − w^{(j)}_{s_{i−1}^n}, Δ_i := s_i^n − s_{i−1}^n, and G_i := ∫_{s_{i−1}^n}^{s_i^n} g_s ds:

  T̂ − S_{2,n}
  = Σ_{i=1}^{2^n} (X_{s_i^n} − X_{s_{i−1}^n} − ∫_{s_{i−1}^n}^{s_i^n} ḡ_s ds) − S_{2,n} − Σ_{j=1}^m l_t^{(j)}
  = Σ_{i=1}^{2^n} (U(φ_{s_i^n}) − U(φ_{s_{i−1}^n}) − ∫_{s_{i−1}^n}^{s_i^n} ḡ_s ds) − S_{2,n} − Σ_{j=1}^m l_t^{(j)}
  = Σ_{i=1}^{2^n} (U^{(1)}(φ_{s_{i−1}^n})(Δφ_i) + (1/2) U^{(2)}(φ_{s_{i−1}^n})(Δφ_i)(Δφ_i) + ∫_0^1 ∫_0^s̄ (U^{(2)}(φ_{s_{i−1}^n} + τ Δφ_i) − U^{(2)}(φ_{s_{i−1}^n}))(Δφ_i)(Δφ_i) dτ ds̄ − ∫_{s_{i−1}^n}^{s_i^n} ḡ_s ds) − S_{2,n} − Σ_{j=1}^m l_t^{(j)}
  = Σ_{i=1}^{2^n} (U^{(1)}(φ_{s_{i−1}^n})(G_i + Σ_{j=1}^m Δφ^{(j)}_i) − ∫_{s_{i−1}^n}^{s_i^n} U^{(1)}(φ_s)(g_s) ds + (1/2) U^{(2)}(φ_{s_{i−1}^n})(G_i + Σ_{j=1}^m Δφ^{(j)}_i)(G_i + Σ_{j=1}^m Δφ^{(j)}_i) − (1/2) Σ_{j=1}^m ∫_{s_{i−1}^n}^{s_i^n} U^{(2)}(φ_s)(h_s^{(j)})(h_s^{(j)}) ds + ∫_0^1 ∫_0^s̄ (U^{(2)}(φ_{s_{i−1}^n} + τ Δφ_i) − U^{(2)}(φ_{s_{i−1}^n}))(G_i + Σ_{j=1}^m Δφ^{(j)}_i)(G_i + Σ_{j=1}^m Δφ^{(j)}_i) dτ ds̄) − S_{2,n} − Σ_{j=1}^m l_t^{(j)}
  = Σ_{i=1}^{2^n} U^{(1)}(φ_{s_{i−1}^n})(G_i) − ∫_a^t U^{(1)}(φ_s)(g_s) ds
   + Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ U^{(2)}(φ_{s_{i−1}^n} + τ Δφ_i)(G_i)(G_i) dτ ds̄ + 2 Σ_{j=1}^m Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ U^{(2)}(φ_{s_{i−1}^n} + τ Δφ_i)(G_i)(Δφ^{(j)}_i) dτ ds̄
   + (1/2) Σ_{j_1=1}^m Σ_{j_2=1}^m Σ_{i=1}^{2^n} U^{(2)}(φ_{s_{i−1}^n})(Δφ^{(j_1)}_i)(Δφ^{(j_2)}_i) − (1/2) Σ_{j=1}^m ∫_a^t U^{(2)}(φ_s)(h_s^{(j)})(h_s^{(j)}) ds
   − (1/2) Σ_{j=1}^m Σ_{i=1}^{2^n} U^{(2)}(φ_{s_{i−1}^n})(h^{(j)}_{s_{i−1}^n})(h^{(j)}_{s_{i−1}^n})((Δ_i w^{(j)})^2 − Δ_i)
   + Σ_{j=1}^m (Σ_{i=1}^{2^n} U^{(1)}(φ_{s_{i−1}^n})(Δφ^{(j)}_i) − l_t^{(j)})
   + Σ_{j_1=1}^m Σ_{j_2=1}^m Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ (U^{(2)}(φ_{s_{i−1}^n} + τ Δφ_i) − U^{(2)}(φ_{s_{i−1}^n}))(Δφ^{(j_1)}_i)(Δφ^{(j_2)}_i) dτ ds̄
  =: T_{1,n} + T_{2,n} + T_{3,n} + T_{4,n} + T_{5,n} + T_{6,n}

where

  Γ(s, ω) := Σ_{i=1}^{2^n} φ_{s_{i−1}^n}(ω) χ_{[s_{i−1}^n, s_i^n), r_{a,t}}(s); ∀(s, ω) ∈ r_{a,t} × Ω

  T_{1,n} := ∫_a^t (U^{(1)}(Γ_s) − U^{(1)}(φ_s))(g_s) ds

  T_{2,n} := Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ U^{(2)}(φ_{s_{i−1}^n} + τ Δφ_i)(G_i)(G_i) dτ ds̄ + 2 Σ_{j=1}^m Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ U^{(2)}(φ_{s_{i−1}^n} + τ Δφ_i)(G_i)(Δφ^{(j)}_i) dτ ds̄ =: T_{2,n,1} + T_{2,n,2}

  T_{3,n} := (1/2) Σ_{j=1}^m (Σ_{i=1}^{2^n} U^{(2)}(φ_{s_{i−1}^n})(h^{(j)}_{s_{i−1}^n})(h^{(j)}_{s_{i−1}^n}) Δ_i − ∫_a^t U^{(2)}(φ_s)(h_s^{(j)})(h_s^{(j)}) ds)

  T_{4,n} := (1/2) Σ_{j_1=1}^m Σ_{j_2=1}^m Σ_{i=1}^{2^n} U^{(2)}(φ_{s_{i−1}^n})(Δφ^{(j_1)}_i)(Δφ^{(j_2)}_i) − (1/2) Σ_{j=1}^m Σ_{i=1}^{2^n} U^{(2)}(φ_{s_{i−1}^n})(h^{(j)}_{s_{i−1}^n})(h^{(j)}_{s_{i−1}^n})(Δ_i w^{(j)})^2

  T_{5,n} := Σ_{j=1}^m (Σ_{i=1}^{2^n} U^{(1)}(φ_{s_{i−1}^n})(Δφ^{(j)}_i) − l_t^{(j)})

  T_{6,n} := Σ_{j_1=1}^m Σ_{j_2=1}^m Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ (U^{(2)}(φ_{s_{i−1}^n} + τ Δφ_i) − U^{(2)}(φ_{s_{i−1}^n}))(Δφ^{(j_1)}_i)(Δφ^{(j_2)}_i) dτ ds̄

Here, the first equality follows from the definition of T̂ and Proposition 11.92, the second equality follows from the definition of X, the third equality follows from the result of the last paragraph, the fourth equality follows from assumptions (d), (i), (ii), Proposition 11.92, Theorem A.7, and the discussion in the fourth paragraph of the proof, the fifth equality follows from Definitions 9.3 and 9.25, Proposition 9.28, and the definition of S_{2,n}, the last equality follows from simple algebra, Proposition 11.92, and Theorem 12.83, and the identification of T_{1,n} follows from the definition of Γ. Now, we will deal with the terms above one by one.
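The decomposition above repeatedly uses the integral-remainder form of Taylor's Theorem derived earlier. In the scalar case it reads U(x_1) − U(x_2) = U'(x_2)Δ + (1/2)U''(x_2)Δ^2 + ∫_0^1 ∫_0^s (U''(x_2 + τΔ) − U''(x_2)) Δ^2 dτ ds with Δ := x_1 − x_2. A numerical check for U = exp (a sketch, not part of the text; names are ours), approximating the double integral over the triangle by a midpoint rule:

```python
import math

def taylor_remainder_check(x2=0.3, dx=0.5, grid=800):
    """Compare U(x1)-U(x2) with U'(x2)dx + 0.5*U''(x2)dx^2 + double-integral
    remainder for U = exp (so U = U' = U''), via midpoint quadrature."""
    x1 = x2 + dx
    u = math.exp
    lhs = u(x1) - u(x2)
    main = u(x2) * dx + 0.5 * u(x2) * dx * dx
    rem = 0.0
    h = 1.0 / grid
    for i in range(grid):                 # outer variable s in (0, 1)
        s = (i + 0.5) * h
        m = max(1, int(s * grid))
        hs = s / m
        inner = 0.0
        for k in range(m):                # inner variable tau in (0, s)
            tau = (k + 0.5) * hs
            inner += (u(x2 + tau * dx) - u(x2)) * dx * dx * hs
        rem += inner * h
    return lhs, main + rem

lhs, rhs = taylor_remainder_check()
print(lhs, rhs)
```

For U = exp the remainder can also be evaluated in closed form, e^{x_2}(e^Δ − 1 − Δ − Δ^2/2), so the two returned values agree up to quadrature error.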


Since φ_ω : r_{a,b} → D is continuous, ∀ω ∈ Ω, and r_{a,b} is compact, φ_ω is uniformly continuous. Then, Γ_ω converges to φ_ω uniformly as n → ∞, ∀ω ∈ Ω. By the same argument, U^{(1)}(Γ_ω) converges to U^{(1)}(φ_ω) uniformly as n → ∞, ∀ω ∈ Ω. Note that, ∀ω ∈ Ω,

  ‖T_{1,n}(ω)‖_Z = ‖∫_a^t (U^{(1)}(Γ(s, ω)) − U^{(1)}(φ(s, ω)))(g(s, ω)) ds‖_Z
  ≤ ∫_a^t ‖(U^{(1)}(Γ(s, ω)) − U^{(1)}(φ(s, ω)))(g(s, ω))‖_Z ds
  ≤ ∫_a^t ‖U^{(1)}(Γ(s, ω)) − U^{(1)}(φ(s, ω))‖_{B(Y,Z)} ‖g(s, ω)‖_Y ds
  ≤ ∫_a^t δ̂_n(ω) ‖g(s, ω)‖_Y ds = δ̂_n(ω) ∫_a^t ‖g(s, ω)‖_Y ds ≤ δ̂_n(ω) ‖g_ω‖_{L̄_1(r_{a,b},Y)}

where the first inequality follows from Proposition 11.92, the second inequality follows from Proposition 7.64, the third inequality follows from the definition

  δ̂_n(ω) := sup_{u,v∈r_{a,b}, |u−v|≤2^{−n}(b−a)} ‖U^{(1)}(φ(v, ω)) − U^{(1)}(φ(u, ω))‖_{B(Y,Z)}   (14.25)

and the second equality follows from Proposition 11.92. It follows from Proposition 14.109 that δ̂_n is B-measurable, ∀n ∈ N. By the fact that lim_{n∈N} δ̂_n(ω) = 0, ∀ω ∈ Ω, we have lim_{n∈N} T_{1,n}(ω) = ϑ_Z, ∀ω ∈ Ω.

Now, we analyze the term T_{2,n} = T_{2,n,1} + T_{2,n,2}. Note that, ∀ω ∈ Ω,

  ‖T_{2,n,1}(ω)‖_Z = ‖Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ U^{(2)}(φ_{s_{i−1}^n} + τ(φ_{s_i^n} − φ_{s_{i−1}^n}))(∫_{s_{i−1}^n}^{s_i^n} g_s ds)(∫_{s_{i−1}^n}^{s_i^n} g_s ds) dτ ds̄‖_Z
  ≤ Σ_{i=1}^{2^n} ‖∫_0^1 ∫_0^s̄ U^{(2)}(φ_{s_{i−1}^n} + τ(φ_{s_i^n} − φ_{s_{i−1}^n}))(∫_{s_{i−1}^n}^{s_i^n} g_s ds)(∫_{s_{i−1}^n}^{s_i^n} g_s ds) dτ ds̄‖_Z
  ≤ Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ ‖U^{(2)}(φ_{s_{i−1}^n} + τ(φ_{s_i^n} − φ_{s_{i−1}^n}))(∫_{s_{i−1}^n}^{s_i^n} g_s ds)(∫_{s_{i−1}^n}^{s_i^n} g_s ds)‖_Z dτ ds̄
  ≤ Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ ‖U^{(2)}(φ_{s_{i−1}^n} + τ(φ_{s_i^n} − φ_{s_{i−1}^n}))‖_{B(Y,B(Y,Z))} ‖∫_{s_{i−1}^n}^{s_i^n} g_s ds‖_Y^2 dτ ds̄
  ≤ Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ ‖U^{(2)}(φ_{s_{i−1}^n} + τ(φ_{s_i^n} − φ_{s_{i−1}^n}))‖_{B(Y,B(Y,Z))} (∫_{s_{i−1}^n}^{s_i^n} ‖g_s‖_Y ds)^2 dτ ds̄
  ≤ Σ_{i=1}^{2^n} ∫_0^1 ∫_0^s̄ ‖U^{(2)}(φ_{s_{i−1}^n} + τ(φ_{s_i^n} − φ_{s_{i−1}^n}))‖_{B(Y,B(Y,Z))} (∫_{s_{i−1}^n}^{s_i^n} ‖g_s‖_Y^2 ds)(s_i^n − s_{i−1}^n) dτ ds̄
  = ((t − a)/2^n) Σ_{i=1}^{2^n} (∫_0^1 ∫_0^s̄ ‖U^{(2)}(φ_{s_{i−1}^n} + τ(φ_{s_i^n} − φ_{s_{i−1}^n}))‖_{B(Y,B(Y,Z))} dτ ds̄) ∫_{s_{i−1}^n}^{s_i^n} ‖g(s, ω)‖_Y^2 ds
  ≤ ((b − a)/2^{n+1}) M_2(ω) Σ_{i=1}^{2^n} ∫_{s_{i−1}^n}^{s_i^n} ‖g(s, ω)‖_Y^2 ds
  ≤ ((b − a)/2^{n+1}) M_2(ω) ‖g_ω‖_{L̄_2(r_{a,b},Y)}^2

where the first inequality follows from the triangle inequality, the second inequality follows from Proposition 11.92, the third inequality follows from Proposition 7.64, the fourth inequality follows from Proposition 11.92 and the assumption (c), the fifth inequality follows from the Cauchy–Schwarz Inequality, the equality follows from Proposition 11.92 and the definition of s_i^n, and the sixth inequality follows from the definition

  M_2(ω) := sup_{v_1,v_2∈r_{a,b}, α∈r_{0,1}} ‖U^{(2)}(α φ(v_1, ω) + (1 − α) φ(v_2, ω))‖_{B(Y,B(Y,Z))}

together with ∫_0^1 ∫_0^s̄ dτ ds̄ = 1/2 and t − a ≤ b − a; the last inequality is clear.

A Elements in Calculus

By the assumption, we have f is bounded, i.e., ∃M ∈ [0, ∞) ⊂ R such that ‖f(s)‖_Y ≤ M, ∀s ∈ I. Now, fix any ε ∈ (0, ∞) ⊂ R. By the Riemann Criterion for Integrability, Theorem A.9, and its proof, there exists a sampled partition R̂_ε := ((s_i)_{i=0}^n, (ξ_i)_{i=1}^n) ∈ Î(I) such that Σ_{i=1}^n M_i (s_i − s_{i−1}) < ε/2. Then, n ∈ N (since a < b), a = s_0 ≤ s_1 ≤ s_2 ≤ ··· ≤ s_n = b, and ξ_i ∈ [s_{i−1}, s_i] ⊂ R, i = 1, ..., n. Without loss of generality, we may assume that a = s_0 < s_1 < s_2 < ··· < s_n = b. Define δ := (min_{i=1,...,n} (s_i − s_{i−1}))/3 ∧ ε/(8(M + 1)n) ∈ (0, ∞) ⊂ R. Fix any R̂ ∈ Î(I) with Gauge(R̂) < δ. Let R̂ = ((ŝ_i)_{i=0}^n̂, (ξ̂_i)_{i=1}^n̂). Clearly, n̂ ∈ N. Without loss of generality, assume a = ŝ_0 < ŝ_1 < ··· < ŝ_n̂ = b, and ξ̂_i ∈ [ŝ_{i−1}, ŝ_i] ⊂ R, i = 1, ..., n̂. Let {š_i ∈ I | i = 0, ..., ň} := {s_0, ..., s_n} ∪ {ŝ_0, ..., ŝ_n̂}. Then, ň ∈ N and we may define (š_i)_{i=0}^ň such that a = š_0 < š_1 < ··· < š_ň = b. It is clear that there exist i_0, ..., i_n ∈ {0, ..., ň} such that s_j = š_{i_j}, j = 0, ..., n, with i_0 = 0 and i_n = ň. By our choice of δ, we have 0 = i_0 < i_0 + 2 < i_1 < i_1 + 2 < i_2 < i_2 + 2 < ··· < i_n = ň. Let ξ̌_i := š_{i−1}, i = 1, ..., ň. Now, define Q̂ := ((š_i)_{i=0}^ň, (ξ̌_i)_{i=1}^ň). Clearly, Q̂ ∈ Î(I). Then, we have the following line of arguments:

  Σ_{i=1}^n̂ M̂_i (ŝ_i − ŝ_{i−1}) ≤ |Σ_{i=1}^n̂ M̂_i (ŝ_i − ŝ_{i−1}) − Σ_{i=1}^ň M̌_i (š_i − š_{i−1})| + |Σ_{i=1}^ň M̌_i (š_i − š_{i−1})|
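The common-refinement step above — merging the partition points of R̂_ε and R̂ into a single partition Q̂ — is the standard device for comparing two Riemann-type sums over the same interval. A small sketch (not from the text; helper names are ours) of the merge, together with the fact that refining a partition can only decrease an upper Darboux-type sum:

```python
def merge_partitions(p1, p2):
    """Common refinement of two partitions of the same interval [a, b]."""
    assert p1[0] == p2[0] and p1[-1] == p2[-1]
    return sorted(set(p1) | set(p2))

def upper_sum_increasing(f, partition):
    """Upper Darboux sum for an increasing f: the sup on [s_{i-1}, s_i] is f(s_i)."""
    return sum(f(partition[i]) * (partition[i] - partition[i - 1])
               for i in range(1, len(partition)))

def f(x):
    return x * x  # increasing on [0, 2]

coarse = [0.0, 0.8, 2.0]
other = [0.0, 0.5, 1.0, 1.5, 2.0]
refined = merge_partitions(coarse, other)
print(refined)
print(upper_sum_increasing(f, refined) <= upper_sum_increasing(f, coarse))
```

The refined sum is bounded by the sum over either original partition, which is what lets the proof pass from the sum over Q̂ back to the ε/2 bound obtained for R̂_ε.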