English Pages 370 [355] Year 2021
From Euclidean to Hilbert Spaces
To my mentors, Sissa Abbati and Renzo Cirelli, who taught me the importance of rigor in mathematics, and to Brunella, Paola, Clara and Tommo, whose passion for their work has both helped and brought joy to many
From Euclidean to Hilbert Spaces Introduction to Functional Analysis and its Applications
Edoardo Provenzi
First published 2021 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address: ISTE Ltd 27-37 St George’s Road London SW19 4EU UK
John Wiley & Sons, Inc. 111 River Street Hoboken, NJ 07030 USA
www.iste.co.uk
www.wiley.com
© ISTE Ltd 2021 The rights of Edoardo Provenzi to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988. Library of Congress Control Number: 2021937006 British Library Cataloguing-in-Publication Data A CIP record for this book is available from the British Library ISBN 978-1-78630-682-1
Contents

Preface

Chapter 1. Inner Product Spaces (Pre-Hilbert)
  1.1. Real and complex inner products
  1.2. The norm associated with an inner product and normed vector spaces
    1.2.1. The parallelogram law and the polarization formula
  1.3. Orthogonal and orthonormal families in inner product spaces
  1.4. Generalized Pythagorean theorem
  1.5. Orthogonality and linear independence
  1.6. Orthogonal projection in inner product spaces
  1.7. Existence of an orthonormal basis: the Gram-Schmidt process
  1.8. Fundamental properties of orthonormal and orthogonal bases
  1.9. Summary
Chapter 2. The Discrete Fourier Transform and its Applications to Signal and Image Processing
  2.1. The space $\ell^2(\mathbb{Z}_N)$ and its canonical basis
    2.1.1. The orthogonal basis of complex exponentials in $\ell^2(\mathbb{Z}_N)$
  2.2. The orthonormal Fourier basis of $\ell^2(\mathbb{Z}_N)$
  2.3. The orthogonal Fourier basis of $\ell^2(\mathbb{Z}_N)$
  2.4. Fourier coefficients and the discrete Fourier transform
    2.4.1. The inverse discrete Fourier transform
    2.4.2. Definition of the DFT and the IDFT with the orthonormal Fourier basis
    2.4.3. The real (orthonormal) Fourier basis
  2.5. Matrix interpretation of the DFT and the IDFT
    2.5.1. The fast Fourier transform
  2.6. The Fourier transform in signal processing
    2.6.1. Synthesis formula for 1D signals: decomposition on the harmonic basis
    2.6.2. Signification of Fourier coefficients and spectrums of a 1D signal
    2.6.3. The synthesis formula and Fourier coefficients of the unit pulse
    2.6.4. High and low frequencies in the synthesis formula
    2.6.5. Signal filtering in frequency representation
    2.6.6. The multiplication operator and its diagonal matrix representation
    2.6.7. The Fourier multiplier operator
  2.7. Properties of the DFT
    2.7.1. Periodicity of $\hat{z}$ and $\check{z}$
    2.7.2. DFT and shift
    2.7.3. DFT and conjugation
    2.7.4. DFT and convolution
  2.8. The DFT and stationary operators
    2.8.1. The DFT and the diagonalization of stationary operators
    2.8.2. Circulant matrices
    2.8.3. Exhaustive characterization of stationary operators
    2.8.4. High-pass, low-pass and band-pass filters
    2.8.5. Characterizing stationary operators using shift operators
    2.8.6. Frequency analysis of first and second derivation operators (discrete case)
  2.9. The two-dimensional discrete Fourier transform (2D DFT)
    2.9.1. Matrix representation of the 2D DFT: Kronecker product versus iteration of two 1D DFTs
    2.9.2. Properties of the 2D DFT
    2.9.3. 2D DFT and stationary operators
    2.9.4. Gradient and Laplace operators and their action on digital images
    2.9.5. Visualization of the amplitude spectrum in 2D
    2.9.6. Filtering: an example of digital image filtering in a Fourier space
  2.10. Summary

Chapter 3. Lebesgue’s Measure and Integration Theory
  3.1. Riemann versus Lebesgue
  3.2. σ-algebra, measurable space, measures and measured spaces
  3.3. Measurable functions and almost-everywhere properties (a.e.)
  3.4. Integrable functions and Lebesgue integrals
  3.5. Characterization of the Lebesgue measure on $\mathbb{R}$ and sets with a null Lebesgue measure
  3.6. Three theorems for limit operations in integration theory
  3.7. Summary

Chapter 4. Banach Spaces and Hilbert Spaces
  4.1. Metric topology of inner product spaces
  4.2. Continuity of fundamental operations in inner product spaces
    4.2.1. Equivalence of separated topologies in finite-dimension vector spaces
  4.3. Cauchy sequences and completeness: Banach and Hilbert
    4.3.1. Completeness of vector spaces
    4.3.2. Characterizing the completeness of normed vector spaces using series
    4.3.3. Banach fixed-point theorem
  4.4. Remarkable examples of Banach and Hilbert spaces
    4.4.2. $L^\infty$ and $\ell^\infty$ spaces
    4.4.3. Inclusion relationships between $\ell^p$ spaces
    4.4.4. Inclusion relationships between $L^p$ spaces
    4.4.5. Density theorems in $L^p(X, \mathcal{A}, \mu)$
  4.5. Summary
Chapter 5. The Geometric Structure of Hilbert Spaces
  5.1. The orthogonal complement in a Hilbert space and its properties
  5.2. Projection onto closed convex sets: theorem and consequences
    5.2.1. Characterization of closed vector subspaces in Hilbert spaces
  5.3. Polar and bipolar subsets of a Hilbert space
  5.4. The (orthogonal) projection theorem in a Hilbert space
  5.5. Orthonormal systems and Hilbert bases
    5.5.1. Bessel’s inequality and Fourier coefficients
    5.5.2. The Fischer-Riesz theorem
    5.5.3. Characterizations of a Hilbert basis (or complete orthonormal system)
    5.5.4. Isomorphisms between Hilbert spaces
    5.5.5. $\ell^2(\mathbb{N}, \mathbb{K})$ as the prototype of separable Hilbert spaces of infinite dimension
  5.6. The Fourier Hilbert basis in $L^2$
    5.6.1. $L^2[-\pi, \pi]$ or $L^2[0, 2\pi]$
    5.6.2. $L^2(\mathbb{T})$
    5.6.3. $L^2[a, b]$
    5.6.4. Real Fourier series
    5.6.5. Pointwise convergence of the real Fourier series: Dirichlet’s theorem
    5.6.6. The Gibbs phenomenon and Cesàro summation
    5.6.7. Speed of convergence to 0 of Fourier coefficients
    5.6.8. Fourier transform in $L^2(\mathbb{T})$ and shift
  5.7. Summary

Chapter 6. Bounded Linear Operators in Hilbert Spaces
  6.1. Fundamental properties of bounded linear operators between normed vector spaces
    6.1.1. Continuity of linear operators defined on a finite-dimensional normed vector space
  6.2. The operator norm, convergence of operator sequences and Banach algebras
    6.2.1. A classical example of a non-bounded linear operator on a vector space of infinite dimension
  6.3. Invertibility of linear operators
  6.4. The dual of a Hilbert space and the Riesz representation theorem
    6.4.1. The scalar product induced on the dual of a Hilbert space
  6.5. Bilinear forms, sesquilinear forms and associated quadratic forms
    6.5.1. The Lax-Milgram theorem and its consequences
  6.6. The adjoint operator: presentation and properties
  6.7. Orthogonal projection operators in a Hilbert space
    6.7.1. Bounded multiplication operators and their relation to orthogonal projectors
    6.7.2. Geometric realization of orthogonal projection operators via orthonormal systems
  6.8. Isometric and unitary operators
    6.8.1. Characterizations of isometric and unitary operators
    6.8.2. Relationship between isometric and unitary operators and orthonormal systems
  6.9. The Fourier transform on $\mathcal{S}(\mathbb{R}^n)$, $L^1(\mathbb{R}^n)$ and $L^2(\mathbb{R}^n)$
    6.9.1. The invariance of the Schwartz space with respect to the Fourier transform
    6.9.2. Extension of the Fourier transform of $\mathcal{S}(\mathbb{R}^n)$ to $L^1(\mathbb{R}^n)$: the Riemann-Lebesgue theorem
    6.9.3. Extension of the Fourier transform to a unitary operator on $L^2(\mathbb{R}^n)$: the Fourier-Plancherel transform
    6.9.4. Relationship between the Fourier-Plancherel transform and the Hermitian Hilbert basis
    6.9.5. The Fourier transform and convolution
    6.9.6. Convolution and Fourier transforms in $L^2$: localization of the Fourier transform
  6.10. The Nyquist-Shannon sampling theorem
    6.10.1. The Nyquist frequency: aliasing and oversampling
  6.11. Application of the Fourier transform to solve ordinary and partial differential equations
    6.11.1. Solving an ordinary differential equation using the Fourier transform
    6.11.2. The Fourier transform and partial differential equations
    6.11.3. Solving the partial differential equation for heat propagation using the Fourier transform
  6.12. Summary

Appendix 1: Quotient Space
Appendix 2: The Transpose (or Dual) of a Linear Operator
Appendix 3: Uniform, Strong and Weak Convergence
References
Index
Preface
This book provides an introduction to the key theoretical concepts associated with Hilbert spaces and with operators defined over these spaces. Our decision to dedicate a whole book to the subject of Hilbert spaces stems from a simple observation: of all the infinite-dimensional vector spaces, Hilbert spaces bear the closest resemblance to finite-dimensional Euclidean spaces, that is, $\mathbb{R}^n$ or $\mathbb{C}^n$, which provide the framework for classical analysis and linear algebra.

The topological subtleties which come into play in infinite dimensions mean that certain conditions (which are always satisfied in finite dimensions) must be imposed in order to maintain the validity of known results from Euclidean spaces. For Hilbert spaces, one of these topological conditions is completeness, that is, any Cauchy sequence must converge in the space in which it is defined.

From this perspective, the theory of Hilbert spaces may be seen as an elegant conjunction of algebra, analysis and topology. It draws on the work of some of the great mathematicians of the early 20th century, including Riesz, Banach and, of course, Hilbert, who established the conditions needed to extend classical algebra and analysis into infinite dimensions.

One particularly important linear operator, the Fourier transform, appears on multiple occasions throughout this book. We start by examining the properties of this transform in finite dimensions, with the discrete Fourier transform, before extending it to infinite dimensions, considering the use of this operator in a range of different domains, including signal and image processing.

A clear understanding of the concepts introduced in this book is essential for mathematicians, physicists or engineers hoping to progress in any field, whether applied or theoretical. These concepts provide access to tools and techniques
developed over a particularly rich, creative period in the history of mathematics, which remain relevant for both pure and applied forms of the subject.

The author would like to thank Olivier Husson for his assistance in producing the majority of the figures included in this book.

April 2021
1 Inner Product Spaces (Pre-Hilbert)
This chapter will focus on inner product spaces, that is, vector spaces with a scalar product, specifically those of finite dimension.

1.1. Real and complex inner products

In the real Euclidean spaces $\mathbb{R}^2$ and $\mathbb{R}^3$, the inner product of two vectors $v, w$ is defined as the real number:

$$v \bullet w = \langle v, w \rangle = \|v\| \, \|w\| \cos(\vartheta)$$

where $\vartheta$ is the smallest angle between $v$ and $w$ and $\| \cdot \|$ represents the norm (or the magnitude) of the vectors.

Using the inner product, it is possible to define the orthogonal projection of vector $v$ in the direction defined by vector $w$. A distinction must be made between:
– the scalar projection of $v$ in the direction of $w$: $\|v\| \cos(\vartheta) = \frac{\langle v, w \rangle}{\|w\|}$; and
– the vector projection of $v$ in the direction of $w$: $\|v\| \cos(\vartheta) \frac{w}{\|w\|} = \frac{\langle v, w \rangle}{\|w\|^2} \, w$;

where $\frac{w}{\|w\|}$ is the unit vector in the direction of $w$. Evidently, the roles of $v$ and $w$ can be reversed.
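The two projections above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`dot`, `norm`, `scalar_projection`, `vector_projection`) and the sample vectors are hypothetical and do not come from the text.

```python
import math

def dot(v, w):
    """Canonical inner product of two real vectors given as sequences."""
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    """Euclidean norm sqrt(<v, v>)."""
    return math.sqrt(dot(v, v))

def scalar_projection(v, w):
    """Signed length <v, w> / ||w|| of the projection of v on the direction of w (w != 0)."""
    return dot(v, w) / norm(w)

def vector_projection(v, w):
    """Vector (<v, w> / ||w||^2) w, the projection of v on the line spanned by w (w != 0)."""
    c = dot(v, w) / dot(w, w)
    return [c * wi for wi in w]

v = [3.0, 4.0]
w = [2.0, 0.0]
print(scalar_projection(v, w))            # 3.0
print(vector_projection(v, w))            # [3.0, 0.0]
print(scalar_projection([0.0, 5.0], w))   # 0.0, since the vectors are perpendicular
```

Note that the scalar projection depends only on the direction of `w`, not on its length: replacing `w` by `[10.0, 0.0]` gives the same values.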
The absolute value of the scalar projection measures the “similarity” of the directions of two vectors. To understand this concept, consider two remarkable relative positions of $v$ and $w$:
– if $v$ and $w$ are parallel, then the angle $\vartheta$ between them is either $0$ or $\pi$, hence $\cos(\vartheta) = \pm 1$, that is, the absolute value of the scalar projection of $v$ in the direction of $w$ is $\|v\|$;
– however, if $v$ and $w$ are perpendicular, then $\vartheta = \frac{\pi}{2}$ and hence $\cos(\vartheta) = 0$, showing that the scalar projection of $v$ in the direction of $w$ is null.

When the position of $v$ relative to $w$ falls somewhere between the two configurations described above, the absolute value of the scalar projection of $v$ in the direction of $w$ falls between $0$ and $\|v\|$; this explains its use to measure the similarity of the directions of vectors.

In this book, we shall consider vector spaces which are far more complex than $\mathbb{R}^2$ and $\mathbb{R}^3$, and the measure of vector similarity obtained through projection supplies crucial information concerning the coherence of directions. Before we can obtain this information, we must begin by moving from the Euclidean spaces $\mathbb{R}^2$ and $\mathbb{R}^3$ to abstract vector spaces. The general definition of an inner product and an orthogonal projection in these spaces may be seen as an extension of the previous definitions, permitting their application to spaces in which our representation of vectors is no longer applicable.

Geometric properties, which can only be apprehended and, notably, visualized in two or three dimensions, must be replaced by a set of algebraic properties which can be used in any dimension. Evidently, these algebraic properties must be necessary and sufficient to characterize the inner product of vectors in a plane or in real space. This approach, in which we generalize concepts which are “intuitive” in two or three dimensions, is a classic approach in mathematics.

In this chapter, the symbol $V$ will be used to describe a vector space of finite dimension $n < +\infty$ defined over the field $\mathbb{K}$, where $\mathbb{K}$ is either $\mathbb{R}$ or $\mathbb{C}$. The field $\mathbb{K}$ contains the scalars used to construct linear combinations of vectors in $V$. Note that two finite-dimensional vector spaces over the same field are isomorphic if and only if they are of the same dimension. Furthermore, if we establish a basis $B = (b_1, \ldots, b_n)$ for $V$, an isomorphism between $V$ and $\mathbb{K}^n$ can be constructed as follows:

$$I : V \longrightarrow \mathbb{K}^n, \qquad v = [v]_B = \sum_{i=1}^{n} v_i b_i \longmapsto \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}$$

that is, $I$ associates each $v \in V$ with the vector of $\mathbb{K}^n$ given by the scalar components of $v$ in relation to the established basis $B$. Since $I$ is an isomorphism, it follows that $\mathbb{K}^n$ is the prototype of all vector spaces of dimension $n$ over a field $\mathbb{K}$.

DEFINITION 1.1.– Let $V$ be a vector space defined over a field $\mathbb{K}$.
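A $\mathbb{K}$-form over $V$ is a map defined over $V \times V$ with values in $\mathbb{K}$, that is:

$$\varphi : V \times V \longrightarrow \mathbb{K}, \qquad (v, w) \longmapsto \varphi(v, w).$$

DEFINITION 1.2.– Let $V$ be a real vector space. A pair $(V, \langle \, , \rangle)$ is said to be a real inner product space (or a real pre-Hilbert space) if the form $\langle \, , \rangle$ is:
1) bilinear, i.e.¹ linear in relation to each argument (the other being fixed):
$$\langle v_1 + v_2, w_1 + w_2 \rangle = \langle v_1, w_1 \rangle + \langle v_1, w_2 \rangle + \langle v_2, w_1 \rangle + \langle v_2, w_2 \rangle, \quad \forall v_1, v_2, w_1, w_2 \in V$$
and:
$$\langle \alpha v, \beta w \rangle = \alpha \langle v, \beta w \rangle = \beta \langle \alpha v, w \rangle = \alpha \beta \langle v, w \rangle, \quad \forall \alpha, \beta \in \mathbb{R}, \ \forall v, w \in V;$$
2) symmetric: $\langle v, w \rangle = \langle w, v \rangle$, $\forall v, w \in V$;
3) definite: $\langle v, v \rangle = 0 \iff v = 0_V$, the null vector of the vector space $V$;
4) positive: $\langle v, v \rangle > 0$ $\forall v \in V$, $v \neq 0_V$.

Upon reflection, we see that, for a real form over $V$, the symmetry and bilinearity requirements are equivalent to requiring symmetry and linearity on the left-hand side, that is:

$$\langle v_1 + v_2, w \rangle = \langle v_1, w \rangle + \langle v_2, w \rangle, \qquad \langle \alpha v, w \rangle = \alpha \langle v, w \rangle, \qquad \forall v_1, v_2, v, w \in V, \ \forall \alpha \in \mathbb{R}.$$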
The simplest and most important example of a real inner product is the canonical inner product, defined as follows: let $v = (v_1, v_2, \ldots, v_n)$, $w = (w_1, w_2, \ldots, w_n)$ be two vectors in $\mathbb{R}^n$ written with their components in relation to any given, but fixed, basis $(b_i)_{i=1}^{n}$ in $\mathbb{R}^n$. The canonical inner product of $v$ and $w$ is:

$$\langle v, w \rangle \equiv \sum_{i=1}^{n} v_i w_i = v^t \cdot w = v \cdot w^t,$$

where $v^t$ and $w^t$ in the final equalities are the transposed vectors of $v$ and $w$, giving us the matrix product of a row vector (treated as a $1 \times n$ matrix) and a column vector (treated as an $n \times 1$ matrix).

The extension of these definitions to complex vector spaces is not particularly straightforward. First, note that if $V$ is a complex vector space, then there is no bilinear and positive-definite form over $V \times V$. Indeed, if such a form existed, any vector $v \in V$ would give the following:

$$\langle iv, iv \rangle = i^2 \langle v, v \rangle = -\langle v, v \rangle \leq 0$$

since $\langle v, v \rangle \geq 0$ by positivity, which contradicts positivity for every $v \neq 0_V$.

¹ i.e. is the abbreviation of the Latin expression “id est”, meaning “that is”. This term is often used in mathematical literature.
As we shall see, the property of positivity is essential in order to define a norm (and thus a distance, and by extension, a topology) from a complex inner product. To obtain an algebraic structure for complex scalar products which remains compatible with a topological structure, we are therefore forced to abandon the notion of bilinearity, and to search for an alternative.

We could consider antilinearity², i.e. $\langle \alpha v, \beta w \rangle = \bar{\alpha} \bar{\beta} \langle v, w \rangle$, but it has the same problem as bilinearity:

$$\langle iv, iv \rangle = (-i)(-i) \langle v, v \rangle = i^2 \langle v, v \rangle = -\langle v, v \rangle \leq 0.$$

A simple analysis shows that, in order to avoid losing positivity, it is sufficient to require linearity with respect to one variable and antilinearity with respect to the other: in that case, $\langle iv, iv \rangle = i \, \bar{i} \, \langle v, v \rangle = \langle v, v \rangle$. This property is called sesquilinearity³.

The choice of the linear and antilinear variable is entirely arbitrary. By convention, the antilinear component is placed on the right-hand side in mathematics, but on the left-hand side in physics. We have chosen to adopt the mathematical convention here, i.e. $\langle \alpha v, \beta w \rangle = \alpha \bar{\beta} \langle v, w \rangle$.

Next, it is important to note that sesquilinearity and symmetry are incompatible: if both properties were verified, then $\langle v, \alpha w \rangle = \bar{\alpha} \langle v, w \rangle$, and also $\langle v, \alpha w \rangle = \langle \alpha w, v \rangle = \alpha \langle w, v \rangle = \alpha \langle v, w \rangle$. Thus, $\bar{\alpha} \langle v, w \rangle = \alpha \langle v, w \rangle$, which, whenever $\langle v, w \rangle \neq 0$, can only be verified if $\alpha \in \mathbb{R}$. Thus $\langle \, , \rangle$ cannot be both sesquilinear and symmetric when working with vectors belonging to a complex vector space.

The example shown above demonstrates that, instead of symmetry, the property which must be verified for every vector pair $v, w$ is $\langle v, w \rangle = \overline{\langle w, v \rangle}$, that is, changing the order of the vectors in $\langle \, , \rangle$ must be equivalent to complex conjugation. A form which verifies this property is said to be Hermitian⁴.

² The symbols $z^*$ and $\bar{z}$ represent complex conjugation, i.e. if $z \in \mathbb{C}$, $z = a + ib$, $a, b \in \mathbb{R}$, then $z^* = \bar{z} = a - ib$. We recall that $\overline{\sum_{k=1}^{n} z_k} = \sum_{k=1}^{n} \bar{z}_k$, $\overline{\prod_{k=1}^{n} z_k} = \prod_{k=1}^{n} \bar{z}_k$, $|z|^2 = z \bar{z}$ and $z = \bar{z}$ if and only if $z \in \mathbb{R}$.
³ Sesqui comes from the Latin semisque, meaning one and a half times. This term is used to highlight the fact that there are not two instances of linearity, but one “and a half”, due to the presence of the complex conjugation.
⁴ Named after the French mathematician Charles Hermite (born 1822 in Dieuze, died 1901 in Paris).
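The argument above, comparing bilinear, antilinear and sesquilinear candidates, can be checked numerically. The following is a minimal sketch in plain Python; the three function names and the sample vector are assumptions introduced purely for illustration.

```python
def bilinear(v, w):
    """Candidate form linear in both arguments: sum of v_i * w_i."""
    return sum(a * b for a, b in zip(v, w))

def antilinear_both(v, w):
    """Candidate form antilinear in both arguments: sum of conj(v_i) * conj(w_i)."""
    return sum(a.conjugate() * b.conjugate() for a, b in zip(v, w))

def sesquilinear(v, w):
    """Candidate form linear on the left, antilinear on the right: sum of v_i * conj(w_i)."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

v = [1 + 0j, 2 + 0j]          # illustrative vector of C^2
iv = [1j * a for a in v]      # the vector i*v

for form in (bilinear, antilinear_both, sesquilinear):
    # For the first two forms, form(iv, iv) is the negative of form(v, v),
    # so positivity fails; only the sesquilinear form keeps the same positive value.
    print(form.__name__, form(v, v).real, form(iv, iv).real)
```

Only the last line of output shows the same positive value for `v` and `iv`, which matches the conclusion that positivity survives under sesquilinearity but not under the other two choices.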
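These observations provide full justification for Definition 1.3.

DEFINITION 1.3.– Let $V$ be a complex vector space. The pair $(V, \langle \, , \rangle)$ is said to be a complex inner product space (or a complex pre-Hilbert space) if $\langle \, , \rangle$ is a complex form which is:
1) sesquilinear:
$$\langle v_1 + v_2, w_1 + w_2 \rangle = \langle v_1, w_1 \rangle + \langle v_1, w_2 \rangle + \langle v_2, w_1 \rangle + \langle v_2, w_2 \rangle, \quad \forall v_1, v_2, w_1, w_2 \in V,$$
and, with linearity on the left and antilinearity on the right:
$$\langle \alpha v, \beta w \rangle = \alpha \langle v, \beta w \rangle = \bar{\beta} \langle \alpha v, w \rangle = \alpha \bar{\beta} \langle v, w \rangle, \quad \forall \alpha, \beta \in \mathbb{C}, \ \forall v, w \in V;$$
2) Hermitian: $\langle v, w \rangle = \overline{\langle w, v \rangle}$, $\forall v, w \in V$;
3) definite: $\langle v, v \rangle = 0 \iff v = 0_V$, the null vector of the vector space $V$;
4) positive: $\langle v, v \rangle > 0$ $\forall v \in V$, $v \neq 0_V$.

As in the real case, for a complex form over $V$, requiring the Hermitian property and sesquilinearity is equivalent to requiring the Hermitian property and linearity on the left-hand side; if these properties are verified, then:

$$\langle v, \alpha w \rangle = \overline{\langle \alpha w, v \rangle} = \overline{\alpha \langle w, v \rangle} = \bar{\alpha} \, \overline{\langle w, v \rangle} = \bar{\alpha} \langle v, w \rangle, \quad \forall \alpha \in \mathbb{C}.$$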
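Considering the sum of $n$, rather than two, vectors, sesquilinearity is represented by the following formulae:

$$\Big\langle \sum_{i=1}^{n} \alpha_i v_i, w \Big\rangle = \sum_{i=1}^{n} \alpha_i \langle v_i, w \rangle \qquad [1.1]$$

$$\Big\langle v, \sum_{i=1}^{n} \alpha_i w_i \Big\rangle = \sum_{i=1}^{n} \bar{\alpha}_i \langle v, w_i \rangle \qquad [1.2]$$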
In $\mathbb{C}^n$, the complex Euclidean inner product is defined by:

$$\langle v, w \rangle \equiv \sum_{i=1}^{n} v_i \bar{w}_i = v \cdot (\bar{w})^t = v^t \cdot \bar{w}$$

where $v = (v_1, v_2, \ldots, v_n)$, $w = (w_1, w_2, \ldots, w_n) \in \mathbb{C}^n$ are written with their components in relation to any given, but fixed, basis $(b_i)_{i=1}^{n}$ in $\mathbb{C}^n$.

The symbol $\mathbb{K}$ will be used throughout to represent either $\mathbb{R}$ or $\mathbb{C}$ in the context of properties which are valid independently of the reality or complexity of the inner product.
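As a quick sanity check of Definition 1.3 on the complex Euclidean inner product, the sketch below verifies the Hermitian property and the two halves of sesquilinearity on sample data. The names `inner` and `scale`, the vectors and the scalar `alpha` are hypothetical choices made for this illustration; Gaussian integers are used so that the floating-point comparisons are exact.

```python
def inner(v, w):
    """Complex Euclidean inner product: linear in v, antilinear in w."""
    return sum(vi * wi.conjugate() for vi, wi in zip(v, w))

def scale(alpha, v):
    """Scalar multiple alpha * v."""
    return [alpha * vi for vi in v]

v = [1 + 2j, 3 - 1j]
w = [2 - 1j, 1j]
alpha = 2 + 3j

print(inner(v, w))  # (-1+2j)

# Hermitian property: <v, w> equals the conjugate of <w, v>
hermitian_ok = inner(v, w) == inner(w, v).conjugate()
# Linearity on the left: <alpha v, w> = alpha <v, w>
left_linear_ok = inner(scale(alpha, v), w) == alpha * inner(v, w)
# Antilinearity on the right: <v, alpha w> = conj(alpha) <v, w>
right_antilinear_ok = inner(v, scale(alpha, w)) == alpha.conjugate() * inner(v, w)

print(hermitian_ok, left_linear_ok, right_antilinear_ok)  # True True True
```

Swapping the conjugate onto the first argument in `inner` would give the physics convention mentioned in the text; the same three checks would then hold with the roles of left and right exchanged.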
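THEOREM 1.1.– Let $(V, \langle \, , \rangle)$ be an inner product space. We have:
1) $\langle v, 0_V \rangle = 0$ $\forall v \in V$;
2) if $\langle u, w \rangle = \langle v, w \rangle$ $\forall w \in V$, then $u$ and $v$ must coincide;
3) $\langle v, w \rangle = 0$ $\forall v \in V$ $\iff$ $w = 0_V$, i.e. the null vector is the only vector which is orthogonal to all of the other vectors.

PROOF.–
1) $\langle v, 0_V \rangle = \langle v, 0_V + 0_V \rangle = \langle v, 0_V \rangle + \langle v, 0_V \rangle$ by additivity in the second argument, i.e. $\langle v, 0_V \rangle - \langle v, 0_V \rangle = 0 = \langle v, 0_V \rangle$.
2) $\langle u, w \rangle = \langle v, w \rangle$ $\forall w \in V$ implies, by linearity, that $\langle u - v, w \rangle = 0$ $\forall w \in V$ and thus, notably, considering $w = u - v$, we obtain $\langle u - v, u - v \rangle = 0$, implying, due to the definiteness of the inner product, that $u - v = 0_V$, i.e. $u = v$.
3) If $w = 0_V$, then $\langle v, w \rangle = 0$ $\forall v \in V$ using property (1). Conversely, by hypothesis, it holds that $\langle v, w \rangle = 0 = \langle v, 0_V \rangle$ $\forall v \in V$; taking complex conjugates gives $\langle w, v \rangle = \langle 0_V, v \rangle$ $\forall v \in V$, and then property (2) implies that $w = 0_V$. □

Finally, let us consider a typical property of the complex inner product, which results directly from a property of complex numbers.

THEOREM 1.2.– Let $(V, \langle \, , \rangle)$ be a complex inner product space. Then:

$$\Im(\langle v, w \rangle) = \Re(\langle v, iw \rangle) \quad \forall v, w \in V$$

PROOF.– Consider any complex number $z = a + ib$, so $-iz = b - ia$, hence $b = \Im(z) = \Re(-iz)$. Taking $z = \langle v, w \rangle$, we obtain $\Im(\langle v, w \rangle) = \Re(-i \langle v, w \rangle) = \Re(\langle v, iw \rangle)$ by sesquilinearity. □

1.2. The norm associated with an inner product and normed vector spaces

If $(V, \langle \, , \rangle)$ is an inner product space over $\mathbb{K}$, then a norm on $V$ can be defined as follows:

$$\| \cdot \| : V \longrightarrow \mathbb{R}_0^+ = [0, +\infty), \qquad v \longmapsto \|v\| = \sqrt{\langle v, v \rangle}$$

Note that $\|v\|$ is well defined since $\langle v, v \rangle \geq 0$ $\forall v \in V$. Once a norm has been established, it is always possible to define a distance between two vectors $v, w$ in $V$: $d(v, w) = \|v - w\|$.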
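The norm and distance induced by the complex Euclidean inner product, together with the identity of Theorem 1.2, can be illustrated as follows. This is a sketch under stated assumptions: the helper names (`inner`, `norm`, `distance`) and all sample vectors are hypothetical and chosen so that the results are easy to verify by hand.

```python
import math

def inner(v, w):
    """Complex Euclidean inner product: linear in v, antilinear in w."""
    return sum(vi * wi.conjugate() for vi, wi in zip(v, w))

def norm(v):
    """Norm induced by the inner product; <v, v> is real and non-negative."""
    return math.sqrt(inner(v, v).real)

def distance(v, w):
    """Distance d(v, w) = ||v - w||."""
    return norm([vi - wi for vi, wi in zip(v, w)])

v = [3 + 4j, 0j]
w = [0j, 12 + 0j]
print(norm(v))         # 5.0
print(distance(v, w))  # 13.0

# Theorem 1.2 on sample vectors: Im <a, b> equals Re <a, i b>
a = [1 + 2j, 3 - 1j]
b = [2 - 1j, 1j]
ib = [1j * bi for bi in b]
print(inner(a, b).imag, inner(a, ib).real)  # 2.0 2.0
```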
A vector $v \in V$ such that $\|v\| = 1$ is known as a unit vector. Every nonzero vector $v \in V$ can be normalized to produce a unit vector, simply by dividing it by its norm.

NOTABLE EXAMPLES.–

$$(\mathbb{R}^n, \langle \, , \rangle) : \quad \|v\| = \sqrt{\sum_{i=1}^{n} v_i^2}$$

$$(\mathbb{C}^n, \langle \, , \rangle) : \quad \|v\| = \sqrt{\sum_{i=1}^{n} v_i \bar{v}_i} = \sqrt{\sum_{i=1}^{n} |v_i|^2}$$

Three properties of the norm, which should already be known, are listed below. Taking any $v, w \in V$, and any $\alpha \in \mathbb{K}$:
1) $\|v\| \geq 0$, $\|v\| = 0 \iff v = 0_V$;
2) $\|\alpha v\| = |\alpha| \|v\|$ (homogeneity);
3) $\|v + w\| \leq \|v\| + \|w\|$ (triangle inequality).

DEFINITION 1.4 (normed vector space).– A normed vector space is a pair $(V, \| \cdot \|)$ given by a vector space $V$ and a function, called a norm, $\| \cdot \| : V \to \mathbb{R}_0^+$, satisfying the three properties listed above. A norm $\| \cdot \|$ is Hilbertian if there exists an inner product $\langle \, , \rangle$ on $V$ such that $\|v\| = \sqrt{\langle v, v \rangle}$ $\forall v \in V$.

Canonically, an inner product space is therefore a normed vector space. Counterexamples can be used to show that the reverse is not generally true.

Note that, by definition, $\langle v, v \rangle = \|v\| \|v\|$, but, in general, the magnitude of the inner product between two different vectors is dominated by the product of their norms. This is the result of the well-known inequality shown below.

THEOREM 1.3 (Cauchy-Schwarz inequality).– For all $v, w \in (V, \langle \, , \rangle)$ we have:

$$|\langle v, w \rangle| \leq \|v\| \|w\|$$

PROOF.– Dozens of proofs of the Cauchy-Schwarz inequality have been produced. One of the most elegant proofs is shown below, followed by the simplest one:
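– first proof: if $w = 0_V$, then the inequality is verified trivially with $0 = 0$. If $w \neq 0_V$, then we can define $z = v - \frac{\langle v, w \rangle}{\|w\|^2} w$, i.e. $v = \frac{\langle v, w \rangle}{\|w\|^2} w + z$, and we note that:

$$\langle z, w \rangle = \Big\langle v - \frac{\langle v, w \rangle}{\|w\|^2} w, w \Big\rangle = \langle v, w \rangle - \frac{\langle v, w \rangle}{\|w\|^2} \langle w, w \rangle = 0$$

thus:

$$\begin{aligned}
\|v\|^2 = \langle v, v \rangle &= \Big\langle \frac{\langle v, w \rangle}{\|w\|^2} w + z, \ \frac{\langle v, w \rangle}{\|w\|^2} w + z \Big\rangle \\
&= \Big\langle \frac{\langle v, w \rangle}{\|w\|^2} w, \ \frac{\langle v, w \rangle}{\|w\|^2} w + z \Big\rangle + \Big\langle z, \ \frac{\langle v, w \rangle}{\|w\|^2} w + z \Big\rangle \\
&= \frac{\langle v, w \rangle}{\|w\|^2} \frac{\overline{\langle v, w \rangle}}{\|w\|^2} \langle w, w \rangle + \frac{\langle v, w \rangle}{\|w\|^2} \langle w, z \rangle + \frac{\overline{\langle v, w \rangle}}{\|w\|^2} \langle z, w \rangle + \langle z, z \rangle \\
&= \frac{|\langle v, w \rangle|^2}{\|w\|^2} + \|z\|^2
\end{aligned}$$

as the two intermediate terms in the penultimate step are zero, since $\langle z, w \rangle = \langle w, z \rangle = 0$.

As $\|z\|^2 \geq 0$, we have seen that:

$$\|v\|^2 = \frac{|\langle v, w \rangle|^2}{\|w\|^2} + \|z\|^2 \geq \frac{|\langle v, w \rangle|^2}{\|w\|^2}$$

i.e. $|\langle v, w \rangle|^2 \leq \|v\|^2 \|w\|^2$, hence $|\langle v, w \rangle| \leq \|v\| \|w\|$;

– second proof (in one line!), written here for $\mathbb{K} = \mathbb{R}$: $\forall t \in \mathbb{R}$ we have:

$$0 \leq \|tv - w\|^2 = \langle tv - w, tv - w \rangle = t^2 \|v\|^2 - 2t \langle v, w \rangle + \|w\|^2 \iff \Delta = 4 \langle v, w \rangle^2 - 4 \|v\|^2 \|w\|^2 \leq 0$$

where $\Delta$ is the discriminant of the quadratic polynomial in $t$, which cannot be positive since the polynomial never takes negative values. □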
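The decomposition used in the first proof can be reproduced numerically. The sketch below, which assumes the complex Euclidean inner product on $\mathbb{C}^2$ and hypothetical sample vectors, builds $z = v - \frac{\langle v, w \rangle}{\|w\|^2} w$ and checks that $z$ is orthogonal to $w$, that $\|v\|^2$ splits as in the proof, and that the Cauchy-Schwarz inequality holds.

```python
import math

def inner(v, w):
    """Complex Euclidean inner product: linear in v, antilinear in w."""
    return sum(vi * wi.conjugate() for vi, wi in zip(v, w))

def norm(v):
    """Norm induced by the inner product."""
    return math.sqrt(inner(v, v).real)

v = [1 + 2j, 3 - 1j]
w = [2 - 1j, 1j]

# Coefficient <v, w> / ||w||^2 and the residual z = v - c * w
c = inner(v, w) / inner(w, w).real
z = [vi - c * wi for vi, wi in zip(v, w)]

# z is orthogonal to w, up to rounding errors
z_orthogonal_to_w = abs(inner(z, w)) < 1e-12

# ||v||^2 = |<v, w>|^2 / ||w||^2 + ||z||^2, as in the proof
lhs = norm(v) ** 2
rhs = abs(inner(v, w)) ** 2 / norm(w) ** 2 + norm(z) ** 2
splitting_holds = math.isclose(lhs, rhs)

# Cauchy-Schwarz inequality itself
cauchy_schwarz_holds = abs(inner(v, w)) <= norm(v) * norm(w)

print(z_orthogonal_to_w, splitting_holds, cauchy_schwarz_holds)
```

The tolerance `1e-12` is an arbitrary choice for this example; it only accounts for floating-point rounding in the division by $\|w\|^2$.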
The Cauchy-Schwarz inequality allows the concept of the angle between two vectors to be generalized for abstract vector spaces. In fact, in the real case and for nonzero $v, w$, it implies the existence of a coefficient $k$ between $-1$ and $+1$ such that $\langle v, w \rangle = \|v\| \|w\| k$, but, given that the restriction of $\cos$ to $[0, \pi]$ creates a bijection with $[-1, 1]$, this means that there is only one $\vartheta \in [0, \pi]$ such that $\langle v, w \rangle = \|v\| \|w\| \cos \vartheta$. $\vartheta \in [0, \pi]$ is known as the angle between the two vectors $v$ and $w$.

Another very important property of the norm is as follows.

THEOREM 1.4.– Let $(V, \| \cdot \|)$ be an arbitrary normed vector space and $v, w \in V$. We have:

$$\big| \|v\| - \|w\| \big| \leq \|v - w\| \qquad [1.3]$$
PROOF.– On one side:

$$\|v\| = \|v - w + w\| = \|(v - w) + w\| \leq \|v - w\| + \|w\|$$

by the triangle inequality, thus $\|v\| - \|w\| \leq \|v - w\|$. On the other side:

$$\|w\| = \|w - v + v\| = \|(w - v) + v\| \leq \|w - v\| + \|v\|$$

thus $\|w\| - \|v\| \leq \|w - v\| = \|v - w\|$, i.e. $\|v\| - \|w\| \geq -\|v - w\|$. Hence, $-\|v - w\| \leq \|v\| - \|w\| \leq \|v - w\|$, i.e. $\big| \|v\| - \|w\| \big| \leq \|v - w\|$. □
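The angle introduced before Theorem 1.4 and inequality [1.3] can both be explored on real vectors with a short sketch. The helper names and the sample vectors are assumptions made for illustration; the clamping step is a practical guard against rounding errors and is not part of the mathematical definition.

```python
import math

def dot(v, w):
    """Canonical inner product on R^n."""
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    """Euclidean norm."""
    return math.sqrt(dot(v, v))

def angle(v, w):
    """Angle in [0, pi] between two nonzero real vectors."""
    k = dot(v, w) / (norm(v) * norm(w))
    # Cauchy-Schwarz guarantees -1 <= k <= 1; the clamp only absorbs rounding errors
    k = max(-1.0, min(1.0, k))
    return math.acos(k)

def reverse_triangle_holds(v, w):
    """Check | ||v|| - ||w|| | <= ||v - w|| (inequality [1.3])."""
    diff = [vi - wi for vi, wi in zip(v, w)]
    return abs(norm(v) - norm(w)) <= norm(diff)

v = [1.0, 0.0]
w = [1.0, 1.0]
print(math.degrees(angle(v, w)))               # approximately 45 degrees
print(math.degrees(angle(v, [0.0, 3.0])))      # 90.0
print(reverse_triangle_holds(v, w))            # True
```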
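The following formula is also extremely useful.

THEOREM 1.5 (Carnot’s theorem).– Taking $v, w \in (V, \langle \, , \rangle)$:

$$\|v \pm w\|^2 = \|v\|^2 + \|w\|^2 \pm 2 \langle v, w \rangle, \qquad (\mathbb{K} = \mathbb{R}) \qquad [1.4]$$

and

$$\|v \pm w\|^2 = \|v\|^2 + \|w\|^2 \pm \langle v, w \rangle \pm \langle w, v \rangle, \qquad (\mathbb{K} = \mathbb{C}) \qquad [1.5]$$

PROOF.– Direct calculation:

$$\|v \pm w\|^2 = \langle v \pm w, v \pm w \rangle = \langle v, v \rangle \pm \langle v, w \rangle \pm \langle w, v \rangle + \langle w, w \rangle = \|v\|^2 + \|w\|^2 \pm \langle v, w \rangle \pm \langle w, v \rangle$$

□

If $\mathbb{K} = \mathbb{C}$, then $\langle w, v \rangle = \overline{\langle v, w \rangle}$, and since, if $z = a + ib = \Re(z) + i \Im(z)$, then $z + \bar{z} = 2a = 2 \Re(z)$, we can rewrite [1.5] as:

$$\|v \pm w\|^2 = \|v\|^2 + \|w\|^2 \pm 2 \Re(\langle v, w \rangle) \qquad [1.6]$$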
The laws presented in this section have immediate consequences which will be highlighted in section 1.2.1.

1.2.1. The parallelogram law and the polarization formula

The parallelogram law in $\mathbb{R}^2$ is shown in Figure 1.1. This law can be generalized on a vector space with an arbitrary inner product.

THEOREM 1.6 (Parallelogram law).– Let $(V, \langle\ ,\ \rangle)$ be an inner product space on $\mathbb{K}$. Then, $\forall v, w \in V$:
$$\|v + w\|^2 + \|v - w\|^2 = 2\left(\|v\|^2 + \|w\|^2\right)$$
Figure 1.1. Parallelogram law in $\mathbb{R}^2$: the sum of the squares of the two diagonals is equal to twice the sum of the squares of the edges v and w. For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip
PROOF.– A direct consequence of law [1.4] or law [1.5], taking $\|v + w\|^2$ then $\|v - w\|^2$. □

As we have seen, an inner product induces a norm. The polarization formula can be used to “reverse” roles and write the inner product using the norm.

THEOREM 1.7 (Polarization formula).– Let $(V, \langle\ ,\ \rangle)$ be an inner product space on $\mathbb{K}$. In this case, $\forall v, w \in V$:
$$\langle v,w\rangle = \frac{1}{4}\left(\|v + w\|^2 - \|v - w\|^2\right), \qquad (\mathbb{K} = \mathbb{R})$$
and:
$$\langle v,w\rangle = \frac{1}{4}\left[\|v + w\|^2 - \|v - w\|^2 + i\left(\|v + iw\|^2 - \|v - iw\|^2\right)\right], \qquad (\mathbb{K} = \mathbb{C})$$

PROOF.– This law is a direct consequence of law [1.4], in the real case. For the complex case, $w$ is replaced by $iw$ in law [1.5], and by sesquilinearity, we obtain:
$$\|v \pm iw\|^2 = \|v\|^2 + \|w\|^2 \mp i\langle v,w\rangle \pm i\langle w,v\rangle$$
By direct calculation, we can then verify that $\|v + w\|^2 - \|v - w\|^2 + i\|v + iw\|^2 - i\|v - iw\|^2 = 4\langle v,w\rangle$. □
It may seem surprising that something as simple as the parallelogram law may be used to establish a necessary and sufficient condition to guarantee that a norm over a vector space will be induced by an inner product, that is, the norm is Hilbertian. This notion will be formalized in Chapter 4.
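The complex polarization formula and the parallelogram law can be checked numerically. The sketch below is an illustration only, not taken from the original text: the function names and the sample vectors are assumptions, and the inner product is taken to be linear in its first argument, which is the convention under which the formula of Theorem 1.7 holds as written.

```python
def inner(v, w):
    """Inner product on C^n, linear in the first argument (assumed convention)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def norm_sq(v):
    return inner(v, v).real

def polarization(v, w):
    """Rebuild <v, w> using squared norms only (complex case of Theorem 1.7)."""
    plus = norm_sq([a + b for a, b in zip(v, w)])
    minus = norm_sq([a - b for a, b in zip(v, w)])
    plus_i = norm_sq([a + 1j * b for a, b in zip(v, w)])
    minus_i = norm_sq([a - 1j * b for a, b in zip(v, w)])
    return (plus - minus + 1j * (plus_i - minus_i)) / 4

# Arbitrary sample vectors of C^3 (illustrative values only).
v = [1 + 2j, 0, 3j]
w = [2, 1j, 1 - 1j]

print(inner(v, w), polarization(v, w))   # the two values should agree

# Parallelogram law: ||v+w||^2 + ||v-w||^2 = 2(||v||^2 + ||w||^2)
left = norm_sq([a + b for a, b in zip(v, w)]) + norm_sq([a - b for a, b in zip(v, w)])
right = 2 * (norm_sq(v) + norm_sq(w))
print(left, right)
```

Only norms enter `polarization`, which is the sense in which the formula “reverses” the roles of the norm and the inner product.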
1.3. Orthogonal and orthonormal families in inner product spaces

The “geometric” definition of an inner product in $\mathbb{R}^2$ and $\mathbb{R}^3$ indicates that this product is zero if and only if $\vartheta$, the angle between the vectors, is $\pi/2$, which implies $\cos(\vartheta) = 0$. In more complicated vector spaces (e.g. polynomial spaces), or even Euclidean vector spaces of more than three dimensions, it is no longer possible to visualize vectors; their orthogonality must therefore be “axiomatized” via the nullity of their scalar product.

DEFINITION 1.5.– Let $(V, \langle\ ,\ \rangle)$ be a real or complex inner product space of finite dimension $n$. Let $F = \{v_1, \dots, v_n\}$ be a family of vectors in $V$. Then:
– $F$ is an orthogonal family of vectors if each different vector pair has an inner product of 0: $\langle v_i, v_j\rangle = 0$ for $i \neq j$;
– $F$ is an orthonormal family if it is orthogonal and, furthermore, $\|v_i\| = 1\ \forall i$.

Thus, if $\{v_i\}_{i=1}^n$ is an orthogonal family of non-zero vectors, $\{u_i = \|v_i\|^{-1} v_i\}_{i=1}^n$ is an orthonormal family. An orthonormal family (unit and orthogonal vectors) may be characterized as follows:
$$\langle v_i, v_j\rangle = \delta_{i,j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \qquad \text{(orthonormal family)}$$
$\delta_{i,j}$ is the Kronecker delta⁵.

5 Leopold Kronecker (1823, Liegnitz-1891, Berlin).

1.4. Generalized Pythagorean theorem

The Pythagorean theorem can be generalized to abstract inner product spaces. The general formulation of this theorem is obtained using a lemma.

LEMMA 1.1.– Let $(V, \langle\ ,\ \rangle)$ be a real or complex inner product space. Let $u \in V$ be orthogonal to all vectors $v_1, \dots, v_n \in V$. Then, $u$ is also orthogonal to all vectors in $V$ obtained as a linear combination of $v_1, \dots, v_n$.

PROOF.– Let $w = \sum_{i=1}^{n} \alpha_i v_i$, $\alpha_i \in \mathbb{K}\ \forall i = 1, \dots, n$, be an arbitrary linear combination of vectors $v_1, \dots, v_n$. By direct calculation:
$$\langle u, w\rangle = \left\langle u, \sum_{i=1}^{n} \alpha_i v_i \right\rangle \underset{\text{(sesquilinearity)}}{=} \sum_{i=1}^{n} \overline{\alpha_i}\,\langle u, v_i\rangle \underset{u \perp v_i}{=} \sum_{i=1}^{n} \overline{\alpha_i}\, 0 = 0$$
□
THEOREM 1.8 (Generalized Pythagorean theorem).– Let $(V, \langle\ ,\ \rangle)$ be an inner product space on $\mathbb{K}$. Let $u, v \in V$ be orthogonal to each other. Then:
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2$$
More generally, if the vectors $v_1, \dots, v_n \in V$ are mutually orthogonal, then:
$$\left\|\sum_{i=1}^{n} v_i\right\|^2 = \sum_{i=1}^{n} \|v_i\|^2 \iff \|v_1 + \dots + v_n\|^2 = \|v_1\|^2 + \dots + \|v_n\|^2$$

PROOF.– The two-vector case can be proven thanks to Carnot's formula:
$$\|u + v\|^2 = \langle u + v, u + v\rangle = \langle u,u\rangle + \underbrace{\langle u,v\rangle}_{=0} + \underbrace{\langle v,u\rangle}_{=0} + \langle v,v\rangle = \|u\|^2 + \|v\|^2$$
The proof for the case of $n$ vectors is obtained by induction:
– the case where $n = 2$ is demonstrated above;
– we suppose that $\left\|\sum_{i=1}^{n-1} v_i\right\|^2 = \sum_{i=1}^{n-1} \|v_i\|^2$ (induction hypothesis);
– now, we write $u = v_n$ and $z = \sum_{i=1}^{n-1} v_i$, so $u \perp z$ using Lemma 1.1. Hence, using case $n = 2$: $\|u + z\|^2 = \|u\|^2 + \|z\|^2$, but:
$$u + z = v_n + \sum_{i=1}^{n-1} v_i = \sum_{i=1}^{n} v_i$$
so:
$$\|u + z\|^2 = \left\|\sum_{i=1}^{n} v_i\right\|^2$$
and:
$$\|u\|^2 + \|z\|^2 = \|v_n\|^2 + \left\|\sum_{i=1}^{n-1} v_i\right\|^2 \underset{\text{(induction hypothesis)}}{=} \|v_n\|^2 + \sum_{i=1}^{n-1} \|v_i\|^2 = \sum_{i=1}^{n} \|v_i\|^2$$
giving us the desired thesis. □
Note that the Pythagorean theorem thesis is a double implication if and only if $V$ is real: in fact, using law [1.6], we have that $\|u + v\|^2 = \|u\|^2 + \|v\|^2$ holds true if and only if $\Re(\langle u,v\rangle) = 0$, which is equivalent to orthogonality if and only if $V$ is real.

The following result gives information concerning the distance between any two vectors within an orthonormal family.
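A small complex example makes the failure of the converse concrete. In the Python sketch below (an illustration under assumed names and values, not taken from the original text), $v = iu$, so that $\langle u,v\rangle$ is purely imaginary and non-zero, yet the Pythagorean identity still holds.

```python
def inner(v, w):
    """Inner product on C^n, linear in the first argument (assumed convention)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def norm_sq(v):
    return inner(v, v).real

u = [1, 0]
v = [1j, 0]          # v = i*u, an illustrative choice

uv = inner(u, v)     # purely imaginary, so Re<u,v> = 0 but <u,v> != 0
total = norm_sq([a + b for a, b in zip(u, v)])

print("inner product:", uv)
print("||u+v||^2 =", total, " ||u||^2 + ||v||^2 =", norm_sq(u) + norm_sq(v))
```

The identity holds although `u` and `v` are not orthogonal, which is only possible because the space is complex.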
THEOREM 1.9.– Let $(V, \langle\ ,\ \rangle)$ be an inner product space on $\mathbb{K}$ and let $F$ be an orthonormal family in $V$. The distance between any two distinct elements of $F$ is constant and equal to $\sqrt{2}$.

PROOF.– Let $u, v \in F$, $u \neq v$. Using the Pythagorean theorem: $\|u - v\|^2 = \|u + (-v)\|^2 = \|u\|^2 + \|v\|^2 = 2$, from the fact that $u \perp v$. □

1.5. Orthogonality and linear independence

The orthogonality condition is more restrictive than that of linear independence: all orthogonal families of non-zero vectors are free (that is, linearly independent).

THEOREM 1.10.– Let $F$ be an orthogonal family in $(V, \langle\ ,\ \rangle)$, $F = \{v_1, \dots, v_n\}$, $v_i \neq 0\ \forall i$, then $F$ is free.

PROOF.– We need to prove the linear independence of the elements $v_i$, that is, $\sum_{i=1}^{n} a_i v_i = 0 \implies a_i = 0\ \forall i$. To this end, we calculate the inner product of the linear combination $\sum_{i=1}^{n} a_i v_i$ and an arbitrary vector $v_j$ with $j \in \{1, \dots, n\}$:
$$\left\langle \sum_{i=1}^{n} a_i v_i, v_j \right\rangle \underset{[1.1]}{=} \sum_{i=1}^{n} a_i \langle v_i, v_j\rangle \underset{(\langle v_i, v_j\rangle \neq 0\ \Leftrightarrow\ i = j)}{=} a_j \langle v_j, v_j\rangle = a_j \|v_j\|^2$$
By hypothesis, none of the vectors in $F$ are zero; the hypothesis that $\sum_{i=1}^{n} a_i v_i = 0$ therefore implies that:
$$\underbrace{\langle 0, v_j\rangle}_{=0} = a_j \underbrace{\|v_j\|^2}_{\neq 0} \implies a_j = 0.$$
This holds for any $j \in \{1, \dots, n\}$, so the orthogonal family $F$ is free. □
Using the general theory of vector spaces in finite dimensions, an immediate corollary can be derived from Theorem 1.10.

COROLLARY 1.1.– An orthogonal family of $n$ non-null vectors in a space $(V, \langle\ ,\ \rangle)$ of dimension $n$ is a basis of $V$.

DEFINITION 1.6.– A family of $n$ non-null orthogonal vectors in a vector space $(V, \langle\ ,\ \rangle)$ of dimension $n$ is said to be an orthogonal basis of $V$. If this family is also orthonormal, it is said to be an orthonormal basis of $V$.
The extension of the orthogonal basis concept to inner product spaces of infinite dimensions will be discussed in Chapter 5. For the moment, it is important to note that an orthogonal basis is made up of the maximum number of mutually orthogonal vectors in a vector space. Taking $n$ to represent the dimension of the space $V$ and proceeding by reductio ad absurdum, imagine the existence of another vector $u^* \in V$, $u^* \neq 0$, orthogonal to all of the vectors in an orthogonal basis $(u_i)_{i=1}^{n}$; in this case, the set $(u^*, u_i)_{i=1}^{n}$ would be free as orthogonal vectors are linearly independent, and the dimension of $V$ would be $n + 1$ instead of $n$! This property is usually expressed by saying that an orthogonal family is a basis if it is not a subset of another orthogonal family of vectors in $V$.

Note that in order to determine the components of a vector in relation to an arbitrary basis, we must solve a linear system of $n$ equations with $n$ unknown variables. In fact, if $v \in V$ is any vector and $(u_i)$, $i = 1, \dots, n$, is a basis of $V$, then the components of $v$ in $(u_i)$ are the scalars $\alpha_1, \dots, \alpha_n$ such that:
$$v = \sum_{i=1}^{n} \alpha_i u_i \iff \begin{cases} v_1 = \sum_{i=1}^{n} \alpha_i u_{i,1} \\ \quad\vdots \\ v_n = \sum_{i=1}^{n} \alpha_i u_{i,n}, \end{cases}$$
where $u_{i,j}$ is the $j$-th component of vector $u_i$. However, in the presence of an orthogonal or orthonormal basis, components are determined by inner products, as seen in Theorem 1.11. Note, too, that solving a linear system of $n$ equations with $n$ unknown variables generally involves far more operations than the calculation of inner products; this highlights one advantage of having an orthogonal basis for a vector space.

THEOREM 1.11.– Let $B = \{u_1, \dots, u_n\}$ be an orthogonal basis of $(V, \langle\ ,\ \rangle)$. Then, for every $v \in V$:
$$v = \sum_{i=1}^{n} \frac{\langle v, u_i\rangle}{\|u_i\|^2}\, u_i$$
Notably, if $B$ is an orthonormal basis, then:
$$v = \sum_{i=1}^{n} \langle v, u_i\rangle\, u_i$$
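PROOF.– $B$ is a basis, so there exists a set of scalars $\alpha_1, \dots, \alpha_n$ such that $v = \sum_{j=1}^{n} \alpha_j u_j$. Consider the inner product of this expression of $v$ with a fixed vector $u_i$, $i \in \{1, \dots, n\}$:
$$\langle v, u_i\rangle = \left\langle \sum_{j=1}^{n} \alpha_j u_j, u_i \right\rangle = \sum_{j=1}^{n} \alpha_j \langle u_j, u_i\rangle \underset{(u_i \perp u_j\ \forall i \neq j)}{=} \alpha_i \langle u_i, u_i\rangle = \alpha_i \|u_i\|^2$$
so $\alpha_i = \frac{\langle v, u_i\rangle}{\|u_i\|^2}\ \forall i = 1, \dots, n$, and thus $v = \sum_{i=1}^{n} \frac{\langle v, u_i\rangle}{\|u_i\|^2}\, u_i$. If $B$ is an orthonormal basis, $\|u_i\| = 1$, giving the second law in the theorem. □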
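The practical content of Theorem 1.11 is that no linear system has to be solved when the basis is orthogonal. The following Python sketch illustrates this on a hypothetical orthogonal basis of $\mathbb{R}^3$; the basis, the vector and the function names are assumptions chosen for this example only.

```python
def inner(v, w):
    """Inner product, linear in the first argument (assumed convention)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def components(v, basis):
    """Components of v on an orthogonal basis: alpha_i = <v,u_i> / ||u_i||^2."""
    return [inner(v, u) / inner(u, u).real for u in basis]

# A hypothetical orthogonal (not orthonormal) basis of R^3 and a vector to expand.
basis = [[1, 1, 0], [1, -1, 0], [0, 0, 2]]
v = [3, 1, 4]

alphas = components(v, basis)

# Rebuild v from its components to confirm the expansion.
rebuilt = [sum(a * u[k] for a, u in zip(alphas, basis)) for k in range(len(v))]

print("components:", alphas)
print("rebuilt vector:", rebuilt)
```

Each component costs one inner product and one division, instead of the solution of an $n \times n$ linear system.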
Geometric interpretation of the theorem: The theorem that we have just demonstrated is the generalization of the decomposition theorem of a vector in the plane $\mathbb{R}^2$ or in the space $\mathbb{R}^3$ on a canonical basis of unit vectors on the axes. To simplify this, consider the case of $\mathbb{R}^2$. If $\hat{\imath}$ and $\hat{\jmath}$ are, respectively, the unit vectors of axes $x$ and $y$, then the decomposition theorem says that:
$$v = \underbrace{\|v\|\cos\alpha}_{\langle v, \hat{\imath}\rangle}\,\hat{\imath} + \underbrace{\|v\|\cos\beta}_{\langle v, \hat{\jmath}\rangle}\,\hat{\jmath} = \langle v, \hat{\imath}\rangle\,\hat{\imath} + \langle v, \hat{\jmath}\rangle\,\hat{\jmath}$$
which is a particular case of the theorem above. We will see that the Fourier series can be viewed as a further generalization of the decomposition theorem on an orthogonal or orthonormal basis.

1.6. Orthogonal projection in inner product spaces

The definition of orthogonal projection can be extended by examining the geometric and algebraic properties of this operation in $\mathbb{R}^2$ and $\mathbb{R}^3$. Let us begin with $\mathbb{R}^2$. In the Euclidean space $\mathbb{R}^2$, the inner product of a vector $v$ and a unit vector evidently gives us the orthogonal projection of $v$ in the direction defined by this vector, as shown in Figure 1.2 with an orthogonal projection along the $x$ axis. The properties verified by this projection are as follows:

1) projecting onto the $x$ axis a second time, vector $P_x v$ obviously remains unchanged given that it is already on the $x$ axis, i.e. $P_x^2(v) := P_x(P_x v) = P_x v\ \forall v \in V$. Put differently, the operator $P_x$ restricted to the $x$ axis is the identity on this axis;

2) the difference vector between $v$ and its projection, $v - P_x v$, is orthogonal to the $x$ axis, as we see from Figure 1.3;
Figure 1.2. Orthogonal projection $P_x v = \overrightarrow{OC}$ and diagonal projections $\overrightarrow{OB}$ and $\overrightarrow{OD}$ of a vector $v \in \mathbb{R}^2$ onto the x axis. For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip
Figure 1.3. Visualization of property 2 in R2 . For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip
3) $P_x v$ minimizes the distance between the terminal point of $v$ and the $x$ axis. In Figure 1.2, $\overline{AB}$ and $\overline{AD}$ are, in fact, the hypotenuses of right-angled triangles $ABC$ and $ACD$; on the other hand, $\overline{AC}$ is another side of these triangles, and is therefore smaller than $\overline{AB}$ and $\overline{AD}$. $\overline{AC}$ is the distance between the terminal point of $v$ and the terminal point of $P_x v$, while $\overline{AB}$ and $\overline{AD}$ are the distances between the terminal point of $v$ and the diagonal projections of $v$ onto $x$ rooted at $B$ and $D$, respectively.

We wish to define an orthogonal projection operation for an abstract inner product space of dimension $n$ which retains these same geometric properties. Analyzing orthogonal projections in $\mathbb{R}^3$ helps us to establish an idea of the algebraic definition of this operation. Figure 1.4 shows a vector $v \in \mathbb{R}^3$ and the plane
produced by the orthogonal vectors $u_1$ and $u_2$. We see that the projection $p$ of $v$ onto this plane is the vector sum of the orthogonal projections $p_1 = \frac{\langle v, u_1\rangle}{\|u_1\|^2}\, u_1$ and $p_2 = \frac{\langle v, u_2\rangle}{\|u_2\|^2}\, u_2$ onto the two vectors $u_1$ and $u_2$ taken separately, i.e.
$$p = p_1 + p_2 = \sum_{i=1}^{2} \frac{\langle v, u_i\rangle}{\|u_i\|^2}\, u_i.$$
Figure 1.4. Orthogonal projection p of a vector in R3 onto the plane produced by two unit vectors. For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip
Generalization should now be straightforward: consider an inner product space $(V, \langle\ ,\ \rangle)$ of dimension $n$ and an orthogonal family of non-zero vectors $F = \{u_1, \dots, u_m\}$, $m \le n$, $u_i \neq 0_V\ \forall i = 1, \dots, m$. The vector subspace of $V$ produced by all linear combinations of the vectors of $F$ shall be written $\operatorname{span}(F)$:
$$\operatorname{span}(F) \equiv S = \left\{ s \in V : \exists\, \alpha_1, \dots, \alpha_m \in \mathbb{K} \text{ such that } s = \sum_{j=1}^{m} \alpha_j u_j \right\}$$
The orthogonal projection operator or orthogonal projector of a vector $v \in V$ onto $S$ is defined as the following map, which is obviously linear:
$$P_S : V \longrightarrow S \subseteq V, \qquad v \longmapsto P_S(v) = \sum_{i=1}^{m} \frac{\langle v, u_i\rangle}{\|u_i\|^2}\, u_i$$
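The definition of $P_S$ translates directly into code. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the inner product is linear in its first argument, and the data are the vectors $u$, $w$, $v$ of Exercise 1.1 at the end of this chapter. It checks numerically the behaviour observed above in $\mathbb{R}^2$: projecting twice changes nothing, and the residual is orthogonal to the vectors spanning $S$.

```python
def inner(v, w):
    """Inner product on C^n, linear in the first argument (assumed convention)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def project(v, family):
    """P_S(v) = sum_i <v,u_i>/||u_i||^2 u_i, for an ORTHOGONAL family of non-zero vectors."""
    p = [0j] * len(v)
    for u in family:
        c = inner(v, u) / inner(u, u).real
        p = [pk + c * uk for pk, uk in zip(p, u)]
    return p

# Vectors of Exercise 1.1: u and w are orthogonal and span S.
u = [0, 1j, 2j]
w = [0, 1j, -0.5j]
v = [2j, 0, -1j]

p = project(v, [u, w])                    # projection of v onto S
r = [a - b for a, b in zip(v, p)]         # residual vector v - P_S(v)
pp = project(p, [u, w])                   # projecting a second time

print("P_S(v) =", p)
print("residual orthogonal to u, w:", abs(inner(r, u)) < 1e-12, abs(inner(r, w)) < 1e-12)
print("idempotent:", all(abs(a - b) < 1e-12 for a, b in zip(p, pp)))
```

Note that `project` silently gives a wrong answer if the family passed to it is not orthogonal; in that case the family would first have to be orthogonalized.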
Theorem 1.12 shows that the orthogonal projection defined above retains all of the properties of the orthogonal projection demonstrated for $\mathbb{R}^2$.

THEOREM 1.12.– Using the same notation as before, we have:

1) if $s \in S$ then $P_S(s) = s$, i.e. the action of $P_S$ on the vectors in $S$ is the identity;
2) $\forall v \in V$ and $s \in S$, the residual vector of the projection, i.e. $v - P_S(v)$, is orthogonal to $S$:
$$\langle v - P_S(v), s\rangle = 0 \iff v - P_S(v) \perp s$$

3) $\forall v \in V$ and $s \in S$: $\|v - P_S(v)\| \le \|v - s\|$, and the equality holds if and only if $s = P_S(v)$. We write:
$$P_S(v) = \underset{s \in S}{\operatorname{argmin}}\ \|v - s\|$$
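PROOF.–

1) Let $s \in S$, i.e. $s = \sum_{j=1}^{m} \alpha_j u_j$, then:
$$P_S(s) = \sum_{i=1}^{m} \frac{\left\langle \sum_{j=1}^{m} \alpha_j u_j, u_i \right\rangle}{\|u_i\|^2}\, u_i = \sum_{i=1}^{m} \frac{\sum_{j=1}^{m} \alpha_j \langle u_j, u_i\rangle}{\|u_i\|^2}\, u_i \underset{(u_i \perp u_j\ \forall i \neq j)}{=} \sum_{i=1}^{m} \frac{\alpha_i \langle u_i, u_i\rangle}{\|u_i\|^2}\, u_i = \sum_{i=1}^{m} \alpha_i u_i = s$$

2) Consider the inner product of $P_S(v)$ and a fixed vector $u_j$, $j \in \{1, \dots, m\}$:
$$\langle P_S(v), u_j\rangle = \left\langle \sum_{i=1}^{m} \frac{\langle v, u_i\rangle}{\|u_i\|^2}\, u_i, u_j \right\rangle \underset{\text{(linearity)}}{=} \sum_{i=1}^{m} \frac{\langle v, u_i\rangle}{\|u_i\|^2}\, \langle u_i, u_j\rangle \underset{(u_i \perp u_j\ \forall i \neq j)}{=} \frac{\langle v, u_j\rangle}{\|u_j\|^2}\, \langle u_j, u_j\rangle = \langle v, u_j\rangle$$
hence:
$$\langle v, u_j\rangle - \langle P_S(v), u_j\rangle = 0 \underset{\text{linearity of } \langle\ ,\ \rangle}{\iff} \langle v - P_S(v), u_j\rangle = 0 \qquad \forall j \in \{1, \dots, m\}$$
Lemma 1.1 guarantees that $\langle v - P_S(v), s\rangle = 0\ \forall s = \sum_{j=1}^{m} \alpha_j u_j$.

3) It is helpful to rewrite the difference $v - s$ as $v - P_S(v) + P_S(v) - s$. From property 2, $v - P_S(v) \perp S$; moreover, $P_S(v), s \in S$, so $P_S(v) - s \in S$. Hence $(v - P_S(v)) \perp (P_S(v) - s)$. The generalized Pythagorean theorem implies that:
$$\|v - s\|^2 = \|v - P_S(v) + P_S(v) - s\|^2 = \|v - P_S(v)\|^2 + \underbrace{\|P_S(v) - s\|^2}_{\ge 0} \ge \|v - P_S(v)\|^2$$
hence $\|v - s\| \ge \|v - P_S(v)\|\ \forall v \in V, s \in S$.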
Evidently, $\|P_S(v) - s\|^2 = 0$ if and only if $s = P_S(v)$, and in this case $\|v - s\|^2 = \|v - P_S(v)\|^2$. □

The theorem demonstrated above tells us that the vector in the vector subspace $S \subseteq V$ which is the most “similar” to $v \in V$ (in the sense of the norm induced by the inner product) is given by the orthogonal projection. The generalization of this result to infinite-dimensional Hilbert spaces will be discussed in Chapter 5.

As already seen for the projection operator in $\mathbb{R}^2$ and $\mathbb{R}^3$, the non-negative scalar quantity $\frac{|\langle v, u_i\rangle|}{\|u_i\|}$ gives a measure of the importance of $\frac{u_i}{\|u_i\|}$ in the reconstruction of the best approximation of $v$ in $S$ via the formula $P_S(v) = \sum_{i=1}^{m} \frac{\langle v, u_i\rangle}{\|u_i\|^2}\, u_i$: if this quantity is large, then $\frac{u_i}{\|u_i\|}$ is very important to reconstruct $P_S(v)$; otherwise, in some circumstances, it may be ignored. In the applications to signal compression, a usual strategy consists of reordering the summation that defines $P_S(v)$ in descending order of the quantities $\frac{|\langle v, u_i\rangle|}{\|u_i\|}$ and trying to eliminate as many small terms as possible without degrading the signal quality. This observation is crucial to understanding the significance of the Fourier decomposition, which will be examined in both discrete and continuous contexts in the following chapters.

Finally, note that the seemingly trivial equation $v = v - s + s$ is, in fact, far more meaningful than it first appears when $s$ is the orthogonal projection of $v$ onto $S$, $s = P_S(v)$: in this case, we know that $v - s$ and $s$ are orthogonal. The decomposition of a vector as the sum of a component belonging to a subspace $S$ and a component belonging to its orthogonal complement is known as the orthogonal projection theorem. This decomposition is unique, and its generalization for infinite dimensions, alongside its consequences for the geometric structure of Hilbert spaces, will be examined in detail in Chapter 5.

1.7. Existence of an orthonormal basis: the Gram-Schmidt process

As we have seen, projection and decomposition laws are much simpler when an orthonormal basis is available. Theorem 1.13 states that in a finite-dimensional inner product space, an orthonormal basis can always be constructed from a free family of generators.
THEOREM 1.13 (The iterative Gram-Schmidt process⁶).– If $(v_1, \dots, v_n)$, $n \le \infty$, is a basis of $(V, \langle\ ,\ \rangle)$, then an orthonormal basis of $(V, \langle\ ,\ \rangle)$ can be obtained from $(v_1, \dots, v_n)$.

PROOF.– This proof is constructive in that it provides the method used to construct an orthonormal basis from any arbitrary basis.

– Step 1: normalization of $v_1$:
$$u_1 = \frac{v_1}{\|v_1\|}$$

– Step 2, illustrated in Figure 1.5: $v_2$ is projected in the direction of $u_1$, that is, we consider $\langle v_2, u_1\rangle u_1$. We know from Theorem 1.12 that the vector difference $v_2 - \langle v_2, u_1\rangle u_1$ is orthogonal to $u_1$. The result is then normalized:
$$u_2 = \frac{v_2 - \langle v_2, u_1\rangle u_1}{\|v_2 - \langle v_2, u_1\rangle u_1\|}$$
Figure 1.5. Illustration of the second step in the Gram-Schmidt orthonormalization process. For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip
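– Step $n$, by iteration:
$$u_n = \frac{v_n - \left(\langle v_n, u_{n-1}\rangle u_{n-1} + \dots + \langle v_n, u_1\rangle u_1\right)}{\left\|v_n - \left(\langle v_n, u_{n-1}\rangle u_{n-1} + \dots + \langle v_n, u_1\rangle u_1\right)\right\|}$$
□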
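The steps of the proof can be sketched in a few lines of Python. This is a minimal illustration of the classical procedure described above, not a numerically robust implementation: the function names and the sample basis are assumptions, the input is assumed to be linearly independent (otherwise a division by zero occurs), and the inner product is linear in its first argument.

```python
import math

def inner(v, w):
    """Inner product on C^n, linear in the first argument (assumed convention)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def norm(v):
    return math.sqrt(inner(v, v).real)

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: returns an orthonormal family with the same span."""
    ortho = []
    for v in vectors:
        r = list(v)
        # Subtract from v its projections <v,u_k> u_k on the unit vectors built so far.
        for u in ortho:
            c = inner(v, u)
            r = [a - c * b for a, b in zip(r, u)]
        n = norm(r)                      # non-zero if the input family is free
        ortho.append([a / n for a in r])
    return ortho

# A hypothetical (non-orthogonal) basis of R^3.
basis = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
onb = gram_schmidt(basis)

for i, ui in enumerate(onb):
    print([round(abs(inner(ui, uj)), 12) for uj in onb])   # rows of the Gram matrix
```

The printed rows should be those of the identity matrix. In floating-point arithmetic, the variant in which the projections are subtracted from the running remainder `r` rather than computed from the original `v` (often called modified Gram-Schmidt) is generally preferred for stability.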
1.8. Fundamental properties of orthonormal and orthogonal bases

The most important properties of an orthonormal basis are listed in Theorem 1.14.

6 Jørgen Pedersen Gram (1850, Nustrup-1916, Copenhagen), Erhard Schmidt (1876, Tartu-1959, Berlin).
THEOREM 1.14.– Let $(u_1, \dots, u_n)$ be an orthonormal basis of $(V, \langle\ ,\ \rangle)$, $\dim(V) = n$. Then, $\forall v, w \in V$:

1) Decomposition theorem on an orthonormal basis:
$$v = \sum_{i=1}^{n} \langle v, u_i\rangle u_i \qquad [1.7]$$

2) Parseval's identity⁷:
$$\langle v, w\rangle = \sum_{i=1}^{n} \langle v, u_i\rangle \langle u_i, w\rangle \qquad [1.8]$$

3) Plancherel's theorem⁸:
$$\|v\|^2 = \sum_{i=1}^{n} |\langle v, u_i\rangle|^2 \qquad [1.9]$$

Proof of 1: an immediate consequence of Theorem 1.12. Given that $(u_1, \dots, u_n)$ is a basis, $v \in \operatorname{span}(u_1, \dots, u_n)$; furthermore, $(u_1, \dots, u_n)$ is orthonormal, so $v = P_S(v) = \sum_{i=1}^{n} \langle v, u_i\rangle u_i$. It is not necessary to divide by $\|u_i\|^2$ when summing since $\|u_i\| = 1\ \forall i$.

Proof of 2: using point 1 it is possible to write $v = \sum_{i=1}^{n} \langle v, u_i\rangle u_i$, and calculating the inner product of $v$, written in this way, and $w$, using equation [1.1], we obtain:
$$\langle v, w\rangle = \left\langle \sum_{i=1}^{n} \langle v, u_i\rangle u_i, w \right\rangle = \sum_{i=1}^{n} \langle v, u_i\rangle \langle u_i, w\rangle$$

Proof of 3: writing $w = v$ on the left-hand side of Parseval's identity gives us $\langle v, v\rangle = \|v\|^2$. On the right-hand side, we have:
$$\sum_{i=1}^{n} \langle v, u_i\rangle \langle u_i, v\rangle = \sum_{i=1}^{n} \langle v, u_i\rangle \overline{\langle v, u_i\rangle} = \sum_{i=1}^{n} |\langle v, u_i\rangle|^2$$
hence $\|v\|^2 = \sum_{i=1}^{n} |\langle v, u_i\rangle|^2$. □

7 Marc-Antoine de Parseval des Chênes (1755, Rosières-aux-Salines-1836, Paris).
8 Michel Plancherel (1885, Bussy-1967, Zurich).
NOTE.–

1) The physical interpretation of Plancherel's theorem is as follows: the energy of $v$, measured as the square of the norm, can be decomposed using the sum of the squared moduli of each projection of $v$ on the $n$ directions of the orthonormal basis $(u_1, \dots, u_n)$. In Fourier theory, the directions of the orthonormal basis are fundamental harmonics (sines and cosines with defined frequencies): this is why Fourier analysis may be referred to as harmonic analysis.

2) If $(u_1, \dots, u_n)$ is an orthogonal, rather than an orthonormal, basis, then using the projector formula and Theorem 1.12, the results of Theorem 1.14 can be written as:

a) decomposition of $v \in V$ on an orthogonal basis:
$$v = \sum_{i=1}^{n} \frac{\langle v, u_i\rangle}{\|u_i\|^2}\, u_i \qquad [1.10]$$

b) Parseval's identity for an orthogonal basis:
$$\langle v, w\rangle = \sum_{i=1}^{n} \frac{\langle v, u_i\rangle \langle u_i, w\rangle}{\|u_i\|^2} \qquad [1.11]$$

c) Plancherel's theorem for an orthogonal basis:
$$\|v\|^2 = \sum_{i=1}^{n} \frac{|\langle v, u_i\rangle|^2}{\|u_i\|^2} \qquad [1.12]$$
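Formulas [1.10] to [1.12] can be checked on a concrete orthogonal, non-orthonormal basis of $\mathbb{C}^3$. The Python sketch below is an illustration only: the variable names are assumptions, the inner product is taken to be linear in its first argument, and the vectors are borrowed from Exercise 1.1 (the orthogonal basis $(u, w, r)$ and the vectors $a$ and $v$ that appear there).

```python
def inner(v, w):
    """Inner product on C^n, linear in the first argument (assumed convention)."""
    return sum(x * complex(y).conjugate() for x, y in zip(v, w))

# Orthogonal (not orthonormal) basis (u, w, r) of C^3 from Exercise 1.1.
basis = [[0, 1j, 2j], [0, 1j, -0.5j], [2j, 0, 0]]
a = [2j, -1, 0]
b = [2j, 0, -1j]      # the vector called v in Exercise 1.1

# [1.10]: components <a,u_i>/||u_i||^2 and reconstruction of a.
coeffs = [inner(a, u) / inner(u, u).real for u in basis]
rebuilt = [sum(c * u[k] for c, u in zip(coeffs, basis)) for k in range(3)]

# [1.11]: Parseval's identity for an orthogonal basis.
parseval = sum(inner(a, u) * inner(u, b) / inner(u, u).real for u in basis)

# [1.12]: Plancherel's theorem for an orthogonal basis.
plancherel = sum(abs(inner(a, u)) ** 2 / inner(u, u).real for u in basis)

print("rebuilt a:", rebuilt)
print("<a,b> =", inner(a, b), " Parseval sum =", parseval)
print("||a||^2 =", inner(a, a).real, " Plancherel sum =", plancherel)
```

Dropping the divisions by $\|u_i\|^2$ would give wrong values here, since none of the three basis vectors has unit norm.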
The following exercise is designed to test the reader's knowledge of the theory of finite-dimensional inner product spaces. The two subsequent exercises explicitly include inner products which are non-Euclidean.

Exercise 1.1

Consider the complex Euclidean inner product space $\mathbb{C}^3$ and the following three vectors:
$$u = (0, i, 2i), \qquad v = (2i, 0, -i), \qquad w = \left(0, i, \tfrac{1}{2} e^{-\frac{\pi}{2} i}\right)$$

1) Determine the orthogonality relationships between vectors $u$, $v$, $w$.

2) Calculate the norm of $u$, $v$, $w$ and the Euclidean distances between them.

3) Verify that $(u, v, w)$ is a (non-orthogonal) basis of $\mathbb{C}^3$.
4) Let $S$ be the vector subspace of $\mathbb{C}^3$ generated by $u$ and $w$. Calculate $P_S v$, the orthogonal projection of $v$ onto $S$. Calculate $d(v, P_S v)$, that is, the Euclidean distance between $v$ and its projection onto $S$, and verify that this minimizes the distance between $v$ and the vectors of $S$ (hint: look at the square of the distance).

5) Using the results of the previous questions, determine an orthogonal basis and an orthonormal basis for $\mathbb{C}^3$ without using the Gram-Schmidt orthonormalization process (hint: remember the geometric relationship between the residual vector $r$ and the subspace $S$).

6) Given a vector $a = (2i, -1, 0)$, write the decomposition of $a$ and Plancherel's theorem in relation to the orthonormal basis identified in point 5. Use these results to identify the vector from the orthonormal basis which has the heaviest weight in the decomposition of $a$ (and which gives the best “rough approximation” of $a$). Use a graphics program to draw the progressive vector sum of $a$, beginning with the rough approximation and adding finer details supplied by the other vectors.

Solution to Exercise 1.1

1) Evidently, $e^{-\frac{\pi}{2} i} = -i$, so by directly calculating the inner products: $\langle u, v\rangle = -2$, $\langle u, w\rangle = 0$ and $\langle v, w\rangle = \frac{1}{2}$.

2) By direct calculation: $\|u\|^2 = 5$, $\|v\|^2 = 5$, $\|w\|^2 = \frac{5}{4}$. After calculating the difference vectors, we obtain: $d(u, v) = \|u - v\| = \sqrt{14}$, $d(u, w) = \|u - w\| = \frac{5}{2}$, $d(v, w) = \|v - w\| = \frac{\sqrt{21}}{2}$.

3) The three vectors $u$, $v$, $w$ are linearly independent, so they form a basis in $\mathbb{C}^3$. This basis is not orthogonal since only vectors $u$ and $w$ are orthogonal.

4) $S = \operatorname{span}(u, w)$. Since $(u, w)$ is an orthogonal basis in $S$, we can write:
$$P_S(v) = \frac{\langle v, u\rangle}{\|u\|^2}\, u + \frac{\langle v, w\rangle}{\|w\|^2}\, w = (0, 0, -i)$$
The residual vector of the projection of $v$ on $S$ is $r = v - P_S v = (2i, 0, 0)$ and thus $d(v, P_S v)^2 = \|r\|^2 = 4$. The most general vector in $S$ is $s = \alpha u + \beta w = \left(0, (\alpha + \beta) i, \left(2\alpha - \frac{\beta}{2}\right) i\right)$ and $d(v, s)^2 = \|v - s\|^2 = 4 + |\alpha + \beta|^2 + \left|2\alpha - \frac{\beta}{2} + 1\right|^2 \ge 4 = d(v, P_S v)^2$. This confirms that $P_S v$ is the vector in $S$ with the minimum distance from $v$ in relation to the Euclidean norm.

5) $r$ is orthogonal to $S$, which is generated by $u$ and $w$, hence $(u, w, r)$ is a set of orthogonal vectors in $\mathbb{C}^3$, that is, an orthogonal basis of $\mathbb{C}^3$. To obtain an orthonormal basis, we then simply divide each vector by its norm:
$$(\hat{u}, \hat{w}, \hat{r}) \equiv \left(\frac{u}{\|u\|}, \frac{w}{\|w\|}, \frac{r}{\|r\|}\right) = \left(\left(0, \frac{i}{\sqrt{5}}, \frac{2i}{\sqrt{5}}\right), \left(0, \frac{2i}{\sqrt{5}}, -\frac{i}{\sqrt{5}}\right), (i, 0, 0)\right)$$

6) Decomposition: $a = \langle a, \hat{u}\rangle \hat{u} + \langle a, \hat{w}\rangle \hat{w} + \langle a, \hat{r}\rangle \hat{r} = \frac{i}{\sqrt{5}}\, \hat{u} + \frac{2i}{\sqrt{5}}\, \hat{w} + 2\hat{r}$.
Plancherel's theorem: $\|a\|^2 = 5 = |\langle a, \hat{u}\rangle|^2 + |\langle a, \hat{w}\rangle|^2 + |\langle a, \hat{r}\rangle|^2$ $\left(= \frac{1}{5} + \frac{4}{5} + 4 = 5\right)$.

The vector with the heaviest weight in the reconstruction of $a$ is thus $\hat{r}$: this vector gives the best rough approximation of $a$. By calculating the vector sum of this rough representation and the other two vectors, we can reconstruct the “fine details” of $a$, first with $\hat{w}$ and then with $\hat{u}$. □

Exercise 1.2

Let $M(n, \mathbb{C})$ be the space of $n \times n$ complex matrices. The map $\varphi : M(n, \mathbb{C}) \times M(n, \mathbb{C}) \to \mathbb{C}$ is defined by:
$$\varphi(A, B) = \operatorname{tr}(B^\dagger A)$$
where $B^\dagger := \overline{B}^{\,t}$ denotes the adjoint matrix of $B$ and $\operatorname{tr}$ is the matrix trace. Prove that $\varphi$ is an inner product.

Solution to Exercise 1.2

The distributive property of matrix multiplication over addition and the linearity of the trace establish the linearity of $\varphi$ in relation to the first variable.

Now, let us prove that $\varphi$ is Hermitian. Let $A = (a_{i,j})_{1 \le i,j \le n}$ and $B = (b_{i,j})_{1 \le i,j \le n}$ be two matrices in $M(n, \mathbb{C})$. Let $(c_{i,j})_{1 \le i,j \le n} = (\overline{b_{j,i}})_{1 \le i,j \le n}$ be the coefficients of the matrix $B^\dagger$ and let $(d_{i,j})_{1 \le i,j \le n} = (\overline{a_{j,i}})_{1 \le i,j \le n}$ be the coefficients of $A^\dagger$. This gives us:
$$\varphi(A, B) = \operatorname{tr}(B^\dagger A) = \operatorname{tr}\left[\left(\sum_{k=1}^{n} c_{i,k}\, a_{k,j}\right)_{i,j}\right] = \operatorname{tr}\left[\left(\sum_{k=1}^{n} \overline{b_{k,i}}\, a_{k,j}\right)_{i,j}\right]$$
$$= \sum_{i,k=1}^{n} \overline{b_{k,i}}\, a_{k,i} = \overline{\sum_{i,k=1}^{n} d_{i,k}\, b_{k,i}} = \overline{\operatorname{tr}(A^\dagger B)} = \overline{\varphi(B, A)}$$
Thus, $\varphi$ is a sesquilinear Hermitian form. Furthermore, $\varphi$ is positive:
$$\forall A \in M(n, \mathbb{C}), \qquad \varphi(A, A) = \sum_{i,k=1}^{n} |a_{k,i}|^2 \ge 0$$
It is also definite:
$$\varphi(A, A) = 0 \iff \sum_{i,k=1}^{n} |a_{k,i}|^2 = 0 \iff \forall\, 1 \le k, i \le n,\ a_{k,i} = 0 \iff A = 0$$
Thus, $\varphi$ is an inner product. □

Exercise 1.3
Let $E = \mathbb{R}[X]$ be the vector space of single variable polynomials with real coefficients. For $P, Q \in E$, take:
$$\Phi(P, Q) = \int_{-1}^{1} \frac{P(t) Q(t)}{\sqrt{1 - t^2}}\, dt$$

1) Remember that $f(t) \underset{t \to t_0}{=} O(g(t))$ means that $\exists\, a, C > 0$ such that $|t - t_0| < a \implies |f(t)| \le C\, |g(t)|$. Prove that for all $P, Q \in E$, we have:
$$\frac{P(t) Q(t)}{\sqrt{1 - t^2}} \underset{t \to 1}{=} O\left(\frac{1}{\sqrt{1 - t}}\right)$$
and:
$$\frac{P(t) Q(t)}{\sqrt{1 - t^2}} \underset{t \to -1}{=} O\left(\frac{1}{\sqrt{1 + t}}\right)$$
Use this result to deduce that $\Phi$ is well defined over $E \times E$.

2) Prove that $\Phi$ is an inner product over $E$, which we shall denote by $\langle\ ,\ \rangle$.

3) For $n \in \mathbb{N}$, let $T_n$ be the $n$-th Chebyshev polynomial, that is, the only polynomial such that $\forall \theta \in \mathbb{R}$, $T_n(\cos\theta) = \cos(n\theta)$. Applying the substitution $t = \cos\theta$, show that $(T_n)_{n \in \mathbb{N}}$ is an orthogonal family in $E$. Hint: use the trigonometric formula [1.13]:
$$\frac{1}{2}\left(\cos((n + m)\theta) + \cos((n - m)\theta)\right) = \cos(n\theta)\cos(m\theta) \qquad \forall n, m \in \mathbb{N}. \qquad [1.13]$$

4) Prove that for all $n \in \mathbb{N}$, $(T_0, \dots, T_n)$ is an orthogonal basis of $\mathbb{R}_n[X]$, the vector space of polynomials in $\mathbb{R}[X]$ of degree less than or equal to $n$. Deduce that $(T_n)_{n \in \mathbb{N}}$ is an orthogonal basis in the algebraic sense: every element in $E$ is a finite linear combination of elements in the basis of $E$.
5) Calculate the norm of $T_n$ for all $n$ and deduce an orthonormal basis (in the algebraic sense) of $E$ using this result.

Solution to Exercise 1.3

1) We write $f(t) = \frac{P(t) Q(t)}{\sqrt{1 - t^2}} = \frac{P(t) Q(t)}{\sqrt{1 - t}\,\sqrt{1 + t}}$. Since $P$ and $Q$ are polynomials, the function $t \mapsto \frac{P(t) Q(t)}{\sqrt{1 + t}}$ is continuous in a neighborhood $V_1(1)$ and thus, according to the Weierstrass theorem, it is bounded in this neighborhood, that is, $\exists\, C_1 > 0$ such that $t \in V_1(1) \implies \left|\frac{P(t) Q(t)}{\sqrt{1 + t}}\right| \le C_1$. Similarly, the function $t \mapsto \frac{P(t) Q(t)}{\sqrt{1 - t}}$ is continuous in a neighborhood $V_2(-1)$, thus $\exists\, C_2 > 0$ such that $t \in V_2(-1) \implies \left|\frac{P(t) Q(t)}{\sqrt{1 - t}}\right| \le C_2$. This gives us:
$$t \in V_1(1) \implies |f(t)| \le C_1 \frac{1}{\sqrt{1 - t}} \iff f(t) \underset{t \to 1}{=} O\left(\frac{1}{\sqrt{1 - t}}\right)$$
and:
$$t \in V_2(-1) \implies |f(t)| \le C_2 \frac{1}{\sqrt{1 + t}} \iff f(t) \underset{t \to -1}{=} O\left(\frac{1}{\sqrt{1 + t}}\right)$$
This implies that the integral defining $\Phi$ is well defined: $f(t)$ is continuous over $(-1, 1)$ and can therefore be integrated on every closed subinterval. The result which we have just proved shows that $f(t)$ is integrable in a right neighborhood of $-1$ and a left neighborhood of $1$, as the integral of its absolute value is bounded above by that of an integrable function in both cases.

2) The bilinearity of $\Phi$ is obtained from the linearity of the integral using direct calculation. Its symmetry is a consequence of the commutativity of the pointwise product of functions. The only property which is not immediately evident is positive definiteness. Let us start by proving positiveness:
$$\Phi(P, P) = \int_{-1}^{1} \frac{P^2(t)}{\sqrt{1 - t^2}}\, dt \ge 0$$
and⁹:
$$\Phi(P, P) = 0 \iff \frac{P^2(t)}{\sqrt{1 - t^2}} = 0 \text{ a.e. on } (-1, 1) \iff P(t) = 0 \text{ a.e. on } (-1, 1)$$
but the only polynomial with an infinite number of roots is the null polynomial $0(t) \equiv 0$, so $P = 0$. $\Phi$ is therefore an inner product on $E$.

9 a.e.: almost everywhere (see Chapter 3).
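3) For all $n, m \in \mathbb{N}$, applying the substitution $t = \cos\theta$, $dt = -\sin\theta\, d\theta$ (with $t = \cos\theta = -1 \iff \theta = \pi$ and $t = \cos\theta = 1 \iff \theta = 0$):
$$\langle T_n, T_m\rangle = \int_{-1}^{1} \frac{T_n(t) T_m(t)}{\sqrt{1 - t^2}}\, dt = \int_{\pi}^{0} -\sin\theta\, \frac{T_n(\cos\theta) T_m(\cos\theta)}{\sqrt{1 - \cos^2(\theta)}}\, d\theta$$
$$= \int_{0}^{\pi} \sin\theta\, \frac{\cos(n\theta)\cos(m\theta)}{|\sin\theta|}\, d\theta = \int_{0}^{\pi} \frac{\cos(n\theta)\cos(m\theta)}{\sin\theta}\, \sin\theta\, d\theta \qquad (\sin\theta \ge 0 \text{ on } [0, \pi])$$
$$= \int_{0}^{\pi} \cos(n\theta)\cos(m\theta)\, d\theta = \frac{1}{2}\left(\int_{0}^{\pi} \cos((n + m)\theta)\, d\theta + \int_{0}^{\pi} \cos((n - m)\theta)\, d\theta\right) \qquad (\text{from } [1.13])$$
So, for all $n \neq m$, we have:
$$\langle T_n, T_m\rangle = \frac{1}{2}\left(\left[\frac{\sin((n + m)\theta)}{n + m}\right]_{0}^{\pi} + \left[\frac{\sin((n - m)\theta)}{n - m}\right]_{0}^{\pi}\right) = 0$$
that is, Chebyshev polynomials form an orthogonal family of polynomials in relation to the inner product defined above.

4) The family $(T_0, T_1, \dots, T_n)$ is an orthogonal (and thus free) family of $n + 1$ elements of $\mathbb{R}_n[X]$, which is of dimension $n + 1$, meaning that it is an orthogonal basis of $\mathbb{R}_n[X]$. To show that $(T_n)_{n \in \mathbb{N}}$ is a basis in the algebraic sense of $E$, consider a polynomial $P \in E$ of an arbitrary degree $d \in \mathbb{N}$, i.e. $P \in \mathbb{R}_d[X]$, and note that $(T_0, T_1, \dots, T_d)$ is an orthogonal (free) family of generators of $\mathbb{R}_d[X]$, that is, a basis in the algebraic sense of the term.

5) The norm of $T_n$ is calculated using the following equality:
$$\langle T_n, T_m\rangle = \frac{1}{2}\left(\int_{0}^{\pi} \cos((n + m)\theta)\, d\theta + \int_{0}^{\pi} \cos((n - m)\theta)\, d\theta\right)$$
which was demonstrated in point 3. Taking $n = m$, we have:
$$\|T_n\|^2 = \langle T_n, T_n\rangle = \frac{1}{2}\left(\int_{0}^{\pi} \cos(2n\theta)\, d\theta + \int_{0}^{\pi} d\theta\right) = \begin{cases} \frac{1}{2}\left(\int_{0}^{\pi} d\theta + \int_{0}^{\pi} d\theta\right) = \pi & \text{if } n = 0 \\[4pt] \frac{1}{2}\left(\left[\frac{\sin 2n\theta}{2n}\right]_{0}^{\pi} + \pi\right) = \frac{\pi}{2} & \text{if } n \ge 1, \end{cases}$$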
hence $\|T_0\| = \sqrt{\pi}$ and $\|T_n\| = \sqrt{\pi/2}$ for $n \ge 1$. Finally, the family:
$$\left\{\frac{T_0}{\sqrt{\pi}}\right\} \cup \left\{\sqrt{\frac{2}{\pi}}\, T_n\right\}_{n \ge 1}$$
is an orthonormal basis of $E$, the vector space of single variable polynomials with real coefficients. □

1.9. Summary

In this chapter, we have examined the properties of real and complex inner products, highlighting their differences. We noted that the symmetrical and bilinear properties of the real inner product must be replaced by conjugate symmetry and sesquilinearity in order to obtain a set of properties which are compatible with definite positivity. This final property is essential in order to produce a norm from a scalar product. We noted that the prototype for all inner product spaces, or pre-Hilbert spaces, of finite dimension $n$ is the Euclidean space $\mathbb{K}^n$, where $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$.

Using the inner product, the concept of orthogonality between vectors can be extended to any inner product space. Two vectors are orthogonal if their inner product is null. The null vector is the only vector which is orthogonal to all other vectors, and the property of definite positiveness means that it is the only vector to be orthogonal to itself. If two vectors have the same inner product with all other vectors, that is, the same projection in every direction, then these vectors coincide.

A norm on a vector space is said to be a Hilbert norm if an inner product can be defined which generates the norm in a canonical manner. Remarkably, a norm is a Hilbert norm if and only if it satisfies the parallelogram law; this holds true for both finite and infinite dimensions. The polarization law can be used to define an inner product which is compatible with a Hilbert norm.

Vector orthogonality implies linear independence, guaranteeing that a set of $n$ orthogonal non-null vectors in a vector space of dimension $n$ will constitute a basis. The expansion of a vector on an orthonormal basis is trivial: the components in relation to this basis are the inner products of the vector with the basis vectors.
It is therefore much simpler to calculate components in such cases because, if the basis is not orthonormal, then a linear system of equations must be solved. The concept of orthogonal projection on a vector subspace S was also presented. Given an orthogonal basis of this space, the projection can be represented as an expansion over the vectors of the basis, with coefficients given by the inner products
(which are normalized if the basis is not orthonormalized). We have seen that the difference between a vector and its orthogonal projection, known as the residual vector, is orthogonal to the projection subspace S. We also demonstrated that the orthogonal projection is the vector in S which minimizes the distance (in relation to the norm of the vector space) between the vector and the vectors of S. Given an inner product space, of finite or infinite dimensions, an orthonormal basis can always be defined using the Gram-Schmidt orthonormalization algorithm. Finally, we proved the important Parseval identity and Plancherel’s theorem in relation to an orthonormal or orthogonal basis. The extension of these properties to infinite dimensions is presented in Chapter 5.
2 The Discrete Fourier Transform and its Applications to Signal and Image Processing
The information presented in the previous chapter (Chapter 1) concerning complex inner product spaces and their properties lays the foundations for a very simple introduction to the discrete Fourier transform (DFT). We simply need to prove that certain functions of complex exponentials constitute an orthogonal basis for a complex inner product space of finite dimension. From a mathematical standpoint, the DFT is a simple change of basis in a vector space; however, its interpretation is of crucial importance and is extremely useful in the context of applications, notably in signal theory, as we shall see in section 2.6. This section draws on the excellent work of M. Frazier (2001).

2.1. The space $\ell^2(\mathbb{Z}_N)$ and its canonical basis

In order to introduce the vector space in which the DFT is to be constructed, we need to make a few adjustments to the notation used thus far. We shall continue to work with complex vectors with a number of components $N$, $1 < N < +\infty$, but a vector in $\mathbb{C}^N$ will be considered as a finite sequence. Our first task is to define $\mathbb{Z}_N$.
From Euclidean to Hilbert Spaces: Introduction to Functional Analysis and its Applications First Edition. Edoardo Provenzi. © ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
DEFINITION 2.1.– Two integers $a, b \in \mathbb{Z}$ are said to be congruent modulo $N$ if their difference is divisible by $N$, that is:
$$\frac{a - b}{N} = m \in \mathbb{Z},$$
meaning that we can write $a = b + mN$. The (Gaussian) notation for two integers which are congruent modulo $N$ is:
$$a \equiv b \pmod{N}$$
Congruence modulo $N$ can be shown to be an equivalence relation in $\mathbb{Z}$. Like all equivalence relations, it creates a partition of $\mathbb{Z}$ into distinct equivalence classes. The set of these equivalence classes is written as:
$$\mathbb{Z}_N = \mathbb{Z} \pmod{N}$$
A (finite) sequence of complex values on $\mathbb{Z}_N$ is a function:
$$z : \mathbb{Z}_N \to \mathbb{C}, \qquad j \mapsto z(j)$$
In practice, circular or “clock” arithmetic is applied: this consists of identifying a sequence defined on $\mathbb{Z}_N$ as a sequence defined on $\{0, 1, \dots, N-1\}$ and extended to $\mathbb{Z}$ by $N$-periodicity:
$$z(j + mN) = z(j) \qquad \forall j, m \in \mathbb{Z}$$
that is, given the definition of $z(j)$ when $j \in \{0, 1, \dots, N-1\}$, in order to determine $z(j)$ when $j \notin \{0, \dots, N-1\}$, we must add an integer multiple of $N$ to $j$. This is written as $mN$, $m \in \mathbb{Z}$, chosen such that $\bar{j} = j + mN \in \{0, 1, \dots, N-1\}$. We then define $z(j) = z(j + mN) = z(\bar{j})$. An example is shown below.

EXAMPLE.– $N = 12$, $z = (1, i, i, \sqrt{2}\,i, 0, 0, 0, -1, 0, 0, 0, 2)$, that is:
$$\begin{cases} z(0) = 1 \\ z(1) = z(2) = i \\ z(3) = \sqrt{2}\,i \\ z(4) = z(5) = z(6) = 0 \\ z(7) = -1 \\ z(8) = z(9) = z(10) = 0 \\ z(11) = 2 \end{cases}$$
extended by 12-periodicity to $\mathbb{Z}$. Determine $z(-21)$. Since $N = 12$, we must find the integer $m \neq 0$ such that $-21 + 12m \in \{0, 1, \dots, 11\}$:
$$-21 + 12m = \begin{cases} -21 + 12 = -9 & m = 1 \\ -21 + 24 = 3 & m = 2 \\ -21 + 36 = 15 & m = 3 \\ \dots \end{cases}$$
The value of $m$ for which $-21 + 12m$ falls within $\{0, \dots, 11\}$ is $m = 2$, and in this case we have $-21 + 2 \cdot 12 = 3$, which implies $z(-21) = z(3) = \sqrt{2}\,i$.

Despite the fact that $\mathbb{Z}_N$ is often considered to represent the set of canonical representatives $\{0, 1, \dots, N-1\}$, we can, in fact, consider $z$ to be defined over any subset of $\mathbb{Z}$ given by $N$ consecutive integers, and not necessarily over $\{0, \dots, N-1\}$. This convention will be used throughout this book.

The complex vector space that will be used in this section is the set of all sequences of complex values over $\mathbb{Z}_N$:
$$\ell^2(\mathbb{Z}_N) = \{z : \mathbb{Z}_N \to \mathbb{C}\}$$
The reason for using this particular notation will become clear later. $\ell^2(\mathbb{Z}_N)$ is a complex vector space with the usual sum and scalar multiplication operations, that is, given $z, w \in \ell^2(\mathbb{Z}_N)$, $\alpha \in \mathbb{C}$, the sum and the multiplication by a complex scalar are defined as follows:
$$z + w : \mathbb{Z}_N \to \mathbb{C}, \qquad j \mapsto (z + w)(j) = z(j) + w(j)$$
$$\alpha z : \mathbb{Z}_N \to \mathbb{C}, \qquad j \mapsto (\alpha z)(j) = \alpha z(j)$$
$\ell^2(\mathbb{Z}_N)$ is of dimension $N$: the map which associates each sequence $z \in \ell^2(\mathbb{Z}_N)$ with its images $(z(0), z(1), \dots, z(N-1))$:
$$\ell^2(\mathbb{Z}_N) \longleftrightarrow \mathbb{C}^N, \qquad z \longleftrightarrow (z(0), z(1), \dots, z(N-1)) = \begin{pmatrix} z(0) \\ z(1) \\ \vdots \\ z(N-1) \end{pmatrix}$$
is a linear isomorphism (the proof is left to the reader). $z$ will be represented as a row vector or as a column vector as the case requires.
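The periodic extension used in the example with $N = 12$ can be sketched in a few lines of Python. This is only an illustration: the function name `periodic_value` is hypothetical (it does not appear in the text), and the sketch relies on the fact that Python's `%` operator returns the canonical representative in $\{0, \dots, N-1\}$ for a positive modulus.

```python
def periodic_value(z, j):
    """Return z(j) for any integer j, extending z by N-periodicity."""
    N = len(z)
    # For N > 0, Python's % operator returns a value in {0, ..., N-1},
    # which is the canonical representative of j modulo N.
    return z[j % N]


# The sequence from the example (N = 12), stored for j = 0, ..., 11.
z = [1, 1j, 1j, 2 ** 0.5 * 1j, 0, 0, 0, -1, 0, 0, 0, 2]

representative = -21 % len(z)   # canonical representative of -21 modulo 12
value = periodic_value(z, -21)  # the same value as z(3)
print(representative, value)
```

Storing only the $N$ values $z(0), \dots, z(N-1)$ and reducing the index on access mirrors the identification of $\ell^2(\mathbb{Z}_N)$ with $\mathbb{C}^N$ described above.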
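The isomorphism above allows us to define the canonical basis $B$ of $\ell^2(\mathbb{Z}_N)$ as the set of the following $N$ sequences:
$$B = (e_0, e_1, \dots, e_{N-1}), \qquad e_j(k) = \delta_{j,k} = \begin{cases} 1 & k = j \\ 0 & k \neq j \end{cases}$$
We can also introduce an inner product into $\ell^2(\mathbb{Z}_N)$ using:
$$\langle z, w \rangle = \sum_{k=0}^{N-1} z(k)\overline{w(k)}$$
so $z, w \in \ell^2(\mathbb{Z}_N)$ are orthogonal if and only if $\langle z, w \rangle = \sum_{k=0}^{N-1} z(k)\overline{w(k)} = 0$.

The norm induced by this inner product is:
$$\|z\| = \left( \sum_{k=0}^{N-1} |z(k)|^2 \right)^{\frac{1}{2}}$$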
which will be referred to as the $\ell^2(\mathbb{Z}_N)$ norm.

2.1.1. The orthogonal basis of complex exponentials in $\ell^2(\mathbb{Z}_N)$

In this section, we are going to define the function system that will be essential to the development of the DFT. First, we recall these basic facts:
1) for an arbitrary $z \in \mathbb{C}$, $z = \rho[\cos\alpha + i\sin\alpha] = \rho e^{i\alpha}$, $\rho, \alpha \in \mathbb{R}$, $\rho \geq 0$;
2) Euler's formulas: $\cos\alpha = \frac{1}{2}(e^{i\alpha} + e^{-i\alpha})$, $\sin\alpha = \frac{1}{2i}(e^{i\alpha} - e^{-i\alpha})$;
3) $|z| = 1 \Leftrightarrow z = e^{i\alpha}$;
4) $e^{i\alpha} = e^{i(\alpha + 2\pi k)}$, $k \in \mathbb{Z}$;
5) as a specific instance of the previous point, if $\alpha = 0$, we obtain: $e^{2\pi i k} = 1$ $\forall k \in \mathbb{Z}$;
6) $e^{i\alpha}e^{i\beta} = e^{i(\alpha + \beta)}$;
7) $(e^{i\alpha})^n = e^{in\alpha}$;
8) $\overline{e^{i\alpha}} = e^{-i\alpha}$;
9) given $z = \rho e^{i\alpha}$, the solutions to the equation $w^N = z$ are the $N$ complex roots given by the equation: $w_m = \sqrt[N]{\rho}\, e^{i\frac{2\pi m + \alpha}{N}}$, $m = 0, \dots, N-1$;
10) specifically, the $N$-th roots of unity: $\omega_m = e^{2\pi i \frac{m}{N}}$, $m = 0, \dots, N-1$.
We also need to recall the geometric summation formula, defined by:
$$S_k = 1 + z + z^2 + \dots + z^{k-1} + z^k = \sum_{j=0}^{k} z^j$$
If $z = 1$, then $S_k = k + 1$. If $z \neq 1$, we observe that:
$$(1 - z)S_k = 1 + z + z^2 + \dots + z^k - (z + z^2 + \dots + z^k + z^{k+1}) = 1 - z^{k+1}$$
hence:
$$\sum_{j=0}^{k} z^j = \begin{cases} \dfrac{1 - z^{k+1}}{1 - z} & \text{if } z \in \mathbb{C} \setminus \{1\} \\ k + 1 & \text{if } z = 1 \end{cases}$$
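Now, consider the sequences in $\ell^2(\mathbb{Z}_N)$ defined by the following complex exponentials:
$$E_m : \mathbb{Z}_N \longrightarrow \mathbb{C}, \qquad n \longmapsto E_m(n)$$
where:
$$\begin{cases} E_0(n) = 1 \\ E_1(n) = e^{2\pi i \frac{n}{N}} \\ E_2(n) = e^{2\pi i \frac{2n}{N}} \\ \quad\vdots \\ E_{N-1}(n) = e^{2\pi i \frac{(N-1)n}{N}} \end{cases}$$
Hence:
– $E_0$ is the constant sequence $E_0(n) \equiv 1$ $\forall n \in \mathbb{Z}_N$;
– $E_1$ is the sequence $E_1 = \left(1, e^{2\pi i \frac{1}{N}}, e^{2\pi i \frac{2}{N}}, \dots, e^{2\pi i \frac{N-1}{N}}\right)$;
– $E_2$ is the sequence $E_2 = \left(1, e^{2\pi i \frac{2}{N}}, e^{2\pi i \frac{4}{N}}, \dots, e^{2\pi i \frac{2(N-1)}{N}}\right)$;
– $E_{N-1}$ is the sequence $E_{N-1} = \left(1, e^{2\pi i \frac{N-1}{N}}, e^{2\pi i \frac{2(N-1)}{N}}, \dots, e^{2\pi i \frac{(N-1)^2}{N}}\right)$.

The general sequence is:
$$E_m(n) = e^{2\pi i \frac{mn}{N}} = (\omega_m)^n \qquad \forall m, n = 0, \dots, N-1$$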
where $(\omega_m)^n$ is the $n$-th power of the $N$-th root of unity $\omega_m$, $\forall n \in \{0, \dots, N-1\}$, so:
$$(\omega_m)^n = \left(e^{2\pi i \frac{m}{N}}\right)^n = e^{2\pi i \frac{mn}{N}}$$
From the formula $z = e^{i\alpha} = [\cos\alpha + i\sin\alpha]$, we know that the system defined above is a set of sequences of values which oscillate at different frequencies, since the arguments of the cos and sin functions change with the coefficients $m$ and $n$. As we shall see, the significance of these frequencies is crucial to Fourier analysis. For now, let us focus on proving that the exponential system defined above is an orthogonal basis of $\ell^2(\mathbb{Z}_N)$. This proof relies on a preliminary lemma.

LEMMA 2.1.– For all $j, k \in \{0, 1, \dots, N-1\}$, we have:
$$\sum_{n=0}^{N-1} e^{2\pi i n \frac{j-k}{N}} = \sum_{n=0}^{N-1} e^{-2\pi i n \frac{j-k}{N}} = N\delta_{j,k} = \begin{cases} N & j = k \\ 0 & j \neq k \end{cases} \qquad [2.1]$$
The physical interpretation of this key formula will be discussed later. Before going further with the proof, note that in the case where $j, k \in \mathbb{Z}_N$, $j \neq k$, we have $|j - k| \in \{1, 2, \dots, N-1\}$, so $\frac{j-k}{N} = -\frac{k-j}{N} \notin \mathbb{Z}$.

PROOF.– This proof covers the first summation, but it is evident that this demonstration also holds for the second summation. We start by using the properties of complex exponentials to rewrite the formula as follows:
$$\sum_{n=0}^{N-1} e^{2\pi i n \frac{j-k}{N}} = \sum_{n=0}^{N-1} \left(e^{2\pi i \frac{j-k}{N}}\right)^n$$
Let us analyze the following two cases:
– if $j = k$, the exponentials in the sum are equal to 1, since $e^{2\pi i \frac{0}{N}} = 1$, and thus:
$$\sum_{n=0}^{N-1} e^{2\pi i n \frac{j-j}{N}} = \sum_{n=0}^{N-1} 1 = N$$
– if $j \neq k$, the exponentials are $\neq 1$, so, using the geometric summation formula:
$$\sum_{n=0}^{N-1} \left(e^{2\pi i \frac{j-k}{N}}\right)^n = \frac{1 - \left(e^{2\pi i \frac{j-k}{N}}\right)^{N-1+1}}{1 - e^{2\pi i \frac{j-k}{N}}} = \frac{1 - e^{2\pi i \frac{(j-k)N}{N}}}{1 - e^{2\pi i \frac{j-k}{N}}} = \frac{1 - e^{2\pi i (j-k)}}{1 - e^{2\pi i \frac{j-k}{N}}}$$
Since $j - k = m \in \mathbb{Z}$, $e^{2\pi i (j-k)} = 1$, the numerator of the final formula is 0 when $j \neq k$. The denominator, on the other hand, never vanishes; as we saw in the remark before the proof, if $j \neq k$, then $\frac{j-k}{N} \notin \mathbb{Z}$. In this case, the summation is equal to 0. □

The demonstration that $E$ is an orthogonal basis of $\ell^2(\mathbb{Z}_N)$ is now trivial.

THEOREM 2.1.– $E = (E_0, \dots, E_{N-1})$ is an orthogonal basis of $\ell^2(\mathbb{Z}_N)$.

PROOF.– $E$ is given by $N$ elements of an $N$-dimensional inner product space, so if we can prove that $E$ is an orthogonal family, then the theorem is also proved. We know that an orthogonal family is linearly independent, and a linearly independent family of $N$ vectors in an $N$-dimensional vector space is a basis. We thus calculate the inner products $\langle E_j, E_k \rangle$, $\forall j, k \in \{0, \dots, N-1\}$:
$$\langle E_j, E_k \rangle = \sum_{n=0}^{N-1} E_j(n)\overline{E_k(n)} = \sum_{n=0}^{N-1} e^{2\pi i \frac{jn}{N}} e^{-2\pi i \frac{kn}{N}} = \sum_{n=0}^{N-1} e^{2\pi i \frac{(j-k)n}{N}} = N\delta_{j,k}$$
using Lemma 2.1 to give us the final equality, which proves that $\langle E_j, E_k \rangle = N\delta_{j,k}$, that is, the elements in the basis are mutually orthogonal. □

If we consider that $j = k = m$ in the equation $\langle E_j, E_k \rangle = N\delta_{j,k}$, then $\langle E_m, E_m \rangle = N\delta_{m,m} = N$, hence:
$$\|E_m\|^2 = N, \qquad \|E_m\| = \sqrt{N}, \qquad \forall m \in \{0, 1, \dots, N-1\}$$
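As a numerical sanity check of Lemma 2.1 and Theorem 2.1, the following minimal sketch builds the sequences $E_m$ and computes all inner products $\langle E_j, E_k \rangle$. The helper names `inner` and `exponential_basis` are illustrative choices, not notation from the text, and the value $N = 6$ is arbitrary.

```python
import cmath


def inner(z, w):
    """Inner product of l^2(Z_N): sum over k of z(k) * conj(w(k))."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))


def exponential_basis(N):
    """Return [E_0, ..., E_{N-1}], where E_m(n) = exp(2*pi*i*m*n/N)."""
    return [[cmath.exp(2j * cmath.pi * m * n / N) for n in range(N)]
            for m in range(N)]


N = 6
E = exponential_basis(N)

# Gram matrix of the family: entry (j, k) is <E_j, E_k>.
gram = [[inner(E[j], E[k]) for k in range(N)] for j in range(N)]

# Largest deviation of the Gram matrix from N times the identity matrix.
max_error = max(abs(gram[j][k] - (N if j == k else 0))
                for j in range(N) for k in range(N))
print(max_error)  # expected to be at rounding-error level
```

Up to floating-point rounding, the Gram matrix equals $N$ times the identity, which is exactly the statement $\langle E_j, E_k \rangle = N\delta_{j,k}$.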
Now, let us consider two examples in which the expression of the complex exponentials is particularly simple: $N = 2$ and $N = 4$ (the expression using $N = 3$ is not quite so simple):

1) $N = 2$. $\ell^2(\mathbb{Z}_2) = \{z = (z(0), z(1)) \in \mathbb{C}^2\}$, in this case $E_m(n) = e^{2\pi i \frac{mn}{2}} = e^{\pi i mn}$ and thus:
$$m = 0: \quad E_0 = \left(e^{\pi i\, 0 \cdot 0}, e^{\pi i\, 0 \cdot 1}\right) = (1, 1)$$
$$m = 1: \quad E_1 = \left(e^{\pi i\, 1 \cdot 0}, e^{\pi i\, 1 \cdot 1}\right) = \left(1, e^{\pi i}\right)$$
However, $e^{\pi i} = \cos(\pi) + i\sin(\pi) = -1$, so $E_1 = (1, -1)$. Thus:
$$E = ((1, 1), (1, -1)) \qquad [2.2]$$
is the basis of complex exponentials in $\ell^2(\mathbb{Z}_2)$. Note the presence of a constant sequence (the first) and an oscillating sequence (the second). This particular feature of the basis will be discussed in greater detail later.
2) $N = 4$. $\ell^2(\mathbb{Z}_4) = \{z = (z(0), z(1), z(2), z(3)) \in \mathbb{C}^4\}$: the Fourier basis is obtained from four complex sequences, each with four components. Verification that the basis of complex exponentials of $\ell^2(\mathbb{Z}_4)$ is:
$$E = ((1, 1, 1, 1), (1, i, -1, -i), (1, -1, 1, -1), (1, -i, -1, i)) \qquad [2.3]$$
is left to the reader.

Results [1.10], [1.11] and [1.12] from section 1.8 may be used to write the following formulas, which are valid for any two elements $z, w \in \ell^2(\mathbb{Z}_N)$:
– decomposition on the orthogonal basis $E$:
$$z = \sum_{m=0}^{N-1} \frac{\langle z, E_m \rangle}{N} E_m \qquad [2.4]$$
– Parseval's identity for the orthogonal basis $E$:
$$\langle z, w \rangle = \sum_{m=0}^{N-1} \frac{\langle z, E_m \rangle \langle E_m, w \rangle}{N} \qquad [2.5]$$
– Plancherel's theorem for $E$:
$$\|z\|^2 = \sum_{m=0}^{N-1} \frac{|\langle z, E_m \rangle|^2}{N} \qquad [2.6]$$
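Formulas [2.4], [2.5] and [2.6] can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the two test sequences `z` and `w` are arbitrary, and the helper names `inner` and `exponential_basis` are hypothetical.

```python
import cmath


def inner(z, w):
    """Inner product of l^2(Z_N): sum over k of z(k) * conj(w(k))."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))


def exponential_basis(N):
    """Return [E_0, ..., E_{N-1}], where E_m(n) = exp(2*pi*i*m*n/N)."""
    return [[cmath.exp(2j * cmath.pi * m * n / N) for n in range(N)]
            for m in range(N)]


z = [2, 1j, -1, 3 - 1j]   # arbitrary test sequences in l^2(Z_4)
w = [1, 1, 0, -2j]
N = len(z)
E = exponential_basis(N)

# [2.4]: z = sum_m (<z, E_m> / N) E_m, evaluated component by component.
z_rebuilt = [sum(inner(z, E[m]) / N * E[m][n] for m in range(N))
             for n in range(N)]

# [2.5]: <z, w> = (1/N) sum_m <z, E_m> <E_m, w>.
parseval_rhs = sum(inner(z, E[m]) * inner(E[m], w) for m in range(N)) / N

# [2.6]: ||z||^2 = (1/N) sum_m |<z, E_m>|^2.
plancherel_rhs = sum(abs(inner(z, E[m])) ** 2 for m in range(N)) / N

decomposition_gap = max(abs(a - b) for a, b in zip(z, z_rebuilt))
parseval_gap = abs(inner(z, w) - parseval_rhs)
plancherel_gap = abs(inner(z, z).real - plancherel_rhs)
print(decomposition_gap, parseval_gap, plancherel_gap)
```

All three gaps should be at rounding-error level; the factor $1/N$ in each formula compensates for $\|E_m\|^2 = N$.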
The expressions above are calculated explicitly in section 2.3. There are several ways of renormalizing the basis $E$. Two of the most widespread approaches, which can also be used to define the DFT, are discussed in the next two sections.

2.2. The orthonormal Fourier basis of $\ell^2(\mathbb{Z}_N)$

As we saw in section 2.1.1, the $\ell^2(\mathbb{Z}_N)$ norm of all sequences $E_m$ is $\sqrt{N}$; evidently, an orthonormal basis can therefore be obtained by dividing by $\sqrt{N}$. This justifies Definition 2.2.

DEFINITION 2.2.– The orthonormal Fourier basis of $\ell^2(\mathbb{Z}_N)$ is the set:
$$\mathcal{E} = (\mathcal{E}_0, \mathcal{E}_1, \mathcal{E}_2, \dots, \mathcal{E}_{N-1})$$
of the $N$ sequences $\mathcal{E}_m \in \ell^2(\mathbb{Z}_N)$:
$$\mathcal{E}_m : \mathbb{Z}_N \longrightarrow \mathbb{C}, \qquad n \longmapsto \mathcal{E}_m(n)$$
where:
$$\begin{cases} \mathcal{E}_0(n) = \frac{1}{\sqrt{N}} \\ \mathcal{E}_1(n) = \frac{1}{\sqrt{N}} e^{2\pi i \frac{n}{N}} \\ \mathcal{E}_2(n) = \frac{1}{\sqrt{N}} e^{2\pi i \frac{2n}{N}} \\ \quad\vdots \\ \mathcal{E}_{N-1}(n) = \frac{1}{\sqrt{N}} e^{2\pi i \frac{(N-1)n}{N}} \end{cases}$$
The general sequence of the orthonormal Fourier basis is:
$$\mathcal{E}_m(n) = \frac{1}{\sqrt{N}} e^{2\pi i \frac{mn}{N}} = \frac{1}{\sqrt{N}} (\omega_m)^n \qquad \forall m, n = 0, \dots, N-1$$
and the orthonormality formula $\langle \mathcal{E}_j, \mathcal{E}_k \rangle = \delta_{j,k}$ holds true.

Using formulas [2.2] and [2.3], we can say that:
$$\mathcal{E} = \frac{1}{\sqrt{2}} ((1, 1), (1, -1)) \qquad [2.7]$$
is the orthonormal Fourier basis of $\ell^2(\mathbb{Z}_2)$ and:
$$\mathcal{E} = \frac{1}{2} ((1, 1, 1, 1), (1, i, -1, -i), (1, -1, 1, -1), (1, -i, -1, i)) \qquad [2.8]$$
is the orthonormal Fourier basis of $\ell^2(\mathbb{Z}_4)$.

The translation of theorem 1.14 for $\ell^2(\mathbb{Z}_N)$ equipped with the orthonormal Fourier basis is as follows. Given arbitrary elements $z, w \in \ell^2(\mathbb{Z}_N)$, we have:
– a decomposition on the orthonormal Fourier basis:
$$z = \sum_{m=0}^{N-1} \langle z, \mathcal{E}_m \rangle \mathcal{E}_m \qquad [2.9]$$
– Parseval's identity:
$$\langle z, w \rangle = \sum_{m=0}^{N-1} \langle z, \mathcal{E}_m \rangle \langle \mathcal{E}_m, w \rangle \qquad [2.10]$$
– Plancherel's theorem:
$$\|z\|^2 = \sum_{m=0}^{N-1} |\langle z, \mathcal{E}_m \rangle|^2 \qquad [2.11]$$
2.3. The orthogonal Fourier basis of $\ell^2(\mathbb{Z}_N)$

Although the normalization constant $1/\sqrt{N}$, which appears in the definition of the orthonormal Fourier basis, might appear to be the most logical choice for normalizing the basis $E$ in $\ell^2(\mathbb{Z}_N)$, another normalization is more commonly used in practical applications. The reason for this choice, shown below, is that it simplifies the writing of several other formulas.

DEFINITION 2.3.– The orthogonal Fourier basis of $\ell^2(\mathbb{Z}_N)$ is the set:
$$F = (F_0, F_1, F_2, \dots, F_{N-1})$$
of $N$ sequences $F_m \in \ell^2(\mathbb{Z}_N)$:
$$F_m : \mathbb{Z}_N \longrightarrow \mathbb{C}, \qquad n \longmapsto F_m(n)$$
where:
$$\begin{cases} F_0(n) = \frac{1}{N} \\ F_1(n) = \frac{1}{N} e^{2\pi i \frac{n}{N}} \\ F_2(n) = \frac{1}{N} e^{2\pi i \frac{2n}{N}} \\ \quad\vdots \\ F_{N-1}(n) = \frac{1}{N} e^{2\pi i \frac{(N-1)n}{N}} \end{cases}$$
The general sequence of the orthogonal Fourier basis is:
$$F_m(n) = \frac{1}{N} e^{2\pi i \frac{mn}{N}} = \frac{1}{N} (\omega_m)^n \qquad \forall m, n = 0, \dots, N-1$$
The relationships between the three bases $E$, $\mathcal{E}$ and $F$ are:
$$\mathcal{E}_m = \frac{E_m}{\sqrt{N}}, \qquad F_m = \frac{E_m}{N}, \qquad F_m = \frac{\mathcal{E}_m}{\sqrt{N}} \qquad \forall m \in \{0, 1, \dots, N-1\} \qquad [2.12]$$
Using the formulas above, the orthogonal Fourier bases of $\ell^2(\mathbb{Z}_2)$ and $\ell^2(\mathbb{Z}_4)$ are easy to calculate:
– orthogonal Fourier basis of $\ell^2(\mathbb{Z}_2)$:
$$F = \frac{1}{2} ((1, 1), (1, -1)) \qquad [2.13]$$
– orthogonal Fourier basis of $\ell^2(\mathbb{Z}_4)$:
$$F = \frac{1}{4} ((1, 1, 1, 1), (1, i, -1, -i), (1, -1, 1, -1), (1, -i, -1, i)) \qquad [2.14]$$
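The relationships [2.12] between the three normalizations can be illustrated with a short sketch. The names `fourier_bases`, `E_orthonormal` and `norm` are hypothetical; the sketch simply rescales the complex exponentials and compares the resulting norms for $N = 4$.

```python
import cmath
import math


def inner(z, w):
    """Inner product of l^2(Z_N): sum over k of z(k) * conj(w(k))."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))


def norm(z):
    """Norm induced by the inner product of l^2(Z_N)."""
    return math.sqrt(inner(z, z).real)


def fourier_bases(N):
    """Return the exponential, orthonormal and orthogonal Fourier bases."""
    E = [[cmath.exp(2j * cmath.pi * m * n / N) for n in range(N)]
         for m in range(N)]
    E_orthonormal = [[v / math.sqrt(N) for v in E_m] for E_m in E]
    F = [[v / N for v in E_m] for E_m in E]
    return E, E_orthonormal, F


N = 4
E, E_orthonormal, F = fourier_bases(N)

# Norms of the m-th element of each basis: sqrt(N), 1 and 1/sqrt(N).
norms = [(norm(E[m]), norm(E_orthonormal[m]), norm(F[m])) for m in range(N)]
print(norms[1])  # close to (2.0, 1.0, 0.5) for N = 4
```

The three bases span the same directions; only the length of the basis elements changes, which is why the decomposition formulas differ by powers of $N$.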
Again, using relationship [2.12], we can determine the equivalents of formulas [2.9] or [2.4], [2.10] or [2.5] and [2.11] or [2.6] for two arbitrary elements $z, w \in \ell^2(\mathbb{Z}_N)$:
– decomposition on the orthogonal Fourier basis:
$$z = N \sum_{m=0}^{N-1} \langle z, F_m \rangle F_m$$
– Parseval's identity for the orthogonal Fourier basis:
$$\langle z, w \rangle = N \sum_{m=0}^{N-1} \langle z, F_m \rangle \langle F_m, w \rangle$$
– Plancherel's theorem for the orthogonal Fourier basis:
$$\|z\|^2 = N \sum_{m=0}^{N-1} |\langle z, F_m \rangle|^2$$
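Table 2.1 supplies a helpful summary of the differences between these bases and formulas:

$$E_m(n) = e^{2\pi i \frac{mn}{N}}, \qquad \mathcal{E}_m(n) = \frac{1}{\sqrt{N}} e^{2\pi i \frac{mn}{N}}, \qquad F_m(n) = \frac{1}{N} e^{2\pi i \frac{mn}{N}}$$

| Basis | Decomposition | Parseval's identity | Plancherel's theorem |
|---|---|---|---|
| $E$ | $z = \sum_{m=0}^{N-1} \frac{\langle z, E_m \rangle}{N} E_m$ | $\langle z, w \rangle = \sum_{m=0}^{N-1} \frac{\langle z, E_m \rangle \langle E_m, w \rangle}{N}$ | $\Vert z \Vert^2 = \sum_{m=0}^{N-1} \frac{\vert\langle z, E_m \rangle\vert^2}{N}$ |
| $\mathcal{E}$ | $z = \sum_{m=0}^{N-1} \langle z, \mathcal{E}_m \rangle \mathcal{E}_m$ | $\langle z, w \rangle = \sum_{m=0}^{N-1} \langle z, \mathcal{E}_m \rangle \langle \mathcal{E}_m, w \rangle$ | $\Vert z \Vert^2 = \sum_{m=0}^{N-1} \vert\langle z, \mathcal{E}_m \rangle\vert^2$ |
| $F$ | $z = N \sum_{m=0}^{N-1} \langle z, F_m \rangle F_m$ | $\langle z, w \rangle = N \sum_{m=0}^{N-1} \langle z, F_m \rangle \langle F_m, w \rangle$ | $\Vert z \Vert^2 = N \sum_{m=0}^{N-1} \vert\langle z, F_m \rangle\vert^2$ |

Table 2.1. Different normalizations of Fourier bases and relative formulas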
2.4. Fourier coefficients and the discrete Fourier transform

The definition of the DFT varies from author to author and from application to application. The two most widespread definitions use the orthonormal basis $\mathcal{E}$ and a blend of the orthogonal bases $E$ and $F$. These two versions are useful for different reasons:
– using the orthonormal basis $\mathcal{E}$ allows us to obtain unitary operators;
– using a blend of the orthogonal bases $E$ and $F$ makes it possible to simplify many formulas, including the convolution formula, widely used in applications, which will be discussed later.

For the purposes of this book, we shall use formulas obtained by a blend of the orthogonal bases $E$ and $F$. This decision was made for reasons of consistency with various mathematical programs, notably MATLAB.

First, let us reconsider the following decomposition:
$$z = \sum_{m=0}^{N-1} \frac{\langle z, E_m \rangle}{N} E_m = \sum_{m=0}^{N-1} \langle z, E_m \rangle \frac{E_m}{N}$$
However, $E_m / N = F_m$, so:
$$z = \sum_{m=0}^{N-1} \langle z, E_m \rangle F_m$$
that is, any given element $z \in \ell^2(\mathbb{Z}_N)$ can be decomposed over the orthogonal Fourier basis $F$ with the components given by the inner products of $z$ with elements of the basis $E$. Using the definition of the inner product of $\ell^2(\mathbb{Z}_N)$, we can write:
$$\langle z, E_m \rangle = \sum_{n=0}^{N-1} z(n)\overline{E_m(n)} = \sum_{n=0}^{N-1} z(n)\overline{e^{2\pi i \frac{mn}{N}}} = \sum_{n=0}^{N-1} z(n) e^{-2\pi i \frac{mn}{N}}$$
DEFINITION 2.4.– Given any $z \in \ell^2(\mathbb{Z}_N)$, the complex numbers $\langle z, E_m \rangle$, $m \in \{0, 1, \dots, N-1\}$, are known as the Fourier coefficients of $z$, denoted $\hat{z}(m)$. Explicitly:
$$\hat{z}(m) = \sum_{n=0}^{N-1} z(n) e^{-2\pi i \frac{mn}{N}} \qquad \text{Fourier coefficients of } z \qquad [2.15]$$
The sequence of Fourier coefficients of $z$ is written using $\hat{z} \in \ell^2(\mathbb{Z}_N)$:
$$\hat{z} = (\hat{z}(0), \hat{z}(1), \hat{z}(2), \dots, \hat{z}(N-1)) \qquad [2.16]$$
The linear operator which transforms $z \in \ell^2(\mathbb{Z}_N)$ into the sequence $\hat{z} \in \ell^2(\mathbb{Z}_N)$ of its Fourier coefficients, that is:
$$\mathrm{DFT} \equiv \hat{\ } \equiv \mathcal{F} : \ell^2(\mathbb{Z}_N) \longrightarrow \ell^2(\mathbb{Z}_N), \qquad z \longmapsto \mathrm{DFT}(z) \equiv \hat{z} \equiv \mathcal{F}(z)$$
$$\hat{z}(m) = \sum_{n=0}^{N-1} z(n) e^{-2\pi i \frac{mn}{N}} \qquad \forall m \in \{0, 1, \dots, N-1\}$$
is known as the discrete Fourier transform, or DFT.

It is important to note that the variable of $z$ is $n$, while the variable of $\hat{z}$ is $m$. The interpretation of $n$ and $m$ in the context of signal theory will be given in section 2.6; for now, note simply that $n$ is the discrete value of an instant in time (or a position in space) at which a signal $z$ is measured, whereas $m$ is proportional to the oscillation frequency of a wave (harmonic) and is a multiple of a fundamental frequency. The DFT is used to translate a description of a signal in terms of temporal (or spatial) samples into a description in terms of signal frequencies. This notion will be formalized in section 2.6.

Using the definitions given above, the decomposition of $z$ may be written as follows:
$$z = \sum_{m=0}^{N-1} \hat{z}(m) F_m \qquad [2.17]$$
that is, the Fourier coefficients of $z$ are the components of $z$ in the orthogonal Fourier basis $F$:
$$\hat{z} = [z]_F \qquad [2.18]$$
Using the notation introduced above, the theorem of decomposition on the orthogonal Fourier basis, Parseval's identity and Plancherel's theorem may be rewritten as:
– decomposition of $z$ on the orthogonal Fourier basis:
$$z(n) = \frac{1}{N} \sum_{m=0}^{N-1} \hat{z}(m) e^{2\pi i \frac{mn}{N}} \qquad \forall n = 0, 1, \dots, N-1 \qquad [2.19]$$
– Parseval's identity:
$$\langle z, w \rangle = \frac{1}{N} \sum_{m=0}^{N-1} \hat{z}(m)\overline{\hat{w}(m)} = \frac{1}{N} \langle \hat{z}, \hat{w} \rangle \qquad [2.20]$$
– Plancherel's theorem:
$$\|z\|^2 = \frac{1}{N} \sum_{m=0}^{N-1} |\hat{z}(m)|^2 = \frac{1}{N} \|\hat{z}\|^2 \qquad [2.21]$$
2.4.1. The inverse discrete Fourier transform

It is interesting to compare formulas [2.15] and [2.19]:
$$\hat{z}(m) = \sum_{n=0}^{N-1} z(n) e^{-2\pi i \frac{mn}{N}}, \qquad z(n) = \frac{1}{N} \sum_{m=0}^{N-1} \hat{z}(m) e^{2\pi i \frac{mn}{N}} \qquad \forall n, m \in \{0, 1, \dots, N-1\}$$
The first relationship states that given the values of $z(n)$, the values of $\hat{z}(m)$ can be reconstructed using formula [2.15]. The second relationship states that given the values of $\hat{z}(m)$, the values of $z(n)$ can be reconstructed using formula [2.19]. There is thus a “duality” between the two formulas: it is possible to obtain sequence $z$ from sequence $\hat{z}$ and vice versa using relationships [2.15] and [2.19]. This duality is formalized in Definition 2.5 and Theorem 2.2.

DEFINITION 2.5.– The linear operator:
$$\mathrm{IDFT} \equiv \check{\ } \equiv \mathcal{F}^{-1} : \ell^2(\mathbb{Z}_N) \longrightarrow \ell^2(\mathbb{Z}_N), \qquad z \longmapsto \mathrm{IDFT}(z) \equiv \check{z} \equiv \mathcal{F}^{-1}(z)$$
$$\check{z}(n) = \frac{1}{N} \sum_{m=0}^{N-1} z(m) e^{2\pi i \frac{mn}{N}} \qquad \forall n \in \{0, 1, \dots, N-1\}$$
is known as the inverse discrete Fourier transform, or IDFT.

THEOREM 2.2.– The IDFT is the inverse linear operator of the DFT and vice versa:
$$\mathrm{IDFT} = \mathrm{DFT}^{-1}, \qquad \mathrm{DFT} = \mathrm{IDFT}^{-1}$$
or, in other terms,
$$\check{\hat{z}} = z, \qquad \hat{\check{z}} = z \qquad \forall z \in \ell^2(\mathbb{Z}_N)$$
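PROOF.– We wish to prove that the composition between the DFT and the IDFT and between the IDFT and the DFT gives the identity operator $\mathrm{id}$: $\mathrm{DFT} \circ \mathrm{IDFT} = \mathrm{IDFT} \circ \mathrm{DFT} = \mathrm{id}$, $\mathrm{id}(z) = z$, $\forall z \in \ell^2(\mathbb{Z}_N)$.

We start by verifying that, given an arbitrary sequence $z \in \ell^2(\mathbb{Z}_N)$ and applying the DFT to obtain the sequence of Fourier coefficients $\hat{z} \in \ell^2(\mathbb{Z}_N)$, it is possible to obtain the original sequence by applying the IDFT:
$$\ell^2(\mathbb{Z}_N) \xrightarrow{\ \mathrm{DFT}\ } \ell^2(\mathbb{Z}_N) \xrightarrow{\ \mathrm{IDFT}\ } \ell^2(\mathbb{Z}_N), \qquad z \longmapsto \hat{z} \longmapsto \check{\hat{z}} = z$$
Before writing the composition, it is important to note that the summation index (the symbol of which is unimportant) should not be confused with the fixed variables $n, m$ in $\check{z}(n)$ and $\hat{z}(m)$. To avoid this problem we will use the neutral symbol $j$.
$$\begin{aligned} \check{\hat{z}}(n) &= \frac{1}{N} \sum_{m=0}^{N-1} \hat{z}(m) e^{2\pi i \frac{mn}{N}} = \frac{1}{N} \sum_{m=0}^{N-1} \left( \sum_{j=0}^{N-1} z(j) e^{-2\pi i \frac{mj}{N}} \right) e^{2\pi i \frac{mn}{N}} \\ &= \frac{1}{N} \sum_{m=0}^{N-1} \sum_{j=0}^{N-1} z(j) e^{2\pi i m \frac{n-j}{N}} \\ &= \frac{1}{N} \sum_{j=0}^{N-1} z(j) \left( \sum_{m=0}^{N-1} e^{2\pi i m \frac{n-j}{N}} \right) \\ &= \frac{1}{N} \sum_{j=0}^{N-1} z(j) N \delta_{j,n} \qquad \text{(Lemma 2.1)} \\ &= z(n) \qquad \forall n \in \{0, 1, \dots, N-1\} \end{aligned}$$
Now, let us verify that the inverse composition produces the same identity:
$$\ell^2(\mathbb{Z}_N) \xrightarrow{\ \mathrm{IDFT}\ } \ell^2(\mathbb{Z}_N) \xrightarrow{\ \mathrm{DFT}\ } \ell^2(\mathbb{Z}_N), \qquad z \longmapsto \check{z} \longmapsto \hat{\check{z}} = z$$
$$\begin{aligned} \hat{\check{z}}(m) &= \sum_{n=0}^{N-1} \check{z}(n) e^{-2\pi i \frac{mn}{N}} = \sum_{n=0}^{N-1} \left( \frac{1}{N} \sum_{j=0}^{N-1} z(j) e^{2\pi i \frac{jn}{N}} \right) e^{-2\pi i \frac{mn}{N}} \\ &= \frac{1}{N} \sum_{n=0}^{N-1} \sum_{j=0}^{N-1} z(j) e^{2\pi i n \frac{j-m}{N}} \\ &= \frac{1}{N} \sum_{j=0}^{N-1} z(j) \left( \sum_{n=0}^{N-1} e^{2\pi i n \frac{j-m}{N}} \right) \\ &= \frac{1}{N} \sum_{j=0}^{N-1} z(j) N \delta_{j,m} \qquad \text{(Lemma 2.1)} \\ &= z(m) \qquad \forall m \in \{0, 1, \dots, N-1\} \end{aligned}$$
Thus, $\check{\hat{z}}(n) = z(n)$ and $\hat{\check{z}}(m) = z(m)$, $\forall n, m \in \{0, 1, \dots, N-1\}$, which concludes our proof. □

Note the similarity between the DFT and the IDFT: the only differences are the coefficient $1/N$ and the sign of the complex exponential.

We wish to draw the reader's attention to the formulas demonstrated above:
$$\check{\hat{z}}(n) = \frac{1}{N} \sum_{m=0}^{N-1} \hat{z}(m) e^{2\pi i \frac{mn}{N}} = z(n) \qquad \forall n \in \mathbb{Z}_N$$
$$\hat{\check{z}}(m) = \sum_{n=0}^{N-1} \check{z}(n) e^{-2\pi i \frac{mn}{N}} = z(m) \qquad \forall m \in \mathbb{Z}_N$$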
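The statement of Theorem 2.2 can be illustrated numerically by composing naive implementations of the two transforms in both orders. In this minimal sketch the function names `dft` and `idft` and the test sequence are illustrative assumptions; the sums are direct transcriptions of the definitions.

```python
import cmath


def dft(z):
    """DFT: z_hat(m) = sum_n z(n) exp(-2*pi*i*m*n/N)."""
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]


def idft(z):
    """IDFT: z_check(n) = (1/N) sum_m z(m) exp(2*pi*i*m*n/N)."""
    N = len(z)
    return [sum(z[m] * cmath.exp(2j * cmath.pi * m * n / N) for m in range(N)) / N
            for n in range(N)]


z = [3, -1j, 0.5, 2 + 2j, -4]   # arbitrary test sequence in l^2(Z_5)

# Apply the two compositions IDFT(DFT(z)) and DFT(IDFT(z)).
idft_of_dft = idft(dft(z))
dft_of_idft = dft(idft(z))

gap_1 = max(abs(a - b) for a, b in zip(z, idft_of_dft))
gap_2 = max(abs(a - b) for a, b in zip(z, dft_of_idft))
print(gap_1, gap_2)  # both expected to be at rounding-error level
```

The only differences between the two functions are the factor $1/N$ and the sign in the exponent, as noted above.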
DEFINITION 2.6.– The pair $(z, \hat{z}) \in \ell^2(\mathbb{Z}_N) \times \ell^2(\mathbb{Z}_N)$ is known as a Fourier pair.

2.4.2. Definition of the DFT and the IDFT with the orthonormal Fourier basis

An alternative definition of Fourier coefficients, the DFT and the IDFT, more commonly found in a theoretical mathematical context, uses the orthonormal Fourier basis $\mathcal{E}$:
– $z, w \in \ell^2(\mathbb{Z}_N)$;
– Fourier coefficients:
$$\hat{z}(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} z(n) e^{-2\pi i \frac{mn}{N}} \qquad [2.22]$$
The notation $\hat{z}$ in the following formulas in this list (and only these formulas) refers to the Fourier coefficients above.
– decomposition on the orthonormal Fourier basis:
$$z(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \hat{z}(n) e^{2\pi i \frac{mn}{N}}$$
– DFT:
$$\hat{z}(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} z(n) e^{-2\pi i \frac{mn}{N}} \qquad \forall m \in \{0, 1, \dots, N-1\}$$
– IDFT:
$$\check{z}(n) = \frac{1}{\sqrt{N}} \sum_{m=0}^{N-1} z(m) e^{2\pi i \frac{mn}{N}} \qquad \forall n \in \{0, 1, \dots, N-1\}$$
– Parseval's identity:
$$\langle z, w \rangle = \sum_{m=0}^{N-1} \hat{z}(m)\overline{\hat{w}(m)} = \langle \hat{z}, \hat{w} \rangle$$
– Plancherel's theorem:
$$\|z\|^2 = \sum_{m=0}^{N-1} |\hat{z}(m)|^2 = \|\hat{z}\|^2$$
Box 2.1. Discrete orthonormal Fourier analysis
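The formulas of Box 2.1 can be checked numerically: with the factor $1/\sqrt{N}$ on both transforms, inner products and norms are the same on both sides. The following sketch is illustrative only; the names `unitary_dft` and `unitary_idft` and the test sequences are assumptions, not notation from the text.

```python
import cmath
import math


def unitary_dft(z):
    """DFT normalized by 1/sqrt(N), as in [2.22]."""
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            / math.sqrt(N) for m in range(N)]


def unitary_idft(z):
    """IDFT normalized by 1/sqrt(N)."""
    N = len(z)
    return [sum(z[m] * cmath.exp(2j * cmath.pi * m * n / N) for m in range(N))
            / math.sqrt(N) for n in range(N)]


def inner(z, w):
    """Inner product of l^2(Z_N): sum over k of z(k) * conj(w(k))."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))


z = [1, -2j, 3, 0.5, 1 + 1j]   # arbitrary test sequences in l^2(Z_5)
w = [0, 1, 1j, -1, 2]
z_hat, w_hat = unitary_dft(z), unitary_dft(w)

# Parseval and Plancherel hold without any extra factor of N.
parseval_gap = abs(inner(z, w) - inner(z_hat, w_hat))
plancherel_gap = abs(inner(z, z).real - inner(z_hat, z_hat).real)

# The two normalization factors multiply to 1/N, so the round trip is exact.
round_trip_gap = max(abs(a - b) for a, b in zip(z, unitary_idft(z_hat)))
print(parseval_gap, plancherel_gap, round_trip_gap)
```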
As we can see, the greatest advantage of using the orthonormal Fourier basis in defining the objects used in Fourier analysis is that the DFT and the IDFT are operators which conserve the inner product, and consequently the norm; they are therefore represented using unitary matrices. We also see that, independently of the definition used, the product of the normalization coefficients appearing in $\hat{z}$ and $\check{z}$ must always be equal to $1/N$ to guarantee that $\mathrm{IDFT} = \mathrm{DFT}^{-1}$.

2.4.3. The real (orthonormal) Fourier basis

The Fourier basis and DFT can be written using real notation. The advantage of a real DFT is that, if $z$ is real, we can avoid the need to introduce imaginary components. For simplicity's sake, we shall focus on the orthonormal Fourier basis. First, we must determine whether $N$ is even or odd. Let us begin with the case where $N$ is even: $N = 2M$, $M \in \mathbb{N}$, $M \geq 1$. In this case, $\forall n = 0, 1, \dots, N-1$, we write:
$$\begin{cases} c_0(n) = \frac{1}{\sqrt{N}} \\ c_m(n) = \sqrt{\frac{2}{N}} \cos\left(\frac{2\pi mn}{N}\right) & m = 1, 2, \dots, M-1 \\ c_M(n) = \frac{1}{\sqrt{N}} \cos\left(\frac{2\pi \frac{N}{2} n}{N}\right) = \frac{(-1)^n}{\sqrt{N}} \\ s_m(n) = \sqrt{\frac{2}{N}} \sin\left(\frac{2\pi mn}{N}\right) & m = 1, 2, \dots, M-1 \end{cases}$$
If $N = 2M - 1$ is odd, $c_0$, $c_m$ and $s_m$ are defined in the same way as above, but $m = N/2$ should not be considered, as in this case $N/2$ is not an integer.

THEOREM 2.3.– The set $\{c_0, c_1, \dots, c_{M-1}, c_M, s_1, \dots, s_{M-1}\}$, when $N = 2M$, or the set $\{c_0, c_1, \dots, c_{M-1}, s_1, \dots, s_{M-1}\}$, when $N = 2M - 1$, is an orthonormal basis of $\ell^2(\mathbb{Z}_N)$. Thus, for all $z \in \ell^2(\mathbb{Z}_N)$:
$$z = \sum_{m=0}^{M} \langle z, c_m \rangle c_m + \sum_{m=1}^{M-1} \langle z, s_m \rangle s_m \qquad (N = 2M)$$
$$z = \sum_{m=0}^{M-1} \langle z, c_m \rangle c_m + \sum_{m=1}^{M-1} \langle z, s_m \rangle s_m \qquad (N = 2M - 1)$$
DEFINITION 2.7.– The real orthonormal Fourier basis of $\ell^2(\mathbb{Z}_N)$ is the set of sequences of $\ell^2(\mathbb{Z}_N)$ $\{c_0, c_1, \dots, c_{M-1}, c_M, s_1, \dots, s_{M-1}\}$ when $N = 2M$, or the set of sequences of $\ell^2(\mathbb{Z}_N)$ $\{c_0, c_1, \dots, c_{M-1}, s_1, \dots, s_{M-1}\}$ when $N = 2M - 1$.
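The orthonormality of the real Fourier basis can be checked numerically for one even and one odd value of $N$. In the sketch below, the function names are hypothetical, and the cosine and sine sequences are generated for all integers $m$ with $1 \leq m < N/2$, that is, up to $m = (N-1)/2$ when $N$ is odd; the extra sequence $(-1)^n/\sqrt{N}$ is added only when $N$ is even.

```python
import math


def real_fourier_basis(N):
    """Return the real sequences c_0, c_m, (c_{N/2} if N is even), s_m."""
    K = (N - 1) // 2          # number of indices m with 1 <= m < N/2
    samples = range(N)
    basis = [[1 / math.sqrt(N)] * N]                       # c_0
    for m in range(1, K + 1):                              # c_1, ..., c_K
        basis.append([math.sqrt(2 / N) * math.cos(2 * math.pi * m * n / N)
                      for n in samples])
    if N % 2 == 0:                                         # c_{N/2}, even N only
        basis.append([(-1) ** n / math.sqrt(N) for n in samples])
    for m in range(1, K + 1):                              # s_1, ..., s_K
        basis.append([math.sqrt(2 / N) * math.sin(2 * math.pi * m * n / N)
                      for n in samples])
    return basis


def max_gram_error(basis):
    """Largest deviation of the Gram matrix from the identity matrix."""
    size = len(basis)
    return max(abs(sum(a * b for a, b in zip(basis[i], basis[j]))
                   - (1 if i == j else 0))
               for i in range(size) for j in range(size))


for N in (7, 8):
    basis = real_fourier_basis(N)
    print(N, len(basis), max_gram_error(basis))
```

In both cases the family contains exactly $N$ sequences and its Gram matrix is the identity up to rounding, so it is an orthonormal basis of the real sequences of length $N$.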
The relationship with the Fourier coefficients is obtained using the following formulas:
$$\begin{cases} \langle z, c_0 \rangle = \frac{\hat{z}(0)}{\sqrt{N}} \\ \langle z, c_M \rangle = \frac{\hat{z}(M)}{\sqrt{N}} \\ \langle z, c_m \rangle = \frac{1}{\sqrt{2N}} (\hat{z}(m) + \hat{z}(N-m)), & m = 1, 2, \dots, M-1 \\ \langle z, s_m \rangle = \frac{i}{\sqrt{2N}} (\hat{z}(m) - \hat{z}(N-m)), & m = 1, 2, \dots, M-1 \\ \hat{z}(0) = \sqrt{N} \langle z, c_0 \rangle \\ \hat{z}(M) = \sqrt{N} \langle z, c_M \rangle \\ \hat{z}(m) = \sqrt{N/2}\, (\langle z, c_m \rangle - i \langle z, s_m \rangle), & m = 1, 2, \dots, M-1 \\ \hat{z}(m) = \sqrt{N/2}\, (\langle z, c_{N-m} \rangle + i \langle z, s_{N-m} \rangle), & m = M+1, M+2, \dots, N-1 \end{cases}$$

2.5. Matrix interpretation of the DFT and the IDFT

By definition, the DFT transforms sequences of $\ell^2(\mathbb{Z}_N)$ represented in the canonical basis $B$ of $\ell^2(\mathbb{Z}_N)$ into sequences of $\ell^2(\mathbb{Z}_N)$ represented in the orthogonal Fourier basis $F$ of $\ell^2(\mathbb{Z}_N)$ [2.17]:
$$\mathrm{DFT} : \ell^2(\mathbb{Z}_N) \longrightarrow \ell^2(\mathbb{Z}_N), \qquad z = [z]_B \longmapsto \mathrm{DFT}(z) = \hat{z} = [z]_F$$
The DFT is thus the operator used to perform the change from the canonical basis $B$ of $\ell^2(\mathbb{Z}_N)$ to the Fourier basis $F$ of $\ell^2(\mathbb{Z}_N)$, and, consequently, the IDFT is the inverse operator.

We wish to establish a matrix representation of these two linear operators DFT and IDFT. To do this, we shall use a notation which is widely used in the literature concerning the DFT: $\omega_N = e^{-\frac{2\pi i}{N}}$. Using the properties of complex exponentials, we can write:
$$\omega_N^{mn} = e^{-2\pi i \frac{mn}{N}}$$
and the Fourier coefficients can thus be written as:
$$\hat{z}(m) = \sum_{n=0}^{N-1} z(n) e^{-2\pi i \frac{mn}{N}} = \sum_{n=0}^{N-1} z(n) \omega_N^{mn}$$
We define the matrix $W_N$ containing the elements $\omega_N^{mn}$:
$$w_{mn} = \omega_N^{mn}$$
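that is, explicitly:
$$W_N = \begin{pmatrix} 1 & 1 & 1 & 1 & \dots & 1 \\ 1 & \omega_N & \omega_N^2 & \omega_N^3 & \dots & \omega_N^{N-1} \\ 1 & \omega_N^2 & \omega_N^4 & \omega_N^6 & \dots & \omega_N^{2(N-1)} \\ 1 & \omega_N^3 & \omega_N^6 & \omega_N^9 & \dots & \omega_N^{3(N-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega_N^{N-1} & \omega_N^{2(N-1)} & \omega_N^{3(N-1)} & \dots & \omega_N^{(N-1)(N-1)} \end{pmatrix} \qquad [2.23]$$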
This $N \times N$ matrix is called a Sylvester matrix¹. It is symmetrical: $W_N = W_N^t$, i.e. $w_{mn} = w_{nm}$ (an obvious consequence of the definition of $w_{mn}$, since $\omega_N^{mn} = \omega_N^{nm}$) and each row or column is obtained by the geometric progression² of a power of $\omega_N$. A matrix of this type is known as a Vandermonde matrix³.

By convention, when considering $W_N$, the indices of the rows and columns range from 0 to $N-1$ (instead of the canonical range, from 1 to $N$). This convention is the reason why all elements in the first row ($m = 0$) and all elements in the first column ($n = 0$) are equal to 1.

If we apply $W_N$ to $z$ considered as a column vector in $\mathbb{C}^N$, then, by the definition of matrix product, we obtain a vector $W_N z$ whose $m$-th component $(W_N z)(m)$⁴ is given by:
$$(W_N z)(m) = \sum_{n=0}^{N-1} w_{mn} z(n) = \sum_{n=0}^{N-1} z(n) e^{-2\pi i \frac{mn}{N}} = \hat{z}(m), \qquad \forall m \in \mathbb{Z}_N,$$
thus:
$$\hat{z} = W_N z \qquad \forall z \in \ell^2(\mathbb{Z}_N)$$
Using the same approach, we can verify that the IDFT is implemented via the conjugate matrix of $W_N$ normalized by the coefficient $1/N$ (transposition is not required as $W_N$ is symmetrical):
$$W_N^{-1} = \frac{1}{N} \overline{W_N}, \qquad \check{z} = W_N^{-1} z \qquad \forall z \in \ell^2(\mathbb{Z}_N)$$
$W_N$ is the change of basis matrix used to go from $B$ to $F$, and $W_N^{-1} = \frac{1}{N} \overline{W_N}$ is the change of basis matrix used to go from $F$ to $B$.

¹ James Joseph Sylvester (1814, London-1897, London).
² A geometric progression of ratio $r$ is the sequence of powers $1 = r^0, r = r^1, r^2, r^3, \dots, r^n$.
³ Alexandre-Théophile Vandermonde (1735, Paris-1796, Paris).
⁴ This is the real Euclidean product of the $m$-th row of $W_N$, i.e. $(w_{m0}, w_{m1}, \dots, w_{m(N-1)})$, times the components $(z(0), z(1), \dots, z(N-1))$ of $z$.
OBSERVATIONS.– Using the definition of the DFT corresponding to equation [2.22], that is, using the orthonormal Fourier basis, the associated matrix becomes $\tilde{W}_N = W_N / \sqrt{N}$. This is a unitary matrix, and thus its inverse matrix is $\overline{\tilde{W}_N}$.

Examples:
– $N = 2$: $\omega_2 = e^{-2\pi i/2} = e^{-i\pi} = \cos(\pi) - i\sin(\pi) = -1$, thus:
$$W_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$
hence:
$$W_2^{-1} = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$
– $N = 4$: $\omega_4 = e^{-2\pi i/4} = e^{-i\pi/2} = \cos(\pi/2) - i\sin(\pi/2) = -i$, thus:
$$W_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & (-i)^2 & (-i)^3 \\ 1 & (-i)^2 & (-i)^4 & (-i)^6 \\ 1 & (-i)^3 & (-i)^6 & (-i)^9 \end{pmatrix}$$
hence:
$$W_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix} \qquad [2.24]$$
The inverse matrix is:
$$W_4^{-1} = \frac{1}{4} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix} \qquad [2.25]$$
Note that the columns of matrix $W_4^{-1}$ consist of the orthogonal basis $F$ of $\ell^2(\mathbb{Z}_4)$, as seen in formula [2.14]; this is coherent with the fact that this is the matrix used to change from the orthogonal basis $F$ to the canonical basis of $\ell^2(\mathbb{Z}_4)$.
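The matrix interpretation can be reproduced with plain nested lists. The sketch below builds $W_N$ from $\omega_N = e^{-2\pi i/N}$, forms the candidate inverse $\frac{1}{N}\overline{W_N}$, and checks both that their product is the identity and that $W_4 z$ gives the Fourier coefficients of a test sequence. The helper names `dft_matrix` and `mat_mul` and the test sequence are illustrative assumptions.

```python
import cmath


def dft_matrix(N):
    """Return W_N as a list of rows, with entries w_mn = omega_N ** (m * n)."""
    omega = cmath.exp(-2j * cmath.pi / N)
    return [[omega ** (m * n) for n in range(N)] for m in range(N)]


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


N = 4
W = dft_matrix(N)

# Candidate inverse: entrywise complex conjugate of W_N divided by N.
W_inv = [[W[m][n].conjugate() / N for n in range(N)] for m in range(N)]
product = mat_mul(W, W_inv)

# DFT of a test sequence as a matrix-vector product.
z = [1, 2, 0, -1]
z_hat = [sum(W[m][n] * z[n] for n in range(N)) for m in range(N)]
print(z_hat)
```

Each of the $N$ components of $\hat{z}$ requires $N$ multiplications in this form, so the cost of the matrix-vector product grows like $N^2$.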
2.5.1. The fast Fourier transform

As we have seen, the action of the DFT on a signal $z \in \ell^2(\mathbb{Z}_N)$ can be represented as a matrix product. We must therefore calculate $N$ multiplications for each element $\hat{z}(m)$ in the sequence $\hat{z} \in \ell^2(\mathbb{Z}_N)$. Since $\hat{z}$ has $N$ components, the complexity of the algorithm used to calculate the DFT is $O(N^2)$.

This complexity means that the DFT is extremely time-consuming when working with signals of large dimension. In practice, the Fourier transform was almost never used outside of a theoretical context (that is, in real-world applications) before the 1960s.

A breakthrough came in 1965, when Cooley and Tukey used symmetries concealed within the DFT to construct a fast algorithm for calculating the DFT: this algorithm is known as the fast Fourier transform (FFT). The complexity of the FFT is of the order of $O(N \log N)$, and, using modern computers, it allows the Fourier transform of large dimension signals to be calculated in under a second.

The FFT is extremely efficient in cases where the signal dimension is a power of 2. This is the reason why a 512 or 1,024 format is typically used for digital images, enabling rapid and efficient processing using the FFT.

The development of the FFT is considered as one of the greatest scientific breakthroughs of the 20th century, as it enables the use of Fourier transforms in a vast array of practical applications.

2.6. The Fourier transform in signal processing

Fourier theory has applications in a wide range of domains, for example in solving ordinary and partial differential equations, classical and quantum physics, statistics and probabilities, and signal processing. In this section, we shall highlight the crucial role of Fourier theory in signal processing in one dimension (1D).

2.6.1. Synthesis formula for 1D signals: decomposition on the harmonic basis

A discrete 1D signal of dimension $N$ may be defined as the set of $N$ samplings of a variable, which may be dependent on time, on a spatial dimension ($x$, $y$ or $z$), or on another parameter with a single degree of freedom.
Two remarkable examples of discrete 1D signals, dependent on time or on a single spatial dimension, are:
– the set of intensity values for a piece of music, sampled at $N$ different moments in time;
– the set of grayscale values of a line or column in a simple image, corresponding to $N$ different positions.

A discrete 1D signal can be processed using Fourier theory thanks to the following basic identifications:
– the abstract mathematical representation of a discrete 1D signal is given by a sequence $z \in \ell^2(\mathbb{Z}_N)$;
– $n \in \mathbb{Z}_N = \{0, 1, \dots, N-1\}$ represents the value of the parameter (time, spatial dimension, etc.) according to which the signal is sampled. The unit of measurement used for $n$ is typically the second or the meter;
– the energy of the signal $z$ is associated with the square of the norm $\|z\|^2$.

The next step is to interpret the decomposition formula over the Fourier basis, the DFT and the IDFT, and Plancherel's theorem in the context of signal processing.

The interpretation of Plancherel's theorem is the simplest: the energy of the signal $z$ is decomposed into the sum of the squared magnitudes of the Fourier coefficients.

The decomposition formula over the Fourier basis, equation [2.19], is known as the synthesis formula in the context of signal processing:

$$z(n) = \frac{1}{N}\sum_{m=0}^{N-1} \hat z(m)\, e^{2\pi i \frac{mn}{N}} \qquad \forall n \in \mathbb{Z}_N$$

Using this formula, the signal $z$ can be reconstructed (or "synthesized") using the Fourier coefficients $\hat z(m)$ and the oscillating functions $e^{2\pi i \frac{mn}{N}}$. The functions used in the signal synthesis operation are:

$$e^{2\pi i \frac{m}{N} n} = \cos\left(2\pi \frac{m}{N} n\right) + i \sin\left(2\pi \frac{m}{N} n\right). \qquad [2.26]$$

When $m = 0$, there is no oscillation; from $m = 1$ to $m = N-1$ the functions $e^{2\pi i \frac{m}{N} n}$ oscillate at a certain frequency ($m$ is therefore measured in hertz or rad/s). This will be discussed in detail in section 2.6.4. These functions are known as harmonics, a term derived from the field of music, as we see from Definition 2.8.
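The synthesis formula can be checked numerically. The following Python lines are only a minimal sketch: the helper names `dft` and `idft` are our own (hypothetical) choices, the sums are evaluated directly in $O(N^2)$ operations rather than with an FFT, and the normalization is the one used in this chapter (no factor in the DFT, a factor $1/N$ in the IDFT). The test signal is arbitrary.

```python
import cmath

def dft(z):
    """Analysis: z_hat(m) = sum_n z(n) * exp(-2*pi*i*m*n/N), no 1/N factor."""
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

def idft(z_hat):
    """Synthesis: z(n) = (1/N) * sum_m z_hat(m) * exp(2*pi*i*m*n/N)."""
    N = len(z_hat)
    return [sum(z_hat[m] * cmath.exp(2j * cmath.pi * m * n / N) for m in range(N)) / N
            for n in range(N)]

# Arbitrary test signal in l^2(Z_6), chosen only for illustration.
z = [2, 3 - 1j, 2j, 4 + 1j, 0, 1]

# Synthesizing from the Fourier coefficients should give back z.
z_rec = idft(dft(z))
max_err = max(abs(a - b) for a, b in zip(z, z_rec))
print(max_err)  # expected to be of the order of rounding errors
```

Up to floating-point rounding, the reconstructed sequence coincides with the original one, as the synthesis formula predicts.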
DEFINITION 2.8 (harmonics).– The function $n \mapsto e^{2\pi i \frac{1}{N} n}$ is known as the fundamental (discrete) harmonic⁵ and the functions $n \mapsto e^{2\pi i \frac{m}{N} n}$ for $m = 2, \dots, N-1$ are (discrete) harmonics of (higher) order $m$.

2.6.2. Meaning of the Fourier coefficients and spectra of a 1D signal

The synthesis formula tells us that the value $z(n)$ of the signal at the parameter value $n$ can be reconstructed as a linear combination of harmonic waves whose frequencies are the multiples $m/N$ of $1/N$: $\{0, 1/N, 2/N, \dots, (N-1)/N\}$. The complex scalars of the linear combination are the Fourier coefficients $\hat z(m)$.

Each Fourier coefficient $\hat z(m) \in \mathbb{C}$ may be written as:

$$\hat z(m) = a(m) + i\, b(m) = |\hat z(m)|\, e^{i \mathrm{Arg}(\hat z(m))}$$

where $|\hat z(m)| = \sqrt{a(m)^2 + b(m)^2}$ is the magnitude of the Fourier coefficient $\hat z(m)$ and $\mathrm{Arg}(\hat z(m)) = \arctan\left(\frac{b(m)}{a(m)}\right)$ is its argument (this expression holds for $a(m) > 0$; otherwise the quadrant of the point $(a(m), b(m))$ must be taken into account).

Evidently, the "weight" which measures the importance of each harmonic $e^{2\pi i \frac{mn}{N}}$ in reconstructing a signal $z$ is the magnitude⁶ of the Fourier coefficient $\hat z(m)$:

$|\hat z(m)|$ : measures the importance of the harmonic $e^{2\pi i \frac{mn}{N}}$ in reconstructing $z$.

For this reason, in signal processing, the Fourier coefficient formula is known as the analysis formula:

$$\hat z(m) = \sum_{n=0}^{N-1} z(n)\, e^{-2\pi i \frac{mn}{N}} \qquad \forall m \in \mathbb{Z}_N$$

since $\hat z$ allows us to analyze the frequency components of a signal.

If the discrete signal $z$ depends on the time $t$ (or on a spatial dimension $x$), then the transformation $z \mapsto \hat z$ obtained using the DFT enables us to go from a temporal (or spatial) representation of the signal to a frequency representation, also called the Fourier space.

The Fourier transform is often described as the mathematical equivalent of Newton's prism. Newton's prism breaks down light into "hidden" frequency components corresponding to the colors of the spectrum. The Fourier transform reveals the frequency components which are "hidden" in any signal. This analogy explains the terms used in Definition 2.9.

⁵ It is important to specify that these harmonics are discrete; continuous harmonics are obtained using the functions $t \mapsto e^{2\pi i m \nu t} = e^{i m \omega t}$, where $\nu$ is the frequency and $\omega = 2\pi\nu$ the angular frequency.
⁶ The magnitude must be used here due to the fact that complex numbers are not ordered.
DEFINITION 2.9.– Given $z \in \ell^2(\mathbb{Z}_N)$:
– $\{|\hat z(m)|,\ m \in \mathbb{Z}_N\}$ is known as the amplitude spectrum of $z$, or simply the spectrum of $z$;
– $\{|\hat z(m)|^2,\ m \in \mathbb{Z}_N\}$ is the power spectrum of $z$;
– $\{\mathrm{Arg}(\hat z(m)),\ m \in \mathbb{Z}_N\}$ is the phase spectrum of $z$.

The meaning of these spectra will be discussed in detail later. Note the presence of one particularly special Fourier coefficient, $\hat z(0)$, which provides information concerning the average value of $z$:

$$\hat z(0) = \sum_{n=0}^{N-1} z(n)\, e^{-2\pi i \frac{0 \cdot n}{N}} = \sum_{n=0}^{N-1} z(n) = N \langle z \rangle \quad \Longrightarrow \quad \hat z(0) = N \langle z \rangle$$

where $\langle z \rangle = \frac{1}{N}\sum_{n=0}^{N-1} z(n)$ is the average value of the signal $z$.

Introducing this expression of $\hat z(0)$ into the synthesis formula and separating the first term from the rest of the sum, we obtain:

$$z(n) = \frac{1}{N} N \langle z \rangle + \frac{1}{N}\sum_{m=1}^{N-1} \hat z(m)\, e^{2\pi i \frac{mn}{N}}$$

that is:

$$z(n) = \langle z \rangle + \frac{1}{N}\sum_{m=1}^{N-1} \hat z(m)\, e^{2\pi i \frac{mn}{N}}$$

The Fourier coefficient $\hat z(0)$ is known as the "DC" component of the synthesis formula, while the other terms constitute the "AC" component. This terminology is taken from the field of electronics, with DC standing for "direct current" (current of frequency zero) and AC standing for "alternating current".

One way of interpreting the formula set out above is to say that $z$ is decomposed into the sum of its mean value and the finer details reconstructed by higher order harmonics, weighted by the Fourier coefficients of $z$.

2.6.3. The synthesis formula and Fourier coefficients of the unit pulse

It is helpful to compare the synthesis formula with formula [2.1], that is:

$$\sum_{n=0}^{N-1} e^{\pm 2\pi i n \frac{j-k}{N}} = N \delta_{j,k} = \begin{cases} N & j = k \\ 0 & j \neq k \end{cases}$$
Rewriting $j - k = m \in \mathbb{Z}_N$, switching $m$ and $n$ (an acceptable substitution, as both are arbitrary values of $\mathbb{Z}_N$) and normalizing by $N$, we obtain the following formula:

$$\frac{1}{N}\sum_{m=0}^{N-1} e^{\pm 2\pi i n \frac{m}{N}} = \begin{cases} 1 & n = 0 \\ 0 & n = 1, \dots, N-1 \end{cases} = e_0(n) \equiv \delta_0(n) \equiv \delta(n)$$

$\delta$ is known as the unit pulse. If we select the option "+" in the formula shown above, we obtain the synthesis formula for the unit pulse, in which all Fourier coefficients are equal to 1:

$$\hat\delta_0(m) = 1 = \left|\hat\delta_0(m)\right| \qquad \forall m \in \mathbb{Z}_N$$

This result is particularly informative: the DFT transforms a signal which is completely "localized" at one value of its parameter into a signal which is fully "delocalized" across the spectrum: the harmonics of all frequencies have the same weight when reconstructing the signal.

Let us now calculate the Fourier coefficients of the constant signal $z(n) = \frac{1}{N}$, $\forall n \in \mathbb{Z}_N$. We obtain:

$$\hat z(m) = \sum_{n=0}^{N-1} \frac{1}{N}\, e^{-2\pi i \frac{mn}{N}} = \frac{1}{N}\sum_{n=0}^{N-1} e^{-2\pi i \frac{mn}{N}} = \begin{cases} 1 & m = 0 \\ 0 & m = 1, \dots, N-1 \end{cases} = \delta_0(m)$$

We see that the DFT of a constant signal (which is completely delocalized in relation to its parameter) is a unit pulse in the Fourier domain, meaning that it is completely localized in its frequencies.

The generalization of this behavior to spaces which are more complicated than $\ell^2(\mathbb{Z}_N)$ – notably $L^2(\Omega)$, $\Omega \subseteq \mathbb{R}^n$, which we will examine later – forms the basis for understanding the Heisenberg uncertainty principle, the conceptual core of quantum mechanics.

Thanks to the results that we have discussed above, we can give a physical interpretation of formula [2.1] in Lemma 2.1: the superposition of harmonic functions whose frequencies are integer multiples of the fundamental frequency $1/N$ is subject to destructive interference everywhere, except at one value where the harmonics experience constructive interference. Moreover, according to the synthesis formula, harmonics must be weighted differently in order to reconstruct any signal which is not a pulse.

2.6.4. High and low frequencies in the synthesis formula

Let us take a closer look at the meaning of the frequency coefficients $m$ in the set

$$\left\{ e^{2\pi i \frac{mn}{N}} = \cos\left(2\pi \frac{mn}{N}\right) + i \sin\left(2\pi \frac{mn}{N}\right),\ n = 0, 1, \dots, N-1 \right\},$$
which represents the values taken by the harmonic of order $m$ at each of the $N$ values of the parameter $n$.

For the sake of simplicity, we shall only consider the real part of the elements of the set above, that is

$$H_m = \left\{ \cos\left(2\pi \frac{mn}{N}\right),\ n = 0, 1, \dots, N-1 \right\};$$

our remarks concerning the cosine are equally applicable to the sine.

Consider the behavior of $\cos\left(2\pi \frac{mn}{N}\right)$ when the value of $m$ is between $0$ and $N-1$, where $N$ is even (the case where $N$ is odd will be discussed later):

– $m = 0$: as we have already seen, in this case there is no oscillation, but simply a series of constant values, $\cos\left(2\pi \frac{0 \cdot n}{N}\right) = 1$, so $H_0 = \{1, 1, \dots, 1\}$;

– $m = 1$:

$$H_1 = \left\{ 1, \cos\left(2\pi \frac{1}{N}\right), \cos\left(2\pi \frac{2}{N}\right), \dots, \cos\left(2\pi \frac{N-1}{N}\right) \right\}$$

The values of $H_1$ represent $N$ samples of one cosine oscillation. The cycle is not completed, since the value $n = N$, which would give $\cos\left(2\pi \frac{N}{N}\right) = \cos(2\pi) = 1$, is not included. Figure 2.1 shows the graph of $H_m$ for $m = 1$, $N = 16$;

– $m = 2$:

$$H_2 = \left\{ 1, \cos\left(2\pi \frac{2}{N}\right), \cos\left(2\pi \frac{4}{N}\right), \dots, \cos\left(2\pi \frac{2 \frac{N}{2}}{N}\right) = 1, \dots, \cos\left(2\pi \frac{2(N-1)}{N}\right) \right\}$$

The values of $H_2$ represent $N$ samples of two cosine oscillations; $n = N/2$ marks the end of the first cosine cycle. Figure 2.2 shows the graph of $H_m$ for $m = 2$, $N = 16$. We see that, for $n = 8 = 16/2$, the cosine value is 1.

Increasing $m$ up to $N/2$, the oscillation frequency of the cosine increases. The maximum frequency is reached when $m = N/2$; in this case, $\cos\left(2\pi \frac{\frac{N}{2} n}{N}\right) = \cos(\pi n)$, thus:

$$H_{\frac{N}{2}} = \{(-1)^n,\ n = 0, 1, \dots, N-1\}$$
Figure 2.1. $H_m$ for $m = 1$, $N = 16$

Figure 2.2. $H_m$ for $m = 2$, $N = 16$
We might expect the cosine oscillation frequency to increase up to $m = N-1$, but this is not the case. In reality, from $m = N/2 + 1$ onward, the cosine oscillation frequency decreases. To understand this behavior, consider $\cos\left(2\pi \frac{nm}{N}\right)$ when $m$ belongs to the set $\left\{\frac{N}{2}+1, \frac{N}{2}+2, \dots, N-1\right\}$, and apply the following change of variable:

$$k = N - m \iff m = N - k, \qquad m \in \left\{\frac{N}{2}+1, \frac{N}{2}+2, \dots, N-1\right\} \iff k \in \left\{\frac{N}{2}-1, \dots, 2, 1\right\},$$

so that, when $m$ increases from $\frac{N}{2}+1$ up to $N-1$, $k$ decreases from $\frac{N}{2}-1$ down to $1$. Applying this change of variable to the cosine, we obtain:

$$\cos\left(2\pi \frac{n(N-k)}{N}\right) = \cos\left(2\pi n - 2\pi \frac{nk}{N}\right) = \cos\left(-2\pi \frac{nk}{N}\right) = \cos\left(2\pi \frac{nk}{N}\right),$$

having used the periodicity and parity of the cosine. Consequently:

$$\cos\left(2\pi \frac{nm}{N}\right),\ m \in \left\{\frac{N}{2}+1, \frac{N}{2}+2, \dots, N-1\right\} \iff \cos\left(2\pi \frac{nk}{N}\right),\ k \in \left\{\frac{N}{2}-1, \dots, 1\right\}$$

Thus, the number of harmonic oscillations is maximal when $m = N/2$, and is symmetrical about this value. For example, the graph of $H_m$ for $m = 9$, $N = 16$ is exactly equal to the graph of $H_m$ with $m = 7$, $N = 16$. Similarly, the graph for $m = 15$, $N = 16$ is exactly equal to the graph in Figure 2.1, representing $H_m$ with $m = 1$, $N = 16$;

– evidently, if $N$ is odd, the considerations set out above are valid for $\left[\frac{N}{2}\right]$, the integer part of $\frac{N}{2}$, that is, the largest integer which is not greater than $\frac{N}{2}$.

The elements described above are the reasons for certain choices of terminology:
– high frequencies: values of $m$ close to $\frac{N}{2}$;
– low frequencies: values of $m$ close to $0$ or $N-1$.

If the synthesis formula for a discrete signal $z \in \ell^2(\mathbb{Z}_N)$ includes Fourier coefficients $\hat z(m)$ with a high magnitude for values of $m$ which are close to $N/2$, the signal will be characterized by relatively violent variations (as in the case of high-pitched sounds, such as those produced by cymbals). However, if the Fourier coefficients with the highest magnitude correspond to values of $m$ close to $0$ and $N-1$, the signal will be characterized by "gentler" variations (as in the case of low-pitched sounds, such as those produced by bass drums).
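The symmetry of the harmonics about $m = N/2$ can be verified numerically. The short Python sketch below is an illustration only; the function name `H` mirrors the notation $H_m$ but is our own (hypothetical) helper, and the value $N = 16$ is the one used in Figures 2.1 and 2.2.

```python
import math

def H(m, N):
    """Samples of the real part of the harmonic of order m: cos(2*pi*m*n/N)."""
    return [math.cos(2 * math.pi * m * n / N) for n in range(N)]

def same(u, v, tol=1e-12):
    """Compare two sample lists up to floating-point rounding."""
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(u, v))

N = 16
same_7_9 = same(H(7, N), H(9, N))      # m = 9 gives the same samples as m = 7
same_1_15 = same(H(1, N), H(15, N))    # m = 15 gives the same samples as m = 1
# At m = N/2 the samples alternate between +1 and -1.
alternating = same(H(N // 2, N), [(-1) ** n for n in range(N)])
print(same_7_9, same_1_15, alternating)
```

All three comparisons hold up to rounding, in agreement with the change of variable $k = N - m$ used above.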
The frequency $m = N/2$ is known as the Nyquist frequency⁷. This is the highest harmonic frequency which can appear in the synthesis formula for $N$ samples of a signal.

⁷ Named after the Swedish engineer Harry Nyquist (1889–1976).

2.6.5. Signal filtering in frequency representation

The DFT can be used to easily modify the frequency content of a signal, for example by increasing the strength of the lowest or highest frequencies.

The standard approach is to move to the Fourier space using the DFT, then adjust the Fourier coefficients as required using a filter $f : \ell^2(\mathbb{Z}_N) \to \ell^2(\mathbb{Z}_N)$, which may be either a linear or a nonlinear transform. Finally, the IDFT is applied to the sequence of modified Fourier coefficients to obtain the modified version of the original signal.

The signal processing approach used in the frequency domain is shown in Figure 2.3.

Figure 2.3. Filtering approach in the Fourier domain

Note that, in the composition $\mathrm{IDFT} \circ f \circ \mathrm{DFT}$, only $f$ has the capacity to change the energy of the signal: the composition of the Fourier transform with its inverse is the identity, so the energy of the original signal is retained.

One particularly important example of a filter $f$, the multiplication operator defined in section 2.6.6, is used to define the concept of the Fourier multiplier in section 2.6.7.
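The filtering scheme $\mathrm{IDFT} \circ f \circ \mathrm{DFT}$ can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the helpers `dft`, `idft` and `fourier_filter` are hypothetical names, the filter $f$ is a simple pointwise 0/1 mask that keeps only the low frequencies, and the test signal (a constant, a slow cosine and a component at $m = N/2$) is chosen arbitrarily.

```python
import cmath
import math

def dft(z):
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

def idft(z_hat):
    N = len(z_hat)
    return [sum(z_hat[m] * cmath.exp(2j * cmath.pi * m * n / N) for m in range(N)) / N
            for n in range(N)]

def fourier_filter(z, mask):
    """IDFT o f o DFT, where f multiplies each Fourier coefficient by mask[m]."""
    return idft([w * c for w, c in zip(mask, dft(z))])

N = 16
# Constant + slow oscillation (m = 1 and m = 15) + fastest oscillation (m = N/2).
z = [1 + math.cos(2 * math.pi * n / N) + 0.5 * (-1) ** n for n in range(N)]

# Keep the coefficients with m close to 0 or to N - 1, discard the others.
cutoff = 2
mask = [1.0 if min(m, N - m) <= cutoff else 0.0 for m in range(N)]

smooth = fourier_filter(z, mask)
expected = [1 + math.cos(2 * math.pi * n / N) for n in range(N)]
err = max(abs(a - b) for a, b in zip(smooth, expected))

# Only f changes the energy: here it removes the high-frequency part.
energy_in = sum(abs(c) ** 2 for c in z)
energy_out = sum(abs(c) ** 2 for c in smooth)
print(err, energy_in, energy_out)
```

In this example the filter removes the alternating component $0.5\,(-1)^n$ and leaves the constant and the slow cosine untouched, so the energy of the output is lower than that of the input.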
2.6.6. The multiplication operator and its diagonal matrix representation

Let $w : \mathbb{Z}_N \to \mathbb{C}$ be a fixed sequence in $\ell^2(\mathbb{Z}_N)$.

DEFINITION 2.10.– The linear map below is known as the multiplication operator by the sequence $w$:

$$M_w : \ell^2(\mathbb{Z}_N) \longrightarrow \ell^2(\mathbb{Z}_N), \qquad z \longmapsto M_w(z) = w \cdot z$$

where $M_w(z) = w \cdot z : \mathbb{Z}_N \to \mathbb{C}$ is the sequence defined by the pointwise (also called Hadamard) product of $w$ and $z$:

$$M_w z(n) = (w \cdot z)(n) = w(n) \cdot z(n) \qquad \forall n \in \mathbb{Z}_N$$

Note that if $z$ is represented as a column vector in the canonical basis of $\ell^2(\mathbb{Z}_N)$, then the matrix associated with the operator $M_w$ in relation to the canonical basis of $\ell^2(\mathbb{Z}_N)$ is a diagonal matrix $D_w$ with diagonal elements given by the components of the sequence $w$:

$$D_w z = \begin{pmatrix} w(0) & & 0 \\ & \ddots & \\ 0 & & w(N-1) \end{pmatrix} \begin{pmatrix} z(0) \\ \vdots \\ z(N-1) \end{pmatrix} = \begin{pmatrix} w(0) z(0) \\ \vdots \\ w(N-1) z(N-1) \end{pmatrix}$$

EXAMPLE OF A MULTIPLICATION OPERATOR.– Consider the sequence of $\ell^2(\mathbb{Z}_6)$ given by $z = (2, 3-i, 2i, 4+i, 0, 1)$ and the sequence $w(n) = i^n$, $n \in \mathbb{Z}_6$, then:

$$(w(0) = 1,\ w(1) = i,\ w(2) = -1,\ w(3) = -i,\ w(4) = 1,\ w(5) = i)$$

and thus:

$$M_w z = (1 \cdot 2,\ i \cdot (3-i),\ -1 \cdot 2i,\ -i \cdot (4+i),\ 1 \cdot 0,\ i \cdot 1) = (2,\ 3i+1,\ -2i,\ -4i+1,\ 0,\ i)$$
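The example above can be reproduced with a few lines of Python. The sketch below is illustrative only: the names `multiply` and `diag_times_vector` are our own (hypothetical) helpers, and the powers $i^n$ are tabulated by hand so that the arithmetic stays exact.

```python
# Powers of i repeat with period 4: 1, i, -1, -i.
POWERS_OF_I = [1, 1j, -1, -1j]

def multiply(w, z):
    """Multiplication operator M_w: pointwise (Hadamard) product of w and z."""
    return [wn * zn for wn, zn in zip(w, z)]

def diag_times_vector(w, z):
    """Same operator written as the diagonal matrix D_w acting on the column z."""
    N = len(z)
    D = [[w[r] if r == c else 0 for c in range(N)] for r in range(N)]
    return [sum(D[r][c] * z[c] for c in range(N)) for r in range(N)]

z = [2, 3 - 1j, 2j, 4 + 1j, 0, 1]
w = [POWERS_OF_I[n % 4] for n in range(6)]   # w(n) = i^n, n in Z_6

Mwz = multiply(w, z)
Dwz = diag_times_vector(w, z)
print(Mwz)
```

Both computations give the sequence $(2,\ 1+3i,\ -2i,\ 1-4i,\ 0,\ i)$ found above.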
This provides the foundation for introducing the Fourier multiplier operator.

2.6.7. The Fourier multiplier operator

The Fourier multiplier operator – or multiplier – is one notable example of a frequency filter.
DEFINITION 2.11.– Given a sequence $w : \mathbb{Z}_N \to \mathbb{C}$, the Fourier multiplier by the sequence $w$ is the following operator:

$$T_{(w)} : \ell^2(\mathbb{Z}_N) \longrightarrow \ell^2(\mathbb{Z}_N), \qquad z \longmapsto T_{(w)}(z) = (w \cdot \hat z)^{\vee}$$

that is, $T_{(w)}$ is the operator given by the composition $T_{(w)} = \mathrm{IDFT} \circ M_w \circ \mathrm{DFT}$:

$$\ell^2(\mathbb{Z}_N) \xrightarrow{\ \mathrm{DFT}\ } \ell^2(\mathbb{Z}_N) \xrightarrow{\ M_w\ } \ell^2(\mathbb{Z}_N) \xrightarrow{\ \mathrm{IDFT}\ } \ell^2(\mathbb{Z}_N)$$

$$z \mapsto \mathrm{DFT}(z) = \hat z \mapsto M_w(\mathrm{DFT}(z)) = w \cdot \hat z \mapsto \mathrm{IDFT}(M_w(\mathrm{DFT}(z))) = (w \cdot \hat z)^{\vee}$$

Applying the DFT to both sides of the definition of $T_{(w)}$, we see that the action of the Fourier multiplier is diagonal in the Fourier basis $F$:

$$\mathrm{DFT}\, T_{(w)} z = [T_{(w)} z]_F = M_w \circ \mathrm{DFT}\, z = M_w \hat z, \qquad \forall z \in \ell^2(\mathbb{Z}_N) \qquad [2.27]$$

Thus, $T_{(w)}$ multiplies the Fourier coefficients of $z$ by the components of the sequence $w$ (making this operator a multiplier). This means that we can:
– attenuate the low frequencies of a signal $z$ by selecting a sequence $w(m)$ with a low value of $|w(m)|$ when $m \simeq 0$ and $m \simeq N-1$;
– attenuate the high frequencies of a signal $z$ by selecting a sequence $w(m)$ with a low value of $|w(m)|$ when $m \simeq N/2$;
– amplify the low frequencies of a signal $z$ by selecting a sequence $w(m)$ with a high value of $|w(m)|$ when $m \simeq 0$ and $m \simeq N-1$;
– amplify the high frequencies of a signal $z$ by selecting a sequence $w(m)$ with a high value of $|w(m)|$ when $m \simeq N/2$.

This principle is used in graphic equalizers, with which musicians adjust the level of the high (treble) and low (bass) frequencies in an audio signal.

2.7. Properties of the DFT

In this section, we shall prove the most important properties of the DFT. We shall begin by recalling the translation property of a summation index:

$$\sum_{i=n_0}^{n} a_i = \sum_{i=n_0-k}^{n-k} a_{i+k} = \sum_{i=n_0+k}^{n+k} a_{i-k} \qquad [2.28]$$

This property will be used on several occasions, along with the following lemma.
LEMMA 2.2.– Let $f : \mathbb{Z} \to \mathbb{C}$ be an $N$-periodic function, with $N \in \mathbb{N}$:

$$f(n + aN) = f(n) \qquad \forall a, n \in \mathbb{Z}$$

Then, for all $m \in \mathbb{Z}$:

$$\sum_{n=m}^{m+N-1} f(n) = \sum_{n=0}^{N-1} f(n)$$

that is, the sum of an $N$-periodic function over any interval of $N$ consecutive integers does not depend on the interval.

PROOF.– If $m = 0$, there is nothing to prove, so we may take $m \in \mathbb{Z}$, $m \neq 0$. Considering values of $m > 0$:

$$\sum_{n=m}^{m+N-1} f(n) = \sum_{n=0}^{m+N-1} f(n) - \sum_{n=0}^{m-1} f(n) = \sum_{n=0}^{N-1} f(n) + \sum_{n=N}^{m+N-1} f(n) - \sum_{n=0}^{m-1} f(n)$$

but, using [2.28]:

$$\sum_{n=N}^{m+N-1} f(n) = \sum_{n=0}^{m-1} f(n+N) = \sum_{n=0}^{m-1} f(n)$$

because of the $N$-periodicity of $f$, thus:

$$\sum_{n=m}^{m+N-1} f(n) = \sum_{n=0}^{N-1} f(n) + \sum_{n=0}^{m-1} f(n) - \sum_{n=0}^{m-1} f(n) = \sum_{n=0}^{N-1} f(n)$$

A similar argument may be used for the cases where $m < 0$. □
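Lemma 2.2 is easy to test numerically. The following Python sketch is only an illustration: the function `f` is an arbitrary $N$-periodic function chosen for the test and `window_sum` is our own (hypothetical) helper; negative values of $m$ are included as well.

```python
import cmath

N = 8

def f(n):
    """An arbitrary N-periodic function Z -> C (depends on n only modulo N)."""
    return cmath.exp(2j * cmath.pi * 3 * n / N) + (n % N) ** 2

def window_sum(m):
    """Sum of f over the N consecutive integers m, m + 1, ..., m + N - 1."""
    return sum(f(n) for n in range(m, m + N))

reference = window_sum(0)
# The sum over any window of length N should equal the sum over 0, ..., N - 1.
all_equal = all(abs(window_sum(m) - reference) < 1e-9 for m in range(-20, 21))
print(all_equal)
```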
2.7.1. Periodicity of $\hat z$ and $\check z$

In what follows, we shall examine the most important properties of the discrete Fourier theory, starting with the periodicity of $\hat z$ and $\check z$.

By direct calculation, if $a \in \mathbb{Z}$, then:

$$\hat z(m + aN) = \sum_{n=0}^{N-1} z(n)\, e^{-2\pi i \frac{(m+aN)n}{N}} = \sum_{n=0}^{N-1} z(n)\, e^{-2\pi i \frac{mn}{N}}\, e^{-2\pi i \frac{aNn}{N}} = \sum_{n=0}^{N-1} z(n)\, e^{-2\pi i \frac{mn}{N}}\, e^{-2\pi a n i} = \hat z(m)$$

since $e^{-2\pi a n i} = \cos(2\pi a n) - i \sin(2\pi a n) = 1$. The same calculation is used to prove that $\check z(n + aN) = \check z(n)$ $\forall a \in \mathbb{Z}$.
Thanks to this property, the definitions of $\hat z$ and $\check z$ can be extended to $\mathbb{Z}$ by considering the two $N$-periodic sequences:

$$\hat z : \mathbb{Z} \longrightarrow \mathbb{C}, \qquad m \longmapsto \hat z(m) = \hat z(m + aN)$$

and:

$$\check z : \mathbb{Z} \longrightarrow \mathbb{C}, \qquad n \longmapsto \check z(n) = \check z(n + aN)$$

with $a \in \mathbb{Z}$ such that $m + aN \in \mathbb{Z}_N$, or $n + aN \in \mathbb{Z}_N$, respectively.

2.7.2. DFT and shift

We now wish to consider how the DFT of a signal $z \in \ell^2(\mathbb{Z}_N)$ varies in response to a shift in $z$ by a quantity different from $N$. Another operator on $\ell^2(\mathbb{Z}_N)$ must be introduced in order to formalize this consideration.

DEFINITION 2.12.– Take $k \in \mathbb{Z}$. The following linear map is the right shift operator by the quantity $k$:

$$R_k : \ell^2(\mathbb{Z}_N) \longrightarrow \ell^2(\mathbb{Z}_N), \qquad z \longmapsto R_k(z)$$

where $R_k(z) : \mathbb{Z}_N \to \mathbb{C}$ is the sequence defined by the formula:

$$R_k z(n) = z(n - k) \qquad \forall n \in \mathbb{Z}_N$$

EXAMPLE OF A SHIFT OPERATOR.– $N = 6$, $k = 2$, $z = (2, 3-i, 2i, 4+i, 0, 1)$. Then:

$$\begin{cases} R_2 z(0) = z(0-2) = z(-2) = z(-2+6) = z(4) = 0 \\ R_2 z(1) = z(1-2) = z(-1) = z(-1+6) = z(5) = 1 \\ \dots \end{cases}$$

giving us: $R_2 z = (0, 1, 2, 3-i, 2i, 4+i)$.

Evidently, the effect of $R_2$ on $z$ is a simple displacement of each element in the sequence by two positions to the right (hence the notation $R$).
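The final two elements "turn" into the first two positions, as though following a circle. For this reason, $R_k$ is also known as a circular shift operator or rotation operator.

Now, consider the composition of this shift operator with the DFT and, inversely, that of the DFT with the shift operator. We shall begin with this latter composition, $\mathrm{DFT}(z(n-k))$, that is:

$$\ell^2(\mathbb{Z}_N) \xrightarrow{\ R_k\ } \ell^2(\mathbb{Z}_N) \xrightarrow{\ \mathrm{DFT}\ } \ell^2(\mathbb{Z}_N)$$

$$z \longmapsto R_k z \longmapsto (\mathrm{DFT} \circ R_k) z = \mathrm{DFT}(R_k z) = \widehat{R_k z}$$

Theorem 2.4 shows that, under the DFT, the action of the operator $R_k$ is transformed into a multiplication by a complex exponential.

THEOREM 2.4.– Take $z \in \ell^2(\mathbb{Z}_N)$ and $k \in \mathbb{Z}$. Then:

$$\widehat{R_k z}(m) = e^{-2\pi i \frac{mk}{N}}\, \hat z(m) \qquad \forall m \in \mathbb{Z} \qquad [2.29]$$

that is, if we define the sequence $\omega_N^k \in \ell^2(\mathbb{Z}_N)$, $\omega_N^k(m) = \omega_N^{mk} = e^{-2\pi i \frac{mk}{N}}$ $\forall m \in \mathbb{Z}$, then:

$$\mathrm{DFT} \circ R_k = M_{\omega_N^k} \circ \mathrm{DFT} \qquad [2.30]$$

PROOF.–

$$\begin{aligned} \widehat{R_k z}(m) &= \sum_{n=0}^{N-1} (R_k z)(n)\, e^{-2\pi i \frac{mn}{N}} = \sum_{n=0}^{N-1} z(n-k)\, e^{-2\pi i \frac{mn}{N}} \\ &= \sum_{n=-k}^{N-k-1} z(n-k+k)\, e^{-2\pi i \frac{m(n+k)}{N}} \\ &= \sum_{n=-k}^{N-k-1} z(n)\, e^{-2\pi i \frac{mn}{N}}\, e^{-2\pi i \frac{mk}{N}} \end{aligned}$$

The factor $e^{-2\pi i \frac{mk}{N}}$ is independent of the index $n$ and can thus be taken out of the summation:

$$\begin{aligned} \widehat{R_k z}(m) &= e^{-2\pi i \frac{mk}{N}} \sum_{n=-k}^{N-k-1} z(n)\, e^{-2\pi i \frac{mn}{N}} \\ &\underset{\text{(Lemma 2.2)}}{=} e^{-2\pi i \frac{mk}{N}} \sum_{n=0}^{N-1} z(n)\, e^{-2\pi i \frac{mn}{N}} \\ &= e^{-2\pi i \frac{mk}{N}}\, \hat z(m) \end{aligned}$$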
Lemma 2.2 can be applied in this case as, by hypothesis, $z$ is $N$-periodic and the exponential $e^{-2\pi i \frac{mn}{N}}$ is itself an $N$-periodic function of $n$. □

Note that, if we write $\hat z(m) = |\hat z(m)|\, e^{i \mathrm{Arg}(\hat z(m))}$ then, since $\left|e^{-2\pi i \frac{mk}{N}}\right| = 1$, the product $e^{-2\pi i \frac{mk}{N}}\, \hat z(m)$ only changes the phase of $\hat z(m)$. This is the reason why we say that the DFT transforms the shift into a phase shift.

The fact that the phase of the Fourier coefficients is modified by translations implies that the phase spectrum contains information regarding the geometry of the original signal.
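Formula [2.29] and the resulting invariance of the magnitudes can be checked on the example sequence used for the shift operator. The Python sketch below is illustrative only; `dft` and `shift` are our own (hypothetical) helper names, and the DFT is computed by direct summation.

```python
import cmath

def dft(z):
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

def shift(z, k):
    """Circular right shift: (R_k z)(n) = z(n - k), indices taken modulo N."""
    N = len(z)
    return [z[(n - k) % N] for n in range(N)]

z = [2, 3 - 1j, 2j, 4 + 1j, 0, 1]
N, k = len(z), 2
z_hat = dft(z)

# Left side of [2.29]: DFT of the shifted signal.
lhs = dft(shift(z, k))
# Right side of [2.29]: phase factor times the DFT of the original signal.
rhs = [cmath.exp(-2j * cmath.pi * m * k / N) * z_hat[m] for m in range(N)]

phase_err = max(abs(a - b) for a, b in zip(lhs, rhs))
# The magnitudes of the coefficients are unchanged by the shift.
spectrum_err = max(abs(abs(a) - abs(b)) for a, b in zip(lhs, z_hat))
print(shift(z, k), phase_err, spectrum_err)
```

The shifted sequence is $(0, 1, 2, 3-i, 2i, 4+i)$, and both errors are of the order of floating-point rounding.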
2.7.2.1. Shift invariance of the spectrum

Theorem 2.4 highlights an important limitation of the Fourier transform. Since:

$$\left|e^{-2\pi i \frac{mk}{N}}\right| = 1 \quad \Longrightarrow \quad |\widehat{R_k z}(m)| = |\hat z(m)| \qquad \forall m, k \in \mathbb{Z},$$

the magnitudes of the Fourier coefficients of $z$ and of all its shifts are equal. Consequently, the magnitude of the Fourier coefficients $|\hat z(m)|$ informs us of the (global) importance of the harmonic of frequency $m$ in the reconstruction of the signal $z$, but not of its (local) position within the signal.

To gain a clearer understanding of this behavior, let us consider the unit pulse, to which an arbitrary shift is applied: $R_k \delta_0$. The spectrum of this signal is $|\widehat{R_k \delta_0}(m)| = |e^{-2\pi i \frac{mk}{N}}\, \hat\delta_0(m)|$ but, as we have seen, $\hat\delta_0(m) = 1$ for all $m \in \mathbb{Z}_N$, thus $|\widehat{R_k \delta_0}(m)| = |e^{-2\pi i \frac{mk}{N}}| = 1$. The difference between this case and that of the non-shifted unit pulse is that, in the latter case, the Fourier coefficients are real and thus $\left|\hat\delta_0(m)\right| = \hat\delta_0(m) = 1$ $\forall m \in \mathbb{Z}_N$.

The spectrum of the unit pulse is therefore exactly the same as that of any of its shifted forms. Knowledge of the spectrum alone is not sufficient to reconstruct the spatial location of a signal; to do this, we need information from the phase, which is not easy to interpret or handle.

One solution to this problem lies in using two transforms which "localize" the Fourier transform: the Gabor transform and the wavelet transform. These transforms lie outside the scope of this book; the interested reader can consult, for instance, Frazier (2001).

Now, let us analyze the composition of the shift operator and the DFT, $\hat z(m - k)$, that is:

$$\ell^2(\mathbb{Z}_N) \xrightarrow{\ \mathrm{DFT}\ } \ell^2(\mathbb{Z}_N) \xrightarrow{\ R_k\ } \ell^2(\mathbb{Z}_N)$$

$$z \longmapsto \mathrm{DFT}(z) \longmapsto (R_k \circ \mathrm{DFT}) z = \hat z(m - k) \qquad [2.31]$$
THEOREM 2.5.– Under the hypotheses of Theorem 2.4, the composition [2.31] is given by the formula:

$$(R_k \hat z)(m) = \hat z(m - k) = \widehat{\left(e^{2\pi i \frac{nk}{N}} z\right)}(m), \qquad \forall m \in \mathbb{Z} \qquad [2.32]$$

that is:

$$R_k \circ \mathrm{DFT} = \mathrm{DFT} \circ M_{\overline{\omega_N^k}} \qquad [2.33]$$

PROOF.–

$$(R_k \hat z)(m) = \hat z(m - k) = \sum_{n=0}^{N-1} z(n)\, e^{-2\pi i \frac{(m-k)n}{N}} = \sum_{n=0}^{N-1} \left(e^{2\pi i \frac{kn}{N}} z(n)\right) e^{-2\pi i \frac{mn}{N}} = \widehat{\left(e^{2\pi i \frac{kn}{N}} z\right)}(m)$$ □

The properties analyzed above may be summarized in the form of Fourier pairs, shown in Table 2.2. This information shows that the shift operation in the original representation of $z$ becomes a phase change in the Fourier space; conversely, the shift operation in the Fourier space corresponds to a phase change (with a conjugate phase) in the original representation of $z$.

The following situation illustrates a particularly remarkable case. If $N$ is even and $k = N/2$, then:

$$e^{-\frac{2\pi i m}{N}\frac{N}{2}} = e^{-\pi i m} = (e^{-\pi i})^m = (-1)^m$$

and:

$$e^{\frac{2\pi i n}{N}\frac{N}{2}} = e^{\pi i n} = (e^{\pi i})^n = (-1)^n$$

so:

$$\mathrm{DFT}\left(z\left(n - \frac{N}{2}\right)\right) = (-1)^m\, \hat z(m), \qquad \hat z\left(m - \frac{N}{2}\right) = \widehat{\left((-1)^n z\right)}(m) \qquad [2.34]$$

Thus, multiplying the sequence $z$ by $(-1)^n$ corresponds to shifting the spectrum by $N/2$. This operation is often used to display a spectrum centered on the zero frequency: after the shift, the coefficient $\hat z(0)$ is found at the index $m = N/2$.
Original representation                    Fourier space
$z(n - k)$                                 $e^{-2\pi i \frac{km}{N}}\, \hat z(m)$
$e^{2\pi i \frac{kn}{N}}\, z(n)$           $\hat z(m - k)$

Table 2.2. Fourier pairs and translation
Finally, note the relation between formula [2.30] and the diagonal representation of the operator $R_k$. Composing both sides of formula [2.30] with the IDFT on the right, we obtain:

$$\mathrm{DFT} \circ R_k \circ \mathrm{IDFT} = M_{\omega_N^k}$$

Using $A_k$ and $D_{\omega_N^k}$ (diagonal, see section 2.6.6) to denote the matrices associated with the operators $R_k$ and $M_{\omega_N^k}$ in relation to the canonical basis, the previous equation can be rewritten as:

$$W_N A_k W_N^{-1} = D_{\omega_N^k}.$$

This tells us that the matrix $A_k$ associated with the shift operator $R_k$ is similar to the diagonal matrix $D_{\omega_N^k}$. The invertible matrix which produces the matrix conjugation of $A_k$ and $D_{\omega_N^k}$ is the Sylvester matrix $W_N$, so we can say that the action of the shift operator $R_k$ is diagonal in the Fourier space.

2.7.3. DFT and conjugation

Given a sequence $z \in \ell^2(\mathbb{Z}_N)$, the conjugate sequence $\bar z$ is written as $\bar z = (\bar z(0), \bar z(1), \dots, \bar z(N-1))$, that is, $\bar z(n) = \overline{z(n)}$ $\forall n \in \mathbb{Z}_N$.

The relationship between the DFT and conjugation is shown in Theorem 2.6.

THEOREM 2.6.– For all $z \in \ell^2(\mathbb{Z}_N)$:

$$\hat{\bar z}(m) = \overline{\hat z(-m)} = \overline{\hat z(N - m)} \qquad \forall m \in \mathbb{Z}_N$$

PROOF.–

$$\hat{\bar z}(m) = \sum_{n=0}^{N-1} \overline{z(n)}\, e^{-2\pi i \frac{mn}{N}} = \overline{\sum_{n=0}^{N-1} z(n)\, e^{2\pi i \frac{mn}{N}}} = \overline{\sum_{n=0}^{N-1} z(n)\, e^{-2\pi i \frac{(-m)n}{N}}} = \overline{\hat z(-m)}$$

and $\overline{\hat z(N - m)} = \overline{\hat z(-m)}$, by periodicity. □
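COROLLARY 2.1.– $z \in \ell^2(\mathbb{Z}_N)$ is real, that is, $z(n) \in \mathbb{R}$ $\forall n \in \mathbb{Z}_N$, if and only if:

$$\hat z(m) = \overline{\hat z(-m)} = \overline{\hat z(N - m)}, \qquad \forall m \in \mathbb{Z}_N$$

$\hat z \in \ell^2(\mathbb{Z}_N)$ is real, that is, $\hat z(m) \in \mathbb{R}$ $\forall m \in \mathbb{Z}_N$, if and only if:

$$z(n) = \overline{z(-n)} = \overline{z(N - n)}, \qquad \forall n \in \mathbb{Z}_N$$

PROOF.– As the DFT is an isomorphism of $\ell^2(\mathbb{Z}_N)$, $z$ is real, that is, $z = \bar z$, if and only if $\hat z = \hat{\bar z}$ but, from Theorem 2.6, this holds true exactly when $\hat z(m) = \overline{\hat z(-m)} = \overline{\hat z(N - m)}$.

$\hat z$ is real if and only if $\hat z = \overline{\hat z}$, but the previous theorem states that $\overline{\hat z(-m)} = \hat{\bar z}(m)$ $\forall m \in \mathbb{Z}_N$, implying that $\overline{\hat z(m)} = \hat{\bar z}(-m)$, by simple substitution of the variable $m \leftrightarrow -m$. Moreover, $\hat{\bar z}(-m)$ is the DFT, evaluated at $m$, of the reflected sequence $n \mapsto \bar z(-n)$. Hence:

$$\mathrm{IDFT}(\overline{\hat z(m)}) = \mathrm{IDFT}(\hat{\bar z}(-m)) = \mathrm{IDFT}(\mathrm{DFT}(\bar z(-n))) = \bar z(-n) = \overline{z(-n)}$$

Therefore $\hat z$ is real $\iff \hat z(m) = \overline{\hat z(m)} \iff \mathrm{IDFT}(\hat z(m)) = \mathrm{IDFT}(\overline{\hat z(m)}) \iff z(n) = \overline{z(-n)} = \overline{z(N - n)}$ $\forall n \in \mathbb{Z}_N$. □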
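Theorem 2.6 and Corollary 2.1 lend themselves to a quick numerical check. The Python sketch below is an illustration under our own assumptions: `dft` is a hypothetical helper computing the DFT by direct summation, and the three test sequences (a complex one, a real one, and a real one that is symmetrical about 0) are chosen arbitrarily.

```python
import cmath

def dft(z):
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

N = 6

# Theorem 2.6: DFT of the conjugate = conjugate of z_hat evaluated at -m.
z = [2, 3 - 1j, 2j, 4 + 1j, 0, 1]
z_hat = dft(z)
conj_hat = dft([complex(c).conjugate() for c in z])
thm_err = max(abs(conj_hat[m] - z_hat[(-m) % N].conjugate()) for m in range(N))

# Corollary 2.1, first part: a real signal satisfies z_hat(m) = conj(z_hat(N - m)).
x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0]
x_hat = dft(x)
herm_err = max(abs(x_hat[m] - x_hat[(N - m) % N].conjugate()) for m in range(N))

# Corollary 2.1, second part: s(n) = conj(s(-n)) gives a real DFT.
s = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]   # real and symmetrical: s(n) = s(N - n)
imag_max = max(abs(c.imag) for c in dft(s))
print(thm_err, herm_err, imag_max)
```

The three printed quantities are of the order of rounding errors.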
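Corollary 2.2 is an immediate consequence of the previous result.

COROLLARY 2.2.– A real sequence $z \in \ell^2(\mathbb{Z}_N)$ has a real DFT $\hat z$ if and only if $z$ is symmetrical about 0, that is, $z(n) = z(-n)$ $\forall n \in \mathbb{Z}_N$; in that case $\hat z$ is symmetrical about 0 as well, $\hat z(m) = \hat z(-m)$ $\forall m \in \mathbb{Z}_N$.

2.7.4. DFT and convolution

One of the most important properties of the Fourier transform relates to the convolution operation. To understand this operation, we first recall the formula for polynomial products.

If $P(x) = a_0 + a_1 x + \dots + a_n x^n = \sum_{i=0}^{n} a_i x^i$ and $Q(x) = b_0 + b_1 x + \dots + b_m x^m = \sum_{j=0}^{m} b_j x^j$, then:

$$P(x) Q(x) = \sum_{\ell=0}^{n+m} c_\ell\, x^\ell, \qquad \text{where} \qquad c_\ell = \sum_{k=0}^{\ell} a_{\ell-k}\, b_k = \sum_{k=0}^{\ell} a_k\, b_{\ell-k} \qquad [2.35]$$

with the convention that $a_i = 0$ for $i > n$ and $b_j = 0$ for $j > m$.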
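Formula [2.35] translates directly into code. The Python sketch below is illustrative only: `poly_product` is our own (hypothetical) helper name, the coefficients are stored in lists with the constant term first, and the two polynomials are chosen arbitrarily.

```python
def poly_product(a, b):
    """Coefficients c_l = sum_k a_(l-k) * b_k of the product of two polynomials.

    a and b list the coefficients in increasing degree; coefficients beyond
    the degree of each polynomial are treated as zero.
    """
    n, m = len(a) - 1, len(b) - 1
    c = []
    for l in range(n + m + 1):
        # Keep only the index pairs (l - k, k) that exist in a and b.
        c.append(sum(a[l - k] * b[k] for k in range(l + 1)
                     if l - k <= n and k <= m))
    return c

# P(x) = 1 + 2x + 3x^2 and Q(x) = 4 + 5x + 6x^2
c = poly_product([1, 2, 3], [4, 5, 6])
print(c)  # [4, 13, 28, 27, 18]
```

In each coefficient $c_\ell$, the indices of the two factors always add up to $\ell$, which is the feature that the convolution operation generalizes.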
EXAMPLE.– $P(x) = a_0 + a_1 x + a_2 x^2$, $Q(x) = b_0 + b_1 x + b_2 x^2$, so:

$$P(x) Q(x) = a_0 b_0 + (a_0 b_1 + a_1 b_0) x + (a_0 b_2 + a_1 b_1 + a_2 b_0) x^2 + (a_1 b_2 + a_2 b_1) x^3 + (a_2 b_2) x^4$$

The coefficients of the powers of the variable $x$ verify formula [2.35].

We see that the coefficients $c_\ell$ consist of a sum of products of the coefficients $a_i$ and $b_j$. Notably, the sum of the indices $i + j$ is always equal to $\ell$; as the index of one factor increases, that of the other decreases. These are the defining characteristics of the convolution operation (in its discrete form), which we shall introduce in the space $\ell^2(\mathbb{Z}_N)$.

DEFINITION 2.13.– Take $z, w \in \ell^2(\mathbb{Z}_N)$. The convolution of $z$ with $w$, written as $z * w$, is the sequence of $\ell^2(\mathbb{Z}_N)$ with components defined by:

$$(z * w)(n) = \sum_{k=0}^{N-1} z(n - k)\, w(k) = \sum_{k=0}^{N-1} w(n - k)\, z(k), \qquad \forall n \in \mathbb{Z}_N$$

Convolution is symmetrical, that is $z * w = w * z$, due to the commutative nature of the product in $\mathbb{C}$.

EXAMPLE.– $z, w \in \ell^2(\mathbb{Z}_4)$, $z = (1, 1, 0, 2)$, $w = (i, 0, 1, 2)$, with canonical periodicity: $z(n + kN) = z(n)$ and $w(n + kN) = w(n)$ $\forall n \in \mathbb{Z}_N$ and $k \in \mathbb{Z}$. Then:

$$\begin{aligned} (z * w)(0) &= \sum_{k=0}^{4-1} z(0 - k)\, w(k) = \sum_{k=0}^{3} z(-k)\, w(k) \\ &= z(0) w(0) + z(-1) w(1) + z(-2) w(2) + z(-3) w(3) \\ &= z(0) w(0) + z(4-1) w(1) + z(4-2) w(2) + z(4-3) w(3) \\ &= z(0) w(0) + z(3) w(1) + z(2) w(2) + z(1) w(3) \\ &= 1 \cdot i + 2 \cdot 0 + 0 \cdot 1 + 1 \cdot 2 \\ &= i + 2 \end{aligned}$$
We also have $(z * w)(1) = 2 + i$, $(z * w)(2) = 0 \cdot i + 1 \cdot 0 + 1 \cdot 1 + 2 \cdot 2 = 5$ and $(z * w)(3) = 2 \cdot i + 0 \cdot 0 + 1 \cdot 1 + 1 \cdot 2 = 3 + 2i$, hence $z * w = (2 + i,\ 2 + i,\ 5,\ 3 + 2i)$.

The interaction between the DFT and convolution has a particularly elegant and useful property, described in Theorem 2.7.

THEOREM 2.7.– Take $z, w \in \ell^2(\mathbb{Z}_N)$. Then:

$$\mathrm{DFT}(z * w)(m) = \hat z(m) \cdot \hat w(m) \iff (z * w)(n) = \mathrm{IDFT}(\hat z \cdot \hat w)(n) \qquad \forall n, m \in \mathbb{Z} \qquad [2.36]$$

$$\mathrm{IDFT}(\hat z * \hat w)(n) = N z(n) \cdot w(n) \iff (\hat z * \hat w)(m) = N\, \mathrm{DFT}(z \cdot w)(m) \qquad \forall n, m \in \mathbb{Z} \qquad [2.37]$$

In other words, the Fourier transform of the convolution of $z$ and $w$ is the pointwise product of the Fourier transforms and vice versa: the inverse Fourier transform of the convolution of $\hat z$ and $\hat w$ is $N$ times the pointwise product of $z$ and $w$. We thus obtain the Fourier pairs shown in Table 2.3.

Original representation      Fourier space
$z * w$                      $\hat z \cdot \hat w$
$N z \cdot w$                $\hat z * \hat w$

Table 2.3. Fourier pairs relative to convolution
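PROOF.– By definition:

$$\widehat{(z * w)}(m) = \sum_{n=0}^{N-1} (z * w)(n)\, e^{-2\pi i \frac{mn}{N}} = \sum_{n=0}^{N-1} \left( \sum_{k=0}^{N-1} z(n - k)\, w(k) \right) e^{-2\pi i \frac{mn}{N}}$$

The exponential is rewritten as:

$$e^{-2\pi i \frac{mn}{N}} = e^{-2\pi i \frac{m(n-k+k)}{N}} = e^{-2\pi i \frac{m(n-k)+mk}{N}} = e^{-2\pi i \frac{m(n-k)}{N}}\, e^{-2\pi i \frac{mk}{N}}$$

Then:

$$\begin{aligned} \widehat{(z * w)}(m) &= \sum_{n=0}^{N-1} \sum_{k=0}^{N-1} z(n - k)\, w(k)\, e^{-2\pi i \frac{m(n-k)}{N}}\, e^{-2\pi i \frac{mk}{N}} \\ &= \sum_{k=0}^{N-1} w(k)\, e^{-2\pi i \frac{mk}{N}} \sum_{n=0}^{N-1} z(n - k)\, e^{-2\pi i \frac{m(n-k)}{N}} \\ &= \sum_{k=0}^{N-1} w(k)\, e^{-2\pi i \frac{mk}{N}} \sum_{n=-k}^{N-k-1} z(n - k + k)\, e^{-2\pi i \frac{m(n-k+k)}{N}} \\ &= \sum_{k=0}^{N-1} w(k)\, e^{-2\pi i \frac{mk}{N}} \sum_{n=-k}^{N-k-1} z(n)\, e^{-2\pi i \frac{mn}{N}} \\ &\underset{\text{(Lemma 2.2)}}{=} \sum_{k=0}^{N-1} w(k)\, e^{-2\pi i \frac{mk}{N}} \sum_{n=0}^{N-1} z(n)\, e^{-2\pi i \frac{mn}{N}} \\ &= \hat w(m)\, \hat z(m) = \hat z(m)\, \hat w(m) \end{aligned}$$

Lemma 2.2 can be applied here as it is valid for any $k \in \mathbb{Z}$. Thus:

$$\widehat{(z * w)}(m) = \hat z(m)\, \hat w(m), \qquad \forall m \in \mathbb{Z}$$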
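The proof that $\mathrm{IDFT}(\hat z * \hat w)(n) = N z(n) \cdot w(n)$ is very similar. By definition:

$$\mathrm{IDFT}(\hat z * \hat w)(n) = \frac{1}{N} \sum_{m=0}^{N-1} (\hat z * \hat w)(m)\, e^{2\pi i \frac{mn}{N}} = \frac{1}{N} \sum_{m=0}^{N-1} \left( \sum_{k=0}^{N-1} \hat z(m - k)\, \hat w(k) \right) e^{2\pi i \frac{mn}{N}}$$

The exponential is rewritten as:

$$e^{2\pi i \frac{mn}{N}} = e^{2\pi i \frac{n(m-k+k)}{N}} = e^{2\pi i \frac{n(m-k)+nk}{N}} = e^{2\pi i \frac{n(m-k)}{N}}\, e^{2\pi i \frac{nk}{N}}$$

Then:

$$\begin{aligned} \mathrm{IDFT}(\hat z * \hat w)(n) &= \frac{1}{N} \sum_{m=0}^{N-1} \sum_{k=0}^{N-1} \hat z(m - k)\, \hat w(k)\, e^{2\pi i \frac{n(m-k)}{N}}\, e^{2\pi i \frac{nk}{N}} \\ &= \frac{1}{N} \sum_{k=0}^{N-1} \hat w(k)\, e^{2\pi i \frac{nk}{N}} \sum_{m=0}^{N-1} \hat z(m - k)\, e^{2\pi i \frac{n(m-k)}{N}} \\ &= \frac{1}{N} \sum_{k=0}^{N-1} \hat w(k)\, e^{2\pi i \frac{nk}{N}} \sum_{m=-k}^{N-k-1} \hat z(m - k + k)\, e^{2\pi i \frac{n(m-k+k)}{N}} \\ &= \frac{1}{N} \sum_{k=0}^{N-1} \hat w(k)\, e^{2\pi i \frac{nk}{N}} \sum_{m=-k}^{N-k-1} \hat z(m)\, e^{2\pi i \frac{mn}{N}} \\ &\underset{\text{(Lemma 2.2)}}{=} \frac{1}{N} \sum_{k=0}^{N-1} \hat w(k)\, e^{2\pi i \frac{nk}{N}} \sum_{m=0}^{N-1} \hat z(m)\, e^{2\pi i \frac{mn}{N}} \\ &= N \left( \frac{1}{N} \sum_{k=0}^{N-1} \hat w(k)\, e^{2\pi i \frac{nk}{N}} \right) \cdot \left( \frac{1}{N} \sum_{m=0}^{N-1} \hat z(m)\, e^{2\pi i \frac{mn}{N}} \right) \\ &= N\, \mathrm{IDFT}\, \hat w(n) \cdot \mathrm{IDFT}\, \hat z(n) = N w(n) \cdot z(n) = N z(n) \cdot w(n) \end{aligned}$$ □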
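The worked example and both formulas of Theorem 2.7 can be verified numerically. The Python sketch below is illustrative only: `dft`, `idft` and `conv` are our own (hypothetical) helper names, the transforms are computed by direct summation with the normalization used in this chapter, and `conv` implements the circular convolution of Definition 2.13.

```python
import cmath

def dft(z):
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

def idft(z_hat):
    N = len(z_hat)
    return [sum(z_hat[m] * cmath.exp(2j * cmath.pi * m * n / N) for m in range(N)) / N
            for n in range(N)]

def conv(z, w):
    """Circular convolution: (z * w)(n) = sum_k z(n - k) w(k), indices mod N."""
    N = len(z)
    return [sum(z[(n - k) % N] * w[k] for k in range(N)) for n in range(N)]

z = [1, 1, 0, 2]
w = [1j, 0, 1, 2]
N = len(z)

zw = conv(z, w)
z_hat, w_hat = dft(z), dft(w)

# [2.36]: DFT(z * w) = z_hat . w_hat (pointwise product)
err_236 = max(abs(a - b * c) for a, b, c in zip(dft(zw), z_hat, w_hat))
# [2.37]: IDFT(z_hat * w_hat) = N z . w (note the factor N)
err_237 = max(abs(a - N * b * c)
              for a, b, c in zip(idft(conv(z_hat, w_hat)), z, w))
print(zw, err_236, err_237)
```

The convolution of the example is $(2+i,\ 2+i,\ 5,\ 3+2i)$; as a quick consistency check, its components add up to $12 + 4i$, which is the product of the sums of the components of $z$ and $w$, namely $4 \cdot (3 + i)$.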
OBSERVATIONS.–

– In this proof, $\sum_{k=0}^{N-1} w(k)\, e^{-2\pi i \frac{mk}{N}}$ cannot be replaced with $\hat w(m)$ before the final step, as the index $k$ is still present in the second sum. $\hat w(m)$ can only be substituted in once $k$ has been eliminated from the second sum.

– Formulas [2.36] demonstrate a sort of "distributive property" in connection with convolution and the pointwise product: when the DFT is applied to a convolution product, it is distributed over the factors, and the convolution becomes a pointwise product. Inversely, when the IDFT is applied to a pointwise product, it is distributed over the factors, and the pointwise product becomes a convolution. Thus:

$$\mathrm{DFT}(\check z * \check w) = z \cdot w, \qquad \mathrm{IDFT}(z \cdot w) = \check z * \check w \qquad \forall z, w \in \ell^2(\mathbb{Z}_N) \qquad [2.38]$$

– Using the Fourier transform, a complex operation such as convolution can be transformed into a simple product of Fourier transforms (which can be calculated rapidly using the FFT). This result is particularly useful for signal processing applications. If we define the DFT using the normalization induced by the orthonormal Fourier basis, coefficients appear in the DFT formula of the convolution. These coefficients may be extremely large, particularly when dealing with DFTs in dimensions greater than 1 and/or large signals; this may result in calculation errors. The simplicity of formula [2.36] is the reason why many authors – and programmers – prefer the definition of Fourier coefficients used in this book to other definitions.

– Convolution is often carried out between a signal $z$ and another signal $w$ which is non-zero only on a support of size $T$. The value of $T$ is important in choosing whether to apply the convolution operation directly or to use the FFT. The complexity of the direct convolution operation is $O(NT)$; using the FFT, the complexity is $O(N \log N)$. As a rule of thumb which ignores the constant factors, it is therefore helpful to transform the convolution into a pointwise product with the FFT in cases where $T > \log(N)$. For example, taking $z \in \ell^2(\mathbb{Z}_N)$ with $N = 1{,}000$, then $\log(N) \simeq 7$ (natural logarithm) and it is thus preferable to carry out the convolution $z * w$ in the Fourier domain in all cases where the support of $w$ is larger than 7.

If one of the vectors in the convolution is fixed, we can define an endomorphism of $\ell^2(\mathbb{Z}_N)$.
DEFINITION 2.14.– Taking a fixed sequence $w \in \ell^2(\mathbb{Z}_N)$, the following linear transformation is the convolution operator with $w$:

$$T_w : \ell^2(\mathbb{Z}_N) \longrightarrow \ell^2(\mathbb{Z}_N), \qquad z \longmapsto T_w(z) = z * w$$

As in the case of the shift operator, a diagonal representation of the convolution operator can be produced. To do this, we rewrite formula [2.36] without specifying the index $m$ (as the representation is valid for any index), that is, $\mathrm{DFT}(z * w) = \hat z \cdot \hat w$, but $\mathrm{DFT}(z * w) = (\mathrm{DFT} \circ T_w) z$ and $\hat z \cdot \hat w = \hat w \cdot \hat z = M_{\hat w} \hat z = (M_{\hat w} \circ \mathrm{DFT}) z$, that is, $(\mathrm{DFT} \circ T_w) z = (M_{\hat w} \circ \mathrm{DFT}) z$ $\forall z \in \ell^2(\mathbb{Z}_N)$, making it possible to write the operator relationship $\mathrm{DFT} \circ T_w = M_{\hat w} \circ \mathrm{DFT}$. Composing both sides of this expression with the IDFT on the right, we obtain:

$$\mathrm{DFT} \circ T_w \circ \mathrm{IDFT} = M_{\hat w}$$

Let us consider this relationship in the context of the canonical basis $B$, just as we did in the case of the shift operator. The DFT and the IDFT become $W_N$ and $W_N^{-1}$, and the multiplication operator $M_{\hat w}$ takes the form of the diagonal matrix $D_{\hat w} = \mathrm{diag}(\hat w(0), \dots, \hat w(N-1))$. If the matrix $A_w$ is the representation of $T_w$ in the basis $B$, that is, $A_w = [T_w]_B$, then:

$$W_N A_w W_N^{-1} = D_{\hat w}$$

which shows that the action of the convolution operator is diagonalized in the Fourier basis.

Shift and convolution operators are not unique in this regard: there is a whole class of operators which have a diagonal action in the Fourier basis. These operators are called stationary and they will be examined in greater detail in section 2.8.

2.8. The DFT and stationary operators

The relationship between the Fourier transform and the class of "stationary" operators is an important one. The DFT enables the diagonalization of these operators and they can be shown to be equivalent to convolutions and to Fourier multipliers.

To prove these results, we shall also introduce the category of "circulant" matrices, which represent stationary operators in the canonical basis of $\ell^2(\mathbb{Z}_N)$.
Before giving the formal mathematical definition of a stationary operator, let us introduce the idea behind such an object by considering an audio signal $z$ and a device $T$ that acts linearly on it. If the signal $z$ is transmitted to $T$ with a delay $\Delta t$, and the only effect of this delay on $T$ is that its output is delayed by the same quantity $\Delta t$, then the device $T$ is said to be stationary. Mathematically speaking, if $R_k$ is the shift operator by the quantity $k \in \mathbb{Z}$, then the stationarity of $T$ is translated as the following relationship:

$$T(R_k z) = R_k(T z), \qquad \forall z \in \ell^2(\mathbb{Z}_N)$$

The left side represents the action of the operator $T$ on the signal $z$ shifted by a quantity $k$, while the right side represents the shift by $k$ of the output of $T$ acting on the original signal $z$. These notions are summarized in the commutative diagram below.

$$\begin{array}{ccc}
\ell^2(\mathbb{Z}_N) & \xrightarrow{\;R_k\;} & \ell^2(\mathbb{Z}_N) \\
{\scriptstyle T}\big\downarrow & & \big\downarrow{\scriptstyle T} \\
\ell^2(\mathbb{Z}_N) & \xrightarrow{\;R_k\;} & \ell^2(\mathbb{Z}_N)
\end{array}$$
These considerations justify Definition 2.15.

DEFINITION 2.15.– An operator $T : \ell^2(\mathbb{Z}_N) \to \ell^2(\mathbb{Z}_N)$ is said to be stationary (or shift invariant) if:

$$T(R_k z) = R_k(T z), \qquad \forall z \in \ell^2(\mathbb{Z}_N),\ \forall k \in \mathbb{Z} \qquad [2.39]$$

that is, $T$ is stationary if it commutes with all shift operators $R_k$:

$$T \circ R_k = R_k \circ T, \qquad \forall k \in \mathbb{Z} \qquad [2.40]$$
In section 2.8.5, we shall show that a linear operator $T \in \mathrm{End}(\ell^2(\mathbb{Z}_N))$ is stationary if and only if:

$$(Tz)(n) = \sum_{k=0}^{N-1} a_k z(n-k) = \sum_{k=0}^{N-1} a_k R_k z(n), \qquad n \in \{0, \dots, N-1\},\ a_k \in \mathbb{C}$$
The DFT provides an extremely important example of a non-stationary operator over $\ell^2(\mathbb{Z}_N)$. To prove that the DFT is not a stationary operator, we simply recall the way in which it interacts with the shift operators $R_k$: using formulas [2.30] and [2.33] we obtain, respectively,

$$\mathrm{DFT} \circ R_k = M_{\omega_N^k} \circ \mathrm{DFT} \qquad \text{and} \qquad R_k \circ \mathrm{DFT} = \mathrm{DFT} \circ M_{\overline{\omega_N^k}}$$

which shows that the DFT does not commute with the shift operators.
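Definition 2.15 can be checked numerically on a small example. The sketch below is not part of the original text: it uses only the Python standard library, the test signal `z`, the kernel `w` and the helper names are arbitrary choices, and the DFT is taken with the unnormalized convention $\hat z(m) = \sum_n z(n) e^{-2\pi i mn/N}$. It tests whether the convolution operator of Definition 2.14 and the DFT commute with a few shifts $R_k$.

```python
import cmath

def shift(z, k):
    """Shift operator R_k: (R_k z)(n) = z(n - k), indices taken modulo N."""
    N = len(z)
    return [z[(n - k) % N] for n in range(N)]

def conv(z, w):
    """Circular convolution: (z * w)(m) = sum_n z(m - n) w(n)."""
    N = len(z)
    return [sum(z[(m - n) % N] * w[n] for n in range(N)) for m in range(N)]

def dft(z):
    """Unnormalized DFT: z_hat(m) = sum_n z(n) exp(-2 pi i m n / N)."""
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

def close(u, v, tol=1e-9):
    """Entrywise comparison of two sequences up to a small tolerance."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

z = [1, 2 + 1j, -3, 0.5, 4j]   # arbitrary test signal (assumption)
w = [2, -1, 0, 0, 1j]          # arbitrary convolution kernel (assumption)

# T = T_w: compare T(R_k z) with R_k(T z) for several shifts k
conv_commutes = all(close(conv(shift(z, k), w), shift(conv(z, w), k))
                    for k in range(-3, 8))

# T = DFT: compare DFT(R_k z) with R_k(DFT z)
dft_commutes = all(close(dft(shift(z, k)), shift(dft(z), k))
                   for k in range(1, 5))

print(conv_commutes, dft_commutes)
```

For this particular signal the convolution passes the test for every shift tried, while the DFT fails it, in agreement with the discussion above.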
2.8.1. The DFT and the diagonalization of stationary operators

The most important properties of the DFT with regard to stationary operators can be summarized in a single theorem, but we prefer to highlight the fact that the Fourier transform diagonalizes stationary operators through a separate theorem.

THEOREM 2.8.– Let $T \in \mathrm{End}(\ell^2(\mathbb{Z}_N))$ be a stationary operator. Then $T$ is diagonalizable, and each element $F_m$ of the orthogonal Fourier basis of $\ell^2(\mathbb{Z}_N)$ is an eigenvector of $T$.

PROOF.– For every fixed $m \in \{0, \dots, N-1\}$, let us consider the element of index $m$ of the orthogonal Fourier basis: $F_m(n) = \frac{1}{N} e^{2\pi i \frac{mn}{N}}$. As $T$ is an endomorphism, $T F_m \in \ell^2(\mathbb{Z}_N)$, and thus $T F_m$ can be decomposed over the basis $(F_0, \dots, F_{N-1})$ itself:

$$(T F_m)(n) = \sum_{k=0}^{N-1} a_k F_k(n) = \frac{1}{N} \sum_{k=0}^{N-1} a_k e^{2\pi i \frac{kn}{N}}, \qquad \forall n \in \mathbb{Z}_N \qquad [2.41]$$

Now, consider the action of the shift operator $R_1$ on $F_m$:

$$R_1 F_m(n) = F_m(n-1) = \frac{1}{N} e^{2\pi i \frac{m(n-1)}{N}} = e^{-2\pi i \frac{m}{N}} \cdot \frac{1}{N} e^{2\pi i \frac{mn}{N}} = e^{-2\pi i \frac{m}{N}} \cdot F_m(n)$$

Applying $T$ to $R_1 F_m$, we obtain:

$$T R_1 F_m(n) = T\left(e^{-2\pi i \frac{m}{N}} \cdot F_m\right)(n) \underset{\text{linearity of } T}{=} e^{-2\pi i \frac{m}{N}} (T F_m)(n) \underset{\text{equation [2.41]}}{=} e^{-2\pi i \frac{m}{N}} \sum_{k=0}^{N-1} a_k F_k(n) = \sum_{k=0}^{N-1} a_k e^{-2\pi i \frac{m}{N}} F_k(n)$$

Now, we switch the order of composition of $R_1$ and $T$:

$$R_1 T F_m(n) = T F_m(n-1) \underset{\text{equation [2.41]}}{=} \frac{1}{N} \sum_{k=0}^{N-1} a_k e^{2\pi i \frac{k(n-1)}{N}} = \frac{1}{N} \sum_{k=0}^{N-1} a_k e^{-2\pi i \frac{k}{N}} \cdot e^{2\pi i \frac{kn}{N}} = \sum_{k=0}^{N-1} a_k e^{-2\pi i \frac{k}{N}} F_k(n)$$
Since $T$ is stationary, $T R_1 F_m = R_1 T F_m$, that is:

$$\sum_{k=0}^{N-1} a_k e^{-2\pi i \frac{m}{N}} F_k(n) = \sum_{k=0}^{N-1} a_k e^{-2\pi i \frac{k}{N}} F_k(n)$$

and due to the uniqueness of decomposition over a basis:

$$a_k e^{-2\pi i \frac{m}{N}} = a_k e^{-2\pi i \frac{k}{N}}, \qquad \forall k \in \mathbb{Z}_N \ (m \text{ is fixed}) \qquad [2.42]$$

Let us analyze this equality. If $k = m$, then equation [2.42] is simply an identity and requires no further discussion. In the case where $k \neq m$, we begin by recalling that $m, k \in \{0, \dots, N-1\}$, so the arguments $2\pi \frac{m}{N}$ and $2\pi \frac{k}{N}$ of the complex exponentials lie within a single period, as the next period begins when $m, k = N$. Then:

$$k \neq m \implies e^{-2\pi i \frac{m}{N}} \neq e^{-2\pi i \frac{k}{N}}$$

and equation [2.42] holds if and only if $a_k = 0$ $\forall k \neq m$. Equation [2.41] thus becomes:

$$T F_m(n) = a_m F_m(n), \qquad \forall n \in \mathbb{Z}_N,$$

that is, $F_m$ is an eigenvector of $T$ with an eigenvalue $a_m$ given by the $m$-th coefficient of the decomposition of $T F_m$ on the orthogonal Fourier basis. Evidently, the coefficient $a_m$ depends on $T$.

Given that we fixed an arbitrary index $m$, every element of the orthogonal Fourier basis is an eigenvector of $T$, and consequently $\ell^2(\mathbb{Z}_N)$ has a basis of eigenvectors of $T$. By definition, $T$ is therefore diagonalizable. □

Theorem 2.9 shows how the eigenvalues $a_m$ can be made explicit using the DFT.

The theorem shown above can be interpreted using matrices. We know that the action of the DFT is represented by the Sylvester matrix $W_N$ defined in equation [2.23] and that $W_N$ is the matrix used to pass from the canonical basis $B$ of $\ell^2(\mathbb{Z}_N)$ to the Fourier basis $F$ of $\ell^2(\mathbb{Z}_N)$; the inverse is $W_N^{-1} = \frac{1}{N} \overline{W_N}$, representing the matrix used to pass from basis $F$ to basis $B$. If $A$ is the matrix associated with $T$ with respect to the canonical basis of $\ell^2(\mathbb{Z}_N)$ and $D$ is the diagonal matrix of the eigenvalues of $A$, then:

$$D = W_N A W_N^{-1}, \qquad A = W_N^{-1} D W_N \qquad [2.43]$$

If $[w]_F$ represents any vector $w \in \ell^2(\mathbb{Z}_N)$ with respect to the Fourier basis $F$, then:

$$W_N A z = [Az]_F \underset{(F \text{ diagonalizes } A)}{=} D [z]_F = D W_N z, \qquad \forall z \in \ell^2(\mathbb{Z}_N)$$

so $W_N A = D W_N$, which holds if and only if $W_N A W_N^{-1} = D W_N W_N^{-1} = D$.
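Relation [2.43] lends itself to a direct numerical check. The following sketch is an illustration under stated assumptions rather than the author's construction: the stationary operator is taken to be the convolution with an arbitrary sequence `h`, so that its matrix in the canonical basis has entries $a_{m,n} = h(m-n)$, $W_N$ is built with entries $e^{-2\pi i mn/N}$, and the helper `matmul` is a plain triple loop written for this example.

```python
import cmath

N = 5
h = [2, -1, 0.5j, 0, 3]   # arbitrary sequence (assumption); it is the first column of A

# Matrix of z -> h * z in the canonical basis: (A z)(m) = sum_n h(m - n) z(n)
A = [[h[(m - n) % N] for n in range(N)] for m in range(N)]

# DFT matrix W_N and its inverse (1/N) * conj(W_N)
W = [[cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N)] for m in range(N)]
W_inv = [[W[m][n].conjugate() / N for n in range(N)] for m in range(N)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

D = matmul(matmul(W, A), W_inv)   # D = W_N A W_N^{-1}

# Largest off-diagonal magnitude: should vanish up to rounding
off_diag = max(abs(D[m][n]) for m in range(N) for n in range(N) if m != n)

# The diagonal should coincide with the DFT of h
h_hat = [sum(h[n] * W[m][n] for n in range(N)) for m in range(N)]
diag_err = max(abs(D[m][m] - h_hat[m]) for m in range(N))

print(off_diag, diag_err)
```

Both printed quantities are of the order of floating-point rounding, so $D$ is diagonal to numerical precision and its diagonal matches the DFT of the first column of $A$.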
2.8.2. Circulant matrices

Thanks to the introduction of the concept of circulant matrix, we will be able to prove the fundamental theorem concerning the link between the Fourier transform and stationary operators.

First, let us generalize the periodicity of sequences in $\ell^2(\mathbb{Z}_N)$ to matrices: given a matrix $A = (a_{mn})_{m,n=0}^{N-1}$, we say that $A$ is an $N$-periodic matrix if:

$$a_{m+kN,n} = a_{m,n} \quad \text{and} \quad a_{m,n+kN} = a_{m,n}, \qquad \forall m, n, k \in \mathbb{Z}$$

EXAMPLE. $a_{0,2} = a_{N,2} = a_{N,N+2}$

DEFINITION 2.16.– Let $A = (a_{mn})_{m,n=0}^{N-1}$ be an $N \times N$ periodic matrix. $A$ is said to be circulant if:

$$a_{m+1,n+1} = a_{m,n}, \qquad \forall m, n \in \mathbb{Z}$$

Repeating the translation $k$ times, the definition is rewritten as:

$$a_{m+k,n+k} = a_{m,n}, \qquad \forall m, n, k \in \mathbb{Z}$$

We see that, since $k \in \mathbb{Z}$, a circulant periodic matrix can also be defined with the property $a_{m-k,n-k} = a_{m,n}$, $k \in \mathbb{Z}$. This definition is interpreted as follows. Row $m+1$ is obtained from row $m$ by shifting its entries cyclically one position to the right, and column $n+1$ is obtained from column $n$ by shifting its entries cyclically one position down, as follows:

$$\begin{pmatrix}
a_0 & a_1 & a_2 & \cdots & a_{N-1} \\
a_{N-1} & a_0 & a_1 & \cdots & a_{N-2} \\
a_{N-2} & a_{N-1} & a_0 & \cdots & a_{N-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_1 & a_2 & a_3 & \cdots & a_0
\end{pmatrix}$$

EXAMPLE OF A CIRCULANT MATRIX.–

$$A = \begin{pmatrix}
3 & 2+i & -1 & 4i \\
4i & 3 & 2+i & -1 \\
-1 & 4i & 3 & 2+i \\
2+i & -1 & 4i & 3
\end{pmatrix}$$
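Definition 2.16 translates directly into a test on the entries of a matrix. The short sketch below is only an illustration: the function names `circulant_from_first_row` and `is_circulant` are hypothetical helpers, and periodicity is implemented by reducing the indices modulo $N$. It rebuilds the $4 \times 4$ example from its first row and also tests a $3 \times 3$ matrix whose last row breaks the pattern.

```python
def circulant_from_first_row(row):
    """Build the N x N matrix with entries a[m][n] = row[(n - m) mod N]."""
    N = len(row)
    return [[row[(n - m) % N] for n in range(N)] for m in range(N)]

def is_circulant(A):
    """Check a[m+1][n+1] == a[m][n] for all m, n, with indices taken modulo N."""
    N = len(A)
    return all(A[(m + 1) % N][(n + 1) % N] == A[m][n]
               for m in range(N) for n in range(N))

# The 4 x 4 circulant example, generated from its first row
A = circulant_from_first_row([3, 2 + 1j, -1, 4j])
for row in A:
    print(row)

# A 3 x 3 matrix whose third row is not a cyclic shift of the second
B = [[2, 1j, 3],
     [3, 2, 1j],
     [1j, 2, 3]]

print(is_circulant(A), is_circulant(B))
```

Each row of `A` is the previous one shifted cyclically to the right, so the test succeeds for `A` and fails for `B`.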
EXAMPLE OF A NON-CIRCULANT MATRIX.–

$$B = \begin{pmatrix}
2 & i & 3 \\
3 & 2 & i \\
i & 2 & 3
\end{pmatrix}$$

For this matrix to be circulant, the third row would have to be $(i, 3, 2)$.

2.8.3. Exhaustive characterization of stationary operators

Theorem 2.9 is the most important result of this chapter. It is used to produce the eigenvalues of a stationary operator $T$ in a very simple manner; it can also be used to characterize $T$ as a convolution operator, in the original representation of $z$, and as a multiplier, in the frequency representation.

THEOREM 2.9.– Let $T : \ell^2(\mathbb{Z}_N) \to \ell^2(\mathbb{Z}_N)$ be an endomorphism. The following properties are equivalent.
1) $T$ is stationary.
2) The matrix $A$, which represents $T$ in the canonical basis of $\ell^2(\mathbb{Z}_N)$, is circulant.
3) $T$ is a convolution operator.
4) $T$ is a Fourier multiplier.
5) The matrix $D$, which represents $T$ in the orthogonal Fourier basis $F$, is diagonal.

Note that implication 1) $\implies$ 5) has already been proved. The theorem will be proved using the following strategy:

$$1) \implies 2) \implies 3) \implies 1) \qquad \text{and} \qquad 3) \iff 4) \qquad \text{and} \qquad 4) \iff 5)$$

The proof of this theorem is crucial, as it provides an explicit technique for finding the eigenvalues of $T$ and for constructing the convolution operator and Fourier multiplier which represent $T$.

PROOF.– 1) $\implies$ 2): let $A$ be the matrix associated with $T$ with respect to the canonical basis$^8$ $(e_n)_{n=0}^{N-1}$ of $\ell^2(\mathbb{Z}_N)$:

$$A = \begin{pmatrix}
a_{0,0} & a_{0,1} & \cdots & a_{0,N-1} \\
a_{1,0} & a_{1,1} & \cdots & a_{1,N-1} \\
\vdots & \vdots & \ddots & \vdots \\
a_{N-1,0} & a_{N-1,1} & \cdots & a_{N-1,N-1}
\end{pmatrix}$$

8 We recall that $e_n(m) = \delta_{n,m}$, $\forall n, m \in \mathbb{Z}_N$.
From the definition of the associated matrix, we have $a_{m,n} = (T e_n)(m)$, that is, the $n$-th column of $A$ is the vector $T e_n$. Using the fact that $T$ is stationary, we wish to show that:

$$a_{m+1,n+1} = a_{m,n} \iff (T e_{n+1})(m+1) = (T e_n)(m), \qquad \forall m, n \in \mathbb{Z}_N$$

We see that:

$$(R_1 e_n)(m) = e_n(m-1) = \begin{cases} 1 & \text{if } n = m-1 \iff m = n+1 \\ 0 & \text{if } n \neq m-1 \iff m \neq n+1 \end{cases} \;=\; e_{n+1}(m), \qquad \forall m \in \mathbb{Z}_N$$

thus $e_{n+1} = R_1 e_n$ and, consequently:

$$a_{m+1,n+1} = (T R_1 e_n)(m+1) \underset{(T \text{ stationary})}{=} R_1 (T e_n)(m+1) = (T e_n)(m+1-1) = (T e_n)(m) = a_{m,n}$$

Since $a_{m+1,n+1} = a_{m,n}$ $\forall m, n \in \mathbb{Z}_N$, $A$ is circulant and the implication 1) $\implies$ 2) is proved.

2) $\implies$ 3): let $A$ be a periodic circulant matrix, that is, $a_{m,n} = a_{m-k,n-k}$ $\forall n, m, k \in \mathbb{Z}$. We wish to prove the existence of $h \in \ell^2(\mathbb{Z}_N)$ such that $Az = z * h = T_h(z)$ $\forall z \in \ell^2(\mathbb{Z}_N)$. We shall prove that the sequence $h$ which we are looking for is the first column of $A$, that is:

$$h = T e_0 = \begin{pmatrix} a_{0,0} \\ \vdots \\ a_{N-1,0} \end{pmatrix}, \qquad h(m) = a_{m,0}, \quad \forall m \in \mathbb{Z}_N$$

We see that $h(m-n) = a_{m-n,0} = a_{m-n,n-n} \underset{(A \text{ circulant})}{=} a_{m,n}$, and thus, from the definition of the matrix-vector product, we have:

$$(Az)(m) = \sum_{n=0}^{N-1} a_{m,n} z(n) = \sum_{n=0}^{N-1} h(m-n) z(n) = (h * z)(m)$$

and implication 2) $\implies$ 3) is proved.

3) $\implies$ 1): we must prove that a convolution operator $T_w$ is stationary, that is:

$$(T_w \circ R_k)(z) = (R_k \circ T_w)(z), \qquad \forall z \in \ell^2(\mathbb{Z}_N),\ \forall k \in \mathbb{Z}$$
We begin by calculating the left side of the equation:

$$(T_w R_k z)(m) = (w * R_k z)(m) = \sum_{n=0}^{N-1} w(m-n) R_k z(n) = \sum_{n=0}^{N-1} w(m-n) z(n-k)$$

Making the index substitution $\ell = n - k \iff n = k + \ell$, the range of $\ell$ is:

$$\begin{cases} n = 0 \implies \ell = -k \\ \quad \vdots \\ n = N-1 \implies \ell = N-1-k \end{cases}$$

then:

$$(T_w R_k z)(m) = \sum_{\ell=-k}^{N-1-k} w(m-k-\ell) z(\ell) \underset{\text{Lemma 2.2}}{=} \sum_{\ell=0}^{N-1} w((m-k)-\ell) z(\ell) = (z * w)(m-k) = R_k (z * w)(m) = (R_k T_w z)(m)$$

and this proves the implication 3) $\implies$ 1).

3) $\iff$ 4): we must prove that a linear operator $T : \ell^2(\mathbb{Z}_N) \to \ell^2(\mathbb{Z}_N)$ is a convolution operator if and only if $T$ is a Fourier multiplier. Taking an arbitrary fixed element $w \in \ell^2(\mathbb{Z}_N)$, we have:

$$T_w(z) = z * w = \mathrm{IDFT}(\mathrm{DFT}(z * w)) \underset{[2.36]}{=} \mathrm{IDFT}(\hat z \cdot \hat w) = \mathrm{IDFT}(\hat w \cdot \hat z) = (\mathrm{IDFT} \circ M_{\hat w} \circ \mathrm{DFT})(z) = T_{(\hat w)}(z), \qquad \forall w, z \in \ell^2(\mathbb{Z}_N)$$

where $M_{\hat w}$ is the multiplication operator by the sequence $\hat w$. Conversely:

$$T_{(w)}(z) = (\mathrm{IDFT} \circ M_w \circ \mathrm{DFT})(z) = \mathrm{IDFT}(w \cdot \hat z) \underset{\text{equation [2.38]}}{=} \check w * \check{\hat z} = \check w * z = T_{\check w}(z), \qquad \forall w, z \in \ell^2(\mathbb{Z}_N)$$

This shows us that the convolution operator with $w$ can be interpreted as the Fourier multiplier by $\hat w$ and, vice versa, that the Fourier multiplier by $w$ can be interpreted as the convolution operator with $\check w$:

$$T_w = T_{(\hat w)}, \qquad T_{(w)} = T_{\check w}, \qquad \forall w \in \ell^2(\mathbb{Z}_N)$$

The double implication 3) $\iff$ 4) is thus proved.

Before continuing on to the final stage in our proof, let us summarize our findings. A stationary operator $T : \ell^2(\mathbb{Z}_N) \to \ell^2(\mathbb{Z}_N)$ is represented by a circulant matrix $A$ with respect to the canonical basis $(e_0, \dots, e_{N-1})$ of $\ell^2(\mathbb{Z}_N)$.
This matrix $A$ can be represented by the convolution operator $T_h$ with $h = T e_0$, the first column of $A$ or, as we have just seen, by the Fourier multiplier $T_{(\hat h)}$, where $\hat h$ is the sequence of Fourier coefficients of $h$.

4) $\iff$ 5): we must prove that $T$ is a Fourier multiplier $T_{(w)}$ if and only if the matrix associated with $T$ with respect to the orthogonal Fourier basis $F$ is diagonal. The direct implication has already been proved in formula [2.27], so we simply need to prove the implication 5) $\implies$ 4). Stating that $D = \mathrm{diag}(d_{n,n})$, $n = 0, \dots, N-1$, is the diagonal matrix which represents an operator $T$ in the Fourier basis $F$ means that:

$$[T(z)]_F = D [z]_F \iff \mathrm{DFT} \circ T(z) = M_w \circ \mathrm{DFT}(z)$$

with $M_w$ the multiplication operator by the sequence $w(n) = d_{n,n}$, $n = 0, \dots, N-1$. Applying the IDFT to both sides:

$$T(z) = \mathrm{IDFT} \circ M_w \circ \mathrm{DFT}(z), \qquad \forall z \in \ell^2(\mathbb{Z}_N)$$

hence $T = T_{(w)}$, proving the implication 5) $\implies$ 4). The proof of the theorem is now complete. □

The theorem demonstrated above provides a standard technique for studying stationary operators $T$ over $\ell^2(\mathbb{Z}_N)$. We recall that the sequence:

$$\delta \in \ell^2(\mathbb{Z}_N), \qquad \delta(n) = e_0(n) = \delta_{0,n} = \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{if } n \neq 0 \end{cases} \qquad \forall n \in \mathbb{Z}_N$$

is the unit pulse; thus, the operator $T$ is completely determined by its action on $\delta$, $h = T\delta$, which is referred to as the unit pulse response. $\hat h$, the DFT of the unit pulse response, is called the transfer function.

The properties demonstrated in Theorems 2.8 and 2.9 can be used to summarize the analysis of stationary operators, as shown in Box 2.2.

– $T$ is the stationary operator of $\ell^2(\mathbb{Z}_N)$.
– $A$ is the circulant matrix associated with $T$ with respect to the canonical basis of $\ell^2(\mathbb{Z}_N)$.
– $h$ is the unit pulse response of $T$: $h = T\delta$ = first column of $A$.
– $T_h$ is the convolution operator with $h$: $Tz = T_h z = h * z = z * h$.
– $T_{(\hat h)}$ is the Fourier multiplier by $\hat h$, the transfer function: $Tz = T_{(\hat h)} z = \mathrm{IDFT}(\hat h \cdot \hat z)$.
– Given $h = T\delta$, we obtain the Fourier pair in Table 2.4.

| Original representation | Fourier space |
| $h * z$ | $\hat h \cdot \hat z$ |

Table 2.4. Fourier pair for the convolution between a signal $z$ and the unit pulse response $h$ of $T$

– $D$ is the diagonal matrix which represents $T$ in the orthogonal Fourier basis $F$ of $\ell^2(\mathbb{Z}_N)$:

$$D = \frac{1}{N} W_N A \overline{W_N} = \mathrm{diag}(\hat h(0), \dots, \hat h(N-1))$$

– The eigenvalues of $T$ (the spectrum, in the linear algebra sense) are the components of the transfer function, that is, the Fourier coefficients of the unit pulse response:

$$\text{Eigenvalues of } T: \ \{\hat h(0), \dots, \hat h(N-1)\}$$

Box 2.2. Analysis of stationary operators over $\ell^2(\mathbb{Z}_N)$
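The recipe of Box 2.2 can be followed step by step on a concrete operator. In the sketch below, which is an illustration and not taken from the text, the operator `T` (a fixed combination of three shifts), the test signal `z` and the helper names are all assumptions; the DFT is unnormalized and the IDFT carries the factor $1/N$. The code computes the unit pulse response $h = T\delta$ and the transfer function $\hat h$, then evaluates $Tz$ in three ways: directly, as the convolution $h * z$, and as the Fourier multiplier $\mathrm{IDFT}(\hat h \cdot \hat z)$.

```python
import cmath

def dft(z):
    """Unnormalized DFT: z_hat(m) = sum_n z(n) exp(-2 pi i m n / N)."""
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

def idft(z_hat):
    """Inverse DFT: z(n) = (1/N) sum_m z_hat(m) exp(2 pi i m n / N)."""
    N = len(z_hat)
    return [sum(z_hat[m] * cmath.exp(2j * cmath.pi * m * n / N) for m in range(N)) / N
            for n in range(N)]

def T(z):
    """Illustrative stationary operator: (T z)(n) = z(n) + 2 z(n - 1) - z(n + 1)."""
    N = len(z)
    return [z[n] + 2 * z[(n - 1) % N] - z[(n + 1) % N] for n in range(N)]

N = 6
delta = [1] + [0] * (N - 1)
h = T(delta)          # unit pulse response (first column of the circulant matrix)
h_hat = dft(h)        # transfer function (eigenvalues of T)

z = [1, -2, 3j, 0, 0.5, 1 + 1j]   # arbitrary test signal
z_hat = dft(z)

direct = T(z)
via_convolution = [sum(h[(n - k) % N] * z[k] for k in range(N)) for n in range(N)]
via_multiplier = idft([h_hat[m] * z_hat[m] for m in range(N)])

err_conv = max(abs(a - b) for a, b in zip(direct, via_convolution))
err_mult = max(abs(a - b) for a, b in zip(direct, via_multiplier))
print(h, err_conv, err_mult)
```

The three evaluations agree up to rounding, which is the content of the equivalences 1), 3) and 4) of Theorem 2.9 for this example.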
2.8.4. High-pass, low-pass and band-pass filters

The synthesis formula for any given signal $z \in \ell^2(\mathbb{Z}_N)$ transformed via the action of a stationary operator $T \in \mathrm{End}(\ell^2(\mathbb{Z}_N))$ is:

$$Tz(n) = \sum_{m=0}^{N-1} \widehat{Tz}(m) F_m(n), \qquad n \in \mathbb{Z}_N \qquad [2.44]$$

where $F_m$ is the vector with index $m$ of the orthogonal Fourier basis of $\ell^2(\mathbb{Z}_N)$. Thus, $|\widehat{Tz}(m)|$ represents the importance of the harmonic of frequency $m$ in the reconstruction of $Tz$, and $\{|\widehat{Tz}(m)|,\ m \in \mathbb{Z}_N\}$ represents the spectrum of the transformed signal $Tz$.

To understand how the spectrum of $Tz$ is linked to that of the original signal $z$, let us apply the DFT to both sides of the formula $Tz = T_{(\hat h)} z = \mathrm{IDFT}(\hat h \cdot \hat z)$, where $h = T\delta$:

$$\mathrm{DFT}(Tz) = \mathrm{DFT} \circ \mathrm{IDFT}(\hat h \cdot \hat z) = \hat h \cdot \hat z$$
that is:

$$\widehat{Tz}(m) = \hat h(m) \cdot \hat z(m), \qquad \forall m \in \mathbb{Z}_N$$

so the Fourier coefficients of $Tz$, the sequence transformed by the operator $T$, are given by the product of the Fourier coefficients of the original sequence $z$ and the Fourier coefficients of the unit pulse response $h$. Consequently, the spectrum of the transformed sequence $Tz$ is:

$$\left\{ \left|\widehat{Tz}(m)\right| = |\hat h(m)| \cdot |\hat z(m)|,\ m \in \mathbb{Z}_N \right\} \qquad [2.45]$$

This allows us to understand the action of stationary filters $T$ on the frequency content of a signal $z$:
– if $\hat h(0) = 0$, the average of $Tz$ is zero, since $|\widehat{Tz}(0)| = 0 \cdot |\hat z(0)| = 0$ and we know that $|\widehat{Tz}(0)|$ is proportional to the magnitude of the average of $Tz$;
– if $\hat h(0) = 1$, then $T$ preserves the average of $z$, that is, $\langle Tz \rangle = \langle z \rangle$ (if only $|\hat h(0)| = 1$, the magnitude of the average is preserved);
– if $|\hat h(m)| > 1$ for $m \simeq 0$ and $m \simeq N-1$, and $|\hat h(m)| \in [0, 1)$ for $m \simeq N/2$, then $T$ increases the low frequencies and reduces the high frequencies (low-pass filter);
– if $|\hat h(m)| > 1$ for $m \simeq N/2$ and $|\hat h(m)| \in [0, 1)$ for $m \simeq 0$ and $m \simeq N-1$, then $T$ increases the high frequencies and reduces the low frequencies (high-pass filter);
– if $|\hat h(m)| > 1$ for intermediate values of $m$, then $T$ increases the mid-range frequencies (band-pass filter).
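Formula [2.45] can be observed on a simple filter. The example below is an assumption introduced for illustration and does not come from the text: a three-point circular moving average, $(Tz)(n) = \frac{1}{3}\,[z(n-1) + z(n) + z(n+1)]$, with $N = 8$. Its gains satisfy $|\hat h(m)| \le 1$, so it does not amplify anything; it leaves $m = 0$ untouched and attenuates the frequencies near $N/2$, which is low-pass behavior in a slightly looser sense than the bullet above.

```python
import cmath

def dft(z):
    """Unnormalized DFT: z_hat(m) = sum_n z(n) exp(-2 pi i m n / N)."""
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

def conv(h, z):
    """Circular convolution: (h * z)(n) = sum_k h(k) z(n - k)."""
    N = len(z)
    return [sum(h[k] * z[(n - k) % N] for k in range(N)) for n in range(N)]

N = 8
# Unit pulse response of the moving average: h(0) = h(1) = h(N - 1) = 1/3
h = [1 / 3, 1 / 3] + [0] * (N - 3) + [1 / 3]
gain = [abs(v) for v in dft(h)]          # |h_hat(m)|, m = 0, ..., N - 1

z = [4, 1, -2, 3, 0, 5, -1, 2]           # arbitrary real test signal
Tz = conv(h, z)

# Spectrum identity [2.45]: |DFT(Tz)(m)| = |h_hat(m)| * |z_hat(m)|
z_hat, Tz_hat = dft(z), dft(Tz)
spectrum_err = max(abs(abs(Tz_hat[m]) - gain[m] * abs(z_hat[m])) for m in range(N))

# h_hat(0) = 1, so the average is preserved
mean_z, mean_Tz = sum(z) / N, sum(Tz) / N

print([round(g, 4) for g in gain])
print(spectrum_err, mean_z, mean_Tz)
```

The printed gains decrease from 1 at $m = 0$ to $1/3$ at $m = N/2$ and are symmetric about $N/2$, while the averages of $z$ and $Tz$ coincide.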
2.8.5. Characterizing stationary operators using shift operators

We now have all of the results we need to demonstrate the characterization of a stationary operator as a linear combination of shift operators or, in an equivalent manner, as a polynomial of the shift operator $R_1$, since $R_k = R_1 \circ \cdots \circ R_1$ ($k$ times), that is, $R_k = R_1^k$, $\forall k \in \mathbb{Z}$.

THEOREM 2.10.– $T \in \mathrm{End}(\ell^2(\mathbb{Z}_N))$ is stationary if and only if the expression of $T$ is:

$$(Tz)(n) = \sum_{k=0}^{N-1} a_k z(n-k) = \sum_{k=0}^{N-1} a_k R_k z(n) = \sum_{k=0}^{N-1} a_k (R_1)^k z(n) \qquad [2.46]$$

$\forall n \in \{0, \dots, N-1\}$, where $a_k \in \mathbb{C}$.
PROOF.– $\implies$: let $T$ be stationary. We know that $T = T_h$, where $T_h$ is the convolution operator with regard to the unit pulse response $h = T\delta$, that is:

$$(Tz)(n) = \sum_{k=0}^{N-1} h(k) z(n-k).$$

We must therefore simply identify the coefficients $a_k$ of the formula $(Tz)(n) = \sum_{k=0}^{N-1} a_k z(n-k)$ with $h(k)$ to obtain the claim.

$\impliedby$: we can verify that $T$, written in the form used in formula [2.46], is stationary due to the linearity of $T$ and $R_k$. We know that $\forall n \in \{0, \dots, N-1\}$:

$$(T R_m z)(n) = \sum_{k=0}^{N-1} a_k (R_m z)(n-k) = \sum_{k=0}^{N-1} a_k z(n-k-m) \underset{\text{(linearity of } R_m)}{=} R_m \left( \sum_{k=0}^{N-1} a_k z(n-k) \right) = (R_m T z)(n)$$

hence: $T \circ R_m = R_m \circ T$ $\forall m \in \mathbb{Z}$. □

Since $h(k) = T\delta(k)$, the proof of the theorem above also proves the validity of the formula:

$$(Tz)(n) = \sum_{k=0}^{N-1} T\delta(k)\, z(n-k), \qquad \forall\, T \text{ stationary}$$
2.8.6. Frequency analysis of first and second derivation operators (discrete case)

In this section, we shall analyze two stationary operators which represent the discrete version of the first and second derivatives. By comparing their eigenvalues, we see that the second derivation operator is more efficient for amplifying high frequencies in digital signals.

DEFINITION 2.17.– Given a sequence $z \in \ell^2(\mathbb{Z}_N)$, we define:

$$T_1 z(n) = z(n+1) - z(n) \qquad \text{(discrete first derivative)}$$

$$T_2 z(n) = z(n+1) - 2z(n) + z(n-1) \qquad \text{(discrete second derivative)}$$
The discrete first derivative is simply the forward difference of $z$, divided by the difference of the values of $n$; since $(n+1) - n = 1$, there is no need to write the denominator. The discrete second derivative is the backward difference of the discrete first derivative of $z$, divided by the difference of the values of $n$, which once again is 1 and so does not need to be written:

$$T_2 z(n) = T_1 z(n) - T_1 z(n-1) = z(n+1) - z(n) - [z(n) - z(n-1)] = z(n+1) - 2z(n) + z(n-1).$$

Let us begin by analyzing $T_1$. To calculate the pulse response, $T_1$ is applied to the unit pulse $\delta = e_0$ (the rows correspond to $n = 0, 1, \dots, N-1$):

$$h = T_1 \delta = \begin{pmatrix} e_0(1) - e_0(0) \\ e_0(2) - e_0(1) \\ \vdots \\ e_0(N-1+1) - e_0(N-1) \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}$$

using the fact that $e_0(0) = e_0(N) = 1$. The matrix which represents $T_1$ in the canonical basis of $\ell^2(\mathbb{Z}_N)$ is:

$$A_{T_1} = \begin{pmatrix}
-1 & 1 & 0 & \cdots & 0 \\
0 & -1 & 1 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 1 \\
1 & 0 & \cdots & 0 & -1
\end{pmatrix}$$

Now, let us calculate the DFT of $h$. For all $m \in \mathbb{Z}_N$, this is:

$$\hat h(m) = \sum_{n=0}^{N-1} h(n) e^{-2\pi i \frac{mn}{N}} = -1 \cdot e^{-2\pi i \frac{m \cdot 0}{N}} + 0 + \dots + 1 \cdot e^{-2\pi i \frac{m(N-1)}{N}} = -1 + e^{2\pi i \frac{m}{N}} e^{-2\pi i \frac{mN}{N}} = e^{2\pi i \frac{m}{N}} - 1$$

so the eigenvalues of $T_1$ are $\{\hat h(m) = e^{2\pi i \frac{m}{N}} - 1,\ m = 0, 1, \dots, N-1\}$ and its diagonal representation is:

$$D = \mathrm{diag}\left(0,\ e^{2\pi i \frac{1}{N}} - 1,\ e^{2\pi i \frac{2}{N}} - 1,\ \dots,\ e^{2\pi i \frac{N-1}{N}} - 1\right)$$

The action of $T_1$ in terms of frequency can now be interpreted using formula [2.45]. We wish to calculate the magnitudes of the eigenvalues $(\hat h(m))_{m \in \mathbb{Z}_N}$. We see that:

$$e^{2\pi i \frac{m}{N}} - 1 = e^{\pi i \frac{m}{N}} \left( e^{\pi i \frac{m}{N}} - e^{-\pi i \frac{m}{N}} \right) = e^{\pi i \frac{m}{N}}\, 2i \sin\left(\pi \frac{m}{N}\right)$$

Thus, $|\hat h(m)| = \left| e^{\pi i \frac{m}{N}} \right| \cdot \left| 2i \sin\left(\pi \frac{m}{N}\right) \right| = 2 \left| \sin\left(\pi \frac{m}{N}\right) \right|$. Since $m \in \mathbb{Z}_N$, we have $0 \le \frac{m}{N} < 1$, so the sine is always non-negative and the absolute value can be eliminated. To summarize:

$$|\hat h(m)| = 2 \sin\left(\pi \frac{m}{N}\right), \qquad m \in \mathbb{Z}_N$$

Specifically:
– $|\hat h(0)| = 0$: hence, the filtered signal $T_1 z$ averages to zero;
– $|\hat h(\frac{N}{2})| = 2$;
– $|\hat h(m)| < 2$ $\forall m \neq \frac{N}{2}$;
– $|\hat h(m)| \to 0$ if $m \to 0$ or $m \to N-1$;
– the action of the operator is symmetrical with regard to $\frac{N}{2}$.

Since $m = N/2$ represents the highest frequency of the signal and $m = 0$ and $m = N-1$ represent the lowest frequencies, we can deduce that $T_1$ reduces the low frequencies of $z$ and increases the high frequencies by up to two times. Thus, the discrete first derivative operator is a high-pass filter.

Now, let us analyze $T_2$. Its pulse response is given by the vector:

$$h = T_2 \delta = \begin{pmatrix} e_0(1) - 2e_0(0) + e_0(-1) \\ e_0(2) - 2e_0(1) + e_0(0) \\ \vdots \\ e_0(N) - 2e_0(N-1) + e_0(N-2) \end{pmatrix} = \begin{pmatrix} -2 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}$$

The matrix associated with $T_2$ in the canonical basis of $\ell^2(\mathbb{Z}_N)$ is:

$$A_{T_2} = \begin{pmatrix}
-2 & 1 & 0 & \cdots & 1 \\
1 & -2 & 1 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -2 & 1 \\
1 & 0 & \cdots & 1 & -2
\end{pmatrix}$$
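Next, we calculate the DFT of $h$:

$$\hat h(m) = \sum_{n=0}^{N-1} h(n) e^{-2\pi i \frac{mn}{N}} = -2 \cdot e^{-2\pi i \frac{m \cdot 0}{N}} + 1 \cdot e^{-2\pi i \frac{m}{N}} + 0 + \dots + 1 \cdot e^{-2\pi i \frac{m(N-1)}{N}}$$

$$= -2 + e^{-2\pi i \frac{m}{N}} + e^{-2\pi i m} e^{2\pi i \frac{m}{N}} = -2 + e^{2\pi i \frac{m}{N}} + e^{-2\pi i \frac{m}{N}} = -2 + 2 \cdot \frac{e^{2\pi i \frac{m}{N}} + e^{-2\pi i \frac{m}{N}}}{2} = -2 + 2 \cos\left(2\pi \frac{m}{N}\right)$$

These values of $\hat h(m)$ must now be compared with those of the first derivative operator. We do this by rewriting $\hat h(m) = -4 \left[ \frac{1}{2} - \frac{1}{2} \cos\left(2\pi \frac{m}{N}\right) \right]$ and using the trigonometric identity $\sin^2(\alpha) = \frac{1}{2} - \frac{1}{2} \cos(2\alpha)$ with $\alpha = \pi \frac{m}{N}$ to obtain $\hat h(m) = -4 \sin^2\left(\pi \frac{m}{N}\right)$.

The eigenvalues of $T_2$ are thus $\hat h(m) = -4 \sin^2\left(\pi \frac{m}{N}\right)$, $m = 0, 1, \dots, N-1$, and its diagonal representation is:

$$D = \mathrm{diag}\left(0,\ -4 \sin^2\left(\frac{\pi}{N}\right),\ -4 \sin^2\left(\frac{2\pi}{N}\right),\ \dots,\ -4 \sin^2\left(\frac{(N-1)\pi}{N}\right)\right)$$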
Figure 2.4. Difference between the sine functions representing the spectrum values of the first and second derivative operators between 0 and π. For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip
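The closed forms derived above for the transfer functions of $T_1$ and $T_2$ can be compared with a direct evaluation of the DFT of their unit pulse responses. The sketch below assumes $N = 8$ and the unnormalized DFT; the variable names are illustrative. It records the largest deviation between each computed eigenvalue and the corresponding formula, $e^{2\pi i m/N} - 1$ for $T_1$ and $-4\sin^2(\pi m/N)$ for $T_2$, and between $|\hat h_2(m)|$ and $|\hat h_1(m)|^2$.

```python
import cmath
import math

def dft(z):
    """Unnormalized DFT: z_hat(m) = sum_n z(n) exp(-2 pi i m n / N)."""
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

N = 8
h1 = [-1] + [0] * (N - 2) + [1]        # T1 delta = (-1, 0, ..., 0, 1)
h2 = [-2, 1] + [0] * (N - 3) + [1]     # T2 delta = (-2, 1, 0, ..., 0, 1)
h1_hat, h2_hat = dft(h1), dft(h2)

sines = [math.sin(math.pi * m / N) for m in range(N)]

# Deviation from the closed forms of the eigenvalues
err_T1 = max(abs(h1_hat[m] - (cmath.exp(2j * cmath.pi * m / N) - 1)) for m in range(N))
err_T1_mag = max(abs(abs(h1_hat[m]) - 2 * sines[m]) for m in range(N))
err_T2 = max(abs(h2_hat[m] - (-4 * sines[m] ** 2)) for m in range(N))

# The magnitudes for T2 are the squares of those for T1
err_square = max(abs(abs(h2_hat[m]) - abs(h1_hat[m]) ** 2) for m in range(N))

print(err_T1, err_T1_mag, err_T2, err_square)
print([round(abs(v), 4) for v in h1_hat])
print([round(abs(v), 4) for v in h2_hat])
```

All four deviations are at the level of rounding error, and the two lists of magnitudes peak at $m = N/2$ with the values 2 and 4, respectively.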
The effect of $T_2$ on the frequency is defined by the magnitudes of its eigenvalues:

$$|\hat h(m)| = 4 \sin^2\left(\pi \frac{m}{N}\right), \qquad m \in \mathbb{Z}_N$$

We see that the magnitudes of the eigenvalues of the second derivative operator are the squares of those of the first derivative operator. Hence:
– $|\hat h(0)| = 0$: thus, as in the case of the first derivative, the filtered signal $T_2 z$ averages to zero;
– $|\hat h(\frac{N}{2})| = 4$;
– $|\hat h(m)| < 4$ $\forall m \neq \frac{N}{2}$;
– $|\hat h(m)| \to 0$ if $m \to 0$ or $m \to N-1$ and the convergence to zero is faster than for the first derivative operator, as in this case, the sine is squared, as illustrated in Figure 2.4;
– the action of the operator is symmetrical about $\frac{N}{2}$.

Thus, the discrete second derivative operator is also a high-pass filter, amplifying high frequencies and reducing low frequencies in a way which is the square of the action of the discrete first derivative operator.

2.9. The two-dimensional discrete Fourier transform (2D DFT)

The Fourier transform considered up to now applies to signals $z(n)$ which depend on only one parameter $n$. In practical contexts, signals are often very large and depend on multiple parameters. One classic example is that of digital images, which include two parameters: the two spatial coordinates of a pixel, as shown in Figure 2.5.

DFT theory can be generalized for signals which depend on any (finite) number of parameters. For simplicity's sake, we shall focus on the two-dimensional (2D) case, with parameters $n_1, n_2$.

The first step is to introduce the domain vector space: if $N_1, N_2 \in \mathbb{N}$, we define:

$$\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}) = \{ z : \mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2} \longrightarrow \mathbb{C} \}$$

$z \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ is a complex sequence which depends on two parameters:

$$\begin{cases} n_1 \in \{0, 1, \dots, N_1 - 1\} \\ n_2 \in \{0, 1, \dots, N_2 - 1\} \end{cases}$$
Figure 2.5. The two coordinates of a pixel, $n_1$, $n_2$, in a digital image (image source: author). For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip

$\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ is a vector space of dimension $N_1 \cdot N_2$. Addition and multiplication by a complex scalar are defined as in the 1D case, and so is the inner product:

$$\langle z, w \rangle = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} z(n_1, n_2)\, \overline{w(n_1, n_2)}, \qquad \forall z, w \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$$

To extend DFT theory from one to two dimensions, we use the procedure for generating bases in $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ from bases in $\ell^2(\mathbb{Z}_{N_1})$ and $\ell^2(\mathbb{Z}_{N_2})$.

THEOREM 2.11.– Let $\{B_0, B_1, \dots, B_{N_1-1}\}$ and $\{C_0, C_1, \dots, C_{N_2-1}\}$ be orthonormal bases in $\ell^2(\mathbb{Z}_{N_1})$ and $\ell^2(\mathbb{Z}_{N_2})$, respectively. For all $m_1 \in \{0, \dots, N_1-1\}$ and $m_2 \in \{0, \dots, N_2-1\}$, consider the sequences in $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ given by:

$$D_{m_1,m_2}(n_1, n_2) = B_{m_1}(n_1) \cdot C_{m_2}(n_2)$$

Then, the family $(D_{m_1,m_2})$ is an orthonormal basis of $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$, known as the tensor product basis of the two original bases.

PROOF.– The sequences $D_{m_1,m_2}$, $m_1 \in \{0, \dots, N_1-1\}$ and $m_2 \in \{0, \dots, N_2-1\}$, are $N_1 \cdot N_2$ elements of $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$, which is of dimension $N_1 \cdot N_2$. Proof that these constitute an orthonormal basis can be obtained by showing that:

$$\langle D_{m_1,m_2}, D_{k_1,k_2} \rangle = \delta_{(m_1,m_2),(k_1,k_2)} = \delta_{m_1,k_1} \delta_{m_2,k_2} = \begin{cases} 1 & \text{if } (m_1, m_2) = (k_1, k_2) \\ 0 & \text{if } (m_1, m_2) \neq (k_1, k_2) \end{cases}$$

Indeed:

$$\langle D_{m_1,m_2}, D_{k_1,k_2} \rangle \underset{\text{def. of } \langle\,,\rangle}{=} \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} D_{m_1,m_2}(n_1, n_2)\, \overline{D_{k_1,k_2}(n_1, n_2)} \underset{\text{def. of } D}{=} \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} B_{m_1}(n_1) C_{m_2}(n_2)\, \overline{B_{k_1}(n_1)}\, \overline{C_{k_2}(n_2)}$$

$$= \sum_{n_1=0}^{N_1-1} B_{m_1}(n_1) \overline{B_{k_1}(n_1)} \sum_{n_2=0}^{N_2-1} C_{m_2}(n_2) \overline{C_{k_2}(n_2)} = \underbrace{\langle B_{m_1}, B_{k_1} \rangle}_{\delta_{m_1,k_1}}\, \underbrace{\langle C_{m_2}, C_{k_2} \rangle}_{\delta_{m_2,k_2}} = \delta_{(m_1,m_2),(k_1,k_2)}. \qquad \square$$
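Theorem 2.11 can be illustrated by computing the Gram matrix of a tensor product basis. In the sketch below, the choice of bases is an assumption made for the example: both factors are orthonormal Fourier bases, $E_m(n) = e^{2\pi i mn/N}/\sqrt{N}$, with $N_1 = 2$ and $N_2 = 3$, so that the complex conjugation in the inner product actually matters. The helper names are hypothetical.

```python
import cmath
import math

def fourier_onb(N):
    """Orthonormal Fourier basis of l2(Z_N): E_m(n) = exp(2 pi i m n / N) / sqrt(N)."""
    return [[cmath.exp(2j * cmath.pi * m * n / N) / math.sqrt(N) for n in range(N)]
            for m in range(N)]

def tensor(b, c):
    """Tensor product sequence: D(n1, n2) = b(n1) * c(n2), stored as an N1 x N2 array."""
    return [[b[n1] * c[n2] for n2 in range(len(c))] for n1 in range(len(b))]

def inner(z, w):
    """Inner product <z, w> = sum over (n1, n2) of z(n1, n2) * conj(w(n1, n2))."""
    return sum(z[i][j] * w[i][j].conjugate()
               for i in range(len(z)) for j in range(len(z[0])))

N1, N2 = 2, 3
B, C = fourier_onb(N1), fourier_onb(N2)

# Tensor product basis indexed by the pairs (m1, m2)
D = {(m1, m2): tensor(B[m1], C[m2]) for m1 in range(N1) for m2 in range(N2)}

# Largest deviation of <D_p, D_q> from the Kronecker delta
gram_err = max(abs(inner(D[p], D[q]) - (1 if p == q else 0)) for p in D for q in D)
print(len(D), gram_err)
```

The family has $N_1 N_2 = 6$ elements and its Gram matrix equals the identity up to rounding, as the theorem states.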
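For $m_1 \in \{0, 1, \dots, N_1-1\}$ and $m_2 \in \{0, 1, \dots, N_2-1\}$, this theorem has the following corollaries:
– the canonical orthonormal basis of $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ is:

$$B = \left\{ e_{m_1,m_2}(n_1, n_2) = \begin{cases} 1 & \text{if } (n_1, n_2) = (m_1, m_2) \\ 0 & \text{if } (n_1, n_2) \neq (m_1, m_2) \end{cases} \right\}$$

– the orthogonal Fourier basis of $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ is:

$$F_{m_1,m_2}(n_1, n_2) = \frac{1}{N_1} e^{2\pi i \frac{m_1 n_1}{N_1}} \cdot \frac{1}{N_2} e^{2\pi i \frac{m_2 n_2}{N_2}} = \frac{1}{N_1 N_2} e^{2\pi i \left( \frac{m_1 n_1}{N_1} + \frac{m_2 n_2}{N_2} \right)}$$

– the orthonormal Fourier basis of $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ is:

$$E_{m_1,m_2}(n_1, n_2) = \sqrt{N_1 N_2}\, F_{m_1,m_2}(n_1, n_2)$$

– the orthogonal basis of the complex exponentials in $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ is:

$$\mathcal{E}_{m_1,m_2}(n_1, n_2) = N_1 N_2\, F_{m_1,m_2}(n_1, n_2)$$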
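Using the theory of complex inner product spaces, the definition of Fourier coefficients, the DFT and the IDFT can be generalized to $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$. Taking $z \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ and the complex exponential $\mathcal{E}_{m_1,m_2}(n_1, n_2) = e^{2\pi i \frac{m_1 n_1}{N_1}} e^{2\pi i \frac{m_2 n_2}{N_2}}$, we have:

$$\langle z, \mathcal{E}_{m_1,m_2} \rangle = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} z(n_1, n_2)\, \overline{e^{2\pi i \frac{n_1 m_1}{N_1}} e^{2\pi i \frac{n_2 m_2}{N_2}}} = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} z(n_1, n_2)\, e^{-2\pi i \frac{m_1 n_1}{N_1}} e^{-2\pi i \frac{m_2 n_2}{N_2}} = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} z(n_1, n_2)\, e^{-2\pi i \left( \frac{m_1 n_1}{N_1} + \frac{m_2 n_2}{N_2} \right)}$$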
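thus the Fourier coefficients of $z \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ are defined as follows:

$$\hat z(m_1, m_2) = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} z(n_1, n_2)\, e^{-2\pi i \left( \frac{m_1 n_1}{N_1} + \frac{m_2 n_2}{N_2} \right)}$$

As in the 1D case: $\hat z(0, 0) = N_1 N_2 \langle z \rangle$, where $\langle z \rangle$ is the average of $z$. Note that the quantity $N_1 N_2$ may be extremely large.

The synthesis formula can also be generalized to the 2D case, as follows:

$$z(n_1, n_2) = \frac{1}{N_1 N_2} \sum_{m_1=0}^{N_1-1} \sum_{m_2=0}^{N_2-1} \hat z(m_1, m_2)\, e^{2\pi i \left( \frac{m_1 n_1}{N_1} + \frac{m_2 n_2}{N_2} \right)}$$

The 2D DFT and 2D IDFT operators can therefore be written using the following formulas:

$$\hat{\ } : \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}) \longrightarrow \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}), \qquad z \longmapsto \hat z, \quad \hat z(m_1, m_2) = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} z(n_1, n_2)\, e^{-2\pi i \left( \frac{m_1 n_1}{N_1} + \frac{m_2 n_2}{N_2} \right)}$$

and:

$$\check{\ } : \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}) \longrightarrow \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}), \qquad z \longmapsto \check z, \quad \check z(n_1, n_2) = \frac{1}{N_1 N_2} \sum_{m_1=0}^{N_1-1} \sum_{m_2=0}^{N_2-1} z(m_1, m_2)\, e^{2\pi i \left( \frac{m_1 n_1}{N_1} + \frac{m_2 n_2}{N_2} \right)}$$

Clearly, if the dimension is increased from 2 to $2 < d < +\infty$, these formulas can be generalized in the following manner:

$$\hat z(m_1, \dots, m_d) = \sum_{n_1=0}^{N_1-1} \cdots \sum_{n_d=0}^{N_d-1} z(n_1, \dots, n_d)\, e^{-2\pi i \sum_{k=1}^{d} \frac{m_k n_k}{N_k}}$$

$$\check z(n_1, \dots, n_d) = \left( \prod_{k=1}^{d} N_k \right)^{-1} \sum_{m_1=0}^{N_1-1} \cdots \sum_{m_d=0}^{N_d-1} z(m_1, \dots, m_d)\, e^{2\pi i \sum_{k=1}^{d} \frac{m_k n_k}{N_k}}$$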
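The 2D analysis and synthesis formulas can be transcribed almost literally as double sums. The following sketch is a direct, unoptimized implementation written for illustration (the function names `dft2` and `idft2` and the $3 \times 4$ test array are assumptions); its cost grows like $(N_1 N_2)^2$, so it is only meant for small arrays.

```python
import cmath

def dft2(z):
    """2D DFT: z_hat(m1, m2) = sum z(n1, n2) exp(-2 pi i (m1 n1 / N1 + m2 n2 / N2))."""
    N1, N2 = len(z), len(z[0])
    return [[sum(z[n1][n2] * cmath.exp(-2j * cmath.pi * (m1 * n1 / N1 + m2 * n2 / N2))
                 for n1 in range(N1) for n2 in range(N2))
             for m2 in range(N2)] for m1 in range(N1)]

def idft2(z_hat):
    """2D IDFT: same double sum with the opposite sign and the factor 1 / (N1 N2)."""
    N1, N2 = len(z_hat), len(z_hat[0])
    return [[sum(z_hat[m1][m2] * cmath.exp(2j * cmath.pi * (m1 * n1 / N1 + m2 * n2 / N2))
                 for m1 in range(N1) for m2 in range(N2)) / (N1 * N2)
             for n2 in range(N2)] for n1 in range(N1)]

# Arbitrary 3 x 4 test signal (N1 = 3 rows, N2 = 4 columns)
z = [[1, 2, 0, -1],
     [0.5j, 3, 1, 1],
     [2, -2, 1j, 0]]
N1, N2 = len(z), len(z[0])

z_hat = dft2(z)
z_back = idft2(z_hat)

# Synthesis recovers the signal, and z_hat(0, 0) = N1 * N2 * average of z
roundtrip_err = max(abs(z_back[i][j] - z[i][j]) for i in range(N1) for j in range(N2))
average = sum(sum(row) for row in z) / (N1 * N2)
dc_err = abs(z_hat[0][0] - N1 * N2 * average)
print(roundtrip_err, dc_err)
```

Both printed errors are of the order of rounding, confirming on this example that the IDFT inverts the DFT and that $\hat z(0,0) = N_1 N_2 \langle z \rangle$.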
2.9.1. Matrix representation of the 2D DFT: Kronecker product versus iteration of two 1D DFTs

The matrix representation of the 2D DFT in the canonical basis of $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ can be constructed using the Sylvester matrices $W_{N_1}$ and $W_{N_2}$ associated with the 1D DFT for $\ell^2(\mathbb{Z}_{N_1})$ and for $\ell^2(\mathbb{Z}_{N_2})$, respectively.
The operation used to obtain a matrix representation of the 2D DFT is the Kronecker product, which is defined below.

DEFINITION 2.18.– Given two matrices, $A$ of dimension $m \times n$ and $B$ of dimension $p \times q$:

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & \cdots & b_{1q} \\ \vdots & \ddots & \vdots \\ b_{p1} & \cdots & b_{pq} \end{pmatrix}$$

the Kronecker product matrix $A \otimes B$ is the matrix of dimension $mp \times nq$ defined by:

$$A \otimes B = \begin{pmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & \ddots & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{pmatrix}$$

The matrix associated with the 2D DFT in the canonical basis of $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ can be shown, by direct calculation, to be the matrix of dimension $N_1 N_2 \times N_1 N_2$ given by:

$$W_{N_1,N_2} = W_{N_1} \otimes W_{N_2} \implies \hat z(m_1, m_2) = (W_{N_1} \otimes W_{N_2})\, z(n_1, n_2)$$

Unfortunately, the calculation needed to obtain the Kronecker product matrix becomes unfeasibly large for high values of $N_1$ and $N_2$.

In practice, the 2D DFT is generally written as the iteration of two 1D DFTs. To understand this approach, $z \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ must be interpreted as a matrix made up of $N_2$ column vectors with $N_1$ elements each:

$$z(n_1, n_2) = \begin{pmatrix} \vdots & \vdots & \cdots & \vdots \\ z(\cdot, 0) & z(\cdot, 1) & \cdots & z(\cdot, N_2-1) \\ \vdots & \vdots & \cdots & \vdots \end{pmatrix} \qquad (N_2 \text{ column vectors, } N_1 \text{ elements for each column vector})$$

From the definition of the 2D DFT, we can write:

$$\hat z(m_1, m_2) = \sum_{n_2=0}^{N_2-1} \underbrace{\left( \sum_{n_1=0}^{N_1-1} z(n_1, n_2)\, e^{-2\pi i \frac{n_1 m_1}{N_1}} \right)}_{\hat z(m_1, n_2) \,=\, W_{N_1} z(n_1, n_2)} e^{-2\pi i \frac{n_2 m_2}{N_2}} = \sum_{n_2=0}^{N_2-1} W_{N_1} z(n_1, n_2)\, e^{-2\pi i \frac{n_2 m_2}{N_2}} \qquad [2.47]$$
In this formula, the sum with regard to index $n_2$ is the outermost, so $n_2$ is fixed each time. Taking a fixed value for $n_2$, $z(n_1, n_2)$ is a column vector, so the highlighted parenthesis represents the 1D DFT of the column vector, which can be obtained by applying the matrix $W_{N_1}$ to $z(n_1, n_2)$, with a fixed value of $n_2$, as before.

The next problem is that $m_1$ is fixed, and the changing index is $n_2$, meaning that $W_{N_1} z(n_1, n_2)$ is a row vector. For this reason, the DFT cannot be obtained by applying $W_{N_2}$: as we saw in section 2.5, the 1D DFT is obtained from the product of the matrix $W_N$ and a sequence represented using a column vector.

The solution to this problem consists of transposing the two sides of equation [2.47], transforming the row vector $\hat z(m_1, n_2)$ into a column vector:

$$\hat z(m_1, m_2)^t = \sum_{n_2=0}^{N_2-1} (W_{N_1} z(n_1, n_2))^t\, e^{-2\pi i \frac{n_2 m_2}{N_2}}$$

Now, $(W_{N_1} z(n_1, n_2))^t$ is a column vector, so the DFT can be calculated by applying $W_{N_2}$:

$$\hat z(m_1, m_2)^t = W_{N_2} (W_{N_1} z(n_1, n_2))^t \underset{(AB)^t = B^t A^t}{=} W_{N_2}\, z(n_1, n_2)^t\, (W_{N_1})^t = W_{N_2}\, z(n_2, n_1)\, W_{N_1}$$

since $W_{N_1}^t = W_{N_1}$ (note that $n_1$ and $n_2$ have swapped places). Thus, $\hat z(m_1, m_2)^t = W_{N_2}\, z(n_2, n_1)\, W_{N_1}$, so to find $\hat z(m_1, m_2)$, we must simply transpose both sides again:

$$\hat z(m_1, m_2) = (\hat z(m_1, m_2)^t)^t = (W_{N_2}\, z(n_2, n_1)\, W_{N_1})^t = W_{N_1}\, z(n_1, n_2)\, W_{N_2}$$

The formula used to calculate the 2D DFT of a sequence $z \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ is thus:

$$\hat z(m_1, m_2) = W_{N_1}\, z(n_1, n_2)\, W_{N_2} \qquad [2.48]$$
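The two matrix representations can be compared against the defining double sum on a small array. The sketch below is illustrative only: it assumes $N_1 = 2$, $N_2 = 3$, DFT matrices with entries $e^{-2\pi i mn/N}$, and, for the Kronecker form, that $z$ is flattened row by row into a column vector of length $N_1 N_2$ (an assumption about the ordering, which the formula $W_{N_1} \otimes W_{N_2}$ requires). The helpers `matmul` and `kron` are plain-Python stand-ins written for this example.

```python
import cmath

def dft_matrix(N):
    """DFT matrix W_N with entries exp(-2 pi i m n / N)."""
    return [[cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N)] for m in range(N)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def kron(A, B):
    """Kronecker product: the block in position (i, j) is A[i][j] * B."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]

def dft2_direct(z):
    """2D DFT computed from its definition as a double sum."""
    N1, N2 = len(z), len(z[0])
    return [[sum(z[n1][n2] * cmath.exp(-2j * cmath.pi * (m1 * n1 / N1 + m2 * n2 / N2))
                 for n1 in range(N1) for n2 in range(N2))
             for m2 in range(N2)] for m1 in range(N1)]

N1, N2 = 2, 3
z = [[1, 2j, -1],
     [0.5, 3, 1 + 1j]]            # arbitrary N1 x N2 test signal
W1, W2 = dft_matrix(N1), dft_matrix(N2)

direct = dft2_direct(z)
by_formula = matmul(matmul(W1, z), W2)                      # formula [2.48]

flat = [[z[n1][n2]] for n1 in range(N1) for n2 in range(N2)]  # row-major column vector
flat_hat = matmul(kron(W1, W2), flat)                         # Kronecker form
by_kron = [[flat_hat[m1 * N2 + m2][0] for m2 in range(N2)] for m1 in range(N1)]

err_formula = max(abs(direct[i][j] - by_formula[i][j]) for i in range(N1) for j in range(N2))
err_kron = max(abs(direct[i][j] - by_kron[i][j]) for i in range(N1) for j in range(N2))
print(err_formula, err_kron)
```

On this example both forms reproduce the double sum up to rounding. The Kronecker matrix has $(N_1 N_2)^2$ entries, whereas formula [2.48] only needs the two small matrices $W_{N_1}$ and $W_{N_2}$, which is the practical advantage described above.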
It is important to note that equation [2.48] is only meaningful if $\hat z(m_1, m_2)$ and $z(n_1, n_2)$ are interpreted as $N_1 \times N_2$ matrices in their entirety. Formula [2.48] is not the same as $W_{N_1} W_{N_2} z(n_1, n_2)$ or $W_{N_2} W_{N_1} z(n_1, n_2)$, i.e. the formulas that one might naively use to implement the 1D DFT over the columns and rows of $z$. The reason for this difference, as we have seen, is that the 1D matrix DFT requires the presence of a column vector, hence the transposition which results in formula [2.48].

2.9.2. Properties of the 2D DFT

The generalization of the properties of the 1D DFT, presented in section 2.7, to the 2D DFT is trivial.
The proofs of these properties in 1D and 2D are practically identical, notwithstanding certain differences in notation. For this reason, we shall not provide proofs for the 2D extensions presented below. As in the 1D case, in order to examine the properties of the 2D DFT, we must first extend the definition of a sequence $z \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ by periodicity to any interval of length $N_1$ with regard to the variable $n_1$ and of length $N_2$ with regard to the variable $n_2$. This extension is possible if $z$ is defined outside of $\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}$ in the following manner:

$$z(n_1 + j_1 N_1, n_2 + j_2 N_2) = z(n_1, n_2), \qquad \forall n_1, n_2, j_1, j_2 \in \mathbb{Z} \qquad [2.49]$$
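The shift operator is also helpful in 2D cases.

DEFINITION 2.19.– Take $z \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$, extended by periodicity as in formula [2.49], and $k_1, k_2 \in \mathbb{Z}$. The shift operator over $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ is defined by:

$$R_{k_1,k_2} : \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}) \longrightarrow \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}), \qquad z \longmapsto R_{k_1,k_2} z, \qquad (R_{k_1,k_2} z)(n_1, n_2) = z(n_1 - k_1, n_2 - k_2)$$

Taking $z \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$, extended by periodicity as in formula [2.49], then, for all $n_1, n_2, m_1, m_2 \in \mathbb{Z}$:

– periodicity of $\hat z$ and $\check z$:

$$\hat z(m_1, m_2) = \hat z(m_1 + N_1, m_2) = \hat z(m_1, m_2 + N_2) = \hat z(m_1 + N_1, m_2 + N_2)$$

and:

$$\check z(n_1, n_2) = \check z(n_1 + N_1, n_2) = \check z(n_1, n_2 + N_2) = \check z(n_1 + N_1, n_2 + N_2)$$

– 2D DFT and shift:

$$\widehat{R_{k_1,k_2} z}(m_1, m_2) = e^{-2\pi i \left(\frac{m_1 k_1}{N_1} + \frac{m_2 k_2}{N_2}\right)} \hat z(m_1, m_2) \qquad \forall k_1, k_2 \in \mathbb{Z}$$

that is, if we define the sequence $\omega^{k_1,k_2}_{N_1,N_2} \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ by

$$\omega^{k_1,k_2}_{N_1,N_2}(m_1, m_2) = e^{-2\pi i \left(\frac{m_1 k_1}{N_1} + \frac{m_2 k_2}{N_2}\right)} \qquad \forall m_1, m_2 \in \mathbb{Z},$$

then:

$$\mathrm{DFT}_{2D} \circ R_{k_1,k_2} = M_{\omega^{k_1,k_2}_{N_1,N_2}} \circ \mathrm{DFT}_{2D}$$

where $M_{\omega^{k_1,k_2}_{N_1,N_2}}$ is the multiplication operator by $\omega^{k_1,k_2}_{N_1,N_2}$ in $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$. Exchanging the order of composition, we obtain:

$$(R_{k_1,k_2} \hat z)(m_1, m_2) = \hat z(m_1 - k_1, m_2 - k_2) = \mathrm{DFT}_{2D}\left(e^{2\pi i \left(\frac{n_1 k_1}{N_1} + \frac{n_2 k_2}{N_2}\right)} z\right)(m_1, m_2)$$

that is:

$$R_{k_1,k_2} \circ \mathrm{DFT}_{2D} = \mathrm{DFT}_{2D} \circ M_{\left(\omega^{k_1,k_2}_{N_1,N_2}\right)^*}, \qquad \forall k_1, k_2 \in \mathbb{Z}$$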
The properties examined above are summarized by the Fourier pairs in Table 2.5.

| Original representation | Fourier space |
| --- | --- |
| $z(n_1 - k_1, n_2 - k_2)$ | $e^{-2\pi i \left(\frac{m_1 k_1}{N_1} + \frac{m_2 k_2}{N_2}\right)} \hat z(m_1, m_2)$ |
| $e^{2\pi i \left(\frac{n_1 k_1}{N_1} + \frac{n_2 k_2}{N_2}\right)} z(n_1, n_2)$ | $\hat z(m_1 - k_1, m_2 - k_2)$ |

Table 2.5. Fourier pairs for 2D shifts

As in the case of the 1D DFT, considering $k_1 = \frac{N_1}{2}$ and $k_2 = \frac{N_2}{2}$, then $(-1)^{n_1+n_2} z(n_1, n_2)$ and $\hat z\left(m_1 - \frac{N_1}{2}, m_2 - \frac{N_2}{2}\right)$ form a Fourier pair. This transformation is used to obtain a centered visualization of the spectrum of $z$. Furthermore, as in the 1D case, the amplitude spectrum of a 2D signal $z(n_1, n_2)$ and of any shifted form $z(n_1 - k_1, n_2 - k_2)$ is strictly identical, as $\left| e^{-2\pi i \left(\frac{m_1 k_1}{N_1} + \frac{m_2 k_2}{N_2}\right)} \right| = 1$. Thus, the amplitude spectrum gives us the frequency content of the signal, but does not tell us where these frequencies are located;

– 2D DFT and conjugation:

$$\hat{\bar z}(m_1, m_2) = \overline{\hat z(-m_1, -m_2)} = \overline{\hat z(N_1 - m_1, N_2 - m_2)}$$

– 2D DFT and convolution:

$$\widehat{(z * w)}(m_1, m_2) = \hat z(m_1, m_2)\, \hat w(m_1, m_2)$$

where 2D convolution is defined as:

$$T_z : \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}) \longrightarrow \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}), \qquad w \longmapsto T_z w = z * w$$

$$(z * w)(n_1, n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} z(n_1 - k_1, n_2 - k_2)\, w(k_1, k_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} z(k_1, k_2)\, w(n_1 - k_1, n_2 - k_2)$$

Box 2.3. Properties of 2D DFT
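The properties collected in Box 2.3 can be verified numerically. The sketch below is an illustration rather than part of the original text; it assumes the unnormalized DFT convention of `numpy.fft.fft2`, uses `np.roll` as the periodic shift operator, and introduces the hypothetical helper `circular_convolve2d` for the defining double sum.

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2 = 8, 6                       # even sizes, so N1/2 and N2/2 are integers
z = rng.standard_normal((N1, N2))
w = rng.standard_normal((N1, N2))
z_hat, w_hat = np.fft.fft2(z), np.fft.fft2(w)
m1, m2 = np.meshgrid(np.arange(N1), np.arange(N2), indexing="ij")

# Shift: np.roll gives (R_{k1,k2} z)(n1, n2) = z(n1 - k1, n2 - k2), periodically.
k1, k2 = 3, 2
shifted_hat = np.fft.fft2(np.roll(z, shift=(k1, k2), axis=(0, 1)))
phase = np.exp(-2j * np.pi * (m1 * k1 / N1 + m2 * k2 / N2))
shift_ok = np.allclose(shifted_hat, phase * z_hat)
amplitude_ok = np.allclose(np.abs(shifted_hat), np.abs(z_hat))

# Centering: multiplying by (-1)^(n1+n2) shifts the spectrum by (N1/2, N2/2).
sign = (-1.0) ** (m1 + m2)
centered_hat = np.fft.fft2(sign * z)
centering_ok = np.allclose(
    centered_hat, np.roll(z_hat, shift=(N1 // 2, N2 // 2), axis=(0, 1))
)

# Conjugation: the DFT of conj(c) is the conjugate of c_hat taken at (-m1, -m2).
c = z + 1j * w
c_hat = np.fft.fft2(c)
conjugation_ok = np.allclose(
    np.fft.fft2(np.conj(c)), np.conj(c_hat[(-m1) % N1, (-m2) % N2])
)

def circular_convolve2d(a, b):
    """Periodic 2D convolution computed directly from the defining double sum."""
    out = np.zeros(a.shape, dtype=np.result_type(a, b))
    for j1 in range(a.shape[0]):
        for j2 in range(a.shape[1]):
            # adds a(n1 - j1, n2 - j2) * b(j1, j2) for all (n1, n2) at once
            out += np.roll(a, shift=(j1, j2), axis=(0, 1)) * b[j1, j2]
    return out

conv = circular_convolve2d(z, w)
convolution_ok = np.allclose(np.fft.fft2(conv), z_hat * w_hat)
commutative_ok = np.allclose(conv, circular_convolve2d(w, z))
print(shift_ok, amplitude_ok, centering_ok, conjugation_ok, convolution_ok, commutative_ok)
```

For even sizes, the roll by $(N_1/2, N_2/2)$ used in the centering check coincides with what `np.fft.fftshift` does.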
2.9.3. 2D DFT and stationary operators

The properties of 2D and 1D DFT with regard to stationary operators are the same.
Strictly speaking, an operator $T : \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}) \to \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ is stationary if:

$$T \circ R_{k_1,k_2} = R_{k_1,k_2} \circ T, \qquad \forall k_1, k_2 \in \mathbb{Z}$$

In practice, if $z$ is a digital image, a stationary operator is a transformation whose action is independent of the position of a pixel in the spatial context of the image. As in the 1D case, stationary operators over $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ may be characterized as convolution operators or as Fourier multiplier operators. The theorem formalizing this relation relies on definitions of the Fourier multiplier, the unit pulse and the pulse response in the 2D case.

DEFINITION 2.20.– Taking a fixed $w \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$, the Fourier multiplier associated with $w$ is defined as:

$$T_{(w)} : \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}) \longrightarrow \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}), \qquad z \longmapsto T_{(w)} z = (w \cdot \hat z)^{\vee}$$

DEFINITION 2.21.– The unit pulse $\delta$ in $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ is the first vector in the canonical basis: $\delta = e_{0,0}$. Given a linear operator $T$ over $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$, the pulse response is defined as the sequence $h = T\delta \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$.

THEOREM 2.12.– Let $T : \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}) \longrightarrow \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$ be a linear operator. The following conditions are equivalent:

1) $T$ is stationary;

2) $T$ is the convolution operator with the pulse response $h = T\delta$:

$$Tz = T_h z = h * z = z * h \qquad \forall z \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$$

3) $T$ is the Fourier multiplier associated with $\hat h$:

$$Tz = T_{(\hat h)} z = (\hat h \cdot \hat z)^{\vee} \qquad \forall z \in \ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$$

4) $T$ is diagonalizable, its eigenvectors are the orthogonal Fourier basis $F_{m_1,m_2}$ of $\ell^2(\mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2})$, and its eigenvalues are the components of $\hat h$.

OBSERVATIONS.– This result can be extended to circulant matrices, but their definition in the 2D case is more complex.
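The equivalences of Theorem 2.12 can be illustrated on a concrete operator. The following sketch is an assumption-laden example, not the author's: the operator `T` (a periodic weighted average) is hypothetical, and the Fourier basis vector is taken as $F_{m_1,m_2}(n_1,n_2) = e^{2\pi i (m_1 n_1/N_1 + m_2 n_2/N_2)}$, up to whatever normalization is used earlier in the chapter.

```python
import numpy as np

def T(z):
    """Hypothetical stationary operator: periodic weighted average of each
    sample with its neighbours at (n1 - 1, n2) and (n1, n2 - 1)."""
    return 0.5 * z + 0.25 * np.roll(z, 1, axis=0) + 0.25 * np.roll(z, 1, axis=1)

N1, N2 = 5, 7
rng = np.random.default_rng(2)
z = rng.standard_normal((N1, N2))

# 1) Stationarity: T commutes with a shift R_{2,3}.
def shift(x):
    return np.roll(x, shift=(2, 3), axis=(0, 1))

commutes = np.allclose(T(shift(z)), shift(T(z)))

# 2)-3) Pulse response h = T(delta) and the associated Fourier multiplier.
delta = np.zeros((N1, N2))
delta[0, 0] = 1.0
h = T(delta)
h_hat = np.fft.fft2(h)
multiplier_ok = np.allclose(T(z), np.fft.ifft2(h_hat * np.fft.fft2(z)))

# 4) A Fourier basis vector is an eigenvector with eigenvalue h_hat[m1, m2].
m1, m2 = 2, 4
n1, n2 = np.meshgrid(np.arange(N1), np.arange(N2), indexing="ij")
F = np.exp(2j * np.pi * (m1 * n1 / N1 + m2 * n2 / N2))
eigen_ok = np.allclose(T(F), h_hat[m1, m2] * F)
print(commutes, multiplier_ok, eigen_ok)
```

Any other operator built from periodic shifts and scalar weights could be substituted for `T`; the three checks should still pass.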
2.9.4. Gradient and Laplace operators and their action on digital images

Repeating the analysis of discrete derivative operators from section 2.8.6 for 2D cases, the first derivative gives us the gradient, that is $\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right)$, and the second derivative gives us the Laplacian, that is $\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$.

The gradient is used to detect the edges of an image in a particular direction. For isotropic edge detection – that is detection which is uniform with regard to direction – the Laplacian is used; this approach is more efficient than using a gradient for intensifying fine details, as we saw in the 1D case. Even in 2D cases, the differential operators above cancel out the average of an image, which is why the output is entirely black, except near the edges, as we see from Figure 2.6.
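As an illustration of why these filters produce an image that is black except near the edges, here is a minimal sketch. The finite-difference stencils below (backward differences and the five-point Laplacian, both with periodic boundaries) are assumptions; the discretization used in section 2.8.6 may differ.

```python
import numpy as np

def gradient_periodic(z):
    """Backward differences along n1 and n2 (assumed discretization)."""
    d1 = z - np.roll(z, 1, axis=0)   # variation along the first index
    d2 = z - np.roll(z, 1, axis=1)   # variation along the second index
    return d1, d2

def laplacian_periodic(z):
    """Five-point discrete Laplacian with periodic boundary conditions."""
    return (np.roll(z, 1, axis=0) + np.roll(z, -1, axis=0)
            + np.roll(z, 1, axis=1) + np.roll(z, -1, axis=1) - 4.0 * z)

# Synthetic test image: a bright square on a uniform background.
image = np.full((16, 16), 0.5)
image[4:12, 4:12] += 1.0

d1, d2 = gradient_periodic(image)
lap = laplacian_periodic(image)

# The filters cancel the average: each output sums to zero.
sums = (d1.sum(), d2.sum(), lap.sum())
# Flat regions map to zero; only pixels next to an edge respond.
interior_is_zero = np.allclose(lap[7:9, 7:9], 0.0)
edge_response = lap[4, 8]
print(sums, interior_is_zero, edge_response)
```

Each difference responds only to edges that cross its own direction of differentiation, whereas the Laplacian responds along the whole contour of the square.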
2.9.5. Visualization of the amplitude spectrum in 2D

Visualizations of the spectrum of a 2D signal can be produced on the condition that the signal is centered, for the same reasons presented in the 1D case. Centering may be carried out using the 2D equivalent of formula [2.34], considering $(-1)^{n_1+n_2} z(n_1, n_2)$ in the place of $z(n_1, n_2)$, as we saw in section 2.9.2. Note that the 1D symmetry of the 1D DFT with regard to frequencies $m \in \{0, 1, \dots, N/2\}$ and $m \in \{N/2 + 1, N/2 + 2, \dots, N - 1\}$ is replaced by 2D mirror symmetry in the case of the 2D DFT.

Figure 2.7 shows three grayscale digital images with their amplitude spectrums. The brightest points correspond to high magnitude values of the Fourier coefficients, while the darkest points correspond to low values. There are several notable characteristics here:

– the symmetry of the spectrum: frequency content is repeated in each quadrant by mirror symmetry;

– the brightest points are located toward the center of the spectrum: this is due to the fact that these spectrums are centered, so the coordinates of the central frequency are $(m_1, m_2) = (0, 0)$ and $|\hat z(0, 0)| = N_1 N_2 \langle z \rangle$, that is, $N_1 N_2$ times the average value of the image. This is why a compressive function, such as a logarithm, must be used to visualize a spectrum: the values of $|\hat z(0, 0)|$ are so much higher than the others that the variability range needs to be compressed;
Figure 2.6. a) Original image of Panko; b) image after Laplacian filter; c) image filtered using a gradient in the vertical direction; d) image filtered using a gradient in the horizontal direction (image source: author). For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip
Figure 2.7. Left column: original images. Right column: centered amplitude spectrums of the images in the left column, visualized using a logarithmic scale
– moving out from the center, the spectrum shows the amplitude of the coefficients corresponding to the highest frequencies, up to the maximum frequencies $(N_1/2, N_2/2)$, if $N_1, N_2$ are even, or their integer parts $(\lfloor N_1/2 \rfloor, \lfloor N_2/2 \rfloor)$ if $N_1, N_2$ are odd. The image with the highest frequency content is that of the mandrill: its spectrum is the widest of the three shown here. Note the particularly intense values near the edges, representing very high frequencies: these correspond to the fine details of the hairs near the animal's eyes;

– as $m_1$ and $m_2$ represent vertical and horizontal frequencies, the vertical and horizontal edges of the images produce Fourier coefficients which are localized on the corresponding axes. This is why the spectrum of the first image, which features strong vertical intensity gradients between the rocks and the sea, is heavily dominated by intense Fourier coefficients on the vertical axis. The second image ("Lena", a classic image used in image processing) features fine details in the hat area, at $45^\circ$ and $-45^\circ$. This results in evident diagonal structures in the spectrum;

– from this spectrum analysis, we see that the Fourier spectrum reveals the presence of geometric structures within an image, but does not tell us where in the image these structures are located.

2.9.6. Filtering: an example of digital image filtering in a Fourier space

Theorem 2.12 states that all stationary operators $T$ acting on images (interpreted as finite 2D sequences) are "hidden" convolutions between the image and the pulse response $h = T\delta$. Furthermore, these convolutions can be represented as Fourier multipliers (multiplication of $\hat h$ and the 2D DFT of the image within the Fourier space). Different results will be obtained depending on the sequence $h$ with which convolution is carried out. The effect of a convolution is often easier to interpret by examining the associated Fourier multiplier.
Let us consider the notion of convolution with a discrete Gaussian, noted $h(n_1, n_2)$. As we shall see in Chapter 6, the Fourier transform of a Gaussian with a standard deviation $\sigma$ is itself a Gaussian, but the standard deviation of the latter is inversely proportional to $\sigma$. Thus, we can further our understanding of the meaning of convolving an image $z(n_1, n_2)$ with a Gaussian $h(n_1, n_2)$ by analyzing the multiplication $\hat z(m_1, m_2) \cdot \hat h(m_1, m_2)$ in the Fourier space.

Figure 2.8 features three images corresponding to $512 \times 512$ 2D Gaussians. The intensity of the pixel in position $(n_1, n_2)$ is $h(n_1, n_2) = \exp\left(-\frac{n_1^2 + n_2^2}{2\sigma^2}\right)$ and the standard deviation is $\sigma = 1$, $5$ and $10$, respectively.
Figure 2.8. Two-dimensional Gaussian images with a standard deviation of (left - right) 1, 5 and 10
As we stated above, the 2D DFTs of $h$ are still Gaussians, but their standard deviations are proportional to $1$, $\frac{1}{5}$ and $\frac{1}{10}$. Evidently, $h(0, 0) = 1$ and the values of $\hat h(m_1, m_2)$ decrease as we move away from the center; thus, multiplication in the Fourier space $\hat z(m_1, m_2) \cdot \hat h(m_1, m_2)$ decreases the importance of the harmonics with $(m_1, m_2) \neq (0, 0)$, which are associated with the finer details in the image. Applying the 2D IDFT to $\hat z(m_1, m_2) \cdot \hat h(m_1, m_2)$, we can reconstruct an image which is blurrier than the original. In image processing, convolution with a Gaussian corresponds to a blurring operation, as we see in Figure 2.9.
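The blurring procedure just described (multiply the 2D DFTs, then apply the 2D IDFT) can be sketched as follows. This is an illustrative example under stated assumptions: the Gaussian is centered on the pixel $(0, 0)$ with periodic wrap-around, it is normalized to unit sum (a choice made here so that the image mean is preserved, which the text does not require), and a random array stands in for an image.

```python
import numpy as np

def gaussian_kernel_periodic(N1, N2, sigma):
    """Gaussian centred at (0, 0) on the periodic grid, normalized to sum 1."""
    # Signed periodic indices 0, 1, ..., -2, -1 (fftfreq with d = 1/N gives integers).
    n1 = np.fft.fftfreq(N1, d=1.0 / N1)
    n2 = np.fft.fftfreq(N2, d=1.0 / N2)
    g = np.exp(-(n1[:, None] ** 2 + n2[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def gaussian_blur(z, sigma):
    """Blur z by pointwise multiplication of the 2D DFTs, then inverse DFT."""
    h_hat = np.fft.fft2(gaussian_kernel_periodic(z.shape[0], z.shape[1], sigma))
    return np.real(np.fft.ifft2(np.fft.fft2(z) * h_hat))

rng = np.random.default_rng(3)
z = rng.random((64, 64))                       # stand-in for a noisy image

blurred = {sigma: gaussian_blur(z, sigma) for sigma in (1, 5, 10)}
means = {sigma: b.mean() for sigma, b in blurred.items()}
contrasts = {sigma: b.std() for sigma, b in blurred.items()}
print(means, contrasts)
```

The larger $\sigma$ is, the narrower $\hat h$ becomes and the more high-frequency content is suppressed, which shows up here as a decreasing standard deviation of the blurred array.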
Figure 2.9. Blurred image of Lena obtained by multiplying DFTs and Gaussians with standard deviations of (left - right) 1, 5 and 10
C OMMENT CONCERNING FIGURE 2.9.– Note that as the standard deviation of the DFT of a Gaussian is inversely proportional to the original standard deviation, the DFT of the Gaussian with a standard deviation of 10 has a small standard deviation in the latter case, and thus tends rapidly toward 0. So, when the DFT of the Gaussian
with an SD of 10 is multiplied with the DFT of the image, much of the detail in the image is lost. Blurring has a number of uses; for example, in cases where the original image is noisy, blurring can make this noise less evident (although it also reduces edge sharpness). Figure 2.10 shows a continuous version of the blurring frequency filter.
Figure 2.10. Blurring filter/low-pass filter in the frequency domain
N OTE .– Although convolution with a Gaussian results in a blurring effect, it would be wrong to assume that convolution is always associated with a blurring action. As we saw earlier, convolution, alongside the Fourier multiplier, constitutes a prototype for all stationary operators, which may blur a signal or enhance its contrast.
2.10. Summary

In this chapter, we considered the space $\ell^2(\mathbb{Z}_N)$ composed of $N$-periodic sequences with complex values, isomorphic to $\mathbb{C}^N$. We introduced a special basis in this space, made up of the complex exponentials generated by the consecutive powers of the $N$-th complex roots of unity. This basis is used to construct the Fourier basis of $\ell^2(\mathbb{Z}_N)$. We interpreted the elements of this basis as harmonic waves, oscillating at frequencies which are multiples of a fundamental one.
The Fourier coefficients of an element in $\ell^2(\mathbb{Z}_N)$ are its components with regard to the Fourier basis. As these coefficients are complex, their magnitude must be used to determine the importance of a harmonic in relation to a certain frequency when reconstructing (or synthesizing) the element itself. The set of magnitudes of the Fourier coefficients is known as the spectrum of an element in $\ell^2(\mathbb{Z}_N)$. The DFT is the endomorphism of $\ell^2(\mathbb{Z}_N)$ which associates an element of $\ell^2(\mathbb{Z}_N)$ with the sequence of its Fourier coefficients. The DFT is actually an isomorphism, and its inverse is known as the IDFT. The DFT may be associated with a matrix, known as a Sylvester matrix; this matrix is a Vandermonde matrix, that is, all of the rows and columns in the matrix can be obtained through geometric progressions.

We presented an interpretation of these concepts in the context of signal theory, notably highlighting the fact that the highest harmonic oscillation frequency in a discrete signal obtained from $N$ samples is $N/2$ (or the integer part of $N/2$ if $N$ is odd); this is the Nyquist frequency. The DFT transforms the shift operation into a multiplication by a phase factor, that is, a complex exponential with unit magnitude; this implies that the signal spectrum is shift-invariant. Convolution is transformed by the DFT into a pointwise product, allowing the convolution operator to be expressed diagonally in the Fourier space. Finally, we saw that the DFT can be used to diagonalize stationary operators, that is, operators which commute with shift operators. Theorem 2.9 can be used to fully characterize a stationary operator as a convolution or as a Fourier multiplier and to determine the eigenvalues of this operator.
3 Lebesgue’s Measure and Integration Theory
In this chapter, we shall present the most essential elements of measure theory and integration. Our aim here is simply to establish clear and unambiguous notation and a common vocabulary. What follows is a deliberately brief summary. Readers who have not yet studied this important branch of mathematics may wish to look elsewhere for a more detailed introduction to measure theory and integration. Two excellent reference works in this domain are Briane and Pagès (1998) and Bartle (1966).

3.1. Riemann versus Lebesgue

The main difference between the Riemann and Lebesgue approaches is shown in Figure 3.1. The key to Riemann integration lies in approximating the area of the surface between the $x$ axis and the curve of a function $f$ using small rectangles $[a_{i-1}, a_i] \times [0, \Phi_i]$ with their base on the $x$ axis, of a height $\Phi_i$ close to the average height of function $f$ over $[a_{i-1}, a_i]$. Lebesgue's integration theory differs in that the first stage involves breaking down the $y$ axis into small intervals $[b_{j-1}, b_j]$; the surface below the curve $f$ is then approximated using:

$$\int_a^b f \approx \sum_j \frac{b_{j-1} + b_j}{2} \cdot \mathrm{length}\left(\{x : b_{j-1} \le f(x) \le b_j\}\right)$$
Figure 3.1. Riemann and Lebesgue integration. For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip
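The two strategies of Figure 3.1 can be compared numerically. The sketch below is only an illustration: the function `f`, the interval and the number of levels are arbitrary choices, and the length of each level set is estimated by counting points of a fine uniform grid, which is a crude stand-in for a genuine measure.

```python
import numpy as np

def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum: the x axis is split into n intervals."""
    edges = np.linspace(a, b, n + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return np.sum(f(mid) * np.diff(edges))

def lebesgue_sum(f, a, b, n_levels, n_samples=200_000):
    """Lebesgue-style sum: the y axis is split into n_levels intervals and each
    mid-level (b_{j-1} + b_j)/2 is weighted by the estimated length of the set
    {x : b_{j-1} <= f(x) < b_j}."""
    x = a + (b - a) * (np.arange(n_samples) + 0.5) / n_samples
    y = f(x)
    levels = np.linspace(y.min(), y.max(), n_levels + 1)
    counts, _ = np.histogram(y, bins=levels)
    lengths = counts * (b - a) / n_samples     # estimated measure of each level set
    mids = 0.5 * (levels[:-1] + levels[1:])
    return np.sum(mids * lengths)

def f(x):
    return np.sin(3.0 * x) + x ** 2

a, b = 0.0, 2.0
exact = (1.0 - np.cos(6.0)) / 3.0 + 8.0 / 3.0   # closed-form value of the integral
riemann = riemann_sum(f, a, b, 1000)
lebesgue = lebesgue_sum(f, a, b, 1000)
print(exact, riemann, lebesgue)
```

Note that `f` is not monotone on $[0, 2]$, so several of the level sets are unions of disjoint intervals rather than single intervals, as in Figure 3.1(b).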
The main difficulty lies in the fact that the sets:

$$E_j = \{x \in [a, b] : b_{j-1} \le f(x) \le b_j\}$$

shown in red in Figure 3.1(b), are generally not intervals, and it can be complicated, if not (as in certain cases) impossible, to associate them with a length or measure. The development of measure theory was motivated by the need to create a theory of integration using the strategy described above. This approach is far longer and more complicated than Riemann integration; however, Lebesgue integration presents a significant advantage in terms of generality, and the properties that can be proved are far more powerful.

3.2. σ-algebra, measurable space, measures and measured spaces

In order to define a Lebesgue integral, we must first define the sets and functions which can be measured. The definitions and results below, based on work carried out in the early 20th century, make up the necessary formalization. Let $X$ be a set. A σ-algebra on $X$ is a collection $\mathcal{A}$ of subsets of $X$, that is $\mathcal{A} \subseteq \mathcal{P}(X)$, which verifies the following properties:

– $\emptyset, X \in \mathcal{A}$;

– $\mathcal{A}$ is closed under complementation: $E \in \mathcal{A} \implies E^c \in \mathcal{A}$;

– $\mathcal{A}$ is closed under countable unions: $(E_n)_{n \in \mathbb{N}} \subset \mathcal{A} \implies \bigcup_{n \in \mathbb{N}} E_n \in \mathcal{A}$.

This definition implies that $\mathcal{A}$ is closed under countable intersection.

SIMPLE EXAMPLES.–

– $\mathcal{A} = \mathcal{P}(X)$: the σ-algebra given by the power set of $X$.

– $\mathcal{A} = \{\emptyset, X\}$: the minimal σ-algebra over $X$.
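On a finite set, the axioms above can be checked mechanically, since countable unions reduce to finite ones. The sketch below is an illustration with hypothetical helper names (`generate_sigma_algebra`, `is_sigma_algebra`): it closes a starting family under complement and union until nothing new appears, which on a finite set yields the smallest σ-algebra containing that family.

```python
from itertools import combinations

def generate_sigma_algebra(X, S):
    """Smallest family containing S, the empty set and X that is closed under
    complement and union (finite X only, so finite unions suffice)."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(s) for s in S}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for E in current:
            if X - E not in family:            # closure under complementation
                family.add(X - E)
                changed = True
        for E, F in combinations(current, 2):
            if E | F not in family:            # closure under union
                family.add(E | F)
                changed = True
    return family

def is_sigma_algebra(X, family):
    """Check the three axioms on a finite set X."""
    X = frozenset(X)
    family = set(family)
    return (frozenset() in family and X in family
            and all(X - E in family for E in family)
            and all(E | F in family for E in family for F in family))

X = {1, 2, 3, 4}
A = generate_sigma_algebra(X, [{1}, {1, 2}])
trivial = generate_sigma_algebra(X, [])
power_set = generate_sigma_algebra(X, [{x} for x in X])
print(len(A), len(trivial), len(power_set))
```

In this example the generated family has the "atoms" $\{1\}$, $\{2\}$ and $\{3, 4\}$, so it contains $2^3 = 8$ sets, while the two extreme cases reproduce the simple examples listed above.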
Mathematicians working on measure theory have proved that the defining properties of a σ-algebra are necessary and sufficient to "measure" the sets contained in the σ-algebra itself, in a sense which will be defined below. For this reason, the pair $(X, \mathcal{A})$ is called a measurable space and the elements of $\mathcal{A}$ are measurable sets.

One further concept must be introduced before we can examine a meaningful example of a measurable space: that of the ordering relation between σ-algebras. If every element in a σ-algebra $\mathcal{A}_1$ is contained in the σ-algebra $\mathcal{A}_2$, then $\mathcal{A}_1$ is said to be smaller than $\mathcal{A}_2$ and we write $\mathcal{A}_1 \subset \mathcal{A}_2$. This concept is used to define the smallest σ-algebra generated by a collection of subsets: taking $S \subset \mathcal{P}(X)$, the intersection of all σ-algebras which contain $S$ is known as the σ-algebra generated by $S$.

The case of a topological space $X$ is particularly interesting, and merits closer attention. The existence of a topology means that we can define the concept of an open subset of $X$. Taking $\tau \subseteq \mathcal{P}(X)$ to be the family of open sets of $X$, we see that $\tau$ is in general not a σ-algebra, since the complement of an open set is a closed set, which need not be open. However, we can consider the σ-algebra generated by $\tau$, called the Borel σ-algebra¹ and noted $\mathcal{B}(X)$. Each element in this σ-algebra – which is a subset of $X$ – is called a Borel set.

Once we have a measurable space $(X, \mathcal{A})$, the concept of a positive measure, or simply a measure, $\mu$ can be defined as a function $\mu : \mathcal{A} \to [0, +\infty]$ such that:

– $\mu(\emptyset) = 0$;

– $\mu$ is σ-additive (or countably additive): if $(E_n)_{n \in \mathbb{N}}$ is a countable family of pairwise disjoint elements in $\mathcal{A}$, then:

$$\mu\left(\bigcup_{n \in \mathbb{N}} E_n\right) = \sum_{n \in \mathbb{N}} \mu(E_n)$$

The triple $(X, \mathcal{A}, \mu)$ is said to be a measure space. When the σ-algebra $\mathcal{A}$ and the measure $\mu$ are clearly specified, they are often omitted and one simply writes $X$.

One very simple, but meaningful, example of a measure is given by the Dirac measure in the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, that is, $\mathbb{R}$ with the Borel σ-algebra. The Dirac measure centered on $x_0 \in \mathbb{R}$ is defined by $\delta_{x_0} : \mathcal{B}(\mathbb{R}) \to \{0, 1\}$:

$$\delta_{x_0}(E) = \begin{cases} 1 & \text{if } x_0 \in E \\ 0 & \text{if } x_0 \notin E \end{cases} \qquad \forall E \in \mathcal{B}(\mathbb{R})$$

¹ This σ-algebra cannot be described explicitly.
Since $\mathbb{R}$ itself is an element in $\mathcal{B}(\mathbb{R})$, $\delta_{x_0}(\mathbb{R}) = 1$: the Dirac measure of $\mathbb{R}$ is 1, independently of the point $x_0$. It is therefore an example of a finite measure, that is, the measure of the entire space is finite. Measures are generally σ-finite, rather than simply finite. Given a measure space $(X, \mathcal{A}, \mu)$, $\mu$ is said to be a σ-finite measure if $X$ can be written as the countable union of measurable sets $(E_n)_{n \in \mathbb{N}} \subset \mathcal{A}$ with a finite measure, that is:

$$X = \bigcup_{n \in \mathbb{N}} E_n, \qquad \mu(E_n) < +\infty \quad \forall n \in \mathbb{N}$$
Several different techniques exist for constructing a measure, but these are not simple and cannot be described in short form. Readers may wish to consult the volume cited in the preface, or any other work on measure theory.

3.3. Measurable functions and almost-everywhere properties (a.e.)

The next step is to introduce the morphisms of measurable spaces, that is, maps between measurable spaces which preserve measurability. Let $(X_1, \mathcal{A}_1)$, $(X_2, \mathcal{A}_2)$ be two measurable spaces and $f : X_1 \to X_2$ an arbitrary function. $f$ is a measurable function (with respect to the chosen σ-algebras $\mathcal{A}_1$ and $\mathcal{A}_2$) if the inverse image via $f$ of any element of the σ-algebra $\mathcal{A}_2$ belongs to $\mathcal{A}_1$, that is²:

$$E \in \mathcal{A}_2 \implies f^{-1}(E) \in \mathcal{A}_1$$

This is equivalent, by definition, to stating that the inverse image via $f$ of a measurable set of $X_2$ (with respect to $\mathcal{A}_2$) is a measurable set of $X_1$ (with respect to $\mathcal{A}_1$).

REMARKS.–

– Continuous functions between two topological spaces are clearly measurable with respect to their Borel σ-algebras.

– Without other specifications, whenever we consider real-valued functions, that is $f : X \to \mathbb{R}$, where $(X, \mathcal{A})$ is a measurable space, we fix the Borel σ-algebra on $\mathbb{R}$ and we test the measurability of $f$ with respect to this choice.

– A complex-valued function $f : X \to \mathbb{C}$ is measurable if both its real and imaginary parts are measurable.

² Note the similarity between this definition and that of a continuous function, in the topological sense of the term.
Let us now recall the crucial concept of properties which are defined almost everywhere. A function $f$ defined on a measure space $(X, \mathcal{A}, \mu)$ has a property which holds almost everywhere (written a.e.) if $f$ possesses this property on $X \setminus E$, where $E \in \mathcal{A}$ has a measure of zero: $\mu(E) = 0$.

EXAMPLES.–

– if $f, g$ are measurable functions defined on $(X, \mathcal{A}, \mu)$, then $f = g$ a.e. if $f(x) = g(x)$ $\forall x \in U$, with $U \in \mathcal{A}$ and $\mu(X \setminus U) = 0$.

– $f$ is the a.e. pointwise limit of the sequence $(f_n)_{n \in \mathbb{N}}$ if $\lim_{n \to +\infty} f_n(x) = f(x)$ $\forall x \in U$, with $U \in \mathcal{A}$ and $\mu(X \setminus U) = 0$.
3.4. Integrable functions and Lebesgue integrals

Given a measure space $(X, \mathcal{A}, \mu)$, the integral of a measurable function with real or complex values is relatively simple to define. We start by considering a special function, the indicator (or characteristic) function of a set $E \in \mathcal{A}$, $\chi_E : X \to \{0, 1\}$:

$$\chi_E(x) = \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E \end{cases}$$

An equivalent notation is $1_E$. Indicator functions are used to define simple functions or step functions via linear combination. More precisely, taking $(E_k)_{k=1}^n$ to be a finite and disjoint partition of $X$ into measurable sets, that is, $E_k \cap E_{k'} = \emptyset$ $\forall k \neq k'$ and

$$\bigcup_{k=1}^{n} E_k = X,$$

a simple function $s : X \to \mathbb{R}$ is defined as:

$$s = \sum_{k=1}^{n} c_k \chi_{E_k}$$

so that $s(x) = c_k$ $\forall x \in E_k$; hence $s$ can only take a finite number of values; if $X$ is a subset of $\mathbb{R}$, then $s$ is a piecewise constant function. The natural definition of the Lebesgue integral of a simple function is:

$$\int_X s \, d\mu = \sum_{k=1}^{n} c_k \, \mu(E_k)$$
Note that, without the definition of the measure of the sets $E_k$, the integral of $s$ would not be well defined. The importance of simple functions is expressed in Theorem 3.1.

THEOREM 3.1.– Let $(X, \mathcal{A}, \mu)$ be a measure space and $f : X \to [0, +\infty]$ a measurable and non-negative function. $f$ can be approximated from below using a sequence of simple functions, that is, $\exists (s_n)_{n \in \mathbb{N}}$, with $s_n$ a simple function, such that $(s_n)_{n \in \mathbb{N}} \nearrow f$, that is:

1) $0 \le s_0(x) \le s_1(x) \le \dots \le s_n(x) \le \dots \le f(x)$ $\forall x \in X$;

2) $\lim_{n \to +\infty} s_n(x) = f(x)$ $\forall x \in X$ (pointwise limit).

If $f$ is bounded, then the convergence of the sequence $(s_n)_{n \in \mathbb{N}}$ toward $f$ is uniform.

The proof of this theorem is both elegant and informative, showing that the sequence of simple functions is given by:

$$s_n(x) = \begin{cases} \dfrac{k}{2^n} & \text{if } \dfrac{k}{2^n} \le f(x) < \dfrac{k+1}{2^n}, \text{ for } k = 0, 1, \dots, 2^{2n} - 1 \\[2mm] 2^n & \text{if } 2^n \le f(x) \end{cases}$$

This fundamental theorem makes it possible to define the integral of a measurable non-negative function $f : X \to [0, +\infty]$ as:

$$\int_X f \, d\mu = \sup_{\substack{0 \le s \le f \\ s \text{ simple}}} \int_X s \, d\mu$$

$f$ is said to be (Lebesgue) integrable if $\int_X f \, d\mu < +\infty$.
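The dyadic construction used in the proof of Theorem 3.1 is easy to experiment with. The following sketch assumes the form of $s_n$ given above (round $f$ down to the grid of step $2^{-n}$ and cap the result at $2^n$); the test function and the sampling grid are arbitrary illustrative choices.

```python
import numpy as np

def simple_approximation(f_values, n):
    """Values of s_n: f rounded down to multiples of 2**-n, capped at 2**n."""
    step = 2.0 ** (-n)
    return np.minimum(np.floor(f_values / step) * step, 2.0 ** n)

x = np.linspace(0.0, 1.0, 1001)
f = np.exp(2.0 * x)                 # bounded, non-negative, with values in [1, e^2]

approximations = [simple_approximation(f, n) for n in range(8)]

# 1) The sequence is increasing and stays below f.
increasing = all(np.all(s <= t) for s, t in zip(approximations, approximations[1:]))
below_f = all(np.all(s <= f) for s in approximations)

# 2) Once 2**n exceeds sup f, the uniform error is at most 2**-n.
errors = [float(np.max(f - s)) for s in approximations]

# Each s_n takes finitely many values: at most 2**(2n) + 1 of them.
n_values = [len(np.unique(s)) for s in approximations]
print(increasing, below_f, errors[-1], n_values)
```

For small $n$ the cap $2^n$ is active (here $s_0 \equiv 1$), which is what allows the same construction to handle unbounded functions, at the price of losing uniform convergence.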
If $f : X \to \mathbb{R}$ is measurable, then its integral can be defined by considering its positive part:

$$f_+(x) = \begin{cases} f(x) & \text{if } f(x) \ge 0 \\ 0 & \text{if } f(x) < 0 \end{cases}$$

and its negative part:

$$f_-(x) = \begin{cases} -f(x) & \text{if } f(x) \le 0 \\ 0 & \text{if } f(x) > 0, \end{cases}$$

note that both these functions are non-negative. Since $f = f_+ - f_-$, if $f_+$ and $f_-$ are integrable, then we can define the integral of a measurable function with extended real values as:

$$\int_X f \, d\mu = \int_X f_+ \, d\mu - \int_X f_- \, d\mu$$
The same strategy is used for measurable functions $f : X \to \mathbb{C}$, but using the positive and negative parts of the real part $\mathrm{Re}(f)$ and the imaginary part $\mathrm{Im}(f)$. The integral is thus defined as:

$$\int_X f \, d\mu = \int_X \mathrm{Re}(f) \, d\mu + i \int_X \mathrm{Im}(f) \, d\mu$$

Absolute integrability is a necessary and sufficient condition for integrability of a real or complex valued measurable function:

$$f \text{ is integrable} \iff \int_X |f| \, d\mu < +\infty$$
3.5. Characterization of the Lebesgue measure on R and sets with a null Lebesgue measure

As we have seen, the construction of a measure is generally not trivial. However, given the importance of the Lebesgue measure on $\mathbb{R}$, it is helpful to provide a brief summary of the characteristics of this measure. Remarkably, a theorem exists which provides the characterization of the Lebesgue measure using certain properties. Before quoting the result, we recall some definitions.

– Borel measure: Let $X$ be a topological space, and take the measurable space $(X, \mathcal{B}(X))$, where $\mathcal{B}(X)$ is the Borel σ-algebra. A measure $\mu$ defined on this space is said to be a Borel measure if it associates a finite number with each compact subset $K$ of $X$;

– Regular Borel measure: A Borel measure is regular if, for any Borel set $E \in \mathcal{B}(X)$, we have:

1) $\mu(E) = \sup\{\mu(K),\ K \subset E,\ K \text{ compact}\}$;

2) $\mu(E) = \inf\{\mu(O),\ E \subset O,\ O \text{ open}\}$.

Consider now $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$. $\mu$ is a shift-invariant measure if $\mu(E + a) = \mu(E)$ for any Borel set $E \in \mathcal{B}(\mathbb{R})$ and all $a \in \mathbb{R}$, where $E + a = \{x \in \mathbb{R} : x = e + a, \text{ where } e \in E\}$.

We can now quote the theorem that provides the characterization of the Lebesgue measure on $\mathbb{R}$, noted $m$.

THEOREM 3.2.– If a measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ has the following properties:
1) $\mu$ is a regular Borel measure;

2) $\mu$ is shift-invariant;

3) $\mu$ is normalized, that is $\mu([0, 1]) = 1$;

then $\mu$ is the Lebesgue measure $m$.

Thus, we can say that the Lebesgue measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a regular, shift-invariant, normalized Borel measure; this also implies that $m([a, b]) = b - a$. A further consequence of this theorem is that the Lebesgue measure is σ-finite: $\mathbb{R}$ is the countable union of the compact intervals $[-n, n]$ with $n \in \mathbb{N}$, all of which possess finite measures ($m([-n, n]) = 2n$).

Generalization of the Lebesgue measure on $\mathbb{R}$ to $\mathbb{R}^n$ is straightforward, and we can prove that, if a function is Riemann-integrable on $\mathbb{R}^n$, it is also Lebesgue-integrable and the two integrals coincide.

Important examples of sets with null Lebesgue measure are given by hypersurfaces of dimension $n - 1$ in $\mathbb{R}^n$, such as two-dimensional (2D) surfaces in $\mathbb{R}^3$ and curves in $\mathbb{R}^2$. Regarding $\mathbb{R}$, since $\mathbb{R}$ has the cardinality of the continuum, the subsets of $\mathbb{R}$ with lower cardinality, that is, countable or finite subsets, have null Lebesgue measure, in particular:

$$m(\mathbb{N}) = m(\mathbb{Z}) = m(\mathbb{Q}) = 0$$

This means that even if we eliminated from a measurable set in $\mathbb{R}$, for example an interval $[a, b]$, a countably infinite number of points, its Lebesgue measure would not change. This property means that the class of Lebesgue-integrable functions is much broader than that of Riemann-integrable functions. Take the case of a piecewise continuous function with a finite or countable number of jump discontinuities: its Lebesgue integral is the algebraic sum of the Riemann integrals of each section for which the function is continuous. As the number of discontinuities is finite or countable, we can simply ignore them, since they constitute a set of null Lebesgue measure and therefore have no effect on the final result of integration. A more extreme example is the indicator function of $\mathbb{Q} \cap [0, 1]$: it is discontinuous at every point of $[0, 1]$ and has no Riemann integral, yet it vanishes almost everywhere, so its Lebesgue integral exists and is equal to 0.
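As a short worked illustration of the statement $m(\mathbb{Q}) = 0$ (a sketch using only σ-additivity and the formula $m([a, b]) = b - a$ quoted above), let $(q_k)_{k \in \mathbb{N}}$ be an enumeration of $\mathbb{Q}$, which is an assumption of notation. Each singleton $\{q_k\} = [q_k, q_k]$ is closed, hence a Borel set, and:

```latex
m(\mathbb{Q})
  = m\Big(\bigcup_{k \in \mathbb{N}} \{q_k\}\Big)
  = \sum_{k \in \mathbb{N}} m\big([q_k, q_k]\big)
  = \sum_{k \in \mathbb{N}} (q_k - q_k)
  = 0 .
% Consequence for the indicator function of the rationals in [0,1],
% viewed as a simple function with values 1 and 0:
\int_{[0,1]} \chi_{\mathbb{Q}} \, dm
  = 1 \cdot m\big(\mathbb{Q} \cap [0,1]\big)
  + 0 \cdot m\big([0,1] \setminus \mathbb{Q}\big)
  = 0 .
```

The same argument applies verbatim to any countable subset of $\mathbb{R}$, including $\mathbb{N}$ and $\mathbb{Z}$.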
It is important to remember that Lebesgue integration theory does not provide more advanced tools for the explicit calculation of integrals, except in certain very specific cases; however, as just discussed, it allows us to give a meaningful sense to integrals of functions which are much less regular than is required for Riemann integration. This result, along with the crucial theorems presented in section 3.6, gives Lebesgue integration theory a significant advantage over that of Riemann.
3.6. Three theorems for limit operations in integration theory

In this section, we shall summarize the three most important theorems concerning the limit operation in integration theory. These will be used in Chapter 4. In these theorems, we shall take $(X, \mathcal{A}, \mu)$ to be an arbitrary fixed measure space.

THEOREM 3.3 (Monotone convergence theorem – Beppo Levi).– Let $(f_n)_{n \in \mathbb{N}}$, with $f_n : X \to \mathbb{R}$, be a monotonically increasing sequence of integrable functions. If the sequence of integrals is bounded, that is:

$$\exists K \in \mathbb{R} \text{ such that } \int_X f_n \, d\mu < K \quad \forall n \in \mathbb{N},$$

then $\exists \lim_{n \to +\infty} f_n(x) < +\infty$ a.e. Furthermore, if we define the limit function $f : X \to \mathbb{R}$ as:

$$f(x) = \begin{cases} \lim_{n \to +\infty} f_n(x) & \text{if the limit is finite} \\ 0 & \text{otherwise} \end{cases}$$

then $f$ is integrable, and the limit and integral commute:

$$\int_X f(x) \, d\mu = \lim_{n \to +\infty} \int_X f_n(x) \, d\mu.$$
Let us now pass to Fatou's lemma by first recalling that, given an arbitrary sequence $(x_n)_{n \in \mathbb{N}}$ of real numbers, $\liminf$ is the limit inferior of the sequence, that is:

$$\liminf_{n \in \mathbb{N}} (x_n) = \inf\{x \in \mathbb{R} : x \text{ limit point for } (x_n)_{n \in \mathbb{N}}\}$$

THEOREM 3.4 (Fatou's lemma).– Let $(f_n)_{n \in \mathbb{N}}$, with $f_n : X \to \mathbb{R}$, be a sequence of positive integrable functions and let $\liminf_{n \to +\infty} \int_X f_n \, d\mu < +\infty$. The function $f$ defined by:

$$f(x) = \begin{cases} \liminf_{n \to +\infty} f_n(x) & \text{if the limit inferior is finite} \\ 0 & \text{otherwise} \end{cases}$$

is integrable; moreover, the following inequality holds:

$$\int_X f \, d\mu \le \liminf_{n \to +\infty} \int_X f_n \, d\mu.$$
THEOREM 3.5 (Dominated convergence theorem – Henri Lebesgue).– Let $(f_n)_{n \in \mathbb{N}}$, where $f_n : X \to \mathbb{R}$, be a sequence of measurable functions, and let $\Phi : X \to \mathbb{R}$ be a positive and integrable function such that:

$$|f_n| \le \Phi \quad \text{a.e.} \quad \forall n \in \mathbb{N}$$

If the real sequence $(f_n(x))_{n \in \mathbb{N}}$ is convergent $\forall x \in X$ and if $f(x) = \lim_{n \to +\infty} f_n(x)$, then $f_n$ and $f$ are integrable and the limit and the integral commute, that is:

$$\int_X f(x) \, d\mu = \lim_{n \to +\infty} \int_X f_n(x) \, d\mu$$
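Two standard examples on $X = [0, 1]$ with the Lebesgue measure $m$ may help to see what the domination hypothesis buys. They are illustrative additions, not taken from the text. In the first, $f_n(x) = x^n$ is dominated by the integrable function $\Phi = 1$ and the limit passes through the integral; in the second, $g_n = n \, \chi_{(0, 1/n)}$ admits no integrable dominating function and the exchange fails, although Fatou's inequality still holds.

```latex
% Example 1: dominated sequence, limit and integral commute.
f_n(x) = x^n, \quad |f_n| \le 1, \qquad
\lim_{n \to +\infty} f_n(x) = \chi_{\{1\}}(x), \qquad
\lim_{n \to +\infty} \int_{[0,1]} x^n \, dm
  = \lim_{n \to +\infty} \frac{1}{n+1}
  = 0
  = \int_{[0,1]} \chi_{\{1\}} \, dm .

% Example 2: no integrable dominating function, the exchange fails.
g_n = n \, \chi_{(0, 1/n)}, \qquad
\lim_{n \to +\infty} g_n(x) = 0 \quad \forall x \in [0,1], \qquad
\int_{[0,1]} g_n \, dm = n \cdot \frac{1}{n} = 1 \quad \forall n \ge 1,

% so that Fatou's inequality is strict:
\int_{[0,1]} \liminf_{n \to +\infty} g_n \, dm = 0
  \; < \; 1 = \liminf_{n \to +\infty} \int_{[0,1]} g_n \, dm .
```

In the second example, $\sup_n g_n(x)$ grows like $1/x$ near $0$, which is not integrable on $[0, 1]$; this is why Theorem 3.5 does not apply.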
3.7. Summary

In this chapter, we provided a brief overview of key elements of measure theory and Lebesgue integration, touching on subjects such as σ-algebra, measurable sets, measures, measure spaces and measurable functions. Particular attention was paid to the Borel σ-algebra in a topological space: this σ-algebra is generated by the open subsets of the space in question.

Almost-everywhere (a.e.) properties play an important role in measure theory: a property is verified a.e. if it is valid on a measurable subset such that the measure of its complementary set is null. We saw that the Lebesgue measure $m$ on $\mathbb{R}$ can be characterized with respect to the Borel σ-algebra as a regular, normalized and shift-invariant measure. Remarkable examples of sets with null Lebesgue measure in $\mathbb{R}$ include countable sets, specifically $m(\mathbb{N}) = m(\mathbb{Z}) = m(\mathbb{Q}) = 0$.

Given a measure, the definition of the integral of a measurable function is straightforward and follows a standard approach. We begin by considering simple (or step) functions, which are linear combinations of characteristic functions of measurable sets. Simple functions approach any non-negative measurable function from below. This result is essential, allowing us to define the integral of non-negative measurable functions as the sup of the integral of simple functions which are not greater than the function itself. This definition is extended to arbitrary real-valued functions by using their positive and negative parts, and to complex-valued functions by using their real and imaginary parts.

Finally, we outlined the three fundamental theorems concerning the relation between limits and integrals in Lebesgue theory: the monotone and dominated convergence theorems (developed by Levi and Lebesgue, respectively) and Fatou's lemma.
4 Banach Spaces and Hilbert Spaces
In this chapter, we shall consider normed or inner product spaces of infinite dimension. Particular attention will be paid to “complete” spaces, for which several crucial theorems – which do not hold for non-complete spaces of infinite dimension – can be formulated. Before we can begin our analysis, it is important to note that all of the properties described previously for inner product spaces of finite dimension which rely solely on the algebraic nature of the inner product remain valid for infinite-dimensional vector spaces. For example:
– a family of nonzero orthogonal vectors is free (linearly independent);
– if $\langle x, z\rangle = \langle y, z\rangle$ $\forall z$, then the vectors $x$ and $y$ necessarily coincide;
– the null vector is the only vector which is orthogonal to all vectors;
– the Gram-Schmidt orthonormalization procedure can be iterated, guaranteeing that an infinite system of mutually orthogonal vectors with unit norm will be obtained from any given infinite set of vectors.
The proofs for the first three properties are identical to those used for finite-dimensional vector spaces. The proof of the final property relies on the Zorn lemma. Results for finite sums are harder to generalize; in this case, we need to take account of topological arguments in addition to algebraic considerations.
From Euclidean to Hilbert Spaces: Introduction to Functional Analysis and its Applications First Edition. Edoardo Provenzi. © ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
As we shall see, the definition and analysis of Banach and Hilbert spaces rely primarily on the analysis of the compatibility between the linear and topological structures of a normed or inner product vector space. For this reason, we start by recalling the concept of topology in such spaces.
4.1. Metric topology of inner product spaces
As we have seen, every inner product space $V$ can be assigned a norm, which is canonically induced by the scalar product. Using this norm, it is always possible to define, canonically, a distance or metric on $V$:
$$d(x, y) = \|x - y\| = \sqrt{\langle x - y, x - y\rangle}, \quad \forall x, y \in V$$
Function $d$ possesses the following properties, $\forall x, y, z \in V$:
1) $d(x, y) \ge 0$ and $d(x, y) = 0 \iff x = y$;
2) $d(x, y) = d(y, x)$ (symmetry);
3) $d(x, y) \le d(x, z) + d(z, y)$ (triangular inequality).
DEFINITION 4.1 (Metric vector space).– A metric vector space is a pair $(V, d)$ given by a vector space $V$ and a function, the distance $d : V \times V \to \mathbb{R}^+_0 = [0, +\infty)$, which satisfies the three properties given above.
An inner product space is thus automatically a normed vector space and possesses a distance, independently of whether the scalar product is real or complex. As we shall see in this chapter, a normed vector space is, conversely, an inner product space if and only if its norm satisfies the parallelogram law. The existence of a metric means that it is possible to establish relationships between points and subsets in a space which go further than the simple notion of a point belonging to a set. As we know, these relationships form the basis for constructing a topology. Reminders of a number of common definitions are given below, establishing clear notation and naming conventions for the rest of this chapter.
DEFINITION 4.2.– Let $(V, \langle\, ,\rangle)$ be an inner product space and $\|\ \|$ the associated norm. Then:
– an (open) neighborhood of $x \in V$ of radius $\varepsilon$ is the subset of $V$ defined by:
$$U_\varepsilon(x) = \{y \in V : \|x - y\| < \varepsilon\}$$
if $\|\ \|$ is the Euclidean norm, then $U_\varepsilon(x)$ is an (open) sphere centered in $x$ and of radius $\varepsilon$. By extension, $U_\varepsilon(x)$ is often called an (open) ball or sphere, and we write $B(x, \varepsilon)$, for any norm $\|\ \|$;
– a subset $O \subseteq V$ is said to be open if: $\forall x \in O$ $\exists \varepsilon > 0$ such that $y \in U_\varepsilon(x) \implies y \in O$;
– a subset $F \subseteq V$ is said to be closed if its complement $F^c = V \setminus F$ is open. Remember that this is the same as saying that any convergent sequence of elements in $F$ has its limit in $F$;
– the closure of $E \subset V$ is $\overline{E} = \bigcap_{\alpha \in I} E_\alpha$, where the $E_\alpha$ are the closed subsets of $V$ containing $E$. $\overline{E}$ is the smallest closed subset of $V$ which contains $E$;
– the border (or spherical surface) of $U_\varepsilon(x)$ is the subset of $V$ defined by:
$$\partial U_\varepsilon(x) = \{y \in V : \|x - y\| = \varepsilon\}$$
using the symbol $B(x, \varepsilon)$, the border is noted $\partial B(x, \varepsilon)$;
– the closed neighborhood (or ball, or sphere) of radius $\varepsilon$ of $x \in V$ is the subset of $V$ defined by:
$$\overline{U}_\varepsilon(x) = \{y \in V : \|x - y\| \le \varepsilon\}$$
using the symbol $B(x, \varepsilon)$, we can write $\overline{B}(x, \varepsilon)$.
We also recall that a topology on $V$ is a collection of subsets of $V$ containing $V$ itself and $\emptyset$, which is stable with respect to arbitrary unions and finite intersections. The topology generated by the open sets in $V$ is the smallest topology which contains the open sets in $V$. Using this topology, with respect to the open sets defined above, $V$ is a topological space.
The topology of $V$ is metric, that is, the open sets are defined using a distance function. We recall that this guarantees that the topology will be separated, that is, for all pairs $x, y \in V$, $x \ne y$, there exist two neighborhoods $U(x)$ and $V(y)$, of different or equal radius, such that $U(x) \cap V(y) = \emptyset$, and we say that the points are separated by these neighborhoods. A standard result in topology guarantees the uniqueness of the limit of sequences in a separated topology; hence, if sequences of vectors in $V$ converge, they have a single limit.
We now recall the definition of convergence for a sequence in the topology of $V$.
DEFINITION 4.3.– Let $(V, \|\ \|)$ be a normed vector space. A sequence of vectors $(x_n)_{n\in\mathbb{N}} \subset V$ is convergent, or convergent in norm $\|\ \|$, toward the limit $x$ if:
$$\forall \varepsilon > 0\ \exists N_\varepsilon > 0 : n \ge N_\varepsilon \implies \|x_n - x\| < \varepsilon$$
that is if, from $n = N_\varepsilon$ onward, $x_n \in U_\varepsilon(x)$. This can be represented using the more compact notation:
$$\lim_{n\to+\infty} x_n = x, \qquad \|x_n - x\| \underset{n\to+\infty}{\longrightarrow} 0$$
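As a rough numerical illustration of convergence in norm, the sketch below searches for an index $N_\varepsilon$ for a few values of $\varepsilon$. The sequence `x_n`, its limit and the helper `first_index_within` are hypothetical choices made for this example only; since $\|x_n - x\|$ is decreasing for this particular sequence, the first index found can serve as $N_\varepsilon$.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))

def x_n(n):
    """Hypothetical example sequence in R^2 (n >= 1) with limit (0, 1)."""
    return (1.0 / n, 1.0 + 1.0 / n**2)

limit = (0.0, 1.0)

def first_index_within(eps, max_n=10**6):
    """Smallest n >= 1 with ||x_n - limit|| < eps (search capped at max_n)."""
    for n in range(1, max_n + 1):
        diff = [a - b for a, b in zip(x_n(n), limit)]
        if norm(diff) < eps:
            return n
    return None  # not found within the cap

if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(f"eps = {eps}: N_eps = {first_index_within(eps)}")
```

A finite search of this kind only suggests convergence; the definition itself requires an $N_\varepsilon$ for every $\varepsilon > 0$.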
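REMARK.– Requiring the inequality $\|x_n - x\| < \varepsilon$ to be valid $\forall \varepsilon > 0$ enables us to reformulate the definition of convergence by multiplying $\varepsilon$ by a strictly positive, finite constant, that is, $x_n \underset{n\to+\infty}{\longrightarrow} x$ if, for some $m \in (0, +\infty)$:
$$\forall \varepsilon > 0\ \exists N_\varepsilon > 0 : n \ge N_\varepsilon \implies \|x_n - x\| < m\varepsilon$$
If the property is valid for all positive and arbitrarily small $\varepsilon$, then we can consider $\tilde\varepsilon = m\varepsilon$ and redefine the convergence with respect to $\tilde\varepsilon$:
$$\forall \tilde\varepsilon > 0\ \exists N_{\tilde\varepsilon} > 0 : n \ge N_{\tilde\varepsilon} \implies \|x_n - x\| < \tilde\varepsilon$$
This is possible because it makes no difference whether we use the symbol $\varepsilon$ or $\tilde\varepsilon$; the two quantities can be as small as we wish, so the two definitions are equivalent.
In a metric topology, the uniqueness of the limit follows simply from the triangular inequality. If $x_n \underset{n\to+\infty}{\longrightarrow} x$ and $x_n \underset{n\to+\infty}{\longrightarrow} y$, then:
$$0 \le d(x, y) \le d(x, x_n) + d(x_n, y) \underset{n\to+\infty}{\longrightarrow} 0 + 0 = 0$$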
In what follows, the consideration made in the remark will be referred to using the standard expression “due to the arbitrariness of ε...”.
It is also helpful to recall the concept of density of a subset in a normed vector space. Proof of the equivalence of the properties expressed in Definition 4.4 can be found in most works on the subject of topology.
DEFINITION 4.4 (density).– Let $(V, \|\ \|)$ be a normed vector space. A subset $E \subset V$ is dense in $V$ if one of the following propositions is verified:
1) $\forall x \in V$, $\exists (x_n)_{n\in\mathbb{N}} \subset E$ : $x_n \underset{n\to+\infty}{\longrightarrow} x$ (i.e. $\|x_n - x\| \underset{n\to+\infty}{\longrightarrow} 0$), that is: any element of $V$ can be approached indefinitely by a sequence of elements in $E$, and is the limit of this sequence;
2) $\forall x \in V$ and $\forall \varepsilon > 0$, $\exists y \in E$ : $\|x - y\| < \varepsilon$, that is, for every element $x$ in $V$ there exists an element $y$ in $E$ at an arbitrarily small distance from $x$;
3) $V$ is the closure of $E$: $\overline{E} = V$.
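The first characterization of density can be illustrated with the classical case of $\mathbb{Q}$ in $\mathbb{R}$. The sketch below builds decimal truncations of a target number, which are rational and approach the target; the function name `rational_approximations` is a hypothetical helper, and the floating-point value of `math.sqrt(2)` merely stands in for the real number $\sqrt{2}$.

```python
from fractions import Fraction
import math

def rational_approximations(x, count):
    """Rationals r_n = floor(x * 10^n) / 10^n for n = 0..count-1 (decimal truncations of x)."""
    return [Fraction(math.floor(x * 10**n), 10**n) for n in range(count)]

target = math.sqrt(2)  # float stand-in for the irrational number sqrt(2)
approx = rational_approximations(target, 8)
errors = [abs(float(r) - target) for r in approx]

if __name__ == "__main__":
    for n, (r, err) in enumerate(zip(approx, errors)):
        # each truncation r_n lies within 10^(-n) of the target
        print(n, r, err)
```

Each truncation lies within $10^{-n}$ of the target, which is the pattern required by propositions 1) and 2) of Definition 4.4.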
We end the recap of classical notions with the concept of continuity of a function between metric spaces, along with a classic result which says that we can characterize continuity of a function via its action on sequences.
DEFINITION 4.5 (limits and continuity of functions between metric spaces).– Let $X$ and $Y$ be two arbitrary metric spaces, $f : X \to Y$ a function, $\bar{x} \in X$ and $\ell \in Y$, then:
$$\lim_{x\to\bar{x}} f(x) = \ell \iff \forall \varepsilon > 0\ \exists \delta_\varepsilon > 0 : x \in U_{\delta_\varepsilon}(\bar{x}) \cap X \implies f(x) \in U_\varepsilon(\ell) \cap Y$$
that is, the limit of $f$ in $\bar{x}$ is $\ell$ if $f$ transforms the points of $X$ which are arbitrarily close to $\bar{x}$ into points of $Y$ which are arbitrarily close to $\ell$. If $\ell = f(\bar{x})$, then the function $f : X \to Y$ is said to be continuous in $\bar{x} \in X$. In explicit terms:
$$\forall \varepsilon > 0\ \exists \delta_\varepsilon > 0 : x \in U_{\delta_\varepsilon}(\bar{x}) \cap X \implies f(x) \in U_\varepsilon(f(\bar{x})) \cap Y$$
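$f$ is continuous on $X$ if it is continuous at every point in $X$.
THEOREM 4.1 (Sequential continuity).– The function $f : X \to Y$, with $(X, d_X)$ and $(Y, d_Y)$ arbitrary metric spaces, is continuous in $\bar{x} \in X$ if and only if:
$$\forall (x_n)_{n\in\mathbb{N}} \subseteq X \text{ such that } \lim_{n\to+\infty} x_n = \bar{x} \implies \lim_{n\to+\infty} f(x_n) = f\left(\lim_{n\to+\infty} x_n\right) = f(\bar{x})$$
that is:
$$\forall (x_n)_{n\in\mathbb{N}} \subseteq X : d_X(x_n, \bar{x}) \underset{n\to+\infty}{\longrightarrow} 0 \implies d_Y(f(x_n), f(\bar{x})) \underset{n\to+\infty}{\longrightarrow} 0$$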
We see that the limit operation on the sequence $(x_n)_{n\in\mathbb{N}}$ is carried out in the metric space $(X, d_X)$, while the operation on the sequence $(f(x_n))_{n\in\mathbb{N}}$ is carried out in the metric space $(Y, d_Y)$. The possibility of switching the order of the limit and the (continuous) function in the expression:
$$\lim_{n\to+\infty} f(x_n) = f\left(\lim_{n\to+\infty} x_n\right)$$
is essential for proving many of the results presented later.
PROOF.– $\implies$ : let $f$ be continuous in $\bar{x}$ and let $(x_n)_{n\in\mathbb{N}} \subseteq X$ be an arbitrary sequence of elements of $X$ such that $\lim_{n\to+\infty} x_n = \bar{x}$. Then, by definition of the limit of a sequence,
for sufficiently large values of $n$, $x_n$ belongs to a neighborhood of $\bar{x}$ of arbitrarily small radius $\delta > 0$: in other words, there exists $N \in \mathbb{N}$ such that $n \ge N \implies x_n \in U_\delta(\bar{x})$. On the other hand, due to the continuity of $f$, the elements $x_n$ belonging to the neighborhood $U_\delta(\bar{x})$ are transformed by $f$ into points belonging to a neighborhood of $f(\bar{x})$ of arbitrarily small radius $\varepsilon > 0$, i.e. $n \ge N \implies f(x_n) \in U_\varepsilon(f(\bar{x})) \cap Y$, that is $\lim_{n\to+\infty} f(x_n) = f(\bar{x})$.
$\impliedby$ : we shall assume that, for all sequences $(x_n)_{n\in\mathbb{N}} \subseteq X$ such that $\lim_{n\to+\infty} x_n = \bar{x} \in X$, it holds that $\lim_{n\to+\infty} f(x_n) = f(\bar{x})$; we need to prove that this implies the continuity of $f$ in $\bar{x} \in X$.
Using reductio ad absurdum, suppose that $f$ is not continuous in $\bar{x}$: as we shall see, this results in a contradiction. Negation$^1$ of the continuity of $f$ in $\bar{x}$ is equivalent to saying that there exists $\varepsilon > 0$ such that, $\forall \delta > 0$, there is a point $x \in U_\delta(\bar{x}) \cap X$ with $f(x) \notin U_\varepsilon(f(\bar{x}))$. Since the values of $\delta$ are arbitrary, we may consider the sequence $(\delta_n)_{n\ge 1}$ defined by $\delta_n = \frac{1}{n}$ $\forall n \ge 1$, which implies the existence of a sequence $(x_n)_{n\ge 1} \subset X$ such that $x_n \in U_{\delta_n}(\bar{x})$ and $f(x_n) \notin U_\varepsilon(f(\bar{x}))$. This leaves us with a contradiction: on the one hand, when $n \to +\infty$, $\delta_n \to 0$ and thus $x_n \to \bar{x}$, while on the other hand, $f(x_n)$ does not converge to $f(\bar{x})$, that is, the hypothesis that $f$ is not continuous results in a sequence of elements in $X$ which converges to $\bar{x}$ without $f(x_n)$ being convergent to $f(\bar{x})$. This contradicts our initial hypothesis, and thus the possibility that $f$ is not continuous must be rejected. □
If $V$, $W$ are two normed vector spaces, then they automatically constitute two metric spaces with respect to the distances canonically induced by their norms; the definitions and results presented above therefore remain valid.
4.2. Continuity of fundamental operations in inner product spaces
Given an inner product space with both a linear structure and a metric topology, the question of the compatibility of these two structures is evidently important; in other words, we wish to know whether the linear operations of the vector space $V$, together with the inner product and the norm, are continuous in the topology of $V$ generated by its inner product. The response to this question is affirmative, as Theorem 4.2 states.
1 Note that the negation of a mathematical proposition is performed by exchanging the universal and existential quantifiers and by negating the final affirmation: thus, the negation of $(\forall A\ \exists B : C)$ is $(\exists A\ \forall B : \bar{C})$, where $\bar{C}$ is the negation of the affirmation $C$.
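THEOREM 4.2.– Let $(V, \langle\, ,\rangle)$ be an inner product space on $\mathbb{K}$. We shall consider the topology induced by the inner product on $V$, the usual Euclidean topology on $\mathbb{K}$ and the product topology on $V \times V$ and $\mathbb{K} \times V$. Then:
– inner product: $\langle\, ,\rangle : V \times V \to \mathbb{K}$, $(x, y) \mapsto \langle x, y\rangle$;
– norm: $\|\ \| : V \to \mathbb{R}^+_0$, $x \mapsto \|x\|$;
– sum: $+ : V \times V \to V$, $(x, y) \mapsto x + y$;
– and scalar multiplication: $\cdot_{\mathbb{K}} : \mathbb{K} \times V \to V$, $(k, x) \mapsto kx$
are continuous functions.
PROOF.– All of the proofs shown below involve majorizing a selected norm using an expression which contains the norm of the difference between a sequence and its limit, which evidently converges to 0.
– Continuity of the inner product: we must prove that if $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ are any two sequences of elements in $V$ which converge to $x$ and $y$, respectively, then the sequence of scalars $(\langle x_n, y_n\rangle)_{n\in\mathbb{N}}$ converges to $\langle x, y\rangle$. To do this, we first write a simple algebraic manipulation which holds for all $n \in \mathbb{N}$:
$$\begin{aligned} \langle x_n, y_n\rangle - \langle x, y\rangle &= \langle x_n - x + x, y_n - y + y\rangle - \langle x, y\rangle \\ &= \langle x_n - x, y_n - y\rangle + \langle x_n - x, y\rangle + \langle x, y_n - y\rangle + \langle x, y\rangle - \langle x, y\rangle \\ &= \langle x_n - x, y_n - y\rangle + \langle x_n - x, y\rangle + \langle x, y_n - y\rangle \end{aligned}$$
We can write the following majorization:
$$|\langle x_n, y_n\rangle - \langle x, y\rangle| \le |\langle x_n - x, y_n - y\rangle| + |\langle x_n - x, y\rangle| + |\langle x, y_n - y\rangle| \quad \forall n \in \mathbb{N}$$
and, from the Cauchy-Schwarz inequality:
$$|\langle x_n, y_n\rangle - \langle x, y\rangle| \le \|x_n - x\|\|y_n - y\| + \|x_n - x\|\|y\| + \|x\|\|y_n - y\| \quad \forall n \in \mathbb{N}$$
As the inequality holds for all $n \in \mathbb{N}$, the limit $n \to +\infty$ may be considered on both sides: by hypothesis, $\|x_n - x\| \underset{n\to+\infty}{\longrightarrow} 0$ and $\|y_n - y\| \underset{n\to+\infty}{\longrightarrow} 0$, so the right-hand side tends to 0, hence: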
$$|\langle x_n, y_n\rangle - \langle x, y\rangle| \underset{n\to+\infty}{\longrightarrow} 0$$
which proves the continuity of the inner product.
– Continuity of the norm: we must prove that if $(x_n)_{n\in\mathbb{N}}$ is an arbitrary sequence of elements in $V$ which converges to $x$, then the sequence of non-negative real numbers $(\|x_n\|)_{n\in\mathbb{N}}$ converges to $\|x\|$. This can be done using the majorization provided by formula [1.3]: $|\|x_n\| - \|x\|| \le \|x_n - x\|$, but $\|x_n - x\| \underset{n\to+\infty}{\longrightarrow} 0$, hence $\|x_n\| \underset{n\to+\infty}{\longrightarrow} \|x\|$.
– Continuity of the sum: we must show that if $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ are any two sequences of elements in $V$ which converge to $x$ and $y$, respectively, then the sequence $(x_n + y_n)_{n\in\mathbb{N}}$ converges to $x + y$. To do this, we write:
$$\|(x_n + y_n) - (x + y)\| = \|(x_n - x) + (y_n - y)\| \le \|x_n - x\| + \|y_n - y\| \underset{n\to+\infty}{\longrightarrow} 0$$
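– Continuity of scalar multiplication: we must show that if $(x_n)_{n\in\mathbb{N}}$ and $(k_n)_{n\in\mathbb{N}}$ are any two sequences of elements in $V$ and $\mathbb{K}$, respectively, which converge to $x$ and $k$, respectively, then the product sequence $(k_n x_n)_{n\in\mathbb{N}}$ converges to $kx$. Once again, an algebraic manipulation is involved:
$$\begin{aligned} \|k_n x_n - kx\| &= \|k_n(x_n - x + x) - kx\| = \|k_n(x_n - x) + k_n x - kx\| \\ &= \|k_n(x_n - x) + x(k_n - k)\| \le |k_n|\|x_n - x\| + \|x\||k_n - k| \underset{n\to+\infty}{\longrightarrow} 0 \end{aligned}$$
where the first term tends to 0 because the convergent sequence $(k_n)_{n\in\mathbb{N}}$ is bounded. □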
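The bound used in the proof of the continuity of the inner product can be checked numerically. The sketch below is only an illustration under stated assumptions: it works in $\mathbb{C}^3$ with the standard inner product (linear in the first argument), and the vectors `x`, `y` and the sequences `x_n`, `y_n` are hypothetical examples.

```python
import math

def inner(x, y):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product."""
    return math.sqrt(inner(x, x).real)

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

# Hypothetical limit vectors.
x = [1 + 2j, -1j, 0.5]
y = [2 - 1j, 3, 1j]

def x_n(n):
    """Hypothetical sequence converging to x (n >= 1)."""
    return [c + 1 / n for c in x]

def y_n(n):
    """Hypothetical sequence converging to y (n >= 1)."""
    return [c * (1 + 1j / n) for c in y]

def gap_and_bound(n):
    """Return |<x_n,y_n> - <x,y>| and the Cauchy-Schwarz bound from the proof."""
    xn, yn = x_n(n), y_n(n)
    gap = abs(inner(xn, yn) - inner(x, y))
    dx, dy = norm(sub(xn, x)), norm(sub(yn, y))
    bound = dx * dy + dx * norm(y) + norm(x) * dy
    return gap, bound

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, *gap_and_bound(n))
```

Both printed quantities shrink as $n$ grows, with the gap staying below the bound, which mirrors the majorization used in the proof.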
Let us consider the immediate consequences of this theorem. First, the continuity of the sum and scalar multiplication implies that the difference is also continuous, since $x - y = x + (-1)y$. If $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ are two sequences in $(V, \langle\, ,\rangle)$ which converge to $x$ and $y$, respectively, then the continuity of the inner product and the norm, taken alongside Theorem 4.1, gives us the following formulas that will be used later:
$$\lim_{n\to+\infty} \langle x_n, y_n\rangle = \left\langle \lim_{n\to+\infty} x_n, \lim_{n\to+\infty} y_n \right\rangle = \langle x, y\rangle \qquad [4.1]$$
$$\lim_{n\to+\infty} \|x_n\| = \left\| \lim_{n\to+\infty} x_n \right\| = \|x\| \qquad [4.2]$$
The case of series needs to be considered separately. First of all, let us recall the definitions of a series and of a convergent series.
DEFINITION 4.6.– Given a sequence of vectors $(x_n)_{n\in\mathbb{N}} \subset V$, the series of general term $x_n$ is the sequence of partial sums $(S_n)_{n\in\mathbb{N}}$, where $S_n = \sum_{k=0}^{n} x_k$, and we write:
$$\sum_{n\in\mathbb{N}} x_n = \sum_{n=0}^{\infty} x_n = (S_n)_{n\in\mathbb{N}}$$
The series $\sum_{n\in\mathbb{N}} x_n$ is said to be convergent, or convergent in norm $\|\ \|$, to the sum $x$ if the sequence of partial sums $(S_n)_{n\in\mathbb{N}}$ is convergent to $x$, that is:
$$x = \lim_{n\to+\infty} \sum_{k=0}^{n} x_k = \sum_{n\in\mathbb{N}} x_n \iff \lim_{n\to+\infty} \|S_n - x\| = 0 \iff \|S_n - x\| \underset{n\to+\infty}{\longrightarrow} 0$$
The series $\sum_{n\in\mathbb{N}} x_n$ is said to be absolutely convergent$^2$ if the sequence of the partial sums of the norms, that is $\left(\sum_{k=0}^{n} \|x_k\|\right)_{n\in\mathbb{N}}$, is convergent. In this case, we write: $\sum_{n\in\mathbb{N}} \|x_n\| < +\infty$.
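A small numerical sketch may help to fix the two notions of convergence just defined. It uses a hypothetical series in $\mathbb{R}^2$ with general term $x_k = (2^{-k}, 3^{-k})$, whose sum $(2, 3/2)$ is known in closed form from the geometric series; all names in the code are illustrative choices.

```python
from fractions import Fraction
import math

def term(k):
    """General term x_k = (2^-k, 3^-k) of a hypothetical series in R^2."""
    return (Fraction(1, 2**k), Fraction(1, 3**k))

def partial_sum(n):
    """S_n = x_0 + ... + x_n, computed exactly with rationals."""
    s = (Fraction(0), Fraction(0))
    for k in range(n + 1):
        t = term(k)
        s = (s[0] + t[0], s[1] + t[1])
    return s

def norm(v):
    """Euclidean norm on R^2."""
    return math.sqrt(float(v[0]) ** 2 + float(v[1]) ** 2)

total = (Fraction(2), Fraction(3, 2))  # closed-form sums of the two geometric series

def error(n):
    """||S_n - x||, which should tend to 0 for a convergent series."""
    s = partial_sum(n)
    return norm((s[0] - total[0], s[1] - total[1]))

def norm_partial_sum(n):
    """Partial sum of the norms, used to test absolute convergence."""
    return sum(norm(term(k)) for k in range(n + 1))

if __name__ == "__main__":
    for n in (0, 5, 10, 20):
        print(n, error(n), norm_partial_sum(n))
```

In this example the errors decrease toward 0 (convergence in norm) while the partial sums of the norms stay bounded and increasing, hence convergent (absolute convergence).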
We observe that, since $S_n - x = \sum_{k=0}^{n} x_k - \sum_{k=0}^{\infty} x_k = -\sum_{k=n+1}^{\infty} x_k$, then:
$$\|S_n - x\| = \left\| \sum_{k=n+1}^{\infty} x_k \right\| \qquad [4.3]$$
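hence the explicit definition of a convergent series in a normed vector space is:
$$\forall \varepsilon > 0\ \exists N_\varepsilon > 0 : n \ge N_\varepsilon \implies \left\| \sum_{k=n+1}^{\infty} x_k \right\| < \varepsilon$$
Given convergent series $\sum_{n\in\mathbb{N}} x_n$, $\sum_{m\in\mathbb{N}} y_m$, the fact that a series is the sequence of its partial sums means that we can write:
$$\left\langle \sum_{n\in\mathbb{N}} x_n, \sum_{m\in\mathbb{N}} y_m \right\rangle = \left\langle \lim_{N\to+\infty} \sum_{n=0}^{N} x_n, \lim_{K\to+\infty} \sum_{m=0}^{K} y_m \right\rangle = \lim_{N,K\to+\infty} \sum_{n=0}^{N} \sum_{m=0}^{K} \langle x_n, y_m\rangle \qquad [4.4]$$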
2 The absolute convergence defined here becomes the normal convergence for the modulus of $S_n$ when $V = \mathbb{R}$ or $V = \mathbb{C}$.
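and:
$$\left\| \sum_{n\in\mathbb{N}} x_n \right\| = \left\| \lim_{N\to+\infty} \sum_{n=0}^{N} x_n \right\| = \lim_{N\to+\infty} \left\| \sum_{n=0}^{N} x_n \right\| \qquad [4.5]$$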
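Squaring the members of equation [4.5], we obtain:
$$\left\| \sum_{n\in\mathbb{N}} x_n \right\|^2 = \left\| \lim_{N\to+\infty} \sum_{n=0}^{N} x_n \right\|^2 = \left( \lim_{N\to+\infty} \left\| \sum_{n=0}^{N} x_n \right\| \right)^2 = \lim_{N\to+\infty} \left\| \sum_{n=0}^{N} x_n \right\|^2$$
having used the fact that $\left\| \sum_{n=0}^{N} x_n \right\| \in \mathbb{R}$ and the continuity of the square operation in $\mathbb{R}$ to exchange the limit with the square.
If we consider an orthogonal family of vectors $(u_n)_{n\in\mathbb{N}}$ in place of an arbitrary sequence $(x_n)_{n\in\mathbb{N}}$, the generalized Pythagorean theorem (Theorem 1.8) can be used to write $\lim_{N\to+\infty} \left\| \sum_{n=0}^{N} u_n \right\|^2 = \lim_{N\to+\infty} \sum_{n=0}^{N} \|u_n\|^2 = \sum_{n\in\mathbb{N}} \|u_n\|^2$, giving the following very helpful formula:
$$\left\| \sum_{n\in\mathbb{N}} u_n \right\|^2 = \sum_{n\in\mathbb{N}} \|u_n\|^2, \qquad (u_n)_{n\in\mathbb{N}} \text{: orthogonal family of vectors} \qquad [4.6]$$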
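The role of orthogonality in formula [4.6] can be seen on finite partial sums. The following sketch compares an orthogonal family with a non-orthogonal one in $\mathbb{R}^6$; both families (`orthogonal`, `collinear`) and the dimension `N` are hypothetical choices, and exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction

def sq_norm(v):
    """Squared Euclidean norm of a real vector."""
    return sum(c * c for c in v)

def vec_sum(vectors, dim):
    """Component-wise sum of a list of vectors of length dim."""
    total = [Fraction(0)] * dim
    for v in vectors:
        total = [t + c for t, c in zip(total, v)]
    return total

N = 6  # number of vectors and ambient dimension (illustrative choice)

# Orthogonal family: u_n = 2^-n e_n, with e_n the n-th canonical basis vector.
orthogonal = [[Fraction(1, 2**n) if i == n else Fraction(0) for i in range(N)]
              for n in range(N)]

# Non-orthogonal family: x_n = 2^-n e_0, all vectors on the same line.
collinear = [[Fraction(1, 2**n) if i == 0 else Fraction(0) for i in range(N)]
             for n in range(N)]

lhs_orth = sq_norm(vec_sum(orthogonal, N))      # || sum u_n ||^2
rhs_orth = sum(sq_norm(u) for u in orthogonal)  # sum || u_n ||^2
lhs_col = sq_norm(vec_sum(collinear, N))
rhs_col = sum(sq_norm(x) for x in collinear)

if __name__ == "__main__":
    print("orthogonal:", lhs_orth, rhs_orth)
    print("collinear: ", lhs_col, rhs_col)
```

The two sides agree for the orthogonal family and differ for the collinear one, in line with the warning that the formula relies on orthogonality.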
Formula [4.6] will be used extensively in Chapter 5. It is important to note that this formula does not generally hold if $(u_n)_{n\in\mathbb{N}}$ is not an orthogonal system of vectors, or if we consider the norm rather than its square.
The possibility to exchange the limit with the inner product and norm operations is crucial for proving many of the theorems that we shall see later. This consideration emphasizes the importance of the compatibility of the linear and topological structures in an inner product space.
The result below is a first example of the usefulness of the continuity of the norm. In Chapter 1, we saw that the parallelogram law can be used to characterize the norms generated by an inner product, that is, Hilbertian norms. We now have all of the tools we need to formalize this affirmation, which Yosida (1995) refers to as the Fréchet-von Neumann-Jordan theorem.
THEOREM 4.3 (Fréchet-von Neumann-Jordan theorem).– Let $V$ be a vector space on $\mathbb{K}$ (of finite or infinite dimension) and let $\|\ \|$ be a norm on $V$. $\|\ \|$ is a Hilbertian norm if and only if it satisfies the parallelogram law.
If the norm satisfies the parallelogram law, then the inner product from which it is induced is necessarily determined by the polarization formulas for real and complex cases, respectively:
$$\langle v, w\rangle = \frac{1}{4}\left(\|v + w\|^2 - \|v - w\|^2\right)$$
$$\langle v, w\rangle = \frac{1}{4}\left[\|v + w\|^2 - \|v - w\|^2 + i\left(\|v + iw\|^2 - \|v - iw\|^2\right)\right]$$
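The statement can be explored numerically before turning to the proof. The sketch below, written under the assumption that the inner product is linear in its first argument, applies the complex polarization formula to the Euclidean norm on $\mathbb{C}^2$ and measures the parallelogram defect $\|v+w\|^2 + \|v-w\|^2 - 2\|v\|^2 - 2\|w\|^2$ for the Euclidean norm and for the 1-norm; the helper names and the sample vectors are hypothetical.

```python
import math

def inner(v, w):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def norm2(v):
    """Euclidean norm (induced by the standard inner product)."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))

def norm1(v):
    """1-norm: a norm that is not induced by any inner product."""
    return sum(abs(c) for c in v)

def add(v, w, s=1):
    """Return v + s*w component-wise."""
    return [a + s * b for a, b in zip(v, w)]

def parallelogram_defect(norm, v, w):
    """||v+w||^2 + ||v-w||^2 - 2||v||^2 - 2||w||^2 (zero when the law holds for v, w)."""
    return (norm(add(v, w)) ** 2 + norm(add(v, w, -1)) ** 2
            - 2 * norm(v) ** 2 - 2 * norm(w) ** 2)

def polarization(norm, v, w):
    """Complex polarization formula applied to an arbitrary norm."""
    real = norm(add(v, w)) ** 2 - norm(add(v, w, -1)) ** 2
    imag = norm(add(v, w, 1j)) ** 2 - norm(add(v, w, -1j)) ** 2
    return (real + 1j * imag) / 4

v = [1 + 2j, -1j]
w = [0.5, 3 - 1j]

if __name__ == "__main__":
    print("inner product:          ", inner(v, w))
    print("polarization (2-norm):  ", polarization(norm2, v, w))
    print("defect, 2-norm:         ", parallelogram_defect(norm2, v, w))
    print("defect, 1-norm on e1,e2:", parallelogram_defect(norm1, [1, 0], [0, 1]))
```

For the Euclidean norm the polarization formula recovers the inner product and the defect vanishes up to rounding, whereas the 1-norm already violates the parallelogram law on the two canonical basis vectors.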
PROOF.– The direct implication is obvious, so we only need to prove the reverse implication, that is, if a norm $\|\ \|$ satisfies the parallelogram law, then it is induced by an inner product in the canonical manner: $\|\cdot\| = \sqrt{\langle \cdot, \cdot\rangle}$.
Let us begin by considering the real case. If an inner product exists which induces the norm, then it must take the following form:
$$p(v, w) = \frac{1}{4}\left(\|v + w\|^2 - \|v - w\|^2\right), \quad \forall v, w \in V$$
Note that $p$ is a composition of algebraic functions (sum and difference), the norm and its squared power, all of which are continuous functions; $p$ itself is thus a continuous function of its arguments. The next step is to verify that this definition satisfies the defining properties of a real inner product.
First, we note that the symmetry $p(v, w) = p(w, v)$ is obvious, as is definite positiveness, given that $p(v, v) = \frac{1}{4}\|2v\|^2 = \|v\|^2 \ge 0$ and $p(v, v) = 0$ if and only if $v = 0_V$.
Second, we must verify bilinearity. Given that the symmetry condition is satisfied, any property of $p$ which is demonstrated with respect to the first argument also holds for the second argument, meaning that we can focus on the first entry of $p$. Using the parallelogram law, we can write $\forall v, w, z \in V$:
$$\|(v + z) + w\|^2 + \|(v - z) + w\|^2 = \|(v + w) + z\|^2 + \|(v + w) - z\|^2 = 2\|v + w\|^2 + 2\|z\|^2$$
and:
$$\|(v + z) - w\|^2 + \|(v - z) - w\|^2 = \|(v - w) + z\|^2 + \|(v - w) - z\|^2 = 2\|v - w\|^2 + 2\|z\|^2$$
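thus:
$$\begin{aligned} p(v + z, w) + p(v - z, w) &= \frac{1}{4}\left(\|v + z + w\|^2 - \|v + z - w\|^2 + \|v - z + w\|^2 - \|v - z - w\|^2\right) \\ &= \frac{1}{4}\left[2\left(\|v + w\|^2 + \|z\|^2\right) - 2\left(\|v - w\|^2 + \|z\|^2\right)\right] \\ &= 2 \cdot \frac{1}{4}\left(\|v + w\|^2 - \|v - w\|^2\right) = 2p(v, w) \end{aligned} \qquad [4.7]$$
Taking $v = z$ and noting that $p(v - v, w) = p(0_V, w) = 0$, we obtain from [4.7] that $p(v + v, w) + p(v - v, w) = p(2v, w) = 2p(v, w)$ $\forall v, w \in V$, that is:
$$2p(v, w) = p(2v, w), \quad \forall v, w \in V \qquad [4.8]$$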
Now, take $v_1, v_2 \in V$ such that $v = \frac{1}{2}(v_1 + v_2)$ and $z = \frac{1}{2}(v_1 - v_2)$, thus $v + z = v_1$ and $v - z = v_2$, then:
$$p(v_1, w) + p(v_2, w) = p(v + z, w) + p(v - z, w) \underset{[4.7]}{=} 2p(v, w) \underset{[4.8]}{=} p(2v, w) \underset{(\text{def. } v)}{=} p(v_1 + v_2, w)$$
Since $v, w, z$ are arbitrary vectors, $v_1, v_2$ are also arbitrary; therefore, the demonstration that $p(v_1 + v_2, w) = p(v_1, w) + p(v_2, w)$ proves the additivity of $p$.
Now, let us prove the property of homogeneity. We start by observing that if the reasoning which gave us $p(2v, w) = 2p(v, w)$ is iterated $n \in \mathbb{N}$ times, we obtain $p(nv, w) = np(v, w)$. Furthermore, for all $m \in \mathbb{N}$, $m \ne 0$, it not only holds that $p(v, w) = p\left(m\frac{v}{m}, w\right)$, but also $p\left(m\frac{v}{m}, w\right) = mp\left(\frac{v}{m}, w\right)$, so that $p\left(\frac{v}{m}, w\right) = \frac{1}{m}p(v, w)$; combined with the formula $p(nv, w) = np(v, w)$, this gives us $p\left(\frac{n}{m}v, w\right) = \frac{n}{m}p(v, w)$ $\forall n, m \in \mathbb{N}$, $m \ne 0$, that is, $p$ is homogeneous with respect to any number $r \in \mathbb{Q}$, $r \ge 0$: $p(rv, w) = rp(v, w)$.
In order to extend this homogeneity to all rational numbers, we use the argument that if $r < 0$, then, by rewriting $rv = -|r|v = |r|(-v)$, we obtain:
$$\begin{aligned} rp(v, w) - p(rv, w) &= rp(v, w) - p(|r|(-v), w) = rp(v, w) - |r|p(-v, w) \\ &= rp(v, w) + rp(-v, w) = r\left(p(v, w) + p(-v, w)\right) \\ &\underset{(\text{additivity})}{=} rp(v - v, w) = rp(0_V, w) = 0 \end{aligned}$$
Hence, the property of homogeneity also holds for negative rational numbers, and thus for all rational numbers. Now, using the fact that $\mathbb{Q}$ is dense in $\mathbb{R}$, we know that
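for all $\alpha \in \mathbb{R}$ there exists a sequence of rational numbers $(r_n)_{n\in\mathbb{N}} \subset \mathbb{Q}$ such that $r_n \underset{n\to+\infty}{\longrightarrow} \alpha$. By the continuity of $p$, we have:
$$\alpha p(v, w) = \lim_{n\to+\infty} r_n p(v, w) = p\left(\lim_{n\to+\infty} r_n v, w\right) = p(\alpha v, w), \quad \forall \alpha \in \mathbb{R},\ v, w \in V$$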
In summary, $p$ is an inner product on $V$ which is compatible with its norm if $\mathbb{K} = \mathbb{R}$.
Now, let us consider the complex case: $\mathbb{K} = \mathbb{C}$. As we saw in the real case, if there is an inner product which induces the norm, it must take the following form:
$$\tilde{p}(v, w) = \frac{1}{4}\left[\|v + w\|^2 - \|v - w\|^2 + i\left(\|v + iw\|^2 - \|v - iw\|^2\right)\right] = p(v, w) + ip(v, iw), \quad \forall v, w \in V$$
From the observations presented in section 1.1, to prove that $\tilde{p}$ is a complex inner product, we must simply verify the Hermitian property, that is, $\tilde{p}(v, w) = \overline{\tilde{p}(w, v)}$, since the linearity in the first variable and the definite positiveness of $p$ imply that these properties also hold for $\tilde{p}$.
$\tilde{p}$ is a Hermitian form if and only if $\tilde{p}(w, v) = \overline{\tilde{p}(v, w)} = p(v, w) - ip(v, iw)$, since $p(v, w)$ and $p(v, iw) \in \mathbb{R}$. Furthermore, by definition, $\tilde{p}(w, v) = p(w, v) + ip(w, iv) = p(v, w) + ip(w, iv)$, given that $p(v, w) = p(w, v)$. Comparing the formulas:
$$\tilde{p}(w, v) = \overline{\tilde{p}(v, w)} = p(v, w) - ip(v, iw) \quad \text{and} \quad \tilde{p}(w, v) = p(v, w) + ip(w, iv)$$
we see that $\tilde{p}$ is a Hermitian form if and only if $p(v, iw) = -p(w, iv)$ $\forall v, w \in V$. Now, we calculate:
$$\begin{aligned} p(v, iw) &= \frac{1}{4}\left(\|v + iw\|^2 - \|v - iw\|^2\right) = \frac{1}{4}\left(|i|^2\|v + iw\|^2 - |i|^2\|v - iw\|^2\right) \\ &= \frac{1}{4}\left(\|iv - w\|^2 - \|iv + w\|^2\right) = -\frac{1}{4}\left(\|w + iv\|^2 - \|w - iv\|^2\right) = -p(w, iv) \end{aligned}$$
using the fact that $\|w - iv\| = \|iv - w\|$. In short, $\tilde{p}$ is the inner product associated with our norm in the complex case. □
The mathematical object defined below is crucial in mathematics.
DEFINITION 4.7 (Topological vector space).– A topological vector space (T.V.S.) is a vector space $V$ with a topology which is compatible with the linear structure of $V$, that is, such that the linear operations of sum and scalar multiplication are continuous functions.
The continuity of fundamental operations in an inner product space implies that these spaces are always T.V.S. The same can be said of normed vector spaces; the continuity of linear operations is proved in exactly the same way.
In terms of topological arguments, there is no difference between an inner product space and a normed vector space, as the norm is the mathematical object used to prove continuity in both cases. The major difference between an inner product space and a normed vector space is related to the underlying geometric structure of the space itself, which is much richer in the former case.
4.2.1. Equivalence of separated topologies in finite-dimension vector spaces
The dimension of the vector space played no part in the proof of Theorem 4.2, so the considerations presented in the previous section hold true for any vector space, whether of finite or infinite dimension. In finite dimension, however, the (separated) topology of a T.V.S. can be guaranteed to be essentially unique.
THEOREM 4.4 (Tychonoff).– Let $V$ be a separated T.V.S. of finite dimension $n$ on the field $\mathbb{K}$. Given an arbitrary fixed basis $B = (b_1, \ldots, b_n)$ in $V$, the linear isomorphism defined by:
$$I : V \longrightarrow \mathbb{K}^n, \qquad x = \sum_{i=1}^{n} x_i b_i \longmapsto [x]_B = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$
is a homeomorphism (or topological isomorphism), that is, a bicontinuous map (continuous, invertible, and with a continuous inverse), considering the usual Euclidean topology on $\mathbb{K}^n$.
As we have seen, all inner product spaces, and more generally all normed vector spaces, are separated T.V.S., so one immediate consequence of Tychonoff’s theorem is that all inner products, norms and distances which can be defined on a finite-dimensional vector space are topologically equivalent, that is, they generate the same topology,
which, up to an isomorphism, is the Euclidean topology. This does not hold for infinite dimensions, as shown by a number of counter-examples.
The simplest example of topological independence with respect to the choice of a norm in finite dimension concerns vector spaces of dimension 1, as we see from the result below.
THEOREM 4.5.– If $V$ is a normed, one-dimensional vector space on the field $\mathbb{K}$, any two norms defined over $V$ are multiples of each other by a real, strictly positive scalar.
PROOF.– Let $\|\ \|_1$, $\|\ \|_2$ be two norms on $V$. By definition, $\|0_V\|_1 = \|0_V\|_2 = 0$; so we just concentrate on an arbitrary $v \in V$ different from the null vector. Let $\|v\|_1 = k_1$ and $\|v\|_2 = k_2$, then we can write $\frac{\|v\|_1}{\|v\|_2} = \frac{k_1}{k_2} = k \in \mathbb{R}^+$, and thus $\|v\|_1 = k\|v\|_2$. Since $V$ is of dimension 1, for any other vector $w \in V$ there exists $\lambda \in \mathbb{K}$ such that $w = \lambda v$. Thus, by the homogeneity of the norm, we can write:
$$\|w\|_1 = \|\lambda v\|_1 = |\lambda|\|v\|_1 = |\lambda|k\|v\|_2 = k\|\lambda v\|_2 = k\|w\|_2$$
that is, for any pair of norms $\|\ \|_1$, $\|\ \|_2$ on $V$, there exists a constant $k \in \mathbb{R}^+$ such that $\|w\|_1 = k\|w\|_2$ for all $w \in V$. □
4.3. Cauchy sequences and completeness: Banach and Hilbert
Mathematicians working in the late 19th and early 20th centuries showed that the infinite-dimensional metric, normed and inner product vector spaces which are most “similar” to finite-dimensional Euclidean spaces can be characterized using a relatively simple property: converging sequences can be identified with Cauchy sequences.
DEFINITION 4.8.– Given a generic metric space $(X, d)$, a sequence $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence if:
$$\forall \varepsilon > 0\ \exists N_\varepsilon > 0 : n, m \ge N_\varepsilon \implies d(x_n, x_m) < \varepsilon$$
that is, the elements in the sequence become arbitrarily close to each other as their indices increase, in other words, as the sequence progresses. $(X, d)$ is said to be a complete metric space if all Cauchy sequences converge to limits contained within $X$.
We shall see many examples of complete metric spaces in this chapter.
Simple examples of non-complete metric spaces can be built by using the following basic result concerning Cauchy sequences.
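THEOREM 4.6.– Any convergent sequence in a metric space is necessarily a Cauchy sequence.
PROOF.– If $x_n \underset{n\to+\infty}{\longrightarrow} \bar{x}$, then, by the arbitrary nature of $\varepsilon$ and the triangular inequality:
$$\forall \varepsilon > 0\ \exists N_\varepsilon > 0 : n, m \ge N_\varepsilon \implies d(x_n, x_m) \le d(x_n, \bar{x}) + d(\bar{x}, x_m) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$
□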
Using this result, we can prove that the metric spaces$^3$ $(\mathbb{Q}, |\ |)$ and $((0, 1), |\ |)$ are not complete.
To verify that $(\mathbb{Q}, |\ |)$ is not complete, consider the sequence $\left(\left(1 + \frac{1}{n}\right)^n\right)_{n\ge 1}$: this sequence is rational, since $\mathbb{Q}$ is stable with respect to sum, division and integer power operations and to their composition. Furthermore, the sequence is known to converge to $e$, the base of natural logarithms, so, by Theorem 4.6, it is a Cauchy sequence in $\mathbb{Q}$, interpreted as a subset of $\mathbb{R}$. However, $e$ is an irrational number, that is $e \in \mathbb{R} \setminus \mathbb{Q}$, implying the existence of at least one Cauchy sequence in $\mathbb{Q}$ which converges outside of $\mathbb{Q}$ itself.
Similarly, in $((0, 1), |\ |)$, consider the sequence $\left(\frac{1}{n}\right)_{n\ge 2}$; this is evidently contained within $(0, 1)$ and converges to 0, making it a Cauchy sequence in $(0, 1) \subset \mathbb{R}$, but $0 \notin (0, 1)$.
3 Remember that $\mathbb{Q}$ is not a real or complex vector space, as it is not stable with regard to its product by a real or complex scalar; thus, Tychonoff’s theorem cannot be applied for $\mathbb{Q}$.
Now, let us consider the relationship between complete and closed metric spaces.
THEOREM 4.7.– If $(X, d)$ is a complete metric space and $(E, d)$, $E \subseteq X$, a closed metric subspace of $X$, then $(E, d)$ is complete.
PROOF.– Let $(x_n)_{n\in\mathbb{N}} \subseteq E$ be a Cauchy sequence. Since $E \subseteq X$, we have that $(x_n)_{n\in\mathbb{N}} \subseteq X$, and thus, since $X$ is complete, $(x_n)_{n\in\mathbb{N}}$ converges to a limit $x \in X$. However, the limits of sequences in $E$ belong to $\overline{E}$, and, since $E$ is closed, $\overline{E} = E$, hence $x \in E$, that is, all Cauchy sequences of elements of $E$ converge in $E$ itself. □
THEOREM 4.8.– If $(X, d)$ is an arbitrary metric space and $(E, d)$, $E \subseteq X$, is a complete metric subspace of $X$, then $(E, d)$ is closed.
PROOF.– Taking $x \in \overline{E}$, there exists a sequence $(x_n)_{n\in\mathbb{N}} \subseteq E$ which converges to $x$. Given that the sequence converges, it is a Cauchy sequence in $E$. As $E$ is complete,
$(x_n)_{n\in\mathbb{N}}$ must converge to an element $y \in E$. By uniqueness of the limit, $x = y$ and thus $x \in E$, that is $\overline{E} = E$. □
An inner product vector space, or a normed vector space, is also a metric vector space; consequently, the definition of a Cauchy sequence can be rewritten as:
$$\forall \varepsilon > 0\ \exists N_\varepsilon > 0 : n, m \ge N_\varepsilon \implies \|x_n - x_m\| < \varepsilon$$
Some authors use an even shorter form:
$$\lim_{n,m\to+\infty} \|x_n - x_m\| = 0$$
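The rational sequence $\left(1 + \frac{1}{n}\right)^n$ used above as an example of a Cauchy sequence in $\mathbb{Q}$ without a rational limit can be examined with exact arithmetic. The sketch below is only indicative: the function name `a` and the sampled indices are arbitrary choices, and a finite sample cannot by itself establish the Cauchy property.

```python
from fractions import Fraction
import math

def a(n):
    """n-th term (1 + 1/n)^n as an exact rational number (n >= 1)."""
    return (1 + Fraction(1, n)) ** n

# A few sampled terms; every one of them is an element of Q.
samples = {n: a(n) for n in (10, 100, 1000)}

# Distances between sampled terms shrink as the indices grow.
gap_small = abs(samples[100] - samples[10])
gap_large = abs(samples[1000] - samples[100])

if __name__ == "__main__":
    for n, value in samples.items():
        print(n, float(value), abs(float(value) - math.e))
    print(float(gap_small), float(gap_large))
```

Every term is a `Fraction`, so the sequence lives in $\mathbb{Q}$, while the float approximations approach `math.e`, whose exact value is irrational.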
A standard result of Calculus guarantees that $(\mathbb{R}^n, \|\ \|)$ and $(\mathbb{C}^n, \|\ \|)$ are complete metric spaces for all finite $n \in \mathbb{N}$. Using Tychonoff’s theorem (Theorem 4.4), we know that real or complex separated topological vector spaces of finite dimension $n$ are topologically equivalent to the Euclidean spaces $\mathbb{R}^n$ or $\mathbb{C}^n$, respectively; it follows that completeness is never a problem for pre-Hilbert vector spaces (or normed spaces) of finite dimension: converging sequences in these spaces are all, and only, Cauchy sequences.
If the dimension of the vector space is not finite, then while it remains true that convergent sequences are necessarily Cauchy sequences, the converse is not always true. For this reason, we shall introduce a definition to characterize spaces in which the Cauchy condition is necessary and sufficient for convergence$^4$.
DEFINITION 4.9 (Hilbert and Banach spaces).– Let $V$ be a vector space of finite or infinite dimension.
– If $(V, \|\ \|)$ is complete, then it is called a Banach space.
– If $(V, \langle\, ,\rangle)$ is complete, then it is called a Hilbert space.
One consequence of Tychonoff’s theorem is that real or complex normed vector spaces of finite dimension are all Banach spaces, while real or complex inner product spaces of finite dimension are all Hilbert spaces.
Finite or infinite-dimensional Hilbert spaces are also Banach spaces, due to the fact that they are normed and complete vector spaces; the converse is not generally true, as the existence of an inner product inducing the norm of a Banach space is guaranteed if and only if the parallelogram law holds.
Two results related to Cauchy sequences are presented below. These will be extremely useful in what follows. Before proving them, we recall that a sequence in a metric space is said to be bounded if all elements of the sequence fall within a finite neighborhood of one element of the space, as described in Definition 4.10.
4 One can also consider Fréchet spaces: locally convex topological vector spaces which are complete with respect to a shift-invariant metric.
DEFINITION 4.10.– A sequence $(x_n)_{n\in\mathbb{N}}$ in a metric space $(X, d)$ is said to be bounded if there exist $x^* \in X$ and $M \ge 0$ such that $d(x_n, x^*) \le M$ $\forall n \in \mathbb{N}$.

THEOREM 4.9.– All Cauchy sequences are bounded.

PROOF.– Fix an arbitrary $\varepsilon > 0$. By definition, if $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence, there exists $N_\varepsilon > 0$ such that the distance between $x_{N_\varepsilon}$ and all elements $x_n$ of the sequence with $n \ge N_\varepsilon$ is less than $\varepsilon$, that is $d(x_n, x_{N_\varepsilon}) < \varepsilon$ $\forall n \ge N_\varepsilon$. $x_{N_\varepsilon}$ is thus a good candidate to take the place of $x^*$ in the definition of a bounded sequence. To prove this, we note that the elements of the sequence corresponding to an index value $n$ lower than $N_\varepsilon$ are finite in number and belong to $X$, thus their distance from $x_{N_\varepsilon}$ is finite, and we can define the following value:

$$r = \max\{d(x_{N_\varepsilon}, x_0), d(x_{N_\varepsilon}, x_1), \ldots, d(x_{N_\varepsilon}, x_{N_\varepsilon - 1})\}$$

Now, defining $M = \max\{\varepsilon, r\}$, we obtain $d(x_n, x_{N_\varepsilon}) \le M$ $\forall n \in \mathbb{N}$. □
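The construction of the bound $M = \max\{\varepsilon, r\}$ in the proof can be illustrated numerically. The following is a minimal sketch, assuming the example sequence $x_n = \sum_{k=0}^{n} 2^{-k}$ in $(\mathbb{R}, |\cdot|)$, which is not taken from the text; the helper name `cauchy_bound` is hypothetical, and the check only covers finitely many terms.

```python
def cauchy_bound(xs, n_eps, eps):
    """Return M = max(eps, r), where r is the largest distance between
    xs[n_eps] and the finitely many earlier terms xs[0..n_eps-1]."""
    pivot = xs[n_eps]
    r = max((abs(pivot - x) for x in xs[:n_eps]), default=0.0)
    return max(eps, r)


# Assumed example: x_n = sum_{k=0}^{n} 2^{-k} = 2 - 2^{-n}, a Cauchy sequence in R.
xs = []
total = 0.0
for k in range(60):
    total += 0.5 ** k
    xs.append(total)

eps = 1e-3
# Here |x_n - x_m| < 2^{-min(n, m)}, so N_eps = 10 is admissible for eps = 1e-3.
n_eps = 10
M = cauchy_bound(xs, n_eps, eps)

# Every computed term lies within distance M of the pivot x_{N_eps}.
within = all(abs(x - xs[n_eps]) <= M for x in xs)
print(M, within)
```

In this example the bound is attained by the early terms (the value $r$), not by $\varepsilon$, which is the typical situation.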
The second result relates to subsequences.

DEFINITION 4.11.– Let $(x_n)_{n\in\mathbb{N}}$ be a sequence in a metric space $(X, d)$ and let $\varphi : \mathbb{N} \to \mathbb{N}$ be a strictly increasing function, that is $\varphi(n+1) > \varphi(n)$ for all $n \in \mathbb{N}$. The sequence defined by $(x_{\varphi(n)})_{n\in\mathbb{N}}$ is a subsequence of the initial sequence $(x_n)_{n\in\mathbb{N}}$.

As a very simple exercise, readers are invited to prove that, if a sequence $(x_n)_{n\in\mathbb{N}}$ in a metric space $(X, d)$ is convergent, then all of its subsequences also converge, and converge to the same limit. The following important result shows that, for Cauchy sequences, the order of this implication can be reversed.

THEOREM 4.10.– Any Cauchy sequence in a metric space $(X, d)$ which possesses at least one convergent subsequence is itself convergent to the same limit.

PROOF.– Let $(x_n)_{n\in\mathbb{N}}$ be a Cauchy sequence in $(X, d)$ which admits a convergent subsequence $(x_{\varphi(n)})_{n\in\mathbb{N}}$, where $\varphi : \mathbb{N} \to \mathbb{N}$ is the strictly increasing map which defines this subsequence. Let $a$ be the limit of the subsequence, that is $a = \lim_{n\to+\infty} x_{\varphi(n)}$.

For all $n \in \mathbb{N}$, by the triangular inequality, we have $d(x_n, a) \le d(x_n, x_{\varphi(n)}) + d(x_{\varphi(n)}, a)$; if we can bound both terms on the right by an arbitrarily small quantity $\varepsilon$, then the thesis of the theorem will be proven.
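To show that this is possible, we shall use the definition of a Cauchy sequence for $(x_n)_{n\in\mathbb{N}}$ to write:

$$\forall \varepsilon > 0 \ \exists N_\varepsilon \in \mathbb{N} \text{ such that } m, n \ge N_\varepsilon \implies d(x_m, x_n) < \frac{\varepsilon}{2}$$

but, as $\varphi$ is strictly increasing, $\varphi(n) \ge n \ge N_\varepsilon$, hence $d(x_n, x_{\varphi(n)}) < \frac{\varepsilon}{2}$. Since the subsequence $(x_{\varphi(n)})_{n\in\mathbb{N}}$ is presumed to converge to $a$, this implies that:

$$\forall \varepsilon > 0 \ \exists K_\varepsilon \in \mathbb{N} \text{ such that } n \ge K_\varepsilon \implies d(x_{\varphi(n)}, a) < \frac{\varepsilon}{2}$$

and, by considering $n \ge \max\{N_\varepsilon, K_\varepsilon\}$, we obtain

$$d(x_n, a) \le d(x_n, x_{\varphi(n)}) + d(x_{\varphi(n)}, a) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \quad \forall \varepsilon > 0,$$

that is $x_n \to a$ as $n \to +\infty$. □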
This theorem has notable applications in pure and applied mathematics. We shall see a theoretical use in the next section; here we mention its usefulness in optimization, where one seeks to identify the optimal solution to a problem by minimizing an appropriate function. In many cases, the function is too complicated for an analytical description of its minima to be possible, so the solution must be approximated using an iterative algorithm: in this way, a minimum point is attained after passing through a sequence of points. Theorem 4.10 is often used to demonstrate that the iterative algorithm converges, by proving that the sequence of points defined by the algorithm is a Cauchy sequence and that it admits a (wisely chosen) convergent subsequence.

4.3.1. Completeness of vector spaces

Metric vector spaces can always be completed in an essentially unique way to complete spaces, as Theorem 4.11 establishes.

THEOREM 4.11 (Completion of a non-complete metric vector space).– If $(V, d)$ is a non-complete metric vector space, then there exists a complete metric vector space $(\hat{V}, \hat{d})$ and an isometric injective function $\iota : V \to \hat{V}$, that is:

$$\begin{cases} x_1, x_2 \in V,\ x_1 \ne x_2 \implies \iota(x_1) \ne \iota(x_2) \\ \forall x_1, x_2 \in V,\ d(x_1, x_2) = \hat{d}(\iota(x_1), \iota(x_2)) \end{cases}$$

such that $\overline{\iota(V)} = \hat{V}$, that is the image of $V$ via $\iota$ is dense in $\hat{V}$.

COROLLARY 4.1.– Any pre-Hilbert space $V$ can be completed to a Hilbert space $H$.
PROOF.– This proof will focus on the case of pre-Hilbert spaces, which is most relevant for our purposes. The general proof follows a similar approach, except for the fact that the norm of the difference between two vectors is replaced by their distance.

The completion of a pre-Hilbert space $V$ is built from the space $H_1$ of all the Cauchy sequences $(x_n)_{n\in\mathbb{N}}$ of elements in $V$, taken modulo the equivalence relation $\sim$ defined as follows: two Cauchy sequences $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ of elements in $V$ are equivalent if $\lim_{n\to+\infty} \|x_n - y_n\| = 0$.

The completion of $V$ is written as $H = H_1 / \sim$, and its elements are noted $[x]$. We define a norm on $H$ as follows:

$$\forall [x] \in H, \quad \|[x]\| = \lim_{n\to+\infty} \|x_n\|$$

where $(x_n)_{n\in\mathbb{N}}$ is any Cauchy sequence in the equivalence class $[x]$. The limit exists because $|\,\|x_n\| - \|x_m\|\,| \le \|x_n - x_m\|$, so $(\|x_n\|)_{n\in\mathbb{N}}$ is a Cauchy sequence in $\mathbb{R}$. This definition does not depend on the choice of the Cauchy sequence used to represent the equivalence class, since, given that $|\,\|x_n\| - \|y_n\|\,| \le \|x_n - y_n\|$, if $(y_n)_{n\in\mathbb{N}} \in [x]$, then at the limit we have:

$$\lim_{n\to+\infty} |\,\|x_n\| - \|y_n\|\,| \le \lim_{n\to+\infty} \|x_n - y_n\| = 0$$

that is: $\lim_{n\to+\infty} \|x_n\| = \lim_{n\to+\infty} \|y_n\|$.

Now, let us define an inner product on $H$ which is compatible with this norm:

$$\langle [x], [y] \rangle = \lim_{n\to+\infty} \langle x_n, y_n \rangle$$

where $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ are any two Cauchy sequences in the equivalence classes $[x]$ and $[y]$, respectively. To verify that this inner product is well defined, we must verify the existence of the limit used to define it, and show that it does not depend on the chosen representative elements.

The first step is to prove the existence of the limit. To do this, we must simply show that $\langle x_n, y_n \rangle$ (a sequence in $\mathbb{K}$) is Cauchy; given that $\mathbb{K}$ is complete, the limit must exist. Note that $\forall n, m \in \mathbb{N}$, by the triangular inequality and the Cauchy-Schwarz inequality, we can write:

$$\begin{aligned} |\langle x_n, y_n \rangle - \langle x_m, y_m \rangle| &= |\langle x_n, y_n \rangle - \langle x_n, y_m \rangle + \langle x_n, y_m \rangle - \langle x_m, y_m \rangle| \\ &= |\langle x_n, y_n - y_m \rangle + \langle x_n - x_m, y_m \rangle| \\ &\le |\langle x_n, y_n - y_m \rangle| + |\langle x_n - x_m, y_m \rangle| \\ &\le \|x_n\| \, \|y_n - y_m\| + \|x_n - x_m\| \, \|y_m\| \underset{n,m\to+\infty}{\longrightarrow} 0 \end{aligned}$$
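since $\|x_n\|$ and $\|y_m\|$ are bounded, given that $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ are Cauchy sequences.

Now, we must verify that the limit is independent of the choice of representative elements: let $(\xi_n)_{n\in\mathbb{N}}$ and $(\eta_n)_{n\in\mathbb{N}}$ be two other representatives of the equivalence classes $[x]$ and $[y]$, respectively. Using direct algebraic manipulations, we can write:

$$\langle x_n, y_n \rangle = \langle x_n - \xi_n + \xi_n, y_n - \eta_n + \eta_n \rangle = \langle x_n - \xi_n, y_n \rangle + \langle \xi_n, y_n - \eta_n \rangle + \langle \xi_n, \eta_n \rangle$$

and:

$$|\langle x_n - \xi_n, y_n \rangle + \langle \xi_n, y_n - \eta_n \rangle| \le \|x_n - \xi_n\| \, \|y_n\| + \|\xi_n\| \, \|y_n - \eta_n\| \underset{n\to+\infty}{\longrightarrow} 0$$

since $(x_n)_{n\in\mathbb{N}}, (\xi_n)_{n\in\mathbb{N}} \in [x]$ and $(y_n)_{n\in\mathbb{N}}, (\eta_n)_{n\in\mathbb{N}} \in [y]$, hence:

$$\langle [x], [y] \rangle = \lim_{n\to+\infty} \langle x_n, y_n \rangle = \lim_{n\to+\infty} \langle \xi_n, \eta_n \rangle$$

Due to the continuity of the inner product on $V$, all of the properties of the inner product are transferred onto $H$ by the limit operation. The final step is to verify the isometry:

$$\|[x]\| = \lim_{n\to+\infty} \|x_n\| = \lim_{n\to+\infty} \sqrt{\langle x_n, x_n \rangle} = \sqrt{\langle [x], [x] \rangle}, \quad \forall [x] \in H$$

□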
An alternative proof may be found in El Hage Hassan (2011).

4.3.2. Characterizing the completeness of normed vector spaces using series

In this section, we shall consider a completeness criterion for normed vector spaces which draws on series and is highly useful in practice.

The explicit definition of the Cauchy condition for the sequence of partial sums of a series $\sum_{n\in\mathbb{N}} x_n$ is:

$$\forall \varepsilon > 0 \ \exists N_\varepsilon > 0 : n, m \ge N_\varepsilon \implies \left\| \sum_{k=0}^{n} x_k - \sum_{k=0}^{m} x_k \right\| < \varepsilon$$

The two indices $n$ and $m$ vary independently of one another, and we can suppose, without loss of generality, that one is always greater than the other. For instance,
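supposing that $n > m$: $\sum_{k=0}^{n} x_k - \sum_{k=0}^{m} x_k = \sum_{k=m+1}^{n} x_k$, implying that the Cauchy condition for series can be rewritten as:

$$\forall \varepsilon > 0 \ \exists N_\varepsilon > 0 : n > m \ge N_\varepsilon \implies \left\| \sum_{k=m+1}^{n} x_k \right\| < \varepsilon \quad [4.9]$$

Instead, the Cauchy condition for the series of norms of $x_k$, that is, the sequence $\left( \sum_{k=0}^{n} \|x_k\| \right)_{n\in\mathbb{N}}$, is:

$$\forall \varepsilon > 0 \ \exists N_\varepsilon > 0 : n > m \ge N_\varepsilon \implies \sum_{k=m+1}^{n} \|x_k\| < \varepsilon \quad [4.10]$$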
This observation will be used in proving the following result.

THEOREM 4.12 (Characterizing the completeness of normed spaces using series).– A normed vector space $(V, \|\cdot\|)$ is complete if and only if all absolutely convergent series of elements in $V$ are also (simply) convergent in $V$.

PROOF.– The proof of the direct implication is extremely simple, while that of the converse is much more complicated and involves techniques that are very commonly used in functional analysis.

$\Longrightarrow$: Let us suppose $(V, \|\cdot\|)$ to be complete, and let us demonstrate that if $\sum_{n\in\mathbb{N}} \|x_n\|$ is convergent, then $\sum_{n\in\mathbb{N}} x_n$ is also convergent in $V$.

By the completeness of $\mathbb{R}$, the convergence of $\sum_{n\in\mathbb{N}} \|x_n\|$ is equivalent to the Cauchy condition [4.10], that is:

$$\forall \varepsilon > 0 \ \exists N_\varepsilon > 0 : n > m \ge N_\varepsilon \implies \sum_{k=m+1}^{n} \|x_k\| < \varepsilon \quad [4.11]$$

and since $\left\| \sum_{k=m+1}^{n} x_k \right\| \le \sum_{k=m+1}^{n} \|x_k\|$, the sequence of partial sums $\left( S_n = \sum_{k=0}^{n} x_k \right)_{n\in\mathbb{N}}$ is also Cauchy and therefore, by the completeness of $V$, convergent; that is, $\sum_{n\in\mathbb{N}} x_n$ is convergent.
$\Longleftarrow$: Now, let us suppose that all absolutely convergent series of elements in $V$ are also simply convergent in $V$. We must prove that this implies that $V$ is complete, that is any Cauchy sequence $(x_n)_{n\in\mathbb{N}} \subset V$, that is:

$$\forall \varepsilon > 0 \ \exists N_\varepsilon > 0 : n, m \ge N_\varepsilon \implies \|x_n - x_m\| < \varepsilon$$

converges in $V$, that is, there exists $\bar{x} \in V$ such that $x_n \to \bar{x}$ as $n \to +\infty$.
The Cauchy condition must be valid for all values of $\varepsilon > 0$, and consequently for $\varepsilon_k = \frac{1}{2^k}$, $k \in \mathbb{N}$; thus, any Cauchy sequence in $V$ must verify:

$$\forall k \ge 0 \ \exists \tilde{N}_k > 0 : n, m \ge \tilde{N}_k \implies \|x_n - x_m\| < \frac{1}{2^k} \quad [4.12]$$

note that all the objects contained in the expression above are discrete.

Since $\frac{1}{2^{k+1}} < \frac{1}{2^k}$, the indices can be taken such that $\tilde{N}_{k+1} \ge \tilde{N}_k$; this simple consideration allows us to define a strictly increasing sequence of natural numbers $(N_k)_{k\ge 0}$ simply by defining $N_k := \inf_{\ell\in\mathbb{N}} \{\tilde{N}_{k+\ell} > \tilde{N}_k\}$ for all $k \in \mathbb{N}$. Using this result, we can define the subsequence $(x_{N_k})_{k\in\mathbb{N}} \subset V$ of $(x_n)_{n\in\mathbb{N}}$ which, by its own definition, satisfies [4.12], that is:

$$\forall k \ge 0, \quad \|x_{N_k} - x_{N_{k+1}}\| < \frac{1}{2^k} \quad [4.13]$$

The interest of using this subsequence is that, if it converges in $V$, that is, if there exists $\bar{x} \in V$ such that $\lim_{k\to+\infty} x_{N_k} = \bar{x}$, then by Theorem 4.10 the initial Cauchy sequence $(x_n)_{n\in\mathbb{N}}$ also converges to $\bar{x} \in V$.

To complete the proof, we must therefore demonstrate that the subsequence $(x_{N_k})_{k\in\mathbb{N}}$ is convergent in $V$. In the absence of information concerning the convergence of the original sequence $(x_n)_{n\in\mathbb{N}}$, the convergence of $(x_{N_k})_{k\in\mathbb{N}}$ cannot be proved directly; instead, we must use the hypothesis that absolute convergence of a series in $V$ implies its simple convergence in $V$.

The link to series is obtained using a startlingly simple technique: rewriting the subsequence $(x_{N_k})_{k\in\mathbb{N}}$ as a sequence of telescopic partial sums. To do this, we use $(x_{N_k})_{k\in\mathbb{N}}$ to define a new sequence $(y_k)_{k\in\mathbb{N}} \subset V$ as follows:

$$\begin{cases} y_0 = x_{N_0} \\ y_k = x_{N_k} - x_{N_{k-1}}, \ \forall k \ge 1 \end{cases}$$

hence $(y_k)_{k\in\mathbb{N}} = (x_{N_0}, x_{N_1} - x_{N_0}, x_{N_2} - x_{N_1}, \ldots)$, then:

$$\sum_{j=0}^{k} y_j = \underbrace{x_{N_0}}_{y_0} + \underbrace{x_{N_1} - x_{N_0}}_{y_1} + \underbrace{x_{N_2} - x_{N_1}}_{y_2} + \ldots + \underbrace{x_{N_k} - x_{N_{k-1}}}_{y_k} = x_{N_k}$$

and this holds $\forall k \in \mathbb{N}$, thus $\left( \sum_{j=0}^{k} y_j \right)_{k\in\mathbb{N}} = (x_{N_k})_{k\in\mathbb{N}}$.
To summarize, the completeness of $V$, that is, the convergence of an arbitrary Cauchy sequence $(x_n)_{n\in\mathbb{N}}$ in $V$, is implied by the convergence of the sequence $(x_{N_k})_{k\in\mathbb{N}}$ in
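$V$; this is equivalent to the convergence of $\left( \sum_{j=0}^{k} y_j \right)_{k\in\mathbb{N}}$ in $V$, that is, the simple convergence of the series $\sum_{k=0}^{\infty} y_k$. By the starting hypothesis, if we can prove that $\sum_{k\in\mathbb{N}} \|y_k\|$ is convergent, this would be enough to prove the whole theorem. We begin by setting out the terms of the series $\sum_{k\in\mathbb{N}} \|y_k\|$:

$$\sum_{k\in\mathbb{N}} \|y_k\| = \|y_0\| + \sum_{k=1}^{\infty} \|y_k\| = \|y_0\| + \sum_{k=1}^{\infty} \|x_{N_k} - x_{N_{k-1}}\| = \|y_0\| + \sum_{k=0}^{\infty} \|x_{N_{k+1}} - x_{N_k}\|$$

From inequality [4.13], it holds that $\|x_{N_{k+1}} - x_{N_k}\| < \frac{1}{2^k} = \left( \frac{1}{2} \right)^k$ $\forall k \ge 0$, thus:

$$\sum_{k\in\mathbb{N}} \|y_k\| = \|y_0\| + \sum_{k=0}^{\infty} \|x_{N_{k+1}} - x_{N_k}\| < \|y_0\| + \sum_{k=0}^{\infty} \left( \frac{1}{2} \right)^k = \|y_0\| + \frac{1}{1 - \frac{1}{2}} = \|y_0\| + 2 < +\infty$$

using the geometric series formula.

Hence, $\sum_{k\in\mathbb{N}} \|y_k\|$ is a series of non-negative real terms with bounded partial sums, and, from a classic result in series theory, we know that it converges. □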
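The subsequence extraction and the telescoping trick used in the proof can be followed on a concrete sequence. The sketch below is only an illustration under stated assumptions: it takes the real Cauchy sequence $x_n = \sum_{j=1}^{n} 1/j^2$ (an example chosen here, not in the text), for which $|x_n - x_m| < 1/\min(n, m)$, so that the indices $N_k = 2^k$ satisfy [4.13]. The function name is hypothetical.

```python
def partial_sums_inverse_squares(n_terms):
    """Return [x_0, ..., x_{n_terms-1}] with x_n = sum_{j=1}^{n} 1/j^2 (x_0 = 0)."""
    xs, total = [], 0.0
    for n in range(n_terms):
        if n >= 1:
            total += 1.0 / (n * n)
        xs.append(total)
    return xs


K = 10
N = [2 ** k for k in range(K + 1)]          # assumed indices N_k = 2^k, strictly increasing
xs = partial_sums_inverse_squares(N[-1] + 1)

# Inequality [4.13]: |x_{N_k} - x_{N_{k+1}}| < 1 / 2^k.
gaps_ok = all(abs(xs[N[k + 1]] - xs[N[k]]) < 0.5 ** k for k in range(K))

# Telescoping terms: y_0 = x_{N_0}, y_k = x_{N_k} - x_{N_{k-1}} for k >= 1.
ys = [xs[N[0]]] + [xs[N[k]] - xs[N[k - 1]] for k in range(1, K + 1)]

# The partial sums of (y_k) reproduce the subsequence (x_{N_k}), up to rounding.
telescoping_ok = all(abs(sum(ys[: k + 1]) - xs[N[k]]) < 1e-12 for k in range(K + 1))

# The series of absolute values stays below |y_0| + 2, as in the proof.
abs_sum = sum(abs(y) for y in ys)
print(gaps_ok, telescoping_ok, abs_sum <= abs(ys[0]) + 2)
```

Only finitely many terms are checked, so this is a sanity check of the mechanism, not a proof.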
If the space $(V, \|\cdot\|)$ in the previous theorem is complete, then the Cauchy sequence $(x_n)_{n\in\mathbb{N}} \subset V$ seen at the start of the proof is convergent; consequently, we know that the subsequence $(x_{N_k})_{k\in\mathbb{N}}$ also converges to the same limit. This remark is formalized in Corollary 4.2.

COROLLARY 4.2.– Taking $(V, \|\cdot\|)$ to be a complete normed vector space and $(x_n)_{n\in\mathbb{N}} \subset V$ a sequence which converges to $x_0 \in V$, there exists a subsequence $(x_{n_k})_{k\in\mathbb{N}}$ which converges to $x_0$.

4.3.2.1. The matrix exponential

In this section, we shall examine a particularly important application of the previous theorem: the definition of the matrix exponential.
DEFINITION 4.12 (Matrix exponential).– Let $A \in M(n, \mathbb{K})$ be a square matrix⁵ with coefficients in the field $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$. The exponential of $A$ is the matrix defined by:

$$e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}$$

The proof that $e^A$ is well defined is straightforward using the theorem proved above. For instance, let us consider the Frobenius norm of $A$:

$$\|A\| = \left( \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2}$$

This is the Euclidean norm of the vector in $\mathbb{K}^{n^2}$ obtained from $A$ using lexicographical order, that is, by sequencing the rows of $A$ one after another. We shall prove that the series $I_n + A + \frac{A^2}{2} + \frac{A^3}{3!} + \ldots + \frac{A^k}{k!} + \ldots$ converges in the topology of $M(n, \mathbb{K})$ generated by this norm, implying, by Tychonoff's theorem, its convergence with respect to any other norm.

$M(n, \mathbb{K})$ is homeomorphic to the Euclidean $\mathbb{K}^{n^2}$, which we know to be a complete normed space. To show that $e^A$ is well defined, we must show that the series defining $e^A$ is absolutely convergent; simple convergence is implied by Theorem 4.12. The proof of absolute convergence is extremely simple: consider the inequality $\|A^k\| \le \|A\|^k$, verified by the Frobenius norm for all $k \ge 1$ and for any matrix $A \in M(n, \mathbb{K})$ (for $k = 0$ we simply have $\|A^0\| = \|I_n\| = \sqrt{n}$), then:

$$\sum_{k=0}^{\infty} \frac{\|A^k\|}{k!} \le \sqrt{n} + \sum_{k=1}^{\infty} \frac{\|A\|^k}{k!} = \sqrt{n} - 1 + e^{\|A\|} < +\infty$$

using the fact that $\|A\|$ is a real number $\ge 0$ and that the convergence radius of the exponential series in $\mathbb{R}$ is infinite.

4.3.3. Banach fixed-point theorem

The result presented in this section is highly significant for many different fields of mathematics, such as analysis, topology, solving differential equations, etc.

⁵ $A$ must be square as we will be working with powers of $A$; for dimensional reasons, these are not defined if $A$ is not square.
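As a brief numerical aside on the matrix exponential of Definition 4.12, the absolute convergence argument can be observed on partial sums. The following is a minimal sketch in plain Python, assuming the example matrix $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ (chosen here because $e^A$ is then the rotation matrix with entries $\cos 1$ and $\pm\sin 1$); the helper names `mat_mul`, `frobenius` and `exp_series` are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def frobenius(a):
    """Frobenius norm: Euclidean norm of the entries read row by row."""
    return math.sqrt(sum(abs(v) ** 2 for row in a for v in row))


def exp_series(a, terms=30):
    """Return the partial sum of sum_k A^k / k! over k < terms,
    together with the matching partial sum of the norms ||A^k|| / k!."""
    n = len(a)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 / 0!
    total = [row[:] for row in term]
    norm_sum = frobenius(term)
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, a)]  # A^k / k!
        total = [[t + v for t, v in zip(tr, vr)] for tr, vr in zip(total, term)]
        norm_sum += frobenius(term)
    return total, norm_sum


A = [[0.0, 1.0], [-1.0, 0.0]]
expA, norm_sum = exp_series(A)

# The k = 0 term has norm ||I_n|| = sqrt(n), so the sum of norms is compared
# with sqrt(n) - 1 + e^{||A||} rather than with e^{||A||} alone.
bound = math.sqrt(len(A)) - 1 + math.exp(frobenius(A))
print(expA, norm_sum <= bound)
```

Thirty terms are far more than needed here; the factorial in the denominator makes the tail negligible very quickly.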
We begin with the following definition.

DEFINITION 4.13 (Contraction mapping).– Let $(X_1, d_1)$ and $(X_2, d_2)$ be any two metric spaces and let $k \in (0, 1)$ be a real constant. The map $f : X_1 \to X_2$ is a contraction with coefficient $k$ if, for all $x, y \in X_1$:

$$d_2(f(x), f(y)) \le k \, d_1(x, y) \quad [4.14]$$

The smallest value of $k$ for which [4.14] holds is called the Lipschitz constant of $f$.

Verification that a contraction mapping is always a continuous function is immediate: for any value $\varepsilon > 0$, let us take an arbitrary fixed element $\bar{x} \in X_1$ and consider the elements $y \in X_1$ such that $d_1(\bar{x}, y) < \varepsilon$. Then, by the definition of a contraction mapping, $d_2(f(\bar{x}), f(y)) \le k \, d_1(\bar{x}, y) < k\varepsilon < \varepsilon$, since $k \in (0, 1)$, so the function $f$ is continuous in $\bar{x}$. As $\bar{x}$ is an arbitrary element in $X_1$, $f$ is continuous on all of $X_1$.

REMARK.– It is evident from the definition that the distance (in the codomain) between the images of a pair of distinct elements via a contraction mapping is smaller than the initial distance. However, a contraction mapping cannot be redefined using this property alone; the definition given above is not the same as stating that, for all $x, y \in X_1$, $x \ne y$, $d_2(f(x), f(y)) < d_1(x, y)$. If $f$ satisfies this condition, it is said to be a weak contraction mapping, or a map which reduces the distance between points.

To understand the subtle difference between these two definitions, we begin by noting that if $f$ is a weak contraction mapping, then for any pair $x, y \in X_1$, $x \ne y$, there exists $k_{x,y} \in (0, 1)$ such that $d_2(f(x), f(y)) \le k_{x,y} \, d_1(x, y)$, that is, $k_{x,y}$ is not a constant, as required by the definition of a contraction mapping. The two definitions coincide if and only if $\sup_{x,y\in X_1} k_{x,y} \equiv \bar{k} \in (0, 1)$, but this condition is not guaranteed to be verified. The sup necessarily exists, since $\{k_{x,y}, \ x, y \in X_1, \ x \ne y\}$ is a bounded subset of $\mathbb{R}$, but it can take the value 1, meaning that it is not strictly less than 1 as required by the definition of a contraction mapping.

Contraction mappings with a domain and image in the same complete metric space have a remarkable property, described in the classic Theorem 4.13.

THEOREM 4.13 (Banach fixed-point theorem).– Let $(X, d)$ be a complete metric space and $f : X \to X$ a contraction mapping in $X$ of coefficient $k \in (0, 1)$: then $f$ admits a unique fixed point, that is there exists a unique $\bar{x} \in X$ such that $f(\bar{x}) = \bar{x}$.
PROOF.– Let $a \in X$ be an arbitrary element. We define the sequence $(x_n)_{n\in\mathbb{N}} \subset X$ by recursion as:

$$\begin{cases} x_0 = a \\ x_n = f(x_{n-1}), \ n \ge 1 \end{cases}$$

The first step of the proof consists simply of showing that, if this sequence admits a limit in $X$, then this limit is a fixed point for $f$. The uniqueness of the fixed point will be a simple consequence of the definition of the contraction mapping. The convergence of the sequence $(x_n)_{n\in\mathbb{N}}$, which is harder to prove, will be verified last.

– If there exists $X \ni \bar{x} = \lim_{n\to+\infty} x_n$, then $\bar{x}$ is a fixed point for $f$: The proof of this statement relies on a simple continuity argument. Since we know that a contraction mapping is continuous, if we let $n$ tend toward $+\infty$ in the definition of the sequence, that is, $x_n = f(x_{n-1})$, we obtain:

$$\lim_{n\to+\infty} x_n = \lim_{n\to+\infty} f(x_{n-1}) \iff \bar{x} = f\left( \lim_{n\to+\infty} x_{n-1} \right) \iff \bar{x} = f(\bar{x})$$

that is, $\bar{x}$ is a fixed point for $f$. This is the reason for considering the recursively defined sequence $(x_n)_{n\in\mathbb{N}}$ described above.

– Uniqueness of the fixed point: Let $\bar{x}, \bar{y} \in X$ be two fixed points for $f$, that is, $f(\bar{x}) = \bar{x}$, $f(\bar{y}) = \bar{y}$. We can show that their distance is null, that is, $\bar{x} = \bar{y}$, using the positive definiteness of the distance and the definition of contraction mapping:

$$d(\bar{x}, \bar{y}) = d(f(\bar{x}), f(\bar{y})) \le k \, d(\bar{x}, \bar{y})$$

but since $k \in (0, 1)$, this inequality only holds if $d(\bar{x}, \bar{y}) = 0$, that is, $\bar{x} = \bar{y}$.

– Convergence of the sequence: Here, the hypothesis that $(X, d)$ is complete will be crucial, because if we can show that $(x_n)_{n\in\mathbb{N}}$ is Cauchy, then, by completeness, it is convergent. We begin by noting that for all $n \ge 1$, using the definition of the sequence and the hypothesis that $f$ is a contraction, we can write:

$$d(x_{n+1}, x_n) = d(f(x_n), f(x_{n-1})) \le k \, d(x_n, x_{n-1})$$

hence, by iteration:

$$d(x_{n+1}, x_n) \le k \, d(x_n, x_{n-1}) \le k^2 d(x_{n-1}, x_{n-2}) \le \ldots \le k^n d(x_1, x_0) \quad [4.15]$$

that is the distance between consecutive elements, $x_{n+1}$ and $x_n$, in the sequence $(x_n)_{n\in\mathbb{N}}$ is bounded by $k^n d(x_1, x_0)$; note that the power of $k$ is equal to the smallest index value.
Now, let us take two arbitrary but different natural indices $n, m \in \mathbb{N}$. Without loss of generality, we may consider that $n < m$, hence $m - n = p \in \mathbb{N}$, or $m = n + p$, and so $d(x_m, x_n) = d(x_{n+p}, x_n)$. Iterating the triangular property of the distance, we obtain:

$$d(x_{n+p}, x_n) \le d(x_{n+p}, x_{n+p-1}) + d(x_{n+p-1}, x_{n+p-2}) + \ldots + d(x_{n+1}, x_n)$$

We see that all terms on the right side of the inequality are distances between two consecutive elements of the sequence $(x_n)_{n\in\mathbb{N}}$; using this fact, we can apply the bound given by [4.15] and write:

$$\begin{cases} d(x_{n+p}, x_{n+p-1}) \le k^{n+p-1} d(x_1, x_0) \\ d(x_{n+p-1}, x_{n+p-2}) \le k^{n+p-2} d(x_1, x_0) \\ \quad \vdots \\ d(x_{n+1}, x_n) \le k^n d(x_1, x_0) \end{cases}$$

that is:

$$\begin{aligned} d(x_{n+p}, x_n) &\le (k^{n+p-1} + k^{n+p-2} + \ldots + k^n) \, d(x_1, x_0) \\ &= (k^{p-1} + k^{p-2} + \ldots + 1) \, k^n d(x_1, x_0) \\ &= \left( \sum_{j=0}^{p-1} k^j \right) k^n d(x_1, x_0) \\ &\le \left( \sum_{j=0}^{+\infty} k^j \right) k^n d(x_1, x_0), \quad \text{since } k^j > 0 \end{aligned}$$

As $\sum_{j=0}^{+\infty} k^j$ is a geometric series in $k \in (0, 1)$, it converges to $\frac{1}{1-k}$, so we have:

$$d(x_{n+p}, x_n) \le \frac{k^n}{1-k} d(x_1, x_0)$$

Remembering that $d(x_{n+p}, x_n) = d(x_m, x_n)$, with $m > n \in \mathbb{N}$ of arbitrary value, we have:

$$d(x_m, x_n) \le \frac{k^n}{1-k} d(x_1, x_0) \underset{n\to+\infty}{\longrightarrow} 0$$

This implies that $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence, and thus converges to an element $\bar{x} \in X$ by the hypothesis of completeness of $X$. □

It is important to note that the first element in the sequence $(x_n)_{n\in\mathbb{N}}$ is completely arbitrary: even if this element is distant from the fixed point $\bar{x}$, the sequence will reach the fixed point in the limit. On some occasions, a starting point $x_0$ may be selected in such a way as to accelerate the speed at which the sequence converges.
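The proof is constructive, and the bound $d(x_m, x_n) \le \frac{k^n}{1-k} d(x_1, x_0)$ yields, letting $m \to +\infty$, an a priori estimate of the distance between $x_n$ and the fixed point. The following minimal sketch uses that estimate to decide how many iterations to run. It assumes, as an example not taken from the text, the map $f = \cos$ on the complete space $([0, 1], |\cdot|)$, which is a contraction with coefficient $k = \sin 1$ by the mean value theorem; the function name `banach_iterate` is hypothetical.

```python
import math


def banach_iterate(f, x0, k, tol=1e-10):
    """Iterate x_n = f(x_{n-1}) for the smallest n such that
    k^n / (1 - k) * d(x_1, x_0) <= tol, and return (x_n, n)."""
    x1 = f(x0)
    d10 = abs(x1 - x0)
    if d10 == 0.0:
        return x0, 0  # x0 is already a fixed point
    # Solve k^n * d10 / (1 - k) <= tol for n (note that log(k) < 0).
    n = max(0, math.ceil(math.log(tol * (1.0 - k) / d10) / math.log(k)))
    x = x0
    for _ in range(n):
        x = f(x)
    return x, n


# Assumed example: cos maps [0, 1] into itself and |cos'| <= sin(1) < 1 there.
k = math.sin(1.0)
x_bar, n_steps = banach_iterate(math.cos, 0.0, k)
print(x_bar, n_steps, abs(x_bar - math.cos(x_bar)))
```

The a priori count is usually pessimistic, since it relies on the worst-case coefficient $k$; in practice one may also stop when consecutive iterates are close enough.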
The following nice exercise is proposed by Sondaz (2010).

Exercise 4.1

1) Give an example of a metric space $(X, d)$ and a contraction mapping $f : X \to X$ with no fixed point.

2) Give an example of a complete metric space $(X, d)$ and a map $f : X \to X$ which strictly reduces distances, that is such that $d(f(x), f(y)) < d(x, y)$ $\forall x, y \in X$, $x \ne y$, and which admits no fixed point.

3) Show that the Cauchy problem:

$$\begin{cases} x'(t) = \frac{1}{2} \sin x(t) \\ x(0) = 1 \end{cases} \quad [4.16]$$

has a unique solution $\varphi : [-1, 1] \to \mathbb{R}$.

Solution to Exercise 4.1

For points 1 and 2, the answer evidently involves undermining the fixed point theorem by removing a hypothesis. For point (1), we consider a non-complete metric space. For point (2), we consider a map which strictly reduces distances; as we have seen, this hypothesis is less strict than requiring the map to be a contraction mapping.

1) We need to consider a non-complete metric space. We have already seen that $((0, 1), |\cdot|)$ is not complete. In this space, let us consider, for instance, the function $f : (0, 1) \to (0, 1)$, $f(x) = \frac{1}{2} x$. Then:

$$\forall x, y \in (0, 1), \quad |f(x) - f(y)| = \left| \frac{1}{2} x - \frac{1}{2} y \right| = \frac{1}{2} |x - y| \le \frac{1}{2} |x - y|$$

so $f$ is a contraction with coefficient $k = 1/2$. The fixed point equation for $f$, that is, $f(x) = x$, evidently has no solutions in $(0, 1)$ since $\frac{1}{2} x = x$ if and only if $x = 0 \notin (0, 1)$.

2) Consider the metric space $(X, d) = ([0, +\infty), |\cdot|)$ and the map $f : [0, +\infty) \to [0, +\infty)$ defined by $f(x) = \sqrt{x^2 + 1}$. Taking two arbitrary fixed elements $x, y \in [0, +\infty)$, $x \ne y$, due to Lagrange's mean value theorem, there exists an element $\xi \in [0, +\infty)$ which is strictly included in the interval between $x$ and $y$, such that:

$$f(x) - f(y) = (x - y) f'(\xi) = (x - y) \frac{\xi}{\sqrt{\xi^2 + 1}}$$
Since $\sqrt{\xi^2 + 1} > \sqrt{\xi^2} = \xi \in [0, +\infty)$, then $\frac{\xi}{\sqrt{\xi^2 + 1}} < 1$, so $|f(x) - f(y)| < |x - y|$, that is $f$ strictly reduces the distances.

Nevertheless, in $[0, +\infty)$ the fixed point equation for $f$, that is, $x = \sqrt{x^2 + 1}$, can be written as $x^2 = x^2 + 1$, meaning $1 = 0$, which is obviously a contradiction; thus, $f$ does not admit a fixed point.

3) We know from differential equation theory that solving the Cauchy problem [4.16] is equivalent to determining a function $\varphi \in C([-1, 1])$ which satisfies the following Volterra integral equation:

$$\varphi(t) = \frac{1}{2} \int_0^t \sin \varphi(s) \, ds + 1 \quad [4.17]$$

Let us verify this statement. On one side, if $\varphi$ is a solution of [4.16], by definition, $\varphi$ is differentiable and thus continuous. Integrating both sides of the differential equation from 0 to $t$, we obtain $\int_0^t \varphi'(s) \, ds = \frac{1}{2} \int_0^t \sin \varphi(s) \, ds$, that is, $\varphi(t) - \varphi(0) = \frac{1}{2} \int_0^t \sin \varphi(s) \, ds$. The initial condition in [4.16] gives us $\varphi(0) = 1$, thus $\varphi$ satisfies $\varphi(t) = \frac{1}{2} \int_0^t \sin \varphi(s) \, ds + 1$.

On the other side, supposing that $\varphi$ satisfies [4.17], the integral function $\frac{1}{2} \int_0^t \sin \varphi(s) \, ds + 1$ is differentiable $\forall t \in [-1, 1]$, since $\sin \circ \varphi$ is continuous and the integration operation makes any continuous function differentiable. Differentiating [4.17] gives us $\varphi'(t) = \frac{1}{2} \sin \varphi(t)$ with $\varphi(0) = 1$, that is, $\varphi$ satisfies [4.16].

These considerations highlight the interest of the space $C([-1, 1])$, which is a Banach space when it is endowed with the norm $\|f\| = \sup_{t\in[-1,1]} |f(t)|$. Consider the following map:

$$F : C([-1, 1]) \to C([-1, 1]), \quad f \mapsto F(f)$$

where $F(f)$ is the real-valued continuous function on $[-1, 1]$ defined by the analytical expression $F(f)(t) = \frac{1}{2} \int_0^t \sin f(s) \, ds + 1$, for all $t \in [-1, 1]$. The fixed points of $F$ are exactly the solutions of [4.17]. Clearly, if we can show that $F$ is a contraction, by invoking the fixed point theorem we will complete the proof that there is only one solution to the Cauchy problem [4.16]. To do this, let us consider any two functions $f, g \in C([-1, 1])$ and an arbitrary $t \in [-1, 1]$; then:
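$$\begin{aligned} |F(f)(t) - F(g)(t)| &= \frac{1}{2} \left| \int_0^t [\sin f(s) - \sin g(s)] \, ds \right| \\ &\le \frac{1}{2} \left| \int_0^t |\sin f(s) - \sin g(s)| \, ds \right| \\ &\quad \left( \text{using the formula } \sin p - \sin q = 2 \sin\left( \frac{p-q}{2} \right) \cos\left( \frac{p+q}{2} \right) \right): \\ &= \left| \int_0^t \left| \sin \frac{f(s) - g(s)}{2} \right| \left| \cos \frac{f(s) + g(s)}{2} \right| ds \right| \\ &\quad (|\cos(\alpha)| \le 1) \\ &\le \left| \int_0^t \left| \sin \frac{f(s) - g(s)}{2} \right| ds \right| \\ &\quad (|\sin(\alpha)| \le |\alpha|) \\ &\le \left| \int_0^t \left| \frac{f(s) - g(s)}{2} \right| ds \right| \\ &\le \frac{1}{2} \left| \int_0^t \|f - g\| \, ds \right| \\ &\le \frac{\|f - g\|}{2} \left| \int_0^t ds \right| = \frac{\|f - g\|}{2} |t| \\ &\quad (t \in [-1, 1] \implies |t| \le 1) \\ &\le \frac{\|f - g\|}{2} \end{aligned}$$

In summary: $|F(f)(t) - F(g)(t)| \le \frac{\|f - g\|}{2}$ $\forall t \in [-1, 1]$, hence:

$$\|F(f) - F(g)\| = \sup_{t\in[-1,1]} |F(f)(t) - F(g)(t)| \le \frac{1}{2} \|f - g\|$$

that is $F$ is a contraction. □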
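The contraction $F$ of point 3 can also be iterated numerically (Picard iteration). The sketch below is an illustration under stated assumptions, not part of the exercise: functions on $[-1, 1]$ are replaced by their values on a uniform grid with an odd number of points, and the integral in [4.17] is approximated by the trapezoidal rule; the names `picard_operator` and `picard_iterate` are hypothetical. Because the trapezoidal weights are positive and sum to $|t|$, the discrete operator is still a contraction of coefficient $1/2$ for the sup norm on the grid.

```python
import math


def picard_operator(phi, ts):
    """Discrete analogue of F(phi)(t) = 1 + (1/2) * int_0^t sin(phi(s)) ds.
    Assumes ts is a symmetric grid with an odd number of points, so that
    ts[len(ts) // 2] == 0."""
    n = len(ts)
    i0 = n // 2
    g = [0.5 * math.sin(v) for v in phi]
    out = [0.0] * n
    out[i0] = 1.0                      # F(phi)(0) = 1
    for i in range(i0 + 1, n):         # trapezoidal rule toward t = 1
        h = ts[i] - ts[i - 1]
        out[i] = out[i - 1] + 0.5 * h * (g[i] + g[i - 1])
    for i in range(i0 - 1, -1, -1):    # trapezoidal rule toward t = -1
        h = ts[i + 1] - ts[i]
        out[i] = out[i + 1] - 0.5 * h * (g[i] + g[i + 1])
    return out


def picard_iterate(n_grid=201, n_iter=25):
    """Iterate phi_{j+1} = F(phi_j) from phi_0 = 1 and record the sup-norm
    distances between consecutive iterates."""
    ts = [-1.0 + 2.0 * i / (n_grid - 1) for i in range(n_grid)]
    phi = [1.0] * n_grid
    dists = []
    for _ in range(n_iter):
        new_phi = picard_operator(phi, ts)
        dists.append(max(abs(a - b) for a, b in zip(new_phi, phi)))
        phi = new_phi
    return ts, phi, dists


ts, phi, dists = picard_iterate()
print(dists[:4], phi[len(ts) // 2])
```

The recorded distances shrink at least by a factor of $1/2$ per step, as the contraction estimate predicts, and in fact much faster.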
4.4. Remarkable examples of Banach and Hilbert spaces

In this section, we shall introduce function spaces which are of crucial importance in mathematics. We shall demonstrate that some of these spaces are Banach spaces, while others are Hilbert spaces; we shall then present density theorems related to these spaces.
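4.4.1. $L^p$ and $\ell^p$ spaces: presentation and completeness

In the following definitions, $\mathbb{K}$ will be either $\mathbb{R}$ or $\mathbb{C}$. Let $(X, \mathcal{A}, \mu)$ be a measure space. For all $1 \le p < +\infty$, we define:

$$\mathcal{L}^p(X, \mathcal{A}, \mu) = \left\{ f : X \to \mathbb{K}, \ f \text{ measurable} : \int_X |f|^p \, d\mu < +\infty \right\}$$

The set $\mathcal{L}^p(X, \mathcal{A}, \mu)$ becomes a vector space if we define the pointwise vector structure, that is $\forall \alpha, \beta \in \mathbb{K}$, $\forall f, g \in \mathcal{L}^p(X, \mathcal{A}, \mu)$:

$$\alpha f + \beta g : X \to \mathbb{K}, \quad x \mapsto (\alpha f + \beta g)(x) = \alpha f(x) + \beta g(x)$$

This linear combination operation is well defined thanks to the famous Minkowski inequality⁶ for integrals (which we will not prove):

$$\left( \int_X |f + g|^p \, d\mu \right)^{1/p} \le \left( \int_X |f|^p \, d\mu \right)^{1/p} + \left( \int_X |g|^p \, d\mu \right)^{1/p} \quad [4.18]$$

Since multiplication by the scalars $\alpha, \beta$ has no effect on the integrability of $f, g$, the definition is coherent. Writing:

$$\|f\|_p = \left( \int_X |f|^p \, d\mu \right)^{1/p}$$

the properties of the Lebesgue integral give us:

– positiveness (non-definite) and homogeneity: $\|f\|_p \ge 0$, $\|\lambda f\|_p = |\lambda| \, \|f\|_p$ $\forall f \in \mathcal{L}^p(X, \mathcal{A}, \mu)$, $\lambda \in \mathbb{K}$;

– the Minkowski inequality [4.18] becomes the triangular inequality⁷ for $\|\cdot\|_p$: $\|f + g\|_p \le \|f\|_p + \|g\|_p$, $\forall f, g \in \mathcal{L}^p(X, \mathcal{A}, \mu)$;

⁶ By iteration, we can write the generalized Minkowski inequality, which we shall use later: $\left( \int_X \left| \sum_{k=1}^{n} f_k \right|^p d\mu \right)^{1/p} \le \sum_{k=1}^{n} \left( \int_X |f_k|^p \, d\mu \right)^{1/p}$.

⁷ By iteration: $\left\| \sum_{k=1}^{n} f_k \right\|_p \le \sum_{k=1}^{n} \|f_k\|_p$ (the generalized form of [4.18]).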
– but: $\|f\|_p = 0 \;\not\Longrightarrow\; f = 0$ (the null function), since any function $g \in \mathcal{L}^p(X, \mathcal{A}, \mu)$ which is null a.e. is such that $\|g\|_p = 0$. Thus, the fact that $\|f\|_p = 0$ does not imply that $f$ is the null function, that is, that $f(x) = 0$ $\forall x \in X$.

Hence, $\|\cdot\|_p$ is a semi-norm (or pseudo-norm) on $\mathcal{L}^p(X, \mathcal{A}, \mu)$. Unfortunately, the presence of a semi-norm makes it impossible to use the (highly useful) property that, if the norm of the difference between two elements in a normed space is null, then the two elements coincide, since they can differ over a set of measure zero. This feature of norms is used to show the uniqueness of a mathematical object in cases where it is difficult to prove it directly; this is why it is important to preserve it.

The solution to the problem is to take the quotient of $\mathcal{L}^p(X, \mathcal{A}, \mu)$ w.r.t. a suitable subspace that allows us to get rid of the redundant functions. It should be clear that this subspace is:

$$N = \{ f : X \to \mathbb{K}, \ f \text{ measurable} : f = 0 \text{ a.e.} \}$$

The quotient space $L^p(X, \mathcal{A}, \mu) = \mathcal{L}^p(X, \mathcal{A}, \mu) / N$, formed by the equivalence classes of functions which are measurable on $X$, absolutely integrable in power $p$ and equal a.e., is thus a normed vector space with norm $\|\cdot\|_p$.

Using the considerations presented in Appendix 1, it becomes apparent that, once a representative $f$ of an equivalence class of $L^p(X, \mathcal{A}, \mu)$ is fixed, all other functions $g$ belonging to the same class can be written as $g = f + h$, where $h : X \to \mathbb{K}$ is null a.e.

For simplicity's sake, a representative function and the equivalence class to which it belongs are generally noted using the same symbol. Furthermore, in cases where $X$, $\mathcal{A}$ and $\mu$ do not need to be specified, we may simply write $L^p$.

REMARK.– Take $X \subseteq \mathbb{K}^n$ with the Lebesgue measure. Let us consider two functions $f, g \in \mathcal{L}^p(X, \mathcal{A}, \mu)$ which are continuous on $X$ and which differ, at least, at the point $x_0 \in X$: $f(x_0) \ne g(x_0)$.

By definition of continuity:

$$\forall \varepsilon > 0 \ \exists \delta_\varepsilon > 0 : x \in U_{\delta_\varepsilon}(x_0) \implies f(x) \in U_\varepsilon(f(x_0)) \text{ and } g(x) \in U_\varepsilon(g(x_0))$$

but, by the separation (Hausdorff) property of $\mathbb{K}$, $\exists \varepsilon > 0$ such that $U_\varepsilon(f(x_0)) \cap U_\varepsilon(g(x_0)) = \emptyset$, that is, $\exists \delta_\varepsilon > 0$ such that $x \in U_{\delta_\varepsilon}(x_0)$ implies $f(x) \ne g(x)$, that is, if two
continuous functions $f$ and $g$ are different at a point $x_0$, they must also be different on a neighborhood $U_{\delta_\varepsilon}(x_0)$ of radius $\delta_\varepsilon > 0$. This neighborhood has a non-null Lebesgue measure, so the two functions are not equal a.e.

In other words: two different functions which are continuous on $X \subset \mathbb{K}^n$ cannot be equal a.e.: either they are the same function, or they are different on a set of non-null Lebesgue measure. Thus, two continuous functions of $\mathcal{L}^p(X, \mathcal{A}, \mu)$ which are different in at least one point are two different elements of $L^p(X, \mathcal{A}, \mu)$, as they are representatives of two different equivalence classes.

If $p = 2$, then we can define an inner product on $L^2(X, \mathcal{A}, \mu)$:

$$\langle f, g \rangle = \int_X f g \, d\mu \ \text{ if } \mathbb{K} = \mathbb{R} \quad \text{and} \quad \langle f, g \rangle = \int_X f \bar{g} \, d\mu \ \text{ if } \mathbb{K} = \mathbb{C}$$

These inner products are well defined thanks to Hölder's inequality for integrals (which we shall not prove here): if $p, q > 0$ are conjugate exponents, that is, $\frac{1}{p} + \frac{1}{q} = 1$, then it holds that:

$$\int_X |f g| \, d\mu \le \left( \int_X |f|^p \, d\mu \right)^{1/p} \left( \int_X |g|^q \, d\mu \right)^{1/q} \quad [4.19]$$

Evidently, $p = q = 2$ are conjugate exponents and thus the inner product introduced above is well defined. The proof that this verifies the axioms of the inner product is left to the reader; here we note simply that Hölder's inequality for $p = q = 2$ implies the validity of the Cauchy-Schwarz inequality for the space $L^2(X, \mathcal{A}, \mu)$. In fact, for all $f, g \in L^2(X, \mathcal{A}, \mu)$:

$$|\langle f, g \rangle_{L^2}| = \left| \int_X f \bar{g} \, d\mu \right| \le \int_X |f g| \, d\mu \underset{[4.19]}{\le} \left( \int_X |f|^2 \, d\mu \right)^{1/2} \left( \int_X |g|^2 \, d\mu \right)^{1/2} = \|f\|_2 \, \|g\|_2$$

One notable instance of $L^p$ spaces is represented by the $\ell^p$ spaces, which are defined through the following choices:

– $X$ is taken to be a countable set, typically $X = \mathbb{N}$ or $X = \mathbb{Z}$;

– $\mathcal{A} = \mathcal{P}(X)$, the power set of $X$;

– $\mu$ is the counting measure, that is $\mu : \mathcal{P}(X) \to [0, +\infty]$, $\mu(A) = \mathrm{card}(A)$ for all $A \in \mathcal{P}(X)$ with finite cardinality, and $\mu(A) = +\infty$ if $\mathrm{card}(A)$ is not finite.
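Using these choices, any function $f : X \to \mathbb{K}$ is measurable and it can be identified with a sequence of elements in $\mathbb{K}$, written $(x_n)_{n\in\mathbb{N}}$. Thus, explicitly⁸:

$$\ell^p(\mathbb{N}, \mathbb{K}) = \left\{ (x_n)_{n\in\mathbb{N}}, \ \sum_{n\in\mathbb{N}} |x_n|^p < +\infty \right\}$$

the same considerations hold if we exchange $\mathbb{N}$ for $\mathbb{Z}$. In cases where there is no need to specify $\mathbb{N}$, $\mathbb{Z}$ or any other countable set, we simply write $\ell^p$.

The linear structure of these spaces is the same as that of the $L^p$ spaces, that is pointwise defined, and the norm of $(x_n)_{n\in\mathbb{N}} \in \ell^p(\mathbb{N}, \mathbb{K})$ is:

$$\|(x_n)_{n\in\mathbb{N}}\|_p = \left( \sum_{n\in\mathbb{N}} |x_n|^p \right)^{1/p}$$

The same holds if we exchange $\mathbb{N}$ for $\mathbb{Z}$. The triangular inequality for this norm follows from the Minkowski inequality for series:

$$\left( \sum_{n\in\mathbb{N}} |x_n + y_n|^p \right)^{1/p} \le \left( \sum_{n\in\mathbb{N}} |x_n|^p \right)^{1/p} + \left( \sum_{n\in\mathbb{N}} |y_n|^p \right)^{1/p} \quad [4.20]$$

As in the case of $L^p$ spaces, if $p = 2$, an inner product can be defined on $\ell^2$:

$$\langle x_n, y_n \rangle = \sum_{n\in\mathbb{N}} x_n y_n \ \text{ if } \mathbb{K} = \mathbb{R} \quad \text{and} \quad \langle x_n, y_n \rangle = \sum_{n\in\mathbb{N}} x_n \bar{y}_n \ \text{ if } \mathbb{K} = \mathbb{C}$$

The same holds if we exchange $\mathbb{N}$ for $\mathbb{Z}$, or any other countable set.

The inner product is well defined thanks to Hölder's inequality for series: if $p, q > 0$, $\frac{1}{p} + \frac{1}{q} = 1$, then it holds that:

$$\sum_{n\in\mathbb{N}} |x_n y_n| \le \left( \sum_{n\in\mathbb{N}} |x_n|^p \right)^{1/p} \left( \sum_{n\in\mathbb{N}} |y_n|^q \right)^{1/q}$$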
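The Minkowski inequality [4.20] and Hölder's inequality for series can be checked on finitely supported sequences, and the same few lines show why only $p = 2$ gives a norm induced by an inner product: the parallelogram law fails for other exponents. The sketch below is a numerical illustration only; the vectors, the exponents $p = 3$, $q = 3/2$ and the helper names `p_norm` and `parallelogram_defect` are assumptions made for the example.

```python
def p_norm(xs, p):
    """l^p norm of a finitely supported sequence given as a list."""
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)


x = [1.0, -2.0, 0.5, 3.0]
y = [0.25, 4.0, -1.0, 2.0]
p, q = 3.0, 1.5  # conjugate exponents: 1/3 + 1/1.5 = 1

# Minkowski: ||x + y||_p <= ||x||_p + ||y||_p
minkowski_ok = p_norm([a + b for a, b in zip(x, y)], p) <= p_norm(x, p) + p_norm(y, p)

# Hoelder: sum |x_n y_n| <= ||x||_p * ||y||_q
holder_ok = sum(abs(a * b) for a, b in zip(x, y)) <= p_norm(x, p) * p_norm(y, q)


def parallelogram_defect(u, v, p):
    """||u+v||_p^2 + ||u-v||_p^2 - 2(||u||_p^2 + ||v||_p^2); zero when the
    parallelogram law holds for this pair."""
    s = [a + b for a, b in zip(u, v)]
    d = [a - b for a, b in zip(u, v)]
    return p_norm(s, p) ** 2 + p_norm(d, p) ** 2 - 2 * (p_norm(u, p) ** 2 + p_norm(v, p) ** 2)


e1, e2 = [1.0, 0.0], [0.0, 1.0]
defect_p2 = parallelogram_defect(e1, e2, 2.0)  # essentially zero
defect_p1 = parallelogram_defect(e1, e2, 1.0)  # non-zero: the law fails for p = 1
print(minkowski_ok, holder_ok, defect_p2, defect_p1)
```

A single pair of vectors violating the parallelogram law is enough to rule out an inner product compatible with the norm.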
REMARK.–

– The inner product of $\ell^2(\mathbb{N}, \mathbb{K})$ is the infinite-dimensional generalization of the inner product of $\ell^2(\mathbb{Z}_N)$.

⁸ The spaces $\ell^p(\mathbb{N}, \mathbb{K})$ are vector subspaces of the vector space $\mathbb{K}^{\mathbb{N}} := \{(x_n)_{n\in\mathbb{N}}, \ x_n \in \mathbb{K} \ \forall n \in \mathbb{N}\}$ of sequences with values in $\mathbb{K}$ possessing a pointwise defined linear structure. The same holds if $\mathbb{N}$ and $\mathbb{Z}$ are switched, in which case we speak of bilateral sequences.
150
From Euclidean to Hilbert Spaces
– The role of the Minkowski and Hölder inequalities in defining Lp and p spaces should be clear: the Minkowski inequality guarantees the existence of a linear structure, and Hölder’s inequality ensures that the inner product is well defined in the case where p “ 2. – } }p norms with p ‰ 2 are not Hilbert norms, in fact it is possible to provide examples of elements in all the Lp spaces, with p ‰ 2, for which the parallelogram law is not verified. Now, let us demonstrate that Lp and p spaces with 1 ď p ă `8, p ‰ 2 are Banach spaces, and for p “ 2, Hilbert spaces. The completeness of L2 pr0, 1sq spaces was demonstrated independently by the Austrian mathematician Ernst Sigismund Fischer (1875-1954) and by Frigyes Riesz9 in 1907. In 1910, Riesz demonstrated that all Lp r0, 1s spaces are complete. T HEOREM 4.14 (Riesz-Fischer theorem).– For all 1 ď p ă `8, the spaces pLp pX, A, μq, } }p q and pp , } }p q are complete. P ROOF.– We will report Riesz’s demonstration, who brought out the heavy artillery to prove these results, using the characterization theorem for complete normed vector spaces, Fatou’s lemma, the generalized Minkowski inequality, the monotone convergence theorem and the dominated convergence theorem to construct his proof. Let us consider any series
8 ř
fk in Lp pX, A, μq, 1 ď p ď `8, which is
k“0
absolutely convergent, that is: 8 ÿ
fk p “ M ă `8
k“0
then,ř by Theorem 4.12, to show that Lp pX, A, μq is complete, we must simply prove that fk is convergent in norm, that is that DS P Lp pX, A, μq such that: kPN
n ÿ fk ´ S k“0
ÝÑ
p
n ùñ `8
0
[4.21]
The first step in determining the function S is to define the sequence: pgn qnPN , gn “
n ÿ
|fk |,
@n P N
k“0
9 Frigyes Riesz (1880-1956) was a Hungarian mathematician who made many hugely important contributions to the development of functional analysis, among other areas.
Banach Spaces and Hilbert Spaces
151
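Using the generalized Minkowski inequality in equation [4.18], we know that:

$$\left( \int_X (g_n)^p \, d\mu \right)^{1/p} = \|g_n\|_p = \left\| \sum_{k=0}^{n} |f_k| \right\|_p \le \sum_{k=0}^{n} \|f_k\|_p \le \sum_{k=0}^{\infty} \|f_k\|_p = M < +\infty \quad \text{(by hypothesis)}$$

hence:

$$\int_X (g_n)^p \, d\mu \le M^p, \qquad \forall n \in \mathbb{N} \qquad [4.22]$$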
that is, $((g_n)^p)_{n \in \mathbb{N}}$ is a monotonically increasing sequence of integrable functions, and the sequence of their integrals is bounded. The monotone convergence theorem (Theorem 3.3) tells us that the pointwise limit function $\lim_{n \to +\infty} (g_n)^p(x)$ is finite a.e. on $X$, that is, for all $x \in E \subseteq X$ with $\mu(X \setminus E) = 0$. This implies the existence, for all $x \in E$, of a finite pointwise limit:

$$g(x) \equiv \lim_{n \to +\infty} g_n(x) = \lim_{n \to +\infty} \left[ (g_n)^p(x) \right]^{1/p}$$

Since, for all $x \in E$, $\left| \sum_{k=0}^{\infty} f_k(x) \right| \le \sum_{k=0}^{\infty} |f_k(x)| = g(x)$, the series $\sum_{k=0}^{\infty} f_k(x)$ converges a.e. on $X$. Now, let us construct the required function $S : X \to \mathbb{K}$:

$$S(x) = \begin{cases} \sum_{k=0}^{\infty} f_k(x) & x \in E \\ 0 & x \in X \setminus E \end{cases}$$

This definition ensures that $S$ is measurable. The fact that $S \in L^p(X, \mathcal{A}, \mu)$, that is, that $\int_X |S|^p \, d\mu$ is finite, is a consequence of the dominated convergence theorem (Theorem 3.5) and Fatou's lemma (Theorem 3.4). This can be proved by considering the partial sums $S_n = \sum_{k=0}^{n} f_k$, which satisfy:

$$|S_n|^p = \left| \sum_{k=0}^{n} f_k \right|^p \le \left( \sum_{k=0}^{n} |f_k| \right)^p = (g_n)^p.$$

$((g_n)^p)_{n \in \mathbb{N}}$ is an increasing positive sequence, thus:

$$|S_n|^p(x) \le (g_n)^p(x) \le \lim_{n \to +\infty} (g_n)^p(x) = g^p(x), \qquad \forall x \in E \qquad [4.23]$$
By monotonicity, $\lim_{n \to +\infty} (g_n)^p(x) = \liminf_{n \to +\infty} (g_n)^p(x)$ and thus, by Fatou's lemma, we have:

$$\int_X g^p \, d\mu \le \lim_{n \to +\infty} \int_X (g_n)^p \, d\mu \underset{[4.22]}{\le} \lim_{n \to +\infty} M^p = M^p < +\infty$$

The positive measurable function $g^p$ is therefore integrable on $X$. Using this information and equation [4.23], that is, $|S_n|^p \le g^p$ for all $n \in \mathbb{N}$, the dominated convergence theorem guarantees that $|S|^p$, the a.e. limit of $|S_n|^p$, is integrable on $X$, that is, $S \in L^p(X, \mathcal{A}, \mu)$.

To complete our proof, we must demonstrate that the function $S$ satisfies equation [4.21], that is:

$$\left\| \sum_{k=0}^{n} f_k - S \right\|_p \underset{n \to +\infty}{\longrightarrow} 0 \iff \lim_{n \to +\infty} \int_E \left| \sum_{k=0}^{n} f_k - \sum_{k=0}^{\infty} f_k \right|^p d\mu = 0$$

Note that we do not need to write the integration on $X \setminus E$, since $\mu(X \setminus E) = 0$. With our notation, the condition of convergence in norm $\| \ \|_p$ of the series $\sum_{k=0}^{\infty} f_k$ to $S$ can be rewritten more simply as follows:

$$\lim_{n \to +\infty} \|S_n - S\|_p = 0 \iff \lim_{n \to +\infty} \int_E |S_n - S|^p \, d\mu = 0$$

Evidently, if we can show that the integral and the limit can switch places, then the result will be proved, since, in this case ($S$ being independent of $n$):

$$\lim_{n \to +\infty} \int_E |S_n - S|^p \, d\mu = \int_E \lim_{n \to +\infty} |S_n - S|^p \, d\mu = \int_E \left| \lim_{n \to +\infty} S_n - S \right|^p d\mu = 0$$

To make this exchange possible, we can write the following majorization:

$$|S_n(x) - S(x)|^p \le (|S_n(x)| + |S(x)|)^p \le (g(x) + g(x))^p = (2 g(x))^p = 2^p (g(x))^p \qquad \forall x \in E$$

As $\int_X g^p \, d\mu \le M^p < +\infty$, this majorization ensures that the sequence $(|S_n(x) - S(x)|^p)_{n \in \mathbb{N}}$ satisfies the conditions of the dominated convergence theorem, meaning that the limit and integral can be exchanged. As we saw previously, this ensures that the series $\sum_{k=0}^{\infty} f_k$ in $L^p(X, \mathcal{A}, \mu)$, which we presumed to be absolutely convergent, is also simply convergent. Hence, all $L^p(X, \mathcal{A}, \mu)$ spaces with $1 \le p < \infty$ are complete.
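Since $\ell^p$ spaces are special cases of $L^p$ spaces, this result also holds for these spaces for all $1 \le p < \infty$. □

Exercise 4.2
Let $a = (a_n)_{n \in \mathbb{N}}$ be a sequence of strictly positive real numbers, and let $\ell^2_a(\mathbb{N}, \mathbb{C})$ be the vector space formed by the sequences of complex numbers $(u_n)_{n \in \mathbb{N}}$ which satisfy $\sum_{n \in \mathbb{N}} a_n |u_n|^2 < +\infty$. Show that the map defined by:

$$\langle u, v \rangle_{\ell^2_a} = \sum_{n \in \mathbb{N}} a_n u_n \overline{v_n}$$

is well defined on $\ell^2_a(\mathbb{N}, \mathbb{C}) \times \ell^2_a(\mathbb{N}, \mathbb{C})$ (i.e. $\langle u, v \rangle$ exists for all $u, v \in \ell^2_a(\mathbb{N}, \mathbb{C})$), and deduce that this is an inner product.

Solution to Exercise 4.2
Since $u, v \in \ell^2_a(\mathbb{N}, \mathbb{C})$, the sequences $\sqrt{a}\, u = (\sqrt{a_n}\, u_n)_{n \in \mathbb{N}}$ and $\sqrt{a}\, v = (\sqrt{a_n}\, v_n)_{n \in \mathbb{N}}$ belong to $\ell^2(\mathbb{N}, \mathbb{C})$, so:

$$\langle u, v \rangle_{\ell^2_a} = \sum_{n \in \mathbb{N}} \sqrt{a_n}\, u_n \, \sqrt{a_n}\, \overline{v_n} = \langle \sqrt{a}\, u, \sqrt{a}\, v \rangle_{\ell^2}$$

and the series converges because the inner product of $\ell^2(\mathbb{N}, \mathbb{C})$ is well defined.

The sesquilinearity and conjugate symmetry of $\langle u, v \rangle_{\ell^2_a}$ follow directly from the analogous properties of the inner product of $\ell^2(\mathbb{N}, \mathbb{C})$. The only property to verify explicitly is positive definiteness. If $u \in \ell^2_a(\mathbb{N}, \mathbb{C})$, then $\langle u, u \rangle_{\ell^2_a} = \sum_{n \in \mathbb{N}} a_n |u_n|^2 \ge 0$, as it is a sum of non-negative terms. This formula also shows that $\langle u, u \rangle_{\ell^2_a} = 0 \iff a_n |u_n|^2 = 0$ for all $n \in \mathbb{N}$, but $a_n > 0$ for all $n \in \mathbb{N}$ by hypothesis, thus $|u_n|^2 = 0 \iff u_n = 0 \ \forall n \in \mathbb{N}$, that is, $u = 0_{\ell^2_a}$. □

Exercise 4.3
Take $s \in \mathbb{R}$, $s > 0$ and:

$$H^s = \left\{ u = (u_n)_{n \in \mathbb{N}} \subset \mathbb{C} \ : \ \sum_{n \in \mathbb{N}} (1 + n^2)^s |u_n|^2 < +\infty \right\}$$

$H^s$ is a Hilbert space which is often encountered when solving differential equations using the Fourier transform.
1) Show that $H^s$ is a vector subspace of $\ell^2(\mathbb{N}, \mathbb{C})$.
2) Let $\varphi : H^s \times H^s \to \mathbb{C}$ be the map defined by:

$$\varphi(u, v) := \sum_{n \in \mathbb{N}} (1 + n^2)^s u_n \overline{v_n} \qquad \forall u, v \in H^s$$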
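presuming, for the moment, that the map is well defined, that is, that the series converges. For any sequence $w = (w_n)_{n \in \mathbb{N}} \in H^s$, define the sequence $\tilde{w}$ as follows:

$$\tilde{w}_n = (1 + n^2)^{s/2} w_n \qquad \forall n \in \mathbb{N}$$

a) Show that $\tilde{w} \in \ell^2(\mathbb{N}, \mathbb{C})$ and that:

$$\varphi(u, v) = \langle \tilde{u}, \tilde{v} \rangle_{\ell^2} \qquad \forall u, v \in H^s$$

where $\langle \ , \ \rangle_{\ell^2}$ is the usual inner product of $\ell^2(\mathbb{N}, \mathbb{C})$.
b) Deduce that $\varphi$ is well defined on $H^s \times H^s$, then that it is an inner product, denoted $\varphi = \langle \ , \ \rangle_{H^s}$.
3) We wish to show that $(H^s, \langle \ , \ \rangle_{H^s})$ is a Hilbert space. To do this, let us fix an arbitrary Cauchy sequence $(u_m)_{m \in \mathbb{N}}$ in $H^s$.
a) Show that $(\tilde{u}_m)_{m \in \mathbb{N}}$ is Cauchy in $\ell^2(\mathbb{N}, \mathbb{C})$.
b) Deduce that $(\tilde{u}_m)_{m \in \mathbb{N}}$ converges in $\ell^2(\mathbb{N}, \mathbb{C})$ to a limit, which we shall denote $\tilde{l}$.
c) Define the sequence $l = (l_n)_{n \in \mathbb{N}}$ by:

$$l_n = \frac{1}{(1 + n^2)^{s/2}} \tilde{l}_n \qquad \forall n \in \mathbb{N}$$

Show that $l$ belongs to $H^s$, that $(u_m)_{m \in \mathbb{N}}$ converges to $l$ in $H^s$, and conclude your proof.

Solution to Exercise 4.3
1) To show that $H^s$ is a vector subspace of $\ell^2(\mathbb{N}, \mathbb{C})$ we shall demonstrate, in order, that $u \in H^s \implies u \in \ell^2(\mathbb{N}, \mathbb{C})$, that $H^s \ne \emptyset$, and that $H^s$ is stable with respect to linear combinations of its elements. For any sequence $u = (u_n)_{n \in \mathbb{N}} \subset \mathbb{C}$ it holds that $0 \le |u_n|^2 \le (1 + n^2)^s |u_n|^2$ for all $n \in \mathbb{N}$, hence, if $u \in H^s$:

$$\sum_{n \in \mathbb{N}} |u_n|^2 \le \sum_{n \in \mathbb{N}} (1 + n^2)^s |u_n|^2 \underset{\text{def. of } H^s}{<} +\infty \implies u \in \ell^2(\mathbb{N}, \mathbb{C})$$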
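Evidently, $0_{\ell^2} \in H^s$, thus $H^s \ne \emptyset$. Finally, taking $\lambda \in \mathbb{C}$ and $u, v \in H^s$, then:

$$0 \le |u_n + \lambda v_n|^2 \le (|u_n| + |\lambda| |v_n|)^2 \le 2 (|u_n|^2 + |\lambda|^2 |v_n|^2)$$

where the final inequality draws on the fact that the moduli are real numbers and that, for all $a, b \in \mathbb{R}$, $0 \le (a - b)^2 = a^2 + b^2 - 2ab = 2a^2 - a^2 + 2b^2 - b^2 - 2ab$, so $a^2 + b^2 + 2ab \le 2a^2 + 2b^2$, that is, $(a + b)^2 \le 2(a^2 + b^2)$; writing $a = |u_n|$ and $b = |\lambda| |v_n|$, we obtain the final inequality of the previous formula. Now, with respect to the series, we can write:

$$\sum_{n \in \mathbb{N}} (1 + n^2)^s |u_n + \lambda v_n|^2 \le 2 \left( \sum_{n \in \mathbb{N}} (1 + n^2)^s |u_n|^2 + |\lambda|^2 \sum_{n \in \mathbb{N}} (1 + n^2)^s |v_n|^2 \right) < +\infty$$

as $u, v \in H^s$, thus $u + \lambda v \in H^s$ and so $H^s$ is a vector subspace of $\ell^2(\mathbb{N}, \mathbb{C})$.

2) a) $\tilde{w} \in \ell^2(\mathbb{N}, \mathbb{C})$ if $\sum_{n \in \mathbb{N}} |\tilde{w}_n|^2 < +\infty$, but:

$$\sum_{n \in \mathbb{N}} |\tilde{w}_n|^2 = \sum_{n \in \mathbb{N}} (1 + n^2)^s |w_n|^2 < +\infty$$

since $w \in H^s$, so $\tilde{w} \in \ell^2(\mathbb{N}, \mathbb{C})$. Now, taking $u, v \in H^s$:

$$\varphi(u, v) = \sum_{n \in \mathbb{N}} (1 + n^2)^s u_n \overline{v_n} = \sum_{n \in \mathbb{N}} (1 + n^2)^{s/2} u_n \, \overline{(1 + n^2)^{s/2} v_n} = \sum_{n \in \mathbb{N}} \tilde{u}_n \overline{\tilde{v}_n} = \langle \tilde{u}, \tilde{v} \rangle_{\ell^2}$$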
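b) We have:

$$|\varphi(u, v)| = \left| \sum_{n \in \mathbb{N}} (1 + n^2)^s u_n \overline{v_n} \right| \le \sum_{n \in \mathbb{N}} |(1 + n^2)^s u_n \overline{v_n}| = \sum_{n \in \mathbb{N}} |(1 + n^2)^{s/2} u_n \, (1 + n^2)^{s/2} \overline{v_n}| = \sum_{n \in \mathbb{N}} |\tilde{u}_n| |\tilde{v}_n| = \langle |\tilde{u}|, |\tilde{v}| \rangle_{\ell^2} \underset{\text{Cauchy-Schwarz}}{\le} \|\tilde{u}\|_2 \|\tilde{v}\|_2 < +\infty$$

where $|\tilde{u}| = (|\tilde{u}_n|)_{n \in \mathbb{N}}$ and $|\tilde{v}| = (|\tilde{v}_n|)_{n \in \mathbb{N}}$; thus $\varphi(u, v)$ is well defined for all $u, v \in H^s$. By the fact that $\varphi(u, v) = \langle \tilde{u}, \tilde{v} \rangle_{\ell^2}$, we know that $\varphi$ is an inner product: it is Hermitian and sesquilinear, since $\langle \ , \ \rangle_{\ell^2}$ possesses these properties. Regarding positive definiteness, we simply note that for all $u \in H^s$, $\varphi(u, u) = 0$ implies $\sum_{n \in \mathbb{N}} (1 + n^2)^{s/2} u_n \, \overline{(1 + n^2)^{s/2} u_n} = \langle \tilde{u}, \tilde{u} \rangle_{\ell^2} = \|\tilde{u}\|_2^2 = 0$, that is, $\tilde{u} = 0$, that is, $(1 + n^2)^{s/2} u_n = 0 \iff u_n = 0 \ \forall n \in \mathbb{N}$. Hence $\varphi$ is a complex inner product on $H^s$, and this is denoted $\varphi(u, v) = \langle u, v \rangle_{H^s}$.

3) a) To prove that if $(u_m)_{m \in \mathbb{N}}$ is an arbitrary Cauchy sequence in $H^s$ then $(\tilde{u}_m)_{m \in \mathbb{N}}$ is a Cauchy sequence in $\ell^2(\mathbb{N}, \mathbb{C})$, we write the Cauchy condition in its squared form:

$$\forall \varepsilon > 0 \ \exists N_\varepsilon \in \mathbb{N} \ : \ m, k \ge N_\varepsilon \implies \|u_m - u_k\|_{H^s}^2 < \varepsilon^2$$

but $\|u_m - u_k\|_{H^s}^2 = \langle u_m - u_k, u_m - u_k \rangle_{H^s} \overset{(2a)}{=} \langle \widetilde{u_m - u_k}, \widetilde{u_m - u_k} \rangle_{\ell^2}$, and, componentwise:

$$\widetilde{u_m - u_k} = (1 + n^2)^{s/2} (u_m - u_k) = (1 + n^2)^{s/2} u_m - (1 + n^2)^{s/2} u_k = \tilde{u}_m - \tilde{u}_k$$

hence $\|u_m - u_k\|_{H^s}^2 = \langle \widetilde{u_m - u_k}, \widetilde{u_m - u_k} \rangle_{\ell^2} = \langle \tilde{u}_m - \tilde{u}_k, \tilde{u}_m - \tilde{u}_k \rangle_{\ell^2} = \|\tilde{u}_m - \tilde{u}_k\|_2^2$, which implies that $(\tilde{u}_m)_{m \in \mathbb{N}}$ is a Cauchy sequence in $\ell^2(\mathbb{N}, \mathbb{C})$.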
b) Given that $\ell^2(\mathbb{N}, \mathbb{C})$ is complete, the Cauchy sequence $(\tilde{u}_m)_{m \in \mathbb{N}}$ converges to an element of $\ell^2(\mathbb{N}, \mathbb{C})$, which we denote $\tilde{l}$.
c) Let us consider the sequence $l = \tilde{l} / (1 + n^2)^{s/2}$ and show that it belongs to $H^s$ by calculating the square of its norm in $H^s$:

$$\|l\|_{H^s}^2 = \sum_{n \in \mathbb{N}} (1 + n^2)^s |l_n|^2 = \sum_{n \in \mathbb{N}} (1 + n^2)^s \frac{|\tilde{l}_n|^2}{(1 + n^2)^s} = \sum_{n \in \mathbb{N}} |\tilde{l}_n|^2 < +\infty$$

so $l \in H^s$. Now, let us show that $(u_m)_{m \in \mathbb{N}}$ converges to $l$: using the result from point (2a), we have $\langle u_m - l, u_m - l \rangle_{H^s} = \langle \widetilde{u_m - l}, \widetilde{u_m - l} \rangle_{\ell^2}$. Since we have also seen that $\widetilde{u_m - l} = \tilde{u}_m - \tilde{l}$, it holds that $\|u_m - l\|_{H^s}^2 = \|\tilde{u}_m - \tilde{l}\|_2^2 \to 0$ as $m \to +\infty$, by (3b), that is, $(u_m)_{m \in \mathbb{N}}$ converges to $l$ in $H^s$. We have thus demonstrated that the arbitrary Cauchy sequence $(u_m)_{m \in \mathbb{N}}$ converges inside $H^s$, that is, $H^s$ is a Hilbert space. □

4.4.2. $L^\infty$ and $\ell^\infty$ spaces
The case where $p = \infty$ has been deliberately excluded up to this point, and will be examined separately here. Let $(X, \mathcal{A}, \mu)$ be a measure space, as before, and let $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$. We begin by defining the space:

$$\mathcal{L}^\infty(X, \mathcal{A}, \mu) = \{ f : X \to \mathbb{K} \text{ measurable} \ : \ \exists M \in \mathbb{R}, \ M \ge 0, \text{ such that } |f(x)| \le M \text{ a.e.} \}$$

The elements of $\mathcal{L}^\infty(X, \mathcal{A}, \mu)$ are known as essentially bounded functions, that is, functions which are bounded on the complement of a set of measure zero w.r.t. $\mu$. As in the case of $L^p$ spaces, we need to introduce the equivalence relation:

$$f, g \in \mathcal{L}^\infty(X, \mathcal{A}, \mu), \qquad f \sim g \ \text{ if } f = g \text{ a.e.}$$

to make the quotient space:

$$L^\infty(X, \mathcal{A}, \mu) = \mathcal{L}^\infty(X, \mathcal{A}, \mu) / \sim$$

a normed vector space with norm given by:

$$\|f\|_\infty = \inf \{ M \ge 0 \ : \ |f(x)| \le M \text{ a.e.} \}$$

which we shall call $\mathrm{ess\,sup}(f)$, read as the essential supremum of $f$, and which, by definition, satisfies $|f(x)| \le \|f\|_\infty$ a.e. for all $f \in L^\infty(X, \mathcal{A}, \mu)$.
The symbol $\infty$ has its origins in the fact that if $f \in L^{p_0} \cap L^\infty$ for some $1 \le p_0 < +\infty$, then:

$$\|f\|_\infty = \lim_{p \to +\infty} \|f\|_p$$
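This limit can be observed numerically on a finite sequence, which is an element of $\ell^p \cap \ell^\infty$ for the counting measure. The sketch below is illustrative only: the vector `x`, the list of exponents and the helper `p_norm` are assumptions made for the example.

```python
import numpy as np

def p_norm(x, p):
    """l^p norm of a finite sequence, factoring out the largest modulus
    so that large exponents do not overflow."""
    x = np.abs(np.asarray(x, dtype=float))
    m = float(x.max())
    if m == 0.0:
        return 0.0
    return m * float(np.sum((x / m) ** p) ** (1.0 / p))

x = np.array([0.5, -2.0, 1.5, 0.25])   # illustrative data
sup_norm = float(np.max(np.abs(x)))     # ||x||_inf

# ||x||_p for increasing p: the values decrease towards ||x||_inf
norms = {p: p_norm(x, p) for p in (1, 2, 10, 100, 1000)}
for p, value in norms.items():
    print(p, value, value - sup_norm)
```

The printed differences shrink as $p$ grows, in agreement with the limit above.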
As in the case of $L^p$ spaces, the case of continuous functions requires further clarification. If a continuous function $f$ is such that $|f(x)| > M$ at some point $x$, then, by continuity, there exists a neighborhood of $x$ of positive radius in which $f$ is not bounded by $M$. Thus, a continuous and essentially bounded function is actually a bounded function in the usual sense.

We also define:

$$\ell^\infty(\mathbb{N}, \mathbb{K}) = L^\infty(\mathbb{N}, \mathcal{P}(\mathbb{N}), \mu_{\text{counting}}) = \{ (x_n)_{n \in \mathbb{N}} \ : \ x_n \in \mathbb{K} \ \forall n \in \mathbb{N}, \ \exists M \ge 0 \ : \ |x_n| \le M \ \forall n \in \mathbb{N} \}$$

that is, $\ell^\infty$ is the space of bounded sequences (a similar definition is obtained if we exchange $\mathbb{N}$ for $\mathbb{Z}$). $\ell^\infty(\mathbb{N}, \mathbb{K})$ is a normed space with:

$$\|(x_n)_{n \in \mathbb{N}}\|_\infty = \sup_{n \in \mathbb{N}} |x_n|$$

THEOREM 4.15.– $(L^\infty(X, \mathcal{A}, \mu), \| \ \|_\infty)$ and $(\ell^\infty(\mathbb{N}, \mathbb{K}), \| \ \|_\infty)$ are Banach spaces.

PROOF.– Let us set out the proof for $L^\infty(X, \mathcal{A}, \mu)$; the fact that $\ell^\infty(\mathbb{N}, \mathbb{K})$ is a Banach space then follows as a special case. We must show that, if $(f_n)_{n \in \mathbb{N}}$ is a Cauchy sequence of elements of $L^\infty(X, \mathcal{A}, \mu)$, then it converges to an element of $L^\infty(X, \mathcal{A}, \mu)$. By the definition of a Cauchy sequence, we have:

$$\forall \varepsilon > 0 \ \exists N_\varepsilon > 0 \ : \ n, m \ge N_\varepsilon \implies \|f_n - f_m\|_\infty < \varepsilon \qquad [4.24]$$

This will be used later. Now, let us consider the sets of points where the functions in the sequence behave in a "peculiar" manner:

$$A_k = \{ x \in X : |f_k(x)| > \|f_k\|_\infty \}, \qquad B_{n,m} = \{ x \in X : |f_n(x) - f_m(x)| > \|f_n - f_m\|_\infty \}$$

By the definition of $L^\infty(X, \mathcal{A}, \mu)$, $\mu(A_k) = \mu(B_{n,m}) = 0$ and:

$$\forall x \in A_k^c : |f_k(x)| \le \|f_k\|_\infty, \qquad \forall x \in B_{n,m}^c : |f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty$$
To eliminate the dependency on the indices $k, n, m$, we construct the set:

$$E = \bigcup_{k \in \mathbb{N}} A_k \ \cup \bigcup_{n, m \in \mathbb{N}} B_{n,m}$$

which has measure zero, $\mu(E) = 0$, as a countable union of sets of measure zero. Now, we observe that:

$$\forall x \in E^c, \ \forall n, m \ge N_\varepsilon \ : \ |f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty \underset{[4.24]}{<} \varepsilon \qquad [4.25]$$

so $(f_n(x))_{n \in \mathbb{N}}$ is a Cauchy sequence of elements of $\mathbb{K}$, which is complete; thus, there exists a pointwise limit $f(x) = \lim_{n \to +\infty} f_n(x)$.

Equation [4.25] can be passed to the limit $n \to +\infty$, thus for all $\varepsilon > 0$ we have:

$$\forall x \in E^c, \ \forall m \ge N_\varepsilon \ : \ \left| \lim_{n \to +\infty} f_n(x) - f_m(x) \right| = |f(x) - f_m(x)| \le \varepsilon$$

which is the definition of uniform convergence of the sequence $(f_n)_{n \in \mathbb{N}} \subset L^\infty$ to $f$ on $E^c$. A standard result of calculus guarantees that if a sequence of bounded functions converges uniformly to a function, then the limit function is also bounded; in our case, this implies that $f$ is bounded on $E^c$. The final step is to extend the definition of $f$ to a function $\tilde{f}$ defined on all of $X$ (since the elements of $L^\infty(X, \mathcal{A}, \mu)$ are defined on all of $X$) while retaining the property of essential boundedness. This is trivial, as we simply take:

$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \in E^c \\ 0 & \text{if } x \in E \end{cases}$$

Since $\mu(E) = 0$, $\tilde{f} : X \to \mathbb{K}$ is the representative of an equivalence class of $L^\infty(X, \mathcal{A}, \mu)$ to which the Cauchy sequence $(f_n)_{n \in \mathbb{N}}$ converges; this concludes our proof. □
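Exercise 4.4
Consider a sequence $a = (a_k)_{k \in \mathbb{Z}}$ and, for all $u \in \ell^\infty(\mathbb{Z}, \mathbb{C})$, let $a * u$ be the bilateral sequence defined for $k \in \mathbb{Z}$ by:

$$(a * u)_k = \sum_{m \in \mathbb{Z}} a_m u_{k-m}$$

For a given $f \in \ell^\infty(\mathbb{Z}, \mathbb{C})$, let us take $T(u) := a * u + f$.

1) For the purposes of this question, we take $a = \delta_1$, that is, the sequence defined by $a_1 = 1$ and $a_j = 0$ if $j \ne 1$. Calculate $a * u$ as a function of $u$.
2) First, suppose that $a = (a_k)_{k \in \mathbb{Z}} \in \ell^1$ satisfies $\|a\|_1 = \sum_{k \in \mathbb{Z}} |a_k| < 1$.
a) Show that $(a * u)_k$ is well defined for all $k \in \mathbb{Z}$ and that $a * u \in \ell^\infty$.
b) Show that $T : \ell^\infty(\mathbb{Z}, \mathbb{C}) \to \ell^\infty(\mathbb{Z}, \mathbb{C})$ is a contraction.
c) Deduce that there exists a unique solution $u \in \ell^\infty(\mathbb{Z}, \mathbb{C})$ to the equation $T(u) = u$.
3) Now, let us suppose that $a = (a_k)_{k \in \mathbb{Z}} \in \ell^2$ satisfies $\|a\|_2^2 := \sum_{k \in \mathbb{Z}} |a_k|^2 < 1$.
a) Using an example, show that we can have $a \notin \ell^1(\mathbb{Z}, \mathbb{C})$.
b) Show that, for $u \in \ell^2(\mathbb{Z}, \mathbb{C})$, $(a * u)_k$ is well defined for all $k \in \mathbb{Z}$ and that $a * u \in \ell^\infty(\mathbb{Z}, \mathbb{C})$.
c) Deduce that, for all $u \in \ell^2(\mathbb{Z}, \mathbb{C})$, $T(u) \in \ell^\infty(\mathbb{Z}, \mathbb{C})$ and that if $u, v \in \ell^2(\mathbb{Z}, \mathbb{C})$, $u \ne v$, then $\|T(u) - T(v)\|_\infty < \|u - v\|_2$.
d) Now, take $a = \frac{1}{2} \delta_1$ and let $f = 1$ be the constant sequence $f_j = 1$ for all $j \in \mathbb{Z}$. Calculate $T(u)$ as a function of $u$ and, for $u \in \ell^2(\mathbb{Z}, \mathbb{C})$, determine $\lim_{k \to +\infty} (T(u))_k$.
e) Deduce that there is no $u \in \ell^2(\mathbb{Z}, \mathbb{C})$ such that $T(u) = u$. Does this contradict the fixed-point theorem? Hint: There is no need to determine $u$ to answer this question.
f) Determine $u \in \ell^\infty(\mathbb{Z}, \mathbb{C})$ such that $T(u) = u$.

Solution to Exercise 4.4
1) By definition:

$$(\delta_1 * u)_k = \sum_{m \in \mathbb{Z}} \delta_{m,1} u_{k-m} = u_{k-1}$$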
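2) a) By direct calculation:

$$|(a * u)_k| = \left| \sum_{m \in \mathbb{Z}} a_m u_{k-m} \right| \le \sum_{m \in \mathbb{Z}} |a_m u_{k-m}| \le \|u\|_\infty \sum_{m \in \mathbb{Z}} |a_m| = \|u\|_\infty \|a\|_1 < +\infty$$

since $a \in \ell^1(\mathbb{Z}, \mathbb{C})$ and $u \in \ell^\infty(\mathbb{Z}, \mathbb{C})$. Furthermore, as the majorization is independent of $k$, $\|a * u\|_\infty = \sup_{k \in \mathbb{Z}} |(a * u)_k| \le \|u\|_\infty \|a\|_1 < +\infty$.

b) Once again, by direct calculation, we have $\|T(u) - T(v)\|_\infty = \|a * u + f - a * v - f\|_\infty = \|a * u - a * v\|_\infty$, but $u \mapsto a * u$ is linear, so from what we saw in the previous question: $\|T(u) - T(v)\|_\infty = \|a * (u - v)\|_\infty \le \|a\|_1 \|u - v\|_\infty$. Since $\|a\|_1 < 1$ by hypothesis, $T$ is a contraction.

c) Since $(\ell^\infty(\mathbb{Z}, \mathbb{C}), \| \ \|_\infty)$ is a complete normed (and therefore metric) space, the fixed-point theorem gives us the existence of a unique element $\bar{u} \in \ell^\infty(\mathbb{Z}, \mathbb{C})$ such that $T(\bar{u}) = \bar{u}$, that is, $\bar{u} = a * \bar{u} + f$.

3) a) The simplest example of a sequence $a \in \ell^2(\mathbb{Z}, \mathbb{C})$ such that $a \notin \ell^1(\mathbb{Z}, \mathbb{C})$ is probably:

$$a_k = \begin{cases} 0 & k \le 0 \\ \frac{1}{k} & \text{otherwise.} \end{cases}$$

In this case, $\sum_{k \in \mathbb{Z}} |a_k| = \sum_{k=1}^{\infty} \frac{1}{k}$, the harmonic series, which we know to be divergent, so $a \notin \ell^1(\mathbb{Z}, \mathbb{C})$. On the other hand, $\sum_{k \in \mathbb{Z}} |a_k|^2 = \sum_{k=1}^{\infty} \frac{1}{k^2}$, which is convergent, that is, $a \in \ell^2(\mathbb{Z}, \mathbb{C})$. Since $\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6} > 1$, the condition $\|a\|_2 < 1$ is obtained by multiplying $a$ by a constant $c$ with $0 < c < \sqrt{6}/\pi$, which affects neither conclusion.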
˜ ÿ
|am uk´m | ď
mPZ
ď
|am |
mPZ ¸1{2
˜ }a}22
ÿ
ÿ
|un |
¸1{2 ÿ
|uk´m |
mPZ
“ }a}2 }u}2
nPZ
using the change ofř variable n “ k ´řm, with fixed k P Z and m P Z, thus n P Z. Since pa ˚ uqk “ |am uk´m | ă `8, for all fixed k P Z, the am uk´m ď mPZ
mPZ
sequence a ˚ u is well defined. Furthermore, as in question 2a, since the majorization does not depend on k, }a ˚ u}8 “ suptpa ˚ uqk u ď }a}2 }u}2 ă `8. kPZ
c) T puq “ a ˚ u ` f is the sum of two elements of 8 pZ, Cq (f by hypothesis, and a ˚ u as demonstrated above), so T puq P 8 pZ, Cq. Once again, u ÞÑ a ˚ u is clearly linear, so, using the result from the previous question: }T puq´T pvq}8 “ }a˚u´a˚v}8 “ }a˚pu´vq}8 ď }a}2 }u´v}2 ă }u´v}2 since, by hypothesis, }a}2 ă 1. u
d) Using the from question 1, we have pT puqqk “ k´1 2 ` 1. Moreover, ř result 2 2 as u P pZ, Cq, |uk | converges, we necessarily have uk ÝÑ 0, which implies kÑ`8
kPZ
pT puqqk ÝÑ 1. kÑ`8
u
e) Taking T puq “ u, we would have uk “ k´1 2 ` 1, and, taking the limit for k which tends to infinity on both sides, we would obtain the absurd result 0 “ 1. There is no contradiction with the fixed-point theorem, since the inequality }T puq ´ T pvq}8 ă
Banach Spaces and Hilbert Spaces
161
}u ´ v}2 does not involve }T puq ´ T pvq}2 . . . Evidently, as there is no fixed point, T cannot be a contraction on 2 pZ, Cq. f) A sequence u P 8 pZ, Cq such that T puq “ u is a bounded sequence uk “ ` 1 (this is an “arithmetico-geometric” sequence). Taking uk “ vk ` α, with u vk´1 `α v α ` 1, that is, vk “ k´1 unknown vk and α, then vk ` α “ k´1 2 `1 “ 2 2 `1´ 2 vk´1 thus, if we take α “ 2, we obtain a geometric sequence vk “ 2 ; by a standard result for geometric sequences, vk “ 2´k v0 . Furthermore, v0 “ u0 ´ α and α “ 2, hence v0 “ u0 ´2, implying that uk “ 2´k pu0 ´2q`2. For all k ě 0, 2´k ă 1, but for k ă 0, 2´k is not bounded, so to obtain a bounded uk , we need to eliminate its factor, that is, to impose u0 ´ 2 “ 0. Finally, we see that the only sequence u P 8 pZ, Cq such that T puq “ u, that is, the only fixed point for the contraction T : 8 pZ, Cq Ñ 8 pZ, Cq, is the constant sequence of 2, uk “ 2 for all k P Z. 2 uk´1 2
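The fixed point found in question (f) can be approached by iterating $T$, as in the proof of the fixed-point theorem. The sketch below makes one simplifying assumption that is not in the exercise: it restricts $T$ to periodic sequences, which are bounded and are mapped to periodic sequences by $T$, so that one period can be stored in an array. The names `T` and `errors` and the period length 8 are illustrative.

```python
import numpy as np

def T(u):
    """(T u)_k = u_{k-1} / 2 + 1 for a = (1/2) * delta_1 and f = 1.
    u holds one period of a periodic sequence; np.roll(u, 1)[k] equals u[k-1]
    with indices taken modulo the period."""
    return 0.5 * np.roll(u, 1) + 1.0

rng = np.random.default_rng(1)
u = rng.uniform(-5.0, 5.0, size=8)   # arbitrary bounded starting point

errors = []                          # sup-norm distance to the constant sequence 2
for _ in range(60):
    u = T(u)
    errors.append(float(np.max(np.abs(u - 2.0))))

print(errors[0], errors[9], errors[-1])
```

Each iteration halves the sup-norm distance to the constant sequence 2, which is the contraction factor $\|a\|_1 = 1/2$ of this particular $T$.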
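4.4.3. Inclusion relationships between $\ell^p$ spaces
Let us introduce the following function space:

$$\ell^0(\mathbb{N}, \mathbb{K}) \equiv \ell_{\text{fin}}(\mathbb{N}, \mathbb{K}) = \{ (x_n)_{n \in \mathbb{N}} \subset \mathbb{K}, \ \exists N \in \mathbb{N} : x_n = 0 \ \forall n \ge N \} \qquad [4.26]$$

that is, the space of sequences with a finite number of non-zero elements. Clearly, $\ell^0(\mathbb{N}, \mathbb{K}) \subset \ell^p(\mathbb{N}, \mathbb{K}) \ \forall p \ge 1$.

THEOREM 4.16.– Taking $p, q \in \mathbb{R}$, $1 \le p \le q < \infty$, then:

$$\ell^0(\mathbb{N}, \mathbb{K}) \subset \ell^1(\mathbb{N}, \mathbb{K}) \subset \ldots \subset \ell^p(\mathbb{N}, \mathbb{K}) \subset \ldots \subset \ell^q(\mathbb{N}, \mathbb{K}) \subset \ldots \subset \ell^\infty(\mathbb{N}, \mathbb{K})$$

PROOF.– Given that $\ell^0(\mathbb{N}, \mathbb{K}) \subset \ell^1(\mathbb{N}, \mathbb{K})$, the demonstration that $\ell^p(\mathbb{N}, \mathbb{K}) \subset \ell^\infty(\mathbb{N}, \mathbb{K}) \ \forall 1 \le p < \infty$ is almost trivial since:

$$(x_n)_{n \in \mathbb{N}} \in \ell^p(\mathbb{N}, \mathbb{K}) \iff \sum_{n \in \mathbb{N}} |x_n|^p < +\infty$$

which gives us $|x_n| \to 0$ as $n \to +\infty$, that is, $|x_n|$ is bounded and thus $(x_n)_{n \in \mathbb{N}} \in \ell^\infty(\mathbb{N}, \mathbb{K})$.

It only remains to prove that $\ell^p(\mathbb{N}, \mathbb{K}) \subset \ell^q(\mathbb{N}, \mathbb{K})$ if $1 \le p \le q$: as $|x_n| \to 0$ as $n \to +\infty$, then, in particular, $\exists N \in \mathbb{N}$ such that $|x_n| \le 1 \ \forall n \ge N$, thus $|x_n|^q \le |x_n|^p \ \forall n \ge N$, which implies that:

$$\sum_{n \ge N} |x_n|^q \le \sum_{n \ge N} |x_n|^p$$

The convergence of $\sum_{n \in \mathbb{N}} |x_n|^p$ therefore implies the convergence of $\sum_{n \in \mathbb{N}} |x_n|^q$, that is, $\ell^p \subset \ell^q$. □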
REMARK.– The completeness of an infinite-dimensional metric space depends on the metric selected for the space. To verify this statement, let us examine the completeness of $(\ell^1, \| \ \|_\infty)$, that is, $\ell^1$ interpreted as a subspace of $\ell^\infty$ and equipped with the norm of this latter space.

Exercise 4.5
Show that $(\ell^1, \| \ \|_\infty)$ is not complete.

Solution to Exercise 4.5
Since $\ell^1 \subset \ell^\infty$, to solve this problem we must prove that $\ell^1$ is not a closed subset of $\ell^\infty$ with respect to the norm $\| \ \|_\infty$, that is, that there exists at least one sequence of elements of $\ell^1$ which converges in $\ell^\infty$ (and so is Cauchy) to a limit outside $\ell^1$.

The elements of $\ell^1$ are sequences $x \equiv (x_n)_{n \in \mathbb{N}}$, so a sequence of elements of $\ell^1$ is a sequence of sequences. For all fixed $m \in \mathbb{N}$, we shall denote its $m$-th element by $x^m \equiv (x^m_n)_{n \in \mathbb{N}}$.

Now, let us verify that the sequence of elements of $\ell^1$ defined by:

$$x^m_n = \begin{cases} 0 & \text{if } n = 0 \\ \frac{1}{n} & \text{if } 1 \le n \le m \\ 0 & \text{if } n > m \end{cases}$$

converges to an element of $\ell^\infty \setminus \ell^1$. For all fixed $m \in \mathbb{N}$, the sequence $x^m$ is explicitly given by:

$$\left( 0, 1, \frac{1}{2}, \ldots, \frac{1}{m}, 0, 0, \ldots \right)$$

which shows that $x^m \in \ell^1$ for all fixed $m \in \mathbb{N}$. Now, consider the sequence $x^* \equiv (x^*_n)_{n \in \mathbb{N}}$ defined by:

$$x^*_n = \begin{cases} 0 & \text{if } n = 0 \\ \frac{1}{n} & \text{if } n \ge 1 \end{cases}$$

Clearly, $(x^*_n)_{n \in \mathbb{N}}$ is bounded, and thus belongs to $\ell^\infty$, but $\|(x^*_n)_{n \in \mathbb{N}}\|_1 = \sum_{n=1}^{\infty} \frac{1}{n} = +\infty$, so $(x^*_n)_{n \in \mathbb{N}} \notin \ell^1$. If we can show that $(x^m)_{m \in \mathbb{N}}$ converges to $x^*$ in the norm $\| \ \|_\infty$, this will complete our proof.
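To do this, we calculate:

$$\|x^m - x^*\|_\infty = \sup_{n \in \mathbb{N}} |x^m_n - x^*_n| = \sup_{n > m} \frac{1}{n}$$

Up to $n = m$, the difference $x^m_n - x^*_n$ is null, but when $n > m$, the difference becomes $|0 - \frac{1}{n}| = \frac{1}{n}$. By the definition of sup, $\sup_{n > m} \frac{1}{n} = \sup \{ \frac{1}{m+1}, \frac{1}{m+2}, \ldots \} = \frac{1}{m+1}$ and thus:

$$\|x^m - x^*\|_\infty = \frac{1}{m+1} \underset{m \to +\infty}{\longrightarrow} 0$$

□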
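The computation in this solution can be reproduced numerically. The sketch below truncates all sequences to their first `N` terms, which is an assumption of the example (it does not change the sup-norm distance as long as $m + 1 < N$); the names `x_m`, `sup_dist` and `l1_norm` are illustrative.

```python
import numpy as np

N = 10_000                       # truncation length (assumption of the sketch)
n = np.arange(N)

x_star = np.zeros(N)
x_star[1:] = 1.0 / n[1:]         # x*_n = 1/n for n >= 1, x*_0 = 0

def x_m(m):
    """Truncated harmonic sequence: 1/n for 1 <= n <= m, zero elsewhere."""
    x = np.zeros(N)
    x[1:m + 1] = 1.0 / n[1:m + 1]
    return x

ms = (1, 10, 100, 1000)
# Sup-norm distance to x*: equals 1/(m+1) and tends to 0
sup_dist = {m: float(np.max(np.abs(x_m(m) - x_star))) for m in ms}
# l^1 norms of x^m: the partial sums of the harmonic series, which keep growing
l1_norm = {m: float(np.sum(np.abs(x_m(m)))) for m in ms}

for m in ms:
    print(m, sup_dist[m], 1.0 / (m + 1), l1_norm[m])
```

The sup-norm distances shrink like $1/(m+1)$ while the $\ell^1$ norms of the approximants grow without bound, which is the numerical counterpart of $x^* \in \ell^\infty \setminus \ell^1$.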
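4.4.4. Inclusion relationships between $L^p$ spaces
In general, there are no inclusion relationships between $L^p(X, \mathcal{A}, \mu)$ spaces. For instance, consider $L^1(\mathbb{R})$, $L^2(\mathbb{R})$ and the following functions:

$$f(x) = \begin{cases} x^{-2/3} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}, \qquad g(x) = \begin{cases} x^{-2/3} & \text{if } x > 1 \\ 0 & \text{otherwise} \end{cases}$$

Clearly, $f \in L^1(\mathbb{R})$, but $f \notin L^2(\mathbb{R})$, since (see footnote 10):

$$\int_{\mathbb{R}} |f(x)| \, dx = \int_0^1 \frac{1}{x^{2/3}} \, dx < +\infty, \qquad \int_{\mathbb{R}} |f(x)|^2 \, dx = \int_0^1 \frac{1}{x^{4/3}} \, dx = +\infty$$

and $g \in L^2(\mathbb{R})$, but $g \notin L^1(\mathbb{R})$, because:

$$\int_{\mathbb{R}} |g(x)| \, dx = \int_1^{+\infty} \frac{1}{x^{2/3}} \, dx = +\infty, \qquad \int_{\mathbb{R}} |g(x)|^2 \, dx = \int_1^{+\infty} \frac{1}{x^{4/3}} \, dx < +\infty$$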
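The behavior of these four integrals can be made concrete with the exact antiderivative of $x^{-\alpha}$. In the sketch below, the singular endpoint $0$ is replaced by a small cutoff `eps` and the upper limit $+\infty$ by `1/eps`; the helper `integral_power` and the chosen cutoffs are assumptions of the example.

```python
def integral_power(alpha, a, b):
    """Exact value of the integral of x**(-alpha) over [a, b],
    for 0 < a < b and alpha != 1."""
    return (b ** (1.0 - alpha) - a ** (1.0 - alpha)) / (1.0 - alpha)

cutoffs = (1e-3, 1e-6, 1e-9)

# f = x^(-2/3) on (0, 1): integrate over [eps, 1]
f_l1 = [integral_power(2 / 3, eps, 1.0) for eps in cutoffs]        # stays below 3
f_l2 = [integral_power(4 / 3, eps, 1.0) for eps in cutoffs]        # grows without bound

# g = x^(-2/3) on (1, +inf): integrate over [1, 1/eps]
g_l1 = [integral_power(2 / 3, 1.0, 1.0 / eps) for eps in cutoffs]  # grows without bound
g_l2 = [integral_power(4 / 3, 1.0, 1.0 / eps) for eps in cutoffs]  # stays below 3

for row in zip(cutoffs, f_l1, f_l2, g_l1, g_l2):
    print(row)
```

The bounded columns correspond to $f \in L^1(\mathbb{R})$ and $g \in L^2(\mathbb{R})$, and the unbounded ones to $f \notin L^2(\mathbb{R})$ and $g \notin L^1(\mathbb{R})$.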
Inclusions among $L^p$ spaces can be obtained by imposing additional conditions. Since the spaces $L^1(\mathbb{R})$ and $L^2(\mathbb{R})$ are particularly important, we shall examine the conditions used for these spaces, which are often satisfied in practical applications, in Theorem 4.17.

THEOREM 4.17.– The following statements are true:
1) if $f \in L^1(\mathbb{R})$, with $f$ bounded, then $f \in L^2(\mathbb{R})$;
2) if $f \in L^2(\mathbb{R})$, with $f$ null outside of a finite interval, then $f \in L^1(\mathbb{R})$.

10 Recall that if $a > 0$ and $b > 0$, then $\int_0^a \frac{1}{x^\alpha} \, dx < +\infty$ and $\int_b^{+\infty} \frac{1}{x^\beta} \, dx < +\infty$ if and only if $\alpha < 1$ and $\beta > 1$.
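PROOF.–
1) If $f$ is in $L^1(\mathbb{R})$ and is bounded, say $|f(x)| \le M \ \forall x \in \mathbb{R}$, $M \ge 0$, then:

$$\int_{\mathbb{R}} |f(x)|^2 \, dx = \int_{\mathbb{R}} |f(x)| \cdot |f(x)| \, dx \le M \int_{\mathbb{R}} |f(x)| \, dx = M \|f\|_1 < +\infty$$

thus $f \in L^2(\mathbb{R})$.

2) If $f$ is in $L^2(\mathbb{R})$ and is null outside of a finite interval, say $f(x) = 0 \ \forall x \notin [a, b]$, then, writing $1(x) = 1 \ \forall x \in [a, b]$:

$$\int_{\mathbb{R}} |f(x)| \, dx = \int_{[a,b]} |f(x)| \, dx = \int_{[a,b]} 1(x) \cdot |f(x)| \, dx = \langle 1, |f| \rangle_{L^2[a,b]} \underset{\text{(Cauchy-Schwarz)}}{\le} \left( \int_{[a,b]} dx \right)^{1/2} \left( \int_{[a,b]} |f(x)|^2 \, dx \right)^{1/2} = \sqrt{b - a} \, \|f\|_2 < +\infty$$

so $f \in L^1(\mathbb{R})$. □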
Statement 1 remains valid for all $f \in L^1(\mathbb{R}^n)$, $n \ge 1$, while statement 2 remains valid if we replace an interval with a finite-measure subset of $\mathbb{R}^n$. More generally, in the case where $\mu(X) < +\infty$, it is possible to create a highly useful chain of inclusions.

THEOREM 4.18.– If $(X, \mathcal{A}, \mu)$ is a measure space with a finite measure, $\mu(X) < +\infty$, and if $q > p > 1$, then:

$$L^\infty(X, \mathcal{A}, \mu) \subset \ldots \subset L^q(X, \mathcal{A}, \mu) \subset \ldots \subset L^p(X, \mathcal{A}, \mu) \subset \ldots \subset L^1(X, \mathcal{A}, \mu)$$

PROOF.– First, let us verify the claim for $L^\infty$, then for $L^1$ and $L^2$ (which provide a clearer illustration of the approach used), and finally for $L^p$ and $L^q$.

If $f \in L^\infty(X, \mathcal{A}, \mu)$, then $\int_X |f|^p \, d\mu \le \int_X \|f\|_\infty^p \, d\mu = \|f\|_\infty^p \, \mu(X) < +\infty$, hence $f \in L^p(X, \mathcal{A}, \mu)$.

If $f \in L^2(X, \mathcal{A}, \mu)$, then:

$$\int_X |f| \, d\mu = \int_X |1 \cdot f| \, d\mu \underset{\text{Hölder inequ. [4.19]}}{\le} \left( \int_X 1^2 \, d\mu \right)^{1/2} \left( \int_X |f|^2 \, d\mu \right)^{1/2} = \sqrt{\mu(X)} \, \|f\|_2 < +\infty$$

hence $f \in L^1(X, \mathcal{A}, \mu)$.

Taking $E = \{ x \in X : |f(x)| \ge 1 \}$ and $F = \{ x \in X : |f(x)| \le 1 \}$, then $X = E \cup F$, and let $p < q$. Then $|f(x)|^p \le |f(x)|^q \ \forall x \in E$ and $|f(x)|^p \le 1 \ \forall x \in F$. Thus, if $f \in L^q$:

$$\int_X |f(x)|^p \, d\mu \le \int_E |f(x)|^q \, d\mu + \int_F 1 \, d\mu \le \int_X |f(x)|^q \, d\mu + \int_X 1 \, d\mu = \|f\|_q^q + \mu(X) < +\infty$$

that is, $f \in L^p$. □
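The final estimate of this proof, $\int_X |f|^p \, d\mu \le \|f\|_q^q + \mu(X)$, can be checked on a toy finite measure space. The sketch below assumes $X$ is a finite set carrying a weighted counting measure; the weights, the values of `f`, the exponents and the helper `integral_abs_power` are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
weights = rng.uniform(0.1, 1.0, size=200)   # mu({i}) > 0, so mu(X) = weights.sum() is finite
f = rng.normal(scale=3.0, size=200)         # values f(i); some have modulus below 1, some above

def integral_abs_power(f, weights, r):
    """Integral of |f|^r with respect to the weighted counting measure."""
    return float(np.sum(weights * np.abs(f) ** r))

p, q = 1.5, 4.0                             # any pair with p < q
lhs = integral_abs_power(f, weights, p)                             # integral of |f|^p
rhs = integral_abs_power(f, weights, q) + float(np.sum(weights))    # ||f||_q^q + mu(X)

print(lhs, rhs, lhs <= rhs)
```

The term `np.sum(weights)` plays the role of $\mu(X)$: without a finite total mass the bound on the set $F$ would be lost, which is why the chain of inclusions requires $\mu(X) < +\infty$.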
4.4.5. Density theorems in $L^p(X, \mathcal{A}, \mu)$
We shall begin our examination of dense subsets of $L^p$ by considering step functions.

4.4.5.1. Step functions
Let $(X, \mathcal{A}, \mu)$ be any measure space and $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$. A piecewise constant function on $X$ with values in $\mathbb{K}$ is known as a step or simple function. The rigorous definition of the space of these functions is:

$$\Sigma = \left\{ s : X \to \mathbb{K} \ : \ \exists N \in \mathbb{N}, \ \exists (\alpha_i)_{i=1}^N \subset \mathbb{K} : s = \sum_{i=1}^N \alpha_i \chi_{E_i}, \ E_i \text{ measurable and } \mu(E_i) < +\infty \text{ if } \alpha_i \ne 0 \right\}$$

The function

$$\chi_{E_i}(x) = \begin{cases} 1 & \text{if } x \in E_i \\ 0 & \text{if } x \notin E_i \end{cases}$$

is the indicator function of $E_i$.

THEOREM 4.19.– $\overline{\Sigma} = L^p(X, \mathcal{A}, \mu) \ \forall 1 \le p < \infty$, where the closure should be interpreted with respect to the topology of $L^p(X, \mathcal{A}, \mu)$, taking $\Sigma \subset L^p(X, \mathcal{A}, \mu)$.

4.4.5.2. Intersections: $L^p \cap L^q$ and $\ell^p \cap \ell^q$
THEOREM 4.20.– Let $(X, \mathcal{A}, \mu)$ be any measure space and $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$, then:

$$\overline{L^p(X, \mathcal{A}, \mu) \cap L^q(X, \mathcal{A}, \mu)} = \begin{cases} L^p(X, \mathcal{A}, \mu) \\ L^q(X, \mathcal{A}, \mu) \end{cases} \qquad \forall 1 \le p, q < \infty$$

In the first case, the intersection should be interpreted as a subset of $L^p(X, \mathcal{A}, \mu)$ and the closure with respect to the metric topology generated by the norm $\| \ \|_p$. In the second case, the intersection should be interpreted as a subset of $L^q(X, \mathcal{A}, \mu)$ and the closure with respect to the topology generated by the norm $\| \ \|_q$.
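Notably, as $\ell^p$ spaces are nested, it holds that:

$$\overline{\ell^p(\mathbb{N}, \mathbb{K})} = \ell^q(\mathbb{N}, \mathbb{K}) \qquad \forall 1 \le p < q < \infty$$

As before, for all fixed $q$, $\ell^p$ should be interpreted as a subspace of $\ell^q$ and the closure should be interpreted with respect to the norm $\| \ \|_q$.

THEOREM 4.21.– For all $p \in \mathbb{R}$, $1 \le p < +\infty$:

$$\overline{\ell^0(\mathbb{N}, \mathbb{K})} = \ell^p(\mathbb{N}, \mathbb{K})$$

that is, $\ell^0(\mathbb{N}, \mathbb{K})$ is dense in $\ell^p(\mathbb{N}, \mathbb{K})$ with respect to the topology generated by the norm $\| \ \|_p$.

PROOF.– Let $x = (x_n)_{n \in \mathbb{N}}$ be an arbitrary sequence in $\ell^p(\mathbb{N}, \mathbb{K})$. For each $N \in \mathbb{N}$, consider the sequence $x^N \in \ell^0(\mathbb{N}, \mathbb{K})$ defined by:

$$x^N_n := \begin{cases} x_n & \text{if } n < N \\ 0 & \text{otherwise} \end{cases}$$

then:

$$\|x - x^N\|_p^p = \sum_{n \in \mathbb{N}} |x_n - x^N_n|^p = \sum_{n=N}^{+\infty} |x_n|^p \underset{N \to +\infty}{\longrightarrow} 0$$

as this is the remainder of a convergent series (since $(x_n)_{n \in \mathbb{N}}$ belongs to $\ell^p(\mathbb{N}, \mathbb{K})$), which proves the density of $\ell^0(\mathbb{N}, \mathbb{K})$ in $\ell^p(\mathbb{N}, \mathbb{K})$. □

4.4.5.3. Test functions
Let $\Omega \subseteq \mathbb{R}^n$ be an open set.

DEFINITION 4.14.–

$$C_c^\infty(\Omega) = \{ f : \Omega \to \mathbb{K}, \ f \text{ infinitely differentiable on } \Omega \text{ and } \mathrm{supp}(f) \text{ compact in } \mathbb{R}^n \}$$

where $\mathrm{supp}(f) = \overline{\{ x \in \Omega : f(x) \ne 0 \}}$ is said to be the support of $f$.

The functions in $C_c^\infty(\Omega)$ are known as test functions, as they are so regular that they are often used to test the action and properties of certain "wild" operators. Test functions play a crucial role in distribution theory and in analyzing differential equations. The identically null function is obviously a test function; other explicit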
examples are much harder to find. The canonical example of a test function on $\mathbb{R}$ for any value of $\varepsilon > 0$ is given by:

$$f(x) = \begin{cases} \exp\left( - \dfrac{1}{1 - \left( \frac{x}{\varepsilon} \right)^2} \right) & \text{if } |x| < \varepsilon \\ 0 & \text{if } |x| \ge \varepsilon. \end{cases}$$

For the purposes of our discussion, we need a simple symbol to denote the partial derivative of a function of $n$ variables with respect to a multi-index $l = (l_1, l_2, \ldots, l_n) \in \mathbb{N}^n$ of length $|l| = l_1 + l_2 + \ldots + l_n$. The canonical notation is:

$$D^l f(x) = \frac{\partial^{|l|} f}{\partial x_1^{l_1} \partial x_2^{l_2} \ldots \partial x_n^{l_n}}(x) \qquad \forall x \in \mathbb{R}^n$$

hence $D^l f(x)$ is the partial derivative of $f$ at $x$, taken $l_1$ times with respect to $x_1$, $l_2$ times with respect to $x_2$, etc. This symbol appears in the (non-trivial) definition of a topology on $C_c^\infty(\Omega)$ with respect to which the convergence of a sequence of test functions $(f_m)_{m \in \mathbb{N}}$ to a test function $f$ is equivalent to fulfilling the following two conditions:
– there exists a compact set $K \subset \Omega$ such that $\mathrm{supp}(f_m) \subseteq K$ for all $m \in \mathbb{N}$;
– for all $l \in \mathbb{N}^n$, $D^l f_m(x) \to D^l f(x)$ as $m \to +\infty$, uniformly in $x$.

The space $C_c^\infty(\Omega)$ with this topology is usually written as $\mathcal{D}(\Omega)$ and is complete. The following density result holds true.

THEOREM 4.22.– Considering the Borel $\sigma$-algebra and the Lebesgue measure, then:

$$\overline{C_c^\infty(\Omega)} = L^p(\Omega) \qquad \forall 1 \le p < \infty$$

where $C_c^\infty(\Omega)$ should be interpreted as a subspace of $L^p(\Omega)$ and the closure should be interpreted with respect to the topology generated by the norm $\| \ \|_p$.

By the definition of closure, $C_c^\infty(\Omega)$ is not complete with respect to the topology generated by the norm $\| \ \|_p$, since there are sequences of elements of $C_c^\infty(\Omega)$ which converge to elements of $L^p(\Omega) \setminus C_c^\infty(\Omega)$.

4.4.5.4. Schwartz space
For simplicity's sake, particularly in terms of notation, we shall start by examining the case of a function of a single variable. Taking $k, l \in \mathbb{N}$ and $f \in C^\infty(\mathbb{R})$, for all $x \in \mathbb{R}$, we write:

$$f^{k,l}(x) = x^k \frac{d^l f}{dx^l}(x)$$
DEFINITION 4.15 (Schwartz space, $n = 1$).– The space of functions $f \in C^\infty(\mathbb{R})$ such that:

$$\lim_{|x| \to +\infty} |f^{k,l}(x)| = 0 \qquad \forall k, l \in \mathbb{N}$$

is known as the Schwartz space, or the space of rapidly decreasing functions. The canonical notation for this space is $\mathcal{S}(\mathbb{R})$.

Any element of $\mathcal{S}(\mathbb{R})$ is thus an infinitely differentiable function such that its derivative of any order, multiplied by any power of the variable, converges to 0 as the variable tends to $\pm\infty$. To satisfy this condition, a function must decrease very rapidly to zero at infinity, hence the alternative name for the functions of $\mathcal{S}(\mathbb{R})$. Evidently, $\mathcal{D}(\mathbb{R}) \subset \mathcal{S}(\mathbb{R})$, since test functions are null outside a compact set, but the inclusion is strict, as we see from the most important example of a rapidly decreasing function: the Gaussian $f(x) = e^{-x^2}$, which does not belong to $\mathcal{D}(\mathbb{R})$, as its support is not compact.

Now, let us consider a function of $n$ real variables $f \in C^\infty(\mathbb{R}^n)$. In this case, given two multi-indices $l, k \in \mathbb{N}^n$, we write:

$$f^{k,l}(x) = x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n} D^l f(x) \qquad \forall x \in \mathbb{R}^n$$

DEFINITION 4.16 (Schwartz space, arbitrary $n$).– The space of functions $f \in C^\infty(\mathbb{R}^n)$ such that:

$$\lim_{\|x\| \to +\infty} |f^{k,l}(x)| = 0 \qquad \forall k, l \in \mathbb{N}^n$$

is the Schwartz space, or the space of rapidly decreasing functions. The canonical notation for this space is $\mathcal{S}(\mathbb{R}^n)$.

By construction, $\mathcal{S}(\mathbb{R}^n)$ is stable with respect to partial differentiation and to multiplication by a polynomial. Functions of $\mathcal{S}(\mathbb{R}^n)$ (and their derivatives) decay at infinity faster than the reciprocal of any polynomial.

As in the case where $n = 1$, $\mathcal{D}(\mathbb{R}^n) \subset \mathcal{S}(\mathbb{R}^n)$ and the inclusion is strict, as the Gaussian $f(x) = e^{-\|x\|^2}$ belongs to $\mathcal{S}(\mathbb{R}^n)$, but not to $\mathcal{D}(\mathbb{R}^n)$.

It is possible to define a topology on $\mathcal{S}(\mathbb{R}^n)$ in which a sequence $(f_m)_{m \in \mathbb{N}}$ of functions of $\mathcal{S}(\mathbb{R}^n)$ converges to $f \in \mathcal{S}(\mathbb{R}^n)$ if $f_m^{k,l} \to f^{k,l}$ uniformly as $m \to +\infty$, $\forall k, l \in \mathbb{N}^n$. With respect to this topology, the Schwartz space is complete.
Just as we saw for the test function space, Schwartz space plays an important role in distribution theory (which was formalized by Laurent Schwartz himself) and in the context of partially-derived differential equations. The fact that DpRn q Ă SpRn q and that DpRn q is } }p -dense in Lp pRn q implies the following result (Theorem 4.23). T HEOREM 4.23.– Considering the Borel σ-algebra and the Lebesgue measure, then: SpRn q “ Lp pRn q
@1 ď p ă 8
interpreting SpRn q as a subspace of Lp pRn q and considering the closure with respect to the topology generated by the norm of } }p . From the definition of closure, SpRn q is not complete with respect to the topology generated by the norm } }p : there exist sequences of SpRn q which converge to elements of Lp pRn qzSpRn q. 4.5. Summary In this chapter, we have examined the compatibility of the topological structure of inner product vector spaces with the linear structure: the sum and product by a scalar operations are continuous in the topology generated by the inner product, as is the inner product itself, and the canonically induced norm. This compatibility is essential, as it implies that the limit operation commutes with the operations cited above; this result is fundamental in both theoretical and practical contexts. We also saw that all finite-dimensional vector spaces possess the same Euclidean topological structure up to a homeomorphism. Hilbert and Banach spaces were introduced as special cases of inner product or normed vector spaces, respectively, such that all Cauchy sequences converge within the space (completeness property). Any finite-dimensional inner product space is a Hilbert space, while any finite-dimensional normed vector space is a Banach space. All Hilbert spaces are Banach spaces, but the reverse is not usually true. Complete normed vector spaces can be characterized in a simple but very useful way: they are all, and only, spaces in which absolutely convergent series are also simply convergent. Any contraction defined on a complete metric space possesses a unique fixed point. We presented the Hilbert spaces L2 and 2 , along with examples of non-Hilbert, but Banach, spaces, Lp and p , with 1 ď p ď 8, p ‰ 2. The Minkowski inequality
can be used to define a linear structure on all of these spaces, while Hölder’s inequality is used to define an inner product when $p = 2$. The $\ell^p$ spaces are nested with increasing $p$; on the other hand, there is generally no inclusion relationship between $L^p$ spaces, with the notable exception of finite measure spaces, for which $L^p$ spaces are nested, but in the opposite way to $\ell^p$ spaces, that is, with decreasing $p$. Finally, we demonstrated that $L^p$ spaces coincide with the closure of many widely used function spaces, such as the test function space and the Schwartz space.
5 The Geometric Structure of Hilbert Spaces
Among the infinite-dimensional vector spaces, Hilbert spaces are the closest to the Euclidean spaces $\mathbb{K}^n$ presented in Chapter 1 with respect to their geometric structure, which is the focus of the present chapter. Infinite-dimensional Banach spaces do not share this characteristic, with structural properties that can be far more complicated than those of Hilbert spaces. The rich geometric structure of Hilbert spaces makes it possible to extend the discrete Fourier transform (DFT) to spaces in infinite dimensions, using the concepts of series and the continuous Fourier transform.

Suggested reading for those wishing to go further into the subjects discussed in this chapter and in Chapter 6 includes Berberian (1961), Abbati and Cirelli (1997), Saxe (2000), Debnath and Mikusinski (2005) and Moretti (2013).

The first step in analyzing the geometric structure of Hilbert spaces is to consider the concept of orthogonal complement.

5.1. The orthogonal complement in a Hilbert space and its properties

The set of all vectors which are orthogonal to the vectors of a subset in a Hilbert space is of crucial importance in understanding the geometric properties of these spaces. The definition and properties of this set are given below.
From Euclidean to Hilbert Spaces: Introduction to Functional Analysis and its Applications First Edition. Edoardo Provenzi. © ISTE Ltd 2021. Published by ISTE Ltd and John Wiley & Sons, Inc.
DEFINITION 5.1.– Let $H$ be a Hilbert space and $M \subseteq H$ any subset. The orthogonal complement of $M$ in $H$ is:
$$M^\perp = \{x \in H : \langle x, y\rangle = 0 \ \ \forall y \in M\}$$
that is, $M^\perp$ contains all of the elements of $H$ which are orthogonal to every element of $M$.

We denote with $\operatorname{span}(M)$ the vector subspace of $H$ generated by $M$, that is, the set of (finite) linear combinations of vectors in $M$. In Theorem 5.1, we shall write $(M^\perp)^\perp = M^{\perp\perp}$ and $(M^{\perp\perp})^\perp = M^{\perp\perp\perp}$.

THEOREM 5.1 (Properties of the orthogonal complement).– Let $H$ be a Hilbert space and $M \subseteq H$ an arbitrary subset. Then:
1) $\{0_H\}^\perp = H$ and $H^\perp = \{0_H\}$;
2) $M \cap M^\perp = \{0_H\}$ if $0_H \in M$, and $M \cap M^\perp = \emptyset$ if $0_H \notin M$;
3) $M^\perp$ is a closed vector subspace of $H$;
4) if $N \subseteq H$, then $M \subseteq N \Rightarrow N^\perp \subseteq M^\perp$ ($\perp$ reverses the set inclusion relationships);
5) $M \subseteq M^{\perp\perp}$ (difference with respect to finite dimensions);
6) $(\overline{M})^\perp = M^\perp$;
7) $M^{\perp\perp\perp} = M^\perp$;
8) $M^\perp = (\operatorname{span} M)^\perp = (\overline{\operatorname{span} M})^\perp$;
9) if $\overline{M} = H$, then $M^\perp = \{0_H\}$ (the orthogonal complement of a dense subset is reduced to the zero vector).

The proof is given below. First, however, we note that the fact that $M^\perp$ is always closed is a very useful property for demonstrating that a linear variety of $H$ is closed: we must simply demonstrate that this variety coincides with the orthogonal complement of a subset of $H$. It is also worth remarking that taking the orthogonal complement takes us from the category of sets, to which $M$ belongs, to that of topological vector spaces, to which $M^\perp$ belongs, with the additional property that $M^\perp$ is closed.

PROOF.–
1) The property follows from the fact that $0_H$ is the only vector in $H$ which is orthogonal to all the others.
2) $0_H$ is the only vector which is orthogonal to itself.
3) $M^\perp$ is a vector subspace: if $x, x' \in M^\perp$, then $\langle x, y\rangle = \langle x', y\rangle = 0$ $\forall y \in M$, hence $\langle \alpha x + \beta x', y\rangle = \alpha\langle x, y\rangle + \beta\langle x', y\rangle = 0$ $\forall y \in M$ and for all scalars $\alpha, \beta$, i.e. $M^\perp$ is a vector subspace, as it is stable with respect to linear combinations of its elements.
$M^\perp$ is closed: we must show that $M^\perp$ contains the limits of all convergent sequences of elements of $M^\perp$. Let $(x_n)_{n\in\mathbb{N}} \subset M^\perp$ be a sequence which converges in $H$ to a limit $x$. For all $y \in M$, $\langle x_n, y\rangle = 0$ $\forall n \in \mathbb{N}$, so, from the continuity of the inner product, we can write:
$$0 = \lim_{n\to+\infty} \langle x_n, y\rangle = \Big\langle \lim_{n\to+\infty} x_n, y \Big\rangle = \langle x, y\rangle$$
thus $x \perp y$ $\forall y \in M$, that is, $x \in M^\perp$.
4) Since $M \subseteq N$, the vectors of $H$ which are orthogonal to all the vectors of $N$ are also orthogonal to all the vectors of $M$ (although the converse is not necessarily true). Thus, $y \in N^\perp$ implies $y \in M^\perp$, that is, $N^\perp \subseteq M^\perp$.
5) Every vector in $M$ is orthogonal to every vector in $M^\perp$ by definition, but there may also be other vectors in $H$ which are orthogonal to $M^\perp$, hence $M \subseteq M^{\perp\perp}$.
6) The equality of the sets can be demonstrated by proving the two opposite inclusions:
– $(\overline{M})^\perp \subseteq M^\perp$: this follows from $M \subseteq \overline{M}$ and property 4;
– $M^\perp \subseteq (\overline{M})^\perp$: we must show that $y \in M^\perp \Longrightarrow y \in (\overline{M})^\perp$. The set $\overline{M}$ is the union of $M$ with the limits of the convergent sequences in $M$, so we must show that, if $y \in M^\perp$ is orthogonal to all of the elements of an arbitrary convergent sequence $(x_n)_{n\in\mathbb{N}} \subset M$, then $y$ is also orthogonal to the limit of this sequence. This can be proved using the continuity of the inner product: by hypothesis, $\langle x_n, y\rangle = 0$ $\forall n \in \mathbb{N}$, thus:
$$0 = \lim_{n\to+\infty} \langle x_n, y\rangle = \Big\langle \lim_{n\to+\infty} x_n, y \Big\rangle$$
that is, $y \perp \lim_{n\to+\infty} x_n$.
7) From property 5, $M \subseteq M^{\perp\perp}$, so property 4 gives $M^{\perp\perp\perp} \subseteq M^\perp$. Conversely, property 5 holds for any subset of $H$; applying it to $N = M^\perp$ gives $N \subseteq N^{\perp\perp}$, that is, $M^\perp \subseteq M^{\perp\perp\perp}$. The two inclusions $M^{\perp\perp\perp} \subseteq M^\perp$ and $M^\perp \subseteq M^{\perp\perp\perp}$ imply equality between the orthogonal complement of $M$ and its triorthogonal complement.
8) Consider an arbitrary element in $\operatorname{span}(M)$: $y_0 = \sum_{i=1}^n \alpha_i y_i$, with $y_i \in M$ and $\alpha_i \in \mathbb{K}$ $\forall i = 1, \dots, n$. Taking any fixed $x \in M^\perp$, by the sesquilinearity (or bilinearity) of $\langle\ ,\ \rangle$ (for $\mathbb{K} = \mathbb{C}$ or $\mathbb{K} = \mathbb{R}$), we can write:
$$\langle x, y_0\rangle = \Big\langle x, \sum_{i=1}^n \alpha_i y_i \Big\rangle = \sum_{i=1}^n \overline{\alpha_i}\, \langle x, y_i\rangle = 0 \qquad (\text{since } x \perp y_i)$$
hence $x \in (\operatorname{span}(M))^\perp$, and since $x$ is arbitrary, this implies $M^\perp \subseteq (\operatorname{span}(M))^\perp$. Given that $M \subseteq \operatorname{span}(M)$, by (4), $(\operatorname{span}(M))^\perp \subseteq M^\perp$, that is, $(\operatorname{span}(M))^\perp = M^\perp$; furthermore, by (6), $(\overline{\operatorname{span}(M)})^\perp = (\operatorname{span}(M))^\perp = M^\perp$.
9) $M^\perp = (\overline{M})^\perp = H^\perp = \{0_H\}$, as the only vector which is orthogonal to all vectors in $H$ is the zero vector. □

5.2. Projection onto closed convex sets: theorem and consequences

In this section, we shall consider a particularly important geometric property which has already been covered in the context of Euclidean spaces: the orthogonal projection minimizes the distance between a given vector and those of the subset on which it projects. This result will be presented and proved for the case of a closed convex subset and then used for a closed vector subspace.

DEFINITION 5.2.– A subset $S$ of a vector space is convex if:
$$\forall x, y \in S,\ \forall \lambda \in [0, 1] : \quad \lambda x + (1 - \lambda) y \in S$$
that is, if $S$ is stable with respect to convex combinations, i.e. linear combinations with non-negative coefficients which sum to 1.

In geometric terms, a convex subset can be characterized by the fact that any pair of its points may be connected by a line segment which remains within the subset. Evidently, all vector subspaces are convex, as they are stable with respect to all linear combinations, including convex combinations. Note that the half sum of $x$ and $y$ (i.e. $\frac{x+y}{2}$) is a convex combination with $\lambda = 1/2$.
THEOREM 5.2.– Let $H$ be a Hilbert space and $S$ a closed, convex and proper subset¹ of $H$. Then, for every fixed $x \in H$, there exists a unique point $y_0 \in S$ such that:
$$\|x - y_0\| = \inf_{y \in S} \|x - y\|$$
that is, such that $y_0$ minimizes the distance between $x$ and the points in $S$.

¹ If $S = H$, then the theorem holds trivially with $y_0 = x$.

Before presenting the proof of this theorem, we should note that this result also holds for any closed vector subspace of $H$: the theorem of projection onto a closed convex set generalizes property 3 from Theorem 1.12 to infinite-dimensional Hilbert spaces.

DEFINITION 5.3.– The vector $y_0$ in the previous theorem is the orthogonal projection of $x$ on $S$, noted $y_0 = P_S(x)$. The non-negative real quantity $d(x, S) = \|x - P_S(x)\|$ is the distance between $x$ and the closed, convex and proper subset $S$.

It is evident that if $x \in S$ then $P_S(x) = x$ and $d(x, S) = 0$, so the information provided by the theorem is interesting when $x \notin S$.

PROOF.–
$\exists$ : For simplicity’s sake, let us write² $\delta \equiv \inf_{y \in S} \|x - y\|$. We shall demonstrate the existence of $y_0$ using a non-constructive technique typical of the Hilbert school. Consider a sequence $(y_n)_{n\in\mathbb{N}} \subset S$ which satisfies the equation³:
$$\lim_{n\to+\infty} \|x - y_n\| = \delta \qquad [5.1]$$
The interest of such a sequence is that, if it converges, then by the continuity of the norm [5.1] can be rewritten as:
$$\delta = \Big\| \lim_{n\to+\infty} (x - y_n) \Big\| = \Big\| \lim_{n\to+\infty} x - \lim_{n\to+\infty} y_n \Big\| = \Big\| x - \lim_{n\to+\infty} y_n \Big\|$$
hence, to prove the existence of $y_0$, we can simply take $y_0 := \lim_{n\to+\infty} y_n$, provided that this limit exists in $S$.

We begin by noting that $S$ is a closed subset of the complete space $H$ and is thus itself complete; to demonstrate the existence of the limit of $y_n$ in $S$, it is therefore enough to show that $(y_n)_{n\in\mathbb{N}}$ is a Cauchy sequence in $S$.

To show that $(y_n)_{n\in\mathbb{N}}$ is a Cauchy sequence we will use the parallelogram law [1.6] (which holds since the norm is Hilbertian, see Theorem 4.3) on the elements $x - y_n$ and $x - y_m$:
$$\|(x - y_n) + (x - y_m)\|^2 + \|(x - y_n) - (x - y_m)\|^2 = 2\big(\|x - y_n\|^2 + \|x - y_m\|^2\big)$$
which can be rewritten as:
$$\|2x - y_n - y_m\|^2 + \|y_n - y_m\|^2 = 2\big(\|x - y_n\|^2 + \|x - y_m\|^2\big)$$

² $\delta(x, S)$ would be more “correct”, since $\delta$ generally changes with $x$ and $S$.
³ For example, $\delta^2 \le \|x - y_n\|^2 \le \delta^2 + \frac{1}{n}$, $\forall n \in \mathbb{N}$, $n \ge 1$.
that is:
$$\|y_n - y_m\|^2 = 2\big(\|x - y_n\|^2 + \|x - y_m\|^2\big) - \|2x - y_n - y_m\|^2 \qquad [5.2]$$
and $2x - y_n - y_m = 2\big(x - \tfrac{1}{2}(y_n + y_m)\big)$, so [5.2] becomes:
$$\|y_n - y_m\|^2 = 2\big(\|x - y_n\|^2 + \|x - y_m\|^2\big) - 4\Big\|x - \tfrac{1}{2}(y_n + y_m)\Big\|^2 \qquad [5.3]$$
Note that $\tfrac{1}{2}(y_n + y_m) \in S$ by convexity, so $\big\|x - \tfrac{1}{2}(y_n + y_m)\big\| \ge \delta$ by definition of $\delta$, thus $-4\big\|x - \tfrac{1}{2}(y_n + y_m)\big\|^2 \le -4\delta^2$, so equation [5.3] gives us:
$$\|y_n - y_m\|^2 \le 2\big(\|x - y_n\|^2 + \|x - y_m\|^2\big) - 4\delta^2$$
Since $\lim_{n\to+\infty} \|x - y_n\|^2 = \lim_{m\to+\infty} \|x - y_m\|^2 = \delta^2$, the right-hand side of the previous inequality tends to $2(\delta^2 + \delta^2) - 4\delta^2 = 0$ as $n, m \to +\infty$, therefore $(y_n)_{n\in\mathbb{N}}$ is a Cauchy sequence.

$!$ : Let us now prove that only one $y_0 \in S$ exists which satisfies $\|x - y_0\| = \delta$. Let $y_1$ be another element in $S$ which verifies $\|x - y_1\| = \delta$. Writing the parallelogram formula once again, but this time using $x - y_0$ and $x - y_1$, we obtain:
$$\|(x - y_0) + (x - y_1)\|^2 + \|(x - y_0) - (x - y_1)\|^2 = 2\big(\|x - y_0\|^2 + \|x - y_1\|^2\big)$$
that is:
$$\|2x - y_0 - y_1\|^2 + \|y_1 - y_0\|^2 = 4\delta^2$$
thus:
$$0 \le \|y_1 - y_0\|^2 = 4\delta^2 - \|2x - y_0 - y_1\|^2 = 4\delta^2 - \Big\|2\Big(x - \frac{y_0 + y_1}{2}\Big)\Big\|^2 = 4\delta^2 - 4\Big\|x - \frac{y_0 + y_1}{2}\Big\|^2 = 4\Big(\delta^2 - \Big\|x - \frac{y_0 + y_1}{2}\Big\|^2\Big)$$
We observe that $\frac{y_0 + y_1}{2} \in S$ by convexity, and, since $\delta^2 = \inf_{y \in S} \|x - y\|^2$, it must hold that $\delta^2 \le \big\|x - \frac{y_0 + y_1}{2}\big\|^2$ and thus $\delta^2 - \big\|x - \frac{y_0 + y_1}{2}\big\|^2 \le 0$, that is:
$$0 \le \|y_1 - y_0\|^2 = 4\Big(\delta^2 - \Big\|x - \frac{y_0 + y_1}{2}\Big\|^2\Big) \le 0,$$
hence $y_1 = y_0$. □
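The statement of Theorem 5.2 can be illustrated numerically in the simplest Hilbert space, $\mathbb{R}^2$ with the Euclidean norm. The following minimal sketch is not taken from the text: the closed unit ball as the set $S$, the point $x = (3, 4)$, the sampling grid and all function names are assumptions chosen for illustration. It checks that no sampled point of $S$ is closer to $x$ than the candidate projection, and that the estimate $\|y - z\|^2 \le 2(\|x - y\|^2 + \|x - z\|^2) - 4\delta^2$ used in the Cauchy argument holds for sampled pairs $y, z \in S$.

```python
import math

def norm(v):
    """Euclidean norm on R^n."""
    return math.sqrt(sum(c * c for c in v))

def sub(u, v):
    """Componentwise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def project_onto_ball(x, radius=1.0):
    """Nearest point to x in the closed ball S = {y : ||y|| <= radius}."""
    n = norm(x)
    if n <= radius:
        return list(x)                      # x already belongs to S
    return [radius * c / n for c in x]      # radial rescaling onto the boundary

x = [3.0, 4.0]
y0 = project_onto_ball(x)
delta = norm(sub(x, y0))                    # distance d(x, S)

# Sample points of S on a polar grid.
samples = [
    [r * math.cos(2 * math.pi * k / 90), r * math.sin(2 * math.pi * k / 90)]
    for r in (0.0, 0.25, 0.5, 0.75, 1.0)
    for k in range(90)
]

# Smallest distance from x to the sampled points of S.
closest_sampled = min(norm(sub(x, y)) for y in samples)

def cauchy_gap(y, z):
    """Right-hand side minus left-hand side of
    ||y - z||^2 <= 2(||x - y||^2 + ||x - z||^2) - 4 delta^2."""
    rhs = 2 * (norm(sub(x, y)) ** 2 + norm(sub(x, z)) ** 2) - 4 * delta ** 2
    return rhs - norm(sub(y, z)) ** 2

# The gap should be non-negative (up to rounding) for every pair in S.
smallest_gap = min(cauchy_gap(y, z) for y in samples[::7] for z in samples[::11])
```

A finite sample can of course only support, not prove, the minimizing property; the point of the sketch is to make the quantities $\delta$, $y_0$ and the parallelogram estimate concrete.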
As we have seen, the parallelogram formula is essential to the proof of this theorem. Since, by Theorem 4.3, only Hilbert norms verify this formula, the proof given above cannot be applied to Banach spaces which are not Hilbert spaces and, in fact, counter-examples show that the theorem of projection onto a closed, convex and proper subset does not hold in every infinite-dimensional Banach space.

The theorem of projection onto a closed convex set has very important consequences, which will be described in detail later. For now, note that this theorem guarantees the existence and uniqueness of the orthogonal projection $y_0$, but it does not provide any explicit expression for $y_0$, nor for the elements of a sequence $(y_n)_{n\in\mathbb{N}}$ in $S$ which converges to $y_0$. A geometric characterization of $y_0$, shown in Theorem 5.3, is therefore very useful. A remarkable application of this result will be presented in section 5.3.

THEOREM 5.3.– Let $H$ be a real Hilbert space, $S$ a closed, convex and proper subset of $H$ and $x$ a fixed element in $H$. Then $y_0 \in S$ is the orthogonal projection of $x$ onto $S$, that is, $\|x - y_0\| = \inf_{y \in S} \|x - y\|$, if and only if:
$$\forall y \in S, \qquad \langle x - y_0, y - y_0\rangle \le 0$$
that is⁴, if and only if the angle $\vartheta$ between the vectors $x - y_0$ and $y - y_0$ is obtuse (or right), as shown in Figure 5.1. If $H$ is complex, then we replace $\langle x - y_0, y - y_0\rangle \le 0$ with $\Re(\langle x - y_0, y - y_0\rangle) \le 0$.

⁴ Since $\langle x - y_0, y - y_0\rangle = \|x - y_0\|\,\|y - y_0\| \cos(\vartheta)$, with $\vartheta$ the angle between the vectors $x - y_0$ and $y - y_0$.

PROOF.– This proof concerns the real case. Proof of the complex case is left to the reader.

$\Rightarrow$ : we wish to show that, if $\|x - y_0\| = \inf_{y \in S} \|x - y\|$, then $\langle x - y_0, y - y_0\rangle \le 0$ $\forall y \in S$. To do this, let us consider any $y \in S$, using the convexity of $S$ to guarantee that $\lambda y + (1 - \lambda) y_0 \in S$ $\forall \lambda \in [0, 1]$. Thus, by hypothesis, and using the bilinearity and symmetry properties of the real inner product, we obtain:
$$\|x - y_0\|^2 \le \|x - [\lambda y + (1 - \lambda) y_0]\|^2 = \|x - y_0 - \lambda(y - y_0)\|^2$$
$$= \langle x - y_0 - \lambda(y - y_0),\ x - y_0 - \lambda(y - y_0)\rangle$$
$$= \langle x - y_0, x - y_0\rangle - \lambda\langle x - y_0, y - y_0\rangle - \lambda\langle y - y_0, x - y_0\rangle + \lambda^2 \langle y - y_0, y - y_0\rangle$$
$$= \|x - y_0\|^2 + \lambda^2 \|y - y_0\|^2 - 2\lambda\langle x - y_0, y - y_0\rangle$$
where we used $\lambda\langle y - y_0, x - y_0\rangle = \lambda\langle x - y_0, y - y_0\rangle$. Thus:
$$\|x - y_0\|^2 \le \|x - y_0\|^2 + \lambda^2 \|y - y_0\|^2 - 2\lambda\langle x - y_0, y - y_0\rangle$$
Simplifying and dividing by $\lambda \in (0, 1]$, we obtain:
$$0 \le \lambda \|y - y_0\|^2 - 2\langle x - y_0, y - y_0\rangle$$
that is:
$$\langle x - y_0, y - y_0\rangle \le \frac{\lambda}{2} \|y - y_0\|^2$$
for all $\lambda$ in $(0, 1]$. Now, taking the limit $\lambda \to 0^+$ on both sides of the inequality, we obtain:
$$\langle x - y_0, y - y_0\rangle = \lim_{\lambda \to 0^+} \langle x - y_0, y - y_0\rangle \le \lim_{\lambda \to 0^+} \frac{\lambda}{2} \|y - y_0\|^2 = 0,$$
completing the proof of the direct implication.

$\Leftarrow$ : Let $x \in H$ be fixed, and take $y_0 \in S$ such that $\forall y \in S$, $\langle x - y_0, y - y_0\rangle \le 0$. We wish to show that $\|x - y_0\| = \inf_{y \in S} \|x - y\|$. We have:
$$\|x - y\|^2 = \|x - y_0 + y_0 - y\|^2 = \|x - y_0 - (y - y_0)\|^2$$
$$= \langle x - y_0 - (y - y_0),\ x - y_0 - (y - y_0)\rangle$$
$$= \langle x - y_0, x - y_0\rangle - \langle x - y_0, y - y_0\rangle - \langle y - y_0, x - y_0\rangle + \langle y - y_0, y - y_0\rangle$$
$$= \|x - y_0\|^2 + \|y - y_0\|^2 - 2\langle x - y_0, y - y_0\rangle,$$
having used the symmetry of the real inner product. We thus have:
$$\|x - y_0\|^2 - \|x - y\|^2 = \underbrace{2\langle x - y_0, y - y_0\rangle}_{\le 0 \text{ by hypothesis}} - \underbrace{\|y - y_0\|^2}_{\ge 0} \le 0$$
that is, $\|x - y_0\|^2 \le \|x - y\|^2$ $\forall y \in S$, that is: $\|x - y_0\| = \inf_{y \in S} \|x - y\|$. □
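The characterization of Theorem 5.3 can be checked on a concrete closed convex set. The sketch below is an illustration under assumptions that are not in the text: the space is $\mathbb{R}^3$, the set $S$ is the unit cube $[0, 1]^3$ (whose projection is componentwise clamping), and the names `project_onto_box`, `points_of_S` and `y_bad` are hypothetical. It evaluates $\langle x - y_0, y - y_0\rangle$ over a grid of points $y \in S$, and shows that a point of $S$ other than the projection violates the inequality for some $y$.

```python
from itertools import product

def inner(u, v):
    """Real Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    """Componentwise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def project_onto_box(x, lo=0.0, hi=1.0):
    """Componentwise clamping: nearest point to x in the cube [lo, hi]^n."""
    return [min(max(c, lo), hi) for c in x]

x = [1.5, -0.3, 0.4]
y0 = project_onto_box(x)

# Grid of points of S = [0, 1]^3.
grid = [i / 4 for i in range(5)]
points_of_S = [list(p) for p in product(grid, repeat=3)]

# Largest value of <x - y0, y - y0> over the sampled points of S:
# Theorem 5.3 predicts that it is not positive.
worst = max(inner(sub(x, y0), sub(y, y0)) for y in points_of_S)

# A point of S different from the projection fails the test for some y.
y_bad = [1.0, 0.2, 0.4]
worst_bad = max(inner(sub(x, y_bad), sub(y, y_bad)) for y in points_of_S)
```

Here `worst` stays non-positive while `worst_bad` is strictly positive, in line with the "if and only if" of the theorem.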
The following corollary shows that the orthogonal complement of any closed proper vector subspace of a Hilbert space is not trivial, and generalizes property 2 from Theorem 1.12 to infinite-dimensional Hilbert spaces.

THEOREM 5.4.– Let $H$ be a Hilbert space and $S$ a closed, proper vector subspace of $H$, that is, $S \ne \emptyset$ and $S \ne H$. Then $S^\perp$ is not reduced to $\{0_H\}$; in fact, for all fixed $x \in H \setminus S$, the vector $u = x - P_S(x)$ is non-zero and belongs to $S^\perp$:
$$\forall x \in H \setminus S, \qquad u = x - P_S(x) \ne 0_H \ \text{ and } \ u \in S^\perp$$
Figure 5.1. Two-dimensional geometric visualization of the property verified by the projection onto a closed, convex and proper subset of H. For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip
The vector $u = x - P_S(x)$ is known as the residual vector, and the fact that $u \perp S$ fully justifies the use of the term “orthogonal projection” for $P_S(x)$.

PROOF.– Vector subspaces are convex, so the theorem of projection onto a closed convex subset holds, and thus $\exists\, P_S(x) \in S$ such that $\|u\| = \|x - P_S(x)\| = \inf_{y \in S} \|x - y\| \equiv \delta$. Since $x \notin S$ and $P_S(x) \in S$, $u \ne 0_H$. If we can show that $u \in S^\perp$, that is, that $\langle u, s\rangle = 0$ $\forall s \in S$, this will prove the theorem. To obtain this result, we start by noting that $\forall k \in \mathbb{K}$, $\forall s \in S$:
$$\|u + ks\|^2 = \|x - P_S(x) + ks\|^2 = \|x - \underbrace{(P_S(x) - ks)}_{\in S}\|^2 \ge \delta^2 = \|u\|^2$$
by the definition of $\delta$ and the fact that $S$ is a vector subspace. Hence:
$$\|u + ks\|^2 - \|u\|^2 \ge 0, \qquad \forall k \in \mathbb{K},\ \forall s \in S$$
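Furthermore:
$$\|u + ks\|^2 = \langle u + ks, u + ks\rangle = \langle u, u\rangle + \langle u, ks\rangle + \langle ks, u\rangle + \langle ks, ks\rangle = \|u\|^2 + \bar{k}\langle u, s\rangle + k\langle s, u\rangle + |k|^2 \|s\|^2$$
Thus, $\|u + ks\|^2 - \|u\|^2 \ge 0$ if and only if:
$$\bar{k}\langle u, s\rangle + k\langle s, u\rangle + |k|^2 \|s\|^2 \ge 0$$
As $k$ is arbitrary, we can take $k = \langle u, s\rangle t$ with any $t \in \mathbb{R}$. Thus, using $\langle s, u\rangle = \overline{\langle u, s\rangle}$, the inequality above becomes:
$$\overline{\langle u, s\rangle}\, t\, \langle u, s\rangle + \langle u, s\rangle\, t\, \overline{\langle u, s\rangle} + |\langle u, s\rangle|^2 t^2 \|s\|^2 \ge 0$$
$$\iff |\langle u, s\rangle|^2 t + |\langle u, s\rangle|^2 t + |\langle u, s\rangle|^2 t^2 \|s\|^2 \ge 0$$
$$\iff t^2 \big(|\langle u, s\rangle|^2 \|s\|^2\big) + 2t\, |\langle u, s\rangle|^2 \ge 0 \qquad \forall t \in \mathbb{R}$$
It would be pointless to simplify by $|\langle u, s\rangle|^2$, since $\langle u, s\rangle$ is precisely the quantity we wish to determine. The strategy to complete our proof consists of interpreting $t^2 \big(|\langle u, s\rangle|^2 \|s\|^2\big) + 2t\, |\langle u, s\rangle|^2$ as a second-degree polynomial function of the form $P(t) = at^2 + bt + c$ with:
$$a = |\langle u, s\rangle|^2 \|s\|^2, \qquad b = 2|\langle u, s\rangle|^2, \qquad c = 0$$
If $a = 0$, then either $s = 0_H$ or $\langle u, s\rangle = 0$, and in both cases $\langle u, s\rangle = 0$. Otherwise, the discriminant is equal to $\Delta = b^2 - 4ac = 4|\langle u, s\rangle|^4 \ge 0$, while for $P(t)$ to be non-negative $\forall t \in \mathbb{R}$ we must have $\Delta \le 0$; this is only possible if $\Delta = 0$, but:
$$\Delta = 0 \iff 4|\langle u, s\rangle|^4 = 0 \qquad \forall s \in S \ (u \text{ being fixed since } x \text{ is fixed})$$
that is, $\langle u, s\rangle = 0$ $\forall s \in S$, hence $u \in S^\perp$. □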
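Theorem 5.4 can be made concrete in a finite-dimensional setting, where every vector subspace is closed. The following sketch is only an illustration under stated assumptions: the space $\mathbb{R}^3$, the two generators of $S$, the point $x$ and the helper `project_onto_span` (which orthonormalizes the generators with the Gram-Schmidt process and then sums the components) are all choices made here, not taken from the text. It computes the residual vector $u = x - P_S(x)$ and its inner products with the generators of $S$.

```python
import math

def inner(u, v):
    """Real Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_span(x, generators, tol=1e-12):
    """Orthogonal projection of x onto span(generators) in R^n."""
    basis = []
    for g in generators:
        w = list(g)
        for e in basis:                       # remove components along earlier vectors
            c = inner(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = math.sqrt(inner(w, w))
        if n > tol:                           # skip linearly dependent generators
            basis.append([wi / n for wi in w])
    p = [0.0] * len(x)
    for e in basis:                           # sum of the components <x, e> e
        c = inner(x, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

generators = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]   # S = span of these two vectors
x = [1.0, 2.0, 4.0]                               # a point outside S

p = project_onto_span(x, generators)              # P_S(x)
u = [xi - pi for xi, pi in zip(x, p)]             # residual vector u = x - P_S(x)

residual_norm = math.sqrt(inner(u, u))
orthogonality = [inner(u, g) for g in generators] # should vanish up to rounding
```

For these particular data the residual is, up to rounding, the vector $(1, -1, 1)$, which is non-zero and orthogonal to both generators, hence to all of $S$.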
5.2.1. Characterization of closed vector subspaces in Hilbert spaces

Theorem 5.4 can be used to deduce a highly useful characterization of closed vector subspaces in Hilbert spaces. We shall begin by considering an intermediate result.

LEMMA 5.1.– Let $H$ be a Hilbert space and $M$ any subset of $H$, then:
$$\overline{\operatorname{span}(M)} = \operatorname{span}(M)^{\perp\perp}$$

PROOF.– Taking:
$$S = \overline{\operatorname{span}(M)}, \qquad T = \operatorname{span}(M)^{\perp\perp}$$
Theorem 5.1 guarantees that $S \subseteq T$: by property 5, $\operatorname{span}(M) \subseteq T$, and $T$ is closed by property 3. If we can prove that the strict inclusion $S \subsetneq T$ is impossible, then we will be left with $S = T$.

We begin by observing that $S$ is a closed vector subspace of $T$ and that $T$, as the orthogonal complement of a subset of $H$, is a closed vector subspace of $H$; $T$ itself is thus a Hilbert space.

Reasoning by contradiction, if we assume $S \subsetneq T$, then Theorem 5.4 can be applied to the pair $S$ and $T$ to ensure the existence of $u \in T$, $u \ne 0_H$ and $u \in S^\perp$. However, by properties 8 and 2 of Theorem 5.1, this implies that:
$$u \in S^\perp \cap T = \big(\overline{\operatorname{span}(M)}\big)^\perp \cap \operatorname{span}(M)^{\perp\perp} = \operatorname{span}(M)^\perp \cap \operatorname{span}(M)^{\perp\perp} = \{0_H\},$$
which contradicts the fact that $u \ne 0_H$. Hence, $\overline{\operatorname{span}(M)} = \operatorname{span}(M)^{\perp\perp}$. □
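As a worked illustration of Lemma 5.1 (a standard example, stated here under the assumption that $H = \ell^2$ and that $M = \{e_n\}_{n \in \mathbb{N}}$ is the family of canonical sequences, $e_n$ having 1 in position $n$ and 0 elsewhere), the two sides of the identity can be computed independently:

```latex
% span(M) is the space of finitely supported sequences, which is not closed in \ell^2.
\operatorname{span}(M) = \{ x \in \ell^2 : x_n \neq 0 \text{ for finitely many } n \}

% Orthogonal complement: x \perp e_n for all n forces every component of x to vanish.
x \in M^\perp \iff \langle x, e_n \rangle = x_n = 0 \ \ \forall n \in \mathbb{N}
\quad\Longrightarrow\quad
\operatorname{span}(M)^\perp = M^\perp = \{0\}

% Biorthogonal complement, by property 1 of Theorem 5.1.
\operatorname{span}(M)^{\perp\perp} = \{0\}^\perp = \ell^2

% Closure: the truncations x^{(N)} = (x_1, \dots, x_N, 0, 0, \dots) of any x \in \ell^2
% belong to span(M) and converge to x, since the tail of a convergent series vanishes.
\| x - x^{(N)} \|_2^2 = \sum_{n > N} |x_n|^2 \xrightarrow[N \to +\infty]{} 0
\quad\Longrightarrow\quad
\overline{\operatorname{span}(M)} = \ell^2
```

Both computations give $\ell^2$, as the lemma predicts, while $\operatorname{span}(M)$ itself is strictly smaller: the closure in the statement cannot be dropped.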
We now have all of the information needed to prove a helpful characterization of closed vector subspaces in Hilbert spaces: closed vector subspaces are precisely those which coincide with their biorthogonal complement.

This characterization is particularly powerful, as it creates a bridge between different structures that coexist in a Hilbert space: on one side, closure is related to the topological structure induced by the presence of a Hilbert norm, while, on the other side, the concept of orthogonality is related to the geometry of the Hilbert space induced by the presence of an inner product. This bridge can be used, for instance, to verify the closure of a vector subspace by explicitly computing its biorthogonal complement, if this computation is easier than the direct verification of the closure.

THEOREM 5.5.– Let $H$ be a Hilbert space and $M$ a vector subspace of $H$.
1) $M^{\perp\perp} = \overline{\operatorname{span}(M)}$.
2) $\overline{M} = M^{\perp\perp}$.
3) $M$ is a closed vector subspace of $H$ if and only if $M = M^{\perp\perp}$.

PROOF.–
1) Given that $M$ is a vector subspace, $M \equiv \operatorname{span}(M)$. Furthermore, by property 8 from Theorem 5.1, $M^\perp = \operatorname{span}(M)^\perp = \big(\overline{\operatorname{span}(M)}\big)^\perp$. Hence, the previous lemma implies $M^{\perp\perp} = \operatorname{span}(M)^{\perp\perp} = \overline{\operatorname{span}(M)}$.
2) We have shown that $M^{\perp\perp} = \overline{\operatorname{span}(M)}$ and we know that $M \equiv \operatorname{span}(M)$, thus $\overline{M} = M^{\perp\perp}$.
3) Let us demonstrate the double implication:
$\Rightarrow$ : we know from point 2) that $\overline{M} = M^{\perp\perp}$, but if $M$ is closed then $M = \overline{M}$, and thus $M = M^{\perp\perp}$;
$\Leftarrow$ : if $M = M^{\perp\perp}$, then $M$ is automatically a closed vector subspace by the fact that it is the orthogonal complement of $M^\perp$. □

COROLLARY 5.1.– Let $H$ be a Hilbert space and $M, N$ any two parts of $H$.
1) It holds that:
$$(M \cup N)^\perp = M^\perp \cap N^\perp$$
2) Additionally, if $M$ and $N$ are two closed vector subspaces of $H$, then:
$$(M \cap N)^\perp = \overline{\operatorname{span}(M^\perp \cup N^\perp)} \qquad [5.4]$$

PROOF.–
1) Let us prove the two inclusions:
– $(M \cup N)^\perp \subseteq M^\perp \cap N^\perp$: taking $x \in (M \cup N)^\perp$ and $y \in M$, then $y$ also belongs to $M \cup N$, thus $\langle x, y\rangle = 0$, that is, $x \in M^\perp$. Now, taking $y \in N$, the same argument tells us that $x \in N^\perp$. Thus $x \in M^\perp$ and $x \in N^\perp$, that is, $x \in M^\perp \cap N^\perp$;
– $M^\perp \cap N^\perp \subseteq (M \cup N)^\perp$: taking $x \in M^\perp \cap N^\perp$, then $x \in M^\perp$ and $x \in N^\perp$. If $y \in M \cup N$, then $y \in M$ or $y \in N$, but in both cases $\langle x, y\rangle = 0$, that is, $x \in (M \cup N)^\perp$.
2) The relationship determined above holds for all parts of $H$ and thus also holds for $M^\perp$ and $N^\perp$. In this case, by Theorem 5.5, point 1 becomes:
$$(M^\perp \cup N^\perp)^\perp = M^{\perp\perp} \cap N^{\perp\perp} = \overline{\operatorname{span}(M)} \cap \overline{\operatorname{span}(N)} = M \cap N$$
since $M$ and $N$ are presumed to be closed vector subspaces. Now, taking the orthogonal complement and using Theorem 5.5 once more, we obtain:
$$(M \cap N)^\perp = (M^\perp \cup N^\perp)^{\perp\perp} = \overline{\operatorname{span}(M^\perp \cup N^\perp)}$$ □
5.3. Polar and bipolar subsets of a Hilbert space

In this section, we shall use a different approach to obtain the same result concerning the characterization of a closed part of a Hilbert space. In this case, we shall use the concept of polar sets, which is particularly important in the context of convex optimization theory.

DEFINITION 5.4 (Polar and bipolar).– Let $H$ be a Hilbert space and $M$ any non-empty part of $H$. The polar set of $M$, noted $M^0$, is the subset of $H$ defined by⁵:
$$M^0 := \{x \in H : \forall y \in M,\ \Re(\langle x, y\rangle) \le 1\} \equiv \{x \in H : \sup_{y \in M} \Re(\langle x, y\rangle) \le 1\}$$
The bipolar of $M$ is the polar of the polar, that is:
$$M^{00} := (M^0)^0 = \{h \in H : \forall x \in M^0,\ \Re(\langle h, x\rangle) \le 1\} \equiv \{h \in H : \sup_{x \in M^0} \Re(\langle h, x\rangle) \le 1\}$$

⁵ Evidently, the real part of the inner product can be omitted if $H$ is a real Hilbert space.

Let us also recall the concept of convex hull.

DEFINITION 5.5 (Closed convex hull).– The closed convex hull of a part $M$ of $H$ is the closure of the intersection of all convex parts of $H$ containing $M$. It is the smallest closed convex subset of $H$ which contains $M$.

The following result contains remarkable properties of both the polar and bipolar.

THEOREM 5.6.– Let $H$ be a Hilbert space and $M$ any non-empty part of $H$.
1) $M^0$ is a closed convex subset of $H$ which contains the zero vector $0_H$.
2) $M^{00}$ coincides with the closed convex hull $C$ of $M \cup \{0_H\}$.
3) If $M$ is a convex part of $H$ which contains $0_H$, then $\overline{M} = M^{00}$.
4) If $M$ is a vector subspace of $H$, then $M^0 = M^\perp$.

PROOF.–
1) The fact that $M^0$ contains $0_H$ is an obvious consequence of the fact that $\langle 0_H, y\rangle = 0 < 1$ for all $y \in M$. To verify convexity, let us consider $\lambda \in [0, 1]$ and $x_1, x_2 \in M^0$; by the left linearity of the inner product, for all $y \in M$:
$$\Re(\langle \lambda x_1 + (1 - \lambda) x_2, y\rangle) = \lambda\, \Re(\langle x_1, y\rangle) + (1 - \lambda)\, \Re(\langle x_2, y\rangle) \le \lambda + (1 - \lambda) = 1$$
showing that $M^0$ is convex. All that remains is to prove the closure; to do this, we first remark that, for all fixed $y \in H$, the map $\varphi_y : H \to \mathbb{R}$, $x \mapsto \varphi_y(x) := \Re(\langle x, y\rangle)$ is continuous. Writing:
$$H_y := \varphi_y^{-1}\big((-\infty, 1]\big) = \{x \in H : \Re(\langle x, y\rangle) \le 1\}$$
then $H_y$ is a closed subset of $H$ as it is the preimage of a closed subset of $\mathbb{R}$ via the continuous map $\varphi_y$ (remember that $(-\infty, 1]$ is closed, since its complement in $\mathbb{R}$ is $(1, +\infty)$, which is open). By definition, the elements of $M^0$ must belong to $H_y$ for all $y \in M$, that is, $M^0 = \bigcap_{y \in M} H_y$, and thus it is closed in $H$ as it is the intersection of closed parts of $H$.

2) Now, let us demonstrate the two opposite inclusions.

$C \subseteq M^{00}$ : first, it is useful to show that $M \subseteq M^{00}$. To do this, we note that $M^{00} = \bigcap_{x \in M^0} H_x$. Next, let us take arbitrary but fixed $y \in M$ and $x \in M^0$; then, notably, $x \in H_y$, that is, $\Re(\langle x, y\rangle) \le 1$, and since $\Re(\langle x, y\rangle) = \Re(\langle y, x\rangle)$, we also have $\Re(\langle y, x\rangle) \le 1$, that is, $y \in H_x$. Since this holds for all $x \in M^0$, $y \in \bigcap_{x \in M^0} H_x = M^{00}$. Then: $y \in M \Longrightarrow y \in M^{00}$, that is, $M \subseteq M^{00}$.

By (1), we know that $M^{00}$, as a polar set, is convex, closed and contains $0_H$. We have just seen that $M \subseteq M^{00}$, thus $M^{00}$ is a closed convex set which contains $M \cup \{0_H\}$. Given that $C$, the closed convex hull of $M \cup \{0_H\}$, is the smallest closed convex subset of $H$ which contains $M \cup \{0_H\}$, it must be included in $M^{00}$.

$M^{00} \subseteq C$ : the fact that $0_H \in C$ comes into play at this stage of the proof. From Theorem 5.3, for all $x \in H$ it holds that:
$$\Re(\langle x - P_C x, 0_H - P_C x\rangle) \le 0 \iff \Re(\langle x - P_C x, -P_C x\rangle) \le 0 \iff \Re(\langle x - P_C x, P_C x\rangle) \ge 0$$
and for all $y \in M$, we also have $\Re(\langle x - P_C x, y - P_C x\rangle) \le 0$, and therefore $\Re(\langle x - P_C x, y - P_C x\rangle) \le \varepsilon$ for all $\varepsilon > 0$, that is, by linearity of the inner product:
$$\Re(\langle x - P_C x, y - P_C x\rangle) = \Re(\langle x - P_C x, y\rangle) - \Re(\langle x - P_C x, P_C x\rangle) \le \varepsilon$$
that is, given that $\varepsilon + \Re(\langle x - P_C x, P_C x\rangle)$ is a real number $> 0$:
$$\Re(\langle x - P_C x, y\rangle) \le \varepsilon + \Re(\langle x - P_C x, P_C x\rangle) \iff \frac{\Re(\langle x - P_C x, y\rangle)}{\varepsilon + \Re(\langle x - P_C x, P_C x\rangle)} \le 1 \qquad \forall y \in M,\ \forall \varepsilon > 0$$
which can be rewritten as:
$$\Re\left(\left\langle \frac{x - P_C x}{\varepsilon + \Re(\langle x - P_C x, P_C x\rangle)},\ y \right\rangle\right) \le 1$$
that is, the element $z(x) := \dfrac{x - P_C x}{\varepsilon + \Re(\langle x - P_C x, P_C x\rangle)} \in M^0$ for all $x \in H$.

As this result holds for any $x \in H$, it can be applied when $x \in M^{00}$; in this case, by definition, we have $\Re(\langle x, z(x)\rangle) \le 1$, that is:
$$\Re(\langle x, z(x)\rangle) = \Re\left(\left\langle x,\ \frac{x - P_C x}{\varepsilon + \Re(\langle x - P_C x, P_C x\rangle)} \right\rangle\right) \le 1$$
hence:
$$\Re(\langle x, x - P_C x\rangle) \le \varepsilon + \Re(\langle x - P_C x, P_C x\rangle) = \varepsilon + \Re(\langle P_C x, x - P_C x\rangle)$$
which gives us:
$$\Re(\langle x - P_C x, x - P_C x\rangle) \le \varepsilon \iff \|x - P_C x\|^2 \le \varepsilon \qquad \forall \varepsilon > 0$$
but this is possible if and only if $x - P_C x = 0_H$, that is, $x = P_C x$; however, since $P_C x \in C$, $x \in C$. This completes our proof that $x \in M^{00} \Longrightarrow x \in C$, that is, $M^{00} \subseteq C$.

3) By (2), if $M$ is a convex part of $H$ containing $0_H$, then $M^{00}$ is the smallest closed convex part which contains $M$. If $M$ is convex, then $\overline{M}$ is also convex, and, furthermore, is the smallest closed set which contains $M$; consequently, $\overline{M} = M^{00}$.

4) Let us now prove the two opposite inclusions.

$M^\perp \subseteq M^0$ : if $x \in M^\perp$, then, for all $y \in M$, $\langle x, y\rangle = 0 < 1$, therefore $x \in M^0$.

$M^0 \subseteq M^\perp$ : taking $x \in M^0$, we wish to prove that $x \in M^\perp$, that is, $\langle x, y\rangle = 0$ $\forall y \in M$. This is done using the fact that $M$ is taken to be a vector subspace of $H$: if $y \in M$, then $ty \in M$ $\forall t \in \mathbb{R} \setminus \{0\}$. Since $x \in M^0$ and $ty \in M$, it must hold that:
$$\Re(\langle x, ty\rangle) \le 1 \iff t\, \Re(\langle x, y\rangle) \le 1 \quad \forall t \in \mathbb{R} \setminus \{0\} \iff \Re(\langle x, y\rangle) = 0 \quad \forall y \in M$$
If $H$ is a real Hilbert space, this concludes our proof. If $H$ is complex, we also need to show that the imaginary part of the inner product is zero. We do this using Theorem 1.2, which tells us that $\Im(\langle x, y\rangle) = \Re(\langle x, iy\rangle)$, thus $\Im(\langle x, y\rangle) = \Re(\langle x, iy\rangle) = 0$, as we have previously proven that $\Re(\langle x, z\rangle) = 0$ $\forall z \in M$ and $z = iy \in M$ when $y \in M$ if $M$ is a complex vector subspace. Finally, $\langle x, y\rangle = 0$ $\forall y \in M$ and thus $x \in M^\perp$. □

Properties 3 and 4 from Theorem 5.6 imply property 2 of Theorem 5.5, that is, $\overline{M} = M^{\perp\perp}$. In fact, on one side, $M^0 = M^\perp$, and since $M^\perp$ is itself a vector subspace, repeating the polar operation gives $M^{00} = M^{\perp\perp}$. Furthermore, $M^{00} = \overline{M}$, thus $\overline{M} = M^{\perp\perp}$.

5.4. The (orthogonal) projection theorem in a Hilbert space

We shall now present and demonstrate the most important corollary of the theorem of orthogonal projection on a closed convex set.

THEOREM 5.7 (Orthogonal projection theorem).– Let $H$ be a Hilbert space on $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$ and let $S$ be a closed proper subspace of $H$. Then:
$$H = S \oplus S^\perp$$
that is, $\forall x \in H$, $\exists s \in S$, $\exists t \in S^\perp$ : $x = s + t$, and this decomposition is unique, that is, if:
$$x = s + t \ \text{ and } \ x = s' + t', \quad \text{with } s, s' \in S,\ t, t' \in S^\perp, \quad \text{then: } \ s = s' \ \text{ and } \ t = t'$$
If $S$ is not a proper subspace, then we have the trivial decomposition $H = H \oplus \{0_H\}$.

PROOF.– Take a fixed $x \in H$, the orthogonal projection $P_S(x) \in S$ of $x$ onto $S$ and the residual vector $u$: $u = x - P_S(x) \in S^\perp$ (by Theorem 5.4 if $x \notin S$, and trivially if $x \in S$, since then $u = 0_H$). Thus, $x$ can be decomposed as follows:
$$x = \underbrace{P_S(x)}_{\in S} + \underbrace{x - P_S(x)}_{\in S^\perp}$$
We must now show that a decomposition of this type is unique. Consider the decompositions $x = s + t$ and $x = s' + t'$, with $s, s' \in S$, $t, t' \in S^\perp$, then $s + t = s' + t'$, that is:
$$\underbrace{s - s'}_{\in S} = \underbrace{t' - t}_{\in S^\perp}$$
As $S$ and $S^\perp$ are vector spaces, they are stable by subtraction, hence the inclusions shown under the braces. We have $S \ni s - s' = t' - t \in S^\perp$, thus $s - s' \in S \cap S^\perp$ and $t' - t \in S \cap S^\perp$. However, $S \cap S^\perp = \{0_H\}$, so we must have $s' = s$ and $t' = t$. □

IMPORTANT OBSERVATION.– Whenever we recognize the presence of a closed vector subspace $S$ of a Hilbert space $H$, the orthogonal projection theorem gives a much deeper meaning to the otherwise trivial decomposition $x = (x - y) + y$, with $x \notin S$ and $y \in S$: in fact, choosing $y = P_S(x)$, we know that $y$ is orthogonal to $x - y$. This latter property guarantees the possibility of using the Pythagorean theorem to write $\|x\|^2 = \|x - y\|^2 + \|y\|^2$, which is often extremely useful in both theoretical and practical contexts, as we will see later in this chapter and the following one.

The results introduced above are applied in the exercise below.
Exercise 5.1

Let $\Omega$ be a bounded subset of $\mathbb{R}^n$, and consider the set $M$ of functions $f : \Omega \to \mathbb{R}$, $f \in L^2(\Omega)$, which are constant a.e. Show that:
1) $M$ is a closed vector subspace of $L^2(\Omega)$;
2) $\forall f \in L^2(\Omega)$, the projection of $f$ onto $M$ is the function which is constant a.e. and equal to the average of $f$ on $\Omega$, that is: $P_M f = \frac{1}{|\Omega|} \int_\Omega f(x)\,dx$, $|\Omega| = m(\Omega)$, with $m$ the Lebesgue measure on $\mathbb{R}^n$;
3) The orthogonal complement of $M$ in $L^2(\Omega)$ is given by the functions $h$ in $L^2(\Omega)$ with zero average, that is, $\int_\Omega h(x)\,dx = 0$.
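Solution to Exercise 5.1

1) $M$ can be characterized as the vector subspace of $L^2(\Omega)$ generated by the function $\mathbb{1}(x) = 1$, which is constant a.e. on $\Omega$. As there is only one generator, $M$ is one-dimensional; being finite-dimensional, it is complete, and is therefore closed in $L^2(\Omega)$.

2) Taking $f \in L^2(\Omega)$, then, by the projection theorem, if we write $f = (f - P_M f) + P_M f$, we have $f - P_M f \in M^\perp$. Let $g$ be an element in $M$ such that $g(x) = c \ne 0$ a.e. on $\Omega$, and let us calculate the inner product between $f - P_M f$ and $g$ (since $P_M f \in M$, it is constant a.e., so we interpret $P_M f$ as a real number):
$$0 = \langle f - P_M f, g\rangle_{L^2(\Omega)} = \int_\Omega (f(x) - P_M f)\, g(x)\,dx = \int_\Omega f(x) g(x)\,dx - \int_\Omega (P_M f)\, g(x)\,dx$$
$$= c \int_\Omega f(x)\,dx - c\,(P_M f) \int_\Omega dx = c \left( \int_\Omega f(x)\,dx - (P_M f)\,|\Omega| \right)$$
that is, since $c \ne 0$:
$$P_M f = \frac{1}{|\Omega|} \int_\Omega f(x)\,dx$$

3) Taking any $h \in M^\perp$, then by definition: $\langle h, g\rangle_{L^2(\Omega)} = 0$ $\forall g \in M$. Now, taking $g(x) = c \ne 0$ a.e., we have:
$$0 = \langle h, g\rangle_{L^2(\Omega)} = \int_\Omega h(x) g(x)\,dx = c \int_\Omega h(x)\,dx, \qquad \forall c \in \mathbb{R} \setminus \{0\}$$
hence $\int_\Omega h(x)\,dx = 0$. Conversely, the same computation shows that any $h \in L^2(\Omega)$ with zero average is orthogonal to every constant function, and thus belongs to $M^\perp$.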
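The result of the exercise can be explored numerically. The sketch below is a discretized illustration under assumptions that are not part of the exercise: $\Omega = [0, 1]$ (so that $|\Omega| = 1$), $f(x) = x^2$, and the $L^2(\Omega)$ inner product replaced by a midpoint-rule sum on `N` nodes; all names are hypothetical. It computes the projection of $f$ onto the constants, the residual $h$, the average of $h$ and the defect in the Pythagorean identity $\|f\|^2 = \|P_M f\|^2 + \|h\|^2$.

```python
import math

N = 1000
nodes = [(i + 0.5) / N for i in range(N)]     # midpoints of a uniform grid on [0, 1]

def inner(f_vals, g_vals):
    """Midpoint-rule approximation of the L^2(0, 1) inner product."""
    return math.fsum(a * b for a, b in zip(f_vals, g_vals)) / N

f_vals = [t ** 2 for t in nodes]              # samples of f(x) = x^2
ones = [1.0] * N                              # samples of the generator of M

# Projection onto the constants: <f, 1> / <1, 1>, i.e. the average of f.
mean_f = inner(f_vals, ones) / inner(ones, ones)

# Residual h = f - P_M f, expected to have zero average.
h_vals = [v - mean_f for v in f_vals]
mean_h = inner(h_vals, ones)

# Pythagorean identity; ||P_M f||^2 = mean_f^2 because |Omega| = 1 here.
pythagoras_gap = inner(f_vals, f_vals) - (mean_f ** 2 + inner(h_vals, h_vals))
```

With these choices `mean_f` approximates the exact average $\int_0^1 x^2\,dx = 1/3$, while `mean_h` and `pythagoras_gap` vanish up to rounding.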
What we have just proven and the orthogonal projection theorem imply that any function $f \in L^2(\Omega)$, $\Omega \subset \mathbb{R}^n$ such that $m(\Omega) < +\infty$, can be represented in a unique manner as:
$$f = \langle f\rangle_\Omega + h$$
where $\langle f\rangle_\Omega$ is the function which is constant on $\Omega$ and equal to the average of $f$ on $\Omega$, and $h \in L^2(\Omega)$ is such that $\langle h\rangle_\Omega = 0$. This implies that $h$ must be an oscillating function, with oscillations that cancel out when we consider its average. This result is already remarkable on its own, but it will be further refined by the Fourier expansion of $f$, which will be described later in this chapter. □

5.5. Orthonormal systems and Hilbert bases

As we saw in Chapters 1 and 2, the presence of an orthonormal basis in a Euclidean space makes it easy to calculate vector components and to characterize orthogonal projections. Furthermore, using the Fourier basis, it is also possible to define Fourier coefficients and the DFT. In this section, we shall describe the conditions which must be added in order to extend these considerations to infinite-dimensional Hilbert spaces. Let us begin with a definition.

DEFINITION 5.6 (Orthonormal system).– An orthonormal family of elements in a Hilbert space is known as an orthonormal system.

The properties of orthonormal systems will be analyzed in the context of separable Hilbert spaces, which are defined below.

DEFINITION 5.7.– A Hilbert space $H$ is said to be separable if there exists a subset $E \subseteq H$ which is countable and dense in $H$: $\operatorname{card}(E) = \aleph_0$, $\overline{E} = H$.

As the vast majority of Hilbert spaces used in practice are, in fact, separable, we shall only give a counter-example of a non-separable Hilbert space in section 5.5.3. The main advantage of working with separable Hilbert spaces is set out in Theorem 5.8.

THEOREM 5.8.– All orthonormal systems in a separable infinite-dimensional Hilbert space $H$ are countable.

PROOF.– Let $M$ be an infinite orthonormal system in $H$. Given that $H$ is separable, there exists a subset $E \subseteq H$ which is countable and dense in $H$: $\overline{E} = H$. From characterization 2 of density given in Definition 4.4, we can guarantee that, for any element $x \in M$ and any arbitrary but fixed $\varepsilon > 0$, $\exists u_x \in E$ such that $\|x - u_x\| < \varepsilon$.
If we can show that the correspondence defined by the function: ı : M ÝÑ E x ÞÝÑ ıpxq “ ux
The Geometric Structure of Hilbert Spaces
189
is injective, then the theorem will be proven. In fact, if this is the case, M is in bijective correspondence with ıpM q Ď E which is an infinite part of a countable set, and is therefore itself countable. To this aim, we take any y P M such that y ‰ x, and uy P E such that }x´uy } ă ε for all arbitrary but fixed ε ą 0. Since x ‰ y are two distinct points arbitrarily selected in M , the injectivity of ı corresponds to the fact that ıpxq ‰ ıpyq, that is ux ‰ uy . To prove this, we begin by noting that, since x and y belong to an orthonormal system, ? their distance is equal to 2 and we can write: ? 2 “ }x ´ y} “ }x ´ ux ` uy ´ y ` ux ´ uy } ď
triang. ineq.
}x ´ ux } ` }y ´ uy } ` }ux ´ uy }
ă 2ε ` }ux ´ uy }
? ? ? that is }ux ´? uy } ą 2 ´ 2ε. 2 ´ 2ε ą 0 ðñ ε ă 2{2, thus, we simply need to fix ε P p0, 2{2q, }ux ´ uy } ą 0 to obtain ux ‰ uy . 2 This theorem is the reason for selecting a discrete value, for example n P N or Z, to label the elements of an orthonormal system in a separable Hilbert space. C ONVENTION .– From now on, all Hilbert spaces H will be assumed to be separable, unless otherwise stated. The two most important propositions related to orthonormal systems are Bessel’s inequality and the Fischer-Riesz theorem. 5.5.1. Bessel’s inequality and Fourier coefficients The expansion of a vector v P Kn , n ă `8, with respect to an orthonormal basis n ř xv, ui yui , xv, ui y being the components of v in this basis. pui qni“1 is written as v “ i“1
Furthermore, the Plancherel identity holds true: }v}2 “
n ř
|xv, ui y|2 . If we want to
i“1
extend this property to an orthonormal system of an infinite-dimensional Hilbert space H we immediately see that a necessary condition must be verified: for any element x P H, theřsequence pxx, un yqnPN must decay toward 0 when n Ñ `8; otherwise, the series xx, un yun would not converge. The following result guarantees that this nPN
necessary condition is always satisfied; the Plancherel identity, on the other hand, is not guaranteed to hold.
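THEOREM 5.9 (Bessel's inequality).– Let $(u_n)_{n \in \mathbb{N}} \subset H$ be an orthonormal system in a Hilbert space $H$. Then, $\forall x \in H$, it holds that:

$$\sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2 \le \|x\|^2 \qquad [5.5]$$

More precisely, the difference between the two sides of inequality [5.5] may be quantified as:

$$\|x\|^2 - \sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2 = \Big\| x - \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n \Big\|^2 \qquad [5.6]$$

PROOF.– Bessel's inequality can be proved by showing that the difference $\|x\|^2 - \sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2$ is equal to the square of a norm, which is $\ge 0$.

For simplicity's sake, we shall write $\lambda_n = \langle x, u_n \rangle \iff \overline{\lambda_n} = \langle u_n, x \rangle$ $\forall n \in \mathbb{N}$ and consider any $N \in \mathbb{N}$. By Carnot's theorem (Theorem 1.5) we have:

$$\Big\| x - \sum_{n=0}^{N} \lambda_n u_n \Big\|^2 = \|x\|^2 - \Big\langle x, \sum_{n=0}^{N} \lambda_n u_n \Big\rangle - \Big\langle \sum_{n=0}^{N} \lambda_n u_n, x \Big\rangle + \Big\| \sum_{n=0}^{N} \lambda_n u_n \Big\|^2$$

Applying sesquilinearity to the two intermediary terms, and the generalized Pythagorean theorem to the final term, the previous equality can be rewritten as follows:

$$\Big\| x - \sum_{n=0}^{N} \lambda_n u_n \Big\|^2 = \|x\|^2 - \sum_{n=0}^{N} \overline{\lambda_n} \langle x, u_n \rangle - \sum_{n=0}^{N} \lambda_n \langle u_n, x \rangle + \sum_{n=0}^{N} |\lambda_n|^2 \|u_n\|^2$$

From the definitions of $\lambda_n$ and $\overline{\lambda_n}$, and using the fact that $\|u_n\|^2 = 1$ for all $n$, the final equality becomes:

$$\Big\| x - \sum_{n=0}^{N} \lambda_n u_n \Big\|^2 = \|x\|^2 - \sum_{n=0}^{N} \overline{\lambda_n} \lambda_n - \sum_{n=0}^{N} \lambda_n \overline{\lambda_n} + \sum_{n=0}^{N} |\lambda_n|^2 = \|x\|^2 - \sum_{n=0}^{N} |\lambda_n|^2$$

that is:

$$\|x\|^2 - \sum_{n=0}^{N} |\langle x, u_n \rangle|^2 = \Big\| x - \sum_{n=0}^{N} \langle x, u_n \rangle u_n \Big\|^2$$

As we did not impose any restrictions on $N \in \mathbb{N}$, this equality holds true for an arbitrarily large value of $N$; letting $N \to +\infty$, we obtain:

$$\|x\|^2 - \sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2 = \Big\| x - \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n \Big\|^2$$

□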
Bessel's inequality allows us to generalize the definition of Fourier coefficients encountered in Chapter 2.

DEFINITION 5.8 (Generalized Fourier coefficients).– The scalars $\langle x, u_n \rangle \in \mathbb{K}$ are said to be the generalized Fourier coefficients of $x$ with respect to the orthonormal system $(u_n)_{n \in \mathbb{N}}$, and are written as:

$$\hat{x}(n) = \langle x, u_n \rangle \qquad \forall n \in \mathbb{N}$$

Bessel's inequality can be reformulated by stating that, for all $x \in H$, the sequence $\hat{x} \equiv (\hat{x}(n))_{n \in \mathbb{N}}$ belongs to $\ell^2(\mathbb{N}, \mathbb{K})$, and that:

$$\|\hat{x}\|_2 \le \|x\| \qquad \forall x \in H$$

We see that the sequence of generalized Fourier coefficients always decays toward 0. For Hilbert spaces where $x$ can be identified with a function, analyzing the speed of decay of Fourier coefficients provides interesting information concerning the regularity of the function itself.

Equation [5.6] gives an estimate of the difference between $\|x\|^2$ and $\|\hat{x}\|_2^2$ and, rewritten with the notation introduced above, immediately implies Corollary 5.2.

COROLLARY 5.2.– Let $H$ be a Hilbert space and $(u_n)_{n \in \mathbb{N}}$ any orthonormal system in $H$. Then:

$$\Big\| x - \sum_{n \in \mathbb{N}} \hat{x}(n) u_n \Big\|^2 = \|x\|^2 - \|\hat{x}\|_2^2$$

Specifically, $\sum_{n \in \mathbb{N}} \hat{x}(n) u_n$ converges to $x$ if and only if $\|\hat{x}\|_2 = \|x\|$.
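The identity behind Bessel's inequality can be checked numerically in a finite-dimensional setting. The sketch below is only an illustration under stated assumptions: the space is $\mathbb{C}^3$ instead of a general Hilbert space, the two vectors `u` form a hypothetical incomplete orthonormal system chosen for the example, and the inner product is taken linear in its first argument.

```python
import math

def inner(x, y):
    """Inner product on C^d, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    """Squared norm induced by the inner product."""
    return inner(x, x).real

# Illustrative (hypothetical) data: an incomplete orthonormal system in C^3.
s = 1 / math.sqrt(2)
u = [
    [s, s * 1j, 0j],
    [s, -s * 1j, 0j],
]
x = [1 + 2j, -1j, 3 + 0j]

# Generalized Fourier coefficients <x, u_n>
coeffs = [inner(x, un) for un in u]
bessel_sum = sum(abs(c) ** 2 for c in coeffs)

# Residual x - sum_n <x, u_n> u_n
residual = [
    xi - sum(c * un[i] for c, un in zip(coeffs, u))
    for i, xi in enumerate(x)
]

lhs = norm_sq(x) - bessel_sum   # ||x||^2 - sum |<x, u_n>|^2
rhs = norm_sq(residual)         # ||x - sum <x, u_n> u_n||^2
print(bessel_sum, norm_sq(x), lhs, rhs)
```

Because the system is not complete, the two printed differences are equal but strictly positive, which is the situation in which Bessel's inequality is strict.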
5.5.2. The Fischer-Riesz theorem

Theorem 5.10, which is fundamental in functional analysis, is sometimes referred to as the Fischer-Riesz theorem, for example in the classic Dunford and Schwartz (1958).

THEOREM 5.10 (Fischer-Riesz).– Let $H$ be a Hilbert space, $(u_n)_{n \in \mathbb{N}}$ an orthonormal system in $H$ and $(k_n)_{n \in \mathbb{N}}$ a sequence of scalars in $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$.

1) Then:

$$\sum_{n \in \mathbb{N}} k_n u_n \text{ converges (in the norm } \|\cdot\| \text{ of } H) \iff \sum_{n \in \mathbb{N}} |k_n|^2 \text{ converges (in } \mathbb{K})$$

that is, $\sum_{n \in \mathbb{N}} k_n u_n$ converges $\iff (k_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{N}, \mathbb{K})$.

2) If $\sum_{n \in \mathbb{N}} k_n u_n$ converges to the sum $x$, that is $x = \sum_{n \in \mathbb{N}} k_n u_n$, then:

$$k_n = \langle x, u_n \rangle = \hat{x}(n)$$

and:

$$\|x\|^2 = \sum_{n \in \mathbb{N}} |k_n|^2$$

that is, Bessel's inequality becomes Plancherel's equality $\|x\|^2 = \sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2 = \|\hat{x}\|_2^2$.

PROOF.–

1) We wish to verify that studying the convergence of $\sum_{n \in \mathbb{N}} k_n u_n$ is equivalent to studying the convergence of $\sum_{n \in \mathbb{N}} |k_n|^2$. This will be done by using the fact that $H$ and $\mathbb{K}$ are complete, so the Cauchy condition is necessary and sufficient for the sequences to converge, and by remembering that the series $\sum_{n \in \mathbb{N}} k_n u_n$ is the sequence $(S_N)_{N \in \mathbb{N}} = \big( \sum_{n=0}^{N} k_n u_n \big)_{N \in \mathbb{N}}$ of partial sums.

The Cauchy condition for $(S_N)_{N \in \mathbb{N}}$ is:

$$\forall \varepsilon > 0 \ \exists K_\varepsilon > 0 : \ r > s \ge K_\varepsilon \implies \|S_r - S_s\| = \Big\| \sum_{n=s+1}^{r} k_n u_n \Big\| < \varepsilon$$

Since $\big\| \sum_{n=s+1}^{r} k_n u_n \big\| < \varepsilon \iff \big\| \sum_{n=s+1}^{r} k_n u_n \big\|^2 < \varepsilon^2 \equiv \delta$, as the inequality concerns two real positive numbers, the Cauchy condition for $(S_N)_{N \in \mathbb{N}}$ can be redefined as follows:

$$\forall \delta > 0 \ \exists K_\delta > 0 : \ r > s \ge K_\delta \implies \Big\| \sum_{n=s+1}^{r} k_n u_n \Big\|^2 < \delta$$

The usefulness of considering the squared norm is that, thanks to the orthogonality of the $u_n$, we can use the generalized Pythagorean theorem to write:

$$\Big\| \sum_{n=s+1}^{r} k_n u_n \Big\|^2 = \sum_{n=s+1}^{r} \|k_n u_n\|^2 = \sum_{n=s+1}^{r} |k_n|^2 \underbrace{\|u_n\|^2}_{=1} = \sum_{n=s+1}^{r} |k_n|^2$$

The Cauchy condition for the sequence of partial sums of the series $\sum_{n \in \mathbb{N}} k_n u_n$ can then be rewritten as:

$$\forall \delta > 0 \ \exists K_\delta > 0 : \ r > s \ge K_\delta \implies \sum_{n=s+1}^{r} |k_n|^2 < \delta,$$

which is the Cauchy condition for the sequence of partial sums of the series $\sum_{n \in \mathbb{N}} |k_n|^2$. Hence, the study of the convergence of the two series is equivalent.

2) Assuming that the series $\sum_{m \in \mathbb{N}} k_m u_m$ converges toward the sum $x$, then, by continuity of the inner product:

$$\langle x, u_n \rangle = \Big\langle \sum_{m \in \mathbb{N}} k_m u_m, u_n \Big\rangle = \sum_{m \in \mathbb{N}} k_m \langle u_m, u_n \rangle = \sum_{m \in \mathbb{N}} k_m \delta_{m,n} = k_n, \qquad \forall n \in \mathbb{N}.$$

Hence: $x = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n = \sum_{n \in \mathbb{N}} \hat{x}(n) u_n$. The fact that property 2 implies that Bessel's inequality becomes Plancherel's equality is a direct consequence of Corollary 5.2. An alternative proof is possible using the continuity of the norm:

$$\|x\|^2 = \Big\| \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n \Big\|^2 = \sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2 \underbrace{\|u_n\|^2}_{=1} = \sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2 = \|\hat{x}\|_2^2$$

□
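The equivalence in part 1 of the Fischer-Riesz theorem can be illustrated with a small numerical experiment. The following sketch makes several assumptions that are not in the text: it works in a truncated space $\mathbb{R}^{400}$, builds a hypothetical orthonormal system by rotating pairs of canonical vectors, and compares a square-summable sequence $k_n = 1/(n+1)$ with the non-square-summable sequence $k_n = 1/\sqrt{n+1}$.

```python
import math

def orthonormal_system(dim, theta=0.7):
    """Orthonormal vectors of R^dim (dim even) obtained by rotating
    consecutive pairs of canonical basis vectors by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    system = []
    for j in range(0, dim, 2):
        a = [0.0] * dim
        b = [0.0] * dim
        a[j], a[j + 1] = c, s
        b[j], b[j + 1] = -s, c
        system.extend([a, b])
    return system

def tail_norm_sq(k, u, s, r):
    """Squared norm of S_r - S_s = sum_{n=s+1}^{r} k_n u_n."""
    dim = len(u[0])
    tail = [sum(k[n] * u[n][i] for n in range(s + 1, r + 1)) for i in range(dim)]
    return sum(t * t for t in tail)

def tail_series(k, s, r):
    """Corresponding tail of the scalar series sum |k_n|^2."""
    return sum(abs(k[n]) ** 2 for n in range(s + 1, r + 1))

dim = 400
u = orthonormal_system(dim)
k_l2 = [1.0 / (n + 1) for n in range(dim)]                # square summable
k_not_l2 = [1.0 / math.sqrt(n + 1) for n in range(dim)]   # not square summable

for s in (10, 50, 199):
    r = 2 * s + 1
    print(s, r, tail_norm_sq(k_l2, u, s, r), tail_norm_sq(k_not_l2, u, s, r))
```

In this experiment the tails for the first sequence shrink as $s$ grows, while those for the second stay close to $\ln 2$, so the partial sums of the second series cannot satisfy the Cauchy condition.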
COROLLARY 5.3.– Let $H$ be a Hilbert space, $x \in H$ and let $(u_n)_{n \in \mathbb{N}}$ be an orthonormal system in $H$. Then the series $\sum_{n \in \mathbb{N}} \hat{x}(n) u_n$ is always convergent (with respect to the norm $\|\cdot\|$ of $H$).

PROOF.– By Bessel's inequality, $(\hat{x}(n))_{n \in \mathbb{N}} \in \ell^2(\mathbb{N}, \mathbb{K})$, that is, $\sum_{n \in \mathbb{N}} |\hat{x}(n)|^2$ is convergent in $\mathbb{K}$; by property 1 of the Fischer-Riesz theorem, the series $\sum_{n \in \mathbb{N}} \hat{x}(n) u_n$ is convergent in $H$. □

NOTABLE EXAMPLE.– The fact that $\sum_{n \in \mathbb{N}} \hat{x}(n) u_n$ is always convergent does not necessarily imply that it converges to $x$, as we show with the following counterexample. We take: $H = L^2[-\pi, \pi]$, $u_n(t) = \frac{1}{\sqrt{\pi}} \sin(nt)$, $n \in \mathbb{N}$, $n \ge 1$, and $t \in [-\pi, \pi]$. It is easy to verify that $(u_n)_{n}$ is an orthonormal system for $H$. Taking $x(t) = \cos(t)$, by direct calculation, we obtain:

$$\sum_{n} \hat{x}(n) u_n = \sum_{n=1}^{\infty} \left( \frac{1}{\pi} \int_{-\pi}^{\pi} \cos(t) \sin(nt) \, dt \right) \sin(nt)$$

Furthermore, $\int_{-\pi}^{\pi} \cos(t) \sin(nt) \, dt = 0$ as it is the integral of an odd function on a symmetric domain, thus:

$$\sum_{n} \hat{x}(n) u_n = \sum_{n=1}^{\infty} 0 \cdot \sin(nt) = 0$$

where 0 is the identically null function on $[-\pi, \pi]$, which is clearly different from the function $\cos(t)$; thus, $\sum_{n} \hat{x}(n) u_n \neq x$.
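The counterexample can be reproduced numerically. The sketch below is an illustration only: it approximates the $L^2[-\pi,\pi]$ inner product of real functions with a midpoint rule (the helper `l2_inner` and the number of steps are assumptions of this example) and evaluates the first ten generalized Fourier coefficients of $\cos(t)$ with respect to the sine system.

```python
import math

def l2_inner(f, g, steps=4000):
    """Midpoint-rule approximation of the real L^2[-pi, pi] inner product."""
    h = 2 * math.pi / steps
    nodes = (-math.pi + (j + 0.5) * h for j in range(steps))
    return h * sum(f(t) * g(t) for t in nodes)

def u(n):
    """Element of the sine system: u_n(t) = sin(n t) / sqrt(pi)."""
    return lambda t: math.sin(n * t) / math.sqrt(math.pi)

x = math.cos

# Orthonormality of the first few u_n
gram_22 = l2_inner(u(2), u(2))
gram_23 = l2_inner(u(2), u(3))

# Generalized Fourier coefficients of cos with respect to the sine system
coeffs = [l2_inner(x, u(n)) for n in range(1, 11)]

norm_x_sq = l2_inner(x, x)                            # close to pi
bessel_gap = norm_x_sq - sum(c * c for c in coeffs)   # does not shrink to 0
print(gram_22, gram_23, max(abs(c) for c in coeffs), norm_x_sq, bessel_gap)
```

All the coefficients vanish up to rounding, so the gap $\|x\|^2 - \|\hat{x}\|_2^2$ stays equal to $\|x\|^2 = \pi$, in agreement with Corollary 5.2.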
5.5.3. Characterizations of a Hilbert basis (or complete orthonormal system)

The example above shows that an orthonormal system in a Hilbert space $H$ does not necessarily guarantee that the series of Fourier coefficients of $x \in H$ multiplied by the elements of this orthonormal system will converge in norm to $x$ itself. This fact naturally raises the question of whether a condition which ensures such a convergence exists. In this section, we shall prove that the answer to this question is affirmative.

In section 1.5, we saw that, in finite dimension, this condition is that the orthonormal system must be an orthonormal basis, that is, a maximal set of unit vectors orthogonal to each other, where "maximal" means that no other unit vector exists which is orthogonal to all of them. Remarkably, this property also characterizes the bases of an infinite-dimensional Hilbert space, but the terminology used in this case is different.

DEFINITION 5.9 (Complete orthonormal system).– Let $(u_n)_{n \in \mathbb{N}} \subset H$ be an orthonormal system of a Hilbert space $H$. If $(u_n)_{n \in \mathbb{N}}$ is not a proper subset of another orthonormal system of $H$, that is, if there is no other unit vector orthogonal to all the vectors $(u_n)_{n \in \mathbb{N}}$, then this system is referred to as a complete (or total) orthonormal system, or as a Hilbert basis.

The property of being a Hilbert basis, in the sense defined above, is equivalent to five other properties.

THEOREM 5.11.– Let $(u_n)_{n \in \mathbb{N}}$ be an orthonormal system of a Hilbert space $H$. The following statements are equivalent:

1) $(u_n)_{n \in \mathbb{N}}$ is a Hilbert basis;

2) $\langle x, u_n \rangle \equiv \hat{x}(n) = 0 \ \forall n \in \mathbb{N} \iff x = 0_H$, that is $0_H$ is the only vector which is orthogonal to all vectors of a complete orthonormal system (or, equivalently, the only vector $x \in H$ whose generalized Fourier coefficients are all zero is the null vector);

3) $\overline{\operatorname{span}((u_n)_{n \in \mathbb{N}})} = H$, that is $(u_n)_{n \in \mathbb{N}}$ generates a vector subspace which is dense in $H$;

4) $\forall x \in H$ (generalized Fourier series expansion):

$$x = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n = \sum_{n \in \mathbb{N}} \hat{x}(n) u_n$$

5) $\forall x, y \in H$ (Parseval's identity):

$$\langle x, y \rangle = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle \langle u_n, y \rangle = \langle \hat{x}, \hat{y} \rangle_{\ell^2(\mathbb{N}, \mathbb{K})}$$

6) $\forall x \in H$ (Plancherel's identity):

$$\|x\|^2 = \sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2 = \|\hat{x}\|_2^2$$
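PROOF.– Our proof consists of the following steps: $1) \Rightarrow 2) \Rightarrow 3) \Rightarrow 4) \Rightarrow 5) \Rightarrow 6) \Rightarrow 1)$.

$1) \Rightarrow 2)$: reasoning by the absurd, if statement 1 is true and statement 2 is false, then $\exists x^* \in H$, $x^* \neq 0_H$ such that $\langle x^*, u_n \rangle = 0 \ \forall n \in \mathbb{N}$, but then the vector $u^* = \frac{x^*}{\|x^*\|}$ is a unit vector orthogonal to all of the elements of $(u_n)_{n \in \mathbb{N}}$, thus $(u^*, (u_n)_{n \in \mathbb{N}})$ would be a larger orthonormal system than $(u_n)_{n \in \mathbb{N}}$, which contradicts the completeness of $(u_n)_{n \in \mathbb{N}}$.

$2) \Rightarrow 3)$: $2) \Rightarrow ((u_n)_{n \in \mathbb{N}})^\perp = \{0_H\} \iff \big( \operatorname{span}((u_n)_{n \in \mathbb{N}}) \big)^\perp = \{0_H\}$, by property 8 from Theorem 5.1. If we take the orthogonal complement of both sides, we obtain $\big( \operatorname{span}((u_n)_{n \in \mathbb{N}}) \big)^{\perp\perp} = \{0_H\}^\perp = H$, then $H = \big( \operatorname{span}((u_n)_{n \in \mathbb{N}}) \big)^{\perp\perp} = \overline{\operatorname{span}((u_n)_{n \in \mathbb{N}})}$, by Theorem 5.5.

$3) \Rightarrow 4)$: let us consider $x$, calculate the inner products with $(u_n)_{n \in \mathbb{N}}$ and write the series $\sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n$, which we know converges to a certain point $y \in H$. We must show that, if statement 3 holds, then it follows that $y = x$. To this aim, note that the second part of the Fischer-Riesz theorem tells us that $\langle x, u_n \rangle = \langle y, u_n \rangle \ \forall n \in \mathbb{N}$, that is $\langle x - y, u_n \rangle = 0 \ \forall n \in \mathbb{N}$, that is $x - y \in ((u_n)_{n \in \mathbb{N}})^\perp = \big( \overline{\operatorname{span}((u_n)_{n \in \mathbb{N}})} \big)^\perp \underset{3)}{=} H^\perp = \{0_H\}$, that is, $y = x$.

$4) \Rightarrow 5)$: let us consider any $x, y \in H$ and write their generalized Fourier series. By statement 4, we have:

$$x = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n, \qquad y = \sum_{m \in \mathbb{N}} \langle y, u_m \rangle u_m$$

thus:

$$\langle x, y \rangle = \Big\langle \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n, \sum_{m \in \mathbb{N}} \langle y, u_m \rangle u_m \Big\rangle$$

By the continuity and linearity of the inner product, we have:

$$\langle x, y \rangle = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle \Big\langle u_n, \sum_{m \in \mathbb{N}} \langle y, u_m \rangle u_m \Big\rangle$$

then, by the continuity and sesquilinearity of the inner product:

$$\langle x, y \rangle = \sum_{n \in \mathbb{N}} \sum_{m \in \mathbb{N}} \langle x, u_n \rangle \overline{\langle y, u_m \rangle} \langle u_n, u_m \rangle = \sum_{n \in \mathbb{N}} \sum_{m \in \mathbb{N}} \langle x, u_n \rangle \langle u_m, y \rangle \delta_{n,m}$$

that is:

$$\langle x, y \rangle = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle \langle u_n, y \rangle$$

$5) \Rightarrow 6)$: consider $y = x$ in statement 5: $\|x\|^2 = \langle x, x \rangle = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle \langle u_n, x \rangle = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle \overline{\langle x, u_n \rangle} = \sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2$.

$6) \Rightarrow 1)$: reasoning by the absurd, if statement 6 is true and statement 1 is false, then $\exists u^* \in H$, $\|u^*\| = 1$ and $\langle u^*, u_n \rangle = 0 \ \forall n \in \mathbb{N}$; this would give us $\sum_{n \in \mathbb{N}} |\langle u^*, u_n \rangle|^2 = 0$, which contradicts statement 6, since it states that $\sum_{n \in \mathbb{N}} |\langle u^*, u_n \rangle|^2 = \|u^*\|^2 = 1$. □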
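Statements 4, 5 and 6 of Theorem 5.11 can be observed in a finite-dimensional toy case. The sketch below is a hedged illustration, not part of the original argument: it assumes the space $\mathbb{C}^2$ with an inner product linear in the first argument, a hypothetical complete orthonormal system `basis`, and arbitrary test vectors `x` and `y`. Dropping one basis vector shows how Plancherel's identity fails for an incomplete system.

```python
import math

def inner(x, y):
    """Inner product on C^d, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

# Illustrative complete orthonormal system of C^2
s = 1 / math.sqrt(2)
basis = [[s, s * 1j], [s, -s * 1j]]

x = [2 + 1j, -1 + 3j]
y = [0.5j, 4 - 1j]

def parseval_sum(x, y, system):
    """sum_n <x, u_n> <u_n, y> over the given orthonormal system."""
    return sum(inner(x, un) * inner(un, y) for un in system)

def fourier_sum(x, system):
    """sum_n <x, u_n> u_n over the given orthonormal system."""
    return [sum(inner(x, un) * un[i] for un in system) for i in range(len(x))]

parseval_full = parseval_sum(x, y, basis)           # statement 5
reconstruction = fourier_sum(x, basis)              # statement 4
plancherel_full = parseval_sum(x, x, basis).real    # statement 6

# With an incomplete system the Plancherel sum falls short of ||x||^2
plancherel_partial = parseval_sum(x, x, basis[:1]).real
print(parseval_full, inner(x, y))
print(reconstruction, plancherel_full, plancherel_partial)
```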
IMPORTANT NOTE CONCERNING PROPERTY 4.– The expansion into a generalized Fourier series on a Hilbert basis is an extension of the decomposition theorem for vectors on an orthonormal basis in a Euclidean space of finite dimension $d$, as shown in Table 5.1.

$\mathbb{K}^d$ | Hilbert space $H$
Orthonormal basis: $(u_i)_{i=1,\dots,d}$ | Hilbert basis: $(u_n)_{n \in \mathbb{N}}$
Expansion: $\forall x \in \mathbb{K}^d$, $x = \sum_{i=1}^{d} \langle x, u_i \rangle u_i$ | Fourier series: $\forall x \in H$, $x = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n$
Components: $\langle x, u_i \rangle$ | Fourier coefficients: $\langle x, u_n \rangle$

Table 5.1. Analogies between a finite-dimensional Euclidean space and an infinite-dimensional Hilbert space

The generalization of the canonical basis of the space $\ell^2(\mathbb{Z}_N)$, introduced in section 2.1, is the canonical Hilbert basis of $H = \ell^2(\mathbb{Z}, \mathbb{K})$ given by the vectors $(e_k)_{k \in \mathbb{Z}}$, $e_k(n) = \delta_{k,n} \ \forall n \in \mathbb{Z}$:

$$(e_1 = (1, 0, 0, \dots), \ e_2 = (0, 1, 0, \dots), \ \dots)$$

The orthonormal property is obvious; completeness, for example, follows from the fact that the only vector which is orthogonal to $e_1, e_2, \dots$ is the zero vector.

THEOREM 5.12.– All Hilbert spaces $H$ admit a Hilbert basis.

PROOF.– Let $\mathcal{O}$ be the collection of all orthonormal families in $H$. $\mathcal{O}$ is partially ordered by inclusion. If $\Phi \subset \mathcal{O}$ is linearly ordered, then the union of all elements of $\Phi$ is an upper bound. Zorn's lemma (Moretti 2013) guarantees the existence of a maximal element in $\mathcal{O}$. □

EXAMPLE OF A NON-SEPARABLE HILBERT SPACE.– The Hilbert space in Theorem 5.11 was implicitly assumed to be separable. A Hilbert space in which no countable orthonormal system verifies the properties which characterize a Hilbert basis is non-separable. We shall use property 2 from Theorem 5.11 to illustrate an example of a non-separable Hilbert space.

We begin by defining the following space:

$$H = \{ f : \mathbb{R} \to \mathbb{K} \ : \ \exists E_f \subset \mathbb{R}, \ \operatorname{card}(E_f) \le \aleph_0 \ : \ f|_{E_f} \in \ell^2(\mathbb{N}, \mathbb{K}) \text{ and } f|_{\mathbb{R} \setminus E_f} = 0_{\mathbb{R} \setminus E_f} \}$$

This is the space made up of all functions $f$ defined on $\mathbb{R}$ with values in $\mathbb{K}$, which vanish everywhere except on a finite or countable subset $E_f$ of $\mathbb{R}$, and such that the sequence $f : E_f \to \mathbb{K}$ is square summable.

$H$ is a vector space, with respect to the pointwise-defined linear operations, which may be equipped with the following inner product:

$$\langle f, g \rangle = \sum_{x \in E_f \cap E_g} f(x) \overline{g(x)}, \qquad f, g \in H$$

This is well defined since, by definition of $H$, the sum is either finite or a convergent series (evidently, if $\mathbb{K} = \mathbb{R}$, the conjugation operation becomes the identity). We can easily verify that $H$ is a Hilbert space with respect to the topology induced by this inner product.

Reasoning by the absurd, let us suppose that $H$ is separable, so that any Hilbert basis is countable. Then let $u \equiv (u_n)_{n \in \mathbb{N}}$ be a Hilbert basis in $H$, under the separability hypothesis, and take $U := \bigcup_{n \in \mathbb{N}} U_n$, where the sets $U_n \subset \mathbb{R}$, $\forall n \in \mathbb{N}$, are such that $u_n|_{U_n} \in \ell^2(\mathbb{N}, \mathbb{K})$ and $u_n|_{\mathbb{R} \setminus U_n} = 0_{\mathbb{R} \setminus U_n}$.

If we can show that there exists an element $f_u$ in $H$ which is orthogonal to all $u_n$ and which is not the identically null function on $\mathbb{R}$, this would prove that property 2 of Theorem 5.11 does not hold: this contradiction implies that $H$ cannot be separable.

To construct an element of this sort, we begin by noting that $U$ is a countable union of countable or finite sets, and is thus, itself, either countable or finite. Since $\mathbb{R}$ is uncountable, $\mathbb{R} \setminus U$ is not empty. Considering any point $\bar{x} \in \mathbb{R} \setminus U$, we can therefore define $f_u : \mathbb{R} \to \mathbb{K}$ as:

$$f_u(x) = \begin{cases} 1 & \text{if } x = \bar{x} \\ 0 & \text{otherwise} \end{cases}$$

to obtain an element in $H$ such that $\langle u_n, f_u \rangle = 0 \ \forall n \in \mathbb{N}$, but $f_u \neq 0_{\mathbb{R}}$.

The fact that all complete orthonormal systems of a separable Hilbert space $H$ of infinite dimension are countable should not lead us to think that $H$ itself is of countable dimension as a vector space. In other words, if we consider $H$ simply as a vector space, rather than a Hilbert space, then by definition its dimension is the cardinality of a basis in the algebraic sense, that is, a subset $B \subset H$ of linearly independent elements in $H$ such that any element in $H$ can be obtained through a finite linear combination of elements in the basis $B$. The following result, which we shall not prove, gives us quite surprising information about the difference between the cardinality of complete orthonormal systems and that of an algebraic basis of an infinite-dimensional Hilbert space.
THEOREM 5.13.– If the common cardinality of the Hilbert bases of a Hilbert space $H$ (separable or otherwise) is $\aleph_0$, then the dimension of $H$, as a vector space, cannot be less than $\aleph_1$.

It follows from this theorem that, as vector spaces, separable Hilbert spaces possess at least the cardinality of the continuum, that is, a maximal system of linearly independent vectors possesses at least the cardinality of the continuum. The orthonormality requirement implies a further constraint: the distance between any two elements in the basis must be $\sqrt{2}$, and this forces the cardinality of a complete orthonormal system to drop to that of the countable numbers.

Nevertheless, it is important to note, once again, that given a Hilbert basis, any element in an infinite-dimensional Hilbert space can be reconstructed via the generalized Fourier series in the sense of the Hilbert norm; this is by no means equivalent to the possibility of reconstructing elements by means of a finite linear combination.

This consideration shows that the concept of Hilbert basis is the most adequate to "parameterize" the elements of an infinite-dimensional Hilbert space via its generalized Fourier coefficients relative to the Hilbert basis, rather than a basis in the algebraic sense. The reason for this lies in the fact that a Hilbert basis interacts with the rich geometric structure of the Hilbert space generated by the inner product via Fourier coefficients, while a mere algebraic basis only takes into account the linear structure.

The following definition establishes a specific terminology for the dimension of Hilbert spaces, adopted by certain authors, that we consider particularly adequate.

DEFINITION 5.10 (Orthogonal dimension).– Let $H$ be a Hilbert space. The orthogonal dimension of $H$ is the common cardinality of all Hilbert bases in $H$.

Evidently, the orthogonal dimension coincides with the ordinary dimension for a finite-dimensional Hilbert space, but the same cannot be said in infinite dimensions.

5.5.4. Isomorphisms between Hilbert spaces

One final property which highlights the analogy between Hilbert spaces and finite-dimensional Euclidean spaces is the existence of a prototype for these spaces. As we have seen, the dimension of a vector space $V$ of finite dimension $d$ is sufficient to characterize it up to an isomorphism.

In fact, we know that, for any fixed basis of $V$, the correspondence $I : V \to \mathbb{K}^d$ which associates each vector $v$ in $V$ with its components (in $\mathbb{K}^d$) with respect to the chosen basis is an isomorphism. In this sense, $\mathbb{K}^d$ is the prototype of vector spaces on $\mathbb{K}$ of dimension $d < +\infty$.

For (separable) infinite-dimensional Hilbert spaces, the prototype is $\ell^2(\mathbb{N}, \mathbb{K})$ and the generalized Fourier coefficients replace the vector components.

The concept of isomorphism between Hilbert spaces must be defined before we can establish a rigorous statement regarding this fact. The presence of the inner product
implies that the canonical definition of isomorphism between vector spaces must be adapted to this situation.

DEFINITION 5.11 (Isomorphism between Hilbert spaces).– Let $H$ and $H'$ be two Hilbert spaces on the same field $\mathbb{K}$. The transformation $U : H \to H'$ is an isomorphism of Hilbert spaces if:

1) $U$ is linear;

2) $U$ is bijective;

3) $U$ preserves the inner product, that is:

$$\langle U(x), U(y) \rangle_{H'} = \langle x, y \rangle_H \qquad \forall x, y \in H$$

Condition 3 implies (in the specific case where $x = y$) that $U$ preserves the norms, that is:

$$\|U(x)\|_{H'} = \|x\|_H \qquad \forall x \in H$$

This also implies:

$$\|U(x) - U(y)\|_{H'} = \|U(x - y)\|_{H'} = \|x - y\|_H \qquad \forall x, y \in H$$

that is, $U$ preserves the distances. In this case, we say that $U$ is isometric.

The property of conservation of the norm implies $\|U(x)\|_{H'} = 0 \iff \|x\|_H = 0$; furthermore, by the positive definiteness of the norm, it holds that $U(x) = 0_{H'} \iff x = 0_H$, that is $\ker(U) = \{0_H\}$ and thus $U$ is injective. An isomorphism $U$ between Hilbert spaces can thus be redefined as a surjective linear transformation which preserves the inner product.

Actually, the linearity requirement is redundant, as we see from the following result.

THEOREM 5.14.– Let $V, V'$ be two inner product spaces, of finite or infinite dimension, on the same field $\mathbb{K}$. If the transformation $U : V \to V'$ is surjective and preserves the inner product, then it is linear.

PROOF.– $\forall x, y, z \in V$ and $\forall \alpha, \beta \in \mathbb{K}$:

$$\begin{aligned} 0 = \langle 0, z \rangle &= \langle \alpha x + \beta y - \alpha x - \beta y, z \rangle \\ &= \langle \alpha x + \beta y, z \rangle - \alpha \langle x, z \rangle - \beta \langle y, z \rangle \\ &\underset{(U \text{ preserves } \langle \, , \rangle)}{=} \langle U(\alpha x + \beta y), U(z) \rangle - \alpha \langle U(x), U(z) \rangle - \beta \langle U(y), U(z) \rangle \\ &\underset{(\text{linearity of } \langle \, , \rangle)}{=} \langle U(\alpha x + \beta y) - \alpha U(x) - \beta U(y), U(z) \rangle \end{aligned}$$

Since, by hypothesis, $U$ is surjective, as $z \in V$ varies, $U(z)$ represents any element of $V'$, thus $U(\alpha x + \beta y) - \alpha U(x) - \beta U(y)$ is orthogonal to all of the elements of $V'$, that is, $U(\alpha x + \beta y) - \alpha U(x) - \beta U(y) = 0_{V'}$, hence:

$$U(\alpha x + \beta y) = \alpha U(x) + \beta U(y) \qquad \forall x, y \in V, \ \forall \alpha, \beta \in \mathbb{K}$$

and so $U$ is linear. □
The definition of isomorphism between Hilbert spaces can thus be reformulated as follows.

DEFINITION 5.12 (Alternative definition of isomorphism between Hilbert spaces).– Let $H$ and $H'$ be two Hilbert spaces on the same field $\mathbb{K}$. The transformation $U : H \to H'$ is an isomorphism of Hilbert spaces if:

1) $U$ is surjective;

2) $U$ preserves the inner product.

Being isomorphic is an equivalence relation in the set of Hilbert spaces on the same field $\mathbb{K}$.

The following result says that the orthogonal dimension plays, for a separable infinite-dimensional Hilbert space, the same role played by the dimension for a finite-dimensional vector space.

THEOREM 5.15.– Let $H$, $H'$ be Hilbert spaces on the same field $\mathbb{K}$. $H$ is isomorphic to $H'$ if and only if the orthogonal dimension of $H$ is the same as that of $H'$.

5.5.5. $\ell^2(\mathbb{N}, \mathbb{K})$ as the prototype of separable Hilbert spaces of infinite dimension

LEMMA 5.2.– Let $(u_n)_{n \in \mathbb{N}}$ be a Hilbert basis of $H$; then, for any sequence $(k_n)_{n \in \mathbb{N}}$ of $\ell^2(\mathbb{N}, \mathbb{K})$, there exists $x \in H$ such that $(k_n)_{n \in \mathbb{N}} = (\langle x, u_n \rangle)_{n \in \mathbb{N}}$.

PROOF.– If $(k_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{N}, \mathbb{K})$, then, thanks to property 1 of the Fischer-Riesz theorem, $\sum_{n \in \mathbb{N}} k_n u_n$ converges to a certain $x \in H$. Then, property 2 of the same theorem guarantees that $(k_n)_{n \in \mathbb{N}} = (\langle x, u_n \rangle)_{n \in \mathbb{N}}$. □

THEOREM 5.16.– If the Hilbert space $H$ has countable orthogonal dimension $\aleph_0$, then $H$ is isomorphic to $\ell^2(\mathbb{N}, \mathbb{K})$.

PROOF.– Let $(u_n)_{n \in \mathbb{N}}$ be a countable Hilbert basis in $H$ and consider the map:

$$U : H \longrightarrow \ell^2(\mathbb{N}, \mathbb{K}), \qquad x \longmapsto U(x) = (\langle x, u_n \rangle)_{n \in \mathbb{N}}$$

$U$ is surjective by Lemma 5.2 and it preserves the inner product by Parseval's identity:

$$\forall x, y \in H : \quad \langle x, y \rangle_H = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle \langle u_n, y \rangle = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle \overline{\langle y, u_n \rangle} \equiv \langle U(x), U(y) \rangle_{\ell^2(\mathbb{N}, \mathbb{K})}$$

Hence, $U$ is an isomorphism of Hilbert spaces. □
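The coefficient map used in this proof has a finite-dimensional analogue that is easy to test. The sketch below is an assumption-laden illustration: instead of an infinite-dimensional $H$ it takes the real polynomials of degree at most 2 with the $L^2[-1,1]$ inner product, uses the normalized Legendre polynomials as a hypothetical orthonormal basis, and checks that the map `U` onto $\mathbb{R}^3$ preserves inner products and is invertible.

```python
import math

def poly_inner(p, q):
    """Exact L^2[-1, 1] inner product of real polynomials given as
    coefficient lists [c0, c1, c2] (c0 + c1 t + c2 t^2)."""
    total = 0.0
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to 0 on [-1, 1]
                total += a * b * 2.0 / (i + j + 1)
    return total

# Normalized Legendre polynomials of degree 0, 1 and 2
basis = [
    [math.sqrt(1 / 2), 0.0, 0.0],
    [0.0, math.sqrt(3 / 2), 0.0],
    [-0.5 * math.sqrt(5 / 2), 0.0, 1.5 * math.sqrt(5 / 2)],
]

def U(p):
    """Coefficient map p -> (<p, u_n>)_n."""
    return [poly_inner(p, un) for un in basis]

def U_inverse(k):
    """Rebuild the polynomial sum_n k_n u_n from its coefficients."""
    return [sum(kn * un[i] for kn, un in zip(k, basis)) for i in range(3)]

def dot(a, b):
    """Euclidean inner product on R^3."""
    return sum(s * t for s, t in zip(a, b))

p = [1.0, -2.0, 3.0]
q = [0.5, 4.0, -1.0]
print(poly_inner(p, q), dot(U(p), U(q)))
print(U_inverse(U(p)))
```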
5.6. The Fourier Hilbert basis in $L^2$

The best-known example of a Hilbert basis, which is also the most important in terms of practical applications, is the Fourier basis. This basis is defined below in the context of the Hilbert space $L^2$.

5.6.1. $L^2[-\pi, \pi]$ or $L^2[0, 2\pi]$

Let us begin with $H = L^2[-\pi, \pi]$ or $L^2[0, 2\pi]$ and $\mathbb{K} = \mathbb{C}$, then:

$$u_n(x) = \frac{1}{\sqrt{2\pi}} e^{inx}, \qquad n \in \mathbb{Z}$$

is a complete orthonormal system, called the Fourier basis of $L^2[-\pi, \pi]$ or $L^2[0, 2\pi]$. Note that this orthonormal system completes the orthonormal system $\frac{1}{\sqrt{\pi}} \sin(nx)$, $n \in \mathbb{N}$, $n \ge 1$, which we used in section 5.5.2 as a counterexample to show that the convergence (in Hilbert norm) of the generalized Fourier series to the element defining the generalized Fourier coefficients is not guaranteed if we consider a non-complete orthonormal system.

Orthonormality is easy to prove. Considering $L^2[-\pi, \pi]$ (the proof for $L^2[0, 2\pi]$ is the same):

$$\langle u_n, u_m \rangle = \int_{-\pi}^{\pi} u_n(x) \overline{u_m(x)} \, dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{inx} e^{-imx} \, dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(n-m)x} \, dx$$

– if $n = m$, then $e^{i(n-m)x} = e^0 = 1$ and thus $\langle u_n, u_n \rangle = \|u_n\|^2 = 1$;

– if $n \neq m$, then, writing $y = i(n - m)$, the inner product can be written as:

$$\langle u_n, u_m \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{yx} \, dx = \frac{1}{2\pi y} \left[ e^{yx} \right]_{x=-\pi}^{x=\pi} = \frac{1}{2\pi i (n - m)} \left[ e^{i(n-m)\pi} - e^{i(m-n)\pi} \right] = 0$$

In short, $\langle u_n, u_m \rangle = \delta_{n,m}$, proving orthonormality. The proof that the system is complete, instead, is much more complicated.

The Fourier expansion here is written as follows:

$$\forall f \in L^2[-\pi, \pi] : \quad f = \sum_{n \in \mathbb{Z}} \langle f, u_n \rangle u_n$$

where:

$$\langle f, u_n \rangle \equiv \hat{f}(n) = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(x) e^{-inx} \, dx$$

is the $n$-th Fourier coefficient of $f$. Note that the convergence of the series should be interpreted as:

$$\int_{-\pi}^{\pi} \Big| f(x) - \sum_{n=-N}^{N} \hat{f}(n) u_n(x) \Big|^2 dx \underset{N \to +\infty}{\longrightarrow} 0$$

DEFINITION 5.13.– Take $H = L^2[-\pi, \pi]$ or $L^2[0, 2\pi]$. The map:

$$\mathcal{F} \equiv \hat{\ } : H \longrightarrow \ell^2(\mathbb{Z}, \mathbb{C}), \qquad f \longmapsto (\hat{f}(n))_{n \in \mathbb{Z}}$$

is known as the Fourier transform on $H = L^2[-\pi, \pi]$ or $L^2[0, 2\pi]$.

We see that $\mathcal{F}$ coincides with the transformation which implements the isomorphism between $L^2[-\pi, \pi]$ or $L^2[0, 2\pi]$ and its prototype $\ell^2(\mathbb{Z}, \mathbb{C})$!

The Fourier Hilbert basis of $L^2([-\pi, \pi])$ and $L^2([0, 2\pi])$ can be written in terms of real functions:

$$\begin{cases} u_0 \equiv \frac{1}{\sqrt{2\pi}} \\ \cos_n(x) \equiv \frac{1}{\sqrt{\pi}} \cos(nx), & n \in \mathbb{N}, \ n \ge 1 \\ \sin_n(x) \equiv \frac{1}{\sqrt{\pi}} \sin(nx), & n \in \mathbb{N}, \ n \ge 1 \end{cases}$$

It is important to note that the complex exponential of parameter $n \in \mathbb{Z}$ is replaced by two real sequences of parameter $n \in \mathbb{N}$; this is a consequence of Euler's formula, $e^{i\vartheta} = \cos \vartheta + i \sin \vartheta$, for all $\vartheta \in \mathbb{R}$.

The advantage of this basis is that it does not contain any imaginary parts; furthermore, the Fourier expansion in this case can be performed:

– for even functions, using $u_0$ and $\cos_n$;

– for odd functions, using $\sin_n$.
The reason for this result is easily explained: taking an even $f$, then $\langle f, \sin_n \rangle = \frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} f(x) \sin(nx) \, dx = 0 \ \forall n$, since $f(x) \sin(nx)$ is odd and $[-\pi, \pi]$ is a symmetric domain. Similar arguments can be applied to odd functions to obtain the desired result.

5.6.2. $L^2(\mathbb{T})$

Our decision to consider the interval $[-\pi, \pi]$ or $[0, 2\pi]$ reflects the fact that the orthonormality of the system $\big( \frac{1}{\sqrt{2\pi}} e^{inx} \big)_{n \in \mathbb{Z}}$ is very easy to prove. Actually, all of the properties stated for this system remain valid if $[-\pi, \pi]$ or $[0, 2\pi]$ is replaced by any other interval of size $2\pi$.

Furthermore, these properties continue to hold if we consider functions defined on the whole real line, that is, $f : \mathbb{R} \to \mathbb{C}$, on the condition that they are $2\pi$-periodic. This can be formalized using a highly useful Hilbert space:

$$L^2(\mathbb{T}) = \left\{ f : \mathbb{R} \to \mathbb{C}, \ f \text{ measurable}, \ f(x + 2\pi) = f(x), \ \int_0^{2\pi} |f(x)|^2 \, dx < +\infty \right\} / \sim$$

where $f \sim g$ if $f = g$ a.e., as usual. By periodicity, integration can be carried out on any interval of size $2\pi$.

The symbol $\mathbb{T}$ represents the 1D torus, which may be identified with the unit circle. Any function $f : \mathbb{R} \to \mathbb{C}$ which is $2\pi$-periodic may be identified with a function $\tilde{f}$ defined on $\mathbb{T}$ by means of the commutative diagram $f = \tilde{f} \circ p$, where:

$$p : \mathbb{R} \longrightarrow \mathbb{T}, \qquad x \longmapsto (\cos x, \sin x)$$

$$f : \mathbb{R} \to \mathbb{C} \ \ 2\pi\text{-periodic}, \qquad \tilde{f} : \mathbb{T} \to \mathbb{C}, \qquad \tilde{f}(p(x)) = f(x)$$

$L^2(\mathbb{T})$ is isomorphic to $L^2[0, 2\pi]$ or $L^2[-\pi, \pi]$ via the map which restricts $f : \mathbb{R} \to \mathbb{C}$, $f \in L^2(\mathbb{T})$, to the interval $[0, 2\pi]$ or $[-\pi, \pi]$ (or any interval of size $2\pi$):

$$I : L^2(\mathbb{T}) \longrightarrow L^2[0, 2\pi], \qquad f \longmapsto If = f|_{[0, 2\pi]} \ \text{ or } \ f|_{[-\pi, \pi]}$$
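Using $I$, the complete orthonormal Fourier system can be transferred from $L^2[0, 2\pi]$ or $L^2[-\pi, \pi]$ onto $L^2(\mathbb{T})$:

$$\left( \frac{1}{\sqrt{2\pi}} e^{inx} \right)_{n \in \mathbb{Z}} : \ \text{Hilbert basis for } L^2(\mathbb{T})$$

and the definition of the Fourier transform can be extended to $L^2(\mathbb{T})$.

DEFINITION 5.14.– The transformation:

$$\mathcal{F} \equiv \hat{\ } : L^2(\mathbb{T}) \longrightarrow \ell^2(\mathbb{Z}, \mathbb{C}), \qquad f \longmapsto \mathcal{F}f = \hat{f}$$

$$\hat{f} = (\hat{f}(n))_{n \in \mathbb{Z}} = (\langle f, u_n \rangle)_{n \in \mathbb{Z}} = \left( \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} f(x) e^{-inx} \, dx \right)_{n \in \mathbb{Z}}$$

is known as the Fourier transform on $L^2(\mathbb{T})$.

We know that this transformation is an isomorphism between Hilbert spaces, and that $\|\hat{f}\|^2_{\ell^2(\mathbb{Z}, \mathbb{C})} = \sum_{n \in \mathbb{Z}} |\langle f, u_n \rangle|^2 = \|f\|^2_{L^2(\mathbb{T})}$.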
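The orthonormality relation $\langle u_n, u_m \rangle = \delta_{n,m}$ and the norm preservation of the Fourier transform can be checked numerically. The sketch below is illustrative only: the inner product is approximated by a midpoint rule on $[-\pi, \pi]$ (an assumption of this example, as are the helper names), and the test function $f(x) = \cos(3x)$ is chosen because only the coefficients of index $\pm 3$ are non-zero.

```python
import cmath
import math

def inner(f, g, steps=2000):
    """Midpoint-rule approximation of <f, g> = int f(x) conj(g(x)) dx on [-pi, pi]."""
    h = 2 * math.pi / steps
    nodes = (-math.pi + (j + 0.5) * h for j in range(steps))
    return h * sum(f(x) * g(x).conjugate() for x in nodes)

def u(n):
    """Fourier basis element u_n(x) = exp(i n x) / sqrt(2 pi)."""
    return lambda x: cmath.exp(1j * n * x) / math.sqrt(2 * math.pi)

# Gram matrix of u_{-2}, ..., u_2: should be close to the identity
indices = range(-2, 3)
gram = {(n, m): inner(u(n), u(m)) for n in indices for m in indices}

# Fourier coefficients of f(x) = cos(3x) and Plancherel's identity
f = lambda x: math.cos(3 * x)
coeffs = {n: inner(f, u(n)) for n in range(-5, 6)}
plancherel = sum(abs(c) ** 2 for c in coeffs.values())
norm_sq = inner(f, f).real

print(abs(gram[(1, 1)]), abs(gram[(1, 2)]))
print(coeffs[3], plancherel, norm_sq)
```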
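5.6.3. $L^2[a, b]$

To handle elements $f \in L^2[a, b]$, $a, b \in \mathbb{R}$, $a < b$, regarded as $(b-a)$-periodic functions, we must slightly modify the Fourier basis. The trick consists of multiplying the variable by an appropriate quantity, the pulse (or angular frequency), which makes the exponentials $(b - a)$-periodic.

Formally, we define:

– $T = b - a$: the period;

– $\nu = \frac{1}{T}$: the frequency;

– $\omega = 2\pi\nu = \frac{2\pi}{T}$: the pulse.

We see that:

$$\begin{aligned} e^{i\omega n(x+T)} &= \cos[\omega n(x + T)] + i \sin[\omega n(x + T)] = \cos[\omega n x + \omega n T] + i \sin[\omega n x + \omega n T] \\ &= \cos\left[\omega n x + \frac{2\pi}{T} n T\right] + i \sin\left[\omega n x + \frac{2\pi}{T} n T\right] = \cos[\omega n x + 2\pi n] + i \sin[\omega n x + 2\pi n] \\ &= \cos(\omega n x) + i \sin(\omega n x) = e^{i\omega n x} \end{aligned}$$

thus $x \mapsto e^{i\omega n x}$ is a $T$-periodic function. Using these considerations, we can show that a complete orthonormal system for $L^2[a, b]$ can be obtained using the following set of functions:

$$u_n : [a, b] \longrightarrow \mathbb{C}, \qquad x \longmapsto u_n(x) = \frac{1}{\sqrt{b-a}} \, e^{2\pi n i \frac{x-a}{b-a}}, \qquad n \in \mathbb{Z}$$

in the complex case, and:

$$\begin{cases} u_0 = \frac{1}{\sqrt{b-a}} \\ \cos_n(x) \equiv \sqrt{\frac{2}{b-a}} \cos\left( 2\pi n \frac{x-a}{b-a} \right), & n \in \mathbb{N}, \ n \ge 1 \\ \sin_n(x) \equiv \sqrt{\frac{2}{b-a}} \sin\left( 2\pi n \frac{x-a}{b-a} \right), & n \in \mathbb{N}, \ n \ge 1 \end{cases}$$

in the real case.

In the specific case of the Hilbert space $L^2[-\ell, \ell]$, $\ell \in \mathbb{R}$, $\ell > 0$, the Fourier basis can be written as:

$$u_n(x) = \frac{1}{\sqrt{2\ell}} \, e^{\pi i n \frac{x}{\ell}}, \qquad n \in \mathbb{Z}$$

in the complex case, and:

$$\begin{cases} u_0 = \frac{1}{\sqrt{2\ell}} \\ \cos_n(x) \equiv \sqrt{\frac{1}{\ell}} \cos\left( \pi n \frac{x}{\ell} \right), & n \in \mathbb{N}, \ n \ge 1 \\ \sin_n(x) \equiv \sqrt{\frac{1}{\ell}} \sin\left( \pi n \frac{x}{\ell} \right), & n \in \mathbb{N}, \ n \ge 1 \end{cases}$$

in the real case.

5.6.4. Real Fourier series

Using the real Hilbert basis of $L^2(\mathbb{T})$, that is:

$$\begin{cases} u_0 \equiv \frac{1}{\sqrt{2\pi}} \\ \cos_n(x) \equiv \frac{1}{\sqrt{\pi}} \cos(nx), & n \in \mathbb{N}, \ n \ge 1 \\ \sin_n(x) \equiv \frac{1}{\sqrt{\pi}} \sin(nx), & n \in \mathbb{N}, \ n \ge 1 \end{cases}$$

the real Fourier series expansion for any element $f \in L^2(\mathbb{T})$ is:

$$f(t) \underset{L^2(\mathbb{T})}{=} \frac{a_0}{2} + \sum_{n=1}^{+\infty} a_n \cos(nt) + \sum_{n=1}^{+\infty} b_n \sin(nt)$$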
with:

$$a_0 = \frac{1}{\pi} \int_{\mathbb{T}} f(t) \, dt \implies \frac{a_0}{2} = \frac{1}{2\pi} \int_{\mathbb{T}} f(t) \, dt = \langle f \rangle_{\mathbb{T}} \quad \text{(average of } f)$$

$$a_n = \frac{1}{\pi} \int_{\mathbb{T}} f(t) \cos(nt) \, dt \qquad \forall n = 1, 2, \dots$$

$$b_n = \frac{1}{\pi} \int_{\mathbb{T}} f(t) \sin(nt) \, dt \qquad \forall n = 1, 2, \dots$$

The coefficients $a_0, a_n, b_n$, $n = 1, 2, \dots$ are known as the real Fourier coefficients of $f$. Evidently, the equality must be interpreted in the sense of $L^2(\mathbb{T})$, that is:

$$\int_{\mathbb{T}} \left[ f(t) - \left( \frac{a_0}{2} + \sum_{n=1}^{N} a_n \cos(nt) + \sum_{n=1}^{N} b_n \sin(nt) \right) \right]^2 dt \underset{N \to +\infty}{\longrightarrow} 0$$

The expression:

$$S_N(t) = \frac{a_0}{2} + \sum_{n=1}^{N} a_n \cos(nt) + \sum_{n=1}^{N} b_n \sin(nt)$$

is known as a trigonometric polynomial of order $N$. $S_N$ is a $2\pi$-periodic function, like the elements of $L^2(\mathbb{T})$.

To understand the presence of the constant $\frac{1}{\pi}$ in the real Fourier coefficients, consider the expansion of $f$ with respect to the cosine system:

$$\sum_{n=1}^{+\infty} \Big\langle f, \frac{1}{\sqrt{\pi}} \cos(nt) \Big\rangle \frac{1}{\sqrt{\pi}} \cos(nt) = \sum_{n=1}^{+\infty} \left( \frac{1}{\pi} \int_{\mathbb{T}} f(t) \cos(nt) \, dt \right) \cos(nt)$$

the same holds true for the sine system and for the constant.

Incorporating the constant $\frac{1}{\pi}$ into the definition of the Fourier coefficients makes it possible to identify $\frac{a_0}{2}$ with the average value of $f$, so that the real Fourier series can be interpreted as the superposition of the average value of $f$ and combinations of harmonic waves of increasing frequency. Notably:

– $t \mapsto a_1 \cos(t) + b_1 \sin(t)$ is known as the fundamental harmonic;

– $t \mapsto a_n \cos(nt) + b_n \sin(nt)$ is the harmonic of order $n$.
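The definitions of $a_0$, $a_n$, $b_n$ and of the trigonometric polynomial $S_N$ can be tried out on a concrete function. The sketch below is a hedged illustration: it assumes the test function $f(t) = |t|$ on $[-\pi, \pi]$ (extended periodically), approximates all integrals with a midpoint rule, and measures the squared $L^2$ distance between $f$ and $S_N$ for a few values of $N$. The helper names are hypothetical.

```python
import math

STEPS = 2000
H = 2 * math.pi / STEPS
NODES = [-math.pi + (j + 0.5) * H for j in range(STEPS)]

def integrate(g):
    """Midpoint-rule approximation of the integral of g over [-pi, pi]."""
    return H * sum(g(t) for t in NODES)

f = abs  # test function f(t) = |t|

def real_coefficients(f, N):
    """Real Fourier coefficients a_0, (a_1..a_N), (b_1..b_N) of f."""
    a0 = integrate(f) / math.pi
    a = [integrate(lambda t, n=n: f(t) * math.cos(n * t)) / math.pi
         for n in range(1, N + 1)]
    b = [integrate(lambda t, n=n: f(t) * math.sin(n * t)) / math.pi
         for n in range(1, N + 1)]
    return a0, a, b

def S(t, a0, a, b, N):
    """Trigonometric polynomial of order N."""
    return a0 / 2 + sum(a[n - 1] * math.cos(n * t) + b[n - 1] * math.sin(n * t)
                        for n in range(1, N + 1))

a0, a, b = real_coefficients(f, 7)

# Squared L^2 distance between f and S_N for increasing N
errors = [integrate(lambda t, N=N: (f(t) - S(t, a0, a, b, N)) ** 2)
          for N in (1, 3, 7)]
print(a0 / 2, a[0], max(abs(bn) for bn in b), errors)
```

Since $|t|$ is even, the sine coefficients vanish, $a_0/2$ equals the average $\pi/2$, and the error decreases as $N$ grows.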
A tuning fork is able to produce a “pure” sound, that is, one which consists exclusively of the fundamental harmonic; the vast majority of musical instruments, on the other hand, produce sounds which can be described by a Fourier series, that is, a superposition of harmonics at frequencies which are multiples of the fundamental.

Using the orthogonal projection theorem and Plancherel’s identity, we can say that the quadratic error (that is, the squared $L^2$ norm of the difference) between $f$ and the trigonometric polynomial of order $N$ is:
\[ E_N = \int_{\mathbb{T}} [f(t) - S_N(t)]^2\,dt = \int_{\mathbb{T}} f(t)^2\,dt - \pi\left[ \frac{a_0^2}{2} + \sum_{n=1}^{N} \left( a_n^2 + b_n^2 \right) \right] \]
and since $E_N \to 0$ as $N \to +\infty$, it holds that:
\[ \int_{\mathbb{T}} f(t)^2\,dt = \pi\left[ \frac{a_0^2}{2} + \sum_{n=1}^{+\infty} \left( a_n^2 + b_n^2 \right) \right] \]
This is an identity between an integral and a numerical series, and is particularly useful for determining one of these two objects by calculating the other.

Taking $L^2[a, b]$ and writing $T = b - a$ and $\omega = \frac{2\pi}{T}$, we know that the real Fourier Hilbert basis is:
\[ \left\{ \frac{1}{\sqrt{T}},\ \sqrt{\frac{2}{T}}\cos(\omega n t),\ \sqrt{\frac{2}{T}}\sin(\omega n t),\ n = 1, 2, 3, \dots \right\} \]
With respect to this Hilbert basis, the Fourier series expansion of $f \in L^2([a, b])$ (with $f$ $T$-periodic) is:
\[ f(t) \underset{L^2[a,b]}{=} \frac{a_0}{2} + \sum_{n=1}^{+\infty} a_n\cos(\omega n t) + \sum_{n=1}^{+\infty} b_n\sin(\omega n t) \]
with:
\[ \frac{a_0}{2} = \frac{1}{T}\int_a^b f(t)\,dt = \langle f \rangle_{[a,b]} \quad \text{(average of } f\text{)} \]
\[ a_n = \frac{2}{T}\int_a^b f(t)\cos(\omega n t)\,dt, \qquad b_n = \frac{2}{T}\int_a^b f(t)\sin(\omega n t)\,dt \qquad \forall n = 1, 2, \dots \]
In this case, the Fourier polynomials are $T$-periodic functions.
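Exercise 5.2

The family $(e_k : [-\pi, \pi] \to \mathbb{C})_{k \in \mathbb{Z}}$ of non-normalized exponentials $(e_k(t) := e^{ikt})_{k \in \mathbb{Z}}$ is a Hilbert basis of $L^2[-\pi, \pi]$ if this space is equipped with the inner product defined by $\langle f, g \rangle_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$.

1) Write the Fourier series associated with the function $\varphi : [-\pi, \pi] \to \mathbb{C}$, $t \mapsto \cos(3t) - \sin(5t)$.

2) Take $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$, and let $(\psi_k : \mathbb{R} \to \mathbb{R})_{k \in \mathbb{N}^*}$ be the family defined by $\psi_k(t) = \sin(kt)$.

a) Consider $f \in L^2[0, \pi]$ such that $\int_0^{\pi} f(t)\psi_k(t)\,dt = 0$ $\forall k \in \mathbb{N}^*$ and also
\[ g(t) = \begin{cases} f(t) & \text{if } 0 \le t < \pi \\ -f(-t) & \text{if } -\pi < t < 0 \end{cases} \]
Prove that $\int_{-\pi}^{\pi} g(t)e^{-ikt}\,dt = 0$ $\forall k \in \mathbb{N}^*$.

b) Prove that $(\psi_k)_{k \in \mathbb{N}^*}$ is a complete system in $L^2[0, \pi]$ equipped with the inner product $\langle f, g \rangle = \int_0^{\pi} f(x)\overline{g(x)}\,dx$, that is, a (not necessarily orthonormal) family of elements in $L^2[0, \pi]$ such that:
\[ \overline{\mathrm{span}}\big((\psi_k)_{k \in \mathbb{N}^*}\big) = L^2[0, \pi] \iff \big(\mathrm{span}((\psi_k)_{k \in \mathbb{N}^*})\big)^{\perp} = \{0_{L^2[0,\pi]}\} \iff \big[ f \in L^2[0, \pi],\ \langle f, \psi_k \rangle = 0\ \forall k \in \mathbb{N}^* \implies f \equiv 0_{L^2[0,\pi]} \big] \]

3) Construct a Hilbert basis of $L^2[0, \pi]$ from the family $(\psi_k)_{k \in \mathbb{N}^*}$.

4) Use the result obtained above to determine a sequence of real coefficients $(a_k)_{k \in \mathbb{N}^*}$ such that $\sum_{k=1}^{+\infty} a_k\psi_k = 1$ (equality in the sense of $L^2[0, \pi]$).

5) Using Plancherel’s identity, prove that the following formula is valid:
\[ \sum_{k=0}^{+\infty} \frac{1}{(2k+1)^2} = \frac{\pi^2}{8} \]

Solution to Exercise 5.2

1) We can rewrite $\varphi$ as:
\[ \varphi(t) = \frac{1}{2}\left( e^{3it} + e^{-3it} \right) - \frac{1}{2i}\left( e^{5it} - e^{-5it} \right) \]
that is, $\varphi = \frac{1}{2}(e_3 + e_{-3}) - \frac{1}{2i}(e_5 - e_{-5})$, which, with the equality understood in the sense of $L^2[-\pi, \pi]$, is the Fourier series of the function $\varphi$ by the uniqueness of the decomposition.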
2) We shall consider these two points separately.

a) By direct calculation:
\[ \int_{-\pi}^{\pi} g(t)e^{-ikt}\,dt = \int_{-\pi}^{0} -f(-t)e^{-ikt}\,dt + \int_{0}^{\pi} f(t)e^{-ikt}\,dt \]
If we change the variable in the first integral as follows, $s = -t$, $ds = -dt$, we obtain:
\[ \int_{-\pi}^{0} -f(-t)e^{-ikt}\,dt = \int_{\pi}^{0} f(s)e^{iks}\,ds = \int_{0}^{\pi} -f(s)e^{iks}\,ds = \int_{0}^{\pi} -f(t)e^{ikt}\,dt \]
and thus:
\[ \int_{-\pi}^{\pi} g(t)e^{-ikt}\,dt = \int_{0}^{\pi} -f(t)e^{ikt}\,dt + \int_{0}^{\pi} f(t)e^{-ikt}\,dt = \int_{0}^{\pi} f(t)\left( e^{-ikt} - e^{ikt} \right) dt \]
By using Euler’s formula for the sine we have:
\[ \int_{-\pi}^{\pi} g(t)e^{-ikt}\,dt = -2i\int_{0}^{\pi} f(t)\sin(kt)\,dt = -2i\int_{0}^{\pi} f(t)\psi_k(t)\,dt = 0 \]
by the definition of the functions $\psi_k$ and the hypothesis on $f$.

b) The function $f$ defined in 2(a) is, by hypothesis, orthogonal to all the elements $(\psi_k)_{k \in \mathbb{N}^*}$, so, to verify that $(\psi_k)_{k \in \mathbb{N}^*}$ is a complete system for $L^2[0, \pi]$, we simply have to prove that $f = 0_{L^2[0,\pi]}$. To do that, we use the fact that, by definition, $g|_{[0,\pi)} = f$; thus, if we show that $g = 0_{L^2[-\pi,\pi]}$, then, necessarily, $f = 0_{L^2[0,\pi]}$.

Thanks to what was shown previously, $\langle g, e_k \rangle_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(t)e^{-ikt}\,dt = 0$ $\forall k \in \mathbb{N}^*$. If this also holds for $k = 0$ and for $-k$, with $k \in \mathbb{N}^*$, then $g$ is orthogonal to all the elements of the Hilbert basis $(e_k)_{k \in \mathbb{Z}}$ of $L^2[-\pi, \pi]$, which implies $g = 0_{L^2[-\pi,\pi]}$, thanks to theorem 5.11. To summarize, the only properties that we have to verify are $\langle g, e_0 \rangle_0 = 0$ and $\langle g, e_{-k} \rangle_0 = 0$ for all $k \in \mathbb{N}^*$. Both follow from the fact that $g$ is odd by construction, that is, $g(-t) = -g(t)$ for almost every $t \in (-\pi, \pi)$. For $k = 0$, the change of variable $s = -t$ in the first integral gives:
\[ \langle g, e_0 \rangle_0 = \frac{1}{2\pi}\left[ \int_{-\pi}^{0} -f(-t)\,dt + \int_{0}^{\pi} f(t)\,dt \right] = \frac{1}{2\pi}\left[ -\int_{0}^{\pi} f(s)\,ds + \int_{0}^{\pi} f(t)\,dt \right] = 0 \]
For $-k$, with $k \in \mathbb{N}^*$, the same change of variable gives:
\[ \langle g, e_{-k} \rangle_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(t)e^{ikt}\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(-s)e^{-iks}\,ds = -\frac{1}{2\pi}\int_{-\pi}^{\pi} g(s)e^{-iks}\,ds = -\langle g, e_k \rangle_0 = 0 \quad \forall k \in \mathbb{N}^* \]

3) The fact that $(\psi_k)_{k \in \mathbb{N}^*}$ is a complete system in $L^2[0, \pi]$ means that we can obtain a Hilbert basis for the same space simply by examining the orthonormality properties of this system. For all $n, m \in \mathbb{N}^*$, since $t \mapsto \sin(nt)\sin(mt)$ is even:
\[ \langle \psi_n, \psi_m \rangle = \int_{0}^{\pi} \sin(nt)\sin(mt)\,dt \]
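\[ = \frac{1}{2}\int_{-\pi}^{\pi} \sin(nt)\sin(mt)\,dt = \frac{1}{2}\int_{-\pi}^{\pi} \frac{e^{int} - e^{-int}}{2i}\,\frac{e^{imt} - e^{-imt}}{2i}\,dt \]
\[ = -\frac{1}{8}\int_{-\pi}^{\pi} \left( e^{int} - e^{-int} \right)\left( e^{imt} - e^{-imt} \right) dt = -\frac{1}{8}\int_{-\pi}^{\pi} \left( e^{int} - e^{-int} \right)\overline{\left( e^{-imt} - e^{imt} \right)}\,dt \]
\[ = -\frac{2\pi}{8}\langle e_n - e_{-n}, e_{-m} - e_m \rangle_0 = -\frac{\pi}{4}\big( \langle e_n, e_{-m} \rangle_0 - \langle e_n, e_m \rangle_0 - \langle e_{-n}, e_{-m} \rangle_0 + \langle e_{-n}, e_m \rangle_0 \big) \]
\[ = \begin{cases} 0 & \text{if } n \neq m \\ -\frac{\pi}{4}(-1 - 1) = \frac{\pi}{2} & \text{if } n = m \end{cases} \]
Thus $\|\psi_n\| = \sqrt{\frac{\pi}{2}}$ $\forall n \in \mathbb{N}^*$ and so $\left( \sqrt{\frac{2}{\pi}}\,\psi_n \right)_{n \in \mathbb{N}^*}$ is a Hilbert basis of $L^2[0, \pi]$.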
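4) Let us interpret 1 as the constant function $1 \in L^2[0, \pi]$, $1(t) = 1$ $\forall t \in [0, \pi]$, which we shall decompose on the Hilbert basis $\left( \sqrt{\frac{2}{\pi}}\,\psi_n \right)_{n \in \mathbb{N}^*}$ of $L^2[0, \pi]$ determined above:
\[ 1 = \sum_{k=1}^{+\infty} \left\langle 1, \sqrt{\frac{2}{\pi}}\,\psi_k \right\rangle \sqrt{\frac{2}{\pi}}\,\psi_k = \sum_{k=1}^{+\infty} \frac{2}{\pi}\langle 1, \psi_k \rangle\,\psi_k \]
showing us that $1 = \sum_{k=1}^{+\infty} a_k\psi_k$, with:
\[ a_k = \frac{2}{\pi}\langle 1, \psi_k \rangle = \frac{2}{\pi}\int_{0}^{\pi} \sin(kt)\,dt = \frac{2}{\pi k}\big[ -\cos(kt) \big]_0^{\pi} = \frac{2}{\pi k}\left[ 1 - (-1)^k \right] \]
that is, the sequence we wanted to find is:
\[ a_k = \begin{cases} 0 & k \text{ even} \\ \frac{4}{\pi k} & k \text{ odd} \end{cases} \]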
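5) Plancherel’s identity for 1 gives us:
\[ \|1\|^2 = \sum_{k=1}^{+\infty} \left| \left\langle 1, \sqrt{\frac{2}{\pi}}\,\psi_k \right\rangle \right|^2 \]
Moreover, $\|1\|^2 = \int_0^{\pi} 1\,dt = \pi$ and $\left\langle 1, \sqrt{\frac{2}{\pi}}\,\psi_k \right\rangle = \sqrt{\frac{\pi}{2}}\,a_k$, hence:
\[ \pi = \sum_{k=0}^{+\infty} \frac{\pi}{2}|a_{2k+1}|^2 = \sum_{k=0}^{+\infty} \frac{\pi}{2}\left( \frac{4}{\pi} \right)^2 \frac{1}{(2k+1)^2} \iff \sum_{k=0}^{+\infty} \frac{1}{(2k+1)^2} = \frac{\pi^2}{8} \]
□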
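The results of points 4 and 5 can be checked numerically. The sketch below is only a sanity check under stated assumptions: the helper names and the truncation levels are choices made for this example, and truncated sums only approximate the limits.

```python
import math

def a(k):
    """Closed form a_k = (2 / (pi k)) * (1 - (-1)^k) from point 4."""
    return 2.0 * (1 - (-1) ** k) / (math.pi * k)

def sine_series(t, terms):
    """Partial sum of sum_k a_k sin(k t), which should approximate 1 on (0, pi)."""
    return sum(a(k) * math.sin(k * t) for k in range(1, terms + 1))

# Partial sum of the numerical series from point 5.
odd_square_sum = sum(1.0 / (2 * k + 1) ** 2 for k in range(200000))

print(sine_series(math.pi / 2, 2001))      # close to 1
print(odd_square_sum, math.pi ** 2 / 8)    # the two values nearly agree
```

The convergence at $t = \pi/2$ is slow (the coefficients only decay like $1/k$), which is consistent with the constant function 1 not vanishing at the endpoints of $[0, \pi]$.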
5.6.5. Pointwise convergence of the real Fourier series: Dirichlet’s theorem
Fourier series were initially met with skepticism by the mathematical community. The idea that series of trigonometric (hence infinitely differentiable) functions could be used to approximate non-differentiable or, worse, discontinuous functions was considered absurd by many. Furthermore, Fourier did not provide rigorous convergence results for the series that bears his name. In fact, the theorems that we saw earlier concerning convergence in norm were obtained at a later stage by other mathematicians; furthermore, they are not sufficient to guarantee the pointwise convergence of the series.

The first conditions for pointwise convergence of the Fourier series were identified by Dirichlet⁶ (b. 1805, Düren; d. 1859, Göttingen) in 1829. Dirichlet’s constructive proof is of crucial importance in Fourier analysis; readers who wish to explore the subject further may consult Vretblad (2003). For the purposes of this book, we shall simply provide a rigorous statement of Dirichlet’s theorem, introducing the associated notation and terminology.

If $t_0$ is a point of discontinuity of a real-valued function $f$ of one real variable, then the right and left limits are written as:
\[ f(t_0^+) = \lim_{t \to t_0^+} f(t), \qquad f(t_0^-) = \lim_{t \to t_0^-} f(t) \]
DEFINITION 5.15 (Dirichlet function).– Let $f : \mathbb{R} \to \mathbb{R}$. $f$ is a Dirichlet function if it verifies the following conditions:
1) $f$ is $T$-periodic, $T \in \mathbb{R}^+$;
2) $f$ is piecewise continuous, that is, there is only a finite number of points (in each period) at which $f$ is not continuous;
3) for all $t_0 \in \mathbb{R}$:
\[ f(t_0) = \frac{f(t_0^+) + f(t_0^-)}{2}, \qquad [5.7] \]
that is, at any point $t_0 \in \mathbb{R}$, the value of $f$ in $t_0$ is the average of the right and left limits of $f$ in $t_0$.

⁶ Remarkably, the “modern” definition of a function, as a univocal correspondence between two sets, was established by Dirichlet as part of his efforts to prove the pointwise convergence of the Fourier series.

Condition [5.7] is of course satisfied at any point where $f$ is continuous; however, it is a non-trivial requirement at a point of discontinuity.

DEFINITION 5.16 (Generalized derivative).– Let $f$ be a Dirichlet function and take $t_0 \in \mathbb{R}$. $f$ is said to possess a generalized derivative on the right in $t_0$ if the following (finite) limit exists:
\[ \lim_{h \to 0^+} \frac{f(t_0 + h) - f(t_0^+)}{h} \]
In the same way, $f$ is said to possess a generalized derivative on the left in $t_0$ if the following (finite) limit exists:
\[ \lim_{h \to 0^-} \frac{f(t_0 + h) - f(t_0^-)}{h} \]
These elements are necessary in stating Dirichlet’s theorem.

THEOREM 5.17 (Dirichlet’s theorem, 1829).– Let $f$ be a Dirichlet function and take $t_0 \in \mathbb{R}$. If the function $f$ possesses generalized derivatives on the right and left at point $t_0$, then the real Fourier series of $f$ evaluated in $t_0$ converges to $f(t_0)$.

The conditions of this theorem are known as the Dirichlet conditions; they are sufficient, but not necessary, for the pointwise convergence of the real Fourier series. Conditions which are both necessary and sufficient for the pointwise convergence of the Fourier series have yet to be identified. Nevertheless, and thankfully, the Dirichlet conditions are verified for the vast majority of functions encountered in practical applications.

Note that, if we ignore the requirement [5.7], then the Fourier series evaluated in $t_0$ still converges to the average $\frac{f(t_0^+) + f(t_0^-)}{2}$, which may then differ from $f(t_0)$.
One final remark concerning the possible consequences of a lack of regularity in $f$: in 1923, the great Russian mathematician Kolmogorov (b. 1903, Tambov; d. 1987, Moscow) succeeded in building an integrable function whose Fourier series diverges almost everywhere; in 1926, he refined the construction to obtain a Fourier series which diverges at all points.

5.6.6. The Gibbs phenomenon and Cesàro summation

Dirichlet’s theorem does not imply that the behavior of the Fourier series in the neighborhood of a discontinuity of a function will be “regular”; in fact, as we approach a jump discontinuity, oscillations, known as Gibbs oscillations, begin to appear, and remain present even when the number of Fourier coefficients is increased. If a function $f$ is a Dirichlet function, then the oscillations to the left and right of the discontinuity cancel out, and their average coincides with the value of $f$ at the jump.

The maximal difference between the value of the function $f$ and the value of the trigonometric polynomial $S_N$ in an arbitrarily small neighborhood of a jump discontinuity does not vanish when $N \to +\infty$: it can be shown to approach about 18% of half the height of the jump (that is, about 9% of the full jump). The analysis of the Gibbs phenomenon involves mathematical subtleties which lie outside the scope of this book. For a more detailed exploration of the Gibbs phenomenon, readers may wish to consult Vretblad (2003). Figure 5.2 shows the Gibbs effect for a rectangular pulse function.

Gibbs oscillations can be eliminated by considering a Cesàro (b. 1859, Naples; d. 1906, Torre Annunziata) summation in place of the usual summation; in this case, arithmetic averages of the partial sums are used to “smooth out” oscillations.

5.6.7. Speed of convergence to 0 of Fourier coefficients

We begin with a general result.

LEMMA 5.3 (Riemann-Lebesgue lemma).– Taking $f \in L^1[a, b]$, then:
\[ \lim_{n \to +\infty} \int_a^b f(t)\cos(nt)\,dt = \lim_{n \to +\infty} \int_a^b f(t)\sin(nt)\,dt = \lim_{n \to +\infty} \int_a^b f(t)e^{int}\,dt = 0 \]

The geometric interpretation of the Riemann-Lebesgue lemma is that the function $f(t)\cos(nt)$ or $f(t)\sin(nt)$ oscillates at such a high frequency when $n \to +\infty$ that the values around the average cancel out, and thus the integral converges to 0.

An immediate corollary of this lemma is that the Fourier coefficients of a function $f \in L^1[a, b]$ (and, of course, $(b - a)$-periodic) decay toward 0 when $n \to +\infty$.
Figure 5.2. Gibbs phenomenon for the rectangular pulse function (courtesy of Éric Luçon)
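The Gibbs overshoot and its removal by Cesàro summation can be observed numerically. The following is a minimal sketch, assuming the square wave $f(t) = \mathrm{sign}(t)$ on $(-\pi, \pi)$ (a jump of height 2 at $t = 0$) as test function; its sine coefficients $b_k = \frac{4}{\pi k}$ for odd $k$, the helper names and the scanning grid are assumptions of this example.

```python
import math

def b(k):
    """Sine coefficients of the square wave sign(t) on (-pi, pi)."""
    return 4.0 / (math.pi * k) if k % 2 == 1 else 0.0

def partial_sum(t, N):
    """Ordinary partial sum S_N(t)."""
    return sum(b(k) * math.sin(k * t) for k in range(1, N + 1))

def cesaro_mean(t, N):
    """Arithmetic mean of S_0, ..., S_N: harmonic k is weighted by 1 - k/(N+1)."""
    return sum((1 - k / (N + 1)) * b(k) * math.sin(k * t)
               for k in range(1, N + 1))

def max_right_of_jump(g, N, points=2000):
    """Largest value of g(., N) on a grid of (0, pi/2], to the right of the jump."""
    return max(g(math.pi / 2 * j / points, N) for j in range(1, points + 1))

for N in (25, 50, 100):
    print(N, max_right_of_jump(partial_sum, N), max_right_of_jump(cesaro_mean, N))
```

In this sketch the maximum of the ordinary partial sums stays near 1.18 whatever the value of `N`, whereas the Cesàro means never exceed the value 1 of the function.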
Theorem 5.18 shows that the regularity of $f$ has an important effect on the speed of decay of Fourier coefficients.

THEOREM 5.18.– Let $f : \mathbb{R} \to \mathbb{R}$ be a function that:
– is of class $C^p([a, b])$, that is, $f$ is differentiable $p$ times on $[a, b]$ with $p$ continuous derivatives;
– is $(b - a)$-periodic;
– possesses equal generalized derivatives at the extrema of the interval $[a, b]$.
Then, the Fourier coefficients of $f$, $a_n$, $b_n$, $n = 1, 2, \dots$ verify:
\[ a_n, b_n = o\left( \frac{1}{n^p} \right), \]
that is, they decay toward 0 faster than $\frac{1}{n^p}$.
This result is very important, as it tells us that if $f$ is “smooth”, then it can be approximated in a precise manner even with a small number of Fourier coefficients. However, if $f$ is not sufficiently smooth, then the convergence to 0 of the Fourier coefficients of $f$ is slow, and a large number of these coefficients is required in order to obtain a good approximation of $f$. The converse is also true under some suitable hypotheses, which space does not permit us to describe here. The most important concept to grasp is that the faster the Fourier coefficients of a function converge to 0, the smoother the function is.

PROOF.– Let us consider the coefficients $a_n$; the proof is identical for the coefficients $b_n$. We can develop our proof, without losing generality, by considering $b = \pi$, $a = -\pi$; in fact, it is always possible to reduce the analysis to these values thanks to the following linear change of variable:
\[ s(t) = \frac{b + a}{2} + \frac{b - a}{2\pi}\,t \]
which satisfies $s(\pi) = b$ and $s(-\pi) = a$. Using this convention, the expression of $a_n$, $n = 1, 2, \dots$ is integrated by parts, with $u = f(t)$ and $dv = \cos(nt)\,dt$, hence $du = f'(t)\,dt$ and $v = \frac{1}{n}\sin(nt)$. We obtain:
\[ a_n = \frac{1}{\pi n}\big[ f(t)\sin(nt) \big]_{-\pi}^{\pi} - \frac{1}{\pi n}\int_{-\pi}^{\pi} f'(t)\sin(nt)\,dt = \frac{1}{\pi n}\int_{-\pi}^{\pi} f'(t)\cos\left( \frac{\pi}{2} + nt \right) dt \]
since $\sin(n\pi) = \sin(-n\pi) = 0$ and $\cos\left( \frac{\pi}{2} + \alpha \right) = -\sin(\alpha)$ $\forall \alpha \in \mathbb{R}$.
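After a second integration by parts, we obtain:
\[ a_n = \frac{1}{\pi n}\left\{ \frac{1}{n}\left[ f'(t)\sin\left( \frac{\pi}{2} + nt \right) \right]_{-\pi}^{\pi} - \frac{1}{n}\int_{-\pi}^{\pi} f''(t)\sin\left( \frac{\pi}{2} + nt \right) dt \right\} \]
Since $f'(-\pi) = f'(\pi)$ by hypothesis, the first bracketed term is zero; indeed:
\[ \left[ f'(t)\sin\left( \frac{\pi}{2} + nt \right) \right]_{-\pi}^{\pi} = f'(\pi)\sin\left( \frac{\pi}{2} + n\pi \right) - f'(-\pi)\sin\left( \frac{\pi}{2} - n\pi \right) \]
\[ = f'(\pi)\left[ \sin\left( \frac{\pi}{2} + n\pi \right) - \sin\left( \frac{\pi}{2} - n\pi \right) \right] = f'(\pi)\left[ \sin\left( \frac{\pi}{2} + n\pi \right) - \sin\left( \frac{\pi}{2} - n\pi + 2n\pi \right) \right] \]
\[ = f'(\pi)\left[ \sin\left( \frac{\pi}{2} + n\pi \right) - \sin\left( \frac{\pi}{2} + n\pi \right) \right] = 0 \]
Furthermore, the second term in brackets can be rewritten as:
\[ -\frac{1}{n}\int_{-\pi}^{\pi} f''(t)\sin\left( \frac{\pi}{2} + nt \right) dt = \frac{1}{n}\int_{-\pi}^{\pi} f''(t)\cos\left( \frac{\pi}{2} + \frac{\pi}{2} + nt \right) dt = \frac{1}{n}\int_{-\pi}^{\pi} f''(t)\cos\left( \frac{\pi}{2}\cdot 2 + nt \right) dt \]
Hence:
\[ a_n = \frac{1}{\pi n^2}\int_{-\pi}^{\pi} f''(t)\cos\left( \frac{\pi}{2}\cdot 2 + nt \right) dt \]
In short, one integration by parts of $a_n$ gives us the expression:
\[ a_n = \frac{1}{\pi n}\int_{-\pi}^{\pi} f'(t)\cos\left( \frac{\pi}{2} + nt \right) dt \]
With two integrations by parts of $a_n$, we have:
\[ a_n = \frac{1}{\pi n^2}\int_{-\pi}^{\pi} f''(t)\cos\left( \frac{\pi}{2}\cdot 2 + nt \right) dt \]
With $p$ integrations by parts of $a_n$, we have:
\[ a_n = \frac{1}{\pi n^p}\int_{-\pi}^{\pi} f^{(p)}(t)\cos\left( \frac{\pi}{2}\cdot p + nt \right) dt \]
Similarly, we obtain:
\[ b_n = \frac{1}{\pi n^p}\int_{-\pi}^{\pi} f^{(p)}(t)\sin\left( \frac{\pi}{2}\cdot p + nt \right) dt \]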
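We now see that, by using the trigonometric identities $\cos\left( \frac{\pi}{2} + \alpha \right) = -\sin(\alpha)$ and $\sin\left( \frac{\pi}{2} + \alpha \right) = \cos(\alpha)$, $\forall \alpha \in \mathbb{R}$, the integrals:
\[ \varepsilon_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f^{(p)}(t)\cos\left( \frac{\pi}{2}\cdot p + nt \right) dt, \qquad \tilde{\varepsilon}_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f^{(p)}(t)\sin\left( \frac{\pi}{2}\cdot p + nt \right) dt \]
are, by definition, the Fourier coefficients of the function $f^{(p)}$ to within a sign. By hypothesis, $f^{(p)}$ is continuous on $[-\pi, \pi]$ and thus, as the domain $[-\pi, \pi]$ is compact, $f^{(p)} \in L^1[-\pi, \pi]$; hence, by the Riemann-Lebesgue lemma, its Fourier coefficients converge to 0 when $n \to +\infty$. Thus $\varepsilon_n \to 0$ and $\tilde{\varepsilon}_n \to 0$ as $n \to +\infty$, which means that:
\[ n^p a_n = \varepsilon_n \xrightarrow[n \to +\infty]{} 0, \qquad n^p b_n = \tilde{\varepsilon}_n \xrightarrow[n \to +\infty]{} 0 \]
that is, $a_n, b_n = o\left( \frac{1}{n^p} \right)$. □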
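The link between smoothness and decay can be illustrated numerically. In the sketch below, the two test functions, the helper name and the midpoint quadrature are assumptions of this example: $(\pi^2 - t^2)^2$ has a periodic extension of class $C^2$ (its values and its first two derivatives agree at $\pm\pi$), whereas the sawtooth $t$ has a jump at $\pm\pi$.

```python
import math

def midpoint_coefficient(f, trig, n, samples=20000):
    """(1/pi) * integral over [-pi, pi] of f(t) * trig(n t) dt (midpoint rule)."""
    h = 2 * math.pi / samples
    total = 0.0
    for j in range(samples):
        t = -math.pi + (j + 0.5) * h
        total += f(t) * trig(n * t)
    return h / math.pi * total

def smooth(t):
    # Periodic extension is C^2; only the third derivative jumps at +-pi.
    return (math.pi ** 2 - t ** 2) ** 2

def sawtooth(t):
    # Periodic extension is discontinuous at odd multiples of pi.
    return t

for n in (4, 8, 16, 32):
    a_n = midpoint_coefficient(smooth, math.cos, n)
    b_n = midpoint_coefficient(sawtooth, math.sin, n)
    # n^2 |a_n| shrinks with n, while n |b_n| stays close to 2.
    print(n, n ** 2 * abs(a_n), n * abs(b_n))
```

For the smooth function, the printed values of $n^2 |a_n|$ shrink as $n$ grows, as the theorem predicts for $p = 2$; for the sawtooth, $n |b_n|$ does not tend to 0, so its coefficients decay no faster than $\frac{1}{n}$.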
This result was used by Krylov (1863–1945) as the foundation of his method for improving the convergence of Fourier series for jump-discontinuous functions.

5.6.8. Fourier transform in $L^2(\mathbb{T})$ and shift

Now, let us analyze the relationship between shift and the Fourier transform for a function $f \in L^2(\mathbb{T})$. The result is qualitatively identical to that which we obtained for the DFT in section 2.7.2.

THEOREM 5.19 (Fourier transform and shift).– Taking $f \in L^2(\mathbb{T})$, then:
1) if $g_a(x) = f(x - a)$, $a \in \mathbb{R}$, then: $\hat{g}_a(n) = e^{-ina}\hat{f}(n)$, $\forall n \in \mathbb{Z}$;
2) if $g_k(x) = e^{ikx}f(x)$, $k \in \mathbb{Z}$, then: $\hat{g}_k(n) = \hat{f}(n - k)$, $\forall n \in \mathbb{Z}$.

PROOF.– Only the proof for 1 is shown here, as the proof for 2 is analogous. The proof consists of a direct calculation in which we make use of the shift-invariance of the Lebesgue measure and of the $2\pi$-periodicity of the integrand:
\[ \hat{g}_a(n) = \langle g_a, u_n \rangle = \int_0^{2\pi} f(x - a)\frac{e^{-inx}}{\sqrt{2\pi}}\,dx = \int_{-a}^{2\pi - a} f(x)\frac{e^{-in(x + a)}}{\sqrt{2\pi}}\,d(x + a) = e^{-ina}\int_0^{2\pi} f(x)\frac{e^{-inx}}{\sqrt{2\pi}}\,dx = e^{-ina}\hat{f}(n). \]
□

DEFINITION 5.17.– The set $\{ |\hat{f}(n)|, n \in \mathbb{Z} \}$ is the spectrum (amplitude spectrum) of $f \in L^2(\mathbb{T})$.
$|\hat{f}(n)|$ represents the weight (importance) of the harmonic of frequency $n$, that is, $e^{inx}$, in reconstructing $f$, as can be seen in the formula $f = \sum_{n \in \mathbb{Z}} \hat{f}(n)\frac{e^{inx}}{\sqrt{2\pi}}$.
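The shift property of theorem 5.19 and its effect on the amplitude spectrum can be checked numerically. This is a sketch under assumptions: the test function, the shift value and the rectangle-rule quadrature are choices made for this example only.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=2048):
    """hat f(n) = integral over [0, 2 pi] of f(x) e^{-inx} / sqrt(2 pi) dx,
    approximated with the rectangle rule (accurate for smooth periodic f)."""
    h = 2 * math.pi / samples
    total = sum(f(j * h) * cmath.exp(-1j * n * j * h) for j in range(samples))
    return h / math.sqrt(2 * math.pi) * total

def f(x):
    # A smooth 2*pi-periodic test function (an assumption of this example).
    return math.exp(math.cos(x)) + math.sin(2 * x)

a = 0.9   # shift

def g(x):
    return f(x - a)

for n in (-2, 0, 1, 3):
    lhs = fourier_coefficient(g, n)
    rhs = cmath.exp(-1j * n * a) * fourier_coefficient(f, n)
    # First gap: shift formula; second gap: equality of the amplitudes.
    print(n, abs(lhs - rhs), abs(abs(lhs) - abs(fourier_coefficient(f, n))))
```

Both printed gaps are at the level of rounding errors: the shift only changes the phase of each coefficient, not its modulus.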
The property which we have just proved shows that the spectrum of $f$ gives us information concerning the presence of certain frequencies in $f$; however, it tells us nothing about their “position”: the shifted signal $g_a(x) = f(x - a)$ has the same spectrum as $f$, since $|\hat{g}_a(n)| = |\hat{f}(n)|$. Localized information concerning frequency and position can be obtained in the context of wavelet theory.

5.7. Summary

In this chapter, we extended some structural properties of finite-dimensional inner product spaces to infinite-dimensional Hilbert spaces. The orthogonal complement to a subset or vector subspace of a Hilbert space plays an important role in this extension.

The theorem of projection onto a closed convex subset of a Hilbert space is essential for extending the geometric structure of finite-dimensional Euclidean spaces to infinite dimensions. The proof of this theorem draws on the parallelogram law, for which a Hilbert norm is required; hence, the theorem is only valid in Hilbert spaces.

When the closed convex subset from the previous theorem is also a vector subspace, then the difference between the original vector and its projection belongs to the orthogonal complement of the subspace, as it does in finite dimensions; this property allows us to extend the orthogonal projection theorem to infinite-dimensional Hilbert spaces. The orthogonal projection theorem is used to produce an extremely useful characterization of closed vector subspaces in Hilbert spaces, as those which coincide with their biorthogonal complement.

We examined orthonormal systems in separable Hilbert spaces, that is, those which possess at least one countable dense subset. An orthonormal system of a separable Hilbert space is countable. All of the Hilbert spaces discussed here are implicitly considered to be separable unless otherwise stated.
In order for an orthonormal system $(u_n)_{n \in \mathbb{N}}$ to be the generalization of an orthonormal basis to an infinite-dimensional Hilbert space $H$, we must first guarantee that, for all $x \in H$, the sequence of Fourier coefficients $(\hat{x}(n) = \langle x, u_n \rangle)_{n \in \mathbb{N}}$ decays toward 0; otherwise, the expansion $\sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n$ would not converge. Bessel’s inequality ensures that this is the case, due to the fact that the sequence of Fourier coefficients with respect to any orthonormal system of a Hilbert space belongs to $\ell^2$. Bessel’s inequality also tells us that Plancherel’s identity is not necessarily verified for any orthonormal system, as, in general, it holds that $\sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2 \le \|x\|^2$.

The Fischer-Riesz theorem states that Plancherel’s identity holds when the series $\sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n$ converges to $x$; using a counter-example, we showed that this is not the case for an arbitrary orthonormal system. It turns out that $\sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n$ is the expansion of $x$ when the orthonormal system $(u_n)_{n \in \mathbb{N}}$ is complete, that is, it is not a proper part of another orthonormal system in $H$. Complete orthonormal systems are also known as Hilbert bases.

A Hilbert basis $(u_n)_{n \in \mathbb{N}}$ can be characterized using five equivalent conditions: the fact that the zero vector is the only vector which is orthogonal to all elements in a Hilbert basis, the fact that the subspace generated by the Hilbert basis is dense in $H$, the ability to expand into a generalized Fourier series, Parseval’s identity and Plancherel’s identity.

An isomorphism of Hilbert spaces is a surjective transformation which preserves the inner product. We saw that the preservation of inner products implies isometry, and thus injectivity; furthermore, the combination of surjectivity and conservation of the inner product implies linearity.

All separable Hilbert spaces on the same field are isomorphic to one another; the prototype of an infinite-dimensional, separable Hilbert space on the field $\mathbb{K}$ is $\ell^2(\mathbb{N}, \mathbb{K})$. This result is the extension, to infinite dimensions, of the fact that $\mathbb{K}^n$ is the prototype of all vector spaces of finite dimension $n$ on $\mathbb{K}$.

The classic Fourier series and transform on spaces $L^2([a, b])$ are defined as a special case of the theory developed earlier; their specificity lies in the choice of a Hilbert basis given by complex exponentials, or by cosines and sines (plus a constant function). This also holds for functions defined on $\mathbb{R}$, as long as they are periodic.

As in the case of sequences in $\ell^2(\mathbb{Z}_N)$, the Fourier spectrum (the set of magnitudes of the Fourier coefficients) of the $L^2$ functions described earlier is shift-invariant, which makes it necessary to extend Fourier theory in order to provide a localized frequency analysis. Wavelet theory responded to this need.
6 Bounded Linear Operators in Hilbert Spaces
A function $A : V \to W$, with $V$ and $W$ normed vector spaces on the same field $\mathbb{K}$, is known as a linear operator between $V$ and $W$ if:
\[ A(\alpha x + \beta y) = \alpha A(x) + \beta A(y), \quad \forall \alpha, \beta \in \mathbb{K},\ \forall x, y \in V \]
To simplify the notation, the parentheses may be omitted in later occurrences, writing $Ax$ in place of $A(x)$. $V$ is the domain of $A$; the set:
\[ \mathrm{Im}(A) = A(V) = \{ y \in W : \exists x \in V : y = Ax \} \subseteq W \]
is the codomain or image of $A$, and $W$ is the destination set of $A$.

Basic examples are shown below.

1) The identity operator: $\mathrm{id} : V \to V$, $\mathrm{id}(x) = x$ $\forall x \in V$, and the null operator: $0 : V \to V$, $0(x) = 0_V$ $\forall x \in V$;

2) The differential operator: this is defined on a space of differentiable functions which may change according to the particular application we are interested in. As a concrete example, consider the first-order differential operator: $D_1 f(t) = \frac{df}{dt}(t) = f'(t)$. The set $\mathrm{dom}(D_1) = \{ f \in L^2[a, b] \cap C^1[a, b] : f' \in L^2[a, b] \}$, where $a < b$ are real constants, could be a perfectly valid domain for $D_1$. Then:
\[ D_1 : \mathrm{dom}(D_1) \subset L^2[a, b] \longrightarrow L^2[a, b], \qquad f \longmapsto D_1 f \]
Similarly, the operator
\[ D_n f(t) = \frac{d^n f}{dt^n}(t) = f^{(n)}(t) \]
can be defined on the domain $\mathrm{dom}(D_n) = \{ f \in L^2[a, b] \cap C^n[a, b] : f^{(n)} \in L^2[a, b] \}$, where $a < b$ are real constants, that is:
\[ D_n : \mathrm{dom}(D_n) \subset L^2[a, b] \longrightarrow L^2[a, b], \qquad f \longmapsto D_n f \]
Partial differential operators are defined in a similar way;

3) The integral operator: this operator is typically defined by considering a kernel function $k(s, t)$, $k \in L^2([a, b] \times [a, b])$, where $a < b$ are real constants. The integration operator with kernel $k$ is:
\[ T_k : L^2[a, b] \longrightarrow L^2[a, b], \qquad f \longmapsto T_k f, \quad \text{where } T_k f(s) = \int_a^b k(s, t)f(t)\,dt \]

4) Linear operators in finite dimensions. Let $A : \mathbb{K}^n \to \mathbb{K}^n$ be a linear operator and let $(u_1, \dots, u_n)$ be an orthonormal basis in $\mathbb{K}^n$. Any $x \in \mathbb{K}^n$ can be written as $x = \sum_{j=1}^{n} \lambda_j u_j$, with $\lambda_j \in \mathbb{K}$ $\forall j$ and, by linearity, $Ax = \sum_{j=1}^{n} \lambda_j Au_j$. Then:
\[ \langle Ax, u_i \rangle = \sum_{j=1}^{n} \lambda_j \langle Au_j, u_i \rangle = \sum_{j=1}^{n} \alpha_{ij}\lambda_j, \quad \forall i = 1, \dots, n \qquad [6.1] \]
where $\alpha_{ij} = \langle Au_j, u_i \rangle$. This shows that the action of $A$ is entirely determined by the matrix of elements $(\alpha_{ij})_{i,j=1,\dots,n}$ and vice versa: for any matrix with elements $(\alpha_{ij})_{i,j=1,\dots,n}$, formula [6.1] can be used to define a linear operator on $\mathbb{K}^n$.

This last example highlights the well-known relationship between linear operators on $\mathbb{K}^n$ and $n \times n$ matrices with elements in $\mathbb{K}$. Since $\mathbb{K}^n$ is the prototype of all vector spaces $V$ of dimension $n$ on $\mathbb{K}$, we can say that the theory of linear operators on vector spaces in finite dimensions is, in essence, a matrix theory. As we shall see, the action of bounded linear operators on separable Hilbert spaces can also be expressed using a matrix, but, in this case, the matrix contains a countably infinite number of rows and columns.

The presence of a topology generated by a norm motivates the need to check the continuity of linear operators defined between two normed vector spaces $V$ and $W$. If $V$ and $W$ have finite dimension, then any linear operator between them is continuous. However, as we shall see in section 6.2.1, if $V$ is of infinite dimension, then even simple linear operators may not be continuous.

In the following sections, we shall examine the main properties of linear operators, starting by showing that a linear operator is continuous if and only if it is bounded.
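Formula [6.1] can be illustrated with a small computation in $\mathbb{R}^2$. In the sketch below, the operator `A`, the rotated orthonormal basis and the vector `x` are all hypothetical choices made for this example.

```python
import math

def A(x):
    """A sample linear operator on R^2 (a hypothetical choice)."""
    return [2 * x[0] + x[1], x[0] - 3 * x[1]]

def inner(x, y):
    """Euclidean inner product on R^2."""
    return sum(xi * yi for xi, yi in zip(x, y))

# Orthonormal basis (u_1, u_2): the canonical basis rotated by 0.3 radians.
c, s = math.cos(0.3), math.sin(0.3)
basis = [[c, s], [-s, c]]

# Matrix of A in this basis: alpha[i][j] = <A u_j, u_i>.
alpha = [[inner(A(u_j), u_i) for u_j in basis] for u_i in basis]

x = [1.5, -0.4]
lam = [inner(x, u) for u in basis]              # lambda_j = <x, u_j>
lhs = [inner(A(x), u_i) for u_i in basis]       # <A x, u_i>
rhs = [sum(alpha[i][j] * lam[j] for j in range(2)) for i in range(2)]
print(lhs, rhs)   # the two lists agree up to rounding
```

The agreement of `lhs` and `rhs` is exactly the statement that the matrix $(\alpha_{ij})$ encodes the action of $A$ on the coordinates $\lambda_j$.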
6.1. Fundamental properties of bounded linear operators between normed vector spaces

We begin by introducing formal definitions for continuous and bounded operators. Let $(V, \|\ \|_V)$ and $(W, \|\ \|_W)$ be two generic normed vector spaces.

DEFINITION 6.1.– Let $A : V \to W$ be a linear operator:
– $A$ is continuous in $x_0 \in V$ if:
\[ \forall \varepsilon > 0\ \exists \delta_\varepsilon > 0 : \|x - x_0\|_V < \delta_\varepsilon \implies \|Ax - Ax_0\|_W = \|A(x - x_0)\|_W < \varepsilon \]
– $A$ is continuous on $V$ if $A$ is continuous in every element of $V$;
– $A$ is bounded if $\exists c \in \mathbb{R}$, $c \ge 0$, such that:
\[ \|Ax\|_W \le c\|x\|_V \quad \forall x \in V \]
that is, any vector $x \in V$ is transformed by $A$ into a vector $Ax$ whose norm in $W$ is majorized by a non-negative multiple of the norm of $x$ in $V$.

The continuity of a linear operator is equivalent to sequential continuity, just as we saw in the case of functions defined on metric spaces.

THEOREM 6.1.– The linear operator $A : V \to W$ is continuous in $x_0 \in V$ if and only if:
\[ \forall (x_n)_{n \in \mathbb{N}} \subset V, \quad x_n \xrightarrow[n \to +\infty]{} x_0 \implies Ax_n \xrightarrow[n \to +\infty]{} Ax_0 \]
that is:
\[ \forall (x_n)_{n \in \mathbb{N}} \subset V, \quad \|x_n - x_0\|_V \xrightarrow[n \to +\infty]{} 0 \implies \|Ax_n - Ax_0\|_W \xrightarrow[n \to +\infty]{} 0 \]
Before going into the details concerning the properties of continuous linear operators, we can show that any continuous linear operator on a Hilbert space can be represented by an infinite matrix. Let us use the same argument as in example 4 discussed previously: let $H$ be a Hilbert space, $A : H \to H$ a continuous linear operator and $(u_n)_{n \in \mathbb{N}}$ a Hilbert basis of $H$. Then, for all $x \in H$, $x = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n$ and, by the continuity and linearity of $A$, we have:
\[ Ax = A\left( \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n \right) \underset{\text{(continuity)}}{=} \sum_{n \in \mathbb{N}} A(\langle x, u_n \rangle u_n) \underset{\text{(linearity)}}{=} \sum_{n \in \mathbb{N}} \langle x, u_n \rangle Au_n \]
Furthermore, by the continuity of the inner product:
\[ \langle Ax, u_m \rangle = \left\langle \sum_{n \in \mathbb{N}} \langle x, u_n \rangle Au_n, u_m \right\rangle = \sum_{n \in \mathbb{N}} \langle Au_n, u_m \rangle \langle x, u_n \rangle = \sum_{n \in \mathbb{N}} \alpha_{mn}\langle x, u_n \rangle, \quad \forall m \in \mathbb{N} \]
where $\alpha_{mn} = \langle Au_n, u_m \rangle$; thus, the infinite matrix with elements $(\alpha_{mn})_{m,n \in \mathbb{N}}$ is the representation of the continuous linear operator $A$ with respect to the Hilbert basis $(u_n)_{n \in \mathbb{N}}$. Unlike the finite-dimensional case, it is not easy to know when an infinite matrix corresponds to a continuous linear operator; this is the reason why infinite matrices are almost never used when studying linear operators in infinite-dimensional Hilbert spaces.

Theorem 6.2 makes it considerably simpler to analyze the continuity of linear operators.

THEOREM 6.2.– Let $A : V \to W$ be a linear operator and $x_0 \in V$ an arbitrary fixed element. Then, $A$ is continuous in $x_0$ if and only if $A$ is continuous on all $V$.

This theorem implies that we simply need to prove the continuity of a linear operator at a single, arbitrary point in order to guarantee the continuity over the whole vector space on which it is defined.

PROOF.– $\Longleftarrow$: trivial, as if $A$ is continuous on $V$, then, by definition, $A$ is continuous at all points in $V$.

$\Longrightarrow$: let $A$ be continuous in $x_0$. To demonstrate that $A$ is continuous in $V$, we must prove that the continuity of $A$ in $x_0$ implies its continuity in any arbitrary element $x \in V$. Given any sequence $(x_n)_{n \in \mathbb{N}} \subset V$ such that $x_n \to x$ as $n \to +\infty$, we must prove that this implies $\|A(x_n) - A(x)\| \to 0$ as $n \to +\infty$.

We note that the sequence $(x_n - x + x_0)_{n \in \mathbb{N}}$ converges to $x_0$ since $(x_n)_{n \in \mathbb{N}}$ converges to $x$. Thus, by the continuity of $A$ in $x_0$, it holds that $A(x_n - x + x_0) \to A(x_0)$, that is, $\|A(x_n - x + x_0) - A(x_0)\| \to 0$ as $n \to +\infty$; furthermore, by the linearity of $A$:
\[ \|A(x_n - x + x_0) - A(x_0)\| = \|A(x_n) - A(x) + A(x_0) - A(x_0)\| = \|A(x_n) - A(x)\| \xrightarrow[n \to +\infty]{} 0. \]
□

Thus, to verify the continuity¹ (or lack of continuity!) of a linear operator $A : V \to W$, we must simply verify this property for an arbitrary point in $V$. This point is often chosen to be $0_V$, the zero vector in $V$, as, in many cases, it simplifies the calculations involved.

¹ We recall that, for a linear operator, continuity and uniform continuity are equivalent conditions.
This fact is used below to prove a theorem which shows the relationship between continuous and bounded linear operators.

THEOREM 6.3.– A linear operator $A : V \to W$ is bounded if and only if it is continuous.

PROOF.– $A$ bounded $\implies$ $A$ continuous in $0_V$ $\iff$ $A$ continuous. Take $(x_n)_{n \in \mathbb{N}} \subset V$, $x_n \to 0_V$, that is, $\|x_n\|_V \to 0$ as $n \to +\infty$; as $A$ is assumed to be bounded, $\exists c \ge 0$ such that, using $A0_V = 0_W$:
\[ \|Ax_n - A0_V\|_W = \|Ax_n\|_W \le c\|x_n\|_V \xrightarrow[n \to +\infty]{} 0 \]
thus, for any sequence $(x_n)_{n \in \mathbb{N}} \subset V$, $x_n \to 0_V$ implies $Ax_n \to A(0_V)$, which corresponds to the continuity of $A$ in $0_V$, and hence, by the previous theorem, on all $V$.

$A$ continuous $\iff$ $A$ continuous in $0_V$ $\implies$ $A$ bounded. In this case, it is helpful to consider the original definition of continuity, and to express it for $x_0 = 0_V$:
\[ \forall \varepsilon > 0\ \exists \delta_\varepsilon > 0 : \|x - 0_V\|_V < \delta_\varepsilon \implies \|Ax - A0_V\|_W < \varepsilon \]
that is:
\[ \forall \varepsilon > 0\ \exists \delta_\varepsilon > 0 : \|x\|_V < \delta_\varepsilon \implies \|Ax\|_W < \varepsilon \]
As the previous expression is valid for all $\varepsilon$, we can consider the case where $\varepsilon = 1$. For simplicity’s sake, we shall write $\delta_{\varepsilon=1} \equiv K > 0$. Using these choices, the hypothesis that $A$ is continuous in $0_V$ gives us the following implication:
\[ \|x\|_V < K \implies \|Ax\|_W < 1 \qquad [6.2] \]
Note that we are approaching the definition of a bounded operator. The final step of the proof consists of determining a specific vector $x$ which satisfies [6.2] and that allows us to handle the inequality $\|Ax\|_W < 1$ in order to prove that $A$ is bounded.

To this aim, let us consider a real positive number $0 < \sigma < K$, hence $K - \sigma > 0$, and an arbitrary element $y \in V$, $y \neq 0_V$. We analyze the norm of the vector $(K - \sigma)\frac{y}{\|y\|_V}$:
\[ \left\| (K - \sigma)\frac{y}{\|y\|_V} \right\|_V = \frac{K - \sigma}{\|y\|_V}\|y\|_V = K - \sigma < K \]
that is, $(K - \sigma)\frac{y}{\|y\|_V}$ is a vector in $V$ whose norm is strictly less than $K$; thus, relationship [6.2] implies:
\[ \left\| A\left( (K - \sigma)\frac{y}{\|y\|_V} \right) \right\|_W < 1 \iff \frac{K - \sigma}{\|y\|_V}\|Ay\|_W < 1 \iff \|Ay\|_W < \frac{1}{K - \sigma}\|y\|_V \]
Since $y \neq 0_V$ is arbitrary, $K - \sigma > 0$, and the inequality $\|Ay\|_W \le c\|y\|_V$ holds trivially for $y = 0_V$, we can take $c \equiv \frac{1}{K - \sigma}$ and obtain the definition of a bounded $A$:
\[ \|Ay\|_W \le c\|y\|_V \quad \forall y \in V \]
□
This theorem implies that the terms “bounded” and “continuous” can be interchanged for linear operators between normed vector spaces.

So far, we specified the vector space in which the norm in question was considered. From now on, for simplicity’s sake, this specification will not be shown and we shall simply write $\|\ \|$.

6.1.1. Continuity of linear operators defined on a finite-dimensional normed vector space

The following result shows that all linear operators defined on a finite-dimensional vector space are continuous (and thus bounded).

THEOREM 6.4.– If $V$ is a normed vector space of finite dimension $N$ and $W$ is a normed vector space (of any dimension), then any linear operator $A : V \to W$ is bounded (and thus continuous).

PROOF.– As the space $V$ is of finite dimension, all norms on $V$ are equivalent by Tychonoff’s theorem (Theorem 4.4); thus, we must simply prove that $A : V \to W$ is continuous with respect to one norm, and this proof holds for all other norms.

Let $(u_1, \dots, u_N)$ be a basis of $V$; then any $x \in V$ can be written as $x = \sum_{n=1}^{N} x_n u_n$, $x_n \in \mathbb{K}$. Let us consider the following norm on $V$: $\|x\| = \left\| \sum_{n=1}^{N} x_n u_n \right\| \equiv \sup_{n=1,\dots,N} |x_n|$. By the linearity of $A$ and the triangle inequality, we have:
\[ \|Ax\| = \left\| A\left( \sum_{n=1}^{N} x_n u_n \right) \right\| = \left\| \sum_{n=1}^{N} x_n Au_n \right\| \le \sum_{n=1}^{N} \|x_n Au_n\| = \sum_{n=1}^{N} |x_n|\,\|Au_n\| \]
\[ \le \sum_{n=1}^{N} \left( \sup_{n=1,\dots,N} |x_n| \right) \|Au_n\| = \left( \sup_{n=1,\dots,N} |x_n| \right)\left( \sum_{n=1}^{N} \|Au_n\| \right) \underset{\text{def. of } \|x\|}{=} \left( \sum_{n=1}^{N} \|Au_n\| \right)\|x\| \]
This shows us that $A$ is bounded, that is, continuous. □
We therefore do not face any continuity problems when considering linear operators defined on finite-dimensional normed vector spaces, whatever the dimension of the image space. As we shall see, the situation is much more complicated in the case of infinite-dimensional domains.

6.2. The operator norm, convergence of operator sequences and Banach algebras

DEFINITION 6.2.– Let $A : V \to W$, $V \neq \{0_V\}$, be a bounded linear operator. The operator norm of $A$ can be defined in four different (equivalent) ways:
\[ \|A\| = \inf\{ c \ge 0 : \|Ax\| \le c\|x\|,\ \forall x \in V \} = N_1 \qquad [6.3] \]
\[ \|A\| = \sup_{\|x\| \le 1} \|Ax\| = N_2 \qquad [6.4] \]
\[ \|A\| = \sup_{\|x\| = 1} \|Ax\| = N_3 \qquad [6.5] \]
\[ \|A\| = \sup_{x \neq 0_V} \frac{\|Ax\|}{\|x\|} = N_4 \qquad [6.6] \]
For a non-bounded operator $A$, we write $\|A\| = +\infty$; evidently, for the zero operator 0 it holds that $\|0\| = 0$.

Theorem 6.5 guarantees that the definition above is well posed.

THEOREM 6.5.– The four definitions given above coincide.

PROOF.– We shall show that $N_1 \le N_4 \le N_3 \le N_2 \le N_1$, working from right to left. In all of these proofs, we shall use the fact that the sup of a set is, by definition, the smallest of the upper bounds of the set itself.
228
From Euclidean to Hilbert Spaces
$N_2 \le N_1$: by the definition of $N_1$ (i.e. equation [6.3]), we can write $\|Ax\| \le N_1\|x\|$ $\forall x \in V$; thus, in particular, for vectors $x$ such that $\|x\| \le 1$, it is true that $\|Ax\| \le N_1$, that is, $N_1$ is an upper bound for the set $\{\|Ax\| : x \in V, \|x\| \le 1\}$. By definition, the sup is the smallest of the upper bounds of a set, hence $N_2 = \sup_{\|x\| \le 1} \|Ax\| \le N_1$.

$N_3 \le N_2$: consider $x \in V$ such that $\|x\| = 1$ and the sequence $x_n = \left(1 - \frac{1}{n}\right)x$, $n \ge 1$. On one side, $\|x_n\| = \left(1 - \frac{1}{n}\right)\|x\| = 1 - \frac{1}{n} \le 1$ and thus $\|Ax_n\| \le \sup_{\|y\| \le 1} \|Ay\| = N_2$. Passing to the limit, we obtain $\lim_{n \to +\infty} \|Ax_n\| \le \lim_{n \to +\infty} N_2 = N_2$. On the other side, it is clear that $x_n \to x$ as $n \to +\infty$ and thus, by the continuity of $A$ and of the norm, $\lim_{n \to +\infty} \|Ax_n\| = \|A \lim_{n \to +\infty} x_n\| = \|Ax\|$. Combining this information, we can write $\|Ax\| \le N_2$, that is, $N_2$ is an upper bound for the set $\{\|Ax\| : x \in V, \|x\| = 1\}$. The quantity $N_3$ is defined as the sup of this set, that is, the smallest upper bound, hence $N_3 \le N_2$.

$N_4 \le N_3$: let us consider $x \in V$, $x \neq 0_V$; then $\left\| \frac{x}{\|x\|} \right\| = 1$ and $\left\| A \frac{x}{\|x\|} \right\| \le \sup_{\|y\| = 1} \|Ay\| = N_3$. Furthermore, $\left\| A \frac{x}{\|x\|} \right\| = \frac{\|Ax\|}{\|x\|}$, hence $\frac{\|Ax\|}{\|x\|} \le N_3$, that is, $N_3$ is an upper bound for the set $\left\{ \frac{\|Ax\|}{\|x\|} : x \in V, x \neq 0_V \right\}$. Since $N_4$ is the sup of this set, that is, the smallest of the upper bounds, then $N_4 \le N_3$.

$N_1 \le N_4$: for all $x \neq 0_V$, $\frac{\|Ax\|}{\|x\|} \le N_4$, that is, $\|Ax\| \le N_4\|x\|$, and this inequality also holds trivially for $x = 0_V$. Thus $N_4$ belongs to the set of constants whose infimum defines $N_1$, and it holds that $N_1 \le N_4$. □

REMARK.–
1) The specification $\forall x \in V$ plays an important role in the definition $\|A\| = \inf\{c \ge 0 : \|Ax\| \le c\|x\|, \ \forall x \in V\}$. Without this condition, the norm of $A$ would be trivially null for any linear operator, since $A0 = 0$. By considering all of the transformed vectors $Ax$, $x \in V$, we ensure that the norm of $A$ is $\neq 0$ (except, evidently, in the case where $A$ is the identically null operator).
2) We should also highlight the difference between the expression $\|A\|$, which represents the operator norm of the linear application $A : V \to W$, and the expression $\|x\|$, which represents the norm of a vector $x \in V$. Certain authors use a different symbol for the operator norm, for example $|||A|||$, but we have chosen to retain the same symbol, $\|\cdot\|$.
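The equivalence of the four definitions can be illustrated on a matrix. The following is a minimal sketch, assuming Euclidean norms on $\mathbb{R}^5$ and $\mathbb{R}^3$, for which the operator norm is known to equal the largest singular value; the suprema $N_2$, $N_3$, $N_4$ are only approximated from below by random sampling.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 5))

# Reference value: with Euclidean norms, ||A|| is the largest singular value.
op_norm = np.linalg.norm(A, 2)

X = rng.normal(size=(5, 20000))                  # random nonzero vectors (columns)
norms = np.linalg.norm(X, axis=0)

# N4: sup over x != 0 of ||Ax|| / ||x||, approximated on the samples.
N4 = np.max(np.linalg.norm(A @ X, axis=0) / norms)

# N3: sup over the unit sphere ||x|| = 1.
S = X / norms
N3 = np.max(np.linalg.norm(A @ S, axis=0))

# N2: sup over the closed unit ball ||x|| <= 1 (sphere points scaled by t in [0, 1)).
ball_points = S * rng.uniform(0.0, 1.0, size=norms.shape)
N2 = np.max(np.linalg.norm(A @ ball_points, axis=0))
```

Each sampled value is a lower estimate of `op_norm`; increasing the number of samples brings `N3` and `N4` closer to it, while the infimum $N_1$ is the smallest constant `c` for which `||Ax|| <= c ||x||` holds on all samples.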
We shall now verify that the operator norm is well defined on the set of bounded linear operators from $V$ to $W$, and that this space is stable with respect to pointwise-defined linear operations, that is, $(A + B)x = Ax + Bx$ and $(\alpha A)x = \alpha Ax$, for all $\alpha \in \mathbb{K}$ and for all $x \in V$.

– Positive definiteness: evidently, $\|A\| \ge 0$ for any bounded operator $A$ by equation [6.3]. Furthermore, by equation [6.6], $\|A\| = \sup_{x \neq 0_V} \frac{\|Ax\|}{\|x\|} = 0$ if and only if $\|Ax\| = 0$ $\forall x \in V$, $x \neq 0_V$ (if $x = 0_V$ then $Ax = 0$ by linearity). Thus, due to the positive definiteness of the norm of $W$, $\|A\| = 0 \iff Ax = 0 \ \forall x \in V$, that is, if and only if $A$ is the null operator $0(x) = 0$ $\forall x \in V$.

– Homogeneity: this is a direct consequence of the homogeneity of the norm of $W$. Using, for example, equation [6.5], we obtain, $\forall \alpha \in \mathbb{K}$:

$$\|\alpha A\| = \sup_{\|x\| = 1} \|\alpha Ax\| = \sup_{\|x\| = 1} |\alpha| \|Ax\| = |\alpha| \sup_{\|x\| = 1} \|Ax\| = |\alpha| \|A\|$$

that is:

$$\|\alpha A\| = |\alpha| \|A\| \quad \forall \alpha \in \mathbb{K} \qquad [6.7]$$

– Triangle inequality: an immediate consequence of equation [6.3] in Definition 6.2 is that we can write:

$$\|Ax\| \le \|A\| \|x\| \quad \forall x \in V \qquad [6.8]$$

Using this alongside the triangle inequality of the norm of $W$, for any pair of operators $A, B : V \to W$ and for all $x \in V$, we can write:

$$\|(A + B)x\| = \|Ax + Bx\| \le \|Ax\| + \|Bx\| \le \|A\|\|x\| + \|B\|\|x\| = (\|A\| + \|B\|)\|x\|$$

By equation [6.3], this implies:

$$\|A + B\| \le \|A\| + \|B\| \qquad [6.9]$$

The inequality [6.9] and the property of homogeneity [6.7] show that the set of bounded linear operators is invariant with respect to linear combinations, and is thus itself a vector space; this space becomes normed by the operator norm.

DEFINITION 6.3.– The normed vector space of bounded linear operators from $V$ to $W$ endowed with the operator norm is noted $\mathcal{B}(V, W)$. If $V = W$, we simply write $\mathcal{B}(V)$.
In the literature, the letter $\mathcal{B}$ is used to denote bounded. The notation $\mathcal{L}(V, W)$ is also used in this sense. Definition 6.4 is an immediate consequence of the fact that $\mathcal{B}(V, W)$ is a normed vector space.

DEFINITION 6.4 (Convergence in $\mathcal{B}(V, W)$).– A sequence of bounded operators $(A_n)_{n \in \mathbb{N}} \subset \mathcal{B}(V, W)$ converges to the bounded operator $A \in \mathcal{B}(V, W)$ if:

$$\|A_n - A\| \xrightarrow[n \to +\infty]{} 0$$

where $\|A_n - A\|$ is the operator norm of the difference between $A_n$ and $A$.

Exercise 6.1
Using Definition 6.4, prove that a necessary condition for the convergence of a sequence of operators $(A_n)_{n \in \mathbb{N}} \subset \mathcal{B}(V, W)$ to $A \in \mathcal{B}(V, W)$ is:

$$\lim_{n \to +\infty} \|(A_n - A)x\| = 0 \quad \forall x \in \overline{B}(0, 1) \qquad [6.10]$$

Solution to Exercise 6.1
We start by noting that, since the sup is an upper bound of a set, it holds that:

$$\|A\| \ge \|Ax\| \quad \forall x \in \overline{B}(0, 1), \qquad \overline{B}(0, 1) := \{x \in V : \|x\| \le 1\} \qquad [6.11]$$

Inequality [6.11] implies $\|A_n - A\| \ge \|(A_n - A)x\|$ $\forall x \in \overline{B}(0, 1)$; thus, if there exists at least one $x \in \overline{B}(0, 1)$ such that $\|(A_n - A)x\|$ does not tend to $0$ as $n \to +\infty$, then $\|A_n - A\|$ does not tend to $0$ either, which prevents the convergence of the sequence $(A_n)_{n \in \mathbb{N}}$ to $A$. Property [6.10] is thus necessary for $(A_n)_{n \in \mathbb{N}} \subset \mathcal{B}(V, W)$ to converge to $A \in \mathcal{B}(V, W)$. □

In the case where $V = W$, we can add a third operation on $\mathcal{B}(V)$, the product:

$$(AB)(x) := (A \circ B)(x) = A(B(x)) \quad \forall x \in V$$

that is, the product in $\mathcal{B}(V)$ corresponds to the operation of functional composition between linear operators. We observe that, since $Bx$ is a vector of $V$:

$$\|(AB)x\| = \|A(Bx)\| \le \|A\|\|Bx\| \le \|A\|\|B\|\|x\| \quad \forall x \in V$$

and thus, by equation [6.3] in Definition 6.2:

$$\|AB\| \le \|A\|\|B\| \qquad [6.12]$$
Hence, taking $A = B$, $\|A^2\| \le \|A\|^2$; by iterating these considerations, we obtain the formula:

$$\|A^n\| \le \|A\|^n \quad \forall n \in \mathbb{N}.$$
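The submultiplicativity of the operator norm can be observed on matrices. The following is a minimal sketch, assuming $V = \mathbb{R}^4$ (and $\mathbb{R}^2$ for the last example) with the Euclidean norm; the helper `op_norm` is introduced here for illustration.

```python
import numpy as np

def op_norm(M):
    """Operator norm induced by the Euclidean norm (largest singular value)."""
    return np.linalg.norm(M, 2)

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

# ||A|| ||B|| - ||AB|| is nonnegative by the product inequality.
product_gap = op_norm(A) * op_norm(B) - op_norm(A @ B)

# ||A||^n - ||A^n|| is nonnegative for every power n.
power_gaps = [
    op_norm(A) ** n - op_norm(np.linalg.matrix_power(A, n)) for n in range(1, 6)
]

# The inequality can be strict: a nilpotent matrix has ||N^2|| = 0 while ||N||^2 = 1.
nilpotent = np.array([[0.0, 1.0], [0.0, 0.0]])
norm_nilpotent = op_norm(nilpotent)
norm_nilpotent_squared = op_norm(nilpotent @ nilpotent)
```

The nilpotent example shows that equality in $\|A^n\| \le \|A\|^n$ should not be expected in general.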
Thus $\mathcal{B}(V)$ is invariant with respect to the product operation defined above and, consequently, $\mathcal{B}(V)$ is a normed associative unital algebra, where the unit is the identity operator. We recall that an algebra $\mathcal{A}$ on the field $\mathbb{K}$ is a vector space on $\mathbb{K}$ equipped with a binary operation $\cdot : \mathcal{A} \times \mathcal{A} \to \mathcal{A}$, commonly called the product, which is compatible with linear operations; this is equivalent to requiring that $\cdot$ is bilinear, that is, for all $a, b, c \in \mathcal{A}$ and $k \in \mathbb{K}$ it holds that:
– $(a + b) \cdot c = a \cdot c + b \cdot c$ and $a \cdot (b + c) = a \cdot b + a \cdot c$;
– $(ka) \cdot b = k(a \cdot b)$, $a \cdot (kb) = k(a \cdot b)$.

THEOREM 6.6.– Let $(V, \|\cdot\|)$ be an arbitrary normed vector space on the field $\mathbb{K}$. The sum, product by a scalar of $\mathbb{K}$ and product in the algebra $\mathcal{B}(V)$ are continuous with respect to the operator norm.

PROOF.– Theorem 4.2 also applies in the case of the algebra $\mathcal{B}(V)$, so the sum and product by a scalar are continuous and only the continuity of the product must be proven. If $(A_n)_{n \in \mathbb{N}}$ and $(B_n)_{n \in \mathbb{N}}$ are two sequences of operators of $\mathcal{B}(V)$ which converge to $A \in \mathcal{B}(V)$ and $B \in \mathcal{B}(V)$, respectively, that is, $\|A_n - A\| \to 0$ and $\|B_n - B\| \to 0$ as $n \to +\infty$, then we must show that $A_n B_n \to AB$, that is, $\|A_n B_n - AB\| \to 0$. By [6.9] and [6.12]:

$$\|A_n B_n - AB\| = \|A_n(B_n - B) + (A_n - A)B\| \le \|A_n\|\|B_n - B\| + \|A_n - A\|\|B\| \xrightarrow[n \to +\infty]{} 0$$

where the first term tends to $0$ because the sequence $(\|A_n\|)_{n \in \mathbb{N}}$ is bounded, $(A_n)_{n \in \mathbb{N}}$ being convergent. □
The presence of a norm on $\mathcal{B}(V, W)$ generates a topology, and this naturally leads us to examine the conditions under which this space is complete. The following result provides a sufficient condition for $\mathcal{B}(V, W)$ to be complete.

THEOREM 6.7.– Let $V, W$ be two normed vector spaces. If $W$ is complete, then $\mathcal{B}(V, W)$ is complete.

Before proving this theorem, we wish to highlight the fact that the theorem holds for $\mathcal{B}(\mathcal{H})$ or $\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$, if $\mathcal{H}, \mathcal{H}_1, \mathcal{H}_2$ are Hilbert spaces.

PROOF.– Let $(A_n)_{n \in \mathbb{N}}$ be a Cauchy sequence of operators in $\mathcal{B}(V, W)$, that is:

$$\forall \varepsilon > 0 \ \exists N_\varepsilon > 0 : \forall m, n \ge N_\varepsilon : \|A_n - A_m\| < \varepsilon$$

To prove the theorem, we must show that $(A_n)_{n \in \mathbb{N}}$ converges in $\mathcal{B}(V, W)$ using the hypothesis of completeness of $W$. We begin by noting that for all fixed $x \in V$, it holds that:

$$\forall m, n \ge N_\varepsilon : \|A_n x - A_m x\| \le \|A_n - A_m\|\|x\| \le \varepsilon\|x\| \qquad [6.13]$$

and thus, by the arbitrary nature of $\varepsilon$, $(A_n x)_{n \in \mathbb{N}}$ is a Cauchy sequence in $W$. By hypothesis, $W$ is complete, thus there exists $\lim_{n \to +\infty} A_n x \in W$; this means that we can define the limit operator $A$ associated with the sequence $(A_n)_{n \in \mathbb{N}}$:

$$A : V \to W, \quad x \mapsto A(x) = \lim_{n \to +\infty} A_n x$$

The operator $A$ is linear, since each $A_n$ is linear and the limit is compatible with linear operations. We shall show that $(A_n)_{n \in \mathbb{N}}$ converges in operator norm to $A$, and that $A \in \mathcal{B}(V, W)$, completing our proof. We begin by noting that, $\forall n \ge N_\varepsilon$, it holds that:

$$\|A_n(x) - A(x)\| = \Big\| A_n(x) - \lim_{m \to +\infty} A_m(x) \Big\| = \lim_{m \to +\infty} \|A_n(x) - A_m(x)\| \le \varepsilon\|x\| \qquad [6.14]$$

where the second equality follows from the continuity of $\|\cdot\|$ and the final inequality from [6.13]: since $m$ tends toward $+\infty$, we know that eventually $m \ge N_\varepsilon$. Hence,

$$\forall \varepsilon > 0 \ \exists N_\varepsilon > 0 : n \ge N_\varepsilon \implies \|A_n - A\| = \sup_{\|x\| = 1} \|(A_n - A)x\| = \sup_{\|x\| = 1} \|A_n(x) - A(x)\| \le \varepsilon$$

that is, $(A_n)_{n \in \mathbb{N}}$ converges in operator norm to $A$. Finally, we must verify that $A \in \mathcal{B}(V, W)$. Taking an arbitrary $x \in V$, then, since inequality [6.14] holds for all $n \ge N_\varepsilon$, we can write:

$$\|Ax\| = \|Ax - A_{N_\varepsilon}x + A_{N_\varepsilon}x\| \le \|Ax - A_{N_\varepsilon}x\| + \|A_{N_\varepsilon}x\| \le \varepsilon\|x\| + \|A_{N_\varepsilon}\|\|x\|$$

that is, $\|Ax\| \le (\varepsilon + \|A_{N_\varepsilon}\|)\|x\|$ $\forall x \in V$, thus $A$ is bounded. □
DEFINITION 6.5 (Banach algebra).– An algebra $\mathcal{A}$ on the field $\mathbb{K}$ is a Banach algebra if the following properties are verified $\forall a, b, c \in \mathcal{A}$:
– $\mathcal{A}$ is an associative algebra, that is, $a \cdot (b \cdot c) = (a \cdot b) \cdot c$;
– $\mathcal{A}$, as a vector space, admits a norm with respect to which it is a Banach space;
– $\|a \cdot b\| \le \|a\|\|b\|$.

From what we have already seen, we know that if $V$ is a Banach space, $\mathcal{B}(V)$ is a complete, associative unital algebra with respect to the operator norm; hence, $\mathcal{B}(V)$ is a unital Banach algebra. Evidently, for any Hilbert space $\mathcal{H}$, $\mathcal{B}(\mathcal{H})$ is a unital Banach algebra.

A particularly important property of the kernel of the operators of $\mathcal{B}(V, W)$ is shown below.

THEOREM 6.8.– Let $V, W$ be two normed vector spaces and take $A \in \mathcal{B}(V, W)$; then $\ker(A)$ is a closed vector subspace of $V$.

PROOF.– Let $(v_n)_{n \in \mathbb{N}} \subset \ker(A)$ be an arbitrary convergent sequence. We must prove that its limit, $\bar{v} = \lim_{n \to +\infty} v_n$, remains within $\ker(A)$. $A$ is bounded, and thus continuous, so $\lim_{n \to +\infty} Av_n = A(\lim_{n \to +\infty} v_n) = A\bar{v}$. Furthermore, $Av_n = 0$ $\forall n \in \mathbb{N}$ since $v_n \in \ker(A)$, hence $A\bar{v} = \lim_{n \to +\infty} 0 = 0$, which implies $\bar{v} \in \ker(A)$. □
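The usefulness of this theorem is shown in the following exercise, which highlights the fact that the theorem of projection onto a closed proper vector subspace is not valid without the completeness hypothesis.

Exercise 6.2
Let $T$ be the linear operator (actually, a linear functional) defined by:

$$T : \ell^2(\mathbb{N}, \mathbb{C}) \to \mathbb{C}, \quad x = (x_n)_{n \in \mathbb{N}} \mapsto T(x) = \sum_{n \in \mathbb{N}} \frac{x_n}{n+1}$$

1) Show that $T$ is continuous.
2) Taking $F = \left\{ (x_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{N}, \mathbb{C}) : \sum_{n \in \mathbb{N}} \frac{x_n}{n+1} = 0 \right\}$, show that $F$ is a closed proper vector subspace of $\ell^2(\mathbb{N}, \mathbb{C})$.
3) Prove the existence of $u \in \ell^2(\mathbb{N}, \mathbb{C})$ such that $F = \{u\}^\perp$ and use your result to deduce the explicit expression of $F^\perp$.
4) We know (see the definition corresponding to [4.26]) that $\ell_0(\mathbb{N}, \mathbb{C})$ is the vector subspace of $\ell^2(\mathbb{N}, \mathbb{C})$ made up of sequences $(x_n)_{n \in \mathbb{N}}$ which are zero after a certain index, which we equip with the topology induced by $\ell^2(\mathbb{N}, \mathbb{C})$. Take $G = F \cap \ell_0(\mathbb{N}, \mathbb{C})$.
a) Show that $G$ is a closed proper vector subspace of $\ell_0(\mathbb{N}, \mathbb{C})$.
b) Using formula [5.4], show that the orthogonal complement of $G$ in $\ell_0(\mathbb{N}, \mathbb{C})$, that is, $G^{\perp_0} := G^\perp \cap \ell_0$, is reduced to the zero vector: $G^{\perp_0} = \{0_{\ell^2(\mathbb{N}, \mathbb{C})}\}$.
c) Use your findings to deduce that $\ell_0(\mathbb{N}, \mathbb{C})$ is not complete in the topology inherited from $\ell^2(\mathbb{N}, \mathbb{C})$.

Solution to Exercise 6.2
1) Let $u = (u_n)_{n \in \mathbb{N}}$ denote the sequence defined by $u_n = \frac{1}{n+1}$ $\forall n \in \mathbb{N}$, which obviously belongs to $\ell^2(\mathbb{N}, \mathbb{C})$. For all $x \in \ell^2(\mathbb{N}, \mathbb{C})$, it holds, by the Cauchy-Schwarz inequality, that:

$$|T(x)| = \Big| \sum_{n \in \mathbb{N}} x_n \frac{1}{n+1} \Big| = \Big| \sum_{n \in \mathbb{N}} x_n u_n \Big| = |\langle x, u \rangle| \le \|x\|\|u\|$$

thus $T$ is bounded, with $\|T\| \le \|u\|$, that is, continuous.

2) By definition, $F = \ker(T)$ and thus it forms a closed vector subspace in $\ell^2(\mathbb{N}, \mathbb{C})$, since we have just proved that $T$ is continuous. One example of an element in $\ell^2(\mathbb{N}, \mathbb{C})$ that does not belong to $F$ is the first vector in the canonical basis of $\ell^2(\mathbb{N}, \mathbb{C})$, that is, $e_1 = (1, 0, 0, \dots)$, since $\sum_{n \in \mathbb{N}} \frac{e_1(n)}{n+1} = \frac{1}{2} \neq 0$. Thus, $F$ is a closed proper vector subspace of $\ell^2(\mathbb{N}, \mathbb{C})$.

3) Taking $u$ in the same way as in question 1, we know that:

$$F = \{x \in \ell^2(\mathbb{N}, \mathbb{C}) : \langle x, u \rangle = 0\} \equiv \{u\}^\perp$$

so, by [5.5], $F^\perp = \{u\}^{\perp\perp} = \overline{\operatorname{span}\{u\}}$; moreover, $\operatorname{span}\{u\} = \left\{ \left( \frac{\lambda}{n+1} \right)_{n \in \mathbb{N}}, \lambda \in \mathbb{C} \right\}$ is a one-dimensional vector subspace, and thus it is closed. Hence $\overline{\operatorname{span}\{u\}} = \operatorname{span}\{u\}$ and

$$F^\perp = \left\{ \left( \frac{\lambda}{n+1} \right)_{n \in \mathbb{N}}, \lambda \in \mathbb{C} \right\}.$$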
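4) a) We can rewrite $G$ in an explicit form as:

$$G = F \cap \ell_0(\mathbb{N}, \mathbb{C}) = \Big\{ (x_n)_{n \in \mathbb{N}} : \exists N \in \mathbb{N} : x_n = 0 \ \forall n > N \text{ and } \sum_{n=0}^{N} \frac{x_n}{n+1} = 0 \Big\}$$

showing that $G = \ker T|_{\ell_0(\mathbb{N}, \mathbb{C})}$. As the restriction of a continuous linear operator is itself continuous, $G$ must be a closed vector subspace of $\ell_0(\mathbb{N}, \mathbb{C})$. To prove that $G$ is proper, we consider $e_1$: $e_1 \in \ell_0(\mathbb{N}, \mathbb{C})$, and $\sum_{n=0}^{N} \frac{e_1(n)}{n+1} = \frac{1}{2} \neq 0$.

b) By [5.4], we have:

$$G^{\perp_0} = G^\perp \cap \ell_0(\mathbb{N}, \mathbb{C}) = (F \cap \ell_0(\mathbb{N}, \mathbb{C}))^\perp \cap \ell_0(\mathbb{N}, \mathbb{C}) = \overline{\operatorname{span}(F^\perp \cup \ell_0(\mathbb{N}, \mathbb{C})^\perp)} \cap \ell_0(\mathbb{N}, \mathbb{C})$$

Knowing (from Theorem 4.21) that $\ell_0(\mathbb{N}, \mathbb{C})$ is dense in $\ell^2(\mathbb{N}, \mathbb{C})$, we have $\ell_0(\mathbb{N}, \mathbb{C})^\perp = \{0_{\ell^2(\mathbb{N}, \mathbb{C})}\}$, which is already included in $F^\perp$ as a vector subspace of $\ell^2(\mathbb{N}, \mathbb{C})$. Furthermore:

$$\overline{\operatorname{span}(F^\perp \cup \ell_0(\mathbb{N}, \mathbb{C})^\perp)} = \overline{\operatorname{span}(F^\perp)} = \overline{F^\perp} = F^\perp = \left\{ \left( \frac{\lambda}{n+1} \right)_{n \in \mathbb{N}}, \lambda \in \mathbb{C} \right\}$$

since $F^\perp$ is closed, from the answer to question 3. Then:

$$G^{\perp_0} = \left\{ \left( \frac{\lambda}{n+1} \right)_{n \in \mathbb{N}}, \lambda \in \mathbb{C} \right\} \cap \ell_0(\mathbb{N}, \mathbb{C}) = \{0_{\ell^2(\mathbb{N}, \mathbb{C})}\}$$

since it is clear that, for $\lambda \neq 0$, the sequence $\left( \frac{\lambda}{n+1} \right)_{n \in \mathbb{N}} \notin \ell_0(\mathbb{N}, \mathbb{C})$.

c) $G$ is a closed, proper vector subspace in $\ell_0(\mathbb{N}, \mathbb{C})$ equipped with the topology inherited from $\ell^2(\mathbb{N}, \mathbb{C})$; nevertheless, we have just shown that $G^{\perp_0}$, the orthogonal complement of $G$ in $\ell_0(\mathbb{N}, \mathbb{C})$, consists of the zero vector alone. This contradicts the result of Theorem 5.4 (a corollary of the theorem of projection onto a closed, convex proper part of a Hilbert space), which states that the orthogonal complement of a closed, proper vector subspace does not solely consist of the zero vector. Clearly, the only hypothesis which is not respected here is the completeness of $\ell_0(\mathbb{N}, \mathbb{C})$ with respect to the topology inherited from $\ell^2(\mathbb{N}, \mathbb{C})$. □

Our next step is to consider the way in which a continuous linear operator between two normed vector spaces interacts with Cauchy sequences.

THEOREM 6.9.– Let $V$ and $W$ be two arbitrary normed vector spaces, $A \in \mathcal{B}(V, W)$, and let $(x_n)_{n \in \mathbb{N}} \subset V$ be a Cauchy sequence; then $(Ax_n)_{n \in \mathbb{N}}$ is a Cauchy sequence in $W$.

PROOF.– By hypothesis: $\forall \varepsilon > 0$ $\exists N_\varepsilon > 0$ : $\forall n, m \ge N_\varepsilon$ : $\|x_n - x_m\| < \varepsilon$. Now, let us consider $(Ax_n)_{n \in \mathbb{N}}$ and analyze $\|Ax_n - Ax_m\| = \|A(x_n - x_m)\| \le \|A\|\|x_n - x_m\| \le \|A\|\varepsilon$, $\forall n, m \ge N_\varepsilon$. By the arbitrary nature of $\varepsilon$, $(Ax_n)_{n \in \mathbb{N}}$ is a Cauchy sequence of elements in $W$. □

This result can help to prove the completeness of a normed vector space, as we shall see in Exercise 6.3, which may be seen as a continuation of Exercise 4.2.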
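Exercise 6.3
Given a fixed sequence $a = (a_n)_{n \in \mathbb{N}}$ of strictly positive real numbers, we write:

$$\ell^2_a(\mathbb{N}, \mathbb{C}) := \Big\{ u \in \mathbb{C}^{\mathbb{N}} : \sum_{n \in \mathbb{N}} a_n |u_n|^2 < +\infty \iff \sqrt{a}\,u \in \ell^2(\mathbb{N}, \mathbb{C}) \Big\}$$

In Exercise 4.2, we verified that:

$$\langle u, v \rangle_{\ell^2_a} = \sum_{n \in \mathbb{N}} a_n u_n \overline{v_n} \quad \text{and} \quad \|u\|^2_{\ell^2_a} = \sum_{n \in \mathbb{N}} a_n |u_n|^2$$

are an inner product and a norm on $\ell^2_a(\mathbb{N}, \mathbb{C})$, respectively.

1) Show that the operator

$$\iota_a : \ell^2_a(\mathbb{N}, \mathbb{C}) \hookrightarrow \ell^2(\mathbb{N}, \mathbb{C}), \quad u \mapsto \iota_a(u) := \sqrt{a}\,u \equiv (\sqrt{a_n}\,u_n)_{n \in \mathbb{N}}$$

is linear, continuous, has unit norm and is bijective. Give the explicit expression of the inverse operator of $\iota_a$; verify that this is continuous and has a norm of 1.
2) Using your findings, deduce that $\ell^2_a(\mathbb{N}, \mathbb{C})$ is a Hilbert space.
3) Let $a$ and $b$ be two sequences of strictly positive real numbers such that $a_n = O(b_n)$ as $n \to +\infty$. Show that $\ell^2_b(\mathbb{N}, \mathbb{C}) \subset \ell^2_a(\mathbb{N}, \mathbb{C})$, and that the canonical injection is continuous.

Solution to Exercise 6.3
1) Linearity can be shown by simple rewriting: if $u, v \in \ell^2_a(\mathbb{N}, \mathbb{C})$ and $\lambda \in \mathbb{C}$, then $\iota_a(u + \lambda v) = \sqrt{a}(u + \lambda v) = \sqrt{a}\,u + \lambda\sqrt{a}\,v = \iota_a(u) + \lambda\iota_a(v)$. Concerning continuity, for all $u \in \ell^2_a(\mathbb{N}, \mathbb{C})$, $\iota_a(u) \in \ell^2(\mathbb{N}, \mathbb{C})$ and the norm is:

$$\|\iota_a(u)\|^2_{\ell^2} = \sum_{n \in \mathbb{N}} |\sqrt{a_n}\,u_n|^2 = \sum_{n \in \mathbb{N}} a_n |u_n|^2 = \|u\|^2_{\ell^2_a} \iff \|\iota_a(u)\|_{\ell^2} = \|u\|_{\ell^2_a}$$

This shows that $\iota_a$ is continuous, and that its norm is 1. The final condition to prove is bijectivity. We note that the operator:

$$j_{1/a} : \ell^2(\mathbb{N}, \mathbb{C}) \to \ell^2_a(\mathbb{N}, \mathbb{C}), \quad v \mapsto j_{1/a}(v) := \frac{1}{\sqrt{a}}\,v \equiv \Big( \frac{1}{\sqrt{a_n}}\,v_n \Big)_{n \in \mathbb{N}}$$

is well defined, since $a > 0$ and $v/\sqrt{a} \in \ell^2_a(\mathbb{N}, \mathbb{C}) \iff \sqrt{a}\,v/\sqrt{a} = v \in \ell^2(\mathbb{N}, \mathbb{C})$. Furthermore, it is such that $j_{1/a} \circ \iota_a : \ell^2_a(\mathbb{N}, \mathbb{C}) \to \ell^2_a(\mathbb{N}, \mathbb{C})$, $j_{1/a} \circ \iota_a(u) = \frac{\sqrt{a}}{\sqrt{a}}\,u = u$ $\forall u \in \ell^2_a(\mathbb{N}, \mathbb{C})$; vice versa, for all $v \in \ell^2(\mathbb{N}, \mathbb{C})$, $\iota_a \circ j_{1/a} : \ell^2(\mathbb{N}, \mathbb{C}) \to \ell^2(\mathbb{N}, \mathbb{C})$, $\iota_a \circ j_{1/a}(v) = \frac{\sqrt{a}}{\sqrt{a}}\,v = v$, that is, $j_{1/a} \circ \iota_a = \mathrm{id}_{\ell^2_a(\mathbb{N}, \mathbb{C})}$ and $\iota_a \circ j_{1/a} = \mathrm{id}_{\ell^2(\mathbb{N}, \mathbb{C})}$. Thus, $\iota_a$ is bijective with inverse $j_{1/a}$. The inverse is also clearly continuous and possesses a unit norm, since:

$$\|j_{1/a}(v)\|^2_{\ell^2_a} = \sum_{n \in \mathbb{N}} a_n \frac{|v_n|^2}{a_n} = \|v\|^2_{\ell^2} \iff \|j_{1/a}(v)\|_{\ell^2_a} = \|v\|_{\ell^2} \quad \forall v \in \ell^2(\mathbb{N}, \mathbb{C}) \qquad [6.15]$$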
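2) By the continuity of $\iota_a$ and by Theorem 6.9, $\iota_a$ transforms the Cauchy sequences in $\ell^2_a(\mathbb{N}, \mathbb{C})$ into Cauchy sequences in $\ell^2(\mathbb{N}, \mathbb{C})$. Now, let $(u_m)_{m \in \mathbb{N}}$ be an arbitrary Cauchy sequence of elements in $\ell^2_a(\mathbb{N}, \mathbb{C})$; $(\iota_a(u_m))_{m \in \mathbb{N}}$ is a Cauchy sequence in $\ell^2(\mathbb{N}, \mathbb{C})$, which we know to be complete, thus $\exists L \in \ell^2(\mathbb{N}, \mathbb{C})$ such that $\iota_a(u_m) \to L$ as $m \to +\infty$, that is, using [6.15] and the linearity of $j_{1/a}$:

$$0 = \lim_{m \to +\infty} \|\iota_a(u_m) - L\|_{\ell^2} = \lim_{m \to +\infty} \|j_{1/a}(\iota_a(u_m) - L)\|_{\ell^2_a} = \lim_{m \to +\infty} \|j_{1/a} \circ \iota_a(u_m) - j_{1/a}(L)\|_{\ell^2_a} = \lim_{m \to +\infty} \|u_m - j_{1/a}(L)\|_{\ell^2_a}$$

that is, $(u_m)_{m \in \mathbb{N}}$ converges in $\ell^2_a(\mathbb{N}, \mathbb{C})$ to $j_{1/a}(L)$, hence $\ell^2_a(\mathbb{N}, \mathbb{C})$ is a Hilbert space.

3) We must show that if $a_n = O(b_n)$ as $n \to +\infty$, then $u \in \ell^2_b(\mathbb{N}, \mathbb{C}) \implies u \in \ell^2_a(\mathbb{N}, \mathbb{C})$, that is:

$$\sum_{n \in \mathbb{N}} b_n |u_n|^2 < +\infty \implies \sum_{n \in \mathbb{N}} a_n |u_n|^2 < +\infty$$

for all $u \in \ell^2_b(\mathbb{N}, \mathbb{C})$. By definition, $a_n = O(b_n)$ as $n \to +\infty$ if and only if there exist $C_1 > 0$ and $N \in \mathbb{N}$ such that, for all $n \ge N$, it holds that $a_n \le C_1 b_n$. Multiplying both sides of this inequality by $|u_n|^2$ gives $a_n |u_n|^2 \le C_1 b_n |u_n|^2$ for all $n \ge N$, that is, $\sum_{n=N}^{+\infty} a_n |u_n|^2 \le C_1 \sum_{n=N}^{+\infty} b_n |u_n|^2$. For the first $N$ terms, from $n = 0$ to $n = N - 1$, we can take $C_2 := \max_{0 \le n \le N-1} a_n / b_n$, which is finite and strictly positive because the sequences $a$ and $b$ are strictly positive and does not depend on $u$; this gives $\sum_{n=0}^{N-1} a_n |u_n|^2 \le C_2 \sum_{n=0}^{N-1} b_n |u_n|^2$. We therefore take $C := \max(C_1, C_2) > 0$, giving us $\sum_{n \in \mathbb{N}} a_n |u_n|^2 \le C \sum_{n \in \mathbb{N}} b_n |u_n|^2$. This tells us that if $u \in \ell^2_b(\mathbb{N}, \mathbb{C})$, then $u \in \ell^2_a(\mathbb{N}, \mathbb{C})$. Furthermore, the previous inequality can be rewritten as $\|u\|^2_{\ell^2_a} \le C \|u\|^2_{\ell^2_b}$, thus the canonical injection $\iota : \ell^2_b(\mathbb{N}, \mathbb{C}) \hookrightarrow \ell^2_a(\mathbb{N}, \mathbb{C})$ verifies $\|\iota(u)\|_{\ell^2_a} \le \sqrt{C}\,\|u\|_{\ell^2_b}$ for all $u \in \ell^2_b(\mathbb{N}, \mathbb{C})$, meaning that it is bounded and thus continuous. □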
We shall conclude this section by presenting an extremely useful result which can be used to characterize the equality between linear operators on an inner product space of arbitrary dimension, via the equality of their action on vectors within an inner product.

THEOREM 6.10.– Let $A, B : V \to V$ be two linear operators defined on an inner product space $V$ of arbitrary dimension. Then:

$$A = B \iff \langle x, Ay \rangle = \langle x, By \rangle \quad \forall x, y \in V$$

PROOF.– By the linearity properties of the inner product, it holds that $\langle x, Ay \rangle = \langle x, By \rangle$ $\forall x, y \in V$ $\iff$ $\langle x, (A - B)y \rangle = 0$ $\forall x, y \in V$. Let us take an arbitrary but fixed element $y \in V$ and write $u = (A - B)y \in V$; then $\langle x, u \rangle = 0$ $\forall x \in V$ holds true if and only if $u = 0$ (it suffices to take $x = u$), that is, $(A - B)y = 0$ $\forall y \in V$, that is, $A - B = 0$, implying $A = B$. □

6.2.1. A classical example of a non-bounded linear operator on a vector space of infinite dimension

Although we have chosen to focus only on bounded linear operators on Hilbert spaces, it is important to show at least one example of a non-bounded linear operator. Actually, we are going to prove that one of the simplest operations, differentiation, applied to the simplest Hilbert basis, the Fourier basis, does not produce a bounded operator.

Let $u_n(x) = \frac{1}{\sqrt{2\pi}} e^{inx}$, $n \in \mathbb{Z}$, be the Fourier basis of $L^2[0, 2\pi]$. Let us consider the first derivative operator on the infinite-dimensional vector space generated by the Fourier basis:

$$D : \operatorname{span}((u_n)_{n \in \mathbb{Z}}) \to L^2[0, 2\pi], \quad u_n \mapsto Du_n = \frac{d}{dx} u_n$$

where $\frac{d}{dx} u_n(x) = \frac{in}{\sqrt{2\pi}} e^{inx}$, which is square integrable on $[0, 2\pi]$. Of course, the previous definition of $D$ is extended by linearity on the whole span.
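The growth of $\|Du_n\|$ with $|n|$, on which the unboundedness of $D$ rests, can be observed numerically. The following is a minimal sketch, assuming a uniform grid on $[0, 2\pi)$ and a Riemann-sum approximation of the $L^2$ norm; the helper names `l2_norm`, `u` and `Du` are introduced here for illustration, and the derivative is computed analytically rather than by finite differences.

```python
import numpy as np

M = 4096                                            # number of grid points (assumption)
x = np.linspace(0.0, 2.0 * np.pi, M, endpoint=False)
dx = 2.0 * np.pi / M

def l2_norm(values):
    """Riemann-sum approximation of the L^2[0, 2*pi] norm of a sampled function."""
    return np.sqrt(np.sum(np.abs(values) ** 2) * dx)

def u(n):
    """Fourier basis element u_n(x) = exp(i n x) / sqrt(2 pi), sampled on the grid."""
    return np.exp(1j * n * x) / np.sqrt(2.0 * np.pi)

def Du(n):
    """Derivative of u_n, i.e. (i n) u_n, sampled on the grid."""
    return 1j * n * u(n)

indices = (1, 5, 50)
norms_u = {n: l2_norm(u(n)) for n in indices}       # each is (numerically) 1
norms_Du = {n: l2_norm(Du(n)) for n in indices}     # each is (numerically) |n|
```

Since every `norms_u[n]` equals 1 while `norms_Du[n]` grows like `|n|`, no constant `c` can satisfy `||Dv|| <= c ||v||` on the whole span.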
We can show that the norm of $D$ is not finite. To calculate it, we may use equation [6.5] in Definition 6.2 of the operator norm; taking $v$ in the domain of $D$, we have:

$$\|D\| = \sup_{\|v\| = 1} \|Dv\| \ge \sup_{\|u_n\| = 1} \|Du_n\|$$

where the inequality is motivated by the fact that the sup on the right-hand side is computed over a subset of the domain of $D$.

However, the condition $\|u_n\| = 1$ does not determine any constraints, as any element $u_n$ in the Fourier Hilbert basis of $L^2[0, 2\pi]$ has a unit norm; thus the right-hand side is simply the sup of the set of values $\|Du_n\|$ with respect to the integer index $n$, that is:

$$\|D\| \ge \sup_{n \in \mathbb{Z}} \|Du_n\| = \sup_{n \in \mathbb{Z}} \left( \int_0^{2\pi} \Big| \frac{in}{\sqrt{2\pi}} e^{inx} \Big|^2 dx \right)^{1/2} = \sup_{n \in \mathbb{Z}} \left( |in|^2 \int_0^{2\pi} \Big| \frac{1}{\sqrt{2\pi}} e^{inx} \Big|^2 dx \right)^{1/2}$$

that is:

$$\|D\| \ge \sup_{n \in \mathbb{Z}} |in| = \sup_{n \in \mathbb{Z}} |n| = +\infty$$

which implies that the derivative operator defined above is not bounded, and is therefore not continuous.

6.3. Invertibility of linear operators

Exercise 6.3 highlighted the importance of analyzing the inverse of a linear operator. This subject will be examined in greater detail in this section.

DEFINITION 6.6.– Let $V, W$ be two normed vector spaces on the same field $\mathbb{K}$ and let $A : V \to \operatorname{Im}(A) \subseteq W$ be a linear operator. The inverse operator of $A$ is $A^{-1} : \operatorname{Im}(A) \subseteq W \to V$ such that $\forall x \in V$:

$$A^{-1} : \operatorname{Im}(A) \subseteq W \to V, \quad Ax \mapsto A^{-1}(Ax) = x$$

If there exists $A^{-1}$, then $A$ is invertible.

For all $x \in V$, it holds that $A^{-1}(Ax) = x$ and $A(A^{-1}(Ax)) = A(x)$, thus the invertibility of $A$ can be defined in an equivalent manner with the conditions:

$$A^{-1} \circ A = \mathrm{id}_V \quad \text{and} \quad A \circ A^{-1} = \mathrm{id}_{\operatorname{Im}(A)}$$

In the specific case where $W = V$ and $\operatorname{Im}(A) = V$, the invertibility of $A$ is equivalent to the existence of an operator $A^{-1} : V \to V$ such that:

$$A \circ A^{-1} = A^{-1} \circ A = \mathrm{id}_V$$

If $A : V \to V$, the symbol $GL(V)$ is used to designate the set of continuous bijective linear operators with a continuous inverse, known as the set of regular elements in $\mathcal{B}(V)$. Theorem 6.11 summarizes the elementary properties of the inverse (the proofs of these properties are identical to those performed in finite dimension).
THEOREM 6.11.– Let $V, W$ be two normed vector spaces and let $A : V \to \operatorname{Im}(A) \subseteq W$ be linear:
1) If $A^{-1}$ exists, then it is unique;
2) If $A^{-1}$ exists, then it is a linear operator;
3) $A^{-1}$ exists if and only if $\ker(A) = \{0_V\}$, that is, a necessary and sufficient condition for $A$ to be invertible on its image is that its kernel is reduced to the zero vector of $V$.

PROOF.–
1) Let $B_1, B_2 : \operatorname{Im}(A) \subseteq W \to V$ be two inverse operators of $A$, then: $B_1 = B_1 \circ \mathrm{id}_{\operatorname{Im}(A)} = B_1 \circ (A \circ B_2) = (B_1 \circ A) \circ B_2 = \mathrm{id}_V \circ B_2 = B_2$.
2) For all $w_1, w_2 \in \operatorname{Im}(A)$ and $k \in \mathbb{K}$, we have, by the linearity of $A$: $A^{-1}(w_1 + kw_2) = A^{-1}(AA^{-1}(w_1) + kAA^{-1}(w_2)) = A^{-1}A(A^{-1}(w_1) + kA^{-1}(w_2)) = A^{-1}(w_1) + kA^{-1}(w_2)$.
3) We know that the inverse of $A$ can be defined on its image if and only if $A$ is injective. Let us verify that this is equivalent to $\ker(A) = \{0_V\}$. On one side, if $Ax = 0_W$, then $x = A^{-1}Ax = A^{-1}0_W = 0_V$ by linearity of $A^{-1}$, so if there exists $A^{-1}$, the kernel of $A$ is reduced to the zero vector of $V$. On the other side, taking $\ker(A) = \{0_V\}$ and $x_1, x_2 \in V$ such that $Ax_1 = Ax_2$, then $Ax_1 - Ax_2 = 0_W$, that is, by linearity of $A$, $A(x_1 - x_2) = 0_W$; but if $\ker(A) = \{0_V\}$ then $x_1 - x_2 = 0_V$, that is, $x_1 = x_2$, proving the injectivity of $A$. □

The condition $\ker(A) = \{0_V\}$ is necessary and sufficient for the invertibility of a linear operator on its image space $\operatorname{Im}(A)$ in finite and infinite dimensions. In finite dimensions, the inverse of a linear operator, if it exists, is always bounded. In infinite dimensions, on the other hand, the condition $\ker(A) = \{0_V\}$ does not imply any relationship between the continuity of $A$ and that of $A^{-1}$: $A$ may be bounded and have a non-bounded inverse or, conversely, $A$ may be non-bounded and have a bounded inverse. One classic example of this situation is given by the differentiation and integration operators.

An easier example is provided by the linear operator $A : \ell^2(\mathbb{N}, \mathbb{K}) \to \ell^2(\mathbb{N}, \mathbb{K})$ defined by $A(x_1, x_2, x_3, \dots, x_n, \dots) = (x_1, x_2/2, x_3/3, \dots, x_n/n, \dots)$, that is, $A((x_n)_{n \in \mathbb{N}^*}) = (x_n/n)_{n \in \mathbb{N}^*}$. $A$ is bounded and $\|A\| \le 1$. For all $x = (x_n)_{n \in \mathbb{N}^*} \in \ell^2(\mathbb{N}, \mathbb{K})$:

$$\|Ax\|^2_{\ell^2} = \sum_{n \in \mathbb{N}^*} \frac{|x_n|^2}{n^2} \le \sum_{n \in \mathbb{N}^*} |x_n|^2 = \|x\|^2_{\ell^2}$$

The operator $A^{-1} : \operatorname{Im}(A) \to \ell^2(\mathbb{N}, \mathbb{K})$, $A^{-1}((y_n)_{n \in \mathbb{N}^*}) = (ny_n)_{n \in \mathbb{N}^*}$, is evidently the inverse of $A$. Nevertheless, $A^{-1}$ is not bounded: we can verify this by considering the general element of the canonical basis of $\ell^2(\mathbb{N}, \mathbb{K})$, that is, $e_n = (0, 0, \dots, 1, 0, 0, \dots)$, where 1 is in the position $n$; note that $e_n = A(ne_n) \in \operatorname{Im}(A)$. We see that, on one side, $\|e_n\|_{\ell^2} = 1$ $\forall n \in \mathbb{N}^*$, and, on the other side, $\|A^{-1}e_n\|_{\ell^2} = n$, hence $\|A^{-1}\| \ge \sup_{n \in \mathbb{N}^*} \|A^{-1}e_n\|_{\ell^2} = +\infty$.
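The behavior of this example can be seen on finite truncations. The following is a minimal sketch, assuming the $N \times N$ diagonal matrices $\operatorname{diag}(1, 1/2, \dots, 1/N)$ as finite-dimensional stand-ins for $A$ (the helper `A_trunc` is introduced here for illustration): the norm of the truncated operator stays equal to 1, while the norm of its inverse equals $N$ and so grows without bound.

```python
import numpy as np

def A_trunc(N):
    """N x N truncation of the diagonal operator (x_n) -> (x_n / n), n = 1..N."""
    return np.diag(1.0 / np.arange(1, N + 1))

sizes = (10, 100, 400)

# Operator norms (largest singular values) of the truncations and of their inverses.
norm_A = [np.linalg.norm(A_trunc(N), 2) for N in sizes]
norm_A_inv = [np.linalg.norm(np.linalg.inv(A_trunc(N)), 2) for N in sizes]
```

In finite dimension each inverse is bounded, in agreement with Theorem 6.4; it is only the absence of a bound uniform in `N` that reflects the unboundedness of $A^{-1}$ on $\operatorname{Im}(A)$.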
A very useful characterization exists for the bounded invertibility of linear operators. It is important to note that this characterization holds independently of the continuity of the operator, making it particularly helpful in practical applications.

THEOREM 6.12 (Bounded invertibility of a linear operator).– If $V$ and $W$ are two normed vector spaces and $A : V \to W$ is a linear operator (not necessarily bounded), then $\exists A^{-1} \in \mathcal{B}(\operatorname{Im}(A), V)$ if and only if $\exists \mu > 0$ such that $\|Ax\| \ge \mu\|x\|$ $\forall x \in V$.

PROOF.–
$\implies$: suppose that $\exists A^{-1} \in \mathcal{B}(\operatorname{Im}(A), V)$; then, by definition, $\exists m > 0$ such that $\forall y \in \operatorname{Im}(A)$: $\|A^{-1}y\| \le m\|y\|$. Since $A$ is invertible and $y \in \operatorname{Im}(A)$, $\exists x \in V$ such that we can write $y = Ax$; then $\|x\| = \|A^{-1}Ax\| \le m\|Ax\|$, that is, $\|Ax\| \ge \frac{1}{m}\|x\|$ with $\mu := \frac{1}{m} > 0$ and, since $y$ is an arbitrary element in $\operatorname{Im}(A)$, the inequality holds for all $x \in V$.

$\impliedby$: suppose that $\|Ax\| \ge \mu\|x\|$ $\forall x \in V$, with $\mu > 0$; then, in particular, if we consider $x \in \ker(A)$:

$$\|Ax\| = \|0\| = 0 \ge \mu\|x\| \iff \|x\| = 0 \iff x = 0_V \implies \ker(A) = \{0_V\}$$

where the first equivalence uses $\mu > 0$; that is, $\exists A^{-1} : \operatorname{Im}(A) \to V$. We must therefore prove that $A^{-1}$ is bounded. For all $y \in \operatorname{Im}(A)$, writing $x = A^{-1}y$, we have: $\|Ax\| \ge \mu\|x\| \iff \|AA^{-1}y\| \ge \mu\|A^{-1}y\| \iff \|y\| \ge \mu\|A^{-1}y\| \iff \|A^{-1}y\| \le \frac{1}{\mu}\|y\|$, $\forall y \in \operatorname{Im}(A)$, that is, $A^{-1}$ is bounded. □

The condition of the theorem is interpreted as follows. First, the fact that $\|Ax\| \ge \mu\|x\|$ guarantees that the kernel of $A$ consists solely of the zero vector. Furthermore, the inequality $\|Ax\| \ge \mu\|x\|$ is inverted with respect to the inequality which defines a bounded operator, so that it is well suited to guarantee that the inverse operator of $A$ is bounded.

One immediate consequence of the theorem shown above is that a linear operator $A : V \to W$ is bounded and has a bounded inverse (defined on its image) if and only if it satisfies the following condition:

$$\exists a, b > 0, \ a \le b : \quad a\|x\| \le \|Ax\| \le b\|x\| \quad \forall x \in V$$
that is, the norm of each vector of $V$ transformed by the action of $A$ is bounded from below and from above by the norm of the vector itself multiplied by two positive constants. This consideration has an important consequence for the images of bounded linear operators defined on Banach spaces, as stated in the next theorem.

THEOREM 6.13.– Let $V$ be a Banach space and $W$ an arbitrary normed vector space. Take $A \in \mathcal{B}(V, W)$. If $A$ is invertible with a bounded inverse, then $\operatorname{Im}(A)$ is a closed vector subspace of $W$.

PROOF.– From Theorem 6.12, we know that the condition $\exists A^{-1} \in \mathcal{B}(\operatorname{Im}(A), V)$ is equivalent to:

$$\exists a > 0 : \|x\| \le a\|Ax\| \quad \forall x \in V$$

We must prove that this condition implies that $\operatorname{Im}(A)$ is closed, that is, if $(y_n)_{n \in \mathbb{N}} \subset \operatorname{Im}(A)$ is such that $y_n \to y$ as $n \to +\infty$, then $y \in \operatorname{Im}(A)$. Since $y_n \in \operatorname{Im}(A)$, there exists $(x_n)_{n \in \mathbb{N}} \subset V$ such that $y_n = Ax_n$ $\forall n \in \mathbb{N}$, hence:

$$\|x_n - x_m\| \le a\|A(x_n - x_m)\| = a\|Ax_n - Ax_m\| = a\|y_n - y_m\| \xrightarrow[n, m \to +\infty]{} 0$$

because $(y_n)_{n \in \mathbb{N}}$ is a convergent, and thus Cauchy, sequence. The sequence $(x_n)_{n \in \mathbb{N}}$ must therefore also be Cauchy and, since $V$ is a Banach space, there exists $x \in V$ such that $x_n \to x$ as $n \to +\infty$. By the continuity of $A$, we obtain:

$$Ax = A \lim_{n \to +\infty} x_n = \lim_{n \to +\infty} Ax_n = \lim_{n \to +\infty} y_n = y$$

that is, $y \in \operatorname{Im}(A)$. □
There is a second condition which is sufficient to ensure the continuity of the inverse of a linear operator. The presentation of this condition relies on an intermediary result, which is, itself, one of the most important theorems in functional analysis (the proof of this theorem is beyond the scope of this book; we simply note that it is a consequence of Baire's category theorem).

THEOREM 6.14 (Open mapping theorem – Banach-Schauder).– Let $V$ and $W$ be two Banach spaces. If $A \in \mathcal{B}(V, W)$ is surjective, then $A$ is an open mapping, that is, $A$ transforms open subsets of $V$ into open subsets of $W$.

THEOREM 6.15 (Continuous inverse operator theorem in Banach spaces).– Let $V$ and $W$ be two Banach spaces. If $A \in \mathcal{B}(V, W)$ is bijective, that is, $\ker(A) = \{0_V\}$ and $A$ is surjective, then $A^{-1} \in \mathcal{B}(W, V)$, that is, $A^{-1}$ is continuous.

PROOF.– Recall the topological characterization of continuity: a function between two topological spaces is continuous if and only if the preimage of any open subset is open. By definition, the preimages under $A^{-1}$ are the images under $A$, hence $A^{-1}$ is continuous if and only if the image under $A$ of any open set is open; this property is guaranteed by the open mapping theorem. □

The continuous inverse theorem can be used to characterize operators belonging to the set $GL(V)$ for any given Banach space $V$.

THEOREM 6.16 (Characterization of $GL(V)$).– Let $V$ be a Banach space and $GL(V)$ the set of regular elements of the Banach algebra $\mathcal{B}(V)$ (linear bijections with continuous inverse). For an operator $A \in \mathcal{B}(V)$, the following two conditions are equivalent:
1) $A \in GL(V)$;
2) $\exists$ a linear operator $B$ defined on all $V$ such that $BA = \mathrm{id}_V$ and $AB = \mathrm{id}_V$.
If one of the two conditions is satisfied, then $B$ is unique and $B = A^{-1}$.

PROOF.–
$1) \implies 2)$: if $A \in GL(V)$, then we must simply consider $B = A^{-1}$ to prove the implication.

$2) \implies 1)$: the hypothesis $BA = \mathrm{id}_V$ implies that $\ker(A) = \{0\}$, that is, $A$ is injective. Reasoning by contradiction, if $x \neq 0$ and $Ax = 0$, then we would have $BAx = 0$, which contradicts the fact that $BAx = \mathrm{id}_V(x) = x \neq 0$. Furthermore, the hypothesis $AB = \mathrm{id}_V$ implies that $\operatorname{Im}(A) = V$; for all $x \in V$, it holds that $A(Bx) = AB(x) = \mathrm{id}_V(x) = x$, so any $x \in V$ can be seen as the image via $A$ of an element in $V$, that is, $Bx$, meaning that $A$ is surjective. Thus, the existence of $B$ such that the hypotheses $BA = \mathrm{id}_V$ and $AB = \mathrm{id}_V$ are valid implies that $A$ is a linear bijection, and that $\forall x \in V$, $B(Ax) = x$, that is, $B = A^{-1}$. Hence $A$ is bounded by hypothesis, injective and surjective; by the continuous inverse theorem, $B = A^{-1}$ is continuous, and therefore $A \in GL(V)$.

The final step is to prove uniqueness. Let $B$ and $B'$ be two operators which verify 2; then $A^{-1} = A^{-1}AB = B$ and $A^{-1} = A^{-1}AB' = B'$, hence $A^{-1} = B = B'$. □

Clearly, if $A \in GL(V)$, then we also have $A^{-1} \in GL(V)$, and if $A, B \in GL(V)$, then $AB \in GL(V)$ since $(AB)^{-1} = B^{-1}A^{-1}$, given that $ABB^{-1}A^{-1} = \mathrm{id}_V$ and $B^{-1}A^{-1}AB = \mathrm{id}_V$.

$GL(V)$ is therefore stable with respect to the product and inversion, and its unit element is $\mathrm{id}_V$, that is, $GL(V)$ is a group.

DEFINITION 6.7.– The group $GL(V)$ is called the general linear group of $V$.
6.4. The dual of a Hilbert space and the Riesz representation theorem

Again, let us consider $\mathcal{B}(V, W)$, where $V, W$ are two normed vector spaces. We know that $\mathcal{B}(V, W)$ is a Banach space with respect to the operator norm if $W$ is a Banach space. Consider the specific case in which $W$ is the field $\mathbb{K}$ on which $V$ is defined as a vector space. As $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$ is complete, $\mathcal{B}(V, \mathbb{K})$ is a Banach space, known as the dual of $V$ and noted $V^*$ (the notation $V'$ is sometimes used in the literature to denote a dual space). The elements of $V^*$ are known as the bounded linear functionals on $V$.

We could ask ourselves how the “dualization” process of $V$ can be iterated. For Hilbert spaces, the answer to this question is quite surprising: the dualization of any Hilbert space $\mathcal{H}$ is an involution, that is, $\mathcal{H}^{**} \simeq \mathcal{H}$, where $\simeq$ is an isomorphism between Hilbert spaces. $\mathcal{H}^{**}$ is called the bidual of $\mathcal{H}$. This is not true, in general, for Banach spaces; those which are isomorphic to their bidual are known as reflexive Banach spaces. The Banach spaces $L^p(X, \mathcal{A}, \mu)$ are reflexive for $1 < p < \infty$, but $L^1(X, \mathcal{A}, \mu)$ and $L^\infty(X, \mathcal{A}, \mu)$ are not.

Each functional $\varphi \in V^*$ transforms an element of $V$ into a scalar of $\mathbb{K}$. This transformation is represented using the following notation:

$$\varphi : V \to \mathbb{K}, \quad x \mapsto \varphi(x) = \langle \varphi, x \rangle$$

The notation $\langle \varphi, x \rangle$ comes from the fact that if $V$ is a Hilbert space, then any continuous linear functional $\varphi \in V^*$ acts as an inner product on the vectors of $V$. This statement forms the basis for a famous result first identified by Riesz, which will be shown and proved below.

THEOREM 6.17 (Riesz representation theorem).– Let $\mathcal{H}$ be a Hilbert space on $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$, and let $\mathcal{H}^*$ be the dual of $\mathcal{H}$. Then:

$$T : \mathcal{H} \to \mathcal{H}^*, \quad x \mapsto T_x$$

where:

$$T_x : \mathcal{H} \to \mathbb{K}, \quad y \mapsto T_x(y) = \langle y, x \rangle$$

is an isomorphism between $\mathcal{H}$ and $\mathcal{H}^*$ interpreted as Banach spaces, that is, $T$ is bijective, preserves the norms and:
– if $\mathbb{K} = \mathbb{R}$, then $T$ is linear;
– if $\mathbb{K} = \mathbb{C}$, then $T$ is antilinear.

The functional $T_x$ is called the Riesz representative of $x$ in $\mathcal{H}^*$.

Before presenting the proof, it is important to understand the reason for the antilinearity in the case $\mathbb{K} = \mathbb{C}$. We shall begin by analyzing the summation operation. For $x_1, x_2 \in \mathcal{H}$, the functional $T_{x_1 + x_2} : \mathcal{H} \to \mathbb{C}$ acts on $y \in \mathcal{H}$ as:
$$T_{x_1 + x_2}(y) = \langle y, x_1 + x_2 \rangle = \langle y, x_1 \rangle + \langle y, x_2 \rangle = T_{x_1}(y) + T_{x_2}(y)$$
thus $T_{x_1 + x_2} = T_{x_1} + T_{x_2}$. Now, consider the multiplication by a scalar $k \in \mathbb{C}$. The functional $T_{kx} : \mathcal{H} \to \mathbb{C}$ acts on $y \in \mathcal{H}$ as:
$$T_{kx}(y) = \langle y, kx \rangle = \bar{k} \langle y, x \rangle = \bar{k}\, T_x(y)$$
thus $T_{kx} = \bar{k}\, T_x$. Therefore:
$$T(x_1 + x_2) = T_{x_1} + T_{x_2}, \qquad T(kx) = \bar{k}\, T_x,$$
which explains why $T$ is antilinear if $\mathbb{K} = \mathbb{C}$. Evidently, if $\mathbb{K} = \mathbb{R}$, this distinction has no place and $T$ is linear.

The Riesz representation theorem owes its name to the fact that it allows all continuous linear functionals on a Hilbert space to be represented via inner products; notably, for any continuous linear functional $\phi$ on $\mathcal{H} = L^2(X, \mathcal{A}, \mu)$ there exists a unique element $f \in L^2(X, \mathcal{A}, \mu)$ such that $\phi = T_f$ with:
$$T_f : L^2(X, \mathcal{A}, \mu) \longrightarrow \mathbb{K}, \qquad g \longmapsto T_f(g) = \langle g, f \rangle = \int_X g \bar{f} \, d\mu$$
More generally, we know that all separable, infinite-dimensional Hilbert spaces are isomorphic to $\ell^2(\mathbb{N}, \mathbb{K})$, for which the inner product is defined by a series. These observations are the reason why continuous linear functionals are very often represented by finite sums, series or integrals in applications of functional analysis.
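The additivity and antilinearity of the Riesz map can be checked numerically in the finite-dimensional Hilbert space $\mathbb{C}^2$. The following is a minimal sketch, not part of the original text: the helper names (`inner`, `riesz`, `scale`, `add`) and the sample vectors are illustrative assumptions, and the inner product is taken linear in its first entry, as in this book.

```python
# Sketch: the Riesz map x -> T_x on C^2, with <y, x> linear in y, antilinear in x.
# All names and sample values are illustrative assumptions.

def inner(y, x):
    """Inner product <y, x> = sum_i y_i * conj(x_i)."""
    return sum(yi * xi.conjugate() for yi, xi in zip(y, x))

def riesz(x):
    """Return the functional T_x : y -> <y, x>."""
    return lambda y: inner(y, x)

def scale(k, x):
    """Scalar multiple k * x."""
    return [k * xi for xi in x]

def add(x1, x2):
    """Vector sum x1 + x2."""
    return [a + b for a, b in zip(x1, x2)]

x1 = [1 + 2j, -1j]
x2 = [0.5 + 0j, 3 - 1j]
y = [2 - 1j, 1 + 1j]
k = 2 + 3j

# T_{x1 + x2}(y) - (T_{x1}(y) + T_{x2}(y)) should vanish.
additive_gap = abs(riesz(add(x1, x2))(y) - (riesz(x1)(y) + riesz(x2)(y)))
# T_{k x1}(y) - conj(k) * T_{x1}(y) should vanish (antilinearity).
antilinear_gap = abs(riesz(scale(k, x1))(y) - k.conjugate() * riesz(x1)(y))
# T_{k x1}(y) - k * T_{x1}(y) does not vanish for non-real k.
linear_gap = abs(riesz(scale(k, x1))(y) - k * riesz(x1)(y))

print(additive_gap, antilinear_gap, linear_gap)
```

The first two printed gaps are zero up to rounding, while the third is not, which matches the discussion above: with a non-real scalar, $T$ cannot be linear.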
One final aspect to note before moving on to the proof is that if we consider the inner product in the way it is used in physics, that is, as antilinear with respect to the first entry and linear with respect to the second entry, then the definition of $T_x$ becomes $T_x(y) = \langle x, y \rangle$.

PROOF.– Since the linear or antilinear character of $T$ has already been examined, we shall start by verifying that $T$ is well defined, that is, $T_x$ is a bounded linear functional on $\mathcal{H}$. Taking $\alpha, \beta \in \mathbb{K}$ and $y, y_1, y_2 \in \mathcal{H}$:

– $T_x$ is linear (see footnote 2):
$$T_x(\alpha y_1 + \beta y_2) = \langle \alpha y_1 + \beta y_2, x \rangle = \alpha \langle y_1, x \rangle + \beta \langle y_2, x \rangle = \alpha T_x(y_1) + \beta T_x(y_2)$$

– $T_x$ is bounded: we begin by observing that $\|T_x(y)\| = |T_x(y)|$ since $T_x(y) \in \mathbb{K}$. Thus, by the Cauchy-Schwarz inequality:
$$\|T_x(y)\| = |T_x(y)| = |\langle y, x \rangle| \le \|x\| \|y\| \qquad [6.16]$$
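The fact that $T_x$ is a bounded linear operator between the Hilbert spaces $\mathcal{H}$ and $\mathbb{K}$ allows us to calculate the operator norm of $T_x$. With respect to this norm, $T$ is an isometry, that is, $\|T_x\|_{\mathcal{B}(\mathcal{H}, \mathbb{K})} = \|x\|_{\mathcal{H}}$ $\forall x \in \mathcal{H}$. The case of the zero vector is straightforward: if $x = 0_{\mathcal{H}}$ then $T_{0_{\mathcal{H}}}$ is the zero functional since $T_{0_{\mathcal{H}}}(y) = \langle y, 0_{\mathcal{H}} \rangle = 0$ $\forall y \in \mathcal{H}$, thus $\|0_{\mathcal{H}}\| = 0 = \|T_{0_{\mathcal{H}}}\|$.

Taking $x \in \mathcal{H}$, $x \neq 0_{\mathcal{H}}$, let us prove that $\|T_x\| \le \|x\|$ and that $\|x\| \le \|T_x\|$, in that order:

– $\|T_x\| \le \|x\|$: by [6.16] we can write $\|T_x(y)\| \le \|x\| \|y\|$ $\forall y \in \mathcal{H}$, hence:
$$\|T_x\| = \sup_{\|y\| = 1} |T_x(y)| \le \sup_{\|y\| = 1} \|x\| \|y\| = \|x\|$$

– $\|x\| \le \|T_x\|$: in this case, we can write:
$$\|x\|^2 = \langle x, x \rangle = T_x(x) = |T_x(x)| = \|T_x(x)\| \le \|T_x\| \|x\|$$
where the second equality is the definition of $T_x$, the third holds because $T_x(x) = \|x\|^2 \ge 0$, and the final inequality holds because $T_x$ is bounded. Since $\|x\| \neq 0$, the first and last members of the expression above can be divided by $\|x\|$, giving us $\|x\| \le \|T_x\|$.

In summary, $\|T_x\| = \|x\|$ $\forall x \in \mathcal{H}$, hence $T$ is an isometry and consequently $T$ is injective.

2 If we had defined $T_x(y) = \langle x, y \rangle$, then we would have $T_x(\alpha y_1 + \beta y_2) = \bar{\alpha} \langle x, y_1 \rangle + \bar{\beta} \langle x, y_2 \rangle = \bar{\alpha} T_x(y_1) + \bar{\beta} T_x(y_2)$, that is, $T_x$ would be an antilinear functional. It is thus impossible to avoid antilinearity either in $T$ or $T_x$.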
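The final step in the proof is to demonstrate that $T$ is surjective, that is, for all $\phi \in \mathcal{H}^*$ there exists $x \in \mathcal{H}$ such that $\phi = T_x$. The argument which Riesz used to demonstrate the surjectivity of $T$ is particularly elegant.

First, if $\phi$ is the identically zero functional, then $\phi = T_{0_{\mathcal{H}}}$. Now, let $\phi$ be a functional that is not identically zero, and consider its kernel:

– $0_{\mathcal{H}} \in \ker(\phi)$ by linearity of $\phi$, thus $\ker(\phi) \neq \emptyset$;
– since $\phi \neq 0$, there exists at least one vector in $\mathcal{H}$ that is not sent to zero by $\phi$, that is, $\ker(\phi) \neq \mathcal{H}$;
– as we saw in Theorem 6.8, $\ker(\phi)$ is always closed.

Thus, $\ker(\phi)$ is a closed proper subspace of $\mathcal{H}$; based on this observation, Theorem 5.4 can be used to guarantee that $\ker(\phi)^\perp \neq \{0_{\mathcal{H}}\}$, that is, there exists at least one $u \neq 0_{\mathcal{H}}$, $u \in \ker(\phi)^\perp$.

Now, we note that since $\ker(\phi) \cap \ker(\phi)^\perp = \{0_{\mathcal{H}}\}$ and since $u \neq 0_{\mathcal{H}}$, we have $u \notin \ker(\phi)$, so $\phi(u) \neq 0$ and, for all $y \in \mathcal{H}$, the vector $z = y - \frac{\phi(y)}{\phi(u)} u$ is well defined. Moreover, $z \in \ker(\phi)$ since, by linearity:
$$\phi(z) = \phi\left(y - \frac{\phi(y)}{\phi(u)} u\right) = \phi(y) - \frac{\phi(y)}{\phi(u)} \phi(u) = 0$$
In short:
$$u \in \ker(\phi)^\perp, \qquad z = y - \frac{\phi(y)}{\phi(u)} u \in \ker(\phi)$$
hence:
$$0 = \langle z, u \rangle = \left\langle y - \frac{\phi(y)}{\phi(u)} u, u \right\rangle = \langle y, u \rangle - \frac{\phi(y)}{\phi(u)} \langle u, u \rangle = \langle y, u \rangle - \frac{\phi(y)}{\phi(u)} \|u\|^2$$
that is:
$$\phi(y) = \frac{\phi(u)}{\|u\|^2} \langle y, u \rangle = \left\langle y, \frac{\overline{\phi(u)}}{\|u\|^2} u \right\rangle \qquad \forall y \in \mathcal{H}$$
where the conjugate appears because the inner product is antilinear in its second entry. Hence, for any vector $u \in \ker(\phi)^\perp$, $u \neq 0_{\mathcal{H}}$, the vector $x = \frac{\overline{\phi(u)}}{\|u\|^2} u$ is such that:
$$\phi(y) = \langle y, x \rangle = T_x(y), \qquad \forall y \in \mathcal{H}$$
that is, $\phi = T_x$. This proves that $T$ is surjective and concludes the proof. □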
The final step of the proof above actually demonstrates an even finer result: the orthogonal complement of the kernel of a non-zero bounded linear functional on a Hilbert space $\mathcal{H}$ is a straight line in $\mathcal{H}$.
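The construction used in the surjectivity argument can be illustrated in $\mathbb{C}^3$. The sketch below is only an illustration under stated assumptions: the functional is the hypothetical $\phi(y) = \sum_i c_i y_i$ with arbitrarily chosen coefficients `c`, and it uses the elementary fact that, in this setting, $\ker(\phi)^\perp$ is spanned by the conjugate of `c`. It builds $x = \overline{\phi(u)}\, u / \|u\|^2$ from two different generators $u$ and checks that both give the same representative.

```python
# Sketch: Riesz representative of a functional on C^3 built from u in ker(phi)^perp.
# The coefficients c and the test vectors are illustrative assumptions.

def inner(y, x):
    """Inner product <y, x>, linear in y and antilinear in x."""
    return sum(yi * xi.conjugate() for yi, xi in zip(y, x))

c = [1 + 1j, 2 - 1j, -3j]  # hypothetical coefficients defining phi

def phi(y):
    """Bounded linear functional phi(y) = sum_i c_i * y_i."""
    return sum(ci * yi for ci, yi in zip(c, y))

def representer(u):
    """x = conj(phi(u)) / ||u||^2 * u, for a non-zero u in ker(phi)^perp."""
    factor = phi(u).conjugate() / inner(u, u).real
    return [factor * ui for ui in u]

# phi(y) = <y, conj(c)>, so ker(phi)^perp is spanned by conj(c).
u = [ci.conjugate() for ci in c]
u_scaled = [(0.5 - 2j) * ui for ui in u]  # another generator of the same line

x = representer(u)
x_scaled = representer(u_scaled)

y = [0.3 + 1j, -2 + 0.5j, 1 - 1j]
representation_gap = abs(phi(y) - inner(y, x))
choice_gap = max(abs(a - b) for a, b in zip(x, x_scaled))
print(representation_gap, choice_gap)
```

Both printed values are zero up to rounding: $\phi(y) = \langle y, x \rangle$, and the vector $x$ does not depend on which non-zero $u$ is picked on the line $\ker(\phi)^\perp$.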
COROLLARY 6.1.– Let $\mathcal{H}$ be a Hilbert space and take $\phi \in \mathcal{H}^*$, $\phi \not\equiv 0$. Then $\ker(\phi)^\perp$ is a one-dimensional vector subspace of $\mathcal{H}$, that is, $\dim(\ker(\phi)^\perp) = 1$. One generator of this space is the residual vector $x - P_{\ker \phi}\, x$, where $x \in \mathcal{H}$ is such that $\phi = T_x$ via the Riesz isomorphism.

PROOF.– In the final part of the proof of the Riesz representation theorem, we showed that if $\phi$ is not the identically zero functional, then for any given $u \in \ker(\phi)^\perp$, $u \neq 0_{\mathcal{H}}$, the vector
$$x = \frac{\overline{\phi(u)}}{\|u\|^2} u$$
is the vector in $\mathcal{H}$ which is identified with $\phi$ via the formula $\phi = T_x$.

Reasoning by contradiction, if $\ker(\phi)^\perp$ has a dimension greater than 1, then there exists at least one other generator, which we shall denote $u' \neq u$, $u' \neq 0_{\mathcal{H}}$, $u' \in \ker(\phi)^\perp$, where $u$ and $u'$ are linearly independent. Since $\ker(\phi)^\perp$ is a vector space, the Gram-Schmidt algorithm can be applied to orthonormalize the pair $(u, u')$ and obtain the pair $(\tilde{u}, \tilde{u}') \in \ker(\phi)^\perp \times \ker(\phi)^\perp$, with $\|\tilde{u}\| = \|\tilde{u}'\| = 1$ and $\tilde{u} \perp \tilde{u}'$. We define the vectors:
$$x = \frac{\overline{\phi(\tilde{u})}}{\|\tilde{u}\|^2} \tilde{u} = \overline{\phi(\tilde{u})}\, \tilde{u}, \qquad x' = \frac{\overline{\phi(\tilde{u}')}}{\|\tilde{u}'\|^2} \tilde{u}' = \overline{\phi(\tilde{u}')}\, \tilde{u}'$$
which are themselves orthogonal, so Pythagoras' theorem can be used to compute the squared norm of their difference:
$$\|x - x'\|^2 = \|x + (-x')\|^2 = \|x\|^2 + \|x'\|^2 = |\phi(\tilde{u})|^2 \|\tilde{u}\|^2 + |\phi(\tilde{u}')|^2 \|\tilde{u}'\|^2 > 0$$
since $\phi(\tilde{u})$, $\phi(\tilde{u}')$ and the norms of $\tilde{u}$ and $\tilde{u}'$ are $\neq 0$ (recall that a non-zero vector of $\ker(\phi)^\perp$ cannot belong to $\ker(\phi)$). Consequently, $x \neq x'$, so we would have two different vectors in $\mathcal{H}$, $x$ and $x'$, associated with the same functional $\phi \in \mathcal{H}^*$. This is incompatible with the injectivity of the Riesz map.

Furthermore, since $x = \frac{\overline{\phi(u)}}{\|u\|^2} u$ and $u \in \ker(\phi)^\perp$, we have $x \notin \ker \phi$ and so Theorem 5.4 tells us that the residual vector of the orthogonal projection of $x$ onto $\ker \phi$, that is, $x - P_{\ker \phi}\, x$, belongs to $\ker(\phi)^\perp$. □
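REMARK.– In light of this discussion, the inverse of the Riesz map can be expressed as:
$$T^{-1} : \mathcal{H}^* \longrightarrow \mathcal{H}, \qquad \phi \longmapsto T^{-1}(\phi) = x = \frac{\overline{\phi(u)}}{\|u\|^2} u$$
where $u \neq 0_{\mathcal{H}}$ is an arbitrary vector in $\ker(\phi)^\perp$. Since $\dim(\ker(\phi)^\perp) = 1$, in order to verify that this definition is well posed, we must simply verify that if $k \in \mathbb{K}$, $k \neq 0$, then the vector $x'$ associated with $\phi$ via $u' = ku$ (as an arbitrary non-zero element of the one-dimensional subspace $\ker(\phi)^\perp$) is the same:
$$x' = \frac{\overline{\phi(u')}}{\|u'\|^2} u' = \frac{\bar{k}\, \overline{\phi(u)}}{|k|^2 \|u\|^2} k u = \frac{\bar{k} k}{|k|^2} \frac{\overline{\phi(u)}}{\|u\|^2} u = \frac{\overline{\phi(u)}}{\|u\|^2} u = x$$
hence the definition of $T^{-1}$ does not depend on the choice of the vector $u \neq 0_{\mathcal{H}}$, $u \in \ker(\phi)^\perp$.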
6.4.1. The scalar product induced on the dual of a Hilbert space

In the context of the Riesz representation theorem, we saw that a Hilbert space $\mathcal{H}$ and its dual $\mathcal{H}^*$ can be identified as Banach spaces, since the isometry of the transformation $T$ draws only on the norms of $\mathcal{H}$ and $\mathcal{H}^*$. It is possible to go even further, and identify these as Hilbert spaces.

The first step is to introduce an inner product on $\mathcal{H}^*$. This can be done using the Riesz isomorphism $T : \mathcal{H} \to \mathcal{H}^*$: any bounded linear functional of $\mathcal{H}^*$ is the image of a vector in $\mathcal{H}$ and, as we know the inner product of $\mathcal{H}$, there is no risk of ambiguity if we define the inner product on $\mathcal{H}^*$ as:
$$\langle \phi, \psi \rangle_{\mathcal{H}^*} := \langle T^{-1}\phi, T^{-1}\psi \rangle_{\mathcal{H}}, \qquad \forall \phi, \psi \in \mathcal{H}^*$$
The fact that $T$ preserves the norm guarantees that this definition of inner product will be compatible with the pre-existing Banach space structure on $\mathcal{H}^*$. If $\phi = T_x$, that is, $\phi$ is the functional which can be identified with the image of the vector $x \in \mathcal{H}$ via $T$, then:
$$\|\phi\|^2 = \langle T^{-1}(T_x), T^{-1}(T_x) \rangle = \langle x, x \rangle = \|x\|^2 = \|T_x\|^2$$
where the final equality is a consequence of the Riesz representation theorem.

The compatibility between the co-existing structures of inner product space and complete normed space implies that $\mathcal{H}^*$, equipped with the inner product induced by the Riesz isomorphism $T$, is itself a Hilbert space; thus, $T$ becomes an (antilinear) isomorphism between the Hilbert spaces $\mathcal{H}$ and $\mathcal{H}^*$.

The Riesz representation theorem is one of the most important results of functional analysis. In the following two sections, we discuss an extension of this result (called the Lax-Milgram theorem) and an extremely significant consequence of Riesz's theorem: each operator in $\mathcal{B}(\mathcal{H})$ can be unambiguously associated with another operator, called its adjoint, which plays a fundamental role in the analysis of projection and unitary operators, among other things.

6.5. Bilinear forms, sesquilinear forms and associated quadratic forms

The concept of a quadratic form associated with a bilinear or sesquilinear form could have been introduced in Chapter 1. However, we have decided to discuss this subject here because the connection between bounded linear operators in Hilbert spaces and quadratic forms leads directly to the definition of the adjoint operator, which will be presented in section 6.6.

DEFINITION 6.8 (quadratic form).– Let $\varphi : V \times V \to \mathbb{R}$ (resp. $\varphi : V \times V \to \mathbb{C}$) be a bilinear (resp. sesquilinear) form on the real (resp. complex) vector space $V$. The
function $\Phi : V \to \mathbb{R}$, resp. $\Phi : V \to \mathbb{C}$, defined by restriction of $\varphi$ to the diagonal of $V \times V$, that is:
$$\Phi(x) := \varphi(x, x),$$
is called the quadratic form associated with $\varphi$.

With the addition of positive-definiteness and symmetry (resp. conjugate symmetry) requirements, $\varphi$ becomes an inner product $\langle\, , \rangle$ and, in this case, $\Phi(x) = \langle x, x \rangle = \|x\|^2$ for all $x \in V$, that is, $\Phi$ is the square of the norm canonically associated with $\varphi$. This observation is the reason why $\Phi$ is known as the quadratic form.

Now, let us consider the concept of bounded forms.

DEFINITION 6.9.– If $(V, \|\,\|)$ is a normed vector space, then the form $\varphi : V \times V \to \mathbb{K}$, taken to be bilinear if $\mathbb{K} = \mathbb{R}$ and sesquilinear if $\mathbb{K} = \mathbb{C}$, is said to be bounded if there exists a constant $m > 0$ such that:
$$|\varphi(x, y)| \le m \|x\| \|y\|, \qquad \forall x, y \in V$$
Where applicable, the norm of $\varphi$ is defined by the formula:
$$\|\varphi\| := \inf\{m > 0 : |\varphi(x, y)| \le m \|x\| \|y\|, \ \forall x, y \in V\}$$
As in the case of operators in $\mathcal{B}(\mathcal{H})$, the norm of $\varphi$ can be rewritten in an equivalent, and highly useful, form:
$$\|\varphi\| = \sup_{\|x\| = \|y\| = 1} |\varphi(x, y)|$$
giving us:
$$|\varphi(x, y)| \le \|\varphi\| \|x\| \|y\|, \qquad \forall x, y \in V$$

DEFINITION 6.10 (bounded quadratic forms and their norm).– If $(V, \|\,\|)$ is a normed vector space, then the quadratic form $\Phi$ is said to be bounded if there exists a constant $k > 0$ such that:
$$|\Phi(x)| \le k \|x\|^2, \qquad \forall x \in V$$
The norm of a bounded quadratic form is defined by:
$$\|\Phi\| := \inf\{k > 0 : |\Phi(x)| \le k \|x\|^2, \ \forall x \in V\}$$
As we saw with the norm of $\varphi$, the norm of $\Phi$ can be rewritten as:
$$\|\Phi\| = \sup_{\|x\| = 1} |\Phi(x)|$$
giving us:
$$|\Phi(x)| \le \|\Phi\| \|x\|^2, \qquad \forall x \in V \qquad [6.17]$$
As in the case of inner products and their norms, the polarization formula can be used to completely describe a symmetric bilinear (or a sesquilinear) form via its associated quadratic form.

THEOREM 6.18.– Let $\varphi$ be a symmetric bilinear (resp. sesquilinear) form on $V$ and let $\Phi$ be its associated quadratic form. Then, for all $x, y \in V$:
$$4\varphi(x, y) = \Phi(x + y) - \Phi(x - y)$$
respectively:
$$4\varphi(x, y) = \Phi(x + y) - \Phi(x - y) + i\Phi(x + iy) - i\Phi(x - iy)$$
The proof is identical to that presented in section 1.2.1, where we saw that the bilinearity (together with the symmetry, in the real case) or the sesquilinearity of the form $\varphi$ is the only aspect required to prove the polarization formula. In the real case symmetry cannot be dropped, since $\Phi(x + y) - \Phi(x - y) = 2\varphi(x, y) + 2\varphi(y, x)$ for any bilinear form.

The following result is an immediate corollary of the polarization formula, and gives a condition which is equivalent to that set out in Theorem 6.10 for bilinear or sesquilinear forms.

COROLLARY 6.2.– Let $\varphi_1$ and $\varphi_2$ be two symmetric bilinear forms, or two sesquilinear forms, on $V$. Then:
$$\varphi_1 = \varphi_2 \iff \Phi_1 = \Phi_2,$$
that is:
$$\varphi_1(x, y) = \varphi_2(x, y) \ \ \forall x, y \in V \iff \varphi_1(x, x) = \varphi_2(x, x) \ \ \forall x \in V$$
that is, the equality of the quadratic forms is necessary and sufficient to characterize the equality of the forms with which they are associated.

Now, let us consider an important consequence of this corollary.

THEOREM 6.19.– A sesquilinear form $\varphi : V \times V \to \mathbb{C}$ is Hermitian if and only if its associated quadratic form $\Phi$ is real, that is, if $\Phi(x) \in \mathbb{R}$ $\forall x \in V$.
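PROOF.– Let us prove these two implications.

$\Longrightarrow$: Let $\varphi$ be Hermitian, that is, $\varphi(x, y) = \overline{\varphi(y, x)}$ $\forall x, y \in V$. Then:
$$\Phi(x) = \varphi(x, x) = \overline{\varphi(x, x)} = \overline{\Phi(x)}, \qquad \forall x \in V$$
that is, $\Phi$ is real.

$\Longleftarrow$: Now, taking $\Phi(x) = \overline{\Phi(x)}$, let us define a sesquilinear form $\psi : V \times V \to \mathbb{C}$ as follows: $\psi(x, y) = \overline{\varphi(y, x)}$. If we can show that $\psi = \varphi$, this will prove that $\varphi$ is Hermitian. To do this, we examine the quadratic form $\Psi$ associated with $\psi$:
$$\Psi(x) = \overline{\varphi(x, x)} = \overline{\Phi(x)} = \Phi(x), \qquad \forall x \in V$$
and, by Corollary 6.2, $\Psi = \Phi$ implies $\psi = \varphi$. □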
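The complex polarization formula of Theorem 6.18 and the criterion of Theorem 6.19 can be checked numerically on $\mathbb{C}^2$. The sketch below is an illustration only: the forms are defined from hypothetical $2 \times 2$ matrices through $\varphi(x, y) = \sum_{i,j} M_{ij}\, x_i\, \overline{y_j}$ (linear in $x$, antilinear in $y$), and all names and sample values are assumptions made for the example.

```python
# Sketch: polarization formula and "Hermitian <=> real quadratic form" on C^2.
# The matrices and vectors below are illustrative assumptions.

def make_form(M):
    """Sesquilinear form (x, y) -> sum_ij M[i][j] * x_i * conj(y_j)."""
    def form(x, y):
        return sum(M[i][j] * x[i] * y[j].conjugate()
                   for i in range(len(x)) for j in range(len(y)))
    return form

def quadratic(form):
    """Associated quadratic form x -> form(x, x)."""
    return lambda x: form(x, x)

def combo(x, k, y):
    """Vector x + k * y."""
    return [xi + k * yi for xi, yi in zip(x, y)]

def polarization(Phi, x, y):
    """Right-hand side of the complex polarization formula, divided by 4."""
    return (Phi(combo(x, 1, y)) - Phi(combo(x, -1, y))
            + 1j * Phi(combo(x, 1j, y)) - 1j * Phi(combo(x, -1j, y))) / 4

M_generic = [[1 + 2j, 3j], [2 - 1j, -1 + 0j]]        # not Hermitian
M_hermitian = [[2 + 0j, 1 - 1j], [1 + 1j, 3 + 0j]]   # M[i][j] = conj(M[j][i])

form_g = make_form(M_generic)
form_h = make_form(M_hermitian)

x = [1 + 0j, 1j]
y = [2 - 1j, 0.5 + 1j]

polarization_gap = abs(polarization(quadratic(form_g), x, y) - form_g(x, y))
imag_hermitian = abs(quadratic(form_h)(x).imag)
imag_generic = abs(quadratic(form_g)(x).imag)
print(polarization_gap, imag_hermitian, imag_generic)
```

The polarization formula recovers the generic form from its quadratic form without any symmetry assumption, and only the Hermitian matrix yields a real quadratic form at the sample vector.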
As a special case of the theorem just proven, if a sesquilinear form $\varphi$ is positive, and thus real, it must necessarily be Hermitian. This consideration provides additional justification for the definition of complex inner product given in Chapter 1 as a sesquilinear positive-definite Hermitian form.

Theorem 6.20 concerns the relationship between the boundedness of a bilinear or sesquilinear form $\varphi$ and that of its associated quadratic form.

THEOREM 6.20.– A symmetric bilinear form or a sesquilinear form $\varphi$ on a normed vector space $(V, \|\,\|)$ is bounded if and only if the associated quadratic form $\Phi$ is bounded. Furthermore:
– if $\varphi$ is real (and symmetric), then $\|\varphi\| = \|\Phi\|$;
– if $\varphi$ is complex, then its norm is contained in the interval between the norm of $\Phi$ and its double: $\|\Phi\| \le \|\varphi\| \le 2\|\Phi\|$.

PROOF.– We shall prove the first inequality by considering a real bilinear or complex sesquilinear form.

$\|\Phi\| \le \|\varphi\|$, $\varphi$ real or complex: by definition we have:
$$\|\Phi\| = \sup_{\|x\| = 1} |\Phi(x)| = \sup_{\|x\| = 1} |\varphi(x, x)| \underset{(*)}{\le} \sup_{\|x\| = \|y\| = 1} |\varphi(x, y)| = \|\varphi\|$$
where $(*)$ is due to the fact that the supremum is calculated on a larger set of values. If $\varphi$ is bounded, then $\Phi$ is also bounded, and the first inequality is valid.

$\|\varphi\| \le \|\Phi\|$, $\varphi$ real symmetric bilinear: now, taking $\Phi$ to be bounded, then, by the polarization formula and [6.17], we have:
$$|\varphi(x, y)| = \tfrac{1}{4} |\Phi(x + y) - \Phi(x - y)| \le \tfrac{1}{4} \|\Phi\| (\|x + y\|^2 + \|x - y\|^2) = \tfrac{1}{4} \|\Phi\| \cdot 2(\|x\|^2 + \|y\|^2) = \tfrac{1}{2} \|\Phi\| (\|x\|^2 + \|y\|^2)$$
by applying the parallelogram formula $\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2)$. Hence:
$$\|\varphi\| = \sup_{\|x\| = \|y\| = 1} |\varphi(x, y)| \le \sup_{\|x\| = \|y\| = 1} \tfrac{1}{2} \|\Phi\| (\|x\|^2 + \|y\|^2) = \|\Phi\|$$
Hence, a bounded $\Phi$ implies a bounded $\varphi$ and it holds that $\|\varphi\| \le \|\Phi\|$.

$\|\varphi\| \le 2\|\Phi\|$, $\varphi$ complex sesquilinear: taking $\Phi$ to be bounded, using the polarization formula and [6.17], we have:
$$|\varphi(x, y)| = \tfrac{1}{4} |\Phi(x + y) - \Phi(x - y) + i\Phi(x + iy) - i\Phi(x - iy)| \le \tfrac{1}{4} \|\Phi\| (\|x + y\|^2 + \|x - y\|^2 + \|x + iy\|^2 + \|x - iy\|^2)$$
In this case, the parallelogram formula gives us $\|x + iy\|^2 + \|x - iy\|^2 = 2(\|x\|^2 + \|iy\|^2) = 2(\|x\|^2 + |i|^2 \|y\|^2) = 2(\|x\|^2 + \|y\|^2)$, thus $\|x + y\|^2 + \|x - y\|^2 + \|x + iy\|^2 + \|x - iy\|^2 = 4(\|x\|^2 + \|y\|^2)$ and so:
$$|\varphi(x, y)| \le \|\Phi\| (\|x\|^2 + \|y\|^2)$$
which implies:
$$\|\varphi\| = \sup_{\|x\| = \|y\| = 1} |\varphi(x, y)| \le \sup_{\|x\| = \|y\| = 1} \|\Phi\| (\|x\|^2 + \|y\|^2) = 2\|\Phi\|$$
Thus, a bounded $\Phi$ implies that $\varphi$ is bounded, and it holds that $\|\varphi\| \le 2\|\Phi\|$. □
If a (complex) sesquilinear form $\varphi$ is also Hermitian, then we know that its associated quadratic form $\Phi$ is real. The theorem proved above guarantees the equality of the norms of $\varphi$ and $\Phi$ when $\varphi$ is a real symmetric bilinear form (and thus $\Phi$ is also real). These considerations naturally lead to the idea that a Hermitian (complex) sesquilinear form might have a norm which coincides with that of its (real) quadratic form. The following result confirms that this is the case.

THEOREM 6.21.– If a sesquilinear form $\varphi : V \times V \to \mathbb{C}$, where $(V, \|\,\|)$ is a normed vector space, is bounded and Hermitian, then $\|\varphi\| = \|\Phi\|$.

PROOF.– We have seen that the inequality $\|\Phi\| \le \|\varphi\|$ is always valid, so we must simply show that the opposite inequality is valid when $\varphi$ is Hermitian. Once again, consider the polarization formula:
$$\varphi(x, y) = \tfrac{1}{4} \left(\Phi(x + y) - \Phi(x - y) + i\Phi(x + iy) - i\Phi(x - iy)\right)$$
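Since $\Phi$ is real, the real part of both sides is:
$$\Re(\varphi(x, y)) = \tfrac{1}{4} \left(\Phi(x + y) - \Phi(x - y)\right)$$
Using equation [6.17] and the parallelogram formula, we can write the following inequality:
$$|\Re(\varphi(x, y))| \le \tfrac{1}{4} \|\Phi\| (\|x + y\|^2 + \|x - y\|^2) = \tfrac{1}{2} \|\Phi\| (\|x\|^2 + \|y\|^2) \qquad [6.18]$$
If $\theta \in [0, 2\pi)$ is such that $\varphi(x, y) = |\varphi(x, y)| e^{i\theta}$, then, by linearity in the first entry of $\varphi$:
$$0 \le |\varphi(x, y)| = e^{-i\theta} \varphi(x, y) = \varphi(e^{-i\theta} x, y)$$
that is, $\varphi(e^{-i\theta} x, y)$ is a real non-negative quantity, and thus it coincides with its real part and also with its magnitude, hence $|\varphi(x, y)| = |\Re(\varphi(e^{-i\theta} x, y))|$. Using equation [6.18] for the pair $(e^{-i\theta} x, y)$, and noting that $\|e^{-i\theta} x\| = \|x\|$, we obtain:
$$\|\varphi\| = \sup_{\|x\| = \|y\| = 1} |\varphi(x, y)| = \sup_{\|x\| = \|y\| = 1} |\Re(\varphi(e^{-i\theta} x, y))| \le \sup_{\|x\| = \|y\| = 1} \tfrac{1}{2} \|\Phi\| (\|x\|^2 + \|y\|^2) = \|\Phi\|$$
□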
Now, let us consider the important relationship between bounded bilinear or sesquilinear forms defined on a Hilbert space $\mathcal{H}$ and the operators of $\mathcal{B}(\mathcal{H})$. The two results presented below are essential for defining the adjoint of a bounded operator.

THEOREM 6.22.– For all fixed $A \in \mathcal{B}(\mathcal{H})$, the bilinear form (if $\mathcal{H}$ is real) or sesquilinear form (if $\mathcal{H}$ is complex) $\varphi_A$ on $\mathcal{H}$ defined by:
$$\varphi_A(x, y) = \langle Ax, y \rangle \qquad \text{or} \qquad \varphi_A(x, y) = \langle x, Ay \rangle$$
is bounded, and it holds that $\|\varphi_A\| = \|A\|$.

PROOF.– Consider the definition $\varphi_A(x, y) = \langle Ax, y \rangle$: the proof for the other definition is similar. By the Cauchy-Schwarz inequality and [6.11], we observe that:
$$|\varphi_A(x, y)| = |\langle Ax, y \rangle| \le \|Ax\| \|y\| \le \|A\| \|x\| \|y\|, \qquad \forall x, y \in \mathcal{H}$$
hence $\varphi_A$ is bounded and:
$$\|\varphi_A\| = \sup_{\|x\| = \|y\| = 1} |\varphi_A(x, y)| \le \sup_{\|x\| = \|y\| = 1} \|A\| \|x\| \|y\| = \|A\|$$
thus $\|\varphi_A\| \le \|A\|$. Now, we shall prove the equality of the norms by demonstrating that $\|A\| \le \|\varphi_A\|$. First, we note that $\varphi_A(x, Ax) = \langle Ax, Ax \rangle = \|Ax\|^2 \ge 0$, so it holds that $\|Ax\|^2 = |\varphi_A(x, Ax)|$. Then, given that $\varphi_A$ is bounded:
$$\|Ax\|^2 = |\varphi_A(x, Ax)| \le \|\varphi_A\| \|x\| \|Ax\|$$
If $Ax \neq 0$, then both sides of the previous inequality can be divided by $\|Ax\|$, giving us $\|Ax\| \le \|\varphi_A\| \|x\|$. If $Ax = 0$, then the inequality $\|Ax\| \le \|\varphi_A\| \|x\|$ is written as $0 \le \|\varphi_A\| \|x\|$, which is trivially true. Thus, the inequality $\|Ax\| \le \|\varphi_A\| \|x\|$ holds with no constraints, and we can write:
$$\|A\| = \sup_{\|x\| = 1} \|Ax\| \le \sup_{\|x\| = 1} \|\varphi_A\| \|x\| = \|\varphi_A\|$$
that is, $\|A\| \le \|\varphi_A\|$. □
If we write $\mathrm{Bil}_b(\mathcal{H})$, resp. $\mathrm{Sesq}_b(\mathcal{H})$, to denote the vector space (with respect to the pointwise defined linear operations) of the bounded bilinear, resp. sesquilinear, forms on $\mathcal{H}$, then the mapping:
$$\mathcal{B}(\mathcal{H}) \longrightarrow \mathrm{Bil}_b(\mathcal{H}), \quad A \longmapsto \varphi_A \qquad \text{or} \qquad \mathcal{B}(\mathcal{H}) \longrightarrow \mathrm{Sesq}_b(\mathcal{H}), \quad A \longmapsto \varphi_A$$
is an isometric inclusion. The mapping defined by $\mathcal{B}(\mathcal{H}) \ni A \mapsto \varphi_A \in \mathrm{Bil}_b(\mathcal{H})$ is linear. The mapping given by $\mathcal{B}(\mathcal{H}) \ni A \mapsto \varphi_A \in \mathrm{Sesq}_b(\mathcal{H})$ is also linear if we define $\varphi_A(x, y) = \langle Ax, y \rangle$, but it is antilinear if we define $\varphi_A(x, y) = \langle x, Ay \rangle$.

By isometry, we can add a further characterization of the norm of an operator $A \in \mathcal{B}(\mathcal{H})$.

COROLLARY 6.3 (Fifth characterization of the norm of an operator in $\mathcal{B}(\mathcal{H})$).– For all $A \in \mathcal{B}(\mathcal{H})$ it holds that:
$$\|A\| = \sup_{\|x\| = \|y\| = 1} |\langle Ax, y \rangle| \qquad [6.19]$$
The following result tells us that the map which associates a bounded operator with a bounded bilinear or sesquilinear form is not only an isometric inclusion, but is also surjective, that is, any bounded bilinear or sesquilinear form on a Hilbert space $\mathcal{H}$ is defined by one, and only one, operator in $\mathcal{B}(\mathcal{H})$. In short, the correspondence

bounded operator $\iff$ bounded bilinear or sesquilinear form

is an isometric isomorphism.
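THEOREM 6.23.– Let $\mathcal{H}$ be a Hilbert space on $\mathbb{K} = \mathbb{R}, \mathbb{C}$. For any bounded bilinear form $\varphi : \mathcal{H} \times \mathcal{H} \to \mathbb{K}$ if $\mathbb{K} = \mathbb{R}$, or any bounded sesquilinear form if $\mathbb{K} = \mathbb{C}$, there exists a unique operator $B \in \mathcal{B}(\mathcal{H})$ such that $\varphi = \varphi_B$, that is:
$$\varphi(x, y) = \langle Bx, y \rangle \qquad \text{or:} \qquad \varphi(x, y) = \langle x, By \rangle, \qquad \forall x, y \in \mathcal{H}$$

PROOF.– For the purposes of our proof, let us consider the definition $\varphi_B(x, y) = \langle x, By \rangle$; the proof for the other one is analogous.

Injectivity: Theorem 6.22 guarantees that, for all $B \in \mathcal{B}(\mathcal{H})$, $\varphi_B(x, y) = \langle x, By \rangle$ is a bounded bilinear or sesquilinear form. Now, take $B_1, B_2 \in \mathcal{B}(\mathcal{H})$ such that $\varphi = \varphi_{B_1} = \varphi_{B_2}$, that is, $\varphi(x, y) = \langle x, B_1 y \rangle = \langle x, B_2 y \rangle$ $\forall x, y \in \mathcal{H}$; then, by Theorem 6.10, $B_1 = B_2$.

Surjectivity: Taking an arbitrary fixed bounded bilinear or sesquilinear form $\varphi : \mathcal{H} \times \mathcal{H} \to \mathbb{K}$ and an arbitrary element $y \in \mathcal{H}$, the map:
$$\varphi_y : \mathcal{H} \longrightarrow \mathbb{K}, \qquad x \longmapsto \varphi_y(x) := \varphi(x, y)$$
is clearly a bounded linear functional on $\mathcal{H}$, that is, $\varphi_y \in \mathcal{H}^*$. By the Riesz representation theorem, there exists a unique element $\xi_y \in \mathcal{H}$ such that $\varphi_y = T_{\xi_y} = T(\xi_y)$, where $T$ is the Riesz isomorphism and $T_{\xi_y} \in \mathcal{H}^*$ is the Riesz representative of $\xi_y \in \mathcal{H}$, which has an action on any $x \in \mathcal{H}$ defined by $T_{\xi_y}(x) = \langle x, \xi_y \rangle$. In short, $\forall x, y \in \mathcal{H}$, we know that $\varphi(x, y) = \varphi_y(x) = \langle x, \xi_y \rangle$, and thus the property of surjectivity will be proven if we can show that the map:
$$B : \mathcal{H} \longrightarrow \mathcal{H}, \qquad y \longmapsto By := \xi_y$$
is a bounded linear operator on $\mathcal{H}$, since in this case it holds that $\varphi(x, y) = \langle x, By \rangle$ $\forall x, y \in \mathcal{H}$.

Taking arbitrary $x, y_1, y_2 \in \mathcal{H}$ and $\alpha_1, \alpha_2 \in \mathbb{K}$, and using the antilinearity of both $\varphi$ and the inner product in the second entry (the conjugations are immaterial if $\mathbb{K} = \mathbb{R}$), we have:
$$\langle x, \xi_{\alpha_1 y_1 + \alpha_2 y_2} \rangle = \varphi(x, \alpha_1 y_1 + \alpha_2 y_2) = \bar{\alpha}_1 \varphi(x, y_1) + \bar{\alpha}_2 \varphi(x, y_2) = \bar{\alpha}_1 \langle x, \xi_{y_1} \rangle + \bar{\alpha}_2 \langle x, \xi_{y_2} \rangle = \langle x, \alpha_1 \xi_{y_1} \rangle + \langle x, \alpha_2 \xi_{y_2} \rangle = \langle x, \alpha_1 \xi_{y_1} + \alpha_2 \xi_{y_2} \rangle$$
Since this holds for all $x \in \mathcal{H}$, it shows the linearity of the correspondence $\mathcal{H} \ni y \mapsto \xi_y = By \in \mathcal{H}$.

To show that $B$ is bounded, we observe that, since $\varphi$ is bounded, there exists $k > 0$ such that:
$$|\langle x, By \rangle| = |\varphi(x, y)| \le k \|x\| \|y\| \qquad \forall x, y \in \mathcal{H}$$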
Due to the arbitrary nature of $x$, we know that the inequality also holds when $x = By$, that is:
$$\|By\|^2 = |\langle By, By \rangle| \le k \|By\| \|y\| \qquad \forall y \in \mathcal{H}$$
hence $\|By\| \le k \|y\|$ $\forall y \in \mathcal{H}$ such that $By \neq 0$, and when $By = 0$ the inequality $\|By\| \le k \|y\|$ is trivially true, so it holds that $\|By\| \le k \|y\|$ $\forall y \in \mathcal{H}$, that is, $B$ is bounded. □

6.5.1. The Lax-Milgram theorem and its consequences

In 1954, Peter Lax and Arthur Milgram presented a simple and elegant proof of a remarkable consequence of Theorem 6.23, generalizing the Riesz representation theorem to bilinear or sesquilinear forms. One of the hypotheses required to obtain this result is defined below.

DEFINITION 6.11 (coercive or $V$-elliptic forms).– Let $(V, \|\,\|)$ be a normed vector space. A bilinear or sesquilinear form $\varphi : V \times V \to \mathbb{K}$, $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$, is said to be coercive or $V$-elliptic if there exists a constant $K > 0$ such that:
$$\Phi(x) \ge K \|x\|^2, \qquad \forall x \in V$$
It is evident that an inner product on $V$ is a coercive form, as, in this case, $\Phi(x) = \langle x, x \rangle = \|x\|^2 \ge K \|x\|^2$ $\forall x \in V$ with $0 < K \le 1$.

The following example is less trivial. If $z \in C([0, 1], \mathbb{R})$ is such that $\min_{t \in [0,1]} z(t) > 0$, then the bilinear form:
$$\varphi_z : L^2[0, 1] \times L^2[0, 1] \longrightarrow \mathbb{R}, \qquad (x, y) \longmapsto \varphi_z(x, y) := \int_0^1 x(t) y(t) z(t) \, dt$$
is coercive since:
$$\Phi_z(x) = \int_0^1 |x(t)|^2 z(t) \, dt \ge \int_0^1 |x(t)|^2 \min_{t \in [0,1]} z(t) \, dt = \min_{t \in [0,1]} z(t) \int_0^1 |x(t)|^2 \, dt = K \|x\|^2$$
where $K = \min_{t \in [0,1]} z(t)$.
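The coercivity estimate of this example can be observed numerically. The sketch below is an illustration under assumptions that are not in the original text: the weight $z(t) = 2 + \sin(2\pi t)$ (whose minimum on $[0,1]$ is 1 and maximum is 3), a handful of sample functions, and a simple midpoint rule standing in for the integral.

```python
import math

# Sketch: checking Phi_z(x) >= K * ||x||^2 for the weighted L^2 form of the example.
# The weight z, the sample functions and the quadrature rule are assumptions.

def integrate(f, n=2000):
    """Midpoint rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

def z(t):
    """Hypothetical continuous weight with min z = 1 and max z = 3 on [0, 1]."""
    return 2.0 + math.sin(2 * math.pi * t)

def Phi_z(x):
    """Quadratic form Phi_z(x) = integral of x(t)^2 * z(t)."""
    return integrate(lambda t: x(t) ** 2 * z(t))

def norm_sq(x):
    """Squared L^2 norm of x on [0, 1]."""
    return integrate(lambda t: x(t) ** 2)

K = 1.0  # coercivity constant: the minimum of z on [0, 1]

samples = [
    lambda t: 1.0,
    lambda t: t,
    lambda t: math.cos(5 * t),
    lambda t: t ** 2 - 0.5,
]

# Each ratio Phi_z(x) / ||x||^2 should lie between min z and max z.
ratios = [Phi_z(x) / norm_sq(x) for x in samples]
print(ratios)
```

Every ratio lies between the minimum and the maximum of the weight, which is the coercivity bound from below together with the boundedness of $\varphi_z$ from above.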
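THEOREM 6.24 (Lax-Milgram theorem).– Let $\mathcal{H}$ be a Hilbert space on $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$ and let $\varphi : \mathcal{H} \times \mathcal{H} \to \mathbb{K}$ be a bounded and coercive bilinear form if $\mathbb{K} = \mathbb{R}$, or a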
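bounded and coercive sesquilinear form if $\mathbb{K} = \mathbb{C}$. Then, for any bounded linear functional $\phi \in \mathcal{H}^*$, there exists a unique element $u_\phi \in \mathcal{H}$ such that:
$$\phi(x) = \varphi(x, u_\phi), \qquad \forall x \in \mathcal{H}$$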
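PROOF.– We know from Theorem 6.23 that there exists an operator $A \in \mathcal{B}(\mathcal{H})$ such that:
$$\varphi(x, y) = \langle x, Ay \rangle, \qquad \forall x, y \in \mathcal{H} \qquad [6.20]$$
On the other hand, the Riesz representation theorem guarantees that, for any bounded linear functional $\phi \in \mathcal{H}^*$, there exists a unique element $T^{-1}(\phi) \in \mathcal{H}$, where $T$ is the Riesz isomorphism, such that:
$$\phi(x) = \langle x, T^{-1}(\phi) \rangle, \qquad \forall x \in \mathcal{H} \qquad [6.21]$$
The main idea behind the proof of this theorem is to compare equations [6.20] and [6.21]. If the operator $A : \mathcal{H} \to \mathcal{H}$ is an isomorphism, then there exists a unique element in $\mathcal{H}$, written as $u_\phi \in \mathcal{H}$ since it depends on $\phi$, which satisfies $Au_\phi = T^{-1}(\phi)$; then, using [6.21], the identity $T^{-1}(\phi) = Au_\phi$ and [6.20], in that order:
$$\phi(x) = \langle x, T^{-1}(\phi) \rangle = \langle x, Au_\phi \rangle = \varphi(x, u_\phi), \qquad \forall x \in \mathcal{H}$$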
that is, the conclusion of the Lax-Milgram theorem. Now, let us show that $A$ is an isomorphism.

Injectivity is a simple consequence of coercivity. Since $\langle x, Ax \rangle = \Phi(x) \ge 0$, the Cauchy-Schwarz inequality gives:
$$0 \le K \|x\|^2 \le \Phi(x) = \varphi(x, x) = \langle x, Ax \rangle = |\langle x, Ax \rangle| \le \|x\| \|Ax\|$$
hence $\|x\| \le \frac{1}{K} \|Ax\|$ for all $x \neq 0$, and for $x = 0$ the inequality is trivial, so it holds for all $x \in \mathcal{H}$. This implies the injectivity of $A$: given arbitrary $x_1, x_2 \in \mathcal{H}$, by linearity, the condition $Ax_1 = Ax_2$ implies that $A(x_1 - x_2) = 0$; then $\|x_1 - x_2\| \le \frac{1}{K} \|A(x_1 - x_2)\| = 0$, that is, $x_1 = x_2$.
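The surjectivity of $A$, that is, the fact that $\mathrm{Im}(A) = \mathcal{H}$, is slightly harder to prove. The first argument used here relies on the inequality proven above. More precisely, let $(x_n)_{n \in \mathbb{N}} \subset \mathcal{H}$ be an arbitrary sequence of elements in $\mathcal{H}$; then $(Ax_n)_{n \in \mathbb{N}} \subset \mathrm{Im}(A)$ is an arbitrary sequence of elements in the image of $A$. Now, let us suppose that this sequence is convergent in $\mathcal{H}$, that is, there exists $y \in \mathcal{H}$ such that $\|Ax_n - y\| \to 0$ as $n \to +\infty$. Notably, as a convergent sequence, $(Ax_n)_{n \in \mathbb{N}}$ is Cauchy, that is, for all $\varepsilon > 0$ there exists $N_\varepsilon \in \mathbb{N}$ such that $n, m \ge N_\varepsilon$ implies $\|Ax_n - Ax_m\| < \varepsilon$. It therefore also holds that $\|x_n - x_m\| \le \frac{1}{K} \|Ax_n - Ax_m\| < \frac{\varepsilon}{K}$ for all $n, m \ge N_\varepsilon$, that is, if $(Ax_n)_{n \in \mathbb{N}}$ converges in $\mathcal{H}$, then $(x_n)_{n \in \mathbb{N}}$ is a Cauchy sequence in $\mathcal{H}$. Since $\mathcal{H}$ is complete, $(x_n)_{n \in \mathbb{N}}$ itself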
converges in $\mathcal{H}$, that is, there exists $x \in \mathcal{H}$ such that $\lim_{n \to +\infty} \|x_n - x\| = 0$. $A$ is bounded and therefore continuous, so:
$$\left\| A\left(\lim_{n \to +\infty} x_n - x\right) \right\| = \lim_{n \to +\infty} \|Ax_n - Ax\| = 0$$
Furthermore, by the uniqueness of the limit in a metric space, we obtain $y = Ax \in \mathrm{Im}(A)$, that is, $\mathrm{Im}(A)$ is a closed vector subspace of $\mathcal{H}$ as it contains the limits of all of its convergent sequences.

The closure of $\mathrm{Im}(A)$ means that we can use Theorem 5.4. Reasoning by contradiction, if $\mathrm{Im}(A)$ is a proper vector subspace of $\mathcal{H}$, then there exists a non-zero vector $\xi \in \mathcal{H} \setminus \mathrm{Im}(A)$ that is orthogonal to $\mathrm{Im}(A)$, that is, $\langle \xi, Ay \rangle = 0$ $\forall y \in \mathcal{H}$. Taking $y = \xi$ and using coercivity, we obtain:
$$0 = \langle \xi, A\xi \rangle = \Phi(\xi) \ge K \|\xi\|^2 > 0$$
since $\xi \neq 0$ and $K > 0$, which is absurd. □
The Lax-Milgram theorem is widely used in solving partial differential equations (PDE) expressed in variational form. Roughly speaking, this approach involves rewriting a PDE as the problem of minimization of a functional expressed by an integral, and looking for the so-called weak solution of the PDE, which takes the form of a minimizer of the functional. In this type of approach, one almost immediate corollary of the Lax-Milgram theorem (often cited as an integral part of the theorem) proves extremely useful.

COROLLARY 6.4 (Lax-Milgram: symmetric case).– Take:
– $\mathcal{H}$ a real Hilbert space;
– $\phi \in \mathcal{H}^*$;
– $\varphi : \mathcal{H} \times \mathcal{H} \to \mathbb{R}$ a bounded, coercive and symmetric bilinear form;
– $\Phi : \mathcal{H} \to \mathbb{R}$ the quadratic form associated with $\varphi$.

Then the vector $u_\phi \in \mathcal{H}$ such that $\phi(x) = \varphi(x, u_\phi)$ $\forall x \in \mathcal{H}$ is the only element in $\mathcal{H}$ which minimizes the functional:
$$J_\phi : \mathcal{H} \longrightarrow \mathbb{R}, \qquad x \longmapsto J_\phi(x) := \tfrac{1}{2} \Phi(x) - \phi(x)$$
that is, $\exists!\, u_\phi \in \mathcal{H}$ such that:
$$J_\phi(u_\phi) = \min_{x \in \mathcal{H}} J_\phi(x) \iff u_\phi = \arg\min_{x \in \mathcal{H}} J_\phi(x)$$
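PROOF.– We perform a shift in a neighborhood of $u_\phi$ with an arbitrary vector $w \in \mathcal{H}$ and compute $J_\phi$:
$$J_\phi(u_\phi + w) = \tfrac{1}{2} \varphi(u_\phi + w, u_\phi + w) - \phi(u_\phi + w)$$
$$= \tfrac{1}{2} \left[\varphi(u_\phi, u_\phi) + \varphi(u_\phi, w) + \varphi(w, u_\phi) + \varphi(w, w)\right] - \phi(u_\phi) - \phi(w)$$
$$= \tfrac{1}{2} \left[\varphi(u_\phi, u_\phi) + 2\varphi(u_\phi, w) + \varphi(w, w)\right] - \phi(u_\phi) - \phi(w) \qquad (\varphi \text{ symmetric})$$
$$= \tfrac{1}{2} \varphi(u_\phi, u_\phi) - \phi(u_\phi) + \tfrac{1}{2} \varphi(w, w) + \varphi(w, u_\phi) - \phi(w)$$
$$= J_\phi(u_\phi) + \tfrac{1}{2} \Phi(w)$$
where the last equality holds because $\tfrac{1}{2} \varphi(u_\phi, u_\phi) - \phi(u_\phi) = J_\phi(u_\phi)$ and $u_\phi$ satisfies $\phi(w) = \varphi(w, u_\phi)$. By coercivity:
$$J_\phi(u_\phi) + \tfrac{1}{2} \Phi(w) \ge J_\phi(u_\phi) + \tfrac{K}{2} \|w\|^2 \ge J_\phi(u_\phi)$$
that is, $J_\phi(u_\phi) \le J_\phi(u_\phi + w)$ $\forall w \in \mathcal{H}$, with strict inequality whenever $w \neq 0$ because $\tfrac{K}{2} \|w\|^2 > 0$ in that case; thus $u_\phi$ is the only minimizer of $J_\phi$. □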
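In finite dimension, the statement of Corollary 6.4 reduces to a familiar fact about symmetric positive-definite matrices, and the identity $J_\phi(u_\phi + w) = J_\phi(u_\phi) + \tfrac{1}{2}\Phi(w)$ from the proof can be observed directly. The sketch below is an illustration on $\mathbb{R}^2$ under assumptions not taken from the text: the form is $\varphi(x, y) = x^{T} M y$ for a hypothetical symmetric positive-definite matrix `M`, the functional is $\phi(x) = b \cdot x$, and the names `form`, `functional` and `J` are chosen for the example.

```python
import random

# Sketch: symmetric Lax-Milgram on R^2. The matrix M and the vector b are
# illustrative assumptions; M is symmetric positive definite, hence coercive.

M = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def form(x, y):
    """Bounded, coercive, symmetric bilinear form x^T M y."""
    return sum(M[i][j] * x[i] * y[j] for i in range(2) for j in range(2))

def functional(x):
    """Bounded linear functional x -> b . x."""
    return sum(bi * xi for bi, xi in zip(b, x))

def J(x):
    """J(x) = 1/2 * form(x, x) - functional(x)."""
    return 0.5 * form(x, x) - functional(x)

def add(x, y):
    """Vector sum x + y."""
    return [xi + yi for xi, yi in zip(x, y)]

# functional(x) = form(x, u) for all x means M u = b (Cramer's rule for 2 x 2).
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
u = [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
     (M[0][0] * b[1] - b[0] * M[1][0]) / det]

random.seed(0)
shifts = [[random.uniform(-2, 2), random.uniform(-2, 2)] for _ in range(5)]

# Identity from the proof: J(u + w) - J(u) - 1/2 * form(w, w) = 0.
gaps = [abs(J(add(u, w)) - J(u) - 0.5 * form(w, w)) for w in shifts]
# Consequence: J never decreases when moving away from u.
increases = [J(add(u, w)) - J(u) for w in shifts]
print(u, max(gaps), min(increases))
```

For this choice of `M` and `b`, the solution is $u = (1/11, 7/11)$, and every random shift strictly increases the value of $J$.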
Since a real inner product is a bounded, coercive and symmetric bilinear form, and since its associated quadratic form is the square of the norm (typically expressed in integral form), this result guarantees that, for any real functional of the form:
$$J_\phi(x) = \tfrac{1}{2} \|x\|^2 - \phi(x)$$
where $\phi \in \mathcal{H}^*$, there exists a unique minimizer $u_\phi \in \mathcal{H}$.

The Lax-Milgram theorem and its symmetric variant form the basis for finite element methods, which are based around the following idea: if $\phi$ does not have a simple expression, then looking directly for the minimizer (weak solution of a PDE) $u_\phi$ in the whole Hilbert space $\mathcal{H}$ may be very complicated and time-consuming. In this case, the answer can be approximated by looking for a sequence $u_\phi^{(n)}$ in $\mathcal{H}_n$, a finite-dimensional subspace of $\mathcal{H}$ (hence the term “finite elements”).

In the case where $\varphi$ is symmetric and positive-definite, $u_\phi^{(n)}$ is the orthogonal projection of $u_\phi$ on $\mathcal{H}_n$ in the sense of the inner product defined by $\varphi$. Once we have defined a basis $(h_i)_{i=1}^n$ (which is typically orthonormal) on $\mathcal{H}_n$, the problem consists of solving the linear system $A u_\phi^{(n)} = b$ for the coefficients of $u_\phi^{(n)}$ in that basis, where $A_{ij} = \varphi(h_j, h_i)$ and $b_i = \phi(h_i)$.
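The linear system $A_{ij} = \varphi(h_j, h_i)$, $b_i = \phi(h_i)$ can be assembled and solved in a few lines. The sketch below is only an illustration, under assumptions that do not come from the text: it reuses the weighted form $\varphi_z$ of the earlier example with the hypothetical weight $z(t) = 1 + t$, takes $\phi(x) = \int_0^1 x(t) f(t)\, dt$ with $f(t) = (1 + t)(2 + 3t)$, uses the non-orthonormal basis $\{1, t\}$ for the subspace, and replaces integrals by a midpoint rule. With these choices the exact minimizer is $u(t) = 2 + 3t$, which lies in the subspace, so the computed coefficients should be close to $(2, 3)$.

```python
# Sketch of a Galerkin-type computation on a 2-dimensional subspace of L^2[0, 1].
# The weight z, the data f, the basis and the quadrature are all assumptions.

def integrate(g, n=1000):
    """Midpoint rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))

def z(t):
    return 1.0 + t

def f(t):
    # Chosen so that the minimizer of J is u(t) = f(t) / z(t) = 2 + 3t.
    return (1.0 + t) * (2.0 + 3.0 * t)

def form(x, y):
    """Symmetric coercive form: integral of x(t) * y(t) * z(t)."""
    return integrate(lambda t: x(t) * y(t) * z(t))

def functional(x):
    """Bounded linear functional: integral of x(t) * f(t)."""
    return integrate(lambda t: x(t) * f(t))

basis = [lambda t: 1.0, lambda t: t]  # spans the subspace; not orthonormal
n = len(basis)

A = [[form(basis[j], basis[i]) for j in range(n)] for i in range(n)]
b = [functional(h) for h in basis]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    size = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - s) / aug[i][i]
    return x

coeffs = solve(A, b)  # coefficients of the approximate minimizer in the basis
print(coeffs)
```

Because the basis is not orthonormal here, the matrix `A` is not the identity; an orthonormal basis with respect to $\varphi_z$ would make the system diagonal, which is one reason such bases are convenient.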
Finally, note that the Lax-Milgram theorem presented here may be obtained as a corollary of a theorem proven by Lions and Stampacchia in 1967 in the context of variational inequalities.

6.6. The adjoint operator: presentation and properties

In this section, we shall examine a particularly important consequence of the Riesz representation theorem and of the results presented in section 6.5: the possibility of associating each operator $A \in \mathcal{B}(\mathcal{H})$ with another operator, called its “adjoint”, which is of fundamental importance in functional analysis and its applications.

Consider an operator $A \in \mathcal{B}(\mathcal{H})$. By Theorem 6.22, the bilinear or sesquilinear form defined by $\varphi(x, y) = \langle x, Ay \rangle$ is bounded. By Theorem 6.23, there exists a unique bounded operator $B$ such that, for all $x, y \in \mathcal{H}$, it holds that $\varphi(x, y) = \langle Bx, y \rangle$, hence: $\langle x, Ay \rangle = \varphi(x, y) = \langle Bx, y \rangle$. By the same arguments, if we select the alternative options in Theorems 6.22 and 6.23, we obtain the equation: $\langle Ax, y \rangle = \varphi(x, y) = \langle x, By \rangle$ $\forall x, y \in \mathcal{H}$. The operator $B$ has a specific name and symbol.

DEFINITION 6.12.– Take $A \in \mathcal{B}(\mathcal{H})$. The adjoint operator of $A$, denoted $A^\dagger$ (see footnote 3), is the operator $A^\dagger \in \mathcal{B}(\mathcal{H})$ such that:
$$\langle A^\dagger x, y \rangle = \langle x, Ay \rangle \qquad \text{and} \qquad \langle Ax, y \rangle = \langle x, A^\dagger y \rangle \qquad \forall x, y \in \mathcal{H}$$
The map $\dagger : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$, $A \mapsto A^\dagger$, is known as adjunction.

THEOREM 6.25.– The adjunction is an antilinear automorphism of $\mathcal{B}(\mathcal{H})$ and it satisfies the following properties: for all $A, B \in \mathcal{B}(\mathcal{H})$ and $k \in \mathbb{K}$:
1) $(A + B)^\dagger = A^\dagger + B^\dagger$;
2) $(kA)^\dagger = \bar{k} A^\dagger$;
3) $(AB)^\dagger = B^\dagger A^\dagger$;
4) $(A^\dagger)^\dagger = A$;
5) $\|A^\dagger A\| = \|A\|^2$, $\|A A^\dagger\| = \|A^\dagger\|^2$;
6) $\|A^\dagger\| = \|A\|$.
3 The origin of the symbol $\dagger$, the dagger, reflects the close relationship between the adjoint operator $A^\dagger$ and the transposed or dual operator $A^t$. For more information, see Appendix 2. The symbol $A^*$ is also widely used.
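PROOF.– 1) and 2) are immediate consequences of the sesquilinearity of the complex inner product (if the Hilbert space is real, then evidently $\bar{k} = k$, as a consequence of bilinearity).

3) $\langle (AB)^\dagger x, y \rangle = \langle x, ABy \rangle = \langle A^\dagger x, By \rangle = \langle B^\dagger A^\dagger x, y \rangle$ $\forall x, y \in \mathcal{H}$, hence property 3.

4) Since $A^{\dagger\dagger} = (A^\dagger)^\dagger$, $\langle A^{\dagger\dagger} x, y \rangle = \langle (A^\dagger)^\dagger x, y \rangle = \langle x, A^\dagger y \rangle = \overline{\langle A^\dagger y, x \rangle} = \overline{\langle y, Ax \rangle} = \langle Ax, y \rangle$ $\forall x, y \in \mathcal{H}$, hence property 4.

5) Let us begin by showing that $\|A\|^2 \le \|A^\dagger A\|$: taking $x \in \mathcal{H}$, $\|x\| = 1$, and using the Cauchy-Schwarz inequality, we have:
$$\|Ax\|^2 = \langle Ax, Ax \rangle = |\langle Ax, Ax \rangle| = |\langle A^\dagger Ax, x \rangle| \le \|A^\dagger Ax\| \|x\| = \|A^\dagger Ax\| \le \|A^\dagger A\| \|x\| = \|A^\dagger A\|$$
thus, since $\|A\|^2 = \sup_{\|x\| = 1} \|Ax\|^2$, $\|A\|^2 \le \|A^\dagger A\|$.

Now, let us show that $\|A^\dagger A\| \le \|A\|^2$. We begin by noting that, for all $x, y \in \mathcal{H}$, $\|x\| = \|y\| = 1$, it holds, again by the Cauchy-Schwarz inequality, that:
$$\Re\langle Ax, Ay \rangle \le \sqrt{(\Re\langle Ax, Ay \rangle)^2 + (\Im\langle Ax, Ay \rangle)^2} = |\langle Ax, Ay \rangle| \le \|Ax\| \|Ay\| \le \|A\| \|x\| \|A\| \|y\| = \|A\|^2 \qquad [6.22]$$
If $\langle A^\dagger Ax, y \rangle = |\langle A^\dagger Ax, y \rangle| e^{i\vartheta}$, with $\vartheta$ the phase of $\langle A^\dagger Ax, y \rangle$, then:
$$\mathbb{R} \ni |\langle A^\dagger Ax, y \rangle| = e^{-i\vartheta} \langle A^\dagger Ax, y \rangle = \langle A^\dagger Ax, e^{i\vartheta} y \rangle$$
that is, $\langle A^\dagger Ax, e^{i\vartheta} y \rangle \in \mathbb{R}$ and thus, by [6.22]:
$$\langle A^\dagger Ax, e^{i\vartheta} y \rangle = \Re\langle A^\dagger Ax, e^{i\vartheta} y \rangle = \Re\langle Ax, A e^{i\vartheta} y \rangle \le \|A\|^2,$$
since $\|e^{i\vartheta} y\| = 1$. Using the fact that $|\langle A^\dagger Ax, y \rangle| = \langle A^\dagger Ax, e^{i\vartheta} y \rangle$, we can write:
$$|\langle A^\dagger Ax, y \rangle| \le \|A\|^2, \qquad \forall x, y \in \mathcal{H}, \ \|x\| = \|y\| = 1 \qquad [6.23]$$
Now, let us take an arbitrary $\xi \in \mathcal{H}$ with $A^\dagger A\xi \neq 0$ (so that, in particular, $\xi \neq 0$) and use this last inequality to estimate the norm of $A^\dagger A\xi$:
$$\|A^\dagger A\xi\| = \frac{1}{\|A^\dagger A\xi\|} \|A^\dagger A\xi\|^2 \frac{1}{\|\xi\|} \|\xi\| = \frac{1}{\|A^\dagger A\xi\|} \frac{1}{\|\xi\|} |\langle A^\dagger A\xi, A^\dagger A\xi \rangle| \, \|\xi\| = \left| \left\langle A^\dagger A \frac{\xi}{\|\xi\|}, \frac{A^\dagger A\xi}{\|A^\dagger A\xi\|} \right\rangle \right| \|\xi\|$$
Writing $x = \frac{\xi}{\|\xi\|}$ and $y = \frac{A^\dagger A\xi}{\|A^\dagger A\xi\|}$ and observing that these two vectors have unit norm, we can use inequality [6.23] to write $\|A^\dagger A\xi\| \le \|A\|^2 \|\xi\|$; this inequality is trivially true when $A^\dagger A\xi = 0$, so it holds for all $\xi \in \mathcal{H}$, which implies that $\|A^\dagger A\| = \sup_{\|\xi\| = 1} \|A^\dagger A\xi\| \le \|A\|^2$. Hence $\|A^\dagger A\| = \|A\|^2$ $\forall A \in \mathcal{B}(\mathcal{H})$. If we write $B = A^\dagger$, then $B \in \mathcal{B}(\mathcal{H})$ and $\|B^\dagger B\| = \|B\|^2$, that is, $\|A^{\dagger\dagger} A^\dagger\| = \|A^\dagger\|^2$; moreover, $A^{\dagger\dagger} = A$, thus $\|A A^\dagger\| = \|A^\dagger\|^2$ for all $A \in \mathcal{B}(\mathcal{H})$.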
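6) For $A \ne 0$ (the case $A = 0$ is trivial), on one side, we have:
\[ \|A\|^2 = \|A^\dagger A\| \underset{[6.12]}{\le} \|A^\dagger\| \|A\| \implies \frac{\|A\|^2}{\|A\|} \le \frac{\|A^\dagger\| \|A\|}{\|A\|} \iff \|A\| \le \|A^\dagger\| \]
and on the other side we have:
\[ \|A^\dagger\|^2 = \|A A^\dagger\| \underset{[6.12]}{\le} \|A\| \|A^\dagger\| \implies \frac{\|A^\dagger\|^2}{\|A^\dagger\|} \le \frac{\|A\| \|A^\dagger\|}{\|A^\dagger\|} \iff \|A^\dagger\| \le \|A\| \]
□

An immediate corollary of properties 1, 2 and 6 is that the adjunction $\dagger : \mathcal{B}(H) \to \mathcal{B}(H)$ is a continuous function. In fact, if $(A_n)_{n \in \mathbb{N}} \subset \mathcal{B}(H)$ is a sequence which converges toward $A \in \mathcal{B}(H)$, that is, $\|A_n - A\| \to 0$ as $n \to +\infty$, then:
\[ \|A_n^\dagger - A^\dagger\| \underset{(1),(2)}{=} \|(A_n - A)^\dagger\| \underset{(6)}{=} \|A_n - A\| \xrightarrow[n \to +\infty]{} 0 \]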
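The properties of Theorem 6.25 can be checked numerically in the simplest possible setting. The following is a minimal sketch, assuming $H = \mathbb{C}^2$, where the adjoint of a matrix is its conjugate transpose; the helper names (`dagger`, `matmul`, `apply_op`, `inner`) and the sample matrices are illustrative choices, not taken from the text.

```python
# Finite-dimensional sketch: on C^2 the adjoint is the conjugate transpose.
# All helper names and sample values are illustrative assumptions.

def dagger(M):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(M, N):
    """Product of two square matrices of the same size."""
    n = len(M)
    return [[sum(M[i][k] * N[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply_op(M, x):
    """Action of the matrix M on the vector x."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]

def inner(x, y):
    """Inner product, linear in the first slot and antilinear in the second."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

A = [[1 + 2j, 3j], [2, 1 - 1j]]
B = [[0.5, -1j], [1 + 1j, 2]]
x = [1 - 1j, 2 + 0j]
y = [3j, -1 + 2j]

# Defining relation of the adjoint: <Ax, y> = <x, A^dagger y>.
lhs = inner(apply_op(A, x), y)
rhs = inner(x, apply_op(dagger(A), y))

# Property 3: (AB)^dagger = B^dagger A^dagger; property 4: (A^dagger)^dagger = A.
product_rule_holds = dagger(matmul(A, B)) == matmul(dagger(B), dagger(A))
involution_holds = dagger(dagger(A)) == A
```

The entries were chosen so that all arithmetic is exact in floating point, which is why plain equality is used for the matrix comparisons; with generic entries a tolerance would be needed.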
The Banach algebra $\mathcal{B}(H)$ equipped with the adjunction operation becomes a C*-algebra, as formalized below.

DEFINITION 6.13 (C*-algebra).– A Banach algebra $\mathcal{A}$ is called a C*-algebra if it is possible to equip it with a map $j : \mathcal{A} \to \mathcal{A}$ such that, $\forall a, b \in \mathcal{A}$ and $\forall k \in \mathbb{C}$:
1) $j(a + b) = j(a) + j(b)$;
2) $j(ka) = \bar{k} j(a)$;
3) $j(ab) = j(b) j(a)$;
4) $j(j(a)) = a$.

Most authors additionally require the C*-identity $\|j(a) a\| = \|a\|^2$, which holds in $\mathcal{B}(H)$ by property 5 of Theorem 6.25. C*-algebra theory is extremely important in functional analysis and its applications, in particular in quantum mechanics; however, a thorough discussion of C*-algebras lies outside of the scope of this work.

Let us now consider the class of operators that are invariant with respect to adjunction.

DEFINITION 6.14 (self-adjoint or Hermitian operators).– $A \in \mathcal{B}(H)$ is a self-adjoint (s.a.) or Hermitian operator if $A^\dagger = A$, that is, if:
\[ \langle Ax, y \rangle = \langle x, Ay \rangle, \quad \forall x, y \in H \]
To understand the importance of self-adjoint operators, we simply recall that the physical observables in quantum mechanics are represented by self-adjoint operators on a Hilbert space. Two particularly remarkable self-adjoint operators are $A^\dagger A$ and $A A^\dagger$.
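THEOREM 6.26.– Taking $A \in \mathcal{B}(H)$, then $A^\dagger A$ and $A A^\dagger$ are self-adjoint.

PROOF.– We simply apply the properties $(AB)^\dagger = B^\dagger A^\dagger$ and $A^{\dagger\dagger} = A$:
\[ (A^\dagger A)^\dagger = A^\dagger A^{\dagger\dagger} = A^\dagger A \quad \text{and} \quad (A A^\dagger)^\dagger = A^{\dagger\dagger} A^\dagger = A A^\dagger \]
□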
The following theorem establishes the conditions under which the self-adjoint property is stable with respect to the operations of the algebra $\mathcal{B}(H)$. The following notation will be used: $\forall A, B \in \mathcal{B}(H)$, we define the operator $[A, B] := AB - BA$, called the commutator between $A$ and $B$. $A$ and $B$ are said to commute if $[A, B] = 0$, the null operator; in this case, $AB = BA$.

THEOREM 6.27.– If $A, B \in \mathcal{B}(H)$, $A, B$ are self-adjoint, then:
– $\alpha A + \beta B$ is self-adjoint if and only if $\alpha, \beta \in \mathbb{R}$;
– $AB$ is self-adjoint if and only if $[A, B] = 0$.

PROOF.– The first property is a straightforward consequence of property 2 concerning the adjunction and sesquilinearity of the inner product: $(\alpha A + \beta B)^\dagger = \bar{\alpha} A + \bar{\beta} B$. The second property is proven below.

$\Longrightarrow$: if $AB$ is s.a., that is, $AB = (AB)^\dagger$, then:
\[ AB = (AB)^\dagger = B^\dagger A^\dagger \underset{A, B \text{ s.a.}}{=} BA, \]
thus $AB = BA$.

$\Longleftarrow$: $\forall x, y \in H$ it holds that:
\[ \langle ABx, y \rangle = \langle Bx, A^\dagger y \rangle = \langle x, B^\dagger A^\dagger y \rangle \underset{A, B \text{ s.a.}}{=} \langle x, BAy \rangle \underset{[A,B]=0}{=} \langle x, ABy \rangle, \]
hence $AB = (AB)^\dagger$. □
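The second statement of Theorem 6.27 is easy to observe on small matrices. The sketch below assumes $H = \mathbb{C}^2$ and uses three real diagonal or symmetric matrices chosen for illustration (the names `A`, `B`, `C` and all helpers are hypothetical): `A` and `B` do not commute, whereas `A` and `C` do.

```python
# Illustration on 2x2 matrices; the matrices and helper names are assumptions.

def dagger(M):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(M, N):
    n = len(M)
    return [[sum(M[i][k] * N[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(M, N):
    """[M, N] = MN - NM."""
    MN, NM = matmul(M, N), matmul(N, M)
    return [[MN[i][j] - NM[i][j] for j in range(len(M))] for i in range(len(M))]

def is_self_adjoint(M):
    return dagger(M) == M

A = [[1, 0], [0, -1]]   # self-adjoint
B = [[0, 1], [1, 0]]    # self-adjoint, does not commute with A
C = [[2, 0], [0, 3]]    # self-adjoint, commutes with A

zero = [[0, 0], [0, 0]]
AB_is_sa = is_self_adjoint(matmul(A, B))   # False, since [A, B] != 0
AC_is_sa = is_self_adjoint(matmul(A, C))   # True, since [A, C] = 0

# A purely imaginary multiple of a nonzero self-adjoint matrix is not self-adjoint.
iA = [[1j * entry for entry in row] for row in A]
iA_is_sa = is_self_adjoint(iA)
```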
The following exercise makes use of many of the results presented above.

Exercise 6.4
Let $(u_n)_{n \in \mathbb{N}}$ be an orthonormal system in the Hilbert space $H$, $(\lambda_n)_{n \in \mathbb{N}} \subset \mathbb{C}$ and $A : H \to H$:
\[ Ax = \sum_{n \in \mathbb{N}} \lambda_n \langle x, u_n \rangle u_n, \quad \forall x \in H \]
1) Show that, if the sequence $(\lambda_n)_{n \in \mathbb{N}}$ is bounded, then $A \in \mathcal{B}(H)$.
2) Calculate the adjoint $A^\dagger$ of $A$. Using your result, deduce a necessary and sufficient condition for the operator $A$ to be anti-self-adjoint, that is, $A + A^\dagger = 0$.
3) For all $n \in \mathbb{N}$, consider the operator $A_n$ defined by:
\[ A_n x = \sum_{k=0}^{n} \lambda_k \langle x, u_k \rangle u_k \]
a) Calculate $A_n u_{n+1} - A u_{n+1}$ for all $n \in \mathbb{N}$. Using your result, deduce a necessary condition to have $A_n \xrightarrow[n \to +\infty]{} A$ in $\mathcal{B}(H)$.
b) Supposing that $\lim_{n \to +\infty} \lambda_n = 0$, prove that $A_n \xrightarrow[n \to +\infty]{} A$ in $\mathcal{B}(H)$.
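Solution to Exercise 6.4
1) Since $(u_n)_{n \in \mathbb{N}}$ is an orthonormal system of a Hilbert space, the Fischer-Riesz theorem guarantees that $Ax$ is well defined, that is, the convergence (in $H$) of the series $\sum_{n \in \mathbb{N}} \lambda_n \langle x, u_n \rangle u_n$ is equivalent to the convergence (in $\mathbb{R}$) of the series $\sum_{n \in \mathbb{N}} |\lambda_n \langle x, u_n \rangle|^2 = \sum_{n \in \mathbb{N}} |\lambda_n|^2 |\langle x, u_n \rangle|^2$. If $(\lambda_n)_{n \in \mathbb{N}}$ is a bounded sequence, that is, $\sup_{n \in \mathbb{N}} |\lambda_n| = M < +\infty$, then $\sum_{n \in \mathbb{N}} |\lambda_n \langle x, u_n \rangle|^2 \le M^2 \|x\|^2 < +\infty$, by Bessel's inequality [5.5].

Now, let us analyze the conditions under which $A$ is bounded:
\[ \|Ax\|^2 = \Big\| \sum_{n \in \mathbb{N}} \lambda_n \langle x, u_n \rangle u_n \Big\|^2 \underset{\text{(Pythagorean th.)}}{=} \sum_{n \in \mathbb{N}} \|\lambda_n \langle x, u_n \rangle u_n\|^2 = \sum_{n \in \mathbb{N}} |\lambda_n|^2 |\langle x, u_n \rangle|^2 \]
The fact that $(\lambda_n)_{n \in \mathbb{N}}$ is bounded and Bessel's inequality can also be used to write $\|Ax\| \le M \|x\|$, $\forall x \in H$; furthermore, $\|A\| = \sup_{\|x\|=1} \|Ax\| \le M$, showing that $A \in \mathcal{B}(H)$.

2) Taking $x, y \in H$, we have:
\[ \langle Ax, y \rangle = \sum_{n \in \mathbb{N}} \lambda_n \langle x, u_n \rangle \langle u_n, y \rangle = \Big\langle x, \sum_{n \in \mathbb{N}} \overline{\lambda_n} \, \overline{\langle u_n, y \rangle} \, u_n \Big\rangle = \Big\langle x, \sum_{n \in \mathbb{N}} \overline{\lambda_n} \langle y, u_n \rangle u_n \Big\rangle \]
Hence $A^\dagger x = \sum_{n \in \mathbb{N}} \overline{\lambda_n} \langle x, u_n \rangle u_n$, $\forall x \in H$.

By continuity, we can write $(A + A^\dagger) x = \sum_{n \in \mathbb{N}} (\lambda_n + \overline{\lambda_n}) \langle x, u_n \rangle u_n$, thus $A + A^\dagger$ is the zero operator if and only if $\lambda_n + \overline{\lambda_n} = 0$ $\forall n \in \mathbb{N}$ (for the necessity, apply $A + A^\dagger$ to $x = u_n$). Writing $\lambda_n = a_n + i b_n$, $a_n, b_n \in \mathbb{R}$ $\forall n \in \mathbb{N}$, we see that the condition $\lambda_n + \overline{\lambda_n} = 0$ is equivalent to $a_n = -a_n$ $\forall n \in \mathbb{N}$, that is, $a_n = 0$ $\forall n \in \mathbb{N}$, whereas there are no constraints on $b_n$. Thus, $A$ is anti-self-adjoint if and only if $\lambda_n \in i\mathbb{R}$ for all $n \in \mathbb{N}$, that is, $(\lambda_n)_{n \in \mathbb{N}}$ is a purely imaginary sequence.

3) a) Using the following facts:
\[ \|u_n\| = 1, \quad \langle u_k, u_{n+1} \rangle = 0 \quad \forall n \in \mathbb{N}, \ k \ne n + 1 \]
we deduce that $A_n u_{n+1} = 0$, $A u_{n+1} = \lambda_{n+1} u_{n+1}$ $\forall n \in \mathbb{N}$. Then, by [6.10], we can write:
\[ \|A_n - A\|_{\mathcal{B}(H)} \ge \|(A_n - A) u_{n+1}\|_H = |\lambda_{n+1}| \quad \forall n \in \mathbb{N} \]
that is, $\lim_{n \to +\infty} \lambda_n = 0$ is a necessary condition for $\lim_{n \to +\infty} \|A_n - A\|_{\mathcal{B}(H)} = 0$.

b) For all $n \in \mathbb{N}$ and $x \in H$, we calculate:
\[ \|A_n x - Ax\|^2 = \Big\| \sum_{k=n+1}^{\infty} \lambda_k \langle x, u_k \rangle u_k \Big\|^2 = \sum_{k=n+1}^{\infty} |\lambda_k|^2 |\langle x, u_k \rangle|^2 \le \Big( \sup_{k \ge n+1} |\lambda_k| \Big)^2 \|x\|^2 \]
by Bessel's inequality. Thus, $\|A_n - A\|_{\mathcal{B}(H)} \le \sup_{k \ge n+1} |\lambda_k|$, $\forall n \in \mathbb{N}$. Using the fact that $\lim_{n \to +\infty} \lambda_n = 0$, we obtain the required result:
\[ \lim_{n \to +\infty} \|A_n - A\|_{\mathcal{B}(H)} = 0 \]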
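The two estimates of part 3 can be observed on a truncated model. The sketch below is only an illustration under explicit assumptions: $H$ is replaced by $\mathbb{C}^{50}$, $(u_k)$ is the canonical basis, and $\lambda_k = i/(k+1)$ is a hypothetical purely imaginary sequence tending to zero (so that the operator is also anti-self-adjoint, as in part 2).

```python
import math

# Truncated model of Exercise 6.4: H is replaced by C^N, (u_k) is the
# canonical basis and lambda_k = i/(k+1). All of these are illustrative choices.
N = 50
lam = [1j / (k + 1) for k in range(N)]

def A(x):
    """A x = sum_k lambda_k <x, u_k> u_k in the canonical basis."""
    return [lam[k] * x[k] for k in range(N)]

def A_n(n, x):
    """Partial sum operator: keeps only the terms k = 0, ..., n."""
    return [lam[k] * x[k] if k <= n else 0 for k in range(N)]

def norm(x):
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def basis(k):
    return [1 if i == k else 0 for i in range(N)]

n = 9
u_next = basis(n + 1)

# Part 3a: ||(A_n - A) u_{n+1}|| = |lambda_{n+1}|.
diff_on_u = norm([a - b for a, b in zip(A_n(n, u_next), A(u_next))])

# Part 3b: ||(A_n - A) x|| <= sup_{k > n} |lambda_k| * ||x|| for a sample x.
tail_sup = max(abs(lam[k]) for k in range(n + 1, N))
x = [complex(k % 3 - 1, (k % 5) / 4) for k in range(N)]
diff_on_x = norm([a - b for a, b in zip(A_n(n, x), A(x))])
bound_on_x = tail_sup * norm(x)
```

In this model the lower bound of part a) and the upper bound of part b) coincide, because the sequence $|\lambda_k|$ is decreasing.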
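Now, let us consider the norm of self-adjoint operators.

THEOREM 6.28.– Let $A \in \mathcal{B}(H)$ be a self-adjoint operator, then:
\[ \|A\| = \sup_{\|x\|=1} |\langle Ax, x \rangle| \]
PROOF.– For simplicity's sake, we write:
\[ s_A = \sup_{\|x\|=1} |\langle Ax, x \rangle| \]
$s_A \le \|A\|$: by the Cauchy-Schwarz inequality, we have:
\[ |\langle Ax, x \rangle| \le \|x\| \|Ax\| \le \|A\| \|x\|^2 \]
thus:
\[ s_A = \sup_{\|x\|=1} |\langle Ax, x \rangle| \le \sup_{\|x\|=1} \|A\| \|x\|^2 = \|A\| \]
and so $s_A \le \|A\|$.

$\|A\| \le s_A$: using the fact that $\forall z \in \mathbb{C}$, $z + \bar{z} = 2\Re(z)$, we can write $\forall x, y \in H$:
\[ 4\Re(\langle Ax, y \rangle) = 4 \cdot \frac{1}{2} \left[ \langle Ax, y \rangle + \overline{\langle Ax, y \rangle} \right] = 2 \left[ \langle Ax, y \rangle + \langle y, Ax \rangle \right] \]
By direct calculation, using $A = A^\dagger$, we can verify that the following equality holds true:
\[ 2 \left[ \langle Ax, y \rangle + \langle y, Ax \rangle \right] = \langle A(x + y), x + y \rangle - \langle A(x - y), x - y \rangle \]
thus (the terms with $x + y = 0_H$ or $x - y = 0_H$ vanish and can be dropped):
\[ 4\Re(\langle Ax, y \rangle) = \langle A(x + y), x + y \rangle - \langle A(x - y), x - y \rangle \]
\[ = \|x + y\|^2 \left\langle A \frac{x + y}{\|x + y\|}, \frac{x + y}{\|x + y\|} \right\rangle - \|x - y\|^2 \left\langle A \frac{x - y}{\|x - y\|}, \frac{x - y}{\|x - y\|} \right\rangle \]
\[ \le \|x + y\|^2 s_A + \|x - y\|^2 s_A = s_A (\|x + y\|^2 + \|x - y\|^2) \underset{[1.6]}{=} 2 s_A (\|x\|^2 + \|y\|^2) \]
that is, $\Re(\langle Ax, y \rangle) \le \frac{1}{2} s_A (\|x\|^2 + \|y\|^2)$ $\forall x, y \in H$. Since the inequality is valid for any pair of vectors in $H$, let us consider the pair $x, z$, where $z = e^{i\vartheta} y$ with arbitrary $\vartheta \in \mathbb{R}$. Given that $\|z\| = \|y\|$, the previous inequality becomes:
\[ \Re(\langle Ax, e^{i\vartheta} y \rangle) \le \frac{1}{2} s_A (\|x\|^2 + \|y\|^2) \tag{6.24} \]
We can now use a similar argument to that used to prove property 5 in the case of adjunction: we write $\langle Ax, y \rangle = |\langle Ax, y \rangle| e^{i\vartheta}$, where $\vartheta$ is the phase of $\langle Ax, y \rangle$, then:
\[ \mathbb{R} \ni |\langle Ax, y \rangle| = e^{-i\vartheta} \langle Ax, y \rangle = \langle Ax, e^{i\vartheta} y \rangle \underset{\text{(being real)}}{=} \Re(\langle Ax, e^{i\vartheta} y \rangle) \]
thus $|\langle Ax, y \rangle| = \Re(\langle Ax, e^{i\vartheta} y \rangle)$, and so inequality [6.24] may be rewritten as:
\[ |\langle Ax, y \rangle| \le \frac{1}{2} s_A (\|x\|^2 + \|y\|^2) \]
Now, for $x$ such that $Ax \ne 0_H$, let us introduce the vector $y = \frac{\|x\|}{\|Ax\|} Ax$ into this inequality. On the left side, we obtain:
\[ |\langle Ax, y \rangle| = \left| \left\langle Ax, \frac{\|x\|}{\|Ax\|} Ax \right\rangle \right| = \frac{\|x\|}{\|Ax\|} |\langle Ax, Ax \rangle| = \frac{\|x\|}{\|Ax\|} \|Ax\|^2 = \|x\| \|Ax\| \]
while on the right side, we have:
\[ \frac{1}{2} s_A (\|x\|^2 + \|y\|^2) = \frac{1}{2} s_A \left( \|x\|^2 + \frac{\|x\|^2}{\|Ax\|^2} \|Ax\|^2 \right) = \frac{1}{2} s_A (\|x\|^2 + \|x\|^2) = s_A \|x\|^2 \]
thus, $\forall x \in H$, it holds that $\|x\| \|Ax\| \le s_A \|x\|^2$ (trivially so when $Ax = 0_H$), and if $x \ne 0_H$, then $\|Ax\| \le s_A \|x\|$, hence:
\[ \|A\| = \sup_{x \ne 0_H} \frac{\|Ax\|}{\|x\|} \le s_A \sup_{x \ne 0_H} \frac{\|x\|}{\|x\|} = s_A \]
and finally $\|A\| \le s_A$. □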
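Theorem 6.28 can be illustrated on a real symmetric $2 \times 2$ matrix, for which the operator norm equals the largest eigenvalue modulus and the supremum of $|\langle Ax, x \rangle|$ is attained on a real unit eigenvector. The following rough sketch relies on these standard finite-dimensional facts; the matrix entries and the grid size are arbitrary assumptions.

```python
import math

# Illustrative real symmetric (hence self-adjoint) matrix on R^2.
A = [[2.0, 1.0], [1.0, -3.0]]

def quad_form(t):
    """<Ax, x> for the unit vector x = (cos t, sin t)."""
    x = (math.cos(t), math.sin(t))
    Ax = (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])
    return Ax[0] * x[0] + Ax[1] * x[1]

# s_A approximated by sampling the unit circle (half a turn is enough,
# since x and -x give the same value of the quadratic form).
steps = 20000
s_A = max(abs(quad_form(math.pi * k / steps)) for k in range(steps))

# Operator norm of a symmetric 2x2 matrix: largest eigenvalue modulus,
# computed from the trace and the determinant.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = math.sqrt(trace ** 2 - 4 * det)
norm_A = max(abs((trace + root) / 2), abs((trace - root) / 2))
```

Here `s_A` is only a sampled approximation from below of the supremum, so the two numbers agree up to the grid resolution rather than exactly.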
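Theorem 6.29 points out a property of the adjoint operator which is of fundamental importance in optimization.

THEOREM 6.29.– Taking $A \in \mathcal{B}(H)$, then:
\[ \ker(A) = (\operatorname{Im}(A^\dagger))^\perp \quad \text{and} \quad \overline{\operatorname{Im}(A^\dagger)} = (\ker(A))^\perp \]
thus:
\[ H = \ker(A) \oplus \overline{\operatorname{Im}(A^\dagger)} \quad \text{and} \quad H = \ker(A^\dagger) \oplus \overline{\operatorname{Im}(A)} \]
PROOF.– $\ker(A) \subseteq (\operatorname{Im}(A^\dagger))^\perp$: taking any $x \in H$ and $y \in \ker(A)$, then $Ay = 0_H$ and so we can write:
\[ 0 = \langle x, 0_H \rangle = \langle x, Ay \rangle = \langle A^\dagger x, y \rangle \]
that is, $y \perp A^\dagger x$ $\forall x \in H$. Since $\operatorname{Im}(A^\dagger) = \{A^\dagger x, \ x \in H\}$, this implies that $y \in (\operatorname{Im}(A^\dagger))^\perp$.

$(\operatorname{Im}(A^\dagger))^\perp \subseteq \ker(A)$: taking $y \in (\operatorname{Im}(A^\dagger))^\perp$, then $\langle A^\dagger x, y \rangle = 0$ $\forall x \in H$, and since $\langle A^\dagger x, y \rangle = \langle x, Ay \rangle$, then $\langle x, Ay \rangle = 0$ $\forall x \in H$, that is, $Ay = 0_H$, therefore $y \in \ker(A)$.

Therefore: $\ker(A) = (\operatorname{Im}(A^\dagger))^\perp = (\overline{\operatorname{Im}(A^\dagger)})^\perp$. Taking the orthogonal complement again: $\ker(A)^\perp = (\overline{\operatorname{Im}(A^\dagger)})^{\perp\perp} = \overline{\operatorname{Im}(A^\dagger)}$. We see that it is essential to consider the closure of $\operatorname{Im}(A^\dagger)$, since $\ker(A)^\perp$ is a closed subspace in $H$ and, in general, $\operatorname{Im}(A^\dagger)$ is not.

The orthogonal decompositions of $H$ into a direct sum of subspaces are an immediate consequence of the orthogonal projection theorem; the second one follows from the first by replacing $A$ with $A^\dagger$. □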
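In finite dimensions every subspace is closed, so Theorem 6.29 reduces to the familiar statement that the kernel of a matrix is orthogonal to the range of its conjugate transpose. A small sketch with a hypothetical rank-one matrix on $\mathbb{C}^2$ (matrix, vectors and helper names are all illustrative assumptions):

```python
# Rank-one example on C^2; every name and value here is an assumption.

def dagger(M):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def apply_op(M, x):
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]

def inner(x, y):
    """Inner product, linear in the first slot and antilinear in the second."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

A = [[1, 1j], [2, 2j]]

# A vector spanning ker(A): it solves x0 + i*x1 = 0.
kernel_vector = [-1j, 1]
image_of_kernel_vector = apply_op(A, kernel_vector)

# Vectors of Im(A^dagger), obtained by applying A^dagger to sample inputs.
samples = [[1, 0], [0, 1], [2 - 1j, 3j]]
overlaps = [inner(apply_op(dagger(A), w), kernel_vector) for w in samples]
```

Each entry of `overlaps` is the inner product of an element of $\operatorname{Im}(A^\dagger)$ with an element of $\ker(A)$, so all of them vanish.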
Finally, let us analyze the relationship between inversion and adjunction. Recall that, as we saw in section 6.3, if $V$ is a Banach space, then $GL(V)$ is its general linear group, that is, the group of continuous bijective linear operators with continuous inverses.

THEOREM 6.30.– Let $H$ be a Hilbert space and let $A \in GL(H)$. Then $A^\dagger$ is invertible and:
1) it holds that:
\[ (A^\dagger)^{-1} = (A^{-1})^\dagger \]
that is, for the operators in $GL(H)$, inversion and adjunction commute: the inverse of the adjoint is the adjoint of the inverse;
2) if $A \in GL(H)$ is self-adjoint, then $A^{-1}$ is also self-adjoint.

PROOF.– 1) We need to prove that, for all $x \in H$, $(A^{-1})^\dagger A^\dagger x = A^\dagger (A^{-1})^\dagger x = x$. To do this, let us consider, $\forall x, y \in H$:
\[ \langle y, (A^{-1})^\dagger A^\dagger x \rangle = \langle A^{-1} y, A^\dagger x \rangle = \langle A A^{-1} y, x \rangle = \langle y, x \rangle \]
\[ \langle y, A^\dagger (A^{-1})^\dagger x \rangle = \langle Ay, (A^{-1})^\dagger x \rangle = \langle A^{-1} A y, x \rangle = \langle y, x \rangle \]
hence, by [6.10], $(A^{-1})^\dagger A^\dagger x = A^\dagger (A^{-1})^\dagger x = x$.

2) An immediate consequence of property 1 is that if $A = A^\dagger$, then $A^{-1} = (A^{-1})^\dagger$. □
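As a quick sanity check of property 1 in the simplest case, one can compare $(A^\dagger)^{-1}$ and $(A^{-1})^\dagger$ for an invertible $2 \times 2$ complex matrix. The sketch below uses the explicit $2 \times 2$ inverse formula; the matrix and the helper names are illustrative assumptions.

```python
# Check of (A^dagger)^{-1} = (A^{-1})^dagger on a sample 2x2 matrix.
# The matrix and helper names are illustrative assumptions.

def dagger(M):
    """Conjugate transpose of a 2x2 matrix given as a list of rows."""
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula (requires det != 0)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1 + 1j, 2], [0, 1j]]

left = inverse_2x2(dagger(A))    # (A^dagger)^{-1}
right = dagger(inverse_2x2(A))   # (A^{-1})^dagger

max_gap = max(abs(left[i][j] - right[i][j]) for i in range(2) for j in range(2))
```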
6.7. Orthogonal projection operators in a Hilbert space

We have already examined the concept of orthogonal projection in a Hilbert space $H$. Here we wish to characterize orthogonal projections from an operator point of view. We will see that the adjoint operator will play a crucial role.

A clear, simple way of understanding projection operators (orthogonal or otherwise) is to imagine that we are in a finite-dimensional Euclidean space, for example $\mathbb{R}^2$, and to project a vector in the direction of another vector. Now, imagine that we want to repeat the process, that is, we want to "project the projection"; clearly, this operation has no effect on the projected vector. This property is used to define the concept of projection itself4.

4 Many authors refer to this as oblique projection to distinguish it from the more restrictive concept of orthogonal projection.
DEFINITION 6.15.– An operator $A \in \mathcal{B}(H)$ is called a projector, or a projection operator, if it is idempotent, i.e. $A^2 = A$.

The presence of an inner product in $H$ allows us to target a specific projection: the orthogonal projection. The results presented in Chapter 5 showed that the completeness of $H$ with respect to the topology generated by the inner product allows us to give two equivalent definitions of orthogonal projection.

DEFINITION 6.16.– Let $H$ be a Hilbert space and $S$ a closed proper subspace of $H$. The function:
\[ P_S : H \longrightarrow S, \quad x \longmapsto P_S(x) \]
is the orthogonal projector on the subspace $S$ if $\|x - P_S(x)\| = \inf_{y \in S} \|x - y\|$, that is, $P_S(x)$ is the element in $S$ which minimizes the distance from $x \in H$ with respect to the norm induced by the inner product of $H$.

In an equivalent manner, if we consider the decomposition of $H$: $H = S \oplus S^\perp$ and write $x = x' + x''$, with $x \in H$, $x' \in S$, $x'' \in S^\perp$, then the orthogonal projection operator $P_S$ is defined via the formula $P_S(x) = x'$.

Let us consider an example of a projector. Take $H = L^2[-a, a]$, with $a \in \mathbb{R}$, equipped with the Borel $\sigma$-algebra and the Lebesgue measure. The odd and even functions can be easily verified to be orthogonal for the inner product of $L^2[-a, a]$. We then have the following decomposition:
\[ f(x) = \underbrace{\frac{f(x) + f(-x)}{2}}_{\text{even part}} + \underbrace{\frac{f(x) - f(-x)}{2}}_{\text{odd part}}, \quad \forall x \in [-a, a], \]
thus the projector of $f \in L^2[-a, a]$ on the subspace $\mathcal{P} \subset L^2[-a, a]$ of even functions is defined by $P_{\mathcal{P}} f(x) = \frac{f(x) + f(-x)}{2}$, and the projector on the subspace $\mathcal{I} \subset L^2[-a, a]$ of odd functions is defined by $P_{\mathcal{I}} f(x) = \frac{f(x) - f(-x)}{2}$, $\forall x \in [-a, a]$.

Now, let us examine the properties of the operator $P_S$.

1) $P_S|_S = \mathrm{id}_S$. This is trivial: the element $P_S(x) \in S$ which minimizes the distance to $x \in S$ is $x$ itself. In other words, if $x = x' \in S$, then $P_S(x) = P_S(x') = x'$.

2) $P_S^2 = P_S$ (idempotence). $\forall x \in H$, we have $P_S^2(x) = P_S(P_S(x)) = P_S(x)$, since $P_S(x) \in S$ by definition and $P_S$ acts as the identity on $S$. Thus $P_S$ is indeed a projector.
3) $P_S$ is a continuous linear operator. Let $x_1 = x_1' + x_1'' \in H$, with $x_1' \in S$ and $x_1'' \in S^\perp$, and let $x_2 = x_2' + x_2'' \in H$, with $x_2' \in S$ and $x_2'' \in S^\perp$. For all $\alpha, \beta \in \mathbb{K}$ we have:
\[ \alpha x_1 + \beta x_2 = \underbrace{\alpha x_1' + \beta x_2'}_{\in S} + \underbrace{\alpha x_1'' + \beta x_2''}_{\in S^\perp} \]
and thus:
\[ P_S(\alpha x_1 + \beta x_2) = \alpha x_1' + \beta x_2' = \alpha P_S(x_1) + \beta P_S(x_2) \]
$P_S$ is thus a linear operator. Its continuity can be proven by showing that it is bounded: taking any $x = x' + x'' \in H$, with $x' \in S$, $x'' \in S^\perp$, then, by the Pythagorean theorem, $\|x\|^2 = \|x'\|^2 + \|x''\|^2$ and $\|P_S x\|^2 = \|x'\|^2 \le \|x'\|^2 + \|x''\|^2 = \|x\|^2$, i.e. $\|P_S x\| \le \|x\|$ $\forall x \in H$;

4) $P_S = P_S^\dagger$ (self-adjoint). To prove this, we use the projection theorem twice, on $x \in H$ and on $y \in H$: $x = x' + x''$, $y = y' + y''$, $x', y' \in S$, $x'', y'' \in S^\perp$:
\[ \langle P_S x, y \rangle = \langle x', y' + y'' \rangle = \langle x', y' \rangle + \underbrace{\langle x', y'' \rangle}_{=0} = \langle x - x'', y' \rangle = \langle x, y' \rangle - \underbrace{\langle x'', y' \rangle}_{=0} = \langle x, y' \rangle \]
and since $y' = P_S y$, then: $\langle P_S x, y \rangle = \langle x, P_S y \rangle$ $\forall x, y \in H$;

5) $P_S$ is a 1-Lipschitz function, that is:
\[ \|P_S(x) - P_S(y)\| \le \|x - y\|, \quad \forall x, y \in H \]
We simply note that $\forall x, y \in H$, the projection of $x - y$, i.e. $P_S(x - y)$, and the residual vector $(x - y) - P_S(x - y)$ are orthogonal, since one belongs to $S$ and the other to $S^\perp$. Thus, we can apply the Pythagorean theorem and write:
\[ \|x - y\|^2 = \|(x - y) - P_S(x - y) + P_S(x - y)\|^2 = \|(x - y) - P_S(x - y)\|^2 + \|P_S(x - y)\|^2 \ge \|P_S(x - y)\|^2 = \|P_S(x) - P_S(y)\|^2 \]
So, we obtain: $\|P_S(x) - P_S(y)\| \le \|x - y\|$, $\forall x, y \in H$.

6) The non-trivial orthogonal projectors have unit norm:
\[ \|P_S\| = \begin{cases} 1 & \text{if } S \ne \{0_H\} \\ 0 & \text{if } S = \{0_H\} \end{cases} \]
If $S = \{0_H\}$ then $P_S \equiv 0$, thus its norm is 0. Otherwise, by setting $y = 0$ in the 1-Lipschitz property of $P_S$ we have that $\|P_S x\| \le \|x\|$ $\forall x \in H$, i.e. $\frac{\|P_S x\|}{\|x\|} \le 1$ $\forall x \in H \setminus \{0_H\}$. Furthermore, if $S \ne \{0_H\}$, then there exists $\bar{x} \in S$, $\bar{x} \ne 0_H$ and $P_S \bar{x} = \bar{x}$, i.e. $\|P_S \bar{x}\| = \|\bar{x}\|$, i.e. $\frac{\|P_S \bar{x}\|}{\|\bar{x}\|} = 1$. Then $\|P_S\| = \sup_{x \in H, \ x \ne 0} \frac{\|P_S x\|}{\|x\|} = 1$.

7) $\operatorname{Im}(P_S) = S$. This is obvious, by definition of the projection operator.

8) $\ker P_S = S^\perp$. We must show the double inclusion:
$x \in \ker P_S \implies P_S(x) = 0$, so, for all $y \in S$ it holds that $0 = \langle P_S(x), y \rangle = \langle x, P_S^\dagger(y) \rangle = \langle x, y \rangle$ since $P_S^\dagger = P_S$ and $P_S(y) = y$, thus $x \in S^\perp$.
$x \in S^\perp \implies x = 0 + x = P_S(x) + P_{S^\perp}(x)$, by uniqueness of the orthogonal decomposition, thus $P_S(x) = 0$ and then $x \in \ker P_S$.

9) One immediate consequence of the two previous properties and the projection theorem is that:
\[ H = \operatorname{Im}(P_S) \oplus \ker(P_S) \quad \forall S \text{ closed subspace of } H. \]
10) $P_S + P_{S^\perp} = \mathrm{id}_H$ (decomposition of the identity). For any $x \in H$, we always have the decomposition $x = x' + x''$ with $x' = P_S(x)$ and $x'' = P_{S^\perp}(x)$. We thus have $P_S(x) + P_{S^\perp}(x) = (P_S + P_{S^\perp})(x) = x' + x'' = x$, $\forall x \in H$. An immediate consequence is that:
\[ P_{S^\perp} = \mathrm{id}_H - P_S \quad \forall S \text{ closed subspace of } H. \]
$P_{S^\perp}$ is also called the complementary projector and is also denoted by $P_S^\perp$. For all $x \in H$, the residual vector of the projection of $x$ on $S$ is obtained via $P_{S^\perp}(x) = (\mathrm{id}_H - P_S)(x) = x - P_S(x)$;

11) Characterization of the projection subspace:
\[ S = \{x \in H : P_S(x) = x\} = \{x \in H : \|P_S(x)\| = \|x\|\} \]
that is, the elements of $S$ are the fixed points of $P_S$ in $H$, which may themselves be characterized as the elements of $H$ which have a norm equal to that of their projection on $S$. Let us prove that $S = \{x \in H : P_S(x) = x\}$: an element of $S$ is a point of $H$ on which $P_S$ acts as the identity; vice versa, if $x \in H$ satisfies $P_S(x) = x$, then $x = P_S(x) \in \operatorname{Im}(P_S) = S$.
Let us now check that $P_S(x) = x \iff \|P_S(x)\| = \|x\|$:
$\Longrightarrow$: evidently, $P_S(x) = x \implies \|P_S(x)\| = \|x\|$;
$\Longleftarrow$: again, taking $H \ni x = x' + x''$ with $P_S(x) = x'$, we have:
\[ \|P_S(x)\| = \|x\| \implies \|P_S(x)\|^2 = \|x\|^2 \implies \|x'\|^2 = \|x' + x''\|^2 \implies \|x'\|^2 = \|x'\|^2 + \|x''\|^2 \ (\text{since } x' \perp x'') \]
\[ \iff \|x''\|^2 = 0 \iff x'' = 0 \ (\text{property of the norm}) \iff x = x', \]
we thus have $\|P_S(x)\| = \|x\| \implies x = x' = P_S(x)$.

12) $\langle P_S x, x \rangle = \langle x, P_S x \rangle = \|P_S x\|^2$ for all $x \in H$. The first equality is simply a consequence of the fact that $P_S$ is self-adjoint; then, by idempotence, $P_S = P_S^2 = P_S P_S$, so $\langle P_S x, x \rangle = \langle P_S^2 x, x \rangle = \langle P_S x, P_S x \rangle = \|P_S x\|^2$.

Two of the properties proven above characterize orthogonal projectors among bounded linear operators.

THEOREM 6.31 ("Algebraic" characterization of orthogonal projectors).– Taking $A \in \mathcal{B}(H)$, the following statements are equivalent:
1) $A$ is an orthogonal projector;
2) $A^\dagger A = A$;
3) $A^\dagger A = A^\dagger$;
4) $A$ is self-adjoint and idempotent, i.e. $A^\dagger = A$ and $A^2 = A$.

PROOF.– The theorem will be proven by the logical loop 1) $\implies$ 2) $\implies$ 3) $\implies$ 4) $\implies$ 1).

1) $\implies$ 2): $A$ is an orthogonal projector, hence $A^\dagger = A$ and $A^2 = A$, then $A^\dagger A = AA = A^2 = A$.
2) $\implies$ 3): if $A^\dagger A = A$, then $(A^\dagger A)^\dagger = A^\dagger$, but we know that $A^\dagger A$ is self-adjoint, thus $(A^\dagger A)^\dagger = A^\dagger A = A^\dagger$.

3) $\implies$ 4): if $A^\dagger A = A^\dagger$, then $(A^\dagger A)^\dagger = A^{\dagger\dagger} = A$; moreover, $A^\dagger A$ is self-adjoint, hence $(A^\dagger A)^\dagger = A^\dagger A = A$. By hypothesis, $A^\dagger A = A^\dagger$, so $A^\dagger = A$, that is, $A$ is self-adjoint. Reusing the starting hypothesis $A^\dagger A = A^\dagger$, the fact that $A$ is self-adjoint implies that $A^2 = A$, that is, $A$ is idempotent.

4) $\implies$ 1): let $A \in \mathcal{B}(H)$ be self-adjoint and idempotent. We wish to show that $A$ is an orthogonal projector. By definition, an orthogonal projector projects onto a closed vector subspace, so we first need to show that $\operatorname{Im}(A)$, the subspace which is intended to be the "site" of the projection, is closed, given the hypotheses of the theorem.
We can show that the continuity and idempotence of a linear operator $A$ imply the closure of its image; this is remarkable, since the relationship between the concept of closure of a vector subspace and idempotence is far from obvious.

Let $(x_n)_{n \in \mathbb{N}} \subset \operatorname{Im}(A)$ be a sequence converging to $x_0 \in H$. We wish to show that $x_0 \in \operatorname{Im}(A)$. Since each $x_n \in \operatorname{Im}(A)$, then, $\forall n \in \mathbb{N}$, there exists $\xi_n \in H$ such that $x_n = A \xi_n$ and we can thus rewrite $x_n \xrightarrow[n \to +\infty]{} x_0$ as $A \xi_n \xrightarrow[n \to +\infty]{} x_0$. The continuity of $A$ implies that $A^2 \xi_n \xrightarrow[n \to +\infty]{} A x_0$, but $A^2 = A$, hence $A \xi_n \xrightarrow[n \to +\infty]{} A x_0$. We then have $A \xi_n \xrightarrow[n \to +\infty]{} A x_0$ and $A \xi_n \xrightarrow[n \to +\infty]{} x_0$, and the uniqueness of the limit implies that $A x_0 = x_0$, i.e. $x_0 \in \operatorname{Im}(A)$, thus $\operatorname{Im}(A)$ is closed.

Next, the property $A^\dagger = A$ is used alongside idempotence to show that $A$ projects in an orthogonal manner. First, let us write an orthogonal decomposition of $H$ with respect to $\operatorname{Im}(A)$: for all $x \in H$, let us consider $x = Ax + (x - Ax)$, then $Ax \in \operatorname{Im}(A)$ by definition. We need to show that $x - Ax$ is orthogonal to any vector of the form $A\xi$, $\xi \in H$:
\[ \langle A\xi, x - Ax \rangle = \langle \xi, A^\dagger(x - Ax) \rangle \underset{A \text{ s.a.}}{=} \langle \xi, A(x - Ax) \rangle = \langle \xi, Ax - A^2 x \rangle \underset{A^2 = A}{=} \langle \xi, Ax - Ax \rangle = 0 \]
thus $x - Ax \in \operatorname{Im}(A)^\perp$ and then $A = P_{\operatorname{Im}(A)}$ by the orthogonal projection theorem. □

Exercise 6.5
Let $E \subset \ell^2(\mathbb{N}, \mathbb{C})$ be the set:
\[ E = \{x = (x_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{N}, \mathbb{C}) : x_0 + x_1 = 0\} \]
1) Show that $E$ is a closed vector subspace of $\ell^2(\mathbb{N}, \mathbb{C})$.
2) Provide an explicit description of $E^\perp$ and determine the orthogonal projection operator $P_E : \ell^2(\mathbb{N}, \mathbb{C}) \to E$ on $E$. Determine $\|P_E\|$ and $P_E^\dagger$.
3) Let $x = (x_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{N}, \mathbb{C})$ be such that:
\[ x_n = \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{otherwise} \end{cases} \]
Calculate the distance between $x$ and the subspace $E$, i.e. $\delta = \inf_{y \in E} \{\|x - y\|\}$.
4) Let $A : \ell^2(\mathbb{N}, \mathbb{C}) \to \ell^2(\mathbb{N}, \mathbb{C})$ be the operator defined by:
\[ (A(x_0, x_1, x_2, \dots))_n = \begin{cases} -x_1 & \text{if } n = 0 \\ x_n & \text{otherwise} \end{cases} \]
Determine $\|A\|$ and $A^\dagger$. (Hint: calculate $A(0, 1, 0, 0, \dots)$.)
5) Show that $A^2 = A$ and determine $\operatorname{Im}(A)$. Is $A$ an orthogonal projector?

Solution to Exercise 6.5
1) We begin by showing that $E$ is a vector subspace of $\ell^2(\mathbb{N}, \mathbb{C})$: let us consider any $\lambda \in \mathbb{C}$ and arbitrary $x, y \in E$. Then, given that the linear structure of $\ell^2(\mathbb{N}, \mathbb{C})$ is defined pointwise, [6.24] tells us that:
\[ z := \lambda x + y = (\lambda x_n)_{n \in \mathbb{N}} + (y_n)_{n \in \mathbb{N}} = (\lambda x_n + y_n)_{n \in \mathbb{N}} \]
thus $z_0 = \lambda x_0 + y_0$ and $z_1 = \lambda x_1 + y_1$, and then $z_0 + z_1 = \lambda(x_0 + x_1) + (y_0 + y_1) = \lambda \cdot 0 + 0 = 0$ since $x, y \in E$, showing that $E$ is stable with respect to the linear combinations of its elements.

We can show that $E$ is closed using a technique which is particularly useful in the context of constraints such as $x_0 + x_1 = 0$. This approach consists of identifying the constraint with the condition defining the kernel of a continuous linear operator between normed vector spaces, which we know from Theorem 6.8 to be a closed vector subspace of the operator domain. In our case, the sum of the coordinate maps on the first and second components, that is, $A := P_0 + P_1 : \ell^2(\mathbb{N}, \mathbb{C}) \to \mathbb{C}$, $A(x) := P_0(x) + P_1(x) = x_0 + x_1$, is a continuous linear operator (insofar as it is a sum of continuous linear operators) between two Hilbert spaces such that $\ker(A) = \{x \in \ell^2(\mathbb{N}, \mathbb{C}) : Ax = 0 \iff x_0 + x_1 = 0\} = E$, demonstrating the closure of $E$.

2) Using the constraint $x_0 + x_1 = 0$, a sequence $x \in E$ can be written as $(x_0, -x_0, x_2, x_3, \dots)$, of course respecting the fact that $x \in \ell^2(\mathbb{N}, \mathbb{C})$. This implies that the canonical Hilbert basis of $\ell^2(\mathbb{N}, \mathbb{C})$, that is, $e = ((1, 0, 0, \dots), (0, 1, 0, \dots), \dots)$, can be used to construct a family whose span is dense in $E$:
\[ \tilde{e} := ((1, -1, 0, \dots), (-1, 1, 0, \dots), (0, 0, 1, 0, \dots), \dots) \]
(strictly speaking, $\tilde{e}_0$ and $\tilde{e}_1$ are collinear and not normalized, so $\tilde{e}$ is a total family rather than a Hilbert basis of $E$, which is all that is needed here), thus: $E^\perp = \{y \in \ell^2(\mathbb{N}, \mathbb{C}) : \langle y, \tilde{e}_n \rangle = 0 \ \forall n \in \mathbb{N}\}$. Taking $y = (y_0, y_1, y_2, \dots) \in \ell^2(\mathbb{N}, \mathbb{C})$, then:
- $n = 0$: $\langle y, \tilde{e}_0 \rangle = y_0 - y_1 + 0 + \cdots = y_0 - y_1$, null if and only if $y_0 = y_1$;
- $n = 1$: $\langle y, \tilde{e}_1 \rangle = -y_0 + y_1 + 0 + \cdots = -y_0 + y_1$, null if and only if $y_0 = y_1$, as in the case where $n = 0$;
- $n = 2$: $\langle y, \tilde{e}_2 \rangle = 0 + 0 + y_2 + 0 + \cdots = y_2$, null if and only if $y_2 = 0$.
Evidently, for all $n \ge 2$, $\langle y, \tilde{e}_n \rangle = y_n$, which is null if and only if $y_n = 0$. Thus, the only vectors $y \in \ell^2(\mathbb{N}, \mathbb{C})$ which are orthogonal to all elements of the family $\tilde{e}$ are those of the form $y = (y_0, y_0, 0, \dots)$, that is:
\[ E^\perp = \{(y, y, 0, 0, \dots), \ y \in \mathbb{C}\} \]
The orthogonal projection operator on $E$ can be determined using the projection theorem: $\ell^2(\mathbb{N}, \mathbb{C}) = E \oplus E^\perp$. We decompose the arbitrary vector $z = (z_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{N}, \mathbb{C})$ into a sum of two vectors, one belonging to $E$ and the other to $E^\perp$. This is done by noting that, given $z = (z_0, z_1, z_2, \dots)$, $z \in E$ if the first two components are the opposite of one another, and $z \in E^\perp$ if the first two components are equal and all components are null from the third position onward; then:
\[ z = (z_0, z_1, z_2, z_3, \dots) = (a, -a, z_2, z_3, \dots) + (b, b, 0, 0, \dots) = (a + b, b - a, z_2, z_3, \dots) \]
which implies the system of constraints:
\[ \begin{cases} a + b = z_0 \\ b - a = z_1 \end{cases} \]
solved by $a = (z_0 - z_1)/2$ and $b = (z_0 + z_1)/2$, that is:
\[ (z_0, z_1, z_2, \dots) = \left( \frac{z_0 - z_1}{2}, -\frac{z_0 - z_1}{2}, z_2, \dots \right) + \left( \frac{z_0 + z_1}{2}, \frac{z_0 + z_1}{2}, 0, 0, \dots \right) \]
with the first vector in $E$ and the second in $E^\perp$; thus:
\[ P_E(z_0, z_1, z_2, \dots) = \left( \frac{z_0 - z_1}{2}, -\frac{z_0 - z_1}{2}, z_2, \dots \right) \]
is the explicit expression of the orthogonal projector on $E$. Finally, without carrying out a single calculation, we can state that $P_E$ has unit norm, $\|P_E\| = 1$, given that it is a non-trivial orthogonal projector, and also that $P_E^\dagger = P_E$, as orthogonal projectors are self-adjoint.

3) Let $x = (x_n)_{n \in \mathbb{N}}$ be the element in $\ell^2(\mathbb{N}, \mathbb{C})$ such that:
\[ x_n = \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{otherwise} \end{cases} \]
Since $E$ is a closed vector subspace of $\ell^2(\mathbb{N}, \mathbb{C})$, the distance between $x$ and $E$ is well defined thanks to the projection theorem. $P_E(x)$ represents the vector in $E$ which is the closest to $x$; therefore, this distance is equal to $\delta = \|x - P_E(x)\|_2$:
\[ P_E(x) = P_E((1, 0, \dots)) = \left( \frac{1 - 0}{2}, -\frac{1 - 0}{2}, 0, \dots \right) = \left( \frac{1}{2}, -\frac{1}{2}, 0, \dots \right) \]
Then:
\[ \delta = \left\| (1, 0, \dots) - \left( \frac{1}{2}, -\frac{1}{2}, 0, \dots \right) \right\|_2 = \left\| \left( \frac{1}{2}, \frac{1}{2}, 0, \dots \right) \right\|_2 = \sqrt{\left( \frac{1}{2} \right)^2 + \left( \frac{1}{2} \right)^2} = \frac{1}{\sqrt{2}} \]
4) First, we note that $x_0$ plays no part in the action of $A$, thus:
\[ A(x_0, x_1, x_2, \dots) = A(y_0, x_1, x_2, \dots) \quad \forall y_0 \in \mathbb{C} \]
Notably, this holds true for $y_0 = 0$; since replacing $x_0$ by 0 does not change $Ax$ and can only decrease $\|x\|_2$, we can limit the action of $A$ to the elements of $\ell^2(\mathbb{N}, \mathbb{C})$ of the form $x = (0, x_1, x_2, \dots)$. Using this specification, by direct calculation, we obtain:
\[ \|Ax\|_2 = \sqrt{|-x_1|^2 + |x_1|^2 + |x_2|^2 + \dots} = \sqrt{2|x_1|^2 + |x_2|^2 + \dots} \le \sqrt{2|x_1|^2 + 2|x_2|^2 + \dots} = \sqrt{2} \sqrt{0^2 + |x_1|^2 + |x_2|^2 + \dots} = \sqrt{2} \|x\|_2 \]
With this majorization, the definition of the operator norm from equation [6.3] becomes:
\[ \|A\| = \inf\{0 < c \le \sqrt{2} : \|Ax\|_2 \le c \|x\|_2 \ \forall x = (0, x_1, x_2, \dots) \in \ell^2(\mathbb{N}, \mathbb{C})\} \]
Since $\|A\| \le \sqrt{2}$, if we can identify a unit vector $x \in \ell^2(\mathbb{N}, \mathbb{C})$ for which $\|Ax\|_2 = \sqrt{2}$, then the norm of $A$ must be $\sqrt{2}$. Taking the hint given in the question, we calculate:
\[ \|A(0, 1, 0, \dots)\|_2 = \|(-1, 1, 0, \dots)\|_2 = \sqrt{(-1)^2 + 1^2 + 0^2 + \dots} = \sqrt{2} \]
and then $\|A\| = \sqrt{2}$.

Now, let us determine $A^\dagger$. For all $x, y \in \ell^2(\mathbb{N}, \mathbb{C})$ (in this case, $x$ is not necessarily of the form $(0, x_1, x_2, \dots)$) it holds that:
\[ \langle Ax, y \rangle = \langle (-x_1, x_1, x_2, \dots), (y_0, y_1, y_2, \dots) \rangle = -x_1 \overline{y_0} + x_1 \overline{y_1} + x_2 \overline{y_2} + \cdots \]
\[ = x_0 \cdot 0 + x_1 \overline{(y_1 - y_0)} + x_2 \overline{y_2} + \cdots = \langle x, (0, y_1 - y_0, y_2, \dots) \rangle = \langle x, A^\dagger y \rangle \]
and then the adjoint operator of $A$ is:
\[ A^\dagger(y) = (0, y_1 - y_0, y_2, \dots) \quad \forall y \in \ell^2(\mathbb{N}, \mathbb{C}) \]
5) We have $A^2 x = AAx = A(-x_1, x_1, x_2, \dots) = (-x_1, x_1, x_2, \dots) = Ax$ for all $x \in \ell^2(\mathbb{N}, \mathbb{C})$, thus $A$ is idempotent. Moreover, we clearly see that $\operatorname{Im}(A) = E$, where $E$ is the subspace defined at the beginning of the exercise. Thus $A$ is a projection operator on $E$, but since $\|A\| = \sqrt{2} \ne 1$, it cannot be an orthogonal projector. $A$ is therefore an oblique projection operator on $E$. The difference between the actions of $A$ and $P_E$ is:
\[ Ax = (-x_1, x_1, x_2, \dots) \quad \text{(oblique projector on } E\text{)} \]
\[ P_E x = \left( \frac{x_0 - x_1}{2}, -\frac{x_0 - x_1}{2}, x_2, \dots \right) \quad \text{(orthogonal projector on } E\text{)} \]
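The contrast between the oblique projector $A$ and the orthogonal projector $P_E$ of Exercise 6.5 can be checked numerically. The sketch below is an illustration only: sequences are truncated to their first four components (an assumption made so that everything is finite), and the function names `P_E`, `A` and `A_dagger` simply transcribe the formulas obtained in the solution.

```python
import math

# Sequences of l^2 are truncated to four components for illustration.

def P_E(x):
    """Orthogonal projector on E = {x : x_0 + x_1 = 0}."""
    h = (x[0] - x[1]) / 2
    return [h, -h] + list(x[2:])

def A(x):
    """Oblique projector on E from part 4 of the exercise."""
    return [-x[1], x[1]] + list(x[2:])

def A_dagger(y):
    """Adjoint of A as computed in the solution."""
    return [0, y[1] - y[0]] + list(y[2:])

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(sum(abs(c) ** 2 for c in x))

e0 = [1, 0, 0, 0]
e1 = [0, 1, 0, 0]
x = [1 + 2j, -1j, 3, 0.5]
y = [2, 1 - 1j, -1j, 4]

norm_A_e1 = norm(A(e1))                                   # equals sqrt(2)
distance = norm([a - b for a, b in zip(e0, P_E(e0))])     # equals 1/sqrt(2)
adjoint_gap = abs(inner(A(x), y) - inner(x, A_dagger(y)))
P_symmetry_gap = abs(inner(P_E(x), y) - inner(x, P_E(y)))
A_symmetry_gap = abs(inner(A(x), y) - inner(x, A(y)))     # nonzero: A is not s.a.
```

Both maps are idempotent with image in $E$, but only `P_E` passes the symmetry test, in agreement with Theorem 6.31.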
6.7.1. Bounded multiplication operators and their relation to orthogonal projectors

In this section, we shall present a concrete application of the last theorem, while taking the opportunity to introduce a new category of highly useful linear operators.

DEFINITION 6.17.– Let $H = L^2(X, \mathcal{A}, \mu)$ and take $g \in L^\infty(X, \mathcal{A}, \mu)$. The multiplication operator by $g$ is defined by:
\[ M_g : L^2(X, \mathcal{A}, \mu) \longrightarrow L^2(X, \mathcal{A}, \mu), \quad f \longmapsto M_g f = f \cdot g \]
where $(f \cdot g)(x) = f(x) g(x)$ $\forall x \in X$ (pointwise multiplication).

Now, let us examine the properties of $M_g$.

– $M_g$ is bounded $\forall g \in L^\infty(X, \mathcal{A}, \mu)$:
\[ \|M_g f\|_2^2 = \int_X |f(x) g(x)|^2 \, d\mu(x) \le \Big( \operatorname*{ess\,sup}_{x \in X} |g(x)|^2 \Big) \int_X |f(x)|^2 \, d\mu(x) = \|g\|_\infty^2 \|f\|_2^2 < +\infty \]
thus5 $\|M_g\| \le \|g\|_\infty$, and then $M_g \in \mathcal{B}(L^2(X, \mathcal{A}, \mu))$ $\forall g \in L^\infty(X, \mathcal{A}, \mu)$.

– $\forall g, h \in L^\infty(X, \mathcal{A}, \mu)$, by the commutativity of the pointwise product, multiplication operators commute, that is, $M_g M_h = M_h M_g$, $[M_g, M_h] = 0$.

– $\ker(M_g) = \{f \in L^2(X, \mathcal{A}, \mu) : M_g(f) = f \cdot g = 0_{L^2(X, \mathcal{A}, \mu)}\}$. Defining the set $N_g = \{x \in X : g(x) = 0\}$, it is clear that $(f \cdot g)(x) = 0$ $\forall x \in N_g$. Thus, since $g(x) \ne 0$ $\forall x \in N_g^c = X \setminus N_g$, to obtain the zero function on $X$ via the product $f \cdot g$, we must simply impose the condition that $f$ must be null on $N_g^c$ (remember that $f$ is an equivalence class of functions which are equal a.e.). In short:
\[ \ker(M_g) = \{f \in L^2(X, \mathcal{A}, \mu) : f(x) = 0 \text{ for a.e. } x \in N_g^c\} \]
– Now, let us consider the invertibility of $M_g$. For the kernel of $M_g$ to be trivial, the only element in $\ker(M_g)$ must be the equivalence class in which the identically zero function appears. This corresponds to requiring that $\mu(N_g) = 0$, since in this case $\mu(N_g^c) = \mu(X) - \mu(N_g) = \mu(X)$, thus $\ker(M_g) = \{f \in L^2(X, \mathcal{A}, \mu) : f(x) = 0 \text{ a.e.}\}$.

– If $\mu(N_g) = 0$, then there exists an inverse operator of $M_g$: $M_g^{-1} : \operatorname{Im}(M_g) \to L^2(X, \mathcal{A}, \mu)$, which can be characterized using the function $\frac{1}{g} : X \to \mathbb{K}$,
\[ \frac{1}{g}(x) = \begin{cases} \frac{1}{g(x)} & \text{if } g(x) \ne 0 \\ 0 & \text{otherwise} \end{cases} \]

5 It is possible to show that if $(X, \mathcal{A}, \mu)$ is a measure space with a $\sigma$-finite measure, i.e. $X = \bigcup_{k \in \mathbb{N}} A_k$, where $\mu(A_k) < +\infty$, then $\|M_g\| = \|g\|_\infty$.
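By definition, $\operatorname{Im}(M_g) = \{h \in L^2(X, \mathcal{A}, \mu) : \exists f \in L^2(X, \mathcal{A}, \mu) : h = M_g(f) = f \cdot g\}$; it is thus clear that $\frac{1}{g} \cdot h = \frac{1}{g} \cdot f \cdot g = f$. This simple observation allows us to characterize both $\operatorname{Im}(M_g)$ and the action of $M_g^{-1}$:
\[ \operatorname{Im}(M_g) = \Big\{ h \in L^2(X, \mathcal{A}, \mu) : \frac{1}{g} \cdot h \in L^2(X, \mathcal{A}, \mu) \Big\} \quad \text{and} \quad M_g^{-1} = M_{\frac{1}{g}} \]
– Let us determine the adjoint of $M_g$: $\forall f, h \in L^2(X, \mathcal{A}, \mu)$, $g \in L^\infty(X, \mathcal{A}, \mu)$:
\[ \langle M_g^\dagger f, h \rangle = \langle f, M_g h \rangle = \langle f, gh \rangle = \int_X f(x) \overline{g(x)} \, \overline{h(x)} \, d\mu(x) = \int_X \big( \overline{g(x)} f(x) \big) \overline{h(x)} \, d\mu(x) = \langle \bar{g} \cdot f, h \rangle = \langle M_{\bar{g}} f, h \rangle \]
that is, $M_g^\dagger = M_{\bar{g}}$, by Theorem 6.10.

– Now, we calculate $M_g^2$:
\[ M_g^2 f = M_g(M_g f) = M_g(fg) = f g^2 = M_{g^2} f \quad \forall f \in L^2(X, \mathcal{A}, \mu), \ g \in L^\infty(X, \mathcal{A}, \mu) \]
Thus, the bounded linear operator $M_g$ is self-adjoint and idempotent if and only if $\bar{g} = g$ and $g^2 = g$ (almost everywhere). The first condition means that $g$ must be a real-valued function, but the only function with real values which is equal to its own square is a function which only takes values of 0 and 1, that is, the indicator function of a measurable subset of $X$, which is clearly an element of $L^\infty(X, \mathcal{A}, \mu)$.

In summary, the multiplication operator $M_g$ is an orthogonal projection operator if and only if $g = \chi_E$, with $E \subseteq X$ measurable. Since $N_{\chi_E} = E^c$, $M_{\chi_E}$ is injective if and only if $\mu(E^c) = 0$, i.e. if and only if $E$ coincides with $X$ up to a null set, in which case $M_{\chi_E}$ is the identity.

Leaving aside invertibility, let us calculate the image of $M_{\chi_E}$: the condition which determines this subspace is $\frac{1}{\chi_E} \cdot h \in L^2(X, \mathcal{A}, \mu)$, but, by definition, $\frac{1}{\chi_E}(x) = 0$ $\forall x \in X$ such that $\chi_E(x) = 0$, that is, $\forall x \in E^c$ and, in this case, $\frac{1}{\chi_E} \cdot h \in L^2(X, \mathcal{A}, \mu)$. When $x \in E$, $\frac{1}{\chi_E}(x) = 1$, so the defining condition of $\operatorname{Im}(M_g)$ becomes $h \in L^2(E, \mathcal{A}, \mu)$.

In conclusion, for any measurable set $E \subseteq X$, the orthogonal projector and multiplication operator $M_{\chi_E}$ is, explicitly:
\[ M_{\chi_E} : L^2(X, \mathcal{A}, \mu) \longrightarrow L^2(E, \mathcal{A}, \mu), \quad f \longmapsto M_{\chi_E} f = \begin{cases} f(x) & x \in E \\ 0 & \text{otherwise} \end{cases} \]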
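A discrete analogue makes these properties of $M_g$ tangible. The sketch below assumes $X = \{0, \dots, 5\}$ with the counting measure, so that $L^2(X)$ is simply $\mathbb{C}^6$ and functions are lists of values; the set `E`, the functions `f`, `h`, `g` and the helper names are all hypothetical.

```python
import math

# Discrete model: X = {0,...,5} with the counting measure, L^2(X) = C^6.
# Everything below (E, f, h, g, helper names) is an illustrative assumption.
X = range(6)
E = {1, 3, 4}
chi_E = [1 if x in E else 0 for x in X]

def multiply(g, f):
    """Multiplication operator M_g applied to f (pointwise product)."""
    return [fx * gx for fx, gx in zip(f, g)]

def inner(f, h):
    return sum(a * b.conjugate() for a, b in zip(f, h))

def norm(f):
    return math.sqrt(sum(abs(c) ** 2 for c in f))

f = [1 + 1j, 2, -1, 0.5j, 3, -2j]
h = [2j, 1, 1 - 1j, 4, -1, 0.5]

# M_{chi_E} is idempotent and self-adjoint, i.e. an orthogonal projector.
Pf = multiply(chi_E, f)
idempotent = multiply(chi_E, Pf) == Pf
symmetry_gap = abs(inner(multiply(chi_E, f), h) - inner(f, multiply(chi_E, h)))

# For a general bounded g, the adjoint of M_g is M_{conj(g)} and
# ||M_g f|| <= ||g||_inf * ||f||.
g = [1j, 2, 0, 1 - 1j, -3, 0.5]
g_bar = [c.conjugate() for c in g]
adjoint_gap = abs(inner(multiply(g, f), h) - inner(f, multiply(g_bar, h)))
sup_g = max(abs(c) for c in g)
bound_holds = norm(multiply(g, f)) <= sup_g * norm(f) + 1e-12
```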
6.7.2. Geometric realization of orthogonal projection operators via orthonormal systems

We now have the means of proving another important analogy between Hilbert spaces and finite-dimensional Euclidean spaces related to the geometric realization of orthogonal projectors on a vector subspace generated by an orthogonal family, which we have already discussed in Chapter 1. We recall that the orthogonal projector of an inner product vector space $V$ of finite dimension $n$ on a vector subspace $S$ of dimension $s$ can be written as:
\[ P_S(x) = \sum_{i=1}^{s} \langle x, u_i \rangle u_i \]
where $(u_i)_{i=1}^{s}$ is any orthonormal basis of $S$. In a Hilbert space, we have the following result.

THEOREM 6.32.– Take $A \in \mathcal{B}(H)$, $A \ne 0$. The following statements are equivalent:
1) $A$ is an orthogonal projector;
2) there exists an orthonormal system6 $(u_n)_{n \in \mathbb{N}}$ in $H$ such that:
\[ Ax = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n \quad \forall x \in H \]
In this case, $A$ projects onto the closed subspace $\overline{\operatorname{span}}(u_n, n \in \mathbb{N})$.

PROOF.– 1) $\implies$ 2): First, we note that an orthogonal projector $A$ is surjective if and only if it is the identity operator. The condition $\operatorname{Im}(A) = H$ implies, by properties 7 and 8 of orthogonal projectors, that $\operatorname{Im}(A)^\perp = \ker(A) = \{0_H\}$; thus, by property 9, it holds that $Ax = x$ $\forall x \in H$, i.e. $A = \mathrm{id}_H$. In this case, any complete orthonormal system $(u_n)_{n \in \mathbb{N}}$ in $H$ realizes $A$ since, on the one hand, $Ax = x$, and on the other hand, by Theorem 5.11 regarding the characterization of complete orthonormal systems, we can write $x = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n$. Given that a complete orthonormal system is a special instance of an orthonormal system, the implication 1) $\implies$ 2) when $\operatorname{Im}(A) = H$ is true.

Now, let $A$ be an orthogonal projector such that $\operatorname{Im}(A) \subsetneq H$, that is, $\operatorname{Im}(A)$ is a closed vector subspace (by definition of an orthogonal projector) and proper in $H$; thus, it is a Hilbert space itself, properly included in $H$. Let $(u_n)_{n \in \mathbb{N}}$ be any complete orthonormal system in $\operatorname{Im}(A)$. Given our hypotheses, $(u_n)_{n \in \mathbb{N}}$ is only an orthonormal system (and not, generally, a complete orthonormal system) of $H$. For all $y \in \operatorname{Im}(A)$, we have the following decomposition:
\[ y = \sum_{n \in \mathbb{N}} \langle y, u_n \rangle u_n \]
Moreover, ImpAq “ tAx, x P Hu, so, using the fact that A, as an orthogonal projector, is self-adjoint: ÿ ÿ xAx, un yun “ xx, Aun yun , @x P H Ax “ nPN
pA s.a.q
nPN
Since A is the identity on ImpAq and un P ImpAq @n P N, then Aun “ un , hence: ÿ xx, un yun , @x P H Ax “ nPN
that is, the orthogonal projector A is realized on the orthonormal system pun qnPN of H as described in point 2. 6 Note that although we write pun qnPN , the orthonormal system may be finite, i.e. it may include a finite number of un ‰ 0.
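2) $\Longrightarrow$ 1): let $(u_n)_{n \in \mathbb{N}}$ be an orthonormal system of $H$ such that, for any pair $x, y \in H$:

$$Ax = \sum_{m \in \mathbb{N}} \langle x, u_m \rangle u_m, \qquad Ay = \sum_{n \in \mathbb{N}} \langle y, u_n \rangle u_n$$

then, by the continuity of the inner product:

$$\langle x, Ay \rangle = \Big\langle x, \sum_{n \in \mathbb{N}} \langle y, u_n \rangle u_n \Big\rangle = \sum_{n \in \mathbb{N}} \overline{\langle y, u_n \rangle} \langle x, u_n \rangle = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle \langle u_n, y \rangle$$

Again, using the continuity of the inner product, as we saw when proving Parseval's identity (Theorem 5.11):

$$\langle x, A^{\dagger} A y \rangle = \langle Ax, Ay \rangle = \Big\langle \sum_{m \in \mathbb{N}} \langle x, u_m \rangle u_m, \sum_{n \in \mathbb{N}} \langle y, u_n \rangle u_n \Big\rangle = \sum_{m \in \mathbb{N}} \sum_{n \in \mathbb{N}} \langle x, u_m \rangle \overline{\langle y, u_n \rangle} \langle u_m, u_n \rangle$$

$$= \sum_{m \in \mathbb{N}} \sum_{n \in \mathbb{N}} \langle x, u_m \rangle \langle u_n, y \rangle \, \delta_{n,m} = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle \langle u_n, y \rangle$$

so $\langle x, Ay \rangle = \langle x, A^{\dagger} A y \rangle$ $\forall x, y \in H$, that is, $A = A^{\dagger} A$ by Theorem 6.10. Using the algebraic characterization of orthogonal projectors, Theorem 6.31, we can therefore state that $A$ is an orthogonal projector.

Supposing that 1 and 2 are verified, then:
1) $\Longrightarrow$ $\operatorname{Im}(A) = \ker(A)^{\perp}$ and $\ker(A) = \operatorname{Im}(A)^{\perp}$;
2) $\Longrightarrow$ $\operatorname{Im}(A) \subseteq \overline{\operatorname{span}}(u_n, n \in \mathbb{N})$, which is obvious, and $\ker(A) \subseteq \big(\overline{\operatorname{span}}(u_n, n \in \mathbb{N})\big)^{\perp}$, which is not quite so obvious. For $x = 0_H$ this is true; taking $x \in \ker(A)$, $x \neq 0_H$, then:

$$Ax = 0 = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n = \lim_{N \to +\infty} \sum_{n=1}^{N} \langle x, u_n \rangle u_n$$

Taking the inner product of both sides with an arbitrary $u_k$ and using the continuity of the inner product together with the orthonormality of the $u_n$, we obtain $\langle x, u_k \rangle = 0$ for all $k \in \mathbb{N}$, that is, $x \in (u_n, n \in \mathbb{N})^{\perp} = \big(\operatorname{span}(u_n, n \in \mathbb{N})\big)^{\perp} = \big(\overline{\operatorname{span}}(u_n, n \in \mathbb{N})\big)^{\perp}$, by the properties of the orthogonal complement.

To summarize: on one side, $\operatorname{Im}(A) \subseteq \overline{\operatorname{span}}(u_n, n \in \mathbb{N})$, while, on the other side, $\ker(A) \subseteq \big(\overline{\operatorname{span}}(u_n, n \in \mathbb{N})\big)^{\perp}$, thus $\overline{\operatorname{span}}(u_n, n \in \mathbb{N}) \subseteq \ker(A)^{\perp} = \operatorname{Im}(A)$, that is, $\operatorname{Im}(A) = \overline{\operatorname{span}}(u_n, n \in \mathbb{N})$. $\square$

REMARK.– We see from the proof of this theorem that any complete orthonormal system $(u_n)_{n \in \mathbb{N}}$ of $\operatorname{Im}(A)$ can be used to realize a projector in the sense defined by the theorem.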
This means that, although each term in the summation may be different, the overall action of the operator will be the same for any complete orthonormal system $(u_n)_{n \in \mathbb{N}}$ of $\operatorname{Im}(A)$. A remarkable application of this result is shown in Exercise 6.6, which illustrates the way in which the best linear approximation of a parabola on a real interval may be found using the orthogonal projection theory on a Hilbert space.

Exercise 6.6
Let $f(x) \equiv 1$ (the constant function equal to 1) and $g(x) = x$, the identity function, seen as two elements of $L^2[0,1]$. Calculate:
1) the angle $\vartheta$ between $f$ and $g$;
2) their distance in $L^2[0,1]$;
3) the projection $P_W h$ of the function $h \in L^2[0,1]$, $h(x) = x^2$, on the vector subspace $W = \operatorname{span}(f, g)$.
Interpret your findings.

Solution to Exercise 6.6
1) The angle between $f$ and $g$ is obtained using the definition of inner product: $\langle f, g \rangle = \|f\| \|g\| \cos(\vartheta)$, so we need to calculate $\langle f, g \rangle$, $\|f\|$, $\|g\|$:

$$\langle f, g \rangle = \int_0^1 x \, dx = \left[ \frac{x^2}{2} \right]_0^1 = \frac{1}{2}$$

$$\|f\| = \left( \int_0^1 dx \right)^{1/2} = 1, \qquad \|g\| = \left( \int_0^1 x^2 \, dx \right)^{1/2} = \left( \left[ \frac{x^3}{3} \right]_0^1 \right)^{1/2} = \frac{1}{\sqrt{3}}$$

In conclusion, $\cos(\vartheta) = \dfrac{\langle f, g \rangle}{\|f\| \|g\|} = \dfrac{\sqrt{3}}{2}$, thus $\vartheta = \dfrac{\pi}{6}$.

2) Distance:

$$d(f, g) = \|f - g\| = \left( \int_0^1 (f(x) - g(x))^2 \, dx \right)^{1/2} = \left( \int_0^1 (1 - x)^2 \, dx \right)^{1/2} = \left( \left[ -\frac{(1-x)^3}{3} \right]_0^1 \right)^{1/2} = \frac{1}{\sqrt{3}} = \frac{\sqrt{3}}{3}$$

3) Projection on $W$: We use the characterization of projection given by the previous theorem. We need to construct an orthonormal basis of $W$, which can be done by using the Gram-Schmidt procedure. A wise choice is to begin with the function $f$, which is a generator of $W$ and, furthermore, has unit norm. The second (and final) element in the orthonormal basis of $W$ is then:

$$\tilde{g}(x) = \frac{g(x) - \langle f, g \rangle f(x)}{\|g - \langle f, g \rangle f\|} = \frac{x - \frac{1}{2}}{\left\| x - \frac{1}{2} \right\|}$$

with:

$$\left\| x - \frac{1}{2} \right\| = \left( \int_0^1 \left( x - \frac{1}{2} \right)^2 dx \right)^{1/2} = \left( \left[ \frac{1}{3} \left( x - \frac{1}{2} \right)^3 \right]_0^1 \right)^{1/2} = \frac{1}{2\sqrt{3}}$$

so $\tilde{g}(x) = 2\sqrt{3} \left( x - \frac{1}{2} \right)$ and the desired orthonormal basis is $B = \big(1, \sqrt{3}(2x - 1)\big)$. The orthogonal projection of $h(x) = x^2$ on $W$ is thus:

$$P_W h = \langle h, f \rangle f + \langle h, \tilde{g} \rangle \tilde{g}$$

By direct calculation, $\langle h, f \rangle = \frac{1}{3}$ and $\langle h, \tilde{g} \rangle = \frac{\sqrt{3}}{6}$, hence:

$$P_W h(x) = \frac{1}{3} + \frac{\sqrt{3}}{6} \sqrt{3} (2x - 1) = x - \frac{1}{6}$$
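The result $P_W h(x) = x - \frac{1}{6}$ can be cross-checked with exact rational arithmetic. The following is only a minimal sketch, not part of the original solution: the helper names `inner`, `axpy` and `project` are hypothetical, polynomials are stored as coefficient lists, and an orthogonal (not normalized) Gram-Schmidt basis is used so that no square roots appear. The projection formula $\sum_b \frac{\langle h, b \rangle}{\langle b, b \rangle} b$ is equivalent to the orthonormal one used above.

```python
from fractions import Fraction

def inner(p, q):
    """L^2[0,1] inner product of two real polynomials.

    p[i] is the coefficient of x**i, so the integral of x**(i+j)
    over [0, 1] contributes 1/(i+j+1).
    """
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

def axpy(alpha, p, q):
    """Return the coefficient list of alpha*p + q."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [alpha * a + b for a, b in zip(p, q)]

def project(h, generators):
    """Orthogonal projection of h onto span(generators) via Gram-Schmidt."""
    basis = []
    for g in generators:
        e = list(g)
        for b in basis:
            # remove the component of g along the previous basis vector b
            e = axpy(-inner(g, b) / inner(b, b), b, e)
        basis.append(e)
    proj = [Fraction(0)]
    for b in basis:
        proj = axpy(inner(h, b) / inner(b, b), b, proj)
    return proj

f = [1]          # f(x) = 1
g = [0, 1]       # g(x) = x
h = [0, 0, 1]    # h(x) = x^2

p_w_h = project(h, [f, g])
residual = axpy(-1, p_w_h, h)   # h - P_W h
print(p_w_h)                    # coefficients of P_W h: constant term, then x
print(inner(residual, f), inner(residual, g))
```

The two printed inner products are those of the residual $h - P_W h$ with the generators of $W$; they vanish exactly when the residual is orthogonal to $W$, which is what characterizes the orthogonal projection.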
The interpretation of this result is as follows: the functions $f : [0,1] \ni x \mapsto 1$ and $g : [0,1] \ni x \mapsto x$ are the generators of the space $W$ of linear functions (straight lines) defined on the interval $[0,1]$. In fact, any linear function $\ell : [0,1] \to \mathbb{K}$ may be written as $\ell(x) = \alpha + \beta x$, $x \in [0,1]$, with $\alpha, \beta \in \mathbb{K}$; since $\alpha + \beta x = \alpha f(x) + \beta g(x)$ $\forall x \in [0,1]$, then $\ell = \alpha f + \beta g$.

The function $h : [0,1] \ni x \mapsto x^2$ is a parabola defined on the same interval. By definition of orthogonal projection:

$$P_W h = \underset{w \in W}{\arg\min} \, \|h - w\|$$

that is, $P_W h$ is the element in $W$ which minimizes the $L^2$ distance between $h$ and the straight lines. So, the straight line with equation $y = x - \frac{1}{6}$ is the best approximation of the parabola with equation $y = x^2$, in the sense of the $L^2$ norm, on the interval $[0,1]$. $\square$

Figure 6.1 shows a graphical representation of this approximation.

A list of properties of orthogonal projectors follows (for the proofs of these properties, see, for example, Abbati and Cirelli 1997). For all $A, B \in \mathcal{B}(H)$, we recall that:

$$[A, B] = AB - BA$$

$[A, B]$ is said to be the commutator of $A$ and $B$. If $[A, B] = 0$, the zero operator, that is, $AB = BA$, then $A$ and $B$ are said to commute.

Let $R$ and $S$ be two closed vector subspaces in the Hilbert space $H$ and let $P_R$, $P_S$ be the orthogonal projectors on $R$ and $S$, respectively.
Figure 6.1. The line of equation $y = x - \frac{1}{6}$ (shown in blue) is the best approximation of the parabola with equation $y = x^2$ (in red) with respect to the Hilbert norm of $L^2[0,1]$. For a color version of this figure, see www.iste.co.uk/provenzi/spaces.zip
THEOREM 6.33 (Sum of orthogonal projectors).– The following statements are equivalent:
1) $P_R + P_S$ is an orthogonal projector;
2) $P_R P_S = P_S P_R = 0$;
3) $P_R(x) = 0$ $\forall x \in S$ and $P_S(x) = 0$ $\forall x \in R$;
4) $R \perp S$.
Moreover, if $P_R + P_S$ is an orthogonal projector, then it projects on $R + S$.

THEOREM 6.34 (Product of orthogonal projectors).– The following statements are equivalent:
1) $P_R P_S$ is an orthogonal projector;
2) $P_S P_R$ is an orthogonal projector;
3) $[P_R, P_S] = 0$;
4) $R = (R \cap S) \oplus (R \cap S^{\perp})$;
5) $S = (R \cap S) \oplus (R^{\perp} \cap S)$.
If $P_R P_S$ and $P_S P_R$ are orthogonal projectors, then they project on $R \cap S$.

THEOREM 6.35 (Difference between orthogonal projectors).– The following statements are equivalent:
1) $P_R - P_S$ is an orthogonal projector;
2) $P_R P_S = P_S P_R = P_S$;
3) $R \cap S = S$, i.e. $S \subseteq R$.
If $P_R - P_S$ is an orthogonal projector, then it projects on $R \cap S^{\perp}$.

THEOREM 6.36 (Mixing projector sum, difference and product).– If $[P_S, P_R] = 0$, then $P_R + P_S - P_R P_S$ is an orthogonal projector which projects on $\overline{\operatorname{span}}(R \cup S)$.

6.8. Isometric and unitary operators

In this section, we shall determine the properties of isometric and unitary operators in a Hilbert space of infinite dimension, and provide an algebraic and geometric characterization of these operators. Once again, the adjoint operator plays a fundamental role in algebraic characterization, while orthonormal systems and Hilbert bases are crucial for the geometric characterization.

In finite-dimensional vector spaces $V$, a linear operator which preserves the inner product, that is, $A : V \to V$, $\langle Ax, Ay \rangle = \langle x, y \rangle$ $\forall x, y \in V$, also preserves the norm of the vectors (simply by considering $x = y$), that is, $\|Ax\| = \|x\|$ $\forall x \in V$. To prove the converse, it is sufficient to consider the polarization formula$^7$ [1.7], $\forall x, y \in V$:

$$\langle x, y \rangle = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 + i \|x + iy\|^2 - i \|x - iy\|^2 \right)$$

If we replace $x, y$ with $Ax, Ay$ and use the linearity of $A$:

$$\langle Ax, Ay \rangle = \frac{1}{4} \left( \|A(x + y)\|^2 - \|A(x - y)\|^2 + i \|A(x + iy)\|^2 - i \|A(x - iy)\|^2 \right)$$
7 The complex case is considered here; the real case is even simpler.
Assuming that $A$ preserves the norm, we have:

$$\langle Ax, Ay \rangle = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 + i \|x + iy\|^2 - i \|x - iy\|^2 \right) = \langle x, y \rangle$$

As we know, the norm canonically generates a metric via $d(x, y) = \|x - y\|$. For this reason, an operator which preserves the inner product or norm is said to be isometric.

The only vector which has a norm of zero is the vector $0_V$, thus an isometric operator $A$ never transforms a non-zero vector (whose norm is $> 0$) into the null vector, that is, $\ker(A) = \{0_V\}$. Hence, $\dim(\ker(A)) = 0$ and then, by the rank theorem, $\dim(\operatorname{Im}(A)) = \dim(V)$. In other words, an isometric endomorphism in finite dimensions is automatically surjective.

In an infinite-dimensional Hilbert space, it is still true that a bounded linear operator preserves the inner product if and only if it preserves the norm. However, the statement that an isometric operator $A : H \to H$ is always surjective is no longer true. One counter-example is provided by the operator $A \in \mathcal{B}(H)$ defined by $Au_n = u_{2n}$, where $(u_n)_{n \in \mathbb{N}}$ is an arbitrary Hilbert basis of $H$. Evidently, $A$ is isometric, but, as we will see in Theorem 6.39, $\operatorname{Im}(A) = \overline{\operatorname{span}}(u_k, k \in \mathbb{N}, k \text{ even}) \subset H$; the inclusion is strict, since $(u_k, k \in \mathbb{N}, k \text{ even})$ is not a complete orthonormal system, as it is a proper subset of $(u_n)_{n \in \mathbb{N}}$. These considerations lead to Definition 6.18.

DEFINITION 6.18.– The operator $A : H \to H$ is said to be:
– isometric, if $\langle Ax, Ay \rangle = \langle x, y \rangle$ $\forall x, y \in H$, or, in an equivalent manner, if $\|Ax\| = \|x\|$ $\forall x \in H$;
– unitary, if $A$ is isometric and surjective.

Let us calculate the norm of an isometric operator:

$$\|A\| = \sup_{\|x\| = 1} \|Ax\| = \sup_{\|x\| = 1} \|x\| = 1$$

Since a unitary operator is also isometric, we have that the norm of isometric and unitary operators is 1.

BASIC EXAMPLES OF UNITARY OPERATORS.– Let us consider $\mathbb{R}^n$ with the Borel $\sigma$-algebra and the Lebesgue measure. Given a fixed element $a \in \mathbb{R}^n$, any translation operator:

$$T_a : L^2(\mathbb{R}^n) \longrightarrow L^2(\mathbb{R}^n), \quad f \longmapsto T_a f, \quad \text{where } T_a f(x) = f(x - a) \ \ \forall x \in \mathbb{R}^n$$
is unitary. In fact, we know that it is well defined, linear and isometric due to the shift invariance of the Lebesgue measure. It is also surjective, since, for any element $g \in L^2(\mathbb{R}^n)$, we simply need to consider $f \in L^2(\mathbb{R}^n)$, $f(x) = g(x + a)$ $\forall x \in \mathbb{R}^n$, to obtain $T_a f = g$.

Now, let $R \in O(n)$ be a rotation matrix of $\mathbb{R}^n$, where $O(n)$ is the orthogonal group of dimension $n$, that is, the group of square matrices $R$ of dimension $n$ which are orthogonal, that is, such that $R^t = R^{-1}$. Any rotation operator:

$$T_R : L^2(\mathbb{R}^n) \longrightarrow L^2(\mathbb{R}^n), \quad f \longmapsto T_R f, \quad \text{where } T_R f(x) = f(Rx) \ \ \forall x \in \mathbb{R}^n$$

is unitary. It is isometric due to the fact that the Jacobian of the transformation, that is, the determinant of $R$, has an absolute value of 1, and thus the integrals used to calculate the norm of $T_R f$ and of $f$ are equal. It is also surjective, since for any element $g \in L^2(\mathbb{R}^n)$, we simply need to consider $f \in L^2(\mathbb{R}^n)$, $f(x) = g(R^t x)$ $\forall x \in \mathbb{R}^n$, to obtain $T_R f = g$.

A special case of the rotation operator is obtained with the opposite of the identity matrix, $P = -I$, for which $T_P f = f_P$, with $f_P(x) = f(-x)$. $T_P$ is known as the parity operator.

6.8.1. Characterizations of isometric and unitary operators

The following results establish a useful characterization of isometric and unitary operators.

THEOREM 6.37 (Algebraic characterization of isometric operators).– $A \in \mathcal{B}(H)$ is an isometric operator if and only if $A^{\dagger} A = \mathrm{id}_H$.

PROOF.– Let $A$ be isometric, then $\forall x \in H$:

$$\langle A^{\dagger} A x, x \rangle = \langle Ax, Ax \rangle = \|Ax\|^2 \underset{A \text{ isometric}}{=} \|x\|^2 = \langle x, x \rangle$$

that is, $A^{\dagger} A = \mathrm{id}_H$, because the self-adjoint operator $A^{\dagger} A - \mathrm{id}_H$ has a vanishing quadratic form. Conversely, if $A^{\dagger} A = \mathrm{id}_H$, then $\forall x, y \in H$:

$$\langle Ax, Ay \rangle = \langle A^{\dagger} A x, y \rangle = \langle x, y \rangle$$

that is, $A$ conserves the inner product, and thus it is isometric. $\square$
The following result is particularly useful in optimization theory and in quantum mechanics.
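THEOREM 6.38.– Let $A \in \mathcal{B}(H)$ be isometric, then $AA^{\dagger}$ is an orthogonal projector.

PROOF.– We will use the algebraic characterization of orthogonal projectors: since we already know that $AA^{\dagger}$ is self-adjoint for all $A \in \mathcal{B}(H)$, we simply need to verify that $AA^{\dagger}$ is bounded and idempotent.

1) $AA^{\dagger}$ is bounded: $\forall x \in H$ it holds that:

$$\|AA^{\dagger} x\| = \|A(A^{\dagger} x)\| \underset{A \text{ isometric}}{=} \|A^{\dagger} x\| \leq \|A^{\dagger}\| \|x\| = \|A\| \|x\| \underset{\|A\| = 1}{=} \|x\|$$

2) $AA^{\dagger}$ is idempotent:

$$(AA^{\dagger})^2 = AA^{\dagger}(AA^{\dagger}) = A(A^{\dagger} A)A^{\dagger} \underset{A^{\dagger} A = \mathrm{id}_H}{=} AA^{\dagger}. \qquad \square$$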
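The operator $Au_n = u_{2n}$ introduced above gives a concrete instance of Theorems 6.37 and 6.38. The sketch below is only illustrative and rests on stated assumptions: vectors are restricted to finitely many non-zero coefficients and stored as dictionaries mapping the index $n$ to $\langle x, u_n \rangle$; the function names are hypothetical; and the adjoint is taken to act as $A^{\dagger} u_m = u_{m/2}$ for even $m$ and $A^{\dagger} u_m = 0$ for odd $m$, which follows from $\langle Au_n, u_m \rangle = \delta_{2n,m}$.

```python
def A(x):
    """Isometry A u_n = u_{2n}; x maps the index n to the coefficient <x, u_n>."""
    return {2 * n: c for n, c in x.items()}

def A_dagger(x):
    """Adjoint of A: keeps even indices m (sent to m/2) and discards odd ones."""
    return {m // 2: c for m, c in x.items() if m % 2 == 0}

def norm_sq(x):
    """Squared norm, computed as the sum of the squared moduli of the coefficients."""
    return sum(abs(c) ** 2 for c in x.values())

def P(x):
    """The operator A A†."""
    return A(A_dagger(x))

# A hypothetical test vector with four non-zero coefficients.
x = {1: 2.0, 2: -1.0 + 1.0j, 3: 0.5, 4: 3.0j}

print(norm_sq(A(x)) == norm_sq(x))   # A preserves the norm
print(A_dagger(A(x)) == x)           # A†A acts as the identity
print(P(x))                          # only the even-indexed components survive
print(P(P(x)) == P(x))               # AA† is idempotent
```

On this vector $AA^{\dagger}$ differs from the identity, which is consistent with $A$ being isometric but not surjective.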
$AA^{\dagger}$ projects onto its image, which can be characterized as follows.

THEOREM 6.39.– Let $A \in \mathcal{B}(H)$ be isometric. The image of the orthogonal projector $AA^{\dagger}$ is $\operatorname{Im}(A)$, so the image of an isometric $A \in \mathcal{B}(H)$ is a closed vector subspace of $H$.

PROOF.– We wish to show that $\operatorname{Im}(AA^{\dagger}) = \operatorname{Im}(A)$. We begin by observing that, in general, $\forall A, B \in \mathcal{B}(H)$, it holds that $\operatorname{Im}(AB) \subseteq \operatorname{Im}(A)$, with equality whenever $B$ is surjective:

$$\operatorname{Im}(A) = \{y \in H : \exists x \in H : Ax = y\}, \qquad \operatorname{Im}(AB) = \{z \in H : \exists x \in \operatorname{Im}(B) : Ax = z\}$$

Taking $B = A^{\dagger}$, then $\operatorname{Im}(AA^{\dagger}) \subseteq \operatorname{Im}(A)$ $\forall A \in \mathcal{B}(H)$. Now, let $A$ be isometric, that is, $A^{\dagger} A = \mathrm{id}_H$, and $y \in \operatorname{Im}(A)$, then $\exists x \in H$ such that:

$$y = Ax = A(A^{\dagger} A)x = AA^{\dagger}(Ax)$$

that is, $y \in \operatorname{Im}(AA^{\dagger})$, hence $\operatorname{Im}(A) \subseteq \operatorname{Im}(AA^{\dagger})$. Thus $\operatorname{Im}(AA^{\dagger}) = \operatorname{Im}(A)$. Since $AA^{\dagger}$ is an orthogonal projector, we know that its image is closed; hence, $\operatorname{Im}(A)$ is closed for any isometric operator. $\square$

The fact that an isometric operator has a closed image can be shown directly, using a proof very similar to that of Theorem 6.13.

If $A$ is unitary, that is, isometric and surjective, then $\operatorname{Im}(AA^{\dagger}) = \operatorname{Im}(A) = H$ and then $AA^{\dagger} = \mathrm{id}_H$.

The fact that $\ker(A^{\dagger}) = \operatorname{Im}(A)^{\perp}$ immediately gives us sufficient conditions to guarantee the invertibility or non-invertibility of the adjoint of an operator in $\mathcal{B}(H)$.
THEOREM 6.40.– Taking $A \in \mathcal{B}(H)$:
– if $A$ is isometric and not surjective, then $\ker(A^{\dagger}) = \operatorname{Im}(A)^{\perp} \neq \{0_H\}$, i.e. $A^{\dagger}$ is not invertible;
– if $A$ is unitary, then $\ker(A^{\dagger}) = H^{\perp} = \{0_H\}$, i.e. $A^{\dagger}$ is invertible.

Now, let us apply these results to the case of the operator $Au_n = u_{2n}$, where $(u_n)_{n \in \mathbb{N}}$ is a complete orthonormal system in $H$. As noted before, since $\|Au_n\| = \|u_{2n}\| = 1 = \|u_n\|$, $A$ is isometric, but it is not unitary, as $\operatorname{Im}(A) = \overline{\operatorname{span}}(u_k, k \text{ even}) \subset H$. Let us determine $A^{\dagger}$: $\langle Au_n, u_m \rangle = \langle u_n, A^{\dagger} u_m \rangle$; furthermore, $\langle Au_n, u_m \rangle = \langle u_{2n}, u_m \rangle = \delta_{2n,m}$, then we can write $\langle u_n, A^{\dagger} u_m \rangle = \delta_{2n,m}$, that is:

$$A^{\dagger} u_m = \begin{cases} u_{m/2} & \text{if } m = 2n \text{ for some } n \in \mathbb{N} \\ 0 & \text{otherwise} \end{cases}$$

We see that $\ker(A^{\dagger}) = \overline{\operatorname{span}}(u_m, m \text{ odd}) = \operatorname{Im}(A)^{\perp}$, confirming our results.

The following theorem gives a complete algebraic characterization of unitary operators.

THEOREM 6.41 (Algebraic characterization of unitary operators).– Given $A \in \mathcal{B}(H)$, the following statements are equivalent:
1) $A$ is unitary;
2) $A$ is invertible and $A^{-1} = A^{\dagger} \in \mathcal{B}(H)$;
3) $A^{\dagger} A = AA^{\dagger} = \mathrm{id}_H$;
4) $A^{\dagger}$ is unitary.

PROOF.– 1) $\Longrightarrow$ 2): We know that if $A$ is unitary, then $A$ is injective, that is, invertible on its image; by definition, $A$ is surjective; therefore, it is bijective and invertible on all of $H$. To show that $A^{-1} = A^{\dagger}$, we write:

$$\langle A^{\dagger} A x, y \rangle = \langle Ax, Ay \rangle = \langle x, y \rangle, \quad \forall x, y \in H$$

hence $A^{\dagger} A = \mathrm{id}_H$ (showing that $A^{\dagger}$ is the left inverse of $A$) and then:

$$A^{\dagger} = A^{\dagger}(AA^{-1}) = (A^{\dagger} A)A^{-1} = \mathrm{id}_H A^{-1} = A^{-1}$$

that is, $A^{\dagger} = A^{-1}$. Now, we need only to prove that $A^{\dagger} = A^{-1}$ is bounded. Since $A$ is surjective, for every $x \in H$ $\exists y \in H$ such that $x = Ay$, and, by unitarity, $\|x\| = \|Ay\| = \|y\|$, that is, $\|x\| = \|y\|$ and then:

$$\|A^{\dagger} x\| = \|A^{-1} x\| = \|A^{-1} A y\| = \|y\| = \|x\| \quad \forall x \in H$$

which implies that $\|A^{-1}\| = \|A^{\dagger}\| = \sup_{\|x\| = 1} \|x\| = 1$.

2) $\Longrightarrow$ 3): $A^{\dagger} = A^{-1} \Longrightarrow A^{\dagger} A = A^{-1} A = \mathrm{id}_H$ and $A^{\dagger} = A^{-1} \Longrightarrow AA^{\dagger} = AA^{-1} = \mathrm{id}_H$.

3) $\Longrightarrow$ 4): From $AA^{\dagger} = \mathrm{id}_H$, we obtain $\langle x, AA^{\dagger} y \rangle = \langle x, y \rangle$ $\forall x, y \in H$; furthermore, $\langle x, AA^{\dagger} y \rangle = \langle A^{\dagger} x, A^{\dagger} y \rangle$, thus $\langle A^{\dagger} x, A^{\dagger} y \rangle = \langle x, y \rangle$ $\forall x, y \in H$, that is, $A^{\dagger}$ is isometric. Now, we only need to prove that $A^{\dagger}$ is surjective. We do this using the other identity, $A^{\dagger} A = \mathrm{id}_H$, which implies:

$$A^{\dagger}(Ay) = (A^{\dagger} A)y = \mathrm{id}_H(y) = y, \quad \forall y \in H$$

that is, $\forall y \in H$ $\exists \xi = Ay \in H$ such that $A^{\dagger} \xi = y$, i.e. $A^{\dagger}$ is surjective.

1) $\Longrightarrow$ 4) $\Longrightarrow$ 1): As we have seen, given an arbitrary unitary operator, its adjoint is also unitary. Using the hypothesis that $A^{\dagger}$ is a unitary operator, then $A^{\dagger\dagger}$ is unitary, and, since $A^{\dagger\dagger} = A$, then unitary $A^{\dagger}$ implies unitary $A$. $\square$

One consequence of this result is that we can study the unitarity of an operator $A$ by considering that of its adjoint, which can be simpler. Corollary 6.5 shows that the norm of a unitary operator is invariant with respect to adjunction and inversion.

COROLLARY 6.5.– If $A \in \mathcal{B}(H)$ is unitary, then $\|A\| = \|A^{\dagger}\| = \|A^{-1}\| = 1$.

Let $\mathcal{U}(H)$ be the set of unitary operators on a Hilbert space $H$. If $A, B \in \mathcal{U}(H)$, then, by direct calculation, we can verify that $AB \in \mathcal{U}(H)$. The theorem proved above tells us that if $A \in \mathcal{U}(H)$ then $A^{-1} = A^{\dagger} \in \mathcal{U}(H)$, that is, $\mathcal{U}(H)$ verifies the group axioms with respect to composition.

DEFINITION 6.19.– $\mathcal{U}(H)$ denotes the unitary group of $H$. $\mathcal{U}(H)$ coincides with the group of automorphisms of $H$: $\operatorname{Aut}(H)$.
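Some applications of the characterization of unitary operators are shown below.

Taking $H = L^2(X, \mathcal{A}, \mu)$ and $g \in L^{\infty}(X, \mathcal{A}, \mu)$, we have seen that the multiplication operator by $g$, defined by:

$$M_g : L^2(X, \mathcal{A}, \mu) \longrightarrow L^2(X, \mathcal{A}, \mu), \quad f \longmapsto M_g f = f \cdot g, \quad \text{where } (f \cdot g)(x) = f(x) g(x) \ \ \forall x \in X$$

is linear and bounded. Moreover, we know that $M_g^{\dagger} = M_{\bar{g}}$, thus $M_g M_g^{\dagger} = M_{g \bar{g}} = M_{|g|^2}$, and then $M_g M_g^{\dagger} = \mathrm{id}_H$ if and only if $M_{|g|^2} = \mathrm{id}_H$. This is equivalent to requiring that $|g|^2 = 1$ almost everywhere, that is, the equivalence class of $g$ must contain at least one representative, also denoted $g$ for simplicity's sake, of the form $g(x) = e^{ih(x)}$, with $h : X \to \mathbb{R}$ measurable.

Let us apply the last theorem that we have proved to verify that, for any complete orthonormal system $(u_n)_{n \in \mathbb{N}}$ of $H$, the operator $U$ defined as:

$$U u_n = (-i)^n u_n$$

is a unitary operator. We will use the algebraic characterization $U^{\dagger} U = U U^{\dagger} = \mathrm{id}_H$. On one side, by definition of the adjoint:

$$\langle U u_n, u_m \rangle = \langle u_n, U^{\dagger} u_m \rangle \qquad [6.25]$$

and, on the other side, $\forall n, m \in \mathbb{N}$:

$$\langle U u_n, u_m \rangle = \langle (-i)^n u_n, u_m \rangle = (-i)^n \delta_{n,m} = \langle u_n, \overline{(-i)^m} u_m \rangle \underset{[6.25]}{=} \langle u_n, U^{\dagger} u_m \rangle$$

Since $(u_n)_{n \in \mathbb{N}}$ is complete, this gives $U^{\dagger} u_n = \overline{(-i)^n} u_n$, and then:

$$U^{\dagger} U u_n = U^{\dagger}\big((-i)^n u_n\big) = (-i)^n U^{\dagger} u_n = (-i)^n \overline{(-i)^n} u_n = |(-i)^n|^2 u_n = u_n.$$

Moreover:

$$U U^{\dagger} u_n = U\big(\overline{(-i)^n} u_n\big) = \overline{(-i)^n} (-i)^n u_n = |(-i)^n|^2 u_n = u_n$$

Since $(u_n)_{n \in \mathbb{N}}$ is a complete orthonormal system, the fact that $U^{\dagger} U$ and $U U^{\dagger}$ act as the identity on any element $u_n$ can be extended to all of $H$. In fact, for all $x \in H$, it holds that $x = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n$ and, by the linearity and continuity of $U^{\dagger} U$ and $U U^{\dagger}$, we can write:

$$U^{\dagger} U x = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle U^{\dagger} U u_n = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n = x.$$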
The same is true for $U U^{\dagger} x$, i.e. $U^{\dagger} U = U U^{\dagger} = \mathrm{id}_H$, proving that $U$ is unitary.

As in finite dimensions, unitary operators allow us to establish an equivalence relation between operators, as formalized in the following definition.

DEFINITION 6.20.– Two operators $A, B \in \mathcal{B}(H)$ are unitarily equivalent if there exists a unitary operator $U \in \mathcal{B}(H)$ such that $A = U B U^{-1}$.

We do not have the space to go into greater detail regarding the properties of unitary equivalence here. We simply note that unitary equivalence preserves operator properties, such as continuity, invertibility and self-adjointness.

6.8.2. Relationship between isometric and unitary operators and orthonormal systems

The final property of isometric and unitary operators that we wish to discuss here is their interaction with orthonormal systems and complete orthonormal systems in Hilbert spaces. In finite dimension, isometric and unitary operators coincide, and they transform orthonormal bases into orthonormal bases. In infinite dimension, this remains true only for unitary operators.

THEOREM 6.42 (Geometric characterization of isometric operators).– $A \in \mathcal{B}(H)$ is isometric if and only if it transforms complete orthonormal systems $(u_n)_{n \in \mathbb{N}}$ in $H$ into orthonormal systems $(Au_n)_{n \in \mathbb{N}}$.

PROOF.– $\Longrightarrow$: for any complete orthonormal system $(u_n)_{n \in \mathbb{N}}$ in $H$, by the isometry of $A$, we can write:

$$\langle Au_n, Au_m \rangle = \langle u_n, u_m \rangle = \delta_{n,m} \quad \forall n, m \in \mathbb{N}$$

thus $(Au_n)_{n \in \mathbb{N}}$ is an orthonormal system of $H$.

$\Longleftarrow$: let $A \in \mathcal{B}(H)$, let $(u_n)_{n \in \mathbb{N}}$ be a complete orthonormal system of $H$ and $(Au_n)_{n \in \mathbb{N}}$ an orthonormal system of $H$. We wish to prove that $A$ is isometric. On one side, the fact that $(u_n)_{n \in \mathbb{N}}$ is a complete orthonormal system implies that, for all $x \in H$, we have the decomposition into a generalized Fourier series $x = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle u_n$ and Plancherel's identity $\|x\|^2 = \sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2$.

On the other side, by the continuity of $A$, we can write $Ax = \sum_{n \in \mathbb{N}} \langle x, u_n \rangle Au_n$; furthermore, the hypothesis that $(Au_n)_{n \in \mathbb{N}}$ is an orthonormal system of $H$ allows us to use the second part of the Riesz-Fischer theorem (Theorem 5.10) to state that$^8$:

$$\|Ax\|^2 = \sum_{n \in \mathbb{N}} |\langle x, u_n \rangle|^2$$

that is, $\|Ax\|^2 = \|x\|^2$, therefore $A$ is isometric. $\square$
THEOREM 6.43 (Geometric characterization of unitary operators).– $A \in \mathcal{B}(H)$ is unitary if and only if it transforms complete orthonormal systems $(u_n)_{n \in \mathbb{N}}$ in $H$ into complete orthonormal systems $(Au_n)_{n \in \mathbb{N}}$.

PROOF.– $\Longrightarrow$: a unitary operator $A$ is isometric, thus, by Theorem 6.42, $(Au_n)_{n \in \mathbb{N}}$ is an orthonormal system and we simply need to show that $(Au_n)_{n \in \mathbb{N}}$ is complete. We do this using one of the characteristic properties of a Hilbert basis: as we saw in point 2 of Theorem 5.11, if $\langle x, Au_n \rangle = 0$ $\forall n \in \mathbb{N}$ implies $x = 0_H$, then $(Au_n)_{n \in \mathbb{N}}$ is a complete orthonormal system. Since $A$ is surjective, there exists $y \in H$ such that $x = Ay$; hence, the condition $\langle x, Au_n \rangle = 0$ $\forall n \in \mathbb{N}$ becomes:

$$\forall n \in \mathbb{N} : \quad \langle Ay, Au_n \rangle \underset{A \text{ unitary}}{=} \langle y, u_n \rangle = 0$$

and since $(u_n)_{n \in \mathbb{N}}$ is a complete orthonormal system of $H$, $y = 0_H$, implying $x = A 0_H = 0_H$; thus $(Au_n)_{n \in \mathbb{N}}$ is a complete orthonormal system of $H$.

$\Longleftarrow$: by Theorem 6.42, we can guarantee that, if $A$ transforms complete orthonormal systems $(u_n)_{n \in \mathbb{N}}$ of $H$ into complete orthonormal systems $(Au_n)_{n \in \mathbb{N}}$ of $H$, then $A$ is at least isometric; thus, we only need to demonstrate its surjectivity. We have seen that the image of an isometric operator is closed, that is, by linearity, $\operatorname{Im}(A) = \overline{\operatorname{span}}(Au_n, n \in \mathbb{N}) = H$, since $(Au_n)_{n \in \mathbb{N}}$ is a complete orthonormal system by hypothesis; thus $A$ is surjective, implying that $A$ is unitary. $\square$

We end this section with a simple exercise involving both unitary operators and orthogonal projectors.

$^8$ Explicitly, the second part of the Riesz-Fischer theorem tells us that, given an orthonormal system $(v_n)_{n \in \mathbb{N}}$ in a Hilbert space $H$, if the series $\sum_{n \in \mathbb{N}} k_n v_n$ converges to $y \in H$, then it holds that $\|y\|^2 = \sum_{n \in \mathbb{N}} |k_n|^2$; in our case, $y = Ax$, $v_n = Au_n$ and $k_n = \langle x, u_n \rangle$.
Exercise 6.7
Let $H$ be a Hilbert space and $A \in \mathcal{B}(H)$. Show that the following properties are equivalent.
1) $A$ is self-adjoint and unitary.
2) The operator $P = \frac{1}{2}(A + \mathrm{id}_H)$ is an orthogonal projector.
3) There exist two mutually orthogonal closed subspaces $H_1$ and $H_2$ in $H$ such that $H = H_1 \oplus H_2$ and, for all $x = x_1 + x_2$, $x_i \in H_i$, it holds that $Ax = x_1 - x_2$.
Suggestion: show that 1) $\Longleftrightarrow$ 2) and 2) $\Longleftrightarrow$ 3).

Solution to Exercise 6.7
We begin with the equivalence 1) $\Longleftrightarrow$ 2).

1) $\Longrightarrow$ 2): By hypothesis, $A$ is self-adjoint, that is, $A = A^{\dagger}$, and unitary, that is, $A^{\dagger} = A^{-1}$. Then $A = A^{-1}$ and thus $A^2 = AA = AA^{-1} = \mathrm{id}_H$. We can use this fact to show that $P = \frac{1}{2}(A + \mathrm{id}_H)$ is self-adjoint and idempotent, implying that it is an orthogonal projector:

$$P^{\dagger} = \frac{1}{2}(A + \mathrm{id}_H)^{\dagger} \underset{\text{linearity of } \dagger}{=} \frac{1}{2}(A^{\dagger} + \mathrm{id}_H^{\dagger}) = \frac{1}{2}(A + \mathrm{id}_H) = P$$

$$P^2 = \frac{1}{4}(A^2 + 2A + \mathrm{id}_H) = \frac{1}{4}(\mathrm{id}_H + 2A + \mathrm{id}_H) = \frac{1}{2}(A + \mathrm{id}_H) = P$$

2) $\Longrightarrow$ 1): If property 2 holds, then we write $A = 2P - \mathrm{id}_H$, where $P$ is an orthogonal projector, and we prove that $A$ is self-adjoint and unitary:

$$A^{\dagger} = 2P^{\dagger} - \mathrm{id}_H^{\dagger} = 2P - \mathrm{id}_H = A$$

$$A^{\dagger} A = A^2 = (2P - \mathrm{id}_H)^2 = 4P^2 - 4P + \mathrm{id}_H = 4P - 4P + \mathrm{id}_H = \mathrm{id}_H$$

and, in the same way, $AA^{\dagger} = A^2 = \mathrm{id}_H$, so that $A^{\dagger} = A^{-1}$.

The next step is to analyze the equivalence 2) $\Longleftrightarrow$ 3).

2) $\Longrightarrow$ 3): If property 2 holds, then we know that $H = \operatorname{Im}(P) \oplus \ker(P)$, hence we take $H_1 = \operatorname{Im}(P)$ and $H_2 = \ker(P)$. Furthermore, if we write $H \ni x = x_1 + x_2$, $x_1 \in \operatorname{Im}(P)$ and $x_2 \in \ker(P)$, then $Px = x_1$ and:

$$P(x) = \frac{1}{2}(A + \mathrm{id}_H)(x) = \frac{1}{2}(Ax + x_1 + x_2)$$

that is, $x_1 = \frac{1}{2}(Ax + x_1 + x_2)$, and then $Ax = 2x_1 - x_1 - x_2 = x_1 - x_2$.

3) $\Longrightarrow$ 2): Assuming that property 3 is verified, $P(x) = \frac{1}{2}(Ax + x) = \frac{1}{2}(x_1 - x_2 + x_1 + x_2) = x_1$ for all $x \in H$; thus, by definition, $P$ is the orthogonal projector $P_{H_1}$, by the hypothesis that $H_1$ and $H_2$ are orthogonal and closed. $\square$
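The equivalences of Exercise 6.7 can be illustrated in the simplest finite-dimensional setting. The following sketch is an assumption-laden example and not part of the original solution: it takes $H = \mathbb{R}^2$, chooses the (hypothetical) unit vector $v = (3/5, 4/5)$ so that $H_1 = \operatorname{span}(v)$ and $H_2 = H_1^{\perp}$, and works with exact fractions. Helper names such as `matmul` and `apply` are illustrative only.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def apply(X, x):
    """Matrix-vector product."""
    return [sum(X[i][k] * x[k] for k in range(len(x))) for i in range(len(X))]

identity = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]

# Orthogonal projector onto H1 = span(v), with v a unit vector: P = v v^t.
v = [Fraction(3, 5), Fraction(4, 5)]
P = [[v[i] * v[j] for j in range(2)] for i in range(2)]

# The operator of the exercise: A = 2P - id.
A = [[2 * P[i][j] - identity[i][j] for j in range(2)] for i in range(2)]

print(transpose(P) == P and matmul(P, P) == P)   # P is self-adjoint and idempotent
print(transpose(A) == A)                         # A is self-adjoint
print(matmul(A, A) == identity)                  # A^2 = id, so A is unitary

# Decompose a test vector x = x1 + x2 and compare A x with x1 - x2.
x = [Fraction(1), Fraction(2)]
x1 = apply(P, x)
x2 = [x[i] - x1[i] for i in range(2)]
print(apply(A, x) == [x1[i] - x2[i] for i in range(2)])
```

Geometrically, $A$ is the reflection across $H_1$: it leaves the component $x_1$ unchanged and flips the sign of $x_2$.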
6.9. The Fourier transform on $\mathcal{S}(\mathbb{R}^n)$, $L^1(\mathbb{R}^n)$ and $L^2(\mathbb{R}^n)$

The Fourier transform on $L^2(\mathbb{R}^n)$ is the most important example of a unitary operator on $L^2(\mathbb{R}^n)$ in terms of its applications to theoretical physics, differential equation theory and signal processing, among others. Nonetheless, this operator is not simple to construct, as $L^2(\mathbb{R}^n)$ is not the most natural space for the Fourier transform; the most suitable environment for the Fourier transform is, in fact, the Schwartz space.

Several constructions of the Fourier transform on $L^2(\mathbb{R}^n)$ can be found in the literature; the most widespread, which shall be used here, consists of defining the Fourier transform on the Schwartz space to highlight its remarkable properties, and then operating an extension to $L^2(\mathbb{R}^n)$ using a limit procedure. In addition to this result, we shall present an explicit formula which makes use of the Hermite basis of $L^2(\mathbb{R}^n)$.

6.9.1. The invariance of the Schwartz space with respect to the Fourier transform

Let us begin by defining the Fourier transform on the Schwartz space $\mathcal{S}(\mathbb{R}^n)$ for $n = 1$. We will then generalize this definition for an arbitrary (finite) $n$.

DEFINITION 6.21.– The Fourier transform on $\mathcal{S}(\mathbb{R})$ is the following linear operator:

$$\hat{F} : \mathcal{S}(\mathbb{R}) \longrightarrow \mathcal{S}(\mathbb{R}), \quad f \longmapsto \hat{F}(f) = \hat{f}, \quad \text{where: } \hat{f}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x) e^{-i\omega x} \, dm(x)$$

where $m$ is the Lebesgue measure on $\mathbb{R}$ and $\omega \in \mathbb{R}$. The inverse Fourier transform on $\mathcal{S}(\mathbb{R})$ is the following linear operator:

$$\check{F} : \mathcal{S}(\mathbb{R}) \longrightarrow \mathcal{S}(\mathbb{R}), \quad f \longmapsto \check{F}(f) = \check{f}, \quad \text{where: } \check{f}(x) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(\omega) e^{i\omega x} \, dm(\omega)$$

More generally, the Fourier transform on $\mathcal{S}(\mathbb{R}^n)$ is the following linear operator:

$$\hat{F} : \mathcal{S}(\mathbb{R}^n) \longrightarrow \mathcal{S}(\mathbb{R}^n), \quad f \longmapsto \hat{F}(f) = \hat{f}, \quad \text{where: } \hat{f}(\omega) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} f(x) e^{-i\langle \omega, x \rangle} \, dm(x)$$

where $m$ is the Lebesgue measure on $\mathbb{R}^n$, $\omega \in \mathbb{R}^n$ and $\langle \omega, x \rangle = \sum_{k=1}^{n} \omega_k x_k$ is the Euclidean inner product in $\mathbb{R}^n$. The inverse Fourier transform on $\mathcal{S}(\mathbb{R}^n)$ is the following linear operator:

$$\check{F} : \mathcal{S}(\mathbb{R}^n) \longrightarrow \mathcal{S}(\mathbb{R}^n), \quad f \longmapsto \check{F}(f) = \check{f}, \quad \text{where: } \check{f}(x) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} f(\omega) e^{i\langle \omega, x \rangle} \, dm(\omega)$$
To verify that these definitions are well posed, we must ensure that the integrals exist and that $\hat{f}$ and $\check{f}$ are rapidly decreasing functions. The existence of the integrals is evident if we consider that $\mathcal{S}(\mathbb{R}^n) \subset L^1(\mathbb{R}^n)$, thus:

$$\int_{\mathbb{R}^n} \left| f(x) e^{-i\langle \omega, x \rangle} \right| dm(x) = \int_{\mathbb{R}^n} |f(x)| \, dm(x) < +\infty.$$

The same is true for the inverse Fourier transform. The fact that $\hat{f}$ and $\check{f}$ are rapidly decreasing functions can be verified by iterating the derivation under the integral sign and by integrating by parts.

A summary of the most important properties of the Fourier transform for a function $f \in \mathcal{S}(\mathbb{R})$, $a, b, c \in \mathbb{R}$, $a \neq 0$, is given in Table 6.1.

IMPORTANT OBSERVATIONS.–
– $\hat{F}$ transforms the product of the variable by a constant into a division by the same constant (up to a coefficient).
– $\hat{F}$, like the DFT, transforms the shift of the initial variable into the product by a complex exponential.
– $\hat{F}$ transforms the $n$-th derivation into the product by a power of $i\omega$. This property is crucial for transforming differential equations into algebraic equations.
– $\hat{F}$ transforms a Gaussian with unit standard deviation into a Gaussian with unit standard deviation. More generally, $\hat{F}$ inverts the standard deviation: a Gaussian with a small standard deviation, that is, with values located in close proximity to its mean, is transformed by $\hat{F}$ into a Gaussian with a large standard deviation, that is, with values which are spread away from the mean, and vice versa.

| Original function $f \in \mathcal{S}(\mathbb{R})$ | Fourier transform $\hat{f} \in \mathcal{S}(\mathbb{R})$ |
|---|---|
| $f(ax)$ | $\frac{1}{\lvert a \rvert} \hat{f}\left(\frac{\omega}{a}\right)$ |
| $f(x - b)$ | $e^{-i\omega b} \hat{f}(\omega)$ |
| $f(ax - b)$ | $\frac{e^{-i\omega \frac{b}{a}}}{\lvert a \rvert} \hat{f}\left(\frac{\omega}{a}\right)$ |
| $e^{icx} f(x)$ | $\hat{f}(\omega - c)$ |
| $f'(x)$ | $i\omega \hat{f}(\omega)$ |
| $f''(x)$ | $-\omega^2 \hat{f}(\omega)$ |
| $\frac{d^n f}{dx^n}(x)$ | $(i\omega)^n \hat{f}(\omega)$ |
| $(-ix)^n f(x)$ | $\frac{d^n \hat{f}}{d\omega^n}(\omega)$ |
| $e^{-\frac{x^2}{2}}$ | $e^{-\frac{\omega^2}{2}}$ |
| $e^{-c^2 x^2}$ | $\frac{1}{c\sqrt{2}} e^{-\frac{\omega^2}{4c^2}}$ |

Table 6.1. Properties of the Fourier transform on $\mathcal{S}(\mathbb{R})$
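Some rows of Table 6.1 can be checked numerically. The sketch below is a rough illustration under stated assumptions, not a general-purpose Fourier routine: the function `fourier` is a hypothetical helper that approximates the integral of Definition 6.21 with the trapezoidal rule on a truncated interval, which is adequate here only because the test function is a Gaussian; the values of `omega`, `a` and `b` are arbitrary.

```python
import cmath
import math

def fourier(f, omega, half_width=12.0, steps=4000):
    """Approximate (1/sqrt(2*pi)) * integral of f(x) * exp(-i*omega*x) dx
    with the trapezoidal rule on [-half_width, half_width]."""
    dx = 2.0 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * dx
        weight = 0.5 if k in (0, steps) else 1.0   # endpoints count half
        total += weight * f(x) * cmath.exp(-1j * omega * x)
    return total * dx / math.sqrt(2.0 * math.pi)

def gauss(x):
    """Gaussian with unit standard deviation (not normalized)."""
    return math.exp(-x * x / 2.0)

def gauss_prime(x):
    """Derivative of gauss."""
    return -x * math.exp(-x * x / 2.0)

omega, a, b = 0.7, 2.0, 1.5   # arbitrary test values

# Row f(x - b): the expected transform is exp(-i*omega*b) * f_hat(omega).
shift_error = abs(fourier(lambda x: gauss(x - b), omega)
                  - cmath.exp(-1j * omega * b) * fourier(gauss, omega))

# Row f(a x): the expected transform is f_hat(omega / a) / |a|.
scale_error = abs(fourier(lambda x: gauss(a * x), omega)
                  - fourier(gauss, omega / a) / abs(a))

# Row f'(x): the expected transform is i*omega*f_hat(omega).
deriv_error = abs(fourier(gauss_prime, omega)
                  - 1j * omega * fourier(gauss, omega))

# Row exp(-x^2/2): the expected transform is exp(-omega^2/2).
gauss_error = abs(fourier(gauss, omega) - math.exp(-omega * omega / 2.0))

print(shift_error, scale_error, deriv_error, gauss_error)
```

All four printed discrepancies should be at the level of floating-point rounding; this is a consistency check on the table entries, not a proof of them.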
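We wish to prove the property $\widehat{f'}(\omega) = i\omega \hat{f}(\omega)$.

PROOF.– We begin by observing that for $f : \mathbb{R} \to \mathbb{C}$, $f \in \mathcal{S}(\mathbb{R})$, then $f(x) \to 0$ as $|x| \to +\infty$. We write the Fourier transform of $f'$ and integrate by parts:

$$\widehat{f'}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} f'(x) e^{-i\omega x} \, dx = \frac{1}{\sqrt{2\pi}} \left[ f(x) e^{-i\omega x} \right]_{-\infty}^{+\infty} - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} f(x) (-i\omega) e^{-i\omega x} \, dx$$

$$= 0 - (-i\omega) \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} f(x) e^{-i\omega x} \, dx = i\omega \hat{f}(\omega) \qquad \square$$

The fact that the Gaussian with unit standard deviation is invariant with respect to the Fourier transform is not immediately evident, so a proof is helpful. For that, we need two lemmas.

LEMMA 6.1.– It holds that:

$$\int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}} \, dx = \sqrt{2\pi}$$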
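PROOF.– We write $I = \int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}} \, dx$, then:

$$I^2 = \int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}} \, dx \cdot \int_{-\infty}^{+\infty} e^{-\frac{y^2}{2}} \, dy \underset{\text{(th. Fubini)}}{=} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x^2 + y^2)} \, dx \, dy$$

Switching to polar coordinates $(\rho, \vartheta)$, $\rho \in [0, +\infty)$, $\vartheta \in [0, 2\pi)$, and recalling that the Jacobian in polar coordinates is $\rho$, we obtain:

$$I^2 = \int_0^{2\pi} \int_0^{+\infty} e^{-\frac{\rho^2}{2}} \rho \, d\rho \, d\vartheta = \int_0^{2\pi} d\vartheta \int_0^{+\infty} e^{-\frac{\rho^2}{2}} \rho \, d\rho = 2\pi \left[ -e^{-\frac{\rho^2}{2}} \right]_0^{+\infty} = 2\pi (0 + 1) = 2\pi$$

Thus $I = \sqrt{2\pi}$. $\square$

LEMMA 6.2.– It holds that:

$$\int_{-\infty}^{+\infty} e^{-\frac{(x + i\omega)^2}{2}} \, dx = \int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}} \, dx$$

The proof uses the calculus of residues of complex analysis.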
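We can now prove that:

$$\widehat{e^{-\frac{x^2}{2}}}(\omega) = e^{-\frac{\omega^2}{2}}$$

PROOF.–

$$\widehat{e^{-\frac{x^2}{2}}}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}} e^{-i\omega x} \, dx = \frac{e^{-\frac{\omega^2}{2}}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}} e^{-i\omega x} e^{\frac{\omega^2}{2}} \, dx$$

$$= \frac{e^{-\frac{\omega^2}{2}}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{x^2 + 2i\omega x - \omega^2}{2}} \, dx = \frac{e^{-\frac{\omega^2}{2}}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{(x + i\omega)^2}{2}} \, dx$$

$$\underset{\text{Lemma 6.2}}{=} \frac{e^{-\frac{\omega^2}{2}}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}} \, dx \underset{\text{Lemma 6.1}}{=} \frac{e^{-\frac{\omega^2}{2}}}{\sqrt{2\pi}} \sqrt{2\pi} = e^{-\frac{\omega^2}{2}} \qquad \square$$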
The inversion of the standard deviation, i.e. the fact that $e^{-c^2 x^2} \underset{\hat{F}}{\longmapsto} \frac{1}{c\sqrt{2}} e^{-\frac{\omega^2}{4c^2}}$, can be proven using an alternative technique (evidently, the technique presented earlier is also an option).

This technique is based on solving a differential equation. If $f(x) = e^{-c^2 x^2}$, then $f'(x) = -2c^2 x f(x)$, thus $f' + 2c^2 x f = 0$ and, given the properties $f'(x) \underset{\hat{F}}{\longmapsto} i\omega \hat{f}(\omega)$, $-ix f(x) \underset{\hat{F}}{\longmapsto} \hat{f}'(\omega)$ and the fact that $2c^2 x f = i 2c^2 (-ix f)$, by applying $\hat{F}$ to both sides of the previous differential equation we can write:

$$i\omega \hat{f}(\omega) + i 2c^2 \hat{f}'(\omega) = 0 \iff \omega \hat{f}(\omega) + 2c^2 \hat{f}'(\omega) = 0 \qquad [6.26]$$

This gives us a separable differential equation$^9$ with respect to $\hat{f}$. The canonical technique for solving this type of differential equation is to first search for constant solutions $\hat{f}(\omega) = C \in \mathbb{R}$ $\forall \omega \in \mathbb{R}$, implying $\hat{f}'(\omega) = 0$ $\forall \omega \in \mathbb{R}$; thus [6.26] becomes $\omega \hat{f}(\omega) = 0$, which may only be verified for all $\omega \in \mathbb{R}$ when $\hat{f}(\omega) \equiv 0$; hence, the only constant solution to the differential equation [6.26] is the identically zero function. However, this function is not coherent with the fact that $\hat{f}(0) \neq 0$:

$$\hat{f}(0) \underset{\text{def. of } \hat{f}(0)}{=} \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x) e^{-i0x} \, dx = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x) \, dx = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-c^2 x^2} \, dx$$

$$= \frac{1}{\sqrt{2\pi}} \frac{1}{c\sqrt{2}} \int_{\mathbb{R}} e^{-y^2/2} \, dy \underset{\text{Lemma 6.1}}{=} \frac{1}{\sqrt{2\pi}} \frac{1}{c\sqrt{2}} \sqrt{2\pi} = \frac{1}{c\sqrt{2}}$$

where we used the change of variable $y = c\sqrt{2}\, x$. Hence, $\hat{f} \equiv 0$ is not an acceptable solution to [6.26]. Now, let us suppose that $\hat{f}(\omega) \neq 0$ and look for non-constant solutions to [6.26] using the variable separation technique. We write the equation as follows:

$$\frac{\hat{f}'(\omega)}{\hat{f}(\omega)} = -\frac{\omega}{2c^2}$$

Integrating both sides, we obtain $\log |\hat{f}(\omega)| = -\frac{\omega^2}{4c^2} + \log C$, $C > 0$, where $\log C$ is the arbitrary constant resulting from integration. It is written in this way because, taking the exponential of both sides, we obtain:

$$|\hat{f}(\omega)| = e^{-\frac{\omega^2}{4c^2} + \log C} = C e^{-\frac{\omega^2}{4c^2}}, \qquad \hat{f}(\omega) = \pm C e^{-\frac{\omega^2}{4c^2}}$$

that is, $\hat{f}(\omega) = K e^{-\frac{\omega^2}{4c^2}}$, $K \in \mathbb{R} \setminus \{0\}$. Now, we simply observe that $K = \hat{f}(0) = \frac{1}{c\sqrt{2}}$, as before, which gives us the solution $\hat{f}(\omega) = \frac{1}{c\sqrt{2}} e^{-\frac{\omega^2}{4c^2}}$.

$^9$ We recall that a differential equation with respect to a function $y(t)$ is said to be separable if it can be written as $y'(t) = f(y(t)) \cdot g(t)$, where $f$ and $g$ are two continuous functions.
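The closed form obtained for the transform of $e^{-c^2 x^2}$, and the differential equation [6.26] it satisfies, can be checked numerically. The sketch below makes several assumptions that are not in the original text: the value `C = 1.5` is an arbitrary choice of $c > 0$; the transform is approximated by the trapezoidal rule on a truncated interval; since $f$ is even, only the cosine part of $e^{-i\omega x}$ contributes; and $\hat{f}'$ is estimated by a central finite difference.

```python
import math

C = 1.5  # arbitrary positive value of the parameter c (an assumption)

def f(x):
    """The Gaussian f(x) = exp(-c^2 x^2)."""
    return math.exp(-(C * x) ** 2)

def f_hat(omega, half_width=8.0, steps=4000):
    """Trapezoidal approximation of the Fourier transform of f.

    f is even, so the sine part of exp(-i*omega*x) integrates to zero
    and the transform reduces to a real cosine integral."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * dx
        weight = 0.5 if k in (0, steps) else 1.0   # endpoints count half
        total += weight * f(x) * math.cos(omega * x)
    return total * dx / math.sqrt(2.0 * math.pi)

def closed_form(omega):
    """The solution derived in the text: exp(-omega^2 / (4 c^2)) / (c sqrt(2))."""
    return math.exp(-omega ** 2 / (4.0 * C ** 2)) / (C * math.sqrt(2.0))

def ode_residual(omega, step=1e-3):
    """Left-hand side of [6.26], with the derivative of f_hat estimated
    by a central finite difference."""
    derivative = (f_hat(omega + step) - f_hat(omega - step)) / (2.0 * step)
    return omega * f_hat(omega) + 2.0 * C ** 2 * derivative

for omega in (0.0, 0.5, 2.0):
    print(omega, f_hat(omega), closed_form(omega), ode_residual(omega))
```

The second and third printed columns should agree closely, and the last column should be near zero up to the finite-difference error.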
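The properties of the Fourier transform defined on $\mathcal{S}(\mathbb{R}^n)$, summarized in Table 6.2 (where $c \in \mathbb{R}$, $c \neq 0$, $a, b \in \mathbb{R}^n$, $k \in \{1, \ldots, n\}$), follow directly from those obtained in the case where $n = 1$, with relatively straightforward changes to the demonstration technique, notably involving the use of Fubini's theorem to calculate multiple integrals.

We end this section by presenting the result which makes the Schwartz space so interesting for Fourier transform theory (and which justifies the name of $\check{F}$).

THEOREM 6.44.– The transform $\hat{F}$ is a linear isomorphism of $\mathcal{S}(\mathbb{R}^n)$ in itself, and its inverse transformation is $\check{F}$: $\check{F} = \hat{F}^{-1}$. Furthermore, if $f \in \mathcal{S}(\mathbb{R}^n)$ is interpreted as a function of $L^2(\mathbb{R}^n)$, then:

$$\|f\| = \|\hat{f}\| \quad \forall f \in \mathcal{S}(\mathbb{R}^n) \subset L^2(\mathbb{R}^n).$$

The Schwartz space is thus invariant with respect to the application of the Fourier transform $\hat{F}$, which possesses an explicit integral formula and an explicit inverse given by $\check{F}$, and conserves the norm of rapidly decreasing functions when these are interpreted as elements of $L^2(\mathbb{R}^n)$. There is no other infinite-dimensional functional space in which the Fourier transform possesses all of these properties simultaneously.

| Original function $f \in \mathcal{S}(\mathbb{R}^n)$ | Fourier transform $\hat{f} \in \mathcal{S}(\mathbb{R}^n)$ |
|---|---|
| $f(cx)$ | $\frac{1}{\lvert c \rvert^n} \hat{f}\left(\frac{\omega}{c}\right)$ |
| $f(x - b)$ | $e^{-i\langle \omega, b \rangle} \hat{f}(\omega)$ |
| $f(cx - b)$ | $\frac{e^{-i \frac{\langle \omega, b \rangle}{c}}}{\lvert c \rvert^n} \hat{f}\left(\frac{\omega}{c}\right)$ |
| $e^{i\langle a, x \rangle} f(x)$ | $\hat{f}(\omega - a)$ |
| $\partial_{x_k} f(x)$ | $i\omega_k \hat{f}(\omega)$ |
| $\partial^2_{x_k} f(x)$ | $-\omega_k^2 \hat{f}(\omega)$ |
| $\partial^m_{x_k} f(x)$ | $(i\omega_k)^m \hat{f}(\omega)$ |
| $(-ix_k)^m f(x)$ | $\partial^m_{\omega_k} \hat{f}(\omega)$ |
| $e^{-\frac{\|x\|^2}{2}}$ | $e^{-\frac{\|\omega\|^2}{2}}$ |
| $e^{-c^2 \|x\|^2}$ | $\frac{1}{(\lvert c \rvert \sqrt{2})^n} e^{-\frac{\|\omega\|^2}{4c^2}}$ |

Table 6.2. Properties of the Fourier transform on $\mathcal{S}(\mathbb{R}^n)$. Here $m \in \mathbb{N}$ denotes the order of derivation, to distinguish it from the dimension $n$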
As we shall see, $L^1(\mathbb{R})$ is not invariant under the Fourier transform, while in $L^2(\mathbb{R})$ we lose the explicit integral formula.

6.9.2. Extension of the Fourier transform of $\mathcal{S}(\mathbb{R}^n)$ to $L^1(\mathbb{R}^n)$: the Riemann-Lebesgue theorem

The functions which constitute the elements of the Schwartz space are too regular to be exhaustive, particularly with respect to applications. It is thus important to consider the extension of the Fourier transform to less regular function spaces, such as $L^1(\mathbb{R})$ and $L^2(\mathbb{R})$. In this section, we shall consider $L^1(\mathbb{R})$, for which we have a particularly famous result.

THEOREM 6.45 (Riemann-Lebesgue theorem).– The operator $\hat{\mathcal{F}}$ from section 6.9.1 can be extended in a unique manner to the injective and continuous linear operator defined as follows:

\[
\hat{\mathcal{F}}_1 : L^1(\mathbb{R}^n) \longrightarrow C_\infty(\mathbb{R}^n), \qquad f \longmapsto \hat{\mathcal{F}}_1 f,
\]

where:

\[
\hat{\mathcal{F}}_1 f(\omega) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} f(x)\, e^{-i\langle \omega, x\rangle}\,dm(x).
\]

The same statement holds for the extension of $\check{\mathcal{F}}$ to $L^1(\mathbb{R}^n)$ with the corresponding integral formula, that is:

\[
\check{\mathcal{F}}_1 : L^1(\mathbb{R}^n) \longrightarrow C_\infty(\mathbb{R}^n), \qquad f \longmapsto \check{\mathcal{F}}_1 f,
\]

where:

\[
\check{\mathcal{F}}_1 f(x) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} f(\omega)\, e^{i\langle \omega, x\rangle}\,dm(\omega).
\]
We recall that $C_\infty(\mathbb{R}^n)$ is the space of continuous functions defined on $\mathbb{R}^n$ which tend toward 0 at infinity, equipped with the norm $\|f\|_\infty = \sup_{x \in \mathbb{R}^n} |f(x)|$.

OBSERVATIONS.–
– The Riemann-Lebesgue theorem tells us that the integral formula of the Fourier transform remains valid for the elements of $L^1(\mathbb{R}^n)$; this is very important, since functions which are absolutely integrable in the Lebesgue sense are much more widespread than rapidly decreasing functions in practical applications.
– The injectivity of $\hat{\mathcal{F}}_1$ means that it can be inverted on the image $\hat{\mathcal{F}}_1(L^1(\mathbb{R}^n)) \subset C_\infty(\mathbb{R}^n)$, but this image is not contained in $L^1(\mathbb{R}^n)$. A classic counter-example for the case where $n = 1$ is the indicator function of the interval $[-1, 1]$ in $\mathbb{R}$, that is, $\chi_{[-1,1]}$; it belongs to $L^1(\mathbb{R})$, but by direct calculation we obtain:

\[
\hat{\mathcal{F}}_1(\chi_{[-1,1]})(\omega) = \sqrt{\frac{2}{\pi}}\, \frac{\sin \omega}{\omega} \qquad [6.27]
\]

This evidently belongs to $C_\infty(\mathbb{R})$ but not to $L^1(\mathbb{R})$; it actually belongs to $L^2(\mathbb{R})$. Thus, $\check{\mathcal{F}}_1$, which is defined on the whole of $L^1(\mathbb{R})$, is not the inverse of $\hat{\mathcal{F}}_1$.

6.9.3. Extension of the Fourier transform to a unitary operator on $L^2(\mathbb{R}^n)$: the Fourier-Plancherel transform

The technique which is classically used to extend $\hat{\mathcal{F}}$ to $L^2(\mathbb{R}^n)$ consists of using a theorem of fundamental importance in functional analysis, which will be presented and proved below. First, however, we must define the extension of a linear operator.

DEFINITION 6.22 (bounded extension of bounded linear operators).– Let $E, V, W$ be vector spaces over the same field $\mathbb{K}$ and let $E$ be a vector subspace of $V$. Let $A : E \to W$ be a linear operator. The linear operator $B : V \to W$ is an extension of $A$ to $V$ if the restriction of $B$ to $E$ coincides with $A$, that is, if $Ax = Bx$ $\forall x \in E$.

THEOREM 6.46 (Extension theorem for bounded linear operators).– Let $E$ and $F$ be two normed vector spaces, with $F$ a Banach space. Let $A : D_A \subseteq E \to F$ be a bounded linear operator, where $D_A$ is a vector subspace of $E$.
Then, there exists exactly one linear operator $\bar A$ extending $A$ with the following properties:
1) the domain of $\bar A$ is the closure of $D_A$ in $E$: $D_{\bar A} = \overline{D_A}$;
2) $\bar A$ is continuous: $\bar A \in \mathcal{B}(\overline{D_A}, F)$;
3) $\|\bar A\| = \|A\|$.
This operator is defined as follows. Let $(x_n)_{n \in \mathbb{N}} \subset D_A$ be an arbitrary sequence which converges to $x \in \overline{D_A}$, then:

\[
\bar A : \overline{D_A} \subseteq E \longrightarrow F, \qquad x \longmapsto \bar A x = \lim_{n \to +\infty} A x_n.
\]
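PROOF.– Let $x$ be an arbitrary element of $D_{\bar A} = \overline{D_A}$; then, by definition, there exists a sequence $(x_n)_{n \in \mathbb{N}} \subset D_A$ such that $x = \lim_{n \to +\infty} x_n$. Being convergent, $(x_n)_{n \in \mathbb{N}}$ is a Cauchy sequence and, since $A$ is continuous, the sequence $(A x_n)_{n \in \mathbb{N}} \subset F$ is also a Cauchy sequence, by Theorem 6.9. Since $F$ is a Banach space, there exists $y = \lim_{n \to +\infty} A x_n$; thus, the operator $\bar A : \overline{D_A} \to F$, $\bar A x = \lim_{n \to +\infty} A x_n$, is well defined and linear, as it is defined via the limit operation, which is linear. Furthermore, $\bar A$ does not depend on the sequence which converges to $x$; in fact, if $(x'_n)_{n \in \mathbb{N}} \subset D_A$ is another sequence such that $x = \lim_{n \to +\infty} x'_n$, then:

\begin{align*}
\Big\| \lim_{n \to +\infty} A x_n - \lim_{n \to +\infty} A x'_n \Big\|
&= \lim_{n \to +\infty} \| A x_n - A x'_n \| && \text{(continuity of } \|\cdot\| \text{)} \\
&= \lim_{n \to +\infty} \| A (x_n - x'_n) \| \\
&\leq \lim_{n \to +\infty} \|A\| \, \| x_n - x'_n \| && \text{(} A \text{ bounded)} \\
&= \|A\| \lim_{n \to +\infty} \| x_n - x'_n \| \\
&= \|A\| \, \Big\| \lim_{n \to +\infty} (x_n - x'_n) \Big\| && \text{(continuity of } \|\cdot\| \text{)} \\
&= \|A\| \, \Big\| \lim_{n \to +\infty} x_n - \lim_{n \to +\infty} x'_n \Big\| \\
&= \|A\| \, \| x - x \| = 0.
\end{align*}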
Evidently, any $x \in D_A$ may be written as the limit of the constant sequence $x_n = x$ $\forall n \in \mathbb{N}$; hence, given that the definition of $\bar A$ is independent of the chosen sequence, $\bar A x = A x$ $\forall x \in D_A$, that is, the restriction of $\bar A$ to $D_A$ is $A$ or, equivalently, $\bar A$ is an extension of $A$ to $\overline{D_A}$.

The fact that $\bar A$ is a bounded operator on $\overline{D_A}$ can be verified by taking the limit of the inequality $\|A x_n\| \leq \|A\| \|x_n\|$. The limit preserves the order relation, that is:

\[
\lim_{n \to +\infty} \|A x_n\| \leq \lim_{n \to +\infty} \|A\| \, \|x_n\|
\]

and, by the continuity of the norm, we have:

\[
\Big\| \lim_{n \to +\infty} A x_n \Big\| \leq \|A\| \, \Big\| \lim_{n \to +\infty} x_n \Big\|
\iff \|\bar A x\| \leq \|A\| \, \|x\|
\]
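for all $x \in \overline{D_A}$, that is, $\bar A$ is bounded, and thus continuous.

Now, let us prove that any other bounded extension of $A$ to $\overline{D_A}$ must coincide with $\bar A$. Let $B$ be another bounded extension of $A$ to $\overline{D_A}$; then, for all $x \in \overline{D_A}$, there exists a sequence $(x_n)_{n \in \mathbb{N}} \subset D_A$ such that $x = \lim_{n \to +\infty} x_n$ and, by the definition of $\bar A$ and the continuity of $B$, we have:

\[
\bar A x - B x = \lim_{n \to +\infty} A x_n - B\Big( \lim_{n \to +\infty} x_n \Big)
= \lim_{n \to +\infty} A x_n - \lim_{n \to +\infty} B x_n
= \lim_{n \to +\infty} (A x_n - B x_n).
\]

For all $n \in \mathbb{N}$, $x_n \in D_A$ and, since $B$ is an extension of $A$, by definition $B x_n = A x_n$ $\forall n \in \mathbb{N}$; then $A x_n - B x_n = 0$ $\forall n \in \mathbb{N}$ and thus $\bar A x - B x = \lim_{n \to +\infty} (A x_n - B x_n) = \lim_{n \to +\infty} 0 = 0$, i.e. $\bar A = B$.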
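Finally, we need to show that the extension preserves the norm, that is, $\|\bar A\| = \|A\|$. We have already seen that $\|\bar A x\| \leq \|A\| \|x\|$ for all $x \in \overline{D_A}$, thus:

\[
\|\bar A\| = \sup_{\|x\| = 1} \|\bar A x\| \leq \sup_{\|x\| = 1} \|A\| \, \|x\| = \|A\|;
\]

then, if we can show that $\|\bar A\| \geq \|A\|$, this will prove the isometry of the extension. The proof is straightforward if we consider the following definition of the operator norm:

\[
\|\bar A\| = \sup \left\{ \frac{\|\bar A x\|}{\|x\|},\ x \in \overline{D_A} \setminus \{0_E\} \right\}
\geq \sup \left\{ \frac{\|A x\|}{\|x\|},\ x \in D_A \setminus \{0_E\} \right\} = \|A\|,
\]

since $\bar A x = A x$ $\forall x \in D_A$ and $D_A \subseteq \overline{D_A}$; hence $\|\bar A\| = \|A\|$ and the theorem is fully proven. □

Using the extension theorem and the fact that $\overline{\mathcal{S}(\mathbb{R}^n)} = L^2(\mathbb{R}^n)$, the Fourier transform on the Schwartz space can be extended to $L^2(\mathbb{R}^n)$ via the limit formula of the extension theorem, as formalized below.

THEOREM 6.47.– The operators $\hat{\mathcal{F}}$ and $\check{\mathcal{F}}$, which define the Fourier transform and the inverse Fourier transform on $\mathcal{S}(\mathbb{R}^n)$, respectively, can be extended in a unique manner to two unitary operators $\mathcal{F}$ and $\tilde{\mathcal{F}}$ on $L^2(\mathbb{R}^n)$; furthermore, $\tilde{\mathcal{F}} = \mathcal{F}^{-1}$.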
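The operator $\mathcal{F}$ is known as the Fourier-Plancherel transform and it is defined as follows: let $(f_n)_{n \in \mathbb{N}} \subset \mathcal{S}(\mathbb{R}^n)$ be an arbitrary sequence of elements of $\mathcal{S}(\mathbb{R}^n)$ which converges to $f \in L^2(\mathbb{R}^n)$, then:

\[
\mathcal{F} : L^2(\mathbb{R}^n) \longrightarrow L^2(\mathbb{R}^n), \qquad f \longmapsto \mathcal{F}(f) = \lim_{n \to +\infty} \hat f_n.
\]

Analogously:

\[
\mathcal{F}^{-1} : L^2(\mathbb{R}^n) \longrightarrow L^2(\mathbb{R}^n), \qquad f \longmapsto \mathcal{F}^{-1}(f) = \lim_{n \to +\infty} \check f_n,
\]

where both limits are taken with respect to the norm of $L^2(\mathbb{R}^n)$.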
Thus, the Fourier-Plancherel transform $\mathcal{F}$ on $L^2(\mathbb{R}^n)$ has the vital properties of being a unitary operator with inverse given by the unitary operator $\mathcal{F}^{-1}$. One reason $L^2(\mathbb{R}^n)$ is a less natural space than $\mathcal{S}(\mathbb{R}^n)$ for studying the Fourier transform is the lack of an integral formula valid for all elements of $L^2(\mathbb{R}^n)$. Theorem 6.48 provides a partial solution to this problem.

THEOREM 6.48.– If $f \in L^1(\mathbb{R}^n) \cap L^2(\mathbb{R}^n)$, then $\mathcal{F} f = \hat{\mathcal{F}}_1 f$; that is, for functions belonging to $L^1(\mathbb{R}^n) \cap L^2(\mathbb{R}^n)$, the integral formula of the Fourier transform remains valid.

Thankfully, as we saw in section 4.4.4, the functions of $L^1(\mathbb{R}^n) \cap L^2(\mathbb{R}^n)$ include the bounded functions of $L^1(\mathbb{R}^n)$ and those functions of $L^2(\mathbb{R}^n)$ which vanish outside a compact subset, often encountered in practical applications.

6.9.4. Relationship between the Fourier-Plancherel transform and the Hermite Hilbert basis

One very important Hilbert basis in $L^2(\mathbb{R})$ is the Hermite basis, defined as:

\[
u_n(x) = \frac{(-1)^n}{\sqrt{2^n\, n!\, \sqrt{\pi}}}\; e^{\frac{1}{2}x^2}\, \frac{d^n}{dx^n} e^{-x^2}, \qquad x \in \mathbb{R},\ n \in \mathbb{N}.
\]

The functions $u_n$ can be shown to decay rapidly, so their Fourier transform is obtained by applying the integral formula, that is:

\[
\mathcal{F} u_n(\omega) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} u_n(x)\, e^{-i\omega x}\,dm(x)
= \frac{1}{\sqrt{2\pi}}\, \frac{(-1)^n}{\sqrt{2^n\, n!\, \sqrt{\pi}}} \int_{\mathbb{R}} e^{-i\omega x + \frac{1}{2}x^2}\, \frac{d^n}{dx^n} e^{-x^2}\,dm(x).
\]

By means of some simple algebraic manipulations, we can show that:

\[
\mathcal{F} u_n = (-i)^n u_n, \qquad n \in \mathbb{N},
\]
and thus $\mathcal{F}$ coincides with the unitary operator introduced in section 6.8.1. By the continuity of $\mathcal{F}$, we can write:

\[
\mathcal{F} f = \sum_{n \in \mathbb{N}} (-i)^n \langle f, u_n \rangle\, u_n, \qquad \forall f \in L^2(\mathbb{R}),\ (u_n)_{n \in \mathbb{N}}: \text{Hermite basis}.
\]
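The eigenvalue relation $\mathcal{F} u_n = (-i)^n u_n$ can be checked numerically for the first few Hermite functions. The sketch below is an illustration under stated assumptions rather than the book's method: it evaluates $u_n$ through the standard three-term recurrence for normalized Hermite functions (instead of the derivative formula above) and approximates the Fourier integral with the trapezoidal rule. All function names and numerical parameters are illustrative.

```python
import cmath
import math


def hermite_function(n, x):
    """Normalized Hermite function u_n(x), computed with the recurrence
    u_{k+1} = sqrt(2/(k+1)) * x * u_k - sqrt(k/(k+1)) * u_{k-1}."""
    previous = 0.0
    current = math.pi ** -0.25 * math.exp(-0.5 * x * x)  # u_0
    for k in range(n):
        following = (math.sqrt(2.0 / (k + 1)) * x * current
                     - math.sqrt(k / (k + 1)) * previous)
        previous, current = current, following
    return current


def fourier_transform(f, omega, half_width=12.0, steps=4800):
    """Trapezoidal approximation of (1/sqrt(2*pi)) * int f(x) exp(-i*omega*x) dx."""
    h = 2.0 * half_width / steps
    total = 0.0 + 0.0j
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * f(x) * cmath.exp(-1j * omega * x)
    return total * h / math.sqrt(2.0 * math.pi)


def eigen_relation_error(n, omega):
    """Return |F u_n(omega) - (-i)^n u_n(omega)|."""
    transformed = fourier_transform(lambda x: hermite_function(n, x), omega)
    expected = (-1j) ** n * hermite_function(n, omega)
    return abs(transformed - expected)


if __name__ == "__main__":
    for n in range(4):
        print(n, eigen_relation_error(n, 0.8))
```

The printed errors should be at the level of rounding noise, reflecting the fact that each $u_n$ is an eigenfunction of $\mathcal{F}$ with eigenvalue $(-i)^n$.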
6.9.5. The Fourier transform and convolution

The properties of the Fourier transform with respect to convolution merit a separate discussion, given their importance and usefulness in both theoretical and applied mathematics. Readers wishing to study this subject in greater detail are advised to consult Gasquet and Witomski (2013). We shall begin by defining convolution and discussing its basic properties, before proving the best-known and most important property of the Fourier transform in $L^1(\mathbb{R}^n)$ with respect to convolution: the convolution product is transformed into the pointwise product of the Fourier transforms (to within a coefficient).

DEFINITION 6.23.– Taking $f, g : \mathbb{R}^n \to \mathbb{R}$, the convolution of $f$ and $g$ is the function $f * g$ defined by:

\[
(f * g)(x) = \int_{\mathbb{R}^n} f(x - y)\, g(y)\,dm(y), \qquad \forall x \in \mathbb{R}^n,
\]

as long as the integral exists in the Lebesgue sense.

THEOREM 6.49 (Basic properties of convolution).– The following properties hold:
1) if $f \in L^1(\mathbb{R}^n)$ and $g \in L^\infty(\mathbb{R}^n)$ or vice versa, then the convolution is well defined;
2) if $f, g \in L^2(\mathbb{R}^n)$, then the convolution is well defined and, in general, is an element of $L^\infty(\mathbb{R}^n)$;
3) if $f, g \in L^1(\mathbb{R}^n)$, then the convolution is well defined and belongs to $L^1(\mathbb{R}^n)$, which becomes a Banach algebra with respect to the convolution;
4) if the convolution is well defined, then:
- $f * (\alpha g + \beta h) = \alpha f * g + \beta f * h$ (linearity);
- $f * g = g * f$ (commutativity);
- $f * (g * h) = (f * g) * h$ (associativity).

PROOF.– Only the first two properties will be proved here. Proof of the remaining properties is left to the reader as an exercise.
1) If $f \in L^1(\mathbb{R}^n)$ and $g \in L^\infty(\mathbb{R}^n)$, then:

\[
\int_{\mathbb{R}^n} |f(x - y)\, g(y)|\,dm(y) \leq \|g\|_\infty \int_{\mathbb{R}^n} |f(x - y)|\,dm(y) = \|g\|_\infty \|f\|_1
\]

by the shift invariance of the Lebesgue measure.

2) If $f, g \in L^2(\mathbb{R}^n)$, then, by the Hölder inequality [4.19]:

\[
\int_{\mathbb{R}^n} |f(x - y)\, g(y)|\,dm(y) \leq \left( \int_{\mathbb{R}^n} |f(x - y)|^2\,dm(y) \right)^{1/2} \left( \int_{\mathbb{R}^n} |g(y)|^2\,dm(y) \right)^{1/2} = \|f\|_2 \|g\|_2
\]

again by the shift invariance of the Lebesgue measure. □
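THEOREM 6.50 (Convolution and Fourier transform in $L^1$).– Taking $f, g \in L^1(\mathbb{R}^n)$, the Fourier transform satisfies the following property:

\[
\widehat{f * g} = (2\pi)^{n/2}\, \hat f \cdot \hat g
\]

PROOF.– We simply write the definition of convolution and of the Fourier transform, then apply the Fubini theorem twice, with a minor algebraic manipulation in between:

\begin{align*}
\widehat{(f * g)}(\omega)
&= \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^n} f(x - y)\, g(y)\,dm(y) \right) e^{-i\langle \omega, x\rangle}\,dm(x) \\
&= \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f(x - y)\, g(y)\, e^{-i\langle \omega, x\rangle}\,dm(x)\,dm(y) && \text{(Fubini)} \\
&= \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f(x - y)\, g(y)\, e^{-i\langle \omega, x - y + y\rangle}\,dm(x)\,dm(y) \\
&= \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f(x - y)\, e^{-i\langle \omega, x - y\rangle}\, g(y)\, e^{-i\langle \omega, y\rangle}\,dm(x)\,dm(y) \\
&= \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^n} f(x - y)\, e^{-i\langle \omega, x - y\rangle}\,dm(x) \right) g(y)\, e^{-i\langle \omega, y\rangle}\,dm(y) && \text{(Fubini)} \\
&= (2\pi)^{n/2} \left( \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} f(t)\, e^{-i\langle \omega, t\rangle}\,dm(t) \right) \left( \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} g(y)\, e^{-i\langle \omega, y\rangle}\,dm(y) \right) && (t = x - y,\ dm(t) = dm(x - y)) \\
&= (2\pi)^{n/2}\, \hat f(\omega) \cdot \hat g(\omega), \qquad \forall \omega \in \mathbb{R}^n.
\end{align*}
□

If we invert the Fourier transform on the image $\hat{\mathcal{F}}_1(L^1(\mathbb{R}^n)) \subset C_\infty(\mathbb{R}^n)$, then we obtain $f * g = (2\pi)^{n/2} (\hat f \cdot \hat g)^\vee$, which is often written in the form:

\[
(f \cdot g)^\vee = (2\pi)^{-n/2}\, \check f * \check g \qquad [6.28]
\]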
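Theorem 6.50 can be illustrated numerically in the case $n = 1$. The sketch below is only an illustration: the two Gaussian test functions, the grid and the trapezoidal quadrature are assumptions made here and do not come from the text. It compares $\widehat{f * g}(\omega)$ with $\sqrt{2\pi}\, \hat f(\omega)\, \hat g(\omega)$.

```python
import cmath
import math


def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def fourier_transform(f, omega, grid, h):
    """(1/sqrt(2*pi)) * int f(x) exp(-i*omega*x) dx on the given grid (n = 1)."""
    samples = [f(x) * cmath.exp(-1j * omega * x) for x in grid]
    return trapezoid(samples, h) / math.sqrt(2.0 * math.pi)


def convolution(f, g, grid, h):
    """Return x -> (f * g)(x) = int f(x - y) g(y) dy, by quadrature in y."""
    g_samples = [g(y) for y in grid]

    def f_conv_g(x):
        return trapezoid([f(x - y) * gy for y, gy in zip(grid, g_samples)], h)

    return f_conv_g


def f(x):
    """Assumed test function: exp(-x^2)."""
    return math.exp(-x * x)


def g(x):
    """Assumed test function: a shifted Gaussian exp(-(x - 1)^2 / 2)."""
    return math.exp(-0.5 * (x - 1.0) ** 2)


H = 0.025
GRID = [-10.0 + k * H for k in range(801)]


def convolution_theorem_gap(omega):
    """|FT(f*g)(omega) - sqrt(2*pi) * FT(f)(omega) * FT(g)(omega)| for n = 1."""
    left = fourier_transform(convolution(f, g, GRID, H), omega, GRID, H)
    right = (math.sqrt(2.0 * math.pi)
             * fourier_transform(f, omega, GRID, H)
             * fourier_transform(g, omega, GRID, H))
    return abs(left - right)


if __name__ == "__main__":
    print(convolution_theorem_gap(1.3))
```

The double loop makes the cost quadratic in the number of grid points, which is acceptable for a check of this size; a fast Fourier transform would be the usual choice for larger grids.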
This formula will be used in section 6.11.

Convolution is a stationary operation, that is, it commutes with translation, as in the discrete case. Fixing $s \in \mathbb{R}$ and $g \in L^1(\mathbb{R})$, we can define the right translation operator $R_s$ and the convolution operator with $g$, $T_g$, as follows:

\[
R_s f(t) = f(t - s), \qquad T_g f(t) = (f * g)(t) = \int_{\mathbb{R}} f(t - x)\, g(x)\,dx;
\]

then, for all $t \in \mathbb{R}$:

\[
R_s T_g f(t) = T_g f(t - s) = \int_{\mathbb{R}} f(t - s - x)\, g(x)\,dx = \int_{\mathbb{R}} R_s f(t - x)\, g(x)\,dx = T_g R_s f(t).
\]

As we saw in the discrete case (see section 2.9.6), convolution with a Gaussian function results in blurring of a signal. This result can be understood from a different perspective, using the following impulse function:

\[
I_\varepsilon(t) =
\begin{cases}
\frac{1}{\varepsilon} & 0 < t < \varepsilon \\
0 & \text{otherwise}
\end{cases}
\]

If $f \in L^1(\mathbb{R})$, then:

\[
(f * I_\varepsilon)(t) = \int_{\mathbb{R}} f(t - x)\, I_\varepsilon(x)\,dx = \frac{1}{\varepsilon} \int_0^\varepsilon f(t - x)\,dx.
\]

Now, applying the substitution $u = t - x$, we obtain $du = -dx$, and the lower and upper limits of integration with respect to the new variable $u$ become $t$ and $t - \varepsilon$. Then:

\[
(f * I_\varepsilon)(t) = -\frac{1}{\varepsilon} \int_t^{t - \varepsilon} f(u)\,du = \frac{1}{\varepsilon} \int_{t - \varepsilon}^{t} f(u)\,du = \langle f \rangle_{[t - \varepsilon,\, t]},
\]

that is, the mean of $f$ over the interval $[t - \varepsilon, t]$, of size $\varepsilon$. A Gaussian $G_{\mu,\sigma}$ with mean $\mu$ and standard deviation $\sigma$ is a "smooth" version of the pulse $I_\varepsilon$, which rapidly tends toward 0 outside of the interval $[\mu - \sigma, \mu + \sigma]$, thus:

\[
f * G_{\mu,\sigma} \simeq \text{local mean of } f \text{ in } [\mu - \sigma, \mu + \sigma]
\]

In section 2.9.6, we saw that blurring, in the frequency domain, results from the fact that the Fourier multiplier corresponding to convolution with the Gaussian constitutes a low-pass filter. Here, we find the explanation for the blurring effect in the original domain of a signal $f$: following convolution with a Gaussian, each value of $f$ at $t$ is replaced by an approximation of the local mean value of $f$, with a locality parameter determined by the standard deviation of the Gaussian.

A further property of convolution, which is crucial for applications to the theory of differential equations, is discussed in Theorem 6.51.
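The local-mean interpretation of convolution with the pulse $I_\varepsilon$ can be verified on a simple example. The following sketch makes several illustrative assumptions (the test function $f(x) = x^2$, the midpoint rule and all function names are choices made here): it compares $(f * I_\varepsilon)(t)$ with the exact mean of $f$ over $[t - \varepsilon, t]$.

```python
def pulse(eps):
    """Return the rectangular pulse I_eps: 1/eps on (0, eps), 0 elsewhere."""
    return lambda x: 1.0 / eps if 0.0 < x < eps else 0.0


def convolve_with_pulse(f, eps, t, steps=20000):
    """Midpoint-rule approximation of (f * I_eps)(t) = int f(t - x) I_eps(x) dx.
    The integrand vanishes outside (0, eps), so only that interval is sampled."""
    h = eps / steps
    i_eps = pulse(eps)
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += f(t - x) * i_eps(x)
    return total * h


def mean_on_interval(antiderivative, a, b):
    """Exact mean of f on [a, b], given an antiderivative of f."""
    return (antiderivative(b) - antiderivative(a)) / (b - a)


if __name__ == "__main__":
    square = lambda x: x * x            # assumed test function f(x) = x^2
    cube_third = lambda x: x ** 3 / 3   # an antiderivative of f
    t, eps = 2.0, 0.5
    print(convolve_with_pulse(square, eps, t))
    print(mean_on_interval(cube_third, t - eps, t))
```

The two printed values should agree closely, while the mean over $[t, t + \varepsilon]$ would differ: the pulse averages the values of $f$ immediately to the left of $t$.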
THEOREM 6.51.– Taking $f \in C(\mathbb{R}^n)$ with bounded partial derivatives and $g \in L^1(\mathbb{R}^n)$, then $f * g \in C(\mathbb{R}^n)$ and:

\[
\partial_{x_k} (f * g) = (\partial_{x_k} f) * g, \qquad \forall k = 1, \dots, n.
\]

In the same way, if $g \in C(\mathbb{R}^n)$ with bounded partial derivatives and $f \in L^1(\mathbb{R}^n)$, then:

\[
\partial_{x_k} (f * g) = f * (\partial_{x_k} g), \qquad \forall k = 1, \dots, n.
\]

PROOF.– The hypotheses of the theorem ensure that differentiation can be carried out under the integral sign, thus $\forall k = 1, \dots, n$:

\[
\partial_{x_k} \left( \int_{\mathbb{R}^n} f(x - y)\, g(y)\,dm(y) \right)
= \int_{\mathbb{R}^n} \partial_{x_k} \big( f(x - y)\, g(y) \big)\,dm(y)
= \int_{\mathbb{R}^n} \big( \partial_{x_k} f(x - y) \big)\, g(y)\,dm(y)
\]

since $f$ is the only factor which depends on $x$, that is, $\partial_{x_k} (f * g) = (\partial_{x_k} f) * g$. The second formula is a consequence of the commutative property of convolution, which allows us to switch the roles of $f$ and $g$. □

6.9.6. Convolution and Fourier transforms in $L^2$: localization of the Fourier transform

A generalization of equation [6.27] allows us to highlight a significant limitation of the Fourier transform. The formalization of this statement relies on the following result, taken from Gasquet and Witomski (2013), which shows that the Fourier transform of the product of two elements of $L^2(\mathbb{R}^n)$ is proportional to the convolution of their Fourier transforms.

THEOREM 6.52.– If $f, g \in L^2(\mathbb{R}^n)$, then:

\[
\widehat{f \cdot g} = (2\pi)^{-n/2}\, \hat f * \hat g
\]

Let us consider the spectrum of $f \in L^2(\mathbb{R})$, but only in the neighborhood of a value $t_0$. Using translation, it is always possible to take $t_0 = 0$. The simplest, but flawed (for reasons which we shall see later), approach to localizing the analysis of the spectrum of $f(t)$ consists of truncating it, that is, multiplying it by the step function of width $2T$:

\[
\chi(t) =
\begin{cases}
1 & \text{if } |t| \leq T \\
0 & \text{otherwise}
\end{cases},
\]
where $2T$ is the size of the neighborhood that we wish to consider. Since $\chi \in L^2(\mathbb{R})$, by Theorem 6.52¹⁰, the Fourier transform of the truncated signal $\tilde f(t) = f(t)\chi(t)$ is $\hat{\tilde f}(\omega) = \frac{1}{\sqrt{2\pi}}\, \hat f(\omega) * \hat\chi(\omega)$, where:

\[
\frac{1}{\sqrt{2\pi}}\, \hat\chi(\omega) = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{2}{\pi}}\; T\, \frac{\sin(\omega T)}{\omega T} = \frac{T}{\pi}\, \mathrm{sinc}(\omega T)
\]

where the function $\mathbb{R} \ni t \mapsto \mathrm{sinc}(t) := \frac{\sin t}{t}$. Thus:

\[
\hat{\tilde f}(\omega) = \frac{T}{\pi} \left( \hat f(\omega) * \mathrm{sinc}(\omega T) \right)
\]

that is, the spectrum of the truncated signal is proportional to the convolution between the spectrum of the original signal and the sinc function of $\omega T$. We thus see that precise localized information concerning the original signal cannot be obtained by truncation alone. This is one of the difficulties inherent in localizing frequency analysis of a signal within the context of Fourier analysis. Wavelet theory (Frazier 2001), developed to a significant extent in the late 1980s, offers powerful tools for handling this phenomenon.

6.10. The Nyquist-Shannon sampling theorem

The Nyquist-Shannon theorem¹¹ is one of the most important theorems in signal theory. It states that, when a function $f$ possesses a bounded spectrum as specified in Definition 6.24, this function can be reconstructed using a discrete set of samples.

DEFINITION 6.24.– The function $f : \mathbb{R} \to \mathbb{C}$ is said to be a continuous signal of finite bandwidth if there exists $\Omega \in \mathbb{R}^+$ such that:

\[
\hat f(\omega) = 0 \qquad \forall |\omega| > \Omega
\]

The human visual system is incapable of perceiving an electromagnetic wave as light when the oscillating frequency of the wave is lower than 400 THz or higher than 800 THz, where T = tera = $10^{12}$. Moreover, humans are able to hear sounds as variations in air pressure only at frequencies between 20 Hz and 20 kHz, where k = kilo = $10^3$. Visual and auditory signals, which are transmitted to the brain for interpretation, are two key examples of finite-bandwidth signals.

10 This argument is not valid if $f \in L^1(\mathbb{R})$, as, in this case, the formula from Theorem 6.52 would only be valid if $\hat f$ and $\hat\chi$ belong to $L^1(\mathbb{R})$; however, as we saw in section 6.9.2, $\hat\chi \notin L^1(\mathbb{R})$.
11 This theorem is known by several different names, sometimes including the names of Whittaker and Kotelnikov, two other mathematicians who independently discovered it.
THEOREM 6.53 (Shannon-Nyquist sampling theorem¹²).– Let:
– $f : \mathbb{R} \to \mathbb{C}$ be a signal of finite bandwidth: $\exists\, \Omega \in \mathbb{R}^+$ such that $\hat f(\omega) = 0$ $\forall |\omega| > \Omega$;
– $\hat f$ be continuous and piecewise $C^1(\mathbb{R})$.
Then, $f$ is fully determined by its samples at the points $t_n = \frac{\pi}{\Omega} n$, $n \in \mathbb{Z}$, and the following formula holds:

\[
f(t) = \sum_{n \in \mathbb{Z}} f\left( \frac{\pi}{\Omega} n \right) \mathrm{sinc}(\Omega t - \pi n) \qquad [6.29]
\]

where the convergence of the series is uniform.

There are several proofs of the sampling theorem, including a notable example which uses the summation formula of Poisson (b. 1781, Pithiviers; d. 1840, Paris); here, we have chosen to present an alternative proof, found in Boggess and Narcowich (2015, p. 118).

PROOF.– We shall use both the Fourier series and the Fourier transform of $\hat f$. To do this, we interpret $\hat f$ as a $2\Omega$-periodic function when we write its Fourier series and as a function with support contained in $[-\Omega, \Omega]$ when we calculate its Fourier transform. Thanks to our hypotheses, $\hat f \in L^2[-\Omega, \Omega]$ and thus we can expand $\hat f$ into a Fourier series:

\[
\hat f(\omega) = \sum_{k \in \mathbb{Z}} c_k\, e^{i \frac{2\pi \omega k}{2\Omega}} = \sum_{k \in \mathbb{Z}} c_k\, e^{i \frac{\pi \omega k}{\Omega}} \qquad [6.30]
\]

with:

\begin{align*}
c_k &= \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \hat f(\omega)\, e^{-i \frac{\pi \omega k}{\Omega}}\,d\omega \\
&= \frac{\sqrt{2\pi}}{2\Omega}\, \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} \hat f(\omega)\, e^{i \left( -\frac{\pi}{\Omega} k \right) \omega}\,d\omega && (\hat f(\omega) = 0\ \forall |\omega| > \Omega) \\
&= \frac{\sqrt{2\pi}}{2\Omega}\, \check{\hat f}\left( -\frac{\pi}{\Omega} k \right) = \frac{\sqrt{2\pi}}{2\Omega}\, f\left( -\frac{\pi}{\Omega} k \right),
\end{align*}

where in the final step of the previous computation we have used the definition of the inverse Fourier transform of $\hat f$, i.e. $f$, evaluated at $-\frac{\pi}{\Omega} k$, and we included the normalization factor of the series in $c_k$.

12 Shannon (b. 1916, Petoskey; d. 2001, Medford), Nyquist (b. 1889, Stora Kil; d. 1976, Harlingen).
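The Fourier series [6.30] can thus be rewritten as follows:

\[
\hat f(\omega) = \sum_{k \in \mathbb{Z}} \frac{\sqrt{2\pi}}{2\Omega}\, f\left( -\frac{\pi}{\Omega} k \right) e^{i \frac{\pi \omega k}{\Omega}}
= \sum_{n \in \mathbb{Z}} \frac{\sqrt{2\pi}}{2\Omega}\, f\left( \frac{\pi}{\Omega} n \right) e^{-i \frac{\pi \omega n}{\Omega}} \qquad (n = -k \iff k = -n)
\]

and this series is uniformly convergent since $\hat f$ is continuous and piecewise $C^1$. We calculate $f(t)$ via the inverse Fourier transform of $\hat f(\omega)$:

\begin{align*}
f(t) &= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} \hat f(\omega)\, e^{i\omega t}\,d\omega \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \hat f(\omega)\, e^{i\omega t}\,d\omega && (\hat f(\omega) = 0\ \forall |\omega| > \Omega) \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \sum_{n \in \mathbb{Z}} \frac{\sqrt{2\pi}}{2\Omega}\, f\left( \frac{\pi}{\Omega} n \right) e^{-i \frac{\pi \omega n}{\Omega}}\, e^{i\omega t}\,d\omega \\
&= \sum_{n \in \mathbb{Z}} \frac{1}{2\Omega}\, f\left( \frac{\pi}{\Omega} n \right) \int_{-\Omega}^{\Omega} e^{i\omega \frac{t\Omega - \pi n}{\Omega}}\,d\omega && [6.31]
\end{align*}

In the final step of the previous calculation, the series and the integral can be exchanged thanks to the fact that the series is uniformly convergent. Now, let us analyze the integral:

\[
\int_{-\Omega}^{\Omega} e^{i\omega \frac{t\Omega - \pi n}{\Omega}}\,d\omega
= \int_{-\Omega}^{\Omega} \cos\left( \omega\, \frac{t\Omega - \pi n}{\Omega} \right) d\omega
+ i \int_{-\Omega}^{\Omega} \sin\left( \omega\, \frac{t\Omega - \pi n}{\Omega} \right) d\omega
\]

The second integral is zero, as the sine function is odd and the domain of integration is symmetric; on the other hand, the cosine function is even, so we obtain:

\begin{align*}
\int_{-\Omega}^{\Omega} e^{i\omega \frac{t\Omega - \pi n}{\Omega}}\,d\omega
&= 2 \int_0^{\Omega} \cos\left( \omega\, \frac{t\Omega - \pi n}{\Omega} \right) d\omega
= 2 \left[ \frac{\sin\left( \omega\, \frac{t\Omega - \pi n}{\Omega} \right)}{\frac{t\Omega - \pi n}{\Omega}} \right]_0^{\Omega} \\
&= 2\Omega\, \frac{\sin\left( \Omega\, \frac{t\Omega - \pi n}{\Omega} \right)}{t\Omega - \pi n} - 0
= 2\Omega\, \frac{\sin(t\Omega - \pi n)}{t\Omega - \pi n}
\end{align*}

Inserting this result in [6.31], we obtain:

\[
f(t) = \sum_{n \in \mathbb{Z}} \frac{2\Omega}{2\Omega}\, f\left( \frac{\pi}{\Omega} n \right) \frac{\sin(t\Omega - \pi n)}{t\Omega - \pi n}
= \sum_{n \in \mathbb{Z}} f\left( \frac{\pi}{\Omega} n \right) \mathrm{sinc}(t\Omega - \pi n)
\]

and, as underlined before, the series is uniformly convergent. □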
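The reconstruction formula [6.29] can be tried out on a concrete band-limited signal. The sketch below is an illustration under assumptions made here: the test signal $\mathrm{sinc}(t)^2$ (whose spectrum is a triangle supported in $[-2, 2]$, hence continuous and piecewise $C^1$), the truncation of the series to finitely many terms, and all function names are illustrative choices.

```python
import math


def sinc(x):
    """Unnormalized sinc used in the text: sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def signal(t):
    """Assumed test signal sinc(t)^2; its spectrum vanishes for |omega| > 2."""
    return sinc(t) ** 2


def shannon_reconstruction(f, band_limit, t, terms=400):
    """Truncated version of series [6.29]: sum over |n| <= terms of
    f(pi * n / band_limit) * sinc(band_limit * t - pi * n)."""
    total = 0.0
    for n in range(-terms, terms + 1):
        total += f(math.pi * n / band_limit) * sinc(band_limit * t - math.pi * n)
    return total


if __name__ == "__main__":
    t = 1.7
    print(signal(t))
    print(shannon_reconstruction(signal, 2.0, t))  # Omega = 2: samples every pi/2
    print(shannon_reconstruction(signal, 1.0, t))  # Omega too small: samples every pi
```

With `band_limit = 2.0` the truncated series should reproduce the signal up to a small truncation error. With `band_limit = 1.0` the hypothesis of the theorem is violated: every sample except the one at $n = 0$ is numerically zero, so the series returns approximately $\mathrm{sinc}(t)$ instead of $\mathrm{sinc}(t)^2$.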
6.10.1. The Nyquist frequency: aliasing and oversampling

Since the sinc function is fixed, the signal $f$ is unequivocally characterized by the sequence of samples $f\left( \frac{\pi}{\Omega} n \right)$.

The sampling period used in the theorem is $T = \frac{\pi}{\Omega}$, so the sampling frequency, known as the Nyquist frequency and denoted $\nu_N$, is $\nu_N = \frac{1}{T} = \frac{\Omega}{\pi}$.

We now wish to compare the Nyquist frequency with the maximal frequency present in the signal $f$. Remember that we started with the hypothesis that $f$ is a finite-bandwidth signal with maximum angular frequency $\Omega$. Then the maximum frequency $\nu_{\max}$ of $f$ is defined by the relation $\Omega = 2\pi\nu_{\max}$, i.e. $\nu_{\max} = \frac{\Omega}{2\pi}$. Comparing the Nyquist sampling frequency $\nu_N$ with the maximal frequency $\nu_{\max}$ of the signal $f$, we obtain $\nu_N = 2\nu_{\max}$, which tells us that the sampling theorem holds if and only if the sampling frequency is at least twice the maximal frequency present in the signal $f$. This is consistent with the results of the discrete Fourier transform, where we have seen that the highest frequency of a discrete signal given by $N$ periodic samples is $\frac{N}{2}$ if $N$ is even, or the integer part of $\frac{N}{2}$ if $N$ is odd.

If the sampling frequency is lower than the Nyquist frequency, then a phenomenon known as aliasing occurs; this corresponds to errors in signal reconstruction. These errors result from the fact that, as we saw in our proof, we need to consider a periodic extension of the spectrum of $f$; the Nyquist frequency $\nu_N$ is the minimum frequency which allows $f$ to be reconstructed without "overflowing" into the next period of the spectrum. A lower sampling frequency results in the inclusion of spurious information from the adjacent spectrum periods on each side.

Finally, we note that the general term of the series in the theorem converges to 0 with the same speed as $\frac{1}{n}$ when $n \to +\infty$; this is a relatively slow convergence. The convergence speed can be increased, for example to $\frac{1}{n^2}$, by increasing the sampling frequency: this technique is known as oversampling.

6.11. Application of the Fourier transform to solve ordinary and partial differential equations

The way the Fourier transform behaves with respect to derivatives makes it particularly helpful for solving certain types of differential equations. The general idea is illustrated below in the case of an ordinary differential equation (ODE).

6.11.1. Solving an ordinary differential equation using the Fourier transform

Taking $y, g : \mathbb{R} \to \mathbb{R}$, $y, g \in L^1(\mathbb{R})$, $y$ twice differentiable, consider the following ODE:

\[
y''(t) - y(t) = -g(t) \qquad \forall t \in \mathbb{R}
\]
Applying the Fourier transform to both sides, by the property of linearity, we can write:

\[
\widehat{y''}(\omega) - \hat y(\omega) = -\hat g(\omega)
\]

that is:

\[
-\omega^2\, \hat y(\omega) - \hat y(\omega) = -\hat g(\omega) \iff (1 + \omega^2)\, \hat y(\omega) = \hat g(\omega)
\]

that is:

\[
\hat y(\omega) = \frac{1}{1 + \omega^2} \cdot \hat g(\omega) \qquad \text{(Solution in the frequency domain)}
\]

We see that the properties of the Fourier transform allowed us to transform the ODE into an algebraic equation in the frequency domain. If we know the Fourier transform of $g$, then the ODE is solved in the Fourier space. However, as the original ODE was formulated in terms of the variable $t$, we must return to the original representation by applying the inverse Fourier transform to both sides of the final equation. Using property [6.28], we have:

\[
(\hat y(\omega))^\vee(t) = y(t) = \left[ \frac{1}{1 + \omega^2} \cdot \hat g(\omega) \right]^\vee (t)
= \frac{1}{\sqrt{2\pi}} \left( \left( \frac{1}{1 + \omega^2} \right)^\vee (t) * g(t) \right) \qquad [6.32]
\]

We can verify by direct calculation that:

\[
\widehat{e^{-a|t|}}(\omega) = \sqrt{\frac{2}{\pi}}\, \frac{a}{a^2 + \omega^2}
\]

so, considering $a = 1$:

\[
\sqrt{\frac{\pi}{2}}\; \widehat{e^{-|t|}}(\omega) = \frac{1}{1 + \omega^2}
\]

and then:

\[
y(t) = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{\pi}{2}}\; e^{-|t|} * g(t)
\]

that is:

\[
y(t) = \frac{1}{2} \int_{-\infty}^{+\infty} e^{-|t - s|}\, g(s)\,ds = \frac{1}{2} \int_{-\infty}^{+\infty} g(t - s)\, e^{-|s|}\,ds
\]
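The convolution formula just obtained can be checked on an example for which the integral has a closed form. The sketch below assumes the right-hand side $g(t) = e^{-2|t|}$, a choice made here for illustration; for this $g$, a direct computation gives $y(t) = \frac{1}{3}\left( 2e^{-|t|} - e^{-2|t|} \right)$. The function names and the trapezoidal quadrature are likewise assumptions.

```python
import math


def solve_by_convolution(g, t, half_width=20.0, step=0.001):
    """Approximate y(t) = (1/2) * int exp(-|t - s|) g(s) ds on
    [-half_width, half_width] with the composite trapezoidal rule."""
    count = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(count + 1):
        s = -half_width + k * step
        weight = 0.5 if k in (0, count) else 1.0
        total += weight * math.exp(-abs(t - s)) * g(s)
    return 0.5 * step * total


def source(t):
    """Assumed right-hand side g(t) = exp(-2|t|), chosen for illustration."""
    return math.exp(-2.0 * abs(t))


def exact_solution(t):
    """Closed form for this g: y(t) = (2 exp(-|t|) - exp(-2|t|)) / 3."""
    return (2.0 * math.exp(-abs(t)) - math.exp(-2.0 * abs(t))) / 3.0


def ode_residual(y, g, t, h=1e-3):
    """Finite-difference estimate of y''(t) - y(t) + g(t), which should be near 0."""
    second_derivative = (y(t + h) - 2.0 * y(t) + y(t - h)) / (h * h)
    return second_derivative - y(t) + g(t)


if __name__ == "__main__":
    for t in (0.0, 0.5, -1.25):
        print(t, solve_by_convolution(source, t), exact_solution(t))
    print(ode_residual(exact_solution, source, 0.5))
```

The quadrature and the closed form should agree to several decimal places, and the residual of the ODE evaluated on the closed form should be close to zero.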
If we are able to calculate the integral (this depends on the analytical expression of $g$), then $y(t)$ can be determined explicitly; otherwise, the value must be approximated. To solve an ODE via the Fourier transform, we thus need to perform the following operations:
1) transform the ODE to the frequency domain, applying the Fourier transform to both sides of the equation;
2) solve the resulting algebraic equation in the Fourier space;
3) apply the inverse Fourier transform to obtain the solution to the ODE in its original representation;
4) typically, the solution in the Fourier space is given by a product; hence, the solution in the original representation is given by a convolution.
This technique can only be used if the coefficients of the derivatives are constant, and if the functions are integrable.

6.11.2. The Fourier transform and partial differential equations

The Fourier transform is even more effective when applied to partial differential equations (PDEs). For the purposes of our presentation, we shall only consider functions of the type $u = u(t, x)$ or $u = u(t, x, y, z)$, where $t$ is the time coordinate and $x$ or $(x, y, z)$ are one-dimensional (1D) or three-dimensional (3D) spatial coordinates, respectively. It is implicitly assumed that $u \in L^1(\mathbb{R}^2)$ or $u \in L^1(\mathbb{R}^4)$, respectively, and that $u$ can be differentiated enough times so that the corresponding PDE is well defined. For simplicity's sake, we write:

\[
\frac{\partial^2 u}{\partial x^2} = u_{xx}, \qquad \frac{\partial u}{\partial x} = u_x, \qquad \frac{\partial u}{\partial t} = u_t, \quad \dots
\]

The properties of the Fourier transform with respect to the partial derivatives are as follows:
– if the integration variable of the Fourier transform is $x$, then:

\[
\widehat{u_x}(t, \omega) = i\omega\, \hat u(t, \omega), \qquad \widehat{u_{xx}}(t, \omega) = -\omega^2\, \hat u(t, \omega),
\]

\[
\widehat{u_t}(t, \omega) = \frac{\partial}{\partial t} \hat u(t, \omega), \qquad \widehat{u_{tt}}(t, \omega) = \frac{\partial^2}{\partial t^2} \hat u(t, \omega).
\]

The first two formulas are straightforward; to obtain the remaining two, we note that, since $u \in L^1(\mathbb{R}^2)$, the order of differentiation and integration can be exchanged:

\[
\widehat{u_t}(t, \omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \frac{\partial u(t, x)}{\partial t}\, e^{-i\omega x}\,dx
= \frac{\partial}{\partial t}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} u(t, x)\, e^{-i\omega x}\,dx
= \frac{\partial}{\partial t} \hat u(t, \omega).
\]

The same is true for $u_{tt}$;
– if the integration variable of the Fourier transform is $t$, then:

\[
\widehat{u_t}(x, \omega) = i\omega\, \hat u(x, \omega), \qquad \widehat{u_{tt}}(x, \omega) = -\omega^2\, \hat u(x, \omega),
\]

\[
\widehat{u_x}(x, \omega) = \frac{\partial}{\partial x} \hat u(x, \omega), \qquad \widehat{u_{xx}}(x, \omega) = \frac{\partial^2}{\partial x^2} \hat u(x, \omega);
\]

– these considerations can be extended to $u(t, x, y, z)$.

6.11.3. Solving the partial differential equation for heat propagation using the Fourier transform

Consider the Cauchy problem for $u \in C^2(\mathbb{R}^2) \cap L^1(\mathbb{R}^2)$ and $\varphi \in C^2(\mathbb{R}) \cap L^1(\mathbb{R})$ defined by:

\[
\begin{cases}
u_t = \alpha^2 u_{xx} & \forall x \in (-\infty, +\infty),\ \forall t \in (0, +\infty),\ \alpha \in \mathbb{R}^+ \\
u(0, x) = \varphi(x) & \forall x \in (-\infty, +\infty),\ t = 0
\end{cases}
\]

where
– $u(t, x)$ is the temperature of a 1D bar at time $t$ and at the point $x$;
– $u_t(t, x)$ is the rate of temperature change at time $t$ and at the point $x$;
– $u_{xx}(t, x)$ is the concavity of the temperature profile at time $t$ and at the point $x$ (note that the second derivative is with respect to the spatial variable, thus it would be wrong to interpret $u_{xx}$ as an acceleration);
– $\varphi(x)$ is the initial temperature at the point $x$.

If we write the second discrete derivative (with step $\Delta x$) with respect to $x$, we see that it compares the temperature at the point $x$ at time $t$ with that of its neighbors at the same instant:

\[
u_{xx}(t, x) \simeq \frac{u(t, x + \Delta x) - 2u(t, x) + u(t, x - \Delta x)}{(\Delta x)^2}
= \frac{2}{(\Delta x)^2} \Bigg[ \underbrace{\frac{u(t, x + \Delta x) + u(t, x - \Delta x)}{2}}_{\text{mean temperature of neighboring points}} - u(t, x) \Bigg]
\]

Thus, the equation $u_t = \alpha^2 u_{xx}$ tells us that:
– if $u(t, x)$ is less than the mean temperature of its neighbors, then $u_{xx} > 0$ and thus $u_t(t, x) (= \alpha^2 u_{xx}) > 0$, meaning that the temperature at the point $x$ will increase over time: the neighboring points lose some of their heat in favor of $x$ in order to attain thermal equilibrium;
– in the opposite case, $u_t(t, x) (= \alpha^2 u_{xx}) < 0$ and so the temperature at the point $x$ decreases over time: $x$ loses heat to its neighbors in order to attain thermal equilibrium;
– the positive constant $\alpha^2$ is a characteristic of the material, known as the thermal diffusion coefficient. The higher the value of $\alpha^2$, the faster the bar will reach thermal equilibrium.

The heat equation is used in many other domains: for instance, in image processing, it is used to smooth out imperfections, and in the field of economics, it plays an important role in the Black-Scholes-Merton model of financial markets.

The heat equation is solved by applying the Fourier transform (integrating with respect to the variable $x$) to both sides:

\[
u_t(t, x) = \alpha^2 u_{xx}(t, x) \quad \longrightarrow \quad \frac{\partial}{\partial t} \hat u(t, \omega) = -\alpha^2 \omega^2\, \hat u(t, \omega)
\]

The initial condition in the Fourier space becomes $\hat u(0, \omega) = \hat\varphi(\omega)$. The PDE has thus been transformed into an ODE:

\[
\begin{cases}
u_t(t, x) = \alpha^2 u_{xx}(t, x) \\
u(0, x) = \varphi(x)
\end{cases}
\quad \longrightarrow \quad
\begin{cases}
\frac{\partial}{\partial t} \hat u(t, \omega) = -\alpha^2 \omega^2\, \hat u(t, \omega) \\
\hat u(0, \omega) = \hat\varphi(\omega)
\end{cases}
\]

because $\omega$ is a constant with respect to the variable $t$, thus the equation $\frac{\partial}{\partial t} \hat u(t, \omega) = -\alpha^2 \omega^2\, \hat u(t, \omega)$ is ordinary. We recall that the solution of the Cauchy problem:

\[
\begin{cases}
y' = -ky \\
y(0) = y_0
\end{cases}
\]

is $y(t) = y_0 e^{-kt}$ and thus, in the present case:

\[
\hat u(t, \omega) = \hat\varphi(\omega) \cdot e^{-\alpha^2 \omega^2 t} = \hat\varphi(\omega) \cdot e^{-(\alpha^2 t)\omega^2} \qquad \text{(Solution in the Fourier space)}
\]

The inverse Fourier transform is then applied to obtain the solution in the original representation. Using equation [6.28], we obtain:

\[
u(t, x) = \left( \hat\varphi(\omega) \cdot e^{-(\alpha^2 t)\omega^2} \right)^\vee (t, x)
= \frac{1}{\sqrt{2\pi}}\, \check{\hat\varphi}(x) * \left( e^{-(\alpha^2 t)\omega^2} \right)^\vee (t, x)
\]

Furthermore, $\check{\hat\varphi}(x) = \varphi(x)$, and $e^{-(\alpha^2 t)\omega^2}$ is a Gaussian with respect to $\omega$, so we can use the following property:

\[
\widehat{e^{-c^2 x^2}}(\omega) = \frac{1}{c\sqrt{2}}\, e^{-\frac{\omega^2}{4c^2}}
\iff
c\sqrt{2}\; \widehat{e^{-c^2 x^2}}(\omega) = e^{-\frac{1}{4c^2}\omega^2}
\]
In our case, this gives us $\frac{1}{4c^2} = \alpha^2 t$; hence $c^2 = \frac{1}{4\alpha^2 t}$ and then $c = \frac{1}{2\alpha\sqrt{t}}$ (in physical terms, only the positive determination of the root is relevant). Finally, we can write:

\[
\left( e^{-(\alpha^2 t)\omega^2} \right)^\vee (t, x) = \frac{\sqrt{2}}{2\alpha\sqrt{t}}\, e^{-\frac{x^2}{4\alpha^2 t}} = \frac{1}{\alpha\sqrt{2t}}\, e^{-\frac{x^2}{4\alpha^2 t}}
\]

and the solution of the heat equation is thus:

\[
u(t, x) = \frac{1}{\alpha\sqrt{4\pi t}} \int_{-\infty}^{+\infty} e^{-\frac{(x - y)^2}{4\alpha^2 t}}\, \varphi(y)\,dy
\]

Certain expressions of $\varphi(x)$ permit exact integration, and an analytical expression of $u(t, x)$ is thus possible. Generally, however, it is only possible to approximate $u(t, x)$. It is interesting to note that, as the standard expression of a Gaussian is:

\[
\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
\]

then $\sigma = \alpha\sqrt{2t}$, i.e. $\sigma^2 = 2\alpha^2 t$: the variance of the Gaussian featured in the solution of the heat equation is not fixed, but increases linearly as the time $t$ increases.

This tells us that the Gaussian widens over time; this is perfectly consistent with common experience, given that as $t \to +\infty$, the bar reaches thermal equilibrium and thus the temperature is uniform across the whole bar.

The observations above provide a deeper insight into the technique of convolution with a Gaussian, widely used in signal processing, for example to blur digital images. Taking $\varphi(y)$ to represent the original intensity of any given pixel $y$ in a digital image, and interpreting $u(t, x)$ as the intensity of the blurred image at time $t$ and in a fixed pixel at position $x$, the convolution of an image with a Gaussian may be considered as the exchange of intensity ("heat") between $x$ and its neighbors. Furthermore, just as heat propagation is an irreversible process, the blurring effect obtained by convolution with a Gaussian cannot be directly inverted.

One final observation linked to the spatial dimension of the problem is that the application of the technique described above requires $x$ to vary between $-\infty$ and $+\infty$. Other techniques are used to solve problems where $x$ varies within a bounded interval, including the sine and cosine Fourier transforms and the Laplace transform.
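The integral formula for the solution of the heat equation can be evaluated numerically and compared with a case in which the integral is known in closed form. In the sketch below, the initial profile $\varphi(y) = e^{-y^2}$ is an assumption chosen for illustration; for it, the convolution of two Gaussians gives $u(t, x) = \frac{1}{\sqrt{1 + 4\alpha^2 t}}\, e^{-x^2/(1 + 4\alpha^2 t)}$. The function names, the value of $\alpha$ and the trapezoidal quadrature are also illustrative choices.

```python
import math


def heat_solution(phi, alpha, t, x, half_width=15.0, step=0.005):
    """Approximate u(t, x) = 1/(alpha*sqrt(4*pi*t)) *
    int exp(-(x - y)^2 / (4 alpha^2 t)) phi(y) dy with the composite
    trapezoidal rule on [-half_width, half_width]; requires t > 0."""
    count = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(count + 1):
        y = -half_width + k * step
        weight = 0.5 if k in (0, count) else 1.0
        total += weight * math.exp(-(x - y) ** 2 / (4.0 * alpha ** 2 * t)) * phi(y)
    return step * total / (alpha * math.sqrt(4.0 * math.pi * t))


def initial_profile(y):
    """Assumed initial temperature phi(y) = exp(-y^2), chosen for illustration."""
    return math.exp(-y * y)


def exact_solution(alpha, t, x):
    """Closed form for this phi: exp(-x^2 / (1 + 4 alpha^2 t)) / sqrt(1 + 4 alpha^2 t)."""
    spread = 1.0 + 4.0 * alpha ** 2 * t
    return math.exp(-x * x / spread) / math.sqrt(spread)


if __name__ == "__main__":
    alpha = 0.7  # assumed sample value
    for t, x in ((0.1, 0.0), (1.0, 0.5), (3.0, -2.0)):
        print(t, x, heat_solution(initial_profile, alpha, t, x), exact_solution(alpha, t, x))
```

As $t$ grows, the peak value $u(t, 0) = (1 + 4\alpha^2 t)^{-1/2}$ decreases while the profile spreads out, which matches the linear growth of the variance noted above.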
6.12. Summary

Linear operators between normed vector spaces are continuous at a given point if and only if they are continuous everywhere, and if and only if they are bounded. All linear operators defined on a finite-dimensional vector space are continuous (and thus bounded); this ceases to be true, in general, when the space on which the operator is defined is not of finite dimension. A classic example is provided by the derivation operator.

For bounded linear operators, we can define a norm, with four equivalent definitions, which makes the set $\mathcal{B}(V,W)$ of bounded linear operators between two normed vector spaces $V$ and $W$ a normed vector space in its own right. In the specific case where $V = W$, the composition of operators defines a product in $\mathcal{B}(V)$ with respect to which $\mathcal{B}(V)$ becomes a unital normed associative algebra. Furthermore, if $W$ is complete, $\mathcal{B}(V,W)$ is complete; in the specific case where $V = W = \mathcal{H}$, a Hilbert space, $\mathcal{B}(\mathcal{H})$ is a unital Banach algebra, that is, a complete associative normed algebra such that $\|AB\| \le \|A\|\,\|B\|$ $\forall A, B \in \mathcal{B}(\mathcal{H})$.

The kernel of a bounded operator is always a closed vector subspace of the domain of the operator. If the kernel consists solely of the zero vector, then the operator is invertible (onto its image), but its inverse will not necessarily be bounded. The existence of $\mu > 0$ such that $\|Ax\| \ge \mu\|x\|$ for all $x \in V$ gives a simple and useful characterization of the bounded invertibility of an operator $A : V \to W$. If $V$ is a Banach space and this condition is verified, then $\mathrm{Im}(A)$, the image space of $A$, is closed. In practical applications, the closedness of $\ker(A)$ (where $A$ is continuous) and of $\mathrm{Im}(A)$ (under the hypotheses given above) may be used to prove that a subspace is closed: we must simply show that it coincides with the kernel or image of a linear operator which satisfies those hypotheses.

The dual of an arbitrary vector space $V$ on the field $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$ is the vector space $V^*$ of linear functionals defined on $V$. If the space is normed, then it is natural to require compatibility with the topological structure generated by the norm, that is, to require the functionals to be continuous: we define $V^* = \mathcal{B}(V, \mathbb{K})$. Given that $\mathbb{K}$ is complete, $V^*$ is always complete, even when $V$ is not.

In the case of a Hilbert space $\mathcal{H}$, the Riesz representation theorem tells us that $\mathcal{H}$ and $\mathcal{H}^*$ are isomorphic by the transformation which associates each $x \in \mathcal{H}$ with the functional $T_x$ which implements the inner product, that is, $T_x(y) = \langle y, x\rangle$ $\forall y \in \mathcal{H}$. This theorem makes it possible to define the adjoint $A^\dagger$ of any operator $A \in \mathcal{B}(\mathcal{H})$ via the relationship $\langle A^\dagger x, y\rangle = \langle x, Ay\rangle$ $\forall x, y \in \mathcal{H}$. If $A = A^\dagger$, then $A$ is said to be self-adjoint. Two examples of self-adjoint operators are $A^\dagger A$ and $AA^\dagger$.

The adjoint of a bounded linear operator is a particularly important operator in both theory and practice. An idea of its importance can be seen in the theorem used to characterize an orthogonal projection operator on a Hilbert space: $A \in \mathcal{B}(\mathcal{H})$ is an
orthogonal projector on $\mathrm{Im}(A)$ if and only if $A$ is self-adjoint, $A = A^\dagger$, and idempotent, $A^2 = A$. This result can be used, for example, to show that multiplication operators on $L^2(\mathbb{R}^n)$ are orthogonal projectors if and only if they multiply by the indicator function of a measurable subset of $\mathbb{R}^n$. There is also a highly important geometric representation of orthogonal projectors: $A \in \mathcal{B}(\mathcal{H})$ is an orthogonal projector if and only if there exists an orthonormal system $(u_n)_{n\in\mathbb{N}}$ in $\mathcal{H}$ such that $Ax = \sum_{n\in\mathbb{N}} \langle x, u_n\rangle u_n$, $\forall x \in \mathcal{H}$. This realization of the projector is the extension, in infinite dimensions, of the analogous formula valid in finite dimension.

The adjoint also plays a role in the analysis of isometric and unitary operators. An operator $A \in \mathcal{B}(\mathcal{H})$ is isometric if it conserves the norm (or, in an equivalent manner, the inner product); a unitary operator is isometric and surjective. Operators in both categories have unit norm. The relationship between isometric operators and orthogonal projectors is given by the following result: if $A \in \mathcal{B}(\mathcal{H})$ is isometric, then $AA^\dagger$ is an orthogonal projector. If $A \in \mathcal{B}(\mathcal{H})$ is isometric, then $\mathrm{Im}(A) = \mathrm{Im}(AA^\dagger)$ and, given that $AA^\dagger$ is an orthogonal projector (since $A$ is taken to be isometric), $\mathrm{Im}(AA^\dagger)$ is closed; thus the image space of an isometric operator is always closed. Since $\ker(A^\dagger) = \mathrm{Im}(A)^\perp$, if $A$ is isometric but not surjective, then $\mathrm{Im}(A) \ne \mathcal{H}$; hence, $\mathrm{Im}(A)^\perp \ne \{0_{\mathcal{H}}\}$ and then $A^\dagger$ is not invertible. Using the same argument, we also see that if $A$ is unitary, then $A^\dagger$ is invertible.

As in the case of orthogonal projection operators, an algebraic characterization of isometric and unitary operators can be obtained via the adjoint: $A \in \mathcal{B}(\mathcal{H})$ is isometric if and only if $A^\dagger A = \mathrm{id}_{\mathcal{H}}$, while $A \in \mathcal{B}(\mathcal{H})$ is unitary if and only if $A^\dagger A = AA^\dagger = \mathrm{id}_{\mathcal{H}}$; in this final case, $A$ is invertible and $A^{-1} = A^\dagger$. Moreover, we can show that $A$ is unitary if and only if $A^\dagger$ is unitary. One consequence of this result is that the unitary nature of an operator $A$ can be studied by examining that of its adjoint, which, in some cases, is simpler. Regarding the geometric realization of isometric and unitary operators, $A \in \mathcal{B}(\mathcal{H})$ is isometric if and only if it transforms Hilbert bases into orthonormal systems, while $A \in \mathcal{B}(\mathcal{H})$ is unitary if and only if it transforms Hilbert bases into Hilbert bases. This is an important difference with respect to the finite-dimensional case.

The Fourier transform $f(x) \mapsto \hat{f}(\omega) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} f(x)\, e^{-i\langle \omega, x\rangle}\, dx$ is widely used in both pure and applied mathematics. The most "natural" space in which to define this transform is the Schwartz space; in this space, the Fourier transform has the integral formula given above, and is an isometric isomorphism with respect to the norm inherited from $L^2(\mathbb{R}^n)$. If we wish to extend the transform to a space with less regular functions, for example $L^1(\mathbb{R}^n)$ or $L^2(\mathbb{R}^n)$, certain properties must be sacrificed. On $L^1(\mathbb{R}^n)$, the image lies in $C_\infty(\mathbb{R}^n)$, the space of continuous functions vanishing at infinity, but the integral formula is preserved. On $L^2(\mathbb{R}^n)$, the integral formula must be replaced by a limit formula, but the isomorphic character of the transform is retained; the extension of the Fourier transform on $L^2(\mathbb{R}^n)$ defines a unitary operator $\mathcal{F} \in \mathcal{B}(L^2(\mathbb{R}^n))$. An explicit formula
for this unitary operator can be obtained by means of the Hermite basis: $\mathcal{F}f = \sum_{n\in\mathbb{N}} (-i)^n \langle f, u_n\rangle u_n$. We also note that, up to a constant, the Fourier transform of the convolution of two functions in $L^1(\mathbb{R}^n)$ is the pointwise product of the transforms.

Finally, we presented the Nyquist-Shannon sampling theorem, which enables the reconstruction of a signal with bounded bandwidth using a sufficiently dense, but discrete, set of samples of this signal. We also described applications of the Fourier transform in solving differential equations, notably the heat equation, which played a crucial role in the development of Fourier's theory.
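The reconstruction promised by the sampling theorem can be tried out numerically. The sketch below rests on several assumptions of our own: frequencies are measured in cycles per unit time, the test signal is $\mathrm{sinc}(t/2)^2$ (whose spectrum lies in the band $|\nu| \le 1/2$), the sampling period is $T = 0.5$, and the cardinal series, which runs over all integers, is truncated to $|k| \le N$. The helper names `sinc`, `f` and `reconstruct` are hypothetical.

```python
import math

def sinc(x):
    """Normalized cardinal sine: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def f(t):
    """Test signal sinc(t/2)^2; its spectrum is contained in |nu| <= 1/2."""
    return sinc(t / 2.0) ** 2

def reconstruct(samples, T, t):
    """Truncated cardinal series: sum over k of f(kT) * sinc((t - kT) / T).
    `samples` maps each integer k to the sampled value f(kT)."""
    return sum(fk * sinc((t - k * T) / T) for k, fk in samples.items())

T = 0.5   # sampling period: the rate 1/T = 2 exceeds the Nyquist rate 1
N = 400   # assumption: the exact series runs over all integers k
samples = {k: f(k * T) for k in range(-N, N + 1)}

for t in (0.3, 1.1, 2.75):
    print(t, f(t), reconstruct(samples, T, t))
```

The values off the sampling grid are recovered up to a small truncation error, which decreases as `N` grows; only the truncation, not the theorem itself, is responsible for the residual discrepancy.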
Appendix 1 Quotient Space
The concept of quotient of a vector space is essential in mathematics, and, in our opinion, does not always receive the attention it deserves in works on linear algebra. For this reason, we have chosen to devote an appendix to the definition and interpretation of this concept.

DEFINITION A1.1.– An equivalence relation $\sim$ defined on a vector space $V$ (of arbitrary dimension) on the field $\mathbb{K}$ is said to be compatible with the linear structure of $V$ if:
$$v \sim v',\ w \sim w' \implies \alpha v + \beta w \sim \alpha v' + \beta w', \quad \forall \alpha, \beta \in \mathbb{K}$$

The equivalence class of 0 in $V$ is a vector space $Z$ (since it is stable with respect to linear combinations, and contains the neutral element) known as the kernel of the equivalence relation. One special case of this definition is when $w = w' = v'$ and $\alpha = 1$, $\beta = -1$, which implies:
$$v \sim v' \implies v - v' \sim 0 \iff v - v' \in Z$$

Conversely, if $v - v' \in Z$, i.e. $v - v' \sim 0 = v' - v'$, then, since $v' \sim v'$ and $\sim$ is compatible with the linear structure of $V$, we obtain $(v - v') + v' \sim (v' - v') + v'$, that is, $v \sim v'$. In short:
$$v \sim v' \iff v - v' \in Z$$
which tells us that an equivalence relation compatible with the linear structure of a vector space is uniquely determined by its kernel, which is a vector subspace of $V$.

This observation allows us to reverse the process. Given an arbitrary vector subspace $W$ in $V$, if we define:
$$v \sim_W v' \iff v - v' \in W, \quad \forall v, v' \in V$$
then $\sim_W$ is an equivalence relation in $V$ which is compatible with its linear structure and has $W$ as kernel. By symmetry, $v \sim_W v' \iff v' \sim_W v \iff v' - v \in W$, and this means that there exists $w \in W$ such that $v' - v = w$, that is, $v' = v + w$; thus, the equivalence class $[v]_W$ containing $v \in V$ is the subset of $V$ given by:
$$[v]_W = v + W = \{v + w : w \in W\}$$
This is referred to as a linear subvariety and interpreted geometrically as the shift of the subspace $W$ by the vector $v$. We observe that if $v \in W$, then, by linearity, the shift of $W$ through $v$ does not modify $W$. As a vector subspace of $V$, $W$ contains 0; thus, if $v \notin W$, then the equivalence class $v + W$ does not contain 0 and cannot, therefore, be a vector subspace of $V$.

Lemma A1.1 is essential for defining a quotient space, and will be used extensively in the rest of this appendix.

LEMMA A1.1.– Let $V$ be an arbitrary vector space, let $v_1, v_2 \in V$ and let $W_1, W_2$ be vector subspaces of $V$. Then the equality $v_1 + W_1 = v_2 + W_2$ holds if and only if $W_1 = W_2 \equiv W$ and $v_1 - v_2 \in W$.

PROOF.– $\Leftarrow$: taking $v_1 - v_2 \equiv w_0 \in W = W_1 = W_2$, then $v_2 = v_1 - w_0$ and:
$$v_1 + W = \{v_1 + w : w \in W\}, \qquad v_2 + W = \{v_1 + w - w_0 : w \in W\}$$
but evidently $W = \{w : w \in W\} = \{w - w_0 : w \in W\}$, hence $v_1 + W = v_2 + W$.

$\Rightarrow$: conversely, taking $v_1 + W_1 = v_2 + W_2$, then, by the definition of a linear subvariety, $v_1 - v_2 + W_1 = v_2 - v_2 + W_2 = W_2$; thus, if $w_0 = v_1 - v_2$, we obtain $w_0 + W_1 = W_2$. Since $W_2$ is a vector subspace of $V$, it contains 0. Hence $w_0 + W_1$ also contains 0, i.e. $-w_0 \in W_1$, and thus $w_0 \in W_1$ since $W_1$ is also a vector subspace. Shifting the vectors of $W_1$ using $w_0$, which is a vector in $W_1$, does not change the subspace, i.e. $w_0 + W_1 = W_1$; however, since $w_0 + W_1 = W_2$, we obtain $W_1 = W_2 = W$ and $w_0 = v_1 - v_2 \in W_1 = W$. $\square$

This lemma implies that every linear subvariety is uniquely determined by a single subspace $W$, of which the subvariety is the shift. Moreover, the vector which induces the shift is uniquely determined, up to the sum with a vector in $W$. It is now possible to establish the definition of quotient space and prove that this definition is well posed.
DEFINITION A1.2 (quotient (vector) space).– Let $V$ be any vector space and $W$ a vector subspace of $V$. The quotient vector space $V/W$ is the set of all linear subvarieties of $V$ which are shifts of $W$, equipped with the following linear operations:
$$(v_1 + W) + (v_2 + W) = (v_1 + v_2) + W, \quad \forall v_1, v_2 \in V$$
$$\alpha(v + W) = \alpha v + W, \quad \forall v \in V,\ \forall \alpha \in \mathbb{K}$$

Let us verify that these operations are well defined and that $V/W$ is a vector space on $\mathbb{K}$. The easy proof that the vector space axioms for $V/W$ are directly induced by the vector space properties of $V$ is left to the reader. Let us just underline the following properties, valid $\forall v_1, v_2, v_1', v_2' \in V$ and $\forall \alpha \in \mathbb{K}$:
a) if $v_1 + W = v_1' + W$ and $v_2 + W = v_2' + W$, then $v_1 + v_2 + W = v_1' + v_2' + W$;
b) if $v_1 + W = v_1' + W$, then $\alpha v_1 + W = \alpha v_1' + W$.

We begin by proving the validity of property a. Lemma A1.1 tells us that $v_1 - v_1' \equiv w_1 \in W$ and $v_2 - v_2' \equiv w_2 \in W$, thus:
$$(v_1 + v_2) + W = (v_1' + v_2') + \underbrace{(w_1 + w_2) + W}_{=W} = (v_1' + v_2') + W$$
To prove the validity of property b, we simply note that if we write $v_1 - v_1' \equiv w \in W$, then $\alpha(v_1 - v_1') = \alpha w \in W$, and thus by Lemma A1.1, $\alpha v_1 + W = \alpha v_1' + W$.

It is natural to wonder what the dimension of $V/W$ is, and whether it is linked or not to the dimensions of $V$ and $W$. To answer this question we need the following preliminary result.

LEMMA A1.2.– Let $V$ be an arbitrary vector space, $W$ a vector subspace of $V$ and $H$ a subspace of $V$ which is supplementary to $W$, i.e. such that $W \cap H = \{0\}$ and $V = W \oplus H$. Then, for any vector $v \in V$ which implements a translation of $W$, there exists only one vector $h_v \in H \cap (v + W)$. This vector is used to write $v$ in a unique manner in the direct sum $v = w_v + h_v$.

PROOF.– Let us begin by proving the existence of a vector $h_v$ belonging to $H$ and to $v + W$. By the hypothesis $V = W \oplus H$, any vector $v \in V$ may be written in a unique manner as $v = w_v + h_v$, $w_v \in W$ and $h_v \in H$; we must prove that $h_v \in v + W$.

To do this, let us now consider a vector $v' = w_{v'} + h_{v'} \in V$, $w_{v'} \in W$ and $h_{v'} \in H$, which belongs to the same equivalence class as $v$, that is, which is such that $v + W = v' + W$. Again, using Lemma A1.1, $v' - v \in W$, that is, $w_{v'} + h_{v'} - w_v - h_v \in W$, that is, $(w_{v'} - w_v) + (h_{v'} - h_v) \in W$. Moreover, since $w_{v'} - w_v \in W$ and $h_{v'} - h_v \in H$, the only case in which their sum remains within $W$ is where $h_{v'} - h_v = 0$ (given that $W \cap H = \{0\}$). Hence, $h_{v'} = h_v$ and then $v + W \ni v' = w_{v'} + h_v$, that is, $w_{v'} + h_v \in v + W$. Using Lemma A1.1 once more, we know that the sum of a vector belonging to $v + W$ and a vector in $W$ does not take us outside of the equivalence class $v + W$, thus $h_v \in v + W$. Since $h_v \in H$ and $h_v \in v + W$, then $h_v \in H \cap (v + W)$.

Conversely, if $h \in H \cap (v + W)$, then, in particular, $h \in v + W$, that is, $h \sim_W v$, that is, there exists $\tilde{w}_v \in W$ such that $h = \tilde{w}_v + v$, that is, $v = w_v + h$, where $w_v = -\tilde{w}_v$; by the uniqueness of the decomposition in the direct sum, $h = h_v$. $\square$

THEOREM A1.1.– If $W$ is a subspace of the vector space $V$ which admits a supplement $H$ in $V$, then $H$ is isomorphic to $V/W$:
$$V/W \cong H, \qquad V = W \oplus H$$

PROOF.– The uniqueness of the vector $h_v$, as established by Lemma A1.2, allows us to construct the bijective and intrinsically linear correspondence which associates an arbitrary linear subvariety $v + W$ in $V$ with the component in $H$ of an arbitrary representative $v \in v + W$, that is:
$$V/W \longrightarrow H, \qquad v + W \longmapsto h_v, \quad \text{such that } v = w_v + h_v$$
is a linear isomorphism. $\square$

Note that, given a closed vector subspace $W$ of a Hilbert space $\mathcal{H}$, the orthogonal projection theorem 5.7 tells us that a supplementary space always exists in the form of the orthogonal complement $W^\perp$; hence, in this case:
$$\mathcal{H}/W \cong W^\perp$$
that is, the quotient vector space of a Hilbert space on a closed vector subspace $W$ is isomorphic to the orthogonal complement of $W$.

This result also allows us to determine the dimension of $V/W$ as a function of that of $V$ and of $W$ in finite dimensions. In this case, $\dim(V) = \dim(W) + \dim(H)$ and $\dim(H) = \dim(V/W)$, then $\dim(V/W) = \dim(V) - \dim(W)$.

COROLLARY A1.1 (Dimension of $V/W$).– Let $V$ be a vector space of finite dimension and $W$ a vector subspace of $V$, then:
$$\dim(V/W) = \dim(V) - \dim(W)$$
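The dimension formula can be checked by brute force on a small example. The sketch below makes an assumption that goes beyond the text: the field is taken to be the finite field $\mathbb{Z}/3\mathbb{Z}$ rather than $\mathbb{R}$ or $\mathbb{C}$, so that every coset can be listed explicitly and a space of dimension $d$ contains exactly $3^d$ elements. All function names are hypothetical helpers.

```python
from itertools import product

P = 3  # assumption: arithmetic is done modulo the prime P

def add(u, v):
    return tuple((a + b) % P for a, b in zip(u, v))

def scale(c, v):
    return tuple((c * a) % P for a in v)

def span(generators, dim):
    """All linear combinations of `generators` inside (Z/P)^dim."""
    vectors = {tuple([0] * dim)}
    for g in generators:
        vectors = {add(v, scale(c, g)) for v in vectors for c in range(P)}
    return frozenset(vectors)

def coset(v, W):
    """The linear subvariety v + W, stored as a frozenset of vectors."""
    return frozenset(add(v, w) for w in W)

V = list(product(range(P), repeat=3))  # V = (Z/3)^3, dim V = 3
W = span([(1, 1, 0)], 3)               # W = span{(1, 1, 0)}, dim W = 1
quotient = {coset(v, W) for v in V}    # the set V/W

# Lemma A1.1: v1 + W = v2 + W exactly when v1 - v2 lies in W.
for v1 in V:
    for v2 in V:
        diff = add(v1, scale(P - 1, v2))  # v1 - v2 modulo P
        assert (coset(v1, W) == coset(v2, W)) == (diff in W)

print(len(V), len(W), len(quotient))  # prints 27 3 9
```

The quotient has $9 = 3^{3-1}$ elements, in agreement with $\dim(V/W) = \dim(V) - \dim(W) = 2$.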
Many problems in both pure and applied mathematics require us to consider situations where $V$ and $W$ are of infinite dimension, while $V/W$ is of finite dimension. In this case, $\dim(V/W)$ is known as the codimension of $W$ in $V$ and written as $\mathrm{codim}(V/W)$.

Once the dimension of $V/W$ and the linear isomorphism with $H$ have been determined, Corollary A1.2 concerning the bases of $V/W$ in finite dimensions is almost immediate.

COROLLARY A1.2 (Bases of $V/W$).– Let $V$ be a vector space of finite dimension $n$ and $W$ a vector subspace of $V$, then the linear subvarieties $(e_i + W)_{i=1}^n \subset V/W$ form a basis of the quotient vector space $V/W$ if and only if the representatives $(e_i)_{i=1}^n \subset V$ constitute a basis for a supplementary subspace of $W$ in $V$.

Note that the zero of $V/W$ is evidently the linear subvariety which contains the 0 of $V$, i.e. $0_V + W \equiv W$ is the zero of $V/W$.

We conclude our analysis of $V/W$ by considering the natural projection of $V$ onto $V/W$:
$$\pi : V \longrightarrow V/W, \qquad v \longmapsto \pi(v) = v + W$$
The properties of $\pi$ are as follows:
– $\pi$ is surjective: this stems from the fact that each element in $V/W$ is represented by a vector in $V$;
– the fibers of $\pi$, i.e. the preimages of the elements in $V/W$ through $\pi$, are the elements of $V/W$ interpreted as subvarieties of $V$:
$$\pi^{-1}([v_0]) = \{v \in V : v + W = v_0 + W\}$$
but the equality between sets $v + W = v_0 + W$ is only verified for $v \in v_0 + W$, thus:
$$\pi^{-1}([v_0]) = v_0 + W$$
where $[v_0]$ is interpreted, first, as the equivalence class corresponding to the element of $V/W$ identified by $v_0 \in V$, then as $v_0 + W$, seen as a subset of $V$;
– $\pi$ is a linear map, by the fact that the operations of $V/W$ are well defined;
– the kernel of $\pi$ is $W$: $\ker(\pi) = W$. By Lemma A1.1, $v_0 + W = W$ if and only if $v_0 \in W$.
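The constructions of this appendix can be followed on a small worked example. The choice of spaces is ours and serves only as an illustration: $V = \mathbb{R}^2$, with $W$ the horizontal axis and $H$ the vertical axis as supplement.

```latex
\begin{align*}
V &= \mathbb{R}^2, \qquad W = \{(x,0) : x \in \mathbb{R}\}, \qquad H = \{(0,y) : y \in \mathbb{R}\}, \\
(a,b) + W &= \{(x,b) : x \in \mathbb{R}\} \quad \text{(the horizontal line at height } b\text{)}, \\
(a,b) + W = (a',b') + W &\iff (a-a',\, b-b') \in W \iff b = b', \\
h_{(a,b)} &= (0,b) \in H \cap \big((a,b) + W\big), \\
\pi(a,b) &= (a,b) + W, \qquad \ker(\pi) = \{(a,0) : a \in \mathbb{R}\} = W, \\
\dim(V/W) &= \dim(V) - \dim(W) = 2 - 1 = 1 .
\end{align*}
```

In this example, the isomorphism of Theorem A1.1 sends the horizontal line at height $b$ to the point $(0,b)$ of the vertical axis, so $V/W$ can be identified with $\mathbb{R}$.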
Appendix 2 The Transpose (or Dual) of a Linear Operator
Any linear operator $A : V \to W$, where $V$ and $W$ are two finite-dimensional vector spaces, can be uniquely associated with a linear operator $A^t$ known as the transpose or dual operator of $A$, defined as:
$$A^t : W^* \longrightarrow V^*, \qquad \varphi \longmapsto A^t\varphi = \varphi \circ A$$
that is:
$$A^t\varphi : V \longrightarrow \mathbb{K}, \qquad v \longmapsto A^t\varphi(v) = \varphi(Av)$$
This definition is natural, as it only uses $A$ and the elements supplied by the vector spaces themselves. Using canonical notation to express the action of a linear functional, we can rewrite $A^t\varphi(v) = \varphi(Av)$ as:
$$\langle A^t\varphi, v\rangle = \langle \varphi, Av\rangle \qquad \text{[A2.1]}$$

The fact that this is well defined, that is, the linearity of the functional $A^t\varphi$, is guaranteed by the fact that for a fixed $\varphi$, the function $v \mapsto \varphi(Av)$ is linear, as it is a composition of linear maps.

The uniqueness of this definition can also be easily proven. Let $A^t_1$ and $A^t_2$ be two transpose operators such that $A^t_1\varphi(v) = \varphi(Av) = A^t_2\varphi(v)$, that is, $(A^t_1 - A^t_2)\varphi(v) = 0$. Taking an arbitrary fixed $\varphi \in W^*$ and leaving $v$ free within $V$, it is evident from the equation $(A^t_1 - A^t_2)\varphi(v) = 0$ that $(A^t_1 - A^t_2)\varphi$ is the identically zero functional. This holds for all $\varphi \in W^*$, implying that $A^t_1 - A^t_2 = 0$, that is, $A^t_1 = A^t_2$.
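In coordinates, the defining relation [A2.1] is the familiar statement that the transpose operator is represented by the transposed matrix. The sketch below assumes canonical bases on $V = \mathbb{R}^3$ and $W = \mathbb{R}^2$ and identifies a functional with its coordinate vector in the dual basis; the matrix and the vectors are arbitrary values chosen for the illustration, and the helper names are hypothetical.

```python
def apply(M, v):
    """Matrix-vector product M v for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Transposed matrix, again stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def pairing(phi, v):
    """Action <phi, v> of the functional with coordinates phi on the vector v."""
    return sum(p * x for p, x in zip(phi, v))

# A : V = R^3 -> W = R^2 in the canonical bases (an arbitrary choice).
A = [[1, 2, 0],
     [0, -1, 3]]

phi = [2, 5]     # coordinates of a functional on W in the dual basis
v = [1, -1, 4]   # a vector of V

lhs = pairing(apply(transpose(A), phi), v)  # <A^t phi, v>
rhs = pairing(phi, apply(A, v))             # <phi, A v>
print(lhs, rhs)  # prints 63 63
```

Both sides of [A2.1] evaluate to the same number for this choice of data; the same check works for any matrix and any pair $(\varphi, v)$ of compatible sizes.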
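Now, let $V$ and $W$ be two Banach spaces, possibly of infinite dimension. In this case, the definition remains valid as long as, for all $A \in \mathcal{B}(V,W)$, the transpose operator defined above is continuous, that is, $A^t \in \mathcal{B}(W^*, V^*)$, and $A^t\varphi$ is a bounded linear functional on $V$ whenever $\varphi$ is a bounded linear functional on $W$. Let us verify these properties.

– $A^t\varphi$ is a bounded linear functional on $V$ $\forall \varphi \in W^*$: linearity is evident by definition, so we only need to prove that $A^t\varphi$ is bounded:
$$\|A^t\varphi\| = \sup_{\|v\|=1} |(A^t\varphi)v| \underset{\text{def of } A^t}{=} \sup_{\|v\|=1} |\varphi(Av)| \underset{\varphi \text{ bounded}}{\le} \sup_{\|v\|=1} \|\varphi\|\,\|Av\| \underset{A \text{ bounded}}{\le} \sup_{\|v\|=1} \|\varphi\|\,\|A\|\,\|v\| = \|\varphi\|\,\|A\| < +\infty$$

– $A^t \in \mathcal{B}(W^*, V^*)$:
$$\|A^t\| = \sup_{\|\varphi\|=1} \|A^t\varphi\| \underset{\text{def of } A^t}{=} \sup_{\|\varphi\|=1} \|\varphi \circ A\| \underset{\varphi \text{ bounded}}{\le} \sup_{\|\varphi\|=1} \|\varphi\|\,\|A\| = \|A\| \underset{A \in \mathcal{B}(V,W)}{<} +\infty$$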
If $V = W = \mathcal{H}$, where $\mathcal{H}$ is a Hilbert space, then the Riesz isomorphism $T : \mathcal{H} \to \mathcal{H}^*$, $\mathcal{H} \ni x \mapsto T(x) = T_x$, where $T_x(y) = \langle y, x\rangle$ $\forall y \in \mathcal{H}$, is associated with the adjoint operator defined in section 6.4 via the expression: $A^\dagger = T^{-1} A^t T$.
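The relation between the adjoint and the transpose can be checked in a few lines. The derivation below is a sketch which only uses the definitions of $A^t$, $T_x$ and $A^\dagger$ recalled in this book, together with the Hermitian symmetry of the inner product, $\langle a, b\rangle = \overline{\langle b, a\rangle}$.

```latex
\begin{align*}
(A^t T_x)(y) &= T_x(Ay) && \text{definition of } A^t \\
 &= \langle Ay, x\rangle && \text{definition of } T_x \\
 &= \overline{\langle x, Ay\rangle} = \overline{\langle A^\dagger x, y\rangle} && \text{definition of } A^\dagger \\
 &= \langle y, A^\dagger x\rangle = T_{A^\dagger x}(y) && \forall x, y \in \mathcal{H}
\end{align*}
```

Hence $A^t T = T A^\dagger$ and, composing with $T^{-1}$ on the left, $A^\dagger = T^{-1} A^t T$.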
Appendix 3 Uniform, Strong and Weak Convergence
Sequences of operators may be shown to converge with respect to topologies other than the one induced by the operator norm. The same can be said for sequences of elements in Banach or Hilbert spaces. To take a concrete example, consider the following case.

Let $(u_n)_{n\in\mathbb{N}}$ be an arbitrary Hilbert basis in a Hilbert space $\mathcal{H}$. For all $n \in \mathbb{N}$, we define the linear operator:
$$A_n : \mathcal{H} \longrightarrow \mathcal{H}, \qquad x \longmapsto A_n x = \sum_{m=0}^{n} \langle x, u_m\rangle u_m$$
From the geometric characterization of projection operators (see Theorem 6.32), we know that $A_n$ is the orthogonal projector on the vector subspace of $\mathcal{H}$ generated by $u_0, \ldots, u_n$: $S_n = \mathrm{span}(u_0, \ldots, u_n)$.

Since any $x \in \mathcal{H}$ may be written as $x = \sum_{n=0}^{\infty} \langle x, u_n\rangle u_n$, it would seem that the sequence of projectors $(A_n)_{n\in\mathbb{N}}$ converges toward $\mathrm{id}_{\mathcal{H}}$ when $n \to +\infty$.

Nevertheless, since $S_n \subset S_{n'}$ $\forall n < n'$, we know by Theorem 6.35 that $A_{n'} - A_n$ is the projector onto $S_{n'} \cap S_n^\perp = \mathrm{span}(u_{n+1}, u_{n+2}, \ldots, u_{n'})$, thanks to the orthogonality of the vectors $(u_n)_{n\in\mathbb{N}}$. As we have seen, all orthogonal projectors onto non-trivial subspaces have unit norm, that is, $\|A_{n'} - A_n\| = 1$ $\forall n < n'$, and thus the sequence $(A_n)_{n\in\mathbb{N}}$ is not a Cauchy sequence in $\mathcal{B}(\mathcal{H})$ with respect to the operator norm; thus, it cannot be convergent, since every convergent sequence is a Cauchy sequence.

The sequence $(A_n x)_{n\in\mathbb{N}}$ in $\mathcal{H}$, however, converges to $x$ for all $x \in \mathcal{H}$, by the fact that $(u_n)_{n\in\mathbb{N}}$ is a Hilbert basis. This highlights the need to define an alternative form
of convergence in order to assign a precise meaning to the intuitive notion that the sequence $(A_n)_{n\in\mathbb{N}}$ converges to $\mathrm{id}_{\mathcal{H}}$. Similar examples are encountered in Banach and Hilbert spaces; for this reason, we have organized our presentation of alternative forms of convergence into separate sections for different spaces.

A3.1. Strong and weak convergence in Banach spaces

Let $(V, \|\ \|)$ be a Banach space. By definition, a sequence $(x_n)_{n\in\mathbb{N}} \subset V$ converges toward $x \in V$ if $\|x_n - x\| \xrightarrow[n\to+\infty]{} 0$. A different type of convergence can be defined in $V$ by using the continuous linear functionals of its dual space $V^*$.

DEFINITION A3.1 (Weak convergence in a Banach space).– Let $V$ be a Banach space. The sequence $(x_n)_{n\in\mathbb{N}} \subset V$ converges weakly toward $x \in V$ if, for all $\varphi \in V^*$:
$$\varphi(x_n) \xrightarrow[n\to+\infty]{} \varphi(x)$$
where the convergence in this case is that of sequences of scalars in $\mathbb{K}$. $x$ is the weak limit of the sequence $(x_n)_{n\in\mathbb{N}}$ and we write $x_n \xrightarrow[n\to+\infty]{w} x$, with $w$ for weak.

We note that, for all $\varphi \in V^*$ and $x \in V$, $|\varphi(x)| \le \|\varphi\|\,\|x\|$; thus, if $\|x_n - x\| \xrightarrow[n\to+\infty]{} 0$, then:
$$|\varphi(x_n) - \varphi(x)| = |\varphi(x_n - x)| \le \|\varphi\|\,\|x_n - x\| \xrightarrow[n\to+\infty]{} 0$$
that is, "standard" convergence implies weak convergence. For this reason, "standard" convergence in a Banach space is also referred to as strong convergence. Counter-examples show that the converse is not generally true. Thus, in a Banach space, the topology defined by weak convergence has fewer open sets than the topology defined by strong convergence.

A3.2. Strong and weak convergence in a Hilbert space

A Hilbert space $\mathcal{H}$ is also a Banach space, thus the definition of strong and weak convergence given above also applies to Hilbert spaces. Nevertheless, by the Riesz representation theorem, we know that the action of any continuous linear functional on $\mathcal{H}$ can be identified with a scalar product. For this reason, an equivalent definition, which is more explicit for the purposes of calculation, can be used for weak convergence in a Hilbert space.
DEFINITION A3.2 (weak convergence in a Hilbert space).– Let $\mathcal{H}$ be a Hilbert space. The sequence $(x_n)_{n\in\mathbb{N}} \subset \mathcal{H}$ converges weakly toward $x \in \mathcal{H}$ if, for all $y \in \mathcal{H}$:
$$\langle y, x_n\rangle \xrightarrow[n\to+\infty]{} \langle y, x\rangle$$
As in the case of Banach spaces, $x$ is said to be the weak limit of the sequence $(x_n)_{n\in\mathbb{N}}$ and we write $x_n \xrightarrow[n\to+\infty]{w} x$.

A very simple counter-example can be used to show that weak convergence does not generally imply strong convergence in a Hilbert space. Take any $y \in \mathcal{H}$ and $x_n = u_n$ $\forall n \in \mathbb{N}$, where $(u_n)_{n\in\mathbb{N}}$ is an arbitrary orthonormal system in $\mathcal{H}$. By Bessel's inequality, $\sum_{n\in\mathbb{N}} |\langle y, u_n\rangle|^2 \le \|y\|^2$, so this series of non-negative terms is convergent and thus its general term tends toward 0, that is, $|\langle y, u_n\rangle|^2 \xrightarrow[n\to+\infty]{} 0$; this holds if and only if $\langle y, u_n\rangle \xrightarrow[n\to+\infty]{} 0$, for all $y \in \mathcal{H}$.

Hence, any orthonormal system $(u_n)_{n\in\mathbb{N}}$ in a Hilbert space is weakly convergent toward 0. However, we know that the distance between any two distinct elements of an orthonormal system is $\sqrt{2}$: $\|u_n - u_m\| = \sqrt{2}$ $\forall n \ne m$, thus $(u_n)_{n\in\mathbb{N}}$ does not verify the Cauchy condition, and therefore it cannot be strongly convergent.

A3.3. Uniform, strong and weak convergence in the Banach algebra $\mathcal{B}(\mathcal{H})$

In the Banach algebra $(\mathcal{B}(\mathcal{H}), \|\ \|)$, where $\mathcal{H}$ is any Hilbert space and $\|\ \|$ is the operator norm, three different convergences can be defined for a sequence of operators $(A_n)_{n\in\mathbb{N}} \subset \mathcal{B}(\mathcal{H})$.

DEFINITION A3.3.– We shall use $u$, $s$ and $w$ to denote uniform, strong and weak. Let $(A_n)_{n\in\mathbb{N}} \subset \mathcal{B}(\mathcal{H})$ be a sequence of bounded linear operators on the Hilbert space $\mathcal{H}$, and take $A \in \mathcal{B}(\mathcal{H})$.
– Uniform convergence (standard convergence, in operator norm):
$$A_n \xrightarrow[n\to+\infty]{u} A \iff \|A_n - A\| \xrightarrow[n\to+\infty]{} 0$$
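– Strong convergence:
$$A_n \xrightarrow[n\to+\infty]{s} A \iff A_n x \xrightarrow[n\to+\infty]{} Ax \iff \|A_n x - Ax\|_{\mathcal{H}} \xrightarrow[n\to+\infty]{} 0 \quad \forall x \in \mathcal{H}$$
– Weak convergence:
$$A_n \xrightarrow[n\to+\infty]{w} A \iff \langle y, A_n x\rangle \xrightarrow[n\to+\infty]{} \langle y, Ax\rangle \quad \forall x, y \in \mathcal{H}$$

As we saw at the beginning of this appendix, for any Hilbert basis $(u_m)_{m\in\mathbb{N}}$, the sequence:
$$A_n : \mathcal{H} \longrightarrow \mathcal{H}, \qquad x \longmapsto A_n x = \sum_{m=0}^{n} \langle x, u_m\rangle u_m$$
does not converge uniformly to $\mathrm{id}_{\mathcal{H}}$. However, it converges strongly toward the identity operator, since, by the continuity of the norm, we have:
$$\lim_{n\to+\infty} \|A_n x - \mathrm{id}_{\mathcal{H}}(x)\|_{\mathcal{H}} = \Big\|\lim_{n\to+\infty} A_n x - x\Big\|_{\mathcal{H}} = \Big\|\sum_{m\in\mathbb{N}} \langle x, u_m\rangle u_m - x\Big\|_{\mathcal{H}} = 0$$
having used the fact that $\mathrm{id}_{\mathcal{H}}(x)$ is not dependent on $n$ and the generalized Fourier expansion on the Hilbert basis $(u_m)_{m\in\mathbb{N}}$.

It is possible to show that, in $\mathcal{B}(\mathcal{H})$, uniform convergence implies strong convergence, which itself implies weak convergence. On the other hand, as we see from the example shown above, strong convergence does not imply uniform convergence. Other counter-examples can be used to show that weak convergence in $\mathcal{B}(\mathcal{H})$ does not imply strong convergence.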
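The difference between strong and uniform convergence of the projectors $A_n$ can be observed numerically. The following sketch works under assumptions of our own: $\ell^2$ is truncated to `DIM` coordinates, the Hilbert basis is the canonical one, and $x$ is the vector with coordinates $1/(i+1)$. The helper names are hypothetical.

```python
import math

DIM = 2000  # assumption: finite truncation of l^2, used only for the sketch

def project(x, n):
    """A_n x: keep the coordinates 0..n of x on the canonical basis, zero the rest."""
    return [xi if i <= n else 0.0 for i, xi in enumerate(x)]

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def basis(k):
    """Canonical basis vector u_k of the truncated space."""
    return [1.0 if i == k else 0.0 for i in range(DIM)]

x = [1.0 / (i + 1) for i in range(DIM)]  # a fixed square-summable vector

for n in (10, 100, 1000):
    strong_residual = norm(sub(project(x, n), x))        # ||A_n x - x||
    n_prime = 2 * n
    u = basis(n + 1)                                     # unit vector orthogonal to S_n, inside S_n'
    gap = norm(sub(project(u, n_prime), project(u, n)))  # ||(A_n' - A_n) u||
    print(n, strong_residual, gap)
```

For the fixed vector $x$, the residual $\|A_n x - x\|$ shrinks as $n$ grows, while the unit vector $u_{n+1}$ always witnesses $\|A_{n'} - A_n\| \ge 1$: the convergence is strong but not uniform.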