LINEAR ALGEBRA CORE TOPICS FOR THE SECOND COURSE
Dragu Atanasiu University of Borås, Sweden
Piotr Mikusiński
University of Central Florida, USA
World Scientific — NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI • TOKYO
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Control Number: 2022057625
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
LINEAR ALGEBRA Core Topics for the Second Course Copyright © 2023 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 978-981-125-854-1 (hardcover) ISBN 978-981-125-855-8 (ebook for institutions) ISBN 978-981-125-856-5 (ebook for individuals)
For any available supplementary material, please visit https://www.worldscientific.com/worldscibooks/10.1142/12898#t=suppl
Printed in Singapore
Preface

This is a book for a second course in linear algebra. In order to facilitate a smooth transition to the world of rigorous proofs we mix, to a greater extent than most textbooks at this level, abstract theory with matrix calculations. We present numerous examples and proofs of particular cases of important results before the general versions are formulated and proved. We noticed in many years of teaching that a proof of a particular case which captures the main idea of the general theorem has a major impact on the depth of understanding of the general theory. Reading simpler and more manageable proofs is also more likely to encourage students to work with proofs. Students can try to prove another particular case or the general case using the knowledge gained from the particular case. For some theorems we give two or even three proofs. In this way we give students an opportunity to see important results from different angles and at the same time to see connections between different results presented in the book.

Students are assumed to be familiar with calculations with real matrices. For example, students should be able to calculate products of matrices and find the reduced row echelon form of a matrix. All this background material is presented in our book Linear Algebra: Core Topics for the First Course, but the present book does not assume that students are familiar with our presentation of that material. Any standard book on matrix linear algebra will provide sufficient preparation for the present volume. On the other hand, since most material of this book mirrors, at a higher, less computational, and more abstract level, the content of our book Linear Algebra: Core Topics for the First Course, students who find a result from this book too abstract would benefit from reading the same material from the first course. For example, in Core Topics for the First Course there are many concrete examples of Jordan forms and singular value decompositions. Getting familiar with those examples would make the theory presented in the second course more accessible.

The majority of results are presented under the assumption that the vector spaces are finite dimensional. Some examples of infinite dimensional spaces are given and a very brief discussion of infinite dimensional inner product spaces is included as Appendix D.

In Chapter 1 we introduce vector spaces and discuss some basic ideas including subspaces, linear independence, bases, dimension, and direct sums. We consider both real and complex vector spaces. For students with limited
experience with complex numbers we provide an appendix that presents complex numbers in an elementary and detailed manner.

In Chapter 2 we discuss linear transformations between vector spaces. The presented topics include projections, the Rank-Nullity Theorem, isomorphisms, dual spaces, matrix representation of linear transformations, and quotient spaces. In order to keep this chapter at a reasonable size we only prove the results that are used in later chapters and give more results as exercises.

In Chapter 3 we discuss inner product spaces, including orthogonal projections and self-adjoint, normal, unitary, orthogonal, and positive linear transformations. A careful presentation of spectral theorems and the singular value decomposition constitutes a substantial part of this chapter. Since we determine eigenvalues without using characteristic polynomials, determinants are used in Chapter 3 only in some examples and exercises.

In Chapter 4 we show how to obtain bases such that the matrix of a linear transformation becomes diagonal or block-diagonal. In order to construct such bases, we study factorization of characteristic polynomials. In order to give interesting examples we need to calculate characteristic polynomials using determinants of endomorphisms. These determinants are introduced at the beginning of Chapter 4 via multilinear algebra. It is possible to obtain diagonal and block-diagonal matrices without using determinants, as shown in [2], but in our opinion the discussion of determinants in connection with alternating multilinear forms is an interesting part of linear algebra. While the theory without determinants has a certain appeal, when it comes to determining diagonal and block-diagonal forms in concrete cases one is limited to very simple examples where calculating the eigenvalues is trivial, as can be seen in books where determinants are avoided. We believe that students should have all possible instruments to solve problems, and determinants are essential in determining characteristic polynomials. It has been our experience that presenting topics in a less theoretical way and showing more concrete calculations using determinants increases the understanding of the presented topics. Since every student taking a proof-based course in linear algebra has some knowledge of determinants, the presence of determinants in this book helps students make connections with more elementary courses. Moreover, in our book we do not calculate determinants as in matrix linear algebra or precalculus, but instead we emphasize the connection to multilinear algebra. Even if there are reasons to dislike determinants because calculating them is tedious and non-intuitive, that is not a reason to dismiss the elegance of alternating multilinear forms and eliminate them from a first course in linear algebra with proofs.

Appendices that provide short introductions to permutations, complex numbers, and polynomials are included at the end of the book. Proofs of all results presented in these appendices are included.

A complete solution manual is available upon request for all instructors who adopt this book as a course text. Please send your request to [email protected].
Contents

Preface

1 Vector Spaces
  1.1 Definitions and examples
  1.2 Subspaces
  1.3 Linearly independent vectors and bases
  1.4 Direct sums
  1.5 Dimension of a vector space
  1.6 Change of basis
  1.7 Exercises
    1.7.1 Definitions and examples
    1.7.2 Subspaces
    1.7.3 Linearly independent vectors and bases
    1.7.4 Direct sums
    1.7.5 Dimension of a vector space
    1.7.6 Change of basis

2 Linear Transformations
  2.1 Basic properties
    2.1.1 The kernel and range of a linear transformation
    2.1.2 Projections
    2.1.3 The Rank-Nullity Theorem
  2.2 Isomorphisms
  2.3 Linear transformations and matrices
    2.3.1 The matrix of a linear transformation
    2.3.2 The isomorphism between Mn×m(K) and L(V, W)
  2.4 Duality
    2.4.1 The dual space
    2.4.2 The bidual
  2.5 Quotient spaces
  2.6 Exercises
    2.6.1 Basic properties
    2.6.2 Isomorphisms
    2.6.3 Linear transformations and matrices
    2.6.4 Duality
    2.6.5 Quotient spaces

3 Inner Product Spaces
  3.1 Definitions and examples
  3.2 Orthogonal projections
    3.2.1 Orthogonal projections on lines
    3.2.2 Orthogonal projections on arbitrary subspaces
    3.2.3 Calculations and applications of orthogonal projections
    3.2.4 The annihilator and the orthogonal complement
    3.2.5 The Gram-Schmidt orthogonalization process and orthonormal bases
  3.3 The adjoint of a linear transformation
  3.4 Spectral theorems
    3.4.1 Spectral theorems for operators on complex inner product spaces
    3.4.2 Self-adjoint operators on real inner product spaces
    3.4.3 Unitary operators
    3.4.4 Orthogonal operators on real inner product spaces
    3.4.5 Positive operators
  3.5 Singular value decomposition
  3.6 Exercises
    3.6.1 Definitions and examples
    3.6.2 Orthogonal projections
    3.6.3 The adjoint of a linear transformation
    3.6.4 Spectral theorems
    3.6.5 Singular value decomposition

4 Reduction of Endomorphisms
  4.1 Eigenvalues and diagonalization
    4.1.1 Multilinear alternating forms and determinants
    4.1.2 Diagonalization
  4.2 Jordan canonical form
    4.2.1 Jordan canonical form when the characteristic polynomial has one root
    4.2.2 Uniqueness of the Jordan canonical form when the characteristic polynomial has one root
    4.2.3 Jordan canonical form when the characteristic polynomial has several roots
  4.3 The rational form
  4.4 Exercises
    4.4.1 Diagonalization
    4.4.2 Jordan canonical form
    4.4.3 Rational form

5 Appendices
  Appendix A Permutations
  Appendix B Complex numbers
  Appendix C Polynomials
  Appendix D Infinite dimensional inner product spaces

Bibliography

Index
Chapter 1
Vector Spaces

Introduction

If you are reading this book, you most likely worked with vectors in a number of courses and you have a fairly good intuitive understanding of what we mean by a vector. But can you give a formal definition of a vector? It turns out that the best we can do is to define a vector as an element of a vector space. It may seem a silly definition, but actually it represents what is quite common in more advanced mathematics. The idea is that, when we want to describe a certain class of objects, we don't want to describe the objects themselves, but rather what we can do with them and what the properties of those operations are. Describing the objects directly limits applications of the methods we develop to the instances we describe. On the other hand, if we formulate something about any object that has certain properties, then it will apply to examples that we may not even be aware of. We use this approach in our definition of vector spaces. We define a vector space as a collection of objects not of a certain kind, but rather objects on which certain operations with certain properties can be performed.
1.1 Definitions and examples
A vector space is a set with an algebraic structure. Elements of a vector space can be added and scaled. Addition in a vector space V is a function that assigns to any v, w ∈ V a unique element v + w ∈ V. An element v ∈ V is scaled if it is multiplied by a number, called a scalar in this context. The result of multiplying v ∈ V by a scalar c is denoted by cv. In this book scalars are either real numbers or complex numbers. If it is necessary to specify which case it is, we write a “real vector space” or a “complex vector space”. If a statement applies to both real vector spaces and complex vector spaces, we use the letter K instead of R or C. The formal definition of vector spaces given below is an example of such a situation. The set K is called a “scalar field”. The phrase “vector space over K”
is often used to indicate that K is the set of scalars for that vector space. As explained in the introduction, addition and scaling in the definition of a vector space below are not specific operations, but any operations that satisfy the listed conditions. In most applications we already know what we mean by addition and scaling. In order to use the tools of vector spaces we have to make sure that all conditions in the definition are satisfied. After the definition we discuss some examples where all conditions are satisfied as well as some examples where some conditions are not satisfied. Definition 1.1.1. By a vector space we mean a set V with addition that assigns a unique element v + w ∈ V to any v, w ∈ V and scalar multiplication that assigns a unique element cv ∈ V to any c ∈ K and any v ∈ V in such a way that all of the following conditions are satisfied: (a) For every v, w ∈ V we have v + w = w + v; (b) For every u, v, w ∈ V we have u + (v + w) = (u + w) + v; (c) There is an element 0 ∈ V such that for every v ∈ V we have 0 + v = v; (d) For every v ∈ V there is u ∈ V such that v + u = 0; (e) For every v ∈ V we have 1v = v; (f) For every v ∈ V and every c1 , c2 ∈ K we have (c1 c2 )v = c1 (c2 v); (g) For every v ∈ V and every c1 , c2 ∈ K we have (c1 +c2 )v = c1 v+c2 v; (h) For every v, w ∈ V and every c ∈ K we have c(v + w) = cv + cw. Now we present some examples of vector spaces. We do not verify that all conditions in the definition of a vector space are satisfied. While we do not expect that you will verify every condition in every example, you should convince yourself that they are satisfied. It is a good exercise to give formal proofs for some conditions in some examples, especially if they don’t seem obvious.
Example 1.1.2. The set of all real numbers R with the standard addition and multiplication is a real vector space. Note that in this example the vector space and the scalar field are the same set. The set of all complex numbers C is an example of a complex vector space as well as a real vector space.
Example 1.1.3. For every integer n ≥ 1 the set of all n × 1 matrices
$$\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$$
with a1 , . . . , an ∈ K, denoted by Kn , is a vector space with the operations of addition of vectors and multiplication of vectors by scalars defined by
$$\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix} \quad\text{and}\quad c\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} ca_1 \\ \vdots \\ ca_n \end{bmatrix}$$
for c ∈ K. Clearly, Cn is a vector space over C and Rn is a vector space over R. It is also possible to consider Cn as a vector space over R.
Example 1.1.4. Let V = {a}, a set with a single element a. With the operations defined as a + a = a and ca = a, for any c ∈ K, it is a vector space. Note that we must have a = 0, where 0 is the element whose existence is guaranteed by condition (c) in Definition 1.1.1. This is the smallest possible vector space. It is often called the trivial vector space.
Example 1.1.5. Let S be an arbitrary nonempty set. The set of all functions f : S → K with addition defined by (f + g)(s) = f (s) + g(s) for every s ∈ S and multiplication by scalars defined by (cf )(s) = cf (s) for every s ∈ S and c ∈ K is a vector space over K. We will denote this space by FK (S). Note that the constant function f (s) = 0 is the 0 of this vector space and the function g(s) = −f (s) is the element of FK (S) such that f + g = 0. This is a very important family of vector spaces. Several examples below are special cases of FK (S). If you verify that all conditions in the definition of a vector space are satisfied for FK (S), then there is no need to check them for those examples.
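To make the pointwise operations of FK (S) concrete, here is a minimal Python sketch; the helper names addf and scale are ours and serve only as an illustration of the definitions in Example 1.1.5, with K taken to be the real numbers.

```python
# Elements of F_K(S): functions from a set S into K (here K = the real numbers).
# Addition and scalar multiplication are defined pointwise, as in Example 1.1.5.

def addf(f, g):
    """Return the function s -> f(s) + g(s)."""
    return lambda s: f(s) + g(s)

def scale(c, f):
    """Return the function s -> c * f(s)."""
    return lambda s: c * f(s)

# The zero vector of F_K(S) is the constant function 0,
# and -f is obtained by scaling f with -1.
zero = lambda s: 0

f = lambda s: s ** 2       # an element of F_R(R)
g = lambda s: 3 * s + 1    # another element

h = addf(scale(2, f), g)   # the vector 2f + g
print(h(5))                # 2*25 + 3*5 + 1 = 66
```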
Example 1.1.6. We denote by Mm×n (K) the set of all m × n matrices with entries from K. If A and B are matrices from Mm×n (K) such that the (j, k) entry of the matrix A is ajk and the (j, k) entry of the matrix B is bjk and c is a number from K, then A + B is the matrix with the (j, k) entry equal to ajk + bjk and cA is the matrix with the (j, k) entry equal to cajk . It is easy to verify that with these operations Mm×n (K) is a vector space over K. Note that the set Mm×n (K) can be interpreted as the space FK (S) where S = {1, 2, . . . , m} × {1, 2, . . . , n} = {(j, k) : j = 1, 2, . . . , m, k = 1, 2, . . . , n}.
Example 1.1.7. The set of all infinite sequences (xn ) = (x1 , x2 , . . . ) of real numbers can be identified with FR ({1, 2, 3, . . . }) and thus it is a real vector space. Similarly, the set of all infinite sequences of complex numbers is a complex vector space. The operations defined in Example 1.1.5 can be described in a more intuitive way: (x1 , x2 , . . . ) + (y1 , y2 , . . . ) = (x1 + y1 , x2 + y2 , . . . ) and c(x1 , x2 , . . . ) = (cx1 , cx2 , . . . ). In some applications it is natural to consider the vector space FK (Z), where Z is the set of all integers, of all “two-sided” sequences (. . . , x−2 , x−1 , x0 , x1 , x2 , . . . ) of real or complex numbers.
Example 1.1.8. We will define a vector space over R whose elements are lines of R2 parallel to a given line. We recall that a line is a set of the form x + Ra = {x + ca : c ∈ R} where x, a ∈ R2 and a ≠ 0. Let a be a fixed nonzero vector in R2 . We define
$$\hat{x} = x + \mathbb{R}a.$$
In other words, $\hat{x}$ is the line through x parallel to a. First note that $\hat{0} = \mathbb{R}a$. Moreover,
$$\hat{x} = \hat{y} \ \text{ if and only if there is a real number } \alpha \text{ such that } y = x + \alpha a.$$
In other words,
$$\hat{x} = \hat{y} \ \text{ if and only if } \ y \in \hat{x}.$$
Indeed, if $\hat{x} = \hat{y}$, then
$$x + \mathbb{R}a = \hat{x} = \hat{y} = y + \mathbb{R}a,$$
and thus y = y + 0a = x + αa for some α ∈ R. On the other hand, if y = x + αa for some α ∈ R, then
$$\hat{y} = y + \mathbb{R}a = x + \alpha a + \mathbb{R}a = x + \mathbb{R}a = \hat{x},$$
because αa + Ra = Ra. Now we define
$$\hat{x} + \hat{y} = \widehat{x + y} \quad\text{and}\quad c\hat{x} = \widehat{cx}.$$
These operations are well defined because
$$(x + \alpha a) + (y + \beta a) = (x + y) + (\alpha + \beta)a \in \widehat{x + y} \quad\text{and}\quad c(x + \alpha a) = cx + (c\alpha)a \in \widehat{cx},$$
for arbitrary real numbers α, β and c. The set V = {$\hat{x}$ : x ∈ R2 } with the operations defined above is a real vector space.
Example 1.1.9. Let A = {(x, x) : x ∈ R} and B = {(x, −x) : x ∈ R}. Show that A ∪ B is not a real vector space. Solution. It suffices to note that, for example, the vector (2, 0) = (1, 1) + (1, −1) is not in A ∪ B. Note that both A and B are vector spaces.

Example 1.1.10. Let A = {(x, y) ∈ R2 : y ≥ 0}. Show that A is not a real vector space. Solution. It suffices to observe that, for example, the vector (1, 4) is in A, but the vector (−2, −8) = −2(1, 4) is not in A. Note that, if (x1 , y1 ), (x2 , y2 ) ∈ A, then (x1 + x2 , y1 + y2 ) = (x1 , y1 ) + (x2 , y2 ) is in A.
Example 1.1.11. Let A = {(x, y, z) ∈ R3 : x + y + z = 1}. Show that A is not a real vector space. Solution 1. It suffices to note that the vectors (1, 0, 0) and (0, 1, 0) are in A, but the vector (1, 1, 0) = (1, 0, 0) + (0, 1, 0) is not in A.
Solution 2. It suffices to note that the vector (1, 0, 0) is in A, but the vector (3, 0, 0) = 3(1, 0, 0) is not in A. Note that the set V = {(x, y, z) ∈ R3 : x + y + z = 0} is a real vector space.
Example 1.1.12. Let V1 and V2 be arbitrary vector spaces over K. The set of all pairs (v1 , v2 ) such that v1 ∈ V1 and v2 ∈ V2 is denoted by V1 × V2 and called the Cartesian product of spaces V1 and V2 . In symbols, V1 × V2 = {(v1 , v2 ) : v1 ∈ V1 , v2 ∈ V2 }. It is easy to verify that V1 × V2 becomes a vector space if we define (v1 , v2 ) + (w1 , w2 ) = (v1 + w1 , v2 + w2 ) and c(v1 , v2 ) = (cv1 , cv2 ). More generally, for arbitrary vector spaces V1 , . . . , Vn over K we define V1 × · · · × Vn = {(v1 , . . . , vn ) : v1 ∈ V1 , . . . , vn ∈ Vn }. The set V1 × · · · × Vn , called the Cartesian product of spaces V1 , . . . , Vn , is a vector space with the addition (v1 , . . . , vn ) + (w1 , . . . , wn ) = (v1 + w1 , . . . , vn + wn ) and scalar multiplication c(v1 , . . . , vn ) = (cv1 , . . . , cvn ). If V1 = · · · = Vn = V, then we write
$$V_1 \times \cdots \times V_n = \underbrace{V \times \cdots \times V}_{n \text{ times}} = V^n.$$
Example 1.1.13. Let V be a real vector space. Show that V × V is a complex vector space if we define addition as for the real vector space V × V and multiplication by complex numbers as follows: (a + bi)(x, y) = (ax − by, ay + bx), where a and b are real numbers and x and y are vectors from V. Solution. We will only verify that (c + di)((a + bi)(x, y)) = ((c + di)(a + bi))(x, y)
because the verification of the other axioms is similar and easier.
(c + di)((a + bi)(x, y)) = (c + di)(ax − by, ay + bx)
= (c(ax − by) − d(ay + bx), c(ay + bx) + d(ax − by))
= ((ac − bd)x − (ad + bc)y, (ac − bd)y + (ad + bc)x)
= ((ac − bd) + (ad + bc)i)(x, y)
= ((c + di)(a + bi))(x, y).
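As a quick sanity check of the multiplication rule above, the following Python sketch (our own illustration, with V = R2 so that the vectors x and y are plain tuples) compares (c + di)((a + bi)(x, y)) with ((c + di)(a + bi))(x, y) numerically.

```python
# Complex scalar multiplication on V x V, as in Example 1.1.13,
# illustrated for V = R^2 (so x and y are 2-dimensional tuples).

def cmul(a, b, x, y):
    """Return (a + bi)(x, y) = (a*x - b*y, a*y + b*x), componentwise."""
    ax_by = tuple(a * xi - b * yi for xi, yi in zip(x, y))
    ay_bx = tuple(a * yi + b * xi for xi, yi in zip(x, y))
    return ax_by, ay_bx

x, y = (1.0, 2.0), (3.0, -1.0)
a, b = 2.0, 5.0          # the scalar a + bi
c, d = -1.0, 4.0         # the scalar c + di

# Left side: (c + di) applied to (a + bi)(x, y).
u, v = cmul(a, b, x, y)
left = cmul(c, d, u, v)

# Right side: ((c + di)(a + bi))(x, y), using (c+di)(a+bi) = (ca-db) + (cb+da)i.
right = cmul(c * a - d * b, c * b + d * a, x, y)

print(left == right)     # True: condition (f) of Definition 1.1.1 holds here
```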
Elements of a vector space are called vectors. The use of the word “vector” does not imply that we are talking about the familiar vectors in Rn that we picture as arrows. In the above examples we considered vectors that were functions, matrices, sequences, or even sets (Example 1.1.8). The expressions “v is an element of a vector space V” and “v is a vector in a vector space V” are completely equivalent. We close this section with a theorem that collects some simple but useful properties of addition and scaling in vector spaces. Theorem 1.1.14. Let V be a vector space. Then (a) The element 0 such that 0 + v = v for every v ∈ V is unique; (b) 0v = 0 for every v ∈ V; (c) If v + u = 0, then u = (−1)v; (d) c0 = 0 for every c ∈ K; (e) If u + w = v + w, then u = v. Proof. To prove (a) assume that there are 01 , 02 ∈ V such that 01 + v = v and 02 + v = v for every v ∈ V. Then we have 01 = 02 + 01 = 01 + 02 = 02 . For every v ∈ V we have 0v + v = 0v + 1v = (0 + 1)v = 1v = v. Now let u ∈ V be such that v + u = 0. Then 0v + v = v implies 0 = v + u = (0v + v) + u = 0v + (v + u) = 0v + 0 = 0v. Thus, 0v = 0 for every v ∈ V, proving (b). To prove (c) we first observe that for every v ∈ V we have v + (−1)v = 1v + (−1)v = (1 − 1)v = 0v = 0.
It remains to show that (−1)v is the only element with that property. Indeed, if u is any element such that v + u = 0, then u = u + 0 = u + (v + (−1)v) = (u + v) + (−1)v = 0 + (−1)v = (−1)v. For any c ∈ K we have c0 = c(v + (−1)v) = cv + (−c)v = (c − c)v = 0v = 0, where v is an arbitrary element in V, proving (d). Finally, if u + w = v + w, then u = u + 0 = u + (w + (−1)w) = (u + w) + (−1)w = (v + w) + (−1)w = v + (w + (−1)w) = v + 0 = v, proving (e). The element (−1)v is denoted simply by −v. With this notation we have v − v = −v + v = 0. Note that −(−v) = v.
1.2 Subspaces
The concept of a subspace of a vector space is one of the fundamental ideas of linear algebra. We begin the discussion of subspaces by considering an example. Example 1.2.1. Let V be a vector space and let x be a vector in V. The set of all vectors of the form cx where c ∈ K is a vector space with the addition and the multiplication by scalars inherited from the vector space V, that is, c1 x + c2 x = (c1 + c2 )x
and c1 (c2 x) = (c1 c2 )x.
This vector space is denoted by Kx or Span{x}: Kx = Span{x} = {cx : c ∈ K} . Without being too precise, one can say that the vector space Span{x} is a “small vector space in a large vector space”. It is important that every element of the small space is an element of the large space and that the operations of addition and scaling in the small space are the same as in the large space. These ideas are captured in the following precise definition.
Definition 1.2.2. A nonempty subset U of a vector space V is called a subspace of V if the following two conditions are satisfied: (a) If v ∈ U and c ∈ K, then cv ∈ U; (b) If v, w ∈ U, then v + w ∈ U.
The set Span{x} defined in Example 1.2.1 is a subspace of the vector space V. Note that, if we take x = 0, then Span{x} = {0}. Since 0 is an element in every vector space, {0} is a subspace of every vector space. From the definition of subspaces it follows that every vector space is a subspace of itself. To exclude these special cases, we add the word “proper”: U is a proper subspace of V if U is a subspace of V such that U ≠ V and U ≠ {0}. Now we present several examples of subspaces that often appear in applications.
Example 1.2.3. A polynomial is a function p : K → K defined by p(t) = a0 + a1 t + · · · + an tn where a0 , a1 , . . . , an ∈ K. If an ≠ 0, then n is called the degree of the polynomial p(t) = a0 + a1 t + · · · + an tn (see Appendix C). The set of all polynomials, denoted by P(K), is a subspace of FK (K). The set Pn (K) of all polynomials of the form a0 + a1 t + · · · + an tn is a subspace of P(K). If k ≤ n, then Pk (K) is a subspace of Pn (K).
Example 1.2.4. For every integer n ≥ 1 we denote by DRn (R) the set of all real-valued n-times differentiable functions defined on R. We also denote by DR (R) the set of all real-valued functions defined on R which are n-times differentiable for every integer n ≥ 1. Show that DRn (R) is a subspace of FR (R) for every integer n ≥ 1 and that DR (R) is a subspace of DRn (R) for every integer n ≥ 1. Solution. This is an immediate consequence of the fact that the sum of differentiable functions is differentiable and that a constant multiple of a differentiable function is differentiable.
Example 1.2.5. Show that the set Sn×n of all n × n symmetric matrices (matrices such that AT = A) is a subspace of the vector space of the square matrices Mn×n (K).
Solution. If A and B are symmetric matrices and c ∈ K, then we have (A + B)T = AT + B T = A + B and (cA)T = cAT = cA.
Example 1.2.6. Show that the set An×n (K) of all antisymmetric matrices (matrices such that AT = −A) is a subspace of the vector space of square matrices Mn×n (K). Solution. If A and B are antisymmetric matrices and c ∈ K, then we have (A + B)T = AT + B T = −A − B = −(A + B) and (cA)T = cAT = c(−A) = −cA.
Example 1.2.7. Let A ∈ Mm×n (K). Show that the set C(A) = {Ax : x ∈ Kn } is a subspace of the vector space Km . Solution. If y1 = Ax1 , y2 = Ax2 , and c ∈ K, then y1 + y2 = A(x1 + x2 ) and cy1 = cAx1 = A(cx1 ).
Example 1.2.8. Let A ∈ Mm×n (K). Show that the set N(A) = {x ∈ Kn : Ax = 0} is a subspace of the vector space Kn . Determine
$$N\left(\begin{bmatrix} 2+i & 1-i & 1 \\ i & 2+i & i \\ 2+2i & 3 & 1+i \end{bmatrix}\right).$$
Solution. If Ax1 = Ax2 = Ax = 0 and c ∈ K, then A(x1 + x2 ) = Ax1 + Ax2 = 0 and A(cx) = c(Ax) = 0, which proves that N(A) is a subspace of Kn . Since the reduced row echelon form of the matrix
$$\begin{bmatrix} 2+i & 1-i & 1 \\ i & 2+i & i \\ 2+2i & 3 & 1+i \end{bmatrix}$$
is
$$\begin{bmatrix} 1 & 0 & \tfrac{2}{13} - \tfrac{3}{13}i \\ 0 & 1 & \tfrac{1}{13} + \tfrac{5}{13}i \\ 0 & 0 & 0 \end{bmatrix},$$
the solution of the equation
$$\begin{bmatrix} 2+i & 1-i & 1 \\ i & 2+i & i \\ 2+2i & 3 & 1+i \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
is
$$\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = c\begin{bmatrix} -\tfrac{2}{13} + \tfrac{3}{13}i \\ -\tfrac{1}{13} - \tfrac{5}{13}i \\ 1 \end{bmatrix},$$
where c is an arbitrary complex number. This means that
$$N\left(\begin{bmatrix} 2+i & 1-i & 1 \\ i & 2+i & i \\ 2+2i & 3 & 1+i \end{bmatrix}\right) = \operatorname{Span}\left\{\begin{bmatrix} -\tfrac{2}{13} + \tfrac{3}{13}i \\ -\tfrac{1}{13} - \tfrac{5}{13}i \\ 1 \end{bmatrix}\right\}.$$
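The row reduction in Example 1.2.8 can be reproduced with a computer algebra system; the following Python (sympy) sketch is only a verification aid and not part of the text.

```python
# Verify the null space computed in Example 1.2.8 using sympy's exact
# complex arithmetic.
from sympy import I, Matrix

A = Matrix([[2 + I, 1 - I, 1],
            [I,     2 + I, I],
            [2 + 2*I, 3,   1 + I]])

print(A.rref()[0])      # reduced row echelon form of A
for v in A.nullspace(): # a basis of N(A)
    print(v.applyfunc(lambda z: z.simplify()))
```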
Example 1.2.9. Let a, b, and c be distinct numbers from K and let U be the set of polynomials from Pn (K) such that p(a) = p(b) = p(c), that is, U = {p ∈ Pn (K) : p(a) = p(b) = p(c)} . Show that U is a subspace of Pn (K). Solution. If p1 , p2 ∈ U, then p1 (a) = p1 (b) = p1 (c) and p2 (a) = p2 (b) = p2 (c),
and consequently p1 (a) + p2 (a) = p1 (b) + p2 (b) = p1 (c) + p2 (c), which means that p1 + p2 ∈ U. Similarly, if p ∈ U and k ∈ K, then p(a) = p(b) = p(c) and consequently kp(a) = kp(b) = kp(c), which means that kp ∈ U. Note that in our argument we are not using the fact that elements of U are polynomials. A similar argument can be used to show that, if V is any subspace of FK (S) and s1 , s2 , . . . , sn ∈ S, then U = {f ∈ V : f (s1 ) = f (s2 ) = · · · = f (sn )} is a subspace of V. Now we are going to consider some general properties of subspaces. We begin with the following simple observation that will help us show that every subspace of a vector space is a vector space itself. Lemma 1.2.10. If U is a subspace of a vector space V, then (a) If v ∈ U, then −v ∈ U; (b) 0 ∈ U. Proof. To show that (a) holds we note that for any v ∈ V we have −v = (−1)v, so if v ∈ U, then −v = (−1)v ∈ U. To show that (b) holds we take an arbitrary u ∈ U and note that 0 = u − u = u + (−1)u ∈ U.
Theorem 1.2.11. A subspace U of a vector space V is a vector space itself.
Proof. First we note that, by the definition of a subspace, for every v, w ∈ U the element v + w is in U and for every v ∈ U and every number c ∈ K the element cv is in U. Moreover, by Lemma 1.2.10, 0 ∈ U and −v ∈ U for every v ∈ U. Finally, the eight conditions in the definition of a vector space are satisfied for elements of U because they are satisfied for all elements of V.
Definition 1.2.12. Let V1 , . . . , Vn be subspaces of a vector space V. The set of all possible sums of the form v1 + · · · + vn , where v1 ∈ V1 , . . . , vn ∈ Vn , is denoted by V1 + · · · + Vn , that is, V1 + · · · + Vn = {v1 + · · · + vn : v1 ∈ V1 , . . . , vn ∈ Vn }. This operation of “addition of subspaces” has properties similar to ordinary addition. Theorem 1.2.13. Let U, V, and W be subspaces of a vector space X . Then U +V =V +U and (U + V) + W = V + (U + W).
Proof. The properties follow immediately from the fact that u+v =v+u
and u + (v + w) = (u + w) + v
for any u, v, w ∈ X . While there are some similarities, there are also some differences. For example, in general U + W = V + W does not imply U = V. Theorem 1.2.14. If V1 , . . . , Vn are subspaces of a vector space V, then the set V1 + · · · + Vn is a subspace of V.
Proof. If v ∈ V1 + · · · + Vn , then v = v1 + · · · + vn , for some vj ∈ Vj . For any number c ∈ K we have cv = c(v1 + · · · + vn ) = cv1 + · · · + cvn , which shows that cv ∈ V1 + · · · + Vn . If v, w ∈ V1 + · · · + Vn , then v = v1 + · · · + vn
and w = w1 + · · · + wn ,
for some vectors vj , wj ∈ Vj . Since v + w = v1 + · · · + vn + w1 + · · · + wn = v1 + w1 + · · · + vn + wn , we have v + w ∈ V1 + · · · + Vn .
Definition 1.2.15. Let v1 , . . . , vn be elements of a vector space V. Any vector of V of the form x1 v1 + · · · + xn vn , where x1 , . . . , xn ∈ K, is called a linear combination of v1 , . . . , vn . The set of all linear combinations of the vectors v1 , . . . , vn is denoted by Span{v1 , . . . , vn }, that is, Span{v1 , . . . , vn } = {x1 v1 + · · · + xn vn : x1 , . . . , xn ∈ K}. The set Span{v1 , . . . , vn } is called the linear span (or simply span) of vectors v1 , . . . , vn . Linear spans play an important role in defining subspaces in vector spaces.
Theorem 1.2.16. If v1 , . . . , vn are elements of a vector space V, then the set Span{v1 , . . . , vn } is a subspace of V.
Proof. Since Span{v1 , . . . , vn } = Kv1 + · · · + Kvn , the result is a consequence of Theorem 1.2.14.
If U = Span{v1 , . . . , vn }, we say that {v1 , . . . , vn } is a spanning set for U. Note that Span{v1 , . . . , vn } is the smallest vector space containing vectors v1 , . . . , vn .
Example 1.2.17. Pn (K) = Span{1, t, t2 , . . . , tn }.
Example 1.2.18. The set of all real-valued solutions of the differential equation y ′ − 3y = 0 is a subspace of DR (R). This subspace is Span{e3t }.
Example 1.2.19. The set of all real-valued solutions of the differential equation y ′′ + y = 0 is a subspace of the vector space DR (R). This subspace is Span{cos t, sin t}. The choice of a spanning set is not unique. For example, we have Span{cos t, sin t} = Span{cos t + sin t, cos t − sin t}. Indeed, if f ∈ Span{cos t, sin t}, then for some a, b ∈ K we have
$$f(t) = a\cos t + b\sin t = \frac{a+b}{2}(\cos t + \sin t) + \frac{a-b}{2}(\cos t - \sin t),$$
so Span{cos t, sin t} ⊆ Span{cos t + sin t, cos t − sin t}. Similarly, if g ∈ Span{cos t + sin t, cos t − sin t}, then for some c, d ∈ K we have g(t) = c(cos t + sin t) + d(cos t − sin t) = (c + d) cos t + (c − d) sin t, so Span{cos t + sin t, cos t − sin t} ⊆ Span{cos t, sin t}. In the remainder of this section we investigate what changes to the spanning set leave the spanned subspace unchanged. Theorem 1.2.20. Let v1 , . . . , vn and v be elements of a vector space V. If v ∈ Span{v1 , . . . , vn }, then Span{v1 , . . . , vn , v} = Span{v1 , . . . , vn }. Proof. Clearly, Span{v1 , . . . , vn } ⊆ Span{v1 , . . . , vn , v}. Now consider a w ∈ Span{v1 , . . . , vn , v}. Then there are numbers a1 , . . . , an , an+1 ∈ K such that w = a1 v1 + · · · + an vn + an+1 v.
Since v ∈ Span{v1 , . . . , vn }, there are numbers b1 , . . . , bn ∈ K such that v = b1 v1 + · · · + bn vn . Then w = a1 v1 + · · · + an vn + an+1 v
= a1 v1 + · · · + an vn + an+1 (b1 v1 + · · · + bn vn ) = (a1 + an+1 b1 )v1 + · · · + (an + an+1 bn )vn .
Thus w ∈ Span{v1 , . . . , vn }. The next theorem gives us a condition that can be use to check whether two sets of vectors span the same subspace. Theorem 1.2.21. Let u1 , . . . , uk and v1 , . . . , vm be elements of a vector space V. Then Span{u1 , . . . , uk } = Span{v1 , . . . , vm } if and only if u1 , . . . , uk ∈ Span{v1 , . . . , vm } and v1 , . . . , vm ∈ Span{u1 , . . . , uk }. Proof. If Span{u1 , . . . , uk } = Span{v1 , . . . , vm }, then clearly we must have u1 , . . . , uk ∈ Span{v1 , . . . , vm } and v1 , . . . , vm ∈ Span{u1 , . . . , uk }. On the other hand, if u1 , . . . , uk ∈ Span{v1 , . . . , vm } and v1 , . . . , vm ∈ Span{u1 , . . . , uk }, then Span{u1 , . . . , uk , v1 , . . . , vm } = Span{u1 , . . . , uk } and Span{u1 , . . . , uk , v1 , . . . , vm } = Span{v1 , . . . , vm }, by Theorem 1.2.20. Hence Span{u1 , . . . , uk } = Span{v1 , . . . , vm }. In the next three corollaries we show that “elementary operations” on the vectors v1 , . . . , vn do not affect the vector space Span{v1 , . . . , vn }. Corollary 1.2.22. For any 1 ≤ i < j ≤ n we have Span{v1 , . . . , vj , . . . , vi , . . . , vn } = Span{v1 , . . . , vi , . . . , vj , . . . , vn }. Proof. This is a direct consequence of Theorem 1.2.21.
While the above corollary says that interchanging the position of two vectors in {v1 , . . . , vn } does not affect the span, it follows that writing these vectors in any order will have no effect on the span. More precisely, Span{v1 , . . . , vn } = Span{vσ(1) , . . . , vσ(n) } where σ is any permutation on {1, . . . , n}.
Corollary 1.2.23. For any j ∈ {1, . . . , n} and any scalar c ≠ 0 we have Span{v1 , . . . , cvj , . . . , vn } = Span{v1 , . . . , vj , . . . , vn }.
Proof. This is a direct consequence of Theorem 1.2.21 and the equality vj = (1/c)(cvj ).
Corollary 1.2.24. For any i, j ∈ {1, . . . , n} and any scalar c we have Span{v1 , . . . , vi + cvj , . . . , vn } = Span{v1 , . . . , vi . . . , vn }.
Proof. This is a direct consequence of Theorem 1.2.21 and the equality vi = (vi + cvj ) − cvj . The operations on the spanning set that do not affect the subspace, described in this section, either do not change or increase the number of spanning vectors. Clearly, if we can describe a subspace using fewer vectors it would make sense to use that smaller spanning set. How can we tell whether it is possible to find a smaller spanning set? This question will be addressed in the next section.
1.3 Linearly independent vectors and bases
In this section we introduce two concepts that are of basic importance in every aspect of linear algebra. In the first definition we describe a property that will provide an answer to the question asked at the end of last section, that is, a property of a set of vectors v1 , . . . , vn that implies that there is a smaller set of vectors spanning the same vector space.
Definition 1.3.1. Vectors v1 , . . . , vn in a vector space V are called linearly dependent if the equation x1 v1 + · · · + xn vn = 0 has a nontrivial solution, that is, a solution such that at least one of the numbers x1 , . . . , xn is different from 0.
Example 1.3.2. In the complex vector space C3 the vectors
$$\begin{bmatrix} i \\ -i \\ 0 \end{bmatrix}, \quad \begin{bmatrix} i \\ i \\ 2i \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$
are linearly dependent because
$$-\frac{i}{2}\begin{bmatrix} i \\ -i \\ 0 \end{bmatrix} + \frac{i}{2}\begin{bmatrix} i \\ i \\ 2i \end{bmatrix} + 1\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
Linear dependence of vectors does not depend on the vector space. More precisely, if vectors v1 , . . . , vn are linearly dependent in the space Span{v1 , . . . , vn }, the smallest vector space containing vectors v1 , . . . , vn , then they are linearly dependent in every vector space that contains Span{v1 , . . . , vn } as a subspace. On the other hand, linear dependence can depend on the scalar field K. For example, the vectors 1 and i are linearly dependent in the complex vector space C, because i · 1 − i = 0, but they are not linearly dependent in the real vector space C.
Theorem 1.3.3. If one of the vectors v1 , . . . , vn in a vector space V is 0, then the vectors v1 , . . . , vn are linearly dependent.
Proof. If vk = 0 for some k ∈ {1, . . . , n}, then we can take xk = 1 and xm = 0 for m ≠ k. With this choice we have x1 v1 + · · · + xn vn = 0, which shows that the vectors are linearly dependent. The next theorem gives a practical criterion for linear dependence of a set of vectors.
Theorem 1.3.4. A set of n vectors, with n ≥ 2, is linearly dependent if and only if at least one of the vectors can be expressed as a linear combination of the remaining vectors. In other words, vectors v1 , . . . , vn , with n ≥ 2, are linearly dependent if and only if there is a k ∈ {1, . . . , n} such that vk ∈ Span{v1 , . . . , vk−1 , vk+1 , . . . , vn }.
Proof. If the vectors v1 , . . . , vn are linearly dependent, then x1 v1 + · · · + xn vn = 0 where x1 , . . . , xn ∈ K and xk ≠ 0 for some k ∈ {1, . . . , n}. Then
$$\frac{x_1}{x_k}v_1 + \cdots + 1v_k + \cdots + \frac{x_n}{x_k}v_n = 0$$
and thus
$$v_k = -\frac{x_1}{x_k}v_1 - \cdots - \frac{x_{k-1}}{x_k}v_{k-1} - \frac{x_{k+1}}{x_k}v_{k+1} - \cdots - \frac{x_n}{x_k}v_n.$$
But this means that vk ∈ Span{v1 , . . . , vk−1 , vk+1 , . . . , vn }.
Conversely, if vk ∈ Span{v1 , . . . , vk−1 , vk+1 , . . . , vn } for some k ∈ {1, . . . , n}, then there are a1 , . . . , ak−1 , ak+1 , . . . , an ∈ K such that vk = a1 v1 + · · · + ak−1 vk−1 + ak+1 vk+1 + · · · + an vn and thus x1 v1 + · · · + xn vn = 0, where xm = am for m ∈ {1, . . . , k − 1, k + 1, . . . , n} and xk = −1, which means that the vectors v1 , . . . , vn are linearly dependent.
Definition 1.3.5. Vectors v1 , . . . , vn in a vector space V are called linearly independent if the only solution of the equation x1 v1 + · · · + xn vn = 0 is the trivial solution, that is x1 = · · · = xn = 0. In other words, vectors are linearly independent if they are not linearly dependent.
Example 1.3.6. Show that the polynomials 1, t, t2 , . . . , tn are linearly independent in Pn (K). Solution. Assume x0 + x1 t + · · · + xn tn = 0 holds for some numbers x0 , x1 , . . . , xn ∈ K. It follows from Appendix C, Theorem 5.15, that x0 = · · · = xn = 0. Consequently the polynomials 1, t, t2 , . . . , tn are linearly independent.
The result from the next example gives a characterization of linear independence of the columns of a matrix and is usually proved for matrices with real entries in an introductory Matrix Linear Algebra course. This result will be used later in this chapter.
Example 1.3.7. Let A ∈ Mn×n (K) with columns a1 , . . . , an ∈ Kn , that is, A = [a1 . . . an ]. Show that the following conditions are equivalent:
(a) The vectors a1 , . . . , an are linearly independent;
(b) The matrix A is invertible;
(c) The equation $A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = 0$ has only the trivial solution, that is, the solution x1 = · · · = xn = 0.
Solution. The equivalence of (a) and (c) follows easily from the definition of linearly independent vectors. It is also easy to show that (b) implies (c). We only show that (c) implies (b). Let Ejk be the matrix with the (j, k) entry being 1 and all other entries being 0 and let α ∈ K. For j ≠ k, we denote Rjk (α) = In + αEjk and Sj (α) = In + (α − 1)Ejj , where In is the identity n × n matrix. Note that these matrices have the following properties:
(i) Rjk (α)A is the matrix obtained from A by adding the kth row multiplied by α to the jth row;
(ii) Sj (α)A is the matrix obtained from A by multiplying the jth row by α;
(iii) Rjk (α) is invertible and Rjk (α)−1 = Rjk (−α);
(iv) If α ≠ 0, then Sj (α) is invertible and Sj (α)−1 = Sj (α−1 ).
Now we assume that the equation $A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = 0$ has only the trivial solution. Suppose that
$$A = \begin{bmatrix}
1 & 0 & \ldots & 0 & a_{1p} & \ldots & a_{1n} \\
0 & 1 & \ldots & 0 & a_{2p} & \ldots & a_{2n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \ldots & 1 & a_{p-1,p} & \ldots & a_{p-1,n} \\
0 & 0 & \ldots & 0 & a_{pp} & \ldots & a_{pn} \\
0 & 0 & \ldots & 0 & a_{p+1,p} & \ldots & a_{p+1,n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \ldots & 0 & a_{n-1,p} & \ldots & a_{n-1,n} \\
0 & 0 & \ldots & 0 & a_{np} & \ldots & a_{nn}
\end{bmatrix},$$
where p ∈ {1, . . . , n}. If $a_{pp} = a_{p+1,p} = \cdots = a_{np} = 0$, then
$$\begin{bmatrix} -a_{1p} \\ -a_{2p} \\ \vdots \\ -a_{p-1,p} \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
is a nontrivial solution, which contradicts our assumption.
Suppose that $a_{pp} \neq 0$. We multiply A by $S_p(a_{pp}^{-1})$, to get 1 as the (p, p) entry, and then by $R_{rp}(-a_{rp})$ for every r ≠ p. The result is a matrix of the form
$$A' = \begin{bmatrix}
1 & 0 & \ldots & 0 & a'_{1,p+1} & \ldots & a'_{1n} \\
0 & 1 & \ldots & 0 & a'_{2,p+1} & \ldots & a'_{2n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \ldots & 1 & a'_{p,p+1} & \ldots & a'_{pn} \\
0 & 0 & \ldots & 0 & a'_{p+1,p+1} & \ldots & a'_{p+1,n} \\
0 & 0 & \ldots & 0 & a'_{p+2,p+1} & \ldots & a'_{p+2,n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \ldots & 0 & a'_{n-1,p+1} & \ldots & a'_{n-1,n} \\
0 & 0 & \ldots & 0 & a'_{n,p+1} & \ldots & a'_{nn}
\end{bmatrix}.$$
Note that, since every matrix of the form Sj (α), with α ≠ 0, or Rjk (α) is invertible, a solution of the equation $A'\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = 0$ is also a solution of the equation $A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = 0$, and consequently the equation $A'\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = 0$ has only the trivial solution.
If $a_{pp} = 0$ but $a_{qp} \neq 0$ for some q > p, we first multiply A by Rpq (1) to get $a_{qp}$ as the (p, p) entry, and then we continue as in the case $a_{pp} \neq 0$. Using induction on p we can show that there are matrices E1 , . . . , Em such that E1 . . . Em A = In , where each matrix Ek is of the form Sj (α), with α ≠ 0, or Rjk (α). This gives us
$$A = E_m^{-1} \cdots E_1^{-1},$$
proving the result, because a product of invertible matrices is invertible.
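The equivalence established in Example 1.3.7 is easy to test for a concrete matrix. The following Python (sympy) sketch is our own illustration; it checks, for one sample matrix, that independent columns, invertibility and a trivial null space agree.

```python
# Numerical illustration of Example 1.3.7: for a square matrix A,
# (a) independent columns, (b) invertibility and (c) a trivial null space
# are equivalent.
from sympy import Matrix

A = Matrix([[1, 2, 0],
            [0, 1, 3],
            [2, 0, 1]])

independent_columns = A.rank() == A.cols     # (a)
invertible = A.det() != 0                    # (b)
trivial_nullspace = len(A.nullspace()) == 0  # (c)

print(independent_columns, invertible, trivial_nullspace)  # True True True
```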
Example 1.3.8. Show that the functions 1, cos t, cos 2t are linearly independent in FK (R). Solution. Assume a + b cos t + c cos 2t = 0
for some a, b, c ∈ K. Substituting t = 0, π/2, π gives us the system of equations a + b + c = 0, a − c = 0, a − b + c = 0.
Since the only solution of this system is a = b = c = 0, the functions 1, cos t, cos 2t are linearly independent. From Theorem 1.3.3 we get the following obvious corollary. Corollary 1.3.9. Linearly independent vectors are nonzero.
Another easy but useful fact about linearly independent vectors is that if we remove some vectors from a linearly independent set of vectors, then the new smaller set is still linearly independent. Theorem 1.3.10. If v1 , . . . , vn are linearly independent vectors in a vector space V, then for any distinct indices i1 , . . . , im ∈ {1, . . . , n} the vectors vi1 , . . . , vim are linearly independent.
Proof. Since xi1 vi1 + · · · + xim vim = x1 v1 + · · · + xn vn where xj = 0 for every j not in the set {i1 , . . . , im }, linear independence of the vectors vi1 , . . . , vim is an immediate consequence of linear independence of the vectors v1 , . . . , vn . The converse of the statement in Theorem 1.3.10 is not true, in general. More precisely, if every proper subset of {v1 , . . . , vk } is linearly independent, it is not necessarily true that the vectors v1 , . . . , vk are linearly independent. For example, the vectors (1, 0), (0, 1), (1, 1) are linearly dependent even though any two of them are linearly independent. If v ∈ Span{v1 , . . . , vk }, then v can be written as a linear combination of vectors v1 , . . . , vk , but this representation is not necessarily unique. For example, (1, −1) ∈ Span{(1, 0), (0, 1), (1, 1)} and we have (1, −1) = (1, 0) − (0, 1) and (1, −1) = (1, 1) − 2(0, 1). Uniqueness of representation of a vector in terms of the spanning set is a desirable property that is essential in many arguments in linear algebra and in
applications. It turns out that linear independence of the spanning set guarantees uniqueness of the representation. This is one of the main reasons why linear independence is so important in linear algebra. Theorem 1.3.11. If v1 , . . . , vn are linearly independent vectors in a vector space V and c1 v1 + · · · + cn vn = d1 v1 + · · · + dn vn ,
(1.1)
for some c1 , . . . , cn , d1 , . . . , dn ∈ K, then c1 = d1 , c2 = d2 , . . . , cn = dn .
Proof. From (1.1) we get (c1 − d1 )v1 + · · · + (cn − dn )vn = 0, and thus c1 − d1 = · · · = cn − dn = 0, by linear independence of the vectors v1 , . . . , vn . The converse of the above theorem is also true: If for every v ∈ Span{v1 , . . . , vn } there are unique numbers c1 , . . . , cn ∈ K such that v = c1 v1 + · · · + cn vn , then the vectors v1 , . . . , vn are linearly independent. Indeed, if c1 v1 + · · · + cn vn = 0, then since we also have 0v1 + · · · + 0vn = 0, we must have c1 = · · · = cn = 0, by the uniqueness. Definition 1.3.12. A collection of vectors {v1 , . . . , vn } in a vector space V is called a basis of V if the following two conditions are satisfied: (a) The vectors v1 , . . . , vn are linearly independent; (b) V = Span{v1 , . . . , vn }. If {v1 , . . . , vn } is a basis of a vector space V, then every vector in V has a unique representation in the form v = c1 v1 + · · · + cn vn , by Theorem 1.3.11. Note that Theorem 1.3.4 implies that a basis in a vector space V is minimal in the following sense: If you remove even one vector from a basis in V, it will no longer span V.
Example 1.3.13. Show that the set {1, t, t2 , . . . , tn } is a basis in Pn (K). Solution. This result is a consequence of the Examples 1.2.17 and 1.3.6.
Example 1.3.14. The set of all real solutions of the differential equation y ′′′ − y ′ = 0 is a real vector space. The set of functions {1, et , e−t } is a basis of this vector space.
Example 1.3.15. The set of matrices
$$\left\{\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right\}$$
is a basis in the vector space of all matrices of the form $\begin{bmatrix} 0 & a \\ b & c \end{bmatrix}$, where a, b and c are arbitrary numbers from K.
Example 1.3.16. Let v1 , . . . , vn be arbitrary vectors in a vector space V. If v1 ≠ 0, then we say that v1 is a pivot. For k ≥ 2, vk is called a pivot if vk ∉ Span{v1 , . . . , vk−1 }. Show that the set of all pivots in {v1 , . . . , vn } is a basis of Span{v1 , . . . , vn }.
Solution. First we show by induction on k that Span{v1 , . . . , vk } = Span{vj1 , . . . , vjm }, where vj1 , . . . , vjm are the pivots such that 1 ≤ j1 < · · · < jm ≤ k. For k = 1 the result is trivial. Assume now that Span{v1 , . . . , vk } = Span{vj1 , . . . , vjm }, where vj1 , . . . , vjm are the pivots such that 1 ≤ j1 < · · · < jm ≤ k. If vk+1 is a pivot, then Span{v1 , . . . , vk , vk+1 } = Span{vj1 , . . . , vjm , vk+1 }.
If we let jm+1 = k + 1, then the vectors vj1 , . . . , vjm , vjm+1 are the pivots such that 1 ≤ j1 < · · · < jm < jm+1 ≤ k + 1. If vk+1 is not a pivot, then vk+1 ∈ Span{v1 , . . . , vk } = Span{vj1 , . . . , vjm }. Consequently, Span{v1 , . . . , vk , vk+1 } = Span{vj1 , . . . , vjm } and the vectors vj1 , . . . , vjm are the pivots such that 1 ≤ j1 < · · · < jm ≤ k + 1.
Now we show by induction on k that, if vj1 , . . . , vjm are the pivots such that 1 ≤ j1 < · · · < jm ≤ k, then the vectors vj1 , . . . , vjm are linearly independent. The statement is trivially true for k = 1. Now assume that it is true for some k ≥ 1 and that vj1 , . . . , vjm are the pivots such that 1 ≤ j1 < · · · < jm ≤ k + 1. Let x1 vj1 + · · · + xm vjm = 0 for some x1 , . . . , xm ∈ K. If xm ≠ 0, then vjm ∈ Span{vj1 , . . . , vjm−1 }, contradicting the assumption that vjm is a pivot. Consequently, xm = 0 and, since 1 ≤ j1 < · · · < jm−1 ≤ k, we must have x1 = · · · = xm−1 = 0, by our inductive assumption. This shows that the vectors vj1 , . . . , vjm are linearly independent.
We have shown that the set of the pivots in {v1 , . . . , vk } is a basis of Span{v1 , . . . , vk }, for any k ≤ n. If we take k = n, we get the desired result.
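The procedure behind Example 1.3.16 — scan the vectors in order and keep each one that does not already lie in the span of the kept vectors — is easy to implement. The sketch below is our own illustration for column vectors in Kn ; the helper name pivots is ours, and membership in a span is tested with a rank computation.

```python
# Greedy extraction of pivots in the sense of Example 1.3.16:
# v_k is kept exactly when it does not lie in the span of the vectors kept so far.
from sympy import Matrix

def pivots(vectors):
    """Return the pivots of the list; they form a basis of Span{v1, ..., vn}."""
    kept = []
    rank = 0
    for v in vectors:
        new_rank = Matrix.hstack(*kept, v).rank() if kept else v.rank()
        if new_rank > rank:      # v is not in Span(kept), so it is a pivot
            kept.append(v)
            rank = new_rank
    return kept

vs = [Matrix([1, 0, 1]), Matrix([2, 0, 2]), Matrix([0, 1, 1]), Matrix([1, 1, 2])]
basis = pivots(vs)
print(len(basis))   # 2: the first and the third vector are the pivots
```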
Example 1.3.17. Let Ejk be the m × n matrix such that the (j, k) entry is 1 and the all other entries are 0. The set of matrices {Ejk , 1 ≤ j ≤ m, 1 ≤ k ≤ n} is a basis of the vector space Mm×n (K).
Example 1.3.18. The standard Gaussian elimination method can be applied to a matrix with complex entries in the same way as in the case of a matrix with real entries. For any A ∈ Mm×n (K) the set of pivot columns is a basis of C(A), that is, the vector subspace of Km spanned by the columns of A.
Definition 1.3.19. Let {v1 , . . . , vn } be a basis of a vector space V and let x ∈ V. The unique numbers c1 , . . . , cn such that x = c1 v1 +· · ·+cn vn are called the coordinates of x in the basis {v1 , . . . , vn }.
Example 1.3.20. Let a be an arbitrary number in K. Show that {1, t − a, . . . , (t − a)n } is a basis of Pn (K) and determine the coordinates of an arbitrary polynomial in this basis.
Solution. We first show that the polynomials 1, t − a, . . . , (t − a)n are linearly independent. Suppose that we have x0 + x1 (t − a) + · · · + xn (t − a)n = 0, where x0 , . . . , xn are some numbers from K. If we let t = a we get x0 = 0. Next we differentiate the above equality and get x1 + 2x2 (t − a) + · · · + nxn (t − a)n−1 = 0. Again we let t = a and get x1 = 0. We continue in this way to show that x2 = · · · = xn = 0. Consequently the polynomials 1, t − a, . . . , (t − a)n are linearly independent.
To finish the proof we will show that for any polynomial p ∈ Pn (K) we have
$$p(t) = p(a) + p'(a)(t-a) + \cdots + \frac{1}{n!}p^{(n)}(a)(t-a)^n. \qquad (1.2)$$
Let p ∈ Pn (K) be such that p(t) = a0 + a1 t + · · · + an tn , for some a0 , a1 , . . . , an ∈ K. Since we can write p(t) = a0 + a1 (t − a + a) + · · · + an (t − a + a)n , it is easy to verify, using the binomial expansion, that there are b0 , b1 , . . . , bn ∈ K such that p(t) = b0 + b1 (t − a) + · · · + bn (t − a)n . Thus Span{1, t − a, . . . , (t − a)n } = Pn (K). Since p(a) = b0 and, for k = 1, 2, . . . , n, p(k) (a) = bk k!, we have bk = (1/k!)p(k) (a). Hence
$$p(t) = p(a) + p'(a)(t-a) + \cdots + \frac{1}{n!}p^{(n)}(a)(t-a)^n.$$
Note that in this example we use the derivative from Appendix C.
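Formula (1.2) says that the coordinates of p in the basis {1, t − a, . . . , (t − a)n } are the Taylor coefficients p(k)(a)/k!. The following Python (sympy) sketch is our own illustration with one sample polynomial; it computes these coordinates and checks that they reproduce p.

```python
# Coordinates of a polynomial in the basis {1, t - a, ..., (t - a)^n},
# computed as Taylor coefficients p^(k)(a) / k!, as in Example 1.3.20.
from sympy import symbols, diff, factorial, expand

t = symbols('t')
a = 2                         # the number a from the example
p = 3*t**3 - t + 5            # a sample polynomial in P_3(K)
n = 3

coords = [diff(p, t, k).subs(t, a) / factorial(k) for k in range(n + 1)]
print(coords)                 # [27, 35, 18, 3]

# Reassemble p from its coordinates to confirm the expansion (1.2).
q = sum(c * (t - a)**k for c, k in zip(coords, range(n + 1)))
print(expand(q - p))          # 0
```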
Example 1.3.21. Let a be an arbitrary number in K and let U = {p ∈ Pn (K) : p(a) = p′ (a) = p′′ (a) = 0}. (a) Show that U is a vector subspace of Pn (K). (b) Find a basis of U. (c) Determine the coordinates of an arbitrary polynomial from U in that basis.
Solution. We are going to use the result from Example 1.3.20. If p ∈ Pn (K) and p(a) = p′ (a) = p′′ (a) = 0, then
$$p(t) = \frac{1}{3!}p'''(a)(t-a)^3 + \cdots + \frac{1}{n!}p^{(n)}(a)(t-a)^n, \qquad (1.3)$$
by (1.2). Consequently, U = Span{(t − a)3 , . . . , (t − a)n } and, since the functions (t − a)3 , . . . , (t − a)n are linearly independent, the set {(t − a)3 , . . . , (t − a)n } is a basis of U. Finally, the coordinates of an arbitrary polynomial from U in that basis are already in (1.3).
The following theorem can often be used to solve problems involving linear independence of vectors using methods from matrix algebra.
Theorem 1.3.22. Let {v1 , . . . , vn } be a basis of a vector space V and let wk = a1k v1 + · · · + ank vn for some ajk ∈ K, 1 ≤ j, k ≤ n. Then the vectors w1 , . . . , wn are linearly independent if and only if the columns of the matrix
$$\begin{bmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{bmatrix}$$
are linearly independent.
Proof. First note that we can write
$$x_1w_1 + \cdots + x_nw_n = [a_{11} \ \ldots \ a_{1n}]\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} v_1 + \cdots + [a_{n1} \ \ldots \ a_{nn}]\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} v_n.$$
Now, because the vectors v1 , . . . , vn are linearly independent, x1 w1 + · · · + xn wn = 0 is equivalent to
$$[a_{11} \ \ldots \ a_{1n}]\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \cdots = [a_{n1} \ \ldots \ a_{nn}]\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = 0,$$
which can be written as
$$\begin{bmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$$
or, equivalently,
$$x_1\begin{bmatrix} a_{11} \\ \vdots \\ a_{n1} \end{bmatrix} + \cdots + x_n\begin{bmatrix} a_{1n} \\ \vdots \\ a_{nn} \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}.$$
Therefore the vectors w1 , . . . , wn are linearly independent if and only if the vectors
$$\begin{bmatrix} a_{11} \\ \vdots \\ a_{n1} \end{bmatrix}, \ldots, \begin{bmatrix} a_{1n} \\ \vdots \\ a_{nn} \end{bmatrix}$$
are linearly independent, proving the theorem.
Example 1.3.23. Let v1 , v2 , v3 be arbitrary vectors in a vector space. Show that the vectors 2v1 + 3v2 + v3 , 5v1 + 2v2 + 8v3 , 2v1 + 7v2 − 3v3 are linearly dependent and express one of them as a linear combination of the remaining two.
Solution. Because the reduced row echelon form of the matrix
$$\begin{bmatrix} 2 & 5 & 2 \\ 3 & 2 & 7 \\ 1 & 8 & -3 \end{bmatrix}$$
is
$$\begin{bmatrix} 1 & 0 & \tfrac{31}{11} \\ 0 & 1 & -\tfrac{8}{11} \\ 0 & 0 & 0 \end{bmatrix},$$
the vectors are linearly dependent and we have
$$2v_1 + 7v_2 - 3v_3 = \frac{31}{11}(2v_1 + 3v_2 + v_3) - \frac{8}{11}(5v_1 + 2v_2 + 8v_3).$$
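The row reduction in Example 1.3.23 can be reproduced with exact arithmetic; the following Python (sympy) sketch is our own check and reads the dependence coefficients 31/11 and −8/11 off the last column of the reduced matrix.

```python
# Reproduce the row reduction from Example 1.3.23 with exact arithmetic.
from sympy import Matrix, Rational

A = Matrix([[2, 5, 2],
            [3, 2, 7],
            [1, 8, -3]])        # columns: coordinates of the three combinations

R, pivot_cols = A.rref()
print(R)                        # [[1, 0, 31/11], [0, 1, -8/11], [0, 0, 0]]
print(pivot_cols)               # (0, 1): the first two columns are pivots

# The last column of R expresses the third vector in terms of the first two.
assert R[0, 2] == Rational(31, 11) and R[1, 2] == Rational(-8, 11)
```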
Example 1.3.24. Show that the polynomials t(t − 1)2 , t3 + 2t, (t + 1)3 , (t + 1)2 are linearly independent in P3 (K).
Solution. If we use {1, t, t2 , t3 } as a basis in P3 (K), then the matrix in Theorem 1.3.22 is
$$\begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 2 & 3 & 2 \\ -2 & 0 & 3 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}.$$
Since the reduced row echelon form of the above matrix is
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$
the polynomials t(t − 1)2 , t3 + 2t, (t + 1)3 , (t + 1)2 are linearly independent.
Example 1.3.25. Find a basis of the vector space
$$V = \operatorname{Span}\left\{\begin{bmatrix} 1-i & 2 \\ i & 1+i \end{bmatrix}, \begin{bmatrix} -3+5i & -2 \\ 6-3i & -3-5i \end{bmatrix}, \begin{bmatrix} i & 2 \\ 3 & -i \end{bmatrix}\right\}.$$
Solution. We can solve this problem by finding a basis of the vector space
$$\operatorname{Span}\left\{\begin{bmatrix} 1-i \\ 2 \\ i \\ 1+i \end{bmatrix}, \begin{bmatrix} -3+5i \\ -2 \\ 6-3i \\ -3-5i \end{bmatrix}, \begin{bmatrix} i \\ 2 \\ 3 \\ -i \end{bmatrix}\right\}.$$
The reduced row echelon form of the matrix
$$\begin{bmatrix} 1-i & -3+5i & i \\ 2 & -2 & 2 \\ i & 6-3i & 3 \\ 1+i & -3-5i & -i \end{bmatrix}$$
is
$$\begin{bmatrix} 1 & 0 & \tfrac{3}{2} \\ 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$
and thus the set
$$\left\{\begin{bmatrix} 1-i & 2 \\ i & 1+i \end{bmatrix}, \begin{bmatrix} -3+5i & -2 \\ 6-3i & -3-5i \end{bmatrix}\right\} \qquad (1.4)$$
is linearly independent. Since
$$\begin{bmatrix} i & 2 \\ 3 & -i \end{bmatrix} = \frac{3}{2}\begin{bmatrix} 1-i & 2 \\ i & 1+i \end{bmatrix} + \frac{1}{2}\begin{bmatrix} -3+5i & -2 \\ 6-3i & -3-5i \end{bmatrix},$$
we have
$$V = \operatorname{Span}\left\{\begin{bmatrix} 1-i & 2 \\ i & 1+i \end{bmatrix}, \begin{bmatrix} -3+5i & -2 \\ 6-3i & -3-5i \end{bmatrix}\right\},$$
so the set (1.4) is a basis of V.
Example 1.3.26. Let a, b, and c be distinct numbers from K. Find a basis in the space U = {p ∈ Pn (K) : p(a) = p(b) = p(c)}, where n ≥ 3.
Solution. Let κ = p(a) = p(b) = p(c) and q(t) = (t − a)(t − b)(t − c). If p ∈ U, then p(t) = s(t)q(t) + κ, where s(t) is 0 or a polynomial of degree at most n − 3. In other words, U = {sq + κ : s ∈ Pn−3 (K), κ ∈ K}. Consequently, {1, q(t), tq(t), . . . , tn−3 q(t)} is a basis of U.
1.4 Direct sums
In Section 1.2 we introduced sums of subspaces. Now we are going to refine that idea by introducing direct sums which play a much more important role in linear algebra. Definition 1.4.1. Let V1 , . . . , Vn be subspaces of a vector space V. The sum V1 + · · · + Vn is called a direct sum if every v ∈ V1 + · · ·+ Vn can be written in a unique way as a sum v1 + · · · + vn with vj ∈ Vj for every j ∈ {1, . . . , n}. To indicate that the sum V1 + · · · + Vn is direct we write V1 ⊕ · · · ⊕ Vn . The condition of uniqueness of the representation v1 + · · · + vn with vj ∈ Vj for every j ∈ {1, . . . , n} is similar to a condition that characterizes linear independence of vectors. This is not a coincidence. As we will soon see, direct sums and linear independence are closely related. The following theorem gives a simple condition that characterizes direct sums. It should not be a surprise that it is similar to the condition that characterizes linear independence of vectors. Theorem 1.4.2. Let V1 , . . . , Vn be subspaces of a vector space V. The sum V1 + · · · + Vn is direct if and only if the equality v1 + · · · + vn = 0 with vj ∈ Vj for every j ∈ {1, . . . , n}, implies v1 = · · · = vn = 0. Proof. Assume that the sum V1 + · · · + Vn is direct and that v1 + · · · + vn = 0 with vj ∈ Vj for every j ∈ {1, . . . , n}. Since we also have 0 + ··· + 0 = 0 and 0 ∈ Vj for every j ∈ {1, . . . , n}, we must have v1 = · · · = vn = 0 by the uniqueness requirement in the definition of direct sums. Now assume that for any vectors vj ∈ Vj for j ∈ {1, . . . , n}, the equality v1 + · · · + vn = 0 implies v1 = · · · = vn = 0. We need to show the uniqueness property. If u1 + · · · + un = w 1 + · · · + w n
with uj , wj ∈ Vj for every j ∈ {1, . . . , n}, then u1 − w1 + · · · + un − wn = 0.
Since uj − wj ∈ Vj for all j ∈ {1, . . . , n}, we can conclude that uj = wj for all j ∈ {1, . . . , n}. Note that, if a sum V1 + · · · + Vn is a direct sum and k1 , k2 , . . . , km ∈ {1, 2, . . . , n} are distinct indices, then the sum Vk1 + · · · + Vkm is also direct. Example 1.4.3. Suppose that U, U ′ , V, and V ′ are subspaces of a vector space W. If U = U ′ ⊕ (U ∩ V) and V = V ′ ⊕ (U ∩ V), show that U + V = U ′ ⊕ (U ∩ V) ⊕ V ′ . Solution. Because it is easy to see that U + V = U ′ + (U ∩ V) + V ′ we only have to show that the sum is direct. Suppose that u + w + v = 0, where u ∈ U ′ , w ∈ U ∩ V, and v ∈ V ′ . From v = −u − w ∈ U ′ ⊕ (U ∩ V) = U we get v ∈ V ′ ∩ U ⊆ V ∩ U, which means that v ∈ V ′ ∩ (U ∩ V) = {0}. Since v = 0, we have u + w = 0 and thus u = w = 0.
Theorem 1.4.4. Suppose V and W are subspaces of a vector space X . The sum V + W is direct if and only if V ∩ W = {0}. Proof. If the sum V + W is direct and v = w, with v ∈ V and w ∈ W, then the equality 0 = v − w implies v = w = 0. Now, if V ∩ W = {0} and v + w = 0 with v ∈ V and w ∈ W, then v = −w. Hence v = w = 0 because −w ∈ W.
Example 1.4.5. Show that the set U = {p ∈ Pn (K) : t3 + 1 divides p}, where n ≥ 3, is a subspace of Pn (K) and that we have U ⊕ P2 (K) = Pn (K). Solution. It is easy to show, as a consequence of the definitions, that U is a subspace of the vector space Pn (K). If q ∈ Pn (K), then there are polynomials r and s such that q = s(t3 + 1) + r, where r = 0 or deg r ≤ 2. Since s(t3 + 1) ∈ U, we have shown that U + P2 (K) = Pn (K). Clearly the sum U + P2 (K) is direct. The property in Theorem 1.4.4 does not extend to three or more subspaces. It is possible that U ∩ V ∩ W = {0}, but the sum U + V + W is not direct. We leave showing this as an exercise. The next theorem makes the following statement precise: The direct sum of direct sums is direct. Theorem 1.4.6. Let n1 , n2 , . . . , nm be arbitrary positive integers and let Vj,k be subspaces of a vector space V for 1 ≤ k ≤ m and 1 ≤ j ≤ nk . If the sum Uk = V1,k + V2,k + · · · + Vnk ,k is direct for every 1 ≤ k ≤ m and the sum U1 + · · · + Um is direct, then the sum V1,1 + · · · + Vn1 ,1 + · · · + V1,m + · · · + Vnm ,m is direct and we have V1,1 ⊕ · · · ⊕ Vn1 ,1 ⊕ · · · ⊕ V1,m ⊕ · · · ⊕ Vnm ,m = U1 ⊕ · · · ⊕ Um . Proof. Suppose v1,1 + · · · + vn1 ,1 + · · · + v1,m + · · · + vnm ,m = 0.
Since v1,1 + · · · + vn1 ,1 + · · · + v1,m + · · · + vnm ,m
= (v1,1 + · · · + vn1 ,1 ) + · · · + (v1,m + · · · + vnm ,m ),
we have v1,k + · · · + vnk ,k = 0 for every 1 ≤ k ≤ m, because the sum U1 + · · · + Um is direct, and thus v1,1 = · · · = vn1 ,1 = · · · = v1,m = · · · = vnm ,m = 0, because each sum V1,k + · · · + Vnk ,k is direct. This shows that the sum V1,1 + · · · + Vn1 ,1 + · · · + V1,m + · · · + Vnm ,m is direct. The last equality in the theorem is an immediate consequence of Theorem 1.2.13. Now we formulate and prove a theorem that connects direct sums with linear independence.

Theorem 1.4.7. If v1 , . . . , vn are linearly independent vectors in a vector space V, then the sum Kv1 + · · · + Kvn is direct.
Proof. Assume the vectors v1 , . . . , vn are linearly independent. If u1 + · · · + un = 0 with uj ∈ Kvj for j ∈ {1, . . . , n}, then uj = xj vj for some x1 , . . . , xn ∈ K. Then x1 v1 + · · · + xn vn = 0 and, since the vectors v1 , . . . , vn are linearly independent, x1 = · · · = xn = 0. But this means that u1 = · · · = un = 0, proving that the sum Kv1 + · · · + Kvn is direct. The conditions in the above theorems are not equivalent. We need an additional assumption to get the implication in the other direction. Theorem 1.4.8. If v1 , . . . , vn are nonzero vectors in a vector space V and the sum Kv1 + · · · + Kvn is direct, then the vectors v1 , . . . , vn are linearly independent.
Proof. Assume the vectors v1 , . . . , vn are nonzero and the sum Kv1 + · · · + Kvn is direct. If x1 v1 + · · · + xn vn = 0, then we get x1 v1 = · · · = xn vn = 0. Since the vectors v1 , . . . , vn are nonzero, we must have x1 = · · · = xn = 0, proving that the vectors v1 , . . . , vn are linearly independent.
Example 1.4.9. Show that the sum Span{t2 + 1, t2 + 3} + Span{t3 + t + 2, t3 + 3t2 } is a direct sum of subspaces of P3 (K).

Solution. It suffices to check that the functions t2 + 1, t2 + 3, t3 + t + 2, t3 + 3t2 are linearly independent and then use Theorems 1.4.6 and 1.4.7.
The next theorem describes another desirable property of direct sums. In general, if {u1 , . . . , uk } is a basis of U and {v1 , . . . , vm } is a basis of V, then {u1 , . . . , uk , v1 , . . . , vm } need not be a basis of U + V. It turns out that we don’t have this problem if the sum is direct. Theorem 1.4.10. Let V1 , . . . , Vn be subspaces of a vector space V such that the sum V1 + · · · + Vn is direct. If {v1,j , . . . , vkj ,j } is a basis of Vj for each 1 ≤ j ≤ n, then {v1,1 , . . . , vk1 ,1 , . . . , v1,n , . . . , vkn ,n } is a basis of V1 ⊕ · · · ⊕ Vn .
Proof. According to Theorem 1.4.6 we have V1 ⊕ · · · ⊕ Vn = (Kv1,1 ⊕ · · · ⊕ Kvk1 ,1 ) ⊕ · · · ⊕ (Kv1,n ⊕ · · · ⊕ Kvkn ,n ) = Kv1,1 ⊕ · · · ⊕ Kvk1 ,1 ⊕ · · · ⊕ Kv1,n ⊕ · · · ⊕ Kvkn ,n
and thus V1 ⊕ · · · ⊕ Vn = Span{v1,1 , . . . , vk1 ,1 , . . . , v1,n , . . . , vkn ,n }. Now the result follows by Theorem 1.4.8.
Example 1.4.11. Show that the sum Span{1, cos t, cos 2t} + Span{sin t, sin 2t}
(1.5)
is a direct sum of subspaces of FK (R) and that {1, cos t, cos 2t, sin t, sin 2t} is a basis of Span{1, cos t, cos 2t} ⊕ Span{sin t, sin 2t}. Solution. If a0 , a1 , a2 , b1 , b2 ∈ K are such that a0 + a1 cos t + a2 cos 2t = b1 sin t + b2 sin 2t, then also a0 + a1 cos(−t) + a2 cos(−2t) = b1 sin(−t) + b2 sin(−2t), which simplifies to a0 + a1 cos t + a2 cos 2t = −(b1 sin t + b2 sin 2t). Consequently, a0 + a1 cos t + a2 cos 2t = 0 and b1 sin t + b2 sin 2t = 0. We have shown in Example 1.3.8 that the functions 1, cos t, cos 2t are linearly independent. It can be shown, using the same approach, that the functions sin t, sin 2t are linearly independent. Linear independence of the functions 1, cos t, cos 2t gives us a0 = a1 = a2 = 0 and linear independence of the functions sin t, sin 2t gives us b1 = b2 = 0. Hence, by Theorem 1.4.4, the sum (1.5) is direct and the set {1, cos t, cos 2t, sin t, sin 2t} is its basis, by Theorem 1.4.10.
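Independence checks like the ones used here can also be made concrete numerically. The sketch below (NumPy is an assumption of this illustration, not the book's method) samples the five functions at a few points; since a nontrivial linear relation among the functions would force every such sample matrix to be rank deficient, a full-rank sample already proves linear independence.

```python
import numpy as np

ts = np.array([0.1, 0.4, 0.9, 1.7, 2.3])            # arbitrary sample points
F = np.column_stack([np.ones_like(ts),
                     np.cos(ts), np.cos(2 * ts),
                     np.sin(ts), np.sin(2 * ts)])
print(np.linalg.matrix_rank(F))                      # 5: 1, cos t, cos 2t, sin t, sin 2t are independent
```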
Example 1.4.12. Show that the sum Sn×n (K) + An×n (K) is direct and we have Sn×n (K) ⊕ An×n (K) = Mn×n (K), where
Sn×n (K) = {A ∈ Mn×n (K) : AT = A}
and
An×n (K) = {A ∈ Mn×n (K) : AT = −A}.
Solution. If A ∈ Sn×n (K) ∩ An×n (K), then A = AT = −AT and thus A = −A. Consequently, the entries of the matrix A are all 0. This shows that the sum Sn×n (K) + An×n (K) is direct, by Theorem 1.4.4. Now we observe that any matrix A ∈ Mn×n (K) can be written in the form
$$A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T).$$
Since A + AT ∈ Sn×n (K) and A − AT ∈ An×n (K), we have A ∈ Sn×n (K) ⊕ An×n (K).
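A quick numerical sanity check of this decomposition (NumPy is an assumption of this sketch and is not used in the text):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
S = (A + A.T) / 2            # symmetric part
N = (A - A.T) / 2            # antisymmetric part
assert np.allclose(S, S.T)
assert np.allclose(N, -N.T)
assert np.allclose(S + N, A) # A is recovered from the two parts
```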
So far in this section we were concerned with checking whether a given sum of subspaces was direct. Now we are going to investigate a different problem. We would like to be able to decompose a given vector space into subspaces that give us the original vector space as the direct sum of those subspaces. In practice, we often want those subspaces to have some special properties. We begin by proving a simple lemma that gives us the basic ingredient of the construction.

Lemma 1.4.13. Let U be a subspace of a vector space V and let v ∈ V. If v ∉ U, then the sum U + Kv is direct.

Proof. Suppose u + αv = 0 for some u ∈ U and α ∈ K. If α ≠ 0, then v = −(1/α)u ∈ U, which is not possible, because of our assumption. Consequently, α = 0 and thus also u = 0.
Theorem 1.4.14. Let U be a subspace of a vector space V. If V = Span{v1 , . . . , vn }, then there are linearly independent vectors vi1 , . . . , vim ∈ {v1 , . . . , vn } such that V = U ⊕ Kvi1 ⊕ · · · ⊕ Kvim .

Proof. If U = V, then we have nothing to prove. Now, if U ≠ V, then there is a vector vi1 ∈ {v1 , . . . , vn } such that vi1 ∉ U. According to Lemma 1.4.13 the sum U + Kvi1 is direct. If U ⊕ Kvi1 = V, then we are done. If U ⊕ Kvi1 ≠ V, then we continue using mathematical induction. Suppose that we have proven that the sum U + Kvi1 + · · · + Kvik
is direct. If U ⊕ Kvi1 ⊕ · · · ⊕ Kvik = V, then we are done. If U ⊕ Kvi1 ⊕ · · · ⊕ Kvik 6= V, then there is a vector vik+1 ∈ {v1 , . . . , vn } which is not in U + Kvi1 + · · · + Kvik . But then the sum U + Kvi1 + · · · + Kvik + Kvik+1 is direct by Lemma 1.4.13 and Theorem 1.4.6. Since the set {v1 , . . . , vn } has a finite number of elements, we will reach a point when U ⊕ Kvi1 ⊕ · · · ⊕ Kvik = V. Corollary 1.4.15. Let U be a subspace of a vector space V. If V = Span{v1 , . . . , vn }, then there is a subspace W of the vector space V such that V = U ⊕ W. Proof. W = Kvi1 ⊕· · ·⊕Kvim , where vi1 , . . . , vim ∈ {v1 , . . . , vn } are the vectors in Theorem 1.4.14. A subspace W of a vector space V such that V = U ⊕ W will be called a complement of U in V. It is important to remember that a complement is not unique. For example, both W = {(t, t) : t ∈ R} and W = {(0, t) : t ∈ R} are complements of U = {(t, 0) : t ∈ R} in the vector space V = R2 = {(s, t) : s, t ∈ R}. In Chapter 3 we introduce a different notion of complements, namely orthogonal complements, and prove that such complements are unique. Corollary 1.4.16. Let V be a vector space. If V = Span{v1 , . . . , vn } and the vectors w1 , . . . wk ∈ V are linearly independent, then there are linearly independent vectors vi1 , . . . , vim ∈ {v1 , . . . , vn } such that the set {w1 , . . . wk , vi1 , . . . , vim } is a basis of V. Proof. We apply Theorem 1.4.14 to the subspace W = Kw1 ⊕ · · · ⊕ Kwk and we get linearly independent vectors vi1 , . . . , vim such that V = W ⊕ Kvi1 ⊕ · · · ⊕ Kvim = Kw1 ⊕ · · · ⊕ Kwk ⊕ Kvi1 ⊕ · · · ⊕ Kvim . This means that the set {w1 , . . . , wk , vi1 , . . . , vim } is a basis of V. Corollary 1.4.17. Let V be a vector space. If V = Span{v1 , . . . , vn }, then there are linearly independent vectors vi1 , . . . , vim ∈ {v1 , . . . , vn } such that the set {vi1 , . . . , vim } is a basis of V.
Proof. We apply Theorem 1.4.14 to the subspace W = {0}. Note that in Example 1.3.16 we give an alternative proof of the above fact.
Theorem 1.4.18. Let U, V, and W be subspaces of a vector space X such that V ⊕ U = W ⊕ U. If V has a basis with m vectors, then W has a basis with m vectors. Proof. If {v1 , . . . , vm } is a basis of V, then V = Kv1 ⊕ · · · ⊕ Kvm . Since Kv1 ⊕ · · · ⊕ Kvm ⊕ U = W ⊕ U, for every 1 ≤ j ≤ m, we have vj = wj + uj , for some wj ∈ W and uj ∈ U. We will show that {w1 , . . . , wm } is a basis of W. Indeed, if x1 w1 + · · · + xm wm = 0, then x1 v1 + · · · + xm vm = x1 u1 + · · · + xm um and thus x1 v1 +· · ·+xm vm = 0, because uj ∈ U and Span{v1 , . . . , vm }∩ U = 0. Consequently, x1 = · · · = xm = 0, proving linear independence of vectors w1 , . . . , wm . Now, if w is an arbitrary vector in W, then there are numbers x1 , . . . , xm ∈ K and a vector u ∈ U such that w = x1 v1 + · · · + xm vm + u = x1 w1 + · · · + xm wm + x1 u1 + · · · + xm um + u. Since W ∩ U = {0}, we must have w = x1 w1 + · · · + xm wm , proving that W = Span{w1 , . . . , wm }.
Example 1.4.19. Let V1 and V2 be arbitrary vector spaces over K and let W = V1 × V2 . Then W1 = V1 × {0} and W2 = {0} × V2 are subspaces of W. It is easy to see that W = W1 ⊕ W2 . More generally, if V1 , . . . , Vn are arbitrary vector spaces over K and W =
V1 × · · · × Vn , then the spaces
W1 = V1 × {0} × · · · × {0},
. . .
Wj = {0} × · · · × {0} × Vj × {0} × · · · × {0},
. . .
Wn = {0} × · · · × {0} × Vn
are subspaces of W and we have W = W1 ⊕ · · · ⊕ Wn . Note that we cannot write W = V1 ⊕ · · · ⊕ Vn because V1 , . . . , Vn are not subspaces of W.
1.5 Dimension of a vector space
Dimensions of R2 and R3 have a clear and intuitive meaning for us. The notion of the dimension of an abstract vector space is less intuitive. Before we can define the dimension of a vector space, we first need to establish some additional properties of bases in vector spaces. To motivate an approach to the proof of the first important theorem in this section (Theorem 1.5.2) we consider a special case.
Example 1.5.1. If a vector space V = Span{v1 , v2 , v3 } contains three linearly independent vectors w1 , w2 , w3 , then the vectors v1 , v2 , v3 are linearly independent and Span{v1 , v2 , v3 } = Span{w1 , w2 , w3 }. Solution. Let w1 , w2 , w3 be linearly independent vectors in Span{v1 , v2 , v3 }. Then Span{w1 , w2 , w3 } ⊆ Span{v1 , v2 , v3 } and w1 = a11 v1 + a21 v2 + a31 v3 w2 = a12 v1 + a22 v2 + a32 v3 w3 = a13 v1 + a23 v2 + a33 v3 , for some ajk ∈ K. Let
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$
It is easy to verify that the equality
$$A\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
implies
x1 w1 + x2 w2 + x3 w3 = 0. Since the vectors w1 , w2 , w3 are linearly independent, we must have x1 = x2 = x3 = 0. But this means that the matrix A is invertible. Let
$$B = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = A^{-1}.$$
Now we note that
b11 w1 + b21 w2 + b31 w3 = (b11 a11 + b21 a12 + b31 a13 )v1 + (b11 a21 + b21 a22 + b31 a23 )v2 + (b11 a31 + b21 a32 + b31 a33 )v3 = 1v1 + 0v2 + 0v3 = v1 , that is, v1 = b11 w1 + b21 w2 + b31 w3 . Similarly, we get v2 = b12 w1 + b22 w2 + b32 w3 and v3 = b13 w1 + b23 w2 + b33 w3 . Consequently, Span{v1 , v2 , v3 } = Span{w1 , w2 , w3 }. To show that the vectors v1 , v2 , v3 are linearly independent suppose y1 v1 + y2 v2 + y3 v3 = 0 for some y1 , y2 , y3 ∈ K. Then
y1 (b11 w1 + b21 w2 + b31 w3 ) + y2 (b12 w1 + b22 w2 + b32 w3 ) + y3 (b13 w1 + b23 w2 + b33 w3 ) = 0
or, equivalently, (y1 b11 + y2 b12 + y3 b13 )w1 + (y1 b21 + y2 b22 + y3 b23 )w2 + (y1 b31 + y2 b32 + y3 b33 )w3 = 0.
Since the vectors w1 , w2 , w3 are linearly independent, we must have y1 b11 + y2 b12 + y3 b13 = y1 b21 + y2 b22 + y3 b23 = y1 b31 + y2 b32 + y3 b33 = 0,
which can be written as
$$B\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
But this means that y1 = y2 = y3 = 0, because the matrix B is invertible. Thus the vectors v1 , v2 , v3 are linearly independent. And now the general theorem. Theorem 1.5.2. Let V be a vector space and let v1 , . . . , vn ∈ V. If vectors w1 , . . . , wn ∈ Span{v1 , . . . , vn } are linearly independent, then the vectors v1 , . . . , vn are linearly independent and Span{v1 , . . . , vn } = Span{w1 , . . . , wn }. In other words, every collection of n linearly independent vectors w1 , . . . , wn in Span{v1 , . . . , vn } is a basis of Span{v1 , . . . , vn } and the spanning set {v1 , . . . , vn } is also a basis of Span{v1 , . . . , vn }. Proof. The proof follows the method presented in detail in Example 1.5.1. Let w1 , . . . , wn be linearly independent vectors in Span{v1 , . . . , vn }. Then Span{w1 , . . . , wn } ⊆ Span{v1 , . . . , vn } and for every 1 ≤ k ≤ n we have wk = a1k v1 + · · · + ank vn , for some ajk ∈ K, 1 ≤ k ≤ n. Let
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}.$$
If
$$A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix},$$
then we have
x1 w1 + · · · + xn wn = 0, and, since the vectors w1 , . . . , wn are linearly independent, x1 = · · · = xn = 0. This means that A is invertible. Let
$$B = \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nn} \end{bmatrix} = A^{-1}.$$
Now, because we have vk = b1k w1 + · · · + bnk wn ,
for every 1 ≤ k ≤ n, we have Span{v1 , . . . , vn } = Span{w1 , . . . , wn }. Moreover, the equality y1 v1 + · · · + yn vn = 0 implies
$$B\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$$
and thus y1 = · · · = yn = 0, because B is invertible. But this means that the vectors v1 , . . . , vn are linearly independent. The above proof depends heavily on properties of invertible matrices. Below we give a proof that does not use matrices at all. It is based on mathematical induction. Second proof. We leave as exercise the proof for n = 1. Now we assume that the theorem holds for n − 1 for some n ≥ 2 and show that it must also hold for n. Let w1 , . . . , wn be linearly independent vectors in Span{v1 , . . . , vn }. As in the first proof, for every k = 1, 2, . . . , n we write wk = a1k v1 + · · · + ank vn . Since w1 6= 0 and w1 = a11 v1 + · · · + an1 vn , one of the numbers a11 , . . . , an1 is different from 0. Without loss of generality, we can assume that a11 6= 0. Then v1 =
$$\frac{1}{a_{11}}(w_1 - a_{21}v_2 - \cdots - a_{n1}v_n) = \frac{1}{a_{11}}w_1 - \frac{a_{21}}{a_{11}}v_2 - \cdots - \frac{a_{n1}}{a_{11}}v_n. \qquad (1.6)$$
For k ≥ 2 we get
$$w_k - \frac{a_{1k}}{a_{11}}w_1 = \left(a_{2k} - \frac{a_{1k}a_{21}}{a_{11}}\right)v_2 + \cdots + \left(a_{nk} - \frac{a_{1k}a_{n1}}{a_{11}}\right)v_n.$$
The vectors $w_2 - \frac{a_{12}}{a_{11}}w_1, \ldots, w_n - \frac{a_{1n}}{a_{11}}w_1$ are linearly independent. Indeed, if
$$x_2\left(w_2 - \frac{a_{12}}{a_{11}}w_1\right) + \cdots + x_n\left(w_n - \frac{a_{1n}}{a_{11}}w_1\right) = 0,$$
then
$$-\left(x_2\frac{a_{12}}{a_{11}} + \cdots + x_n\frac{a_{1n}}{a_{11}}\right)w_1 + x_2 w_2 + \cdots + x_n w_n = 0.$$
This implies x2 = · · · = xn = 0, because the vectors w1 , . . . , wn are linearly independent. Now, since $w_2 - \frac{a_{12}}{a_{11}}w_1, \ldots, w_n - \frac{a_{1n}}{a_{11}}w_1$ are linearly independent vectors in Span{v2 , . . . , vn }, by the inductive assumption we have
$$\operatorname{Span}\left\{w_2 - \frac{a_{12}}{a_{11}}w_1, \ldots, w_n - \frac{a_{1n}}{a_{11}}w_1\right\} = \operatorname{Span}\{v_2 , \ldots, v_n\}$$
and the vectors v2 , . . . , vn are linearly independent. Consequently,
$$\operatorname{Span}\{w_1 , \ldots, w_n\} = \operatorname{Span}\left\{w_1 , w_2 - \frac{a_{12}}{a_{11}}w_1, \ldots, w_n - \frac{a_{1n}}{a_{11}}w_1\right\} = \operatorname{Span}\{w_1 , v_2 , \ldots, v_n\} = \operatorname{Span}\{v_1 , \ldots, v_n\},$$
where the last equality follows by (1.6). It remains to be shown that the vectors v1 , . . . , vn are linearly independent. We argue by contradiction. We already know that the vectors v2 , . . . , vn are linearly independent. Suppose v1 ∈ Span{v2 , . . . , vn }. Then Span{v2 , . . . , vn } = Span{v1 , v2 , . . . , vn }. Since w2 , . . . , wn are linearly independent vectors in Span{v2 , . . . , vn }, we have Span{v2 , . . . , vn } = Span{w2 , . . . , wn }, by our inductive assumption. Consequently, we have Span{w2 , . . . , wn } = Span{v1 , v2 , . . . , vn } and thus also w1 ∈ Span{w2 , . . . , wn }, which contradicts linear independence of the vectors w1 , w2 , . . . , wn . This proves that v1 ∈ / Span{v2 , . . . , vn } and therefore the vectors v1 , v2 , . . . , vn are linearly independent. Because Theorem 1.5.2 is of central importance in the discussion of the dimension of a vector space, we give still another proof. It is an induction proof that uses Corollaries 1.4.16 and 1.4.17 and Theorem 1.4.18. It is instructive to study and compare these three different arguments. Third proof. We leave as exercise the proof for n = 1. Now we assume that the theorem holds for all integers p < n and show that it must also hold for n. Let w1 , . . . , wn be linearly independent vectors in Span{v1 , . . . , vn }. First we show that the set {w1 , . . . , wn } is a basis of Span{v1 , v2 , . . . , vn }. We argue by contradiction. If the set {w1 , . . . , wn } is not a basis of Span{v1 , v2 , . . . , vn }, then, by Corollary 1.4.16, there is an integer r ≥ 1 and linearly independent vectors vi1 , . . . , vir ∈ {v1 , . . . , vn } such that the set {w1 , . . . , wn , vi1 , . . . , vir } is a basis of Span{v1 , v2 , . . . , vn }. Now, again by Corollary 1.4.16, there are vectors vj1 , . . . , vjp ∈ {v1 , . . . , vn } such that {vj1 , . . . , vjp , vi1 , . . . , vir } is a basis of Span{v1 , v2 , . . . , vn }. Note that we must have p < n, because if p = n then Span{vj1 , . . . , vjp } = Span{v1 , . . . , vn },
which is not possible because r ≥ 1. Since the subspaces Span{w1 , . . . , wn } and Span{vj1 , . . . , vjp } are complements of the same subspace Span{vi1 , . . . , vir }, the subspace Span{w1 , . . . , wn } has a basis with p elements {u1 , . . . , up }, by Theorem 1.4.18. But then the linearly independent vectors w1 , . . . , wp are in the subspace Span{u1 , . . . , up } = Span{w1 , . . . , wn }. Consequently, by our inductive assumption, {w1 , . . . , wp } is a basis of Span{w1 , . . . , wn }. But this contradicts independence of the vectors w1 , . . . , wn , because p < n. This completes the proof of the fact that {w1 , . . . , wn } is a basis of Span{v1 , v2 , . . . , vn }. Now we show that the vectors v1 , . . . , vn are linearly independent. Again we use a proof by contradiction. If v1 , . . . , vn are not linearly independent, then, by Corollary 1.4.17, there is an integer r < n and a set {vi1 , . . . , vir } ⊂ {v1 , . . . , vn } of linearly independent vectors such that Span{vi1 , . . . , vir } = Span{v1 , . . . , vn }. But then the linearly independent vectors w1 , . . . , wr are in Span{vi1 , . . . , vir } = Span{v1 , . . . , vn }. Consequently, by our inductive assumption, {w1 , . . . , wr } is a basis of Span{v1 , v2 , . . . , vn }, contradicting linear independence of the vectors w1 , . . . wn , because r < n. This contradiction proves that the vectors v1 , . . . , vn are linearly independent. The next theorem is an easy consequence of Theorem 1.5.2. Theorem 1.5.3. If {v1 , . . . , vn } is a basis of the vector space V, then any n + 1 vectors in V are linearly dependent. Proof. Let w1 , . . . , wn , wn+1 ∈ V. If the vectors w1 , . . . , wn are linearly dependent, we are done. If the vectors w1 , . . . , wn are linearly independent, then the set {w1 , . . . , wn } is a basis of V, by Theorem 1.5.2. Consequently wn+1 can be written as a linear combination of the vectors w1 , . . . , wn , which implies that the vectors w1 , . . . , wn , wn+1 are linearly dependent.
Corollary 1.5.4. If both {v1 , . . . , vm } and {w1 , . . . , wn } are bases of a vector space V, then m = n. Proof. This is an immediate consequence of Theorem 1.5.3. Now we are ready to define the dimension of a vector space spanned by a finite number of vectors.
Definition 1.5.5. If a vector space V has a basis with n vectors, then the number n is called the dimension of V and is denoted by dim V. Additionally, we define dim{0} = 0.
If a vector space V has a basis with a finite number of vectors, we say that V is finite dimensional and we write dim V < ∞. Not every vector space is finite dimensional.
Example 1.5.6. The space of all polynomials P(K) is not a finite dimensional space. Suppose, on the contrary, that dim P(K) = n for some positive integer n. Then P(K) = Span{p1 , . . . , pn } for some nonzero polynomials p1 , . . . , pn . Let m be the maximum degree of the polynomials p1 , . . . , pn . Since the degree of every linear combination of the polynomials p1 , . . . , pn is at most m, the polynomial p(t) = tm+1 is not in P(K) = Span{p1 , . . . , pn }, a contradiction. The following useful result can be easily obtained from Theorem 1.5.2. Theorem 1.5.7. Let V be a vector space such that dim V = n. (a) Any set of n linearly independent vectors from V is basis of V; (b) If V = Span{w1 , . . . , wn }, then {w1 , . . . , wn } is a basis of V. Proof. Let dim V = n and let {v1 , . . . , vn } be a basis of V. If the vectors w1 , . . . , wn ∈ V are linearly independent, then Span{w1 , . . . , wn } = Span{v1 , . . . , vn } = V, by Theorem 1.5.2. Thus {w1 , . . . , wn } is a basis of V. If V = Span{w1 , . . . , wn }, then the linearly independent vectors v1 , . . . , vn are in Span{w1 , . . . , wn } and, again by Theorem 1.5.2, the vectors w1 , . . . , wn are linearly independent. This means that the set {w1 , . . . , wn } is a basis of V.
Example 1.5.8. The set of solutions of the differential equation y′′ + y = 0 is a vector space of dimension 2 over C. Both {eit , e−it } and {cos t, sin t} are bases of this vector space.
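As a side check (an illustration only; SymPy is not part of the book), a computer algebra system confirms that every solution of y′′ + y = 0 is a linear combination of cos t and sin t:

```python
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')
sol = sp.dsolve(sp.Eq(y(t).diff(t, 2) + y(t), 0), y(t))
print(sol)   # y(t) = C1*sin(t) + C2*cos(t)
```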
Example 1.5.9. Let a, b, and c be distinct numbers from K and let U = {p ∈ Pn (K) : p(a) = p(b) = p(c)}. The dimension of the vector space U is n − 2 (see Example 1.3.26).
Example 1.5.10. Let a be an arbitrary number from K and let U = {p ∈ Pn (K) : p(a) = p′ (a) = p′′ (a) = 0}. Then dim U = n − 3 (see Example 1.3.21).
Example 1.5.11. Let n be an integer greater than 1.
(a) dim Sn×n (K) = n(n + 1)/2 (see Example 1.2.5);
(b) dim An×n (K) = n(n − 1)/2 (see Example 1.2.5).
Example 1.5.12. Show that
$$\dim \operatorname{Span}\left\{\begin{bmatrix}1\\ i\\ -i\\ 0\end{bmatrix}, \begin{bmatrix}1-2i\\ -i\\ -2-i\\ -2i\end{bmatrix}, \begin{bmatrix}i\\ i\\ 1\\ i\end{bmatrix}, \begin{bmatrix}0\\ 1\\ i\\ 1\end{bmatrix}\right\} = 3$$
and find all bases of the above subspace contained in the spanning set displayed above.

Solution. Since the reduced row echelon form of the matrix
$$\begin{bmatrix}1 & 1-2i & i & 0\\ i & -i & i & 1\\ -i & -2-i & 1 & i\\ 0 & -2i & i & 1\end{bmatrix}$$
is
$$\begin{bmatrix}1 & 0 & \tfrac12 & 0\\ 0 & 1 & -\tfrac12 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0\end{bmatrix}$$
and the sets
$$\left\{\begin{bmatrix}1\\0\\0\\0\end{bmatrix},\begin{bmatrix}0\\1\\0\\0\end{bmatrix},\begin{bmatrix}0\\0\\1\\0\end{bmatrix}\right\},\quad \left\{\begin{bmatrix}1\\0\\0\\0\end{bmatrix},\begin{bmatrix}\tfrac12\\-\tfrac12\\0\\0\end{bmatrix},\begin{bmatrix}0\\0\\1\\0\end{bmatrix}\right\},\quad \left\{\begin{bmatrix}0\\1\\0\\0\end{bmatrix},\begin{bmatrix}\tfrac12\\-\tfrac12\\0\\0\end{bmatrix},\begin{bmatrix}0\\0\\1\\0\end{bmatrix}\right\}$$
are linearly independent, the following sets are bases:
$$\left\{\begin{bmatrix}1\\ i\\ -i\\ 0\end{bmatrix}, \begin{bmatrix}1-2i\\ -i\\ -2-i\\ -2i\end{bmatrix}, \begin{bmatrix}0\\ 1\\ i\\ 1\end{bmatrix}\right\},\quad \left\{\begin{bmatrix}1\\ i\\ -i\\ 0\end{bmatrix}, \begin{bmatrix}i\\ i\\ 1\\ i\end{bmatrix}, \begin{bmatrix}0\\ 1\\ i\\ 1\end{bmatrix}\right\},\quad \left\{\begin{bmatrix}1-2i\\ -i\\ -2-i\\ -2i\end{bmatrix}, \begin{bmatrix}i\\ i\\ 1\\ i\end{bmatrix}, \begin{bmatrix}0\\ 1\\ i\\ 1\end{bmatrix}\right\}.$$
In the next example we use Theorem 1.5.7 to obtain the characterization of invertible matrices in Example 1.3.7.
Example 1.5.13. Show that a matrix A ∈ Mn×n (K) is invertible if and only if the only solution of the equation Ax = 0 is x = 0.

Solution. If A is invertible and Ax = 0, then x = A−1 Ax = A−1 0 = 0. Suppose now that the only solution of the equation Ax = 0 is x = 0. We write A = [a1 . . . an ], where a1 , . . . , an are the columns of A, and we write x as the column with entries x1 , . . . , xn . The assumption that the only solution of the equation
$$\begin{bmatrix} a_1 & \ldots & a_n \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$$
is x = 0 implies that the vectors a1 , . . . , an are linearly independent. Consequently, by Theorem 1.5.7, the set {a1 , . . . , an } is a basis of Kn . If e1 , . . . , en are the columns of the unit matrix In , then for any j ∈ {1, . . . , n} we have ej = b1j a1 + · · · + bnj an , for some bjk ∈ K, that is,
In = AB
where B is the matrix with entries bjk . It is easy to see that the only solution of the equation Bx = 0 is x = 0. Arguing as before, we can find a matrix C such that In = BC. Since A = AIn = ABC = In C = C, we get AB = BA = In .
Example 1.5.14. Show that the dimension of the vector space U in Example 1.1.8 is 1. Solution. Let x be a vector in R2 which is not on the vector line Ra. Consequently, the vectors x and a are linearly independent. If y is in R2 then there are real numbers α and β such that y = αa + βx. Hence
$$\widehat{y} = (\alpha a + \beta x)^{\wedge} = \alpha\,\widehat{a} + \beta\,\widehat{x} = \beta\,\widehat{x}.$$
This means that {x̂} is a basis in U and consequently dim U = 1. In Section 1.3 we remarked that linear independence of vectors may depend on the field of scalars K. Since the dimension of a vector space is closely related to linear independence of vectors (being the maximum number of linearly independent vectors in the space), it should be expected that the dimension of a vector space also depends on whether the space is treated as a real vector space or a complex vector space.
Example 1.5.15. Let V be a complex vector of dimension n. Show that the dimension of the real vector space V is 2n. Solution. Let V be a complex vector of dimension n and let {v1 , . . . , vn } be a basis of V. This means that every v ∈ V has a unique representation of the form v = z1 v1 + · · · + zn vn with z1 , . . . , zn ∈ C. We will show that {v1 , . . . , vn , iv1 , . . . , ivn } is a basis of the real vector space V. If a1 v1 + · · · + an vn + b1 iv1 + · · · + bn ivn = 0 with a1 , . . . , an , b1 , . . . , bn ∈ R, then (a1 + b1 i)v1 + · · · + (an + bn i)vn = 0. Since the vectors v1 , . . . , vn are linearly independent in the complex vector space V, we must have a1 + b1 i = · · · = an + bn i = 0, and thus a1 = · · · = an = b1 = · · · = bn = 0. This proves that the vectors v1 , . . . , vn , iv1 , . . . , ivn are linearly independent in the real vector space V. Now we need to show that every vector v ∈ V can be written in the form x1 v1 + · · · + xn vn + y1 iv1 + · · · + yn ivn with x1 , . . . , xn , yn , . . . , yn ∈ R. Indeed, if v ∈ V, then v = z1 v1 + · · · + zn vn for some z1 , . . . , zn ∈ C. If we write zj = xj + yj i, then z1 v1 + · · · + zn vn = (x1 + y1 i)v1 + · · · + (xn + yn i)vn = x1 v1 + · · · + xn vn + y1 iv1 + · · · + yn ivn , which is the desired representation.
Theorem 1.5.16. Let U be a subspace of a finite dimensional vector space V. Then (a) U is finite dimensional and dim U ≤ dim V; (b) If dim U = dim V, then U = V.
Proof. Both properties are trivially true if U = {0}. If U is a nontrivial subspace of V and {v1 , . . . , vn } is a basis of V, then there exist an integer m ≥ 1 and linearly independent vectors vi1 , . . . , vim ∈ {v1 , . . . , vn } such that U ⊕ Kvi1 ⊕ · · · ⊕ Kvim = V, by Theorem 1.4.14. Since we also have Kvj1 ⊕ · · · ⊕ Kvjn−m ⊕ Kvi1 ⊕ · · · ⊕ Kvim = V, where {j1 , . . . , jn−m , i1 , . . . , im } = {1, . . . , n}, the vector subspace U has a basis with n − m vectors, by Theorem 1.4.18. Consequently, dim U = n − m < n = dim V. If U = V, then obviously dim U = dim V. In either case, since dim U ≤ n, the space U is finite dimensional. This completes the proof of part (a). Part (b) is an immediate consequence of Theorem 1.5.7.
Example 1.5.17. Let U and V be finite dimensional vector subspaces of a vector space W. Show that dim(U + V) = dim U + dim V − dim(U ∩ V).

Solution. First we note that U + V is a finite dimensional vector space such that U ⊆ U + V and V ⊆ U + V. Let {x1 , . . . , xp } be a basis of U ∩ V, let {y1 , . . . , yq } be a basis of a complement U ′ of U ∩ V in U, and let {z1 , . . . , zr } be a basis of a complement V ′ of U ∩ V in V. According to Example 1.4.3 we have U + V = U ′ ⊕ (U ∩ V) ⊕ V ′ . Consequently, by Theorem 1.4.10, {x1 , . . . , xp , y1 , . . . , yq , z1 , . . . , zr } is a basis of U + V. Since dim(U + V) = p + q + r, dim U = p + q, dim V = p + r, and dim(U ∩ V) = p, the desired equality follows.
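A small numerical illustration of this formula (NumPy is an assumption of this sketch, and the subspaces below are made up for the example):

```python
import numpy as np

rank = np.linalg.matrix_rank

# Columns span subspaces of R^4; U ∩ V is spanned by the common vector (1, 0, 1, 0).
U = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])   # dim U = 2
V = np.array([[1., 0.], [0., 0.], [1., 1.], [0., 1.]])   # dim V = 2

dim_sum = rank(np.hstack([U, V]))                        # dim(U + V) = 3
print(rank(U) + rank(V) - dim_sum)                       # 1 = dim(U ∩ V), as the formula predicts
```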
We close this section with a useful observation that is an immediate consequence of Corollary 1.4.16. Note that the theorem implies that every collection of linearly independent vectors in a finite dimensional vector space V can be extended to a basis of V.

Theorem 1.5.18. Let {v1 , . . . , vn } be a basis of a vector space V. If the vectors w1 , . . . , wk ∈ V are linearly independent and k < n, then there are vectors wk+1 , . . . , wn ∈ {v1 , . . . , vn } such that {w1 , . . . , wk , wk+1 , . . . , wn } is a basis of V.
Example 1.5.19. In Example 1.3.26 we use the fact that the polynomials 1, q(t), tq(t), . . . , tn−3 q(t), where q(t) = (t − a)(t − b)(t − c), are linearly independent vectors in Pn (K). Extend this collection of vectors to a basis of Pn (K). Solution. We can use two vectors from the standard basis of Pn (K), that is, {1, t, t2 , . . . , tn }, namely t and t2 . It is easy to check that the set {1, t, t2 , q(t), tq(t), . . . , tn−3 q(t)} is a basis in Pn (K).
1.6 Change of basis
Problems in linear algebra and its applications often require working with different bases in the same vector space. If coordinates of a vector in one basis are known, we need to be able to efficiently find its coordinates in a different basis. The following theorem describes this process in terms of matrix multiplication.
Theorem 1.6.1. Let {v1 , . . . , vn } and {w1 , . . . , wn } be bases of a vector space V and let ajk ∈ K, for 1 ≤ j, k ≤ n, be the unique numbers such that vk = a1k w1 + · · · + ank wn , for every 1 ≤ k ≤ n. For every v ∈ V, if v = x1 v1 + · · · + xn vn , then
v = y1 w1 + · · · + yn wn ,   (1.7)
where the numbers y1 , . . . , yn are given by
$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$
Proof. Let v be an arbitrary vector in V. If v = x1 v1 + · · · + xn vn , then
v = x1 v1 + · · · + xn vn
= x1 (a11 w1 + · · · + an1 wn ) + · · · + xn (a1n w1 + · · · + ann wn )
$$= \begin{bmatrix} a_{11} & \ldots & a_{1n} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} w_1 + \cdots + \begin{bmatrix} a_{n1} & \ldots & a_{nn} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} w_n.$$
Thus the coordinates of v in the basis {w1 , . . . , wn } are
$$y_k = \begin{bmatrix} a_{k1} & \ldots & a_{kn} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix},$$
for every k ∈ {1, . . . , n}, which can be written as
$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$
Definition 1.6.2. Let B = {v1 , . . . , vn } and C = {w1 , . . . , wn } be bases of a vector space V. The n × n matrix in Theorem 1.6.1 is called the change of coordinates matrix from the basis B to the basis C and is denoted by IdB→C .
We use Id in the above definition because in Theorem 1.6.1 we have Id(vk ) = vk = a1k w1 + · · · + ank wn ,
where the function Id : V → V is defined by Id(v) = v for every v ∈ V. This notation will be explained better in Chapter 2.
Example 1.6.3. Let {v1 , v2 , v3 } and {w1 , w2 , w3 } be bases of a vector space V. If
v1 = iw1 + iw2 + w3
v2 = −iw1 + iw2 + w3
v3 = w1 + iw2 + iw3 ,
write the vector v = (2 − i)v1 + 3v2 + iv3 as a linear combination of the vectors w1 , w2 , and w3 .

Solution. Since the change of coordinates matrix from the basis {v1 , v2 , v3 } to the basis {w1 , w2 , w3 } is
$$\begin{bmatrix} i & -i & 1 \\ i & i & i \\ 1 & 1 & i \end{bmatrix}$$
and
$$\begin{bmatrix} i & -i & 1 \\ i & i & i \\ 1 & 1 & i \end{bmatrix}\begin{bmatrix} 2-i \\ 3 \\ i \end{bmatrix} = \begin{bmatrix} 1 \\ 5i \\ 4-i \end{bmatrix},$$
we have
v = w1 + 5iw2 + (4 − i)w3 .
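The same computation in code (NumPy assumed for this illustration only):

```python
import numpy as np

A = np.array([[1j, -1j, 1 ],
              [1j,  1j, 1j],
              [1,   1,  1j]])       # change of coordinates matrix from {v1, v2, v3} to {w1, w2, w3}
x = np.array([2 - 1j, 3, 1j])       # coordinates of v in the basis {v1, v2, v3}
print(A @ x)                        # [1.+0.j  0.+5.j  4.-1.j]: coordinates of v in {w1, w2, w3}
```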
Example 1.6.4. Let a ∈ K. We consider the vector space P3 (K). Determine the change of coordinates matrix from the basis {1, t, t2 , t3 } to the basis {1, t − a, (t − a)2 , (t − a)3 } and write an arbitrary polynomial from P3 (K) in the basis {1, t − a, (t − a)2 , (t − a)3 }.
Solution. Since
1 = 1
t = a + 1 · (t − a)
t2 = a2 + 2a(t − a) + (t − a)2
t3 = (a + (t − a))3 = a3 + 3a2 (t − a) + 3a(t − a)2 + (t − a)3 ,
the change of coordinates matrix is
$$\begin{bmatrix} 1 & a & a^2 & a^3 \\ 0 & 1 & 2a & 3a^2 \\ 0 & 0 & 1 & 3a \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
For an arbitrary polynomial p(t) = b0 + b1 t + b2 t2 + b3 t3 in P3 (K) we have
$$\begin{bmatrix} 1 & a & a^2 & a^3 \\ 0 & 1 & 2a & 3a^2 \\ 0 & 0 & 1 & 3a \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} a^3 b_3 + a^2 b_2 + a b_1 + b_0 \\ 3a^2 b_3 + 2a b_2 + b_1 \\ 3a b_3 + b_2 \\ b_3 \end{bmatrix},$$
and thus p(t) becomes
a3 b3 + a2 b2 + ab1 + b0 + (3a2 b3 + 2ab2 + b1 )(t − a) + (3ab3 + b2 )(t − a)2 + b3 (t − a)3
when written in the basis {1, t − a, (t − a)2 , (t − a)3 }. Note that we could get this result by direct calculation.
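The coordinates just obtained are the Taylor coefficients of p at a, which can be double-checked symbolically (SymPy assumed, illustration only):

```python
import sympy as sp

a, t, b0, b1, b2, b3 = sp.symbols('a t b0 b1 b2 b3')
p = b0 + b1*t + b2*t**2 + b3*t**3

# Coordinate of p on (t - a)^k is p^(k)(a)/k!, for k = 0, 1, 2, 3.
coords = [sp.expand(p.diff(t, k).subs(t, a) / sp.factorial(k)) for k in range(4)]
print(coords)
# [a**3*b3 + a**2*b2 + a*b1 + b0, 3*a**2*b3 + 2*a*b2 + b1, 3*a*b3 + b2, b3]
```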
Theorem 1.6.5. Let B = {v1 , . . . , vn } and C = {w1 , . . . , wn } be bases of a vector space V. The change of coordinates matrix IdB→C is invertible and Id−1 B→C = IdC→B .

Proof. If x1 v1 + · · · + xn vn = y1 w1 + · · · + yn wn , then
$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \mathrm{Id}_{\mathcal{B}\to\mathcal{C}}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \mathrm{Id}_{\mathcal{C}\to\mathcal{B}}\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix},$$
and consequently
$$\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \mathrm{Id}_{\mathcal{C}\to\mathcal{B}}\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \mathrm{Id}_{\mathcal{C}\to\mathcal{B}}\,\mathrm{Id}_{\mathcal{B}\to\mathcal{C}}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \mathrm{Id}_{\mathcal{B}\to\mathcal{C}}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \mathrm{Id}_{\mathcal{B}\to\mathcal{C}}\,\mathrm{Id}_{\mathcal{C}\to\mathcal{B}}\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix},$$
proving that Id−1 B→C = IdC→B .
Example 1.6.6. Let {v1 , v2 , v3 } and {w1 , w2 , w3 } be bases of the vector space V. If
v1 = iw1 + iw2 + w3
v2 = −iw1 + iw2 + w3
v3 = w1 + iw2 + iw3 ,
write w1 , w2 , and w3 as linear combinations of the vectors v1 , v2 , and v3 .

Solution. The change of coordinates matrix from the basis {v1 , v2 , v3 } to the basis {w1 , w2 , w3 } is
$$\begin{bmatrix} i & -i & 1 \\ i & i & i \\ 1 & 1 & i \end{bmatrix}.$$
The inverse of this matrix is
$$\begin{bmatrix} -\tfrac{i}{2} & 0 & \tfrac{1}{2} \\[2pt] \tfrac{i}{2} & -\tfrac{1}{2}-\tfrac{i}{2} & \tfrac{i}{2} \\[2pt] 0 & \tfrac{1}{2}-\tfrac{i}{2} & -\tfrac{1}{2}-\tfrac{i}{2} \end{bmatrix}.$$
Consequently, we can write
$$w_1 = -\frac{i}{2}v_1 + \frac{i}{2}v_2,\qquad w_2 = \left(-\frac{1}{2}-\frac{i}{2}\right)v_2 + \left(\frac{1}{2}-\frac{i}{2}\right)v_3,\qquad w_3 = \frac{1}{2}v_1 + \frac{i}{2}v_2 - \left(\frac{1}{2}+\frac{i}{2}\right)v_3.$$
Example 1.6.7. It is easy to calculate that the change of coordinates matrix from the basis {1, t − a, (t − a)2 , (t − a)3 } to the basis {1, t, t2 , t3 } is
$$\begin{bmatrix} 1 & -a & a^2 & -a^3 \\ 0 & 1 & -2a & 3a^2 \\ 0 & 0 & 1 & -3a \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Consequently, the change of coordinates matrix from the basis {1, t, t2 , t3 } to the basis {1, t − a, (t − a)2 , (t − a)3 } is
$$\begin{bmatrix} 1 & -a & a^2 & -a^3 \\ 0 & 1 & -2a & 3a^2 \\ 0 & 0 & 1 & -3a \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & a & a^2 & a^3 \\ 0 & 1 & 2a & 3a^2 \\ 0 & 0 & 1 & 3a \\ 0 & 0 & 0 & 1 \end{bmatrix},$$
which is the matrix we found in Example 1.6.4.
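The inverse relation between these two change of coordinates matrices can be verified symbolically (SymPy assumed, illustration only):

```python
import sympy as sp

a = sp.symbols('a')
M = sp.Matrix([[1, -a, a**2, -a**3],
               [0,  1, -2*a, 3*a**2],
               [0,  0,  1,   -3*a],
               [0,  0,  0,    1]])
print(sp.simplify(M.inv()))
# Matrix([[1, a, a**2, a**3], [0, 1, 2*a, 3*a**2], [0, 0, 1, 3*a], [0, 0, 0, 1]])
```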
1.7 Exercises

1.7.1 Definitions and examples
Exercise 1.1. Let A = {(x, y, z) ∈ R3 : x ≥ 0, z ≤ 0}. Show that A is not a vector space.

Exercise 1.2. Let A = {(x, y, z) ∈ R3 : x + y + z = 0} and B = {(x, y, z) ∈ R3 : x − y + z = 0}. Show that A ∪ B is not a vector space.

Exercise 1.3. We define a vector space over R whose elements are lines in R3 parallel to a given line. We recall that a line is a set of the form x + Ra = {x + ca : c ∈ R} where x and a are vectors in R3 and a ≠ 0. Let a be a fixed nonzero vector in R3 . We define
$\widehat{x} = x + \mathbf{R}a$
and
$\widehat{x} + \widehat{y} = \widehat{x + y}$  and  $c\,\widehat{x} = \widehat{cx}$.
Show that the set V = {$\widehat{x}$ : x ∈ R3 } with the operations defined above is a real vector space.

Exercise 1.4. We define a vector space over R whose elements are planes in R3 parallel to a given plane. We recall that a plane is a set of the form x + Ra + Rb = {x + ca + db : c, d ∈ R} where x, a and b are vectors in R3 with a ≠ 0 and b ≠ 0. Let a and b be fixed nonzero vectors in R3 . We define
$\widehat{x} = x + \mathbf{R}a + \mathbf{R}b$
and
$\widehat{x} + \widehat{y} = \widehat{x + y}$  and  $c\,\widehat{x} = \widehat{cx}$.
Show that the set V = {$\widehat{x}$ : x ∈ R3 } with the operations defined above is a real vector space.

Exercise 1.5. Let V and W be vector spaces. Show that the set of all functions f : V → W with the operations of addition and scalar multiplication defined as (f + g)(x) = f (x) + g(x)
and (αf )(x) = αf (x),
is a vector space.
1.7.2 Subspaces
Exercise 1.6. Let a, b ∈ K. Write the polynomial (t + a)3 as a linear combination of the polynomials 1, t + b, (t + b)2 , (t + b)3 .

Exercise 1.7. Let U, V, and W be subspaces of a vector space X . If U ⊂ V, show that V ∩ (U + W) = U + (V ∩ W).

Exercise 1.8. Let v1 , v2 , v3 be vectors in a vector space V. If
w1 = v1 + 3v2 + 5v3
w2 = v1 + v2 + 3v3
w3 = v1 + 2v2 + 4v3
w4 = v1 − v2 + v3
w5 = 3v1 − v2 + 5v3 ,
show that Span{w1 , w2 } = Span{w3 , w4 , w5 }.

Exercise 1.9. Show that the set {p ∈ P(R) : $\int_0^1 p(t)t^2\,dt = 0$} is a vector subspace of P(R).

Exercise 1.10. Let V be a subspace of Mn×n (K). Show that V T = {AT : A ∈ V} is a subspace of Mn×n (K).

Exercise 1.11. Let V1 , . . . , Vm be subspaces of a vector space V. If W is a vector space such that Vj ⊆ W for every j ∈ {1, . . . , m}, show that V1 + · · · + Vm ⊆ W.
Exercise 1.12. If U is a proper subspace of a vector space V, show that the following conditions are equivalent: (a) If W 6= U is a subspace of V and U ⊆ W ⊆ V, then W = V; (b) For every x ∈ V which is not in U we have U + Kx = V. Exercise 1.13. If U and W are subspaces of a vector space V, show that the following conditions are equivalent: (a) The set U ∪ W is a subspace of V; (b) U ⊆ W or W ⊆ U. Exercise 1.14. Let U and W are subspaces of a vector space V and A and B finite subsets of V such that SpanA = U and SpanB = W. Show that Span(A ∪ B) = U + W. Exercise 1.15. Let U be a subspace of the vector space K. Show that U = {0} or U = K. Exercise 1.16. Show that the set of matrices from Mn×n (K) which satisfy the equation 2A + AT = 0 is a subspace of Mn×n (K). Describe this subspace. Exercise 1.17. Give an example of subspaces U, V, and W of P3 (K) such that U 6= V and U + W = V + W = P3 (K). Exercise 1.18. Show that the set UDn×n (K) = {A = [ajk ]1≤j≤n,1≤k≤n ∈ Mn×n (K) : ajk = 0 for all j < k} is a subspace of Mn×n (K). Exercise 1.19. Verify that Mn×n (K) = UD n×n (K) + Sn×n (K), where UDn×n (K) is defined in Exercise 1.18. Exercise 1.20. If v1 , v2 , v3 are vectors in a vector space V, show that Span{v1 , v2 , v3 } = Span{7v1 + 3v2 − 4v3 , v2 + 2v3 , 5v3 − v2 }. Exercise 1.21. Let a ∈ K. Show that a polynomial p ∈ Pn (K) is a linear combination of the polynomials (t−a)2 , . . . , (t−a)n if and only if p(a) = p′ (a) = 0. Exercise 1.22. Let U be a subspace of a vector space V and let x and y be vectors in V such that x ∈ / U. Show that if x ∈ U + Ky then y ∈ U + Kx and y∈ / U. Exercise 1.23. Let f, g ∈ FK (S). Write the function f as a linear combination of the functions 5f − 3g and 4f + 7g. Exercise 1.24. Show that the function |t| is not a linear combination of the function |t − c1 |, . . . , |t − cn | where c1 , . . . , cn are nonzero real numbers. Exercise 1.25. Write the function sin(t + α) as a linear combination of the functions sin t and cos t.
1.7.3 Linearly independent vectors and bases
Exercise 1.26. Let U be the set of all infinite sequences (x1 , x2 , . . . ) of real numbers such that xn+2 = xn for every integer n ≥ 1. Show that U is a finitedimensional subspace of the vector space of all infinite sequences (x1 , x2 , . . . ) of real numbers and determine a basis in U. Exercise 1.27. Let U and W be subspaces of a vector space V such that U ∩W = {0}. Show that, if the vectors u1 , . . . , um ∈ U are linearly independent and the vectors w1 , . . . , wn ∈ W are linearly independent, then the vectors u1 , . . . , um , w1 , . . . , wn are linearly independent. Exercise 1.28. If v1 , v2 , v3 are linearly independent vectors in a vector space V, show that the vectors v1 + v2 , v2 + v3 , v3 + v1 are linearly independent. Exercise 1.29. If v1 , v2 , v3 , v4 are linearly independent vectors in a vector space V, show that the vectors v1 + v2 , v2 + v3 , v3 + v4 , v4 + v1 are linearly dependent. Exercise 1.30. Show that the functions 1, cos2 t, cos 2t are linearly dependent elements in DR (R). Exercise 1.31. Show that the matrices −i 3 1−i i −i − 2 9 − 2 i , , 2 i 3 i 0 i are linearly dependent elements of M2×2 (C). Exercise 1.32. Let {v1 , v2 , v3 } be a basis of a complex vector space V. Show that the vectors w1 = iv1 + v2 + v3 w2 = iv1 + iv2 + v3 w3 = v1 + v2 + iv3 are linearly independent. Exercise 1.33. If v1 , . . . , vn are linearly independent vectors in a vector space V, show that the vectors v3 + v1 + v2 , . . . , vn + v1 + v2 are linearly independent. Exercise 1.34. Find a basis of C2 as a vector space over R. Exercise 1.35. Show that {cos t, sin t} and {cos t + i sin t, cos t − i sin t} are bases of the same complex vector subspace of FC (R). Exercise 1.36. Let v1 , . . . , vk , v, w1 , . . . , wm be elements of a vector space V. If the vectors v1 , . . . , vk , v are linearly independent and v ∈ Span{v1 , . . . , vk , w1 , . . . , wm }, then there are integers j1 , . . . jm−1 ∈ {1, . . . , m} such that Span{v1 , . . . , vk , v, wj1 , . . . , wjm−1 } = Span{v1 , . . . , vk , w1 , . . . , wm }.
Exercise 1.37. Let W be a vector space and let v1 , . . . , vk , w1 , . . . , wm ∈ W. Use Exercise 1.36 to show that, if the vectors v1 , . . . , vk are linearly independent in Span{w1 , . . . , wm }, then k ≤ m. Exercise 1.38. Let v1 , . . . , vn be vectors in a vector space V such that V = Span{v1 , . . . , vn }. Let k be an integer such that 1 ≤ k < n. We suppose that the vectors v1 , . . . , vk are linearly independent and that the vectors v1 , . . . , vk , vm are linearly dependent for all m ∈ {k + 1, . . . , n}. Show that {v1 , . . . , vk } is a basis of V. Exercise 1.39. Let V be a subspace of a vector space W and let v1 , . . . , vk be linearly independent vectors in V. Show that either {v1 , . . . , vk } is a basis of V or there is a vector v ∈ V such that the vectors v1 , . . . , vk , v are linearly independent. Exercise 1.40. Let V be a subspace of an m-dimensional vector space W. Using Exercises 1.37 and 1.39, show that V is a k-dimensional vector space for some k ≤ m. Exercise 1.41. Let {v1 , v2 , v3 , v4 , v5 } be a basis of a vector space V and let w1 = v1 + 3v2 + 3v3 + 2v4 + v5 w2 = 2v1 + v2 + v3 + 4v4 + v5 w3 = v1 + v2 + v3 + 2v4 + v5 . Determine all two-element sets {vj , vk } ⊂ {v1 , v2 , v3 , v4 , v5 } such that the set {w1 , w2 , w3 , vj , vk } is a basis of V. Exercise 1.42. Let {v1 , v2 , v3 , v4 } be a basis of a vector space V. If w1 = 3v1 + 4v2 + 5v3 + 2v4 w2 = 2v1 + 5v2 + 3v3 + 4v4 w3 = v1 + av2 + bv3 w4 = cv2 + dv3 + v4 , find numbers a, b, c, d such that {w1 , w2 } and {w3 , w4 } are bases of the same subspace. Exercise 1.43. Verify that the set x y S = : 3x + y + z + 4t = 0, 2x + y + 3z + t = 0 z t
is a subspace of R4 and determine a basis of this subspace.

Exercise 1.44. Show that V = {p ∈ P2 (R) : $\int_0^1 p(t)\,dt = 0$, $\int_0^2 p(t)\,dt = 0$} is a vector subspace of P2 (R) and determine a basis in this subspace.
1.7.4 Direct sums
Exercise 1.45. If U = {f ∈ FR (R) : f (−t) = f (t)} and V = {f ∈ FR (R) : f (−t) = −f (t)}, show that FR (R) = U ⊕ V . Exercise 1.46. Let Dn×n (K) = {[ajk ] ∈ Mn×n (K) : ajk = 0 if j 6= k}. Find a subspace V of Mn×n (K) such that An×n (K) ⊕ Dn×n (K) ⊕ V = Mn×n (K). Exercise 1.47. Let U and W be vector subspaces of a vector space V. If dim U + dim W = dim V + 1, show that the sum U + W is not direct. Exercise 1.48. Let U1 , U2 and U3 be subspaces of a vector space V. Show that the sum U1 + U2 + U3 is direct if and only if U1 ∩ U2 = 0 and (U1 + U2 ) ∩ U3 = 0. Exercise 1.49. Let U1 and U2 be subspaces of a vector space V and let v1 , . . . , vn ∈ V. We assume that the sum U1 + U2 + Kv1 + · · · + Kvn is direct. If W1 = U1 + Kv1 + · · · + Kvn and W2 = U2 + Kv1 + · · · + Kvn , show that {v1 , . . . , vn } is a basis of W1 ∩ W2 . Exercise 1.50. Let A, B, and S be sets such that A and B are disjoint and A ∪ B = S. If V = {f ∈ FK (S) : f (x) = 0 for all x ∈ A} and W = {f ∈ FK (S) : f (x) = 0 for all x ∈ B}, show that V and W are subspaces of FK (S) and that FK (S) = V ⊕ W. Exercise 1.51. Show that Mn×n (K) = An×n (K) ⊕ UDn×n (K). Exercise 1.52. Let x1 , . . . , xn be distinct elements of a set S. Show that the set U = {f ∈ FK (S) : f (x1 ) = · · · = f (xn ) = 0} is a vector subspace of FK (S) and determine a subspace W of FK (S) such that U ⊕ W = FK (S).
1.7.5 Dimension of a vector space
Exercise 1.53. Let p0 , . . . , pn be polynomials in Pn (K) such that deg pj = j for every j ∈ {0, . . . , n}. Show that {p0 , . . . , pn } is a basis of Pn (K). Exercise 1.54. Show that the dimension of the vector space V in Exercise 1.4 is 1. Exercise 1.55. Show that the dimension of the vector space V in Exercise 1.3 is 2. Exercise 1.56. Show that 1 1 1 1 1 0 0 0 B= , , , 1 1 0 1 0 1 0 1 is a basis of the vector space M2×2 (K).
Exercise 1.57. Determine 3 different bases in the vector space M2×2 (K) which 1 2 3 0 are extensions of the set , . 0 0 0 4 a b Exercise 1.58. Show that the set of all matrices of the form , where b b a, b ∈ K are arbitrary, is a subspace of M2×2 (K) and determine the dimension of this subspace. x b Exercise 1.59. Show that the set of all matrices of the form , where c y a, b, x, y ∈ K and 2x + 5y = 0, is a subspace of M2×2 (K) and determine the dimension of this subspace. Exercise 1.60. Show that the set a b U= ∈ M2×2 (K) : a + b = 0, c + d = 0, a + d = 0 c d is a vector subspace of M2×2 (K) and determine dim U. Exercise 1.61. Determine dim S3×3 (K). Exercise 1.62. If U1 , . . . , Uk are subspaces of an n-dimensional vector space V, show that dim U1 + · · · + dim Uk ≤ (k − 1)n + dim U1 ∩ · · · ∩ Uk . Exercise 1.63. Let U and W be vector subspaces of a vector space V where dim V = n. If {0} 6= U * W , dim U = m, and dim W = n − 1, determine dim U ∩ W. Exercise 1.64. Let {v1 , v2 , v3 , v4 } be a basis of a vector space V and let w1 , w2 , w3 ∈ V be linearly independent. Show that at least one the sets {w1 , w2 , w3 , v1 }, {w1 , w2 , w3 , v2 }, {w1 , w2 , w3 , v3 }, {w1 , w2 , w3 , v4 } is a basis of V. Exercise 1.65. Let {v1 , v2 , v3 , v4 } be a basis of a vector space V. If w1 = v1 + 3v2 + 4v3 − v4 w2 = 2v1 + v2 + 3v3 + 3v4 w3 = v1 + v2 + 2v3 + v4 , determine the dimension of Span{w1 , w2 , w3 } and find a basis of this subspace. Exercise 1.66. Show that the set U = {p ∈ Pn (K) : p(1) = p′′′ (1) = 0} is a subspace of Pn (K). Determine the dimension of U, find a basis of U, and extend this basis to a basis of Pn (K). Exercise 1.67. If V = {p ∈ Pn (K) : p(1) = p(2) = 0} and W = {p ∈ Pn (K) : p(1) = p(3) = 0}, describe V + W and verify that dim(V + W) = dim V + dim W − dim(V ∩ W).
Exercise 1.68. Let {v1 , v2 , v3 , v4 , v5 } be linearly independent vectors in a vector space V and let w1 , w2 , w3 ∈ Span{v1 , v2 , v3 , v4 , v5 }. We assume that w1 = a1 v1 + a2 v2 + a3 v3 + a4 v4 + a5 v5 w2 = b1 v1 + b2 v2 + b3 v3 + b4 v4 + b5 v5 w3 = c1 v1 + c2 v2 + c3 v3 + c4 v4 + c5 v5 and that the reduced echelon form of a1 a2 b1 b2 c1 c2 is
the matrix a3 a4 a5 a3 b 4 b 5 c3 c4 c5
1 t 0 0 x 0 0 1 0 y . 0 0 0 1 z
Show that {w1 , w2 , w3 , v2 , v5 } is a basis of Span{v1 , v2 , v3 , v4 , v5 }. Exercise 1.69. Show that dim Mn×n (K) = dim UDn×n + dim Sn×n (K) − dim Dn×n (K). Exercise 1.70. Show that the set U = {p ∈ P3 (K) : p(1) = p(2) = 0} is a subspace of P3 (K), determine the dimension of U, find a basis of U, and extend that basis to a basis of P3 (K). Exercise 1.71. Let V = {p ∈ Pn (K) : p(1) = p(−1) = 0} and W = {p ∈ Pn (K) : p(i) = p(−i) = 0}, where n ≥ 4. Show that V and W are subspaces of Pn (K), V + W = Pn (K), and dim(V + W) = dim V + dim W − dim(V ∩ W).
1.7.6 Change of basis
Exercise 1.72. Find the change of coordinates matrix from {w1 , w2 } to {w3 , w4 } defined in Exercise 1.42. Exercise 1.73. Show that {t − a, t(t − a), t2 (t − a)} and {t − a, (t − a)2 , (t − a)3 } are bases in the vector subspace {p ∈ P4 (K) : p(a) = 0} and find the change of coordinates matrix from {t − a, (t − a)2 , (t − a)3 } to {t − a, t(t − a), t(t − a)2 }. Exercise 1.74. Let {v1 , v2 , v3 , v4 } be a basis of a vector space V. If w1 = v1 + 2v2 + 4v3 + v4 w2 = 2v1 + 4v2 + 7v3 + 3v4 w3 = v1 + 2v2 + 5v4 w4 = v3 − v4 , show that {w1 , w2 } and {w3 , w4 } are bases of the same subspace and find the change of coordinates matrix from {w3 , w4 } to {w1 , w2 } and from {w1 , w2 } to {w3 , w4 }.
Chapter 2

Linear Transformations

Introduction

When limits are introduced in calculus, one of the first properties of limits that we learn is
$$\lim_{x\to a}\bigl(f(x) + g(x)\bigr) = \lim_{x\to a} f(x) + \lim_{x\to a} g(x) \quad\text{and}\quad \lim_{x\to a} cf(x) = c\lim_{x\to a} f(x),$$
where c is an arbitrary constant. These properties are then used in a more general version, namely,
$$\lim_{x\to a}\bigl(c_1 f_1(x) + \cdots + c_n f_n(x)\bigr) = c_1\lim_{x\to a} f_1(x) + \cdots + c_n\lim_{x\to a} f_n(x),$$
where c1 , . . . , cn are arbitrary constants. Then we see a similar property for derivatives
$$\frac{d}{dx}\bigl(c_1 f_1(x) + \cdots + c_n f_n(x)\bigr) = c_1\frac{d}{dx} f_1(x) + \cdots + c_n\frac{d}{dx} f_n(x)$$
and integrals
$$\int_a^b \bigl(c_1 f_1(x) + \cdots + c_n f_n(x)\bigr)\,dx = c_1\int_a^b f_1(x)\,dx + \cdots + c_n\int_a^b f_n(x)\,dx.$$
This property is referred to as linearity. When matrix multiplication is introduced in an introductory matrix linear algebra course, again one of the first properties we mention is linearity of matrix multiplication. Linearity of functions between vector spaces is one of the fundamental ideas of linear algebra. In this chapter we study properties of linear functions in the abstract setting of vector spaces.
2.1 Basic properties
Definition 2.1.1. Let V and W be vector spaces. A function f : V → W is called linear if it satisfies the following two conditions (a) f (x + y) = f (x) + f (y) for every x, y ∈ V; (b) f (αx) = αf (x) for every x ∈ V and every α ∈ K. Linear functions from V to W are called linear transformations. Note that, if f : V → W is a linear transformation, then f (α1 x1 + · · · + αj xj ) = α1 f (x1 ) + · · · + αj f (xj ) for any vectors x1 , . . . , xj ∈ V and any numbers α1 , . . . , αj ∈ K. Proposition 2.1.2. If f : V → W is a linear transformation, then (a) f (0) = 0; (b) f (−v) = −f (v). Proof. To prove (a) it suffices to note that f (0) = f (0 + 0) = f (0) + f (0) and to prove (b) it suffices to note that 0 = f (0) = f (v − v) = f (v) + f (−v).
Example 2.1.3. Let V be a vector space. Show that the function Id : V → V defined by Id(x) = x is linear. Solution. We have Id(x + y) = x + y = Id(x) + Id(y) for every x, y ∈ V, and Id(αx) = αx = α Id(x) for every x ∈ V and every α ∈ K.
Example 2.1.4. Let V and W be vector spaces. If f : V → W and g : V → W are linear transformations, then the function f + g : V → W defined by (f + g)(x) = f (x) + g(x), for every x ∈ V, is a linear transformation. Solution. For any vectors x, y ∈ V and any number α ∈ K we have (f + g)(x + y) = f (x + y) + g(x + y) = f (x) + f (y) + g(x) + g(y) = f (x) + g(x) + f (y) + g(y) = (f + g)(x) + (f + g)(y) and (f + g)(αx) = f (αx) + g(αx) = αf (x) + αg(x) = α(f (x) + g(x)) = α(f + g)(x). This means that f + g is a linear transformation.
Example 2.1.5. Let V and W be vector spaces. If f : V → W is a linear transformation and α ∈ K, then the function αf : V → W defined by (αf )(x) = αf (x) for every x ∈ V, is a linear transformation. Solution. For any vectors x, y ∈ V and any number β ∈ K we have (αf )(x + y) = α(f (x + y)) = α(f (x) + f (y)) = αf (x) + αf (y) = (αf )(x) + (αf )(y) and (αf )(βx) = α(f (βx)) = αβf (x) = βαf (x) = β(αf )(x). This means that αf is a linear transformation.
The proof of the following important result is a consequence of the definitions.
Theorem 2.1.6. Let V and W be vector spaces. The set of all linear transformations from V to W with the operations f + g and αf defined as (f + g)(x) = f (x) + g(x) and (αf )(x) = αf (x) is a vector space.
Definition 2.1.7. The vector space of all linear transformations from a vector space V to a vector space W is denoted by L(V, W). A linear transformation f : V → V is also called an operator or an endomorphism. The vector space L(V, V) is often denoted by L(V).
Example 2.1.8. If A ∈ Mn×m (K), show that the function f : Km → Kn defined by f (x) = Ax is a linear transformation. Solution. From properties of matrix multiplication we get f (x + y) = A(x + y) = Ax + Ay = f (x) + f (y) and f (αx) = A(αx) = αAx = αf (x). This means that f is a linear transformation.
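A tiny numerical confirmation of this example (NumPy assumed, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))                       # fixes f(x) = Ax from R^4 to R^3
x, y, alpha = rng.standard_normal(4), rng.standard_normal(4), 2.5

print(np.allclose(A @ (x + y), A @ x + A @ y))        # True: f(x + y) = f(x) + f(y)
print(np.allclose(A @ (alpha * x), alpha * (A @ x)))  # True: f(alpha x) = alpha f(x)
```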
Example 2.1.9. Let V1 , . . . , Vn be subspaces of a vector space V. Show that the function f : V1 × · · · × Vn → V defined by f (x1 , . . . , xn ) = x1 + · · · + xn is a linear transformation. Solution. The proof is an immediate consequence of the definition of a linear transformation.
Another important property of linear transformations is that the composition of linear transformations is a linear transformation. Theorem 2.1.10. Let V, W, and X be vector spaces. If f : V → W and g : W → X are linear transformations, then the function g ◦ f : V → X is a linear transformation. In other words, if f ∈ L(V, W) and g ∈ L(W, X ), then g ◦ f ∈ L(V, X ).
Proof. If x, y ∈ V and α ∈ K, then g ◦ f (x + y) = g(f (x + y)) = g(f (x) + f (y))
= g(f (x)) + g(f (y)) = g ◦ f (x) + g ◦ f (y)
and g ◦ f (αx) = g(f (αx)) = g(αf (x)) = αg(f (x)) = αg ◦ f (x).
Example 2.1.11. Let f : K → K and g : K → K be the linear transformations defined by f (x) = αx and g(x) = βx, where α and β are numbers from K. The linear transformation g ◦ f is defined by g ◦ f (x) = (αβ)x. Note that in this case g ◦ f = f ◦ g. This equality is not generally true. The above example can be generalized as follows. Example 2.1.12. Let f : Km → Kn and g : Kn → Kp be the linear transformations defined by f (x) = Ax and g(y) = By, where A ∈ Mn×m (K) and B ∈ Mp×n (K). Then g ◦ f (x) = (BA)x for every x ∈ Km . In linear algebra it is customary to write the composition f ◦ g simply as f g and call it the product of f and g. Note that, if f and g are defined in terms of matrices, as in Example 2.1.12, then the product of f and g corresponds to the product of matrices. Composition of linear transformations has properties similar to multiplication. The main difference is that composition is not commutative, that is, in general f g is different from gf . Moreover, if f g is well-defined, it does not mean that gf makes sense. Proposition 2.1.13. Let V and W be vector spaces, let f, f ′ : V → W and g, g ′ : W → X be linear transformations, and let α ∈ K. Then (a) g(f + f ′ ) = gf + gf ′ ; (b) (g + g ′ )f = gf + g ′ f ; (c) (αg)f = g(αf ) = α(gf ).
Proof. The properties are direct consequences of the definitions. The proof is left as an exercise. The following theorem implies that a linear transformation f ∈ L(V, W) is completely determined by its values at elements of an arbitrary basis of V. This is very different from arbitrary functions from V to W and has important consequences. Theorem 2.1.14. Let V and W be vector spaces and let {v1 , . . . , vn } be a basis of V. For any w1 , . . . , wn ∈ W there is a unique linear transformation f : V → W such that f (v1 ) = w1 , . . . , f (vn ) = wn . Proof. Since {v1 , . . . , vn } is a basis of V, for every x ∈ V there are unique numbers x1 , . . . , xn ∈ K such that x = x1 v1 + · · · + xn vn . We define f (x) = f (x1 v1 + · · · + xn vn ) = x1 w1 + · · · + xn wn .
Since for any α ∈ K we have
αx = α(x1 v1 + · · · + xn vn ) = (αx1 )v1 + · · · + (αxn )vn , we also have f (αx) = (αx1 )w1 + · · · + (αxn )wn = α(x1 w1 + · · · + xn wn ) = αf (x). Now, if x, y ∈ V, then x = x1 v1 + · · · + xn vn
and y = y1 v1 + · · · + yn vn ,
for some numbers x1 , . . . , xn , y1 , . . . , yn ∈ K. Since
x + y = x1 v1 + · · · + xn vn + y1 v1 + · · · + yn vn = (x1 + y1 )v1 + · · · + (xn + yn )vn , we get f (x + y) = (x1 + y1 )w1 + · · · + (xn + yn )wn = x1 w1 + · · · + xn wn + y1 w1 + · · · + yn wn = f (x) + f (y).
This shows that the defined function f is a linear transformation. Clearly, f (v1 ) = w1 , . . . , f (vn ) = wn . Now we need to show that the function f defined above is the unique linear transformation such that f (v1 ) = w1 , . . . , f (vn ) = wn . Let g be any linear transformation such that g(v1 ) = w1 , . . . , g(vn ) = wn and let x ∈ V. Then x = x1 v1 + · · · + xn vn for some x1 , . . . , xn ∈ K and we have g(x) = g(x1 v1 + · · · + xn vn ) = x1 g(v1 ) + · · · + xn g(vn )
= x1 w1 + · · · + xn wn = f (x1 v1 + · · · + xn vn ) = f (x).
This proves the uniqueness and completes the proof.
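Theorem 2.1.14 can be illustrated concretely in Kn: if the basis vectors are the columns of an invertible matrix V and the prescribed values are the columns of a matrix W, then the unique linear transformation has standard matrix A with AV = W. A minimal Python sketch (the particular basis and target values below are arbitrary choices):

    import numpy as np

    # Basis {v1, v2, v3} of R^3 (columns of V) and prescribed values w_j = f(v_j) in R^2.
    V = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])      # columns are v1, v2, v3 (invertible, hence a basis)
    W = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 3.0]])      # columns are w1, w2, w3

    # The unique linear f with f(v_j) = w_j has (standard) matrix A satisfying A V = W.
    A = W @ np.linalg.inv(V)
    print(np.allclose(A @ V, W))          # True: f(v_j) = w_j for every j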
2.1.1 The kernel and range of a linear transformation
Definition 2.1.15. Let f : V → W be a linear transformation. The set ker f = {x ∈ V : f (x) = 0} is called the kernel of f .
Example 2.1.16. Consider an m×n matrix A with entries in K and the linear transformation f : Kn → Km defined by f (x) = Ax. Then ker f = N(A) (see Example 1.2.8).
Theorem 2.1.17. Let f : V → W be a linear transformation. Then ker f is a subspace of V. Proof. If v ∈ ker f and α ∈ K, then f (αv) = αf (v) = 0, and thus αv ∈ ker f . Similarly, if v1 , v2 ∈ ker f , then f (v1 + v2 ) = f (v1 ) + f (v2 ) = 0, and thus v1 + v2 ∈ ker f .
Example 2.1.18. Consider the linear transformation f : P2 (R) → R defined by
f (p) = ∫₀¹ p(t) dt.
Determine ker f and dim ker f .

Solution. An arbitrary element of P2 (R) is of the form at2 + bt + c where a, b, and c are real numbers. Since
∫₀¹ (at2 + bt + c) dt = a/3 + b/2 + c,
f (at2 + bt + c) = 0 if and only if a/3 + b/2 + c = 0 or, equivalently, c = −a/3 − b/2. Consequently, f (at2 + bt + c) = 0 if and only if
at2 + bt + c = at2 + bt − a/3 − b/2 = a(t2 − 1/3) + b(t − 1/2).
Hence
ker f = Span{t2 − 1/3, t − 1/2}
and dim ker f = 2.
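The computation in Example 2.1.18 can be confirmed symbolically; the following sketch uses Python with SymPy (an illustrative check, not part of the argument above).

    import sympy as sp

    t = sp.symbols('t')
    def ell(p):
        # the linear form p -> integral of p over [0, 1] on P2(R)
        return sp.integrate(p, (t, 0, 1))

    # The two polynomials claimed to span ker f integrate to 0 on [0, 1] ...
    print(ell(t**2 - sp.Rational(1, 3)), ell(t - sp.Rational(1, 2)))   # 0 0

    # ... and a general at^2 + bt + c lies in the kernel exactly when c = -a/3 - b/2.
    a, b, c = sp.symbols('a b c')
    print(sp.solve(sp.Eq(ell(a*t**2 + b*t + c), 0), c))                # [-a/3 - b/2]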
Example 2.1.19. Let f : V → V be a linear transformation such that (f − α Id)(f − β Id) = 0 (where Id : V → V is the identity linear transformation) and α, β ∈ K with α ≠ β. Show that V = ker(f − α Id) ⊕ ker(f − β Id).

Solution. First we note that

(f − α Id) − (f − β Id) = (β − α) Id.   (2.1)

Consequently, for any v ∈ V, we have (f − α Id)v − (f − β Id)v = (β − α)v and thus

v = (1/(β − α)) (f − α Id)v − (1/(β − α)) (f − β Id)v.

Since (f − α Id)v ∈ ker(f − β Id) and (f − β Id)v ∈ ker(f − α Id) (note that f − α Id and f − β Id commute, because both are polynomials in f ), we have V = ker(f − α Id) + ker(f − β Id). To finish the proof we have to show that the sum is direct. Indeed, if v ∈ ker(f − α Id) and v ∈ ker(f − β Id), then v = 0 by (2.1).
Definition 2.1.20. Let f : V → W be a linear transformation. The set ran f = {f (x) : x ∈ V} is called the range of f .
In other words, if f : V → W, then ran f is the set of all y ∈ W such that y = f (x) for some x ∈ V. We can also write ran f = f (V).
Theorem 2.1.21. Let f : V → W be a linear transformation. Then ran f is a subspace of W.
Proof. If w ∈ ran f , then w = f (v) for some v ∈ V. Then for any α ∈ K we have αw = αf (v) = f (αv), so αw ∈ ran f . Similarly, if w1 , w2 ∈ ran f , then w1 = f (v1 ) and w2 = f (v2 ) for some v1 , v2 ∈ V and we have w1 + w2 = f (v1 ) + f (v2 ) = f (v1 + v2 ), so w1 + w2 ∈ ran f .
Example 2.1.22. Consider an m×n matrix A with entries in K and the linear transformation f : Kn → Km defined by f (x) = Ax. Show that ran f = C(A) (see Example 1.2.7).

Solution. Let A = [ a1 . . . an ] where a1 , . . . , an are the columns of the matrix A. Then
ran f = {Ax : x ∈ Kn } = {x1 a1 + · · · + xn an : x1 , . . . , xn ∈ K} = C(A).
Theorem 2.1.23. A linear transformation f : V → W is injective if and only if ker f = {0}.
Proof. Since f (0) = 0, if f is injective, then the only v ∈ V such that f (v) = 0 is v = 0. This means that ker f = {0}. Now assume ker f = {0}. If f (v1 ) = f (v2 ) for some v1 , v2 ∈ V, then f (v1 − v2 ) = f (v1 ) − f (v2 ) = 0. So v1 − v2 ∈ ker f = {0}, which means that v1 − v2 = 0 or v1 = v2 . This shows that f is injective.
2.1.2 Projections
Projections on subspaces play an important role in linear algebra. In this section we discuss projections associated with direct sums. In Chapter 3 we will discuss orthogonal projections that are a special type of projections discussed in this section. Recall that, if V is a vector space and U and W are subspaces of V such that V = U ⊕ W, then for every v ∈ V there are unique u ∈ U and w ∈ W such that v = u + w. This property is essential for the following definition. Definition 2.1.24. Let V be a vector space and let U and W be subspaces of V such that V = U ⊕ W. The function f : V → V defined by f (u + w) = u, where u ∈ U and w ∈ W is called the projection on U along W. Note that, if f is the projection on U along W, then f 2 = f and Id −f is the projection on W along U. Indeed, if u ∈ U and w ∈ W, then f 2 (u + w) = f (f (u + w)) = f (u) = f (u + 0) = u = f (u + w) and (Id −f )(u + w) = u + w − f (u + w) = u + w − u = w.
Example 2.1.25. Let V be a vector space and let f : V → V be a linear transformation such that f 2 = f . Show that V = ran f ⊕ ker f and that f is the projection on ran f along ker f . Solution. For any v ∈ V we have v = f (v) + (v − f (v)) and f (v − f (v)) = f (v) − f (f (v)) = f (v) − f (v) = 0. Since f (v) ∈ ran f and v − f (v) ∈ ker f , this shows that V = ran f + ker f . We need to show that this sum is direct, that is, that the only vector that is in both ran f and ker f is the zero vector.
Suppose v ∈ ran f and v ∈ ker f . Since v ∈ ran f , there is a w ∈ V such that f (w) = v. Then v = f (w) = f (f (w)) = f (v) = 0, because v ∈ ker f . Clearly, f is the projection on ran f along ker f .
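A concrete (non-orthogonal) projection illustrates Example 2.1.25. The following Python sketch uses an arbitrary 2×2 idempotent matrix; it checks P² = P and decomposes a vector as a sum of an element of ran P and an element of ker P.

    import numpy as np

    # A projection in the sense of this section: P^2 = P.
    # Here ran P = span{(1,0)} and ker P = span{(1,-1)}.
    P = np.array([[1.0, 1.0],
                  [0.0, 0.0]])
    print(np.allclose(P @ P, P))             # True: P is idempotent

    v = np.array([3.0, 2.0])
    u, w = P @ v, v - P @ v                  # v = u + w with u in ran P, w in ker P
    print(u, w)                              # [5. 0.] [-2.  2.]
    print(np.allclose(P @ w, 0))             # True: w is in the kernel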
Example 2.1.26. Let V be a vector space and let f1 , . . . , fn : V → V be linear transformations such that fj fk = 0 for j ≠ k and f1 + · · · + fn = Id. Show that
(a) The linear transformations f1 , . . . , fn are projections;
(b) V = ran f1 ⊕ · · · ⊕ ran fn ;
(c) For every j ∈ {1, . . . , n} the transformation fj is the projection on ran fj along ran f1 ⊕ · · · ⊕ ran fj−1 ⊕ ran fj+1 ⊕ · · · ⊕ ran fn .

Solution. Let v be an arbitrary vector in V. Since v = f1 (v) + · · · + fn (v), for every j ∈ {1, . . . , n} we have fj (v) = fj f1 (v) + · · · + fj fn (v) = fj fj (v). This shows that fj fj = fj and thus, by Example 2.1.25, fj is the projection on ran fj along ker fj .

To prove (b) we first note that, since v = f1 (v) + · · · + fn (v) for every v ∈ V, we have V = f1 (V ) + · · · + fn (V ). We need to show that this sum is direct. If f1 (v) + · · · + fn (v) = 0, then fj f1 (v) + · · · + fj fn (v) = fj (0) = 0. On the other hand, since fj fk = 0 for j ≠ k, we have fj f1 (v) + · · · + fj fn (v) = fj fj (v) = fj (v) and consequently fj (v) = 0 for every j ∈ {1, . . . , n}. This shows that the sum f1 (V ) + · · · + fn (V ) is direct.
Finally, to prove (c) we take a v ∈ ker fj . Then we have
v = f1 (v) + · · · + fn (v) = f1 (v) + · · · + fj−1 (v) + fj+1 (v) + · · · + fn (v)
and thus v ∈ f1 (V ) ⊕ · · · ⊕ fj−1 (V ) ⊕ fj+1 (V ) ⊕ · · · ⊕ fn (V ). On the other hand, since fj fk = 0 for j ≠ k, every v ∈ f1 (V ) ⊕ · · · ⊕ fj−1 (V ) ⊕ fj+1 (V ) ⊕ · · · ⊕ fn (V ) is in ker fj . Consequently,
ker fj = f1 (V ) ⊕ · · · ⊕ fj−1 (V ) ⊕ fj+1 (V ) ⊕ · · · ⊕ fn (V ),
completing the proof, by Example 2.1.25.
2.1.3 The Rank-Nullity Theorem
The main result of this section is an important theorem that connects the dimension of the domain of a linear transformation with the dimensions of its range and the subspace on which the transformation is zero, that is, the kernel of the transformation. We start with an example that will motivate the result.
Example 2.1.27. Let f : V → W be a linear transformation. If {v1 , v2 , v3 } is a basis of ker f and {w1 , w2 } is a basis of ran f , show that dim V = 5. Solution. For any v ∈ V there are x1 , x2 ∈ K such that f (v) = x1 w1 + x2 w2 . If u1 and u2 are vectors in V such that f (u1 ) = w1 and f (u2 ) = w2 , then f (v) = x1 f (u1 ) + x2 f (u2 ) = f (x1 u1 + x2 u2 ) and thus f (v − (x1 u1 + x2 u2 )) = 0. This means that v − (x1 u1 + x2 u2 ) ∈ ker f and consequently there are y1 , y2 , y3 ∈ K such that v − (x1 u1 + x2 u2 ) = y1 v1 + y2 v2 + y3 v3 or v = x1 u1 + x2 u2 + y1 v1 + y2 v2 + y3 v3 .
Since v is an arbitrary vector in V, this shows that Span{u1 , u2 , v1 , v2 , v3 } = V. To finish the proof we have to show that the vectors u1 , u2 , v1 , v2 , v3 are linearly independent. To this end suppose that x1 u1 + x2 u2 + y1 v1 + y2 v2 + y3 v3 = 0.
(2.2)
Then x1 f (u1 ) + x2 f (u2 ) + y1 f (v1 ) + y2 f (v2 ) + y3 f (v3 ) = 0 and consequently x1 w1 + x2 w2 = 0, which gives us x1 = x2 = 0, because the vectors w1 and w2 are linearly independent. Now, since x1 = x2 = 0, equation (2.2) becomes y1 v1 + y2 v2 + y3 v3 = 0, which gives us y1 = y2 = y3 = 0, because the vectors v1 , v2 , v3 are linearly independent.

It turns out that the property of the linear transformation in the example above holds for all linear transformations.

Theorem 2.1.28 (Rank-Nullity Theorem). Let V be a finite dimensional vector space and let f : V → W be a linear transformation. Then dim ker f + dim ran f = dim V.

Proof. The proof is a generalization of the argument presented in Example 2.1.27. Let {v1 , . . . , vm } be a basis of ker f and let {w1 , . . . , wn } be a basis of ran f . Then there are u1 , . . . , un ∈ V such that f (uj ) = wj for 1 ≤ j ≤ n. If v ∈ V, then there are x1 , . . . , xn ∈ K such that f (v) = x1 w1 + · · · + xn wn = x1 f (u1 ) + · · · + xn f (un ) = f (x1 u1 + · · · + xn un ) and thus f (v − (x1 u1 + · · · + xn un )) = 0. Consequently v − (x1 u1 + · · · + xn un ) ∈ ker f and v = x1 u1 + · · · + xn un + y1 v1 + · · · + ym vm for some y1 , . . . , ym ∈ K. This shows that Span{u1 , . . . , un , v1 , . . . , vm } = V. To finish the proof we have to show that the vectors u1 , . . . , un , v1 , . . . , vm are linearly independent. Suppose that x1 u1 + · · · + xn un + y1 v1 + · · · + ym vm = 0.
(2.3)
By applying f to the above equation and using the fact that {v1 , . . . , vm } is a basis of ker f we obtain x1 w1 + · · · + xn wn = 0, which gives us x1 = · · · = xn = 0, because the vectors w1 , . . . , wn are linearly independent. Now equation (2.3) reduces to y1 v1 + · · · + ym vm = 0, which gives us y1 = · · · = ym = 0 in view of linear independence of vectors v1 , . . . , vm . The above theorem is called the Rank-Nullity Theorem because the number dim ran f is called the rank of f and the number dim ker f is called the nullity of f .
Example 2.1.29. Let f : P5 (R) → P5 (R) be the linear transformation defined by f (p) = p′′′ . Determine ker f , dim ker f , ran f , dim ran f , and verify the Rank-Nullity Theorem. Solution. If p′′′ = 0, then p(t) = at2 + bt + c for some a, b, c ∈ R. Consequently ker f = Span{1, t, t2 } and dim ker f = 3. On the other hand, since (a5 t5 + a4 t4 + a3 t3 + a2 t2 + a1 t + a0 )′′′ = 60a5 t2 + 24a4 t + 6a3 , ran f = Span{1, t, t2 } and dim ran f = 3. As stated in the Rank-Nullity Theorem, dim ker f + dim ran f = 6 = dim P5 (R).
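Example 2.1.29 can be checked numerically by writing the matrix of p ↦ p′′′ relative to the basis {1, t, t2 , t3 , t4 , t5 } and computing its rank. A minimal Python sketch (illustrative only):

    import numpy as np

    # Matrix of f(p) = p''' on P5(R) relative to the basis {1, t, t^2, t^3, t^4, t^5}:
    # f(t^k) = k(k-1)(k-2) t^(k-3), so only three columns are nonzero.
    A = np.zeros((6, 6))
    A[0, 3], A[1, 4], A[2, 5] = 6, 24, 60

    rank = np.linalg.matrix_rank(A)          # dim ran f
    nullity = A.shape[1] - rank              # dim ker f, by the Rank-Nullity Theorem
    print(rank, nullity, rank + nullity)     # 3 3 6 = dim P5(R)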
Example 2.1.30. Let f : P5 (R) → R2 be the linear transformation defined by f (p) = (p′ (5), p(5))ᵀ. Determine ker f , dim ker f , ran f , dim ran f , and verify the Rank-Nullity Theorem.

Solution. If p ∈ P5 (R) and p′ (5) = p(5) = 0, then p(t) = (at3 + bt2 + ct + d)(t − 5)2 for some a, b, c, d ∈ R. Consequently ker f = Span{(t − 5)2 , t(t − 5)2 , t2 (t − 5)2 , t3 (t − 5)2 }
and dim ker f = 4. On the other hand, since
f (p) = (p′ (5), p(5))ᵀ = p′ (5) (1, 0)ᵀ + p(5) (0, 1)ᵀ,
ran f = Span{(1, 0)ᵀ, (0, 1)ᵀ} = R2 and dim ran f = 2. As stated in the Rank-Nullity Theorem, dim ker f + dim ran f = 6 = dim P5 (R).
The following simple consequence of the Rank-Nullity Theorem is often used. Corollary 2.1.31. If f : V → K is a nonzero linear transformation, then dim ker f = dim V − 1.
2.2 Isomorphisms
Consider the vector space Pn (K) of all functions p : K → K of the form p(t) = a0 + a1 t + · · · + an tn where a0 , a1 , . . . , an ∈ K. Since the polynomial p(t) = a0 + a1 t + · · · + an tn is completely determined by the numbers a0 , a1 , . . . , an , one could say that the space Pn (K) can be “identified” with the space Kn+1 . We expect the vector spaces Pn (K) and Kn+1 to have the same algebraic properties. We could say that, from the point of view of linear algebra, Pn (K) and Kn+1 are two “representations” of the same vector space. This point of view is important in linear algebra. In this section we will make this idea precise and examine some of its consequences. Definition 2.2.1. Let V and W be vector spaces. A linear transformation f : V → W that is both injective and surjective is called an isomorphism of vector spaces or simply an isomorphism. Vector spaces V and W are called isomorphic if there is an isomorphism f : V → W.
Example 2.2.2. The vector spaces Pn (K) and Kn+1 are isomorphic. Indeed,
the function
f (a0 + a1 t + · · · + an tn ) = (a0 , a1 , . . . , an )ᵀ
is an isomorphism from Pn (K) onto Kn+1 .
Theorem 2.2.3. Let V and W be vector spaces. If f : V → W is an isomorphism, then its inverse f −1 : W → V is a linear transformation. Proof. Let w ∈ W and α ∈ K. Since f is surjective, w = f (v) for some v ∈ V. From linearity of f we get f −1 (αw) = f −1 (αf (v)) = f −1 (f (αv)) = αv = αf −1 (w). Now, let w1 , w2 ∈ W. Since f is surjective, w1 = f (v1 ) and w2 = f (v2 ) for some v1 , v2 ∈ V and, from linearity of f , we get f −1 (w1 + w2 ) = f −1 (f (v1 ) + f (v2 )) = f −1 (f (v1 + v2 )) = v1 + v2 = f −1 (w1 ) + f −1 (w2 ).
Corollary 2.2.4. The inverse of an isomorphism is an isomorphism.
Since the function f in the definition of an isomorphism maps V onto W, it may seem that the role of V in the definition is different from the role of W, but in view of the above corollary we know that it is not the case. In the next theorem we characterize isomorphisms in terms of bases. Theorem 2.2.5. Let V and W be vector spaces and let {v1 , . . . , vn } be a basis of V. A linear transformation f : V → W is an isomorphism if and only if the set {f (v1 ), . . . , f (vn )} is a basis of W. Proof. Assume that {f (v1 ), . . . , f (vn )} is a basis of W. We first show that f is injective. If f (x1 v1 + · · · + xn vn ) = 0 for some x1 , . . . , xn ∈ K, then x1 f (v1 ) + · · · + xn f (vn ) = 0 and thus x1 = · · · = xn = 0. This means that ker f = 0 and consequently f is injective, by Theorem 2.1.23.
To show that f is surjective we consider an arbitrary w ∈ W. Then w = x1 f (v1 ) + · · · + xn f (vn ) for some x1 , . . . , xn ∈ K. Since w = x1 f (v1 ) + · · · + xn f (vn ) = f (x1 v1 + · · · + xn vn ), we have w ∈ ran f .
Now we assume that f is an isomorphism. If w ∈ W, then there is v ∈ V such that f (v) = w. Since v = x1 v1 + · · · + xn vn for some x1 , . . . , xn ∈ K, we have w = f (v) = f (x1 v1 + · · · + xn vn ) = x1 f (v1 ) + · · · + xn f (vn ). This shows that Span{f (v1 ), . . . , f (vn )} = W. To show that the vectors f (v1 ), . . . , f (vn ) are linearly independent we suppose that x1 f (v1 ) + · · · + xn f (vn ) = 0 for some x1 , . . . , xn ∈ K. Then f (x1 v1 + · · · + xn vn ) = 0 and thus x1 v1 + · · · + xn vn = 0, because ker f = 0. Since {v1 , . . . , vn } is a basis, we conclude that x1 = · · · = xn = 0. Consequently, the set {f (v1 ), . . . , f (vn )} is a basis of the vector space W.
As an immediate consequence of the above theorem we obtain the following important result.
Corollary 2.2.6. Finite dimensional vector spaces V and W are isomorphic if and only if dim V = dim W.
Example 2.2.7. Let V be a finite dimensional vector space and let U, W1 , and W2 be subspaces of V such that V = U ⊕ W1 = U ⊕ W2 . Then dim W1 = dim W2 , by Theorem 1.4.18. Consequently, the vector spaces W1 and W2 are isomorphic.
Isomorphic vector spaces have the same algebraic properties and, as we mentioned at the beginning of this section, from the point of view of linear algebra, isomorphic vector spaces can be thought of as different representations of the same vector space. The following corollary says that every vector space over K of dimension n is basically a version of Kn .
Corollary 2.2.8. Let {v1 , . . . , vn } be a basis of a vector space V. The function f : V → Kn defined by
f (x1 v1 + · · · + xn vn ) = (x1 , . . . , xn )ᵀ,
for all x1 , . . . , xn ∈ K, is an isomorphism.
Proof. We have f (v1 ) = e1 , . . . , f (vn ) = en , where e1 , . . . , en is the standard basis of Kn .
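In Kn the coordinate isomorphism of Corollary 2.2.8 amounts to solving a linear system: the coordinates of x relative to a basis are obtained by solving V c = x, where the basis vectors are the columns of V. A minimal Python sketch (the basis and the vector are arbitrary choices):

    import numpy as np

    V = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])          # columns are the basis vectors v1, v2, v3
    x = np.array([2.0, 3.0, 1.0])
    c = np.linalg.solve(V, x)                # the coordinate vector f(x)
    print(c, np.allclose(V @ c, x))          # reconstruction check: True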
Example 2.2.9. Let V1 , . . . , Vn be subspaces of a vector space V. Show that the function f : V1 × · · · × Vn → V defined by f (v1 , . . . , vn ) = v1 + · · · + vn is an isomorphism if and only if V1 ⊕ · · · ⊕ Vn = V. Solution. The function f is clearly a linear transformation. If f is an isomorphism, then ran f = V and ker f = {0}. Since ran f = V, we have V1 + · · · + Vn = V. Since ker f = {0}, v1 + · · · + vn = 0 implies v1 = · · · = vn = 0, which means that the sum V1 + · · · + Vn is direct. Now we assume that V1 ⊕ · · · ⊕ Vn = V. Then ran f = V and ker f = {0}, so f is an isomorphism.
Example 2.2.10. Let V be an arbitrary vector space. Show that V and L(K, V) (the vector space of all linear transformations f : K → V) are isomorphic. Solution. For v ∈ V let tv : K → V be the function defined by tv (x) = xv. Note that tv ∈ L(K, V). We will show that f : V → L(K, V) defined by f (v) = tv
is an isomorphism. It is easy to verify that f is a linear injection. We need to show that f is a surjection. Consider an arbitrary s ∈ L(K, V). Then s(x) = s(x · 1) = xs(1). If we let v = s(1), then we have s = tv = f (v). Consequently, ran f = L(K, V).

Note that, as a particular case, we get that K and L(K, K) = L(K) are isomorphic.

We close this section with a theorem that gives several useful characterizations of isomorphisms from a vector space to itself.

Theorem 2.2.11. Let V be a finite dimensional vector space and let f : V → V be a linear transformation. The following conditions are equivalent:
(a) f is an isomorphism;
(b) ker f = {0};
(c) ran f = V;
(d) f is left invertible, that is, there is a function g : V → V such that gf = Id;
(e) f is right invertible, that is, there is a function g : V → V such that f g = Id.
Proof. Clearly (a) implies each of the remaining four conditions. Let {v1 , . . . , vn } be a basis of V. Assume ker f = {0}. If x1 f (v1 ) + · · · + xn f (vn ) = 0, then f (x1 v1 + · · · + xn vn ) = 0 and consequently x1 v1 + · · · + xn vn = 0, since ker f = {0}. Hence x1 = · · · = xn = 0, because the vectors v1 , . . . , vn are linearly independent. This proves that the vectors f (v1 ), . . . , f (vn ) are linearly independent. Consequently {f (v1 ), . . . , f (vn )} is a basis of V and we have ran f = V. This shows that (b) implies (c). If ran f = V, then Span{f (v1 ), . . . , f (vn )} = V and thus {f (v1 ), . . . , f (vn )} is a basis of V. Hence ker f = {0}, by the Rank-Nullity Theorem. This shows that (c) implies (b).
Since (b) and (c) are equivalent and (b) and (c) together are equivalent to (a), all three conditions are equivalent. Now we show that (d) implies (b). Indeed, if there is a function g : V → V such that gf = Id and f (x) = f (y), then x = g(f (x)) = g(f (y)) = y. This shows that f is injective and thus ker f = {0}. Finally, we show that (e) implies (c). Assume there is a function g : V → V such that f g = Id. Then for every x ∈ V we have x = f (g(x)) and thus ran f = V.
2.3 Linear transformations and matrices

2.3.1 The matrix of a linear transformation
At the beginning of this chapter we observed that an n × m matrix with entries in K defines a linear transformation from Km to Kn . It turns out that all linear transformations between finite dimensional spaces can be described in terms of matrix multiplication. We begin with an example.

Example 2.3.1. Let {v1 , v2 , v3 } and {w1 , w2 } be bases of V and W, respectively, and let f : V → W be the linear transformation defined by
f (v1 ) = a11 w1 + a21 w2 , f (v2 ) = a12 w1 + a22 w2 , f (v3 ) = a13 w1 + a23 w2 .
Show that for every v = x1 v1 + x2 v2 + x3 v3 ∈ V we have f (v) = y1 w1 + y2 w2 , where the numbers y1 and y2 are given by the equality
[ y1 ]   [ a11 a12 a13 ] [ x1 ]
[ y2 ] = [ a21 a22 a23 ] [ x2 ] .
                         [ x3 ]
Solution. Since
f (v) = f (x1 v1 + x2 v2 + x3 v3 ) = x1 f (v1 ) + x2 f (v2 ) + x3 f (v3 )
= x1 (a11 w1 + a21 w2 ) + x2 (a12 w1 + a22 w2 ) + x3 (a13 w1 + a23 w2 )
= (a11 x1 + a12 x2 + a13 x3 )w1 + (a21 x1 + a22 x2 + a23 x3 )w2 ,
we have
[ y1 ]   [ a11 a12 a13 ] [ x1 ]
[ y2 ] = [ a21 a22 a23 ] [ x2 ] .
                         [ x3 ]
The observation in the above example can be generalized to an arbitrary linear transformation between finite dimensional spaces.

Theorem 2.3.2. Let {v1 , . . . , vm } and {w1 , . . . , wn } be bases of vector spaces V and W, respectively. For every linear transformation f : V → W there is a unique n × m matrix
[ a11 . . . a1m ]
[  ⋮         ⋮  ]
[ an1 . . . anm ]
such that for every v = x1 v1 + · · · + xm vm ∈ V we have
f (x1 v1 + · · · + xm vm ) = y1 w1 + · · · + yn wn ,
where the numbers y1 , . . . , yn ∈ K are given by
[ y1 ]   [ a11 . . . a1m ] [ x1 ]
[ ⋮  ] = [  ⋮         ⋮  ] [ ⋮  ] .
[ yn ]   [ an1 . . . anm ] [ xm ]

Proof. For every 1 ≤ j ≤ m there are unique a1j , . . . , anj ∈ K such that
f (vj ) = a1j w1 + · · · + anj wn .
If v = x1 v1 + · · · + xm vm is an arbitrary vector in V, then
f (v) = f (x1 v1 + · · · + xm vm ) = x1 f (v1 ) + · · · + xm f (vm )
= x1 (a11 w1 + · · · + an1 wn ) + · · · + xm (a1m w1 + · · · + anm wn )
= (a11 x1 + · · · + a1m xm )w1 + · · · + (an1 x1 + · · · + anm xm )wn .
Consequently, if f (x1 v1 + · · · + xm vm ) = y1 w1 + · · · + yn wn , then
yk = ak1 x1 + · · · + akm xm ,
for all 1 ≤ k ≤ n, which is equivalent to
[ y1 ]   [ a11 . . . a1m ] [ x1 ]
[ ⋮  ] = [  ⋮         ⋮  ] [ ⋮  ] .
[ yn ]   [ an1 . . . anm ] [ xm ]
Definition 2.3.3. Let f : V → W be a linear transformation and let B = {v1 , . . . , vm } and C = {w1 , . . . , wn } be bases of V and W, respectively. The matrix
[ a11 . . . a1m ]
[  ⋮         ⋮  ]
[ an1 . . . anm ]
in Theorem 2.3.2 is called the matrix of f relative to the bases {v1 , . . . , vm } and {w1 , . . . , wn } and is denoted by fB→C .
Example 2.3.4. Let f : V → W be a linear transformation and let B = {v1 , . . . , vm } and C = {w1 , . . . , wn } be bases of V and W, respectively. If A = fB→C is the matrix of f relative to the bases B and C, show that there is an isomorphism g : ker f → N(A) such that
g(x1 v1 + · · · + xm vm ) = (x1 , . . . , xm )ᵀ
whenever x1 v1 + · · · + xm vm ∈ ker f .
Solution. It suffices to observe that x1 v1 + · · · + xm vm ∈ ker f is equivalent to
[ a11 . . . a1m ] [ x1 ]   [ 0 ]
[  ⋮         ⋮  ] [ ⋮  ] = [ ⋮ ] ,
[ an1 . . . anm ] [ xm ]   [ 0 ]
by Theorem 2.3.2, and that the function h : ker f → Km defined by
h(x1 v1 + · · · + xm vm ) = (x1 , . . . , xm )ᵀ
is an isomorphism. Consequently, g : ker f → N(A) is an isomorphism.
Example 2.3.5. Let V be a vector space with a basis {v1 , v2 , v3 , v4 } and let W be a vector space with a basis {w1 , w2 , w3 }. Let f : V → W be a linear transformation such that the matrix of f relative to the bases {v1 , v2 , v3 , v4 } and {w1 , w2 , w3 } is
[ 1 2 2 1 ]
[ 2 1 3 5 ]
[ 4 5 7 7 ] .
Find a basis of ker f .
Solution. It is easy to verify that
{ (4, 1, −3, 0)ᵀ , (−7, 0, 3, 1)ᵀ }
is a basis of
    [ 1 2 2 1 ]
N ( [ 2 1 3 5 ] ) .
    [ 4 5 7 7 ]
Consequently,
{4v1 + v2 − 3v3 , −7v1 + 3v3 + v4 }
is a basis of ker f .
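The null space computation in Example 2.3.5 can be checked with a short symbolic computation; the following Python/SymPy sketch is illustrative only.

    import sympy as sp

    A = sp.Matrix([[1, 2, 2, 1],
                   [2, 1, 3, 5],
                   [4, 5, 7, 7]])

    # A basis of N(A); sympy returns the basis obtained from the reduced row echelon form.
    for v in A.nullspace():
        print(v.T)

    # The vectors used in the example span the same space: both products are zero vectors.
    print(A * sp.Matrix([4, 1, -3, 0]), A * sp.Matrix([-7, 0, 3, 1]))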
Example 2.3.6. Let {v1 , . . . , vm } and {w1 , . . . , wn } be bases of vector spaces V and W, respectively. If f : V → W is the linear transformation such that the matrix of f relative to the bases {v1 , . . . , vm } and {w1 , . . . , wn } is
    [ a11 . . . a1m ]
A = [  ⋮         ⋮  ] ,
    [ an1 . . . anm ]
show that there is an isomorphism g : ran f → C(A) such that
g(f (vj )) = (a1j , . . . , anj )ᵀ ,
for all 1 ≤ j ≤ m.

Solution. We define g : ran f → C(A) by
g(f (x1 v1 + · · · + xm vm )) = x1 (a11 , . . . , an1 )ᵀ + · · · + xm (a1m , . . . , anm )ᵀ .
To show that g is well-defined assume that f (x1 v1 + · · · + xm vm ) = f (x′1 v1 + · · · + x′m vm ). Then
f (x1 v1 + · · · + xm vm ) = y1 w1 + · · · + yn wn = f (x′1 v1 + · · · + x′m vm ),
for some y1 , . . . , yn ∈ K. By Theorem 2.3.2, this is equivalent to
[ a11 . . . a1m ] [ x1 ]   [ y1 ]   [ a11 . . . a1m ] [ x′1 ]
[  ⋮         ⋮  ] [ ⋮  ] = [ ⋮  ] = [  ⋮         ⋮  ] [  ⋮  ] .
[ an1 . . . anm ] [ xm ]   [ yn ]   [ an1 . . . anm ] [ x′m ]
Consequently,
x1 (a11 , . . . , an1 )ᵀ + · · · + xm (a1m , . . . , anm )ᵀ = x′1 (a11 , . . . , an1 )ᵀ + · · · + x′m (a1m , . . . , anm )ᵀ ,
proving that the function g is well-defined. Clearly, g is a linear transformation.

If g(f (x1 v1 + · · · + xm vm )) = 0, then
0 = g(f (x1 v1 + · · · + xm vm )) = x1 (a11 , . . . , an1 )ᵀ + · · · + xm (a1m , . . . , anm )ᵀ = A (x1 , . . . , xm )ᵀ = (y1 , . . . , yn )ᵀ ,
where y1 , . . . , yn ∈ K are such that f (x1 v1 + · · · + xm vm ) = y1 w1 + · · · + yn wn . Hence
f (x1 v1 + · · · + xm vm ) = y1 w1 + · · · + yn wn = 0,
proving that g is injective.

Finally, if (y1 , . . . , yn )ᵀ ∈ C(A), then
(y1 , . . . , yn )ᵀ = x1 (a11 , . . . , an1 )ᵀ + · · · + xm (a1m , . . . , anm )ᵀ
for some x1 , . . . , xm ∈ K and, consequently,
(y1 , . . . , yn )ᵀ = g(f (x1 v1 + · · · + xm vm )),
proving that g is surjective.
Example 2.3.7. Let V be a vector space with a basis {v1 , v2 , v3 , v4 } and let W be a vector space with a basis {w1 , w2 , w3 }. If f : V → W is a linear transformation such that the matrix of f relative to the bases {v1 , v2 , v3 , v4 } and {w1 , w2 , w3 } is
[ 1 2 2 1 ]
[ 2 1 3 5 ]
[ 4 5 7 7 ] ,
find a basis of ran f .

Solution. Since the reduced row echelon form of the matrix
[ 1 2 2 1 ]
[ 2 1 3 5 ]
[ 4 5 7 7 ]
is
[ 1 0 4/3  3 ]
[ 0 1 1/3 −1 ]
[ 0 0  0   0 ] ,
the pivot columns are the first two columns and, consequently, the set
{w1 + 2w2 + 4w3 , 2w1 + w2 + 5w3 }
is a basis of ran f .
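The reduced row echelon form used in Example 2.3.7 can be computed symbolically. The following Python/SymPy sketch prints the RREF and the indices of the pivot columns; the corresponding columns of the original matrix give the column-space basis quoted above.

    import sympy as sp

    A = sp.Matrix([[1, 2, 2, 1],
                   [2, 1, 3, 5],
                   [4, 5, 7, 7]])

    R, pivots = A.rref()
    print(R)         # the reduced row echelon form computed in the example
    print(pivots)    # (0, 1): the first two columns are pivot columns

    # The corresponding columns of A itself span C(A) = ran f.
    print([A.col(j).T for j in pivots])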
Theorem 2.3.8. Let B, C, and D be bases of vector spaces V, W, and X , respectively, and let f : V → W and g : W → X be linear transformations. If A is the matrix of f relative to the bases B and C and B is the matrix of g relative to the bases C and D, then the matrix BA is the matrix of gf relative to the bases B and D. In other words,
(gf )B→D = gC→D fB→C .

Proof. If B = {v1 , . . . , vm }, C = {w1 , . . . , wn }, D = {x1 , . . . , xp }, and
           [ a11 . . . a1m ]                 [ b11 . . . b1n ]
fB→C = A = [  ⋮         ⋮  ]  and  gC→D = B = [  ⋮         ⋮  ] ,
           [ an1 . . . anm ]                 [ bp1 . . . bpn ]
where A ∈ Mn×m (K) and B ∈ Mp×n (K), then
f (vj ) = a1j w1 + · · · + anj wn ,
for all 1 ≤ j ≤ m, and
g(wk ) = b1k x1 + · · · + bpk xp ,
for all 1 ≤ k ≤ n. Since
g(f (vj )) = a1j g(w1 ) + · · · + anj g(wn )
= a1j (b11 x1 + · · · + bp1 xp ) + · · · + anj (b1n x1 + · · · + bpn xp )
= (b11 a1j + · · · + b1n anj )x1 + · · · + (bp1 a1j + · · · + bpn anj )xp ,
and the j-th column of the matrix
[ b11 . . . b1n ] [ a11 . . . a1m ]
[  ⋮         ⋮  ] [  ⋮         ⋮  ]
[ bp1 . . . bpn ] [ an1 . . . anm ]
is
[ b11 a1j + · · · + b1n anj ]
[            ⋮              ] ,
[ bp1 a1j + · · · + bpn anj ]
the matrix of gf relative to the bases B and D is BA.
Corollary 2.3.9. Let B and C be bases of a vector space V. Then the matrix of the identity function Id : V → V relative to the bases B and C is invertible and its inverse is the matrix of the identity function relative to the bases C and B. Proof. This is an immediate consequence of Theorem 2.3.8 because IdC→B IdB→C = IdB→B
and
IdB→C IdC→B = IdC→C .
Example 2.3.10. We consider the vector space M2×2 (K) and the bases
B = { [ 1 0 ; 0 0 ], [ 0 1 ; 0 0 ], [ 0 0 ; 1 0 ], [ 0 0 ; 0 1 ] }
and
C = { [ 1 1 ; 1 1 ], [ 1 1 ; 0 1 ], [ 1 0 ; 0 1 ], [ 0 0 ; 0 1 ] }.
Determine the matrix of Id relative to the bases B and C.

Solution. Since the matrix of Id relative to the bases C and B is
[ 1 1 1 0 ]
[ 1 1 0 0 ]
[ 1 0 0 0 ]
[ 1 1 1 1 ] ,
the matrix of Id relative to the bases B and C is
[ 1 1 1 0 ]⁻¹   [  0  0  1 0 ]
[ 1 1 0 0 ]   = [  0  1 −1 0 ]
[ 1 0 0 0 ]     [  1 −1  0 0 ]
[ 1 1 1 1 ]     [ −1  0  0 1 ] .
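By Corollary 2.3.9 the two matrices in Example 2.3.10 are inverses of each other, which is easy to confirm numerically. A minimal Python sketch:

    import numpy as np

    # Matrix of Id relative to the bases C and B from Example 2.3.10
    # (its columns are the B-coordinates of the elements of C).
    P = np.array([[1.0, 1.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0, 1.0]])

    # The matrix of Id relative to the bases B and C is the inverse.
    Q = np.linalg.inv(P)
    print(np.round(Q))                       # agrees with the matrix computed above
    print(np.allclose(P @ Q, np.eye(4)))     # True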
If {v1 , . . . , vn } is a basis of a vector space V and f : V → V is a linear transformation, then there is a unique n × n matrix A such that for every v = x1 v1 + · · · + xn vn ∈ V we have
f (x1 v1 + · · · + xn vn ) = y1 v1 + · · · + yn vn ,
where the numbers y1 , . . . , yn ∈ K are given by
(y1 , . . . , yn )ᵀ = A (x1 , . . . , xn )ᵀ.
This is simply a special case of Theorem 2.3.2. We will say that A is the matrix of the linear transformation f : V → V relative to the basis {v1 , . . . , vn }. Theorem 2.3.11. Let B and C be bases of a vector space V and let f : V → V be a linear transformation. Then the matrix of f relative to the basis C is M = P −1 N P, where N is the matrix of the linear transformation f relative to the basis B and P is the matrix of the identity function Id : V → V relative to the bases C and B.
Proof. Since f = Id f Id, we have
fC→C = IdB→C fB→B IdC→B = (IdC→B )−1 fB→B IdC→B ,
and the result follows from Theorem 2.3.8, because P = IdC→B and P −1 = IdB→C is the matrix of the identity function Id : V → V relative to the bases B and C.
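The change-of-basis formula M = P⁻¹NP of Theorem 2.3.11 can be illustrated numerically; the sketch below uses an arbitrary matrix N and an arbitrary invertible P, and checks one consequence of similarity (equal traces).

    import numpy as np

    rng = np.random.default_rng(2)
    N = rng.standard_normal((3, 3))          # matrix of f relative to a basis B
    P = np.array([[1.0, 1.0, 0.0],           # matrix of Id relative to the bases C and B
                  [0.0, 1.0, 1.0],           # (columns: B-coordinates of the C-basis vectors)
                  [1.0, 0.0, 1.0]])

    M = np.linalg.inv(P) @ N @ P             # matrix of f relative to the basis C
    # Similar matrices represent the same operator; for instance they have the same trace.
    print(np.isclose(np.trace(M), np.trace(N)))   # True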
2.3.2 The isomorphism between Mn×m (K) and L(V, W)
The main result of this section is the fact that, if V and W are vector spaces such that dim V = m and dim W = n, then the space of all linear transformations from V to W can be identified with the space of all n × m matrices. The following theorem formalizes this claim.

Theorem 2.3.12. Let {v1 , . . . , vm } and {w1 , . . . , wn } be bases of vector spaces V and W, respectively. For every n × m matrix
    [ a11 . . . a1m ]
A = [  ⋮         ⋮  ]
    [ an1 . . . anm ]
we define the linear transformation fA : V → W via fA (vj ) = a1j w1 + · · · + anj wn
for every j ∈ {1, . . . , m}. Then the function ∆ : Mn×m (K) → L(V, W) defined by ∆(A) = fA is an isomorphism.
Proof. We first show that ∆ is a linear transformation. Let
    [ a11 . . . a1m ]           [ b11 . . . b1m ]
A = [  ⋮         ⋮  ]  and  B = [  ⋮         ⋮  ] .
    [ an1 . . . anm ]           [ bn1 . . . bnm ]
Since ∆(A + B)(vj ) = (a1j + b1j )w1 + · · · + (anj + bnj )wn
= a1j w1 + · · · + anj wn + b1j w1 + · · · + bnj wn = ∆(A)(vj ) + ∆(B)(vj ),
we have ∆(A + B) = ∆(A) + ∆(B). If α ∈ K, then ∆(αA)(vj ) = (αa1j )w1 + · · · + (αanj )wn = α(a1j w1 + · · · + anj wn ) = (α∆(A))(vj ),
so ∆(αA) = α∆(A). Consequently ∆ is a linear transformation.
If ∆(A) = ∆(B), then a1j w1 + · · · + anj wn = b1j w1 + · · · + bnj wn for all 1 ≤ j ≤ m. Consequently A = B, proving that ∆ is injective.
Finally, if g : V → W is an arbitrary linear transformation and g(vj ) = a1j w1 + · · · + anj wn for all 1 ≤ j ≤ m, then g = ∆(A) where
    [ a11 . . . a1m ]
A = [  ⋮         ⋮  ] .
    [ an1 . . . anm ]
This shows that ∆ is surjective.
Example 2.3.13. The function that assigns to the matrix A = [ a1 . . . am ] the linear transformation fA : Km → K defined by
fA ((x1 , . . . , xm )ᵀ) = a1 x1 + · · · + am xm
is an isomorphism from the vector space M1×m (K) to the vector space L(Km , K).
2.4 Duality
In this section we study the vector space L(V, K), that is the space of all linear transformations from a vector space to the number field K. While L(V, K) is a special case of the vector space of all linear transformations between vector spaces, it has some distinct properties.
2.4.1 The dual space
Definition 2.4.1. Let V be a vector space. A linear transformation f : V → K is called a functional or a linear form. The vector space L(V, K) is called the dual space of the vector space V and is denoted by V ′ .
Example 2.4.2. Let V be an n-dimensional vector space and let f : V → K be a nonzero linear form. If a ∈ V is such that f (a) ≠ 0, show that V = Ka ⊕ ker f and determine the projection of V on Ka along ker f .

Solution. For every v ∈ V we have
f ( v − (f (v)/f (a)) a ) = 0
and thus b = v − (f (v)/f (a)) a ∈ ker f . Since we can write
v = (f (v)/f (a)) a + b,
we have V = Ka + ker f . Now we show that this sum is direct. If v ∈ Ka ∩ ker f , then v = αa for some α ∈ K and f (v) = 0. This yields f (αa) = αf (a) = 0. Since f (a) ≠ 0, we have α = 0 and consequently v = 0. Therefore V = Ka ⊕ ker f and the projection of the vector v ∈ V on Ka along ker f is
(f (v)/f (a)) a.
Note that, since dim Ka = 1 and dim Ka + dim ker f = dim V = n, we have dim ker f = n − 1, as shown in Corollary 2.1.31.
Example 2.4.3. Let V be an n-dimensional vector space and let f, g ∈ V ′ . If f is nonzero and ker f ⊆ ker g, show that g ∈ Span{f }.
Solution. Let v ∈ ker f . With the notation from Example 2.4.2, we have
g(αa + v) = αg(a) = (αg(a)/f (a)) f (a) = (g(a)/f (a)) f (αa) = (g(a)/f (a)) f (αa + v).
Thus g = (g(a)/f (a)) f , because V = Ka + ker f .
Definition 2.4.4. Let {v1 , . . . , vn } be a basis of a vector space V. For every j ∈ {1, . . . , n} by lvj we mean the unique linear form lvj : V → K such that lvj (vj ) = 1 and lvj (vk ) = 0 for every k ≠ j. In other words, lvj (x1 v1 + · · · + xn vn ) = xj .
Theorem 2.4.5. If {v1 , . . . , vn } is a basis of V, then {lv1 , . . . , lvn } is a basis of V ′ . Proof. Assume x1 lv1 + · · · + xn lvn = 0. Since, for every 1 ≤ k ≤ n we have 0 = x1 lv1 (vk ) + · · · + xn lvn (vk ) = xk lvk (vk ) = xk , the functions lv1 , . . . , lvn are linearly independent. If f : V → K is the linear transformation such that f (vj ) = aj for every 1 ≤ j ≤ n, then it is easy to verify that f = a1 l v 1 + · · · + an l v n . This shows that Span{lvj , 1 ≤ j ≤ n} = V ′ , completing the proof.
Definition 2.4.6. Let {v1 , . . . , vn } be a basis of the vector space V. The basis {lv1 , . . . , lvn } of V ′ is called the dual basis of the basis {v1 , . . . , vn }.
Example 2.4.7. Find the dual basis of the basis {1, t, . . . , tn } in the space Pn (K).
Solution. According to the definition of the dual basis we have
l1 (a0 + a1 t + · · · + an tn ) = a0 ,
lt (a0 + a1 t + · · · + an tn ) = a1 ,
⋮
ltn (a0 + a1 t + · · · + an tn ) = an .
Note that for any p ∈ Pn (K) we could write
l1 (p) = p(0), lt (p) = p′ (0), lt2 (p) = (1/2) p′′ (0), . . . , ltn (p) = (1/n!) p(n) (0).
This formulation has the advantage that we don’t have to write p in the form a0 + a1 t + · · · + an tn . For example, if p(t) = ((t2 + t + 1)3 + t + 3)7 , it would be quite time consuming to calculate lt (p) using the first formula. Calculating it using lt (p) = p′ (0) is much simpler.
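The saving is easy to see in a symbolic computation; the following Python/SymPy sketch evaluates lt (p) = p′ (0) for the polynomial above without expanding it.

    import sympy as sp

    t = sp.symbols('t')
    p = ((t**2 + t + 1)**3 + t + 3)**7

    # l_t(p) = p'(0): evaluating the dual basis functional directly.
    print(sp.diff(p, t).subs(t, 0))    # 114688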
Theorem 2.4.8. Let V be a vector space such that dim V = n. If v1 , . . . , vj are linearly independent vectors in V, then the set Q = {f ∈ V ′ : f (v1 ) = · · · = f (vj ) = 0} is a vector subspace of V ′ and dim Q = n − j. Proof. First we extend {v1 , . . . , vj } to a basis {v1 , . . . , vn } of V. Let {lv1 , . . . , lvn } be its dual basis. It is easy to see that Q is a subspace of V ′ . Now we show that {lvj+1 , . . . , lvn } is a basis of Q. The linear functionals lvj+1 , . . . , lvn are in Q and are linearly independent, so we only have to show that Span{lvj+1 , . . . , lvn } = Q. If f ∈ Q, then we can write f = x1 lv1 + · · · + xn lvn where x1 , . . . , xn are numbers from K. Since, for every 1 ≤ k ≤ j, we have 0 = f (vk ) = xk , we conclude that f = xj+1 lvj+1 + · · · + xn lvn .
2.4.2 The bidual
For any vector space V the dual space V ′ is a vector space, so it makes sense to consider its dual, that is, (V ′ )′ .
Definition 2.4.9. Let V be a vector space. The vector space (V ′ )′ is called the bidual of V.
Example 2.4.10. Let V be a vector space and let v ∈ V. It’s easy to verify that the function gv : V ′ → K defined by gv (l) = l(v) is an element of the bidual, that is, a linear form on V ′ . In this section we will show that, if V is finite dimensional, then the spaces V and (V ′ )′ are isomorphic (see Theorem 2.4.12). First we need to prove an auxiliary result.
Lemma 2.4.11. Let V be a vector space of finite dimension and let v ∈ V. If l(v) = 0 for every l ∈ V ′ , then v = 0.

Proof. Let {v1 , . . . , vn } be a basis of V and let {lv1 , . . . , lvn } be the dual basis. If v ∈ V, then v = x1 v1 + · · · + xn vn for some x1 , . . . , xn ∈ K. Since, for every 1 ≤ j ≤ n, we have 0 = lvj (v) = xj , it follows that v = 0.

Now we prove the main result of this section.

Theorem 2.4.12. Let V be a finite dimensional vector space. The function Γ : V → (V ′ )′ , which associates with every vector v ∈ V the linear form gv : V ′ → K defined by gv (l) = l(v), is an isomorphism from V to (V ′ )′ .
Proof. For any v, v1 , v2 ∈ V, α ∈ K, and l ∈ V ′ , we have gv1 +v2 (l) = l(v1 + v2 ) = l(v1 ) + l(v2 ) = gv1 (l) + gv2 (l) and gαv (l) = l(αv) = αl(v) = αgv (l). This shows that Γ is a linear transformation. If gv (l) = l(v) = 0 for every l ∈ V ′ , then v = 0 by Lemma 2.4.11. Consequently, Γ is injective and thus dim ran Γ = dim V. Finally, since dim V = dim V ′ = dim(V ′ )′ , we have ran Γ = (V ′ )′ because ran Γ is a subspace of (V ′ )′ such that dim ran Γ = dim(V ′ )′ .
The isomorphism Γ : V → (V ′ )′ , which maps v ∈ V to the linear form gv : V ′ → K defined by gv (l) = l(v), is called the canonical isomorphism from V to (V ′ )′ . From Theorem 2.4.12 it follows that every basis of V ′ is the dual basis of some basis of V.

Theorem 2.4.13. Let V be a finite dimensional vector space and let {f1 , . . . , fn } be a basis of V ′ . Then there is a basis {v1 , . . . , vn } of V such that {f1 , . . . , fn } is its dual basis.
Proof. Let V be an n-dimensional vector space. Let {f1 , . . . , fn } be a basis of V ′ and let {lf1 , . . . , lfn } be its dual basis in (V ′ )′ . By Theorem 2.4.12, there exist vectors v1 , . . . , vn ∈ V such that Γ(vj ) = lfj for j ∈ {1, . . . , n}. Since {lf1 , . . . , lfn } is a basis of (V ′ )′ and Γ is an isomorphism, {v1 , . . . , vn } is a basis of V. The set {f1 , . . . , fn } is the dual basis of {v1 , . . . , vn } because for every j ∈ {1, . . . , n} we have fj (vj ) = Γ(vj )(fj ) = lfj (fj ) = 1 and fj (vk ) = Γ(vk )(fj ) = lfk (fj ) = 0 whenever j ≠ k.

Note the similarity between the next theorem and Theorem 2.4.8. We could say that Theorem 2.4.14 is a “dual version” of Theorem 2.4.8. The proofs of these two theorems are also similar.

Theorem 2.4.14. Let V be a vector space such that dim V = n. If f1 , . . . , fj ∈ V ′ are linearly independent, then the set U = {v ∈ V : f1 (v) = · · · = fj (v) = 0} is a vector subspace of V and dim U = n − j.

Proof. First we extend {f1 , . . . , fj } to a basis {f1 , . . . , fn } of V ′ . Let {v1 , . . . , vn } be a basis of V such that {f1 , . . . , fn } is its dual basis. It is easy to see that U is a subspace of V. Now we show that {vj+1 , . . . , vn } is a basis of U. The vectors vj+1 , . . . , vn are linearly independent, so we only have to show that Span{vj+1 , . . . , vn } = U. If u ∈ U, then u = x1 v1 + · · · + xn vn where x1 , . . . , xn are numbers from K.
Since, for every 1 ≤ k ≤ j, we have 0 = fk (u) = xk , we conclude that u = xj+1 vj+1 + · · · + xn vn .
2.5 Quotient spaces
For a vector space V and its subspace U there is a subspace W such that V = U ⊕ W. While the space W is not unique, we can show that, if U ⊕ W1 = U ⊕ W2 , then the spaces W1 and W2 are isomorphic. In this section we present a canonical way of constructing, for a given vector space V and its subspace U, a space that is isomorphic to every space W such that V = U ⊕ W.
If U is a subspace of a vector space V, then we define x + U = {x + u : u ∈ U}. In this section we will use the following notation:
x̂ = x + U,
which is a generalization of what was introduced in Example 1.1.8. This notation makes sense only if it is clear what the subspace U is. Note that, while x is a vector, x̂ is a set of vectors. In particular, we have x = x + 0 ∈ x̂.

Lemma 2.5.1. Let x and y be vectors in a vector space V and let U be a subspace of V. Then x̂ = ŷ if and only if y = x + u for some u ∈ U.
Proof. First assume that ŷ = x̂. Then
y ∈ ŷ = x̂ = x + U.
Consequently, y = x + u for some u ∈ U. Now assume that y = x + u for some u ∈ U. Then
ŷ = y + U = (x + u) + U = x + (u + U) = x + U = x̂.
Corollary 2.5.2. Let V be a vector space and let U be a subspace of V. If x ∈ V and u ∈ U, then x̂ = (x + u)∧ .
Theorem 2.5.3. Let V be a vector space and let U be a subspace of V. If x̂ ∩ ŷ ≠ ∅, then x̂ = ŷ.

Proof. If x̂ ∩ ŷ ≠ ∅, then there are vectors u1 , u2 ∈ U such that x + u1 = y + u2 and we have
x̂ = (x + u1 )∧ = (y + u2 )∧ = ŷ.
Note that the above implies that x̂ = 0̂ if and only if x ∈ U.
Definition 2.5.4. Let V be a vector space and let U be a subspace of V. The set V/ U = {x + U : x ∈ V} is called the quotient space of V by U.
In other words, but less precisely, V/ U is the set of all x̂'s with x ∈ V. The quotient space V/ U becomes a vector space if we define
x̂ + ŷ = (x + y)∧ and α x̂ = (αx)∧
for any x, y ∈ V and α ∈ K. It is easy to verify that these operations are well-defined. Note that 0̂ = U is the zero vector in V/ U. It is important that the operations in V/ U are defined in such a way that the function q : V → V/ U defined by q(x) = x̂ is a linear transformation.

Definition 2.5.5. Let U be a subspace of a vector space V. The function q : V → V/ U defined by
q(x) = x̂
is called the quotient linear transformation.
The following example will be generalized in Exercise 2.75.
Example 2.5.6. Show that dim V/ U = 1 if and only if U ⊕ Kv = V for some v ∈ V.

Solution. Let q : V → V/ U be the quotient linear transformation and let {q(v)} = {v̂} be a basis for V/ U. Note that, because v̂ ≠ 0̂ = U, the vector v is not in U. Then for every x ∈ V there is α ∈ K such that x̂ = α v̂ = (αv)∧ , and
thus x = αv + u for some u ∈ U. Consequently, V = U + Kv and the sum U + Kv is direct, because v ∉ U.
Conversely, if U ⊕ Kv = V, then every vector x ∈ V can be written as x = u + αv for some α ∈ K. Consequently, x̂ = (αv)∧ = α v̂ and thus {v̂} is a basis for V/ U.
Example 2.5.7. Let v, u, w be linearly independent vectors in R3 . The set {u + Rv + Rw} is a basis of R3 /(Rv + Rw).
Theorem 2.5.8. Let U be a subspace of a finite dimensional vector space V. Then dim V = dim U + dim V/ U.
Proof. This result can be obtained from the Rank-Nullity Theorem 2.1.28. Indeed, U = ker q and V/ U = ran q where q : V → V/ U is the quotient linear transformation (see Definition 2.5.5).
Theorem 2.5.9. Let V and W be vector spaces and let f : V → W be a linear transformation. There is an isomorphism g : V/ ker f → ran f such that f (x) = gq(x) = g(x̂), where x is a vector from V and q : V → V/ ker f is the quotient linear transformation.
Proof. For x ∈ V we define g(x̂) = f (x). Note that g is a well-defined function g : V/ ker f → ran f . Indeed, if x̂ = (x + y)∧ for some y ∈ ker f , then we have g((x + y)∧ ) = f (x + y) = f (x) because f (y) = 0. Since for every x1 , x2 ∈ V we have
g(x̂1 + x̂2 ) = g((x1 + x2 )∧ ) = f (x1 + x2 ) = f (x1 ) + f (x2 ) = g(x̂1 ) + g(x̂2 )
and for every x ∈ V and α ∈ K we have
g(α x̂) = g((αx)∧ ) = f (αx) = αf (x) = αg(x̂),
g is a linear transformation.
If g(x̂) = f (x) = 0, then x ∈ ker f and thus x̂ = 0̂. Consequently, g is injective. Finally, if y ∈ ran f , then there is x ∈ V such that f (x) = y. Consequently, g(x̂) = y and thus g is surjective.
Corollary 2.5.10. If U and W are subspaces of a vector space V such that V = U ⊕ W, then the spaces V/ U and W are isomorphic.
Proof. If f : U ⊕ W → W is the projection on W along U, then ker f = U and ran f = W. Therefore the result is a consequence of Theorem 2.5.9. Note that if the vector space V is finite dimensional we can get Theorem 2.5.8 as a consequence of Corollary 2.5.10.
Example 2.5.11. Let R3 = Ru ⊕ Rv ⊕ Rw. If f : R3 → Rv ⊕ Rw is the projection on Rv ⊕ Rw along Ru, then the function g : R3/ Ru → Rv ⊕ Rw defined by g(q(αv + βw + γu)) = αv + βw, where q : R3 → R3/ Ru is the quotient linear transformation, is an isomorphism.
2.6 Exercises

2.6.1 Basic properties
Exercise 2.1. Let V be a vector space and let U be a subspace of V. If f : V → V and f (U) ⊆ U, then we say that U is f -invariant. Let f, g : V → V be linear transformations. If U is an f -invariant and a g-invariant subspace, show that U is a (gf )-invariant subspace. Exercise 2.2. Let V be a vector space and let f : V → V be a linear transformation. If U is an f -invariant subspace (see Exercise 2.1), show that the restriction of f to U is a linear transformation fU : U → U. Exercise 2.3. Let V be a vector space and f : V → V be a linear transformation. We suppose that U and W are f -invariant subspaces (see Exercise 2.1). Show that U ∩ W is an f -invariant subspace. Exercise 2.4. Let V be a vector space and f : V → V be a linear transformation. We suppose that U and W are f -invariant subspaces (see Exercise 2.1). Show that U + W is an f -invariant subspace.
Exercise 2.5. Let V and W be vector spaces, let U be a subspace of V, and let g : U → W be a linear transformation. If V is a finite dimensional space, show that there is a linear transformation f : V → W such that the restriction of f to U is g.
Exercise 2.6. Let f : V → W be a linear transformation and let {w1 , . . . , wn } be a basis of ran f . If wj = f (uj ) for every j ∈ {1, . . . , n} and some u1 , . . . , un ∈ V, then V = Span{u1 , . . . , un } ⊕ ker f .
Exercise 2.7. If f : V → K is a nonzero linear transformation, then there is a vector u ∈ V such that f (u) = 1 and V = Span{u} ⊕ ker f . Determine the projection on ker f along Span{u}.
Exercise 2.8. Let V and W be vector spaces. For arbitrary w ∈ W and a linear transformation f : V → K we define a function w ⊗ f : V → W by (w ⊗ f )(v) = f (v)w. Show that, if g : V → W is a linear transformation such that dim ran g = 1, then there exist w ∈ W and a linear transformation f : V → K such that g = w ⊗ f .
Exercise 2.9. Let V be a vector space and let v1 , . . . , vn ∈ V. We define a function f : Kn → V by
f ((x1 , . . . , xn )ᵀ) = x1 v1 + · · · + xn vn .
Show that ker f ≠ 0 if and only if the vectors v1 , . . . , vn are linearly dependent.
Exercise 2.10. Let V be a vector space and let v be a nonzero vector in V. If there is a vector subspace U ⊆ V such that U ⊕ Kv = V, show that there is a linear transformation f : V → K such that ker f = U.
Exercise 2.11. Let V be a vector space and f : V → K be a nonzero linear transformation. Show that there is a nonzero vector v ∈ V such that ker f ⊕ Kv = V.
Exercise 2.12. Let V be a vector space and let f, g : V → K be nonzero linear transformations such that ker f = ker g. Show that there is a nonzero number α ∈ K such that g = αf .
Exercise 2.13. Let V and W be vector spaces and let U1 , . . . , Un be subspaces of V such that V = U1 ⊕ · · · ⊕ Un . If f1 : U1 → W, . . . , fn : Un → W are linear transformations, show that there is a unique linear transformation g : V → W such that g(uj ) = fj (uj ) for every j ∈ {1, . . . , n} and uj ∈ Uj .
Exercise 2.14. Let V, W, and U be vector spaces and let V1 ⊆ V and W1 ⊆ W be subspaces. If V = U ⊕ V1 and dim V1 = dim W1 = n for some n ≥ 1, show that there is a linear transformation f ∈ L(V, W) such that ker f = U and f (V1 ) = W1 .
Exercise 2.15. Let V and W be vector spaces and let f : V → W be a linear transformation. Show that the set {(v, f (v)) | v ∈ V} is a vector subspace of V × W.
Exercise 2.16. If V and W are finite dimensional vector spaces, show that there is an injective linear transformation f : V → W if and only if dim V ≤ dim W.
Exercise 2.17. Show that the function f : DR1 (R) → FR (R) defined by f (ϕ) = ϕ + 2ϕ′ is linear and determine ker f .
Exercise 2.18. Show that the function f : DR2 (R) → FR (R) defined by f (ϕ) = ϕ + ϕ′′ is linear and determine ker f .
Exercise 2.19. Find a basis of ker f for the linear transformation f : M2×2 (K) → K defined by
f ( [ a b ; c d ] ) = a + b + c + d.
and let f : U → R2 be the linear transformation such that 1 −1 2 1 −1 0 f = and f = . 3 4 0 1
Determine f .
Exercise 2.21. Let V and W be finite dimensional vector spaces and let f : V → W be a linear transformation. If dim W = n, show that for j ∈ {1, . . . , n} there are vectors wj ∈ W and linear transformations fj : V → K such that f = w1 ⊗ f1 + · · · + wn ⊗ fn , where wj ⊗ fj is defined as in Exercise 2.8. Exercise 2.22. Let V and W be vector spaces and let f : V → W be a linear transformation. Show that, if u1 , . . . , un ∈ V are linearly independent vectors such that V = Span{u1 , . . . , un } ⊕ ker f , then {f (u1 ), . . . , f (un )} is a basis of ran f . Exercise 2.23. Let V and W be finite dimensional vector spaces and let f ∈ L(V, W). Use the rank-nullity theorem to show that, if f is surjective, then dim V ≥ dim W. Exercise 2.24. Let V and W be finite dimensional vector spaces. Explain the meaning of the rank-nullity theorem for the function f : V × W → V defined by f (v, w) = v.
108
2.6.2
Chapter 2: Linear Transformations
Isomorphisms
Exercise 2.25. Let f : V → W be a linear transformation and let {w1 , . . . , wn } be a basis of ran f . Then there are u1 , . . . , un ∈ V such that f (uj ) = wj for every j ∈ {1, . . . , n}. Consider the linear transformation g : Span{u1 , . . . , un } → ran f defined by g(u) = f (u) for every u ∈ Span{u1 , . . . , un }. Using Exercise 2.6, show that g is an isomorphism such that for every v ∈ V we have g(h(v)) = f (v), where h : V → V is the projection on Span{u1 , . . . , un } along ker f . Exercise 2.26. Let f : P4 (R) → P4 (R) be the linear transformation defined by f (p) = p′′ . Find n, w1 , . . . , wn , u1 , . . . , un , and h that satisfy the conditions in Exercise 2.25. Exercise 2.27. Let V, W1 , W2 be arbitrary vector spaces. Show that the vector space L(V, W1 × W2 ) is isomorphic to the vector space L(V, W1 ) × L(V, W2 ). Exercise 2.28. Let V be a vector space and let U, W1 , and W2 be subspaces of V. If V = U ⊕ W1 = U ⊕ W2 , show that there is an isomorphism g : W1 → W2 . Exercise 2.29. Show that there is an isomorphism f : K4 → K4 such that f 2 (x) = −x. Exercise 2.30. Let V be a finite dimensional vector space and let f, g ∈ L(V). Show that the operator f g is invertible if and only if both f and g are invertible. Exercise 2.31. Let V and W be vector spaces and let {v1 , . . . , vn } be a basis of V. Show that the function g : L(V, W) → W n defined by g(f ) = (f (v1 ), . . . , f (vn )) is an isomorphism. Exercise 2.32. Let V and W be vector spaces and let {v1 , . . . , vn } be a basis of V. If W is finite dimensional, show that the set S = {f ∈ L(V, W) : f (v1 ) = f (v2 ) = 0} is a vector subspace of L(V, W) and find dim S. Exercise 2.33. Let V and W be vector spaces. Show that there is an isomorphism between V × W and W × V. Exercise 2.34. Let V and W be finite dimensional vector spaces and let f : V → W be a linear transformation. Show that the spaces V and ker f × ran f are isomorphic. Exercise 2.35. Let U, V, and W be vector spaces. Show that, if U and V are isomorphic and V and W are isomorphic, then U and W are isomorphic. Exercise 2.36. Let V be a vector space and let f : V → V be a linear operator such that f 3 = 0. Show that Id −f is an isomorphism. Exercise 2.37. Let V and W be vector spaces and let f : V → W be an isomorphism. Show that ϕ : L(V) → L(W) defined by ϕ(g) = f gf −1 is an isomorphism. Exercise 2.38. Show that the function f : Mm×n (K) → Mn×m (K) defined by f (A) = AT is an isomorphism.
109
2.6. EXERCISES
2.6.3
Linear transformations and matrices
Exercise 2.39. Let V be a vector space such that dim V = 4 and let f : V → V be an operator such that dim ran f = 2. Show that there are bases B and C of V such that 1 0 0 0 0 0 0 0 fB→C = 0 0 1 0 . 0 0 0 0
Exercise 2.40. We consider the linear transformation f : M2×2 (K) → M2×2 (K) defined by f (X) = 12 (X + X T ). Determine the matrix of f relative to the basis B=
1 0 0 1 0 0 0 0 , , , . 0 0 0 0 1 0 0 1
Exercise 2.41. Let U = Span{cos t, t cos t, sin t, t sin t} and let f : U → U be the operator f (ϕ) = ϕ′ . Show that B = {cos t, t cos t, sin t, t sin t} is a basis of U and determine the B-matrix of f . Exercise 2.42. Let U = Span{cos t, t cos t, sin t, t sin t} and let f : U → U be the operator f (ϕ) = ϕ′′ . Determine the matrix of f relative to the basis {cos t, t cos t, sin t, t sin t}. Exercise 2.43. Let V = Span{v1 , v2 } = Span{w1 , w2 } where {v1 , v2 } and {w1 , w2 } are bases. Let f : V → V be defined by f (v1 ) = 5v1 + 7v2 and f (v2 ) = 2v1 + 3v2 . If v1 = 2w1 − w2 and v2 = 5w1 + 4w2 , determine the matrix of f relative to the basis {w1 , w2 }.
2.6.4
Duality
Exercise 2.44. We define f1 , f2 , f3 ∈ (R3 )′ by x x x f1 y = 2x + y + z, f2 y = x + 2y + z, and f3 y = x + y + 2z. z z z Show that f1 , f2 , and f3 are linearly independent.
Exercise 2.45. Let V be a vector space and let {v1 , . . . , vn } be a basis of V. Show that f = f (v1 )lv1 + · · · + f (vn )lvn for every f ∈ V ′ . Exercise 2.46. Let V be an n-dimensional vector space and let {f1 , . . . , fn } be a basis of V ′ . Show that the function g : V → Kn defined by g(v) = (f1 (v), . . . , fn (v)) is an isomorphism. Exercise 2.47. Let V be a finite dimensional vector space, let U be a subspace of V, and let x ∈ V be such that x ∈ / U. Show that there is a linear form f ∈ V ′ such that f (x) 6= 0 and f (u) = 0 for every u ∈ U.
110
Chapter 2: Linear Transformations
Exercise 2.48. Let V be a finite dimensional vector space and let f, g1 . . . , gn ∈ V ′ . If f (x) = 0 for every x ∈ ker g1 ∩ · · · ∩ ker gn , show that f is a linear combination of g1 , . . . , gn . Exercise 2.49. Let f : V → W be a linear transformation. We define the function f T : W ′ → V ′ by f T (l)(v) = l(f (v)) for l ∈ W ′ and v ∈ V. Show that the function f T is a linear transformation. Exercise 2.50. Let V and W be vector spaces and let BV = {v1 , . . . , vn } and BW = {w1 , . . . , wm } be a bases of V and W, respectively. Let f : V → W be a linear transformation and let A be the matrix of f relative to the bases BV and BW . Show that, if f T : W ′ → V ′ is the linear transformation defined in Exercise 2.49, then the matrix of f T relative to the dual bases {lw1 , . . . , lwm } and {lv1 , . . . , lvn } is AT . Exercise 2.51. Let f : V → W and g : W → X be linear transformations. Show that (gf )T = f T g T . Exercise 2.52. Let V and W be vector spaces and let f ∈ L(V, W). Show that, if f is an isomorphism, then f T is an isomorphism. Exercise 2.53. Let V and W be finite dimensional vector spaces and let f ∈ L(V, W). Let G : (V ′ )′ → (W ′ )′ be defined by G(F )(l) = F (lf ) for F ∈ (V ′ )′ and l ∈ W ′ . If S : V → (V ′ )′ and T : W → (W ′ )′ be the canonical isomorphisms, show that, if S(v) = F for some v ∈ V, then T (f (v)) = G(F ). Exercise 2.54. Let V and W be finite dimensional vector spaces and let f ∈ L(V, W). Show that there is a unique linear transformation g ∈ L(V, W) such that l(g(v)) = f T (l)(v) for every l ∈ W ′ and v ∈ V. Exercise 2.55. Let U be a subspace of a finite dimensional vector space V. Show that the set U 0 = {l ∈ V ′ : l(u) = 0 for every u ∈ U} is a subspace of V ′ and that dim U 0 + dim U = dim V. Exercise 2.56. Let V and W be finite dimensional vector spaces. Show that the function f : L(V, W) → L(W ′ , V ′ ) defined by f (g) = g T is an isomorphism. Exercise 2.57. Let V1 , V2 , and W be vector spaces. Show that the vector space L(V1 × V2 , W) is isomorphic to the vector space L(V1 , W) × L(V2 , W). Exercise 2.58. Let V and W be vector spaces and let f : V → W be a linear transformation. Show that (ran f )0 = ker f T , where (ran f )0 is defined in Exercise 2.55 and f T is as in Exercise 2.49. Exercise 2.59. Let V and W be finite dimensional vector spaces and let f : V → W be a linear transformation. Using Exercise 2.58 show that f is surjective if and only if f T is injective. Exercise 2.60. Let U be a subspace of a vector space V and let U 0 be as defined in Exercise 2.55. If U 00 = {x ∈ V : l(x) = 0 for every l ∈ U 0 }, show that U 00 = U.
111
2.6. EXERCISES
Exercise 2.61. Let V and W be finite dimensional vector spaces, f ∈ L(V, W), and l ∈ V ′ . If l ∈ (ker f )0 , show that there is m ∈ W ′ such that l = mf . Exercise 2.62. Let V and W be finite dimensional vector spaces and let f ∈ L(V, W). Show that (ker f )0 = ran f T , where (ker f )0 is defined in Exercise 2.55 and f T is as in Exercise 2.49. Exercise 2.63. Let V and W be finite dimensional vector spaces and let f : V → W be a linear transformation. Using Exercises 2.55 and 2.58 show that dim ran f = dim ran f T . Exercise 2.64. Use Exercise 2.63 to show that f is injective if and only if f T is surjective. Exercise 2.65 (Rank Theorem). Use Exercises 2.50 and 2.63 to show that if A ∈ Mn×m (K), then dim C(A) = dim C(AT ). Exercise 2.66. Let V and W be vector spaces with bases {v1 , . . . , vm } and {w1 , . . . , wn }, respectively. Show that the set of all linear transformations wk ⊗ lvj , where 1 ≤ j ≤ m and 1 ≤ k ≤ n, is a basis of L(V, W). (See Exercise 2.8 for the definition of wk ⊗ lvj .) Exercise 2.67. Let V and W be vector spaces with bases {v1 , . . . , vm } and {w1 , . . . , wn }, respectively. For every (j, k) ∈ {1, . . . , m} × {1, . . . , n} we define the linear transformation fjk : V → W by ( wk if i = j, fjk (vi ) = 0 if i 6= j. Show that the set {fjk : (j, k) ∈ {1, . . . , m} × {1, . . . , n}} is a basis of L(V, W). Exercise 2.68. Let V be a finite dimensional vector space and let f, f1 , . . . , fj ∈ V ′ . If f ∈ / Span{f1 , . . . , fj }, then there is w ∈ V such that f (w) 6= 0 and fk (w) = 0 for every k ∈ {1, . . . , j}.
2.6.5
Quotient spaces
Exercise 2.69. Let V and W be vector spaces and let f : V → W be a linear transformation. Show that the function fb : V/ ker f → ran f defined by fb(v + ker f ) = f (v) is an isomorphism.
Exercise 2.70. Let R3 = Ru⊕Rv⊕Rw. If f : R3 → Rv⊕Rw is the projection on Rv ⊕ Rw along Ru, then the function g : R3/ Ru → Rv ⊕ Rw defined by g((αv + βw + γu)∧ ) = αv + βw, is an isomorphism. Exercise 2.71. Let V be a vector space and let f : V → K be a nonzero linear transformation. Show that there is an isomorphism between V/ ker f and K.
Exercise 2.72. Let V and W be vector spaces and let f : V → W be a linear transformation. Show that the spaces V and ker f × V/ ker f are isomorphic. Exercise 2.73. Let V and W be finite dimensional vector spaces. If V1 ⊆ V and W1 ⊆ W are subspaces, show that the spaces (V × W)/(V1 × W1 ) and V/ V1 × W/ W1 are isomorphic. Exercise 2.74. Let V and W be vector spaces and let U a subspace of V. If f : V → W is a linear transformation such that U ⊆ ker f , show that the function g : V/ U → W defined by g(q(x)) = f (x), where q : V → V/ U is the quotient linear transformation, is a well-defined linear transformation. Exercise 2.75. Let U be a subspace of a vector space V. Show that dim V/ U = n if and only if there are linearly independent vectors v1 , . . . , vn ∈ V such that V = U ⊕ Kv1 · · · ⊕ Kvn . Exercise 2.76. Let V be a vector space and let U be a subspace of V. If f : V → V is a linear transformation and U is f -invariant, then there is a unique linear transformation g : V/ U → V/ U such that qf = gq, where q : V → V/ U is the quotient linear transformation. Exercise 2.77. If U and W are subspaces of a vector space V such that V = b U ⊕ W, show that the linear transformation h : W → V/ U defined by h(w) = w is an isomorphism, without using Theorem 2.5.9.
Exercise 2.78. Let v, u, and w be linearly independent vectors in R3. Show that the set {u + Rv, w + Rv} is a basis of R3/Rv. Exercise 2.79. Let U and W be subspaces of a vector space V such that V = U ⊕ W. If {w1, . . . , wn} is a basis of W, show that {w1 + U, . . . , wn + U} is a basis of V/U.
Chapter 3
Inner Product Spaces
Introduction
The dot product is an important tool in the linear algebra of Euclidean spaces as well as many applications. In this chapter we investigate properties of vector spaces where an abstract form of the dot product is available. In the context of general vector spaces the name inner product is used instead of dot product. In some examples and exercises in this chapter we will use determinants. In particular, we will use the fact that a matrix $\begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} \in M_{2\times 2}(K)$ is invertible if and only if
$$\det \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} = \alpha\delta - \beta\gamma \ne 0.$$
The use of determinants in those examples and exercises is not essential, but it is convenient and it leads to simplifications. Unlike some other textbooks at the same level, we do not consider determinants a forbidden tool.
3.1
Definitions and examples
In Chapter 2 we used the name linear form to mean a linear function f : V → K. In this chapter we consider functions f : V × V → K. Since V × V is a vector space, we can talk about linearity of f : V × V → K:
$$f(a_1(x_1, y_1) + a_2(x_2, y_2)) = a_1 f(x_1, y_1) + a_2 f(x_2, y_2).$$
In the context of inner product spaces it is natural to consider a different property of functions f : V × V → K related to linearity, namely bilinearity.
Definition 3.1.1. By a bilinear form on a vector space V we mean a function f : V × V → K such that
(a) f(x_1 + x_2, y) = f(x_1, y) + f(x_2, y),
(b) f(x, y_1 + y_2) = f(x, y_1) + f(x, y_2),
(c) f(αx, y) = α f(x, y),
(d) f(x, αy) = α f(x, y),
for all vectors x, x_1, x_2, y, y_1, y_2 ∈ V and all numbers α ∈ K.
Note that the conditions (a)-(d) in the above definition can be expressed as a single equality:
$$f(a_1 x_1 + a_2 x_2, b_1 y_1 + b_2 y_2) = a_1 b_1 f(x_1, y_1) + a_2 b_1 f(x_2, y_1) + a_1 b_2 f(x_1, y_2) + a_2 b_2 f(x_2, y_2).$$
Clearly, this condition implies
$$f\Bigl(\sum_{j=1}^{m} a_j x_j, \sum_{k=1}^{n} b_k y_k\Bigr) = \sum_{j=1}^{m}\sum_{k=1}^{n} a_j b_k f(x_j, y_k).$$
The conditions for linearity and bilinearity of a function f : V × V → K are not equivalent. Both linear and bilinear functions satisfy conditions (a) and (b), but for a linear f we have f (ax, ay) = af (x, y) and for a bilinear f we have f (ax, ay) = a2 f (x, y).
Example 3.1.2. If f and g are linear forms on a vector space V, then the function h(x, y) = f(x) + g(y) is a linear form on V × V and the function k(x, y) = f(x)g(y) is a bilinear form on V. If K = C, the field of complex numbers, then there are reasons to replace condition (d) in the definition of bilinearity with the condition $f(x, ay) = \overline{a}\, f(x, y)$, where $\overline{a}$ denotes the complex conjugate of a.
Definition 3.1.3. By a sesquilinear form on a vector space V we mean a function s : V × V → K such that
(a) s(x_1 + x_2, y) = s(x_1, y) + s(x_2, y),
(b) s(x, y_1 + y_2) = s(x, y_1) + s(x, y_2),
(c) s(αx, y) = α s(x, y),
(d) s(x, αy) = $\overline{\alpha}$ s(x, y),
for all vectors x, x_1, x_2, y, y_1, y_2 ∈ V and all numbers α ∈ K.
As in the case of bilinear forms, the conditions (a)-(d) in the above definition can be expressed as a single equality:
$$s(a_1 x_1 + a_2 x_2, b_1 y_1 + b_2 y_2) = a_1\overline{b_1}\, s(x_1, y_1) + a_2\overline{b_1}\, s(x_2, y_1) + a_1\overline{b_2}\, s(x_1, y_2) + a_2\overline{b_2}\, s(x_2, y_2).$$
In general, we have
$$s\Bigl(\sum_{j=1}^{m} a_j x_j, \sum_{k=1}^{n} b_k y_k\Bigr) = \sum_{j=1}^{m}\sum_{k=1}^{n} a_j \overline{b_k}\, s(x_j, y_k).$$
Note that for a function f : V × V → R the conditions for bilinearity and sesquilinearity are equivalent.
Example 3.1.4. The function
$$s\bigl((u_1, u_2, u_3), (v_1, v_2, v_3)\bigr) = 3u_1\overline{v_1} + \sqrt{2}\, u_2\overline{v_2} + \tfrac{1}{5}\, u_3\overline{v_3}$$
is a sesquilinear form on the vector space C^3 over C.
Definition 3.1.5. A form f : V×V → K is called symmetric, if f (x, y) = f (y, x) for all x, y ∈ V. The functions h and k in Example 3.1.2 are symmetric if and only if f (x) = g(x) for all x ∈ V. Note that, if f : V × V → K is symmetric, then (b) in Definition 3.1.1 follows from (a). Similarly, (d) follows from (c).
The following theorem is an easy consequence of the definition of symmetric bilinear forms.
Theorem 3.1.6. Let V be a vector space over R and let s : V × V → R be a symmetric bilinear form. Then
$$s(x, y) = \tfrac{1}{4}\bigl[s(x+y, x+y) - s(x-y, x-y)\bigr]$$
for all x, y ∈ V. The identity in the above theorem is often referred to as a polarization identity. It implies that, if the values of s(x, x) are known for all x ∈ V, then the values of s(x, y) are known for all x, y ∈ V, which is often used in arguments. Corollary 3.1.7. If s1 and s2 are symmetric bilinear forms on a real vector space V such that s1 (x, x) = s2 (x, x)
for all x ∈ V,
then s1 (x, y) = s2 (x, y)
for all x, y ∈ V.
In particular, if s is a symmetric bilinear form such that s(x, x) = 0 for every x ∈ V, then s = 0. Note that the above property is not true for all bilinear forms. Indeed, for the bilinear form s : R^2 × R^2 → R defined by
$$s(x, y) = s\bigl((x_1, x_2), (y_1, y_2)\bigr) = x_1 y_2 - x_2 y_1$$
we have s(x, x) = 0 for every x ∈ R^2, but it is not true that s(x, y) = 0 for every x, y ∈ R^2. It turns out that for sesquilinear forms a different condition is more natural than symmetry.
Definition 3.1.8. A sesquilinear form s : V × V → K is called a hermitian form if $s(x, y) = \overline{s(y, x)}$ for all x, y ∈ V.
Example 3.1.9. The function
$$s\bigl((u_1, u_2, u_3), (v_1, v_2, v_3)\bigr) = 3u_1\overline{v_1} + \sqrt{2}\, u_2\overline{v_2} + \tfrac{1}{5}\, u_3\overline{v_3}$$
considered in Example 3.1.4 is a hermitian sesquilinear form, but the function
$$s\bigl((u_1, u_2, u_3), (v_1, v_2, v_3)\bigr) = 3u_1\overline{v_1} + \sqrt{2}\, u_2\overline{v_2} + \tfrac{i}{5}\, u_3\overline{v_3}$$
is not.
As in the case of symmetric forms, if s is a hermitian form, then the condition (b) in Definition 3.1.3 follows from (a) and (d) follows from (c).
Theorem 3.1.10. Let V be a vector space over C and let s : V×V → C be a sesquilinear form on V. Then s is hermitian if and only if s(x, x) ∈ R for every x ∈ V.
Proof. If s is hermitian, then for every x ∈ V we have $s(x, x) = \overline{s(x, x)}$, which means that s(x, x) ∈ R. Suppose now that s(v, v) ∈ R for every v ∈ V. Then
$$\alpha = s(x, y) + s(y, x) = s(x+y, x+y) - s(x, x) - s(y, y) \in \mathbb{R}$$
and
$$\beta = i\bigl(-s(x, y) + s(y, x)\bigr) = s(x, iy) + s(iy, x) = s(x+iy, x+iy) - s(x, x) - s(iy, iy) \in \mathbb{R}.$$
Since
$$s(x, y) = \tfrac{1}{2}(\alpha + i\beta) \quad\text{and}\quad s(y, x) = \tfrac{1}{2}(\alpha - i\beta),$$
we get $s(y, x) = \overline{s(x, y)}$, so s is hermitian.
Theorem 3.1.11. Let V be a vector space over C and let s : V × V → C be a hermitian form. Then
$$s(x, y) = \tfrac{1}{4}\bigl[s(x+y, x+y) - s(x-y, x-y) + i\,s(x+iy, x+iy) - i\,s(x-iy, x-iy)\bigr]$$
for all x, y ∈ V.
Proof. The result is a consequence of the following equalities:
$$s(x+y, x+y) - s(x-y, x-y) = 2s(x, y) + 2s(y, x) = 2s(x, y) + 2\overline{s(x, y)} = 4\operatorname{Re} s(x, y)$$
and
$$s(x+iy, x+iy) - s(x-iy, x-iy) = 4\operatorname{Re} s(x, iy) = 4\operatorname{Re}\bigl(-i\,s(x, y)\bigr) = 4\operatorname{Im} s(x, y).$$
The identity in the above theorem is a complex version of the polarization identity. As in the real case it implies the following useful property of hermitian sesquilinear forms. Corollary 3.1.12. If s1 and s2 are hermitian forms on a complex vector space V such that s1 (x, x) = s2 (x, x)
for all x ∈ V,
then s1 (x, y) = s2 (x, y)
for all x, y ∈ V.
In particular, if s is a sesquilinear form such that s(x, x) = 0 for every x ∈ V, then s = 0.
Definition 3.1.13. A sesquilinear form s : V ×V → K is called a positive form if s(x, x) ≥ 0 for every x ∈ V. A positive form is called positive definite if s(x, x) > 0 whenever x 6= 0.
The condition s(x, x) ≥ 0 implicitly assumes that s(x, x) ∈ R for every x ∈ V. Consequently, by Theorem 3.1.10, every positive form is hermitian.
Example 3.1.14. The function s : M_{2×2}(C) × M_{2×2}(C) → C defined by
$$s\left(\begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix}, \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}\right) = u_1\overline{v_1} + u_2\overline{v_2} + u_3\overline{v_3} + u_4\overline{v_4}$$
is a positive definite sesquilinear form on M2×2 (C).
Now we are in a position to define the generalization of the dot product to arbitrary vector spaces.
Definition 3.1.15. By an inner product on a vector space V we mean a positive definite sesquilinear form on V. A vector space V with an inner product is called an inner product space.
The inner product of two vectors x and y in V is denoted by hx, yi. Below we list all properties that constitute the definition of an inner product.
A function $\langle\cdot, \cdot\rangle : V \times V \to K$ is an inner product on V if the following conditions are satisfied.
1. $\langle\cdot, \cdot\rangle$ is sesquilinear:
(a) $\langle x_1 + x_2, y\rangle = \langle x_1, y\rangle + \langle x_2, y\rangle$ for all $x_1, x_2, y \in V$,
(b) $\langle x, y_1 + y_2\rangle = \langle x, y_1\rangle + \langle x, y_2\rangle$ for all $x, y_1, y_2 \in V$,
(c) $\langle \alpha x, y\rangle = \alpha\langle x, y\rangle$ for all $x, y \in V$ and $\alpha \in K$,
(d) $\langle x, \alpha y\rangle = \overline{\alpha}\langle x, y\rangle$ for all $x, y \in V$ and $\alpha \in K$;
2. $\langle\cdot, \cdot\rangle$ is hermitian: $\langle x, y\rangle = \overline{\langle y, x\rangle}$ for all $x, y \in V$;
3. $\langle\cdot, \cdot\rangle$ is positive definite: $\langle x, x\rangle > 0$ for all $0 \ne x \in V$.
In view of the previous comments and Theorem 3.1.10, in order to verify that a function h·, ·i : V × V → C is an inner product on V it suffices to check the following three conditions: (i) hα1 x1 + α2 x2 , yi = α1 hx1 , yi + α2 hx2 , yi for all x1 , x2 , y ∈ V and α1 , α2 ∈ K, (ii) hx, yi = hy, xi for all x, y ∈ V, (iii) hx, xi > 0 for all nonzero x ∈ V.
Example 3.1.16. The standard inner product in the vector space C^n is defined by
$$\bigl\langle (x_1, \dots, x_n), (y_1, \dots, y_n)\bigr\rangle = \sum_{j=1}^{n} x_j\overline{y_j} = x_1\overline{y_1} + \cdots + x_n\overline{y_n}.$$
Example 3.1.17. The functions defined in Examples 3.1.4 and 3.1.14 are examples of inner products. The vector space C^3 is an inner product space with the inner product defined by
$$\bigl\langle (u_1, u_2, u_3), (v_1, v_2, v_3)\bigr\rangle = 3u_1\overline{v_1} + \sqrt{2}\, u_2\overline{v_2} + \tfrac{1}{5}\, u_3\overline{v_3}.$$
More generally, for any positive real numbers $\alpha_1, \dots, \alpha_n$ the form
$$\bigl\langle (x_1, \dots, x_n), (y_1, \dots, y_n)\bigr\rangle = \sum_{j=1}^{n} \alpha_j x_j\overline{y_j} = \alpha_1 x_1\overline{y_1} + \cdots + \alpha_n x_n\overline{y_n}$$
is an inner product in C^n. The vector space M_{2×2}(C) is an inner product space with the inner product defined by
$$\left\langle \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix}, \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}\right\rangle = u_1\overline{v_1} + u_2\overline{v_2} + u_3\overline{v_3} + u_4\overline{v_4}.$$
This example can be easily generalized to Mm×n (C) for any positive integers m and n.
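For readers who like to experiment, the three-condition checklist above is easy to test numerically. The following is a minimal NumPy sketch (the helper name `weighted_inner`, the random test vectors, and the tolerance are ours, not the book's) that spot-checks conditions (i)-(iii) for the weighted inner product on C^3 from Example 3.1.17; such checks illustrate the conditions on sample data but of course do not prove them.

```python
import numpy as np

WEIGHTS = np.array([3.0, np.sqrt(2.0), 0.2])  # the weights 3, sqrt(2), 1/5

def weighted_inner(u, v):
    """<u, v> = 3*u1*conj(v1) + sqrt(2)*u2*conj(v2) + (1/5)*u3*conj(v3)."""
    return np.sum(WEIGHTS * u * np.conj(v))

rng = np.random.default_rng(0)
x1, x2, y = (rng.normal(size=3) + 1j * rng.normal(size=3) for _ in range(3))
a1, a2 = 2 - 1j, 0.5 + 3j

# (i) linearity in the first argument
lhs = weighted_inner(a1 * x1 + a2 * x2, y)
rhs = a1 * weighted_inner(x1, y) + a2 * weighted_inner(x2, y)
assert np.isclose(lhs, rhs)

# (ii) hermitian symmetry: <x, y> = conj(<y, x>)
assert np.isclose(weighted_inner(x1, y), np.conj(weighted_inner(y, x1)))

# (iii) positivity on a sample nonzero vector: <x, x> is real and positive
q = weighted_inner(x1, x1)
assert q.real > 0 and abs(q.imag) < 1e-12
```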
Example 3.1.18. The vector space C_{[a,b]}(C) of all continuous complex-valued functions on the interval [a, b] is an inner product space with the inner product defined by
$$\langle f, g\rangle = \int_a^b f(t)\overline{g(t)}\,dt.$$
More generally, for any $\varphi \in C_{[a,b]}(C)$ such that $\varphi(t) > 0$ for all $t \in [a, b]$, the form
$$\langle f, g\rangle = \int_a^b f(t)\overline{g(t)}\,\varphi(t)\,dt$$
is an inner product in C[a,b] (C).
Example 3.1.19. Let $V = C^2_{[0,1]}(C)$, the space of complex-valued functions on [0, 1] with continuous second derivatives. Show that
$$\langle f, g\rangle = f(0)\overline{g(0)} + f'(0)\overline{g'(0)} + \int_0^1 f''(t)\overline{g''(t)}\,dt$$
is an inner product. Solution. The only nontrivial part is showing that hf, f i = 0 implies f = 0.
Since
$$\langle f, f\rangle = f(0)\overline{f(0)} + f'(0)\overline{f'(0)} + \int_0^1 f''(t)\overline{f''(t)}\,dt = |f(0)|^2 + |f'(0)|^2 + \int_0^1 |f''(t)|^2\,dt,$$
if $\langle f, f\rangle = 0$, then f(0) = 0, f'(0) = 0, and f''(t) = 0 for all t ∈ [0, 1]. From f''(t) = 0 we get f(t) = at + b. Then, from f(0) = 0 we get b = 0 and finally from f'(0) = 0 we get a = 0. Consequently, f(t) = 0 for all t ∈ [0, 1].
Example 3.1.20. Let $V = C^1_{[0,1]}(C)$, the space of complex-valued functions on [0, 1] with continuous derivatives. Show that
$$\langle f, g\rangle = \int_0^1 f'(t)\overline{g'(t)}\,dt$$
is not an inner product. Solution. The defined function is not an inner product because it is not positive definite. Indeed, hf, f i = 0 implies f ′ = 0, but this does not mean that f = 0 because f could be any constant, not necessarily 0. Note that the defined function is a positive sesquilinear form.
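The failure of positive definiteness can also be seen concretely. The sketch below (a crude grid approximation of the integral; the grid size and the sample constant function are our choices) shows the form vanishing on a nonzero constant function.

```python
import numpy as np

# <f, g> = integral over [0, 1] of f'(t) * conj(g'(t)), approximated by a Riemann sum.
t = np.linspace(0.0, 1.0, 10001)
dt = t[1] - t[0]

def form(fprime_vals, gprime_vals):
    return np.sum(fprime_vals * np.conj(gprime_vals)) * dt

f_prime = np.zeros_like(t)        # derivative of the constant function f(t) = 7
print(form(f_prime, f_prime))     # 0.0, although f is not the zero function
```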
The inequality in the next theorem, known as Schwarz's Inequality, is one of the most important and useful properties of the inner product.
Theorem 3.1.21 (Schwarz's Inequality). Let V be an inner product space. Then
$$|\langle x, y\rangle|^2 \le \langle x, x\rangle\langle y, y\rangle$$
for all x, y ∈ V.
Proof. If y = 0, then the inequality is trivially true since both sides are equal to zero. If y ≠ 0, then
$$0 \le \langle x + \alpha y, x + \alpha y\rangle = \langle x, x\rangle + \overline{\alpha}\langle x, y\rangle + \alpha\langle y, x\rangle + |\alpha|^2\langle y, y\rangle$$
for any $\alpha \in \mathbb{C}$. If we let $\alpha = -\dfrac{\langle x, y\rangle}{\langle y, y\rangle}$, then the above inequality becomes
$$0 \le \langle x, x\rangle - \frac{\langle y, x\rangle}{\langle y, y\rangle}\langle x, y\rangle - \frac{\langle x, y\rangle}{\langle y, y\rangle}\langle y, x\rangle + \frac{|\langle x, y\rangle|^2}{\langle y, y\rangle^2}\langle y, y\rangle.$$
After multiplying the above inequality by hy, yi and simplifying we get 0 ≤ hx, xihy, yi − |hx, yi|2 , which is Schwarz’s inequality.
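The inequality, and the equality case treated in Theorem 3.1.22 below, are easy to illustrate numerically. This is a minimal sketch (our own helper `inner`, random trials, and tolerances); it only exercises the statement on sample vectors and is no substitute for the proof.

```python
import numpy as np

def inner(x, y):
    # standard inner product on C^n: sum of x_j * conj(y_j)
    return np.sum(x * np.conj(y))

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.normal(size=4) + 1j * rng.normal(size=4)
    y = rng.normal(size=4) + 1j * rng.normal(size=4)
    assert abs(inner(x, y)) ** 2 <= (inner(x, x) * inner(y, y)).real + 1e-9

# equality holds (up to round-off) when x and y are linearly dependent
x = rng.normal(size=4) + 1j * rng.normal(size=4)
y = (2 - 5j) * x
assert np.isclose(abs(inner(x, y)) ** 2, (inner(x, x) * inner(y, y)).real)
```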
Theorem 3.1.22. Let V be an inner product space and let x, y ∈ V. Then |hx, yi|2 = hx, xihy, yi if and only if the vectors x and y are linearly dependent.
Proof. Let x and y be linearly dependent vectors in V. Without loss of generality we can assume that y = αx for some α ∈ K. Then
$$|\langle x, y\rangle|^2 = |\langle x, \alpha x\rangle|^2 = |\alpha|^2\bigl(\langle x, x\rangle\bigr)^2 = \langle x, x\rangle\,\alpha\overline{\alpha}\,\langle x, x\rangle = \langle x, x\rangle\langle \alpha x, \alpha x\rangle = \langle x, x\rangle\langle y, y\rangle.$$
Now, we assume that x and y are vectors in V such that $|\langle x, y\rangle|^2 = \langle x, x\rangle\langle y, y\rangle$. Then $\langle x, y\rangle\langle y, x\rangle = \langle x, x\rangle\langle y, y\rangle$ and consequently
$$\bigl\langle \langle y, y\rangle x - \langle x, y\rangle y, \langle y, y\rangle x - \langle x, y\rangle y\bigr\rangle = \langle y, y\rangle^2\langle x, x\rangle - \langle y, y\rangle\langle y, x\rangle\langle x, y\rangle - \langle x, y\rangle\langle y, y\rangle\langle y, x\rangle + \langle x, y\rangle\langle y, x\rangle\langle y, y\rangle = 0.$$
This shows that hy, yix − hx, yiy = 0, which implies linear dependence of x and y. The dot product in Rn has an important geometric meaning. For example, if x and y are nonzero vectors in R3 and x • y = 0, then the vectors are perpendicular, that is, the angle between them is 90◦ . In general vector spaces we do not have that geometric interpretation. For example, what would “the angle between functions sin t and cos t” even mean? On the other hand, the importance of the dot product in Rn goes far beyond its connection with the angle between vectors. Many of those properties and applications of the dot product extend to the inner product in general vector spaces. Definition 3.1.23. Let V be an inner product space. Vectors x, y ∈ V are called orthogonal if hx, yi = 0.
Example 3.1.24. Show that the vectors $(1+2i, 2-i)$ and $(-2-i, 1-2i)$ are orthogonal in the inner product space C^2.
Solution.
$$\bigl\langle (1+2i, 2-i), (-2-i, 1-2i)\bigr\rangle = (1+2i)\overline{(-2-i)} + (2-i)\overline{(1-2i)} = (1+2i)(-2+i) + (2-i)(1+2i) = 0.$$
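The same computation takes a couple of lines in NumPy (a sketch with our own variable names; note that `np.vdot` conjugates its first argument, so we write the sum explicitly to match the convention used in this book, which conjugates the second vector).

```python
import numpy as np

u = np.array([1 + 2j, 2 - 1j])
v = np.array([-2 - 1j, 1 - 2j])

inner_uv = np.sum(u * np.conj(v))   # <u, v> = u1*conj(v1) + u2*conj(v2)
print(inner_uv)                     # 0j, so the vectors are orthogonal
```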
Example 3.1.25. Consider the vector space of continuous functions defined on the interval [−1, 1] with the inner product
$$\langle f, g\rangle = \int_{-1}^{1} f(t)\overline{g(t)}\,dt.$$
Show that an odd function f(t), that is, a function such that f(−t) = −f(t) for all t ∈ [−1, 1], and cos t are orthogonal.
Solution. First we note that the function f(t) cos t is odd since f(−t) cos(−t) = −f(t) cos t for all t ∈ [−1, 1]. Hence,
$$\langle f(t), \cos t\rangle = \int_{-1}^{0} f(t)\cos t\,dt + \int_{0}^{1} f(t)\cos t\,dt = -\int_{0}^{1} f(t)\cos t\,dt + \int_{0}^{1} f(t)\cos t\,dt = 0.$$
Another tool that plays an important role in the linear algebra of R^n is the norm. The standard norm, called the Euclidean norm, is defined as
$$\|(x_1, \dots, x_n)\| = \sqrt{\sum_{k=1}^{n} |x_k|^2}.$$
A norm in a general vector space V is introduced as a function $\|\cdot\| : V \to [0, \infty)$ satisfying certain conditions.
Definition 3.1.26. Let V be a vector space. By a norm in V we mean a function $\|\cdot\| : V \to [0, \infty)$ such that
(a) $\|x + y\| \le \|x\| + \|y\|$,
(b) $\|\alpha x\| = |\alpha|\,\|x\|$,
(c) $\|x\| = 0$ if and only if $x = 0$,
for all vectors x, y ∈ V and all numbers α ∈ K. A vector space with a norm is called a normed space. The inequality $\|x + y\| \le \|x\| + \|y\|$ is called the triangle inequality.
Example 3.1.27. Here are two examples of norms in K^n:
$$\|(x_1, \dots, x_n)\| = \sum_{k=1}^{n} |x_k|, \qquad \|(x_1, \dots, x_n)\| = \max\{|x_1|, \dots, |x_n|\}.$$
Example 3.1.28. The function
$$\|f\| = \int_a^b |f(t)|\,dt$$
is a norm in the vector space C_{[a,b]}(C) of all continuous functions f : [a, b] → C.
It turns out that the inner product in a vector space defines in a natural way a norm in that space. That norm has the best algebraic and geometric properties.
Theorem 3.1.29. Let V be an inner product space. The function
$$\|x\| = \sqrt{\langle x, x\rangle}$$
is a norm in V.
Proof. First notice that $\|x\|$ is well-defined because $\langle x, x\rangle$ is always a nonnegative real number. Since the inner product is a positive definite form, $\|x\| = 0$ if and only if $x = 0$. Moreover
$$\|\alpha x\| = \sqrt{\langle \alpha x, \alpha x\rangle} = \sqrt{\alpha\overline{\alpha}\langle x, x\rangle} = |\alpha|\,\|x\|.$$
The triangle inequality follows from Schwarz’s inequality: kx + yk2 = hx + y, x + yi = hx, xi + 2 Rehx, yi + hy, yi ≤ hx, xi + 2|hx, yi| + hy, yi
≤ kxk2 + 2kxkkyk + kyk2 (by Schwarz’s inequality) = (kxk + kyk)2 .
Hence $\|x + y\| \le \|x\| + \|y\|$.
When we say the norm in an inner product space we always mean the norm $\|x\| = \sqrt{\langle x, x\rangle}$.
Example 3.1.30. The standard norm in the vector space C^n is defined by the standard inner product, which means that
$$\|(x_1, \dots, x_n)\| = \sqrt{\bigl\langle (x_1, \dots, x_n), (x_1, \dots, x_n)\bigr\rangle} = \sqrt{\sum_{j=1}^{n} x_j\overline{x_j}} = \sqrt{\sum_{j=1}^{n} |x_j|^2}.$$
Example 3.1.31. Since
$$\|(1+2i, 2-3i, -2)\|^2 = \bigl\langle (1+2i, 2-3i, -2), (1+2i, 2-3i, -2)\bigr\rangle = 5 + 13 + 4 = 22,$$
the norm of the vector $(1+2i, 2-3i, -2) \in \mathbb{C}^3$ is $\sqrt{22}$.
Example 3.1.32. Consider the vector space of continuous functions defined on the interval [−π, π] with the inner product
$$\langle f, g\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\overline{g(t)}\,dt.$$
Find $\|\sin nt\|$ for any positive integer n.
Solution. Since
$$\|\sin nt\|^2 = \langle \sin nt, \sin nt\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} (\sin nt)^2\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} \bigl(1 - \cos(2nt)\bigr)\,dt = 1,$$
we have $\|\sin nt\| = 1$.
Schwarz's Inequality is often stated and applied in the form given in the following corollary.
Corollary 3.1.33. Let V be an inner product space. Then $|\langle x, y\rangle| \le \|x\|\,\|y\|$ for all x, y ∈ V.
From Theorem 3.1.11 we obtain the polarization identity for the inner product expressed in terms of the norm.
Corollary 3.1.34 (Polarization identity). Let V be an inner product space over C. Then
$$\langle x, y\rangle = \tfrac{1}{4}\bigl(\|x+y\|^2 - \|x-y\|^2 + i\|x+iy\|^2 - i\|x-iy\|^2\bigr)$$
for all x, y ∈ V.
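The polarization identity says that the inner product can be recovered from the norm alone, and this is easy to see in a small numerical experiment. Below is a minimal sketch (our own helpers and random data) for the standard inner product on C^n.

```python
import numpy as np

def inner(x, y):
    return np.sum(x * np.conj(y))      # standard inner product on C^n

def norm_sq(x):
    return inner(x, x).real            # ||x||^2 = <x, x>

rng = np.random.default_rng(2)
x = rng.normal(size=3) + 1j * rng.normal(size=3)
y = rng.normal(size=3) + 1j * rng.normal(size=3)

recovered = 0.25 * (norm_sq(x + y) - norm_sq(x - y)
                    + 1j * norm_sq(x + 1j * y) - 1j * norm_sq(x - 1j * y))
assert np.isclose(recovered, inner(x, y))   # the inner product is recovered from norms
```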
We close this section with a general version of the Pythagorean Theorem that we are all familiar with from geometry. Theorem 3.1.35 (The Pythagorean Theorem). If u and v are orthogonal vectors in an inner product space, then ku + vk2 = kuk2 + kvk2 .
Proof. If u and v are orthogonal, then hu, vi = hv, ui = 0 and thus ku + vk2 = hu + v, u + vi = hu, ui + hu, vi + hv, ui + hv, vi = kuk2 + kvk2 .
3.2
Orthogonal projections
In an elementary calculus course you have probably seen an exercise similar to the following one: Find a point on the line 2x + 5y = 7 that is closest to the point (3, 6). In calculus we use derivatives to find such a point. Linear algebra offers a simpler and more elegant way of solving the above problem. Moreover, the algebraic method generalizes in a natural way to more difficult problems. For example, it would be rather difficult to use methods from calculus to solve the following problem: Find numbers a, b, and c that minimize the integral $\int_0^1 \bigl(e^t - a - bt - ct^2\bigr)^2\,dt$.
We will see in this section that both problems are quite similar from the point of view of linear algebra and that the method uses the inner product in a substantial way.
3.2.1
Orthogonal projections on lines
[Figure: the orthogonal projection p of a vector v on the line through u and the origin; the difference v − p is orthogonal to that line.]
We start by considering the simplest case of an orthogonal projection in a general vector space, namely, projection on a one-dimensional subspace. In R2 we observe that, if u is a nonzero vector and v is a point not on the line through u and the origin, then the point p on that line that is the closest to v is characterized by the fact that hv − p, ui = 0. This property generalizes to any inner product space.
Theorem 3.2.1. Let V be an inner product space and let u, v ∈ V. If u 6= 0, then there is a unique vector p ∈ Span{u} such that hv − p, ui = 0. Proof. If p ∈ Span{u}, then there is a number α ∈ K such that p = αu. If hv − p, ui = 0, then hv − p, ui = hv − αu, ui = 0. The equation hv − αu, ui = 0 has a unique solution α = p=
hv, ui . Consequently, hu, ui
hv, ui u is the unique vector in Span{u} such that hv − p, ui = 0. hu, ui
Definition 3.2.2. Let V be an inner product space and let u ∈ V be a nonzero vector. For v ∈ V we define projSpan{u} (v) =
hv, ui hv, ui u= u. hu, ui kuk2
The vector projSpan{u} (v) is called the orthogonal projection of v on the subspace Span{u}.
If u is a unit vector, that is, kuk = 1, then the expression for the projection on Span{u} can be simplified: projSpan{u} (v) = hv, uiu. Assuming that u is a unit vector is not restrictive, since for any u 6= 0 we have Span{u} = Span
u kuk
.
Note that projSpan{u} : V → V is a linear operator. Example 3.2.3. Inthe inner product space C3 find the projection of the 1 i vector 1 on Span −1 . 1 1
Solution. Since *
+ i i −1 , −1 = 3 and 1 1
* + 1 i 1 , −1 = −i, 1 1
1 i the projection of the vector 1 on Span −1 is 1 1 1 i −i 13 −1 = 3 i . 3 1 −1i
3
Note that, as expected, we have 1 *1 i + 3 1 1 − 3 i , −1 = 0. 1 1 − 13 i
Example 3.2.4. Consider the vector space of continuous functions defined on the interval [−π, π] with the inner product Z 1 π hf, gi = f (t)g(t) dt. π −π Find the projection of the function f (t) = t on Span{sin nt} where n is an arbitrary positive integer. Solution. The projection is ht, sin nti sin nt. hsin nt, sin nti Since Z
Z cos nt 1 cos nt sin nt t sin nt dt = t − + cos nt dt = t − + , n n n n2
we have ht, sin nti =
1 π
Z
π
−π
2 t sin nt dt = (−1)n+1 . n
131
3.2. ORTHOGONAL PROJECTIONS And, since 1 hsin nt, sin nti = k sin ntk = π 2
Z
π
1 (sin nt) dt = 2π −π 2
Z
π
−π
(1 − cos 2nt) dt = 1,
the projection of the function t on Span{sin(nt)} is (−1)n+1
2 sin nt. n
Example 3.2.5. In the vector space Cn with the standard inner product the x1 y1 .. projection of the vector . on Span ... is xn yn * x1 y1 + .. .. . , . xn yn
2
y1
..
.
yn
x1 y1 y1 . . 1 .. . . = 2 y1 . . . yn . ..
y1
xn yn yn
..
.
yn x1 y1 1 = 2 ... y1 . . . yn ... .
y1
xn yn
..
.
yn
y1 y1 .. The matrix . y1 . . . yn is called the projection matrix on Span ... . yn yn
The following theorem establishes properties of projections in general vector spaces that should look familiar from our experience with projections on lines in R2 and R3 . Part (a) says that the projections is the unique vector minimizing the distance. In part (b) we say that the magnitude of the projection cannot exceed the magnitude of the original vector. And in part (c) we show that the projection and the original vector are the same if and only if the original vector is on the line.
Theorem 3.2.6. Let V be an inner product space and let u be a nonzero vector in V. For all v ∈ V we have (a) kv − projSpan{u} (v)k < kv − βuk β 6=
hv,ui hu,ui ;
for every β ∈ K such that
(b) kprojSpan{u} (v)k ≤ kvk; (c) projSpan{u} (v) = v
if and only if
v ∈ Span{u}.
Proof. (a) From Theorem 3.2.1 we get D E D E v − projSpan{u} (v), αu = α v − projSpan{u} (v), u = 0
for any α ∈ C. Consequently, for any β ∈ C we have D E v − projSpan{u} (v), projSpan{u} (v) − βu = 0,
(3.1)
because projSpan{u} (v) − βu ∈ Span{u}. From (3.1) we get
kv − projSpan{u} (v)k2 + kprojSpan{u} (v) − βuk2 = kv − βuk2 .
(3.2)
Hence kv − projSpan{u} (v)k < kv − βuk whenever kprojSpan{u} (v) − βuk 6= 0, that is, for every β 6= (b) If we let β = 0 in (3.2), then we get
hv,ui hu,ui .
kv − projSpan{u} (v)k2 + kprojSpan{u} (v)k2 = kvk2 . Consequently kprojSpan{u} (v)k ≤ kvk. (c) If projSpan{u} (v) = v, then v = projSpan{u} (v) ∈ Span{u}. Now, if v ∈ Span{u}, then v = αu for some α ∈ C and thus projSpan{u} (v) =
hv, ui hαu, ui u= u = αu = v. hu, ui hu, ui
From the properties of projections proved in Theorem 3.2.6 we can obtain the results in Theorems 3.1.21 and 3.1.22 in a different way.
Theorem 3.2.7. Let V be an inner product space. For all x, y ∈ V we have (a) |hx, yi| ≤ kxkkyk; (b) |hx, yi| = kxkkyk if and only if x and y are linearly dependent. Proof. (a) Since the inequality is trivial when x = 0, we can assume that x 6= 0. By Theorem 3.2.6 we have kprojSpan{x} (y)k ≤ kyk, which means that |hx, yi| kxk ≤ kyk. kxk2 Hence |hx, yi| ≤ kxkkyk. (b) Again, without loss of generality, we can assume that x 6= 0. Then the following statements are equivalent: x and y are linearly dependent y ∈ Span{x} kprojSpan{x} (y)k = kyk
(by (c) in Theorem 3.2.6)
|hx, yi| kxk = kyk kxk2 |hx, yi| = kxkkyk
3.2.2
Orthogonal projections on arbitrary subspaces
Now we generalize orthogonal projections to arbitrary subspaces. While some properties remain the same, some aspects of projections become more complicated when the dimension of the subspace is more than one. We use the property described in Theorem 3.2.1 to define an orthogonal projection on a subspace. Definition 3.2.8. Let U be a subspace of an inner product space V and let v ∈ V. A vector p ∈ U is called an orthogonal projection of the vector v on the subspace U if hv − p, ui = 0 for every vector u ∈ U. Note that it is not clear if the projection always exists. This question will be addressed later.
Theorem 3.2.9. Let U be a subspace of an inner product space V and let v ∈ V. If an orthogonal projection of v on the subspace U exists, then it is unique. Proof. Assume that both p1 and p2 are orthogonal projections of v on the subspace U, that is, hv − p1 , ui = hv − p2 , ui = 0 for every u ∈ U. Since p1 , p2 ∈ U, we have 0 = hv − p1 , p2 i = hv, p2 i − hp1 , p2 i and 0 = hv − p2 , p2 i = hv, p2 i − hp2 , p2 i = hv, p2 i − kp2 k2 . Consequently, hp1 , p2 i = kp2 k2 . Similarly, we can show that hp2 , p1 i = kp1 k2 . Hence kp1 − p2 k2 = hp1 − p2 , p1 − p2 i = kp1 k2 − hp1 , p2 i − hp2 , p1 i + kp2 k2 = 0, proving that p1 = p2 . The unique orthogonal projection of a vector v on a subspace U is denoted by projU (v). Example 3.2.10. Consider the vector space V of continuous functions defined on the interval [−π, π] with the inner product hf, gi =
1 2π
Z
π
f (t)g(t) dt
−π
and its subspace U = Span{1, cos t, sin t}. Show that the function 2 sin t is a projection of the function t on the subspace U. Solution. Since Z π 1 (t − 2 sin t) dt = 0, 2π −π Z π 1 ht − 2 sin t, cos ti = (t − 2 sin t) cos t dt = 0, 2π −π ht − 2 sin t, 1i =
and
1 ht − 2 sin t, sin ti = 2π
we have projU (t) = 2 sin t.
Z
π
−π
(t − 2 sin t) sin t dt = 0,
Example 3.2.11. Consider the vector space V of continuous R 1 functions defined on the interval [−1, 1] with the inner product hf, gi = −1 f (t)g(t) dt. Find a projection of a function f ∈ V on the subspace E of even functions. Solution. First we note that, if f ∈ V, then the function 21 (f (t) + f (−t)) is even and, for any function g ∈ E, the function (f (t) − f (−t))g(t) is odd. Since Z 1 Z 1 1 1 f (t) − (f (t) + f (−t)) g(t) dt = (f (t) − f (−t))g(t) dt = 0 2 2 −1 −1 for any function g ∈ E, we conclude that projE (f (t)) = 12 (f (t) + f (−t)).
Definition 3.2.12. Let U be a subspace of an inner product space V and let v be a vector in V. A vector p ∈ U is called the best approximation to the vector v by vectors from the subspace U if kv − pk < kv − uk for every u ∈ U such that u 6= p. Note that the definition of the best approximation implies that, if a vector has a best approximation by vectors from the subspace, then that approximation is unique. A projection of a vector v ∈ V on a subspace U can be interpreted as the best approximation of v by vectors from U. This point of view is natural in many applications. Theorem 3.2.13. Let U be a subspace of an inner product space V and let v be a vector in V. The following conditions are equivalent: (a) p ∈ U is the orthogonal projection of v on U; (b) p ∈ U is the best approximation to the vector v by vectors from U. In other words, hv−p, ui = 0 for every u ∈ U if and only if kv−pk < kv−uk for every u ∈ U such that u 6= p. Proof. If p is an orthogonal projection of v on U, then hv − p, ui = 0 for every vector u ∈ U. Let q be an arbitrary vector from U. Then kv − qk2 = kv − p + p − qk2 .
Since p − q ∈ U, we have hv − p, p − qi = 0 and thus kv − qk2 = kv − pk2 + kp − qk2 , by the Pythagorean Theorem 3.1.35. Hence the vector q = p is the best approximation to the vector v by vectors from U. This shows that (a) implies (b). Now assume that the vector p is the best approximation to the vector v by vectors from U. Let u be an arbitrary nonzero vector in U and let z be a number from K. Then kv − p + zuk2 ≥ kv − pk2 and consequently |z|2 kuk2 + zhu, v − pi + zhv − p, ui ≥ 0. If we take z = thv−p, ui, where t is a real number, and suppose that hv−p, ui 6= 0, then we get t2 kuk2 + 2t ≥ 0 for every real numbers t, which is not possible because kuk 6= 0, and thus hv − p, ui = 0. Since u is an arbitrary nonzero vector in U, p is an orthogonal projection of v on U. Therefore (b) implies (a). The next theorem can be helpful when calculating the projection of a vector on the subspace spanned by vectors u1 , . . . , uk . Theorem 3.2.14. Let u1 , u2 , . . . , uk be vectors in an inner product space V and let v ∈ V. The following conditions are equivalent: (a) projSpan{u1 ,...,uk } (v) = x1 u1 + x2 u2 + · · · + xk uk ; x1 hu1 , u1 i + x2 hu2 , u1 i + · · · + xk huk , u1 i = hv, u1 i x1 hu1 , u2 i + x2 hu2 , u2 i + · · · + xk huk , u2 i = hv, u2 i (b) .. .. .. .. . . . . x1 hu1 , uk i + x2 hu2 , uk i + · · · + xk huk , uk i = hv, uk i Proof. projSpan{u1 ,...,uk } (v) = x1 u1 + · · · + xk uk if and only if hv − p, ui = 0 for every u ∈ Span{u1 , . . . , uk }. Since every u ∈ Span{u1 , . . . , uk } is of the form u = y1 u1 + · · · + yk uk , for some y1 , . . . , yk ∈ K, the equation hv − p, ui = 0 is equivalent to the equations hv − p, u1 i = 0, hv − p, u2 i = 0, . . . , hv − p, uk i = 0
3.2. ORTHOGONAL PROJECTIONS or hv − (x1 u1 + · · · + xk uk ), u1 i = 0,
hv − (x1 u1 + · · · + xk uk ), u2 i = 0, .. . hv − (x1 u1 + · · · + xk uk ), uk i = 0, which can also be written as x1 hu1 , u1 i + · · · + xk huk , u1 i = hv, u1 i, x1 hu1 , u2 i + · · · + xk huk , u2 i = hv, u2 i, .. . x1 hu1 , uk i + · · · + xk huk , uk i = hv, uk i.
0 i 1 Example 3.2.15. Find the projection of the vector 1 on Span 1, 0 0 1 0 in the inner product space C3 . Solution. We have to solve the system
x1 hu1 , u1 i + x1 hu1 , u2 i +
x2 hu2 , u1 i x2 hu2 , u2 i
= hv, u1 i , = hv, u2 i
0 i 1 where u1 = 1, u2 = 0, and v = 1, that is, the system 0 0 1
Since the solutions are x1 =
3x1 ix1 1 2
+ ix2 − x2
= 1 . = 0
and x2 = 2i , the projection is
0 1 i 1 i 1 1 + 0 = 2 . 2 2 1 1 0 2
It is easy to verify that
*
+ * + 0 0 1 1 v − 2 , u1 = 0 and v − 2 , u2 = 0. 1 2
1 2
We are now in a position to prove that the orthogonal projection on a finite dimensional subspace of an inner product space always exists. We show that by showing that the system of equations in Theorem 3.2.14 always has a unique solution. Theorem 3.2.16. Let U be a finite dimensional subspace of an inner product space V. For any vector v ∈ V the orthogonal projection of v on the subspace U exists.
Proof. Let U = Span{u1 , . . . , uk }. Without loss of generality we can suppose that the vectors u1 , . . . , uk are linearly independent. In view of Theorem 3.2.14 it suffices to show that the matrix hu1 , u1 i . . . huk , u1 i .. .. . . hu1 , uk i . . . huk , uk i
is invertible. If
hu1 , u1 i . . . huk , u1 i x1 0 .. .. .. .. . = . , . . hu1 , uk i . . . huk , uk i xk 0
for some x1 , . . . , xk ∈ K, then
hx1 u1 + · · · + xk uk , uj i = 0 for 1 ≤ j ≤ k. Hence hx1 u1 + · · · + xk uk , x1 u1 + · · · + xk uk i = kx1 u1 + · · · + xk uk k2 = 0 and thus x1 u1 + · · · + xk uk = 0. Since the vectors u1 , . . . , uk are linearly independent, we get x1 = · · · = xk = 0. The assumption that the subspace U in the above theorem is finite dimensional is essential. For infinite dimensional subspaces the projection may not
exist. For example, consider the space V ofR all continuous functions on the in1 terval [0, 1] with the inner product hf, gi = 0 f (t)g(t) dt and the subspace U of all polynomials. Since there is no polynomial p such that Z
0
1
et − p(t) q(t) dt = 0
for every polynomial q, the function et does not have an orthogonal projection on the subspace of all polynomials.
3.2.3
Calculations and applications of orthogonal projections
Theorem 3.2.14 gives us a method for effectively calculating projections on subspaces spanned by arbitrary vectors u1 , . . . , uk . It turns out that the calculations are significantly simplified if the vectors u1 , . . . , uk are orthogonal. Definition 3.2.17. Let v1 , . . . , vk be vectors in an inner product space V. We say that the set {v1 , . . . , vk } is an orthogonal set if hvi , vj i = 0 for all i, j = 1, . . . , k such that i 6= j. An orthogonal set {v1 , . . . , vk } is called an orthonormal set if kvi k = 1 for all i = 1, . . . , k.
The condition of orthonormality is often expressed in terms of the Kronecker delta function: ( 1 if i = j, δij = 0 if i 6= j. Using the Kronecker delta function we can say that a set {v1 , . . . , vk } is orthonormal if hvi , vj i = δij for all i, j ∈ {1, . . . , k}. Theorem 3.2.18. Let {u1 , . . . , uk } be an orthogonal set of nonzero vectors in an inner product space V and let U = Span{u1 , . . . , uk }. Then projU v =
hv, u1 i hv, uk i u1 + · · · + uk hu1 , u1 i huk , uk i
for every vector v in V.
Proof. Let p=
hv, uk i hv, u1 i u1 + · · · + uk . hu1 , u1 i huk , uk i
Then, for every j ∈ {1, . . . , k}, we have hv, u1 i hv, uk i hv − p, uj i = v − u1 − · · · − uk , uj hu1 , u1 i huk , uk i hv, uk i hv, u1 i hu1 , uj i − · · · − huk , uj i = hv, uj i − hu1 , u1 i huk , uk i hv, uj i huj , uj i = 0. = hv, uj i − huj , uj i Since every u ∈ Span{u1 , . . . , uk } is of the form u = x1 u1 + · · · + xk uk , for some x1 , x2 , . . . , xk ∈ K, it follows that hv − p, ui = 0 for every u ∈ Span{u1 , . . . , uk }, which means that p = projU v. Note that Theorem 3.2.18 implies that, if {u1 , . . . , uk } is an orthogonal set of nonzero vectors, then the best approximation to the vector v by vectors from the subspace Span{u1 , . . . , uk } is the vector hv, u1 i hv, uk i u1 + · · · + uk . hu1 , u1 i huk , uk i If {u1 , . . . , uk } is an orthonormal set, then the formula for the projection becomes even simpler. Corollary 3.2.19. Let {u1 , . . . , uk } be an orthonormal set in an inner product space V and let U = Span{u1 , . . . , uk }. Then projU v = hv, u1 iu1 + · · · + hv, uk iuk for every vector v in V.
Example 3.2.20. Consider the vector space of continuous functions defined on the interval [α, β] with the inner product Z β 2 f (t)g(t) dt. hf, gi = β−α α n o 2πt 10πt Show that the set cos β−α , sin β−α is an orthonormal set and thus for every
3.2. ORTHOGONAL PROJECTIONS function f continuous on the interval [α, β] we have projU (f ) =
2 β−α
Z
β
f (t) cos
α
2πt 2πt 2 dt cos + β−α β−α β−α
o n 10πt 2πt , sin β−α . where U = Span cos β−α
Z
β
f (t) sin
α
10πt 10πt dt sin , β−α β−α
Solution. First we recall that for an arbitrary positive integer k we have 2kπt α−β 2kπβ 2kπα dt = − cos cos β−α 2kπ β−α β−α α 2kπβ 2kπ(α − β + β) α−β − cos cos = 2kπ β−α β−α 2kπβ 2kπ(α − β) 2kπβ 2kπβ α−β 2kπ(α − β) cos − sin sin − cos = cos = 0. 2kπ β−α β−α β−α β−α β−α Z
β
sin
Now we use the trigonometric identity sin α cos β =
1 (sin(α + β) + sin(α − β)) 2
and the above result to calculate the inner product: Z β 10πt 2 2πt 10πt 2πt , sin = cos sin dt cos β−α β−α β−α α β−α β−α Z β 1 1 12πt 8πt = sin + sin dt = 0. β−α α 2 β−α β−α This proves orthogonality of the set. Using a similar approach we can show that
2
2
cos 2πt = sin 10πt = 1,
β−α β − α which completes the proof of orthonormality. Then the formula for the inner product hf, gi follows from Corollary 3.2.19.
Example 3.2.21. Let P2 ([−1, 1]) be the space of complex valued polynomials on the interval [−1, 1] of degree at most 2 with the inner product defined as hp(t), q(t)i = Show that
n
√1 , 2
q
3 2 t,
q
5 2
3 2 2t
−
1 2
o
Z
1
p(t)q(t) dt.
−1
is an orthonormal set in P2 ([−1, 1]).
Solution. We need to calculate the following integrals: r √ Z 1 Z 1 1 3 3 √ t dt = t dt = 0, 2 2 2 −1 −1 r √ Z 1 Z 1 1 5 3 2 1 5 √ t − dt = (3t2 − 1) dt = 0, 2 4 −1 2 2 2 −1 √ Z 1 Z 1r r 3 5 3 2 1 15 t t − dt = t(3t2 − 1) dt = 0, 2 2 2 2 4 −1 −1 2 Z 1 Z 1 1 1 √ 1 dt = 1, dt = 2 −1 2 −1 Z 1 r !2 Z 3 1 2 3 t dt = t dt = 1, 2 2 −1 −1 Z
1 −1
r !2 Z 2 5 3 2 1 5 1 t − dt = 3t2 − 1 dt = 1. 2 2 2 8 −1
Example 3.2.22. Let V be an inner product space and let {v1 , v2 } be an n o orthonormal set in V. Show that the set √12 (v2 + v1 ), √12 (v2 − v1 ) is orthonormal. Solution. We have 1 1 1 √ (v2 + v1 ), √ (v2 − v1 ) = (hv2 , v2 i−hv2 , v1 i+hv2 , v1 i−hv1 , v1 i) = 0, 2 2 2
2
1
√ (v2 + v1 ) = √1 (v2 + v1 ), √1 (v2 + v1 )
2
2 2 1 = (hv2 , v2 i + hv1 , v2 i + hv2 , v1 i + hv1 , v1 i) 2 1 = (hv2 , v2 i + hv1 , v1 i) = 1, 2 and similarly
2
1
√ (v2 − v1 ) = √1 (v2 − v1 ), √1 (v2 − v1 ) = 1.
2
2 2
For many results on subspaces U = Span{u1 , . . . , uk } it was necessary to assume that the vectors u1 , . . . , uk were linearly independent. It turns out that, if u1 , . . . , uk are nonzero orthogonal vectors, then they are always linearly independent. Consequently, any orthonormal set is linearly independent. Theorem 3.2.23. If {v1 , . . . , vk } is an orthogonal set of nonzero vectors in an inner product space V, then the vectors v1 , . . . , vk are linearly independent.
Proof. If $\{v_1, \dots, v_k\}$ is an orthogonal set of nonzero vectors and $x_1 v_1 + \cdots + x_k v_k = 0$ for some numbers $x_1, \dots, x_k \in K$, then for any $j \in \{1, \dots, k\}$ we have
$$\langle x_1 v_1 + x_2 v_2 + \cdots + x_k v_k, v_j\rangle = x_1\langle v_1, v_j\rangle + x_2\langle v_2, v_j\rangle + \cdots + x_k\langle v_k, v_j\rangle = x_j\langle v_j, v_j\rangle = x_j\|v_j\|^2.$$
On the other hand, $\langle x_1 v_1 + x_2 v_2 + \cdots + x_k v_k, v_j\rangle = \langle 0, v_j\rangle = 0$. Since $x_j\|v_j\|^2 = 0$ and $\|v_j\| \ne 0$, we must have $x_j = 0$. Consequently, the vectors $v_1, \dots, v_k$ are linearly independent.
3.2.4
The annihilator and the orthogonal complement
In Chapter 1 we introduced the notion of complementary subspaces: If U is a subspace of a vector space V, then a subspace W is called a complement of U in V if V = U ⊕ W. We pointed out that such space is not unique. In inner product spaces we can define orthogonal complements that have better properties. Definition 3.2.24. Let A be a nonempty subset of an inner product space V. The set of all vectors in V orthogonal to every vector in A is called the annihilator of A and is denoted by A⊥ : A⊥ = {x ∈ V : hx, vi = 0 for every v ∈ V} . If U is a subspace of V, then U ⊥ is called the orthogonal complement of U. From the definition of the annihilator and basic properties of the inner product we get the following useful result.
Theorem 3.2.25. Let A be a subset of an inner product space V. The annihilator A⊥ is a subspace of V.
Example 3.2.26. Show that hf, gi = f (0)g(0) + f ′ (0)g ′ (0) + f ′′ (0)g ′′ (0) is an inner product in the vector space P2 (C) and determine (Span{t2 + 1})⊥ . Solution. First we note that hf, gi is an inner product because hα1 t2 + β1 t + γ1 , α2 t2 + β2 t + γ3 i = γ1 γ2 + β1 β2 + 4α1 α2 . Now we describe (Span{t2 + 1})⊥ . Since hαt2 + βt + γ, t2 + 1i = 4α + γ, we have (Span{t2 + 1})⊥ = αt2 + βt + γ : 4α + γ = 0 = Span t, t2 − 4 .
If U is a subspace of an inner product space V, then U ⊥ is a subspace of V, so it makes sense to consider the subspace (U ⊥ )⊥ . How is this subspace related to U? If u ∈ U, then u is orthogonal to every vector in U ⊥ , so u ∈ (U ⊥ )⊥ . This means that U ⊆ (U ⊥ )⊥ . In general, U and (U ⊥ )⊥ need not be equal. For example, consider the space V of Rall continuous functions on the interval 1 [0, 1] with the inner product hf, gi = 0 f (t)g(t) dt and the subspace U of all polynomials. It can be shown that, for any continuous function f , if Z
1
f (t)q(t) dt = 0
0
for every polynomial q, then f = 0. This means that U ⊥ = {0} and thus (U ⊥ )⊥ = V 6= U. If we assume that U is finite dimensional, then we can show that U and (U ⊥ )⊥ are equal. Theorem 3.2.27. If U is a finite dimensional subspace of an inner product space V, then (U ⊥ )⊥ = U.
Proof. We need to show that (U ⊥ )⊥ ⊆ U. Let v ∈ (U ⊥ )⊥ . If U is finite dimensional, then projU (v) exists, by Theorem 3.2.16. Since hw, vi = 0 for every w ∈ U ⊥ and v − projU (v) ∈ U ⊥ , we have hv − projU (v), vi = 0. Consequently, 0 = hv − projU (v), vi = hv − projU (v), v − projU (v) + projU (v)i
= hv − projU (v), v − projU (v)i + hv − projU (v), projU (v)i = hv − projU (v), v − projU (v)i = kv − projU (v)k2 ,
which means that v − projU (v) = 0. Thus v = projU (v), which implies v ∈ U. Theorem 3.2.28. Let V be an inner product space and let U be a finite dimensional subspace of V. Then for every v ∈ V the projection of v on U ⊥ exists and projU ⊥ (v) = v − projU (v). Proof. First we note that hv − projU (v), ui = 0 for every u ∈ U, which means v − projU (v) ∈ U ⊥ . Moreover, for every w ∈ U ⊥ we have hv − (v − projU (v)), wi = hprojU (v), wi = 0. Therefore v − projU (v) is the projection of v on U ⊥ . Theorem 3.2.29. For any finite dimensional subspace U of an inner product space V we have V = U ⊕ U ⊥. Proof. For every v ∈ V we have v = projU (v) + projU ⊥ (v), by Theorem 3.2.28. Hence V = U + U ⊥ . If x ∈ U ∩ U ⊥ , then kxk2 = hx, xi = 0 and thus x = 0, which means that V = U ⊕ U ⊥ . In the next theorem we list all basic results on orthogonal projections on finite dimensional subspaces. We assume that the subspace U is finite dimensional to ensure that the projU (v) exists for every v ∈ V. If we replace the assumption that U is finite dimensional by the assumption that the projU (v) exists for every v ∈ V, the theorem remains true.
Theorem 3.2.30. Let U be a finite dimensional subspace of an inner product space V. Then (a) p = projU (v) if and only if hv − p, ui = 0 for every u ∈ U; (b) projU (v) is the best approximation to the vector v by vectors from the subspace U, that is, p = projU (v) if and only if kv − pk < kv − uk for every u ∈ U such that u 6= p; (c) projU : V → V is a linear transformation; (d) u = projU (u) for every u ∈ U; (e) ran projU = U; (f) ker projU = U ⊥ ; (g) projU ⊥ = Id −projU ; (h) projU (projU (v)) = projU (v) for every v ∈ V; (i) hprojU (v), wi = hv, projU (w)i for every v, w ∈ V. Proof. (a) is the definition of orthogonal projections (Definition 3.2.8); (b) is the statement in Theorem 3.2.13; (c) If v, w ∈ V and α, β ∈ K, then hαv + βw − αprojU (v) + βprojU (w), ui = αhv − projU (v), ui + βhw − projU (w), ui = 0
for every u ∈ U. Hence projU (αv + βw) = αprojU (v) + βprojU (w); (d) For every u ∈ U we have hu − u, ui = 0, which means that u = projU (u); (e) follows from (d) and the definition of the projection; (f) If v ∈ ker projU , then for every u ∈ U we have hv, ui = hv − projU (v), ui = 0, which means that v ∈ U ⊥ . Now, if v ∈ U ⊥ , then hv − 0, ui = hv, ui = 0 for every u ∈ U, which means that projU (v) = 0; (g) is equivalent to the statement in Theorem 3.2.28; (h) is a consequence of (d); (i) For any v, w ∈ V we have hprojU (v), wi = hprojU (v), w − projU (w) + projU (w)i = hprojU (v), projU (w)i = hv − projU (v) + projU (v), projU (w)i = hv, projU (w)i.
It turns out that properties (c), (h), and (i) in Theorem 3.2.30 characterize orthogonal projections.
Theorem 3.2.31. Let V be an inner product space. If f : V → V is a linear transformation such that
(a) f(f(x)) = f(x) for every x ∈ V,
(b) $\langle f(x), y\rangle = \langle x, f(y)\rangle$ for every x, y ∈ V,
then f is the orthogonal projection on the subspace ran f.
Proof. Assume that f : V → V is a linear transformation satisfying (a) and (b). If x, y ∈ V, then
$$\langle x - f(x), f(y)\rangle = \langle x, f(y)\rangle - \langle f(x), f(y)\rangle$$
= hx, f (y)i − hx, f (f (y))i = hx, f (y)i − hx, f (y)i = 0.
This means that f (x) = projran f (x) for every x ∈ V.
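In matrix terms this characterization says that a self-adjoint idempotent matrix acts as the orthogonal projection onto its column space. The following is a minimal NumPy sketch (the construction of P from a random matrix A is our own illustration): it checks the two hypotheses of Theorem 3.2.31 and the resulting orthogonality of the residual.

```python
import numpy as np

# Build a projection matrix P onto col(A): P = Q Q*, where the columns of Q
# form an orthonormal basis of the column space of a random A.
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))
Q, _ = np.linalg.qr(A)
P = Q @ Q.conj().T

assert np.allclose(P @ P, P)          # (a) idempotent: f(f(x)) = f(x)
assert np.allclose(P.conj().T, P)     # (b) self-adjoint: <Px, y> = <x, Py>

# Consequence: for any v, the residual v - Pv is orthogonal to ran P = col(A).
v = rng.normal(size=5) + 1j * rng.normal(size=5)
assert np.allclose(A.conj().T @ (v - P @ v), 0)
```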
3.2.5
The Gram-Schmidt orthogonalization process and orthonormal bases
In Corollary 3.2.19 we noted that calculating the projection on the subspace U = Span{u1 , . . . , uk } is especially simple if {u1 , . . . , uk } is an orthonormal set. We are going to show that every finite dimensional subspace can be spanned by orthonormal vectors. This is accomplished by modifying an arbitrary spanning set by what is called the Gram-Schmidt process. We motivate the idea of the Gram-Schmidt process by considering a couple of examples.
Example 3.2.32. Find a vector v ∈ C3 such that i 1 1 Span 1 , v = Span 1 , −1 1 1 1
and
*
+ i v, −1 = 0. 1
148
Chapter 3: Inner Product Spaces
i Solution. Let U = Span −1 . By Example 3.2.15, we have 1 1 1 3 projU 1 = 13 i . 1 − 13 i Consequently, we can take
2 1 1 3 v = 1 − projU 1 = 1 − 31 i . 1 1 1 + 13 i
Example 3.2.33. Let u1 , . . . , um be nonzero orthogonal vectors in an inner product space V and let v be a vector in V such that v ∈ / Span{u1 , . . . , um }. Find a nonzero vector um+1 such that um+1 ∈ Span{u1 , . . . , um }⊥ and Span{u1 , . . . , um , um+1 } = Span{u1 , . . . , um , v}. Solution. By Theorem 3.2.18, we have projSpan{u1 ,...,um } (v) =
hv, u2 i hv, um i hv, u1 i u1 + u2 + · · · + um . hu1 , u1 i hu2 , u2 i hum , um i
We take um+1 = v −
hv, u1 i hv, u2 i hv, um i u1 − u2 − · · · − um . hu1 , u1 i hu2 , u2 i hum , um i
Clearly, um+1 6= 0 and Span{u1 , . . . , um , v} = Span{u1 , . . . , um , um+1 }. Moreover, since hum+1 , uj i = hv − projSpan{u1 ,...,um } (v), uj i = 0 for j = 1, . . . , m, we have um+1 ∈ Span{u1 , . . . , um }⊥ . The method used in the above example leads to the following general result.
Theorem 3.2.34. For any linearly independent vectors u1 , . . . , um in an inner product space V there are orthogonal vectors v1 , . . . , vm in V such that Span{u1 , . . . , uk } = Span{v1 , . . . , vk } for every k ∈ {1, . . . , m}. Proof. Let Uk = Span{u1 , . . . , uk } for k ∈ {1, . . . , m}. We define v1 = u1 and then successively vk = uk − projUk−1 (uk ) for k ∈ {2, . . . , m}. Since uk ∈ / Uk−1 , we have vk 6= 0. If Uk−1 = Span{v1 , . . . , vk−1 } for some k ∈ {2, . . . , m}, then uk = vk + projUk−1 (uk ) = vk + projSpan{v1 ,...,vk−1 } (uk ) ∈ Span{v1 , . . . , vk } and consequently Uk = Span{v1 , . . . , vk }, because vk ∈ Uk for every k ∈ {1, . . . , m}. This shows by induction that Span{u1 , . . . , uk } = Span{v1 , . . . , vk } for every k ∈ {1, . . . , m}. To finish the proof we note that, by part (a) of Theorem 3.2.30, hvk , ui = 0 for every u ∈ Uk−1 and every k ∈ {2, . . . , m}. Hence hvk , v1 i = · · · = hvk , vk−1 i = 0, because v1 , . . . , vk−1 ∈ Uk−1 . Note that the above proof describes an effective process of constructing an orthogonal basis of a subspace from an arbitrary basis. This process is called the Gram-Schmidt orthogonalization process.
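The construction in the proof translates directly into a short program. Here is a minimal sketch of classical Gram-Schmidt for vectors in C^n (the function name `gram_schmidt`, the random test vectors, and the tolerance are ours); it follows the recipe $v_k = u_k - \mathrm{proj}_{\mathrm{Span}\{v_1,\dots,v_{k-1}\}}(u_k)$ and then spot-checks pairwise orthogonality.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize linearly independent vectors in C^n, as in Theorem 3.2.34."""
    basis = []
    for u in vectors:
        v = u.astype(complex)
        for w in basis:
            # subtract the projection of u on Span{w}
            v = v - (np.sum(u * np.conj(w)) / np.sum(w * np.conj(w))) * w
        basis.append(v)
    return basis

rng = np.random.default_rng(4)
us = [rng.normal(size=4) + 1j * rng.normal(size=4) for _ in range(3)]
vs = gram_schmidt(us)
for i in range(3):
    for j in range(i):
        assert abs(np.sum(vs[i] * np.conj(vs[j]))) < 1e-10   # pairwise orthogonal
```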
Example 3.2.35. Let u1 , u2 , u3 , u4 be linearly independent vectors in an inner product space V. Find an orthogonal set {v1 , v2 , v3 , v4 } such that u1 = v1 , Span{u1 , u2 } = Span{v1 , v2 },
Span{u1 , u2 , u3 } = Span{v1 , v2 , v3 }, and Span{u1 , u2 , u3 , u4 } = Span{v1 , v2 , v3 , v4 }.
Solution. We take v1 = u1 , v2 = u2 −
hu2 , v1 i v1 , hv1 , v1 i
v3 = u3 −
hu3 , v1 i hu3 , v2 i v1 − v2 , hv1 , v1 i hv2 , v2 i
v4 = u4 −
hu4 , v1 i hu4 , v2 i hu4 , v3 i v1 − v2 − v3 . hv1 , v1 i hv2 , v2 i hv3 , v3 i
and
In the first section of this chapter we proved that ku + vk2 = kuk2 + kvk2 for any orthogonal vectors u and v (the Pythagorean Theorem 3.1.35). This property easily generalizes to any finite set of orthogonal vectors. Theorem 3.2.36 (The General Pythagorean Theorem). For any orthogonal vectors v1 , . . . , vn in an inner product space V we have kv1 + · · · + vn k2 = kv1 k2 + · · · + kvn k2 .
Proof. For any orthogonal vectors v1 , . . . , vn we have
2 * +
X
n n n n n X X X X X
n
v = v , v = hv , v i = hv , v i = kvj k2 . j j k j k j j
j=1 j=1 j=1 j=1 k=1 j,k=1 If u1 , . . . , um are linearly independent vectors, then the vectors v1 , . . . , vm obtained by the Gram-Schmidt orthogonalization process are also linearly independent and thus they are nonzero vectors. By normalizing vectors v1 , . . . , vm we obtain an orthonormal set vm v1 ,..., . kv1 k kvm k The process of obtaining an orthonormal set from an arbitrary linearly independent set is called the Gram-Schmidt orthonormalization process.
Corollary 3.2.37. For any linearly independent vectors u1 , . . . , um in an inner product space V there are orthonormal vectors w1 , . . . , wm in V such that Span{u1 , . . . , uk } = Span{w1 , . . . , wk } for every k ∈ {1, . . . , m}.
Example 3.2.38. We apply the Gram-Schmidt orthonormalization process to the set {1, t, t2 } in the vector space of polynomials on the interval [0, 1] with the inner product Z 1 hf, gi = f (t)g(t)dt. 0
First we define f0 (t) = 1. Since kf0 k2 =
Z
1
1dt = 1, 0
we let g0 (t) = f0 (t) = 1. Next we find f1 : f1 (t) = t − ht, 1i = t − Since 2
kf1 k = we define g1 (t) =
Z
0
1
1 t− 2
Z
1
0
2
1 tdt = t − . 2
dt =
1 , 12
√ 1 1 f1 (t) = 2 3 t − kf1 k 2
Now we find f2 : √ √ 1 1 f2 (t) = t2 − ht2 , 1i − t2 , 2 3 t − 2 3 t− 2 2 Z 1 Z 1 1 1 = t2 − t2 dt − 12 t2 t − dt t − 2 2 0 0 1 1 = t2 − − t − 3 2 1 = t2 − t + . 6
Since kf2 k2 = we define g2 (t) =
Z
0
1
2 1 1 dt = , t2 + t + 6 180
√ 1 1 f2 (t) = 6 5 t2 − t + . kf2 k 6
By applying the Gram-Schmidt orthonormalization process to the set {1, t, t2 } we obtain the following orthonormal set √ √ 1 1 2 ,6 5 t − t+ . 1, 2 3 t − 2 6
Example 3.2.39. Use the result from Example 3.2.38 to find the best approximation to the function cos πt by quadraticR polynomials on the interval 1 [0, 1] with respect to the inner product hf, gi = 0 f (t)g(t)dt. Solution. Since
√ √ 4 3 1 =− 2 , hcos πt, 1i = 0, cos πt, 2 3 t − 2 π and
√ 1 2 cos πt, 6 5 t − t + = 0, 6
we have projSpan{1,t,t2 } (cos πt) = −
√ 4 3 √ 1 24 12 3 t − 2 = − 2t+ 2. π2 2 π π
Example 3.2.40. Consider the vector space Pm (R) with the inner product hf, gi =
Z
1
f (t)g(t)dt.
−1
We apply the Gram-Schmidt orthogonalization process to the polynomials 1, t, . . . , tm and get polynomials 1, p1 , . . . , pm . Show that pm ∈ Span{((1 − t2 )m )(m) }.
3.2. ORTHOGONAL PROJECTIONS Solution. First we find that Z 1 1 ((1 − t2 )m )(m) tm−1 dt = ((1 − t2 )m )(m−1) tm−1 −1 −1
− (m − 1)
= −(m − 1)
Z
Z
1
−1 1
−1
((1 − t2 )m )(m−1) tm−2 dt
((1 − t2 )m )(m−1) tm−2 dt.
If we continue to integrate by parts, we end up with Z 1 Z 1 2 m (m) m−1 m−1 ((1 − t ) ) t dt = (−1) (m − 1)! ((1 − t2 )m )′ dt = 0. −1
−1
In a similar way we get Z
1
−1
((1 − t2 )m )(m) tj dt = 0
for every j ∈ {0, . . . , m − 1}. Now, since ⊥ Pm−1 (R) ⊕ Pm−1 (R) = Pm (R), ⊥ dim Pm (R) = m + 1, and dim Pm−1 (R) = m, we have dim Pm−1 (R) = 1. This 2 m (m) ⊥ gives us our result because ((1 − t ) ) ∈ Pm−1 (R).
From Corollary 3.2.37 it follows that every finite dimensional subspace of an inner product space has an orthonormal spanning set. In other words, every finite dimensional subspace of an inner product space has an orthonormal basis. Theorem 3.2.41. Let {x1 , . . . , xn } be an orthonormal set in an inner product space V. The following conditions are equivalent: (a) {x1 , . . . , xn } is a basis in V; (b) (Span{x1 , . . . , xn })⊥ = {0}; (c) v = hv, x1 ix1 + · · · + hv, xn ixn for every vector v ∈ V; (d) hv, wi = hv, x1 ihx1 , wi + · · · + hv, xn ihxn wi for every v, w ∈ V; (e) kvk2 = |hv, x1 i|2 + · · · + |hv, xn i|2 for every vector v ∈ V. Proof. Assume {x1 , . . . , xn } is a basis in V. If v ∈ (Span{x1 , . . . , xn })⊥ , then v = α1 x1 + · · · + αn xn
154
Chapter 3: Inner Product Spaces
for some α1 , . . . , αn ∈ K and hv, xj i = 0 for every j = 1, . . . , n. Since the set {x1 , . . . , xn } is orthonormal, we have 0 = hv, xj i = hα1 x1 , . . . , αn xn , xj i
= hα1 x1 , xj i + · · · + hαn xn , xj i = αj hxj , xj i = αj ,
for every j = 1, . . . , n, which means that v = 0. This shows that (a) implies (b). Now we observe that hv − (hv, x1 ix1 + · · · + hv, xn ixn ), xj i = 0 for every v ∈ V and every j = 1, . . . , n, and thus v − (hv, x1 ix1 + · · · + hv, xn ixn ) ∈ (Span{x1 , . . . , xn })⊥ . Consequently, if (Span{x1 , . . . , xn })⊥ = {0}, then v = hv, x1 ix1 + · · · + hv, xn ixn , for every v ∈ V. This shows that (b) implies (c). Since (c) clearly implies (a), the conditions (a), (b), and (c) are equivalent. Now assume that (c) holds and consider arbitrary v, w ∈ V. Then * n + n X X hv, wi = hv, xj ixj , hw, xk ixk j=1
= = =
n X
j,k=1 n X j=1 n X j=1
k=1
hv, xj ihw, xk ihxj , xk i
hv, xj ihw, xj ihxj , xj i hv, xj ihxj , wi.
Thus (c) implies (d). To see that (d) implies (e) it suffices to let w = v in (d). To complete the proof we show that (e) implies (b). Indeed, if (e) holds and v ∈ (Span{x1 , . . . , xn })⊥ , then kvk2 = |hv, x1 i|2 + · · · + |hv, xn i|2 = 0, and thus v = 0.
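Conditions (c) and (e) of Theorem 3.2.41 are also convenient to verify numerically for a concrete orthonormal basis. A minimal sketch (we take the columns of a random unitary matrix as the orthonormal basis; all names and data are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
# The columns of a unitary matrix Q form an orthonormal basis of C^4.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
basis = [Q[:, j] for j in range(4)]

v = rng.normal(size=4) + 1j * rng.normal(size=4)
coeffs = [np.sum(v * np.conj(x)) for x in basis]           # <v, x_j>

# (c): v = sum_j <v, x_j> x_j
assert np.allclose(sum(c * x for c, x in zip(coeffs, basis)), v)
# (e): ||v||^2 = sum_j |<v, x_j>|^2
assert np.isclose(sum(abs(c) ** 2 for c in coeffs), np.sum(v * np.conj(v)).real)
```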
3.3
The adjoint of a linear transformation
For any v0 in an inner product space V the function f (x) = hx, v0 i is a linear transformation from V to K, which is an immediate consequence of the definition
of the inner product. It turns out that, if V is a finite dimensional inner product space, then every linear transformation from V to K is of such form. Theorem 3.3.1 (Representation Theorem). Let V be a finite dimensional inner product spaces and let f : V → K be a linear transformation. Then there exists a unique vf ∈ V such that f (x) = hx, vf i for every x ∈ V. Proof. Let f : V → K be a linear transformation. If f is the zero transformation, then clearly v = 0 has the desired property. Assume f : V → K is a nonzero linear transformation. Then ker f 6= V. Let u be a unit vector in (ker f )⊥ . Since f (u)x − f (x)u ∈ ker f for every x ∈ V, we have hf (u)x − f (x)u, ui = 0, which gives us
hf (u)x, ui = hf (x)u, ui = f (x)kuk2 = f (x). Consequently f (x) = hf (u)x, ui = hx, f (u)ui.
If we take vf = f (u)u, then
f (x) = hx, vf i for every x ∈ V. Now suppose w is another vector such that f (x) = hx, wi for every x ∈ V. But then kvf − wk2 = hvf − w, vf − wi = hvf − w, vf i − hvf − w, wi = f (vf − w) − f (vf − w) = 0
and thus w = vf . From Theorem 3.3.1 we obtain the following important result. Theorem 3.3.2. Let V and W be finite dimensional inner product spaces. For every linear transformation f : V → W there is a unique linear transformation g : W → V such that hf (v), wi = hv, g(w)i for every v ∈ V and w ∈ W.
Proof. Let f : V → W be a linear transformation. For every w ∈ W the function fw : V → K defined by fw (v) = hf (v), wi is linear and thus, by Theorem 3.3.1, there is a unique vector zw such that fw (v) = hv, zw i for every v ∈ V. Clearly, the function fw depends on w and thus zw depends on w. In other words, there is a function g : W → V such that hf (v), wi = hv, zw i = hv, g(w)i for every v ∈ V and w ∈ W. We need to show that g is linear. If w1 , w2 ∈ W, then hf (v), w1 + w2 i = hf (v), w1 i + hf (v), w2 i = hv, g(w1 )i + hv, g(w2 )i = hv, g(w1 ) + g(w2 )i
for every v ∈ V. By the uniqueness part of Theorem 3.3.1 we have g(w1 + w2 ) = g(w1 ) + g(w2 ). Similarly, if α ∈ K and w ∈ W, then hf (v), αwi = αhf (v), wi = αhv, g(w)i = hv, αg(w)i for every v ∈ V, which gives us g(αw) = αg(w).
Definition 3.3.3. Let V and W be finite dimensional inner product spaces and let f : V → W be a linear transformation. The unique linear transformation g : W → V such that hf (v), wi = hv, g(w)i for every v ∈ V and w ∈ W is called the adjoint of f and is denoted by f ∗. Note that, if g = f ∗ , then also f = g ∗ . Indeed, if hf (v), wi = hv, g(w)i for every v ∈ V and w ∈ W, then hg(w), vi = hv, g(w)i = hf (v), wi = hw, f (v)i for every v ∈ V and w ∈ W. Theorem 3.3.2 says that for any finite dimensional inner product spaces V and W, if f ∈ L(V, W), then f ∗ ∈ L(W, V). We can think of ∗ as an operation from L(V, W) to L(W, V), that is, ∗ : L(V, W) → L(W, V). The next theorem lists some useful algebraic properties of the operation ∗ .
3.3. THE ADJOINT OF A LINEAR TRANSFORMATION
157
Theorem 3.3.4. Let V, W, and X be finite dimensional inner product spaces. (a) (Id)∗ = Id; (b) (gf )∗ = f ∗ g ∗
for every f ∈ L(V, W) and g ∈ L(W, X );
(c) (f1 + f2 )∗ = f1∗ + f2∗ (d) (αf )∗ = αf ∗ (e) (f ∗ )∗ = f Proof.
for every f1 , f2 ∈ L(V, W);
for every f ∈ L(V, W) and α ∈ K;
for every f ∈ L(V, W).
(a) For any v ∈ V we have hId(v), vi = hv, vi = hv, Id(v)i.
(b) For any v ∈ V, w ∈ W, and x ∈ X , we have
hg(f (v)), xi = hf (v), g ∗ (x)i = hv, f ∗ (g ∗ (x))i.
(c) For any v ∈ V and w ∈ W, we have
h(f1 + f2 )(v), wi = hf1 (v), wi + hf2 (v), wi = hv, f1∗ (w)i + hv, f2∗ (w)i = hv, (f1∗ + f2∗ )(w)i.
(d) For any v ∈ V, w ∈ W, and α ∈ K, we have
h(αf )(v), wi = hαf (v), wi = αhf (v), wi = αhv, f ∗ (w)i = hv, αf ∗ (w)i.
(e) For any v ∈ V and w ∈ W, we have
hf (v), wi = hv, f ∗ (w)i = h(f ∗ )∗ (v), wi.
Example 3.3.5. Let f : C3 → C3 and g : C3 → C3 be the linear transformations defined by x y x 0 f y = z and g y = x . z 0 z y Show that f ∗ = g.
Solution. We have * x x + *y x + 1 2 1 2 f y1 , y2 = z1 , y2 = y1 x2 + z1 y2 z2 0 z2 z1
158 and
Chapter 3: Inner Product Spaces * + * + x1 x2 x1 0 y1 , g y2 = y1 , x2 = y1 x2 + z1 y2 . z1 z2 z1 y2
Example 3.3.6. In this example we give an application of the adjoint of a linear operator. Let V be a finite dimensional inner product space and let f : V → V be a linear operator. Use the adjoint of f to show that, if {a1 , . . . , an } and {b1 , . . . , bn } are two orthonormal bases in V, then n X j=1
n X
kf (aj )k2 =
j=1
kf (bj )k2 .
Solution. From Theorem 3.2.41 we get kf (aj )k2 = for every j = 1, . . . , n and kf ∗ (bk )k2 =
n X j=1
n X
k=1
|hf (aj ), bk i|2
|hf ∗ (bk ), aj i|2 =
n X j=1
|haj , f ∗ (bk )i|2
for every k = 1, . . . , n. Hence n X j=1
kf (aj )k2 = =
n X n X j=1 k=1
n X n X j=1 k=1
|hf (aj ), bk i|2 |haj , f ∗ (bk )i|2 =
n X
k=1
In similar way we obtain n X j=1
kf (bj )k2 =
which gives us the desired result.
n X
k=1
kf ∗ (bk )k2 ,
kf ∗ (bk )k2 .
3.3. THE ADJOINT OF A LINEAR TRANSFORMATION
159
Theorem 3.3.7. Let V and W be finite dimensional inner product spaces. A linear transformation f : V → W is invertible if and only if f ∗ : W → V is invertible and then we have (f ∗ )−1 = (f −1 )∗ . Proof. From f −1 f = IdV we get and from
f ∗ (f −1 )∗ = (f −1 f )∗ = Id∗V = IdV f f −1 = IdW
we get
(f −1 )∗ f ∗ = (f f −1 )∗ = Id∗W = IdW .
In Theorem 2.3.2 we prove that, if B = {v1 , . . . , vm } and C = {w1 , . . . , wn } are bases of vector spaces V and W, respectively, then for every linear transformation f : V → W there is a unique n × m matrix A such that f (v) = Av for all v ∈ V. We say that A is the matrix of f relative to the bases B and C and write A = fB→C . In the following theorem we use A∗ to denote the conjugate transpose of A. If A = [akj ] is an n × m matrix with complex entries, then the conjugate transpose of A is the m × n matrix defined by A∗ = [ajk ]. Theorem 3.3.8. Let V and W be finite dimensional inner spaces and let f : V → W be a linear transformation. If B = {v1 , . . . , vm } is an orthonormal basis of V and C = {w1 , . . . , wn } an orthonormal basis of W, then (f ∗ )C→B = (fB→C )∗ . Proof. For all j ∈ {1, . . . , m} and k ∈ {1, . . . , n} we let akj = hf (vj ), wk i. Then f (vj ) = hf (vj ), w1 iw1 + · · · + hf (vj ), wn iwn = a1j w1 + · · · + anj wn , for every j ∈ {1, . . . , m}. This means that the matrix A = [akj ] is the matrix of f relative to the bases B and C. On the other hand, for every k ∈ {1, . . . , n}, we have f ∗ (wk ) = hf ∗ (wk ), v1 iv1 + · · · + hf ∗ (wk ), vm ivm = hwk , f (v1 )iv1 + · · · + hwk , f (vm )ivm = hf (v1 ), wk iv1 + · · · + hf (vm ), wk ivm
= ak1 v1 + · · · + akm vm ,
160
Chapter 3: Inner Product Spaces
which means that the matrix of f ∗ relative to the bases C and B is the conjugate transpose of the matrix of f relative to the bases B and C. The adjoint of a linear transformation f ∈ L(V, W) is a linear transformation f ∗ ∈ L(W, V). If V = W, then f, f ∗ ∈ L(V, V) = L(V) and we can consider properties of the adjoint operation that simply don’t make sense when V 6= W. Definition 3.3.9. Let V be a finite dimensional inner product space and let f : V → V be a linear operator. (a) If f f ∗ = f ∗ f , then f is called a normal operator. (b) If f ∗ = f , then f is called a self-adjoint operator.
Clearly, every self-adjoint operator is normal. Note that self-adjoint operators can be defined for all inner product spaces (not necessarily finite dimensional): a linear operator f : V → V is self-adjoint if hf (x), yi = hx, f (y)i for every x, y ∈ V. More on operators on infinite dimensional inner product spaces can be found in Section 5.
Example 3.3.10. Consider the operator f ∈ L(C2 , C2 ) defined by f (x) = Ax where i 1−i A= . −1 − i 2i Show that f is normal but not self-adjoint. Solution. Since A∗ =
−i −1 + i , 1+i −2i
f is not self-adjoint. On the other hand, since AA∗ = A∗ A =
3 −3 − 3i , −3 + 3i 6
f is a normal operator.
Theorem 3.3.11. Let V be a finite dimensional inner product space. For every linear operator f : V → V, the operators f f ∗ , f ∗ f and f + f ∗ are self-adjoint.
3.3. THE ADJOINT OF A LINEAR TRANSFORMATION
161
Proof. Since (f f ∗ )∗ = (f ∗ )∗ f ∗ = f f ∗ , f f ∗ is self-adjoint. In the same way we can show that f ∗ f is self-adjoint. Since (f + f ∗ )∗ = f ∗ + (f ∗ )∗ = f ∗ + f, f + f ∗ is self-adjoint. The composition of two self-adjoint operators need not be self adjoint. The following theorem tells us exactly when it is rue. Theorem 3.3.12. Let f and g be self-adjoint operators on a finite dimensional inner product space V. The operator f g is self-adjoint if and only if f g = gf .
Proof. If f and g are self-adjoint, then for every v, w ∈ V we have hf g(v), wi = hg(v), f (w)i = hv, gf (w)i. Consequently, f g = gf if and only if f g is self-adjoint. The following useful result is a consequence of the polarization identity. Theorem 3.3.13. Let V be a finite dimensional inner product space. (a) If f : V → V is a self-adjoint operator such that hf (v), vi = 0 for every v ∈ V, then f = 0. (b) If f1 , f2 : V → V are self-adjoint operators such that hf1 (v), vi = hf2 (v), vi for every v ∈ V, then f1 = f2 .
Proof. Let f : V → V be a self-adjoint operator. First we note that the form s(v, w) = hf (v), wi is sesquilinear. We show that s is hermitian. Indeed, s(v, w) = hf (v), wi = hv, f ∗ (w)i = hv, f (w)i = hf (w), vi = s(w, v). Now, if hf (v), vi = 0 for every v ∈ V, then hf (v), wi = 0 for every v, w ∈ V, by Theorem 3.1.11 (or Theorem 3.1.6 for a real inner product space). Consequently, f = 0, proving part (a). To prove part (b) we take f = f1 − f2 and use part (a).
162
Chapter 3: Inner Product Spaces
Example 3.3.14. The above theorem does not hold if we drop the assumption 2 2 that f is self-adjoint. For example, consider the operator f ∈ L(C , C ) defined x −y by f = . Then y x x x −y x f , = , =0 y y x y for every
x ∈ C2 , but f 6= 0. y
Theorem 3.3.15. Let f : V → V be a self-adjoint operator on a finite dimensional inner product space V. If f k = 0 for some integer k ≥ 1, then f = 0. Proof. If f 2 = 0, then for every v ∈ V we have 0 = hf 2 (v), vi = hf (v), f (v)i = kf (v)k2 , and thus f = 0. If f 4 = (f 2 )2 = 0, then f 2 = 0 and thus f = 0. This way we n can show that if f 2 = 0 and then f = 0. For any other integer k ≥ 1 we find n an integer n ≥ 1 such that k ≤ 2n . Then, if f k = 0, then f 2 = 0 and thus f = 0. The above property may seem obvious, but it’s not true for arbitrary linear operators. For for the operator f : C2 → C2 defined as f (x) = Ax, example, 0 1 where A = , we have f 2 = 0, but f 6= 0. 0 0 We close this section with an important characterization of normal operators. Theorem 3.3.16. Let V be a finite dimensional inner product space. A linear operator f : V → V is normal if and only if kf (v)k = kf ∗ (v)k for every v ∈ V. Proof. Assume f is normal. Then for every v ∈ V we have kf (vk2 = hf (v), f (vi) = hf ∗ f (v), vi = hf f ∗ (v), vi = hf ∗ (v), f ∗ (vi) = kf ∗ (vk2 . Now assume kf (v)k = kf ∗ (v)k for every v ∈ V. Since hf ∗ f (v), vi = hf (v), f (vi) = kf (vk2 = kf ∗ (vk2 = hf ∗ (v), f ∗ (vi) = hf f ∗ (v), vi, for every v ∈ V, we have f f ∗ = f ∗ f by Theorems 3.3.11 and 3.3.13.
163
3.4. SPECTRAL THEOREMS
3.4
Spectral theorems
Spectral decomposition of matrices is one the most important ideas in matrix linear algebra. Here we generalize this idea to operators on arbitrary finite dimensional inner product spaces.
3.4.1
Spectral theorems for operators on complex inner product spaces
In this section all inner product spaces are assumed to be complex. Definition 3.4.1. Let V be a complex vector space and let f : V → V be a linear operator. (a) λ ∈ C is called an eigenvalue of f if f (v) = λv for some nonzero v ∈ V. (b) If λ ∈ C is an eigenvalue of f , then every nonzero vector v ∈ V such that f (v) = λv is called an eigenvector of f corresponding to λ. The set of all eigenvectors of a linear operator f corresponding to an eigenvalue λ is not a vector subspace of V because the zero vector is not an eigenvector, but if we include the zero vector, then we obtain a subspace that is called the eigenspace of f corresponding to λ and is denoted by Eλ . Example 3.4.2. Consider the complex vector space V = Span{et , e2t , . . . , d ent }. Show that 1, 2, . . . , n are eigenvalues of the differential operator dt on the space V. Solution. For every k ∈ {1, 2, . . . , n} we have d kt e = kekt . dt This means that k is an eigenvalue of the differential operator function ekt is an eigenvector corresponding to k.
d dt
and the
Example 3.4.3. Consider the real vector space V = Span{1, t, t2 , . . . , tn }. d Show that 0 is the only eigenvalue of the differential operator dt on the space V.
164
Chapter 3: Inner Product Spaces
d Solution. Since dt 1 = 0 = 0·1, the number 0 is an eigenvalue of the differential d operator dt and the constant function 1 is an eigenvector corresponding to 0. d Now suppose there is a λ 6= 0 that is an eigenvalue of dt on V. Then
d (z0 + z1 t + · · · + zk tk ) = λ(z0 + z1 t + · · · + zk tk ) dt for some k ≤ n and some z0 , z1 , . . . , zk ∈ R such that zk 6= 0. But this means that z1 + 2z2 t + · · · + kzk tk−1 = λ(z0 + z1 t + · · · + zk tk ), which implies λzk = 0, a contradiction.
Example 3.4.4. This example uses derivatives of the functions of the form F : R → C. Consider the complex vector space V = Span{cos t, sin t}. Show that i and d . −i are eigenvalues of the differential operator dt Solution. Since d (cos t + i sin t) = − sin t + i cos t = i(cos t + i sin t), dt d and the function cos t + i sin t i is an eigenvalue of the differential operator dt is an eigenvector corresponding to i. Similarly, since
d (cos t − i sin t) = − sin t − i cos t = −i(cos t − i sin t), dt −i is an eigenvalue of the differential operator is an eigenvector corresponding to −i.
d dt
and the function cos t − i sin t
In the next example we are assuming that f : V → V is an operator such that there is an orthonormal basis {e1 , . . . , en } of V consisting of eigenvectors of f . As we will see later in this section, every normal operator has this property.
Example 3.4.5. Let V be a finite dimensional inner product space and let f : V → V be a linear operator such that there is an orthonormal basis {e1 , . . . , en } of V consisting of eigenvectors of f . Show that for every v ∈ V we have f (v) =
n X j=1
λj hv, ej iej ,
165
3.4. SPECTRAL THEOREMS where λj is the eigenvalue corresponding to the eigenvector vj . Pn Solution. Since v = j=1 hv, ej iej for every v ∈ V, we have f (v) =
n X j=1
hv, ej if (ej ) =
n X j=1
λj hv, ej iej .
The following theorem gives us two useful descriptions of eigenvalues. Theorem 3.4.6. Let V be a finite dimensional complex vector space and let f : V → V be a linear operator. The following conditions are equivalent: (a) λ ∈ C is an eigenvalue of f ; (b) ker(f − λ Id) 6= {0}; (c) The operator f − λ Id is not invertible. Proof. Equivalence (a) and (b) is an immediate consequence of the definitions. Equivalence (b) and (c) follows from Theorem 2.1.23.
Definition 3.4.7. If f : V → V is a linear operator on a vector space V and p(z) = a0 + a1 z + · · · + am z m is a polynomial, we define p(f ) = a0 Id +a1 f + · · · + am f m . Since compositions and linear combination of linear operators are linear operators, p(f ) is a linear operator. Clearly, if p and q are polynomials, then (p + q)(f ) = p(f ) + q(f ) and (pq)(f ) = p(f )q(f ). Theorem 3.4.8. If V is a nontrivial complex vector space of finite dimension, then every linear operator f : V → V has an eigenvalue. 2
Proof. Let dim V = n. Since dim L(V) = n2 , the operators Id, f, f 2 , . . . , f n are linearly dependent and thus a0 Id +a1 f + a2 f 2 + · · · + ak f k = 0
166
Chapter 3: Inner Product Spaces
for some k ≤ n2 and a0 , a1 , a2 , . . . , ak ∈ C such that ak 6= 0. Now, by the Fundamental Theorem of Algebra, there are complex numbers z1 , . . . , zk such that a0 + a1 t + · · · + ak tk = ak (t − z1 ) · · · (t − zk ). Consequently, a0 Id +a1 f + a2 f 2 + · · · + ak f k = ak (f − z1 Id) · · · (f − zk Id). Since the operator (f − z1 Id) · · · (f − zk Id) is not invertible, for at least one j ∈ {1, . . . , k} the operator f − zj Id is not invertible, which means that zj is an eigenvalue of f , as noted in Theorem 3.4.6.
Theorem 3.4.9. Let V be a finite dimensional inner product space and let f : V → V be a normal operator. If λ is an eigenvalue of f , then λ is an eigenvalue of f ∗ . Moreover, every eigenvector of f corresponding λ is an eigenvector of f ∗ corresponding λ.
Proof. First we note that, if f is a normal operator, then f − λ Id is a normal operator and we have (f − λ Id)∗ = f ∗ − λ Id . Let v be an eigenvector of f corresponding to λ. Then, by Theorem 3.3.16, we have 0 = kf (v) − λvk = kf ∗ (v) − λvk and consequently f ∗ (v) = λv.
Theorem 3.4.10. Let V be a finite dimensional inner product space inner product space and let f : V → V be a normal operator. Eigenvectors of f corresponding to different eigenvalues are orthogonal.
Proof. We need to show that if λ and µ are two distinct eigenvalues of f and v and w eigenvectors of f corresponding to λ and µ, respectively, then hv, wi = 0. Indeed, since λhv, wi = hf (v), wi = hv, f ∗ (w)i = hv, µwi = µhv, wi, we have (λ − µ)hv, wi = 0. Consequently, hv, wi = 0, because λ 6= µ.
3.4. SPECTRAL THEOREMS
167
Note that the property in the above theorem can be expressed as follows: Eigenspaces of a normal operator corresponding to different eigenvalues are mutually orthogonal subspaces. Definition 3.4.11. A subspace U of a vector space V is called an invariant space of a linear operator f : V → V, or simply f -invariant, if f (U) ⊆ U. A subspace U of an inner product space V is called a reducing space of a linear operator f : V → V, if both U and U ⊥ are f -invariant.
Example 3.4.12. Let f : R3 → R3 be the linear operator defined by x 2 3 7 x f y = 0 8 0 y . z 1 −2 3 z
0 1 Show that Span 0 , 0 is f -invariant. 0 1 Proof. We have
and
0 1 2 1 f 0 = 0 ∈ Span 0 , 0 0 1 0 1
0 0 7 1 f 0 = 0 ∈ Span 0 , 0 . 0 1 1 3
The following two theorems characterize invariant spaces and reducing spaces in terms of projections. Theorem 3.4.13. Let V be an inner product space and let f : V → V be a linear operator. A finite dimensional subspace U ⊆ V is f -invariant if and only if f projU = projU f projU . Proof. If U ⊆ V is f -invariant, then projU (v) ∈ U and f (projU (v)) ∈ U for
168
Chapter 3: Inner Product Spaces
every v ∈ V. Consequently projU (f (projU (v))) = f (projU (v)), that is, f projU = projU f projU . On the other hand, if f projU = projU f projU , then f (u) = f (projU (u)) = projU (f (projU (u))) ∈ U for every u ∈ U, which means that U is f -invariant. Theorem 3.4.14. Let V be an inner product space and let f : V → V be a linear operator. A finite dimensional subspace U ⊆ V is a reducing subspace of f if and only if f projU = projU f. Proof. By Theorem 3.4.13, the subspace U ⊆ V is a reducing subspace of f if and only if f projU = projU f projU
and f (Id −projU ) = (Id −projU )f (Id −projU ),
because projU ⊥ = Id −projU . The above is equivalent to f projU = projU f projU
and projU f = projU f projU
or simply to f projU = projU f, because the equality f projU = projU f implies projU f projU = projU projU f = projU f.
Theorem 3.4.15. Let V be a finite dimensional inner product space and let f : V → V be a linear operator. If a subspace U ⊆ V is an invariant subspace of f , then U ⊥ is an invariant subspace of f ∗ . Proof. Assume U ⊆ V is an invariant subspace of f . If u ∈ U and v ∈ U ⊥ , then hf ∗ (v), ui = hv, f (u)i = 0, because f (u) ∈ U. Consequently, f ∗ (v) ∈ U ⊥ .
169
3.4. SPECTRAL THEOREMS
Theorem 3.4.16. Let V be a finite dimensional inner product space and let f : V → V be a nonzero linear operator. The following conditions are equivalent: (a) f is a normal operator; (b) There are orthonormal vectors e1 , . . . , er ∈ V and nonzero complex numbers λ1 , . . . , λr such that for every v ∈ V we have f (v) =
r X j=1
λj hv, ej iej ;
(c) There are orthonormal vectors e1 , . . . , er ∈ V and nonzero complex numbers λ1 , . . . , λr such that f=
r X
λj projej .
j=1
Proof. First we note that (b) and (c) are equivalent because projej (v) = hv, ej iej for every v ∈ V. Pr If f (v) = j=1 λj hv, ej iej for every v ∈ V, then f (ej ) = λj ej for every
j = 1, . . . , r and hence f f ∗ (v) = =
r X j=1 r X j=1
λj hf ∗ v, ej iej = λj hv, λj ej iej =
r X
j=1 r X j=1
λj hv, f (ej )iej λj λj hv, ej iej =
r X j=1
|λj |2 hv, ej iej .
On the other hand, since hf (v), ui =
*
r X j=1
λj hv, ej iej , u
+
=
r X
= v, λj hu, ej iej = j=1
r X j=1
*
v,
r X j=1
for every v, u ∈ V, we have f ∗ (v) =
r X j=1
λj hv, ej ihej , ui
λj hv, ej iej
λj hu, ej iej
+
,
170
Chapter 3: Inner Product Spaces
and thus f ∗ (ej ) = λj ej for every j = 1, . . . , r. Consequently f ∗ f (v) =
r X j=1
=
r X j=1
λj hf v, ej iej =
r X
j=1 r X
λj hv, λj ej iej = Pr
λj hv, f ∗ (ej )iej
j=1
λj λj hv, ej iej =
r X j=1
|λj |2 hv, ej iej .
This shows that, if f (v) = j=1 λj hv, ej iej for every v ∈ V, then f is a normal operator, that is, (b) implies (a). To complete the proof we show that (a) implies (c). Assume f is a nonzero normal operator. By Theorem 3.4.8, f has an eigenvalue λ. Let e be a unit eigenvector of f corresponding to λ. Then, by Theorem 3.4.9, λ is an eigenvalue of f ∗ with the same eigenvector e and thus the subspace Span{e} is f -invariant and f ∗ -invariant. Consequently, by Theorem 3.4.15, Span{e}⊥ is f -invariant and f ∗ -invariant. Now we argue by induction on the dimension of the range of f . If dim ran f = 1, then clearly f = λproje and we are done. Now we assume that dim ran f = r for some r > 1 and that the implication (a) implies (c) is proved for every normal operator g such that dim ran g = q < r. We denote λ = λr and e = er . Because, as observed above, the subspaces Span{er } and Span{er }⊥ are f -invariant and f ∗ -invariant, the operators f and f ∗ commute with the projection on Span{er } and we have, by Theorem 3.4.14, f projer = projer f = λr projer
and f ∗ projer = projer f ∗ = λr projer .
Consequently, (f −λr projer )(f ∗ −λr projer ) = f f ∗ −|λr |2 projer = (f ∗ −λr projer )(f −λr projer ), which means that f − λr projer is a normal operator. Moreover, since ran f = Span{er } ⊕ f (Span{er }⊥ ), we have dim ran(f − λr projer ) = dim f (Span{er }⊥ ) = r − 1.
By our inductive assumption there are orthonormal vectors e1 , . . . , er−1 and nonzero complex numbers λ1 , . . . , λr−1 such that f − λr projer =
r−1 X
λj projej ,
j=1
which gives us the desired representation f =
r X
λj projej .
j=1
Theorem 3.4.17. Let V be a finite dimensional inner product space. The operator f : V → V is normal if and only if there is an orthonormal basis of V consisting of eigenvectors of f .
171
3.4. SPECTRAL THEOREMS
Proof. Let dim V = n. The orthonormal vectors e1 , . . . , er in Theorem 3.4.16 are eigenvectors of f and we have ran f = Span{e1 , . . . , er }. Since V = ran f ⊕ (ran f )⊥ , we have dim(ran f )⊥ = n − r and there are orthonormal vectors er+1 , . . . , en such that (ran f )⊥ = Span{er+1 , . . . , en }. If v ∈ (ran f )⊥ , then hv, f (f ∗ (v))i = 0. Since f is normal, we have f (f ∗ (v)) = f ∗ (f (v)) and thus 0 = hv, f (f ∗ (v))i = hv, f ∗ (f (v))i = kf (v)k2 . Hence f (v) = 0 and thus er+1 , . . . , en ∈ ker f , which means that they are eigenvectors corresponding to the eigenvalue 0. By the Rank-Nullity Theorem we have dim ker f = n − r and thus the set {er+1 , . . . , en } is a basis of ker f . Consequently, {e1 , . . . , en } is an orthonormal basis of V consisting of eigenvectors of f . On the other hand, if there is an orthonormal basis {e1 , . . . , en } of V consisting of eigenvectors of f , then the operator f is normal by Example 3.4.5 and Theorem 3.4.16.
Example 3.4.18. Let V be a finite dimensional inner product space and let f : V → V be a normal operator such that for every v ∈ V we have f (v) =
n X j=1
λj hv, ej iej ,
where {e1 , . . . , en } is an orthonormal basis of V consisting of eigenvectors of f and λ1 , . . . , λn are the corresponding eigenvalues. Show that p(f ) is a normal operator and we have p(f )(v) =
n X j=1
p(λj )hv, ej iej
for any polynomial p. Solution. Since λ1 , . . . , λn are eigenvalues of f corresponding to eigenvectors e1 , . . . , en , we have f k (ej ) = λkj ej for every j ∈ {1, . . . , n} and every integer k ≥ 1. Consequently, p(f )(ej ) = p(λj )ej for every j ∈ {1, . . . , n} and consequently the linear operator p(f ) is normal according to Theorem 3.4.17. From Example 3.4.5 we get n X p(f )(v) = p(λj )hv, ej iej . j=1
172
Chapter 3: Inner Product Spaces
A representation of a linear operator f : V → V in the form f=
r X
λj projej ,
j=1
as in Theorem 3.4.16, is called a spectral decomposition of f .
Example 3.4.19. In Example 3.3.10 we show that the operator f ∈ L(C2 , C2 ) defined by f (x) = Ax where i 1−i A= −1 − i 2i is normal. Find a spectral decomposition of f . Solution. First we find eigenvalues of f . We need to find values of λ ∈ C such that the operator f − λ Id is not invertible. Since (f − λ Id)x =
i−λ 1−i x −1 − i 2i − λ
and
i−λ 1−i det = (i − λ)(2i − λ) − (1 − i)(−1 − i) = λ(λ − 3i), −1 − i 2i − λ f has two eigenvalues: 0 and 3i. Next we need to find an eigenvector corresponding to 3i, that is a nonzero x1 vector ∈ C2 such that x2
−2i 1 − i −1 − i −i
x1 = 0. x2
1 satisfies the above equation, it is an eigenvector −1 + i corresponding to the eigenvalue 3i and Since the vector
f = 3iproj
is a spectral decomposition of f .
1 −1 + i
3.4. SPECTRAL THEOREMS
173
1+i is an eigenvector corresponding to 0, but it is not used in 1 1 the spectral decomposition of f . Note that, as expected, the vectors −1 + i 1+i and are orthogonal. 1 The vector
Example 3.4.20. Show that the operator f ∈ L(C2 , C2 ) defined by f (x) = Ax, where 3 + 2i 2 − 4i A= , 4 + 2i 6 + i is normal and find a spectral decomposition of f . Solution. Since
3 + 2i − λ 2 − 4i det = (3 + 2i − λ)(6 + i − λ) − (4 + 2i)(2 − 4i) 4 + 2i 6 + i − λ = 16 + 15i + λ2 − (9 + 3i)λ + 16 − 12i
= λ2 − (9 + 3i)λ + 27i = (9 − λ)(3i − λ)
the eigenvalues of the operator f are 9 and 3i. 4i − 2 Next we find that is an eigenvector corresponding to 9 and −6 + 2i 4i − 2 4i − 2 is an eigenvector corresponding to 3i. The vectors v1 = 3−i −6 + 2i 4i − 2 and v2 = are orthogonal and thus {v1 , v2 } is an orthogonal basis of 3−i L(C2 , C2 ) and consequently the operator f is normal. Thus f = 9projv1 + 3iprojv2 is a spectral decomposition of f .
174
Chapter 3: Inner Product Spaces
For practical calculations we can use matrices of the projections projv1 and projv2 , which gives us x1 1 1 x1 4i − 2 √ −4i − 2 −6 − 2i f =9 √ x2 x2 60 −6 + 2i 60 1 1 4i − 2 x1 √ −4i − 2 3 + i + 3i √ x2 30 3 − i 30 1 1 − i x1 2 −1 + i x1 =3 +i 1+i 2 x2 −1 − i 1 x2 x for every 1 ∈ C2 . x2 As an immediate consequence of Theorem 3.4.16 we obtain the following result, which is also referred to as the spectral representation. Theorem 3.4.21. Let V be a finite dimensional inner product space and let f : V → V be a normal operator. Then f=
q X j=1
λj projEλ , j
where λ1 , . . . , λq are all distinct eigenvalues of f .
Let V be an inner product space and let U1 , . . . , Un be subspaces of V such that V = U1 ⊕ · · · ⊕ Ur . If the subspaces U1 , . . . , Un are mutually orthogonal, that is, hx, yi = 0 for any x ∈ Uj and y ∈ Uk with j 6= k, then we say that U1 ⊕ · · · ⊕ Ur is an orthogonal decomposition of the space V. From Theorem 3.4.21 we obtain the following important result. Theorem 3.4.22. Let f : V → V be a normal operator on a finite dimensional inner product space V. If λ1 , . . . , λq are all distinct eigenvalues of f , then Eλ1 ⊕ · · · ⊕ Eλq is an orthogonal decomposition of V. The decomposition of a normal operator in Theorem 3.4.21 is unique as shown in the next theorem.
175
3.4. SPECTRAL THEOREMS
Theorem 3.4.23. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. If U1 ⊕ · · · ⊕ Ur is an orthogonal decomposition of V and f=
r X j=1
λj projUj
for some distinct λ1 , . . . , λr ∈ K, then Uj = Eλj for every j ∈ {1, . . . , r}. Solution. If v ∈ Uj , then f (v) = λj v and Prthus v ∈ Eλk . Pr Now, if v ∈ Eλk , then f (v) = λk v = j=1 λk projUj (v) because j=1 projUj = IdV . Since r X f (v) = λj projUj (v), j=1
we have
r X j=1
which gives us
λk projUj (v) =
r X
j=1,j6=k
r X j=1
λj projUj (v)
|λk − λj |2 kprojUj (v)k2 = 0.
Consequently, projUj (v) = 0 for j 6= k. This means that v = projUk (v) and thus v ∈ Uk . Now we turn our attention to spectral properties of self-adjoint operators. Theorem 3.4.24. Let V be a complex inner product space. All eigenvalues of a self-adjoint operator f : V → V are real numbers. Proof. Let λ be an eigenvalue of a self-adjoint operator f : V → V. If x is an eigenvector of f corresponding to λ, then λhx, xi = hλx, xi = hf (x), xi = hx, f (x)i = hx, λxi = λhx, xi. Since hx, xi = kxk2 6= 0, we have λ = λ. The above property characterizes normal operators that are self-adjoint. Theorem 3.4.25. Let V be a finite dimensional complex inner product space. A normal operator f : V → V is self-adjoint if and only if all eigenvalues of f are real.
176
Chapter 3: Inner Product Spaces
Proof. In the proof of Theorem 3.4.16 we have shown that if f (v) =
r X j=1
then f ∗ (v) =
λj hv, ej iej
r X j=1
λj hv, ej iej
Our result is a consequence of these equalities.
Example 3.4.26. Find a spectral decomposition of the self-adjoint operator f ∈ L(C2 , C2 ) defined as x1 33 24 − 24i x1 f = . x2 24 + 24i 57 x2 Solution. Since 33 − λ 24 − 24i det = (33 − λ)(57 − λ) − (24 + 24i)(24 − 24i) 24 + 24i 57 − λ = (24 + 9 − λ)(48 + 9 − λ) − 24 · 48
= (9 − λ)(81 − λ), 1−i 1−i the eigenvalues of f are 9 and 81. The vectors and are eigenvec−1 2 tors of f corresponding to the eigenvalues 9 and 81, respectively. Consequently, f = 9proj
1 − i −1
+ 81proj
is a the spectral decomposition of f . Since the matrix of the projection on Span
1 − i 2
1−i is −1
1 1−i 1 1 2 −1 + i √ √ 1 + i −1 = 1 3 −1 − i 3 −1 3 1−i and the matrix of the projection on Span is 2
1 1−i 1 1 1 1−i √ √ 1+i 2 = , 2 3 1+i 2 6 6
3.4. SPECTRAL THEOREMS
177
we have f
for every
3.4.2
x1 x2
2 −1 + i x1 1 1 − i x1 =3 + 27 −1 − i 1 x2 1+i 2 x2
x1 ∈ C2 . x2
Self-adjoint operators on real inner product spaces
Now we turn our attention to real inner product spaces. Some properties established for complex spaces remain true, but there are some essential differences.
Example 3.4.27. Consider the operator f ∈ L(R2 , R2 ) defined by f (x) = Ax, where 0 −1 A= . 1 0 Show that f is a normal operator without any eigenvalues. 0 1 ∗ T T Solution. First we note that f (x) = A x, where A = . Since −1 0
0 −1 0 1 0 1 0 −1 1 0 = = , 1 0 −1 0 −1 0 1 0 0 1
f is a normal operator. To show that f has no eigenvalues we note that
−λ −1 det = λ2 + 1 1 −λ and the equation λ2 + 1 = 0 has no real solutions. (A complex number cannot be an eigenvalue of an operator on a real vector space.)
A matrix A ∈ Mn×n (R) is called symmetric if AT = A. Note that if A ∈ Mn×n (R) is symmetric, then the operator f : Rn → Rn defined by f (x) = Ax is self-adjoint. Lemma 3.4.28. If A ∈ Mn×n (R) is symmetric, then the operator f : Rn → Rn defined by f (x) = Ax has an eigenvalue.
178
Chapter 3: Inner Product Spaces
Proof. Let g : Cn → Cn be the linear operator defined by g(x) = Ax. By Theorem 3.4.8, g has an eigenvalue λ. Since A is a symmetric matrix, λ is a real number. Let z ∈ Cn be an eigenvector corresponding to λ. We can write z = x + iy where x, y ∈ Rn . Then Az = A(x + iy) = λ(x + iy) = λx + iλy. Since A(x + iy) = Ax + iAy and λ is a real number, it follows that A(x) = λx and A(y) = λy. Now, because z 6= 0, we have x 6= 0 or y 6= 0. Consequently, λ is an eigenvalue of f .
Theorem 3.4.29. Let V be a finite dimensional real inner product space. Every self-adjoint operator f : V → V has an eigenvalue.
Proof. Let B = {v1 , . . . , vn } be a basis of V and let A be the B-matrix of f . Since f is self-adjoint, A is a symmetric matrix. Let λ ∈ R be an eigenvalue of A x1 .. and let x = . ∈ Rn be an eigenvector of A corresponding to the eigenvalue xn
λ. Then
x1 x1 λx1 A ... = λ ... = ... xn
xn
λxn
and thus f (x1 v1 + · · · + xn vn ) = (λx1 )v1 + · · · + (λxn )vn = λ(x1 v1 + · · · + xn vn ), which means that the real number λ is an eigenvalue of f and x1 v1 + · · · + xn vn is an eigenvector of f corresponding to λ.
The following theorem is a version of Theorem 3.4.16 for self-adjoint operators on real inner product spaces. The proof is similar to the proof of Theorem 3.4.16. Note that the above theorem is needed for the result.
179
3.4. SPECTRAL THEOREMS
Theorem 3.4.30. Let V be a finite dimensional inner product space over R and let f : V → V be a nonzero linear operator. The following conditions are equivalent: (a) f is a self-adjoint operator; (b) There are orthonormal vectors e1 , . . . , er ∈ V and nonzero real numbers λ1 , . . . , λr such that for every v ∈ V we have f (v) =
r X j=1
λj hv, ej iej ;
(c) There are orthonormal vectors e1 , . . . , er ∈ V and nonzero real numbers λ1 , . . . , λr such that f=
r X
λj projej .
j=1
Example 3.4.31. Find all eigenvalues and a spectral decomposition of the operator f ∈ L(R3 , R3 ) defined by f (x) = Ax, where 4 1 2 A = 1 5 1 . 2 1 4
Solution. Note that A is a symmetric matrix, so f has at least one real eigenvalue. We have to determine the real numbers λ such that the system (4 − λ)x + y + 2z = 0 x + (5 − λ)y + z = 0 2x + y + (4 − λ)z = 0
has nontrivial solutions. By adding the second and third equations to the first one we get (7 − λ)(x + y + z) = 0 x + (5 − λ)y + z = 0 . 2x + y + (4 − λ)z = 0 If λ = 7, then the system has nontrivial solutions. If λ 6= 7, then the system
180
Chapter 3: Inner Product Spaces
is equivalent to
=0 x + y + z x + (5 − λ)y + z = 0 2x + y + (4 − λ)z = 0
or
=0 x + y + z (4 − λ)y =0 . 2x + y + (4 − λ)z = 0
If λ = 4, then the system has nontrivial solutions. If λ 6= 4, then y = 0 and the system becomes x+z =0 2x + (4 − λ)z = 0 or x+z =0 (2 − λ)z = 0 If λ = 2, then the system has nontrivial solutions. If λ 6= 2, then x = y = z = 0. 4 1 2 Consequently, the eigenvalues of the matrix 1 5 1 are 7, 4 and 2. 2 1 4 1 1 1 The vectors 1, −2, and 0, are eigenvectors corresponding to the 1 1 −1 eigenvalues 7, 4 and 2, respectively. If we let 1 1 √
3
1 √ v1 = 3 , √1 3
then we can write
√ 6
2 √ v2 = − 6 , √1 6
√1 2
v3 = 0 , 1 − √2
f (x) = 7hx, v1 iv1 + 4hx, v2 iv2 + 2hx, v2 iv2 or f = 7projv1 + 4projv2 + 2projv3 .
3.4.3
Unitary operators
In this section all inner product spaces are assumed to be complex. We discuss a special type of normal operators, called unitary operators. These operators have interesting geometric and algebraic properties similar to rotations about the origin in Rn .
181
3.4. SPECTRAL THEOREMS
Definition 3.4.32. Let V be a finite dimensional inner product space. A linear operator f : V → V is called a unitary operator if f f ∗ = f ∗ f = Id, where Id is the identity operator on V. Note that the condition in the above definition implies that f is invertible, ran f = V, and f −1 = f ∗ . Theorem 3.4.33. Let V be a finite dimensional inner product space. If f : V → V is a unitary operator, then f ∗ and f −1 are unitary.
Proof. If f is unitary, then f ∗ (f ∗ )∗ = f ∗ f = Id
and (f ∗ )∗ f ∗ = f f ∗ = Id,
and hence f ∗ is unitary. Since f −1 = f ∗ , f −1 is unitary.
Example 3.4.34. Consider the vector space Pn (C) with the inner product R1 hf, gi = 0 f (t)g(t)dt. Show that the linear operator Φ : Pn (C) → Pn (C)
defined as Φ(f )(t) = f (1 − t) is a unitary operator. Solution. Since, for every f, g ∈ Pn (C), we have hΦ(f ), gi =
Z
0
1
f (1 − t)g(t)dt =
Z
0
1
f (t)g(1 − t)dt = hf, Φ(g)i,
Φ is self-adjoint. Hence, Φ∗ Φ = ΦΦ∗ = Φ2 = Id.
Definition 3.4.35. Let V be a normed space. A linear operator f : V → V is called an isometric operator or an isometry if kf (x)k = kxk for every x ∈ V.
182
Chapter 3: Inner Product Spaces
Example 3.4.36. Let z and w be two complex numbers such that |z| = |w| = 1. Find a linear isometry f : C → C such that f (z) = w. Solution. The function f (x) = wzx has the desired property. Since for every x ∈ C we have x x x f (x) = f z = f (z) = w = wzx, z z z this solution is unique. This function f can be interpreted as the rotation of the complex plane about the origin that takes z to w.
Example 3.4.37. Let V be a finite dimensional complex inner product space and let {e1 , . . . , en } be an orthogonal basis of V. Show that if λ1 , . . . , λn are complex numbers such that |λ1 | = · · · = |λn | = 1, then the linear operator f : V → V defined by f (α1 e1 + · · · + αn en ) = λ1 α1 e1 + · · · + λn αn en is an isometric operator. Solution. Since the vectors e1 , . . . , en are orthogonal, by the Pythagorean Theorem, we get kf (α1 e1 + · · · + αn en )k2 = kλ1 α1 e1 + · · · + λn αn en k2
= kλ1 α1 e1 k2 + · · · + kλn αn en k2
= |λ1 |2 kα1 e1 k2 + · · · + |λn |2 kαn en k2
= kα1 e1 k2 + · · · + kαn en k2
= kα1 e1 + · · · + αn en k2 . Consequently, kf (x)k = kxk for every x ∈ V.
Theorem 3.4.38. Let V be an inner product space. A linear operator f : V → V is isometric if and only if it preserves the inner product, that is, hf (x), f (y)i = hx, yi for every x, y ∈ V. Proof. If hf (x), f (y)i = hx, yi for every x, y ∈ V, then kf (x)k2 = hf (x), f (x)i = hx, xi = kxk2
3.4. SPECTRAL THEOREMS
183
for every x ∈ V and thus f is an isometric operator. Now assume that f is an isometric operator. From the Polarization Identity (Corollary 3.1.34) we get hf (x), f (y)i 1 = kf (x) + f (y)k2 − kf (x) − f (y)k2 + ikf (x) + if (y)k2 − ikf (x) − if (y)k2 4 1 = kf (x + y)k2 − kf (x − y)k2 + ikf (x + iy)k2 − ikf (x − iy)k2 4 1 = kx + yk2 − kx − yk2 + ikx + iyk2 − ikx − iyk2 = hx, yi 4
for every x, y ∈ V.
Theorem 3.4.39. On a finite dimensional inner product space a linear operator is unitary if and only if it is isometric.
Proof. Let V be a finite dimensional inner product space and let f : V → V be a unitary operator. For every x ∈ V we have kf (x)k2 = hf (x), f (x)i = hx, f ∗ f (x)i = hx, xi = kxk2 , which means that f is an isometric operator. Now we assume that V is a finite dimensional inner product space and f : V → V is an isometric operator. Note that the equality kf (x)k = kxk implies that f is injective and thus dim ker f = 0. If dim V = n, then dim ran f = n and hence ran f = V and f is invertible. In other words, f is an isomorphism. By Theorem 3.4.38, for every x, y ∈ V we have hf ∗ f (x), yi = hf (x), f (y)i = hx, yi, and hence f ∗ f = Id. Moreover, since f is an isomorphism, for every x ∈ V, there is a y ∈ V such that x = f (y) and we have f f ∗ (x) = f f ∗ f (y) = f (y) = x, because f ∗ f = Id. Hence f f ∗ = Id. From Theorems 3.4.38 and 3.4.56 we obtain the following geometric characterization of unitary operators on finite dimensional inner product spaces: unitary operators are linear operators that preserve the norm or the inner product.
184
Chapter 3: Inner Product Spaces
Corollary 3.4.40. Let V be a finite dimensional inner product space and let f : V → V be a linear operator. The following conditions are equivalent: (a) f is unitary; (b) kf (x)k = kxk for every x ∈ V; (c) hf (x), f (y)i = hx, yi for every x, y ∈ V. In the following theorem we characterize unitary operators on V in terms orthonormal bases in V. Theorem 3.4.41. Let V be a finite dimensional inner product space and let f : V → V be a linear operator. The following conditions are equivalent: (a) f is unitary; (b) f is normal and |λ| = 1 for every eigenvalue λ of f ; (c) There is an orthonormal basis {e1 , . . . , en } {f (e1 ), . . . , f (en )} is an orthonormal basis of V;
of
V
such
(d) For every orthonormal basis {v1 , . . . , vn } of V, {f (v1 ), . . . , f (vn )} is an orthonormal basis of V; (e) V has an orthonormal basis {e1 , . . . , en } of eigenvectors of f corresponding to eigenvalues λ1 , . . . , λn such that |λ1 | = · · · = |λn | = 1; Pn (f) f (x) = j=1 λj hx, ej iej for every x ∈ V, where {e1 , . . . , en } is an orthonormal basis of V and |λ1 | = · · · = |λn | = 1. Proof. If f is unitary, then f is normal and, by Corollary 3.4.17, there is an orthonormal basis {e1 , . . . , en } of V consisting of eigenvectors of f . Let λj be the eigenvalue of f corresponding to the eigenvector ej . Then |λj | = kλj ej k = kf (ej )k = kej k = 1, so (a) implies (b). Now we assume that f is normal and |λ| = 1 for every eigenvalue λ of f . By Corollary 3.4.17, there is an orthonormal basis {e1 , . . . , en } of V consisting of eigenvectors of f . Let λ1 , . . . , λn be the corresponding eigenvalues. Since |λ1 | = · · · = |λn | = 1, we have hf (ej ), f (ek )i = hλj ej , λk ek i = λj λk hej , ek i = δjk
185
3.4. SPECTRAL THEOREMS
and thus {f (e1 ), . . . , f (en )} is an orthonormal basis of V. This shows that (b) implies (c). Next we assume that there is an orthonormal basis {e1 , . . . , en } of V such {f (e1 ), . . . , f (en )} is an orthonormal basis of V. Let {v1 , . . . , vn } be an orthonormal basis of V. Then for all j ∈ {1, . . . , n} we have vj =
n X
αjm em ,
m=1
where αjm = hvj , em i. Since hf (vj ), f (vk )i = = = = =
*
f
n X
n X
αjl el
l=1 n X
l=1 m=1 n X n X
l=1 m=1 n X n X l=1 m=1 * n X
!
,f
n X
αkm em
m=1
!+
αjl αkm hf (el ), f (em )i αjl αkm δlm αjl αkm hel , em i
αjl el ,
l=1
n X
αkm em
m=1
+
= hvj , vk i = δjk ,
{f (v1 ), . . . , f (vn )} is an orthonormal basis of V, proving that (c) implies (d). Let {v1 , . . . , vn } be an orthonormal basis of V. If {f (v1 ), . . . , f (vn )} is an orthonormal basis, then
2 2
X
n n X
2
kf (x)k = f hx, vj ivj = hx, vj if (vj )
j=1
j=1 =
n X j=1
|hx, vj i|2 kf (vj )k2 =
n X j=1
|hx, vj i|2 = kxk2 ,
for every x ∈ V. Consequently, by Corollary 3.4.40, f is unitary and thus (d) implies (a). So far we have proved that (a)-(d) are all equivalent. Clearly (b) implies (e) and (e) implies (f). Finally, if we assume that there is an orthonormal basis {e1 , . . . , en } of V and |λ1 | = · · · = |λn | = 1 such that f (x) =
n X j=1
λj hx, ej iej
186
Chapter 3: Inner Product Spaces
for every x ∈ V, then
2
n
n X X
2
kf (x)k = λj hx, ej iej = kλj hx, ej iej k2
j=1
j=1 =
n X j=1
|λj |2 khx, ej iej k2 =
n X j=1
khx, ej iej k2 = kxk2 .
By Corollary 3.4.40, f is unitary and thus (f) implies (a), completing the proof of the theorem.
Example 3.4.42. Let {a, b, c} and {u, v, w} be two orthogonal bases in an inner product space V. Find an isometry f such that Span{f (a)} = Span{u}, Span{f (b)}) = Span{v}, and Span{f (c)} = Span{w}. Solution. f (x) =
x,
1 1 1 1 1 1 a u + x, b v + x, c w. kak kuk kbk kvk kck kwk
Example 3.4.43. Consider the operator f ∈ L(C2 , C2 ) defined by f (x) = Ax, where 1 4 + i 2 − 2i . A= 5 2 − 2i 1 + 4i Show that f is unitary and find its spectral decomposition. Solution. We can verify that f f ∗ = f ∗ f = Id by simple matrix multiplication. Since the roots of the equation (4 + i − λ)(1 + 4i − λ) − (2 − 2i)2 = λ2 − (5 + 5i)λ + 25i = 0 are 5 and 5i, 1 and i are eigenvalues of f and " " 2 # √
5 √1 5
and
# √1 5 − √25
are unit eigenvectors corresponding to 1 and i. Consequently, * " 1 #+ " 1 # * " √2 #+ " √2 # √ √ x1 x1 x1 5 5 5 5 f = , 1 +i . , 1 2 x2 x2 x2 √ √ −√ − √2 5
5
5
5
3.4. SPECTRAL THEOREMS
187
The operator in the next example is a normal operator that is not unitary.
Example 3.4.44. Consider the operator f ∈ L(C2 , C2 ) defined by f (x) = Ax, where 5i 3 + 4i A= . −3 + 4i 5i
It is easy to verify that f f ∗ = f ∗ f = 0, so f is a normal operator, but it is not unitary. Since 1 − 2i 0 2+i 2+i f = and f = 10i , −2 + i 0 1 + 2i 1 + 2i 1 − 2i 2+i and are corresponding f has eigenvalues 0 and 10i and −2 + i 1 + 2i eigenvectors.
3.4.4
Orthogonal operators on real inner product spaces
In this section we discuss the orthogonal operators on finite dimensional real inner product spaces, that is, operators that preserve the inner product. All inner product spaces considered in this section are assumed to be real.
Definition 3.4.45. Let V be a real inner product space. A linear operator f : V → V is called an orthogonal operator if hf (x), f (y)i = hx, yi for every x, y ∈ V.
Theorem 3.4.46. Let V be a finite dimensional real inner product space and let f : V → V be a linear operator. The following conditions are equivalent: (a) f is orthogonal; (b) f f ∗ = f ∗ f = Id; (c) kf (x)k = kxk for every x ∈ V.
188
Chapter 3: Inner Product Spaces
Proof. Let f : V → V be an orthogonal operator. For every x, y ∈ V we have hf ∗ f (x), yi = hf (x), f (y)i = hx, yi and thus hf ∗ f (x) − x, yi = 0.
If we take y = f ∗ f (x) − x, then we get
kf ∗ f (x) − xk2 = hf ∗ f (x) − x, f ∗ f (x) − xi = 0. Consequently, f ∗ f (x)−x = 0, and hence f ∗ f = Id. Since V is finite dimensional, we also have f f ∗ = Id, by Theorem 2.2.11. This proves that (a) implies (b). If f f ∗ = f ∗ f = Id, then for every x ∈ V we have kf (x)k2 = hf (x), f (x)i = hx, f ∗ f (x)i = hx, xi = kxk2 , and thus (b) implies (c). Finally, if kf (x)k = kxk for every x ∈ V, then 2hf (x), f (y)i = kf (x + y)k2 − kf (x)k2 − kf (y)k2
= kx + yk2 − kxk2 − kyk2 = 2hx, yi
for every x, y ∈ V. This proves that (c) implies (a). While some properties of orthogonal operators on finite dimensional real inner product spaces are the same as properties of unitary operators on finite dimensional complex inner product spaces, there are some essential differences. Theorem 3.4.47. Let V be a finite dimensional real inner product space and let f : V → V be an orthogonal operator. If λ is an eigenvalue of f , then λ = 1 or λ = −1. Proof. If v be an eigenvector corresponding to the eigenvalue λ, then f (v) = λv and thus |λ|kvk = kλvk = kf (v)k = kvk. Consequently, |λ| = 1 because v 6= 0 and, because λ is a real number, λ = 1 or λ = −1. Note that the above theorem does not say that every orthogonal operator on a finite dimensional real inner product space has an eigenvalue. Indeed, if f : R2 → R2 is the operator f (x) = Ax, where 0 −1 A= , 1 0
3.4. SPECTRAL THEOREMS
189
then x1 y1 −x2 −y2 x1 y A ,A = , = x1 y1 + x2 y2 = , 1 , x2 y2 x1 y1 x2 y2 so f is an orthogonal operator. On the other hand, since −λ −1 det = λ2 + 1, 1 −λ f has no real eigenvalues. Lemma 3.4.48. Let V be a finite dimensional real inner product space and let f : V → V be an orthogonal operator. If U is an f -invariant subspace of V, then U ⊥ is also f -invariant. Proof. Let U be an f -invariant subspace of V. Since f is an isomorphism, we have f (U) = U. Let v ∈ U ⊥ and u ∈ U. Then there is w ∈ U such that f (w) = u and we have hf (v), ui = hf (v), f (w)i = hv, wi = 0. Thus f (v) ∈ U ⊥ . Lemma 3.4.49. Let V be a finite dimensional real inner product space and let f : V → V be an orthogonal operator. There is an f -invariant subspace U ⊆ V such that dim U = 1 or dim U = 2. Proof. Since the operator f + f ∗ is self-adjoint, it has an eigenvalue λ ∈ R. Let v be an eigenvector corresponding to λ. Then (f + f ∗ )v = λv and hence (f f + f f ∗ )v = f (λv). Since f f ∗ = Id, the above can be written as f 2 (v) + v = λf (v) or f 2 (v) = λf (v) − v.
Consequently, f 2 (v) ∈ Span{v, f (v)} and thus U = Span{v, f (v)} is f -invariant and dim U = 1 or dim U = 2.
190
Chapter 3: Inner Product Spaces
Theorem 3.4.50. Let V be a finite dimensional real inner product space and let f : V → V be an orthogonal operator. There are f -invariant subspaces U1 , . . . , Un ⊆ V such that V = U1 ⊕ · · · ⊕ Un and dim Uj = 1 or dim Uj = 2 for every j ∈ {1, . . . , n}. Proof. By Lemma 3.4.49 there is an f -invariant subspace U1 ⊆ V such that dim U1 = 1 or dim U1 = 2. We have V = U1 ⊕ U1⊥ . Now, by Lemma 3.4.48, U1⊥ is f -invariant and thus we can define an operator g : U1⊥ → U1⊥ by g(x) = f (x) for every x ∈ U1⊥ . Clearly g is an orthogonal operator. Since dim U1⊥ = dim V − 1 or dim U1⊥ = dim V − 2, we can apply Lemma 3.4.49 to g and proceed as before. This gives us the desired result by induction. If f : V → V is an orthogonal operator on a real inner product space V, then by the above theorem, V is a direct sum of f -invariant subspaces U1 , . . . , Un of dimension 1 or 2. If dim Uj = 1, then f (v) = v or f (v) = −v for every v ∈ Uj . Now we are going to consider the case when dim Uj = 2. Theorem 3.4.51. Let V be a real inner product space such that dim V = 2 and let f : V → V be an orthogonal operator. If {v, w} is an orthonormal basis of V such that f (v) = av + bw
and
f (w) = cv + dw,
where a, b, c, d ∈ R, then one of the following conditions holds (a) ad − bc = 1 and there is a unique number θ ∈ (−π, π] such that f (v) = cos θ v + sin θ w
and
f (w) = − sin θ v + cos θ w;
(b) ad − bc = −1 and there is an orthonormal basis {u1 , u2 } of V such that f (u1 ) = u1 and f (u2 ) = −u2 . Note that in the second case u1 and u2 are eigenvectors of f corresponding to eigenvalues 1 and −1, respectively. Proof. From kf (v)k2 = kf (w)k2 = 1 and hf (v), f (w)i = 0 we get 2 a + b2 = 1 c2 + d2 = 1 . ac + bd = 0
3.4. SPECTRAL THEOREMS
191
d d d If a 6= 0, then c = − bd a = − a b and d = − a a. If we let t = − a , then we have
1 = c2 + d2 = t2 (a2 + b2 ) = t2 and thus t = 1 or t = −1. If t = 1, then c = −b, d = a, ad − bc = 1, and there is a unique θ ∈ (−π, π] such that a = cos θ and b = sin θ. If t = −1, then c = b, d = −a, and ad − bc = −1. Since for any p, q, x, y ∈ R we have hf (pv + qw), xv + ywi = apx + bqx + bpy − aqy = hpv + qw, f (xv + yw)i, f is self-adjoint. Because f is orthogonal, if λ is an eigenvalue of f , then λ = 1 or λ = −1. Next we calculate the corresponding eigenvectors. To solve the equation f (xv + yw) − (xv + yw) = (x(a − 1) + yb)v + (xb − (a + 1)y)w = 0 we need to solve the system
x(a − 1) + yb = 0 . xb − (a + 1)y = 0
It is easy to verify that x = −b and y = a − 1 is a nonzero solution if b 6= 0 or a 6= 1. The same way we show that the equation f (xv + yw) + (xv + yw) = (x(a + 1) + yb)v + (xb − (a − 1)y)w = 0 has a nonzero solution x = −b and y = a + 1 if b 6= 0 or a 6= −1. Note that, as expected, the vectors −bv + (a − 1)w and −bv − (a + 1)w are orthogonal because a2 + b2 = 1. The cases b = 0 and a = 1 as well as b = 0 and a = −1 are trivial. Consequently, there is a basis {u1 , u2 } of V consisting of orthonormal eigenvectors of f such that f (u1 ) = u1 and f (u2 ) = −u2 . Finally, if a = 0, then d = 0 and b2 = c2 = 1. There are four possibilities: # " cos( π2 ) − sin( π2 ) a b 0 −1 ; = = c d 1 0 sin( π2 ) cos( π2 ) # " cos( π2 ) − sin(− π2 ) a b 0 1 = = ; c d −1 0 cos( π2 ) sin(− π2 ) a b 0 1 = ; c d 1 0 a b 0 −1 = . c d −1 0 In the first two cases ad − bc = 1 and in the last two cases ad − bc = −1. In all cases the operator f is self-adjoint and has eigenvalues 1 and −1.
192
Chapter 3: Inner Product Spaces
Using Theorem 3.4.51 we can give a more detailed description of the f invariant subspaces in Theorem 3.4.50. Theorem 3.4.52. Let V be a finite dimensional real inner product space and let f : V → V be an orthogonal operator. There are f -invariant subspaces U1 , . . . , Up , W1 , . . . , Wq , X1 , . . . , Xr ⊆ V, such that V = U1 ⊕ · · · ⊕ Up ⊕ W1 ⊕ · · · ⊕ Wq ⊕ X1 ⊕ · · · ⊕ Xr and (a) for every j ∈ {1, . . . , p}, dim Uj = 1 and there is a nonzero vector uj ∈ Uj such that f (uj ) = uj ; (b) for every k ∈ {1, . . . , q}, dim Wk = 1 and there is a nonzero vector wk ∈ Wk such that f (wk ) = −wk ; (c) for every l ∈ {1, . . . , r}, dim Xl = 2 and there are orthonormal vectors xl , yl ∈ Xl and a unique θl ∈ (−π, π] such that f (xl ) = cos θl xl + sin θl yl f (yl ) = − sin θl xl + cos θl yl .
3.4.5
Positive operators
There is some similarity between operators on a complex inner product space and the complex numbers. Self-adjoint operators are like real numbers and unitary operators are like complex numbers of modulus 1. Now we are going to consider operators that behave like nonnegative numbers. Definition 3.4.53. Let V be an inner product space. A linear operator f : V → V is called positive if hf (x), xi ≥ 0 for every x ∈ V. If f : V → V is a positive operator, then hf (x), xi is a real number for every x ∈ V and thus hf (x), xi = hf (x), xi = hx, f (x)i, which shows that positive operators are self-adjoint.
193
3.4. SPECTRAL THEOREMS
Example 3.4.54. Consider the vector space C([a, b]) of complex-valued continuous functions defined on an interval [a, b] with the inner product hf, gi = Rb a f (t)g(t) dt and let ϕ : [a, b] → R be a positive continuous function. Show that the operator Φ : C([a, b]) → C([a, b]) defined by Φ(f ) = ϕf is a positive operator. Solution. For all f ∈ C([a, b]) we have hΦ(f ), f i =
Z
b
ϕ(t)f (t)f (t) dt =
a
Z
b
a
ϕ(t)|f (t)|2 dt ≥ 0.
Example 3.4.55. Let V = C∞ be the vector space of all infinite sequences of complex numbers with only a finite number of nonzero terms with the inner product defined as h(x1 , x2 , . . . ), (y1 , y2 , . . . )i =
∞ X
xj yj .
j=1
Consider the linear operator f : C∞ → C∞ defined as f (x1 , x2 , . . . ) = (α1 x1 , α2 x2 , . . . ) where α1 , α2 , . . . are arbitrary complex numbers. Show that f is a positive operator if and only if all αj ’s are positive real numbers. Solution. If all αj ’s are positive real numbers, then for every (x1 , x2 , . . . ) ∈ C∞ we have hf (x1 , x2 , . . . ), (x1 , x2 , . . . )i = h(α1 x1 , α2 x2 , . . . ), (x1 , x2 , . . . )i ∞ ∞ X X αj |xj |2 ≥ 0, = αj xj xj = j=1
j=1
so f is a positive operator. Now assume that hf (x1 , x2 , . . . ), (x1 , x2 , . . . )i ≥ 0 for every (x1 , x2 , . . . ) ∈ C∞ . If, for every integer j ≥ 1, we denote by ej ∈ C∞ the sequence (x1 , x2 , . . . ) such that xj = 1 and xk = 0 for k 6= j, then we have 0 ≤ hf (ej ), ej i = αj .
194
Chapter 3: Inner Product Spaces
The operations of adjoint of an operator and conjugate of a complex number have similar algebraic properties. For example, for any complex number z we have zz ≥ 0. In the next theorem we formulate a similar property for linear operators. Theorem 3.4.56. Let V be a finite dimensional inner product space and let f : V → V be a linear operator. The operators f f ∗ and f ∗ f are positive. Proof. For every x ∈ V we have hf f ∗ (x), xi = hf ∗ (x), f ∗ (x)i = kf ∗ (x)k2 ≥ 0 and hf ∗ f (x), xi = hf (x), f (x)i = kf (x)k2 ≥ 0.
The following theorem is similar to Theorems 3.4.16 and 3.4.30. These three theorems characterize normal, self-adjoint, and positive operators in terms of their eigenvalues. Theorem 3.4.57. Let V be a finite dimensional inner product space and let f : V → V be a nonzero linear operator. The following conditions are equivalent (a) f is a positive operator; (b) There are orthonormal vectors e1 , . . . , er and positive numbers λ1 , . . . , λr such that for every v ∈ V we have f (v) =
r X j=1
λj hv, ej iej ;
(c) There are orthonormal vectors e1 , . . . , er and positive numbers λ1 , . . . , λr such that r X f= λj projej . j=1
Proof. If f is a positive operator, then it is self-adjoint and there are orthonormal P vectors e1 , . . . , er and nonzero real numbers λ1 , . . . , λr such that f (v) = rj=1 λj hv, ej iej for every vector v ∈ V. Since, for every λj we have 0 ≤ hf (ej ), ej i = hλj ej , ej i = λj hej , ej i = λj kej k2 = λj ,
195
3.4. SPECTRAL THEOREMS
λ1 , . . . , λr are positive numbers. This shows that (a) implies (b). Conditions (b) and (c) are equivalent, because projej (v) = hv, ej iej . Pr Now we assume that for every v ∈ V we have f (v) = j=1 λj hv, ej iej , where {e1 , . . . , er } is an orthonormal set and λ1 , . . . , λn are positive numbers. Then for every v ∈ V we have * r + r X X hf (v), vi = λj hv, ej iej , v = λj hv, ej ihej , vi j=1
=
r X j=1
j=1
λj hv, ej ihv, ej i =
r X j=1
λj |hv, ej i|2 ≥ 0.
This shows that (b) implies (a), which completes the proof.
Example 3.4.58. Consider the operator f ∈ L(C3 , C3 ) defined as f (x) = Ax, where 14 2 4 A = 2 17 −2 . 4 −2 14 Show that f is a positive operator.
Solution. Since the matrix A is symmetric, the operator f is normal and thus C3 has an orthonormal basis consisting of eigenvectors of f . Hence, to show that f is a positive operator, it suffices to show that all eigenvalues of f are nonnegative numbers. If λ is an eigenvalue of f , then the following system has a nontrivial solution. (14 − λ)x + 2y + 4z = 0 2x + (17 − λ)y − 2z = 0 4x − 2y + (14 − λ)z = 0. If we add the first and the third equations, we get (18 − λ)(x + z) = 0. It is easy to see that for λ = 18 the system has nontrivial solutions. If λ 6= 18, the system is equivalent to the system =0 x + z 2x + (17 − λ)y − 2z = 0 4x − 2y + (14 − λ)z = 0. If we let z = −x, then we get
4x + (17 − λ)y = 0 (λ − 10)x − 2y = 0.
196
Chapter 3: Inner Product Spaces
We multiply the first equation by 10−λ and add to the second and get 4 (10 − λ) (17 − λ) − 2 y = 0 4 or
(λ2 − 27λ + 162) y = 0. 4
The roots of the equation λ2 − 27λ + 162 = 0 are 9 and 18. If λ 6= 9 and λ 6= 18, then the only solution is x = y = z = 0. Consequently, 9 and 18 are the only eigenvalues of f . Since these are positive numbers, f is a positive operator.
The square root of a positive operator Every positive real number has a unique positive square root. A similar property holds for positive operators.
Definition 3.4.59. Let V be an inner product space and let f : V → V be a linear operator. An operator g : V → V is called a square root of f if g 2 = f .
Example 3.4.60. Let V = C∞ be the vector space of all infinite sequences of complex numbers with only a finite number of nonzero terms with the inner P∞ product defined as h(x1 , x2 , . . . ), (y1 , y2 , . . . )i = j=1 xj yj . If f : C∞ → C∞ is the operator defined as f (x1 , x2 , . . . ) = (α1 x1 , α2 x2 , . . . ), where α1 , α2 , . . . are positive numbers, then the operator g : C∞ → C∞ defined as √ √ g(x1 , x2 , . . . ) = ( α1 x1 , α2 x2 , . . . ), is a square root of f . Note that every operator of the form √ √ h(x1 , x2 , . . . ) = ((−1)n1 α1 x1 , (−1)n2 α2 x2 , . . . ) , where nj ∈ {1, 2}, is a square root of f , but the only square root of f that is a positive operator is the operator g defined above.
197
3.4. SPECTRAL THEOREMS
Theorem 3.4.61. Let V be a finite dimensional inner product space and let f : V → V be a positive operator. There is a unique positive operator g : V → V such that g 2 = f . Proof. We offer two different proofs of existence and two different proofs of uniqueness. The first proof of existence: Without loss of generality we can assume that f is a nonzero positive operator. By Theorem 3.4.57, for every x ∈ V we have f (x) =
r X j=1
λj hx, ej iej ,
where {e1 , . . . , er } is an orthonormal set and λ1 , . . . , λr are positive numbers. It is easy to verify that for the positive operator g(x) =
r X p λj hx, ej iej j=1
we have g 2 = f . The second proof of existence: Since f is a positive operator, all √ eigenvalues of f are nonnegative. Let p be a polynomial such that p(λ) = λ for every eigenvalue λ of f . Since, by Example 3.4.18, (p(f ))2 = f and p(f ) is a positive operator, we can take g = p(f ). √ The first proof of uniqueness: Let p be a polynomial such that p(λ) = λ for every eigenvalue λ of f and let g = p(f ). Now assume h is a positive operator such that h2 = f . If µ is any eigenvalue of h and v is an eigenvector corresponding to µ, then µ ≥ 0 and f (v) = µ2 v, which means that µ2 is an eigenvalue of f and thus p(µ2 ) = µ. Consequently, g(v) = p(f )(v) = p(h2 )(v) = p(µ2 )(v) = µv = h(v). Since g(v) = h(v) for every eigenvector of h and there is a basis of V of eigenvectors of h, we can conclude that g = h. The second proof of uniqueness: Let p be a polynomial such that (p(f ))2 = f and let h be a positive operator such that h2 = f . Then hf = hh2 = h2 h = f h and consequently hg = hp(f ) = p(f )h = gh, where g = p(f ). Since h2 − g 2 = 0, for every v ∈ V we have 0 = h(h − g)v, (h2 − g 2 )vi = h(h − g)v, (h + g)(h − g)vi = h(h − g)v, h(h − g)vi + h(h − g)v, g(h − g)vi.
198
Chapter 3: Inner Product Spaces
This gives us

⟨(h − g)(v), h(h − g)(v)⟩ = 0 and ⟨(h − g)(v), g(h − g)(v)⟩ = 0,

because both h and g are positive operators, and consequently

⟨(h − g)(v), (h − g)(h − g)(v)⟩ = ⟨(h − g)(v), h(h − g)(v)⟩ − ⟨(h − g)(v), g(h − g)(v)⟩ = 0.

Hence ⟨(h − g)³(v), v⟩ = 0, which gives (h − g)³ = 0. Since h − g is a self-adjoint operator, we conclude that h − g = 0, by Theorem 3.3.15.

The unique positive square root of a positive operator f will be denoted √f.
Example 3.4.62. Consider the operator f ∈ L(C², C²) defined as f(x) = Px, where

P = [[33, 24 − 24i], [24 + 24i, 57]].

Show that f is a positive operator and find its positive square root.

Solution. First we find the spectral decomposition of f:

f = 9 proj_(1−i, −1) + 81 proj_(1−i, 2).

Thus f is a positive operator and

√f = 3 proj_(1−i, −1) + 9 proj_(1−i, 2).

Since the matrices of proj_(1−i, −1) and proj_(1−i, 2) are

(1/3)[[2, i−1], [−1−i, 1]]  and  (1/6)[[2, 2(1−i)], [2(1+i), 4]],

and

[[2, i−1], [−1−i, 1]] + (3/2)[[2, 2(1−i)], [2(1+i), 4]] = [[5, 2−2i], [2+2i, 7]],

we have

√f (z_1, z_2) = [[5, 2−2i], [2+2i, 7]] (z_1, z_2)

for every (z_1, z_2) ∈ C².
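As a quick numerical check of this example (a sketch added here, not part of the original text; it assumes NumPy is available), the positive square root can be computed from the Hermitian eigendecomposition of P:

```python
import numpy as np

# Positive square root of the positive (Hermitian) matrix P from Example 3.4.62.
P = np.array([[33, 24 - 24j],
              [24 + 24j, 57]])

vals, vecs = np.linalg.eigh(P)                        # eigenvalues 9 and 81
sqrtP = vecs @ np.diag(np.sqrt(vals)) @ vecs.conj().T

print(np.round(sqrtP, 10))                            # [[5, 2-2i], [2+2i, 7]]
print(np.allclose(sqrtP @ sqrtP, P))                  # True
```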
Example 3.4.63. In Example 3.4.58 we show that the operator f ∈ L(C³, C³) defined as f(x) = Ax, where

A = [[14, 2, 4], [2, 17, −2], [4, −2, 14]],

is a positive operator. Find the spectral decomposition of √f.
Solution. In Example 3.4.58 we found that λ = 18 and λ = 9 are the eigenvalues of f. We need to find an orthogonal basis of C³ consisting of eigenvectors of f. For λ = 18 the system

(14 − λ)x + 2y + 4z = 0
2x + (17 − λ)y − 2z = 0
4x − 2y + (14 − λ)z = 0

is equivalent to the equation 2x − y − 2z = 0. This means that

E_18 = {(x, 2x − 2z, z) : x, z ∈ C} = Span{(1, 2, 0), (0, −2, 1)}.

Note that the vectors (1, 2, 0) and (0, −2, 1) are not orthogonal. The projection of the vector (1, 2, 0) on Span{(0, −2, 1)} is −(4/5)(0, −2, 1) and thus the vector

(1, 2, 0) − (−4/5)(0, −2, 1) = (1/5)(5, 2, 4)

is orthogonal to (0, −2, 1), and {(5, 2, 4), (0, −2, 1)} is an orthogonal basis of E_18.
For λ = 9 the system

(14 − λ)x + 2y + 4z = 0
2x + (17 − λ)y − 2z = 0
4x − 2y + (14 − λ)z = 0

is equivalent to the system

x + z = 0
4x + 8y = 0
−x − 2y = 0,

which means that

E_9 = {(−2y, y, 2y) : y ∈ C} = Span{(−2, 1, 2)}.

As expected, the vector (−2, 1, 2) is orthogonal to the vectors from E_18, and

{(5, 2, 4), (0, −2, 1), (−2, 1, 2)}

is an orthogonal basis of eigenvectors of f. Since

f = 18 proj_(5, 2, 4) + 18 proj_(0, −2, 1) + 9 proj_(−2, 1, 2),

we have

√f = 3√2 proj_(5, 2, 4) + 3√2 proj_(0, −2, 1) + 3 proj_(−2, 1, 2).
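This decomposition can be verified numerically; the following sketch (not from the text, assuming NumPy) builds √f from the orthogonal projections onto the eigenvectors found above and checks that it squares to A.

```python
import numpy as np

A = np.array([[14, 2, 4],
              [2, 17, -2],
              [4, -2, 14]], dtype=float)

def proj(v):
    """Matrix of the orthogonal projection onto span{v}."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    return (v @ v.T) / (v * v).sum()

sqrtA = (3*np.sqrt(2)*proj([5, 2, 4])
         + 3*np.sqrt(2)*proj([0, -2, 1])
         + 3*proj([-2, 1, 2]))

print(np.allclose(sqrtA @ sqrtA, A))   # True
```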
3.5 Singular value decomposition
Spectral decomposition is formulated for operators on an inner product space, that is, operators f : V → V where V is an inner product space. Now we are going to discuss a decomposition similar to spectral decomposition for linear transformations between two different inner product spaces. We begin by presenting an example which will motivate the main result of this section.
Example 3.5.1. Let V and W be finite dimensional inner product spaces and let f : V → W be a linear transformation. Suppose for some orthonormal vectors v_1, v_2 ∈ V we have

f*f(x) = 49⟨x, v_1⟩v_1 + 25⟨x, v_2⟩v_2

for every x ∈ V. Let w_1 = (1/7) f(v_1) and w_2 = (1/5) f(v_2). Show that ‖w_1‖ = ‖w_2‖ = 1, ⟨w_1, w_2⟩ = 0, and

f(x) = 7⟨x, v_1⟩w_1 + 5⟨x, v_2⟩w_2

for every x ∈ V.

Solution. Since ‖f(v_1)‖² = ⟨f*f(v_1), v_1⟩ = 49 and ‖f(v_2)‖² = ⟨f*f(v_2), v_2⟩ = 25, we have

‖w_1‖ = (1/7)‖f(v_1)‖ = 1  and  ‖w_2‖ = (1/5)‖f(v_2)‖ = 1

and

⟨w_1, w_2⟩ = (1/35)⟨f(v_1), f(v_2)⟩ = (1/35)⟨f*f(v_1), v_2⟩ = (1/35)⟨49v_1, v_2⟩ = (49/35)⟨v_1, v_2⟩ = 0.

Now we extend the set {v_1, v_2} to {v_1, v_2, . . . , v_n}, an orthonormal basis of V. Then, for every x ∈ V, we have

x = ⟨x, v_1⟩v_1 + ⟨x, v_2⟩v_2 + · · · + ⟨x, v_n⟩v_n

and thus

f(x) = ⟨x, v_1⟩f(v_1) + ⟨x, v_2⟩f(v_2) + · · · + ⟨x, v_n⟩f(v_n).

Note that ‖f(v_j)‖² = ⟨f*f(v_j), v_j⟩ = 0 for j > 2. Hence

f(x) = ⟨x, v_1⟩f(v_1) + ⟨x, v_2⟩f(v_2) = 7⟨x, v_1⟩w_1 + 5⟨x, v_2⟩w_2.
Theorem 3.5.2. Let V and W be finite dimensional inner product spaces. For every nonzero linear transformation f : V → W there are positive numbers σ_1, . . . , σ_r, orthonormal vectors v_1, . . . , v_r ∈ V, and orthonormal vectors w_1, . . . , w_r ∈ W such that

f(x) = Σ_{j=1}^{r} σ_j ⟨x, v_j⟩ w_j

for every x ∈ V.

Proof. The operator f*f is self-adjoint. Since ⟨f*f(x), x⟩ = ⟨f(x), f(x)⟩ = ‖f(x)‖² ≥ 0 for every x ∈ V, f*f is a nonzero positive operator and thus there are orthonormal vectors v_1, . . . , v_r and positive numbers λ_1, . . . , λ_r such that

f*f(x) = Σ_{j=1}^{r} λ_j ⟨x, v_j⟩ v_j

for every x ∈ V. Now we extend the set {v_1, . . . , v_r} to an orthonormal basis {v_1, . . . , v_n} of V. Then, for every x ∈ V, we have

f(x) = f(Σ_{j=1}^{n} ⟨x, v_j⟩ v_j) = Σ_{j=1}^{n} ⟨x, v_j⟩ f(v_j).

Since, for every j = 1, . . . , r, we have ‖f(v_j)‖² = ⟨f*f(v_j), v_j⟩ = λ_j ‖v_j‖² = λ_j, and ‖f(v_j)‖² = ⟨f*f(v_j), v_j⟩ = 0 for j > r, we can write

f(x) = Σ_{j=1}^{r} ⟨x, v_j⟩ f(v_j) = Σ_{j=1}^{r} ‖f(v_j)‖ ⟨x, v_j⟩ (1/‖f(v_j)‖) f(v_j).

If we let σ_j = ‖f(v_j)‖ = √λ_j and w_j = (1/‖f(v_j)‖) f(v_j) for j ≤ r, then we obtain the desired decomposition of f:

f(x) = Σ_{j=1}^{r} σ_j ⟨x, v_j⟩ w_j

for every x ∈ V.
Moreover, for every j, k ∈ {1, . . . , r} we have

⟨w_j, w_k⟩ = ⟨(1/‖f(v_j)‖) f(v_j), (1/‖f(v_k)‖) f(v_k)⟩ = (1/(σ_j σ_k)) ⟨f(v_j), f(v_k)⟩ = (1/(σ_j σ_k)) ⟨f*f(v_j), v_k⟩ = (1/(σ_j σ_k)) ⟨λ_j v_j, v_k⟩ = (λ_j/(σ_j σ_k)) ⟨v_j, v_k⟩,

so the vectors w_1, . . . , w_r are orthonormal.
Corollary 3.5.3. Let V and W be finite dimensional inner product spaces. For every nonzero linear transformation f : V → W there are orthonormal bases v_1, . . . , v_n ∈ V and w_1, . . . , w_m ∈ W and positive numbers σ_1, . . . , σ_r, for some r ≤ m and r ≤ n, such that f(v_j) = σ_j w_j for j = 1, . . . , r and f(v_j) = 0 for j = r + 1, . . . , n, if n > r.

Proof. Let v_1, . . . , v_n ∈ V, λ_1, . . . , λ_r, σ_1, . . . , σ_r > 0, and w_1, . . . , w_r ∈ W be as defined in the proof of Theorem 3.5.2. It suffices to extend {w_1, . . . , w_r} to an orthonormal basis of W.

The representation of a linear transformation between inner product spaces given in Theorem 3.5.2 is called the singular value decomposition of f. If f : V → V is a positive operator, then its singular value decomposition is the same as its spectral decomposition. If f : V → V is a normal operator and

f(x) = Σ_{j=1}^{r} λ_j ⟨x, v_j⟩ v_j

is its spectral decomposition with every λ_j ≠ 0, then

f(x) = Σ_{j=1}^{r} |λ_j| ⟨x, v_j⟩ (λ_j/|λ_j|) v_j

is its singular value decomposition. Note that, if λ_j is a nonzero real number, then λ_j/|λ_j| is 1 or −1.

Example 3.5.4. Let f : V → W, v_1, . . . , v_r ∈ V, σ_1, . . . , σ_r > 0, and w_1, . . . , w_r ∈ W be as defined in Theorem 3.5.2. Show that

f*(y) = Σ_{j=1}^{r} σ_j ⟨y, w_j⟩ v_j
for every y ∈ W.

Solution. For every x ∈ V and y ∈ W we have

⟨f(x), y⟩ = ⟨Σ_{j=1}^{r} σ_j ⟨x, v_j⟩ w_j, y⟩ = Σ_{j=1}^{r} σ_j ⟨x, v_j⟩ ⟨w_j, y⟩ = Σ_{j=1}^{r} σ_j ⟨x, ⟨y, w_j⟩ v_j⟩ = ⟨x, Σ_{j=1}^{r} σ_j ⟨y, w_j⟩ v_j⟩ = ⟨x, f*(y)⟩.
The following theorem can be interpreted as a form of uniqueness of the singular value decomposition.

Theorem 3.5.5. Let V and W be finite dimensional inner product spaces and let f : V → W be a nonzero linear transformation. If there are positive numbers σ_1, . . . , σ_r, orthonormal vectors v_1, . . . , v_r ∈ V, and orthonormal vectors w_1, . . . , w_r ∈ W such that

f(x) = Σ_{j=1}^{r} σ_j ⟨x, v_j⟩ w_j

for every x ∈ V, then

f*f(x) = Σ_{j=1}^{r} σ_j² ⟨x, v_j⟩ v_j

for every x ∈ V.

Proof. From the result in Example 3.5.4 we get

f*f(x) = Σ_{j=1}^{r} σ_j ⟨f(x), w_j⟩ v_j = Σ_{j=1}^{r} σ_j ⟨Σ_{k=1}^{r} σ_k ⟨x, v_k⟩ w_k, w_j⟩ v_j = Σ_{j=1}^{r} σ_j² ⟨x, v_j⟩ v_j

for every x ∈ V.
Example 3.5.6. Let f : V → W, v_1, . . . , v_r ∈ V, σ_1, . . . , σ_r > 0, and w_1, . . . , w_r ∈ W be as defined in Theorem 3.5.2. Show that the linear transformation f⁺ : W → V defined as

f⁺(y) = Σ_{j=1}^{r} (1/σ_j) ⟨y, w_j⟩ v_j

is the unique linear transformation g from W to V such that the following two conditions are satisfied:

(a) gf = proj_{(ker f)⊥};
(b) g(y) = 0 for every vector y ∈ (ran f)⊥.

Solution. Since, for every k ∈ {1, . . . , r},

f⁺f(v_k) = f⁺(Σ_{j=1}^{r} σ_j ⟨v_k, v_j⟩ w_j) = f⁺(σ_k ⟨v_k, v_k⟩ w_k) = σ_k f⁺(w_k) = σ_k Σ_{j=1}^{r} (1/σ_j) ⟨w_k, w_j⟩ v_j = σ_k (1/σ_k) ⟨w_k, w_k⟩ v_k = v_k = proj_{(ker f)⊥}(v_k),

we have f⁺f = proj_{(ker f)⊥}, because (ker f)⊥ = Span{v_1, . . . , v_r}. If y ∈ (ran f)⊥, then ⟨y, w_j⟩ = 0 for every j ∈ {1, . . . , r} and thus

f⁺(y) = Σ_{j=1}^{r} (1/σ_j) ⟨y, w_j⟩ v_j = 0.

Now assume that a linear transformation g : W → V satisfies (a) and (b). Note that ran f = Span{w_1, . . . , w_r} and W = ran f ⊕ (ran f)⊥. If y ∈ W, then y = y_1 + y_2, where y_1 ∈ ran f and y_2 ∈ (ran f)⊥. Then y_1 = f(x) for some x ∈ V, and consequently

g(y) = g(y_1 + y_2) = g(y_1) + g(y_2) = gf(x) + 0 = proj_{(ker f)⊥}(x) = Σ_{j=1}^{r} ⟨x, v_j⟩ v_j = Σ_{j=1}^{r} ⟨x, (1/σ_j²) f*f(v_j)⟩ v_j = Σ_{j=1}^{r} (1/σ_j) ⟨f(x), (1/σ_j) f(v_j)⟩ v_j = Σ_{j=1}^{r} (1/σ_j) ⟨y_1, w_j⟩ v_j = Σ_{j=1}^{r} (1/σ_j) ⟨y, w_j⟩ v_j.
Therefore, if g : W → V is a linear transformation that satisfies (a) and (b), then g(y) = Σ_{j=1}^{r} (1/σ_j) ⟨y, w_j⟩ v_j.
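This f⁺ coincides with the Moore-Penrose pseudoinverse. A numerical sketch (not part of the text; it assumes NumPy and uses an arbitrary test matrix) of the same construction: f⁺ assembled from the singular value decomposition agrees with np.linalg.pinv.

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # an arbitrary 3x2 test matrix

U, s, Vh = np.linalg.svd(B, full_matrices=False)
r = int(np.sum(s > 1e-12))          # number of nonzero singular values
B_plus = Vh[:r].conj().T @ np.diag(1.0/s[:r]) @ U[:, :r].conj().T

print(np.allclose(B_plus, np.linalg.pinv(B)))   # True
print(np.allclose(B_plus @ B, np.eye(2)))       # True: B has full column rank
```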
Example 3.5.7. Consider the operator f ∈ L(C², C²) defined as f(x) = Ax, where

A = [[5i, 3 + 4i], [−3 + 4i, 5i]].

Find the singular value decomposition of f.

Solution. We have

A* = [[−5i, −3 − 4i], [3 − 4i, −5i]]

and

A*A = [[50, 40 − 30i], [40 + 30i, 50]].

The eigenvalues of the matrix A*A are the roots of the equation (50 − λ)² − (40 − 30i)(40 + 30i) = λ² − 100λ = 0, that is, λ = 100 and λ = 0, and the orthonormal vectors

(1/(5√2))(4 − 3i, 5)  and  (1/(5√2))(4 − 3i, −5)

are corresponding eigenvectors. Since

f(4 − 3i, 5) = (10(3 + 4i), 50i),  ‖(10(3 + 4i), 50i)‖ = 50√2,

and f(4 − 3i, −5) = (0, 0), the singular value decomposition of f is

f(x_1, x_2) = 10 ⟨(x_1, x_2), (1/(5√2))(4 − 3i, 5)⟩ (1/(50√2))(10(3 + 4i), 50i).

For practical calculations we can use a simplified form:

f(x_1, x_2) = (1/5) ⟨(x_1, x_2), (4 − 3i, 5)⟩ (3 + 4i, 5i).
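A numerical check (a sketch, not from the text; it assumes NumPy): the matrix A has singular values 10 and 0, and the single rank-one term above already reconstructs A.

```python
import numpy as np

A = np.array([[5j, 3 + 4j],
              [-3 + 4j, 5j]])

U, s, Vh = np.linalg.svd(A)
print(np.round(s, 10))                     # singular values: [10, 0]

# rank-one reconstruction from the only nonzero singular value
A1 = s[0] * np.outer(U[:, 0], Vh[0])
print(np.allclose(A1, A))                  # True
```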
Example 3.5.8. Consider the operator f ∈ L(C², C²) defined as f(x) = Ax, where

A = [[3 + 2i, 2 − 4i], [4 + 2i, 6 + i]].

Find the singular value decomposition of f.

Solution. First we calculate

A*A = [[33, 24 − 24i], [24 + 24i, 57]].

The eigenvalues of the matrix A*A are 9 and 81, and (1 − i, −1) and (1 − i, 2) are corresponding eigenvectors. Consequently the singular value decomposition of f is

f(x) = 3⟨x, v_1⟩w_1 + 9⟨x, v_2⟩w_2,

where

v_1 = (1/√3)(1 − i, −1), w_1 = (1/√3)(1 + i, −i), v_2 = (1/√6)(1 − i, 2), w_2 = (1/√6)(1 − i, 2).
Example 3.5.9. Consider the operator f ∈ L(C², C⁴) defined as f(x) = Ax, where

A = [[3, 1], [1, 2], [1, −3], [−1, −2]].

Find the singular value decomposition of f.

Solution. First we find that

A*A = [[12, 4], [4, 18]].

The eigenvalues of the matrix [[12, 4], [4, 18]] are 10 and 20, and (2, −1) and (1, 2) are
208
Chapter 3: Inner Product Spaces
corresponding eigenvectors. We have

A(1, 2) = (5, 5, −5, −5)  and  A(2, −1) = (5, 0, 5, 0).

We note that ⟨(5, 5, −5, −5), (5, 0, 5, 0)⟩ = 0. The singular value decomposition of f is

f(x) = √20 ⟨x, v_1⟩ w_1 + √10 ⟨x, v_2⟩ w_2,

where

v_1 = (1/√5)(1, 2), w_1 = (1/2, 1/2, −1/2, −1/2), v_2 = (1/√5)(2, −1), w_2 = (1/√2, 0, 1/√2, 0).
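The construction can be reproduced numerically; the sketch below (not from the text, assuming NumPy) follows the procedure of the proof of Theorem 3.5.2: eigenvectors of A*A give the v_j, and w_j = f(v_j)/σ_j.

```python
import numpy as np

A = np.array([[3, 1],
              [1, 2],
              [1, -3],
              [-1, -2]], dtype=float)

vals, vecs = np.linalg.eigh(A.T @ A)       # A*A = [[12, 4], [4, 18]]
sigmas = np.sqrt(vals)                     # singular values sqrt(10), sqrt(20)
W = (A @ vecs) / sigmas                    # column j is w_j = A v_j / sigma_j

# reconstruct A as the sum of sigma_j * w_j * v_j^T
A_rec = sum(sigmas[j] * np.outer(W[:, j], vecs[:, j]) for j in range(2))
print(np.allclose(A_rec, A))               # True
```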
Example 3.5.10. Let P_2([−1, 1]) and P_1([−1, 1]) denote real valued polynomials on the interval [−1, 1] of degree at most 2 and 1, respectively, with the inner product defined as

⟨p(t), q(t)⟩ = ∫_{−1}^{1} p(t) q(t) dt.

Find the singular value decomposition of the differential operator D = d/dt.

Solution. First we note that {1/√2, √(3/2) t, √(5/2)((3/2)t² − 1/2)} is an orthonormal basis in P_2([−1, 1]) and {1/√2, √(3/2) t} is an orthonormal basis in P_1([−1, 1]). The matrix of the differential operator with respect to these bases is

[[0, √3, 0], [0, 0, √15]]
and we have

[[0, 0], [√3, 0], [0, √15]] [[0, √3, 0], [0, 0, √15]] = [[0, 0, 0], [0, 3, 0], [0, 0, 15]],

so the nonzero eigenvalues of the operator D*D are 3 and 15. The polynomials √(3/2) t and √(5/2)((3/2)t² − 1/2) are orthonormal eigenvectors of D*D corresponding to the eigenvalues 3 and 15. Since

D(√(3/2) t)/√3 = √(3/2)/√3 = 1/√2

and

D(√(5/2)((3/2)t² − 1/2))/√15 = 3√(5/2) t/√15 = √(3/2) t,

the singular value decomposition of the differential operator D : P_2([−1, 1]) → P_1([−1, 1]) is

D(p(t)) = √3 ⟨p(t), √(3/2) t⟩ (1/√2) + √15 ⟨p(t), √(5/2)((3/2)t² − 1/2)⟩ √(3/2) t
        = (3/2) ∫_{−1}^{1} t p(t) dt + ((15/4) ∫_{−1}^{1} (3t² − 1) p(t) dt) t.

While this result has limited practical applications, it is interesting that on the space P_2([−1, 1]) differentiation can be expressed in terms of definite integrals.
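For a numerical sanity check (a sketch, not from the text; it assumes NumPy), one can take the matrix of D found above and confirm that D*D has eigenvalues 0, 3, 15, so the singular values are √3 and √15:

```python
import numpy as np

# matrix of D with respect to the two orthonormal bases above
M = np.array([[0.0, np.sqrt(3.0), 0.0],
              [0.0, 0.0, np.sqrt(15.0)]])

print(np.linalg.eigvalsh(M.T @ M))          # [0, 3, 15]
print(np.linalg.svd(M, compute_uv=False))   # [sqrt(15), sqrt(3)]
```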
The polar decomposition

We close this chapter with an introduction of another decomposition of operators on a finite dimensional inner product space, called the polar decomposition. To define the polar decomposition of an operator we need the notion of a partial isometry.
Definition 3.5.11. Let V be an inner product space and let U be a subspace of V. A linear transformation f : V → V is called a partial isometry with initial space U if the following two conditions are satisfied: (a) kf (x)k = kxk for every x ∈ U; (b) f (x) = 0 for every x ∈ U ⊥ .
Example 3.5.12. Let V be an inner product space and let g : V → V be the linear transformation defined as

g(x) = Σ_{j=1}^{r} ⟨x, a_j⟩ b_j,

where {a_1, . . . , a_r} and {b_1, . . . , b_r} are orthonormal sets in V. Show that g is a partial isometry with initial space Span{a_1, . . . , a_r}.

Solution. If x ∈ Span{a_1, . . . , a_r}, then there are numbers x_1, . . . , x_r ∈ K such that x = x_1 a_1 + · · · + x_r a_r. Then

‖x‖² = ‖x_1 a_1 + · · · + x_r a_r‖² = |x_1|² + · · · + |x_r|²

and, because ⟨x, a_j⟩ = x_j for all j ∈ {1, . . . , r}, we also have

‖g(x)‖² = ‖x_1 b_1 + · · · + x_r b_r‖² = |x_1|² + · · · + |x_r|².

It is clear that g(x) = 0 for every x ∈ Span{a_1, . . . , a_r}⊥.

For any z ∈ C we have z z̄ = |z|² and thus |z| = √(z z̄). We use this analogy to define the operator |f| for an arbitrary linear operator on a finite dimensional inner product space V. If f : V → V is a linear operator, then f*f is a positive linear operator on V and thus it has a unique positive square root √(f*f). We will use the notation

|f| = √(f*f).

In other words, for any linear operator f : V → V on a finite dimensional inner product space V there is a unique positive linear operator |f| : V → V such that |f|² = f*f.
Theorem 3.5.13. Let V be a finite dimensional inner product space and let f : V → V be a linear transformation. There is a partial isometry g : V → V with initial space ran |f| such that f = g|f|. This representation is unique in the following sense: If f = hp where p : V → V is a positive operator and h is a partial isometry on V with initial space ran p, then p = |f| and g = h.

Proof. If u = |f|(v) for some v ∈ V, then we would like to define g(u) = f(v), but this does not define g unless we can show that |f|(v_1) = |f|(v_2) implies f(v_1) = f(v_2). Since

‖f(v)‖² = ⟨f(v), f(v)⟩ = ⟨f*f(v), v⟩ = ⟨|f|²(v), v⟩ = ⟨|f|(v), |f|(v)⟩ = ‖|f|(v)‖²,

we have ‖f(v)‖ = ‖|f|(v)‖ and thus

‖f(v_1) − f(v_2)‖ = ‖f(v_1 − v_2)‖ = ‖|f|(v_1 − v_2)‖ = ‖|f|(v_1) − |f|(v_2)‖.

Consequently |f|(v_1) = |f|(v_2) implies f(v_1) = f(v_2) and thus g is well-defined. Moreover,

‖g(|f|(v))‖ = ‖f(v)‖ = ‖|f|(v)‖,

so g is an isometry on ran |f|. Clearly, g(x) = 0 for every x ∈ (ran |f|)⊥. Therefore, g is a partial isometry with initial space ran |f|.

Now assume that f = hp where p : V → V is a positive operator and h is a partial isometry on V with initial space ran p. Then for every v ∈ V we have

⟨f*f(v), v⟩ = ⟨f(v), f(v)⟩ = ⟨h(p(v)), h(p(v))⟩ = ⟨p(v), p(v)⟩ = ⟨p²(v), v⟩,

which gives us f*f = p², because f*f and p² are self-adjoint, and thus p = |f|. Clearly, g = h.

The representation of a linear transformation f : V → V in the form presented in Theorem 3.5.13 is called the polar decomposition of f. It is somewhat similar to the polar form of a complex number: z = |z|(cos θ + i sin θ).

Example 3.5.14. Consider the operator f ∈ L(C², C²) defined as f(x) = Ax, where

A = [[3 + 2i, 2 − 4i], [4 + 2i, 6 + i]].

Determine the polar decomposition of f.
Solution. According to Example 3.4.62 we can write

|f| = √(f*f) = 3 proj_(1−i, −1) + 9 proj_(1−i, 2).

We have

f(1 − i, −1) = (3 + 3i, −3i),  f(1 − i, 2) = (9 − 9i, 18),

and

|f|(1 − i, −1) = (3 − 3i, −3),  |f|(1 − i, 2) = (9 − 9i, 18).

Now we can define an isometry g : V → V such that

g(3 − 3i, −3) = (3 + 3i, −3i)  and  g(9 − 9i, 18) = (9 − 9i, 18),

that is,

g(x) = ⟨x, v_1⟩ w_1 + ⟨x, v_2⟩ w_2,

where x ∈ C² and

v_1 = (1/√3)(1 − i, −1), w_1 = (1/√3)(1 + i, −i), v_2 = (1/√6)(1 − i, 2), w_2 = (1/√6)(1 − i, 2).

Then f = g|f|, which is the polar decomposition of f.
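Numerically (a sketch not in the text, assuming NumPy), |f| can be obtained from the Hermitian eigendecomposition of A*A, and since A is invertible here the factor g = A|f|⁻¹ is unitary:

```python
import numpy as np

A = np.array([[3 + 2j, 2 - 4j],
              [4 + 2j, 6 + 1j]])

vals, vecs = np.linalg.eigh(A.conj().T @ A)
absF = vecs @ np.diag(np.sqrt(vals)) @ vecs.conj().T    # |f| = [[5, 2-2i], [2+2i, 7]]
g = A @ np.linalg.inv(absF)

print(np.allclose(g.conj().T @ g, np.eye(2)))   # True: g is unitary
print(np.allclose(g @ absF, A))                 # True: A = g |f|
```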
Example 3.5.15. Let V be an inner product space and let f : V → V be a linear transformation such that for every x ∈ V we have

f*f(x) = Σ_{j=1}^{r} λ_j ⟨x, a_j⟩ a_j,

where {a_1, . . . , a_r} is an orthonormal set in V and λ_1, . . . , λ_r are positive numbers. Find the polar decomposition of f.

Solution. The linear operator |f| is defined by

|f|(x) = √(f*f)(x) = Σ_{j=1}^{r} √λ_j ⟨x, a_j⟩ a_j

for every x ∈ V. In the proof of Theorem 3.5.13 we show that

‖f(a_j)‖ = ‖|f|(a_j)‖ = √λ_j > 0

for j ∈ {1, . . . , r}, so we can define the operator g : V → V by

g(y) = Σ_{j=1}^{r} ⟨y, a_j⟩ b_j,

where b_j = (1/‖f(a_j)‖) f(a_j) = (1/√λ_j) f(a_j). The operator g is a partial isometry and it is easy to verify that f = g|f|, which is the polar decomposition of f.
3.6 Exercises

3.6.1 Definitions and examples
Exercise 3.1. Let s : V × V → C be a positive sesquilinear form. Show that |s(x, y)|² ≤ s(x, x) s(y, y) for every x, y ∈ V.
Exercise 3.2. Let V be an inner product space and let f : V → V be a positive operator (see Definition 3.4.53). Show that |hf (x), yi|2 ≤ hf (x), xihf (y), yi for every x, y ∈ V.
Exercise 3.3. Let V be an inner product space and let f : V → V be a positive operator (see Definition 3.4.53). Show that kf (x)k3 ≤ hf (x), xikf 2 (x)k for every x ∈ V.
Exercise 3.4. Let C²([0, 1]) be the space of functions with continuous second derivatives. Show that ⟨f, g⟩ = f(0)g(0) + f′(0)g′(0) + ∫_0^1 f″(t) g″(t) dt is an inner product in C²([0, 1]).
Exercise 3.5. Let v be a vector in an inner product space V. Show that the function sv : V × V → C defined by sv (a, b) = hv, bia is sesquilinear.
Exercise 3.6. Let V be a finite dimensional inner product space. Show that for every linear operator f : V → V there is a nonnegative constant α such that kf (v)k ≤ αkvk for every v ∈ V.
Exercise 3.7. Let V be an inner product space and let s : V × V → C be a sesquilinear form. Show that, if s(x, y) = s(y, x) for every x, y ∈ V, then s = 0.
Exercise 3.8. Let V and W be inner product spaces and let v ∈ V and w ∈ W. Show that the function w⊗v : V → W defined by w⊗v(x) = hx, viw is a linear transformation and we have f ◦ (w ⊗ v) = f (w) ⊗ v for every linear operator f : W → W. Exercise 3.9. Let v ∈ Cn and w ∈ Cm . Find the matrix of the linear transformation w ⊗ v as defined in Exercise 3.8.
Exercise 3.10. Let U, V, and W be inner product spaces and let u ∈ U, v_1, v_2 ∈ V, and w ∈ W. Show, with the notations from Exercise 3.8, that we have (w ⊗ v_2)(v_1 ⊗ u) = ⟨v_1, v_2⟩(w ⊗ u).

Exercise 3.11. Show that (1/n)|a_1 + · · · + a_n| ≤ √((1/n)(a_1² + · · · + a_n²)) for any a_1, . . . , a_n ∈ R.
3.6.2 Orthogonal projections
Exercise 3.12. Let {v_1, . . . , v_n} be an orthonormal basis of the inner product space V. Show, with the notation from Exercise 3.8, that Id_V = Σ_{j=1}^{n} v_j ⊗ v_j.
Exercise 3.13. Let V and W be inner product spaces and let {v_1, . . . , v_m} and {w_1, . . . , w_n} be orthonormal bases in V and W, respectively. Show that the set {w_k ⊗ v_j : j ∈ {1, . . . , m}, k ∈ {1, . . . , n}} is a basis of L(W, V) (w_k ⊗ v_j is defined in Exercise 3.8).

Exercise 3.14. Determine the projection matrix on Span{(i, −i)} in C² and use it to determine the projection of the vector (1, i) on Span{(i, −i)}.
Exercise 3.15. Let V be an inner product space and let f : V → V be an orthogonal projection. Show that ‖f(v)‖ = ‖v‖ implies f(v) = v for every v ∈ V.

Exercise 3.16. Let V be an inner product space and let f : V → V be a linear operator. Show that f is an orthogonal projection if and only if ker(Id −f) = ran f = (ker f)⊥.

Exercise 3.17. Let U and V be subspaces of a finite-dimensional inner product space W. Show that proj_U proj_V = 0 if and only if ⟨u, v⟩ = 0 for every u ∈ U and v ∈ V.

Exercise 3.18. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. Show, with the notation from Exercise 3.8, that the operator f is a nonzero projection if and only if there is an orthonormal set {v_1, . . . , v_k} ⊆ V such that f = Σ_{j=1}^{k} v_j ⊗ v_j. Determine ran f.
Exercise 3.19. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. Show that there is an orthonormal basis {v1 , . . . , vn } of V and numbers xjk for 1 ≤ j ≤ k ≤ n such that f (v1 ) = x11 v1 , f (v2 ) = x12 v1 + x22 v2 , f (v3 ) = x13 v1 + x23 v2 + x33 v3 , .. . f (vn ) = x1n v1 + · · · + xn−1,n vn−1 + xnn vn . Exercise 3.20. Consider the vector space of continuous functions defined on R1 the interval [−1, 1] with the inner product hf, gi = −1 f (t)g(t) dt. Find the best approximation of the function t3 by functions from Span{1, t}. Exercise 3.21. Consider the vector space of continuous R π functions defined on the 1 interval [−π, π] with the inner product hf, gi = 2π −π f (t)g(t) dt. Determine nit the projection of the function t on Span{e } for all integers n ≥ 1.
Exercise 3.22. Let V be an inner product space and let p_1, . . . , p_n be orthogonal projections such that Id = p_1 + · · · + p_n. Show that p_j p_k = 0 whenever j ≠ k.
3.6.3 The adjoint of a linear transformation
Exercise 3.23. Let V be a finite-dimensional inner product space and let f, g : V → V be orthogonal projections. Show that the following conditions are equivalent. (a) ran f ⊆ ran g (b) gf = f (c) f g = f Exercise 3.24. Let V be an inner product space and let f, g : V → V be orthogonal projections. If ran f ⊆ ran g, show that g − f is an orthogonal projection and ran(g − f ) = (ran f )⊥ ∩ ran g. Exercise 3.25. Let V be an inner product space and let f : V → V be a linear operator. If f is invertible and self-adjoint, show that f −1 is self-adjoint. Exercise 3.26. Let V be a finite-dimensional inner product space and let f : V → V be a self-adjoint operator. Show that k(λ − f )xk ≥ | Im λ|kxk, where λ ∈ K and x ∈ V. Exercise 3.27. Let V and W be finite-dimensional inner product spaces and let f : V → W be a linear transformation. If f is injective, show that f ∗ f is an isomorphism. Exercise 3.28. Let V and W be inner product spaces and let f : V → W be a linear transformation. If f is surjective, show that f f ∗ is an isomorphism. Exercise 3.29. Let V and W be inner product spaces and let v ∈ V and w ∈ W. Show, with the notation from Exercise 3.8, that we have (w ⊗ v)∗ = v ⊗ w. Exercise 3.30. Let V and W be inner product spaces and let v ∈ V and w ∈ W. If f : W → V is a linear transformation, show that (w ⊗ v) ◦ f = w ⊗ f ∗ (v), where ⊗ is defined as in Exercise 3.8. Exercise 3.31. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. Show that f = 0 if and only if f ∗ f = 0. Exercise 3.32. Let V be an inner product space and let f, g : V → V be linear operators. If f is self-adjoint, show that g ∗ f g is self-adjoint. Exercise 3.33. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. Show that Id +f ∗ f is invertible. Exercise 3.34. Let V be a finite dimensional inner product spaces and let f, g : V → V be self-adjoint operators. Show that f g is self-adjoint if and only if f g = gf .
Exercise 3.35. Let V be an inner product space and let f, g : V → V be orthogonal projections. Show that f g is an orthogonal projection if and only if f g = gf . Exercise 3.36. Let V be an inner product space and let f, g : V → V be orthogonal projections. If f g is an orthogonal projection, show that ran f g = ran f ∩ ran g. Exercise 3.37. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. If f ∗ f is an orthogonal projection, show that f f ∗ is an orthogonal projection. Exercise 3.38. Let V be an n-dimensional complex inner product space and let {a1 , . . . , an } be an orthonormal basis of V. For a linear operator f : V → V we define the trace of f , denoted by tr f , as tr f =
Σ_{j=1}^{n} ⟨f(a_j), a_j⟩.

Show that tr f does not depend on the choice of the orthonormal basis {a_1, . . . , a_n}, that is, for any two orthonormal bases {a_1, . . . , a_n} and {b_1, . . . , b_n} in V we have Σ_{j=1}^{n} ⟨f(a_j), a_j⟩ = Σ_{j=1}^{n} ⟨f(b_j), b_j⟩.
Exercise 3.39. Let A = (aij ) be a matrix from Mn×n (C) and let f : Cn → Cn be the linear operator defined by f (v) = Av. Show that tr f = a11 + · · · + ann . (See Exercise 3.38 for the definition of tr f .) Exercise 3.40. Let V be an n-dimensional inner product space and let x, y ∈ V. Show that tr(x ⊗ y) = hx, yi. (See Exercise 3.38 for the definition of tr f and Exercise 3.8 for the definition of x ⊗ y.) Exercise 3.41. Let V be an inner product space and let f, g : V → V be linear operators. Show that tr(f g) = tr(gf ). (See Exercise 3.38 for the definition of tr f .) Exercise 3.42. Let V be an inner product space. Show that the function s : L(V) × L(V) → K defined by s(f, g) = tr(g ∗ f ) is an inner product. (See Exercise 3.38 for the definition of tr f .) Exercise 3.43. Let V be an inner product space and let f, g : V → V be self-adjoint operators. If f 2 g 2 = 0, show that f g = 0.
3.6.4 Spectral theorems
Exercise 3.44. Let V be an inner product space and let f : V → V be an invertible linear operator. If λ is an eigenvalue of f , show that λ1 is an eigenvalue of f −1 .
Exercise 3.45. Let V be a finite-dimensional inner product space and let f, g : V → V be linear operators. Show that, if Id −f g is invertible, then Id −gf is invertible and (Id −gf )−1 = Id +g(Id −f g)−1 f . Then show that, if λ 6= 0 is an eigenvalue of f g, then λ is an eigenvalue of gf . Exercise 3.46. Let V be an inner product space and let f : V → V be a linear 1 operator. Let g = 12 (f + f ∗ ) and h = 2i (f − f ∗ ). Show that g and h are self-adjoint and f = g + ih. Show also that f is normal if and only if gh = hg. Exercise 3.47. Let V be an inner product space and let f : V → V be a normal operator. If {a1 , . . . , an } is P an orthonormal Pn basis of V and λ1 , . . . , λn are the n eigenvalues of f , show that j=1 |λj |2 = j=1 kf (aj )k2 . Exercise 3.48. Let V be an inner product space and let f : V → V be a linear operator. Using Exercise 3.19 show that, if the operator f is normal, then there is an orthonormal basis {v1 , . . . , vn } of V consisting of eigenvectors of V. Exercise 3.49. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. Show that if f f ∗ f = f ∗ f f , then (f f ∗ − f ∗ f )2 = 0 and that f is normal. Exercise 3.50. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. Show that f is normal if and only if there is a polynomial p such that p(f ) = f ∗ . Exercise 3.51. Let V be a finite-dimensional inner product space. Show that the function S : V × V → V × V defined by S(x, y) = (−y, x) is a unitary operator and that S 2 = IdV×V . Exercise 3.52. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. If Γf = {(x, f (x)) ∈ V × V|x ∈ V} and Γf ∗ = {(x, f ∗ (x)) ∈ V × V|x ∈ V}, show that (Γ(f ))⊥ = S(Γ(f ∗ )), where S is defined in Exercise 3.51. Exercise 3.53. Let V be an n-dimensional inner product space and let f : V → V be a linear operator. Show that f normal if and only if there is a unitary operator g such that f g = f ∗ . Exercise 3.54. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. If f is positive, show that g ∗ f g is positive. Exercise 3.55. Let V be an n-dimensional inner product space and let f : V → V be a self-adjoint operator. Show that there are positive operators g and h such that f = g − h. Exercise 3.56. Consider the operator f ∈ L(C2 , C2 ) defined as f (x) = Ax, 13 5i where A = . Show that f is positive and determine its spectral de−5i 13 composition.
Exercise 3.57. Let V and W be inner product spaces and let f : V → W be a linear operator. Show that f is an isometry if and only if there P are orthonormal bases {v1 , . . . , vn } and {w1 , . . . , wn } in V such that f = nj=1 wj ⊗ vj , where ⊗ is defined as in Exercise 3.8.
3.6.5 Singular value decomposition
Exercise 3.58. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. Show that, if f*f is the projection on a subspace U, then f is a partial isometry with initial space U.

Exercise 3.59. Consider the operator f ∈ L(C², C²) defined as f(x) = Ax, where A = [[1, i], [i, 1]]. Determine the singular value decomposition of f.
Exercise 3.60. Let V and W be finite dimensional inner product spaces. Show that for every nonzero linear transformation f : V → W there are positive numbers σ1 , . . . , σr , orthonormal vectors Pr v1 , . . . , vr ∈ V, and orthonormal vectors w1 , . . . , wr ∈ W such that f = j=1 σj wj ⊗ vj , where ⊗ is defined as in Exercise 3.8. Exercise 3.61. Let V and W be finite-dimensional inner product spaces and P let f : V → W be a linear operator. If f = rj=1 αj wj ⊗ vj , where {v1 , . . . , vr } are orthonormal vectors in V, {w1 , . . . , wr } are orthonormal vectors in W, and α1 , . . . , αr are positive numbers P such that α1 ≥ · · · ≥ αr (and ⊗ is defined as in Exercise 3.8), show that f ∗ f = rj=1 |αj |2 vj ⊗ vj and consequently v1 , . . . , vr are eigenvectors of f ∗ f with corresponding eigenvalues |α1 |2 , . . . , |αr |2 . Exercise 3.62. Let V and W be finite dimensional inner Pr product spaces and let f : V → W be a linear transformation. Let f = j=1 σj wj ⊗ vj be the singular value decomposition of f from Exercise 3.60, where {v1 , . . . , vr } are orthonormal vectors in V, {w1 , . . . , wr } are orthonormal vectors in W, and σ1 , . . . , σr are positive numbers such that σ1 ≥ · · · ≥ σr . Show that ran f = Span{w1 , . . . , wr }. Exercise 3.63. Let V and W be finite dimensional P inner product spaces and let f : V → W be a linear transformation. Let f = rj=1 σj wj ⊗ vj be the singular value decomposition of f from Exercise 3.60, where {v1 , . . . , vr } are orthonormal vectors in V, {w1 , . . . , wr } are orthonormal vectors in W, and σ1 , . . . , σr are positive numbers such that σ1 ≥ · · · ≥ σr . Let {v1 , . . . , vr , . . . , vn } be an orthonormal basis of V. Show that ker f = Span{vr+1 , . . . , vn }. Exercise 3.64. Let V and W be finite-dimensional inner Pr product spaces and let f : V → W be a linear transformation. Let f = j=1 σj wj ⊗ vj be the singular value decomposition of f from Exercise 3.60, where {v1 , . . . , vr } are orthonormal vectors in V, {w1 , . . . , wr } are orthonormal vectors in W, and σ1 , . . . , σr are positive numbers such that σ1 ≥ · · · ≥ σr . Show that ran f ∗ = Span{v1 , . . . , vr }.
Exercise 3.65. Let V and W be finite-dimensional Pr inner product spaces and let f : V → W be a linear transformation. Let f = j=1 σj wj ⊗ vj be the singular value decomposition of f from Exercise 3.60, where {v1 , . . . , vr } are orthonormal vectors in V, {w1 , . . . , wr } are orthonormal vectors in W, and σ1 , . . . , σr are positive numbers such that σ1 ≥ · · · ≥ σr . Let {w1 , . . . , wr , . . . , wm } be an orthonormal basis of W. Show that ker f ∗ = Span{wr+1 , . . . , wm }. Exercise 3.66. Let V and W be finite-dimensional inner P product spaces and let f : V → W be a linear transformation. Let f = rj=1 σj wj ⊗ vj be the singular value decomposition of f from Exercise 3.60, where {v1 , . . . , vr } are orthonormal vectors in V, {w1 , . . . , wr } are orthonormal vectors in W, and σ1 , . . . , σr are positive numbers such that σ1 ≥ · · · ≥ σr . Show that, if a linear transformation f + : W → V is such that f + f = projran f ∗ and f + = 0 on Pr (ran f ∗ )⊥ , then f + = j=1 σ1j vj ⊗ wj .
Exercise 3.67. Let V and W be finite-dimensional inner Pr product spaces and let f : V → W be a linear transformation. Let f = j=1 σj wj ⊗ vj be the singular value decomposition of f , from Exercise 3.60, where {v1 , . . . , vr } are orthonormal vectors in V, {w1 , . . . , wr } are orthonormal vectorsPin W, and r σ1 , . . . , σr are positive numbers such that σ1 ≥ · · · ≥ σr . Let f + = j=1 σ1j vj ⊗ wj as in exercise 3.66. Show that f f + is the projection on ran f and IdW −f f + is the projection on ker f ∗ .
Exercise 3.68. Let V and W be finite-dimensional Pr inner product spaces and let f : V → W be a linear transformation. Let f = j=1 σj wj ⊗ vj be the singular value decomposition of f from Exercise 3.60, where {v1 , . . . , vr } are orthonormal vectors in V, {w1 , . . . , wr } are orthonormal vectors inP W, and σ1 , . . . , σr are r positive numbers such that σ1 ≥ · · · ≥ σr . If f + = j=1 σ1j vj ⊗ wj (as in Exercise 3.66), show that f + f f + = f + . Exercise 3.69. Let V and W be finite-dimensional inner Pr product spaces and let f : V → W be a linear transformation. Let f = j=1 σj wj ⊗ vj be the singular value decomposition of f from Exercise 3.60, where {v1 , . . . , vr } are orthonormal vectors in V, {w1 , . . . , wr } are orthonormal vectors in W, and σ1 , . . . , σr are positive numbers such that σ1 ≥ · · · ≥ σr . If f is injective, show that f ∗ f is invertible and (f ∗ f )−1 f ∗ = f + , where f + is defined in Exercise 3.66. Exercise 3.70. Let V and W be finite-dimensional inner product spaces and let f : V → W be a linear transformation. Show that (f ∗ )+ = (f + )∗ , where f + is defined in Exercise 3.66. Exercise 3.71. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. Obtain, using Exercise 3.12 and without using the proof of Theorem 3.5.2, the following form of singular value decomposition: If dim V = n, then there are orthonormal bases {v1 , . . . ,P vn } and {u1 , . . . , un } n of V and nonnegative numbers σ1 , . . . , σn , such that f = j=1 σj uj ⊗ vj .
Exercise 3.72. Let V be a finite-dimensional inner product space and let f : V → V be a linear operator. Using Exercise 3.71 show that there is an isometry g such that f = g|f |. Note that this is a form of the polar decomposition of f . Exercise 3.73. Let V be a finite dimensional inner product space and let f : V → V be a linear operator. √Using Exercise 3.71 show that there is an isometry g : V → V such that f = f f ∗ g. Exercise 3.74. Let V and W be finite dimensional P inner product spaces and let f : V → W be a linear transformation. Let f = rj=1 σj wj ⊗ vj be the singular value decomposition of f from Exercise 3.60, where {v1 , . . . , vr } are orthonormal vectors in V, {w1 , . . . , wr } are orthonormal vectors in W, and σ1 , . . . , σr are positive numbers such that σ1 ≥ P · · · ≥ σr . Let {v1 , . . . , vr , . . . , vn } be an r orthonormal basis of V. If f + = j=1 σ1j vj ⊗ wj , where ⊗ is defined as in Exercise 3.8, show that every least square solution x of the equation f (x) = b is of the form x = f + (b) + xr+1 vr+1 + · · · + xn vn where xr+1 , . . . , xn ∈ K are arbitrary. Moreover, there is a unique least square solution of minimal length, which is x = f + (b).
Chapter 4
Reduction of Endomorphisms

Introduction

The main topic of this chapter is the following question: given an endomorphism f on a finite-dimensional vector space V, can we find a basis of V such that the matrix of f is simple and easy to work with, that is, diagonal or block-diagonal? This will help us better understand the structure of endomorphisms and provide important tools for applications of linear algebra, for example, to solve differential equations. At the beginning of the chapter we discuss alternating multilinear forms and determinants of endomorphisms, which will give us a practical way of determining the diagonal and block-diagonal matrices of an endomorphism. Our presentation of determinants is self-contained, that is, it does not use results on determinants from elementary courses. In the context of this chapter it is customary to use the name endomorphisms instead of linear operators.
4.1 Eigenvalues and diagonalization

4.1.1 Multilinear alternating forms and determinants
At the beginning of Chapter 3 we introduce bilinear forms. They are defined as functions f : V × V → K that are linear in each variable, that is, the function fx : V → K defined as fx (y) = f (x, y) is linear for every x ∈ V and the function fy : V → K defined as fy (x) = f (x, y) is linear for every y ∈ V.
A similar definition can be given for any function from V^n = V × · · · × V (n times) to K.
Definition 4.1.1. Let V be a vector space. A function E : V n → K is called an n-linear form or a multilinear form if for every j ∈ {1, . . . , n} and every x1 , . . . , xj−1 , xj+1 , . . . , xn ∈ V the function fx1 ,...,xj−1 ,xj+1 ,...,xn : V → K defined as fx1 ,...,xj−1 ,xj+1 ,...,xn (x) = E(x1 , . . . , xj−1 , x, xj+1 , . . . , xn ) is a linear form.
Example 4.1.2. The function E : Kn → K defined as E(x1 , . . . , xn ) = cx1 . . . xn is an n-linear form for any c ∈ K. This example can be generalized in the following way. Let V be a vector space and let fj : V → K be a linear function for j ∈ {1, . . . , n}. Then the function E : V n → K defined as E(x1 , . . . , xn ) = f1 (x1 ) . . . fn (xn ) is an n-linear form.
Definition 4.1.3. Let V be a vector space. An n-linear form E : V^n → K is called an alternating n-linear form (or alternating multilinear form) if E(x_1, . . . , x_n) = 0 whenever x_j = x_k for some j ≠ k.

The following property of alternating multilinear forms is often used in calculations. It is equivalent to the condition in the definition of alternating multilinear forms.
Theorem 4.1.4. Let V be a vector space and let E : V n → K be an alternating multilinear form. If 1 ≤ j < k ≤ n, then E(x1 , . . . , xj−1 , xk , xj+1 , . . . , xk−1 , xj , xk+1 . . . xn ) = −E(x1 , . . . , xj−1 , xj , xj+1 , . . . , xk−1 , xk , xk+1 . . . xn ) for all x1 , . . . , xn ∈ V. Proof. Since 0 = E(x1 , . . . , xj−1 , xj + xk , xj+1 , . . . , xk−1 , xj + xk , xk+1 , . . . , xn ) = E(x1 , . . . , xj−1 , xj , xj+1 , . . . , xk−1 , xj , xk+1 , . . . , xn ) + E(x1 , . . . , xj−1 , xj , xj+1 , . . . , xk−1 , xk , xk+1 , . . . , xn ) + E(x1 , . . . , xj−1 , xk , xj+1 , . . . , xk−1 , xj , xk+1 , . . . , xn ) + E(x1 , . . . , xj−1 , xk , xj+1 , . . . , xk−1 , xk , xk+1 . . . xn ) = E(x1 , . . . , xj−1 , xj , xj+1 , . . . , xk−1 , xk , xk+1 , . . . , xn ) + E(x1 , . . . , xj−1 , xk , xj+1 , . . . , xk−1 , xj , xk+1 , . . . , xn ), we have E(x1 , . . . , xj−1 ,xk , xj+1 , . . . , xk−1 , xj , xk+1 . . . xn ) = − E(x1 , . . . , xj−1 , xj , xj+1 , . . . , xk−1 , xk , xk+1 . . . xn ).
The following three examples indicate that there is a connection between alternating forms and determinants as defined in elementary courses. The full scope of that connection will become clear later in this chapter. In these examples the determinant of a matrix [[α, β], [γ, δ]] ∈ M_{2,2}(K) is defined as usual by det[[α, β], [γ, δ]] = αδ − βγ.
Example 4.1.5. Let V be a vector space and let E : V × V → K be an alternating bilinear form. Show that for every v_1, v_2 ∈ V and α, β, γ, δ ∈ K we have

E(αv_1 + βv_2, γv_1 + δv_2) = det[[α, β], [γ, δ]] E(v_1, v_2).    (4.1)
Proof. For every v1 , v2 ∈ V and α, β, γ, δ ∈ K we have E(αv1 + βv2 , γv1 + δv2 ) = E(αv1 , γv1 ) + E(αv1 , δv2 ) + E(βv2 , γv1 ) + E(βv2 , δv2 ) = αγE(v1 , v1 ) + αδE(v1 , v2 ) + βγE(v2 , v1 ) + βδE(v2 , v2 ) = αδE(v1 , v2 ) + βγE(v2 , v1 ) = αδE(v1 , v2 ) − βγE(v1 , v2 ) = (αδ − βγ)E(v1 , v2 ) α β = det E(v1 , v2 ). γ δ
Note that, if E satisfies (4.1) for every v1 , v2 ∈ V and α, β, γ, δ ∈ K, then E is alternating.
Example 4.1.6. Let V be a vector space and let E : V × V → K be an alternating bilinear form. Show that for every x, y, z ∈ V and a11 , a21 , a31 , a12 , a22 , a32 ∈ K we have E(a11 x + a21 y + a31 z, a12 x + a22 y + a32 z) a a11 a12 E(x, y) + det 11 = det a31 a21 a22 a a22 + det 21 E(y, z). a31 a32
a12 E(x, z) a32
Solution. E(a11 x + a21 y + a31 z, a12 x + a22 y + a32 z) = E(a11 x, a22 y) + E(a21 y, a12 x) + E(a11 x, a32 z) + E(a31 z, a12 x) + E(a21 y, a32 z) + E(a31 z, a22 y) = E(a11 x + a21 y, a12 x + a22 y) + E(a11 x + a31 z, a12 x + a32 z) + E(a21 y + a31 z, a22 y + a32 z) a11 a12 a = det E(x, y) + det 11 a21 a22 a31 a a22 + det 21 E(y, z) a31 a32
a12 E(x, z) a32
Example 4.1.7. Let x, y, z be vectors in a vector space V and let E : V × V × V → K be an alternating multilinear form. Show that E(a11 x + a21 y + a31 z, a12 x + a22 y + a32 z, a13 x + a23 y + a33 z) a22 a23 a12 a13 a12 a13 = a11 det − a21 det + a31 det E(x, y, z) a32 a33 a32 a33 a22 a23 a a23 a a13 a a13 = − a12 det 21 − a22 det 11 + a32 det 11 E(x, y, z) a31 a33 a31 a33 a21 a23 a a22 a a12 a a12 = a13 det 21 − a23 det 11 + a33 det 11 E(x, y, z) a31 a32 a31 a32 a21 a22
for every a11 , a21 , a31 , a12 , a22 , a32 , a13 , a23 , a33 ∈ K. Solution. We prove the second equality. The other equalities can be proven in the same way. We apply the result from Example 4.1.6 to the function G : V × V → K defined by G(s, t) = E(s, a12 x + a22 y + a32 z, t), where x, y, z ∈ V are arbitrary but fixed, and obtain E(a11 x + a21 y + a31 z, a12 x + a22 y + a32 z, a13 x + a23 y + a33 z) a11 a13 = det E(x, a12 x + a22 y + a32 z, y) a21 a23 a a13 + det 11 E(x, a12 x + a22 y + a32 z, z) a31 a33 a a23 E(y, a12 x + a22 y + a32 z, z) + det 21 a31 a33 a a13 a a13 = det 11 E(x, a32 z, y) + det 11 E(x, a22 y, z) a21 a23 a31 a33 a a23 E(y, a12 x, z) + det 21 a31 a33 a11 a13 a11 a13 E(x, y, z) E(x, z, y) + a22 det = a32 det a31 a33 a21 a23 a a23 E(y, x, z) +a12 det 21 a31 a33 a11 a13 a11 a13 E(x, y, z) E(x, y, z) + a22 det = −a32 det a31 a33 a21 a23 a a23 E(x, y, z) −a12 det 21 a31 a33 a a23 a a13 a a13 E(x, y, z). + a12 det 21 − a22 det 11 = − a32 det 11 a31 a33 a31 a33 a21 a23
Theorem 4.1.8. Let V be a vector space and let E : V n → K be an alternating multilinear form. If x1 , . . . , xn ∈ V are linearly dependent, then E(x1 , . . . , xn ) = 0.
Proof. Without loss of generality, we can assume that x_1 = Σ_{j=2}^{n} a_j x_j. Then

E(x_1, . . . , x_n) = E(Σ_{j=2}^{n} a_j x_j, x_2, . . . , x_n) = Σ_{j=2}^{n} a_j E(x_j, x_2, . . . , x_n) = 0,

because E is alternating.
Theorem 4.1.9. Let V be a vector space and let E : V n → K be a multilinear form. For any permutation σ ∈ Sn the function G : V n → K defined by X G(x1 , . . . , xn ) = ǫ(σ)E(xσ(1) , . . . , xσ(n) ) σ∈Sn
is an alternating multilinear form. Proof. It is easy to see that G is a multilinear form. Now suppose that xj = xk for some distinct j, k ∈ {1, . . . , n}. Let τ = σjk ∈ Sn , that is the transposition such that τ (j) = k, τ (k) = j, and τ (l) = l for any l ∈ {1, . . . , n} different from j and k. First we note that if σ is an even permutation then τ σ is an odd permutation and the function s : En → On defined by s(σ) = τ σ is a bijection. We have X G(x1 , . . . , xn ) = ǫ(σ)E(xσ(1) , . . . , xσ(n) ) σ∈Sn
=
X
ǫ(σ)E(xσ(1) , . . . , xσ(n) ) +
σ∈En
=
X
X
σ∈En
ǫ(σ)E(xσ(1) , . . . , xσ(n) )
σ∈On
ǫ(σ)E(xσ(1) , . . . , xσ(n) ) +
σ∈En
=
X
X
ǫ(τ σ)E(xτ σ(1) , . . . , xτ σ(n) )
σ∈En
ǫ(σ)E(xσ(1) , . . . , xσ(n) ) −
X
ǫ(σ)E(xτ σ(1) , . . . , xτ σ(n) ).
σ∈En
Now we consider three cases: Case 1: If σ(l) 6= j and σ(l) 6= k, then τ σ(l) = σ(l). Case 2: If σ(l) = j, then τ σ(l) = τ (j) = k and xj = xσ(l) = xk = xτ σ(l) . Case 3: If σ(l) = k, then τ σ(l) = τ (k) = j and xk = xσ(l) = xj = xτ σ(l) .
Consequently G(x1 , . . . , xn ) =
X
σ∈En
ǫ(σ)E(xσ(1) , . . . , xσ(n) ) −
X
ǫ(σ)E(xτ σ(1) , . . . , xτ σ(n) ) = 0
σ∈En
and thus G is an alternating multilinear form. Theorem 4.1.10. Let V be vector space and let E : V n → K be an alternating n-linear form. Then E(xσ(1) , . . . , xσ(n) ) = ǫ(σ)E(x1 , . . . , xn ) for any σ ∈ Sn and x1 , . . . , xn ∈ V. Proof. Since, by Theorem 4.1.4, we have E(xτ (1) , . . . , xτ (n) ) = −E(x1 , . . . , xn ) for every transposition τ , the result follows from Theorem 5.1 in Appendix A. Theorem 4.1.11. Let V be vector space and let E : V n → K be an alternating n-linear form. If xj = a1j v1 + · · · + anj vn , where v1 , . . . , vn , x1 , . . . , xn ∈ V, akj ∈ K, and j, k ∈ {1, . . . , n}, then ! X E(x1 , . . . , xn ) = ǫ(σ)aσ(1),1 · · · aσ(n),n E(v1 , . . . , vn ). σ∈Sn
Proof. Since E is alternating n-linear, we have X E(x1 , . . . , xn ) = aσ(1),1 · · · aσ(n),n E(vσ(1) , . . . , vσ(n) ). σ∈Sn
Consequently, by Theorem 4.1.10, E(x1 , . . . , xn ) =
X
σ∈Sn
ǫ(σ)aσ(1),1 · · · aσ(n),n
!
E(v1 , . . . , vn ).
Theorem 4.1.12. Let V be an n-dimensional vector space and let E : V n → K be a nonzero alternating n-linear form. Then vectors v1 , . . . , vn ∈ V constitute a basis of V if and only if E(v1 , . . . , vn ) 6= 0.
Proof. If {v1 , . . . , vn } is a basis of V and E(v1 , . . . , vn ) = 0, then E = 0 by Theorem 4.1.11. Consequently, if {v1 , . . . , vn } is a basis and E 6= 0, then E(v1 , . . . , vn ) 6= 0. If the vectors v1 , . . . , vn are linearly dependent, then E(v1 , . . . , vn ) = 0, by Theorem 4.1.8. Consequently, if E(v1 , . . . , vn ) 6= 0, then v1 , . . . , vn must be linearly independent, and thus {v1 , . . . , vn } is a basis of V.
Example 4.1.13 (Cramer’s rule). Let V be a vector space and let {v1 , . . . , vn } be a basis of V. If D : V n → K is a nonzero alternating multilinear form and a = x1 v1 + · · · + xn vn , then xj =
D(v1 , . . . , vj−1 , a, vj+1 , . . . , vn ) D(v1 , . . . , vn )
for every j ∈ {1, . . . , n}. Proof. For every j ∈ {1, . . . , n} we have D(v1 , . . . , vj−1 , a, vj+1 , . . . , vn ) = D(v1 , . . . , vj−1 , x1 v1 + · · · + xn vn , vj+1 , . . . , vn ) = x1 D(v1 , . . . , vj−1 , v1 , vj+1 , . . . , vn )
+ · · · + xj D(v1 , . . . , vj−1 , vj , vj+1 , . . . , vn ) + · · · + xn D(v1 , . . . , vj−1 , vn , vj+1 , . . . , vn )
= xj D(v1 , . . . , vj−1 , vj , vj+1 , . . . , vn ). This gives us the desired equality.
Theorem 4.1.14. Let {v1 , . . . , vn } be a basis of a vector space V. There is an unique alternating n-linear form Dv1 ,...,vn : V n → K such that Dv1 ,...,vn (v1 , . . . , vn ) = 1. Proof. Let {v1 , . . . , vn } be a basis of V. For x = a1 v1 + · · · + an vn and j ∈ {1, . . . , n} we define lvj (x) = lvj (a1 v1 + · · · + an vn ) = aj . Clearly, the function Ev1 ,...,vn : V n → K defined by Ev1 ,...,vn (x1 , . . . , xn ) = lv1 (x1 ) · · · lvn (xn )
is n-linear. According to Theorem 4.1.9 the function Dv1 ,...,vn : V n → K defined by X Dv1 ,...,vn (x1 , . . . , xn ) = ǫ(σ)lv1 (xσ(1) ) . . . lvn (xσ(n) ) σ∈Sn
is an alternating n-linear form such that D_{v_1,...,v_n}(v_1, . . . , v_n) = 1. The uniqueness is a consequence of Theorem 4.1.11.

Note that the sums

Σ_{σ∈S_n} ε(σ) a_{σ(1),1} · · · a_{σ(n),n}  and  Σ_{σ∈S_n} ε(σ) a_{1,σ(1)} · · · a_{n,σ(n)}

are equal. Indeed, if a_{jk} ∈ K for j, k ∈ {1, . . . , n} and σ, τ ∈ S_n, then

a_{σ(1),1} · · · a_{σ(n),n} = a_{στ(1),τ(1)} · · · a_{στ(n),τ(n)}

and consequently

a_{σ(1),1} · · · a_{σ(n),n} = a_{σσ⁻¹(1),σ⁻¹(1)} · · · a_{σσ⁻¹(n),σ⁻¹(n)} = a_{1,σ⁻¹(1)} · · · a_{n,σ⁻¹(n)}.

Hence

Σ_{σ∈S_n} ε(σ) a_{σ(1),1} · · · a_{σ(n),n} = Σ_{σ∈S_n} ε(σ) a_{1,σ⁻¹(1)} · · · a_{n,σ⁻¹(n)} = Σ_{σ∈S_n} ε(σ⁻¹) a_{1,σ⁻¹(1)} · · · a_{n,σ⁻¹(n)} = Σ_{σ∈S_n} ε(σ) a_{1,σ(1)} · · · a_{n,σ(n)}.
Definition 4.1.15. Let A be an n × n matrix with entries a_{j,k}. The number

Σ_{σ∈S_n} ε(σ) a_{σ(1),1} · · · a_{σ(n),n}

is called the determinant of A and is denoted by det A.
Note that from the calculations presented before the above definition it follows that det A = det Aᵀ. It is easy to verify that our definition of the determinant agrees with the familiar formulas for 2 × 2 and 3 × 3 matrices:

det[[a_11, a_12], [a_21, a_22]] = a_11 a_22 − a_12 a_21
and

det[[a_11, a_12, a_13], [a_21, a_22, a_23], [a_31, a_32, a_33]] = a_11 a_22 a_33 + a_12 a_23 a_31 + a_13 a_21 a_32 − a_13 a_22 a_31 − a_11 a_23 a_32 − a_12 a_21 a_33.
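Definition 4.1.15 can be turned directly into a (very inefficient, n!-term) computation; the sketch below (not from the text, written in Python, with NumPy used only for comparison) sums over all permutations of {1, . . . , n}:

```python
from itertools import permutations
import numpy as np

def leibniz_det(A):
    """Determinant from Definition 4.1.15: sum over permutations sigma of
    sign(sigma) * a_{sigma(1),1} * ... * a_{sigma(n),n}."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        # parity of the permutation = parity of its number of inversions
        inversions = sum(sigma[i] > sigma[j]
                         for i in range(n) for j in range(i + 1, n))
        sign = -1 if inversions % 2 else 1
        prod = 1
        for j in range(n):
            prod *= A[sigma[j]][j]
        total += sign * prod
    return total

A = [[3, 1, 2], [1, 3, 2], [1, 1, 4]]            # a small test matrix
print(leibniz_det(A))                            # 24
print(round(np.linalg.det(np.array(A, float))))  # 24
```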
Theorem 4.1.16. Let V be an n-dimensional vector space and let D : V n → K be a nonzero alternating n-linear form. For every alternating n-linear form E : V n → K we have E = αD for some unique α ∈ K. Proof. Let {v1 , . . . , vn } be a basis in V. For every x1 , . . . , xn ∈ V we have xj = a1j v1 + · · · + anj vn , where ajk ∈ K and j, k ∈ {1, . . . , n}. Now, using Theorem 4.1.11, we obtain ! X E(x1 , . . . , xn ) = ǫ(σ)aσ(1),1 · · · aσ(n),n E(v1 , . . . , vn ) σ∈Sn
=
X
σ∈Sn
= =
ǫ(σ)aσ(1),1 · · · aσ(n),n
E(v1 , . . . , vn ) D(v1 , . . . , vn )
X
σ∈Sn
!
D(v1 , . . . , vn ) D(v1 , . . . , vn ) !
E(v1 , . . . , vn )
ǫ(σ)aσ(1),1 · · · aσ(n),n
D(v1 , . . . , vn )
E(v1 , . . . , vn ) D(x1 , . . . , xn ). D(v1 , . . . , vn )
This means that E = αD where α =
E(v1 ,...,vn ) D(v1 ,...,vn ) .
Note that since E(x1 , . . . , xn ) = αD(x1 , . . . , xn ) for every x1 , . . . , xn ∈ V, the constant α does not depend on the choice of a basis in V. Now the uniqueness of α is immediate.
Theorem 4.1.17. Let V be an n-dimensional vector space and let f : V → V be an endomorphism. There is a unique number α ∈ K such that for every v1 , . . . , vn ∈ V and every alternating n-linear form E : V n → K we have E(f (v1 ), . . . , f (vn )) = αE(v1 , . . . , vn ).
Proof. Let D : V n → K be a nonzero alternating n-linear form. The function F : V n → K defined by F(v1 , . . . , vn ) = D(f (v1 ), . . . , f (vn )) is an alternating n-linear form. Consequently, by Theorem 4.1.16, there is a number α ∈ K such that D(f (v1 ), . . . , f (vn )) = F(v1 , . . . , vn ) = αD(v1 , . . . , vn ). Now, applying Theorem 4.1.16 to an alternating n-linear form E : V n → K we obtain a number β ∈ K such that E = βD. Hence E(f (v1 ), . . . , f (vn )) = βD(f (v1 ), . . . , f (vn )) = βαD(v1 , . . . , vn ) = αE(v1 , . . . , vn ).
It is clear that α is unique.
Definition 4.1.18. Let V be an n-dimensional vector space and let f : V → V be an endomorphism. The number α ∈ K such that for every v1 , . . . , vn ∈ V and every alternating n-linear form E : V n → K we have E(f (v1 ), . . . , f (vn )) = αE(v1 , . . . , vn ) is called the determinant of f and is denoted by det f .
Using the notation from the above definition we can write that, if f is an endomorphism on an n-dimensional vector space V, then E(f (v1 ), . . . , f (vn )) = det f E(v1 , . . . , vn ) for every v1 , . . . , vn ∈ V and every alternating n-linear form E : V n → K. Example 4.1.19. Let V be a 2-dimensional vector space and let f : V → V be an endomorphism. If {v1 , v2 } is a basis of V such that f (v1 ) = αv1 + βv2
and f (v2 ) = γv1 + δv2
show that det f = αδ − βγ. Proof. Let E be an arbitrary alternating bilinear form E : V × V → K. Then E(f (v1 ), f (v2 )) = E(αv1 + βv2 , γv1 + δv2 ). Now we continue as in Example 4.1.5.
Theorem 4.1.20. Let V be an n-dimensional vector space and let f and g be endomorphisms on V. Then det(gf ) = det g det f. Proof. Let D : V n → K be a nonzero alternating n-linear form and let x1 , . . . , xn ∈ V. Then D(gf (x1 ), . . . , gf (xn )) = det(gf )D(x1 , . . . , xn ) and D(gf (x1 ), . . . , gf (xn )) = det g D(f (x1 ), . . . , f (xn )) = det g det f D(x1 , . . . , xn ).
Theorem 4.1.21. Let V be an n-dimensional vector space and let f : V → V be an endomorphism. Then f is invertible if and only if det f 6= 0. 1 . If f is invertible, then det f −1 = det f Proof. Let D : V n → K be a nonzero alternating n-linear form. If f is not invertible, then there is a nonzero vector x1 such that f (x1 ) = 0. If {x1 , . . . , xn } is a basis of V, then we have D(f (x1 ), . . . , f (xn )) = det f D(x1 , . . . , xn ) = 0. Since {x1 , . . . , xn } is a basis of V, we have D(x1 , . . . , xn ) 6= 0. Consequently, det f = 0. This shows that, if det f 6= 0, then f is invertible. Conversely, if f is invertible, then 1 = det(IdV ) = det(f −1 f ) = det f −1 det f. Thus det f 6= 0 and we have det f −1 =
1 . det f
Corollary 4.1.22. Let V be a finite dimensional vector space and let f and g be endomorphisms on V. If f is invertible, then det(f −1 gf ) = det g. Proof. According to the previous two results we have det(f −1 gf ) = det f −1 det g det f =
1 det g det f = det g. det f
Lemma 4.1.23. Let V1 and V2 be finite dimensional vector spaces and let V = V1 ⊕ V2 . If f : V1 → V1 is an endomorphism and g : V → V is the endomorphism defined by g(v1 + v2 ) = f (v1 ) + v2 for all v1 ∈ V1 and v2 ∈ V2 , then det g = det f. Proof. Let {x1 , . . . , xn } be a basis of the vector space V such that {x1 , . . . , xp } is a basis of the vector space V1 and {xp+1 , . . . , xn } is a basis of the vector space V2 . Let D : V n → K be a nonzero alternating n-linear form. Then D(g(x1 ), . . . , g(xn )) = det g D(x1 , . . . , xn ). Now, if D1 : V1p → K is the alternating p-linear form defined by D1 (v1 , . . . , vp ) = D(v1 , . . . , vp , xp+1 , . . . , xn ), then D1 (f (x1 ), . . . , f (xp )) = det f D1 (x1 , . . . , xp ) = det f D(x1 , . . . , xp , xp+1 , . . . , xn ).
Hence det g = det f because D(g(x1 ), . . . , g(xn )) = D(f (x1 ), . . . , f (xp ), xp+1 , . . . , xn ) = D1 (f (x1 ), . . . , f (xp )).
Theorem 4.1.24. Let V1 and V2 be finite dimensional vector spaces and let V = V1 ⊕ V2 . If f1 : V1 → V1 and f2 : V2 → V2 are endomorphism and f : V → V is the endomorphism defined by f (v1 + v2 ) = f1 (v1 ) + f2 (v2 ) where v1 ∈ V1 and v2 ∈ V2 , then det f = det f1 det f2 . Proof. Let g1 : V → V and g2 : V → V be defined as g1 (v1 + v2 ) = f1 (v1 ) + v2
and g2 (v1 + v2 ) = v1 + f2 (v2 )
for all v1 ∈ V1 and v2 ∈ V2 . Then f = g1 g2 and thus det f = det g1 det g2 , by Theorem 4.1.20. Now from Lemma 4.1.23 we have det g1 = det f1 and det g2 = det f2 , which gives us det f = det f1 det f2 .
4.1.2 Diagonalization
In the remainder of this chapter it will be convenient to identify a number α ∈ K with the operator α Id. This convention is quite natural since (α Id)x = αx, so the α on the right hand side can be interpreted as a number or an operator. Eigenvalues and eigenvectors were introduced in Chapter 3 in the context of operators on inner product spaces, but the definitions do not require the inner product. For convenience we recall the definitions of eigenvalues, eigenvectors and eigenspaces.

Definition 4.1.25. Let V be a vector space and let f : V → V be an endomorphism. A number λ ∈ K is called an eigenvalue of f if the equation f(x) = λx has a nontrivial solution, that is, a solution x ≠ 0.

The following theorem is useful when finding eigenvalues of an endomorphism.

Theorem 4.1.26. Let V be a vector space and let f : V → V be an endomorphism. Then λ is an eigenvalue of f if and only if det(f − λ) = 0.

Proof. The equivalence is a consequence of Theorem 4.1.21. Indeed, the equation f(x) = λx has a solution x ≠ 0 if and only if the equation (f − λ)(x) = 0 has a solution x ≠ 0, which means that the linear transformation f − λ is not invertible and this is equivalent to det(f − λ) = 0, by Theorem 4.1.21.
Example 4.1.28. Let f : R³ → R³ be the endomorphism defined by

f(x, y, z) = [[3, 1, 2], [1, 3, 2], [1, 1, 4]] (x, y, z).

Calculate c_f.

Solution. Let D : R³ × R³ × R³ → R be a nonzero alternating 3-linear form. Then

D((f − t)(1, 0, 0), (f − t)(0, 1, 0), (f − t)(0, 0, 1))
= D((3 − t, 1, 1), (1, 3 − t, 1), (2, 2, 4 − t))
= D((3 − t, 1, 1) − (1, 3 − t, 1), (1, 3 − t, 1), (2, 2, 4 − t))
= D((2 − t, t − 2, 0), (1, 3 − t, 1), (2, 2, 4 − t))
= (2 − t) D((1, −1, 0), (1, 3 − t, 1), (2, 2, 4 − t)).

Now we proceed as in Example 4.1.7 and get

D((1, −1, 0), (1, 3 − t, 1), (2, 2, 4 − t)) = (t² − 8t + 12) D((1, 0, 0), (0, 1, 0), (0, 0, 1)).

Hence

c_f(t) = (2 − t)(t² − 8t + 12) = (2 − t)²(6 − t).
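As a numerical cross-check (a sketch, not part of the text; it assumes NumPy), the eigenvalues and the characteristic polynomial of the matrix above can be computed directly; note that np.poly returns the coefficients of det(tI − A), which for this 3 × 3 matrix is −c_f(t):

```python
import numpy as np

A = np.array([[3, 1, 2],
              [1, 3, 2],
              [1, 1, 4]], dtype=float)

print(np.round(np.linalg.eigvals(A), 10))   # eigenvalues 2, 2, 6
print(np.round(np.poly(A), 10))             # [1, -10, 28, -24], i.e. (t-2)^2 (t-6)
```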
Example 4.1.29. Let f : C² → C² be the endomorphism defined by

f(x, y) = [[1 + 2i, 2i], [1 + i, 3i]] (x, y).

Calculate c_f and the eigenvalues of f.
Proof. Let D : C² × C² → C be a nonzero alternating bilinear form. Proceeding as in Example 4.1.5 we get

D((1 + 2i − t, 1 + i), (2i, 3i − t)) = ((1 + 2i − t)(3i − t) − 2i(1 + i)) D((1, 0), (0, 1)) = (t² − (1 + 5i)t + i(1 + 4i)) D((1, 0), (0, 1)).

Hence c_f(t) = t² − (1 + 5i)t + i(1 + 4i) and the eigenvalues are i and 1 + 4i.
Definition 4.1.30. Let V be a vector space and let λ be an eigenvalue of an endomorphism f : V → V. A vector x 6= 0 is called an eigenvector of f corresponding to the eigenvalue λ if f (x) = λx. The set Eλ = {x ∈ V : f (x) = λx} is called the eigenspace of f corresponding to λ. It is easy to verify that Eλ is a subspace of V. It consists of all eigenvectors of f corresponding to λ and the zero vector. Note that E0 = ker f . Theorem 4.1.31. Let V be an n-dimensional vector space and let f : V → V be an endomorphism. If {v1 , . . . , vn } is a basis of V such that f (v1 ) = x11 v1 f (v2 ) = x12 v1 + x22 v2 .. . f (vn ) = x1n v1 + x2n v2 + · · · + xn−1,n vn−1 + xnn vn , where xjk ∈ K for all j, k ∈ {1, . . . , n} such that j ≤ k, then cf (t) = (x11 − t) · · · (xnn − t).
Proof. For any nonzero alternating n-linear form D : V^n → K we have

cf(t) D(v1, . . . , vn) = det(f − t) D(v1, . . . , vn)
= D(f(v1) − tv1, f(v2) − tv2, . . . , f(vn) − tvn)
= D(x11 v1 − tv1, x12 v1 + x22 v2 − tv2, . . . , x1n v1 + x2n v2 + · · · + x_{n−1,n} v_{n−1} + x_{nn} vn − tvn)
= D(x11 v1 − tv1, x22 v2 − tv2, . . . , x_{nn} vn − tvn)
= (x11 − t) · · · (xnn − t) D(v1, . . . , vn).
It turns out that the converse of the above result is also true. Theorem 4.1.32. Let V be an n-dimensional vector space and let f : V → V be an endomorphism such that cf (t) = (λ1 − t) · · · (λn − t) for some λ1 , . . . , λn ∈ K. Then there is a basis {v1 , . . . , vn } of V such that f (v1 ) = x11 v1 f (v2 ) = x12 v1 + x22 v2 .. . f (vn ) = x1n v1 + x2n v2 + · · · + xn−1,n vn−1 + xnn vn , where xjk ∈ K for all j, k ∈ {1, . . . , n} such that j ≤ k, and x11 = λ1 , x22 = λ2 , . . . , xnn = λn . Proof. We are going to use induction on n. Clearly, the theorem holds when n = 1. Now let n ≥ 2 and assume that the theorem holds for n − 1. If cf (t) = (λ1 − t) · · · (λn − t), then λ1 , . . . , λn are eigenvalues of f . Let v1 be an eigenvector of f corresponding to the eigenvalue λ1 , that is, f (v1 ) = λ1 v1 and v1 6= 0. We define V1 = Span{v1 }. Let W be a vector subspace of V such that V = V1 ⊕ W and let p be the projection of V on W along V1 . We denote by g : W → W the endomorphism induced by pf on W. Let {w2 , . . . , wn } be a basis of W and let D : V n → K be a nonzero alternating n-linear form. For some y1k ∈ K, where k ∈ {2, . . . , n}, we have f (w2 ) = y12 v1 + g(w2 ) .. . f (wn ) = y1n v1 + g(wn ).
If cf is the characteristic polynomial of f and cg is the characteristic polynomial of g, then we have

cf(t) D(v1, w2, . . . , wn) = D(f(v1) − tv1, f(w2) − tw2, . . . , f(wn) − twn)
= (λ1 − t) D(v1, y12 v1 + g(w2) − tw2, . . . , y1n v1 + g(wn) − twn)
= (λ1 − t) D(v1, g(w2) − tw2, . . . , g(wn) − twn)
= (λ1 − t) cg(t) D(v1, w2, . . . , wn).

Consequently, cf(t) = (λ1 − t) cg(t) and thus cg(t) = (λ2 − t) · · · (λn − t). By our inductive assumption the theorem holds for the endomorphism g : W → W and thus there is a basis {v2, . . . , vn} of W and xjk ∈ K for j, k ∈ {2, . . . , n}, j ≤ k, such that

g(v2) = x22 v2
g(v3) = x23 v2 + x33 v3
. . .
g(vn) = x2n v2 + · · · + xnn vn,

where x22 = λ2, . . . , xnn = λn. Consequently, there are x12, . . . , x1n ∈ K such that

f(v1) = x11 v1
f(v2) = x12 v1 + g(v2) = x12 v1 + x22 v2
. . .
f(vn) = x1n v1 + g(vn) = x1n v1 + x2n v2 + · · · + x_{n−1,n} v_{n−1} + xnn vn,

where x11 = λ1, x22 = λ2, . . . , xnn = λn.
Definition 4.1.33. Let V be a vector space and let f : V → V be an endomorphism. A polynomial p is called an f-annihilator if p(f ) = 0.
Note that in Chapter 3 we define the annihilator of a subset in an inner product space. In that context the annihilator is a subspace. The f-annihilator of an endomorphism f is a polynomial.

Every endomorphism f on a finite dimensional vector space has f-annihilators. Indeed, if V is an n-dimensional vector space, then the dimension of the vector space L(V) of all endomorphisms f : V → V is n². Consequently, if f ∈ L(V), then the endomorphisms Id, f, f², . . . , f^{n²} are linearly dependent and thus there are numbers x0, x1, . . . , x_{n²} ∈ K, not all equal to 0, such that

x0 Id + x1 f + x2 f² + · · · + x_{n²} f^{n²} = 0.
Example 4.1.34. Let V be a vector space and let f : V → V be an endomorphism. We suppose that dim V = 4 and that B = {v1, v2, v3, v4} is a basis of V. Let

\begin{pmatrix} \alpha & 1 & 0 & 0 \\ 0 & \alpha & 1 & 0 \\ 0 & 0 & \alpha & 1 \\ 0 & 0 & 0 & \alpha \end{pmatrix}

be the B-matrix of f for some α ∈ K. Show that (t − α)^4 is an f-annihilator.

Proof. Since

f(v1) = αv1,
f (v2 ) = v1 + αv2 ,
f (v3 ) = v2 + αv3 ,
f (v4 ) = v3 + αv4 ,
which can be written as (f − α)(v1 ) = 0,
(f − α)(v2 ) = v1 ,
(f − α)(v3 ) = v2 ,
(f − α)(v4 ) = v3 ,
we successively obtain (f − α)2 (v1 ) = (f − α)2 (v2 ) = 0, (f − α)2 (v4 ) = (f − α)(v3 ),
(f − α)2 (v3 ) = (f − α)v2 ,
(f − α)3 (v1 ) = (f − α)3 (v2 ) = (f − α)3 (v3 ) = 0,
(f − α)3 (v4 ) = (f − α)2 (v3 ),
and finally (f − α)4 (v1 ) = (f − α)4 (v2 ) = (f − α)4 (v3 ) = (f − α)4 (v4 ) = 0. Consequently the polynomial (t − α)4 is an f -annihilator. It is easy to verify that for the endomorphism f from the above example we have cf (t) = (t − α)4 . It is not a coincidence that (f − α)4 = 0. Actually, this is true for many endomorphisms as stated in the next theorem. If a polynomial p can be written as p(t) = (λ1 − t) · · · (λn − t) for some λ1 , . . . , λn ∈ K, then we say that p splits over K. Note that every polynomial with complex coefficients splits over C, by the Fundamental Theorem of Algebra, but not every polynomial with real coefficients splits over R. Theorem 4.1.35 (Cayley-Hamilton). Let V be an n-dimensional vector space. If f : V → V is an endomorphism such that its characteristic polynomial cf splits over K, then cf (f ) = 0.
Proof. Let {v1, . . . , vn} be a basis of V such that

f(v1) = x11 v1
f(v2) = x12 v1 + x22 v2
. . .
f(vn) = x1n v1 + x2n v2 + · · · + x_{n−1,n} v_{n−1} + x_{nn} vn,

where xjk ∈ K for all j, k ∈ {1, . . . , n} such that j ≤ k. Then cf(t) = (x11 − t) · · · (xnn − t), by Theorem 4.1.31. We need to show that cf(f) = (x11 − f) · · · (xnn − f) = 0. We will show by induction that (x11 − f) · · · (xjj − f)(x) = 0 for every x ∈ Span{v1, . . . , vj}. Clearly, (x11 − f)(x) = 0
for every vector x ∈ Span{v1 }. Now suppose that for some j ∈ {1, . . . , n − 1} we have (x11 − f ) · · · (xjj − f )(x) = 0 for every x ∈ Span{v1 , . . . , vj }. Since
(xj+1,j+1 − f )(vj+1 ) = xj+1,j+1 vj+1 − x1,j+1 v1 − x2,j+1 v2 − · · · − xj,j+1 vj − xj+1,j+1 vj+1 = −x1,j+1 v1 − x2,j+1 v2 − · · · − xj,j+1 vj ,
we have (x11 − f ) · · · (xjj − f )(xj+1,j+1 − f )(vj+1 )
= (x11 − f ) · · · (xjj − f )(−x1,j+1 v1 − x2,j+1 v2 − . . . − xj,j+1 vj ) = 0,
by the inductive assumption. Consequently, cf (f )(x) = (x11 − f ) · · · (xnn − f )(x) = 0 for every x ∈ V. Example 4.1.36. Verify Cayley-Hamilton theorem for the endomorphism f in Example 4.1.34. Solution. Let D : V 4 → K be a nonzero alternating multilinear form. Then D(f (v1 ) − tv1 , f (v2 ) − tv2 , f (v3 ) − tv3 , f (v4 ) − tv4 ) = D(αv1 − tv1 , v1 + αv2 − tv2 , v2 + αv3 − tv3 , v3 + αv4 − tv4 ) = D(αv1 − tv1 , αv2 − tv2 , αv3 − tv3 , αv4 − tv4 ) = (α − t)(α − t)(α − t)(α − t)D(v1 , v2 , v3 , v4 ) = (α − t)4 D(v1 , v2 , v3 , v4 ).
This shows that cf (t) = (α − t)4 . Thus cf (f ) = 0, by Example 4.1.34.
Example 4.1.37. Verify the Cayley–Hamilton theorem for the endomorphism f : R2 → R2 defined by

f\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}2 & -1\\ 1 & 5\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.

Solution. Let D : R2 × R2 → R be a nonzero alternating bilinear form. Proceeding as in Example 4.1.5 we obtain

D\left(\begin{pmatrix}2-t\\1\end{pmatrix},\begin{pmatrix}-1\\5-t\end{pmatrix}\right) = (t^2 - 7t + 11)\,D\left(\begin{pmatrix}1\\0\end{pmatrix},\begin{pmatrix}0\\1\end{pmatrix}\right).

Hence cf(t) = t^2 − 7t + 11 and we have

\begin{pmatrix}2 & -1\\ 1 & 5\end{pmatrix}^2 - 7\begin{pmatrix}2 & -1\\ 1 & 5\end{pmatrix} + 11\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = \begin{pmatrix}3 & -7\\ 7 & 24\end{pmatrix} + \begin{pmatrix}-14 & 7\\ -7 & -35\end{pmatrix} + \begin{pmatrix}11 & 0\\ 0 & 11\end{pmatrix} = \begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix}.
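The same verification can be carried out numerically. The sketch below is an illustration only (not part of the text) and assumes NumPy; it evaluates cf(A) for the matrix of this example:

import numpy as np

A = np.array([[2.0, -1.0],
              [1.0,  5.0]])
# cf(t) = t^2 - 7t + 11, so cf(A) should be the zero matrix
print(A @ A - 7*A + 11*np.eye(2))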
Example 4.1.38. Let f : V → V be an endomorphism. We suppose that t³ + t is an f-annihilator. Show that, if λ is an eigenvalue of f, then λ ∈ {0, −i, i}.

Solution. Let x be an eigenvector corresponding to an eigenvalue λ of f. Then (f³ + f)(x) = (λ³ + λ)x = 0. Since x ≠ 0, we have λ³ + λ = 0, which gives us the desired result.
Theorem 4.1.39. Let V be a finite dimensional vector space and let f : V → V be a nonzero endomorphism. (a) There is a unique monic polynomial mf of smallest positive degree such that mf (f ) = 0; (b) mf divides every f -annihilator; (c) If λ is an eigenvalue of f then mf (λ) = 0.
Proof. Note that cf is an f -annihilator. Recall that a monic polynomial is a single-variable polynomial in which the leading coefficient is equal to 1. Let p be a monic polynomial of smallest positive degree such that p(f ) = 0 and let q be a polynomial of positive degree such that q(f ) = 0. Then q = pa+r, where a and r are polynomials and r = 0 or the degree of r is strictly less than the degree of p. Since q(f ) = p(f )a(f ) + r(f ), we have r(f ) = 0. Now, because p is a polynomial of smallest positive degree such that p(f ) = 0, we have r = 0 and thus p divides q. If q is another monic polynomial of smallest positive degree such that q(f ) = 0, then the equality q = pa implies a = 1 and consequently p = q. Finally, if v is an eigenvector corresponding to the eigenvalue λ, then mf (f )(v) = mf (λ)v, which gives us mf (λ) = 0 because v 6= 0.
Definition 4.1.40. Let V be a finite dimensional vector space and let f : V → V be a nonzero endomorphism. The unique monic polynomial mf of smallest positive degree such that mf (f ) = 0 is called the minimal polynomial of f .
Example 4.1.41. Let V be a vector space such that dim V = 4 and let f : V → V be an endomorphism. If B = {v1, v2, v3, v4} is a basis of V and

\begin{pmatrix} \alpha & 1 & 0 & 0 \\ 0 & \alpha & 1 & 0 \\ 0 & 0 & \alpha & 1 \\ 0 & 0 & 0 & \alpha \end{pmatrix}

is the B-matrix of f for some α ∈ K, find mf.

Solution. From Example 4.1.34 we know that the polynomial (t − α)^4 is an f-annihilator. We note that (t − α)^4 is the minimal polynomial mf because

(f − α)^3(v4) = (f − α)^2(v3) = (f − α)(v2) = v1 ≠ 0.
Theorem 4.1.42. Let V1 and V2 be finite dimensional vector spaces and let V = V1 ⊕ V2 . If f1 : V1 → V1 and f2 : V2 → V2 are nonzero endomorphisms and f : V → V is the endomorphism defined by f (v1 + v2 ) = f1 (v1 ) + f2 (v2 ) where v1 ∈ V1 and v2 ∈ V2 , then mf = LCM(mf1 , mf2 ). Proof. Let p = LCM(mf1 , mf2 ) and let v1 ∈ V1 and v2 ∈ V2 . Then p(f )(v1 + v2 ) = p(f )(v1 ) + p(f )(v2 ) = p(f1 )(v1 ) + p(f2 )(v2 ) = 0, because both mf1 and mf2 divide p. This implies that mf divides p. Now, since mf (f )(v) = mf (f1 )(v) = 0 for every v ∈ V1 , mf1 divides mf . Similarly, mf2 divides mf . Consequently p divides mf .
Theorem 4.1.43. Let V1 and V2 be finite dimensional vector spaces and let V = V1 ⊕ V2 . If f1 : V1 → V1 and f2 : V2 → V2 are endomorphisms and f : V → V is the endomorphism defined by f (v1 + v2 ) = f1 (v1 ) + f2 (v2 ) where v1 ∈ V1 and v2 ∈ V2 , then cf = cf1 cf2 . Proof. This result is a consequence of Theorem 4.1.24. The following result is of significant importance for the remainder of this chapter. Theorem 4.1.44. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If p1 , . . . , pk are polynomials such that GCD(pj , pl ) = 1 for every j, l ∈ {1, . . . , k} such that j 6= l and the product p1 · · · pk is an f -annihilator, then V = ker p1 (f ) ⊕ · · · ⊕ ker pk (f ). Proof. Let qj = p1 · · · pj−1 pj+1 · · · pk for j ∈ {1, . . . , k}. Clearly GCD(q1 , . . . , qk ) = 1 and thus there are polynomials a1 , . . . , ak such that a1 q1 + · · · + ak qk = 1.
Consequently, a1(f)q1(f) + · · · + ak(f)qk(f) = Id and thus

a1(f)q1(f)(v) + · · · + ak(f)qk(f)(v) = v    (4.2)

for every v ∈ V. Note that for every j ∈ {1, . . . , k} we have

pj(f)aj(f)qj(f)(v) = aj(f)pj(f)qj(f)(v) = aj(f)p1(f) · · · pk(f)(v) = 0,

and thus aj(f)qj(f)(v) ∈ ker pj(f). Hence, by (4.2), we have

V = ker p1(f) + · · · + ker pk(f).

We need to show that the sum is direct. Let vj ∈ ker pj(f) for j ∈ {1, . . . , k}. From (4.2) we get

a1(f)q1(f)(vj) + · · · + ak(f)qk(f)(vj) = vj.

Since al(f)ql(f)(vj) = 0 for l ∈ {1, . . . , j − 1, j + 1, . . . , k}, we have

a1(f)q1(f)(vj) + · · · + ak(f)qk(f)(vj) = aj(f)qj(f)(vj)

and thus

vj = aj(f)qj(f)(vj).    (4.3)
Suppose now that the vectors vj ∈ ker pj (f ), j ∈ {1, . . . , k}, are such that v1 + · · · + vk = 0. Then for every j ∈ {1, . . . , k} we have 0 = aj (f )qj (f )(0) = aj (f )qj (f )(v1 + · · · + vk ) = aj (f )qj (f )(vj ) because aj (f )qj (f )(vl ) = 0 for l ∈ {1, . . . , j − 1, j + 1, . . . , k}. Hence, by (4.3), we have vj = aj (f )qj (f )(vj ) = 0, which shows that the sum ker p1 (f ) + · · · + ker pk (f ) is direct. Invariance of a subspace with respect to a linear operator was initially introduced in Chapter 2 in exercises and then repeated in Chapter 3. For convenience we recall that a subspace U of a vector space V is called f -invariant, where f : V → V is an endomorphism, if f (U) ⊆ U. Invariance of subspaces will play an important role in this chapter. It is easy to see that, if V is a vector space and f : V → V is an endomorphism, then ker f and ran f are f -invariant subspaces. More generally, if p is a polynomial, then ker p(f ) is an f -invariant subspace.
Theorem 4.1.45. Let V be an n-dimensional vector space and let f : V → V be an endomorphism. If cf(t) = (λ1 − t)^{r1} · · · (λk − t)^{rk} for some distinct λ1, . . . , λk ∈ K and some positive integers r1, . . . , rk, then

V = ker(f − λ1)^{r1} ⊕ · · · ⊕ ker(f − λk)^{rk}

and

cfj(t) = (λj − t)^{rj}

for every j ∈ {1, . . . , k}, where fj : ker(f − λj)^{rj} → ker(f − λj)^{rj} is the endomorphism induced by f.

Proof. From Theorem 4.1.44 applied to the polynomials (λ1 − t)^{r1}, . . . , (λk − t)^{rk} we get

V = ker(f − λ1)^{r1} ⊕ · · · ⊕ ker(f − λk)^{rk}.

Now, for every j ∈ {1, . . . , k}, the polynomial (λj − t)^{rj} is an fj-annihilator. This implies, by Theorem 4.1.39, that mfj = (t − λj)^{qj} where qj is an integer such that 1 ≤ qj ≤ rj, and cfj(t) = (λj − t)^{sj} where sj is an integer such that qj ≤ sj. From Theorem 4.1.43 we get cf = cf1 · · · cfk, that is,

(λ1 − t)^{r1} · · · (λk − t)^{rk} = (λ1 − t)^{s1} · · · (λk − t)^{sk}.

Hence rj = sj for every j ∈ {1, . . . , k}, completing the proof.
Definition 4.1.47. An endomorphism f : V → V is called diagonalizable if V has a basis consisting of eigenvectors of f.

If V is an n-dimensional vector space and an endomorphism f : V → V is diagonalizable, then there is a basis B = {v1, . . . , vn} of V and λ1, . . . , λn ∈ K such that f(vj) = λj vj for every j ∈ {1, . . . , n}. In other words, the B-matrix of f is diagonal:

\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}

The following result is a direct consequence of the definitions.
Theorem 4.1.48. Let V be an n-dimensional vector space. An endomorphism f : V → V is diagonalizable if and only if it has n linearly independent eigenvectors.
If λ is an eigenvalue of an endomorphism f : V → V, then the characteristic polynomial of f can be written as det(f − t) = (λ − t)^r q(t), where q(t) is a polynomial such that q(λ) ≠ 0. The number r is called the algebraic multiplicity of the eigenvalue λ. The dimension of the eigenspace of f corresponding to an eigenvalue λ, that is dim ker(f − λ), is called the geometric multiplicity of λ.

The algebraic multiplicity of an eigenvalue need not be the same as its geometric multiplicity. Indeed, consider the endomorphism f : R3 → R3 defined as f(x) = Ax where

A = \begin{pmatrix} 3 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}.

Since det(f − t) = (3 − t)^3, f has only one eigenvalue λ = 3, whose algebraic multiplicity is 3. On the other hand, since the dimension of the null space of the matrix

\begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}

is 2, the geometric multiplicity of the eigenvalue 3 is 2.
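For this particular matrix the two multiplicities can be compared with a short computation (a sketch only, assuming NumPy); the geometric multiplicity is read off from the rank of A − 3I:

import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 3.0]])
r = np.linalg.matrix_rank(A - 3*np.eye(3))
print(3 - r)        # geometric multiplicity of 3: expected 2
print(np.poly(A))   # coefficients of det(tI - A); expected those of (t - 3)^3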
Theorem 4.1.49. The geometric multiplicity is less than or equal to the algebraic multiplicity. Proof. Let V be an n-dimensional vector space and let f : V → V be an endomorphism. If dim ker(f − λ) = k, then there is a basis {v1 , . . . , vn } of V such that v1 , . . . , vk are eigenvectors of f corresponding to the eigenvalue λ. For any nonzero n-linear alternating form D : V n → K we have cf (t)D(v1 , . . . , vk , vk+1 , . . . , vn ) = D(f (v1 ) − tv1 , . . . , f (vk ) − tvk , f (vk+1 ) − tvk+1 , . . . , f (vn ) − tvn ) = D(λv1 − tv1 , . . . , λvk − tvk , f (vk+1 ) − tvk+1 , . . . , f (vn ) − tvn )
= (λ − t)k D(v1 , . . . , vk , f (vk+1 ) − tvk+1 , . . . , f (vn ) − tvn ). Consequently, (λ − t)k divides cf because
D(v1 , . . . , vk , f (vk+1 ) − tvk+1 , . . . , f (vn ) − tvn ) = q(t)D(v1 , . . . , vk , vk+1 , . . . , vn ),
where q is a polynomial. In Theorem 3.4.10 we show that eigenvectors corresponding to different eigenvalues of a normal operator on an inner product space V are orthogonal. If V is an arbitrary vector space, then we cannot talk about orthogonality of eigenvectors, but we still have linear independence of eigenvectors corresponding to different eigenvalues, as the following theorem states. Theorem 4.1.50. Let V be a vector space and let f : V → V be an endomorphism. If v1 , . . . , vk ∈ V are eigenvectors of f corresponding to distinct eigenvalues λ1 , . . . , λk , then the vectors v1 , . . . , vk are linearly independent. First proof. If x1 v1 + · · · + xk vk = 0
for some x1 , . . . , xk ∈ K, then
(f − λ1 ) · · · (f − λk−1 )(x1 v1 + · · · + xk vk ) = 0 and thus Since
xk (f − λ1 ) · · · (f − λk−1 )(vk ) = 0. (f − λ1 ) · · · (f − λk−1 )(vk )
= ((f − λk ) + (λk − λ1 )) · · · ((f − λk ) + (λk − λk−1 )))(vk ) = (λk − λ1 ) · · · (λk − λk−1 )vk ,
we get xk (λk − λ1 ) · · · (λk − λk−1 )vk = 0. Consequently xk = 0, because (λk − λ1 ) · · · (λk − λk−1 ) 6= 0 and vk 6= 0. In the same way we can show that x1 = · · · = xk−1 = 0. Second proof. Since the subspace Span{v1 , . . . , vk } is f -invariant, without loss of generality we can assume that V = Span{v1 , . . . , vk }. Then the eigenvalues are roots of the polynomial cf which has the degree equal to dim V and consequently k ≤ dim V. But V = Span{v1 , . . . , vk }, so we must have k ≥ dim V. Thus k = dim V and the vectors v1 , . . . , vk are linearly independent. Theorem 4.1.51. Let V be an n-dimensional vector space and let f : V → V be an endomorphism. The following conditions are equivalent: (a) f is diagonalizable; (b) There are distinct λ1 , . . . , λk ∈ K and positive integers r1 , . . . , rk such that cf (t) = (λ1 − t)r1 · · · (λk − t)rk
and
dim ker(f − λj ) = rj ;
(c) If λ1 , . . . , λk are all distinct eigenvalues of f , then ker(f − λ1 ) ⊕ · · · ⊕ ker(f − λk ) = V; (d) If λ1 , . . . , λk are all distinct eigenvalues of f , then n = dim ker(f − λ1 ) + · · · + dim ker(f − λk ); (e) If λ1 , . . . , λk are all distinct eigenvalues of f , then mf (t) = (t − λ1 ) · · · (t − λk ). Proof. First we prove equivalence of (a) and (b). Then we show that (c) implies (d), (d) implies (a), (a) implies (e), and (e) implies (c). If f : V → V is diagonalizable, then there is a basis {v1 , . . . , vn } of V such that f (vj ) = λj vj for some λ1 , . . . , λn ∈ K and all j ∈ {1, . . . , n}. If D : V n → K is a nonzero n-linear alternating form, then D(f (v1 ) − tv1 , . . . , f (vn ) − tvn ) = D((λ1 − t)v1 , . . . , (λn − t)vn ) = (λ1 − t) · · · (λn − t)D(v1 , . . . , vn )
and thus cf (t) = (λ1 − t) · · · (λn − t). Without loss of generality, we can suppose that λ1 , . . . , λk are distinct numbers such that {λk+1 , . . . , λn } ⊆ {λ1 , . . . , λk }. Consequently, cf (t) = (λ1 − t)r1 . . . (λk − t)rk where r1 , . . . , rk are positive integers such that r1 + · · · + rk = n. Because for every l ∈ {1, . . . , n} there is a j ∈ {1, . . . , k} such that vl ∈ ker(f − λj ), we have ker(f − λ1 ) + · · · + ker(f − λk ) = V. Now since for every j ∈ {1, . . . , k} we have ker(f − λ1 ) ⊆ ker(f − λj )rj and, by Corollary 4.1.46, dim ker(f − λj )rj = rj , we conclude that dim ker(f − λj ) = rj for every j ∈ {1, . . . , k}. Thus (a) implies (b). Now suppose that we can write cf (t) = (λ1 − t)r1 . . . (λk − t)rk where λ1 , . . . , λk ∈ K are distinct and r1 , . . . , rk are positive integers such that dim ker(f − λj ) = rj for every j ∈ {1, . . . , k}. Then, by Corollary 4.1.46, ker(f − λj ) = ker(f − λj )rj and, by Theorem 4.1.45, the sum ker(f − λ1 ) + · · · + ker(f − λk ) is direct and we have ker(f − λ1 ) ⊕ · · · ⊕ ker(f − λk ) = V. Consequently, if Bj is a basis of ker(f − λj ) for j ∈ {1, . . . , k}, then B1 ∪ · · · ∪ Bk is a basis of V because r1 + · · · + rk = n. Since all elements of B1 ∪ · · · ∪ Bk are eigenvectors, f is diagonalizable. Thus (b) implies (a). Clearly (c) implies (d). Next suppose that dim ker(f − λ1 ) + · · · + dim ker(f − λk ) = n. Since ker(f − λj ) ⊆ ker(f − λj )rj , it follows from Theorem 4.1.45 that the sum ker(f − λ1 ) + · · · + ker(f − λk ) is direct. Now we construct a basis of eigenvectors of f as in the proof of the first part ((b) implies (a)). This proves that (d) implies (a). Next assume that the endomorphism f satisfies (a). Since (t− λ1 ) . . . (t− λk ) is an annihilator of f and mf is a monic polynomial which has all distinct eigenvalues as zeros and divides every annihilator of f , we have mf (t) = (t − λ1 ) . . . (t − λk ),
so (a) implies (e). Finally, (e) implies (c), because if mf (t) = (t − λ1 ) . . . (t − λk ), then ker(f − λ1 ) ⊕ · · · ⊕ ker(f − λk ) = V, by Theorem 4.1.44.
Example 4.1.52. Let f : R2 → R2 be the endomorphism defined by

f\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}1 & 4\\ -1 & 5\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.
Show that f is not diagonalizable.

Proof. Let D : R2 × R2 → R be a nonzero alternating bilinear form. Proceeding as in Example 4.1.5 we obtain

D\left(\begin{pmatrix}1-t\\-1\end{pmatrix},\begin{pmatrix}4\\5-t\end{pmatrix}\right) = ((1-t)(5-t)+4)\,D\left(\begin{pmatrix}1\\0\end{pmatrix},\begin{pmatrix}0\\1\end{pmatrix}\right) = (t-3)^2\,D\left(\begin{pmatrix}1\\0\end{pmatrix},\begin{pmatrix}0\\1\end{pmatrix}\right).

Hence cf(t) = (t − 3)^2. Next we determine the eigenvectors corresponding to the eigenvalue 3. The equation

\begin{pmatrix}1 & 4\\ -1 & 5\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = 3\begin{pmatrix}x\\y\end{pmatrix}

is equivalent to the equation x = 2y. Consequently, the eigenspace corresponding to the eigenvalue 3 is

E3 = Span\left\{\begin{pmatrix}2\\1\end{pmatrix}\right\}.

Because dim E3 < 2, the endomorphism f is not diagonalizable.
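A brief numerical check of this example (a sketch, assuming NumPy): the rank of A − 3I shows that E3 is one-dimensional, while (A − 3I)^2 = 0, which is the minimal-polynomial picture behind the failure of diagonalizability.

import numpy as np

A = np.array([[1.0, 4.0],
              [-1.0, 5.0]])
N = A - 3*np.eye(2)
print(np.linalg.matrix_rank(N))   # 1, so dim E_3 = 2 - 1 = 1 < 2
print(N @ N)                      # zero matrix: (t - 3)^2 annihilates f, (t - 3) does not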
Example 4.1.53. Find f^n if f : R2 → R2 is the endomorphism defined by

f\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}4 & 1\\ 2 & 3\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.

Proof. The eigenvalues are given by the equation (4 − t)(3 − t) − 2 = 0 and are 2 and 5. It is easy to verify that

E2 = Span\left\{\begin{pmatrix}1\\-2\end{pmatrix}\right\} \quad and \quad E5 = Span\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\}.

Since

\begin{pmatrix}4 & 1\\ 2 & 3\end{pmatrix}\begin{pmatrix}1 & 1\\ -2 & 1\end{pmatrix} = \begin{pmatrix}1 & 1\\ -2 & 1\end{pmatrix}\begin{pmatrix}2 & 0\\ 0 & 5\end{pmatrix},
we have
\begin{pmatrix}4 & 1\\ 2 & 3\end{pmatrix} = \begin{pmatrix}1 & 1\\ -2 & 1\end{pmatrix}\begin{pmatrix}2 & 0\\ 0 & 5\end{pmatrix}\begin{pmatrix}1 & 1\\ -2 & 1\end{pmatrix}^{-1}

and thus

\begin{pmatrix}4 & 1\\ 2 & 3\end{pmatrix}^n = \begin{pmatrix}1 & 1\\ -2 & 1\end{pmatrix}\begin{pmatrix}2^n & 0\\ 0 & 5^n\end{pmatrix}\begin{pmatrix}1 & 1\\ -2 & 1\end{pmatrix}^{-1} = \frac{1}{3}\begin{pmatrix}1 & 1\\ -2 & 1\end{pmatrix}\begin{pmatrix}2^n & 0\\ 0 & 5^n\end{pmatrix}\begin{pmatrix}1 & -1\\ 2 & 1\end{pmatrix}.

Consequently

f^n\begin{pmatrix}x\\y\end{pmatrix} = \frac{1}{3}\begin{pmatrix}2^n + 2\cdot 5^n & -2^n + 5^n\\ -2^{n+1} + 2\cdot 5^n & 2^{n+1} + 5^n\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.
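The closed form for f^n can be spot-checked numerically; the sketch below (not part of the text, assuming NumPy) compares it with a directly computed matrix power for one value of n:

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
n = 6
closed = np.array([[2**n + 2*5**n,       -2**n + 5**n],
                   [-2**(n+1) + 2*5**n,  2**(n+1) + 5**n]]) / 3
print(np.allclose(np.linalg.matrix_power(A, n), closed))   # expected: True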
Example 4.1.54. Show that the endomorphism f : R3 → R3 defined by

f\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}3 & 1 & 2\\ 1 & 3 & 2\\ 1 & 1 & 4\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}

is diagonalizable.

Solution. According to Example 4.1.28 we have cf(t) = (2 − t)^2(6 − t). It is enough to show that dim ker(f − 2) = 2. The equation

\begin{pmatrix}3 & 1 & 2\\ 1 & 3 & 2\\ 1 & 1 & 4\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix} = 2\begin{pmatrix}x\\y\\z\end{pmatrix}

is equivalent to the equation x + y + 2z = 0. Consequently,

\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}-y-2z\\y\\z\end{pmatrix} = y\begin{pmatrix}-1\\1\\0\end{pmatrix} + z\begin{pmatrix}-2\\0\\1\end{pmatrix},

which means that

E2 = Span\left\{\begin{pmatrix}-1\\1\\0\end{pmatrix}, \begin{pmatrix}-2\\0\\1\end{pmatrix}\right\}.

Since the vectors \begin{pmatrix}-1\\1\\0\end{pmatrix} and \begin{pmatrix}-2\\0\\1\end{pmatrix} are linearly independent, dim E2 = 2.
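The dimension count can also be confirmed with a rank computation (a small sketch, assuming NumPy), which checks condition (d) of Theorem 4.1.51 directly:

import numpy as np

A = np.array([[3.0, 1.0, 2.0],
              [1.0, 3.0, 2.0],
              [1.0, 1.0, 4.0]])
print(np.linalg.matrix_rank(A - 2*np.eye(3)))   # 1, so dim E_2 = 3 - 1 = 2
print(np.linalg.matrix_rank(A - 6*np.eye(3)))   # 2, so dim E_6 = 1; 2 + 1 = 3 = dim V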
Example 4.1.55. Let V be an n-dimensional vector space and let f : V → V be an endomorphism such that mf(t) = t(t + 1) and dim ker f = k. Determine cf.

Solution. The endomorphism f is diagonalizable by Theorem 4.1.51. The only eigenvalues are 0 and −1. Let {v1, . . . , vn} be a basis of eigenvectors such that {v1, . . . , vk} is a basis of ker f = E0 and {v_{k+1}, . . . , vn} is a basis of E−1. Then

f(vj) = 0 if 1 ≤ j ≤ k and f(vj) = −vj if k + 1 ≤ j ≤ n.

Consequently cf(t) = (−1)^n t^k (1 + t)^{n−k}.
4.2 Jordan canonical form
Diagonalizable endomorphisms have many good properties that are useful in applications. However, we often have to deal with endomorphisms that are not diagonalizable. In this section we study the structure of such endomorphisms.
4.2.1 Jordan canonical form when the characteristic polynomial has one root
We begin by considering two examples.
Example 4.2.1. Let f : R2 → R2 be the endomorphism defined by

f\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}1 & 4\\ -1 & 5\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.

Show that f is not diagonalizable and find a basis B of R2 such that the matrix of f in B is \begin{pmatrix}3 & 1\\ 0 & 3\end{pmatrix}.

Proof. First we find that cf(t) = (t − 3)^2 and ker(f − 3) = Span\left\{\begin{pmatrix}2\\1\end{pmatrix}\right\}. This shows that f is not diagonalizable because dim ker(f − 3) = 1.

Next we choose a vector that is not in ker(f − 3), for example \begin{pmatrix}1\\0\end{pmatrix}. The vector (f − 3)\begin{pmatrix}1\\0\end{pmatrix} is in ker(f − 3) because, according to the Cayley–Hamilton theorem, we have

(f − 3)\left((f − 3)\begin{pmatrix}1\\0\end{pmatrix}\right) = (f − 3)^2\begin{pmatrix}1\\0\end{pmatrix} = 0.

Now it is easy to verify that the set

\left\{(f − 3)\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}\right\} = \left\{\begin{pmatrix}1 & 4\\ -1 & 5\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} - 3\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}\right\} = \left\{\begin{pmatrix}-2\\-1\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}\right\}

is a basis with the desired property. Note that mf(t) = (t − 3)^2.
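A quick numerical confirmation of the change of basis (a sketch, assuming NumPy): with the two basis vectors as the columns of a matrix P, the matrix P^{-1}AP should be the required block.

import numpy as np

A = np.array([[1.0, 4.0],
              [-1.0, 5.0]])
# columns of P: (f - 3)v and v, where v = (1, 0)
P = np.array([[-2.0, 1.0],
              [-1.0, 0.0]])
print(np.linalg.inv(P) @ A @ P)   # expected: [[3, 1], [0, 3]]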
Example 4.2.2. Let V be a vector space such that dim V = 3. Let f : V → V be a linear transformation and let α ∈ K. If mf(t) = (t − α)^3, show that there is a basis P of V such that the P-matrix of the linear transformation f is

\begin{pmatrix}\alpha & 1 & 0\\ 0 & \alpha & 1\\ 0 & 0 & \alpha\end{pmatrix}.

Solution. Let v be a vector in ker(f − α)^3 = V that is not in ker(f − α)^2. Then (f − α)v ∈ ker(f − α)^2 and (f − α)v ∉ ker(f − α). We will show that

P = {(f − α)^2 v, (f − α)v, v}

is a basis with the desired property. Let x1, x2, x3 ∈ K be such that

x1 (f − α)^2 v + x2 (f − α)v + x3 v = 0.    (4.4)

By applying (f − α)^2 to (4.4) we get x3 (f − α)^2 v = 0 and consequently x3 = 0. Next, by applying f − α to the equality

x1 (f − α)^2 v + x2 (f − α)v = 0

we get x2 (f − α)^2 v = 0, which gives us x2 = 0. Now (4.4) becomes

x1 (f − α)^2 v = 0,

which gives us x1 = 0. This shows that the vectors (f − α)^2 v, (f − α)v, and v are linearly independent and consequently P = {(f − α)^2 v, (f − α)v, v} is a basis of V. Since

f((f − α)^2 v) = (f − α)(f − α)^2 v + α(f − α)^2 v = α(f − α)^2 v (because (f − α)^3(v) = 0),
f((f − α)v) = (f − α)(f − α)v + α(f − α)v = (f − α)^2 v + α(f − α)v,
f(v) = (f − α)v + αv,

the P-matrix of f is \begin{pmatrix}\alpha & 1 & 0\\ 0 & \alpha & 1\\ 0 & 0 & \alpha\end{pmatrix}.
The minimal polynomial of an endomorphism f : V → V on a finite dimensional vector space is defined as the unique monic polynomial mf of smallest positive degree such that mf (f ) = 0, that is, mf (f )(v) = 0 for every v ∈ V. Now we consider a similar property relative to a fixed v ∈ V.
Example 4.2.3. Let f : V → V be an endomorphism on a vector space V and let x ∈ V. If (f − α)3 (x) = 0, show that the subspace Span x, f (x), f 2 (x) is f -invariant. Proof. The equality (f − α)3 (x) = (f 3 − 3αf 2 + 3α2 f − α3 )(x) = 0 gives us f 3 (x) = (3αf 2 − 3α2 f + α3 )(x) and then, for every integer n ≥ 3, we get f n (x) = (3αf n−1 − 3α2 f n−2 + α3 f n−3 )(x). This implies by induction that for every nonnegative integer n we have f n (x) ∈ Span x, f (x), f 2 (x) . Our result is an immediate consequence of this fact.
Theorem 4.2.4. Let V be a finite dimensional vector space, f : V → V a nonzero endomorphism, and v a nonzero vector in V. Then (a) There is a unique monic polynomial mf,v of smallest positive degree such that mf,v (f )(v) = 0; (b) mf,v divides every polynomial p such that p(f )(v) = 0; (c) If the degree of mf,v is k, then the vectors v, f (v), . . . , f k−1 (v) are linearly independent and Span v, f (v), . . . , f k−1 (v) is the smallest f -invariant subspace of V containing v. Proof. (a) and (b) can be obtained as in the proof of Theorem 4.1.39. If the degree of mf,v is k, then the vectors v, f (v), . . . , f k−1 (v) are linearly independent because, if x0 , . . . , xk−1 ∈ K are not all 0, then the degree of the polynomial x0 + x1 t + · · · + xk−1 tk−1is strictly less than the degree of mf,v . To show that the subspace Span v, f (v), . . . , f k−1 (v) is f -invariant it is enough to show that f n (v) is in this subspace for every positive integer n. Indeed, there are polynomials q and r such that tn = mf,v (t)q(t) + r(t), where r = 0 or r 6= 0 and the degree of r is strictly less than the degree of mf,v . Thus f n (v) = r(f )(v) and the subspace Span v, f (v), . . . , f k−1 (v) is f -invariant. Clearly, if U is an f -invariant subspace of V such that v ∈ U, then Span v, f (v), . . . , f k−1 (v) ⊆ U. Definition 4.2.5. Let V be a vector space, f : V → V a nonzero endomorphism and v a nonzero vector in V such that the degree of mf,v is k. The f -invariant subspace Span{v, f (v), . . . , f k−1 (v)} is called the cyclic subspace of f associated with v and is denoted by Vf,v .
Theorem 4.2.6. Let V be a finite dimensional vector space and let f : V → V be an endomorphism such that mf (t) = (t − α)k for some integer k ≥ 1 and some α ∈ K. Then there is a nonzero vector v ∈ V such that mf,v = mf .
Proof. Let v ∈ V be such that (f − α)^k v = 0 and (f − α)^{k−1} v ≠ 0. Then mf(t) = (t − α)^k = mf,v(t).
Theorem 4.2.7. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. For a nonzero vector v ∈ V the following conditions are equivalent.

(a) mf,v = (t − α)^k for some integer k ≥ 1 and some α ∈ K;

(b) The set B = {(f − α)^{k−1} v, . . . , (f − α)v, v} is a basis of Vf,v;

(c) There is a basis B = {v1, . . . , v_{k−1}, v} of an f-invariant subspace U of V such that the B-matrix of the endomorphism g : U → U induced by f is

\begin{pmatrix} \alpha & 1 & 0 & \cdots & 0 & 0 \\ 0 & \alpha & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \alpha & 1 \\ 0 & 0 & 0 & \cdots & 0 & \alpha \end{pmatrix}.
Proof. Suppose that mf,v = (t − α)k . Because for every j ∈ {0, . . . , k − 1} we have f j (v) = (f − α + α)j (v) = (f − α)j (v) + jα(f − α)j−1 (v) + · · · + jαj−1 (f − α)(v) + αj v,
we get Vf,v = Span{v, f (v), . . . , f k−1 (v)} ⊆ Span{v, (f − α)v, . . . , (f − α)k−1 v}. Hence Vf,v = Span{v, f (v), . . . , f k−1 (v)} = Span{v, (f − α)v, . . . , (f − α)k−1 v}, because the vectors v, f (v), . . . , f k−1 (v) are linearly independent. Consequently B = (f − α)k−1 v, . . . , (f − α)v, v
is a basis of Vf,v . This proves (a) implies (b). Suppose now that B = (f − α)k−1 v, . . . , (f − α)v, v is a basis of Vf,v . In order to find the B-matrix of f we note that f ((f − α)k−1 (v)) = (f − α)k (v) + α(f − α)k−1 (v) = α(f − α)k−1 (v)
and
f((f − α)^j(v)) = (f − α)^{j+1}(v) + α(f − α)^j(v) = 1 · (f − α)^{j+1}(v) + α(f − α)^j(v)

for every j ∈ {0, . . . , k − 2}. To prove that (b) implies (c) we take U = Vf,v.

Now suppose that there is a basis B = {v1, . . . , v_{k−1}, v} of an f-invariant subspace U of V such that the B-matrix of the endomorphism g : U → U induced by f is

\begin{pmatrix} \alpha & 1 & 0 & \cdots & 0 & 0 \\ 0 & \alpha & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \alpha & 1 \\ 0 & 0 & 0 & \cdots & 0 & \alpha \end{pmatrix}.

Then
(f − α)v1 = 0
(f − α)v2 = v1 .. .
(f − α)vk−1 = vk−2
(f − α)v = vk−1 .
Consequently (f − α)k−1 v = v1 6= 0
and (f − α)k v = 0.
Hence mf,v = (t − α)k , which proves that (c) implies (a).
Example 4.2.8. Let f : P3(R) → P3(R) be the endomorphism defined by f(p) = p′. Determine mf, V_{f,x²+1} and m_{f,x²+1}.

Proof. Since f(x² + 1) = 2x, f(2x) = 2, and f(2) = 0, we have m_{f,x²+1} = t³, V_{f,x²+1} = P2(R), and mf(t) = t⁴.
Definition 4.2.9. Let α ∈ K and let k be a positive integer. The k × k matrix

\begin{pmatrix} \alpha & 1 & 0 & \cdots & 0 & 0 \\ 0 & \alpha & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \alpha & 1 \\ 0 & 0 & 0 & \cdots & 0 & \alpha \end{pmatrix}
is called a Jordan block and is denoted by Jα,k . The 1 × 1 matrix [α] is also considered a Jordan block.
Example 4.2.10. Let f : R3 → R3 be the endomorphism defined by

f\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}2 & 5 & 1\\ 0 & 5 & -1\\ -1 & 2 & 5\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}.

Show that mf(t) = (t − 4)^3 and find a basis of R3 such that the matrix of f in that basis is the Jordan block

\begin{pmatrix}4 & 1 & 0\\ 0 & 4 & 1\\ 0 & 0 & 4\end{pmatrix}.
Proof. First, proceeding as in Example 4.1.28, we get cf(t) = (4 − t)^3. Hence, by the Cayley–Hamilton theorem, we have (f − 4)^3 = 0. To determine ker(f − 4)^2 we solve the equation

\left(\begin{pmatrix}2 & 5 & 1\\ 0 & 5 & -1\\ -1 & 2 & 5\end{pmatrix} - \begin{pmatrix}4 & 0 & 0\\ 0 & 4 & 0\\ 0 & 0 & 4\end{pmatrix}\right)^2 \begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix},

that is,

\begin{pmatrix}3 & -3 & -6\\ 1 & -1 & -2\\ 1 & -1 & -2\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix},
which gives us x − y − 2z = 0. Consequently

ker(f − 4)^2 = \left\{\begin{pmatrix}x\\y\\z\end{pmatrix} \in R^3 : x − y − 2z = 0\right\}.

Because ker(f − 4)^2 ≠ V, we have mf(t) = (t − 4)^3.

Since \begin{pmatrix}0\\1\\0\end{pmatrix} ∉ ker(f − 4)^2, according to Theorem 4.2.7, the set

\left\{(f − 4)^2\begin{pmatrix}0\\1\\0\end{pmatrix}, (f − 4)\begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}\right\}
= \left\{\begin{pmatrix}3 & -3 & -6\\ 1 & -1 & -2\\ 1 & -1 & -2\end{pmatrix}\begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}-2 & 5 & 1\\ 0 & 1 & -1\\ -1 & 2 & 1\end{pmatrix}\begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}\right\}
= \left\{\begin{pmatrix}-3\\-1\\-1\end{pmatrix}, \begin{pmatrix}5\\1\\2\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}\right\}

is a basis satisfying the required condition. Note that m_{f,(0,1,0)^T}(t) = (t − 4)^3.
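The same kind of numerical check as before applies here (a sketch, assuming NumPy): with the three basis vectors as the columns of P, the matrix P^{-1}AP should be the Jordan block J_{4,3}.

import numpy as np

A = np.array([[2.0, 5.0, 1.0],
              [0.0, 5.0, -1.0],
              [-1.0, 2.0, 5.0]])
B = A - 4*np.eye(3)
v = np.array([0.0, 1.0, 0.0])
P = np.column_stack([B @ (B @ v), B @ v, v])   # basis {(f-4)^2 v, (f-4)v, v}
print(np.linalg.inv(P) @ A @ P)                # expected: [[4,1,0],[0,4,1],[0,0,4]]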
In all examples discussed so far the matrix of the linear transformation had one Jordan block. Now we consider two examples of transformations with two Jordan blocks. In the next several examples we added horizontal and vertical lines in the matrix to visualize and separate different Jordan blocks. The lines have no other mathematical meaning.
Example 4.2.11. Let f : R3 → R3 be the endomorphism defined by

f\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}5 & 2 & 4\\ 0 & 7 & 0\\ -1 & 1 & 9\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}.

Find a basis of R3 such that the matrix of f in that basis is

\begin{pmatrix}7 & 1 & 0\\ 0 & 7 & 0\\ 0 & 0 & 7\end{pmatrix}.
Proof. First we find that cf(t) = (7 − t)^3. We determine ker(f − 7) by solving the equation

\left(\begin{pmatrix}5 & 2 & 4\\ 0 & 7 & 0\\ -1 & 1 & 9\end{pmatrix} - \begin{pmatrix}7 & 0 & 0\\ 0 & 7 & 0\\ 0 & 0 & 7\end{pmatrix}\right)\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix},

that is,

\begin{pmatrix}-2 & 2 & 4\\ 0 & 0 & 0\\ -1 & 1 & 2\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix},

which gives us x − y − 2z = 0. Consequently

ker(f − 7) = \left\{\begin{pmatrix}x\\y\\z\end{pmatrix} \in R^3 : x − y − 2z = 0\right\}.
Since dim ker(f − 7) = 2, the endomorphism f is not diagonalizable. It is easy to see that mf(t) = (t − 7)^2. Because \begin{pmatrix}0\\0\\1\end{pmatrix} ∉ ker(f − 7) and dim ker(f − 7) = 2, we have

ker(f − 7) ⊕ Span\left\{\begin{pmatrix}0\\0\\1\end{pmatrix}\right\} = R^3.    (4.5)

Now, since \begin{pmatrix}1\\1\\0\end{pmatrix} ∈ ker(f − 7) and the eigenvectors \begin{pmatrix}1\\1\\0\end{pmatrix} and (f − 7)\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}4\\0\\2\end{pmatrix} are linearly independent, it is easy to verify, using (4.5), that the set

\left\{(f − 7)\begin{pmatrix}0\\0\\1\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}, \begin{pmatrix}1\\1\\0\end{pmatrix}\right\} = \left\{\begin{pmatrix}4\\0\\2\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}, \begin{pmatrix}1\\1\\0\end{pmatrix}\right\}

is a basis satisfying the required condition. Note that m_{f,(0,0,1)^T}(t) = (t − 7)^2 = mf(t).
Example 4.2.12. Let V be a vector space such that dim V = 5. Let f : V → V be an endomorphism and let α ∈ K. We assume that cf(t) = (α − t)^5 and mf(t) = (t − α)^3. If

dim ker(f − α) = 2 and dim ker(f − α)^2 = 4,

show that there is a basis P of V such that the P-matrix of f is

\begin{pmatrix} \alpha & 1 & 0 & 0 & 0 \\ 0 & \alpha & 1 & 0 & 0 \\ 0 & 0 & \alpha & 0 & 0 \\ 0 & 0 & 0 & \alpha & 1 \\ 0 & 0 & 0 & 0 & \alpha \end{pmatrix}.
Solution. Let u be a vector in ker(f −α)3 which is not in ker(f −α)2 . Then the vector (f −α)u is in ker(f −α)2 and not in ker(f −α). Because dim ker(f −α)2 = 4 and dim ker(f − α) = 2, we can choose a vector v ∈ ker(f − α)2 such that {(f − α)u, v} is a basis of a complement to ker(f − α) in ker(f − α)2 . We will show that (f − α)2 u, (f − α)u, u, (f − α)v, v
is a basis that has the desired property. First we show that the vectors (f −α)2 u, (f −α)u, u, (f −α)v, v are linearly independent. If x1 (f − α)2 u + x2 (f − α)u + x3 u + x4 (f − α)v + x5 v = 0
(4.6)
for some x1 , x2 , x3 , x4 , x5 ∈ K, by applying (f − α)2 to the above equation we obtain x3 (f − α)2 u = 0 and consequently x3 = 0. Next, by applying f − α to (4.6) we get x2 (f − α)2 u + x5 (f − α)v = (f − α)(x2 (f − α)u + x5 v) = 0 which yields that x2 = x5 = 0 because {(f − α)u, v} is a basis of a complement to ker(f − α) in ker(f − α)2 . Now (4.6) becomes x1 (f − α)2 u + x4 (f − α)v = (f − α)(x1 (f − α)u + x4 v) = 0 which gives us x1 = x4 = 0 again because {(f − α)u, v} is a basis of a complement to ker(f − α) in ker(f − α)2 . We have proved that the vectors (f − α)2 u, (f − α)u, u, (f − α)v, v are linearly independent and consequently P = (f − α)2 u, (f − α)u, u, (f − α)v, v
is a basis of V. Since f ((f − α)2 u) = (f − α)(f − α)2 u + α(f − α)2 u = α(f − α)2 u
f ((f − α)u) = (f − α)(f − α)u + α(f − α)u = (f − α)2 u + α(f − α)u f (u) = (f − α)u + αu f ((f − α)v) = (f − α)(f − α)v + α(f − α)v = α(f − α)v f (v) = (f − α)v + αv
the P-matrix of f is

\begin{pmatrix} \alpha & 1 & 0 & 0 & 0 \\ 0 & \alpha & 1 & 0 & 0 \\ 0 & 0 & \alpha & 0 & 0 \\ 0 & 0 & 0 & \alpha & 1 \\ 0 & 0 & 0 & 0 & \alpha \end{pmatrix}.
Note that the assumption that dim ker(f − α)2 = 4 is unnecessary because, if u and v were linearly independent vectors in a complement of the vector subspace ker(f − α)2 , then we could show, as before, that the vectors (f − α)2 u, (f − α)u, u, (f − α)2 v, (f − α)v, v are linearly independent, which is not possible because dim V = 5. The next result plays a central role this and the following sections. The proof is difficult in comparison with the other proofs presented in this book so far and it can be skipped at the first reading. On the other hand, understanding the proof, which uses many ideas presented in this book, is a good indication that you understand linear algebra at the level expected in a second course. Lemma 4.2.13 (Fundamental lemma). Let V be an n-dimensional vector space and let f : V → V be an endomorphism. Let v ∈ V be such that mf (t) = mf,v (t) = a0 + a1 t + · · · + ak−1 tk−1 + tk , for some k ∈ {1, 2, . . . , n} and a0 , a1 , . . . , ak−1 ∈ K. Then there is an f -invariant subspace W such that Span v, f (v), . . . , f k−1 (v) ⊕ W = V. Proof. Let v, f (v), . . . , f k−1 (v), vk+1 , . . . , vn be a basis of V and let g : V → K be the linear functional such that g(f j (v)) = 0 for j < k − 1, g(f k−1 (v)) = 1, and g(vj ) = 0 for j > k. We define W = w ∈ V : g(f j (w)) = 0, j ∈ {0, 1, 2, . . . } .
Clearly, W is a subspace of V. Because for every j ∈ {0, 1, 2, . . . } and every w ∈ W we have g(f j (f (w))) = g(f j+1 (w)) = 0, the subspace W is f -invariant. Now we prove that the sum Span{v, . . . , f k−1 (v)} + W is direct. Suppose w = x1 v + x2 f (v) + · · · + xk f k−1 (v) ∈ W
for some x1 , . . . , xk ∈ K. Then g(w) = xk = 0 and thus g(f (w)) = xk−1 = 0, because f (w) = x1 f (v) + · · · + xk−1 f k−1 (v) + xk f k (v) = x1 f (v) + · · · + xk−1 f k−1 (v). Continuing this way we show that xk = xk−1 = xk−2 = · · · = x1 = 0, so the sum is direct. Now, since mf (f ) = mf,v (f ) = 0, we have f k = −a0 − a1 f − · · · − ak−1 f k−1 and then, for every integer l ≥ 0,
f k+l = −a0 f l − a1 f l+1 − · · · − ak−1 f l+k−1 ,
which yields
Span f j , j ∈ {0, 1, 2, . . . } = Span f j , j ∈ {0, 1, . . . , k − 1} .
Consequently
W = w ∈ V : g(f j (w)) = 0, j ∈ {0, . . . , k − 1} = w ∈ V : (gf j )(w) = 0, j ∈ {0, . . . , k − 1} .
Next we show that the functionals g, gf, . . . , gf k−1 are linearly independent. Suppose x1 g + x2 gf + · · · + xk gf k−1 = 0
for some x1 , . . . , xk ∈ K. Applying this equality successively to v, f (v), f 2 (v), . . . , f k−1 (v) we get g(x1 v + · · · + xk f k−1 (v)) = 0, g(x1 f (v) + · · · + xk f k (v)) = 0, .. .
g(x1 f k−1 (v) + · · · + xk f 2k−2 (v)) = 0,
which can be written as
g(x1 v + · · · + xk f k−1 (v)) = 0,
g(f (x1 v + · · · + xk f k−1 (v))) = 0, .. . g(f k−1 (x1 v + · · · + xk f k−1 (v))) = 0.
This shows that x1 v+· · ·+xk f k−1 (v) ∈ W and thus x1 = · · · = xk = 0, because the sum Span{v, . . . , f k−1 (v)} + W is direct. Consequently the functionals g, gf, . . . , gf k−1 are linearly independent. By Theorem 2.4.14, we get dim W = dim V − k, which completes the proof.
Theorem 4.2.14. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If mf (t) = (t − α)n for some integer n ≥ 1 and α ∈ K, then there are nonzero vectors v1 , . . . , vr ∈ V such that V = Vf,v1 ⊕ · · · ⊕ Vf,vr and mf,v1 = (t − α)k1 , . . . , mf,vr = (t − α)kr , where k1 , . . . , kr are integers such that 1 ≤ kr ≤ · · · ≤ k1 = n. Proof. Let v1 ∈ V be such that (f − α)n−1 (v1 ) 6= 0. Then mf,v1 = (t − α)n . If V = Vf,v1 and we take k1 = n, then we are done. If V 6= Vf,v1 , then there is an f -invariant subspace W of dimension dim V − n such that V = Vf,v1 ⊕ W, by Lemma 4.2.13. Let g : W → W be the endomorphism induced by f on W. Clearly, mf (t) = (t − α)n is a g-annihilator. Consequently, mg (t) divides mf (t) = (t − α)n . This means that mg (t) = (t − α)m where m ≤ n. Now, since dim W = dim V − n < dim V, we can finish the proof using induction. From the above theorem and Theorem 4.2.7 we obtain the following important result. Corollary 4.2.15. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If mf (t) = (t − α)n for some integer n ≥ 1 and α ∈ K, then there are integers 1 ≤ kr ≤ · · · ≤ k1 = n and nonzero vectors v1 , . . . , vr such that V=
r M j=1
Span (f − α)kj −1 vj , (f − α)kj −2 vj , . . . , (f − α)vj , vj
where for every j ∈ {1, . . . , r} we have (f − α)kj vj = 0 and (f − α)kj −1 vj 6= 0. Moreover, the set B=
r [ (f − α)kj −1 vj , (f − α)kj −2 vj , . . . , (f − α)vj , vj
j=1
is a basis of V and the B-matrix of f is Jα,k1 Jα,k2
0
0 ..
. Jα,kr
.
Definition 4.2.16. Let α ∈ K. We say that a matrix is in α-Jordan canonical form if it has the form

\begin{pmatrix} J_{\alpha,k_1} & & & \\ & J_{\alpha,k_2} & & \\ & & \ddots & \\ & & & J_{\alpha,k_r} \end{pmatrix}

where k1, . . . , kr are integers such that k1 ≥ · · · ≥ kr ≥ 1 and J_{α,k_1}, J_{α,k_2}, . . . , J_{α,k_r} are Jordan blocks.
If it is clear from the context what α is, then instead of "α-Jordan canonical form" we simply say "Jordan canonical form". Here is an example of a matrix in α-Jordan canonical form:

\begin{pmatrix}
\alpha & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \alpha & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \alpha & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \alpha & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \alpha & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \alpha & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \alpha & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \alpha & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \alpha
\end{pmatrix}

In this example k1 = k2 = 3, k3 = 2, k4 = 1 and

J_{\alpha,k_1} = J_{\alpha,k_2} = \begin{pmatrix} \alpha & 1 & 0 \\ 0 & \alpha & 1 \\ 0 & 0 & \alpha \end{pmatrix}, \quad J_{\alpha,k_3} = \begin{pmatrix} \alpha & 1 \\ 0 & \alpha \end{pmatrix}, \quad J_{\alpha,k_4} = \begin{pmatrix} \alpha \end{pmatrix}.
The result in the following theorem is useful when computing Jordan canonical forms.
Theorem 4.2.17. Let V be a vector space, f : V → V an endomorphism, and α ∈ K. If, for some integer k ≥ 2, v1 , . . . , vm ∈ ker(f − α)k are linearly independent vectors such that ker(f − α)k = ker(f − α)k−1 ⊕ Kv1 ⊕ · · · ⊕ Kvm , then the vectors (f −α)v1 , . . . , (f −α)vm are linearly independent vectors in ker(f − α)k−1 and the sum ker(f − α)k−2 + K(f − α)v1 + · · · + K(f − α)vm is direct. Proof. If w ∈ ker(f − α)k−2 and w + x1 (f − α)v1 + · · · + xm (f − α)vm = 0 for some x1 , . . . , xm ∈ K, then
0 = (f − α)k−2 w + x1 (f − α)k−1 v1 + · · · + xm (f − α)k−1 vm = x1 (f − α)k−1 v1 + · · · + xm (f − α)k−1 vm = (f − α)k−1 (x1 v1 + · · · + xm vm )
and thus x1 v1 + · · · + xm vm ∈ ker(f − α)^{k−1}.
Since ker(f − α)k = ker(f − α)k−1 ⊕ Kv1 ⊕ · · · ⊕ Kvm , we must have x1 v1 + · · · + xm vm = 0. Thus x1 = · · · = xm = 0, because the vectors v1 , . . . , vm are linearly independent, and consequently w = 0.
Example 4.2.18. Let V be a vector space and let f : V → V be an endomorphism such that cf(t) = (α − t)^9 and mf(t) = (t − α)^3 for some α ∈ K. If dim ker(f − α) = 4 and dim ker(f − α)^2 = 7, show that there is a basis B of V such that the B-matrix of f is

A = \begin{pmatrix}
\alpha & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \alpha & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \alpha & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \alpha & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \alpha & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \alpha & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \alpha & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \alpha & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \alpha
\end{pmatrix}.
Solution. Let u and v be linearly independent vectors in V such that ker(f − α)2 ⊕ Ku ⊕ Kv = ker(f − α)3 = V. According to Theorem 4.2.17 the sum ker(f − α) + K(f − α)u + K(f − α)v is direct. Note that ker(f − α) + K(f − α)u + K(f − α)v ⊆ ker(f − α)2 . Because dim ker(f − α) = 4 and dim ker(f − α)2 = 7, there is a p ∈ ker(f − α)2 such that ker(f − α) ⊕ K(f − α)u ⊕ K(f − α)v ⊕ Kp = ker(f − α)2 . The vectors (f − α)2 u, (f − α)2 v, (f − α)p are in ker(f − α), that is, are eigenvectors of f and they are linearly independent, by Theorem 4.2.17 (with k = 2). Because dim ker(f − α) = 4 we can find an eigenvector q such that K(f − α)2 u ⊕ K(f − α)2 v ⊕ K(f − α)p ⊕ Kq = ker(f − α). Consequently, V = K(f − α)2 u ⊕ K(f − α)u ⊕ Ku ⊕ K(f − α)2 v ⊕ K(f − α)v ⊕ Kv ⊕ K(f − α)p ⊕ Kp ⊕ Kq and B = (f − α)2 u, (f − α)u, u, (f − α)2 v, (f − α)v, v, (f − α)p, p, q
is a basis of V. It is easy to verify that A is the B-matrix of f .
Example 4.2.19. Let V be a vector space and let f : V → V be an endomorphism. If mf (t) = t3 and there are positive integers q, r, s such that dim V = 3q + 2r + s, dim ker f 2 = 2q + 2r + s, and dim ker f = q + r + s, show that there are linearly independent vectors v1 , . . . , vq , w1 , . . . , wr , u1 , . . . , us such that V is a direct sum of the following q + r + s cyclic subspaces Span f 2 (v1 ), f (v1 ), v1 , . . . , Span f 2 (vq ), f (vq ), vq Span {f (w1 ), w1 } , . . . , Span {f (wr ), wr } Span{u1 }, . . . , Span{us }
Solution. There are linearly independent vectors v1 , . . . , vq such that V = ker f 3 = ker f 2 ⊕ Kv1 ⊕ · · · ⊕ Kvq . By Theorem 4.2.17, the sum ker f + Kf (v1 ) + · · · + Kf (vq ) is direct and it is a subspace of ker f 2 . Let w1 , . . . , wr ∈ ker f 2 be linearly independent vectors such that ker f 2 = ker f ⊕ Kf (v1 ) ⊕ · · · ⊕ Kf (vq ) ⊕ Kw1 ⊕ · · · ⊕ Kwr . Again by Theorem 4.2.17, the vectors f 2 (v1 ), . . . , f 2 (vq ), f (w1 ), . . . , f (wr ) are linearly independent vectors in ker f . Now, let u1 , . . . , us be linearly independent vectors such that ker f = Kf 2 (v1 ) ⊕ · · · ⊕ Kf 2 (vq ) ⊕ Kf (w1 ) ⊕ · · · ⊕ Kf (wr ) ⊕ Ku1 ⊕ · · · ⊕ Kus . Consequently, the vector space V is a direct sum of the following q + r + s cyclic subspaces: Span f 2 (v1 ), f (v1 ), v1 , . . . , Span f 2 (vq ), f (vq ), vq Span {f (w1 ), w1 } , . . . , Span {f (wr ), wr } Span{u1 }, . . . , Span{us }
It is worth noting that Theorem 4.2.14 and Corollary 4.2.15 can be obtained from Theorem 4.2.17. In the next theorem we give a slightly different formulation of these results and use Theorem 4.2.17 to prove it.

Theorem 4.2.20. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If mf(t) = (t − α)^k for some α ∈ K and integer k ≥ 1, then V is a direct sum of subspaces of the form

Span{(f − α)^{m−1} v, . . . , (f − α)v, v}

for some integer m ≥ 1 and v ∈ V such that mf,v = (t − α)^m.
Proof. Recall that mf (t) = (t − α)k means that V = ker(f − α)k and V 6= ker(f − α)k−1 . Similarly, mf,v = (t − α)m means that (f − α)m v = 0 and (f − α)m−1 v 6= 0. Let Bk = {vk,1 , . . . , vk,mk } be a basis of a complement Ck of ker(f − α)k−1 in ker(f − α)k = V, that is, V = ker(f − α)k = ker(f − α)k−1 ⊕ Ck .
By Theorem 4.2.17, there are vectors vk−1,1 , . . . , vk−1,mk−1 ∈ ker(f − α)k−1 such that the set Bk−1 consisting of vectors (f − α)vk,1 , . . . , (f − α)vk,mk , vk−1,1 , . . . , vk−1,mk−1 , is a basis of a complement Ck−1 of ker(f −α)k−2 in ker(f −α)k−1 . Consequently, V = ker(f − α)k−2 ⊕ Ck−1 ⊕ Ck . Next, there are vectors vk−2,1 , . . . , vk−2,mk−2 ∈ ker(f − α)k−2 such that the set Bk−2 consisting of vectors (f − α)2 vk,1 , . . . , (f − α)2 vk,mk , (f − α)vk−1,1 , . . . , (f − α)vk−1,mk−1 , vk−2,1 , . . . , vk−2,mk−2 , is a basis of a complement Ck−2 of ker(f − α)k−3 in ker(f − α)k−2 , again by Theorem 4.2.17, and we have V = ker(f − α)k−3 ⊕ Ck−2 ⊕ Ck−1 ⊕ Ck . Continuing as above we eventually obtain vectors v2,1 , . . . , v2,m2 ∈ ker(f − α)2 such that the set B2 consisting of vectors (f − α)k−2 vk,1 , . . . , (f − α)k−2 vk,mk , (f − α)k−3 vk−1,1 , . . . , (f − α)k−3 vk−1,mk−1 , .. . (f − α)v3,1 , . . . , (f − α)v3,m3 , v2,1 , . . . , v2,m2 , is a basis of a complement C2 of ker(f − α) in ker(f − α)2 . Finally, there are eigenvectors v1,1 , . . . , v1,m1 of f such that the set B1 consisting of vectors (f − α)k−1 vk,1 , . . . , (f − α)k−1 vk,mk , .. . (f − α)v2,1 , . . . , (f − α)v2,m2 , v1,1 , . . . , v1,m1 , is a basis of the eigenspace ker(f − α).
Now, since V = Ck ⊕ C_{k−1} ⊕ · · · ⊕ C2 ⊕ ker(f − α), the set of vectors Bk ∪ B_{k−1} ∪ · · · ∪ B2 ∪ B1 is a basis of V. This basis contains mk sets of vectors

{(f − α)^{k−1} v_{k,1}, (f − α)^{k−2} v_{k,1}, . . . , (f − α)v_{k,1}, v_{k,1}},
. . .
{(f − α)^{k−1} v_{k,m_k}, (f − α)^{k−2} v_{k,m_k}, . . . , (f − α)v_{k,m_k}, v_{k,m_k}},

each with k elements, which will generate mk Jordan blocks J_{α,k}; m_{k−1} sets of vectors

{(f − α)^{k−2} v_{k−1,1}, (f − α)^{k−3} v_{k−1,1}, . . . , (f − α)v_{k−1,1}, v_{k−1,1}},
. . .
{(f − α)^{k−2} v_{k−1,m_{k−1}}, (f − α)^{k−3} v_{k−1,m_{k−1}}, . . . , (f − α)v_{k−1,m_{k−1}}, v_{k−1,m_{k−1}}},

each with k − 1 elements, which will generate m_{k−1} Jordan blocks J_{α,k−1}; and so on, down to m2 sets of vectors

{(f − α)v_{2,1}, v_{2,1}}, . . . , {(f − α)v_{2,m_2}, v_{2,m_2}},

each with 2 elements, which will generate m2 Jordan blocks J_{α,2}, and m1 eigenvectors of f

v_{1,1}, . . . , v_{1,m_1},

which will generate m1 Jordan blocks J_{α,1}.
Example 4.2.21. Let V be a vector space and let f : V → V be an endomorphism. If cf (t) = (α − t)5 and mf (t) = (t − α)2 for some α ∈ K, determine all possible Jordan canonical forms associated with such an endomorphism f .
Solution. It is easy to see that these forms are

\begin{pmatrix} \alpha & 1 & 0 & 0 & 0 \\ 0 & \alpha & 0 & 0 & 0 \\ 0 & 0 & \alpha & 1 & 0 \\ 0 & 0 & 0 & \alpha & 0 \\ 0 & 0 & 0 & 0 & \alpha \end{pmatrix} \quad and \quad \begin{pmatrix} \alpha & 1 & 0 & 0 & 0 \\ 0 & \alpha & 0 & 0 & 0 \\ 0 & 0 & \alpha & 0 & 0 \\ 0 & 0 & 0 & \alpha & 0 \\ 0 & 0 & 0 & 0 & \alpha \end{pmatrix}.
In the first case we have dim ker(f − α) = 3 and we have three Jordan blocks and in the second case we have dim ker(f − α) = 4 and there are four Jordan blocks.
Example 4.2.22. Let V be a vector space such that dim V = 9 and let f : V → V be an endomorphism. If dim ker(f − α) = 3, dim ker(f − α)^2 = 5, dim ker(f − α)^3 = 7, dim ker(f − α)^4 = 8, and mf(t) = (t − α)^5, determine the Jordan canonical form of f.

Solution. Following the proof of Theorem 4.2.20, or using Theorem 4.2.17, it is easy to verify that the Jordan canonical form of f is

\begin{pmatrix}
\alpha & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \alpha & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \alpha & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \alpha & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \alpha & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \alpha & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \alpha & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \alpha & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \alpha
\end{pmatrix}.
Example 4.2.23. We consider the endomorphism f : R5 → R5 defined by f(x) = Ax where

A = \begin{pmatrix}
-7 & 2 & -14 & -24 & -4 \\
-4 & 3 & -7 & -16 & -3 \\
7 & -2 & 13 & 22 & 4 \\
-3 & 1 & -5 & -9 & -2 \\
7 & -2 & 12 & 22 & 5
\end{pmatrix}.
Knowing that cf(t) = (1 − t)^5, determine mf and a basis B of R5 such that the B-matrix of f has a Jordan canonical form.

Solution. Since

B = A − \begin{pmatrix} 1&0&0&0&0\\ 0&1&0&0&0\\ 0&0&1&0&0\\ 0&0&0&1&0\\ 0&0&0&0&1 \end{pmatrix} = \begin{pmatrix} -8&2&-14&-24&-4\\ -4&2&-7&-16&-3\\ 7&-2&12&22&4\\ -3&1&-5&-10&-2\\ 7&-2&12&22&4 \end{pmatrix},

B^2 = \begin{pmatrix} 2&0&2&4&2\\ 2&0&2&4&2\\ -2&0&-2&-4&-2\\ 1&0&1&2&1\\ -2&0&-2&-4&-2 \end{pmatrix}, \quad and \quad B^3 = 0,

we have mf(t) = (t − 1)^3. The set

\left\{\begin{pmatrix}0\\1\\0\\0\\0\end{pmatrix}, \begin{pmatrix}-1\\0\\1\\0\\0\end{pmatrix}, \begin{pmatrix}-2\\0\\0\\1\\0\end{pmatrix}, \begin{pmatrix}-1\\0\\0\\0\\1\end{pmatrix}\right\}

is a basis of ker(f − 1)^2 and the set

\left\{\begin{pmatrix}-2\\1\\1\\0\\1\end{pmatrix}, \begin{pmatrix}-2\\4\\0\\1\\0\end{pmatrix}\right\}

is a basis of ker(f − 1).

It is easy to see that \begin{pmatrix}1\\0\\0\\0\\0\end{pmatrix} is not in ker(f − 1)^2 and that \begin{pmatrix}-1\\0\\1\\0\\0\end{pmatrix} ∈ ker(f − 1)^2 is not in

Span\left\{(f − 1)\begin{pmatrix}1\\0\\0\\0\\0\end{pmatrix}, \begin{pmatrix}-2\\1\\1\\0\\1\end{pmatrix}, \begin{pmatrix}-2\\4\\0\\1\\0\end{pmatrix}\right\} = Span\left\{\begin{pmatrix}-8\\-4\\7\\-3\\7\end{pmatrix}, \begin{pmatrix}-2\\1\\1\\0\\1\end{pmatrix}, \begin{pmatrix}-2\\4\\0\\1\\0\end{pmatrix}\right\}.

Consequently the set

B = \left\{(f − 1)^2\begin{pmatrix}1\\0\\0\\0\\0\end{pmatrix}, (f − 1)\begin{pmatrix}1\\0\\0\\0\\0\end{pmatrix}, \begin{pmatrix}1\\0\\0\\0\\0\end{pmatrix}, (f − 1)\begin{pmatrix}-1\\0\\1\\0\\0\end{pmatrix}, \begin{pmatrix}-1\\0\\1\\0\\0\end{pmatrix}\right\}
= \left\{\begin{pmatrix}2\\2\\-2\\1\\-2\end{pmatrix}, \begin{pmatrix}-8\\-4\\7\\-3\\7\end{pmatrix}, \begin{pmatrix}1\\0\\0\\0\\0\end{pmatrix}, \begin{pmatrix}-6\\-3\\5\\-2\\5\end{pmatrix}, \begin{pmatrix}-1\\0\\1\\0\\0\end{pmatrix}\right\}

is a basis of R5 with the desired properties, and the B-matrix of f is

\begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.
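Readers can confirm both the minimal polynomial and the block structure numerically; the sketch below (not part of the text, assuming NumPy) computes the nullities of the powers of A − I:

import numpy as np

A = np.array([[-7.0, 2, -14, -24, -4],
              [-4.0, 3, -7, -16, -3],
              [7.0, -2, 13, 22, 4],
              [-3.0, 1, -5, -9, -2],
              [7.0, -2, 12, 22, 5]])
B = A - np.eye(5)
for j in (1, 2, 3):
    Bj = np.linalg.matrix_power(B, j)
    print(j, 5 - np.linalg.matrix_rank(Bj))   # expected nullities: 2, 4, 5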
Example 4.2.24. Let f : V → V be an endomorphism on a vector space V and let α ∈ K. If B = {v1, v2, v3, v4, v5, v6, v7, v8, v9} is a basis of V such that the B-matrix of f is

\begin{pmatrix}
\alpha & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \alpha & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \alpha & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \alpha & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \alpha & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \alpha & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \alpha & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \alpha & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \alpha
\end{pmatrix},

find bases of ker(f − α), ker(f − α)^2, ker(f − α)^3,
and the polynomial mf . Solution. Since f (v1 ) = αv1 , f (v2 ) = v1 + αv2 , f (v3 ) = v2 + αv3 , f (v4 ) = v3 + αv4 , f (v5 ) = αv5 , f (v6 ) = v5 + αv6 , f (v7 ) = v6 + αv7 , f (v8 ) = αv8 , f (v9 ) = αv9 , {v1 , v5 , v8 , v9 } is a basis of ker(f − α), {v1 , v2 , v5 , v6 , v8 , v9 } is a basis of ker(f − α)2 , {v1 , v2 , v3 , v5 , v6 , v7 , v8 , v9 } is a basis of ker(f − α)3 , and mf (t) = (t − α)4 .
4.2.2 Uniqueness of the Jordan canonical form when the characteristic polynomial has one root
In this section we present a formula for the number of Jordan blocks in the Jordan canonical form of an endomorphism with the characteristic polynomial that has only one root. This result gives us a uniqueness theorem for the Jordan canonical form for such endomorphisms. First we need some preliminary results. Lemma 4.2.25. Let U be a finite dimensional vector space and let g : U → U be an endomorphism. If the Jordan canonical form of g is a Jordan k × k block Jα,k for some α ∈ K, then dim ran(g − α)m = k − m for 1 ≤ m ≤ k − 1 and dim ran(g − α)m = 0 for m ≥ k. Proof. Let B = {v1 , . . . , vk } be a basis of U such that the B-matrix of g is the Jordan k × k block Jα,k . Then (g − α)(v1 ) = 0, (g − α)(v2 ) = v1 , . . . , (g − α)(vk ) = vk−1 . This means that ran(g − α) = Span{v1 , . . . , vk−1 }. Since (g−α)2 (v1 ) = 0, (g−α)2 (v2 ) = 0, (g−α)2 (v3 ) = v1 , . . . , (g−α)2 (vk ) = vk−2 ,
we have ran(g − α)^2 = Span{v1, . . . , v_{k−2}}. Continuing the same way we get

ran(g − α)^m = Span{v1, . . . , v_{k−m}}

for m ≤ k − 1 and ran(g − α)^m = 0 for m ≥ k. This gives us dim ran(g − α)^m = k − m for 1 ≤ m ≤ k − 1 and dim ran(g − α)^m = 0 for m ≥ k.

Lemma 4.2.26. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If the diagonal Jordan blocks of the Jordan canonical form of f are J_{α,n_1}, . . . , J_{α,n_p} for some positive integers n1, . . . , np, then

dim ran(f − α)^{k−1} = \{number of Jordan blocks J_{α,k}\} + \sum_{n_q > k} (n_q − k + 1),

for every integer k ≥ 1.
for every integer k ≥ 1. Proof. We have V = U1 ⊕ · · · ⊕ Up , where, for every q ∈ {1, . . . , p}, Uq is a vector subspace of V with a basis Bq such that the Bq -matrix of the endomorphism fq : Uq → Uq induced by f on Uq is Jα,nq . For any vectors v ∈ V, u1 ∈ U1 , . . . , up ∈ Up such that v = u1 + · · · + up , we have f m (v) = f1m (u1 ) + · · · + fpm (up ), for every integer m ≥ 1. By Lemma 4.2.25, dim ran(fq − α)k−1 = 0 if nq < k, dim ran(fq − α)k−1 = 1 if nq = k, dim ran(fq − α)k−1 = nq − (k − 1) = nq − k + 1 if nq > k. The desired result is a consequence of these equalities.
Lemma 4.2.27. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If the diagonal Jordan blocks of the Jordan canonical form of f are Jα,n1 , . . . , Jα,np , for some positive integers n1 , . . . , np , then {number of Jordan blocks Jα,k }
= dim ran(f − α)k−1 + dim ran(f − α)k+1 − 2 dim ran(f − α)k ,
for every integer k ≥ 1.
Proof. By Lemma 4.2.26 we have

dim ran(f − α)^{k−1} = \{number of Jordan blocks J_{α,k}\} + \sum_{n_q > k} (n_q − k + 1)

and by Lemma 4.2.25 we have

dim ran(f − α)^k = \sum_{n_q \ge k+1} (n_q − k) = \sum_{n_q > k} (n_q − k)

and

dim ran(f − α)^{k+1} = \sum_{n_q \ge k+2} (n_q − k − 1) = \sum_{n_q \ge k+1} (n_q − k − 1) = \sum_{n_q > k} (n_q − k − 1).

Consequently

dim ran(f − α)^{k−1} + dim ran(f − α)^{k+1} − 2 dim ran(f − α)^k
= \{number of Jordan blocks J_{α,k}\} + \sum_{n_q > k} (n_q − k + 1) + \sum_{n_q > k} (n_q − k − 1) − 2 \sum_{n_q > k} (n_q − k)
= \{number of Jordan blocks J_{α,k}\}.
Theorem 4.2.28. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If the diagonal Jordan blocks of the Jordan canonical form of f are Jα,n1 , . . . , Jα,np , for some positive integers n1 , . . . , np , then {number of Jordan blocks Jα,k }
= 2 dim ker(f − α)k − dim ker(f − α)k−1 − dim ker(f − α)k+1 ,
for every integer k ≥ 1. Proof. This equality is an immediate consequence of Lemma 4.2.27 and the Rank-Nullity Theorem 2.1.28.
Corollary 4.2.29. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If the minimal polynomial of f is mf = (t − α)n for some integer n ≥ 1, then the number of Jordan blocks in the Jordan canonical form of f is dim ker(f − α).
Proof. According to Theorem 4.2.28 the number of n × n Jordan blocks Jα,n is 2 dim ker(f − α)n − dim ker(f − α)n+1 − dim ker(f − α)n−1 = dim ker(f − α)n − dim ker(f − α)n−1 ,
the number of m × m Jordan blocks Jα,m , where m ∈ {2, . . . , n − 1}, is 2 dim ker(f − α)m − dim ker(f − α)m+1 − dim ker(f − α)m−1 , and the number of 1 × 1 Jordan blocks Jα,1 , that is the blocks which are eigenvalues, is 2 dim ker(f − α) − dim ker(f − α)2 − dim ker(f − α)0
= 2 dim ker(f − α) − dim ker(f − α)^2 − dim ker Id = 2 dim ker(f − α) − dim ker(f − α)^2.
Consequently the number of Jordan blocks is dim ker(f − α)n − dim ker(f − α)n−1
+ 2 dim ker(f − α)n−1 − dim ker(f − α)n − dim ker(f − α)n−2 + · · · + 2 dim ker(f − α)2 − dim ker(f − α)3 − dim ker(f − α) + 2 dim ker(f − α) − dim ker(f − α)2 = dim ker(f − α).
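The formula of Theorem 4.2.28 is easy to apply mechanically. The sketch below is an illustration only (assuming NumPy); it counts the Jordan blocks J_{1,k} of the matrix from Example 4.2.23 directly from kernel dimensions:

import numpy as np

A = np.array([[-7.0, 2, -14, -24, -4],
              [-4.0, 3, -7, -16, -3],
              [7.0, -2, 13, 22, 4],
              [-3.0, 1, -5, -9, -2],
              [7.0, -2, 12, 22, 5]])
alpha, n = 1.0, 5

def nullity(M):
    return n - np.linalg.matrix_rank(M)

d = [nullity(np.linalg.matrix_power(A - alpha*np.eye(n), j)) for j in range(n + 1)]
for k in range(1, n):
    blocks = 2*d[k] - d[k - 1] - d[k + 1]   # Theorem 4.2.28
    if blocks:
        print(k, blocks)   # expected: one block of size 2 and one of size 3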
4.2.3 Jordan canonical form when the characteristic polynomial has several roots
We begin by considering an example.
Example 4.2.30. Let V be a 4-dimensional vector space and let f : V → V be an endomorphism. If the characteristic polynomial of f is (t − α)^2 (t − β)^2 for two distinct numbers α and β and dim ker(f − α) = 1, dim ker(f − α)^2 = 2, dim ker(f − β) = 2, show that there is a basis B of V such that the B-matrix of f is

A = \begin{pmatrix} \alpha & 1 & 0 & 0 \\ 0 & \alpha & 0 & 0 \\ 0 & 0 & \beta & 0 \\ 0 & 0 & 0 & \beta \end{pmatrix}.
Solution. Let u be a vector in ker(f − α)2 which is not in ker(f − α) and let {v, w} be a basis of ker(f −β). Then (f −α)u is a nonzero vector in ker(f −α).
We will show that B = {(f − α)u, u, v, w} is a basis with the desired properties. Let
x1 (f − α)u + x2 u + x3 v + x4 w = 0    (4.7)
for some x1, x2, x3, x4 ∈ K. By applying (f − α)^2 to (4.7) we get
x1 (f − α)^3 u + x2 (f − α)^2(u) + x3 (f − α)^2(v) + x4 (f − α)^2(w) = 0
and thus x3 (f − α)^2(v) + x4 (f − α)^2(w) = 0. Hence x3 = x4 = 0 because
(f − α)^2 = (f − β)^2 + 2(β − α)(f − β) + (β − α)^2.
Next, using the fact that x3 = x4 = 0 and applying f − α to (4.7), we get x2 = 0 and then x1 = 0. This shows that the vectors (f − α)u, u, v, and w are linearly independent and consequently {(f − α)u, u, v, w} is a basis of V. Since
f((f − α)(u)) = (f − α)^2(u) + α(f − α)(u) = α(f − α)(u),
f(u) = (f − α)u + αu,
f(v) = βv,
f(w) = βw,
the B-matrix of f is A.

The next two results follow from Theorems 4.2.14 and 4.1.45.

Theorem 4.2.31. Let V be an n-dimensional vector space and let f : V → V be an endomorphism. If cf(t) = (λ1 − t)^{r1} · · · (λk − t)^{rk} for some distinct λ1, . . . , λk ∈ K and some positive integers r1, . . . , rk, then there are vectors x1, . . . , xm ∈ V such that
V = V_{f,x1} ⊕ · · · ⊕ V_{f,xm}
and for every j ∈ {1, . . . , m} there is an l ∈ {1, . . . , k} and an integer ql ≤ rl such that m_{f,xj}(t) = (t − λl)^{ql}.
Theorem 4.2.32. Let V be an n-dimensional vector space and let f : V → V be an endomorphism. If cf(t) = (λ1 − t)^{r1} · · · (λk − t)^{rk} for some distinct λ1, . . . , λk ∈ K and some positive integers r1, . . . , rk, then there is a basis B of V such that the B-matrix of f is of the form
\[
\begin{pmatrix} J_1 & & & 0 \\ & J_2 & & \\ & & \ddots & \\ 0 & & & J_r \end{pmatrix}
\]
and for every m ∈ {1, . . . , r} there is an integer q ∈ {1, . . . , k} such that Jm is in λq-Jordan canonical form.
Definition 4.2.33. We say that a matrix is in Jordan canonical form if it has the form
\[
\begin{pmatrix} J_1 & & & 0 \\ & J_2 & & \\ & & \ddots & \\ 0 & & & J_r \end{pmatrix}
\]
and for every k ∈ {1, . . . , r} there is a number αk ∈ K such that Jk is in αk-Jordan canonical form.
Example 4.2.34. Let f : V → V be an endomorphism and let α, β, γ, δ ∈ K be such that
cf(t) = (t − α)^4 (t − β)^2 (t − γ)^2 (t − δ)^2
and
dim ker(f − αI) = 2, dim ker(f − αI)^2 = 3, dim ker(f − αI)^3 = 4,
dim ker(f − βI) = 1, dim ker(f − βI)^2 = 2,
dim ker(f − γI) = 2,
dim ker(f − δI) = 1, dim ker(f − δI)^2 = 2.
By Theorem 4.2.31 there is a basis B of V such that the B-matrix of f has the following Jordan canonical form
\[
\begin{pmatrix}
α & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & α & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & α & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & α & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & β & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & β & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & γ & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & γ & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & δ & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & δ
\end{pmatrix}.
\]
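As a side note (not part of the text), computer algebra systems can compute Jordan canonical forms directly, which is convenient for checking examples such as 4.2.30 and 4.2.34. The sketch below assumes SymPy is available; the matrix A is a hypothetical 4 × 4 example with one 2 × 2 block for the eigenvalue 2 and two 1 × 1 blocks for the eigenvalue 3, and B is a matrix similar to it.

```python
from sympy import Matrix

A = Matrix([
    [2, 1, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 3],
])
# an invertible change of basis (unitriangular, so certainly invertible)
P = Matrix([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
])
B = P * A * P.inv()       # a matrix similar to A
Q, J = B.jordan_form()    # J is the Jordan canonical form of B
print(J)                  # expected: the blocks of A, possibly in a different order
```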
4.3 The rational form
When considering Jordan forms we assumed that the minimal polynomial of the endomorphism splits over K. In this section we consider endomorphisms whose minimal polynomial does not necessarily split and construct bases for such endomorphisms for which the matrix has a simple form. We begin by considering an example.
Example 4.3.1. Let V be a 4-dimensional vector space over R and let f : V → V be an endomorphism. If mf(t) = (t^2 + 1)^2, show that there is a basis B of V such that the B-matrix of f is
\[
\begin{pmatrix} 0 & 0 & 0 & −1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & −2 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
\]
Solution. Let v be a vector from V such that v ∈ ker(f^2 + 1)^2 and v ∉ ker(f^2 + 1). Then m_{f,v} = mf and it is easy to see that
B = {v, f(v), f^2(v), f^3(v)}
is a basis of V. Since f(v) = f(v), f(f(v)) = f^2(v), f(f^2(v)) = f^3(v) and (f^2 + 1)^2(v) = f^4(v) + 2f^2(v) + v = 0, we have f(f^3(v)) = f^4(v) = −v − 2f^2(v).
Consequently, the B-matrix of f is
\[
\begin{pmatrix} 0 & 0 & 0 & −1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & −2 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
\]
Note that
m_{f,v}(t) = 1 + 2t^2 + t^4
and that the entries in the last column in the B-matrix of f are obtained from the coefficients of the polynomial m_{f,v}(t).

The following result generalizes the observation in the above example.

Theorem 4.3.2. Let V be a finite dimensional vector space and let v ∈ V be a nonzero vector. If f : V → V is a nonzero endomorphism such that
m_{f,v}(t) = a_0 + a_1 t + · · · + a_{k−1} t^{k−1} + t^k,
for some integer k ≥ 1, then B = {v, f(v), . . . , f^{k−1}(v)} is a basis of V_{f,v} and the B-matrix of the endomorphism fv : V_{f,v} → V_{f,v} induced by f is
\[
A = \begin{pmatrix}
0 & 0 & \cdots & 0 & 0 & −a_0 \\
1 & 0 & \cdots & 0 & 0 & −a_1 \\
0 & 1 & \cdots & 0 & 0 & −a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & −a_{k−2} \\
0 & 0 & \cdots & 0 & 1 & −a_{k−1}
\end{pmatrix}.
\]
Proof. We already know that B is a basis of V_{f,v}. To show that A is the B-matrix of the endomorphism fv it suffices to observe that
f(v) = f(v),
f(f(v)) = f^2(v),
...
f(f^{k−2}(v)) = f^{k−1}(v),
f(f^{k−1}(v)) = f^k(v) = −a_0 v − a_1 f(v) − · · · − a_{k−1} f^{k−1}(v).
The next theorem is a form of converse of the above theorem. Note that it implies that every monic polynomial is the minimal polynomial of some endomorphism on a finite dimensional vector space.

Theorem 4.3.3. Let V be a finite dimensional vector space and let B = {v1, . . . , vk} be a basis of V. If f : V → V is an endomorphism such that
\[
\begin{pmatrix}
0 & 0 & \cdots & 0 & 0 & −a_0 \\
1 & 0 & \cdots & 0 & 0 & −a_1 \\
0 & 1 & \cdots & 0 & 0 & −a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & −a_{k−2} \\
0 & 0 & \cdots & 0 & 1 & −a_{k−1}
\end{pmatrix}
\]
is the B-matrix of f for some a_0, a_1, . . . , a_{k−1} ∈ K, then vj = f^{j−1}(v1) for j ∈ {2, . . . , k} and
m_{f,v1}(t) = mf(t) = a_0 + a_1 t + · · · + a_{k−1} t^{k−1} + t^k.
Proof. Since
f(v1) = v2,
f(v2) = v3 = f^2(v1),
f(v3) = v4 = f^3(v1),
...
f(v_{k−1}) = vk = f^{k−1}(v1),
f(vk) = f(f^{k−1}(v1)) = f^k(v1) = −a_0 v1 − a_1 f(v1) − · · · − a_{k−1} f^{k−1}(v1),
we have
a_0 v1 + a_1 f(v1) + · · · + a_{k−1} f^{k−1}(v1) + f^k(v1) = 0.
Linear independence of the vectors v1, . . . , vk implies that m_{f,v1}(t) = a_0 + a_1 t + · · · + a_{k−1} t^{k−1} + t^k. Moreover,
m_{f,v1}(f)(v2) = m_{f,v1}(f)(f(v1)) = f(m_{f,v1}(f)(v1)) = 0,
...
m_{f,v1}(f)(vk) = m_{f,v1}(f)(f^{k−1}(v1)) = f^{k−1}(m_{f,v1}(f)(v1)) = 0,
which shows that m_{f,v1} is an annihilator of f and mf = m_{f,v1}.
Definition 4.3.4. Let a_0, . . . , a_{k−1} ∈ K. The matrix
\[
\begin{pmatrix}
0 & 0 & \cdots & 0 & 0 & −a_0 \\
1 & 0 & \cdots & 0 & 0 & −a_1 \\
0 & 1 & \cdots & 0 & 0 & −a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & −a_{k−2} \\
0 & 0 & \cdots & 0 & 1 & −a_{k−1}
\end{pmatrix}
\]
is called the companion matrix of the polynomial
p(t) = a_0 + a_1 t + · · · + a_{k−1} t^{k−1} + t^k
and is denoted by cp.
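As an illustration (not part of the text), the companion matrix of Definition 4.3.4 is easy to build explicitly. The sketch below assumes SymPy is available and checks, for the polynomial p(t) = 1 + 2t^2 + t^4 of Example 4.3.1, that the characteristic polynomial of c_p is p itself.

```python
from sympy import zeros, symbols

def companion(coeffs):
    # coeffs = [a_0, a_1, ..., a_{k-1}] of the monic polynomial
    # p(t) = a_0 + a_1 t + ... + a_{k-1} t^{k-1} + t^k
    k = len(coeffs)
    C = zeros(k, k)
    for i in range(1, k):
        C[i, i - 1] = 1              # subdiagonal of ones
    for i in range(k):
        C[i, k - 1] = -coeffs[i]     # last column: -a_0, ..., -a_{k-1}
    return C

C = companion([1, 0, 2, 0])          # p(t) = 1 + 2t^2 + t^4
t = symbols('t')
print(C)
print(C.charpoly(t).as_expr())       # expected: t**4 + 2*t**2 + 1
```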
In the next example we consider a matrix where instead of Jordan blocks we have companion matrices.
Example 4.3.5. Let V be a vector space such that dim V = 5. If f : V → V is an endomorphism such that cf(t) = (α − t)^5 and mf(t) = (t − α)^3, for some α ∈ K, and
dim ker(f − α) = 2 and dim ker(f − α)^2 = 4,
show that there is a basis B of V such that the B-matrix of f is
\[
\begin{pmatrix}
0 & 0 & α^3 & 0 & 0 \\
1 & 0 & −3α^2 & 0 & 0 \\
0 & 1 & 3α & 0 & 0 \\
0 & 0 & 0 & 0 & −α^2 \\
0 & 0 & 0 & 1 & 2α
\end{pmatrix}.
\]
Solution. Let u and v be the vectors from Example 4.2.12. We will show that
B = {u, f(u), f^2(u), v, f(v)}
is a basis of V with the desired property. First we note that
Span{(f − α)2 u, (f − α)u, u} = Span{u, f (u), f 2 (u)} and Span{(f − α)v, v} = Span{v, f (v)}.
Now, since {(f − α)^2 u, (f − α)u, u, (f − α)v, v} is a basis, the set {u, f(u), f^2(u), v, f(v)} is also a basis. From
(f − α)^3(u) = f^3(u) − 3αf^2(u) + 3α^2 f(u) − α^3 u = 0
we get
f(f^2(u)) = f^3(u) = α^3 u − 3α^2 f(u) + 3αf^2(u).
Since we also have
f(f(v)) = f^2(v) = −α^2 v + 2αf(v),
the B-matrix of f is
\[
\begin{pmatrix}
0 & 0 & α^3 & 0 & 0 \\
1 & 0 & −3α^2 & 0 & 0 \\
0 & 1 & 3α & 0 & 0 \\
0 & 0 & 0 & 0 & −α^2 \\
0 & 0 & 0 & 1 & 2α
\end{pmatrix}.
\]
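A quick symbolic check of Example 4.3.5 (again a sketch, not part of the text, assuming SymPy is available): with M the matrix above, (M − αI)^3 = 0 while (M − αI)^2 ≠ 0, which confirms that the minimal polynomial is (t − α)^3.

```python
from sympy import Matrix, symbols, eye, zeros

a = symbols('alpha')
M = Matrix([
    [0, 0,  a**3,    0,  0],
    [1, 0, -3*a**2,  0,  0],
    [0, 1,  3*a,     0,  0],
    [0, 0,  0,       0, -a**2],
    [0, 0,  0,       1,  2*a],
])
N = M - a * eye(5)
print((N**3).expand() == zeros(5, 5))   # expected: True
print((N**2).expand() == zeros(5, 5))   # expected: False
```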
Theorem 4.3.6. Let V be a finite dimensional vector space and let f : V → V be a nonzero endomorphism such that mf = pq, where p and q are two monic polynomials such that GCD(p, q) = 1. If g is the endomorphism induced by f on ker p(f ) and h is the endomorphism induced by f on ker q(f ), then mg = p and mh = q.
Proof. Let a and b be polynomials such that ap + bq = 1. Since mg (g) = 0, we have mg (g)b(g)q(g) = 0 and thus mg (f )b(f )q(f )(x) = 0 for every x ∈ ker p(f ). Moreover, mg (f )b(f )q(f )(y) = 0 for every y ∈ ker q(f ). Consequently, mg (f )b(f )q(f ) = 0, because every vector from V is of the form x + y where x ∈ ker p(f ) and y ∈ ker q(f ), by Theorem 4.1.44. Clearly, mf = pq divides mg bq = mg (1 − ap). Consequently, p divides mg (1 − ap), which implies that p divides mg , because GCD(p, 1 − ap) = 1. For every x ∈ ker p(f ) we have p(g)(x) = p(f )(x) = 0, which implies that mg divides p. Since the monic polynomials p and mg divide each other, we have p = mg . In a similar way we can show that q = mh . In Theorem 4.2.7 we show that, if V is a finite dimensional vector space and f : V → V is an endomorphism such that mf (t) = (t − α)k for some α ∈ K, then there is a vector v ∈ V such that mf = mf,v . Now we can show that the assumption that mf (t) = (t − α)k is unnecessary.
Theorem 4.3.7. Let V be a finite dimensional vector space. For every nonzero endomorphism f : V → V there is a nonzero vector v ∈ V such that mf = m_{f,v}.

Proof. Let mf = p_1^{α_1} · · · p_k^{α_k} where p1, . . . , pk are irreducible monic polynomials and α1, . . . , αk are positive integers. By Theorem 4.3.6, for every j ∈ {1, . . . , k} the polynomial p_j^{α_j} is the minimal polynomial of the endomorphism fj induced by f on ker p_j^{α_j}(f). By Theorem 4.1.44, we have
V = ker p_1^{α_1}(f) ⊕ · · · ⊕ ker p_k^{α_k}(f).    (4.8)
For every j ∈ {1, . . . , k} we choose a vector vj ∈ ker p_j^{α_j}(f) such that vj ∉ ker p_j^{α_j − 1}(f). Because the polynomial pj is irreducible this means that m_{f,vj} = p_j^{α_j}. Now, since
0 = m_{f,v1+···+vk}(f)(v1 + · · · + vk) = m_{f,v1+···+vk}(f)(v1) + · · · + m_{f,v1+···+vk}(f)(vk)
and because m_{f,v1+···+vk}(f)(vj) ∈ ker p_j^{α_j}(f), it follows from (4.8) that m_{f,v1+···+vk}(f)(vj) = 0 for every j ∈ {1, . . . , k}. Hence, using the equality m_{f,vj} = p_j^{α_j}, it follows that the polynomial p_j^{α_j} divides the polynomial m_{f,v1+···+vk}(t). Consequently
m_{f,v1+···+vk} = p_1^{α_1} · · · p_k^{α_k} = mf
because, by definition, m_{f,v1+···+vk} divides the product p_1^{α_1} · · · p_k^{α_k}.
Example 4.3.8. Let V be a vector space such that dim V = 4 and let f : V → V be an endomorphism. We assume that V = V1 ⊕ V2 where V1 and V2 are two invariant subspaces of f. Let f1 : V1 → V1 and f2 : V2 → V2 be the induced endomorphisms. If m_{f1} = t^2 and m_{f2} = t^2 − t + 1, show that there is a basis B of V such that the B-matrix of f is
\[
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & −1 \\ 0 & 0 & 1 & 1 \end{pmatrix}.
\]
Solution. Let v ∈ V be such that
m_{f,v} = mf = m_{f1} m_{f2} = t^4 − t^3 + t^2;
such a vector exists by Theorem 4.3.7.
Then B = {v, f(v), f^2(v), f^3(v)} is a basis of V. Obviously,
f(v) = f(v),
f(f(v)) = f^2(v),
f(f^2(v)) = f^3(v).
Moreover, since f^4(v) − f^3(v) + f^2(v) = 0, we have
f(f^3(v)) = f^4(v) = f^3(v) − f^2(v).
Consequently, the B-matrix of f is
\[
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & −1 \\ 0 & 0 & 1 & 1 \end{pmatrix}.
\]
Theorem 4.3.9. Let V be a finite dimensional vector space and let f : V → V be a nonzero endomorphism. There are v1 , . . . , vn ∈ V such that V = Vf,v1 ⊕ · · · ⊕ Vf,vn and mf,vj is a multiple of mf,vj+1 for every j ∈ {1, . . . , n − 1}.
Proof. By Theorem 4.3.7, there is a vector v1 ∈ V such that mf (t) = mf,v1 (t) and, by Lemma 4.2.13, we have V = Vf,v1 ⊕ V2 , where V2 is an f -invariant subspace of V. Now, there is a vector v2 ∈ V2 such that mf2 (t) = mf,v2 (t), where f2 is the endomorphism induced by f on V2 . Clearly, mf2 divides mf . Applying Lemma 4.2.13 again we get V = Vf,v1 ⊕ Vf,v2 ⊕ V3 ,
where V3 is an f -invariant subspace of V. Continuing this way we can produce the desired decomposition of V.
Example 4.3.10. Let V, f , u, v, and w be as defined in Example 4.2.30. Show that V = Vf,u+v ⊕ Vf,w . Solution. Since mf,u (t) = (t − α)2 and mf,v (t) = t − β, we have mf,u+v = (t − α)2 (t − β) = mf . Because Vf,u+v ⊆ Vf,u ⊕ Vf,v and because dim Vf,u+v = 3, we have Vf,u+v = Vf,u ⊕ Vf,v . This gives us V = Vf,u ⊕ Vf,v ⊕ Vf,w = Vf,u+v ⊕ Vf,w . In Theorem 4.3.9 we show that for any endomorphism f on a finite dimensional space V there are v1 , . . . , vn ∈ V such that V = Vf,v1 ⊕ · · · ⊕ Vf,vn and mf,vj is a multiple of mf,vj+1 for every j ∈ {1, . . . , n − 1}. In Theorem 4.3.12 below we will show that, while the vectors v1 , . . . , vn are not unique, the integer n and the polynomials mf,v1 , . . . , mf,vn are unique. In the proof of Theorem 4.3.12 we use the following lemma. Lemma 4.3.11. Let V be a finite dimensional vector space and let f : V → V be a nonzero endomorphism. If, for a vector v ∈ V, the polynomial p is a divisor of mf,v , then dim p(f )(Vf,v ) = dim Vf,v − deg p.
Proof. We can assume that the polynomial p is monic and that p ≠ m_{f,v}. Then p(f)(V_{f,v}) = V_{f,p(f)v}. Now, if q is a monic polynomial such that m_{f,v} = pq, then clearly q = m_{f,p(f)v}. This gives us
dim p(f)(V_{f,v}) = deg q = deg m_{f,v} − deg p = dim V_{f,v} − deg p.
Theorem 4.3.12. Let V be a finite dimensional vector space and let f : V → V be a nonzero endomorphism. If V = Vf,v1 ⊕ · · · ⊕ Vf,vn for some nonzero vectors v1 , . . . , vn ∈ V such that mf,v1 = mf and mf,vj is a multiple of mf,vj+1 for every j ∈ {1, . . . , n − 1} and V = Vf,w1 ⊕ · · · ⊕ Vf,wq for some nonzero vectors w1 , . . . , wq ∈ V such that mf,w1 = mf and mf,wj is a multiple of mf,wj+1 for every j ∈ {1, . . . , q − 1}, then q = n and mf,vj = mf,wj for every j ∈ {1, . . . , n}. Proof. Let k be a positive integer such that k ≤ n and k ≤ q. Suppose that mf,vj = mf,wj for every j ∈ {1, . . . , k − 1}. Since mf,vk (f )(Vf,vj ) = 0 for every j ∈ {k, . . . , n}, we have mf,vk (f )(V) = mf,vk (f )(Vf,v1 ) ⊕ · · · ⊕ mf,vk (f )(Vf,vk−1 ), which give us dim mf,vk (f )(V) = dim mf,vk (f )(Vf,v1 ) + · · · + dim mf,vk (f )(Vf,vk−1 ). We also have mf,vk (f )(V) = mf,vk (f )(Vf,w1 ) ⊕ · · · ⊕ mf,vk (f )(Vf,wq ) and thus dim mf,vk (f )(V) = dim mf,vk (f )(Vf,w1 ) + · · · + dim mf,vk (f )(Vf,wq ). By the assumptions and Lemma 4.3.11 we have dim mf,vk (f )(Vf,vj ) = dim mf,vk (f )(Vf,wj ) for every j ∈ {1, . . . , k−1}, because mf,vk is a divisor of all polynomials mf,w1 = mf,v1 , . . . , mf,wk−1 = mf,vk−1 . Hence dim mf,vk (f )(Vf,wk ) = 0, which implies that mf,vk (f )(Vf,wk ) = 0. This shows that mf,wk divides mf,vk .
In the same way we can show that m_{f,vk} divides m_{f,wk}. Consequently, m_{f,vk} = m_{f,wk}, because the polynomials m_{f,vk} and m_{f,wk} are monic. We finish the proof by induction, using the equality
dim V = dim V_{f,v1} + · · · + dim V_{f,vn} = dim V_{f,w1} + · · · + dim V_{f,wq}.
The polynomials m_{f,v1}, . . . , m_{f,vn} in Theorem 4.3.12 are called the invariant factors of f.

Example 4.3.13. Let V be a finite dimensional vector space and let f : V → V be an endomorphism such that the invariant factors of f are
(t − 1)^3 (t − 2)^3 (t − 3), (t − 1)^2 (t − 2)^3, and (t − 1)(t − 2).
Determine the Jordan blocks from the Jordan canonical form.
Solution. Every factor (t − λ)^e appearing in an invariant factor contributes one Jordan block J_{λ,e}, so the blocks are J_{1,3}, J_{2,3}, J_{3,1} (from the first invariant factor), J_{1,2}, J_{2,3} (from the second), and J_{1,1}, J_{2,1} (from the third).
Definition 4.3.14. With the notation from Theorem 4.3.12 the matrix
\[
\begin{pmatrix} C_{m_{f,v_1}} & & & 0 \\ & C_{m_{f,v_2}} & & \\ & & \ddots & \\ 0 & & & C_{m_{f,v_n}} \end{pmatrix}
\]
is called the rational form of f.
Example 4.3.15. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If the Jordan canonical form of f is
\[
\begin{pmatrix} α & 1 & 0 & 0 \\ 0 & α & 0 & 0 \\ 0 & 0 & β & 0 \\ 0 & 0 & 0 & β \end{pmatrix},
\]
determine the invariant factors of f and the rational form of f.
Solution. The invariant factors are
(t − α)^2 (t − β) = t^3 − (2α + β)t^2 + (α^2 + 2αβ)t − α^2 β and t − β.
Consequently, the rational form is
\[
\begin{pmatrix} 0 & 0 & α^2 β & 0 \\ 1 & 0 & −(α^2 + 2αβ) & 0 \\ 0 & 1 & 2α + β & 0 \\ 0 & 0 & 0 & β \end{pmatrix}.
\]
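For the concrete values α = 2 and β = 3, the rational form of Example 4.3.15 can be checked against the Jordan canonical form with a computer algebra system. This is a hedged sketch, not part of the text, assuming SymPy is available.

```python
from sympy import Matrix, diag

# companion matrix of (t - 2)^2 (t - 3) = t^3 - 7t^2 + 16t - 12
C1 = Matrix([[0, 0, 12],
             [1, 0, -16],
             [0, 1, 7]])
C2 = Matrix([[3]])        # companion matrix of t - 3
R = diag(C1, C2)          # the rational form, with alpha = 2 and beta = 3
P, J = R.jordan_form()
print(J)  # expected: one 2x2 Jordan block for 2 and two 1x1 blocks for 3
```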
Example 4.3.16. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If the Jordan canonical form of f is
\[
\begin{pmatrix}
α & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & α & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & α & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & α & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & β & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & β & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & γ & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & γ & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & γ & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & γ
\end{pmatrix},
\]
determine the invariant factors of f.
Solution. The invariant factors are
(t − α)^3 (t − β)^2 (t − γ)^2, (t − α)(t − γ), and t − γ.

4.4 Exercises

4.4.1 Diagonalization
Exercise 4.1. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If U and W are f -invariant subspaces and V = U ⊕ W, show that det f = det fU det fW . Exercise 4.2. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If p, q, and r are polynomials such that GCD(p, q) = r, show that ker p(f ) ∩ ker q(f ) = ker r(f ).
Exercise 4.3. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If p and q are polynomials such that GCD(p, q) = 1, show that ker(pq)(f) = ker p(f) ⊕ ker q(f).

Exercise 4.4. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. Suppose that p1, . . . , pn are polynomials such that GCD(pj, pk) = 1, for j, k ∈ {1, . . . , n} and j ≠ k. Use Exercise 4.3 and mathematical induction to prove that ker(p1 . . . pn)(f) = ker p1(f) ⊕ · · · ⊕ ker pn(f).

Exercise 4.5. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If λ1, . . . , λk are distinct numbers, show that dim ker(f − λ1) + · · · + dim ker(f − λk) = dim(ker(f − λ1) + · · · + ker(f − λk)).

Exercise 4.6. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. Suppose that p1, . . . , pn are polynomials such that GCD(pj, pk) = 1 for j, k ∈ {1, . . . , n} and j ≠ k. Show that for every j ∈ {1, . . . , n} there is a polynomial qj such that qj(f) is the projection of ker pj(f) on ker(p1 . . . pn)(f) along ker p1(f) ⊕ · · · ⊕ ker p_{j−1}(f) ⊕ ker p_{j+1}(f) ⊕ · · · ⊕ ker pn(f).

Exercise 4.7. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If p(f) = 0 for some polynomial p, show that every eigenvalue of f is a root of p.

Exercise 4.8. Let V be a finite dimensional vector space and let f : V → V be an endomorphism such that p(f) = 0 where p is a polynomial which can be written as a product of distinct monic polynomials of degree 1. Show that f is diagonalizable.

Exercise 4.9. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. Show that, if f^2 is diagonalizable and has distinct positive eigenvalues, then f is diagonalizable.

Exercise 4.10. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If cf(t) = (5 − t)^2 (3 − t)^7, show that f is diagonalizable if and only if (f − 5)(f − 3) = 0.

Exercise 4.11. Let f : C^3 → C^3 be the endomorphism defined by
\[
f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]
Determine cf and mf and show that f is diagonalizable.
Exercise 4.12. Let f : C^2 → C^2 be the endomorphism defined by
\[
f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 7i & −2 \\ −25 & −7i \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.
\]
Show that f is diagonalizable and determine the eigenspaces of f.

Exercise 4.13. Let V be a finite dimensional vector space, f : V → V an endomorphism, and g : V → V an isomorphism. Show that cf = c_{gfg^{−1}}.

Exercise 4.14. Let V be a finite dimensional vector space, f : V → V an endomorphism, and g : V → V an isomorphism. Show that mf = m_{gfg^{−1}}.

Exercise 4.15. Let f : R^3 → R^3 be the endomorphism defined by
\[
f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} −3 & 43 & −17 \\ −4 & 29 & −10 \\ −8 & 60 & −21 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]
Find cf.

Exercise 4.16. Show that the endomorphism f : R^3 → R^3 defined by
\[
f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} −3 & 43 & −17 \\ −4 & 29 & −10 \\ −8 & 60 & −21 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}
\]
is not diagonalizable and find the eigenspaces of this endomorphism.
Exercise 4.17. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. Show that 0 is a root of the minimal polynomial mf if and only if f is not invertible.

Exercise 4.18. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If U is an f-invariant subspace of V such that dim U = 1, show that every nonzero vector in U is an eigenvector.

Exercise 4.19. Let V be a finite dimensional vector space, f, g : V → V endomorphisms, and α ∈ K. If α is an eigenvalue of f and g^3 − 3g^2 f + 3gf^2 − f^3 = 0, show that α is an eigenvalue of g.

Exercise 4.20. Let V be a finite dimensional vector space and let f, g : V → V be endomorphisms such that fg = gf. If mf = t^j and mg = t^k for some positive integers j and k, show that m_{fg} = t^r and m_{f+g} = t^s for some positive integers r and s.

Exercise 4.21. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If the characteristic polynomial of f is cf(t) = 5t^3 + 7t^2 + 2t + 8, show that f is invertible and find f^{−1}.

Exercise 4.22. Let V and W be vector spaces. Let f : V → V be an endomorphism and let g : W → V be an invertible linear transformation. Show that λ ∈ K is an eigenvalue of f if and only if λ is an eigenvalue of g^{−1}fg.
Exercise 4.23. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If U is an f-invariant subspace of V, show that m_{fU} divides mf.

Exercise 4.24. Let V be a finite dimensional vector space, f : V → V an endomorphism, and U an f-invariant subspace of V. Let v, w ∈ V be such that v + w ∈ U. If v and w are eigenvectors of f corresponding to distinct eigenvalues, show that both vectors v and w are in U.

Exercise 4.25. Let V be a finite dimensional complex vector space and let f : V → V be an endomorphism such that f^4 + Id = 0. Show that f is diagonalizable.

Exercise 4.26. Let V be a finite dimensional vector space, f : V → V an endomorphism, and λ a nonzero number. If f is invertible, show that λ is an eigenvalue of f if and only if 1/λ is an eigenvalue of f^{−1}.

Exercise 4.27. Let V be a finite dimensional vector space, f : V → V an endomorphism, and v ∈ V an eigenvector of f corresponding to an eigenvalue λ. We assume that α, β ∈ K are such that λ ≠ α and λ ≠ β. If u ∈ ker((f − α)(f − β)) and u + v = 0, show that v = 0.

Exercise 4.28. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. Show that 0 is an eigenvalue of f if and only if f is not invertible.

Exercise 4.29. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If U is an f-invariant subspace of V and f is diagonalizable, show that fU is diagonalizable.

Exercise 4.30. Let V be a finite dimensional vector space, f : V → V an endomorphism, and α ∈ K. If dim V = n and there is an integer k ≥ 1 such that (f − α)^k = 0, determine the characteristic polynomial of f.

Exercise 4.31. Let V be a finite dimensional complex vector space and let f : V → V be an endomorphism such that f^2 − 2f + 5 = 0. What are all possible polynomials mf?

Exercise 4.32. Let V be a finite dimensional complex vector space and let f : V → V be an endomorphism such that f^3 − id = 0. Show that f is diagonalizable.

Exercise 4.33. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If U is an f-invariant subspace of V, v, w ∈ V are such that v, v + w ∈ U, and w is an eigenvector of f corresponding to a nonzero eigenvalue α ∈ K, show that w ∈ U.

Exercise 4.34. Let V be a finite dimensional complex vector space and let f : V → V be an endomorphism. If cf = pq, where p and q are polynomials such that GCD(p, q) = 1, and g and h are the linear transformations induced by f on ker p(f) and ker q(f), respectively, show that there are nonzero α, β ∈ C such that p = αcg and q = βch.
4.4.2 Jordan canonical form
Exercise 4.35. Let V be a finite dimensional complex vector space and let f : V → V be an endomorphism. If U is an f-invariant subspace of V and v ∈ U, show that the cyclic subspace of f associated with v is a subspace of U.

Exercise 4.36. Show that the endomorphism f : R^2 → R^2 corresponding to the matrix
\[
\begin{pmatrix} 3 & 4 \\ −1 & 7 \end{pmatrix}
\]
is not diagonalizable and find the Jordan canonical form of f and a Jordan basis.
Exercise 4.37. Use Exercise 4.36 to solve the system
\[
\begin{cases} x′ = 3x + 4y \\ y′ = −x + 7y. \end{cases}
\]
Exercise 4.38. Let f : R^3 → R^3 be the endomorphism defined by
\[
f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ −5 & 4 & 1 \\ 1 & 2 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]
Determine the Jordan canonical form of f and a Jordan basis.
Exercise 4.39. Find an endomorphism f : C^2 → C^2 with the Jordan canonical form
\[
\begin{pmatrix} i & 1 \\ 0 & i \end{pmatrix}
\]
and a Jordan basis
\[
\left\{ \begin{pmatrix} 2i \\ −3 \end{pmatrix}, \begin{pmatrix} 1 \\ i \end{pmatrix} \right\}.
\]
Exercise 4.40. Let f : R^3 → R^3 be the endomorphism defined by
\[
f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} −3 & 43 & −17 \\ −4 & 29 & −10 \\ −8 & 60 & −21 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]
Using Exercises 4.15 and 4.16 find the Jordan canonical form of f and a Jordan basis.

Exercise 4.41. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If mf(t) = t^n for some integer n ≥ 1, show that
{0} ⊊ ker f ⊊ · · · ⊊ ker f^{n−1} ⊊ ker f^n = V.

Exercise 4.42. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If m ≥ 2 and Sm is a subspace of V such that ker f^{m−1} ∩ Sm = {0}, show that f(Sm) ∩ ker f^{m−2} = {0}.

Exercise 4.43. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If m ≥ 2 and Sm is a subspace of V such that ker f^{m−1} ∩ Sm = {0}, show that the function g : Sm → f(Sm) defined by g(v) = f(v) is an isomorphism.
Exercise 4.44. Let V be a finite dimensional vector space and let f : V → V be an endomorphism such that mf(t) = t^n for some integer n ≥ 1. Using Exercise 4.42, show that for every m ∈ {1, . . . , n} there is a subspace Sm of V such that ker f^m = ker f^{m−1} ⊕ Sm for m ∈ {1, . . . , n} and f(Sm) ⊆ S_{m−1} for m ∈ {2, . . . , n}.

Exercise 4.45. Give an example of a vector space V and an endomorphism f : V → V such that cf = (t − 2)^4 (t − 5)^4 and mf = (t − 2)^2 (t − 5)^3.

Exercise 4.46. Let V and W be finite dimensional vector spaces, f : V → V and g : W → W endomorphisms, v ∈ V and w ∈ W. If m_{f,v} = m_{g,w}, show that there is an isomorphism h : V_{f,v} → W_{g,w} such that g(x) = h(f(h^{−1}(x))) for every x ∈ W_{g,w}.

Exercise 4.47. Let V and W be finite dimensional vector spaces and let f : V → V and g : W → W be endomorphisms. If f and g have the same Jordan canonical form, show that there is an isomorphism h : V → W such that g = hfh^{−1}.

Exercise 4.48. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If cf(t) = (α − t)^n for some α ∈ K and integer n ≥ 1, show that there is an endomorphism g : V → V such that cg(t) = (−t)^n and we have f = g + α Id.

Exercise 4.49. Let V be a vector space such that dim V = 8 and let f : V → V be an endomorphism such that mf(t) = (t − α)^5 for some α ∈ K. If dim ker(f − α)^4 = 7, dim ker(f − α)^3 = 6, dim ker(f − α)^2 = 4, dim ker(f − α) = 2, find the Jordan canonical form of f and a Jordan basis.
Exercise 4.55. Let V be a vector space such that dim V = 4 and let f : V → V be an endomorphism such that cf (t) = (α − t)4 and mf (t) = (t − α)2 for some α ∈ K. If dim ker(f − α) = 2, determine the Jordan canonical form of f and explain the construction of a Jordan basis. Exercise 4.56. Let V be a finite dimensional vector space and let f : V → V be an endomorphism such that the Jordan canonical form of f is α 1 0 0 0 0 0 0 0 α 1 0 0 0 0 0 0 0 α 1 0 0 0 0 0 0 0 α 0 0 0 0 0 0 0 0 α 1 0 0 0 0 0 0 0 α 0 0 0 0 0 0 0 0 α 1 0 0 0 0 0 0 0 α for some α ∈ K. Determine mf .
Exercise 4.57. Let V be a finite dimensional vector space and let f : V → V be an endomorphism such that the Jordan canonical form of f is
\[
\begin{pmatrix}
α & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & α & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & α & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & α & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & β & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & β & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & β & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & β
\end{pmatrix}
\]
for some α, β ∈ K. Determine mf.
Exercise 4.58. Let V be a vector space such that dim V = 7 and let f : V → V be an endomorphism such that mf = (t − α)4 for some α ∈ K. Determine all possible Jordan canonical forms of f . Exercise 4.59. Let V be a vector space such that dim V = 25 and let f : V → V be an endomorphism such that mf (t) = (t − α)3 for some α ∈ K. If dim ker(f − α)2 = 21 and dim ker(f − α) = 12, determine the number of Jordan α 1 blocks of the form in the Jordan canonical form of f . 0 α Exercise 4.60. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If cf = (t − α)4 (t − β)4 and mf = (t − α)2 (t − β)2 for some α, β ∈ K, determine all possible Jordan canonical forms of f . Exercise 4.61. Let V be a finite dimensional vector space and let f : V → V be a diagonalizable endomorphism. Assume v1 . . . vn is a basis of V consisting
of eigenvectors of f corresponding to eigenvalues λ1 , . . . , λn . If λ1 , . . . , λk are the nonzero eigenvalues of f , show that {v1 , . . . , vk } is a basis of ran f and {vk+1 , . . . , vn } is a basis of ker f . Exercise 4.62. Let V be a finite dimensional vector space and let f : V → V be an endomorphism. If dim ker(f − α)9 = 33 and dim ker(f − α)7 = 27 for some α ∈ K, determine the possible number of 8 × 8 Jordan blocks in the Jordan canonical form of f . Exercise 4.63. Let V be a finite dimensional vector space, f : V → V an endomorphism, and α ∈ K. Show that it is not possible to have dim V = 8, mf (t) = (t − α)7 and dim ker(f − α) = 1.
4.4.3 Rational form
Exercise 4.64. Let V be a finite dimensional vector space, f : V → V an endomorphism, and v ∈ V. If mf(t) = (t^2 − t + 1)^5 and v ∉ ker(f^2 − f + 1)^3, what can m_{f,v} be?

Exercise 4.65. Let V be a finite dimensional vector space, v, w ∈ V, and f : V → V an endomorphism. If GCD(m_{f,v}, m_{f,w}) = 1, show that m_{f,v+w} = m_{f,v} m_{f,w}.

Exercise 4.66. Let f : R^3 → R^3 be the endomorphism defined by
\[
f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} −3 & 43 & −17 \\ −4 & 29 & −10 \\ −8 & 60 & −21 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]
Find a vector v ∈ R^3 such that m_{f,v} = (t + 1)(t − 3).

Exercise 4.67. Let f : R^3 → R^3 be the endomorphism defined by
\[
f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} −3 & 43 & −17 \\ −4 & 29 & −10 \\ −8 & 60 & −21 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]
Find a vector v ∈ R^3 such that the set B = {v, f(v), f^2(v)} is a basis of R^3 and determine the B-matrix of f.
Exercise 4.70. Let V be a vector space such that dim V = 8 and let f : V → V be an endomorphism. If mf = (t2 +1)(t2 +t+1), determine all possible invariant factors of f . Exercise 4.71. Let V be a finite-dimensional vector space, f : V → V an endomorphism, and B a basis of V. If the B-matrix of f is 0 0 −1 0 0 1 0 −3 0 0 0 1 −3 0 0 , 0 0 0 0 −1 0 0 0 1 −2
show that this matrix is the rational form of f and determine the Jordan canonical form of f . Exercise 4.72. Find a vector space V and an endomorphism f : V → V such that cf = (t2 + t + 1)3 (t2 + 1) and mf = (t2 + t + 1)(t2 + 1). Exercise 4.73. Let V be a finite-dimensional vector space, f : V → V an endomorphism, and B a basis of V. If the B-matrix of f is 0 −1 0 0 0 1 −1 0 0 0 0 0 0 −1 0 , 0 0 1 −1 0 0 0 0 0 0 determine the rational form of f .
Exercise 4.74. Let V be a vector space such that dim V = 8 and let f : V → V be an endomorphism. If the Jordan canonical form of f is α 1 0 0 0 0 α 1 0 0 0 0 α 0 0 , 0 0 0 β 1 0 0 0 0 β where α 6= β, determine the rational form of f .
Exercise 4.75. Let V be a finite-dimensional vector space, f : V → V an endomorphism, and B a basis of V. If the B-matrix of f is 0 0 0 0 −1 1 0 0 0 −5 0 1 0 0 −10 , 0 0 1 0 −10 0 0 0 1 −5 determine the minimal polynomial of f .
Exercise 4.76. Let V be a finite-dimensional vector space, f : V → V an endomorphism, and B a basis of V. If the B-matrix of f is
\[
\begin{pmatrix} 0 & −3 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & −1 \\ 0 & 0 & 1 & 0 \end{pmatrix},
\]
determine the rational form of f.
Exercise 4.77. Determine a vector space V and an endomorphism f : V → V such that mf = 3 − t − 4t2 + 2t3 + t4 .
Chapter 5
Appendices

Appendix A   Permutations
By a permutation on {1, 2, . . . , n} we mean a bijection σ : {1, 2, . . . , n} → {1, 2, . . . , n}. The set of all permutations on {1, 2, . . . , n} will be denoted by Sn . Note that the identity function id : {1, 2, . . . , n} → {1, 2, . . . , n} is an element of Sn . A permutation σ ∈ Sn defines a permutation on any set with n objects. For example, if σ ∈ Sn and (x1 , . . . , xn ) is an ordered n-tuple of vectors, then (xσ(1) , . . . , xσ(n) ) is the corresponding permutation of (x1 , . . . , xn ). If we write the first n integers ≥ 1 in the order 1, 2, . . . , n, then we may think of a permutation σ ∈ Sn as being a reordering of these numbers such that we get σ(1), σ(2), . . . , σ(n). A permutation σ ∈ Sn will be called a transposition if there are distinct j, k ∈ {1, 2, . . . , n} such that σ(j) = k, σ(k) = j, and σ(m) = m for every m ∈ {1, 2, . . . , n} different from j and k.
In other words, a transposition switches two numbers and leaves the remaining numbers where they were. We will use the symbol σ_{jk} to denote the above transposition. Note that σ_{jk}^{−1} = σ_{jk} for every j, k ∈ {1, 2, . . . , n}. A transposition of the form σ_{j,j+1} is called an elementary transposition.

Theorem 5.1. Every permutation in Sn with n ≥ 2 is an elementary transposition or a composition of elementary transpositions.
Proof. We will prove this result using induction on n. Since every permutation σ ∈ S2 is an elementary transposition, the statement is true for n = 2. Now assume that the statement is true for every k < n for some n > 2.
Let σ : {1, 2, . . . , n} → {1, 2, . . . , n} be a permutation. If σ(n) = j for some j < n, then the permutation τ = σ_{n,n−1} · · · σ_{j+1,j} σ satisfies τ(n) = n and we have
σ = σ_{j+1,j}^{−1} · · · σ_{n,n−1}^{−1} τ = σ_{j+1,j} · · · σ_{n,n−1} τ.
This means that, if τ is a product of elementary transpositions, then σ is also a product of elementary transpositions. Now, if σ(n) = n, then the restriction of σ to the set {1, 2, . . . , n − 1} is a product of elementary transpositions on {1, 2, . . . , n − 1}, by the inductive assumption. Because every elementary transposition π on {1, 2, . . . , n − 1} can be extended to an elementary transposition π′ on {1, 2, . . . , n} by taking π′(j) = π(j) for j < n and π′(n) = n, σ is a product of elementary transpositions.
Definition 5.2. Let σ ∈ Sn. A pair of integers (j, k), j < k, is called an inversion of σ if σ(j) > σ(k). A permutation with an even number of inversions is called an even permutation and a permutation with an odd number of inversions is called an odd permutation. We define the sign of a permutation σ, denoted by ε(σ), to be 1 if σ is even and −1 if σ is odd, that is,
ε(σ) = 1 if σ is even, and ε(σ) = −1 if σ is odd.
Example 5.3. Show that every transposition is an odd permutation. Proof. Let σjk be a transposition where j < k. We represent this transposition by (1, . . . , j − 1, k, j + 1, . . . , k − 1, j, k + 1, . . . , n). The inversions are (k, j + 1), . . . , (k, k − 1), (j + 1, j), . . . , (k − 1, j), (k, j). The total number of inversions is the odd number (2(k − j) − 1).
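The sign of Definition 5.2 is straightforward to compute by counting inversions. The following short sketch (not part of the text) does this in Python for a permutation given as the tuple (σ(1), . . . , σ(n)); the first test is the transposition of 2 and 5 in S5, which Example 5.3 says must be odd.

```python
def sign(sigma):
    # count the inversions of Definition 5.2 and return the sign
    inversions = sum(1 for j in range(len(sigma))
                       for k in range(j + 1, len(sigma))
                       if sigma[j] > sigma[k])
    return 1 if inversions % 2 == 0 else -1

print(sign((1, 5, 3, 4, 2)))   # the transposition swapping 2 and 5; expected -1
print(sign((1, 2, 3, 4, 5)))   # the identity permutation; expected 1
```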
Theorem 5.4. The sign of a product of permutations is the product of the signs of these permutations, that is, ǫ(σ1 · · · σk ) = ǫ(σ1 ) · · · ǫ(σk ), for any σ1 , . . . , σk ∈ Sn . Proof. Let σ ∈ Sn and let τ = σσj+1,j . First note that, since (τ (j), τ (j+1)) = (σ(j+1), σ(j)), the pair (τ (j), τ (j+1)) is an inversion if and only if (σ(j), σ(j + 1)) is not an inversion. Moreover, (τ (p), τ (q)) = (σ(p)), σ(q)) if p 6= j, p 6= j + 1, q 6= j and q 6= j + 1. Since {(τ (1), τ (j)), . . . , (τ (j − 1), τ (j))} = {(σ(1), σ(j + 1)), . . . , (σ(j − 1)), σ(j + 1))}, {(τ (1), τ (j + 1)), . . . , (τ (j − 1), τ (j + 1))} = {(σ(1), σ(j)), . . . , (σ(j − 1)), σ(j))}, {(τ (j), τ (j + 2)), . . . , (τ (j), τ (n))} = {(σ(j + 1), σ(j + 2)), . . . , (σ(j + 1)), σ(n))}, {(τ (j + 1), τ (j + 2)), . . . , (τ (j + 1), τ (n))} = {(σ(j), σ(j + 2)), . . . , (σ(j), σ(n))}, we have {(τ (1), τ (j)), . . . , (τ (j − 1), τ (j)), (τ (1), τ (j + 1)), . . . , (τ (j − 1), τ (j + 1))} = {(σ(1), σ(j)), . . . , (σ(j − 1)), σ(j)), (σ(1), σ(j + 1)), . . . , (σ(j − 1)), σ(j + 1))} and {(τ (j), τ (j + 2)), . . . , (τ (j), τ (n)), (τ (j + 1), τ (j + 2)), . . . , (τ (j + 1), τ (n))} = {(σ(j), σ(j + 2)), . . . , (σ(j), σ(n)), (σ(j + 1), σ(j + 2)), . . . , (σ(j + 1)), σ(n))}. Hence ǫ(τ ) = −ǫ(σ), because (τ (j), τ (j + 1)) = (σ(j + 1)), σ(j)). Since every permutation is a product of elementary transpositions, the result follows by induction.
Corollary 5.5. For every σ ∈ Sn we have ε(σ^{−1}) = ε(σ).

Proof. Since ε(σ^{−1})ε(σ) = ε(σ^{−1}σ) = ε(Id) = 1, we have ε(σ^{−1}) = (ε(σ))^{−1} = ε(σ).
Appendix B
Complex numbers
The set of complex numbers, denoted by C, can be identified with the set R^2 where we define addition as in the vector space R^2, that is
(a, b) + (c, d) = (a + c, b + d),
and the multiplication by
(a, b)(c, d) = (ac − bd, ad + bc).
If z = (x, y), then by −z we denote the number (−x, −y). Note that z + (−z) = (0, 0). If z = (x, y) ≠ (0, 0), then by z^{−1} or 1/z we denote the number
( x/(x^2 + y^2), −y/(x^2 + y^2) ).
Note that zz^{−1} = z^{−1}z = (1, 0).
Theorem 5.6. If z, z1, z2, z3 ∈ C, then
(a) (z1 + z2) + z3 = z1 + (z2 + z3);
(b) z1 + z2 = z2 + z1;
(c) (z1 z2)z3 = z1(z2 z3);
(d) z1 z2 = z2 z1;
(e) z(0, 0) = (0, 0);
(f) z(1, 0) = z;
(g) (z1 + z2)z3 = z1 z3 + z2 z3.
All these properties are easy to verify. As an example, we verify property (c). If z1 = (a, b), z2 = (c, d), and z3 = (e, f), then
(z1 z2)z3 = (ac − bd, ad + bc)(e, f) = ((ac − bd)e − (ad + bc)f, (ac − bd)f + (ad + bc)e)
and
z1(z2 z3) = (a, b)(ce − df, cf + de) = (a(ce − df) − b(cf + de), a(cf + de) + b(ce − df)).
Since
(ac − bd)e − (ad + bc)f = a(ce − df) − b(cf + de)
and
(ac − bd)f + (ad + bc)e = a(cf + de) + b(ce − df),
we get (z1 z2)z3 = z1(z2 z3).
The function ϕ : R → {(x, 0) : x ∈ R}, defined by ϕ(x) = (x, 0), is bijective and satisfies
(a) ϕ(x + y) = ϕ(x) + ϕ(y);
(b) ϕ(xy) = ϕ(x)ϕ(y).
This observation makes it possible to identify the real number x with the complex number (x, 0). It is easy to verify that (0, 1)(0, 1) = −1. The complex number (0, 1) is denoted by i. Consequently we can write i^2 = −1. If z = (x, y) is a complex number, then
z = (x, y) = (x, 0) + (y, 0)(0, 1) = x + yi,
which is the standard notation for complex numbers. In this notation the product of the complex numbers (a, b) and (c, d) becomes
(a, b)(c, d) = (a + bi)(c + di) = (ac − bd) + (ad + bc)i.
A complex number z is called purely imaginary if there is a nonzero real number y such that z = (0, y) = yi. The complex conjugate of a number z = x + yi is the number x − yi and it is denoted by \overline{z}.

Theorem 5.7. If z, z1, z2 ∈ C, then
(a) \overline{\overline{z}} = z;
(b) \overline{z_1 + z_2} = \overline{z_1} + \overline{z_2};
(c) \overline{z_1 z_2} = \overline{z_1}\,\overline{z_2}.
Proof. Clearly we have \overline{\overline{z}} = z and \overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}. To prove \overline{z_1 z_2} = \overline{z_1}\,\overline{z_2} we let z1 = x1 + y1 i and z2 = x2 + y2 i. Now
\overline{z_1 z_2} = \overline{(x_1 + y_1 i)(x_2 + y_2 i)} = \overline{x_1 x_2 − y_1 y_2 + (x_1 y_2 + y_1 x_2)i} = x_1 x_2 − y_1 y_2 − (x_1 y_2 + y_1 x_2)i
and
\overline{z_1}\,\overline{z_2} = \overline{x_1 + y_1 i}\;\overline{x_2 + y_2 i} = (x_1 − y_1 i)(x_2 − y_2 i) = x_1 x_2 − y_1 y_2 − (x_1 y_2 + y_1 x_2)i.
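A tiny numerical check (not part of the text) of the pair arithmetic: multiplying pairs with the rule (a, b)(c, d) = (ac − bd, ad + bc) agrees with Python's built-in complex numbers under the identification (a, b) ↔ a + bi.

```python
def mul(z, w):
    # multiplication of complex numbers written as pairs (a, b)
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

z, w = (2.0, 3.0), (-1.0, 4.0)
print(mul(z, w))                      # (-14.0, 5.0)
print(complex(*z) * complex(*w))      # (-14+5j), the same number
```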
Note that a complex number z is real if and only if z = \overline{z} and a complex number z is purely imaginary if and only if \overline{z} = −z.
The absolute value of the number z = x + yi is the nonnegative number \sqrt{x^2 + y^2} and is denoted by |z|.

Theorem 5.8. If z, z1, z2 ∈ C, then
(a) z ≠ 0 if and only if |z| ≠ 0;
(b) z\overline{z} = |z|^2;
(c) z^{−1} = \overline{z}/|z|^2;
(d) |z1 + z2| ≤ |z1| + |z2|;
(e) |z1 z2| = |z1| |z2|.
Proof. Parts (a), (b), and (c) are direct consequences of the definitions. To prove |z1 + z2| ≤ |z1| + |z2| we let z1 = x1 + y1 i and z2 = x2 + y2 i. Since both numbers |z1 + z2| and |z1| + |z2| are nonnegative, the inequality |z1 + z2| ≤ |z1| + |z2| is equivalent to the inequality |z1 + z2|^2 ≤ (|z1| + |z2|)^2, which is equivalent to the inequality
x1 x2 + y1 y2 ≤ \sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}.
The above inequality is a direct consequence of the inequality
(x1 x2 + y1 y2)^2 ≤ (x_1^2 + y_1^2)(x_2^2 + y_2^2),
which is equivalent to the inequality
2 x1 x2 y1 y2 ≤ x_1^2 y_2^2 + y_1^2 x_2^2,
which can be written as
0 ≤ (x1 y2 − y1 x2)^2.
Now to prove (e) we observe that
|z1 z2|^2 = z1 z2 \overline{z_1 z_2} = z1 z2 \overline{z_1}\,\overline{z_2} = z1 \overline{z_1} z2 \overline{z_2} = |z1|^2 |z2|^2,
which gives us |z1 z2| = |z1| |z2|.

Corollary 5.9. For every complex number z ≠ 0, there are a unique real number r > 0 and a complex number v such that |v| = 1 and z = rv.
Proof. We can take r = |z| and v = z/|z|, because |z/|z|| = |z|/|z| = 1. If z = sw where s > 0 and |w| = 1, then |z| = |s||w| = s, by Theorem 5.8, and thus w = z/|z|.
Appendix C
Polynomials
A polynomial is a function p : K → K defined by p(t) = an tn + an−1 tn−1 + · · · + a1 t + a0 where n is a nonnegative integer and a0 , . . . , an ∈ K. We denote the set of all polynomials by P(K). If p, q ∈ P(K) and α ∈ K, then we define polynomials p + q, pq, and αp by (p + q)(t) = p(t) + q(t),
(pq)(t) = p(t)q(t),
and (αp)(t) = αp(t).
The numbers a0, . . . , an are uniquely defined by p. To prove this important fact we first prove the following result, which is a form of the Euclidean algorithm.

Theorem 5.10. Let p(t) = a_n t^n + · · · + a_0 and s(t) = b_m t^m + · · · + b_0 be polynomials such that n ≥ m > 0 and b_m ≠ 0. Then there are polynomials q(t) = c_{n−m} t^{n−m} + · · · + c_0 and r(t) = d_k t^k + · · · + d_0 such that m > k ≥ 0 and p = qs + r.
an n−m t s(t) = a′n−1 tn−1 + · · · + a′0 bm
for some a′n−1 , . . . , a′0 ∈ K. If n − 1 < m, then p(t) =
an n−m t s(t) + p1 (t) bm
and we can take q(t) = bamn tn−m and r = p1 . If n − 1 ≥ m, then we can show by induction that there are polynomials q1 (t) = c′n−m−1 tn−m−1 + · · · + c′0 and r1 (t) = d′j tj + · · ·+ d′0 such that m > j ≥ 0 and p1 = q1 s + r1 . Consequently, p(t) =
an n−m t s(t) + q1 (t)s(t) + r1 (t) = bm
an n−m t + q1 (t) s(t) + r1 (t) bm
and we can take q(t) = tn−m an b−1 m + q1 (t) and r = r1 .
Definition 5.11. By a root of a nonzero polynomial p ∈ P(K) we mean any number α ∈ K such that p(α) = 0.
Theorem 5.12. If α ∈ K is a root of a polynomial p(t) = a_n t^n + · · · + a_0, then there is a polynomial q(t) = c_{n−1} t^{n−1} + · · · + c_0 such that p(t) = (t − α)q(t).
Proof. According to Theorem 5.10 there are a polynomial q(t) = c_{n−1} t^{n−1} + · · · + c_0 and a number r ∈ K such that p(t) = (t − α)q(t) + r. Since p(α) = 0, we get r = 0 and thus p(t) = (t − α)q(t).

Theorem 5.13. If a polynomial p(t) = a_n t^n + · · · + a_0 has n + 1 distinct roots, then p(t) = 0 for every t ∈ K.
Proof. Let α1, . . . , α_{n+1} be distinct roots of the polynomial p. From Theorem 5.12 we obtain, by induction, that there is a c ∈ K such that p(t) = c(t − α1) · · · (t − αn). Since
p(α_{n+1}) = c(α_{n+1} − α1) · · · (α_{n+1} − αn) = 0,
we get c = 0 and consequently p = 0.
Corollary 5.14. If p ∈ P(K) and p(t) = 0 for all t ∈ K, then p = 0. Now we are ready to prove the important result mentioned at the beginning, that is, the uniqueness of numbers an , . . . , a0 for a polynomial p(t) = an tn + · · · + a0 . Theorem 5.15. Let p(t) = an tn + · · · + a0 and q(t) = bm tm + · · · + b0 . If p(t) = q(t) for all t ∈ K, then n = m and aj = bj for every j ∈ {1, . . . , n}. Proof. Since (p − q)(t) = 0 for all t ∈ K, the result follows from Corollary 5.14. If p(t) = an tn +· · ·+a0 , then the numbers a0 , . . . , an are called the coefficients of p. If p is a nonzero polynomial and an 6= 0, then an is called the leading coefficient of the polynomial p and n is called the degree of p, denoted by deg p. We do not define the degree of the zero polynomial, that is, the polynomial
defined by p(t) = 0 for every t ∈ K. Note that the degree of a polynomial is well defined because of Theorem 5.15. Let n be a nonnegative integer. The set of all polynomials of the form a_0 + a_1 t + · · · + a_n t^n is denoted by P_n(K). Since P_0(K) can be identified with K, we often do not distinguish between a constant polynomial α and the number α. The following theorem is an immediate consequence of the definition of the degree of a polynomial.

Theorem 5.16. If p and q are nonzero polynomials, then
deg(p + q) ≤ max {deg p, deg q}
and
deg(pq) = deg p + deg q.
Note that deg(αp) = deg p for any nonzero polynomial p and any nonzero number α. Now we can formulate the final form of Theorem 5.10.

Theorem 5.17. If p, s ∈ P(K) and s is a nonzero polynomial, then there are unique q, r ∈ P(K) such that deg r < deg s or r = 0 and we have p = qs + r.
Proof. The existence of q, r ∈ P(K) can be obtained by a slight modification of the proof of Theorem 5.10. Suppose that we have
p = qs + r
and p = q1 s + r1 ,
for some q, r, q1, r1 ∈ P(K) such that deg r < deg s or r = 0 and deg r1 < deg s or r1 = 0. Then 0 = (q − q1)s = r1 − r. If q − q1 ≠ 0, then (q − q1)s ≠ 0, because deg(q − q1) + deg s = deg((q − q1)s). Consequently, q − q1 = 0 and r = r1.
The polynomial q is called the quotient on division of the polynomial p by the polynomial s and the polynomial r is the remainder on this division. The student is probably familiar with the long division algorithm; we do not use it in the next example.

Example 5.18. Let p(t) = 4t^3 + 5t^2 + 15t + 8 and s(t) = t^2 + t + 3. Find the quotient and the remainder on division of p by s.
Solution. In this case, it is easy to see that q(t) = at + b and r(t) = ct + d. We
have
4t^3 + 5t^2 + 15t + 8 = (t^2 + t + 3)(at + b) + ct + d = at^3 + (a + b)t^2 + (3a + b + c)t + 3b + d,
which yields 4 = a, 5 = a + b, 15 = 3a + b + c, 8 = 3b + d. This gives q(t) = 4t + 1 and r(t) = 2t + 5.
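Example 5.18 can also be checked with a computer algebra system; the following sketch (not part of the text) assumes SymPy is available and uses its polynomial division.

```python
from sympy import symbols, div

t = symbols('t')
p = 4*t**3 + 5*t**2 + 15*t + 8
s = t**2 + t + 3
q, r = div(p, s, t)       # quotient and remainder of p divided by s
print(q, r)               # expected: 4*t + 1 and 2*t + 5
```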
A polynomial p is called monic if its leading coefficient is 1. If p and q are polynomials such that there is a polynomial s satisfying p = qs, then we say that q divides p or that q is a divisor of p.

Theorem 5.19. Let p1, . . . , pn be nonzero polynomials. There is a unique monic polynomial d such that
{s1 p1 + · · · + sn pn : s1, . . . , sn ∈ P(K)} = {sd : s ∈ P(K)}.
The polynomial d divides all polynomials p1, . . . , pn. Moreover, if a nonzero polynomial h divides all polynomials p1, . . . , pn, then h divides d.
Proof. Let d be a nonzero polynomial of the smallest degree in the set
F = {s1 p1 + · · · + sn pn : s1, . . . , sn ∈ P(K)}.
Then d divides pj for every j ∈ {1, . . . , n}. Indeed, by Theorem 5.17, for every j ∈ {1, . . . , n} we have pj = qj d + rj, for some polynomials qj and rj, and thus rj ∈ F. If rj ≠ 0 for some j ∈ {1, . . . , n}, then we get a contradiction because deg rj < deg d and d is a nonzero polynomial of the smallest degree in F. Consequently we must have rj = 0 for every j ∈ {1, . . . , n}, which means that for every j ∈ {1, . . . , n} the polynomial d is a divisor of the polynomial pj. Since
s1 p1 + · · · + sn pn = (s1 q1 + · · · + sn qn)d,
we have F ⊆ {sd : s ∈ P(K)}. On the other hand, because d ∈ F there are polynomials h1, . . . , hn such that
d = h1 p1 + · · · + hn pn    (5.1)
and consequently {sd : s ∈ P(K)} ⊆ F.
Now, if d1 and d2 are monic polynomials such that {sd1 : s ∈ P(K)} = {sd2 : s ∈ P(K)} , then there are polynomials s1 and s2 such that d1 = s1 d2 and d2 = s2 d1 . Then d1 = s1 s2 d1 and thus s1 s2 = 1. Since d1 and d2 are monic polynomials, we conclude that d1 = d2 . Finally, if a nonzero polynomial h divides all polynomials p1 , . . . , pn , then h divides d as a consequence of (5.1).
As a consequence of Theorem 5.19 and its proof we get the following result. Theorem 5.20. Let p1 , . . . , pn be nonzero polynomials. There is a unique monic polynomial d such that d divides all polynomials p1 , . . . , pn and if a nonzero polynomial h divides all polynomials p1 , . . . , pn , then h divides d.
Definition 5.21. Let p1 , . . . , pn be nonzero polynomials. The unique monic polynomial d from Theorem 5.20 is called the greatest common divisor of the polynomials p1 , . . . , pn and is denoted by GCD(p1 , . . . , pn ).
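As a sketch (not part of the text), SymPy can compute the greatest common divisor of Definition 5.21, and its extended Euclidean algorithm also produces polynomials s1, s2 with s1 p1 + s2 p2 = GCD(p1, p2), the representation appearing in Theorem 5.19.

```python
from sympy import symbols, gcd, gcdex, expand

t = symbols('t')
p1 = (t - 1)**2 * (t + 2)
p2 = (t - 1) * (t + 3)
print(gcd(p1, p2))            # expected: t - 1
s1, s2, d = gcdex(p1, p2, t)  # Bezout coefficients and the gcd
print(d)                      # t - 1 (up to normalization)
print(expand(s1*p1 + s2*p2))  # equals d
```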
Note that if GCD(p1 , . . . , pn ) = 1, then there are s1 , . . . , sn ∈ P(K) such that s1 p1 + · · · + sn pn = 1. This observation is often used in proofs. Definition 5.22. A nonzero polynomial p ∈ P(K) with deg p ≥ 1 is called irreducible in P(K) if p = f g and f, g ∈ P(K) implies f ∈ K or g ∈ K. Note that irreducibility of a polynomial may depend on K. For example, the polynomial t2 + 1 is irreducible in P(R) but it is not irreducible in P(C). Clearly, if a polynomial in P(R) is irreducible in P(C) then it is irreducible in P(R).
Example 5.23. Show that the polynomial at+b is irreducible for any a, b ∈ K with a 6= 0. Solution. If at + b = f (t)g(t), then deg f + deg g = 1 and thus deg f = 0 or deg g = 0. Consequently, f ∈ K or g ∈ K.
Example 5.24. Show that the polynomial of degree at least 2 which has a root in K is not irreducible. Solution. This is an immediate consequence of Theorem 5.12.
Example 5.25. Let a, b, c ∈ R with a ≠ 0. Show that the polynomial at^2 + bt + c is irreducible over R if and only if 4ac − b^2 > 0.
Solution. If 4ac − b^2 > 0, then
\[
at^2 + bt + c = a\left(t^2 + \frac{b}{a}t + \frac{b^2}{4a^2} + \frac{4ac − b^2}{4a^2}\right) = a\left(\left(t + \frac{b}{2a}\right)^2 + \frac{4ac − b^2}{4a^2}\right).
\]
Since
\[
\left(t + \frac{b}{2a}\right)^2 + \frac{4ac − b^2}{4a^2} > 0,
\]
the polynomial at^2 + bt + c has no real root and consequently is irreducible. If 4ac − b^2 ≤ 0, then
\[
\left(t + \frac{b}{2a}\right)^2 = \frac{b^2 − 4ac}{4a^2}
\]
and the polynomial at^2 + bt + c has the well known roots
\[
−\frac{b}{2a} ± \frac{\sqrt{b^2 − 4ac}}{2a}
\]
and thus is not irreducible.
Example 5.26. Let a, b, c, d ∈ R with a 6= 0. If the polynomial at3 +bt2 +ct+d has a complex root x+yi such that y 6= 0, show that the polynomial (t−x)2 +y 2 divides at3 + bt2 + ct + d.
Solution. If at3 + bt2 + ct + d has a root x + yi such that y 6= 0, then a(x − yi)3 + b(x − yi)2 + c(x − yi) + d = a(x + yi)3 + b(x + yi)2 + c(x + yi) + d = 0.
Note that (t − x − yi)(t − x + yi) = (t − x)2 + y 2 is an irreducible polynomial in P(R). Since we can write at3 + bt2 + ct + d = ((t − x)2 + y 2 )q + et + f, where e and f are real numbers, et + f as a polynomial from P(C) has two different roots x + yi and x − yi. Consequently e = f = 0, by Theorem 5.13. This shows that the polynomial (t − x)2 + y 2 divides at3 + bt2 + ct + d.
Theorem 5.27. If an irreducible polynomial p divides the product of two polynomials f g then p divides f or p divides g.
Proof. Suppose p does not divide f . If d is a polynomial such that p = dp1 and f = df1 for some p1 , f1 ∈ P(K), then d ∈ K. Consequently, GCD(p, f ) = 1 and there are polynomials q and h such that qp + hf = 1. Hence qpg + hf g = g. Because p divides qpg and hf g, it must divide g.
Lemma 5.28. If an irreducible polynomial p divides the product q1 · · · qm of irreducible polynomials q1 , . . . , qm , then there are j ∈ {1, . . . , m} and c ∈ K such that qj = cp. Proof. Using Theorem 5.27 and induction we can show there is a j ∈ 1, . . . , m such that p divides qj . Because the polynomial qj is an irreducible polynomial, there is a number c ∈ K such that qj = cp.
Theorem 5.29. If p1 · · · pn = q1 · · · qm for some irreducible polynomials p1, . . . , pn, q1, . . . , qm, then n = m and there are numbers c1, . . . , cn such that q1 = c1 p1, . . . , qn = cn pn.
Proof. Without loss of generality, we can assume that q1 = c1 p1 where c1 ∈ K, by Lemma 5.28. This gives us p2 · · · pn = c1 q2 · · · qm. Because the polynomial c1 q2 is irreducible, we can finish the proof by induction.
Theorem 5.30. Every nonzero polynomial can be written as a product of irreducible polynomials.
Proof. Let p be a nonzero polynomial such that deg p > 0. If p is irreducible we are done. If not, then we can write p = fg where deg f > 0 and deg g > 0. Now we continue in the same way with the polynomials f and g and finish the proof by induction.
Because for every irreducible polynomial p there is a unique monic irreducible polynomial q such that p = cq, where c ∈ K, every polynomial p such that deg p > 0 can be uniquely written as
p = c q_1^{m_1} · · · q_k^{m_k}    (5.2)
where c ∈ K and q1, . . . , qk are distinct irreducible monic polynomials. As a consequence of Theorems 5.29 and 5.30 we get the following result.

Theorem 5.31. Let p1, . . . , pn be nonzero polynomials. There is a unique monic polynomial m such that all polynomials p1, . . . , pn divide m and if all polynomials p1, . . . , pn divide a nonzero polynomial h then m divides h.
Definition 5.32. Let p1 , . . . , pn be nonzero polynomials. The unique monic polynomial m from Theorem 5.31 is called the lowest common multiple of the polynomials p1 , . . . , pn and is denoted by LCM(p1 , . . . , pn ). The following theorem is called the Fundamental Theorem of Algebra. It is usually proven using complex analysis. Theorem 5.33. For every p ∈ P(C) such that deg p > 0 there is z ∈ C such that p(z) = 0.
From the Fundamental Theorem of Algebra and Theorem 5.12 we obtain the following important result.

Theorem 5.34. If p ∈ P(C) has exactly k distinct roots α1, . . . , αk, then
p(t) = c(t − α1)^{m1} · · · (t − αk)^{mk}
for some c ∈ C and some integers m1, . . . , mk ≥ 1.
Theorem 5.35. If p ∈ P(R) is irreducible and deg p ≥ 2, then p(t) = at2 + bt + c where a, b, c ∈ R, a 6= 0, and 4ac − b2 > 0. Proof. If p has a real root, then p is not irreducible. If deg p ≥ 3, then p has a complex root of the form x + yi with y 6= 0 and we can show, as in Example 5.26, that the polynomial (t − x)2 + y 2 divides p and consequently p is not irreducible. To complete the proof we note that a polynomial p(t) = at2 + bt + c with a, b, c ∈ R, a 6= 0, and 4ac − b2 > 0, is irreducible. Now we can state a version of Theorem 5.34 for P(R). Theorem 5.36. If p ∈ P(R), then p = cq1m1 · · · qkmk , where c ∈ R and for every j ∈ {1, . . . , k} the polynomial qj is either of the form t−α for some α ∈ R or of the form t2 +βt+γ for some β, γ ∈ R such that β 2 − 4γ < 0.
Lagrange interpolation theorem

Theorem 5.37. For any integer n ≥ 1 and α1, . . . , αn, β1, . . . , βn ∈ K, such that α1, . . . , αn are distinct, there is a polynomial p ∈ P(K) such that p(α1) = β1, . . . , p(αn) = βn.
Proof. For every j ∈ {1, . . . , n} the polynomial
\[
q_j(t) = \frac{(t − α_1) \cdots (t − α_{j−1})(t − α_{j+1}) \cdots (t − α_n)}{(α_j − α_1) \cdots (α_j − α_{j−1})(α_j − α_{j+1}) \cdots (α_j − α_n)}
\]
satisfies q_j(α_j) = 1 and q_j(α_k) = 0 for all k ≠ j. Thus we can take p = β1 q1 + · · · + βn qn.
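The proof of Theorem 5.37 is constructive, and the polynomials q_j translate directly into code. The sketch below (not part of the text) builds the interpolating polynomial for some sample data using SymPy.

```python
from sympy import symbols, expand

t = symbols('t')

def lagrange(alphas, betas):
    # build p = beta_1 q_1 + ... + beta_n q_n as in the proof of Theorem 5.37
    n = len(alphas)
    p = 0
    for j in range(n):
        q_j = 1
        for k in range(n):
            if k != j:
                q_j = q_j * (t - alphas[k]) / (alphas[j] - alphas[k])
        p = p + betas[j] * q_j
    return expand(p)

p = lagrange([1, 2, 4], [2, 3, 1])        # interpolate (1,2), (2,3), (4,1)
print(p)
print([p.subs(t, a) for a in (1, 2, 4)])  # expected: [2, 3, 1]
```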
The formal derivative of a polynomial Definition 5.38. By the derivative of a polynomial p(t) = an tn +· · ·+a0 we mean the polynomial nan tn−1 + · · · + 2a2 t + a1 . The derivative of a polynomial p is denoted by p′ , that is, if p(t) = an tn + · · · + a0 then p′ (t) = nan tn−1 + · · · + 2a2 t + a1 . Theorem 5.39. For any p, q ∈ P(K) and α ∈ K we have (a) (αp)′ = αp′ ; (b) (p + q)′ = p′ + q ′ ; (c) (pq)′ = p′ q + pq ′ . Proof. Clearly, we have (αp)′ = αp′ and (p + q)′ = p′ + q ′ . To show that (pq)′ = p′ q + pq ′ we first note that (tm+n )′ = (m + n)tm+n−1 = mtm−1 tn + ntm tn−1 = (tm )′ tn + tm (tn )′ , for any positive integers m and n. To finish the proof we use the fact that, if p1 , p2 , q ∈ P(K) are polynomials such that (p1 q)′ = p′1 q + p1 q ′ and (p2 q)′ = p′2 q + p2 q ′ , then ((p1 + p2 )q)′ = (p1 q)′ + (p2 q)′ = p′1 q + p1 q ′ + p′2 q + p2 q ′ = (p′1 + p′2 )q + (p1 + p2 )q ′ = (p1 + p2 )′ q + (p1 + p2 )q ′ .
Using Theorem 5.39 and mathematical induction we obtain the following useful result.
Corollary 5.40. For any p ∈ P(K) and any integer n ≥ 1 we have (pn )′ = npn−1 p′ .
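Corollary 5.40 can be spot-checked with SymPy's symbolic derivative; the sketch below (not part of the text) verifies it for p = t^2 + 1 and n = 3.

```python
from sympy import symbols, diff, expand

t = symbols('t')
p = t**2 + 1
n = 3
lhs = diff(p**n, t)                 # (p^n)'
rhs = n * p**(n - 1) * diff(p, t)   # n p^(n-1) p'
print(expand(lhs - rhs) == 0)       # expected: True
```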
Appendix D   Infinite dimensional inner product spaces
Many results in Chapter 3 are formulated and proved for finite dimensional inner product spaces. While some of these results remain true in infinite dimensional inner product spaces, others fail or require additional assumptions. Here we briefly address the issues arising in infinite dimensional inner product spaces.
The important Representation Theorem 3.3.1 is no longer true if we remove the assumption that V is finite dimensional. Indeed, consider the space V of all continuous functions on the interval [0, 1] with the inner product
⟨f, g⟩ = \int_0^1 f(t)g(t)\,dt
and the function Φ : V → K defined by
Φ(f) = \int_0^{1/2} f(t)\,dt.
The function Φ is clearly a linear transformation, but there is no continuous function g0 such that
Φ(f) = ⟨f, g0⟩ = \int_0^1 f(t)g_0(t)\,dt
for every continuous function f, because then we would have to have g0(t) = 1 for all t ∈ [0, 1/2) and g0(t) = 0 for all t ∈ (1/2, 1].
for every continuous function f , because then we would have to have g0 (t) = 1 for all t ∈ 0, 12 and g0 (t) = 0 for all t ∈ 12 , 1 . The Representation Theorem guarantees that every linear transformation between finite dimensional spaces has an adjoint. This is no longer true in infinite dimensional inner product spaces. Indeed, consider the space V = C∞ of all infinite sequences of complex numbers with only a finite number of nonzero terms with the inner product defined as h(x1 , x2 , . . . ), (y1 , y2 , . . . )i =
∞ X
xj yj .
j=1
Note that, because all but a finite number of the x_j's and y_j's are 0, the summation is always finite and thus we don't have to worry about convergence of the series. Now consider the function f : C∞ → C∞ defined as

f((x₁, x₂, . . .)) = (∑_{j=1}^∞ x_j, ∑_{j=2}^∞ x_j, . . .).
It is clearly a linear transformation from C∞ to C∞. Now suppose there is a linear transformation g : C∞ → C∞ such that ⟨f(x), y⟩ = ⟨x, g(y)⟩ for all x, y ∈ C∞. Then for every integer j ≥ 1 we would have

⟨e_j, g(e₁)⟩ = ⟨f(e_j), e₁⟩ = 1,

where e_j is the sequence that has 1 in the j-th place and zeros everywhere else. But this is not possible, because it means that g(e₁) = (1, 1, . . .), which is not an element of C∞.

Since there are linear transformations that do not have adjoints, in every theorem that says something about the adjoint of a transformation we assume that the inner product space is finite dimensional. Many of those theorems remain true in all inner product spaces if we simply assume that the transformations have adjoints. For example, if we assume that all transformations in Theorem 3.3.4 have adjoints, then the theorem is true for all inner product spaces.

The definition of self-adjoint operators is formulated for operators on finite dimensional spaces, but this restriction is not necessary. We can say that a linear transformation f : V → V is self-adjoint if ⟨f(x), y⟩ = ⟨x, f(y)⟩ for all x, y ∈ V. This definition makes sense in any inner product space, and many of the properties of self-adjoint operators proved in Chapter 3 remain true in infinite dimensional inner product spaces; quite often the presented proof does not require any changes.

There are theorems that depend in an essential way on the assumption that the space is of finite dimension. For example, in Theorem 3.4.56 we show that on a finite dimensional inner product space a linear operator is unitary if and only if it is isometric. This is not true in general. Indeed, consider the space V = C∞ defined above and the linear operator f : C∞ → C∞ defined as

f(x₁, x₂, . . .) = (0, x₁, x₂, . . .).

Note that this operator has an adjoint: f*(x₁, x₂, . . .) = (x₂, x₃, . . .). Since

‖f(x₁, x₂, . . .)‖² = |0|² + |x₁|² + |x₂|² + · · · = |x₁|² + |x₂|² + · · · = ‖(x₁, x₂, . . .)‖²,

we have ‖f(x)‖ = ‖x‖ for every x ∈ C∞ and thus f is an isometric operator. On the other hand, since ran f ≠ C∞ and f f* ≠ Id (for example, f f*(e₁) = (0, 0, . . .) ≠ e₁), f is not a unitary operator.

Checking for which theorems the assumption of finite dimensionality is essential would be an excellent way to review Chapter 3.
Bibliography

[1] D. Atanasiu and P. Mikusiński, Linear algebra, Core topics for the first course, World Scientific, 2020.
[2] S. Axler, Linear algebra done right, 3rd edition, Springer, 2015.
[3] S. K. Berberian, Linear algebra, Dover Publications, 2014.
[4] R. Godement, Cours d'algèbre, 3rd edition, Hermann, 1966.
[5] J. S. Golan, The linear algebra a beginning graduate student ought to know, 3rd edition, Springer, 2009.
[6] M. Houimi, Algèbre linéaire, algèbre bilinéaire, Ellipses, 2021.
[7] H. J. Kowalsky and G. Michler, Lineare Algebra, 12th edition, de Gruyter, 2003.
[8] T. W. Körner, Vectors pure and applied, Cambridge University Press, 2013.
[9] S. Lang, Linear algebra, 3rd edition, Springer, 1987.
[10] R. Mansuy and R. Mneimné, Algèbre linéaire. Réduction des endomorphismes, 2nd edition, Vuibert, 2016.
[11] L. Spence, A. Insel, and S. Friedberg, Linear algebra, 5th edition, Pearson, 2018.
[12] S. Weintraub, A guide to advanced linear algebra, The Mathematical Association of America, 2011.
[13] H. Woerdeman, Advanced linear algebra, Chapman and Hall/CRC, 2015.
Index

ker f, 73
⟨x, y⟩, 119
A_{n×n}(K), 10
C_{[a,b]}, 121
DR(R), 9
DRn(R), 9
L(V, W), 70
M_{m×n}(K), 4
P(K), 9
P_n(K), 9
S_{n×n}(K), 9
U_{n×n}(K), 60
w ⊗ f, 106
w ⊗ v, 213
V^n, 6
proj_U(v), 134
ran f, 74
f*, 156
f^+, 219
f^T, 110
f_{B→C}, 88
l_{v_j}, 98

adjoint transformation, 156
algebraic multiplicity of an eigenvalue, 246
alternating n-linear form, 222
alternating multilinear form, 222
annihilator, 143
basis, 24
best approximation, 135
bidual, 100
bilinear form, 114
canonical isomorphism, 101
Cartesian product, 6
change of coordinates matrix, 54
characteristic equation, 234
characteristic polynomial, 234
companion matrix, 283
complement of a subspace, 39
conjugate transpose matrix, 159
coordinates, 26
Cramer's rule, 228
cyclic subspace, 255
derivative of a polynomial, 316
determinant of a matrix, 229
determinant of an endomorphism, 231
diagonalizable endomorphism, 246
dimension, 47
direct sum, 32
dual basis, 98
dual space, 97
eigenspace, 163, 236
eigenvalue, 163, 234
eigenvector, 163, 236
elementary transposition, 301
endomorphism, 70
Euclidean norm, 124
even permutation, 302
f-annihilator, 238
f-invariant, 105, 244
functional, 97
geometric multiplicity of an eigenvalue, 246
Gram-Schmidt process, 149, 150
greatest common divisor, 311
hermitian form, 116
inner product, 119
inner product space, 119
invariant factors, 289
invariant subspace, 167
inversion of permutation, 302
irreducible polynomial, 311
isometric operator, 181
isometry, 181
isomorphic vector spaces, 81
isomorphism, 81
Jordan block, 258
Jordan canonical form, 265, 279
kernel of a linear transformation, 73
Kronecker delta, 139
linear combination, 14
linear form, 97
linear independence, 19
linear span, 14
linear transformations, 68
linearly dependent, 18
lowest common multiple, 314
matrix of a linear transformation, 88, 94
minimal polynomial, 242
monic polynomial, 310
multilinear form, 222
norm, 125
normal operator, 160
normed space, 125
nullity of a linear transformation, 80
odd permutation, 302
operator, 70
orthogonal complement, 143
orthogonal decomposition, 174
orthogonal operator, 187
orthogonal projection, 129, 133
orthogonal set, 139
orthogonal vectors, 123
orthonormal set, 139
partial isometry, 210
permutation, 301
polar decomposition, 211
polarization identity, 116, 118, 127
positive definite form, 119
positive operator, 192
projection, 76
projection matrix, 131
quotient linear transformation, 103
quotient space, 103
range, 74
rank of a linear transformation, 80
rank-nullity theorem, 79
rational form, 289
reducing subspace, 167
Representation Theorem, 155
root of a polynomial, 307
scalar field, 1
Schwarz's inequality, 122
self-adjoint operator, 160, 163
sesquilinear form, 115
singular value decomposition, 203
span, 14
spanning set, 14
spectral decomposition, 172
spectral representation, 174
square root of positive operator, 196
subspace, 9
symmetric form, 115
symmetric matrix, 177
trace of an operator, 216
transposition, 301
triangle inequality, 125
trivial vector space, 3
unit vector, 129
unitary operator, 181
vector space, 2