Linear Algebra and Matrices
Linear Algebra and Matrices
Shmuel Friedland
Mohsen Aliabadi
University of Illinois at Chicago
Chicago, Illinois
Society for Industrial and Applied Mathematics Philadelphia
Copyright © 2018 by the Society for Industrial and Applied Mathematics

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.

Publications Director: Kivmars H. Bowling
Executive Editor: Elizabeth Greenspan
Developmental Editor: Gina Rinelli Harris
Managing Editor: Kelly Thomas
Production Editor: Lisa Briggeman
Copy Editor: Claudine Dugan
Production Manager: Donna Witzleben
Production Coordinator: Cally A. Shrader
Compositor: Cheryl Hufnagle
Graphic Designer: Lois Sellers
Library of Congress Cataloging-in-Publication Data Please visit www.siam.org/books/ot156 to view the CIP data.
Shmuel Friedland dedicates this book to his wife, Margaret.
Mohsen Aliabadi dedicates this book to the memory of his parents.
Contents

Preface
List of Symbols
1  Basic facts on vector spaces, matrices, and linear transformations
   1.1  Vector spaces
   1.2  Dimension and basis
   1.3  Matrices
   1.4  Sum of subspaces
   1.5  Permutations
   1.6  Linear, multilinear maps and functions (first encounter)
   1.7  Definition and properties of the determinant
   1.8  Permanents
   1.9  An application of the Philip Hall theorem
   1.10 Polynomial rings
   1.11 The general linear group
   1.12 Linear operators (second encounter)
2  Canonical forms
   2.1  Jordan canonical forms
   2.2  An application of diagonalizable matrices
   2.3  Matrix polynomials
   2.4  Minimal polynomial and decomposition to invariant subspaces
   2.5  Quotient of vector spaces and induced linear operators
   2.6  Isomorphism theorems for vector spaces
   2.7  Existence and uniqueness of the Jordan canonical form
   2.8  Cyclic subspaces and rational canonical forms
Review Problems (1)
3  Applications of the Jordan canonical form
   3.1  Functions of matrices
   3.2  Power stability, convergence, and boundedness of matrices
   3.3  e^{At} and stability of certain systems of ordinary differential equations
4  Inner product spaces
   4.1  Inner product
   4.2  Explanation of the G–S process in standard Euclidean space
   4.3  An example of the G–S process
   4.4  QR factorization
   4.5  An example of QR factorization
   4.6  The best fit line
   4.7  Geometric interpretation of the determinant (second encounter)
   4.8  Special transformations in IPS
   4.9  Symmetric bilinear, Hermitian, and quadratic forms
   4.10 Max-min characterizations of eigenvalues
   4.11 Positive definite operators and matrices
   4.12 Inequalities for traces
   4.13 Singular value decomposition
   4.14 Majorization
   4.15 Characterizations of singular values
   4.16 Moore–Penrose generalized inverse
5  Perron–Frobenius theorem
   5.1  Perron–Frobenius theorem
   5.2  Irreducible matrices
   5.3  Recurrence equation with nonnegative coefficients
6  Tensor products
   6.1  Introduction
   6.2  Tensor product of two vector spaces
   6.3  Tensor product of linear operators and the Kronecker product
   6.4  Tensor product of many vector spaces
   6.5  Examples and results for 3-tensors
Review Problems (2)
Challenging Problems
Appendices
   A  Sets and mappings
   B  Basic facts in analysis and topology
   C  Basic facts in graph theory
   D  Basic facts in abstract algebra
   E  Complex numbers
   F  Differential equations
   G  Basic notions of matrices
Bibliography
Index
Preface

Linear algebra and matrix theory, abbreviated here as LAMT, is a foundation for many advanced topics in mathematics and an essential tool for computer science, physics, engineering, bioinformatics, economics, and social sciences. A first course in linear algebra for engineers is like a cookbook, where various results are given with very little rigorous justification. For mathematics students, on the other hand, linear algebra comes at a time when they are being introduced to abstract concepts and formal proofs as the foundations of mathematics. In the first case, very few of the fundamental concepts of LAMT are covered. For a second course in LAMT, there are a number of options. One option is to study the numerical aspects of LAMT, for example, in the book [11]. A totally different option, as in the popular book [14], views LAMT as a part of a basic abstract course in algebra.

This book is meant to be an introductory course in LAMT for beginning graduate students and an advanced (second) course in LAMT for undergraduate students. The first author has been teaching this course to both graduate and advanced undergraduate students, in the Department of Mathematics, Statistics and Computer Science, the University of Illinois at Chicago, for more than a decade. This experience has helped him to reconcile this dichotomy. In this book, we use abstract notions and arguments to give the complete proof of the Jordan canonical form, and more generally, the rational canonical forms of square matrices over fields. Also, we provide the notion of tensor products of vector spaces and linear transformations. Matrices are treated in depth: stability of matrix iterations, the eigenvalue properties of linear transformations in inner product spaces, singular value decomposition, and min-max characterizations of Hermitian matrices and nonnegative irreducible matrices.

We now briefly outline the contents of this book. There are six chapters. The first chapter is a survey of basic notions. It deals with basic facts in LAMT, which may be skipped if the student is already familiar with them. The second chapter is a rigorous exposition of the Jordan canonical form over an algebraically closed field (which is usually the complex numbers in the engineering world) and a rational canonical form for linear operators and matrices. Again, the section dealing with cyclic subspaces and rational canonical forms can be skipped without losing consistency. Chapter 3 deals with applications of the Jordan canonical form of matrices with real and complex entries. First, we discuss the precise expression of f(A), where A is a square matrix and f is a polynomial, in terms of the components of A. We then discuss the extension of this formula to functions f which are analytic in a neighborhood of the spectrum of A. The instructor may choose one particular application to teach from this chapter. The fourth chapter, the longest one, is devoted to properties of inner product spaces and
special linear operators such as normal, Hermitian, and unitary. We study the min-max and max-min characterizations of the eigenvalues of Hermitian matrices, the singular value decomposition, and its minimal low-rank approximation properties. The fifth chapter deals with basic aspects of the Perron–Frobenius theory, which are not usually found in a typical linear algebra book. The last chapter is a brief introduction to the tensor product of vector spaces, the tensor product of linear operators, and their representation as the Kronecker product. One of the main objectives of this book is to show the variety of topics and tools that modern linear algebra and matrix theory encompass. To facilitate the reading of this book we have included a good number of worked-out problems. These will help students in understanding the notions and results discussed in each section and prepare them for exams such as a graduate preliminary exam. We also provide a number of problems for instructors to assign to complement the material. It may be hard to cover all of these topics in a one-semester graduate course. Since many sections of this book are independent, the instructor may choose appropriate sections as needed. The authors would like to express their gratitude to the following coworkers and colleagues for comments on the work in progress: Saieed Akbari, Sharif University of Technology, Tehran, Iran; Charles Alley, University of Illinois at Chicago, Chicago, IL, United States; Behnam Esmayli, University of Pittsburgh, Pittsburgh, PA, United States; Khashayar Filom, Northwestern University, Evanston, IL, United States; Hossein Shahmohamad, Rochester Institute of Technology, Rochester, NY, United States; Joseph Shipman, Brandeis University, Waltham, MA, United States; Xindi Wang, University of Illinois at Chicago, Chicago, IL, United States. Finally, we assume responsibility for any remaining typographical or mathematical errors, and we encourage readers to not only report to us any errors they find, but to also submit any other suggestions for improvements to future editions.
March 2017
Shmuel Friedland Mohsen Aliabadi
List of Symbols Symbol
Description
1 a∣b (a1 , . . . , an ) auv (`) A⊺ A∗ A−1 A[α, β]
column vector of whose all coordinates are 1 a divides b g.c.d. of a1 , . . . , an the (u, v) entries of A` transpose of matrix A A¯⊺ for complex-valued matrix A the inverse of A submatrix of A formed by the rows and columns indexed by α and β adjoint matrix of A A is left equivalent to B A is right equivalent to B A is equivalent to B A is similar to B augmented coefficient matrix the matrix obtained from A by deleting its ith row and jth column the determinant of A(i, j) the upper left k × k submatrix of A nonnegative, nonzero nonnegative, and positive matrix adjacency matrix of digraph D A 0 block diagonal matrix [ ] 0 B {x ∶ x ∈ A or x ∈ B} {x ∶ x ∈ A and x ∈ B} the disjoint union of A and B A is a subset of B a is an element of A a is not an element of A the Cartesian product of A and B A equals B block diagonal matrix diag(A1 , . . . , Ak ) all anti–self-adjoint operators all n × n skew-symmetric matrices with entries in F {x ∶ x ∈ B and x ∈/ A}
adj A A ∼l B A ∼r B A∼B A≈B Aˆ A(i, j) Aij Ak A ≥ 0, A ≧ 0, A > 0 A(D) A⊕B A∪B A∩B A ∪∗ B A⊆B a∈A a ∈/ A A×B A=B ⊕ki=1 Ai AS(V) AS(n, F) B∖A
C Cn Cn×n C(A) Ck (A) C(φ) char R coker ϕ codim U conv S D = (V, E) D(n, F) DO(n, F) DUn D(m, n) deg p deg v degin v, degout v det δk (A) δji diag(A) dim V e1 ∧ ⋯ ∧ en ei ext C ex F E/F Eλ F[x] Fq F[z]/⟨p(z)⟩ Fr(k, V) G(x1 , . . . , xn ) GL(n, F) Gr(m, V) Gr(r, Fm ) G = (V, E) g.c.d. g○f Hn Hn,+ H0n,+ Hom(N, M) Hom(M) H(A)
the field of complex numbers n-dimensional vector space over C the set of n × n complex valued matrices the set of continuous function in A the set of continuous function in A with k continuous derivatives the companion matrix the characteristic of R the cokernel of ϕ the codimension of U all convex combinations of elements in S digraph with V vertices and E diedges all n × n diagonal matrices with entries in F all n × n diagonal orthogonal matrices with entries in F all n × n diagonal unitary matrices the vector subspace of diagonal matrices degree of polynomial p degree of a vertex v in and out degree of a vertex v determinant g.c.d. of all k × k minors of A Kronecker’s delta function the diagonal matrix with the diagonal entries of A the dimension of V the volume element generated by e1 , . . . , en (δ1i , . . . , δni )⊺ the set of extreme points of a convex set C (ex1 , . . . , exn )⊺ for x = (x1 , . . . , xn )⊺ ∈ Cn field field extension eigenspace of λ the set of all polynomials with coefficients in F the field of order q the set of all polynomials modulo p(z) all orthonormal k-frames in V the Gramian matrix group of n × n invertible matrices in Fn×n Grassmannian of m-dimensional subspaces in V Grassmannian of r-dimensional subspaces of Fm undirected graph with V vertices and E edges greatest common divisor the composition of f and g all n × n Hermitian matrices all n × n nonnegative definite Hermitian matrices the open set all homomorphisms from N to M all homomorphisms from M to M 0 A [ ∗ ] for A ∈ Cm×n A 0
In index A index (λ) JCF (A) Jn Ker T Kn Km,n lcm(a1 , . . . , an ) lim supn→∞ Sn lim inf n→∞ Sn L(U, V) L(V) Lk (V, U) L(m, F) λ(T ) ma (λ) mg (λ) N [n] [n]k N ⊲G N(V) N(n, R) N(n, C) N (A) nul A ∥⋅∥ ∥ ⋅ ∥p ∥ ⋅ ∥∞ ⟨⋅, ⋅⟩ ∠(x, y) O O(n, F) Ωn Ωn,s P (X) Pn Pn,s perm A pA (z) or p(z) Πn Q (Q, ⊕) R R+ rank A R
the n × n identity matrix the index of nilpotency of A the multiplicity of λ in minimal polynomial of A the Jordan canonical form of A the n × n matrix in which every entry equals n1 kernel of homomorphism T a complete graph on n vertices a complete bipartite graph on m and n vertices least common multiple of a1 , . . . , an the limit superior of {Sn } the limit inferior of {Sn } the set of linear maps from U to V L(V, V) {T ∈ L(V, U); rank T ≤ k} all m × m lower triangular matrices with entries in F eigenvalue vector of self-adjoint operator T , eigenvalues are decreasing the algebraic multiplicity of λ the geometric multiplicity of λ all positive integers set {1, . . . , n} for n ∈ N all subsets of [n] of cardinality k N is a normal subgroup of G all normal operators all n × n real-valued normal matrices all n × n complex-valued normal matrices null space of A nullity of matrix A norm on matrices the `p norm the spectral norm an inner product the angle between x and y big O notation all n × n orthogonal matrices with entries in F all n × n doubly stochastic matrices all n × n symmetric doubly stochastic matrices the power set of X all n × n permutation matrices all n×n matrices of the form 12 (P +P ⊺ ) where P ∈ Pn the permanent of A the characteristic polynomial of A the set of probability vectors with n coordinates the field of rational numbers the quaternion group the field of real numbers the set of all nonnegative real numbers the rank of a matrix A a ring
Rm,n,k (F) range T rank τ ρ(A) Sn → S S⊥ S1 S2n−1 S(V) S+ (V) S+ (V)o S m×n S(n, F) S+ (n, R) S+ (n, R)0 SO(n, F) SUn span {x1 , . . . , xn } spec (T ) σ1 (T ) ≥ σ2 (T ) ≥ . . . σ(A) σ (p) (T ) σk (z) Sn T ≻ S, (T ⪰ S) tr T trk T U+W U⊕W U≅V U(V) Un U(n, F) V V′ x(D) x ≺ y (x ⪯ y) z¯ Z Z+ Z(G) Zij (Zij )uv #J U⊗2 ⊗ki=1 Vi U⊗W ⊗k V = V⊗k
all m × n matrices of at most rank k with entries in F range of a homomorphism T rank of a tensor τ spectral radius of A Sn converges to S the maximal orthogonal set of S the unit circle ∣z∣ = 1 in the complex plane the 2n − 1 dimensional sphere all self-adjoint operators acting on IPS V all self-adjoint nonnegative definite operators all self-adjoint positive definite operators all m × n matrices with entries in S all n × n symmetric matrices with entries in F S(n, R) ∩ Hn,+ S(n, R) ∩ H0n,+ all n × n special orthogonal matrices with entries in F all n × n special unitary matrices the set of all linear combination of x1 , . . . , xn spectrum of T singular values of T (σ1 (A), . . . , σmin(m,n) (A))⊺ for A ∈ Cm×n (σ1 (T ), . . . , σp (T ))⊺ k-th elementary symmetric polynomial the group of permutations of [n] T − S is positive (nonnegative) definite trace of T sum of the first k eigenvalues of self-adjoint T {v ∶= u + w, u ∈ U, w ∈ W} the direct sum of U and W U is isomorphic to V all unitary operators all n × n unitary matrices all n × n upper triangular matrices with entries in F a vector space the dual vector space Hom(V, F) (d1 , . . . , dn )⊺ for D = diag(d1 , . . . , dn ) x (weakly) majorized by y complex conjugate of z all integers all nonnegative integers the center of the group G the A-components the (u, v) entries of A-component Zij cardinality of a set J U⊗U tensor product of V1 , . . . , Vk the tensor product of U and W k-tensor power of V
⊙m i=1 Xi Symk V ⋀k V ⊗ki=1 xi
∧ki=1 xi ⊗ki=1 Ai
a set tensor product of X1 , . . . , Xm k-symmetric power of V k-exterior power or wedge product of V k-decomposable tensor, rank 1 tensor if each xi is nonzero k-wedge product Kronecker tensor product of matrices A1 , . . . , Ak
⊕_{ℓ=1}^{p} A_{ℓℓ}
the direct sum of square matrices A11 , . . . , App
[E, F] 0 ψA (z) or ψ(z) ι+ (T ) ι0 (T ) ι− (T ) ∑(A) ⟨a⟩ ≫
the degree of field extension zero matrix the minimal polynomial of A the number of positive eigenvalues of T the number of zero eigenvalues of T the number of negative eigenvalues of T the sum of all entries of A the principal ideal generated by a much greater than
Chapter 1
Basic facts on vector spaces, matrices, and linear transformations
1.1 Vector spaces

An abelian group (V, +) is called a vector space over a field F if for any a ∈ F and v ∈ V the product av (called the scalar multiplication operation) is an element in V, and this operation satisfies the following properties: a(u + v) = au + av, (a + b)v = av + bv, (ab)v = a(bv), and 1v = v, for all a, b ∈ F, u, v ∈ V. The vector space V over F is also called an F-vector space. (See Appendix D for the definition of group and field.) Throughout this book, the symbol V will denote a vector space over the field F unless stated otherwise.

Roughly speaking, the definition states that a vector space is a composite object consisting of a field, a set of vectors, and two operations with certain properties. Historically, the first ideas leading to vector spaces can be traced back as far as the 17th century's matrices, systems of linear equations, and Euclidean vectors. A more abstract treatment, first formulated by Giuseppe Peano in 1888, encompasses more general objects than Euclidean space.

Examples of vector spaces

1. Let V = Fm×n be the set of all m × n matrices with entries in the field F. Then, V is a vector space over F, where addition is ordinary matrix addition and multiplying the matrix A by the scalar c gives the matrix whose entries are the entries of A multiplied by c. (See Appendix G for the definition and properties of matrices.)

2. Let F[x] be the set of all polynomials with coefficients in F. Addition and scalar multiplication are defined as (p + q)(x) = p(x) + q(x), (ap)(x) = ap(x), where p, q ∈ F[x] and a ∈ F. Then F[x] forms a vector space over F.
3. Let X be a nonempty set and V be the set of all real-valued functions on X. For f , g ∈ V, we want to define f + g ∈ V. It means f + g must be a real-valued function on X. We define (f + g)(x) = f (x) + g(x). Similarly, we define cf by (cf )(x) = cf (x). Then V is a vector space over R.
Geometric interpretation of vector addition in R2

Let x, y ∈ R2, where x = (x1, x2), y = (y1, y2). In Figure 1.1, a = (x1, x2), b = (y1, y2). Form the parallelogram with sides oa and ob, and let the fourth vertex be c = (α, β). Let m be the point of intersection of the diagonals of the parallelogram oacb. Then m is the midpoint of ba as well as of oc. Since m is the midpoint of oc, we have m = ½[(0, 0) + (α, β)], and since it is also the midpoint of ba, m = ½[(x1, x2) + (y1, y2)]. Comparing coordinates we get α = x1 + y1 and β = x2 + y2, that is, c = a + b. Thus x + y is the vector represented by the diagonal of the parallelogram spanned by x and y.
Figure 1.1
Geometric interpretation of scalar multiplication in R2 If a is a positive number and x is a vector in R2 , then ax is the vector that points in the same direction as x and whose length is a times the length of x. Figure 1.2 illustrates this point.
Figure 1.2
Figure 1.3

If a is a negative number, then ax is the vector that points in the direction opposite to x and whose length is ∣a∣ times the length of x, as illustrated in Figure 1.3.

Remark 1.1.1. Actually, the motivation for the definition of a vector space comes from the properties possessed by addition and scalar multiplication on Rn.
1.2 Dimension and basis

As with most algebraic structures, vector spaces contain substructures called subspaces. Let V be a vector space over a field F. (For simplicity of the exposition you may assume that F = R or F = C.) A subset U ⊂ V is called a subspace of V if a1u1 + a2u2 ∈ U, for each u1, u2 ∈ U and a1, a2 ∈ F. The subspace U := {0} is called the zero, or the trivial, subspace of V.
Examples of vector subspaces

1. Let V be the vector space mentioned as the third example above and fix y0 ∈ X. Then W = {f ∈ V : f(y0) = 0} is a vector subspace of V.

2. Let A ∈ Fm×n, where F is a field. Then the set of all n × 1 matrices x over F such that Ax = 0 is a subspace of the space of all n × 1 matrices over F. To prove this we must show that A(cx + dy) = 0 whenever Ax = 0, Ay = 0, and c, d ∈ F. This follows immediately from the following fact, whose proof is left as an exercise: Let A ∈ Fm×n and B, C ∈ Fn×p. Then A(cB + dC) = c(AB) + d(AC), for each scalar c, d ∈ F. (A small numerical sketch of this example follows below.)
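A minimal numerical sketch of Example 2, assuming Python with NumPy; the matrix A and the solutions x, y below are illustrative choices, not taken from the text:

```python
import numpy as np

# Illustration of Example 2: solutions of Ax = 0 form a subspace.
A = np.array([[1., 2., -1.],
              [2., 4., -2.]])   # rank 1, so Ax = 0 has nontrivial solutions

x = np.array([1., 0., 1.])      # Ax = 0
y = np.array([-2., 1., 0.])     # Ay = 0
assert np.allclose(A @ x, 0) and np.allclose(A @ y, 0)

# Any linear combination cx + dy is again a solution, since A(cx + dy) = cAx + dAy.
c, d = 3.0, -5.0
print(A @ (c * x + d * y))      # -> [0. 0.]
```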
1.2.1 Basic notions in vector spaces The elements of V are called vectors. Let x1 , . . . , xn ∈ V be n vectors. Then, for any a1 , . . . , an ∈ F, called scalars, a1 x1 + ⋯ + an xn is called a linear combination of x1 , . . . , xn . Moreover, 0 = ∑ni=1 0xi is the trivial combination. Also, x1 , . . . , xn are linearly independent, if the equality 0 = ∑ni=1 ai xi implies that a1 = ⋯ = an = 0. Otherwise, x1 , . . . , xn are linearly dependent. Note that a single vector x is linearly independent by itself if and only if x ≠ 0. The set of all linear combinations of x1 , . . . , xn is called the span of x1 , . . . , xn , and denoted
by span{x1 , . . . , xn }. Indeed, span {x1 , . . . , xn } = {∑ni=1 ai xi ∶ ai ∈ F}. In particular, we use the notation span (x) to express span {x}, for any x ∈ V . Clearly, span {x1 , . . . , xn } is a subspace of V. (Why?) In general, the span of a set S ⊆ V is the set of all linear combinations of elements of S, i.e., the set of all linear combinations of all finite subsets of S. Example 1.1. The vectors x1 = (1, 1, 1)⊺ , x2 = (1, −1, 2)⊺ , and x3 = (3, 1, 4)⊺ in R3 are linearly dependent over R because 2x1 + x2 − x3 = 0.
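A minimal computational check of Example 1.1, assuming Python with NumPy:

```python
import numpy as np

# Checking the linear dependence in Example 1.1 numerically.
x1 = np.array([1., 1., 1.])
x2 = np.array([1., -1., 2.])
x3 = np.array([3., 1., 4.])

M = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(M))   # 2 < 3, so the three vectors are linearly dependent
print(2 * x1 + x2 - x3)           # the stated relation 2x1 + x2 - x3 = 0: [0. 0. 0.]
```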
1.2.2 Geometric interpretation of linear independence Two vectors x1 and x2 in R2 are linearly independent if neither is the zero vector and one is not a scalar multiple of the other. Graphically, this means x1 and x2 form the edges of a nondegenerate parallelogram. See Figure 1.4.
Two nonzero nonparallel vectors x1 and x2 form the edges of a parallelogram. A vector x = c1 x1 + c2 x2 lies interior to the parallelogram if 0 < c1 , c2 < 1.
Figure 1.4. Independent vectors. Linear independence is defined for three vectors in R3 similarly. Three vectors in R3 are linearly independent if none of them is the zero vector and they form the edges of a nondegenerate parallelepiped of positive volume. See Figure 1.5.
Vectors x1, x2, x3 form the edges of a parallelepiped. A vector x = c1x1 + c2x2 + c3x3 satisfying 0 < ci < 1, i = 1, 2, 3, is located interior to the parallelepiped.
Figure 1.5. Independence of three vectors.
1.2.3 More details on vector space If V is a vector space, it is called finitely generated if V = span {x1 , . . . , xn }, for some vectors x1 , . . . , xn and {x1 , . . . , xn } is called a spanning set of V. (In this
book we consider only finitely generated vector spaces.) For a positive integer n we will use the following notation:

[n] := {1, 2, . . . , n − 1, n}.    (1.2.1)

Observe the following "obvious" relation for any n ∈ N (here, N is the set of all positive integers 1, 2, . . .):

span {u1, . . . , ui−1, ui+1, . . . , un} ⊆ span {u1, . . . , un}, for each i ∈ [n].    (1.2.2)

It is enough to consider the case i = n. (We assume here that for n = 1, the span of the empty set is the zero subspace U = {0}.) Then, for n = 1, (1.2.2) holds. Suppose that n > 1. Clearly, any linear combination of u1, . . . , un−1, which is u = ∑_{i=1}^{n−1} bi ui, is a linear combination of u1, . . . , un: u = 0un + ∑_{i=1}^{n−1} bi ui. Hence, by induction (1.2.2) holds for all n ∈ N.

Theorem 1.2.1. Suppose that n > 1. Then, there exists i ∈ [n] such that

span {u1, . . . , un} = span {u1, . . . , ui−1, ui+1, . . . , un},    (1.2.3)

if and only if u1, . . . , un are linearly dependent vectors.

Proof. Suppose that (1.2.3) holds. Then, ui = b1u1 + ⋯ + bi−1ui−1 + bi+1ui+1 + ⋯ + bnun. Hence, ∑_{j=1}^{n} aj uj = 0, where aj = bj for j ≠ i and ai = −1. Then, u1, . . . , un are linearly dependent.

Now assume that u1, . . . , un are linearly dependent. Then, there exist scalars a1, . . . , an ∈ F, not all equal to zero, so that a1u1 + . . . + anun = 0. Assume that ai ≠ 0. For simplicity of notation (or by renaming indices in [n]), we may assume that i = n. This yields un = −(1/an) ∑_{i=1}^{n−1} ai ui. Let u ∈ span {u1, . . . , un}. Then,

u = ∑_{i=1}^{n} bi ui = bn un + ∑_{i=1}^{n−1} bi ui = −bn ∑_{i=1}^{n−1} (ai/an) ui + ∑_{i=1}^{n−1} bi ui = ∑_{i=1}^{n−1} ((an bi − ai bn)/an) ui.

That is, u ∈ span {u1, . . . , un−1}. This proves the theorem. ◻
Corollary 1.2.2. Let u1 , . . . , un ∈ V. Assume that not all ui ’s are zero vectors. Then, there exist d ≥ 1 integers 1 ≤ i1 < i2 < ⋯ < id ≤ n such that ui1 , . . . , uid are linearly independent and span {ui1 , . . . , uid } = span {u1 , . . . , un }. Proof. Suppose that u1 , . . . , un are linearly independent. Then, d = n and ik = k, for k = 1, . . . , n, and we are done. Now assume that u1 , . . . , un are linearly dependent. Applying Theorem 1.2.1, we consider now n − 1 vectors u1 , . . . , ui−1 , ui+1 , . . . , un as given by Theorem 1.2.1. Note that it is not possible that all vectors in {u1 , . . . , ui−1 , ui+1 , . . . , un } are zero since this will contradict the assumption that not all ui are zero. Apply the previous arguments to u1 , . . . , ui−1 , ui+1 , . . . , un and continue in this method until one gets d linearly independent vectors ui1 , . . . , uid . ◻ Corollary 1.2.3. Let V be a finitely generated nontrivial vectors space, i.e., contains more than one element. Then, there exist n ∈ N and n linearly independent vectors v1 , . . . , vn such that V = span {v1 , . . . , vn }.
Definition 1.2.4. The linear equation ∑_{i=1}^{n} ai xi = b with unknowns x1, x2, . . . , xn is called homogeneous if b = 0. Otherwise, it is called nonhomogeneous.

In the following lemma we use the fact that any m homogeneous linear equations with n variables have a nontrivial solution if m < n. (This will be proved later using the row echelon form (REF) of matrices.)

Lemma 1.2. Let n > m ≥ 1 be integers. Then, any w1, . . . , wn ∈ span {u1, . . . , um} are linearly dependent.

Proof. Observe that wj = ∑_{i=1}^{m} aij ui, for some aij ∈ F, 1 ≤ j ≤ n. Then

∑_{j=1}^{n} xj wj = ∑_{j=1}^{n} xj ∑_{i=1}^{m} aij ui = ∑_{i=1}^{m} (∑_{j=1}^{n} aij xj) ui.

Consider the following m homogeneous equations in n unknowns: ∑_{j=1}^{n} aij xj = 0, for i = 1, . . . , m. Since n > m, we have a nontrivial solution (x1, . . . , xn). Thus, ∑_{j=1}^{n} xj wj = 0 with the xi not all 0. This proves the lemma. ◻

Theorem 1.2.5. Let V = span {v1, . . . , vn} and assume that v1, . . . , vn are linearly independent. Then, the following holds:

1. Any vector u can be expressed as a unique linear combination of v1, . . . , vn.
2. For an integer N > n, any N vectors in V are linearly dependent.
3. Assume that u1, . . . , un ∈ V are linearly independent. Then, V = span {u1, . . . , un}.

Proof. 1. Assume that ∑_{i=1}^{n} xi vi = ∑_{i=1}^{n} yi vi. Thus, ∑_{i=1}^{n} (xi − yi)vi = 0. As v1, . . . , vn are linearly independent, it follows that xi − yi = 0, for i = 1, . . . , n. Hence, 1 holds.

2. Lemma 1.2 implies 2.

3. Suppose that u1, . . . , un are linearly independent. Let v ∈ V and consider the n + 1 vectors u1, . . . , un, v. Next, 2 implies that u1, . . . , un, v are linearly dependent. Thus, there exist n + 1 scalars a1, . . . , an+1, not all of them zero, such that an+1 v + ∑_{i=1}^{n} ai ui = 0. Assume first that an+1 = 0. Then, ∑_{i=1}^{n} ai ui = 0. Since not all ai's are zero, it follows that u1, . . . , un are linearly dependent, which contradicts our assumption. Hence, an+1 ≠ 0 and v = ∑_{i=1}^{n} (−ai/an+1) ui, i.e., v ∈ span {u1, . . . , un}. ◻
1.2.4 Dimension A vector space V over the field F is said to be finite-dimensional if there is a finite subset {x1 , . . . , xn } of V such that V = span {x1 , . . . , xn }. Otherwise, V is called infinite-dimensional. The dimension of the trivial vector space, consisting of the zero vector, V = {0}, is zero. The dimension of a finite-dimensional nonzero vector space V is the number of vectors in any spanning linearly independent
set. Namely, if V contains an independent set of n vectors but contains no independent set of n + 1 vectors, we say that V has dimension n. Also, the dimension of an infinite-dimensional vector space is infinity. The dimension of V is denoted by dimF V or simply dim V if there is no ambiguity in the background field. Assume that dim V = n. Suppose that V = span {x1, . . . , xn}. Then, {x1, . . . , xn} is called a basis of V, i.e., each vector x can be uniquely expressed as ∑_{i=1}^{n} ai xi. Thus, for each x ∈ V, one associates a unique column vector a = (a1, . . . , an)⊺ ∈ Fn, where Fn is the vector space of column vectors with n coordinates in F. It will be convenient to denote x = ∑_{i=1}^{n} ai xi by the equality x = [x1, x2, . . . , xn]a. An ordered basis is a basis {x1, . . . , xn} plus an agreement that x1 is to come first in formulas, and so on. All bases in this book are assumed to be ordered.

Example 1.3. The set S = {(1, 0)⊺, (0, 1)⊺, (1, 2)⊺} spans R2 but it is not a linearly independent set. However, removing any single vector from S makes the remaining set of vectors a basis for R2.

Example 1.4. In R2, (1, −1)⊺ and (0, 1)⊺ are linearly independent, and they span R2.
Thus they form a basis for R2 and dimR R2 = 2. See Figure 1.6.
Figure 1.6. Subspace spanned by vectors.
An application

The differential equation d²y/dx² − 3 dy/dx + 2y = 0 has solutions y1 = e^{2x} and y2 = e^{x}. Any linear combination y(x) = c1 y1(x) + c2 y2(x) is a solution of the equation, so span {e^{2x}, e^{x}} is contained in the set of solutions of the differential equation. Indeed, the solution space is a subspace of the vector space of functions. Since y1 and y2 are linearly independent and span the solution space for the equation, they form a basis for the solution space and the solution space has dimension 2.
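A quick symbolic check of this claim, assuming Python with SymPy; the verification below is an illustration, not part of the original argument:

```python
import sympy as sp

# Verify that every element of span{e^{2x}, e^x} solves y'' - 3y' + 2y = 0.
x, c1, c2 = sp.symbols('x c1 c2')
y = c1 * sp.exp(2 * x) + c2 * sp.exp(x)

residual = sp.diff(y, x, 2) - 3 * sp.diff(y, x) + 2 * y
print(sp.simplify(residual))   # 0 for all c1, c2
```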
Change of basis matrix

Let {x1, . . . , xn} be a basis of V and y1, . . . , yn be n vectors in V. Then yi = ∑_{j=1}^{n} yji xj, for some yji ∈ F, 1 ≤ i, j ≤ n. Denote by Y = [yji]_{i,j=1}^{n} the n × n matrix with the element yji in the jth row and ith column. The above equalities are equivalent to the identity

[y1, . . . , yn] = [x1, . . . , xn]Y.
(1.2.4)
Then, {y1 , . . . , yn } is a basis if and only if Y is an invertible matrix. (See subsection 1.7.4 for details on invertible matrices.) In this case, Y is called the matrix of the change of basis from [y1 , . . . , yn ] to [x1 , . . . , xn ]. The change of basis matrix from [x1 , . . . , xn ] to [y1 , . . . , yn ] is given by Y −1 . (Just multiply the identity (1.2.4) by Y −1 from the right.) Suppose also that [z1 , . . . , zn ] is a basis in V. Let [z1 , . . . , zn ] = [y1 , . . . , yn ]Z. Then, [z1 , . . . , zn ] = [x1 , . . . , xn ](Y Z). (Use (1.2.4).) Here, Y Z is the matrix of the change of basis, from [z1 , . . . , zn ] to [x1 , . . . , xn ]. See Appendix G about matrices and their properties. We end this section by investigating the linear analogue of a result in group theory. It is a well-known fact in elementary group theory that a subgroup of a finitely generated group is not necessarily finitely generated. See [12] for more details. Nevertheless, it is not the case for vector spaces, i.e., any subspace of a finite-dimensional vector space is finite-dimensional. Now, if V is a finite-dimensional vector space over the field F with the basis {x1 , . . . , xn } and W is a subspace, one can ask if there exists a subset J ⊆ [n] for which W = span {xi ∶ i ∈ J}? It is easy to show that this is not the case. For example, if V = R3 as a real vector space with the standard basis X = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} and W = span {(1, 1, 1), (1, 1, 21 )}, then W is spanned by no two elements of X. (Why?)
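As an illustration of the identity (1.2.4), the following sketch computes a change of basis matrix numerically; it assumes Python with NumPy, and the two bases of R2 are arbitrary choices made for this example:

```python
import numpy as np

# Columns of X are the basis vectors x1, x2; columns of B are the vectors y1, y2.
X = np.array([[1., 1.],
              [0., 1.]])
B = np.array([[2., 1.],
              [1., 1.]])

Y = np.linalg.solve(X, B)        # solves X Y = B, i.e., [y1, y2] = [x1, x2] Y
print(Y)
print(np.allclose(X @ Y, B))     # True, so Y is the change of basis matrix
print(np.linalg.inv(Y))          # change of basis in the opposite direction
```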
Historical note Sir Isaac Newton (1642–1727) was an English mathematician, astronomer, and physicist who is recognized as one of the most influential scientists of all time. Newton’s work has been said “to distinctly advance every branch of mathematics then studied.” His work on the subject usually referred to as fluxions or calculus, seen in a manuscript from October 1666, is now published among Newton’s mathematical papers. The author of the manuscript De analysi per aequations numero terminorum infinitas, sent by Isaac Barrow to John Collins in June 1669, was identified by Barrow in a letter sent to Collins in August of that year as “Mr Newton, a fellow of our College, and very young. . .but of an extraordinary genius and proficiency in these things.” Newton later became involved in a dispute with Leibniz over priority in the development of calculus (the Leibniz–Newton calculus controversy). Most modern historians believe that Newton and Leibniz developed calculus independently, although with very different notations. The vector concept arose in the study of mechanics and forces. Isaac Newton is credited with the formulation of our current view of forces in his work Principia (1687). Following Newton, many scholars began to use directed line segments to represent forces and the parallel-
ogram rule to add such segments. Joseph-Louis Lagrange (1736–1813) published his Mécanique analytique in 1788, in which he summarized all the post-Newtonian efforts in a single cohesive mathematical treatise on forces and mechanics [27].
Isaac Newton (1642–1727).
1.2.5 Worked-out problems 1. Let C[−1, 1] be the set of all real continuous functions on the interval [−1, 1]. (a) Show that C[−1, 1] is a vector space over R. (b) Let U (a, b) ⊂ C[−1, 1] be all functions such that f (a) = b, for a ∈ [−1, 1] and b ∈ R. For which values of b is U (a, b) a subspace? (c) Is C[−1, 1] finite-dimensional? Solution: (a) We use the following elementary facts: “The sum of two continuous functions is continuous” and “Multiplying a continuous function by a real number yields a continuous function.” If f (x) ∈ C[−1, 1] and a ∈ R, (af )(x) = af (x). Obviously, af is also in C[−1, 1]. Furthermore, if f (x), g(x) ∈ C[−1, 1] then (f + g)(x) = f (x) + g(x). Using the above facts, f + g is also continuous. The zero function is in C[−1, 1] and works as the neutral element for the operation “+” defined as above. Also 1f = f , for any f ∈ C[−1, 1]. Then, C[−1, 1] satisfies all properties of a vector space. (b) If U (a, b) is a subspace, it must contain the 0-function. Then, 0(a) = 0, i.e., b = 0. Now, if f and g are in U (a, b) and c and d are real numbers, then cf (a)+dg(a) = (c+d)b = 0. Thus, cf +dg ∈ U (a, b). Since U (a, b) is closed under addition and scalar multiplication, it is a subspace.
(c) No, C[−1, 1] is infinite-dimensional. If n is an arbitrary positive integer, then 1, x, x2 , . . . , xn are linearly independent. Indeed, if (a0 , . . . , an )∈ Rn is nonzero, then the assumption that p(x) = ∑ni=0 ai xi is the zero function implies that p(x) has infinitely many roots. This contradicts the fact that p(x) has at n roots. (See the fundamental theorem of algebra in Appendix E.) 2. Let V be a vector space and dim V = n. Show that (a) A set A of n vectors in V spans V if and only if A is independent. (b) V has a basis and every basis consists of n vectors. Solution: (a) Assume that A = {x1 , . . . , xn }. Since dim V = n, the set {x, x1 , . . . , xn } is dependent, for every x ∈ V . If A is independent, then x is in the span of A. Therefore, A spans V. Conversely, if A is dependent, one of its elements can be removed without changing the span of A. Thus, A cannot span V, by Problem 1.2.6-9. (b) Since dim V = n, V contains an independent set of n vectors, and (a) shows that every such set is a basis of V; the statement follows from the definition of dimension and Problem 1.2.6-9.
1.2.6 Problems 1. A complex number ζ is called the mth root of unity if ζ m = 1. ζ is called a primitive unity root of order m > 1 if ζ m = 1 and ζ k ≠ 1, for k ∈ [m − 1]. Let l be the number of integers in [m − 1] that are coprime with m. Let Q[ζ] ∶= {a0 + a1 ζ + ⋯ + al−1 ζ l−1 , a0 , . . . , al−1 ∈ Q}. Show that Q[ζ] is a field. It is a finite extension of Q. More precisely, Q[ζ] can be viewed as a vector space over Q of dimension l. Hints: 2πi
(a) Let ξ = e m . Show that ζ is an mth root of unity if and only if ζ = ξ k and k ∈ [m]. Furthermore, ζ is primitive if and only if k and m are coprime. (b) Let ζ be an mth primitive root of unity. Show that ζ, ζ 2 , . . . , ζ m−1 , ζ m are all mth root of unity. (c) Show that xm − 1 has simple roots. (d) Let p1 (x)⋯pj (x) be the decomposition of xm − 1 to irreducible factors in Q[x]. Assume that ζ, a primitive mth root of unity, is a root of p1 . Let l = deg pl . Show that Q[ζ] ∶= {a0 + a1 ζ + ⋯ + al−1 ζ l−1 , a0 , . . . , al−1 ∈ Q} is a field extension of Q of degree l which contains all m roots of unity. (e) Show that l is the number of m-primitive roots of unity. (f) Show that l is given by the Euler function φ(m). Namely, assume that m = pd11 ⋯pdr r , where 1 < p1 < ⋯ < pr are primes and d1 , . . . , dr are positive integers. Then, φ(m) = (p1 − 1)pd11 −1 ⋯(pr − 1)pdr r −1 .
Special cases: (a) m = 4: x4 − 1 = (x2 + 1)(x + 1)(x − 1). The 4th primitive roots of unity are ±i. They are roots of x2 + 1. Q[i] is called the Gaussian field of rationals. The set Z[i] = {a + bi ∶ a, b ∈ Z} is called the Gaussian integers domain. (b) m = 6: x6 − 1 = (x2 − x +√1)(x2 + x + 1)(x + 1)(x − 1). The primitive roots of order 6 are 21 ± 23 i, which are the two roots of x2 − x + 1. 2. Quaternions Q is a four-dimensional noncommutative division algebra that generalizes complex numbers. Any quaternion is written as a vector q = a0 + a1 i + a2 j + a3 k. The product of two quaternions is determined by using the following equalities: i2 = j2 = k2 = −1, ij = −ji = k, jk = −kj = i, ki = −ik = j. ¯ = a0 − a1 i − a2 j − a3 k. Show that Let q ¯ is an involution. (T is a bijection preserving ad(a) The map T (q) = q dition and multiplication and T 2 = id.) (b) Q is a noncommutative division algebra, i.e., if q ≠ 0, then there exists a unique r such that rq = qr = 1. (You can try to use the identity ¯ q = ∣q∣2 = ∑3i=0 a2i .) q¯ q=q 3. Let Pm be the subset of all polynomials of degree at most m over the field F. Show that the polynomials 1, x, . . . , xm form a basis in Pm . 4. Let fj be a nonzero polynomial in Pm of degree j, for j = 0, . . . , m. Show that f0 , . . . , fm form a basis in Pm . 5. Find a basis in P4 such that each polynomial is of degree 4. 6. Is there a basis in P4 such that each polynomial is of degree 3? 7. Write up the set of all vectors in R4 whose coordinates are two 0’s and two 1’s. (There are six such vectors.) Show that these six vectors span R4 . Find all the collections of four vectors out of these six which form a basis in R4 . 8. Prove the completion lemma: Let V be a vector space of dimension m. Let v1 , . . . , vn ∈ V be n linearly independent vectors. (Here m ≥ n.) Then, there exist m − n vectors vn+1 , . . . , vm such that {v1 , . . . , vm } is a basis in V. 9. Assume that a vector space V is spanned by a set of n vectors, n ∈ N. Show that dim V ≤ n. 10. Let V be a vector space over a field F and S ⊆ V, and show that (a) span (span (S)) = span (S), (b) span (S) is the smallest subspace of V containing S, (c) span (S) = S if and only if S is a subspace of V.
11. Let x1 , . . . , xn ∈ Qn . Prove that the xi ’s are linearly independent over Q if and only if they are linearly independent over R. 12. A homogeneous polynomial is a polynomial whose nonzero monomial all have the same degree. For example, x5 + 2x2 y 3 + 5xy 4 is a homogeneous polynomial of degree 5, in two indeterminates. Let F be a field and fix m, n ∈ N. (a) Prove that the set of homogeneous polynomials of degree m in n indeterminates is a vector space over F. ). (b) Prove that its dimension over F is (n+m−1 m (Note that for natural numbers k, n with k ≤ n, (nk) (read n choose k) is the number of subsets of order k in the set [n]. For example, (42) = 6; the six subsets of order 2 of {1, 2, 3, 4} are {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}.) 13. Find subspaces V and W of the real vector space R3 having the property that V ∪ W is not a subspace of R3 . 14. Let V be a three-dimensional vector space over F, with basis {x1 , x2 , x3 }. Is {x1 + x2 , x2 + x3 , x1 − x3 } a basis for V? 15. Suppose that A ∈ Fn×n and x1 , . . . , xm ∈ Fn×1 are column vectors. Prove that if Ax1 , . . . , Axm are linearly independent column vectors, then x1 , . . . , xm are linearly independent. 16. Let V be a vector space and S ⊆ V. Prove that span S is the intersection of all the subspaces of V containing S. 17. Let U and W be two-dimensional subspaces of R3 . Prove that U∩W≠{0}. 18. Show that R is a vector space of infinite dimension over Q. 19. Prove or disprove that R2 is a subspace of R3 . 20. Let V = {α ∈ R; α is algebraic over Q}. Prove that V can be considered as a Q-vector space. What is dimQ V? 21. Is the set S 1 = {(x, y)⊺ ∈ R2 ; x2 + y 2 ≤ 1} a subspace of R2 ? (This is the set of points inside and on the unit circle in the xy-plane.)
Historical note Georg Cantor (1845–1918) was a German mathematician who invented set theory, which has become a fundamental theory in mathematics. The history of dimension is itself a fascinating subject. For vector spaces, we have seen that the definition is fairly intuitive. In 1877, Cantor proved that there is a one-to-one correspondence between points of R and points of R2 . Although intuitively the (two-dimensional) plane is bigger than the (one-dimensional) line, Cantor’s argument says that they each have the same “number” of points. He also proved that for any positive integer n, there exists a one-to-one correspondence between the points on the unit line segment and all of the points in an n-dimensional
space. About this discovery, Cantor wrote to Dedekind: “I see it, but I don’t believe it!” The result that he found so astonishing has implications for the notion of dimension. In 1890, Peano further confounded the situation by producing a continuous function from the line into the plane that touches every point of a square [27].
Georg Cantor (1845–1918).
1.3 Matrices

For a field F, the set Fm×n is a vector space over F of dimension mn. A standard basis of Fm×n is given by Eij, i = 1, . . . , m, j = 1, . . . , n, where Eij is a matrix with 1 in the entry (i, j) and 0 in all other entries. Also, Fm×1 = Fm is the set of column vectors with m coordinates. A standard basis of Fm is ei = (δ1i, . . . , δmi)⊺, where δji denotes Kronecker's delta function defined as δji = 0 if j ≠ i and δji = 1 if j = i, for i = 1, . . . , m.

For A = [aij] ∈ Fm×n, denote by A⊺ ∈ Fn×m the transpose of the matrix A, i.e., the (i, j) entry of A⊺ is the (j, i) entry of A. The following properties hold: (aA)⊺ = aA⊺, (A + B)⊺ = A⊺ + B⊺, (AC)⊺ = C⊺A⊺, (Ar)⊺ = (A⊺)r, for all nonnegative integers r. (See Appendix G for details of matrix operations.)

The matrix A = [aij] ∈ Fm×n is called diagonal if aij = 0, for i ≠ j. Denote by D(m, n) ⊂ Fm×n the vector subspace of diagonal matrices. A square diagonal matrix with the diagonal entries d1, . . . , dn is denoted by diag(d1, . . . , dn) ∈ Fn×n. A matrix A is called a block matrix if A = [Aij]_{i,j=1}^{p,q}, where each entry Aij is an mi × nj matrix. Then, A ∈ Fm×n, where m = ∑_{i=1}^{p} mi and n = ∑_{j=1}^{q} nj. A block matrix A is called block diagonal if Aij = 0_{mi×nj}, for i ≠ j. A block diagonal matrix with p = q is denoted by diag(A11, . . . , App). A different notation is ⊕_{l=1}^{p} All := diag(A11, . . . , App), and ⊕_{l=1}^{p} All is called the direct sum of the square matrices A11, . . . , App.
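The transpose rule (AC)⊺ = C⊺A⊺ and the direct sum notation can be checked numerically; the sketch below assumes Python with NumPy and uses arbitrary example matrices:

```python
import numpy as np

# Check (AC)^T = C^T A^T on arbitrary matrices of compatible sizes.
A = np.array([[1., 2., 0.],
              [3., -1., 4.]])
C = np.array([[2., 1.],
              [0., 1.],
              [5., -2.]])
print(np.allclose((A @ C).T, C.T @ A.T))           # True

# Build the direct sum A11 ⊕ A22 = diag(A11, A22) as a block matrix.
A11 = np.array([[1., 2.],
                [3., 4.]])
A22 = np.array([[5.]])
direct_sum = np.block([[A11, np.zeros((2, 1))],
                       [np.zeros((1, 2)), A22]])
print(direct_sum)
```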
1.3.1 Elementary row operation Elementary operations on the rows of A ∈ Fm×n are defined as follows: 1. Multiply row i by a nonzero a ∈ F. 2. Interchange two distinct rows i and j in A. 3. Add to row j, row i multiplied by a, where i ≠ j. The first nonzero element from the left in a nonzero row of a matrix A is called a leading entry. A matrix A ∈ Fn×n is said to be in an REF if it satisfies the following conditions: 1. All nonzero rows are above any rows of all zeros. 2. Each leading entry of a row is in a column to the right of the leading entry of the row above it. 3. All entries in a column below a leading entry are zeros. A matrix A can be brought to a unique canonical form called the reduced row echelon form, abbreviated as RREF, by elementary row operations. The RREF of the zero matrix 0m×n is 0m×n . For A ≠ 0m×n , RREF of A given by B is of the following form: In addition to the conditions 1, 2, and 3 above, it must satisfy the following: 1. The leading entry in each nonzero row is 1. 2. Each leading 1 is the only nonzero entry in its column. A pivot position in the matrix A is a location in A that corresponds to a leading 1 in the RREF of A. The first nonzero entry of each row of a REF of A is called a pivot, and the columns in which pivots appear are pivot columns. The columns other than pivot columns are called nonpivot. The process of finding pivots is called pivoting. The rank of A is the number of pivots in its row echelon form and is denoted by rank F A or simply rank A if there is no ambiguity in the background field.
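For a concrete view of the RREF, the pivot columns, and the rank, here is a small sketch assuming Python with SymPy; the matrix is an arbitrary illustration:

```python
import sympy as sp

# Compute the reduced row echelon form, the pivot columns, and the rank.
A = sp.Matrix([[1, 2, 1, 4],
               [2, 4, 0, 2],
               [3, 6, 1, 6]])

R, pivots = A.rref()
print(R)        # Matrix([[1, 2, 0, 1], [0, 0, 1, 3], [0, 0, 0, 0]])
print(pivots)   # (0, 2): pivots in columns 1 and 3 in the book's 1-based counting
print(A.rank()) # 2
```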
1.3.2 Gaussian elimination Gaussian elimination is a method to bring a matrix either to its REF or its RREF or similar forms. It is often used for solving matrix equations of the form Ax = b.
(1.3.1)
In the following, we may assume that A is full-rank, i.e., the rank of A equals the number of rows. To perform Gaussian elimination starting with the system of equations

⎡ a11 a12 ⋯ a1k ⎤ ⎡ x1 ⎤   ⎡ b1 ⎤
⎢ a21 a22 ⋯ a2k ⎥ ⎢ x2 ⎥ = ⎢ b2 ⎥ ,    (1.3.2)
⎢  ⋮            ⎥ ⎢ ⋮  ⎥   ⎢ ⋮  ⎥
⎣ ak1 ak2 ⋯ akk ⎦ ⎣ xk ⎦   ⎣ bk ⎦

compose the "augmented matrix equation"

⎡ a11 a12 ⋯ a1k b1 ⎤
⎢ a21 a22 ⋯ a2k b2 ⎥ .    (1.3.3)
⎢  ⋮               ⎥
⎣ ak1 ak2 ⋯ akk bk ⎦

Now, perform elementary row operations to put the augmented matrix into the upper triangular form:

⎡ a′11 a′12 ⋯ a′1k b′1 ⎤
⎢ 0    a′22 ⋯ a′2k b′2 ⎥ .    (1.3.4)
⎢  ⋮                   ⎥
⎣ 0    0    ⋯ a′kk b′k ⎦

Solve the equation of the kth row for xk, then substitute back into the equation of the (k − 1)-st row to obtain a solution for xk−1, etc. We get the formula

xi = (1/a′ii) (b′i − ∑_{j=i+1}^{k} a′ij xj).

Example 1.5. We solve the following system using Gaussian elimination:

2x − 2y      = −6
 x −  y + z  =  1
     3y − 2z = −5.

For this system, the augmented matrix is

⎡ 2 −2  0 −6 ⎤
⎢ 1 −1  1  1 ⎥ .
⎣ 0  3 −2 −5 ⎦

First, multiply row 1 by 1/2:

⎡ 1 −1  0 −3 ⎤
⎢ 1 −1  1  1 ⎥ .
⎣ 0  3 −2 −5 ⎦

Now, adding −1 times the first row to the second row yields zeros below the first entry in the first column:

⎡ 1 −1  0 −3 ⎤
⎢ 0  0  1  4 ⎥ .
⎣ 0  3 −2 −5 ⎦

Interchanging the second and third rows then gives the desired upper-triangular coefficient matrix:

⎡ 1 −1  0 −3 ⎤
⎢ 0  3 −2 −5 ⎥ .
⎣ 0  0  1  4 ⎦

The third row now says z = 4. Back-substituting this value into the second row gives y = 1, and back-substitution of both these values into the first row yields x = −2. The solution is therefore (x, y, z) = (−2, 1, 4).
1.3.3 Solution of linear systems Consider the following system of the linear equations: a11 x1 + a12 x2 + ⋯ + a1n xn = b1 a21 x1 + a22 x2 + ⋯ + a2n xn = b2 ⋮ an1 x1 + an2 x2 + ⋯ + ann xn = bn
Assume that A = [aij ]1≤i,j≤n
(1.3.5)
⎡ b1 ⎤ ⎢ ⎥ ⎢b ⎥ ⎢ ⎥ and b = ⎢ 2 ⎥. ⎢⋮⎥ ⎢ ⎥ ⎢bn ⎥ ⎣ ⎦
Definition 1.3.1. Let Aˆ ∶= [A∣b] be the augmented m × (n + 1) matrix representˆ Assume that C ing the system (1.3.5). Suppose that Cˆ = [C∣c] is an REF of A. has k-pivots in the columns 1 ≤ `1 < ⋯ < `k ≤ n. Then, the variables x`1 , . . . , x`k corresponding to these pivots are called lead variables. The other variables are called free variables. Definition 1.3.2. The function f (x1 , . . . , xn ) is called affine if it is of the form f (x1 , . . . , xn ) = a + ∑ni=1 ai xi . The following theorem describes exactly the set of all solutions of (1.3.5). Theorem 1.3.3. Let Aˆ ∶= [A∣b] be the augmented m×(n+1) matrix representing ˆ Then, the system the system (1.3.5). Suppose that Cˆ = [C∣c] is an REF of A. ˆ (1.3.5) is solvable if and only if C does not have a pivot in the last column n + 1. Assume that (1.3.5) is solvable. Then each lead variable is a unique affine function in free variables. These affine functions can be determined as follows: ˆ Move all the free variables 1. Consider the linear system corresponding to C. to the right-hand side of the system. Then, one obtains a triangular system in lead variables, where the right-hand side are affine functions in free variables. 2. Solve this triangular system by back-substitution. In particular, for a solvable system, we have the following alternative: 1. The system has a unique solution if and only if there are no free variables. 2. The system has infinitely many solutions if and only if there is at least one free variable. ˆ As Proof. We consider the linear system of equations corresponding to C. ˆ an ERO (elementary row operation) on A corresponds to an EO (elementary operation) on the system (1.3.5), it follows that the system represented by Cˆ is equivalent to (1.3.5). Suppose first that Cˆ has a pivot in the last column. Thus, the corresponding row of Cˆ that contains the pivot in the column n + 1 is (0, 0, . . . , 0, 1)⊺ . The corresponding equation is unsolvable, hence the whole
system corresponding to Cˆ is unsolvable. Therefore, the system (1.3.5) is unsolvable. Assume now that Cˆ does not have a pivot in the last column. Move all the ˆ It is a triangular free variables to the right-hand side of the system given by C. system in the lead variables where the right-hand side of each equation is an affine function in the free variables. Now use back-substitution to express each lead variable as an affine function of the free variables. Each solution of the system is determined by the value of the free variables which can be chosen arbitrarily. Hence, the system has a unique solution if and only if it has no free variables. The system has many solutions if and only if it has at least one free variable. ◻ ˆ Example 1.6. Consider the following example of C: ⎡ 1 −2 3 −1 0 ⎤ ⎥ ⎢ ⎥ ⎢ ⎢ 0 1 3 1 4 ⎥, ⎥ ⎢ ⎢ 0 0 0 1 5 ⎥ ⎦ ⎣ x1 , x2 , x4 are lead variables, x3 is a free variable.
(1.3.6)
x4 = 5, x2 + 3x3 + x4 = 4 ⇒ x2 = −3x3 − x4 + 4 ⇒ x2 = −3x3 − 1, x1 − 2x2 + 3x3 − x4 = 0 ⇒ x1 = 2x2 − 3x3 + x4 = 2(−3x3 − 1) − 3x3 + 5 ⇒ x1 = −9x3 + 3. Theorem 1.3.4. Let Aˆ ∶= [A∣b] be the augmented m×(n+1) matrix representing ˆ Then, the system the system (1.3.5). Suppose that Cˆ = [C∣c] is a RREF of A. (1.3.5) is solvable if and only if Cˆ does not have a pivot in the last column n + 1. Assume that (1.3.5) is solvable. Then, each lead variable is a unique affine function in free variables determined as follows. The leading variable x`i appears only in the equation i, for 1 ≤ i ≤ r = rank A. Shift all other variables in the equation (which are free variables) to the other side of the equation to obtain x`i as an affine function in free variables. Proof. Since RREF is a row echelon form, Theorem 1.3.3 yields that (1.3.5) is solvable if and only if Cˆ does not have a pivot in the last column n + 1. Assume that Cˆ does not have a pivot in the last column n + 1. Then, all the pivots of Cˆ appear in C. Hence, rank Aˆ = rank Cˆ = rank A(= r). The pivots of C = [cij ] ∈ Rm×n are located at row i and the column `i , denote as (i, `i ), for i = 1, . . . , r. Since C is also in RREF, in the column `i there is only one nonzero element which is equal to 1 and is located in row i. ˆ which is equivaConsider the system of linear equations corresponding to C, lent to (1.3.5). Hence, the lead variable `i appears only in the ith equation. The left-hand side of this equation is of the form x`i plus a linear function in free variables whose indices are greater than x`i . The right-hand side is ci , where c = (c1 , . . . , cm )⊺ . Hence, by moving the free variables to the right-hand side, we obtain the exact form of x`i : x`i = ci −
∑
j∈{`i ,`i +1,...,n}∖{j`1 ,j`2 ,...,j`r }
cij xj , for i = 1, . . . , r.
(1.3.7)
(Therefore, {`i , `i + 1, . . . , n} consists of n − `i + 1 integers from j`i to n, while {j`1 , j`2 , . . . , j`r } consists of the columns of the pivots in C.) ◻ Example 1.7. Consider ⎡ 1 0 ⎢ ⎢ ⎢ 0 1 ⎢ ⎢ 0 0 ⎣
b 0 u ⎤⎥ ⎥ d 0 v ⎥, ⎥ 0 1 w ⎥⎦
where x1, x2, x4 are lead variables and x3 is a free variable: x1 + bx3 = u ⇒ x1 = −bx3 + u, x2 + dx3 = v ⇒ x2 = −dx3 + v, x4 = w.
Definition 1.3.5. Let A be a square n × n matrix. Then A is called nonsingular if the corresponding homogeneous system of n equations in n unknowns has only the solution x = 0. Otherwise, A is called singular.
We see in the following theorem that the reduced row echelon form of a matrix is unique; there cannot be two different reduced row echelon forms for the same matrix. A matrix can, however, have many different (non-reduced) row echelon forms.
Theorem 1.3.6. Let A be an m × n matrix. Then, its RREF is unique.
Proof. Let U be a RREF of A. Consider the augmented matrix Â := [A|0] corresponding to the homogeneous system of equations. Clearly, Û = [U|0] is a RREF of Â. Put the free variables on the other side of the homogeneous system corresponding to Â, so that each lead variable is a linear function of the free variables. Note that the exact formulas for the lead variables determine uniquely the columns of the RREF that correspond to the free variables. Assume that U1 is another RREF of A. Then, U and U1 have the same pivots. Thus, U and U1 have the same columns which correspond to pivots. By considering the homogeneous system of linear equations corresponding to Â, and writing Û1 := [U1|0], we again find the solution of the homogeneous system by writing down the lead variables as linear functions in the free variables. Since Û and Û1 give rise to the same lead and free variables, we deduce that the linear functions in free variables corresponding to a lead variable xℓi are the same for Û and Û1. That is, the matrices U and U1 have the same row i, for i = 1, . . . , rank A. All other rows of U and U1 are zero rows. Hence, U = U1. ◻
Corollary 1.3.7. The matrix A ∈ Rn×n is nonsingular if and only if rank A = n, i.e., the RREF of A is the identity matrix
⎡ 1 0 ⋯ 0 0 ⎤
⎢ 0 1 ⋯ 0 0 ⎥
In = ⎢ ⋮ ⋮   ⋮ ⋮ ⎥ ∈ Rn×n .   (1.3.8)
⎣ 0 0 ⋯ 0 1 ⎦
Proof. The matrix A is nonsingular if and only if there are no free variables. Thus, rank A = n if and only if the RREF is In . ◻
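The lead/free-variable description of Theorem 1.3.4 is easy to check with a computer algebra system. The following short sketch is only an illustration (it assumes the sympy library is available; it is not part of the original text): rref() returns the RREF of the augmented matrix together with the pivot columns, from which the lead and free variables can be read off. The matrix used is the augmented matrix of the system solved just before Theorem 1.3.4.

# Illustrative check of Theorem 1.3.4 (assumes sympy).
import sympy as sp

# Augmented matrix [A | b] for the system
#   x1 - 2x2 + 3x3 - x4 = 0,  x2 + 3x3 + x4 = 4,  x4 = 5.
A_hat = sp.Matrix([[1, -2, 3, -1, 0],
                   [0,  1, 3,  1, 4],
                   [0,  0, 0,  1, 5]])

C_hat, pivots = A_hat.rref()
print(C_hat)     # RREF of the augmented matrix
print(pivots)    # (0, 1, 3): x1, x2, x4 are lead variables, x3 is free

# The system is solvable because no pivot lies in the last column, and each
# lead variable is an affine function of the free variable x3:
# x1 = 3 - 9*x3,  x2 = -1 - 3*x3,  x4 = 5, matching the hand computation.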
1.3.4 Invertible matrices
In this subsection, we discuss the link between elementary row operations on matrices and multiplication by certain matrices called elementary matrices. A matrix A ∈ Fm×m is called invertible if there exists B ∈ Fm×m such that AB = BA = Im. Note that such B is unique: if AC = CA = Im, then B = BIm = B(AC) = (BA)C = Im C = C. We denote B, the inverse of A, by A−1. Denote by GL(m, F) ⊂ Fm×m the set of all invertible matrices. Note that GL(m, F) is a group under the multiplication of matrices, with the identity Im. (Observe that for A and B ∈ GL(m, F), (AB)−1 = B−1A−1.) Note that in our definition of an invertible matrix A, we required that A be square and that the inverse matrix B satisfy both AB = Im and BA = Im. In fact, given that A and B are square, either of the conditions AB = Im and BA = Im implies the other one. (Why?) A matrix E ∈ Fm×m is called elementary if it is obtained from Im by applying one elementary row operation. Note that applying an elementary row operation to A is equivalent to forming EA, where E is the corresponding elementary matrix. By reversing the corresponding elementary operation, we see that E is invertible, and E−1 is also an elementary matrix. In Section 1.7 we will use elementary matrices to prove a fundamental fact, the product formula det AB = det A det B, for A, B ∈ Fn×n.
Example 1.8.
⎡ 0 1 0 ⎤
⎢ 1 0 0 ⎥
⎣ 0 0 1 ⎦
is an elementary matrix. We are performing on the identity matrix r1 ↔ r2.
Example 1.9.
⎡ 1 0 0 ⎤
⎢ 0 3 0 ⎥
⎣ 0 0 1 ⎦
is an elementary matrix. We are performing on the identity matrix 3r2 → r2.
Example 1.10.
⎡ 1 0 0 ⎤
⎢ 0 1 0 ⎥
⎣ −2 0 1 ⎦
is an elementary matrix. We are performing on the
identity matrix r3 − 2r1 → r3 . The following theorem is well-known as the fundamental theorem of invertible matrices; in the next subsections, we will see a more detailed version of this theorem. Theorem 1.3.8. Let A ∈ Fm×m . The following statements are equivalent: 1. A is invertible. 2. Ax = 0 has only the trivial solution. 3. The RREF of A is Im . 4. A is a product of elementary matrices.
Proof. 1 ⇒ 2. The equation Ax = 0 implies that x = Im x = A−1 (Ax) = A−1 0 = 0. 2 ⇒ 3. The system Ax = 0 does not have free variables, hence the number of pivots is the number of rows, i.e., the RREF of A is Im . 3 ⇒ 4. There exists a sequence of elementary matrices so that Ep Ep−1 ⋯E1 A = Im . Hence, A = E1−1 E2−1 ⋯Ep−1 , and each Ej−1 is also elementary. 4 ⇒ 1. As each elementary matrix is invertible, so is their product, which is equal to A. ◻ Theorem 1.3.9. Let A ∈ Fn×n . Define the matrix B = [A∣In ] ∈ F n×(2n) . Let C = [C1 ∣C2 ], C1 , C2 ∈ F n×n be the RREF of B. Then, A is invertible if and only if C1 = In . Furthermore, if C1 = In , then A−1 = C2 . Proof. The fact that “A is invertible if and only if C1 = In ” follows straightforwardly from Theorem 1.3.8. Note that for any matrix F = [F1 ∣F2 ], F1 , F2 ∈ Fn×p , G ∈ Fl×n , we have GF = [(GF1 )∣(GF2 )]. Let H be a product of elementary matrices such that HB = [I∣C2 ]. Thus, HA = In and HIn = C2 . Hence, H = A−1 and C2 = A−1 . ◻ We can extract the following algorithm from Theorem 1.3.9 to calculate the inverse of a matrix if it was invertible: Gauss–Jordan algorithm for A−1 Form the matrix B = [A∣In ]. Perform the ERO to obtain RREF of B ∶ C = [D∣F ]. A is invertible ⇔ D = In . If D = In , then A−1 = F .
Example 1.11. Let
A = ⎡ 1 2 −1 ⎤
    ⎢ −2 −5 5 ⎥ .
    ⎣ 3 7 −5 ⎦
Write B = [A∣I3 ]:
B = ⎡ 1 2 −1 1 0 0 ⎤
    ⎢ −2 −5 5 0 1 0 ⎥ .
    ⎣ 3 7 −5 0 0 1 ⎦
Perform ERO R2 + 2R1 → R2, R3 − 3R1 → R3:
B1 = ⎡ 1 2 −1 1 0 0 ⎤
     ⎢ 0 −1 3 2 1 0 ⎥ .
     ⎣ 0 1 −2 −3 0 1 ⎦
To make the (2,2) entry a pivot, do −R2 → R2:
B2 = ⎡ 1 2 −1 1 0 0 ⎤
     ⎢ 0 1 −3 −2 −1 0 ⎥ .
     ⎣ 0 1 −2 −3 0 1 ⎦
To eliminate the (1,2) and (3,2) entries, do R1 − 2R2 → R1, R3 − R2 → R3:
B3 = ⎡ 1 0 5 5 2 0 ⎤
     ⎢ 0 1 −3 −2 −1 0 ⎥ .
     ⎣ 0 0 1 −1 1 1 ⎦
To eliminate the (1,3) and (2,3) entries, do R1 − 5R3 → R1, R2 + 3R3 → R2:
B4 = ⎡ 1 0 0 10 −3 −5 ⎤
     ⎢ 0 1 0 −5 2 3 ⎥ .
     ⎣ 0 0 1 −1 1 1 ⎦
Then, B4 = [I3|F] is the RREF of B. Thus, A has the inverse
A−1 = ⎡ 10 −3 −5 ⎤
      ⎢ −5 2 3 ⎥ .
      ⎣ −1 1 1 ⎦
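As a complement to Example 1.11, here is a small sketch of the Gauss–Jordan algorithm in code. It is only an illustration (it assumes numpy is available; the partial-pivoting strategy differs from the hand computation above, but the resulting inverse is the same).

# Gauss-Jordan inversion, illustrated on the matrix A of Example 1.11 (assumes numpy).
import numpy as np

A = np.array([[ 1.,  2., -1.],
              [-2., -5.,  5.],
              [ 3.,  7., -5.]])

def gauss_jordan_inverse(A):
    """Row-reduce [A | I] to [I | A^{-1}]; raise an error if A is singular."""
    n = A.shape[0]
    B = np.hstack([A.astype(float), np.eye(n)])   # B = [A | I_n]
    for k in range(n):
        # Partial pivoting: pick the largest pivot candidate in column k.
        p = k + np.argmax(np.abs(B[k:, k]))
        if np.isclose(B[p, k], 0.0):
            raise ValueError("matrix is singular")
        B[[k, p]] = B[[p, k]]           # ERO: interchange rows
        B[k] /= B[k, k]                 # ERO: scale the row so the pivot is 1
        for i in range(n):
            if i != k:
                B[i] -= B[i, k] * B[k]  # ERO: eliminate the other entries in column k
    return B[:, n:]                     # the right half is A^{-1}

print(gauss_jordan_inverse(A))                               # [[10,-3,-5],[-5,2,3],[-1,1,1]]
print(np.allclose(A @ gauss_jordan_inverse(A), np.eye(3)))   # True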
1.3.5 Row and column spaces of A
Two matrices A and B ∈ Fm×n are called left equivalent (row equivalent) if A and B have the same reduced row echelon form, i.e., if B can be obtained from A by applying elementary row operations. Equivalently, B = UA for some U ∈ GL(m, F). This is denoted as A ∼l B. It is straightforward to show that ∼l is an equivalence relation. Right equivalence (column equivalence) is defined similarly and denoted by ∼r. Note that A ∼r B if and only if A⊺ ∼l B⊺. Also, A, B ∈ Fm×n are called equivalent if B = UAV, for some U ∈ GL(m, F), V ∈ GL(n, F). Equivalently, B is equivalent to A if B can be obtained from A by applying elementary row and column operations. This is denoted as A ∼ B. It is straightforward to show that ∼ is an equivalence relation.
Let A ∈ Fm×n. Denote by c1, . . . , cn ∈ Fm the n columns of A. We write A = [c1 c2 . . . cn]. Then, span {c1, . . . , cn} is denoted by R(A) and called the column space of A. Its dimension equals rank A. (Why?) Similarly, the dimension of the column space of A⊺, which is equal to the dimension of the row space of A, is rank A⊺, and rank A⊺ = rank A. (See Worked-out Problem 1.4.3-4.) Let x = (x1, . . . , xn)⊺. Then, Ax = ∑ni=1 xi ci. Hence, the system Ax = b is solvable if and only if b is in the column space of A. The set of all x ∈ Fn such that Ax = 0 is called the null space of A. It is a subspace of dimension n − rank A and is denoted N(A) ⊂ Fn; dim N(A) is called the nullity of A and denoted by null A. (See Worked-out Problem 1.6.2-2.) If A ∈ GL(n, F), then Ax = b has the unique solution x = A−1b.
Example 1.12. Consider the matrix
A = ⎡ −2 −5 8 0 −17 ⎤
    ⎢ 1 3 −5 1 5 ⎥
    ⎢ 3 11 −19 7 1 ⎥ .
    ⎣ 1 7 −13 5 −3 ⎦
Its REF is
⎡ 1 3 −5 1 5 ⎤
⎢ 0 1 −2 2 −7 ⎥
⎢ 0 0 0 1 −5 ⎥ .
⎣ 0 0 0 0 0 ⎦
There are three pivots and then rank A = 3. Now, consider the system Ax = 0. Its RREF is
⎡ 1 0 1 0 1 ⎤
⎢ 0 1 −2 0 3 ⎥
⎢ 0 0 0 1 −5 ⎥ .
⎣ 0 0 0 0 0 ⎦
The corresponding system is ⎧ x1 + x3 + x5 = 0, ⎪ ⎪ ⎪ ⎪ ⎨x2 − 2x3 + 3x5 = 0, ⎪ ⎪ ⎪ ⎪ ⎩x4 − 5x5 = 0. The leading variables are x1 , x2 , and x4 . Hence, the free variables are x3 and x5 . We can write the solution in the form ⎡ x1 ⎤ ⎡−1⎤ ⎡−1⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢−3⎥ ⎢x ⎥ ⎢2⎥ ⎢ ⎥ ⎢ 2⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢x3 ⎥ = c1 ⎢ 1 ⎥ + c2 ⎢ 0 ⎥ . ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢5⎥ ⎢ x4 ⎥ ⎢0⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ x5 ⎥ ⎢1⎥ ⎢0⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ Thus, null A = 2. Theorem 1.3.10. Let A ∈ Fm×n . Assume that B ∈ Fm×n is a row echelon form of A. Let k = rank A. Then 1. The nonzero rows of B form a basis in the row space of A. 2. Assume that the pivots of B are the columns 1 ≤ j1 < ⋯ < jk ≤ n, i.e., xj1 , . . . , xjk are the lead variables in the system Ax = 0. Then, the columns j1 , . . . , jk of A form a basis of the column space of A. 3. If n = rank A, then the null space of A consists only of 0. Otherwise, the null space of A has the following basis: For each free variable xp , let xp be the unique solution of Ax = 0, where xp = 1 and all other free variables are zero. Then, these n − rank A vectors form a basis in N (A). Proof. 1. Note that if E ∈ Fm×m , then the row space of EA is contained in the row space A, since any row in EA is a linear combination of the rows of A. Since A = E −1 (EA), it follows that the row space of A is contained in the row space of EA. Hence, the row space of A and EA are the same. Therefore, the row space of A and B are the same. Since the last m − k rows of B are zero rows, it follows that the row space of B is spanned by the first k rows b⊺1 , . . . , b⊺k . We claim that b⊺1 , . . . , b⊺k are linearly independent. Indeed, consider x1 b⊺1 + ⋯ + xk b⊺k = 0⊺ . Since all the entries below the first pivot in B are zero, we deduce that x1 = 0. Continue in the same manner to obtain that x2 = 0, . . . , xk = 0. Then, b⊺1 , . . . , b⊺k form a basis in the row space of B and A. 2. We first show that cj1 , . . . , cjk are linearly independent. Consider the equality ∑kj=1 xij cij = 0. This is equivalent to the system Ax = 0, where x = (x1 , . . . , xn )⊺ and xp = 0 if xp is a free variable. Since all free variables are zero,
all lead variables are zero, i.e., xji = 0, for i = 1, . . . , k. Thus, cj1 , . . . , cjk are linearly independent. It is left to show that cp , where xp is a free variable, is a linear combination of cj1 , . . . , cjk . Again, consider Ax = 0, where each xp = 1 and all other free variables are zero. We have the unique solution to Ax = 0, which states that 0 = cp + ∑ki=1 xji cji , i.e., cp = ∑ki=1 −xji cji . 3. This follows from the way we write down the general solution of the system Ax = 0 in terms of free variables. ◻ Remark 1.3.11. Let x1 , . . . , xn ∈ Fm . To find a basis in U ∶= span {x1 , . . . , xn } consisting of k vectors in {x1 , . . . , xn }, apply Theorem 1.3.10 to the column space of the matrix A = [x1 x2 . . . xn ]. To find an appropriate basis in U, apply Theorem 1.3.10 to the row space of A⊺ . Lemma 1.13. Let A ∈ Fm×n . Then, A is equivalent to a block diagonal matrix Irank A ⊕ 0. Proof. First, bring A to a REF matrix B with all pivots equal to 1. Now perform the elementary column operations corresponding to the elementary row operations on B ⊺ . This will give a matrix C with r pivots equal to 1 and all other elements zero. Now, interchange the corresponding columns to obtain Ir ⊕ 0. It is left to show that r = rank A. First, recall that if B = U A, then the row spaces of A and B are the same. Then, rank A = rank B, which is the dimension of the row space. Next, if C = BV , then C and B have the same column space. Thus, rank B = rank C, which is the dimension of the column space. This means r = rank A. ◻ The following theorem is a more detailed case of Theorem 1.3.8. Its proof is left as an exercise. Theorem 1.3.12. Let A ∈ Fn×n . The following statements are equivalent: 1. A is invertible (nonsingular). 2. Ax = b has a unique solution for every b ∈ Fn . 3. Ax = 0 has only the trivial (zero) solution. 4. The reduced row echelon form of A is In . 5. A is a product of elementary matrices. 6. A⊺ is invertible. 7. Rank A = n. 8. N (A) = {0}. 9. null A = 0. 10. The column vectors of A are linearly independent. 11. The column vectors of A span Fn . 12. The column vectors of A form a basis for Fn .
13. The row vectors of A are linearly independent. 14. The row vectors of A span Fn . 15. The row vectors of A form a basis for Fn .
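Theorem 1.3.10 is easy to illustrate with a computer algebra system. The sketch below is not part of the original text and assumes the sympy library; it applies rref(), columnspace(), and nullspace() to the matrix of Example 1.12 and recovers the pivot columns, a basis of the column space, and a basis of N(A).

# Illustration of Theorem 1.3.10 on the matrix of Example 1.12 (assumes sympy).
import sympy as sp

A = sp.Matrix([[-2, -5,   8, 0, -17],
               [ 1,  3,  -5, 1,   5],
               [ 3, 11, -19, 7,   1],
               [ 1,  7, -13, 5,  -3]])

R, pivots = A.rref()
print(A.rank())          # 3
print(pivots)            # (0, 1, 3): columns 1, 2, 4 of A form a basis of the column space
print(A.columnspace())   # basis of the column space (the pivot columns of A)
print(A.nullspace())     # basis of N(A); its length is null A = 2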
1.3.6 Special types of matrices The following are some special types of matrices in Fn×n : 1. A is symmetric if A⊺ = A. Denote by S(n, F) the subspace of n × n symmetric matrices. 2. A is skew-symmetric, or antisymmetric, if A⊺ = −A. Denote by AS(n, F) the subspace of the skew-symmetric matrices. 3. A = [aij ] is called upper triangular if aij = 0, for each j < i. Denote by U(n, F) the subspace of upper triangular matrices. 4. A = [aij ] is called lower triangular if aij = 0 for each j > i. Denote by L(n, F) the subspace of lower triangular matrices. 5. A triangular matrix is one that is either lower triangular or upper triangular. 6. A = [aij ] is called tridiagonal if aij = 0 for all i, j with ∣i − j∣ > 1. 7. A = [aij ] is called upper Hessenberg if aij = 0 for i > j + 1. 8. A ∈ Rn×n is called orthogonal if AA⊺ = A⊺ A = In . 9. An identity matrix times a scalar is called a scalar matrix. Definition 1.3.13. Let α = {α1 < α2 < ⋯ < αp } ⊂ [m], β = {β1 < β2 < ⋯ < βq } ⊂ [n]. Then A[α, β] is a p × q submatrix of A, which is obtained by erasing in A all rows and columns which are not in α and β, respectively. Assume that p < m, q < n. Denote by A(α, β) the (m − p) × (n − q) submatrix of A, which is obtained by erasing in A all rows and columns which are in α and β, respectively. For i ∈ [m], j ∈ [n] denote by A(i, j) ∶= A({i}, {j}). Assume that A ∈ Fn×n , A[α, β] is called a principal if and only if α = β.
1.4 Sum of subspaces 1.4.1 Sum of two subspaces Let V be a vector space and U and W be two subspaces. To determine the smallest subspace of V containing U and W, we make the following definition: Definition 1.4.1. For any two subspaces U, W ⊆ V denote U + W ∶= {v ∶= u + w ∶ u ∈ U, w ∈ W}, where we take all possible vectors u ∈ U, w ∈ W. Theorem 1.4.2. Let V be a vector space and U, W be subspaces in V. Then (a) U + W and U ∩ W are subspaces of V.
(b) Assume that V is finite-dimensional. Then 1. U, W, U ∩ W are finite-dimensional. Let l = dim U ∩ W ≥ 0, p = dim U ≥ 1, q = dim W ≥ 1. 2. There exists a basis {v1 , . . . , vm } in U + W such that {v1 , . . . , vl } is a basis in U∩W, {v1 , . . . , vp } is a basis in U, and {v1 , . . . , vl , vp+1 , . . . , vp+q+l } is a basis in W. 3. dim(U + W) = dim U + dim W − dim U ∩ W. (Identity #(A ∪ B) = #A + #B − #(A ∩ B) for finite sets A, B is analogous to 3.) 4. U + W ⊆ span (U ∪ W), i.e., U + W is the smallest subspaces of V containing U and W. Proof. 1a. Let u, w ∈ U ∩ W. Since u, w ∈ U, it follows au + bw ∈ U. Similarly au + bw ∈ W. Hence, au + bw ∈ U ∩ W and U ∩ W is a subspace. 1b. Assume that u1 , u2 ∈ U, w1 , w2 ∈ W. Then, a(u1 + w1 ) + b(u2 + w2 ) = (au1 + bu2 ) + (aw1 + bw2 ) ∈ U + W. Hence, U + W is a subspace. (b) 1. Any subspace of an m-dimensional space has dimension at most m. (b) 2. Let {v1 , . . . , vl } be a basis in U ∩ W. Complete this linearly independent set in U and W to a basis {v1 , . . . , vp } in U and a basis {v1 , . . . , vl , vp+1 , . . . , vp+q−l } in W. Hence, for any u ∈ U, w ∈ W, u + w ∈ span {v1 , . . . , vp+q−l }. Hence U + W = span {v1 , . . . , vp+q−l }. We now show that v1 , . . . , vp+q−l are linearly independent. Suppose that a1 v1 + ⋯ + ap+q−l vp+q−l = 0. So u ∶= a1 v1 + ⋯ + ap vp = −ap+1 vp+1 + ⋯ − ap+q−l vp+q−l ∶= w. Note that u ∈ U, w ∈ W. So w ∈ U ∩ W. Hence, w = b1 v1 + ⋯ + bl vl . Since v1 , . . . , vl , vp+1 , . . . , vp+q−l are linearly independent, then ap+1 = ⋯ = ap+q−l = b1 = ⋯ = bl = 0. So w = 0 = u. Since v1 , . . . , vp are linearly independent, then a1 = ⋯ = ap = 0. Hence, v1 , . . . , vp+q−l are linearly independent. (b) 3. Note that from (b) 2, dim(U + W) = p + q − l. (b) 4. Clearly, every element u + w of U + W is in span (U + W). Thus, U + W ⊆ span (U + W). ◻
Geometric interpretation of Theorem 1.4.2 (b-3) If U and W are one-dimensional vector subspaces of R3 , then they are lines through the origin. If U = W, then U + W = U and U ∩ W = U. Then 1 = dim(U + W) = (dim U = 1) + (dim W = 1) − (dim(W1 ∩ W2 ) = 1) . If U ≠ W, then U + W will be the plane containing both lines and U∩W = {0}. Then 2 = dim(U + W) = (dim U = 1) + (dim W = 1) − (dim(U ∩ W) = 0) . We can use a similar argument if U is a line and W is a plane. We have two cases: U ∩ W = {0} or U ⊆ W (see Figure 1.7).
In the first case, U + W = R3 . Then 3 = dim(U + W) = (dim U = 1) + (dim W = 2) − (dim(U ∩ W) = 0) . In the second case, U + W = W. Then 2 = dim(U + W) = (dim U = 1) + (dim W = 2) − (dim(U ∩ W) = 1) . The case that U and W are both planes is proven similarly.
Figure 1.7 Definition 1.4.3. The subspace X ∶= U + W of V is called a direct sum of U and W if any vector v ∈ U + W has a unique representation of the form v = u + w, where u ∈ U, w ∈ W. Equivalently, if u1 + w1 = u2 + w2 , where u1 , u2 ∈ U, w1 , w2 ∈ W, then u1 = u2 , w1 = w2 . A direct sum of U and W is denoted by U ⊕ W. Here, U is said to be a complement of W in U ⊕ W. Also, U (or W) is called a summand of X. Note that a complement of a subspace is not unique. (Why?) Proposition 1.4.4. For two finite-dimensional vector subspaces U, W ⊆ V, the following are equivalent: (a) U + W = U ⊕ W. (b) U ∩ W = {0}. (c) dim U ∩ W = 0. (d) dim(U + W) = dim U + dim W. (e) For any bases {u1 , . . . , up }, {w1 , . . . , wq } in U, W, respectively, {u1 , . . . , up , w1 , . . . , wq } is a basis in U + W. Proof. The proof is left as an exercise.
◻
Example 1.14. Let A ∈ Rm×n, B ∈ Rl×n. Then, N(A) ∩ N(B) = N([A; B]), where [A; B] ∈ R(m+l)×n denotes the matrix obtained by stacking A on top of B. Note that x ∈ N(A) ∩ N(B) if and only if Ax = Bx = 0.
Example 1.15. Let A ∈ Rm×n, B ∈ Rm×l. Then, R(A) + R(B) = R([A B]). Note that R(A) + R(B) is the span of the columns of A and B.
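The dimension identity of Theorem 1.4.2 (b-3), together with Example 1.15, can be checked numerically for column spaces. The sketch below is only an illustration (it assumes sympy, and the matrices are made up for the example): a basis of U ∩ W is obtained from the nullspace of [A | −B], since A x = B y exactly when the common vector A x lies in both spaces.

# Numerical illustration of dim(U + W) = dim U + dim W - dim(U ∩ W) (assumes sympy).
import sympy as sp

A = sp.Matrix([[1, 0], [0, 1], [0, 0]])   # U = R(A), the span of the columns of A
B = sp.Matrix([[1, 1], [1, 0], [0, 1]])   # W = R(B)

dim_U = A.rank()
dim_W = B.rank()
dim_sum = sp.Matrix.hstack(A, B).rank()   # dim R([A B]) = dim(U + W), as in Example 1.15

# U ∩ W from the nullspace of [A | -B]: each nullspace vector (x, y) gives A*x in U ∩ W.
M = sp.Matrix.hstack(A, -B)
inter = [A * v[:A.cols, :] for v in M.nullspace()]
dim_inter = sp.Matrix.hstack(*inter).rank() if inter else 0

print(dim_U + dim_W - dim_inter == dim_sum)   # True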
1.4.2 Sums of many subspaces Definition 1.4.5. Let U1 , . . . , Uk be k subspaces of V. Then, X ∶= U1 + ⋯ + Uk is the subspace consisting of all vectors of the form u1 +u2 +⋯+uk , where ui ∈ Ui , i = 1, . . . , k. U1 + ⋯ + Uk is called a direct sum of U1 , . . . , Uk , and denoted by ⊕ki=1 Ui ∶= U1 ⊕ ⋯ ⊕ Uk if any vector in X can be represented in a unique way as u1 + u2 + ⋯ + uk , where ui ∈ Ui , i = 1, . . . , k. Proposition 1.4.6. For finite-dimensional vector subspaces Ui ⊆ V, i = 1, . . . , k, the following statements are equivalent: (a) U1 + ⋯ + Uk = ⊕ki=1 Ui . (b) dim(U1 + ⋯ + Uk ) = ∑ki=1 dim Ui . (c) For any basis {u1,i , . . . , upi ,i } in Ui , the vectors uj,i form a basis in U1 + ⋯ + Uk , where 1 ≤ i ≤ k and 1 ≤ j ≤ pi . Proof. (a) ⇒ (c). Choose a basis {u1,i , . . . , upi ,i } in Ui , i = 1, . . . , k. Since every vector in ⊕ki=1 Ui has a unique representation as u1 + ⋯ + uk , where ui ∈ Ui for i = 1, . . . , k, it follows that 0 can be written in the unique form as a trivial linear combination of all u1,i , . . . , upi ,i , for i ∈ [k]. So all these vectors are linearly independent and span ⊕ki=1 Ui . Similarly, (c) ⇒ (a). Clearly, (c) ⇒ (b). (b) ⇒ (c). It is left as an exercise. ◻ Note that Proposition 1.4.6 states that the dimension of a direct sum is the sum of the dimensions of its summands.
1.4.3 Worked-out problems 1. Consider the set of all polynomials of degree at most 2 with real coefficients and denote it by P2 . P2 is a real vector space with the usual addition and scalar multiplication. Show that the polynomials 1, x − 1, (x − 1)2 form a basis in this vector space. Solution: Using Taylor series at x = 1, we have p(x) = p(1)+p′ (1)(x−1)+ p 2!(1) (x−1)2 , where all other terms in Taylor series are zero, i.e., p(k) (x) = 0, for k ≥ 3. Then, {1, x − 1, (x − 1)2 } spans P2 . Suppose that a linear combination of 1, x − 1, (x − 1)2 is identically zero: p(x) = a + b(x − 1) + c(x − 1)2 = 0. Then, 0 = p(1) = a, 0 = p′ (1) = b, and 0 = p′′ (1) = 2c. Hence, 1, x − 1 and (x − 1)2 are linearly independent. Thus, it is a basis for P2 . √ 2. Let V be the set of all complex√numbers of the form a + ib 5, where a, b are rational numbers and i = −1. It is easy to see that V is a vector space over Q. ′′
(a) Find the dimension of V over Q. (b) Define a product on V such that V is a field.
Solution:
(a) Let u = 1 + i0√5 and v = 0 + i√5. Then, a + ib√5 = au + bv, and so {1, i√5} spans V. In addition, u and v are linearly independent: suppose that v = tu, for some t ∈ Q. Then t = v/u = i√5, which is not a rational number. Hence, dimQ V = 2.
(b) Define √ √ √ (a + ib 5)(c + id 5) = (ac − 5bd) + i(ad + bc) 5. √ √ 5 If (a, b) ≠ 0, then (a + ib 5)−1 = a−ib . 2 a +5b2 3. Let A ∈ Fn×n be an invertible matrix. Show that (a) (A−1 )⊺ = (A⊺ )−1 . (b) A is symmetric if and only if A−1 is symmetric. Solution: (a) (A−1 )⊺ A⊺ = (AA−1 )⊺ = (In )⊺ = In . Then, (A−1 )⊺ = (A⊺ )−1 . (b) Assume that A is symmetric. Using part (a), we have (A−1 )⊺ = (A⊺ )−1 . On the other hand, A⊺ = A. Then, (A−1 )⊺ = A−1 . The converse is obtained by considering (A−1 )−1 = A and using the same argument. 4. If A ∈ Fm×n , show that the dimension of its row space equals the dimension of its column space. Solution: Use the row operations to find an equivalent echelon form matrix C. Using Theorem 1.3.10, we know that the dimension of the row space of A equals the number of nonzero rows of C. Note that each nonzero row of C has exactly one pivot and the different rows have pivots in different columns. Therefore, the number of pivot columns equals the number of nonzero rows. But by Theorem 1.3.10, the number of pivot columns of C equals the number of vectors in a basis for the column space of A. Thus, the dimension of the column space equals the number of nonzero rows of B. This means the dimensions of the row space and column space are the same. 5. Let A and B be n × n matrices over the field F. Prove that if I − AB is invertible, then I − BA is invertible and (I − BA)−1 = I + B(I − AB)−1 A. Solution: (I − BA)(I + B(I − AB)−1 A) = I − BA + B(I − AB)−1 A − BAB(I − AB)−1 A. After the identity matrix, we can factor B on the left and A on the right, and we get (I − BA)(I + B(I − AB)−1 A) = I − B[−I + (I − AB)−1 − AB(I − AB)−1 ]A = I − B[−I + (I − AB)(I − AB)−1 ]A = I + 0 = I.
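The identity of Worked-out Problem 5 is easy to test numerically. The following sketch is offered only as an illustration (it assumes numpy and uses random matrices, for which I − AB is invertible generically).

# Numerical sanity check of (I - BA)^{-1} = I + B (I - AB)^{-1} A (assumes numpy).
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
I = np.eye(n)

lhs = np.linalg.inv(I - B @ A)
rhs = I + B @ np.linalg.inv(I - A @ B) @ A
print(np.allclose(lhs, rhs))   # True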
1.4.4 Problems 1. Let A ∈ Fn×n be an invertible matrix. Show that (a) A is skew-symmetric if and only if A−1 is skew-symmetric. (b) A is lower triangular if and only if A−1 is lower triangular. (c) A is upper triangular if and only if A−1 is upper triangular. 2. Let A = [aij ] ∈ Fn×n with the following property: Its REF is an upper triangular matrix U with 1’s on the main diagonals, and there is no pivoting. That is, first a11 ≠ 0. Apply the Gauss elimination to obtain a (1) matrix A1 = [aij ], where the first column of A1 is e1 the first column (1)
of the identity. Let B1 = [aij ]ni=j=2 be the matrix obtained by deleting the first row and column of A1 . Then, B1 satisfies the same assump(1) tions as A, i.e., a22 ≠ 0. Now continue as above to obtain U . Show that A = LU , where L is a corresponding nonsingular lower triangular matrix. 3. Let A ∈ Fl×m , B ∈ Fm×n . Show rank AB ≤ min {rank A, rank B}. Furthermore, if A or B is invertible, then equality holds. Give an example where inequality holds strictly. (Hint: Use the fact that each column of AB is a combination of the columns of A, and then use Worked-out Problem 1.4.3-4.) 4. Prove Theorem 1.3.12. 5. Let V be a finite-dimensional vector space and U be a subspace of V. Prove that U has a complement in V. 6. Prove the following statements: (a) Row equivalence is an equivalence relation. (b) Row equivalent matrices have the same row space. (c) Let F be a field. Then the set of matrices of rank r, whose rows (viewed as column vectors) span a given subspace U ⊆ Fn of dimension r correspond to exactly one row equivalence class in Fm×n . (The set of subspaces of given dimension in a fixed vector space is called the Grassmannian. In part (c) of the above problem, we have constructed a bijection between the Grassmannian of r-dimensional subspaces of Fn , denoted Gr(r, Fn ), and the set of reduced row echelon matrices with n columns and r nonzero rows. We will study this notion in section 4.10.) 7. Let A ∈ Fm×n . Prove that a nonzero, nonpivot column of A is a linear combination of the pivot columns of A. 8. Let A ∈ Fm×n . Prove that there is at most one solution to Ax = B, for B ∈ Fm if and only if rank A = n. (Hint: Use the rank-nullity theorem, Worked-out Problem 1.6.2-2.)
9. Let A ∈ Fm×n . Prove that null A equals the number of nonpivot columns of A. 10. If E is an elementary matrix, prove that E ⊺ is an elementary matrix. 11. Let E/F be a field extension and A ∈ Fn×n . Prove that rank F A = rank E A. This statement is well known as the rank insensitivity lemma. 12. Let C(R) denote the set of real-valued continuous functions over R. Then C(R) is a real vector space with the usual addition and scalar multiplication of functions. A function f ∈ C(R) is even if f (−x) = f (x), for all x ∈ R, and f is odd if f (−x) = −f (x), for all x ∈ R. Let U and W be the sets of even and odd continuous functions on R, respectively. (a) Show that U and W are subspaces of C(R). (b) Show that C(R) = U ⊕ W. 13. Let A, B, C, D be in Fn×n AB ⊺ , CD⊺ be symmetric, and let AD⊺ −BC ⊺ = I. Prove that A⊺ D − C ⊺ B = I. 14. For matrices A, B, C of appropriate sizes, prove that rank (AB) + rank (BC) − rank B ≤ rank (ABC). 15. Let V be a vector space and W1 , W2 , and W3 be subspaces of V. If W1 ⊆ W3 , prove that W3 ∩ (W2 + W1 ) = W1 + (W2 ∩ W3 ). This result is known as the modular law for subspaces. 16. Let A, B ∈ Fn×n and B ⊺ A = 0. Prove that rank (A + B) = rank A + rank B. 17. Prove that for A ∈ Fm×n , rank A = 0 if and only if A is the zero matrix.
1.5 Permutations This section gives the basic facts about permutations that are needed to discuss determinants.
1.5.1 The permutation group Denote by Sn the group of the permutations of the set [n] onto itself. Indeed, Sn = S([n]). The smallest integer N > 0 such that for ω ∈ Sn , ω N = id, is called the order of ω. If a1 , . . . , ar ∈ [n] are distinct, then the symbol (a1 , . . . , ar ) denotes the permutation ω of [n] which sends a1 to a2 , sends a2 to a3 , . . . , sends ar−1 to ar , sends ar to a1 , and fixes all the other numbers in [n]. In other words, ω(a1 ) = a2 , ω(a2 ) = a3 , . . . , ω(ar ) = a1 , and ω(i) = i, if i ∈/ {a1 , . . . , ar }. Such a permutation is called cyclic. The number r is called the length of the cycle.
For example, the permutation of S4 that sends 1 to 3, 3 to 2, 2 to 4, and 4 to 1 is cyclic, while the permutation that sends 1 to 3, 3 to 1, 2 to 4, and 4 to 2 is not. Remark 1.5.1. The cyclic permutation ω given by (a1 , . . . , ar ) has order r. Note that ω(a1 ) = a2 , ω 2 (a1 ) = a3 , . . . , ω r−1 (a1 ) = ar , ω r (a1 ) = a1 , by definition of ω. Likewise, for any i = 1, . . . , r, we have ω r (ai ) = ai . Now, introduce the following polynomials: Pω (x) ∶=
∏_{1≤i<j≤n} (xω(i) − xω(j)),  x = (x1, . . . , xn), for each ω ∈ Sn.   (1.5.1)
Each Pω(x) is, up to sign, equal to Pid(x), and one sets sign(ω) := Pω(x)/Pid(x): if i < j and ω(i) > ω(j), then the factor −(xω(i) − xω(j)) appears in Pid(x). Hence, sign(ω) ∈ {1, −1}. Observe next that
Pω○σ(x)/Pid(x) = (Pω○σ(x)/Pσ(x)) ⋅ (Pσ(x)/Pid(x)) = sign(ω)sign(σ).
(To show the equality Pω○σ(x)/Pσ(x) = sign(ω), introduce new variables yi = xσ(i), for i ∈ [n].) ◻
An element τ ∈ Sn is called a transposition if there exists a pair of integers 1 ≤ i < j ≤ n such that τ (i) = j, τ (j) = i. Furthermore, τ (k) = k for k ≠ i, j. (A transposition is a cycle of length 2.) Note that τ ○ τ = id, i.e., τ −1 = τ . Theorem 1.5.3. For an integer n ≥ 2, each ω ∈ Sn is a product of transpositions: ω = τ1 ○ τ2 ○ ⋯ ○ τm .
(1.5.4)
The parity of m is unique. More precisely, sign(ω) = (−1)m . Proof. We agree that in (1.5.4), m = 0 if and only if ω = id. (This is true for any n ∈ N.) We prove the theorem by induction on n. For n = 2, S2 consists of two elements id and a unique permutation τ , which satisfies τ (1) = 2, τ (2) = 1. Thus, τ ○ τ = id. In this case, the lemma is straightforwardly. Suppose that the theorem holds for n = N ≥ 2 and assume that n = N +1. Let ω ∈ Sn . Suppose first that ω(n) = n. Then, ω can be identified with the bijection ω ′ ∈ Sn−1 , where ω ′ (i) = ω(i), for i = 1, . . . , n − 1. Use the induction hypothesis
to express ω′ as a product of m transpositions τ′1 ○ τ′2 ○ ⋯ ○ τ′m in Sn−1. Clearly, each τ′i extends to a transposition τi in Sn. Hence, (1.5.4) holds. Suppose now that ω(n) = i < n. Let τ be the transposition that interchanges i and n. Let ω′ = τ ○ ω. Then, ω′(n) = τ(ω(n)) = n. The previous arguments show that ω′ = τ1 ○ ⋯ ○ τl. Therefore, ω = τ ○ ω′ = τ ○ τ1 ○ ⋯ ○ τl. It is left to show that the parity of m is unique. First, observe that sign(τ) = −1. Using Problem 1.5.3-4 and Theorem 1.5.2, we obtain sign(ω) = (−1)m. Hence, the parity of m is fixed. ◻
Remark 1.5.4. Note that the product decomposition is not unique if n > 1. For example, for (1, 2, 3, 4, 5) we have the following product decompositions:
(5, 4)(5, 6)(2, 1)(2, 5)(2, 3)(1, 3)  (6 transpositions)
= (1, 2)(2, 3)(3, 4)(4, 5)  (4 transpositions)
= (1, 5)(1, 4)(1, 3)(1, 2)  (4 transpositions) = ⋯.
We finish this section with the definition of a permutation matrix. Definition 1.5.5. A permutation matrix is a square matrix obtained from the same-size identity matrix by a permutation of rows. A permutation matrix is called elementary if it is obtained by permutation of exactly two distinct rows. Clearly, every permutation matrix has exactly one 1 in each row and column. It is easy to show that every elementary permutation matrix is symmetric. Note that an elementary permutation matrix corresponds to a transposition in Sn . Theorem 1.5.3 implies that every permutation matrix is a product of elementary matrices. Note that a general permutation matrix is not symmetric. We denote by Pn the set of all permutation matrices. It is easy to see that #Pn = n!. Indeed, there is a one-to-one correspondence between Pn and Sn , the set of all permutations of [n], and every P = [pij ] ∈ Pn is associated to a permutation σ ∈ Sn for which σ(i) = j if pij = 1. In section 1.9, we will study a new family of matrices called doubly stochastic matrices, and we will see their relationship to permutation matrices.
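As a small illustration (plain Python, not part of the original text), the sign of a permutation can be computed by counting the pairs whose order ω changes, as in Problem 1.5.3-3, and the permutation matrix of Definition 1.5.5 can be built directly.

# sign(ω) via inversion counting and the associated permutation matrix (plain Python).
def sign(perm):
    """perm is a tuple (ω(1), ..., ω(n)) using the values 1..n."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return (-1) ** inversions

def permutation_matrix(perm):
    """Row i has a single 1 in column ω(i); all other entries are 0."""
    n = len(perm)
    return [[1 if perm[i] == j + 1 else 0 for j in range(n)] for i in range(n)]

omega = (2, 3, 1)                  # the cyclic permutation (1, 2, 3) in S3
print(sign(omega))                 # 1: two inversions, an even permutation
print(permutation_matrix(omega))   # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]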
1.5.2 Worked-out problems
1. In Sn, consider all elements ω such that sign(ω) = 1; denote it by An.
(a) Show that it is a subgroup of Sn.
(b) How many elements does it have?
(c) Show that S3 is not a commutative group.
Solution: (a) Since id ∈ An, An is a nonempty subset of Sn. Now, if σ and ω are two elements of An, then
sign(σω−1) = sign(σ)sign(ω−1) = sign(σ)sign(ω) = 1,
where the first equality uses (1) and the second uses (2):
(1) Use Theorem 1.5.3. (2) Use Problem 1.5.3-2. Then, σω −1 ∈ An , and this means An is a subgroup of Sn . Note that An is called the alternating group on n elements. (b) Let σ ∈ Sn be a transposition. Then, for any ω ∈ An , sign(σω) = sign(σ)sign(ω) = (−1)(1) = −1. It follows that σAn ∪ An = Sn . Since n = n! . #σAn = #An and An ∩ σAn = ∅, then #An = #S 2 2 (c) Take σ(1) = 2, σ(2) = 3, σ(3) = 1, ω(1) = 2, ω(2) = 1, and ω(3) = 3. We have (σω)(1) = σ(ω(1)) = σ(2) = 3, (ωσ)(1) = ω(σ(1)) = ω(2) = 1. Thus, σω ≠ ωσ and S3 is not commutative.
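The count in part (b) can also be confirmed by brute force for small n. The sketch below (plain Python, offered only as an illustration) enumerates Sn and counts the even permutations.

# Brute-force check that #A_n = n!/2 for small n (plain Python).
from itertools import permutations

def sign(perm):
    return (-1) ** sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                       if perm[i] > perm[j])

for n in range(2, 6):
    even = sum(1 for p in permutations(range(1, n + 1)) if sign(p) == 1)
    total = len(list(permutations(range(1, n + 1))))
    print(n, even, total, even == total // 2)   # #A_n = n!/2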
1.5.3 Problems Assume that ω and τ are two elements of Sn . Show that 1. sign(id) = 1. 2. sign(ω −1 ) = sign(ω). (Hint: Use the previous problem and the fact that sign(ωω −1 ) = sign(id).) 3. Assume that 1 ≤ i < j ≤ n. We say that ω changes the order of the pair (i, j) if ω(i) > ω(j). Let N (ω) be the number of pairs (i, j) such that ω changes their order. Then, sign(ω) = (−1)N (ω) . 4. Assume that 1 ≤ i < j ≤ n. Let τ be the transposition τ (i) = j, τ (j) = i. Then, N (τ ) = 2(j − i) − 1. Hence, sign(τ ) = −1 for any transposition τ . 5. Give an example of a permutation matrix which is not symmetric. 6. Let X be a set. Show that S(X ) is a group. 7. Show that if X has one or two elements, then S(X ) is a commutative group. 8. Show that if X has at least three elements, then S(X ) is not a commutative group. 9. Show that Z(Sn ) = {1}, for n ≥ 3. 10. Let An = {σ ∈ Sn ∶ sign(ω) = 1}. Show that (a) An ⊲ Sn , (b) Sn /An ≅Z2 , (c) #An =
n!/2.
Note that An is called the alternating group on [n]. See Worked-out Problem 1.5.2-1.
1.6 Linear, multilinear maps and functions (first encounter) When we study sets, the notion of a function is important. There is not much to say about an isolated set, while by functions we can link two different sets. Similarly, this can be the case in linear algebra, where we cannot say much about an isolated vector space. We have to link different vector spaces by functions. We often seek functions that preserve certain structures. For example, continuous functions preserve those points which are close to one another. In linear algebra, the functions which preserve the vector space structures are called linear operators (linear transformations). Roughly speaking, a linear operator is a function from one vector space to another that preserves the vector space operations. In what follows, we give its precise definition and related concepts. Let U and V be two vector spaces over a field F. A map T ∶ U → V is called linear or a linear operator (linear transformation) if T (au + bv) = aT (u) + bT (v), for all a, b ∈ F and u, v ∈ V. The set of linear maps from U to V is denoted by L(U, V). For the case U = V, we simply use the notation L(V). A linear operator T ∶ U → V is called a linear isomorphism if T is bijective. In this case, U is isomorphic to V and is denoted by U ≅ V. Also, the kernel and image of T are denoted by ker T and Im T , respectively, and defined as follows: ker T = {x ∈ U ∶ T (x) = 0}, Im T = {T (x) ∶ x ∈ U}. It is easy to show that ker T and Im T are subspaces of U and V, respectively. Furthermore, the nullity and the rank of T are denoted by null T and rank T and defined as follows: null T = dim(ker T ), rank T = dim(Im T ). k
Let U1, . . . , Uk be vector spaces (k ≥ 2). Denote by ×_{i=1}^{k} Ui := U1 × ⋯ × Uk the set of all tuples (u1, . . . , uk), where ui ∈ Ui, for i ∈ [k]. A map T ∶ ×_{i=1}^{k} Ui → V
is called multilinear or a multilinear operator if it is linear with respect to each variable ui , while all other variables are fixed, i.e., T (u1 , . . . , ui−1 , avi + bwi , ui+1 , . . . , uk ) = aT (u1 , . . . , ui−1 , vi , ui+1 , . . . , uk ) + bT (u1 , . . . , ui−1 , wi , ui+1 , . . . , uk ), for all a, b ∈ F, uj ∈ Uj , j ∈ [k] ∖ {i}, vi , wi ∈ Ui and i ∈ [k]. Note that if k = 2, then T is called bilinear. Suppose that U1 = ⋯ = Uk = U. Denote ×U ∶= U × ⋯ × U . k ´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶ k
A map T ∶ ×U → V (k copies of U) is called a symmetric map if
T (uσ(1) , . . . , uσ(k) ) = T (u1 , . . . , uk ),
for each permutation σ ∈ Sk and each u1, . . . , uk. A map T ∶ ×U → V (k copies of U) is called a
skew-symmetric map if T (uσ(1) , . . . , uσ(k) ) = sign(σ)T (u1 , . . . , uk ), for each permutation σ ∈ Sk . Since each permutation is a product of transpositions, and sign(τ ) = −1, for any transposition, a map T is skew-symmetric if and only if T (uτ (1) , . . . , uτ (k) ) = −T (u1 , . . . , uk ), for each transposition τ ∈ Sk .
(1.6.1)
In this book, most of the maps are linear or multilinear. If T ∈ L(V, F), then T is called a linear functional. Example 1.16. Consider the multilinear map T ∶ Fm × Fn → Fm×n given by T (x, y) = xy⊺ . Then, T (Fm ×Fn ) is the set of all matrices of ranks 0 and 1. Example 1.17. Consider T and Q ∶ Fm × Fm → Fm×m given by T (x, y) = xy⊺ + yx⊺ , Q(x, y) = xy⊺ − yx⊺ . Then, T and Q are a multilinear symmetric and a skew-symmetric map, respectively. Also, T (Fm ×Fm ) is the set of all symmetric matrices of ranks 0, 1, and 2.
Geometric operators Projection: Define T ∶ R3 → R2 by T (α, β, γ) = (α, β). Then T is a linear operator known as a projection because of its geometric interpretation. See Figure 1.8.
Figure 1.8 Rotation: Consider the function T ∶ R2 → R2 , and let T x be the vector obtained by rotating x through an angle θ counter-clockwise about the origin. It is easy to see geometrically that T (x + y) = T x + T y and T (cx) = cT x. This means that T is a linear operator. Considering the unit vectors, we have T (1, 0) = (cos θ, sin θ) and T (0, 1) = (− sin θ, cos θ). Therefore, T (α, β) = αT (1, 0) + βT (0, 1) = (α cos θ − β sin θ, α sin θ + β cos θ). See Figure 1.9.
Figure 1.9 Any rotation has the property that it is a length-preserving operator (known as an isometry in geometry). Reflection: Consider T ∶ R2 → R2 , and let T x be the vector obtained from reflecting x through a line through the origin that makes an angle θ2 with the x-axis. This is a linear operator and, considering the unit vectors, we have T (1, 0) = (cos θ, sin θ) and T (0, 1) = (sin θ, − cos θ). Thus, T (α, β) = αT (1, 0) + βT (0, 1) = (α cos θ + β sin θ, α sin θ − β cos θ). See Figure 1.10.
Figure 1.10
Examples of linear operators
1. If A ∈ Fm×n, then the map TA ∶ Fn → Fm, x ↦ Ax, is linear.
df 2. The derivative map D ∶ R[z] → R[z] given by D(f ) = dz is linear (Here, R[z] denotes all polynomials with variable z and real coefficients and f ∈ R[z].) 1
3. The integral map T ∶ R[z] → R[z] given by T (f ) = ∫0 f (z)dz is linear. Remark 1.6.1. Consider the linear operator given in Example 1 above. The function TA is called the linear operator associated to the matrix A. Notation 1.6.2. The vectors e1 = (1, 0, . . . , 0)⊺ , e2 = (0, 1, . . . , 0)⊺ , . . ., en = (0, . . . , 0, 1)⊺ in Fn will be called the standard basis vectors. Generally, all the components of ei ’s are 0 except for the ith component, which is 1. Any vector x = (x1 , . . . , xn )⊺ ∈ Fn may be written x = x1 e1 + x2 e2 + ⋯ + xn en . Theorem 1.6.3. Let T ∶ Fn → Fm be a linear operator. Then, T = TA for some A ∈ Fm×n . The jth column of A equals T (ej ), for j = 1, . . . , n. Proof. For any x = (x1 , . . . , xn )⊺ ∈ Fn , we have T (x) = T (∑ni=1 xi ei ) = n ∑i=1 xi T (ei ). Consequently, T is entirely determined by its value on the standard basis vectors, which we may write as T (e1 ) = (a11 , a21 , . . . , am1 )⊺ , . . . , T (en ) = (a1n , a2n , . . . , amn )⊺ . Combining this with the previous formula, we obtain ⎡ a1n ⎤ ⎡ a11 ⎤ ⎤ ⎢ ⎥ ⎡ n ⎢ ⎥ ⎢ a ⎥ ⎢ ∑j=1 aij xj ⎥ ⎢a ⎥ ⎥ ⎢ 2n ⎥ ⎢ ⎢ 21 ⎥ ⋮ ⎥. ⎥=⎢ ⎥ + ⋯ + xn ⎢ T (x) = x1 ⎢ ⎥ ⎢ ⋮ ⎥ ⎢ n ⎢ ⋮ ⎥ ⎢ ⎥ ⎢∑j=1 amj xj ⎥ ⎢ ⎥ ⎢amn ⎥ ⎣ ⎢am1 ⎥ ⎦ ⎣ ⎦ ⎣ ⎦ Then, T = TA for the matrix A = [aij ]m×n .
◻
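Theorem 1.6.3 suggests a simple recipe: to find the matrix of a linear operator, apply it to the standard basis vectors and take the results as columns. The sketch below is only an illustration (it assumes numpy and uses the rotation operator of the previous subsection).

# The matrix of a linear operator has T(e_j) as its j-th column (assumes numpy).
import numpy as np

theta = np.pi / 6

def T(x):
    # rotation of the plane through the angle theta, defined directly on coordinates
    return np.array([x[0] * np.cos(theta) - x[1] * np.sin(theta),
                     x[0] * np.sin(theta) + x[1] * np.cos(theta)])

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
A = np.column_stack([T(e1), T(e2)])   # columns are T(e1), T(e2)

x = np.array([2.0, -1.0])
print(np.allclose(A @ x, T(x)))       # True: T = T_A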
Corollary 1.6.4. There is a one-to-one correspondence between linear operators T ∶ Fn → Fm and m × n matrices, i.e., Fm×n . Definition 1.6.5. An inner product space is a vector space V over the field F (F = R or F = C), together with an inner product, i.e., with a map ⟨⋅, ⋅⟩ ∶ V×V → F that satisfies the following three axioms for all vectors x, y, z ∈ V and all scalars a ∈ F: (i) Conjugate symmetry: ⟨x, y⟩ = ⟨y, x⟩. (See Appendix E for the above notation.) (ii) Linearity in the first component: ⟨ax, y⟩ = a⟨x, y⟩, ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩. (iii) Positive-definiteness: ⟨x, x⟩ ≥ 0, ⟨x, x⟩ = 0 ⇒ x = 0. We will discuss this map in more detail in Chapter 4.
Definition 1.6.6. For vectors u = (u1 , u2 , u3 )⊺ and v = (v1 , v2 , v3 )⊺ in R3 , the cross product is defined by u × v = (u2 v3 − u3 v2 , −u1 v3 + u3 v1 , u1 v2 − u2 v1 )⊺ . Example 1.18. Assume that U = V = R3 . Let T (x, y) = x × y be the cross product in R3 . Then, T ∶ R3 × R3 → R3 is a bilinear skew-symmetric map. Example 1.19. If a vector space V over the real numbers carries an inner product, then the inner product is a bilinear map V × V → R. Example 1.20. Matrix multiplication is a bilinear map, Fm×n ×Fn×p → Fm×p . Remark 1.6.7. Note that in the definition of a linear operator, the background fields of the vector spaces of the domain and codomain must be the same. For example, consider the identity linear operator id ∶ V x
→ V, x ↦ x.
If the background fields of the vector space V as domain and codomain are not the same, then the identity function is not a linear operator necessarily. Consider the identity map from C to C, where C as a domain is vector space over R and C as a codomain is a vector space over Q. If id is a linear operator, then it is a linear isomorphism. This contradicts the case dimR C = 2 < dimQ C.
1.6.1 Dual spaces and annihilators Let V be a vector space over the field F. A linear operator f ∈ L(V, F) is called a linear functional on V. Also, V′ = L(V, F), the space of the linear functionals on V, is called the dual space. For S ⊆ V we define the annihilator of S as S ⊥ = {f ∈ V′ ∶ f (v) = 0, for all v ∈ S}. Example 1.21. The map f ∶ F[z] → F defined by f (p(z)) = p(0) is a linear functional, known as evaluation at 0. Theorem 1.6.8. Let V be a vector space and S ⊆ V. (i) S ⊥ is a subspace of V′ (although S does not have to be a subspace of V). (ii) S ⊥ = (span S)⊥ . (iii) Assume that V is finite-dimensional with a basis {v1 , . . . , vn }. Let i ∈ [n] and Vi = ⟨v1 , . . . , vi−1 , vi+1 , . . . , vn ⟩. Then dim Vi⊥ = 1. Assume that fi ∈ Vi⊥ satisfies fi (vi ) = 1. Then {fi } is a basis in Vi⊺ . Moreover, {f1 , . . . , fn } is a basis in V′ . (It is called the dual basis of {v1 , . . . , vn }.) Furthermore, ⊥ {fi+1 , . . . , fn } is a basis of span {v1 , . . . , vi } .
Proof. (i) First we show that S ⊥ is a subspace of V′ . Note that the zero functional obviously sends every vector in S to zero, so 0 ∈ S ⊥ . If c, d ∈ F and f1 , f2 ∈ S ⊥ , then for each v ∈ S, (cf1 + df2 )(v) = cf1 (v) + df2 (v) = 0. Then, cf1 + df2 ∈ S ⊥ and S ⊥ is a subspace of V′ . (ii) We show that S ⊥ = (span S)⊥ . Take f ∈ S ⊥ . Then if v ∈ span S, we can write v = c1 v1 + ⋯ + cm vm , for scalars ci ∈ F and vi ∈ S. Thus, f (v) = c1 f (v1 ) + ⋯ + cm f (vm ) = 0, so f ∈ (span S)⊥ . On the other hand, if f ∈ (span S)⊥ then clearly f (v) = 0, for all v ∈ S. This completes the proof of (ii). (iii) Let fi ∈ V′ be given by the equalities fi (vj ) = δij for j = 1, . . . , n. Clearly, fi ∈ Vi⊥ . Assume that f ∈ Vi⊥ . Let f (vi ) = a. Consider g = f − afi . Thus, g ∈ Vi⊥ , g(vi ) = f (vi ) − afi (vi ) = 0, and g ∈ {v1 , . . . , vn }⊥ . Hence g = 0 and f = afi . Thus {fi } is a basis in Vi⊥ . Observe next that f1 , . . . , fn are linearly independent. Assume that ∑nj=1 aj fj = 0. Then 0 = (∑nj=1 aj fj )(vi ) = ai fi (vi ) = ai for i = 1, . . . , n. We now show that f1 , . . . , fn is a basis in V′ . Let f ∈ V′ and assume that f (vi ) = ai for i = 1, . . . , n. Then g = f − ∑ni=1 ai fi ∈ {v1 , . . . , vn }⊥ . Hence g = 0 and f = ∑ni=1 f (vi )fi . Assume that f ∈ Vi′ . Let f = ∑nj=1 aj fj . Assume that k ∈ [i]. Hence 0 = f (vk ) = ak and f = ∑nj=i+1 aj fj . Conversely, suppose that f = ∑nj=i+1 aj fj . Then f ∈ Vi′ . Hence {fi+1 , . . . , fn } is a basis of Vi⊥ . ◻
1.6.2 Worked-out problems 1. Let V and W be finite-dimensional vector spaces over the field F. Show that V ≅ W if and only if dim V = dim W. Solution: Assume that T ∶ V → W is a linear isomorphism and {v1 , . . . , vn } is a basis for V. We claim that {T (v1 ), . . . , T (vn )} is a basis for W. First, we verify that T (vi )’s are linearly independent. Suppose that ∑ni=1 ci T (vi ) = 0, for some ci ∈ F. Since T is linear, then T (∑ni=1 ci vi ) = 0 = T (0) and, as T is injective, so ∑ni=1 ci vi = 0. Since the vi ’s are linearly independent, thus ci = 0, 1 ≤ i ≤ n. This verifies that T (v1 ), . . . , T (vn ) are linearly independent. Next, we show that the T (vi )’s span W. Choose y ∈ W, since T is surjective, one can find x ∈ V, for which T (x) = y. Now, if x = ∑ni=1 ci vi , for some ci ∈ F, then y = ∑ni=1 ci T (vi ). Thus, dim W = n. Conversely, assume that dim V = dim W and {v1 , . . . , vn } and {w1 , . . . , wn } are bases for V and W, respectively. Then, the map vi ↦ wi is a linear isomorphism from V to W.
2. Prove the rank-nullity theorem: For any A ∈ Fm×n we have rank A + null A = n. Solution: If rank A = 0, then by Theorem 1.3.12, the only solution to Ax = 0 is the trivial solution x = 0. Hence, in this case, null A = 0 and the statement holds. Now suppose that rank A = r < n. In this case, there are n − r > 0 free variables in the solution to Ax = 0. Let t1 , . . . , tn−r denote these free variables (chosen as those variables not attached to a leading one in any REF of A), and let x1 , . . . , xn−r denote the solution obtained by sequentially setting each free variable to 1 and the remaining free variables to 0. Note that {x1 , . . . , xn−1 } is linearly independent. Moreover, every solution to Ax = 0 is a linear combination of x1 , . . . , xn−r which shows that {x1 , . . . , xn−r } spans N (A). Then, {x1 , . . . , xn−r } is a basis for N (A) and null A = n − r. The above result says that there is a complementary relation between the rank and the nullity: for matrices of a given size, the larger the rank, the smaller the nullity, and vice versa.
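A quick numerical illustration of the rank-nullity theorem (assuming sympy; the matrix is made up for the example, not taken from the text):

# rank A + null A = n for an m x n matrix A (assumes sympy).
import sympy as sp

A = sp.Matrix([[1, 2, 0, -1],
               [2, 4, 1,  1],
               [3, 6, 1,  0]])   # a 3 x 4 matrix of rank 2

rank_A = A.rank()
null_A = len(A.nullspace())      # nullity = number of basis vectors of N(A)
print(rank_A, null_A, rank_A + null_A == A.cols)   # 2 2 True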
1.6.3 Problems 1. If U, V, and W are vector spaces over the field F, show that (a) if T1 ∈ L(U, V) and T2 ∈ L(V, W), then T2 ○ T1 ∈ L(U, W). (b) null T2 ○ T1 ≤ null T1 + null T2 . (Hint: Use Worked-out Problem 1.6.2-2.) 2. Assume that V is a vector space over the field F. Let W be a subspace of V. Show that there exists a T1 ∈ L(V, F) such that ker T1 = W. Show also that there is a T2 ∈ L(V, F) such that ImT2 = W. 3. If U and V are vector spaces over the field F, show that L(U, V) is a vector space over F, too. cos θ − sin θ 4. Consider the matrices A = [ ] ∈ R2×2 and B = [eiθ ] ∈ C1×1 = C. sin θ cos θ Show that both TA and TB are counterclockwise rotation through angle θ. (Then, A and B represent the same motion.) 5. Let A ∈ Fn×n and define TA ∶ Fn x
→ Fn, x ↦ Ax.
Show that the following statements are equivalent: (a) A is invertible; (b) TA is bijective; (c) TA is surjective; (d) TA is injective.
41
6. Let V be a vector space. Show that there is no isomorphism V → V′ that maps every basis to its dual. 7. Let U and W be two finite-dimensional vector spaces over a field F and T ∈ L(U, W). Show that if T is surjective, then dim U ≥ dim W. 8. Is there any linear operator T ∶ R2 → R3 that is surjective? (Hint: Use the previous problem.) 9. Let V be a finite dimensional vector space and U be a subspace of V. Suppose W1 and W2 are complements to U in V. Prove that W1 and W2 are isomorphic. 10. Prove that as vector spaces over Q, Rn ≅ R, for all n ∈ N.
1.7 Definition and properties of the determinant Given a square matrix A, it is important to be able to determine whether or not A is invertible, and if it is invertible, how to compute A−1 . This arises, for example, when trying to solve the nonhomogeneous equation Ax = y, or trying to find eigenvalues by determining whether or not A − λI is invertible. (See section 2.1 to find the definition and details on eigenvalues.)
1.7.1 Geometric interpretation of determinant (first encounter) Let A be an n × n real matrix. As we mentioned already, we can view A as a linear transformation from Rn to Rn given by x → Ax. If A is a 2 × 2 matrix with column vectors a and b, then the linearity means that A transforms the unit square in R2 into the parallelogram in R2 determined by a and b. (See Figure 1.11.)
Figure 1.11. Effect of the matrix A = [a∣b] on the unit square. Similarly, in the 3×3 case, A maps the unit cube in R3 into the parallelepiped (or solid parallelogram) in R3 determined by the column vectors of A. In general, an n × n matrix A maps the unit n-cube in Rn into the n-dimensional parallelogram determined by the column vectors of A. Other squares (or cubes, or hypercubes, etc.) are transformed in much the same way, and scaling the sides of the squares merely scales the sides of the
42
Chapter 1. Basic facts on vector spaces, matrices, and linear transformations
parallelograms (or parallelepipeds, or higher-dimensional parallelograms) by the same amount. In particular, the magnification factor area (or volume) of image region area (or volume) of original region is always the same, no matter which squares (or cubes, or hypercubes) we start with. Indeed, since we can calculate the areas (or volumes) of reasonably nice regions by covering them with little squares (or cubes) and taking limits, the above ratio will still be the same for these regions, too. Definition 1.7.1. The absolute value of the determinant of the matrix A is the above magnification factor. For example, since the unit square has area 1, the determinant of a 2×2 matrix A is the area of the parallelogram determined by the columns of A. Similarly, the determinant of a 3 × 3 matrix A is the volume of the parallelepiped determined by columns of A. See section 5.6 for more results on the geometric approach to the determinant. Moreover, for more results see [13]. See also section 5.7. In what follows, we give an explicit formula to calculate the determinant of a matrix over an arbitrary field.
1.7.2 Explicit formula of the determinant and its properties We define the determinant over an arbitrary field in the standard way. Definition 1.7.2. For A ∈ Fn×n , we define the determinant of A by the formula det A = ∑ sign(ω)a1ω(1) a2ω(2) . . . anω(n) ,
A = [aij ]∈ Fn×n .
(1.7.1)
ω∈Sn
In what follows, we verify that the formulas given in Definitions 1.7.1 and 1.7.2 are the same for 2×2 real matrices. This can be generalized for real matrices a12 of any size. We assume that A = [aa11 a22 ]. We are going to calculate the area 21 of the parallelogram determined by a and b. We have a = A(1, 0)⊺ = (a11 , a21 )⊺ ,
b = A(0, 1)⊺ = (a12 , a22 )⊺ .
The equation of the line passing through the origin and a is this line by L; then we have
a21 x−y a11
= 0. Denote
a12 − a22 ∣ ∣ a21 a11 d(b, L) = √ , 2 a21 ( a11 ) + 1 √ d(a, 0) = a211 + a221 .
Here, d denotes the distance of two points or a point and a line. Then, the area of the image region in Figure 1.11 is equal to √
a211
+ a221
a12 ∣ a21 − a22 ∣ a11 ⋅√ = ∣a11 a22 − a21 a12 ∣ . 2 a21 ( a11 ) + 1
1.7. Definition and properties of the determinant
43
As the area of the original region (unit square) in Figure 1.11 is 1, then the magnification factor (∣ det A∣) is ∣a11 a22 − a21 a21 ∣. Clearly, we obtain the same value for ∣ det A∣ by formula (1.7.1). Note that here we used the distance formula between a line L = ax+by +c and 0 +c∣ . a point p = (x0 , y0 ) which is denoted by d(p, L) and given as d(p, L) = ∣ax√0 +by a2 +b2 In the following theorem, we study the main properties of the determinant function. Theorem 1.7.3. Assume that the characteristic of F is different from 2, i.e., 2 ≠ 0 in F. Then, the determinant function det ∶ Fn×n → F is the unique skewsymmetric multilinear function in the rows of A satisfying the normalization condition: det In = 1. Furthermore, it satisfies the following properties: 1. det A = det A⊺ . 2. det A is a multilinear function in rows or columns of A. 3. The determinant of a lower triangular or an upper triangular matrix is a product of the diagonal entries of A. 4. Let B be obtained from A by permuting two rows or columns of A. Then det B = − det A. This property of the determinant function is known as the alternating property. 5. If A has two equal rows or columns, then det A = 0. 6. A is invertible if and only if det A ≠ 0. 7. Let A, B ∈ Fn×n . Then, det AB = det A det B. 8. (Laplace row and column expansion for determinants) For i, j ∈ [n] denote by A(i, j) ∈ F(n−1)×(n−1) the matrix obtained from A by deleting its ith row and jth column. Then n
n
j=1
i=1
det A = ∑ aij (−1)i+j det A(i, j) = ∑ aji (−1)i+j det A(i, j),
(1.7.2)
for i = 1, . . . , n. Remark 1.7.4. Note that the Laplace expansion shows how to reduce the evaluation of the determinant of an n × n matrix. Proof of Theorem 1.7.3. First, observe that if the determinant function exists, then it must be defined by (1.7.1). Indeed, n
ri = ∑ aij e⊺j , j=1
ej = (δj1 , . . . , δjn )⊺ .
44
Chapter 1. Basic facts on vector spaces, matrices, and linear transformations
Let T ∶ ⊕Fn → F be a multilinear skew-symmetric function. Use multilinearity n
of T to deduce T (r1 , . . . , rn ) =
∑
i1 ,...,in ∈[n]
a1i1 . . . anin T (ei1 , . . . , ein ).
Since T is skew-symmetric, T (ei1 , . . . , ein ) = 0 if ip = iq , for some 1 ≤ p < q ≤ n. Indeed, if we interchange eip = eiq , then we do not change the value of T . On the other hand, since T is skew-symmetric, the value of T is equal to −T , when we interchange eip = eiq . As the characteristic of F is not 2, we deduce that T (ei1 , . . . , ein ) = 0, if ip = iq . Hence, in the above expansion of T (r1 , . . . , rn ), the only nonzero terms are where {i1 , . . . , in } = [n]. Each such set of n indices {i1 , . . . , in } corresponds uniquely to a permutation ω ∈ Sn , where ω(j) = ij , for j ∈ [n]. Now {e1 , . . . , en } can be brought to {eσ(1) , . . . , eσ(n) } by using transpositions. The composition of these transpositions yields σ. Since T is skew-symmetric, it follows that T (eσ(1) , . . . , eσ(n) ) = sign(ω)T (e1 , . . . , en ). As T (In ) = 1, we deduce that T (eσ(1) , . . . , eσ(n) ) = sign(ω). 1. In (1.7.1) note that if j = ω(i), then i = ω −1 (j). Since ω is a bijection, we deduce that when i = 1, . . . , n, then j takes each value in [n]. Since sign(ω) = sign(ω −1 ), we obtain det A = ∑ sign(ω)aω−1 (1)1 aω−1 (2)2 . . . aω−1 (n)n ω∈Sn
−1
= ∑ sign(ω )aω−1 (1)1 aω−1 (2)2 . . . aω−1 (n)n = det A⊺ . ω∈Sn
(Note that since Sn is a group, when ω varies over Sn , so does ω −1 .) 2. Fix all rows of A except the row i. From (1.7.1) it follows that det A is a linear function in the row i of A, i.e., det A is a multilinear function in the rows of A. In view of the identity det A = det A⊺ , we deduce that det A is a multilinear function of the columns of A. 3. Assume that A is upper triangular. Then, anω(n) = 0 if ω(n) ≠ n. Therefore, all nonzero terms in (1.7.1) are zero unless ω(n) = n. Thus, assume that ω(n) = n. Then, a(n−1)ω(n−1) = 0 unless ω(n − 1) = n − 1. Hence, all nonzero terms in (1.7.1) must come from all ω satisfying ω(n) = n, ω(n − 1) = n − 1. Continuing in the same manner, we deduce that the only nonzero term in (1.7.1) comes from ω = id. As sign(id) = 1, it follows that det A = ∏ni=1 aii . Since det A = det A⊺ , we deduce the claim for lower triangular matrices. 4. Note that a permutation of two elements 1 ≤ i < j ≤ n in [n] is achieved by a transposition τ ∈ Sn . Therefore, B = [bpq ] and bpq = aτ (p)q . As in the proof of 1 we let τ (i) = j; then j = τ −1 (i) = τ (i). Hence, det B = ∑ sign(ω)aτ (1)ω(1) aτ (2)ω(2) . . . aτ (n)ω(n) ω∈Sn
= ∑ sign(ω)a1ω(τ (1)) a2ω(τ (2)) . . . anω(τ (n)) ω∈Sn
= ∑ −sign(ω ○ τ )a1(ω○τ )(1) a2(ω○τ )(2) . . . an(ω○τ )(n) = − det A. ω∈Sn
1.7. Definition and properties of the determinant
45
5. Suppose A has two identical rows. Interchange these two rows to obtain B = A. Then, det A = det B = − det A, where the last equality is established in 4. Thus, 2 det A = 0 and this means det A = 0 as char F ≠ 2. 6. Use 2, 3, and 5 to deduce that det EA = det E det A if E is an elementary matrix. (Note that det E ≠ 0.) Hence, if E1 , . . . , Ek are elementary matrices, we deduce that det(Ek . . . E1 ) = ∏ki=1 det Ei . Let B be the RREF of A. Therefore, B = Ek Ek−1 . . . E1 A. Hence, det B = (∏ni=1 det Ei ) det A. If In ≠ B, then the last row of B is zero, so det B = 0, which implies that det A = 0. If B = In , then det A = ∏ni=1 (det Ei )−1 . 7. Assume that either det A = 0 or det B = 0. We claim that (AB)x = 0 has a nontrivial solution. Suppose det B = 0. Using 6 and Theorem 1.3.8, we conclude that the equation Bx = 0 has a nontrivial solution which satisfies ABx = 0. Suppose that B is invertible and Ay = 0, for some y ≠ 0. Then, AB(B −1 y) = 0 which implies that det AB = 0. Hence, in these cases det AB = 0 = det A det B. Suppose that A and B are invertible. Then, each of them is a product of elementary matrices. Use the arguments in the proof of 6 to deduce that det AB = det A det B. 8. First, we prove the first part of (1.7.2) for i = n. Clearly, (1.7.1) yields the equality n
det A = ∑ anj j=1
∑
sign(ω)a1ω(1) . . . a(n−1)ω(n−1) .
(1.7.3)
ω∈Sn ,ω(n)=j
In the above sum, let j = n. Hence, ω(n) = n, and then ω can be viewed as ω ′ ∈ Sn−1 . Also sign(ω) = sign(ω ′ ). Hence, ∑ω′ ∈Sn−1 sign(ω)a1ω′ (1) . . . a(n−1)ω′ (n−1) = det A(n, n). Note that (−1)n+n = 1. This justifies the form of the last term of Expansion (1.7.2) for i = n. To justify the sign of any term in 1.7.2 for i = n, we take the column j and interchange it first with column j + 1, then with column j + 2, and at last with column n. The sign of det A(n, j) is (−1)n+j . This proves the case i = n. By interchanging any row i < n with row i + 1, row i + 2, and finally with row n, we deduce the first part of (1.7.2), for any i. By considering det A⊺ , we deduce the second part of (1.7.2). ◻
1.7.3 Elementary row operations and determinants Invoking Theorem 1.7.3, the determinant is alternating and multilinear in rows and columns. Let ⎡A ⎤ ⎢ 1⎥ ⎢ ⎥ A = ⎢ ⋮ ⎥ ∈ Fn×n , ⎢ ⎥ ⎢An ⎥ ⎣ ⎦ where the Ai ’s are its rows. We can conclude some results concerning the effects of ERO on determinants. The first type of ERO is multiplying row i by a nonzero a ∈ F. Multilinearity of the determinant implies ⎡ A1 ⎤ ⎢ ⎥ ⎢ ⋮ ⎥ ⎢ ⎥ ⎢ ⎥ det ⎢aAi ⎥ = a det A. ⎢ ⎥ ⎢ ⋮ ⎥ ⎢ ⎥ ⎢ An ⎥ ⎣ ⎦
Namely, multiplying one row by a scalar multiplies the determinant by that scalar. Multiplying the whole matrix by a given scalar yields the following result: det(aA) = a^n det A. The second type of ERO interchanges two distinct rows i and j. The alternating property of the determinant implies
det(A1, . . . , Aj, . . . , Ai, . . . , An) = − det(A1, . . . , Ai, . . . , Aj, . . . , An).
The third type of ERO adds to row j the row i multiplied by a. Using multilinearity and the alternating property once more,
det(A1, . . . , Ai, . . . , aAi + Aj, . . . , An) = a det(A1, . . . , Ai, . . . , Ai, . . . , An) + det A = 0 + det A = det A.
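The three effects above are easy to confirm numerically. The following sketch is our own illustration (the random test matrix, the scalar a, and the use of NumPy are assumptions, not part of the text); it applies each type of ERO to a small matrix and compares determinants.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)
a, i, j = 2.5, 1, 3

B1 = A.copy(); B1[i] *= a                 # type 1: scale row i by a
B2 = A.copy(); B2[[i, j]] = B2[[j, i]]    # type 2: swap rows i and j
B3 = A.copy(); B3[j] += a * B3[i]         # type 3: add a*(row i) to row j

d = np.linalg.det(A)
print(np.isclose(np.linalg.det(B1), a * d))       # True: det is scaled by a
print(np.isclose(np.linalg.det(B2), -d))          # True: sign flips
print(np.isclose(np.linalg.det(B3), d))           # True: det is unchanged
print(np.isclose(np.linalg.det(a * A), a**4 * d)) # True: det(aA) = a^n det A
```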
1.7.4 Matrix inverse
Let A be an invertible matrix. Observe that det In = 1. Hence,
1 = det In = det(AA^{−1}) = det A det A^{−1} ⇒ det A^{−1} = 1/det A.
For A = [aij ] ∈ Fn×n denote by Aij the determinant of the matrix obtained from A by deleting the ith row and jth column multiplied by (−1)i+j . (This is called the (i, j) cofactor of A.) Aij ∶= (−1)i+j det A(i, j).
(1.7.4)
Then, the expansion of det A by the row i and the column j, respectively, is given by the equalities
det A = ∑_{j=1}^{n} aij Aij = ∑_{i=1}^{n} aij Aij.   (1.7.5)
Then, the adjoint matrix of A is defined as follows:
adj A := [A11 A21 ⋯ An1; A12 A22 ⋯ An2; ⋮ ; A1n A2n ⋯ Ann],   (1.7.6)
that is, the (i, j) entry of adj A is the cofactor Aji.
Definition 1.7.5. An m × m matrix B is called an m × m principal submatrix of an n × n matrix A if B is obtained from A by removing n − m rows and the same n − m columns. Let B be a square submatrix of A. Then, det B is called a minor of A. Moreover, det B is called a principal minor of order m if B is an m × m principal submatrix of A. Proposition 1.7.6. Let A ∈ Fn×n . Then A(adj A) = (adj A)A = (det A)In .
(1.7.7)
Hence, A is invertible if and only if det A ≠ 0. Furthermore, A^{−1} = (det A)^{−1} adj A.
Proof. Consider the (i, k) entry of A(adj A). It is given as ∑_{j=1}^{n} aij Akj. For i = k, (1.7.5) yields that ∑_{j=1}^{n} aij Aij = det A. Suppose that i ≠ k. Let Bk be the matrix obtained from A by replacing the row k of A by the row i. Then, Bk has two identical rows, and hence det Bk = 0. On the other hand, expand Bk by the row k to obtain 0 = det Bk = ∑_{j=1}^{n} aij Akj. This shows that A(adj A) = (det A)In. Similarly, one shows that (adj A)A = (det A)In. Using Theorem 1.7.3, part 6, we see that A is invertible if and only if det A ≠ 0. Hence, for invertible A, A^{−1} = (1/det A) adj A. ◻
One popular application of the explicit formula for A^{−1} is Cramer's rule for solving a nonsingular system of n linear equations in n unknowns. Consider the system Ax = b, where A ∈ Fn×n has nonzero determinant.
Proposition 1.7.7. (Cramer's rule.) Let A ∈ GL(n, F) and consider the system Ax = b, where x = (x1, . . . , xn)⊺. Denote by Bk the matrix obtained from A by replacing the column k of A by b. Then, xk = det Bk / det A, for k = 1, . . . , n.
Proof. Clearly, x = A^{−1}b = (1/det A)(adj A)b. Hence, xk = (det A)^{−1} ∑_{j=1}^{n} Ajk bj, where b = (b1, . . . , bn)⊺. Expand Bk by the column k to deduce that det Bk = ∑_{j=1}^{n} bj Ajk. ◻
Example 1.22. We solve the following system of equations by Cramer's rule:
2x1 + x2 + x3 = 3,
x1 − x2 − x3 = 0,
x1 + 2x2 + x3 = 0.
We rewrite this system in the coefficient matrix form:
[2 1 1; 1 −1 −1; 1 2 1] [x1; x2; x3] = [3; 0; 0].
Assume that D denotes the determinant of the coefficient matrix:
D = det [2 1 1; 1 −1 −1; 1 2 1].
Denote by D1 the determinant of the matrix obtained from the coefficient matrix by replacing the first column with the right-hand side column of the given system:
D1 = det [3 1 1; 0 −1 −1; 0 2 1].
Similarly, D2 and D3 would then be
D2 = det [2 3 1; 1 0 −1; 1 0 1]  and  D3 = det [2 1 3; 1 −1 0; 1 2 0].
Evaluating each determinant, we get D = 3, D1 = 3, D2 = −6, and D3 = 9. Cramer's rule says that x1 = D1/D = 3/3 = 1, x2 = D2/D = −6/3 = −2, and x3 = D3/D = 9/3 = 3. Note that the point of Cramer's rule is that we don't have to solve the whole system to get the one value we need. For large matrices, Cramer's rule does not provide a practical method for computing the inverse, because it involves n² determinants, and the computation via the Gauss–Jordan algorithm given in subsection 1.3.4 is significantly faster. Nevertheless, Cramer's rule is of considerable theoretical importance.
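As a quick check of Example 1.22, the following sketch is our own illustration of Proposition 1.7.7 (the use of NumPy and floating-point determinants, rather than exact arithmetic, is an assumption on our part).

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, -1.0, -1.0],
              [1.0, 2.0, 1.0]])
b = np.array([3.0, 0.0, 0.0])

D = np.linalg.det(A)
x = np.empty(3)
for k in range(3):
    Bk = A.copy()
    Bk[:, k] = b                      # replace column k by the right-hand side
    x[k] = np.linalg.det(Bk) / D      # Cramer's rule: x_k = det B_k / det A

print(np.round(x, 10))                # [ 1. -2.  3.]
print(np.allclose(A @ x, b))          # True
```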
1.7.5 Worked-out problems
1. If A ∈ Fn×n, show that the rank of adj A is n, 1, or 0.
Solution: Since A is an n × n matrix, A(adj A) = (det A)In, and taking determinants gives (det A)(det(adj A)) = (det A)^n. Assume that A is a full-rank matrix. Then det A ≠ 0, and the identity above gives det(adj A) = (det A)^{n−1} ≠ 0. Therefore, the matrix adj A is also invertible and of rank n. (Why?) If the rank of A is n − 1, then at least one minor of order n − 1 of A is nonzero and this means adj A is nonzero, and so the rank of adj A is greater than zero. Since rank A = n − 1, then det A = 0 and so A(adj A) = 0. This tells us rank A(adj A) = 0. Now, by Problem 1.6.3-2, we conclude that 0 ≥ rank A + rank adj A − n, or n − 1 + rank adj A ≤ n, or rank adj A ≤ 1. But we showed that rank adj A > 0. Then, rank adj A = 1. Finally, if rank A < n − 1, then all minors of order n − 1 of A are zero. Thus, adj A = 0 and rank adj A = 0.
2. Assume that the matrix A ∈ F(n+1)×(n+1) is given as follows:
A = [1 x0 x0^2 ⋯ x0^n; 1 x1 x1^2 ⋯ x1^n; ⋮ ; 1 xn xn^2 ⋯ xn^n].
Show that
det A = ∏_{i>j} (xi − xj).   (1.7.8)
The determinant of A is called the Vandermonde determinant of order n + 1.
Solution: We prove by induction. First, we can verify that the result holds in the 2 × 2 case. Indeed,
det [1 x0; 1 x1] = x1 − x0.
We now assume the result for n − 1 and consider n. We note that the index n corresponds to a matrix of order (n + 1) × (n + 1), and hence our induction hypothesis is that the claim (1.7.8) holds for any Vandermonde determinant of order n × n. We subtract the first row from all other rows and expand the determinant along the first column:
det [1 x0 ⋯ x0^n; 1 x1 ⋯ x1^n; ⋮ ; 1 xn ⋯ xn^n]
  = det [1 x0 ⋯ x0^n; 0 x1 − x0 ⋯ x1^n − x0^n; ⋮ ; 0 xn − x0 ⋯ xn^n − x0^n]
  = det [x1 − x0 ⋯ x1^n − x0^n; ⋮ ; xn − x0 ⋯ xn^n − x0^n].
For every row k in the last determinant, we factor out a term xk − x0 to get
RRR x1 − x0 RRR RRRR ⋮ RRRxn − x0
⋯ ⋯
RRR RRR1 RRR n n RR RRR x1 − x0 RR n R1 R R ⋮ RRR = ∏ (xk − x0 ) RRRR RRR ⋮ xnn − xn0 RRRR k=1 RRR RRR RRR1 R
x1 + x0
⋯
x2 + x0
⋯
⋮ x1 + x0
⋯
n RR xi0 RRRR ∑ xn−1−i 1 RRR i=0 n n−1−i i RRR x0 RR ∑ x2 RRR . i=0 RRR ⋮ RRR n n−1−i i RRR x x ∑ n 0 RR RR i=0
Here we use the expansion n−3 2 n−1 xnk − xn0 = (xk − x0 )(xn−1 + xn−2 k k x0 + xk x0 + ⋯ + x0 ).
For example, the first row of the last determinant is 1
x1 + x0
x21 + x1 x0 + x20
x31 + x21 x0 + x1 x20 + x30
⋯.
Now for every column l, starting from the second one, subtracting the sum of xi0 times column i, we end up with RRR1 x1 RRR R1 x2 ∏ (xk − x0 ) RRRR ⋮ RRR ⋮ k=1 RRR1 x n R n
⋯ ⋯ ⋯
R xn−1 1 RRR n−1 RR x2 RRR . ⋮ RRRR n−1 RRR xn RR
(1.7.9)
The next step, using the first row as the example, shows that (x21 +x21 x0 +x1 x20 +x30 )−(x21 +x1 x0 +x20 )x0 = x31 , (x21 +x1 x0 +x20 )−(x1 +x0 )x0 = x21 , (x1 + x0 ) − 1 × x0 = x1 . Now that on the right-hand side of (1.7.9) we have a Vandermonde determinant of dimension n × n, so we can use the induction to conclude with the desired result. Note that the matrix given in this problem is well known as a Vandermonde matrix, named after Alexander-Theophile Vandermonde. 3. Prove that Theorem 1.7.3-5 is the case for any field without any condition on characteristic. Solution: Assume that char F = 2, i.e., 2 = 0. In this case, det A equals the permanent of A: perm A = ∑ a1ω(1) a2ω(2) ⋯anω(n) . ω∈Sn
(See section 1.8 for more details on the permanent of a matrix.) Since A has two identical rows, each term appears twice. For example, if row 1 is equal to row 2, then a1i a2j = a1j a2i. Thus, we have only n!/2 distinct terms, and each of them is multiplied by 2 = 0. Hence, det A = 0. Use det A = det A⊺ to deduce that det A = 0 if A has two identical columns.
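Formula (1.7.8) from Worked-out Problem 2 is also easy to test numerically. The sketch below is our own illustration (the choice of nodes and the use of NumPy's vander routine are assumptions, not part of the text); it compares both sides of (1.7.8) for a concrete Vandermonde matrix.

```python
import numpy as np

x = np.array([2.0, -1.0, 0.5, 3.0])   # the nodes x0, ..., x3 (so n = 3)
V = np.vander(x, increasing=True)      # row i is (1, xi, xi**2, xi**3)

lhs = np.linalg.det(V)
rhs = np.prod([x[i] - x[j]
               for i in range(len(x)) for j in range(len(x)) if i > j])

print(lhs, rhs, np.isclose(lhs, rhs))  # the two values agree
```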
1.7.6 Problems 1. Let A = [aij ] ∈ Fn×n be a symmetric tridiagonal matrix. If B is a matrix formed from A by deleting the first two rows and columns, show that det A = a11 det A(1, 1) − a212 det B. 2. If A ∈ Fm×n and B ∈ Fn×p , show that rank A + rank B − n ≤ rank AB.
(1.7.10)
The combination of the inequality (1.7.10) with the inequality mentioned in Problem 1.4.4-3 is well known as Sylvester’s inequality. 3. Let A, B ∈ Fn×n . Prove that rank (A + B) ≤ rank A + rank B. 4. If A, B ∈ Fn×n , prove or disprove the following statement: “rank (A + B) ≤ min{rank A, rank B}.” Compare this statement with Problem 1.4.4-3. 5. If A, B ∈ Fn×n and A − B = AB, prove that AB = BA.
6. Let A ∈ Fm×n and denote by Ak the upper left k × k submatrix of A. Assume that det Ai ≠ 0 for i = 1, . . . , k. Show that the ith pivot of A is det Ai / det Ai−1, for i = 1, . . . , k. (Assume that det A0 = 1.)
7. Let A ∈ Rn×n. Prove that rank AA⊺ = rank A⊺A.
8. Let A ∈ R2×2 be an orthogonal matrix. Show the following:
(a) if det A = 1, then A = [cos θ −sin θ; sin θ cos θ], for some θ, 0 ≤ θ < 2π. That is, A counterclockwise rotates every point in R2 by an angle θ.
(b) if det A = −1, then A = [cos θ sin θ; sin θ −cos θ], for some θ, 0 ≤ θ < 2π. That is, A reflects every point in R2 about a line passing through the origin. Determine this line. Equivalently, there exists an invertible matrix P such that P^{−1}AP = [1 0; 0 −1]. (See Problem 1.6.3-4.)
9. Let
⎡0 ⎢ ⎢0 ⎢ ⎢ A=⎢ ⋮ ⎢ ⎢0 ⎢ ⎢a1 ⎣
0 0
⋯ ⋯
0 an−1
a2 0
⋯ ⋯
0 0
an ⎤⎥ 0 ⎥⎥ ⎥ ⎥ ∈ Fn×n . ⎥ 0 ⎥⎥ 0 ⎥⎦
Then, the only possible nonzero entries of A occur in the (n − i + 1)st column of the ith row. Prove that det A = (−1)^{n(n−1)/2} a1⋯an.
10. Let P be an n × n permutation matrix. Show that det P = ±1.
11. Let A ∈ Fn×n. Prove that if det A ≠ 0, then det(adj A) ≠ 0.
12. Show that if A ∈ Fn×n is orthogonal, then det A = ±1.
13. If every row of A ∈ Rn×n adds to zero, prove that det A = 0. If every row adds to 1, prove that det(A − I) = 0.
14. Prove or disprove the following statements:
(a) The determinant is the product of pivots.
(b) If A, B ∈ Fn×n such that A is invertible and B is singular, then A + B is invertible.
(c) If A, B ∈ Fn×n, then det(AB − BA) = 0.
15. Let A ∈ AS(n, R) be a matrix with entries +1 above the main diagonal. Calculate adj A.
16. Let A ∈ AS(n, F). Prove that adj A is a symmetric matrix for odd n and skew-symmetric for even n.
17. Calculate the matrix inverse to the Vandermonde matrix given in Worked-out Problem 1.7.5-2.
18. Suppose that A ∈ Rn×n such that the diagonal entries are positive, the off-diagonal entries are negative, and the row sums are all positive. Prove that det A ≠ 0. (This result is known as Minkowski’s criterion.) 19. Let An ∈ Fn×n be a tridiagonal matrix as follows: ⎡a b 0⎤⎥ ⎢ ⎢c a b ⎥ ⎢ ⎥ ⎥. An = ⎢ ⎢ ⎥ c a b ⎢ ⎥ ⎢0 ⎥ c a ⎣ ⎦ Prove that ⎧ an ⎪ ⎪ ⎪ n ⎪ det An = ⎨(n + 1) ( a2 ) ⎪ ⎪ n+1 ⎪ ⎪ − β n+1 ) /(α − β) ⎩(α
if bc = 0, if a2 = 4bc, if a2 ≠ 4bc,
√ √ a2 − 4ac a − a2 − 4bc , β= . 2 2 20. An LU decomposition of a square matrix A expresses A = LU , where L is lower triangular and U is upper triangular. Prove that A ∈ Fn×n has an LU decomposition if and only if every principal submatrix is nonsingular. where
α=
a+
21. Let A = [aij ] ∈ Rn×n where aij =
1 , min(i,j)
for 1 ≤ i, j ≤ n. Compute det A.
22. Let A = [aij ] ∈ Rn×n satisfying ∣aij ∣ > ∑j≠i ∣aij ∣, for all 1 ≤ i ≤ n. Prove that A is invertible. 23. Show that if A, B ∈ Rn×n such that AB = BA, then det(A2 + B 2 ) ≥ 0.
1.8 Permanents 1.8.1 A combinatorial approach to the Philip Hall theorem The following theorem is well known as the Philip Hall theorem and gives us the necessary and sufficient conditions to have a perfect matching in a bipartite graph. See [30] for more results on this theorem and its proof. See also Appendix C for definitions of bipartite graphs and matchings in graphs. Theorem 1.8.1. If G = (V, E) is a bipartite graph with the bipartite sets X and Y , then G has a perfect matching if and only if #X = #Y and for any subset S ⊂ X, #S ≤ #N (S), where N (S) denotes the neighborhood of S in Y , i.e., the set of all vertices in Y adjacent to some elements of S. If A = {Ai }ni=1 is a family of nonempty subsets of a finite set S, a system of distinct representative (SDR) of A is a tuple (a1 , . . . , an ) with ai ∈ Ai and ai ≠ aj , 1 ≤ i, j ≤ n, i ≠ j. This is also called a transversal of A. Theorem 1.8.2. Let A = {Ai }ni=1 be a family of nonempty subsets of a finite set A. Then, A has an SDR if and only if # ⋃ Ai ≥ #J, i∈J
for any J ⊆ [n].
1.8. Permanents
53
It is easy to check that the above theorem is an immediate consequence of the Philip Hall theorem. Indeed, construct a bipartite graph with X = {a1 , . . . , an } and Y = {A1 , . . . , An }. There is an edge (ai , Aj ) if and only if ai ∈ Aj . Apply the Philip Hall theorem to find a perfect matching in this graph. The interested reader can verify that the number of perfect matchings in a bipartite graph G with bipartite sets X and Y and #X = #Y = n is equal to perm A, where A is the adjacency matrix of G. (We will define the permanent of a matrix in the next subsection.) We end this section with the linear version of the Philip Hall theorem. Let V be a vector space over the field F and let V = {V1 , . . . , Vn } be a family of vector subspaces of V. A free transversal for V is a family of linearly independent vectors x1 , . . . , xn such that xi ∈ Vi , i = 1, . . . , n. The following result of Rado [22] gives a linear version of the Philip Hall theorem for the existence of a free transversal for V. The interested reader is referred to [2] to see the proof and more details. Theorem 1.8.3. Let V be a vector space over the field F, and let V = {V1 , . . . ,Vn } be a family of vector subspaces of V. Then, V admits a free transversal if and only if dim span ( ⋃ Vi ) ≥ #J, i∈J
for all J ⊂ [n].
1.8.2 Permanents Recall that if A ∈ Fn×n , the permanent of A = [aij ] is denoted by perm A and defined as n
permA = ∑ ∏ aiw(i) . w∈Sn i=1
This formula is just like formula (1.7.1) for det A, except that the sign(ω) has been omitted. Note that the permanent of A is the sum of n! terms, each of which is a product of n elements of A chosen from distinct rows and columns in all possible ways. From the definition of permanent, it follows that permA = permA⊺ . Most of the multilinear properties for determinants extend without difficulty to permanents. However, the alternating property is not the case for permanents; i.e., if the rows (or columns) of A are permuted in any way, the permanent doesn’t change. (Why?) The matrix A ∈ Fm×n whose entries are either 1 or 0 is called a (0, 1)-matrix. A special class of (0, 1)-matrices, namely, the (0, 1)-matrices in Fn×n , have exactly k ones in each row and column. We denote this class by C(n, k). We define S(n, k) ∶= max{permA; A ∈ C(n, k)}, s(n, k) ∶= min{permA; A ∈ C(n, k)}. The following result, known as Fekete’s lemma, will be used to find an upper bound for the permanent function. See [30] for more details about Fekete’s lemma.
54
Chapter 1. Basic facts on vector spaces, matrices, and linear transformations
Lemma 1.23. Let f ∶ N → N be a function for which f (m + n) ≥ f (m)f (n), for 1 all m, n ∈ N. Then limn→∞ f (n) n exists (possibly ∞). It is easy to check the following inequalities: S(n1 + n2 , k) ≥ S(n1 , k)S(n2 , k), s(n1 + n2 , k) ≤ s(n1 , k)s(n2 , k). By applying these inequalities to Fekete’s lemma, we can define 1
S(k) ∶= lim {S(n, k)} n , n→∞
1
s(k) ∶= lim {s(n, k)} n . n→∞
Definition 1.8.4. A function f ∶ Rn → R is called convex if f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y), for any x, y ∈ Rn and t ∈ [0, 1]. Also, f is called strictly convex if the inequality above is strict. Note that a necessary and sufficient condition for a continuous function f on [0, 1], such that f ′ and f ′′ exist on (0, 1), is that the second derivative is always nonnegative. (Try to justify!) Lemma 1.24. If t1 , . . . , tn are nonnegative real numbers, then n
∑ ti ( i=1 ) n
n
∑i=1 ti
n
≤ ∏ ttii . i=1
Proof. Since x ln x is a convex function, we have n n n ∑i=1 ti ∑ ti ∑ ti ln ti ln ( i=1 ) ≤ i=1 , n n n
◻
which proves the assertion.
It was conjectured by H. Minc that if A ∈ Rn×n is a (0,1)-matrix with rowsums r1 , . . . , rn , then n
1
perm A ≤ ∏(rj !) rj . j=1
This conjecture was proved by L.M. Bregman [5]. The following short proof of this theorem was given by A. Schrijver [26]. Proof by induction on n. For n = 1 the statement is obvious. Suppose that it is satisfied for (n − 1) × (n − 1) matrices. We will show that (permA)
npermA
n
npermA 1 ri
≤ (∏ ri ! )
,
i=1
which implies the above inequality. In the following inequalities, the variables i, j, and k range from 1 to n. Let S be the set of all permutations ω ∈ Sn for which aiω(i) = 1, for i = 1, . . . , n. Clearly, #S = perm A. Using Lemma 1.24, ∏(perm A) i
perm A
⎛ ⎞ perm A perm Aik ⎟ ⎜ ≤ ∏ ⎜ri ∏ perm Aik ⎟, i ⎝ k ⎠ aik =1
1.8. Permanents
55
where Aik denotes the minor obtained from A by deleting row i and column k. Now, ⎛ ⎞ permA permAik ⎟ r permA ∏⎜ ∏ ⎜ i ⎟ = ∏ ((∏ri ) (∏permAiω(i) )) , ik i i i ⎝ k ⎠ ω∈S aik =1 because the number of factors ri equals permA on both sides, while the number of factors permAik equals the number of ω ∈ S for which ω(i) = k. Applying the induction hypothesis to each Aiω(i) implies that the recent expression is less than or equal to ⎛ ⎛ ⎛ ⎞⎛ ⎞⎞⎞ ⎜ ⎜ ⎜ ⎟⎟⎟ 1 1 ⎟⎜ ⎜ ⎜ ⎜ ⎜ ⎟ ⎟⎟⎟ (∏ ri ) ⎜∏ ⎜ ∏ rj ! rj ⎟ ⎜ ∏ (rj − 1)! (rj −1) ⎟⎟⎟ . ∏⎜ ⎜ ⎜ ⎜ ⎜ ⎟ ⎟⎟⎟ ω∈S ⎜ i ⎜ i ⎜ j ⎟⎜ j ⎟⎟⎟ j≠i j≠i ⎝ ⎝aiω(i)=0 ⎝ ⎠ ⎝aiω(i)=1 ⎠⎠⎠
(1.8.1)
Changing the order of multiplication and considering that the number of i such that i ≠ j and ajω(j) = 0 is n − ri , whereas the number of i such that i ≠ j and ajω(i) = 1 is rj − 1, we get (rj −1) (n−rj ) ⎛ ⎛ ⎞⎞ (1.8.1) = ∏ (∏ ri ) ∏ rj ! rj (rj − 1)! (rj −1) . ⎝ ⎝ ⎠⎠ i j ω∈S
(1.8.2)
That (1.8.2) equals nperm A n ri
1 ri
∏ (∏ri ! ) = (∏ri ! )
ω∈S
is trivial. Since (perm A)
i
i
npermA
= ∏(permA)permA , the proof is complete. i
Now, if G is a balanced bipartite graph with adjacency matrix A ∈ {0, 1}n×n and row-sums r1 , . . . , rn , then by Bregman’s theorem, ∏nj=1 (rj !)1/rj is an upper bound for the number of perfect matchings in G.
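For small matrices, both the permanent and the Bregman bound can be evaluated directly. The sketch below is our own illustration (the helper name perm, the particular (0,1)-matrix, and the use of NumPy are assumptions, not part of the text); it counts the perfect matchings of a small balanced bipartite graph via the permanent of its adjacency matrix and compares the count with the Bregman upper bound.

```python
import numpy as np
from itertools import permutations
from math import factorial

def perm(A):
    """Permanent computed from the defining sum over S_n (fine for small n)."""
    n = A.shape[0]
    return sum(np.prod([A[i, w[i]] for i in range(n)])
               for w in permutations(range(n)))

# adjacency matrix of a balanced bipartite graph with row-sums r_i
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])

matchings = perm(A)                               # number of perfect matchings
r = A.sum(axis=1)
bregman = np.prod([factorial(ri) ** (1.0 / ri) for ri in r])
print(matchings, "<=", bregman)                   # 6 <= 6.60..., as Bregman's theorem predicts
```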
1.8.3 Worked-out problems n×n
1. Assume that Tn ⊆ (N ∪ {0}) is the set of matrices with row-sums and column-sums 3 and let tn = min{perm A; A ∈ Tn }. Denote by Xn the set of all matrices obtained from elements of Tn by decreasing one positive entry by 1; xn = min{perm A; A ∈ Xn }. Show that tn ≥ ⌈ 32 xn ⌉, where ⌈ ⌉ denotes the ceiling function. Solution: Choose A ∈ Tn with first row y = (y1 , y2 , y3 , 0, . . . , 0), where yi ≥ 0, for i = 1, 2, 3. Then 2y = y1 (y1 − 1, y2 , y3 , 0, . . . , 0) + y2 (y1 , y2 − 1, y3 , 0, . . . , 0) + y3 (y1 , y2 , y3 − 1, 0, . . . , 0). Since S(n1 + n2 , k) ≥ S(n1 , k)S(n2 , k), then 2tn ≥ (y1 + y2 + y3 )xn = 3xn .
56
Chapter 1. Basic facts on vector spaces, matrices, and linear transformations
1.8.4 Problems 1. Show the following: (a) S(n, k) ≥ k!. 1
(b) S(k) ≤ (k!) k . 1
1
(c) Show by example that S(k) ≥ (k!) k . This shows that S(k) = (k!) k . (Hint: Use Bregman’s theorem.)
1.9 An application of the Philip Hall theorem Recall that a permutation matrix is a square matrix that has exactly one entry of 1 in each row and each column and zero elsewhere. Now, we define a more general family of matrices called doubly stochastic, as mentioned in section 1.5. Definition 1.9.1. A matrix with no negative entries whose column (rows) sums are 1 is called a column stochastic (row stochastic) matrix. In some references, a column stochastic (row stochastic) matrix is called a stochastic matrix. Both types of matrices are also called Markov matrices. Definition 1.9.2. A doubly stochastic matrix is a square matrix A = [aij ] of nonnegative real entries, each of whose rows and columns sum 1, i.e., ∑ aij = ∑ aij = 1. i
j
The set of all n × n doubly stochastic matrices is denoted by Ωn . If we denote all n × n permutation matrices by Pn , then clearly Pn ⊂ Ωn . Definition 1.9.3. A subset A of a real finite-dimensional vector space V is said to be convex if λx + (1 − λ)y ∈ A, for all vectors x, y ∈ A and all scalars λ ∈ [0, 1]. Via induction, this can be seen to be equivalent to the requirement that ∑ni=1 λi xi ∈ A, for all vectors x1 , . . . , xn ∈ A and all scalars λ1 , . . . , λn ≥ 0 with ∑ni=1 λi = 1. A point x ∈ A is called an extreme point of A if y, z ∈ A, 0 < t < 1, and x = ty + (1 − t)z imply x = y = z. We denote by ext A the set of all extreme points of A. With these restrictions on all λi ’s, an expression of the form ∑ni=1 λi xi is said to be a convex combination of x1 , . . . , xn . The convex hull of a set B ⊂ V is defined as {∑ λi xi ∶ xi ∈ B, λi ≥ 0, and ∑ λi = 1}. The convex hull of B can also be defined as the smallest convex set containing B. (Why?) It is denoted by conv B. The importance of extreme points can be seen from the following theorem, whose proof is found in [3]. Theorem 1.9.4 (Krein–Milman). Let A ⊂ Rn be a nonempty compact convex set. Then 1. the set of all extreme points of A is nonempty. 2. the convex hull of the set of all extreme points of A is A itself.
1.9. An application of the Philip Hall theorem
57
The following theorem is a direct application of matching theory to express the relation between the two sets of matrices Pn and Ωn.
Theorem 1.9.5 (Birkhoff). Every doubly stochastic matrix can be written as a convex combination of permutation matrices.
Proof. We use the Philip Hall theorem to prove this theorem. We associate to our doubly stochastic matrix A = [aij] a bipartite graph as follows. We represent each row and each column with a vertex, and we connect the vertex representing row i with the vertex representing column j if the entry aij is nonzero. For example, if
A = [7/12 0 5/12; 1/6 1/2 1/3; 1/4 1/2 1/4],
the graph associated to A is given in the picture below:
row1
column 1
row2
column 2
row3
column 3
We claim that the associated graph of any doubly stochastic matrix has a perfect matching. Assume to the contrary, that is, A has no perfect matching. Then, by the Philip Hall theorem, there is a subset E of the vertices in one part such that the set N (E) of all vertices connected to some vertex in E has strictly less than #E elements. Without loss of generality, we may assume that E is a set of vertices representing rows, and the set N (E) consists then of vertices representing columns. Consider now the sum ∑i∈E,j∈N (E) aij = #E, the sum of all entries located in rows belonging to E and columns belonging to N (E). In the rows belonging to E all nonzero entries are located in columns belong to N (E) (by the definition of the associated graph). Thus, ∑
i∈E,j∈N (E)
aij = #E,
since the matrix is doubly stochastic, the sum of elements located in any of given #E rows is #E. On the other hand, the sum of all elements located in all columns belonging to N (E) is at least ∑i∈E,j∈N (E) aij , since the entries not belonging to a row in E are nonnegative. Since the matrix is doubly stochastic, the sum of all elements located in all columns belonging to N (E) is also exactly #N (E). Thus, we obtain ∑
i∈E,j∈N (E)
aij ≤ #N (E) < #E =
∑
i∈E,j∈N (E)
a contradiction. Then, A has a perfect matching.
aij ,
58
Chapter 1. Basic facts on vector spaces, matrices, and linear transformations
Now we are ready to prove the theorem. We proceed by induction on the number of nonzero entries in the matrix. As we proved, the associated graph of A has a perfect matching. Underline the entries associated to the edges in the matching. For example, in the associated graph above, {(1, 3), (2, 1), (3, 2)} is a perfect matching, so we underline a13 , a23 , and a32 . Thus, we underline exactly one element in each row and each column. Let α0 be the minimum of the underlined entries. Let P0 be the permutation matrix that has a 1 exactly at the position of the underlined elements. If α0 = 1, then all underlined entries are 1, and A = P0 is a permutation matrix. If α0 < 1, then the matrix A − α0 P0 has nonnegative entries, and the sum of the entries in any row or any column is 1−α0 . Dividing each entry by (1 − α0 ) in A − α0 P0 gives a doubly stochastic matrix A1 . Thus, we may write A = α0 P0 + (1 − α0 )A1 , where A1 is not only doubly stochastic but has less nonzero entries than A. By our induction hypothesis, A1 may be written as A1 = α1 P1 + ⋯ + αn Pn , where P1 , . . . , Pn are permutation matrices, and α1 P1 + ⋯ + αn Pn is a convex combination. But then we have A = α0 P0 + (1 − α0 )α1 P1 + ⋯ + (1 − α0 )αn Pn , where P0 , P1 , . . . , Pn are permutation matrices and we have a convex combination. Since α0 ≥ 0, each (1 − α0 )αi is nonnegative and we have α0 + (1 − α0 )α1 + ⋯ + (1 − α0 )αn = α0 + (1 − α0 )(α1 + ⋯ + αn ) = α0 + (1 − α0 ) = 1. ◻ In our example,
⎡0 ⎢ ⎢ P0 = ⎢1 ⎢ ⎢0 ⎣
0 0 1
1⎤⎥ ⎥ 0⎥ , ⎥ 0⎥⎦
and α0 = 16 . Thus, we get
A1 =
1 1 − 61
⎡7 ⎢ 12 1 6 ⎢⎢ (A − P0 ) = ⎢ 0 6 5 ⎢⎢ ⎢1 ⎣4
0 1 2 1 3
1⎤ 4⎥ ⎥ 1⎥ ⎥ 3⎥ ⎥ 1⎥ 4⎦
⎡7 ⎢ 10 ⎢ ⎢ =⎢0 ⎢ ⎢3 ⎢ ⎣ 10
0 3 5 2 5
3 ⎤ 10 ⎥ ⎥ 2 ⎥ ⎥. 5 ⎥ ⎥ 3 ⎥ 10 ⎦
The graph associated to A1 is the following:
1
1
2
2
3
3
A perfect matching is {(1, 1), (2, 2), (3, 3)}; is ⎡1 0 ⎢ ⎢ P1 = ⎢0 1 ⎢ ⎢0 0 ⎣
the associated permutation matrix 0⎤⎥ ⎥ 0⎥ . ⎥ 1⎥⎦
1.9. An application of the Philip Hall theorem
59
and we have α1 = 3/10. Thus, we get ⎡4/10 0 3 10 ⎢⎢ 1 3/10 ⎢ 0 (A1 − P1 ) = A2 = 1 − 3/10 10 7 ⎢⎢3/10 2/5 ⎣
3/10⎤⎥ ⎡⎢4/7 0 ⎥ ⎢ 2/5 ⎥ = ⎢ 0 3/7 ⎥ ⎢ 0 ⎥⎦ ⎢⎣3/7 4/7
3/7⎤⎥ ⎥ 4/7⎥ . ⎥ 0 ⎥⎦
The graph associated to A2 is the following:
A perfect matching in this graph is {(1, 3), (2, 2), (3, 1)}, the associated permutation matrix is ⎡0 0 1⎤ ⎥ ⎢ ⎥ ⎢ P2 = ⎢0 1 0⎥ , ⎥ ⎢ ⎢1 0 0⎥ ⎦ ⎣ and we have α2 = 37. Thus, we get ⎡4/7 0 1 3 4 ⎢⎢ 0 A3 = (A2 − P2 ) = ⎢ 0 1 − 3/7 7 7 ⎢⎢ 0 4/7 ⎣
0 ⎤⎥ ⎡⎢1 ⎥ ⎢ 4/7⎥ = ⎢0 ⎥ ⎢ 0 ⎥⎦ ⎢⎣0
0 0 1
0⎤⎥ ⎥ 1⎥ . ⎥ 0⎥⎦
We are done since A3 = P3 is a permutation matrix. Working our way backwards we get 3 4 A2 = α2 P2 + (1 − α2 )A3 = P2 + P3 , 7 7 A1 = α1 P1 + (1 − α1 )A2 =
7 3 4 3 3 4 3 P1 + ( P2 + P3 ) = P1 + P2 + P3 , 10 10 7 7 10 10 10
and 1 5 3 3 4 1 1 1 1 A = α0 P0 +(1−α0 )A1 = P0 + ( P1 + P2 + P3 ) = P0 + P1 + P2 + P3 . 6 6 10 10 10 6 4 4 3 We obtained that ⎡7/12 ⎢ ⎢ ⎢ 1/6 ⎢ ⎢ 1/4 ⎣
0 1/2 1/2
⎡0 5/12⎤⎥ ⎢ ⎥ 1⎢ 1/3 ⎥ = ⎢1 ⎥ 6⎢ ⎢0 1/4 ⎥⎦ ⎣
0 0 1
⎡1 1⎤⎥ ⎢ ⎥ 1⎢ 0⎥ + ⎢0 ⎥ 4⎢ ⎢0 0⎥⎦ ⎣
0 1 0
⎡0 0⎤⎥ ⎢ ⎥ 1⎢ 0⎥ + ⎢0 ⎥ 4⎢ ⎢1 1⎥⎦ ⎣
0 1 0
⎡1 1⎤⎥ ⎢ ⎥ 1⎢ 0⎥ + ⎢0 ⎥ 3⎢ ⎢0 0⎥⎦ ⎣
0 0 1
0⎤⎥ ⎥ 1⎥ ⎥ 0⎥⎦
Remark 1.9.6. Let e ∈ Rn be the column vector with each coordinate equal to 1. Then, for a matrix A ∈ Rn×n with nonnegative entries, the condition that every row and every column sums to 1 can be written as e⊺A = e⊺ and Ae = e. It follows that the product of finitely many doubly stochastic matrices is a doubly stochastic matrix.
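The proof of Theorem 1.9.5 is constructive, and the worked example above can be automated. The sketch below is our own interpretation of that procedure (the function name birkhoff, the tolerance, and the use of SciPy's assignment solver to find a perfect matching among the nonzero entries are assumptions, not part of the text); the particular permutation matrices produced may differ from the ones chosen by hand above, but the coefficients are nonnegative, sum to 1, and rebuild A.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff(A, tol=1e-12):
    """Greedy Birkhoff decomposition following the proof of Theorem 1.9.5."""
    A = A.astype(float).copy()
    terms = []
    while A.max() > tol:
        # a perfect matching on the nonzero entries (Philip Hall guarantees one)
        rows, cols = linear_sum_assignment(-(A > tol).astype(float))
        P = np.zeros_like(A)
        P[rows, cols] = 1.0
        alpha = A[rows, cols].min()
        terms.append((alpha, P))
        A -= alpha * P
    return terms

A = np.array([[7/12, 0, 5/12],
              [1/6, 1/2, 1/3],
              [1/4, 1/2, 1/4]])
terms = birkhoff(A)
print([round(alpha, 4) for alpha, _ in terms])        # convex coefficients, summing to 1
print(np.allclose(sum(a * P for a, P in terms), A))   # True: the terms rebuild A
```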
60
Chapter 1. Basic facts on vector spaces, matrices, and linear transformations
1.9.1 Affine spaces Definition 1.9.7. Let V be a real vector space. A subset S of V is called an affine subspace if x, y ∈ S and t ∈ R imply tx + (1 − t)y ∈ S. Note that S need not be a vector space. Lemma 1.25. S ⊆ V is an affine subspace if and only if ∑ni=1 ti vi ∈ S, for all vi ∈ S and ti ∈ R with ∑ni=1 ti = 1. Proof. We prove by induction. For n = 2, this is immediate from the definition. Assume the result for n. Let vi ∈ S, ti ∈ R for 1 ≤ i ≤ n + 1 with ∑ ti = 1. There exists at least one i such that ti ≠ 1. Without loss of generality, we i , we would assume that tn+1 ≠ 1. Now, ∑ni=1 ti = 1 − tn+1 ≠ 0. Then if t′i = 1−ttn+1 n n ′ ′ have ∑i=1 ti = 1. By the induction hypothesis, ∑i=1 ti vi ∈ S. Call it v. Hence, (1 − tn+1 )v + tn+1 vn+1 ∈ S. This means ∑n+1 ◻ i=1 ti vi ∈ S. Theorem 1.9.8. A nonempty subset S of V is an affine subspace if and only if it is of the form v + W, for some v ∈ V and a subspace W of V. Proof. Fix v ∈ S and set W = S − v. We will show that W is a subspace of V. Clearly, 0 ∈ W, as 0 = v − v. Let w1 = x − v ∈ W and w2 = y − v ∈ W, where x, y ∈ S. We will show w1 + w2 ∈ W. We have w1 + w2 = (x + y − v) − v. Using the above lemma, z = x + y − v ∈ S, as the coefficients add up to 1. Hence, w1 + w2 = z − w ∈ W. Now, we assume that w = x − v ∈ W, for some x ∈ S and t ∈ R. Then tw = tx − tv = tx − (t − 1)v − v = tx + (1 − t)v − v. Since S is affine and x, y ∈ S, then z = tx + (1 − t)v ∈ S. Therefore, tw = z − w ∈ W. We have shown that W is a subspace of V. The converse is easy to check using the definition of an affine subspace and is left as an exercise. ◻ Now, we consider the system of linear equations (1.3.5), in the real field setting sense. We realize the following: Let S be the set of solutions of (1.3.5). Assume that S ≠ ∅. Then S is an affine subspace and S = a + S ′ , for some a ∈ S, where S ′ denotes the set of solutions of the corresponding homogeneous system. Definition 1.9.9. The dimension of an affine subspace S is dim W if S = v+W, for some v ∈ S. An affine space of dimension dim V − 1 is called a hyperplane (see Problems 1.9.3-8 and 1.9.3-9).
1.9.2 Worked-out problems 1. If we denote by Ωn,s the subset of symmetric doubly stochastic matrices, show that each A ∈ Ωn,s can be written as a convex combination of 21 (P + P ⊺ ), where P ∈ Pn . Solution: As A ∈ Ωn,s ⊂ Ωn , by Theorem 1.9.5, one can find P1 , . . . , PN ∈ Pn and N t1 , . . . , tn ∈ (0, 1) such that A = ∑N j=1 tj Pj and ∑j=1 tj = 1. Since A is
1.10. Polynomial rings
61 ⊺
⊺ symmetric, then ∑nj=1 tj Pj = (∑nj=1 tj Pj ) . This implies ∑N j=1 tj (Pj − Pj ) = 0. Since all tj ’s are positive and all Pj ’s are nonnegative matrices, we conclude that (Pj − Pj⊺ )’s are zero matrices, and so Pj = Pj⊺ , 1 ≤ j ≤ N . 1 ⊺ Therefore, Pj = 12 (Pj + Pj⊺ ) and then A = ∑N j=1 tj 2 (Pj + Pj ).
1.9.3 Problems 1. Show that in the decomposition of the above worked-out problem, N ≤ n2 −n+2 , for n > 2. 2 (Hint: Use Carath´eodary’s theorem [29].) 2. Let A be a doubly stochastic matrix. Show that perm A > 0. 3. Show that Ωn is a compact set. (See Appendix B for definition of a compact set.) 4. Show that Pn is a group with respect to the multiplication of matrices, with In the identity and P −1 = P ⊺ , for all P ∈ Pn . 5. Let A ∈ Ωn and B ∈ Ωm . Show that A ⊕ B ∈ Ωn+m . 6. Assume that A is an invertible n × n doubly stochastic matrix and that A−1 is doubly stochastic. Prove that A is a permutation matrix. 7. A nonnegative vector x ∈ Rn whose coordinates sum to 1 is called a probability vector, or a stochastic vector. Denote by Πn the set of all probability vectors in Rn . Prove that Πn is a compact set. 8. Show that a hyperplane in R2 is a line. 9. Show that a hyperplane in R3 is a plane. 10. Let V be an n-dimensional real vector space. Prove that any convex linear combination of m vectors in V is also a convex linear combination of no more than n + 1 of the given vectors.
1.10 Polynomial rings 1.10.1 Polynomials Let F be a field (usually F = R, C). By the ring of polynomials in the indeterminate, z, written as F[z], we mean the set of all polynomials p(z) = a0 z n + a1 z n−1 + ⋯ + an , where n can be any nonnegative integer and of coefficients a0 , . . . , an are all in F. The degree of p(z), denoted by deg p, is the maximal degree n − j of a monomial aj xn−j which is not identically zero, i.e., aj ≠ 0. Then, deg p = n if and only if a0 ≠ 0. The degree of a nonzero constant polynomial p(z) = a0 is zero, and the degree of the zero polynomial is equal to −∞. For two polynomials p, q ∈ F[z] and two scalars a, b ∈ F, ap(z) + bq(z) is a well-defined polynomial. Hence, F[z] is a vector space over F, whose dimension is infinite. The set of polynomials of degree at most n is an (n + 1)-dimensional m−j subspace of F[z]. Given two polynomials p = ∑ni=0 ai z n−i , q = ∑m ∈ F[z], j=0 bj z one can form the product n+m
k
k=0
i=0
p(z)q(z) = ∑ (∑ ai bk−i ) z n+m−k , where ai = bj = 0, for i > n and j > m.
62
Chapter 1. Basic facts on vector spaces, matrices, and linear transformations
Note that pq = qp and deg pq = deg p + deg q. The addition and the product in F[z] satisfy all the nice distribution identities that are satisfied by the addition and product in F. This implies that F[z] is a commutative ring with the addition and product defined above. Here, the constant polynomial p ≡ 1 is the identity element, and the zero polynomial is the zero element. (That is the reason for the name ring of polynomials in one indeterminate (variable) over F.) Given two polynomials p, q ∈ F[z], one can divide p by q ≡/ 0 with the residue r, i.e., p = tq + r, for some unique t, r ∈ F[z], where deg r < deg q. For p, q ∈ F[z], let (p, q) denote the greatest common divisor of p, q. If p and q are identically zero, then (p, q) is the zero polynomial. Otherwise, (p, q) is a polynomial s of the highest degree that divides p and q. Note that s is determined up to a multiple of a nonzero scalar, and it can be chosen as a unique monic polynomial: s(z) = z l + s1 z l−1 + ⋯ + sl ∈ F[z].
(1.10.1)
For p, q ≡/ 0, s can be found using the Euclidean algorithm: pi (z) = ti (z)pi+1 (z) + pi+2 (z), deg pi+2 < deg pi+1
i = 1, . . . .
(1.10.2)
Start this algorithm with p1 = p, p2 = q. Continue it until pk = 0 the first time. (Note that k ≥ 3.) Then, pk−1 = (p, q). It is easy to show, for example, by induction, that each pi is of the form ui p + vi q, for some polynomials ui , vi . Hence, the Euclidean algorithm yields (p(z), q(z)) = u(z)p(z) + v(z)q(z), for some u(z), v(z) ∈ F[z].
(1.10.3)
(This formula holds for any p, q ∈ F[z].) Note that p, q ∈ F[z] are called coprime if (p, q)∈ F. Note that if we divide p(z) by z − a, we get the residue p(a), i.e., p(z) = (z − a)q(z) + p(a). Hence, z − a divides p(z) if and only if p(a) = 0, i.e., a is a root of p. A monic p(z) splits to a product of linear factors if n
p(z) = (z − z1 )(z − z2 ) . . . (z − zn ) = ∏(z − zi ).
(1.10.4)
i=1
Note that z1 , . . . , zn are the roots of p. Let z = (z1 , . . . , zn )⊺ ∈ Fn . Denote σk (z) ∶=
∑
1≤i1 0 and so T is not injective.
1.12.2 Problems 1. Show that any n-dimensional vector space V over the field F is isomorphic to Fn . 2. Show that any finite-dimensional vector space V over F is isomorphic to (V′ )′ . Define an explicit isomorphism from V to (V′ )′ . 3. Show that any T ∈ L(V, U) is an isomorphism if and only if T is represented by an invertible matrix in some bases of V and U. 4. Recall that the matrices A and B ∈ Fn×n are called similar if B = P −1 AP , for some P ∈ GL(n, F). Show that similarity is an equivalence relation on Fn×n . 5. Show that the matrices A and B ∈ Fn×n are similar if and only if they represent the same linear transformation T ∈ L(V) in different bases, for a given n-dimensional vector space V over F. 6. Suppose that A and B ∈ Fn have the same characteristic polynomial, which has n distinct roots in F. Show that A and B are similar. 7. Show that every square matrix is similar (over the splitting field of its characteristic polynomial) to an upper triangular matrix. 8. Show that the eigenvalues of a triangular matrix are the entries on its main diagonal. ¯ 9. Assume that A ∈ Rn×n and that λ ∈ C is an eigenvalue of A. Show that λ is also an eigenvalue of A. 10. Let V and W two vector spaces over a field F and T ∈ L(V, W). Show that (a) If dim V > dim W, then T is not one-to-one. (b) If dim V = dim W, then T is one-to-one if and only if T is surjective. 11. Let A, B ∈ Fn×n be similar. Prove that det A = det B.
72
Chapter 1. Basic facts on vector spaces, matrices, and linear transformations
1.12.3 Trace As we saw before, the determinant is a function that assigns a scalar value to every square matrix. Another important scalar-valued function is the trace. The trace of A = [aij ] ∈ Fn×n is denoted by tr A and defined to be the sum of the elements on the main diagonal of A, i.e., n
tr A = ∑_{i=1}^{n} aii.
Clearly, det(zI − A) = z n − tr Az n−1 + ⋯. Assume that A is similar to B ∈ Fn×n . As A and B have the same characteristic polynomial, it follows that tr A = tr B. Let V be an n-dimensional vector space over F. Assume that T ∈ L(V) is represented by A and B, respectively, in two different bases in V. Since A and B are similar, tr A = tr B. Then, the trace of T is denoted by tr T and defined to be tr A. Note that the trace is only defined for a square matrix. The following properties are obvious about the trace: If A, B ∈ Fn×n and c ∈ F, then 1. tr A + B = tr A + tr B, 2. tr cA = c tr A, 3. tr A = tr A⊺ , 4. tr AB = tr BA. The latter case holds even if A and B are not square matrices. Namely, if A ∈ Fm×n and B ∈ Fn×m , then tr AB = tr BA. Also, the trace is invariant under cyclic permutations, i.e., tr ABCD = tr BCDA = tr CDAB = tr DABC, where A, B, C, D ∈ Fn×n . This is known as the cyclic property of trace. Note that arbitrary permutations are not allowed; in general, tr ABC ≠ tr ACB.
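These identities are easy to confirm numerically. The sketch below is our own illustration (the random matrices and the use of NumPy are assumptions, not part of the text); it checks linearity, tr AB = tr BA, and invariance under cyclic shifts, and shows that an arbitrary transposition of factors generally changes the trace.

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))   # True
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))             # True: tr AB = tr BA
print(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))     # True: cyclic shift
print(np.isclose(np.trace(A @ B @ C), np.trace(A @ C @ B)))     # typically False
```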
1.12.4 Worked-out problems 1. If A, B ∈ Fn×n , char F ≠ 2 and A is symmetric and B is skew-symmetric, prove that tr AB = 0. Solution: (3)
(2)
(4)
tr AB = tr(AB)⊺ = tr B ⊺ A⊺ = tr −BA = − tr BA = − tr AB. Then, tr AB = 0. (Here, (2), (3) and (4) denote the second and third properties mentioned above for trace.)
1.12. Linear operators (second encounter)
73
1.12.5 Problems 1. Deduce the cyclic property by proving tr ABC = tr BCA = tr CAB, where A, B, C ∈ Fn×n . 2. Let A, B ∈ Fn . Prove or disprove the following statements: (a) tr AB = tr A tr B, (b) tr(A−1 ) =
1 tr A
if A is invertible,
(c) det(xI − A) = x2 − tr A + det A if A ∈ F2×2 . 3. Determine if the map ϕ ∶ Fn×n → F by ϕ(A) = tr A is a ring homomorphism. 4. Let A, B ∈ Cn×n and AB = 0. Show that for any positive integer k, tr(A + B)k = tr Ak + tr B k . 5. Show that if all eigenvalues of A ∈ Cn×n are real and tr A2 = tr A3 = tr A4 = a, for some constant a, then for any k ∈ Z, tr Ak = a, and a must be an integer.
Chapter 2
Canonical forms
In this chapter, we will discuss two different canonical forms for similarity: the Jordan canonical form, which applies only when the base field is algebraically closed, and the rational canonical form, which applies in all cases.
2.1 Jordan canonical forms For a matrix A ∈ Fn×n , the polynomial p(z) ∶= det(zIn − A) is called the characteristic polynomial of A. An element λ ∈ F is called an eigenvalue of A if there exists 0 ≠ x ∈ Fn such that Ax = λx. This x ≠ 0 is called an eigenvector of A corresponding to λ. Note that for an eigenvalue λ of A, since Ax = λx (for some x ≠ 0), then x is in the kernel of λIn − A. Using Theorem 1.3.12, λIn − A is not invertible and det(λIn −A) = 0. On the other hand, if λ ∈ F is a root of p(z), then λIn − A is not invertible, and reusing Theorem 1.3.12 yields ker(λIn − A) ≠ 0. If 0 ≠ x ∈ ker(λIn − A), then Ax = λx, and then λ is an eigenvalue of A. Therefore, if we call the set of eigenvalues of A the spectrum of A and denote it by spec A, then we have spec A = {λ ∈ F ∶ p(λ) = 0}. Geometrically, an eigenvector corresponding to a basis vector in one-dimensional subspace is invariant under the action of A, which is given by multiplication by the eigenvalue λ. Note that for an algebraically closed field F, every matrix has at least one eigenvalue. To compute them, we need to use the fact that eigenvalues are the roots of p(z) = det(zIn − A). If λ is an eigenvalue of A, then eigenvectors for λ are elements of the null space of A − λIn , which can be found via row reduction. Then, the problem of finding λ is solved by investigating the case when A − λIn is singular. Thus, we have reduced the problem of determining eigenvalues to the problem of determining invertibility of a matrix. This can be considered motivation for the definition of the determinant. Lemma 2.1. Let A ∈ Fn×n . Then, det(zIn − A) = z n + ∑ni=1 ai z n−i and (−1)i ai is the sum of all i × i principal minors of A. Assume that det(zIn − A) = (z − z1 ) . . . (z − zn ), and let z ∶= (z1 , . . . , zn )⊺ ∈ Fn . Then, (−1)i ai = σi (z), for i = 1, . . . , n. 75
76
Chapter 2. Canonical forms
Proof. Consider det(zI − A). To obtain the coefficient of z n−i , we need to take the product of some n − i diagonal elements of zIn − A: (z − aj1 j1 ) . . . (z − ajn−i jn−i ). We take z n−i in this product. Then, this product is multiplied by the det(−A[α, α]), where α is the complement of {j1 , . . . , jn−i } in the set [n]. This shows that (−1)i ai is the sum of all principal minors of A of order i. Suppose that det(zIn − A) splits into linear factors in F[z]. Then, (1.10.6) implies that (−1)i ai = σi (z). ◻ Corollary 2.1.1. Let A ∈ Fn×n and assume that det(zIn − A) = ∏ni=1 (z − zi ). Then, n
n
i=1
i=1
tr A ∶= ∑ aii = ∑ zi ,
n
det A = ∏ zi . i=1
Definition 2.1.2. Let GL(n, F) ⊂ Fn×n denote the set (group) of all n × n invertible matrices with entries in a given field F. Two matrices A, B ∈ Fn×n are called similar, and this is denoted by A≈B, if B = P AP −1 , for some P ∈ GL(n, F). The set of all B ∈ Fn×n similar to a fixed A ∈ Fn×n is called the similarity class corresponding to A, or simply a similarity class. The following proposition is straightforward. Proposition 2.1.3. Let F be a field. Then, the similarity relation on Fn×n is an equivalence relation. Furthermore, if B = U AU −1 , then 1. For any integer m ≥ 2, B m = U Am U −1 . 2. If A is invertible, then B is invertible and B −1 = U A−1 U −1 . Corollary 2.1.4. Let V be an n-dimensional vector space over F. Assume that T ∶ V → V is a linear transformation. Then, the set of all representation matrices of T in different bases is a similarity class. (Use (1.12.2) and (1.12.3), where m = n, xi = yi , i = 1, . . . , n, X = Y .) Hence, the characteristic polynomial i of T is defined as det(zIn − A) = z n + ∑n−1 i=0 ai z , where A is the representation matrix of T in any basis [u1 , . . . , un ], and this definition is independent of the choice of a basis. In particular, det T ∶= det A, and tr T m = tr Am , for any nonnegative integer. (T 0 is the identity operator, i.e., T 0 (v) = v, for all v ∈ V, and A0 = I.) Definition 2.1.5. An element v ∈ V is called an eigenvector of T corresponding to the eigenvalue λ ∈ F, if v ≠ 0 and T (v) = λv. This is equivalent to the existence 0 ≠ x ∈ Fn such that Ax = λx. Hence, (λI − A)x = 0 which implies that det(λI − A) = 0. Thus, λ is a zero of the characteristic polynomial of A and T . The assumption λ is a zero of the characteristic polynomial implies that the system (λI − A)x has a nontrivial solution x ≠ 0. Corollary 2.1.6. Let A ∈ Fn×n . Then, λ is an eigenvalue of A if and only if λ is a zero of the characteristic polynomial of A, p(z) = det(zI − A). Let V be an n-dimensional vector space over F. Assume that T ∶ V → V is a linear transformation. Then, λ is an eigenvalue of T if and only if λ is a zero of the characteristic polynomial of T .
2.1. Jordan canonical forms
77
Example 2.2. Here, we first show that 5 is an eigenvalue of A = [14
2 3]
∈ R2×2 .
We must show that there is a nonzero vector x ∈ R2 such that Ax = 5x. Clearly, this is equivalent to the equation (A − 5I)x = 0. Then, we need to compute 2 the null space of the matrix A − 5I = [−4 4 −2]. Since the rows (or columns) of A−5I are clearly linearly dependent, using the fundamental theorem of invertible matrices, null A > 0. Thus, Ax = 5x has a nontrivial solution and this tells us 5 is an eigenvalue of A. Next, we find its eigenvectors by computing the null space: [ A − 5I
0 ]=[
−4 4
2 −2
− 12 0
0 1 ]→[ 0 0
0 ]. 0
Therefore, if x = [xx12 ] is an eigenvector corresponding to 5, then x1 − 12 x2 = 0 or 1
x1 = 12 x2 . Thus, the eigenvectors are of the form x = [ 2xx22 ] or, equivalently, the 1
nonzero multiples of [ 12 ]. Note that the set of all eigenvectors corresponding to an eigenvalue λ of an n × n matrix A is the set of nonzero vectors in the null space of A − λI. This means that the set of eigenvectors of A together with the zero vector in Fn is the null space of A − λI. Definition 2.1.7. Assume that A is an n × n matrix and λ is an eigenvalue of A. The collection of all eigenvectors corresponding to λ, together with the zero vector, is called the eigenspace of λ and is denoted by Eλ . Then, in the above example, 1 E5 = {t [ ] ∶ t ∈ R} . 2 Definition 2.1.8. For any m ∈ N and λ ∈ F, we define Eλm as follows: Eλm = {x ∈ F ∶ (A − λI)m x = 0}. Definition 2.1.9. A linear operator T ∶ V → V is said to be diagonalizable if it admits a diagonal matrix representation with respect to some basis of V, i.e., there is a basis β of V such that the matrix [T ]β is diagonal. Definition 2.1.10. A basis of Fn consisting of eigenvectors of A is called an eigenbasis for A. Diagonal matrices possess a very simple structure and allow for very fast computation of determinants and inverses, for instance. Here, we will take a closer look at how to transform matrices into diagonal form. More specifically, we will look at T ∈ L(V) of finite-dimensional vector spaces, which are similar to diagonal matrices. Proposition 2.1.11. Let V be an n-dimensional vector space over F. Assume that T ∶ V → V is a linear transformation. Then, there exists a basis in V such
78
Chapter 2. Canonical forms
that T is represented in this basis by a diagonal matrix: ⎡ ⎢ ⎢ ⎢ diag(λ1 , λ2 , . . . , λn ) ∶= ⎢ ⎢ ⎢ ⎢ ⎣
λ1 0 ⋮ 0
0 λ2 ⋮ 0
... ... ⋮ ...
0 0 ⋮ λn
⎤ ⎥ ⎥ ⎥ ⎥, ⎥ ⎥ ⎥ ⎦
namely, T is diagonalizable, if and only if the characteristic polynomial of T is (z − λ1 )(z − λ2 ) . . . (z − λn ), and V has a basis consisting of eigenvectors of T . Equivalently, A ∈ Fn×n is similar to a diagonal matrix diag(λ1 , λ2 , . . . , λn ) if and only if det(zI − A) = (z − λ1 )(z − λ2 ) . . . (z − λn ), and A has n-linearly independent eigenvectors. Proof. Assume that there exists a basis {u1 , . . . , un } in V such that T is represented in this basis by a diagonal matrix Λ ∶= diag(λ1 , . . . , λn ). Then, the characteristic polynomial of T is det(zI − Λ) = ∏ni=1 (z − λi ). From the definition of the representation matrix of T , T (ui ) = λi ui , for i = 1, . . . , n. Since each ui ≠ 0, we deduce that each ui is an eigenvector of T . By our assumption, {u1 , . . . , un } is a basis in V. Conversely, assume now that V has a basis {u1 , . . . , un } consisting of eigenvectors of T . Thus, T (ui ) = λi ui for i = 1, . . . , n. Hence, Λ is the representation matrix of T in the basis {u1 , . . . , un }. To prove the corresponding results for A ∈ Fn×n , let V ∶= Fn and define the linear operator T (x) ∶= Ax, for all x ∈ Fn . ◻ Lemma 2.3. Let A ∈ Fn×n and assume that x1 , . . . , xk are k eigenvectors corresponding to k distinct eigenvalues λ1 , . . . , λk , respectively. Then, x1 , . . . , xk are linearly independent. Proof. We prove by induction on k. For k = 1, x1 ≠ 0. Hence, x1 is linearly independent. Assume that the lemma holds for k = m − 1. Suppose that k = m. Assume that ∑m i=1 ai xi = 0. Then m
m
m
i=1
i=1
i=1
0 = A0 = A ∑ ai xi = ∑ ai Axi = ∑ ai λi xi . Multiply the equality ∑m i=1 ai xi = 0 by λm and subtract it from the above inequality to deduce that ∑m−1 i=1 ai (λi − λm )xi = 0. Since x1 , . . . , xm−1 are linearly independent, by the induction hypothesis, we deduce that ai (λi − λm ) = 0, for i = 1, . . . , m − 1. As λi − λm ≠ 0, for i < m, we get that ai = 0, for i = 1, . . . , m − 1. The assumption that ∑m i=1 ai xi = 0 yields that am xm = 0. Since xm ≠ 0, we obtain that am = 0. Hence, a1 = ⋯ = am = 0. ◻ Theorem 2.1.12. Let V be an n-dimensional vector space over F. Assume that T ∶ V → V is a linear transformation. Assume that the characteristic polynomial of T , p(z) has n distinct roots over F, i.e., p(z) = ∏ni=1 (z − λi ) where λ1 , . . . , λn ∈ F, and λi ≠ λj for each i ≠ j. Then, there exists a basis in V in which T is represented by a diagonal matrix. Similarly, let A ∈ Fn×n and assume that det(zI − A) has n distinct roots in F. Then, A is similar to a diagonal matrix.
2.1. Jordan canonical forms
79
Proof. It is enough to consider the case of the linear transformation T . Recall that each root of the characteristic polynomial of T is an eigenvalue of T (Corollary 2.1.6). Hence, each λi corresponds to an eigenvector ui : T (ui ) = λi ui . Then, the proof of the theorem follows from Lemma 2.3 and Proposition 2.1.11. ◻ Remark 2.1.13. The converse of Theorem 2.1.12 is not true. If it turns out that there do not exist n distinct eigenvalues, then we cannot conclude from this that the matrix is not diagonalizable. This is really rather obvious, as the identity matrix (operator) has only one eigenvalue, but it is diagonal already. Remark 2.1.14. If A ∈ Fn×n , it may happen that det(zI − A) does not have n roots in F. (See, for example, Problem 2.2.2-2.) Hence, we cannot diagonalize A, i.e., A is not similar to a diagonal matrix. If F is algebraically closed, then det(zI − A) has n roots in F. We can apply Proposition 2.1.11 in general and Theorem 2.1.12 in particular to see if A is diagonalizable. Remark 2.1.15. Since R is not algebraically closed and C is, that is the reason we sometimes view a real-valued matrix A ∈ Rn×n as a complex-valued matrix A ∈ Cn×n . (Again, see Problem 2.2.2-2.) Corollary 2.1.16. Let A ∈ Cn×n be nondiagonalizable. Then, its characteristic polynomial must have a multiple root. 2
Example 2.4. We diagonalize the matrix A = [ 1
−1
0 0 2 1]. 0 1
Its eigenvalues are
λ1 = 1, λ2 = λ3 = 2. The corresponding linearly independent eigenvectors are x1 = (0, −1, 1)⊺ , x2 = (0, 1, 0)⊺ , and x3 = (−1, 0, 1)⊺ . We now construct the matrix P from the eigenvectors of A as follows: ⎡0 0 ⎢ ⎢ P = ⎢−1 1 ⎢ ⎢1 0 ⎣
−1⎤⎥ ⎥ 0 ⎥. ⎥ 1 ⎥⎦
Next, we construct the diagonal matrix D corresponding to eigenvalues of A: ⎡1 ⎢ ⎢ D = ⎢0 ⎢ ⎢0 ⎣
0 2 0
0⎤⎥ ⎥ 0⎥ . ⎥ 2⎥⎦
Finally, we check our work by verifying that AP = P D (A is supposed to be equal to P DP −1 ): ⎡2 ⎢ ⎢ AP = ⎢ 1 ⎢ ⎢−1 ⎣ ⎡0 ⎢ ⎢ P D = ⎢−1 ⎢ ⎢1 ⎣
0 2 0 0 1 0
⎤⎡ 0 ⎥⎢ ⎥⎢ ⎥ ⎢−1 ⎥⎢ ⎥⎢ 1 ⎦⎣ −1⎤⎥ ⎡⎢ 1 ⎥⎢ 0 ⎥⎢ 0 ⎥⎢ 1 ⎥⎦ ⎢⎣ 0 0 1 1
0 1 0 0 2 0
−1⎤⎥ ⎡⎢ 0 ⎥ ⎢ 0 ⎥ = ⎢−1 ⎥ ⎢ 1 ⎥⎦ ⎢⎣ 1 0 ⎤⎥ ⎡⎢ 0 ⎥ ⎢ 0 ⎥ = ⎢−1 ⎥ ⎢ 2 ⎥⎦ ⎢⎣ 1
0 2 0 0 2 0
−2⎤⎥ ⎥ 0 ⎥, ⎥ 2 ⎥⎦ −2⎤⎥ ⎥ 0 ⎥. ⎥ 2 ⎥⎦
80
Chapter 2. Canonical forms 2
Example 2.5. Consider the matrix A = [0 0
4 2 0
6 2]. 4
Its eigenvalues are λ1 = 2
and λ2 = 4. Solving Ax = λi x for each i = 1, 2, we would find the eigenvectors x1 = (1, 0, 0)⊺ and x2 = (5, 1, 1)⊺ . Every eigenvector of A is a multiple of x1 or x2 , which means there are not three linearly independent eigenvectors of A and using Proposition 2.1.11, A is not diagonalizable. Remark 2.1.17. Considering the above examples, if there are no n linearly independent eigenvectors for A ∈ Fn×n , then A may or may not be diagonalizable, and we have to check directly whether there are n linearly independent eigenvectors. Diagonal matrices A = P −1 BP exhibit useful properties such that they can be easily raised to a power: k
B k = (P AP −1 ) = P Ak P −1 . Computing Ak is easy because we apply this operation individually to any diagonal element. As an example, this allows us to compute inverses of A by performing fewer flops. Definition 2.1.18. Let ∼ be an equivalence relation on the set X. A subset A ⊆ X is said to be a set of canonical form for ∼ if for every x ∈ X, there exists a finite number of a ∈ A which are equivalent to x. Example 2.6. We have already seen that row equivalence is an equivalence relation on Fm×n . The subset of RREF matrices is a set of canonical form for row equivalence as every matrix is row equivalent to a unique matrix in row echelon form. Remark 2.1.19. Recall that A, B ∈ Fn×n are called similar if there exists an invertible matrix P such that A = P −1 BP. Similarity is an equivalence relation on Fn×n . As we have seen, two matrices are similar if and only if they represent the same linear operators. Hence, similarity is important to study the structure of linear operators. As we mentioned at the beginning, this chapter is devoted to developing canonical forms for similarity. Definition 2.1.20. 1. Let k be a positive integer and λ ∈ F. Then, Jk (λ) ∈ Fk×k is a k × k upper triangular matrix, with λ on the main diagonal, 1 on the next subdiagonal, and other entries equal to 0 for k > 1: ⎡ ⎢ ⎢ ⎢ ⎢ Jk (λ) ∶= ⎢ ⎢ ⎢ ⎢ ⎢ ⎣
λ 0 ⋮ 0 0
1 λ ⋮ 0 0
0 1 ⋮ 0 0
... ... ⋮ ... ...
0 0 ⋮ λ 0
0 0 ⋮ 1 λ
⎤ ⎥ ⎥ ⎥ ⎥ ⎥, ⎥ ⎥ ⎥ ⎥ ⎦
and Jk (λ) is called a Jordan block associated with the scalar λ.
2.1. Jordan canonical forms
81
2. If Aij are matrices of the appropriate sizes, then by block matrix ⎡A ⎢ 11 ⎢ A=⎢ ⋮ ⎢ ⎢Am1 ⎣
A12 ⋮ Am2
⋯ ⋯
A1n ⎤⎥ ⎥ ⎥, ⎥ Amn ⎥⎦
we mean the matrix whose left submatrix is A11 , and so on. Thus, the Aij ’s are submatrices of A and not entries of A. 3. Let Ai ∈ Fni ×ni , for i = 1, . . . , k. Denote by ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣
A = ⊕ki=1 Ai = A1 ⊕ A2 ⊕ ⋯ ⊕ Ak = diag(A1 , A2 , . . . , Ak ) ∶= A1 0 . . . 0 ⎤⎥ 0 A2 . . . 0 ⎥⎥ ⎥ ∈ Fn×n , n = n1 + n2 + . . . + nk . ⋮ ⋮ ⋮ ⋮ ⎥⎥ 0 0 . . . Ak ⎥⎦
Here, A is said to be a block diagonal matrix whose blocks are A1 , A2 , . . . , Ak . Theorem 2.1.21. (the Jordan canonical form) Let A ∈ Cn×n (A ∈ Fn×n , where F is an algebraically closed field.) Then, A is similar to its Jordan canonical form ⊕ki=1 Jni (λi ) for some λ1 , . . . , λk ∈ C, (λ1 , . . . , λk ∈ F), and positive integers n1 , . . . , nk . The Jordan canonical form is unique up to the permutations of the Jordan blocks Jn1 (λ1 ), . . . , Jnk (λk ), and it is denoted by JCF (A). Equivalently, let T ∶ V → V be a linear transformation of an n-dimensional space over C, or any other algebraically closed field. Then, there exists a basis in V such that ⊕ki=1 Jni (λi ) is the representation matrix of T in this basis. The blocks Jni (λi ), i = 1, . . . , k are unique. Note that A ∈ Cn×n is diagonalizable if and only if its Jordan canonical form k = n, i.e., n1 = . . . = nn = 1. For k < n, the Jordan canonical form is the simplest form of the similarity class of a nondiagonalizable A ∈ Cn×n . We will prove Theorem 2.1.21 in the next several sections. Example 2.7. Consider the matrix ⎡2 ⎢ ⎢ A=⎢1 ⎢ ⎢−1 ⎣
2 3 −2
3 ⎤⎥ ⎥ 3 ⎥. ⎥ −2⎥⎦ 1
2 3 2 3] −1 −2 −3
We see that det(xI − A) = (x − 1)^3. For λ = 1, we have A − I = [ 1
which
has rank 1. Thus, dim N (A − I) = 2. Since (A − I)2 = 0, dim N ((A − I)2 ) = 3. Next, dim N (A − I) = 2 tells us JCF (A) has 2 blocks with eigenvalue 1. The condition dim N ((A − I)2 ) − dim N (A − I) = 1 follows that one of these blocks has size at least 2, and so the other one has size 1. Thus, JCF (A) = 1
[0 0
0 1 0
0 1]. 1
The following is an algorithm to compute the Jordan canonical form of A ∈ Fn×n .
82
Chapter 2. Canonical forms
Algorithm: 1. Compute and factor the characteristic polynomial of A. 2. Let mλ be the maximal integer k ≤ nλ such that Eλk /Eλk−1 is not a zero subspace. 3. For each λ, compute a basis A = {x1 , . . . , xk } for Eλmλ /Eλmλ −1 and lift to elements of Eλmλ . Add the elements (A − λI)m xi to A, for 1 ≤ m < mλ . 4. Set i = mλ − 1. 5. Compute A ∩ Eλi to a basis for Eλi /Eλi−1 . Add the element (A − λI)m x to A for all m and x ∈ A. 6. If i ≥ 1, set i = i − 1, and return to the previous step. 7. Output A, the matrix for A with respect to a suitable ordering of A, is in Jordan canonical form. Proof of correctness. To verify that this algorithm works, we need to check that it is always possible to complete A ∩ Eλk to a basis for Eλk /Eλk−1 . Suppose A∩Eλk is linearly dependent. Then, there are x1 , . . . , xs ∈ A∩Eλk with ∑i ci xi = 0, and not all ci = 0. By the construction of A, we know that xi = (A − λI)yi , for some yi ∈ A, so consider y = ∑i ci yi . Then y ≠ 0, since the yi ’s are linearly independent, and not all ci ’s are zero. Indeed, by the construction of the yi ’s, we know y ∈/ Eλk . But (A − λI)w = 0, so y ∈ Eλ1 , which is a contradiction, since k ≥ 1. Example 2.8. Consider the matrix ⎡2 ⎢ ⎢−2 ⎢ A=⎢ ⎢−2 ⎢ ⎢−2 ⎣
−4 0 −2 −6
2 2⎤⎥ 1 3⎥⎥ ⎥. 3 3⎥⎥ 3 7⎥⎦
Then, the characteristic polynomial of A is (x − 2)^2 (x − 4)^2. A basis for E_2^1 is {(2, 1, 0, 2)^⊺, (0, 1, 2, 0)^⊺}, so there are two blocks of dimension one with 2 as an eigenvalue. To confirm this, check that E_2^m = E_2^1 for all m > 1. A basis for E_4^1 is {(0, 1, 1, 1)^⊺}, while a basis for E_4^2 is {(0, 1, 1, 1)^⊺, (1, 0, 0, 1)^⊺}. So we can take {(1, 0, 0, 1)^⊺} as a basis for E_4^2 / E_4^1. Then (A − 4I)(1, 0, 0, 1)^⊺ = (0, 1, 1, 1)^⊺, so our basis is then {(2, 1, 0, 2)^⊺, (0, 1, 2, 0)^⊺, (0, 1, 1, 1)^⊺, (1, 0, 0, 1)^⊺}. The matrix of the transformation with respect to this basis is
$$\begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 4 & 1 \\ 0 & 0 & 0 & 4 \end{bmatrix}.$$
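As a hedged sanity check (not part of the original text), one can verify numerically that the change of basis found in Example 2.8 produces exactly this block matrix; the columns of P below are the basis vectors above.

```python
import numpy as np

A = np.array([[ 2, -4, 2, 2],
              [-2,  0, 1, 3],
              [-2, -2, 3, 3],
              [-2, -6, 3, 7]], dtype=float)

# Columns are the basis vectors found in Example 2.8, in the same order.
P = np.array([[2, 0, 0, 1],
              [1, 1, 1, 0],
              [0, 2, 1, 0],
              [2, 0, 1, 1]], dtype=float)

print(np.round(np.linalg.inv(P) @ A @ P))   # 2 ⊕ 2 ⊕ J_2(4)
```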
An application of the Jordan canonical form
In many applications one has a linear system given schematically by the input-output (black box) relation x → y, where x, y ∈ R^n are column vectors with n coordinates, and y = Ax, where A ∈ R^{n×n}. If one repeats this procedure m times (closed loop), then x_m = A^m x, m = 1, 2, . . .. What does x_m look like when m is very big? This question is very natural in stationary Markov chains, which are nowadays very popular in many simulations and algorithms for hard problems in combinatorics and computer science. The answer to this problem is given by using the Jordan canonical form.
2.2 An application of diagonalizable matrices The sequence of Fibonacci numbers Fn , 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, . . . , is defined recursively as follows: One begins with F0 = 0 and F1 = 1. After that, every number is defined to be the sum of its two predecessors: Fn = Fn−1 + Fn−2 ,
for n > 1.
Johannes Kepler (1571–1630) observed that the ratio of consecutive Fibonacci numbers converges to the golden ratio $\frac{1+\sqrt{5}}{2}$.

Theorem 2.2.1 (Kepler). $\lim_{n\to\infty} \frac{F_{n+1}}{F_n} = \frac{1+\sqrt{5}}{2}$.
The purpose of this section is to use linear algebra tools to prove Kepler's theorem. In order to do this, we will need to find an explicit formula for the Fibonacci numbers. In particular, we need the following lemma. See also section 5.3.

Lemma 2.9. For n > 1, $F_n = \dfrac{(1+\sqrt{5})^n - (1-\sqrt{5})^n}{2^n\sqrt{5}}$.
Proof. Let T be the linear operator on R^2 represented by the matrix
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$$
with respect to the standard basis of R^2. Then, for the vector v_k whose coordinates are two consecutive Fibonacci numbers (F_k, F_{k−1})^⊺, we have that
$$T(v_k) = A \begin{bmatrix} F_k \\ F_{k-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} F_k \\ F_{k-1} \end{bmatrix} = \begin{bmatrix} F_k + F_{k-1} \\ F_k \end{bmatrix} = \begin{bmatrix} F_{k+1} \\ F_k \end{bmatrix} = v_{k+1}.$$
Therefore, $\begin{bmatrix} F_{n+1} \\ F_n \end{bmatrix} = A^n \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, and this equation helps us to calculate the powers A^n of A by using diagonalization. We start by finding the eigenvalues of A, which are
$$\lambda_1 = \frac{1+\sqrt{5}}{2} \quad \text{and} \quad \lambda_2 = \frac{1-\sqrt{5}}{2},$$
and we conclude that A is diagonalizable, as its eigenvalues are real and distinct. Next, we find the eigenvectors corresponding to λ_1 and λ_2. Solving the equations $A\begin{bmatrix}x_1\\y_1\end{bmatrix} = \lambda_1\begin{bmatrix}x_1\\y_1\end{bmatrix}$ and $A\begin{bmatrix}x_2\\y_2\end{bmatrix} = \lambda_2\begin{bmatrix}x_2\\y_2\end{bmatrix}$, we obtain $\begin{bmatrix}x_1\\y_1\end{bmatrix} = \begin{bmatrix}\lambda_1\\1\end{bmatrix}$ and $\begin{bmatrix}x_2\\y_2\end{bmatrix} = \begin{bmatrix}\lambda_2\\1\end{bmatrix}$. Therefore, the matrix of change of basis between the standard basis and the basis of eigenvectors is $P = \begin{bmatrix}\lambda_1 & \lambda_2\\ 1 & 1\end{bmatrix}$, and so $P^{-1}AP = \begin{bmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{bmatrix}$. This means
$$A^n = P\begin{bmatrix}\lambda_1^n & 0\\ 0 & \lambda_2^n\end{bmatrix}P^{-1}.$$
Altogether, we have
$$\begin{bmatrix} F_{n+1}\\ F_n\end{bmatrix} = A^n\begin{bmatrix}1\\0\end{bmatrix} = P\begin{bmatrix}\lambda_1^n & 0\\ 0 & \lambda_2^n\end{bmatrix}P^{-1}\begin{bmatrix}1\\0\end{bmatrix} = \frac{1}{\lambda_1-\lambda_2}\begin{bmatrix}\lambda_1 & \lambda_2\\ 1 & 1\end{bmatrix}\begin{bmatrix}\lambda_1^n & 0\\ 0 & \lambda_2^n\end{bmatrix}\begin{bmatrix}1 & -\lambda_2\\ -1 & \lambda_1\end{bmatrix}\begin{bmatrix}1\\0\end{bmatrix} = \frac{1}{\lambda_1-\lambda_2}\begin{bmatrix}\lambda_1^{n+1}-\lambda_2^{n+1}\\ \lambda_1^{n}-\lambda_2^{n}\end{bmatrix}.$$
Equating the entries of the vectors in the last formula we obtain
$$F_n = \frac{\lambda_1^n - \lambda_2^n}{\lambda_1-\lambda_2} = \frac{(1+\sqrt{5})^n - (1-\sqrt{5})^n}{2^n\sqrt{5}},$$
as claimed. ◻
We now are ready to prove Kepler's theorem by using the above lemma:
$$\lim_{n\to\infty}\frac{F_{n+1}}{F_n} = \lim_{n\to\infty}\frac{(1+\sqrt{5})^{n+1} - (1-\sqrt{5})^{n+1}}{2^{n+1}\sqrt{5}}\cdot\frac{2^n\sqrt{5}}{(1+\sqrt{5})^{n} - (1-\sqrt{5})^{n}} = \frac{1}{2}\lim_{n\to\infty}\frac{(1+\sqrt{5})^{n+1} - (1-\sqrt{5})^{n+1}}{(1+\sqrt{5})^{n} - (1-\sqrt{5})^{n}} = \frac{1}{2}\lim_{n\to\infty}\frac{(1+\sqrt{5}) - (1-\sqrt{5})\left(\frac{1-\sqrt{5}}{1+\sqrt{5}}\right)^{n}}{1 - \left(\frac{1-\sqrt{5}}{1+\sqrt{5}}\right)^{n}} = \frac{1+\sqrt{5}}{2}.$$
Note that
$$\lim_{n\to\infty}\left(\frac{1-\sqrt{5}}{1+\sqrt{5}}\right)^{n} = 0 \quad\text{because}\quad \left|\frac{1-\sqrt{5}}{1+\sqrt{5}}\right| < 1.$$
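For readers who want to experiment, here is a small, hedged NumPy sketch (not from the text) that computes F_n by raising the matrix A above to the nth power and checks that consecutive ratios approach (1 + √5)/2.

```python
import numpy as np

A = np.array([[1, 1],
              [1, 0]])

def fib(n):
    # [F_{n+1}, F_n]^T = A^n [1, 0]^T, the identity used in the proof of Lemma 2.9
    return int((np.linalg.matrix_power(A, n) @ np.array([1, 0]))[1])

print([fib(n) for n in range(1, 11)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(fib(31) / fib(30))                # ≈ 1.6180339887, close to the golden ratio
print((1 + 5 ** 0.5) / 2)
```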
2.2.1 Worked-out problems
1. Show that the Jordan block J_k(λ) ∈ F^{k×k} is similar to J_k(λ)^⊺.
Solution: Consider the k × k permutation matrix
$$P = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & 0 & \cdots & 0 \end{bmatrix}.$$
Then, P^{-1} J_k(λ) P = J_k(λ)^⊺ and so J_k(λ) ≈ J_k(λ)^⊺.
2. Assume that the characteristic polynomial of A splits in F. Show that A is similar to A^⊺.
Solution: Using Theorem 2.1.21, A is similar to JCF(A) = ⊕_i J_{k_i}(λ_i). Clearly, A^⊺ ≈ ⊕_i J_{k_i}(λ_i)^⊺. Using the previous problem, we conclude that ⊕_i J_{k_i}(λ_i) ≈ ⊕_i J_{k_i}(λ_i)^⊺, and this yields A ≈ A^⊺.
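The following hedged NumPy sketch (illustrative, not from the text) checks the first worked-out problem numerically for k = 4 and λ = 3: conjugating J_4(3) by the reversal permutation matrix yields its transpose.

```python
import numpy as np

k, lam = 4, 3.0
J = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)   # Jordan block J_k(lambda)
P = np.fliplr(np.eye(k))                            # reversal permutation; P is its own inverse

print(np.allclose(np.linalg.inv(P) @ J @ P, J.T))   # True
```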
2.2.2 Problems 1. Let V be a vector space over F. (You may assume that F = C.) Let T ∶ V → V be a linear transformation. Suppose that ui is an eigenvector of T with the corresponding eigenvalue λi , for i = 1, . . . , m. Show by induction on m that if λ1 , . . . , λm are m distinct scalars, then u1 , . . . , um are linearly independent. 2. Let A = [
0 1 ] ∈ R2×2 . −1 0
(a) Show that A is not diagonalizable over the real numbers R. (b) Show that A is diagonalizable over the complex numbers C. (c) Find U ∈ C2×2 and a diagonal Λ ∈ C2×2 such that A = U ΛU −1 . 3. Let A = ⊕ki=1 Jni (λi ). Show that det(zI −A) = ∏ki=1 (z −λi )ni . (You may use the fact that the determinant of an upper triangular matrix is the product of its diagonal entries.) 4. Let A = ⊕ki=1 Ai , where Ai ∈ Cni ×ni , i = 1, . . . , k. Show that det(zIn − A) = k ∏i=1 det(zIni − Ai ). (Hint: First show the identity for k = 2 using the determinant expansion by rows. Then use induction for k > 2.) 5. (a) Show that any eigenvector of Jn (λ) ∈ Cn×n is in the subspace spanned by e1 . Conclude that Jn (λ) is not diagonalizable unless n = 1. (b) What is the rank of zIn − Jn (λ) for a fixed λ ∈ C and for each z ∈ C? (c) What is the rank of zI − ⊕ki=1 Jni (λi ) for fixed λ1 , . . . , λk ∈ C and for each z ∈ C? 6. Let A ∈ Cn×n and assume that det(zIn −A) = z n +a1 z n−1 +⋯+an−1 z +an has n distinct complex roots. Show that An + a1 An−1 + ⋯ + an−1 A + an In = 0, where 0 ∈ Cn×n denotes the zero matrix. (This is a special case of the Cayley–Hamilton theorem, which states that the above identity holds for any A ∈ Cn×n ; see Theorem 2.3.2.) (Hint: Use the fact that A is diagonalizable.) 7. If A, B ∈ Fn×n , show that AB and BA have the same characteristic polynomial.
8. Let 1 ≤ m ≤ n ∈ N and suppose that A ∈ C^{m×n} and B ∈ C^{n×m}. Prove that det(zI − BA) = z^{n−m} det(zI − AB). (In particular, if m = n, then the characteristic polynomials of AB and BA are the same; see the previous problem.)
9. If A ∈ R^{n×n} is a symmetric matrix, show that all of its eigenvalues are real.
10. Let A ∈ F^{n×n} be an invertible matrix, and let p_A(z) and p_{A^{-1}}(z) denote the characteristic polynomials of A and A^{-1}, respectively. Show that $p_{A^{-1}}(z) = \frac{z^n}{p_A(0)}\, p_A\!\left(\frac{1}{z}\right)$.
11. If A ∈ R^{n×n} has n distinct eigenvalues λ_1, . . . , λ_n, show that $\mathbb{R}^n = \bigoplus_{i=1}^{n} E_{\lambda_i}$.
12. If A ∈ F^{n×n} is a diagonalizable matrix with the characteristic polynomial $p_A(z) = \prod_{i=1}^{k}(z - \lambda_i)^{d_i}$ and V = {B ∈ F^{n×n} : AB = BA}, prove that $\dim V = \sum_{i=1}^{k} d_i^2$.
13. Prove that a matrix is invertible if and only if it does not have a zero eigenvalue.
14. Let A be a square matrix with the block form $A = \begin{bmatrix} X & Y \\ 0 & Z \end{bmatrix}$, where X and Z are square matrices. Show that det A = det X det Z.
15. Assume that A is a doubly stochastic matrix. Show that for each eigenvalue λ of A, |λ| ≤ 1.
16. Show that the matrix $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \in \mathbb{R}^{2\times 2}$ has complex eigenvalues if θ is not a multiple of π. Give the geometric interpretation of it. (See also Problem 1.6.3-4.)
17. Let A ∈ F^{n×n}. Prove that (a) if λ is an eigenvalue of A, then λ^k is an eigenvalue of A^k, for any k ∈ N; (b) if λ is an eigenvalue of A and A is invertible, then λ^{−1} is an eigenvalue of A^{−1}.
18. Let k and n be positive integers with k < n. Assume that $A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \in F^{n\times n}$, where A_{11} ∈ F^{k×k} and A_{22} ∈ F^{(n−k)×(n−k)}. Prove that spec A = spec A_{11} ∪ spec A_{22}.
19. Let A ∈ R^{n×n} and assume that a + ib is a complex eigenvalue of A having the eigenvector x + iy, where x and y are real vectors. Prove that then a − ib is also an eigenvalue, with the eigenvector x − iy.
Historical note Camille Jordan (1838–1922) was a brilliant French mathematician. Jordan was interested in algebra and its application to geometry. In particular, he studied groups which are used to study symmetry—in Jordan’s case, the structure of crystals. In 1870, Jordan published his most important work, a summary of group theory and related algebraic notions in which he introduced the “canonical form” for matrices of transformations that now bears his name. His work contains the Jordan canonical form theorem for matrices, not over the complex numbers but over a finite field. Group theory was one of the major areas of mathematical research for 100 years following Jordan’s fundamental publication. Although given Jordan’s work on matrices and the fact that the Jordan canonical form is named after him, the Gauss–Jordan pivoting elimination method is not. The Jordan of Gauss–Jordan is Wilhelm Jordan (1842–1899), a German geodesist who conducted surveys in Germany and Africa and founded the German geodesy journal [27].
Camille Jordan (1838–1922).
2.3 Matrix polynomials

Let $P(z) = [p_{ij}(z)]_{i=j=1}^{m,n}$ be an m×n matrix whose entries are polynomials in F[z]. The set of all such m × n matrices is denoted by F[z]^{m×n}. Clearly, F[z]^{m×n} is a vector space over F, of infinite dimension. Given p(z) ∈ F[z] and P(z) ∈ F[z]^{m×n}, one can define p(z)P(z) := [p(z)p_{ij}(z)] ∈ F[z]^{m×n}. Again, this product satisfies nice distribution properties. Thus, F[z]^{m×n} is a module over the ring F[z]. (Note F[z] is not a field!) Let P(z) = [p_{ij}(z)] ∈ F[z]^{m×n}. Then, deg P(z) := max_{i,j} deg p_{ij}(z) = l. Write
$$p_{ij}(z) = \sum_{k=0}^{l} p_{ij,k}\, z^{l-k}, \qquad P_k := [p_{ij,k}]_{i=j=1}^{m,n} \in F^{m\times n} \text{ for } k = 0, \ldots, l.$$
Then
$$P(z) = P_0 z^{l} + P_1 z^{l-1} + \cdots + P_l, \qquad P_i \in F^{m\times n},\; i = 0, \ldots, l, \tag{2.3.1}$$
is a matrix polynomial with coefficients in F^{m×n}.
Assume that P(z), Q(z) ∈ F[z]^{n×n}. Then, we can define P(z)Q(z) ∈ F[z]^{n×n}. Note that in general, P(z)Q(z) ≠ Q(z)P(z). Hence, F[z]^{n×n} is a noncommutative ring. For P(z) ∈ F[z]^{n×n} of the form (2.3.1) and any A ∈ F^{n×n}, we define
$$P(A) = \sum_{i=0}^{l} P_i A^{l-i} = P_0 A^{l} + P_1 A^{l-1} + \cdots + P_l, \quad \text{where } A^0 = I_n.$$
Given two polynomials p, q ∈ F[z], one can divide p by q ≡/ 0 with the residue r, i.e., p = tq +r, for some unique t, r ∈ F[z], where deg r < deg q. One can trivially generalize that to polynomial matrices: Proposition 2.3.1. Let p(z), q(z) ∈ F[z] and assume that q(z) ≡/ 0. Let p(z) = t(z)q(z)+r(z), where t(z), r(z) ∈ F[z] are the unique polynomials with deg r(z) < deg q(z). Let n > 1 be an integer, and define the following scalar polynomials: P (z) ∶= p(z)In , Q(z) ∶= q(z)In , T (z) ∶= t(z)In , R(z) ∶= r(z)In ∈ F[z]n×n . Then, P (A) = T (A)Q(A) + R(A), for any A ∈ Fn×n . Proof. Since Ai Aj = Ai+j , for any nonnegative integer, with A0 = In , the equality P (A) = T (A)Q(A) + R(A) follows trivially from the equality p(z) = t(z)q(z) + r(z). ◻ Recall that p is divisible by q, denoted as q∣p, if p = tq, i.e., r is the zero polynomial. Note that if q(z) = (z − a), then p(z) = t(z)(z − a) + p(a). Thus, (z − a)∣p if and only if p(a) = 0. Similar results hold for square polynomial matrices, which are not scalar. Lemma 2.10. Let P (z) ∈ F[z]n×n , A ∈ Fn×n . Then, there exists a unique Tlef t (z), of degree degP − 1 if deg P > 0 or degree −∞ if deg P ≤ 0, such that P (z) = Tlef t (z)(zI − A) + P (A).
(2.3.2)
In particular, P(z) is divisible from the right by zI − A if and only if P(A) = 0.
Proof. We prove the lemma by induction on deg P. If deg P ≤ 0, i.e., P(z) = P_0 ∈ F^{n×n}, then T_{left} = 0, P(A) = P_0, and the lemma trivially holds. Suppose that the lemma holds for all P with deg P ≤ l − 1, where l ≥ 1. Let P(z) be of degree l ≥ 1 of the form (2.3.1). Then P(z) = P_0 z^l + \tilde{P}(z), where $\tilde{P}(z) = \sum_{i=1}^{l} P_i z^{l-i}$. By the induction assumption, $\tilde{P}(z) = \tilde{T}_{left}(z)(zI_n − A) + \tilde{P}(A)$, where $\tilde{T}_{left}(z)$ is unique. A straightforward calculation shows that
$$P_0 z^{l} = \hat{T}_{left}(z)(zI_n - A) + P_0 A^{l}, \quad \text{where } \hat{T}_{left}(z) = \sum_{i=0}^{l-1} P_0 A^{i} z^{l-i-1},$$
and Tˆlef t is unique. Hence, Tlef t (z) = Tˆlef t (z) + T˜lef t is unique, P (A) = P0 Al + P˜ (A) and (2.3.2) follows. Suppose that P (A) = 0. Then P (z) = Tlef t (z)(zI − A), i.e., P (z) is divisible by zIn − A from the right. Assume that P (z) is divisible by (zIn − A) from the right, i.e., there exists T (z) ∈ F[z]n×n such that P (z) = T (z)(zIn − A). Subtract (2.3.2) from P (z) = T (z)(zIn − A) to deduce that 0 = (T (z) − Tlef t (z))(zIn − A) − P (A). Hence, T (z) = Tlef t (z) and P (A) = 0. ◻
The above lemma can be generalized to any Q(z) = Q_0 z^l + Q_1 z^{l−1} + ⋯ + Q_l ∈ F[z]^{n×n}, where Q_0 ∈ GL(n, F); there exist unique T_{left}(z), R_{left}(z) ∈ F[z]^{n×n} such that
$$P(z) = T_{left}(z)Q(z) + R_{left}(z), \quad \deg R_{left} < \deg Q, \qquad Q(z) = \sum_{i=0}^{l} Q_i z^{l-i},\; Q_0 \in GL(n, F). \tag{2.3.3}$$
Here we agree that (Az i )(Bz j ) = (AB)z i+j , for any A, B ∈ Fn×n and nonnegative integers i, j. Theorem 2.3.2 (Cayley–Hamilton theorem). Let A ∈ Fn×n and p(z) = det(zIn − A) be the characteristic polynomial of A. Let P (z) = p(z)In ∈ F[z]n×n . Then, P (A) = 0. Proof. Let A(z) = zIn − A. Fix z ∈ F and let B(z) = [bij (z)] be the adjoint matrix of A(z), whose entries are the cofactors of A(z). That is, bij (z) is (−1)i+j times the determinant of the matrix obtained from A(z) by deleting row j and column i. If one views z as indeterminate, then B(z) ∈ F[z]n×n . Consider the identity A(z)B(z) = B(z)A(z) = det A(z)In = p(z)In = P (z). Hence, (zIn − A) divides P (z) from the right. Lemma 2.10 yields that P (A) = 0. ◻ For p, q ∈ F[z], let (p, q) denote the greatest common divisor of p, q. If p and q are identically zero, then (p, q) is the zero polynomial. Otherwise (p, q) is a polynomial s of the highest degree that divides p and q. Also, s is determined up to a multiple of a nonzero scalar and it can be chosen as a unique monic polynomial: s(z) = z l + s1 z l−1 + ⋯ + sl ∈ F[z]. (2.3.4) Corollary 2.3.3. Let p, q ∈ F[z] be coprime. Then, there exist u, v ∈ F[z] such that 1 = up + vq. Let n > 1 be an integer and define P (z) ∶= p(z)In , Q(z) ∶= q(z)In , U (z) ∶= u(z)In , V (z) ∶= v(z)In ∈ F[z]n×n . Then, for any A ∈ Fn×n , we have the identity In = U (A)P (A) + V (A)Q(A), where U (A)P (A) = P (A)U (A) and V (A)Q(A) = Q(A)V (A). Let us consider that case where p and q ∈ F[z] are both nonzero polynomials that split (to linear factors) over F. Thus, p(z) = p0 (z − α1 ) . . . (z − αi ), p0 ≠ 0,
q(z) = q0 (z − β1 ) . . . (z − βj ), q0 ≠ 0.
In that case (p, q) = 1, if p and q do not have a common root. If p and q have a common zero, then (p, q) is a nonzero polynomial that has the maximal number of common roots of p and q counting with multiplicities. From now on, for any p ∈ F[z] and A ∈ Fn×n , we identify p(A) with P (A), where P (z) = p(z)In . Definition 2.3.4. For an eigenvalue λ, its algebraic multiplicity is the multiplicity of λ as a root of the characteristic polynomial and it is denoted by ma (λ).
The geometric multiplicity of λ is the maximal number of linearly independent eigenvectors corresponding to it, and it is denoted by m_g(λ). Also, λ is called an algebraically simple eigenvalue if its algebraic multiplicity is 1.

Example 2.11. Consider the matrix $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Let us find its eigenvalues and eigenvectors. The characteristic polynomial is (1 − x)^2. So λ = 1 is a double root. Eigenvectors corresponding to this eigenvalue follow:
$$\begin{bmatrix} 1-\lambda & 1 \\ 0 & 1-\lambda \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
So x_2 = 0, and x_1 can be anything. There is only one linearly independent eigenvector: $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Consider now the identity matrix
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Then, in the same way, λ = 1 is a double root of the characteristic polynomial and the only eigenvalue. But any nonzero vector can serve as its eigenvector, because for any x ∈ R^2 we have Ax = x = 1 ⋅ x. So, it has two linearly independent eigenvectors, $\begin{bmatrix}1\\0\end{bmatrix}$ and $\begin{bmatrix}0\\1\end{bmatrix}$. Here, for both matrices λ = 1 is the only eigenvalue, with algebraic multiplicity 2. But its geometric multiplicity is 1 for $\begin{bmatrix}1&1\\0&1\end{bmatrix}$ and 2 for $\begin{bmatrix}1&0\\0&1\end{bmatrix}$.

Theorem 2.3.5. Let A ∈ F^{n×n}, and let λ be an eigenvalue of A. Then m_g(λ) ≤ m_a(λ).
Proof. Set m = m_g(λ) and let {x_1, . . . , x_m} be a basis for E_λ. Extend it to a basis {x_1, . . . , x_n} for F^n. Let P ∈ F^{n×n} be the invertible matrix with column vectors x_1, . . . , x_n and set B = P^{−1}AP. Then, for each 1 ≤ j ≤ m, we have Be_j = P^{−1}AP e_j = P^{−1}Ax_j = P^{−1}λx_j = λ(P^{−1}x_j) = λe_j, so B has the block form
$$B = \begin{bmatrix} \lambda I_m & X \\ 0 & Y \end{bmatrix}.$$
Using Problem 2.2.2-14 and the fact that similar matrices have the same characteristic polynomials, we have
$$p_A(t) = p_B(t) = \det\begin{bmatrix} (t-\lambda)I_m & -X \\ 0 & tI_{n-m} - Y \end{bmatrix} = \det((t-\lambda)I_m)\det(tI_{n-m} - Y) = (t-\lambda)^m p_Y(t).$$
This shows that λ appears at least m times as a root of p_A(t), and hence m_a(λ) ≥ m. ◻
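Before the historical note, here is a hedged numerical illustration (not part of the text) of the Cayley–Hamilton theorem, Theorem 2.3.2: the coefficients of det(zI − A) are obtained with numpy.poly and the resulting polynomial is evaluated at A.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)   # a random 4x4 integer matrix

c = np.poly(A)    # coefficients of det(zI - A), leading coefficient first (equal to 1)
n = A.shape[0]
P = sum(c[i] * np.linalg.matrix_power(A, n - i) for i in range(n + 1))

# p(A) should vanish; a small tolerance absorbs floating-point error from np.poly.
print(np.allclose(P, 0, atol=1e-6))   # True
```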
Historical note Arthur Cayley (1821–1895) was a British lawyer and mathematician. Of the many hundreds of mathematics papers Cayley published during his career, the one of greatest interest here is his Memoir on the Theory of Matrices, which was published in 1858 while Cayley was practicing law. He investigated different topics such as analytical geometry of n dimensions, theory of determinants, theory of matrices, and skew surfaces. The term matrix was coined by Cayley’s friend and colleague, James Joseph Sylvester (1814–1897). Many authors referred to what we now call a matrix as an “array” or “tableau.” Before its mathematical definition came along, the word matrix was used to describe “something which surrounds, within which something is contained.” It was a perfect word for this new object. In 1878, Sylvester read Cayley’s Memoir and adopted the use of the word matrix. Of Cayley’s work on matrices, Richard Feldmann writes: “Although the term ‘matrix’ was introduced into mathematical literature by James Joseph Sylvester in 1850, the credit for founding the theory of matrices must be given to Arthur Cayley, since he published the first expository articles on the subject. . . . Cayley’s introductory paper in matrix theory was written in French and published in a German periodical [in 1855]. . . . [He] introduces, although quite sketchily, the ideas of inverse matrix and of matrix multiplication, or ‘compounding’ as Cayley called it.”
James Joseph Sylvester (1814–1897).
Arthur Cayley (1821–1895).
2.4 Minimal polynomial and decomposition to invariant subspaces

Recall that F^{n×n} is a vector space over F of dimension n^2. Let A ∈ F^{n×n} and consider the powers A^0 = I_n, A, A^2, . . . , A^m. Let m be the smallest positive integer such that these m + 1 matrices are linearly dependent as vectors in F^{n×n}. (Note that A^0 ≠ 0.) Thus, there exists (b_0, . . . , b_m)^⊺ ∈ F^{m+1} for which (b_0, . . . , b_m)^⊺ ≠ 0 and $\sum_{i=0}^{m} b_i A^{m-i} = 0$. If b_0 = 0, then A^0, . . . , A^{m−1} are linearly dependent, which
contradicts the definition of m. Hence, b0 ≠ 0. Divide the linear dependence by m−i b0 to obtain ∑m = 0, where ai = bb0i , 0 ≤ i ≤ m. i=0 ai A m m Define ψ(z) = z + ∑i=0 ai z m−i ∈ F[z]. Then, ψ(A) = 0. Here, ψ is called the minimal polynomial of A. In principle m ≤ n2 , but in fact m ≤ n. In the rest of this book, we will use the notations pA (z) and ψA (z) for the characteristic polynomial and minimal polynomial of A, respectively, or simply p(z) and ψ(z) if there is no ambiguity about A. Theorem 2.4.1. Let A ∈ Fn×n and ψ(z) be its minimal polynomial. Assume that p(z) ∈ F[z] is an annihilating polynomial of A, i.e., p(A) = 0n×n . Then, ψ divides p. In particular, the characteristic polynomial p(z) = det(zIn − A) is divisible by ψ(z). Hence, deg ψ ≤ deg p = n. Proof. Divide the annihilating polynomial p by ψ to obtain p(z) = t(z)ψ(z) + r(z), where deg r < deg ψ = m. Proposition 2.3.1 yields that p(A) = t(A)ψ(A) + r(A), which implies that r(A) = 0. Assume that l = deg r(z) ≥ 0, i.e., r is not identically the zero polynomial. Therefore, A0 , . . . , Al are linearly dependent, which contradicts the definition of m. Hence, r(z) ≡ 0. The Cayley–Hamilton theorem yields that the characteristic polynomial p(z) of A annihilates A. Hence, ψ∣p and deg ψ ≤ deg p = n. ◻ Comparing the minimal polynomial of an algebraic element in a field extension with the minimal polynomial of a matrix, we can see their common features. For example, if E/F is a field extension and we denote the minimal polynomial of an algebraic element α ∈ E by m(z) and if we denote the minimal polynomial of A ∈ Fn×n by ψ(z), then both m(z) and ψ(z) are annihilators for α and A, respectively, and both are nonzero polynomials of minimum degree with this character. According to the definition, m(z) is irreducible over F. Now, the question of whether ψ(z) is necessarily irreducible over F too can be asked. We will answer this question in Worked-out Problem 2.4.1-6. The following theorem gives a method to compute the minimal polynomial of a matrix by using its characteristic polynomial. Theorem 2.4.2. Let A ∈ Fn×n . Denote by q(z) the g.c.d., (monic polynomial), of all the entries of adj (zIn −A) ∈ F[z]n×n , which is the g.c.d. of all (n−1)×(n−1) minors of (zIn − A). Then, q(z) divides the characteristic polynomial p(z) = det(zIn − A) of A. Furthermore, the minimal polynomial of A is equal to p(z) . q(z) Proof. Expand det(zIn −A) by the first column. Since q(z) divides each (n−1)× (n − 1) minor of zIn − A, we deduce that q(z) divides p(z). Let φ(z) = p(z) . As q(z) q(z) divides each entry of adj (zIn −A), we deduce that adj (zIn −A) = q(z)C(z), for some C(z) ∈ F[z]n×n . Divide the equality p(z)In = adj (zIn − A)(zIn − A) by q(z) to deduce that φ(z)In = C(z)(zIn − A). Lemma 2.10 yields that φ(A) = 0. Let ψ(z) be the minimal polynomial of A. Theorem 2.4.1 yields that ψ divides φ. We now show that φ divides ψ. Theorem 2.4.1 implies that p(z) = s(z)ψ(z), for some monic polynomial s(z). Since ψ(A) = 0, Lemma 2.10 yields that ψ(z)In = D(z)(zIn − A), for some D(z) ∈ Fn×n [z]. Thus, p(z)In = s(z)ψ(z)In = s(z)D(z)(zIn − A). Since p(z)In = adj (zIn − A)(zIn − A), we conclude that s(z)D(z) = adj (zIn − A). As all the entries of D(z) are polynomials, it follows
that s(z) divides all the entries of adj(zI_n − A). Since q(z) is the g.c.d. of all entries of adj(zI_n − A), we deduce that s(z) divides q(z). Consider the equality p(z) = s(z)ψ(z) = q(z)φ(z). Thus, ψ(z) = (q(z)/s(z)) φ(z). Hence, φ(z) divides ψ(z). As ψ(z) and φ(z) are monic, we deduce that ψ(z) = φ(z). ◻

Example 2.12. Let A = J_3(4) ⊕ J_1(2). Then its characteristic polynomial is p(z) = det(zI_n − A) = (z − 4)^3(z − 2). Therefore,
$$\operatorname{adj}(zI_n - A) = \begin{bmatrix} (z-4)^2(z-2) & (z-4)(z-2) & (z-2) & 0 \\ 0 & (z-4)^2(z-2) & (z-4)(z-2) & 0 \\ 0 & 0 & (z-4)^2(z-2) & 0 \\ 0 & 0 & 0 & (z-4)^3 \end{bmatrix}.$$
Then q(z) = 1 and, using Theorem 2.4.2, ψ(z) = p(z)/q(z) = p(z).
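Theorem 2.4.2 can also be reproduced symbolically; the following hedged SymPy sketch (not from the text) recomputes p(z), q(z), and ψ(z) for the matrix of Example 2.12.

```python
from functools import reduce
import sympy as sp

z = sp.symbols('z')

# A = J_3(4) ⊕ J_1(2), written out explicitly
A = sp.Matrix([[4, 1, 0, 0],
               [0, 4, 1, 0],
               [0, 0, 4, 0],
               [0, 0, 0, 2]])

M = z * sp.eye(4) - A
p = sp.factor(M.det())                        # characteristic polynomial
q = sp.factor(reduce(sp.gcd, M.adjugate()))   # gcd of all entries of adj(zI - A)
psi = sp.factor(sp.cancel(p / q))             # minimal polynomial, by Theorem 2.4.2

print(p, q, psi, sep='\n')                    # (z-4)^3 (z-2),  1,  (z-4)^3 (z-2)
```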
Proposition 2.4.3. Let A ∈ Fn×n and assume that λ ∈ F is an eigenvalue of A with the corresponding eigenvector x ∈ Fn . Then, for any h(z) ∈ F[z], h(A)x = h(λ)x. In particular, λ is a root of the minimal polynomial ψ(z) of A, i.e., ψ(λ) = 0. Proof. Clearly, Am x = λm x. Hence, h(A)x = h(λ)x. Assume that h(A) = 0. As x ≠ 0, we deduce that h(λ) = 0. Then, ψ(λ) = 0. ◻ Combining Theorem 2.4.1 and Proposition 2.4.3, we conclude that the minimal polynomial and the characteristic polynomial of a matrix have the same roots (probably with the different multiplicities). Definition 2.4.4. A matrix A ∈ Fn×n is called nonderogatory if the minimal polynomial of A is equal to its characteristic polynomial. Definition 2.4.5. Let V be a finite-dimensional vector space over F, and assume that V1 , . . . , Vi are nonzero subspaces of V. Then, V is a direct sum of V1 , . . . , Vi , denoted as V = ⊕ij=1 Vj if any vector v ∈ V has a unique representation as v = v1 + ⋯ + vi , where vj ∈ Vj , for j = 1, . . . , i. Equivalently, let [vj1 , . . . , vjlj ] be a basis of Vj , for j = 1, . . . , i. Then, dim V = ∑ij=1 dim Vj = i ∑j=1 lj and the dim V vectors v11 , . . . , v1l1 , . . . , vi1 , . . . , vili are linearly independent. Let T ∶ V → V be a linear operator. A subspace U of V is called a T -invariant subspace, or simply an invariant subspace when there is no ambiguity about T , if T (u) ∈ U, for each u ∈ U. We denote this fact by T U ⊆ U. Denote by T ∣U the restriction of T to the invariant subspace of T . Clearly, T ∣U is a linear operator on U. Note that V and the zero subspace {0} are invariant subspaces. Those are called trivial invariant subspaces. The subspace U is called a nontrivial invariant subspace if it is an invariant subspace such that 0 < dim U < dim V. Since the representation matrices of T in different bases form a similarity class, we can define the minimal polynomial ψ(z) ∈ F[z] of T , as the minimal polynomial of any representation matrix of T. Equivalently, ψ(z) is the monic polynomial of the minimal degree which annihilates T ; ψ(T ) = 0.
Our objective now is to use the notions of direct sum and invariant subspace in order to find a basis of V such that the matrix relative to this basis has a particularly useful form. The key to this study is the following fundamental results. Theorem 2.4.6 (primary decomposition theorem). Let T ∶ V → V be a linear operator on a nonzero finite-dimensional vector space V. Let ψ(z) be the minimal polynomial of T . Assume that ψ(z) decomposes to ψ(z) = ψ1 (z) . . . ψk (z), where each ψi (z) is a monic polynomial of degree at least 1. Suppose, furthermore, that for each pair i ≠ j, ψi (z) and ψj (z) are coprime. Then, V is a direct sum of V1 , . . . , Vk , where each Vi is a nontrivial invariant subspace of T . Furthermore, the minimal polynomial of T ∣Vi is equal to ψi (z), for i = 1, . . . , k. Moreover, each Vi is uniquely determined by ψi (z) for i = 1, . . . , k. Proof. We prove the theorem by induction on k ≥ 2. Let k = 2. Then, ψ(z) = ψ1 (z)ψ2 (z). Let V1 ∶= ψ2 (T )V and V2 = ψ1 (T )V be the images of the operators ψ2 (T ), ψ1 (T ), respectively. Observe that T V1 = T (ψ2 (T )V) = (T ψ2 (T ))V = (ψ2 (T )T )V = ψ2 (T )(T V) ⊆ ψ2 (T )V = V1 . Thus, V1 is a T-invariant subspace. Assume that V1 = {0}. This is equivalent to ψ2 (T ) = 0. By Theorem 2.4.1, ψ divides ψ2 , which is impossible since deg ψ = deg ψ1 + deg ψ2 > deg ψ1 . Thus, dim V1 > 0. Similarly, V2 is a nonzero T invariant subspace. Let Ti = T ∣Vi , for i = 1, 2. Clearly, ψ1 (T1 )V1 = ψ1 (T )V1 = ψ1 (T )(ψ2 (T )V) = (ψ1 (T )ψ2 (T ))V = {0}, since ψ is the minimal polynomial of T . Hence, ψ1 (T1 ) = 0, i.e., ψ1 is an annihilating polynomial of T1 . Similarly, ψ2 (T2 ) = 0. Let U = V1 ∩ V2 . Then, U is an invariant subspace of T . We claim that U = {0}, i.e., dim U = 0. Assume to the contrary that dim U ≥ 1. Let Q ∶= T ∣U and denote by φ ∈ F[z] the minimal polynomial of Q. Clearly, deg φ ≥ 1. Since U ⊆ Vi , it follows that ψi is an annihilating polynomial of Q for i = 1, 2. Hence, φ∣ψ1 and φ∣ψ2 , i.e., φ is a nontrivial factor of ψ1 and ψ2 . This contradicts the assumption that ψ1 and ψ2 are coprime. Hence, V1 ∩ V2 = {0}. Since (ψ1 , ψ2 ) = 1, there exist polynomials f, g ∈ F[z] such that ψ1 f +ψ2 g = 1. Hence, I = ψ1 (T )f (T ) + ψ2 (T )g(T ), where I is the identity operator Iv = v on V. In particular, for any v ∈ V we have v = v2 + v1 , where v1 = ψ2 (T )(g(T )v) ∈ V1 , v2 = ψ1 (T )(f (T )v) ∈ V2 . Since V1 ∩ V2 = {0}, it follows that V = V1 ⊕ V2 . Let ψ˜i be the minimal polynomial of Ti . Then, ψ˜i ∣ψi for i = 1, 2. Hence, ψ˜1 ψ˜2 ∣ψ1 ψ2 . Let v ∈ V. Therefore, v = v1 + v2 , where vi ∈ Vi , i = 1, 2. Using the facts that ψ˜1 (T )ψ˜2 (T ) = ψ˜2 (T )ψ˜1 (T ), ψ˜i is the minimal polynomial of Ti , and the definition of Ti , we deduce ψ˜1 (T )ψ˜2 (T )v = ψ˜2 (T )ψ˜1 (T )v1 + ψ˜1 (T )ψ˜2 (T )v2 = 0. Hence, the monic polynomial θ(z) ∶= ψ˜1 (z)ψ˜2 (z) is an annihilating polynomial of T . Thus ψ(z)∣θ(z), which implies that ψ(z) = θ(z), hence ψ˜i = ψ˜ for i = 1, 2. ¯ i ∶= {v ∈ V ∶ ψi (T )v = 0}, It is left to show that V1 and V2 are unique. Let V ¯ for i = 1, 2. Therefore, Vi is a subspace that contains Vi , for i = 1, 2. If
ψi (T )v = 0, then ψi (T )(T (v)) = (ψi (T )T )v = (T ψi (T )v) = T (ψi (T )v) = T 0 = 0. ¯ i is T -invariant subspace. We claim that V ¯ i = Vi . Suppose to the Hence V ¯ i > dim Vi , for some i ∈ {1, 2}. Let j ∈ {1, 2} and j ≠ i. contrary that dim V ¯ i ∩ Vj ) ≥ 0. As before, we conclude that U ∶= V ¯ i ∩ Vj is T -invariant Then, dim(V subspace. As above, the minimal polynomial of T ∣U must divide ψ1 (z) and ψ2 (z), which contradicts the assumption that (ψ1 , ψ2 ) = 1. This concludes the proof of the theorem for k = 2. Assume that k ≥ 3. Let ψˆ2 ∶= ψ2 . . . ψk . Then, (ψ1 , ψˆ2 ) = 1 and ψ = ψ1 ψˆ2 . ˆ 2 , where T ∶ V1 → V1 has the minimal polynomial ψ1 , and Thus, V = V1 ⊕ V ˆ2 → V ˆ 2 has the minimal polynomial ψˆ2 . Note that V1 and V ˆ 2 are unique. T ∶V to deduce the theorem. ◻ Apply the induction hypothesis for T ∣V ˆ2
2.4.1 Worked-out problems 1. Let A ∈ Fn×n . Show that A and A⊺ have the same minimal polynomial. Solution: If ψ(z) denotes the minimal polynomial of A, since ψ(A⊺ ) = ψ(A)⊺ , then ψ(z) is the minimal polynomial of A⊺ as well. 2. Let A ∈ Cn×n and A2 = −A. (a) List all possible eigenvalues of A. (b) Show that A is diagonalizable. (c) Assume that B 2 = −A. Is B diagonalizable? Solution: (a) If λ is an eigenvalue of A, then Au = λu, for some nonzero u ∈ Cn . Then, A2 u = λ2 u = −Au = −λu and so (λ2 + λ)u = 0. This means λ(λ + 1) = 0 and then λ1 = 0 and λ2 = −1 are possible eigenvalues. (b) Clearly, p(z) = z 2 + z is an annihilating polynomial of A. Using Theorem 2.3.2, its minimal polynomial is either z 2 + z or z or z + 1; neither of them has simple roots. Hence, by the next worked-out problem, A is diagonalizable. (c) No. Obviously A = [00 B = [00
1 0].
0 0]
satisfies A2 = −A = 0 and B 2 = −A = 0, where
But since pB (z) = det(zI − B) = z 2 and ψB (z)∣pB (z),
then ψB (z) is either z or z 2 and clearly ψB (B) = 0 if ψB (z) = z. Then, ψB (z) = z 2 and by the next worked-out problem, B is not diagonalizable. 3. If A ∈ Fn×n , show that A is diagonalizable over F if and only if det(zI − A) splits to linear factors over F and its minimal polynomial has simple roots. Solution: First, assume that A is diagonalizable, so all eigenvalues of A are in F and Fn is the direct sum of the eigenvectors for A. We want to show
the minimal polynomial of A in F[z] splits and has distinct roots. Let m
λ1 , . . . , λm be the different eigenvalues of A, so V = ⊕ Eλi . We will show i=1
m
that f (z) = ∏ (z − λi ) ∈ F[z] is the minimal polynomial of A in F[z]. By i=1
hypothesis, the eigenvectors of A span Fn . Let v be an eigenvector, say, Av = λv. Then, A − λ annihilates v. The matrices (A − λi I)’s commute with each other and one of them annihilates v, so their product f (A) annihilates v. Thus, f (A) annihilates the span of the eigenvectors, which is Fn , so f (A) = 0. The minimal polynomial is therefore a factor f (z). At the same time, each root of f (z) is an eigenvalue of A and so is a root of the minimal polynomial of A. Since the roots of f (z) each occur once, f (z) must be the minimal polynomial of A. Conversely, assume that ψA (z) = (z − λ1 )⋯(z − λm ) is the minimal polynomial of A. Then, λi ’s are the eigenvalues of A which are distinct. For any eigenvalue λi , let Eλi = {v ∈ Fn ; Av = λi v} be the corresponding m
eigenspaces. We will show that Fn = ⊕ Eλi . According to Worked-out i=1
Problem 2.4.1-5, the eigenvectors with different eigenvalues are linearly m independent, so it suffices to show Fn = ⊕ Eλi , as the sum will then i=1
be direct by linear independence. h1 (z), . . . , hm (z) in F[z] such that
Now, we want to find polynomials
m
1 = ∑ hi (z), hi (z) ≡ 0 (mod i=1
ψA ). z − λi
(2.4.1)
The congruence condition implies the polynomial (z − λi )hi (z) is divisible by ψA (z). If we substitute the matrix A for z in (2.4.1) and apply both sides to any vector v ∈ Fn , we get v = ∑m i=1 hi (A)(v), (A − λi I)hi (A)(v) = 0.
(2.4.2)
The second equation says that hi (A)(v) lies in Eλi and the first equation says v is a sum of such eigenvectors, so Fn = ⊕m i=1 Eλi . For each Eλi one obtains an eigenbasis for A. Now we find hi (z)’s fitting (2.4.2). For 1 < i < m, let fi (z) = ψA (z)/(z −λi ) = ∏ (z −λi ). Since λi ’s are distinct, the j≠i
polynomials f1 (z), . . . , fm (z) are relatively prime as an m-tuple, so some F[z]-linear combination of them equals (2.4.1): m
1 = ∑ gi (z)fi (z),
(2.4.3)
i=1
where gi (z) ∈ F[z]. Let hi (z) = gi (z)fi (z). By now, we have proved m
Fn = ⊕ Eλi . Thus, Fn has a basis of eigenvectors for A, and by Proposition i=1
2.1.11, A is diagonalizable. 4. Let A ∈ Cn×n be of finite order, i.e., Am = In , for some positive integer m. Then, A is diagonalizable.
Solution: Since Am = In , A is annihilated by z m − 1. Then, the minimal polynomial of A is a factor of z m − 1. The polynomial z m − 1 has a distinct root in C. The previous problem completes the proof. 5. Show that eigenvectors for distinct eigenvalues are linearly independent. Solution: Assume that A ∈ Fn×n and v1 , . . . , vm are eigenvectors of A with distinct eigenvalues λ1 , . . . , λm , we show that vi ’s are linearly independent by induction on m. The case m = 1 is obvious. If m > 1, suppose ∑m i=1 ci vi = 0, for some ci ∈ F. Since Avi = λi vi and A ∑m i=1 ci vi = 0, then m
∑ ci λi vi = 0.
(2.4.4)
i=1
Multiplying the linear relation ∑m i=1 ci vi = 0 by λm , we get m
∑ ci λm vi = 0.
(2.4.5)
i=1
Subtracting (2.4.5) from (2.4.4), we get ∑m−1 i=1 ci (λi − λm )vi = 0. By induction hypothesis, ci (λi − λm ) = 0, for i = 1, . . . , m − 1. Since λ1 , . . . , λm−1 , λm are distinct, λi −λm ≠ 0, for i = 1, . . . , m−1. Then, ci = 0, for i = 1, . . . , m−1. Thus the linear relation ∑m i=1 ci vi = 0 becomes cm vm = 0. The vector vm is nonzero, then cm = 0. 6. Show that the minimal polynomial of a matrix A ∈ Fn×n is not necessarily irreducible over F. Solution: If A = [00
1 0]
∈ R2×2 , then ψA (z) = z 2 which is reducible over R.
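A hedged NumPy check (not from the text) of the simple-roots criterion from the third worked-out problem, applied to the two matrices of Example 2.11: both have characteristic polynomial (z − 1)^2, but only the identity is annihilated by z − 1, so only the identity has a minimal polynomial with simple roots and only the identity is diagonalizable.

```python
import numpy as np

J = np.array([[1.0, 1.0], [0.0, 1.0]])   # geometric multiplicity 1
I2 = np.eye(2)                            # geometric multiplicity 2

for name, A in (("J", J), ("I", I2)):
    killed_by_linear = np.allclose(A - I2, 0)   # is z - 1 already annihilating?
    psi = "z - 1 (simple root)" if killed_by_linear else "(z - 1)^2 (repeated root)"
    print(name, "has minimal polynomial", psi)
# Only the identity is diagonalizable, in agreement with the criterion above.
```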
2.4.2 Problems 1. Let A, B ∈ Fn×n and p(z) ∈ F[z]. Show the following statements: (a) If B = U AU −1 , for some U ∈ GL(n, F), then p(B) = U p(A)U −1 . (b) If A≈B, then A and B have the same minimal polynomial. (c) Let Ax = λx. Then, p(A)x = p(λ)x. Deduce that each eigenvalue of A is a root of the minimal polynomial of A. (d) Assume that A has n distinct eigenvalues. Then, A is nonderogatory. 2. (a) Show that the Jordan block Jk (λ) ∈ Fk×k is nonderogatory. (b) Let λ1 , . . . , λk ∈ F be k distinct elements. Let i A = ⊕k,l i=j=1 Jmij (λi ), where mi = mi1 ≥ ⋯ ≥ mili ≥ 1, for i = 1, . . . , k. (2.4.6) Here mij and li are positive integers. Find the minimal polynomial of A when A is nonderogatory.
3. Find the characteristic and the minimal polynomials of ⎡ 2 ⎢ ⎢ −4 ⎢ C ∶= ⎢ ⎢ 1 ⎢ ⎢ 2 ⎣ 4. Let A ∶= [ux
y v ].
2 −2 4 ⎤⎥ −3 4 −6 ⎥⎥ ⎥. 1 −1 2 ⎥⎥ 2 −2 4 ⎥⎦
Then, A is a point in the four-dimensional real vector
space R4 . (a) What is the condition that A has a multiple eigenvalue (det(zI2 −A) = (z − λ)2 ) ? Conclude that the set (variety) of all 2 × 2 matrices with a multiple eigenvalue is a quadratic hypersurface in R4 , i.e., it satisfies a polynomial equation in (x, y, u, v) of degree 2. Hence, its dimension is 3. (b) What is the condition that A has a multiple eigenvalue and it is a diagonalizable matrix? Show that this is a line in R4 . Hence, its dimension is 1. (c) Conclude that the set (variety) of 2 × 2 matrices which have multiple eigenvalues and are diagonalizable is “much smaller” than the set of 2 × 2 matrices with multiple eigenvalues. This fact holds for any n × n matrices in Rn×n or Cn×n . 5. Show that A ∈ Fn×n is diagonalizable if and only if mg (λ) = ma (λ), for any eigenvalue λ of A. 6. Let A ∈ Fn×n and assume that λ is an eigenvalue of it. Show that the multiplicity of λ is at least the dimension of the eigenspace Eλ . 7. Programming problem. Spectrum and pseudospectrum: Let A = [aij ]ni,j=1 ∈ Cn×n . Then, det(zIn − A) = (z − λ1 )⋯(z − λn ) and the spectrum of A is given as spec A ∶= {λ1 , . . . , λn }. In computations, the entries of A are known or given up to a certain precision. Say, in regular precision, each aij is known with precision to eight digits: a1 ⋅ a2 ⋯a8 × 10m for some integer, e.g., 1.2345678 × 10−12 , in floating point notation. Thus, with a given matrix A, we associate a whole class of matrices C(A) ⊂ Cn×n of matrices B ∈ Cn×n that are represented by A. The pseudospectrum of A is the union of all the spectra of B ∈ C(A): pspec A ∶= ∪B∈C(A) spec B. Note that spec A and pspec A are subsets of the complex plane C and can be easily plotted by computer. The shape of pspec A gives an idea of our real knowledge of the spectrum of A, and to changes of the spectrum of A under perturbations. The purpose of this programming problem is to give the reader a taste of this subject. In all computations, use double precision. (a) Choose at random A = [aij ] ∈ R5×5 as follows: Each entry aij is chosen at random from the interval [−1, 1], using uniform distribution. Find the spectrum of A and plot the eigenvalues of A on the X − Y plane as complex numbers, marked as +, where the center of + is at each eigenvalue.
i. For each = 0.1, 0.01, 0.0001, 0.000001, do the following: For i = 1, . . . , 100, choose Bi ∈ R5×5 at random as A in item (a) and find the spectrum of A + Bi . Plot these spectra, each eigenvalue of A + Bi plotted as on the X − Y plane, together with the plot of the spectrum of A. (You will have 4 graphs in total.) (b) Let A ∶= diag[0.1C, [−0.5]], i.e., A ∈ R5×5 be a block diagonal matrix where the first 4 × 4 block is 0.1C, where the matrix C is given in part (i) of (a) above with this specific A. (Again, you will have 4 graphs.) (c) Repeat (a) by choosing at random a symmetric matrix A = [aij ] ∈ R5×5 . That is, choose at random aij for 1 ≤ i ≤ j, and let aji = aij for i < j. i. Repeat part (i) of (a). (Bj are not symmetric!) You will have 4 graphs. ii. Repeat part (i) of (a), with the restriction that each Bj is a random symmetric matrix, as explained in (c). You will have 4 graphs. (d) Can you draw some conclusions about these numerical experiments? 8. Assume that A, B ∈ Fn×n are similar and A = P BP −1 . Let Eλ (A) be the λ-eigenspace for A and Eλ (B) be the λ-eigenspace for B. Prove that Eλ (B) = P −1 Eλ (A), i.e., Eλ (B) = {P −1 x ∶ x ∈ Eλ (A)}. Note that although similar matrices have the same eigenvalues (Workedout Problem 1.12.1-1), they do not usually have the same eigenvectors or eigenspaces. The above problem states that there is a precise relationship between the eigenspaces of similar matrices. 9. Let A ∈ Fn×n and rank A = k. Show that the degree of the minimal polynomial of A is at most k + 1. 10. Let A ∈ Fn×n . Prove that A is invertible if and only if ψA (0) ≠ 0. 11. For each θ ∈ R, show that all matrices of the form A(θ) = [
cos2 θ cos θ sin θ
cos θ sin θ ] ∈ R2×2 sin2 θ
are nonderogatory. 12. Prove or disprove the following statement: If T ∈ L(Rn ) and n > 1, there exists a T -invariant subspace U of Rn such that dim U = 2.
2.5 Quotient of vector spaces and induced linear operators Similar to the concepts of kernel, image, and cokernel of a group homomorphism, we have the concepts of kernel, image, and cokernel for a linear transformation. (See Appendix D.) If V and W are two vector spaces over the field F and
T ∈ L(V, W), then we have ker T = {x ∈ V : T(x) = 0}, Im T = {T(x) : x ∈ V}, and coker T = W/Im T.
Let V be a finite-dimensional vector space over F. Assume that U is a subspace of V. Then by V/U we denote the set of cosets x + U := {y ∈ V : y = x + u for some u ∈ U}. An element in V̂ := V/U is denoted by x̂ := x + U. Note that the codimension of U is denoted by codim U and defined as codim U = dim V − dim U. One of the results in the following lemma is that codim U = dim V/U.
Lemma 2.13. Let V be a finite-dimensional vector space over F. Assume that U is a subspace of V. Then, V/U is a finite-dimensional vector space over F, where (x + U) + (y + U) := (x + y + U) and a(x + U) := ax + U. Specifically, the neutral element is identified with the coset 0 + U = U. In particular, dim V/U = dim V − dim U. More precisely, if U has a basis {u_1, . . . , u_m} and V has a basis {u_1, . . . , u_m, u_{m+1}, . . . , u_{m+l}}, then V̂ has a basis {û_{m+1}, . . . , û_{m+l}}. (Note that if U = {0}, then V̂ = V, and if U = V, then V̂ is the trivial subspace consisting of the zero element.) The proof is straightforward and left as an exercise.
Lemma 2.14. Assume that the assumptions of Lemma 2.13 hold. Let T ∈ L(V) := L(V, V). Assume furthermore that U is an invariant subspace of T. Then, T induces a linear operator T̂ : V̂ → V̂ by the equality T̂(x + U) := T(x) + U, i.e., T̂(x̂) = T̂(x).
Proof. We first need to show that T̂(x) is independent of the coset representative. Namely, T(x + u) + U = T(x) + (T(u) + U) = T(x) + U, for each u ∈ U. Since T(u) ∈ U, we deduce that T(u) + U = U. The linearity of T̂ follows from the linearity of T. ◻
2.6 Isomorphism theorems for vector spaces The notion of isomorphism gives a sense in which several examples of vector spaces that we have seen so far are the same. For example, assume that V is the vector space of row vectors with two real components, and W is the vector space of column vectors with two real components. Clearly, V and W are the same in some sense, and we can associate to each row vector (a b) the corresponding column vector (ab). We see that T (x y) = (xy) is linear and R(T ) = W and N (T ) = {0}. So T is an isomorphism. The importance of this concept is that any property of V as a vector space can be translated into an equivalent property of W. For instance, if we have a set of row vectors and want to know if they are linearly independent, it would suffice to answer the same question for the corresponding row vectors. Theorem 2.6.1. Let V and W be vector spaces over the field F and T ∈ L(V, W). If V1 and W1 are subspaces of V and W, respectively, and T (V1 ) ⊆ W1 , then T induces a linear transformation T¯ ∶ V/V1 → W/W1 defined by T¯(v + V1 ) = T (v) + W1 .
Proof. First, we show that T¯ is well-defined. If x+V1 = y+V1 , for x, y ∈ V, then x − y ∈ V1 and so T (x) − T (y) ∈ T (V1 ) ⊆ W1 . Then, T (x) + W1 = T (y) + W1 , and T¯(x + W) = T¯(y + W). By linearity of T it follows that T¯ is also linear. ◻ The first isomorphism theorem. If T ∶ V → W is a linear operator, then V/ ker T ≅ImT. Proof. Let V1 = ker T and W1 = {0}. Then, T induces a linear transformation T¯ ∶ V/ ker T → ImT defined by T¯(v+ker T ) = T (v). If y ∈ ImT , there exists x ∈ V such that T (x) = y, and so T¯(x + ker T ) = y. Therefore, T¯ is surjective. Now we check that T¯ is injective for x ∈ V, x + ker T ∈ ker T¯ if and only if T (x) = 0, that is, x ∈ ker T . Thus, T¯ is injective. Using the previous theorem, we conclude that T¯ is linear. We are done! ◻ The first isomorphism theorem implies that up to isomorphism, images of linear operators on V are the same as the quotient space of V. The second and third isomorphism theorems are consequences of the first isomorphism theorem. See Figure 2.1.
Figure 2.1. Kernel and image of a linear mapping T ∶ V → W. The second isomorphism theorem. Let V be a vector space, and let V1 and V2 be subspaces of V. Then V1 + V2 V1 ≅ . V2 V1 ∩ V2 1 Proof. Let T ∶ V1 + V2 → V1V∩V be defined by T (v1 + v2 ) = v1 + (V1 ∩ 2 V2 ). It is easy to show that T is a well-defined surjective linear operator with kernel V2 . An application of the first isomorphism theorem then completes the proof. ◻
The third isomorphism theorem. Let V be a vector space, and suppose that V1 ⊂ V2 ⊂ V are subspaces of V. Then V V/V1 ≅ . V2 /V1 V2
Proof. Let T ∶ V/V1 → V/V2 be defined by T (v + V1 ) = v + V2 . It is easy to show that T is a well-defined surjective linear transformation whose kernel is V2 /V1 . The rest follows from the first isomorphism theorem. ◻
2.7 Existence and uniqueness of the Jordan canonical form Definition 2.7.1. A matrix A ∈ Fn×n or a linear transformation T ∶ V → V is called nilpotent if Am = 0 or T m = 0, respectively. The minimal m ≥ 1 for which Am = 0 or T m = 0 is called the index of nilpotency of A and T , respectively, and denoted by index A or index T , respectively.
Examples
1. The matrix $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ is nilpotent, since A^2 = 0.
2. Any triangular matrix with zero along the main diagonal is nilpotent. For example, the matrix
$$A = \begin{bmatrix} 0 & 2 & 1 & 6 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
is nilpotent and its index of nilpotency is 4:
$$A^2 = \begin{bmatrix} 0 & 0 & 2 & 7 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad A^3 = \begin{bmatrix} 0 & 0 & 0 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \text{and } A^4 = 0.$$
3. Though the examples above have a large number of zero entries, a typical nilpotent matrix does not. For example, the matrix
$$A = \begin{bmatrix} 5 & -3 & 2 \\ 15 & -9 & 6 \\ 10 & -6 & 4 \end{bmatrix}$$
is nilpotent, though the matrix has no zero entries.
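A hedged one-off check (not from the text) that the third matrix really is nilpotent, with index of nilpotency 2:

```python
import numpy as np

A = np.array([[ 5, -3, 2],
              [15, -9, 6],
              [10, -6, 4]])
print(np.count_nonzero(A @ A))   # 0, so A^2 = 0 while A itself is nonzero
```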
s-numbers Assume that A or T are nilpotent; then the s-numbers of A or T are defined as si (A) ∶= rank Ai−1 − 2rank Ai + rank Ai+1 ,
(2.7.1)
si (T ) ∶= rank T i−1 − 2rank T i + rank T i+1 , i = 1, . . . . Note that A or T is nilpotent with the index of nilpotency m if and only if z m is the minimal polynomial of A or T , respectively. Furthermore, if A or T is nilpotent, then the maximal l, for which sl > 0, is equal to the index of nilpotency of A or T , respectively. Moreover, if A is nilpotent, then it is singular, but the
converse does not hold. For example, consider the matrix $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, which is singular but not nilpotent. (Why?)

Example 2.15. Consider the matrix
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
in R^{4×4}. We have
$$A^2 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and } A^3 = 0.$$
Then, A is nilpotent with the index of nilpotency 3. We calculate s-numbers for A: s1 (A) = rank I − 2rank A + rank A2 = 4 − 2 × 2 + 1 = 1, s2 (A) = rank A − 2rank A2 + rank A3 = 2 − 2 × 1 + 0 = 0, s3 (A) = rank A2 − 2rank A3 + rank A4 = 1 − 2 × 0 + 0 = 1, and si (A) = 0, for i > 3. Therefore, the maximal l, for which sl > 0, is 3, which is equal to the index of nilpotency of A. Proposition 2.7.2. Let T ∶ V → V be a nilpotent operator with the index of nilpotency m on the finite-dimensional vector space V. Then m
$$\operatorname{rank} T^{i} = \sum_{j=i+1}^{m} (j - i)s_j = (m - i)s_m + (m - i - 1)s_{m-1} + \cdots + s_{i+1}, \tag{2.7.2}$$
i = 0, . . . , m − 1. Proof. Since T l = 0, for l ≥ m, it follows that sm (T ) = rank T m−1 and sm−1 = rank T m−2 − 2rank T m−1 if m > 1. This proves (2.7.2) for i = m − 1, m − 2. For other values of i, (2.7.2) follows straightforward from (2.7.1) by induction on m − i ≥ 2. ◻ Theorem 2.7.3. Let T ∶ V → V be a linear transformation on a finitedimensional vector space. Assume that T is nilpotent with the index of nilpotency m. Then, V has a basis of the form xj , T (xj ), . . . T lj −1 (xj ), j = 1, . . . , i, where m = l1 ≥ ⋯ ≥ li ≥ 1
(2.7.3)
and $T^{l_j}(x_j) = 0$, for j = 1, . . . , i.
More precisely, the number of lj , which are equal to an integer l ∈ [m], is equal to sl (T ) given in (2.7.1). Proof. We first show by induction on the dimension of V that there exists a basis in V given by (2.7.3). If dim V = 1 and T is nilpotent, then any x ∈ V ∖ {0} is a basis satisfying (2.7.3). Suppose that for dim V ≤ N any nilpotent T has a basis of the form (2.7.3). Assume now that dim V = N + 1. Assume that T m = 0 and T m−1 ≠ 0. Hence, rank T m−1 = dim Im T m−1 > 0. Thus, there exists 0 ≠ y1 ∈ Im T m−1 . Therefore, T m−1 (x1 ) = y1 , for some
x1 ∈ V. Clearly T m (x1 ) = 0. We claim that x1 , T (x1 ), . . . , T m−1 (x1 ) are linearly independent. Suppose that m
∑ ai T
i−1
(xi ) = 0.
(2.7.4)
i=1
Apply T m−1 to this equality to deduce that 0 = a1 T m−1 (x1 ) = a1 y1 . Since y1 ≠ 0, it follows that a1 = 0. Now apply T m−2 to (2.7.4) to deduce that a2 = 0. Continue this process to deduce that a1 = ⋯ = am = 0. Let U = span {x1 , T (x1 ), . . . , T m−1 (x1 )}. Clearly, T U ⊂ U. If U = V, we ˆ ∶= V/U are done. Now, assume that m = dim U < dim V. We now consider V m m ˆ ˆ ˆ ˆ and the corresponding T ∶ V → V. Since T = 0, it follows that T = 0. Con′ ′ sequently, Tˆ is nilpotent. Assume that Tˆm = 0 and Tˆm −1 ≠ 0. Then, m ≥ m′ . ˆ has a basis of the form We now apply the induction hypothesis to Tˆ. Thus, V lj −1 ′ ˆ ˆ {ˆ xj , T (ˆ xj ), . . . , T (ˆ xj )}, for j = 2, . . . , i. Here, m = l2 ≥ ⋯ ≥ li ≥ 1. Furtherˆ for j = 2, . . . , i. Assume that x ˆ j = zj + U, for j = 2, . . . , i. more, Tˆlj (ˆ xj ) = 0 in V i−1 m−lj a T (x to both As Tˆlj (ˆ xj ) = 0, it follows that T lj (zj ) = ∑m 1 ). Apply T i=1 i lj m−lj−1 +i−1 m (x1 ). sides of this identity. Since T = 0, we deduce that 0 = ∑i=1 ai T As x1 , . . . , T m−1 (x1 ) are linearly independent, we conclude that a1 = ⋯ = alj = 0. m−l Let xj ∶= zj − ∑i=1 j alj +i T i−1 (x1 ). Observe that T lj (xj ) = 0, for j = 2, . . . , i. We claim that the vectors in (2.7.3) form a basis in V, as claimed. Assume that there is a linear combination of these vectors that gives zero vector in V: i
lj
∑ ∑ bjk T
k−1
(xj ) = 0.
(2.7.5)
j=1 k=1
ˆ As x ˆ then the above equality ˆ 1 = 0 in V, Consider this linear combination in V. reduces to i
lj
∑ ∑ bjk Tˆ
k−1
(ˆ xj ) = 0.
j=2 k=1
The induction hypothesis yields that bjk = 0 for each j ≥ 2 and each corresponding k−1 k. Hence, (2.7.5) reduces to ∑m (x1 ) = 0. As x1 , . . . , T m−1 (x1 ) are k=1 b1k T linearly independent, we deduce that b1k = 0, for k ∈ [m]. Hence the vectors in (2.7.3) are linearly independent. Note that the number of these vectors is ˆ = dim V = N + 1. Then, the vectors in (2.7.3) form a basis in V. m + dim V It is left to show that the number of lj , which is equal to an integer l ∈ [m], is equal to sl (T ) given in (2.7.1). Since T l = 0, for l ≥ m, it follows that sm = rank T m−1 = dim Im T m−1 . Note that if lj < m, then T m−1 (xj ) = 0. Hence, Im T m−1 = span {T m−1 (x1 ), . . . , T m−1 (xsm (T ) )}. That is, sm (T ) is the number of lj that are equal to m. Use Proposition 2.7.2 and the basis of V given by (2.7.3) to deduce that the number of lj , which are equal to an integer l ∈ [m], is equal to sl (T ) for l = m − 1, . . . , 1. ◻ Corollary 2.7.4. Let T satisfy the assumption of Theorem 2.7.3. Denote Vj ∶= span {T lj −1 (xj ), . . . , T (xj ), xj }, for j = 1, . . . , i. Then, each Vj is a T -invariant subspace, T ∣Vj is represented by Jlj (0) ∈ Clj ×lj in the basis {T lj −1 (xj ), . . . , T (xj ), xj }, and V = ⊕ij=1 Vj . Each lj is uniquely determined by the sequence si (T ), i = 1, . . . ,. Namely, the index m of the nilpotent T is the largest i ≥ 1 such that si (T ) ≥ 1. Let k1 = sm (T ), l1 = ⋯ = lk1 = p1 = m and define recursively
kr ∶= kr−1 + spr (T ), lkr−1 +1 = ⋯ = lkr = pr , where 2 ≤ r, pr ∈ [m − 1], spr (T ) > 0 r and kr−1 = ∑m−p j=1 sm−j+1 (T ). Definition 2.7.5. T ∶ V → V be a nilpotent operator. Then, the sequence (l1 , . . . , li ) defined in Theorem 2.7.3, which gives the lengths of the corresponding Jordan blocks of T in decreasing order, is called the Segre characteristic of T . The Weyr characteristic of T is dual to Segre’s characteristic. That is, consider an m × i, 0 − 1 matrix B = [bpq ] ∈ {0, 1}m×i . The jth column of B has 1 in the rows 1, . . . , lj and 0 in the rest of the rows. Let ωp be the pth row sum of B, for p = 1, . . . , m. Then, ω1 ≥ ⋯ ≥ ωm ≥ 1 is the Weyr characteristic. Proof of Theorem 2.1.21 (the Jordan canonical form). Let p(z) = det(zIn − A) be the characteristic polynomial of A ∈ Cn×n . Since C is algebraically closed, then p(z) = ∏kj=1 (z − λj )nj . Here λ1 , . . . , λk are k distinct roots (eigenvalues of A), where nj ≥ 1 is the multiplicity of λj in p(z). Note that k ∑j=1 nj = n. Let ψ(z) be the minimal polynomial of A. By Theorem 2.4.1, ψ(z)∣p(z). Using Problem 2.4.2-1.c we deduce that ψ(λj ) = 0, for j = 1, . . . , k. Hence k
k
j=1
j=1
det(zIn − A) = ∏(z − λj )nj , ψ(z) = ∏(z − λj )mj ,
(2.7.6)
1 ≤ mj ≤ nj , λj ≠ λi for j ≠ i, i, j = 1, . . . , k. Let ψj ∶= (z − λj )mj , for j = 1, . . . , k. Then (ψj , ψi ) = 1, for j ≠ i. Let V ∶= Cn and T ∶ V → V be given by T (x) ∶= Ax, for any x ∈ Cn . Then, det(zIn − A) and ψ(z) are the characteristic and the minimal polynomial of T , respectively. Use Theorem 2.4.6 to obtain the decomposition V = ⊕ki=1 Vi , where each Vi is a nontrivial T -invariant subspace such that the minimal polynomial of Ti ∶= T ∣Vi is ψi , for i = 1, . . . , k. That is, Ti − λi Ii , where Ii is the identity operator, i.e., Ii v = v, for all v ∈ Vi , is a nilpotent operator on Vi and index (Ti − λi Ii ) = mi . Let Qi ∶= Ti − λi Ii . Then, Qi is nilpotent and index Qi = mi . Apply Theorem qj Vi,j , where each Vi,j is Qi 2.7.3 and Corollary 2.7.4 to deduce that Vi = ⊕j=1 invariant subspace, and each Vi,j has a basis in which Qi is represented by a Jordan block Jmij (0), for j = 1, . . . , qj . According to Corollary 2.7.4, mi = mi1 ≥ ⋯ ≥ miqi ≥ 1,
i = 1, . . . , k.
(2.7.7)
Furthermore, the above sequence is completely determined by rank Qji , j = 0, 1, . . . for i = 1, . . . , k. Noting that Ti = Qi + λi Ii , it easily follows that each Vi,j is a Ti -invariant subspace, hence T -invariant subspace. Moreover, in the same basis of Vi,j that Qi represented by Jmij (0), Ti is represented by Jmij (λi ), for j = 1, . . . , qi and i = 1, . . . , k. This shows the existence of the Jordan canonical form. We now show that the Jordan canonical form is unique, up to a permutation of factors. Note that the minimal polynomial of A is completely determined by its Jordan canonical form. Namely, ψ(z) = ∏ki=1 (z − zi )mi1 , where mi1 is the biggest Jordan block with the eigenvalues λi . (See Problem 2.4.2-2.) Thus, mi1 = mi , for i = 1, . . . , k. Now, Theorem 2.4.6 yields that the subspaces V1 , . . . , Vk are uniquely determined by ψ. Thus, each Ti and Qi = T − λi Ii are uniquely determined. Using Theorem 2.7.3, we get that rank Qji , j = 0, 1, . . . determines
the sizes of the Jordan blocks of Q_i. Hence, all the Jordan blocks corresponding to λ_i are uniquely determined for each i ∈ [k]. ◻

Corollary 2.7.6. Let A ∈ F^{n×n} and assume that the characteristic polynomial p(z) = det(zI_n − A) splits to linear factors, i.e., (2.7.6) holds. Let B be the Jordan canonical form of A. Then
1. The multiplicity of the eigenvalue λ_i in the minimal polynomial ψ(z) of A is the size of the biggest Jordan block corresponding to λ_i in B.
2. The number of Jordan blocks in B corresponding to λ_i is the nullity of A − λ_i I_n, i.e., the number of Jordan blocks in B corresponding to λ_i is the number of linearly independent eigenvectors of A corresponding to the eigenvalue λ_i.
3. Let λ_i be an eigenvalue of A. Then, the number of Jordan blocks of order i corresponding to λ_i in B is given in (2.7.9).
Proof.
1. Since J_n(0)^n = 0 and J_n(0)^{n−1} ≠ 0, it follows that the minimal polynomial of J_n(λ) is (z − λ)^n = det(zI_n − J_n(λ)). Use Problem 2.7.2-3.a to deduce the first part of the corollary.
2. Since J_n(0) has one independent eigenvector, use Problem 2.7.2-3.b to deduce the second part of the corollary.
3. Use Problem 2.7.2-2.a to establish the last part of the corollary. ◻
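The rank formula (2.7.9) referenced in part 3 translates directly into code. Below is a hedged NumPy sketch (the function name is ours, not the book's) that counts the Jordan blocks of each order for a given eigenvalue, tested on the matrix of Example 2.8.

```python
import numpy as np

def jordan_block_counts(A, lam, max_order):
    """Number of Jordan blocks of each order 1..max_order for eigenvalue lam, via
    s_i(A, lam) = rank(B^{i-1}) - 2 rank(B^i) + rank(B^{i+1}), with B = A - lam*I."""
    B = A - lam * np.eye(A.shape[0])
    rank = lambda M: np.linalg.matrix_rank(M, tol=1e-9)
    return [rank(np.linalg.matrix_power(B, i - 1))
            - 2 * rank(np.linalg.matrix_power(B, i))
            + rank(np.linalg.matrix_power(B, i + 1))
            for i in range(1, max_order + 1)]

# The matrix of Example 2.8: two blocks J_1(2) and one block J_2(4).
A = np.array([[ 2, -4, 2, 2],
              [-2,  0, 1, 3],
              [-2, -2, 3, 3],
              [-2, -6, 3, 7]], dtype=float)
print(jordan_block_counts(A, 2.0, 2))   # [2, 0]
print(jordan_block_counts(A, 4.0, 2))   # [0, 1]
```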
2.7.1 Worked-out problems 1. Show that the only nilpotent diagonalizable matrix is the zero matrix. Solution: Assume that A ∈ Fn×n is a diagonalizable and nilpotent matrix with the index of nilpotency m. Since A is diagonalizable, then A = C −1 DC, for a diagonal matrix D and C ∈ GL(n, F). Thus, Dm = (CAC −1 )m = CAm C −1 = 0. However, if the diagonal entries of D are d1 , . . . , dn , then the diagonal enm m tries of Dm are just dm = 0, then dm 1 , . . . , dn . Since D i = 0, for all i and hence di = 0, for all i. This means D = 0, and it follows that A = 0. 2. Show that the index of nilpotency of the nilpotent matrix A ∈ Fn×n is less than or equal to n. Solution: We already discussed that the minimal polynomial of a nilpotent matrix is z m for some positive integer m. Since the minimal polynomial and the characteristic polynomial of a matrix have the same roots (probably with the different multiplicities), the characteristic polynomial of A is z n and based on the definition of minimal polynomial, m ≤ n. Now, the statement follows from the fact that the index of nilpotency and the degree of the minimal polynomial of A are the same.
3. The matrix A ∈ Fn×n is called idempotent if A2 = A. Show that if A is idempotent, then det A is equal to 0 or 1. Solution: Since A2 = A, taking the determinant of both sides of this equation, we find det A = det(A2 ). (2.7.8) Then, det(A2 ) = (det A)2 = det A. Hence, det A(det A − 1) = 0 and so det A = 0 or 1.
2.7.2 Problems 1. Let T ∶ V → V be nilpotent with m = index T . Let (ω1 , . . . , ωm ) be the Weyr characteristic. Show that rank T j = ∑m p=j+1 ωp , for j = 1, . . . , m−1. 2. Let A ∈ Cn×n and assume that det(zIn − A) = ∏ki=1 (z − λi )ni , where λ1 , . . . , λk are k distinct eigenvalues of A. Let si (A, λj ) ∶= rank (A − λj In )i−1 − 2rank (A − λj In )i + rank (A − λj In )i+1 , (2.7.9) i = 1, . . . , nj , j = 1, . . . , k. (a) Show that si (A, λj ) is the number of Jordan blocks of order i corresponding to λj for i = 1, . . . , nj . (b) Show that in order to find all Jordan blocks of A corresponding to λj one can stop computing si (A, λj ) at the smallest i ∈ [nj ] such that 1s1 (A, λj ) + 2s2 (A, λj ) + ⋯ + isi (A, λj ) = nj . 3. Let C = F ⊕ G = diag(F, G), F ∈ Fl×l , G ∈ Fm×m . (a) Assume that ψF , ψG are the minimal polynomials of F and G, respectively. Show that ψC , the minimal polynomial of C, is equal to ψF ψG . (ψF ,ψG ) (b) Show that null C = null F + null G. In particular, if G is invertible, then null C = null F . 4. Let F be a field, char F ≠ 2, V be a vector space over F, and A ∈ Fn×n be an idempotent matrix. Prove the following statements: (a) I − A is idempotent. (b) A + I is invertible. (c) Assume that T ∈ L(V) with T x = Ax, for any x ∈ V. Then V≅Im T ⊕ ker T . (d) Each eigenvalue of A is either 1 or 0. (e) The matrix representation of TA under some basis is diag(1, . . . , 1, 0, . . . , 0). (f) rank A + rank (A − I) = n. (g) rank A = tr A.
2.8 Cyclic subspaces and rational canonical forms
In this section, we study the structure of a linear operator on a finite-dimensional vector space, using the primary decomposition theorem of Section 2.4. Let T ∶ V → V be a linear operator and assume that V is finite-dimensional over the field F. Let 0 ≠ u ∈ V. Consider the sequence of vectors u = T 0 (u), T (u), T 2 (u), . . .. Since dim V < ∞, there exists a positive integer l ≥ 1 such that u, T (u), . . . , T^{l−1} (u) are linearly independent, and u, T (u), . . . , T^l (u) are linearly dependent. Hence, l is the smallest integer such that
T^l (u) = − ∑_{i=1}^l ai T^{l−i} (u),   (2.8.1)
for some scalars ai ∈ F, 1 ≤ i ≤ l. Clearly, l ≤ dim V. The polynomial ψu (z) ∶= z^l + ∑_{i=1}^l ai z^{l−i} is called the minimal polynomial of u, with respect to T . It is a monic polynomial and has the minimum degree among annihilating polynomials of u. Its property is similar to the property of the minimal polynomial of T . Namely, if a polynomial φ ∈ F[z] annihilates u, i.e., φ(T )(u) = 0, then ψu ∣φ. In particular, the minimal polynomial ψ(z) of T is divisible by ψu , since ψ(T )(u) = 0(u) = 0. Assume that U = span {T i (u) ∶ i = 0, 1, . . . }. Clearly, every vector w ∈ U can be uniquely represented as φ(T )(u), where φ ∈ F[z] and deg φ ≤ l − 1. Hence, U is a T -invariant subspace. The subspace U is called the cyclic subspace generated by u. Note that in the basis {u1 = u, u2 = T (u), . . . , ul = T^{l−1} (u)}, the linear transformation T ∣U is given by the matrix
⎡ 0 0 0 . . . 0 −a_l     ⎤
⎢ 1 0 0 . . . 0 −a_{l−1} ⎥
⎢ 0 1 0 . . . 0 −a_{l−2} ⎥
⎢ ⋮ ⋮ ⋮       ⋮  ⋮       ⎥
⎣ 0 0 0 . . . 1 −a_1     ⎦ ∈ F^{l×l} .   (2.8.2)
The above matrix is called the companion matrix, corresponding to the polynomial ψu (z). (Sometimes, the transpose of the above matrix is called the companion matrix.)
Lemma 2.16. Let V be a finite-dimensional vector space and T ∈ L(V). Suppose u and w are two nonzero elements of V, ψu and ψw are the minimal polynomials of u and w with respect to T , respectively, and (ψu , ψw ) = 1. Then, ψu+w = ψu ⋅ ψw .
Proof. Let U and W be the cyclic invariant subspaces generated by u and w, respectively. We claim that the T -invariant subspace X ∶= U ∩ W is the trivial subspace {0}. Assume to the contrary that there exists 0 ≠ x ∈ X. Let X1 ⊂ X be the nontrivial cyclic subspace generated by x and ψx the minimal polynomial corresponding to x. Since X1 ⊂ U, it follows that x = φ(T )(u), for some φ(z) ∈ F[z]. Hence, ψu (T )(x) = 0. Thus ψx ∣ψu . Similarly, ψx ∣ψw . Since deg ψx ≥ 1, this contradicts the assumption that (ψu , ψw ) = 1. Hence, U ∩ W = {0}. Let φ = ψu ψw . Clearly, φ(T )(u + w) = ψw (T )(ψu (T )(u)) + ψu (T )(ψw (T )(w)) = 0 + 0 = 0.
Thus, φ is an annihilating polynomial of u + w. Let θ(z) be an annihilating polynomial of u + w. Then, 0 = θ(T )(u) + θ(T )(w). Since U and W are T -invariant subspaces and U ∩ W = {0}, it follows that 0 = θ(T )(u) = θ(T )(w). Hence, ψu ∣θ, ψw ∣θ. As (ψu , ψw ) = 1, it follows that φ∣θ. Hence, ψu+w = φ. ◻
Theorem 2.8.1. Let T ∶ V → V be a linear operator and 1 ≤ dim V < ∞. Let ψ(z) be the minimal polynomial of T . Then, there exists 0 ≠ u ∈ V such that ψu = ψ.
Proof. Assume first that ψ(z) = (φ(z))^l , where φ(z) is irreducible in F[z], i.e., φ(z) is not divisible by any polynomial θ such that 1 ≤ deg θ < deg φ, and l is a positive integer. Let 0 ≠ w ∈ V. Recall that ψw ∣ψ. Hence, ψw = φ^{l(w)} , where l(w) ∈ [l] is a positive integer. Assume to the contrary that 1 ≤ l(w) ≤ l − 1, for each 0 ≠ w ∈ V. Then, (φ(T ))^{l−1} (w) = 0 for every w ≠ 0. As (φ(T ))^{l−1} (0) = 0, we deduce that (φ(z))^{l−1} is an annihilating polynomial of T . Therefore, ψ∣φ^{l−1} , which is impossible. Hence, there exists u ≠ 0 such that l(u) = l, i.e., ψu = ψ. Consider now the general case
ψ(z) = ∏_{i=1}^k (φi (z))^{li} , li ∈ N, φi irreducible, (φi , φj ) = 1, for i ≠ j,   (2.8.3)
where k > 1. Theorem 2.4.6 implies that V = ⊕_{i=1}^k Vi , where Vi = ker φi^{li} (T ) and φi^{li} is the minimal polynomial of T ∣Vi . The first case of the proof yields the existence of 0 ≠ ui ∈ Vi such that ψui = φi^{li} for i = 1, . . . , k. Let wj = u1 + ⋯ + uj . Use Lemma 2.16 to deduce that ψwj = ∏_{i=1}^j φi^{li} . Hence, ψu = ψ for u ∶= wk . ◻
Theorem 2.8.2 (the cyclic decomposition theorem for V). Let T ∶ V → V be a linear operator and 0 < dim V < ∞. Then, there exists a decomposition of V into a direct sum of r T -cyclic subspaces V = ⊕_{i=1}^r Ui with the following properties. Assume that ψi is the minimal polynomial of T ∣Ui ; then ψ1 is the minimal polynomial of T . Furthermore, ψi+1 ∣ψi , for i = 1, . . . , r − 1.
Proof. We prove the theorem by induction on dim V. For dim V = 1, any 0 ≠ u is an eigenvector of T : T (u) = λu, so V is cyclic, and ψ(z) = ψ1 (z) = z − λ. Assume now that the theorem holds for all nonzero vector spaces of dimension less than n + 1. Suppose that dim V = n + 1. Let ψ(z) be the minimal polynomial of T and m = deg ψ. Assume first that T is nonderogatory, i.e., m = n + 1, which is the degree of the characteristic polynomial of T . Theorem 2.8.1 implies that V is a cyclic subspace. This yields r = 1, and the theorem holds in this case. Assume now that T is derogatory. Then, m < n + 1. Theorem 2.8.1 implies the existence of 0 ≠ u1 such that ψu1 = ψ. Let U1 be the cyclic subspace generated by u1 . Let V̂ ∶= V/U1 . Thus, 1 ≤ dim V̂ = dim V − m ≤ n. We now apply the induction hypothesis to T̂ ∶ V̂ → V̂. Therefore, V̂ = ⊕_{i=2}^r Ûi , where Ûi is a cyclic subspace generated by ûi , i = 2, . . . , r. The minimal polynomial of ûi is ψi , i = 2, . . . , r, ψ2 is the minimal polynomial of T̂ , and ψi+1 ∣ψi , for i = 2, . . . , r − 1. Observe first that for any polynomial φ(z), we have the identity φ(T̂ )([x]) = [φ(T )(x)]. Since ψ(T )(x) = 0, it follows that ψ1 ∶= ψ(z) is an annihilating polynomial of T̂ . Hence, ψ2 ∣ψ1 . Observe next that since ψi+1 ∣ψi , for i = 1, . . . , r − 1, it follows that ψi ∣ψ1 = ψ. Consequently, ψ = θi ψi , where θi is a monic polynomial and i = 2, . . . , r.
Since [0] = ψi (T̂ )([ûi ]) = [ψi (T )(ûi )], it follows that ψi (T )(ûi ) ∈ U1 . Hence, ψi (T )(ûi ) = ωi (T )(u1 ), for some ωi (z) ∈ F[z]. Hence, 0 = ψ(T )(ûi ) = θi (T )(ψi (T )(ûi )) = θi (T )(ωi (T )(u1 )). Therefore, ψu1 = ψ divides θi ωi . Then, ψi ∣ωi and so ωi = ψi αi . Define ui = ûi − αi (T )(u1 ), for i = 2, . . . , r. The rest of the proof of the theorem follows from Problem 2.8.2-5. ◻
The cyclic decomposition theorem can be used to determine a set of canonical forms for similarity as follows.
Theorem 2.8.3 (rational canonical form). Let A ∈ Fn×n . Then, there exist r monic polynomials ψ1 , . . . , ψr , of degree at least 1, satisfying the following conditions: First, ψi+1 ∣ψi , for i = 1, . . . , r − 1. Second, ψ1 = ψ is the minimal polynomial of A. Then, A is similar to ⊕_{i=1}^r C(ψi ), where C(ψi ) is the companion matrix of the form (2.8.2) corresponding to the polynomial ψi . Decompose each ψi = ψi,1 ⋯ ψi,ti into the product of its irreducible components, as in the decomposition of ψ = ψ1 given in (2.8.3). Then, A is similar to ⊕_{i=1,l=1}^{r,ti} C(ψi,l ). Suppose finally that ψ(z) = ∏_{j=1}^k (z − λj )^{mj} , where λ1 , . . . , λk are the k distinct eigenvalues of A. Then, A is similar to the Jordan canonical form given in Theorem 2.1.21.
Proof. Identify A with T ∶ Fn → Fn , where T (x) = Ax. Use Theorem 2.8.2 to decompose Fn = ⊕_{i=1}^r Ui , where each Ui is a cyclic invariant subspace such that T ∣Ui has the minimal polynomial ψi . Since Ui is cyclic, T ∣Ui is represented by C(ψi ) in the appropriate basis of Ui , as shown in the beginning of this section. Hence, A is similar to ⊕_{i=1}^r C(ψi ). Consider next Ti ∶= T ∣Ui . Use Theorem 2.4.6 to deduce that Ui decomposes into a direct sum of Ti -invariant subspaces ⊕_{l=1}^{ti} Ui,l , where the minimal polynomial of Ti,l ∶= Ti ∣Ui,l is ψi,l . Since Ui was cyclic, i.e., Ti was nonderogatory, it follows that each Ti,l must be nonderogatory, i.e., Ui,l is cyclic. (See Problem 2.8.2-3.) Recall that each Ti,l is represented in a corresponding basis by C(ψi,l ). Hence, A is similar to ⊕_{i=1,l=1}^{r,ti} C(ψi,l ). Suppose finally that ψ(z) splits into linear factors. Hence, ψi,l = (z − λl )^{mi,l} and Ti,l − λl I is nilpotent of index mi,l . Thus, there exists a basis in Ui,l such that Ti,l is represented by the Jordan block Jmi,l (λl ). Therefore, A is similar to a direct sum of corresponding Jordan blocks. ◻
The polynomials ψ1 , . . . , ψr appearing in Theorem 2.8.2 are called invariant polynomials of T , or of its representation matrix A in any basis. The polynomials ψi,1 , . . . , ψi,ti , i = 1, . . . , r, appearing in Theorem 2.8.3 are called the elementary divisors of A, or of the corresponding linear transformation T represented by A. We now show that these polynomials are uniquely defined.
Lemma 2.17 (Cauchy–Binet formula). For two positive integers 1 ≤ p ≤ m, denote by Qp,m the set of all subsets α of [m] of cardinality p. Let A ∈ Fm×n , B ∈ Fn×l and denote C = AB ∈ Fm×l . Then, for any integer p ∈ [min(m, n, l)], α ∈
Qp,m , β ∈ Qp,l , the following identity holds:
det C[α, β] = ∑_{γ∈Qp,n} det A[α, γ] det B[γ, β].   (2.8.4)
The Cauchy–Binet formula is a generalization of the product formula det AB = det A det B to products of rectangular matrices.
Proof. It is enough to check the case where α = β = [p]. This is equivalent to the assumption that p = m = l ≤ n. For p = m = n = l, the Cauchy–Binet formula reduces to det AB = (det A)(det B). Thus, it is sufficient to consider the case p = m = l < n. Let C = [cij ]_{i,j=1}^p . Then, cij = ∑_{tj=1}^n a_{i tj} b_{tj j} for i, j = 1, . . . , p. Use the multilinearity of the determinant to deduce
det C = det[ ∑_{tj=1}^n a_{i tj} b_{tj j} ]_{i,j=1}^p = ∑_{t1,...,tp=1}^n det[ a_{i tj} b_{tj j} ]_{i,j=1}^p = ∑_{t1,...,tp=1}^n det A[{1, . . . , p}, {t1 , . . . , tp }] b_{t1 1} b_{t2 2} ⋯ b_{tp p} .
Observe next that det A[{1, . . . , p}, {t1 , . . . , tp }] = 0 if ti = tj for some 1 ≤ i < j ≤ p, since the columns ti and tj in A[{1, . . . , p}, {t1 , . . . , tp }] are equal. Consider the sum
∑_{{t1 ,t2 ,...,tp }=γ∈Qp,n} det A[{1, . . . , p}, {t1 , . . . , tp }] b_{t1 1} b_{t2 2} ⋯ b_{tp p} .
The above arguments yield that this sum is (det A[⟨p⟩, γ])(det B[γ, ⟨p⟩]). Summing over all γ ∈ Qp,n establishes (2.8.4).
◻
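The identity (2.8.4) is easy to sanity-check numerically. A minimal sketch (assuming Python with NumPy, which is not part of the text; the sizes and index sets are arbitrary):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
m, n, l, p = 3, 5, 4, 2
A = rng.random((m, n))
B = rng.random((n, l))
C = A @ B

alpha, beta = (0, 2), (1, 3)          # index sets of cardinality p (0-based)
lhs = np.linalg.det(C[np.ix_(alpha, beta)])
rhs = sum(np.linalg.det(A[np.ix_(alpha, g)]) * np.linalg.det(B[np.ix_(g, beta)])
          for g in combinations(range(n), p))
print(np.isclose(lhs, rhs))           # True
```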
Proposition 2.8.4. Let A(z) ∈ Fm×n [z]. Denote by r = rank A(z) the size of the biggest minor of A(z) that is not a zero polynomial. For an integer k ∈ [r], let δk (z) be the greatest common divisor of all k × k minors of A(z), which is assumed to be a monic polynomial. Assume that δ0 = 1. Then, δi ∣δi+1 , for i = 0, . . . , r − 1. Proof. Expand a nonzero (k + 1) × (k + 1) minor of A(z) to deduce that δk (z)∣δk+1 (z). In particular, δi ∣δj , for 1 ≤ i < j ≤ r. ◻ Proposition 2.8.5. Let A(z) ∈ F[z]m×n . Assume that P ∈ GL(m, F) and Q ∈ GL(n, F). Let B(z) = P A(z)Q. Then 1. rank A(z) = rank B(z) = r. 2. δk (A(z)) = δk (B(z)), for k = 0, . . . , r. Proof. Use Cauchy–Binet to deduce that rank B(z) ≤ rank A(z). Since A(z) = P −1 B(z)Q−1 , it follows that rank A(z) ≤ rank B(z). Hence, rank A(z) = rank B(z) = r. Use Cauchy–Binet to deduce that δk (A(z))∣δk (B(z)). As A(z) = P −1 B(z)Q−1 , we deduce also that δk (B(z))∣δk (A(z)). ◻
Definition 2.8.6. Let A ∈ Fn×n and define A(z) ∶= zIn − A. Then, the polynomials of degree at least 1 in the sequence φi = δ_{n−i+1}(z)/δ_{n−i}(z), i = 1, . . . , n, are called the invariant polynomials of A.
Note that the product of all invariant polynomials is det(zIn − A). In view of Proposition 2.8.5 we obtain that similar matrices have the same invariant polynomials. Hence, for a linear transformation T ∶ V → V, we can define the invariant polynomials of T by any representation matrix of T in a basis of V.
Theorem 2.8.7. Let T ∶ V → V be a linear operator. Then, the invariant polynomials of T are the polynomials ψ1 , . . . , ψr appearing in Theorem 2.8.2.
To prove this theorem we need the following lemma:
Lemma 2.18. Let A = diag(B, C)(= B ⊕ C) ∈ Fn×n , where B ∈ Fl×l , C ∈ Fm×m , m = n − l. Let p ∈ [n − 1] and assume that α = {α1 , . . . , αp }, β = {β1 , . . . , βp } ∈ Qp,n : 1 ≤ α1 < ⋯ < αp ≤ n, 1 ≤ β1 < ⋯ < βp ≤ n. If #α ∩ [l] ≠ #β ∩ [l], then det A[α, β] = 0.
Proof. Let k ∶= #α ∩ [l], i.e., αk ≤ l, αk+1 > l. Consider a term in det A[α, β]. Up to a sign it is ∏_{j=1}^p a_{αj βσ(j)} , for some bijection σ ∶ [p] → [p]. Then, this product is zero unless σ([k]) = [k] and βk ≤ l, βk+1 > l. ◻
Proof of Theorem 2.8.7. From the proof of Theorem 2.8.2, it follows that V has a basis in which T is represented by C ∶= ⊕_{i=1}^r C(ψi ). Assume that C(ψi ) ∈ F^{li ×li} , where i ∈ [r] and n = ∑_{i=1}^r li . Let θ ∶= z^l + a1 z^{l−1} + ⋯ + al . Then, the companion matrix C(θ) is given by (2.8.2). Consider the submatrix B(z) of zIl − C(θ) obtained by deleting the first row and the last column. It is an upper triangular matrix with −1 on the diagonal. Hence, det B(z) = (−1)^{l−1} . Therefore, δl−1 (zIl − C(θ)) = 1. Thus, δl−j (zIl − C(θ)) = 1, for j ≥ 1. We claim that δn−j (zIn − C) = ∏_{i=j+1}^r ψi , for j ∈ [r]. Consider a minor of order n − j of the block diagonal matrix zIn − C of the form (zIn − C)[α, β], where α, β ∈ Qn−j,n . Lemma 2.18 implies that det(zIn − C)[α, β] = 0, unless we choose our minor such that in each submatrix zIli − C(ψi ) we delete the same number of rows and columns. Deleting the first row and last column in zIli − C(ψi ), we get a minor with value (−1)^{li −1} . Recall also that δli −k (zIli − C(ψi )) = 1, for k ≥ 1. Since ψi+1 ∣ψi , it follows that δn−j (zIn − C) is obtained by deleting the first row and the last column in zIli − C(ψi ), for i = 1, . . . , j and j = 1, . . . , r. (ψr+1 ∶= 1.) Thus, δn−j (zIn − C) = ∏_{i=j+1}^r ψi , for j ∈ [r]. Hence, ψi = δ_{n−i+1}(zIn − C)/δ_{n−i}(zIn − C). ◻
Note that the above theorem implies that the invariant polynomials of A satisfy φi+1 ∣φi , for i = 1, . . . , r − 1. The irreducible factors ψi,1 , . . . , ψi,ti of ψi given in Theorem 2.8.3, for i = 1, . . . , r, are called the elementary divisors of A. The matrices ⊕_{i=1}^r C(ψi ) and ⊕_{i=1,l=1}^{r,ti} C(ψi,l ) are called the rational canonical forms of A. Those are the canonical forms in the case that the characteristic polynomial of A does not split into linear factors over F.
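Definition 2.8.6 can be turned into a very inefficient, but instructive, symbolic computation. The following is a sketch assuming Python with SymPy, which is not part of the text; `invariant_polynomials` is an ad hoc helper name, and the approach is only practical for very small matrices:

```python
import sympy as sp
from itertools import combinations
from functools import reduce

z = sp.symbols('z')

def invariant_polynomials(A):
    # delta_k = monic gcd of all k x k minors of zI - A; phi_i = delta_{n-i+1}/delta_{n-i}
    n = A.rows
    Az = z * sp.eye(n) - A
    delta = [sp.Integer(1)]                       # delta_0 = 1
    for k in range(1, n + 1):
        minors = [Az[list(r), list(c)].det()
                  for r in combinations(range(n), k)
                  for c in combinations(range(n), k)]
        g = reduce(sp.gcd, minors)
        delta.append(sp.Poly(g, z).monic().as_expr())
    return [sp.cancel(delta[n - i + 1] / delta[n - i]) for i in range(1, n + 1)]

A = sp.Matrix([[0, -1], [1, 0]])      # companion matrix of z**2 + 1
print(invariant_polynomials(A))       # [z**2 + 1, 1]
```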
Note that the diagonal elements of the Jordan canonical form of a matrix A are the eigenvalues of A, each appearing the number of times equal to its algebraic multiplicity. However, the rational canonical form does not expose the eigenvalues of the matrix, even when these eigenvalues lie in the base field.
Example 2.19. Let T ∈ L(R7 ) with the minimal polynomial ψ(z) = (z − 1)(z 2 + 1)2 . Since the elementary divisors must include z − 1 and (z 2 + 1)2 , we have the following possibilities for the list of elementary divisors:
1. z − 1, z − 1, z − 1, (z 2 + 1)2 .
2. z − 1, (z 2 + 1)2 , z 2 + 1.
These correspond to the following rational canonical forms:
1. [1] ⊕ [1] ⊕ [1] ⊕ C((z 2 + 1)2 ) =
⎡ 1 0 0 0 0 0  0 ⎤
⎢ 0 1 0 0 0 0  0 ⎥
⎢ 0 0 1 0 0 0  0 ⎥
⎢ 0 0 0 0 0 0 −1 ⎥
⎢ 0 0 0 1 0 0  0 ⎥
⎢ 0 0 0 0 1 0 −2 ⎥
⎣ 0 0 0 0 0 1  0 ⎦
2. [1] ⊕ C((z 2 + 1)2 ) ⊕ C(z 2 + 1) =
⎡ 1 0 0 0  0 0  0 ⎤
⎢ 0 0 0 0 −1 0  0 ⎥
⎢ 0 1 0 0  0 0  0 ⎥
⎢ 0 0 1 0 −2 0  0 ⎥
⎢ 0 0 0 1  0 0  0 ⎥
⎢ 0 0 0 0  0 0 −1 ⎥
⎣ 0 0 0 0  0 1  0 ⎦
2.8.1 Worked-out problems
1. Consider the following matrix:
C =
⎡  2  2 −2  4 ⎤
⎢ −4 −3  4 −6 ⎥
⎢  1  1 −1  2 ⎥
⎣  2  2 −2  4 ⎦ .
Assume that the minimal polynomial of C is ψC (z) = z(z − 1)2 . Find the rational canonical form of C.
Solution: Here ψ1 (z) = ψC (z) = z(z − 1)2 . Since pC (z)/ψ1 (z) = z, where pC (z) is the characteristic polynomial of C, we get ψ2 (z) = z. According to the definition of the companion matrix,
⎡ 0 0  0 ⎤
C(ψ1 ) = ⎢ 1 0 −1 ⎥ ,
⎣ 0 1  2 ⎦
and C(ψ2 ) = 0, so
C ≈ C(ψ1 ) ⊕ C(ψ2 ) =
⎡ 0 0  0 0 ⎤
⎢ 1 0 −1 0 ⎥
⎢ 0 1  2 0 ⎥
⎣ 0 0  0 0 ⎦ .
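The companion matrix (2.8.2) and direct sums of companion blocks, as used in this worked-out problem, are easy to build explicitly. A small sketch (assuming Python with NumPy, which is not part of the text; `companion` and `direct_sum` are ad hoc helper names):

```python
import numpy as np

def companion(a):
    # a = [a_1, ..., a_l] for the monic polynomial z^l + a_1 z^(l-1) + ... + a_l;
    # returns the l x l matrix of the form (2.8.2)
    l = len(a)
    C = np.zeros((l, l))
    C[1:, :-1] = np.eye(l - 1)            # 1's on the subdiagonal
    C[:, -1] = -np.array(a[::-1], float)  # last column: -a_l, ..., -a_1
    return C

def direct_sum(blocks):
    n = sum(b.shape[0] for b in blocks)
    M = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        M[i:i + k, i:i + k] = b
        i += k
    return M

# psi_1 = z(z-1)^2 = z^3 - 2z^2 + z, so a = [-2, 1, 0]; psi_2 = z, so a = [0]
R = direct_sum([companion([-2, 1, 0]), companion([0])])
print(R)
print(np.poly(companion([-2, 1, 0])))     # characteristic polynomial: [1, -2, 1, 0]
```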
2.8.2 Problems
1. Let φ(z) = z^l + ∑_{i=1}^l ai z^{l−i} . Denote by C(φ) the matrix (2.8.2). Show the following statements:
(a) φ(z) = det(zI − C(φ)).
(b) The minimal polynomial of C(φ) is φ, i.e., C(φ) is nonderogatory.
(c) Assume that A ∈ Fn×n is nonderogatory. Show that A is similar to C(φ), where φ is the characteristic polynomial of A.
2. Let T ∶ V → V, dim V < ∞, 0 ≠ u, w. Show that ψu+w = ψu ψw /(ψu , ψw ).
3. Let T ∶ V → V and assume that V is a cyclic space, i.e., T is nonderogatory. Let ψ be the minimal and the characteristic polynomial. Assume that ψ = ψ1 ψ2 , where deg ψ1 , deg ψ2 ≥ 1, and (ψ1 , ψ2 ) = 1. Show that there exist 0 ≠ u1 , u2 such that ψi = ψui , i = 1, 2. Furthermore, V = U1 ⊕ U2 , where Ui is the cyclic subspace generated by ui .
4. Let T be a linear operator T ∶ V → V and U a subspace of V.
(a) Assume that T U ⊆ U. Show that T induces a linear transformation T̂ ∶ V/U → V/U, i.e., T̂ ([x]) ∶= [T (x)].
(b) Assume that T U ⊈ U. Show that T̂ ([x]) ∶= [T (x)] does not make sense.
5. In this problem we finish the proof of Theorem 2.8.2.
(a) Show that ψi is an annihilating polynomial of ui , for i ≥ 2.
(b) By considering the vector [ûi ] = [ui ], show that ψui = ψi for i ≥ 2.
(c) Show that the vectors ui , T (ui ), . . . , T^{deg ψi −1} (ui ), i = 1, . . . , r, are linearly independent. (Hint: Assume a linear dependence of all these vectors, and then quotient this dependence by U1 . Then, use the induction hypothesis on V̂.)
(d) Let Ui be the cyclic subspace generated by ui , for i = 2, . . . , r. Conclude the proof of Theorem 2.8.2.
6. Let the assumptions of Theorem 2.8.2 hold. Show that the characteristic polynomial of T is equal to ∏_{i=1}^r ψi .
7. Prove that any A ∈ Fn×n can be written as the product of two symmetric matrices. (Hint: Use the rational canonical form.)
8. Let A, B ∈ Fn×n . Prove that A is similar to B if and only if they have the same elementary divisors.
9. Find the rational canonical form of the matrix A ∈ R2×2 ,
A = [ cos θ   − sin θ
      sin θ     cos θ ] ,
where θ ∈ R.
Review Problems (1)
1. Recall that ξ is an nth root of unity if ξ is a solution of the equation z^n − 1 = 0. An nth root of unity is called primitive if ξ^l ≠ 1 for l = 1, . . . , n − 1.
(a) Write down all 3-primitive roots.
(b) Write down the monic polynomial with simple roots whose roots are all 3-primitive roots.
(c) Write down the monic polynomial with simple roots whose roots are all 6-primitive roots.
(d) Assume that n is a prime number. Write down the monic polynomial with simple roots that are all n-primitive roots and prove that claim.
2. (a) Let
A =
⎡ −4  4 −9 ⎤
⎢  7 −1 −4 ⎥
⎢  6  8  2 ⎥
⎣ −2 −3  0 ⎦ .
Find the decomposition of A = LU , where L
is an invertible lower triangular matrix and U is a row echelon form of A. (b) Let A, B ∈ Fn×n . Find and prove the formula for the adjoint of AB in terms of the adjoints of A and B. (c) Let A ∈ Fm×n and B ∈ Fn×p . Give a best upper bound for the rank of AB in terms of the ranks of A and B and prove this upper bound. (d) Let A ∈ Fn×n be a matrix of rank n − 1. Show that the adjoint of A has rank 1. 3. Assume that A ∈ Cn×n . Assume that I, A, and A2 are linearly independent as vectors in Cn×n . Assume that I, A, A2 , and A3 are linearly dependent. (a) What is the degree of the minimal polynomial of A? (b) How many distinct eigenvalues does A have at most? (c) What is the minimal value of n? (d) When is A nonderogatory? (e) When is A diagonalizable?
4. (a) Let Jn (λ) ∈ Cn×n be an n × n Jordan block with the eigenvalue λ. Assume that n is even. Find the Jordan canonical form of Jn (λ)2 . (Hint: There are two different cases.)
(b) Let
A =
⎡  1  1  1  1 ⎤
⎢ −1 −1 −1 −1 ⎥
⎢  0  0  1  1 ⎥
⎣  0  0  1  1 ⎦ .
Find the characteristic polynomial, the minimal polynomial, and the Jordan canonical form of A.
5. Let A ∈ Fn×n be a skew-symmetric matrix. Show that if the characteristic of F is not 2 and n is odd, then det A = 0. Does this result hold true if the characteristic of F is 2?
6. Show that if A ∈ R(2m)×(2m) is skew-symmetric, then det A ≥ 0.
7. Let A ∈ Fn×n . Show that det adj A = (det A)^{n−1} . (Assume for simplicity that A is invertible.)
8. Let A ∈ Fn×n and A^m = I, for some m ∈ N. Prove that if A has only one eigenvalue λ, then A = λI.
9. Let A ∈ Rn×n such that A⊺ = A2 . What are the possible eigenvalues of A?
10. Let T ∶ C2×2 → C2×2 be the transpose function
T [ a  b       [ a  c
    c  d ]  =    b  d ] ,
and let S ∶ C2×2 → C2×2 be the conjugate transpose function
S [ a  b       [ ā  c̄
    c  d ]  =    b̄  d̄ ] .
(a) Is T linear? Prove your answer.
(b) Is S linear? Prove your answer.
(c) Compute det T .
(d) Compute the characteristic polynomial of T .
(e) Compute the minimal polynomial of T .
11. Let A ∈ R4×4 and pA (x) = x4 − 16. Is A necessarily diagonalizable?
12. Let A ∈ Zn×n and det A = ±1. Prove that A−1 has integer entries.
13. Let A ∈ GL(m, R) and B ∈ Rm×n . Show that rank (AB) = rank B.
14. Let V be a vector space and S ∈ L(V). Prove that S commutes with all T ∈ L(V) if and only if S is a scalar operator.
15. Prove or disprove the following statement: Assume that V is a finite-dimensional vector space, T ∈ L(V) and p ∈ F[z]. Then every eigenvalue of p(T ) is of the form p(λ) for some eigenvalue λ of T .
16. Can both the sum and the product of the eigenvalues of a matrix A ∈ Qn×n be irrational?
17. Let A ∈ Fn×n be invertible. Does there necessarily exist a polynomial p(z) ∈ F[z] such that p(A) = A−1 ?
18. Let A ∈ GL(n, R) and let its entries be either 0 or 1. Determine the minimum number of 0's among the entries of A.
19. Determine the number of congruence classes of S(n, R).
Chapter 3
Applications of the Jordan canonical form
3.1 Functions of matrices Let A ∈ Cn×n . Consider the iterations xl = Axl−1 , xl−1 ∈ Cn ,
l = 1, . . . .
(3.1.1)
Clearly, xl = A^l x0 . To compute xl from xl−1 , one needs to perform n(2n − 1) flops (operations: n^2 multiplications and n(n − 1) additions). If we want to compute x_{10^8} , we need 10^8 n(2n − 1) operations, if we simply program the iterations (3.1.1). If n = 10, it will take us some time to do these iterations, and we will probably run into roundoff errors, which will render our computations meaningless. Is there any better way to find x_{10^8} ? The answer is yes, and this is the purpose of this section.
For a scalar function f and a matrix A ∈ Cn×n , we denote by f (A) a matrix of the same dimension as A. This provides a generalization of a scalar function f (z), z ∈ C. If f (z) is a polynomial or rational function, it is natural to define f (A) by substituting A for z, replacing division by matrix inversion, and replacing 1 by the identity matrix. For example, if f (z) = (1 + z^3 )/(1 − z), then f (A) = (I − A)^{−1} (I + A^3 ), if 1 is not an eigenvalue of A. As rational functions of a matrix commute, it does not matter if we write (I − A)^{−1} (I + A^3 ) or (I + A^3 )(I − A)^{−1} . In what follows, we are going to provide the formal definition of a matrix function. Let A ∈ Cn×n . Using Theorem 2.1.21 (the Jordan canonical form), we have
P^{−1} AP = ⊕_{i=1}^k J_{ni} (λi ),
for some P ∈ GL(n, C), and the λi ’s are the eigenvalues of A each of order ni . The function f is said to be defined on the spectrum of A if the values f (j) (λi ), j = 0, 1, . . . , ni − 1,
i ∈ [k]
exist. These are called the values of the function f on the spectrum of A.
Definition 3.1.1 (matrix function). Let f be defined on the spectrum of A. Then, f (A) = P f (J)P −1 = P diag(f (Jk ))P −1 , where ⎡ ⎢f (λi ) ⎢ ⎢ f (Jk ) = ⎢⎢ ⎢ ⎢ ⎢ ⎣ 1
For example, for J = [ 4 0
1
1] 4
f ′ (λi ) f (λi )
⋯ ⋱ ⋱
f (ni −1) (λi ) ⎤ ⎥ (ni −1)! ⎥
⋮ f (λi ) f (λi ) ′
⎥ ⎥. ⎥ ⎥ ⎥ ⎥ ⎦
and f (z) = z 2 , we have
⎡f ( 1 ) ⎢ f (J) = ⎢ 4 ⎢ 0 ⎣
1 f ′ ( 12 )⎤⎥ ⎡⎢ 16 ⎢ ⎥ = f ( 14 ) ⎥⎦ ⎢⎣ 0
1 ⎤⎥ ⎥, 1 ⎥ 16 ⎦
which is easily proved to be J 2 . Theorem 3.1.2. Let A ∈ Cn×n and k
k
i=1
i=1
det(zIn − A) = ∏(z − λi )ni , ψ(z) = ∏(z − λi )mi , k
k
i=1
i=1
(3.1.2)
1 ≤ m ∶= deg ψ = ∑ mi ≤ n = ∑ ni , 1 ≤ mi ≤ ni , λi ≠ λj , for i ≠ j, i, j = 1, . . . , k, where ψ(z) is the minimal polynomial of A. Then, there exist unique m linearly independent matrices Zij ∈ Cn×n , for i = 1, . . . , k and j = 0, . . . , mi − 1, which depend on A, such that for any polynomial f (z) the following identity holds: k mi −1
f (A) = ∑ ∑
i=1 j=0
f (j) (λi ) Zij . j!
(3.1.3)
(Zij , i = 1, . . . , k, j = 1, . . . , mi are called the A-components.) Proof. We start first with A = Jn (λ). So Jn (λ) = λIn + Hn , where Hn ∶= Jn (0). Thus, Hn is a nilpotent matrix, with Hnn = 0 and Hnj has 1’s on the jth subdiagonal and all other elements are equal to 0, for j = 0, 1, . . . , n − 1. Hence, In = Hn0 , Hn , . . . , Hnn−1 are linearly independent. Let f (z) = z l . Then min(l,n−1) l l l Al = (λIn + Hn )l = ∑ ( )λl−j Hnj = ( )λl−j Hnj . ∑ j j=0 j j=0
The last equality follows from the equality H j = 0, for j ≥ n. Note that ψ(z) = det(zIn − Jn (λ)) = (z − λ)n , i.e., k = 1 and m = m1 = n. From the above equality we conclude that Z1j = Hnj , for j = 0, . . . if f (z) = z l and l = 0, 1, . . .. With this definition of Z1j , (3.1.3) holds for Kl z l , where Kl ∈ C and l = 0, 1, . . .. Hence, (3.1.3) holds for any polynomial f (z) for this choice of A. Assume now that A is a direct sum of Jordan blocks as in (2.4.6): A = i ⊕k,l i=j=1 Jmij (λi ). Here mi = mi1 ≥ ⋯ ≥ mili ≥ 1 for i = 1, . . . , k, and λi ≠ λj
3.1. Functions of matrices
123
i for i ≠ j. Thus, (3.1.2) holds with ni = ∑lj=1 mij for i = 1, . . . , k. Let f (z) be k,li a polynomial. Then, f (A) = ⊕i=j=1 f (Jmij (λi )). Use the results for Jn (λ) to deduce mij −1 (r) f (λi ) r i Hmij . f (A) = ⊕k,l i=j=1 ∑ r! r=0
Let Zij ∈ Cn×n be a block diagonal matrix of the following form. For each integer l ∈ [k] with l ≠ i, all the corresponding blocks to Jlr (λl ) are equal to j zero. In the block corresponding to Jmir (λi ), Zij has the block matrix Hm , ir for j = 0, . . . , mi − 1. Note that each Zij is a nonzero matrix with 0 − 1 entries. Furthermore, two different Zij and Zi′ j ′ do not have a common 1 entry. Hence, Zij , i = 1, . . . , k, j = 0, . . . , mi − 1 are linearly independent. It is straightforward to deduce (3.1.3) from the above identity. Let B ∈ Cn×n . Then, B = U AU −1 , where A is the Jordan canonical form of B. Note that A and B have the same characteristic polynomial. Let f (z) ∈ C[z]. Then (3.1.3) holds. Clearly, k mi −1
f (B) = U f (A)U −1 = ∑ ∑
i=1 j=0
f (j) (λi ) U Zij U −1 . j!
Hence, (3.1.3) holds for B, where U Zij U −1 , i = 1, . . . , k, j = 0, . . . , mi − 1 are the B-components. The uniqueness of the A-components follows from the existence and uniqueness of the Lagrange-Sylvester interpolation polynomial, explained below. ◻ Theorem 3.1.3 (the Lagrange–Sylvester interpolation polynomial). Let λ1 , . . . , λk ∈ C be k distinct numbers. Let m1 , . . . , mk be k positive integers and m = m1 + ⋯ + mk . Let sij , i = 1, . . . , k, j = 0, . . . , mi − 1 be any m complex numbers. Then, there exists a unique polynomial φ(z) of degree at most m − 1 satisfying the conditions φ(j) (λi ) = sij , for i = 1, . . . , k, j = 0, . . . , mi − 1. (For mi = 1, i = 1, . . . , k, φ is the Lagrange interpolating polynomial.) Proof. The Lagrange interpolating polynomial is given by the formula (z − λ1 ) . . . (z − λi−1 )(z − λi+1 ) . . . (z − λk ) si0 . (λ i − λ1 ) . . . (λi − λi−1 )(λi − λi+1 ) . . . (λi − λk ) i=1 k
φ(z) = ∑
In the general case, one can determine φ(z) as follows. Let ψ(z) ∶= ∏ki=1 (z − λi )mi . Then k mi −1
φ(z) = ψ(z) ∑ ∑
i=1 j=0
k mi −1 tij = ∑ ∑ tij (z − λi )j θi (z), m −j i (z − λi ) i=1 j=0
(3.1.4)
where θi = Observe that
ψ(z) = ∏(z − λj )mj , for i = 1, . . . , k. (z − λi )mi j≠i
dl θi (λr ) = 0, for l = 0, . . . , mr − 1 and r ≠ i. dz l
(3.1.5)
(3.1.6)
124
Chapter 3. Applications of the Jordan canonical form
Now start to determine ti0 , ti1 , . . . , ti(mi −1) recursively for each fixed value of i. This is done by using the values of φ(λi ), φ′ (λi ), . . . , φ(mi −1) (λi ) in the above formula for φ(z). Note that deg φ ≤ m − 1. It is straightforward to show that φ′ (λi ) − ti0 θi′ (λi ) φ (λi ) − ti0 θi (λi ) − 2ti1 θi′ (λi ) φ(λi ) ti0 = , ti1 = , ti2 = . θi (λi ) θi (λi ) 2θi (λi ) (3.1.7) The uniqueness of φ is shown as follows. Assume that θ(z) is another Lagrange–Sylvester polynomial of degree less than m. Then, ω(z) ∶= φ(z) − θ(z) must be divisible by (z − λi )mi , since ω (j) (λi ) = 0, for j = 0, . . . , mi − 1, for each i = 1, . . . , k. Hence, ψ(z)∣ω(z). As deg ω(z) ≤ m − 1, it follows that ω(z) is the zero polynomial, i.e., φ(z) = θ(z). ◻ ′′
′′
How to find the components of A: Assume that the minimal polynomial of A, ψ(z) is given by (3.1.2). Then, we have Zij = φij (A), where φij (z) is the Lagrange–Sylvester polynomial of degree at most m satisfying (q)
φij (λp ) = 0, for p ≠ i and φij (λi ) = j!δpj , j = 0, . . . , mi − 1.
(3.1.8)
Indeed, assume that φij (z) satisfies (3.1.8). Use (3.1.3) to deduce that φij (A) = Zij . To find φij , do the following steps. Set φij (z) =
mi −1 ⎞ θi (z)(z − λi )j ⎛ 1 + ∑ al (z − λi )l−j , j = 0, . . . , mi − 2, (3.1.9) θi (λi ) ⎝ l=j+1 ⎠
φi(mi −1) (z) =
θi (z)(z − λi )mi −1 . θi (λi )
(3.1.10)
To find aj+1 , . . . , ami −1 in (3.1.9), use the conditions (q)
φij (λi ) = 0 for q = j + 1, . . . , mi − 1.
(3.1.11)
Namely, the condition for q = j + 1 determines aq+1 . Next, the condition for q = j + 2 determines aq+2 . Continue in this manner to find all aj+1 , . . . , ami −1 . We now explain briefly why these formulas hold. Since θi (z)∣φij (z), it fol(q) lows that φij (λp ) = 0 for p ≠ i and q = 0, . . . , mq − 1. (That is, the first con(q)
dition of (3.1.8) holds.) Since (z − λi )j ∣φij (z), it follows that φij (λi ) = 0, for (j)
q = 0, . . . , j − 1. A straightforward calculation shows that φij (λi ) = j!. The values of aj+1 , . . . , ami −1 are determined by (3.1.11). Proof of the uniqueness of A-components. Let φij (z) be the LagrangeSylvester polynomial given by (3.1.8). Then, (3.1.3) yields that Zij = φij (A). ◻ The following proposition gives a computer algorithm to find Zij . (Not recommended for hand calculations!) Proposition 3.1.4. Let A ∈ Cn×n . Assume that the minimal polynomial ψ(z) is given by (3.1.2) and denote m = deg ψ. Then, for each integer u, v ∈ [n],
3.1. Functions of matrices
125
(l)
denote by auv and (Zij )uv the (u, v) entries of Al and of the A-component Zij , respectively. Then, (Zij )uv , i = 1, . . . , k, j = 0, . . . , mi − 1 are the unique solutions of the following system with m unknowns: k mi −1 l max(l−j,0) (Zij )uv = a(l) ∑ ∑ ( )λi uv , j i=1 j=0
l = 0, . . . , m − 1.
(3.1.12)
(Note that (jl ) = 0 for j > l.) Proof. Consider the equality (3.1.3) for f (z) = z l , where l = 0, . . . , m − 1. Restricting these equalities to (u, v) entries, we deduce that (Zij )uv satisfy the system (3.1.12). Thus, the systems (3.1.12) are solvable for each pair (u, v), u, v = 1, . . . , n. Let Xij ∈ Cn×n , i = 1, . . . , k, j = 1, . . . , mi − 1 such that (Xij )uv satisfy
(λi ) i −1 f the system (3.1.12), for each u, v ∈ [n]. Hence, f (A) = ∑ki=1 ∑m Tij , for j=0 j! f (z) = z l and l = 0, . . . , m−1. Hence, the above equality holds for any polynomial f (z) of degree less than m. Apply the above formula to the Lagrange–Sylvester polynomial φij as given in the proof of the uniqueness of the A-components. Then, φij (A) = Xij . So Xij = Zij . Thus, each system (3.1.12) has a unique solution. ◻ (j)
The algorithm for finding the A-components and their complexity: (a) Set i = 1. (b) Compute and store Ai . Check if In , A, . . . , Ai are linearly independent. m−i m−i is the . Then, ψ(z) = z m − ∑m (c) m = i and express Am = ∑m i=1 ai z i=1 ai A minimal polynomial.
(d) Find the k roots of ψ(z) and their multiplicities: ψ(z) = ∏ki=1 (z − λi )mi . (e) Find the A-components by solving n2 systems (3.1.12). Complexity of an algorithm is a measure of the amount of time or space required by an algorithm for an input of a given size. The maximum complexity to find ψ(z) happens when m = n. Then, we need to compute and store In , A, A2 , . . . , An . So, we need n3 storage space. Viewing In , A, . . . , Ai as row 2 vectors arranged as i × n2 matrix Bi ∈ Ci×n , we bring Bi to a row echelon form: Ci = Ui Bi , Ui ∈ C i×i . Note that Ci is essentially upper triangular. Then, we add (i + 1)th row: Ai+1 to the Bi to obtain Ci+1 = Ui+1 Bi+1 . (Ci is an i × i submatrix of Ci+1 .) To get Ci+1 from Ci we need 2in2 flops. In the case m = n, Cn2 +1 has the last row zero. So to find ψ(z) we need at most Kn4 flop. (K ≤ 2?). The total storage space is around 2n3 . Now to find the roots of ψ(z) with certain precision will take a polynomial time, depending on the precision. To solve n2 systems with n variables, given in (3.1.12), use Gauss–Jordan for the augmented matrix [S T ]. Here, S ∈ Cn×n stands for the coefficients of the system (3.1.12), depending on λ1 , . . . , λk . Next, 2 T ∈ Cn×n given the “left-hand side” of n2 systems of (3.1.12). One needs around n3 storage space. Bring [S T ] to [In Q] using Gauss–Jordan to find A-components. To do that we need about n4 flops. In summary, we need storage of 2n3 and around 4n4 flops. (This would suffice to find the roots of ψ(z) with good enough precision.)
126
Chapter 3. Applications of the Jordan canonical form
3.1.1 Linear recurrence equation Consider the linear homogeneous recurrence equation of order n over a field F: um = a1 um−1 + a2 um−2 + ⋯ + an um−n ,
m = n, n + 1, . . .
(3.1.13)
The initial conditions are the values u0 , . . . , un−1 . Given the initial conditions, then the value of each um for m ≥ n is determined recursively by (3.1.13). Let ⎡0 ⎢ ⎢0 ⎢ ⎢ A=⎢ ⋮ ⎢ ⎢0 ⎢ ⎢an ⎣
1 0 ⋮ 0
0 1 ⋮ 0
0 0 ⋮ 0
an−1
an−2
an−3
... 0 ... 0 ⋮ ⋮ ... 0 . . . a2
0 ⎤⎥ ⎡ uk ⎤ ⎢ ⎥ 0 ⎥⎥ ⎢ u ⎥ ⎥ ⎢ k+1 ⎥ n×n ⋮ ⎥ ∈ F , xk = ⎢ ⎥ ∈ Fn , (3.1.14) ⎥ ⎢ ⋮ ⎥ ⎢ ⎥ 1 ⎥⎥ ⎢uk+n−1 ⎥ ⎥ ⎣ ⎦ a1 ⎦
where k = 0, 1, . . .. Hence, the linear homogeneous recurrence equation (3.1.13) is equivalent to the homogeneous linear system relation xk = Axk−1 ,
k = 1, 2, . . . .
(3.1.15)
Then, the formula for um , m ≥ n, can be obtained from the last coordinate of xm−n+1 = Am−n+1 x0 . Thus, we could use the results of this section to find um . Note that this can also be used as a tool to compute a formula for Fibonacci numbers. See also section 5.3.
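As a concrete illustration of this reduction, here is a small sketch (assuming Python with NumPy, which is not part of the text) that computes the Fibonacci numbers u_m = u_{m−1} + u_{m−2} via (3.1.14)-(3.1.15); for very large m one would switch to exact integer arithmetic to avoid overflow:

```python
import numpy as np

# n = 2, a_1 = a_2 = 1 in (3.1.13), so (3.1.14) gives A = [[0, 1], [1, 1]]
A = np.array([[0, 1], [1, 1]])
x0 = np.array([0, 1])             # initial conditions u_0 = 0, u_1 = 1

def u(m):
    # x_m = A^m x_0, and u_m is its first coordinate
    return (np.linalg.matrix_power(A, m) @ x0)[0]

print([u(m) for m in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```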
3.1.2 Worked-out problems ⎡2 ⎢ ⎢−4 ⎢ 1. Compute the components of A = ⎢ ⎢1 ⎢ ⎢2 ⎣ Assume that its minimal polynomial 2.4.2).
2 −2 4 ⎤⎥ −3 4 −6⎥⎥ ⎥. 1 −1 2 ⎥⎥ 2 −2 4 ⎥⎦ is ψ(z) = z(z − 1)2 (see Theorem
Solution: We have λ1 = 0, θ1 (z) = (z − 1)2 , θ1 (λ1 ) = 1, λ2 = 1, θ2 (z) = z, θ2 (λ2 ) = 1. Next, (3.1.10) yields φ10 =
θ1 (z)z 0 = (z − 1)2 , θ1 (0)
φ21 =
θ2 (z)(z − 1)1 = z(z − 1). θ2 (1)
Also, (3.1.9) yields that φ20 = z(1+a1 (z−1)). The value of a1 is determined by the condition φ′20 (1) = 0. Hence, a1 = −1. So φ20 = z(2 − z). This leads to Z10 = (A − I)2 , Z20 = A(2I − A), Z21 = A(A − I). Use (3.1.3) to deduce f (A) = f (0)(A − I)2 + f (1)A(A − 2I) + f ′ (1)A(A − I), for any polynomial f (z).
(3.1.16)
3.1.3 Problems 1. Let A ∈ Cn×n and assume that det(zIn − A) = ∏ki=1 (z − λi )ni , and the minimal polynomial of A is ψ(z) = ∏ki=1 (z − λi )mi , where λ1 , . . . , λk are k distinct eigenvalues of A. Assume that Zij , j = 0, . . . , mi − 1, i = 1, . . . , k are the A-components. (a) Show that Zij Zpq = 0 for i ≠ p. (b) What is the exact formula for Zij Zip ?
3.2 Power stability, convergence, and boundedness of matrices Let A ∈ Cn×n . Assume that the minimal polynomial ψ(z) is given by (3.1.2) and denote by Zij , i = 1, . . . , k, j = 0, . . . , mj − 1 the A-components. Using Theorem 3.1.2, obtain k mi −1 l max(l−j,0) Al = ∑ ∑ ( )λi Zij , (3.2.1) i=1 j=0 j for each positive integer l. If we know the A-components, then to compute Al we need only around 2mn2 ≤ 2n3 flops! Thus, we need at most 4n4 flops to compute Al , including the computations of A-components, without dependence on l! (Note that λji = 8 elog jλi .) So to find x108 = A10 x0 , discussed in the beginning of the previous section, we need about 104 flops. So to compute x108 , we need about 104 102 flops compared with 108 102 flops using the simple algorithm explained in the beginning of the previous section. There are much simpler algorithms to compute Al that are roughly of the order (log2 l)2 n3 of computations and (log2 l)2 n2 (4n2 ) storage. Definition 3.2.1. Let A ∈ Cn×n . A is called power stable if liml→∞ Al = 0. Also, A is called power convergent if liml→∞ Al = B, for some B ∈ Cn×n . A is called power bounded if there exists K > 0 such that the absolute value of every entry of every Al , l = 1, . . . is bounded above by K. Theorem 3.2.2. Let A ∈ Cn×n . Then 1. A is power stable if and only if each eigenvalue of A is in the interior of the unit disk: ∣z∣ < 1. 2. A is power convergent if and only if each eigenvalue λ of A satisfies one of the following conditions: (a) ∣λ∣ < 1; (b) λ = 1 and each Jordan block of the JCF of A with an eigenvalue 1 is of order 1, i.e., 1 is a simple zero of the minimal polynomial of A. 3. A is power bounded if and only if each eigenvalue λ of A satisfies one of the following conditions: (a) ∣λ∣ < 1;
(b) ∣λ∣ = 1 and each Jordan block of the JCF of A with an eigenvalue λ is of order 1, i.e., λ is a simple zero of the minimal polynomial of A. (Clearly, power convergence implies power boundedness.) Proof. Consider formula (3.2.1). Since the A-components Zij , i = 1, . . . , k, j = 0, . . . , mi − 1 are linearly independent, we need to satisfy the conditions of the theorem for each term in (3.2.1), which is (jl )λl−j i Zij for l >> 1. Note that for a fixed j, liml→∞ (jl )λl−j = 0 if and only if ∣λi ∣ < 1. Hence, we deduce the condition i 1 of the theorem. Note that the sequence (jl )λl−j i , l = j, j + 1, . . . , converges if and only if either ∣λi ∣ < 1 or λi = 1 and j = 0. Hence, we deduce condition 2 of the theorem. Note that the sequence (jl )λl−j i , l = j, j + 1, . . . , is bounded if and only if either ∣λi ∣ < 1 or ∣λi ∣ = 1 and j = 0. Thus, we deduce condition 3 of the theorem. ◻ Corollary 3.2.3. Let A ∈ Cn×n and consider the iterations xl = Axl−1 for l = 1, . . .. Then, for any x0 , 1. liml→∞ xl = 0 if and only if A is power stable. 2. xl , l = 0, 1, . . . converges if and only if A is power convergent. 3. xl , l = 0, 1, . . . is bounded if and only if A is power bounded. Proof. If A satisfies the conditions of an item of Theorem 3.2.2, then the corresponding condition of the corollary clearly holds. Assume that the conditions of an item of the corollary holds. Choose x0 = ej = (δ1j , . . . , δnj )⊺ , for j = 1, . . . , n to deduce the corresponding condition of Theorem 3.2.2. ◻ Theorem 3.2.4. Let A ∈ Cn×n and consider the nonhomogeneous iterations xl = Axl−1 + bl ,
l = 0, . . .
(3.2.2)
Then, we have the following statements: 1. liml→∞ xl = 0 for any x0 ∈ Cn and any sequence b0 , b1 , . . . satisfying the condition liml→∞ bl = 0 if and only if A is power stable. 2. The sequence xl , l = 0, 1, . . . converges for any x0 and any sequence b0 , b1 , . . . satisfying the condition ∑ll=0 bl converges if and only if A is power convergent. 3. The sequence xl , l = 0, 1, . . . is bounded for any x0 and any sequence b0 , b1 , . . . satisfying the condition ∑ll=0 ∣∣bl ∣∣∞ converges if and only if A is power bounded. (Here, ∣∣(x1 , . . . , xn )∣∣∞ = maxi∈[n] ∣xi ∣.) Proof. Assume that bl = 0. Since x0 is arbitrary, we deduce the necessity of all the conditions of Theorem 3.2.2. The sufficiency of the above conditions follows from the Jordan canonical form of A: Let J = U −1 AU , where U is an invertible matrix and J is the Jordan canonical form of A. Letting yl ∶= U −1 xl and cl = U −1 bl , it is enough to prove the sufficiency part of the theorem for the case where A is the sum of Jordan blocks. In this case, the system (3.2.2) reduces to independent systems of equations for each Jordan block. Thus, it is left to prove the theorem when A = Jn (λ).
1. We show that if A = Jn (λ) and ∣λ∣ < 1, then liml→∞ xl = 0, for any x0 and bl , l = 1, . . . if liml→∞ bl = 0. We prove this claim by the induction on n. For n = 1, (3.2.2) is reduced to xl = λxl−1 + bl ,
x0 , xl , bl ∈ C for l = 1, . . . .
(3.2.3)
It is straightforward to show, e.g., use the induction that l
xl = ∑ λi bl−i = bl + λbl−1 + ⋯ + λl b0
l = 1, . . . , where b0 ∶= x0 .
(3.2.4)
i=0
Let βm = supi≥m ∣bi ∣. Since liml→∞ bl = 0, it follows that each βm is finite, the sequence βm , m = 0, 1, . . . decreasing and limm→∞ βm = 0. Fix m. Then, for l > m we have l
l−m
i=0
i=0
m
∣xl ∣ ≤ ∑ ∣λ∣i ∣bl−i ∣ = ∑ ∣λ∣i ∣bl−i ∣ + ∣λ∣l−m ∑ ∣λ∣j ∣∣bm−j ∣ l−m
m
i=0
j=1
j=1
∞
m
i=0
j=1
≤ βm ∑ ∣λ∣i + ∣λ∣l−m ∑ ∣λ∣j ∣∣bm−j ∣ ≤ βm ∑ ∣λ∣i + ∣λ∣l−m ∑ ∣λ∣j ∣∣bm−j ∣ =
m
βm βm + ∣λ∣l−m ∑ ∣λ∣j ∣∣bm−j ∣ → as l → ∞. 1 − ∣λ∣ 1 − ∣λ∣ j=1
βm That is, lim supl→∞ ∣xl ∣ ≤ 1−∣λ∣ . As limm→∞ βm = 0, it follows that lim supl→∞ ∣xl ∣ = 0, which is equivalent to the statement liml→∞ xl = 0. This proves the case n = 1.
Assume that the theorem holds for n = k. Let n = k + 1. View x⊺l ∶= (x1,l , yl⊺ , )⊺ , bl = (b1,l , c⊺l , )⊺ , where yl = (x2,l , . . . , xk+1,l )⊺ , cl ∈ Ck are the vectors composed of the last k coordinates of xl and bl , respectively. Then (3.2.2) for A = Jk+1 (λ) for the last k coordinates of xl is given by the system yl = Jk (λ)yl−1 +cl for l = 1, 2, . . .. Since liml→∞ cl = 0, the induction hypothesis yields that liml→∞ yl = 0. The system (3.2.2) for A = Jk+1 (λ) for the first coordinate is x1,l = λx1,l−1 + (x2,l−1 + b1,l ), for l = 1, 2, . . .. From the induction hypothesis and the assumption that liml→∞ bl = 0, we deduce that liml→∞ x2,l−1 + b1,l = 0. Hence, from the case k = 1, we deduce that liml→∞ x1,l = 0. Therefore, liml→∞ xl = 0. The proof of this case is concluded. 2. Assume that each eigenvalue λ of A satisfies the following conditions: either ∣λ∣ < 1 or λ = 1 and each Jordan block corresponding to 1 is of order 1. As we pointed out, we assume that A is a direct sum of its Jordan form. So first we consider A = Jk (λ) with ∣λ∣ < 1. Since we assumed that ∑∞ l=1 bl converges, we deduce that liml→∞ bl = 0. Thus, by part 1 we get that liml→∞ xl = 0. Assume now that A = (1) ∈ C1×1 . Thus, we consider (3.2.3) with λ = 1. Then, (3.2.4) yields that xl = ∑li=0 bl . By the assumption of the theorem, ∞ ∑i=1 bl converges, hence the sequence xl , l = 1, . . . converges. 3. As in part 2, it is enough to consider the case J1 (λ) with ∣λ∣ = 1. Note that (3.2.4) yields that ∣xl ∣ ≤ ∑li=0 ∣bi ∣. The assumption that ∑∞ i=1 ∣bi ∣ converges implies that ∣xl ∣ ≤ ∑∞ ◻ i=0 ∣bi ∣ < ∞.
130
Chapter 3. Applications of the Jordan canonical form
Remark 3.2.5. The stability, convergence, and boundedness of the nonhomogeneous systems, xl = Al xl−1 , Al ∈ Cn×n ,
l = 1, . . . ,
xl = Al xl−1 + bl , Al ∈ Cn×n , bl ∈ Cn ,
l = 1, . . . ,
are much harder to analyze.
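To see Theorem 3.2.2 and Corollary 3.2.3 in action, power stability can be tested numerically through the spectral radius. A minimal sketch (assuming Python with NumPy, not part of the text; the matrices are arbitrary illustrations, and for eigenvalues on the unit circle one must also inspect the Jordan structure, as in part 3 of the theorem):

```python
import numpy as np

def spectral_radius(A):
    return max(abs(np.linalg.eigvals(A)))

A_stable  = np.array([[0.5, 1.0], [0.0, 0.3]])   # eigenvalues 0.5 and 0.3
A_bounded = np.array([[0.0, 1.0], [-1.0, 0.0]])  # eigenvalues i and -i, both simple

print(spectral_radius(A_stable) < 1)                            # True: power stable
print(np.allclose(np.linalg.matrix_power(A_stable, 200), 0))    # powers tend to 0
print(np.linalg.norm(np.linalg.matrix_power(A_bounded, 201)))   # powers stay bounded
```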
3.2.1 Systems of linear ordinary differential equations A system of linear ordinary differential equations (ODE) with constant coefficients is given as y1′ y2′ ⋮ yn′
= a11 y1 = a21 y1 ⋮ ⋮ = an1 y1
+ a12 y2 + a22 y2 ⋮ ⋮ + an2 y2
+ . . . + a1n yn + . . . + a2n yn . ⋮⋮ + . . . + a1n yn
(3.2.5)
In matrix terms we write y′ = Ay, where y = y(t) = (y1 (t), y2 (t), . . . , yn (t))⊺ and A ∈ Cn×n is a constant matrix. We guess a solution of the form y(t) = eλt x, where x = (x1 , . . . , xn )⊺ ∈ Cn is a constant vector. We assume that x ≠ 0, otherwise we have a constant noninteresting solution x = 0. Then y′ = (eλt )′ x = λeλt x. The system y′ = Ay is equivalent to λeλt x = A(eλt x). Since eλt ≠ 0, divide by eλt to get Ax = λx. Corollary 3.2.6. If x(≠ 0) is an eigenvector of A corresponding to the eigenvalue λ, then y(t) = eλt x is a nontrivial solution of the given system of linear ODE with constant coefficients. Theorem 3.2.7. Assume that A ∈ Cn×n is diagonalizable. Let det(A − λI) = (λ1 −λ)m1 (λ2 −λ)m2 . . . (λk −λ)mk , where λi ≠ λj for i ≠ j, 1 ≤ mi (the multiplicity of λi ), and dim N(A − λi I) = mi , N(A − λi I) = span {xi1 , . . . , ximi }, for i = 1, . . . , k. Then the general solution of (3.2.5) is k,mi
y(t) = ∑ cij eλi (t−t0 ) xij .
(3.2.6)
i=1,j=1
The constants cij , i = 1, . . . , k, j = 1, . . . , mi , are determined uniquely by the initial condition y(t0 ) = y0 . Proof. Since each xij is an eigenvector of A corresponding to the eigenvalue λi , it follows from Corollary 3.2.6 that any y given by (3.2.6) is a solution of (3.2.5). From the proof of Theorem 2.1.12 it follows that the eigenvectors xij , i = 1, . . . , k, j = 1, . . . , mi form a basis in Cn . Hence there exists a unique i ◻ linear combination of the eigenvectors of A satisfying ∑k,m i=1,j=1 cij xij = c. Example 3.1. y1′ y2′
= 0.7y1 = 0.3y1
+ 0.2y2 , + 0.8y2 .
(3.2.7)
The right-hand side is given by A = [
0.7 0.3
0.2 ] and det(A−λI) = (λ−1)(λ−0.5). 0.8
⊺ 2 Ax1 = x1 , Ax2 = 0.5x2 , x1 = ( , 1) , x2 = (−1, 1)⊺ . 3
The general solution of the system is y(t) = c1 et x1 + c2 e0.5t x2 : [ y1 (t) =
2c1 et − c2 e0.5t , y2 (t) = c1 et + c2 e0.5t . 3
Example 3.2.
⎡ 2 ⎢ ⎢ Hence A = ⎢ 1 ⎢ ⎢ 1 ⎣
2 y1 (t) −1 ] = c1 et [ 3 ] + c2 e0.5t [ ], y2 (t) 1 1
y1′ = 2y1 y2′ = y1 y3′ = y1
−3y2 −2y2 −3y2
+y3 , +y3 , +2y3 .
−3 1 ⎤⎥ ⎥ −2 1 ⎥ and we have ⎥ −3 2 ⎥⎦ det(A − λI) = −λ(λ − 1)2 , ⎡ 1 3 ⎢ ⎢ X = [x1 x2 x3 ] = ⎢ 1 1 ⎢ ⎢ 1 0 ⎣
λ1 = 0, λ2 = λ3 = 1, −1 ⎤⎥ ⎥ 0 ⎥. ⎥ 1 ⎥⎦
The general solution is y(t) = c1 e0 x1 + c2 et x2 + c3 et x3 , where y1 (t) = c1 + 3c2 et − c3 et , y2 (t) = c1 + c2 et , y3 (t) = c1 + c3 et .
3.2.2 Initial conditions Assume that the initial conditions of the system (3.2.5) are given at the time t0 = 0: y(0) = y0 . On the assumption that A is diagonalizable, i.e., XΛX −1 , it follows that the unknown vector c appearing in (3.2.6) satisfies Xc = y0 . We solve this system either by Gauss elimination or by inverting X: c = X −1 y0 . Example 3.3. In the system given in (3.2.7) find the solution satisfying the initial condition y(0) = (1, 2)⊺ . Solution: This condition is equivalent to [ 2 c [ 1 ]=[ 3 c2 1
−1
−1 ] 1
2 3
1
[
c 1 −1 ] [ 1 ] = [ ]: c2 2 1 ⎡ 3 1 ⎢ ] = ⎢ 35 ⎢ − 2 ⎣ 5
Now substitute these values of c1 , c2 in (3.2.6).
3 5 2 5
⎤ 1 ⎡ ⎥ ⎢ ⎥[ ] = ⎢ ⎥ 2 ⎢ ⎦ ⎣
9 5 1 5
⎤ ⎥ ⎥. ⎥ ⎦
132
Chapter 3. Applications of the Jordan canonical form
3.2.3 Complex eigenvalues of real matrices Proposition 3.2.8. Let A ∈ Rn×n and assume that λ ∶= α+iβ, α, β ∈ R is nonreal eigenvalue (β ≠ 0). Then the corresponding eigenvector x = u + iv, u, v ∈ Rn ¯ = α−iβ ≠ λ is another eigenvalue of (Au = λu) is nonreal (v ≠ 0). Furthermore, λ ¯ = u−iv. The corresponding contributions A with the corresponding eigenvector x of the above two complex eigenvectors to the solution of y′ = Ay is eαt C1 (cos(βt)u − sin(βt)v) + eαt C2 (sin(βt)u + cos(βt)v).
(3.2.8)
These two solutions can be obtained by considering the real linear combination of the real and imaginary parts of the complex solution eλt x. Proof. Consider Euler’s formula for ez , where z = a + ib, a, b ∈ R: ez = ea+ib = ea eib = ea (cos b + i sin b). Now find the real part of the complex solution (C1 + iC2 )e(α+iβ)t (u + iv) to deduce (3.2.8). ◻
3.2.4 Second-order linear differential systems Let A1 , A2 ∈ Cn×n and y = (y1 (t), . . . , yn (t))⊺ . Then the second-order differential system of linear equations is given as y′′ = A1 y + A2 y′ .
(3.2.9)
It is possible to translate this system to a system of the first order involving matrices and vectors of double size. Let z = (y1 , . . . , yn , y1′ , . . . , yn′ )⊺ . Then z′ = Az,
A=[
0n A1
In ] ∈ C2n×2n . A2
(3.2.10)
Here 0n is the n × n zero matrix and In is the n × n identity matrix. The initial conditions are y(t0 ) = a ∈ Cn , y′ (t0 ) = b ∈ Cn , which are equivalent to the initial conditions z(t0 ) = c ∈ C2n . Thus the solution of the second-order differential system with n unknown functions can be solved by converting this system to the first-order system with 2n unknown functions.
3.3 eAt and stability of certain systems of ordinary differential equations The exponential function ez has the Maclaurin expansion ez = 1 + z +
∞ l z2 z3 z + +⋯=∑ . 2 6 l=0 l!
For A ∈ Cn×n , let ∞
1 k 1 1 A = I + A + A2 + A3 + ⋯ . k! 2! 3! k=0
eA = ∑
(3.3.1)
If D = diag(λ1 , λ2 , . . . , λn ) it is straightforward to show that eD = diag(eλ1 , eλ2 , . . . , eλn ).
(3.3.2)
Hence, for a diagonalizable matrix A, we get the identity eA = XeD X −1 ,
A = XDX −1 .
(3.3.3)
Similarly, we define ∞
1 1 tk k A = I + tA + t2 A2 + t3 A3 + ⋯ . k! 2! 3! k=0
etA = ∑
(3.3.4)
So taking the derivative with respect to t we obtain ∞
ktk−1 k 1 1 A = 0 + A + 2tA2 + 3t2 A3 + ⋯ = AeAt . k! 2! 3! k=1
(etA )′ = ∑
(3.3.5)
3.3.1 Examples of exponential of matrices Example 3.4. A=[ eA = [ etA = [
0.7 0.3
2 0.2 ]=[ 3 0.8 1
2 3
e1 −1 ][ 0 1
2 3
et −1 ][ 0 1
1
1
1 −1 ][ 0 1 0
e0.5 0 e0.5t
⎡ 3 ⎢ ] ⎢ 53 ⎢ − ⎣ 5 ⎡ 3 ⎢ ] ⎢ 53 ⎢ − ⎣ 5
3 3 0 ] [ 53 25 ] , 0.5 −5 5 3 ⎤ ⎡ 2e−3e0.5 2e−2e0.5 ⎤ ⎥ 5 ⎥ ⎢ 5 5 ⎢ ⎥, ⎥= 0.5 ⎥ 2 ⎥ ⎢ 3e−3e0.5 3e+2e ⎢ ⎥ 5 ⎦ ⎣ ⎦ 5 5 t 0.5t t 0.5t 3 ⎤ ⎡ 2e −3e 2e −2e 5 ⎥ ⎢ 5 5 ⎥=⎢ 2 ⎥ ⎢ 3et −3e0.5t 3et +2e0.5t ⎢ 5 ⎦ ⎣ 5 5
⎤ ⎥ ⎥. ⎥ ⎥ ⎦
In the system of ODE (3.2.7) the solution satisfying IC y(0) = (1, 2)⊺ is given as ⎡ ⎢ y(t) = e y(0) = ⎢⎢ ⎢ ⎣ At
2et −3e0.5t 5 3et −3e0.5t 5
2et −2e0.5t 5 3et +2e0.5t 5
⎡ ⎤ ⎥ 1 ⎢ ⎥[ ] = ⎢ ⎢ ⎥ 2 ⎢ ⎥ ⎣ ⎦
6et −7e0.5t 5 9et +e0.5t 5
⎤ ⎥ ⎥. ⎥ ⎥ ⎦
Compare this solution with the solution given in Example 3.3. Example 3.5. B = (00
1 0)
is nondiagonalizable.
We compute eB , etB using
power series (3.3.4). Note B 2 = 0. Hence B k = 0 for k ≥ 2. So 1 2 1 3 1 B + B +⋯=I +B =[ 0 2! 3!
1 ], 1
1 2 2 1 3 3 1 t B + t B + ⋯ = I + tB = [ 0 2! 3!
t ]. 1
eB = I + B + etB = I + tB + Hence the system of
y1′ = y2 y2′ = 0
has the general solution [
y1 (t) c c +c t ] = etB [ 1 ] = [ 1 2 ] . y2 (t) c2 c2
134
Chapter 3. Applications of the Jordan canonical form
Proposition 3.3.1. Let A ∈ Cn×n and consider the linear system of n ordinary differential equations with constant coefficients dx(t) = Ax(t), where x(t) = dt (x1 (t), . . . , xn (t))⊺ ∈ Cn , satisfying the initial conditions x(t0 ) = x0 . Then, x(t) = eA(t−t0 ) x0 is the unique solution to the above system. More generally, let b(t) = (b1 (t), . . . , bn (t))⊺ ∈ Cn be any continuous vector function on R and consider the nonhomogeneous system of n ordinary differential equations with the initial condition dx(t) = Ax(t) + b(t), dt
x(t0 ) = x0 .
(3.3.6)
Then, this system has a unique solution of the form x(t) = eA(t−t0 ) x0 + ∫
t t0
eA(t−u) b(u)du.
(3.3.7)
Proof. The uniqueness of the solution of (3.3.6) follows from the uniqueness of solutions to system of ODE. The first part of the proposition follows from (3.3.5). To deduce the second part, one does the variations of parameters. Namely, one tries a solution x(t) = eA(t−t0 ) y(t), where y(t) ∈ Cn is unknown vector function. Hence, x′ = (eA(t−t0 ) )′ y(t) + eA(t−t0 ) y′ (t) = AeA(t−t0 ) y(t) + eA(t−t0 ) y′ (t) = Ax(t) + eA(t−t0 ) y′ (t). Substitute this expression of x(t) to (3.3.6) to deduce the differential equation y′ = e−A(t−t0 ) b(t). Since y(t0 ) = x0 , this simple equation has a unique solution u y(t) = x0 + ∫t0 eA(u−t0 ) b(u)du. Now, multiply by eA(t−t0 ) and use the fact that eAt eAu = eA(u+v) to deduce (3.3.7). ◻ Use (3.1.3) for ezt and the observation that
dj ezt dz j
k mi −1 j λi t
eAt = ∑ ∑
j=1 j=0
= tj ezt , j = 0, 1, . . . to deduce
t e Zij . j!
(3.3.8)
We can substitute this expression for eAt in (3.3.7) to get a simple expression of the solution x(t) of (3.3.6). Definition 3.3.2. Let A ∈ Cn×n . A is called exponentially stable, or simple stable, if limt→∞ eAt = 0. A is called exponentially convergent if limt→∞ eAt = B, for some B ∈ Cn×n . A is called exponentially bounded if there exists K > 0 such that the absolute value of every entry of every eAt , t ∈ [0, ∞) is bounded above by K. Theorem 3.3.3. Let A ∈ Cn×n . Then 1. A is stable if and only if each eigenvalue of A is in the left half of the complex plane: Rz < 0. 2. A is exponentially convergent if and only if each eigenvalue λ of A satisfies one of the following conditions:
3.3. eAt and stability of certain systems of ordinary differential equations
135
(a) Rλ < 0; (b) λ = 2πli, for some integer l, and each Jordan block of the JCF of A with an eigenvalue λ is of order 1, i.e., λ is a simple zero of the minimal polynomial of A. 3. A is exponentially bounded if and only if each eigenvalue λ of A satisfies one of the following conditions: (a) Rλ < 0; (b) Rλ = 0 and each Jordan block of the JCF of A with an eigenvalue λ is of order 1, i.e., λ is a simple zero of the minimal polynomial of A. Proof. Consider the formula (3.3.8). Since the A-components Zij , i = 1, . . . , k, j = 0, . . . , mi − 1 are linearly independent, we need to satisfy the conditions of the j theorem for each term in (3.3.8), which is tj! eλi t Zij . Note that for a fixed j, limt→∞
tj λi t e j!
= 0 if and only if Rλi < 0. Hence, we deduce the condition 1 of the j
theorem. Note that the function tj! eλi t converges as t → ∞ if and only if either Rλi < 0 or eλi = 1 and j = 0. Hence, we deduce the condition 2 of the theorem. j Note that the function tj! eλi t is bounded for t ≥ 0 if and only if either Rλi < 0 or ∣eλi ∣ = 1 and j = 0. Hence, we deduce condition 3 of the theorem. ◻ Corollary 3.3.4. Let A ∈ Cn×n and consider the system of differential equations dx(t) = Ax(t), x(t0 ) = x0 . Then, for any x0 , dt 1. limt→∞ x(t) = 0 if and only if A is stable. 2. x(t) converges as t → ∞ if and only if A is exponentially convergent. 3. x(t), t ∈ [0, ∞) is bounded if and only if A is exponentially bounded. Theorem 3.3.5. Let A ∈ Cn×n and consider the system of differential equations (3.3.6). Then, for any x0 ∈ Cn , we have the following statements: 1. limt→∞ x(t) = 0, for any continuous function b(t), such that limt→∞ b(t) = 0, if and only if A is stable. 2. x(t) converges as t → ∞, for any continuous function b(t), such that ∞ ∫t0 b(u)du converges, if and only if A is exponentially convergent. 3. x(t), t ∈ [0, ∞) is bounded for any continuous function b(t), such that ∞ ∫t0 ∣b(u)∣du converges if and only if A is exponentially bounded. Proof. It is left as an exercise.
Historical note Joseph-Louis Lagrange is usually considered to be a French mathematician, but the Italian Encyclopaedia refers to him as an Italian mathematician. He and Pierre Simon Laplace (1749–1827) together reduced the solution of differential equations to what in actuality was an eigenvalue problem for a matrix of coefficients determined by their knowledge of the planetary orbits. Without having
136
Chapter 3. Applications of the Jordan canonical form
any official notion of matrices, they constructed a quadratic form from the array of coefficients and essentially uncovered the eigenvalues and eigenvectors of the matrix by studying the quadratic form. One of Lagrange’s main contributions in algebra is well-known as Lagrange’s theorem, which states that the order of a subgroup H of a group G must divide the order of G. Lagrange is one of the 72 prominent French scientists who were commemorated on plaques at the first stage of the Eiffel Tower when it first opened. Rue Lagrange in the 5th Arrondissement in Paris is named after him. In Turin, the street where the house of his birth still stands is named Via Lagrange [27].
Joseph-Louis Lagrange (1736–1813).
Pierre Simon Laplace (1749–1827).
3.3.2 Worked-out problems 1. Let B = [bij ]ni,j=1 ∈ Rn×n and assume that each entry of B is nonnegative. Assume that there exists a vector u = (u1 , . . . , un )⊺ with positive coordinates such that Bu = u. (k)
(a) Show that B k = [bij ]ni,j=1 has nonnegative entries for each k ∈ N. (k)
(b) Show that bij ≤
ui , uj
for i, j = 1, . . . , n and k ∈ N.
(c) Show that each eigenvalue of B satisfies ∣λ∣ ≤ 1. (d) Suppose that λ is an eigenvalue of B and ∣λ∣ = 1. What is the multiplicity of λ in the minimal polynomial of B? Solution: (a) It is immediate, as B is a nonnegative matrix. (b) Since Bu = u, by induction on k, we can show that B k u = u, for any (k) (k) k ∈ N. Now we have ui = ∑np=1 bip up ≥ bij uj , for all 1 ≤ i, j ≤ n. Then ui bij ≤ uj . (c) According to part (b), B is power bounded. The statement is immediate from the third part of Theorem 3.2.2. (d) According to the third part of Theorem 3.2.2, λ would be a simple zero of the minimal polynomial of B.
3.3. eAt and stability of certain systems of ordinary differential equations
137
3.3.3 Problems
1. Consider the nonhomogeneous system x_l = A_l x_{l−1}, A_l ∈ C^{n×n}, l = 1, . . .. Assume that the sequence A_l, l = 1, . . . , is periodic, i.e., A_{l+p} = A_l for all l = 1, . . . , and a fixed positive integer p.
(a) Show that for each x_0 ∈ C^n, lim_{l→∞} x_l = 0 if and only if B := A_p A_{p−1} ⋯ A_1 is power stable.
(b) Show that for each x_0 ∈ C^n the sequence x_l, l = 1, . . . , converges if and only if the following conditions are satisfied. First, B is power convergent, i.e., lim_{l→∞} B^l = C. Second, A_i C = C, for i = 1, . . . , p.
(c) Find a necessary and sufficient condition such that for each x_0 ∈ C^n, the sequence x_l, l = 1, . . . , is bounded.
2. Prove Theorem 3.3.5.
3. For any A ∈ F^{n×n}, prove that det e^A = e^{tr A}.
4. Let A ∈ R^{n×n} and P ∈ GL(n, R). Prove that e^{P^{−1}AP} = P^{−1} e^A P.
5. Let
A =
[  2   2  −2   4 ]
[ −4  −3   4  −6 ]
[  1   1  −1   2 ]
[  2   2  −2   4 ].
Assume that the minimal polynomial of A is ψ_A(z) = z(z − 1)^2.
(a) Find the components of A as polynomials in A of degree at most 2.
(b) Find A^p for any positive integer p as a linear combination of I, A, A^2.
(c) Find e^{At} as a linear combination of I, A, A^2.
(d) Solve the system dx(t)/dt = Ax(t) with the initial condition x(0) = (1, 0, 0, 0)^⊺.
(e) Is A power bounded? (See Worked-out Problem 2.8.1-1.)
6. Let A ∈ R^{n×n}.
(a) Assume that the minimal polynomial of A is ψ_A(z) = z^3 (z − 1/2)^2 (z^4 − 1). Is A power stable, power convergent, or power bounded?
(b) Assume that the minimal polynomial of A is (z + 2)^3 (z^2 + z + 3)^2 (z^2 + 4). Consider the system dx(t)/dt = Ax(t). Is it stable, convergent, or bounded?
7. For every A ∈ C^{n×n}, prove that e^A is invertible and its inverse is e^{−A}.
Chapter 4
Inner product spaces
Throughout this chapter, we assume that F is either the real field or complex field and V is a vector space over F (unless stated otherwise). Assume that T ∈ L(U, V). We abbreviate T (x) to T x and no ambiguity will arise.
4.1 Inner product
In this section, we shall study a certain type of scalar-valued function on pairs of vectors, known as an inner product. The first example of an inner product is the dot product in R^3. The dot product of x = (x_1, x_2, x_3) and y = (y_1, y_2, y_3) in R^3 is the scalar ∑_{i=1}^3 x_i y_i. Geometrically, this scalar is the product of the length of x, the length of y, and the cosine of the angle between x and y. Thus, we can define length and angle in R^3 algebraically. An inner product on a vector space is the generalization of the dot product. This section deals with inner products and their properties. Then, we will discuss length and angle (orthogonality).
Definition 4.1.1. The function ⟨⋅, ⋅⟩ ∶ V × V → F is called an inner product if the following conditions hold for any x, y, z ∈ V and a ∈ F:
(i) Conjugate symmetry: ⟨x, y⟩ = \overline{⟨y, x⟩}.
(ii) Linearity in the first argument: ⟨ax + y, z⟩ = a⟨x, z⟩ + ⟨y, z⟩.
(iii) Positive-definiteness: ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.
If the second condition in positive-definiteness is dropped, the resulting structure is called a semi-inner product. Other standard properties of inner products are mentioned in Problems 4.7.2-1 and 4.7.2-2. The vector space V endowed with the inner product ⟨⋅, ⋅⟩ is called an inner product space. We will use the abbreviation IPS for inner product space.
Example 4.1. On R^n we have the standard inner product (the dot product) defined by ⟨x, y⟩ = ∑_{i=1}^n x_i y_i, where x = (x_1, . . . , x_n)^⊺ ∈ R^n and y = (y_1, . . . , y_n)^⊺ ∈ R^n. Note that the inner product generalizes the notion of the dot product of vectors in R^n.
Example 4.2. On C^n we have the standard inner product defined by ⟨x, y⟩ = ∑_{i=1}^n x_i \bar{y}_i, where x = (x_1, . . . , x_n)^⊺ ∈ C^n and y = (y_1, . . . , y_n)^⊺ ∈ C^n.
Example 4.3. Let V = C^{n×n}. For A = [a_ij] and B = [b_ij] ∈ V, define the inner product ⟨A, B⟩ = ∑_{i,j} a_ij \bar{b}_ij. Define the conjugate transpose B^* = (\bar{B})^⊺. Then ⟨A, B⟩ = tr B^*A is an inner product.
Example 4.4 (integration). Let V be the vector space of all F-valued continuous functions on [0, 1]. For f, g ∈ V, define ⟨f, g⟩ = ∫_0^1 f \bar{g} dt. This is an inner product on V. In some contexts, V with this inner product is called an L^2 inner product space. The same construction works in any space where integration is available; this is studied in measure theory.
Matrix of an inner product. Let {e_1, . . . , e_n} be a basis of V. Let p_ij = ⟨e_i, e_j⟩ and P = [p_ij] ∈ F^{n×n}. Then, for v = x_1 e_1 + ⋯ + x_n e_n ∈ V and w = y_1 e_1 + ⋯ + y_n e_n ∈ V, we have ⟨v, w⟩ = ∑_{i,j} x_i \bar{y}_j p_ij; in particular, for F = R, ⟨v, w⟩ = (x_1, . . . , x_n) P (y_1, . . . , y_n)^⊺. This matrix P is called the matrix of the inner product with respect to the basis {e_1, . . . , e_n}.
Definition 4.1.2. A function ∥⋅∥ ∶ V → R is called a norm on V if it has the following properties for all x, y ∈ V and a ∈ F:
(i) Positive definiteness: ∥x∥ ≥ 0, and ∥x∥ = 0 if and only if x = 0.
(ii) Homogeneity: ∥ax∥ = ∣a∣ ∥x∥.
(iii) Triangle inequality: ∥x + y∥ ≤ ∥x∥ + ∥y∥.
The vector space V endowed with the norm ∥⋅∥ is called a normed linear space.
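As a quick numerical illustration of the matrix of an inner product, the sketch below (ours, not from the text) uses the standard inner product on R^3 and an arbitrarily chosen basis; the basis and coordinates are only examples.

    import numpy as np

    # A basis of R^3, stored as the columns of E, and its Gram matrix P with p_ij = <e_i, e_j>.
    E = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])      # columns are e_1, e_2, e_3
    P = E.T @ E

    x = np.array([1.0, 2.0, -1.0])       # coordinates of v in the basis
    y = np.array([0.0, 1.0,  3.0])       # coordinates of w in the basis
    v, w = E @ x, E @ y                  # the vectors themselves

    print(np.dot(v, w))                  # <v, w> computed directly
    print(x @ P @ y)                     # x^T P y gives the same value (real case)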
Note that the notion of a norm generalizes the notion of the length of a vector in R^n.
Example 4.5. On R^n we have the following norms for x = (x_1, . . . , x_n)^⊺ ∈ R^n:
(1) ∥x∥_∞ = max{∣x_1∣, . . . , ∣x_n∣},
(2) ∥x∥_1 = ∣x_1∣ + ∣x_2∣ + ⋯ + ∣x_n∣,
(3) ∥x∥_p = (∣x_1∣^p + ∣x_2∣^p + ⋯ + ∣x_n∣^p)^{1/p}, where p is a real number, p ≥ 1.
The interested reader is encouraged to verify the above properties to see that (1), (2), and (3) define norms. Note that ∥⋅∥_p is called the ℓ_p norm. The ℓ_2 norm is called the Euclidean norm. The Euclidean norm on the matrix space F^{m×n} is called the Frobenius norm:
∥A∥_F = (∑_{i,j} ∣a_ij∣^2)^{1/2},  A = [a_ij] ∈ F^{m×n}.
Let ∥⋅∥_t be a norm on F^n and A ∈ F^{n×n}. Define ∥A∥ = max{∥Ax∥_t ∶ ∥x∥_t = 1, x ∈ F^n}. Then ∥⋅∥ is a norm, called the operator norm induced by ∥⋅∥_t. The operator norm on F^{m×n} induced by the Euclidean norm is called the spectral norm, denoted by ∥⋅∥_2:
∥A∥_2 = max{∥Ax∥_2 ∶ ∥x∥_2 = 1, x ∈ F^n}.
Remark 4.1.3. If ⟨ , ⟩ is an inner product, the function ∥⋅∥ ∶ V → R defined as ∥x∥ = (⟨x, x⟩)^{1/2} is a norm. (Why? Use Theorem 4.1.4 to justify it.) This norm is called the norm induced by the inner product ⟨ , ⟩. Also, ∥x∥ is called the length of x. Note that if ∥⋅∥ ∶ V → R is a norm, then the metric space (V, d) defined by d(x, y) = ∥x − y∥ is called the metric induced by the norm ∥⋅∥. (See Appendix B for details about metric spaces.)
Theorem 4.1.4 (the Cauchy–Schwarz inequality). Let V be an IPS and x, y ∈ V. Then we have ∣⟨x, y⟩∣ ≤ ∥x∥ ∥y∥, and equality holds if and only if x and y are linearly dependent. Here ∥⋅∥ is the norm induced by the inner product.
Proof. The inequality is clearly valid when x = 0. If x ≠ 0, put
z = y − (⟨y, x⟩/∥x∥^2) x.
Then ⟨z, x⟩ = 0 and we have
0 ≤ ∥z∥^2 = ⟨y − (⟨y, x⟩/∥x∥^2) x, y − (⟨y, x⟩/∥x∥^2) x⟩ = ⟨y, y⟩ − (⟨y, x⟩⟨x, y⟩)/∥x∥^2 = ∥y∥^2 − ∣⟨x, y⟩∣^2/∥x∥^2.
Therefore, ∣⟨x, y⟩∣^2 ≤ ∥x∥^2 ∥y∥^2.
◻
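A one-line numerical sanity check of the Cauchy–Schwarz inequality (our sketch, standard inner product on R^n; the sample vectors are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))   # True
    # Equality holds for linearly dependent vectors, e.g., y = 2.5*x:
    print(np.isclose(abs(x @ (2.5 * x)), np.linalg.norm(x) * np.linalg.norm(2.5 * x)))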
Proposition 4.1.5. Let V be a real vector space. Identify Vc with the set of pairs (x, y), x, y ∈ V. Then, Vc is a vector space over C with (a + ib)(x, y) ∶= a(x, y) + b(−y, x),
for all a, b ∈ R, x, y ∈ V.
If V has a basis {e1 , . . . , en } over F, then {(e1 , 0), . . . , (en , 0)} is a basis of Vc over C. Any inner product ⟨⋅, ⋅⟩ on V over R induces the following inner product on Vc : ⟨(x, y), (u, v)⟩ = ⟨x, u⟩ + ⟨y, v⟩ + i(⟨y, u⟩ − ⟨x, v⟩),
for x, y, u, v ∈ V.
We leave the proof of this proposition to the reader as Problem 4.7.2-3.
Definition 4.1.6 (angle). Let x, y ∈ R^n be nonzero and ⟨⋅, ⋅⟩ denote the standard inner product. Since ∣⟨x, y⟩∣ ≤ ∥x∥ ∥y∥, we can define the angle between x and y as follows: ∠(x, y) = arccos(⟨x, y⟩/(∥x∥ ∥y∥)). In particular, vectors x and y are orthogonal, denoted by x ⊥ y, if ⟨x, y⟩ = 0. We can define this notion in any inner product space as follows:
Definition 4.1.7. Let V be an IPS. Then (i) x, y ∈ V are called orthogonal if ⟨x, y⟩ = 0. (ii) S, T ⊂ V are called orthogonal if ⟨x, y⟩ = 0, for any x ∈ S, y ∈ T. (iii) For any S ⊂ V, S^⊥ ⊂ V denotes the maximal set orthogonal to S, i.e., S^⊥ = {v ∈ V ∶ ⟨v, w⟩ = 0, for all w ∈ S}. S^⊥ is also called the orthogonal complement of S. (iv) {x_1, . . . , x_m} ⊂ V is called an orthonormal set if ⟨x_i, x_j⟩ = δ_ij (i, j = 1, . . . , m), where δ_ij denotes the Kronecker delta. (v) {x_1, . . . , x_n} ⊂ V is called an orthonormal basis if it is an orthonormal set which is a basis in V.
Remark 4.1.8. Note that S^⊥ is always a subspace of V (even if S is not). Furthermore, (i) {0}^⊥ = V, (ii) V^⊥ = {0}, (iii) S^⊥ = (span S)^⊥.
Lemma 4.6. Let U be a subspace of an inner product space V over F, and let x ∈ V. Assume that {e_1, . . . , e_n} is an orthonormal basis of U. Then there exists a unique vector p ∈ U, called the projection of x on U, such that x − p is orthogonal to U. p is given by the formula
p = ∑_{i=1}^n ⟨x, e_i⟩ e_i.    (4.1.1)
Furthermore, the distance of x to U is given by ∥x − p∥.
Proof. By the definition of p, clearly p ∈ U. We claim that x − p is orthogonal to U. Indeed,
⟨x − p, e_j⟩ = ⟨x, e_j⟩ − ⟨p, e_j⟩ = ⟨x, e_j⟩ − ∑_{i=1}^n ⟨x, e_i⟩⟨e_i, e_j⟩ = ⟨x, e_j⟩ − ∑_{i=1}^n ⟨x, e_i⟩ δ_ij = ⟨x, e_j⟩ − ⟨x, e_j⟩ = 0.
So x − p is orthogonal to span {e1 , . . . , en } = U. Assume that p1 ∈ U and x − p1 is orthogonal to U. Hence, p1 − p = (x − p) − (x − p1 ) is also orthogonal to U. Clearly, p1 − p ∈ U. Hence, ∥p1 − p∥2 = ⟨p1 − p, p1 − p⟩ = 0. Hence, p1 − p = 0. This shows the uniqueness of p. Let u ∈ U. Then x − u = (x − p) + (p − u). As x − p is orthogonal to p − u it follows that ∥x − u∥2 = ⟨(x − p) + (p − u), (x − p) + (p − u)⟩ = ⟨x − p, x − p⟩ + ⟨p − u, p − u⟩ = ∥x − p∥2 + ∥p − u∥2 ≥ ∥x − p∥2 . Thus the distance of x to U is ∥x − p∥.
◻
Definition 4.1.9. A matrix Q ∈ C^{n×n} is called unitary if Q^*Q = I_n. In this case, the columns of Q form an orthonormal basis. (See subsection 1.3.6-8.)
Remark 4.1.10. Unitary matrices over C are the natural generalization of orthogonal matrices over R.
Gram–Schmidt (G–S) algorithm. Let V be an IPS and S = {x_1, . . . , x_m} ⊂ V a finite (possibly empty) set. Then S̃ = {e_1, . . . , e_p} is the orthonormal set (p ≥ 1) or the empty set (p = 0) obtained from S using the following recursive steps:
(a) If x_1 = 0, remove it from S. Otherwise, replace x_1 by ∣∣x_1∣∣^{-1} x_1.
(b) Assume that {x_1, . . . , x_k} is an orthonormal set and 1 ≤ k < m. Let y_{k+1} = x_{k+1} − ∑_{i=1}^k ⟨x_{k+1}, x_i⟩ x_i. If y_{k+1} = 0, remove x_{k+1} from S. Otherwise, replace x_{k+1} by ∣∣y_{k+1}∣∣^{-1} y_{k+1}.
Indeed, the Gram–Schmidt process (GSP) is a method for orthonormalization of a set of vectors in an inner product space.
Corollary 4.1.11. Let V be an IPS and S = {x_1, . . . , x_n} ⊂ V be n linearly independent vectors. Then, the G–S algorithm on S is given as follows:
y_1 := x_1,  r_11 := ∣∣y_1∣∣,  e_1 := y_1/r_11,
r_ji := ⟨x_i, e_j⟩,  j = 1, . . . , i − 1,
p_{i−1} := ∑_{j=1}^{i−1} r_ji e_j,  y_i := x_i − p_{i−1},    (4.1.2)
r_ii := ∣∣y_i∣∣,  e_i := y_i/r_ii,  i = 2, . . . , n.
In particular, ei ∈ Si and ∣∣yi ∣∣ = dist(xi , Si−1 ), where Si = span {x1 , . . . , xi }, for i = 1, . . . , n and S0 = {0}. (See Problem 4.7.2-4 for the definition of dist(xi , Si−1 ).)
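For readers who want to experiment, the short NumPy sketch below (ours, not from the text) implements the recursion (4.1.2) for the standard inner product on R^m. It is run on the three vectors used in the example of section 4.3 below, so the diagonal of R can be checked against that computation.

    import numpy as np

    def gram_schmidt(X):
        """Classical G-S on the columns of X (assumed linearly independent).
        Returns Q with orthonormal columns and the coefficients r_ji of (4.1.2)."""
        m, n = X.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        for i in range(n):
            y = X[:, i].copy()
            for j in range(i):
                R[j, i] = Q[:, j] @ X[:, i]   # r_ji = <x_i, e_j>
                y -= R[j, i] * Q[:, j]        # subtract the projection p_{i-1}
            R[i, i] = np.linalg.norm(y)       # r_ii = ||y_i|| = dist(x_i, S_{i-1})
            Q[:, i] = y / R[i, i]
        return Q, R

    # The vectors of the example in section 4.3, as columns:
    X = np.array([[1., -1.,  4.],
                  [1.,  4., -2.],
                  [1.,  4.,  2.],
                  [1., -1.,  0.]])
    Q, R = gram_schmidt(X)
    print(np.round(R, 6))                     # diag(R) = (2, 5, 4), as in section 4.3
    print(np.allclose(Q.T @ Q, np.eye(3)))    # the columns of Q are orthonormal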
Corollary 4.1.12. Any (ordered) basis in a finite-dimensional inner product space V induces an orthonormal basis by the G–S algorithm. See Problem 4.7.2-4 for some known properties related to the above notions.
4.2 Explanation of the G–S process in standard Euclidean space First, observe that ri(k+1) ∶= ⟨xk+1 , ei ⟩ is the scalar projection of xk+1 on ei . Next, observe that pk is the projection of xk+1 on {e1 , . . . , ek } = span {x1 , . . . , xk }. Hence, yk+1 = xk+1 − pk ⊥ span {e1 , . . . , ek }. Thus, r(k+1)(k+1) = ∣∣xk+1 − pk ∣∣ is the distance of xk+1 to span {e1 , . . . , ek } = span {x1 , . . . , xk }. The assumption that x1 , . . . , xn are linearly independent yields that r(k+1)(k+1) > 0. Hence, ek+1 = −1 r(k+1)(k+1) (xk+1 − pk ) is a vector of unit length orthogonal to {e1 , . . . , ek }. In Section 4.11 we will provide an explicit form of the G–S process without the iterative algorithm.
4.3 An example of the G–S process
Let x_1 = (1, 1, 1, 1)^⊺, x_2 = (−1, 4, 4, −1)^⊺, x_3 = (4, −2, 2, 0)^⊺. Then
r_11 = ∣∣x_1∣∣ = 2,  e_1 = (1/r_11) x_1 = (1/2, 1/2, 1/2, 1/2)^⊺,
r_12 = e_1^⊺ x_2 = 3,  p_1 = r_12 e_1 = 3e_1 = (3/2, 3/2, 3/2, 3/2)^⊺,
y_2 = x_2 − p_1 = (−5/2, 5/2, 5/2, −5/2)^⊺,  r_22 = ∣∣y_2∣∣ = 5,
e_2 = (1/r_22) y_2 = (−1/2, 1/2, 1/2, −1/2)^⊺,
r_13 = e_1^⊺ x_3 = 2,  r_23 = e_2^⊺ x_3 = −2,
p_2 = r_13 e_1 + r_23 e_2 = (2, 0, 0, 2)^⊺,
y_3 = x_3 − p_2 = (2, −2, 2, −2)^⊺,  r_33 = ∣∣y_3∣∣ = 4,
e_3 = (1/r_33)(x_3 − p_2) = (1/2, −1/2, 1/2, −1/2)^⊺.
4.4 QR factorization
A QR factorization of a real matrix A is its decomposition into a product A = QR of a matrix Q with orthonormal columns and an upper triangular matrix R. QR factorization is often used to solve the least squares problem (Theorem 4.5.4). Indeed, the G–S process is used to factor any real matrix with linearly independent columns into a product of a matrix with orthonormal columns and an upper triangular matrix. Let A = [a_1 a_2 . . . a_n] ∈ R^{m×n} and assume that rank A = n, i.e., the columns of A are linearly independent. Perform the G–S process with the bookkeeping as above: r_11 := ∣∣a_1∣∣, e_1 := (1/r_11) a_1.
Assume that e_1, . . . , e_{k−1} were computed. Then, r_ik := e_i^⊺ a_k for i = 1, . . . , k − 1, p_{k−1} := r_1k e_1 + r_2k e_2 + ⋯ + r_{(k−1)k} e_{k−1}, r_kk := ∣∣a_k − p_{k−1}∣∣, and e_k := (1/r_kk)(a_k − p_{k−1}), for k = 2, . . . , n.
Let Q = [e_1 e_2 . . . e_n] ∈ R^{m×n} and
R =
[ r_11 r_12 r_13 . . . r_1n ]
[ 0    r_22 r_23 . . . r_2n ]
[ 0    0    r_33 . . . r_3n ]
[ ⋮    ⋮    ⋮         ⋮    ]
[ 0    0    0   . . . r_nn ].
Then A = QR, Q^⊺Q = I_n, and A^⊺A = R^⊺R. The LSS (least squares solution) of Ax = b is given by the upper triangular system R x̂ = Q^⊺ b, which can be solved by back substitution. Formally, x̂ = R^{−1} Q^⊺ b. (See Theorem 4.5.4.)
Proof. The normal equations A^⊺A x̂ = A^⊺ b become R^⊺Q^⊺QR x̂ = R^⊺Q^⊺ b, i.e., R^⊺R x̂ = R^⊺Q^⊺ b. Multiply from the left by (R^⊺)^{−1} to get R x̂ = Q^⊺ b. Note that QQ^⊺ b is the projection of b on the column space of A. The matrix P := QQ^⊺ is called an orthogonal projection. It is symmetric and P^2 = P, as (QQ^⊺)(QQ^⊺) = Q(Q^⊺Q)Q^⊺ = Q(I)Q^⊺ = QQ^⊺. Note that QQ^⊺ ∶ R^m → R^m is the orthogonal projection. The assumption that rank A = n is equivalent to the assumption that A^⊺A is invertible. So, the system A^⊺A x̂ = A^⊺ b has the unique solution x̂ = (A^⊺A)^{−1} A^⊺ b. Hence, the projection of b on the column space of A is P b = A x̂ = A(A^⊺A)^{−1}A^⊺ b. Therefore, P = A(A^⊺A)^{−1}A^⊺.
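The following sketch (ours, not from the text) carries out this least squares recipe with NumPy's built-in QR factorization; solve_triangular performs the back substitution, and the result is compared with NumPy's own least squares routine. The test data are random and only illustrative.

    import numpy as np
    from scipy.linalg import solve_triangular

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 3))           # full column rank with probability 1
    b = rng.standard_normal(6)

    Q, R = np.linalg.qr(A)                    # reduced QR: Q is 6x3, R is 3x3 upper triangular
    x_hat = solve_triangular(R, Q.T @ b)      # solve R x = Q^T b by back substitution
    print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches the LSS

    P = Q @ Q.T                               # orthogonal projection onto the column space of A
    print(np.allclose(P, A @ np.linalg.inv(A.T @ A) @ A.T))           # P = A (A^T A)^{-1} A^T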
4.5 An example of QR factorization Let
A = [x_1 x_2 x_3] =
[ 1  −1   4 ]
[ 1   4  −2 ]
[ 1   4   2 ]
[ 1  −1   0 ]
be the matrix corresponding to the example of the G–S algorithm in section 4.3. Then
R =
[ r_11 r_12 r_13 ]   [ 2  3   2 ]
[ 0    r_22 r_23 ] = [ 0  5  −2 ],
[ 0    0    r_33 ]   [ 0  0   4 ]
Q = [q_1 q_2 q_3] =
[ 1/2  −1/2   1/2 ]
[ 1/2   1/2  −1/2 ]
[ 1/2   1/2   1/2 ]
[ 1/2  −1/2  −1/2 ].
(Explain why, in this example, A = QR.) Note that QQ⊺ ∶ R4 → R4 is the projection on span {x1 , x2 , x3 }. Remark 4.5.1. It is known, e.g., [11], that the G–S process as described in Corollary 4.1.11 is numerically unstable. That is, there is a severe loss of orthogonality of y1 , . . . as we proceed to compute yi . In computations, one uses either a modified GSP or Householder orthogonalization [11].
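The loss of orthogonality mentioned in Remark 4.5.1 can be seen on a nearly dependent set of vectors. The sketch below (ours) compares the classical recursion of Corollary 4.1.11 with the modified version defined next (Definition 4.5.2); the test matrix and the random seed are only illustrative.

    import numpy as np

    def cgs(X):
        """Classical G-S (Corollary 4.1.11): all projections use the original vector x_i."""
        Q = np.zeros_like(X, dtype=float)
        for i in range(X.shape[1]):
            y = X[:, i].astype(float).copy()
            for j in range(i):
                y -= (Q[:, j] @ X[:, i]) * Q[:, j]
            Q[:, i] = y / np.linalg.norm(y)
        return Q

    def mgs(X):
        """Modified G-S (Definition 4.5.2): each projection uses the already-updated vector."""
        Q = np.zeros_like(X, dtype=float)
        for i in range(X.shape[1]):
            y = X[:, i].astype(float).copy()
            for j in range(i):
                y -= (Q[:, j] @ y) * Q[:, j]
            Q[:, i] = y / np.linalg.norm(y)
        return Q

    eps = 1e-8                                 # makes the columns nearly dependent
    A = np.array([[1, 1, 1], [eps, 0, 0], [0, eps, 0], [0, 0, eps]], dtype=float)
    for f in (cgs, mgs):
        Q = f(A)
        print(f.__name__, np.linalg.norm(Q.T @ Q - np.eye(3)))   # CGS loses orthogonality, MGS does not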
Definition 4.5.2 (modified Gram–Schmidt (MGS) algorithm). Let V be an IPS and S = {x_1, . . . , x_m} ⊂ V a finite (possibly empty) set. Then, S̃ = {e_1, . . . , e_p} is either the orthonormal set (p ≥ 1) or the empty set (p = 0) obtained from S using the following recursive steps:
1. Initialize j = 1 and p = m.
2. If x_j ≠ 0, go to 3. Otherwise, replace p by p − 1. If j > p, exit. Replace x_i by x_{i+1}, for i = j, . . . , p. Repeat.
3. e_j := (1/∥x_j∥) x_j.
4. j = j + 1. If j > p, exit.
5. For i = 1, . . . , j − 1, let x_j := x_j − ⟨x_j, e_i⟩ e_i.
6. Go to 2.
The MGS algorithm is stable but needs mn^2 flops, which is more time consuming than the G–S algorithm.
Lemma 4.7. Let V be a finite-dimensional IPS over R. Let U be a subspace of V. Then
V = U ⊕ U^⊥.    (4.5.1)
Proof. If U = V or U = {0}, the lemma is trivial. So we assume that n = dim V > m = dim U ≥ 1. Choose a basis {u_1, . . . , u_m} for U. Complete this basis to a basis {u_1, . . . , u_n} for V. Perform the GSP on u_1, . . . , u_n to obtain an orthonormal basis {v_1, . . . , v_n}. Recall that span{u_1, . . . , u_i} = span{v_1, . . . , v_i}, for i = 1, . . . , n. Hence, U = span{u_1, . . . , u_m} = span{v_1, . . . , v_m}, i.e., {v_1, . . . , v_m} is an orthonormal basis in U. Clearly, y = ∑_{i=1}^n ⟨y, v_i⟩ v_i ∈ U^⊥ if and only if y = ∑_{i=m+1}^n a_i v_i, i.e., {v_{m+1}, . . . , v_n} is a basis in U^⊥. So when we write a vector z = ∑_{i=1}^n z_i v_i, it is of the unique form u + w, u = ∑_{i=1}^m z_i v_i ∈ U, w = ∑_{i=m+1}^n z_i v_i ∈ U^⊥. ◻
Corollary 4.5.3. Let the assumptions of Lemma 4.7 hold. Then, (U^⊥)^⊥ = U.
Lemma 4.8. For A ∈ F^{n×m}: (a) N(A^⊺) = R(A)^⊥, (b) N(A^⊺)^⊥ = R(A). The proof is left as an exercise.
Lemma 4.9 (Fredholm alternative). Let A ∈ F^{m×n}. Then, the system
Ax = b,  x ∈ F^n, b ∈ F^m,    (4.5.2)
is solvable if and only if for each y ∈ F^m satisfying y^⊺A = 0, the equality y^⊺b = 0 holds.
Proof. Suppose that (4.5.2) is solvable. Then, y⊺ b = y⊺ Ax = 0⊺ x = 0. Observe next that y⊺ A = 0⊺ if and only if y⊺ ∈ R(A)⊥ . As (R(A)⊥ )⊥ = R(A), it follows that if y⊺ b = 0, for each y⊺ A = 0, then b ∈ R(A), i.e., (4.5.2) is solvable. ◻
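A small numerical illustration of the Fredholm alternative (our sketch; the matrix is an arbitrary rank-deficient example): a right-hand side b is attainable exactly when it is orthogonal to the null space of A^⊺.

    import numpy as np

    A = np.array([[1., 2.],
                  [2., 4.],
                  [0., 1.]])
    y = np.array([-2., 1., 0.])                    # spans N(A^T): y^T A = 0
    print(y @ A)                                   # [0. 0.]

    b_good = A @ np.array([1., 1.])                # b in the range of A
    b_bad = np.array([1., 0., 0.])                 # y^T b_bad = -2 != 0, so Ax = b_bad is unsolvable
    for b in (b_good, b_bad):
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        print(float(y @ b), np.linalg.norm(A @ x - b))   # zero pairing <-> zero residual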
An application of the Fredholm alternative
Let A = [a_ij] ∈ Z_2^{n×n} be a symmetric matrix. It has been proven in [6] that diag A ∈ Im A. Here, we give Noga Alon's short proof of this statement:
x^⊺Ax = ∑_{i,j=1}^n a_ij x_i x_j = ∑_i a_ii x_i x_i + ∑_{i<j} (a_ij x_i x_j + a_ij x_j x_i) = ∑_i a_ii x_i,
since over Z_2 each off-diagonal pair cancels and x_i x_i = x_i.
i n, P(x1 , . . . , xm ) is a “flattened” parallelepiped, since x1 , . . . , xm are always linearly dependent in Rn for m > n. Assuming that the volume element generated by e1 , . . . , en denoted as e1 ∧ ⋯ ∧ en is positive, then the volume element of eσ(1) ∧ ⋯ ∧ eσ(n) has the sign (orientation) of the permutation σ. Proposition 4.7.2. Let A ∈ Rn×n and view A = [c1 c2 . . . cn ] as an ordered set of n vectors (columns), c1 , . . . , cn . Then ∣ det A∣ is the n-dimensional volume of the parallelepiped P(c1 , . . . , cn ). If c1 , . . . , cn are linearly independent, then the orientation in Rn induced by c1 , . . . , cn is the same as the orientation induced by e1 , . . . , en if det A > 0, and is the opposite orientation if det A < 0. Proof. det A = 0 if and only if the columns of A are linearly dependent. If c1 , . . . , cn are linearly dependent, then P(c1 , . . . , cn ) lies in a subspace of Rn , i.e., some n − 1 dimensional subspace, and hence the n-dimensional volume of P(c1 , . . . , cn ) is zero. Assume now that det A ≠ 0, i.e., c1 , . . . , cn are linearly independent. Perform the GSP. Then, A = QR, where Q = [e1 e2 . . . en ] is an orthogonal matrix and R = [rji ] ∈ Rn×n is an upper diagonal matrix. (See Problem 4.7.2-5.) So det A = det Q det R. Since Q⊺ Q = In , we deduce that 1 = det In = det Q⊺ det Q = det Q det Q = (det Q)2 . So det Q = ±1 and the sign of det Q is the sign of det A. Hence, ∣ det A∣ = det R = r11 r22 . . . rnn . Recall that r11 is the length of the vector c1 , and rii is the distance of the vector ei to the subspace spanned by e1 , . . . , ei−1 , for i = 2, . . . , n. (See Problem 4.7.2-4, parts (f), (g), and (i).) Thus, the length of P(c1 ) is r11 . The distance of c2 to P(c1 ) is r22 . Thus, the area, i.e., 2-dimensional volume of P(c1 , c2 ), is r11 r22 . Continuing in this manner, we deduce that the i − 1 dimensional volume of P(c1 , . . . , ci−1 ) is r11 . . . r(i−1)(i−1) . As the distance of ci to P(c1 , . . . , ci−1 ) is rii , it follows that the i-dimensional volume of P(c1 , . . . , ci ) is r11 . . . rii . For i = n, we get that ∣ det A∣ = r11 . . . rnn , which is equal to the n-dimensional volume of P(c1 , . . . , cn ). As we already pointed out, the sign of det A is equal to the sign of det Q = ±1. If det Q = 1, it is possible to “rotate” the standard basis in Rn to the basis given by the columns of an orthogonal matrix Q with det Q = 1. If det Q = −1, we need one reflection, i.e., replace the standard basis {e1 , . . . , en } by the new basis {e2 , e1 , e3 , . . . , en } and rotate the new basis {e2 , e1 , e3 , . . . , en } to the basis consisting of the columns of an orthogonal matrix Q′ , where det Q′ = 1. ◻ Theorem 4.7.3 (the Hadamard determinant inequality). Let A = [c1 , . . . , cn ] ∈ Cn×n . Then, ∣ det A∣ ≤ ∣∣c1 ∣∣ ∣∣c2 ∣∣ . . . ∣∣cn ∣∣. Equality holds if and only if either ci = 0, for some i or ⟨ci , cj ⟩ = 0, for all i ≠ j, i.e., {c1 , . . . , cn } is an orthogonal system. Proof. Assume first that det A = 0. Clearly, the Hadamard inequality holds. Equality in the Hadamard inequality holds if and only if ci = 0, for some i. Assume now that det A ≠ 0 and perform the GSP. From (4.1.2), it follows that A = QR, where Q is a unitary matrix, i.e., Q∗ Q = In and R = [rji ] ∈ Cn×n
is upper triangular with r_ii real and positive. So det A = det Q det R. Thus, 1 = det I_n = det(Q^*Q) = det Q^* det Q = \overline{det Q} det Q = ∣det Q∣^2, which implies ∣det Q∣ = 1. Hence, ∣det A∣ = det R = r_11 r_22 . . . r_nn. According to Problem 4.7.2-4 and the proof of Proposition 4.7.2, we know that ∣∣c_i∣∣ ≥ dist(c_i, span{c_1, . . . , c_{i−1}}) = r_ii, for i = 2, . . . , n. Hence, ∣det A∣ = det R ≤ ∣∣c_1∣∣ ∣∣c_2∣∣ . . . ∣∣c_n∣∣. Equality holds if and only if ∣∣c_i∣∣ = dist(c_i, span{c_1, . . . , c_{i−1}}), for i = 2, . . . , n. Use Problem 4.7.2-4 to deduce that ∣∣c_i∣∣ = dist(c_i, span{c_1, . . . , c_{i−1}}) if and only if ⟨c_i, c_j⟩ = 0, for j = 1, . . . , i − 1. Use these conditions for i = 2, . . . , n to deduce that equality in the Hadamard inequality holds if and only if {c_1, . . . , c_n} is an orthogonal system. ◻
4.7.1 Worked-out problems
1. Let A = [a_ij]_{i,j=1}^n ∈ C^{n×n} such that ∣a_ij∣ ≤ 1, for i, j = 1, . . . , n. Show that ∣det A∣ = n^{n/2} if and only if A^*A = AA^* = nI_n.
Solution:
Assume first that ∣det A∣ = n^{n/2} and let c_j = (a_1j, a_2j, . . . , a_nj)^⊺ denote the jth column of A. Then, Theorem 4.7.3 and the assumption ∣a_ij∣ ≤ 1 yield ∥c_i∥ = √n, for any 1 ≤ i ≤ n. Reusing the assumption ∣a_ij∣ ≤ 1, it follows that ∣a_ij∣ = 1, for i, j = 1, . . . , n. Finally, as equality holds in the Hadamard inequality, {c_1, . . . , c_n} is an orthogonal system, and hence A^*A = AA^* = nI_n. Conversely, assume that A^*A = AA^* = nI_n. We have ∣det A∣^2 = det A det A^* = det(AA^*) = det(nI_n) = n^n. Then ∣det A∣ = n^{n/2}.
2. Let
u = (1, −1, 1, −1)^⊺,  v = (2, 0, −2, 1)^⊺,
and U = span{u, v}. Find the orthogonal projection of the vector (1, 1, 0, 0)^⊺ on U using least squares with a corresponding matrix.
Solution: Define
A = [u v] =
[  1   2 ]
[ −1   0 ]
[  1  −2 ]
[ −1   1 ]
and b = (1, 1, 0, 0)^⊺.
The least squares theorem implies that the system A^*Ax = A^*b is solvable, with solution x_0 = (1/35)(2, 8)^⊺. Hence
Ax_0 = (1/35)(18, −2, −14, −6)^⊺ ∈ R(A)
is the orthogonal projection of b on U.
3. Find A = [a_ij] ∈ C^{3×3} satisfying ∣a_ij∣ ≤ 1 and ∣det A∣^2 = 27.
Solution: Define
A =
[ 1  1      1     ]
[ 1  ξ_1    ξ_2   ]
[ 1  ξ_1^2  ξ_2^2 ],  where ξ_k = e^{2kπi/3} = cos(2kπ/3) + i sin(2kπ/3), k = 1, 2.
Clearly, ∣a_ij∣ ≤ 1 and AA^* = A^*A = 3I_3. Using Worked-out Problem 4.7.1-1, we obtain ∣det A∣ = 3^{3/2} = √27, i.e., ∣det A∣^2 = 27.
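These two worked-out problems are easy to check numerically. The sketch below (ours, not from the text) builds the 3 × 3 matrix of Problem 3 and verifies AA^* = 3I_3 and ∣det A∣^2 ≈ 27, the equality case of the Hadamard bound from Problem 1.

    import numpy as np

    z = np.exp(2j * np.pi * np.array([0, 1, 2]) / 3)          # 1, xi_1, xi_2
    A = np.array([[z[j] ** i for j in range(3)] for i in range(3)])   # rows 1, xi, xi^2

    print(np.allclose(A @ A.conj().T, 3 * np.eye(3)))         # A A^* = 3 I_3
    print(abs(np.linalg.det(A)) ** 2)                         # ~27, so |det A|^2 = 27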
4.7.2 Problems 1. Let V be an IPS over F. Show the following statements: (a) ⟨0, x⟩ = ⟨x, 0⟩ = 0, (b) for F = R, ⟨z, ax + by⟩ = a⟨z, x⟩ + b⟨z, y⟩, for all a, b ∈ R, x, y, z ∈ V, (c) for F = C, ⟨z, ax + by⟩ = a ¯⟨z, x⟩ + ¯b⟨z, y⟩, for all a, b ∈ C, x, y, z ∈ V. 2. Let V be an IPS. Show that (a) ∣∣ax∣∣ = ∣a∣ ∣∣x∣∣, for a ∈ F and x ∈ V. (b) The triangle inequality ∣∣x + y∣∣ ≤ ∣∣x∣∣ + ∣∣y∣∣, and equality holds if either x = 0 or y = ax, for some nonnegative a ∈ R. (Hint: Use the Cauchy–Schwarz inequality.) (c) Pythagorean law; if x ⊥ y, then ∥x + y∥2 = ∥x∥2 + ∥y∥2 . (d) Parallelogram identity: ∥x + y∥2 + ∥x − y∥2 = 2∥x∥2 + 2∥y∥2 . 3. Prove Proposition 4.1.5. 4. Let V be a finite-dimensional IPS of dimension n. Assume that S ⊂ V. (a) Show that if {x1 , . . . , xm } is an orthonormal set, then x1 , . . . , xm are linearly independent.
(b) Assume that {e1 , . . . , en } is an orthonormal basis in V. Show that for any x ∈ V, the orthonormal expansion holds: n
x = ∑⟨x, ei ⟩ei .
(4.7.1)
i=1
Furthermore, for any x, y ∈ V, n
⟨x, y⟩ = ∑⟨x, ei ⟩⟨y, ei ⟩.
(4.7.2)
i=1
(c) Assume that S is a finite set. Let S̃ be the set obtained by the G–S process. Show that S̃ = ∅ if and only if span S = {0}. Moreover, prove that if S̃ ≠ ∅, then {e_1, . . . , e_p} is an orthonormal basis in span S.
(d) Show that there exists an orthonormal basis {e_1, . . . , e_n} in V and 0 ≤ m ≤ n such that e_1, . . . , e_m ∈ S, span S = span{e_1, . . . , e_m}, S^⊥ = span{e_{m+1}, . . . , e_n}, and (S^⊥)^⊥ = span S.
(e) Assume from here to the end of the problem that S is a subspace of V. Show V = S ⊕ S^⊥.
(f) Let x ∈ V and let x = u + v, for unique u ∈ S, v ∈ S^⊥. Let P(x) := u be the projection of x on S. Show that P ∶ V → V is a linear transformation satisfying P^2 = P, range P = S, ker P = S^⊥.
(g) Show that dist(x, S) := ∣∣x − P x∣∣ ≤ ∣∣x − w∣∣, for any w ∈ S.    (4.7.3)
(h) Show that in part (g) equality holds if and only if w = P x.
(i) Show that dist(x, S) = ∣∣x − w∣∣, for some w ∈ S, if and only if x − w is orthogonal to S.
(j) Let {e_1, . . . , e_m} be an orthonormal basis of S. Show that for each x ∈ V, P x = ∑_{i=1}^m ⟨x, e_i⟩ e_i. (Note that P x is called the least squares approximation to x in the subspace S.)
5. Let X ∈ C^{m×n} and assume that m ≥ n and rank X = n. Let x_1, . . . , x_n ∈ C^m be the columns of X, i.e., X = [x_1, . . . , x_n]. Assume that C^m is an IPS with the standard inner product ⟨x, y⟩ = y^*x. Perform the G–S algorithm (1.1) to obtain the matrix Q = [e_1, . . . , e_n] ∈ C^{m×n}. Let R = [r_ji]_1^n ∈ C^{n×n} be the upper triangular matrix with r_ji, j ≤ i, given by (4.1.2). Show that Q̄^⊺Q = I_n and X = QR. (This is the QR algorithm.) Show that if in addition, X ∈ R^{m×n}, then Q and R are real-valued matrices.
6. Let C ∈ Cn×n and assume that λ1 , . . . , λn are n eigenvalues of C counted with their multiplicities. View C as an operator C ∶√ Cn → Cn . View Cn as a 2n-dimensional vector space over R. Let C = A + −1B, A, B ∈ Rn×n . A −B 2n×2n (a) Then, Cˆ ∶= [B represents the operator C ∶ Cn → Cn as an A]∈R operator over R in a suitably chosen basis. ¯ 1 , . . . , λn , λ ¯ n are the 2n eigenvalues of Cˆ counting with (b) Show that λ1 , λ multiplicities. (c) Show that the Jordan canonical form of Cˆ is obtained by replacing each ¯ + H. Jordan block λI + H in C by two Jordan blocks λI + H and λI
7. Let A = [a_ij]_{i,j} ∈ C^{n×n}. Assume that ∣a_ij∣ ≤ K, for all i, j = 1, . . . , n. Show that ∣det A∣ ≤ K^n n^{n/2}.
8. Show that for each n, there exists a matrix A = [aij ]i,j=1 ∈ Cn×n such that n ∣aij ∣ = 1, for i, j = 1, . . . , n and ∣ det A∣ = n 2 . 9. Let A = [aij ] ∈ Rn×n and assume that aij = ±1, i, j = 1, . . . , n. Show that if n n > 2, then the assumption that ∣ det A∣ = n 2 yields that n is divisible by 4. 10. Show that for any n = 2m , m = 0, 1, . . . there exists A = [aij ] ∈ Rn×n n such that aij = ±1, i, j = 1, . . . , n and ∣ det A∣ = n 2 . (Hint: Try to prove m m by induction on m that A ∈ R2 ×2 can be chosen symmetric, and then m+1 m+1 construct B ∈ R2 ×2 using A.) Note: A matrix H = [aij ]i,j ∈ Rn×n such that aij = ±1, for i, j = 1, . . . , n n and ∣ det H∣ = n 2 is called a Hadamard matrix. Hadamard matrices admit several other characterizations; an equivalent definition states that a Hadamard matrix H is an n × n matrix satisfying the identity HH ⊺ = nIn . (See Worked-out Problem 4.7.1-1.) Also, another equivalent definition for a Hadamard matrix H is an n × n matrix with entries in {−1, 1} such that two distinct rows or columns have inner product zero. It is conjectured that for each n divisible by 4, there exists a Hadamard matrix. It is known that a necessary condition for the existence of an n×n Hadamard matrix is that n = 1, 2, 4k, for some k, i.e., there exists no n × n Hadamard matrices for n ∈/ {1, 2, 4k, k ∈ N}. That this condition is also sufficient is known as the Hadamard conjecture, and has been the subject of a vast amount of literature in recent decades. 11. Let V be an IPS and T ∈ L(V). Assume that v, w ∈ V are two eigenvectors of T with distinct eigenvalues. Prove that ⟨v, w⟩ = 0. θ − sin θ 2×2 12. Consider rotation matrices A = [cos , θ ∈ [0, 2π). Assume sin θ cos θ ] ∈ R that T ∈ L(V) is orthogonal, where V is a finite-dimensional real vector space. Show that there exists a basis β of V such that [T ]β is block diagonal, and the blocks are either 2 × 2 rotation matrices or 1 × 1 matrices consisting of 1 or −1. (See Problem 1.6.3-4 and Problem 2.2.2-16.)
13. Let u = (1, −1, 1, −1)⊺ , v = (2, 0, −2, 1)⊺ . Find (a) The cosine of the angle between u and v.
(b) The scalar and the vector projection of v on u. (c) A basis to the orthogonal complement of U ∶= span (u, v). (d) The projection of the vector (1, 1, 0, 0)^⊺ on U and U^⊥.
14. Let A ∈ R^{4×3}. Assume that the vector (1, −1, 1, −1)^⊺ is a vector in the column space of A. Is it possible that a vector (2, 0, −2, 1)^⊺ is in the null space of A^⊺? If yes, give an example of such a matrix. If not, justify why.
15. Consider the overdetermined system
 x_1 + x_2 + x_3 = 4
−x_1 + x_2 + x_3 = 0
 x_1 − x_2 + x_3 = 1
             x_3 = 2.
(a) Is this system solvable? (b) Find the least squares solution of this system. (c) Find the projection of (4, 0, 1, 2)⊺ on the column space of the coefficient matrix A ∈ R4×3 of this system. 16. Let (−1, 0), (0, 1), (1, 3), (2, 9) be four points in the x − y plane. Find (a) The best least squares fit by a linear function y = ax + b. (b) The best least squares by a quadratic polynomial y = ax2 + bx + c. (c) Explain briefly why there exists a unique cubic polynomial y = ax3 + bx2 + cx + d passing through these four points. 17. Let a ≤ t1 < t2 < ⋯ < tn ≤ b be n points in the interval [a, b]. For any two continuous functions f, g ∈ C[a, b] define ⟨f, g⟩ ∶= ∑ni=1 f (ti )g(ti ). Let Pm be the vector space of all polynomials of degree at most m − 1. (a) Show that for m ≤ n, ⟨ , ⟩ is an inner product on Pm . (b) Is ⟨ , ⟩ an inner product on Pn+1 ? Justify! 1
18. For the inner product ⟨f, g⟩ ∶= ∫−1 f (x)g(x)dx on C[−1, 1]. Find the cosine of the angle between f (x) = 1 and g(x) = ex . 19. Let T ∈ L(Rn ) be an invertible operator such that ⟨x, y⟩ = 0 implies ⟨T x, T y⟩ = 0, for x, y ∈ Rn . Prove that ∠(x, y) = ∠(T x, T y). Namely, if an operator is right angle preserving, then it is angle (any angle) preserving! (Hint: Use the Gram–Schmidt algorithm.) 20. A matrix A = [aij ] ∈ Cn×n is called diagonally dominant if ∑ ∣aij ∣ < ∣aii ∣.
j≠i
Prove that if A is diagonally dominant, then A is invertible. 21. Prove that every orthogonal matrix operator is an isometry, namely, if A ∈ Fn×n , prove that A is orthogonal if and only if ∥Ax∥ = ∥x∥, for every x ∈ Fn .
4.8 Special transformations in IPS Proposition 4.8.1. Let V be an IPS and T ∶ V → V a linear transformation. Then, there exists a unique linear transformation T ∗ ∶ V → V such that ⟨T x, y⟩ = ⟨x, T ∗ y⟩, for all x, y ∈ V. We leave its proof as problems 4.10.2-1 and 4.10.2-2. Definition 4.8.2. Let V be an IPS over F and T ∶ V → V be a linear transformation. Then (a) T is called self-adjoint if T ∗ = T ; (b) T is called anti–self-adjoint if T ∗ = −T ; (c) T is called unitary if T ∗ T = T T ∗ = I; (d) T is called normal if T ∗ T = T T ∗ . Note that a self-adjoint transformation (matrix) is also called Hermitian. Denote by S(V), AS(V), U(V), and N(V) the sets of self-adjoint, anti– self-adjoint, unitary, and normal operators on V, respectively. Proposition 4.8.3. Let V be an IPS over F with an orthonormal basis E = {e1 , . . . , en }. Let T ∶ V → V be a linear transformation and A = [aij ] ∈ Fn×n be the representation matrix of T in the basis E: aij = ⟨T ej , ei ⟩,
i, j = 1, . . . , n.
(4.8.1)
Then, for F = R,
(a) T^* is represented by A^⊺;
(b) T is self-adjoint if and only if A = A^⊺;
(c) T is anti–self-adjoint if and only if A = −A^⊺;
(d) T is unitary if and only if A is orthogonal, i.e., AA^⊺ = A^⊺A = I;
(e) T is normal if and only if A is normal, i.e., AA^⊺ = A^⊺A,
and for F = C,
(a) T^* is represented by A^* (:= Ā^⊺);
(b) T is self-adjoint if and only if A = A^*;
(c) T is anti–self-adjoint if and only if A is anti-Hermitian, i.e., A = −A^*;
(d) T is unitary if and only if A is unitary, i.e., AA^* = A^*A = I;
(e) T is normal if and only if A is normal, i.e., AA^* = A^*A.
We leave the proof as Problem 4.10.2-3. Let V be a real vector space. The complexification of V is defined by taking the tensor product of V with the complex field, and it is denoted by Vc . Proposition 4.8.4. Let V be an IPS over R and T ∈ L(V). Let Vc be the complexification of V. Then, there exists a unique Tc ∈ L(Vc ) such that Tc ∣V = T . Furthermore, T is self-adjoint, unitary, or normal if and only if Tc is selfadjoint, unitary, or normal, respectively.
We leave the proof as Problem 4.10.2-4.
Definition 4.8.5. For a field F, let
D(n, F) := {A ∈ F^{n×n} ∶ A is diagonal},
S(n, F) := {A ∈ F^{n×n} ∶ A = A^⊺},
AS(n, F) := {A ∈ F^{n×n} ∶ A = −A^⊺},
O(n, F) := {A ∈ F^{n×n} ∶ AA^⊺ = A^⊺A = I_n},
SO(n, F) := {A ∈ O(n, F) ∶ det A = 1},
DO(n, F) := D(n, F) ∩ O(n, F),
N(n, R) := {A ∈ R^{n×n} ∶ AA^⊺ = A^⊺A},
N(n, C) := {A ∈ C^{n×n} ∶ AA^* = A^*A},
Hn := {A ∈ C^{n×n} ∶ A = A^*},
AHn := {A ∈ C^{n×n} ∶ A = −A^*},
Un := {A ∈ C^{n×n} ∶ AA^* = A^*A = I_n},
SUn ∶= {A ∈ Un ∶
det A = 1},
DUn ∶= D(n, C) ∩ Un . See Problem 4.10.2-5 for relations between these classes. Theorem 4.8.6. Let V be an IPS over C of dimension n. Then, a linear transformation T ∶ V → V is normal if and only if V has an orthonormal basis consisting of eigenvectors of T . Proof. Suppose first that V has an orthonormal basis {e1 , . . . , en } such that ¯ i ei , i = T ei = λi ei , i = 1, . . . , n. From the definition of T ∗ , it follows that T ∗ ei = λ ∗ ∗ 1, . . . , n. Hence, T T = T T . Assume now T is normal. Since C is algebraically closed, T has an eigenvalue λ1 . Let V1 be the subspace of V spanned by all eigenvectors of T corresponding to the eigenvalue λ1 . Clearly, T V1 ⊂ V1 . Let x ∈ V1 . Then, T x = λ1 x. Thus, T T ∗ x = (T T ∗ )x = (T ∗ T )x = T ∗ T x = λ1 T ∗ x ⇒ T ∗ V1 ⊂ V1 . Hence, T V1⊥ , T ∗ V1⊥ ⊂ V1⊥ . Since V = V1 ⊕ V1⊥ , it is enough to prove the theorem for T ∣V1 and T ∣V1⊥ . ¯ 1 IV (see Problem As T ∣V1 = λ1 IV1 , it is straightforward to show T ∗ ∣V1 = λ 1 4.10.2-2). Hence, for T ∣V1 the theorem trivially holds. For T ∣V1⊥ , the theorem follows by induction. ◻ An important result of linear algebra, the spectral theorem for normal matrices, states that normal matrices are diagonal with respect to an orthonormal basis. See Theorem 4.8.7. Consult [23] for more details. Theorem 4.8.7. Assume that A ∈ Cn×n is a normal matrix. Then, A is unitarily similar to a diagonal matrix. That is, there exists a unitary matrix U ∈ Cn×n and a diagonal matrix Λ ∈ Cn×n such that A = U ΛU ∗ = U ΛU −1 .
(This is called the spectral decomposition of A.) Note that the columns of U is an orthonormal basis of Cn consisting of eigenvectors of A, and the diagonal entries of Λ are the corresponding eigenvalues of A. The proof of Theorem 4.8.6 yields the following corollary. Corollary 4.8.8. Let V be an IPS over R of dimension n. Then, the linear transformation T ∶ V → V with a real spectrum is normal if and only if V has an orthonormal basis consisting of eigenvectors of T . Theorem 4.8.9. Let A ∈ Rn×n be a normal matrix. Divide the spectrum of A to the real part spec R A = spec A ∩ R and to the complex part: spec C A = spec A ∖ spec R A. Then there exists a real orthogonal matrix A ∈ Rn×n and a block diagonal normal matrix B = diag(B1 , . . . , Bk ) with the following properties: 1. A = QBQ⊺ . 2. Each Bi is either 1 × 1 or 2 × 2 matrix. 3. If Bi is 2 × 2, the spec Bi ⊆ spec C A. Proof. Suppose first that spec R A ≠ ∅ and spec C A = ∅. Then our theorem follows from Corollary 4.8.8. Assume second that spec R A ≠ ∅ and spec C A ≠ ∅. Let V1 ⊂ Rn be the subspace spanned by all real eigenvectors corresponding to the real eigenvalues. The arguments of the proof of Theorem 4.8.6 yield that V1 , V1⊥ are nonzero invariant subspaces of A and A⊺ . By choosing orthonormal bases in V1 and V1⊥ , we deduce that A = Q1 CQ⊺1 , where C = diag(C1 , C2 ). Also, C1 and C2 are real normal matrices where spec C1 = spec R A and spec C2 = spec C A. The previous arguments show that C1 is orthogonally similar to a real diagonal matrix. Thus we need to show our theorem for C2 . Equivalently, it is left to prove it in the case where spec R A = ∅. By Theorem 4.8.7, A = U ΛU ∗ , where U ∈ Cn×n is unitary and Λ = diag(λ1 , . . . , λn ). Our assumptions yield that each λj is not real. ¯j u ¯ j . As Let U = [u1 , . . . , un ]. So uj = xj + iyj and Auj = λj uj . Note A¯ uj = λ ¯ ¯ j are orthogonal λj ≠ λj it follows from the proof of Theorem 4.8.6 that uj and u vectors. Hence, 0 = u⊺j uj = (x⊺ + iyj⊺ )(xj + iyj ) ⇒ x⊺j xj = yj⊺ yj , yj⊺ xj = 0. The equality x⊺j xj = yj⊺ yj and the equality u∗j uj = 1 yield that x⊺j xj = yj⊺ yj = 21 . ¯ j . So n = 2m Note that the eigenvalues of A are complex and come in pairs λj , λ and we can assume that we grouped the eigenvalues of A as λj , for j = 1, . . . , m, ¯ j , for j = 1, . . . , m. So Jλj < 0, for j = m + 1, . . . , 2m. where Jλj > 0 and λm+j = λ By the assumption that U is unitary we have that u1 , . . . , um are orthonormal vectors. Let i, j ∈ [m], i ≠ j. Then, ui is orthogonal to uj and ui is also ¯j u ¯ ¯ j , as A¯ orthogonal uj = λ implies √ to√u √ √¯ j and λi ≠ λj as Jλi > 0 > Jλj . This that { 2x1 , 2y1 , . . . , 2xm , 2ym } is an orthonormal basis in R2m . In this orthonormal basis, A is represented by a matrix B = diag(B1 , . . . , Bm ), where ¯j . each Bj is a normal 2 × 2 real matrix with the eigenvalues λj , λ ◻
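In floating point, the spectral decomposition of Theorem 4.8.7 is what numpy.linalg computes. The sketch below (ours, not from the text) treats the Hermitian case with eigh; for a general normal matrix one would use eig and, if needed, re-orthonormalize the eigenvectors. The test matrix is random and only illustrative.

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B + B.conj().T                              # a Hermitian (hence normal) matrix

    lam, U = np.linalg.eigh(A)                      # eigendecomposition of a Hermitian matrix
    print(np.allclose(U @ np.diag(lam) @ U.conj().T, A))   # A = U Lambda U^*
    print(np.allclose(U.conj().T @ U, np.eye(4)))          # U is unitary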
Proposition 4.8.10. Let V be an IPS over C and T ∈ N(V). Then
(a) T is self-adjoint if and only if spec T ⊂ R,
(b) T is unitary if and only if spec T ⊂ S^1 = {z ∈ C ∶ ∣z∣ = 1}.
Proof. Since T is normal, there exists an orthonormal basis {e_1, . . . , e_n} such that T e_i = λ_i e_i, i = 1, . . . , n. Hence, T^* e_i = λ̄_i e_i. Then
T = T^* if and only if λ_i = λ̄_i, i = 1, . . . , n,
T T^* = T^*T = I if and only if ∣λ_i∣ = 1, i = 1, . . . , n.
◻
Combine Proposition 4.8.4 and Corollary 4.8.8 with the above proposition to deduce the following corollary: Corollary 4.8.11. Let V be an IPS over R and T ∈ S(V). Then, spec T ⊂ R and V has an orthonormal basis consisting of the eigenvectors of T . Remark 4.8.12. Part (a) of Proposition 4.8.10 yields that a Hermitian matrix has real eigenvalues. Corollary 4.8.11 yields that a real symmetric matrix has real eigenvalues. Proposition 4.8.13. Let V be an IPS over R and let T ∈ U(V). Then, V = ⊕i∈{−1,1,2,...,k} Vi , where k ≥ 1, Vi and Vj are orthogonal for i ≠ j, such that (a) T ∣V−1 = −IV−1 dim V−1 ≥ 0, (b) T ∣V1 = IV1 , dim V1 ≥ 0, (c) T Vi = Vi , dim Vi = 2, spec (T ∣Vi ) ⊂ S 1 /{−1, 1}, for i = 2, . . . , k. We leave the proof as Problem 4.10.2-7. Proposition 4.8.14. Let V be an IPS over R and T ∈ AS(V). Then, V = ⊕i∈[k] Vi , where k ≥ 1, Vi and Vj are orthogonal, for i ≠ j, such that (a) T ∣V1 = 0V1 dim V0 ≥ 0, √ (b) T Vi = Vi , dim Vi = 2, spec (T ∣Vi ) ⊂ −1R/{0}, for i = 2, . . . , k. We leave the proof as Problem 4.10.2-8. Theorem 4.8.15. Let V be an IPS over C of dimension n and T ∈ L(V). Let λ1 , . . . , λn ∈ C be n eigenvalues of T counted with their multiplicities. Then, there exists an orthonormal basis {g1 , . . . , gn } of V with the following properties: T span {g1 , . . . , gi } ⊂ span {g1 , . . . , gi }, ⟨T gi , gi ⟩ = λi , i = 1, . . . , n.
(4.8.2)
Let V be an IPS over R of dimension n and T ∈ L(V) and assume that spec T ⊂ R. Let λ1 , . . . , λn ∈ R be n eigenvalues of T counted with their multiplicities. Then, there exists an orthonormal basis {g1 , . . . , gn } of V such that (4.8.2) holds. Proof. Assume first that V is an IPS over C of dimension n. The proof is by induction on n. For n = 1, the theorem is trivial. Assume that n > 1. Since λ1 ∈ spec T , it follows that there exists g1 ∈ V, ⟨g1 , g1 ⟩ = 1, such that T g1 = λ1 g1 . Let U ∶= span (g1 )⊥ . Let P be the orthogonal projection on U. Observe that P v = v − ⟨v, g1 ⟩g1 , for any vector v ∈ V. Let T1 ∶= P T ∣U . Clearly, T1 ∈ L(V).
˜2, . . . , λ ˜ n be the eigenvalues of T1 counted with their multiplicities. The Let λ induction hypothesis yields the existence of an orthonormal basis {g2 , . . . , gn } of U such that ˜ i , i = 1, . . . , n. T1 span {g2 , . . . , gi } ⊂ span {g2 , . . . , gi }, ⟨T1 gi , gi ⟩ = λ As T1 u = T u−⟨T u, g1 ⟩g1 , it follows that T span {g1 , . . . , gi } ⊂ span {g1 , . . . , gi }, for i = 1, . . . , n. Hence, in the orthonormal basis {g1 , . . . , gn }, T is presented by ˜ i , i = 2, . . . , n. an upper diagonal matrix B = [bij ]n1 , with b11 = λ1 and bii = λ ˜ ˜ Therefore, λ1 , λ2 , . . . , λn are the eigenvalues of T counted with their multiplicities. This establishes the theorem in this case. The real case is treated similarly. ◻ Combine the above results with Problems 4.10.2-6 and 4.10.2-12 to deduce the following corollary, which is known as Schur’s unitary triangularization. Corollary 4.8.16. Let A ∈ Cn×n and λ1 , . . . , λn ∈ C be n eigenvalues of A counted with their multiplicities. Then, there exists an upper triangular matrix n B = [bij ]1 ∈ Cn×n , such that bii = λi , i = 1, . . . , n, and a unitary matrix U ∈ Un such that A = U BU −1 . If A ∈ N(n, C), then B is a diagonal matrix. Remark 4.8.17. Let A ∈ Rn×n and assume that spec T ⊂ R. Then, A = U BU −1 , where U can be chosen as a real orthogonal matrix and B a real upper triangular matrix. If A ∈ N(n, R) and spec A ⊂ R, then B is a diagonal matrix. It is easy to show that U in the above corollary can be chosen in SUn or SO(n, R), respectively (Problem 4.10.2-11). Definition 4.8.18. Let V be a vector space and assume that T ∶ V → V is a linear operator. Let 0 ≠ v ∈ V. Then, W = span {v, T v, T 2 v, . . . } is called a cyclic invariant subspace of T generated by v. (It is also referred as a Krylov subspace of T generated by v.) Sometimes, we will call W just a cyclic subspace, or Krylov subspace. Theorem 4.8.19. Let V be a finite-dimensional IPS and T ∶ V → V be a linear operator. For 0 ≠ v ∈ V, let W = span {v, T v, . . . , T r−1 v} be a cyclic T -invariant subspace of dimension r generated by v. Let {u1 , . . . , ur } be an orthonormal basis of W obtained by the GSP from the basis {v, T v, . . . , T r−1 v} of W. Then, ⟨T ui , uj ⟩ = 0, for 1 ≤ i ≤ j − 2, i.e., the representation matrix of T ∣W in the basis {u1 , . . . , ur } is upper Hessenberg. If T is self-adjoint, then the representation matrix of T ∣W in the basis {u1 , . . . , ur } is a tridiagonal Hermitian matrix. Proof. Let Wj = span {v, . . . , T j−1 v}, for j = 1, . . . , r+1. Clearly, T Wj ⊂ Wj+1 , for j = 1, . . . , r. The assumption that W is a T -invariant subspace yields W = Wr = Wr+1 . Since dim W = r, it follows that v, . . . , T r−1 v are linearly independent. Hence, {v, . . . , T r−1 v} is a basis for W. Recall that span {u1 , . . . , uj } = Wj , for j = 1, . . . , r. Let r ≥ j ≥ i + 2. Then, T ui ∈ T Wi ⊂ Wi+1 . As uj ⊥ Wi+1 , it follows that ⟨T ui , uj ⟩ = 0. Assume that T ∗ = T . Let r ≥ i ≥ j + 2. Then, ⟨T ui , uj ⟩ = ⟨ui , T uj ⟩ = 0. Hence, the representation matrix of T ∣W in the basis {u1 , . . . , ur } is a tridiagonal Hermitian matrix. ◻
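Theorem 4.8.19 is the basis of the Lanczos and Arnoldi constructions in numerical linear algebra. The sketch below (ours, not from the text) builds an orthonormal basis of a Krylov subspace for a random real symmetric T and checks that the projected matrix is tridiagonal; the dimensions and random data are illustrative only.

    import numpy as np

    def krylov_basis(T, v, r):
        """Orthonormalize v, Tv, ..., T^{r-1}v (Gram-Schmidt via QR), as in Theorem 4.8.19."""
        K = np.column_stack([np.linalg.matrix_power(T, j) @ v for j in range(r)])
        Q, _ = np.linalg.qr(K)
        return Q

    rng = np.random.default_rng(4)
    S = rng.standard_normal((5, 5))
    T = S + S.T                           # a real symmetric (self-adjoint) operator
    v = rng.standard_normal(5)

    Q = krylov_basis(T, v, 5)
    H = Q.T @ T @ Q                       # representation of T|W in the orthonormal Krylov basis
    print(np.triu(np.abs(H), 2).max())    # ~0: entries above the first superdiagonal vanish,
                                          # so H is upper Hessenberg; by symmetry it is tridiagonal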
4.9 Symmetric bilinear, Hermitian, and quadratic forms Definition 4.9.1. Let V be a vector space over F and Q ∶ V × V → F. Q is called a bilinear form on V if (i) Q(ax + bz, y) = aQ(x, y) + bQ(z, y), (ii) Q(x, ay + bz) = aQ(x, y) + bQ(x, z), for all a, b ∈ F and x, y, z ∈ V. Definition 4.9.2. A bilinear form Q is called symmetric if Q(x, y) = Q(y, x), for all x, y ∈ V. Definition 4.9.3. For F = C, a bilinear form Q is called a Hermitian form or sesquilinear on V if Q satisfies the following condition: Q(x, y) = Q(y, x), for all x, y ∈ V. Definition 4.9.4. A bilinear form Q is called skew-symmetric if Q(x, y) = −Q(y, x), for all x, y ∈ V. Definition 4.9.5. A bilinear form Q is called alternate if ⟨x, x⟩ = 0, for all x ∈ V. Theorem 4.9.6. Let V be a vector space over F and Q be a bilinear form. (1) If char F = 2, then Q is skew-symmetric if and only if Q is symmetric. Moreover, if Q is alternate, then it is symmetric (skew-symmetric). (2) If char F ≠ 2, then Q is skew-symmetric if and only if it is alternate. The proof is left an exercise. Remark 4.9.7. The above theorem implies that skew-symmetric is always equivalent to either symmetric or alternate. This means we do not need to consider skew-symmetric forms. Something to notice about the definition of a bilinear form is the similarity it has to an inner product. In essence, a bilinear form is a generalization of an inner product. Example 4.11. The dot product on Rn is a symmetric bilinear form.
Example 4.12. On Cn , let Q((x1 , . . . , xn ), (y1 , . . . , yn )) = ∑ni=1 xi y¯i . Regarding Cn as a real vector space, Q is bilinear. But viewing Cn as a complex vector space, Q is not bilinear (it is not linear in its second component). Moreover, Q(x, y) = Q(y, x). Then Q is a Hermitian form. Definition 4.9.8. A quadratic form on V is a map Q ∶ V → F with the following properties: (i) Q(ax) = a2 Q(x), for all x ∈ V and a ∈ F, (ii) The map ⟨x, y⟩Q = Q(x + y) − Q(x) − Q(y) is a symmetric bilinear form. There is a link between symmetric bilinear forms and quadratic forms on a vector space. Indeed, every quadratic form Q defines a symmetric bilinear form by (ii). On the other hand, if char F ≠ 2 and ⟨ , ⟩ is a symmetric bilinear form on V, then we can define a quadratic form Q by 1 Q(x) = ⟨x, x⟩. 2 Furthermore, if Q is defined from a bilinear form in this fashion, then the bilinear form associated with Q is ⟨x, y⟩Q = Q(x + y) − Q(x) − Q(y) 1 1 1 = ⟨x + y, x + y⟩ − ⟨x, x⟩ − ⟨y, y⟩ 2 2 2 1 1 = ⟨x, y⟩ + ⟨y, x⟩ = ⟨x, y⟩, 2 2 which is the original bilinear form. It means the map ⟨ , ⟩ → Q and Q → ⟨ , ⟩Q are inverses, so there is a one-to-one correspondence between symmetric bilinear forms on V and quadratic forms on V. The following results are elementary; we leave their proofs as problems 4.10.2-14 and 4.10.2-15. Proposition 4.9.9. Let V be a vector space over F with a basis E = {e1 , . . . , en }. Then, there is a 1 − 1 correspondence between a symmetric bilinear form Q on V and A ∈ S(n, F): Q(x, y) = η ⊺ Aξ, n
n
i=1
i=1
x = ∑ ξi ei , y = ∑ ηi ei , ξ = (ξ1 , . . . , ξn )⊺ , η = (η1 , . . . , ηn )⊺ ∈ Fn . Let V be a vector space over C with a basis E = {e1 , . . . , en }. Then, there is a 1 − 1 correspondence between a Hermitian form Q on V and A ∈ Hn : Q(x, y) = η ∗ Aξ, n
n
i=1
i=1
x = ∑ ξi ei , y = ∑ ηi ei , ξ = (ξ1 , . . . , ξn )⊺ , η = (η1 , . . . , ηn )⊺ ∈ Cn . Definition 4.9.10. Let the assumptions of Proposition 4.9.9 hold. Then, A is called the representation matrix of Q in the basis E.
Proposition 4.9.11. Let the assumptions of Proposition 4.9.9 hold and assume that F = {f1 , . . . , fn } is another basis of V. Then, the symmetric bilinear form Q is represented by B ∈ S(n, F) in the basis F , where B is congruent to A: B = U ⊺ AU,
U ∈ GL(n, F),
and U is the matrix corresponding to the basis change from F to E. For F = C, the Hermitian form Q is represented by B ∈ Hn in the basis F , where B is Hermite congruent to A: B = U ∗ AU,
U ∈ GL(n, C),
and U is the matrix corresponding to the basis change from F to E. Its proof is left as an exercise. Proposition 4.9.12. Let V be an n-dimensional vector space over R. Let Q ∶ V × V → R be a symmetric bilinear form and A ∈ S(n, R) the representation matrix of Q with respect to a basis E in V and Vc be the extension of V over C. Then, there exists a unique Hermitian form Qc ∶ Vc × Vc → C such that Qc ∣V×V = Q and Qc is represented by A with respect to the basis E in Vc . We leave its proof as Problem 4.10.2-16. Convention 4.9.13. Let V be a finite-dimensional IPS over F. Let Q ∶ V×V → F be either a symmetric bilinear form for F = R or a Hermitian form for F = C. Then, a representation matrix A of Q is chosen with respect to an orthonormal basis E. The following proposition is straightforward. Proposition 4.9.14. Let V be an n-dimensional IPS over F and Q ∶ V × V → F be either a symmetric bilinear form for F = R or a Hermitian form for F = C. Then, there exists a unique T ∈ S(V) such that Q(x, y) = ⟨T x, y⟩, for any x, y ∈ V. In any orthonormal basis of V, Q and T are represented by a Hermitian matrix A in the same similarity class. In particular, the characteristic polynomial p(λ) of T is called the characteristic polynomial of Q. Here, Q has only real roots: λ1 (Q) ≥ ⋯ ≥ λn (Q), which are called the eigenvalues of Q. Furthermore, there exists an orthonormal basis F = {f1 , . . . , fn } in V such that D = diag(λ1 (Q), . . . , λn (Q)) is the representation matrix of Q in F . Vice versa, for any T ∈ S(V) and any subspace U ⊂ V, the form Q(T, U) defined by Q(T, U)(x, y) ∶= ⟨T x, y⟩, for x, y ∈ U is either a symmetric bilinear form for F = R or a Hermitian form for F = C. In the rest of this book, we use the following normalization unless stated otherwise.
Normalization 4.9.15. Let V be an n-dimensional IPS over F. Assume that T ∈ S(V). Then, arrange the eigenvalues of T counted with their multiplicities in decreasing order: λ1 (T ) ≥ ⋯ ≥ λn (T ). The same normalization applies to real symmetric matrices and complex Hermitian matrices.
4.10 Max-min characterizations of eigenvalues Given a Hermitian matrix (a self-adjoint linear transformation), we can obtain its largest (resp., smallest) eigenvalue by maximizing (resp., minimizing) the corresponding quadratic form over all the unit vectors. This section is devoted to the max-min characterization of eigenvalues of Hermitian matrices. First, we recall the definition of the Grassmannian of subspaces of given dimension in a fixed vector space. (See Problem 1.4.4-6.) Definition 4.10.1. Let V be a finite-dimensional space over the field F. Denote by Gr(m, V) the set of all m-dimensional subspaces in V of dimension m ∈ [n]∪{0}. Theorem 4.10.2 (the convoy principle). Let V be an n-dimensional IPS and T ∈ S(V). Then λk (T ) =
max_{U∈Gr(k,V)} min_{0≠x∈U} ⟨T x, x⟩/⟨x, x⟩ = max_{U∈Gr(k,V)} λ_k(Q(T, U)),  k = 1, 2, . . . , n,    (4.10.1)
where the form Q(T, U) is defined in Proposition 4.9.14. For k ∈ [n], let U be an invariant subspace of T spanned by eigenvectors e1 , . . . , ek corresponding to the eigenvalues λ1 (T ), . . . , λk (T ). Then, λk (T ) = λk (Q(T, U)). Let U ∈ Gr(k, V) and assume that λk (T ) = λk (Q(T, U)). Then, the subspace U contains an eigenvector of T corresponding to λk (T ). In particular, ⟨T x, x⟩ , 0≠x∈V ⟨x, x⟩
λ1 (T ) = max
⟨T x, x⟩ . 0≠x∈V ⟨x, x⟩
λn (T ) = min
(4.10.2)
Moreover for any x ≠ 0, ⟨T x, x⟩ if and only if T x = λ1 (T )x, ⟨x, x⟩ ⟨T x, x⟩ if and only if T x = λn (T )x, λn (T ) = ⟨x, x⟩ λ1 (T ) =
x,x⟩ The quotient ⟨T⟨x,x⟩ , 0 ≠ x ∈ V, is called the Rayleigh quotient. The characterization (4.10.2) is called the convoy principle.
Proof. Choose an orthonormal basis E = {e1 , . . . , en } such that T ei = λi (T )ei , < ei , ej >= δij
i, j = 1, . . . , n.
(4.10.3)
Then
⟨T x, x⟩/⟨x, x⟩ = (∑_{i=1}^n λ_i(T) ∣x_i∣^2) / (∑_{i=1}^n ∣x_i∣^2),  where x = ∑_{i=1}^n x_i e_i ≠ 0.    (4.10.4)
The above equality yields (4.10.2) and the equality cases in these characterizations. Let U ∈ Gr(k, V). Then, the minimal characterization of λk (Q(T, U)) yields the equality ⟨T x, x⟩ , for any U ∈ Gr(k, U). 0≠x∈U ⟨x, x⟩
λk (Q(T, U)) = min
(4.10.5)
Next, there exists 0 ≠ x ∈ U such that ⟨x, ei ⟩ = 0, for i = 1, . . . , k − 1. (For k = 1, this condition is void.) Hence, ⟨T x, x⟩ ∑ni=k λi (T )∣xi ∣2 ≤ λk (T ) ⇒ λk (T ) ≥ λk (Q(T, U)). = n ⟨x, x⟩ ∑i=k ∣xi ∣2 Let λ1 (T ) = ⋯ = λn1 (T ) > λ(T )n1 +1 (T ) = ⋯ = λn2 (T ) > ⋯ >, λnr−1 +1 (T ) = ⋯ = λnr (T ) = λn (T ),
n0 = 0 < n1 < ⋯ < nr = n. (4.10.6)
Assume that nj−1 < k ≤ nj and λk (Q(T, U)) = λk (T ). Then, for x ∈ U with nj ⟨x, ei ⟩ = 0, we have equality λk (Q(T, U)) = λk (T ) if and only if x = ∑i=k xi ei . Thus, T x = λk (T )x. Let Uk = span {e1 , . . . , ek } and 0 ≠ x = ∑ki=1 xi ei ∈ Uk . Then ⟨T x, x⟩ ∑ki=1 λi (T )∣xi ∣2 ≥ λk (T ) ⇒ λk (Q(T, Uk )) ≥ λk (T ). = k ⟨x, x⟩ ∑i=1 ∣xi ∣2 Hence, λk (Q(T, Uk )) = λk (T ).
◻
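Numerically, (4.10.2) says that the Rayleigh quotient of a Hermitian matrix is trapped between its smallest and largest eigenvalues. A quick check (our sketch, real symmetric case with random data):

    import numpy as np

    rng = np.random.default_rng(5)
    S = rng.standard_normal((5, 5))
    T = S + S.T                                     # real symmetric, standard inner product
    lam = np.sort(np.linalg.eigvalsh(T))[::-1]      # lambda_1 >= ... >= lambda_n (Normalization 4.9.15)

    X = rng.standard_normal((5, 20000))             # many random nonzero vectors
    rq = np.einsum('ij,ij->j', X, T @ X) / np.einsum('ij,ij->j', X, X)
    print(lam[0], rq.max())                         # the Rayleigh quotient never exceeds lambda_1 ...
    print(lam[-1], rq.min())                        # ... and never drops below lambda_n, cf. (4.10.2)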
Note that (4.10.1) can be stated as max{min{⟨T x, x⟩ ∶ x ∈ U, ∥x∥ = 1 and U ∈ Gr(k, V)}}. It can be shown that for k > 1 and λ1 (T ) > λk (T ), there exists U ∈ Gr(k, V) such that λk (T ) = λk (T, U) and U is not an invariant subspace of T . In particular, U does not contain all e1 , . . . , ek satisfying (4.10.3). (See Problem 4.10.2-18.) Corollary 4.10.3. Let the assumptions of Theorem 4.10.2 hold. Let 1 ≤ ` ≤ n. Then λk (T ) = max λk (Q(T, W)), k = 1, . . . , `. (4.10.7) W∈Gr(`,V)
Proof. For k ≤ `, apply Theorem 4.10.2 to λk (Q(T, W)) to deduce that λk (Q(T, W)) ≤ λk (T ). Let U` = span {e1 , . . . , e` }. Then λk (Q(T, U` )) = λk (T ),
k = 1, . . . , `.
◻
Theorem 4.10.4 (Courant–Fischer principle). Let V be an n-dimensional IPS and T ∈ S(V). Then λk (T ) =
min_{W∈Gr(k−1,V)} max_{0≠x∈W^⊥} ⟨T x, x⟩/⟨x, x⟩,  k = 1, . . . , n.
See Problem 4.10.2-19 for the proof of the theorem and the following corollary. Corollary 4.10.5. Let V be an n-dimensional IPS and T ∈ S(V). Let k, ` ∈ [n] be integers satisfying k ≤ `. Then λn−`+k (T ) ≤ λk (Q(T, W)) ≤ λk (T ),
for any W ∈ Gr(`, V).
The following theorem by Weyl allows us to obtain an upper bound for the kth eigenvalue of S + T . Theorem 4.10.6. Let V be an n-dimensional IPS and S, T ∈ S(V). Then, for any i, j ∈ N with i + j − 1 ≤ n, the inequality λi+j−1 (S + T ) ≤ λi (S) + λj (T ) holds. (This inequality is well-known as the Weyl inequality.) Proof. Let Ui−1 , Vj−1 ⊂ V be eigenspaces of S and T spanned by the first i − 1, j − 1 eigenvectors of S and T , respectively. So ⊥ ⟨Sx, x⟩ ≤ λi (S)⟨x, x⟩, ⟨T y, y⟩ ≤ λj (T )⟨y, y⟩, for all x ∈ U⊥i−1 , y ∈ Vj−1 .
Note that dim Ui−1 = i − 1, dim Vj−1 = j − 1. Let W = Ui−1 + Vj−1 . Then, dim W = l − 1 ≤ i + j − 2. Assume that z ∈ W⊥ . Then, ⟨(S + T )z, z⟩ = )z,z⟩ ⟨Sz, z⟩ + ⟨T z, z⟩ ≤ (λi (S) + λj (T ))⟨z, z⟩. Hence, max0≠z∈W⊥ ⟨(S+T ≤ λi (S) + ⟨z,z⟩ λj (T ). Use Theorem 4.10.4 to deduce that λi+j−1 (S + T ) ≤ λl (S + T ) ≤ λi (S) + λj (T ). ◻ Definition 4.10.7. Let V be an n-dimensional IPS. Fix an integer k ∈ [n]. Then, Fk = {f1 , . . . , fk } is called an orthonormal k-frame if ⟨fi , fj ⟩ = δij , for i, j = 1, . . . , k. Denote by Fr(k, V) the set of all orthonormal k-frames in V. Note that each Fk ∈ Fr(k, V) induces U = span Fk ∈ Gr(k, V). Vice versa, any U ∈ Gr(k, V) induces the set Fr(k, U) of all orthonormal k-frames that span U. Theorem 4.10.8 (Ky Fan). Let V be an n-dimensional IPS and T ∈ S(V). Then, for any integer k ∈ [n], k
∑ λi (T ) =
i=1
k
max
∑⟨T fi , fi ⟩.
{f1 ,...,fk }∈Fr(k,V) i=1
Furthermore, k
k
i=1
i=1
∑ λi (T ) = ∑⟨T fi , fi ⟩,
for some k-orthonormal frame Fk = {f1 , . . . , fk } if and only if span Fk is spanned by e1 , . . . , ek satisfying (4.10.3). Proof. Define k
tr Q(T, U) ∶= ∑ λi (Q(T, U)), for U ∈ Gr(k, V), i=1
(4.10.8) k
trk T ∶= ∑ λi (T ). i=1
Let Fk = {f1 , . . . , fk } ∈ Fr(k, V). Set U = span Fk . Then, in view of Corollary 4.10.3, k
k
i=1
i=1
∑⟨T fi , fi ⟩ = tr Q(T, U) ≤ ∑ λi (T ).
Let Ek ∶= {e1 , . . . , ek }, where e1 , . . . , en are given by (4.10.3). Clearly, trk T = tr Q(T, span Ek ). This shows the maximal characterization of trk T . Let U ∈ Gr(k, V) and assume that trk T = tr Q(T, U). Hence, λi (T ) = λi (Q(T, U)), for i = 1, . . . , k. Then, there exists Gk = {g1 , . . . , gk } ∈ Fr(k, U)) such that ⟨T x, x⟩ = λi (Q(T, U)) = λi (T ), i = 1, . . . , k. 0≠x∈span {g1 ,...,gi } ⟨x, x⟩ min
Use Theorem 4.10.2 to deduce that T gi = λi (T )gi , for i = 1, . . . , k.
◻
Theorem 4.10.9. Let V be an n-dimensional IPS and T ∈ S(V). Then, for any integer k, l ∈ [n] with k + l ≤ n, we have l+k
∑ λi (T ) =
i=l+1
k
min
∑⟨T fi , fi ⟩.
max
W∈Gr(l,V) {f1 ,...,fk }∈Fr(k,V∩W⊥ ) i=1
Proof. Let Wj ∶= span {e1 , . . . , ej }, j = 1, . . . , n, where e1 , . . . , en are given by (4.10.3). Then, V1 ∶= V ∩ Wl is an invariant subspace of T . Let T1 ∶= T ∣V1 . Then, λi (T1 ) = λl+i (T ), for i = 1, . . . , n − l. Theorem 4.10.8 for T1 yields k
l+k
∑⟨T fi , fi ⟩ = ∑ λi (T ).
max
{f1 ,...,fk }∈Fr(k,V∩Wl⊥ ) i=1
i=l+1
Let T2 ∶= T ∣Wl+k and W ∈ Gr(l, V). Set U ∶= Wl+k ∩ W⊥ . Then, dim U ≥ k. Apply Theorem 4.10.8 to −T2 to deduce k
k
i=1
i=1
∑ λi (−T2 ) ≥ ∑⟨−T fi , fi ⟩, for {f1 , . . . , fk } ∈ Fr(k, U).
The above inequality is equivalent to the inequality l+k
k
i=l+1
i=1
∑ λi (T ) ≤ ∑⟨T fi , fi ⟩, for {f1 , . . . , fk } ∈ Fr(k, U) ≤
k
max
∑⟨T fi , fi ⟩.
{f1 ,...,fk }∈Fr(k,V∩W⊥ ) i=1
The above inequalities yield the theorem.
4.10.1 Worked-out problems 1. Let A, B ∈ Hn . (a) Show that λi (A + B) ≤ λi (A) + λi (B), for any i ∈ [n]. (b) Show that λi (A) + λn (B) ≤ λi (A + B).
(c) Give the necessary and sufficient condition to have the equality λ1(A + B) = λ1(A) + λ1(B).
Solution:
(a) Apply the Weyl inequality (Theorem 4.10.6) with j = 1.
(b) Replace A and B by −A and −B and i by j in part (a): λj(−A − B) ≤ λj(−A) + λ1(−B). We have λj(−A − B) = −λn−j+1(A + B), λj(−A) = −λn−j+1(A), and λ1(−B) = −λn(B). Then, λn−j+1(A + B) ≥ λn−j+1(A) + λn(B). Setting i = n − j + 1, we obtain the desired inequality.
(c) Assume that (A + B)x = λ1(A + B)x, where ∥x∥ = 1. Then, λ1(A + B) = x∗(A + B)x = x∗Ax + x∗Bx ≤ λ1(A) + λ1(B). Equality holds if and only if x∗Ax = λ1(A) and x∗Bx = λ1(B), i.e., equality holds if and only if A and B have a common unit eigenvector x corresponding to λ1(A) and λ1(B), respectively.
2. Let S, T ∈ S(V). We say T≻S if ⟨T x, x⟩ > ⟨Sx, x⟩, for all 0 ≠ x ∈ V. T is called positive definite if T≻0, where 0 is the zero operator in L(V). See the next subsection about positive definite matrices. Assume that K ∈ S(n, R) is a positive definite matrix. Define in Rn an inner product ⟨x, y⟩ := y⊺Kx. Let A ∈ Rn×n and view AK as a linear transformation from Rn to Rn by x ↦ (AK)x. Show that AK is self-adjoint with respect to the above inner product if and only if A ∈ S(n, R).
Solution: The operator AK is self-adjoint with respect to ⟨x, y⟩ = y⊺Kx if and only if ⟨AKx, y⟩ = ⟨x, AKy⟩ for all x, y ∈ Rn. Since ⟨AKx, y⟩ = y⊺KAKx and ⟨x, AKy⟩ = (AKy)⊺Kx = y⊺K⊺A⊺Kx = y⊺KA⊺Kx, self-adjointness is equivalent to
y⊺KAKx = y⊺KA⊺Kx, for all x, y ∈ Rn.    (4.10.9)
Since (4.10.9) holds for all x and y, it is equivalent to KAK = KA⊺K, and as the positive definite matrix K is invertible, this holds if and only if A = A⊺, i.e., A ∈ S(n, R).
3. Prove that every real square matrix can be written as a sum of a symmetric matrix and a skew-symmetric matrix.
Solution: For any matrix A = [aij], consider the symmetric matrix B = [bij] and the skew-symmetric matrix C = [cij] with bij = (aij + aji)/2 and cij = (aij − aji)/2. Then bij + cij = aij for all i, j, so A = B + C as desired.
4. Let ⟨, ⟩ be a bilinear form on a real vector space V. Show that there is a symmetric form (, ) and a skew-symmetric form [, ] so that ⟨, ⟩ = (, ) + [, ]. Solution: Note that a bilinear form is symmetric or skew-symmetric if and only if its corresponding matrix is symmetric or skew-symmetric. Let A be the matrix for ⟨, ⟩. By the previous problem we can find B symmetric and C skew-symmetric such that A = B+C. Take (u, v) = u⊺ Bv and [u, v] = u⊺ Cv. Note that ⟨u, v⟩ = (u, v) + [u, v] and (, ) and [, ] are symmetric and skewsymmetric, as desired.
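As a quick numerical companion to Problems 3 and 4, here is a small NumPy sketch (an illustration added here, not part of the original text): it splits a random square matrix A into its symmetric part B = (A + A⊺)/2 and skew-symmetric part C = (A − A⊺)/2 and checks that the associated bilinear form splits accordingly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))

B = (A + A.T) / 2   # symmetric part
C = (A - A.T) / 2   # skew-symmetric part

assert np.allclose(A, B + C)
assert np.allclose(B, B.T) and np.allclose(C, -C.T)

# The bilinear form <u, v> = u^T A v splits into a symmetric part (u, v) = u^T B v
# and a skew-symmetric part [u, v] = u^T C v.
u, v = rng.standard_normal(n), rng.standard_normal(n)
assert np.isclose(u @ A @ v, u @ B @ v + u @ C @ v)
```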
4.10.2 Problems
1. Prove Proposition 4.8.1.
2. Let P, Q ∈ L(V), a, b ∈ F. Show that (aP + bQ)∗ = āP∗ + b̄Q∗.
3. Prove Proposition 4.8.3.
4. Prove Proposition 4.8.4 for finite-dimensional V. (Hint: Choose an orthonormal basis in V.)
5. Show the following statements: SO(n, F) ⊂ O(n, F) ⊂ GL(n, F); S(n, R) ⊂ Hn ⊂ N(n, C); AS(n, R) ⊂ AHn ⊂ N(n, C); S(n, R), AS(n, R) ⊂ N(n, R) ⊂ N(n, C); O(n, R) ⊂ Un ⊂ N(n, C); SO(n, F), O(n, F), SUn, Un are groups; S(n, F) is an F-vector space of dimension n(n + 1)/2; AS(n, F) is an F-vector space of dimension n(n − 1)/2; Hn is an R-vector space of dimension n²; AHn = iHn.
6. Let V be an IPS over F. Let E = {e1, . . . , en} be an orthonormal basis in V. Let G = {g1, . . . , gn} be another basis in V. Show that G is an orthonormal basis if and only if the matrix of change of bases either from E to G or from G to E is a unitary matrix.
7. Prove Proposition 4.8.13.
8. Prove Proposition 4.8.14.
9. (a) Show that A ∈ SO(2, R) is of the form
A = [ cos θ  −sin θ
      sin θ   cos θ ],  θ ∈ R.
(b) Show that SO(2, R) = e^{AS(2,R)}. That is, for any B ∈ AS(2, R), e^B ∈ SO(2, R), and for any A ∈ SO(2, R) one can find B ∈ AS(2, R) such that A = e^B. (Hint: Consider the power series for e^B, B = [0 −θ; θ 0].)
(c) Show that SO(n, R) = e^{AS(n,R)}. (Hint: Use Propositions 4.8.13 and 4.8.14 and part b.)
(d) Show that SO(n, R) is a path connected space. (See part e.)
(e) Let V be an n-dimensional IPS over F = R with n > 1. Let p ∈ [n − 1]. Assume that {x1, . . . , xp} and {y1, . . . , yp} are two orthonormal systems in V. Show that these two orthonormal systems are path connected. That is, there are p continuous mappings zi(t) : [0, 1] → V, i = 1, . . . , p, such that for each t ∈ [0, 1], {z1(t), . . . , zp(t)} is an orthonormal system and zi(0) = xi, zi(1) = yi, i = 1, . . . , p.
10. (a) Show that Un = e^{AHn}. (Hint: Use Proposition 4.8.10 and its proof.) (b) Show that Un is path connected. (c) Prove Problem 9e for F = C.
11. Show that (a) D1DD1∗ = D, for any D ∈ D(n, C), D1 ∈ DUn. (b) A ∈ N(n, C) if and only if A = UDU∗, U ∈ SUn, D ∈ D(n, C). (c) A ∈ N(n, R), σ(A) ⊂ R if and only if A = UDU⊺, U ∈ SOn, D ∈ D(n, R).
12. Show that an upper triangular or a lower triangular matrix B ∈ Cn×n is normal if and only if B is diagonal. (Hint: Consider the equality (BB∗)11 = (B∗B)11.)
13. Let the assumptions of Theorem 4.8.19 hold. Show that instead of performing the GSP on v, T v, . . . , T^{r−1}v, one can perform the following process (a code sketch follows this list). Let w1 := v/∥v∥. Assume that one has already obtained i orthonormal vectors w1, . . . , wi. Let w̃i+1 := T wi − ∑_{j=1}^{i} ⟨T wi, wj⟩wj. If w̃i+1 = 0, then stop the process, i.e., one is left with i orthonormal vectors. If w̃i+1 ≠ 0, then set wi+1 := w̃i+1/∥w̃i+1∥ and continue the process. Show that the process ends after obtaining r orthonormal vectors w1, . . . , wr and ui = wi, for i = 1, . . . , r. (This is a version of the Lanczos tridiagonalization process.)
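The process described in Problem 13 can be sketched directly in NumPy. The following is an added illustration (not from the original text), assuming the standard inner product on Rⁿ; for a symmetric T the compression Wᵀ T W comes out (numerically) tridiagonal.

```python
import numpy as np

def lanczos_orthonormalize(T, v, tol=1e-10):
    """Sketch of Problem 13: orthonormalize v, Tv, T^2 v, ... by
    orthogonalizing T w_i against the vectors found so far and stopping
    when the new vector w~_{i+1} vanishes."""
    W = [v / np.linalg.norm(v)]
    while True:
        w = T @ W[-1]
        for wj in W:                      # subtract the projections <T w_i, w_j> w_j
            w = w - (w @ wj) * wj
        norm = np.linalg.norm(w)
        if norm < tol:                    # w~_{i+1} = 0: stop
            return np.column_stack(W)
        W.append(w / norm)

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
T = (M + M.T) / 2                         # self-adjoint operator on R^5
v = rng.standard_normal(5)
W = lanczos_orthonormalize(T, v)
assert np.allclose(W.T @ W, np.eye(W.shape[1]))   # columns are orthonormal
print(np.round(W.T @ T @ W, 8))                   # tridiagonal up to rounding
```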
14. Prove Proposition 4.9.9. 15. Prove Proposition 4.9.11. 16. Prove Proposition 4.9.12. 17. Prove Proposition 4.9.14. 18. Let V be a 3-dimensional IPS and T ∈ L(V) be self-adjoint. Assume that λ1 (T ) > λ2 (T ) > λ3 (T ),
T ei = λi (T )ei , i = 1, 2, 3.
Let W = span {e1, e3}. (a) Show that for each t ∈ (λ3(T), λ1(T)), there exists W(t) ∈ Gr(1, W) such that λ1(Q(T, W(t))) = t. (b) Let t ∈ [λ2(T), λ1(T)] and set U(t) = span {W(t), e2} ∈ Gr(2, V). Show that λ2(T) = λ2(Q(T, U(t))).
19. (a) Let the assumptions of Theorem 4.10.4 hold, and W ∈ Gr(k − 1, V). Show that there exists 0 ≠ x ∈ W⊥ such that ⟨x, ei⟩ = 0, for i = k + 1, . . . , n, where e1, . . . , en satisfy (4.10.3). Conclude that λ1(Q(T, W⊥)) ≥ ⟨T x, x⟩/⟨x, x⟩ ≥ λk(T). (b) Let Uℓ = span {e1, . . . , eℓ}. Show that λ1(Q(T, U⊥ℓ)) = λℓ+1(T), for ℓ = 1, . . . , n − 1. (c) Prove Theorem 4.10.4. (d) Prove Corollary 4.10.5. (Hint: Choose U ∈ Gr(k, W) such that U ⊂ W ∩ (span {en−ℓ+k+1, . . . , en})⊥. Then, λn−ℓ+k(T) ≤ λk(Q(T, U)) ≤ λk(Q(T, W)).)
20. Let B = [bij]_{i,j=1}^n ∈ Hn and denote by A ∈ Hn−1 the matrix obtained from B by deleting the ith row and column. (a) Show the Cauchy interlacing inequalities: λi(B) ≥ λi(A) ≥ λi+1(B), for i = 1, . . . , n − 1. (b) Prove the inequality λ1(B) + λn(B) ≤ λ1(A) + bii. (Hint: Express the traces of B and A, respectively, in terms of eigenvalues to obtain
λ1(B) + λn(B) = bii + λ1(A) + ∑_{i=2}^{n−1} (λi(A) − λi(B)).
Then use the Cauchy interlacing inequalities.)
21. Let B ∈ Hn be the 2 × 2 block matrix
B = [ B11  B12
      B12∗ B22 ].
Show that
λ1 (B) + λn (B) ≤ λ1 (B11 ) + λ1 (B22 ). (Hint: Assume that Bx = λ1 (B)x, x⊺ = {x⊺1 , x⊺2 }, partitioned as B. Consider U = span {(x⊺1 , 0)⊺ , (0, x⊺2 )⊺ }. Analyze λ1 (Q(T, U)) + λ2 (Q(T, U)).) 22. Let T ∈ S(V). Denote by ι+ (T ), ι0 (T ), ι− (T ) the number of positive, zero, and negative eigenvalues among λ1 (T ) ≥ ⋯ ≥ λn (T ). The triple ι(T ) ∶= (ι+ (T ), ι0 (T ), ι− (T )) is called the inertia of T . For B ∈ Hn , let ι(B) ∶= (ι+ (B), ι0 (B), ι− (B)) be the inertia of B, where ι+ (B), ι0 (B), ι− (B) is the number of positive, zero, and negative eigenvalues of B, respectively. Let U ∈ Gr(k, V). Prove the statements (a), (b), (c), and (d). (a) Assume that λk (Q(T, U)) > 0, i.e., Q(T, U) > 0. Then, k ≤ ι+ (T ). If k = ι+ (T ), then one can choose U to be an invariant subspace of V spanned by the eigenvectors of T corresponding to positive eigenvalues of T . (Usually such a subspace is not unique.) (b) Assume that λk (Q(T, U)) ≥ 0, i.e., Q(T, U) ≥ 0. Then, k ≤ ι+ (T ) + ι0 (T ). If k = ι+ (T ) + ι0 (T ), then U is the unique invariant subspace of V spanned by the eigenvectors of T corresponding to nonnegative eigenvalues of T . (c) Assume that λ1 (Q(T, U)) < 0, i.e., Q(T, U) < 0. Then, k ≤ ι− (T ). If k = ι− (T ), then U can be chosen to be an invariant subspace of V
spanned by the eigenvectors of T corresponding to negative eigenvalues of T. (Usually such a subspace may not be unique.) (d) Assume that λ1(Q(T, U)) ≤ 0, i.e., Q(T, U) ≤ 0. Then, k ≤ ι−(T) + ι0(T). If k = ι−(T) + ι0(T), then U is a unique invariant subspace of V spanned by the eigenvectors of T corresponding to nonpositive eigenvalues of T.
23. Let B ∈ Hn and assume that A = P BP∗, for some P ∈ GL(n, C). Show that ι(A) = ι(B).
24. Prove that for a nonsingular matrix that has an LU decomposition, the signs of the pivots are the signs of the eigenvalues.
25. Show the following statements: (a) The set of Hermitian matrices is not a subspace of Cn×n over C (with the usual addition and scalar multiplication). (b) The set of Hermitian matrices is a subspace of Cn×n over R (with the usual addition and scalar multiplication).
26. Let V be a complex vector space, T ∈ N(V), and f ∈ C[z]. Show the following statements: (a) f(T) ∈ N(V). (b) The minimal polynomial of T has distinct roots.
27. Let A ∈ Cn×n be a Hermitian matrix. Prove that rank A equals the number of nonzero eigenvalues of A.
28. Show that the following statements are equivalent for A ∈ Cn×n: (a) A is Hermitian. (b) There is a unitary matrix U such that U∗AU is real diagonal. (c) x∗Ax is real for every x ∈ Cn. (d) A² = A∗A = AA∗. (e) tr A² = tr A∗A = tr AA∗.
29. Let A ∈ S(n, R) and denote by ∑(A) the sum of all entries of A. If ∑(A) ≠ 0, prove that ∑(A)/n² ≤ ∑(A²)/∑(A).
30. Let V be an IPS and T ∈ S(V). Prove that the eigenvectors of different eigenvalues of T are orthogonal to one another.
31. If A and B ∈ Cn×n commute and A is normal, prove that A∗ and B commute.
32. Let
A = [ 1   2   0   3
      2   5  −1   4
      0  −1  −1   1
      3   4   1  −2 ].
Assume that λ1 ≥ λ2 ≥ λ3 ≥ λ4 are the eigenvalues of A. Assume that on R4 one has the standard inner product ⟨x, y⟩ = y⊺x. Let ei = (δ1i, . . . , δ4i)⊺, i ∈ [4], be the standard basis in R4. View A as a self-adjoint operator x ↦ Ax. (a) Let U = span{e1, e2}. Find λ1(A, U), λ2(A, U). (b) Show that λ1 ≥ 3 + √8 and λ2 ≥ 3 − √8. (c) Let B be the matrix obtained from A by deleting the first row and column in A. Show that λ2(B) ≥ λ3.
33. Let A ∈ Cn×n be a Hermitian matrix. Arrange the n diagonal entries of A in nonincreasing order α1 ≥ ⋯ ≥ αn. Show that for each integer p, 1 ≤ p ≤ n − 1, one has the inequality ∑_{j=1}^p λj(A) ≥ ∑_{j=1}^p αj. Furthermore, ∑_{j=1}^n λj(A) = ∑_{j=1}^n αj.
34. Prove Euler's theorem: Let Q ∈ R3×3 be an orthogonal matrix. If det Q = 1, then there exist a two-dimensional subspace U in R3 and an angle φ such that for each x ∈ R3 the following is true: Qx is obtained by rotating by an angle φ in U. That is, let P : R3 → U be the orthogonal projection on U and Rφ : U → U a rotation in U by the angle φ around the origin. Then, Qx = (x − P x) + Rφ(P x).
35. Prove that if A is a nonzero Hermitian matrix, then rank A ≥ (tr A)²/tr(A²).
36. Let A ∈ Cn×n . Prove that there exists a matrix D = diag(±1, . . . , ±1) such that det(A + D) ≠ 0. 37. Let S and T be normal operators such that Im S ⊥ Im T . Prove that S + T is a normal operator. 38. Let V be an IPS and T ∈ L(V). Prove that T ∈ U(V) if and only if T preserves distance, i.e., ∥T x∥ = ∥x∥, for any x ∈ V.
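As a numerical companion to Problem 32 above, the following NumPy sketch (an added illustration, not from the original text) computes the eigenvalues of A, of its compression to U = span{e1, e2}, and of the principal submatrix B, and checks the bounds asserted in parts (b) and (c).

```python
import numpy as np

A = np.array([[1, 2, 0, 3],
              [2, 5, -1, 4],
              [0, -1, -1, 1],
              [3, 4, 1, -2]], dtype=float)

lam = np.linalg.eigvalsh(A)[::-1]              # lambda_1 >= ... >= lambda_4

# (a) Compression of A to U = span{e1, e2}: the leading 2x2 block.
lam_U = np.linalg.eigvalsh(A[:2, :2])[::-1]    # eigenvalues 3 +- sqrt(8)

# (b) lambda_i(A) >= lambda_i(Q(A, U)) by Corollary 4.10.3.
assert lam[0] >= lam_U[0] - 1e-10 and lam[1] >= lam_U[1] - 1e-10

# (c) Cauchy interlacing for the principal submatrix B (delete row/column 1).
B = A[1:, 1:]
lam_B = np.linalg.eigvalsh(B)[::-1]
assert lam_B[1] >= lam[2] - 1e-10
print(np.round(lam, 4), np.round(lam_U, 4))
```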
4.11 Positive definite operators and matrices To find the matrix analogues of positive (nonnegative) real numbers, we introduce positive (nonnegative) definite matrices. Definition 4.11.1. Let V be a finite-dimensional IPS over F. Let S and T ∈ S(V). Then, T ≻S, (T ⪰S) if ⟨T x, x⟩ > ⟨Sx, x⟩, (⟨T x, x⟩ ≥ ⟨Sx, x⟩), for all 0 ≠ x ∈ V. Also, T is called positive (nonnegative) definite if T ≻0 (T ⪰0), where 0 is the zero operator in L(V). Denote by S+ (V)o and S+ (V) the open set of positive definite self-adjoint operators and the closed set of nonnegative self-adjoint operators, respectively. (S+ (V)o ⊂S+ (V)⊂ S(V)). Definition 4.11.2. Let P and Q be either symmetric bilinear forms for F = R or Hermitian forms for F = C. Then, Q≻P, (Q⪰P ) if Q(x, x) > P (x, x), (Q(x, x) ≥ P (x, x)), for all 0 ≠ x ∈ V. Also, Q is called positive (nonnegative) definite if Q≻0 (Q⪰0), where 0 is the zero form.
Recall that A ∈ Cn×n is Hermitian if and only if x∗Ax is real for all x ∈ Cn (Problem 4.10.2-28). This suggests the following definition of positive (nonnegative) definiteness for A ∈ Hn.
Definition 4.11.3. For A and B ∈ Hn, B≻A (B⪰A) if x∗Bx > x∗Ax (x∗Bx ≥ x∗Ax), for all 0 ≠ x ∈ Cn. Moreover, B ∈ Hn is called positive (nonnegative) definite if B≻0 (B⪰0). Denote by Hon,+ ⊂ Hn,+ ⊂ Hn the open set of positive definite n×n Hermitian matrices and the closed set of n×n nonnegative Hermitian matrices, respectively. Let S+(n, R) := S(n, R) ∩ Hn,+, S+(n, R)o := S(n, R) ∩ Hon,+.
Use Theorem 4.10.2 to deduce the following corollary, which states that all eigenvalues of a positive definite matrix (or a positive definite linear operator) are strictly positive real numbers.
Corollary 4.11.4. Let V be an n-dimensional IPS and T ∈ S(V). Then, T≻0 (T⪰0) if and only if λn(T) > 0 (λn(T) ≥ 0). Let S ∈ S(V) and assume that T≻S (T⪰S). Then, λi(T) > λi(S) (λi(T) ≥ λi(S)), for i = 1, . . . , n.
We can apply Corollary 4.11.4 to derive an equivalent definition for positive (nonnegative) definite operators as follows. An operator is positive (nonnegative) definite if it is symmetric and all its eigenvalues are positive (nonnegative). Moreover, since the trace of a matrix is the sum of its eigenvalues, the trace of a positive definite matrix is a positive real number. Also, since the determinant of a matrix is the product of its eigenvalues, the determinant of a positive definite matrix is a positive real number. In particular, positive definite matrices are always invertible.
Proposition 4.11.5. Let V be a finite-dimensional IPS. Assume that T ∈ S(V). Then, T⪰0 if and only if there exists S ∈ S(V) such that T = S². Furthermore, T≻0 if and only if S is invertible. For T ∈ S(V) with T⪰0, there exists a unique S⪰0 in S(V) such that T = S². This S is called the square root of T and is denoted by T^{1/2}.
Proof. Assume first that T⪰0. Let {e1, . . . , en} be an orthonormal basis consisting of eigenvectors of T as in (4.10.3). Since λi(T) ≥ 0, i = 1, . . . , n, we can define P ∈ L(V) as follows: P ei = √λi(T) ei, i = 1, . . . , n. Clearly, P is self-adjoint nonnegative and T = P². Suppose now that T = S², for some S ∈ S(V). Then, T ∈ S(V) and ⟨T x, x⟩ = ⟨Sx, Sx⟩ ≥ 0. Hence, T⪰0. Clearly, ⟨T x, x⟩ = 0 if and only if Sx = 0. Hence, T≻0 if and only if S ∈ GL(V). Suppose that S⪰0. Then, λi(S) = √λi(T), i = 1, . . . , n. Furthermore, each eigenvector of S is an eigenvector of T. It is straightforward to show that S = P, where P is defined above. Clearly, T≻0 if and only if λn(T) > 0, i.e., if and only if S is invertible. ◻
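The construction of the square root in the proof of Proposition 4.11.5 is directly computable. Below is a small NumPy sketch (an illustration added here, not from the original text): it builds S = T^{1/2} from the spectral decomposition of a nonnegative definite T and checks that S² = T.

```python
import numpy as np

def sqrt_psd(T):
    """Square root of a symmetric positive semidefinite matrix via its
    spectral decomposition T = V diag(w) V^T, as in Proposition 4.11.5."""
    w, V = np.linalg.eigh(T)
    w = np.clip(w, 0.0, None)          # guard against tiny negative round-off
    return V @ np.diag(np.sqrt(w)) @ V.T

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 5))
T = X @ X.T                            # nonnegative definite (cf. Corollary 4.11.6)
S = sqrt_psd(T)
assert np.allclose(S, S.T)
assert np.allclose(S @ S, T)
assert np.all(np.linalg.eigvalsh(S) >= -1e-10)
```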
A is invertible. For B⪰0, there exists a unique A⪰0 such that B = A². This A is denoted by B^{1/2}.
Definition 4.11.7. Let V be an IPS. Given a list of vectors x1, . . . , xn ∈ V, the matrix
[ ⟨x1, x1⟩  ⋯  ⟨x1, xn⟩
      ⋮     ⋯      ⋮
  ⟨xn, x1⟩  ⋯  ⟨xn, xn⟩ ]
is denoted by G(x1, . . . , xn) and is called the Gramian matrix.
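Gram matrices are easy to form and test numerically. The following NumPy sketch (an added illustration, not from the original text, using the standard inner product on R⁶) checks that a Gramian matrix is symmetric nonnegative definite and verifies one instance of the determinant inequality (4.11.1) proved below.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 4))              # columns are the vectors x_1, ..., x_4
G = X.T @ X                                  # Gramian matrix [<x_i, x_j>]

# G is symmetric nonnegative definite (Theorem 4.11.8).
assert np.allclose(G, G.T)
assert np.all(np.linalg.eigvalsh(G) >= -1e-10)

# Inequality (4.11.1): det G(x_1,...,x_n) <= det G(x_1,...,x_k) * det G(x_{k+1},...,x_n).
k = 2
lhs = np.linalg.det(G)
rhs = np.linalg.det(G[:k, :k]) * np.linalg.det(G[k:, k:])
assert lhs <= rhs + 1e-10
```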
Theorem 4.11.8. Let V be an IPS over F. Let x1, . . . , xn ∈ V. Then, the Gramian matrix G(x1, . . . , xn) := [⟨xi, xj⟩]_{i,j=1}^n is a Hermitian nonnegative definite matrix. (If F = R, then G(x1, . . . , xn) is real symmetric nonnegative definite.) Also, G(x1, . . . , xn) > 0 if and only if x1, . . . , xn are linearly independent. Furthermore, for any integer k ∈ [n − 1],
det G(x1, . . . , xn) ≤ det G(x1, . . . , xk) det G(xk+1, . . . , xn).    (4.11.1)
Equality holds if and only if either det G(x1, . . . , xk) det G(xk+1, . . . , xn) = 0 or ⟨xi, xj⟩ = 0, for i = 1, . . . , k and j = k + 1, . . . , n.
Proof. Clearly, G(x1, . . . , xn) ∈ Hn. If V is an IPS over R, then G(x1, . . . , xn) ∈ S(n, R). Let a = (a1, . . . , an)⊺ ∈ Fn. Then
a∗ G(x1, . . . , xn) a = ⟨∑_{i=1}^n ai xi, ∑_{j=1}^n aj xj⟩ ≥ 0.
Equality holds if and only if ∑_{i=1}^n ai xi = 0. Hence, G(x1, . . . , xn) ≥ 0, and G(x1, . . . , xn) > 0 if and only if x1, . . . , xn are linearly independent. In particular, det G(x1, . . . , xn) ≥ 0 and det G(x1, . . . , xn) > 0 if and only if x1, . . . , xn are linearly independent.
We now prove the inequality (4.11.1). Assume first that the right-hand side of (4.11.1) is zero. Then, either x1, . . . , xk or xk+1, . . . , xn are linearly dependent. Hence, x1, . . . , xn are linearly dependent and det G = 0. Assume now that the right-hand side of (4.11.1) is positive. Hence, x1, . . . , xk and xk+1, . . . , xn are linearly independent. If x1, . . . , xn are linearly dependent, then det G = 0 and strict inequality holds in (4.11.1). It is left to show the inequality (4.11.1) and the equality case when x1, . . . , xn are linearly independent.
Perform the Gram–Schmidt algorithm on x1, . . . , xn as given in (4.1.2). Let Sj = span {x1, . . . , xj}, for j = 1, . . . , n. Corollary 4.1.2 yields that span {e1, . . . , en−1} = Sn−1. Let yi = xi − PSi−1(xi), i = k + 1, . . . , n. Hence ∥yi∥ = dist(xi, Si−1), for i = k + 1, . . . , n. Clearly, yn = xn − ∑_{j=1}^{n−1} bj xj, for some b1, . . . , bn−1 ∈ F. Let G′ be the matrix obtained from G(x1, . . . , xn) by subtracting from the nth row bj times the jth row. Thus, the last row of G′ is (⟨yn, x1⟩, . . . , ⟨yn, xn⟩) = (0, . . . , 0, ∥yn∥²). Clearly, det G(x1, . . . , xn) = det G′. Expand det G′ by the last row to deduce
det G(x1, . . . , xn) = det G(x1, . . . , xn−1) ∥yn∥²
= ⋯ = det G(x1, . . . , xk) ∏_{i=k+1}^n ∥yi∥²    (4.11.2)
= det G(x1, . . . , xk) ∏_{i=k+1}^n dist(xi, Si−1)²,  k = n − 1, . . . , 1.
Let Ŝj := span {xk+1, . . . , xj} for j = k + 1, . . . , n. Denote ŷk+1 = xk+1 and ŷj = xj − PŜj−1(xj) for j = k + 2, . . . , n. Then {ŷk+1, . . . , ŷn} is an orthogonal set such that Ŝj = span {ŷk+1, . . . , ŷj} and dist(xj, Ŝj−1) = ∥ŷj∥ = dist(ŷj, Ŝj−1), for j = k + 1, . . . , n, where Ŝk = {0}. Use (4.11.2) to deduce that det G(xk+1, . . . , xn) = ∏_{j=k+1}^n ∥ŷj∥². As Ŝj−1 ⊂ Sj−1, for j > k, it follows that
∥yj∥ = dist(xj, Sj−1) ≤ dist(xj, Ŝj−1) = ∥ŷj∥,  j = k + 1, . . . , n.
This shows (4.11.1). Assume now that equality holds in (4.11.1). Then, ∥yj∥ = ∥ŷj∥, for j = k + 1, . . . , n. Since Ŝj−1 ⊂ Sj−1 and ŷj − xj ∈ Ŝj−1 ⊂ Sj−1, it follows that dist(xj, Sj−1) = dist(ŷj, Sj−1) = ∥yj∥. Hence, ∥ŷj∥ = dist(ŷj, Sj−1). As dist(ŷj, Sj−1) = ∥ŷj − PSj−1(ŷj)∥, we deduce that ŷj ⊥ Sj−1, for j = k + 1, . . . , n. In particular, ŷj ⊥ Sk, for j = k + 1, . . . , n. Therefore, ⟨xj, xi⟩ = 0, for j > k and i ≤ k. Clearly, if the last condition holds, then det G(x1, . . . , xn) = det G(x1, . . . , xk) det G(xk+1, . . . , xn). ◻
Note that det G(x1, . . . , xn) has the following geometric meaning. Consider a parallelepiped Π in V spanned by x1, . . . , xn starting from the origin 0. That is, Π is the convex hull spanned by the vectors 0 and ∑_{i∈S} xi for all nonempty subsets S ⊂ [n]. Then, √(det G(x1, . . . , xn)) is the n-volume of Π. The inequality (4.11.1) and equalities (4.11.2) are "obvious" from this geometrical point of view.
Corollary 4.11.9. Let 0 ≤ B = [bij]_{i,j=1}^n ∈ Hn,+. Then
det B ≤ det [bij]_{i,j=1}^k det [bij]_{i,j=k+1}^n, for k = 1, . . . , n − 1.
For a fixed k, equality holds if and only if either the right-hand side of the above inequality is zero or bij = 0, for i = 1, . . . , k and j = k + 1, . . . , n.
Proof. From Corollary 4.11.6, it follows that B = X², for some X ∈ Hn. Let x1, . . . , xn ∈ Cn be the n columns of X = (x1, . . . , xn), and let ⟨x, y⟩ = y∗x. Since X ∈ Hn, we deduce that B = G(x1, . . . , xn). ◻
Theorem 4.11.10. Let V be an n-dimensional IPS and T ∈ S(V). Then the following statements are equivalent:
(a) T≻0.
(b) Let {g1, . . . , gn} be a basis of V. Then, det(⟨T gi, gj⟩)_{i,j=1}^k > 0, k = 1, . . . , n.
Proof. (a) ⇒ (b). According to Proposition 4.11.5, T = S², for some S ∈ S(V) ∩ GL(V). Then, ⟨T gi, gj⟩ = ⟨Sgi, Sgj⟩. Hence, det(⟨T gi, gj⟩)_{i,j=1}^k = det G(Sg1, . . . , Sgk). Since S is invertible and g1, . . . , gk are linearly independent, it follows that Sg1, . . . , Sgk are linearly independent. Theorem 4.11.8 implies that det G(Sg1, . . . , Sgk) > 0, for k = 1, . . . , n.
(b) ⇒ (a). The proof is by induction on n. For n = 1, (a) is obvious. Assume that the statement holds for n = m − 1 and let n = m. Let U := span {g1, . . . , gn−1} and Q := Q(T, U). Then, there exists P ∈ S(U) such that ⟨P x, y⟩ = Q(x, y) = ⟨T x, y⟩, for any x, y ∈ U. By induction P≻0. Corollary 4.10.3 yields that λn−1(T) ≥ λn−1(P) > 0.
Hence, T has at least n − 1 positive eigenvalues. Let e1, . . . , en be given by (4.10.3). Then, det(⟨T ei, ej⟩)_{i,j=1}^n = ∏_{i=1}^n λi(T) > 0. Let A = [apq]_{p,q=1}^n ∈ GL(n, C) be the transformation matrix from the basis {g1, . . . , gn} to {e1, . . . , en}, i.e.,
gi = ∑_{p=1}^n api ep,  i = 1, . . . , n.
It is straightforward to show that
(⟨T gi, gj⟩)_1^n = A⊺ (⟨T ep, eq⟩)_1^n Ā    (4.11.3)
⇒ det(⟨T gi, gj⟩)_1^n = det(⟨T ei, ej⟩)_1^n ∣det A∣² = ∣det A∣² ∏_{i=1}^n λi(T).
Since det(⟨T gi, gj⟩)_1^n > 0 and λ1(T) ≥ ⋯ ≥ λn−1(T) > 0, it follows that λn(T) > 0. ◻
The following result is straightforward; its proof is left as Problem 4.12.2-1. Proposition 4.11.11. Let V be a finite-dimensional IPS over F with the inner product ⟨⋅, ⋅⟩. Assume that T ∈ S(V). Then, T ≻0 if and only if (x, y) ∶= ⟨T x, y⟩ is an inner product on V. Vice versa any inner product (⋅, ⋅) ∶ V × V → R is of the form (x, y) = ⟨T x, y⟩ for a unique self-adjoint positive definite operator T ∈ L(V). Example 4.13. Each B ∈ Hn with B≻0 induces an inner product on Cn : ⟨x, y⟩ = y∗ Bx. Each B ∈ S(n, R) with B≻0 induces an inner product on Rn : ⟨x, y⟩ = yT Bx. Furthermore, any inner product on Cn or Rn is of the above form. In particular, the standard inner products on Cn and Rn are induced by the identity matrix I.
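Proposition 4.11.11, Example 4.13, and the determinant criterion of Theorem 4.11.10 are convenient to experiment with. The following NumPy sketch is an added illustration (not from the original text): it builds a positive definite symmetric matrix, checks that all its leading principal minors are positive, and uses it to define an inner product on Rⁿ.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
X = rng.standard_normal((n, n))
B = X @ X.T + n * np.eye(n)        # positive definite symmetric matrix

# Theorem 4.11.10: all leading principal minors of a positive definite
# matrix are positive.
minors = [np.linalg.det(B[:k, :k]) for k in range(1, n + 1)]
assert all(m > 0 for m in minors)

# Proposition 4.11.11 / Example 4.13: B > 0 induces an inner product
# <x, y> = y^T B x on R^n.
def ip(x, y):
    return y @ B @ x

x, y = rng.standard_normal(n), rng.standard_normal(n)
assert ip(x, x) > 0                     # positivity on nonzero vectors
assert np.isclose(ip(x, y), ip(y, x))   # symmetry, since B is symmetric
```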
4.12 Inequalities for traces
Let V be a finite-dimensional IPS over F. Let T : V → V be a linear operator. Then, tr T is the trace of the representation matrix of T with respect to any orthonormal basis of V. (See Problem 4.12.2-3.)
Theorem 4.12.1. Let V be an n-dimensional IPS over F. Assume that S, T ∈ S(V). Then, tr ST is bounded from below and above by ∑_{i=1}^n λi(S)λn−i+1(T) and ∑_{i=1}^n λi(S)λi(T), respectively. Namely, we have
∑_{i=1}^n λi(S)λn−i+1(T) ≤ tr ST ≤ ∑_{i=1}^n λi(S)λi(T).    (4.12.1)
Equality for the upper bound holds if and only if ST = T S and there exists an orthonormal basis {x1, . . . , xn} of V such that
Sxi = λi(S)xi,  T xi = λi(T)xi,  i = 1, . . . , n.    (4.12.2)
Equality for the lower bound holds if and only if ST = T S and there exists an orthonormal basis {x1, . . . , xn} of V such that
Sxi = λi(S)xi,  T xi = λn−i+1(T)xi,  i = 1, . . . , n.    (4.12.3)
Proof. Let {y1, . . . , yn} be an orthonormal basis of V such that T yi = λi(T)yi, i = 1, . . . , n, and
λ1(T) = ⋯ = λi1(T) > λi1+1(T) = ⋯ = λi2(T) > ⋯ > λik−1+1(T) = ⋯ = λik(T) = λn(T),  1 ≤ i1 < ⋯ < ik = n.
If k = 1, then i1 = n and it follows that T = λ1 I; the theorem is trivial in this case. Assume that k > 1. Then
tr ST = ∑_{i=1}^n λi(T)⟨Syi, yi⟩
= ∑_{i=1}^{n−1} (λi(T) − λi+1(T)) (∑_{l=1}^{i} ⟨Syl, yl⟩) + λn(T) (∑_{l=1}^{n} ⟨Syl, yl⟩)
= ∑_{j=1}^{k−1} (λij(T) − λij+1(T)) ∑_{l=1}^{ij} ⟨Syl, yl⟩ + λn(T) tr S.
Theorem 4.10.8 yields that ∑_{l=1}^{ij} ⟨Syl, yl⟩ ≤ ∑_{l=1}^{ij} λl(S). Substitute these inequalities for j = 1, . . . , k − 1 in the above identity to deduce the upper bound in (4.12.1). Clearly, condition (4.12.2) implies that tr ST is equal to the upper bound in (4.12.1). Assume now that tr ST is equal to the upper bound in (4.12.1). Then, ∑_{l=1}^{ij} ⟨Syl, yl⟩ = ∑_{l=1}^{ij} λl(S) for j = 1, . . . , k − 1. Theorem 4.10.8 yields that span {y1, . . . , yij} is spanned by some ij eigenvectors of S corresponding to the first ij eigenvalues of S, for j = 1, . . . , k − 1. Let {x1, . . . , xi1} be an orthonormal basis of span {y1, . . . , yi1} consisting of eigenvectors of S corresponding to the eigenvalues λ1(S), . . . , λi1(S). Since any 0 ≠ x ∈ span {y1, . . . , yi1} is an eigenvector of T corresponding to the eigenvalue λi1(T), it follows that (4.12.2) holds, for i = 1, . . . , i1. Consider span {y1, . . . , yi2}. The above arguments imply that this subspace contains i2 eigenvectors of S and T corresponding to the first i2 eigenvalues of S and T. Hence, the orthogonal complement U2 of span {x1, . . . , xi1} in span {y1, . . . , yi2} is spanned by xi1+1, . . . , xi2, which are i2 − i1 orthonormal eigenvectors of S corresponding to the eigenvalues λi1+1(S), . . . , λi2(S). Since any nonzero vector in U2 is an eigenvector of T corresponding to the eigenvalue λi2(T), we deduce that (4.12.2) holds for i = 1, . . . , i2. Continuing in the same manner, we obtain (4.12.2). To prove the equality case in the lower bound, consider the equality in the upper bound for tr S(−T). ◻
Corollary 4.12.2. Let V be an n-dimensional IPS over F. Assume that S and T ∈ S(V). Then n
∑_{i=1}^n (λi(S) − λi(T))² ≤ tr(S − T)².
(4.12.4)
i=1
Equality holds if and only if ST = T S and V has an orthonormal basis {x1 , . . . , xn }, satisfying (4.12.2). Proof. Note that n
n
i=1
i=1
∑_{i=1}^n (λi(S) − λi(T))² = tr S² + tr T² − 2 ∑_{i=1}^n λi(S)λi(T).
◻
Corollary 4.12.3. Let S, T ∈ Hn . Then, the inequalities (4.12.1) and (4.12.4) hold. Equality in the upper bound holds if and only if there exists U ∈ Un such that S = U diag λ(S)U ∗ , T = U diag λ(T )U ∗ . Equality in the lower bound of (4.12.1) holds if and only if there exists V ∈ Un such that S = V diag λ(S)V ∗ , −T = V diag λ(−T )V ∗ .
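For concreteness, here is a small NumPy sketch (an added illustration, not from the original text) checking the two bounds of (4.12.1) on random symmetric matrices.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5

def rand_sym(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

S, T = rand_sym(n), rand_sym(n)
lam_S = np.linalg.eigvalsh(S)[::-1]      # decreasing order
lam_T = np.linalg.eigvalsh(T)[::-1]

lower = np.sum(lam_S * lam_T[::-1])      # sum_i lambda_i(S) lambda_{n-i+1}(T)
upper = np.sum(lam_S * lam_T)            # sum_i lambda_i(S) lambda_i(T)
val = np.trace(S @ T)
assert lower - 1e-10 <= val <= upper + 1e-10
```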
4.12.1 Worked-out problems 1. Let V denote the vector space of real n × n matrices. Prove that ⟨A, B⟩ = tr A⊺ B is a positive definite bilinear form on V. Find an orthonormal basis for this form. Solution: First note that ⟨A1 + A2 , B⟩ = tr((A1 + A2 )⊺ B) = tr(A⊺1 B + A⊺2 B) = tr A⊺1 B + tr A⊺2 B = ⟨A1 , B⟩ + ⟨A2 , B⟩ and ⟨cA, B⟩ = tr cA⊺ B = c tr A⊺ B = c⟨A, B⟩, so the form is bilinear in A. The exact same proof establishes bilinearity in B. Now, note that ⟨0, 0⟩ = 0, and n
n
n
n
n
⟨A, A⟩ = tr A⊺ A = ∑(A⊺ A)ii = ∑ ∑ (A⊺ )ij Aji = ∑ ∑ A2ij > 0, i=1
i=1 j=1
i=1 j=1
for A ≠ 0. So the form is positive definite and bilinear. For the next part, let Mij be the matrix with a 1 in the (i, j) entry and 0’s elsewhere. We claim that the Mij ’s for 1 ≤ i, j ≤ n form the desired orthonormal basis; it is clear that these matrices form a basis for V. Then, note that for (a, b) ≠ (x, y) we have n
⊺ ⊺ Mxy )ii ⟨Mab , Mxy ⟩ = tr(Mab Mxy ) = ∑(Mab n
n
n
i=1 n
⊺ = ∑ ∑ (Mab )ij (Mxy )ji = ∑ ∑ δaj δbi δxj δyi = 0, i=1 j=1
i=1 j=1
where δ is the Kronecker delta function and where we have used the fact that a term in the sum is nonzero only when a = x = j and b = y = i, which never occurs. Thus, this basis is orthogonal. Furthermore, we find ⊺ ⟨Mab , Mab ⟩ = tr(Mab Mab ) = tr Mbb = 1, ⊺ since Mab Mab = Mbb . Therefore, the Mij ’s form the desired orthonormal basis.
4.12.2 Problems 1. Prove Proposition 4.11.11. 2. Consider the H¨ older inequality, n
∑ xl yl al ≤
l=1
n
(∑ xpl al ) l=1
1 p
n
(∑ ylq al ) l=1
1 q
,
(4.12.5)
for any x = (x1 , . . . , xn )⊺ , y = (y1 , . . . , yn )⊺ , a = (a1 , . . . , an ) ∈ Rn+ and p, q ∈ (1, ∞) such that p1 + 1q = 1.
(a) Let A ∈ Hn,+ , x ∈ Cn and 0 ≤ i < j < k be three integers. Show that x∗ Aj x ≤ (x∗ Ai x) k−i (x∗ Ak x) k−i . k−j
j−i
(4.12.6)
(Hint: Diagonalize A.) (b) Assume that A = eB , for some B ∈ Hn . Show that (4.12.6) holds for any three real numbers i < j < k. 3. Let V be an n-dimensional IPS over F. (a) Assume that T ∶ V → V is a linear transformation. Show that for any orthonormal basis {x1 , . . . , xn }, n
tr T = ∑⟨T xi , xi ⟩. i=1
Furthermore, if F = C, then tr T is the sum of the n eigenvalues of T . (b) Let S, T ∈ S(V). Show that tr ST = tr T S ∈ R. 4. Let A ∈ Rm×n and rank A = n. Show that A⊺ A is positive definite. 5. Let A ∈ Rn×n be symmetric. Show that eA is symmetric and positive definite. 6. Show that a matrix A ∈ S(n, R) is positive definite if it has LU decomposition and all its pivots are positive. (Hint: Use Problem 4.10.2-24.) 7. Determine whether the following matrices are positive definite: ⎡−2 1 ⎢ ⎢ (a) A = ⎢−1 2 ⎢ ⎢0 1 ⎣ ⎡ 2 −1 ⎢ ⎢ (b) B = ⎢−1 2 ⎢ ⎢0 1 ⎣
0 ⎤⎥ ⎥ −1⎥ ⎥ 2 ⎥⎦ 0⎤⎥ ⎥ 1⎥ ⎥ 2⎥⎦
(Hint: Use Problem 4.10.2-24 and Problem 4.12.2-4.) 8. Let A = [aij ] ∈ S(n, F). Let us consider its all upper left submatrices. a12 Namely, A1 = [a11 ], A2 = [aa11 a22 ], . . . , An = A. Prove that A is positive 21 definite if and only if det Ak > 0, for k ∈ [n]. (This result is known as Sylvester’s criterion of positivity.) 9. Let V be an IPS and x1 , . . . , xn ∈ V. (a) Show that G(x1 , . . . , xn ) ≤ ∥x1 ∥⋯∥xn ∥. Furthermore, if x1 , . . . , xn are nonzero then equality holds if and only if x1 , . . . , xn are orthogonal vectors. (b) Deduce the Hadamard determinant inequality and the equality case from the above inequality. 10. Prove that if A is positive definite, then adj A is also positive definite.
11. Let A be an n × n positive definite matrix. Prove that √ ∞ ∞ ( π)n −x⊺ Ax ⋯∫ e dx1 . . . dxn = √ , ∫ −∞ −∞ det A ´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¸¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶ n
where x = (x1 , . . . , xn ).
4.13 Singular value decomposition Consider A = [aij ] ∈ Rn×m . It can be interpreted as a digital picture seen on a screen. Then aij encodes the color and its strength in the location of (i, j). In many cases m and n are very big, so it is very costly and time-consuming to storage the information, or to transmit it. We know that there is a lot of redundancy in the picture. Is there a way to condense the information to have almost the same picture, when an average person looks at it? The answer is yes, and one way to achieve it is to use the singular value decomposition. Singular value decomposition is based on a theorem which says that a rectangular matrix A can be broken down into the product of three matrices: a unitary matrix U , a diagonal matrix Σ, and the transpose of a unitary matrix V. Let U and V be two finite-dimensional IPS over F, with the inner products ⟨⋅, ⋅⟩U and ⟨⋅, ⋅⟩V , respectively. Let {u1 , . . . , um } and {v1 , . . . , vn } be bases in U and V, respectively. Let T ∶ V → U be a linear operator. In these bases, T is represented by a matrix A = [aij ] ∈ Fm×n as given by m
T vj = ∑ aij ui ,
j = 1, . . . , n.
i=1
Let T ∗ ∶ U∗ = U → V∗ = V. Then, T ∗ T ∶ V → V and T T ∗ ∶ U → U are self-adjoint operators. As ⟨T ∗ T v, v⟩V = ⟨T v, T v⟩V ≥ 0,
⟨T T ∗ u, u⟩U = ⟨T ∗ u, T ∗ u⟩U ≥ 0,
it follows that T ∗ T ⪰0, T T ∗ ⪰0. Assume that T ∗ T ci = λi (T ∗ T )ci , ⟨ci , ck ⟩V = δik , i, k = 1, . . . , n,
(4.13.1)
T T ∗ dj = λj (T T ∗ )dj , ⟨dj , dl ⟩U = δjl , j, l = 1, . . . , m,
(4.13.2)
λ1 (T ∗ T ) ≥ ⋯ ≥ λn (T ∗ T ) ≥ 0, ∗
∗
λ1 (T T ) ≥ ⋯ ≥ λm (T T ) ≥ 0. Proposition 4.13.1. Let U and V be two finite-dimensional IPS over F and T ∶ V → U be a linear transformation. Then, rank T = rank T ∗ = rank T ∗ T = rank T T ∗ = r. Furthermore, the self-adjoint nonnegative definite operators T ∗ T and T T ∗ have exactly r positive eigenvalues, and λi (T ∗ T ) = λi (T T ∗ ) > 0,
i = 1, . . . , rank T.
(4.13.3)
Moreover, for i ∈ [r], T ci and T ∗ di are eigenvectors of T T ∗ and T ∗ T corresponding to the eigenvalue λi (T T ∗ ) = λi (T ∗ T ), respectively. Furthermore, if c1 , . . . , cr ˜ i ∶= T ci , i = 1, . . . , r satisfies (4.13.2), for i = 1, . . . , r. A satisfy (4.13.1), then d ∣∣T ci ∣∣ similar result holds for d1 , . . . , dr .
Proof. Clearly, T x = 0 if and only if ⟨T x, T x⟩ = 0 if and only if T ∗ T x = 0. Hence, rank T ∗ T = rank T = rank T ∗ = rank T T ∗ = r. Thus, T ∗ T and T T ∗ have exactly r positive eigenvalues. Let i ∈ [r]. Then, T ∗ T ci ≠ 0. Hence, T ci ≠ 0. (4.13.1) yields that T T ∗ (T ci ) = λi (T ∗ T )(T ci ). Similarly, T ∗ T (T ∗ di ) = λi (T T ∗ )(T ∗ di ) ≠ 0. Hence, (4.13.3) holds. Assume that ˜ 1, . . . , d ˜ r be defined as above. By the definition, c1 , . . . , cr satisfy (4.13.1). Let d ˜ ∣∣di ∣∣ = 1, i = 1, . . . , r. Assume that 1 ≤ i < j ≤ r. Then, ˜ i, d ˜ j ⟩ = 0. 0 = ⟨ci , cj ⟩ = λi (T ∗ T )⟨ci , cj ⟩ = ⟨T ∗ T ci , cj ⟩ = ⟨T ci , T cj ⟩ ⇒ ⟨d ˜ 1, . . . , d ˜ r } is an orthonormal system. Hence, {d Definition 4.13.2. Let √ σi (T ) = λi (T ∗ T ), for i = 1, . . . , r,
◻
σi (T ) = 0 for i > r, (4.13.4)
⊺
σ(p) (T ) ∶= (σ1 (T ), . . . , σp (T )) ∈
Rp↘ ,
p ∈ N,
where Rp↘ ∶= {x = (x1 , . . . , xp )⊺ ∈ Rp ∶ x1 ≥ x2 ≥ ⋯ ≥ xp }. Then, σi (T ) = σi (T ∗ ), i = 1, . . . , min(m, n) are called the singular values of T and T ∗ , respectively. Note that the singular values are arranged in a decreasing order. The positive singular values are called principal singular values of T and T ∗ , respectively. Note that ∣∣T ci ∣∣2 = ⟨T ci , T ci ⟩ = ⟨T ∗ T ci , ci ⟩ = λi (T ∗ T ) = σi2 ⇒ ∣∣T ci ∣∣ = σi , i = 1, . . . , n,
∣∣T ∗ dj ∣∣2 = ⟨T ∗ dj , T ∗ dj ⟩ = ⟨T T ∗ dj , di ⟩ = λi (T T ∗ ) = σj2 ⇒ ∣∣T dj ∣∣ = σj , j = 1, . . . , m. Let {c1 , . . . cn } be an orthonormal basis of V satisfying (4.13.1). Choose an orthonormal basis {d1 , . . . , dm } as follows: Set di ∶= Tσcii , i = 1, . . . , r. Then, complete the orthonormal set {d1 , . . . , dr } to an orthonormal basis of U. Since span {d1 , . . . , dr } is spanned by all eigenvectors of T T ∗ corresponding to nonzero eigenvalues of T T ∗ , it follows that ker T ∗ = span {dr+1 , . . . , dm }. Hence, (4.13.2) holds. In these orthonormal bases of U and V, the operators T and T ∗ are represented quite simply: T ci = σi (T )di , i = 1, . . . , n,
where di = 0, for i > m, (4.13.5)
T ∗ dj = σj (T )cj , j = 1, . . . , m,
where cj = 0, for j > n.
Let m,n
Σ = [sij ]i,j=1 , sij = 0, for i ≠ j, sii = σi , for i = 1, . . . , min(m.n).
(4.13.6)
In the case m ≠ n, we call Σ a diagonal matrix with the diagonal σ1 , . . . , σmin(m,n) . Then, in the bases [d1 , . . . , dm ] and [c1 , . . . , cn ], T and T ∗ are represented by the matrices Σ and Σ⊺ , respectively.
Lemma 4.14. Assume that
Let U, V, T , Σ, [c1 , . . . , cn ] and [d1 , . . . , dm ] be as above.
(i) T is presented by the matrix A ∈ Fm×n (then T ∗ is presented by A∗ ). (ii) U ∈ U(m) is the unitary matrix representing the change of basis [d1 , . . . , dm ] to [c1 , . . . , cn ]. (iii) V ∈ U(n) is the unitary matrix representing the change of basis [u1 , . . . , um ] to [v1 , . . . , vn ]. (iv) {u1 , . . . , um } is an orthonormal set. (v) {v1 , . . . , vn } is an orthonormal set. Then
A = U ΣV ∗ ∈ Fm×n .
(4.13.7) m
n
Proof. By the definition, T vj = ∑m i=1 aij ui . Let U = [uip ]i,p=1 , V = [vjq ]j,q=1 . Then n
n
m
n
j=1
j=1
i=1
j=1
m
m
i=1
p=1
T cq = ∑ vjq T vj = ∑ vjq ∑ aij ui = ∑ vjq ∑ aij ∑ u ¯ip dp . ∗
Use the first equality of (4.13.5) to deduce that U AV = Σ.
◻
Definition 4.13.3. (4.13.7) is called the singular value decomposition (SVD) of A. Theorem 4.13.4 (singular value decomposition). Let A ∈ Fm×n where F = R or C. Then there exists the singular value decomposition of A of the form A = U ΣV ∗ , where U ∈ U(m), V ∈ U(n), and Σ ∈ Fm×n is a matrix with nonnegative real entries in its diagonal. Proof. It is immediate from Lemma 4.14.
◻
Proposition 4.13.5. For the field F, denote by Rm,n,k (F) ⊂ Fm×n the set of all matrices of rank at most k ∈ [min(m, n)]. Then, A ∈ Rm,n,k (F) if and only if A can be expressed as a sum of at most k matrices of rank 1. Furthermore, Rm,n,k (F) is a variety in Fm×n given by the polynomial equations that each (k + 1) × (k + 1) minor of A is equal to zero. (Variety 1 means a set of points in Fn satisfying a finite number of polynomial equations.) For the proof, see Problem 4.15.2-2. Definition 4.13.6. Let A ∈ Cm×n and assume that A has the SVD given by (4.13.7), where U = [u1 , . . . , um ] and V = [v1 , . . . , vn ]. Denote by Ak ∶= k ∑i=1 σi ui vi∗ ∈ Cm×n , for k = 1, . . . , rank A. For k > rank A, we define Ak ∶= A (= Arank A ). Note that for 1 ≤ k < rank A, the matrix Ak is uniquely defined if and only if σk > σk+1 .
184
Chapter 4. Inner product spaces
Theorem 4.13.7. For the field F and A = [aij ] ∈ Fm×n , the following conditions hold: ¿ Árank A √ √ À ∑ σ (A)2 , (4.13.8) ∣∣A∣∣F ∶= tr A∗ A = tr AA∗ = Á i i=1
∣∣A∣∣2 ∶= min
B∈Rm,n,k (F)
max
x∈Fn ,∣∣x∣∣2 =1
∣∣Ax∣∣2 = σ1 (A),
∣∣A − B∣∣2 = ∣∣A − Ak ∣∣ = σk+1 (A), k = 1, . . . , rank A − 1, ′
(4.13.9) (4.13.10)
′
,n σi (A) ≥ σi ((aip jq )m p=1,q=1 ) ≥ σi+(m−m′ )+(n−n′ ) (A),
(4.13.11) ′
′
m ∈ [m], n ∈ [n], 1 ≤ i1 < ⋯ < im′ ≤ m, 1 ≤ j1 < ⋯ < jn′ ≤ n. Proof. The proof of (4.13.8) is left as Problem 4.15.2-7. We now show the equality in (4.13.9). View A as an operator A ∶ Cn → Cm . From the definition of ∣∣A∣∣2 , it follows that ∣∣A∣∣22 = maxn 0≠x∈R
x∗ A∗ Ax = λ1 (A∗ A) = σ1 (A)2 , x∗ x
which proves (4.13.9). We now prove (4.13.10). In the SVD of A, assume that U = {u1 , . . . , um } and V = {v1 , . . . , vn }. Then, (4.13.7) is equivalent to the following representation of A: r
A = ∑ σi ui vi∗ , u1 , . . . , ur ∈ Rm , v1 , . . . , vr ∈ Rn , u∗i uj = vi∗ vj = δij , i, j = 1, . . . , r, i=1
(4.13.12) where r = rank A. Let B = ∑ki=1 σi ui vi∗ ∈ Rm,n,k . Then, in view of (4.13.9), r
∣∣A − B∣∣2 = ∣∣ ∑ σi ui vi∗ ∣∣2 = σk+1 . k+1
Let B ∈ Rm,n,k . To show (4.13.10), it is enough to show that ∣∣A − B∣∣2 ≥ σk+1 . Let W ∶= {x ∈ Rn ∶ Bx = 0}. Then codim W ≥ k. Furthermore, ∣∣A − B∣∣22 ≥
max
∣∣x∣∣2 =1,x∈W
∣∣(A − B)x∣∣2 =
max
∣∣x∣∣2 =1,x∈W
2 x∗ A∗ Ax ≥ λk+1 (A∗ A) = σk+1 ,
where the last inequality follows from the min-max characterization of λk+1 (A∗ A). m,n′ Let C = [aijq ]i,q=1 . Then, C ∗ C is a principal submatrix of A∗ A of dimension n′ . The interlacing inequalities between the eigenvalues of A∗ A and C ∗ C yield ′ ,n′ ∗ (4.13.11), for m′ = m. Let D = (aip jq )m p,q=1 . Then, DD is a principal submatrix ∗ of CC . Using the interlacing properties of the eigenvalues of CC ∗ and DD∗ , we can deduce (4.13.11). ◻ We now restate the above materials for linear operators.
Definition 4.13.8. Let U and V be finite-dimensional vector spaces over F. For k ∈ Z+ , denote Lk (V, U) ∶= {T ∈ L(V, U) ∶ rank T ≤ k}. Assume furthermore that U and V are IPS. Let T ∈ L(V, U) and assume that the orthonormal bases [d1 , . . . , dm ] and [c1 , . . . , cn ] of U and V, respectively, satisfy (4.13.5). Define T0 ∶= 0 and Tk ∶= T , for integers k ≥ rank T . For k ∈ [rank T − 1], define Tk ∈ L(V, U) by the equality Tk (v) = ∑ki=1 σi (T )⟨v, ci ⟩di , for any v ∈ V. It is straightforward to show that Tk ∈ Lk (V, U) and Tk is unique if and only if σk (T ) > σk+1 (T ) (see Problem 4.15.2-8). Theorem 4.13.7 yields the following corollary: Corollary 4.13.9. Let U and V be finite-dimensional IPS over F and T ∶ V → U be a linear operator. Then ¿ Árank T √ √ À ∑ σ (T )2 , (4.13.13) ∣∣T ∣∣F ∶= tr T ∗ T = tr T T ∗ = Á i i=1
∣∣T ∣∣2 ∶= min
Q∈Lk (V,U)
max
x∈V,∣∣x∣∣2 =1
∣∣T − Q∣∣2 = σk+1 (T ),
∣∣T x∣∣2 = σ1 (T ),
(4.13.14)
k = 1, . . . , rank T − 1.
(4.13.15)
4.14 Majorization Definition 4.14.1. Let Rn↘ ∶= {x = (x1 , . . . , xn )⊺ ∈ Rn ∶
x1 ≥ x2 ≥ ⋯ ≥ xn }.
¯ = (¯ For x = (x1 , . . . , xn )⊺ ∈ Rn let x x1 , . . . , x ¯n )⊺ ∈ Rn↘ be the unique rearrangement of the coordinates of x in a decreasing order. That is, there exists a permutation π on [n] such that xπ(1) ≥ xπ(2) ≥ ⋯ ≥ xπ(n) , x ¯i = xπ(i) , i = 1, . . . , n. Let x = (x1 , . . . , xn )⊺ , y = (y1 , . . . , yn )⊺ ∈ Rn . Then x is weakly majorized by y (or y weakly majorizes x), denoted by x ⪯ y, if k
k
i=1
i=1
¯i ≤ ∑ y¯i , ∑x
k = 1, . . . , n.
(4.14.1)
Also, x is majorized by y (or y majorizes x), denoted by x ≺ y, if x ⪯ y and n n ∑i=1 xi = ∑i=1 yi . Proposition 4.14.2. Let y = (y1 , . . . , yn )⊺ ∈ Rn↘ . Let M (y) ∶= {x ∈ Rn↘ ∶
x ≺ y}.
Then M (y) is a closed convex set. The proof is left as an exercise. Definition 4.14.3. Denote by Rn×n the set of matrices A = [aij ]ni,j=1 where + each entry aij ∈ R, aij ≥ 0. We denote a nonnegative matrix by A ≥ 0. Denote by
186
Chapter 4. Inner product spaces
Ωn ⊂ Rn×n the set of doubly stochastic matrices. Denote by n1 Jn the n × n doubly + is the matrix in stochastic matrix in which every entry equals n1 , i.e., Jn ∈ Rn×n + which every entry is 1. Lemma 4.15. The following properties hold: 1. A ∈ Rn×n is doubly stochastic if and only if A1 = A⊺ 1 = 1, where 1 = + ⊺ (1, . . . , 1) ∈ Rn . 2. Ω1 = {1}. 3. Ωn is a convex set. 4. A, B ∈ Ωn ⇒ AB ∈ Ωn . 5. Pn ⊂ Ωn . 6. Pn is a group with respect to the multiplication of matrices, with In the identity and P −1 = P ⊺ . 7. A ∈ Ωl , B ∈ Ωm → A ⊕ B ∈ Ωl+m . The proof is left as an exercise. Theorem 4.14.4. Ωn is a convex set generated by Pn . i.e., Ωn = conv Pn . Furthermore, Pn is the set of the extreme points of Ωn . That is, A ∈ Rn×n is + doubly stochastic if and only if A = ∑ aP P for some aP ≥ 0, where P ∈ Pn , and ∑ aP = 1.
(4.14.2)
P ∈Pn
P ∈Pn
Furthermore, if A ∈ Pn , then for any decomposition of A of the above form one has the equality aP = 0 for P ≠ A. Proof. In view of properties 3 and 5 of Lemma 4.15 it follows that any A of the form (4.14.2) is doubly stochastic. We now show by induction on n that any A ∈ Ωn is of the form (4.14.2). For n = 1 the result trivially holds. Assume that the result holds for n = m − 1 and n = m. Assume that A = [aij ] ∈ Ωn . Let l(A) be the number of nonzero entries of A. Since each row sum of A is 1 it follows that l(A) ≥ n. Suppose first l(A) ≤ 2n − 1. Then there exists a row i of A that has exactly one nonzero element. Then, it must be 1. Hence there exist i, j ∈ [n] such that aij = 1. Then all other elements (n−1)×(n−1) of A on the row i and column j are zero. Denote by Aij ∈ R+ the matrix obtained from A by deleting the row and column j. Clearly, Aij ∈ Ωn−1 . Use the induction hypothesis on Aij to deduce (4.14.2), where aP = 0 if the entry (i, j) of P is not 1. We now show by induction on l(A) ≥ 2n − 1 that A is of the form (4.14.2). Suppose that any A ∈ Ωn such that l(A) ≤ l − 1, l ≥ 2n is of the form (4.14.2). Assume that l(A) = l. Let S ⊂ [n] × [n] be the set of all indices (i, j) ∈ [n] × [n] where aij > 0. Note that #S = l(A) ≥ 2n. Consider the following system of equations in n2 variables, which are the entries of X = [xij ]ni,j=1 ∈ Rn×n : n
n
j=1
j=1
∑ xij = ∑ xji = 0,
i = 1, . . . , n.
4.14. Majorization
187
Since the sum of all rows of X is equal to the sum of all columns of X we deduce that the above system has at most 2n − 1 linear independent equations. Assume furthermore the conditions xij = 0 for (i, j) ∈/ S. Since we have at least 2n variables, it follows that there exists X ≠ 0n×n satisfying the above conditions. Note that X has a zero entry in the places where A has a zero entry. Furthermore, X has at least one positive and one negative entry. Therefore, there exist b, c > 0 such that A − bX, A + cX ∈ Ωn and l(A − bX), l(A + cX) < l. So A − bX, A + cX b c (A − bX) + b+c (A + cX) we deduce that A is are of the form (4.14.2). As A = b+c of the form (4.14.2). Assume that A ∈ Pn . Consider the decomposition (4.14.2). Suppose that aP > 0. So A − aP P ≥ 0. Therefore, if the (i, j) entry of A is zero then the (i, j) entry of P is zero. Since A is a permutation matrix it follows that the nonzero entries of A and P must coincide. Hence, A = P . ◻ Theorem 4.14.5. Let Ωn,s ⊂ Ωn be the set of symmetric doubly stochastic matrices. Denote Pn,s = {A ∶ A = 21 (P + P ⊺ ) for some P ∈ Pn }.
(4.14.3)
Then A ∈ Pn,s if and only if there exists a permutation matrix P ∈ Pn such that A = P BP ⊺ , where B = diag(B1 , . . . , Bt ) and each Bj is a doubly stochastic symmetric matrix of the following forms: 1. 1 × 1 matrix [1]. 0 2. 2 × 2 matrix [ 1
1 ]. 0
⎡0 ⎢ ⎢1 ⎢ 3. n × n matrix ⎢ 2 ⎢⋮ ⎢1 ⎢ ⎣2
1 2
0 ⋮ 0
0 1 2
⋮ 0
... ... ... ...
0 0 ⋮ 1 2
1⎤ 2⎥
0 ⎥⎥ ⎥. ⋮ ⎥⎥ 0 ⎥⎦
Furthermore, Ωn,s = conv Pn,s and Pn,s = ext Ωn,s . That is, A ∈ Ωn,s if and only if A = ∑ bR R, for some aR ≥ 0, where R ∈ Pn,s , and R∈Pn,s
∑ bR = 1. (4.14.4)
R∈Pn,s
Moreover, if A ∈ Pn,s , then for any decomposition of A of the above form bR = 0 unless R = A. Proof. Assume that A ∈ Ωn,s . As A is doubly stochastic (4.14.2) holds. Clearly, A = A⊺ = ∑P ∈Pn aP P ⊺ . Hence, 1 1 A = (A + A⊺ ) = ∑ aP ( (P + P ⊺ )). 2 2 P ∈Pn This establishes (4.14.4). Let Q ∈ Pn . So Q represents a permutation σ ∶ [n] → [n]. Namely, Q = [qij ]ni,j=1 where qij = δσ(i)j for i, j ∈ [n]. Fix i ∈ [n] and consider the orbit of i under the iterations of σ: σ k (i) for k ∈ N. Then σ decomposes to t-cycles.
188
Chapter 4. Inner product spaces
Namely, we have a decomposition of [n] to a disjoint union ∪j∈[t] Cj . On each Cj = {c1,j , . . . , clj ,j } σ acts as a cyclic permutation c1,j → c2,j → ⋯ → clj −1,j → clj ,j → c1,j Rename the integers in [n] such that each Cj consists of consecutive integers; cp,j = c1,j + p − 1 for p = 1, . . . , lj and 1 ≤ l1 ≤ ⋯ ≤ lt . Equivalently, there exists a permutation P such that P QP −1 = P QP ⊺ has the block diagonal form: diag(Q1 , . . . , Qt ). Qj is [1] if and only if lj = 1. For lj ≥ 2, Qj = [δ(i+1)j ]ni,j=1 , where n + 1 ≡ 1. It now follows that 12 P (Q + Q⊺ )P ⊺ has one of the forms 1,2,3. Suppose that A ∈ Pn,s has a decomposition (4.14.4). Assume that bR > 0. Then A − bR R ≥ 0. If the (i, j) entry of A is zero then the (i, j) entry of R is also zero. Without loss of generality we may assume that A = diag(B1 , . . . , Bt ) and each Bj has one of the forms 1,2,3. So R = diag(R1 , . . . , Rt ). Recall that R = 12 (Q + Q⊺ ) for some Q ∈ Pn . Hence, Q = diag(Q1 , . . . , Qt ), where each Qj is a permutation matrix of corresponding order. Clearly, each row and column of Rj has at most two nonzero entries. If Bj has the form 1 or 2 then Rj = Qj = Bj . Suppose that Bj is of the form 3. So Bj = 21 (Fj + Fj⊺ ) for a corresponding cyclic permutation and Rj = 12 (Qj + Q⊺j ). Then a straightforward argument shows that either Qj = Fj or Qj = Fj⊺ . ◻ Theorem 4.14.6. Let x, y ∈ Rn . Then x ≺ y if and only if there exists A ∈ Ωn such that x = Ay. Proof. Assume first that x = P y, for some P ∈ Pn . Then it is straightforward to see that x ≺ y. Assume that x = Ay for some A ∈ Ωn . Use Theorem 4.14.4 to deduce that x ≺ y. ¯ , y = Q¯ Assume now that x, y ∈ Rn and x ≺ y. Since x = P x y for some ¯ ≺y ¯ . In view of Lemma 4.15 it is enough to show P, Q ∈ Pn , it follows that x ¯ = By ¯ some B ∈ Ωn . We prove this claim by induction on n. For n = 1 the that x ¯≺y ¯ ∈ Rl then x ¯ = By ¯ for some B ∈ Ωl , for all claim is trivial. Assume that if x ¯≺y ¯ . Suppose first that for some 1 ≤ k ≤ n − 1 l ≤ m − 1. Assume that n = m and x we have the equality ∑ki=1 x ¯i = ∑ki=1 y¯i . Let x1 = (¯ x1 , . . . , x ¯k )⊺ , y1 = (¯ y1 , . . . , y¯k )⊺ ∈ Rk ,
x2 = (¯ xk+1 , . . . , x ¯n )⊺ , y2 = (¯ yk+1 , . . . , y¯n )⊺ ∈ Rn−k . Then x1 ≺ y1 , x2 ≺ y2 . Use the induction hypothesis that xi = Bi yi , i = 1, 2 ¯ = (B1 ⊕ B2 )y and B1 ⊕ B2 ∈ Ωn . where B1 ∈ Ωk , B2 ∈ Ωn−k . Hence, x It is left to consider the case where strict inequalities hold in (4.14.1) for k = 1, . . . , n − 1. We now define a finite number of vectors, ¯ = z1 ≻ z2 = z ¯ 2 ≻ ⋯ ≻ zN = z ¯N ≻ x, y where N ≥ 2, such that 1. zi+1 = Bi zi for i = 1, . . . , N − 1. 2. ∑ki=1 x ¯i = ∑ki=1 wi for some k ∈ [n − 1], where zN = w = (w1 , . . . , wn )⊺ . ¯=y ¯ and we have Observe first that we cannot have y¯1 = ⋯ = y¯n . Otherwise, x equalities in (4.14.1) for all k ∈ [n], which contradicts our assumptions. Assume that we defined ¯ = z1 ≻ z2 = z ¯ 2 ≻ ⋯ ≻ zr = z ¯r = (u1 , . . . , un )⊺ ≻ x, y
4.14. Majorization
189
for 1 ≤ r such that ∑ki=1 x ¯i < ∑ki=1 ui for k = 1, . . . , n − 1. Assume that u1 = ⋯ = up > up+1 = ⋯ = up+q , where up+q > up+q+1 if p + q < n. Let C(t) = ((1 − t Jp+q ) ⊕ In−(p+q) ) for t ∈ [0, 1] and define u(t) = C(t)zr . We vary t t)Ip+q + p+q ¯ (t) for all t ∈ [0, 1]. We continuously from t = 0 to t = 1. Note that u(t) = u have two possibilities. First, there exists t0 ∈ (0, 1] such that u(t) ≻ x for all t ∈ [0, t0 ]. Furthermore, for w = u(t0 ) = (w1 , . . . , wn )⊺ we have the equality k ¯i = ∑ki=1 wi for some k ∈ [n − 1]. In that case r = N − 1 and zN = u(t0 ). ∑i=1 x Otherwise, let zr+1 = u(1) = (v1 , . . . , vn )⊺ , where v1 = ⋯ = vp+q > vp+q+1 . Repeat this process for zr+1 and so on until we deduce conditions 1 and 2. So x = BN zN = BN BN −1 zN −1 = BN . . . B1 y. In view of part 4 of Lemma 4.15, we deduce that x = Ay for some A ∈ Ωn . ◻ Combine Theorems 4.14.6 and 4.14.4 to deduce the following: Corollary 4.14.7. Let x, y ∈ Rn . Then x ≺ y if and only if x = ∑ aP P y, for some aP ≥ 0, where ∑ aP = 1.
(4.14.5)
P ∈Pn
P ∈Pn
Furthermore, if x ≺ y and x ≠ P y, for all P ∈ Pn , then in (4.14.5), each aP < 1. Lemma 4.16. Let x, y ∈ Rn . Then x ≺ y if and only if −y ≺ −x. The proof is left as an exercise. Theorem 4.14.8. Let x = (x1 , . . . , xn )⊺ ≺ y = (y1 , . . . , yn )⊺ . Let φ ∶ [¯ yn , y¯1 ] → R be a convex function. Then n
n
i=1
i=1
∑ φ(xi ) ≤ ∑ φ(yi ).
(4.14.6)
If φ is strictly convex on [¯ yn , y¯1 ] and P x ≠ y for all P ∈ Pn then strict inequality holds in (4.14.6). Proof. Lemma 4.16 implies that if x = (x1 , . . . , xn )⊺ ≺ y = (y1 , . . . , yn )⊺ then xi ∈ [¯ yn , y¯1 ] for i = 1, . . . , n. Use Corollary 4.14.7 and the convexity of φ to deduce φ(xi ) ≤ ∑ aP φ((P y)i ), i = 1, . . . , n. P ∈Pn n n ∑i=1 φ(yi ) = ∑i=1 φ((P y)i )
Observe next that for all P ∈ Pn . Sum up the above inequalities to deduce (4.14.6). Assume now that φ is strictly convex and x ≠ P y for all P ∈ Pn . Then Corollary 4.14.7 and the strict convexity of φ imply that at least in 1 the above ith inequality 1 has strict inequality. Hence strict inequality holds in (4.14.6). ◻ Definition 4.14.9. Let V be an n-dimensional IPS, and T ∈ S(V). Define the eigenvalue vector of T to be λ(T ) ∶= (λ1 (T ), . . . , λn (T ))⊺ ∈ Rn↘ . Corollary 4.14.10. Let V be an n-dimensional IPS. Let T ∈ S(V), and Fn = {f1 , . . . fn } ∈ Fr(n, V). Then (⟨T f1 , f1 ⟩, . . . , ⟨T fn , fn ⟩)⊺ ≺ λ(T ).
190
Chapter 4. Inner product spaces
Furthermore, if φ ∶ [λn (T ), λ1 (T )] → R is a convex function, then n
∑ φ(λi (T )) =
i=1
n
max
∑ φ(⟨T fi , fi ⟩).
{f1 ,...fn }∈Fr(n,V) i=1
Finally, if φ is strictly convex, then ∑ni=1 φ(λi (T )) = ∑ni=1 φ(⟨T fi , fi ⟩) if and only if {f1 , . . . , fn } is a set of n orthonormal eigenvectors of T .
4.15 Characterizations of singular values Theorem 4.15.1. For the field F, assume that A ∈ Fm×n . Define 0 H(A) = [ ∗ A
A ] ∈ Hm+n (F). 0
(4.15.1)
Then λi (H(A)) = σi (A), λm+n+1−i (H(A)) = −σi (A), i = 1, . . . , rank A, (4.15.2) λj (H(A)) = 0, j = rank A + 1, . . . , n + m − rank A. View A as an operator A ∶ Fn → Fm . Choose orthonormal bases [d1 , . . . , dm ] and [c1 , . . . , cn ] in Fm and Fn , respectively, satisfying (4.13.5). Then [
0 A∗
A di d ] [ ] = σi (A) [ i ] , 0 ci ci
0 [ ∗ A
A di d ] [ ] = −σi (A) [ i ] , 0 −ci −ci
i = 1, . . . , rank A, ker H(A) = span
(4.15.3)
{(d∗r+1 , 0)∗ , . . . , (d∗m , 0)∗ , (0, c∗r+1 )∗, . . . , (0, c∗n )∗ },
r = rank A.
Proof. It is straightforward to show the equalities (4.15.3). Since all the eigenvectors appearing in (4.15.3) are linearly independent, we deduce (4.15.2). ◻ Corollary 4.15.2. For the field F, assume that A ∈ Fm×n . Let Aˆ ∶= A[α, β] ∈ Fp×q be a submatrix of A, formed by the set of rows and columns α ∈ Qp,m , β ∈ Qq,n , respectively. Then ˆ ≤ σi (A) for i = 1, . . . . σi (A)
(4.15.4)
ˆ = σi (A), i = 1, . . . , l hold if and only if there For l ∈ [rank A], the equalities σi (A) exist two orthonormal systems of l right and left singular vectors c1 , . . . , cl ∈ Fn , d1 , . . . , dl ∈ Fn satisfying (4.15.3), for i = 1, . . . , l such that the nonzero coordinate vectors c1 , . . . , cl and d1 , . . . , dl are located at the indices β and α, respectively. (See Problem 4.15.2-10.) Corollary 4.15.3. Let V and U be two IPS over F. Assume that W is a subspace of V, T ∈ L(V, U) and denote by Tˆ ∈ L(W, U) the restriction of T to W. Then, σi (Tˆ) ≤ σi (T ), for any i ∈ N. Furthermore, σi (Tˆ) = σi (T ), for i = 1, . . . , l ≤ rank T if and only if U contains a subspace spanned by the first l right singular vectors of T . (See Problem 4.15.2-11.)
4.15. Characterizations of singular values
191
We now translate Theorem 4.15.1 to the operator setting. Lemma 4.17. Let U and V be two finite-dimensional IPS spaces with the inner products ⟨⋅, ⋅⟩U , ⟨⋅, ⋅⟩V , respectively. Define W ∶= V ⊕ U as the induced IPS by ⟨(y, x), (v, u)⟩W ∶= ⟨y, v⟩V + ⟨x, u⟩U . Let T ∶ V → U be a linear operator and T ∗ ∶ U → V be the adjoint of T . Define the operator Tˆ ∶ W → W, Tˆ(y, x) ∶= (T ∗ x, T y). (4.15.5) 2 ∗ ∗ Then, Tˆ is a self-adjoint operator and Tˆ = T T ⊕ T T . Hence, the spectrum of Tˆ is symmetric with respect to the origin and Tˆ has exactly 2rank T nonzero eigenvalues. More precisely, if dim U = m, dim V = n, then λi (Tˆ) = −λm+n−i+1 (Tˆ) = σi (T ), for i = 1, . . . , rank T, λj (Tˆ) = 0, for j = rank T + 1, . . . , n + m − rank T.
(4.15.6)
Let {d1 , . . . , dmin(m,n) } ∈ Fr(min(m, n), U) and {c1 , . . . , cmin(m,n) } ∈ Fr(min (m, n), V) be the set of vectors satisfying (4.13.5). Define 1 1 zi ∶= √ (ci , di ), zm+n−i+1 ∶= √ (ci , −di ), i = 1, . . . , min(m, n). 2 2
(4.15.7)
Then {z1 , zm+n , . . . , zmin(m,n) , zm+n−min(m,n)+1 } ∈ Fr(2 min(m, n), W). Furthermore, Tˆzi = σi (T )zi and Tˆzm+n−i+1 = −σi (T )zm+n−i+1 , for i = 1, . . . , min(m, n). The proof is left as Problem 4.15.2-12. Theorem 4.15.4. Let U and V be finite-dimensional IPS over C, with dim U = m, dim V = n and T ∶ V → U be a linear operator. Then, for each k ∈ [min(m, n)], we have k
∑ σi (T ) =
i=1
=
k
∑ R⟨T gi , fi ⟩U
max
{f1 ,...,fk }∈Fr(k,U),{g1 ,...,gk }∈Fr(k,V) i=1
(4.15.8)
k
max
∑ ∣⟨T gi , fi ⟩U ∣.
{f1 ,...,fk }∈Fr(k,U),{g1 ,...,gk }∈Fr(k,V) i=1
Here, R stands for the real part of a complex number. Furthermore, ∑ki=1 σi (T ) = k ∑i=1 R⟨T gi , fi ⟩U , for some two k-orthonormal frames Fk = {f1 , . . . , fk }, Gk = {g1 , . . . , gk } if and only span {(g1 , f1 ), . . . , (gk , fk )} is spanned by k eigenvectors of Tˆ corresponding to the first k eigenvalues of Tˆ. Proof. Assume that {f1 , . . . , fk } ∈ Fr(k, U) and {g1 , . . . , gk } ∈ Fr(k, V). Let wi ∶= √12 (gi , fi ), i = 1, . . . , k. Then, {w1 , . . . , wk } ∈ Fr(k, W). A straightforward calculation shows ∑ki=1 ⟨Tˆwi , wi ⟩W = ∑ki=1 R⟨T gi , fi ⟩U . The maximal characterization of ∑ki=1 λi (Tˆ), Theorem 4.10.8, and (4.15.6) yield the inequality k k ∑i=1 σi (Tˆ) ≥ ∑i=1 R⟨T gi , fi ⟩U for k ∈ [min(m, n)]. Let c1 , . . . , cmin(m,n) and d1 , . . . , dmin(m,n) satisfy (4.13.5). Then, Lemma 4.17 yields that ∑ki=1 σi (Tˆ) = k ∑i=1 R⟨T ci , di ⟩U , for k ∈ [min(m, n)]. This proves the first equality of (4.15.8). The second equality of (4.15.8) is straightforward. (See Problem 4.15.2-13.)
192
Chapter 4. Inner product spaces
Assume now that ∑ki=1 σi (T ) = ∑ki=1 R⟨T gi , fi ⟩U , for some two k-orthonormal frames Fk = {f1 , . . . , fk }, Gk = {g1 , . . . , gk }. Define w1 , . . . , wk as above. The above arguments yield that ∑ki=1 ⟨Tˆwi , wi ⟩W = ∑ki=1 λi (Tˆ). Theorem 4.10.8 implies that span {(g1 , f1 ), . . . , (gk , fk )} is spanned by k eigenvectors of Tˆ corresponding to the first k eigenvalues of Tˆ. Conversely, assume that {f1 , . . . , fk } ∈ Fr(k, U), {g1 , . . . , gk } ∈ Fr(k, V) and span {(g1 , f1 ), . . . , (gk , fk )} is spanned by k eigenvectors of Tˆ corresponding to the first k eigenvalues of Tˆ. Define {w1 , . . . , wk } ∈ Fr(W) as above. Then, span {w1 , . . . , wk } contains k linearly independent eigenvectors corresponding to the first k eigenvalues of Tˆ. Theorem 4.10.8 and Lemma 4.17 conclude that σi (T ) = ∑ki=1 ⟨Tˆwi , wi ⟩W = k ◻ ∑i=1 R⟨T gi , fi ⟩U . Theorem 4.15.5. Let U and V be two IPS with dim U = m and dim V = n. Assume that S and T ∶ V → U are two linear operators. Then R tr(S ∗ T ) ≤
min(m,n)
∑
σi (S)σi (T ).
(4.15.9)
i=1
Equality holds if and only if there exist two orthonormal sets {d1 , . . . , dmin(m,n) } ∈ Fr(min(m, n), U) and {c1 , . . . , cmin(m,n) } ∈ Fr(min(m, n), V) such that Sci = σi (S)di , T ci = σi (T )di , S ∗ di = σi (S)ci , T ∗ di = σi (T )ci , i = 1, . . . , min(m, n). (4.15.10) Proof. Let A, B ∈ Cn×m . Then, tr B ∗ A = tr AB ∗ . Hence, 2R tr AB ∗ = tr H(A)H(B). ˆ Tˆ and Lemma 4.17 to Therefore, 2R tr S ∗ T = tr SˆTˆ. Use Theorem 4.12.1, for S, ˆ ˆ deduce (4.15.9). Equality in (4.15.9) holds if and only if tr SˆTˆ = ∑m+n i=1 λi (S)λi (T ). Clearly, the assumptions that {d1 , . . . , dmin(m,n) } ∈ Fr(min(m, n), U), {c1 , . . . , cmin(m,n) } ∈ Fr(min(m, n), V) and the equalities (4.15.10) imply equality in (4.15.9). Assume equality in (4.15.9). Theorem 4.12.1 and the definitions of Sˆ and hatT yield the existence {d1 , . . . , dmin(m,n) } ∈ Fr(min(m, n), U) and {c1 , . . . , cmin(m,n) } ∈ Fr(min(m, n), V), such that (4.15.10) hold. ◻ Theorem 4.15.6. Let U and V be finite-dimensional IPS over F and T ∶ V → U be a linear operator. Then
min
Q∈Lk (V,U)
¿ Árank T À ∑ σ 2 (T ), ∣∣T − Q∣∣F = Á i
k = 1, . . . , rank T − 1.
(4.15.11)
i=k+1
√ T 2 Furthermore, ∣∣T − Q∣∣F = ∑rank i=k+1 σi (T ), for some Q ∈ Lk (V, U), k < rank T , if and only if Q = Tk , where Tk is defined in Definition 4.13.8.
4.15. Characterizations of singular values
193
Proof. Use Theorem 4.15.5 to deduce that for any Q ∈ L(V, U), one has ∣∣T − Q∣∣2F = tr T ∗ T − 2R tr Q∗ T + tr Q∗ Q ≥ rank T
k
i=1
i=1
k
2 2 ∑ σi (T ) − 2 ∑ σi (T )σi (Q) + ∑ σi (Q) = i=1
k
rank T
rank T
i=1
i=k+1
i=k+1
2 2 2 ∑(σi (T ) − σi (Q)) + ∑ σi (T ) ≥ ∑ σi (T ).
T 2 Clearly, ∣∣T − Tk ∣∣2F = ∑rank i=k+1 σi (T ). Hence, (4.15.11) holds. Vice versa, if Q ∈ T 2 Lk (V, U) and ∣∣T − Q∣∣2F = ∑rank i=k+1 σi (T ), then the equality case in Theorem 4.15.5 implies that Q = Tk . ◻
Corollary 4.15.7. For the field F, let A ∈ Fm×n . Then ¿ Árank A À ∑ σ 2 (A), k = 1, . . . , rank A − 1. min ∣∣A − B∣∣ = Á B∈Rm,n,k (F)
F
i
(4.15.12)
i=k+1
√ A 2 Furthermore, ∣∣A − B∣∣F = ∑rank i=k+1 σi (A), for some B ∈ Rm,n,k (F), k < rank A, if and only if B = Ak , where Ak is defined in Definition 4.13.6. Theorem 4.15.8. For the matrix A ∈ Fm×n , we have j
min
k+j
∑ σi (A − B) = ∑ σi (A),
B∈Rm,n,k (F) i=1
i=k+1
(4.15.13)
j = 1, . . . , min(m, n) − k, k = 1, . . . , min(m, n) − 1. Proof. Clearly, for B = Ak , we have the equality ∑ji=1 σi (A − B) = ∑k+j i=k+1 σi (A). Let B ∈ Rm,n,k (F). Assume that X ∈ Gr(k, Cn ) is a subspace that contains the columns of B. Let W = {(0⊺ , x⊺ )⊺ ∈ Fm+n , x ∈ X}. Observe that for any z ∈ W⊥ , one has the equality z∗ H((A − B))z = z∗ H(A)z. Combine Theorems 4.10.9 and 4.15.1 to deduce ∑ji=1 σi (B − A) ≥ ∑k+j ◻ i=k+1 σi (A). Theorem 4.15.9. Let V be an n-dimensional IPS over C and T ∶ V → V be a linear operator. Assume the n eigenvalues of T , λ1 (T ), . . . , λn (T ), are arranged in the order ∣λ1 (T )∣ ≥ ⋯ ≥ ∣λn (T )∣. Let λa (T ) ∶= (∣λ1 (T )∣, . . . , ∣λn (T )∣), σ(T ) ∶= (σ1 (T ), . . . , σn (T )). Then, λa (T ) ⪯ σ(T ). That is, k
k
i=1
i=1
∑ ∣λi (T )∣ ≤ ∑ σi (T ),
i = 1, . . . , n.
(4.15.14)
Furthermore, ∑ki=1 ∣λi (T )∣ = ∑ki=1 σi (T ), for some k ∈ [n] if and only if the following conditions are satisfied. There exists an orthonormal basis {x1 , . . . , xn } of V such that 1. T xi = λi (T )xi , T ∗ xi = λi (T )xi , for i = 1, . . . , k. 2. Denote by S ∶ U → U the restriction of T to the invariant subspace U = span {xk+1 , . . . , xn }. Then, ∣∣S∣∣2 ≤ ∣λk (T )∣.
194
Chapter 4. Inner product spaces
Proof. Use Theorem 4.8.15 to choose an orthonormal basis {g1 , . . . , gn } of V such that T is represented by an upper diagonal matrix A = [aij ] ∈ Cn×n such that aii = λi (T ), i = 1, . . . , n. Let i ∈ C, ∣i ∣ = 1 such that ¯i λi (T ) = ∣λi (T )∣, for i = 1, . . . , n. Let S ∈ L(V) be presented in the basis {g1 , . . . , gn } by a diagonal matrix diag(1 , . . . , k , 0, . . . , 0). Clearly, σi (S) = 1, for i = 1, . . . , k and σi (S) = 0, for i = k + 1, . . . , n. Furthermore, R tr S ∗ C = ∑ki=1 ∣λi (T )∣. Hence, Theorem 4.15.5 yields (4.15.14). Assume now that ∑ki=1 ∣λi (T )∣ = ∑ki=1 σi (T ). Hence, the equality sign holds in (4.15.9). Hence, there exist two orthonormal bases {c1 , . . . , cn } and {d1 , . . . , dn } in V such that (4.15.10) holds. It easily follows that {c1 , . . . , ck } and {d1 , . . . , dk } are orthonormal bases of W ∶= span {g1 , . . . , gk }. Hence, W is an invariant subspace of T and T ∗ . Then, A = A1 ⊕ A2 , i.e., A is a block diagonal matrix. Thus, k n A1 = [aij ]i,j=1 ∈ Ck×k , A2 = [aij ]i,j=k+1 ∈ C(n−k)×(n−k) represent the restriction of ⊥ T to W, U ∶= W , denoted by T1 and T2 , respectively. Hence, σi (T1 ) = σi (T ), for i = 1, . . . , k. Note that the restriction of S to W, denoted by S1 , is given by the diagonal matrix D1 ∶= diag(1 , . . . , k ) ∈ U(k). Next, (4.15.10) yields that S1−1 T1 ci = σi (T )ci , for i = 1, . . . , k, i.e., σ1 (T ), . . . , σk (T ) are the eigenvalues of S1−1 T1 . Clearly, S1−1 T1 is presented in the basis [g1 , . . . , gk ] by the matrix D1−1 A1 , which is a diagonal matrix with ∣λ1 (T )∣, . . . , ∣λk (T )∣ on the main diagonal. That is, S1−1 T1 has eigenvalues ∣λ1 (T )∣, . . . , ∣λk (T )∣. Therefore, σi (T ) = ∣λi (T )∣, for i = 1, . . . , k. Theorem 4.13.7 yields that k
k
k
k
i,j=1
i=1
i=1
i=1
tr A∗1 A1 = ∑ ∣aij ∣2 = ∑ σi2 (A1 ) = ∑ σi2 (T1 ) = ∑ ∣λi (T )∣2 . As λ1 (T ), . . . , λk (T ) are the diagonal elements of A1 , it follows from the above equality that A1 is a diagonal matrix. Hence, we can choose xi = gi , for i = 1, . . . , n to obtain part 1 of the equality case. Let T x = λx, where ∣∣x∣∣ = 1 and ρ(T ) = ∣λ∣. Recall that ∣∣T ∣∣2 = σ1 (T ), where σ1 (T )2 = λ1 (T ∗ T ) is the maximal eigenvalue of the self-adjoint operator T ∗ T . The maximum characterization of λ1 (T ∗ T ) yields that ∣λ∣2 = ⟨T x, T x⟩ = ⟨T ∗ T x, x⟩ ≤ λ1 (T ∗ T ) = ∣∣T ∣∣22 . Hence, ρ(T ) ≤ ∣∣T ∣∣2 . Assume now that ρ(T ) = ∣∣T ∣∣2 . If ρ(T ) = 0, then ∣∣T ∣∣2 = 0. This means T = 0 and the theorem holds trivially in this case. Assume that ρ(T ) > 0. Hence, the eigenvector x1 ∶= x is also the eigenvector of T ∗ T corresponding to ¯ λ1 (T ∗ T ) = ∣λ∣2 . Hence, ∣λ∣2 x = T ∗ T x = T ∗ (λx), which implies that T ∗ x = λx. ⊥ Let U = span (x) be the orthogonal complement of span (x). Since T span (x) = span (x), it follows that T ∗ U ⊆ U. Similarly, since T ∗ span (x) = span (x), then T U ⊆ U. Thus, V = span (x) ⊕ U and span (x) and U are invariant subspaces of T and T ∗ . Hence, span (x) and U are invariant subspaces of T ∗ T and T T ∗ . Let T1 be the restriction of T to U. Then, T1∗ T1 is the restriction of T ∗ T . Therefore, ∣∣T1 ∣∣22 ≥ λ1 (T ∗ T ) = ∣∣T ∣∣22 . The above result implies the equality ρ(T ) = ∣∣T ∣∣2 . ◻ Corollary 4.15.10. Let U be an n-dimensional IPS over C and T ∶ U → U be a linear operator. Denote by ∣λ(T )∣ = (∣λ1 (T )∣, . . . , ∣λn (T )∣)T the absolute eigenvalues of T (counting with their multiplicities), arranged in decreasing order. Then, ∣λ(T )∣ = (σ1 (T ), . . . , σn (T ))⊺ if and only if T is a normal operator.
4.15. Characterizations of singular values
195
4.15.1 Worked-out problems 1. Let A ∈ Cn×n and σ1 (A) ≥ ⋯ ≥ σn (A) ≥ 0 be the singular values of A. Let λ1 (A), . . . , λn (A) be the eigenvalues of A listed with their multiplicities and ∣λ1 (A)∣ ≥ ⋯ ≥ ∣λn (A)∣. (a) Show that ∣λ1 (A)∣ ≤ σ1 (A). (b) What is a necessary and sufficient condition for the equality ∣λ1 (A)∣ = σ1 (A)? (c) Is it true that σn (A) ≤ ∣λn (A)∣? (d) Show that ∑ni=1 σi (A)2 ≥ ∑ni=1 ∣λi (A)∣2 . Solution: (a) Let Ax = λ1 (A)x, for ∥x∥ = 1. Then, ∥Ax∥ = ∣λ1 (A)∣∥x∥ = ∣λ1 (A)∣ ≤ max∥y∥=1 (∥Ay∥) = σ1 (A). (b) Equality holds if and only if A∗ Ax = λ1 (A∗ A)x, i.e., A∗ x = λ1 x and ∣λ1 (A)∣ = σ1 (A). (c) Yes. We have σn (A)2 = min y∗ A∗ Ay. Now, Ay = λn (A)y, ∥y∥ = 1. ∥y∥=1
Hence, σn (A)2 ≤ ∣λn (A)∣2 .
(d) Let B = U AU ∗ , where U is unitary and B is upper-triangular. If ⎡λ1 ⎢ ⎢0 ⎢ ⎢ B=⎢0 ⎢ ⎢ ⋮ ⎢ ⎢0 ⎣
b12 λ2 0 ⋮ ...
... b23 ⋱
...
0
b1n ⎤⎥ b2n ⎥⎥ ⎥ ⋮ ⎥ = [bij ], ⎥ ⋮ ⎥⎥ λn ⎥⎦
then tr BB ∗ = ∑ni,j=0 ∣bij ∣2 = ∑ni=1 ∣λi (B)∣2 + ∑ni=2 ∑nj=i+1 ∣bij ∣2 = n n n ∑i=1 σi (B)2 . Then, ∑i=1 ∣λi (B)∣2 ≤ ∑i=1 σi (B)2 . 2. Find the SVD of A = [10
1 0
0 1]
∈ R2×3 .
Solution: First, we compute
⎡1 1 0⎤ ⎢ ⎥ ⎢ ⎥ A A = ⎢1 1 0⎥ . ⎢ ⎥ ⎢0 0 1⎥ ⎣ ⎦ The eigenvalues of A⊺ A are λ1 = 2, λ2 = 1, and λ3 = 0, with corresponding eigenvectors ⎡1⎤ ⎡0⎤ ⎡−1⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢1⎥ , ⎢0⎥ , ⎢ 1 ⎥. ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢0⎥ ⎢1⎥ ⎢0⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ These vectors are orthogonal. The normalized form of them is as follows: ⊺
⎡ √1 ⎤ ⎢ 2⎥ ⎢ ⎥ v1 = ⎢ √12 ⎥ , ⎢ ⎥ ⎢ 0 ⎥ ⎣ ⎦
⎡0⎤ ⎢ ⎥ ⎢ ⎥ v2 = ⎢0⎥ , ⎢ ⎥ ⎢1⎥ ⎣ ⎦
⎡− √1 ⎤ ⎢ 2⎥ ⎢ ⎥ v3 = ⎢ √12 ⎥ . ⎢ ⎥ ⎢ 0 ⎥ ⎣ ⎦
196
Chapter 4. Inner product spaces
The singular values of A are σ1 = ⎡ √1 ⎢ 2 ⎢ V = ⎢ √12 ⎢ ⎢ 0 ⎣
√
0 − √12 ⎤⎥ ⎥ 0 √12 ⎥ ⎥ 1 0 ⎥⎦
2, σ2 = 1, and σ3 = 0. Therefore, and
√ 2 0 0 ]. Σ=[ 0 1 0
Next we calculate U. We have 1 1 1 Av1 = √ [ u1 = σ1 2 0 u2 =
1 1 1 Av2 = [ σ2 1 0
1 0 1 0
⎡ √1 ⎤ 0 ⎢⎢ 12 ⎥⎥ 1 ]⎢√ ⎥ = [ ], 1 ⎢⎢ 2 ⎥⎥ 0 ⎣ 0 ⎦ ⎡0⎤ 0 0 ⎢⎢ ⎥⎥ ] ⎢0⎥ = [ ] . 1 1 ⎢⎢ ⎥⎥ ⎣1⎦
So we have 1 U =[ 0
0 ]. 1
Note that u1 and u2 form an orthonormal basis for R2 . We get the SVD of A as follows: 1 A=[ 0
1 0
0 1 ]=[ 1 0
⎡ √1 √ 0 2 0 0 ⎢⎢ 2 ][ ]⎢ 0 1 0 1 0 ⎢⎢ √1 ⎣− 2
√1 2
0
√1 2
0⎤⎥ ⎥ 1⎥ = U ΣV ⊺ . ⎥ 0⎥⎦
4.15.2 Problems 1. Let U and V be two finite-dimensional inner product spaces. Assume that T ∈ L(U, V). Show that for any complex number t ∈ C, σi (tT ) = ∣t∣σi (T ), for all i. 2. Prove Proposition 4.13.5. (Use SVD to prove the nontrivial part of the proposition.) 3. Let A ∈ Cm×n and assume that U ∈ Um and V ∈ Vn . Show that σi (U AV ) = σi (A), for all i. 4. Let A ∈ GL(n, C). Show that σ1 (A−1 ) = σn (A)−1 . 5. Let U and V be two IPS of dimensions m and n, respectively. Assume that U = U1 ⊕ U2 , dim U1 = m1 , dim U2 = m2 , U1 ⊥ U2 , V = V1 ⊕ V2 , dim V1 = n1 , dim V2 = n2 , V1 ⊥ V2 . Suppose furthermore that T ∈ L(V, U), T V1 ⊆ U1 and T V2 ⊆ U2 . Let Ti ∈ L(Vi , Ui ) be the restriction of T to Vi , for i = 1, 2. Then, rank T = rank T1 +rank T2 and {σ1 (T ), . . . , σrank T (T )} = {σ1 (T1 ), . . . , σrank T1 (T1 )}∪ {σ1 (T2 ), . . . , σrank T2 (T2 )}. 6. Let the assumptions of Definition 4.13.6 hold. Show that for 1 ≤ k < rank A, Ak is uniquely defined if and only if σk > σk+1 .
4.15. Characterizations of singular values
197
7. Prove the equalities in (4.13.8). 8. Let the assumptions of Definition 4.13.8 hold. Show that for k ∈ [rank T − 1], rank Tk = k and Tk is unique if and only if σk (T ) > σk+1 (T ). 9. Let V be an n-dimensional IPS. Assume that T ∈ L(V) is a normal operator. Let λ1 (T ), . . . , λn (T ) be the eigenvalues of T arranged in the order ∣λ1 (T )∣ ≥ ⋯ ≥ ∣λn (T )∣. Show that σi (T ) = ∣λi (T )∣, for i = 1, . . . , n. 10. Let the assumptions of Corollary 4.15.2 hold. (a) Since rank Aˆ ≤ rank A, show that the inequalities (4.15.4) reduce to ˆ = σi (A) = 0 for i > rank A. σi (A) ˆ is a submatrix of H(A), use the Cauchy interlacing prin(b) Since H(A) ciple to deduce the inequalities (4.15.4) for i = 1, . . . , rank A. Furthermore, if p′ ∶= m − #α, q ′ = n − #β, then the Cauchy interlacing ˆ ≥ σi+p′ +q′ (A) for principle gives the complementary inequalities σi (A) any i ∈ N. ˆ = σi (A), for i = 1, . . . , l ≤ rank A. Compare the (c) Assume that σi (A) ˆ maximal characterization of the sum of the first k eigenvalues of H(A) and H(A) given by Theorem 4.10.8, for k = 1, . . . , l to deduce the last part of Corollary 4.15.2. 11. Prove Corollary 4.15.3 by choosing any orthonormal basis in U, an orthonormal basis in V whose first dim W elements span W, and using the previous problem. 12. Prove Lemma 4.17. 13. Under the assumptions of Theorem 4.15.4, show the equalities: k
max
∑ R⟨T gi , fi ⟩U
max
∑ ∣⟨T gi , fi ⟩U ∣.
{f1 ,...,fk }∈Fr(k,U),{g1 ,...,gk }∈Fr(k,V) i=1
=
k
{f1 ,...,fk }∈Fr(k,U),{g1 ,...,gk }∈Fr(k,V) i=1
14. Let U and V be finite-dimensional IPS. Assume that P and T ∈ L(U, V). min(m,n) Show that R tr(P ∗ T ) ≥ − ∑i=1 σi (S)σi (T ). Equality holds if and only if S = −P and T satisfies the conditions of Theorem 4.15.5. 15. Show that if A, B ∈ Fm×n and B is a submatrix of A, then ∥B∥∞ ≤ ∥A∥∞ . 16. Let U ΣV ∗ be the singular value decomposition of A ∈ Fm×n . (a) Show that ∥A∥2 = σ1 (the largest singular value). (b) If A is invertible, show that ∥A−1 ∥2 =
1 . σn
17. Show that the singular values of a positive definite Hermitian matrix are the same as its eigenvalues.
198
Chapter 4. Inner product spaces
18. Prove that any A ∈ Cn×n admits a polar decomposition of the form A = P Q, where P ∈ Cn×n is a nonnegative definite matrix with the same rank as A and where Q ∈ Cn×n is a unitary matrix. (Hint: Use a singular value decomposition, A = UΣV∗ , and observe that A = (UΣU∗ )(UV∗ ).) 19. Prove that if A is a normal matrix and A = P Q is its polar decomposition, then P Q = QP . (See the previous problem.)
Historical note The singular value decomposition (SVD) used in mathematics [15] and in numerical computations [11] is a very useful and versatile tool. It is used in statistics in principal component analysis [16] and recently became very useful in the analysis of DNA microarrays [1]. The singular value decomposition was originally developed by differential geometers who wished to determine whether a real bilinear form could be made equal to another by independent orthogonal transformations of the two spaces it acts on. Eugenio Beltrami and Camille Jordan discovered independently, in 1873 and 1874, respectively, that the singular values of the bilinear forms, represented as a matrix, form a complete set of invariants for bilinear forms under orthogonal substitutions. James Joseph Sylvester also arrived at the singular value decomposition for real square matrices in 1889, apparently independently of both Beltrami and Jordan. Sylvester called the singular values the canonical multipliers of the matrix A. The fourth mathematician to discover the singular value decomposition independently was Autonne, who arrived at it in 1915 via the polar decomposition. The first proof of the singular value decomposition for rectangular and complex matrices seems to have been by Carl Eckart and Gale Young in 1936; they saw it as a generalization of the principal axis transformation for Hermitian matrices.
4.16 Moore–Penrose generalized inverse The purpose of constructing a generalized inverse is to obtain a matrix that can serve as the inverse in some sense for a wider class of matrices than invertible ones. A generalized inverse exists for an arbitrary matrix A ∈ Cm×n . Definition 4.16.1. Let A ∈ Cm×n . Then, (4.13.12) is called the reduced SVD of A. It can be written as A = Ur Σr Vr∗ ,
r = rank A,
Σr ∶= diag(σ1 (A), . . . , σr (A)) ∈ Sr (R), (4.16.1)
Ur = [u1 , . . . , ur ] ∈ C
m×r
, Vr = [v1 , . . . , vr ] ∈ C
n×r
, Ur∗ Ur
=
Vr∗ Vr
= Ir .
Note that AA∗ ui = σi (A)2 ui , A∗ Avi = σi (A)2 vi , 1 1 vi = A∗ ui , ui = Avi , i = 1, . . . , r. σi (A) σi (A) Then
∗ n×m A ∶= Vr Σ−1 r Ur ∈ C
(4.16.2)
4.16. Moore–Penrose generalized inverse
199
is the Moore–Penrose generalized inverse of A. If A ∈ Rm×n , then we assume that U ∈ Rm×r and V ∈ Rn×r , i.e., U and V are real-valued matrices over the real numbers R. Example 4.18. We find the Moore–Penrose generalized inverse of A = [10 Using Worked-out Problem 4.15.1-2, the SVD of A is as follows: 1 A=[ 0
1 0
0 1 ]=[ 1 0
⎡ √1 √ 0 2 0 0 ⎢⎢ 2 ]⎢ 0 ][ 1 0 1 0 ⎢⎢ √1 ⎣− 2
We form Σ−1 2
⎡ √1 ⎢ 2 ⎢ =⎢ 0 ⎢ ⎢ 0 ⎣
√1 2
0
√1 2
1 0
0 1].
0⎤⎥ ⎥ 1⎥ = U ΣV ⊺ . ⎥ 0⎥⎦
0⎤⎥ ⎥ 1⎥⎥ . 0⎥⎦
Then
A =
∗ V2 Σ−1 2 U2
⎡ √1 ⎢ 2 ⎢ = ⎢ √12 ⎢ ⎢ 0 ⎣
0 − √12 ⎤⎥ ⎡⎢ √1 ⎥⎢ 2 0 − √12 ⎥ ⎢ 0 ⎥⎢ 1 0 ⎥⎦ ⎢⎣ 0
0⎤⎥ ⎥ 1 1⎥⎥ [0 0⎥⎦
⎡1 ⎢2 0 ⎢ ] = ⎢ 12 ⎢ 1 ⎢0 ⎣
0⎤⎥ ⎥ 0⎥ . ⎥ 1⎥⎦
Remark 4.16.2. Assume that A ∈ Cn×n is invertible. Then A = A−1 . Theorem 4.16.3. Let A ∈ Cm×n . Then, the Moore–Penrose generalized inverse A ∈ Cn×m satisfies the following properties: 1. rank A = rank A . 2. A AA = A , AA A = A, A∗ AA = A AA∗ = A∗ . 3. A A and AA are Hermitian nonnegative definite idempotent matrices, i.e., (A A)2 = A A and (AA )2 = AA , having the same rank as A. 4. The least squares solution of Ax = b, i.e., the solution of the system A∗ Ax = A∗ b has a solution y = A b. This solution has the minimal norm ∣∣y∣∣, for all possible solutions of A∗ Ax = A∗ b. 5. If rank A = n, then A = (A∗ A)−1 A∗ . In particular, if A ∈ Cn×n is invertible, then A = A−1 . To prove the above theorem we need the following proposition: Proposition 4.16.4. Let E ∈ Cl×m and G ∈ Cm×n . Then rank EG ≤ min(rank E, rank G). If l = m and E is invertible, then rank EG = rank G. If m = n and G is invertible, then rank EG = rank E. Proof. Let e1 , . . . , em ∈ Cl , g1 , . . . , gn ∈ Cm be the columns of E and G, respectively. Then, rank E = dim span {e1 , . . . , el }. Observe that EG = [Eg1 , . . . , Egn ] ∈ Cl×n . Clearly Egi is a linear combination of the columns of E. Hence, Egi ∈ span {e1 , . . . , el }. Therefore, span {Eg1 , . . . , Egn } ⊆ span {e1 , . . . , el }, which implies that rank EG ≤ rank E. Note that (EG)T = GT E T . Hence rank EG =
200
Chapter 4. Inner product spaces
rank (EG)T ≤ rank GT = rank G. Thus rank EG ≤ min(rank E, rank G). Suppose E is invertible. Then, rank EG ≤ rank G = rank E −1 (EG) ≤ rank EG. It follows that rank EG = rank G. Similarly, rank EG = rank E if G is invertible. ◻ Proof of Theorem 4.16.3. ∗ −1 ∗ 1. Proposition 4.16.4 yields that rank A = rank Vr Σ−1 r Ur ≤ rank Σr Ur ≤ −1 ∗ rank Σr = r = rank A. Since Σr = Vr A Ur , Proposition 4.16.4 implies that rank A ≥ rank Σ−1 r = r. Hence, rank A = rank A . ∗ −1 ∗ ∗ 2. AA = (Ur Σr Vr∗ )(Vr Σ−1 r Ur ) = Ur Σr Σr Ur = Ur Ur . Hence,
AA A = (Ur Ur∗ )(Ur Σr Vr∗ ) = Ur ΣVr∗ = A. Hence, A∗ AA = (Vr Σr Ur∗ )(Ur Ur∗ ) = A∗ . Similarly, A A = Vr Vr∗ and A AA = A , A AA∗ = A∗ . 3. Since AA = Ur Ur∗ , we deduce that (AA )∗ = (Ur Ur∗ )∗ = (Ur∗ )∗ Ur∗ = AA , i.e., AA is Hermitian. Next, (AA )2 = (Ur Ur∗ )2 = (Ur Ur∗ )(Ur Ur∗ ) = (Ur Ur∗ ) = AA , i.e., AA is idempotent. Thus, AA is nonnegative definite. As AA = Ur Ir Ur∗ , the arguments of part 1 yield that rank AA = r. Similar arguments apply to A A = Vr Vr∗ . 4. Since A∗ AA = A∗ , it follows that A∗ A(A b) = A∗ b, i.e., y = A b is a least squares solution. It is left to show that if A∗ Ax = A∗ b, then ∣∣x∣∣ ≥ ∣∣A b∣∣ and equality holds if and only if x = A b. We now consider the system A∗ Ax = A∗ b. To analyze this system, we use the full form of the SVD given in (4.13.7). It is equivalent to (V ΣT U ∗ )(U ΣV ∗ )x = V ΣT U ∗ b. Multiplying by V ∗ , we obtain the system ΣT Σ(V ∗ x) = ΣT (U ∗ b). Let z = (z1 , . . . , zn )T ∶= V ∗ x, c = (c1 , . . . , cm )T ∶= U ∗ b. Note that z∗ z = x∗ V V x = x∗ x, i.e., ∣∣z∣∣ = ∣∣x∣∣. After these substitutions, the least squares system in z1 , . . . , zn variables is given in the form σi (A)2 zi = σi (A)ci , for i = 1, . . . , n. 1 Since σi (A) = 0, for i > r, we obtain that zi = σi (A) ci , for i = 1, . . . , r r n 1 2 2 while zr+1 , . . . , zn are free variables. Thus, ∣∣z∣∣ = ∑i=1 σi (A) 2 + ∑i=r+1 ∣zi ∣ . Hence, the least squares solution with minimal length ∣∣z∣∣ is the solution with zi = 0, for i = r + 1, . . . , n. This solution corresponds to x = A b. 5. Since rank A∗ A = rank A = n, it follows that A∗ A is an invertible matrix. Hence, the least squares solution is unique and is given by x = (A∗ A)−1 A∗ b. Thus, for each b one has (A∗ A)−1 A∗ b = A b, hence A = (A∗ A)−1 A∗ . If A is an n×n invertible matrix, it follows that (A∗ A)−1 A∗ = A−1 (A∗ )−1 A∗ = A−1 . ◻
4.16.1 Rank-constrained matrix approximations Let A ∈ Cm×n and assume that A = UA ΣA VA∗ is the SVD of A given in (4.13.7). Let UA = [u1 u2 . . . um ], VA = [v1 v2 . . . vn ] be the representations of U, V in
4.16. Moore–Penrose generalized inverse
201
terms of their m, n columns, respectively. Then rank A
PA,L ∶= ∑ ui u∗i ∈ Cm×m , i=1
rank A
PA,R ∶= ∑ vi vi∗ ∈ Cn×n ,
(4.16.3)
i=1
are the orthogonal projections on the range of A and A∗ , respectively. Denote by Ak ∶= ∑ki=1 σi (A)ui vi∗ ∈ Cm×n for k = 1, . . . , rank A. For k > rank A we define Ak ∶= A (= Arank A ). For 1 ≤ k < rank A, the matrix Ak is uniquely defined if and only if σk (A) > σk+1 (A). Theorem 4.16.5. Let A ∈ Cm×n , B ∈ Cm×p , C ∈ Cq×n be given. Then, X = B (PB,L APC,R )k C is a solution to the minimal problem min
X∈R(p,q,k)
∣∣A − BXC∣∣F ,
(4.16.4)
having the minimal ∣∣X∣∣F . This solution is unique if and only if either k ≥ rank PB,L APC,R or 1 ≤ k < rank PB,L APC,R and σk (PB,L APC,R ) > σk+1 (PB,L APC,R ). Proof. Recall that the Frobenius norm is invariant under the multiplication from the left and the right by the corresponding unitary matrices. Hence ∣∣A − ˜ ∶= V ∗ XUC . Clearly, X and X ˜ BXC∣∣F = ∣∣A1 − ΣB Y ΣC ∣∣, where A˜ ∶= UB∗ AVC , X B have the same rank and the same Frobenius norm. Thus, it is enough to consider ˜ B XΣ ˜ C ∣∣F . Let s = rank B, t = rank B. the minimal problem minY X∈R(p,q,k) ∣∣A−Σ ˜ Clearly, if B or C is a zero matrix, then X = 0 is the solution to the minimal problem (4.16.4). In this case, either PB,L or PC,R are zero matrices, and the theorem holds trivially. It is left to consider the case 1 ≤ s, 1 ≤ t. Define B1 ∶= diag(σ1 (B), . . . , σs (B)) ∈ ˜ to 2 × 2 block maCs×s , C1 ∶= diag(σ1 (C), . . . , σt (C)) ∈ Ct×t . Partition A˜ and X 2 2 ˜ ˜ trices A = [Aij ]i,j=1 and X = [Xij ]i,j=1 , where A11 , X11 ∈ Cs×t . (For certain ˜ to less than 2 × 2 block values of s and t, we may have to partition A˜ or X 2 matrices.) Observe next that Z ∶= ΣB Y ΣC = [Zij ]i,j=1 , where Z11 = B1 Y11 C1 and all other blocks Zij are zero matrices. Hence, ∣∣A˜ − Z∣∣2F = ∣∣A11 − Z11 ∣∣2F +
2 2 ∑ ∣∣Aij ∣∣F ≥ ∣∣A11 − (A11 )k ∣∣F +
2 σk+1 (PB,L APC,R ). ◻
202
Chapter 4. Inner product spaces
4.16.2 Generalized singular value decomposition See [8] for more details on this section. Proposition 4.16.6. Let 0 < M ∈ S(m, R), 0 < N ∈ S(n, R) be positive definite symmetric matrices. Let ⟨x, y⟩M ∶= yT M 2 x, ⟨u, v⟩N ∶= vT N 2 u, for x, y ∈ Rm , u, v ∈ RN , be inner products in Rm , Rn , respectively. Let A ∈ Rm×n and view A as a linear operator A ∶ Rn → Rm , x ↦ Ax. Denote by Ac ∶ Rm → Rn the adjoint operator with respect to the inner products ⟨⋅, ⋅⟩M , ⟨⋅, ⋅⟩N . That is, ⟨Au, y⟩M = ⟨u, Ac y⟩N . Then, Ac = N −2 AT M 2 . Proof. Clearly, ⟨Au, y⟩M = yT M 2 Au and ⟨u, Ac y⟩N = (Ac y)T N 2 u. Hence, (Ac )T N 2 = M 2 A. Take the transpose of this identity and divide by N −2 from the left to deduce Ac = N −2 AT M 2 . ◻ Theorem 4.16.7. Let the assumptions of Proposition 4.16.6 hold. Then the generalized singular value decomposition (GSVD) of A is A = U ΣV T , Σ = diag(σ1 , . . . , ) ∈ Rm×n , where σ1 ≥ ⋯σr > 0, σi = 0 for i > r ∶= rank A, U ∈ GL(m, R), V ∈ GL(n, R), U T M 2 U = Im , V T N −2 V = Im .
(4.16.5)
Proof. Identify Rn and Rm , with the inner products ⟨⋅, ⋅⟩N , ⟨⋅, ⋅⟩M , with the inner product spaces V and U, respectively. Identify A, Ac with the linear operators T ∶ V → V, T ∗ ∶ U → V, respectively. Apply Proposition 4.13.1. Then, {v1 , . . . , vn } and {u1 , . . . , um } are orthonormal sets of eigenvectors of Ac A and 2 AAc , respectively, corresponding to the eigenvalues σ12 , . . . , σn2 and σ12 , . . . , σm : N −2 AT M 2 Avi = σi2 vi , vjT N 2 vi = δij , i, j = 1, . . . , n, V = N 2 [v1 . . . vn ], AN −2 AT M 2 ui = σi2 ui , uTj M 2 ui = δij , i, j = 1, . . . , m, U = [u1 . . . um ], 1 ui = Avi , i = 1, . . . , r = rank A. σi (4.16.6) To justify the decomposition A = U ΣV T , choose a vector v ∈ Rn and write it up as v = ∑ni=1 vi vi . Then Av = ∑ni=1 Avi . Since σi = 0 for i > r, it follows that Avi = 0. Also Avi = σi ui , for i = 1, . . . , r. Hence, Av = ∑ri=1 vi σi ui . Compare that with U ΣV T vi = U Σ[v1 . . . vn ]T N 2 vi , which is equal to σi ui if i ≤ r and 0 if i > r. Hence, A = U ΣV T . ◻ Corollary 4.16.8. Let the assumptions and the notations of Theorem 4.16.7 hold. Then for k ∈ [1, r], k
Ak ∶= Uk Σk VkT = ∑ σi ui viT N 2 ,
Uk ∈ Rm×k , Vk ∈ Rn×k ,
i=1
Σk ∶= diag(σ1 , . . . , σk ), UkT M 2 Uk = VkT N −2 Vk = Ik , Uk = [u1 , . . . , uk ], Vk = N 2 [v1 , . . . , vk ]
(4.16.7)
4.16. Moore–Penrose generalized inverse
203
is the best rank k-approximation to A in the Frobenius and the operator norms with respect to the inner products ⟨⋅, ⋅⟩M , ⟨⋅, ⋅⟩N on Rm and Rn , respectively. Theorem 4.16.9. Let A ∈ Rm×n and B ∈ Rl×n . Then there exists a generalized (common) singular value decomposition of A and B, called GSVD, of the form A ∶= Ur Σr (A)VrT ,
B = Wr Σr (B)VrT ,
Σr (A) = diag(σ1 (A), . . . , σr (A)),
Σr (B) = diag(σ1 (B), . . . , σr (B)),
σi (A) + σi (B) = 1, for i = 1, . . . , r, 2
2
UrT Ur = WrT Wr = VrT N −2 Vr = Ir .
r ∶= rank [AT B T ],
(4.16.8)
Let V ⊆ Rn be the subspace spanned by the columns of AT and B T . Then, 0 ≤ N ∈ S(n, R) is any positive definite matrix such that N V = V and N 2 ∣V = AT A + B T B∣V. Furthermore, the GSVD of A and B is obtained as follows. Let P ∶= AT A + B T B. Then rank P = r. Let √ √ P ∶= Qr Ω2r QTr , QTr Qr = Ir , Qr ∶= [q1 . . . qr ], Ωr ∶= diag( λ1 , . . . , λr ), (4.16.9) P qi = λi qi , qTj qi = δij , i, j = 1, . . . , n, λ1 ≥ ⋯ ≥ λr > 0 = λr+1 = ⋯ = λn , be the spectral decomposition of P . Define T T −1 CA ∶= Ω−1 r Qr A AQr Ωr ,
T T −1 r×r CB ∶= Ω−1 . r Qr B BQr Ωr ∈ R
Then, CA + CB = Ir . Let CA = RΣr (A)2 RT , R ∈ Rr×r , RT R = Ir be the spectral decomposition of CA . Then, CB = RΣr (B)2 RT is the spectral decomposition of CB . Furthermore, V = Qr Ωr R. The nonzero orthonormal columns of Ur and Wr corresponding to positive singular values σi (A) and σj (B) are uniquely −1 determined by the equalities Ur Σr (A) = AQr Ω−1 r R and Wr Σr (B) = BQr Ωr R. Other columns of Ur and Wr are any set of orthonormal vectors in Rm and Rl , respectively, which are orthogonal to the previously determined columns of Ur and Wr , respectively. Proof. We first prove the identities (4.16.8). Assume that Qr Ω−1 r R = [v1 , . . . , vr ]. Clearly, range P = span {q1 , . . . , qr } = span {v1 , . . . , vr }, and range (AT ), range ⊥ (B T ) ⊆ range P . Hence ker P = span {u1 , . . . , ur } ⊆ ker A, ker P ⊆ ker B. m×r Let Ur = [u1 . . . ur ] ∈ R . From the equality Ur Σr (A) = AQr Ω−1 r R, we deduce that Avi = σi ui , for i = 1, . . .. The equality T −1 2 Σr (A)UrT Ur Σr (A) = (AQr Ω−1 r R) (AQr Ωr R) = Σr (A)
implies that the columns of Ur corresponding to positive σi (A) form an orthonormal system. Since V = Qr Ωr R we obtain V T Qr Ω−1 r R = Ir . Hence, Ur Σr (A)V T vi = σi (A)ui , for i = 1, . . . , r. Therefore, A = Ur Σr (A)VrT . Similarly, B = Wr Σr (A)VrT . From the definitions of P, Qr , CA , CB it follows that CA + CB = Ir . Hence, Σ1 (A)2 + Σr (B)2 = Ir . Other claims of the theorem follow straightforward. ◻ We now discuss a numerical example that shows the sensitivity of the GSVD of two matrices. We first generate at random two matrices A0 ∈ R8×7 and B0 ∈ R9×7 , where rank A0 = rank B0 = 2 and rank [AT0 B0T ] = 3. These are done as follows. Choose at random x1 , x2 ∈ R8 , y1 , y2 ∈ R9 , z1 , z2 , z3 ∈ R7 . Then
204
Chapter 4. Inner product spaces
A0 = x1 zT1 + x2 zT2 , B0 = y1 zT1 + y2 zT3 . The first three singular values of A0 , B0 are given as follows: 27455.5092631633888, 17374.6830503566089, 3.14050409246786192 × 10−12 , 29977.5429571960522, 19134.3838220483449, 3.52429226420727071 × 10−12 , i.e., the ranks of A0 and B0 are 2 within double-digit precision. The four first singular values of P0 = AT0 A0 + B0T B0 are 1.32179857269680762 × 109 , 6.04366385186753988 × 108 ,
3.94297368116438210 × 108 , 1.34609524647135614 × 10−7 . Again rank P0 = 3 within double-digit precision. The three generalized singular values of A0 and B0 given by Theorem 4.16.9 are φ1 = 1, φ2 = 0.6814262563, φ3 = 3.777588180 × 10−9 , ψ1 = 0, ψ2 = 0.7318867789, ψ3 = 1. So A0 and B0 have one “common” generalized right singular vector v2 with the corresponding singular values φ2 , ψ2 , which are relatively close numbers. The right singular vector v1 is presented only in A0 and the right singular vector v3 is presented only in B0 . We next perturb A0 , B0 by letting A = A0 +X, B = B0 +Y , where X ∈ R8×7 , Y ∈ 9×7 R . The entries of X and Y were chosen each at random. The seven singular values of X and Y are given as follows: (27490, 17450, 233, 130, 119, 70.0, 18.2), (29884, 19183, 250, 187, 137, 102, 19.7). Note that ∣∣X∣∣ ∼ 0.01∣∣A0 ∣∣, ∣∣Y ∣∣ ∼ 0.01∣∣B0 ∣∣. Form the matrices A ∶= A0 + X, B ∶= B0 + Y . These matrices have the full ranks with corresponding singular values rounded off to three significant digits at least: (27490, 17450, 233, 130, 119, 70.0, 18.2), (29884, 19183, 250, 187, 137, 102, 19.7). We now replace A, B by A1 , B1 of rank 2 using the first two singular values and the corresponding singular vectors in the SVD decompositions of A, B. Then two nonzero singular values of A1 , B1 are (27490, 17450), (29883, 19183), rounded to five significant digits. The singular values of the corresponding P1 = AT1 A1 + B1T B1 are up to three significant digits: (1.32×109 , 6.07×108 , 3.96×108 , 1.31×104 , 0.068, 9.88×10−3 , 6.76×10−3 ). (4.16.10) Assume that r˜ = 3, i.e., P has three significant singular values. We now apply Theorem 4.16.9. The three generalized singular values of A1 , B1 are (1.000000000, .6814704276, 0.7582758358 × 10−8 ),
(0., .7318456506, 1.0).
These results match the generalized singular values of A0 , B0 at least up to four ˆ1 , W ˆ 1 be the matrix V , the first two columns of U , the significant digits. Let Vˆ , U last two columns of W , which are computed for A1 , B1 . Then ∣∣V − Vˆ ∣∣ ˆ1 ∣∣ ∼ 0.0093, ∣∣W1 − W ˆ 1 ∣∣ ∼ 0.0098. ∼ 0.0061, ∣∣U1 − U ∣∣V ∣∣
4.16. Moore–Penrose generalized inverse
205
Finally, we discuss the critical issue of choosing correctly the number of significant singular values of noised matrices A, B and the corresponding matrix P = AT A + B T B. Assume that the numerical rank of P1 is 4. That is, in Theorem 4.16.9, assume that r = 4. Then the four generalized singular values of A1 , B1 up to six significant digits are (1, 1, 0, 0), (0, 0, 1, 1)!
4.16.3 Worked-out problems 1. Let V be a vector space and T ∈ L(V). Prove the following statements: (a) If T is idempotent, then so is I − T , where I is the identity operator on V. (b) If T is idempotent, then V = ker T ⊕ ImT . (c) If T is idempotent, its only possible eigenvalues are 0 and 1. (d) If T is idempotent and V is finite-dimensional, then T is diagonalizable. Solution: (a) For any v ∈ V, (I − T )2 v = (I − T )(I − T )v = (I − T )(v − T v) = v − T v − T (v − T v) = v − T v − T v + T 2 v. Using T 2 = T , this reduces to v − T v = (I − T )v. So, (I − T )2 = I − T and I − T is idempotent, too. (b) For any v ∈ V, of course T v ∈ ImT . Since, T (v − T v) = T v − T 2 v = 0, v − T v ∈ ker T . As v = (v − T v) + T v, we have v ∈ ker T + ImT and so V = ker T + ImT . To show that the sum is direct, suppose that v ∈ ker T ∩ ImT . Then, T v = 0 and also v = T w, for some w ∈ V. Then, 0 = T v = T (T w) = T 2 w = T w = v. So ker T ∩ ImT = {0}, and then V = ker T ⊕ ImT . (c) One can do this directly, using the definition of eigenvalues. Supposing that λ is an eigenvalue of T , we have T v = λv, for some nonzero vector v. Then, T 2 v = T (T v) = T (λv) = λT v = λ2 v. On the other hand, T 2 = T and T v = λv. This implies λ2 v = λv. Since v ≠ 0, λ2 = λ and so λ is either 0 or 1. (d) Since V = ker T ⊕ ImT , if we choose a basis for ker T and a basis for ImT , together they form a basis for V. Clearly, each element of the basis for ker T is an eigenvector corresponding to 0. Also, any vector in the basis for ImT is v = T w, for some w ∈ V; as such T v = T 2 w = T w = 1v and so an eigenvector corresponding to 1. We have a basis of V consisting of eigenvectors, so T is diagonalizable.
4.16.4 Problems 1. A matrix P ∈ Rn×n is called an orthogonal projection if P is idempotent and a symmetric matrix. Let V ⊆ Rn be the subspace spanned by the columns of P . Show that for any a ∈ Rn and b ∈ P V the inequality ∣∣a−b∣∣ ≥ ∣∣a−P a∣∣ holds. Furthermore, equality holds if and only if b = P a. That is, P a is the orthogonal projection of a on the column space of P .
206
Chapter 4. Inner product spaces
2. Let A ∈ Rm×n and assume that the SVD of A is given by (4.13.7), where U ∈ O(m, R), V ∈ O(n, R). (a) What is the SVD of AT ? (b) Show that (AT ) = (A )T . (c) Suppose that B ∈ Rl×m . Is it true that (BA) = A B ? Justify! 3. Let A ∈ Cm×n . Show that (a) N (A∗ ) = N (A ). (b) R(A∗ ) = R(A ). 4. Let V be a vector space with V = U ⊕ W. The linear operator ρU⊕W (u + w) = u, where u ∈ U and w ∈ W is called projection onto U along W. Prove that T ∈ L(V) is a projection if and only if it is idempotent.
Chapter 5
Perron–Frobenius theorem
The Perron–Frobenius theorem provides a simple characterization of the maximal positive eigenvalue and the corresponding positive eigenvector. The importance of this theorem lies in the fact the eigenvalue problems on these types of matrices arises in many different fields of mathematics. As an application of the Perron–Frobenius theorem, at the end of this chapter, we will show that it can be used as a tool to give a generalization of Kepler’s theorem.
5.1 Perron–Frobenius theorem
√ Throughout this section, we denote the complex number i by −1 to simplify notations. Denote by Rm×n the set of matrices A = [aij ]m,n + i,j=1 , where each entry aij ∈ R, aij ≥ 0. We denote a nonnegative matrix by A ≥ 0. We say that A is nonzero nonnegative if A ≥ 0 and A ≠ 0. This is denoted by A ≩ 0. We say that A is a positive matrix if all the entries of A are positive, i.e., aij > 0, for all i, j. We denote that by A > 0. Similar notation is used for vectors, i.e., n = 1. From now on, we assume that all matrices are square matrices unless stated otherwise. A nonnegative square matrix A ∈ Rn×n is called irreducible, if (I + A)n−1 > 0. A + nonnegative matrix is called primitive if Ap > 0, for some positive integer p. Theorem 5.1.1 (Perron–Frobenius). Let A ∈ Rn×n + . Denote by ρ(A) the spectral radius of A, i.e., the maximum of the absolute values of the eigenvalues of A. Then, the following conditions hold: 1. ρ(A) is an eigenvalue of A. 2. There exists x ≩ 0 such that Ax = ρ(A)x. 3. Assume that λ is an eigenvalue of A, λ ≠ ρ(A) and ∣λ∣ = ρ(A). Then, λ index (λ) ≤ index (ρ(A)). Furthermore, ζ = ρ(A) is a root of unity of order m at most n, i.e., ζ = 1, for some m with m ≤ n. 4. Suppose that A is primitive. Then, ρ(A) > 0 and ρ(A) is a simple root of the characteristic polynomial, and there exists x > 0 such that Ax = ρ(A)x. Furthermore, if λ is an eigenvalue of A different from ρ(A), then ∣λ∣ < ρ(A). 207
208
Chapter 5. Perron–Frobenius theorem
5. Suppose that A is irreducible. Then, ρ(A) > 0 and ρ(A) is a simple root of the characteristic polynomial and there exists x > 0 such that Ax = ρ(A)x. Let λ0 ∶= ρ(A), λ1 , . . . , λm−1 be all distinct eigenvalues of A satisfying ∣λ∣ = ρ(A). Then, all these eigenvalues are simple roots of the characteristic polynomial of A. Furthermore, λj = ρ(A)ζ j , for j = 0, . . . , m − 1, where 2πi ζ = e m is a primitive mth root of 1. Finally, the characteristic polynomials of ζA and A are equal. See [10] for an analog of Perron–Frobenius for multilinear forms with nonnegative coefficients, and more generally, for polynomial maps with nonnegative coefficients. We first prove this result for symmetric nonnegative matrices. Proposition 5.1.2. Let A = A⊺ ≥ 0 be a symmetric matrix with nonnegative entries. Then, the conditions of Theorem 5.1.1 hold. Proof. Recall that A is diagonalizable by an orthogonal matrix. Hence, the index of each eigenvalue of A is 1. Next, recall that each eigenvalue is given by ⊺ . For x = (x1 , . . . , xn )⊺ , let ∣x∣ ∶= (∣x1 ∣, . . . , ∣xn ∣)⊺ . Note the Rayleigh quotient xx⊺Ax x
A∣x∣ that x⊺ x = ∣x∣⊺ ∣x∣. Also, ∣x⊺ Ax∣ ≤ ∣x∣⊺ A∣x∣. Hence, ∣ xx⊺Ax ∣ ≤ ∣x∣ . Thus, the x ∣x∣⊺ ∣x∣ maximum of the Rayleigh quotient is achieved for some x ≩ 0. This shows that ρ(A) is an eigenvalue of A, and there exists an eigenvector x ≩ 0 corresponding to A. Suppose that A > 0. Then, Ax = ρ(A)x, x ≩ 0. So, Ax > 0. Hence, ρ(A) > 0 and x > 0. Assume that there exists another eigenvector y, which is not collinear with x, corresponding to ρ(A). Then, y can be chosen to be orthogonal to the positive eigenvector x corresponding to ρ(A), i.e., x⊺ y = 0. Hence, y has positive and negative coordinates. Therefore, ⊺
∣
⊺
y⊺ Ay ∣y∣⊺ A∣y∣ ∣< ≤ ρ(A). ⊺ y y ∣y∣⊺ ∣y∣
This gives the contradiction ρ(A) < ρ(A). Therefore, ρ(A) is a simple root of the characteristic polynomial of A. In a similar way, we deduce that each eigenvalue λ of A, different from ρ(A), satisfies ∣λ∣ < ρ(A). Since the eigenvectors of Ap are the same as A, we deduce the same results if A is a nonnegative symmetric matrix which is primitive. Suppose that A is irreducible. Then, (tI + A)n−1 > 0, for any t > 0. Clearly, ρ(tI + A) = ρ(A) + t. Hence, the eigenvector corresponding to ρ(A) is positive. Therefore, ρ(A) > 0. Each eigenvalue of tI + A is of the form λ + t, for some eigenvalue λ of A. Since (tI + A) is primitive for any t > 0, it follows that ∣λ + t∣ < ρ(A) + t. Let t → 0+ to deduce that ∣λ∣ ≤ ρ(A). Since all the eigenvalues of A are real, we can have an equality for some eigenvalue λ ≠ ρ(A) if and only if λ = −ρ(A). This will be the 0 B case if A has the form C = [B ⊺ 0 ], for some B ∈ Rm×l + . It can be shown that if A is a symmetric irreducible matrix such that −ρ(A) is an eigenvalue of A, then A = P CP ⊺ , for some permutation matrix P . We already showed that C and −C have the same eigenvalues, counted with their multiplicities. Hence, −A and A have the same characteristic polynomial. ◻
5.1. Perron–Frobenius theorem
209
Recall the `∞ norm on Cn and the corresponding operator norm on Cn×n : ∥(x1 , . . . , xn )⊺ ∥∞ ∶= max ∣xi ∣, i∈[n]
∥A∥∞ ∶= max ∥Ax∥∞ , A ∈ Cn×n . ∥x∥∞ ≤1
(5.1.1)
Lemma 5.1. Let A = [aij ]i,j∈[n] ∈ Cn×n . Then ∥A∥∞ = max ∑ ∣aij ∣.
(5.1.2)
i∈[n] j∈[n]
Proof. Let a ∶= maxi∈[n] ∑j∈[n] ∣aij ∣. Assume that ∥(x1 , . . . , xn )⊺ ∥ ≤ 1. Then ∣(Ax)i ∣ = ∣ ∑ aij xj ∣ ≤ ∑ ∣aij ∣ ∣xj ∣ ≤ ∑ ∣aij ∣ ≤ a, j∈[n]
j∈[n]
j∈[n]
for each i ∈ [n]. Hence, ∥A∥∞ ≤ a. From the definition of a, it follows that a = ∑j∈[n] ∣akj ∣, for some k ∈ [n]. Let xj be a complex number of modulus 1, i.e., xj = 1, such that ∣akj ∣ = akj xj , for each j ∈ [n]. Let x = (x1 , . . . , xn )⊺ . Clearly, ∥x∥∞ = 1. Furthermore, a = ∑j∈[n] akj xj . Hence, maxi∈[n] ∥Ax∥∞ ≥ a. Therefore, a = ∥A∥∞ . ◻ Definition 5.1.3. Let A = [aij ] ∈ Cn×n . Denote ∣A∣ ∶= [∣aij ∣] ∈ Rn×n + . For x = (x1 , . . . , xn )⊺ ≥ 0, define r(A, x) ∶= min{t ≥ 0 ∶ ∣A∣x ≤ tx}. (Note that r(A, x) may take value ∞.) It is straightforward to show that r(A, x) ∶= max i∈[n]
∑j∈[n] ∣aij ∣xj , for x > 0. xi
(5.1.3)
Lemma 5.2. Let A = [aij ] ∈ Cn×n and 1 ∶= (1, . . . , 1)⊺ ∈ Cn . Then 1. ∥A∥∞ = r(A, 1). 2. Assume that x = (x1 , . . . , xn )⊺ > 0. Then, each eigenvalue λ of A satisfies ∣λ∣ ≤ r(A, x). That is, ρ(A) ≤ r(A, x). In particular, ρ(A) ≤ ∥A∥∞ .
(5.1.4)
Proof. The statement ∥A∥∞ = r(A, 1) follows from the definition of ∥A∥∞ and r(A, 1). We now show (5.1.4). Assume that Az = λz, where z = (z1 , . . . , zn )⊺ ∈ Cn ∖ {0}. Without loss of generality, we may assume that ∣zk ∣ = ∥z∥∞ = 1, for some k ∈ [n]. Then n
n
j=1
j=1
∣λ∣ = ∣λzk ∣ = ∣(Az)k ∣ ≤ ∑ ∣akj ∣ ∣zj ∣ ≤ ∑ ∣akj ∣ ≤ ∥A∥∞ . Assume that x = (x1 , . . . , xn )⊺ > 0. Let D = diag(x1 , . . . , xn ). Define A1 ∶= D−1 AD. A straightforward calculation shows that r(A, x) = r(A1 , 1) = ∥A1 ∥∞ . Clearly, A and A1 are similar. Use (5.1.4) for A1 to deduce that ∣λ∣ ≤ r(A, x). ◻ Lemma 5.3. Let A ∈ Rn×n + . Then, r(A, x) ≥ r(A, (I + A)x).
210
Chapter 5. Perron–Frobenius theorem
Proof. As Ax ≤ r(A, x)x, we deduce A((I + A)x) = (I + A)(Ax) ≤ (I + A)(r(A, x)x) = r(A, x)A((I + A)x). The definition of r(A, (I + A)x) yields the lemma.
◻
The next result is a basic step in the proof of Theorem 5.1.1 and is due to Wielandt, e.g., [9]. Theorem 5.1.4. Let A ∈ Rn×n be an irreducible matrix. Then + ρ(A) = inf r(A, x) > 0. x>0
(5.1.5)
Equality holds if and only if Ax = ρ(A)x. There is a unique positive eigenvector x > 0 corresponding to ρ(A), up to a multiple by a constant. This eigenvector is called the Perron–Frobenius eigenvector. That is, rank (ρ(A)I − A) = n − 1. Any other real eigenvector of A corresponding to λ ≠ ρ(A) has two coordinates with opposite signs. Proof. Since for any a > 0 and x > 0, we have that r(A, ax) = r(A, x), it is enough to assume that x is a probability vector, i.e., ∥x∥1 = ∑ni=1 xi = 1. We know that Πn , the set of all probability vectors in Rn , is a compact set (Problem 1.9.3-7). We now consider the extremal problem r(A) ∶= inf r(A, x). x∈Πn
(5.1.6)
We claim that this infimum is achieved for some y ∈ Πn . As r(A, x) ≥ 0, for x ≥ 0, it follows that there exists a sequence xm ∈ Πn such that limm→∞ r(A, xm ) = r(A). Clearly, r(A, xm ) ≥ r(A). Hence, by considering a subsequence if necessary, we can assume that r(A, xm ) is a nonincreasing sequence, i.e., r(A, xl ) ≥ r(A, xm ), for m > l. Thus, r(A, xl )xm ≥ r(A, xm )xm ≥ Axm . As limm→∞ xm = y, it follows that r(A, xl )y ≥ Ay. Hence, r(A, xl ) ≥ r(A, y) ⇒ r(A) ≥ r(A, y). From the definition of r(A), we deduce that r(A) ≤ r(A, y). Hence, r(A) = r(A, y). Assume first that y is not an eigenvector of A. Let z ∶= r(A)y − Ay. So z ≥ 0 and z ≠ 0. Since A is irreducible, then (I + A)n−1 > 0. Hence, (I + A)n−1 z > 0. This is equivalent to r(A)((I + A)n−1 y) > (I + A)n−1 (Ay) = A((I + A)n−1 y). Let u ∶= (I + A)n−1 y. Then, the above inequality yields that r(A, u) < r(A). Let v = bu ∈ Πn , for a corresponding b > 0. So r(A, v) = r(A, u) < r(A). This contradicts the definition of r(A). Hence, Ay = r(A)y. Clearly, (I + A)n−1 y = (1 + r(A))n−1 y. As y ∈ Πn and (I + A)n−1 > 0, it follows that (I + A)n−1 y > 0. Hence, y > 0. Part 2 of Lemma 5.2 yields that ρ(A) ≤ r(A, y) = r(A). As r(A) is an eigenvalue of A, it follows that r(A) = ρ(A). Since A is irreducible, A ≠ 0. Hence, r(A) = r(A, y) > 0. Combine the definition of r(A) with the fact that Ay = r(A)y to deduce (5.1.5). We next claim that rank (ρ(A)I − A) = n − 1, i.e., the dimension of the eigenspace of A corresponding to A is 1. Assume to the contrary that w ∈ Rn is an eigenvector of A corresponding to ρ(A), which is not collinear with y, i.e.,
5.1. Perron–Frobenius theorem
211
w ≠ ty, for any t ∈ R. Consider w + ty. Since y > 0 for t ≫ 0 and −t ≫ 0, we have that w + ty > 0 and w + ty < 0, respectively. Hence, there exists t0 ∈ R such that u ∶= w + t0 y ≥ 0 and at least one of the coordinates of u equals zero. As w and y are linearly independent, it follows that u ≠ 0. Also, Au = ρ(A)u. Since A is irreducible, we obtain that (1 + ρ(A))n−1 u = (I + A)n−1 u > 0. This contradicts the assumption that u has at least one zero coordinate. Suppose finally that Av = λv, for some λ ≠ ρ(A) and v ≠ 0. Observe that A⊺ ≥ 0 and (I + A⊺ )n−1 = ((I + A)n−1 )⊺ > 0. That is, A⊺ is irreducible. Hence, there exists u > 0 such that A⊺ u = ρ(A⊺ )u. Recall that ρ(A⊺ ) = ρ(A). Consider now u⊺ Av = λu⊺ v = ρ(A)u⊺ v. As ρ(A) ≠ λ, we deduce that u⊺ v = 0. As u > 0, it follows that v has two coordinates with opposite signs. ◻ We now give the full proof of Theorem 5.1.1. We use the following well-known result for the roots of monic polynomials of complex variables [21]. Lemma 5.4 (continuity of the roots of a polynomial). Let p(z) = z n + a1 z n−1 + ⋯ + an be a monic polynomial of degree n with complex coefficients. Assume that p(z) = ∏nj=1 (z − zj (p)). Given > 0, there exists δ() > 0 such that for each coefficient vector b = (b1 , . . . , bn ) satisfying ∑nj=1 ∣bj − aj ∣2 < δ()2 , one can rearrange the roots of q(z) = z n + b1 z n−1 + ⋯ + bn = ∏nj=1 (z − zj (q)) such that ∣zj (q) − zj (p)∣ < , for each j ∈ [n]. Corollary 5.1.5. Assume that Am ∈ Cn×n converges to A ∈ Cn×n , m ∈ N. Then, limm→∞ ρ(Am ) = ρ(A). That is, the function ρ(⋅) ∶ Cn×n → [0, ∞) that assigns to each A its spectral radius is a continuous function. The proof is left as an exercise. Proof of 1. and 2. of Theorem 5.1.1. Assume that A ∈ Rn×n + , i.e., A has nonnegative entries. Let Jn ∈ Rn×n be a matrix all of whose entries are 1. Let + 1 Jn , for m ∈ N. Then, each Am is a positive matrix and limm→∞ Am = Am = A + m A. Corollary 5.1.5 yields that limm→∞ ρ(Am ) = ρ(A). Clearly, each Am is irreducible. Hence by Theorem 5.1.4 there exists a probability vector xm ∈ Πn such that Am xm = ρ(Am )xm . As Πn is compact, there exists a subsequence mk , k ∈ N such that limk→∞ xmk = x ∈ Πn . Consider the equalities Amk xmk = ρ(Amk )xmk . Let k → ∞ to deduce that Ax = ρ(A)x. So ρ(A) is an eigenvector of A with a corresponding probability vector x. ◻ Lemma 5.5. Let B ∈ Cn×n and assume that rank B = n − 1. Then, adj B = uv⊺ ≠ 0, where Bu = B ⊺ v = 0. Suppose furthermore that A ∈ Rn×n and ρ(A) is + a geometrically simple eigenvalue, i.e., rank B = n − 1, where B = ρ(A)In − A. Then, adj B = uv⊺ and u, v ≩ 0 are the eigenvectors of A and A⊺ corresponding to the eigenvalue ρ(A). Proof. Recall that the entries of adj B are all n − 1 minors of B. Since rank B = n − 1, we deduce that adj B ≠ 0. Recall next that B(adj B) = (adj B)B = det B In = 0, as rank B = n − 1. Let adj B = [u1 u2 . . . un ]. As B(adj B) = 0, we deduce that Bui = 0, for i ∈ [n]. Since rank B = n − 1, the dimension of the null space is 1. So the null space of B is spanned by u ≠ 0. Hence, ui = vi u, for some vi , i ∈ [n]. Let v = (v1 , . . . , vn )⊺ . Then, adj B = uv⊺ . Since adj B ≠ 0, it
212
Chapter 5. Perron–Frobenius theorem
follows that v ≠ 0. As 0 = (adj B)B = uv⊺ B = u(B ⊺ v) and u ≠ 0, it follows that B ⊺ v = 0. Assume now that A ∈ Rn×n + . Then, part 1 of Theorem 5.1.1 yields that ρ(A) is an eigenvalue of A. Hence, B = ρ(A)In − A is singular. Suppose that rank B = n − 1. We claim that adj B ≥ 0. We first observe that for t > ρ(A), we have det(tIn −A) = ∏nj=1 (t−λj (A)) > 0. Indeed, if λj (A) is real, then t−λj (A) > 0. ¯ j (A) is also an eigenvalue of If λj (A) is not real, i.e., a complex number, then λ ¯ j (A)) = ∣t−λj (A)∣2 > 0. A. Recall that t > ρ(A) ≥ ∣λj (A)∣. Hence, (t−λj (A))(t− λ So det(tIn − A) > 0. Furthermore, adj (tIn − A) = det(tIn − A)(tIn − A)−1
∞
= det(tIn − A)t−1 (In − t−1 A)−1 = det(tIn − A)t−1 ∑ (t−1 A)j . j=0
As Aj ≥ 0, for each nonnegative integer j, it follows that adj (tIn − A) ≥ 0, for t > ρ(A). Let t ↘ ρ(A) (t converges to ρ(A) from above) to deduce that adj B ≩ 0. As B = uv⊺ , we can assume that u, v ≩ 0. ◻ Lemma 5.6. Let A ∈ Rn×n be an irreducible matrix. Then, ρ(A) is an alge+ braically simple eigenvalue. Proof. We may assume that n > 1. Theorem 5.1.4 implies that ρ(A) is geometrically simple, i.e., null(ρ(A)I − A) = 1. Hence, rank (ρ(A)I − A)=n−1. Lemma 5.5 yields that adj (ρ(A)I − A) = uv⊺ , where Au = ρ(A)u, A⊺ v = ρ(A)v, u, v ≩ 0. Theorem 5.1.4 implies that u, v > 0. Hence, uv⊺ is a positive matrix. In particular, tr uv⊺ = v⊺ u > 0. Since (det(λI − A))′ (λ = ρ(A)) = tr adj (ρ(A)I − A) = v⊺ u > 0, we deduce that ρ(A) is a simple root of the characteristic polynomial of A.
◻
For A ∈ Rn×n and x ≩ 0 denote s(A, x) = max{t ≥ 0 ∶ Ax ≥ tx}. From the + definition of r(A, x), we immediately conclude that r(A, x) ≥ s(A, x). We now give a complementary part of Theorem 5.1.4. Lemma 5.7. Let A ∈ Rn×n be an irreducible matrix. Then + ρ(A) = max s(A, x) > 0. x>0
(5.1.7)
Equality holds if and only if Ax = ρ(A)x. The proof of this lemma is similar to the proof of Theorem 5.1.4, so we will skip it. As usual, denote by S1 ∶= {z ∈ C ∶ ∣z∣ = 1} the unit circle in the complex plane. Lemma 5.8. Let A ∈ Rn×n be irreducible, C ∈ Cn×n . Assume that ∣C∣ ≤ A. Then, + ρ(C) ≤ ρ(A). Equality holds, i.e., there exists λ ∈ spec C, such that λ = ζρ(A), for some ζ ∈ S1 , if and only if there exists a complex diagonal matrix D ∈ Cn×n , whose diagonal entries are equal to 1, such that C = ζDAD−1 . The matrix D is unique up to a multiplication by t ∈ S1 .
5.1. Perron–Frobenius theorem
213
Proof. We can assume that n > 1. Suppose that A = [aij ], C = [cij ]. Theorem 5.1.4 yields that there exists u > 0 such that Au = ρ(A)u. Hence, r(A, u) = ρ(A). Since ∣C∣ ≤ A, it follows that r(C, u) ≤ r(A, u) = ρ(A). Lemma 5.2 yields that ρ(C) ≤ r(C, u). Hence, ρ(C) ≤ ρ(A). Suppose that ρ(C) = ρ(A). Hence, we have equalities ρ(C) = r(C, u) = ρ(A) = r(A, u). So, λ = ζρ(A) is an eigenvalue of C, for some ζ ∈ S1 . Furthermore, for the corresponding eigenvector z of C we have ∣λ∣ ∣z∣ = ∣Cz∣ ≤ ∣C∣ ∣z∣ ≤ A∣z∣. Hence, ρ(A)∣z∣ ≤ A∣z∣. Therefore, s(A, ∣z∣) ≥ ρ(A). Lemma 5.7 yields that s(A, ∣z∣) = ρ(A) and ∣z∣ is the corresponding nonnegative eigenvector. Therefore, ∣Cz∣ = ∣C∣∣z∣ = A∣z∣. Theorem 5.1.4 yields that ∣z∣ = u. Let zi = di ui , ∣di ∣ = 1, for i = 1, . . . , n. The equality ∣Cz∣ = ∣C∣ ∣z∣ = A∣z∣ combined with the triangle inequality and ∣C∣ ≤ A yields first that ∣C∣ = A. Furthermore, for each fixed i, the nonzero complex numbers ci1 z1 , . . . , cin zn have the same argument, i.e., cij = ζi aij d¯j , for j = 1, . . . , n and some complex number ζj , where ∣ζi ∣ = 1. Recall that λzi = (Cz)i . Hence, ζi = ζdi , for i = 1, . . . , n. Thus, C = ζDAD−1 , where D = diag(d1 , . . . , dn ). It is straightforward to see that D is unique up to a multiplication by t, for any t ∈ S1 . Suppose now that for D = diag(d1 , . . . , dn ), where ∣d1 ∣ = ⋯ = ∣dn ∣ = 1 and ∣ζ∣ = 1, we have that C = ζDAD−1 . Then, λi (C) = ζλi (A), for i = 1, . . . , n. So ρ(C) = ρ(A). Furthermore, cij = ζdi cij d¯j , i, j = 1, . . . , n. Then, ∣C∣ = A. ◻ Lemma 5.9. Let ζ1 , . . . , ζh ∈ S1 be h distinct complex numbers that form a multiplicative semigroup, i.e., for any integers i, j ∈ [h], ζi ζj ∈ {ζ1 , √. . . , ζh }. Then, the set {ζ1 , . . . , ζh } is the set (the group) of all h roots of 1: e
2πi −1 h
, i = 1, . . . , h.
Proof. Let ζ ∈ T ∶= {ζ1 , . . . , ζh }. Consider the sequence ζ i , i = 1, . . .. Since ζ i+1 = ζζ i , for i = 1, . . . , and T is a semigroup, it follows that each ζ i is in T . As T is a finite set, we must have two positive integers such that ζ k = ζ l , for k < l. Assume that k and l are the smallest possible positive integers. So ζ p = 1, where p = l − k ≥ 1, and Tp ∶= {ζ, ζ 2 , . . . , ζ p−1 , ζ√p = 1} are all p roots of 1. Here, ζ 2πp1
−1
is called a p-primitive root of 1, i.e., ζ = e p , where p1 is a positive integer less than p. Furthermore, p1 and p are coprime, which is denoted by (p1 , p) = 1. Note that ζ i ∈ T , for any integer i. Next, we choose ζ ∈ T such that ζ is a primitive p-root of 1 of the maximal possible order. We claim that p = h, which is equivalent to the equality T = Tp . Assume to the contrary that Tp ⊊ T . Let η ∈ T /Tp . The previous arguments show that η is a q-primitive root of 1. Therefore, Tq ⊂ T , and Tq ⊊ Tp . Thus, q cannot divide p. Also, the maximality of p yields that q ≤ p. Let r be the greatest common divisor of p and q. So 1 ≤ r < q. Recall that Euclid’s algorithm, which is applied to the division of p by q with a residue, yields that there exist two > p be the least common multiple of integers i, j such that ip + jq = r. Let l ∶= pq r p and q. Observe that ζ ′ = e
2π
√
−1 p
∈ Tp , η ′ = e
ξ ∶= (η ′ )i (ζ ′ )j = e
2π(ip+jq) pq
2π
√ −1 q
∈ Tq . So
√ −1
=e
2π
√ −1 l
∈T.
As ξ is an l-primitive root of 1, we obtain a contradiction to the maximality of p. So p = h and T is the set of all h-roots of unity. ◻
214
Chapter 5. Perron–Frobenius theorem
Theorem 5.1.6. Let A ∈ Rn×n be irreducible and assume that for a positive + integer h ≥ 2, A has h − 1 distinct eigenvalues λ1 , . . . , λh−1 , which are distinct from ρ(A), such that ∣λ1 ∣ = ⋯ = ∣λh−1 ∣ = ρ(A). Then, the following conditions hold: 1. Assume that A is imprimitive, i.e., not primitive. Then, there exist exactly h−1 ≥ 1 distinct eigenvalues λ1 , . . . , λh−1 different from ρ(A) and satisfying ∣λi ∣ = ρ(A). Furthermore, the following conditions hold: (a) λi is an algebraically simple eigenvalue of A, for i = 1, . . . , h − 1. (b) The complex numbers
λi ,i ρ(A) √ 2π
= 1, . . . , h − 1 and 1 are all h roots of
unity, i.e., λi = ρ(A)e , for i = 1, . . . , h − 1. Furthermore, if Azi = λi zi , zi ≠ 0, then ∣zi ∣ = u > 0, the Perron–Frobenius eigenvector u given in Theorem 5.1.4. −1i h
(c) Let ζ be any h-root of 1, i.e., ζ h = 1. Then, the matrix ζA is similar to A. Hence, if λ is an eigenvalue of A, then ζλ is an eigenvalue of A having the same algebraic and geometric multiplicity as λ. (d) There exists a permutation matrix P ∈ Pn such that P ⊺ AP = B has a block h-circulate form: ⎡ 0 B12 0 0 ... 0 0 ⎢ ⎢ 0 0 B23 0 . . . 0 0 ⎢ ⎢ ⋮ ⋮ ⋮ ... ⋮ ⋮ B=⎢ ⋮ ⎢ ⎢ 0 0 0 0 ⋮ 0 B(h−1)h ⎢ ⎢ Bh1 0 0 0 ⋮ 0 0 ⎣ ni ×ni+1 Bi(i+1) ∈ R , i = 1, . . . , h, Bh(h+1) = Bh1 ,
⎤ ⎥ ⎥ ⎥ ⎥ ⎥, ⎥ ⎥ ⎥ ⎥ ⎦
nh+1 = n1 , n1 + ⋯ + nh = n. Furthermore, the diagonal blocks of B h are all irreducible primitive matrices, i.e., Ci ∶= Bi(i+1) . . . B(h−1)h Bh1 . . . B(i−1)i ∈ Rn+ i ×ni , i = 1, . . . , h, (5.1.8) are irreducible and primitive. λi Proof. Assume that ζi ∶= ρ(A) ∈ S1 , for i = 1, . . . , h − 1 and ζh = 1. Apply Lemma 5.8 to C = A and λ = ζi ρ(A) to deduce that A = ζi Di ADi−1 , where Di is a diagonal matrix such that ∣D∣ = I, for i = 1, . . . , h. Hence, if λ is an eigenvalue of A, then ζi λ is an eigenvalue of A, with the same algebraic and geometric multiplicities as λ. In particular, since ρ(A) is an algebraically simple eigenvalue of A, λi = ζi ρ(A) is an algebraically simple eigenvalue of A, for i = 1, . . . , h − 1. This establishes (1a). Let T = {ζ1 , . . . , ζh }. Note that
A = ζi Di ADi−1 = ζi Di (ζj Dj ADj−1 )Di−1 = (ζi ζj )(Di Dj )A(Di Dj )−1 .
(5.1.9)
Therefore, ζi ζj ρ(A) is an eigenvalue of A. Hence, ζi ζj ∈ T , i.e., T is a semigroup. Lemma 5.9 yields that ζ1 , . . . , ζn are all h roots of 1. Note that if Azi = λi zi , zi ≠ 0, then zi = tDi u, for some 0 ≠ t ∈ C, where u > 0 is the Perron–Frobenius vector given in Theorem 5.1.4. This establishes (1b).
5.2. Irreducible matrices
215 √ −1
Let ζ = e h ∈ T . Then, A = ζDAD−1 , where D is a diagonal matrix D = (d1 , . . . , dn ), ∣D∣ = I. Since D can be replaced by d¯1 D, we can assume that d1 = 1. (5.1.9) yields that A = ζ h Dh AD−h = IAI −1 . Lemma 5.8 implies that Dh = diag(dh1 , . . . , dhn ) = tI. Since d1 = 1, it follows that Dh = I. So all the diagonal entries of D are h-roots of unity. Let P ∈ Pn be a permutation matrix such that the diagonal matrix E = P ⊺ DP is of the following block diagonal form: 2π
2πki
√ −1
E = In1 ⊕ µ1 In2 ⊕ ⋯ ⊕ µl−1 Inl , µi = e h , i = 1, . . . , l − 1, 1 ≤ k1 < k2 < ⋯ < kl−1 ≤ h − 1. Note that l ≤ h and equality holds if and only if ki = i. Let µ0 = 1. n ×n Let B = P ⊺ AP . Partition B to a block matrix [Bij ]li=j=1 , where Bij ∈ R+ i j , for i, j = 1, . . . , l. Then, the equality A = ζDAD−1 yields B = ζEBE −1 . The structure of B and E implies the equalities Bij = ζ
µi−1 Bij , µj−1
i, j = 1, . . . , l.
i−1 ≠ 1. Since all the entries of Bij are nonnegative, we obtain that Bij = 0 if ζ µµj−1 Hence, Bii = 0, for i = 1, . . . , l. Since B is irreducible, it follows that not all Bi1 , . . . , Bil are zero matrices for each i = 1, . . . , l. First, start with i = 1. Since µ0 = 1 and j1 ≥ 1, it follows that µj ≠ ζ, for j > 1. Then, B1j = 0 for j = 3, . . . , l. Hence, B12 ≠ 0, which implies that µ1 = ζ, i.e., k1 = 1. Now, let i = 2 and consider j = 1, . . . , l. As ki ∈ {k1 + 1, k1 + 2, . . . , h − 1}, for i > 1, it follows that B2j = 0, for j ≠ 3. Hence, B23 ≠ 0, which yields that k2 = 2. Applying these arguments for i = 3, . . . , l − 1, we deduce that Bij = 0 for j ≠ i + 1, Bi(i+1) ≠ 0, ki = i for i = 1, . . . , l − 1. It is left to consider i = l. Note that
ζµl−1 ζl = j−1 = ζ l−(j−1) , which is different from 1, for j ∈ [`] ∖ {1}. µj−1 ζ Hence, Blj = 0 for j > 1. Since B is irreducible, B11 ≠ 0. So ζ l = 1. As l ≤ h, we deduce that l = h. Hence, B has the block form given in (1d). ◻ Remark 5.1.7. Let D = (V, E) be a digraph on the set of n vertices V , and the set of edges E ⊂ V × V . Then D is represented by A = [aij ] ∈ {0, 1}n×n , where aij = 0 or aij = 1 if there is no edge or there is an edge from the vertex i to the vertex j, respectively. Many properties of digraphs are reflected in the spectrum (the set of eigenvalues) of A, and the corresponding eigenvectors, in particular to the eigenvector corresponding to the nonnegative eigenvalue of the maximal modulus. The Perron–Frobenius theorem for nonnegative matrices is used to study this problem.
5.2 Irreducible matrices Denote by R+ ⊃ Z+ the set of nonnegative real numbers and nonnegative integers, respectively. Let S ⊂ C. By S(n, S) ⊂ S n×n denote the set of all symmetric matrices A = [aij ] with entries in S. Assume that 0 ∈ S. Then, by Sn,0 (S) ⊂ S(n, S) denote the subset of all symmetric matrices with entries in S and zero diagonal. Denote by 1 = (1, . . . , 1)⊺ ∈ Rn the vector of length n whose all coordinates are 1. For any t ∈ R, we let signt = 0 if t = 0 and signt = ∣t∣t if t ≠ 0.
216
Chapter 5. Perron–Frobenius theorem
Let D = (V, E) be a multidigraph. Assume that #V = n and label the vertices of V as 1, . . . , n. We have a bijection φ1 ∶ V → [n]. This bijection induces an isomorphic graph D1 = ([n], E1 ). With D1 we associate the following matrix A(D1 ) = [aij ]ni,j=1 ∈ Zn×n + . Then, aij is the number of directed edges from the vertex i to the vertex j. (If aij = 0, there are no diedges from i to j.) When no confusion arises, we let A(D) ∶= A(D1 ), and we call A(D) the adjacency matrix of D. Note that a different bijection φ2 ∶ V → [n] gives rise to a different A(D2 ), where A(D2 ) = P ⊺ A(D1 )P , for some permutation matrix P ∈ Pn . (See Appendix C for details about multidigraphs.) If D is a simple digraph, then A(D) ∈ {0, 1}n×n . If D is a multidigraph, then aij ∈ Z+ is the number of diedges from i to j. Hence, A(G) ∈ Zn×n + . If G is a multigraph, then A(G) = A(D(G)) ∈ S(n, Z+ ). If G is a simple graph, then A(G) ∈ Sn,0 ({0, 1}). Proposition 5.2.1. Let D = (V, E) be a multidigraph on n vertices. Let A(D) (k) be a representation matrix of D. For an integer k ≥ 1, let A(D)k = [aij ] ∈ Zn×n + . (k)
Then, a_{ij}^{(k)} is the number of walks of length k from the vertex i to the vertex j. In particular, 1⊺A(D)^k 1 and tr A(D)^k are the total number of walks and the total number of closed walks of length k in D, respectively.
Proof. For k = 1 the proposition is obvious. Assume that k > 1. Recall that

a_{ij}^{(k)} = ∑_{i1,...,i_{k−1} ∈ [n]} a_{i i1} a_{i1 i2} ⋯ a_{i_{k−1} j}.   (5.2.1)

The summand a_{i i1} a_{i1 i2} ⋯ a_{i_{k−1} j} gives the number of walks of the form i0 i1 i2 ⋯ i_{k−1} ik, where i0 = i, ik = j. Indeed, if one of the terms in this product is zero, i.e., there is no diedge (ip, i_{p+1}), then the product is zero. Otherwise, each positive integer a_{ip i_{p+1}} counts the number of diedges (ip, i_{p+1}). Hence, a_{i i1} a_{i1 i2} ⋯ a_{i_{k−1} j} is the number of walks of the form i0 i1 i2 ⋯ i_{k−1} ik. The total number of walks from i = i0 to j = ik of length k is the sum given by (5.2.1). The total number of walks in D of length k can be found by ∑_{i,j=1}^n a_{ij}^{(k)} = 1⊺A(D)^k 1. The total number of closed walks in D of length k is ∑_{i=1}^n a_{ii}^{(k)} = tr A(D)^k.
◻
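For readers who want to experiment numerically, the following NumPy sketch (not part of the original development; the digraph is an assumed example) illustrates Proposition 5.2.1: the entries of A(D)^k count walks of length k, and 1⊺A(D)^k 1 and tr A(D)^k count all walks and all closed walks of that length.

```python
# Hedged illustration of Proposition 5.2.1 with an assumed 3-vertex digraph.
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]])          # adjacency matrix of the example digraph

k = 4
Ak = np.linalg.matrix_power(A, k)  # (i, j) entry = walks of length k from i to j

ones = np.ones(3)
total_walks = ones @ Ak @ ones     # 1^T A^k 1
closed_walks = np.trace(Ak)        # tr A^k

print(Ak)
print(total_walks, closed_walks)
```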
With a bipartite graph G = (V1 ∪ V2, E), where #V1 = m, #V2 = n, we associate a representation matrix B(G) = [bij]_{i,j=1}^{m,n} as follows. Let ψ1 ∶ V1 → [m], ψ2 ∶ V2 → [n] be bijections. These bijections induce an isomorphic graph G1 = ([m] ∪ [n], E1). Then, bij is the number of edges connecting i ∈ [m] to j ∈ [n] in G1.
A nonnegative matrix A = [aij]_{i,j=1}^n ∈ R_+^{n×n} induces the following digraph D(A) = ([n], E). The diedge (i, j) is in E if and only if aij > 0. Note that A(D(A)) = [sign aij] ∈ {0, 1}^{n×n}.
Theorem 5.2.2. Let D = ([n], E) be a multidigraph. Then, D is strongly connected if and only if (I + A(D))^{n−1} > 0. In particular, a nonnegative matrix A ∈ R_+^{n×n} is irreducible if and only if (I + A)^{n−1} > 0.
Proof. Apply the Newton binomial theorem for (1 + t)^{n−1} to the matrix (I + A(D))^{n−1}:

(I + A(D))^{n−1} = ∑_{p=0}^{n−1} \binom{n−1}{p} A(D)^p.
Recall that all the binomial coefficients \binom{n−1}{p} are positive for p = 0, . . . , n − 1. Assume first that (I + A(D))^{n−1} is positive. Hence, the (i, j) entry of A(D)^p is positive for some p = p(i, j). Let i ≠ j. Since A(D)^0 = I, we deduce that p(i, j) > 0. Use Proposition 5.2.1 to deduce that there is a walk of length p from the vertex i to the vertex j. Hence, D is strongly connected.
Suppose now that D is strongly connected. Then, for each i ≠ j we must have a path of length p ∈ [1, n−1] that connects i and j. Hence, all off-diagonal entries of (I + A(D))^{n−1} are positive. Clearly, (I + A(D))^{n−1} ≥ I. Hence, (I + A(D))^{n−1} > 0.
Let A ∈ R_+^{n×n}. Then, the (i, j) entry of (I + A)^{n−1} is positive if and only if the (i, j) entry of (I + A(D(A)))^{n−1} is positive. Hence, A is irreducible if and only if (I + A)^{n−1} > 0. ◻
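A hedged numerical companion to Theorem 5.2.2 (not from the text; the two matrices are assumed examples): a nonnegative matrix is irreducible exactly when (I + A)^{n−1} has all entries positive.

```python
# Minimal sketch of the irreducibility test of Theorem 5.2.2 in NumPy.
import numpy as np

def is_irreducible(A):
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + (A > 0), n - 1)
    return bool(np.all(M > 0))

C = np.array([[0, 1, 0],      # directed 3-cycle: strongly connected, irreducible
              [0, 0, 1],
              [1, 0, 0]])
R = np.array([[1, 1],         # block upper triangular pattern: reducible
              [0, 1]])
print(is_irreducible(C), is_irreducible(R))   # True False
```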
5.3 Recurrence equation with nonnegative coefficients
Recall that the matrix A = [aij] ∈ R_+^{n×n} is called irreducible if (I + A)^{n−1} > 0. We now associate with A a digraph D = D(A) of order n as follows. The vertex set is V = {a1, . . . , an}. There is an arc α = (ai, aj) from ai to aj if and only if aij > 0, i, j ∈ [n]. It is shown that A is irreducible if and only if D is strongly connected (Theorem 5.2.2). We have already seen that A ∈ R_+^{n×n} is called primitive if A^p > 0, for some positive integer p. It is proved that if A is an irreducible matrix such that the g.c.d. of the lengths of all cycles in D(A) is 1, then A is primitive. See [9] for more details on primitive matrices.
Consider the following homogeneous recurrence equation:

xk = A x_{k−1},   k = 1, 2, . . . ,

A = [ 0    1        0        ⋯   0    0
      0    0        1        ⋯   0    0
      ⋮    ⋮        ⋮        ⋱   ⋮    ⋮
      0    0        0        ⋯   0    1
      an   a_{n−1}  a_{n−2}  ⋯   a2   a1 ] ∈ R^{n×n},   xk = [uk, u_{k+1}, . . . , u_{k+n−1}]⊺ ∈ R^n.   (5.3.1)
Assume that a1, . . . , an ≥ 0. Since a_{i(i+1)} = 1 for i = 1, . . . , n − 1, we have a dipath 1 → 2 → ⋯ → n. So if an > 0, then we have a Hamiltonian dicycle. In particular, the digraph D(A) is strongly connected. Hence, A is irreducible. Suppose an > 0. If a1 > 0, we have a dicycle of length 1. As gcd(1, n) = 1, A is primitive. If ai > 0, then we have a dicycle n − i + 1 → n − i + 2 → ⋯ → n → n − i + 1 of length i. In particular, if a_{n−1} > 0, then gcd(n − 1, n) = 1 and A is primitive.
Now, we are ready to give a generalization of Kepler's theorem for Fibonacci numbers that can be considered an application of the Perron–Frobenius theorem.
Theorem 5.3.1. Consider the homogeneous recurrence equation (5.3.1). Assume that an > 0 and a1, . . . , a_{n−1} ≥ 0. Suppose furthermore that a_{n1}, . . . , a_{ni} > 0, where 1 ≤ n1 < ⋯ < ni < n. Assume that the g.c.d. of n, n1, . . . , ni is 1. Suppose that (u0, . . . , u_{n−1})⊺ ≩ 0. Then

lim_{m→∞} u_{m+1}/um = ρ(A).   (5.3.2)
More precisely,

lim_{k→∞} (1/ρ(A)^k) xk = (v⊺ x0) u,   (5.3.3)
where u, v are defined below.
Proof. The assumptions of the theorem yield that A is primitive. Then, the Perron–Frobenius theorem yields that ρ(A) > 0 is a simple eigenvalue of A, and all other eigenvalues of A satisfy ∣λ∣ < ρ(A). The component of A corresponding to ρ(A) is of the form uv⊺, where u, v > 0 are the right and left eigenvectors of A, satisfying Au = ρ(A)u, A⊺v = ρ(A)v, v⊺u = 1. Hence, writing A^k in terms of its components and taking A^k x0, we see that the leading term is xk = ρ(A)^k uv⊺x0 + O(t^k), t = max{∣λj(A)∣ ∶ ∣λj(A)∣ < ρ(A)}. As v⊺x0 > 0, we deduce the theorem.
◻
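The classical Fibonacci case is easy to check numerically. The following hedged sketch (not from the text; the coefficients are an assumed example) builds the companion matrix of (5.3.1) and compares the ratios u_{m+1}/u_m with ρ(A), as in Theorem 5.3.1.

```python
# Hedged illustration of Theorem 5.3.1 for the Fibonacci recurrence.
import numpy as np

a = [1.0, 1.0]                  # u_k = a_1 u_{k-1} + a_2 u_{k-2}
n = len(a)
A = np.zeros((n, n))
A[:-1, 1:] = np.eye(n - 1)      # superdiagonal of ones, as in (5.3.1)
A[-1, :] = a[::-1]              # last row: a_n, ..., a_1

u = [1.0, 1.0]
for _ in range(30):
    u.append(sum(a[i] * u[-1 - i] for i in range(n)))

rho = max(abs(np.linalg.eigvals(A)))
print(u[-1] / u[-2], rho)       # both close to the golden ratio 1.618...
```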
5.3.1 Worked-out problems
1. Let A ∈ R_+^{n×n} be irreducible. Show that A is primitive if and only if one of the following conditions holds:
(a) n = 1 and A > 0.
(b) n > 1 and each eigenvalue λ of A different from ρ(A) satisfies the inequality ∣λ∣ < ρ(A).
Solution: If n = 1, then A is primitive if and only if A > 0. Assume now that n > 1. So ρ(A) > 0. Considering B = (1/ρ(A)) A, it is enough to consider the case ρ(A) = 1. Assume first that if λ ≠ 1 is an eigenvalue of A, then ∣λ∣ < 1. Theorem 5.1.4 implies Au = u, A⊺w = w for some u, w > 0. So w⊺u > 0. Let v ∶= (w⊺u)^{−1} w. Then, A⊺v = v and v⊺u = 1. Part 2 of Theorem 4.6 yields lim_{k→∞} A^k = Z10 ≥ 0, where Z10 is the component of A corresponding to an algebraically simple eigenvalue 1 = ρ(A). Problem 3.1.3-1.a yields that Z10 = uv⊺ > 0. So there exists an integer k0 ≥ 1, such that A^k > 0, for k ≥ k0, i.e., A is primitive.
Assume now that A has exactly h > 1 distinct eigenvalues λ satisfying ∣λ∣ = 1. Lemma 5.1.6 implies that there exists a permutation matrix P such that B = P⊺AP is of the form (1d) of Theorem 5.1.6. Note that B^h is a block diagonal matrix. Hence, B^{hj} = (B^h)^j is a block diagonal matrix for every j = 1, 2, . . .. Hence, B^{hj} is never a positive matrix, so A^{hj} is never a positive matrix. Hence, A is not primitive.
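The dichotomy in this worked-out problem is easy to see numerically. A hedged sketch (not from the text; the two matrices are assumed examples) compares an imprimitive cycle with a primitive perturbation of it: primitivity goes together with having a single peripheral eigenvalue.

```python
# Hedged check: irreducible A is primitive iff rho(A) is the only eigenvalue
# of modulus rho(A).
import numpy as np

P = np.array([[0, 1, 0],        # 3-cycle: irreducible, imprimitive (h = 3)
              [0, 0, 1],
              [1, 0, 0]])
Q = np.array([[0, 1, 0],        # extra edge 3 -> 2 makes cycle lengths coprime
              [0, 0, 1],
              [1, 1, 0]])

for M in (P, Q):
    moduli = np.abs(np.linalg.eigvals(M))
    peripheral = int(np.sum(np.isclose(moduli, moduli.max())))
    primitive = bool(np.all(np.linalg.matrix_power(M, 20) > 0))
    print(peripheral, primitive)    # (3, False) for P and (1, True) for Q
```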
5.3.2 Problems 1. Let B ∈ Rn×n be an irreducible, imprimitive matrix having h > 1 distinct + eigenvalues λ satisfying ∣λ∣ = ρ(B). Suppose furthermore that B has the form (1d) of Theorem 5.1.6. Show that B h is a block diagonal matrix,
where each diagonal block is an irreducible primitive matrix whose spectral radius is ρ(B)h . In particular, show that the last claim of (1d) of Theorem 5.1.6 holds. 2. Show that if n > 1 and A ∈ Rn×n is irreducible, then A does not have a zero + row or column. 3. Assume that A ∈ Rn×n is irreducible. Show the following: + (a) For each x ∈ Πn the vector (I + A)n−1 x is positive. (b) The set (I + A)n−1 Πn ∶= {y = (I + A)n−1 x ∶ x ∈ Πn } is a compact set of positive vectors. (c) r(y) is a continuous function on (I + A)n−1 Πn . 4. Assume that A ∈ Rn×n is irreducible. Show that the following are equiva+ lent: (a) A is primitive. (b) There exists a positive integer k0 such that for any integer k ≥ k0 , Ak > 0. 5. Let D = (V, E) be a digraph and assume that V is a disjoint union of h h
nonempty sets V1 , . . . , Vh . Denote Vh+1 ∶= V1 . Assume that E ⊂ ⋃ Vi × Vi+1 . i=1
Let Di = (Vi, Ei) be the following digraph. The diedge (v, w) ∈ Ei if there is a path of length h from v to w in D.
(a) Show that D is strongly connected if and only if Di is strongly connected for i = 1, . . . , h.
(b) Assume that D is strongly connected. Let 1 ≤ i < j ≤ h. Then Di is primitive if and only if Dj is primitive.
6. Let A be a nonnegative primitive matrix of order n. Prove that A^{n²−2n+2} > 0.
7. Let A be a primitive matrix. Can the number of positive entries of A be greater than that of A2 ? 8. Assume that A is an irreducible nonnegative matrix and x ≥ 0 is an eigenvector of A. Prove that x > 0.
Chapter 6
Tensor products
6.1 Introduction The common notion of tensors in mathematics is associated with differential geometry, covariant and contravariant derivatives, Christoffel symbols, and Einstein theory of general relativity. In engineering and other practical applications such as biology and psychology, the relevant notions in mathematics are related to multilinear algebra. Of course, the notions in the two mentioned fields are related.
6.2 Tensor product of two vector spaces
The term "tensor product" refers to a way of constructing a big vector space out of two (or more) smaller vector spaces. Given two vector spaces U, W over the field F, one first defines U ⊗ W as a linear span of all vectors of the form u ⊗ w, where u ∈ U, w ∈ W, satisfying the following natural properties:
(i) a(u ⊗ w) = (au) ⊗ w = u ⊗ (aw) for all a ∈ F.
(ii) (a1 u1 + a2 u2) ⊗ w = a1 (u1 ⊗ w) + a2 (u2 ⊗ w) for all a1, a2 ∈ F. (Linearity in the first variable.)
(iii) u ⊗ (a1 w1 + a2 w2) = a1 (u ⊗ w1) + a2 (u ⊗ w2) for all a1, a2 ∈ F. (Linearity in the second variable.)
(iv) If {u1, . . . , um} and {w1, . . . , wn} are bases in U and W, respectively, then {ui ⊗ wj ∶ i = 1, . . . , m, j = 1, . . . , n} is a basis in U ⊗ W.
Here U ⊗ W is called the tensor product of U and W. Also, U ⊗ W is a vector space over F. The element u ⊗ w is called the decomposable tensor, decomposable element (vector), or rank 1 tensor if u and w are nonzero. It is natural to question why U ⊗ W always exists. In what follows we prove the existence of the tensor product. Indeed, we will construct an explicit tensor product. This construction only works if U and W are finite-dimensional. 221
Universal property
Denote by U × W the product set, all pairs of the form (u, w).
Theorem 6.2.1. Let U, W be given finite-dimensional vector spaces over F. Then, there exists a unique vector space V′ over F, up to isomorphism, of dimension (dim U)(dim W) with the following properties:
1. There exists a bilinear map ι ∶ U × W → V′ with the following properties:
(a) ι((u, w)) = 0 if and only if either u = 0 or w = 0.
(b) Assume that ui, wi ≠ 0, for i ∈ {1, 2}. Then, ι((u1, u2)) = ι((w1, w2)) if and only if there exist scalars t1, t2 ∈ F, satisfying t1 t2 = 1 and w1 = t1 u1 and w2 = t2 u2.
2. Assume that L ∶ U × W → V is a bilinear map. Then, there exists a unique linear map L̂ ∶ V′ → V such that L = L̂ ○ ι.
Property 2 is called the lifting property. It is usually stated in terms of a commutative diagram: ι maps U × W into V′ (∶= U ⊗ W), and every bilinear map L ∶ U × W → V factors through ι as L = L̂ ○ ι.
Proof.
1. Assume that {e1,1, e2,1, . . . , e_{m1,1}} and {e1,2, e2,2, . . . , e_{m2,2}} are bases in U and W, respectively. Let V′ be a finite-dimensional vector space of dimension m = m1 m2. Choose a basis of cardinality m and denote the elements of this basis by e_{j1,1} ⊗ e_{j2,2}, where j1 ∈ [m1] and j2 ∈ [m2]. Assume that

u = ∑_{j1 ∈ [m1]} u_{j1,1} e_{j1,1},   w = ∑_{j2 ∈ [m2]} w_{j2,2} e_{j2,2}.

Define ι by the formula

ι((u, w)) ∶= ∑_{j1 ∈ [m1], j2 ∈ [m2]} u_{j1,1} w_{j2,2} e_{j1,1} ⊗ e_{j2,2}.
It is straightforward to see that ι is bilinear. Suppose that either u = 0 or w = 0. Clearly, ι((u, w)) = 0. If u ≠ 0, then uj1 ,1 ≠ 0, for some j1 ∈ [m1 ]. Also, if w ≠ 0, then wj2 ,2 ≠ 0 for some j2 ∈ [m2 ]. Thus, the coefficients uj1 ,1 , wj2 ,2 of the basis element ej1 ,1 ⊗ ej2 ,2 are nonzero. Hence, ι((u, w)) ≠ 0. To show part 1b, let u1 = x = (x1 , . . . , xp )⊺ , u2 = y = (y1 , . . . , yq )⊺ , where p = m1 , q = m2 . The coefficients of e1,i e2,j correspond to the rank 1 matrix xy⊺ . Similarly, let w1 = z, w2 = w. Then, the equality ι((u1 , u2 )) = ι((w1 , w2 )) is equivalent to xy⊺ = zw⊺ . Hence, z = t1 x, w = t2 y and t1 t2 = 1.
ˆ by L(e ˆ j ,1 ⊗ ej ,2 ) ∶= L(ej ,1 , ej ,2 ) for j1 ∈ [m1 ] and j2 ∈ [m2 ]. 2. Define L 1 2 1 2 ˆ extends uniquely to a linear map from V′ to V. It is straightforThen, L ˆ ○ ι. ward to show that L = L ◻ Indeed, the tensor product of U and W is a vector space U ⊗ W equipped with a bilinear map ι ∶ U × W → U ⊗ W, which is universal. This universal property determines the tensor product up to canonical isomorphism. Example 6.1. The vector space Fm×n over F is a tensor product of Fm and Fn with tensor multiplication defined, for (a1 , . . . , am ) ∈ Fm and (b1 , . . . , bn ) ∈ Fn , by ⎡a ⎤ ⎢ 1⎥ ⎢ ⎥ (a1 , . . . , am ) ⊗ (b1 , . . . , bn ) = ⎢ ⋮ ⎥ [b1 . . . bn ] . ⎢ ⎥ ⎢am ⎥ ⎣ ⎦ ′ ′ With this definition, ei ⊗ej = Eij , where ei , ej , and Eij are standard basis vectors of Fm , Fn , and Fm×n , respectively. Example 6.2. Let U be the space of all polynomials in variable x of degree i less than m: p(x) = ∑m−1 i=0 ai x with coefficients in F. Let W be the space of all j polynomials in variable y of degree less than n: q(y) = ∑n−1 j=0 bj x with coefficients in F. Then, U ⊗ W is identified with the vector space of all polynomials in two cij xi y j with the coefficients in F. variables x, y of the form f (x, y) = ∑m−1,n−1 i=j=0 The decomposable elements are p(x)q(y), p ∈ U, q ∈ W. The tensor products of this kind are the basic tool for solving PDE, using separation of variables, i.e., Fourier series. Example 6.3. Let U = Fm and W = Fn . Then U ⊗ W is identified with the space of m × n matrices Fm×n . If u, w ≠ 0, the rank 1 tensor u ⊗ w is identified with uwT . Note that in this case, uwT is indeed a rank 1 matrix. Assume now that in addition, U, W are IPS with the inner products ⟨⋅, ⋅⟩U , ⟨⋅, ⋅⟩W . (Here, F is either the real field or the complex field.) Then, there exists a unique inner product ⟨⋅, ⋅⟩U⊗W that satisfies the property ⟨u ⊗ w, x ⊗ y⟩U⊗W = ⟨u, x⟩U ⟨w, y⟩W , for all u, x ∈ U and w, y ∈ W. This follows from the fact that if {u1 , . . . , um } and {w1 , . . . , wm } are orthonormal bases in U and W, respectively, then ui ⊗ wj , i = 1, . . . , m, j = 1, . . . , n is an orthonormal basis in U ⊗ W. In Example 6.2, if one has the standard inner product in Fm and Fn , then these inner products induce the following inner product in Fm×n : ⟨A, B⟩ = tr AB ∗ . If A = uwT , B = xyT , then trAB ∗ = (x∗ u)(y∗ w). Any τ ∈ U ⊗ W can be viewed as a linear transformation τU,W ∶ U → W and τW,U as follows. Then, (u ⊗ w)U,W ∶ U → W is given by x ↦ ⟨x, u⟩U w, (u ⊗ w)W,U ∶ W → U is given by y ↦ ⟨y, w⟩W u. Since any τ ∈ U ⊗ W is a linear combination of rank 1 tensors, and equivalently a linear combination of rank 1 matrices, the above definitions extend to any
τ ∈ U ⊗ W. Thus, if A ∈ Fm×n = Fm ⊗ Fn , then AFm ,Fn u = AT u, AFn ,Fm w = Aw. For τ ∈ U ⊗ W, rank U τ ∶= rank τW,U and rank W τ ∶= rank τU,W . The rank of τ , denoted by rank τ , is the minimal decomposition of τ to a sum of rank 1 nonzero tensors: τ = ∑ki=1 ui ⊗ wi , where ui , wi ≠ 0 for i = 1, . . . , k. Proposition 6.2.2. Let τ ∈ U ⊗ W. Then, rank τ = rank U,W τ = rank W,U τ . Proof. Let τ = ∑ki=1 ui ⊗ wi . Then τ (w) = ∑ki=1 ⟨w, wi ⟩W ui ∈ span {u1 , . . . , uk }. Hence, range τW,U ⊆ span {u1 , . . . , uk }. Therefore, rank τW,U = dim range τW,U ≤ dim span {u1 , . . . , uk } ≤ k. (It is possible that u1 , . . . , uk are linearly dependent.) Since τ is represented by a matrix A, we know that rank τU,W = rank AT = rank A = rank τW,U . Also, rank τ is the minimal number k that A is represented as rank 1 matrices. The singular value decomposition of A yields that one can represent A as the sum of rank A of rank 1 matrices. ◻
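Before turning to operators, here is a small hedged numerical companion to Proposition 6.2.2 (not from the text; the vectors are assumed random examples): a tensor in F^m ⊗ F^n is just an m × n matrix, its tensor rank is the matrix rank, and the SVD realizes a minimal rank 1 decomposition.

```python
# Hedged sketch: rank of a 2-tensor equals the rank of its matrix.
import numpy as np

m, n = 4, 3
u1, w1 = np.random.rand(m), np.random.rand(n)
u2, w2 = np.random.rand(m), np.random.rand(n)
tau = np.outer(u1, w1) + np.outer(u2, w2)     # sum of two rank 1 tensors

print(np.linalg.matrix_rank(tau))             # 2 (generically)

U, s, Vt = np.linalg.svd(tau)
rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(2))
print(np.allclose(rebuilt, tau))              # True
```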
6.3 Tensor product of linear operators and the Kronecker product Let Ti ∶ Ui → Wi be linear operators. Then they induce a linear operator on T1 ⊗ T2 ∶ U1 ⊗ U2 → W1 ⊗ W2 such that (T1 ⊗ T2 )(u1 ⊗ u2 ) = (T1 u1 ) ⊗ (T2 u2 ), product for all u1 ∈ U2 , u2 ∈ U2 . We will see in the next section that T1 ⊗ T2 is a special 4 tensor, i.e., T1 ⊗ T2 ∈ U1 ⊗ W1 ⊗ U2 ⊗ W2 . If furthermore Pi ∶ Wi → Wi , i = 1, 2, then we have the following composition: (P1 ⊗ P2 )(T1 ⊗ T2 ) = (P1 T1 ) ⊗ (P2 T2 ). This equality follows from (P1 ⊗ P2 )((T1 ⊗ T2 )(u1 ⊗ u2 )) = (P1 ⊗ P2 )(T1 u1 ⊗ T2 u2 ) = (P1 T1 u1 ) ⊗ (P2 T2 u2 ). Since each linear operator Ti ∶ Ui → Wi , i = 1, 2 is represented by a matrix, one can reduce the definition of T1 ⊗ T2 to the notion of tensor product of two matrices A ∈ Fm1 ×n1 , B ∈ Fm2 ×n2 . This tensor product is called the Kronecker product. m1 ×n1 1 ,n1 Let A = [aij ]m . Then, A ⊗ B ∈ Fm1 m2 ×n1 n2 is the following block i,j=1 ∈ F matrix: ⎡ a11 B a12 B . . . a1n1 B ⎤⎥ ⎢ ⎢ a B a22 B . . . a2n1 B ⎥⎥ ⎢ ⎥. A ⊗ B ∶= ⎢ 21 (6.3.1) ⎢ ⎥ ⋮ ⋮ ⋮ ⋮ ⎢ ⎥ ⎢ am1 1 B ⎥ am1 2 B . . . a m 1 n1 B ⎦ ⎣ Let us try to understand the logic of this notation. Let x = [x1 , . . . , xn1 ]T ∈ Fn1 = Fn1 ×1 , y = [y1 , . . . , yn2 ]T ∈ Fn2 = Fn2 ×1 . Then we view x ⊗ y as a column vector of dimension n1 n2 given by the above formula. So the coordinates of x ⊗ y are xj yl where the double indices are arranged in lexicographic order (the order of a dictionary): (1, 1), (1, 2), . . . , (1, n2 ), (2, 1), . . . , (2, n1 ), . . . , (n1 , 1), . . . , (n1 , n2 ).
The entries of A ⊗ B are aij bkl = c(i,k)(j,l) . So we view (i, k) as the row index of A ⊗ B and (j, l) as the column index of A ⊗ B. Then n1 ,n2 ⎛ n1 ⎞ n2 [(A ⊗ B)(x ⊗ y)](i,k) = ∑ c(i,k)(j,l) xj yk = ∑ aij xj (∑ bkl yl ) = (Ax)i (By)k , ⎝j=1 ⎠ l=1 j,l=1
as it should be according to our notation. Thus, as in the operator case, (A1 ⊗ A2)(B1 ⊗ B2) = (A1B1) ⊗ (A2B2) if Ai ∈ F^{mi×ni}, Bi ∈ F^{ni×li}, i = 1, 2. Note that Im ⊗ In = Imn. Moreover, if A and B are diagonal matrices, then A ⊗ B is a diagonal matrix. If A and B are upper or lower triangular, then A ⊗ B is upper or lower triangular, respectively. So if A ∈ GL(m, F), B ∈ GL(n, F), then (A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}. Also, (A ⊗ B)^T = A^T ⊗ B^T. So if A, B are symmetric, then A ⊗ B is symmetric. If A and B are orthogonal matrices, then A ⊗ B is orthogonal. The following results follow straightforwardly from the above properties:
Proposition 6.3.1. The following facts hold:
1. Let Ai ∈ R^{mi×ni} for i = 1, 2. Assume that Ai = Ui Σi Wi^T, Ui ∈ O(mi, R), Wi ∈ O(ni, R) is the standard SVD decomposition for i = 1, 2. Then, A1 ⊗ A2 = (U1 ⊗ U2)(Σ1 ⊗ Σ2)(W1^T ⊗ W2^T) is a singular value decomposition of A1 ⊗ A2, except that the diagonal entries of Σ1 ⊗ Σ2 are not arranged in decreasing order. In particular, all the nonzero singular values of A1 ⊗ A2 are of the form σi(A1)σj(A2), where i = 1, . . . , rank A1 and j = 1, . . . , rank A2. Hence, rank A1 ⊗ A2 = rank A1 rank A2.
2. Let Ak ∈ F^{nk×nk}, k = 1, 2. Assume that det(zI_{nk} − Ak) = ∏_{i=1}^{nk} (z − λ_{i,k}) for k = 1, 2. Then, det(zI_{n1 n2} − A1 ⊗ A2) = ∏_{i=1}^{n1} ∏_{j=1}^{n2} (z − λ_{i,1} λ_{j,2}).
3. Assume that Ai ∈ S(ni , R) and Ai = Qi Λi QTi is the spectral decomposition of Ai , i.e., Qi is orthogonal and Λi is diagonal, for i = 1, 2. Then, A1 ⊗A2 = (Q1 ⊗ Q2 )(Λ1 ⊗ Λ2 )(QT1 ⊗ QT2 ) is the spectral decomposition of A1 ⊗ A2 ∈ S(n1 n2 , R). In the next section, we will show that the singular decomposition of A1 ⊗ A2 is a minimal decomposition of the 4 tensor A1 ⊗ A2 . In the rest of the section we discuss the symmetric and skew-symmetric tensor products of U ⊗ U. Definition 6.3.2. Let U be a vector space of dimension m over F. Denote U⊗2 ∶= U ⊗ U. The subspace Sym2 U ⊂ U⊗2 , called a 2-symmetric power of U, is spanned by tensors of the form sym2 (u, w) ∶= u ⊗ w + w ⊗ u for all u, w ∈ U. sym2 (u, w) = sym2 (w, u) is called a 2-symmetric product of u and w, or simply a symmetric product. Any vector τ ∈ Sym2 U is a called a 2-symmetric tensor, or simply a symmetric tensor. The subspace ⋀2 U ⊂ U⊗2 , called 2-exterior power of U, is spanned by all tensors of the form u∧w ∶= u⊗w−w⊗u, for all u, w ∈ U. u ∧ w = −w ∧ u is called the wedge product of u and w. Any vector τ ∈ ⋀2 U is called a 2-skew-symmetric tensor, or simply a skew-symmetric tensor. Since 2u ⊗ w = sym2 (u, w) + u ∧ w, it follows that U⊗2 = Sym2 (U) ⊕ ⋀2 U. That is, any tensor τ ∈ U⊗2 can be decomposed uniquely to a sum τ = τs + τa , where τs , τa ∈ U⊗2 are symmetric and skew-symmetric tensors, respectively.
In terms of matrices, we identify U with Fm , U⊗2 with Fm×m , i.e., the algebra of m × m matrices. Then, Sym2 U is identified with Sm (F) ⊂ Fm×m , the space of m × m symmetric matrices: AT = A, and ⋀2 U is identified with AS(m, F), the space of m × m skew-symmetric matrices: AT = −A. Note that any matrix A ∈ Fm×m is of the form A = 21 (A + AT ) + 12 (A − AT ), which is the unique decomposition to a sum of symmetric and skew-symmetric matrices. Proposition 6.3.3. Let U be a finite-dimensional space over F. Let T ∶ U → U be any linear operator. Then, Sym2 U and ⋀2 U are invariant subspaces of T ⊗2 ∶= T ⊗ T ∶ U⊗2 → U⊗2 . Proof. Observe that T ⊗2 sym2 (u, w) = sym2 (T u, T w) and T ⊗2 u ∧ w = (T u) ∧ (T w). ◻ It can be shown that Sym2 U and ⋀2 U are the only invariant subspaces of T ⊗2 for all choices of linear transformations T ∶ U → U.
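Before moving on to products of many spaces, the Kronecker product identities of this section are easy to verify numerically. The following hedged sketch (not from the text; the matrices are assumed random examples) uses numpy.kron to check the mixed-product rule and the transpose and inverse formulas.

```python
# Hedged numerical check of the Kronecker product identities of Section 6.3.
import numpy as np

A1, A2 = np.random.rand(2, 3), np.random.rand(4, 2)
B1, B2 = np.random.rand(3, 5), np.random.rand(2, 3)

# Mixed-product rule: (A1 (x) A2)(B1 (x) B2) = (A1 B1) (x) (A2 B2).
lhs = np.kron(A1, A2) @ np.kron(B1, B2)
rhs = np.kron(A1 @ B1, A2 @ B2)
print(np.allclose(lhs, rhs))                              # True

# Transpose and inverse act factorwise.
print(np.allclose(np.kron(A1, A2).T, np.kron(A1.T, A2.T)))
S = np.random.rand(3, 3) + 3 * np.eye(3)                  # invertible examples
T = np.random.rand(2, 2) + 3 * np.eye(2)
print(np.allclose(np.linalg.inv(np.kron(S, T)),
                  np.kron(np.linalg.inv(S), np.linalg.inv(T))))
```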
6.4 Tensor product of many vector spaces Let Ui be vector spaces of dimension mi , for i = 1, . . . , k over F. Then U ∶= ⊗ki=1 Ui = U1 ⊗U2 ⊗. . .⊗Uk is the tensor product space of U1 , . . . , Uk of dimension m1 m2 . . . mk . U is spanned by the decomposable tensors ⊗ki=1 ui = u1 ⊗ u2 ⊗ . . . ⊗ uk , also called rank 1 tensors, where ui ∈ Ui for i = 1, . . . , k. As in the case of k = 2, we have the basic identity: a(u1 ⊗ u2 ⊗ . . . ⊗ uk ) = (au1 ) ⊗ u2 ⊗ . . . ⊗ uk = u1 ⊗ (au2 ) ⊗ . . . ⊗ uk = ⋯ = u1 ⊗ u2 ⊗ . . . ⊗ (auk ). Also, the above decomposable tensor is multilinear in each variable. The definition and the existence of k-tensor products can be done recursively as follows. For k = 2, U1 ⊗ U2 is defined in section 6.3. Then define recursively ⊗ki=1 Ui as k−1 (⊗i=1 Ui ) ⊗ Uk for k = 3, . . .: ⊗kj=1 uij ,j , ij = 1, . . . , mj , j = 1, . . . , k is a basis of ⊗ki=1 Ui
(6.4.1)
if {u1,i, . . . , u_{mi,i}} is a basis of Ui for i = 1, . . . , k. Assume now that in addition, each Ui is an IPS with the inner product ⟨⋅, ⋅⟩_{Ui} for i = 1, . . . , k. (Here, F is either the real field or the complex field.) Then there exists a unique inner product ⟨⋅, ⋅⟩_{⊗_{i=1}^k Ui} that satisfies the property

⟨⊗_{i=1}^k ui, ⊗_{i=1}^k wi⟩_{⊗_{i=1}^k Ui} = ∏_{i=1}^k ⟨ui, wi⟩_{Ui}   for all ui, wi ∈ Ui, i = 1, . . . , k.
In particular, if {u1,i, . . . , u_{mi,i}} is an orthonormal basis in Ui, for i = 1, . . . , k, then ⊗_{j=1}^k u_{ij,j}, where ij = 1, . . . , mj and j = 1, . . . , k, is an orthonormal basis in ⊗_{i=1}^k Ui. Then any τ ∈ ⊗_{i=1}^k Ui can be represented as

τ = ∑_{ij ∈ [1,mj], j=1,...,k} t_{i1...ik} ⊗_{j=1}^k u_{ij,j}.   (6.4.2)
1 ,...,mk The above decomposition is called the Tucker model. Also, T = (ti1 ...ik )m i1 =⋯=ik =1 is called the core tensor. The core tensor T is called diagonal if ti1 i2 ...ik = 0, whenever the equality i1 = ⋯ = ik is not satisfied.
We now discuss the change in the core tensor when we replace the basis {u1,i, . . . , u_{mi,i}} in Ui by the basis {w1,i, . . . , w_{mi,i}} in Ui, for i = 1, . . . , k. We first explain the change in the coordinates of the vector u ∈ U when we change the basis {u1, . . . , um} to the basis {w1, . . . , wm} in U. Let u = ∑_{i=1}^m xi ui. Then, x ∶= [x1, . . . , xm]^T is the coordinate vector of u in the basis {u1, . . . , um}. It is convenient to represent u = [u1, . . . , um] x. Suppose that [w1, . . . , wm] is another basis in U. Then Q = [qij]_{i,j=1}^m ∈ F^{m×m} is called the transition matrix from the basis {u1, . . . , um} to the basis {w1, . . . , wm} if

[u1, . . . , um] = [w1, . . . , wm] Q  ⟺  ui = ∑_{j=1}^m q_{ji} wj,  i = 1, . . . , m.
So Q^{−1} is the transition matrix from the basis {w1, . . . , wm} to {u1, . . . , um}, i.e., [w1, . . . , wm] = [u1, . . . , um] Q^{−1}. Hence u = [u1, . . . , um] x = [w1, . . . , wm](Qx), i.e., y = Qx is the coordinate vector of u in the basis [w1, . . . , wm]. Thus, if Ql = [q_{ij,l}]_{i,j=1}^{ml} ∈ F^{ml×ml} is the transition matrix from the basis {u_{1,l}, . . . , u_{ml,l}} to the basis {w_{1,l}, . . . , w_{ml,l}} for l = 1, . . . , k, then the core tensor T′ = (t′_{j1...jk}) corresponding to the new basis ⊗_{j=1}^k w_{ij,j} is given by the formula

t′_{j1,...,jk} = ∑_{i1,...,ik=1}^{m1,...,mk} (∏_{l=1}^k q_{jl il, l}) t_{i1...ik},   denoted as T′ = T × Q1 × Q2 × ⋯ × Qk.   (6.4.3)
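For concreteness, the multilinear change of basis (6.4.3) can be computed mode by mode. The following hedged sketch (not from the text; the tensors and matrices are assumed random examples) does it for k = 3 with einsum and checks it against a mode-at-a-time computation.

```python
# Hedged sketch of T' = T x Q1 x Q2 x Q3 from (6.4.3) for a 3-tensor.
import numpy as np

T = np.random.rand(2, 3, 4)
Q1, Q2, Q3 = np.random.rand(2, 2), np.random.rand(3, 3), np.random.rand(4, 4)

# t'_{j1 j2 j3} = sum_{i1 i2 i3} Q1[j1,i1] Q2[j2,i2] Q3[j3,i3] t_{i1 i2 i3}
T_new = np.einsum('ai,bj,ck,ijk->abc', Q1, Q2, Q3, T)

# Same result, applying one transition matrix per mode.
step = np.tensordot(Q1, T, axes=(1, 0))                           # mode 1
step = np.moveaxis(np.tensordot(Q2, step, axes=(1, 1)), 0, 1)     # mode 2
step = np.moveaxis(np.tensordot(Q3, step, axes=(1, 2)), 0, 2)     # mode 3
print(np.allclose(T_new, step))                                   # True
```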
Any tensor can be decomposed into a sum of rank 1 tensors:

τ = ∑_{i=1}^R ⊗_{l=1}^k u_{l,i},   where u_{l,i} ∈ Ul for i = 1, . . . , R, l = 1, . . . , k.   (6.4.4)
This decomposition is called the CANDECOMP-PARAFAC decomposition. It is not unique. For example, we can obtain a CANDECOMP-PARAFAC decomposition from the Tucker decomposition by replacing the nonzero term t_{i1...ik} ⊗_{j=1}^k u_{ij,j}, t_{i1...ik} ≠ 0, with (t_{i1...ik} u_{i1,1}) ⊗ u_{i2,2} ⊗ ⋯ ⊗ u_{ik,k}. The value of the minimal R is called the tensor rank of τ, and is denoted by rank τ. That is, rank τ is the minimal number of rank 1 tensors in a decomposition of τ into a sum of rank 1 tensors. In general, it is a difficult problem to determine the exact value of rank τ for τ ∈ ⊗_{i=1}^k Ui and k ≥ 3.
Any τ ∈ ⊗_{i=1}^k Ui can be viewed as a linear transformation

τ_{⊗_{l=1}^p U_{il}, ⊗_{l=1}^{p′} U_{i′l}} ∶ ⊗_{l=1}^p U_{il} → ⊗_{l=1}^{p′} U_{i′l},   1 ≤ i1 < ⋯ < ip ≤ k,  1 ≤ i′1 < ⋯ < i′_{p′} ≤ k,   (6.4.5)

where the two sets of nonempty indices {i1, . . . , ip}, {i′1, . . . , i′_{p′}} are complementary, i.e., 1 ≤ p, p′ < k, p + p′ = k and {i1, . . . , ip} ∩ {i′1, . . . , i′_{p′}} = ∅, {i1, . . . , ip} ∪ {i′1, . . . , i′_{p′}} = [k]. The above transformation is obtained by contracting the indices i1, . . . , ip. Assume for simplicity that each Ui is an IPS over R, for i = 1, . . . , k. Then for a decomposable tensor, the transformation (6.4.5) is given as

(⊗_{i=1}^k ui)(⊗_{l=1}^p w_{il}) = (∏_{l=1}^p ⟨w_{il}, u_{il}⟩_{U_{il}}) ⊗_{l=1}^{p′} u_{i′l}.   (6.4.6)

For example, for k = 3, (u1 ⊗ u2 ⊗ u3)(w2) = (⟨u2, w2⟩_{U2}) u1 ⊗ u3, where u1 ∈ U1, u2, w2 ∈ U2, u3 ∈ U3 and p = 1, i1 = 2, p′ = 2, i′1 = 1, i′2 = 3. Then

rank τ_{⊗_{l=1}^p U_{il}, ⊗_{l=1}^{p′} U_{i′l}} ∶= dim range τ_{⊗_{l=1}^p U_{il}, ⊗_{l=1}^{p′} U_{i′l}}.   (6.4.7)
It is easy to compute the above ranks, as each is the rank of the corresponding matrix representing this linear transformation. As in the case of matrices, it is straightforward to show that

rank τ_{⊗_{l=1}^p U_{il}, ⊗_{l=1}^{p′} U_{i′l}} = rank τ_{⊗_{l=1}^{p′} U_{i′l}, ⊗_{l=1}^p U_{il}}.   (6.4.8)
In view of the above equalities, for k = 3, i.e., U1 ⊗ U2 ⊗ U3 , we have three different ranks: rank U1 τ ∶= rank τU2 ⊗U3 ,U1 , rank U2 τ ∶= rank τU1 ⊗U3 ,U2 , rank U3 τ ∶= rank τU1 ⊗U2 ,U3 . Thus, rank U1 τ can be viewed as the dimension of the subspace of U1 obtained by all possible contractions of τ with respect to U2 , U3 . In general, let rank i τ = rank τU1 ⊗⋯⊗Ui−1 ⊗Ui+1 ⊗⋯⊗Uk ,Ui , i = 1, . . . , k.
(6.4.9)
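Numerically, rank_i τ in (6.4.9) is the ordinary rank of the matrix obtained by flattening all modes other than the i-th one. A hedged sketch (not from the text; the 2 × 2 × 2 tensor below is the same assumed example that reappears in Example 6.4) is as follows.

```python
# Hedged computation of the mode ranks rank_i(tau) from (6.4.9).
import numpy as np

def mode_rank(T, i):
    # move mode i to the front and flatten the remaining modes into columns
    M = np.moveaxis(T, i, 0).reshape(T.shape[i], -1)
    return np.linalg.matrix_rank(M)

T = np.zeros((2, 2, 2))
T[0, 0, 0] = T[0, 0, 1] = 1     # t_111 = t_112 = 1
T[1, 1, 0] = T[1, 1, 1] = 2     # t_221 = t_222 = 2

print([mode_rank(T, i) for i in range(3)])   # [2, 2, 1], i.e., R1, R2, R3
```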
Let τ ∈ ⊗_{i=1}^k Ui be fixed. Choose a basis of Ui such that {u1,i, . . . , u_{rank_i τ, i}} is a basis in range τ_{U1⊗⋯⊗U_{i−1}⊗U_{i+1}⊗⋯⊗Uk, Ui}. Then we obtain a more precise version of the Tucker decomposition:

τ = ∑_{ij ∈ [1, rank_j τ], j=1,...,k} t_{i1...ik} ⊗_{j=1}^k u_{ij,j}.   (6.4.10)
The following lower bound on rank τ can be computed easily.
Proposition 6.4.1. Let Ui be a vector space of dimension mi for i = 1, . . . , k ≥ 2. Let τ ∈ ⊗_{i=1}^k Ui. Then for any set of complementary indices {i1, . . . , ip}, {i′1, . . . , i′_{p′}},

rank τ ≥ rank τ_{⊗_{l=1}^p U_{il}, ⊗_{l=1}^{p′} U_{i′l}}.

Proof. Let τ be of the form (6.4.4). Then

rank τ_{⊗_{l=1}^p U_{il}, ⊗_{l=1}^{p′} U_{i′l}} = dim range τ_{⊗_{l=1}^p U_{il}, ⊗_{l=1}^{p′} U_{i′l}} = dim span {⊗_{l=1}^{p′} u_{i′l,1}, . . . , ⊗_{l=1}^{p′} u_{i′l,R}} ≤ R.
◻
Proposition 6.4.2. Let Ui be a vector space of dimension mi, for i = 1, . . . , k ≥ 2. Let τ ∈ ⊗_{i=1}^k Ui. Then

rank τ ≤ (m1 ⋯ mk) / max_{i∈[k]} mi.

Proof. The proof is by induction on k. For k = 2, recall that any τ ∈ U1 ⊗ U2 can be represented by A ∈ F^{m1×m2}. Hence, rank τ = rank A ≤ min(m1, m2) = (m1 m2)/max(m1, m2). Assume that the proposition holds for k = n ≥ 2 and let k = n + 1. By permuting the factors U1, . . . , Uk, one can assume that mn ≤ mi, for i = 1, . . . , n − 1. Let {u1,n, . . . , u_{mn,n}} be a basis of Un. It is straightforward to show that τ ∈ ⊗_{i=1}^n Ui is of the form τ = ∑_{p=1}^{mn} τp ⊗ u_{p,n}, for unique τp ∈ ⊗_{i=1}^{n−1} Ui. Decompose each τp into a minimal sum of rank 1 tensors in ⊗_{i=1}^{n−1} Ui. Now use the induction hypothesis for each τp to obtain a decomposition of τ as a sum of at most (m1 ⋯ mk)/max_{i∈[k]} mi rank 1 tensors. ◻
6.5 Examples and results for 3-tensors
Let us start with the simplest case U1 = U2 = U3 = R^2 (or C^2). Let {e1 = [1, 0]^T, e2 = [0, 1]^T} be the standard basis in R^2 (or C^2). Then, τ = ∑_{i=j=k=1}^2 t_{ijk} ei ⊗ ej ⊗ ek, and T = (t_{ijk})_{i=j=k=1}^2. Let Tk ∶= [t_{ijk}]_{i,j=1}^2, for k = 1, 2. We identify any B = [bij]_{i,j=1}^2 ∈ F^{2×2} with the tensor ∑_{i,j=1}^2 bij ei ⊗ ej. Using this identification we view τ = T1 ⊗ e1 + T2 ⊗ e2.
Example 6.4. Assume that

t111 = t112 = 1,   t221 = t222 = 2,   t211 = t121 = t212 = t122 = 0.   (6.5.1)
To find R1 = rank τ_{U2⊗U3, U1}, we construct a matrix A1 ∶= [a_{pq,1}]_{p,q=1}^{2,4} ∈ R^{2×4}, where p = i = 1, 2 and q = (j, k), j, k = 1, 2, and a_{pq,1} = t_{ijk}. Then

A1 ∶= [ 1 1 0 0 ; 0 0 2 2 ],
and R1 = rank A1 = 2. Next, A2 ∶= [a_{pq,2}]_{p,q=1}^{2,4} ∈ R^{2×4}, where p = i = 1, 2 and q = (j, k), j, k = 1, 2, and a_{pq,2} = t_{jik}. Then, A2 = A1 and R2 = rank A2 = 2. Next, A3 ∶= [a_{pq,3}]_{p,q=1}^{2,4} ∈ R^{2×4}, where p = i = 1, 2 and q = (j, k), j, k = 1, 2, and a_{pq,3} = t_{jki}. Then

A3 ∶= [ 1 0 0 2 ; 1 0 0 2 ],

and R3 = rank A3 = 1. Hence, Proposition 6.4.1 yields that rank τ ≥ 2. We claim that rank τ = 2 as a tensor over real or complex numbers. Observe that

T1 = T2 = [ 1 0 ; 0 2 ] ≡ e1 ⊗ e1 + 2 e2 ⊗ e2  ⇒  τ = (e1 ⊗ e1 + 2 e2 ⊗ e2) ⊗ e1 + (e1 ⊗ e1 + 2 e2 ⊗ e2) ⊗ e2.

Hence, τ = (e1 ⊗ e1 + 2 e2 ⊗ e2) ⊗ (e1 + e2) = e1 ⊗ e1 ⊗ (e1 + e2) + (2 e2) ⊗ e2 ⊗ (e1 + e2). Thus, rank τ ≤ 2, and we finally deduce that rank τ = 2.
Example 6.5. Assume that

t211 = t121 = t112 = 1,   t111 = t222 = t122 = t212 = t221 = 0.   (6.5.2)
Let A1, A2, A3 be the matrices defined as in Example 6.4. Then, A1 = A2 = A3 ∶= [ 0 1 1 0 ; 1 0 0 0 ]. Hence, R1 = R2 = R3 = 2 and rank τ ≥ 2. Observe next that

T1 = [ 0 1 ; 1 0 ] ≡ e1 ⊗ e2 + e2 ⊗ e1,   T2 = [ 1 0 ; 0 0 ] ≡ e1 ⊗ e1  ⇒  τ = T1 ⊗ e1 + T2 ⊗ e2.
Hence, τ = e1 ⊗ e2 ⊗ e1 + e2 ⊗ e1 ⊗ e1 + e1 ⊗ e1 ⊗ e2 . Thus, 2 ≤ rank τ ≤ 3. We will show that rank τ = 3, using the results that follow. Proposition 6.5.1. Let {u1,i , . . . , umi ,i } be a basis of Ui , for i = 1, 2, 3 over F. 1 ,m2 ,m3 Let τ ∈ U1 ⊗ U2 ⊗ U3 , and assume that τ = ∑m tijk ui,1 ⊗ uj,2 ⊗ uk,3 , i = i=j=k=1 1 ,m2 1, . . . , m1 , j = 1, . . . , m2 , k = 1, . . . , m3 . Let Tk ∶= [tijk ]m ∈ Fm1 ×m2 , for i,j=1 m1 ,m2 m3 k = 1, . . . , m3 , i.e., Ti = ∑i,j=1 tijk ui,1 ⊗ uj,2 and τ = ∑k=1 Tk ⊗ uk,3 . Assume that a basis {u1,3 , . . . , um3 ,3 } of U3 is changed to a basis {w1,3 , . . . , wm3 ,3 },
where [u1,3, . . . , u_{m3,3}] = [w1,3, . . . , w_{m3,3}] Q3, Q3 = [q_{pq,3}]_{p,q=1}^{m3} ∈ GL(m3, F). Then, τ = ∑_{k=1}^{m3} T′k ⊗ w_{k,3}, where T′k = [t′_{ijk}]_{i,j=1}^{m1,m2} = ∑_{l=1}^{m3} q_{kl,3} Tl. In particular, T′ ∶= (t′_{ijk})_{i,j,k=1}^{m1,m2,m3} is the core tensor corresponding to τ in the basis u_{i,1} ⊗ u_{j,2} ⊗ w_{k,3} for i = 1, . . . , m1, j = 1, . . . , m2, k = 1, . . . , m3. Furthermore, R3 = rank_{U1⊗U2, U3} τ = rank_{U3, U1⊗U2} τ = dim span {T1, . . . , T_{m3}}, where each Tk is viewed as a vector in F^{m1×m2}. In particular, one chooses a basis {w1,3, . . . , w_{m3,3}} in U3 such that the matrices T′1, . . . , T′_{R3} are linearly independent. Furthermore, if m3 > R3, then we can assume that T′k = 0, for k > R3. Assume that {w1,3, . . . , w_{m3,3}} in U3 was chosen as above. Let {w1,1, . . . , w_{m1,1}} and {w1,2, . . . , w_{m2,2}} be two bases in U1 and U2, respectively, where
[u1,1, . . . , u_{m1,1}] = [w1,1, . . . , w_{m1,1}] Q1,   [u1,2, . . . , u_{m2,2}] = [w1,2, . . . , w_{m2,2}] Q2,

and Q1 = [q_{pq,1}]_{p,q=1}^{m1} ∈ GL(m1, F), Q2 = [q_{pq,2}]_{p,q=1}^{m2} ∈ GL(m2, F). Let τ = ∑_{i,j,k=1}^{m1,m2,m3} t̃_{ijk} w_{i,1} ⊗ w_{j,2} ⊗ w_{k,3}, T̃k ∶= [t̃_{ijk}]_{i,j=1}^{m1,m2} ∈ F^{m1×m2}, for k = 1, . . . , m3. Then, T̃k = Q1 T′k Q2^T for k = 1, . . . , m3. Furthermore, if one chooses the bases in U1, U2 such that range τ_{U2⊗U3, U1} = span {w1,1, . . . , w_{R1,1}} and range τ_{U1⊗U3, U2} = span {w1,2, . . . , w_{R2,2}}, then each T̃k is a block diagonal matrix T̃k = diag(T̂k, 0), where T̂k = [t̂_{ijk}]_{i,j=1}^{R1,R2} ∈ F^{R1×R2}, for k = 1, . . . , m3. Recalling that T′k = 0 for k > R3, we get the representation τ = ∑_{i,j,k=1}^{R1,R2,R3} t̂_{ijk} w_{i,1} ⊗ w_{j,2} ⊗ w_{k,3}.
The proof of this proposition is straightforward and left to the reader. By interchanging the factors U1 , U2 , U3 in the tensor product U1 ⊗ U2 ⊗ U3 , it will be convenient to assume that m1 ≥ m2 ≥ m3 ≥ 1. Also, the above proposition implies that for τ ≠ 0, we may assume that m1 = R1 ≥ m2 = R2 ≥ m3 = R3 . Proposition 6.5.2. Let τ ∈ U1 ⊗ U2 ⊗ U3 . If R3 = 1, then rank τ = R1 = R2 . Proof. We may assume that m3 = 1. In that case, τ = T1 ⊗ w1,3 . Hence, rank τ = rank T1 = R1 = R2 . ◻ Thus we need to consider the case m1 = R1 ≥ m2 = R2 ≥ m3 = R3 ≥ 2. We now consider the generic case, i.e., where T1 , . . . , Tk ∈ Fm1 ×m2 are generic. That is, each Ti ≠ 0 is chosen at random, where ∣∣TTii∣∣F has a uniform distribution on the matrices of norm 1. It is well-known that a generic matrix T ∈ Fm1 ×m2 has rank m2 , since m1 ≥ m2 . n n 2 Theorem 6.5.3. Let τ = ∑n,n,2 i,j,k=1 tijk ∈ C ⊗ C ⊗ C , where n ≥ 2. Denote T1 = n n n×n [tij1 ]i,j=1 , T2 = [tij2 ]i,j=1 ∈ C . Suppose that there exists a linear combination T = aT1 + bT2 ∈ GL(n, C), i.e., T is invertible. Then n ≤ rank τ ≤ 2n − 1. In particular, for a generic tensor rank τ = n.
Proof. Recall that by changing a basis in C2 , we may assume that T1′ = aT1 + bT2 = T . Suppose first that T2′ = 0. Then, rank τ = rank T = n. Hence, we have always the inequality rank τ ≥ n. Assume now that R3 = 2. Choose first Q1 = T −1 and Q2 = In . Then T˜2 = In . Thus, for simplicity of notation, we may start with T1 = In . Let λ be an eigenvalue T2 . Then change the basis in C2 such that T1′ = In and T2′ =
T2 − λT1 = T2 − λIn. So det T2′ = 0. Hence, r = rank T2′ ≤ n − 1. So the SVD of T2′ is T2′ = ∑_{i=1}^r ui ⊗ (σi(T2′) ūi). Now In = ∑_{i=1}^n ei ⊗ ei. Hence, τ = ∑_{i=1}^n ei ⊗ ei ⊗ e1 + ∑_{i=1}^r ui ⊗ (σi(T2′) ūi) ⊗ e2. Therefore, rank τ ≤ 2n − 1.
We now consider the generic case. In that case, T1 is generic, so rank T1 = n and T1 is invertible. Choose Q1 = T1^{−1}, Q2 = In, Q3 = I2. Then, T̃1 = In, T̃2 = T1^{−1} T2. T̃2 is generic. Hence, it is diagonalizable. So T̃2 = X diag(λ1, . . . , λn) X^{−1}, for some invertible X. Now we again change a basis in U1 = U2 = C^n by letting Q1 = X^{−1}, Q2 = X^T. The new matrices are T̂1 = X^{−1} In X = In, T̂2 = X^{−1} T̃2 X = diag(λ1, . . . , λn). In this basis,

τ = T̂1 ⊗ e1 + T̂2 ⊗ e2 = ∑_{i=1}^n ei ⊗ ei ⊗ e1 + ∑_{i=1}^n λi ei ⊗ ei ⊗ e2 = ∑_{i=1}^n ei ⊗ ei ⊗ (e1 + λi e2).

Hence the rank of a generic tensor τ ∈ C^n ⊗ C^n ⊗ C^2 is n.
◻
We believe that for any τ ∈ Cn ⊗Cn ⊗C2 , rank τ ≤ 2n−1. The case rank τ = 2n−1 would correspond to T1 = In and T2 a nilpotent matrix with one Jordan block. n n 2 Corollary 6.5.4. Let τ = ∑n,n,2 i,j,k=1 tijk ∈ F ⊗ F ⊗ F , where n ≥ 2. Denote T1 = n n n×n [tij1 ]i,j=1 , T2 = [tij2 ]i,j=1 ∈ F . Suppose that there exists a linear combination T = aT1 + bT2 ∈ GL(n, F), i.e., T is invertible. Then, rank τ = n, if and only if the matrix T −1 (−bT1 + aT2 ) is diagonalizable over F.
This result shows that it is possible that for τ ∈ R^n ⊗ R^n ⊗ R^2, rank_R τ > n, while rank_C τ = n. For simplicity, choose n = 2, τ = T1 ⊗ e1 + T2 ⊗ e2, T1 = I2, T2 = −T2^T ≠ 0. Then T2 has two complex conjugate eigenvalues. Hence, T2 is not diagonalizable over R. The above corollary yields that rank_R τ > 2. However, since T2 is normal, T2 is diagonalizable over C. Hence, the above corollary yields that rank_C τ = 2.
Proof of the claim in Example 6.5. Observe that T1 is invertible and T1^{−1} = T1. Consider T1^{−1} T2 = S = [ 0 0 ; 1 0 ]. Note that S is not diagonalizable. The above corollary shows that rank_R τ, rank_C τ > 2. Hence, rank_R τ = rank_C τ = 3. This shows that Theorem 6.5.3 is sharp for n = 2.
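The construction used in the proof of Theorem 6.5.3 is easy to carry out numerically. The following hedged sketch (not from the text; the slices are assumed random examples) diagonalizes T1^{−1}T2 and rebuilds a generic τ ∈ C^n ⊗ C^n ⊗ C^2 from exactly n rank 1 terms.

```python
# Hedged sketch of the rank-n decomposition of a generic n x n x 2 tensor.
import numpy as np

n = 4
T1, T2 = np.random.rand(n, n), np.random.rand(n, n)
tau = np.stack([T1, T2], axis=2)               # tau[:, :, k], k = 0, 1

lam, X = np.linalg.eig(np.linalg.inv(T1) @ T2)
Y = np.linalg.inv(X).T                         # so that sum_i x_i y_i^T = I

rebuilt = np.zeros((n, n, 2), dtype=complex)
for i in range(n):
    # each term is (T1 x_i) (x) y_i (x) (e_1 + lambda_i e_2)
    rebuilt += np.einsum('a,b,c->abc', T1 @ X[:, i], Y[:, i],
                         np.array([1.0, lam[i]]))
print(np.allclose(rebuilt, tau))               # True (generically)
```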
6.5.1 Worked-out problems
1. Let A ∈ F^{m×m} and B ∈ F^{n×n}. Assume that the characteristic polynomials of A and B split in F:

det(λIm − A) = ∏_{i∈[m]} (λ − αi),   det(λIn − B) = ∏_{j∈[n]} (λ − βj).

Show that

det(λImn − A ⊗ B) = ∏_{i∈[m], j∈[n]} (λ − αi βj).   (6.5.3)
Furthermore, if A and B are diagonalizable, prove that A ⊗ B is diagonalizable.
Solution: Since the characteristic polynomials of A and B split, it is well-known that A and B are similar to upper triangular matrices: A = U A1 U^{−1}, B = V B1 V^{−1} (Problem 1.12.2-7). As the characteristic polynomials of A1, B1 are equal to those of A, B, respectively, it follows that the diagonal entries of A1 and B1 are {α1, . . . , αm} and {β1, . . . , βn}, respectively. Observe that A ⊗ B = (U ⊗ V)(A1 ⊗ B1)(U^{−1} ⊗ V^{−1}) = (U ⊗ V)(A1 ⊗ B1)(U ⊗ V)^{−1}. Thus, the characteristic polynomials of A ⊗ B and A1 ⊗ B1 are equal. Observe that A1 ⊗ B1 is also upper triangular with the diagonal entries α1β1, . . . , α1βn, α2β1, . . . , α2βn, . . . , αmβ1, . . . , αmβn. Hence, the characteristic polynomial of A1 ⊗ B1 is ∏_{i∈[m], j∈[n]} (λ − αi βj). This establishes (6.5.3). Assume that A and B are diagonalizable, i.e., one can assume that A1 and B1 are diagonal matrices. Clearly, A1 ⊗ B1 is a diagonal matrix. Hence, A ⊗ B is diagonalizable. Note that diagonalization is the process of finding a corresponding diagonal matrix for a diagonalizable matrix or linear transformation.
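A quick hedged numerical confirmation of this worked-out problem (not from the text; the matrices are assumed random symmetric examples, so all eigenvalues are real and easy to compare after sorting):

```python
# Hedged check: the eigenvalues of A (x) B are the pairwise products alpha_i beta_j.
import numpy as np

A = np.random.rand(3, 3); A = A + A.T      # symmetric => real spectrum
B = np.random.rand(2, 2); B = B + B.T

ev_kron = np.sort(np.linalg.eigvalsh(np.kron(A, B)))
ev_prod = np.sort([a * b
                   for a in np.linalg.eigvalsh(A)
                   for b in np.linalg.eigvalsh(B)])
print(np.allclose(ev_kron, ev_prod))       # True
```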
6.5.2 Problems 1. Let A ∈ Fm×m and B ∈ Fn×n . Assume that x1 , . . . , xk and y1 , . . . , yl are linearly independent eigenvectors of A and B with the corresponding eigenvalues α1 , . . . , αk and β1 , . . . , βl , respectively. Show that x1 ⊗ y1 , . . . , x1 ⊗ yl , . . . , xk ⊗y1 , . . . , xk ⊗yl are kl linearly independent vectors of A⊗B with the corresponding α1 β1 , α1 βl , . . . , αk βl , respectively. 2. Let Ai ∈ Fmi ×mi for i ∈ [d], where d ∈ N. Assume that the characteristic polynomial of Ai splits in F: det(λImi − Ai ) = ∏ji ∈[mi ] (λ − αji ,i ), for i ∈ [d]. Prove that det(λIm1 ⋯md −⊗i∈[d] Ai ) = ∏ji ∈[mi ],i∈[d] (λ−αj1 ,1 ⋯αjd ,d ). Furthermore, show that if each Ai is diagonalizable, then ⊗i∈[d] Ai is diagonalizable. (This statement can be viewed as a generalization of Worked-out Problem 6.5.1-1.) 3. Let A ∈ Fn×n and B ∈ Fm×m . Prove that (a) tr(A ⊗ B) = (tr A)(tr B). (b) det(A ⊗ B) = (det A)n (det B)m . 4. Prove the following properties of the Kronecker product of matrices: (a) (A ⊗ B)⊺ = A⊺ ⊗ B ⊺ . (b) (A ⊗ B)(C ⊗ D) = AC ⊗ BD if all products are well-defined. (c) (A ⊗ B)−1 = A−1 ⊗ B −1 provided A and B are invertible matrices. (d) A ⊗ B is an orthogonal matrix if A, B are orthogonal matrices.
5. Let Ti ∈ L(Ui, Wi), 1 ≤ i ≤ 2, be linear maps. Prove that
(a) Im(T1 ⊗ T2) = (Im T1) ⊗ (Im T2).
(b) ker(T1 ⊗ T2) = ker T1 ⊗ U2 + U1 ⊗ ker T2.
6. Investigate where Theorem 6.2.1 goes wrong if U or W are not finite-dimensional.
7. Let U and W be finite-dimensional vector spaces over F. Prove that the linear map U ⊗ W → W ⊗ U, u ⊗ w ↦ w ⊗ u, is an isomorphism.
8. Let T1 ∈ L(U) and T2 ∈ L(W), where U and W are finite-dimensional vector spaces over F. Prove the following statements: (a) rank (T1 ⊗ T2 ) = (rank T1 )(rank T2 ). (b) If T1 is nilpotent, then so is T1 ⊗ T2 . 9. Prove that if A and B are Hadamard matrices, then so is A ⊗ B.
Review Problems (2)
1. Let V be an n-dimensional vector space, n ∈ N and n ≠ 1. Let x1 and x2 be two linearly independent vectors in V. Prove that there exists a linear functional f such that f(x1) = 0 and f(x2) = 1. Show that if x ∈ V is nonzero, then there exists a linear functional f such that f(x) ≠ 0.
2. Let A = [ 3 −1 1 ; 7 −5 1 ; 6 −6 2 ].
(a) Find the characteristic and the minimal polynomial of A. (b) Find the components of A. (c) Find the formula to compute Al for any integer l using the components of A. 3. Suppose that A ∈ Cn×n and the minimal polynomial of A is (z − α)(z − β)2 (z − γ)3 , where α, β, γ are three distinct complex numbers. (a) Write the general form of the characteristic polynomial of A. (b) What is the condition for A to be power stable, i.e., liml→∞ Al = 0? (c) What is the condition for A to be power convergent, i.e., liml→∞ Al = B? 4. Let T ∶ V → V be a linear transformation on a finite-dimensional vector space V over a field F. (a) Let U be a nontrivial subspace of V, i.e., 0 < dim U < dim V. Define ˆ ∶= V/U. What is the dimension of V? ˆ the quotient space V (b) Suppose that U is T -invariant. Show that T induces a linear transˆ → V. ˆ formation Tˆ ∶ V (c) Suppose that U is T -invariant. Let ψ and ψˆ be the minimal polynomials of T and Tˆ, respectively. Show that ψˆ divides ψ. 5. Let B = [bij ] ∈ Rn×n be a real symmetric matrix. Denote by A = [bij ]n−1 i,j=1 the real symmetric matrix obtained from B by deleting the nth row and column. (a)
i. Show the Cauchy interlacing inequalities λi(B) ≥ λi(A) ≥ λ_{i+1}(B), for i = 1, . . . , n − 1.
ii. Show the inequality λ1(B) + λn(B) ≤ λ1(A) + bnn.
(b) Let G = [gij] ∈ C^{m×n} and assume that ∣gij∣ ≤ K for all i and j. Show that σ1(G) ≤ K√(mn).
6. Let A ∈ C^{n×n}. Assume that A has only two linearly independent eigenvectors.
(a) Show that A has at most two distinct eigenvalues.
(b) Assume that n = 4. List all possible Jordan canonical forms of A.
(c) Assume that n = 4, A is nondiagonalizable and A^2 is diagonalizable. Suppose that the trace of A is equal to 3. What is the Jordan canonical form of A?
7. Let F be a field and assume that A ∈ F^{n×n} and A^4 = A^2.
(a) What are all possible eigenvalues of A?
(b) Suppose that char F ≠ 2 and A is nonsingular. Show that A is diagonalizable.
(c) Can we drop one of the assumptions above in (b) and still conclude that A is diagonalizable?
8. Let A ∈ R^{n×n} such that A^p is a positive matrix for some positive integer p.
(a) Show that ρ(A) > 0. (ρ(A) is the spectral radius of A.)
(b) Show that A has exactly one eigenvalue λ on the circle {z ∈ C, ∣z∣ = ρ(A)}, and this eigenvalue is a simple root of the characteristic polynomial of A.
(c) Show that A has a positive eigenvector corresponding to this eigenvalue λ.
(d) Conclude that λ is either ρ(A) or −ρ(A).
9. Let A ∈ Q^{n×n} such that every entry is ±1. Prove that det A is an integer divisible by 2^{n−1}.
10. Let

A = [  1    2+i   0   −2
      2−i    5   −1    1
       0    −1   −1    1
      −2     1    1   −2 ].
Denote by λ1 ≥ λ2 ≥ λ3 ≥ λ4 the eigenvalues of A. (a) Show that λ1 ≥ 6. (b) Show that λ3 < 0. 11. Let A be an n × n Hermitian matrix with columns g1 , . . . , gn , and eigenvalues λ1 ≥ ⋯ ≥ λn . Let γ1 ≥ ⋯ ≥ γn ≥ 0 be the nonnegative numbers g1∗ g1 , . . . , gn∗ gn arranged in nonincreasing order.
(a) Assume that λn ≥ 0. Show that λ21 + λ22 ≥ γ1 + γ2 . (Hint: Consider A2 .) (b) Can we drop the assumption in (a) that λn ≥ 0? 12. Let F be a finite field, A ∈ Fn×n and #F ≥ n. Assume that the diagonal entries of A are zero. Prove that there exist matrices B, C ∈ Fn×n such that BC − CB = A. 13. Show that it is possible for two matrices A and B to have the same rank while A2 and B 2 have different ranks. 14. Show that if λ1 and λ2 are eigenvalues of A1 , A2 ∈ Fn×n , it is not necessarily true that λ1 + λ2 is an eigenvalue of A1 + A2 . 15. Let A = [aij ] ∈ Fn×n and ∑j aij be independent of i. Prove that (1, 1, . . . , 1)⊺ is an eigenvector of A, and find the corresponding eigenvalue. 16. If T ∈ L(Cn ) is nilpotent, prove that tr T = 0. 17. Show that if V is a vector space and U and W are two subspaces of V, then U + W = V implies U⊥ ∩ W⊥ = {0}. 18. Show the following statements for A, B ∈ Fn×n : (a) If A ∈ AS(n, F), then A2 ∈ S(n, F). (b) If A ∈ AS(n, F) and B ∈ S(n, F), then AB − BA ∈ S(n, F). (c) If A ∈ AS(n, F) and B ∈ S(n, F), then AB ∈ AS(n, F) if and only if AB = BA. 2 (d) If A1 , . . . , Am ∈ S(n, R) such that ∑m i=1 Ai = 0, prove that Ai = 0, for all 1 ≤ i ≤ m.
19. Let A ∈ C^{n×n}. Prove the following statements:
(a) A∗A is Hermitian nonnegative definite.
(b) A∗A = 0 implies A = 0.
(c) If A is Hermitian, then A^2 = 0 implies A = 0.
20. Consider the matrix X = A + Bi, where A, B ∈ S(n, R) and i^2 = −1. Show that X is normal if and only if AB = BA.
21. Show that if A ∈ R^{2×2} is normal with at least one entry equal to zero, then A ∈ S(2, R) or A ∈ AS(2, R).
22. Let Aj ∈ R^{2×2} be defined as follows:

Aj = [ cos(2π/j)  −sin(2π/j) ; sin(2π/j)  cos(2π/j) ],

where j ∈ N. Calculate Aj^n for all n ∈ N.
23. How many permutation matrices are there in F^{n×n}?
24. Show that any upper triangular matrix is similar to a lower triangular matrix.
25. If A is a stochastic matrix and B is similar to A, does it follow that B is also a stochastic matrix? 26. Let V be a finite-dimensional real vector space and T ∈ L(V). Assume that all eigenvalues of T are real. Prove that tr(T 2 ) ≥ 0. 27. Assume that A ∈ Cn×n and tr(Ak ) = 0, for 1 ≤ k ≤ n. Prove that A is nilpotent. 28. Let A be a normal matrix and n ∈ N. Prove that (An ) = (A )n . k
29. Let A ∈ Cn×n and det(zIn − A) = ∏ (z − λj )nj , where λ1 , . . . , λk are the disj=1
tinct eigenvalues of A. Show that the following statements are equivalent: (a) Im(cIn − A) ∩ ker(cIn − A) = {0}, for any c ∈ C. (b) Im(λj In − A) ∩ ker(λj In − A) = {0}, for any 1 ≤ j ≤ k. (c) rank (λj In − A) = n − nj , for 1 ≤ j ≤ k. (d) The minimal polynomial of A has only simple roots. (e) A has n linearly independent eigenvectors. (f) A is diagonalizable. (g) (λj In − A)x = 0 and (λj In − A)2 x = 0 have the same solution space for any 1 ≤ j ≤ k. 30. Assume A is an invertible doubly stochastic matrix and that A−1 is doubly stochastic. Prove that A is a permutation matrix. 31. Prove that for A ∈ Cn×n , (eA )∗ = eA . ∗
32. Let A ∈ Cn×n and λ1 , . . . , λn be its eigenvalues. (a) Prove that A is nonnegative definite if and only if A = U ∗ diag(λ1 , . . . , λn )U , for some unitary matrix U , where λ1 , . . . , λn are nonnegative real numbers. (b) Prove that A is positive definite if and only if A = U ∗ diag(λ1 , . . . , λn )U , for some unitary matrix U , where λ1 , . . . , λn are positive real numbers. 33. Show that if A is similar to B, then tr A = tr B. 34. Let A be an idempotent matrix. Prove that rank A = tr A. 35. Let A ∈ GL(n, Z). Prove that A−1 ∈ Zn×n if and only if ∣ det A∣ = 1. 36. (a) Find matrices A, B ∈ Cn×n so that eA ⋅ eB ≠ eA+B . (b) Prove that if A is nonderogatory, then eA ⋅ eB = eA+B . (c) Prove that if AB = BA, then eA ⋅ eB = eA+B . 37. Let U1 , W1 , U2 , and W2 be vector spaces over a field F, T1 ∈ L(U1 , W1 ) and T2 ∈ L(U2 , W2 ). Prove the following statements: (a) (T1 ⊗ T2 )∗ = T1∗ ⊗ T2∗ .
(b) If T1 and T2 are positive definite, then so is T1 ⊗ T2 . (c) If T1 and T2 are normal, then so is T1 ⊗ T2 . (d) If T1 and T2 are Hermitian, then so is T1 ⊗ T2 . 38. Let A, B ∈ Fn×n and f (z) ∈ F[z]. Show that if A is similar to B, then so is f (A) to f (B). 39. Prove that in Cn×n (or Rn×n ), the set of diagonalizable matrices is an open subset. 40. Let A ∈ Cn×n . Show that every eigenvector of A is also an eigenvector of adj A. 41. Let A, B ∈ Fn×n and k ∈ N. Is it necessarily true that tr(AB)k = tr(Ak ) tr(B k )? 42. Show the following for any A ∈ Cn×n : det(I ⊗ A + A ⊗ I) = (−1)n det pA (−A), where pA (z) denotes the characteristic polynomial of A. 43. Prove that as vector spaces over Q, Rn is isomorphic to R, for any n ∈ N (Note that this means Rn and R are isomorphic as additive abelian groups.) 44. Let V be a finite-dimensional vector space and T ∈ L(V). Prove that there exists n ∈ N such that ker(T n ) ∩ Im(T n ) = {0}. 45. Let A = [aij ] ∈ Cn×n and λ1 , . . . , λn be its eigenvalues. Prove that ∑ni=1 ∣λi ∣2 ≤ n ∑i,j=1 ∣aij ∣2 and the equality holds if and only if A is a normal matrix. 46. Prove that a diagonalizable matrix having only one eigenvalue is a scalar matrix. 47. Show that AS(n, R) is not closed under multiplication. 48. Let A ∈ Cm×n and λ ≠ 0 a complex number. Prove that (λA) = λ1 A . 49. Let G ⊆ GL(n, R) be the set of matrices whose determinants are integer powers of 3. Is G a matrix group? 50. Let A ∈ Cn×n be a Hermitian matrix with eigenvalues λ1 , . . . , λn . Prove that (A − λ1 I)⋯(A − λn I) = 0. 51. Let A, B ∈ Cn×n be Hermitian. Prove that if λ ∈ spec (AB), then one can find λ1 ∈ A and λ2 ∈ B such that λ = λ1 λ2 . 52. Let A and B be n × n nonnegative definite matrices. Prove that Im(AB) ∩ ker(AB) = {0}. 53. Let A ∈ Cn×n with rank A = m. Prove that AA∗ = 21 (A + A∗ ) if and only if A = U [I0m
0 ∗ 0] U ,
for some unitary matrix U .
54. Let S1 , S2 ∈ S(n, R). Assume that if x ∈ Rn satisfies x⊺ S1 x = x⊺ S2 x = 0, then x = 0. 55. Let A ∈ Cn×n . Prove that ∣λ1 ⋯λn ∣ = σ1 ⋯σn , where the λi and σi are the eigenvalues and singular values of A, respectively. 56. Let A1 , . . . , Am ∈ U(n, F), all have diagonal entries zero and m ≥ n. Show m
that ∏ Ai = 0. i=1
57. Let A ∈ Cn×n and λ1 , . . . , λn be its eigenvalues. Prove that the following statements are equivalent: (a) A is normal. (b) AA∗ A = A∗ A2 . (c) The singular values of A are ∣λ1 ∣, . . . , ∣λn ∣. (d) Every eigenvector of A is an eigenvector of A∗ . (e) A∗ A − AA∗ is nonnegative definite. 58. Let A, B ∈ Rn×n and A ≥ B. Prove that ρ(A) ≥ ρ(B). + 59. Let V be an IPS and T ∈ L(V) be an orthogonal operator. Assume that U = {x ∈ V ∶ T x = x} and W = {x − T x ∶ x ∈ V}. Prove that V = U ⊕ W. 60. Let U be the subspace of Fn×n defined as U = {A ∈ Fn×n ∶ tr A = 0}. Find a basis for U. What is the dimension of U? 61. Determine the number of distinct characteristic polynomials of matrices in F2×2 2 . 62. Let A, B be Hermitian idempotents. Prove that spec (AB) ⊆ [0, 1]. 63. Let A = [aij ] ∈ Cn×n and λ ∈ spec A. (a) Prove that ∣λ∣ ≤ nmax∣aij ∣. i,j
n
(b) Prove that ∣ det A∣ ≥ min{∣aii ∣ − ri }ni=1 , where ri = ∑ ∣aij ∣. j=1, j≠i
64. Prove that if A is a positive definite matrix and A = UΣV∗ is a singular value decomposition of A, then U = V. 65. Let V be an n-dimensional vector space and T ∈ L(V). Assume that det(zI − A) = (−1)n z n . Prove that T is nilpotent. 66. Assume that A is a normal matrix. Prove that AA = A A. 67. Let G denote the subgroup of GL(n, F) generated by etA , where t ∈ F, rank A = 1, and A2 = 0. Prove that G = SL(n, F). 68. If A, B ∈ Cn×n are normal, prove that ρ(AB) ≤ ρ(A)ρ(B). A 0 69. Let X = [B C ], where A and C are square matrices. Show that perm X = perm A ⋅ perm C.
70. Let A = [aij ] ∈ Fn×n with aik = aij + ajk , for all i, j, k in [n] such that #{i, j, k} ≠ 1. (a) Prove that A is skew-symmetric. (b) Show that A is determined by its first row. 71. Let V be an IPS and T ∈ L(V). Let T be idempotent. Prove that T is Hermitian if and only if Im T ⊥ ker T . 72. Let F be a field and A ∈ Fn×n . Prove or disprove that A is similar to A+cIn if and only if c = 0. (Hint: Compare the traces of A and A + cIn .) 73. Prove that if A and B are two Hermitian matrices, then tr(AB)2 ≤ tr(A2 B 2 ). 74. Show that ρ(A) ≤ ρ(∣A∣) for any A = [aij ] ∈ Cn×n , where ∣A∣ = [∣aij ∣]. 75. Let T be an orthogonal linear operator. (a) Prove that T is invertible. (b) Prove that T −1 is orthogonal. 76. Let A ∈ Cn×n and σ1 , . . . , σn be its singular values. Prove that n
∣ tr A∣ ≤ ∑ σi . i=1
77. Let A ∈ GL(n, F). Find a polynomial f (z) ∈ F[z] such that A−1 = f (A). (Hint: Use the Cayley-Hamilton theorem.) 78. Prove that if A ∈ Rn×n is primitive, then it has at least n+1 positive entries. 79. Let V be an n-dimensional vector space and U1 , U2 and U3 be three subspaces of it. Prove the following inequality: dim(U1 ∩ U2 ∩ U3 ) ≥ dim U1 + dim U2 + dim U3 − 2n. 80. Let U1 , U2 , and W be subspaces of the vector space V such that U1 ⊕W = U2 ⊕ W. Is it always true that U1 = U2 ? 81. Let V be a finite-dimensional vector space and T ∈ L(V). Let ∞
∞
i=1
i=1
U = ⋃ ker(T i ), W = ⋂ Im(T i ). (a) Prove that U and W are T -invariant subspaces of V. (b) Prove that V = U ⊕ W. 82. Show that for any nonzero vectors x and y in Cn , ∥x − y∥ ≥
1 1 1 (∥x∥ + ∥y∥) ∥ x− y∥ , 2 ∥x∥ ∥y∥
where ∥⋅∥ denotes the induced norm by the Euclidean inner product.
83. Let A, B ∈ Fn×n . Prove that ∣rank A − rank B∣ ≤ min{rank (A + B), rank (A − B)}. 84. Let A ∈ Fm×n and B ∈ Fn×m . Do AB and BA have the same nonzero singular values? 85. Let A, B ∈ GL(n, F) and AB − BA ∈/ GL(n, F). Prove that 1 ∈ spec (A−1 B −1 AB). 86. State and show the polar decomposition for rectangular matrices. (See Problem 4.15.2-18.) 87. Let A ∈ Fn×n , λ ∈ spec A and f (z) ∈ F[z]. Show that if f (A) = 0, then f (λ) = 0. 88. Let A ∈ C n×n , p(z) = det(zIn − A) and q(z) = adj (zIn − A). Prove that every entry in the matrix z i q(z) − q(z)z i is divisible by p(z), for i ∈ N. 89. Let A ∈ GL(n, F). When is A similar to A−1 ? 90. Let A ∈ Cn×n and spec A ⊆ R. Prove that A is similar to A∗ . 91. Can a nonzero matrix be both idempotent an nilpotent? 92. Let A, B ∈ Fn×n . Prove that for any C ∈ Fn×n , A rank [ 0
C ] ≥ rank A + rank B. B
93. Let H ∈ Rn×n be a Hadamard matrix that contains a Jm . Prove that n ≥ m2 . 94. Prove that for any permutation matrix P ∈ Rn×n , P n! = In . 95. Prove that any n × n permutation matrix can be written as the product of at most n − 1 symmetric permutation matrices. 96. Let A = [aij ], B = [bij ] be n × n positive matrices. Set C = [aij ⋅ bij ]. Prove that ρ(C) ≤ ρ(AB). √ n 97. Let A ∈ Cn×n be nonnegative definite. Prove that det A ≤ n1 tr A. 98. Let A, B ∈ Cn×n be Hermitian. Prove that tr(AB)i is real for any i ∈ N. 99. Let A, B ∈ Cn×n , B is normal, AB = BA and #spec B = n. Prove that A is normal. 100. Let P ∈ Rn×n be permutation and irreducible. Prove that P n = In . (See Problem 94 above.) 101. Prove the eigenvalue trace formula: For any nonnegative integer k, if A ∈ Fn×n with eigenvalues λ1 , . . . , λn , then n
tr(Ak ) = ∑ λki . i=1
102. Let A ∈ AS(3, C) and B ∈ S(3, C). Assume that the polynomial f (x) = det(A + xB) has a multiple root. Prove that det(A + B) = det B. 103. Prove that the equation X n = I2 has at least n solutions in R2×2 . 104. Let A, B ∈ Cn×n , where n ≥ 2. (a) Prove that rank AB − rank BA ≤ ⌊ n2 ⌋. (b) Show that there are two matrices A, B ∈ Cn×n such that n rank AB − rank BA = ⌊ ⌋ . 2
Challenging Problems 1. Let A, B ∈ Rn×n , 3 ∣ n and A2 +B 2 = AB. Prove that AB −BA is invertible. 2. For any A ∈ Cn×n , define C(A) = {B ∈ Cn×n ∶ AB = BA}. Prove that min{dimC C(A) ∶ A ∈ Cn×n } = n. 3. Let A = [aij ] ∈ Rn×n and ∑ni=1 ∑nj=1 aij = n. Prove that + (a) ∣ det A∣ ≤ 1. (b) If ∣ det A∣ = 1 and λ ∈ C is an eigenvalue of A, then ∣λ∣ = 1. 4. Let f ∶ Rn×n → R be a function such that f (AB) = f (A)f (B), for any A, B ∈ Rn×n and f ≡/ 0. Prove that A ∈ Rn×n is invertible if and only if f (A) ≠ 0. 5. Let A ∈ S(n, F ) and rank A = n − 1. Prove that there exists k ∈ [n] such that the matrix obtained from A by deleting its kth row and column is full-rank. 6. Let A, B ∈ Cn×n and AB − BA can be written as a linear combination of A and B over C. Prove that A, B have at least one common nonzero eigenvector. 7. Let n ≥ 3 and A1 , . . . , An ∈ Fn×n . Assume that A1 ⋯An = In , det A1 = ⋯ = det An = 1 and A1 − A2 , A1 − A3 , . . . , A1 − An are nonzero distinct scalar matrices. Prove that A1 + (−1)n An is a scalar matrix. 8. Let A, B ∈ Rn×n are two symmetric matrices. Prove that f (t) = tr(A+Bt)m + is a polynomial with nonnegative coefficients, for any m ∈ N. 9. Let A, B ∈ F2×2 , AB ≠ BA and p(x) ∈ F[x] such that p(AB) = p(BA). Prove that p(AB) = cI2 , for some c ∈ F. 10. Let A ∈ Fn×n . Show that there exists B ∈ Fn×n such that AB is idempotent. 11. Let G = {A1 , A2 , . . . , Ar } be a finite subgroup of Rn×n under matrix multiplication such that ∑ri=1 tr Ai = 0. Prove that ∑ri=1 Ai is the n × n zero matrix. 12. Let S = { [ac
b d]
∈ R2×2 ∶ a, b, c, d (in that order) form an arithmetic pro-
gression}. Find all A ∈ S for which there is some integer k > 1 such that Ak is also in S. 245
13. Let A, B, C ∈ Rn×n such that AC = CB and A, B have the same characteristic polynomial. Prove that det(A − CX) = det(B − CX), for any X ∈ Rn×n . 14. Let A = [aij ] ∈ Rn×n is defined as aij = det A.
1 , min{i,j}
1 ≤ i, j ≤ n. Compute
15. Let A, B ∈ R^{n×n} satisfying A^{−1} + B^{−1} = (A + B)^{−1}. Prove that det A = det B. Does the same conclusion follow for matrices with complex entries?
16. Evaluate the determinant of the n × n matrix I_n + A, where A = [a_{ij}] has entries a_{ij} = cos(2πi/n + 2πj/n), 1 ≤ i, j ≤ n.
17. Let A = [3 2; 4 3] ∈ R^{2×2} and let d_n denote the greatest common divisor of the entries of A^n − I. Show that lim_{n→∞} d_n = ∞.
18. Consider the following subgroups of SL(2, R):
K = { [cos θ  −sin θ; sin θ  cos θ] : θ ∈ R },
A = { [r  0; 0  1/r] : r ∈ R_+ },
N = { [1  x; 0  1] : x ∈ R }.
Prove that we have a decomposition SL(2, R) = KAN. This formula is called the Iwasawa decomposition of the group.
19. Let A, B ∈ C^{n×n} such that A^2 B + BA^2 = 2ABA. Prove that there exists k ∈ N such that (AB − BA)^k = 0.
20. Let F be a subfield of C such that [C : F] < ∞. Prove that [C : F] is either 1 or 2.
21. Prove that the numbers 1, √p_1, . . . , √p_k are linearly independent over Q, where the p_i are distinct prime numbers.
22. Let 1 < n_1 < n_2 < ⋯ be a sequence of integers such that log n_1, log n_2, . . . are linearly independent over Q. Prove that lim_{k→∞} n_k/k = ∞.
23. Let A ∈ C^{n×n} and define T ∈ L(C^{n×n}) as T(B) = AB − BA.
(a) Prove that rank T ≤ n^2 − n.
(b) Determine all eigenvalues of T in terms of the eigenvalues of A.
24. Prove that there are infinitely many sets {x_1, . . . , x_n} ⊆ C satisfying the following condition: given all off-diagonal entries of A ∈ C^{n×n}, it is possible to select diagonal entries x_1, . . . , x_n so that the eigenvalues of A are given complex numbers.
25. Consider the real numbers λ_1 ≤ ⋯ ≤ λ_n, d_1 ≤ ⋯ ≤ d_n, such that ∑_{i=1}^k d_i ≥ ∑_{i=1}^k λ_i, for k ∈ [n − 1], and ∑_{i=1}^n d_i = ∑_{i=1}^n λ_i. Prove that there exists an orthogonal matrix P such that the diagonal of the matrix P^⊺ Λ P, where Λ = diag(λ_1, . . . , λ_n), is occupied by the numbers d_1, . . . , d_n.
Appendices
Appendix A
Sets and mappings
A set is any unordered collection of distinct objects. These objects are called the elements or members of the set. We indicate membership in or exclusion from a set using the symbols ∈ and ∉, respectively. The set containing no elements is known as the empty set.
The set of elements common to two given sets A and B is known as their intersection and written as A ∩ B. The set of elements appearing in at least one of these sets is called the union, denoted by A ∪ B. We say that A and B are equal sets, written A = B, if these two sets contain precisely the same elements. Given sets A and B, whenever each element of A is also an element of B, we say that A is a subset of B and write A ⊆ B. If A and B have no elements in common, they are disjoint. Removing all elements from a set B that belong to another set A creates a new set: the set difference B ∖ A. We define the power set P(A) of a set A to be the set of all subsets of A, including the empty set and the set A itself.
Sets are often labeled by indices: given sets W_a, W_b, . . . , W_z, the subscripts a, b, . . . are known as indices, and the set I = {a, b, c, . . . , z} of all indices is called the index set. The collection of all the sets W_a through W_z comprises a family of sets, in the sense that they are related by a common definition. It may help to remember that each index indicates a particular set in the family.
Given two sets A and B, their Cartesian product A × B is the set consisting of all ordered pairs (a, b) with a ∈ A and b ∈ B. The disjoint union of A and B is denoted by A ∪* B and is defined as A ∪* B = (A × {0}) ∪ (B × {1}), where × is the Cartesian product.
A relation from a set A to a set B is a subset of A × B, where A × B = {(a, b) : a ∈ A and b ∈ B}. A relation on a set A is a relation from A to A, i.e., a subset of A × A. Given a relation R on A, i.e., R ⊆ A × A, we write x ∼ y if (x, y) ∈ R. An equivalence relation on a set A is a relation on A that satisfies the following properties:
(i) Reflexivity: For all x ∈ A, x ∼ x,
(ii) Symmetry: For all x, y ∈ A, x ∼ y implies y ∼ x,
(iii) Transitivity: For all x, y, z ∈ A, x ∼ y and y ∼ z imply x ∼ z.
If ∼ is an equivalence relation on A and x ∈ A, the set E_x = {y ∈ A : x ∼ y} is called the equivalence class of x. Another notation for the equivalence class E_x of x is [x]. A collection of nonempty subsets A_1, A_2, . . . of A is called a partition of A if it has the following properties:
(i) A_i ∩ A_j = ∅ if i ≠ j,
(ii) ⋃_i A_i = A.
The following fundamental results concern the connection between equivalence relations and partitions of a set. See [25] for their proofs and more details.
Theorem A.0.1. Given an equivalence relation on a set A, the set of distinct equivalence classes forms a partition of A.
Theorem A.0.2. Given a partition of A into sets A_1, . . . , A_n, the relation defined by "x ∼ y if and only if x and y belong to the same set A_i from the partition" is an equivalence relation on A.
Definition A.0.3. Let X and Y be two sets. Then
f : X → Y, x ↦ f(x)
is called a mapping if for each x ∈ X, f (x) represents exactly one element in Y . The set X in the above definition is called the domain of the mapping f . The set Y is called the codomain of f . The notation f (x) means that the rule f is applied to the element x ∈ X. The element y = f (x) is called the image of x under the mapping f . If A ⊂ X, the set f (A) is called the image of A under f and is defined as follows: f (A) = {y ∈ Y ∶ ∃x ((x ∈ A) and (f (x) = y))}. If A = X, then f (X) is called the image of the mapping f and we use the notation f (X) = Im f . For y ∈ Y , we define f −1 (y) = {x ∈ X ∶ f (x) = y}. In general, for B ⊆ Y we define f −1 (B) = {x ∈ X ∶ f (x) ∈ B}. It is clear that f −1 (Y ) = X. Definition A.0.4. The mapping f ∶ X → Y is called injective if for any two elements x1 , x2 ∈ X, x1 ≠ x2 implies f (x1 ) ≠ f (x2 ). Definition A.0.5. The mapping f ∶ X → Y is called surjective (onto) if f (X) = Y. Definition A.0.6. The mapping f ∶ X → Y is called bijective (one-to-one) if it is injective and surjective simultaneously.
Let us consider two mappings f ∶ X → Y and g ∶ Y → Z. Choosing an arbitrary element x ∈ X, we can apply f to it and we get f (x) ∈ Y . Then we can apply g to f (x) and this successive application of two mappings g(f (x)) yields a rule that associates each x ∈ X with some uniquely determined element z = g(f (x)) ∈ Z. That is, we have a mapping h ∶ X → Z. This mapping is denoted by h = g ○ f and is called the composition of f and g. Theorem A.0.7. If f and g are two injective (surjective) mappings, then g ○ f is an injective (a surjective) mapping. The proof is left as an exercise. Definition A.0.8. A mapping f −1 ∶ Y → X is called the inverse mapping for the mapping f ∶ X → Y if f ○ f −1 = idY and f −1 ○ f = idX , where idX ∶ X → X and idY ∶ Y → Y are defined as idX (x) = x and idY (y) = y, for any x ∈ X and y ∈ Y . Note that idX and idY are called identity maps on X and Y , respectively. Definition A.0.9. A mapping l ∶ Y → X is called a left inverse to the mapping f ∶ X → Y if l ○ f = idX . Definition A.0.10. A mapping r ∶ Y → X is called a right inverse to the mapping f ∶ X → Y if f ○ r = idY . Theorem A.0.11. A mapping f ∶ X → Y possesses the left inverse mapping l if and only if it is injective. Theorem A.0.12. A mapping f ∶ X → Y possesses the right inverse mapping r if and only if it is surjective. The proofs are left as exercises. Theorem A.0.13. A mapping f ∶ X → Y possesses both left and right inverse mapping l and r if and only if it is bijective. In this case, the mappings l and r are uniquely determined. They coincide with each other thus determining the unique bilateral inverse mapping l = r = f −1 . The proof is left as an exercise.
Appendix B
Basic facts in analysis and topology
Definition B.0.1. A metric space is an ordered pair (X, d), where X is a set and d is a metric on X, i.e., a function d : X × X → R such that for any x, y, z ∈ X the following conditions hold:
(i) Nonnegativity: d(x, y) ≥ 0.
(ii) Identity of indiscernibles: d(x, y) = 0 if and only if x = y.
(iii) Symmetry: d(x, y) = d(y, x).
(iv) Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).
For any point x in X, we define the open ball of radius r > 0 (where r is a real number) about x as the set B(x, r) = {y ∈ X : d(x, y) < r}. A subset U of X is called open if for every x in U there exists an r > 0 such that B(x, r) is contained in U. The complement of an open set is called closed. A metric space X is called bounded if there exists some real number r such that d(x, y) ≤ r, for all x, y in X.
Example B.1. (R^n, d) is a metric space for d(x, y) = √(∑_{i=1}^n (x_i − y_i)^2), where x = (x_1, . . . , x_n) and y = (y_1, . . . , y_n). This metric space is known as the Euclidean metric space. The Euclidean metric on C^n is the Euclidean metric on R^{2n}, where C^n is viewed as R^{2n} ≡ R^n × R^n.
Definition B.0.2. A sequence is a function f defined on the set of all positive integers. If f(n) = s_n, for n ∈ N, it is customary to denote the sequence f by the symbol {s_n}. The values of f, that is, the elements s_n, are called the terms of the sequence. If X is a set and s_n ∈ X, for all n ∈ N, then {s_n} is said to be a sequence in X. Let {s_n} ⊆ X be a given sequence. The sequence s_n is convergent to s ∈ X if the following condition holds:
"for any positive real ε, there exists N such that d(s_n, s) < ε, for any n > N."
This is denoted by s_n → s or lim_{n→∞} s_n = s.
Definition B.0.3. For X = R, the limit inferior of {s_n} is denoted by lim inf_{n→∞} s_n and is defined by lim inf_{n→∞} s_n := lim_{n→∞} (inf_{m≥n} s_m). Similarly, the limit superior of {s_n} is denoted by lim sup_{n→∞} s_n and defined by lim sup_{n→∞} s_n := lim_{n→∞} (sup_{m≥n} s_m). Note that lim inf_{n→∞} s_n and lim sup_{n→∞} s_n can take the values ±∞. One can show that s_n → s if and only if lim inf_{n→∞} s_n = s = lim sup_{n→∞} s_n.
The subset Y of X is called compact if every sequence in Y has a subsequence that converges to a point in Y. It can be shown that if F ⊆ R^n (or F ⊆ C^n), the following statements are equivalent: (i) F is compact; (ii) F is closed and bounded. The interested reader is referred to [24] for the proof of this theorem.
The sequence {s_n} ⊆ X is called Cauchy if the following condition holds: "For any positive real number ε, there is a positive integer N such that for all natural numbers m, n > N, d(s_n, s_m) < ε." It can be shown that every convergent sequence is a Cauchy sequence. But the converse is not necessarily true. For example, if X = (0, ∞) with d(x, y) = |x − y|, then s_n = 1/n is a Cauchy sequence in X but not convergent (as s_n converges to 0 and 0 ∉ X). A metric space X in which every Cauchy sequence converges in X is called complete. For example, the set of real numbers is complete under the metric induced by the usual absolute value, but the set of rational numbers is not. (Why?)
Definition B.0.4. A topological space is a set X together with a collection of its subsets T, whose elements are called open sets, satisfying
(i) ∅ ∈ T,
(ii) X ∈ T,
(iii) the intersection of a finite number of sets in T is also in T,
(iv) the union of an arbitrary number of sets in T is also in T.
(Here, P(X) denotes the power set of X and T ⊂ P(X).)
Definition B.0.5. A closed set is a set whose complement is an open set.
Let (X, T) and (Y, T′) be two topological spaces. A map f : X → Y is said to be continuous if U ∈ T′ implies f^{−1}(U) ∈ T, for any U ∈ T′. The interested reader is encouraged to investigate the relation between continuity in metric spaces and continuity in topological spaces. In particular, the above definition invites one to check whether the inverse image of an open set under a continuous function between metric spaces is open, in the metric space sense.
Definition B.0.6. A topological space X is said to be path connected if for any two points x_1 and x_2 ∈ X, there exists a continuous function f : [0, 1] → X such that f(0) = x_1 and f(1) = x_2.
The reader is referred to [17] for more results on topological spaces.
We end this section with big O notation. In mathematics, it is important to get a handle on the approximation error. For example, we write
e^x = 1 + x + x^2/2 + O(x^3),
to express the fact that the error is smaller in absolute value than some constant times x^3 if x is close enough to 0. For a formal definition, suppose f(x) and g(x) are two functions defined on some subsets of the real numbers. We write f(x) = O(g(x)), x → 0, if and only if there exist positive constants ε and C such that |f(x)| < C|g(x)|, for all |x| < ε.
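As a small numerical illustration of this definition (a sketch added here for concreteness; it uses NumPy, and the interval [−0.1, 0.1] is simply a convenient choice of "x close to 0"), one can check that the remainder of the quadratic Taylor polynomial of e^x is dominated by a constant multiple of |x|^3:

```python
import numpy as np

# Compare the remainder of the quadratic Taylor polynomial of e^x with |x|^3.
xs = np.linspace(-0.1, 0.1, 201)
xs = xs[xs != 0]                      # avoid dividing by zero at x = 0
remainder = np.exp(xs) - (1 + xs + xs**2 / 2)
ratio = np.abs(remainder) / np.abs(xs) ** 3

# The ratios stay bounded (close to 1/6), so the remainder is O(x^3) as x -> 0.
print(ratio.min(), ratio.max())
```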
Appendix C
Basic facts in graph theory
A graph G consists of a set V (or V (G)) of vertices, a set E (or E(G)) of edges, and a mapping associating to each edge e ∈ E(G) an unordered pair x, y of vertices called the ends of e. The cardinality of V(G) is called the order of G. Also, the cardinality of E(G) is called the degree of G. We say an edge is incident with its ends, that is, it joins its ends. Two vertices are adjacent if they are joined by a graph edge. The adjacency matrix, also sometimes called the connection matrix of a graph, is a matrix with rows and columns labeled by graph vertices, with a 1 or 0 in position (vi , vj ) according to whether vi and vj are adjacent or not. (Here vi and vj are two vertices of the graph.) Example C.1. Consider the following graph:
[Figure: a graph on the vertices x1, x2, x3, x4 in which x4 is joined to each of x1, x2, x3 and there are no other edges.]
The two vertices x1 and x2 are not adjacent. Also, x1 and x4 are adjacent. Here, the adjacency matrix is
[0 0 0 1]
[0 0 0 1]
[0 0 0 1]
[1 1 1 0].
For the graph G = (V, E), a matching M in G is a set of pairwise nonadjacent edges, that is, no two edges share a common vertex. A perfect matching is a matching that matches all vertices of the graph. A bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint sets V1 and V2 such that every edge connects a vertex in V1 to one in V2. Here, V1 and V2 are called the bipartite sets of G. If #V1 = #V2, G is called balanced bipartite. A graph is called regular if each vertex has the same degree. A regular graph with vertices of degree k ∈ N is called a k-regular graph.
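As a computational aside (an illustration we add, not part of the original text), the adjacency matrix of Example C.1 can be entered directly, and its powers count walks: the (i, j) entry of A^k is the number of walks of length k from vertex i to vertex j, a standard fact that connects this matrix to the walks defined later in this appendix.

```python
import numpy as np

# Adjacency matrix of the graph in Example C.1 (x4 is joined to x1, x2, x3).
A = np.array([[0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [1, 1, 1, 0]])

# The (i, j) entry of A^k counts walks of length k from vertex i to vertex j.
A2 = A @ A
print(A2)
print(A2[0, 1])   # 1: the unique length-2 walk x1 -> x4 -> x2
```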
A directed graph, abbreviated as digraph, is denoted by D = (V, E). V is the set of vertices and E is the set of directed edges, abbreviated as diedges, in D. So E is a subset of V × V = {(v, w) : v, w ∈ V}. Thus (v, w) ∈ E is a directed edge from v to w. For example, the graph D = ([4], {(1, 2), (2, 1), (2, 3), (2, 4), (3, 3), (3, 4), (4, 1)}) has 4 vertices and 7 diedges. The diedge (v, v) ∈ E is called a loop, or selfloop. Let
deg_in v := #{(w, v) ∈ E},   deg_out v := #{(v, w) ∈ E},
the number of diedges into v and out of v in D. Here, deg_in, deg_out are called the in and out degrees, respectively. Clearly, we have
∑_{v∈V} deg_in v = ∑_{v∈V} deg_out v = #E.    (C.0.1)
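For the example digraph above, identity (C.0.1) is easy to confirm directly; the short script below (added only as an illustration) tallies the in and out degrees.

```python
from collections import Counter

# The example digraph D = ([4], E) from the text.
E = [(1, 2), (2, 1), (2, 3), (2, 4), (3, 3), (3, 4), (4, 1)]
V = [1, 2, 3, 4]

deg_out = Counter(v for (v, w) in E)
deg_in = Counter(w for (v, w) in E)

# Both sums equal #E = 7, as asserted by (C.0.1).
print(sum(deg_in[v] for v in V), sum(deg_out[v] for v in V), len(E))
```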
A vertex, v ∈ V , is called isolated if degin (v) = degout (v) = 0. A multigraph G = (V, E) has undirected edges, which may be multiple, and may have multiple loops. A multidigraph D = (V, E) may have multiple diedges. Each multidigraph D = (V, E) induces an undirected multigraph G(D) = (V, E ′ ), where each diedge (u, v) ∈ E is viewed as an undirected edge (u, v) ∈ E ′ . (Each loop (u, v) ∈ E will appear twice in E ′ .) Vice versa, a multigraph G = (V, E ′ ) induces a multidigraph D(G) = (V, E), where each undirected edge (u, v) induces diedges (u, v) and (v, u), when u ≠ v. The loop (u, u) appears p times in D(G) if it appears p times in G. Most of the following notions are the same for graphs, digraphs, multigraphs or multidigraphs, unless stated otherwise. We state these notions for directed multidigraphs D = (V, E) mostly. Definition C.0.1. 1. A walk in D = (V, E) given by v0 v1 ⋯vp , where (vi−1 , vi ) ∈ E for i = 1, . . . , p. One views it as a walk that starts at v0 and ends at vp . The length of the walk p is the number of edges in the walk. 2. A path is a walk where vi ≠ vj for i ≠ j. 3. A closed walk is a walk where vp = v0 . 4. A cycle is a closed walk where vi ≠ vj , for 0 ≤ i < j < p. A loop (v, v) ∈ E is considered a cycle of length 1. Note that a closed walk vwv, where v ≠ w, is considered a cycle of length 2 in a digraph, but not a cycle in an undirected multigraph! 5. A Hamiltonian cycle is a cycle through the graph that visits each node exactly once. 6. Two vertices v, w ∈ V , v ≠ w are called strongly connected if there exist two walks in D: The first starts in v and ends in w, and the second starts in w and ends in v. For multigraphs G = (V, E), the corresponding notion is u, v are connected.
7. A multidigraph D = ([n], E) is called strongly connected if either n = 1 and (1, 1) ∈ E, or n > 1 and any two vertices in D are strongly connected. 8. A multidigraph g = (V, E) is called connected if either n = 1, or n > 1, and any two vertices in G are connected. (Note that a simple graph on one vertex G = ([1], ∅) is considered connected. But the induced directed graph D(G) = G is not strongly connected.) 9. For W ⊂ V , the subgraph G(W ) = (W, E(W )), of multigraph G = (V, E), is called a connected component of G if G(W ) is connected, and for any W ⫋ U ⊂ V the induced subgraph G(U ) = (U, E(U )) is not connected. 10. A forest G = (V, E) is called a tree if it is connected. 11. A diforest D = (V, E) is called a ditree if the induced multigraph G(D) is a tree. 12. A dipath means a directed path and a dicycle means a directed cycle. 13. Two multidigraphs D1 = (V1 , E1 ), D2 = (V2 , E2 ) are called isomorphic if there exists a bijection φ ∶ V1 → V2 that induces a bijection φˆ ∶ E1 → E2 . That is, if (u1 , v1 ) ∈ E1 is a diedge of multiplicity k in E1 , then (φ(u1 ), φ(v1 )) ∈ E2 is a diedge of multiplicity k and vice versa. The interested reader is referred to [4] to see more details about the above concepts.
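To make strong connectivity concrete, here is a small sketch (our addition, with a helper name of our own choosing, assuming n > 1). It uses the standard reachability criterion: in a digraph on n vertices with adjacency matrix A, vertex w is reachable from v exactly when the (v, w) entry of (I + A)^{n−1} is positive.

```python
import numpy as np

def is_strongly_connected(n, edges):
    """Check strong connectivity of a digraph on vertices 1..n via reachability."""
    A = np.zeros((n, n), dtype=int)
    for v, w in edges:
        A[v - 1, w - 1] = 1
    # (I + A)^(n-1) has a positive (i, j) entry iff j is reachable from i
    # by a walk of length at most n - 1.
    R = np.linalg.matrix_power(np.eye(n, dtype=int) + A, n - 1)
    return bool((R > 0).all())

# The example digraph from earlier in this appendix is strongly connected.
E = [(1, 2), (2, 1), (2, 3), (2, 4), (3, 3), (3, 4), (4, 1)]
print(is_strongly_connected(4, E))   # True
```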
C.1 Worked-out problems
1. Let G = (V, E) be a multigraph. Prove that G is a disjoint union of its connected components. That is, there is a unique decomposition of V into ⋃_{i=1}^k V_i, up to relabeling of V_1, . . . , V_k, such that the following conditions hold:
(a) V_1, . . . , V_k are nonempty and mutually disjoint.
(b) Each G(V_i) = (V_i, E(V_i)) is a connected component of G.
(c) E = ⋃_{i=1}^k E(V_i).
Solution: We introduce the following relation ∼ on V . First, we assume that v ∼ v for each v ∈ V . Second, for v, w ∈ V , v ≠ w we say that v ∼ w if v is connected to w. It is straightforward to show that ∼ is an equivalence relation. Let V1 , . . . , Vk be the equivalence classes in V . That is v, w ∈ Vi if and only if v and w are connected. The rest of the proposition follows straightforwardly. 2. Let D = (V, E) be a multidigraph. Prove that D is diforest if and only if it is isomorphic to a digraph D1 = ([n], E1 ) such that if (i, j) ∈ E1 , then i < j.
Solution: Clearly, D_1 cannot have a cycle. So if D is isomorphic to D_1, then D is a diforest. Assume now that D = (V, E) is a diforest. Let V_i be the set of all vertices in V having height i, for i = 0, . . . , k, where k is the maximal height of all vertices in D. Observe that from the definition of height it follows that if (u, w) ∈ E, where u ∈ V_i and w ∈ V_j, then i < j. Rename the vertices of V such that V_i = {n_i + 1, . . . , n_{i+1}}, where 0 = n_0 < n_1 < ⋯ < n_{k+1} = n := #V. Then one obtains the isomorphic graph D_1 = ([n], E_1), such that if (i, j) ∈ E_1, then i < j.
C.2 Problems
1. Assume v_1, . . . , v_p is a walk in a multidigraph D = (V, E). Show that it is possible to subdivide this walk into walks v_{n_{i−1}+1} ⋯ v_{n_i}, i = 1, . . . , q, where n_0 = 0 < n_1 < ⋯ < n_q = p, and each such walk is either a cycle or a maximal path. Erase all cycles in v_1, . . . , v_p and apply the above statement to the new walk. Conclude that a walk can be "decomposed" into a union of cycles and at most one path.
2. Let D be a multidigraph. Assume that there exists a walk from v to w. Show that
(a) if v ≠ w, there exists a path from v to w of length at most #V − 1;
(b) if v = w, there exists a cycle containing v of length at most #V.
3. If G is a k-regular bipartite graph, prove that G has k perfect matchings, no pair of which have an edge in common. (Hint: Use Theorem 1.8.1.)
Appendix D
Basic facts in abstract algebra
D.1 Groups A binary operation on a set A is a map that sends elements of the Cartesian product A × A to A. A group denoted by G is a set of elements with a binary operation ⊕, i.e., a ⊕ b ∈ G, for each a, b ∈ G. This operation is (i) associative: (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c); (ii) there exists a neutral element ◯ such that a⊕◯ = a, for each a ∈ G; (iii) for each a ∈ G, there exists a unique ⊖a such that a⊕(⊖a) = ◯. The group G is called abelian (commutative) if a ⊕ b = b ⊕ a, for each a, b ∈ G. Otherwise, it is called nonabelian (noncommutative). Examples of abelian groups 1. The following subsets of complex numbers where ⊕ is the standard addition +, ⊖ is the standard subtraction −, and the neutral element is 0: (a) The set of integers Z. (b) The set of rational numbers Q. (c) The set of real numbers R. (d) The set of complex numbers C. 2. The following subsets of C∗ ∶= C/{0}, i.e., all nonzero complex numbers, where the operation ⊕ is the standard product, the neutral element is 1, and ⊖a is a−1 : (a) Q∗ ∶= Q/{0}. (b) R∗ ∶= R/{0}. (c) C∗ ∶= C/{0}. An example of a nonabelian group The quaternion group is a group with eight elements that is denoted as (Q8 , ⊕) 261
and defined as follows: Q8 = {1, −1, i, −i, j, −j, k, −k}, 1 is the identity element, and (−1) ⊕ (−1) = 1, (−1) ⊕ i = −i, (−1) ⊕ j = −j, (−1) ⊕ k = −k, i ⊕ j = k, j ⊕ i = −k, j ⊕ k = i, k ⊕ j = −i, k ⊕ i = j, i ⊕ k = −j, and the remaining relations can be deduced from these. Clearly, Q8 is not abelian, for instance, i ⊕ j ≠ j ⊕ i.
D.2 Rings From now on, we will denote the operation ⊕ by +. An abelian group R is called a ring if there exists a second operation, called a product, denoted by ab, for any a, b ∈ R, which satisfies (i) associativity: (ab)c = a(bc); (ii) distributivity: a(b + c) = ab + ac, (b + c)a = ba + ca, (a, b, c ∈ R); (iii) existence of identity 1 ∈ R: 1a = a1 = a, for all a ∈ R. Also, R is called a commutative ring if ab = ba, for all a, b ∈ R. Otherwise, R is called a noncommutative ring. Examples of rings 1. For a positive integer n > 1, the set of n × n complex-valued matrices, denoted by Cn×n , with the addition A + B, product AB, and with the identity In , the n × n identity matrix. Furthermore, the following subsets of Cn×n : (a) Zn×n , the ring of n × n matrices with integer entries (noncommutative ring). (b) Qn×n , the ring of n×n matrices with rational entries (noncommutative ring). (c) Rn×n , the ring of n × n matrices with real entries (noncommutative ring). (d) Cn×n (noncommutative ring). (e) D(n, S), the set of n×n diagonal matrices with entries in S = Z, Q, R, C (commutative ring). (Diagonal matrices are square matrices in which the entries outside the main diagonal are zero.) 2. Zm = Z/(mZ), all integers modulo a positive integer m, with the addition and multiplication modulo m. Clearly, #Zm , the number of elements in Zm , is m and Zm can be identified with {0, 1, . . . , m − 1}.
A subring of a ring R is a subset S of R that contains 1 and is closed under subtraction and multiplication. An ideal is a special kind of subring. A subring I of R is a left ideal if a ∈ I, r ∈ R imply ra ∈ I. A right ideal is defined similarly. A two-sided ideal (or just an ideal) is both left and right ideal. That is, a, b ∈ I, r ∈ R imply a − b, ar, ra ∈ I. It can be shown that if R is a commutative ring and a ∈ R, then the set I = {ra ∶ r ∈ R} is an ideal of R. This ideal is called the principal ideal generated by a and is denoted by ⟨a⟩. If I is an ideal of R and r ∈ R, r + I is defined as {r + x ∶ x ∈ I}. Consider the set R/I of all cosets a + I, where a ∈ R. On this set, we define addition and multiplication as follows: (a + I) + (b + I) = (a + b) + I, (a + I)(b + I) = ab + I. With these two operations, R/I is a ring called the quotient ring by I. (Why is R/I a ring?)
D.3 Integral domains
Let R be a ring. A zero divisor in R is an element r ≠ 0 such that there exists an s ∈ R with s ≠ 0 and rs = 0. For example, in Z_6 we have 2 ⋅ 3 = 0, hence both 2 and 3 are zero divisors. A commutative ring R is an integral domain if R ≠ {0} and there do not exist zero divisors in R.
Examples of integral domains
(a) Z, the ring of integer numbers with standard addition and product.
(b) Q, the ring of rational numbers with standard addition and product.
D.4 Fields and division rings A commutative ring R is called a field if each nonzero element a ∈ R has a unique inverse, denoted by a−1 such that aa−1 = 1. A field is usually denoted by F. A ring R is called a division ring if any nonzero element a ∈ R has a unique inverse. Examples of fields (a) Q, R, C, with the standard addition and product. (b) Zm , where m is a prime integer. An example of a division ring: the quaternion ring Let QR = {(x1 , x2 , x3 , x4 ) ∣ xi ∈ R, i = 1, 2, 3, 4}. Define + and ⋅ on QR as follows: (x1 , x2 , x3 , x4 ) + (y1 , y2 , y3 , y4 ) = (x1 + y1 , x2 + y2 , x3 + y3 , x4 + y4 ), (x1 , x2 , x3 , x4 ) ⋅ (y1 , y2 , y3 , y4 ) = (x1 y1 − x2 y2 − x3 y3 − x4 y4 , x1 y2 + x2 y1 + x3 y4 − x4 y3 , x1 y3 + x3 y1 + x4 y2 − x2 y4 , x1 y4 + x2 y3 − x3 y2 + x4 y1 ).
One can view QR as four-dimensional space vector space over R, consisting of vectors x = x1 + x2 i + x3 j + x4 k. Then the product x ⋅ y is determined by the product (⊕) in the nonabelian group Q8 introduced in section D.1. From the definition of + and ⋅, it follows that + and ⋅ are binary operations on QR . Now + is associative and commutative because addition is associative and commutative in R. We also note that (0, 0, 0, 0) ∈ QR is the additive identity, and if (x1 , x2 , x3 , x4 ) ∈ QR , then (−x1 , −x2 , −x3 , −x4 ) ∈ QR and −(x1 , x2 , x3 , x4 ) = (−x1 , −x2 , −x3 , −x4 ). Hence, (QR , +) is a commutative group. Similarly, ⋅ is associative and (1, 0, 0, 0) ∈ QR is the multiplicative identity. Let (x1 , x2 , x3 , x4 ) ∈ QR be a nonzero element. Then, N = x21 + x22 + x23 + x24 ≠ 0 and N ∈ R. Thus, (x1 /N, −x2 /N, −x3 /N, −x4 /N ) ∈ QR . It is verified that (x1 /N, −x2 /N, −x3 /N, −x4 /N ) is the multiplicative inverse of (x1 , x2 , x3 , x4 ). Thus, QR is a division ring called the ring of real quaternions. However, QR is not commutative because (0, 1, 0, 0) ⋅ (0, 0, 1, 0) = (0, 0, 0, 1) ≠ (0, 0, 0, −1) = (0, 0, 1, 0) ⋅ (0, 1, 0, 0). Therefore, QR is not a field. We remark that any field F is an integral domain. (Why?) However, Z, the ring of integer numbers, is an example of an integral domain that is not a field. In the following proposition, we study the special case where the integral domain R is in fact a field. Proposition D.4.1. A finite integral domain R is a field. Proof. Suppose r ∈ R with r ≠ 0. The elements 1 = r0 , r, r2 , . . . cannot all be different, since otherwise R would be infinite. Hence, there exist 0 ≤ n < m, with rn = rm . Since R is an integral domain, then rn (1 − rm−n ) = 0 yields rm−n = 1. Finally, since rm−n−1 r = 1, we see that r is invertible, with r−1 = rm−n−1 . ◻
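The quaternion arithmetic just described is easy to experiment with numerically. The sketch below (our illustration, not part of the text) implements the coordinate product formula for QR verbatim and checks noncommutativity and the inverse (x_1/N, −x_2/N, −x_3/N, −x_4/N).

```python
def qmul(x, y):
    """Product in QR, following the coordinate formula given above."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return (x1*y1 - x2*y2 - x3*y3 - x4*y4,
            x1*y2 + x2*y1 + x3*y4 - x4*y3,
            x1*y3 + x3*y1 + x4*y2 - x2*y4,
            x1*y4 + x2*y3 - x3*y2 + x4*y1)

def qinv(x):
    """Multiplicative inverse (x1/N, -x2/N, -x3/N, -x4/N), N = sum of squares."""
    N = sum(c*c for c in x)
    return (x[0]/N, -x[1]/N, -x[2]/N, -x[3]/N)

i, j = (0, 1, 0, 0), (0, 0, 1, 0)
print(qmul(i, j), qmul(j, i))   # (0, 0, 0, 1) and (0, 0, 0, -1): i*j != j*i
x = (1.0, 2.0, 3.0, 4.0)
print(qmul(x, qinv(x)))         # approximately (1.0, 0.0, 0.0, 0.0)
```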
D.5 Vector spaces, modules, and algebras An abelian group V is called a vector space over a field F, if for any a ∈ F and v ∈ V the product av is an element in V, and this operation satisfies the following properties: a(u + v) = au + av, (a + b)v = av + bv, (ab)u = a(bu) and 1v = v, for all a, b ∈ F, u, v ∈ V. A module M is a vector space over a ring. The formal definition is exactly as above, but we use a ring R instead of a field. In this case, M is called an R-module. A ring R is called an algebra over F if R is a vector space over F with respect to the addition operation +. Denote by ⋅ the product in R. Then, for any x, y, z in R and a, b in F, the following equalities hold: (i) (x + y) ⋅ z = x ⋅ z + y ⋅ z, (ii) x ⋅ (y + z) = x ⋅ y + x ⋅ z, (iii) (ax) ⋅ (by) = (ab)(x ⋅ y).
The algebra R is called a division algebra if for any y ∈ R ∖ {0} there exists exactly one element z in R such that z ⋅ y = y ⋅ z = 1. In what follows, we denote for simplicity x ⋅ y by xy and no ambiguity will arise. Examples of vector spaces, modules, and algebras (a) If M is a vector space over the field F, then M is an F-module. (b) Let M = Rm×n be the set of all m × n matrices with entries in the ring R. Then M is an R-module, where addition is ordinary matrix addition and multiplication of the scalar c by matrix A means the multiplication of each entry of A by c. (c) In the above example, if we change the ring R to a field F, then V = Fm×n would be a vector space over F. We remark that for the case n = 1, we simply use the notation Fm for V. (d) Every abelian group A is a Z-module. Addition and subtraction are defined according to the group structure of A; the point is that we can multiply x ∈ A by the integer n. If n > 0, then nx = x + x + ⋯ + x (n times); if n < 0, then nx = −x − x − ⋯ − x (∣n∣ times, where ∣ ∣ denotes the absolute value function). (e) Every commutative ring R is an algebra over itself. (f) An arbitrary ring R is always a Z-algebra. (g) If R is a commutative ring, then Rn×n , the set of all n × n matrices with entries in R is an R-algebra.
D.6 More about groups A set G is called to be a semigroup if it has a binary operation satisfying the condition (ab)c = a(bc), for any a, b, c ∈ G. (Here, the product operation is replaced by ⊕.) A subset H of G is called a subgroup of G if H also forms a group under the operation of G. The set of the integers Z is a subgroup of the group of rational numbers Q under ordinary addition. A subgroup N of a group G is called a normal subgroup if for each element n in N and each g in G, the element gng −1 is still in N . We use the notation N ⊲ G to denote that N is a normal subgroup of G. For example, all subgroups N of an abelian group G are normal. (Why?) The center of a group G, denoted by Z(G), is the set of elements that commute with every element of G, i.e., Z(G) = {z ∈ G ∶ zg = gz, ∀g ∈ G}. Clearly, Z(G) ⊲ G. For a subgroup H of a group G and an element x of G, define xH to be the set {xh ∶ h ∈ H}. A subset of G of the form xH is said to be a left coset of H (a right coset of H is defined similarly). For a normal subgroup N of the group G, the quotient group of N in G, written G/N and read “G modulo N ,” is the
set of cosets of N in G. It is easy to see that G/N is a group with the following operation: (N a)(N b) = N ab,
for all a, b ∈ G.
A group G is called finitely generated if there exist x_1, . . . , x_n in G such that every x in G can be written in the form x = x_{i_1}^{±1} ⋯ x_{i_m}^{±1}, where i_1, . . . , i_m ∈ [n] and m ranges over all positive integers. In this case, we say that the set {x_1, . . . , x_n} is a generating set of G. A cyclic group is a finitely generated group which is generated by a single element.
The concept of homomorphism allows us to compare algebraic structures. There are many different kinds of homomorphisms; intuitively, a homomorphism is a function from one system to another one of the same kind that preserves all relevant structure. A group homomorphism is a map ϕ : G_1 → G_2 between two groups G_1 and G_2 such that the group operation is preserved, i.e., ϕ(xy) = ϕ(x)ϕ(y), for any x, y ∈ G_1. A group homomorphism is called an isomorphism if it is bijective. If ϕ is an isomorphism, we say that G_1 is isomorphic to G_2 and we use the notation G_1 ≅ G_2. It is easy to show that being isomorphic is an equivalence relation on the class of all groups. (See Appendix A for more information about equivalence relations.)
The kernel of a group homomorphism ϕ : G_1 → G_2 is denoted by ker ϕ and is the set of all elements of G_1 that are mapped to the identity element of G_2. The kernel is a normal subgroup of G_1. Also, the image of ϕ is denoted by Im ϕ and defined as follows: Im ϕ = {y ∈ G_2 : ∃x ∈ G_1 such that ϕ(x) = y}. Clearly, Im ϕ is a subgroup of G_2 (though not necessarily a normal one). Also, the cokernel of ϕ is denoted by coker ϕ and defined as coker ϕ = G_2/Im ϕ. Now we are ready to give the three isomorphism theorems in the context of groups. The interested reader is referred to [14] to see the proofs and more details about these theorems.
First isomorphism theorem Let G1 and G2 be two groups and ϕ ∶ G1 → G2 be a group homomorphism. Then, G1 / ker ϕ ≅ Imϕ. In particular, if ϕ is surjective, then G1 / ker ϕ is isomorphic to G2 .
Second isomorphism theorem
Let G be a group, S be a subgroup of G, and N be a normal subgroup of G. Then,
1. The product SN = {sn : s ∈ S and n ∈ N} is a subgroup of G.
2. S ∩ N ⊲ S.
3. SN/N ≅ S/(S ∩ N).
Third isomorphism theorem 1. If N ⊲ G and K is a subgroup of G such that N ⊆ K ⊆ G, then K/N is a subgroup of G/N . 2. Every subgroup of G/N is of the form K/N , for some subgroup K of G such that N ⊆ K ⊆ G. 3. If K is a normal subgroup of G such that N ⊆ K ⊆ G, then K/N is a normal subgroup of G/N . 4. Every normal subgroup of G/N is of the form K/N , for some normal subgroup K of G such that N ⊆ K ⊆ G. 5. If K is a normal subgroup of G such that N ⊆ K ⊆ G, then the quotient group (G/N ) / (K/N ) is isomorphic to G/K. Assume that G is a finite group and H its subgroup. Then G is a disjoint union of cosets of H. Hence, the order of H, (#H), divides the order of G: Lagrange’s theorem. Note that this is the case for vector spaces, i.e., if V is a finite vector space (this should not be assumed to be a finite-dimensional vector space), and W is its subspace, then #W divides #V . We define the concept of “dimension” for vector spaces in section 1.2.4.
D.7 The group of bijections on a set X Let X be a set. Denote by S(X ) the set of all bijections of X onto itself. It is easy to show that S(X ) forms a group under the composition, with the identity element idX . Assume that X is a finite set. Then, any bijection ψ ∈ S(X ) is called a permutation, and S(X ) is called a permutation group. Let X be a finite set. Then S(X ) has n! elements if n is the number of elements in X . Assume that X = {x1 , . . . , xn }. We can construct a bijection φ on X as follows: (1) Assign one of the n elements of X to φ(x1 ). (There are n possibilities for φ(x1 ) in X .) (2) Assign one of the n − 1 elements of X − {φ(x1 )} to φ(x2 ). (There are n − 1 possibilities for φ(x2 ) in X − {φ(x1 )}.) ⋮ (n) Assign the one remaining element to φ(xn ). (There is only one possibility for φ(xn ).) This method can generate n(n − 1)⋯1 = n! different bijections of X .
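The counting argument above can be mirrored in a couple of lines of code (a small illustration we add here): enumerating all bijections of a 4-element set indeed produces 4! = 24 permutations.

```python
from itertools import permutations
from math import factorial

X = ["x1", "x2", "x3", "x4"]
perms = list(permutations(X))          # each tuple encodes a bijection of X onto itself
print(len(perms), factorial(len(X)))   # 24 24
```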
D.8 More about rings If R and S are two rings, then a ring homomorphism from R to S is a function ϕ ∶ R → S such that
(i) ϕ(a + b) = ϕ(a) + ϕ(b), (ii) ϕ(ab) = ϕ(a)ϕ(b), for each a, b ∈ R. A ring homomorphism is called an isomorphism if it is bijective. Note that we have three isomorphism theorems for rings that are similar to isomorphism theorems in groups. See [14] for more details. Remark D.8.1. If ϕ ∶ (R, +, ⋅) → (S, +, ⋅) is a ring homomorphism, then ϕ ∶ (R, +) → (S, +) is a group homomorphism. Remark D.8.2. If R is a ring and S ⊂ R is a subring, then the inclusion i ∶ S ↪ R is a ring homomorphism. (Why?)
D.9 More about fields
Let F be a field. Then by F[x] we denote the set of all polynomials p(x) = a_n x^n + ⋯ + a_0, where x^0 ≡ 1. p is called monic if a_n = 1. The degree of p, denoted as deg p, is n if a_n ≠ 0. F[x] is a commutative ring under the standard addition and product of polynomials. F[x] is called the polynomial ring in x over F. Here 0 and 1 are the constant polynomials having value 0 and 1, respectively. Moreover, F[x] does not have zero divisors, i.e., if p(x)g(x) = 0, then either p or g is the zero polynomial. A polynomial p(x) ∈ F[x] is called irreducible (primitive) if the decomposition p(x) = q(x)r(x) in F[x] implies that either q(x) or r(x) is a constant polynomial.
A subfield F of a ring R is a subring of R that is a field. For example, Q is a subfield of R under the usual addition. Note that Z_2 is not a subfield of Q, even though Z_2 = {0, 1} can be viewed as a subset of Q and both are fields. (Why is Z_2 not a subfield of Q?) If F is a subfield of a field E, one also says that E is a field extension of F, and one writes E/F is a field extension. Also, if C is a subset of E, we define F(C) to be the intersection of all subfields of E which contain F ∪ C. One verifies that F(C) is a field; F(C) is called the subfield of E generated by C over F. In the case C = {a}, we simply use the notation F(a) for F(C).
If E/F is a field extension and α ∈ E, then α is called algebraic over F if α is a root of some polynomial with coefficients in F; otherwise, α is called transcendental over F. If m(x) is a monic irreducible polynomial with coefficients in F and m(α) = 0, then m(x) is called a minimal polynomial of α over F.
Theorem D.9.1. If E/F is a field extension and α ∈ E is algebraic over F, then
1. the element α has a minimal polynomial over F;
2. its minimal polynomial is determined uniquely;
3. if f(α) = 0 for some nonzero f(x) ∈ F[x], then m(x) divides f(x).
The interested reader is referred to [14] to see the proof of Theorem D.9.1.
D.10 The characteristic of a ring
Given a ring R and a positive integer n, for any x ∈ R, by n ⋅ x we mean n ⋅ x = x + ⋯ + x (n terms).
It may happen that for a positive integer c we have c ⋅ 1 = 1 + ⋯ + 1 = 0 (c terms).
For example, in Zm = Z/(mZ), we have m ⋅ 1 = m = 0. On the other hand in Z, c ⋅ 1 = 0 implies c = 0, and then no such positive integer exists. The smallest positive integer c for which c ⋅ 1 = 0 is called the characteristic of R. If no such number c exists, we say that R has characteristic zero. The characteristic of R is denoted by char R. It can be shown that any finite ring is of nonzero characteristic. Also, it is proven that the characteristic of a field is either zero or prime. (See Worked-out Problem D.11-2.) Notice that in a field F with char F = 2, we have 2x = 0, for any x ∈ F. This property makes fields of characteristic 2 exceptional. For instance, see Theorem 1.7.3 or Worked-out Problem 1.7.5-3.
D.11 Worked-out problems
1. Show that the number e (the base of the natural logarithm) is transcendental over Q.
Solution: Assume that I(t) = ∫_0^t e^{t−u} f(u) du, where t is a complex number and f(x) is a polynomial with complex coefficients to be specified later. If f(x) = ∑_{j=0}^n a_j x^j, we set f̄(x) = ∑_{j=0}^n |a_j| x^j, where | | denotes the norm of complex numbers. Integration by parts gives
I(t) = e^t ∑_{j=0}^∞ f^{(j)}(0) − ∑_{j=0}^∞ f^{(j)}(t) = e^t ∑_{j=0}^n f^{(j)}(0) − ∑_{j=0}^n f^{(j)}(t),    (D.11.1)
where n is the degree of f(x). Assume that e is a root of g(x) = ∑_{i=0}^r b_i x^i ∈ Z[x], where b_0 ≠ 0. Let p be a prime greater than max{r, |b_0|}, and define f(x) = x^{p−1}(x − 1)^p (x − 2)^p ⋯ (x − r)^p. Consider J = b_0 I(0) + b_1 I(1) + ⋯ + b_r I(r). Since g(e) = 0, the contribution of the first summand on the right-hand side of (D.11.1) to J is 0. Thus,
J = − ∑_{k=0}^r ∑_{j=0}^n b_k f^{(j)}(k),
where n = (r + 1)p − 1. The definition of f(x) implies that many of the terms above are zero, and we can write
J = − ( ∑_{j=p−1}^n b_0 f^{(j)}(0) + ∑_{k=1}^r ∑_{j=p}^n b_k f^{(j)}(k) ).
Each of the terms on the right is divisible by p! except for f^{(p−1)}(0) = (p − 1)!(−1)^{rp}(r!)^p, where we have used that p > r. Thus, since p > |b_0| as well, we see that J is an integer that is divisible by (p − 1)!, but not by p. In particular, J ≠ 0, so J is an integer with |J| ≥ (p − 1)!. Since
f̄(k) = k^{p−1}(k + 1)^p (k + 2)^p ⋯ (k + r)^p ≤ (2r)^n, for 0 ≤ k ≤ r,
we deduce that
|J| ≤ ∑_{j=0}^r |b_j| |I(j)| ≤ ∑_{j=0}^r |b_j| j e^j f̄(j) ≤ c ((2r)^{(r+1)})^p,
where c is a constant independent of p. This gives a contradiction, since (p − 1)! eventually outgrows any fixed constant raised to the power p.
2. Prove the following statements:
(a) Any finite ring R is of nonzero characteristic.
(b) The characteristic of a field F is either zero or prime.
Solution:
(a) Assume to the contrary that char R = 0. Without loss of generality, we may assume that 0 ≠ 1. For any m, n ∈ Z, m ⋅ 1 = n ⋅ 1 implies (m − n) ⋅ 1 = 0, and since we assumed char R = 0, it follows that m = n. Therefore, the set {n ⋅ 1 : n ∈ Z} is an infinite subset of R, which is a contradiction.
(b) Assume that m = char F is nonzero and m = nk. Then 0 = (nk) ⋅ 1 = nk ⋅ (1 ⋅ 1) = (n ⋅ 1)(k ⋅ 1). Thus, either n ⋅ 1 = 0 or k ⋅ 1 = 0, and this implies either n = m or k = m. Hence, m is prime.
D.12 Problems
1. Let Z_6 be the set of all integers modulo 6. Explain why this is not a field.
2. Assume that G and H are two groups. Let Hom(G, H) denote the set of all group homomorphisms from G to H. Show that
(a) Hom(G, H) is a group under the composition of functions if H is a subgroup of G.
(b) #Hom(Z_m, Z_n) = gcd(m, n), where m, n ∈ N and # denotes the cardinality of the set.
3. Prove that ϕ : Q → Q^{n×n} given by ϕ(a) = aI_n (the n × n diagonal matrix with every diagonal entry equal to a) is a ring homomorphism.
4. Let F be the set of all real numbers of the form a + b·2^{1/3} + c·4^{1/3}, where a, b, c ∈ Q. Is F a subfield of R?
5. Prove that a quaternion x = x1 + x2 i + x3 j + x4 k is purely imaginary if and only if x2 ≤ 0. 6. Assume that G is a group. Prove that if G/Z(G) is cyclic, then G is abelian. 7. Show that every intersection of ideals of a ring R is an ideal of R. 8. Let I and J be ideals of a ring R. Show that I ∪ J is an ideal of R if and only if I ⊆ J or J ⊆ I. 9. Let ϕ ∶ R → S be a homomorphism of rings, A be a subring of R, and B be a subring of S. Show that (a) ϕ(A) is a subring of S. (b) ϕ−1 (B) is a subring of R.
Appendix E
Complex numbers
Denote by C the field of complex numbers. A complex number z can be written in the form z = x + iy, where x, y ∈ R. Here, i^2 = −1. Sometimes in this book we denote i by √−1. Indeed, a complex number is a linear combination of 1 and i = √−1. Then, C can be viewed as R^2, where the vector (x, y)^⊺ represents z. Actually, a complex number z = x + iy can be identified with the point (x, y) and plotted in the plane (called the complex plane), as shown in Figure E.1. Also, z can be written in the polar form z = re^{iθ} = r(cos θ + i sin θ), where r = |z| = √(x^2 + y^2) is the absolute value of z, or the modulus of z, and θ = arctan(y/x) is called the argument of z. Note that x = Rz, the real part of z, and y = Iz, the imaginary part of z. In addition, z̄ = x − iy is the conjugate of z. As we mentioned, the polar representation of z is z = re^{iθ} = r(cos θ + i sin θ). Note that here we use Euler's formula, which states that for any real number x, e^{ix} = cos x + i sin x. Let w = u + iv = R(cos ψ + i sin ψ), where u, v ∈ R. Then
z ± w = (x ± u) + i(y ± v),
zw = (xu − yv) + i(xv + yu) = rR e^{i(θ+ψ)},
w/z = (1/(z z̄)) w z̄ = (R/r) e^{i(ψ−θ)} if z ≠ 0.
Theorem E.0.1. Complex roots of polynomials with real coefficients appear in conjugate pairs.
The proof is left as an exercise.
Theorem E.0.2 (De Moivre's theorem). (cos θ + i sin θ)^n = cos nθ + i sin nθ, for any n ∈ N and θ ∈ R.
The proof is left as an exercise.
Remark E.0.3. For a complex number w = Re^{iψ} and a positive integer n ≥ 2, the equation z^n − w = 0 has n distinct complex roots for w ≠ 0, which are
R^{1/n} e^{i(ψ+2kπ)/n} = R^{1/n} [cos((ψ + 2kπ)/n) + i sin((ψ + 2kπ)/n)], for k = 0, 1, . . . , n − 1.
[Figure E.1: a complex number z = x + iy plotted as the point (x, y) in the complex plane; the horizontal axis is Re and the vertical axis is Im.]
Example E.1. In this example, we find the three cube roots of −27. In polar form, −27 = 27(cos π + i sin π). It follows that the cube roots of −27 are given by
(−27)^{1/3} = 27^{1/3} [cos((π + 2kπ)/3) + i sin((π + 2kπ)/3)], for k = 0, 1, 2.
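These three roots are easy to check numerically; the snippet below (added only as an illustration, using NumPy) evaluates the formula and verifies that each value cubes back to −27.

```python
import numpy as np

R = 27.0
roots = [R ** (1 / 3) * np.exp(1j * (np.pi + 2 * k * np.pi) / 3) for k in range(3)]
for z in roots:
    print(z, np.round(z ** 3, 10))   # each cube is -27 (up to rounding)
```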
Fundamental theorem of algebra The fundamental theorem of algebra states that any monic polynomial p(z) ∈ C[z] of degree n ≥ 2 splits to linear factors, i.e., p(z) = ∏ni=1 (z − zi ). Namely, every polynomial of degree n with real or complex coefficients has exactly n zeros (counting multiplicities) in C. Equivalently, one says that the field of complex numbers is algebraically closed. It is not possible to give a formula for the roots of polynomials of degree greater than 4. Actually, Niels Henrik Abel proved that for a general fifth-degree polynomial equation, there is no formula for its roots in terms of its coefficients that uses only the operations of addition, subtraction, multiplication, division, and taking the nth root. Technically, it is said that a fifth degree polynomial equation is not solvable by radicals. Later, Evariste Galois provided conditions under which polynomial equations can be solved by radicals. His approach to polynomial equations is now known as the Galois theory. See [7] for more details about the fundamental theorem of algebra. The interested reader is referred to [28] to see an improvement of the fundamental theorem of algebra given by Joseph Shipman in 2007; any field in which polynomials of prime degree have roots is algebraically closed.
E.1 Worked-out problems 1. Find the real part of (cos 0.7 + i sin 0.7)53 .
Solution: This is the same as (e^{0.7i})^{53} = e^{37.1i} = cos(37.1) + i sin(37.1). Then, the real part is simply cos(37.1).
2. Write (1 − i)^{100} as a + ib, where a and b are real.
Solution: The complex number 1 − i has modulus
√
2 and argument − π4 . That is,
√
π π 2 (cos (− ) + i sin (− )) 4 4 √ 100 100π 100π 100 ⇒ (1 − i) = ( 2) (cos (− ) + i sin (− )) 4 4 = 250 (cos (−25π) + i sin (−25π)) 1−i=
= 250 (−1 + 0i) = −250 .
E.2 Problems 1. Show that p(z) = z 2 +bz +c ∈ R[z] is irreducible over R if and only if b2 < 4c. 2. Show that any monic polynomial p(z) ∈ R[z] of degree at least 2 splits to a product of irreducible linear and quadratic monic polynomials over R[z]. 3. Deduce from the previous problem that any p(z) ∈ R[z] of odd degree must have a real root. 4. Show De Moivre’s theorem; for θ ∈ R and n ∈ N, (cos θ + i sin θ)n = cos nθ + i sin nθ. 5. Show that ∣z + 1∣ ≤ ∣z + 1∣2 + ∣z∣, for all z ∈ C. 6. If z ∈ C, find w ∈ C satisfying w2 = z. 7. Use De Moivre’s theorem to find (1 + i)8 .
Appendix F
Differential equations
If x = x(t) is a differentiable function satisfying a differential equation of the form x′ = kx, where k is a constant, then the general solution is x = ce^{kt}, where c is a constant. If an initial condition x(0) = x_0 is specified, then, by substituting t = 0 in the general solution, we find that c = x_0. Hence, the unique solution to the differential equation that satisfies the initial condition is x = x_0 e^{kt}.
Suppose we have n differentiable functions x_1, x_2, . . . , x_n of t that satisfy a system of differential equations:
x′_1 = a_{11} x_1 + a_{12} x_2 + ⋯ + a_{1n} x_n
x′_2 = a_{21} x_1 + a_{22} x_2 + ⋯ + a_{2n} x_n
⋮
x′_n = a_{n1} x_1 + a_{n2} x_2 + ⋯ + a_{nn} x_n
We can write this system in matrix form as x′ = Ax, where
x(t) = (x_1(t), x_2(t), . . . , x_n(t))^⊺,
x′(t) = (x′_1(t), x′_2(t), . . . , x′_n(t))^⊺, and A = [a_{ij}]_{n×n}.
Now we can use matrix methods to help us find the solution. Suppose we want to solve the following system of differential equations: x′_1 = 3x_1, x′_2 = 7x_2. Each equation can be solved separately, as above, to give
x_1 = c_1 e^{3t}, x_2 = c_2 e^{7t},
where c_1 and c_2 are constants. Notice that in matrix form, our equation x′ = Ax has a diagonal coefficient matrix,
A = [3 0; 0 7],
and the eigenvalues 3 and 7 occur in the exponentials e3t and e7t of the solution.
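For this 2 × 2 example, the connection between the eigenvalues and the exponentials can be checked directly with the matrix exponential. The following sketch (our addition, assuming NumPy and SciPy are available) compares e^{At} x(0) with the componentwise formula (c_1 e^{3t}, c_2 e^{7t}).

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[3.0, 0.0],
              [0.0, 7.0]])
x0 = np.array([2.0, -1.0])          # initial condition x(0) = (c1, c2)
t = 0.5

# The solution of x' = Ax with x(0) = x0 is x(t) = exp(At) x0.
x_t = expm(A * t) @ x0
print(x_t)
print([x0[0] * np.exp(3 * t), x0[1] * np.exp(7 * t)])   # same values
```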
Appendix G
Basic notions of matrices
A matrix is a rectangular array of elements called the entries from a given set of elements D. The horizontal arrays of a matrix are called its rows, and the vertical arrays are called its columns. A matrix is said to have the order m × n if it has m rows and n columns. An m × n matrix A can be represented in the following form: ⎡ a11 a12 ⋯ a1n ⎤ ⎢ ⎥ ⎢a ⎥ ⎢ 21 a22 ⋯ a2n ⎥ ⎢ ⎥ ⋮ ⎥, A=⎢ ⋮ ⎢ ⎥ ⎢ ⋱ ⎥ ⋮ ⎢ ⎥ ⎢am1 am2 ⋯ amn ⎥ ⎣ ⎦ where aij is the entry at the intersection of the ith row and jth column. In a more concise manner, we also write A = [aij ]m×n or A = [aij ]. A matrix obtained by deleting some of the rows and/or columns of a matrix is said to be a submatrix of the given matrix. The set of m × n matrices with entries in F is denoted by Fm×n . If A = [aij ] and B = [bij ] are m × n matrices in Fm×n , their sum A + B is the m × n matrix obtained by adding the corresponding entries. Thus, A + B = [aij + bij ]. If A ∈ Fm×n and c ∈ F is a scalar, then the scalar multiple cA is the m × n matrix obtained by multiplying each entry of A by c. More formally, we have cA = c[aij ] = [caij ]. The matrix (−1)A is written as −A and called the negative of A. If A, B ∈ Fm×n , then A − B = A + (−B). A matrix, all of whose entries are zero, is called a zero matrix and denoted by 0. It is clear that if A ∈ Fm×n and 0 is zero matrix of the same size, then A+0=A=0+A and
A − A = 0 = −A + A.
The matrix addition and scalar multiplication satisfy the following properties. (Here we assume that A, B, C ∈ F^{m×n} and c, d ∈ F.) (i) commutativity:
A + B = B + A;
(ii) associativity:
(A + B) + C = A + (B + C);
(iii) A + 0 = A; (iv) A + (−A) = 0; (v) distributivity:
c(A + B) = cA + cB;
(vi) distributivity:
(c + d)A = cA + dA;
(vii) c(dA) = (cd)A. For A ∈ Fm×n and B ∈ Fp×q , the product AB is defined if and only if the number of columns in A is equal to the number of rows in B, i.e., n = p. In that case, the resulting matrix C = [cij ] is m × q. The entry cij is obtained by multiplying the row i of A by the column j of B. For x = (x1 , . . . , xn )⊺ ∈ Fn and y = (y1 , . . . , yn )⊺ ∈ Fn , x⊺ y = ∑ni=1 xi yi . (This product can be regarded as a product of a 1 × n matrix x⊺ with n × 1 product matrix y.) The product of corresponding sizes of matrices satisfies the following properties: (i) associativity: (AB)C = A(BC); (ii) distributivity: (A1 + A2 )B = A1 B + A2 B and A(B1 + B2 ) = AB1 + AB2 ; (iii) a(AB) = (aA)B = A(aB), for each a ∈ F; m×m (iv) identities: Im A= A = AIn , where A ∈ Fm×n and Im = [δij ]m is i=j=1 ∈ F the identity matrix of order m.
If A ∈ Fn×n , we define
A^k = AA⋯A (k factors),
if k is an integer greater than 1. For k = 1, we define A1 = A and for k = 0, it is convenient to define A0 = In . If r, s are nonnegative integers, then it is proven that Ar As = Ar+s and (Ar )s = Ars .
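The power rules A^r A^s = A^{r+s} and (A^r)^s = A^{rs} are easy to sanity-check numerically; the fragment below (an illustration we add, not part of the text, using NumPy) does so for a small integer matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 3))

power = np.linalg.matrix_power
r, s = 2, 3
print(np.array_equal(power(A, r) @ power(A, s), power(A, r + s)))   # True
print(np.array_equal(power(power(A, r), s), power(A, r * s)))       # True
```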
Bibliography [1] O. Alter, P.O. Brown, and D. Botstein, Singular value decomposition for genome-wide expression data processing and modelling, Proc. Natl. Acad. Sci. USA, 97 (2000), 10101–10106. [2] J.L. Arocha, B.L. Iano, and M. Takane. The theorem of Philip Hall for vector spaces, An. Inst. Mat. Univ. Nac. Aut´ onoma M´exico, 32 (1992), 1–8. [3] A. Barvinok, A Course in Convexity, Grad. Stud. Math. 54, Amer. Math. Soc., Providence, RI, 2002. [4] J.A. Bondy and U.S.R. Murty, Graph Theory, Grad. Texts Math., Springer, Berlin, 2008. [5] L.M. Bregman, Some properties of nonnegative matrices and their permanents, Soviet Math. Dokl., 14 (1973), 945–949 (in English); Dokl. Akad. Nauk SSSR, 211 (1973), 27–30 (in Russian). [6] Y. Filmus, Range of Symmetric Matrices over GF(2), unpublished note, University of Toronto, 2010. [7] B. Fine and G. Rosenberger, The Fundamental Theorem of Algebra, Undergrad. Texts Math., Springer-Verlag, New York, 1997. [8] S. Friedland, A new approach to generalized singular value decomposition, SIAM J. Matrix. Anal. Appl., 27 (2005), 434–444. [9] S. Friedland, Matrices: Algebra, Analysis and Applications, World Scientific, Singapore, 2016. [10] S. Friedland, S. Gaubert, and L. Han, Perron-Frobenius theorem for nonnegative multilinear forms and extensions, Linear Algebra Appl., 438 (2013), 738–749. [11] G.H. Golub and C.F. Van Loan, Matrix Computation, 4th ed., John Hopkins Univ. Press, Baltimore, 2013. [12] R. J. Gregorac, A note on finitely generated groups, Proc. Amer. Math. Soc., 18 (1967), 756–758. [13] J. Hannah, A geometric approach to determinant, Amer. Math. Monthly, 103 (1996), 401–409. [14] I.N. Herstein, Abstract Algebra, 3rd ed., Wiley, New York, 1996. [15] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 2013. 281
282
Bibliography [16] R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis, 4th ed., Prentice Hall, Englewood Cliffs, NJ, 1998. [17] B. Mendelson, Introduction to Topology, 3rd ed., Dover Publications, Inc., New York, 1990. [18] M.A. Nielsen and I.L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, 2010. [19] A. Niknejad, S. Friedland, and L. Chihara, A simultaneous reconstruction of missing data in DNA microarrays, Linear Algebra Appl., 416 (2006), 8–28. [20] S. Friedland, A. Niknejad, M. Kaveh, and H. Zare, An algorithm for missing value estimation for DNA microarray data, in Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE Press, Piscataway, NJ, 2006, pp. 1092–1095. [21] A. Ostrowski, Solution of Equations in Euclidean and Banach Spaces, Academic Press, New York, 1973. [22] R. Rado, A Theorem on Independence Relations, Quart. J. Math., Oxford Ser., 13 (1942), 83–89. [23] S. Roman, Advanced Linear Algebra, Grad. Texts Math. 135, Springer-Verlag, New York, 1992. [24] W. Rudin, Principles of Mathematical Analysis, 3rd ed., Internat. Ser. Pure Appl. Math., McGraw-Hill Education, New York, 1974. [25] R. Schindler, Set Theory: Exploring Independence and Truth, Universitext, Springer, Cham, 2014. [26] A. Schrijver, A short proof of Minc’s conjecture, J. Combin. Theory Ser. A, 25 (1978), 80–83. [27] T. Shifrin and M.R. Adam, Linear Algebra: A Geometric Approach, 2nd ed., W.H. Freeman and Company, New York, 2011. [28] J. Shipman, Improving the fundamental theorem of algebra, Math. Intell., 29 (2009), 9–14. [29] G. Toth, Measures of Symmetry for Convex Sets and Stability, Universitext, Springer, Cham, 2015. [30] J.H. Van Lint and R.M. Wilson, A Course in Combinatorics, Cambridge University Press, Cambridge, 1992.
Index abelian group, 1 adjacency matrix, 53 adjoint matrix, 89 affine spaces, 60 algebra, 11 algebraic element, 92 algebraic multiplicity, 89 algebraically closed field, 63 algebraically simple eigenvalue, 90 angle, 35 angle between x and y, 139 annihilator, 38 anti–self-adjoint, 156 associativity, 280 augmented matrix, 15 balanced bipartite graph, 55 basis, 3 big O notation, 255 bijection, 11 bilinear form, 161 binary operation, 261 bipartite graph, 52 block diagonal matrix, 23 block matrix, 13 CANDECOMP-PARAFAC decomposition, 227 canonical form, 75 Carath´eodary’s theorem, 61 Cauchy–Binet formula, 110 Cauchy–Schwarz inequality, 141 Cayley–Hamilton theorem, 85 center of a group, 265 closed set, 173 codimension, 100 cofactor, 46 cokernel, 99
cokernel of group homomorphism, 99 column spaces, 21 columns of a matrix, 279 commutative group, 32 commutative ring, 62 compact set, 61 companion matrix, 108 complement, 26 completion lemma, 11 complex number, 10 complex plane, 98 complex vector space, 162 continuity of the roots of a polynomial, 211 convergent sequence, 254 convex combination, 56 convex function, 54 convex hull, 56 convex set, 56 convoy principle, 164 core tensor, 226 coset, 100 Courant–Fischer principle, 165 Cramer’s rule, 47 cross product, 38 cyclic decomposition theorem, 109 cyclic group, 66 cyclic invariant subspace, 108 cyclic property, 72 cyclic subspace, 108 De Moivre’s theorem, 273 decomposable element, 221 decomposable tensor, 221 degree of field extension, 63 determinant of matrix, 42 diagonal entries, 13 283
diagonal matrices, 232 diagonalizable matrix, 232 diagonalization, 83 diagonally dominant, 155 differential equation, 7 digraph, 216 dimension, 3 direct sum of matrices, 13 disjoint sets, 257 disjoint union, 188 distance formula, 43 division algebra, 11 division ring, 263 dot product, 139 double dual, 70 doubly stochastic matrices, 32 dual space, 38 edge, 53 eigenspace, 77 eigenvector, 63 elementary matrix, 19 elementary row operation, 14 equivalence relation, 21 Euclidean metric space, 253 Euclidean norm, 141 Euclidean space, 1 Euler’s function, 10 even function, 30 extreme point, 56 Fekete’s lemma, 53 Fibonacci numbers, 83 field, 61 field extension, 63 finite field extension, 63 finite-dimensional vector space, 8 finitely generated group, 8 finitely generated vector space, 5
first isomorphism theorem, 101 Fredholm alternative, 146 free transversal, 53 free variable, 16 Frobenius norm, 141 fundamental theorem of algebra, 274 fundamental theorem of invertible matrices, 19 Gauss–Jordan algorithm, 20 Gaussian elimination, 14 Gaussian field of rationals, 11 Gaussian integers domain, 11 general linear group, 67 generalized singular value decomposition, 202 geometric multiplicity, 90 Gram–Schmidt (G–S) algorithm, 153 graph, 52 group, 1 Hölder inequality, 179 Hadamard conjecture, 154 Hadamard determinant inequality, 150 Hadamard matrix, 154 Hermitian form, 161 homogeneous linear equation, 6 homogeneous polynomial, 12 homomorphism, 31 hyperplane, 60 idempotent matrix, 199 index of nilpotency, 102 induced norm by inner product, 141 infinite-dimensional vector space, 7 inner product, 37 inner product space (IPS), 37 invariant polynomials, 110 invariant subspace, 91 invertible matrices, 8 involution, 11 IPS (inner product space), 139 irreducible matrix, 208 irreducible polynomial, 63 isometry, 155
isomorphism, 34 Jordan block, 80 Jordan canonical form, 75 Kronecker delta function, 13 Kronecker product, 224 Krylov subspace, 160 Lagrange–Sylvester interpolation polynomial, 123 Laplace expansion, 43 lead variable, 16 leading entry, 14 least squares system, 147 least squares theorem, 152 left coset, 265 linear combination, 3 linear functional, 38 linear isomorphism, 34 linear operator, 34 linear system, 16 linearly dependent, 3 linearly independent, 3 lower triangular matrix, 29 LU decomposition, 52 Maclaurin expansion, 132 mapping, 101 Markov matrices, 56 matching, 55 matrix, 1 matrix addition, 1 matrix function, 121 matrix multiplication, 1 matrix of inner product, 140 matrix of the change of basis, 8 matrix polynomial, 87 max-min characterizations of eigenvalues, 164 maximal orthogonal set, 142 metric, 141 metric space, 141 minimal polynomial, 91 Minkowski's criterion, 52 modified Gram–Schmidt algorithm, 146 modular law for subspaces, 30 module, 87 Moore–Penrose generalized inverse, 198 multilinear operator, 34
neutral element, 9 nilpotent matrix, 102 noncommutative division algebra, 11 noncommutative ring, 262 nonnegative matrices, 61 nonpivot column, 29 nontrivial invariant subspace, 94 norm, 140 normal subgroup, 265 normalization condition, 43 normed linear space, 140 numerically unstable, 145 odd function, 30 open ball, 253 ordered bases, 69 ordinary differential equations (ODE), 130, 132 orthogonal complement, 142 orthogonal matrix, 150 orthogonal projection, 145 orthonormal basis, 142 orthonormal k-frame, 166 orthonormal sets, 142 parallelogram identity, 152 path, 170 perfect matching, 57 permutation, 30 permutation group, 30 permutation matrix, 32 Perron–Frobenius eigenvector, 214 Perron–Frobenius theorem, 207 pivot, 14 pivot column, 14 pivot position, 14 pivoting, 14 polar decomposition, 198 positive definite operators, 173 primary decomposition theorem, 94 primitive matrix, 214 primitive unity root, 10 principal minor, 47 projection, 35 projection of x on U, 142 pseudospectrum, 98 QR algorithm, 153
QR factorization, 144 quaternions, 11 rank, 14 rank 1 tensor, 221 rank insensitivity lemma, 30 rational canonical form, 112 Rayleigh quotient, 164 reduced row echelon form, 14 reduced SVD (singular value decomposition), 198 REF (row echelon form), 6 reflection, 36 representation matrix, 69 ring, 61 ring of polynomial, 62 root of unity, 10 rotation, 35 row echelon form (REF), 6, 14 s-numbers, 102 scalar, 1 scalar matrix, 24 Schur's unitary triangularization, 160 second isomorphism theorem, 101 second-order linear differential systems, 132 Segre's characteristic, 105 self-adjoint linear transformation, 156
sequence, 20 similar matrices, 112 similarity class, 76 singular value, 181 singular value decomposition (SVD), 181 skew-symmetric matrix, 24, 168 skew-symmetric tensor, 225 span, 27 spanning set, 4 special linear group, 67 spectral decomposition, 158 spectral norm, 141 spectral radius, 207 spectral theorem, 157 spectrum, 75 standard basis, 8 stochastic matrix, 32 subgroup, 8 submatrix, 279 subring, 68 subspace, 53 Sylvester's criterion of positivity, 180 Sylvester's inequality, 50 symmetric matrix, 64 symmetric product, 225 symmetric tensor, 225 symmetric tridiagonal matrix, 50 system of differential equations, 135 system of distinct representatives, 52
tensor product of two vector spaces, 221 third isomorphism theorem, 101 transition matrix, 227 transpose matrix, 13 transposition, 31 triangular matrix, 24 trivial combination, 3 trivial subspace, 3 trivial vector space, 5 Tucker model, 226 universal property, 222 upper diagonal matrix, 194 upper triangular matrix, 29 Vandermonde determinant, 49 Vandermonde matrix, 50 Vandermonde, Alexandre-Théophile, 50 vector, 1 vector projection, 155 vector space, 1 vertex, 57 walk, 216 wedge product, 225 Weyl inequality, 166 Weyr characteristic, 105
This introductory textbook grew out of several courses in linear algebra given over more than a decade and includes such helpful material as
• constructive discussions about the motivation of fundamental concepts,
• many worked-out problems in each chapter, and
• topics rarely covered in typical linear algebra textbooks.
The authors use abstract notions and arguments to give the complete proof of the Jordan canonical form and, more generally, the rational canonical form of square matrices over fields. They also introduce the notion of tensor products of vector spaces and linear transformations. Matrices are treated in depth, with coverage of the stability of matrix iterations, the eigenvalue properties of linear transformations in inner product spaces, singular value decomposition, and min-max characterizations of Hermitian matrices and nonnegative irreducible matrices. The authors present the many topics and tools encompassed by modern linear algebra, emphasizing its relationship to other areas of mathematics. The text is intended for advanced undergraduate students. Beginning graduate students seeking an introduction to the subject will also find it of interest.
Shmuel Friedland has been a professor at the University of Illinois at Chicago since 1985. He was a visiting professor at the University of Wisconsin, Madison; IMA, Minneapolis; IHES, Bures-sur-Yvette; IIT, Haifa; and the Berlin Mathematical School. He has contributed to the fields of one complex variable, matrix and operator theory, numerical linear algebra, combinatorics, ergodic theory and dynamical systems, mathematical physics, mathematical biology, algebraic geometry, functional analysis, group theory, quantum theory, topological groups, and Lie groups. He has authored and coauthored approximately 200 papers, and in 1978 he proved a conjecture of Erdős and Rényi on the permanent of doubly stochastic matrices. In 1993 he received the first Hans Schneider Prize in Linear Algebra jointly with M. Fiedler and I. Gohberg, and in 2010 he was awarded a smoked salmon for solving the set-theoretic version of the salmon problem. Professor Friedland serves on the editorial boards of the Electronic Journal of Linear Algebra and Linear Algebra and Its Applications, and from 1992 to 1997 he served on the editorial board of Random and Computational Dynamics. He is the author of Matrices: Algebra, Analysis and Applications (World Scientific, 2015).
Mohsen Aliabadi received his M.Sc. in Pure Mathematics from the Department of Mathematical Sciences, Sharif University of Technology, Tehran, in 2012. Currently, he is a Ph.D. student under the supervision of Professor Shmuel Friedland and is a teaching assistant in the Department of Mathematics, Statistics, and Computer Science at the University of Illinois at Chicago. His interests include combinatorial optimization, matrix and operator theory, graph theory, and field theory. He has authored six papers.
OT156 ISBN 978-1-611975-13-0