Linear Algebra in Action [2 ed.] 9781470409081

Linear algebra permeates mathematics, perhaps more so than any other single subject. It plays an essential role in pure


English Pages 585+xix [602] Year 2013


Table of contents :
Title
Contents
Preface to the Second Edition
Preface to the First Edition
1. Vector spaces
2. Gaussian elimination
3. Additional applications of Gaussian elimination
4. Eigenvalues and eigenvectors
5. Determinants
6. Calculating Jordan forms
7. Normed linear spaces
8. Inner product spaces and orthogonality
9. Symmetric, Hermitian and normal matrices
10. Singular values and related inequalities
11. Pseudoinverses
12. Triangular factorization and positive definite matrices
13. Difference equations and differential equations
14. Vector-valued functions
15. The implicit function theorem
16. Extremal problems
17. Matrix-valued holomorphic functions
18. Matrix equations
19. Realization theory
20. Eigenvalue location problems
21. Zero location problems
22. Convexity
23. Matrices with nonnegative entries
A. Some facts from analysis
B. More complex variables
Bibliography
Notation Index
Subject Index
  • Commentary
  • From a 200dpi scan

Linear Algebra in Action

SECOND EDITION

Harry Dym

Graduate Studies in Mathematics, Volume 78

American Mathematical Society
Providence, Rhode Island

EDITORIAL COMMITTEE: David Cox (Chair), Daniel S. Freed, Rafe Mazzeo, Gigliola Staffilani

2010 Mathematics Subject Classification. Primary 15-01, 30-01, 34-01, 39-01, 52-01, 93-01.

For additional information and updates on this book, visit www.ams.org/bookpages/gsm-78

Library of Congress Cataloging-in-Publication Data

Dym, H. (Harry), 1938-. Linear algebra in action / Harry Dym. Second edition. pages cm. (Graduate studies in mathematics ; volume 78) Includes bibliographical references and index.

ISBN 978-1-4704-0908-1 (alkaline paper) 1. Algebras, Linear. I. Title.

QA184.2.D96 2014 512'.5-dc23 2013029538

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy a chapter for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Requests for such permission should be addressed to the Acquisitions Department, American Mathematical Society, 201 Charles Street, Providence, Rhode Island 02904-2294 USA. Requests can also be made by e-mail to reprint-permission@ams.org.

© 2013 by the American Mathematical Society. All rights reserved. The American Mathematical Society retains all rights except those granted to the United States Government. Printed in the United States of America. The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.

Visit the AMS home page at http://www.ams.org/

10 9 8 7 6 5 4 3 2 1    18 17 16 15 14 13

Dedicated to the memory of our oldest son Jonathan Carroll Dym and our first granddaughter Avital Chana Dym, who were recalled prematurely for no apparent reason, he but 44 and she but 12.

Yhi zichram baruch

Contents

Preface to the Second Edition
Preface to the First Edition

Chapter 1. Vector spaces
§1.1. Preview
§1.2. The abstract definition of a vector space
§1.3. Some definitions
§1.4. Mappings
§1.5. Triangular matrices
§1.6. Block triangular matrices
§1.7. Schur complements
§1.8. Other matrix products

Chapter 2. Gaussian elimination
§2.1. Some preliminary observations
§2.2. Examples
§2.3. Upper echelon matrices
§2.4. The conservation of dimension
§2.5. Quotient spaces
§2.6. Conservation of dimension for matrices
§2.7. From U to A
§2.8. Square matrices

Chapter 3. Additional applications of Gaussian elimination
§3.1. Gaussian elimination redux
§3.2. Properties of BA and AC
§3.3. Extracting a basis
§3.4. Computing the coefficients in a basis
§3.5. The Gauss-Seidel method
§3.6. Block Gaussian elimination
§3.7. {0, 1, ∞}
§3.8. Review

Chapter 4. Eigenvalues and eigenvectors
§4.1. Change of basis and similarity
§4.2. Invariant subspaces
§4.3. Existence of eigenvalues
§4.4. Eigenvalues for matrices
§4.5. Direct sums
§4.6. Diagonalizable matrices
§4.7. An algorithm for diagonalizing matrices
§4.8. Computing eigenvalues at this point
§4.9. Not all matrices are diagonalizable
§4.10. The Jordan decomposition theorem
§4.11. An instructive example
§4.12. The binomial formula
§4.13. More direct sum decompositions
§4.14. Verification of Theorem 4.13
§4.15. Bibliographical notes

Chapter 5. Determinants
§5.1. Functionals
§5.2. Determinants
§5.3. Useful rules for calculating determinants
§5.4. Eigenvalues
§5.5. Exploiting block structure
§5.6. The Binet-Cauchy formula
§5.7. Minors
§5.8. Uses of determinants
§5.9. Companion matrices
§5.10. Circulants and Vandermonde matrices

Chapter 6. Calculating Jordan forms
§6.1. Overview
§6.2. Structure of the nullspaces $\mathcal{N}_{B^j}$
§6.3. Chains and cells
§6.4. Computing J
§6.5. An algorithm for computing U
§6.6. A simple example
§6.7. A more elaborate example
§6.8. Jordan decompositions for real matrices
§6.9. Projection matrices
§6.10. Companion and generalized Vandermonde matrices

Chapter 7. Normed linear spaces
§7.1. Four inequalities
§7.2. Normed linear spaces
§7.3. Equivalence of norms
§7.4. Norms of linear transformations
§7.5. Operator norms for matrices
§7.6. Mixing tops and bottoms
§7.7. Evaluating some operator norms
§7.8. Inequalities for multiplicative norms
§7.9. Small perturbations
§7.10. Bounded linear functionals
§7.11. Extensions of bounded linear functionals
§7.12. Banach spaces
§7.13. Bibliographical notes

Chapter 8. Inner product spaces and orthogonality
§8.1. Inner product spaces
§8.2. A characterization of inner product spaces
§8.3. Orthogonality
§8.4. Gram matrices
§8.5. Projections and direct sum decompositions
§8.6. Orthogonal projections
§8.7. Orthogonal expansions
§8.8. The Gram-Schmidt method
§8.9. Toeplitz and Hankel matrices
§8.10. Adjoints
§8.11. The Riesz representation theorem
§8.12. Normal, selfadjoint and unitary transformations
§8.13. Auxiliary formulas
§8.14. Gaussian quadrature
§8.15. Bibliographical notes

Chapter 9. Symmetric, Hermitian and normal matrices
§9.1. Hermitian matrices are diagonalizable
§9.2. Commuting Hermitian matrices
§9.3. Real Hermitian matrices
§9.4. Projections and direct sums in $\mathbb{F}^n$
§9.5. Projections and rank
§9.6. Normal matrices
§9.7. QR factorization
§9.8. Schur's theorem
§9.9. Areas, volumes and determinants
§9.10. Boundary value problems
§9.11. Bibliographical notes

Chapter 10. Singular values and related inequalities
§10.1. Singular value decompositions
§10.2. Complex symmetric matrices
§10.3. Approximate solutions of linear equations
§10.4. Fitting a line in $\mathbb{R}^2$
§10.5. Fitting a line in $\mathbb{R}^p$
§10.6. Projection by iteration
§10.7. The Courant-Fischer theorem
§10.8. Inequalities for singular values
§10.9. von Neumann's inequality for contractive matrices
§10.10. Bibliographical notes

Chapter 11. Pseudoinverses
§11.1. Pseudoinverses
§11.2. The Moore-Penrose inverse
§11.3. Best approximation in terms of Moore-Penrose inverses
§11.4. Drazin inverses
§11.5. Bibliographical notes

Chapter 12. Triangular factorization and positive definite matrices
§12.1. A detour on triangular factorization
§12.2. Definite and semidefinite matrices
§12.3. Characterizations of positive definite matrices
§12.4. An application of factorization
§12.5. Positive definite Toeplitz matrices
§12.6. Detour on block Toeplitz matrices
§12.7. A maximum entropy matrix completion problem
§12.8. A class of $A \succ O$ for which (12.52) holds
§12.9. Schur complements for semidefinite matrices
§12.10. Square roots
§12.11. Polar forms
§12.12. Matrix inequalities
§12.13. A minimal norm completion problem
§12.14. A description of all solutions to the minimal norm completion problem
§12.15. Bibliographical notes

Chapter 13. Difference equations and differential equations
§13.1. Systems of difference equations
§13.2. Nonhomogeneous systems of difference equations
§13.3. The exponential $e^{tA}$
§13.4. Systems of differential equations
§13.5. Uniqueness
§13.6. Isometric and isospectral flows
§13.7. Second-order differential systems
§13.8. Stability
§13.9. Nonhomogeneous differential systems
§13.10. Strategy for equations
§13.11. Second-order difference equations
§13.12. Higher order difference equations
§13.13. Second-order differential equations
§13.14. Higher order differential equations
§13.15. Wronskians
§13.16. Variation of parameters

Chapter 14. Vector-valued functions
§14.1. Mean value theorems
§14.2. Taylor's formula with remainder
§14.3. Application of Taylor's formula with remainder
§14.4. Mean value theorem for functions of several variables
§14.5. Mean value theorems for vector-valued functions of several variables
§14.6. A contractive fixed point theorem
§14.7. Newton's method
§14.8. A refined contractive fixed point theorem
§14.9. Spectral radius
§14.10. The Brouwer fixed point theorem
§14.11. Bibliographical notes

Chapter 15. The implicit function theorem
§15.1. Preliminary discussion
§15.2. The implicit function theorem
§15.3. A generalization of the implicit function theorem
§15.4. Continuous dependence of solutions
§15.5. The inverse function theorem
§15.6. Roots of polynomials
§15.7. An instructive example
§15.8. A more sophisticated approach
§15.9. Dynamical systems
§15.10. Lyapunov functions
§15.11. Bibliographical notes

Chapter 16. Extremal problems
§16.1. Classical extremal problems
§16.2. Convex functions
§16.3. Extremal problems with constraints
§16.4. Examples
§16.5. Krylov subspaces
§16.6. The conjugate gradient method
§16.7. Dual extremal problems
§16.8. Linear programming
§16.9. Bibliographical notes

Chapter 17. Matrix-valued holomorphic functions
§17.1. Differentiation
§17.2. Contour integration
§17.3. Evaluating integrals by contour integration
§17.4. A short detour on Fourier analysis
§17.5. The Hilbert matrix
§17.6. Contour integrals of matrix-valued functions
§17.7. Continuous dependence of the eigenvalues
§17.8. More on small perturbations
§17.9. Spectral radius redux
§17.10. Fractional powers
§17.11. Bibliographical notes

Chapter 18. Matrix equations
§18.1. The equation X - AXB = O
§18.2. The Sylvester equation AX - XB = C
§18.3. AX = XB
§18.4. Special classes of solutions
§18.5. Riccati equations
§18.6. Two lemmas
§18.7. An LQR problem
§18.8. Bibliographical notes

Chapter 19. Realization theory
§19.1. Minimal realizations
§19.2. Stabilizable and detectable realizations
§19.3. Reproducing kernel Hilbert spaces
§19.4. de Branges spaces
§19.5. $R_\alpha$ invariance
§19.6. A left tangential Nevanlinna-Pick interpolation problem
§19.7. Factorization of $\Theta(\lambda)$
§19.8. Bibliographical notes

Chapter 20. Eigenvalue location problems
§20.1. Interlacing
§20.2. Sylvester's law of inertia
§20.3. Congruence
§20.4. Counting positive and negative eigenvalues
§20.5. Exploiting continuity
§20.6. Geršgorin disks
§20.7. The spectral mapping principle
§20.8. Inertia theorems
§20.9. An eigenvalue assignment problem
§20.10. Bibliographical notes

Chapter 21. Zero location problems
§21.1. Bezoutians
§21.2. The Barnett identity
§21.3. The main theorem on Bezoutians
§21.4. Resultants
§21.5. Other directions
§21.6. Bezoutians for real polynomials
§21.7. Stable polynomials
§21.8. Kharitonov's theorem
§21.9. Bibliographical notes

Chapter 22. Convexity
§22.1. Preliminaries
§22.2. Convex functions
§22.3. Convex sets in $\mathbb{R}^n$
§22.4. Separation theorems in $\mathbb{R}^n$
§22.5. Hyperplanes
§22.6. Support hyperplanes
§22.7. Convex hulls
§22.8. Extreme points
§22.9. Brouwer's theorem for compact convex sets
§22.10. The Minkowski functional
§22.11. The numerical range
§22.12. Eigenvalues versus numerical range
§22.13. The Gauss-Lucas theorem
§22.14. The Heinz inequality
§22.15. Extreme points for polyhedra
§22.16. Bibliographical notes

Chapter 23. Matrices with nonnegative entries
§23.1. Perron-Frobenius theory
§23.2. Stochastic matrices
§23.3. Behind Google
§23.4. Doubly stochastic matrices
§23.5. An inequality of Ky Fan
§23.6. The Schur-Horn convexity theorem
§23.7. Bibliographical notes

Appendix A. Some facts from analysis
§A.1. Convergence of sequences of points
§A.2. Convergence of sequences of functions
§A.3. Convergence of sums
§A.4. Sups and infs
§A.5. Topology
§A.6. Compact sets
§A.7. Normed linear spaces

Appendix B. More complex variables
§B.1. Power series
§B.2. Isolated zeros
§B.3. The maximum modulus principle
§B.4. $\ln(1 - \lambda)$ when $|\lambda| < 1$
§B.5. Rouché's theorem
§B.6. Liouville's theorem
§B.7. Laurent expansions
§B.8. Partial fraction expansions

Bibliography
Notation Index
Subject Index

Preface to the Second Edition

I have an opinion. But I do not agree with it.

Joshua Sobol [83]

Most of the chapters in the first edition have been revised, some extensively. The revisions include changes in a number of proofs, to either simplify the argument and/or make the logic clearer, and, on occasion, to sharpen the result. New short introductory sections on linear programming, extreme points for polyhedra and a Nevanlinna-Pick interpolation problem have been added, as have some very short introductory sections on the mathematics behind Google, Drazin inverses, band inverses and applications of svd together with a number of new exercises.

I would like to thank the many readers who e-mailed me helpful lists of typographical errors. I owe a special word of thanks to David Kimsey and Motke Porat, whose lists hit double figures. I believe I have fixed all the reported errors and then some. A couple of oversights in the first edition that came to light (principally the fact that the word Hankel should be removed from the statement and proof of Corollary 21.2; an incomplete definition of a support hyperplane; and a certain fuzziness in the discussion of operator norms and multiplicative norms) have also been fixed.

It is a pleasure to thank the staff of the AMS for being so friendly and helpful; a special note of thanks to my copy/production editor Mike Saitas for his sharp eye and cheerful willingness to accommodate the author and to Mary Medeiros for preparing the indices and her expertise in LaTeX.

The AMS website www.ams.org/bookpages/gsm-78 will be used for sins of omission and commission (and just plain afterthoughts) for the second edition as well as the first.

TAM, ACH TEREM NISHLAM,
July 19, 2013
Rehovot, Israel

Preface to the First Edition

A foolish consistency is the hobgoblin of little minds,...

Ralph Waldo Emerson, Self Reliance

This book is based largely on courses that I have taught at the Feinberg Graduate School of the Weizmann Institute of Science over the past 35 years to graduate students with widely varying levels of mathematical sophistication and interests. The objective of a number of these courses was to present a user-friendly introduction to linear algebra and its many applications. Over the years I wrote and rewrote (and then, more often than not, rewrote some more) assorted sets of notes and learned many interesting things en route. This book is the current end product of that process. The emphasis is on developing a comfortable familiarity with the material. Many lemmas and theorems are made plausible by discussing an example that is chosen to make the underlying ideas transparent in lieu of a formal proof; i.e., I have tried to present the material in the way that most of the mathematicians that I know work rather than in the way they write. The coverage is not intended to be exhaustive (or exhausting), but rather to indicate the rich terrain that is part of the domain of linear algebra and to present a decent sample of some of the tools of the trade of a working analyst that I have absorbed and have found useful and interesting in more than 40 years in the business. To put it another way, I wish someone had taught me this material when I was a graduate student. In those days, in the arrogance of youth, I thought that linear algebra was for boys and girls and that real men and women worked in functional analysis. However, this is but one of many opinions that did not stand the test of time.

In my opinion, the material in this book can be (and has been) used on many levels. A core course in classical linear algebra topics can be based on the first six chapters, plus selected topics from Chapters 7-9 and 13. The latter treats difference equations, differential equations and systems thereof. Chapters 14-16 cover applications to vector calculus, including a proof of the implicit function theorem based on the contractive fixed point theorem, and extremal problems with constraints. Subsequent chapters deal with matrix-valued holomorphic functions, matrix equations, realization theory, eigenvalue location problems, zero location problems, convexity, and matrices with nonnegative entries.

I have taken the liberty of straying into areas that I consider significant, even though they are not usually viewed as part of the package associated with linear algebra. Thus, for example, I have added short sections on complex function theory, Fourier analysis, Lyapunov functions for dynamical systems, boundary value problems and more. A number of the applications are taken from control theory.

I have adapted material from many sources. But the one which was most significant for at least the starting point of a number of topics covered in this work is the wonderful book [56] by Lancaster and Tismenetsky.

A number of students read and commented on substantial sections of assorted drafts: Boris Ettinger, Ariel Ginis, Royi Lachmi, Mark Kozdoba, Evgeny Muzikantov, Simcha Rimler, Jonathan Ronen, Idith Segev and Amit Weinberg. I thank them all, and extend my appreciation to two senior readers: Aad Dijksma and Andrei Iacob for their helpful insightful remarks. A special note of thanks goes to Deborah Smith, my copy editor at the AMS, for her sharp eye and expertise in the world of commas and semicolons.

On the production side, I thank Jason Friedman for typing an early version, and our secretaries Diana Mandelik, Ruby Musrie, Linda Alman, Terry Debesh, all of whom typed selections and to Diana again for preparing all the figures and clarifying numerous mysterious intricacies of LaTeX. I also thank Barbara Beeton of the AMS for helpful advice on AMS LaTeX.

One of the difficulties in preparing a manuscript for a book is knowing when to let go. It is always possible to write it better.¹ Fortunately the AMS maintains a web page: http://www.ams.org/bookpages/gsm-78, for sins of omission and commission (or just plain afterthoughts).

TAM, ACH TEREM NISHLAM,...
October 18, 2006
Rehovot, Israel

¹ Israel Gohberg tells of a conversation with Lev Sakhnovich that took place in Odessa many years ago:
Lev: Israel, how is your book with Mark Gregorovic (Krein) progressing?
Israel: It's about 85% done.
Lev: That's great! Why so sad?
Israel: If you would have asked me yesterday, I would have said 95%.

Chapter 1

Vector spaces

The road to wisdom? Well it’s plain and simple to express. Err and err and err again, but less and less and less.

Cited in [54]

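1.1. Preview

One of the fundamental concerns of Linear Algebra is the solution of linear equations of the form

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1q}x_q &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2q}x_q &= b_2 \\
&\ \,\vdots \\
a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pq}x_q &= b_p,
\end{aligned}
$$

where the $a_{ij}$ and the $b_i$ are given numbers (either real or complex) for $i = 1, \ldots, p$ and $j = 1, \ldots, q$, and we are looking for the $x_j$ for $j = 1, \ldots, q$.

Such a system of equations is equivalent to the matrix equation

$$
A\mathbf{x} = \mathbf{b},
$$

where

$$
A = \begin{bmatrix} a_{11} & \cdots & a_{1q} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pq} \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_q \end{bmatrix} \quad \text{and} \quad
\mathbf{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_p \end{bmatrix}.
$$

• RC Cola: The term $a_{ij}$ in the matrix $A$ sits in the $i$'th row and the $j$'th column of the matrix; i.e., the first index stands for the number of the row and the second for the number of the column. The order is rc as in the popular drink by that name.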

Given $A$ and $\mathbf{b}$, the basic questions are:

(1) When does there exist at least one solution $\mathbf{x}$?
(2) When does there exist at most one solution $\mathbf{x}$?
(3) How to calculate the solutions, when they exist?
(4) How to find approximate solutions?

The answers to these questions are part and parcel of the theory of vector spaces.
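Questions (1) and (2) can be explored numerically on small systems. The sketch below is only an illustration and is not taken from the text: it relies on the standard rank criterion (a solution exists exactly when $A$ and the augmented matrix $[A\ \mathbf{b}]$ have the same rank, and it is unique exactly when that rank equals the number of unknowns), which has not been developed at this point. The helper names `rank` and `classify` are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0]) if m else 0):
        # look for a nonzero entry in column c at or below row r
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def classify(A, b):
    """Return 'none', 'unique' or 'many' for the system Ax = b (assumed criterion)."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    if rank(A) < rank(augmented):
        return "none"
    return "unique" if rank(A) == len(A[0]) else "many"


print(classify([[1, 1], [1, -1]], [2, 0]))  # x1 + x2 = 2, x1 - x2 = 0
print(classify([[1, 1], [2, 2]], [1, 3]))   # inconsistent pair of equations
print(classify([[1, 1], [2, 2]], [1, 2]))   # second equation repeats the first
```

Exact fractions are used instead of floating point so that the zero tests in the elimination are reliable.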

1.2. The abstract definition of a vector space

This subsection is devoted to the abstract definition of a vector space. Even though the emphasis in this course is definitely computational, it seems advisable to start with a few abstract definitions which will be useful in future situations as well as in the present.

A vector space $\mathcal{V}$ over the real numbers is a nonempty collection of objects called vectors, together with an operation called vector addition, which assigns a new vector $\mathbf{u} + \mathbf{v}$ in $\mathcal{V}$ to every pair of vectors $\mathbf{u}$ in $\mathcal{V}$ and $\mathbf{v}$ in $\mathcal{V}$, and an operation called scalar multiplication, which assigns a vector $\alpha\mathbf{v}$ in $\mathcal{V}$ to every real number $\alpha$ and every vector $\mathbf{v}$ in $\mathcal{V}$ such that the following hold:

(1) For every pair of vectors $\mathbf{u}$ and $\mathbf{v}$, $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$; i.e., vector addition is commutative.

(2) For any three vectors $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$, $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$; i.e., vector addition is associative.

(3) There is a zero vector (or, in other terminology, additive identity) $\mathbf{0} \in \mathcal{V}$ such that $\mathbf{0} + \mathbf{v} = \mathbf{v} + \mathbf{0} = \mathbf{v}$ for every vector $\mathbf{v}$ in $\mathcal{V}$.

(4) For every vector $\mathbf{v}$ there is a vector $\mathbf{w}$ (an additive inverse of $\mathbf{v}$) such that $\mathbf{v} + \mathbf{w} = \mathbf{0}$.

(5) For every vector $\mathbf{v}$, $1\mathbf{v} = \mathbf{v}$.

(6) For every pair of real numbers $\alpha$ and $\beta$ and every vector $\mathbf{v}$, $\alpha(\beta\mathbf{v}) = (\alpha\beta)\mathbf{v}$.

(7) For every pair of real numbers $\alpha$ and $\beta$ and every vector $\mathbf{v}$, $(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$.

(8) For every real number $\alpha$ and every pair of vectors $\mathbf{u}$ and $\mathbf{v}$, $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$.

Because of Item 2, we can write $\mathbf{u} + \mathbf{v} + \mathbf{w}$ without brackets; similarly, because of Item 6 we can write $\alpha\beta\mathbf{v}$ without brackets.

From now on we shall use the symbol $\mathbb{R}$ to designate the real numbers, the symbol $\mathbb{C}$ to designate the complex numbers and the symbol $\mathbb{F}$ when the statement in question is valid for both $\mathbb{R}$ and $\mathbb{C}$ and there is no need to specify. Numbers in $\mathbb{F}$ are often referred to as scalars.

A vector space $\mathcal{V}$ over $\mathbb{C}$ is defined in exactly the same way as a vector space $\mathcal{V}$ over $\mathbb{R}$ except that the numbers $\alpha$ and $\beta$ which appear in the definition above are allowed to be complex.

Lemma 1.1. If $\mathcal{V}$ is a vector space over $\mathbb{F}$, then:

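(a) $\mathcal{V}$ has exactly one zero vector.

(b) Every vector $\mathbf{u} \in \mathcal{V}$ has exactly one additive inverse; i.e., if $\mathbf{v}, \mathbf{w} \in \mathcal{V}$ and if $\mathbf{u} + \mathbf{v} = \mathbf{0}$ and $\mathbf{u} + \mathbf{w} = \mathbf{0}$, then $\mathbf{v} = \mathbf{w}$.

(c) $0\mathbf{v} = \mathbf{0}$ for every vector $\mathbf{v} \in \mathcal{V}$.

(d) $\alpha\mathbf{0} = \mathbf{0}$ for every scalar $\alpha \in \mathbb{F}$.

(e) […]

(f) If $\alpha\mathbf{v} = \mathbf{0}$ for some nonzero vector $\mathbf{v} \in \mathcal{V}$, then $\alpha = 0$.

(g) […] $\alpha = \beta$.

Proof. If $\mathbf{0}$ and $\tilde{\mathbf{0}}$ are both zero vectors for $\mathcal{V}$, then $\mathbf{0} = \mathbf{0} + \tilde{\mathbf{0}} = \tilde{\mathbf{0}}$. A similar argument serves to establish (b); i.e., if $\mathbf{u} + \mathbf{v} = \mathbf{0}$ and $\mathbf{u} + \mathbf{w} = \mathbf{0}$, then

$$
\mathbf{v} = \mathbf{v} + \mathbf{0} = \mathbf{v} + (\mathbf{u} + \mathbf{w}) = (\mathbf{v} + \mathbf{u}) + \mathbf{w} = \mathbf{0} + \mathbf{w} = \mathbf{w}.
$$

To verify (c), let $\mathbf{w}$ be an additive inverse for the vector $0\mathbf{v}$. Then, since $0\mathbf{v} = (0 + 0)\mathbf{v} = 0\mathbf{v} + 0\mathbf{v}$,

$$
\mathbf{0} = 0\mathbf{v} + \mathbf{w} = (0\mathbf{v} + 0\mathbf{v}) + \mathbf{w} = 0\mathbf{v} + (0\mathbf{v} + \mathbf{w}) = 0\mathbf{v} + \mathbf{0} = 0\mathbf{v}.
$$

The verification of (d) is similar: If $\mathbf{w}$ is an additive inverse for $\alpha\mathbf{0}$, then

$$
\mathbf{0} = \alpha\mathbf{0} + \mathbf{w} = \alpha(\mathbf{0} + \mathbf{0}) + \mathbf{w} = \alpha\mathbf{0} + (\alpha\mathbf{0} + \mathbf{w}) = \alpha\mathbf{0} + \mathbf{0} = \alpha\mathbf{0}.
$$

Suppose next that $\alpha\mathbf{v} = \mathbf{0}$ for some nonzero vector $\mathbf{v} \in \mathcal{V}$ and $\alpha \neq 0$. Then

$$
\mathbf{v} = 1\mathbf{v} = (\alpha^{-1}\alpha)\mathbf{v} = \alpha^{-1}(\alpha\mathbf{v}) = \alpha^{-1}\mathbf{0} = \mathbf{0},
$$

by (d), which is impossible since $\mathbf{v}$ is nonzero. This completes the proof of (f). Assertions (e) and (g) follow easily from (b) and (f), respectively, and are left to the reader. □

The additive inverse $(-1)\mathbf{v}$ of a vector $\mathbf{v} \in \mathcal{V}$ is usually denoted $-\mathbf{v}$. Correspondingly, we write $\mathbf{u} + (-\mathbf{v})$ as $\mathbf{u} - \mathbf{v}$.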

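Exercise 1.1. Let $\mathcal{V}$ be a vector space over $\mathbb{F}$. Show that if $\alpha, \beta \in \mathbb{F}$ and if $\mathbf{v}$ is a nonzero vector in $\mathcal{V}$, then $\alpha\mathbf{v} = \beta\mathbf{v} \iff \alpha = \beta$. [HINT: $\alpha - \beta \neq 0 \Longrightarrow \mathbf{v} = (\alpha - \beta)^{-1}(\alpha - \beta)\mathbf{v}$.]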

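Example 1.2. The set of column vectors

$$
\mathbb{F}^p = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} : x_i \in \mathbb{F},\ i = 1, \ldots, p \right\}
$$

of height $p$ with entries $x_i \in \mathbb{F}$ that are subject to the natural rules of vector addition

$$
\begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} + \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_p + y_p \end{bmatrix}
$$

and multiplication

$$
\alpha \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} = \begin{bmatrix} \alpha x_1 \\ \vdots \\ \alpha x_p \end{bmatrix}
$$

of the vector $\mathbf{x}$ by a number $\alpha \in \mathbb{F}$ is the most basic example of a vector space. Note the difference between the number $0$ and the vector $\mathbf{0} \in \mathbb{F}^p$. The symbols $\mathbb{R}^p$ (respectively $\mathbb{C}^p$) will be used to designate vectors with entries in $\mathbb{R}$ (respectively $\mathbb{C}$).

Example 1.3. The set $\mathbb{F}^{p \times q}$ of $p \times q$ matrices with entries in $\mathbb{F}$ is a vector space with respect to the rules of vector addition:

$$
\begin{bmatrix} x_{11} & \cdots & x_{1q} \\ \vdots & & \vdots \\ x_{p1} & \cdots & x_{pq} \end{bmatrix} + \begin{bmatrix} y_{11} & \cdots & y_{1q} \\ \vdots & & \vdots \\ y_{p1} & \cdots & y_{pq} \end{bmatrix} = \begin{bmatrix} x_{11} + y_{11} & \cdots & x_{1q} + y_{1q} \\ \vdots & & \vdots \\ x_{p1} + y_{p1} & \cdots & x_{pq} + y_{pq} \end{bmatrix}
$$

and multiplication by a scalar $\alpha \in \mathbb{F}$:

$$
\alpha \begin{bmatrix} x_{11} & \cdots & x_{1q} \\ \vdots & & \vdots \\ x_{p1} & \cdots & x_{pq} \end{bmatrix} = \begin{bmatrix} \alpha x_{11} & \cdots & \alpha x_{1q} \\ \vdots & & \vdots \\ \alpha x_{p1} & \cdots & \alpha x_{pq} \end{bmatrix}.
$$

Notice that $\mathbb{F}^p = \mathbb{F}^{p \times 1}$. The symbols $\mathbb{R}^{p \times q}$ (respectively $\mathbb{C}^{p \times q}$) will be used to designate $p \times q$ matrices with entries in $\mathbb{R}$ (respectively $\mathbb{C}$).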

Exercise 1.2. Show that the space $\mathbb{R}^3$ endowed with the rule

$$
\mathbf{x} \,\square\, \mathbf{y} = \begin{bmatrix} \max(x_1, y_1) \\ \max(x_2, y_2) \\ \max(x_3, y_3) \end{bmatrix}
$$

for vector addition and the usual rule for scalar multiplication is not a vector space over $\mathbb{R}$. [HINT: Show that this "addition" rule does not admit a zero element; i.e., there is no vector $\mathbf{a} \in \mathbb{R}^3$ such that $\mathbf{a} \,\square\, \mathbf{x} = \mathbf{x} \,\square\, \mathbf{a} = \mathbf{x}$ for every $\mathbf{x} \in \mathbb{R}^3$.]
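The hint in Exercise 1.2 can be made concrete with a few lines of code. The following is a rough sketch rather than a solution from the text: for any proposed zero element $\mathbf{a}$, it produces a vector $\mathbf{x}$ (here $\mathbf{a}$ with every entry lowered by one) on which $\mathbf{a}$ fails to act as a zero. The function names `box_add` and `witness_against` are made up for this illustration.

```python
def box_add(x, y):
    """The 'addition' of Exercise 1.2: componentwise maximum."""
    return tuple(max(xi, yi) for xi, yi in zip(x, y))


def witness_against(a):
    """Given a candidate zero element a, return an x with box_add(a, x) != x."""
    return tuple(ai - 1 for ai in a)


a = (-100.0, 0.0, 3.5)       # an arbitrary candidate for the zero element
x = witness_against(a)
print(box_add(a, x))         # equals a, because every entry of x is smaller
print(box_add(a, x) == x)    # so a does not behave like a zero vector on x
```

The same construction works for every candidate, which is the content of the hint: a zero element would need entries no larger than those of every vector in $\mathbb{R}^3$.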

Exercise 1.3. Let $C \subset \mathbb{R}^3$ denote the set of vectors

$$
\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}
$$

such that the polynomial $a_1 + a_2 t + a_3 t^2 \geq 0$ for every $t \in \mathbb{R}$. Show that it is closed under vector addition (i.e., $\mathbf{a}, \mathbf{b} \in C \Longrightarrow \mathbf{a} + \mathbf{b} \in C$) and under multiplication by positive numbers (i.e., $\mathbf{a} \in C$ and $\alpha > 0 \Longrightarrow \alpha\mathbf{a} \in C$), but that $C$ is not a vector space over $\mathbb{R}$. [REMARK: A set $C$ with the indicated two properties is called a cone.]

Exercise 1.4. Show that for each positive integer $n$, the space of polynomials

$$
p(\lambda) = \sum_{j=0}^{n} a_j \lambda^j \quad \text{of degree} \leq n
$$

with coefficients $a_j \in \mathbb{C}$ is a vector space over $\mathbb{C}$ under the natural rules of addition and scalar multiplication. [REMARK: You may assume that $\sum_{j=0}^{n} a_j \lambda^j = 0$ for every $\lambda \in \mathbb{C}$ if and only if $a_0 = a_1 = \cdots = a_n = 0$.]

Exercise 1.5. Let $\mathcal{F}$ denote the set of continuous real-valued functions $f(x)$ on the interval $0 \leq x \leq 1$. Show that $\mathcal{F}$ is a vector space over $\mathbb{R}$ with respect to the natural rules of vector addition ($(f_1 + f_2)(x) = f_1(x) + f_2(x)$) and scalar multiplication ($(\alpha f)(x) = \alpha f(x)$).

1.3. Some definitions

• Subspaces: A subspace $\mathcal{M}$ of a vector space $\mathcal{V}$ over $\mathbb{F}$ is a nonempty subset of $\mathcal{V}$ that is closed under vector addition and scalar multiplication. In other words if $\mathbf{x}$ and $\mathbf{y}$ belong to $\mathcal{M}$, then $\mathbf{x} + \mathbf{y} \in \mathcal{M}$ and $\alpha\mathbf{x} \in \mathcal{M}$ for every scalar $\alpha \in \mathbb{F}$. A subspace of a vector space is automatically a vector space in its own right.

Exercise 1.6. Let $\mathcal{F}_0$ denote the set of continuous real-valued functions $f(x)$ on the interval $0 \leq x \leq 1$ that meet the auxiliary constraints $f(0) = 0$ and $f(1) = 0$. Show that $\mathcal{F}_0$ is a vector space over $\mathbb{R}$ with respect to the natural rules of vector addition and scalar multiplication that were introduced in Exercise 1.5 and that $\mathcal{F}_0$ is a subspace of the vector space $\mathcal{F}$ that was considered there.

Exercise 1.7. Let $\mathcal{F}_1$ denote the set of continuous real-valued functions $f(x)$ on the interval $0 \leq x \leq 1$ that meet the auxiliary constraints $f(0) = 0$ and $f(1) = 1$. Show that $\mathcal{F}_1$ is not a vector space over $\mathbb{R}$ with respect to the natural rules of vector addition and scalar multiplication that were introduced in Exercise 1.5.
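The contrast between Exercises 1.6 and 1.7 can be seen on sample functions. The sketch below is an informal illustration, not a proof: it only tests the endpoint constraints (continuity is taken for granted for the polynomials used), and the helper names `add`, `scale`, `in_F0` and `in_F1` are hypothetical.

```python
def add(f, g):
    """Pointwise sum, (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def scale(alpha, f):
    """Pointwise scalar multiple, (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)


def in_F0(f):
    """Endpoint constraints of Exercise 1.6: f(0) = 0 and f(1) = 0."""
    return f(0) == 0 and f(1) == 0


def in_F1(f):
    """Endpoint constraints of Exercise 1.7: f(0) = 0 and f(1) = 1."""
    return f(0) == 0 and f(1) == 1


f0 = lambda x: x * (1 - x)       # vanishes at both endpoints
g0 = lambda x: x * x * (1 - x)   # vanishes at both endpoints
f1 = lambda x: x                 # f1(1) = 1
g1 = lambda x: x * x             # g1(1) = 1

print(in_F0(add(f0, g0)), in_F0(scale(3, f0)))  # both constraints survive
print(in_F1(f1), in_F1(g1), in_F1(add(f1, g1))) # the sum takes the value 2 at x = 1
```

The failure for the second pair is not special to these two functions: the sum of any two functions with value 1 at $x = 1$ has value 2 there.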

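• Span: If $\mathbf{v}_1, \ldots, \mathbf{v}_k$ is a given set of vectors in a vector space $\mathcal{V}$ over $\mathbb{F}$, then

$$
\operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} = \left\{ \sum_{j=1}^{k} \alpha_j \mathbf{v}_j : \alpha_1, \ldots, \alpha_k \in \mathbb{F} \right\}.
$$

In words, the span is the set of all linear combinations $\alpha_1\mathbf{v}_1 + \cdots + \alpha_k\mathbf{v}_k$ of the indicated set of vectors, with coefficients $\alpha_1, \ldots, \alpha_k$ in $\mathbb{F}$. Or, to put it another way, $\operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is the smallest vector space that contains the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$. It is important to keep in mind that the number of vectors $k$ that were used to define the span is not a good indicator of the size of this space. Thus, for example, if

$$
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 2 \\ 4 \\ 2 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_3 = \begin{bmatrix} 3 \\ 6 \\ 3 \end{bmatrix},
$$

then

$$
\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} = \operatorname{span}\{\mathbf{v}_1\}.
$$

To clarify the notion of the size of the span we need the concept of linear dependence.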
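The collapse of a span onto a single vector can be checked mechanically. The sketch below assumes, purely for illustration, the vectors $\mathbf{v}_1 = (1, 2, 1)$, $\mathbf{v}_2 = 2\mathbf{v}_1$ and $\mathbf{v}_3 = 3\mathbf{v}_1$; the helper `multiple_of` is a hypothetical name. Once $\mathbf{v}_2$ and $\mathbf{v}_3$ are known to be multiples of $\mathbf{v}_1$, every combination $\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3$ equals $(\alpha_1 + 2\alpha_2 + 3\alpha_3)\mathbf{v}_1$, so the three vectors span no more than $\mathbf{v}_1$ does alone.

```python
from fractions import Fraction


def multiple_of(v, u):
    """Return the scalar c with v == c*u if one exists, else None (u must be nonzero)."""
    k = next(i for i, ui in enumerate(u) if ui != 0)  # first nonzero entry of u
    c = Fraction(v[k], u[k])
    return c if all(vi == c * ui for vi, ui in zip(v, u)) else None


v1 = (1, 2, 1)   # assumed sample vectors, chosen so that v2 and v3 are multiples of v1
v2 = (2, 4, 2)
v3 = (3, 6, 3)

print(multiple_of(v2, v1))          # scalar relating v2 to v1
print(multiple_of(v3, v1))          # scalar relating v3 to v1
print(multiple_of((1, 0, 0), v1))   # None: this vector is not in span{v1}
```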

• Linear dependence: A set of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ in a vector space $\mathcal{V}$ over $\mathbb{F}$ is said to be linearly dependent over $\mathbb{F}$ if there exists a set of scalars $\alpha_1, \ldots, \alpha_k \in \mathbb{F}$, not all of which are zero, such that

$$
\alpha_1\mathbf{v}_1 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0}.
$$

Notice that this permits you to express one or more of the given vectors in terms of the others. Thus, if $\alpha_1 \neq 0$, then

$$
\mathbf{v}_1 = -\frac{\alpha_2}{\alpha_1}\mathbf{v}_2 - \cdots - \frac{\alpha_k}{\alpha_1}\mathbf{v}_k
$$

and hence

$$
\operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} = \operatorname{span}\{\mathbf{v}_2, \ldots, \mathbf{v}_k\}.
$$

Further reductions are possible if the vectors $\mathbf{v}_2, \ldots, \mathbf{v}_k$ are still linearly dependent.

• Linear independence: A set of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ in a vector space $\mathcal{V}$ over $\mathbb{F}$ is said to be linearly independent over $\mathbb{F}$ if the only scalars $\alpha_1, \ldots, \alpha_k \in \mathbb{F}$ for which

$$
\alpha_1\mathbf{v}_1 + \cdots + \alpha_k\mathbf{v}_k = \mathbf{0}
$$

are $\alpha_1 = \cdots = \alpha_k = 0$. This is just another way of saying that you cannot express one of these vectors in terms of the others. Moreover, if $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is a set of linearly independent vectors in a vector space $\mathcal{V}$ over $\mathbb{F}$ and if

$$
\mathbf{v} = \alpha_1\mathbf{v}_1 + \cdots + \alpha_k\mathbf{v}_k \quad \text{and} \quad \mathbf{v} = \beta_1\mathbf{v}_1 + \cdots + \beta_k\mathbf{v}_k \tag{1.1}
$$

for some choice of constants $\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_k \in \mathbb{F}$, then $\alpha_j = \beta_j$ for $j = 1, \ldots, k$.

Exercise 1.8. Verify the last assertion; i.e., if (1.1) holds for a linearly independent set of vectors, $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$, then $\alpha_j = \beta_j$ for $j = 1, \ldots, k$. Show by example that this conclusion is false if the given set of $k$ vectors is not linearly independent.

• Basis: A set of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ is said to form a basis for a vector space $\mathcal{V}$ over $\mathbb{F}$ if

(1) $\operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} = \mathcal{V}$.
(2) The vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are linearly independent.

Both of these conditions are essential. The first guarantees that the given set of $k$ vectors is big enough to express every vector $\mathbf{v} \in \mathcal{V}$ as a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_k$; the second that you cannot achieve this with fewer than $k$ vectors. A nontrivial vector space $\mathcal{V}$ has many bases. However, the number of elements in each basis for $\mathcal{V}$ is exactly the same and is referred to as the dimension of $\mathcal{V}$ and will be denoted $\dim \mathcal{V}$. A proof of this statement will be furnished later. The next example should make it plausible.

Example 1.4. It is readily checked that the vectors

$$
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$

form a basis for the vector space $\mathbb{F}^3$ over the field $\mathbb{F}$. It is also not hard to show that no smaller set of vectors will do. (Thus, $\dim \mathbb{F}^3 = 3$, and, of course, $\dim \mathbb{F}^k = k$ for every positive integer $k$.) In a similar vein, the $p \times q$ matrices $E_{ij}$, $i = 1, \ldots, p$, $j = 1, \ldots, q$, that are defined by setting every entry in $E_{ij}$ equal to zero except for the $ij$ entry, which is set equal to one, form a basis for the vector space $\mathbb{F}^{p \times q}$.

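• Matrix multiplication: Let $A = [a_{ij}]$ be a $p \times q$ matrix and $B = [b_{st}]$ be a $q \times r$ matrix. Then the product $AB$ is the $p \times r$ matrix $C = [c_{k\ell}]$ with entries

$$c_{k\ell} = \sum_{j=1}^{q} a_{kj} b_{j\ell}, \qquad k = 1, \ldots, p; \ \ell = 1, \ldots, r.$$

Notice that $c_{k\ell}$ is the matrix product of the $k$'th row $\vec{\mathbf{a}}_k$ of $A$ with the $\ell$'th column $\mathbf{b}_\ell$ of $B$:

$$c_{k\ell} = \vec{\mathbf{a}}_k \mathbf{b}_\ell = \begin{bmatrix} a_{k1} & \cdots & a_{kq} \end{bmatrix} \begin{bmatrix} b_{1\ell} \\ \vdots \\ b_{q\ell} \end{bmatrix}.$$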

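Thus, for example, if

$$A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 1 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 0 & -1 & 1 \\ 0 & 1 & 2 & -1 \end{bmatrix},$$

then

$$AB = \begin{bmatrix} 4 & 7 & 10 & 2 \\ 3 & 4 & 5 & 9 \end{bmatrix}.$$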
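The entry formula for the product can be checked directly on this example. The following is a minimal sketch in plain Python; the helper name `matmul` is an illustrative choice, not notation from the text, and matrices are stored as lists of rows.

```python
def matmul(A, B):
    """Return the product of a p x q matrix A and a q x r matrix B (lists of rows)."""
    p, q, r = len(A), len(B), len(B[0])
    if any(len(row) != q for row in A):
        raise ValueError("inner dimensions do not match")
    # c_kl = sum over j of a_kj * b_jl
    return [[sum(A[k][j] * B[j][l] for j in range(q)) for l in range(r)]
            for k in range(p)]

A = [[1, 3, 5],
     [2, 1, 0]]
B = [[1, 2, 3, 4],
     [1, 0, -1, 1],
     [0, 1, 2, -1]]

AB = matmul(A, B)
print(AB)  # [[4, 7, 10, 2], [3, 4, 5, 9]]
```

Note that `matmul(B, A)` raises an error here, since the number of columns of B (four) does not match the number of rows of A (two).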

Moreover, if $A \in \mathbb{F}^{p \times q}$ and $\mathbf{x} \in \mathbb{F}^q$, then $\mathbf{y} = A\mathbf{x}$ is the vector in $\mathbb{F}^p$ with components $y_i = \sum_{j=1}^{q} a_{ij} x_j$ for $i = 1, \ldots, p$.

• Identity matrix: We shall use the symbol $I_n$ to denote the $n \times n$ matrix $A = [a_{ij}]$, $i, j = 1, \ldots, n$, with $a_{ii} = 1$ for $i = 1, \ldots, n$ and $a_{ij} = 0$ for $i \neq j$. Thus,

$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

The matrix $I_n$ is referred to as the $n \times n$ identity matrix, or just the identity matrix if the size is clear from the context. The name stems from the fact that $I_n \mathbf{x} = \mathbf{x}$ for every vector $\mathbf{x} \in \mathbb{F}^n$.

• Zero matrix: We shall use the symbol $O_{p \times q}$ for the matrix in $\mathbb{F}^{p \times q}$ all of whose entries are equal to zero. The subscript $p \times q$ will be dropped if the size is clear from the context.

The definition of matrix multiplication is such that:

• Matrix multiplication is not commutative; i.e., even if A and B are both $p \times p$ matrices, in general $AB \neq BA$. In fact, if $p > 1$, then one can find A and B such that $AB = O_{p \times p}$, but $BA \neq O_{p \times p}$.

Exercise 1.9. Find a pair of $2 \times 2$ matrices A and B such that $AB = O_{2 \times 2}$ but $BA \neq O_{2 \times 2}$.

• Matrix multiplication is associative: If $A \in \mathbb{F}^{p \times q}$, $B \in \mathbb{F}^{q \times r}$ and $C \in \mathbb{F}^{r \times s}$, then

$$(AB)C = A(BC).$$

• Matrix multiplication is distributive: If $A, A_1, A_2 \in \mathbb{F}^{p \times q}$ and $B, B_1, B_2 \in \mathbb{F}^{q \times r}$, then

$$(A_1 + A_2)B = A_1 B + A_2 B \quad\text{and}\quad A(B_1 + B_2) = AB_1 + AB_2.$$

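• If $A \in \mathbb{F}^{p \times q}$ is expressed both as an array of p row vectors of length q and as an array of q column vectors of height p:

$$A = \begin{bmatrix} \vec{\mathbf{a}}_1 \\ \vdots \\ \vec{\mathbf{a}}_p \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_q \end{bmatrix},$$

and if $B \in \mathbb{F}^{q \times r}$ is expressed both as an array of q row vectors of length r and as an array of r column vectors of height q:

$$B = \begin{bmatrix} \vec{\mathbf{b}}_1 \\ \vdots \\ \vec{\mathbf{b}}_q \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1 & \cdots & \mathbf{b}_r \end{bmatrix},$$

then the product AB can be expressed in the following three ways:

(1.2)    $$AB = \begin{bmatrix} \vec{\mathbf{a}}_1 B \\ \vdots \\ \vec{\mathbf{a}}_p B \end{bmatrix} = \begin{bmatrix} A\mathbf{b}_1 & \cdots & A\mathbf{b}_r \end{bmatrix} = \sum_{i=1}^{q} \mathbf{a}_i \vec{\mathbf{b}}_i.$$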
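The three expressions for AB in formula (1.2) can be compared numerically. The sketch below is only an illustration on one small pair of matrices (it is not a proof, and the variable names `by_rows`, `by_columns` and `by_outer` are assumptions made for this example).

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[k][j] * B[j][l] for j in range(len(B))) for l in range(len(B[0]))]
            for k in range(len(A))]

A = [[1, 3, 5],
     [2, 1, 0]]            # p x q with p = 2, q = 3
B = [[1, 2, 3, 4],
     [1, 0, -1, 1],
     [0, 1, 2, -1]]        # q x r with r = 4
p, q, r = len(A), len(B), len(B[0])

# (i) Row by row: the k'th row of AB is (k'th row of A) times B.
by_rows = [matmul([A[k]], B)[0] for k in range(p)]

# (ii) Column by column: the l'th column of AB is A times (l'th column of B).
columns = [matmul(A, [[B[j][l]] for j in range(q)]) for l in range(r)]
by_columns = [[columns[l][k][0] for l in range(r)] for k in range(p)]

# (iii) Sum over i of (i'th column of A) times (i'th row of B); each term is p x r.
by_outer = [[0] * r for _ in range(p)]
for i in range(q):
    term = matmul([[A[k][i]] for k in range(p)], [B[i]])
    by_outer = [[by_outer[k][l] + term[k][l] for l in range(r)] for k in range(p)]

print(by_rows == by_columns == by_outer == matmul(A, B))
```

The third form is the one that Exercise 1.10 spells out for a $2 \times 3$ matrix times a $3 \times 4$ matrix.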

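Exercise 1.10. Show that if

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ b_{31} & b_{32} & b_{33} & b_{34} \end{bmatrix},$$

then

$$AB = \begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & 0 & 0 \end{bmatrix} B + \begin{bmatrix} 0 & a_{12} & 0 \\ 0 & a_{22} & 0 \end{bmatrix} B + \begin{bmatrix} 0 & 0 & a_{13} \\ 0 & 0 & a_{23} \end{bmatrix} B$$

and hence that

$$AB = \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{14} \end{bmatrix} + \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} \begin{bmatrix} b_{21} & \cdots & b_{24} \end{bmatrix} + \begin{bmatrix} a_{13} \\ a_{23} \end{bmatrix} \begin{bmatrix} b_{31} & \cdots & b_{34} \end{bmatrix}.$$

Exercise 1.11. Verify the three ways of writing a matrix product in formula (1.2). [HINT: Let Exercise 1.10 serve as a guide.]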


• Block multiplication: It is often convenient to express a large matrix as an array of sub-matrices (i.e., blocks of numbers) rather than as an array of numbers. Then the rules of matrix multiplication still apply (block by block) provided that the block decompositions are compatible. Thus, for example, if

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \\ A_{31} & A_{32} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} B_{11} & B_{12} & B_{13} & B_{14} \\ B_{21} & B_{22} & B_{23} & B_{24} \end{bmatrix}$$

with entries $A_{ij} \in \mathbb{F}^{p_i \times q_j}$ and $B_{jk} \in \mathbb{F}^{q_j \times r_k}$, then

$$C = AB = [C_{ij}], \quad i = 1, \ldots, 3, \ j = 1, \ldots, 4,$$

where

$$C_{ij} = A_{i1} B_{1j} + A_{i2} B_{2j}$$

is a $p_i \times r_j$ matrix.

• Transposes: The transpose of a $p \times q$ matrix A is the $q \times p$ matrix whose $k$'th row is equal to the $k$'th column of A laid sideways, $k = 1, \ldots, q$. In other words, the $ij$ entry of A is equal to the $ji$ entry of its transpose. The symbol $A^T$ is used to designate the transpose of A. Thus, for example, if

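$$A = \begin{bmatrix} 1 & 3 & 5 \\ 4 & 2 & 6 \end{bmatrix}, \quad\text{then}\quad A^T = \begin{bmatrix} 1 & 4 \\ 3 & 2 \\ 5 & 6 \end{bmatrix}.$$

It is readily checked that

(1.3)    $$(A^T)^T = A \quad\text{and}\quad (AB)^T = B^T A^T.$$

• Hermitian transposes: The Hermitian transpose $A^H$ of a $p \times q$ matrix A is the same as the transpose $A^T$ of A, except that all the entries in the transposed matrix are replaced by their complex conjugates. Thus, for example, if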

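$$A = \begin{bmatrix} 1 & 3i & 5+i \\ 4 & 2-i & 6i \end{bmatrix}, \quad\text{then}\quad A^H = \begin{bmatrix} 1 & 4 \\ -3i & 2+i \\ 5-i & -6i \end{bmatrix}.$$

It is readily checked that

(1.4)    $$(A^H)^H = A \quad\text{and}\quad (AB)^H = B^H A^H.$$

• Inverses: Let $A \in \mathbb{F}^{p \times q}$. Then:

(1) A matrix $C \in \mathbb{F}^{q \times p}$ is said to be a left inverse of A if $CA = I_q$.
(2) A matrix $B \in \mathbb{F}^{q \times p}$ is said to be a right inverse of A if $AB = I_p$.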


In the first case A is said to be left invertible. In the second case A is said to be right invertible. It is readily checked that if a matrix $A \in \mathbb{F}^{p \times q}$ has both a left inverse C and a right inverse B, then $B = C$:

$$C = C I_p = C(AB) = (CA)B = I_q B = B.$$

Notice that this implies that if A has both a left and a right inverse, then it has exactly one left inverse and exactly one right inverse and (as shown just above) the two are equal. In this instance, we shall say that A is invertible and refer to $B = C$ as the inverse of A and denote it by $A^{-1}$. In other words, a matrix $A \in \mathbb{F}^{p \times q}$ is invertible if and only if there exists a matrix $B \in \mathbb{F}^{q \times p}$ such that

$$AB = I_p \quad\text{and}\quad BA = I_q.$$

In fact, as we shall see later, we must also have $q = p$ in this case.

Exercise 1.12. Show that if A and B are invertible matrices of the same size, then AB is invertible and $(AB)^{-1} = B^{-1} A^{-1}$.

Exercise 1.13. Show that the matrix $A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}$ has no left inverses and no right inverses.

Exercise 1.14. Show that the matrix $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$ has at least two right inverses, but no left inverses.

Exercise 1.15. Show that if a matrix $A \in \mathbb{C}^{p \times q}$ has two right inverses $B_1$ and $B_2$, then $\lambda B_1 + (1 - \lambda) B_2$ is also a right inverse for every choice of $\lambda \in \mathbb{C}$.

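Exercise 1.16. Show that a given matrix $A \in \mathbb{F}^{p \times q}$ has either 0, 1 or infinitely many right inverses and that the same conclusion prevails for left inverses.

Exercise 1.17. Let $A_{11} \in \mathbb{F}^{p \times p}$, $A_{12} \in \mathbb{F}^{p \times q}$ and $A_{21} \in \mathbb{F}^{q \times p}$. Show that if $A_{11}$ is invertible, then

$$\begin{bmatrix} A_{11} & A_{12} \end{bmatrix} \ \text{is right invertible and} \ \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} \ \text{is left invertible.}$$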

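1.4. Mappings

• Mappings: A mapping (or transformation) T from a subset $\mathcal{D}_T$ of a vector space $\mathcal{U}$ into a vector space $\mathcal{V}$ is a rule that assigns exactly one vector $\mathbf{v} \in \mathcal{V}$ to each $\mathbf{u} \in \mathcal{D}_T$. The set $\mathcal{D}_T$ is called the domain of T. The following three examples give some idea of the possibilities:

(a) $T: \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2 \mapsto \begin{bmatrix} 3x_1^2 + 4x_2 \\ x_2 - x_1 \\ x_1 + 2x_2 + 6 \end{bmatrix} \in \mathbb{R}^3.$

(b) $T: \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2 : x_1 - x_2 \neq 0 \right\} \mapsto [1/(x_1 - x_2)] \in \mathbb{R}^1.$

(c) $T: \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2 \mapsto \begin{bmatrix} 3x_1 + x_2 \\ x_1 - x_2 \\ 3x_1 + x_2 \end{bmatrix} \in \mathbb{R}^3.$

The restriction on the domain in case (b) is imposed in order to ensure that the definition is meaningful. In the other two cases the domain is taken equal to the full vector space $\mathbb{R}^2$. In this framework we shall refer to the set

$$\mathcal{N}_T = \{\mathbf{u} \in \mathcal{D}_T : T\mathbf{u} = \mathbf{0}_{\mathcal{V}}\}$$

as the nullspace (or kernel) of T and the set

$$\mathcal{R}_T = \{T\mathbf{u} : \mathbf{u} \in \mathcal{D}_T\}$$

as the range (or image) of T. The subscript $\mathcal{V}$ is added to the symbol $\mathbf{0}$ in the first definition to emphasize that it is the zero vector in $\mathcal{V}$, not in $\mathcal{U}$.

• Linear mapping: A mapping T from a vector space $\mathcal{U}$ over $\mathbb{F}$ into a vector space $\mathcal{V}$ over the same number field $\mathbb{F}$ is said to be a linear mapping (or a linear transformation) if for every choice of $\mathbf{u}, \mathbf{v} \in \mathcal{U}$ and $\alpha \in \mathbb{F}$ the following two conditions are met:

(1) $T(\mathbf{u} + \mathbf{v}) = T\mathbf{u} + T\mathbf{v}$.
(2) $T(\alpha \mathbf{u}) = \alpha T\mathbf{u}$.

It is readily checked that if T is a linear mapping from a vector space $\mathcal{U}$ over $\mathbb{F}$ into a vector space $\mathcal{V}$ over $\mathbb{F}$, then $\mathcal{N}_T$ is a subspace of $\mathcal{U}$ and $\mathcal{R}_T$ is a subspace of $\mathcal{V}$. Moreover, in the preceding set of three examples, T is linear only in case (c).

• The identity: Let $\mathcal{U}$ be a vector space over $\mathbb{F}$. The special linear transformation from $\mathcal{U}$ into $\mathcal{U}$ that maps each vector $\mathbf{u} \in \mathcal{U}$ into itself is called the identity mapping. It is denoted by the symbol $I_n$ if $\mathcal{U} = \mathbb{F}^n$ and by $I_{\mathcal{U}}$ otherwise, though, more often than not, when the underlying space $\mathcal{U}$ is clear from the context, the subscript $\mathcal{U}$ will be dropped and $I$ will be written in place of $I_{\mathcal{U}}$. Thus, $I_{\mathcal{U}} \mathbf{u} = I\mathbf{u} = \mathbf{u}$ for every vector $\mathbf{u} \in \mathcal{U}$.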

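Exercise 1.18. Compute $\mathcal{N}_T$ and $\mathcal{R}_T$ for each of the three cases (a), (b) and (c) considered above and say which are subspaces and which are not.

Linear transformations are intimately connected with matrix multiplication:

Exercise 1.19. Show that if T is a linear transformation from a vector space $\mathcal{U}$ over $\mathbb{F}$ with basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_q\}$ into a vector space $\mathcal{V}$ over $\mathbb{F}$ with basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$, then there exists a unique set of scalars $a_{ij} \in \mathbb{F}$, $i = 1, \ldots, p$ and $j = 1, \ldots, q$, such that

(1.5)    $$T\mathbf{u}_j = \sum_{i=1}^{p} a_{ij} \mathbf{v}_i \quad\text{for}\quad j = 1, \ldots, q$$

and hence that

(1.6)    $$T\Big(\sum_{j=1}^{q} x_j \mathbf{u}_j\Big) = \sum_{i=1}^{p} y_i \mathbf{v}_i \iff A\mathbf{x} = \mathbf{y},$$

where $\mathbf{x} \in \mathbb{F}^q$ has components $x_1, \ldots, x_q$, $\mathbf{y} \in \mathbb{F}^p$ has components $y_1, \ldots, y_p$ and the entries $a_{ij}$ of $A \in \mathbb{F}^{p \times q}$ are determined by formula (1.5).

• Warning: If $A \in \mathbb{C}^{p \times q}$, then matrix multiplication defines a linear map from $\mathbf{x} \in \mathbb{C}^q$ to $A\mathbf{x} \in \mathbb{C}^p$. Correspondingly, the nullspace of this map,

$$\mathcal{N}_A = \{\mathbf{x} \in \mathbb{C}^q : A\mathbf{x} = \mathbf{0}\}, \quad\text{is a subspace of } \mathbb{C}^q,$$

and the range of this map,

$$\mathcal{R}_A = \{A\mathbf{x} : \mathbf{x} \in \mathbb{C}^q\}, \quad\text{is a subspace of } \mathbb{C}^p.$$

However, if $A \in \mathbb{R}^{p \times q}$, then matrix multiplication also defines a linear map from $\mathbf{x} \in \mathbb{R}^q$ to $A\mathbf{x} \in \mathbb{R}^p$; and in this setting the nullspace of this map,

$$\mathcal{N}_A = \{\mathbf{x} \in \mathbb{R}^q : A\mathbf{x} = \mathbf{0}\}, \quad\text{is a subspace of } \mathbb{R}^q,$$

and the range of this map,

$$\mathcal{R}_A = \{A\mathbf{x} : \mathbf{x} \in \mathbb{R}^q\}, \quad\text{is a subspace of } \mathbb{R}^p.$$

In short, it is important to clarify the space on which A is acting, i.e., the domain of A. This will usually be clear from the context.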

1.5. Triangular matrices

An $n \times n$ matrix $A = [a_{ij}]$ is said to be

• upper triangular if all its nonzero entries sit either on or above the diagonal, i.e., if $a_{ij} = 0$ when $i > j$.
• lower triangular if all its nonzero entries sit either on or below the diagonal, i.e., if $A^T$ is upper triangular.
• triangular if it is either upper triangular or lower triangular.
• diagonal if $a_{ij} = 0$ when $i \neq j$.

Systems of equations based on a triangular matrix are particularly convenient to work with, even if the matrix is not invertible.


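Example 1.5. Let $A \in \mathbb{F}^{4 \times 4}$ be a $4 \times 4$ upper triangular matrix with nonzero diagonal entries and let $\mathbf{b}$ be any vector in $\mathbb{F}^4$. Then the vector $\mathbf{x}$ is a solution of the equation

(1.7)    $$A\mathbf{x} = \mathbf{b}$$

if and only if

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 &= b_1 \\ a_{22}x_2 + a_{23}x_3 + a_{24}x_4 &= b_2 \\ a_{33}x_3 + a_{34}x_4 &= b_3 \\ a_{44}x_4 &= b_4. \end{aligned}$$

Therefore, since the diagonal entries of A are nonzero, it is readily seen that these equations admit a (unique) solution, by working from the bottom up:

$$\begin{aligned} x_4 &= a_{44}^{-1} b_4 \\ x_3 &= a_{33}^{-1}(b_3 - a_{34}x_4) \\ x_2 &= a_{22}^{-1}(b_2 - a_{23}x_3 - a_{24}x_4) \\ x_1 &= a_{11}^{-1}(b_1 - a_{12}x_2 - a_{13}x_3 - a_{14}x_4). \end{aligned}$$

Thus, we have shown that for any right-hand side $\mathbf{b}$, the equation (1.7) admits a (unique) solution $\mathbf{x}$. Exploiting the freedom in the choice of $\mathbf{b}$, let $\mathbf{e}_j$, $j = 1, \ldots, 4$, denote the $j$'th column of the identity matrix $I_4$ and let $\mathbf{x}_j$ denote the solution of the equation $A\mathbf{x}_j = \mathbf{e}_j$ for $j = 1, \ldots, 4$. Then the $4 \times 4$ matrix

$$X = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \mathbf{x}_4 \end{bmatrix}$$

with columns $\mathbf{x}_1, \ldots, \mathbf{x}_4$ is a right inverse of A:

$$AX = A\begin{bmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_4 \end{bmatrix} = \begin{bmatrix} A\mathbf{x}_1 & \cdots & A\mathbf{x}_4 \end{bmatrix} = \begin{bmatrix} \mathbf{e}_1 & \cdots & \mathbf{e}_4 \end{bmatrix} = I_4.$$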
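The bottom-up procedure of Example 1.5 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than part of the text: the function names `back_substitute` and `right_inverse` and the sample matrix are hypothetical choices, and exact rational arithmetic from the standard `fractions` module is used so that the check $AX = I_4$ is exact.

```python
from fractions import Fraction

def back_substitute(A, b):
    """Solve Ax = b for an upper triangular A with nonzero diagonal, bottom row first."""
    n = len(A)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        if A[i][i] == 0:
            raise ValueError("zero diagonal entry")
        # x_i = a_ii^{-1} (b_i - sum_{j > i} a_ij x_j)
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(b[i]) - s) / A[i][i]
    return x

def right_inverse(A):
    """Stack the solutions of A x_j = e_j as the columns of X, so that AX = I."""
    n = len(A)
    cols = [back_substitute(A, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

# A hypothetical 4 x 4 upper triangular matrix with nonzero diagonal entries.
A = [[2, 1, 0, 3],
     [0, 1, 4, 1],
     [0, 0, 3, 2],
     [0, 0, 0, 5]]

X = right_inverse(A)
AX = [[sum(A[i][k] * X[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
identity = [[int(i == j) for j in range(4)] for i in range(4)]
print(AX == identity)
```

In this sample the computed X has zeros below the diagonal, which is consistent with the claim of Exercise 1.20 that X is upper triangular.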

Analogous examples can be built for $p \times p$ lower triangular matrices. The only difference is that now it is advantageous to work from the top down. The existence of a left inverse can also be obtained by writing down the requisite equations that must be solved. It is easier, however, to play with transposes. This works because A is a triangular matrix with nonzero diagonal entries if and only if $A^T$ is a triangular matrix with nonzero diagonal entries and

$$YA = I_p \iff A^T Y^T = I_p.$$

Exercise 1.20. Show that the right inverse X of the upper triangular matrix A that is constructed in Example 1.5 is also a left inverse and that it is upper triangular.

Lemma 1.6. Let A be a $p \times p$ triangular matrix. Then

(1) A is invertible if and only if all its diagonal entries are nonzero.

Moreover, if A is an invertible triangular matrix, then

(2) A is upper triangular $\iff$ $A^{-1}$ is upper triangular.
(3) A is lower triangular $\iff$ $A^{-1}$ is lower triangular.

Proof. It suffices to focus on upper triangular matrices because A is lower triangular if and only if $A^T$ is upper triangular. The proof is divided into four steps. The first two verify (1) for $p = 2$. The last two show that if (1) is valid for $p = k$, then it is also valid for $p = k + 1$. The formulas for $A^{-1}$ that are exhibited in these steps serve to justify (2) and (3).

1. If $A \in \mathbb{F}^{2 \times 2}$ is an invertible upper triangular matrix, then its diagonal entries are nonzero.

Under the given assumptions there exists a $2 \times 2$ matrix B such that $AB = BA = I_2$. Thus,

$$A = \begin{bmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{bmatrix}, \quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \implies AB = \begin{bmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = I_2.$$

The last identity is equivalent to the following four conditions:

(1) $a_{11}b_{11} + a_{12}b_{21} = 1$.
(2) $a_{22}b_{21} = 0$.
(3) $a_{11}b_{12} + a_{12}b_{22} = 0$.
(4) $a_{22}b_{22} = 1$.

Clearly, (4) implies that $a_{22} \neq 0$, $b_{22} \neq 0$ and $b_{22} = 1/a_{22}$. Thus, (2) implies that $b_{21} = 0$ and hence (1) reduces to $a_{11}b_{11} = 1$. Therefore, $a_{11} \neq 0$, $b_{11} \neq 0$ and $b_{11} = 1/a_{11}$. This completes the proof of Step 1 and, with the aid of (3), yields formulas for the entries in B:

$$b_{11} = 1/a_{11}, \quad b_{21} = 0, \quad b_{22} = 1/a_{22} \quad\text{and}\quad b_{12} = -a_{11}^{-1} a_{12} a_{22}^{-1}.$$

2. If $A \in \mathbb{F}^{2 \times 2}$ is an upper triangular matrix with nonzero diagonal entries, then it is invertible.

Since $a_{11} \neq 0$ and $a_{22} \neq 0$ by assumption, the matrix

(1.8)    $$B = \begin{bmatrix} a_{11}^{-1} & -a_{11}^{-1} a_{12} a_{22}^{-1} \\ 0 & a_{22}^{-1} \end{bmatrix}$$

is well defined and it is readily checked that $AB = BA = I_2$. This completes the proof of Step 2.


3. If the theorem is true for A E 1?k and ifA E F E

is invertible,

and construct an example to show that the opposite implication is false.

Exercise 1.29. Show that if the matrix E is defined by formula (1.10), then

$$D \ \text{and} \ A - BD^{-1}C \ \text{invertible} \implies E \ \text{is invertible},$$

and show by example that the opposite implication is false.

Exercise 1.30. Show that if the blocks A and D in the matrix E defined by formula (1.10) are invertible, then

$$E \ \text{is invertible} \iff D - CA^{-1}B \ \text{is invertible} \iff A - BD^{-1}C \ \text{is invertible}.$$

Exercise 1.31. Show that if blocks A and D in the matrix E defined by formula (1.10) are invertible and $A - BD^{-1}C$ is invertible, then

(1.15)    $$(A - BD^{-1}C)^{-1} = A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1}.$$

[HINT: $(X - Y)^{-1} - X^{-1} = X^{-1}\{X - (X - Y)\}(X - Y)^{-1}$ when X and $X - Y$ are invertible.]

Exercise 1.32. Show that if blocks A and D in the matrix E defined by formula (1.10) are invertible and $D - CA^{-1}B$ is invertible, then

(1.16)    $$(D - CA^{-1}B)^{-1} = D^{-1} + D^{-1}C(A - BD^{-1}C)^{-1}BD^{-1}.$$

[HINT: See the hint to the preceding exercise.]

Exercise 1.33. Show that if $A \in \mathbb{C}^{p \times p}$, $B \in \mathbb{C}^{p \times q}$, $C \in \mathbb{C}^{q \times p}$ and the matrices A and $A + BC$ are both invertible, then the matrix $I_q + CA^{-1}B$ is invertible and

$$(I_q + CA^{-1}B)^{-1} = I_q - C(A + BC)^{-1}B.$$

Exercise 1.34. Show that if $A \in \mathbb{C}^{p \times p}$, $B \in \mathbb{C}^{p \times q}$, $C \in \mathbb{C}^{q \times p}$ and the matrix $A + BC$ is invertible, then the matrix $\begin{bmatrix} A & B \\ C & -I_q \end{bmatrix}$ is invertible, and find its inverse.


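Exercise 1.35. Let $A \in \mathbb{C}^{p \times p}$, $\mathbf{u} \in \mathbb{C}^p$, $\mathbf{v} \in \mathbb{C}^p$ and assume that A is invertible. Show that

$$\begin{bmatrix} A & \mathbf{u} \\ \mathbf{v}^H & -1 \end{bmatrix} \ \text{is invertible} \iff A + \mathbf{u}\mathbf{v}^H \ \text{is invertible} \iff 1 + \mathbf{v}^H A^{-1} \mathbf{u} \neq 0$$

and that if these conditions are met, then

$$(I_p + \mathbf{u}\mathbf{v}^H A^{-1})^{-1}\mathbf{u} = \mathbf{u}(1 + \mathbf{v}^H A^{-1}\mathbf{u})^{-1}.$$

Exercise 1.36. Show that if in the setting of Exercise 1.35 the condition $1 + \mathbf{v}^H A^{-1}\mathbf{u} \neq 0$ is met, then the Sherman-Morrison formula

(1.17)    $$(A + \mathbf{u}\mathbf{v}^H)^{-1} = A^{-1} - \frac{A^{-1}\mathbf{u}\mathbf{v}^H A^{-1}}{1 + \mathbf{v}^H A^{-1}\mathbf{u}}$$

holds.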
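The Sherman-Morrison formula can be sanity-checked numerically. This sketch is no substitute for the proofs requested in Exercises 1.35 and 1.36. It assumes real entries, so that $\mathbf{v}^H = \mathbf{v}^T$. The sample data and the helper names `matmul` and `inverse` are hypothetical, and the Gauss-Jordan inverse over the rationals is simply one convenient way to obtain exact inverses.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse(A):
    """Gauss-Jordan inverse over the rationals; raises ValueError if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            raise ValueError("matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical sample data with real entries, so that v^H is just v^T.
A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
u = [[1], [2], [0]]        # column vector
vT = [[1, 0, -1]]          # row vector v^T

Ainv = inverse(A)
denom = 1 + matmul(matmul(vT, Ainv), u)[0][0]   # 1 + v^H A^{-1} u, assumed nonzero
uvT = matmul(u, vT)

lhs = inverse([[A[i][j] + uvT[i][j] for j in range(3)] for i in range(3)])
corr = matmul(matmul(Ainv, uvT), Ainv)
rhs = [[Ainv[i][j] - corr[i][j] / denom for j in range(3)] for i in range(3)]
print(denom != 0 and lhs == rhs)
```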

Exercise 1.37. Show that the upper block triangular matrix

$$A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ 0 & A_{22} & A_{23} \\ 0 & 0 & A_{33} \end{bmatrix}$$

with entries $A_{ij}$ of size $p_i \times p_j$ is invertible if and only if the diagonal blocks $A_{11}$, $A_{22}$ and $A_{33}$ are invertible, and find a formula for $A^{-1}$. [HINT: Look for a matrix B of the same form as A such that $AB = I_{p_1 + p_2 + p_3}$.]

1.8. Other matrix products

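• The Schur product $C = A \circ B$ of $A = [a_{ij}] \in \mathbb{C}^{n \times n}$ with $B = [b_{ij}] \in \mathbb{C}^{n \times n}$ is defined as the $n \times n$ matrix $C = [c_{ij}]$ with entries $c_{ij} = a_{ij} b_{ij}$ for $i, j = 1, \ldots, n$.

• The Kronecker product $A \otimes B$ of $A = [a_{ij}] \in \mathbb{C}^{p \times q}$ with $B = [b_{ij}] \in \mathbb{C}^{n \times m}$ is defined by the formula

$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1q}B \\ \vdots & & \vdots \\ a_{p1}B & \cdots & a_{pq}B \end{bmatrix}.$$

The Schur product is clearly commutative. The Kronecker product is not commutative in general, but it is associative:

$$(A \otimes B) \otimes C = A \otimes (B \otimes C)$$

and satisfies the rules

$$(A \otimes B)^T = A^T \otimes B^T \quad\text{and}\quad (A \otimes B)(C \otimes D) = AC \otimes BD,$$

when the indicated matrix multiplications are meaningful. If $\mathbf{x} \in \mathbb{F}^k$, $\mathbf{u} \in \mathbb{F}^k$, $\mathbf{y} \in \mathbb{F}^\ell$ and $\mathbf{v} \in \mathbb{F}^\ell$, then the last rule implies that

$$(\mathbf{x}^T \mathbf{u})(\mathbf{y}^T \mathbf{v}) = (\mathbf{x}^T \otimes \mathbf{y}^T)(\mathbf{u} \otimes \mathbf{v}).$$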
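The rules for the Kronecker product can be illustrated on small integer matrices. The sketch below is an assumption-laden illustration: the helper `kron` is written out by hand from the block formula above, and the sample matrices are arbitrary choices whose sizes make the products $AC$ and $BD$ meaningful.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def kron(A, B):
    """Kronecker product: the block matrix whose (i, j) block is a_ij * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

A = [[1, 2, 0], [-1, 3, 1]]      # 2 x 3
C = [[2, 1], [0, 1], [1, -1]]    # 3 x 2, so AC is meaningful
B = [[1, 2], [3, 4]]             # 2 x 2
D = [[1], [-2]]                  # 2 x 1, so BD is meaningful

# Mixed product rule: (A (x) B)(C (x) D) = AC (x) BD.
mixed_ok = matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D))

# Transpose rule: (A (x) B)^T = A^T (x) B^T.
transpose_ok = transpose(kron(A, B)) == kron(transpose(A), transpose(B))

# The Kronecker product is not commutative in general.
commutes = kron(A, B) == kron(B, A)

print(mixed_ok, transpose_ok, commutes)
```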

Chapter 2

Gaussian elimination

People can tell you... do it like this. But that ain’t the way to learn. You got to do it for yourself.

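Willie Mays, cited in Kahn [51], p. 163

Gaussian elimination is a way of passing from a given system of equations to a new system of equations that is easier to analyze. The passage from the given system to the new system is effected by multiplying both sides of the given equation, say

$$A\mathbf{x} = \mathbf{b},$$

successively on the left by appropriately chosen invertible matrices. The restriction to invertible multipliers is essential. Otherwise, the new system will not have the same set of solutions as the given one. In particular, the left multipliers will be either permutation matrices (which are defined below) or lower triangular matrices with ones on the diagonal. Both types are invertible. The first operation serves to interchange (i.e., permute) rows, whereas the second serves to add a multiple of one row to other rows. Thus, for example,

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ a_{31} & a_{32} & \cdots & a_{3n} \end{bmatrix} = \begin{bmatrix} a_{21} & a_{22} & \cdots & a_{2n} \\ a_{11} & a_{12} & \cdots & a_{1n} \\ a_{31} & a_{32} & \cdots & a_{3n} \end{bmatrix},$$

whereas

$$\begin{bmatrix} 1 & 0 & 0 \\ \alpha & 1 & 0 \\ \beta & 0 & 1 \end{bmatrix} \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ a_{31} & \cdots & a_{3n} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \alpha a_{11} + a_{21} & \alpha a_{12} + a_{22} & \cdots & \alpha a_{1n} + a_{2n} \\ \beta a_{11} + a_{31} & \beta a_{12} + a_{32} & \cdots & \beta a_{1n} + a_{3n} \end{bmatrix}.$$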


2.1. Some preliminary observations

The operation of adding a constant multiple of one row of a $p \times q$ matrix to (or subtracting it from) another row of that matrix can always be achieved by multiplying on the left by a $p \times p$ matrix with ones on the diagonal and one other nonzero entry. Every such matrix can be expressed in the form

(2.1)    $$E_\alpha = I_p + \alpha \mathbf{e}_i \mathbf{e}_j^T \quad\text{with } i \text{ and } j \text{ fixed and } i \neq j,$$

where the vectors $\mathbf{e}_1, \ldots, \mathbf{e}_p$ denote the standard basis for $\mathbb{F}^p$ (i.e., the columns in the identity matrix $I_p$) and $\alpha \in \mathbb{F}$.

It is readily seen that the following conclusions hold for the class of matrices $\mathcal{E}_{ij}$ of the form (2.1):

(1) $\mathcal{E}_{ij}$ is closed under multiplication: $E_\alpha E_\beta = E_{\alpha + \beta}$.
(2) The identity belongs to $\mathcal{E}_{ij}$: $E_0 = I_p$.
(3) Every matrix in $\mathcal{E}_{ij}$ is invertible: $E_\alpha$ is invertible and $E_\alpha^{-1} = E_{-\alpha}$.
(4) Multiplication is commutative in $\mathcal{E}_{ij}$: $E_\alpha E_\beta = E_\beta E_\alpha$.

Thus, the class of matrices of the form (2.1) is a commutative group with respect to matrix multiplication. The same conclusion holds for the more general class of $p \times p$ matrices of the form

(2.2)    $$E_{\mathbf{u}} = I_p + \mathbf{u}\mathbf{w}^T, \quad\text{with}\quad \mathbf{u}, \mathbf{w} \in \mathbb{F}^p \quad\text{and}\quad \mathbf{w}^T \mathbf{u} = 0.$$

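The trade secret is the identity, which is considered in the next exercise, or, in less abstract terms, the observation that

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & a & 1 & 0 \\ 0 & b & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & c & 1 & 0 \\ 0 & d & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & a + c & 1 & 0 \\ 0 & b + d & 0 & 1 \end{bmatrix}$$

and the realization that there is nothing special about the size of this matrix or the second column.

Exercise 2.1. Let $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{F}^p$ be such that $\mathbf{w}^T\mathbf{u} = 0$ and $\mathbf{w}^T\mathbf{v} = 0$. Show that

$$(I_p + \mathbf{u}\mathbf{w}^T)(I_p + \mathbf{v}\mathbf{w}^T) = I_p + (\mathbf{u} + \mathbf{v})\mathbf{w}^T = (I_p + \mathbf{v}\mathbf{w}^T)(I_p + \mathbf{u}\mathbf{w}^T).$$

• Permutation matrices: Every $n \times n$ permutation matrix P is obtained by taking the identity matrix $I_n$ and interchanging some of the rows. Consequently, P can be expressed in terms of the columns $\mathbf{e}_j$, $j = 1, \ldots, n$, of $I_n$ and a one to one mapping $\sigma$ of the set of integers $\{1, \ldots, n\}$ onto itself by the formula

(2.3)    $$P = P_\sigma = \sum_{j=1}^{n} \mathbf{e}_j \mathbf{e}_{\sigma(j)}^T.$$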

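Thus, for example, if $n = 4$ and $\sigma(1) = 3$, $\sigma(2) = 2$, $\sigma(3) = 4$ and $\sigma(4) = 1$, then

$$P_\sigma = \mathbf{e}_1\mathbf{e}_3^T + \mathbf{e}_2\mathbf{e}_2^T + \mathbf{e}_3\mathbf{e}_4^T + \mathbf{e}_4\mathbf{e}_1^T = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.$$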
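Formula (2.3) translates directly into code: the term $\mathbf{e}_j \mathbf{e}_{\sigma(j)}^T$ places a single 1 in row $j$ and column $\sigma(j)$. The sketch below builds $P_\sigma$ for the mapping $\sigma(1) = 3$, $\sigma(2) = 2$, $\sigma(3) = 4$, $\sigma(4) = 1$. The function name `permutation_matrix` and the use of a dictionary with 1-based keys to represent $\sigma$ are assumptions made for this illustration.

```python
def permutation_matrix(sigma):
    """Build P_sigma = sum_j e_j e_{sigma(j)}^T from a dict mapping {1..n} onto itself."""
    n = len(sigma)
    if sorted(sigma) != list(range(1, n + 1)) or sorted(sigma.values()) != list(range(1, n + 1)):
        raise ValueError("sigma must map {1, ..., n} one to one onto itself")
    P = [[0] * n for _ in range(n)]
    for j, sj in sigma.items():
        P[j - 1][sj - 1] = 1      # row j has its single 1 in column sigma(j)
    return P

sigma = {1: 3, 2: 2, 3: 4, 4: 1}
P = permutation_matrix(sigma)
for row in P:
    print(row)

# Multiplying on the left by P permutes rows: row j of PA is row sigma(j) of A.
A = [[10, 11], [20, 21], [30, 31], [40, 41]]
PA = [[sum(P[i][k] * A[k][j] for k in range(4)) for j in range(2)] for i in range(4)]
print(PA)
```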

The set of $n \times n$ permutation matrices also forms a group under multiplication, but this group is not commutative (i.e., conditions (1)–(3) in the list given above are satisfied, but not (4)).

• Orthogonal matrices: An $n \times n$ matrix V with real entries is said to be an orthogonal matrix if $V^T V = I_n$.

Exercise 2.2. Show that every permutation matrix is an orthogonal matrix. [HINT: Use formula (2.3).]

The following notions will prove useful:

• Upper echelon: A $p \times q$ matrix U with k nonzero rows, $1 \le k \le p$, is said to be an upper echelon matrix if the nonzero entries are in the first k rows and the first nonzero entry in row $i$ lies to the left of the first nonzero entry in row $i + 1$ for $i = 1, \ldots, k - 1$; the remaining $p - k$ rows (if any) are all zeros. Thus, for example, the first of the following three matrices is an upper echelon matrix, while the second and third are not.

$$\begin{bmatrix} 3 & 6 & 2 & 4 & 1 & 0 \\ 0 & 0 & 1 & 0 & 5 & 0 \\ 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 4 & 2 & 3 & 1 \\ 0 & 0 & 6 & 0 \\ 0 & 5 & 0 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 0 & 0 \\ 4 & 2 & 3 & 1 \\ 0 & 0 & 6 & 0 \\ 0 & 5 & 0 & 5 \end{bmatrix}$$

• Pivots: The first nonzero entry in each row of an upper echelon matrix is termed a pivot. The pivots in the matrix on the left just above are 3, 1 and 2.

• Pivot columns: A column in an upper echelon matrix U will be referred to as a pivot column if it contains a pivot. Thus, the first, third and fifth columns of the matrix considered in the preceding paragraph are pivot columns. If $GA = U$, where G is invertible and $U \in \mathbb{F}^{p \times q}$ is in upper echelon form with k pivots, then the columns $\mathbf{a}_{i_1}, \ldots, \mathbf{a}_{i_k}$ of A that correspond in position to the pivot columns $\mathbf{u}_{i_1}, \ldots, \mathbf{u}_{i_k}$ of U will also be called pivot columns (even though the pivots are in U not in A) and the entries $x_{i_1}, \ldots, x_{i_k}$ in $\mathbf{x} \in \mathbb{F}^q$ will be referred to as pivot variables.


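2.2. Examples

Example 2.1. Consider the equation $A\mathbf{x} = \mathbf{b}$, where

(2.4)    $$A = \begin{bmatrix} 0 & 2 & 3 & 1 \\ 1 & 5 & 3 & 4 \\ 2 & 6 & 3 & 2 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}.$$

1. Construct the augmented matrix

(2.5)    $$\widetilde{A} = \begin{bmatrix} 0 & 2 & 3 & 1 & 1 \\ 1 & 5 & 3 & 4 & 2 \\ 2 & 6 & 3 & 2 & 1 \end{bmatrix}$$

that is formed by adding $\mathbf{b}$ as an extra column to the matrix A on the far right. The augmented matrix is introduced to ensure that the row operations that are applied to the matrix A are also applied to the vector $\mathbf{b}$.

2. Interchange the first two rows of $\widetilde{A}$ to get

$$\begin{bmatrix} 1 & 5 & 3 & 4 & 2 \\ 0 & 2 & 3 & 1 & 1 \\ 2 & 6 & 3 & 2 & 1 \end{bmatrix} = P_1 \widetilde{A}, \quad\text{where}\quad P_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

has been chosen to obtain a nonzero entry in the upper left-hand corner of the new matrix.

3. Subtract two times the top row of the matrix $P_1\widetilde{A}$ from its bottom row to get

(2.7)    $$\begin{bmatrix} 1 & 5 & 3 & 4 & 2 \\ 0 & 2 & 3 & 1 & 1 \\ 0 & -4 & -3 & -6 & -3 \end{bmatrix} = E_1 P_1 \widetilde{A}, \quad\text{where}\quad E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$$

is chosen to obtain all zeros below the pivot in the first column.

4. Add two times the second row of $E_1 P_1 \widetilde{A}$ to its third row to get

$$\begin{bmatrix} 1 & 5 & 3 & 4 & 2 \\ 0 & 2 & 3 & 1 & 1 \\ 0 & 0 & 3 & -4 & -1 \end{bmatrix} = E_2 E_1 P_1 \widetilde{A} = \begin{bmatrix} U & \mathbf{c} \end{bmatrix}, \quad\text{where}\quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}$$

is chosen to obtain all zeros below the pivot in the second column, $U = E_2 E_1 P_1 A$ is in upper echelon form and $\mathbf{c} = E_2 E_1 P_1 \mathbf{b}$. It was not necessary to permute the rows, since the upper left-hand corner of the block

$$\begin{bmatrix} 2 & 3 & 1 & 1 \\ 0 & 3 & -4 & -1 \end{bmatrix}$$

was already nonzero.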

5. Try to solve the new system of equations

(2.9)    $$U\mathbf{x} = \begin{bmatrix} 1 & 5 & 3 & 4 \\ 0 & 2 & 3 & 1 \\ 0 & 0 & 3 & -4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$$

by solving for the pivot variables from the bottom row up: The bottom row equation is

$$3x_3 - 4x_4 = -1,$$

and hence for the third pivot variable $x_3$ we obtain the formula

$$3x_3 = 4x_4 - 1.$$

The second row equation is

$$2x_2 + 3x_3 + x_4 = 1,$$

and hence for the second pivot variable $x_2$ we obtain the formula

$$2x_2 = -3x_3 - x_4 + 1 = -5x_4 + 2.$$

Finally, the top row equation is

$$x_1 + 5x_2 + 3x_3 + 4x_4 = 2,$$

and hence for the first pivot variable $x_1$ we get

$$\begin{aligned} x_1 &= -5x_2 - 3x_3 - 4x_4 + 2 \\ &= -5\left(\frac{-5x_4 + 2}{2}\right) - (4x_4 - 1) - 4x_4 + 2 \\ &= \frac{9}{2}x_4 - 2. \end{aligned}$$

Thus, we have expressed each of the pivot variables $x_1, x_2, x_3$ in terms of the variable $x_4$. In vector notation,

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ -1/3 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} 9/2 \\ -5/2 \\ 4/3 \\ 1 \end{bmatrix}$$

is a solution of the system of equations (2.9), or equivalently,

(2.10)    $$E_2 E_1 P_1 A\mathbf{x} = E_2 E_1 P_1 \mathbf{b}$$

(with A and $\mathbf{b}$ as in (2.4)) for every choice of $x_4$. However, since the matrices $E_2$, $E_1$ and $P_1$ are invertible, $\mathbf{x}$ is a solution of (2.10) if and only if $A\mathbf{x} = \mathbf{b}$, i.e., if and only if $\mathbf{x}$ is a solution of the original equation.

6. Check that the computed solution solves the original system of equations. Strictly speaking, this step is superfluous, because the construction guarantees that every solution of the new system is a solution of the old system, and vice versa. Nevertheless, this is an extremely important step, because it gives you a way of verifying that your calculations are correct.

Conclusions: Since U is a $3 \times 4$ matrix with 3 pivots, much the same sorts of calculations as those carried out above imply that for each choice of $\mathbf{b} \in \mathbb{F}^3$, the equation $A\mathbf{x} = \mathbf{b}$ considered in this example has at least one solution $\mathbf{x} \in \mathbb{F}^4$. Therefore, $\mathcal{R}_A = \mathbb{F}^3$. Moreover, for any given $\mathbf{b}$, there is a family of solutions of the form $\mathbf{x} = \mathbf{u} + x_4\mathbf{v}$ for every choice of $x_4 \in \mathbb{F}$. But this implies that $A\mathbf{x} = A\mathbf{u} + x_4 A\mathbf{v} = A\mathbf{u}$ for every choice of $x_4 \in \mathbb{F}$, and hence that $\mathbf{v} \in \mathcal{N}_A$. In fact,

$$\mathcal{N}_A = \operatorname{span}\left\{ \begin{bmatrix} 9/2 \\ -5/2 \\ 4/3 \\ 1 \end{bmatrix} \right\} \quad\text{and}\quad \mathcal{R}_A = \mathbb{F}^3.$$
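Step 6 of Example 2.1 (checking the computed solution against the original system) is easy to automate. The sketch below is a check under stated assumptions: it uses A and b as in (2.4), the particular solution u and the null vector v found in Step 5, and exact rational arithmetic. The helper name `apply` is an illustrative choice.

```python
from fractions import Fraction as F

A = [[0, 2, 3, 1],
     [1, 5, 3, 4],
     [2, 6, 3, 2]]
b = [1, 2, 1]

u = [F(-2), F(1), F(-1, 3), F(0)]        # particular solution (x4 = 0)
v = [F(9, 2), F(-5, 2), F(4, 3), F(1)]   # direction multiplied by the free variable x4

def apply(A, x):
    """Return the vector Ax for a matrix A (list of rows) and a vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

print(apply(A, u) == b)            # u solves Ax = b
print(apply(A, v) == [0, 0, 0])    # v belongs to the nullspace of A

# Every member of the family u + x4 v therefore solves Ax = b as well.
for x4 in (F(-3), F(1, 7), F(10)):
    x = [ui + x4 * vi for ui, vi in zip(u, v)]
    assert apply(A, x) == b
```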

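This description of $\mathcal{N}_A$ and $\mathcal{R}_A$, as we shall see shortly, is a consequence of the number of pivots and their positions. (In particular, anticipating a little, it is not an accident that the dimensions of these two spaces sum to the number of columns of A.)

Example 2.2. Consider the equation $A\mathbf{x} = \mathbf{b}$ with

$$A = \begin{bmatrix} 0 & 0 & 4 & 3 \\ 1 & 2 & 4 & 1 \\ 1 & 2 & 8 & 4 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.$$

1. Form the augmented matrix

$$\widetilde{A} = \begin{bmatrix} 0 & 0 & 4 & 3 & b_1 \\ 1 & 2 & 4 & 1 & b_2 \\ 1 & 2 & 8 & 4 & b_3 \end{bmatrix}.$$

2. Interchange the first two rows to get

$$\begin{bmatrix} 1 & 2 & 4 & 1 & b_2 \\ 0 & 0 & 4 & 3 & b_1 \\ 1 & 2 & 8 & 4 & b_3 \end{bmatrix} = P_1 \widetilde{A}$$

with $P_1$ as in Step 2 of the preceding example.

3. Subtract the top row of $P_1\widetilde{A}$ from its bottom row to get

$$\begin{bmatrix} 1 & 2 & 4 & 1 & b_2 \\ 0 & 0 & 4 & 3 & b_1 \\ 0 & 0 & 4 & 3 & b_3 - b_2 \end{bmatrix} = E_1 P_1 \widetilde{A}, \quad\text{where}\quad E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}.$$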

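4. Subtract the second row of $E_1 P_1 \widetilde{A}$ from its third row to get

$$\begin{bmatrix} 1 & 2 & 4 & 1 & b_2 \\ 0 & 0 & 4 & 3 & b_1 \\ 0 & 0 & 0 & 0 & b_3 - b_2 - b_1 \end{bmatrix} = E_2 E_1 P_1 \widetilde{A} = \begin{bmatrix} U & \mathbf{c} \end{bmatrix},$$

where

$$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 0 & 0 & 4 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad \mathbf{c} = \begin{bmatrix} b_2 \\ b_1 \\ b_3 - b_2 - b_1 \end{bmatrix}.$$

5. Try to solve the new system of equations

$$\begin{bmatrix} 1 & 2 & 4 & 1 \\ 0 & 0 & 4 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} b_2 \\ b_1 \\ b_3 - b_2 - b_1 \end{bmatrix},$$

working from the bottom up.

To begin with, the bottom row yields the equation $0 = b_3 - b_2 - b_1$. Thus, it is clear that there are no solutions unless $b_3 = b_1 + b_2$. If this restriction is in force, then the second row yields the equation

$$4x_3 + 3x_4 = b_1$$

and hence, the pivot variable,

$$x_3 = \frac{b_1 - 3x_4}{4}.$$

Next, the first row supplies the equation

$$x_1 + 2x_2 + 4x_3 + x_4 = b_2$$

and hence, the other pivot variable,

$$\begin{aligned} x_1 &= b_2 - 2x_2 - 4x_3 - x_4 \\ &= b_2 - 2x_2 - (b_1 - 3x_4) - x_4 \\ &= b_2 - b_1 - 2x_2 + 2x_4. \end{aligned}$$

Consequently, if $b_3 = b_1 + b_2$, then

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} b_2 - b_1 \\ 0 \\ b_1/4 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} 2 \\ 0 \\ -3/4 \\ 1 \end{bmatrix}$$

is a solution of the given system of equations for every choice of $x_2$ and $x_4$ in $\mathbb{F}$.

6. Check that the computed solution solves the original system of equations.

Conclusions: The preceding calculations imply that the equation $A\mathbf{x} = \mathbf{b}$ is solvable if and only if

$$\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_1 + b_2 \end{bmatrix}; \quad\text{i.e., if and only if}\quad \mathbf{b} \in \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}.$$

Moreover, for each such $\mathbf{b} \in \mathbb{F}^3$ there exists a solution of the form $\mathbf{x} = \mathbf{u} + x_2\mathbf{v}_1 + x_4\mathbf{v}_2$ for every $x_2, x_4 \in \mathbb{F}$. In particular, $x_2 A\mathbf{v}_1 + x_4 A\mathbf{v}_2 = \mathbf{0}$ for every choice of $x_2$ and $x_4$. But this is possible only if $A\mathbf{v}_1 = \mathbf{0}$ and $A\mathbf{v}_2 = \mathbf{0}$.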
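The conclusions of Example 2.2 can also be checked by machine. The following sketch assumes the matrix A of Example 2.2 and the solution formula found in Step 5. The function name `solution` and the sample values of $b_1, b_2, x_2, x_4$ are hypothetical choices made for the illustration.

```python
from fractions import Fraction as F

A = [[0, 0, 4, 3],
     [1, 2, 4, 1],
     [1, 2, 8, 4]]

def apply(A, x):
    """Return the vector Ax."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def solution(b1, b2, x2, x4):
    """The solution of Ax = (b1, b2, b1 + b2) with free variables x2 and x4."""
    x3 = (F(b1) - 3 * x4) / 4
    x1 = F(b2) - F(b1) - 2 * x2 + 2 * x4
    return [x1, F(x2), x3, F(x4)]

# The formula solves the system whenever b3 = b1 + b2, for any x2 and x4.
for b1, b2, x2, x4 in [(1, 2, 0, 0), (3, -1, 5, 2), (0, 0, 1, 1), (7, 7, -2, F(1, 3))]:
    assert apply(A, solution(b1, b2, x2, x4)) == [b1, b2, b1 + b2]

# Conversely, the third row of A is the sum of the first two, so the third
# component of Ax always equals the sum of the first two components.
rows_dependent = all(A[2][j] == A[0][j] + A[1][j] for j in range(4))
print(rows_dependent)
```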

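Exercise 2.3. Check that for the matrix A in Example 2.2, $\mathcal{R}_A$ is the span of the pivot columns of A:

$$\mathcal{R}_A = \operatorname{span}\left\{ \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 4 \\ 4 \\ 8 \end{bmatrix} \right\} \quad\text{and}\quad \mathcal{N}_A = \operatorname{span}\left\{ \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ -3/4 \\ 1 \end{bmatrix} \right\}.$$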

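The next example is carried out more quickly.

Example 2.3. Let

$$A = \begin{bmatrix} 0 & 0 & 3 & 4 & 7 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 2 & 3 & 6 & 8 \\ 0 & 0 & 6 & 8 & 14 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}.$$

Then a vector $\mathbf{x} \in \mathbb{F}^5$ is a solution of the equation $A\mathbf{x} = \mathbf{b}$ if and only if

$$\begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 3 & 4 & 7 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} b_2 \\ b_1 \\ b_3 - 2b_2 - b_1 \\ b_4 - 2b_1 \end{bmatrix}.$$

The pivots of the upper echelon matrix on the left are in columns 2, 3 and 4. Therefore, upon solving for the pivot variables $x_2$, $x_3$ and $x_4$ in terms of $x_1$, $x_5$ and $b_1, \ldots, b_4$ from the bottom row up, we obtain the formulas

$$\begin{aligned} 0 &= b_4 - 2b_1 \\ 2x_4 &= b_3 - 2b_2 - b_1 - x_5 \\ 3x_3 &= b_1 - 4x_4 - 7x_5 = 3b_1 + 4b_2 - 2b_3 - 5x_5 \\ x_2 &= b_2. \end{aligned}$$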

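But this is the same as

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} x_1 \\ b_2 \\ (-5x_5 + 3b_1 + 4b_2 - 2b_3)/3 \\ (-x_5 + b_3 - 2b_2 - b_1)/2 \\ x_5 \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + x_5 \begin{bmatrix} 0 \\ 0 \\ -5/3 \\ -1/2 \\ 1 \end{bmatrix} + b_1 \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1/2 \\ 0 \end{bmatrix} + b_2 \begin{bmatrix} 0 \\ 1 \\ 4/3 \\ -1 \\ 0 \end{bmatrix} + b_3 \begin{bmatrix} 0 \\ 0 \\ -2/3 \\ 1/2 \\ 0 \end{bmatrix}$$

$$= x_1\mathbf{u}_1 + x_5\mathbf{u}_2 + b_1\mathbf{u}_3 + b_2\mathbf{u}_4 + b_3\mathbf{u}_5,$$

where $\mathbf{u}_1, \ldots, \mathbf{u}_5$ denote the five vectors in $\mathbb{F}^5$ of the preceding line. Thus, we have shown that for each vector $\mathbf{b} \in \mathbb{F}^4$ with $b_4 = 2b_1$, the vector

$$\mathbf{x} = x_1\mathbf{u}_1 + x_5\mathbf{u}_2 + b_1\mathbf{u}_3 + b_2\mathbf{u}_4 + b_3\mathbf{u}_5$$

is a solution of the equation $A\mathbf{x} = \mathbf{b}$ for every choice of $x_1$ and $x_5$. Therefore, $x_1\mathbf{u}_1 + x_5\mathbf{u}_2$ is a solution of the equation $A\mathbf{x} = \mathbf{0}$ for every choice of $x_1, x_5 \in \mathbb{F}$. Thus, $\mathbf{u}_1, \mathbf{u}_2 \in \mathcal{N}_A$ and, as

$$A\mathbf{x} = x_1 A\mathbf{u}_1 + x_5 A\mathbf{u}_2 + b_1 A\mathbf{u}_3 + b_2 A\mathbf{u}_4 + b_3 A\mathbf{u}_5 = b_1 A\mathbf{u}_3 + b_2 A\mathbf{u}_4 + b_3 A\mathbf{u}_5,$$

the vectors

$$\mathbf{v}_1 = A\mathbf{u}_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 2 \end{bmatrix}, \quad \mathbf{v}_2 = A\mathbf{u}_4 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad\text{and}\quad \mathbf{v}_3 = A\mathbf{u}_5 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

belong to $\mathcal{R}_A$.

Exercise 2.4. Let $\mathbf{a}_j$, $j = 1, \ldots, 5$, denote the $j$'th column vector of the matrix A considered in the preceding example. Show that

(1) $\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} = \operatorname{span}\{\mathbf{a}_2, \mathbf{a}_3, \mathbf{a}_4\}$, i.e., the span of the pivot columns of A.
(2) $\operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2\} = \mathcal{N}_A$ and $\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} = \mathcal{R}_A$.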

2.3. Upper echelon matrices

The examples in the preceding section serve to illustrate the central role played by the number of pivots in an upper echelon matrix U and their positions when trying to solve systems of equations by Gaussian elimination. Our next main objective is to exploit the special structure of upper echelon matrices in order to draw some general conclusions for matrices in this class. Extensions to general matrices will then be made on the basis of the following lemma:

Lemma 2.4. Let $A \in \mathbb{F}^{p \times q}$ and assume that $A \neq O_{p \times q}$. Then there exists an invertible matrix $G \in \mathbb{F}^{p \times p}$ such that

(2.11)    $$GA = U$$

is in upper echelon form.

Proof. By Gaussian elimination there exists a sequence $P_1, P_2, \ldots, P_k$ of $p \times p$ permutation matrices and a sequence $E_1, E_2, \ldots, E_k$ of lower triangular matrices with ones on the diagonal such that

$$E_k P_k \cdots E_2 P_2 E_1 P_1 A = U$$

is in upper echelon form. Consequently the matrix $G = E_k P_k \cdots E_2 P_2 E_1 P_1$ fulfills the asserted conditions, since it is the product of invertible matrices. □
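The construction in the proof of Lemma 2.4 can be sketched as a short routine that carries out the row interchanges and row subtractions while recording them in G. This is one possible implementation under stated assumptions, not the text's own algorithm: the function name `upper_echelon` is hypothetical, exact rational arithmetic is used, and G is accumulated by applying each row operation to the identity matrix rather than by forming the individual factors $P_j$ and $E_j$.

```python
from fractions import Fraction

def upper_echelon(A):
    """Return (G, U) with G invertible and GA = U in upper echelon form."""
    p, q = len(A), len(A[0])
    U = [[Fraction(x) for x in row] for row in A]
    G = [[Fraction(int(i == j)) for j in range(p)] for i in range(p)]
    row = 0
    for col in range(q):
        # Look for a nonzero entry in this column at or below the current row.
        piv = next((r for r in range(row, p) if U[r][col] != 0), None)
        if piv is None:
            continue
        # Permutation step: interchange rows (recorded in G as well).
        U[row], U[piv] = U[piv], U[row]
        G[row], G[piv] = G[piv], G[row]
        # Elimination step: subtract multiples of the pivot row from the rows below.
        for r in range(row + 1, p):
            m = U[r][col] / U[row][col]
            if m != 0:
                U[r] = [x - m * y for x, y in zip(U[r], U[row])]
                G[r] = [x - m * y for x, y in zip(G[r], G[row])]
        row += 1
        if row == p:
            break
    return G, U

A = [[0, 2, 3, 1],
     [1, 5, 3, 4],
     [2, 6, 3, 2]]          # the matrix of Example 2.1
G, U = upper_echelon(A)
GA = [[sum(G[i][k] * A[k][j] for k in range(3)) for j in range(4)] for i in range(3)]
print(U == [[1, 5, 3, 4], [0, 2, 3, 1], [0, 0, 3, -4]], GA == U)
```

For this input the routine returns the same U as Example 2.1, and the recorded G agrees with the product $E_2 E_1 P_1$ formed there.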

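Lemma 2.5. Let $U \in \mathbb{F}^{p \times q}$ be an upper echelon matrix with k pivots and let $\mathbf{e}_j$ denote the $j$'th column of $I_p$ for $j = 1, \ldots, p$. Then:

(1) $k \le \min\{p, q\}$.
(2) The pivot columns of U are linearly independent.
(3) The span of the pivot columns $= \operatorname{span}\{\mathbf{e}_1, \ldots, \mathbf{e}_k\} = \mathcal{R}_U$; i.e.,
    (a) If $k < p$, then $\mathcal{R}_U = \left\{ \begin{bmatrix} \mathbf{b} \\ \mathbf{0} \end{bmatrix} : \mathbf{b} \in \mathbb{F}^k \text{ and } \mathbf{0} \in \mathbb{F}^{p-k} \right\}$.
    (b) If $k = p$, then $\mathcal{R}_U = \mathbb{F}^p$.
(4) The first k columns of $U^T$ form a basis for $\mathcal{R}_{U^T}$.

Proof. The first assertion follows from the fact that there is at most one pivot in each column and at most one pivot in each row.

Next, let $\mathbf{u}_1, \ldots, \mathbf{u}_q$ denote the columns of U and let $\mathbf{u}_{i_1}, \ldots, \mathbf{u}_{i_k}$ (with $i_1 < \cdots < i_k$) denote the pivot columns of U. Then clearly

(2.12)    $$\operatorname{span}\{\mathbf{u}_{i_1}, \ldots, \mathbf{u}_{i_k}\} \subseteq \operatorname{span}\{\mathbf{u}_1, \ldots, \mathbf{u}_q\} \subseteq \left\{ \begin{bmatrix} \mathbf{b} \\ \mathbf{0} \end{bmatrix} : \mathbf{b} \in \mathbb{F}^k \text{ and } \mathbf{0} \in \mathbb{F}^{p-k} \right\},$$

if $k < p$. On the other hand, the matrix formed by arraying the pivot columns one after the other is of special form:

$$\begin{bmatrix} \mathbf{u}_{i_1} & \cdots & \mathbf{u}_{i_k} \end{bmatrix} = \begin{bmatrix} U_{11} \\ U_{21} \end{bmatrix},$$

where $U_{11}$ is a $k \times k$ upper triangular matrix with the pivots as diagonal entries and $U_{21} = O_{(p-k) \times k}$. Therefore, $U_{11}$ is invertible, (2) holds and, for any choice of $\mathbf{b} \in \mathbb{F}^k$, the formulas

$$\begin{bmatrix} \mathbf{u}_{i_1} & \cdots & \mathbf{u}_{i_k} \end{bmatrix} U_{11}^{-1}\mathbf{b} = \begin{bmatrix} U_{11} \\ U_{21} \end{bmatrix} U_{11}^{-1}\mathbf{b} = \begin{bmatrix} \mathbf{b} \\ \mathbf{0} \end{bmatrix}$$

imply that

(2.13)    $$\left\{ \begin{bmatrix} \mathbf{b} \\ \mathbf{0} \end{bmatrix} : \mathbf{b} \in \mathbb{F}^k \text{ and } \mathbf{0} \in \mathbb{F}^{p-k} \right\} \subseteq \{U\mathbf{x} : \mathbf{x} \in \mathbb{F}^q\} \subseteq \operatorname{span}\{\mathbf{u}_{i_1}, \ldots, \mathbf{u}_{i_k}\}.$$

The two inclusions (2.12) and (2.13) yield the equality advertised in (a) of (3). The same argument (but with $U = U_{11}$) serves to justify (b) of (3). Item (4) is easy and is left to the reader. □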

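Exercise 2.5. Verify (4) of Lemma 2.5.

Exercise 2.6. Let $U \in \mathbb{F}^{p \times q}$ be an upper echelon matrix with k pivots. Show that there exists an invertible matrix $K \in \mathbb{F}^{q \times q}$ such that:

(1) If $k < q$, then $\mathcal{R}_{U^T} = \left\{ K \begin{bmatrix} \mathbf{b} \\ \mathbf{0} \end{bmatrix} : \mathbf{b} \in \mathbb{F}^k \text{ and } \mathbf{0} \in \mathbb{F}^{q-k} \right\}$.
(2) If $k = q$, then $\mathcal{R}_{U^T} = \mathbb{F}^q$.

[HINT: In case of difficulty, try some numerical examples for orientation.]

Exercise 2.7. Let U be a $4 \times 5$ matrix of the form

$$U = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 & \mathbf{u}_4 & \mathbf{u}_5 \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} & u_{13} & u_{14} & u_{15} \\ 0 & 0 & u_{23} & u_{24} & u_{25} \\ 0 & 0 & 0 & u_{34} & u_{35} \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

with $u_{11}$, $u_{23}$ and $u_{34}$ all nonzero. Show that $\operatorname{span}\{\mathbf{u}_1, \mathbf{u}_3, \mathbf{u}_4\} = \mathcal{R}_U$.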


Exercise 2.8. Find a basis for the null space NU of the 4 x 5 matrix U

considered in Exercise 2.7 in terms of its entries uij, when the pivots of U are all set equal to one.

Lemma 2.6. Let $U \in \mathbb{F}^{p\times q}$ be in upper echelon form with $k$ pivots. Then:

(1) $k \le \min\{p, q\}$.

(2) $k = q \iff U$ is left invertible $\iff \mathcal{N}_U = \{0\}$.

(3) $k = p \iff U$ is right invertible $\iff \mathcal{R}_U = \mathbb{F}^p$.

Proof. [...] If $V$ is a left inverse of $U$ and $Ux = 0$, then
$$x = (VU)x = V(Ux) = V0 = 0,$$
i.e., $U$ left invertible $\Longrightarrow \mathcal{N}_U = \{0\}$. To complete the proof of (2), observe that if there are fewer than $q$ pivots, then at least one of the columns of $U$, say $u_r$, is not a pivot column of $U$. Thus, as $u_r$ can be expressed as a linear combination of the pivot columns, it follows that
$$u_r = \sum_{j=1,\, j\ne r}^{q} c_j u_j \quad \text{for some choice of coefficients } c_1, \ldots, c_q \in \mathbb{F}.$$
But this in turn implies that the vector $d$ with components $d_r = 1$ and $d_j = -c_j$ for $j \ne r$ is a nonzero vector in $\mathcal{N}_U$, which contradicts the assumption that $\mathcal{N}_U = \{0\}$. Therefore, $\mathcal{N}_U = \{0\}$ implies that $U$ has $q$ pivots.

Finally, even though the equivalence $k = p \iff \mathcal{R}_U = \mathbb{F}^p$ is known from Lemma 2.5, we shall present an independent proof of all of (3), because it is instructive and indicates how to construct right inverses, when they exist. We proceed in three steps:

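(a) $k = p \Longrightarrow U$ is right invertible: If $k = p = q$, then $U$ is right (and left) invertible by Lemma 1.6. If $k = p$ and $q > p$, then there exists a $q \times q$ permutation matrix $P$ that (multiplying $U$ on the right) serves to interchange the columns of $U$ so that the pivots are concentrated on the left, i.e.,
$$UP = \begin{bmatrix} U_{11} & U_{12} \end{bmatrix},$$
where $U_{11}$ is a $p \times p$ upper triangular matrix with nonzero diagonal entries. Thus, if $q > p$ and $V \in \mathbb{F}^{q\times p}$ is written in block form as
$$V = \begin{bmatrix} V_{11} \\ V_{21} \end{bmatrix}$$
with $V_{11} \in \mathbb{F}^{p\times p}$ and $V_{21} \in \mathbb{F}^{(q-p)\times p}$, then $UPV = U_{11}V_{11} + U_{12}V_{21}$, and the choice $V_{11} = U_{11}^{-1}$, $V_{21} = O$ yields $UPV = I_p$; i.e., $PV$ is a right inverse of $U$.

(b) $U$ is right invertible $\Longrightarrow \mathcal{R}_U = \mathbb{F}^p$: If $U$ is right invertible and $V$ is a right inverse of $U$, then for each choice of $b \in \mathbb{F}^p$, $x = Vb$ is a solution of the equation $Ux = b$:
$$UV = I_p \Longrightarrow Ux = U(Vb) = (UV)b = I_p b = b;$$
i.e., (b) holds.

(c) $\mathcal{R}_U = \mathbb{F}^p \Longrightarrow k = p$: If $\mathcal{R}_U = \mathbb{F}^p$, then there exists a vector $v \in \mathbb{F}^q$ such that $Uv = e_p$, where $e_p$ denotes the $p$'th column of $I_p$. If $U$ has fewer than $p$ pivots, then the last row of $U$, $e_p^T U = 0^T$, i.e.,
$$1 = e_p^T e_p = e_p^T(Uv) = (e_p^T U)v = 0^T v = 0,$$
which is impossible. Therefore, (c) holds.

Exercise 2.9. Let A =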

1

1

0

2

0

1

O

0

0

and B =

III

0

0

1

0

1

0 . Find a basis for

1

0

0

each of the spaces $\mathcal{R}_{BA}$, $\mathcal{R}_A$ and $\mathcal{R}_{AB}$.

Exercise 2.10. Find a basis for each of the spaces $\mathcal{N}_{BA}$, $\mathcal{N}_A$ and $\mathcal{N}_{AB}$ for the matrices $A$ and $B$ that are given in the preceding exercise.

Exercise 2.11. Show that if $A \in \mathbb{F}^{p\times q}$, $B \in \mathbb{F}^{p\times p}$ and $u_1, \ldots, u_k$ is a basis for $\mathcal{R}_A$, then $\operatorname{span}\{Bu_1, \ldots, Bu_k\} = \mathcal{R}_{BA}$ and that this second set of vectors will be a basis for $\mathcal{R}_{BA}$ if $B$ is left invertible.

Exercise 2.12. Show that if $A$ is a $p \times q$ matrix and $C$ is a $q \times q$ invertible matrix, then $\mathcal{R}_{AC} = \mathcal{R}_A$.


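Exercise 2.13. Show that if $U \in \mathbb{F}^{p\times q}$ is a $p \times q$ matrix in upper echelon form with $p$ pivots, then $U$ has exactly one right inverse if and only if $p = q$.

If $A \in \mathbb{F}^{p\times q}$ and $\mathcal{U}$ is a subspace of $\mathbb{F}^q$, then

(2.14) $A\,\mathcal{U} = \{Au : u \in \mathcal{U}\}$.

Exercise 2.14. Show that if $GA = B$ and $G$ is invertible (as is the case in formula (2.11) with $U = B$), then $\mathcal{R}_B = G\mathcal{R}_A$, $\mathcal{N}_B = \mathcal{N}_A$, $\mathcal{R}_{B^T} = \mathcal{R}_{A^T}$ and $G^T\mathcal{N}_{B^T} = \mathcal{N}_{A^T}$.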

Exercise 2.15. Let $U \in \mathbb{F}^{p\times q}$ be an upper echelon matrix with $k$ pivots, where $1 \le k \le p < q$. Show that $\mathcal{N}_U \ne \{0\}$. [HINT: There exists a $q \times q$ permutation matrix $P$ (that is introduced to permute the columns of $U$, if need be) such that
$$UP = \begin{bmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{bmatrix}$$
if $k$ [...]

[...] $0$, then (2) implies that $\mathcal{U} \ne \mathcal{V}$ and hence that at least one of the vectors in the basis, say $v_{i_1}$, is not in $\mathcal{U}$. Therefore, $\{u_1, \ldots, u_k, v_{i_1}\}$ is a linearly independent set of vectors in $\mathcal{V}$. If $r > 1$, then there is a vector $v_{i_2}$ in the basis that is not in $\operatorname{span}\{u_1, \ldots, u_k, v_{i_1}\}$. Therefore, $\{u_1, \ldots, u_k, v_{i_1}, v_{i_2}\}$ is a linearly independent set of vectors inside the vector space $\mathcal{V}$. If $r > 2$, then the procedure continues until $r$ new vectors $\{v_{i_1}, \ldots, v_{i_r}\}$ have been added to the original set $\{u_1, \ldots, u_k\}$ to form a basis for $\mathcal{V}$. $\Box$

• dimension: The preceding theorem guarantees that if $\mathcal{V}$ is a vector space over $\mathbb{F}$ with a finite basis, then every basis of $\mathcal{V}$ has exactly the same number of vectors. This number is called the dimension of the vector space.

• zero dimension: The dimension of the vector space $\{0\}$ will be assigned the number zero.

Exercise 2.18. Show that $\mathbb{F}^k$ is a $k$ dimensional vector space over $\mathbb{F}$.

Exercise 2.19. Let $T$ be a linear mapping from a finite dimensional vector space $\mathcal{U}$ over $\mathbb{F}$ into a vector space $\mathcal{V}$ over $\mathbb{F}$. Show that $\dim \mathcal{R}_T \le \dim \mathcal{U}$. [HINT: If $u_1, \ldots, u_n$ is a basis for $\mathcal{U}$, then the vectors $Tu_j$, $j = 1, \ldots, n$, span $\mathcal{R}_T$.]

Exercise 2.20. Construct a linear mapping $T$ from a vector space $\mathcal{U}$ over $\mathbb{F}$ into a vector space $\mathcal{V}$ over $\mathbb{F}$ such that $\dim \mathcal{R}_T < \dim \mathcal{U}$.

2.4. The conservation of dimension

Theorem 2.9. Let $T$ be a linear mapping from a finite dimensional vector space $\mathcal{U}$ over $\mathbb{F}$ into a vector space $\mathcal{V}$ over $\mathbb{F}$ (finite dimensional or not). Then

(2.15) $\dim \mathcal{N}_T + \dim \mathcal{R}_T = \dim \mathcal{U}$.

Proof. In view of Exercise 2.19, $\mathcal{R}_T$ is automatically a finite dimensional space regardless of the dimension of $\mathcal{V}$. Suppose first that $\mathcal{N}_T \ne \{0\}$, $\mathcal{R}_T \ne \{0\}$ and let $u_1, \ldots, u_k$ be a basis for $\mathcal{N}_T$, $v_1, \ldots, v_\ell$ be a basis for $\mathcal{R}_T$ and choose vectors $y_j \in \mathcal{U}$ such that
$$Ty_j = v_j, \quad j = 1, \ldots, \ell.$$

The first item of business is to show that the vectors $u_1, \ldots, u_k$ and $y_1, \ldots, y_\ell$ are linearly independent over $\mathbb{F}$. Suppose that there exist scalars $\alpha_1, \ldots, \alpha_k$ and $\beta_1, \ldots, \beta_\ell$ in $\mathbb{F}$ such that

(2.16) $\displaystyle\sum_{i=1}^{k} \alpha_i u_i + \sum_{j=1}^{\ell} \beta_j y_j = 0$.

Then
$$T\left(\sum_{i=1}^{k} \alpha_i u_i + \sum_{j=1}^{\ell} \beta_j y_j\right) = T0 = 0.$$
Thus,
$$0 = \sum_{i=1}^{k} \alpha_i Tu_i + \sum_{j=1}^{\ell} \beta_j Ty_j = 0 + \sum_{j=1}^{\ell} \beta_j v_j.$$
Therefore, $\beta_1 = \cdots = \beta_\ell = 0$ and so too, by (2.16), $\alpha_1 = \cdots = \alpha_k = 0$. This completes the proof of the asserted linear independence.

The next step is to verify that the vectors $u_1, \ldots, u_k, y_1, \ldots, y_\ell$ span $\mathcal{U}$ and thus that this set of vectors is a basis for $\mathcal{U}$. To this end, let $w \in \mathcal{U}$. Then, since
$$Tw = \sum_{j=1}^{\ell} \beta_j v_j = \sum_{j=1}^{\ell} \beta_j Ty_j$$
for some choice of $\beta_1, \ldots, \beta_\ell \in \mathbb{F}$, it follows that
$$T\left(w - \sum_{j=1}^{\ell} \beta_j y_j\right) = 0.$$
This means that
$$w - \sum_{j=1}^{\ell} \beta_j y_j \in \mathcal{N}_T$$
and, consequently, this vector can be expressed as a linear combination of $u_1, \ldots, u_k$. In other words,
$$w = \sum_{i=1}^{k} \alpha_i u_i + \sum_{j=1}^{\ell} \beta_j y_j$$
for some choice of scalars $\alpha_1, \ldots, \alpha_k$ and $\beta_1, \ldots, \beta_\ell$ in $\mathbb{F}$. But this means that
$$\operatorname{span}\{u_1, \ldots, u_k, y_1, \ldots, y_\ell\} = \mathcal{U}$$
and hence, in view of the already exhibited linear independence, that
$$\dim \mathcal{U} = k + \ell = \dim \mathcal{N}_T + \dim \mathcal{R}_T,$$
as claimed.

Suppose next that $\mathcal{N}_T = \{0\}$ and $\mathcal{R}_T \ne \{0\}$. Then much the same sort of argument serves to prove that if $v_1, \ldots, v_\ell$ is a basis for $\mathcal{R}_T$ and if $y_j \in \mathcal{U}$ is such that $Ty_j = v_j$ for $j = 1, \ldots, \ell$, then the vectors $y_1, \ldots, y_\ell$ are linearly independent and span $\mathcal{U}$. Thus, $\dim \mathcal{U} = \dim \mathcal{R}_T = \ell$, and hence formula (2.15) is still in force, since $\dim \mathcal{N}_T = 0$.

It remains only to consider the case $\mathcal{R}_T = \{0\}$. But then $\mathcal{N}_T = \mathcal{U}$, and formula (2.15) is still valid. $\Box$


Remark 2.10. We shall refer to formula (2.15) as the principle of conservation of dimension. Notice that it is correct as stated if $\mathcal{U}$ is a finite dimensional subspace of some other vector space $\mathcal{W}$.

2.5. Quotient spaces

This section is devoted to a brief sketch of another approach to establishing Theorem 2.9 that is based on quotient spaces. It can be skipped without loss of continuity.

• Quotient spaces: Let $\mathcal{V}$ be a vector space over $\mathbb{F}$ and let $\mathcal{M}$ be a subspace of $\mathcal{V}$ and, for $u \in \mathcal{V}$, let $u_{\mathcal{M}} = \{u + m : m \in \mathcal{M}\}$. Then $\mathcal{V}/\mathcal{M} = \{u_{\mathcal{M}} : u \in \mathcal{V}\}$ is a vector space over $\mathbb{F}$ with respect to the rules $u_{\mathcal{M}} + v_{\mathcal{M}} = (u + v)_{\mathcal{M}}$ and $\alpha(u_{\mathcal{M}}) = (\alpha u)_{\mathcal{M}}$ of vector addition and scalar multiplication, respectively. The details are easily filled in with the aid of Exercises 2.21–2.23.

Exercise 2.21. Let $\mathcal{M}$ be a proper nonzero subspace of a vector space $\mathcal{V}$ over $\mathbb{F}$ and, for $u \in \mathcal{V}$, let $u_{\mathcal{M}} = \{u + m : m \in \mathcal{M}\}$. Show that if $x, y \in \mathcal{V}$, then
$$x_{\mathcal{M}} = y_{\mathcal{M}} \iff x - y \in \mathcal{M}$$
and use this result to describe the set of vectors $u \in \mathcal{V}$ such that $u_{\mathcal{M}} = 0_{\mathcal{M}}$.

Exercise 2.22. Show that if, in the setting of Exercise 2.21, $u, v, x, y \in \mathcal{V}$ and if also $u_{\mathcal{M}} = x_{\mathcal{M}}$ and $v_{\mathcal{M}} = y_{\mathcal{M}}$, then $(u + v)_{\mathcal{M}} = (x + y)_{\mathcal{M}}$.

Exercise 2.23. Show that if, in the setting of Exercise 2.21, $\alpha, \beta \in \mathbb{F}$ and $u \in \mathcal{V}$, but $u \notin \mathcal{M}$, then $(\alpha u)_{\mathcal{M}} = (\beta u)_{\mathcal{M}}$ if and only if $\alpha = \beta$.

Exercise 2.24. Let $\mathcal{U}$ be a finite dimensional vector space over $\mathbb{F}$ and let $\mathcal{V}$ be a subspace of $\mathcal{U}$. Show that $\dim \mathcal{U} = \dim(\mathcal{U}/\mathcal{V}) + \dim \mathcal{V}$.

Exercise 2.25. Establish the principle of conservation of dimension with the aid of Exercise 2.24.

2.6. Conservation of dimension for matrices

One of the prime applications of the principle of conservation of dimension is to the particular linear transformation $T$ from $\mathbb{F}^q$ into $\mathbb{F}^p$ that is defined by multiplying each vector $x \in \mathbb{F}^q$ by a given matrix $A \in \mathbb{F}^{p\times q}$. Because of its importance, the main conclusions are stated as a theorem, even though they are easily deduced from the definitions of the requisite spaces and Theorem 2.9.


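Theorem 2.11. If $A \in \mathbb{F}^{p\times q}$, then

(2.17) $\mathcal{N}_A = \{x \in \mathbb{F}^q : Ax = 0\}$ is a subspace of $\mathbb{F}^q$,

(2.18) $\mathcal{R}_A = \{Ax : x \in \mathbb{F}^q\}$ is a subspace of $\mathbb{F}^p$

and

(2.19) $q = \dim \mathcal{N}_A + \dim \mathcal{R}_A$.

• rank: If $A \in \mathbb{F}^{p\times q}$, then the dimension of $\mathcal{R}_A$ is termed the rank of $A$:
$$\operatorname{rank} A = \dim \mathcal{R}_A.$$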
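Formula (2.19) can be spot-checked numerically. The sketch below is an illustration, not part of the text: the helper name `rank_and_null_basis` and the sample matrix are assumptions. It computes the rank as the number of pivots found by Gauss-Jordan elimination and builds one basis vector of $\mathcal{N}_A$ for each column without a pivot.

```python
from fractions import Fraction

def rank_and_null_basis(A):
    """Return (rank A, a basis of N_A), using Gauss-Jordan elimination
    with exact rational arithmetic. (Hypothetical helper.)"""
    M = [[Fraction(x) for x in row] for row in A]
    p, q = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(q):
        s = next((i for i in range(r, p) if M[i][c] != 0), None)
        if s is None:
            continue
        M[r], M[s] = M[s], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]          # scale the pivot to 1
        for i in range(p):
            if i != r and M[i][c] != 0:         # clear the rest of column c
                m = M[i][c]
                M[i] = [a - m * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == p:
            break
    basis = []
    for f in (c for c in range(q) if c not in pivots):
        x = [Fraction(0)] * q
        x[f] = Fraction(1)                      # free variable set to 1
        for row, c in zip(M, pivots):
            x[c] = -row[f]                      # solve for the pivot variables
        basis.append(x)
    return len(pivots), basis

# Sample 3 x 4 matrix (an assumption): the third row is the sum of the first two
A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]
rank_A, null_basis = rank_and_null_basis(A)
q = len(A[0])
```

Here `rank_A + len(null_basis)` equals the number of columns `q`, in agreement with (2.19), and every vector in `null_basis` is annihilated by `A`.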

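Exercise 2.26. Use the principle of conservation of dimension to show that if $A \in \mathbb{F}^{p\times q}$ is invertible, then $p = q$. [HINT: First show that $A$ left invertible $\Longrightarrow \mathcal{N}_A = \{0\}$ and $A$ right invertible $\Longrightarrow \mathcal{R}_A = \mathbb{F}^p$.]

Exercise 2.27. Let $A \in \mathbb{F}^{p\times q}$, $B \in \mathbb{F}^{r\times p}$ and $C \in \mathbb{F}^{q\times r}$. Show that:

(1) $\operatorname{rank} BA \le \operatorname{rank} A$, with equality if $B$ is left invertible.

(2) $\operatorname{rank} AC \le \operatorname{rank} A$, with equality if $C$ is right invertible.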

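Theorem 2.12. If $A \in \mathbb{F}^{p\times q}$, then

(2.20) $\mathcal{N}_{A^HA} = \mathcal{N}_A$, $\quad \mathcal{R}_{A^HA} = \mathcal{R}_{A^H}$ $\quad$ and $\quad \operatorname{rank} A = \operatorname{rank} A^H \le \min\{p, q\}$.

Proof. If $x \in \mathcal{N}_{A^HA}$ and $y = Ax$, then
$$\sum_{j=1}^{p} |y_j|^2 = y^Hy = (Ax)^HAx = x^H(A^HAx) = x^H0 = 0.$$
Therefore, $Ax = y = 0$, i.e., $\mathcal{N}_{A^HA} \subseteq \mathcal{N}_A$. Thus, as the inclusion $\mathcal{N}_A \subseteq \mathcal{N}_{A^HA}$ is obvious, equality must prevail. Next, the principle of conservation of dimension applied first to $A \in \mathbb{F}^{p\times q}$ and then to $A^HA \in \mathbb{F}^{q\times q}$ implies that
$$q = \dim \mathcal{N}_A + \dim \mathcal{R}_A = \dim \mathcal{N}_{A^HA} + \dim \mathcal{R}_{A^HA}$$
and hence, in view of the already established equality $\mathcal{N}_A = \mathcal{N}_{A^HA}$ and the second inequality in Exercise 2.27, that
$$\dim \mathcal{R}_A = \dim \mathcal{R}_{A^HA} \le \dim \mathcal{R}_{A^H}.$$
Since the last inequality may be applied to $A^H$ as well as to $A$, it follows that
$$\dim \mathcal{R}_A \le \dim \mathcal{R}_{A^H} \le \dim \mathcal{R}_A.$$
Thus, equality must hold. Moreover, $\dim \mathcal{R}_{A^HA} = \dim \mathcal{R}_{A^H}$ and therefore the self-evident inclusion $\mathcal{R}_{A^HA} \subseteq \mathcal{R}_{A^H}$ must be an equality. The inequality $\operatorname{rank} A \le \min\{p, q\}$ then follows from the fact that $\dim \mathcal{R}_A \le q$, the number of columns of $A$, and $\dim \mathcal{R}_{A^H} \le p$, the number of rows of $A$. $\Box$

Exercise 2.28. Show that if $A \in \mathbb{F}^{p\times q}$, then $\operatorname{rank} A = \operatorname{rank} A^T$.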
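The rank identities of Theorem 2.12 can be checked on a small real matrix, for which $A^H = A^T$. The following sketch is an illustration only; the helper names `rank`, `transpose` and `matmul` and the sample matrix are assumptions, and the rank is computed by counting pivots in exact rational arithmetic.

```python
from fractions import Fraction

def rank(A):
    """Number of pivots produced by Gaussian elimination (hypothetical helper)."""
    M = [[Fraction(x) for x in row] for row in A]
    p, q = len(M), len(M[0])
    r = 0
    for c in range(q):
        s = next((i for i in range(r, p) if M[i][c] != 0), None)
        if s is None:
            continue
        M[r], M[s] = M[s], M[r]
        for i in range(r + 1, p):
            m = M[i][c] / M[r][c]
            M[i] = [a - m * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == p:
            break
    return r

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Sample real 3 x 4 matrix (an assumption); for real A, A^H = A^T
A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]
AT = transpose(A)
ranks = (rank(A), rank(AT), rank(matmul(AT, A)))
```

All three entries of `ranks` coincide, and the common value is no larger than $\min\{p, q\} = 3$.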


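Exercise 2.29. Use the fact that if $A \ne O_{p\times q}$, then there exists an invertible matrix $G \in \mathbb{F}^{p\times p}$ such that $GA = U$ is in upper echelon form to give another proof of the assertion that
$$A \in \mathbb{F}^{p\times q} \Longrightarrow \operatorname{rank} A = \operatorname{rank} A^T = \operatorname{rank} A^H \le \min\{p, q\}.$$
[HINT: First check that $\operatorname{rank} U = \operatorname{rank} U^T = \operatorname{rank} U^H$ is equal to the number of pivots in $U$.]

Exercise 2.30. Show that if $A \in \mathbb{C}^{p\times q}$ and $C \in \mathbb{C}^{r\times q}$, then

(2.21) $\operatorname{rank} \begin{bmatrix} A \\ C \end{bmatrix} = q \iff \mathcal{N}_A \cap \mathcal{N}_C = \{0\}$.

Exercise 2.31. Show that if $A \in \mathbb{C}^{p\times q}$ and $B \in \mathbb{C}^{p\times r}$, then

(2.22) $\operatorname{rank} \begin{bmatrix} A & B \end{bmatrix} = p \iff \mathcal{N}_{A^H} \cap \mathcal{N}_{B^H} = \{0\}$.

Exercise 2.32. Show that if $A$ is a triangular matrix (either upper or lower), then $\operatorname{rank} A$ is bigger than or equal to the number of nonzero diagonal entries in $A$. Give an example of an upper triangular matrix $A$ for which the inequality is strict.

Exercise 2.33. Calculate $\dim \mathcal{N}_A$ and $\dim \mathcal{R}_A$ in the setting of Example 2.3 and confirm that these numbers are consistent with the principle of conservation of dimension.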

2.7. From U to A

The next theorem is an analogue of Lemma 2.6 that is stated for general matrices $A \in \mathbb{F}^{p\times q}$; i.e., the conclusions are not restricted to upper echelon matrices. It may be obtained from Lemma 2.6 by exploiting formula (2.11). However, it is more instructive to give a direct proof.

Theorem 2.13. Let $A \in \mathbb{F}^{p\times q}$. Then

(1) $\operatorname{rank} A = p \iff A$ is right invertible $\iff \mathcal{R}_A = \mathbb{F}^p$.

(2) $\operatorname{rank} A = q \iff A$ is left invertible $\iff \mathcal{N}_A = \{0\}$.

(3) If $A$ has both a left inverse $B \in \mathbb{F}^{q\times p}$ and a right inverse $C \in \mathbb{F}^{q\times p}$, then $B = C$, $p = q$ and $A$ has no other left or right inverses.

Proof. Since $\mathcal{R}_A \subseteq \mathbb{F}^p$, it is clear that
$$\mathcal{R}_A = \mathbb{F}^p \iff \operatorname{rank} A = p.$$
Suppose next that $\mathcal{R}_A = \mathbb{F}^p$. Then the equations
$$Ax_j = b_j, \quad j = 1, \ldots, p,$$
are solvable for every choice of the vectors $b_j$. If, in particular, $b_j$ is set equal to the $j$'th column of the identity matrix $I_p$, then
$$A\begin{bmatrix} x_1 & \cdots & x_p \end{bmatrix} = \begin{bmatrix} b_1 & \cdots & b_p \end{bmatrix} = I_p.$$
This exhibits the $q \times p$ matrix
$$X = \begin{bmatrix} x_1 & \cdots & x_p \end{bmatrix}$$
with columns $x_1, \ldots, x_p$ as a right inverse of $A$.

Conversely, if $AC = I_p$ for some matrix $C \in \mathbb{F}^{q\times p}$, then $x = Cb$ is a solution of the equation $Ax = b$ for every choice of $b \in \mathbb{F}^p$, i.e., $\mathcal{R}_A = \mathbb{F}^p$. This completes the proof of (1).

Next, (2) follows from (1) and the observation that
$$\begin{aligned} \mathcal{N}_A = \{0\} &\iff \operatorname{rank} A = q && \text{(by Theorem 2.11)} \\ &\iff \operatorname{rank} A^H = q && \text{(by Theorem 2.12)} \\ &\iff A^H \text{ is right invertible} && \text{(by part (1))} \\ &\iff A \text{ is left invertible.} \end{aligned}$$

Moreover, (1) and (2) imply that if $A$ is both left invertible and right invertible, then $p = q$ and, as has already been shown, the two one-sided inverses coincide:
$$B = BI_p = B(AC) = (BA)C = I_qC = C.$$
This identification also serves to justify the asserted uniqueness. Thus, e.g., if also $DA = I_q$, then $D = C = B$. $\Box$
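The construction of a right inverse in the proof of (1), solving $Ax_j = e_j$ for each column $e_j$ of $I_p$, can be sketched as follows. This is an illustration under stated assumptions: the function name `right_inverse` and the sample matrices are not from the text, all $p$ systems are solved at once by reducing the augmented matrix $[A \mid I_p]$, and the free variables are set to zero, which is only one of many possible choices when $q > p$.

```python
from fractions import Fraction

def right_inverse(A):
    """Return one right inverse X (q x p) of the p x q matrix A, or None
    if rank A < p. (Hypothetical helper; free variables are set to zero.)"""
    p, q = len(A), len(A[0])
    # augmented matrix [A | I_p]: solves A x_j = e_j for all j at once
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(p)]
         for i, row in enumerate(A)]
    pivots, r = [], 0
    for c in range(q):
        s = next((i for i in range(r, p) if M[i][c] != 0), None)
        if s is None:
            continue
        M[r], M[s] = M[s], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]
        for i in range(p):
            if i != r and M[i][c] != 0:
                m = M[i][c]
                M[i] = [a - m * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == p:
            break
    if r < p:
        return None                    # rank A < p: no right inverse exists
    X = [[Fraction(0)] * p for _ in range(q)]
    for row, c in zip(M, pivots):
        X[c] = row[q:]                 # pivot variables read off the right block
    return X

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2, 0],
     [0, 1, 1]]                        # sample 2 x 3 matrix of rank 2 (an assumption)
X = right_inverse(A)
AX = matmul(A, X)
singular = right_inverse([[1, 2], [2, 4]])   # rank 1 < 2
```

For the sample matrix, `AX` is the $2 \times 2$ identity, while the rank deficient matrix yields `None`, in line with the equivalence $\operatorname{rank} A = p \iff A$ is right invertible.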

Exercise 2.34. Find the null space NA and the range RA of the matrix

A=

3

1

0

2

4

1

0

2

5

2

0

4

acting on

R4

and check that the principle of conservation of dimension holds.

2.8. Square matrices

Theorem 2.14. Let $A \in \mathbb{F}^{p\times p}$. Then the following statements are equivalent:

(1) $A$ is left invertible.

(2) $A$ is right invertible.

(3) $\mathcal{N}_A = \{0\}$.

(4) $\mathcal{R}_A = \mathbb{F}^p$.

Proof. This is an immediate corollary of Theorem 2.13. $\Box$


Remark 2.15. The equivalence of (3) and (4) in Theorem 2.14 is a special case of the Fredholm alternative, which, in its most provocative form, states that if the solution to the equation $Ax = b$ is unique, then it exists, or, to put it better:

If $A \in \mathbb{F}^{p\times p}$ and the equation $Ax = b$ has at most one solution, then it has exactly one.

Lemma 2.16. If $A \in \mathbb{F}^{p\times p}$, $B \in \mathbb{F}^{p\times p}$ and $AB$ is invertible, then both $A$ and $B$ are invertible.

Proof. Clearly, $\mathcal{R}_{AB} \subseteq \mathcal{R}_A$ and $\mathcal{N}_{AB} \supseteq \mathcal{N}_B$. Therefore,
$$p = \operatorname{rank} AB \le \operatorname{rank} A \le p \quad \text{and} \quad 0 = \dim \mathcal{N}_{AB} \ge \dim \mathcal{N}_B \ge 0.$$
The rest is immediate from Theorem 2.14. $\Box$

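Exercise 2.35. Let $A \in \mathbb{F}^{p\times q}$, $B \in \mathbb{F}^{q\times r}$, and let $\{u_1, \ldots, u_k\}$ be a basis for $\mathcal{R}_B$. Show that $\{Au_1, \ldots, Au_k\}$ is a basis for $\mathcal{R}_{AB}$ if and only if $\mathcal{R}_B \cap \mathcal{N}_A = \{0\}$.

Exercise 2.36. Show that if $A \in \mathbb{F}^{p\times q}$ and $B \in \mathbb{F}^{q\times r}$, then $\mathcal{N}_{AB} = \{0\}$ if and only if $\mathcal{N}_A \cap \mathcal{R}_B = \{0\}$ and $\mathcal{N}_B = \{0\}$.

Exercise 2.37. Find a $p \times q$ matrix $A$ and a vector $b \in \mathbb{R}^p$ such that $\mathcal{N}_A = \{0\}$ and yet the equation $Ax = b$ has no solutions.

Exercise 2.38. Let $A \in \mathbb{F}^{p\times q}$, $B \in \mathbb{F}^{q\times r}$, and let $\{u_1, \ldots, u_k\}$ be a basis for $\mathcal{R}_B$. Show that:

(a) $\operatorname{span}\{Au_1, \ldots, Au_k\} = \mathcal{R}_{AB}$.

(b) If $A$ is left invertible, then $\{Au_1, \ldots, Au_k\}$ is a basis for $\mathcal{R}_{AB}$.

Exercise 2.39. Find a basis for $\mathcal{R}_A$ and $\mathcal{N}_A$ if A =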

1 0 1 1

3 1 —2 6

1 2 3 11

8 1 3 5

2 3 1 9

Exercise 2.40. Let $A \in \mathbb{F}^{4\times 5}$, let $v_1, v_2, v_3$ be a basis for $\mathcal{R}_A$ and let $V = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}$. Show that $V^HV$ is invertible, that $B = V(V^HV)^{-1}V^H$ is not left invertible and yet $\mathcal{R}_B = \mathcal{R}_{BA}$.

Exercise 2.41. Let $u_1, u_2, u_3$ be linearly independent vectors in a vector space $\mathcal{U}$ over $\mathbb{F}$ and let $u_4 = u_1 + 2u_2 + u_3$.

(a) Show that the vectors $u_1, u_2, u_4$ are linearly independent and that $\operatorname{span}\{u_1, u_2, u_3\} = \operatorname{span}\{u_1, u_2, u_4\}$.

(b) Express the vector $7u_1 + 13u_2 + 5u_3$ as a linear combination of the vectors $u_1, u_2, u_4$. [Note that the coefficients of all three vectors change.]


1

3

2

2

4

1

O

4

:1:

3 4 4

2 1 2

Exercise 2.42. For which values of a; is matrix

Exercise 2.43. Show that the matrix A =

1 2 0

invertible?

is invertible and

find its inverse by solving the system of equations A[x1 umn by column.

X2

X3] = I3 col-

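Exercise 2.44. Show that if $A \in \mathbb{C}^{p\times p}$ is invertible, $B \in \mathbb{C}^{p\times q}$, $C \in \mathbb{C}^{q\times p}$, $D \in \mathbb{C}^{q\times q}$ and
$$E = \begin{bmatrix} A & B \\ C & D \end{bmatrix},$$
then $\dim \mathcal{R}_E = \dim \mathcal{R}_A + \dim \mathcal{R}_{(D - CA^{-1}B)}$.

Exercise 2.45. Show that if, in the setting of the previous exercise, $D$ is invertible, then $\operatorname{rank} E = \operatorname{rank} D + \operatorname{rank}(A - BD^{-1}C)$.

Exercise 2.46. Use the method of Gaussian elimination to solve the equation $Ax = b$ when A =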

1

4

2

3

2 3

0 0

1 1

0 2

2

and b =

1 1

, if possible, and

find a basis for RA and a basis for NA.

Exercise 2.47. Use the method of Gaussian elimination to solve the equation $Ax = b$ when A =

5

2

4

1

3

0

0

2

4

0

1

0

1

and b =

1

O

.

.

, if possible, and find

1

a basis for RA and a basis for NA. Exercise 2.48. Use the method of Gaussian elimination to solve the equa1 0 2 1 tion Ax = b when A =

3

1

O

2

5

O

and b =

3

, if possible, and find

1

a basis for RA and a basis for NA.

Exercise 2.49. Find lower triangular matrices with ones on the diagonal $E_1, E_2, \ldots$ and permutation matrices $P_1, P_2, \ldots$ such that
$$E_kP_kE_{k-1}P_{k-1} \cdots E_1P_1A$$
is in upper echelon form for any two of the three preceding exercises.


Exercise 2.50. Use Gaussian elimination to find at least two right inverses to the matrix $A$ given in Exercise 2.46. [HINT: Try to solve the equation
$$A \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{bmatrix} = I_3,$$
column by column.]

Exercise 2.51. Use Gaussian elimination to find at least two left inverses to the matrix A given in Exercise 2.47. [HINT: Find right inverses to AT.]

Chapter 3

Additional applications of Gaussian elimination

I was working on the proof of one of my poems all morning, and took out a comma. In the afternoon I put it back again. Oscar Wilde

This chapter is devoted to a number of applications of Gaussian elimination, both theoretical and computational. There is some overlap with conclusions reached in the preceding chapter, but the methods of obtaining them are usually different.

3.1. Gaussian elimination redux

Recall that the method of Gaussian elimination leads to the following conclusion:

Theorem 3.1. Let $A \in \mathbb{F}^{p\times q}$ be a nonzero matrix. Then there exists a set of lower triangular $p \times p$ matrices $E_1, \ldots, E_k$ with ones on the diagonal and a set of $p \times p$ permutation matrices $P_1, \ldots, P_k$ such that

(3.1) $E_kP_k \cdots E_1P_1A = U$

is in upper echelon form. Moreover, in this formula, $P_j$ acts only (if at all) on rows $j, \ldots, p$ and $E_j - I_p$ has nonzero entries in at most the $j$'th column.

The extra information on the structure of the permutation matrices may seem tedious, but, as we shall see shortly, it has significant implications: it

enables us to slide all the permutations to the right in formula (3.1) without changing the form of the matrices E1, . . . ,Ek.


Theorem 3.2. Let $A \in \mathbb{F}^{p\times q}$ be any nonzero matrix. Then there exists a lower triangular $p \times p$ matrix $E$ with ones on the diagonal and a $p \times p$ permutation matrix $P$ such that
$$EPA = U$$
is in upper echelon form.

Discussion. To understand where this theorem comes from, suppose first that $A$ is a nonzero $4 \times 5$ matrix and let $e_1, \ldots, e_4$ denote the columns of $I_4$. Then there exists a choice of permutation matrices $P_1, P_2, P_3$ and lower triangular matrices
$$E_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ a & 1 & 0 & 0 \\ b & 0 & 1 & 0 \\ c & 0 & 0 & 1 \end{bmatrix} = I_4 + u_1e_1^T \quad \text{with} \quad u_1 = \begin{bmatrix} 0 \\ a \\ b \\ c \end{bmatrix},$$
$$E_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & d & 1 & 0 \\ 0 & e & 0 & 1 \end{bmatrix} = I_4 + u_2e_2^T \quad \text{with} \quad u_2 = \begin{bmatrix} 0 \\ 0 \\ d \\ e \end{bmatrix},$$
$$E_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & f & 1 \end{bmatrix} = I_4 + u_3e_3^T \quad \text{with} \quad u_3 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ f \end{bmatrix},$$
such that

(3.2) $E_3P_3E_2P_2E_1P_1A = U$

is in upper echelon form. In fact, since $P_2$ is chosen so that it interchanges the second row of $E_1P_1A$ with its third or fourth row, if necessary, and $P_3$ is chosen so that it interchanges the third row of $E_2P_2E_1P_1A$ with its fourth row, if necessary, these two permutation matrices have a special form:
$$P_2 = \begin{bmatrix} 1 & 0^T \\ 0 & \Pi_1 \end{bmatrix}, \quad \text{where } \Pi_1 \text{ is a } 3 \times 3 \text{ permutation matrix,}$$
$$P_3 = \begin{bmatrix} I_2 & O \\ O & \Pi_2 \end{bmatrix}, \quad \text{where } \Pi_2 \text{ is a } 2 \times 2 \text{ permutation matrix.}$$
This exhibits the pattern, which underlies the fact that
$$P_iE_j = E_j'P_i \quad \text{if} \quad i > j,$$
where $E_j'$ denotes a matrix of the same form as $E_j$. Thus, for example, since $v_2 = P_3u_2$ is a vector of the same form as $u_2$ and $e_2^TP_3 = e_2^T$, it follows that
$$P_3E_2 = P_3(I_4 + u_2e_2^T) = P_3 + v_2e_2^T = P_3 + v_2e_2^TP_3 = E_2'P_3,$$
where
$$E_2' = I_4 + v_2e_2^T$$
is a matrix of the same form as $E_2$. In a similar vein
$$P_3P_2E_1 = P_3E_1'P_2 = E_1''P_3P_2$$
and consequently,
$$E_3P_3E_2P_2E_1P_1 = E_3E_2'E_1''P_3P_2P_1 = EP,$$
with
$$E = E_3E_2'E_1'' \quad \text{and} \quad P = P_3P_2P_1.$$
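The commutation rule $P_3E_2 = E_2'P_3$ is easy to confirm by direct multiplication. In the sketch below the numerical values of $d$ and $e$ and the helper `matmul` are assumptions chosen for illustration; $P_3$ is taken to be the permutation that interchanges rows three and four, so that $v_2 = P_3u_2$ has the entries $d$ and $e$ in the opposite order.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows (hypothetical helper)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

d, e = 5, 7  # arbitrary sample multipliers (an assumption)

E2 = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, d, 1, 0],
      [0, e, 0, 1]]

P3 = [[1, 0, 0, 0],   # interchanges rows 3 and 4
      [0, 1, 0, 0],
      [0, 0, 0, 1],
      [0, 0, 1, 0]]

# E2' = I4 + v2 e2^T with v2 = P3 u2: the entries d and e trade places
E2_prime = [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, e, 1, 0],
            [0, d, 0, 1]]

lhs = matmul(P3, E2)          # P3 E2
rhs = matmul(E2_prime, P3)    # E2' P3
```

Both products coincide, and `E2_prime` is again lower triangular with ones on the diagonal and nonzero off-diagonal entries only in its second column.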

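Much the same argument works in the general setting of Theorem 3.2. You have only to exploit the fact that Gaussian elimination corresponds to multiplication on the left by $E_kP_k \cdots E_1P_1$, where
$$E_j = I_p + \begin{bmatrix} 0 \\ b_j \end{bmatrix} e_j^T \quad \text{with} \quad 0 \in \mathbb{F}^j \quad \text{and} \quad b_j \in \mathbb{F}^{p-j},$$
and that the $p \times p$ permutation matrix $P_i$ may be written in block form as
$$P_i = \begin{bmatrix} I_{i-1} & O \\ O & \Pi_{i-1} \end{bmatrix},$$
where $\Pi_{i-1}$ is a $(p - i + 1) \times (p - i + 1)$ permutation matrix. Then, letting $c_j \in \mathbb{F}^{p-j}$ denote the vector that is defined by the formula
$$\begin{bmatrix} 0 \\ c_j \end{bmatrix} = P_i \begin{bmatrix} 0 \\ b_j \end{bmatrix},$$
it follows that
$$P_iE_j = P_i + \begin{bmatrix} 0 \\ c_j \end{bmatrix} e_j^T = \left( I_p + \begin{bmatrix} 0 \\ c_j \end{bmatrix} e_j^T \right) P_i \quad \text{for} \quad i > j,$$
since
$$e_j^TP_i = e_j^T \quad \text{for} \quad i > j.$$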

Remark 3.3. Theorem 3.2 has interesting implications. However, we wish to emphasize that when Gaussian elimination is used in practice to study the equation $Ax = b$, it is not necessary to go through all this theoretical analysis. It suffices to carry out all the row operations on the augmented matrix
$$\begin{bmatrix} a_{11} & \cdots & a_{1q} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{p1} & \cdots & a_{pq} & b_p \end{bmatrix}$$
and then to solve for the "pivot variables", just as in the examples. But, do check that your answer works.

In what follows, we shall reap some extra dividends from the representation formula

(3.3) $EPA = U$

(that is valid for both real and complex matrices) by exploiting the special structure of the upper echelon matrix $U$.

Exercise 3.1. Show that if $A \in \mathbb{F}^{n\times n}$ is an invertible matrix, then there exists a permutation matrix $P$ such that

(3.4) $PA = LDU$,

where $L$ is lower triangular with ones on the diagonal, $U$ is upper triangular with ones on the diagonal and $D$ is a diagonal matrix.

Exercise 3.2. Show that if $L_1D_1U_1 = L_2D_2U_2$, where $L_j$, $D_j$ and $U_j$ are $n \times n$ matrices of the form exhibited in Exercise 3.1, then $L_1 = L_2$, $D_1 = D_2$ and $U_1 = U_2$. [HINT: $L_2^{-1}L_1D_1 = D_2U_2U_1^{-1}$ is both lower and upper triangular.]

Exercise 3.3. Show that there exists a $3 \times 3$ permutation matrix $P$ and a lower triangular matrix B =

1 b21

0 1

0 0

such that

1931 (D32 1

0 1

1 0

0 0

1 a

0 1

0 0

= BP

B 0 1

0 0 1

if and only if oz = 0.

Or—tr—t

i—li—li—l

invertible 3 x 3 matrix for the matrix A =

r—Ir—IO

Exercise 3.4. Find a permutation matrix P such that PA = LU, where L is a lower triangular invertible 3 X 3 matrix and U is an upper triangular

3.2. Properties of BA and AC

In this section a number of basic properties of the product of two matrices in terms of the properties of their factors are reviewed for future use.

Lemma 3.4. Let $A \in \mathbb{F}^{p\times q}$ and let $B \in \mathbb{F}^{p\times p}$ be invertible. Then:

(1) $A$ is left invertible if and only if $BA$ is left invertible.

[...]

Proof. The first assertion follows easily from the observation that if $C$ is a left inverse of $A$, then
$$CA = I_q \Longrightarrow (CB^{-1})(BA) = I_q.$$
Conversely, if $C$ is a left inverse of $BA$, then
$$C(BA) = I_q \Longrightarrow (CB)A = I_q.$$
This completes the proof of (1). Next, to verify (2), notice that if $C$ is a right inverse of $A$, then
$$AC = I_p \Longrightarrow (BA)C = B(AC) = B \Longrightarrow BA(CB^{-1}) = I_p;$$
i.e., $CB^{-1}$ is a right inverse of $BA$. Conversely, if $C$ is a right inverse of $BA$, then
$$(BA)C = I_p \Longrightarrow B(AC) = I_p \Longrightarrow AC = B^{-1} \Longrightarrow A(CB) = I_p;$$
i.e., $CB$ is a right inverse of $A$. Items (3) and (4) are easy and are left to the reader. To verify (5), let $\{u_1, \ldots, u_k\}$ be a basis for $\mathcal{R}_A$. Then clearly $\operatorname{span}\{Bu_1, \ldots, Bu_k\} = \mathcal{R}_{BA}$. Moreover, the vectors $Bu_1, \ldots, Bu_k$ are linearly independent, since
$$\sum_{j=1}^{k} \alpha_j(Bu_j) = 0 \Longrightarrow B\left(\sum_{j=1}^{k} \alpha_ju_j\right) = 0 \Longrightarrow \sum_{j=1}^{k} \alpha_ju_j \in \mathcal{N}_B,$$
and $\mathcal{N}_B = \{0\}$, which forces all the coefficients $\alpha_1, \ldots, \alpha_k$ to be zero, because the vectors $u_1, \ldots, u_k$ are linearly independent. Thus, the vectors $Bu_1, \ldots, Bu_k$ are also linearly independent and hence
$$\dim \mathcal{R}_{BA} = k = \dim \mathcal{R}_A,$$
which proves (5). Finally, (6) is immediate from (3), and (7) is immediate from (5). $\Box$

Exercise 3.5. Verify items (3), (4), and (7) of Lemma 3.4.

Exercise 3.6. Verify item (5) of Lemma 3.4 on the basis of (3) and the law of conservation of dimension. [HINT: The matrices $A$ and $BA$ have the same number of columns.]

Lemma 3.5. Let $A \in \mathbb{F}^{p\times q}$ and let $C \in \mathbb{F}^{q\times q}$ be invertible. Then:

(1) $A$ is left invertible if and only if $AC$ is left invertible.

(2) $A$ is right invertible if and only if $AC$ is right invertible.

(3) $\mathcal{N}_A = C\mathcal{N}_{AC}$.

(4) $\mathcal{R}_A = \mathcal{R}_{AC}$.

(5) $\dim \mathcal{N}_A = \dim \mathcal{N}_{AC}$.

(6) $\mathcal{N}_A = \{0\}$ [...]

[...] $B \sim A$; (3) $A \sim B$ and $B \sim C \Longrightarrow A \sim C$.

64

4. Eigenvalues and eigenvectors

4.2. Invariant subspaces

Let $T$ be a linear mapping from a vector space $\mathcal{U}$ over $\mathbb{F}$ into itself. Then a subspace $\mathcal{M}$ of $\mathcal{U}$ is said to be invariant under $T$ if $Tu \in \mathcal{M}$ whenever $u \in \mathcal{M}$. The simplest invariant subspaces are the one dimensional ones, if they exist. Clearly, a one dimensional subspace $\mathcal{M} = \{\alpha u : \alpha \in \mathbb{F}\}$ based on a nonzero vector $u \in \mathcal{U}$ is invariant under $T$ if and only if there exists a constant $\lambda \in \mathbb{F}$ such that

(4.2) $Tu = \lambda u, \quad u \ne 0$,

or, equivalently, if and only if
$$\mathcal{N}_{(T - \lambda I)} \ne \{0\};$$
i.e., the nullspace of $T - \lambda I$ is not just the zero vector. In this instance, the number $\lambda$ is said to be an eigenvalue of $T$ and the vector $u$ is said to be an eigenvector of $T$. In fact, every nonzero vector in $\mathcal{N}_{(T - \lambda I)}$ is said to be an eigenvector of $T$.

It turns out that if $\mathbb{F} = \mathbb{C}$ and $\mathcal{U}$ is finite dimensional, then a one dimensional invariant subspace always exists. However, if $\mathbb{F} = \mathbb{R}$, then $T$ may not have any one dimensional invariant subspaces. The best that you can guarantee for general $T$ in this case is that there exists a two dimensional invariant subspace. As we shall see shortly, this is connected with the fact that a polynomial with real coefficients (of even degree) may not have any real roots.

Exercise 4.2. Show that if $T$ is a linear transformation from a vector space $\mathcal{V}$ over $\mathbb{F}$ into itself, then the vector spaces $\mathcal{N}_{(T - \lambda I)}$ and $\mathcal{R}_{(T - \lambda I)}$ are both invariant under $T$ for each choice of $\lambda \in \mathbb{F}$.

Exercise 4.3. The set $\mathcal{V}$ of polynomials $p(t)$ with complex coefficients is a vector space over $\mathbb{C}$ with respect to the natural rules of vector addition and scalar multiplication. Let $Tp = p''(t) + tp'(t)$ and $Sp = p''(t) + t^2p'(t)$. Show that the subspace $\mathcal{U}_k$ of $\mathcal{V}$ of polynomials $p(t) = c_0 + c_1t + \cdots + c_kt^k$ of degree less than or equal to $k$ is invariant under $T$ but not under $S$. Find a nonzero polynomial $p \in \mathcal{U}_3$ and a number $\lambda \in \mathbb{C}$ such that $Tp = \lambda p$.

Exercise 4.4. Show that if $T$ is a linear transformation from a vector space $\mathcal{V}$ over $\mathbb{F}$ into itself, then $T^2 + 5T + 6I = (T + 3I)(T + 2I)$.

4.3. Existence of eigenvalues

The first theorem in this section serves to establish the existence of at least one eigenvalue $\lambda \in \mathbb{C}$ for a linear transformation that maps a finite dimensional vector space over $\mathbb{C}$ into itself. The second theorem serves to bound the number of distinct eigenvalues of such a transformation by the dimension of the space.

Theorem 4.2. Let $T$ be a linear transformation from a vector space $\mathcal{V}$ over $\mathbb{C}$ into itself and let $\mathcal{U} \ne \{0\}$ be a finite dimensional subspace of $\mathcal{V}$ that is invariant under $T$. Then there exists a nonzero vector $w \in \mathcal{U}$ and a number $\lambda \in \mathbb{C}$ such that $Tw = \lambda w$.

Proof. By assumption, $\dim \mathcal{U} = \ell$ for some positive integer $\ell$. Consequently, for any nonzero vector $u \in \mathcal{U}$ the set of $\ell + 1$ vectors
$$u, Tu, \ldots, T^\ell u$$
is linearly dependent over $\mathbb{C}$; i.e., there exists a set of complex numbers $c_0, \ldots, c_\ell$, not all of which are zero, such that
$$c_0u + \cdots + c_\ell T^\ell u = 0.$$
Let $k = \max\{j : c_j \ne 0\}$. Then, by the fundamental theorem of algebra, the polynomial
$$p(x) = c_0 + c_1x + \cdots + c_\ell x^\ell = c_0 + c_1x + \cdots + c_kx^k$$
can be factored as a product of $k$ polynomial factors of degree one with roots $\mu_1, \ldots, \mu_k \in \mathbb{C}$:
$$p(x) = c_k(x - \mu_k) \cdots (x - \mu_1).$$
Correspondingly,
$$c_0u + \cdots + c_\ell T^\ell u = c_0u + \cdots + c_kT^ku = c_k(T - \mu_kI) \cdots (T - \mu_2I)(T - \mu_1I)u = 0.$$
This in turn implies that there are $k$ possibilities:

(1) $(T - \mu_1I)u = 0$.

(2) $(T - \mu_1I)u \ne 0$ and $(T - \mu_2I)(T - \mu_1I)u = 0$.

$\vdots$

(k) $(T - \mu_{k-1}I) \cdots (T - \mu_1I)u \ne 0$ and $(T - \mu_kI) \cdots (T - \mu_1I)u = 0$.

In the first case, $\mu_1$ is an eigenvalue and $u$ is an eigenvector. In the second case, the vector $w_1 = (T - \mu_1I)u$ is a nonzero vector in $\mathcal{U}$ and $Tw_1 = \mu_2w_1$. Therefore, $(T - \mu_1I)u$ is an eigenvector of $T$ corresponding to the eigenvalue $\mu_2$.

In the $k$'th case, the vector $w_{k-1} = (T - \mu_{k-1}I) \cdots (T - \mu_1I)u$ is a nonzero vector in $\mathcal{U}$ and $Tw_{k-1} = \mu_kw_{k-1}$. Therefore, $(T - \mu_{k-1}I) \cdots (T - \mu_1I)u$ is an eigenvector of $T$ corresponding to the eigenvalue $\mu_k$. $\Box$
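The construction in this proof can be traced on a small example. The sketch below is an illustration under stated assumptions: the $2 \times 2$ rotation matrix, the starting vector and the helper `matvec` are not from the text. For this choice $u + A^2u = 0$, so that $p(x) = 1 + x^2 = (x - \mu_2)(x - \mu_1)$, and case (2) of the proof applies: $w_1 = (A - \mu_1I)u$ is nonzero and $Aw_1 = \mu_2w_1$.

```python
import cmath

def matvec(A, x):
    """Matrix-vector product for lists of rows (hypothetical helper)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Sample matrix (an assumption): rotation by 90 degrees, no real eigenvalues
A = [[0, -1],
     [1,  0]]
u = [1, 0]

Au = matvec(A, u)      # [0, 1]
AAu = matvec(A, Au)    # [-1, 0], so u + A^2 u = 0

# dependence c0*u + c1*Au + c2*A^2u = 0 with (c0, c1, c2) = (1, 0, 1)
c0, c1, c2 = 1, 0, 1
disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
mu1 = (-c1 + disc) / (2 * c2)   # roots of c0 + c1*x + c2*x^2
mu2 = (-c1 - disc) / (2 * c2)

# case (2) of the proof: w1 = (A - mu1 I)u is nonzero and A w1 = mu2 w1
w1 = [a - mu1 * b for a, b in zip(Au, u)]
Aw1 = matvec(A, w1)
residual = max(abs(a - mu2 * b) for a, b in zip(Aw1, w1))
```

The residual vanishes (up to rounding), so `w1` is an eigenvector of `A` for the eigenvalue `mu2`. Both roots are purely imaginary here, which illustrates why the argument does not produce real eigenvalues in general.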


Notice that the proof does not guarantee the existence of real eigenvalues for linear transformations $T$ from a vector space $\mathcal{V}$ over $\mathbb{R}$ into itself because the polynomial $p(x) = c_0 + c_1x + \cdots + c_kx^k$ may have only complex roots $\mu_1, \ldots, \mu_k$ even if the coefficients $c_1, \ldots, c_k$ are real; see e.g., Exercise 4.5.

Exercise 4.5. Let $T$ be a linear transformation from a vector space $\mathcal{V}$ over $\mathbb{R}$ into itself and let $\mathcal{U}$ be a two dimensional subspace of $\mathcal{V}$ with basis $\{u_1, u_2\}$. Show that if $Tu_1 = u_2$ and $Tu_2 = -u_1$, then $T^2u + u = 0$ for every vector $u \in \mathcal{U}$ but that there are no one dimensional subspaces of $\mathcal{U}$ that are invariant under $T$. Why? [HINT: A one dimensional subspace of $\mathcal{U}$ is equal to $\{\alpha(c_1u_1 + c_2u_2) : \alpha \in \mathbb{R}\}$ for some choice of $c_1, c_2 \in \mathbb{R}$ with $|c_1| + |c_2| > 0$.]

Theorem 4.3. Let $T$ be a linear transformation from an $n$ dimensional vector space $\mathcal{U}$ over $\mathbb{C}$ into itself and let $u_1, \ldots, u_k \in \mathcal{U}$ be eigenvectors of $T$ corresponding to a distinct set of eigenvalues $\lambda_1, \ldots, \lambda_k$. Then the vectors $u_1, \ldots, u_k$ are linearly independent and hence $k \le n$.

Proof. Let $a_1, \ldots, a_k$ be a set of numbers such that

(4.3) $a_1u_1 + \cdots + a_ku_k = 0$.

We wish to show that $a_j = 0$ for $j = 1, \ldots, k$. But now as
$$(T - \lambda_iI)u_j = Tu_j - \lambda_iu_j = (\lambda_j - \lambda_i)u_j, \quad i, j = 1, \ldots, k,$$
it is readily checked that
$$(T - \lambda_2I) \cdots (T - \lambda_kI)u_j = \begin{cases} 0 & \text{if } j = 2, \ldots, k, \\ (\lambda_1 - \lambda_2) \cdots (\lambda_1 - \lambda_k)u_1 & \text{if } j = 1. \end{cases}$$
Therefore, upon applying the product $(T - \lambda_2I) \cdots (T - \lambda_kI)$ to both sides of (4.3), it is easily seen that the left-hand side
$$(T - \lambda_2I) \cdots (T - \lambda_kI)(a_1u_1 + \cdots + a_ku_k) = (\lambda_1 - \lambda_2) \cdots (\lambda_1 - \lambda_k)a_1u_1,$$
whereas the right-hand side
$$(T - \lambda_2I) \cdots (T - \lambda_kI)0 = 0.$$
Thus, $a_1 = 0$, since these two sides must match. Similar considerations serve to show that $a_2 = \cdots = a_k = 0$. Consequently the vectors $u_1, \ldots, u_k$ are linearly independent and $k \le n$, as claimed. $\Box$

4.4. Eigenvalues for matrices

The operation of multiplying vectors in $\mathbb{F}^n$ by a matrix $A\in\mathbb{F}^{n\times n}$ is a linear transformation of $\mathbb{F}^n$ into itself. Consequently, the definitions that were introduced earlier for the eigenvalue and eigenvector of a linear transformation can be reformulated directly in terms of matrices:

• A point $\lambda\in\mathbb{F}$ is said to be an eigenvalue of $A\in\mathbb{F}^{n\times n}$ if there exists a nonzero vector $u\in\mathbb{F}^n$ such that $Au=\lambda u$, i.e., if

$\mathcal{N}_{(A-\lambda I_n)}\ne\{0\}.$

Every nonzero vector $u\in\mathcal{N}_{(A-\lambda I_n)}$ is said to be an eigenvector of $A$ corresponding to the eigenvalue $\lambda$.

If $\lambda_1,\ldots,\lambda_k$ are distinct eigenvalues of a matrix $A\in\mathbb{F}^{n\times n}$, then:

• The number $\gamma_j=\dim\mathcal{N}_{(A-\lambda_jI_n)}$, $j=1,\ldots,k$, is termed the geometric multiplicity of the eigenvalue $\lambda_j$.

• The number $\alpha_j=\dim\mathcal{N}_{(A-\lambda_jI_n)^n}$, $j=1,\ldots,k$, is termed the algebraic multiplicity of the eigenvalue $\lambda_j$.

The inclusions

(4.4)    $\mathcal{N}_{(A-\lambda_jI_n)}\subseteq\mathcal{N}_{(A-\lambda_jI_n)^2}\subseteq\cdots\subseteq\mathcal{N}_{(A-\lambda_jI_n)^n}$

guarantee that

(4.5)    $\gamma_j\le\alpha_j \quad\text{for}\quad j=1,\ldots,k,$

and hence (as will follow in part from Theorem 4.13) that

(4.6)    $\gamma_1+\cdots+\gamma_k\le\alpha_1+\cdots+\alpha_k=n.$

The set

(4.7)    $\sigma(A)=\{\lambda\in\mathbb{C} : \mathcal{N}_{(A-\lambda I_n)}\ne\{0\}\}$

is called the spectrum of $A$. Clearly, $\sigma(A)$ is equal to the set $\{\lambda_1,\ldots,\lambda_k\}$ of all the distinct eigenvalues of the matrix $A$ in $\mathbb{C}$.

A nonzero vector $u\in\mathbb{F}^n$ is said to be a generalized eigenvector of the matrix $A\in\mathbb{F}^{n\times n}$ corresponding to the eigenvalue $\lambda\in\mathbb{F}$ if $u\in\mathcal{N}_{(A-\lambda I_n)^n}$.

A vector $u\in\mathbb{F}^n$ is said to be a generalized eigenvector of order $k$ of the matrix $A\in\mathbb{F}^{n\times n}$ corresponding to the eigenvalue $\lambda\in\mathbb{F}$ if $(A-\lambda I_n)^ku=0$, but $(A-\lambda I_n)^{k-1}u\ne 0$. In this instance, the set of vectors $u_j=(A-\lambda I_n)^{(k-j)}u$ for $j=1,\ldots,k$ is said to form a Jordan chain of length $k$; they satisfy the following chain of equalities:

$(A-\lambda I_n)u_1=0$
$(A-\lambda I_n)u_2=u_1$
$\vdots$
$(A-\lambda I_n)u_k=u_{k-1}.$

This is equivalent to the formula

(4.8)    $(A-\lambda I_n)\begin{bmatrix}u_1&\cdots&u_k\end{bmatrix}=\begin{bmatrix}u_1&\cdots&u_k\end{bmatrix}N,$

where

$N=\sum_{j=1}^{k-1}e_je_{j+1}^T$

and $e_j$ denotes the $j$'th column of $I_k$. Thus, for example, if $k=4$, then (4.8) reduces to the identity

(4.9)    $(A-\lambda I_n)\begin{bmatrix}u_1&u_2&u_3&u_4\end{bmatrix}=\begin{bmatrix}u_1&u_2&u_3&u_4\end{bmatrix}\begin{bmatrix}0&1&0&0\\0&0&1&0\\0&0&0&1\\0&0&0&0\end{bmatrix}.$
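The construction of a Jordan chain can be traced on a small case. The following sketch is illustrative only: the matrix, the eigenvalue $\lambda=2$ and the starting vector $u$ are hypothetical choices, and the helper names are not from the text. It builds $u_j=(A-\lambda I_n)^{k-j}u$ and checks the chain of equalities term by term.

```python
def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def power_apply(M, v, p):
    """Return M^p v by repeated multiplication."""
    for _ in range(p):
        v = apply(M, v)
    return v

# Hypothetical example: a 3 x 3 matrix with the single eigenvalue 2.
A = [[2, 1, 0], [0, 2, 1], [0, 0, 2]]
lam, k = 2, 3
N = [[A[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]

u = [0, 0, 1]  # generalized eigenvector of order 3: N^3 u = 0 but N^2 u != 0
chain = [power_apply(N, u, k - j) for j in range(1, k + 1)]  # u_1, ..., u_k

zero = [0, 0, 0]
checks = [apply(N, chain[0]) == zero]
checks += [apply(N, chain[j]) == chain[j - 1] for j in range(1, k)]
print(chain, checks)
```

Here the chain turns out to be the standard basis, which is consistent with the shift matrix $N$ in (4.8).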

Exercise 4.6. Show that the vectors $u_1,\ldots,u_k$ in a Jordan chain of length $k$ are linearly independent.

Theorems 4.2 and 4.3 imply that

(1) Every matrix $A\in\mathbb{C}^{n\times n}$ has at least one eigenvalue $\lambda\in\mathbb{C}$.
(2) Every matrix $A\in\mathbb{C}^{n\times n}$ has at most $n$ distinct eigenvalues in $\mathbb{C}$.
(3) Eigenvectors corresponding to distinct eigenvalues are automatically linearly independent.

Even though (1) implies that $\sigma(A)\ne\emptyset$ for every $A\in\mathbb{C}^{n\times n}$, it does not guarantee that $\sigma(A)\cap\mathbb{R}\ne\emptyset$ if $A\in\mathbb{R}^{n\times n}$.

Exercise 4.7. Verify the inclusions (4.4).

Exercise 4.8. Show that the matrices

$A=\begin{bmatrix}1&-1\\1&1\end{bmatrix}\quad\text{and}\quad A=\begin{bmatrix}2&-1\\3&-1\end{bmatrix}$

have no real eigenvalues, i.e., $\sigma(A)\cap\mathbb{R}=\emptyset$ in both cases.

Exercise 4.9. Show that although the following upper triangular matrices

$\begin{bmatrix}2&0&0\\0&2&0\\0&0&2\end{bmatrix},\quad\begin{bmatrix}2&1&0\\0&2&0\\0&0&2\end{bmatrix},\quad\begin{bmatrix}2&1&0\\0&2&1\\0&0&2\end{bmatrix}$

have the same diagonal, $\dim\mathcal{N}_{(A-2I_3)}$ is equal to three for the first, two for the second and one for the third. Calculate $\mathcal{N}_{(A-2I_3)^j}$ for $j=1,2,3,4$ for each of the three choices of $A$.

Exercise 4.10. Show that if $A\in\mathbb{F}^{n\times n}$ is a triangular matrix with entries $a_{ij}$, then $\sigma(A)=\bigcup_{i=1}^n\{a_{ii}\}$.

The cited theorems actually imply a little more:

Theorem 4.4. Let $A\in\mathbb{C}^{n\times n}$ and let $\mathcal{U}$ be a nonzero subspace of $\mathbb{C}^n$ that is invariant under $A$, i.e., $u\in\mathcal{U}\Longrightarrow Au\in\mathcal{U}$. Then:

(1) There exists a nonzero vector $u\in\mathcal{U}$ and a number $\lambda\in\mathbb{C}$ such that $Au=\lambda u$.
(2) If $u_1,\ldots,u_k\in\mathcal{U}$ are eigenvectors of $A$ corresponding to distinct eigenvalues $\lambda_1,\ldots,\lambda_k$, then $k\le\dim\mathcal{U}$.

Exercise 4.11. Verify Theorem 4.4.

4.5. Direct sums

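Let $\mathcal{U}$ and $\mathcal{V}$ be subspaces of a vector space $\mathcal{Y}$ over $\mathbb{F}$ and let

$\mathcal{U}+\mathcal{V}=\{u+v : u\in\mathcal{U}\ \text{and}\ v\in\mathcal{V}\}.$

Clearly, $\mathcal{U}+\mathcal{V}$ is a subspace of $\mathcal{Y}$ with respect to the rules of vector addition and scalar multiplication that are inherited from the vector space $\mathcal{Y}$, since $\mathcal{U}+\mathcal{V}$ is closed under vector addition and scalar multiplication. If $\mathcal{U}$ and $\mathcal{V}$ are finite dimensional, then the sum $\mathcal{U}+\mathcal{V}$ is said to be a direct sum if

$\dim\mathcal{U}+\dim\mathcal{V}=\dim(\mathcal{U}+\mathcal{V}).$

Direct sums are denoted by the symbol $\dot{+}$, i.e., $\mathcal{U}\dot{+}\mathcal{V}$ rather than $\mathcal{U}+\mathcal{V}$. Analogously, if $\mathcal{U}_j$, $j=1,\ldots,k$, are subspaces of a vector space $\mathcal{Y}$ over $\mathbb{F}$, then the sum

(4.10)    $\mathcal{U}_1+\cdots+\mathcal{U}_k=\{u_1+\cdots+u_k : u_i\in\mathcal{U}_i\ \text{for}\ i=1,\ldots,k\}$

is a subspace of $\mathcal{Y}$ with respect to the rules of vector addition and scalar multiplication that are inherited from the vector space $\mathcal{Y}$. If these $k$ subspaces are finite dimensional, the sum $\mathcal{U}=\mathcal{U}_1+\cdots+\mathcal{U}_k$ is said to be a direct sum if

(4.11)    $\dim\mathcal{U}_1+\cdots+\dim\mathcal{U}_k=\dim\{\mathcal{U}_1+\cdots+\mathcal{U}_k\}.$

If the sum $\mathcal{U}$ is direct, then we write

$\mathcal{U}=\mathcal{U}_1\dot{+}\cdots\dot{+}\mathcal{U}_k.$

Notice that, if $\{u_1,\ldots,u_n\}$ is a basis for $\mathcal{U}$ and $1<k<\ell<n$, then

$\mathcal{U}=\operatorname{span}\{u_1,\ldots,u_{k-1}\}\dot{+}\operatorname{span}\{u_k,\ldots,u_{\ell-1}\}\dot{+}\operatorname{span}\{u_\ell,\ldots,u_n\}.$

Theorem 4.5. If $\mathcal{U}$ and $\mathcal{V}$ are finite dimensional subspaces of a vector space $\mathcal{Y}$ over $\mathbb{F}$, then

(4.12)    $\dim(\mathcal{U}+\mathcal{V})=\dim\mathcal{U}+\dim\mathcal{V}-\dim(\mathcal{U}\cap\mathcal{V})$

and hence

(4.13)    the sum $\mathcal{U}+\mathcal{V}$ is direct $\Longleftrightarrow$ $\mathcal{U}\cap\mathcal{V}=\{0\}.$

Proof.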

If $\mathcal{U}\cap\mathcal{V}=\mathcal{U}$ or $\mathcal{U}\cap\mathcal{V}=\mathcal{V}$, then it is easily seen that formula (4.12) holds, since $\mathcal{U}+\mathcal{V}=\mathcal{V}$ in the first case and $\mathcal{U}+\mathcal{V}=\mathcal{U}$ in the second. Suppose next that $\mathcal{U}\cap\mathcal{V}$ is a nonzero proper subspace of $\mathcal{U}$ and of $\mathcal{V}$ and that $\{w_1,\ldots,w_k\}$ is a basis for $\mathcal{U}\cap\mathcal{V}$. Then there exists a family of vectors $\{u_1,\ldots,u_r\}$ and a family of vectors $\{v_1,\ldots,v_s\}$ such that $\{w_1,\ldots,w_k,u_1,\ldots,u_r\}$ is a basis for $\mathcal{U}$ and $\{w_1,\ldots,w_k,v_1,\ldots,v_s\}$ is a basis for $\mathcal{V}$. It is clear that

$\operatorname{span}\{w_1,\ldots,w_k,u_1,\ldots,u_r,v_1,\ldots,v_s\}=\mathcal{U}+\mathcal{V}.$

Moreover, if $a_1,\ldots,a_k,b_1,\ldots,b_r,c_1,\ldots,c_s\in\mathbb{F}$, then

$\sum_{i=1}^ka_iw_i+\sum_{j=1}^rb_ju_j+\sum_{\ell=1}^sc_\ell v_\ell=0\Longrightarrow\sum_{i=1}^ka_iw_i+\sum_{j=1}^rb_ju_j=-\sum_{\ell=1}^sc_\ell v_\ell$

and hence $c_1=\cdots=c_s=0$, since the last equality implies that $\sum_{\ell=1}^sc_\ell v_\ell$ belongs to $\mathcal{U}\cap\mathcal{V}$. Thus, also $a_1=\cdots=a_k=b_1=\cdots=b_r=0$, since $\{w_1,\ldots,w_k,u_1,\ldots,u_r\}$ is a basis for $\mathcal{U}$. Consequently, the full family of $k+r+s$ vectors is linearly independent and so is a basis for $\mathcal{U}+\mathcal{V}$. Therefore,

$\dim(\mathcal{U}+\mathcal{V})=k+r+s=(k+r)+(k+s)-k,$

which serves to verify (4.12) when $\mathcal{U}\cap\mathcal{V}$ is a nonzero proper subspace of both $\mathcal{U}$ and $\mathcal{V}$.

The verification of (4.12) when $\mathcal{U}\cap\mathcal{V}=\{0\}$ is similar, but easier: If $\{u_1,\ldots,u_s\}$ is a basis for $\mathcal{U}$ and $\{v_1,\ldots,v_t\}$ is a basis for $\mathcal{V}$, then clearly $\operatorname{span}\{u_1,\ldots,u_s,v_1,\ldots,v_t\}=\mathcal{U}+\mathcal{V}$ and this set of $s+t$ vectors is linearly independent since

$\sum_{j=1}^sb_ju_j+\sum_{\ell=1}^tc_\ell v_\ell=0\Longrightarrow\sum_{j=1}^sb_ju_j=-\sum_{\ell=1}^tc_\ell v_\ell$

and hence that both sides of the last equality belong to $\mathcal{U}\cap\mathcal{V}=\{0\}$. Therefore, (4.12) holds in this case also. Finally, the characterization (4.13) is immediate from the definition of a direct sum and formula (4.12). □
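Formula (4.12) can be sanity-checked on a hand-picked example. In the sketch below everything is an assumption made for illustration: the two subspaces of $\mathbb{R}^4$ are given by hypothetical bases, the vector $w$ is a claimed basis of their intersection (only its membership in both subspaces is verified), and `rank` and `in_span` are helper names that are not from the text. Dimensions are computed as ranks by Gaussian elimination over the rationals.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def in_span(basis, v):
    """True if v is a linear combination of the rows of basis."""
    return rank(basis + [v]) == rank(basis)

# Hypothetical bases (as rows) of two subspaces of R^4.
U = [[1, 1, 0, 0], [0, 1, 1, 0]]
V = [[1, 2, 1, 0], [0, 0, 0, 1]]
w = [1, 2, 1, 0]          # claimed to span the intersection of U and V

dim_U, dim_V = rank(U), rank(V)
dim_sum = rank(U + V)     # list concatenation stacks the two bases
dim_cap = 1               # dimension of span{w}, taken as the intersection

print(dim_sum, dim_U + dim_V - dim_cap)   # both sides of (4.12)
print(dim_sum == dim_U + dim_V)           # is the sum direct?
```

Since the intersection is nonzero here, the last line reports that the sum is not direct, in agreement with (4.13).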

We remark that the characterization (4.13) has the advantage of being applicable to infinite dimensional spaces. However, it does not have simple analogues for sums of three or more subspaces; see, e.g., Exercise 4.12.

Exercise 4.12. Give an example of three subspaces $\mathcal{U}$, $\mathcal{V}$ and $\mathcal{W}$ of $\mathbb{R}^3$ such that $\mathcal{U}\cap\mathcal{V}=\{0\}$, $\mathcal{U}\cap\mathcal{W}=\{0\}$ and $\mathcal{V}\cap\mathcal{W}=\{0\}$, yet the sum $\mathcal{U}+\mathcal{V}+\mathcal{W}$ is not direct.

Exercise 4.13. Let $\mathcal{Y}$ be a finite dimensional vector space over $\mathbb{F}$. Show that if $\mathcal{Y}=\mathcal{U}\dot{+}\mathcal{V}$ and $\mathcal{V}=\mathcal{X}\dot{+}\mathcal{W}$, then $\mathcal{Y}=\mathcal{U}\dot{+}\mathcal{X}\dot{+}\mathcal{W}$.

If the vector space $\mathcal{Y}$ admits the direct sum decomposition $\mathcal{Y}=\mathcal{U}\dot{+}\mathcal{V}$, then the subspace $\mathcal{V}$ is said to be a complementary space to the subspace $\mathcal{U}$ and the subspace $\mathcal{U}$ is said to be a complementary space to the subspace $\mathcal{V}$.

Exercise 4.14. Let $T$ be a linear transformation from a vector space $\mathcal{V}$ over $\mathbb{R}$ into itself and let $\mathcal{U}$ be a two dimensional subspace of $\mathcal{V}$ with basis $\{u_1,u_2\}$. Show that if $Tu_1=u_1+2u_2$ and $Tu_2=2u_1+u_2$, then $\mathcal{U}$ is the direct sum of two one dimensional spaces that are each invariant under $T$.

Lemma 4.6. Let $\mathcal{Y}$ be a vector space over $\mathbb{F}$ and let $\mathcal{U}$ and $\mathcal{V}$ be subspaces of $\mathcal{Y}$ such that $\mathcal{U}\dot{+}\mathcal{V}=\mathcal{Y}$. Then every vector $y\in\mathcal{Y}$ can be expressed as a sum of the form $y=u+v$ for exactly one pair of vectors $u\in\mathcal{U}$ and $v\in\mathcal{V}$.

Proof. Suppose that $y=u_1+v_1=u_2+v_2$ with $u_1,u_2\in\mathcal{U}$ and $v_1,v_2\in\mathcal{V}$. Then the vector $u_1-u_2=v_2-v_1$ belongs to $\mathcal{U}\cap\mathcal{V}$ and is therefore equal to $0$. Thus, $u_1=u_2$ and $v_1=v_2$. □

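Lemma 4.7. Let $\mathcal{U}$, $\mathcal{V}$ and $\mathcal{W}$ be subspaces of a vector space $\mathcal{Y}$ over $\mathbb{F}$ such that

$\mathcal{U}\dot{+}\mathcal{V}=\mathcal{Y}\quad\text{and}\quad\mathcal{U}\subseteq\mathcal{W}.$

Then

$\mathcal{W}=(\mathcal{W}\cap\mathcal{U})\dot{+}(\mathcal{W}\cap\mathcal{V}).$

Proof. Clearly,

$(\mathcal{W}\cap\mathcal{U})+(\mathcal{W}\cap\mathcal{V})\subseteq\mathcal{W}+\mathcal{W}=\mathcal{W}.$

To establish the opposite inclusion, let $w\in\mathcal{W}$. Then, since $\mathcal{Y}=\mathcal{U}\dot{+}\mathcal{V}$, $w=u+v$ for exactly one pair of vectors $u\in\mathcal{U}$ and $v\in\mathcal{V}$. Moreover, under the added assumption that $\mathcal{U}\subseteq\mathcal{W}$, it follows that both $u$ and $v=w-u$ belong to $\mathcal{W}$. Therefore, $u\in\mathcal{W}\cap\mathcal{U}$ and $v\in\mathcal{W}\cap\mathcal{V}$, and hence

$\mathcal{W}\subseteq(\mathcal{W}\cap\mathcal{U})+(\mathcal{W}\cap\mathcal{V}).$ □

Exercise 4.15. Provide an example of three subspaces $\mathcal{U}$, $\mathcal{V}$ and $\mathcal{W}$ of a vector space $\mathcal{Y}$ over $\mathbb{F}$ such that $\mathcal{U}\dot{+}\mathcal{V}=\mathcal{Y}$, but $\mathcal{W}\ne(\mathcal{W}\cap\mathcal{U})\dot{+}(\mathcal{W}\cap\mathcal{V})$. [HINT: Simple examples exist with $\mathcal{Y}=\mathbb{R}^2$.]

Lemma 4.8. Let $\mathcal{U}_j$, $j=1,\ldots,k$, be finite dimensional nonzero subspaces of a vector space $\mathcal{Y}$ over $\mathbb{F}$. Then the sum (4.10) is direct if and only if every set of nonzero vectors $\{u_1,\ldots,u_k\}$ with $u_i\in\mathcal{U}_i$ for $i=1,\ldots,k$ is a linearly independent set of vectors.

Discussion. To ease the exposition, suppose that $k=3$ and let $\{a_1,\ldots,a_\ell\}$ be a basis for $\mathcal{U}_1$, $\{b_1,\ldots,b_m\}$ be a basis for $\mathcal{U}_2$ and $\{c_1,\ldots,c_n\}$ be a basis for $\mathcal{U}_3$. Clearly

$\operatorname{span}\{a_1,\ldots,a_\ell,\ b_1,\ldots,b_m,\ c_1,\ldots,c_n\}=\mathcal{U}_1+\mathcal{U}_2+\mathcal{U}_3.$

It is easily checked that if the sum is direct, then the $\ell+m+n$ vectors indicated above are linearly independent and hence if $u=\sum\alpha_ia_i$, $v=\sum\beta_jb_j$ and $w=\sum\gamma_kc_k$ are nonzero vectors in $\mathcal{U}_1$, $\mathcal{U}_2$ and $\mathcal{U}_3$, respectively, then they are linearly independent.

Suppose next that every set of nonzero vectors $u\in\mathcal{U}_1$, $v\in\mathcal{U}_2$ and $w\in\mathcal{U}_3$ is linearly independent. Then $\{a_1,\ldots,a_\ell,\ b_1,\ldots,b_m,\ c_1,\ldots,c_n\}$ must be a linearly independent set of vectors because if

$\alpha_1a_1+\cdots+\alpha_\ell a_\ell+\beta_1b_1+\cdots+\beta_mb_m+\gamma_1c_1+\cdots+\gamma_nc_n=0,$

and if, say, $\alpha_1\ne 0$, $\beta_1\ne 0$ and $\gamma_1\ne 0$, then

$\alpha_1(a_1+\cdots+\alpha_1^{-1}\alpha_\ell a_\ell)+\beta_1(b_1+\cdots+\beta_1^{-1}\beta_mb_m)+\gamma_1(c_1+\cdots+\gamma_1^{-1}\gamma_nc_n)=0,$

which implies that $\alpha_1=\beta_1=\gamma_1=0$, contrary to assumption. The same argument shows that all the remaining coefficients must be zero too. □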

Exercise 4.16. Let $\mathcal{U}=\operatorname{span}\{u_1,\ldots,u_k\}$ over $\mathbb{F}$ and let $\mathcal{U}_j=\{\alpha u_j : \alpha\in\mathbb{F}\}$. Show that the set of vectors $\{u_1,\ldots,u_k\}$ is a basis for the vector space $\mathcal{U}$ over $\mathbb{F}$ if and only if $\mathcal{U}_1\dot{+}\cdots\dot{+}\mathcal{U}_k=\mathcal{U}$.

4.6. Diagonalizable matrices

A matrix $A\in\mathbb{F}^{n\times n}$ is said to be diagonalizable if it is similar to a diagonal matrix, i.e., if there exists an invertible matrix $U\in\mathbb{F}^{n\times n}$ and a diagonal matrix $D\in\mathbb{F}^{n\times n}$ such that

(4.14)    $A=UDU^{-1}.$

Theorem 4.9. Let $A\in\mathbb{C}^{n\times n}$ and suppose that $A$ has exactly $k$ distinct eigenvalues $\lambda_1,\ldots,\lambda_k\in\mathbb{C}$. Then the sum

$\mathcal{N}_{(A-\lambda_1I_n)}+\cdots+\mathcal{N}_{(A-\lambda_kI_n)}$

is direct. Moreover, the following statements are equivalent:

(1) $A$ is diagonalizable.
(2) $\dim\mathcal{N}_{(A-\lambda_1I_n)}+\cdots+\dim\mathcal{N}_{(A-\lambda_kI_n)}=n$.
(3) $\mathcal{N}_{(A-\lambda_1I_n)}\dot{+}\cdots\dot{+}\mathcal{N}_{(A-\lambda_kI_n)}=\mathbb{C}^n$.

Proof.

Suppose first that $A$ is diagonalizable. Then, the formula $A=UDU^{-1}$ implies that

$A-\lambda I_n=UDU^{-1}-\lambda UI_nU^{-1}=U(D-\lambda I_n)U^{-1}$

and hence that

$\dim\mathcal{N}_{(A-\lambda I_n)}=\dim\mathcal{N}_{(D-\lambda I_n)}\quad\text{for every point}\quad\lambda\in\mathbb{C}.$

In particular, if $\lambda=\lambda_j$ is an eigenvalue of $A$, then

$\gamma_j=\dim\mathcal{N}_{(A-\lambda_jI_n)}=\dim\mathcal{N}_{(D-\lambda_jI_n)}$

is equal to the number of times the number $\lambda_j$ is repeated in the diagonal matrix $D$. Thus, $\gamma_1+\cdots+\gamma_k=n$, i.e., (1) $\Longrightarrow$ (2). To prove that (2) $\Longrightarrow$ (1), take $\gamma_j$ linearly independent vectors from $\mathcal{N}_{(A-\lambda_jI_n)}$ and array them as the column vectors of an $n\times\gamma_j$ matrix $U_j$. Then

$AU_j=U_j\Lambda_j\quad\text{for}\quad j=1,\ldots,k,$

where $\Lambda_j$ is the $\gamma_j\times\gamma_j$ diagonal matrix with $\lambda_j$ on the diagonal. Thus, upon setting

$U=\begin{bmatrix}U_1&\cdots&U_k\end{bmatrix}\quad\text{and}\quad D=\operatorname{diag}\{\Lambda_1,\ldots,\Lambda_k\},$

it is readily seen that $AU=UD$ and, with the help of Theorem 4.3, that the $\gamma_1+\cdots+\gamma_k$ columns of $U$ are linearly independent, i.e.,

$\operatorname{rank}U=\gamma_1+\cdots+\gamma_k=n.$

The formula $AU=UD$ is valid even if $\gamma_1+\cdots+\gamma_k<n$. However, if (2) is in force, then $U$ is invertible and $A=UDU^{-1}$. Finally, the implication (2) $\Longrightarrow$ (3) follows from Theorem 4.3 and Lemma 4.8; the converse follows from the definition of a direct sum. □

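Corollary 4.10. Let $A\in\mathbb{C}^{n\times n}$ and suppose that $A$ has $n$ distinct eigenvalues in $\mathbb{C}$. Then $A$ is diagonalizable.

Exercise 4.17. Verify the corollary.

Formula (4.14) is extremely useful. In particular, it implies that

$A^2=(UDU^{-1})(UDU^{-1})=UD^2U^{-1},\quad A^3=UD^3U^{-1},\ \text{etc.}$

The advantage is that the powers $D^2, D^3,\ldots,D^k$ are easy to compute:

(4.15)    $D=\begin{bmatrix}\mu_1&&\\&\ddots&\\&&\mu_n\end{bmatrix}\Longrightarrow D^k=\begin{bmatrix}\mu_1^k&&\\&\ddots&\\&&\mu_n^k\end{bmatrix}.$

Moreover, this suggests that the matrix exponential (which will be introduced later)

$e^A=\sum_{k=0}^\infty\frac{A^k}{k!}=U\left(\sum_{k=0}^\infty\frac{D^k}{k!}\right)U^{-1}=U\operatorname{diag}\{e^{\mu_1},\ldots,e^{\mu_n}\}U^{-1},$

all of which can be justified.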
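The computational payoff of (4.14) and (4.15) can be seen on a small case. The sketch below is an illustration under stated assumptions: the symmetric $2\times 2$ matrix, its eigenvector matrix $U$ and the hand-computed inverse are hypothetical choices, and `matmul` is a helper name that is not from the text. It compares $A^5$ computed by repeated multiplication with $UD^5U^{-1}$.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical example: A has eigenvalues 3 and 1.
A = [[2, 1], [1, 2]]
U = [[1, 1], [1, -1]]                      # columns: eigenvectors for 3 and 1
U_inv = [[Fraction(1, 2), Fraction(1, 2)],
         [Fraction(1, 2), Fraction(-1, 2)]]

k = 5
D_k = [[3 ** k, 0], [0, 1 ** k]]           # powers of D are taken entrywise
via_diagonal = matmul(matmul(U, D_k), U_inv)

direct = A
for _ in range(k - 1):                     # A^k by k-1 multiplications
    direct = matmul(direct, A)

print(via_diagonal == direct)
```

Exact rational arithmetic is used so that the two results can be compared with `==` rather than with a tolerance.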

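Exercise 4.18. Show that if a matrix $A\in\mathbb{C}^{n\times n}$ is diagonalizable, i.e., if $A=UDU^{-1}$ with $D=\operatorname{diag}\{\lambda_1,\ldots,\lambda_n\}$, and if

$U=\begin{bmatrix}u_1&\cdots&u_n\end{bmatrix}\quad\text{and}\quad U^{-1}=\begin{bmatrix}\vec{v}_1\\\vdots\\\vec{v}_n\end{bmatrix},$

then:

(1) $A^k=UD^kU^{-1}=\sum_{j=1}^n\lambda_j^ku_j\vec{v}_j$.

(2) $(A-\lambda I_n)^{-1}=U(D-\lambda I_n)^{-1}U^{-1}=\sum_{j=1}^n(\lambda_j-\lambda)^{-1}u_j\vec{v}_j$, if $\lambda\notin\sigma(A)$.

4.7. An algorithm for diagonalizing matrices

The verification of (2) $\Longrightarrow$ (1) in the proof of Theorem 4.9 contains a recipe for constructing a pair of matrices $U$ and $D$ so that $A=UDU^{-1}$ for a matrix $A\in\mathbb{C}^{n\times n}$ with exactly $k$ distinct eigenvalues $\lambda_1,\ldots,\lambda_k$ when the geometric multiplicities meet the constraint $\gamma_1+\cdots+\gamma_k=n$: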

(1) Calculate the geometric multiplicity $\gamma_j=\dim\mathcal{N}_{(A-\lambda_jI_n)}$ for each eigenvalue $\lambda_j$ of $A$.
(2) Obtain a basis for each of the spaces $\mathcal{N}_{(A-\lambda_jI_n)}$ for $j=1,\ldots,k$ and let $U_j$ denote the $n\times\gamma_j$ matrix with columns equal to the vectors in this basis.
(3) Let $U=\begin{bmatrix}U_1&\cdots&U_k\end{bmatrix}$. Then

$AU=\begin{bmatrix}AU_1&\cdots&AU_k\end{bmatrix}=\begin{bmatrix}\lambda_1U_1&\cdots&\lambda_kU_k\end{bmatrix}=UD,$

where $D=\operatorname{diag}\{D_1,\ldots,D_k\}$ and $D_j$ is a $\gamma_j\times\gamma_j$ diagonal matrix with $\lambda_j$ on its diagonal. If $\gamma_1+\cdots+\gamma_k=n$, then $U$ will be invertible.

The next example illustrates the algorithm.

Example 4.11. Let $A\in\mathbb{C}^{6\times 6}$ and suppose that $A$ has exactly 3 distinct eigenvalues $\lambda_1,\lambda_2,\lambda_3\in\mathbb{C}$ with geometric multiplicities $\gamma_1=3$, $\gamma_2=1$ and $\gamma_3=2$, respectively. Let $\{u_1,u_2,u_3\}$ be any basis for $\mathcal{N}_{(A-\lambda_1I_6)}$, $\{u_4\}$ be any basis for $\mathcal{N}_{(A-\lambda_2I_6)}$ and $\{u_5,u_6\}$ be any basis for $\mathcal{N}_{(A-\lambda_3I_6)}$. Then it is readily checked that

$A\begin{bmatrix}u_1&u_2&u_3&u_4&u_5&u_6\end{bmatrix}=\begin{bmatrix}u_1&u_2&u_3&u_4&u_5&u_6\end{bmatrix}\begin{bmatrix}\lambda_1&0&0&0&0&0\\0&\lambda_1&0&0&0&0\\0&0&\lambda_1&0&0&0\\0&0&0&\lambda_2&0&0\\0&0&0&0&\lambda_3&0\\0&0&0&0&0&\lambda_3\end{bmatrix}.$

But, upon setting

$U=\begin{bmatrix}u_1&u_2&u_3&u_4&u_5&u_6\end{bmatrix}\quad\text{and}\quad D=\operatorname{diag}\{\lambda_1,\lambda_1,\lambda_1,\lambda_2,\lambda_3,\lambda_3\},$

the preceding formula can be rewritten as

$AU=UD$

or equivalently as

$A=UDU^{-1},$

since $U$ is invertible, because it is a $6\times 6$ matrix with six linearly independent column vectors, thanks to Theorem 4.3. Notice that in the notation used in Theorem 4.9 and its proof, $k=3$, $U_1=\begin{bmatrix}u_1&u_2&u_3\end{bmatrix}$, $U_2=u_4$, $U_3=\begin{bmatrix}u_5&u_6\end{bmatrix}$, $\Lambda_1=\operatorname{diag}\{\lambda_1,\lambda_1,\lambda_1\}$, $\Lambda_2=\lambda_2$ and $\Lambda_3=\operatorname{diag}\{\lambda_3,\lambda_3\}$.
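The three steps of the algorithm can be sketched in a few lines. This is a minimal illustration, not a general-purpose routine: the $3\times 3$ matrix is a hypothetical example, its eigenvalues $1$ and $4$ are assumed to be known in advance, and the helper names `null_space`, `shifted` and `matmul` are not from the text. Bases of the null spaces are read off from the reduced row echelon form over the rationals.

```python
from fractions import Fraction

def null_space(M):
    """Basis of {x : Mx = 0}, read off from the reduced row echelon form."""
    rows = [[Fraction(x) for x in row] for row in M]
    n, pivots, r = len(rows[0]), [], 0
    for c in range(n):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot_value = rows[r][c]
        rows[r] = [x / pivot_value for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_index, pc in enumerate(pivots):
            v[pc] = -rows[row_index][free]
        basis.append(v)
    return basis

def shifted(A, lam):
    """Return A - lam*I."""
    return [[A[i][j] - (lam if i == j else 0) for j in range(len(A))]
            for i in range(len(A))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]   # hypothetical example
eigenvalues = [1, 4]                    # assumed known in advance
n = len(A)

columns, diagonal = [], []
for lam in eigenvalues:
    basis = null_space(shifted(A, lam))  # steps (1) and (2)
    columns.extend(basis)
    diagonal.extend([lam] * len(basis))

# Step (3): the basis vectors become the columns of U.
U = [[columns[j][i] for j in range(len(columns))] for i in range(n)]
D = [[diagonal[i] if i == j else 0 for j in range(len(diagonal))]
     for i in range(len(diagonal))]
print(matmul(A, U) == matmul(U, D), len(columns) == n)
```

In this example the geometric multiplicities are $2$ and $1$, so $\gamma_1+\gamma_2=3=n$ and $U$ is invertible; for a matrix with too few eigenvectors the same code would still give $AU=UD$, but with fewer than $n$ columns in $U$.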

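4.8. Computing eigenvalues at this point

The eigenvalues of a matrix $A\in\mathbb{C}^{n\times n}$ are precisely those points $\lambda\in\mathbb{C}$ at which $\mathcal{N}_{(A-\lambda I_n)}\ne\{0\}$. In Chapter 5, we shall identify these points with the values of $\lambda\in\mathbb{C}$ at which the determinant $\det(A-\lambda I_n)$ is equal to zero. However, as we have not introduced determinants yet, we shall discuss another method that uses Gaussian elimination to find those points $\lambda\in\mathbb{C}$ for which the equation $Ax-\lambda x=0$ has nonzero solutions $x\in\mathbb{C}^n$. In particular, it is necessary to find those points $\lambda$ for which the upper echelon matrix $U$ corresponding to $A-\lambda I_n$ has fewer than $n$ pivots.

Example 4.12. Let

$A=\begin{bmatrix}3&1&1\\2&2&1\\1&3&1\end{bmatrix}.$

Then

$A-\lambda I_3=\begin{bmatrix}3-\lambda&1&1\\2&2-\lambda&1\\1&3&1-\lambda\end{bmatrix}.$

Thus, permuting the first and third rows of $A-\lambda I_3$ for convenience, we obtain

$\begin{bmatrix}0&0&1\\0&1&0\\1&0&0\end{bmatrix}(A-\lambda I_3)=\begin{bmatrix}1&3&1-\lambda\\2&2-\lambda&1\\3-\lambda&1&1\end{bmatrix}.$

Next, adding $-2$ times the first row to the second and $\lambda-3$ times the first row to the third yields

$\begin{bmatrix}1&0&0\\-2&1&0\\\lambda-3&0&1\end{bmatrix}\begin{bmatrix}1&3&1-\lambda\\2&2-\lambda&1\\3-\lambda&1&1\end{bmatrix}=\begin{bmatrix}1&3&1-\lambda\\0&-4-\lambda&2\lambda-1\\0&3\lambda-8&x\end{bmatrix},$

where

$x=1+(\lambda-3)(1-\lambda).$

Since the last matrix on the right is invertible when $\lambda=-4$, the vector space $\mathcal{N}_{(A+4I_3)}=\{0\}$. Thus, we can assume that $\lambda+4\ne 0$ and add $(3\lambda-8)/(\lambda+4)$ times the second row to the third row to get

$\begin{bmatrix}1&0&0\\0&1&0\\0&\frac{3\lambda-8}{\lambda+4}&1\end{bmatrix}\begin{bmatrix}1&3&1-\lambda\\0&-4-\lambda&2\lambda-1\\0&3\lambda-8&x\end{bmatrix}=\begin{bmatrix}1&3&1-\lambda\\0&-4-\lambda&2\lambda-1\\0&0&y\end{bmatrix},$

where

$y=\frac{(3\lambda-8)(2\lambda-1)+\lambda+4+(\lambda+4)(\lambda-3)(1-\lambda)}{\lambda+4}=\frac{-\lambda(\lambda-5)(\lambda-1)}{\lambda+4}.$

Therefore, $\mathcal{N}_{(A-\lambda I_3)}\ne\{0\}$ if and only if $\lambda=0$, or $\lambda=5$, or $\lambda=1$.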
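The conclusion of Example 4.12 can be cross-checked by counting pivots. The sketch below makes two assumptions that should be stated plainly: it takes $A$ to be the matrix with rows $(3,1,1)$, $(2,2,1)$, $(1,3,1)$, which is how the displays of the example read, and it only scans a hypothetical list of integer candidates for $\lambda$, which is a crude check rather than a method for finding eigenvalues. The helper names `rank` and `shifted` are not from the text.

```python
from fractions import Fraction

def rank(M):
    """Number of pivots found by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def shifted(A, lam):
    """Return A - lam*I."""
    return [[A[i][j] - (lam if i == j else 0) for j in range(len(A))]
            for i in range(len(A))]

A = [[3, 1, 1], [2, 2, 1], [1, 3, 1]]   # matrix assumed from Example 4.12
n = len(A)

# Keep the candidates at which A - lam*I has fewer than n pivots.
found = [lam for lam in range(-10, 11) if rank(shifted(A, lam)) < n]
print(found)
```

The scan also confirms that $\lambda=-4$, the value excluded during the elimination, is not an eigenvalue.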

Exercise 4.19. Find an invertible matrix $U\in\mathbb{C}^{3\times 3}$ and a diagonal matrix $D\in\mathbb{C}^{3\times 3}$ so that $A=UDU^{-1}$ when $A$ is chosen equal to the matrix in the preceding example. [HINT: Follow the steps in the algorithm presented in the previous section.]

Exercise 4.20. Find an invertible matrix $U$ such that $U^{-1}AU$ is equal to a diagonal matrix $D$ for each of the following two choices of $A$:

111 112 022,022 001 003

r—tr—tr—t

r—Ior—I

D—‘D—‘H

H

{L

Exercise 4.21. Repeat Exercise 4.20 for

[REMARK: This is a little harder than the previous exercise, but not much.]

4.9. Not all matrices are diagonalizable

Not all matrices are diagonalizable, even if complex eigenvalues are allowed. The problem is that a matrix may not have enough linearly independent eigenvectors; i.e., the criterion $\gamma_1+\cdots+\gamma_k=n$ established in Theorem 4.9 may not be satisfied. Thus, for example, if

$A=\begin{bmatrix}2&1\\0&2\end{bmatrix},\quad\text{then}\quad A-2I_2=\begin{bmatrix}0&1\\0&0\end{bmatrix},$

$\dim\mathcal{N}_{(A-2I_2)}=1$ and $\dim\mathcal{N}_{(A-\lambda I_2)}=0$ if $\lambda\ne 2$. Similarly, if

$A=\begin{bmatrix}2&1&0\\0&2&1\\0&0&2\end{bmatrix},\quad\text{then}\quad A-2I_3=\begin{bmatrix}0&1&0\\0&0&1\\0&0&0\end{bmatrix},$

$\dim\mathcal{N}_{(A-2I_3)}=1$ and $\dim\mathcal{N}_{(A-\lambda I_3)}=0$ if $\lambda\ne 2$. More elaborate examples may be constructed by taking larger matrices of the same form or by putting such blocks together as in Exercise 4.22.
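The shortage of eigenvectors in the $3\times 3$ example above, and the way powers of $A-2I_3$ make up for it, can be seen by computing nullities. The sketch below is illustrative; `rank` and `matmul` are helper names that are not from the text, and the nullity is computed as $n$ minus the rank.

```python
from fractions import Fraction

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 1, 0], [0, 2, 1], [0, 0, 2]]
n = len(A)
N = [[A[i][j] - (2 if i == j else 0) for j in range(n)] for i in range(n)]

nullities, power = [], N
for j in range(1, n + 1):
    nullities.append(n - rank(power))   # dim of the null space of (A - 2I)^j
    power = matmul(power, N)
print(nullities)
```

The first entry is the geometric multiplicity of the eigenvalue $2$, which is less than $n=3$, so the criterion of Theorem 4.9 fails; the last entry is the dimension of the space of generalized eigenvectors.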

Exercise 4.22. Calculate $\dim\mathcal{N}_{(B_{\lambda_1}-\lambda_1I_{13})^j}$ for $j=1,2,\ldots$ when

$B_{\lambda_1}=\left[\begin{array}{ccccc|cccc|cc|c|c}
\lambda_1&1&0&0&0&0&0&0&0&0&0&0&0\\
0&\lambda_1&1&0&0&0&0&0&0&0&0&0&0\\
0&0&\lambda_1&1&0&0&0&0&0&0&0&0&0\\
0&0&0&\lambda_1&1&0&0&0&0&0&0&0&0\\
0&0&0&0&\lambda_1&0&0&0&0&0&0&0&0\\\hline
0&0&0&0&0&\lambda_1&1&0&0&0&0&0&0\\
0&0&0&0&0&0&\lambda_1&1&0&0&0&0&0\\
0&0&0&0&0&0&0&\lambda_1&1&0&0&0&0\\
0&0&0&0&0&0&0&0&\lambda_1&0&0&0&0\\\hline
0&0&0&0&0&0&0&0&0&\lambda_1&1&0&0\\
0&0&0&0&0&0&0&0&0&0&\lambda_1&0&0\\\hline
0&0&0&0&0&0&0&0&0&0&0&\lambda_1&0\\\hline
0&0&0&0&0&0&0&0&0&0&0&0&\lambda_1
\end{array}\right]$

and build an array of symbols $x$ with $\dim\mathcal{N}_{(B_{\lambda_1}-\lambda_1I_{13})^i}-\dim\mathcal{N}_{(B_{\lambda_1}-\lambda_1I_{13})^{i-1}}$ symbols $x$ in the $i$'th row for $i=1,2,\ldots$. Check that the number of fundamental Jordan cells in $B_{\lambda_1}$ of size $i\times i$ is equal to the number of columns of height $i$ in the array corresponding to $\lambda_1$.

The notation

$B_{\lambda_1}=\operatorname{diag}\{C_{\lambda_1}^{(5)},C_{\lambda_1}^{(4)},C_{\lambda_1}^{(2)},C_{\lambda_1}^{(1)},C_{\lambda_1}^{(1)}\}$

is a convenient way to describe the matrix $B_{\lambda_1}$ of Exercise 4.22 in terms of its fundamental Jordan cells $C_\alpha^{(\nu)}$, where

(4.16)    $C_\alpha^{(\nu)}=\begin{bmatrix}\alpha&1&0&\cdots&0&0\\0&\alpha&1&\cdots&0&0\\\vdots&&&&&\vdots\\0&0&0&\cdots&\alpha&1\\0&0&0&\cdots&0&\alpha\end{bmatrix}=\alpha I_\nu+C_0^{(\nu)}$

denotes the $\nu\times\nu$ matrix with $\alpha$ on the main diagonal, one on the diagonal line just above the main diagonal and zeros elsewhere. This helps to avoid such huge displays. Moreover, such block diagonal representations are convenient for calculation, because

(4.17)    $B=\operatorname{diag}\{B_1,\ldots,B_k\}\Longrightarrow\dim\mathcal{N}_B=\dim\mathcal{N}_{B_1}+\cdots+\dim\mathcal{N}_{B_k}.$

Nevertheless, the news is not all bad. There is a more general factorization formula than (4.14) in which the matrix $D$ is replaced by a block diagonal matrix $J=\operatorname{diag}\{B_{\lambda_1},\ldots,B_{\lambda_k}\}$, where $B_{\lambda_j}$ is an $\alpha_j\times\alpha_j$ upper triangular matrix that is also a block diagonal matrix with $\gamma_j$ Jordan cells (of assorted sizes) as blocks that is based on the following fact:

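Theorem 4.13. Let $A\in\mathbb{C}^{n\times n}$ and suppose that $A$ has exactly $k$ distinct eigenvalues, $\lambda_1,\ldots,\lambda_k\in\mathbb{C}$. Then

$\mathbb{C}^n=\mathcal{N}_{(A-\lambda_1I_n)^n}\dot{+}\cdots\dot{+}\mathcal{N}_{(A-\lambda_kI_n)^n}.$

The proof of this theorem will be carried out in the next few sections. At this point let us focus instead on its implications.

Example 4.14. Let $A\in\mathbb{C}^{9\times 9}$ and suppose that $A$ has exactly three distinct eigenvalues $\lambda_1$, $\lambda_2$ and $\lambda_3$ with algebraic multiplicities $\alpha_1=4$, $\alpha_2=2$ and $\alpha_3=3$, respectively. Let $\{v_1,v_2,v_3,v_4\}$ be any basis for $\mathcal{N}_{(A-\lambda_1I_9)^9}$, let $\{w_1,w_2\}$ be any basis for $\mathcal{N}_{(A-\lambda_2I_9)^9}$ and let $\{x_1,x_2,x_3\}$ be any basis for $\mathcal{N}_{(A-\lambda_3I_9)^9}$. Then, since each of the spaces $\mathcal{N}_{(A-\lambda_jI_9)^9}$, $j=1,2,3$, is invariant under multiplication by the matrix $A$,

$A\begin{bmatrix}v_1&v_2&v_3&v_4\end{bmatrix}=\begin{bmatrix}v_1&v_2&v_3&v_4\end{bmatrix}G_1$
$A\begin{bmatrix}w_1&w_2\end{bmatrix}=\begin{bmatrix}w_1&w_2\end{bmatrix}G_2$
$A\begin{bmatrix}x_1&x_2&x_3\end{bmatrix}=\begin{bmatrix}x_1&x_2&x_3\end{bmatrix}G_3$

for some choice of $G_1\in\mathbb{C}^{4\times 4}$, $G_2\in\mathbb{C}^{2\times 2}$ and $G_3\in\mathbb{C}^{3\times 3}$. In other notation, upon setting

$V=\begin{bmatrix}v_1&v_2&v_3&v_4\end{bmatrix},\quad W=\begin{bmatrix}w_1&w_2\end{bmatrix}\quad\text{and}\quad X=\begin{bmatrix}x_1&x_2&x_3\end{bmatrix},$

one can write the preceding three sets of equations together as

$A\begin{bmatrix}V&W&X\end{bmatrix}=\begin{bmatrix}V&W&X\end{bmatrix}\begin{bmatrix}G_1&O&O\\O&G_2&O\\O&O&G_3\end{bmatrix}$

or, equivalently, upon setting $U=\begin{bmatrix}V&W&X\end{bmatrix}$, as

(4.18)    $A=U\begin{bmatrix}G_1&O&O\\O&G_2&O\\O&O&G_3\end{bmatrix}U^{-1},$

since the matrix $U=\begin{bmatrix}V&W&X\end{bmatrix}$ is invertible, thanks to Theorem 4.13.

Formula (4.18) is the best that can be achieved with the given information. To say more, one needs to know more about the subspaces $\mathcal{N}_{(A-\lambda_iI_9)^j}$ for $j=1,\ldots,\alpha_i$ and $i=1,2,3$. Thus, for example, if $\dim\mathcal{N}_{(A-\lambda_iI_9)}=1$ for $i=1,2,3$, then the vectors in $\mathcal{N}_{(A-\lambda_iI_9)^{\alpha_i}}$ may be chosen so that

$\operatorname{diag}\{G_1,G_2,G_3\}=\operatorname{diag}\{C_{\lambda_1}^{(4)},C_{\lambda_2}^{(2)},C_{\lambda_3}^{(3)}\}.$

On the other hand, if $\dim\mathcal{N}_{(A-\lambda_iI_9)}=2$ for $i=1,2,3$ and $\dim\mathcal{N}_{(A-\lambda_1I_9)^2}=4$, then the vectors in $\mathcal{N}_{(A-\lambda_iI_9)^{\alpha_i}}$ may be chosen so that

$\operatorname{diag}\{G_1,G_2,G_3\}=\operatorname{diag}\{C_{\lambda_1}^{(2)},C_{\lambda_1}^{(2)},C_{\lambda_2}^{(1)},C_{\lambda_2}^{(1)},C_{\lambda_3}^{(2)},C_{\lambda_3}^{(1)}\}.$

There are still more possibilities. The main facts are summarized in the statement of Theorem 4.15 in the next section.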

4.10. The Jordan decomposition theorem

Theorem 4.15. Let $A\in\mathbb{C}^{n\times n}$ and suppose that $A$ has exactly $k$ distinct eigenvalues $\lambda_1,\ldots,\lambda_k$ in $\mathbb{C}$ with geometric multiplicities $\gamma_1,\ldots,\gamma_k$ and algebraic multiplicities $\alpha_1,\ldots,\alpha_k$, respectively. Then there exists an invertible matrix $U\in\mathbb{C}^{n\times n}$ such that

$AU=UJ,$

where:

(1) $J=\operatorname{diag}\{B_{\lambda_1},\ldots,B_{\lambda_k}\}$.

(2) $B_{\lambda_j}$ is an $\alpha_j\times\alpha_j$ block diagonal matrix that is built out of $\gamma_j$ Jordan cells $C_{\lambda_j}^{(\cdot)}$ of the form (4.16).

(3) The number of Jordan cells $C_{\lambda_j}^{(i)}$ in $B_{\lambda_j}$ with $i\ge\ell$ is equal to

(4.19)    $\dim\mathcal{N}_{(A-\lambda_jI_n)^\ell}-\dim\mathcal{N}_{(A-\lambda_jI_n)^{\ell-1}},\quad\ell=2,\ldots,\alpha_j,$

or, in friendlier terms, the number of Jordan cells $C_{\lambda_j}^{(i)}$ in $B_{\lambda_j}$ is equal to the number of columns of height $i$ in the array of symbols $x$ that is constructed by placing

$x\ x\ \cdots\ x$    with $\gamma_j$ symbols in row 1
$x\ x\ \cdots$    with $\dim\mathcal{N}_{(A-\lambda_jI_n)^2}-\dim\mathcal{N}_{(A-\lambda_jI_n)}$ symbols in row 2
$x\ \cdots$    with $\dim\mathcal{N}_{(A-\lambda_jI_n)^3}-\dim\mathcal{N}_{(A-\lambda_jI_n)^2}$ symbols in row 3
$\vdots$

(4) The columns of $U$ are generalized eigenvectors of the matrix $A$.

(5) $(A-\lambda_1I_n)^{\alpha_1}\cdots(A-\lambda_kI_n)^{\alpha_k}=O$.

(6) If $\nu_j=\min\{i : \dim\mathcal{N}_{(A-\lambda_jI_n)^i}=\dim\mathcal{N}_{(A-\lambda_jI_n)^n}\}$, then $\nu_j\le\alpha_j$ for $j=1,\ldots,k$ and $(A-\lambda_1I_n)^{\nu_1}\cdots(A-\lambda_kI_n)^{\nu_k}=O$.

The verification of this theorem rests on Theorem 4.13. It amounts to showing that the basis of each of the spaces $\mathcal{N}_{(A-\lambda_jI_n)^n}$ can be organized in a suitable way. It turns out that the array constructed in (3) is a Young diagram, since the number of symbols in row $i+1$ is less than or equal to the number of symbols in row $i$; see Lemma 6.2 and Section 6.4. Item (5) is the Cayley-Hamilton theorem:

(4.20)    $\det(\lambda I_n-A)=\lambda^n+\sum_{j=0}^{n-1}c_j\lambda^j\Longrightarrow A^n=-\sum_{j=0}^{n-1}c_jA^j.$

In view of (6), the polynomial $p(\lambda)=(\lambda-\lambda_1)^{\nu_1}\cdots(\lambda-\lambda_k)^{\nu_k}$ is referred to as the minimal polynomial for $A$. Moreover, the number $\nu_j$ is the "size" of the largest Jordan cell in $B_{\lambda_j}$. A detailed proof of the Jordan decomposition theorem is deferred to Chapter 6, though an illustrative example, which previews some of the key ideas, is furnished in the next section.

Exercise 4.23. Show that if

$A=UC_\alpha^{(n)}U^{-1},$

then

$\dim\mathcal{N}_{(A-\lambda I_n)}=\begin{cases}0&\text{if}\ \lambda\ne\alpha,\\1&\text{if}\ \lambda=\alpha.\end{cases}$

Exercise 4.24. Calculate dlmN(A_)\Ip)t for every A E (C and t = 1, 2, . . . for

the 26 x 26 matrix A = UJU‘1 when J = diag {B,\1,BA2,B)\3}, the points A1, A2, A3 are distinct, BA, is as in Exercise 4.22, BM = diag{C(‘:), CS3} and 3A3 = diag{C(:), Cg), Cg} . Build an array of symbols x for each eigen-

value Aj with 79- symbols in the first row and dim MA_ A]. 1,.)i —dim NEA_ A]. Ip)i—1 symbols x in the 2"th row for 2 = 2, 3, . . . and check that the number of fun-

damental Jordan cells in BA3. of size 2' X 2' is equal to the number of columns in the array of height 2'.
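The array of symbols described in item (3) of Theorem 4.15 translates directly into a small computation. The sketch below is illustrative only: `jordan_cell_sizes` is a hypothetical helper name, and its input is assumed to be the list of nullities $\dim\mathcal{N}_{(A-\lambda I_n)^i}$ for $i=1,2,\ldots$, listed until the sequence stops growing. Row $i$ of the array has as many symbols as the $i$'th increment of that list, and each column of height $h$ accounts for one Jordan cell of size $h\times h$.

```python
def jordan_cell_sizes(nullities):
    """Sizes of the Jordan cells for one eigenvalue, from the nullities of powers.

    nullities[i-1] is assumed to be dim N((A - lam*I)^i) for i = 1, 2, ...
    """
    # Row lengths of the array: the first nullity, then successive differences.
    rows = [nullities[0]] + [b - a for a, b in zip(nullities, nullities[1:])]
    sizes = []
    for col in range(rows[0]):
        # Height of this column = number of rows long enough to reach it.
        height = sum(1 for length in rows if length > col)
        sizes.append(height)
    return sizes

# Nullities 2, 4, 5: rows of lengths 2, 2, 1, hence columns of heights 3 and 2.
print(jordan_cell_sizes([2, 4, 5]))
# A single cell of size 3 and three cells of size 1, respectively.
print(jordan_cell_sizes([1, 2, 3]), jordan_cell_sizes([3]))
```

The first call corresponds to a $5\times 5$ matrix with one eigenvalue and Jordan cells of sizes $3$ and $2$; because the rows are nonincreasing in length, the array is a Young diagram and the column heights come out in decreasing order.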

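4.11. An instructive example

To develop some feeling for Theorem 4.15, we shall first investigate the implications of the factorization $A=UJU^{-1}$ on the matrix $A$ when

$U=\begin{bmatrix}u_1&\cdots&u_5\end{bmatrix}$

is any $5\times 5$ invertible matrix with columns $u_1,\ldots,u_5$ and

$J=\left[\begin{array}{ccc|cc}\lambda_1&1&0&0&0\\0&\lambda_1&1&0&0\\0&0&\lambda_1&0&0\\\hline 0&0&0&\lambda_2&1\\0&0&0&0&\lambda_2\end{array}\right]=\operatorname{diag}\{C_{\lambda_1}^{(3)},C_{\lambda_2}^{(2)}\}.$

Then the matrix equation $AU=UJ$ can be replaced by five vector equations, one for each column of $U$:

$Au_1=\lambda_1u_1,$
$Au_2=u_1+\lambda_1u_2,$
$Au_3=u_2+\lambda_1u_3,$
$Au_4=\lambda_2u_4,$
$Au_5=u_4+\lambda_2u_5.$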

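The first three formulas imply in turn that

$u_1\in\mathcal{N}_{(A-\lambda_1I_5)}$; i.e., $u_1$ is an eigenvector corresponding to $\lambda_1$;
$u_2\notin\mathcal{N}_{(A-\lambda_1I_5)}$ but $u_2\in\mathcal{N}_{(A-\lambda_1I_5)^2}$;
$u_3\notin\mathcal{N}_{(A-\lambda_1I_5)^2}$ but $u_3\in\mathcal{N}_{(A-\lambda_1I_5)^3}$;

i.e., $u_1,u_2,u_3$ is a Jordan chain of length 3. Similarly, $u_4,u_5$ is a Jordan chain of length 2. This calculation exhibits $u_1$ and $u_4$ as eigenvectors. In fact,

(1) if $\lambda_1\ne\lambda_2$, then

$\dim\mathcal{N}_{(A-\lambda I_5)}=\begin{cases}1&\text{if}\ \lambda=\lambda_1\\1&\text{if}\ \lambda=\lambda_2\\0&\text{otherwise}\end{cases},\qquad\dim\mathcal{N}_{(A-\lambda I_5)^2}=\begin{cases}2&\text{if}\ \lambda=\lambda_1\\2&\text{if}\ \lambda=\lambda_2\\0&\text{otherwise}\end{cases}$

and

$\dim\mathcal{N}_{(A-\lambda I_5)^k}=\begin{cases}3&\text{if}\ \lambda=\lambda_1\\2&\text{if}\ \lambda=\lambda_2\\0&\text{otherwise}\end{cases}\quad\text{for every integer}\ k\ge 3;$

(2) if $\lambda_1=\lambda_2$, then

$\dim\mathcal{N}_{(A-\lambda I_5)}=\begin{cases}2&\text{if}\ \lambda=\lambda_1\\0&\text{otherwise}\end{cases},\qquad\dim\mathcal{N}_{(A-\lambda I_5)^2}=\begin{cases}4&\text{if}\ \lambda=\lambda_1\\0&\text{otherwise}\end{cases}$

and

$\dim\mathcal{N}_{(A-\lambda I_5)^k}=\begin{cases}5&\text{if}\ \lambda=\lambda_1\\0&\text{otherwise}\end{cases}\quad\text{for every integer}\ k\ge 3.$

The key to these calculations is in the fact that

(4.21)    $\dim\mathcal{N}_{(A-\lambda I_5)^k}=\dim\mathcal{N}_{(J-\lambda I_5)^k}$

for $k=1,2,\ldots$, and the special structure of $J$. Formula (4.21) follows from the identity

(4.22)    $(A-\lambda I_5)^k=U(J-\lambda I_5)^kU^{-1}.$

Because of the block diagonal structure of $J$,

$\operatorname{rank}J=\operatorname{rank}C_{\lambda_1}^{(3)}+\operatorname{rank}C_{\lambda_2}^{(2)}$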

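and

$\operatorname{rank}(J-\lambda I_5)^k=\operatorname{rank}(C_{\lambda_1}^{(3)}-\lambda I_3)^k+\operatorname{rank}(C_{\lambda_2}^{(2)}-\lambda I_2)^k=\operatorname{rank}(C_{\lambda_1-\lambda}^{(3)})^k+\operatorname{rank}(C_{\lambda_2-\lambda}^{(2)})^k.$

Moreover, it is easy to compute the indicated ranks, because a $\nu\times\nu$ Jordan cell $C_\beta^{(\nu)}$ is invertible if $\beta\ne 0$ and

$\operatorname{rank}(C_0^{(\nu)})^k=\nu-k\quad\text{for}\quad k=1,\ldots,\nu.$

To illustrate even more graphically, observe that if $\beta=\lambda_2-\lambda_1\ne 0$, then

$J-\lambda_1I_5=\left[\begin{array}{ccc|cc}0&1&0&0&0\\0&0&1&0&0\\0&0&0&0&0\\\hline 0&0&0&\beta&1\\0&0&0&0&\beta\end{array}\right],\qquad(J-\lambda_1I_5)^2=\left[\begin{array}{ccc|cc}0&0&1&0&0\\0&0&0&0&0\\0&0&0&0&0\\\hline 0&0&0&\beta^2&2\beta\\0&0&0&0&\beta^2\end{array}\right]$

and

$(J-\lambda_1I_5)^3=\left[\begin{array}{ccc|cc}0&0&0&0&0\\0&0&0&0&0\\0&0&0&0&0\\\hline 0&0&0&\beta^3&3\beta^2\\0&0&0&0&\beta^3\end{array}\right].$

Clearly one can construct more elaborate examples of nondiagonalizable matrices $A=UJU^{-1}$ by adding more Jordan cells $C_\lambda^{(\nu)}$ to $J$.

4.12. The binomial formula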

The familiar binomial identity

$(a+b)^m=\sum_{k=0}^m\binom{m}{k}a^kb^{m-k}$

for numbers $a$ and $b$ remains valid for square matrices $A$ and $B$ of the same size if they commute:

(4.23)    $(A+B)^m=\sum_{k=0}^m\binom{m}{k}A^kB^{m-k}\quad\text{if}\quad AB=BA.$

It is easy to see that the condition $AB=BA$ is necessary for the formula in (4.23) to hold by comparing both sides when $m=2$; the sufficiency may be verified by induction.

If $A=\lambda I_n$ and $B\in\mathbb{C}^{n\times n}$, then $AB=BA$ and hence

(4.24)    $(\lambda I_n+B)^m=\sum_{k=0}^m\binom{m}{k}\lambda^kB^{m-k}.$

Exercise 4.25. Find a pair of matrices $A$ and $B$ for which the formula (4.23) fails.

Exercise 4.26. Show that formula (4.23) holds for $A,B\in\mathbb{C}^{n\times n}$ if $AB=BA$. [HINT: Induction.]

4.13. More direct sum decompositions

The next few lemmas focus on the null spaces $\mathcal{N}_{B^j}$ for $j=0,1,\ldots$ when $B\in\mathbb{F}^{n\times n}$. It is perhaps reassuring to note that if $B\in\mathbb{R}^{n\times n}$, then in spite of the ambiguity in the domain,

(4.25)    $\dim\{x\in\mathbb{R}^n : Bx=0\}=\dim\{u\in\mathbb{C}^n : Bu=0\}.$

This stems from the fact that if $\{x_1,\ldots,x_k\}$ is a basis for the nullspace on the left and $Bu=0$ for some vector $u=x+iy\in\mathbb{C}^n$ (with $x,y\in\mathbb{R}^n$), then $Bx=0$ and $By=0$. Therefore,

$x=\sum_{j=1}^ka_jx_j,\quad y=\sum_{j=1}^kb_jx_j\Longrightarrow u=\sum_{j=1}^k(a_j+ib_j)x_j;$

i.e., the $k$ vectors $x_1,\ldots,x_k$ are also a basis for the nullspace on the right.

Lemma 4.16. If $B\in\mathbb{F}^{n\times n}$, then:

(1) The null spaces $\mathcal{N}_{B^j}$ are ordered by inclusion:

$\mathcal{N}_B\subseteq\mathcal{N}_{B^2}\subseteq\mathcal{N}_{B^3}\subseteq\cdots.$

(2) If $\mathcal{N}_{B^j}=\mathcal{N}_{B^{j-1}}$ for some integer $j\ge 2$, then $\mathcal{N}_{B^{j+1}}=\mathcal{N}_{B^j}$.

(3) If $j\ge 1$ is an integer, then $\mathcal{N}_{B^j}\ne\{0\}\Longleftrightarrow\mathcal{N}_{B^{j+1}}\ne\{0\}$.

Proof. If $u\in\mathcal{N}_{B^j}$, then $B^{j+1}u=B(B^ju)=B0=0$, which clearly implies that $u\in\mathcal{N}_{B^{j+1}}$ and hence justifies (1). Suppose next that $u\in\mathcal{N}_{B^{j+1}}$ and $\mathcal{N}_{B^j}=\mathcal{N}_{B^{j-1}}$. Then $Bu\in\mathcal{N}_{B^j}$, since $B^{j+1}u=B^j(Bu)=0$. Therefore, $Bu\in\mathcal{N}_{B^{j-1}}$, which implies in turn that $u\in\mathcal{N}_{B^j}$. Thus,

$\mathcal{N}_{B^j}=\mathcal{N}_{B^{j-1}}\Longrightarrow\mathcal{N}_{B^{j+1}}\subseteq\mathcal{N}_{B^j}$

and hence as $\mathcal{N}_{B^j}\subseteq\mathcal{N}_{B^{j+1}}$ by (1), serves to prove (2). Finally, (3) is left to the reader as an exercise. □

Exercise 4.27. Verify assertion (3) in Lemma 4.16.

Lemma 4.17. Let $B\in\mathbb{F}^{n\times n}$ and let $u\in\mathbb{F}^n$ belong to $\mathcal{N}_{B^k}$ for some positive integer $k$. Then the vectors $u,Bu,\ldots,B^{k-1}u$ are linearly independent over $\mathbb{F}$ $\Longleftrightarrow$ $B^{k-1}u\ne 0$.

Proof. Suppose first that $B^{k-1}u\ne 0$ and that

$\alpha_0u+\alpha_1Bu+\cdots+\alpha_{k-1}B^{k-1}u=0$

for some choice of coefficients $\alpha_0,\ldots,\alpha_{k-1}\in\mathbb{F}$. Then, since $u\in\mathcal{N}_{B^k}$,

$B^{k-1}(\alpha_0u+\alpha_1Bu+\cdots+\alpha_{k-1}B^{k-1}u)=\alpha_0B^{k-1}u=0,$

which clearly implies that $\alpha_0=0$. Similarly, the identity

$B^{k-2}(\alpha_1Bu+\cdots+\alpha_{k-1}B^{k-1}u)=\alpha_1B^{k-1}u=0$

implies that $\alpha_1=0$. Continuing in this vein it is readily seen that $B^{k-1}u\ne 0\Longrightarrow$ that the vectors $u,Bu,\ldots,B^{k-1}u$ are linearly independent over $\mathbb{F}$. Thus, as the converse is self-evident, the proof is complete. □

Lemma 4.18. If $B\in\mathbb{F}^{n\times n}$, then $\mathcal{N}_{B^k}\subseteq\mathcal{N}_{B^n}$ for every positive integer $k$ (i.e., $B^ku=0\Longrightarrow B^nu=0$).

Proof. If $\mathcal{N}_{B^k}=\{0\}$, the assertion is clear. If $B^ku=0$ for some nonzero vector $u\in\mathbb{F}^n$ and some positive integer $k$, let $j$ be the smallest positive integer such that $B^{j-1}u\ne 0$ and $B^ju=0$. Then, in view of Lemma 4.17, the vectors $u,Bu,\ldots,B^{j-1}u$ are linearly independent. Therefore, $j\le n$, i.e., $u\in\mathcal{N}_{B^j}$ with $j\le n$. The inclusion $\mathcal{N}_{B^k}\subseteq\mathcal{N}_{B^n}$ then follows from (1) of Lemma 4.16. □

Lemma 4.19. If $B\in\mathbb{F}^{n\times n}$, then

(4.26)    $\mathbb{F}^n=\mathcal{N}_{B^n}\dot{+}\mathcal{R}_{B^n}.$

Proof. Suppose first that $u\in\mathcal{N}_{B^n}\cap\mathcal{R}_{B^n}$. Then $B^nu=0$ and $u=B^nv$ for some vector $v\in\mathbb{F}^n$. Therefore,

$0=B^nu=B^{2n}v.$

But, by the last lemma, this in fact implies that $u=B^nv=0$. Thus, the sum is direct. It is all of $\mathbb{F}^n$ by the principle of conservation of dimension:

$n=\dim\mathcal{R}_{B^n}+\dim\mathcal{N}_{B^n}.$ □
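The role of the power $n$ in Lemma 4.19 can be illustrated numerically. The sketch below relies on one standard fact that is not stated in the text and is used here as an assumption of the illustration: for a square matrix $M$, $\operatorname{rank}M^2=\operatorname{rank}M-\dim(\mathcal{N}_M\cap\mathcal{R}_M)$, so the intersection is trivial exactly when the two ranks agree. The matrix $B$ is a hypothetical example, and `rank`, `matmul` and `matpow` are helper names that are not from the text.

```python
from fractions import Fraction

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(M, p):
    """M^p for p >= 1 by repeated multiplication."""
    result = M
    for _ in range(p - 1):
        result = matmul(result, M)
    return result

B = [[0, 1, 0], [0, 0, 0], [0, 0, 2]]   # hypothetical: nilpotent part plus invertible part
n = len(B)

# For B itself the null space and range overlap: rank drops from B to B^2.
overlap_for_B = rank(B) - rank(matpow(B, 2))

# For B^n the ranks of B^n and B^(2n) agree, so the intersection is {0}.
Bn, B2n = matpow(B, n), matpow(B, 2 * n)
overlap_for_Bn = rank(Bn) - rank(B2n)

print(overlap_for_B, overlap_for_Bn, rank(Bn) + (n - rank(Bn)) == n)
```

In this example $\mathcal{N}_B+\mathcal{R}_B$ is not a direct sum, whereas $\mathcal{N}_{B^3}\dot{+}\mathcal{R}_{B^3}$ is, which is the content of (4.26).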

Lemma 4.20. If $B\in\mathbb{F}^{n\times n}$, $\lambda\in\mathbb{F}$ and $\lambda\ne 0$, then

$\mathcal{N}_{(B+\lambda I_n)^n}\subseteq\mathcal{R}_{B^n}.$

Proof. Let $u\in\mathcal{N}_{(B+\lambda I_n)^n}$. Then, by formula (4.23),

$0=(B+\lambda I_n)^nu=\sum_{j=0}^n\binom{n}{j}B^j(\lambda I_n)^{n-j}u$
$\phantom{0}=\lambda^nu+\sum_{j=1}^n\binom{n}{j}\lambda^{n-j}B^ju$
$\phantom{0}=\lambda^nu+B\sum_{j=1}^n\binom{n}{j}\lambda^{n-j}B^{j-1}u.$

Therefore,

$u=Bp(B)u,$

for the polynomial

$p(B)=-\lambda^{-n}\sum_{j=1}^n\binom{n}{j}\lambda^{n-j}B^{j-1}=c_0I_n+c_1B+\cdots+c_{n-1}B^{n-1}$

in the matrix $B$. The last identity for $u$ may be iterated to obtain

$u=Bp(B)Bp(B)u=B^2p(B)^2u,$

since

$Bp(B)=p(B)B.$

Iterating $n-2$ more times we see that

$u=B^np(B)^nu,$

which is to say that $u\in\mathcal{R}_{B^n}$, as claimed. □

Remark 4.21. The last lemma may be exploited to give a quick proof of the fact that generalized eigenvectors corresponding to distinct eigenvalues are automatically linearly independent. To verify this, let

$(A-\lambda_jI_n)^nu_j=0,\quad j=1,\ldots,k,$

for some distinct set of eigenvalues $\lambda_1,\ldots,\lambda_k$ and suppose that

$c_1u_1+\cdots+c_ku_k=0.$

Then

$-c_1u_1=c_2u_2+\cdots+c_ku_k$

and, since

$-c_1u_1\in\mathcal{N}_{(A-\lambda_1I_n)^n}$

and, by Lemma 4.20,

$c_2u_2+\cdots+c_ku_k\in\mathcal{R}_{(A-\lambda_1I_n)^n},$

both sides of the last equality must equal zero, thanks to Lemma 4.19. Therefore $c_1=0$ and

$c_2u_2+\cdots+c_ku_k=0.$

To complete the verification, just keep on going.

Exercise 4.28. Complete the proof of the assertion in the preceding remark when $k=3$.

4.14. Verification of Theorem 4.13

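Lemma 4.19 guarantees that

(4.27)    $\mathbb{F}^n=\mathcal{N}_{(A-\lambda I_n)^n}\dot{+}\mathcal{R}_{(A-\lambda I_n)^n}$

for every point $\lambda\in\mathbb{F}$. The next step is to obtain an analogous direct sum decomposition for $\mathcal{R}_{(A-\lambda I_n)^n}$.

Lemma 4.22. Let $A\in\mathbb{F}^{n\times n}$, let $\lambda_1,\lambda_2\in\mathbb{F}$ and suppose that $\lambda_1\ne\lambda_2$. Then

(4.28)    $\mathcal{R}_{(A-\lambda_1I_n)^n}=\mathcal{N}_{(A-\lambda_2I_n)^n}\dot{+}\{\mathcal{R}_{(A-\lambda_1I_n)^n}\cap\mathcal{R}_{(A-\lambda_2I_n)^n}\}.$

Proof. The sum in (4.28) is direct, thanks to Lemma 4.19. Moreover, if $x\in\mathcal{R}_{(A-\lambda_1I_n)^n}$, then we can write

$x=u+v$

where $u\in\mathcal{N}_{(A-\lambda_2I_n)^n}$, $v\in\mathcal{R}_{(A-\lambda_2I_n)^n}$ and, since the sum in (4.26) is direct, the vectors $u$ and $v$ are linearly independent. Lemma 4.20 guarantees that

$u\in\mathcal{R}_{(A-\lambda_1I_n)^n}.$

Therefore, the same holds true for $v$. Thus, in view of Lemma 4.7,

$\mathcal{R}_{(A-\lambda_1I_n)^n}=\mathcal{R}_{(A-\lambda_1I_n)^n}\cap\mathcal{N}_{(A-\lambda_2I_n)^n}\dot{+}\mathcal{R}_{(A-\lambda_1I_n)^n}\cap\mathcal{R}_{(A-\lambda_2I_n)^n},$

which coincides with formula (4.28). □

There is a subtle point in the last proof that should not be overlooked; see Exercise 4.15.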

Lemma 4.23. Let A 6 ((3a and suppose that A has emactly k: distinct eigenvalues, A1, . . . , A], E (3. Then (4.29)

R(A_)\11n)n fl R(A—A21n)n fl ' ' ' fl R(A—)\k1n)" = {0} .

Proof. Let M denote the intersection of the k: sets on the left-hand side of the asserted identity (4.29). Then it is readily checked that M is invariant under A; i.e, if u E M, then Au 6 M, because each of the sets R(A_)\j1n)n

is invariant under A: if u E R(A_)\jIn)n, then u = (A — AjIn)"vj and hence

Au = (A — Ajln)"Avj, forj = 1, . . .,k. Consequently, if M aé {0}, then,

88

4. Eigenvalues and eigenvectors

by Theorem 4.4, there exists a complex number A and a nonzero vector v E M such that Av — Av = 0. But this means that A is equal to one of the

eigenvalues, say At. Hence v E MA—Atln)- But this in turn implies that V E 'N-(A-AtIn)” fl R(A-)\t1n)” = {0} .

□

We are now ready to prove Theorem 4.13, which gives the theoretical justification for the decomposition of $J$ into blocks $B_{\lambda_j}$, one for each distinct eigenvalue. It states that every vector $v \in \mathbb{C}^n$ can be expressed as a linear combination of generalized eigenvectors of $A$. Moreover, generalized eigenvectors corresponding to distinct eigenvalues are linearly independent. The theorem is established by iterating Lemma 4.22.

Proof of Theorem 4.13. Let us suppose that $k \ge 3$. Then, by Lemmas 4.19 and 4.22,

$\mathbb{C}^n = \mathcal{N}_{(A-\lambda_1 I_n)^n} \dotplus \mathcal{R}_{(A-\lambda_1 I_n)^n}$

and

$\mathcal{R}_{(A-\lambda_1 I_n)^n} = \mathcal{N}_{(A-\lambda_2 I_n)^n} \dotplus \mathcal{R}_{(A-\lambda_1 I_n)^n} \cap \mathcal{R}_{(A-\lambda_2 I_n)^n}$.

Therefore,

$\mathbb{C}^n = \mathcal{N}_{(A-\lambda_1 I_n)^n} \dotplus \mathcal{N}_{(A-\lambda_2 I_n)^n} \dotplus \mathcal{R}_{(A-\lambda_1 I_n)^n} \cap \mathcal{R}_{(A-\lambda_2 I_n)^n}$.

Moreover, since

$\mathcal{N}_{(A-\lambda_3 I_n)^n} \subseteq \mathcal{R}_{(A-\lambda_1 I_n)^n} \cap \mathcal{R}_{(A-\lambda_2 I_n)^n}$,

by Lemma 4.20, the supplementary formula

$\mathcal{R}_{(A-\lambda_1 I_n)^n} \cap \mathcal{R}_{(A-\lambda_2 I_n)^n} = \mathcal{N}_{(A-\lambda_3 I_n)^n} \dotplus \mathcal{R}_{(A-\lambda_1 I_n)^n} \cap \mathcal{R}_{(A-\lambda_2 I_n)^n} \cap \mathcal{R}_{(A-\lambda_3 I_n)^n}$

may be verified just as in the proof of Lemma 4.22 and then substituted into the last formula for $\mathbb{C}^n$. To complete the proof, just keep on going until you run out of eigenvalues and then invoke Lemma 4.23. □

The point of Theorem 4.15 is that for every matrix $A \in \mathbb{C}^{n\times n}$ it is possible to find a set of $n$ linearly independent generalized eigenvectors $u_1, \ldots, u_n$ such that

$A[u_1 \cdots u_n] = [u_1 \cdots u_n]J$.

The vectors have to be chosen properly. Details will be furnished in Chapter 6.
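The decomposition of $\mathbb{C}^n$ into the spaces $\mathcal{N}_{(A-\lambda_j I_n)^n}$ can be checked numerically on a small case. The sketch below is only an illustration: the test matrix, the helper names (`mat_mul`, `mat_pow`, `rank`, `generalized_null_dim`) and the use of exact rational arithmetic are assumptions made for this example and do not come from the text.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(X, m):
    """m-th power of a square matrix (m >= 0), computed exactly."""
    n = len(X)
    R = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(m):
        R = mat_mul(R, X)
    return R

def rank(X):
    """Rank via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in X]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def generalized_null_dim(A, lam):
    """Dimension of the null space of (A - lam*I)^n."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(mat_pow(shifted, n))

# Hypothetical test matrix with the two distinct eigenvalues 2 and 3.
A = [[2, 1, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 3, 1],
     [0, 0, 0, 3]]
dims = [generalized_null_dim(A, lam) for lam in (2, 3)]
print(dims, sum(dims) == len(A))
```

For this matrix the two dimensions add up to $n = 4$, in line with the direct sum decomposition obtained in the proof.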

Exercise 4.29. Find a basis for each of the spaces NBj, NRj and N33" flNBj forj = 0,1,..., when B = 053).

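Exercise 4.30. Show that if $A \in \mathbb{C}^{n\times n}$ has exactly two distinct eigenvalues in $\mathbb{C}$, then

$\mathcal{R}_{(A-\lambda_1 I_n)^n} \cap \mathcal{R}_{(A-\lambda_2 I_n)^n} = \{0\}$.

Exercise 4.31. Show that if $A \in \mathbb{C}^{n\times n}$ has exactly $k$ distinct eigenvalues $\lambda_1, \ldots, \lambda_k$ in $\mathbb{C}$ with algebraic multiplicities $\alpha_1, \ldots, \alpha_k$, then

$\mathcal{N}_{(A-\lambda_1 I_n)^{\alpha_1}} \dotplus \cdots \dotplus \mathcal{N}_{(A-\lambda_k I_n)^{\alpha_k}} = \mathbb{C}^n$.

Is it possible to reduce the powers further? Explain your answer.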

Exercise 4.32. Verify formula (4.17). [HINT: In case of difficulty, start modestly by showing that if $B = \mathrm{diag}\{B_1, B_2, B_3\}$, then

$\dim \mathcal{N}_B = \dim \mathcal{N}_{B_1} + \dim \mathcal{N}_{B_2} + \dim \mathcal{N}_{B_3}$.]

Exercise 4.33. Let $u \in \mathbb{C}^n$, $v \in \mathbb{C}^n$ and $B \in \mathbb{C}^{n\times n}$ be such that $B^4u = 0$, $B^4v = 0$ and the pair of vectors $B^3u$ and $B^3v$ are linearly independent in $\mathbb{C}^n$. Show that the eight vectors $u$, $Bu$, $B^2u$, $B^3u$, $v$, $Bv$, $B^2v$ and $B^3v$ are linearly independent in $\mathbb{C}^n$.

Exercise 4.34. Let $u \in \mathbb{C}^n$, $v \in \mathbb{C}^n$ and $B \in \mathbb{C}^{n\times n}$ be such that $B^4u = 0$, $B^3v = 0$ and the pair of vectors $B^3u$ and $B^2v$ are linearly independent in $\mathbb{C}^n$. Show that the seven vectors $u$, $Bu$, $B^2u$, $B^3u$, $v$, $Bv$ and $B^2v$ are linearly independent in $\mathbb{C}^n$.

100 Exercise 4.35. Calculate [ 3

(ll ]

. [HINT: To see the pattern, write the given matrix as $\alpha I_2 + F$ and note that since $F^2 = 0$, $(\alpha I_2+F)^2$, $(\alpha I_2+F)^3$, ..., have a simple form.]

4.15. Bibliographical notes

Earlier versions of this chapter defined the eigenvalues of a matrix $A \in \mathbb{C}^{n\times n}$ in terms of the roots of the polynomial $\det(\lambda I_n - A)$. The present version, which was influenced by a conversation with Sheldon Axler at the Holomorphic Functions Session at MSRI, Berkeley, in 1995 and his paper [5] in the American Math. Monthly, avoids the use of determinants. They appear for the first time in the next chapter, and although they are extremely useful for calculating eigenvalues, they are not needed to establish the Jordan decomposition theorem.

To counterbalance the title of [5] (which presumably was chosen for dramatic effect) it is perhaps appropriate to add the following words of the

distinguished mathematical physicist L. D. Faddeev [39]: If I had to choose a single term to characterize the technical tools used in my research, it would be determinants.

Chapter 5

Determinants

Look at him, he doesn’t drink, he doesn’t smoke, he doesn’t chew, he doesn’t stay out late, and he still can’t hit.

Casey Stengel

In this chapter we shall develop the theory of determinants. If you have met determinants before, then you know that the determinant $\det A$ of a matrix $A \in \mathbb{C}^{n\times n}$ is a number in $\mathbb{C}$ such that $\det I_n = 1$, $\det PA = -\det A$ if $P$ is a simple permutation and also (although this is perhaps less familiar) that $\det A$ is a linear function of each of its rows separately. In this chapter we shall show that these three properties serve to completely specify determinants, i.e., a determinant can be characterized as the one and only one multilinear functional $d(A)$ from $\mathbb{C}^{n\times n}$ to $\mathbb{C}$ that meets the two additional constraints $d(I_n) = 1$ and $d(PA) = -d(A)$ for every simple $n \times n$ permutation $P$. Later on, in Chapter 9, we shall also give a geometric interpretation of the determinant of a matrix $A \in \mathbb{R}^{n\times n}$ in terms of the volume of the parallelopiped generated by its column vectors.

5.1. Functionals

A function $f$ from a vector space $\mathcal{V}$ over $\mathbb{F}$ into $\mathbb{F}$ is called a functional. A functional $f$ on a vector space $\mathcal{V}$ over $\mathbb{F}$ is said to be a linear functional if it is a linear mapping from $\mathcal{V}$ into $\mathbb{F}$, i.e., if

$f(\alpha u + \beta v) = \alpha f(u) + \beta f(v)$

for every choice of $u, v \in \mathcal{V}$ and $\alpha, \beta \in \mathbb{F}$. A linear functional on an $n$-dimensional vector space is completely determined by its action on the elements of a basis for the space: if $\{v_1, \ldots, v_n\}$ is a basis for a vector space $\mathcal{V}$ over $\mathbb{F}$, and if $v = \alpha_1 v_1 + \cdots + \alpha_n v_n$, for some set of coefficients $\{\alpha_1, \ldots, \alpha_n\} \in \mathbb{F}$, then

(5.1) $f(v) = f\left(\sum_{j=1}^n \alpha_j v_j\right) = \sum_{j=1}^n \alpha_j f(v_j)$

is prescribed by the $n$ numbers $f(v_1), \ldots, f(v_n)$.

A functional $f(v_1, \ldots, v_k)$ on an ordered set of vectors $\{v_1, \ldots, v_k\}$ belonging to a vector space $\mathcal{V}$ is said to be a multilinear functional if it is linear in each entry separately; i.e., if for every integer $i$, $1 \le i \le k$,

$f(v_1, \ldots, v_i + w, \ldots, v_k) = f(v_1, \ldots, v_i, \ldots, v_k) + f(v_1, \ldots, w, \ldots, v_k)$

and

$f(v_1, \ldots, \alpha v_i, \ldots, v_k) = \alpha f(v_1, \ldots, v_i, \ldots, v_k)$

for every $\alpha \in \mathbb{F}$. Notice that if, say, $k = 3$, then this implies that

$f(v_1 + w_1, v_2 + w_2, v_3) = f(v_1, v_2 + w_2, v_3) + f(w_1, v_2 + w_2, v_3)$
$= f(v_1, v_2, v_3) + f(v_1, w_2, v_3) + f(w_1, v_2, v_3) + f(w_1, w_2, v_3)$

and

$f(\alpha v_1, \alpha v_2, v_3) = \alpha^2 f(v_1, v_2, v_3)$.

5.2. Determinants

Let $\Sigma_n$ denote the set of all the $n!$ one to one mappings $\sigma$ of the set of integers $\{1, \ldots, n\}$ onto itself and let $e_i$ denote the $i$'th column of the identity matrix $I_n$. Then the formula

$P_\sigma = \sum_{i=1}^n e_i e_{\sigma(i)}^T = \begin{bmatrix} e_{\sigma(1)}^T \\ \vdots \\ e_{\sigma(n)}^T \end{bmatrix}$

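$\vec{a}_2 = \sum_{j=1}^3 a_{2j} e_j^T$ and $\vec{a}_3 = \sum_{k=1}^3 a_{3k} e_k^T$,

another two applications of rule 3° lead to the formula

$d(A) = \sum_{i=1}^3 a_{1i} \left\{ \sum_{j=1}^3 a_{2j}\, d\left( \begin{bmatrix} e_i^T \\ e_j^T \\ \vec{a}_3 \end{bmatrix} \right) \right\} = \sum_{i=1}^3 a_{1i} \left\{ \sum_{j=1}^3 a_{2j} \sum_{k=1}^3 a_{3k}\, d\left( \begin{bmatrix} e_i^T \\ e_j^T \\ e_k^T \end{bmatrix} \right) \right\}$,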

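which is an explicit formula for $d(A)$ in terms of the entries $a_{st}$ in the matrix $A$ and the 27 numbers $d\left( \begin{bmatrix} e_i^T \\ e_j^T \\ e_k^T \end{bmatrix} \right)$, which in fact are equal to 0 if one or more of the rows coincide, thanks to the next lemma. Granting this fact for the moment, the last expression simplifies to a sum of $6 = 3!$ terms

$d(A) = \sum_{\sigma \in \Sigma_3} a_{1\sigma(1)} a_{2\sigma(2)} a_{3\sigma(3)}\, d\left( \begin{bmatrix} e_{\sigma(1)}^T \\ e_{\sigma(2)}^T \\ e_{\sigma(3)}^T \end{bmatrix} \right)$,

where, as noted earlier, $\Sigma_n$ denotes the set of all the $n!$ one to one mappings of the set $\{1, \ldots, n\}$ onto itself. It is pretty clear that analogous formulas hold for $A \in \mathbb{C}^{n\times n}$ for every positive integer $n$:

(5.2) $d(A) = \sum_{\sigma \in \Sigma_n} a_{1\sigma(1)} \cdots a_{n\sigma(n)}\, d\left( \begin{bmatrix} e_{\sigma(1)}^T \\ \vdots \\ e_{\sigma(n)}^T \end{bmatrix} \right) = \sum_{\sigma \in \Sigma_n} a_{1\sigma(1)} \cdots a_{n\sigma(n)}\, d(P_\sigma)$.

Moreover, if $P_\sigma$ is equal to the product of $k$ simple permutations, then $d(P_\sigma) = (-1)^k d(I_n) = (-1)^k$.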

The unique number d(A) that is determined by the three conditions in

Theorem 5.1 is called the determinant of A and will be denoted det(A) or det A from now on.
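Formula (5.2) can be turned directly into a (very inefficient) algorithm. The following is a minimal sketch, assuming that a matrix is stored as a list of rows; the function names are hypothetical. The sign $d(P_\sigma) = (-1)^k$ is computed from the number of inversions of $\sigma$, which has the same parity as the number $k$ of simple permutations in any factorization of $P_\sigma$.

```python
from itertools import permutations

def permutation_sign(sigma):
    """Return (-1)^k, computed from the parity of the inversion count."""
    inversions = sum(1 for i in range(len(sigma))
                     for j in range(i + 1, len(sigma))
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    """Evaluate the sum over all n! permutations in formula (5.2)."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = permutation_sign(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]   # the factor a_{i, sigma(i)}
        total += term
    return total

print(det_by_permutations([[1, 2], [2, 1]]))
print(det_by_permutations([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))
```

Since the sum runs over $n!$ terms, this is useful only for small $n$; the rules developed in the next section lead to far cheaper methods.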

Exercise 5.2. Use the three rules in Theorem 5.1 to show that if $A \in \mathbb{C}^{2\times 2}$, then $\det A = a_{11}a_{22} - a_{12}a_{21}$.

Exercise 5.3. Use the three rules in Theorem 5.1 to show that if $A \in \mathbb{C}^{3\times 3}$, then

$\det A = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31} - a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}$.

5.3. Useful rules for calculating determinants

Lemma 5.2. The determinant of a matrix $A \in \mathbb{C}^{n\times n}$ satisfies the following rules.

4° If two rows of $A$ are identical, then $\det A = 0$.

5° If $B$ is the matrix that is obtained by adding a multiple of one row of $A$ to another row of $A$, then $\det B = \det A$.

6° If $A$ has a row in which all the entries are equal to zero, then $\det A = 0$.

7° If two nonzero rows of $A$ are linearly dependent, then $\det A = 0$.

Discussion. Rules 4° to 7° are fairly easy consequences of 1° to 3°, especially if you tackle them in the order that they are listed. Thus, for example, if two rows of $A$ match and if $P$ denotes the permutation that interchanges these two rows, then $A = PA$ and hence

$\det A = \det(PA) = -\det A$,

which clearly justifies 4°.

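Rule 5° is most easily understood by example: If, say,

$A = \begin{bmatrix} \vec{a}_1 \\ \vec{a}_2 \\ \vec{a}_3 \\ \vec{a}_4 \end{bmatrix}$ and $B = A + \alpha e_2 \vec{a}_4 = \begin{bmatrix} \vec{a}_1 \\ \vec{a}_2 + \alpha \vec{a}_4 \\ \vec{a}_3 \\ \vec{a}_4 \end{bmatrix}$,

then

$\det B = \det \begin{bmatrix} \vec{a}_1 \\ \vec{a}_2 \\ \vec{a}_3 \\ \vec{a}_4 \end{bmatrix} + \alpha \det \begin{bmatrix} \vec{a}_1 \\ \vec{a}_4 \\ \vec{a}_3 \\ \vec{a}_4 \end{bmatrix} = \det A + 0$.

Rule 6° is left to the reader. Rule 7° follows from 6° and the observation that if, say, $n = 4$, $\alpha \vec{a}_3 + \beta \vec{a}_4 = 0$ and $\alpha \ne 0$, then

$\alpha \det A = \det \begin{bmatrix} \vec{a}_1 \\ \vec{a}_2 \\ \alpha \vec{a}_3 \\ \vec{a}_4 \end{bmatrix} = \det \begin{bmatrix} \vec{a}_1 \\ \vec{a}_2 \\ \alpha \vec{a}_3 + \beta \vec{a}_4 \\ \vec{a}_4 \end{bmatrix} = 0$.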

A number of supplementary rules that are useful to calculate determinants will now be itemized in numbers running from 8° to 13°, interspersed with discussion.

8° If $A \in \mathbb{C}^{n\times n}$ is either upper triangular or lower triangular, then $\det A = a_{11} \cdots a_{nn}$.

Discussion. To clarify 8°, suppose for example that

$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$.

Then by successive applications of rules 3° and 5°, we obtain

$\det A = a_{33} \det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix} = a_{33} \det \begin{bmatrix} a_{11} & a_{12} & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & 1 \end{bmatrix}$

$= a_{33}a_{22} \det \begin{bmatrix} a_{11} & a_{12} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = a_{33}a_{22} \det \begin{bmatrix} a_{11} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = a_{33}a_{22}a_{11} \det I_3$.

Thus, in view of rule 1°, $\det A = a_{11}a_{22}a_{33}$, as claimed. Much the same sort of argument works for lower triangular matrices, except then it is more convenient to work from the top row down rather than from the bottom row up.

Lemma 5.3. If $E \in \mathbb{C}^{n\times n}$ is a lower triangular matrix with ones on the diagonal, then

(5.3) $\det(EA) = \det A$

for every $A \in \mathbb{C}^{n\times n}$.

Discussion. Rule 5° implies that $\det(EA) = \det A$ if $E$ is a lower triangular matrix with ones on the diagonal and exactly one nonzero entry below the diagonal. But this is enough, since a general lower triangular matrix $E$ with ones on the diagonal can be expressed as the product $E = E_1 \cdots E_k$ of $k$ matrices with ones on the diagonal and exactly one nonzero entry below the diagonal, as in the next exercise. □

Exercise 5.4. Let

$E = \begin{bmatrix} 1 & 0 & 0 & 0 \\ \alpha_{21} & 1 & 0 & 0 \\ \alpha_{31} & \alpha_{32} & 1 & 0 \\ \alpha_{41} & \alpha_{42} & \alpha_{43} & 1 \end{bmatrix}$

and let $e_i$, $i = 1, \ldots, 4$, denote the standard basis for $\mathbb{C}^4$. Show that

$E = (I_4 + \alpha_{21} e_2 e_1^T)(I_4 + \alpha_{31} e_3 e_1^T)(I_4 + \alpha_{41} e_4 e_1^T) \times (I_4 + \alpha_{32} e_3 e_2^T)(I_4 + \alpha_{42} e_4 e_2^T)(I_4 + \alpha_{43} e_4 e_3^T)$.
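Before attempting a proof, the factorization in Exercise 5.4 can be sanity-checked numerically. The sketch below does this for one arbitrary choice of the entries $\alpha_{ij}$; the numerical values and the helper names are assumptions made only for this illustration, and a check of one instance is of course not a proof.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def elementary(n, i, j, alpha):
    """Return I_n + alpha * e_i e_j^T, with 1-based indices i and j."""
    M = identity(n)
    M[i - 1][j - 1] += alpha
    return M

def mat_mul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

# Arbitrary test values for alpha_{ij}, listed in the order of the factors.
alphas = {(2, 1): 2, (3, 1): 3, (4, 1): 5, (3, 2): 7, (4, 2): 11, (4, 3): 13}

product = identity(4)
for (i, j), alpha in alphas.items():
    product = mat_mul(product, elementary(4, i, j, alpha))

E = [[1, 0, 0, 0],
     [2, 1, 0, 0],
     [3, 7, 1, 0],
     [5, 11, 13, 1]]
print(product == E)
```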

9° If $A \in \mathbb{C}^{n\times n}$, then $A$ is invertible if and only if $\det(A) \ne 0$.

Proof. In the usual notation, let

(5.4) $U = EPA$

be in upper echelon form. Then $U$ is automatically upper triangular (since it is square in this application) and, by the preceding rules,

$\det(EPA) = \det(PA) = \pm \det A$.

Therefore,

$|\det A| = |\det U| = |u_{11} \cdots u_{nn}|$.

But this serves to establish the assertion, since

$A$ is invertible $\iff$ $U$ is invertible

and

$U$ is invertible $\iff$ $u_{11} \cdots u_{nn} \ne 0$.

□
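The proof of 9° also suggests a practical way to compute determinants: reduce $A$ to upper triangular form, keep track of the sign changes caused by row interchanges (rule 2°), and multiply the diagonal entries (rule 8°); adding a multiple of one row to another does not change the determinant (rule 5°). The following is a minimal sketch under these assumptions, using exact rational arithmetic; the function name is hypothetical.

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via reduction to upper triangular form."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for c in range(n):
        # Find a row at or below the diagonal with a nonzero entry in column c.
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)        # no pivot in this column: det A = 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign              # each row interchange flips the sign
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]             # product of the diagonal of U
    return result

print(det_by_elimination([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))
print(det_by_elimination([[0, 1], [1, 0]]))
print(det_by_elimination([[1, 2], [2, 4]]))
```

In agreement with 9°, the routine returns zero exactly when elimination fails to produce $n$ nonzero pivots.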

10° If $A, B \in \mathbb{C}^{n\times n}$, then $\det(AB) = \det A \det B = \det(BA)$.

Proof. If $\det B = 0$, then the asserted identities are immediate from rule 9°, since $B$, $AB$ and $BA$ are then all noninvertible matrices.

If $\det B \ne 0$, set

$\varphi(A) = \dfrac{\det(AB)}{\det B}$

and check that $\varphi(A)$ meets rules 1° to 3°. Then $\varphi(A) = \det A$, since there is only one functional that meets these three conditions, i.e.,

$\det(AB) = \det A \det B$,

as claimed. Now, having this last formula for every choice of $A$ and $B$, invertible or not, we can interchange the roles of $A$ and $B$ to obtain

$\det(BA) = \det B \det A = \det A \det B = \det(AB)$.

□

Exercise 5.5. Show that if $\det B \ne 0$, then the functional $\varphi(A) = \dfrac{\det(AB)}{\det B}$ meets conditions 1° to 3°. [HINT: To verify 3°, observe that if $\vec{a}_1, \ldots, \vec{a}_n$ designate the rows of $A$, then the rows of $AB$ are $\vec{a}_1 B, \ldots, \vec{a}_n B$.]

11° If $A \in \mathbb{C}^{n\times n}$ and $A$ is invertible, then $\det A^{-1} = (\det A)^{-1}$.

Proof. Invoke rule 10° and the formula $\det(AA^{-1}) = \det I_n = 1$. □

12° If $A \in \mathbb{C}^{n\times n}$, then $\det A = \det A^T$.

Proof. Since an $n \times n$ upper echelon matrix $U$ is upper triangular, $U^T$ is lower triangular and rules 10° and 8° applied to the formulas $EPA = U$ and $A^T P^T E^T = U^T$ imply that

$\det P \det A = \det U = u_{11} \cdots u_{nn} = \det U^T = \det A^T \det P^T$.

Therefore, since $\det P = \pm 1$ and $PP^T = P^TP = I_n$,

$\det A = \det P \det P \det A = \det P \det P^T \det A^T = \det A^T$,

as claimed. □

13° If $A \in \mathbb{C}^{n\times n}$, then rules 3° to 7° remain valid if the word rows is replaced by the word columns and the row interchange in rule 2° is replaced by a column interchange.

Proof. This is an easy consequence of 12°. The details are left to the reader. □

Exercise 5.6. Complete the proof of 13°.

Exercise 5.7. Calculate the determinants of the following matrices by Gaussian elimination:

1 0 0 1
3 4 0 1
2 1 2 0
1 6 1 4

1 0 1 0
0 1 0 1
1 0 0 1
0 1 1 0

1 0 0 0
3 2 0 0
2 1 3 1
4 6 0 2

0 1 0 0
0 2 0 1
0 3 1 2
4 1 1 6

[HINT: If, in the usual notation, $EPA = U$, then $|\det A| = |\det U|$.]

Exercise 5.8. Calculate the determinants of the matrices in the previous exercise by rules 1° to 13°.

5.4. Eigenvalues

Determinants play a useful role in calculating the eigenvalues of a matrix $A \in \mathbb{F}^{n\times n}$. In particular, if $A = UJU^{-1}$, where $J$ is in Jordan form, then

$\det(\lambda I_n - A) = \det(\lambda I_n - UJU^{-1}) = \det\{U(\lambda I_n - J)U^{-1}\}$.

Therefore, by rules 10°, 11° and 8°, applied in that order,

$\det(\lambda I_n - A) = \det(\lambda I_n - J) = (\lambda - j_{11})(\lambda - j_{22}) \cdots (\lambda - j_{nn})$,

where $j_{ii}$, $i = 1, \ldots, n$, are the diagonal entries of $J$. The polynomial

(5.5) $p(\lambda) = \det(\lambda I_n - A)$

is termed the characteristic polynomial of $A$. In particular, a number $\lambda$ is an eigenvalue of the matrix $A$ if and only if $p(\lambda) = 0$. Thus, for example, to find the eigenvalues of the matrix

$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$

look for the roots of the polynomial

$\det(\lambda I_2 - A) = (\lambda - 1)^2 - 2^2 = \lambda^2 - 2\lambda - 3$.

This leads readily to the conclusion that the eigenvalues of the given matrix $A$ are $\lambda_1 = 3$ and $\lambda_2 = -1$. Moreover, if $J = \mathrm{diag}\{3, -1\}$, then

$A^2 - 2A - 3I_2 = (A - 3I_2)(A + I_2) = U(J - 3I_2)U^{-1}U(J + I_2)U^{-1}$

$= U \begin{bmatrix} 0 & 0 \\ 0 & -4 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix} U^{-1} = U \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} U^{-1}$,

which yields the far from obvious conclusion

$A^2 - 2A - 3I_2 = 0$.

The argument propagates: If $\lambda_1, \ldots, \lambda_k$ denote the distinct eigenvalues of $A$, and if $\alpha_i$ denotes the algebraic multiplicity of the eigenvalue $\lambda_i$, $i = 1, \ldots, k$, then the characteristic polynomial can be written in the more revealing form

(5.6) $p(\lambda) = (\lambda - \lambda_1)^{\alpha_1}(\lambda - \lambda_2)^{\alpha_2} \cdots (\lambda - \lambda_k)^{\alpha_k}$.

Thus,

$p(A) = (A - \lambda_1 I_n)^{\alpha_1}(A - \lambda_2 I_n)^{\alpha_2} \cdots (A - \lambda_k I_n)^{\alpha_k}$
$= U(J - \lambda_1 I_n)^{\alpha_1}(J - \lambda_2 I_n)^{\alpha_2} \cdots (J - \lambda_k I_n)^{\alpha_k}U^{-1}$
$= 0$.

This serves to justify the Cayley-Hamilton theorem that was referred to in the discussion of Theorem 4.15. In more striking terms, it states that

(5.7) $\det(\lambda I_n - A) = a_0 + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n \implies a_0 I_n + \cdots + a_{n-1}A^{n-1} + A^n = 0$.
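For $2 \times 2$ matrices the implication (5.7) is easy to test by machine, since in that case $\det(\lambda I_2 - A) = \lambda^2 - (\mathrm{trace}\,A)\lambda + \det A$. The sketch below evaluates $A^2 - (\mathrm{trace}\,A)A + (\det A)I_2$ for the matrix of the worked example and for one further matrix; the function names and the second test matrix are assumptions made for illustration.

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def cayley_hamilton_residual_2x2(A):
    """Return A^2 - (trace A) A + (det A) I for a 2 x 2 matrix A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A2 = mat_mul(A, A)
    return [[A2[i][j] - tr * A[i][j] + det * (i == j) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [2, 1]]   # here the polynomial is lambda^2 - 2*lambda - 3
residual = cayley_hamilton_residual_2x2(A)
print(residual)
```

The residual is the zero matrix, which is the conclusion $A^2 - 2A - 3I_2 = 0$ reached in the example.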

Exercise 5.9. Show that if $J = \mathrm{diag}\{C_{\lambda_1}^{(5)}, C_{\lambda_2}^{(3)}, C_{\lambda_3}^{(2)}\}$, then $(J - \lambda_1 I_{10})^5(J - \lambda_2 I_{10})^3(J - \lambda_3 I_{10})^2 = 0$.

Exercise 5.10. Show that if $\lambda_1 = \lambda_2$ in Exercise 5.9, then $(J - \lambda_1 I_{10})^5(J - \lambda_3 I_{10})^2 = 0$.

Exercise 5.10 illustrates the fact that if $\nu_j$, $j = 1, \ldots, k$, denotes the size of the largest Jordan cell in the matrix $J$ with $\lambda_j$ on its diagonal, then $p(A) = 0$ holds for the possibly lower degree polynomial

$p_{\min}(\lambda) = (\lambda - \lambda_1)^{\nu_1}(\lambda - \lambda_2)^{\nu_2} \cdots (\lambda - \lambda_k)^{\nu_k}$,

which is the minimal polynomial referred to in the discussion of Theorem 4.15:

$p_{\min}(A) = (A - \lambda_1 I_n)^{\nu_1}(A - \lambda_2 I_n)^{\nu_2} \cdots (A - \lambda_k I_n)^{\nu_k}$
$= U(J - \lambda_1 I_n)^{\nu_1}(J - \lambda_2 I_n)^{\nu_2} \cdots (J - \lambda_k I_n)^{\nu_k}U^{-1}$
$= 0$.

The Jordan decomposition $A = UJU^{-1}$ leads easily to a number of useful formulas for determinants and traces, where the trace of an $n \times n$ matrix $A$ is defined as the sum of its diagonal elements:

(5.8) $\mathrm{trace}\,A = a_{11} + a_{22} + \cdots + a_{nn}$.

Theorem 5.4. If $A \in \mathbb{C}^{n\times n}$ has $k$ distinct eigenvalues $\lambda_1, \ldots, \lambda_k$ with algebraic multiplicities $\alpha_1, \ldots, \alpha_k$, then:

(5.9) $\det A = \lambda_1^{\alpha_1}\lambda_2^{\alpha_2} \cdots \lambda_k^{\alpha_k}$

and

(5.10) $\mathrm{trace}\,A = \alpha_1\lambda_1 + \alpha_2\lambda_2 + \cdots + \alpha_k\lambda_k$.

Moreover, if $f(\lambda)$ is a polynomial, then

(5.11) $\det(\lambda I_n - f(A)) = (\lambda - f(\lambda_1))^{\alpha_1} \cdots (\lambda - f(\lambda_k))^{\alpha_k}$,

(5.12) $\det f(A) = f(\lambda_1)^{\alpha_1} f(\lambda_2)^{\alpha_2} \cdots f(\lambda_k)^{\alpha_k}$

and

(5.13) $\mathrm{trace}\,f(A) = \alpha_1 f(\lambda_1) + \alpha_2 f(\lambda_2) + \cdots + \alpha_k f(\lambda_k)$.

Proof. The verification of formulas (5.10) and (5.13) depends upon the fact that

(5.14) $\mathrm{trace}(AB) = \sum_{i=1}^n \sum_{j=1}^n a_{ij}b_{ji} = \mathrm{trace}(BA)$.

Thus, in particular,

(5.15) $\mathrm{trace}\,A = \mathrm{trace}(UJU^{-1}) = \mathrm{trace}(JU^{-1}U) = \mathrm{trace}\,J$,

which leads easily to (5.10); the verification of (5.13) is similar, but is based on the formula $f(A) = Uf(J)U^{-1}$. The rest is left to the reader. □

Corollary 5.5. (Spectral mapping principle) If $A \in \mathbb{C}^{n\times n}$ and $f(\lambda)$ is a polynomial, then

(5.16) $\lambda \in \sigma(A) \implies f(\lambda) \in \sigma(f(A))$.

Exercise 5.11. Verify Corollary 5.5, but show by example that the multiplicities may change. [HINT: The key is (5.11).]
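Formulas (5.12) and (5.13) can be checked by hand or by machine on the $2 \times 2$ matrix $A$ of Section 5.4, whose eigenvalues $3$ and $-1$ were computed there. The sketch below uses the polynomial $f(\lambda) = \lambda^2 + 1$, which is an arbitrary choice made for this illustration; the variable names are likewise hypothetical.

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def f(x):
    return x * x + 1          # sample polynomial f(lambda) = lambda^2 + 1

A = [[1, 2], [2, 1]]
eigenvalues = [3, -1]         # taken from the example in Section 5.4

# f(A) = A^2 + I_2
A2 = mat_mul(A, A)
fA = [[A2[i][j] + (i == j) for j in range(2)] for i in range(2)]

trace_fA = fA[0][0] + fA[1][1]
det_fA = fA[0][0] * fA[1][1] - fA[0][1] * fA[1][0]

print(trace_fA, sum(f(lam) for lam in eigenvalues))      # both sides of (5.13)
print(det_fA, f(eigenvalues[0]) * f(eigenvalues[1]))     # both sides of (5.12)
```

Here both algebraic multiplicities are equal to one, so the exponents and coefficients $\alpha_j$ in (5.12) and (5.13) do not show up explicitly.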

5.5. Exploiting block structure

The calculation of determinants is often simplified by taking advantage of block structure.

Lemma 5.6. If $A \in \mathbb{C}^{n\times n}$ is a block triangular matrix, i.e., if either

$A = \begin{bmatrix} A_{11} & O \\ A_{21} & A_{22} \end{bmatrix}$ or $A = \begin{bmatrix} A_{11} & A_{12} \\ O & A_{22} \end{bmatrix}$

with square blocks $A_{11} \in \mathbb{C}^{p\times p}$ and $A_{22} \in \mathbb{C}^{q\times q}$, then

$\det A = \det A_{11} \det A_{22}$.

Proof. In view of Theorem 4.15, $A_{11} = V_1J_1V_1^{-1}$ and $A_{22} = V_2J_2V_2^{-1}$, where $J_1$ and $J_2$ are in Jordan form. Thus, if $A$ is block upper triangular, then

$A = \begin{bmatrix} A_{11} & A_{12} \\ O & A_{22} \end{bmatrix} = \begin{bmatrix} V_1 & O \\ O & V_2 \end{bmatrix} \begin{bmatrix} J_1 & V_1^{-1}A_{12}V_2 \\ O & J_2 \end{bmatrix} \begin{bmatrix} V_1 & O \\ O & V_2 \end{bmatrix}^{-1}$.

Therefore, since $J_1$ and $J_2$ are upper triangular, the middle matrix on the right is also upper triangular and

$\det A = \det J_1 \det J_2 = \det A_{11} \det A_{22}$,

as claimed. The proof for block lower triangular matrices is similar. □

Exercise 5.12. Give a second proof of Lemma 5.6 based on the factorization formulas $E_1P_1A_{11} = U_{11}$ and $E_2P_2A_{22} = U_{22}$ of Gaussian elimination. [REMARK: This is a little more work than the proof based on the Jordan decomposition theorem, but is using tools that are available earlier.]

Theorem 5.7. Let $A \in \mathbb{C}^{n\times n}$ be expressed in block form as

$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$,

with square blocks $A_{11} \in \mathbb{C}^{k\times k}$ and $A_{22} \in \mathbb{C}^{(n-k)\times(n-k)}$ on the diagonal.

(1) If $A_{22}$ is invertible, then

(5.17) $\det A = \det(A_{11} - A_{12}A_{22}^{-1}A_{21}) \det A_{22}$.

(2) If $A_{11}$ is invertible, then

(5.18) $\det A = \det(A_{22} - A_{21}A_{11}^{-1}A_{12}) \det A_{11}$.

Proof. The first assertion follows easily from Lemma 5.6 and the identity for the Schur complement of $A_{22}$ with respect to $A$:

$A = \begin{bmatrix} I_k & A_{12}A_{22}^{-1} \\ O & I_{n-k} \end{bmatrix} \begin{bmatrix} A_{11} - A_{12}A_{22}^{-1}A_{21} & O \\ O & A_{22} \end{bmatrix} \begin{bmatrix} I_k & O \\ A_{22}^{-1}A_{21} & I_{n-k} \end{bmatrix}$,

which is valid when $A_{22}$ is invertible; the second rests on the identity

$A = \begin{bmatrix} I_k & O \\ A_{21}A_{11}^{-1} & I_{n-k} \end{bmatrix} \begin{bmatrix} A_{11} & O \\ O & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{bmatrix} \begin{bmatrix} I_k & A_{11}^{-1}A_{12} \\ O & I_{n-k} \end{bmatrix}$,

which is valid when $A_{11}$ is invertible. □
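Formula (5.17) can be checked on a small example. The sketch below takes a $3 \times 3$ matrix with $k = 1$, so that the Schur complement $A_{11} - A_{12}A_{22}^{-1}A_{21}$ is a scalar. The test matrix and the helper names (`det`, `inverse_2x2`) are assumptions made for this illustration; the determinant is evaluated by expansion along the first row, which is adequate only for small matrices.

```python
from fractions import Fraction

def det(M):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def inverse_2x2(M):
    """Exact inverse of an invertible 2 x 2 matrix."""
    d = Fraction(det(M))
    return [[M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d, M[0][0] / d]]

# Hypothetical test matrix, partitioned with k = 1.
A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
A11 = A[0][0]
A12 = A[0][1:]                    # row block [1, 0]
A21 = [A[1][0], A[2][0]]          # column block [1, 0]^T
A22 = [row[1:] for row in A[1:]]  # [[3, 1], [1, 4]]

A22_inv = inverse_2x2(A22)
schur = A11 - sum(A12[i] * A22_inv[i][j] * A21[j]
                  for i in range(2) for j in range(2))

print(det(A), schur * det(A22))   # the two sides of (5.17)
```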

Exercise 5.13. Show that if A, B 6 ([3a are expressed in compatible four

block form with A11, B11 6 (3k, A22, 322 e c>\"a

fn750a

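be a polynomial of degree $n$, let

$S_f = C_0^{(n)} - e_n \begin{bmatrix} a_0 & \cdots & a_{n-1} \end{bmatrix} = \begin{bmatrix} 0 & 1 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ -a_0 & -a_1 & \cdots & -a_{n-2} & -a_{n-1} \end{bmatrix}$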

6.10. Companion and generalized Vandermonde matrices

129

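with $a_j = f_j/f_n$, denote the companion matrix based on $f(\lambda)$ and let

(6.10) $\mathbf{v}(\lambda) = \begin{bmatrix} 1 \\ \lambda \\ \vdots \\ \lambda^{n-1} \end{bmatrix}$ and $\mathbf{f}(\lambda) = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ f(\lambda) \end{bmatrix}$.

Then

(6.11) $S_f \mathbf{v}(\lambda) = \lambda \mathbf{v}(\lambda) - \dfrac{1}{f_n}\mathbf{f}(\lambda)$

and

(6.12) $S_f \dfrac{\mathbf{v}^{(j)}(\lambda)}{j!} = \lambda \dfrac{\mathbf{v}^{(j)}(\lambda)}{j!} + \dfrac{\mathbf{v}^{(j-1)}(\lambda)}{(j-1)!} - \dfrac{1}{f_n}\dfrac{\mathbf{f}^{(j)}(\lambda)}{j!}$

for $j = 1, 2, \ldots$.

Proof. By direct computation

$S_f \mathbf{v}(\lambda) = \begin{bmatrix} \lambda \\ \vdots \\ \lambda^{n-1} \\ -(a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1}) \end{bmatrix} = \begin{bmatrix} \lambda \\ \vdots \\ \lambda^{n-1} \\ \lambda^n \end{bmatrix} - \dfrac{1}{f_n} \begin{bmatrix} 0 \\ \vdots \\ 0 \\ f(\lambda) \end{bmatrix}$,

which coincides with (6.11). The formulas in (6.12) are obtained by differentiating both sides of (6.11) $j$ times with respect to $\lambda$ to first verify the formula

(6.13) $S_f \mathbf{v}^{(j)}(\lambda) = \lambda \mathbf{v}^{(j)}(\lambda) + j\mathbf{v}^{(j-1)}(\lambda) - \dfrac{1}{f_n}\mathbf{f}^{(j)}(\lambda)$

for $j = 1, 2, \ldots$ by induction and then dividing both sides by $j!$. □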
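Formula (6.11) is easy to test numerically. The following sketch builds the companion matrix from a list of coefficients $f_0, \ldots, f_n$ and compares the two sides of (6.11) at a few points $\lambda$. The sample polynomial, the evaluation points and the function names are assumptions made for this illustration; exact rational arithmetic is used so that the comparison is exact.

```python
from fractions import Fraction

def companion(f):
    """Companion matrix S_f for f = [f_0, ..., f_n] with f_n != 0."""
    n = len(f) - 1
    a = [Fraction(fj, f[n]) for fj in f[:n]]           # a_j = f_j / f_n
    S = [[Fraction(int(j == i + 1)) for j in range(n)]  # ones above the diagonal
         for i in range(n - 1)]
    S.append([-aj for aj in a])                         # last row: -a_0, ..., -a_{n-1}
    return S

def check_611(f, lam):
    """Return True if S_f v(lam) == lam v(lam) - (1/f_n) [0, ..., 0, f(lam)]^T."""
    n = len(f) - 1
    lam = Fraction(lam)
    S = companion(f)
    v = [lam ** i for i in range(n)]
    f_val = sum(fj * lam ** j for j, fj in enumerate(f))
    left = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
    right = [lam * v[i] for i in range(n)]
    right[-1] -= f_val / f[n]
    return left == right

f = [-8, 6, -4, 2]   # f(lambda) = 2 lambda^3 - 4 lambda^2 + 6 lambda - 8
print(all(check_611(f, lam) for lam in (0, 1, -3, Fraction(1, 2))))
```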

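Corollary 6.8. In the setting of Lemma 6.7, assume that the polynomial $f(\lambda)$ admits a factorization of the form

$f(\lambda) = f_n(\lambda - \lambda_1)^{m_1} \cdots (\lambda - \lambda_k)^{m_k}$

with $k$ distinct roots $\lambda_1, \ldots, \lambda_k$, and let

(6.14) $V_j = \begin{bmatrix} \dfrac{\mathbf{v}(\lambda_j)}{0!} & \dfrac{\mathbf{v}^{(1)}(\lambda_j)}{1!} & \cdots & \dfrac{\mathbf{v}^{(m_j-1)}(\lambda_j)}{(m_j-1)!} \end{bmatrix}$

for $j = 1, \ldots, k$. Then

(6.15) $S_f V_j = V_j(\lambda_j I_{m_j} + C_0^{(m_j)}) = V_j C_{\lambda_j}^{(m_j)}$ for $j = 1, \ldots, k$.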

130

6. Calculating Jordan forms

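Exercise 6.21. Verify formula (6.15) when $m_j = 4$.

A matrix of the form $V = [V_1 \ \cdots \ V_k]$, with $V_j$ as in (6.14), is called a generalized Vandermonde matrix.

Corollary 6.9. The vectors in a generalized Vandermonde matrix are linearly independent.

Exercise 6.22. Verify Corollary 6.9.

Example 6.10. If

$f(\lambda) = (\lambda - \alpha)^3(\lambda - \beta)^2 = f_0 + f_1\lambda + \cdots + f_4\lambda^4 + \lambda^5$,

then

$\begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ -f_0 & -f_1 & -f_2 & -f_3 & -f_4 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ \alpha & 1 & 0 & \beta & 1 \\ \alpha^2 & 2\alpha & 1 & \beta^2 & 2\beta \\ \alpha^3 & 3\alpha^2 & 3\alpha & \beta^3 & 3\beta^2 \\ \alpha^4 & 4\alpha^3 & 6\alpha^2 & \beta^4 & 4\beta^3 \end{bmatrix}$

$= \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ \alpha & 1 & 0 & \beta & 1 \\ \alpha^2 & 2\alpha & 1 & \beta^2 & 2\beta \\ \alpha^3 & 3\alpha^2 & 3\alpha & \beta^3 & 3\beta^2 \\ \alpha^4 & 4\alpha^3 & 6\alpha^2 & \beta^4 & 4\beta^3 \end{bmatrix} \begin{bmatrix} \alpha & 1 & 0 & 0 & 0 \\ 0 & \alpha & 1 & 0 & 0 \\ 0 & 0 & \alpha & 0 & 0 \\ 0 & 0 & 0 & \beta & 1 \\ 0 & 0 & 0 & 0 & \beta \end{bmatrix}$.

Exercise 6.23. Verify the matrix identity in Example 6.10.

Theorem 6.11. Let $f(\lambda)$ be a polynomial of degree $n$ that admits a factorization of the form

$f(\lambda) = f_n(\lambda - \lambda_1)^{m_1} \cdots (\lambda - \lambda_k)^{m_k}$

with $k$ distinct roots $\lambda_1, \ldots, \lambda_k$. Then the companion matrix $S_f$ is similar to a Jordan matrix $J$ with one Jordan cell for each root:

$S_f = VJV^{-1}$,

where $V$ is a generalized Vandermonde matrix and

$J = \mathrm{diag}\{C_{\lambda_1}^{(m_1)}, \ldots, C_{\lambda_k}^{(m_k)}\}$.

Proof. This is an easy consequence of Corollary 6.8. □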
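The identity $S_f V = VJ$ of Example 6.10 can be checked by machine for specific values of $\alpha$ and $\beta$. The sketch below uses $\alpha = 2$ and $\beta = -1$, which are arbitrary choices made for this illustration, as are the helper names. The coefficients $f_0, \ldots, f_4$ are obtained by multiplying out the linear factors.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

a, b = 2, -1   # sample values of alpha and beta

# f(lambda) = (lambda - a)^3 (lambda - b)^2, a monic polynomial of degree 5
f = [1]
for root in (a, a, a, b, b):
    f = poly_mul(f, [-root, 1])

# Companion matrix: ones above the diagonal, last row -f_0, ..., -f_4
S = [[int(j == i + 1) for j in range(5)] for i in range(4)]
S.append([-f[j] for j in range(5)])

V = [[1,      0,          0,          1,      0],
     [a,      1,          0,          b,      1],
     [a ** 2, 2 * a,      1,          b ** 2, 2 * b],
     [a ** 3, 3 * a ** 2, 3 * a,      b ** 3, 3 * b ** 2],
     [a ** 4, 4 * a ** 3, 6 * a ** 2, b ** 4, 4 * b ** 3]]

J = [[a, 1, 0, 0, 0],
     [0, a, 1, 0, 0],
     [0, 0, a, 0, 0],
     [0, 0, 0, b, 1],
     [0, 0, 0, 0, b]]

print(mat_mul(S, V) == mat_mul(V, J))
```

A numerical check of this kind is not a substitute for Exercise 6.23, but it is a quick way to catch a misplaced entry in $V$ or $J$.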

This circle of ideas can also be run in the other direction, as indicated

by the following two exercises.


Exercise 6.24. Let $A \in \mathbb{C}^{n\times n}$ have $k$ distinct eigenvalues $\lambda_1, \ldots, \lambda_k$ with geometric multiplicities $\gamma_1, \ldots, \gamma_k$ and algebraic multiplicities $\alpha_1, \ldots, \alpha_k$, respectively. Show that if $\gamma_j = 1$ for $j = 1, \ldots, k$, then $A$ is similar to a companion matrix $S_f$ based on a polynomial $f(\lambda)$ and find $f(\lambda)$.

Exercise 6.25. Show that if $A \in \mathbb{C}^{n\times n}$ is similar to the Jordan matrix

. 4 2 3 1 3 J = d1ag{C§\1), oil), cg, cg, egg} , then A is also similar to the block diagonal matrix diag {SQ1 , 392} based on a pair of polynomials gl()\) and 920‘) and find the polynomials. Exercise 6.26. Find a Jordan form for ”(In — ,uCO(n) )_1 when ,u 7E O.

Exercise 6.27. Find a Jordan form J for the matrix A =

2 0 0 0

2 1 0 0 2 3 0 0 0 2 0 0 . 3 1 2 0

0

1

1

1

2

Exercise 6.28. Find an invertible matrix $U \in \mathbb{C}^{5\times 5}$ such that $A = UJU^{-1}$ for the matrix $A$ with Jordan form $J$ that was considered in Exercise 6.27.

Exercise 6.29. Let $B \in \mathbb{C}^{n\times n}$; let $u_1 \in \mathcal{N}_B$; $v_1, v_2 \in \mathcal{N}_{B^2}$; $w_1, w_2 \in \mathcal{N}_{B^3}$; and assume that the 5 vectors $B^2w_1$, $B^2w_2$, $Bv_1$, $Bv_2$, $u_1$ are linearly independent over $\mathbb{C}$. Show that the 11 vectors $B^2w_1$, $Bw_1$, $w_1$, $B^2w_2$, $Bw_2$, $w_2$, $Bv_1$, $v_1$, $Bv_2$, $v_2$, $u_1$ are also linearly independent over $\mathbb{C}$.

Exercise 6.30. Find an invertible matrix $U$ such that $U^{-1}AU$ is in Jordan form when A =

1

0

0

2 0

2

. [NOTE: i = \/—1.]

—i 0 1 Exercise 6.31. Find a Jordan form J for the matrix

A =

0

1

O

O

0

0 8 —1 —4

0 —12 1 1

1 6 0 0

0 0 0 —4

O O 1 4

HINT: You may find the formula $x^3 - 6x^2 + 12x - 8 = (x - 2)^3$ useful.

Exercise 6.32. Find an invertible matrix $U$ such that $AU = UJ$ for the matrices $A$ and $J$ considered in Exercise 6.31.

The next three exercises are adapted from [73].

Exercise 6.33. Show that if $n \ge 2$, then the matrix $(C_\mu^{(n)})^2$ is similar to the matrix $C_{\mu^2}^{(n)}$ if and only if $\mu \ne 0$.


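Exercise 6.34. Let $B \in \mathbb{C}^{p\times p}$ be a triangular matrix with diagonal entries $b_{ii} = \lambda \ne 0$ for $i = 1, \ldots, p$ and let $V \in \mathbb{C}^{p\times p}$. Show that $B^2V = VB^2 \iff BV = VB$. [HINT: Separate even and odd powers of $B$ in the binomial expansion of $(B - \lambda I_p)^p = O$ to obtain a pair of invertible matrices $P_1 = a_0I_p + a_2B^2 + \cdots$ and $P_2 = b_0I_p + b_2B^2 + \cdots$ such that $P_1 = BP_2$.]

Exercise 6.35. Let $A, B \in \mathbb{C}^{n\times n}$ and suppose that the eigenvalues of $A$ and $B$ are nonnegative and that $\mathcal{N}_A = \mathcal{N}_{A^2}$ and $\mathcal{N}_B = \mathcal{N}_{B^2}$. Show that $A^2 = B^2 \iff A = B$. [HINT: Invoke Exercise 6.33 to show that the Jordan decompositions $A = U_1J_1U_1^{-1}$ and $B = U_2J_2U_2^{-1}$ may be chosen so that $J_1 = J_2$ and hence that $J_1^2V = VJ_1^2$ for $V = U_1^{-1}U_2$. Then apply Exercise 6.34, block by block, to finish.]

Exercise 6.36. Let $A \in \mathbb{C}^{n\times n}$ be a companion matrix and let $p(\lambda) = \det(\lambda I_n - A)$. Show that

(6.16) $A \begin{bmatrix} 1 \\ \lambda \\ \vdots \\ \lambda^{n-1} \end{bmatrix} = \begin{bmatrix} \lambda \\ \lambda^2 \\ \vdots \\ \lambda^n - p(\lambda) \end{bmatrix}$, $A \begin{bmatrix} 0 \\ 1 \\ \vdots \\ (n-1)\lambda^{n-2} \end{bmatrix} = \begin{bmatrix} 1 \\ 2\lambda \\ \vdots \\ n\lambda^{n-1} - p'(\lambda) \end{bmatrix}$

and differentiate once again with respect to $\lambda$ to obtain the next term in the indicated sequence of formulas.

Exercise 6.37. Find an invertible matrix $U$ and a matrix $J$ in Jordan form such that $A = UJU^{-1}$ if $A \in \mathbb{C}^{6\times 6}$ is a companion matrix, $\det(\lambda I_6 - A) = (\lambda - \lambda_1)^4(\lambda - \lambda_2)^2$ and $\lambda_1 \ne \lambda_2$. [HINT: Exploit the sequence of formulas indicated in (6.16), first with $\lambda = \lambda_1$ and then with $\lambda = \lambda_2$.]

Exercise 6.38. Show that the matrix

$\begin{bmatrix} \alpha & 1 & 0 \\ 0 & \alpha & 1 \\ 0 & 0 & \alpha \end{bmatrix}$ is similar to $\begin{bmatrix} \alpha & \beta & 0 \\ 0 & \alpha & \beta \\ 0 & 0 & \alpha \end{bmatrix}$

if $\beta \ne 0$.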

Exercise 6.39. Find a Jordan decomposition $A = UJU^{-1}$ for the matrix

$A = \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \end{bmatrix}$.

[HINT: $[1 \ 1 \ 1]^T$ is an eigenvector of $A$, or, think circulants.]

Chapter 7

Normed linear spaces

I give you now Professor Twist,
A conscientious scientist,
Camped on a tropic riverside,
One day he missed his loving bride.
She had, the guide informed him later,
Been eaten by an alligator.
Professor Twist could not but smile.
“You mean,” he said, “a crocodile.”

The Purist, by Ogden Nash

In this chapter we shall consider a number of different ways of assigning a number to each vector in a vector space $\mathcal{U}$ over $\mathbb{C}$ that gives some indication of its size. Ultimately, our main interest will be in the vector spaces $\mathbb{C}^n$ and $\mathbb{R}^n$. But at this stage of the game it is useful to develop the material in a more general framework, because the extra effort is small and the dividends are significant. We shall also show that if a matrix $B \in \mathbb{C}^{n\times n}$ is sufficiently close to an invertible matrix $A \in \mathbb{C}^{n\times n}$, then $B$ is invertible too. In other words, the invertibility of a square matrix is preserved under small perturbations of its entries, as are right and left invertibility, even though nonmaximal rank may not be.

7.1. Four inequalities

Throughout this subsection $s > 1$ and $t > 1$ will be two fixed numbers that are connected by the formula

(7.1) $\dfrac{1}{s} + \dfrac{1}{t} = 1$.

[Figure 1] [Figure 2]

It is readily checked that

(7.2) $\dfrac{1}{s} + \dfrac{1}{t} = 1 \iff (s-1)(t-1) = 1 \iff (s-1)t = s$.

Lemma 7.1. If $a > 0$, $b > 0$, $s > 1$, $t > 1$ and $(s-1)(t-1) = 1$, then

(7.3) $ab \le \dfrac{a^s}{s} + \dfrac{b^t}{t}$,

with equality if and only if $a^s = b^t$.

Proof. The inequality will be obtained by comparing the area of a rectangle $R$ with horizontal sides of length $a$ and vertical sides of length $b$ with the area of the shaded regions that are formed between the $x$-axis and the curve $y = x^{s-1}$, $0 \le x \le a$, and the $y$-axis and the same curve, now written as $x = y^{t-1}$, for $0 \le y \le b$, as sketched in the two figures.

The first figure corresponds to the case $a^{s-1} > b$; the second to the case $a^{s-1} < b$, and (7.2) guarantees that

$y = x^{s-1} \iff x = y^{t-1}$.

The rest is straightforward. It is clear from the figures that in each setting the area of the rectangle is less than or equal to the sum of the area of the vertically shaded piece and the area of the horizontally shaded piece:

$ab \le \int_0^a x^{s-1}\,dx + \int_0^b y^{t-1}\,dy = \dfrac{x^s}{s}\Big|_{x=0}^{x=a} + \dfrac{y^t}{t}\Big|_{y=0}^{y=b} = \dfrac{a^s}{s} + \dfrac{b^t}{t}$.

The figures also make it clear that equality will prevail in formula (7.3) if and only if $a^{s-1} = b$ or, equivalently, if and only if $a^{(s-1)t} = b^t$. But this is the same as the stated condition, since $a^s = a^{(s-1)t}$. □
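The inequality (7.3) and its equality condition can be explored numerically. The sketch below evaluates the gap $a^s/s + b^t/t - ab$ for a generic pair $(a, b)$ and for a pair with $a^s = b^t$; the sample values and the function name are assumptions made for illustration, and floating point arithmetic is used, so equality is tested only up to rounding.

```python
def young_gap(a, b, s):
    """Return a^s/s + b^t/t - a*b, where t is determined by 1/s + 1/t = 1."""
    t = s / (s - 1)
    return a ** s / s + b ** t / t - a * b

# Generic case: the gap is strictly positive.
print(young_gap(1.0, 3.0, 2.0))

# Equality case: s = 3, t = 3/2, a = 2, b = 4, so that a^s = b^t = 8.
print(young_gap(2.0, 4.0, 3.0))
```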

Lemma 7.2. (Hölder’s inequality) If $s > 1$, $t > 1$ and $(s-1)(t-1) = 1$, then

(7.4) $\sum_{k=1}^n |a_kb_k| \le \left\{\sum_{k=1}^n |a_k|^s\right\}^{1/s} \left\{\sum_{k=1}^n |b_k|^t\right\}^{1/t}$.

Moreover, equality will prevail in (7.4) if and only if the vectors $u$ with components $u_j = |a_j|^s$ and $v$ with components $v_j = |b_j|^t$ are linearly dependent.

Proof. We may assume that the right-hand side of the asserted inequality is not equal to zero, because otherwise the inequality is self-evident. [Why?] Let

$\alpha_k = \dfrac{a_k}{\left\{\sum_{j=1}^n |a_j|^s\right\}^{1/s}}$ and $\beta_k = \dfrac{b_k}{\left\{\sum_{j=1}^n |b_j|^t\right\}^{1/t}}$.

Then

$\sum_{k=1}^n |\alpha_k|^s = 1$ and $\sum_{k=1}^n |\beta_k|^t = 1$,

and hence, in view of Lemma 7.1,

$\sum_{k=1}^n |\alpha_k\beta_k| \le \sum_{k=1}^n \dfrac{|\alpha_k|^s}{s} + \sum_{k=1}^n \dfrac{|\beta_k|^t}{t} = \dfrac{1}{s} + \dfrac{1}{t} = 1$.

This yields the desired inequality because

$\sum_{k=1}^n |\alpha_k\beta_k| = \dfrac{\sum_{k=1}^n |a_kb_k|}{\left\{\sum_{j=1}^n |a_j|^s\right\}^{1/s} \left\{\sum_{j=1}^n |b_j|^t\right\}^{1/t}}$.

Finally, equality will prevail in (7.4) if and only if either (1) the right-hand side is equal to zero or (2) the right-hand side is not equal to zero and

$|\alpha_i\beta_i| = \dfrac{|\alpha_i|^s}{s} + \dfrac{|\beta_i|^t}{t}$ for $i = 1, \ldots, n$.

Lemma 7.1 implies that the latter condition holds if and only if

$|\alpha_i|^s = |\beta_i|^t$ for $i = 1, \ldots, n$,

i.e., if and only if

$\dfrac{|a_i|^s}{\sum_{j=1}^n |a_j|^s} = \dfrac{|b_i|^t}{\sum_{j=1}^n |b_j|^t}$ for $i = 1, \ldots, n$.

This completes the proof, since (1) and (2) are equivalent to the linear dependence of the vectors $u$ and $v$. □
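A quick numerical experiment illustrates both the inequality (7.4) and the equality condition. In the sketch below the vectors, the exponent $s = 3$ and the function name are assumptions made for illustration. The second test vector is chosen with $|b_j| = |a_j|^{s-1}$, so that $|b_j|^t = |a_j|^s$ and the vectors $u$ and $v$ of Lemma 7.2 coincide.

```python
import random

def holder_sides(a, b, s):
    """Return the left and right sides of (7.4), with t given by 1/s + 1/t = 1."""
    t = s / (s - 1)
    lhs = sum(abs(x * y) for x, y in zip(a, b))
    rhs = (sum(abs(x) ** s for x in a) ** (1 / s)
           * sum(abs(y) ** t for y in b) ** (1 / t))
    return lhs, rhs

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(6)]
b = [random.uniform(-1, 1) for _ in range(6)]
lhs, rhs = holder_sides(a, b, 3.0)
print(lhs <= rhs)

# Equality case: |b_j| = |a_j|^(s-1) forces |b_j|^t = |a_j|^s.
b_eq = [abs(x) ** 2.0 for x in a]
lhs_eq, rhs_eq = holder_sides(a, b_eq, 3.0)
print(abs(lhs_eq - rhs_eq) < 1e-12)
```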

The case $s = 2$ is of special interest because then $t = 2$ and the inequality (7.4) assumes a more symmetric form and gets a special name:

Lemma 7.3. (The Cauchy-Schwarz inequality) Let $a, b \in \mathbb{C}^n$ with components $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$, respectively. Then

$\left|\sum_{k=1}^n a_k\overline{b_k}\right| \le \left(\sum_{k=1}^n |a_k|^2\right)^{1/2} \left(\sum_{k=1}^n |b_k|^2\right)^{1/2}$

with equality if and only if

$\dim \mathrm{span}\{a, b\} \le 1$.

Proof. The inequality is immediate from (7.4) by choosing $s = 2$ (which, as already remarked, then forces $t = 2$). □

Exercise 7.1. Show that if $\alpha, \beta \in \mathbb{R}$ and $\theta \in [0, 2\pi)$, then $\alpha\cos\theta + \beta\sin\theta \le \sqrt{\alpha^2 + \beta^2}$ and that the upper bound is achieved for some choice of $\theta$.

Lemma 7.4. (Minkowski’s inequality) Let $1 \le s < \infty$. Then

(7.5) $\left\{\sum_{k=1}^n |a_k + b_k|^s\right\}^{1/s} \le \left\{\sum_{k=1}^n |a_k|^s\right\}^{1/s} + \left\{\sum_{k=1}^n |b_k|^s\right\}^{1/s}$.

Proof. The case $s = 1$ is an immediate consequence of the fact that for every pair of complex numbers $a$ and $b$, $|a + b| \le |a| + |b|$. On the other hand, if $s > 1$, then

$\sum_{k=1}^n |a_k + b_k|^s = \sum_{k=1}^n |a_k + b_k|^{s-1}|a_k + b_k| \le \sum_{k=1}^n |a_k + b_k|^{s-1}(|a_k| + |b_k|)$.

By Hölder’s inequality,

$\sum_{k=1}^n |a_k + b_k|^{s-1}|a_k| \le \left\{\sum_{k=1}^n |a_k + b_k|^{(s-1)t}\right\}^{1/t} \left\{\sum_{k=1}^n |a_k|^s\right\}^{1/s}$

and

$\sum_{k=1}^n |a_k + b_k|^{s-1}|b_k| \le \left\{\sum_{k=1}^n |a_k + b_k|^{(s-1)t}\right\}^{1/t} \left\{\sum_{k=1}^n |b_k|^s\right\}^{1/s}$

for $t = s/(s-1)$. Since $(s-1)t = s$, the last three inequalities imply that

$\sum_{k=1}^n |a_k + b_k|^s \le \left\{\sum_{k=1}^n |a_k + b_k|^s\right\}^{1/t} \left[\left(\sum_{k=1}^n |a_k|^s\right)^{1/s} + \left(\sum_{k=1}^n |b_k|^s\right)^{1/s}\right]$.

Now, if

$\sum_{k=1}^n |a_k + b_k|^s > 0$,

then we can divide both sides of the last inequality by $\left\{\sum_{k=1}^n |a_k + b_k|^s\right\}^{1/t}$ to obtain the desired inequality (7.5).

It remains to consider the case $\sum_{k=1}^n |a_k + b_k|^s = 0$. But then the inequality (7.5) is self-evident. □

Exercise 7.2. Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be nonnegative numbers and let $1 < s < \infty$. Show that

(7.6) $\left\{\sum_{k=1}^n |a_k + b_k|^s\right\}^{1/s} = \left\{\sum_{k=1}^n |a_k|^s\right\}^{1/s} + \left\{\sum_{k=1}^n |b_k|^s\right\}^{1/s}$

if and only if the vectors $a$ and $b$ with components $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$, respectively, are linearly dependent. [HINT: See how to change the inequalities in the proof of Minkowski’s inequality to equalities.]

Remark 7.5. The inequality (7.3) is a special case of a more general statement that is usually referred to as Young’s inequality: If $a_1, \ldots, a_n$ and $p_1, \ldots, p_n$ are positive numbers such that $\frac{1}{p_1} + \cdots + \frac{1}{p_n} = 1$, then

(7.7) $a_1 \cdots a_n \le \dfrac{a_1^{p_1}}{p_1} + \cdots + \dfrac{a_n^{p_n}}{p_n}$.

A proof is spelled out in the following three exercises.

Exercise 7.3. Let $a_1, \ldots, a_n$ and $c_1, \ldots, c_n$ be positive numbers such that $c_1 + \cdots + c_n = 1$ and let $p > 1$. Show that

(7.8) $\left(\sum_{j=1}^n c_ja_j\right)^p \le \sum_{j=1}^n c_ja_j^p$.

[HINT: Write $c_ja_j = c_j^{1/q}(c_j^{1/p}a_j)$ and then invoke Hölder’s inequality.]

Exercise 7.4. Verify Young’s inequality when $n = 3$ by exploiting the inequality (7.3) to show that if

$\dfrac{1}{p} = \dfrac{1}{p_1} + \dfrac{1}{p_2}$ and $\dfrac{1}{q} = \dfrac{1}{p_3}$,

then:

(1) $a_1a_2a_3 \le \dfrac{(a_1a_2)^p}{p} + \dfrac{a_3^q}{q}$.

(2) $a_1a_2 \le \dfrac{p}{p_1}a_1^{p_1/p} + \dfrac{p}{p_2}a_2^{p_2/p}$.

(3) $\dfrac{p}{p_1}a_1^{p_1/p} + \dfrac{p}{p_2}a_2^{p_2/p} \le \left(\dfrac{p}{p_1}a_1^{p_1} + \dfrac{p}{p_2}a_2^{p_2}\right)^{1/p}$.

(4) Verify Young’s inequality for $n = 3$.

[HINT: The inequality (7.8) is useful for (3).]

Exercise 7.5. Verify Young’s inequality. [HINT: Use the steps in the preceding exercise as a guide.]

Exercise 7.6. Show that the geometric mean of a given set of positive numbers $b_1, \ldots, b_n$ is less than or equal to its arithmetic mean, i.e.,

(7.9) $(b_1b_2 \cdots b_n)^{1/n} \le \dfrac{b_1 + \cdots + b_n}{n}$,

with equality if and only if $b_1 = \cdots = b_n$. [HINT: Young’s inequality helps.]

7.2. Normed linear spaces

A vector space $\mathcal{U}$ over $\mathbb{F}$ is said to be a normed linear space if there exists a number $\varphi(x)$ assigned to each vector $x \in \mathcal{U}$ such that for every choice of $x, y \in \mathcal{U}$ and every $\alpha \in \mathbb{F}$ the following four conditions are met:

(1) ( ) ”A“, then

$\lambda I_p - A$ is invertible and $\|(\lambda I_p - A)^{-1}\| \le (|\lambda| - \|A\|)^{-1}\|I_p\|$ for every multiplicative norm on $\mathbb{C}^{p\times p}$.

Exercise 7.23. Calculate $\|I_p\|$ for the multiplicative norms $\|A\|$ considered in Exercise 7.20 applied to $\mathbb{C}^{p \times p}$.

Corollary 7.19. Let $A, B \in \mathbb{C}^{p \times p}$ and suppose that $A$ is invertible, that $\|A^{-1}\| \le \gamma$ and that $\|B - A\| < 1/\gamma$ for some multiplicative norm $\|\cdot\|$ and some number $\gamma > 0$. Then:

\[ B \ \text{is invertible and} \quad \|B^{-1}\| \le \frac{\gamma}{1 - \gamma \|B - A\|} \, \|I_p\|. \tag{7.43} \]

Proof. Let

\[ B = A - (A - B) = A \big( I_p - A^{-1}(A - B) \big) \]

and set

\[ X = A^{-1}(A - B). \]

Then, since $B = A(I_p - X)$, the desired results are immediate from Theorem 7.18 and the estimate

\[ \|X\| = \|A^{-1}(A - B)\| \le \|A^{-1}\| \, \|A - B\| \le \gamma \|A - B\| < 1. \]

□

The next corollary is mostly a reformulation of Corollary 7.19 into notation that will be useful in estimating contour integrals in Chapter 17.

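Corollary 7.20. Let $A, B \in \mathbb{C}^{p \times p}$ and $\lambda \in \mathbb{C}$, and suppose that $\lambda I_p - A$ is invertible, that $\|(\lambda I_p - A)^{-1}\| \le \gamma$ and that $\|B - A\| < 1/\gamma$ for some multiplicative norm and some number $\gamma > 0$. Then:

(1) $\lambda I_p - B$ is invertible.

(2) $\|(\lambda I_p - B)^{-1}\| \le \dfrac{\gamma}{1 - \gamma \|B - A\|} \, \|I_p\|$.

(3) $\|(\lambda I_p - A)^{-1} - (\lambda I_p - B)^{-1}\| \le \dfrac{\gamma^2 \|B - A\|}{1 - \gamma \|B - A\|} \, \|I_p\|$.

Proof. Clearly,

\[ \lambda I_p - B = \lambda I_p - A - (B - A) = (\lambda I_p - A)\{ I_p - (\lambda I_p - A)^{-1}(B - A) \} = (\lambda I_p - A)(I_p - X), \]

with

\[ X = (\lambda I_p - A)^{-1}(B - A), \]

for short. Moreover,

\[ \|X\| = \|(\lambda I_p - A)^{-1}(B - A)\| \le \|(\lambda I_p - A)^{-1}\| \, \|B - A\| \le \gamma \|B - A\| < 1 \]

by assumption. Therefore, by Theorem 7.18, the matrix $I_p - X$ is invertible and

\[ \|(I_p - X)^{-1}\| \le (1 - \|X\|)^{-1} \|I_p\|. \]

Thus,

\[ \|(\lambda I_p - B)^{-1}\| = \|(I_p - X)^{-1}(\lambda I_p - A)^{-1}\| \le \|(I_p - X)^{-1}\| \, \|(\lambda I_p - A)^{-1}\| \le \gamma (1 - \|X\|)^{-1} \|I_p\| \le \gamma (1 - \gamma \|B - A\|)^{-1} \|I_p\|, \]

which justifies (2). Finally, the bound furnished in (3) is an easy consequence of the formula

\[ (\lambda I_p - A)^{-1} - (\lambda I_p - B)^{-1} = (\lambda I_p - A)^{-1}(A - B)(\lambda I_p - B)^{-1} \]

and the preceding estimates. □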

7.9. Small perturbations

• Warning: From now on the convention

\[ \|A\| = \|A\|_{2,2} \quad \text{for matrices } A \in \mathbb{F}^{p \times q} \tag{7.44} \]

will be in force, unless indicated otherwise. Thus, we shall usually write $\|AB\| \le \|A\| \, \|B\|$ instead of $\|AB\|_{2,2} \le \|A\|_{2,2} \, \|B\|_{2,2}$.

Theorem 7.21. If $A, B \in \mathbb{F}^{p \times p}$, $A$ is invertible and $\|A - B\| < \{\|A^{-1}\|\}^{-1}$, then $B$ is invertible.

Proof. This is a special case of Corollary 7.19. □
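The bound (7.43) behind Theorem 7.21 can be checked on a small example. The sketch below is an illustration under stated assumptions: it uses the maximum absolute row sum as the multiplicative norm (for which $\|I_p\| = 1$) rather than $\|\cdot\|_{2,2}$, since Corollary 7.19 permits any multiplicative norm, and the $2 \times 2$ matrices and helper names are arbitrary choices.

```python
def row_sum_norm(M):
    """Maximum absolute row sum; a multiplicative norm with norm(I) = 1."""
    return max(sum(abs(x) for x in row) for row in M)

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def subtract(M, N):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(M, N)]

A = [[2.0, 1.0], [0.0, 4.0]]
B = [[2.1, 0.9], [0.05, 4.0]]            # a small perturbation of A

gamma = row_sum_norm(inverse_2x2(A))     # plays the role of the bound on ||A^{-1}||
delta = row_sum_norm(subtract(B, A))     # ||B - A||
bound = gamma / (1.0 - gamma * delta)    # right-hand side of (7.43) with ||I|| = 1
actual = row_sum_norm(inverse_2x2(B))    # ||B^{-1}||
```

Here `gamma * delta` is less than 1, so the hypothesis of the corollary holds, and `actual` does not exceed `bound`.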

Theorem 7.21 insures that invertibility is preserved under small perturbations. It is now natural to ask: Are left and right invertibility preserved under small perturbations? The answer is yes, but it will be justified indirectly by first exploring the behavior of the rank, which is not necessarily preserved under small perturbations.

Lemma 7.22. If $A \in \mathbb{F}^{p \times q}$ and $B$ is an $m \times n$ submatrix of $A$, then $\|B\| \le \|A\|$.

Discussion. Suppose that $A \in \mathbb{F}^{6 \times 5}$ and

\[ B = \begin{bmatrix} a_{21} & a_{23} & a_{24} \\ a_{41} & a_{43} & a_{44} \end{bmatrix} \]

and let $\mathbf{e}_i$ denote the $i$'th column of $I_6$ for $i = 1, \ldots, 6$ and $\mathbf{f}_j$ denote the $j$'th column of $I_5$ for $j = 1, \ldots, 5$. Then

\[ B = E^T A F, \quad\text{where}\quad E = [\mathbf{e}_2 \ \ \mathbf{e}_4] \quad\text{and}\quad F = [\mathbf{f}_1 \ \ \mathbf{f}_3 \ \ \mathbf{f}_4]. \]

Therefore, $\|B\| = \|E^T A F\| \le \|E^T\| \, \|A\| \, \|F\|$. But, as

\[ E^T \mathbf{y} = \begin{bmatrix} y_2 \\ y_4 \end{bmatrix} \quad\text{for every } \mathbf{y} \in \mathbb{F}^6, \]

it is easily seen that $\|E^T\| = 1$, and, by similar considerations, that $\|F\| = 1$. Therefore, $\|B\| \le \|A\|$ in this example. The verification of this inequality in the general setting is essentially the same; only the bookkeeping is a little more elaborate. □

Exercise 7.24. Show by direct calculation that if $E \in \mathbb{R}^{n \times k}$ is a submatrix of $I_n$ that is obtained by discarding $n - k$ columns of $I_n$ for some choice of $k$, $1 \le k \le n$, then $\|E\| = 1$ and $\|E^T\| = 1$.

Theorem 7.23. If $A \in \mathbb{F}^{p \times q}$, then:

(1) $\operatorname{rank} A = r \Longrightarrow$ there exists an invertible $r \times r$ submatrix of $A$.

(2) If there exists a $k \times k$ invertible submatrix of $A$, then $\operatorname{rank} A \ge k$.

Proof. If $A = [\mathbf{a}_1 \ \cdots \ \mathbf{a}_q]$ and $\operatorname{rank} A = r$, then there exists a submatrix $B \in \mathbb{F}^{p \times r}$ of $A$ with $\operatorname{rank} B = r$. Therefore, $\operatorname{rank} B^T = r$ and there exists an $r \times r$ submatrix $C$ of $B^T$ with $\operatorname{rank} C = r$. Thus $C^T$ is an invertible $r \times r$ submatrix of $A$. This completes the proof of (1).

Suppose next that there exists a $k \times k$ invertible submatrix of $A$. Then the $k$ columns of $A$ that overlap the columns of this submatrix are linearly independent. Thus, $\operatorname{rank} A \ge k$. □

Theorem 7.24. If $A \in \mathbb{F}^{p \times q}$ and $\operatorname{rank} A = r$, then there exists an $\varepsilon > 0$ such that if $B \in \mathbb{F}^{p \times q}$ and $\|A - B\| < \varepsilon$, then $\operatorname{rank} B \ge r$.

Proof. If $A \in \mathbb{F}^{p \times q}$ and $\operatorname{rank} A = r$, then there exists a $p \times r$ submatrix $E$ of $I_p$ and a $q \times r$ submatrix $F$ of $I_q$ such that the $r \times r$ submatrix $E^T A F$ of $A$ is invertible. Moreover, since

\[ \|E^T A F - E^T B F\| = \|E^T (A - B) F\| \le \|A - B\|, \]

Theorem 7.21 insures that $E^T B F$ is invertible if $\|A - B\| < \varepsilon$ and $\varepsilon$ is small enough. Thus, in view of Theorem 7.23, $\operatorname{rank} B \ge r$. □

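Example 7.25. If

\[ A = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 1 \\ \alpha & 1 \end{bmatrix} \quad\text{with } \alpha \ne 0, \]

then it is easily checked that $\|A - B\| = |\alpha|$ and hence that there exists a matrix $B$ with $\operatorname{rank} B = 2$ in the set $\{X \in \mathbb{F}^{p \times q} : \|A - X\| < \varepsilon\}$ for every $\varepsilon > 0$, no matter how small, whereas $\operatorname{rank} A = 1$.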
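The jump in rank can be observed directly. The sketch below assumes one reading of the example, namely $A = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ \alpha & 1 \end{bmatrix}$, and uses hypothetical helpers `det2` and `rank2` that are valid only for $2 \times 2$ matrices.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = M
    return a * d - b * c

def rank2(M):
    """Rank of a 2 x 2 matrix: 2 if det != 0, else 1 if some entry is nonzero, else 0."""
    if det2(M) != 0:
        return 2
    return 1 if any(x != 0 for row in M for x in row) else 0

A = [[0.0, 1.0], [0.0, 1.0]]
rank_A = rank2(A)

ranks_B = []
for alpha in (1e-1, 1e-4, 1e-8):
    B = [[0.0, 1.0], [alpha, 1.0]]   # differs from A only in the (2,1) entry
    ranks_B.append(rank2(B))
```

However small `alpha` is taken, the perturbed matrix has rank 2 while `A` has rank 1, which is consistent with Theorem 7.24: the rank can increase under small perturbations but cannot decrease.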

Corollary 7.26. If $A, B \in \mathbb{F}^{p \times q}$, then there exists an $\varepsilon > 0$ such that

(1) $A$ is left invertible and $\|A - B\| < \varepsilon \Longrightarrow B$ is left invertible.

(2) $A$ is right invertible and $\|A - B\| < \varepsilon \Longrightarrow B$ is right invertible.

Proof. This depends upon the fact that $A$ is right invertible if and only if $\operatorname{rank} A = p$; and $A$ is left invertible if and only if $\operatorname{rank} A = q$, whereas

\[ \min\{p, q\} \ge \operatorname{rank} B \ge \operatorname{rank} A \]

if $\|A - B\| < \varepsilon$ and $\varepsilon$ is a sufficiently small positive number. □

Theorem 7.27. If $A \in \mathbb{C}^{p \times p}$ and $\varepsilon > 0$, then there exists a diagonalizable matrix $B \in \mathbb{C}^{p \times p}$ such that $\|B - A\| < \varepsilon$.

Proof. If $A = UJU^{-1}$, choose $B = U(J + D)U^{-1}$, where $D$ is a diagonal matrix that is chosen so that the diagonal entries of $J + D$ are all distinct and $\|UDU^{-1}\| < \varepsilon$. □

Theorem 7.27 implies that the set of complex diagonalizable matrices is dense in $\mathbb{C}^{p \times p}$; however, it is not an open set: If $A \in \mathbb{C}^{p \times p}$ is diagonalizable and $\varepsilon > 0$, then $\{B \in \mathbb{C}^{p \times p} : \|B - A\| < \varepsilon\}$ may contain nondiagonalizable matrices. The simplest example is

A=[fl1 0

0]

[1.2

and

[M1 0

a]

#2

with

0 1

and $t = s/(s - 1)$, then

\[ \max\{|f(\mathbf{x})| : \|\mathbf{x}\|_s \le 1\} = \Big( \sum_{j=1}^n |\alpha_j|^t \Big)^{1/t}. \]

[HINT: It is easy to show that $|f(\mathbf{x})| \le \big( \sum_{j=1}^n |\alpha_j|^t \big)^{1/t} \|\mathbf{x}\|_s$. The particular vector $\mathbf{x}$ with coordinates $x_j = \overline{\alpha_j} |\alpha_j|^{t-2}$ when $\alpha_j \ne 0$ is useful for showing that the maximum is attained.]

Exercise 7.30. Let $f(\mathbf{x})$ be a linear functional on $\mathbb{C}^n$ and let $f(\mathbf{e}_j) = \alpha_j$ for $j = 1, \ldots, n$, where $\mathbf{e}_j$ denotes the $j$'th column of $I_n$. Show that

\[ \max\{|f(\mathbf{x})| : \|\mathbf{x}\|_\infty \le 1\} = \sum_{j=1}^n |\alpha_j|. \]

[HINT: See the hint in Exercise 7.29.]

Exercise 7.31. Let $f(\mathbf{x})$ be a linear functional on $\mathbb{C}^n$ and let $f(\mathbf{e}_j) = \alpha_j$ for $j = 1, \ldots, n$, where $\mathbf{e}_j$ denotes the $j$'th column of $I_n$. Show that

\[ \max\{|f(\mathbf{x})| : \|\mathbf{x}\|_1 \le 1\} = \max\{|\alpha_j| : j = 1, \ldots, n\}. \]

[HINT: See the hint in Exercise 7.29.]

7.11. Extensions of bounded linear functionals

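In this section we consider the problem of extending a bounded linear functional $f$ that is specified on a proper subspace $\mathcal{U}$ of a normed linear space $\mathcal{X}$ over $\mathbb{F}$ to the full space $\mathcal{X}$ in such a way that the norm of the extension $F$ is the same as the norm of $f$, i.e.,

\[ \sup\{|F(\mathbf{x})| : \mathbf{x} \in \mathcal{X} \text{ and } \|\mathbf{x}\| \le 1\} = \sup\{|f(\mathbf{u})| : \mathbf{u} \in \mathcal{U} \text{ and } \|\mathbf{u}\| \le 1\}. \]

The fact that this is possible lies at the heart of the Hahn-Banach theorem.

It turns out to be more useful to phrase the extension problem in a slightly more general form that requires a new definition: A real-valued function $p(\mathbf{x})$ on a vector space $\mathcal{X}$ over $\mathbb{F}$ is said to be a seminorm if for every choice of $\mathbf{x}, \mathbf{y} \in \mathcal{X}$ and $\alpha \in \mathbb{F}$ the following two conditions are met:

(1) $p(\mathbf{x} + \mathbf{y}) \le p(\mathbf{x}) + p(\mathbf{y})$.

(2) $p(\alpha \mathbf{x}) = |\alpha| \, p(\mathbf{x})$.

Exercise 7.32. Show that if $p(\mathbf{x})$ is a seminorm on a vector space $\mathcal{X}$ over $\mathbb{F}$, then $p$ also automatically satisfies the following additional three conditions:

(3) $p(\mathbf{0}) = 0$.

(4) $p(\mathbf{x}) \ge 0$ for every $\mathbf{x} \in \mathcal{X}$.

(5) $p(\mathbf{x} - \mathbf{y}) \ge |p(\mathbf{x}) - p(\mathbf{y})|$.

Theorem 7.30. Let $p$ be a seminorm on a finite dimensional vector space $\mathcal{X}$ over $\mathbb{F}$, and let $f$ be a linear functional on a proper subspace $\mathcal{U}$ of $\mathcal{X}$ such that

\[ f(\mathbf{u}) \in \mathbb{F} \quad\text{and}\quad |f(\mathbf{u})| \le p(\mathbf{u}) \quad\text{for every } \mathbf{u} \in \mathcal{U}. \]

Then there exists a linear functional $F$ on the full space $\mathcal{X}$ such that

(1) $F(\mathbf{u}) \in \mathbb{F}$ and $F(\mathbf{u}) = f(\mathbf{u})$ for every $\mathbf{u} \in \mathcal{U}$.

(2) $|F(\mathbf{x})| \le p(\mathbf{x})$ for every $\mathbf{x} \in \mathcal{X}$.

Proof.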

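Suppose first that $\mathbb{F} = \mathbb{C}$ and let

\[ g(\mathbf{u}) = \frac{f(\mathbf{u}) + \overline{f(\mathbf{u})}}{2} \quad\text{and}\quad h(\mathbf{u}) = \frac{f(\mathbf{u}) - \overline{f(\mathbf{u})}}{2i} \]

denote the real and imaginary parts of $f(\mathbf{u})$, respectively. Then

\[ f(\mathbf{u}) = g(\mathbf{u}) + i h(\mathbf{u}), \]

and it is readily checked that

\[ g(\alpha \mathbf{u}_1 + \beta \mathbf{u}_2) = \alpha g(\mathbf{u}_1) + \beta g(\mathbf{u}_2) \quad\text{and}\quad h(\alpha \mathbf{u}_1 + \beta \mathbf{u}_2) = \alpha h(\mathbf{u}_1) + \beta h(\mathbf{u}_2) \]

for every choice of $\mathbf{u}_1, \mathbf{u}_2 \in \mathcal{U}$ and $\alpha, \beta \in \mathbb{R}$. Moreover,

\[ g(\mathbf{u}) \le |f(\mathbf{u})| \le p(\mathbf{u}) \quad\text{for every } \mathbf{u} \in \mathcal{U}. \]

Let $\mathbf{v}_1 \in \mathcal{X} \setminus \mathcal{U}$ and suppose that there exists a real-valued function $G(\mathbf{u})$ such that

\[ G(\mathbf{u} + \alpha \mathbf{v}_1) = G(\mathbf{u}) + \alpha G(\mathbf{v}_1) = g(\mathbf{u}) + \alpha G(\mathbf{v}_1) \quad\text{and}\quad G(\mathbf{u} + \alpha \mathbf{v}_1) \le p(\mathbf{u} + \alpha \mathbf{v}_1) \]

for every choice of $\mathbf{u} \in \mathcal{U}$ and $\alpha \in \mathbb{R}$. Then

\[ g(\mathbf{u}) + \alpha G(\mathbf{v}_1) \le p(\mathbf{u} + \alpha \mathbf{v}_1). \]

Thus, if $\alpha > 0$, then

\[ g(\mathbf{u}) + \alpha G(\mathbf{v}_1) \le p\Big( \alpha \Big\{ \frac{\mathbf{u}}{\alpha} + \mathbf{v}_1 \Big\} \Big) = \alpha \, p\Big( \frac{\mathbf{u}}{\alpha} + \mathbf{v}_1 \Big) \]

and

\[ G(\mathbf{v}_1) \le p\Big( \frac{\mathbf{u}}{\alpha} + \mathbf{v}_1 \Big) - g\Big( \frac{\mathbf{u}}{\alpha} \Big) \]

for every $\mathbf{u} \in \mathcal{U}$; i.e.,

\[ G(\mathbf{v}_1) \le p(\mathbf{y} + \mathbf{v}_1) - g(\mathbf{y}) \tag{7.46} \]

for every $\mathbf{y} \in \mathcal{U}$. On the other hand, if $\alpha < 0$, then

\[ g(\mathbf{u}) + \alpha G(\mathbf{v}_1) = g(\mathbf{u}) - |\alpha| G(\mathbf{v}_1) = G(\mathbf{u} - |\alpha| \mathbf{v}_1) \le p(\mathbf{u} - |\alpha| \mathbf{v}_1) = |\alpha| \, p\Big( \frac{\mathbf{u}}{|\alpha|} - \mathbf{v}_1 \Big), \]

and hence

\[ G(\mathbf{v}_1) \ge g\Big( \frac{\mathbf{u}}{|\alpha|} \Big) - p\Big( \frac{\mathbf{u}}{|\alpha|} - \mathbf{v}_1 \Big) \]

for every $\mathbf{u} \in \mathcal{U}$; i.e.,

\[ G(\mathbf{v}_1) \ge g(\mathbf{x}) - p(\mathbf{x} - \mathbf{v}_1) \tag{7.47} \]

for every $\mathbf{x} \in \mathcal{U}$. Thus, in order for such an extension to exist, the number $G(\mathbf{v}_1)$ must meet the two sets of inequalities (7.46) and (7.47). Fortunately, this is possible: The inequality

\[ g(\mathbf{x}) + g(\mathbf{y}) = g(\mathbf{x} + \mathbf{y}) \le p(\mathbf{x} + \mathbf{y}) = p(\mathbf{x} - \mathbf{v}_1 + \mathbf{y} + \mathbf{v}_1) \le p(\mathbf{x} - \mathbf{v}_1) + p(\mathbf{y} + \mathbf{v}_1) \]

implies that

\[ g(\mathbf{x}) - p(\mathbf{x} - \mathbf{v}_1) \le p(\mathbf{y} + \mathbf{v}_1) - g(\mathbf{y}) \]

for every pair of vectors $\mathbf{x}, \mathbf{y} \in \mathcal{U}$ and hence that

\[ \sup\{g(\mathbf{x}) - p(\mathbf{x} - \mathbf{v}_1) : \mathbf{x} \in \mathcal{U}\} \le \inf\{p(\mathbf{y} + \mathbf{v}_1) - g(\mathbf{y}) : \mathbf{y} \in \mathcal{U}\}. \tag{7.48} \]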

The next step is to extend $f(\mathbf{u})$. This is facilitated by the observation that

\[ g(i\mathbf{u}) + i h(i\mathbf{u}) = f(i\mathbf{u}) = i f(\mathbf{u}) = i g(\mathbf{u}) - h(\mathbf{u}), \]

which implies that $h(\mathbf{u}) = -g(i\mathbf{u})$ and hence that

\[ f(\mathbf{u}) = g(\mathbf{u}) - i g(i\mathbf{u}). \]

This suggests that

\[ F(\mathbf{x}) = G(\mathbf{x}) - i G(i\mathbf{x}) \]

might be a reasonable choice for the extension of $f(\mathbf{u})$, if it’s a linear functional with respect to $\mathbb{C}$ that meets the requisite bound. It’s clear that

\[ F(\mathbf{x} + \mathbf{y}) = F(\mathbf{x}) + F(\mathbf{y}). \]

However, it’s not clear that $F(\alpha \mathbf{x}) = \alpha F(\mathbf{x})$ for every point $\alpha \in \mathbb{C}$, since $G(\alpha \mathbf{x}) = \alpha G(\mathbf{x})$ only for $\alpha \in \mathbb{R}$. To verify this, let $\alpha = a + ib$ with $a, b \in \mathbb{R}$. Then

\[ F((a + ib)\mathbf{y}) = G((a + ib)\mathbf{y}) - i G(i(a + ib)\mathbf{y}) = a G(\mathbf{y}) + b G(i\mathbf{y}) - i a G(i\mathbf{y}) + i b G(\mathbf{y}) = (a + ib) G(\mathbf{y}) - i (a + ib) G(i\mathbf{y}) = (a + ib) F(\mathbf{y}). \]

Moreover, if $F(\mathbf{y}) \ne 0$, then upon writing the complex number $F(\mathbf{y})$ in polar coordinates as $F(\mathbf{y}) = e^{i\theta} |F(\mathbf{y})|$, it follows that

\[ |F(\mathbf{y})| = F(e^{-i\theta} \mathbf{y}) = G(e^{-i\theta} \mathbf{y}) \le p(e^{-i\theta} \mathbf{y}) = |e^{-i\theta}| \, p(\mathbf{y}) = p(\mathbf{y}). \]

Therefore, since $F(\mathbf{0}) = 0 = p(\mathbf{0})$, the inequality

\[ |F(\mathbf{y})| \le p(\mathbf{y}) \]

holds for every vector $\mathbf{y} \in \mathcal{U} + \{\alpha \mathbf{v}_1 : \alpha \in \mathbb{C}\}$. This completes the proof for a one dimensional extension. The procedure can be repeated until the extension is defined on the full finite dimensional space $\mathcal{X}$ over $\mathbb{C}$. The proof for the case $\mathbb{F} = \mathbb{R}$ is easily extracted from the preceding analysis and is left to the reader as an exercise. □

Exercise 7.33. Verify Theorem 7.30 for the case $\mathbb{F} = \mathbb{R}$. [HINT: The key is to verify (7.48) with $f = g$.]

Exercise 7.34. Show that if $\mathcal{X}$ is a finite dimensional normed linear space over $\mathbb{R}$, then Theorem 7.30 remains valid if $p(\mathbf{x})$ is only assumed to be a sublinear functional on $\mathcal{X}$; i.e., if for every choice of $\mathbf{x}, \mathbf{y} \in \mathcal{X}$, $p(\mathbf{x})$ satisfies the constraints (1) $\infty > p(\mathbf{x}) \ge 0$; (2) $p(\mathbf{x} + \mathbf{y}) \le p(\mathbf{x}) + p(\mathbf{y})$; (3) $p(\alpha \mathbf{x}) = \alpha p(\mathbf{x})$ for all $\alpha > 0$.

7.12. Banach spaces

A normed linear space $\mathcal{U}$ over $\mathbb{F}$ is said to be a Banach space over $\mathbb{F}$ if every Cauchy sequence $\mathbf{v}_1, \mathbf{v}_2, \ldots$ in $\mathcal{U}$ tends to a limit $\mathbf{v} \in \mathcal{U}$, i.e., if there exists a vector $\mathbf{v} \in \mathcal{U}$ such that $\lim_{n \uparrow \infty} \|\mathbf{v}_n - \mathbf{v}\|_{\mathcal{U}} = 0$. In this section we shall show that finite dimensional normed linear spaces are automatically Banach spaces. Not all normed linear spaces are Banach spaces.

Exercise 7.35. Let $\mathcal{U}$ be the space of continuous real-valued functions $f(x)$ on the interval $0 \le x \le 1$ equipped with the norm

\[ \|f\|_{\mathcal{U}} = \int_0^1 |f(x)| \, dx. \]

Show that $\mathcal{U}$ is not a Banach space.

Remark 7.31. The vector space $\mathcal{U}$ considered in Exercise 7.35 is a Banach space with respect to the norm $\|f\|_{\mathcal{U}} = \max\{|f(x)| : 0 \le x \le 1\}$. This is a consequence of the Ascoli-Arzelà theorem; see e.g., [91]. Thus, norms in infinite dimensional normed linear spaces are not necessarily equivalent.

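Exercise 7.36. Let $\mathbf{u}_1, \ldots, \mathbf{u}_\ell$ be a basis for a normed linear space $\mathcal{U}$ over $\mathbb{F}$. Show that the functional

\[ \varphi\Big( \sum_{j=1}^{\ell} c_j \mathbf{u}_j \Big) = \sum_{j=1}^{\ell} |c_j| \]

defines a norm on $\mathcal{U}$.

Theorem 7.32. Every finite dimensional normed linear space $\mathcal{U}$ over $\mathbb{F}$ is automatically a Banach space over $\mathbb{F}$.

Proof. Let $\mathbf{u}_1, \ldots, \mathbf{u}_\ell$ be a basis for $\mathcal{U}$ and let $\{\mathbf{v}_j\}_{j=1}^{\infty}$ be a Cauchy sequence in $\mathcal{U}$. Then, for every $\varepsilon > 0$ there exists a positive integer $N$ such that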

IIVn+k—Vn||u d,- as n T 00. Moreover, if e V: E

diuia

i=1

then the inequalities

MW-u S wv—va e =

90 (2(Cm — dam) i=1

E

Z ICm — dil

clearly imply that “V — alu —> 0 as n T 00.

D


7.13. Bibliographical notes Exercise 7.23 was motivated by a private communication by Eric de Sturler, who pointed out an oversight in the first edition.

Chapter 8

Inner product spaces and orthogonality

A proof should be as simple as possible, but no simpler.

Paraphrase of Albert Einstein’s remark on deep truths

In this chapter we shall first introduce the notion of an inner product space and characterize its essential features. We then define orthogonality and study projections, orthogonal projections and related applications, including methods of orthogonalization and Gaussian quadrature.

8.1. Inner product spaces

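A vector space $\mathcal{U}$ over $\mathbb{F}$ is said to be an inner product space if there is a number $\langle \mathbf{u}, \mathbf{v} \rangle_{\mathcal{U}} \in \mathbb{F}$ associated with every pair of vectors $\mathbf{u}, \mathbf{v} \in \mathcal{U}$ such that:

(1) $\langle \mathbf{u} + \mathbf{w}, \mathbf{v} \rangle_{\mathcal{U}} = \langle \mathbf{u}, \mathbf{v} \rangle_{\mathcal{U}} + \langle \mathbf{w}, \mathbf{v} \rangle_{\mathcal{U}}$ for every $\mathbf{w} \in \mathcal{U}$.

(2) $\langle \alpha \mathbf{u}, \mathbf{v} \rangle_{\mathcal{U}} = \alpha \langle \mathbf{u}, \mathbf{v} \rangle_{\mathcal{U}}$ for every $\alpha \in \mathbb{F}$.

(3) $\langle \mathbf{u}, \mathbf{v} \rangle_{\mathcal{U}} = \overline{\langle \mathbf{v}, \mathbf{u} \rangle_{\mathcal{U}}}$.

(4) $\langle \mathbf{u}, \mathbf{u} \rangle_{\mathcal{U}} \ge 0$ with equality if and only if $\mathbf{u} = \mathbf{0}$.

The number $\langle \mathbf{u}, \mathbf{v} \rangle_{\mathcal{U}}$ is termed the inner product. Items (1) and (2) imply that the inner product is linear in the first entry. Together with (3), they imply that it is additive in the second entry, i.e.,

\[ \langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle_{\mathcal{U}} = \langle \mathbf{u}, \mathbf{v} \rangle_{\mathcal{U}} + \langle \mathbf{u}, \mathbf{w} \rangle_{\mathcal{U}}; \]

however,

\[ \langle \mathbf{u}, \beta \mathbf{v} \rangle_{\mathcal{U}} = \overline{\beta} \, \langle \mathbf{u}, \mathbf{v} \rangle_{\mathcal{U}}. \]

When the underlying inner product space $\mathcal{U}$ is clear from the context, the inner product is often denoted simply as $\langle \mathbf{u}, \mathbf{v} \rangle$ instead of $\langle \mathbf{u}, \mathbf{v} \rangle_{\mathcal{U}}$.

Exercise 8.1. Let $\mathcal{U}$ be an inner product space over $\mathbb{F}$ and let $\mathbf{u} \in \mathcal{U}$. Show that

\[ \langle \mathbf{u}, \mathbf{v} \rangle = 0 \ \text{ for every } \mathbf{v} \in \mathcal{U} \iff \mathbf{u} = \mathbf{0} \]

and (consequently)

\[ \langle \mathbf{u}_1, \mathbf{v} \rangle = \langle \mathbf{u}_2, \mathbf{v} \rangle \ \text{ for every } \mathbf{v} \in \mathcal{U} \iff \mathbf{u}_1 = \mathbf{u}_2. \]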

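The symbol $\langle \mathbf{x}, \mathbf{y} \rangle_{st}$, which is defined for $\mathbf{x}, \mathbf{y} \in \mathbb{F}^n$ by the formula

\[ \langle \mathbf{x}, \mathbf{y} \rangle_{st} = \mathbf{y}^H \mathbf{x} = \sum_{i=1}^n \overline{y_i} \, x_i, \tag{8.1} \]

will be used on occasion to denote the standard inner product on $\mathbb{F}^n$. The conjugation in this formula can be dropped if $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. It is important to bear in mind that there are many other inner products that can be imposed on $\mathbb{F}^n$:

Exercise 8.2. Show that if $B \in \mathbb{C}^{p \times q}$ and $\operatorname{rank} B = q$, then the formula

\[ \langle \mathbf{x}, \mathbf{y} \rangle = (B\mathbf{y})^H B\mathbf{x} \tag{8.2} \]

defines an inner product on $\mathbb{C}^q$.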
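The inner product of Exercise 8.2 can be explored numerically before it is verified in general. The sketch below is an illustration only: the $3 \times 2$ matrix `B` (with linearly independent columns) and the vectors `x`, `y` are arbitrary choices, and `matvec` and `inner_B` are hypothetical helper names.

```python
def matvec(B, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(b * xi for b, xi in zip(row, x)) for row in B]

def inner_B(B, x, y):
    """<x, y> = (By)^H (Bx): linear in x, conjugate linear in y."""
    Bx, By = matvec(B, x), matvec(B, y)
    return sum(complex(u) * complex(v).conjugate() for u, v in zip(Bx, By))

B = [[1, 0], [1j, 2], [0, 1]]        # 3 x 2 with rank 2
x = [1 + 2j, -1j]
y = [3, 1 - 1j]
beta = 2 - 3j

xy = inner_B(B, x, y)
yx = inner_B(B, y, x)
xx = inner_B(B, x, x)
scaled_first = inner_B(B, [beta * xi for xi in x], y)    # expect beta * <x, y>
scaled_second = inner_B(B, x, [beta * yi for yi in y])   # expect conj(beta) * <x, y>
```

The quantities computed here correspond to conjugate symmetry, homogeneity in each entry and positivity; the rank condition on $B$ is what guarantees that $\langle \mathbf{x}, \mathbf{x} \rangle = 0$ only for $\mathbf{x} = \mathbf{0}$.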

Lemma 8.1. (The Cauchy—Schwarz inequality for inner products) Let L! be an inner product space over (C with inner product (u,v) for every pair of vectors u, v E M. Then

(83)

|| S {(11, u)}1/2{}1/2{}1/2 , which forces at least one of the vectors u,v to be equal to the vector 0, and hence the two vectors are linearly dependent.


(2) If (u,v) 7E 0, then, in polar coordinates,

(u,v) = |

r 2

r2

0

for every choice of x E R. (The condition (11, v) 75 0 insures that V 7E 0 and hence permits us to divide by b.) Thus, upon choosing x=——, b we conclude that

2

a—%=u = 0. o Orthogonal family: A set of nonzero vectors {u1, . . .,uk} in U is said to be an orthogonal family if

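\[ \langle \mathbf{u}_i, \mathbf{u}_j \rangle_{\mathcal{U}} = 0 \quad\text{for } i \ne j. \]

The assumption that none of the vectors $\mathbf{u}_1, \ldots, \mathbf{u}_k$ are equal to $\mathbf{0}$ serves to guarantee that they are automatically linearly independent.

• Orthonormal family: A set of vectors $\mathbf{u}_1, \ldots, \mathbf{u}_k$ in $\mathcal{U}$ is said to be an orthonormal family if

(1) it is an orthogonal family and

(2) the vectors $\mathbf{u}_i$, $i = 1, \ldots, k$, are all normalized to have unit length, i.e.,

\[ \|\mathbf{u}_i\|_{\mathcal{U}}^2 = \langle \mathbf{u}_i, \mathbf{u}_i \rangle_{\mathcal{U}} = 1, \quad i = 1, \ldots, k. \]

• Orthogonal decomposition: A pair of subspaces $\mathcal{V}$ and $\mathcal{W}$ of $\mathcal{U}$ is said to form an orthogonal decomposition of $\mathcal{U}$ if

(1) $\mathcal{V} + \mathcal{W} = \mathcal{U}$,

(2) $\langle \mathbf{v}, \mathbf{w} \rangle_{\mathcal{U}} = 0$ for every $\mathbf{v} \in \mathcal{V}$ and $\mathbf{w} \in \mathcal{W}$.

Orthogonal decompositions will be indicated by the symbol $\mathcal{U} = \mathcal{V} \oplus \mathcal{W}$.

• Orthogonal complement: If $\mathcal{V}$ is a subspace of an inner product space $\mathcal{U}$ over $\mathbb{F}$, then the set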

Vi={uEZ/{:u = dHGc = (Gc, (1)“.

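Suppose first that $G$ is invertible and that $\sum_{j=1}^k c_j \mathbf{v}_j = \mathbf{0}$ for some choice of $c_1, \ldots, c_k \in \mathbb{F}$. Then, in view of formula (8.8),

\[ 0 = \Big\langle \sum_{j=1}^k c_j \mathbf{v}_j, \sum_{i=1}^k d_i \mathbf{v}_i \Big\rangle_{\mathcal{U}} = \mathbf{d}^H G \mathbf{c} = \langle G\mathbf{c}, \mathbf{d} \rangle_{st} \]

for every choice of $d_1, \ldots, d_k \in \mathbb{F}$. Therefore, $G\mathbf{c} = \mathbf{0}$, which in turn implies that $\mathbf{c} = \mathbf{0}$, since $G$ is invertible. Thus, the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are linearly independent.

Suppose next that the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are linearly independent and that $\mathbf{c} \in \mathcal{N}_G$. Then, by formula (8.8),

\[ \Big\langle \sum_{j=1}^k c_j \mathbf{v}_j, \sum_{i=1}^k c_i \mathbf{v}_i \Big\rangle_{\mathcal{U}} = \mathbf{c}^H G \mathbf{c} = 0. \]

Therefore, $\sum_{j=1}^k c_j \mathbf{v}_j = \mathbf{0}$ and hence, in view of the presumed linear independence, $c_1 = \cdots = c_k = 0$. Thus, $G$ is invertible. □

Exercise 8.12. Verify the assertions in (8.7).

Exercise 8.13. Verify formula (8.8).

Lemma 8.4 can be strengthened: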

Theorem 8.5. Let $\mathcal{U}$ be an inner product space over $\mathbb{F}$ and let $G$ denote the Gram matrix of a set of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ in $\mathcal{U}$. Then

\[ \operatorname{rank} G = \dim \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}. \tag{8.9} \]

Proof. Let $\mathcal{V} = \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ and set $r = \dim \mathcal{V}$. In view of Lemma 8.4, it suffices to focus on the case that $1 \le r < k$. Without loss of generality, we may assume that the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_r$ are linearly independent. Then, by another application of Lemma 8.4, the upper left hand $r \times r$ submatrix of $G$ is invertible and hence $\operatorname{rank} G \ge r$, i.e., $\operatorname{rank} G \ge \dim \mathcal{V}$.

Conversely, if $\dim \mathcal{R}_G = r$, then, by reindexing the vectors if need be, we can assume that the first $r$ columns of $G$ are a basis for $\mathcal{R}_G$. Then for each integer $i$ with $r < i \le k$, the $i$'th column is a linear combination of the first $r$ columns. Thus, as $G = G^H$, the $i$'th row is also a linear combination of the first $r$ rows. Consequently, the upper left $r \times r$ submatrix of $G$ is invertible. Therefore, $\mathbf{v}_1, \ldots, \mathbf{v}_r$ are linearly independent, i.e., $\operatorname{rank} G \le \dim \mathcal{V}$. This completes the proof. □
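Formula (8.9) is easy to observe on a concrete example. The following is a minimal sketch, assuming real vectors with the standard inner product and exact rational arithmetic; the three vectors (the third is the sum of the first two) and the helper names `gram` and `rank` are illustrative choices.

```python
from fractions import Fraction

def gram(vectors):
    """Gram matrix with entries g_ij = <v_j, v_i> for real vectors (standard inner product)."""
    return [[sum(Fraction(a) * Fraction(b) for a, b in zip(vj, vi)) for vj in vectors]
            for vi in vectors]

def rank(M):
    """Rank computed by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

vectors = [[1, 0, 1], [0, 1, 1], [1, 1, 2]]   # v3 = v1 + v2, so the span has dimension 2
G = gram(vectors)
rank_G = rank(G)
dim_span = rank(vectors)                      # row rank of the matrix whose rows are the v_j
```

Both `rank_G` and `dim_span` equal 2 in this example, in agreement with Theorem 8.5.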

Exercise 8.14. Show that if $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$ is a basis for the space $\mathcal{V}$ introduced in the proof of Theorem 8.5 and $1 \le r < k$, then there exists a matrix $A \in \mathbb{C}^{s \times r}$ with $s = k - r$ such that the columns of $[A \ \ I_s]^T$ are linearly independent and belong to $\mathcal{N}_G$.

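8.5. Projections and direct sum decompositions

• Projections: A linear transformation $P$ of a vector space $\mathcal{U}$ over $\mathbb{F}$ into itself is said to be a projection if $P$ is idempotent, i.e., if $P^2 = P$.

Lemma 8.6. Let $P$ be a projection of a vector space $\mathcal{U}$ over $\mathbb{F}$ into itself, and let

\[ \mathcal{R}_P = \{P\mathbf{x} : \mathbf{x} \in \mathcal{U}\} \quad\text{and let}\quad \mathcal{N}_P = \{\mathbf{x} \in \mathcal{U} : P\mathbf{x} = \mathbf{0}\}. \]

Then

\[ \mathcal{U} = \mathcal{R}_P \dotplus \mathcal{N}_P. \tag{8.10} \]

Proof. Let $\mathbf{x} \in \mathcal{U}$. Then clearly

\[ \mathbf{x} = P\mathbf{x} + (I - P)\mathbf{x} \]

and $P\mathbf{x} \in \mathcal{R}_P$. Moreover, $(I - P)\mathbf{x} \in \mathcal{N}_P$, since

\[ P(I - P)\mathbf{x} = (P - P^2)\mathbf{x} = (P - P)\mathbf{x} = \mathbf{0}. \]

Thus, $\mathcal{U} = \mathcal{R}_P + \mathcal{N}_P$. The sum is direct because

\[ \mathbf{y} \in \mathcal{R}_P \iff \mathbf{y} = P\mathbf{y} \quad\text{and}\quad \mathbf{y} \in \mathcal{N}_P \iff P\mathbf{y} = \mathbf{0}. \]

□

Lemma 8.6 exhibits $\mathcal{U}$ as the direct sum of the spaces $\mathcal{V} = \mathcal{R}_P$ and $\mathcal{W} = \mathcal{N}_P$ that are defined in terms of a given projection $P$. Conversely, every direct sum decomposition $\mathcal{U} = \mathcal{V} \dotplus \mathcal{W}$ defines a projection $P$ on $\mathcal{U}$ with $\mathcal{V} = \mathcal{R}_P$ and $\mathcal{W} = \mathcal{N}_P$.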

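Lemma 8.7. Let $\mathcal{V}$ and $\mathcal{W}$ be subspaces of a vector space $\mathcal{U}$ over $\mathbb{F}$ and suppose that $\mathcal{U} = \mathcal{V} \dotplus \mathcal{W}$. Then for every vector $\mathbf{u} \in \mathcal{U}$ there exists exactly one vector $\mathbf{u}' \in \mathcal{V}$ such that $\mathbf{u} - \mathbf{u}' \in \mathcal{W}$. The transformation $P_{\mathcal{V}}$ that maps $\mathbf{u} \in \mathcal{U}$ into $\mathbf{u}' \in \mathcal{V}$ has the following properties:

(1) $P_{\mathcal{V}} \mathbf{v} = \mathbf{v}$ for every $\mathbf{v} \in \mathcal{V}$.

(2) $P_{\mathcal{V}} \mathbf{w} = \mathbf{0}$ for every $\mathbf{w} \in \mathcal{W}$.

(3) $P_{\mathcal{V}}$ is linear on $\mathcal{U}$ and $P_{\mathcal{V}}^2 = P_{\mathcal{V}}$.

(4) $\mathcal{V} = \mathcal{R}_{P_{\mathcal{V}}}$ and $\mathcal{W} = \mathcal{N}_{P_{\mathcal{V}}}$.

Proof. The fact that for each $\mathbf{u} \in \mathcal{U}$ there exists exactly one vector $\mathbf{u}' \in \mathcal{V}$ such that $\mathbf{u} - \mathbf{u}' \in \mathcal{W}$ is immediate from the definition of a direct sum decomposition. Items (1) and (2) then follow from the decompositions $\mathbf{v} = \mathbf{v} + \mathbf{0}$ and $\mathbf{w} = \mathbf{0} + \mathbf{w}$, respectively. Items (3) and (4) are left to the reader as exercises. □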
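The dependence of the projection on the complementary space can be seen in a small computation. The sketch below assumes $\mathcal{U} = \mathbb{R}^2$ with $\mathcal{V} = \operatorname{span}\{\mathbf{v}\}$ and $\mathcal{W} = \operatorname{span}\{\mathbf{w}\}$; the vectors and the helper name `project_along` are illustrative choices, and the coefficient of $\mathbf{v}$ is obtained from Cramer's rule.

```python
from fractions import Fraction

def project_along(u, v, w):
    """Projection of u onto span{v} along span{w} in R^2 (v and w must form a basis)."""
    det = v[0] * w[1] - v[1] * w[0]
    if det == 0:
        raise ValueError("v and w are linearly dependent")
    # Solve u = a*v + b*w for a by Cramer's rule; the projection is a*v.
    a = Fraction(u[0] * w[1] - u[1] * w[0], det)
    return [a * v[0], a * v[1]]

v = [1, 0]
u = [4, 1]
p1 = project_along(u, v, [0, 1])    # complementary space span{(0, 1)}
p2 = project_along(u, v, [1, 2])    # complementary space span{(1, 2)}
p2_again = project_along(p2, v, [1, 2])
```

The two projections of the same vector onto the same space differ because the complementary spaces differ, and applying the second projection twice returns the same vector, as idempotence requires.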

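Exercise 8.15. Verify assertions (3) and (4) in Lemma 8.7.

Exercise 8.16. Let $\{\mathbf{v}, \mathbf{w}\}$ be a basis for a vector space $\mathcal{U}$ over $\mathbb{F}$. Find the projection $P_{\mathcal{V}} \mathbf{u}$ of the vector $\mathbf{u} = 2\mathbf{v} + 3\mathbf{w}$ onto the space $\mathcal{V}$ with respect to each of the following direct sum decompositions: $\mathcal{U} = \mathcal{V} \dotplus \mathcal{W}$ and $\mathcal{U} = \mathcal{V} \dotplus \mathcal{W}_1$, when $\mathcal{V} = \operatorname{span}\{\mathbf{v}\}$, $\mathcal{W} = \operatorname{span}\{\mathbf{w}\}$ and $\mathcal{W}_1 = \operatorname{span}\{\mathbf{w} + \mathbf{v}\}$.

It is important to keep in mind that $P_{\mathcal{V}}$ depends upon both $\mathcal{V}$ and the complementary space $\mathcal{W}$.

Exercise 8.17. Let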

[111

112

113

114

115

116]:

1 2 1

1

1

2

3

4

O 1

4 1

1 0

5 1

0 0

0

1

—1

0

—1

1

,

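and let $\mathcal{U} = \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{u}_4\}$, $\mathcal{V} = \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$, $\mathcal{W}_1 = \operatorname{span}\{\mathbf{u}_4\}$ and $\mathcal{W}_2 = \operatorname{span}\{\mathbf{u}_5\}$.

(a) Find a basis for the vector space $\mathcal{V}$.

(b) Show that $\mathcal{U} = \mathcal{V} \dotplus \mathcal{W}_1$ and $\mathcal{U} = \mathcal{V} \dotplus \mathcal{W}_2$.

(c) Find the projection of the vector $\mathbf{u}_6$ onto the space $\mathcal{V}$ with respect to the first direct sum decomposition.

(d) Find the projection of the vector $\mathbf{u}_6$ onto the space $\mathcal{V}$ with respect to the second direct sum decomposition.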

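8.6. Orthogonal projections

• Orthogonal projections: A linear transformation $P$ of an inner product space $\mathcal{U}$ over $\mathbb{F}$ into itself is said to be an orthogonal projection if

\[ P^2 = P \quad\text{and}\quad \langle P\mathbf{u}, \mathbf{v} \rangle_{\mathcal{U}} = \langle \mathbf{u}, P\mathbf{v} \rangle_{\mathcal{U}} \]

for every pair of vectors $\mathbf{u}, \mathbf{v} \in \mathcal{U}$, i.e., if $P$ is idempotent and (in terms of future notation) selfadjoint with respect to the given inner product.

Exercise 8.18. Let $\mathbf{u}_1$ and $\mathbf{u}_2$ be a pair of orthonormal vectors in an inner product space $\mathcal{U}$ over $\mathbb{F}$ and let $\alpha \in \mathbb{F}$. Show that the transformation $P$ that is defined by the formula $P\mathbf{u} = \langle \mathbf{u}, \mathbf{u}_1 + \alpha \mathbf{u}_2 \rangle_{\mathcal{U}} \, \mathbf{u}_1$ is a projection but is not an orthogonal projection unless $\alpha = 0$.

Lemma 8.8. Let $P$ be a projection of an inner product space $\mathcal{U}$ over $\mathbb{F}$ into itself and let

\[ \mathcal{V} = \mathcal{R}_P = \{P\mathbf{u} : \mathbf{u} \in \mathcal{U}\} \quad\text{and}\quad \mathcal{W} = \mathcal{N}_P = \{\mathbf{u} \in \mathcal{U} : P\mathbf{u} = \mathbf{0}\}. \]

Then $\mathcal{U} = \mathcal{V} \dotplus \mathcal{W}$. Moreover,

\[ \langle \mathbf{v}, \mathbf{w} \rangle_{\mathcal{U}} = 0 \quad\text{for every choice of } \mathbf{v} \in \mathcal{V} \text{ and } \mathbf{w} \in \mathcal{W} \tag{8.11} \]

if and only if

\[ \langle P\mathbf{x}, \mathbf{y} \rangle_{\mathcal{U}} = \langle \mathbf{x}, P\mathbf{y} \rangle_{\mathcal{U}} \quad\text{for every choice of } \mathbf{x}, \mathbf{y} \in \mathcal{U}. \tag{8.12} \]

Proof. Lemma 8.6 guarantees that $\mathcal{U} = \mathcal{V} \dotplus \mathcal{W}$. To verify that (8.11) is equivalent to (8.12), suppose first that (8.11) is in force and write $P_{\mathcal{V}}$ in place of $P$. Then, since $P_{\mathcal{V}} \mathbf{u} \in \mathcal{V}$ and $(I - P_{\mathcal{V}})\mathbf{u} \in \mathcal{W}$ for every vector $\mathbf{u} \in \mathcal{U}$,

\[ \langle P_{\mathcal{V}} \mathbf{x}, \mathbf{y} \rangle_{\mathcal{U}} = \langle P_{\mathcal{V}} \mathbf{x}, P_{\mathcal{V}} \mathbf{y} \rangle_{\mathcal{U}} + \langle P_{\mathcal{V}} \mathbf{x}, (I - P_{\mathcal{V}}) \mathbf{y} \rangle_{\mathcal{U}} = \langle P_{\mathcal{V}} \mathbf{x}, P_{\mathcal{V}} \mathbf{y} \rangle_{\mathcal{U}} \]

and

\[ \langle \mathbf{x}, P_{\mathcal{V}} \mathbf{y} \rangle_{\mathcal{U}} = \langle P_{\mathcal{V}} \mathbf{x}, P_{\mathcal{V}} \mathbf{y} \rangle_{\mathcal{U}} + \langle (I - P_{\mathcal{V}}) \mathbf{x}, P_{\mathcal{V}} \mathbf{y} \rangle_{\mathcal{U}} = \langle P_{\mathcal{V}} \mathbf{x}, P_{\mathcal{V}} \mathbf{y} \rangle_{\mathcal{U}} \]

for every choice of $\mathbf{x}, \mathbf{y} \in \mathcal{U}$, i.e., (8.11) $\Longrightarrow$ (8.12). Conversely, if (8.12) is in force and $\mathbf{v} \in \mathcal{V}$ and $\mathbf{w} \in \mathcal{W}$, then

\[ \langle \mathbf{v}, \mathbf{w} \rangle = \langle P_{\mathcal{V}} \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{v}, P_{\mathcal{V}} \mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{0} \rangle = 0. \]

□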

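Exercise 8.19. Show that if $P_{\mathcal{V}}^2 = P_{\mathcal{V}}$ and (8.12) holds, then

\[ \langle P_{\mathcal{V}} \mathbf{x}, \mathbf{y} \rangle_{\mathcal{U}} = \langle P_{\mathcal{V}} \mathbf{x}, P_{\mathcal{V}} \mathbf{y} \rangle_{\mathcal{U}} \tag{8.13} \]

for every choice of $\mathbf{x}, \mathbf{y} \in \mathcal{U}$.

The next result is an analogue of Lemma 8.7 for orthogonal projections that also includes a recipe for calculating the projection. It is formulated in terms of one subspace $\mathcal{V}$ of the underlying inner product space $\mathcal{U}$ rather than in terms of a pair of complementary subspaces $\mathcal{V}$ and $\mathcal{W}$. This is possible because the second space $\mathcal{W}$ is implicitly specified as the orthogonal complement $\mathcal{V}^\perp$ of $\mathcal{V}$, i.e.,

\[ \mathcal{W} = \mathcal{V}^\perp = \{\mathbf{u} \in \mathcal{U} : \langle \mathbf{v}, \mathbf{u} \rangle_{\mathcal{U}} = 0 \ \text{ for every } \mathbf{v} \in \mathcal{V}\}. \tag{8.14} \]

This orthogonal decomposition is often denoted

\[ \mathcal{U} = \mathcal{V} \oplus \mathcal{V}^\perp. \tag{8.15} \]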

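Lemma 8.9. Let $\mathcal{U}$ be an inner product space over $\mathbb{F}$, let $\mathcal{V}$ be a subspace of $\mathcal{U}$ with basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ and Gram matrix $G$ with entries $g_{ij} = \langle \mathbf{v}_j, \mathbf{v}_i \rangle_{\mathcal{U}}$. Then:

(1) For every vector $\mathbf{u} \in \mathcal{U}$, there exists exactly one vector $\mathbf{u}' \in \mathcal{V}$ such that $\mathbf{u} - \mathbf{u}' \in \mathcal{V}^\perp$. The transformation $P_{\mathcal{V}}$ that maps $\mathbf{u}$ into $\mathbf{u}'$ is given by the formula

\[ P_{\mathcal{V}} \mathbf{u} = \sum_{j=1}^k (G^{-1}\mathbf{b})_j \mathbf{v}_j, \quad\text{where}\quad \mathbf{b} = \begin{bmatrix} \langle \mathbf{u}, \mathbf{v}_1 \rangle_{\mathcal{U}} \\ \vdots \\ \langle \mathbf{u}, \mathbf{v}_k \rangle_{\mathcal{U}} \end{bmatrix}. \tag{8.16} \]

(2) $P_{\mathcal{V}}$ is an orthogonal projection (i.e., $P_{\mathcal{V}}$ is a linear transformation of $\mathcal{U}$ into $\mathcal{U}$ that maps $\mathcal{U}$ onto $\mathcal{V}$ and meets the two conditions $P_{\mathcal{V}} = P_{\mathcal{V}}^2$ and $\langle P_{\mathcal{V}} \mathbf{x}, \mathbf{y} \rangle_{\mathcal{U}} = \langle \mathbf{x}, P_{\mathcal{V}} \mathbf{y} \rangle_{\mathcal{U}}$ for every pair of vectors $\mathbf{x}, \mathbf{y} \in \mathcal{U}$).

(3) $\|\mathbf{u} - \mathbf{v}\|_{\mathcal{U}}^2 \ge \|\mathbf{u} - P_{\mathcal{V}} \mathbf{u}\|_{\mathcal{U}}^2$ for every vector $\mathbf{v} \in \mathcal{V}$, with equality if and only if $\mathbf{v} = P_{\mathcal{V}} \mathbf{u}$.

(4) $\|P_{\mathcal{V}} \mathbf{u}\|_{\mathcal{U}}^2 \le \|\mathbf{u}\|_{\mathcal{U}}^2$, with equality if and only if $\mathbf{u} \in \mathcal{V}$. Moreover,

\[ \|P_{\mathcal{V}} \mathbf{u}\|_{\mathcal{U}}^2 = \mathbf{b}^H G^{-1} \mathbf{b}. \tag{8.17} \]

(5) If $\mathcal{U} = \mathbb{C}^n$ is endowed with the standard inner product and $V = [\mathbf{v}_1 \ \cdots \ \mathbf{v}_k]$, then $G = V^H V$ and

\[ P_{\mathcal{V}} = V (V^H V)^{-1} V^H. \tag{8.18} \]

Proof.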

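The first assertion is equivalent to the claim that there exists exactly one choice of coefficients $c_1, \ldots, c_k \in \mathbb{C}$ such that

\[ \Big\langle \mathbf{u} - \sum_{j=1}^k c_j \mathbf{v}_j, \mathbf{v}_i \Big\rangle_{\mathcal{U}} = 0 \quad\text{for } i = 1, \ldots, k, \]

or, equivalently in terms of the entries in the Gram matrix, that

\[ \langle \mathbf{u}, \mathbf{v}_i \rangle_{\mathcal{U}} = \sum_{j=1}^k g_{ij} c_j \quad\text{for } i = 1, \ldots, k. \]

But this in turn is the same as to say that the vector equation $\mathbf{b} = G\mathbf{c}$ has a unique solution $\mathbf{c} \in \mathbb{C}^k$ with components $c_1, \ldots, c_k$ for each choice of the vector $\mathbf{b}$. However, since $G$ is invertible by Lemma 8.4, this is the case and $P_{\mathcal{V}} \mathbf{u}$ is uniquely specified by formula (8.16). Moreover, this formula clearly displays the fact that $P_{\mathcal{V}}$ is a linear transformation of $\mathcal{U}$ into $\mathcal{V}$, since the vector $\mathbf{b}$ depends linearly on $\mathbf{u}$. The rest of (2) is left to the reader as an exercise.

Next, since

\[ \langle \mathbf{u} - P_{\mathcal{V}} \mathbf{u}, P_{\mathcal{V}} \mathbf{u} - \mathbf{v} \rangle_{\mathcal{U}} = 0 \quad\text{for } \mathbf{v} \in \mathcal{V}, \]

it is readily seen that

\[ \|\mathbf{u} - \mathbf{v}\|_{\mathcal{U}}^2 = \|\mathbf{u} - P_{\mathcal{V}} \mathbf{u} + P_{\mathcal{V}} \mathbf{u} - \mathbf{v}\|_{\mathcal{U}}^2 = \|\mathbf{u} - P_{\mathcal{V}} \mathbf{u}\|_{\mathcal{U}}^2 + \|P_{\mathcal{V}} \mathbf{u} - \mathbf{v}\|_{\mathcal{U}}^2, \tag{8.19} \]

which serves to justify (3).

The inequality in (4) follows from formula (8.19) with $\mathbf{v} = \mathbf{0}$; formula (8.17) is a straightforward calculation (it’s a special case of (8.8)). Finally, (5) follows from (1), since $G = V^H V$ and $\mathbf{b} = V^H \mathbf{u}$ in the given setting. □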
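The recipe in Lemma 8.9 can be carried out by hand or by machine. The following is a minimal sketch, assuming real vectors in $\mathbb{R}^3$ with the standard inner product and exact rational arithmetic; the basis, the vector `u` and the helper names `dot`, `solve` and `orthogonal_projection` are illustrative choices rather than anything prescribed by the text.

```python
from fractions import Fraction

def dot(x, y):
    """Standard inner product of two real vectors."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))

def solve(G, b):
    """Solve G c = b by Gauss-Jordan elimination (G is assumed invertible)."""
    n = len(G)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(G, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
    return [row[n] for row in M]

def orthogonal_projection(u, basis):
    """Projection of u onto span(basis) via the Gram matrix, as in formula (8.16)."""
    G = [[dot(vj, vi) for vj in basis] for vi in basis]   # g_ij = <v_j, v_i>
    b = [dot(u, vi) for vi in basis]                      # b_i = <u, v_i>
    c = solve(G, b)                                       # c = G^{-1} b
    return [sum(cj * vj[k] for cj, vj in zip(c, basis)) for k in range(len(u))], b, c

basis = [[1, 1, 0], [0, 1, 1]]
u = [1, 2, 3]
p, b, c = orthogonal_projection(u, basis)
residual = [Fraction(a) - q for a, q in zip(u, p)]        # u - P u, which should lie in the orthogonal complement
norm_sq_from_formula = dot(b, c)                          # b^H G^{-1} b, as in (8.17)
```

In this example the residual is orthogonal to both basis vectors and the squared norm of the projection agrees with the value given by (8.17).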

Exercise 8.20. Show that in the setting of Lemma 8.9,

\[ \Big\| \mathbf{u} - \sum_{j=1}^k c_j \mathbf{v}_j \Big\|^2 \ge \|\mathbf{u}\|^2 - \mathbf{b}^H G^{-1} \mathbf{b} \tag{8.20} \]

with equality if and only if $c_j = (G^{-1}\mathbf{b})_j$ for $j = 1, \ldots, k$.

Exercise 8.21. Verify directly that the transformation $P_{\mathcal{V}}$ defined by formula (8.18) for $\mathcal{U} = \mathbb{C}^n$ endowed with the standard inner product meets the following conditions:

(1) $(P_{\mathcal{V}})^2 = P_{\mathcal{V}}$.

[HINT: Items (1), (2) and (4) are easy, as is (3), if you take advantage of the fact that $\mathbf{v}_j = V\mathbf{e}_j$, where $\mathbf{e}_j$ is the $j$'th column vector of $I_n$.]

Exercise 8.22. Calculate the norm of the projection $P$ that is defined in Exercise 8.18.

Exercise 8.23. Show that if $P$ is a nonzero projection matrix, then:

(a) $\|P\| = 1$ if $P$ is an orthogonal projection matrix.

(b) $\|P\|$ can be very large if $P$ is not an orthogonal projection matrix.

Exercise 8.24. Find the orthogonal projection of the vector $\mathbf{u}_6$ onto the space $\mathcal{V}$ in the setting of Exercise 8.17.

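8.7. Orthogonal expansions

If the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ that are specified in Lemma 8.9 are orthonormal in $\mathcal{U}$, then the formulas simplify, because $G = I_k$, and the conclusions can be reformulated as follows:

Lemma 8.10. Let $\mathbf{v}_1, \ldots, \mathbf{v}_k$ be an orthonormal set of vectors in an inner product space $\mathcal{U}$ over $\mathbb{F}$ and let $\mathcal{V} = \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$. Then:

(1) The vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are linearly independent.

(2) $\mathcal{V}^\perp = \{\mathbf{u} \in \mathcal{U} : \langle \mathbf{v}_j, \mathbf{u} \rangle_{\mathcal{U}} = 0 \text{ for } j = 1, \ldots, k\}$.

(3) The orthogonal projection $P_{\mathcal{V}} \mathbf{u}$ of a vector $\mathbf{u} \in \mathcal{U}$ onto $\mathcal{V}$ is given by the formula

\[ P_{\mathcal{V}} \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle_{\mathcal{U}} \mathbf{v}_1 + \cdots + \langle \mathbf{u}, \mathbf{v}_k \rangle_{\mathcal{U}} \mathbf{v}_k. \]

(4) $\|P_{\mathcal{V}} \mathbf{u}\|_{\mathcal{U}}^2 = \sum_{j=1}^k |\langle \mathbf{u}, \mathbf{v}_j \rangle_{\mathcal{U}}|^2$ for every vector $\mathbf{u} \in \mathcal{U}$.

(5) (Bessel’s inequality) $\sum_{j=1}^k |\langle \mathbf{u}, \mathbf{v}_j \rangle_{\mathcal{U}}|^2 \le \|\mathbf{u}\|_{\mathcal{U}}^2$, with equality if and only if $\mathbf{u} \in \mathcal{V}$.

Notice that:

(1) It is easy to calculate the coefficients of a vector $\mathbf{v}$ in the span of $\mathbf{v}_1, \ldots, \mathbf{v}_k$ and its norm in terms of these coefficients:

\[ \mathbf{v} = \sum_{j=1}^k c_j \mathbf{v}_j \Longrightarrow c_j = \langle \mathbf{v}, \mathbf{v}_j \rangle_{\mathcal{U}} \quad\text{and}\quad \|\mathbf{v}\|_{\mathcal{U}}^2 = \sum_{j=1}^k |c_j|^2. \]

(2) It is easy to calculate the coefficients of the projection $P_{\mathcal{V}} \mathbf{u}$ of a vector $\mathbf{u}$ onto $\mathcal{V} = \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$:

\[ P_{\mathcal{V}} \mathbf{u} = \sum_{j=1}^k c_j \mathbf{v}_j \Longrightarrow c_j = \langle \mathbf{u}, \mathbf{v}_j \rangle_{\mathcal{U}} \quad\text{and}\quad \|P_{\mathcal{V}} \mathbf{u}\|_{\mathcal{U}}^2 = \sum_{j=1}^k |c_j|^2. \]

(3) The coefficients $c_j$, $j = 1, \ldots, k$, computed in (2) do not change if the space $\mathcal{V}$ is enlarged by adding more orthonormal vectors.

It is important to note that to this point the analysis in this section is applicable to any inner product space. Thus, for example, we may choose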

$\mathcal{U}$ equal to the set of continuous complex valued functions on the interval $[0, 1]$, with inner product

\[ \langle f, g \rangle_{\mathcal{U}} = \int_0^1 f(t) \overline{g(t)} \, dt. \]

Then it is readily checked that the set of functions

\[ \varphi_j(t) = e^{j 2\pi i t}, \quad j = 1, \ldots, k, \]

is an orthonormal family in $\mathcal{U}$ for any choice of the integer $k$. Consequently,

\[ \sum_{j=1}^k \Big| \int_0^1 f(t) \overline{\varphi_j(t)} \, dt \Big|^2 \le \int_0^1 |f(t)|^2 \, dt, \]

by the last item in Lemma 8.10.
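Bessel’s inequality for this family can be checked approximately by numerical integration. The sketch below makes several assumptions for illustration: the sample function $f(t) = t$, the value $k = 5$, and a midpoint rule with 2000 subintervals standing in for the exact integrals; the helper names are hypothetical.

```python
import cmath
import math

def integrate(func, n=2000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    h = 1.0 / n
    return sum(func((m + 0.5) * h) for m in range(n)) * h

def phi(j, t):
    """phi_j(t) = exp(2*pi*i*j*t)."""
    return cmath.exp(2j * math.pi * j * t)

def f(t):
    return t          # sample continuous function (an arbitrary choice)

k = 5
# <f, phi_j> = integral of f(t) * conj(phi_j(t)) over [0, 1]
coeffs = [integrate(lambda t, j=j: f(t) * phi(j, t).conjugate()) for j in range(1, k + 1)]
lhs = sum(abs(c) ** 2 for c in coeffs)          # sum of |<f, phi_j>|^2
rhs = integrate(lambda t: abs(f(t)) ** 2)       # ||f||^2, close to 1/3
```

For this choice of $f$ an integration by parts gives $\langle f, \varphi_j \rangle = i/(2\pi j)$, so the left-hand side stays well below $1/3$ however large $k$ is taken.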

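Exercise 8.25. Show that no matter how large you choose $k$, the family $\varphi_j(t) = e^{j 2\pi i t}$, $j = 1, \ldots, k$, is not a basis for the space $\mathcal{U}$ considered just above.

Lemma 8.11. Let $\mathcal{U}$ be a $k$-dimensional inner product space over $\mathbb{F}$ with inner product $\langle \cdot, \cdot \rangle_{\mathcal{U}}$, and let $\mathbf{u}_1, \ldots, \mathbf{u}_k$ be an orthonormal basis for $\mathcal{U}$. Then

\[ \langle \mathbf{u}, \mathbf{w} \rangle_{\mathcal{U}} = \sum_{j=1}^k \langle \mathbf{u}, \mathbf{u}_j \rangle_{\mathcal{U}} \overline{\langle \mathbf{w}, \mathbf{u}_j \rangle_{\mathcal{U}}} \tag{8.21} \]

and

\[ \langle \mathbf{u}, \mathbf{u} \rangle_{\mathcal{U}} = \sum_{j=1}^k |\langle \mathbf{u}, \mathbf{u}_j \rangle_{\mathcal{U}}|^2 \]

for every choice of $\mathbf{u}$ and $\mathbf{w}$ in $\mathcal{U}$.

Proof. Since the given basis for $\mathcal{U}$ is orthonormal,

\[ \mathbf{u} = \sum_{j=1}^k c_j \mathbf{u}_j \quad\text{and}\quad \mathbf{w} = \sum_{i=1}^k d_i \mathbf{u}_i, \]

where

\[ c_j = \langle \mathbf{u}, \mathbf{u}_j \rangle_{\mathcal{U}} \quad\text{and}\quad d_j = \langle \mathbf{w}, \mathbf{u}_j \rangle_{\mathcal{U}} \quad\text{for } j = 1, \ldots, k. \]

Therefore,

\[ \langle \mathbf{u}, \mathbf{w} \rangle_{\mathcal{U}} = \Big\langle \sum_{j=1}^k c_j \mathbf{u}_j, \sum_{i=1}^k d_i \mathbf{u}_i \Big\rangle_{\mathcal{U}} = \sum_{i,j=1}^k c_j \overline{d_i} \langle \mathbf{u}_j, \mathbf{u}_i \rangle_{\mathcal{U}} = \sum_{i=1}^k c_i \overline{d_i}, \]

by the presumed orthonormality of the basis. The rest is plain. □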

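8.8. The Gram-Schmidt method

Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ be a set of linearly independent vectors in an inner product space $\mathcal{U}$ over $\mathbb{F}$. The Gram-Schmidt procedure is a method for finding a set of orthonormal vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ such that

\[ \mathcal{V}_j = \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_j\} = \operatorname{span}\{\mathbf{u}_1, \ldots, \mathbf{u}_j\} \quad\text{for } j = 1, \ldots, k. \]

The steps are as follows:

(1) Set $\mathbf{v}_1 = \mathbf{u}_1 / \|\mathbf{u}_1\|_{\mathcal{U}}$. Then $\|\mathbf{v}_1\|_{\mathcal{U}} = 1$.

(2) Let $P_{\mathcal{V}_1}$ denote the orthogonal projection onto $\mathcal{V}_1 = \operatorname{span}\{\mathbf{v}_1\}$, set

\[ \mathbf{w}_2 = \mathbf{u}_2 - P_{\mathcal{V}_1} \mathbf{u}_2 = \mathbf{u}_2 - \langle \mathbf{u}_2, \mathbf{v}_1 \rangle_{\mathcal{U}} \mathbf{v}_1 \]

and check that

\[ \langle \mathbf{w}_2, \mathbf{v}_1 \rangle_{\mathcal{U}} = \langle \mathbf{u}_2, \mathbf{v}_1 \rangle_{\mathcal{U}} - \langle \mathbf{u}_2, \mathbf{v}_1 \rangle_{\mathcal{U}} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle_{\mathcal{U}} = 0 \]

but $\|\mathbf{w}_2\|_{\mathcal{U}} \ne 0$, since $\mathbf{u}_2$ and $\mathbf{v}_1$ are linearly independent. Then set $\mathbf{v}_2 = \mathbf{w}_2 / \|\mathbf{w}_2\|_{\mathcal{U}}$ and observe that

\[ \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2\} = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2\} = \mathcal{V}_2. \]

(3) Let $P_{\mathcal{V}_2}$ denote the orthogonal projection onto $\mathcal{V}_2$, set

\[ \mathbf{w}_3 = \mathbf{u}_3 - P_{\mathcal{V}_2} \mathbf{u}_3 = \mathbf{u}_3 - \langle \mathbf{u}_3, \mathbf{v}_1 \rangle_{\mathcal{U}} \mathbf{v}_1 - \langle \mathbf{u}_3, \mathbf{v}_2 \rangle_{\mathcal{U}} \mathbf{v}_2 \]

and check that

\[ \langle \mathbf{w}_3, \mathbf{v}_1 \rangle_{\mathcal{U}} = 0, \quad \langle \mathbf{w}_3, \mathbf{v}_2 \rangle_{\mathcal{U}} = 0 \quad\text{and}\quad \|\mathbf{w}_3\|_{\mathcal{U}} \ne 0. \]

Then set

\[ \mathbf{v}_3 = \mathbf{w}_3 / \|\mathbf{w}_3\|_{\mathcal{U}} \]

and verify that $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ is an orthonormal set of vectors such that

\[ \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} = \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}. \]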
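The steps above can be sketched in code. This is a minimal illustration, assuming vectors in $\mathbb{C}^n$ with the standard inner product (linear in the first entry); the function names `inner` and `gram_schmidt`, the tolerance `tol` and the sample vectors are illustrative choices, and no attempt is made to address the numerical stability issues that arise for nearly dependent vectors.

```python
import math

def inner(u, v):
    """Standard inner product <u, v> = sum of u_i * conj(v_i)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Return orthonormal v_1, ..., v_k with span{v_1..v_j} = span{u_1..u_j} for each j."""
    ortho = []
    for u in vectors:
        w = [complex(x) for x in u]
        for v in ortho:
            coeff = inner(u, v)                       # <u_j, v_i>
            w = [wi - coeff * vi for wi, vi in zip(w, v)]
        norm = math.sqrt(inner(w, w).real)
        if norm < tol:
            raise ValueError("vectors are linearly dependent")
        ortho.append([wi / norm for wi in w])
    return ortho

us = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
vs = gram_schmidt(us)
```

Each pass of the outer loop subtracts from $\mathbf{u}_j$ its orthogonal projection onto the span of the vectors already constructed and then normalizes the remainder, exactly as in steps (1) to (3).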

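The first three steps should suffice to transmit the general idea. A complete formal proof depends upon induction, i.e., upon showing that if the procedure works for the first $j$ vectors (and $j < k$), then it works for the first $j + 1$ vectors: Thus, suppose $\mathbf{v}_1, \ldots, \mathbf{v}_j$ is an orthonormal family of vectors such that $\mathcal{V}_j = \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_j\} = \operatorname{span}\{\mathbf{u}_1, \ldots, \mathbf{u}_j\}$, let $P_{\mathcal{V}_j}$ denote the orthogonal projection onto $\mathcal{V}_j$, set

\[ \mathbf{w}_{j+1} = \mathbf{u}_{j+1} - P_{\mathcal{V}_j} \mathbf{u}_{j+1} = \mathbf{u}_{j+1} - \langle \mathbf{u}_{j+1}, \mathbf{v}_1 \rangle_{\mathcal{U}} \mathbf{v}_1 - \cdots - \langle \mathbf{u}_{j+1}, \mathbf{v}_j \rangle_{\mathcal{U}} \mathbf{v}_j \]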

and observe that

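$> 0$ for $0 \le \theta < 2\pi$, and if $\varphi_j(e^{i\theta}) = e^{ij\theta}$ for $j = 1, \ldots, n$, then

\[ g_{jk} = \langle \varphi_k, \varphi_j \rangle_{\mathcal{U}} = \frac{1}{2\pi} \int_0^{2\pi} e^{-ij\theta} w(e^{i\theta}) e^{ik\theta} \, d\theta = a_{j-k}, \]

where

\[ a_j = \frac{1}{2\pi} \int_0^{2\pi} w(e^{i\theta}) e^{-ij\theta} \, d\theta \]

is the $j$'th Fourier coefficient of $w(e^{i\theta})$; i.e., $G$ is a Toeplitz matrix.

On the other hand, if $\mathcal{U}$ is the space of continuous functions $f(x)$ on a subinterval of $\mathbb{R}$ and an inner product is defined in terms of a function $w(x)$ by the formula

\[ \langle f, g \rangle_{\mathcal{U}} = \int_c^d \overline{g(x)} \, w(x) f(x) \, dx, \]

where $w(x) > 0$ on the interval $c < x < d$, and if $\varphi_j(x) = x^j$, then

\[ g_{jk} = \langle \varphi_k, \varphi_j \rangle_{\mathcal{U}} = \int_c^d x^j w(x) x^k \, dx = b_{j+k}, \]

where

\[ b_j = \int_c^d x^j w(x) \, dx; \]

i.e., $G$ is a Hankel matrix.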

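These simple examples help to illustrate the great interest in developing efficient schemes for solving matrix equations of the form $G\mathbf{x} = \mathbf{b}$ and calculating $G^{-1}$ when $G$ is either a Toeplitz or a Hankel matrix.

Let

\[ Z_n = \sum_{j=1}^n \mathbf{e}_j \mathbf{e}_{n-j+1}^T = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & 0 & \cdots & 0 \end{bmatrix} \quad\text{and}\quad N_n = \sum_{j=1}^{n-1} \mathbf{e}_j \mathbf{e}_{j+1}^T = \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & 0 & 1 \\ 0 & & & 0 \end{bmatrix}. \tag{8.22} \]

Exercise 8.30. Show that $A \in \mathbb{C}^{n \times n}$ is a Toeplitz matrix if and only if $Z_n A$ is a Hankel matrix.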
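The effect of multiplying by $Z_n$ is easy to see on a small example. The sketch below is an illustration for $n = 4$ with made-up entries $\alpha_{i-j} = 10 + (i - j)$; the helper names `exchange`, `matmul`, `is_toeplitz` and `is_hankel` are hypothetical, and the computation only illustrates one direction of the equivalence for one matrix.

```python
def exchange(n):
    """Z_n: the n x n matrix with ones on the anti-diagonal and zeros elsewhere."""
    return [[1 if j == n - 1 - i else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_toeplitz(A):
    """Entries are constant along each diagonal (they depend only on i - j)."""
    n = len(A)
    return all(A[i][j] == A[i + 1][j + 1] for i in range(n - 1) for j in range(n - 1))

def is_hankel(A):
    """Entries are constant along each anti-diagonal (they depend only on i + j)."""
    n = len(A)
    return all(A[i][j + 1] == A[i + 1][j] for i in range(n - 1) for j in range(n - 1))

n = 4
alpha = {k: 10 + k for k in range(-(n - 1), n)}            # made-up values of alpha_{i-j}
T = [[alpha[i - j] for j in range(n)] for i in range(n)]   # Toeplitz matrix
H = matmul(exchange(n), T)                                 # rows of T in reverse order
```

Left multiplication by $Z_n$ reverses the order of the rows, which turns the dependence on $i - j$ into a dependence on $i + j$.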

Exercise 8.31. Show that if $A \in \mathbb{C}^{n \times n}$ is a Hankel matrix with $a_{ij} = \beta_{i+j-1}$ for $i, j = 1, \ldots, n$, then, in terms of the matrices $Z = Z_n$ and $N = N_n$ defined in formula (8.22),

\[ A = \sum_{j=1}^{n} \beta_j Z (N^T)^{n-j} + \sum_{j=1}^{n-1} \beta_{j+n} Z N^j. \]

Exercise 8.32. Show that if $A \in \mathbb{C}^{n \times n}$ is a Toeplitz matrix with $a_{ij} = \alpha_{i-j}$, then, in terms of the matrices $Z = Z_n$ and $N = N_n$ defined in formula (8.22),

\[ A = \sum_{i=0}^{n-1} \alpha_{-i} N^i + \sum_{i=1}^{n-1} \alpha_i (N^T)^i. \]

Exercise 8.33. The $n \times n$ Hankel matrix $H_n$ with entries $h_{ij} = 1/(i + j + 1)$ for $i, j = 0, \ldots, n-1$ is known as the Hilbert matrix. Show that the Hilbert matrix is invertible. [HINT: $\int_0^1 x^i x^j \, dx = 1/(i + j + 1)$.]

Exercise 8.34. Show that if $H_n$ denotes the $n \times n$ Hankel matrix introduced in Exercise 8.33 and if $\mathbf{a} \in \mathbb{C}^n$ and $\mathbf{b} \in \mathbb{C}^n$ are vectors with components $a_0, \ldots, a_{n-1}$ and $b_0, \ldots, b_{n-1}$, respectively, then

(8.23) 27r

(Hna, b)st—/() =



(E Ib e’kt)((ie‘it(7r — t))

n—l

Zajfijt j=0

Exercise 8.35. Show that if $H_n$ denotes the $n \times n$ Hankel matrix introduced in Exercise 8.33, then $\|H_n\|_{2,2} < \pi$. [HINT: First use formula (8.23) to prove that $|\langle H_n \mathbf{a}, \mathbf{b} \rangle_{st}| < \pi \|\mathbf{a}\|_2 \|\mathbf{b}\|_2$.]

8.10. Adjoints

In this section we introduce the notion of the adjoint $S^*$ of a linear transformation $S$ from one inner product space into another. It is important to keep in mind that the adjoint depends upon the inner products of the two spaces.

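Theorem 8.12. Let $\mathcal{U}$ and $\mathcal{V}$ be a pair of inner product spaces over $\mathbb{F}$, let $\mathcal{U}$ be finite dimensional and let $S$ be a linear transformation from $\mathcal{U}$ into $\mathcal{V}$. Then there exists exactly one transformation $S^*$ from $\mathcal{V}$ into $\mathcal{U}$ such that

\[ \langle S\mathbf{u}, \mathbf{v} \rangle_{\mathcal{V}} = \langle \mathbf{u}, S^*\mathbf{v} \rangle_{\mathcal{U}} \tag{8.24} \]

for every choice of $\mathbf{u} \in \mathcal{U}$ and $\mathbf{v} \in \mathcal{V}$. Moreover, $S^*$ is automatically linear.

Proof. It is easy to verify that there is at most one linear transformation from $\mathcal{V}$ into $\mathcal{U}$ for which (8.24) holds. If there were two such linear transformations, $S_1^*$ and $S_2^*$, then

\[ \langle S\mathbf{u}, \mathbf{v} \rangle_{\mathcal{V}} = \langle \mathbf{u}, S_1^*\mathbf{v} \rangle_{\mathcal{U}} = \langle \mathbf{u}, S_2^*\mathbf{v} \rangle_{\mathcal{U}} \]

and hence

\[ \langle \mathbf{u}, (S_1^* - S_2^*)\mathbf{v} \rangle_{\mathcal{U}} = 0 \]

for every choice of $\mathbf{u} \in \mathcal{U}$ and $\mathbf{v} \in \mathcal{V}$. Thus, upon choosing $\mathbf{u} = (S_1^* - S_2^*)\mathbf{v}$, it follows that

\[ \langle (S_1^* - S_2^*)\mathbf{v}, (S_1^* - S_2^*)\mathbf{v} \rangle_{\mathcal{U}} = 0, \]

which implies that $(S_1^* - S_2^*)\mathbf{v} = \mathbf{0}$ for every vector $\mathbf{v} \in \mathcal{V}$, i.e., $S_1^* = S_2^*$. Much the same sort of argument shows that if $S^*$ exists, then it must be a linear transformation from $\mathcal{V}$ into $\mathcal{U}$, as follows from the sequence of formulas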

(u, 3* (CW1 + 5V2)u = TT* = I. The opposite implication then

follows by noting that TT* = (T*)*T*.

El

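Exercise 8.44. Show that $|\lambda_j| = 1$ for each eigenvalue $\lambda_j$ of the unitary transformation $T$ considered in Theorem 8.19.

8.13. Auxiliary formulas

Lemma 8.21. Let $T$ be a linear transformation from a finite dimensional inner product space $\mathcal{U}$ over $\mathbb{F}$ into an inner product space $\mathcal{V}$ over $\mathbb{F}$. Then

(8.35)    $\|T\|_{\mathcal{U},\mathcal{V}} = \max\{|\langle Tu, v\rangle_{\mathcal{V}}| : u \in \mathcal{U},\ v \in \mathcal{V},\ \|u\|_{\mathcal{U}} \le 1 \text{ and } \|v\|_{\mathcal{V}} \le 1\}.$

Moreover,

(8.36)    $\|T\|_{\mathcal{U},\mathcal{V}} = \|T^*\|_{\mathcal{V},\mathcal{U}}.$

Proof. Let $B_{\mathcal{U}} = \{u \in \mathcal{U} : \|u\|_{\mathcal{U}} \le 1\}$, $B_{\mathcal{V}} = \{v \in \mathcal{V} : \|v\|_{\mathcal{V}} \le 1\}$ and let $\gamma$ denote the right hand side of (8.35). Then the inequality

    $|\langle Tu, v\rangle_{\mathcal{V}}| \le \|Tu\|_{\mathcal{V}}\,\|v\|_{\mathcal{V}} \le \|T\|_{\mathcal{U},\mathcal{V}}\,\|u\|_{\mathcal{U}}\,\|v\|_{\mathcal{V}} \le \|T\|_{\mathcal{U},\mathcal{V}}$

for $u \in B_{\mathcal{U}}$ and $v \in B_{\mathcal{V}}$ implies that $\gamma \le \|T\|_{\mathcal{U},\mathcal{V}}$. On the other hand, if $T$ is not the zero transformation, then

    $\gamma \ge \max\left\{\left|\left\langle Tu, \frac{Tu}{\|Tu\|_{\mathcal{V}}}\right\rangle_{\mathcal{V}}\right| : u \in B_{\mathcal{U}} \text{ and } \|Tu\|_{\mathcal{V}} \ne 0\right\} = \|T\|_{\mathcal{U},\mathcal{V}}.$

Therefore, (8.35) holds and serves to verify (8.36). □

Exercise 8.45. Verify (8.36) with the help of (8.35).

Lemma 8.22. Let $\mathcal{U}$ and $\mathcal{V}$ be inner product spaces over $\mathbb{F}$ with orthonormal bases $u_1, \ldots, u_q$ and $v_1, \ldots, v_p$, respectively, and let $S$ be a linear transformation from $\mathcal{U}$ into $\mathcal{V}$. Then

(8.37)    $\sum_{j=1}^{q} \|Su_j\|_{\mathcal{V}}^2 = \sum_{i=1}^{p} \|S^*v_i\|_{\mathcal{U}}^2.$

Proof. By Lemma 8.11,

    $\|Su_j\|_{\mathcal{V}}^2 = \sum_{i=1}^{p} |\langle Su_j, v_i\rangle_{\mathcal{V}}|^2 = \sum_{i=1}^{p} |\langle u_j, S^*v_i\rangle_{\mathcal{U}}|^2.$

Therefore,

    $\sum_{j=1}^{q} \|Su_j\|_{\mathcal{V}}^2 = \sum_{i=1}^{p}\sum_{j=1}^{q} |\langle u_j, S^*v_i\rangle_{\mathcal{U}}|^2 = \sum_{i=1}^{p} \|S^*v_i\|_{\mathcal{U}}^2.$ □

Lemma 8.23. Let $S$ be a linear transformation from an inner product space $\mathcal{U}$ over $\mathbb{F}$ into itself and let $\{u_1, \ldots, u_n\}$ and $\{w_1, \ldots, w_n\}$ be any two orthonormal bases for $\mathcal{U}$. Then: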

— Zu = :(Suj, Wélu (Wu u3)u i=1

i=1

and n

n

*

L{ =(E i=1

=Eu MSUj,W3;)u. i=1

Therefore,

Z1 ‘13a f . Then clearly 7% is invariant under 3,, and Sn = 3;; i.e.,

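    $\langle S_n f, g\rangle_{\mathcal{U}} = \langle f, S_n g\rangle_{\mathcal{U}}$

for every choice of $f, g \in \mathcal{P}_n$.

Consequently, there exists an orthonormal set of vectors $\varphi_j \in \mathcal{P}_n$, $j = 0, \ldots, n$, such that

    $S_n\varphi_j = \lambda_j\varphi_j$  and  $\lambda_j \in \mathbb{R}$  for  $j = 0, \ldots, n$.

Thus, if

    $\pi_k(x) = x^k$  for  $k = 0, 1, \ldots,$

then, for $1 \le k \le n+1$ and $j = 0, \ldots, n$,

    $\langle \pi_k, \varphi_j\rangle_{\mathcal{U}} = \langle S_n\pi_{k-1}, \varphi_j\rangle_{\mathcal{U}} = \langle \pi_{k-1}, S_n\varphi_j\rangle_{\mathcal{U}} = \lambda_j\langle \pi_{k-1}, \varphi_j\rangle_{\mathcal{U}},$

and hence, upon iterating this formula, we obtain

(8.40)    $\langle \pi_k, \varphi_j\rangle_{\mathcal{U}} = \lambda_j^k\langle \pi_0, \varphi_j\rangle_{\mathcal{U}}$  for  $j = 0, \ldots, n$  and  $k = 0, \ldots, n+1$.

(The case $k = 0$ is not really obtained by iteration, but is self-evident.)

Lemma 8.24. If $p(x)$ is any polynomial of degree less than or equal to $n+1$ with complex coefficients and $\{\varphi_0, \ldots, \varphi_n\}$ is an orthonormal set of eigenfunctions for $S_n$ with $S_n\varphi_j = \lambda_j\varphi_j$ for $j = 0, \ldots, n$, then: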

(1) (fistula = PO‘jXWoawM(2) 90j(/\j)u = 1 and WW) = 0 ifj 79 k(3) If p(x) is a polynomial of degree n + 1 such that < 2p matrix with p x p

blocks A = AH, B = BH, and C' = CH, then A E 0(E) 4:} X E 0(E). [HINT: It suffices to focus on A g IR .]

9.2. Commuting Hermitian matrices

Theorem 9.6. Let $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{n\times n}$ be Hermitian matrices. Then $AB = BA$ if and only if there exists a single unitary matrix $U \in \mathbb{C}^{n\times n}$ that diagonalizes both $A$ and $B$.

Proof. Suppose first that there exists a unitary matrix $U \in \mathbb{C}^{n\times n}$ such that $D_A = U^HAU$ and $D_B = U^HBU$ are both diagonal matrices. Then, since $D_AD_B = D_BD_A$,

    $AB = UD_AU^HUD_BU^H = UD_AD_BU^H = UD_BD_AU^H = UD_BU^HUD_AU^H = BA.$

The proof of the converse is equally simple in the special case that $A$ has $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, because then the formulas

    $Au_j = \lambda_ju_j$,  $j = 1, \ldots, n$,

and

    $ABu_j = BAu_j = \lambda_jBu_j$,  $j = 1, \ldots, n$,

imply that

    $Bu_j = \beta_ju_j$  for  $j = 1, \ldots, n$

and some choice of $\beta_1, \ldots, \beta_n \in \mathbb{C}$. But, since $u_1, \ldots, u_n$ may be chosen orthonormal, this is the same as to say that

    $AU = U\,\mathrm{diag}\{\lambda_1, \ldots, \lambda_n\}$  and  $BU = U\,\mathrm{diag}\{\beta_1, \ldots, \beta_n\}$

for some unitary matrix $U \in \mathbb{C}^{n\times n}$.

Suppose next that $AB = BA$ and $A$ has $k$ distinct eigenvalues $\lambda_1, \ldots, \lambda_k$ with geometric multiplicities $\gamma_1, \ldots, \gamma_k$, respectively. Then there exists a set of $k$ isometric matrices $U_1 \in \mathbb{C}^{n\times\gamma_1}, \ldots, U_k \in \mathbb{C}^{n\times\gamma_k}$ such that $U = [U_1 \ \cdots \ U_k]$ is unitary and $AU_j = \lambda_jU_j$. Therefore,

    $ABU_j = BAU_j = \lambda_jBU_j$  for  $j = 1, \ldots, k,$

which implies that the columns of the matrix $BU_j$ belong to $\mathcal{N}_{(A - \lambda_jI_n)}$ and hence, since the columns of $U_j$ form a basis for that space, that there exist matrices $C_j$ such that

    $BU_j = U_jC_j$  for  $j = 1, \ldots, k.$

The supplementary formulas

    $C_j = U_j^HU_jC_j = U_j^HBU_j = C_j^H$  for  $j = 1, \ldots, k$

exhibit $C_j$ as a $\gamma_j \times \gamma_j$ Hermitian matrix. Therefore, upon writing

    $C_j = W_jD_jW_j^H$  for  $j = 1, \ldots, k$

with $W_j$ unitary and $D_j$ diagonal, and setting

    $V_j = U_jW_j$  for  $j = 1, \ldots, k$  and  $W = \mathrm{diag}\{W_1, \ldots, W_k\},$

one can readily check that

    $AV_j = AU_jW_j = \lambda_jU_jW_j = \lambda_jV_j$  for  $j = 1, \ldots, k$

and

    $BV_j = BU_jW_j = U_jC_jW_j = U_jW_jD_j = V_jD_j$  for  $j = 1, \ldots, k.$

Thus, the matrix $V = [V_1 \ \cdots \ V_k] = UW$ is a unitary matrix that serves to diagonalize both $A$ and $B$. □

9.3. Real Hermitian matrices

In this section we shall show that if $A = A^H$ and $A \in \mathbb{R}^{n\times n}$, then the unitary matrix $U$ in formula (9.3) may also be chosen in $\mathbb{R}^{n\times n}$.

Theorem 9.7. If $A = A^H$ and $A \in \mathbb{R}^{n\times n}$, then there exist an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$ and a real diagonal matrix $D \in \mathbb{R}^{n\times n}$ such that

(9.4)    $A = QDQ^T.$

Proof. Let $\mu \in \sigma(A)$ and let $u_1, \ldots, u_\ell$ be a basis for the nullspace of the matrix $B = A - \mu I_n$. Then, since $B \in \mathbb{R}^{n\times n}$, the real and imaginary parts of the vectors $u_j$ also belong to $\mathcal{N}_B$: If $u_j = x_j + iy_j$ with $x_j$ and $y_j$ in $\mathbb{R}^n$ for $j = 1, \ldots, \ell$, then

    $B(x_j + iy_j) = 0 \Longrightarrow Bx_j = 0$  and  $By_j = 0$  for  $j = 1, \ldots, \ell.$

Moreover, if

    $X = [x_1 \ \cdots \ x_\ell]$,  $Y = [y_1 \ \cdots \ y_\ell]$,  $U = X + iY$  and  $\overline{U} = X - iY,$

then the formula

    $[X \ \ Y] = \frac{1}{2}\,[U \ \ \overline{U}]\begin{bmatrix} I_\ell & -iI_\ell \\ I_\ell & iI_\ell \end{bmatrix}$

implies that the rank of the $n \times 2\ell$ matrix $[x_1 \ \cdots \ x_\ell \ \ y_1 \ \cdots \ y_\ell]$ is equal to $\ell$. Therefore, $\ell$ of these vectors form a basis for $\mathcal{N}_B$ and an orthonormal basis of $\ell$ vectors in $\mathbb{R}^n$ for $\mathcal{N}_B$ may be found by invoking the Gram-Schmidt procedure.

If $A$ has $k$ distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, let $Q_i$, $i = 1, \ldots, k$, denote the $n \times \gamma_i$ matrix that is obtained by stacking the vectors that are obtained by applying the procedure described above to $B_i = A - \lambda_iI_n$ for $i = 1, \ldots, k$. Then $AQ_i = \lambda_iQ_i$ and

    $A[Q_1 \ \cdots \ Q_k] = [Q_1 \ \cdots \ Q_k]\,D$  where  $D = \mathrm{diag}\{\lambda_1I_{\gamma_1}, \ldots, \lambda_kI_{\gamma_k}\}.$

Moreover, the matrix $Q = [Q_1 \ \cdots \ Q_k]$ is an orthogonal matrix, since the columns in $Q_i$ form an orthonormal basis for $\mathcal{N}_{(A - \lambda_iI_n)}$ and, by Lemma 9.1, the columns in $Q_i$ are orthogonal to the columns in $Q_j$ if $i \ne j$. □

Lemma 9.8. If $A \in \mathbb{R}^{p\times q}$, then

    $\max\{\|Ax\|_{st} : x \in \mathbb{C}^q \text{ and } \|x\|_{st} = 1\} = \max\{\|Ax\|_{st} : x \in \mathbb{R}^q \text{ and } \|x\|_{st} = 1\}.$

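Proof. Since $A \in \mathbb{R}^{p\times q}$, $A^HA$ is a real $q \times q$ Hermitian matrix. Therefore, $A^HA = QDQ^T$, where $Q \in \mathbb{R}^{q\times q}$ is orthogonal and $D = \mathrm{diag}\{\lambda_1, \ldots, \lambda_q\} \in \mathbb{R}^{q\times q}$. Let $\delta = \max\{\lambda_j : j = 1, \ldots, q\}$, let $x \in \mathbb{C}^q$ and let $y = Q^Tx$. Then $\delta \ge 0$ and

    $\|Ax\|_{st}^2 = \langle A^HAx, x\rangle_{st} = \langle QDQ^Tx, x\rangle_{st} = \langle DQ^Tx, Q^Tx\rangle_{st}$
    $\qquad = \sum_{j=1}^{q} \lambda_j\overline{y_j}y_j \le \delta\sum_{j=1}^{q} \overline{y_j}y_j = \delta\|y\|_{st}^2 = \delta\|Q^Tx\|_{st}^2 = \delta\|x\|_{st}^2.$

Thus,

    $\max\{\|Ax\|_{st} : x \in \mathbb{C}^q \text{ and } \|x\|_{st} = 1\} = \sqrt{\delta}.$

However, if $\delta = \lambda_k$, then it is readily seen that this maximum can be attained by choosing $x = Qe_k$, the $k$'th column of $Q$. But this proves the claim, since $Qe_k \in \mathbb{R}^q$. □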

9.4. Projections and direct sums in $\mathbb{F}^n$

This section focuses on matrix projections of $\mathbb{F}^n$ onto subspaces of $\mathbb{F}^n$.

• Projections: A matrix $P \in \mathbb{F}^{n\times n}$ is said to be a projection if $P^2 = P$.
• Orthogonal projections: A matrix $P \in \mathbb{F}^{n\times n}$ is said to be an orthogonal projection (with respect to the standard inner product (9.1)) if $P^2 = P$ and $P^H = P$.

Thus, for example,

    $P = \begin{bmatrix} 1 & 0 \\ \alpha & 0 \end{bmatrix}$

is a projection, but it is not an orthogonal projection with respect to the standard inner product unless $\alpha = 0$.

Exercise 9.6. Show that $A \in \mathbb{C}^{n\times n}$ is an orthogonal projection with $\mathrm{rank}\,A = r$ if and only if $A = UDU^H$ with $D = \mathrm{diag}\{I_r, O_{(n-r)\times(n-r)}\}$ and $U^HU = I_n$. [REMARK: It is instructive to compare this exercise with Exercise 6.20.]

Lemma 9.9. Let $P \in \mathbb{F}^{n\times n}$ be a projection, let $\mathcal{R}_P = \{Px : x \in \mathbb{F}^n\}$ and let $\mathcal{N}_P = \{x \in \mathbb{F}^n : Px = 0\}$. Then

(9.5)    $\mathbb{F}^n = \mathcal{R}_P \dotplus \mathcal{N}_P.$

Proof. Clearly

    $x = Px + (I_n - P)x$  for every vector  $x \in \mathbb{F}^n.$

Therefore, since $Px \in \mathcal{R}_P$ and $(I_n - P)x \in \mathcal{N}_P$, it follows that $\mathbb{F}^n = \mathcal{R}_P + \mathcal{N}_P$. It remains only to show that the indicated sum is direct. This is left to the reader as an exercise. □

Exercise 9.7. Show that if $P$ is a projection on a vector space $\mathcal{U}$ over $\mathbb{F}$, then $\mathcal{R}_P \cap \mathcal{N}_P = \{0\}$.

Exercise 9.8. Let $e_j$ denote the $j$'th column of the identity matrix $I_4$ for $j = 1, \ldots, 4$, and let

1 ll:

3

1 W

4’ 1 1

1

1 0

1 W

’2

=

1 W

1

1’ 3 1

=

0

1 0

1 W

’4

=

1

0 1

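Compute the projection of the vector $u$ onto the subspace $\mathcal{V}$ with respect to the direct sum decomposition $\mathbb{F}^4 = \mathcal{V} \dotplus \mathcal{W}$ when:

(a) $\mathcal{V} = \mathrm{span}\{e_1, e_2\}$ and $\mathcal{W} = \mathrm{span}\{w_1, w_2\}$.
(b) $\mathcal{V} = \mathrm{span}\{e_1, e_2\}$ and $\mathcal{W} = \mathrm{span}\{w_3, w_4\}$.
(c) $\mathcal{V} = \mathrm{span}\{e_1, e_2, w_1\}$ and $\mathcal{W} = \mathrm{span}\{w_4\}$.

[REMARK: The point of this exercise is that the coefficients of $e_1$ and $e_2$ are different in all three settings.]

Lemma 9.10. Let $\mathbb{F}^n = \mathcal{V} \dotplus \mathcal{W}$ and let $V = [v_1 \ \cdots \ v_k]$, where $\{v_1, \ldots, v_k\}$ is a basis for $\mathcal{V}$, and let $W = [w_1 \ \cdots \ w_\ell]$, where $\{w_1, \ldots, w_\ell\}$ is a basis for $\mathcal{W}$. Then:

(1) The matrix $[V \ \ W]$ is invertible.
(2) The projection $P_{\mathcal{V}}$ of $\mathbb{F}^n$ onto $\mathcal{V}$ with respect to the decomposition $\mathbb{F}^n = \mathcal{V} \dotplus \mathcal{W}$ is given by the formula

(9.6)    $P_{\mathcal{V}} = V\,[I_k \ \ O]\,[V \ \ W]^{-1}.$

Proof. Let $u \in \mathbb{F}^n$. Then there exists a unique vector $c \in \mathbb{F}^n$ with entries $c_1, \ldots, c_n$ such that

    $u = c_1v_1 + \cdots + c_kv_k + c_{k+1}w_1 + \cdots + c_nw_\ell$

or, equivalently,

    $u = [V \ \ W]c;$

i.e., the $n \times n$ matrix $[V \ \ W]$ maps $\mathbb{F}^n$ onto itself. Consequently, (1) holds and

    $P_{\mathcal{V}}u = c_1v_1 + \cdots + c_kv_k = V[I_k \ \ O]c$,  where  $c = [V \ \ W]^{-1}u.$ □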
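Formula (9.6) is easy to try out on a small oblique (non-orthogonal) decomposition. The sketch below is an illustration under stated assumptions: the subspaces $\mathcal{V} = \mathrm{span}\{(1,0)\}$ and $\mathcal{W} = \mathrm{span}\{(1,1)\}$ of $\mathbb{R}^2$ and all helper names are chosen here for the example and do not come from the text.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse(M):
    """Gauss-Jordan inverse in exact arithmetic; fails with StopIteration if M is singular."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        p = A[c][c]
        A[c] = [x / p for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

def projection_onto_V(V, W):
    """Return V [I_k 0] [V W]^{-1}; the columns of V and W are bases of the two subspaces."""
    n, k = len(V), len(V[0])
    VW = [V[i] + W[i] for i in range(n)]   # the n x n block matrix [V W]
    top = inverse(VW)[:k]                  # [I_k 0] [V W]^{-1} keeps the first k rows
    return matmul(V, top)

V = [[1], [0]]   # basis of the subspace V (one column)
W = [[1], [1]]   # basis of the complementary subspace W (one column)
P = projection_onto_V(V, W)
print(P)                  # the projection matrix
print(matmul(P, P) == P)  # idempotent
print(matmul(P, W))       # vectors in W are sent to zero
```

Here $P$ is a projection but not an orthogonal one, since it is not symmetric; this matches the remark after Exercise 9.8 that the answer depends on the complementary subspace as well as on $\mathcal{V}$.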

If $P$ is an orthogonal projection, then formula (9.6) simplifies with the help of the following simple observation:

Lemma 9.11. Let $A \in \mathbb{F}^{p\times q}$, let $u \in \mathbb{F}^q$ and $v \in \mathbb{F}^p$. Then

(9.7)    $\langle Au, v\rangle_{st} = \langle u, A^Hv\rangle_{st}.$

Proof. This is a pure computation:

    $\langle Au, v\rangle_{st} = v^H(Au) = (v^HA)u = (A^Hv)^Hu = \langle u, A^Hv\rangle_{st}.$ □

Lemma 9.12. In the setting of Lemma 9.10, $P_{\mathcal{V}}$ is an orthogonal projection with respect to the standard inner product (9.1) if and only if

(9.8)    $\langle Vx, Wy\rangle_{st} = 0$  for every choice of  $x \in \mathbb{F}^k$  and  $y \in \mathbb{F}^\ell.$

Moreover, if (9.8) is in force, then

(9.9)    $P_{\mathcal{V}} = V(V^HV)^{-1}V^H$

(consistent with (8.17)).

Proof. Let $P = P_{\mathcal{V}}$. If $P = P^H$, then

(Vuwy') = u=/ u(s)v(s)r(s)ds, a

then integration by parts yields the formula

(MU), 'U>u = (u, L(v)>u +p(fl){U(B)’v'(/3) - U'(fi)v(fi)} —p(a){U(a)v'(a) — u'(Oz)'v(a)}If u and v are further restricted to belong to the subspace

c = {f E U : 01f(04) + 02f'(a) = 0 and d1f(fl)+ d2f'(fi) = 0}, then (19(9), ”>71 = < q and px p, respectively. Moreover, if AH Au = au and AAHV = flv, then the formulas

    $\alpha\langle u, u\rangle = \langle A^HAu, u\rangle = \langle Au, Au\rangle \ge 0$

and

    $\beta\langle v, v\rangle = \langle AA^Hv, v\rangle = \langle A^Hv, A^Hv\rangle \ge 0$

clearly imply that the eigenvalues of $A^HA$ and $AA^H$ are nonnegative. Therefore, by Theorem 9.3, there exists a unitary matrix $U \in \mathbb{C}^{q\times q}$ such that

(10.1)    $A^HA = U\,\mathrm{diag}\{s_1^2, \ldots, s_q^2\}\,U^H,$

where the numbers $s_1^2, \ldots, s_q^2$ designate the eigenvalues of $A^HA$, and, in keeping with the usual conventions, it is assumed that they are indexed so that

    $s_1 \ge s_2 \ge \cdots \ge s_q \ge 0.$

The numbers $s_j$, $j = 1, \ldots, q$, are referred to as the singular values of $A$.

IfrankA=r, then 8,. >0 and Sj =0forj=r+1,...,q, ifr 0 and sr+1 = 0.

(10.3)

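Theorem 10.1. Let $A \in \mathbb{C}^{p\times q}$ be a matrix of rank $r$, $r \ge 1$, with singular values $s_1, \ldots, s_q$ and let

    $D = \mathrm{diag}\{s_1, \ldots, s_r\} \in \mathbb{R}^{r\times r}$  (with  $s_1 \ge \cdots \ge s_r > 0$).

Then there exists a unitary matrix $V \in \mathbb{C}^{p\times p}$ and a unitary matrix $U \in \mathbb{C}^{q\times q}$ such that

(10.4)    $A = V\begin{bmatrix} D & O_{r\times(q-r)} \\ O_{(p-r)\times r} & O_{(p-r)\times(q-r)} \end{bmatrix}U^H.$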
0 and

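$b_j > 0$ for $j = 1, \ldots, k$. Let $u = [\cos\theta \ \ \sin\theta]^T$ be a vector in $\mathbb{R}^2$ with $0 < \theta < \pi/2$ and let $P_{\mathcal{U}}x = u(u^Hu)^{-1}u^Hx = uu^Hx$ denote the orthogonal projection of $x \in \mathbb{R}^2$ onto the subspace $\mathcal{U} = \mathrm{span}\{u\} = \{tu : t \in \mathbb{R}\}$. Then

    $x_j = P_{\mathcal{U}}x_j + (I - P_{\mathcal{U}})x_j$

and the square of the distance from $x_j$ to $\mathcal{U}$ is equal to

    $\|(I - P_{\mathcal{U}})x_j\|^2 = \langle (I - P_{\mathcal{U}})x_j, (I - P_{\mathcal{U}})x_j\rangle = \langle (I - P_{\mathcal{U}})x_j, x_j\rangle$
    $\qquad = \|x_j\|^2 - \|P_{\mathcal{U}}x_j\|^2 = a_j^2 + b_j^2 - (a_j\cos\theta + b_j\sin\theta)^2.$

The objective is to choose $\theta$ to minimize

    $\sum_{j=1}^{k}\{a_j^2 + b_j^2 - (a_j\cos\theta + b_j\sin\theta)^2\}$

or, equivalently, to choose $\theta$ to maximize

    $\sum_{j=1}^{k}(a_j\cos\theta + b_j\sin\theta)^2 = \sum_{j=1}^{k}\{a_j^2\cos^2\theta + b_j^2\sin^2\theta + 2a_jb_j\cos\theta\sin\theta\}$
    $\qquad = \|a\|^2\cos^2\theta + \|b\|^2\sin^2\theta + 2\langle a, b\rangle\cos\theta\sin\theta$
    $\qquad = \{\|a\|^2 + \|b\|^2 + \alpha\cos 2\theta + \beta\sin 2\theta\}/2,$

with $a = [a_1 \ \cdots \ a_k]^T$, $b = [b_1 \ \cdots \ b_k]^T$, $\alpha = \|a\|^2 - \|b\|^2$ and $\beta = 2\langle a, b\rangle$. The Cauchy-Schwarz inequality implies that

    $|\alpha\cos 2\theta + \beta\sin 2\theta| \le \sqrt{\alpha^2 + \beta^2}\,\sqrt{\cos^2 2\theta + \sin^2 2\theta} = \sqrt{\alpha^2 + \beta^2}$

with equality if and only if

    $\begin{bmatrix} \cos 2\theta \\ \sin 2\theta \end{bmatrix} = \gamma\begin{bmatrix} \alpha \\ \beta \end{bmatrix}$  for some  $\gamma \in \mathbb{R}.$

This in turn serves to identify $\theta$ as the unique angle strictly between $0$ and $\pi/2$ for which

(10.24)    $\cot 2\theta = \frac{\alpha}{\beta} = \frac{\|a\|^2 - \|b\|^2}{2\langle a, b\rangle}.$

The formula is meaningful because $\beta > 0$. The asserted uniqueness follows from the fact that $\cot 2\theta$ decreases monotonically from $+\infty$ to $-\infty$ as $\theta$ increases from $0$ to $\pi/2$. Thus, if $\theta_0$ denotes the unique solution of (10.24) in the interval $0 < \theta < \pi/2$, then $\theta_0 < \pi/4$ if $\|a\|^2 > \|b\|^2$, $\theta_0 = \pi/4$ if $\|a\|^2 = \|b\|^2$ and $\theta_0 > \pi/4$ if $\|a\|^2 < \|b\|^2$.

Exercise 10.15. Let $\theta_0$ denote the unique solution of (10.24) in the interval $0 < \theta < \pi/2$ and let $s_1$, $s_2$ denote the singular values of the matrix $[a \ \ b]$. Show that

    $[a \ \ b]^T[a \ \ b]\begin{bmatrix} \cos\theta_0 \\ \sin\theta_0 \end{bmatrix} = s_1^2\begin{bmatrix} \cos\theta_0 \\ \sin\theta_0 \end{bmatrix}.$

Exercise 10.16. Find the angle $\theta_0$ for the best fitting line in the sense that it minimizes the total mean square distance for the points $x_1 = [1 \ \ 2]^T$, $x_2 = [2 \ \ 1]^T$ and $x_3 = [2 \ \ 3]^T$.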
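The recipe (10.24) can be sketched in a few lines. The code below is an illustration, not part of the text: the sample points are made up (they are deliberately not the points of Exercise 10.16), the function names are hypothetical, and the angle is obtained from `math.atan2(beta, alpha)`, which for $\beta > 0$ returns the value of $2\theta$ in $(0, \pi)$ with $\cot 2\theta = \alpha/\beta$.

```python
import math

def best_fit_angle(points):
    """Angle theta in (0, pi/2) solving cot(2 theta) = alpha / beta, assuming beta > 0."""
    a2 = sum(x * x for x, _ in points)      # ||a||^2
    b2 = sum(y * y for _, y in points)      # ||b||^2
    ab = sum(x * y for x, y in points)      # <a, b>
    alpha, beta = a2 - b2, 2 * ab
    return 0.5 * math.atan2(beta, alpha)

def total_squared_distance(points, theta):
    """Sum of squared distances from the points to the line through 0 at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return sum(x * x + y * y - (x * c + y * s) ** 2 for x, y in points)

points = [(1.0, 1.0), (2.0, 3.0), (4.0, 3.0)]   # illustrative data with positive coordinates
theta0 = best_fit_angle(points)
print(theta0, total_squared_distance(points, theta0))
```

A quick sanity check is to compare `total_squared_distance(points, theta0)` with the same quantity on a fine grid of angles in $(0, \pi/2)$; no grid value should be smaller, up to rounding.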

10.5. Fitting a line in $\mathbb{R}^p$

If $x, u, v \in \mathbb{R}^p$ and $\|v\| = 1$, then the distance of $x$ from the line $\{u + tv : t \in \mathbb{R}\}$ is equal to

    $\min_{t \in \mathbb{R}}\|x - u - tv\| = \|x - u - P_{\mathcal{V}}(x - u)\|,$

where $P_{\mathcal{V}}$ denotes the orthogonal projection of $\mathbb{R}^p$ onto the vector space $\mathcal{V} = \mathrm{span}\{v\}$.

The present objective is to choose the line, i.e., the vectors $u$ and $v$, to minimize

    $\sum_{j=1}^{k}\|x_j - u - P_{\mathcal{V}}(x_j - u)\|^2$

for a given set of $k$ points $x_1, \ldots, x_k \in \mathbb{R}^p$.

Let $a = k^{-1}\sum_{j=1}^{k} x_j$. Then

    $\sum_{j=1}^{k}(x_j - a) = 0$

and the sum of interest is equal to

(10.25)    $\sum_{j=1}^{k}\|x_j - a - P_{\mathcal{V}}(x_j - a) + a - u - P_{\mathcal{V}}(a - u)\|^2$
    $\qquad = \sum_{j=1}^{k}\{\|x_j - a - P_{\mathcal{V}}(x_j - a)\|^2 + \|a - u - P_{\mathcal{V}}(a - u)\|^2\},$

since

    $2\sum_{j=1}^{k}\langle x_j - a - P_{\mathcal{V}}(x_j - a),\ a - u - P_{\mathcal{V}}(a - u)\rangle = 0.$

Thus, as both of the terms inside the curly brackets in the second line of (10.25) are nonnegative, the sum is minimized by choosing $u = a$ and then choosing the unit vector $v$ to minimize

    $\sum_{j=1}^{k}\|x_j - a - P_{\mathcal{V}}(x_j - a)\|^2 = \sum_{j=1}^{k}\{\|x_j - a\|^2 - \langle x_j - a, v\rangle^2\}.$

But this is the same as choosing $v$ to maximize

Z : XEX

Xesj

(x,x)

and X750} 2%.

On the other hand, as

(Ax,x) .

max{ (x,x) .xEspan{u1,... ,uj}

and X7é0

it follows that

minmax{ XESj

(Ax, x)

(X; X)

and hence that equality prevails.

:XEX

and

X7é0

)

(

1,...,k

B

it follows that

2

(10.32)

det(BBH)=

E

B( .1"“’]? )

lsj10andk=1,...,n. J :1 i=1 k k (2) (1+r|/\j|) g H(1+rsj) forr > 0 andk = 1,. . .,n. j=1 J =1 Proof.

Lemma 10.12 guarantees that

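    $|\lambda_1| \cdots |\lambda_k| \le s_1 \cdots s_k.$

Suppose that $|\lambda_k| > 0$. Then

    $\ln|\lambda_1| + \cdots + \ln|\lambda_k| \le \ln s_1 + \cdots + \ln s_k$

and hence, if $p > 0$,

    $p\{\ln|\lambda_1| + \cdots + \ln|\lambda_k|\} \le p\{\ln s_1 + \cdots + \ln s_k\}$

or, equivalently,

    $\ln|\lambda_1|^p + \cdots + \ln|\lambda_k|^p \le \ln s_1^p + \cdots + \ln s_k^p.$

Consequently, Lemma 10.14 is applicable to the numbers $a_j = \ln|\lambda_j|^p$, $b_j = \ln s_j^p$, $j = 1, \ldots, k$, and yields the inequality

    $e^{\ln|\lambda_1|^p} + \cdots + e^{\ln|\lambda_k|^p} \le e^{\ln s_1^p} + \cdots + e^{\ln s_k^p},$

which is equivalent to

    $|\lambda_1|^p + \cdots + |\lambda_k|^p \le s_1^p + \cdots + s_k^p.$

Thus we have established the inequality (1) for every integer $k \in \{1, \ldots, n\}$ for which $|\lambda_k| > 0$. However, this is really enough, because if $\lambda_\ell = 0$, then $|\lambda_j| \le s_j$ for $j = \ell, \ldots, n$. Thus, for example, if $n = 5$ and $|\lambda_3| > 0$ but $\lambda_4 = 0$, then the asserted inequality (1) holds for $k = 1, 2, 3$ by the preceding analysis. However, it must also hold for $k = 4$ and $k = 5$, since $\lambda_4 = 0 \Longrightarrow \lambda_5 = 0$ and thus $|\lambda_4| \le s_4$ and $|\lambda_5| \le s_5$.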


The second inequality may be verified in much the same way by invoking the formula

90(53) = [j (x — 8) 0. This works because

90"(x) = m 20 for every xER. The details are left to the reader.

D

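Exercise 10.26. Verify the integral representation for $\varphi(x)$, assuming that $\varphi(x)$, $\varphi'(x)$ and $\varphi''(x)$ are nice continuous functions that tend to zero quickly enough as $x \to -\infty$ so that the integrals referred to in the following hint converge. [HINT: Under the given assumptions on $\varphi$,

    $\varphi(x) = \int_{-\infty}^{x}\varphi'(s)\,ds = \int_{-\infty}^{x}\left\{\int_{-\infty}^{s}\varphi''(u)\,du\right\}ds = \int_{-\infty}^{x}\left(\int_{u}^{x}ds\right)\varphi''(u)\,du.$]

Lemma 10.16. Let $A \in \mathbb{C}^{n\times n}$ with singular values $s_1 \ge \cdots \ge s_n$. Then

    $s_1 + \cdots + s_k = \max\{|\mathrm{trace}(V^HUAV)| : U \in \mathbb{C}^{n\times n},\ U^HU = I_n,\ V \in \mathbb{C}^{n\times k} \text{ and } V^HV = I_k\}.$

Proof. Let $B = V^HUAV$ and let $\lambda_1(B), \ldots, \lambda_k(B)$ denote the eigenvalues of $B$ repeated according to their algebraic multiplicity. Then, since

    $\mathrm{trace}\,B = \sum_{j=1}^{k}\lambda_j(B),$

Theorem 10.15 implies that

    $|\mathrm{trace}\,B| \le \sum_{j=1}^{k}|\lambda_j(B)| \le \sum_{j=1}^{k}s_j(B).$

Moreover, by Lemma 10.10,

    $s_j(B) = s_j(V^HUAV) \le \|V^H\|\,\|U\|\,s_j(A)\,\|V\| = s_j(A).$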

Therefore, for every choice of $U \in \mathbb{C}^{n\times n}$ and $V \in \mathbb{C}^{n\times k}$, with $U^HU = I_n$ and $V^HV = I_k$,

    $|\mathrm{trace}(V^HUAV)| \le \sum_{j=1}^{k}s_j(A).$

The next step is to show that there exists a choice of $U$ and $V$ of the requisite form for which equality is attained. The key ingredient is the singular value decomposition

    $A = V_0SU_0^H,$

in which $V_0$ and $U_0$ are unitary,

    $S = \mathrm{diag}\{s_1, \ldots, s_n\}$  and  $s_j = s_j(A).$

In these terms,

    $\mathrm{trace}(V^HUAV) = \mathrm{trace}(V^HUV_0SU_0^HV),$

which, upon choosing

    $V^H = [I_k \ \ O]\,U_0^H$  and  $U = U_0V_0^H,$

simplifies to

    $\mathrm{trace}(V^HUAV) = \mathrm{trace}\{[I_k \ \ O]\,S\,[I_k \ \ O]^H\} = s_1 + \cdots + s_k.$ □

Exercise 10.27. Let $A, B \in \mathbb{C}^{n\times n}$ with singular values $\alpha_1 \ge \cdots \ge \alpha_n$ and $\beta_1 \ge \cdots \ge \beta_n$, respectively. Show that

    $\max\{|\mathrm{trace}(UAVB)| : U, V \in \mathbb{C}^{n\times n} \text{ and } U^HU = V^HV = I_n\} = \sum_{j=1}^{n}\alpha_j\beta_j.$

The next theorem summarizes a number of important properties of singular values.

Theorem 10.17. Let A,B 6 (3a and let 33(0) j = 1,. . .,n, denote the singular values of a matrix C. Then:

(1) sj(A) = sj(AH) forj < rankA.

(2) (3) (4) (5)

81(BA)< ||B||81(A) 8 (AB) < 81(A)llBll H12181'(AB) 0 and that A and A0 are expressed in the forms

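(11.4) and (11.5), respectively. Then:

(11.7)    $\mathcal{R}_A = \left\{V\begin{bmatrix} I_r \\ O_{(p-r)\times r} \end{bmatrix}u : u \in \mathbb{F}^r\right\},$

(11.8)    $\mathcal{N}_{A^\circ} = \left\{V\begin{bmatrix} -DB_1 \\ I_{p-r} \end{bmatrix}v : v \in \mathbb{F}^{p-r}\right\},$

(11.9)    $\mathcal{R}_{A^\circ} = \left\{U\begin{bmatrix} I_r \\ B_2D \end{bmatrix}u : u \in \mathbb{F}^r\right\},$

(11.10)    $\mathcal{N}_A = \left\{U\begin{bmatrix} O_{r\times(q-r)} \\ I_{q-r} \end{bmatrix}v : v \in \mathbb{F}^{q-r}\right\}.$

Proof.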

By definition, _ _D RA—{V_0

0 H. q O]Ux.xEIF}

={V10DJ [IT O]x:xElE‘q} _Ir

,

'r'

—{V_0]u.uElF }.

11.1. Pseudoinverses

241

_D4 H U_B2]p;_DBflV'xzxerW} _D4

][5 DBflxzxeFP}

Q

HHr—“e—H

||

Similarly,

.32

.1?

r

U_BzD]u'U‘EF }.

Suppose next that x E NAo. Then

—1

However, since U [ B J is left invertible, this holds if and only if

2

[L DBflVHx=0. Thus, upon writing

m = [u] , V

with u 6 IF’" and v E Fin—7", it is easily seen that u m DEJL]—0 or equivalently that

Therefore,

This proves that

NAO g {V [—1931] v :v E Fin—7"}

#4

and hence in fact serves to establish equality, since the opposite inclusion is self-evident.

The formula for NA is established in much the same way. Exercise 11.3. Verify the formula for NA that is given in Lemma 11.5.

El

242

11. Pseudoinverses

Remark 11.6. Formulas (11.7) and (11.8) confirm the already established fact that

R +No={V[ A

A

V [

= {

=1Fp,

I

—DBl

0(p—r)Xr

119—?“

I

—DBl

0(10-7‘)> 31 = Orx(p_,.)

NA

is orthogonal to

and

(11.12)

RA0 4:) B2 = 0(q_,.)x,..

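Theorem 11.7. Let $A \in \mathbb{F}^{p\times q}$ and let $\mathcal{X}$ and $\mathcal{Y}$ be subspaces of $\mathbb{F}^p$ and $\mathbb{F}^q$, respectively, such that

    $\mathcal{R}_A \dotplus \mathcal{X} = \mathbb{F}^p$  and  $\mathcal{N}_A \dotplus \mathcal{Y} = \mathbb{F}^q.$

Then there exists a pseudoinverse $A^\circ$ of $A$ such that

    $\mathcal{N}_{A^\circ} = \mathcal{X}$  and  $\mathcal{R}_{A^\circ} = \mathcal{Y}.$

Proof. Suppose first that $\mathcal{X}$ and $\mathcal{Y}$ are proper nonzero subspaces of $\mathbb{F}^p$ and $\mathbb{F}^q$, respectively, let $r = \mathrm{rank}\,A$ and let $\{x_1, \ldots, x_{p-r}\}$ be a basis for $\mathcal{X}$. Then, in terms of the representation (11.4), we can write

    $[x_1 \ \cdots \ x_{p-r}] = V\begin{bmatrix} C \\ E \end{bmatrix}$

for some choice of $C \in \mathbb{F}^{r\times(p-r)}$ and $E \in \mathbb{F}^{(p-r)\times(p-r)}$. Thus, in view of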

(11.7), R

A

+26:

{

V

I

r

C

u

l0> 0, and

(11.14)    $A = V\begin{bmatrix} D & O_{r\times(q-r)} \\ O_{(p-r)\times r} & O_{(p-r)\times(q-r)} \end{bmatrix}U^H = V_1DU_1^H,$

where $V = [V_1 \ \ V_2]$ and $U = [U_1 \ \ U_2]$ are unitary matrices with first blocks $V_1$ and $U_1$ of sizes $p \times r$ and $q \times r$, respectively, and

(11.15)    $D = \mathrm{diag}\{s_1, \ldots, s_r\}$

is a diagonal matrix based on the nonzero singular values $s_1 \ge \cdots \ge s_r > 0$ of $A$, then the matrix $A^\dagger \in \mathbb{F}^{q\times p}$ defined by the formula

(11.16)    $A^\dagger = U\begin{bmatrix} D^{-1} & O \\ O & O \end{bmatrix}V^H = U_1D^{-1}V_1^H$

meets the four conditions in (11.13). It remains to check uniqueness. Let $B \in \mathbb{F}^{q\times p}$ and $C \in \mathbb{F}^{q\times p}$ both satisfy the four conditions in (11.13) and let $Y = B^H - C^H$. Then the formulas

    $A = ABA = A(BA)^H = AA^HB^H$

and

    $A = ACA = A(CA)^H = AA^HC^H$

imply that $O = AA^HY$ and hence that $\mathcal{R}_Y \subseteq \mathcal{N}_{AA^H} = \mathcal{N}_{A^H}$.

On the other hand, the formulas

    $B = BAB = B(AB)^H = BB^HA^H$

and

    $C = CAC = C(AC)^H = CC^HA^H$

imply that

    $Y = A(BB^H - CC^H)$

and hence that $\mathcal{R}_Y \subseteq \mathcal{R}_A$. Thus, as $\mathcal{R}_A \cap \mathcal{N}_{A^H} = \{0\}$ by Lemma 9.14, it follows that $Y = O$, as needed. □

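The unique matrix $A^\dagger$ that satisfies the four conditions in (11.13) is called the Moore-Penrose inverse of $A$.

In view of the last two formulas in (11.13), $AA^\dagger$ and $A^\dagger A$ are both orthogonal projections with respect to the standard inner product. Correspondingly, the direct sum decompositions exhibited in Lemma 11.3 become orthogonal decompositions if the Moore-Penrose inverse $A^\dagger$ is used in place of an arbitrary pseudoinverse $A^\circ$.

Lemma 11.9. Let $A^\dagger \in \mathbb{F}^{q\times p}$ be the Moore-Penrose inverse of $A \in \mathbb{F}^{p\times q}$. Then:

(1) $\mathbb{F}^p = \mathcal{R}_A \oplus \mathcal{N}_{A^\dagger}$.
(2) $\mathbb{F}^q = \mathcal{R}_{A^\dagger} \oplus \mathcal{N}_A$.

Proof. Since $A^\dagger$ is a pseudoinverse, Lemma 11.3 guarantees the direct sum decompositions

    $\mathbb{F}^p = \mathcal{R}_A \dotplus \mathcal{N}_{A^\dagger}$  and  $\mathbb{F}^q = \mathcal{N}_A \dotplus \mathcal{R}_{A^\dagger}.$

To complete the proof of (1), we need to show that the two spaces $\mathcal{R}_A$ and $\mathcal{N}_{A^\dagger}$ are orthogonal with respect to the standard inner product. To this end let $x \in \mathcal{R}_A$ and $y \in \mathcal{N}_{A^\dagger}$. Then, since $\mathcal{R}_A = \mathcal{R}_{AA^\dagger}$ and $AA^\dagger$ is a projection,

    $x = AA^\dagger x.$

Therefore,

    $\langle x, y\rangle = \langle AA^\dagger x, y\rangle = \langle x, (AA^\dagger)^Hy\rangle = \langle x, AA^\dagger y\rangle = 0,$

as needed. The proof of (2) is immediate from (1), since $(A^\dagger)^\dagger = A$. □

Exercise 11.9. Show that if $A \in \mathbb{C}^{p\times p}$, $B \in \mathbb{C}^{p\times q}$ and $\mathcal{R}_B \subseteq \mathcal{R}_A$, then

    $AA^\dagger B = B$,  $A^HAA^\dagger = A^H$  and  $B^HAA^\dagger = B^H.$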

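Lemma 11.10. Let

    $M = \begin{bmatrix} A & B \\ B^H & C \end{bmatrix}$

be a Hermitian matrix with $A \in \mathbb{C}^{p\times p}$, $C \in \mathbb{C}^{q\times q}$ and $\mathcal{R}_B \subseteq \mathcal{R}_A$. Then $M$ admits a factorization of the form

(11.17)    $M = \begin{bmatrix} I & O \\ B^HA^\dagger & I \end{bmatrix}\begin{bmatrix} A & O \\ O & C - B^HA^\dagger B \end{bmatrix}\begin{bmatrix} I & A^\dagger B \\ O & I \end{bmatrix}.$

Proof. The formula is easily verified by direct calculation, since $B^HA^\dagger A = B^H$ and $AA^\dagger B = B$ when $\mathcal{R}_B \subseteq \mathcal{R}_A$ and $A = A^H$. □

Exercise 11.10. Show that $\mathcal{R}_{A^\dagger} = \mathcal{R}_{A^H}$ and $\mathcal{N}_{A^\dagger} = \mathcal{N}_{A^H}$. [HINT: This is an easy consequence of the representations (11.14) and (11.16).]

Exercise 11.11. Show that if $A \in \mathbb{C}^{p\times q}$, then the matrix $AA^\dagger$ is an orthogonal projection from $\mathbb{C}^p$ onto $\mathcal{R}_A$.

Exercise 11.12. Use the representation formulas (11.14) and (11.16) to give a new proof of the following two formulas, for any matrix $A \in \mathbb{C}^{p\times q}$:

(1) $\mathbb{C}^p = \mathcal{R}_A \oplus \mathcal{N}_{A^H}$ (with respect to the standard inner product).
(2) $\mathbb{C}^q = \mathcal{R}_{A^H} \oplus \mathcal{N}_A$ (with respect to the standard inner product).

Exercise 11.13. Show that if $B, C \in \mathbb{C}^{p\times q}$, $A \in \mathbb{C}^{q\times q}$ and $\mathrm{rank}\,B = \mathrm{rank}\,C = \mathrm{rank}\,A = q$, then

(11.18)    $(BAC^H)^\dagger = C(C^HC)^{-1}A^{-1}(B^HB)^{-1}B^H$

and give explicit formulas for $(BAC^H)^\dagger(BAC^H)$ and $(BAC^H)(BAC^H)^\dagger$ in terms of $B$, $B^H$, $C$ and $C^H$.

Exercise 11.14. Show that if $A \in \mathbb{C}^{p\times q}$, then $(A^\dagger)^\dagger = A$.

Exercise 11.15. Show that if $A \in \mathbb{C}^{p\times q}$, then $A^\dagger AA^H = A^HAA^\dagger = A^H$.

Exercise 11.16. Show that if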

0

B

E - l B. o l is a Hermitian matrix such that RC Q RBH, then the Moore-Penrose inverse

EJr of E is given by the formula

(11.19)

(1)31 El=[ —BTHCBT

BTH ]. (O)

[HINT: Exploit Exercise 11.15 and the fact that 720 g RBH => 3130 = 0.]


Exercise 11.17. Let


H A0 _ 0-400%

Where B is invertible, and let Al denote the Moore—Penrose inverse of A. Show that the matrix

AT 0 0 O l B —1 [ > (B —1 H

is a pseudoinverse of C, but it is not the Moore-Penrose inverse. Exercise 11.18. Show that the matrix

AA)r BHA’r

O O

is a projection but not an orthogonal projection with respect to the standard

inner product (unless BHAJr = 0). Exercise 11.19. Let A1,A2 E Cq and B1,B2 E (Cpxr and suppose that 7231 g 72,41 and R32 Q RA2. Show by example that this does not imply that RBI+B2 g RA1+A2. [HINTz Try B2- = ui and AZ- 2 uz-wZ-H forz' = 1, 2, with v1 orthogonal to V2 and W1 2 W2.]

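Exercise 11.20. Show that if $A \in \mathbb{C}^{p\times p}$, $B \in \mathbb{C}^{p\times q}$, $BB^H = I_p$ and

    $M = \begin{bmatrix} A & B \\ B^H & O \end{bmatrix},$

then

    $M^\dagger = \begin{bmatrix} O & B \\ B^H & -B^HAB \end{bmatrix}.$

Exercise 11.21. Let $B \in \mathbb{C}^{p\times q}$. Show that:

(1) $B^\dagger B$ is the orthogonal projection of $\mathbb{C}^q$ onto $\mathcal{R}_{B^H}$.
(2) $BB^\dagger$ is the orthogonal projection of $\mathbb{C}^p$ onto $\mathcal{R}_B$.

Exercise 11.22. Show that if $A \in \mathbb{C}^{p\times q}$ and $A^HA$ is invertible, then

    $(AA^H)^\dagger = A(A^HA)^{-2}A^H.$

11.3. Best approximation in terms of Moore-Penrose inverses

The Moore-Penrose inverse of a matrix $A$ with singular value decomposition (11.14) is given by formula (11.16).

Lemma 11.11. If $A \in \mathbb{C}^{p\times q}$ and $b \in \mathbb{C}^p$, then

(11.20)    $\|Ax - b\|^2 = \|Ax - AA^\dagger b\|^2 + \|(I_p - AA^\dagger)b\|^2$

for every $x \in \mathbb{C}^q$.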

Proof. The stated formula follows easily from the decomposition

    $Ax - b = (Ax - AA^\dagger b) - (I_p - AA^\dagger)b = AA^\dagger(Ax - b) - (I_p - AA^\dagger)b,$

since

    $\langle AA^\dagger(Ax - b), (I_p - AA^\dagger)b\rangle = \langle Ax - b, (AA^\dagger)^H(I_p - AA^\dagger)b\rangle$
    $\qquad = \langle Ax - b, (AA^\dagger)(I_p - AA^\dagger)b\rangle = \langle Ax - b, 0\rangle = 0.$ □

Formula (11.20) exhibits the fact that the best approximation to $b$ that we can hope to get by vectors of the form $Ax$ is obtained by choosing $x$ so that

    $Ax = AA^\dagger b.$

This is eminently reasonable, since $AA^\dagger b$ is equal to the orthogonal projection of $b$ onto $\mathcal{R}_A$. The particular choice

    $x = A^\dagger b$

has one more feature:

(11.21)    $\|A^\dagger b\| = \min\{\|y\| : y \in \mathbb{C}^q \text{ and } Ay = AA^\dagger b\}.$

To verify this, observe that if $y$ is any vector for which $Ay = AA^\dagger b$, then $y - A^\dagger b \in \mathcal{N}_A$; i.e.,

    $y = A^\dagger b + u$

for some vector $u \in \mathcal{N}_A$. Therefore, since

    $\langle A^\dagger b, u\rangle = \langle A^\dagger AA^\dagger b, u\rangle = \langle A^\dagger b, A^\dagger Au\rangle = \langle A^\dagger b, 0\rangle = 0,$

it follows that

    $\|y\|^2 = \|A^\dagger b\|^2 + \|u\|^2.$

Thus, $\|y\| \ge \|A^\dagger b\|$ with equality if and only if $y = A^\dagger b$.

Remark 11.12. If $A$ is a $p \times q$ matrix of rank $q$, then $A^HA$ is invertible and another recipe for obtaining an approximate solution to the equation $Ax = b$ is based on the observation that if $x$ is a solution, then

    $A^HAx = A^Hb$  and hence  $x = (A^HA)^{-1}A^Hb.$
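The recipe of Remark 11.12 can be tried on a small real example in exact arithmetic. The sketch below rests on assumptions that are not visible in this excerpt: it takes the conditions in (11.13) to be the usual Penrose conditions $AXA = A$, $XAX = X$, $(AX)^H = AX$, $(XA)^H = XA$, it restricts to real matrices so that $A^H = A^T$, and the $3 \times 2$ matrix, the vector $b$ and the helper names are all illustrative.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(M):
    """Gauss-Jordan inverse in exact arithmetic; fails with StopIteration if M is singular."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        p = A[c][c]
        A[c] = [x / p for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

def pinv_full_column_rank(A):
    """(A^T A)^{-1} A^T for a real matrix A whose columns are linearly independent."""
    At = transpose(A)
    return matmul(inverse(matmul(At, A)), At)

A = [[1, 0], [1, 1], [1, 2]]          # illustrative 3 x 2 matrix of rank 2
b = [[1], [2], [2]]                   # illustrative right hand side
A_dag = pinv_full_column_rank(A)
x = matmul(A_dag, b)                  # least squares solution
residual = [[r[0] - s[0]] for r, s in zip(matmul(A, x), b)]
print(x)
print(matmul(transpose(A), residual))  # the residual is orthogonal to the columns of A
```

The second printed quantity is $A^T(Ax - b)$; its vanishing is the normal equation $A^HAx = A^Hb$ of the remark, and it reflects the fact that $Ax = AA^\dagger b$ is the orthogonal projection of $b$ onto $\mathcal{R}_A$.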

Exercise 11.23. Let $A \in \mathbb{F}^{p\times q}$. Show that if $\mathrm{rank}\,A = q$, then $A^HA$ is invertible and

    $(A^HA)^{-1}A^H = A^\dagger.$

11.4. Drazin inverses

Theorem 11.13. If $A \in \mathbb{C}^{n\times n}$, then

(11.22)    $\mathcal{R}_{A^k} \supseteq \mathcal{R}_{A^{k+1}}$  for  $k = 0, 1, \ldots$

and

(11.23)    $\mathcal{R}_{A^k} = \mathcal{R}_{A^{k+1}} \Longrightarrow \mathcal{R}_{A^{k+1}} = \mathcal{R}_{A^{k+2}}.$

Proof. To verify (11.23), observe that if $\mathcal{R}_{A^k} = \mathcal{R}_{A^{k+1}}$, then

    $x = A^{k+1}u \Longrightarrow x = A(A^ku) = A(A^{k+1}v) = A^{k+2}v.$

The verification of (11.22) is easy and is left to the reader. □

Theorem 11.13 guarantees that there is a smallest nonnegative integer $k$ for which equality prevails in (11.22). This integer is called the Drazin index of $A$. The basic example of a matrix $A \in \mathbb{C}^{n\times n}$ with Drazin index $k$, $k = 1, \ldots, n-1$, is the matrix

(11.24)    $A = U\begin{bmatrix} B & O \\ O & C \end{bmatrix}U^{-1},$

where $B$ is invertible, $C^k = O$ but $C^{k-1} \ne O$.

The representation (11.24) may be obtained from the Jordan decomposition of a matrix $A$ with $\mathcal{N}_A \ne \{0\}$; $k$ is equal to the size of the largest Jordan cell in $J$ with zeros on the diagonal.

Exercise 11.24. Show that if $A \in \mathbb{C}^{n\times n}$ and $k$ is equal to the Drazin index of $A$, then $0 \le k \le n$. Exhibit matrices with Drazin index $0$, $1$, $2$ and $n$.

Theorem 11.14. If the Drazin index of a nonzero matrix $A \in \mathbb{C}^{n\times n}$ is equal to $k$, then there exists exactly one solution $X \in \mathbb{C}^{n\times n}$ of the system of matrix equations

(11.25)    $A^kXA = A^k$,  $XAX = X$  and  $AX = XA.$

Proof. If $A$ is invertible, then $k = 0$ and $X = A^{-1}$. If $A$ is not invertible and has Drazin index $k$ with $1 \le k \le n-1$, then it admits a representation of the form (11.24). Consequently, if

    $U^{-1}XU = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix},$

where $Y_{11}$ and $Y_{22}$ are square matrices that are of the same size as $B$ and $C$ in (11.24), respectively, then the first equation in the system (11.25) can be rewritten as

    $\begin{bmatrix} B^k & O \\ O & O \end{bmatrix}\begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}\begin{bmatrix} B & O \\ O & C \end{bmatrix} = \begin{bmatrix} B^k & O \\ O & O \end{bmatrix},$

which leads easily to the conclusion that

    $Y_{11} = B^{-1}$  and  $Y_{12}C = O.$

Similar analysis of the second and third equations in the system (11.25) leads to the supplementary constraints

    $Y_{22}CY_{21} = O$,  $Y_{22}CY_{22} = Y_{22}$,  $Y_{21}B = CY_{21}$,  $Y_{12} = O$,  $Y_{22}C = CY_{22}$

and hence that

    $Y_{22}C = Y_{22}CY_{22}C = \cdots = (Y_{22}C)^k = Y_{22}^kC^k = O.$

Therefore, $Y_{22} = Y_{22}CY_{22} = O$ and

    $Y_{21} = CY_{21}B^{-1} = C^2Y_{21}B^{-2} = \cdots = C^kY_{21}B^{-k} = O.$

Thus,

(11.26)    $X = U\begin{bmatrix} B^{-1} & O \\ O & O \end{bmatrix}U^{-1}$  if  $k = 1, \ldots, n-1,$

whereas

(11.27)    $A = UC_0^{(n)}U^{-1}$  and  $X = O$  if  $k = n.$ □
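The block description (11.24) and (11.26) can be checked on a tiny example. In the sketch below everything concrete is an assumption made for illustration: $U = I_3$, $B = [2]$, $C$ is the $2 \times 2$ nilpotent Jordan cell, and the helper names are hypothetical. The Drazin index is computed as the smallest $k$ with $\mathrm{rank}\,A^k = \mathrm{rank}\,A^{k+1}$, which is equivalent to equality in (11.22) because the ranges are nested.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def power(A, k):
    """A^k for k >= 0, with A^0 equal to the identity."""
    n = len(A)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = matmul(P, A)
    return P

def rank(M):
    """Rank by row reduction in exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r

def drazin_index(A):
    """Smallest k >= 0 with rank(A^k) == rank(A^(k+1))."""
    k = 0
    while rank(power(A, k)) != rank(power(A, k + 1)):
        k += 1
    return k

# A = diag(B, C) with B = [2] invertible and C the 2 x 2 nilpotent Jordan cell.
A = [[2, 0, 0], [0, 0, 1], [0, 0, 0]]
# Candidate from (11.26) with U = I: X = diag(B^{-1}, O).
X = [[Fraction(1, 2), 0, 0], [0, 0, 0], [0, 0, 0]]

k = drazin_index(A)
Ak = power(A, k)
print(k)
print(matmul(matmul(Ak, X), A) == Ak)   # A^k X A = A^k
print(matmul(matmul(X, A), X) == X)     # X A X = X
print(matmul(A, X) == matmul(X, A))     # A X = X A
```

For this matrix the first equation of (11.25) fails if $k$ is replaced by $1$, since $AXA \ne A$; the power $A^k$ is what removes the nilpotent block.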

The unique solution X of (11.25) is called the Drazin inverse of A and will be denoted Ab. Exercise 11.25. Show that if A E (Cnxn, then

(VAv-1)" = VAbV‘l,

(AH)" = (Ab)H and (A3)" = (.4511

Exercise 11.26. Show that if A E (Cnxn, then

(Ab)b = A RAM/A = {0}. Exercise 11.27. Show that if A 6 CM", then

the Drazin index of A = min{j : RAj DNAJ‘ = {0}} . 11.5. Bibliographical notes Exercises (11.25) and (11.26) were taken from a slide presentation of Adi Ben Israel that was available on the Web.

Chapter 12

Triangular factorization and positive definite matrices

    Half the harm that is done in this world
    Is due to people who want to feel important.
    They don't mean to do harm—but the harm does not interest them.
    Or they do not see it, or they justify it
    Because they are absorbed in the endless struggle
    To think well of themselves.
        T. S. Eliot, The Cocktail Party

This chapter is devoted primarily to positive definite and semidefinite matrices and related applications. To add perspective, however, it is convenient to begin with some general observations on the triangular factorization of matrices. In a sense this is not new, because the formula

    $EPA = U$  or, equivalently,  $A = P^{-1}E^{-1}U$

that emerged from the discussion of Gaussian elimination is almost a triangular factorization. Under appropriate extra assumptions on the matrix $A \in \mathbb{C}^{n\times n}$, the formula $A = P^{-1}E^{-1}U$ holds with $P = I_n$.

• Warning: Recall that $\langle u, v\rangle = \langle u, v\rangle_{st}$, the standard inner product, and $\|u\| = \|u\|_2$ for vectors $u, v \in \mathbb{F}^n$, unless indicated otherwise. Correspondingly, $\|A\| = \|A\|_{2,2}$ for matrices $A$.

12.1. A detour on triangular factorization

The notation

(12.1)    $A_{[j,k]} = \begin{bmatrix} a_{jj} & \cdots & a_{jk} \\ \vdots & & \vdots \\ a_{kj} & \cdots & a_{kk} \end{bmatrix}$  for  $A \in \mathbb{C}^{n\times n}$  and  $1 \le j \le k \le n$

will be convenient.

Theorem 12.1. A matrix A 6 CW” admits a factorization of the form

(122)

A = LDU,

where L E ((3a is a lower triangular matrix with ones on the diagonal, U 6 ((3a is an upper triangular matrix with ones on the diagonal and D 6 (Ca is an invertible diagonal matrix, if and only if the submatrices (12.3)

AlLkl

are invertible for

k = 1, . . . ,n.

Moreover, if the conditions in (12.3) are met, then there is only one set of matrices, L, D and U, with the stated properties for which (12.2) holds. Suppose first that condition (12.3) is in force and consider the

Proof. equation

(12.4)

XA = Y

in which X E (:a is lower triangular with ones on the diagonal and Y 6

Ca is upper triangular. This is a system of n2 equations with (n2 — n)/2 unknown entries $z'j with 1 g j < i S n in X and (n2 + n) /2 unknown entries yij with 1 g i g j g n in Y. The basic fact that makes this awesome

looking system of equations easy to solve is that (in self-evident notation)

(12.5)

X11 (

)[Lk]

[k

] [X21

0

A11

A12

X22]

[A21

A22]

1k [0

J = X11A11 = X[1.k]A[ 1’k]

for k = 1, . . . ,n. Thus, the k’th row of the reduced system (XA)[1,k] = Yu’k] is all = yll for k: = 1 and [£1711

° °'£E1’k_1

1] Auk] = [0

-' '

0

ykk]

for 2 S k S n.

But this in turn implies that (12.6)

[.7311

- - - $1,k_1] = — [akl

- --

ak,k_1] A[—1,1k—1]

for 2 S k: S n.

It is readily checked that if the entries in the lower triangular matrix X

are specified by the formulas (12.6), then Y = XA is upper triangular with nonzero diagonal entries. Thus, (12.2) holds with D = diag{y11, . . . ,ynn}, L = X‘1 and U = D‘lY.

12.1. A detour on triangular factorization

253

Conversely, if $A$ admits a factorization of the form (12.2) with the stated properties, then, upon writing the factorization in block form as

$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} L_{11} & O \\ L_{21} & L_{22} \end{bmatrix}\begin{bmatrix} D_{11} & O \\ O & D_{22} \end{bmatrix}\begin{bmatrix} U_{11} & U_{12} \\ O & U_{22} \end{bmatrix}$,

it is readily checked that $A_{11} = L_{11}D_{11}U_{11}$ or, equivalently, that

$A_{[1,k]} = L_{[1,k]}D_{[1,k]}U_{[1,k]}$ for $k = 1, \ldots, n$.

Thus, $A_{[1,k]}$ is invertible for $k = 1, \ldots, n$, as needed.

To verify uniqueness, suppose that $A = L_1D_1U_1 = L_2D_2U_2$. Then the identity $L_2^{-1}L_1D_1 = D_2U_2U_1^{-1}$ implies that $L_2^{-1}L_1D_1$ is both upper and lower triangular and hence must be a diagonal matrix, which is readily seen to be equal to $D_1$. Therefore, $L_1 = L_2$ and by an analogous argument $U_1 = U_2$, which then forces $D_1 = D_2$. □
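The construction in the proof of Theorem 12.1 can be tried out on a small matrix. The following is a minimal sketch, not the book's own algorithm: it uses ordinary elimination without row exchanges (which produces the same $L$, $D$ and $U$ when the leading blocks are invertible) and exact rational arithmetic; the function name `ldu` and the sample matrix are assumptions made for illustration.

```python
from fractions import Fraction

def ldu(A):
    """Factor A = L D U by elimination without row exchanges.

    L (U) is lower (upper) triangular with ones on the diagonal;
    D is returned as the list of its diagonal entries.
    """
    n = len(A)
    Y = [[Fraction(v) for v in row] for row in A]      # reduced in place to Y = X A
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if Y[k][k] == 0:
            # a zero pivot here means the leading (k+1) x (k+1) block is singular
            raise ValueError("a leading principal block is not invertible")
        for i in range(k + 1, n):
            L[i][k] = Y[i][k] / Y[k][k]                # multiplier, an entry of L = X^{-1}
            Y[i] = [a - L[i][k] * b for a, b in zip(Y[i], Y[k])]
    D = [Y[k][k] for k in range(n)]                    # D = diag{y_11, ..., y_nn}
    U = [[Y[i][j] / D[i] for j in range(n)] for i in range(n)]   # U = D^{-1} Y
    return L, D, U

# hypothetical sample matrix whose leading blocks are all invertible
L, D, U = ldu([[2, 1, 1], [4, 3, 3], [8, 7, 9]])
```

Multiplying the three factors back together recovers the sample matrix exactly, since all arithmetic is done with `Fraction`.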

Exercise 12.1. Let $P_k = \operatorname{diag}\{I_k, O_{(n-k)\times(n-k)}\}$. Show that
(a) $A \in \mathbb{C}^{n\times n}$ is upper triangular if and only if $AP_k = P_kAP_k$ for $k = 1, \ldots, n$.
(b) $A \in \mathbb{C}^{n\times n}$ is lower triangular if and only if $P_kA = P_kAP_k$ for $k = 1, \ldots, n$.

Exercise 12.2. Show that if $L \in \mathbb{C}^{n\times n}$ is lower triangular, $U \in \mathbb{C}^{n\times n}$ is upper triangular and $D \in \mathbb{C}^{n\times n}$ is diagonal, then

(12.7) $(LDU)_{[1,k]} = L_{[1,k]}D_{[1,k]}U_{[1,k]}$ and $(UDL)_{[k,n]} = U_{[k,n]}D_{[k,n]}L_{[k,n]}$

for $k = 1, \ldots, n$.

Theorem 12.2. A matrix $A \in \mathbb{C}^{n\times n}$ admits a factorization of the form

(12.8) $A = UDL$,

where $L \in \mathbb{C}^{n\times n}$ is a lower triangular matrix with ones on the diagonal, $U \in \mathbb{C}^{n\times n}$ is an upper triangular matrix with ones on the diagonal and $D \in \mathbb{C}^{n\times n}$ is an invertible diagonal matrix, if and only if the blocks

(12.9) $A_{[k,n]}$ are invertible for $k = 1, \ldots, n$.

Moreover, if the conditions in (12.9) are met, then there is only one set of matrices, $L$, $D$ and $U$, with the stated properties for which (12.8) holds.

Proof. The details are left to the reader. They are easily filled in with the aid of the second set of equalities in (12.7) and the proof of Theorem 12.1 as a guide. □

Exercise 12.3. Prove Theorem 12.2.


Exercise 12.4. Let $A \in \mathbb{C}^{n\times n}$ be a Vandermonde matrix with columns $v(\lambda_1), \ldots, v(\lambda_n)$ based on $n$ distinct points $\lambda_1, \ldots, \lambda_n$. Show that $A$ admits a factorization of the form (12.2), but may not admit a factorization of the form (12.8).

12.2. Definite and semidefinite matrices

A matrix $A \in \mathbb{C}^{n\times n}$ is said to be positive semidefinite over $\mathbb{C}^n$ if

(12.10) $\langle A\mathbf{x}, \mathbf{x}\rangle \ge 0$ for every $\mathbf{x} \in \mathbb{C}^n$;

it is said to be positive definite over $\mathbb{C}^n$ if

(12.11) $\langle A\mathbf{x}, \mathbf{x}\rangle > 0$ for every nonzero vector $\mathbf{x} \in \mathbb{C}^n$.

The notation $A \succeq O$ will be used to indicate that the matrix $A \in \mathbb{C}^{n\times n}$ is positive semidefinite over $\mathbb{C}^n$. Similarly, the notation $A \succ O$ will be used to indicate that the matrix $A \in \mathbb{C}^{n\times n}$ is positive definite over $\mathbb{C}^n$.

Moreover, if $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{n\times n}$, then $A \succeq B$ and $A \succ B$ mean that $A - B \succeq O$ and $A - B \succ O$, respectively.

Correspondingly, a matrix $A \in \mathbb{C}^{n\times n}$ is said to be negative semidefinite over $\mathbb{C}^n$ if $-A \succeq O$ and negative definite over $\mathbb{C}^n$ if $-A \succ O$.

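Lemma 12.3. If $A \in \mathbb{C}^{n\times n}$ and $A \succeq O$, then:
(1) $A$ is automatically Hermitian.
(2) The eigenvalues of $A$ are nonnegative numbers.
Moreover,

(12.12) $A \succ O \iff A \succeq O$ and the eigenvalues of $A$ are all positive

(12.13) $\iff A \succeq O$ and $\det A > 0$.

Proof. If $A \succeq O$, then

$\langle A\mathbf{x}, \mathbf{x}\rangle = \overline{\langle A\mathbf{x}, \mathbf{x}\rangle} = \langle \mathbf{x}, A\mathbf{x}\rangle$ for every $\mathbf{x} \in \mathbb{C}^n$.

Therefore, by a straightforward calculation,

$4\langle A\mathbf{x}, \mathbf{y}\rangle = \sum_{k=1}^{4} i^k \langle A(\mathbf{x} + i^k\mathbf{y}), \mathbf{x} + i^k\mathbf{y}\rangle = \sum_{k=1}^{4} i^k \langle \mathbf{x} + i^k\mathbf{y}, A(\mathbf{x} + i^k\mathbf{y})\rangle = 4\langle \mathbf{x}, A\mathbf{y}\rangle$;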


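i.e., $\langle A\mathbf{x}, \mathbf{y}\rangle = \langle \mathbf{x}, A\mathbf{y}\rangle$ for every choice of $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$. Therefore, (1) holds. Next, let $\mathbf{x}$ be an eigenvector of $A$ corresponding to the eigenvalue $\lambda$. Then

$\lambda\langle \mathbf{x}, \mathbf{x}\rangle = \langle A\mathbf{x}, \mathbf{x}\rangle \ge 0$.

Therefore $\lambda \ge 0$, since $\langle \mathbf{x}, \mathbf{x}\rangle > 0$. This justifies assertion (2); the rest of the proof is left to the reader. □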

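Warning: The conclusions of Lemma 12.3 are not true under the less restrictive constraint

$\langle A\mathbf{x}, \mathbf{x}\rangle \ge 0$ for every $\mathbf{x} \in \mathbb{R}^n$.

Thus, for example, if

$A = \begin{bmatrix} 2 & -2 \\ 0 & 2 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$,

then

$\langle A\mathbf{x}, \mathbf{x}\rangle = (x_1 - x_2)^2 + x_1^2 + x_2^2 > 0$

for every nonzero vector $\mathbf{x} \in \mathbb{R}^2$. However, $A$ is clearly not Hermitian.

Exercise 12.5. Verify the equivalences (12.12) and (12.13).

Exercise 12.6. Show that if $V \in \mathbb{C}^{n\times n}$ is invertible, then

$A \succ O \iff V^HAV \succ O$.

Exercise 12.7. Show that if $V \in \mathbb{C}^{n\times k}$ and $\operatorname{rank} V = k$, then

$A \succ O \implies V^HAV \succ O$,

but the converse implication is not true if $k < n$.

Exercise 12.8. Show that if the $n \times n$ matrix $A = [a_{ij}]$, $i, j = 1, \ldots, n$, is positive semidefinite over $\mathbb{C}^n$, then $|a_{ij}|^2 \le a_{ii}a_{jj}$.

Exercise 12.9. Show that if $A \in \mathbb{C}^{n\times n}$, $n = p + q$ and

$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$,

where $A_{11} \in \mathbb{C}^{p\times p}$, $A_{22} \in \mathbb{C}^{q\times q}$, then

$A \succ O \iff A_{11} \succ O$, $A_{21} = A_{12}^H$ and $A_{22} - A_{21}A_{11}^{-1}A_{12} \succ O$.

Exercise 12.10. Show that if $A \in \mathbb{C}^{p\times q}$, then

$\|A\| \le 1 \iff I_q - A^HA \succeq O \iff I_p - AA^H \succeq O$.

[HINT: Use the singular value decomposition of $A$.]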
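The warning that follows Lemma 12.3 can be checked by hand or numerically: for the matrix $A$ displayed there, the form $\langle A\mathbf{x}, \mathbf{x}\rangle$ is positive on real vectors but fails to be real on a suitable complex vector. The sketch below assumes the convention $\langle \mathbf{u}, \mathbf{v}\rangle = \mathbf{v}^H\mathbf{u}$; the helper name `quad_form` and the two test vectors are illustrative choices, not part of the text.

```python
def quad_form(A, x):
    """Return <Ax, x> = x^H A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(Ax[i] * complex(x[i]).conjugate() for i in range(n))

A = [[2, -2], [0, 2]]                     # the matrix from the warning
real_value = quad_form(A, [1, 1])         # real test vector: (x1 - x2)^2 + x1^2 + x2^2
complex_value = quad_form(A, [1, 1j])     # complex test vector
is_real = complex_value.imag == 0         # False for this A, so A is not >= O over C^2
```

Since positivity over $\mathbb{C}^n$ requires $\langle A\mathbf{x}, \mathbf{x}\rangle$ to be real for every complex $\mathbf{x}$, a single complex vector with a nonreal value is enough to rule it out.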


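Exercise 12.11. Show that if $A \in \mathbb{C}^{n\times n}$ and $A = A^H$, then

$\begin{bmatrix} A^2 & A \\ A & I_n \end{bmatrix} \succeq O$.

Exercise 12.12. Show that if $A \in \mathbb{C}^{n\times n}$ and $A \succeq O$, then

$\begin{bmatrix} A & A \\ A & A \end{bmatrix} \succeq O$.

Exercise 12.13. Let $U \in \mathbb{C}^{n\times n}$ be unitary and let $A \in \mathbb{C}^{n\times n}$. Show that if $A \succ O$ and $AU \succ O$, then $U = I_n$. [HINT: Consider $\langle A\mathbf{x}, \mathbf{x}\rangle$ for eigenvectors $\mathbf{x}$ of $U$.]

12.3. Characterizations of positive definite matrices

Our next objective is to present a number of different characterizations of the class

(12.14) $\mathbb{C}^{n\times n}_{\succ} = \{A \in \mathbb{C}^{n\times n} : A \succ O\}$.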

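Theorem 12.4. If $A \in \mathbb{C}^{n\times n}$, then the following statements are equivalent:

(1) $A \succ O$.
(2) $A = A^H$ and the eigenvalues, $\lambda_1, \ldots, \lambda_n$, of $A$ are all positive; i.e., $\lambda_j > 0$ for $j = 1, \ldots, n$.
(3) $A = V^HV$ for some $n \times n$ invertible matrix $V$.
(4) $A = A^H$ and $\det A_{[1,k]} > 0$ for $k = 1, \ldots, n$.
(5) $A = LL^H$, where $L$ is a lower triangular invertible matrix.
(6) $A = A^H$ and $\det A_{[k,n]} > 0$ for $k = 1, \ldots, n$.
(7) $A = UU^H$, where $U$ is an upper triangular invertible matrix.

Proof. Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ denote an orthonormal set of eigenvectors of $A$ corresponding to $\lambda_1, \ldots, \lambda_n$. Then, since $\langle \mathbf{u}_j, \mathbf{u}_j\rangle = 1$, the formula

$\lambda_j = \lambda_j\langle \mathbf{u}_j, \mathbf{u}_j\rangle = \langle A\mathbf{u}_j, \mathbf{u}_j\rangle$ for $j = 1, \ldots, n$,

clearly displays the fact that (1) $\implies$ (2). Next, if (2) is in force, then $D = \operatorname{diag}\{\lambda_1, \ldots, \lambda_n\}$ admits a square root

$D^{1/2} = \operatorname{diag}\{\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}\}$

and hence the diagonalization formula

$A = UDU^H$ with $U = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{bmatrix}$

can be rewritten as

$A = V^HV$ with $V = D^{1/2}U^H$ invertible.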


Thus, (2) $\implies$ (3), and, upon setting

$\Pi_k = \begin{bmatrix} I_k \\ O_{(n-k)\times k} \end{bmatrix}$ and $V_1 = V\Pi_k$,

it is readily seen that

$A_{[1,k]} = \Pi_k^HA\Pi_k = \Pi_k^HV^HV\Pi_k = V_1^HV_1$.

But this implies that

$\langle \Pi_k^HA\Pi_k\mathbf{x}, \mathbf{x}\rangle = \langle V_1^HV_1\mathbf{x}, \mathbf{x}\rangle = \langle V_1\mathbf{x}, V_1\mathbf{x}\rangle > 0$

for every nonzero vector $\mathbf{x} \in \mathbb{C}^k$, since $V_1$ has $k$ linearly independent columns. Therefore, (3) implies (4). However, in view of Theorem 12.1, (4) implies that $A = L_1DU_1$, where $L_1 \in \mathbb{C}^{n\times n}$ is a lower triangular matrix with ones on the diagonal, $U_1 \in \mathbb{C}^{n\times n}$ is an upper triangular matrix with ones on the diagonal and $D \in \mathbb{C}^{n\times n}$ is an invertible diagonal matrix. Thus, as $A = A^H$ in the present setting, it follows that

$(U_1^H)^{-1}L_1D = D^HL_1^HU_1^{-1}$

and therefore, since the left-hand side of the last identity is lower triangular and the right-hand side is upper triangular, both sides must be diagonal. As $D$ is an invertible diagonal matrix, the matrix $(U_1^H)^{-1}L_1$ must also be a diagonal matrix. Moreover, since both $U_1$ and $L_1$ have ones on their diagonals, it follows that $(U_1^H)^{-1}L_1 = I_n$, i.e., $U_1^H = L_1$. Consequently,

$A_{[1,k]} = \Pi_k^HA\Pi_k = \Pi_k^HU_1^HDU_1\Pi_k = (\Pi_k^HU_1^H\Pi_k)(\Pi_k^HD\Pi_k)(\Pi_k^HU_1\Pi_k)$

and

$\det A_{[1,k]} = \det\{(L_1)_{[1,k]}\}\,\det\{D_{[1,k]}\}\,\det\{(U_1)_{[1,k]}\} = d_{11}\cdots d_{kk}$

for $k = 1, \ldots, n$. Therefore, $D$ is positive definite over $\mathbb{C}^n$ as is $A = L_1DL_1^H$.

The formula advertised in (5) is obtained by setting $L = L_1D^{1/2}$; i.e., (4) implies (5). It is also clear that (5) implies (1). Next, the matrix identity

$\begin{bmatrix} O & I_{n-k} \\ I_k & O \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} O & I_k \\ I_{n-k} & O \end{bmatrix} = \begin{bmatrix} A_{22} & A_{21} \\ A_{12} & A_{11} \end{bmatrix}$

clearly displays the fact that (4) holds if and only if (6) holds. Moreover, since (7) implies (1), it remains only to show that (6) implies (7) in order to complete the proof. This is left to the reader as an exercise.
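The equivalence of (1) and (5) in Theorem 12.4 suggests a practical test for positive definiteness: attempt to build a lower triangular $L$ with $A = LL^H$ column by column and stop if a nonpositive pivot appears. The sketch below is the standard Cholesky recursion rather than the route $L = L_1D^{1/2}$ taken in the proof; the function name `cholesky`, the tolerance and the sample matrix are assumptions made for illustration.

```python
import math

def cholesky(A, tol=1e-12):
    """Return lower triangular L with A = L L^H, or raise ValueError if A is not
    (numerically) Hermitian positive definite. A is a list of rows."""
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    for j in range(n):
        # diagonal entry: l_jj^2 = a_jj - sum_k |l_jk|^2 must be a positive real number
        s = complex(A[j][j] - sum(L[j][k] * L[j][k].conjugate() for k in range(j)))
        if abs(s.imag) > tol or s.real <= 0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = complex(math.sqrt(s.real))
        for i in range(j + 1, n):
            # below the diagonal: a_ij = sum_k l_ik * conj(l_jk)
            t = A[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            L[i][j] = t / L[j][j]
    return L

A = [[4, 2j], [-2j, 5]]        # hypothetical Hermitian sample matrix
L = cholesky(A)
```

The recursion reads only the lower triangle of $A$, so a caller who is unsure whether $A = A^H$ should check that separately.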

Exercise 12.14. Verify the implication (6) $\implies$ (7) in Theorem 12.4.

Exercise 12.15. Let $A \in \mathbb{C}^{n\times n}$ and let $D_A = \operatorname{diag}\{a_{11}, \ldots, a_{nn}\}$ denote the $n \times n$ diagonal matrix with diagonal entries equal to the diagonal entries of $A$. Show that $D_A$ is multiplicative on upper triangular matrices in the sense that if $A$ and $B$ are both $n \times n$ upper triangular matrices, then $D_{AB} = D_AD_B$ and thus, if $A$ is invertible, $D_{A^{-1}} = (D_A)^{-1}$.


Remark 12.5. The proof that a matrix $A \in \mathbb{C}^{n\times n}_{\succ}$ admits a factorization of the form $A = LL^H$ for some lower triangular invertible matrix $L$ can also be based on the general factorization formula $EPA = U$ that was established as a byproduct of Gaussian elimination. The proof may be split into two parts. The first part is to check that, since $A \succ O$, there always exists a lower triangular matrix $E$ with ones on the diagonal such that $EA = U$ is in upper echelon form and hence upper triangular. Once this is verified, the second part is easy: The identity

$UE^H = EAE^H = (EAE^H)^H = EU^H$

implies that $D = UE^H$ is a positive definite matrix that is both lower triangular and upper triangular. Therefore,

$D = \operatorname{diag}\{d_{11}, \ldots, d_{nn}\}$

is a diagonal matrix with $d_{jj} > 0$ for $j = 1, \ldots, n$. Thus, $D$ has a positive square root:

$D = F^2$, where $F = \operatorname{diag}\{(d_{11})^{1/2}, \ldots, (d_{nn})^{1/2}\}$,

and consequently

$A = (E^{-1}F)(E^{-1}F)^H$.

This is a representation of the desired form, since $L = E^{-1}F$ is lower triangular. Notice that $d_{jj}$ is the $j$'th pivot of $U$ and $E^{-1} = (D^{-1}U)^H$.

Lemma 12.6. If $A \in \mathbb{C}^{n\times n}_{\succ}$, then there exists exactly one lower triangular matrix $X \in \mathbb{C}^{n\times n}$ and one upper triangular matrix $U \in \mathbb{C}^{n\times n}$ such that

(12.15) $AX = U$ and $u_{jj} = 1$ for $j = 1, \ldots, n$.

Moreover, $\Delta = \operatorname{diag}\{x_{11}, \ldots, x_{nn}\}$ is positive definite and $U$ is uniquely specified by $X$: $U^H = \Delta X^{-1}$.

Proof. The existence of a factorization of the indicated form follows from Theorem 12.4. However, if there were two such factorizations $AX_1 = U_1$ and $AX_2 = U_2$, then $X_1$ and $X_2$ are automatically invertible and the formulas

$A = U_1X_1^{-1} = U_2X_2^{-1}$

imply that $X_1^{-1}X_2 = U_1^{-1}U_2 = I_n$, since $X_1^{-1}X_2$ is lower triangular and $U_1^{-1}U_2$ is upper triangular with ones on the diagonal. The formula $U^H = \Delta X^{-1}$ is obtained in much the same way from the fact that $A = UX^{-1} = A^H$. The details are left to the reader. □


Exercise 12.16. Verify the formula $U^H = \Delta X^{-1}$ in the setting of Lemma 12.6.

Exercise 12.17. Show that if $A \in \mathbb{C}^{3\times 3}_{\succ}$, then there exists exactly one lower triangular matrix $E$ with ones on the diagonal such that $EA$ is upper triangular.

Exercise 12.18. Show that if $A \in \mathbb{C}^{3\times 3}$ and $A \succ O$, then there exists exactly one upper triangular matrix $F$ with ones on the diagonal such that $FA$ is lower triangular. [HINT: This is very much like Gaussian elimination in spirit, except that now you work from the bottom row up instead of from the top row down.]

Exercise 12.19. Let $A = \begin{bmatrix} A_1 & A_2 \end{bmatrix}$, where $A_1 \in \mathbb{C}^{n\times s}$, $A_2 \in \mathbb{C}^{n\times t}$ and $s + t = r$. Show that if $\operatorname{rank} A = r$, then the matrices $A^HA$, $A_1^HA_1$, $A_2^HA_2$ and $A_2^HA_2 - A_2^HA_1(A_1^HA_1)^{-1}A_1^HA_2$ are all positive definite (over complex spaces of appropriate sizes).

Exercise 12.20. Show that if $x \in \mathbb{R}$, then the matrix

[3 × 3 matrix depending on $x$; its entries are illegible in the scan]

will be positive definite over $\mathbb{C}^3$ if and only if $(x - 1)^2 < 1/2$.

12.4. An application of factorization

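Lemma 12.7. Let $A \in \mathbb{C}^{n\times n}_{\succ}$ and suppose that

$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$ and $A^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$,

where $A_{21}, B_{21} \in \mathbb{C}^{n-1}$, $A_{22}, B_{22} \in \mathbb{C}^{(n-1)\times(n-1)}$ and $A_{11}, B_{11} \in \mathbb{R}$. Then

(12.16) $\min_{x_2, \ldots, x_n} \left\langle A\Big(\mathbf{e}_1 - \sum_{j=2}^{n} x_j\mathbf{e}_j\Big), \mathbf{e}_1 - \sum_{j=2}^{n} x_j\mathbf{e}_j \right\rangle = A_{11} - A_{12}A_{22}^{-1}A_{21} = B_{11}^{-1}$.

Proof. In view of Theorem 12.4, $A = LL^H$, where $L \in \mathbb{C}^{n\times n}$ is an invertible lower triangular matrix. Therefore,

$\left\langle A\Big(\mathbf{e}_1 - \sum_{j=2}^{n} x_j\mathbf{e}_j\Big), \mathbf{e}_1 - \sum_{j=2}^{n} x_j\mathbf{e}_j \right\rangle = \Big\| L^H\Big(\mathbf{e}_1 - \sum_{j=2}^{n} x_j\mathbf{e}_j\Big) \Big\|^2$.

Let

$\mathbf{v}_j = L^H\mathbf{e}_j$ for $j = 1, \ldots, n$, $V = \begin{bmatrix} \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix}$ and $\mathcal{V} = \operatorname{span}\{\mathbf{v}_2, \ldots, \mathbf{v}_n\}$.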


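Then, since $L$ is invertible, the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are linearly independent and hence the orthogonal projection $P_{\mathcal{V}}$ of $\mathbb{C}^n$ onto $\mathcal{V}$ is given by the formula

$P_{\mathcal{V}} = V(V^HV)^{-1}V^H$.

Thus, the minimum of interest is equal to

$\|\mathbf{v}_1 - P_{\mathcal{V}}\mathbf{v}_1\|^2 = \langle \mathbf{v}_1 - P_{\mathcal{V}}\mathbf{v}_1, \mathbf{v}_1\rangle = \|\mathbf{v}_1\|^2 - \mathbf{v}_1^HV(V^HV)^{-1}V^H\mathbf{v}_1$.

It remains to express this number in terms of the entries in the original matrix $A$ by taking advantage of the formulas

$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = A = LL^H = \begin{bmatrix} \mathbf{v}_1^H \\ V^H \end{bmatrix}\begin{bmatrix} \mathbf{v}_1 & V \end{bmatrix} = \begin{bmatrix} \mathbf{v}_1^H\mathbf{v}_1 & \mathbf{v}_1^HV \\ V^H\mathbf{v}_1 & V^HV \end{bmatrix}$.

The rest is left to the reader. □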

Exercise 12.21. Complete the proof of Lemma 12.7.
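Formula (12.16) of Lemma 12.7 can be sanity-checked in the smallest nontrivial case. The sketch below takes a real symmetric positive definite $2 \times 2$ matrix (a hypothetical example, so that no complex conjugates are needed), evaluates the quadratic form at its minimizer, and compares the value with the Schur complement and with $B_{11}^{-1}$; the names `quadratic`, `schur` and `x_star` are illustrative.

```python
from fractions import Fraction

def quadratic(A, x2):
    """<A(e1 - x2 e2), e1 - x2 e2> for a real symmetric 2 x 2 matrix A."""
    v = [Fraction(1), -Fraction(x2)]
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]   # sample matrix, A > O

schur = A[0][0] - A[0][1] * A[1][0] / A[1][1]   # A11 - A12 A22^{-1} A21
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
B11 = A[1][1] / det_A                            # the 11 entry of A^{-1}
x_star = A[1][0] / A[1][1]                       # minimizer of the quadratic in x2
minimum = quadratic(A, x_star)                   # agrees with schur and with 1 / B11
```

This is only a numerical illustration of the statement; it does not replace the argument requested in Exercise 12.21.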

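Exercise 12.22. Let $A \in \mathbb{C}^{n\times n}_{\succ}$. Evaluate

$\min_{x_1, \ldots, x_{n-1}} \left\langle A\Big(\mathbf{e}_n - \sum_{j=1}^{n-1} x_j\mathbf{e}_j\Big), \mathbf{e}_n - \sum_{j=1}^{n-1} x_j\mathbf{e}_j \right\rangle$

in terms of the entries in $A$ and the entries in $A^{-1}$.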

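12.5. Positive definite Toeplitz matrices

In this section we shall sketch some applications related to factorization in the special case that the given positive definite matrix is a Toeplitz matrix. We shall begin, however, without imposing this restriction and shall let

(12.17) $T_k = \begin{bmatrix} t_0 & t_{-1} & \cdots & t_{-k} \\ t_1 & t_0 & \cdots & t_{1-k} \\ \vdots & & \ddots & \vdots \\ t_k & \cdots & t_1 & t_0 \end{bmatrix}$

denote a $(k+1) \times (k+1)$ Toeplitz matrix, $k = 0, \ldots, n$, and, if $T_k$ is invertible,

(12.18) $T_k^{-1} = \Gamma_k = \begin{bmatrix} \gamma_{00}^{(k)} & \cdots & \gamma_{0k}^{(k)} \\ \vdots & & \vdots \\ \gamma_{k0}^{(k)} & \cdots & \gamma_{kk}^{(k)} \end{bmatrix}$ for $k = 0, \ldots, n$.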

Theorem 12.8. If $T_n$ and $T_{n-1}$ are invertible Toeplitz matrices and

$p_n(\lambda) = \sum_{j=0}^{n} \gamma_{j0}^{(n)}\lambda^j$ and $q_n(\lambda) = \sum_{j=0}^{n} \gamma_{jn}^{(n)}\lambda^j$,

then:


(n)


(7;) 7A 0 and

(1) 700 =7n n

(n) _1—



(n) —1—

n_- = pn(/\){700 } 1071(0))1 —Aq(>\){7nn} 2 A 7; 'yZ-(jhu’ _ Aw

(12.19)

qn(w) .

i,j=0

(2) $\gamma_{ij}^{(n)} = \gamma_{n-j,n-i}^{(n)}$ for $i, j = 0, \ldots, n$.

Moreover, if also:

(3) $T_n = T_n^H$, then $\gamma_{ij}^{(n)} = \overline{\gamma_{ji}^{(n)}}$ for $i, j = 0, \ldots, n$ and $q_n(\lambda) = \lambda^n\,\overline{p_n(1/\overline{\lambda})}$.

(4) $T_n \succ O$, then the polynomial $p_n(\lambda)$ has no roots in the closed unit disc.

(5) $T_n \succ O$ and $S_n \succ O$ is an $(n+1) \times (n+1)$ Toeplitz matrix such that $T_n^{-1}\mathbf{e}_1 = S_n^{-1}\mathbf{e}_1$, then $S_n = T_n$.

Proof. Since $T_n$ is a Toeplitz matrix, the $00$ minor $(T_n)_{\{00\}}$ and the $nn$ minor $(T_n)_{\{nn\}}$ of $T_n$ are both equal to $T_{n-1}$. Thus, by Theorem 5.11,

$\gamma_{00}^{(n)} = \dfrac{\det T_{n-1}}{\det T_n} = \gamma_{nn}^{(n)} \ne 0$.

Consequently, we can invoke the Schur complement formulas to write

(12.20) =

_

(n) 700

X

H

1 {733)}-1XH 0 0 733) 1 In M73354 In 0 X—x{753)}-IXH o

and

(12.21) I),



Y

yn

yH 75173] In Whirl Y-y{7£’£)}‘1yfl

0

where XH = [76?)

1

0

762)], yH = [77%)

0

(mln

O ’

7533 Wkly}? 1

75:24 , X denotes the

lower right—hand n X n corner of 1‘” and Y denotes the upper left-hand n x n corner of I‘n. Consequently, the polynomial 223:0 Al 713-(n) —j w can be


expressed as

A"] I‘m .

[1 A

wwoég‘s 1m w

+[A

An] [X—x{*y(()3)}_1XH] 2 , W

and a second development based on the second Schur complement formula yields the identity

1

[1 A

WP", : =qn(A){7£Z)}_1W w” 1

+[1

An—1][Y—y{7£’¥3}‘1yH]

511—1

The proof of formula (12.19) is now completed by first verifying that

(12-22)

XX{75")} — 1XH 11,1: Y — y{7£2)}‘1yH ,

and then subtracting Aw times the second formula from the first. The details are left to the reader as an exercise.

To verify (2), observe first that if $A = [a_{ij}]$ for $i, j = 0, \ldots, n$ and $\mathbf{e}_i$ denotes the $i$'th column of $I_{n+1}$, then $a_{ij} = \mathbf{e}_{i+1}^TA\mathbf{e}_{j+1}$ for $i, j = 0, \ldots, n$. Thus, if $Z = \sum_{s=1}^{n+1} \mathbf{e}_s\mathbf{e}_{n+2-s}^T$, the matrix with ones on the backwards diagonal and zeros elsewhere, then $Z^2 = I_{n+1}$, $Z = Z^T$ and the $ij$ entry of $ZAZ$ is equal to

$\mathbf{e}_{i+1}^TZAZ\mathbf{e}_{j+1} = \mathbf{e}_{n+1-i}^TA\mathbf{e}_{n+1-j} = a_{n-i,n-j}$ for $i, j = 0, \ldots, n$.

Thus, if $A$ is a Toeplitz matrix, i.e., if $a_{ij} = a_{i-j}$, then $ZAZ = A^T$. Consequently, the identity

$I_{n+1} = ZI_{n+1}Z = ZT_nZZ\Gamma_nZ = T_n^TZ\Gamma_nZ$

implies that $Z\Gamma_nZ = \Gamma_n^T$, which yields (2). The verification of (3) is an easy consequence of the fact that $T_n = T_n^H$ if and only if $\Gamma_n = \Gamma_n^H$.
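The identity $ZAZ = A^T$ for Toeplitz matrices, which drives the verification of (2), is easy to confirm on a small example. In the sketch below the helper names (`toeplitz`, `exchange`, `matmul`, `transpose`) and the particular sequence $t_j$ are assumptions chosen for illustration; indices run from $0$ to $n$ as in (12.17).

```python
def toeplitz(t, n):
    """(n+1) x (n+1) matrix with ij entry t[i - j]; t maps -n..n to numbers."""
    return [[t[i - j] for j in range(n + 1)] for i in range(n + 1)]

def exchange(n):
    """Matrix Z with ones on the backwards diagonal and zeros elsewhere."""
    return [[int(i + j == n) for j in range(n + 1)] for i in range(n + 1)]

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)] for i in range(m)]

def transpose(A):
    return [list(col) for col in zip(*A)]

n = 3
t = {j: 10 + 3 * j + j * j for j in range(-n, n + 1)}   # arbitrary, not symmetric in j
T = toeplitz(t, n)
Z = exchange(n)
ZTZ = matmul(matmul(Z, T), Z)      # equals transpose(T) because T is Toeplitz
```

Replacing `T` by a matrix that is not constant along its diagonals will in general break the equality, which is the point of the Toeplitz assumption.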


Suppose next that pn(w) = 0. Then formula (12.19) implies that —1 _

(12.23)

n

. n

.

—l2qn 0, then I‘m admits the upper-lower triangular factorization

(12.29)

1“,, = UnnglUf ,

where

(12.30)

733) 7%: Un=

mg:

0

711

71"”

0

0

v5.22)

and Dn=diag{700),*yll),..., (7‘)}.

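Positive definite Toeplitz matrices play a significant role in the theory of prediction of stationary stochastic sequences, which, when recast in the language of trigonometric approximation, focuses on evaluations of the following sort:

(12.31) $\min\left\{ \dfrac{1}{2\pi}\int_0^{2\pi} \Big| e^{in\theta} - \sum_{j=0}^{n-1} c_je^{ij\theta} \Big|^2 f(e^{i\theta})\,d\theta : c_0, \ldots, c_{n-1} \in \mathbb{C} \right\} = \{\gamma_{nn}^{(n)}\}^{-1}$

and

(12.32) $\min\left\{ \dfrac{1}{2\pi}\int_0^{2\pi} \Big| 1 - \sum_{j=1}^{n} c_je^{ij\theta} \Big|^2 f(e^{i\theta})\,d\theta : c_1, \ldots, c_n \in \mathbb{C} \right\} = \{\gamma_{00}^{(n)}\}^{-1}$,

where, for ease of exposition, we assume that $f(e^{i\theta})$ is a continuous function of $\theta$ on the interval $0 \le \theta \le 2\pi$ such that $f(e^{i\theta}) > 0$ on this interval. Let $T_n = T_n(f)$ denote the Toeplitz matrix with entries

(12.33) $t_j = \dfrac{1}{2\pi}\int_0^{2\pi} f(e^{i\theta})e^{-ij\theta}\,d\theta$ for $j = 0, \pm 1, \pm 2, \ldots$.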

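Exercise 12.26. Show that

(12.34) if $\mathbf{b} = \begin{bmatrix} b_0 \\ \vdots \\ b_n \end{bmatrix}$, then $\dfrac{1}{2\pi}\int_0^{2\pi} \Big| \sum_{j=0}^{n} b_je^{ij\theta} \Big|^2 f(e^{i\theta})\,d\theta = \mathbf{b}^HT_n\mathbf{b}$.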

Exercise 12.27. Show that if Tn > O and uH = [tn

(”'35)

In

0

T..- 1

0

t1], then

1,,

T” _ [111911.111 ll 0H {wit)}1l [OH

”11111

1 l'


Exercise 12.28. Show that if Tn >— O and VH = [t1

(1236)

tn], then

OH mv 1 OH T. _ [01 vHT;_11 1 ] {7535—1 0 TH I. .

Exercise 12.29. Show that if $T_n \succ O$, then

(12.37) $\det T_n = \{\gamma_{00}^{(n)}\}^{-1}\{\gamma_{00}^{(n-1)}\}^{-1}\cdots\{\gamma_{00}^{(0)}\}^{-1} = \{\gamma_{nn}^{(n)}\}^{-1}\{\gamma_{n-1,n-1}^{(n-1)}\}^{-1}\cdots\{\gamma_{00}^{(0)}\}^{-1}$.

Exercise 12.30. Verify formula (12.31). [HINT: Exploit formulas (12.34) and (12.35).]

Exercise 12.31. Verify formula (12.32). [HINT: Exploit formulas (12.34) and (12.36).]

Exercise 12.32. Use the formulas in Lemma 8.9 for calculating orthogonal projections to verify (12.31) and (12.32) another way.

Exercise 12.33. Show that if $T_n \succ O$, then $\gamma_{00}^{(n)} = \gamma_{nn}^{(n)}$ and

(12.38) $\gamma_{00}^{(n)} \ge \gamma_{00}^{(n-1)}$.

[HINT: The monotonicity is an easy consequence of formula (12.32).]

Exercise 12.34. Show that if $T_n \succ O$, then the polynomials

(12.39) $q_k(\lambda) = \sum_{j=0}^{k} \gamma_{jk}^{(k)}\lambda^j$ for $k = 0, \ldots, n$

are orthogonal with respect to the inner product 1

27r

(12.40) f=%/ qkfqjdm and f=v£’;)Exercise 12.35. Use the orthogonal polynomials defined by formula (12.39) to give a new proof of formula (12.31). [HINT: Write C" = 237:0 cj(().]

Exercise 12.36. Let flew) = |h(ew)|2, where h(() = 23:0 tj, 22:0 lhj] < 00 and |h(C)| > 0 for lg] g 1. Granting that 1 / h has the same properties as h (which follows from a theorem of Norbert Wiener), show that

(12-41)

liTmhmYl = Ihol2-

[HINT I1— Z?=1¢j€"j‘9|2f(€”> 5W) — watermeww = m. +

“(ezellza where ”(826) = 2:11 hjewe — 23-;1 cjeWh(e“9) is orthogonal to ho with respect to the inner product of Exercise 8.3 adapted to [0, 27r].]


The next lemma serves to guarantee that the conditions imposed on $f(e^{i\theta})$ in Exercise 12.36 are met if $f(\zeta) = a(\zeta)/b(\zeta)$, where $a(\zeta) = \sum_{j=-k}^{k} a_j\zeta^j$ and $b(\zeta) = \sum_{j=-\ell}^{\ell} b_j\zeta^j$ are trigonometric polynomials such that $a(\zeta) > 0$ and $b(\zeta) > 0$ when $|\zeta| = 1$.

Lemma 12.9. (Fejér-Riesz) Let

$f(\zeta) = \sum_{j=-n}^{n} f_j\zeta^j$ for $|\zeta| = 1$

be a trigonometric polynomial such that $f(\zeta) > 0$ for every point $\zeta \in \mathbb{C}$ with $|\zeta| = 1$ and $f_n \ne 0$. Then there exists a polynomial $\varphi_n(\zeta) = a(\zeta - \alpha_1)\cdots(\zeta - \alpha_n)$ such that

$f(\zeta) = |\varphi_n(\zeta)|^2$ for $|\zeta| = 1$ and $|\alpha_j| > 1$ for $j = 1, \ldots, n$.

Proof. Under the given assumptions, it is readily checked that $f_{-j} = \overline{f_j}$ for $j = 0, \ldots, n$ and hence, that $f(\beta) = \overline{f(1/\overline{\beta})}$ for every point $\beta \in \mathbb{C} \setminus \{0\}$. Moreover, since $g(\zeta) = \zeta^nf(\zeta) = f_{-n} + f_{1-n}\zeta + \cdots + f_n\zeta^{2n}$ is a polynomial of degree $2n$ with $g(0) = f_{-n} \ne 0$,

$g(\zeta) = a(\zeta - \beta_1)\cdots(\zeta - \beta_{2n})$

for some choice of points $a, \beta_1, \ldots, \beta_{2n} \in \mathbb{C} \setminus \{0\}$. However, in view of the preceding discussion, these roots can be indexed so that $|\beta_j| > 1$ and $\beta_{j+n} = 1/\overline{\beta_j}$ for $j = 1, \ldots, n$. Therefore,

f(C) = \){’y((,3)}_1 AQn(/\){'y£’t}‘1] and

Y0) = [Paw —AQ;(X)H].

Then

[7§3){753)}_1

’Yfihmlfifitfl]

for

j=1,...,n

Xj=

[0 119]

for j=n+1

and

“75)t -{7,(3_1}H]

for j=1,...,n

s

[o —{7§;;}H] for j=n+1

0 ,


and the formula emerges from formula (12.48) upon making the requisite substitutions. □

12.7. A maximum entropy matrix completion problem

In this section we shall consider the problem of completing a partially specified matrix in the class $\mathbb{C}^{n\times n}_{\succ}$ when only the entries in the $2m+1$ central diagonals of the matrix, i.e., the entries with indices in the set

(12.49) $\Lambda_m = \{(i,j) : i, j = 1, \ldots, n$ and $|i - j| \le m\}$,

are given. It is tempting to set the unknown entries equal to zero. However, the matrix that is obtained this way is not necessarily positive definite; see Exercise 12.20 for a simple example, which also displays the fact that there may be many ways to fill in the missing entries to obtain a positive definite completion. But it turns out that there is only one way to fill in the missing entries so that the $ij$ entries of the inverse of this completion are equal to zero if $(i,j) \notin \Lambda_m$. We shall present an algorithm for obtaining this particular completion that is based on factorization and shall show that it can also be characterized as the completion which maximizes the determinant. Because of this property this particular completion is commonly referred to as the maximum entropy completion.

Theorem 12.14. Let $m$ be an integer such that $0 \le m \le n - 1$ and let $\{b_{ij} : (i,j) \in \Lambda_m\}$ be a given set of complex numbers. Then there exists a matrix $A \in \mathbb{C}^{n\times n}_{\succ}$ such that

(12.50) $a_{ij} = b_{ij}$ for $(i,j) \in \Lambda_m$

if and only if

(12.51) $\begin{bmatrix} b_{jj} & \cdots & b_{j,j+m} \\ \vdots & & \vdots \\ b_{j+m,j} & \cdots & b_{j+m,j+m} \end{bmatrix} \succ O$ for $j = 1, \ldots, n - m$.

Moreover, if these conditions are met, then there exists exactly one matrix $A \in \mathbb{C}^{n\times n}_{\succ}$ such that (12.50) and

(12.52) $\mathbf{e}_i^TA^{-1}\mathbf{e}_j = 0$ for $(i,j) \notin \Lambda_m$

both hold, where $\mathbf{e}_i$ denotes the $i$'th column of $I_n$. (This unique matrix is given by the formula $A = \{X\Delta^{-1}X^H\}^{-1}$; $X$ and $\Delta$ are defined in the proof.)

Proof. The proof of the necessity of the conditions (12.51) for the existence of a matrix $A \in \mathbb{C}^{n\times n}_{\succ}$ such that (12.50) holds is easy and is left to the reader.


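The rest of the proof is devoted to showing that there exists exactly one matrix $A \in \mathbb{C}^{n\times n}_{\succ}$ that meets the constraints in (12.50) and (12.52). The argument is divided into steps. It is convenient to begin with the proof of uniqueness, because it motivates the algorithm that is used to establish existence. The very first step, however, will be used to establish a general principle that will be useful in the development.

1. If $L \in \mathbb{C}^{n\times n}$ is an invertible lower triangular matrix with entries $\ell_{ij}$, then

(12.53) $\mathbf{e}_i^TLL^H\mathbf{e}_j = 0$ for $(i,j) \notin \Lambda_m \iff \mathbf{e}_i^TL\mathbf{e}_j = 0$ for $(i,j) \notin \Lambda_m$.

If $(k,1), \ldots, (k,r) \notin \Lambda_m$, $k \ge r$ and $C = LL^H$, then

$\begin{bmatrix} c_{k1} & \cdots & c_{kr} \end{bmatrix} = \mathbf{e}_k^TLL^H\begin{bmatrix} \mathbf{e}_1 & \cdots & \mathbf{e}_r \end{bmatrix} = \mathbf{e}_k^TL\begin{bmatrix} L_{[1,r]} & O \end{bmatrix}^H = \begin{bmatrix} \ell_{k1} & \cdots & \ell_{kr} \end{bmatrix}L_{[1,r]}^H$;

and hence, as $L_{[1,r]}$ is invertible,

$\begin{bmatrix} c_{k1} & \cdots & c_{kr} \end{bmatrix} = \begin{bmatrix} 0 & \cdots & 0 \end{bmatrix} \iff \begin{bmatrix} \ell_{k1} & \cdots & \ell_{kr} \end{bmatrix} = \begin{bmatrix} 0 & \cdots & 0 \end{bmatrix}$.

Thus, (12.53) holds.

2. There exists at most one matrix $A \in \mathbb{C}^{n\times n}_{\succ}$ for which (12.50) and (12.52) both hold.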

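If $A \in \mathbb{C}^{n\times n}_{\succ}$ enjoys the properties advertised in (12.50) and (12.52), then, in view of Lemma 12.6, $A^{-1} = X\Delta^{-1}X^H$, where $X$ is the only lower triangular matrix solution of (12.15) and $\Delta = \operatorname{diag}\{x_{11}, \ldots, x_{nn}\}$ is positive definite. Thus, by Step 1, (12.52) implies that

(12.54) $x_{ij} = 0$ for $i - j > m$,

whereas (12.50) yields the constraints

(12.55) $\begin{bmatrix} b_{jj} & \cdots & b_{j,j+m} \\ \vdots & & \vdots \\ b_{j+m,j} & \cdots & b_{j+m,j+m} \end{bmatrix}\begin{bmatrix} x_{jj} \\ \vdots \\ x_{j+m,j} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$ for $j = 1, \ldots, n - m$

and

(12.56) $\begin{bmatrix} b_{jj} & \cdots & b_{jn} \\ \vdots & & \vdots \\ b_{nj} & \cdots & b_{nn} \end{bmatrix}\begin{bmatrix} x_{jj} \\ \vdots \\ x_{nj} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$ for $j = n - m + 1, \ldots, n$.

Thus, there is at most one matrix $A \in \mathbb{C}^{n\times n}_{\succ}$ that meets the conditions in (12.50) and (12.52): $A = \{X\Delta^{-1}X^H\}^{-1}$.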


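3. If $X \in \mathbb{C}^{n\times n}$ is the lower triangular matrix that is specified by (12.54), (12.55) and (12.56), then $\Delta = \operatorname{diag}\{x_{11}, \ldots, x_{nn}\}$ is positive definite and the matrix $A = \{X\Delta^{-1}X^H\}^{-1}$ belongs to the class $\mathbb{C}^{n\times n}_{\succ}$ and meets the conditions in (12.50) and (12.52).

The entries $x_{jj}$ that are specified by (12.55) and (12.56) are the $11$ entries of positive definite matrices and are therefore positive. Thus, $\Delta \succ O$, $A \succ O$ and, in view of (12.54) and Step 1, $\mathbf{e}_i^TA^{-1}\mathbf{e}_j = 0$ if $(i,j) \notin \Lambda_m$. It remains to verify (12.50). Since

$I_n = AA^{-1} = AX\Delta^{-1}X^H \implies AX = U$ with $U = (\Delta^{-1}X^H)^{-1}$,

equations (12.55) and (12.56) are in force, but with $a_{ij}$ in place of $b_{ij}$. Therefore, the numbers $c_{ij} = b_{ij} - a_{ij}$ are solutions of the equations (12.55) and (12.56) but with right-hand sides equal to zero. Thus, for example, if $n = 5$ and $m = 2$, then

(12.57) $\begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix}\begin{bmatrix} x_{11} \\ x_{21} \\ x_{31} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$, $\quad \begin{bmatrix} c_{22} & c_{23} & c_{24} \\ c_{32} & c_{33} & c_{34} \\ c_{42} & c_{43} & c_{44} \end{bmatrix}\begin{bmatrix} x_{22} \\ x_{32} \\ x_{42} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$,

(12.58) $\begin{bmatrix} c_{33} & c_{34} & c_{35} \\ c_{43} & c_{44} & c_{45} \\ c_{53} & c_{54} & c_{55} \end{bmatrix}\begin{bmatrix} x_{33} \\ x_{43} \\ x_{53} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$, $\quad \begin{bmatrix} c_{44} & c_{45} \\ c_{54} & c_{55} \end{bmatrix}\begin{bmatrix} x_{44} \\ x_{54} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $c_{55}x_{55} = 0$.

But, since the $x_{jj}$ are positive and each of the five submatrices is Hermitian, it is readily seen that $c_{ij} = 0$ for all the indicated entries; i.e., $a_{ij} = b_{ij}$ for $|i - j| \le 2$. (Start with $c_{55}x_{55} = 0$ and work your way up.) Thus, the matrix $A = \{X\Delta^{-1}X^H\}^{-1}$ is a positive definite completion. □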
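The three steps of the proof translate directly into a procedure: solve the small systems (12.55) and (12.56) for the columns of $X$, form $A^{-1} = X\Delta^{-1}X^H$, and invert. The sketch below does this in exact rational arithmetic for real symmetric band data (so $X^H = X^T$), with 0-based indices; the helper names `solve`, `inverse` and `band_completion` and the tridiagonal sample data are assumptions made for illustration.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M y = rhs by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)    # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def inverse(M):
    """Inverse of M, computed one column at a time."""
    n = len(M)
    cols = [solve(M, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def band_completion(b, n, m):
    """b maps (i, j) with |i - j| <= m (0-based) to the given real symmetric entries.
    Returns (A, Ainv) with A the completion and Ainv = X Delta^{-1} X^T."""
    X = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        idx = range(j, min(j + m, n - 1) + 1)       # rows j..j+m, cut off at the last row
        block = [[b[(r, c)] for c in idx] for r in idx]
        col = solve(block, [1] + [0] * (len(idx) - 1))   # systems (12.55) and (12.56)
        for r, v in zip(idx, col):
            X[r][j] = v
    Ainv = [[sum(X[i][k] * X[j][k] / X[k][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
    return inverse(Ainv), Ainv

# hypothetical data: n = 3, m = 1, twos on the diagonal and ones next to it
b = {(i, j): (2 if i == j else 1) for i in range(3) for j in range(3) if abs(i - j) <= 1}
A, Ainv = band_completion(b, 3, 1)
```

For this sample data the unspecified corner entries of `A` come out equal to $1/2$ and the corresponding entries of `Ainv` are zero, in line with (12.50) and (12.52).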

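Example 12.15. If $n = 4$ and $m = 1$, then the entries in $X$ in

$\begin{bmatrix} b_{11} & b_{12} & ? & ? \\ b_{21} & b_{22} & b_{23} & ? \\ ? & b_{32} & b_{33} & b_{34} \\ ? & ? & b_{43} & b_{44} \end{bmatrix}\begin{bmatrix} x_{11} & 0 & 0 & 0 \\ x_{21} & x_{22} & 0 & 0 \\ 0 & x_{32} & x_{33} & 0 \\ 0 & 0 & x_{43} & x_{44} \end{bmatrix} = \begin{bmatrix} 1 & u_{12} & u_{13} & u_{14} \\ 0 & 1 & u_{23} & u_{24} \\ 0 & 0 & 1 & u_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix}$

must satisfy the equations

(12.59) $\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}\begin{bmatrix} x_{11} \\ x_{21} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\quad \begin{bmatrix} b_{22} & b_{23} \\ b_{32} & b_{33} \end{bmatrix}\begin{bmatrix} x_{22} \\ x_{32} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$,

(12.60) $\begin{bmatrix} b_{33} & b_{34} \\ b_{43} & b_{44} \end{bmatrix}\begin{bmatrix} x_{33} \\ x_{43} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $b_{44}x_{44} = 1$.

Next, since $x_{jj} > 0$ for $j = 1, \ldots, 4$, $\Delta = \operatorname{diag}\{x_{11}, \ldots, x_{44}\}$ is positive definite and the matrix $X\Delta^{-1}$ is lower triangular with ones on the diagonal.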


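Thus, the matrix

(12.61) $A = (X^H)^{-1}\Delta X^{-1}$

is positive definite. Moreover, equations (12.59) and (12.60) are in force, but with $a_{ij}$ in place of $b_{ij}$. Therefore, the numbers $c_{ij} = b_{ij} - a_{ij}$ are solutions of the equations

(12.62) $\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}\begin{bmatrix} x_{11} \\ x_{21} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\quad \begin{bmatrix} c_{22} & c_{23} \\ c_{32} & c_{33} \end{bmatrix}\begin{bmatrix} x_{22} \\ x_{32} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$,

(12.63) $\begin{bmatrix} c_{33} & c_{34} \\ c_{43} & c_{44} \end{bmatrix}\begin{bmatrix} x_{33} \\ x_{43} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $c_{44}x_{44} = 0$.

But, since the $x_{jj}$ are positive and each of the four submatrices is Hermitian, it is readily seen that $c_{ij} = 0$ for all the indicated entries; i.e., $a_{ij} = b_{ij}$ for $|i - j| \le 1$. (Start with $c_{44}x_{44} = 0$ and work your way up.) Thus, the matrix $A$ constructed above is a positive definite completion.

For future convenience the general fact that was observed in (12.53) and a companion result are recorded as separate lemmas: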

Lemma 12.16. If $A = XX^H$, $X \in \mathbb{C}^{n\times n}$ is a lower triangular invertible matrix, $1 \le i, j \le n$ and $1 \le k \le n - 1$, then

(12.64) $a_{ij} = 0$ for $|i - j| \ge k \iff x_{ij} = 0$ for $i - j \ge k$.

Lemma 12.17. If $A = YY^H$, $Y \in \mathbb{C}^{n\times n}$ is an upper triangular invertible matrix, $1 \le i, j \le n$ and $1 \le k \le n - 1$, then

(12.65) $a_{ij} = 0$ for $|j - i| \ge k \iff y_{ij} = 0$ for $j - i \ge k$.

Exercise 12.40. Verify Lemma 12.17 if $n = 7$ and $k = 3$.

Theorem 12.18. Let $m$ be an integer such that $0 \le m \le n - 1$ and let $\{b_{ij} : (i,j) \in \Lambda_m\}$ be a given set of complex numbers such that the conditions in (12.51) are in force. Let $A \in \mathbb{C}^{n\times n}_{\succ}$ meet the conditions in (12.50) and (12.52) and let $C \in \mathbb{C}^{n\times n}_{\succ}$ meet condition (12.50). Then
(1) $\det A \ge \det C$, with equality if and only if $C = A$.
(2) If $A = L_AD_AL_A^H$ and $C = L_CD_CL_C^H$, in which $L_A$ and $L_C$ are lower triangular with ones on the diagonal and $D_A$ and $D_C$ are $n \times n$ diagonal matrices, then $D_A \succeq D_C$, with equality if and only if $A = C$.

Proof. In view of Theorem 12.4,

$A = (X^H)^{-1}DX^{-1}$ and $C = (Y^H)^{-1}GY^{-1}$,

where $X \in \mathbb{C}^{n\times n}$ and $Y \in \mathbb{C}^{n\times n}$ are lower triangular matrices with ones on the diagonal and $D \in \mathbb{C}^{n\times n}_{\succ}$ and $G \in \mathbb{C}^{n\times n}_{\succ}$ are diagonal matrices. Moreover, the constraint (12.52) implies that $x_{ij} = 0$ for $i > m + j$. Therefore, the formulas

$C = A + (C - A)$ and $Z = Y^{-1}X$

imply that

$Z^HGZ = D + X^H(C - A)X$.

Thus, as $Z$ is lower triangular with ones on the diagonal and the diagonal entries of $X^H(C - A)X$ are all equal to zero,

$d_{jj} = \sum_{s=j}^{n} g_{ss}|z_{sj}|^2 = g_{jj} + \sum_{s>j} g_{ss}|z_{sj}|^2 \ge g_{jj}$

with strict inequality unless $z_{sj} = 0$ for $s > j$, i.e., unless $Z = I_n$. This completes the proof of (1). Much the same argument serves to justify (2). □

Remark 12.19. Theorem 12.14 can also be expressed in terms of the orthogonal projection $P_{\Lambda_m}$ that is defined by the formula

$P_{\Lambda_m}A = \sum_{(i,j)\in\Lambda_m} \langle A, \mathbf{e}_i\mathbf{e}_j^T\rangle\,\mathbf{e}_i\mathbf{e}_j^T$

on the inner product space $\mathbb{C}^{n\times n}$ with inner product $\langle A, B\rangle = \operatorname{trace}\{B^HA\}$: If the conditions of Theorem 12.14 are met and if $Q \in \mathbb{C}^{n\times n}$ with $q_{ij} = b_{ij}$ for $(i,j) \in \Lambda_m$, then there exists exactly one matrix $A \in \mathbb{C}^{n\times n}_{\succ}$ such that

$P_{\Lambda_m}A = Q$ and $(I_n - P_{\Lambda_m})A^{-1} = O$.

This formulation suggests that results analogous to those discussed above can be obtained in other algebras, which is indeed the case.

12.8. A class of $A \succ O$ for which (12.52) holds

In this section we shall exhibit one way in which Toeplitz matrices $T_n \succ O$ with inverses supported in a band arise.

Theorem 12.20. Let $T_n$ denote the $(n+1) \times (n+1)$ Toeplitz matrix with entries

$t_j = \dfrac{1}{2\pi}\int_0^{2\pi} e^{-ij\theta}\,|p(e^{i\theta})|^{-2}\,d\theta$ for $j = 0, \pm 1, \ldots, \pm n$,

where $p(\zeta) = p_0 + p_1\zeta + \cdots + p_m\zeta^m$ is a stable polynomial of degree $m \ge 1$, i.e., $|p(\zeta)| > 0$ for $|\zeta| \le 1$ and $p_m \ne 0$. Then:


(1) Tn>0. (2) The polynomials qn(() = 29;) 79(2)? that are defined in terms of the entries in (the last column of) P7,, = T;1 are related by the formulas (1266)

(172“) = Cn_mqm( 0, then xi,- = 0. On the other hand, if d}? + (1,?) = 0, then dial) = dz?) = 0 and, as follows from the definition of X, 1113' = 0 in this case too. Consequently,

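$U_2^HU_1D_1 - D_2U_2^HU_1 = X = O$; i.e.,

$B_1 = U_1D_1U_1^H = U_2D_2U_2^H = B_2$,

as claimed. □

If $A \succeq O$, the symbol $A^{1/2}$ will be used to denote the unique $n \times n$ matrix $B \succeq O$ with $B^2 = A$. Correspondingly, $B$ will be referred to as the square root of $A$. The restriction that $B \succeq O$ is essential to insure uniqueness. Thus, for example, if $A$ is Hermitian, then the formula

$\begin{bmatrix} A & O \\ C & -A \end{bmatrix}\begin{bmatrix} A & O \\ C & -A \end{bmatrix} = \begin{bmatrix} A^2 & O \\ CA - AC & A^2 \end{bmatrix}$

exhibits the matrix

$\begin{bmatrix} A & O \\ C & -A \end{bmatrix}$ as a square root of $\begin{bmatrix} A^2 & O \\ O & A^2 \end{bmatrix}$

for every choice of $C$ that commutes with $A$. In particular,

$\begin{bmatrix} I_k & O \\ C & -I_k \end{bmatrix}\begin{bmatrix} I_k & O \\ C & -I_k \end{bmatrix} = \begin{bmatrix} I_k & O \\ O & I_k \end{bmatrix}$ for every $C \in \mathbb{C}^{k\times k}$.

An even simpler example is exhibited in the formula

$\begin{bmatrix} 0 & a \\ 1/a & 0 \end{bmatrix}\begin{bmatrix} 0 & a \\ 1/a & 0 \end{bmatrix} = I_2$

for every nonzero $a \in \mathbb{C}$.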
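The failure of uniqueness without the constraint $B \succeq O$ is easy to see numerically: both families of matrices displayed above square to the identity. The short sketch below checks this for a few parameter values; the helper names `matmul2`, `antidiagonal_root` and `triangular_root` and the sample values of $a$ and $c$ are illustrative assumptions.

```python
def matmul2(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def antidiagonal_root(a):
    """The matrix [[0, a], [1/a, 0]] for a nonzero complex number a."""
    return [[0, a], [1 / a, 0]]

def triangular_root(c):
    """The matrix [[1, 0], [c, -1]], the case k = 1 of the block example."""
    return [[1, 0], [c, -1]]

squares = [matmul2(R, R) for R in
           [antidiagonal_root(a) for a in (2, -0.5, 1j, 3 - 4j)] +
           [triangular_root(c) for c in (0, 7, 2 - 1j)]]
# every entry of `squares` equals the 2 x 2 identity up to rounding error
```

None of these square roots is positive semidefinite except in trivial cases, which is why the notation $A^{1/2}$ is reserved for the one that is.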

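Exercise 12.50. Show that if $A, B \in \mathbb{C}^{n\times n}$ and if $A \succ O$ and $B = B^H$, then there exists a matrix $V \in \mathbb{C}^{n\times n}$ such that

$V^HAV = I_n$ and $V^HBV = D = \operatorname{diag}\{\lambda_1, \ldots, \lambda_n\}$.

[HINT: Reexpress the problem in terms of $U = A^{1/2}V$.]

Exercise 12.51. Show that if $A, B \in \mathbb{C}^{n\times n}$ and $A \succeq B \succ O$, then $B^{-1} \succeq A^{-1} \succ O$. [HINT: $A - B \succeq O \implies A^{-1/2}BA^{-1/2} \preceq I_n$.]

Exercise 12.52. Show that if $A, B \in \mathbb{C}^{n\times n}$ and if $A \succeq O$ and $B \succeq O$, then $\operatorname{trace} AB \ge 0$ (even if $AB \not\succeq O$).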


12.11. Polar forms

If $A \in \mathbb{C}^{p\times q}$ and $r = \operatorname{rank} A \ge 1$, then the formula $A = V_1DU_1^H$ that was obtained in Corollary 10.3 on the basis of the singular value decomposition of $A$ can be reexpressed in polar form:

(12.73) $A = V_1U_1^H(U_1DU_1^H)$ and $A = (V_1DV_1^H)V_1U_1^H$,

where $V_1U_1^H$ maps $\mathcal{R}_{A^H}$ isometrically onto $\mathcal{R}_A$, $U_1DU_1^H = \{A^HA\}^{1/2}$ is positive definite on $\mathcal{R}_{A^H}$ and $V_1DV_1^H = \{AA^H\}^{1/2}$ is positive definite on $\mathcal{R}_A$. These formulas are matrix analogues of the polar decomposition of a complex number.

Theorem 12.24. Let $A \in \mathbb{C}^{p\times q}$. Then

(1) $\operatorname{rank} A = q$ if and only if $A$ admits a factorization of the form $A = V_1P_1$, where $V_1 \in \mathbb{C}^{p\times q}$ is isometric; i.e., $V_1^HV_1 = I_q$, and $P_1 \in \mathbb{C}^{q\times q}$ is positive definite over $\mathbb{C}^q$.

(2) $\operatorname{rank} A = p$ if and only if $A$ admits a factorization of the form $A = P_2U_2$, where $U_2 \in \mathbb{C}^{p\times q}$ is coisometric; i.e., $U_2U_2^H = I_p$, and $P_2 \in \mathbb{C}^{p\times p}$ is positive definite over $\mathbb{C}^p$.

Proof. If $\operatorname{rank} A = q$, then $p \ge q$ and, by Theorem 10.1, $A$ admits a factorization of the form

$A = V\begin{bmatrix} D \\ O \end{bmatrix}U^H = V\begin{bmatrix} I_q \\ O \end{bmatrix}DU^H$,

where $V$ and $U$ are unitary matrices of sizes $p \times p$ and $q \times q$, respectively, and $D \in \mathbb{C}^{q\times q}$ is positive definite over $\mathbb{C}^q$. But this yields a factorization of the asserted form with $V_1 = V\begin{bmatrix} I_q \\ O \end{bmatrix}U^H$ and $P_1 = UDU^H$. Conversely, if $A$ admits a factorization of this form, it is easily seen that $\operatorname{rank} A = q$. The details are left to the reader.

Assertion (2) may be established in much the same way or by invoking (1) and passing to transposes. The details are left to the reader. □

Exercise 12.53. Complete the proof of assertion (1) of Theorem 12.24.

Exercise 12.54. Verify assertion (2) of Theorem 12.24.

Exercise 12.55. Show that if $UU^H = VV^H$ for a pair of matrices $U, V \in \mathbb{C}^{n\times d}$ with $\operatorname{rank} U = \operatorname{rank} V = d$, then $U = VK$ for some unitary matrix $K \in \mathbb{C}^{d\times d}$.

Exercise 12.56. Find an isometric matrix $V_1$ and a matrix $P_1 \succ O$ such that

$\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} = V_1P_1$.


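12.12. Matrix inequalities

If $F, G \in \mathbb{C}^{q\times q}$, $F^HF - G^HG \succeq O$ and $F$ is invertible, then clearly

$G = (GF^{-1})F$ and $\|GF^{-1}\| \le 1$.

The next lemma extends this observation to a setting in which $F$ is not assumed to be invertible.

Lemma 12.25. If $F \in \mathbb{C}^{p\times q}$, $G \in \mathbb{C}^{r\times q}$ and $F^HF - G^HG \succeq O$, then there exists exactly one matrix $K \in \mathbb{C}^{r\times p}$ such that

(12.74) $G = KF$ and $K\mathbf{u} = \mathbf{0}$ for every $\mathbf{u} \in \mathcal{N}_{F^H}$.

Moreover, this matrix $K$ is contractive: $\|K\| \le 1$.

Proof. The given conditions imply that

$\langle F^HF\mathbf{x}, \mathbf{x}\rangle \ge \langle G^HG\mathbf{x}, \mathbf{x}\rangle$ for every $\mathbf{x} \in \mathbb{C}^q$.

Thus,

$F\mathbf{x} = \mathbf{0} \implies \|G\mathbf{x}\| = 0 \implies G\mathbf{x} = \mathbf{0}$;

i.e., $\mathcal{N}_F \subseteq \mathcal{N}_G$ and hence $\mathcal{R}_{G^H} \subseteq \mathcal{R}_{F^H}$. Therefore, there exists a matrix $K_1^H \in \mathbb{C}^{p\times r}$ such that $G^H = F^HK_1^H$.

If $\mathcal{N}_{F^H} = \{\mathbf{0}\}$, then the matrix $K = K_1$ meets both of the conditions in (12.74). If $\mathcal{N}_{F^H} \ne \{\mathbf{0}\}$ and $V \in \mathbb{C}^{p\times\ell}$ is a matrix whose columns form a basis for $\mathcal{N}_{F^H}$, then

$F^H(K_1^H + VL) = F^HK_1^H = G^H$

for every choice of $L \in \mathbb{C}^{\ell\times r}$. Moreover,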

(K1 + LHVH)V = 0 limnT00 ||un|| = 0. and |oz| g 1 => ||un|| is bounded. (or not) and fl < 0 => limp,00 ||x(t)|| = 0. and 5 g 0 => ||x(t)|| is bounded for t > 0.

Exercise 13.30. Show by example that item (b) in the list just above is not necessarily correct if the assumption that J is a diagonal matrix is dropped.

Exercise 13.31. Show by example that item (d) in the list just above is not necessarily correct if the assumption that J is a diagonal matrix is dropped.

13.9. Nonhomogeneous differential systems

In this section we shall consider nonhomogeneous differential systems, i.e., systems of the form

$$\mathbf{x}'(t) - A\mathbf{x}(t) = \mathbf{g}(t), \quad \alpha \le t < \beta,$$

where $A \in \mathbb{R}^{n\times n}$ and $\mathbf{g}(t)$ is a continuous $n \times 1$ real vector-valued function on the interval $\alpha \le t < \beta$. Then, since

$$\mathbf{x}'(t) - A\mathbf{x}(t) = e^{tA}\left(e^{-tA}\mathbf{x}(t)\right)',$$


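it is readily seen that the given system can be reexpressed as

$$\left(e^{-sA}\mathbf{x}(s)\right)' = e^{-sA}\mathbf{g}(s)$$

and hence, upon integrating both sides from $\alpha$ to a point $t \in (\alpha, \beta)$, that

$$e^{-tA}\mathbf{x}(t) - e^{-\alpha A}\mathbf{x}(\alpha) = \int_\alpha^t \left(e^{-sA}\mathbf{x}(s)\right)'ds = \int_\alpha^t e^{-sA}\mathbf{g}(s)\,ds$$

or, equivalently, that

(13.16) $$\mathbf{x}(t) = e^{(t-\alpha)A}\mathbf{x}(\alpha) + \int_\alpha^t e^{(t-s)A}\mathbf{g}(s)\,ds \quad\text{for } \alpha \le t < \beta.$$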

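To explore formula (13.16) further, let

$$\mathbf{u}(t) = e^{(t-\alpha)A}\mathbf{x}(\alpha) \quad\text{and}\quad \mathbf{v}(t) = \int_\alpha^t e^{(t-s)A}\mathbf{g}(s)\,ds$$

and note that $\mathbf{u}(t)$ is a solution of the homogeneous equation

$$\mathbf{u}'(t) = A\mathbf{u}(t) \quad\text{for } \alpha \le t \text{ with initial condition } \mathbf{u}(\alpha) = \mathbf{x}(\alpha),$$

whereas $\mathbf{v}(t)$ is the particular solution of the equation

$$\mathbf{v}'(t) = A\mathbf{v}(t) + \mathbf{g}(t) \quad\text{for } \alpha \le t \text{ with initial condition } \mathbf{v}(\alpha) = \mathbf{0}.$$

Thus,

$$\mathbf{u}'(t) + \mathbf{v}'(t) = A(\mathbf{u}(t) + \mathbf{v}(t)) + \mathbf{g}(t) \quad\text{and}\quad \mathbf{u}(\alpha) + \mathbf{v}(\alpha) = \mathbf{x}(\alpha),$$

as needed.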
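Formula (13.16) can be checked numerically. The sketch below is only an illustration under stated assumptions: the matrix $A = \begin{bmatrix} 0 & 1 \\ -1 & 0\end{bmatrix}$ (for which $e^{tA}$ is a rotation matrix), the forcing term `g` and the helper names are all hypothetical choices that do not come from the text. The integral in (13.16) is approximated by Simpson's rule and the result is compared with a direct Runge-Kutta integration of $\mathbf{x}' = A\mathbf{x} + \mathbf{g}$.

```python
import math

def exp_tA(t):
    # Closed form of e^{tA} for the assumed matrix A = [[0, 1], [-1, 0]]:
    # since A^2 = -I, e^{tA} = cos(t) I + sin(t) A.
    c, s = math.cos(t), math.sin(t)
    return [[c, s], [-s, c]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def g(t):
    # Hypothetical continuous forcing term, chosen only for illustration.
    return [0.0, math.exp(-t)]

def formula_13_16(t, alpha, x_alpha, steps=2000):
    # x(t) = e^{(t-alpha)A} x(alpha) + int_alpha^t e^{(t-s)A} g(s) ds,
    # with the integral approximated by the composite Simpson rule
    # (steps must be even).
    h = (t - alpha) / steps
    total = [0.0, 0.0]
    for k in range(steps + 1):
        s = alpha + k * h
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)
        term = matvec(exp_tA(t - s), g(s))
        total = [total[0] + w * term[0], total[1] + w * term[1]]
    hom = matvec(exp_tA(t - alpha), x_alpha)
    return [hom[0] + h * total[0] / 3, hom[1] + h * total[1] / 3]

def rk4(t, alpha, x_alpha, steps=2000):
    # Independent check: integrate x' = Ax + g(t) with classical Runge-Kutta.
    A = [[0.0, 1.0], [-1.0, 0.0]]
    f = lambda s, x: [p + q for p, q in zip(matvec(A, x), g(s))]
    h = (t - alpha) / steps
    x, s = list(x_alpha), alpha
    for _ in range(steps):
        k1 = f(s, x)
        k2 = f(s + h / 2, [x[i] + h / 2 * k1[i] for i in range(2)])
        k3 = f(s + h / 2, [x[i] + h / 2 * k2[i] for i in range(2)])
        k4 = f(s + h, [x[i] + h * k3[i] for i in range(2)])
        x = [x[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
        s += h
    return x

x_formula = formula_13_16(2.0, 0.5, [1.0, -1.0])
x_direct = rk4(2.0, 0.5, [1.0, -1.0])
```

The two vectors `x_formula` and `x_direct` should agree to within the discretization error of the two schemes.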

13.10. Strategy for equations

To this point we have shown how to exploit the Jordan decomposition of a matrix in order to study the solutions of a first-order vector difference equation and a first-order vector differential equation. The next item of business is to study higher order scalar difference equations and higher order scalar differential equations. In both cases the strategy is to identify the solution with a particular coordinate of the solution of a first-order vector equation. This will lead to vector equations of the form $\mathbf{u}_{k+1} = A\mathbf{u}_k$ and $\mathbf{x}'(t) = A\mathbf{x}(t)$, respectively. However, now $A$ will be a companion matrix and hence Theorem 5.12 supplies an explicit formula for $\det(\lambda I_p - A)$, which is simply related to the scalar difference/differential equation under consideration. Moreover, $A$ is similar to a Jordan matrix with only one Jordan cell for each distinct eigenvalue. Consequently, it is possible to develop an algorithm for writing down the solution, as will be noted in subsequent sections.


Exercise 13.32. Show that if $A$ is a companion matrix, then, in the notation of Theorem 5.12,

(13.17) $A$ is invertible $\iff \lambda_1 \cdots \lambda_k \ne 0 \iff a_0 \ne 0$.

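13.11. Second-order difference equations

To warm up, we shall begin with the second-order difference equation

(13.18) $$a_0x_n + a_1x_{n+1} + a_2x_{n+2} = 0, \quad n = 0, 1, \ldots, \quad\text{with } a_0a_2 \ne 0,$$

where $a_0, a_1, a_2$ are fixed with $a_0a_2 \ne 0$ and $x_0$ and $x_1$ are given. The objective is to obtain a formula for $x_n$ and, if possible, to understand how $x_n$ behaves as $n \uparrow \infty$. We shall solve this second-order difference equation by embedding it into a first-order vector equation by setting

$$\mathbf{x}_n = \begin{bmatrix} x_n \\ x_{n+1} \end{bmatrix} \quad\text{so that}\quad \mathbf{x}_{n+1} = \begin{bmatrix} 0 & 1 \\ -a_0/a_2 & -a_1/a_2 \end{bmatrix}\mathbf{x}_n \quad\text{for } n = 0, 1, \ldots.$$

Thus,

$$\mathbf{x}_n = A^n\mathbf{x}_0 \quad\text{with}\quad A = \begin{bmatrix} 0 & 1 \\ -a_0/a_2 & -a_1/a_2 \end{bmatrix} \quad\text{for } n = 0, 1, \ldots.$$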

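Since $A$ is a companion matrix, Theorem 5.12 implies that

$$\det(\lambda I_2 - A) = (a_0 + a_1\lambda + a_2\lambda^2)/a_2 = (\lambda - \lambda_1)(\lambda - \lambda_2)$$

and that there are only two possible Jordan forms:

(13.19) $$J = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \ \text{if } \lambda_1 \ne \lambda_2 \quad\text{and}\quad J = \begin{bmatrix} \lambda_1 & 1 \\ 0 & \lambda_1 \end{bmatrix} \ \text{if } \lambda_1 = \lambda_2.$$

Therefore, $A = UJU^{-1}$, where $U$ is a Vandermonde matrix in the first case and a generalized Vandermonde matrix in the second. Moreover, since $a_0 \ne 0$ by assumption, $\lambda_1\lambda_2 \ne 0$.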

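Case 1 ($\lambda_1 \ne \lambda_2$):

(13.20) $$U = \begin{bmatrix} 1 & 1 \\ \lambda_1 & \lambda_2 \end{bmatrix} \Longrightarrow \mathbf{x}_n = A^n\mathbf{x}_0 = \begin{bmatrix} 1 & 1 \\ \lambda_1 & \lambda_2 \end{bmatrix}\begin{bmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{bmatrix}U^{-1}\mathbf{x}_0.$$

Consequently,

(13.21) $$x_n = \begin{bmatrix} 1 & 0 \end{bmatrix}\mathbf{x}_n = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ \lambda_1 & \lambda_2 \end{bmatrix}\begin{bmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{bmatrix}U^{-1}\mathbf{x}_0 = \begin{bmatrix} \lambda_1^n & \lambda_2^n \end{bmatrix}U^{-1}\mathbf{x}_0.$$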


However, it is not necessary to calculate $U^{-1}$. It suffices to note that formula (13.21) guarantees that $x_n$ must be of the form

$$x_n = \alpha\lambda_1^n + \beta\lambda_2^n \quad (\lambda_1 \ne \lambda_2)$$

and then to solve for $\alpha$ and $\beta$ from the given initial conditions: $x_0 = \alpha + \beta$ and $x_1 = \alpha\lambda_1 + \beta\lambda_2$.

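Case 2 ($\lambda_1 = \lambda_2$):

$$\mathbf{x}_n = A^n\mathbf{x}_0 = U\begin{bmatrix} \lambda_1 & 1 \\ 0 & \lambda_1 \end{bmatrix}^nU^{-1}\mathbf{x}_0 = U\begin{bmatrix} \lambda_1^n & n\lambda_1^{n-1} \\ 0 & \lambda_1^n \end{bmatrix}U^{-1}\mathbf{x}_0.$$

Consequently,

$$U = \begin{bmatrix} 1 & 0 \\ \lambda_1 & 1 \end{bmatrix} \Longrightarrow x_n = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ \lambda_1 & 1 \end{bmatrix}\begin{bmatrix} \lambda_1^n & n\lambda_1^{n-1} \\ 0 & \lambda_1^n \end{bmatrix}U^{-1}\mathbf{x}_0$$

must be of the form

$$x_n = \alpha\lambda_1^n + \beta n\lambda_1^n.$$

The coefficients $\alpha$ and $\beta$ are obtained from the initial conditions: $x_0 = \alpha$ and $x_1 = \alpha\lambda_1 + \beta\lambda_1$.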

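Notice the second term in the solution was written as $\beta n\lambda_1^n$ and not as $\beta n\lambda_1^{n-1}$. This is possible because $\lambda_1 \ne 0$, and hence a (positive or negative) power of $\lambda_1$ can be absorbed into the constant $\beta$.

The preceding analysis leads to the following recipe for obtaining the solutions of the second-order (homogeneous) difference equation

$$a_2x_{n+2} + a_1x_{n+1} + a_0x_n = 0 \quad\text{for } n = 0, 1, \ldots, \quad\text{with } a_2a_0 \ne 0$$

and initial conditions $x_0 = c$ and $x_1 = d$:

(1) Solve for the roots $\lambda_1, \lambda_2$ of the polynomial $a_2\lambda^2 + a_1\lambda + a_0$ and note that the factorization

$$a_2(\lambda - \lambda_1)(\lambda - \lambda_2) = a_2\lambda^2 + a_1\lambda + a_0$$

implies that $\lambda_1\lambda_2 = a_0/a_2 \ne 0$.

(2) Express the solution as

$$x_n = \begin{cases} \alpha\lambda_1^n + \beta\lambda_2^n & \text{if } \lambda_1 \ne \lambda_2 \\ \alpha\lambda_1^n + \beta n\lambda_1^n & \text{if } \lambda_1 = \lambda_2 \end{cases}$$

for some choice of $\alpha$ and $\beta$.

(3) Solve for $\alpha$ and $\beta$ by invoking the initial conditions:

$$c = x_0 = \alpha + \beta \quad\text{and}\quad d = x_1 = \alpha\lambda_1 + \beta\lambda_2 \quad\text{if } \lambda_1 \ne \lambda_2,$$

$$c = x_0 = \alpha \quad\text{and}\quad d = x_1 = \alpha\lambda_1 + \beta\lambda_1 \quad\text{if } \lambda_1 = \lambda_2.$$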
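The three steps of the recipe can be sketched in a few lines of Python. This is only an illustration: the function names `solve_second_order` and `iterate` are hypothetical, and the exact test `disc == 0` used to detect a repeated root is an assumption that is only reliable for integer (or otherwise exactly represented) coefficients.

```python
import cmath

def solve_second_order(a0, a1, a2, c, d):
    """Return n -> x_n for a2*x_{n+2} + a1*x_{n+1} + a0*x_n = 0, x_0 = c, x_1 = d."""
    if a0 == 0 or a2 == 0:
        raise ValueError("the recipe assumes a2*a0 != 0")
    # Step (1): roots of a2*lam^2 + a1*lam + a0.
    disc = a1 * a1 - 4 * a2 * a0
    root = cmath.sqrt(disc)
    lam1 = (-a1 + root) / (2 * a2)
    lam2 = (-a1 - root) / (2 * a2)
    if disc != 0:
        # Steps (2)-(3), distinct roots: c = alpha + beta, d = alpha*lam1 + beta*lam2.
        beta = (d - c * lam1) / (lam2 - lam1)
        alpha = c - beta
        return lambda n: alpha * lam1**n + beta * lam2**n
    # Steps (2)-(3), repeated root: c = alpha, d = alpha*lam1 + beta*lam1.
    alpha = c
    beta = d / lam1 - c
    return lambda n: alpha * lam1**n + beta * n * lam1**n

def iterate(a0, a1, a2, c, d, n):
    """Compute x_n directly from the recurrence, for comparison."""
    x = [c, d]
    while len(x) <= n:
        x.append(-(a1 * x[-1] + a0 * x[-2]) / a2)
    return x[n]

# x_{n+2} - 3 x_{n+1} - 4 x_n = 0 with x_0 = 5, x_1 = 0 (distinct roots 4 and -1).
closed_form = solve_second_order(-4, -3, 1, 5, 0)
# x_{n+2} - 2 x_{n+1} + x_n = 0 with x_0 = 3, x_1 = 5 (repeated root 1).
repeated = solve_second_order(1, -2, 1, 3, 5)
```

The closed forms are returned as complex numbers because the roots may be complex; for real data with real roots the imaginary parts vanish.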


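Example 13.5.

$$x_{n+2} - 3x_{n+1} - 4x_n = 0, \quad n = 0, 1, \ldots, \quad x_0 = 5 \ \text{and}\ x_1 = 0.$$

Discussion. The roots of the equation $\lambda^2 - 3\lambda - 4 = 0$ are $\lambda_1 = 4$ and $\lambda_2 = -1$. Therefore, the solution $x_n$ must be of the form

$$x_n = \alpha 4^n + \beta(-1)^n \quad\text{for } n = 0, 1, \ldots.$$

The initial conditions $x_0 = 5$ and $x_1 = 0$ imply that

$$\alpha + \beta = 5 \quad\text{and}\quad 4\alpha - \beta = 0.$$

Thus, $\alpha = 1$, $\beta = 4$ and

$$x_n = 4^n + 4(-1)^n \quad\text{for } n = 0, 1, \ldots.$$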

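Example 13.6.

$$x_{n+2} - 2x_{n+1} + x_n = 0 \quad\text{for } n = 0, 1, \ldots, \quad x_0 = 3 \ \text{and}\ x_1 = 5.$$

Discussion. The equation $\lambda^2 - 2\lambda + 1 = 0$ has two equal roots: $\lambda_1 = \lambda_2 = 1$. Therefore,

$$x_n = \alpha(1)^n + \beta n(1)^n = \alpha + \beta n.$$

Substituting the initial conditions

$$x_0 = \alpha = 3 \quad\text{and}\quad x_1 = 3 + \beta = 5,$$

we see that $\beta = 2$ and hence that

$$x_n = 3 + 2n \quad\text{for } n = 0, 1, \ldots.$$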

Exercise 13.33. Find an explicit formula for $x_n$, for $n = 0, 1, \ldots$, given that $x_0 = -1$, $x_1 = 2$ and $x_{k+1} = 3x_k - 2x_{k-1}$ for $k = 1, 2, \ldots$.

Exercise 13.34. The Fibonacci sequence $x_n$, $n = 0, 1, \ldots$, is prescribed by the initial conditions $x_0 = 1$, $x_1 = 1$ and the difference equation $x_{n+1} = x_n + x_{n-1}$ for $n = 1, 2, \ldots$. Find an explicit formula for $x_n$ and use it to calculate the golden mean, $\lim_{n\uparrow\infty} x_{n+1}/x_n$.

Exercise 13.35. Let $x_n = tx_{n-1} + (1 - t)x_{n+1}$ for $n = 1, 2, \ldots$. Evaluate $\lim_{n\uparrow\infty} x_n$ as a function of $t$ for $0 < t < 1$ when $x_0 = 0$ and $x_1 = 1$.


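13.12. Higher order difference equations

Similar considerations apply to higher order difference equations. The solution of the $p$'th order equation

(13.22) $$a_0x_n + a_1x_{n+1} + \cdots + a_px_{n+p} = f_n, \quad n = 0, 1, \ldots, \quad\text{with } a_p \ne 0$$

and given initial conditions $x_0, x_1, \ldots, x_{p-1}$, can be obtained from the solution of the first-order vector equation

$$\mathbf{x}_{n+1} = A\mathbf{x}_n + \mathbf{f}_n \quad\text{for } n = 0, 1, \ldots,$$

where

$$\mathbf{x}_n = \begin{bmatrix} x_n \\ x_{n+1} \\ \vdots \\ x_{n+p-1} \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0/a_p & -a_1/a_p & -a_2/a_p & \cdots & -a_{p-1}/a_p \end{bmatrix}$$

and $\mathbf{f}_n^T = \begin{bmatrix} 0 & \cdots & 0 & f_n/a_p \end{bmatrix}$. The nature of the solution will depend on the eigenvalues of the companion matrix $A$. Let $\mathbf{e}_j$ denote the $j$'th column of $I_p$ for $j = 1, \ldots, p$. Then, by (13.5), which is valid for any $A \in \mathbb{C}^{p\times p}$, the solution

$$x_n = \mathbf{e}_1^T\mathbf{x}_n = \mathbf{e}_1^TA^n\mathbf{x}_0 + \mathbf{e}_1^T\sum_{j=0}^{n-1}A^{n-1-j}\mathbf{f}_j.$$

Exercise 13.36. Show that $u_n = \mathbf{e}_1^TA^n\mathbf{x}_0$ for $n = 0, 1, \ldots$ is a solution of the homogeneous equation

(13.23) $$a_0x_n + a_1x_{n+1} + \cdots + a_px_{n+p} = 0$$

with $u_j = x_j$ for $j = 0, \ldots, p-1$, whereas $v_n = \mathbf{e}_1^T\sum_{j=0}^{n-1}A^{n-1-j}\mathbf{f}_j$ for $n = 1, 2, \ldots$ and $v_0 = 0$ is a solution of (13.22) with $v_0 = \cdots = v_{p-1} = 0$.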

A convenient recipe for obtaining the solution of equation (13.23) is

(1) Find the roots of the polynomial $a_0 + a_1\lambda + \cdots + a_p\lambda^p$.

(2) If $a_0 + a_1\lambda + \cdots + a_p\lambda^p = a_p(\lambda - \lambda_1)^{\alpha_1}\cdots(\lambda - \lambda_k)^{\alpha_k}$ with distinct roots $\lambda_1, \ldots, \lambda_k$, then the solution must be of the form

$$x_n = \sum_{j=1}^{k} p_j(n)\lambda_j^n, \quad\text{where } p_j \text{ is a polynomial of degree } \alpha_j - 1.$$

(3) Invoke the initial conditions to solve for the coefficients of the polynomials $p_j$.
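The embedding of a scalar difference equation into the vector equation $\mathbf{x}_{n+1} = A\mathbf{x}_n$ with a companion matrix $A$ can be illustrated with exact rational arithmetic. The sketch below is an assumption-laden illustration, not part of the text: the helper names `companion`, `step` and `first_coordinates` are hypothetical, and the solution $x_n$ is simply read off as the first entry of $A^n\mathbf{x}_0$.

```python
from fractions import Fraction

def companion(coeffs):
    """coeffs = [a_0, ..., a_p] with a_p != 0; returns the p x p companion matrix."""
    p = len(coeffs) - 1
    ap = Fraction(coeffs[-1])
    # Rows 1, ..., p-1 carry a single 1 on the superdiagonal.
    A = [[Fraction(int(j == i + 1)) for j in range(p)] for i in range(p - 1)]
    # Last row: -a_0/a_p, ..., -a_{p-1}/a_p.
    A.append([-Fraction(a) / ap for a in coeffs[:-1]])
    return A

def step(A, x):
    """One step x -> A x of the first-order vector equation."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def first_coordinates(coeffs, initial, count):
    """x_0, ..., x_{count-1}, each read off as the first entry of A^n x_0."""
    A = companion(coeffs)
    x = [Fraction(v) for v in initial]
    out = []
    for _ in range(count):
        out.append(x[0])
        x = step(A, x)
    return out

# -4 x_n - 3 x_{n+1} + x_{n+2} = 0 with x_0 = 5, x_1 = 0.
seq_a = first_coordinates([-4, -3, 1], [5, 0], 5)
# x_n - 2 x_{n+1} + x_{n+2} = 0 with x_0 = 3, x_1 = 5.
seq_b = first_coordinates([1, -2, 1], [3, 5], 4)
```

Using `Fraction` keeps the iteration exact, so the output can be compared term by term with a closed form obtained from the recipe.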

Discussion. The algorithm works because $A$ is a companion matrix. Thus,

$$\det(\lambda I_p - A) = (a_0 + a_1\lambda + \cdots + a_p\lambda^p)/a_p$$

and hence, if

$$\det(\lambda I_p - A) = (\lambda - \lambda_1)^{\alpha_1}\cdots(\lambda - \lambda_k)^{\alpha_k}$$

with distinct roots $\lambda_1, \ldots, \lambda_k$, then $A$ is similar to the Jordan matrix

$$J = \operatorname{diag}\{C_{\lambda_1}^{(\alpha_1)}, \ldots, C_{\lambda_k}^{(\alpha_k)}\}$$

with one Jordan cell for each distinct eigenvalue and the matrix $U$ in the Jordan decomposition $A = UJU^{-1}$ is a generalized Vandermonde matrix. Therefore, the solution must be of the form indicated in (2).

Remark 13.7. The equation $a_0 + a_1\lambda + \cdots + a_p\lambda^p = 0$ for the eigenvalues of $A$ may be obtained with minimum thought by letting $x_j = \lambda^j$ in equation (13.23) and then factoring out the highest common power of $\lambda$. The assumption $a_0 \ne 0$ will guarantee that the eigenvalues are all nonzero.

Exercise 13.37. Find the solution of the third-order difference equation

$$x_{n+3} - 3x_{n+2} + 3x_{n+1} - x_n = 0, \quad n = 0, 1, \ldots,$$

subject to the initial conditions $x_0 = 1$, $x_1 = 2$ and $x_2 = 8$. [HINT: $(x - 1)^3 = x^3 - 3x^2 + 3x - 1$.]

13.13. Second-order differential equations

Ordinary differential equations with constant coefficients can be solved by imbedding them in first-order vector differential equations and exploiting the theory developed in Sections 13.4 and 13.9. Thus, for example, to solve the second-order differential equation

(13.24) $$a_0x(t) + a_1x'(t) + a_2x''(t) = g(t) \quad\text{for } t \ge 0$$

with $a_2 \ne 0$ and initial conditions

$$x(0) = \alpha \quad\text{and}\quad x'(0) = \beta,$$

introduce the new variables

$$x_1(t) = x(t) \quad\text{and}\quad x_2(t) = x'(t).$$

Then

$$x_1'(t) = x_2(t)$$

and

$$x_2'(t) = x''(t) = \{-a_0x_1(t) - a_1x_2(t) + g(t)\}/a_2.$$

Consequently, the vector

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$


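is a solution of the first-order vector equation

$$\mathbf{x}'(t) = \begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = A\mathbf{x}(t) + \mathbf{g}(t), \quad t \ge 0,$$

with

$$A = \begin{bmatrix} 0 & 1 \\ -a_0/a_2 & -a_1/a_2 \end{bmatrix}, \quad \mathbf{g}(t) = \frac{1}{a_2}\begin{bmatrix} 0 \\ g(t) \end{bmatrix} \quad\text{and}\quad \mathbf{x}(0) = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}.$$

Thus,

$$\mathbf{x}(t) = e^{tA}\begin{bmatrix} \alpha \\ \beta \end{bmatrix} + \int_0^t e^{(t-s)A}\mathbf{g}(s)\,ds$$

and

$$x(t) = \begin{bmatrix} 1 & 0 \end{bmatrix}e^{tA}\begin{bmatrix} \alpha \\ \beta \end{bmatrix} + \begin{bmatrix} 1 & 0 \end{bmatrix}\int_0^t e^{(t-s)A}\mathbf{g}(s)\,ds.$$

Let $\lambda_1, \lambda_2$ denote the roots of $a_2\lambda^2 + a_1\lambda + a_0$. Then, since $A$ is a companion matrix, there are only two possible cases to consider corresponding to the two Jordan forms described in (13.19).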

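Case 1 ($\lambda_1 \ne \lambda_2$):

$$e^{tJ} = \begin{bmatrix} e^{\lambda_1t} & 0 \\ 0 & e^{\lambda_2t} \end{bmatrix}$$

and hence the solution $x(t)$ of equation (13.24) must be of the form

$$x(t) = \gamma e^{\lambda_1t} + \delta e^{\lambda_2t} + \begin{bmatrix} 1 & 0 \end{bmatrix}\int_0^t e^{(t-s)A}\mathbf{g}(s)\,ds.$$

0 on this interval. Let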

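(13.28) $$\varphi(t) = \det\begin{bmatrix} u_1(t) & u_2(t) & u_3(t) \\ u_1^{(1)}(t) & u_2^{(1)}(t) & u_3^{(1)}(t) \\ u_1^{(2)}(t) & u_2^{(2)}(t) & u_3^{(2)}(t) \end{bmatrix}.$$

Then

(13.29) $$\varphi'(t) = \det\begin{bmatrix} u_1^{(1)} & u_2^{(1)} & u_3^{(1)} \\ u_1^{(1)} & u_2^{(1)} & u_3^{(1)} \\ u_1^{(2)} & u_2^{(2)} & u_3^{(2)} \end{bmatrix} + \det\begin{bmatrix} u_1 & u_2 & u_3 \\ u_1^{(2)} & u_2^{(2)} & u_3^{(2)} \\ u_1^{(2)} & u_2^{(2)} & u_3^{(2)} \end{bmatrix} + \det\begin{bmatrix} u_1 & u_2 & u_3 \\ u_1^{(1)} & u_2^{(1)} & u_3^{(1)} \\ u_1^{(3)} & u_2^{(3)} & u_3^{(3)} \end{bmatrix}$$

$$= 0 + 0 + \det\begin{bmatrix} u_1(t) & u_2(t) & u_3(t) \\ u_1^{(1)}(t) & u_2^{(1)}(t) & u_3^{(1)}(t) \\ u_1^{(3)}(t) & u_2^{(3)}(t) & u_3^{(3)}(t) \end{bmatrix} = -\frac{a_2(t)}{a_3(t)}\varphi(t),$$

since

$$a_3(t)u_j^{(3)}(t) + a_2(t)u_j^{(2)}(t) + a_1(t)u_j^{(1)}(t) + a_0(t)u_j(t) = 0 \quad\text{for } j = 1, 2, 3.$$

If $\varphi(t) > 0$ for $\alpha \le t \le \beta$, then the differential equation for

for some point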

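$\mathbf{c} = \mathbf{a} + t_0(\mathbf{b} - \mathbf{a})$ on the open line segment between $\mathbf{a}$ and $\mathbf{b}$. Thus, by the Cauchy-Schwarz inequality,

$$\|\mathbf{u}\|^2 \le \|J_f(\mathbf{c})(\mathbf{b} - \mathbf{a})\|\,\|\mathbf{u}\|,$$

which leads easily to the advertised result. □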

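Exercise 14.5. Show that if $\mathbf{g}(\mathbf{x})$ is a smooth map of $\mathbb{R}^r$ into $\mathbb{R}^q$, $\mathbf{f}$ is a smooth map of $\mathbb{R}^q$ into $\mathbb{R}^p$ and $\mathbf{h}(\mathbf{x}) = \mathbf{f}(\mathbf{g}(\mathbf{x}))$, then $J_h(\mathbf{x}) = J_f(\mathbf{g}(\mathbf{x}))J_g(\mathbf{x})$.

14.6. A contractive fixed point theorem

Theorem 14.8. Let $\mathbf{f}(\mathbf{x})$ be a continuous map of a closed subset $E$ of a normed linear space $\mathcal{X}$ over $\mathbb{F}$ into itself such that

$$\|\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})\| \le K\|\mathbf{b} - \mathbf{a}\|$$

for some constant $K$, $0 < K < 1$, and every pair of points $\mathbf{a}, \mathbf{b}$ in the set $E$. Then:

(1) There is exactly one point $\mathbf{x}_* \in E$ such that $\mathbf{f}(\mathbf{x}_*) = \mathbf{x}_*$.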


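(2) If $\mathbf{x}_0 \in E$ and $\mathbf{x}_{n+1} = \mathbf{f}(\mathbf{x}_n)$ for $n = 0, 1, \ldots$, then $\mathbf{x}_* = \lim_{n\uparrow\infty}\mathbf{x}_n$; i.e., the limit exists and is independent of how the initial point $\mathbf{x}_0$ is chosen and

(3) $\|\mathbf{x}_* - \mathbf{x}_n\| \le \dfrac{K^n}{1 - K}\|\mathbf{x}_1 - \mathbf{x}_0\|$.

Proof. Choose any point $\mathbf{x}_0 \in E$ and then define the sequence of points $\mathbf{x}_1, \mathbf{x}_2, \ldots$ by the rule $\mathbf{x}_{n+1} = \mathbf{f}(\mathbf{x}_n)$. Then clearly

$$\|\mathbf{x}_2 - \mathbf{x}_1\| = \|\mathbf{f}(\mathbf{x}_1) - \mathbf{f}(\mathbf{x}_0)\| \le K\|\mathbf{x}_1 - \mathbf{x}_0\|,$$

$$\|\mathbf{x}_3 - \mathbf{x}_2\| = \|\mathbf{f}(\mathbf{x}_2) - \mathbf{f}(\mathbf{x}_1)\| \le K\|\mathbf{x}_2 - \mathbf{x}_1\| \le K^2\|\mathbf{x}_1 - \mathbf{x}_0\|,$$

$$\vdots$$

$$\|\mathbf{x}_{n+1} - \mathbf{x}_n\| \le K^n\|\mathbf{x}_1 - \mathbf{x}_0\|,$$

and hence

$$\|\mathbf{x}_{n+k} - \mathbf{x}_n\| \le \|\mathbf{x}_{n+k} - \mathbf{x}_{n+k-1}\| + \cdots + \|\mathbf{x}_{n+1} - \mathbf{x}_n\| \le (K^{n+k-1} + \cdots + K^n)\|\mathbf{x}_1 - \mathbf{x}_0\|$$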
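The iteration in the contractive fixed point theorem is easy to try out numerically. The sketch below is an illustration under assumptions that are not taken from the text: the map is $f(x) = \cos x$ on the closed set $E = [0, 1]$, which maps $E$ into itself and satisfies $|f(b) - f(a)| \le K|b - a|$ with $K = \sin 1 < 1$ by the mean value theorem. The helper name `fixed_point` is hypothetical.

```python
import math

def fixed_point(f, x0, n_steps):
    """Return the iterates x_0, x_1, ..., x_{n_steps} of x_{n+1} = f(x_n)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(f(xs[-1]))
    return xs

# Assumed example: f = cos on E = [0, 1], contraction constant K = sin(1).
K = math.sin(1.0)
xs = fixed_point(math.cos, 0.0, 200)
x_star = xs[-1]  # numerically indistinguishable from the fixed point

# Check the a priori bound |x_* - x_n| <= K^n / (1 - K) * |x_1 - x_0|.
bounds_hold = all(
    abs(x_star - xs[n]) <= K**n / (1 - K) * abs(xs[1] - xs[0]) + 1e-12
    for n in range(50)
)

# A different starting point in E should lead to the same limit.
other = fixed_point(math.cos, 1.0, 200)[-1]
```

The bound in (3) is usually pessimistic in practice, but it has the advantage that it can be evaluated after a single step of the iteration.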


O and

hence if 72 + 62 g 7'2, then

(Vgi)(U(t)) = Z aj(ll(t))(V9j)(U(t)) j=1

for 0 S t g 1 and h(0) = 0. Thus,

h(t) = [0 $h(s)ds

15. The implicit function theorem

346

because

for j= 1, . . . ,p

(n)(u(s))u'(s) = %Qj(U(S)) = 0

and

O < s < 1. D

15.4. Continuous dependence of solutions

The implicit function theorem is often a useful tool to check the continuous dependence of the solution of an equation on the coefficients appearing in the equation. Suppose, for example, that $X \in \mathbb{R}^{2\times 2}$ is a solution of the matrix equation

$$A^TX + XA = B$$

for some fixed choice of the matrices $A, B \in \mathbb{R}^{2\times 2}$. Then in many instances the implicit function theorem may be invoked to show that if $A$ changes only a little, then $X$ will also change only a little. To this end, let

$$F(A, X) = A^TX + XA - B$$

so that

$$f_{ij}(A, X) = \mathbf{e}_i^T(A^TX + XA - B)\mathbf{e}_j, \quad i, j = 1, 2,$$

where $\mathbf{e}_1, \mathbf{e}_2$ denote the standard basis vectors in $\mathbb{R}^2$. Then, upon writing

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix},$$

one can readily check that

$$\frac{\partial f_{ij}}{\partial x_{st}} = \mathbf{e}_i^T(A^T\mathbf{e}_s\mathbf{e}_t^T + \mathbf{e}_s\mathbf{e}_t^TA)\mathbf{e}_j = a_{si}\mathbf{e}_t^T\mathbf{e}_j + a_{tj}\mathbf{e}_i^T\mathbf{e}_s.$$

6f,-.7

T

T

T

T

T

T

T

T

=

duel ej + aljei e1

=

0,1202 ej + 0,239,. e1

=

0,2201 ej + aljei 62

=

0,2202 Gj + agjez- 62 .

65311

of,-J

(9:812

6f,-.7

(95321

(91% (93122


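Correspondingly,

$$\begin{bmatrix} \dfrac{\partial f_{11}}{\partial x_{11}} & \dfrac{\partial f_{11}}{\partial x_{12}} & \dfrac{\partial f_{11}}{\partial x_{21}} & \dfrac{\partial f_{11}}{\partial x_{22}} \\ \dfrac{\partial f_{12}}{\partial x_{11}} & \dfrac{\partial f_{12}}{\partial x_{12}} & \dfrac{\partial f_{12}}{\partial x_{21}} & \dfrac{\partial f_{12}}{\partial x_{22}} \\ \dfrac{\partial f_{21}}{\partial x_{11}} & \dfrac{\partial f_{21}}{\partial x_{12}} & \dfrac{\partial f_{21}}{\partial x_{21}} & \dfrac{\partial f_{21}}{\partial x_{22}} \\ \dfrac{\partial f_{22}}{\partial x_{11}} & \dfrac{\partial f_{22}}{\partial x_{12}} & \dfrac{\partial f_{22}}{\partial x_{21}} & \dfrac{\partial f_{22}}{\partial x_{22}} \end{bmatrix} = \begin{bmatrix} 2a_{11} & a_{21} & a_{21} & 0 \\ a_{12} & a_{11} + a_{22} & 0 & a_{21} \\ a_{12} & 0 & a_{11} + a_{22} & a_{21} \\ 0 & a_{12} & a_{12} & 2a_{22} \end{bmatrix}.$$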
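Because $F(A, X)$ is linear in $X$ for fixed $A$, the partial derivatives $\partial f_{ij}/\partial x_{st}$ can be computed exactly by evaluating $A^TE_{st} + E_{st}A$ on the matrix units $E_{st} = \mathbf{e}_s\mathbf{e}_t^T$. The sketch below is an illustrative check, not part of the text: the helper names are hypothetical, and the closed form entered in `jacobian_closed_form` is what the formula $a_{si}\mathbf{e}_t^T\mathbf{e}_j + a_{tj}\mathbf{e}_i^T\mathbf{e}_s$ gives when the rows are ordered $f_{11}, f_{12}, f_{21}, f_{22}$ and the columns $x_{11}, x_{12}, x_{21}, x_{22}$.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(P):
    return [[P[j][i] for j in range(2)] for i in range(2)]

def jacobian_by_linearity(A):
    # The column belonging to x_st lists the entries of A^T E_st + E_st A
    # in the order f11, f12, f21, f22.
    cols = []
    for s in range(2):
        for t in range(2):
            E = [[int((i, j) == (s, t)) for j in range(2)] for i in range(2)]
            L, R = matmul(transpose(A), E), matmul(E, A)
            cols.append([L[i][j] + R[i][j] for i in range(2) for j in range(2)])
    return [[cols[c][r] for c in range(4)] for r in range(4)]

def jacobian_closed_form(A):
    # Entries obtained from a_si * e_t^T e_j + a_tj * e_i^T e_s.
    (a11, a12), (a21, a22) = A
    return [[2 * a11, a21, a21, 0],
            [a12, a11 + a22, 0, a21],
            [a12, 0, a11 + a22, a21],
            [0, a12, a12, 2 * a22]]

A_example = [[1, 2], [3, 4]]  # arbitrary test matrix
```

Comparing the two functions on a few integer matrices is a quick way to catch index slips in a hand computation of this kind.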

Now suppose that $F(A_0, X_0) = O$ and that the matrix on the right in the last identity is invertible when the terms $a_{ij}$ are taken from $A_0$. Then the implicit function theorem guarantees the existence of a pair of numbers $\gamma > 0$ and $\delta > 0$ such that for every matrix $A \in \mathbb{R}^{2\times 2}$ in the ball $\|A - A_0\| < \gamma$ there exists a unique $X = \varphi(A)$ in the ball $\|X - X_0\| < \delta$ such that $F(A, X) = O$ and hence that $X = \varphi(A)$ is a continuous function of $A$ in the ball $\|A - A_0\| < \gamma$.

15.5. The inverse function theorem

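Theorem 15.3. Suppose that the $p \times 1$ real vector-valued function

$$\mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(x_1, \ldots, x_p) \\ \vdots \\ f_p(x_1, \ldots, x_p) \end{bmatrix}$$

is in $\mathcal{C}^1(Q)$ for some nonempty open set $Q \subset \mathbb{R}^p$, that the Jacobian matrix

$$J_f(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1}(\mathbf{x}) & \cdots & \dfrac{\partial f_1}{\partial x_p}(\mathbf{x}) \\ \vdots & & \vdots \\ \dfrac{\partial f_p}{\partial x_1}(\mathbf{x}) & \cdots & \dfrac{\partial f_p}{\partial x_p}(\mathbf{x}) \end{bmatrix}$$

is invertible at a point $\mathbf{x}_0 \in Q$ and that $\mathbf{y}_0 = \mathbf{f}(\mathbf{x}_0)$. Then there exist a pair of numbers $\gamma > 0$ and $\delta > 0$ such that $B_\gamma(\mathbf{x}_0) \subset Q$ and for each point $\mathbf{y} \in B_\delta(\mathbf{y}_0)$ there exists exactly one point $\mathbf{x} = \vartheta(\mathbf{y})$ in the ball $B_\gamma(\mathbf{x}_0)$ such that $\mathbf{y} = \mathbf{f}(\mathbf{x})$. Moreover, the function $\vartheta \in \mathcal{C}^1(B_\delta(\mathbf{y}_0))$.

Proof.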

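Let $\mathbf{g}(\mathbf{x}, \mathbf{y}) = \mathbf{f}(\mathbf{x}) - \mathbf{y}$. Then, in self-evident notation,

$$J_g = \begin{bmatrix} J_g^{(1)} & J_g^{(2)} \end{bmatrix} = \begin{bmatrix} J_f(\mathbf{x}) & -I_p \end{bmatrix}.$$

Thus, as $\mathbf{g}(\mathbf{x}_0, \mathbf{y}_0) = \mathbf{0}$ and the $p \times p$ matrices $J_g^{(1)}(\mathbf{x}_0, \mathbf{y}_0) = J_f(\mathbf{x}_0)$ and $J_g^{(2)}(\mathbf{x}_0, \mathbf{y}_0) = -I_p$ are both invertible, the implicit function theorem may be invoked to obtain $\mathbf{x}$ as a function of $\mathbf{y}$ and $\mathbf{y}$ as a function of $\mathbf{x}$. In particular, there exist a pair of positive numbers $\gamma$ and $\delta$ such that $B_\gamma(\mathbf{x}_0) \subset Q$ and for each vector $\mathbf{y} \in B_\delta(\mathbf{y}_0)$ there exists exactly one vector $\mathbf{x} = \vartheta(\mathbf{y})$ such that $\mathbf{g}(\vartheta(\mathbf{y}), \mathbf{y}) = \mathbf{0}$ and that $\vartheta(\mathbf{y}) \in \mathcal{C}^1(B_\delta(\mathbf{y}_0))$. But this is equivalent to the asserted statement. □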

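Corollary 15.4. (The open mapping theorem) Let $Q$ be an open nonempty subset of $\mathbb{R}^p$ and suppose that $\mathbf{f} \in \mathcal{C}^1(Q)$ maps $Q$ into $\mathbb{R}^p$ and that $J_f(\mathbf{x})$ is invertible for every point $\mathbf{x} \in Q$. Then $\mathbf{f}(Q)$ is an open subset of $\mathbb{R}^p$.

Proof. Let $\mathbf{x}_0 \in Q$ and $\mathbf{y}_0 = \mathbf{f}(\mathbf{x}_0)$. Then there exist a pair of numbers $\gamma > 0$ and $\delta > 0$ such that $B_\gamma(\mathbf{x}_0) \subseteq Q$ and for each vector $\mathbf{y} \in B_\delta(\mathbf{y}_0)$ there exists exactly one vector $\mathbf{x} = \vartheta(\mathbf{y})$ in $B_\gamma(\mathbf{x}_0)$ such that $\mathbf{f}(\mathbf{x}) = \mathbf{y}$. Thus,

$$\mathbf{y}_0 \in B_\delta(\mathbf{y}_0) = \mathbf{f}(\vartheta(B_\delta(\mathbf{y}_0))) \subseteq \mathbf{f}(B_\gamma(\mathbf{x}_0)) \subseteq \mathbf{f}(Q).$$

□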

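Exercise 15.5. Let $\mathbf{g}(\mathbf{x})$ be a continuous mapping of $\mathbb{R}^p$ into $\mathbb{R}^p$ such that all the partial derivatives $\partial g_i/\partial x_j$ exist and are continuous on $\mathbb{R}^p$. Write

$$\mathbf{g}(\mathbf{x}) = \begin{bmatrix} g_1(x_1, \ldots, x_p) \\ \vdots \\ g_p(x_1, \ldots, x_p) \end{bmatrix}, \quad J_g(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial g_1}{\partial x_1}(\mathbf{x}) & \cdots & \dfrac{\partial g_1}{\partial x_p}(\mathbf{x}) \\ \vdots & & \vdots \\ \dfrac{\partial g_p}{\partial x_1}(\mathbf{x}) & \cdots & \dfrac{\partial g_p}{\partial x_p}(\mathbf{x}) \end{bmatrix},$$

$\mathbf{y}^\circ = \mathbf{g}(\mathbf{x}^\circ)$ and suppose that the matrix $B = J_g(\mathbf{x}^\circ)$ is invertible and that $\|I_p - B^{-1}J_g(\mathbf{x})\| < 1/2$ for every point $\mathbf{x}$ in the closed ball $\overline{B_\delta(\mathbf{x}^\circ)}$. Show that if $\rho = \delta/(2\|B^{-1}\|)$, then for each fixed $\mathbf{y}$ in the closed ball $\overline{B_\rho(\mathbf{y}^\circ)}$, there exists exactly one point $\mathbf{x} \in \overline{B_\delta(\mathbf{x}^\circ)}$ such that $\mathbf{g}(\mathbf{x}) = \mathbf{y}$. [HINT: Show that for each point $\mathbf{y} \in \overline{B_\rho(\mathbf{y}^\circ)}$, the function $\mathbf{h}(\mathbf{x}) = \mathbf{x} - B^{-1}(\mathbf{g}(\mathbf{x}) - \mathbf{y})$ has a fixed point in $\overline{B_\delta(\mathbf{x}^\circ)}$.]

Exercise 15.6. Let

$$\mathbf{g}(\mathbf{x}) = \begin{bmatrix} x_1^2 - x_2 \\ x_2 + x_3 \\ x_3^2 - 2x_3 + 1 \end{bmatrix}, \quad \mathbf{x}^\circ = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} \quad\text{and}\quad \mathbf{y}^\circ = \mathbf{g}(\mathbf{x}^\circ).$$

(a) Calculate $J_g(\mathbf{x})$, $B = J_g(\mathbf{x}^\circ)$ and $B^{-1}$.

(b) Show that $\|B^{-1}\|^2 < 5/3$.

(c) Show that if $\mathbf{y} \in \mathbb{R}^3$ is fixed and $\mathbf{h}(\mathbf{x}) = \mathbf{x} - B^{-1}(\mathbf{g}(\mathbf{x}) - \mathbf{y})$, then $J_h(\mathbf{x}) = B^{-1}(J_g(\mathbf{x}^\circ) - J_g(\mathbf{x}))$.

(d) Show that $\|J_h(\mathbf{x})\| \le 2\|B^{-1}\|\,\|\mathbf{x} - \mathbf{x}^\circ\|$.

(e) Show that if $2\|B^{-1}\|\delta < 1/2$, then for each fixed point $\mathbf{y}$ in the closed ball $\overline{B_\rho(\mathbf{y}^\circ)}$, with $\rho = \delta/(2\|B^{-1}\|)$, there exists exactly one point $\mathbf{x}$ in the closed ball $\overline{B_\delta(\mathbf{x}^\circ)}$ such that $\mathbf{g}(\mathbf{x}) = \mathbf{y}$.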


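Exercise 15.7. Let $\mathbf{u}^\circ \in \mathbb{R}^2$, $\mathbf{f} \in \mathcal{C}^2(B_r(\mathbf{u}^\circ))$ and suppose that

$$\begin{bmatrix} (\nabla f_1)(\mathbf{u}) \\ (\nabla f_2)(\mathbf{v}) \end{bmatrix}$$

is invertible for every pair of vectors $\mathbf{u}, \mathbf{v} \in B_r(\mathbf{u}^\circ)$. Show that if $\mathbf{a}, \mathbf{b} \in B_r(\mathbf{u}^\circ)$, then $\mathbf{f}(\mathbf{a}) = \mathbf{f}(\mathbf{b}) \iff \mathbf{a} = \mathbf{b}$.

Exercise 15.8. Show that the condition in Exercise 15.7 cannot be weakened to

$$\begin{bmatrix} (\nabla f_1)(\mathbf{u}) \\ (\nabla f_2)(\mathbf{u}) \end{bmatrix} \text{ is invertible for every vector } \mathbf{u} \in B_r(\mathbf{u}^\circ).$$

[HINT: Consider the function $\mathbf{f}(\mathbf{x})$ with components $f_1(\mathbf{x}) = x_1\cos x_2$ and $f_2(\mathbf{x}) = x_1\sin x_2$ in a ball of radius $2\pi$ centered at the point $(3\pi, 2\pi)$.]

Exercise 15.9. Calculate the Jacobian matrix $J_f(\mathbf{x})$ of the function $\mathbf{f}(\mathbf{x})$ with components $f_i(x_1, x_2, x_3) = x_i/(1 + x_1 + x_2 + x_3)$ for $i = 1, 2, 3$ that are defined at all points $\mathbf{x} \in \mathbb{R}^3$ with $x_1 + x_2 + x_3 \ne -1$.

Exercise 15.10. Show that the vector-valued function that is defined in Exercise 15.9 defines a one to one map from its domain of definition in $\mathbb{R}^3$ and find the inverse mapping.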

15.6. Roots of polynomials

Theorem 15.5. The roots of the polynomial

$$f(\lambda) = \lambda^n + a_1\lambda^{n-1} + \cdots + a_n$$

vary continuously with the coefficients $a_1, \ldots, a_n$.

This theorem is of great importance in applications. It guarantees that a small change in the coefficients $a_1, \ldots, a_n$ of the polynomial causes only a small change in the roots of the polynomial. If $n = 2$, this is easily checked via the formula for the roots of a quadratic equation. If $n > 2$, then it is usually proved by Rouché's theorem from the theory of complex variables; see e.g. pp. 153-154 of [9] and Appendix B. Below, we shall treat a special case of this theorem in which the polynomial has distinct roots via the implicit function theorem. The full result will be established later by invoking a different circle of ideas in Chapter 17. Another approach is considered in Exercise 17.16.

15.7. An instructive example

To warm up, consider first the polynomial

$$p(\lambda) = \lambda^3 + \lambda^2 - 4\lambda + 6.$$

It has three roots: $\lambda_1 = 1 + i$, $\lambda_2 = 1 - i$, and $\lambda_3 = -3$.


This means that the equation

(15.12) $$(\mu + i\nu)^3 + a(\mu + i\nu)^2 + b(\mu + i\nu) + c = 0$$

in terms of the 5 real variables $\mu, \nu, a, b, c$ is satisfied by the choices:

$\mu = 1$, $\nu = 1$, $a = 1$, $b = -4$, $c = 6$
$\mu = 1$, $\nu = -1$, $a = 1$, $b = -4$, $c = 6$
$\mu = -3$, $\nu = 0$, $a = 1$, $b = -4$, $c = 6$.

To put this into the setting of the implicit function theorem, express

$$f(a, b, c, \mu + i\nu) = (\mu + i\nu)^3 + a(\mu + i\nu)^2 + b(\mu + i\nu) + c$$

$$= \mu^3 + 3\mu^2i\nu + 3\mu(i\nu)^2 + (i\nu)^3 + a(\mu^2 + 2\mu i\nu + (i\nu)^2) + b(\mu + i\nu) + c$$

in terms of its real and imaginary parts as

$$f(a, b, c, \mu + i\nu) = f_1(a, b, c, \mu, \nu) + if_2(a, b, c, \mu, \nu),$$

where

$$f_1(a, b, c, \mu, \nu) = \mu^3 - 3\mu\nu^2 + a\mu^2 - a\nu^2 + b\mu + c$$

and

$$f_2(a, b, c, \mu, \nu) = 3\mu^2\nu - \nu^3 + 2a\mu\nu + b\nu.$$

Thus, the study of the roots of the equation

$$\lambda^3 + a\lambda^2 + b\lambda + c = 0$$

with real coefficients $a, b, c$ has been converted into the study of the solutions of the system

$$f_1(a, b, c, \mu, \nu) = 0$$
$$f_2(a, b, c, \mu, \nu) = 0.$$

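The implicit function theorem guarantees the continuous dependence of the pair $(\mu, \nu)$ on $(a, b, c)$ in the vicinity of a solution provided that the matrix

$$A(a, b, c, \mu, \nu) = \begin{bmatrix} \dfrac{\partial f_1}{\partial \mu} & \dfrac{\partial f_1}{\partial \nu} \\ \dfrac{\partial f_2}{\partial \mu} & \dfrac{\partial f_2}{\partial \nu} \end{bmatrix}$$

is invertible. To explore this at the point $a = 1$, $b = -4$, $c = 6$, $\mu = 1$, $\nu = 1$, observe first that

(15.13) $$\frac{\partial f_1}{\partial \mu} = 3\mu^2 - 3\nu^2 + 2a\mu + b,$$

(15.14) $$\frac{\partial f_1}{\partial \nu} = -6\mu\nu - 2a\nu,$$

(15.15) $$\frac{\partial f_2}{\partial \mu} = 6\mu\nu + 2a\nu,$$

(15.16) $$\frac{\partial f_2}{\partial \nu} = 3\mu^2 - 3\nu^2 + 2a\mu + b.$$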

Therefore,

$$\frac{\partial f_1}{\partial \mu}(1, -4, 6, 1, 1) = -2, \quad \frac{\partial f_1}{\partial \nu}(1, -4, 6, 1, 1) = -8,$$

$$\frac{\partial f_2}{\partial \mu}(1, -4, 6, 1, 1) = 8, \quad \frac{\partial f_2}{\partial \nu}(1, -4, 6, 1, 1) = -2$$

and

(15.17) $$\det A(1, -4, 6, 1, 1) = \det\begin{bmatrix} -2 & -8 \\ 8 & -2 \end{bmatrix} = 2^2 + 8^2 = 68.$$

Thus, the implicit function theorem guarantees that if the coefficients $a, b, c$ of the polynomial

$$\lambda^3 + a\lambda^2 + b\lambda + c$$

change a little bit from $1, -4, 6$, then the root in the vicinity of $1 + i$ will only change a little bit. Similar considerations apply to the other two roots in this example.
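The numbers in this example are easy to confirm with a few lines of Python. The sketch below is an illustration only: the function names are hypothetical, the perturbed coefficients $1.01, -4.02, 6.01$ are an arbitrary choice, and Newton's method is used here merely as a convenient way of locating the nearby root, not as part of the argument in the text.

```python
def p(lam, a=1.0, b=-4.0, c=6.0):
    """The cubic lam^3 + a*lam^2 + b*lam + c."""
    return lam**3 + a * lam**2 + b * lam + c

def dp(lam, a=1.0, b=-4.0):
    """Derivative of the cubic with respect to lam."""
    return 3 * lam**2 + 2 * a * lam + b

def jacobian(a, b, mu, nu):
    # Partial derivatives (15.13)-(15.16) of f1 and f2 with respect to mu, nu.
    d = 3 * mu**2 - 3 * nu**2 + 2 * a * mu + b   # df1/dmu = df2/dnu
    e = 6 * mu * nu + 2 * a * nu                 # df2/dmu = -df1/dnu
    return [[d, -e], [e, d]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def newton(lam, a, b, c, steps=30):
    # Newton iteration for a root of the perturbed cubic, started at lam.
    for _ in range(steps):
        lam = lam - p(lam, a, b, c) / dp(lam, a, b)
    return lam

J = jacobian(1.0, -4.0, 1.0, 1.0)
# Perturb the coefficients slightly and track the root that starts at 1 + i.
perturbed_root = newton(1 + 1j, 1.01, -4.02, 6.01)
```

Since $f_1 + if_2$ is a polynomial in $\mu + i\nu$, the matrix `J` has the form $\begin{bmatrix} d & -e \\ e & d\end{bmatrix}$ and its determinant $d^2 + e^2$ equals $|p'(\mu + i\nu)|^2$, which is nonzero exactly when the root is simple.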

Exercise 15.11. Show that there exists a pair of numbers $\gamma > 0$ and $\delta > 0$ such that the polynomial $\lambda^3 + a\lambda^2 + b\lambda + c$ with real coefficients has exactly one root $\lambda = \mu + i\nu$ in the ball $\mu^2 + (\nu - 2)^2 < \delta$ if $(a - 1)^2 + (b - 4)^2 + (c - 4)^2$
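0 may be used for real symmetric matrices that are positive definite over $\mathbb{R}^n$ as well as for matrices $A \in \mathbb{C}^{n\times n}$ that are positive definite over $\mathbb{C}^n$. Correspondingly we define

(16.7) $$\mathbb{R}^{n\times n}_{\succ} = \{A \in \mathbb{R}^{n\times n} : A \succ 0\}.$$

Exercise 16.2. Let $A, B \in \mathbb{R}^{n\times n}$ and suppose that $A \succ 0$, $B = B^T$ and $\|A - B\| < \lambda_{\min}$, where $\lambda_{\min}$ denotes the smallest eigenvalue of $A$. Show that $B \succ 0$. [HINT: $\langle B\mathbf{u}, \mathbf{u}\rangle = \langle A\mathbf{u}, \mathbf{u}\rangle + \langle (B - A)\mathbf{u}, \mathbf{u}\rangle$.]

Exercise 16.3. Let $A, B \in \mathbb{R}^{n\times n}$ and suppose that $A \succ 0$ and $B = B^T$. Show that $A + \varepsilon B \succ 0$ if $|\varepsilon|$ is sufficiently small.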

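If $f \in \mathcal{C}^2(Q)$ and $B_\varepsilon(\mathbf{a}) \subset Q$ for some $\varepsilon > 0$, then, in view of formula (16.6),

(16.8) $$(\nabla f)(\mathbf{a}) = O_{1\times n} \Longrightarrow f(\mathbf{a} + \varepsilon\mathbf{u}) - f(\mathbf{a}) = \tfrac{1}{2}\varepsilon^2\langle H_f(\mathbf{c})\mathbf{u}, \mathbf{u}\rangle$$

for some point $\mathbf{c}$ on the open line segment between $\mathbf{a} + \varepsilon\mathbf{u}$ and $\mathbf{a}$. Thus, if $0 < \delta < \varepsilon$ and

(16.9) $H_f(\mathbf{x}) \succeq 0$ for every $\mathbf{x} \in B_\delta(\mathbf{a})$,

then $\mathbf{a}$ is a local minimum for $f$, whereas, if

(16.10) $H_f(\mathbf{x}) \preceq 0$ for every $\mathbf{x} \in B_\delta(\mathbf{a})$,

then $\mathbf{a}$ is a local maximum for $f$. This leads easily to the following conclusions:

Theorem 16.2. If $Q \subseteq \mathbb{R}^n$ is an open set that contains the point $\mathbf{a}$, $f(\mathbf{x}) = f(x_1, \ldots, x_n)$ belongs to $\mathcal{C}^2(Q)$ and $(\nabla f)(\mathbf{a}) = O_{1\times n}$, then:

(16.11) $H_f(\mathbf{a}) \succ 0 \Longrightarrow \mathbf{a}$ is a local minimum for $f(\mathbf{x})$.

(16.12) $H_f(\mathbf{a}) \prec 0 \Longrightarrow \mathbf{a}$ is a local maximum for $f(\mathbf{x})$.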


Proof. The proof of (16.11) follows from (16.8) and Exercise 16.2. The latter is applicable because $H_f(\mathbf{a}) \succ 0$ and $H_f(\mathbf{c})$ is a real symmetric matrix that tends to $H_f(\mathbf{a})$ when $\varepsilon \to 0$. The verification of (16.12) is similar. □

Theorem 16.2 implies that the behavior of a smooth function $f(\mathbf{x})$ in the vicinity of a point $\mathbf{a}$ at which $(\nabla f)(\mathbf{a}) = O_{1\times n}$ depends critically on the eigenvalues of the real symmetric matrix $H_f(\mathbf{a})$.

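Example 16.3. Let $f(u, v) = \alpha(u - 1)^2 + \beta(v - 2)^3$ with nonzero coefficients $\alpha \in \mathbb{R}$ and $\beta \in \mathbb{R}$. Then

$$\frac{\partial f}{\partial u}(u, v) = 2\alpha(u - 1) \quad\text{and}\quad \frac{\partial f}{\partial v}(u, v) = 3\beta(v - 2)^2.$$

Hence,

$$(\nabla f)(u, v) = \begin{bmatrix} 0 & 0 \end{bmatrix} \quad\text{if } u = 1 \text{ and } v = 2.$$

However, the point $(1, 2)$ is not a local maximum point or a local minimum point for the function $f(u, v)$, since

$$f(1 + \varepsilon_1, 2 + \varepsilon_2) - f(1, 2) = \alpha\varepsilon_1^2 + \beta\varepsilon_2^3,$$

which changes sign as $\varepsilon_2$ passes through zero when $\varepsilon_1 = 0$. This is consistent with the fact that the Hessian

$$H_f(u, v) = \begin{bmatrix} \dfrac{\partial^2f}{\partial u^2} & \dfrac{\partial^2f}{\partial u\,\partial v} \\ \dfrac{\partial^2f}{\partial v\,\partial u} & \dfrac{\partial^2f}{\partial v^2} \end{bmatrix} = \begin{bmatrix} 2\alpha & 0 \\ 0 & 6\beta(v - 2) \end{bmatrix}$$

and

$$H_f(1, 2) = \begin{bmatrix} 2\alpha & 0 \\ 0 & 0 \end{bmatrix}$$

is neither positive definite nor negative definite.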
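The role played by the eigenvalues of the Hessian at a critical point can be sketched for functions of two variables. The code below is only an illustration under assumptions: the function names are hypothetical, the eigenvalues of a real symmetric $2 \times 2$ matrix are computed from the closed form (mean of the diagonal plus or minus a radius), and the tolerance `tol` is an arbitrary numerical threshold.

```python
import math

def eigenvalues_sym2(h11, h12, h22):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[h11, h12], [h12, h22]]."""
    mean = (h11 + h22) / 2
    radius = math.hypot((h11 - h22) / 2, h12)
    return mean - radius, mean + radius

def classify(h11, h12, h22, tol=1e-12):
    """Classify a critical point from the Hessian entries at that point."""
    lo, hi = eigenvalues_sym2(h11, h12, h22)
    if lo > tol:
        return "local minimum"        # Hessian positive definite
    if hi < -tol:
        return "local maximum"        # Hessian negative definite
    if lo < -tol and hi > tol:
        return "neither (saddle)"     # eigenvalues of opposite sign
    return "inconclusive"             # at least one eigenvalue is zero

# Hessian [[2*alpha, 0], [0, 0]] with alpha = 1: the test gives no information.
degenerate_case = classify(2.0, 0.0, 0.0)
```

The "inconclusive" branch reflects the fact that a singular Hessian does not settle the question one way or the other; higher order terms then decide the behavior.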

Exercise 16.4. Show that the Hessian $H_g(1, 2)$ of the function

$$g(u, v) = \alpha(u - 1)^2 + \beta(v - 2)^4$$

is the same as the Hessian $H_f(1, 2)$ of the function considered in the preceding example.

Warning: A local minimum or maximum point is not necessarily an absolute minimum or maximum point: In the figure, $f(x)$ has a local minimum at $x = c$ and a local maximum at $x = b$. However, the absolute maximum value of $f(x)$ in the closed interval $a \le x \le d$ is attained at the point $d$ and the absolute minimum value of $f(x)$ in this interval is attained at the point $x = a$.

[Figure: graph of $f(x)$ on the interval $a \le x \le d$.]

Exercise 16.5. Show that if $\alpha = 1$ and $\beta = 1$, then the point $(1, 2)$ is a local minimum for the function

$$g(u, v) = (u - 1)^2 + (v - 2)^4,$$

but it is not a local minimum point for the function

$$f(u, v) = (u - 1)^2 + (v - 2)^3.$$

Exercise 16.6. Let $f \in \mathcal{C}^2(\mathbb{R}^2)$ and suppose that $(\nabla f)(a, b) = \begin{bmatrix} 0 & 0 \end{bmatrix}$, and let $\lambda_1$ and $\lambda_2$ denote the eigenvalues of $H_f(a, b)$. Show that the point $(a, b)$ is: (i) a local minimum for $f$ if $\lambda_1 > 0$ and $\lambda_2 > 0$, (ii) a local maximum for $f$ if $\lambda_1 < 0$ and $\lambda_2 < 0$, (iii) neither a local maximum nor a local minimum if $|\lambda_1\lambda_2| > 0$, but $\lambda_1\lambda_2 < 0$.

Exercise 16.7. In many textbooks on calculus the conclusions formulated in Exercise 16.6 are given in terms of the second-order partial derivatives $\alpha = (\partial^2f/\partial x^2)(a, b)$, $\beta = (\partial^2f/\partial y^2)(a, b)$ and $\gamma = (\partial^2f/\partial x\,\partial y)(a, b)$ by the conditions (i) $\alpha > 0$, $\beta > 0$ and $\alpha\beta - \gamma^2 > 0$; (ii) $\alpha < 0$, $\beta < 0$ and $\alpha\beta - \gamma^2 > 0$; (iii) $\alpha\beta - \gamma^2 < 0$, respectively. Show that the two formulations are equivalent.

16.2. Convex functions

A real valued function $f(\mathbf{x}) = f(x_1, \ldots, x_n)$ that is defined on a convex subset $Q$ of $\mathbb{R}^n$ is said to be convex if for every pair of points $\mathbf{a}, \mathbf{b} \in Q$

(16.13) $$f(t\mathbf{a} + (1 - t)\mathbf{b}) \le tf(\mathbf{a}) + (1 - t)f(\mathbf{b}) \quad\text{for every } t \text{ with } 0 \le t \le 1,$$

or, equivalently, if and only if for every set of $k$ points $\mathbf{a}_1, \ldots, \mathbf{a}_k \in Q$ with $k \ge 2$

(16.14) $$f\Big(\sum_{j=1}^{k}t_j\mathbf{a}_j\Big) \le \sum_{j=1}^{k}t_jf(\mathbf{a}_j) \quad\text{if } t_j \ge 0 \text{ and } \sum_{j=1}^{k}t_j = 1;$$

(16.14) is called Jensen's inequality. To understand why (16.13) implies (16.14), consider the case of three points with $t_j > 0$ for $j = 1, 2, 3$ and $s_j = t_j$ 3 may be verified in much the same way by induction.

A number of important properties of convex functions and their applications will be considered in detail in Chapter 22. At this point we restrict attention primarily to the relevance of convexity to extremal problems.

Theorem 16.4. Let $Q$ be an open nonempty convex subset of $\mathbb{R}^n$, let $f$ be a real valued convex function on $Q$ that belongs to the class $\mathcal{C}^1(Q)$ and suppose that $\nabla f(\mathbf{a}) = O$ for some point $\mathbf{a} \in Q$. Then

$$f(\mathbf{a}) \le f(\mathbf{b}) \quad\text{for every point } \mathbf{b} \in Q,$$

i.e., $f$ achieves its absolute minimal value at the point $\mathbf{a}$.

Proof. Suppose that $\mathbf{a}, \mathbf{b} \in Q$. Then (16.13) implies that

$$\frac{f(\mathbf{a} + t(\mathbf{b} - \mathbf{a})) - f(\mathbf{a})}{t} \le f(\mathbf{b}) - f(\mathbf{a}) \quad\text{if } 0 < t < 1.$$

$f(\mathbf{b}) > f(\mathbf{a})$ for every point $\mathbf{b} \in Q$.

The implications of Theorem 16.4 and Exercise 16.8 are connected because, as will be shown in Section 22.2, if $Q$ is an open nonempty convex subset of $\mathbb{R}^n$, then a real valued function $f \in \mathcal{C}^2(Q)$ is convex if and only if the Hessian $H_f(\mathbf{x}) \succeq 0$ on $Q$; if $n = 1$, this reduces to $f''(x) \ge 0$ on $Q$.


Example 16.5. Let $a_1, \ldots, a_n$ and $t_1, \ldots, t_n$ be two sequences of positive numbers such that $t_1 + \cdots + t_n = 1$. Then

(16.16) $$a_1^{t_1}\cdots a_n^{t_n} \le t_1a_1 + \cdots + t_na_n$$

with equality if and only if $a_1 = \cdots = a_n$.

Since $-\ln x$ is a convex function of $x$ on the interval $(0, \infty)$,

$$-\ln(t_1a_1 + \cdots + t_na_n) \le -(t_1\ln a_1 + \cdots + t_n\ln a_n) = -\ln(a_1^{t_1}\cdots a_n^{t_n}),$$

which is easily seen to be equivalent to the inequality (16.16).
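The inequality obtained above from the convexity of $-\ln x$ compares a weighted geometric mean with a weighted arithmetic mean, and it can be spot-checked on random data. The sketch below is an illustration only: the function name `weighted_means`, the random seed and the sample sizes are arbitrary choices that do not come from the text.

```python
import math
import random

def weighted_means(a, t):
    """Return (a_1^{t_1} ... a_n^{t_n}, t_1 a_1 + ... + t_n a_n) for positive a, t."""
    geometric = math.exp(sum(tj * math.log(aj) for aj, tj in zip(a, t)))
    arithmetic = sum(tj * aj for aj, tj in zip(a, t))
    return geometric, arithmetic

random.seed(0)
results = []
for _ in range(1000):
    n = random.randint(2, 6)
    a = [random.uniform(0.1, 10.0) for _ in range(n)]
    w = [random.uniform(0.1, 1.0) for _ in range(n)]
    t = [wj / sum(w) for wj in w]          # positive weights summing to 1
    results.append(weighted_means(a, t))

# Equality case: all a_j equal.
equal_case = weighted_means([3.0, 3.0, 3.0], [0.2, 0.3, 0.5])
```

A random test of this kind proves nothing, but it is a cheap guard against sign or exponent slips when the inequality is rearranged.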

Example 16.6. Let A E 11%a with block decomposition _ [A21 A11 A22] A12 A—

_ a and set f (x) —

With A11 E RPXP, a E RP, A22 6 qq and X E Rq. Then

—“"”“)‘“")=2+€ 0. Then, upon setting

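$\mathbf{x} = V_1\mathbf{u} + V_2\mathbf{w}$,

$$(\nabla f)(\mathbf{x})^T = 2A_{21}\mathbf{a} + 2A_{22}\mathbf{x} = 2A_{21}\mathbf{a} + 2V_1S_1V_1^T(V_1\mathbf{u} + V_2\mathbf{w}) = 2A_{21}\mathbf{a} + 2V_1S_1\mathbf{u}$$

and hence

$$(\nabla f)(\mathbf{x})^T = \mathbf{0} \iff \mathbf{u} = -S_1^{-1}V_1^TA_{21}\mathbf{a},$$

i.e., if and only if

$$\mathbf{x} = \mathbf{x}_0 \overset{\text{def}}{=} -V_1S_1^{-1}V_1^TA_{21}\mathbf{a} + V_2\mathbf{w} = -A_{22}^{\dagger}A_{21}\mathbf{a} + V_2\mathbf{w}.$$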


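Correspondingly, since $\mathcal{N}_{A_{22}} \subseteq \mathcal{N}_{A_{12}}$ by Lemma 12.21, $\mathbf{a}^TA_{12}V_2\mathbf{w} = 0$ and

$$f(\mathbf{x}_0) = \mathbf{a}^T(A_{11} - A_{12}A_{22}^{\dagger}A_{21})\mathbf{a}.$$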

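Exercise 16.9. Show that in the setting of Example 16.6 $f(\mathbf{x})^{1/2}$ is convex. [HINT: To verify the latter, note that

$$f(\mathbf{x}) = \left\langle A\begin{bmatrix} \mathbf{a} \\ \mathbf{x} \end{bmatrix}, \begin{bmatrix} \mathbf{a} \\ \mathbf{x} \end{bmatrix}\right\rangle$$

and identify $\langle A\mathbf{u}, \mathbf{u}\rangle^{1/2}$ as a norm on $\mathbb{R}^n$.]

Exercise 16.10. Show that if $f(\mathbf{x})$ is a convex function of $\mathbf{x}$ on a convex subset $Q$ of $\mathbb{R}^n$ and $f(\mathbf{x}) \ge 0$ when $\mathbf{x} \in Q$, then $f(\mathbf{x})^2$ is also a convex function of $\mathbf{x}$ on $Q$.

Exercise 16.11. Show that in the setting of Example 16.6, $A_{11}$ is invertible and find a geometric interpretation of the Schur complement $A_{22} - A_{21}A_{11}^{-1}A_{12}$.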

16.3. Extremal problems with constraints

In this section we shall consider extremal problems with constraints, using the method of Lagrange multipliers. Let $\mathbf{a}$ be an extreme point of the function $f(x_1, \ldots, x_n)$ when the variables $(x_1, \ldots, x_n)$ are subject to the constraint

$$g(x_1, \ldots, x_n) = 0.$$

Geometrically, this amounts to evaluating $f(x_1, \ldots, x_n)$ at the points of the surface determined by the constraint $g(x_1, \ldots, x_n) = 0$. Thus, for example, if $g(x_1, \ldots, x_n) = x_1^2 + \cdots + x_n^2 - 1$, the surface is a sphere of radius 1.

If $\mathbf{x}(t)$, $-1 \le t \le 1$, is any smooth curve on this surface passing through the point $\mathbf{a}$ with $\mathbf{x}(0) = \mathbf{a}$, then, in view of the formulas

$$g(\mathbf{x}(t)) = 0 \quad\text{and}\quad \frac{d}{dt}g(\mathbf{x}(t)) = (\nabla g)(\mathbf{x}(t))\mathbf{x}'(t) \quad\text{for all } t \in (-1, 1),$$

it follows that

$$0 = \frac{d}{dt}g(\mathbf{x}(t)) = (\nabla g)(\mathbf{x}(t))\mathbf{x}'(t)$$

for all $t \in (-1, 1)$ and hence, in particular, that

$$(\nabla g)(\mathbf{a})\mathbf{x}'(0) = 0.$$

At the same time, since $\mathbf{a}$ is a local extreme point for $f(\mathbf{x})$ we also have

$$0 = \frac{d}{dt}f(\mathbf{x}(t))\Big|_{t=0} = (\nabla f)(\mathbf{a})\mathbf{x}'(0).$$


Thus, if the set of possible vectors $\mathbf{x}'(0)$ fill out an $n - 1$ dimensional space, then

$$(\nabla f)(\mathbf{a}) = \lambda(\nabla g)(\mathbf{a})$$

for some constant $\lambda$. Our next objective is to present a precise version of this argument, with the help of the implicit function theorem.
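The heuristic conclusion $(\nabla f)(\mathbf{a}) = \lambda(\nabla g)(\mathbf{a})$ can be observed in a small planar example. The sketch below is an illustration under assumptions that are not taken from the text: the objective $f(x, y) = x + 2y$ and the constraint $g(x, y) = x^2 + y^2 - 1$ are hypothetical choices, and the constrained maximum is located by a brute-force scan of the unit circle rather than by solving the Lagrange equations.

```python
import math

def f(x, y):
    return x + 2 * y            # hypothetical objective

def grad_f(x, y):
    return (1.0, 2.0)

def grad_g(x, y):
    return (2 * x, 2 * y)       # gradient of g(x, y) = x^2 + y^2 - 1

def constrained_max(samples=200000):
    """Scan the unit circle and return the sample point where f is largest."""
    def value(k):
        th = 2 * math.pi * k / samples
        return f(math.cos(th), math.sin(th))
    best = max(range(samples), key=value)
    th = 2 * math.pi * best / samples
    return math.cos(th), math.sin(th)

x_max, y_max = constrained_max()
fx, fy = grad_f(x_max, y_max)
gx, gy = grad_g(x_max, y_max)
# The two gradients are parallel exactly when this 2 x 2 determinant vanishes.
cross = fx * gy - fy * gx
lagrange_multiplier = fx / gx
```

At the computed maximizer the determinant `cross` is zero up to the resolution of the scan, and the ratio of the gradients gives the multiplier $\lambda$.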

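Theorem 16.8. Let $f(\mathbf{u}) = f(u_1, \ldots, u_n)$, $g_1(\mathbf{u}) = g_1(u_1, \ldots, u_n), \ldots, g_k(\mathbf{u}) = g_k(u_1, \ldots, u_n)$ be real-valued functions in $\mathcal{C}^1(Q)$ for some open set $Q$ in $\mathbb{R}^n$, where $k < n$. Let

$$S = \{(u_1, \ldots, u_n) \in Q : g_j(u_1, \ldots, u_n) = 0 \text{ for } j = 1, \ldots, k\}$$

and assume that:

(1) There exists a point $\mathbf{a} \in S$ and a number $\alpha > 0$ such that the open ball $B_\alpha(\mathbf{a})$ is a subset of $Q$ and either $f(\mathbf{u}) \ge f(\mathbf{a})$ for all $\mathbf{u} \in S \cap B_\alpha(\mathbf{a})$ or $f(\mathbf{u}) \le f(\mathbf{a})$ for all $\mathbf{u} \in S \cap B_\alpha(\mathbf{a})$.

(2) $\operatorname{rank}\begin{bmatrix} (\nabla g_1)(\mathbf{u}) \\ \vdots \\ (\nabla g_k)(\mathbf{u}) \end{bmatrix} = p$ for all points $\mathbf{u}$ in the ball $B_\alpha(\mathbf{a})$.

Then there exists a set of $k$ constants $\lambda_1, \ldots, \lambda_k$ such that

$$(\nabla f)(\mathbf{a}) = \lambda_1(\nabla g_1)(\mathbf{a}) + \cdots + \lambda_k(\nabla g_k)(\mathbf{a}).$$

Proof. The general implicit function theorem guarantees the existence of a pair of constants $\gamma > 0$ and $\delta > 0$ and a permutation matrix $P \in \mathbb{R}^{n\times n}$ such that if

(16.17) $$P\mathbf{u} = \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \ \text{with } \mathbf{x} \in \mathbb{R}^q, \ \mathbf{y} \in \mathbb{R}^p, \ p + q = n \quad\text{and}\quad P\mathbf{a} = \begin{bmatrix} \mathbf{x}_0 \\ \mathbf{y}_0 \end{bmatrix},$$

then for each point $\mathbf{x}$ in the ball $B_\gamma(\mathbf{x}_0)$ there exists exactly one point $\mathbf{y} = \varphi(\mathbf{x})$ in the ball $B_\delta(\mathbf{y}_0)$ such that

$$g_i(\mathbf{u}) = 0 \quad\text{for } i = 1, \ldots, k \quad\text{when}\quad P\mathbf{u} = \begin{bmatrix} \mathbf{x} \\ \varphi(\mathbf{x}) \end{bmatrix}.$$

Moreover, $\varphi \in \mathcal{C}^1(B_\gamma(\mathbf{x}_0))$. Let $\mathbf{x}(t)$, $-1 \le t \le 1$, be a smooth curve in $B_\gamma(\mathbf{x}_0)$ with $\mathbf{x}(0) = \mathbf{x}_0$ and let

$$\mathbf{u}(t) = P^T\begin{bmatrix} \mathbf{x}(t) \\ \varphi(\mathbf{x}(t)) \end{bmatrix}.$$

Then,

(16.18) $$\frac{d}{dt}f(\mathbf{u}(t))\Big|_{t=0} = (\nabla f)(\mathbf{a})\mathbf{u}'(0) = 0$$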


and g,(u(t)) =0

for

— 1 \p_i _

p

p

p

it follows that

det XXT g 19—? = det Xn,

i.e., the maximum under the given constraints is achieved by XOXOT.

Exercise 16.16. Find the maximum value of the function $f(X) = \ln(\det X)^2$ over all matrices $X \in \mathbb{R}^{p\times p}$ with $\operatorname{trace} XX^T = 1$, by the methods developed in Example 16.11.

Exercise 16.17. Show that the problem considered in Exercise 16.16 is equivalent to finding the maximum value of the product $s_1^2\cdots s_p^2$ for a set of real numbers $s_1, \ldots, s_p$ that are subject to the constraint $s_1^2 + \cdots + s_p^2 = 1$. [HINT: Think svd.]

Exercise 16.18. Find the minimum value of the function $f(X) = \operatorname{trace} XX^T$ over all matrices $X \in \mathbb{R}^{p\times p}$ with $\det X = 1$.


Exercise 16.19. Let C 6 RM”, Q = {X E Ra : det X > 0} and let f(X) = 0

for

i=1,...,n

and

Ze$x=1}.

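Show that if $\mathbf{a} \in Q$, $\mathbf{b} \in Q$ and $\mathbf{u} \in \mathbb{R}^n$ are vectors with components $a_1, \ldots, a_n$, $b_1, \ldots, b_n$ and $u_1, \ldots, u_n$, respectively, then

$$\max_{\mathbf{u} \in \mathbb{R}^n}\Big\{\langle\mathbf{u}, \mathbf{a}\rangle - \ln\Big(\sum_{i=1}^{n}e^{u_i}b_i\Big)\Big\} = \sum_{i=1}^{n}a_i\ln\frac{a_i}{b_i}.$$

[HINT: First rewrite the function that is to be maximized as $\sum_{i=1}^{n}a_i\ln\frac{c_i}{b_i}$, where $c_i = e^{u_i}b_i/\{\sum_{s=1}^{n}e^{u_s}b_s\}$ and note that the vector $\mathbf{c}$ with components $c_i$, $i = 1, \ldots, n$, belongs to $Q$. Therefore, the quantity of interest can be identified with

$$\max\sum_{i=1}^{n}a_i\ln\frac{x_i}{b_i}$$

over $\mathbf{x} \in \mathbb{R}^n$ with $x_i > 0$ and subject to $g(\mathbf{x}) = x_1 + \cdots + x_n - 1 =$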


Exercise 16.24. Use Lagrange’s method to find the point on the line of intersection of the two planes 0.1501 + agxg + a3x3 + a0 = 0, (91961 + bgscg +

b3x3 + b0 = O which is nearest to the origin. You may assume that the two planes really intersect, but you should explain where this enters into the calculation. Exercise 16.25. Use Lagrange’s method to find the shortest distance from

the point (0, b) on the y axis to the parabola x2 — 43/ = 0. Exercise 16.26. Use Lagrange’s method to find the maximum value of

$\langle Ax, x\rangle$ subject to the conditions $\langle x, x\rangle = 1$ and $\langle u_1, x\rangle = 0$, where $u_1$ is a nonzero vector in $\mathcal{N}_{(s_1 I_n - A)}$, $s_1$ is the largest singular value of $A$ and

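$A = A^T \in \mathbb{R}^{n\times n}$.

16.5. Krylov subspaces

The $k$'th Krylov subspace of $\mathbb{R}^n$ generated by a nonzero vector $u \in \mathbb{R}^n$ and a matrix $A \in \mathbb{R}^{n\times n}$ is defined by the formula

$$\mathcal{H}_k = \operatorname{span}\{u, Au, \ldots, A^{k-1}u\} \quad\text{for } k = 1, 2, \ldots.$$

Clearly, $\dim \mathcal{H}_1 = 1$ and $\dim \mathcal{H}_k \le k$ for $k = 2, 3, \ldots$.

Lemma 16.12. If $\mathcal{H}_{k+1} = \mathcal{H}_k$ for some positive integer $k$, then $\mathcal{H}_j = \mathcal{H}_k$ for every integer $j \ge k$.

Proof. If $\mathcal{H}_{k+1} = \mathcal{H}_k$ for some positive integer $k$, then $A^k u = c_{k-1}A^{k-1}u + \cdots + c_0 u$ for some choice of coefficients $c_0, \ldots, c_{k-1} \in \mathbb{R}$. Therefore, $A^{k+1}u = c_{k-1}A^k u + \cdots + c_0 Au$, which implies that $\mathcal{H}_{k+2} = \mathcal{H}_{k+1}$ and hence that $\mathcal{H}_{k+2} = \mathcal{H}_k$. The same argument implies that $\mathcal{H}_{k+3} = \mathcal{H}_{k+2} = \mathcal{H}_{k+1}$ and so on down the line, to complete the proof. □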
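The stabilization described in Lemma 16.12 can be observed numerically. The following is a minimal sketch, not taken from the text: the helper names `rank`, `matvec` and `krylov_dims`, and the test matrix, are illustrative assumptions. It computes $\dim \mathcal{H}_k$ exactly with rational arithmetic.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors via exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def krylov_dims(A, u, kmax):
    """Return [dim H_1, ..., dim H_kmax], H_k = span{u, Au, ..., A^(k-1) u}."""
    vectors, dims, v = [], [], list(u)
    for _ in range(kmax):
        vectors.append(v)
        dims.append(rank(vectors))
        v = matvec(A, v)
    return dims

# Hypothetical example: A has only two distinct eigenvalues, so the
# dimensions stop growing at 2 and stay there.
A = [[2, 0, 0], [0, 2, 0], [0, 0, 5]]
u = [1, 1, 1]
dims = krylov_dims(A, u, 5)
```

Once two consecutive dimensions agree, all later ones agree as well, which is exactly the content of the lemma.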

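Exercise 16.27. Let $A \in \mathbb{R}^{n\times n}$, let $u \in \mathbb{R}^n$ and let $k \ge 1$ be a positive integer. Show that if $A \succ O$, then the matrix

$$\begin{bmatrix}
\langle Au, u\rangle & \langle A^2u, u\rangle & \cdots & \langle A^ku, u\rangle \\
\langle Au, Au\rangle & \langle A^2u, Au\rangle & \cdots & \langle A^ku, Au\rangle \\
\vdots & \vdots & & \vdots \\
\langle Au, A^{k-1}u\rangle & \langle A^2u, A^{k-1}u\rangle & \cdots & \langle A^ku, A^{k-1}u\rangle
\end{bmatrix}$$

is invertible if and only if the vectors $u, Au, \ldots, A^{k-1}u$ are linearly independent in $\mathbb{R}^n$.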

16.6. The conjugate gradient method

The method of conjugate gradients is an iterative approach to solving equations of the form $Ax = b$ for matrices $A \in \mathbb{R}^{n\times n}$ when $b \in \mathbb{R}^n$. It is presented in this chapter because the justification of this method is based on solving an appropriately chosen sequence of minimization problems. Let $A \in \mathbb{R}^{n\times n}$, $b \in \mathbb{R}^n$ and

$$\varphi(x) = \tfrac{1}{2}\langle Ax, x\rangle - \langle b, x\rangle \tag{16.25}$$

for every $x \in \mathbb{R}^n$. Then the gradient $(\nabla\varphi)(x)$, written as a column vector, is readily seen to be equal to

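$$(\nabla\varphi)(x) = \begin{bmatrix} \dfrac{\partial\varphi}{\partial x_1}(x) \\ \vdots \\ \dfrac{\partial\varphi}{\partial x_n}(x) \end{bmatrix} = \tfrac{1}{2}(A + A^T)x - b.$$

The notation $x > 0$ (resp., $x \ge 0$) for a vector $x \in \mathbb{R}^n$ means that $x_j > 0$ (resp., $x_j \ge 0$) for every entry $x_j$ of $x$; $u > v$ means $u - v > 0$ etc. and $\mathbb{R}^n_+$ denotes $\{x \in \mathbb{R}^n : x \ge 0\}$.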

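The problems are formulated in terms of a matrix $A \in \mathbb{R}^{p\times q}$, a pair of vectors $b \in \mathbb{R}^p$, $c \in \mathbb{R}^q$ and the sets

$$\mathcal{X} = \{x \in \mathbb{R}^q_+ : Ax \ge b\} \quad\text{and}\quad \mathcal{Y} = \{y \in \mathbb{R}^p_+ : A^Ty \le c\}. \tag{16.33}$$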


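The objective is to minimize $\langle x, c\rangle$ over $x \in \mathcal{X}$ when $\mathcal{X} \ne \emptyset$, or to maximize $\langle b, y\rangle$ over $y \in \mathcal{Y}$ when $\mathcal{Y} \ne \emptyset$. A vector $u \in \mathcal{X}$ is said to be a solution of the primal problem if

$$\langle u, c\rangle \le \langle x, c\rangle \quad\text{for every } x \in \mathcal{X}. \tag{16.34}$$

A vector $v \in \mathcal{Y}$ is said to be a solution of the dual problem if

$$\langle b, v\rangle \ge \langle b, y\rangle \quad\text{for every } y \in \mathcal{Y}. \tag{16.35}$$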

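The symmetry in the sets in (16.33) serves to make the terminology primal and dual reasonable. However, it is more convenient to define the primal and dual problems for the sets

$$\mathcal{X} = \{x \in \mathbb{R}^q : Ax \ge b\} \quad\text{and}\quad \mathcal{Y} = \{y \in \mathbb{R}^p_+ : A^Ty = c\}. \tag{16.36}$$

The next exercise shows how to convert the constraints in sets $\mathcal{X}$ and $\mathcal{Y}$ of the form (16.33) into constraints for sets $\mathcal{X}$ and $\mathcal{Y}$ of the form (16.36).

Exercise 16.39. Show that:

(1) $x \in \mathbb{R}^q_+$ and $Ax \ge b \iff x \in \mathbb{R}^q$ and $\begin{bmatrix} A \\ I_q \end{bmatrix} x \ge \begin{bmatrix} b \\ 0 \end{bmatrix}$.

(2) $y \in \mathbb{R}^p_+$ and $A^Ty \le c \iff y \in \mathbb{R}^p_+$ and there exists a vector $u \in \mathbb{R}^q_+$ such that $\begin{bmatrix} A^T & I_q \end{bmatrix} \begin{bmatrix} y \\ u \end{bmatrix} = c$.

The verification of the next lemma depends in part upon a result in convex analysis that will be established in Chapter 22. It is more convenient to reference ahead than to postpone the proof.

Lemma 16.20. (Farkas' Lemma) If $A \in \mathbb{R}^{p\times q}$, $c \in \mathbb{R}^q$,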

Z={x€Rq: AxZOand(x,c) + (x, ATy} = (x, ATy) = (AX — by) + (by) Z (by), since (Ax — b, y) 2 0, with equality if and only (16.39) is in force. The rest of the proof is divided into steps.

1. If $x_0 \in \mathcal{X}$ is a solution of the primal problem and $c \ne 0$, then the set

$$E = \{i \in \{1, \ldots, p\} : (Ax_0)_i - b_i = 0\} \tag{16.40}$$

is not empty and

$$\{u \in \mathbb{R}^q : (Au)_i \ge 0 \text{ for } i \in E \text{ and } c^Tu < 0\} = \emptyset. \tag{16.41}$$

If $E = \emptyset$, then the first $p$ positive integers all belong to the set

$$F = \{i \in \{1, \ldots, p\} : (Ax_0)_i - b_i > 0\} \tag{16.42}$$

and there exists an $\varepsilon > 0$ such that $(x_0 + u) \in \mathcal{X}$ for every vector $u \in \mathbb{R}^q$ with $\|u\| < \varepsilon$. Since $c \ne 0$, $u$ may be chosen so that $c^Tu < 0$. But then,

$$\langle x_0 + u, c\rangle = \langle x_0, c\rangle + \langle u, c\rangle < \langle x_0, c\rangle,$$


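which contradicts the presumed minimality of $x_0$. Therefore, $E \ne \emptyset$.

Suppose that $E = \{i_1, \ldots, i_k\}$ and let $A_E \in \mathbb{R}^{k\times q}$ (resp., $b_E \in \mathbb{R}^k$) denote the sub matrix of $A$ (resp., sub vector of $b$) with only rows $i_1, \ldots, i_k$. Then clearly $A_Ex_0 = b_E$ and, upon introducing the analogous notation for $F$, $A_Fx_0 > b_F$. Thus, in terms of this notation, the set in (16.41) is equal to

$$Q_E = \{x \in \mathbb{R}^q : A_Ex \ge 0 \text{ and } \langle x, c\rangle < 0\}.$$

If $u \in Q_E$, then, since $A_Eu \ge 0$ and $A_Fx_0 > b_F$, $A(x_0 + \delta u) \ge b$ for all $\delta > 0$ that are sufficiently small, i.e., $x_0 + \delta u \in \mathcal{X}$. But then

$$\langle x_0 + \delta u, c\rangle = \langle x_0, c\rangle + \delta\langle u, c\rangle < \langle x_0, c\rangle,$$

which contradicts the presumed minimality of $x_0$. Thus, $Q_E = \emptyset$, as claimed.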

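2. (1) $\Rightarrow$ (2). If $c = 0$, then the vector $y_0 = 0$ belongs to $\mathcal{Y}$ and satisfies the equality in (2). If $c \ne 0$, then, in view of Step 1, $E = \{i_1, \ldots, i_k\}$ for some $k > 0$ and $Q_E = \emptyset$. Therefore, Farkas' lemma guarantees that there exists a vector $u \in \mathbb{R}^k_+$ such that $A_E^Tu = c$. Thus, $u \ne 0$ and there exists a nonzero vector $y_0 \in \mathcal{Y}$; it is obtained by setting $(y_0)_E = u$ and $(y_0)_F = 0$. The construction implies that $y_0 \in \mathcal{Y}$ and

$$\langle Ax_0 - b, y_0\rangle = \langle A_Ex_0 - b_E, (y_0)_E\rangle + \langle A_Fx_0 - b_F, (y_0)_F\rangle = 0 + 0 = \langle x_0, c\rangle - \langle b, y_0\rangle,$$

which justifies (2).

3. (2) $\Rightarrow$ (1) and (2) $\Rightarrow$ (3). If $x_0 \in \mathcal{X}$, $y_0 \in \mathcal{Y}$, $\langle x_0, c\rangle = \langle b, y_0\rangle$ and $x \in \mathcal{X}$, then

$$\langle x_0, c\rangle - \langle x, c\rangle = \langle b, y_0\rangle - \langle x, A^Ty_0\rangle = \langle b - Ax, y_0\rangle \le 0,$$

i.e., $x_0$ is a solution of the primal problem. Similarly, if (2) holds and $y \in \mathcal{Y}$, then

$$\langle b, y_0\rangle - \langle b, y\rangle = \langle x_0, c\rangle - \langle b, y\rangle = \langle x_0, A^Ty\rangle - \langle b, y\rangle = \langle Ax_0 - b, y\rangle \ge 0,$$

i.e., $y_0$ is a solution of the dual problem.

4. (3) $\Rightarrow$ (2).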

16.8. Linear programming


Suppose first that $y_0 \in \mathcal{Y}$ is a solution of the dual problem with positive entries. Then it is an interior point of $\mathbb{R}^p_+$ and is a local extreme point of the constrained extremal problem

$$\max f(y) \quad\text{subject to the constraints}\quad g_i(y) = 0 \quad\text{for } i = 1, \ldots, q,$$

where

$$f(y) = \langle b, y\rangle \quad\text{and}\quad g_i(y) = (A^Ty - c)_i \quad\text{for } i = 1, \ldots, q.$$

Thus, as $\operatorname{grad} f = b$ and $\operatorname{grad} g_i = Ae_i$, the $i$'th column of $A$, when the gradients are written as column vectors, it follows that $b = Ax_0$ for some vector $x_0 \in \mathbb{R}^q$ and hence that for this vector

$$\langle x_0, c\rangle = \langle x_0, A^Ty_0\rangle = \langle Ax_0, y_0\rangle = \langle b, y_0\rangle;$$

i.e., (2) holds.

Suppose next that $y_0$ is a solution of the dual problem with only $k$ nonzero entries with $1 \le k < p$ and let $A_G \in \mathbb{R}^{k\times q}$ (resp., $(y_0)_G \in \mathbb{R}^k$ and $b_G \in \mathbb{R}^k$) denote the sub matrix of $A$ (resp., sub vectors of $y_0$ and $b$) with only those rows corresponding to the nonzero entries in $y_0$. Then, since

$$\langle b, y_0\rangle = \langle b_G, (y_0)_G\rangle \quad\text{and}\quad (A_G)^T(y_0)_G = c,$$

the analysis in the first part of this step implies that there exists a vector $x_0 \in \mathbb{R}^q$ such that

$$\langle x_0, c\rangle = \langle x_0, (A_G)^T(y_0)_G\rangle = \langle A_Gx_0, (y_0)_G\rangle = \langle b_G, (y_0)_G\rangle = \langle b, y_0\rangle.$$

Thus, (2) is in force for this case also.

Finally, if $y_0 = 0$ is a solution of the dual problem, then $c = A^Ty_0 = 0$. Thus, if $x \in \mathcal{X}$, then $\langle x, c\rangle = 0 = \langle b, y_0\rangle$, i.e., again (2) is in force. □

Exercise 16.42. Find a solution x0 6 X of the primal problem and yo 6 37

0f the dual problem for the sets X and 31 defined in (16.36), when 3

A—[_1

2

_1

1

1],

1

b——[1]

and

2 0—;

and evaluate (x0, c) and (b,y0). Exercise 16.43. Show that if X and 31 are the sets defined in (16.36) for

as. 2.], b=l§l c=lil

then X = (b and supy€y(b, y) = 00.


16.9. Bibliographical notes

Exercise 16.21 is connected with the maximum entropy extensions considered in Section 12.7. The first papers on this problem by J. P. Burg were based on variational methods. Section 16.6 was adapted from the discussion in [61] and [86]. Exercise 16.23 is taken from Ellis [38]. The section on linear programming is adapted mostly from [8].

Chapter 17

Matrix-valued holomorphic functions

like most normally constituted writers Martin had no use for any candid opinion that was not wholly favorable.

Patrick O'Brian [69], p. 162

The main objective of this chapter is to introduce matrix-valued contour integrals of the form

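$$\lim_{\xi \to 0} \frac{f(\lambda + \xi) - f(\lambda)}{\xi} \tag{17.1}$$

exists for every point $\lambda \in \Omega$. This is a very strong constraint because the variable $\xi$ in this difference ratio is complex and the definition requires the limit to be the same regardless of how $\xi$ tends to zero. Thus, for example, if $f(\lambda)$ is analytic in $\Omega$, then the directional derivative

$$(D_\theta f)(\lambda) = \lim_{r \downarrow 0} \frac{f(\lambda + re^{i\theta}) - f(\lambda)}{r} \tag{17.2}$$

exists and

$$f'(\lambda) = e^{-i\theta}(D_\theta f)(\lambda) \tag{17.3}$$

for every angle $\theta$. In particular, because of (17.1),

$$(D_0 f)(\lambda) = e^{-i\pi/2}(D_{\pi/2} f)(\lambda); \tag{17.4}$$

i.e., in more conventional notation

$$\frac{\partial f}{\partial x}(\lambda) = -i\,\frac{\partial f}{\partial y}(\lambda). \tag{17.5}$$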

This last formula can also be obtained by more familiar manipulations by writing $\lambda = x + iy$, $\xi = \alpha + i\beta$, $f(\lambda) = g(x, y) + ih(x, y)$, where $x, y, \alpha, \beta, g(x, y)$ and $h(x, y)$ are all real, and noting that

$$\frac{f(\lambda + \xi) - f(\lambda)}{\xi} = \frac{g(x + \alpha, y + \beta) - g(x, y)}{\alpha + i\beta} + i\,\frac{h(x + \alpha, y + \beta) - h(x, y)}{\alpha + i\beta}. \tag{17.6}$$

Then, since the limit when $\beta = 0$ and $\alpha \to 0$ must agree with the limit when $\alpha = 0$ and $\beta \to 0$, it follows that

$$\frac{\partial g}{\partial x}(x, y) + i\,\frac{\partial h}{\partial x}(x, y) = \frac{1}{i}\left\{\frac{\partial g}{\partial y}(x, y) + i\,\frac{\partial h}{\partial y}(x, y)\right\}, \tag{17.7}$$

which is just another way of writing formula (17.5). Clearly formula (17.7) (and hence also formula (17.5)) is equivalent to the pair of equations

$$\frac{\partial g}{\partial x} = \frac{\partial h}{\partial y} \quad\text{and}\quad \frac{\partial h}{\partial x} = -\frac{\partial g}{\partial y}. \tag{17.8}$$

These are the Cauchy-Riemann equations. But it's easiest to remember them in the form (17.5). There is a converse statement: If (17.8) holds in some open set $\Omega$ and if the indicated partial derivatives are continuous in $\Omega$, then $f(\lambda) = g(x, y) + ih(x, y)$ is holomorphic in $\Omega$. For sharper statements, see e.g. [75].

Theorem 17.1. Let $\Omega$ be an open nonempty subset of $\mathbb{C}$ and let $\mathfrak{h}_\Omega$ denote the set of functions that are holomorphic (i.e., analytic) in $\Omega$. Then:

1 7.1. Differentiation


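(1) $f \in \mathfrak{h}_\Omega \Rightarrow f(\lambda)$ is continuous in $\Omega$.

(2) $f \in \mathfrak{h}_\Omega$, $g \in \mathfrak{h}_\Omega \Rightarrow \alpha f + \beta g \in \mathfrak{h}_\Omega$ for every choice of $\alpha, \beta \in \mathbb{C}$.

(3) $f \in \mathfrak{h}_\Omega$, $g \in \mathfrak{h}_\Omega \Rightarrow fg \in \mathfrak{h}_\Omega$ and $(fg)'(\lambda) = f'(\lambda)g(\lambda) + f(\lambda)g'(\lambda)$.

(4) $f \in \mathfrak{h}_\Omega$, $f(\lambda) \ne 0$ for any point $\lambda \in \Omega \Rightarrow (1/f) \in \mathfrak{h}_\Omega$.

(5) $g \in \mathfrak{h}_\Omega$, $g(\Omega) \subset \Omega_1$, $\Omega_1$ open, $f \in \mathfrak{h}_{\Omega_1} \Rightarrow f \circ g \in \mathfrak{h}_\Omega$.

(6) $f \in \mathfrak{h}_\Omega$, $\alpha \in \Omega \Rightarrow R_\alpha f \in \mathfrak{h}_\Omega$, where

$$(R_\alpha f)(\lambda) = \begin{cases} \dfrac{f(\lambda) - f(\alpha)}{\lambda - \alpha} & \text{if } \lambda \ne \alpha \\[2mm] f'(\alpha) & \text{if } \lambda = \alpha. \end{cases} \tag{17.9}$$

(7) $f \in \mathfrak{h}_\Omega \Rightarrow f' \in \mathfrak{h}_\Omega$.

Proof. Let $\alpha \in \Omega$. Then $\alpha + \xi \in \Omega$ if $|\xi|$ is small enough and

$$|f(\alpha + \xi) - f(\alpha)| = |\xi|\left|\frac{f(\alpha + \xi) - f(\alpha)}{\xi}\right| \le |\xi|\left|\frac{f(\alpha + \xi) - f(\alpha)}{\xi} - f'(\alpha)\right| + |\xi|\,|f'(\alpha)|,$$

which clearly tends to zero as $|\xi| \to 0$. Therefore, (1) holds. The next four items in this list are easy to verify directly from the definition. Items (6) and (7) require more extensive treatment. They will be discussed after we introduce contour integration. □

We turn now to some examples.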

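Example 17.2. The polynomial $p(\lambda) = a_0 + a_1\lambda + \cdots + a_n\lambda^n$ is holomorphic in the whole complex plane $\mathbb{C}$.

Discussion. In view of Items 2 and 3 of Theorem 17.1, it suffices to verify this for polynomials $p_1(\lambda) = a_0 + a_1\lambda$ of degree one. But this is easy:

$$\frac{p_1(\lambda + \xi) - p_1(\lambda)}{\xi} = \frac{a_0 + a_1(\lambda + \xi) - a_0 - a_1\lambda}{\xi} = a_1.$$

Therefore, the powers

$$\lambda^k = \lambda \cdot \lambda \cdots \lambda, \quad k = 1, 2, \ldots, n$$

are holomorphic in $\mathbb{C}$ as is $p(\lambda)$.

Exercise 17.1. Verify directly that

$$\lim_{\xi \to 0} \frac{(\lambda + \xi)^n - \lambda^n}{\xi} = n\lambda^{n-1}$$

for every positive integer $n \ge 2$.

Example 17.3. $f(\lambda) = \dfrac{1}{\lambda - \alpha}$ is analytic in $\mathbb{C} \setminus \{\alpha\}$.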


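Discussion. Clearly

$$\frac{f(\lambda + \xi) - f(\lambda)}{\xi} = \frac{1}{\xi}\left\{\frac{1}{\lambda + \xi - \alpha} - \frac{1}{\lambda - \alpha}\right\} = \frac{1}{\xi}\,\frac{\lambda - \alpha - (\lambda + \xi - \alpha)}{(\lambda + \xi - \alpha)(\lambda - \alpha)} = \frac{-1}{(\lambda + \xi - \alpha)(\lambda - \alpha)} \to \frac{-1}{(\lambda - \alpha)^2} \quad\text{as } \xi \to 0.$$

Example 17.4. $f(\lambda) = \dfrac{(\lambda - \alpha_1)\cdots(\lambda - \alpha_k)}{(\lambda - \beta_1)\cdots(\lambda - \beta_n)}$ is analytic in $\mathbb{C} \setminus \{\beta_1, \ldots, \beta_n\}$.

Discussion. This is immediate from Item 3 of Theorem 17.1 and the preceding two examples.

Example 17.5. $f(\lambda) = e^{\alpha\lambda}$ is analytic in $\mathbb{C}$ for every fixed $\alpha \in \mathbb{C}$ and $f'(\lambda) = \alpha f(\lambda)$.

Discussion. In view of Item 5 of Theorem 17.1, it suffices to verify that $e^\lambda$ is analytic. To this end, write $\lambda = \mu + i\nu$ (with $\mu, \nu \in \mathbb{R}$) so that $f(\lambda) = e^\lambda = e^\mu e^{i\nu} = e^\mu(\cos\nu + i\sin\nu)$ and, consequently,

$$\frac{\partial f}{\partial \mu}(\lambda) = f(\lambda) \quad\text{and}\quad \frac{\partial f}{\partial \nu}(\lambda) = e^\mu(-\sin\nu + i\cos\nu) = if(\lambda).$$

Thus, as the Cauchy-Riemann equations are satisfied, $f(\lambda)$ is analytic in $\mathbb{C}$. A perhaps more satisfying proof can be based on the exhibited formula for $f(\lambda)$ and Taylor's formula with remainder applied to the real valued functions $e^\mu$, $\cos\nu$, $\sin\nu$, to check the existence of the limit in formula (17.1). Yet another approach is to first write

$$\frac{e^{\alpha(\lambda + \xi)} - e^{\alpha\lambda}}{\xi} = e^{\alpha\lambda}\left\{\frac{e^{\alpha\xi} - 1}{\xi}\right\}$$

and then to verify that the term inside the curly brackets tends to $\alpha$ as $\xi$ tends to 0, using the power series expansion for the exponential. It all depends upon what you are willing to assume to begin with.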
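The requirement in (17.1) that the limit be the same from every direction can be probed numerically. The sketch below is illustrative only: the values of `alpha`, `lam`, the step `h` and the sampled angles are arbitrary assumptions. It compares difference quotients of $e^{\alpha\lambda}$ along several directions with $\alpha e^{\alpha\lambda}$, and contrasts this with the non-holomorphic function $\lambda \mapsto \overline{\lambda}$.

```python
import cmath

def difference_quotient(f, lam, xi):
    """The ratio (f(lam + xi) - f(lam)) / xi from formula (17.1)."""
    return (f(lam + xi) - f(lam)) / xi

alpha = 0.5 + 2.0j          # arbitrary fixed complex constant
lam = 1.0 - 0.3j            # arbitrary base point
h = 1e-6                    # small step length
angles = (0.0, 0.7, cmath.pi / 2, 2.5, 4.0)
directions = [cmath.exp(1j * theta) for theta in angles]

# Holomorphic case: every direction gives (nearly) alpha * exp(alpha * lam).
f = lambda z: cmath.exp(alpha * z)
expected = alpha * f(lam)
errors = [abs(difference_quotient(f, lam, h * d) - expected) for d in directions]

# Non-holomorphic case: the quotient of conj(z) equals exp(-2i*theta),
# so it changes with the direction of approach.
g = lambda z: z.conjugate()
quotients = [difference_quotient(g, lam, h * d) for d in directions]
spread = max(abs(q - quotients[0]) for q in quotients)
```

For the exponential the errors are of the order of the step length, whereas for complex conjugation the quotients differ by as much as 2, so no single limit exists.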


17.2. Contour integration

A directed curve (or contour) $\Gamma$ in the complex plane $\mathbb{C}$ is the set of points $\{\gamma(t) : a \le t \le b\}$ traced out by a continuous complex valued function $\gamma(t)$ as $t$ runs from $a$ to $b$. The curve is said to be closed if $\gamma(a) = \gamma(b)$, it is said to be smooth if $\gamma'(t)$ is continuous on $[a, b]$, it is said to be simple if $\gamma$ is one to one on the open interval $a < t < b$; i.e., if $a < t_1, t_2 < b$ and $\gamma(t_1) = \gamma(t_2)$, then $t_1 = t_2$.

The simplest contours are line segments and arcs of circles. Thus, for example, if $\Gamma$ is the horizontal line segment directed from $\alpha_1 + i\beta$ to $\alpha_2 + i\beta$ and $\alpha_1 < \alpha_2$, then we may choose

$$\gamma(t) = t + i\beta, \quad \alpha_1 \le t \le \alpha_2, \quad\text{or}\quad \gamma(t) = \alpha_1 + t(\alpha_2 - \alpha_1) + i\beta, \quad 0 \le t \le 1.$$

The second parametrization is valid even if $\alpha_1 > \alpha_2$. If $\Gamma$ is the vertical line segment directed from $\alpha + i\beta_1$ to $\alpha + i\beta_2$ and $\beta_1 < \beta_2$, then we may choose $\gamma(t) = \alpha + it$, $\beta_1 \le t \le \beta_2$. If $\Gamma$ is a circular arc of radius $R$ directed from $Re^{i\alpha}$ to $Re^{i\beta}$ and $\alpha < \beta$, then we may choose $\gamma(t) = Re^{it}$, $\alpha \le t \le \beta$.

A curve $\Gamma$ is said to be piecewise smooth if it is a finite union of smooth curves, such as a polygon. The contour integral $\int_\Gamma f(\lambda)\,d\lambda$ of a continuous complex valued function that is defined on a smooth curve $\Gamma$ that is parametrized by $\gamma(t)$ is defined by the formula

$$\int_\Gamma f(\lambda)\,d\lambda = \int_a^b f(\gamma(t))\gamma'(t)\,dt. \tag{17.10}$$

The numerical value of the integral depends upon the curve $\Gamma$, but not upon

the particular choice of the (one to one) function $\gamma(t)$ that is used to describe the curve, as the following exercise should help to clarify.

Exercise 17.2. Use the rules of contour integration to calculate the integral (17.10) when $f(\lambda) = \lambda$ and (a) $\gamma(t) = t$ for $1 \le t \le 2$; (b) $\gamma(t) = t^2$ for $1 \le t \le \sqrt{2}$; (c) $\gamma(t) = e^t$ for $0 \le t \le \ln 2$ and (d) $\gamma(t) = 1 + \sin t$ for $0 \le t \le \pi/2$.

Exercise 17.3. Use the rules of contour integration to calculate the integral (17.10) when $f(\lambda) = \lambda$ and $\Gamma$ is the rectangle directed counterclockwise with vertices $-a - ib$, $a - ib$, $a + ib$, $-a + ib$, where $a > 0$ and $b > 0$.
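Definition (17.10) translates directly into a quadrature rule. The following sketch is illustrative and not from the text: the function names `contour_integral` and `segment`, the use of Simpson's rule, and the test function $f(\lambda) = \lambda^2$ (chosen so that the exercises above are left to the reader) are all assumptions. It evaluates the same integral with three different parametrizations of the segment from 1 to 2, and then integrates around a rectangle.

```python
import math

def contour_integral(f, gamma, dgamma, a, b, n=2000):
    """Composite Simpson approximation of int_a^b f(gamma(t)) * gamma'(t) dt."""
    if n % 2:
        n += 1
    h = (b - a) / n
    integrand = lambda t: f(gamma(t)) * dgamma(t)
    total = integrand(a) + integrand(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(a + k * h)
    return total * h / 3

def segment(p, q):
    """Straight segment from p to q, parametrized on 0 <= t <= 1."""
    return (lambda t: p + t * (q - p)), (lambda t: q - p)

f = lambda z: z * z

# Three parametrizations of the same directed segment from 1 to 2.
value_a = contour_integral(f, lambda t: t, lambda t: 1.0, 1.0, 2.0)
value_b = contour_integral(f, lambda t: t * t, lambda t: 2.0 * t, 1.0, math.sqrt(2.0))
value_c = contour_integral(f, math.exp, math.exp, 0.0, math.log(2.0))

# Counterclockwise rectangle with vertices -a-ib, a-ib, a+ib, -a+ib (a=2, b=1).
corners = [-2 - 1j, 2 - 1j, 2 + 1j, -2 + 1j]
rectangle_value = 0j
for p, q in zip(corners, corners[1:] + corners[:1]):
    gamma, dgamma = segment(p, q)
    rectangle_value += contour_integral(f, gamma, dgamma, 0.0, 1.0)
```

All three parametrizations return the same number, $\int_1^2 x^2\,dx = 7/3$, up to quadrature error, and the integral around the closed rectangle is zero to rounding error.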

Exercise 17.4. Repeat the preceding exercise for $f(\lambda) = \lambda^n$, $n$ an integer (positive, zero or negative) and the same curve $\Gamma$.

Theorem 17.6. Let $f(\lambda)$ be holomorphic in some open nonempty set $\Omega$. Let $\Gamma$ be a simple piecewise smooth closed curve in $\Omega$ such that all the points enclosed by $\Gamma$ also belong to $\Omega$. Then

$$\int_\Gamma f(\lambda)\,d\lambda = 0.$$

Discussion.

Consider first the special case when $\Gamma$ is a rectangle with vertices $a_1 + ib_1$, $a_2 + ib_1$, $a_2 + ib_2$, $a_1 + ib_2$, with $a_1 < a_2$, $b_1 < b_2$ and suppose that the curve is directed counterclockwise. Then the integral over the two horizontal segments of $\Gamma$ is equal to

$$\int_{a_1}^{a_2} f(x + ib_1)\,dx - \int_{a_1}^{a_2} f(x + ib_2)\,dx = \int_{a_1}^{a_2}\left\{\int_{b_1}^{b_2} -\frac{\partial f}{\partial y}(x + iy)\,dy\right\}dx,$$

whereas the integral over the vertical segments of $\Gamma$ is equal to

$$\int_{b_1}^{b_2} f(a_2 + iy)\,i\,dy - \int_{b_1}^{b_2} f(a_1 + iy)\,i\,dy = \int_{b_1}^{b_2} i\left\{\int_{a_1}^{a_2} \frac{\partial f}{\partial x}(x + iy)\,dx\right\}dy.$$

Therefore, assuming that it is legitimate to interchange the order of integration on the right-hand side of the first of these two formulas, we see that

$$\int_\Gamma f(\lambda)\,d\lambda = \int_{b_1}^{b_2}\int_{a_1}^{a_2}\left\{-\frac{\partial f}{\partial y}(x + iy) + i\,\frac{\partial f}{\partial x}(x + iy)\right\}dx\,dy = 0,$$

by the Cauchy-Riemann equations (17.5). The conclusion for general $\Gamma$ is obtained by approximating it as in Figure 1. The point is that the sum of the contour integrals over every little box is equal to zero, since the integral around the edges of each box is equal to zero. However, in the sum, each interior edge is integrated over twice, once in each direction. Therefore, the integrals over the interior edges cancel out and you are left with an integral over the outside edges, which is approximately equal to the integral over the curve $\Gamma$ and in fact tends to the integral over this curve as the boxes shrink to zero.


Figure 1

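Theorem 17.7. Let $f(\lambda)$ be holomorphic in some open nonempty set $\Omega$. Let $\Gamma$ be a simple closed piecewise smooth curve in $\Omega$ that is directed counterclockwise such that all the points enclosed by $\Gamma$ also belong to $\Omega$. Then

$$\frac{1}{2\pi i}\int_\Gamma \frac{f(\lambda)}{\lambda - \alpha}\,d\lambda = \begin{cases} f(\alpha) & \text{if } \alpha \text{ is inside } \Gamma \\ 0 & \text{if } \alpha \text{ is outside } \Gamma. \end{cases}$$

Proof.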

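Let $\alpha$ be inside $\Gamma$ and introduce a new curve $\Gamma_1^\delta + \Gamma_2^\delta + \Gamma_3^\delta + \Gamma_4^\delta$, as in Figure 2, such that $\alpha$ is outside this new curve, $\Gamma_3^\delta$ is "most" of a circle of radius $r$ centered at the point $\alpha$ and

$$\frac{1}{2\pi i}\int_\Gamma \frac{f(\lambda)}{\lambda - \alpha}\,d\lambda = \lim_{\delta \to 0}\frac{1}{2\pi i}\int_{\Gamma_1^\delta} \frac{f(\lambda)}{\lambda - \alpha}\,d\lambda.$$

Then, by the construction,

$$\int_{\Gamma_1^\delta} \frac{f(\lambda)}{\lambda - \alpha}\,d\lambda = -\int_{\Gamma_2^\delta} \frac{f(\lambda)}{\lambda - \alpha}\,d\lambda - \int_{\Gamma_3^\delta} \frac{f(\lambda)}{\lambda - \alpha}\,d\lambda - \int_{\Gamma_4^\delta} \frac{f(\lambda)}{\lambda - \alpha}\,d\lambda.$$

Now, as $\delta$ tends to zero, the first and third integrals on the right cancel each other out and the second integral

$$\int_{\Gamma_3^\delta} \frac{f(\lambda)}{\lambda - \alpha}\,d\lambda \to -\int_0^{2\pi} \frac{f(\alpha + re^{i\theta})}{(\alpha + re^{i\theta} - \alpha)}\,ire^{i\theta}\,d\theta$$

as $\delta$ tends to 0. The final formula follows from the fact that

$$\int_0^{2\pi} \frac{f(\alpha + re^{i\theta})}{(\alpha + re^{i\theta} - \alpha)}\,ire^{i\theta}\,d\theta \to 2\pi i f(\alpha)$$

as $r$ tends to 0.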


Figure 2

It remains to consider the case that the point $\alpha$ is outside $\Gamma$. But then the statement is immediate from Theorem 17.6. □
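Both cases of Theorem 17.7 are easy to check numerically on a circle. The sketch below is an illustration under stated assumptions: the function name `cauchy_integral`, the choice $f = \exp$, the circle of radius 2 centered at the origin and the equally spaced trapezoidal rule are not taken from the text.

```python
import cmath

def cauchy_integral(f, alpha, center=0.0, radius=2.0, n=400):
    """Approximate (1/(2*pi*i)) * integral of f(lam)/(lam - alpha) over the
    counterclockwise circle |lam - center| = radius, using n equally spaced
    points of the parametrization gamma(theta) = center + radius*exp(i*theta)."""
    total = 0j
    for k in range(n):
        theta = 2 * cmath.pi * k / n
        w = cmath.exp(1j * theta)
        lam = center + radius * w
        dlam = 1j * radius * w * (2 * cmath.pi / n)   # gamma'(theta) * d(theta)
        total += f(lam) / (lam - alpha) * dlam
    return total / (2j * cmath.pi)

inside_point = 0.3 + 0.4j     # lies inside the circle
outside_point = 3.0           # lies outside the circle

inside_value = cauchy_integral(cmath.exp, inside_point)
outside_value = cauchy_integral(cmath.exp, outside_point)
```

For the point inside the circle the result reproduces $e^{\alpha}$, and for the point outside it is zero, in both cases up to rounding error.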

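Theorem 17.8. Let $\Omega$ be a nonempty open subset of $\mathbb{C}$ and let $f \in \mathfrak{h}_\Omega$. Then $f' \in \mathfrak{h}_\Omega$.

Proof. Let $\omega \in \Omega$ and let $\Gamma = \omega + re^{i\theta}$, $0 \le \theta < 2\pi$, where $r$ is chosen small enough so that $\lambda \in \Omega$ when $|\lambda - \omega| \le r$. Then

$$f(\omega) = \frac{1}{2\pi i}\int_\Gamma \frac{f(\lambda)}{\lambda - \omega}\,d\lambda$$

and

$$\frac{f(\omega + \xi) - f(\omega)}{\xi} = \frac{1}{2\pi i}\int_\Gamma \frac{f(\lambda)}{\xi}\left\{\frac{1}{\lambda - \omega - \xi} - \frac{1}{\lambda - \omega}\right\}d\lambda = \frac{1}{2\pi i}\int_\Gamma \frac{f(\lambda)}{(\lambda - \omega - \xi)(\lambda - \omega)}\,d\lambda \to \frac{1}{2\pi i}\int_\Gamma \frac{f(\lambda)}{(\lambda - \omega)^2}\,d\lambda$$

as $\xi \to 0$. Therefore,

$$f'(\omega) = \frac{1}{2\pi i}\int_\Gamma \frac{f(\lambda)}{(\lambda - \omega)^2}\,d\lambda$$

and hence

$$\frac{f'(\omega + \xi) - f'(\omega)}{\xi} = \frac{1}{2\pi i}\int_\Gamma \frac{f(\lambda)}{\xi}\left\{\frac{1}{(\lambda - \omega - \xi)^2} - \frac{1}{(\lambda - \omega)^2}\right\}d\lambda,$$

which tends to the limit

$$\frac{2}{2\pi i}\int_\Gamma \frac{f(\lambda)}{(\lambda - \omega)^3}\,d\lambda$$

as $\xi$ tends to 0. □

Corollary 17.9. In the setting of Theorem 17.8,

$$\frac{f^{(k)}(\omega)}{k!} = \frac{1}{2\pi i}\int_\Gamma \frac{f(\lambda)}{(\lambda - \omega)^{k+1}}\,d\lambda \quad\text{for } k = 0, 1, \ldots. \tag{17.11}$$

Exercise 17.5. Verify Corollary 17.9.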

Exercise 17.6. Verify assertion (6) in Theorem 17.1.

Theorem 17.10. Let $\Omega$ be a nonempty open subset of $\mathbb{C}$ and let $f(\lambda) = g(\lambda)/h(\lambda)$, where $g \in \mathfrak{h}_\Omega$ and

$$h(\lambda) = (\lambda - \alpha_1)^{k_1} \cdots (\lambda - \alpha_n)^{k_n}$$

is a polynomial with $n$ distinct roots $\alpha_1, \ldots, \alpha_n$. Let $\Gamma$ be a simple closed piecewise smooth curve directed counterclockwise such that $\Gamma$ and all the points inside $\Gamma$ belong to $\Omega$. Suppose further that $\alpha_1, \ldots, \alpha_\ell$ are inside $\Gamma$ and $\alpha_{\ell+1}, \ldots, \alpha_n$ lie outside $\Gamma$. Then

Tim/“A)dA: ZRes(f,a,-), where

Rest,.,)_ggf\= f()\)d —. 27m F]. 2m pj (A — aj) J (kj — 1)! .

1

1

for j = 1, . . . ,6, which coincides with the advertised formula.

17.3. Evaluating integrals by contour integration

Having come so far, it is worth expending a little extra energy to review some evaluations of integrals that emerge as a very neat application of contour integration and also serve as a good introduction to some of the basic formulas of Fourier analysis, which will be the subject of the next section.

Example 17.11. $\displaystyle\int_{-\infty}^{\infty} \frac{1}{x^2 + 1}\,dx = \pi$.

Discussion. Let

$$f(\lambda) = \frac{1}{\lambda^2 + 1} \quad\text{and}\quad g(\lambda) = (\lambda - i)f(\lambda) = \frac{1}{\lambda + i}$$

and let $\Gamma_R$ denote the semicircle of radius $R$ in the upper half plane, including the base $(-R, R)$, directed counterclockwise as depicted in Figure 3. Then

$$\int_{-R}^{R} \frac{1}{x^2 + 1}\,dx = I_R + II_R,$$

where

$$I_R = \int_{\Gamma_R} f(\lambda)\,d\lambda = \int_{\Gamma_R} \frac{g(\lambda)}{\lambda - i}\,d\lambda = 2\pi i\,g(i) = \pi \quad\text{if } R > 1,$$

since $g$ is holomorphic in $\mathbb{C} \setminus \{-i\}$, and $II_R$, minus the integral over the circular arc $C_R = Re^{i\theta}$, $0 \le \theta \le \pi$, tends to zero as $R \uparrow \infty$, since

$$\left|\int_{C_R} f(\lambda)\,d\lambda\right| = \left|\int_0^\pi f(Re^{i\theta})\,iRe^{i\theta}\,d\theta\right| \le \int_0^\pi \frac{1}{R^2 - 1}\,R\,d\theta \quad\text{if } R > 1.$$

Example 17.12. $\displaystyle\int_{-\infty}^{\infty} \frac{e^{itx}}{x^2 + 1}\,dx = \pi e^{-|t|}$ if $t \in \mathbb{R}$.

Discussion.

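Let

$$f(\lambda) = \frac{e^{it\lambda}}{\lambda^2 + 1} \quad\text{and}\quad g(\lambda) = (\lambda - i)f(\lambda) = \frac{e^{it\lambda}}{\lambda + i}$$

and let $\Gamma_R$ denote the contour depicted in Figure 3. Then, since $g$ is holomorphic in $\mathbb{C} \setminus \{-i\}$, the strategy introduced in Example 17.11 yields the evaluations

$$\int_{\Gamma_R} f(\lambda)\,d\lambda = \int_{\Gamma_R} \frac{g(\lambda)}{\lambda - i}\,d\lambda = 2\pi i\,g(i) = \pi e^{-t} \quad\text{if } R > 1$$

and

$$\left|\int_{C_R} f(\lambda)\,d\lambda\right| = \left|\int_0^\pi f(Re^{i\theta})\,iRe^{i\theta}\,d\theta\right| \le \int_0^\pi \frac{e^{-tR\sin\theta}}{R^2 - 1}\,R\,d\theta.$$

Thus, if $t > 0$, then

$$\left|\int_{C_R} f(\lambda)\,d\lambda\right| \le \int_0^\pi \frac{R}{R^2 - 1}\,d\theta,$$

which tends to zero as $R \uparrow \infty$. If $t < 0$, then this bound is no longer valid; however, the given integral may be evaluated by completing the line segment $[-R, R]$ with a semicircle in the lower half plane.

Exercise 17.7. Show that

$$\int_{-\infty}^{\infty} \frac{e^{itx}}{x^2 + 1}\,dx = \pi e^{t} \quad\text{if } t < 0.$$

Example 17.13. $\displaystyle\int_{-\infty}^{\infty} \frac{1 - \cos tx}{x^2}\,dx = \pi|t|$ if $t \in \mathbb{R}$.

Discussion. Let

$$f(\lambda) = \frac{1 - \cos t\lambda}{\lambda^2} = \frac{2 - e^{it\lambda} - e^{-it\lambda}}{2\lambda^2}.$$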

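Then $f$ is holomorphic in $\mathbb{C}$ and, following the strategy of Example 17.11,

$$\int_{-R}^{R} f(x)\,dx = I_R + II_R,$$

where

$$I_R = \int_{\Gamma_R} f(\lambda)\,d\lambda \quad\text{and}\quad II_R = -\int_{C_R} f(\lambda)\,d\lambda = -\int_0^\pi f(Re^{i\theta})\,iRe^{i\theta}\,d\theta.$$

However, this does not lead to any useful conclusions because $II_R$ does not tend to zero as $R \uparrow \infty$ (due to the presence of both $e^{it\lambda}$ and $e^{-it\lambda}$ inside the integral). This is in fact good news because $I_R = 0$. It is tempting to split $f(\lambda)$ into the two pieces as

$$f(\lambda) = f_1(\lambda) + f_2(\lambda) \quad\text{with}\quad f_1(\lambda) = \frac{1 - e^{it\lambda}}{2\lambda^2} \quad\text{and}\quad f_2(\lambda) = \frac{1 - e^{-it\lambda}}{2\lambda^2}$$

and then, if say $t > 0$, to integrate $f_1(\lambda)$ around a contour in the upper half plane and $f_2(\lambda)$ around a contour in the lower half plane. However, this does not work because the integrals

$$\int_{-R}^{R} f_1(x)\,dx \quad\text{and}\quad \int_{-R}^{R} f_2(x)\,dx$$

are not well defined (because of the presence of a pole at zero). This new difficulty is resolved by first noting that

$$\int_{-R}^{R} f(x)\,dx = \int_{L_R} f(\lambda)\,d\lambda,$$

where $L_R$ is the directed path in Figure 5.

Figure 5

Since this path detours the troublesome point $\lambda = 0$,

$$\int_{-R}^{R} f(x)\,dx = \int_{L_R} \{f_1(\lambda) + f_2(\lambda)\}\,d\lambda = \int_{L_R} f_1(\lambda)\,d\lambda + \int_{L_R} f_2(\lambda)\,d\lambda.$$

Now, if $t > 0$, $R > 0$ and $C_R$ depicts the circular arc in Figure 3,

$$\int_{L_R} f_1(\lambda)\,d\lambda = \int_{L_R} f_1(\lambda)\,d\lambda + \int_{C_R} f_1(\lambda)\,d\lambda - \int_{C_R} f_1(\lambda)\,d\lambda = \pi t - \int_0^\pi f_1(Re^{i\theta})\,iRe^{i\theta}\,d\theta \to \pi t \quad\text{as } R \uparrow \infty;$$

whereas, if $D_R$ denotes the circular arc depicted in Figure 4,

$$\int_{L_R} f_2(\lambda)\,d\lambda = \int_{L_R} f_2(\lambda)\,d\lambda + \int_{D_R} f_2(\lambda)\,d\lambda - \int_{D_R} f_2(\lambda)\,d\lambda = 0 + \int_\pi^{2\pi} f_2(Re^{i\theta})\,iRe^{i\theta}\,d\theta \to 0 \quad\text{as } R \uparrow \infty.$$

This completes the evaluation if $t > 0$. The result for $t < 0$ may be obtained by exploiting the fact that $1 - \cos tx$ is an even function of $t$.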

Exercise 17.8. Verify the evaluation of the integral given in the preceding example by exploiting the fact that

$$\int_{-\infty}^{\infty} \frac{1 - \cos tx}{x^2}\,dx = \lim_{\varepsilon \downarrow 0}\int_{-\infty}^{\infty} \frac{1 - \cos tx}{x^2 + \varepsilon^2}\,dx.$$

Example 17.14. $\displaystyle\int_{-\infty}^{\infty} e^{-a(x - ib)^2}\,dx = \int_{-\infty}^{\infty} e^{-ax^2}\,dx$ if $a > 0$ and $b \in \mathbb{R}$.

Figure 6 (rectangular path with vertices $-R$, $R$, $R + ib$, $-R + ib$)

Discussion. Let $\Gamma_R$ denote the counterclockwise rectangular path indicated in Figure 6 for $b > 0$ and let $f(\lambda) = e^{-a\lambda^2}$. Then, since $f(\lambda)$ is holomorphic in the whole complex plane $\mathbb{C}$,

$$0 = \int_{\Gamma_R} f(\lambda)\,d\lambda = \int_{-R}^{R} f(x)\,dx + \int_0^b f(R + iy)\,i\,dy - \int_{-R}^{R} f(x + ib)\,dx - \int_0^b f(-R + iy)\,i\,dy.$$

Next, invoking the bound

$$|\exp\{-a(R + ib)^2\}| = |\exp\{-a(R^2 + 2iRb - b^2)\}| = \exp\{-a(R^2 - b^2)\},$$

one can readily see that the integrals over the vertical segments of the rectangle tend to zero as $R \uparrow \infty$ and hence that

$$\int_{-\infty}^{\infty} e^{-ax^2}\,dx = \int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} f(x + ib)\,dx = \int_{-\infty}^{\infty} e^{-a(x + ib)^2}\,dx,$$

for every choice of $b > 0$. Since the same argument works for $b < 0$, the verification of the asserted formula is complete.

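17.4. A short detour on Fourier analysis

Let

$$\widehat{f}(\mu) = \int_{-\infty}^{\infty} e^{i\mu x}f(x)\,dx \tag{17.12}$$

denote the Fourier transform of $f$ (whenever the integral is meaningful) and let

$$f_{ab}(x) = \begin{cases} 1 & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases} \quad\text{and}\quad t = b - a.$$

Then

$$\widehat{f_{ab}}(\mu) = \int_{-\infty}^{\infty} e^{i\mu x}f_{ab}(x)\,dx = \int_a^b e^{i\mu x}\,dx = \frac{e^{i\mu b} - e^{i\mu a}}{i\mu}$$

and

$$\int_{-\infty}^{\infty} |\widehat{f_{ab}}(\mu)|^2\,d\mu = \int_{-\infty}^{\infty} \left|\frac{e^{i\mu b} - e^{i\mu a}}{i\mu}\right|^2 d\mu = \int_{-\infty}^{\infty} \left|\frac{e^{i\mu t} - 1}{\mu}\right|^2 d\mu = 2\int_{-\infty}^{\infty} \frac{1 - \cos\mu t}{\mu^2}\,d\mu = 2\pi t$$

(by the formula in Example 17.13)

$$= 2\pi\int_{-\infty}^{\infty} |f_{ab}(x)|^2\,dx.$$

Exercise 17.9. Show that if $a < b \le c < d$, then

$$\int_{-\infty}^{\infty} \widehat{f_{cd}}(\mu)\overline{\widehat{f_{ab}}(\mu)}\,d\mu = 0.$$

[HINT: Let

$$g(\mu) = \widehat{f_{cd}}(\mu)\overline{\widehat{f_{ab}}(\mu)} = \left\{\frac{e^{i\mu d} - e^{i\mu c}}{i\mu}\right\}\left\{\frac{e^{-i\mu b} - e^{-i\mu a}}{-i\mu}\right\} = \frac{e^{i\mu(d-b)} - e^{i\mu(c-b)} - e^{i\mu(d-a)} + e^{i\mu(c-a)}}{\mu^2}$$

and exploit the fact that $g(\lambda)$ is holomorphic in $\mathbb{C}$ and the coefficients of $i\mu$ in the exponential terms in $g(\mu)$ are all nonnegative.]

Exercise 17.10. Show that if $0 \le t < 1$, then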

lim RTOO

e—Rsin (’t6 = 0. 0

[HINTz First check that if 0 g 0 S 7r/2, then 0 3 sin 0 g 26/7r.] Exercise 17.11. Show that R

11m i / e—Wfamdu = f.. RToo 271' _ R A

for all points x E R other than a and b. In View of the formulas in Exercises 17.9 and 17.11, it is now easy to check that

713 ) (1-

1' 1 R ‘WA d — IggoglRe f(u)u—f(x)

and

(17.14)

; _Oolf(u)|2du= /_00|f(ar)l2dx


for functions f of the form f($) = cfajbj(x)7

j=1 where a1 < b1 g (12 < b2 g

3 an < bn and c1, ...,cn is any set ofcomplex

numbers. The first formula (17.13) exhibits a way of recovering f (at) from its Fourier transform f (/1) Accordingly, the auxiliary transform

(17-15)

Wit)

1 foo e‘i“‘”9(u)du

=%

—m

(appropriately interpreted) is termed the inverse Fourier transform. The

second formula (17.14) is commonly referred to as the Parseval/Plancherel or Parseval-Plancherel formula. It exhibits the fact that

”(m-m = ”m where

||f||2 = {/_: lf(:v)|2d:c}%, for the class of functions under consideration. However, the conclusion is

valid for the class of f which belong to the space L2 of f such that | f |2 is integrable in the sense of Lebesgue on the line R.

Exercise 17.12. Show that (17.14) holds if and only if

(17.16)

% /_°° flmmdu = /_°° “some

holds for every pair of piecewise constant functions f (x) and g(:c). [HINT: Invoke (8.5).]

The space L2 has the pleasant feature that f E L2 f6 L2. An even pleasanter class for Fourier analysis is the Schwartz class 8 of infinitely

differentiable functions f (x) on R such that lim xj f (k) ()l a: = MOI lim xj f (k) ()l :L' = 0 Wool for every pair of nonnegative integers j and k. A

Exercise 17.13. Show that if f E 8, then its Fourier transform f (A) enjoys the following properties:

(a) (—i/\)jf()\) = ffoooefmf(j)(x)dx for j = 1,2,.... (b) (—z'nmf: ff; eimkmx for k = 1,2,. . .. (c) f6 8.


You may take it as known that if f E 8, then the derivative A

A

flA+0-ffl)

¢_.

.Dxf—;§%-———7?———— can be brought inside the integral that defines the transform.

Exercise 17.14. Show that if f (m) and 9(x) belong to the Schwartz class 8, then the convolution

(f o g> = /_°° m: — y>g(y)dy

(17.17)

belongs to the class 8 and that

(17.18)

Wm) = fond). A

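Exercise 17.15. Show that if $f(x) = e^{-x^2/2}$, then $\widehat{f}(\mu) = e^{-\mu^2/2}\widehat{f}(0)$. [HINT: Exploit the formula that was established in Example 17.14.]

17.5. The Hilbert matrix

The matrix $A_n \in \mathbb{C}^{(n+1)\times(n+1)}$ with $jk$ entry

$$a_{jk} = \frac{1}{j + k + 1} \quad\text{for } j, k = 0, \ldots, n \tag{17.19}$$

is called the Hilbert matrix. It is readily checked that $A_n \succ O$ by setting $f(x) = \sum_{k=0}^n c_kx^k$ and noting that

$$\int_0^1 |f(x)|^2\,dx = \int_0^1 \sum_{j,k=0}^n c_kx^k\,\overline{c_j}\,x^j\,dx = \sum_{j,k=0}^n \frac{\overline{c_j}\,c_k}{j + k + 1} = c^HA_nc,$$

where $c^H = [\overline{c_0} \ \cdots \ \overline{c_n}]$.

Lemma 17.15. The Hilbert matrix $A_n$ defined by (17.19) is subject to the bound

$$\|A_n\| \le \pi \tag{17.20}$$

for every choice of the positive integer $n$.

Proof. Let $g(x) = \sum_{k=0}^n |c_k|x^k$. Then, by the preceding calculations,

$$c^HA_nc = \int_0^1 |f(x)|^2\,dx \le \int_0^1 g(x)^2\,dx \le \int_{-1}^1 g(x)^2\,dx = -i\int_0^\pi g(e^{i\theta})^2e^{i\theta}\,d\theta \le \int_0^\pi |g(e^{i\theta})|^2\,d\theta = \frac{1}{2}\int_{-\pi}^{\pi} |g(e^{i\theta})|^2\,d\theta = \pi\sum_{k=0}^n |c_k|^2 = \pi\|c\|^2.$$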


Thus, the bound (17.20) holds, as claimed. □
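The bound (17.20) can be observed numerically. The following sketch is illustrative and not from the text: the function names `hilbert` and `largest_eigenvalue`, the use of power iteration, and the sampled sizes are assumptions. Since $A_n$ is symmetric and positive definite, its norm equals its largest eigenvalue, and the Rayleigh quotient computed here never exceeds that value.

```python
import math

def hilbert(n):
    """The (n+1) x (n+1) Hilbert matrix with entries 1/(j+k+1), j, k = 0..n."""
    return [[1.0 / (j + k + 1) for k in range(n + 1)] for j in range(n + 1)]

def largest_eigenvalue(A, iterations=200):
    """Power iteration followed by a Rayleigh quotient; for a symmetric
    positive definite matrix this approximates the norm ||A|| from below."""
    x = [1.0] * len(A)
    for _ in range(iterations):
        y = [sum(a * v for a, v in zip(row, x)) for row in A]
        norm_y = math.sqrt(sum(v * v for v in y))
        x = [v / norm_y for v in y]
    Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
    return sum(u * v for u, v in zip(x, Ax))

sizes = (1, 5, 20, 60)
norms = {n: largest_eigenvalue(hilbert(n)) for n in sizes}
```

The computed norms increase with $n$ but stay below $\pi$, in agreement with Lemma 17.15.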

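17.6. Contour integrals of matrix-valued functions

The contour integral

$$\int_\Gamma F(\lambda)\,d\lambda$$

of a $p \times q$ matrix-valued function

$$F(\lambda) = \begin{bmatrix} f_{11}(\lambda) & \cdots & f_{1q}(\lambda) \\ \vdots & & \vdots \\ f_{p1}(\lambda) & \cdots & f_{pq}(\lambda) \end{bmatrix}$$

is defined by the formula

$$\int_\Gamma F(\lambda)\,d\lambda = \begin{bmatrix} a_{11} & \cdots & a_{1q} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pq} \end{bmatrix},$$

where

$$a_{ij} = \int_\Gamma f_{ij}(\lambda)\,d\lambda, \quad i = 1, \ldots, p, \quad j = 1, \ldots, q;$$

i.e., each entry is integrated separately. It is readily checked that

$$\int_\Gamma \{F(\lambda) + G(\lambda)\}\,d\lambda = \int_\Gamma F(\lambda)\,d\lambda + \int_\Gamma G(\lambda)\,d\lambda$$

and that if $B$ and $C$ are appropriately sized constant matrices, then

$$\int_\Gamma BF(\lambda)C\,d\lambda = B\left(\int_\Gamma F(\lambda)\,d\lambda\right)C.$$

Moreover, if $\varphi(\lambda)$ is a scalar valued function and $C \in \mathbb{C}^{p\times q}$, then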

[rcxX/‘dA = (/r

27”. F< AI” — A > ‘ldA = 2 UiXiVi,



where X- _

z_ Proof.

1m.

if fli

is inside

Onai

if flz-

is outside I‘

I‘

This is an easy consequence of formula (17.24) and Lemma 17.17. D

It is readily checked that the sum on the right-hand side of formula (17.25) is a projection:

$$\left(\sum_{i=1}^{\ell} U_iX_iV_i\right)^2 = \sum_{i=1}^{\ell} U_iX_iV_i.$$

Therefore, the integral on the left-hand side of (17.25),

$$P_\Gamma^A = \frac{1}{2\pi i}\int_\Gamma (\lambda I_n - A)^{-1}\,d\lambda, \tag{17.26}$$

is also a projection. It is termed the Riesz projection.

Lemma 17.19. Let $A \in \mathbb{C}^{n\times n}$, let $\det(\lambda I_n - A) = (\lambda - \lambda_1)^{\alpha_1} \cdots (\lambda - \lambda_\ell)^{\alpha_\ell}$, where the points $\lambda_1, \ldots, \lambda_\ell$ are distinct, and let $\Gamma$ be a simple piecewise smooth counterclockwise directed closed curve that does not intersect any of the eigenvalues of $A$. Then

$$\operatorname{rank} P_\Gamma^A = \sum_{i \in G} \alpha_i \quad\text{where}\quad G = \{i : \lambda_i \text{ is inside } \Gamma\}. \tag{17.27}$$

Proof. The conclusion rests on the observation that

$$\operatorname{rank} \sum_{i=1}^{\ell} U_iX_iV_i = \operatorname{rank}\left(U\{\operatorname{diag}\{X_1, \ldots, X_\ell\}\}V\right) = \operatorname{rank}\{\operatorname{diag}\{X_1, \ldots, X_\ell\}\},$$

since $U$ and $V$ are invertible. Therefore, the rank of the indicated sum is equal to the sum of the sizes of the nonzero $X_i$ that intervene in the formula for $P_\Gamma^A$, which agrees with formula (17.27). □
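The Riesz projection (17.26) and the rank formula (17.27) can be illustrated on $2 \times 2$ matrices. The sketch below is not from the text: the names `resolvent`, `riesz_projection` and `matmul`, the test matrices and the circular contour are assumptions, and the rank is read off from the trace, which equals the rank for a projection.

```python
import cmath

def resolvent(A, lam):
    """(lam*I - A)^{-1} for a 2x2 matrix A, via the adjugate formula."""
    a, b = lam - A[0][0], -A[0][1]
    c, d = -A[1][0], lam - A[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def riesz_projection(A, center, radius, n=400):
    """(1/(2*pi*i)) * integral of (lam*I - A)^{-1} over the counterclockwise
    circle |lam - center| = radius, by the equally spaced trapezoidal rule."""
    P = [[0j, 0j], [0j, 0j]]
    for k in range(n):
        w = cmath.exp(2j * cmath.pi * k / n)
        lam = center + radius * w
        dlam = 1j * radius * w * (2 * cmath.pi / n)
        R = resolvent(A, lam)
        for i in range(2):
            for j in range(2):
                P[i][j] += R[i][j] * dlam / (2j * cmath.pi)
    return P

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Eigenvalues 2 and 5; the circle |lam - 2| = 1 encloses only the eigenvalue 2.
A = [[2.0, 1.0], [0.0, 5.0]]
P = riesz_projection(A, center=2.0, radius=1.0)
P_squared = matmul(P, P)
trace_P = P[0][0] + P[1][1]

# A Jordan block: the eigenvalue 2 has algebraic multiplicity 2.
J = [[2.0, 1.0], [0.0, 2.0]]
trace_PJ = sum(riesz_projection(J, center=2.0, radius=1.0)[i][i] for i in range(2))
```

In the first case $P^2 = P$ and the trace is 1; for the Jordan block the trace is 2, the algebraic multiplicity of the enclosed eigenvalue.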

17.7. Continuous dependence of the eigenvalues

In this section we shall use the projection formulas $P_\Gamma^A$ to establish the continuous dependence of the eigenvalues of a matrix $A \in \mathbb{C}^{n\times n}$ on $A$. The strategy is to show that if $B \in \mathbb{C}^{n\times n}$ is sufficiently close to $A$, then $\|P_\Gamma^A - P_\Gamma^B\| < 1$, and hence, by Lemma 9.16, $\operatorname{rank} P_\Gamma^A = \operatorname{rank} P_\Gamma^B$. The precise formulation is given in the next theorem, in which

D€(u)={AEC:|A—,u| 0 such that the open ball

$$B_\delta(A) = \{B \in \mathbb{C}^{p\times q} : \|A - B\| < \delta\}$$

is also a subset of $X$. The meaning of this condition is that if the entries in a matrix $A \in X$ are changed only slightly, then the new perturbed matrix will also belong to the class $X$. This is a significant property in applications and computations because the entries in any matrix that is obtained from data or from a numerical algorithm are only known approximately. The preceding analysis implies that:

(1) $\{A \in \mathbb{C}^{p\times q} : \operatorname{rank} A = \min\{p, q\}\}$ is an open subset of $\mathbb{C}^{p\times q}$.

(2) $\{A \in \mathbb{C}^{p\times q} : \operatorname{rank} A < \min\{p, q\}\}$ is not an open subset of $\mathbb{C}^{p\times q}$.

(3) $\{A \in \mathbb{C}^{n\times n} : \text{with } n \text{ distinct eigenvalues}\}$ is an open subset of $\mathbb{C}^{n\times n}$.

(4) $\{A \in \mathbb{C}^{n\times n} : A \text{ is diagonalizable}\}$ is not an open subset of $\mathbb{C}^{n\times n}$.

The set $\{A \in \mathbb{C}^{n\times n} : \text{with } n \text{ distinct eigenvalues}\}$ is particularly useful because it is both open and dense in $\mathbb{C}^{n\times n}$, thanks to Lemma 17.22. Open dense sets are said to be generic.

Exercise 17.19. Show that $\{A \in \mathbb{C}^{n\times n} : A \text{ is invertible}\}$ is a generic set.

17.9. Spectral radius redux

In this section we shall use the methods of complex analysis to obtain a simple proof of the formula

$$r_\sigma(A) = \lim_{k \uparrow \infty} \|A^k\|^{1/k}$$

for the spectral radius

$$r_\sigma(A) = \max\{|\lambda| : \lambda \in \sigma(A)\}$$

of a matrix $A \in \mathbb{C}^{n\times n}$. Since the inequality

$$r_\sigma(A) \le \|A^k\|^{1/k} \quad\text{for } k = 1, 2, \ldots$$

is easily verified, it suffices to show that

$$\limsup_{k \uparrow \infty} \|A^k\|^{1/k} \le r_\sigma(A). \tag{17.28}$$

Lemma 17.23. If $A \in \mathbb{C}^{n\times n}$ and $\sigma(A)$ belongs to the set of points enclosed by a simple piecewise smooth counterclockwise directed closed curve $\Gamma$, then

$$A^k = \frac{1}{2\pi i}\int_\Gamma \lambda^k(\lambda I_n - A)^{-1}\,d\lambda. \tag{17.29}$$

Proof. Let $g(\lambda) = \lambda^k$; let $\Gamma_r$ denote a circle of radius $r$ centered at zero and directed counterclockwise, and suppose that $r > \|A\|$. Then

$$\frac{1}{2\pi i}\int_{\Gamma_r} \lambda^k(\lambda I_n - A)^{-1}\,d\lambda = \frac{1}{2\pi i}\int_{\Gamma_r} g(\lambda)\sum_{j=0}^{\infty} \frac{A^j}{\lambda^{j+1}}\,d\lambda = \sum_{j=0}^{\infty}\left\{\frac{1}{2\pi i}\int_{\Gamma_r} \frac{g(\lambda)}{\lambda^{j+1}}\,d\lambda\right\}A^j = \sum_{j=0}^{\infty} \frac{g^{(j)}(0)}{j!}A^j = A^k.$$

The assumption $r > \|A\|$ is used to guarantee the uniform convergence of the sum $\sum_{j=0}^{\infty} \lambda^{-j}A^j$ on $\Gamma_r$ and subsequently to justify the interchange in the order of summation and integration. Thus, to this point, we have established formula (17.29) for $\Gamma = \Gamma_r$ and $r > \|A\|$. However, since $\lambda^k(\lambda I_n - A)^{-1}$ is holomorphic in an open set that contains the points between and on the curves $\Gamma$ and $\Gamma_r$, it follows that

$$\frac{1}{2\pi i}\int_\Gamma \lambda^k(\lambda I_n - A)^{-1}\,d\lambda = \frac{1}{2\pi i}\int_{\Gamma_r} \lambda^k(\lambda I_n - A)^{-1}\,d\lambda.$$

□

Corollary 17.24. If $A \in \mathbb{C}^{n\times n}$ and $\sigma(A)$ belongs to the set of points enclosed by a simple piecewise smooth counterclockwise directed closed curve $\Gamma$, then

$$p(A) = \frac{1}{2\pi i}\int_\Gamma p(\lambda)(\lambda I_n - A)^{-1}\,d\lambda \tag{17.30}$$

for every polynomial $p(\lambda)$.
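Formula (17.30) can be tested numerically for a small matrix. The sketch below is an illustration under stated assumptions: the names `resolvent2` and `poly_via_contour`, the matrix, the polynomial $p(\lambda) = \lambda^3 - 2\lambda + 1$ and the circle of radius 10 are not taken from the text.

```python
import cmath

def resolvent2(A, lam):
    """(lam*I - A)^{-1} for a 2x2 matrix A, via the adjugate formula."""
    a, b = lam - A[0][0], -A[0][1]
    c, d = -A[1][0], lam - A[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def poly_via_contour(p, A, radius, n=200):
    """(1/(2*pi*i)) * integral of p(lam) (lam*I - A)^{-1} over the
    counterclockwise circle |lam| = radius (equally spaced trapezoidal rule)."""
    M = [[0j, 0j], [0j, 0j]]
    for k in range(n):
        w = cmath.exp(2j * cmath.pi * k / n)
        lam = radius * w
        dlam = 1j * radius * w * (2 * cmath.pi / n)
        R = resolvent2(A, lam)
        for i in range(2):
            for j in range(2):
                M[i][j] += p(lam) * R[i][j] * dlam / (2j * cmath.pi)
    return M

A = [[1.0, 2.0], [3.0, 4.0]]                 # eigenvalues lie inside |lam| = 10
p = lambda z: z ** 3 - 2 * z + 1
contour_value = poly_via_contour(p, A, radius=10.0)

# Direct evaluation: A^3 = [[37, 54], [81, 118]], so p(A) = A^3 - 2A + I.
direct_value = [[36.0, 50.0], [75.0, 111.0]]
```

The contour integral agrees with the direct evaluation of $p(A)$ to rounding error, provided the circle encloses both eigenvalues of $A$.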

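Theorem 17.25. If $A \in \mathbb{C}^{n\times n}$, then

$$\lim_{k \uparrow \infty} \|A^k\|^{1/k} = r_\sigma(A); \tag{17.31}$$

i.e., the limit exists and is equal to the modulus of the maximum eigenvalue of $A$.

Proof. Fix $\varepsilon > 0$, let $r = r_\sigma(A) + \varepsilon$, and let

$$\gamma_r = \max\{\|(\lambda I_n - A)^{-1}\| : |\lambda| = r\}.$$

Then, by formula (17.29),

$$\|A^k\| = \left\|\frac{1}{2\pi i}\int_0^{2\pi} (re^{i\theta})^k(re^{i\theta}I_n - A)^{-1}\,ire^{i\theta}\,d\theta\right\| \le \frac{1}{2\pi}\int_0^{2\pi} r^k\|(re^{i\theta}I_n - A)^{-1}\|\,r\,d\theta \le r^{k+1}\gamma_r.$$

Thus,

$$\|A^k\|^{1/k} \le r(r\gamma_r)^{1/k}$$

and, as $(r\gamma_r)^{1/k} \to 1$ as $k \uparrow \infty$, there exists a positive integer $N$ such that

$$\|A^k\|^{1/k} \le r_\sigma(A) + 2\varepsilon \quad\text{for every integer } k \ge N.$$

Therefore, since $r_\sigma(A) \le \|A^k\|^{1/k}$ for $k = 1, 2, \ldots$, it follows that

$$0 \le \|A^k\|^{1/k} - r_\sigma(A) \le 2\varepsilon \quad\text{for every integer } k \ge N,$$

which serves to establish formula (17.31). □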
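The convergence asserted in (17.31) can be slow when $A$ is not diagonalizable, as a small experiment shows. The following sketch is illustrative only: the matrix, the helper names `matmul2` and `spectral_norm2`, and the closed-form expression for the largest singular value of a real $2 \times 2$ matrix are assumptions, not taken from the text.

```python
import math

def matmul2(X, Y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_norm2(M):
    """Largest singular value of a real 2x2 matrix M: the eigenvalues of
    M^T M have sum t = trace(M^T M) and product d^2 = det(M)^2."""
    t = sum(M[i][j] ** 2 for i in range(2) for j in range(2))
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = max(t * t - 4.0 * d * d, 0.0)
    return math.sqrt((t + math.sqrt(disc)) / 2.0)

# A Jordan-type block whose only eigenvalue is 0.5, so r_sigma(A) = 0.5.
A = [[0.5, 1.0], [0.0, 0.5]]
spectral_radius = 0.5

roots = {}
power = [[1.0, 0.0], [0.0, 1.0]]
for k in range(1, 201):
    power = matmul2(power, A)
    roots[k] = spectral_norm2(power) ** (1.0 / k)
```

Here $\|A\|$ is larger than 1, yet $\|A^k\|^{1/k}$ decreases toward the spectral radius 0.5 and never drops below it, consistent with the inequality $r_\sigma(A) \le \|A^k\|^{1/k}$ used in the proof.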


Exercise 17.20. Show that if $A \in \mathbb{C}^{n\times n}$ and $\sigma(A)$ belongs to the set of points enclosed by a simple piecewise smooth counterclockwise directed closed curve $\Gamma$, then

(17.32) $\frac{1}{2\pi i}\int_\Gamma e^{\lambda}(\lambda I_n - A)^{-1}\,d\lambda = \sum_{j=0}^\infty \frac{A^j}{j!}.$

Let $A \in \mathbb{C}^{n\times n}$ and let $f(\lambda)$ be holomorphic in an open set $\Omega$ that contains $\sigma(A)$. Then, in view of formulas (17.30) and (17.32) it is reasonable to define

(17.33) $f(A) = \frac{1}{2\pi i}\int_\Gamma f(\lambda)(\lambda I_n - A)^{-1}\,d\lambda,$

where $\Gamma$ is any simple piecewise smooth counterclockwise directed closed curve in $\Omega$ that encloses $\sigma(A)$ such that every point inside $\Gamma$ also belongs to $\Omega$. This definition is independent of the choice of $\Gamma$ and is consistent with the definitions of $f(A)$ considered earlier.

Exercise 17.21. Show that if $A \in \mathbb{C}^{n\times n}$, $f(\lambda)$ and $g(\lambda)$ are holomorphic in an open set $\Omega$ that contains $\sigma(A)$ and $\Gamma$ is any simple piecewise smooth counterclockwise directed closed curve in $\Omega$ that encloses $\sigma(A)$ such that every point inside $\Gamma$ also belongs to $\Omega$, then

(17.34) $f(A)\left\{\frac{1}{2\pi i}\int_\Gamma g(\zeta)(\zeta I_n - A)^{-1}\,d\zeta\right\} = \frac{1}{2\pi i}\int_\Gamma f(\lambda)g(\lambda)(\lambda I_n - A)^{-1}\,d\lambda.$

[HINT: Use the identity

$(\lambda I_n - A)^{-1}(\zeta I_n - A)^{-1} = (\zeta - \lambda)^{-1}\{(\lambda I_n - A)^{-1} - (\zeta I_n - A)^{-1}\}$

to reexpress $2\pi i\,f(A)\,2\pi i\,g(A)$ …]

… 0, then

$A^{1/2} = \frac{1}{2\pi i}\int_\Gamma \sqrt{\lambda}\,(\lambda I_n - A)^{-1}\,d\lambda$

for any simple closed piecewise smooth curve $\Gamma$ in the open right half plane that includes the eigenvalues of $A$ in its interior.

Exercise 17.26. Let $A, B \in \mathbb{C}^{n\times n}$ and suppose that $A \succ B \succ O$. Show that if $0 < t < 1$, then

(17.39) $A^t - B^t = \frac{1}{2\pi i}\int_\Gamma \lambda^t\{(\lambda I_n - A)^{-1} - (\lambda I_n - B)^{-1}\}\,d\lambda,$

where $\Gamma$ indicates the curve in Figure 7, and then, by passing to appropriate limits, obtain the formula

(17.40) $A^t - B^t = \frac{\sin \pi t}{\pi}\int_0^\infty x^t(xI_n + A)^{-1}(A - B)(xI_n + B)^{-1}\,dx.$

Exercise 17.27. Use formula (17.40) to show that if $A, B \in \mathbb{C}^{n\times n}$, then

(17.41) $A \succ B \succ O \Longrightarrow A^t \succ B^t$ for $0 < t < 1$.

… $AX - XB \in \mathbb{C}^{p\times q}$.

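Then

(18.8) $\mathcal{N}_T = \{0_{p\times q}\} \Longleftrightarrow \alpha_i - \beta_j \ne 0$ for $i = 1, \ldots, p$, $j = 1, \ldots, q$.

Proof. Let $Au_i = \alpha_iu_i$ and $B^Tv_j = \beta_jv_j$ for some pair of nonzero vectors $u_i \in \mathbb{C}^p$ and $v_j \in \mathbb{C}^q$ and let $X = u_iv_j^T$. Then the formula

$TX = Au_iv_j^T - u_iv_j^TB = (\alpha_i - \beta_j)u_iv_j^T$

clearly implies that the condition stated in (18.8) is necessary for $\mathcal{N}_T = \{0_{p\times q}\}$.

To prove the sufficiency of this condition, invoke Jordan decompositions $A = UJU^{-1}$ and $B = V\widetilde{J}V^{-1}$ of these matrices. Then, since

$AX - XB = O \Longleftrightarrow UJU^{-1}X - XV\widetilde{J}V^{-1} = O \Longleftrightarrow J(U^{-1}XV) - (U^{-1}XV)\widetilde{J} = O,$

the proof is now completed by setting $Y = U^{-1}XV$ and writing $J$ and $\widetilde{J}$ in block diagonal form in terms of their Jordan cells $J_1, \ldots, J_k$ and $\widetilde{J}_1, \ldots, \widetilde{J}_\ell$, respectively, just as in (18.3). Then, upon expressing $Y$ in compatible block form with blocks $Y_{ij}$, it is readily seen that

(18.9) $JY - Y\widetilde{J} = O \Longleftrightarrow J_iY_{ij} - Y_{ij}\widetilde{J}_j = O$ for $i = 1, \ldots, k$ and $j = 1, \ldots, \ell$.

Thus, if $J_i = \alpha_iI_{p_i} + N$ and $\widetilde{J}_j = \beta_jI_{q_j} + \widetilde{N}$, then

$J_iY_{ij} - Y_{ij}\widetilde{J}_j = (\alpha_iI_{p_i} + N)Y_{ij} - Y_{ij}\widetilde{J}_j = Y_{ij}(\alpha_iI_{q_j} - \widetilde{J}_j) + NY_{ij}.$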

18.2. The Sylvester equation AX — XB = C

However, if $\alpha_i - \beta_j \ne 0$, then the matrix $\alpha_iI_{q_j} - \widetilde{J}_j = (\alpha_i - \beta_j)I_{q_j} - \widetilde{N}$ is invertible, and hence, upon setting

$M = -(\alpha_iI_{q_j} - \widetilde{J}_j)^{-1},$

the equation $J_iY_{ij} - Y_{ij}\widetilde{J}_j = O$ reduces to

$Y_{ij} = NY_{ij}M,$

which iterates to

$Y_{ij} = N^kY_{ij}M^k$ for $k = 2, 3, \ldots$

and hence implies that $Y_{ij} = O$, since $N^k = O$ for large enough $k$. This completes the proof of the sufficiency of the condition $\alpha_i - \beta_j \ne 0$ to insure that $\mathcal{N}_T = \{0_{p\times q}\}$.

□

Theorem 18.6. Let $A \in \mathbb{C}^{p\times p}$, $B \in \mathbb{C}^{q\times q}$ and $C \in \mathbb{C}^{p\times q}$ and let $\alpha_1, \ldots, \alpha_p$ and $\beta_1, \ldots, \beta_q$ denote the eigenvalues of the matrices $A$ and $B$ (repeated according to their algebraic multiplicity), respectively. Then the equation

$AX - XB = C$

has a unique solution $X \in \mathbb{C}^{p\times q}$ if and only if $\alpha_i - \beta_j \ne 0$ for every choice of $i$ and $j$.

Proof. This is an immediate corollary of Theorem 18.5 and the principle of conservation of dimension. The details are left to the reader.

□
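The solvability criterion of Theorem 18.6 can be checked on small examples by rewriting $AX - XB = C$ as a system of $pq$ linear equations in the entries of $X$. The sketch below does this in exact rational arithmetic. The function names and the example matrices are illustrative assumptions, not notation from the text, and the elimination routine is a plain Gauss-Jordan scheme rather than a method recommended for large problems.

```python
from fractions import Fraction

def solve_linear(M, b):
    """Gauss-Jordan elimination over Fractions; returns None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def solve_sylvester(A, B, C):
    """Solve AX - XB = C as a pq x pq linear system; None if not uniquely solvable."""
    p, q = len(A), len(B)
    idx = lambda i, j: i * q + j          # position of x_ij in the unknown vector
    M = [[Fraction(0)] * (p * q) for _ in range(p * q)]
    rhs = [Fraction(0)] * (p * q)
    for i in range(p):
        for j in range(q):
            for k in range(p):
                M[idx(i, j)][idx(k, j)] += A[i][k]   # contribution of (AX)_ij
            for k in range(q):
                M[idx(i, j)][idx(i, k)] -= B[k][j]   # contribution of (XB)_ij
            rhs[idx(i, j)] = Fraction(C[i][j])
    sol = solve_linear(M, rhs)
    if sol is None:
        return None
    return [[sol[idx(i, j)] for j in range(q)] for i in range(p)]

def residual(A, B, X):
    """Return AX - XB."""
    p, q = len(A), len(B)
    return [[sum(A[i][k] * X[k][j] for k in range(p))
             - sum(X[i][k] * B[k][j] for k in range(q))
             for j in range(q)] for i in range(p)]

# Hypothetical data: sigma(A) = {1, 3} and sigma(B) = {4, 5} are disjoint.
A = [[1, 2], [0, 3]]
B = [[4, 0], [1, 5]]
C = [[1, 0], [0, 1]]
X = solve_sylvester(A, B, C)
# B_shared has the eigenvalue 1 in common with A, so no unique solution exists.
B_shared = [[1, 0], [0, 5]]
X_shared = solve_sylvester(A, B_shared, C)
```

In the first case the routine returns a matrix `X` with `residual(A, B, X)` equal to `C`; in the second case the coefficient matrix is singular and the routine returns `None`.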

Exercise 18.7. Complete the proof of Theorem 18.6.

Exercise 18.8. Let $A \in \mathbb{C}^{n\times n}$. Show that the Lyapunov equation

(18.10) $A^HX + XA = Q$

has a unique solution for each choice of $Q \in \mathbb{C}^{n\times n}$ if and only if $\sigma(A) \cap \sigma(-A^H) = \emptyset$.

Lemma 18.7. If $A, Q \in \mathbb{C}^{n\times n}$, and if $\sigma(A) \subset \Pi_-$ and $-Q \succeq O$, then the Lyapunov equation (18.10) has a unique solution $X \in \mathbb{C}^{n\times n}$. Moreover, this solution is positive semidefinite with respect to $\mathbb{C}^n$.

Proof. Since $\sigma(A) \subset \Pi_-$, the matrix

$Z = -\int_0^\infty e^{tA^H}Qe^{tA}\,dt$


18. Matrix equations

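is well defined and is positive semidefinite with respect to $\mathbb{C}^n$. Moreover,

$A^HZ = -\int_0^\infty A^He^{tA^H}Qe^{tA}\,dt = -\int_0^\infty \left(\frac{d}{dt}e^{tA^H}\right)Qe^{tA}\,dt$

$= -\left\{e^{tA^H}Qe^{tA}\Big|_{t=0}^{\infty} - \int_0^\infty e^{tA^H}\frac{d}{dt}\left(Qe^{tA}\right)dt\right\}$

$= Q + \int_0^\infty e^{tA^H}Qe^{tA}A\,dt$

$= Q - ZA.$

Thus, the matrix $Z$ is a solution of the Lyapunov equation (18.10) and hence, as the assumption $\sigma(A) \subset \Pi_-$ implies that $\sigma(A) \cap \sigma(-A^H) = \emptyset$, there is only one, by Exercise 18.8. Therefore, $X = Z$ is positive semidefinite.

□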

A number of refinements of this lemma may be found in [56].

Exercise 18.9. Let $A \in \mathbb{C}^{n\times n}$. Show that if $\sigma(A) \subset \Pi_+$, the open right half plane, then the equation $A^HX + XA = Q$ has a unique solution for every choice of $Q \in \mathbb{C}^{n\times n}$ and that this solution can be expressed as

$X = \int_0^\infty e^{-tA^H}Qe^{-tA}\,dt$

for every choice of $Q \in \mathbb{C}^{n\times n}$. [HINT: Integrate the formula

$A^H\int_0^\infty e^{-tA^H}Qe^{-tA}\,dt = -\int_0^\infty \left(\frac{d}{dt}e^{-tA^H}\right)Qe^{-tA}\,dt$

by parts.]

Exercise 18.10. Show that in the setting of Exercise 18.9, the solution $X$ can also be expressed as

$X = -\frac{1}{2\pi}\int_{-\infty}^{\infty}(i\mu I_n + A^H)^{-1}Q(i\mu I_n - A)^{-1}\,d\mu.$

[HINT: Write $A^HX = -\lim_{R\uparrow\infty}\frac{1}{2\pi}\int_{-R}^{R}(A^H + i\mu I_n - i\mu I_n)\{\cdots\}\,d\mu$; another way is via formula (17.16).]

Exercise 18.11. Let $A = \operatorname{diag}\{A_{11}, A_{22}\}$ be a block diagonal matrix in $\mathbb{C}^{n\times n}$ with $\sigma(A_{11}) \subset \Pi_+$ and $\sigma(A_{22}) \subset \Pi_-$, let $Q \in \mathbb{C}^{n\times n}$ and let $Y \in \mathbb{C}^{n\times n}$ and $Z \in \mathbb{C}^{n\times n}$ be solutions of the Lyapunov equation $A^HX + XA = Q$. Show that if $Y$ and $Z$ are written in block form consistent with the block decomposition of $A$, then $Y_{11} = Z_{11}$ and $Y_{22} = Z_{22}$.

Exercise 18.12. Let $A, Q \in \mathbb{C}^{n\times n}$. Show that if $\sigma(A) \cap i\mathbb{R} = \emptyset$ and if $Y$ and $Z$ are both solutions of the same Lyapunov equation $A^HX + XA = Q$ and $Y - Z \succeq O$, then $Y = Z$.

[HINT: To warm up, suppose first that


$A = \operatorname{diag}\{A_{11}, A_{22}\}$, where $\sigma(A_{11}) \subset \Pi_+$ and $\sigma(A_{22}) \subset \Pi_-$ and consider Exercise 18.11.]

Exercise 18.13. Let $A = \sum_{j=1}^{3} e_je_{j+1}^T = C_0^{(4)}$ and let $T$ denote the linear transformation from $\mathbb{C}^{4\times 4}$ into itself that is defined by the formula $TX = A^HX - XA$. (a) Calculate $\dim \mathcal{N}_T$. (b) Show that a matrix $X \in \mathbb{C}^{4\times 4}$ with entries $x_{ij}$ is a solution of the matrix equation

$A^HX - XA = \begin{bmatrix} 0 & -a & -b & -c\\ a & 0 & 0 & 0\\ b & 0 & 0 & 0\\ c & 0 & 0 & 0\end{bmatrix}$

if and only if $X$ is a Hankel matrix with $x_{11} = a$, $x_{12} = b$ and $x_{13} = c$.

Exercise 18.14. Show that if $A, B, C \in \mathbb{C}^{n\times n}$, then

$X(t) = e^{tA}Ce^{-tB}$

is a solution of the differential equation

$X'(t) = AX(t) - X(t)B$

that meets the initial condition $X(0) = C$.

18.3. AX = XB

Lemma 18.8. Let $A, X, B \in \mathbb{C}^{n\times n}$ and suppose that $AX = XB$. Then there exists a matrix $C \in \mathbb{C}^{n\times n}$ such that $AX = X(B + C)$ and $\sigma(B + C) \subseteq \sigma(A)$.

Proof. If $X$ is invertible, then $\sigma(A) = \sigma(B)$; i.e., the matrix $C = O$ does the trick. Suppose therefore that $X$ is not invertible and that $C_\beta^{(k)}$ is a $k \times k$ Jordan cell in the Jordan decomposition of $B = UJU^{-1}$ such that $\beta \notin \sigma(A)$. Then there exists a set of $k$ linearly independent vectors $u_1, \ldots, u_k$ in $\mathbb{C}^n$ such that

$B[u_1 \cdots u_k] = [u_1 \cdots u_k]C_\beta^{(k)}$

and

$AX[u_1 \cdots u_k] = XB[u_1 \cdots u_k] = X[u_1 \cdots u_k]C_\beta^{(k)};$

i.e.,

$AXu_1 = \beta Xu_1$ and $AXu_j = \beta Xu_j + Xu_{j-1}$ for $j = 2, \ldots, k$.


Since $\beta \notin \sigma(A)$, it is readily checked that $Xu_j = 0$ for $j = 1, \ldots, k$. Thus, if

$B_1 = B + [u_1 \cdots u_k]\left(C_\alpha^{(k)} - C_\beta^{(k)}\right)\begin{bmatrix}\vec{v}_1\\ \vdots\\ \vec{v}_k\end{bmatrix},$

where $\vec{v}_1, \ldots, \vec{v}_k$ are the rows in $V = U^{-1}$ corresponding to the columns $u_1, \ldots, u_k$, and $\alpha \in \sigma(A)$, then $XB_1 = XB = AX$, and the diagonal entry of the block under consideration in the Jordan decomposition of $B_1$ now belongs to $\sigma(A)$ and not to $\sigma(B)$. Moreover, none of the other Jordan blocks in the Jordan decomposition of $B$ are affected by this change. The same procedure can now be applied to change the diagonal entry of any Jordan cell in the Jordan decomposition of $B_1$ from a point that is not in $\sigma(A)$ to a point that is in $\sigma(A)$. The proof is completed by iterating this procedure.

□

Exercise 18.15. Let $A, X, B \in \mathbb{C}^{n\times n}$. Show that if $AX = XB$ and the columns of $V \in \mathbb{C}^{n\times k}$ form a basis for $\mathcal{N}_X$, then there exists a matrix $L \in \mathbb{C}^{k\times n}$ such that $\sigma(B + VL) \subseteq \sigma(A)$.

18.4. Special classes of solutions

Let $A \in \mathbb{C}^{n\times n}$ and let:

• $\mathcal{E}_+(A)$ = the number of zeros of $\det(\lambda I_n - A)$ in $\Pi_+$;
• $\mathcal{E}_-(A)$ = the number of zeros of $\det(\lambda I_n - A)$ in $\Pi_-$;
• $\mathcal{E}_0(A)$ = the number of zeros of $\det(\lambda I_n - A)$ in $i\mathbb{R}$;

counting multiplicities in all three. The triple $(\mathcal{E}_+(A), \mathcal{E}_-(A), \mathcal{E}_0(A))$ is called the inertia of $A$. Since multiplicities are counted,

$\mathcal{E}_+(A) + \mathcal{E}_-(A) + \mathcal{E}_0(A) = n.$

Theorem 18.9. Let $A \in \mathbb{C}^{n\times n}$ and suppose that $\sigma(A) \cap i\mathbb{R} = \emptyset$. Then there exists a Hermitian matrix $G \in \mathbb{C}^{n\times n}$ such that

(1) $A^HG + GA \succ O$.
(2) $\mathcal{E}_+(G) = \mathcal{E}_+(A)$, $\mathcal{E}_-(G) = \mathcal{E}_-(A)$ and $\mathcal{E}_0(G) = \mathcal{E}_0(A) = 0$.

Proof. Sylvester's inertia theorem (which is discussed in Chapter 20) guarantees that (2) is an automatic consequence of (1). Therefore, it is only necessary to verify (1).


Suppose first that $\mathcal{E}_+(A) = p \ge 1$ and $\mathcal{E}_-(A) = q \ge 1$. Then the assumption $\sigma(A) \cap i\mathbb{R} = \emptyset$ guarantees that $p + q = n$ and that $A$ admits a Jordan decomposition $UJU^{-1}$ of the form

$A = U\begin{bmatrix} J_1 & O\\ O & J_2\end{bmatrix}U^{-1}$

with $J_1 \in \mathbb{C}^{p\times p}$, $\sigma(J_1) \subset \Pi_+$, $J_2 \in \mathbb{C}^{q\times q}$ and $\sigma(J_2) \subset \Pi_-$. Let $P_{11} \in \mathbb{C}^{p\times p}$ be positive definite over $\mathbb{C}^p$ and $P_{22} \in \mathbb{C}^{q\times q}$ be positive definite over $\mathbb{C}^q$. Then

$X_{11} = \int_0^\infty e^{-tJ_1^H}P_{11}e^{-tJ_1}\,dt$

is a positive definite solution of the equation

$J_1^HX_{11} + X_{11}J_1 = P_{11}$

and

$X_{22} = -\int_0^\infty e^{tJ_2^H}P_{22}e^{tJ_2}\,dt$

is a negative definite solution of the equation

$J_2^HX_{22} + X_{22}J_2 = P_{22}.$

Let $X = \operatorname{diag}\{X_{11}, X_{22}\}$ and $P = \operatorname{diag}\{P_{11}, P_{22}\}$. Then

$J^HX + XJ = P$

and hence

$(U^H)^{-1}J^HU^H(U^H)^{-1}XU^{-1} + (U^H)^{-1}XU^{-1}UJU^{-1} = (U^H)^{-1}PU^{-1}.$

Thus, the matrix $G = (U^H)^{-1}XU^{-1}$ is a solution of the equation

$A^HG + GA = (U^H)^{-1}PU^{-1} \succ O.$

The cases $p = 0$, $q = n$ and $p = n$, $q = 0$ are left to the reader.

□

Exercise 18.16. Complete the proof of Theorem 18.9 by verifying the cases $p = 0$ and $p = n$.

Exercise 18.17. Find a Hermitian matrix $G \in \mathbb{C}^{2\times 2}$ that fulfills the conditions of Theorem 18.9 when $A = \operatorname{diag}\{1 + i, 1 - i\}$.


18.5. Riccati equations

In this section we shall investigate the existence and uniqueness of solutions $X \in \mathbb{C}^{n\times n}$ to the Riccati equation

(18.11) $A^HX + XA + XRX + Q = O,$

in which $A, R, Q \in \mathbb{C}^{n\times n}$, $R = R^H$ and $Q = Q^H$. This class of equations has important applications. Moreover, the exploration of their properties has the added advantage of serving both as a useful review and a nonartificial application of a number of concepts considered earlier. The study of the Riccati equation (18.11) is intimately connected with the invariant subspaces of the matrix

(18.12) $G = \begin{bmatrix} A & R\\ -Q & -A^H\end{bmatrix},$

which is often referred to as the Hamiltonian matrix in the control theory literature. The first order of business is to verify that the eigenvalues of $G$ are symmetrically distributed with respect to the imaginary axis $i\mathbb{R}$, or, to put it more precisely:

Lemma 18.10. The roots of the polynomial $p(\lambda) = \det(\lambda I_{2n} - G)$ are symmetrically distributed with respect to $i\mathbb{R}$.

Proof. This is a simple consequence of the identity

$SGS^{-1} = -G^H,$

where

(18.13) $S = \begin{bmatrix} O & -I_n\\ I_n & O\end{bmatrix}.$

□

Exercise 18.18. Verify the identity $SGS^{-1} = -G^H$ and the assertion of Lemma 18.10.

If $\sigma(G) \cap i\mathbb{R} = \emptyset$, then Lemma 18.10 guarantees that $G$ admits a Jordan decomposition of the form

(18.14) $G = U\begin{bmatrix} J_1 & O\\ O & J_2\end{bmatrix}U^{-1},$

where $J_1, J_2 \in \mathbb{C}^{n\times n}$, $\sigma(J_1) \subset \Pi_-$ and $\sigma(J_2) \subset \Pi_+$. It turns out that the upper left-hand $n \times n$ corner $X_1$ of the matrix $U$ will play a central role in the subsequent analysis; i.e., upon writing

$U\begin{bmatrix} I_n\\ O\end{bmatrix} = \begin{bmatrix} X_1\\ X_2\end{bmatrix}$ and $\Lambda = J_1$


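so that

(18.15) $G\begin{bmatrix} X_1\\ X_2\end{bmatrix} = \begin{bmatrix} X_1\\ X_2\end{bmatrix}\Lambda$ and $\sigma(\Lambda) \subset \Pi_-,$

the case in which $X_1$ is invertible will be particularly significant.

Lemma 18.11. If $\sigma(G) \cap i\mathbb{R} = \emptyset$ and formula (18.15) is in force for some matrix $\Lambda \in \mathbb{C}^{n\times n}$ (that is not necessarily in Jordan form), then

(18.16) $X_1^HX_2 = X_2^HX_1.$

Proof. Let

$Z = X_2^HX_1 - X_1^HX_2.$

Then

$Z\Lambda = \begin{bmatrix} X_1^H & X_2^H\end{bmatrix}S\begin{bmatrix} X_1\\ X_2\end{bmatrix}\Lambda = \begin{bmatrix} X_1^H & X_2^H\end{bmatrix}SG\begin{bmatrix} X_1\\ X_2\end{bmatrix}$

$= -\begin{bmatrix} X_1^H & X_2^H\end{bmatrix}G^HS\begin{bmatrix} X_1\\ X_2\end{bmatrix} = -\Lambda^H\begin{bmatrix} X_1^H & X_2^H\end{bmatrix}S\begin{bmatrix} X_1\\ X_2\end{bmatrix}$

$= -\Lambda^HZ.$

Consequently, the matrix $Z$ is a solution of the equation

$Z\Lambda + \Lambda^HZ = O.$

However, since $\sigma(\Lambda) \subset \Pi_-$ and hence $\sigma(\Lambda^H) \subset \Pi_-$, it follows from Theorem 18.6 that $Z = O$ is the only solution of the last equation.

□

Theorem 18.12. If $\sigma(G) \cap i\mathbb{R} = \emptyset$ and the matrix $X_1$ in formula (18.15) is invertible, then:

(1) The matrix $X = X_2X_1^{-1}$ is a solution of the Riccati equation (18.11).
(2) $X = X^H$.
(3) $\sigma(A + RX) \subset \Pi_-$.

Proof. If $X_1$ is invertible and $X = X_2X_1^{-1}$, then formula (18.15) implies that

$G\begin{bmatrix} I_n\\ X\end{bmatrix} = \begin{bmatrix} I_n\\ X\end{bmatrix}X_1\Lambda X_1^{-1}$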


and hence, upon filling in the block entries in $G$ and writing this out in detail, that

$A + RX = X_1\Lambda X_1^{-1}$

$-Q - A^HX = X(X_1\Lambda X_1^{-1}).$

Therefore,

$-Q - A^HX = X(A + RX),$

which serves to verify (1). Assertion (2) is immediate from Lemma 18.11, whereas (3) follows from the formula $A + RX = X_1\Lambda X_1^{-1}$ and the fact that $\sigma(\Lambda) \subset \Pi_-$.

□

Theorem 18.12 established conditions that guarantee the existence of a solution $X$ to the Riccati equation (18.11) such that $\sigma(A + RX) \subset \Pi_-$. There is a converse:

Theorem 18.13. If $X = X^H$ is a solution of the Riccati equation (18.11) such that $\sigma(A + RX) \subset \Pi_-$, then $\sigma(G) \cap i\mathbb{R} = \emptyset$ and the matrix $X_1$ in formula (18.15) is invertible.

Proof. If $X$ is a solution of the Riccati equation with the stated properties, then

$G\begin{bmatrix} I_n\\ X\end{bmatrix} = \begin{bmatrix} A & R\\ -Q & -A^H\end{bmatrix}\begin{bmatrix} I_n\\ X\end{bmatrix} = \begin{bmatrix} A + RX\\ -Q - A^HX\end{bmatrix} = \begin{bmatrix} I_n\\ X\end{bmatrix}(A + RX).$

Moreover, upon invoking the Jordan decomposition

$A + RX = PJ_1P^{-1},$

it is readily seen that

$G\begin{bmatrix} I_n\\ X\end{bmatrix}P = \begin{bmatrix} I_n\\ X\end{bmatrix}PJ_1,$

which serves to identify the $n$ dimensional space spanned by the columns of the matrix

$\begin{bmatrix} I_n\\ X\end{bmatrix}P = \begin{bmatrix} P\\ XP\end{bmatrix}$

as the span of the eigenvectors and generalized eigenvectors of the matrix $G$ corresponding to the eigenvalues of $G$ in $\Pi_-$. Thus, in view of Lemma 18.10, $\sigma(G) \cap i\mathbb{R} = \emptyset$. Moreover, $X_1 = P$ is invertible.

□

Theorem 18.14. The Riccati equation (18.11) has at most one solution $X \in \mathbb{C}^{n\times n}$ such that $X = X^H$ and $\sigma(A + RX) \subset \Pi_-$.

Proof. Let $X$ and $Y$ be a pair of Hermitian solutions of the Riccati equation (18.11) such that $\sigma(A + RX) \subset \Pi_-$ and $\sigma(A + RY) \subset \Pi_-$. Then, since

$A^HX + XA + XRX + Q = O$

and

$A^HY + YA + YRY + Q = O,$

it is clear that

$A^H(X - Y) + (X - Y)A + XRX - YRY = O.$

However, as $Y = Y^H$, this last equation can also be reexpressed as

$(A + RY)^H(X - Y) + (X - Y)(A + RX) = O,$

which exhibits $X - Y$ as the solution of an equation of the form $BZ + ZC = O$ with $\sigma(B) \subset \Pi_-$ and $\sigma(C) \subset \Pi_-$. Theorem 18.6 insures that this equation has at most one solution. Thus, as $Z = O_{n\times n}$ is a solution, it is in fact the only solution. Therefore $X = Y$, as claimed.

□

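The next theorem provides conditions under which the constraints imposed on the Hamiltonian matrix $G$ in Theorem 18.12 are satisfied when $R = -BB^H$ and $Q = C^HC$.

Theorem 18.15. Let $A \in \mathbb{C}^{n\times n}$, $B \in \mathbb{C}^{n\times k}$, $C \in \mathbb{C}^{r\times n}$ and suppose that

(18.17) $\operatorname{rank}\begin{bmatrix} A - \lambda I_n\\ C\end{bmatrix} = n$ for every point $\lambda \in i\mathbb{R}$

and

(18.18) $\operatorname{rank}\begin{bmatrix} A - \lambda I_n & B\end{bmatrix} = n$ for every point $\lambda \in \overline{\Pi_+}$.

Then there exists exactly one Hermitian solution $X$ of the Riccati equation

(18.19) $A^HX + XA - XBB^HX + C^HC = O$

such that $\sigma(A - BB^HX) \subset \Pi_-$. Moreover, this solution $X$ is positive semidefinite over $\mathbb{C}^n$, and if $A$, $B$ and $C$ are real matrices, then $X \in \mathbb{R}^{n\times n}$.

Proof. Let

$G = \begin{bmatrix} A & -BB^H\\ -C^HC & -A^H\end{bmatrix}$

and suppose first that (18.17) and (18.18) are in force and that

$\begin{bmatrix} A & -BB^H\\ -C^HC & -A^H\end{bmatrix}\begin{bmatrix} x\\ y\end{bmatrix} = \lambda\begin{bmatrix} x\\ y\end{bmatrix}$

for some choice of $x \in \mathbb{C}^n$, $y \in \mathbb{C}^n$ and $\lambda \in \mathbb{C}$. Then

$(A - \lambda I_n)x = BB^Hy$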


and

$(A^H + \lambda I_n)y = -C^HCx.$

Therefore,

$\langle (A - \lambda I_n)x, y\rangle = \langle BB^Hy, y\rangle = \|B^Hy\|_2^2$

and

$\langle (A + \overline{\lambda}I_n)x, y\rangle = \langle x, (A^H + \lambda I_n)y\rangle = -\langle x, C^HCx\rangle = -\|Cx\|_2^2.$

Thus,

$-(\lambda + \overline{\lambda})\langle x, y\rangle = \|B^Hy\|_2^2 + \|Cx\|_2^2$ …

… $v = 0 \Longrightarrow \mathcal{N}_{X_1} = \{0\} \Longrightarrow X_1$ is invertible.

Thus, in view of Theorems 18.12 and 18.14, there exists exactly one Hermitian solution $X$ of the Riccati equation (18.19) such that $\sigma(A - BB^HX) \subset \Pi_-$. If the matrices $A$, $B$ and $C$ are real, then the matrix $\overline{X}$ is also a Hermitian solution of the Riccati equation (18.19) such that $\sigma(A - BB^H\overline{X}) \subset \Pi_-$. Therefore, in this case, uniqueness forces $\overline{X} = X$, i.e., $X \in \mathbb{R}^{n\times n}$. It remains only to verify that this solution $X$ is positive semidefinite with respect to $\mathbb{C}^n$. To this end, it is convenient to reexpress the Riccati equation

$A^HX + XA - XBB^HX + C^HC = O$

as

$(A - BB^HX)^HX + X(A - BB^HX) = -C^HC - XBB^HX,$

which is of the form

$A_1^HX + XA_1 = Q,$

where $\sigma(A_1) \subset \Pi_-$ and $-Q \succeq O$. The desired result then follows by invoking Lemma 18.7.

□
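The construction behind Theorems 18.12 and 18.15 can be followed by hand when $n = 1$. The sketch below treats real scalars $a$, $b$, $c$, for which (18.19) reads $2aX - b^2X^2 + c^2 = 0$ and the Hamiltonian matrix is the $2 \times 2$ matrix with rows $(a, -b^2)$ and $(-c^2, -a)$. It is only a scalar illustration: the function name is an assumption, and the sketch restricts itself to $b \ne 0$ (with $a$ and $c$ not both zero), which is sufficient but not necessary for the rank conditions (18.17) and (18.18).

```python
import math

def scalar_riccati_stabilizing(a, b, c):
    """Stabilizing solution of 2*a*X - (b*X)**2 + c**2 = 0 via the Hamiltonian matrix.

    G = [[a, -b*b], [-c*c, -a]] has eigenvalues +mu and -mu, where
    mu = sqrt(a*a + b*b*c*c).  An eigenvector for -mu is (x1, x2) = (b*b, a + mu),
    and X = x2 / x1 plays the role of X_2 X_1^{-1} in Theorem 18.12.
    """
    if b == 0 or (a == 0 and c == 0):
        raise ValueError("this sketch assumes b != 0 and (a, c) != (0, 0)")
    mu = math.sqrt(a * a + b * b * c * c)
    x1, x2 = b * b, a + mu
    return x2 / x1

# Hypothetical data: a = b = c = 1; the solution is then 1 + sqrt(2).
a, b, c = 1.0, 1.0, 1.0
X = scalar_riccati_stabilizing(a, b, c)
riccati_residual = 2 * a * X - (b * X) ** 2 + c ** 2
closed_loop = a - b * b * X      # scalar analogue of A - B B^H X
```

For these values `riccati_residual` vanishes up to rounding, `closed_loop` is negative and `X` is nonnegative, which matches the three conclusions of Theorem 18.15 in the scalar case.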

Exercise 18.19. Let $A \in \mathbb{C}^{n\times n}$, $B \in \mathbb{C}^{n\times k}$. Show that if $\sigma(A) \cap i\mathbb{R} = \emptyset$ and (18.18) holds, then there exists exactly one Hermitian solution $X$ of the Riccati equation $A^HX + XA - XBB^HX = O$ such that $\sigma(A - BB^HX) \subset \Pi_-$.

For future applications, it will be convenient to have another variant of Theorem 18.15.

Theorem 18.16. Let $A \in \mathbb{C}^{n\times n}$, $B \in \mathbb{C}^{n\times k}$, $Q \in \mathbb{C}^{n\times n}$, $R \in \mathbb{C}^{k\times k}$; and suppose that $Q \succeq O$, $R \succ O$,

(18.20) $\operatorname{rank}\begin{bmatrix} A - \lambda I_n\\ Q\end{bmatrix} = n$ for every point $\lambda \in i\mathbb{R}$

and (18.18) holds. Then there exists exactly one Hermitian solution $X$ of the Riccati equation

$A^HX + XA - XBR^{-1}B^HX + Q = O$

such that $\sigma(A - BR^{-1}B^HX) \subset \Pi_-$. Moreover, this solution $X$ is positive semidefinite over $\mathbb{C}^n$, and if $A$, $B$, $Q$ and $R$ are real matrices, then $X \in \mathbb{R}^{n\times n}$.

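Proof. Since $Q \succeq O$, there exists a matrix $C \in \mathbb{C}^{r\times n}$ such that $C^HC = Q$ and $\operatorname{rank} C = \operatorname{rank} Q = r$. Thus, upon setting $B_1 = BR^{-1/2}$, we see that

$\begin{bmatrix} A & -BR^{-1}B^H\\ -Q & -A^H\end{bmatrix} = \begin{bmatrix} A & -B_1B_1^H\\ -C^HC & -A^H\end{bmatrix}$

is of the form considered in Theorem 18.15. Moreover, since

$\begin{bmatrix} A - \lambda I_n\\ C\end{bmatrix}u = 0 \Longleftrightarrow \begin{bmatrix} A - \lambda I_n\\ Q\end{bmatrix}u = 0,$

condition (18.20) implies that

$\operatorname{rank}\begin{bmatrix} A - \lambda I_n\\ C\end{bmatrix} = n$ for every point $\lambda \in i\mathbb{R}$.

Furthermore, as

$\operatorname{rank}\begin{bmatrix} A - \lambda I_n & B\end{bmatrix} = \operatorname{rank}\begin{bmatrix} A - \lambda I_n & BR^{-1/2}\end{bmatrix},$

assumption (18.18) guarantees that

$\operatorname{rank}\begin{bmatrix} A - \lambda I_n & B_1\end{bmatrix} = n$ for every point $\lambda \in \overline{\Pi_+}$.

The asserted conclusion now follows immediately from Theorem 18.15.

□

18.6. Two lemmas

The two lemmas in this section are prepared for use in the next section.

Lemma 18.17. Let $A, Q \in \mathbb{C}^{n\times n}$, $B, L \in \mathbb{C}^{n\times k}$, $R \in \mathbb{C}^{k\times k}$,

$E = \begin{bmatrix} Q & L\\ L^H & R\end{bmatrix}$

and suppose that $E \succeq O$, $R \succ O$ and that

(18.21) $\operatorname{rank} E = \operatorname{rank} Q + \operatorname{rank} R.$

Then the formulas

(18.22) $\operatorname{rank}\begin{bmatrix} \widetilde{A} - \lambda I_n\\ \widetilde{Q}\end{bmatrix} = \operatorname{rank}\begin{bmatrix} A - \lambda I_n\\ Q\end{bmatrix}$

and

(18.23) $\operatorname{rank}\begin{bmatrix} \widetilde{A} - \lambda I_n & B\end{bmatrix} = \operatorname{rank}\begin{bmatrix} A - \lambda I_n & B\end{bmatrix}$

are valid for the matrices

(18.24) $\widetilde{A} = A - BR^{-1}L^H$ and $\widetilde{Q} = Q - LR^{-1}L^H$

and every point $\lambda \in \mathbb{C}$.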

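Proof. The formula

$\begin{bmatrix} Q & L\\ L^H & R\end{bmatrix} = \begin{bmatrix} I_n & LR^{-1}\\ O & I_k\end{bmatrix}\begin{bmatrix} \widetilde{Q} & O\\ O & R\end{bmatrix}\begin{bmatrix} I_n & O\\ R^{-1}L^H & I_k\end{bmatrix}$

implies that

$\operatorname{rank} E = \operatorname{rank}\widetilde{Q} + \operatorname{rank} R$ and $\widetilde{Q} \succeq O$.

Thus, in view of assumption (18.21), $\operatorname{rank}\widetilde{Q} = \operatorname{rank} Q$ and, since $Q = \widetilde{Q} + LR^{-1}L^H$ is the sum of two positive semidefinite matrices,

$\mathcal{N}_Q = \mathcal{N}_{\widetilde{Q}} \cap \mathcal{N}_{L^H} \subseteq \mathcal{N}_{\widetilde{Q}}.$

However, since

$\operatorname{rank}\widetilde{Q} = \operatorname{rank} Q \Longrightarrow \dim\mathcal{N}_Q = \dim\mathcal{N}_{\widetilde{Q}},$

the last inclusion is in fact an equality:

$\mathcal{N}_Q = \mathcal{N}_{\widetilde{Q}}$ and $\mathcal{N}_{\widetilde{Q}} \subseteq \mathcal{N}_{L^H}$

and hence,

$\begin{bmatrix} \widetilde{A} - \lambda I_n\\ \widetilde{Q}\end{bmatrix}u = 0 \Longleftrightarrow \begin{bmatrix} A - \lambda I_n\\ Q\end{bmatrix}u = 0.$

The conclusion (18.22) now follows easily from the principle of conservation of dimension. The second conclusion (18.23) is immediate from the identity

$\begin{bmatrix} \widetilde{A} - \lambda I_n & B\end{bmatrix} = \begin{bmatrix} A - \lambda I_n & B\end{bmatrix}\begin{bmatrix} I_n & O\\ -R^{-1}L^H & I_k\end{bmatrix}.$

□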

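Lemma 18.18. Assume that the matrices $A$, $\widetilde{A}$, $Q$, $\widetilde{Q}$, $B$, $L$, $R$ and $E$ are as in Lemma 18.17 and that (18.21), (18.20) and (18.18) are in force. Then there exists exactly one Hermitian solution $X \in \mathbb{C}^{n\times n}$ of the Riccati equation

(18.25) $\widetilde{A}^HX + X\widetilde{A} - XBR^{-1}B^HX + \widetilde{Q} = O$

such that $\sigma(\widetilde{A} - BR^{-1}B^HX) \subset \Pi_-$. Moreover, this solution $X$ is positive semidefinite over $\mathbb{C}^n$, and if the matrices $A$, $B$, $Q$, $L$ and $R$ are real, then $X \in \mathbb{R}^{n\times n}$.

Proof. Under the given assumptions, Lemma 18.17 guarantees that

$\operatorname{rank}\begin{bmatrix} \widetilde{A} - \lambda I_n\\ \widetilde{Q}\end{bmatrix} = n$ for every point $\lambda \in i\mathbb{R}$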


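and

$\operatorname{rank}\begin{bmatrix} \widetilde{A} - \lambda I_n & B\end{bmatrix} = n$ for every point $\lambda \in \overline{\Pi_+}$.

Therefore, Theorem 18.16 is applicable with $\widetilde{A}$ in place of $A$ and $\widetilde{Q}$ in place of $Q$.

□

18.7. An LQR problem

Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{n\times k}$ and let

$x(t) = e^{tA}x(0) + \int_0^t e^{(t-s)A}Bu(s)\,ds, \quad 0 \le t < \infty,$

be the solution of the first-order vector system of equations

$x'(t) = Ax(t) + Bu(t), \quad t \ge 0,$

in which the vector $x(0) \in \mathbb{R}^n$ and the vector-valued function $u(t) \in \mathbb{R}^k$, $t \ge 0$, are specified. The LQR (linear quadratic regulator) problem in control engineering is to choose $u$ to minimize the value of the integral

(18.26) $Z(t) = \int_0^t \begin{bmatrix} x^T(s) & u^T(s)\end{bmatrix}\begin{bmatrix} Q & L\\ L^T & R\end{bmatrix}\begin{bmatrix} x(s)\\ u(s)\end{bmatrix}ds$

when

$\begin{bmatrix} Q & L\\ L^T & R\end{bmatrix} \succeq O,$

$Q = Q^T \in \mathbb{R}^{n\times n}$, $L \in \mathbb{R}^{n\times k}$, $R = R^T \in \mathbb{R}^{k\times k}$ and $R$ is assumed to be invertible. The first step in the analysis of this problem is to reexpress it in simpler form by invoking the Schur complement formula:

$\begin{bmatrix} Q & L\\ L^T & R\end{bmatrix} = \begin{bmatrix} I_n & LR^{-1}\\ O & I_k\end{bmatrix}\begin{bmatrix} Q - LR^{-1}L^T & O\\ O & R\end{bmatrix}\begin{bmatrix} I_n & O\\ R^{-1}L^T & I_k\end{bmatrix}.$

Then, upon setting

$\widetilde{A} = A - BR^{-1}L^T, \quad \widetilde{Q} = Q - LR^{-1}L^T$

and

$v(s) = R^{-1}L^Tx(s) + u(s),$

the integral (18.26) can be reexpressed more conveniently as

(18.27) $Z(t) = \int_0^t \begin{bmatrix} x^T(s) & v^T(s)\end{bmatrix}\begin{bmatrix} \widetilde{Q} & O\\ O & R\end{bmatrix}\begin{bmatrix} x(s)\\ v(s)\end{bmatrix}ds,$

where the vectors $x(s)$ and $v(s)$ are linked by the equation

(18.28) $x'(s) = \widetilde{A}x(s) + Bv(s),$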


i.e.,

(18.29) $x(t) = e^{t\widetilde{A}}x(0) + \int_0^t e^{(t-s)\widetilde{A}}Bv(s)\,ds.$

Lemma 18.19. Let $X$ be the unique Hermitian solution of the Riccati equation (18.25) such that $\sigma(\widetilde{A} - BR^{-1}B^TX) \subset \Pi_-$. Then $X \in \mathbb{R}^{n\times n}$ and

(18.30) $Z(t) = x(0)^TXx(0) - x(t)^TXx(t) + \int_0^t \|R^{-1/2}(B^TXx(s) + Rv(s))\|^2\,ds.$

Proof. Let $\varphi(s) = x(s)^TXx(s)$. Then

$\varphi'(s) = x'(s)^TXx(s) + x(s)^TXx'(s)$

$= (\widetilde{A}x(s) + Bv(s))^TXx(s) + x(s)^TX(\widetilde{A}x(s) + Bv(s))$

$= x(s)^T(\widetilde{A}^TX + X\widetilde{A})x(s) + v(s)^TB^TXx(s) + x(s)^TXBv(s)$

$= x(s)^T(XBR^{-1}B^TX - \widetilde{Q})x(s) + v(s)^TB^TXx(s) + x(s)^TXBv(s)$

$= (x(s)^TXB + v(s)^TR)R^{-1}(B^TXx(s) + Rv(s)) - x(s)^T\widetilde{Q}x(s) - v(s)^TRv(s).$

Therefore,

$Z(t) = \int_0^t \{x(s)^T\widetilde{Q}x(s) + v(s)^TRv(s)\}\,ds$

$= -\int_0^t \frac{d}{ds}\{x(s)^TXx(s)\}\,ds + \int_0^t (B^TXx(s) + Rv(s))^TR^{-1}(B^TXx(s) + Rv(s))\,ds,$

which is the same as the advertised formula.

□

Theorem 18.20. If the hypotheses of Lemma 18.18 are met and if $X \in \mathbb{R}^{n\times n}$ is the unique Hermitian solution of the Riccati equation (18.25) with $\sigma(\widetilde{A} - BR^{-1}B^TX) \subset \Pi_-$, then:

(1) $Z(t) \ge x(0)^TXx(0) - x(t)^TXx(t)$ with equality if $v(s) = -R^{-1}B^TXx(s)$ for $0 \le s \le t$.

(2) If $v(s) = -R^{-1}B^TXx(s)$ for $0 \le s < \infty$, then $Z(\infty) = x(0)^TXx(0)$.

Proof. The first assertion is immediate from formula (18.30). Moreover, if $v(s)$ is chosen as specified in assertion (2), then $x(t)$ is a solution of the


vector differential equation

$x'(t) = (\widetilde{A} - BR^{-1}B^TX)x(t),$

and hence, as $\sigma(\widetilde{A} - BR^{-1}B^TX) \subset \Pi_-$, it is readily checked that $x(t) \to 0$ as $t \uparrow \infty$.

□
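Theorem 18.20 can be checked in closed form in the scalar case with no cross term ($n = k = 1$, $L = 0$), where the cost is $\int_0^\infty (q\,x(s)^2 + r\,v(s)^2)\,ds$ subject to $x' = ax + bv$. The sketch below is only an illustration under these assumptions: the function names and numerical data are hypothetical, and only linear feedbacks $v(s) = -\gamma x(s)$ are compared, for which $x(s) = x(0)e^{(a - b\gamma)s}$ and the integral can be evaluated exactly.

```python
import math

def lqr_scalar(a, b, q, r):
    """Stabilizing solution X of 2*a*X - (b*b/r)*X*X + q = 0 and the optimal gain."""
    root = math.sqrt(a * a + b * b * q / r)
    X = r * (a + root) / (b * b)
    gain = b * X / r                 # scalar form of R^{-1} B^T X
    return X, gain

def cost_of_feedback(a, b, q, r, gain, x0):
    """Infinite-horizon cost of v(s) = -gain * x(s); requires a - b*gain < 0."""
    rate = a - b * gain
    if rate >= 0:
        raise ValueError("feedback is not stabilizing")
    # x(s) = x0 * exp(rate * s), so the integrand equals (q + r*gain^2) * x(s)^2
    return (q + r * gain * gain) * x0 * x0 / (-2.0 * rate)

# Hypothetical data
a, b, q, r, x0 = 1.0, 2.0, 3.0, 1.0, 1.5
X, best_gain = lqr_scalar(a, b, q, r)
optimal_cost = cost_of_feedback(a, b, q, r, best_gain, x0)
other_costs = [cost_of_feedback(a, b, q, r, g, x0) for g in (1.0, 5.0)]
```

With these numbers `optimal_cost` agrees with `X * x0 * x0`, as assertion (2) predicts, and the two other stabilizing gains give strictly larger costs, consistent with assertion (1).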

18.8. Bibliographical notes

The discussion of Riccati equations and the LQR problem was partially adapted from the monograph [92], which is an excellent source of supplementary information on both of these topics. The monograph [55] is recommended for more advanced studies.

Chapter 19

Realization theory

was so upset when her mother married one that she took to mathematics and Hebrew directly though she was the prettiest girl for miles around...

Patrick O’Brian [68], p. 122

A function $f(\lambda)$ is said to be rational if it can be expressed as the ratio of two polynomials:

(19.1) $f(\lambda) = \dfrac{\alpha_0 + \alpha_1\lambda + \cdots + \alpha_k\lambda^k}{\beta_0 + \beta_1\lambda + \cdots + \beta_n\lambda^n}.$

A rational function $f(\lambda)$ is said to be proper if $f(\lambda)$ tends to a finite limit as $\lambda \to \infty$; it is said to be strictly proper if $f(\lambda) \to 0$ as $\lambda \to \infty$. If $\alpha_k \ne 0$, $\beta_n \ne 0$ and the numerator and denominator have no common factors, then the degree of $f(\lambda)$ is defined by the rule

(19.2) $\deg f(\lambda) = \max\{k, n\}.$

In this case $f(\lambda)$ is proper if $n \ge k$. A $p \times q$ mvf (matrix-valued function)

$F(\lambda) = \begin{bmatrix} f_{11}(\lambda) & \cdots & f_{1q}(\lambda)\\ \vdots & & \vdots\\ f_{p1}(\lambda) & \cdots & f_{pq}(\lambda)\end{bmatrix}$

is said to be rational if each of its entries is rational. It is said to be proper if each of its entries is proper or, equivalently, if $F(\lambda)$ tends to a finite limit as $\lambda \to \infty$ and strictly proper if $F(\lambda) \to O$ as $\lambda \to \infty$.

Theorem 19.1. Let $F(\lambda)$ be a $p \times q$ rational mvf that is proper. Then there exists an integer $n > 0$ and matrices $D \in \mathbb{C}^{p\times q}$, $C \in \mathbb{C}^{p\times n}$, $A \in \mathbb{C}^{n\times n}$ and


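$B \in \mathbb{C}^{n\times q}$ such that

(19.3) $F(\lambda) = D + C(\lambda I_n - A)^{-1}B.$

Proof. Let us suppose first that

(19.4) $F(\lambda) = \dfrac{X_k}{(\lambda - \omega)^k} + \cdots + \dfrac{X_1}{\lambda - \omega} + X_0,$

where the $X_j$ are constant $p \times q$ matrices. Let $A$ denote the $kp \times kp$ block Jordan cell of the form

$A = \begin{bmatrix} \omega I_p & I_p & O & \cdots & O\\ O & \omega I_p & I_p & \cdots & O\\ \vdots & & \ddots & \ddots & \vdots\\ O & O & O & \cdots & \omega I_p\end{bmatrix}.$

Let $n = kp$ and let $N = A - \omega I_n$. Then

$(\lambda I_n - A)^{-1} = ((\lambda - \omega)I_n - N)^{-1} = \dfrac{I_n}{\lambda - \omega} + \dfrac{N}{(\lambda - \omega)^2} + \cdots + \dfrac{N^{k-1}}{(\lambda - \omega)^k},$

since $N^k = O$. Therefore, the top block row of $(\lambda I_n - A)^{-1}$ is equal to

$\begin{bmatrix} I_p & O & \cdots & O\end{bmatrix}(\lambda I_n - A)^{-1} = \begin{bmatrix} \dfrac{I_p}{\lambda - \omega} & \dfrac{I_p}{(\lambda - \omega)^2} & \cdots & \dfrac{I_p}{(\lambda - \omega)^k}\end{bmatrix}.$

Thus, upon setting

$D = X_0, \quad C = \begin{bmatrix} I_p & O & \cdots & O\end{bmatrix}$ and $B = \begin{bmatrix} X_1\\ \vdots\\ X_k\end{bmatrix},$

it is readily seen that the mvf $F(\lambda)$ specified in (19.4) can be expressed in the form (19.3) for the indicated choices of $A$, $B$, $C$ and $D$.

A proper rational $p \times q$ mvf $F(\lambda)$ will have poles at a finite number of distinct points $\omega_1, \ldots, \omega_\ell \in \mathbb{C}$ and there will exist mvf's

$F_1(\lambda) = \dfrac{X_{1k_1}}{(\lambda - \omega_1)^{k_1}} + \cdots + \dfrac{X_{11}}{\lambda - \omega_1}, \ \ldots, \ F_\ell(\lambda) = \dfrac{X_{\ell k_\ell}}{(\lambda - \omega_\ell)^{k_\ell}} + \cdots + \dfrac{X_{\ell 1}}{\lambda - \omega_\ell}$

with matrix coefficients $X_{ij} \in \mathbb{C}^{p\times q}$ such that $F(\lambda) - \sum_{j=1}^{\ell} F_j(\lambda)$ is holomorphic and bounded in the whole complex plane. Therefore, by Liouville's theorem (applied to each entry of $F(\lambda) - \sum_{j=1}^{\ell} F_j(\lambda)$ separately),

$F(\lambda) - \sum_{j=1}^{\ell} F_j(\lambda) = D,$

a constant $p \times q$ matrix. Moreover,

$D = \lim_{\lambda\to\infty} F(\lambda),$

and, by the preceding analysis, there exist matrices $C_j \in \mathbb{C}^{p\times n_j}$, $A_j \in \mathbb{C}^{n_j\times n_j}$, $B_j \in \mathbb{C}^{n_j\times q}$ and $n_j = pk_j$ for $j = 1, \ldots, \ell$, such that

$F_j(\lambda) = C_j(\lambda I_{n_j} - A_j)^{-1}B_j$ and $n = n_1 + \cdots + n_\ell.$

Formula (19.3) for $F(\lambda)$ emerges upon setting

$C = \begin{bmatrix} C_1 & \cdots & C_\ell\end{bmatrix}, \quad A = \operatorname{diag}\{A_1, \ldots, A_\ell\}$ and $B = \begin{bmatrix} B_1\\ \vdots\\ B_\ell\end{bmatrix}.$

□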
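The first step of the proof can be tested numerically in the scalar case $p = q = 1$, where the block Jordan cell is an ordinary $k \times k$ Jordan cell. The sketch below builds $A$, $B$, $C$, $D$ as in the proof for a function of the form (19.4) and compares $D + C(\lambda I_n - A)^{-1}B$ with the partial fraction expansion in exact rational arithmetic. The function names and the sample coefficients are illustrative assumptions.

```python
from fractions import Fraction

def jordan_realization(omega, coeffs, x0):
    """(A, B, C, D) for f = x0 + sum_j coeffs[j-1] / (lam - omega)**j, scalar case."""
    k = len(coeffs)
    A = [[Fraction(omega) if i == j else Fraction(1) if j == i + 1 else Fraction(0)
          for j in range(k)] for i in range(k)]
    B = [Fraction(x) for x in coeffs]               # column (x_1, ..., x_k)
    C = [Fraction(1)] + [Fraction(0)] * (k - 1)     # row (1, 0, ..., 0)
    return A, B, C, Fraction(x0)

def evaluate_realization(A, B, C, D, lam):
    """D + C (lam*I - A)^{-1} B, by back substitution (A is upper bidiagonal)."""
    k = len(A)
    lam = Fraction(lam)
    y = [Fraction(0)] * k
    for i in reversed(range(k)):
        # row i of (lam*I - A) y = B: (lam - A[i][i]) y_i - A[i][i+1] y_{i+1} = B_i
        upper = A[i][i + 1] * y[i + 1] if i + 1 < k else Fraction(0)
        y[i] = (B[i] + upper) / (lam - A[i][i])
    return D + sum(c * yi for c, yi in zip(C, y))

def direct_value(omega, coeffs, x0, lam):
    """Evaluate the partial fraction expansion (19.4) directly."""
    lam, omega = Fraction(lam), Fraction(omega)
    return Fraction(x0) + sum(Fraction(x) / (lam - omega) ** j
                              for j, x in enumerate(coeffs, start=1))

# Hypothetical data: f(lam) = 7 + 3/(lam-2) - 1/(lam-2)^2 + 4/(lam-2)^3
omega, coeffs, x0 = 2, [3, -1, 4], 7
A, B, C, D = jordan_realization(omega, coeffs, x0)
points = [0, 5, Fraction(1, 3)]
values = [evaluate_realization(A, B, C, D, lam) for lam in points]
```

At $\lambda = 0$, for instance, both expressions give $19/4$.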

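Formula (19.3) is called a realization of $F(\lambda)$. It is far from unique.

Exercise 19.1. Check that the mvf $F(\lambda)$ defined by formula (19.3) does not change if $C$ is replaced by $CS$, $A$ by $S^{-1}AS$ and $B$ by $S^{-1}B$ for some invertible matrix $S \in \mathbb{C}^{n\times n}$.

Exercise 19.2. Show that if $F_1(\lambda) = D_1 + C_1(\lambda I_{n_1} - A_1)^{-1}B_1$ is a $p \times q$ mvf and $F_2(\lambda) = D_2 + C_2(\lambda I_{n_2} - A_2)^{-1}B_2$ is a $q \times r$ mvf, then

$F_1(\lambda)F_2(\lambda) = D_3 + C_3(\lambda I_n - A_3)^{-1}B_3,$

where

$D_3 = D_1D_2, \quad C_3 = \begin{bmatrix} C_1 & D_1C_2\end{bmatrix}, \quad A_3 = \begin{bmatrix} A_1 & B_1C_2\\ O & A_2\end{bmatrix}, \quad B_3 = \begin{bmatrix} B_1D_2\\ B_2\end{bmatrix}$

and $n = n_1 + n_2$.

Exercise 19.3. Show that if $D \in \mathbb{C}^{p\times p}$ is invertible, then

(19.5) $\{D + C(\lambda I_n - A)^{-1}B\}^{-1} = D^{-1} - D^{-1}C(\lambda I_n - [A - BD^{-1}C])^{-1}BD^{-1}.$

Let $C \in \mathbb{C}^{p\times n}$, $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{n\times q}$. Then the pair $(A, B)$ is said to be controllable if the controllability matrix

$\mathfrak{C} = \begin{bmatrix} B & AB & \cdots & A^{n-1}B\end{bmatrix}$

is right invertible, i.e., if $\operatorname{rank}\mathfrak{C} = n$.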


The pair $(C, A)$ is said to be observable if the observability matrix

$\mathfrak{O} = \begin{bmatrix} C\\ CA\\ \vdots\\ CA^{n-1}\end{bmatrix}$

is left invertible, i.e., if its null space $\mathcal{N}_{\mathfrak{O}} = \{0\}$.

Exercise 19.4. Show that $(C, A)$ is an observable pair if and only if the pair $(A^H, C^H)$ is controllable.

Lemma 19.2. The following are equivalent:

(1) $(A, B)$ is controllable.
(2) $\operatorname{rank}\begin{bmatrix} A - \lambda I_n & B\end{bmatrix} = n$ for every point $\lambda \in \mathbb{C}$.
(3) The rows of the mvf $(\lambda I_n - A)^{-1}B$ are linearly independent in the sense that if $u^H(\lambda I_n - A)^{-1}B = 0^H$ for $u \in \mathbb{C}^n$ and all $\lambda$ in some open nonempty subset $\Omega$ of $\mathbb{C}$ that does not contain any eigenvalues of $A$, then $u = 0$.
(4) $\int_0^t e^{sA}BB^He^{sA^H}\,ds \succ O$ for every $t > 0$.
(5) For each vector $v \in \mathbb{C}^n$ and each $t > 0$, there exists a $q \times 1$ vector-valued function $u(s)$ on the interval $0 \le s \le t$ such that $\int_0^t e^{(t-s)A}Bu(s)\,ds = v$.
(6) The matrix $\mathfrak{C}\mathfrak{C}^H$ is invertible.

Proof. (1)$\Longrightarrow$(2). Let $u \in \mathbb{C}^n$ be orthogonal to the columns of the mvf $\begin{bmatrix} A - \lambda I_n & B\end{bmatrix}$ for some point $\lambda \in \mathbb{C}$. Then $u^HA = \lambda u^H$ and $u^HB = 0^H$. Therefore,

$u^HA^kB = \lambda^ku^HB = 0^H$ for $k = 0, \ldots, n - 1$,

i.e., $u^H\mathfrak{C} = 0^H$. Thus, $u = 0$.

(2)$\Longrightarrow$(1). Suppose that $u^HA^kB = 0^H$ for $k = 0, \ldots, n - 1$ for some nonzero vector $u \in \mathbb{C}^n$. Then $\mathcal{N}_{\mathfrak{C}^H}$ contains nonzero vectors and hence, since $\mathcal{N}_{\mathfrak{C}^H}$ is invariant under $A^H$, $A^H$ has an eigenvector in this nullspace. Thus, there exists a nonzero vector $v \in \mathbb{C}^n$ and a point $\alpha \in \mathbb{C}$ such that

$A^Hv = \alpha v$ and $\mathfrak{C}^Hv = 0.$


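But this implies that

$v^H\begin{bmatrix} A - \overline{\alpha}I_n & B\end{bmatrix} = 0^H,$

which is incompatible with (2). Thus, $\mathcal{N}_{\mathfrak{C}^H} = \{0\}$; i.e., $(A, B)$ is controllable.

(1)$\Longleftrightarrow$(3). This follows from the observation that

$u^H(\lambda I_n - A)^{-1}B = 0^H$ for $\lambda \in \Omega$

$\Longleftrightarrow u^H(\lambda I_n - A)^{-1}B = 0^H$ for $\lambda \in \mathbb{C} \setminus \sigma(A)$

$\Longleftrightarrow u^H\sum_{j=0}^{\infty}\dfrac{A^j}{\lambda^{j+1}}B = 0^H$ for $|\lambda| > \|A\|$

$\Longleftrightarrow u^HA^kB = 0^H$ for $k = 0, \ldots, n - 1$

$\Longleftrightarrow u^H\mathfrak{C} = 0^H.$

The verification of (4), (5) and (6) is left to the reader.

□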
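The equivalence of conditions (1) and (2) in Lemma 19.2 can be explored on small examples by computing ranks exactly. In the sketch below the helper names are hypothetical, and the sample matrix is a $2 \times 2$ nilpotent Jordan cell, so that $\lambda = 0$ is the only point at which the rank of $[A - \lambda I_n \ \ B]$ can drop (for every other $\lambda$ the block $A - \lambda I_n$ is already invertible).

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def controllability_matrix(A, B):
    """[B, AB, ..., A^{n-1}B] as a list of rows."""
    n = len(A)
    blocks, P = [], B
    for _ in range(n):
        blocks.append(P)
        P = matmul(A, P)
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]

def hautus_matrix(A, B, lam):
    """[A - lam*I, B] as a list of rows."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] + list(B[i])
            for i in range(n)]

# Hypothetical data: A is the 2 x 2 Jordan cell with eigenvalue 0.
A = [[0, 1], [0, 0]]
B_good = [[0], [1]]     # (A, B_good) is controllable
B_bad = [[1], [0]]      # (A, B_bad) is not controllable
ranks_good = (rank(controllability_matrix(A, B_good)), rank(hautus_matrix(A, B_good, 0)))
ranks_bad = (rank(controllability_matrix(A, B_bad)), rank(hautus_matrix(A, B_bad, 0)))
```

Both tests agree: for `B_good` the two ranks are equal to $n = 2$, while for `B_bad` both drop to 1.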

Exercise 19.5. Show that $(A, B)$ is controllable if and only if condition (4) in Lemma 19.2 is met.

Exercise 19.6. Show that $(A, B)$ is controllable if and only if condition (5) in Lemma 19.2 is met.

Exercise 19.7. Show that $(A, B)$ is controllable if and only if the matrix $\mathfrak{C}\mathfrak{C}^H$ is invertible.

Lemma 19.3. The following are equivalent:

(1) $(C, A)$ is observable.
(2) $\operatorname{rank}\begin{bmatrix} A - \lambda I_n\\ C\end{bmatrix} = n$ for every point $\lambda \in \mathbb{C}$.
(3) The columns of the mvf $C(\lambda I_n - A)^{-1}$ are linearly independent in the sense that if $u \in \mathbb{C}^n$ and $C(\lambda I_n - A)^{-1}u = 0$ for all points $\lambda$ in some open nonempty subset $\Omega$ of $\mathbb{C}$ that does not contain any eigenvalues of $A$, then $u = 0$.
(4) $\int_0^t e^{sA^H}C^HCe^{sA}\,ds \succ O$ for every $t > 0$.
(5) For each vector $v \in \mathbb{C}^n$ and each $t > 0$, there exists a $p \times 1$ vector-valued function $u(s)$ on the interval $0 \le s \le t$ such that $\int_0^t e^{(t-s)A^H}C^Hu(s)\,ds = v$.
(6) The matrix $\mathfrak{O}^H\mathfrak{O}$ is invertible.


Proof. (1)$\Longrightarrow$(2). Let

$\begin{bmatrix} A - \lambda I_n\\ C\end{bmatrix}u = 0$

for some vector $u \in \mathbb{C}^n$ and some point $\lambda \in \mathbb{C}$. Then $Au = \lambda u$ and $Cu = 0$. Therefore,

$CA^ku = \lambda^kCu = 0$ for $k = 1, 2, \ldots$

also and hence $u = 0$ by (1).

(2)$\Longrightarrow$(1). Clearly $\mathcal{N}_{\mathfrak{O}}$ is invariant under $A$. Therefore, if $\mathcal{N}_{\mathfrak{O}} \ne \{0\}$, then it contains an eigenvector of $A$. But this means that there is a nonzero vector $v \in \mathcal{N}_{\mathfrak{O}}$ such that $Av = \alpha v$, and hence that

$\begin{bmatrix} A - \alpha I_n\\ C\end{bmatrix}v = 0.$

But this is incompatible with (2). Therefore, $(C, A)$ is observable.

(3)$\Longleftrightarrow$(1). This follows from the observation that

(19.6) $C(\lambda I_n - A)^{-1}u = 0$ for $\lambda \in \Omega \Longleftrightarrow CA^ku = 0$ for $k = 0, \ldots, n - 1$.

The details and the verification that (4), (5) and (6) are each equivalent to observability are left to the reader.

□

Exercise 19.8. Verify the equivalence (19.6) and then complete the proof that (3) is equivalent to (1) in Lemma 19.3.

Exercise 19.9. Show that in Lemma 19.3, (4) is equivalent to (1).

Exercise 19.10. Show that in Lemma 19.3, (5) is equivalent to (1).

Exercise 19.11. Show that the pair $(C, A)$ is observable if and only if the matrix $\mathfrak{O}^H\mathfrak{O}$ is invertible.

Exercise 19.12. Let $F(\lambda) = I_p + C(\lambda I_n - A)^{-1}B$ and $G(\lambda) = I_p - C_1(\lambda I_n - A_1)^{-1}B_1$. Show that if $C_1 = C$ and $(C, A)$ is an observable pair, then $F(\lambda)G(\lambda) = I_p$ if and only if $B_1 = B$ and $A_1 = A - BC$.

A realization $F(\lambda) = D + C(\lambda I_n - A)^{-1}B$ of a $p \times q$ rational nonconstant mvf $F(\lambda)$ is said to be an observable realization if the pair $(C, A)$ is observable; it is said to be a controllable realization if the pair $(A, B)$ is controllable.

Theorem 19.4. Let $F(\lambda)$ be a nonconstant proper rational $p \times q$ mvf such that

$F(\lambda) = D_1 + C_1(\lambda I_{n_1} - A_1)^{-1}B_1 = D_2 + C_2(\lambda I_{n_2} - A_2)^{-1}B_2$

and suppose that both of these realizations are controllable and observable. Then:


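(1) $D_1 = D_2$ and $n_1 = n_2$.

(2) There exists exactly one invertible $n_1 \times n_1$ matrix $Y$ such that $C_1 = C_2Y$, $A_1 = Y^{-1}A_2Y$ and $B_1 = Y^{-1}B_2$.

Proof. It is readily checked that

$F(\infty) = D_1 = D_2$ and $C_1A_1^jB_1 = C_2A_2^jB_2$ for $j = 0, 1, \ldots$.

Let $\mathfrak{O}_j$, $\mathfrak{C}_j$, $j = 1, 2$, denote the observability and controllability matrices for the two realizations; and let

$\widetilde{\mathfrak{C}}_2 = \begin{bmatrix} B_2 & A_2B_2 & \cdots & A_2^{n_1-1}B_2\end{bmatrix}$ and $\widetilde{\mathfrak{O}}_2 = \begin{bmatrix} C_2\\ C_2A_2\\ \vdots\\ C_2A_2^{n_1-1}\end{bmatrix}$

but bear in mind that $\widetilde{\mathfrak{C}}_2 \ne \mathfrak{C}_2$ and $\widetilde{\mathfrak{O}}_2 \ne \mathfrak{O}_2$ unless $n_2 = n_1$. Nevertheless, the identity

$\mathfrak{O}_1\mathfrak{C}_1 = \widetilde{\mathfrak{O}}_2\widetilde{\mathfrak{C}}_2$

holds. Moreover, under the given assumptions, $\mathfrak{O}_1^H\mathfrak{O}_1$ and $\mathfrak{C}_1\mathfrak{C}_1^H$ are invertible. Thus, the inclusions

$\mathcal{R}_{\mathfrak{O}_1} = \mathcal{R}_{\mathfrak{O}_1\mathfrak{C}_1\mathfrak{C}_1^H} \subseteq \mathcal{R}_{\mathfrak{O}_1\mathfrak{C}_1} \subseteq \mathcal{R}_{\mathfrak{O}_1}$

imply that

$\operatorname{rank}(\mathfrak{O}_1) = \operatorname{rank}(\mathfrak{O}_1\mathfrak{C}_1\mathfrak{C}_1^H) \le \operatorname{rank}(\mathfrak{O}_1\mathfrak{C}_1) \le \operatorname{rank}(\mathfrak{O}_1) = n_1,$

which implies in turn that

$n_1 = \operatorname{rank}(\mathfrak{O}_1\mathfrak{C}_1) = \operatorname{rank}(\widetilde{\mathfrak{O}}_2\widetilde{\mathfrak{C}}_2) \le n_2.$

However, since the roles played by the two realizations in the preceding analysis can be reversed, we must also have

$n_2 \le n_1.$

Therefore, equality prevails and so $\widetilde{\mathfrak{O}}_2 = \mathfrak{O}_2$ and $\widetilde{\mathfrak{C}}_2 = \mathfrak{C}_2$.

Next, observe that the identity

$\mathfrak{O}_1B_1 = \mathfrak{O}_2B_2$

implies that

$B_1 = XB_2$, where $X = (\mathfrak{O}_1^H\mathfrak{O}_1)^{-1}\mathfrak{O}_1^H\mathfrak{O}_2.$

Similarly, the identity

$C_1\mathfrak{C}_1 = C_2\mathfrak{C}_2$

implies that

$C_1 = C_2Y$, where $Y = \mathfrak{C}_2\mathfrak{C}_1^H(\mathfrak{C}_1\mathfrak{C}_1^H)^{-1}.$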


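Moreover,

$XY = (\mathfrak{O}_1^H\mathfrak{O}_1)^{-1}\mathfrak{O}_1^H\mathfrak{O}_2\mathfrak{C}_2\mathfrak{C}_1^H(\mathfrak{C}_1\mathfrak{C}_1^H)^{-1} = (\mathfrak{O}_1^H\mathfrak{O}_1)^{-1}\mathfrak{O}_1^H\mathfrak{O}_1\mathfrak{C}_1\mathfrak{C}_1^H(\mathfrak{C}_1\mathfrak{C}_1^H)^{-1} = I_{n_1}.$

Thus, $X = Y^{-1}$. Next, the formula

$\mathfrak{O}_1A_1\mathfrak{C}_1 = \mathfrak{O}_2A_2\mathfrak{C}_2$

implies that

$A_1 = (\mathfrak{O}_1^H\mathfrak{O}_1)^{-1}\mathfrak{O}_1^H\mathfrak{O}_2A_2\mathfrak{C}_2\mathfrak{C}_1^H(\mathfrak{C}_1\mathfrak{C}_1^H)^{-1} = XA_2Y = Y^{-1}A_2Y.$

This completes the proof, up to the verification of the uniqueness of $Y$, which is left to the reader.

□

Exercise 19.13. Verify the asserted uniqueness of the invertible matrix $Y$ that is constructed in the proof of (2) of Theorem 19.4.

19.1. Minimal realizations

The realization

(19.7) $F(\lambda) = D + C(\lambda I_n - A)^{-1}B$

based on the matrices $C \in \mathbb{C}^{p\times n}$, $A \in \mathbb{C}^{n\times n}$, $B \in \mathbb{C}^{n\times q}$ and $D \in \mathbb{C}^{p\times q}$ is said to be minimal if the integer $n$ is as small as possible. This minimal value of the number $n$ is termed the McMillan degree of $F(\lambda)$.

Theorem 19.5. A realization (19.7) for a proper rational nonconstant function $F(\lambda)$ is minimal if and only if the pair $(C, A)$ is observable and the pair $(A, B)$ is controllable.

For ease of future reference, it is convenient to first prove two preliminary lemmas that are of independent interest.

Lemma 19.6. The controllability matrix $\mathfrak{C}$ has rank $k$, $1 \le k < n$, if and only if there exists an invertible matrix $T$ such that

(1) $T^{-1}AT = \begin{bmatrix} A_{11} & A_{12}\\ O & A_{22}\end{bmatrix}$, $T^{-1}B = \begin{bmatrix} B_1\\ O\end{bmatrix}$, where $A_{11} \in \mathbb{C}^{k\times k}$, $B_1 \in \mathbb{C}^{k\times q}$ and

(2) the pair $(A_{11}, B_1)$ is controllable.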

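Proof. Suppose first that $\mathfrak{C}$ has rank $k$, $1 \le k < n$, and let $X$ be an $n \times k$ matrix whose columns are selected from the columns of $\mathfrak{C}$ in such a way that $\operatorname{rank} X = \operatorname{rank} \mathfrak{C} = k$. Next, let $\ell = n - k$ and let $Y$ be an $n \times \ell$ matrix such that the $n \times n$ matrix $T = [X \ \ Y]$ is invertible and express $T^{-1}$ in block row form as

$$T^{-1} = \begin{bmatrix} U \\ V \end{bmatrix},$$

where $U \in \mathbb{C}^{k\times n}$ and $V \in \mathbb{C}^{\ell\times n}$. Then, the formula

$$I_n = \begin{bmatrix} U \\ V \end{bmatrix}\begin{bmatrix} X & Y \end{bmatrix} = \begin{bmatrix} UX & UY \\ VX & VY \end{bmatrix}$$

implies that

$$UX = I_k, \quad UY = O_{k\times\ell}, \quad VX = O_{\ell\times k} \quad\text{and}\quad VY = I_\ell.$$

Moreover, since

(19.8) $A\mathfrak{C} = \mathfrak{C}E$, $\mathfrak{C} = XF$, $X = \mathfrak{C}G$ and $B = \mathfrak{C}L$

for appropriate choices of the matrices $E$, $F$, $G$ and $L$, it follows easily that

$$AX = A\mathfrak{C}G = \mathfrak{C}EG = XFEG \quad\text{and}\quad B = \mathfrak{C}L = XFL.$$

Therefore,

$$VAX = VXFEG = O \quad\text{and}\quad VB = VXFL = O.$$

Thus, in a self-evident notation,

$$T^{-1}AT = \begin{bmatrix} U \\ V \end{bmatrix} A \begin{bmatrix} X & Y \end{bmatrix} = \begin{bmatrix} UAX & UAY \\ VAX & VAY \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ O & A_{22} \end{bmatrix}$$

and

$$T^{-1}B = \begin{bmatrix} U \\ V \end{bmatrix} B = \begin{bmatrix} UB \\ VB \end{bmatrix} = \begin{bmatrix} B_1 \\ O \end{bmatrix},$$

where $A_{11} \in \mathbb{C}^{k\times k}$ and $B_1 \in \mathbb{C}^{k\times q}$. Furthermore, since

$$T^{-1}A^jB = (T^{-1}AT)^jT^{-1}B \quad\text{for } j = 0, 1, \ldots, n-1,$$

it is readily checked that

$$T^{-1}\mathfrak{C} = \begin{bmatrix} B_1 & A_{11}B_1 & \cdots & A_{11}^{n-1}B_1 \\ O & O & \cdots & O \end{bmatrix}$$

and hence that

$$k = \operatorname{rank}\mathfrak{C} = \operatorname{rank} T^{-1}\mathfrak{C} = \operatorname{rank}\begin{bmatrix} B_1 & A_{11}B_1 & \cdots & A_{11}^{n-1}B_1 \end{bmatrix}.$$

Since $A_{11}$ is a $k \times k$ matrix, the Cayley–Hamilton theorem guarantees that the blocks $A_{11}^jB_1$ with $j \ge k$ do not enlarge the range, so the rank of the controllability matrix of the pair $(A_{11}, B_1)$ is also equal to $k$. Thus, $(A_{11}, B_1)$ is controllable. This completes the proof in one direction. The converse is easy. □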
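The rank condition in Lemma 19.6 can be checked by hand on small examples. The following is a minimal sketch, not part of the original text: the helper names `matmul`, `rank` and `controllability_matrix` are hypothetical, and exact rational arithmetic is assumed so that the rank is computed without roundoff. The matrix $A$ below is already in the block triangular form of the lemma with $T = I_2$ and $k = 1$.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def controllability_matrix(A, B):
    """Block row [B, AB, ..., A^(n-1) B] for an n x n matrix A."""
    n = len(A)
    blocks, current = [], B
    for _ in range(n):
        blocks.append(current)
        current = matmul(A, current)
    return [[x for block in blocks for x in block[i]] for i in range(n)]


A = [[1, 1], [0, 2]]
B_uncontrollable = [[1], [0]]   # T^{-1}B = [B1; 0] with B1 = [1]
B_controllable = [[0], [1]]

rank_1 = rank(controllability_matrix(A, B_uncontrollable))
rank_2 = rank(controllability_matrix(A, B_controllable))
```

In the first case the controllability matrix has rank $k = 1$ and the $1 \times 1$ pair $(A_{11}, B_1) = (1, 1)$ is controllable, in line with the lemma; in the second case the rank is $n = 2$.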

Exercise 19.14. Verify the assertions in formula (19.8).

Lemma 19.7. The observability matrix $\mathfrak{O}$ has rank $k$, $1 \le k < n$, if and only if there exists an invertible matrix $T$ such that

(1) $CT = \begin{bmatrix} C_1 & O \end{bmatrix}$, $T^{-1}AT = \begin{bmatrix} A_{11} & O \\ A_{21} & A_{22} \end{bmatrix}$, where $C_1 \in \mathbb{C}^{p\times k}$, $A_{11} \in \mathbb{C}^{k\times k}$ and

(2) the pair $(C_1, A_{11})$ is observable.

Proof. Suppose first that the observability matrix $\mathfrak{O}$ has rank $k$, $1 \le k$

ifj=r+1,...,n'

7. $\{C_2^H - P_{21}P_{11}^{-1}C_1^H\}\,J\,\{C_2 - C_1P_{11}^{-1}P_{12}\} = O$.

[HINT: $C_2^H - P_{21}P_{11}^{-1}C_1^H = \left\{ C \begin{bmatrix} -P_{11}^{-1}P_{12} \\ I_{n-r} \end{bmatrix} \right\}^H$.]

q

€51|:

for j = 1, . . . ,n if and only if

ew-:(147)]o _ 07"[if—In—r r ):|{C2I—IJ—P21P11101]{J}|

_

for] =T+1,...,n.

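• Moral: If $\operatorname{rank} P = r$ with $1 \le r \le n-1$, then $T_\Theta[\varepsilon]$ meets the first $r$ interpolation conditions for every choice of $\varepsilon \in \mathcal{S}^{p\times q}$. However, the remaining $n - r$ conditions will only be met if $\varepsilon$ itself is subject to the constraints in Step 8. If

$$\left\{ \begin{bmatrix} a_1 \\ b_1 \end{bmatrix}, \ldots, \begin{bmatrix} a_k \\ b_k \end{bmatrix} \right\} \text{ is an orthogonal basis for the range of } C\begin{bmatrix} -P_{11}^{-1}P_{12} \\ I_{n-r} \end{bmatrix}$$

with $a_j \in \mathbb{C}^p$, $b_j \in \mathbb{C}^q$ and $a_j^Ha_j + b_j^Hb_j = 2$, then, in view of Step 7, $a_j^Ha_j = b_j^Hb_j = 1$. Thus a constraint of the form $a^H\varepsilon(\alpha) = b^H$ with $\alpha \in \Pi_+$ translates to $a^H\varepsilon(\lambda)b = 1$, first for $\lambda = \alpha$ and then, by the maximum principle, for every point $\lambda \in \Pi_+$. The analysis to this point serves to complete the proof of Theorem 19.17. However, although we shall not prove it here, more is true:

Theorem 19.22. There exist a pair of unitary matrices $V \in \mathbb{C}^{p\times p}$ and $U \in \mathbb{C}^{q\times q}$ such that

$$\left\{ T_\Theta\!\left[ V \begin{bmatrix} \varepsilon & O \\ O & I_\nu \end{bmatrix} U^H \right] : \varepsilon \in \mathcal{S}^{(p-\nu)\times(q-\nu)} \right\} = \text{the set of all solutions},$$

where $\nu = \operatorname{rank}(P + C_1^HC_1) - \operatorname{rank} P = \operatorname{rank}(P + C_2^HC_2) - \operatorname{rank} P$.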

19.7. Factorization of $\Theta(\lambda)$

We shall assume from now on that $X \in \mathbb{C}^{n\times n}$ is a nonzero positive semidefinite solution of the Riccati equation

(19.43) $XA^H + AX + XC^HJCX = O,$

and shall obtain a factorization of the matrix-valued function $\Theta(\lambda)$ defined in (19.23) based on a decomposition of the space $\mathcal{M} = \mathcal{M}(X)$ that was introduced in Theorem 19.13. In particular, formula (19.43) implies that

$$AX = X(-A^H - C^HJCX)$$

and hence, in view of Lemma 19.16, that $\mathcal{M}(X)$ is invariant under the action of $R_\alpha$ for every choice of $\alpha \in \mathbb{C} \setminus \sigma(A)$. Thus, if $\alpha \in \mathbb{C} \setminus \sigma(A)$, Lemma 19.16 guarantees the existence of a nonzero vector-valued function $g_1 \in \mathcal{M}$ and a point $\mu_1 \in \mathbb{C}$ such that

$$(R_\alpha g_1)(\lambda) = \mu_1 g_1(\lambda).$$

Therefore, since $g_1(\lambda) = F(\lambda)Xu_1$ for some vector $u_1 \in \mathbb{C}^n$ such that $Xu_1 \ne 0$, it follows that

$$(R_\alpha g_1)(\lambda) = -F(\lambda)(\alpha I_n - A)^{-1}Xu_1 = \mu_1F(\lambda)Xu_1$$

and hence that

$$-(\alpha I_n - A)^{-1}Xu_1 = \mu_1Xu_1.$$

But this in turn implies that $\mu_1 \ne 0$. Thus, the previous formula can be rewritten as

(19.44) $AXu_1 = \omega_1Xu_1$ with $\omega_1 = \alpha + \dfrac{1}{\mu_1}$

and, consequently,

(19.45) $g_1(\lambda) = F(\lambda)Xu_1 = \dfrac{CXu_1}{\lambda - \omega_1}$

and

(19.46) $\|g_1\|_{\mathcal{M}}^2 = u_1^HXu_1.$

Let $\Pi_1$ denote the orthogonal projection of $\mathcal{M}$ onto the one-dimensional subspace $\mathcal{M}_1$ spanned by $g_1$. Then

$$\Pi_1FXv = FXu_1(u_1^HXu_1)^{-1}u_1^HXv = FX_1v$$

with

(19.47) $X_1 = XQ_1$ and $Q_1 = u_1(u_1^HXu_1)^{-1}u_1^HX.$

Then, since $Q_1$ is a projection, i.e., $Q_1^2 = Q_1$, it is readily checked that

$$X_1 = XQ_1 = XQ_1^2 = X_1Q_1 = Q_1^HX_1 = Q_1^HXQ_1$$

and, as $AX_1 = \omega_1X_1$, that

$$Q_1^HAXQ_1 = Q_1^HAX_1 = \omega_1Q_1^HX_1 = \omega_1X_1 = AX_1 \quad\text{and}\quad Q_1^HXA^HQ_1 = X_1A^H.$$

Consequently, upon multiplying the Riccati equation (19.43) on the left by $Q_1^H$ and on the right by $Q_1$, it follows that

$$O = Q_1^HXA^HQ_1 + Q_1^HAXQ_1 + Q_1^HXC^HJCXQ_1 = X_1A^H + AX_1 + X_1C^HJCX_1;$$

i.e., $X_1$ is a rank one positive semidefinite solution of the Riccati equation (19.43). Therefore, by Theorem 19.15, $\mathcal{M}_1 = \mathcal{H}(\vartheta_1)$ is a de Branges space based upon the matrix-valued function

$$\vartheta_1(\lambda) = I_m - F(\lambda)X_1C^HJ = I_m - \frac{CX_1C^HJ}{\lambda - \omega_1}.$$

Let $\mathcal{M}_1^\perp$ denote the orthogonal complement of $\mathcal{M}_1$ in $\mathcal{M}(X)$. Then

(19.48) $\mathcal{M}_1^\perp = \{(I - \Pi_1)FXu : u \in \mathbb{C}^n\} = \{FX_2u : u \in \mathbb{C}^n\},$

where

$$X_2 = X - X_1 = X - Q_1^HXQ_1 = (I_n - Q_1^H)X(I_n - Q_1) \succeq 0.$$

Let

$$\mathcal{M}_2 = \vartheta_1(\lambda)^{-1}\mathcal{M}_1^\perp \quad\text{and}\quad \Theta_2(\lambda) = \vartheta_1(\lambda)^{-1}\Theta(\lambda).$$

By formula (19.5),

$$\vartheta_1(\lambda)^{-1} = I_m + C(\lambda I_n - A_1)^{-1}X_1C^HJ, \quad\text{where}\quad A_1 = A + X_1C^HJC.$$

Moreover, by straightforward calculations that are left to the reader,

(19.49) $\mathcal{M}_2 = \{C(\lambda I_n - A_1)^{-1}X_2u : u \in \mathbb{C}^n\},$

(19.50) $\Theta_2(\lambda) = I_m - C(\lambda I_n - A_1)^{-1}X_2C^HJ$

and, since both $X$ and $X_1$ are solutions of (19.43), $X_2$ is a positive semidefinite solution of the Riccati equation

(19.51) $X_2A_1^H + A_1X_2 + X_2C^HJCX_2 = O$

with $\operatorname{rank} X_2 = \operatorname{rank} X - 1$. Thus, as the pair

$$(C, A_1) = (C, A + X_1C^HJC)$$

is observable, we can define

$$F_1(\lambda) = C(\lambda I_n - A_1)^{-1}$$

and the space $\mathcal{M}_2 = \{F_1(\lambda)X_2u : u \in \mathbb{C}^n\}$ endowed with the inner product

$$\langle F_1X_2u, F_1X_2v\rangle_{\mathcal{M}_2} = v^HX_2u$$

is $R_\alpha$ invariant for each point $\alpha \in \mathbb{C} \setminus \sigma(A_1)$. Therefore, $\mathcal{M}_2 = \mathcal{H}(\Theta_2)$, and the factorization procedure can be iterated to obtain a factorization

$$\Theta(\lambda) = \vartheta_1(\lambda)\cdots\vartheta_k(\lambda)$$

of $\Theta(\lambda)$ as a product of $k$ elementary factors of McMillan degree one with $k = \operatorname{rank} X$.

Exercise 19.32. Verify the statements in (19.48).

Exercise 19.33. Verify formula (19.50). [HINT: The trick in this calculation (and others of this kind) is to note that in the product $\vartheta_1^{-1}\Theta$, the two terms

$$C(\lambda I_n - A)^{-1}XC^HJ + C(\lambda I_n - A_1)^{-1}X_1C^HJC(\lambda I_n - A)^{-1}XC^HJ$$

can be reexpressed as

$$C(\lambda I_n - A_1)^{-1}\{\lambda I_n - A_1 + X_1C^HJC\}(\lambda I_n - A)^{-1}XC^HJ,$$

which simplifies beautifully.]

Exercise 19.34. Show that if $f \in \mathcal{M}_2$, then

$$\|\vartheta_1f\|_{\mathcal{M}} = \|f\|_{\mathcal{M}_2}.$$

[HINT: First check that

$$\vartheta_1(\lambda)C(\lambda I_n - A_1)^{-1}X_2 = C(\lambda I_n - A)^{-1}X_2$$

and then exploit the fact that

$$X_2 = X(I_n - Q_1) = (I_n - Q_1^H)X(I_n - Q_1).]$$

Exercise 19.35. Let $A \in \mathbb{C}^{n\times n}$, $C \in \mathbb{C}^{m\times n}$, $J \in \mathbb{C}^{m\times m}$, let $\lambda_1, \ldots, \lambda_k$ denote the distinct eigenvalues of $A$; and let $P$ be a solution of the Stein equation $P - A^HPA = C^HJC$. Show that if $1 - \lambda_i\overline{\lambda_j} \ne 0$ for $i, j = 1, \ldots, k$, then $\mathcal{N}_{\mathfrak{O}} \subseteq \mathcal{N}_P$. [HINT: $\mathcal{N}_{\mathfrak{O}}$ is invariant under $A$.]

Exercise 19.36. Let $A \in \mathbb{C}^{n\times n}$, $C \in \mathbb{C}^{m\times n}$, $J \in \mathbb{C}^{m\times m}$; and let $P$ be a solution of the Lyapunov equation $A^HP + PA + C^HJC = O$. Show that if $\sigma(A) \cap \sigma(-A^H) = \emptyset$, then $\mathcal{N}_{\mathfrak{O}} \subseteq \mathcal{N}_P$. [HINT: $\mathcal{N}_{\mathfrak{O}}$ is invariant under $A$.]

19.8. Bibliographical notes

The monographs [92] and [90] are good sources of supplementary information on realization theory and applications to control theory. Condition (2) in Lemmas 19.2, 19.3, 19.10, 19.11 and variations thereof are usually referred to as Hautus tests or Popov–Belevich–Hautus tests. Theorem 19.9 is adapted from [7]. Exercise 19.23 is adapted from Theorem 4.3 in Chapter 3 of [76]. The connection between finite dimensional de Branges spaces and Riccati equations is adapted from [30]. This connection lends itself to a rather clean framework for handling a number of bitangential interpolation problems. The discussion of the Nevanlinna–Pick interpolation problem in Section 19.6 is adapted from [31]; Theorem 19.22 is Theorem 7.1 of [31] adapted to the right half plane; the formula for $\nu$ is taken from [11]. The treatment of factorization in the last section is adapted from the article [27], which includes extensions of the factorization discussed here to nonsquare matrix-valued functions.

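Chapter 20

Eigenvalue location problems

When I'm finished [shooting] that bridge I'll have made it into something of my own, by lens choice, or camera angle, or general composition, and most likely by some combination of all those.

I don't just take things as given, I try to make them into something that reflects my personal consciousness, my spirit. I try to find the poetry in the image.

Waller [88], p. 50

If $A \in \mathbb{C}^{n\times n}$ and $A = A^H$, then $\sigma(A) \subset \mathbb{R}$ and hence the following counts are well defined:

• $\mathcal{E}_+(A)$ = the number of positive eigenvalues of $A$, counting multiplicities.

• $\mathcal{E}_-(A)$ = the number of negative eigenvalues of $A$, counting multiplicities.

• $\mathcal{E}_0(A)$ = the number of zero eigenvalues of $A$, counting multiplicities.

Thus,

$$\mathcal{E}_+(A) + \mathcal{E}_-(A) = \operatorname{rank} A \quad\text{and}\quad \mathcal{E}_0(A) = \dim\mathcal{N}_A.$$

20.1. Interlacing

Theorem 20.1. Let $B$ be the upper left $k \times k$ corner of a $(k+1) \times (k+1)$ Hermitian matrix $A$ and let $\lambda_1(A) \le \cdots \le \lambda_{k+1}(A)$ and $\lambda_1(B) \le \cdots \le \lambda_k(B)$ denote the eigenvalues of $A$ and $B$, respectively. Then

(20.1) $\lambda_j(A) \le \lambda_j(B) \le \lambda_{j+1}(A)$ for $j = 1, \ldots, k$.

Proof. Let

$$a(\mathcal{X}) = \max\{\langle Ax, x\rangle : x \in \mathcal{X} \text{ and } \|x\| = 1\}$$

for each subspace $\mathcal{X}$ of $\mathbb{C}^{k+1}$ and let

$$b(\mathcal{Y}) = \max\{\langle By, y\rangle : y \in \mathcal{Y} \text{ and } \|y\| = 1\}$$

for each subspace $\mathcal{Y}$ of $\mathbb{C}^k$. Let $\mathcal{S}_j$ denote the set of all $j$-dimensional subspaces of $\mathbb{C}^{k+1}$ for $j = 1, \ldots, k+1$; let $\mathcal{T}_j$ denote the set of all $j$-dimensional subspaces of $\mathbb{C}^k$ for $j = 1, \ldots, k$; and let $\mathcal{S}_j^\circ$ denote the set of all $j$-dimensional subspaces of $\mathbb{C}^{k+1}$ for $j = 1, \ldots, k$ that are orthogonal to $e_{k+1}$, the $(k+1)$'st column of $I_{k+1}$. Then, by the Courant–Fischer theorem,

$$\lambda_j(A) = \min_{\mathcal{X} \in \mathcal{S}_j} a(\mathcal{X}) \le \min_{\mathcal{X} \in \mathcal{S}_j^\circ} a(\mathcal{X}) = \min_{\mathcal{Y} \in \mathcal{T}_j} b(\mathcal{Y}) = \lambda_j(B) \quad\text{for } j = 1, \ldots, k.$$

The second inequality in (20.1) depends upon the observation that for each $(j+1)$-dimensional subspace $\mathcal{X}$ of $\mathbb{C}^{k+1}$ with $j \ge 1$, there exists at least one $j$-dimensional subspace $\mathcal{Y}$ of $\mathbb{C}^k$ such that

(20.2) $\left\{ \begin{bmatrix} y \\ 0 \end{bmatrix} : y \in \mathcal{Y} \right\} \subseteq \mathcal{X}.$

Thus, as

$$\langle By, y\rangle = \left\langle A\begin{bmatrix} y \\ 0 \end{bmatrix}, \begin{bmatrix} y \\ 0 \end{bmatrix} \right\rangle \le a(\mathcal{X})$$

for every unit vector $y \in \mathcal{Y}$ and such a pair of spaces $\mathcal{Y}$ and $\mathcal{X}$, it follows that

$$\lambda_j(B) = \min_{\mathcal{Y}' \in \mathcal{T}_j} b(\mathcal{Y}') \le b(\mathcal{Y}) \le a(\mathcal{X}).$$

Therefore, as this lower bound is valid for each subspace $\mathcal{X} \in \mathcal{S}_{j+1}$, it is also valid for the minimum over all $\mathcal{X} \in \mathcal{S}_{j+1}$, i.e.,

$$\lambda_j(B) \le \min_{\mathcal{X} \in \mathcal{S}_{j+1}} a(\mathcal{X}) = \lambda_{j+1}(A).$$ □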
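The interlacing inequalities (20.1) are easy to observe numerically. The sketch below is an illustration only and is not taken from the text: it restricts attention to real symmetric tridiagonal matrices, and the helper names `count_below` and `eigenvalues` are hypothetical. It assumes the standard Sturm sequence count (the number of negative pivots of $A - xI$ equals the number of eigenvalues below $x$) combined with bisection, which keeps the example free of external libraries.

```python
def count_below(diag, off, x):
    """Number of eigenvalues smaller than x for the symmetric tridiagonal
    matrix with diagonal `diag` and off-diagonal `off` (Sturm count)."""
    count = 0
    pivot = 1.0
    for i, a in enumerate(diag):
        pivot = a - x - (off[i - 1] ** 2 / pivot if i > 0 else 0.0)
        if pivot == 0.0:
            pivot = 1e-300  # nudge an exact zero pivot to keep the recursion going
        if pivot < 0:
            count += 1
    return count


def eigenvalues(diag, off, iterations=100):
    """Eigenvalues in increasing order, located by bisection."""
    n = len(diag)
    radius = [(abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    low0 = min(diag[i] - radius[i] for i in range(n))   # Gershgorin-type bounds
    high0 = max(diag[i] + radius[i] for i in range(n))
    result = []
    for k in range(1, n + 1):
        low, high = low0, high0
        for _ in range(iterations):
            mid = (low + high) / 2
            if count_below(diag, off, mid) >= k:
                high = mid
            else:
                low = mid
        result.append((low + high) / 2)
    return result


# A is 3 x 3; B is its upper left 2 x 2 corner.
eigs_A = eigenvalues([2.0, 2.0, 2.0], [1.0, 1.0])
eigs_B = eigenvalues([2.0, 2.0], [1.0])
tol = 1e-9
interlaced = all(eigs_A[j] <= eigs_B[j] + tol and eigs_B[j] <= eigs_A[j + 1] + tol
                 for j in range(len(eigs_B)))
```

For this choice the eigenvalues of $A$ are $2 - \sqrt{2}$, $2$, $2 + \sqrt{2}$ and those of $B$ are $1$ and $3$, so each eigenvalue of $B$ sits between two consecutive eigenvalues of $A$.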

Exercise 20.1. Find a 2-dimensional subspace )7 of (C3 such that (20.2) holds for each of the following two choices of X: 1

X _ span

1 1

,

1 1

1

0 ,

1

1

1

and

X — span

1 1

1

0

,

1 1

1 0

0

0

0 ,

0

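Exercise 20.2. Show that if $\mathcal{X}$ is a $(j+1)$-dimensional subspace of $\mathbb{C}^{k+1}$ with basis $u_1, \ldots, u_{j+1}$ and $j \ge 1$, then there exists a $j$-dimensional subspace $\mathcal{Y}$ of $\mathbb{C}^k$ such that (20.2) holds. [HINT: Build a basis $v_1, \ldots, v_j$ for $\mathcal{Y}$, with Exercise 20.1 as a guide.]

Exercise 20.3. Let $A = A^H \in \mathbb{C}^{n\times n}$ and $B = B^H \in \mathbb{C}^{n\times n}$. Show that if $\lambda_1(A) \le \cdots \le \lambda_n(A)$ and $\lambda_1(B) \le \cdots \le \lambda_n(B)$, then

$$\lambda_j(A) + \lambda_1(B) \le \lambda_j(A + B) \le \lambda_j(A) + \lambda_n(B) \quad\text{for } j = 1, \ldots, n.$$

[HINT: Invoke the Courant–Fischer theorem.]

Theorem 20.2. Let $A_1$ and $A_2$ be $n \times n$ Hermitian matrices with eigenvalues $\lambda_1^{(1)} \le \cdots \le \lambda_n^{(1)}$ and $\lambda_1^{(2)} \le \cdots \le \lambda_n^{(2)}$, respectively, and suppose further that $A_1 - A_2 \succeq 0$. Then:

(1) $\lambda_j^{(1)} \ge \lambda_j^{(2)}$ for $j = 1, \ldots, n$.

(2) If also $\operatorname{rank}(A_1 - A_2) = r$, then $\lambda_j^{(1)} \le \lambda_{j+r}^{(2)}$ for $j = 1, \ldots, n - r$.

Proof. The first assertion is a straightforward consequence of the Courant–Fischer theorem and is left to the reader as an exercise. To verify (2), let $B = A_1 - A_2$. Then, since $\operatorname{rank} B = r$ by assumption, $\dim\mathcal{N}_B = n - r$, and hence, for any $k$-dimensional subspace $\mathcal{Y}$ of $\mathbb{C}^n$,

$$\dim(\mathcal{Y} \cap \mathcal{N}_B) = \dim\mathcal{Y} + \dim\mathcal{N}_B - \dim(\mathcal{Y} + \mathcal{N}_B) \ge k + n - r - n = k - r.$$

Thus, if $k > r$ and $\mathcal{S}_j$ denotes the set of all $j$-dimensional subspaces of $\mathbb{C}^n$, then

$$\begin{aligned}
\lambda_{k-r}^{(1)} &= \min_{\mathcal{Z} \in \mathcal{S}_{k-r}} \max\{\langle A_1y, y\rangle : y \in \mathcal{Z} \text{ and } \|y\| = 1\} \\
&\le \max\{\langle A_1y, y\rangle : y \in \mathcal{Y} \cap \mathcal{N}_B \text{ and } \|y\| = 1\} \\
&= \max\{\langle A_2y, y\rangle : y \in \mathcal{Y} \cap \mathcal{N}_B \text{ and } \|y\| = 1\} \\
&\le \max\{\langle A_2y, y\rangle : y \in \mathcal{Y} \text{ and } \|y\| = 1\}.
\end{aligned}$$

Therefore, since this inequality is valid for every choice of $\mathcal{Y} \in \mathcal{S}_k$, it follows that

$$\lambda_{k-r}^{(1)} \le \lambda_k^{(2)} \quad\text{for } k = r+1, \ldots, n.$$

But this is equivalent to the asserted upper bound in (2). □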

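Exercise 20.4. Verify the first assertion of Theorem 20.2.

Exercise 20.5. Let $A = B + \gamma uu^H$, where $B \in \mathbb{C}^{n\times n}$ is Hermitian, $u \in \mathbb{C}^n$, $\gamma \in \mathbb{R}$; and let $\lambda_1(A) \le \cdots \le \lambda_n(A)$, $\lambda_1(B) \le \cdots \le \lambda_n(B)$ denote the eigenvalues of $A$ and $B$, respectively. Show that

(a) If $\gamma \ge 0$, then $\lambda_j(B) \le \lambda_j(A) \le \lambda_{j+1}(B)$ for $j = 1, \ldots, n-1$ and $\lambda_n(B) \le \lambda_n(A)$.

(b) If $\gamma \le 0$, then $\lambda_{j-1}(B) \le \lambda_j(A) \le \lambda_j(B)$ for $j = 2, \ldots, n$ and $\lambda_1(A) \le \lambda_1(B)$.

Exercise 20.6. Show that in the setting of Exercise 20.5,

$$\lambda_j(A) = \lambda_j(B) + c_j\gamma, \quad\text{where}\quad c_j \ge 0 \quad\text{and}\quad \sum_{j=1}^n c_j = u^Hu.$$

[HINT: $\sum_{j=1}^n \lambda_j(A) = \sum_{j=1}^n \langle Au_j, u_j\rangle = \sum_{j=1}^n \langle Av_j, v_j\rangle$ for any two orthonormal sets of vectors $\{u_1, \ldots, u_n\}$ and $\{v_1, \ldots, v_n\}$ in $\mathbb{C}^n$.]

Exercise 20.7. Let $A = UDU^{-1} \in \mathbb{C}^{n\times n}$ with $D = \operatorname{diag}\{\lambda_1, \ldots, \lambda_n\}$. Show that if $e_k$ denotes the $k$'th column of $I_n$, $u_k = Ue_k$ and $B = A + u_kv^H$ for some vector $v \in \mathbb{C}^n$, then $\det(\lambda I_n - B) = (\lambda - \mu_1)\cdots(\lambda - \mu_n)$, where $\mu_k = \lambda_k + v^Hu_k$ and $\mu_j = \lambda_j$ for $j \ne k$. [HINT: $B = U(D + e_kv^HU)U^{-1}$.]

Exercise 20.8. Let $A^\circ$ be a pseudoinverse of a matrix $A = A^H \in \mathbb{C}^{n\times n}$ such that $A^\circ$ is also Hermitian. Show that $\mathcal{E}_\pm(A) = \mathcal{E}_\pm(A^\circ)$ and $\mathcal{E}_0(A) = \mathcal{E}_0(A^\circ)$.

A tridiagonal Hermitian matrix $A_n \in \mathbb{R}^{n\times n}$ of the form

$$A_n = \sum_{j=1}^n a_je_je_j^T + \sum_{j=1}^{n-1} b_j(e_je_{j+1}^T + e_{j+1}e_j^T) = \begin{bmatrix} a_1 & b_1 & 0 & \cdots & 0 & 0 \\ b_1 & a_2 & b_2 & \cdots & 0 & 0 \\ 0 & b_2 & a_3 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & a_n \end{bmatrix}$$

with $b_j > 0$ and $a_j \in \mathbb{R}$ is termed a Jacobi matrix.

Exercise 20.9. Show that a Jacobi matrix $A_{n+1}$ has $n + 1$ distinct eigenvalues $\lambda_1 < \cdots < \lambda_{n+1}$ and that if $\mu_1 < \cdots < \mu_n$ denote the eigenvalues of $A_n$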
O and AU > 0, then U = In.

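Lemma 20.6. If $B \in \mathbb{C}^{p\times q}$, then

$$E = \begin{bmatrix} O & B \\ B^H & O \end{bmatrix} \sim \begin{bmatrix} BB^H & O \\ O & -B^HB \end{bmatrix} \quad\text{and}\quad \mathcal{E}_\pm(E) = \operatorname{rank} B.$$

Proof. The first step is to note that, in terms of the Moore–Penrose inverse $B^\dagger$ of $B$,

$$\begin{bmatrix} I_p & (B^\dagger)^H \\ -B^H & I_q \end{bmatrix}\begin{bmatrix} O & B \\ B^H & O \end{bmatrix}\begin{bmatrix} I_p & -B \\ B^\dagger & I_q \end{bmatrix} = 2\begin{bmatrix} BB^\dagger & O \\ O & -B^HB \end{bmatrix}$$

and, by Theorem 5.7,

$$\det\begin{bmatrix} I_p & -B \\ B^\dagger & I_q \end{bmatrix} = \det(I_q + B^\dagger B) > 0.$$

The conclusion then follows upon multiplying both sides of the preceding identity by the matrix

$$\frac{1}{\sqrt{2}}\operatorname{diag}\{Y, I_q\}, \quad\text{where}\quad Y = (BB^H)^{1/2} + I_p - BB^\dagger = Y^H.$$

This does the trick, since $Y$ is invertible and

$$YBB^\dagger Y = (BB^H)^{1/2}BB^\dagger(BB^H)^{1/2} = BB^H.$$ □

Exercise 20.12. Verify the claim that the matrix $Y$ that is defined in the proof of Lemma 20.6 is invertible by showing that $\mathcal{N}_Y = \{0\}$.

Exercise 20.13. Furnish a second proof of Lemma 20.6 by showing that if $p + q = n$, then $\det(\lambda I_n - E) = \lambda^{p-q}\det(\lambda^2I_q - B^HB)$. [HINT: Show that if $\operatorname{rank} B = k$, then $\mathcal{E}_\pm(E) = k$ and $\mathcal{E}_0(E) = n - 2k$.]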

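20.3. Congruence

Lemma 20.7. If $B \in \mathbb{C}^{p\times q}$ and $C = C^H \in \mathbb{C}^{n\times n}$, then

$$E = \begin{bmatrix} O & O & B \\ O & C & O \\ B^H & O & O \end{bmatrix} \sim \begin{bmatrix} BB^H & O & O \\ O & C & O \\ O & O & -B^HB \end{bmatrix}$$

and hence

$$\mathcal{E}_\pm(E) = \mathcal{E}_\pm(C) + \operatorname{rank} B.$$

Proof. This is an easy variant of Lemma 20.6: Just multiply the given matrix on the right by the invertible constant matrix

$$K = \begin{bmatrix} I_p & O & -B \\ O & I_n & O \\ B^\dagger & O & I_q \end{bmatrix}$$

and on the left by $K^H$ to start things off. This yields the formula

$$K^HEK = \begin{bmatrix} 2BB^\dagger & O & O \\ O & C & O \\ O & O & -2B^HB \end{bmatrix},$$

which leads easily to the desired conclusion upon multiplying $K^HEK$ on the left and the right by the invertible constant Hermitian matrix

$$\frac{1}{\sqrt{2}}\operatorname{diag}\{Y, \sqrt{2}I_n, I_q\},$$

where $Y$ is defined in the proof of Lemma 20.6. □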

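Exercise 20.14. Show that congruence is an equivalence relation, i.e., (i) $A \sim A$; (ii) $A \sim B \Rightarrow B \sim A$; and (iii) $A \sim B$, $B \sim C \Rightarrow A \sim C$.

Exercise 20.15. Let $A = A^H$, $C = C^H$ and $E = \begin{bmatrix} A & B \\ B^H & C \end{bmatrix}$. Show that $E = E^H$ and that $\mathcal{E}_\pm(E) \ge \mathcal{E}_\pm(A)$ and $\mathcal{E}_\pm(E) \ge \mathcal{E}_\pm(C)$. [HINT: Exploit Theorem 20.3.]

Exercise 20.16. Let $e_j$ denote the $j$'th column of $I_n$. Show that the eigenvalues of $Z_n = \sum_{j=1}^n e_je_{n+1-j}^T$ must be equal to either $1$ or $-1$ without calculating them. [HINT: Show that $Z_n^H = Z_n$ and $Z_n^HZ_n = I_n$.]

Exercise 20.17. Let $Z_n$ be the matrix defined in Exercise 20.16.

(a) Show that if $n = 2k$ is even, then $\mathcal{E}_+(Z_n) = \mathcal{E}_-(Z_n) = k$.

(b) Show that if $n = 2k+1$ is odd, then $\mathcal{E}_+(Z_n) = k+1$ and $\mathcal{E}_-(Z_n) = k$.

[HINT: Verify and exploit the identity

$$\begin{bmatrix} I_k & Z_k \\ -Z_k & I_k \end{bmatrix}\begin{bmatrix} O & Z_k \\ Z_k & O \end{bmatrix}\begin{bmatrix} I_k & -Z_k \\ Z_k & I_k \end{bmatrix} = 2\operatorname{diag}\{I_k, -I_k\}.]$$

Exercise 20.18. Confirm the conclusions in Exercise 20.16 by calculating $\det(\lambda I_n - Z_n)$. [HINT: If $n = 2k$, then $\lambda I_n - Z_n = \begin{bmatrix} \lambda I_k & -Z_k \\ -Z_k & \lambda I_k \end{bmatrix}$.]

20.4. Counting positive and negative eigenvalues

Lemma 20.8. Let $A = A^H \in \mathbb{C}^{k\times k}$, $B \in \mathbb{C}^{k\times\ell}$,

$$E = \begin{bmatrix} A & B \\ B^H & O \end{bmatrix}$$

and let $P$ denote the orthogonal projection of $\mathbb{C}^k$ onto $\mathcal{N}_{B^H}$. Then

(20.4) $\mathcal{E}_\pm(E) = \mathcal{E}_\pm(PAP) + \operatorname{rank}(B).$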

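Proof. Let $m = \operatorname{rank} B$ and suppose that $m \ge 1$, because otherwise the asserted identity is self-evident. Then, the singular value decomposition of $B$ yields the factorization $B = VSU^H$, where $V$ and $U$ are unitary matrices of sizes $k \times k$ and $\ell \times \ell$, respectively,

$$S = \begin{bmatrix} D & O_{m\times\ell'} \\ O_{k'\times m} & O_{k'\times\ell'} \end{bmatrix}, \quad k' = k - m, \quad \ell' = \ell - m,$$

$D$ is an $m \times m$ positive definite diagonal matrix, and for the sake of definiteness, it is assumed that $k' \ge 1$ and $\ell' \ge 1$. Thus,

$$E = \begin{bmatrix} V & O \\ O & U \end{bmatrix}\begin{bmatrix} V^HAV & S \\ S^H & O \end{bmatrix}\begin{bmatrix} V^H & O \\ O & U^H \end{bmatrix} \sim \begin{bmatrix} \widetilde{A} & S \\ S^H & O \end{bmatrix} = \begin{bmatrix} \widetilde{A}_{11} & \widetilde{A}_{12} & D & O \\ \widetilde{A}_{21} & \widetilde{A}_{22} & O & O \\ D & O & O & O \\ O & O & O & O \end{bmatrix},$$

where the block decomposition of

$$\widetilde{A} = V^HAV = \begin{bmatrix} V_1^H \\ V_2^H \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix} = \begin{bmatrix} V_1^HAV_1 & V_1^HAV_2 \\ V_2^HAV_1 & V_2^HAV_2 \end{bmatrix}$$

is chosen to be compatible with that of $S$, i.e., $\widetilde{A}_{ij} = V_i^HAV_j$ for $i, j = 1, 2$ with $V_1 \in \mathbb{C}^{k\times m}$ and $V_2 \in \mathbb{C}^{k\times k'}$. Moreover, $\widetilde{A}_{11} = \widetilde{A}_{11}^H$, $\widetilde{A}_{12} = \widetilde{A}_{21}^H$, $\widetilde{A}_{22} = \widetilde{A}_{22}^H$ and, since $D$ is invertible, the last matrix on the right is congruent to

$$\begin{bmatrix} O & O & D & O \\ O & \widetilde{A}_{22} & O & O \\ D & O & O & O \\ O & O & O & O \end{bmatrix}, \quad\text{which is congruent to}\quad \begin{bmatrix} \widetilde{A}_{22} & O & O & O \\ O & O & D & O \\ O & D & O & O \\ O & O & O & O \end{bmatrix},$$

as may be easily verified by symmetric block Gaussian elimination for the first congruence and appropriately chosen permutations for the second; the details are left to the reader as an exercise. Therefore, by Lemma 20.6,

$$\mathcal{E}_\pm(E) = \mathcal{E}_\pm(\widetilde{A}_{22}) + \mathcal{E}_\pm\!\left(\begin{bmatrix} O & D \\ D & O \end{bmatrix}\right) = \mathcal{E}_\pm(V_2^HAV_2) + \operatorname{rank}(D) = \mathcal{E}_\pm(PAP) + \operatorname{rank}(B),$$

since $P = V_2V_2^H$,

$$\mathcal{E}_\pm(V_2^HAV_2) = \mathcal{E}_\pm(V_2^HV_2V_2^HAV_2V_2^HV_2) \le \mathcal{E}_\pm(V_2V_2^HAV_2V_2^H) \le \mathcal{E}_\pm(V_2^HAV_2)$$

and $\operatorname{rank}(D) = \operatorname{rank}(B)$. □

Exercise 20.19. Show that if $B, C \in \mathbb{C}^{n\times k}$ and $A = BC^H + CB^H$, then $\mathcal{E}_\pm(A) \le k$. [HINT: Write $A = [B \ \ C]\,J\,[B \ \ C]^H$ for an appropriately chosen signature matrix $J$ and invoke Theorem 20.3.]

Exercise 20.20. Show that if $B \in \mathbb{C}^{p\times q}$ and $B^\dagger$ denotes the Moore–Penrose inverse of $B$, then

$$\begin{bmatrix} O & B \\ B^H & O \end{bmatrix} \sim \begin{bmatrix} BB^\dagger & O \\ O & -B^\dagger B \end{bmatrix}.$$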

Exercise 20.21. Evaluate 8i(Ej) and 80(Ej) for the matrices

[goon _ormoo and E2—oooo Igooo

_1,.1,. E1_[Ik0]

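Lemma 20.9. Let $A = A^H$, $D = D^H$ and

$$E = \begin{bmatrix} A & B & C \\ B^H & D & O \\ C^H & O & O \end{bmatrix}.$$

Then $\mathcal{E}_\pm(E) \ge \mathcal{E}_\pm(D) + \operatorname{rank} C$.

Proof. Let $C = VSU^H$ be the singular value decomposition of $C$. Then

$$E \sim \begin{bmatrix} \widetilde{A} & \widetilde{B} & S \\ \widetilde{B}^H & D & O \\ S^H & O & O \end{bmatrix},$$

where $\widetilde{A} = V^HAV$ and $\widetilde{B} = V^HB$. Thus, if

$$S = \begin{bmatrix} F & O \\ O & O \end{bmatrix},$$

where $F$ is a positive definite diagonal matrix and $\widetilde{A}$ and $\widetilde{B}$ are written in compatible block form as

$$\widetilde{A} = \begin{bmatrix} \widetilde{A}_{11} & \widetilde{A}_{12} \\ \widetilde{A}_{21} & \widetilde{A}_{22} \end{bmatrix} \quad\text{and}\quad \widetilde{B} = \begin{bmatrix} \widetilde{B}_1 \\ \widetilde{B}_2 \end{bmatrix},$$

respectively, then $\widetilde{A}_{11} = \widetilde{A}_{11}^H$, $\widetilde{A}_{12} = \widetilde{A}_{21}^H$, $\widetilde{A}_{22} = \widetilde{A}_{22}^H$ and

(20.5) $E \sim \begin{bmatrix} \widetilde{A}_{11} & \widetilde{A}_{12} & \widetilde{B}_1 & F & O \\ \widetilde{A}_{21} & \widetilde{A}_{22} & \widetilde{B}_2 & O & O \\ \widetilde{B}_1^H & \widetilde{B}_2^H & D & O & O \\ F & O & O & O & O \\ O & O & O & O & O \end{bmatrix} \sim \begin{bmatrix} O & O & O & F & O \\ O & \widetilde{A}_{22} & \widetilde{B}_2 & O & O \\ O & \widetilde{B}_2^H & D & O & O \\ F & O & O & O & O \\ O & O & O & O & O \end{bmatrix}.$

Therefore, upon applying Theorem 20.3 to the identity

$$\begin{bmatrix} O & O & F \\ O & D & O \\ F & O & O \end{bmatrix} = \begin{bmatrix} I & O & O & O & O \\ O & O & I & O & O \\ O & O & O & I & O \end{bmatrix}\begin{bmatrix} O & O & O & F & O \\ O & \widetilde{A}_{22} & \widetilde{B}_2 & O & O \\ O & \widetilde{B}_2^H & D & O & O \\ F & O & O & O & O \\ O & O & O & O & O \end{bmatrix}\begin{bmatrix} I & O & O \\ O & O & O \\ O & I & O \\ O & O & I \\ O & O & O \end{bmatrix}$$

and then invoking Lemma 20.7, it is readily seen that $\mathcal{E}_\pm(E) \ge \operatorname{rank} F + \mathcal{E}_\pm(D)$. This completes the proof, since $\operatorname{rank} F = \operatorname{rank} C$. □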

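Lemma 20.10. Let $E = E^H$ be given by formula (20.3) with $\operatorname{rank} A \ge 1$ and let $A^\dagger$ denote the Moore–Penrose inverse of $A$. Then

(20.6) $\mathcal{E}_\pm(E) \ge \mathcal{E}_\pm(A) + \mathcal{E}_\mp(B^HA^\dagger B).$

Proof. Since $A = A^H$, there exists a unitary matrix $U$ and an invertible diagonal matrix $D$ such that

$$A = U\begin{bmatrix} D & O \\ O & O \end{bmatrix}U^H.$$

Thus, upon writing

$$U = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \quad\text{and}\quad U^HB = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}$$

in compatible block form, it is readily seen that

$$E = \begin{bmatrix} U & O \\ O & I \end{bmatrix}\begin{bmatrix} D & O & B_1 \\ O & O & B_2 \\ B_1^H & B_2^H & O \end{bmatrix}\begin{bmatrix} U^H & O \\ O & I \end{bmatrix},$$

i.e.,

$$E \sim \begin{bmatrix} D & O & B_1 \\ O & O & B_2 \\ B_1^H & B_2^H & O \end{bmatrix}.$$

Moreover, since $D$ is invertible,

$$\begin{bmatrix} D & O & B_1 \\ O & O & B_2 \\ B_1^H & B_2^H & O \end{bmatrix} \sim \begin{bmatrix} D & O \\ O & E_1 \end{bmatrix},$$

where

$$E_1 = \begin{bmatrix} O & B_2 \\ B_2^H & O \end{bmatrix} - \begin{bmatrix} O \\ B_1^H \end{bmatrix}D^{-1}\begin{bmatrix} O & B_1 \end{bmatrix} = \begin{bmatrix} O & B_2 \\ B_2^H & -B_1^HD^{-1}B_1 \end{bmatrix}.$$

Thus,

(20.7) $\mathcal{E}_\pm(E) = \mathcal{E}_\pm(D) + \mathcal{E}_\pm(E_1) = \mathcal{E}_\pm(A) + \mathcal{E}_\pm(E_1).$

Much the same sort of analysis implies that

$$\mathcal{E}_\pm(E_1) \ge \mathcal{E}_\pm(-B_1^HD^{-1}B_1) = \mathcal{E}_\mp(B_1^HD^{-1}B_1)$$

and hence, as

$$B_1^HD^{-1}B_1 = \begin{bmatrix} B_1^H & B_2^H \end{bmatrix}\begin{bmatrix} D^{-1} & O \\ O & O \end{bmatrix}\begin{bmatrix} B_1 \\ B_2 \end{bmatrix} = B^HU\begin{bmatrix} D^{-1} & O \\ O & O \end{bmatrix}U^HB = B^HA^\dagger B,$$

implies that

$$\mathcal{E}_\pm(E_1) \ge \mathcal{E}_\mp(B^HA^\dagger B).$$

The asserted inequality (20.6) now drops out easily upon combining the last inequality with (20.7). □

Exercise 20.22. Let $B \in \mathbb{C}^{p\times q}$, $C = C^H \in \mathbb{C}^{q\times q}$ and let $E = \begin{bmatrix} O & B \\ B^H & C \end{bmatrix}$. Show that $\mathcal{E}_\pm(E) \ge \max\{\mathcal{E}_\pm(C), \operatorname{rank}(B)\}$.

20.5. Exploiting continuity

In this section we shall illustrate by example how to exploit the fact that the eigenvalues of a matrix $A \in \mathbb{C}^{n\times n}$ are continuous functions of the entries in the matrix to obtain information on the location of its eigenvalues. But see also Exercise 20.24. The facts in Appendix A may be helpful.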

Theorem 20.11. Let

$$E = \begin{bmatrix} A & B & I_p \\ B^H & I_p & O \\ I_p & O & O \end{bmatrix},$$

in which $A, B \in \mathbb{C}^{p\times p}$ and $A = A^H$. Then $\mathcal{E}_+(E) = 2p$ and $\mathcal{E}_-(E) = p$.

Proof. Let

$$E(t) = \begin{bmatrix} tA & tB & I_p \\ tB^H & I_p & O \\ I_p & O & O \end{bmatrix}.$$

Then it is readily checked that:

(1) $E(t)$ is invertible for every choice of $t \in \mathbb{R}$.

(2) $\mathcal{E}_+(E(0)) = 2p$ and $\mathcal{E}_-(E(0)) = p$.

(3) The set $\Omega_1 = \{t \in \mathbb{R} : \mathcal{E}_+(E(t)) = 2p\}$ is an open subset of $\mathbb{R}$.

(4) The set $\Omega_2 = \{t \in \mathbb{R} : \mathcal{E}_+(E(t)) \ne 2p\}$ is an open subset of $\mathbb{R}$.

(5) $\mathbb{R} = \Omega_1 \cup \Omega_2$.

(6) $\Omega_1 \ne \emptyset$.

Thus, as the connected set $\mathbb{R} = \Omega_1 \cup \Omega_2$ is the union of two disjoint open sets, Item (6) implies that $\Omega_2 = \emptyset$. Therefore, $\Omega_1 = \mathbb{R}$. □

Exercise 20.23. Verify the six items that are listed in the proof of Theorem 20.11.

Exercise 20.24. Show that the matrix $E(t)$ that is introduced in the proof of Theorem 20.11 is congruent to $E(0)$ for every choice of $t \in \mathbb{R}$.

20.6. Geršgorin disks

Let $A \in \mathbb{C}^{n\times n}$ with entries $a_{ij}$ and let

$$\rho_i(A) = \sum_{j=1}^n |a_{ij}| - |a_{ii}| \quad\text{for } i = 1, \ldots, n.$$

Then the set

$$\Gamma_i(A) = \{\lambda \in \mathbb{C} : |\lambda - a_{ii}| \le \rho_i(A)\}$$

is called the $i$'th Geršgorin disk of $A$.

Theorem 20.12. If $A \in \mathbb{C}^{n\times n}$, then:

(1) $\sigma(A) \subseteq \bigcup_{i=1}^n \Gamma_i(A)$.

(2) A union $\Omega_1$ of $k$ Geršgorin disks that has no points in common with the union $\Omega_2$ of the remaining $n - k$ Geršgorin disks contains exactly $k$ eigenvalues of $A$.

Proof. If $\lambda \in \sigma(A)$, then there exists a nonzero vector $u \in \mathbb{C}^n$ with components $u_1, \ldots, u_n$ such that $(\lambda I_n - A)u = 0$. Suppose that

$$|u_k| = \max\{|u_j| : j = 1, \ldots, n\}.$$

Then the identity

$$(\lambda - a_{kk})u_k = \sum_{j=1}^n a_{kj}u_j - a_{kk}u_k$$

implies that

$$|(\lambda - a_{kk})u_k| \le \sum_{j=1}^n |a_{kj}||u_j| - |a_{kk}||u_k| \le \rho_k(A)|u_k|.$$

Therefore,

$$\lambda \in \Gamma_k(A) \subseteq \bigcup_{i=1}^n \Gamma_i(A).$$

This completes the proof of (1). Next, to verify (2), let $D = \operatorname{diag}\{a_{11}, \ldots, a_{nn}\}$ and let

$$B(t) = D + t(A - D) \quad\text{for } 0 \le t \le 1.$$

Then

$$\sigma(B(t)) \subseteq \Omega_1(t) \cup \Omega_2(t) \quad\text{for } 0 \le t \le 1,$$

where $\Omega_j(t)$ denotes the union of the disks with centers in $\Omega_j$ but with radii $\rho_i(B(t)) = t\rho_i(B(1))$ for $0 \le t \le 1$. Clearly $\Omega_1(0)$ contains exactly $k$ eigenvalues of $D = B(0)$, and $\Omega_2(0)$ contains exactly $n - k$ eigenvalues of $D = B(0)$. Moreover, since the eigenvalues of $B(t)$ must belong to $\Omega_1(t) \cup \Omega_2(t)$ and vary continuously with $t$, the assertion follows from the inclusions

$$\Omega_1(t) \subseteq \Omega_1(1) \quad\text{and}\quad \Omega_2(t) \subseteq \Omega_2(1) \quad\text{for } 0 \le t \le 1$$

and the fact that $\Omega_1(1) \cap \Omega_2(1) = \emptyset$. □
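Both assertions of Theorem 20.12 can be seen at work on a $2 \times 2$ example. The following is a minimal sketch under stated assumptions, not a general-purpose routine: the function names `gershgorin_disks` and `eigenvalues_2x2` are hypothetical, and the eigenvalues are obtained from the quadratic formula, which only covers the $2 \times 2$ case.

```python
import cmath


def gershgorin_disks(A):
    """List of (center a_ii, radius rho_i(A)) for a square matrix given as rows."""
    disks = []
    for i, row in enumerate(A):
        radius = sum(abs(x) for x in row) - abs(row[i])
        disks.append((row[i], radius))
    return disks


def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial of a 2 x 2 matrix."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]


A = [[4, 1], [2, -3]]
disks = gershgorin_disks(A)
eigs = eigenvalues_2x2(A)

# Assertion (1): every eigenvalue lies in the union of the disks.
all_covered = all(any(abs(z - c) <= r for c, r in disks) for z in eigs)

# Assertion (2): the two disks are disjoint, so each holds exactly one eigenvalue.
per_disk = [sum(1 for z in eigs if abs(z - c) <= r) for c, r in disks]
```

Here the disks are centered at $4$ and $-3$ with radii $1$ and $2$; they do not intersect, and the eigenvalues $(1 \pm \sqrt{57})/2$ fall one in each disk.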

Exercise 20.25. Show that the spectral radius $r_\sigma(A)$ of a matrix $A \in \mathbb{C}^{n\times n}$ is subject to the bound

$$r_\sigma(A) \le \max\left\{\sum_{j=1}^n |a_{ij}| : i = 1, \ldots, n\right\}.$$

Exercise 20.26. Show that the spectral radius $r_\sigma(A)$ of a matrix $A \in \mathbb{C}^{n\times n}$ is subject to the bound

$$r_\sigma(A) \le \max\left\{\sum_{i=1}^n |a_{ij}| : j = 1, \ldots, n\right\}.$$

Exercise 20.27. Let $A \in \mathbb{C}^{n\times n}$. Show that if $|a_{ii}| > \rho_i(A)$ for $i = 1, \ldots, n$, then $A$ is invertible.

Exercise 20.28. Let $A \in \mathbb{C}^{n\times n}$. Show that $A$ is a diagonal matrix if and only if $\sigma(A) = \bigcup_{i=1}^n \Gamma_i(A)$.

20.7. The spectral mapping principle

In this section we shall give an elementary proof of the spectral mapping principle for polynomials because it fits the theme of this chapter; see Theorem 17.26 for a stronger result.

Theorem 20.13. Let $A \in \mathbb{C}^{n\times n}$, let $\lambda_1, \ldots, \lambda_k$ denote the distinct eigenvalues of $A$ and let $p(\lambda)$ be a polynomial. Then

(20.8) $\det(\lambda I_n - A) = (\lambda - \lambda_1)^{\alpha_1}\cdots(\lambda - \lambda_k)^{\alpha_k} \Longrightarrow \det(\lambda I_n - p(A)) = (\lambda - p(\lambda_1))^{\alpha_1}\cdots(\lambda - p(\lambda_k))^{\alpha_k};$

(20.9) $\sigma(p(A)) = p(\sigma(A)).$
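A quick sanity check of (20.8) is possible whenever the eigenvalues can be read off directly. The sketch below is an illustration with hypothetical helper names (`matmul`, `poly_of_matrix`, `poly`); it assumes an upper triangular matrix, so that the eigenvalues of both $A$ and $p(A)$ are their diagonal entries, and it uses integer arithmetic to avoid roundoff.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def poly_of_matrix(coeffs, A):
    """Evaluate p(A) by Horner's rule; coeffs = [a_0, a_1, ..., a_l]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += a
    return result


def poly(coeffs, x):
    """Evaluate the scalar polynomial p(x) by Horner's rule."""
    value = 0
    for a in reversed(coeffs):
        value = value * x + a
    return value


A = [[1, 1], [0, 2]]        # upper triangular, eigenvalues 1 and 2
coeffs = [1, 0, 1]          # p(x) = 1 + x^2
p_of_A = poly_of_matrix(coeffs, A)
diagonal = [p_of_A[i][i] for i in range(2)]
mapped = [poly(coeffs, eigenvalue) for eigenvalue in (1, 2)]
```

The matrix $p(A)$ is again upper triangular, and its diagonal entries coincide with $p(1)$ and $p(2)$, as (20.9) predicts.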

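Proof. Let $V$ be an invertible matrix such that $V^{-1}AV = J$ is in Jordan form. Then it is readily checked that $J^k = D^k + T_k$ for $k = 1, 2, \ldots$, where $D = \operatorname{diag}\{d_{11}, \ldots, d_{nn}\}$ is a diagonal matrix and $T_k$ is strictly upper triangular. Thus, if the polynomial $p(\lambda) = a_0 + a_1\lambda + \cdots + a_\ell\lambda^\ell$, then

$$p(J) = p(D) + \sum_{j=1}^\ell a_jT_j.$$

Consequently,

$$\det(\lambda I_n - p(A)) = \det(\lambda I_n - p(J)) = \det(\lambda I_n - p(D)) = (\lambda - p(d_{11}))\cdots(\lambda - p(d_{nn})),$$

which coincides with the right-hand side of (20.8), since each eigenvalue $\lambda_j$ appears exactly $\alpha_j$ times among the diagonal entries $d_{11}, \ldots, d_{nn}$.

Chapter 21

Zero location problems

Let $f(\lambda) = f_0 + f_1\lambda + \cdots + f_n\lambda^n$, $f_n \ne 0$, be a polynomial of degree $n$ with coefficients $f_0, \ldots, f_n \in \mathbb{C}$ and let

$$g(\lambda) = g_0 + g_1\lambda + \cdots + g_n\lambda^n$$

be a polynomial of degree less than or equal to $n$ with coefficients $g_0, \ldots, g_n \in \mathbb{C}$, at least one of which is nonzero. Then the matrix $B \in \mathbb{C}^{n\times n}$ with entries $b_{ij}$, $i, j = 0, \ldots, n-1$, that is uniquely defined by the formulas

(21.1) $\dfrac{f(\lambda)g(\mu) - g(\lambda)f(\mu)}{\lambda - \mu} = \sum_{i,j=0}^{n-1} \lambda^ib_{ij}\mu^j = v(\lambda)^TBv(\mu)$

and

$$v(\lambda)^T = \begin{bmatrix} 1 & \lambda & \cdots & \lambda^{n-1} \end{bmatrix},$$

is called the Bezoutian of the polynomials $f(\lambda)$ and $g(\lambda)$ and will be denoted by the symbol $B(f, g)$ (or just plain $B$ if the polynomials are clear from the context). The first main objective of this chapter is to verify the formula

(21.2) $\dim\mathcal{N}_{B(f,g)} = \nu(f, g),$

where

$\nu(f, g)$ = the number of common roots of the polynomials $f(\lambda)$ and $g(\lambda)$, counting multiplicities.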
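Formula (21.2) can be tested on small examples by computing the Bezoutian directly from its defining identity. The sketch below is an illustration and not the author's method: the function names `bezoutian` and `rank` are hypothetical, polynomials are assumed to be given as coefficient lists in increasing order of degree, and the entries $b_{ij}$ are obtained by dividing the two-variable numerator $f(\lambda)g(\mu) - g(\lambda)f(\mu)$ by $\lambda - \mu$ one coefficient at a time.

```python
from fractions import Fraction


def bezoutian(f, g):
    """Bezoutian B(f, g) of size n x n, where n = deg f and deg g <= n."""
    n = len(f) - 1
    g = list(g) + [0] * (n + 1 - len(g))
    # c[p][q] is the coefficient of lambda^p mu^q in f(lambda)g(mu) - g(lambda)f(mu).
    c = [[f[p] * g[q] - g[p] * f[q] for q in range(n + 1)] for p in range(n + 1)]
    # (lambda - mu) * sum b_ij lambda^i mu^j = numerator gives
    # c[p][q] = b[p-1][q] - b[p][q-1], solved from the highest power of lambda down.
    b = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(n):
            carry = b[i + 1][j - 1] if (i + 1 < n and j >= 1) else 0
            b[i][j] = c[i + 1][j] + carry
    return b


def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for col in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


f = [2, -3, 1]    # (lambda - 1)(lambda - 2)
g = [-1, 1]       # lambda - 1, one common root with f
B = bezoutian(f, g)
nullity = len(B) - rank(B)
```

In this example the quotient in (21.1) equals $(\lambda - 1)(\mu - 1)$, so $B$ has rank one and its null space is spanned by $v(1)$, in agreement with the single common root.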

If $f(\lambda)$ has $n$ distinct roots, then the verification of (21.2) is a simple consequence of the following observation:

Lemma 21.1. If $f(\lambda)$ is a polynomial of degree $n$ with $n$ distinct roots $\alpha_1, \ldots, \alpha_n$, $g(\lambda)$ is a polynomial of degree $\le n$ and $V$ is the Vandermonde matrix with columns $v(\alpha_1), \ldots, v(\alpha_n)$, then

(21.3) $V^TB(f, g)V = \operatorname{diag}\{f'(\alpha_1)g(\alpha_1), \ldots, f'(\alpha_n)g(\alpha_n)\}.$

Exercise 21.1. Verify Lemma 21.1. [HINT: Compute $v(\alpha_j)^TB(f, g)v(\alpha_k)$ for $j = k$ and $j \ne k$ via (21.1).]

If $f$ has $n$ distinct roots, then $V$ is invertible. Therefore, formula (21.3) implies that $\dim\mathcal{N}_{B(f,g)}$ is equal to the number of points $\alpha_j$ at which $g(\alpha_j) = 0$ and hence that (21.2) holds in this case. The next step is to verify the inequality

(21.4) $\dim\mathcal{N}_{B(f,g)} \ge \nu(f, g)$

when $f$ is not restricted to have distinct roots.

To begin with, it is readily seen that if $f(\alpha) = 0$ and $g(\alpha) = 0$, then

$$v(\lambda)^TBv(\alpha) = 0$$

for every point $\lambda \in \mathbb{C}$ and hence that

$$Bv(\alpha) = 0.$$

Moreover, if $f(\alpha) = f'(\alpha) = 0$ and $g(\alpha) = g'(\alpha) = 0$, then the identity

$$\frac{f(\lambda)g'(\mu) - g(\lambda)f'(\mu)}{\lambda - \mu} + \frac{f(\lambda)g(\mu) - g(\lambda)f(\mu)}{(\lambda - \mu)^2} = v(\lambda)^TBv'(\mu),$$

which is obtained by differentiating both sides of formula (21.1) with respect to $\mu$, implies that

(21.5) $v(\lambda)^TBv'(\alpha) = 0$

for every point $\lambda \in \mathbb{C}$. Therefore, $\dim\mathcal{N}_B \ge 2$, since the vectors $v(\alpha)$ and $v'(\alpha)$ both belong to $\mathcal{N}_B$ and are linearly independent.

21.1. Bezoutians

Much the same sort of reasoning leads rapidly to the conclusion that if

f(a) = f(1)(a) = "'f(k_1)(0\ + ' ' ' + an = f0 + Alf1"'fnl(1n — ANTVlel

for polynomials f (A) = f0 + f1)\ + - - - + a" will be helpful. Exercise 21.2. Verify formula (21.9).

490

21. Zero location problems

Theorem 21.2. If $f(\lambda) = f_0 + \cdots + f_n\lambda^n$ is a polynomial of degree $n$, i.e., $f_n \ne 0$, and $n \ge 2$, and if $\varphi_k(\lambda) = \lambda^k$, then

’ HEW] if k=0, (21.10)

[‘fi} [0,k—1]

0

O

H[k+1,n] ]

B(f,(,0k) =
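$$f(\lambda) - f(\mu) = \begin{bmatrix} f_1 & \cdots & f_n \end{bmatrix}\{\lambda(I_n - \lambda N^T)^{-1} - \mu(I_n - \mu N^T)^{-1}\}e_1 = \begin{bmatrix} f_1 & \cdots & f_n \end{bmatrix}\{(\lambda - \mu)(I_n - \lambda N^T)^{-1}(I_n - \mu N^T)^{-1}\}e_1$$

and, consequently,

(21.11) $\dfrac{f(\lambda) - f(\mu)}{\lambda - \mu} = \begin{bmatrix} f_1 & \cdots & f_n \end{bmatrix}(I_n - \lambda N^T)^{-1}(I_n - \mu N^T)^{-1}e_1.$

However, since

(21.12) $\begin{bmatrix} f_1 & \cdots & f_n \end{bmatrix}(N^T)^k = e_{k+1}^TH_f$ for $k = 0, \ldots, n-1$,

it is easily seen that

$$\begin{bmatrix} f_1 & \cdots & f_n \end{bmatrix}(I_n - \lambda N^T)^{-1} = \begin{bmatrix} f_1 & \cdots & f_n \end{bmatrix}\sum_{j=0}^{n-1} \lambda^j(N^T)^j = \sum_{j=0}^{n-1} \lambda^je_{j+1}^TH_f$$

and hence that

(21.13) $\begin{bmatrix} f_1 & \cdots & f_n \end{bmatrix}(I_n - \lambda N^T)^{-1} = \begin{bmatrix} 1 & \lambda & \cdots & \lambda^{n-1} \end{bmatrix}H_f.$

Thus, substituting formula (21.13) into formula (21.11),

$$\frac{f(\lambda) - f(\mu)}{\lambda - \mu} = \begin{bmatrix} 1 & \lambda & \cdots & \lambda^{n-1} \end{bmatrix}H_f(I_n - \mu N^T)^{-1}e_1 = v(\lambda)^TH_fv(\mu);$$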

= V()\)THfV(M); i.e., B(f,\2m and

$$b(\lambda) = p_1\lambda - p_3\lambda^3 + \cdots + (-1)^{m-1}p_{2m-1}\lambda^{2m-1}.$$

If $n = 2m + 1$, then

$$a(\lambda) = p_0 - p_2\lambda^2 + \cdots + (-1)^mp_{2m}\lambda^{2m}$$

and

$$b(\lambda) = p_1\lambda - p_3\lambda^3 + \cdots + (-1)^mp_{2m+1}\lambda^{2m+1}.$$

21.7. Stable polynomials

Proof. Let $f(\lambda) = p(i\lambda)$ and $f^\#(\lambda) = \overline{f(\overline{\lambda})}$. Then it is readily checked that:

(1) $a(\lambda) = \dfrac{f(\lambda) + f^\#(\lambda)}{2}$.

(2) $b(\lambda) = \dfrac{f(\lambda) - f^\#(\lambda)}{2i}$.

(3) $B(a, b) = \dfrac{i}{2}B(f, f^\#)$.

(4) $p(\lambda)$ is stable if and only if the roots of $f(\lambda)$ belong to the open upper half plane $\mathbb{C}_+$.

Now, to begin the real work, suppose that the Bezoutian $B(a, b) \prec 0$. Then $-iB(f, f^\#) \succ O$ and hence, in view of Theorem 21.13, the roots of the polynomial $f(\lambda)$ are all in $\mathbb{C}_+$. Therefore, the roots of $p(\lambda)$ are all in $\Pi_-$; i.e., $p(\lambda)$ is stable.

The main step in the proof of the converse is to show that if $p(\lambda)$ is stable, then $B(a, b)$ is invertible because if $B(a, b)$ is invertible, then $B(f, f^\#)$ is invertible, and hence, by Theorem 21.13,

$$\mathcal{E}_+(-iB(f, f^\#)) = \text{the number of roots of } f(\lambda) \text{ in } \mathbb{C}_+ = \text{the number of roots of } p(\lambda) \text{ in } \Pi_- = n;$$

i.e., $-iB(f, f^\#) \succ O$ and therefore $B(a, b) \prec 0$. To complete the proof, it suffices to show that $a(\lambda)$ and $b(\lambda)$ have no common roots. Suppose to the contrary that $a(\alpha) = b(\alpha) = 0$ for some point $\alpha \in \mathbb{C}$. Then

$$p(i\alpha) = a(\alpha) + ib(\alpha) = 0.$$

Therefore, $i\alpha \in \Pi_-$. However, since $a(\lambda)$ and $b(\lambda)$ have real coefficients, it follows that

$$p(i\overline{\alpha}) = a(\overline{\alpha}) + ib(\overline{\alpha}) = \overline{a(\alpha)} + i\,\overline{b(\alpha)} = 0$$

also. Therefore, if $\alpha$ is a common root of $a(\lambda)$ and $b(\lambda)$, then $i\alpha$ and $i\overline{\alpha}$ both belong to $\Pi_-$; i.e., the real part of $i\alpha$ is negative and the real part of $i\overline{\alpha}$ is negative. But this is impossible. □
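The passage from $p$ to the pair $a$, $b$ is purely a matter of bookkeeping with the powers of $i$. The following sketch is an illustration only, with the hypothetical helper names `split_on_imaginary_axis` and `evaluate`; it assumes that $p$ has real coefficients listed in increasing order of degree and returns coefficient lists for $a$ and $b$ such that $p(ix) = a(x) + ib(x)$ for real $x$.

```python
def split_on_imaginary_axis(p):
    """Return (a, b) with p(i*x) = a(x) + i*b(x) for a real coefficient list p."""
    a = [0] * len(p)
    b = [0] * len(p)
    for j, coefficient in enumerate(p):
        sign = -1 if (j // 2) % 2 else 1   # i^j = sign for even j, sign * i for odd j
        if j % 2 == 0:
            a[j] = sign * coefficient
        else:
            b[j] = sign * coefficient
    return a, b


def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given in increasing order of degree."""
    value = 0
    for c in reversed(coeffs):
        value = value * x + c
    return value


p = [2, 3, 1]                     # p(x) = 2 + 3x + x^2, roots -1 and -2
a, b = split_on_imaginary_axis(p)
x = 0.5
difference = evaluate(p, 1j * x) - (evaluate(a, x) + 1j * evaluate(b, x))
```

For this stable polynomial $a(\lambda) = 2 - \lambda^2$ and $b(\lambda) = 3\lambda$, which have no common root, consistent with the last step of the proof.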

Exercise 21.15. Verify the four assertions (1)–(4) that are listed at the beginning of the proof of Theorem 21.15. [HINT: To check (4) note that if $p(\lambda) = (\lambda - \lambda_1)^{m_1}\cdots(\lambda - \lambda_k)^{m_k}$, then $p(i\lambda) = (i\lambda + i^2\lambda_1)^{m_1}\cdots(i\lambda + i^2\lambda_k)^{m_k}$.]

Exercise 21.16. Show that if $p(\lambda)$ and $q(\lambda)$ are two stable polynomials with either the same even part or the same odd part, then $tp(\lambda) + (1 - t)q(\lambda)$ is stable when $0 \le t \le 1$. (In other words, the set of stable polynomials with real coefficients that have the same even (respectively odd) part is a convex set.)

21.8. Kharitonov's theorem

A problem of great practical interest is to determine when a given polynomial

$$p(\lambda) = p_0 + p_1\lambda + \cdots + p_n\lambda^n$$

is stable. Moreover, in realistic problems the coefficients may only be known approximately, i.e., within the bounds

(21.37) $\underline{p}_j \le p_j \le \overline{p}_j$ for $j = 0, \ldots, n$.

Thus, in principle it is necessary to show that every polynomial that meets the stated constraints on its coefficients has all its roots in $\Pi_-$. A remarkable theorem of Kharitonov states that it is enough to check four extremal cases.

Theorem 21.17. (Kharitonov) Let $\underline{p}_j \le \overline{p}_j$, $j = 0, \ldots, n$, be given, and let

$$\begin{aligned}
\varphi_1(\lambda) &= \underline{p}_0 + \overline{p}_2\lambda^2 + \underline{p}_4\lambda^4 + \cdots \\
\varphi_2(\lambda) &= \overline{p}_0 + \underline{p}_2\lambda^2 + \overline{p}_4\lambda^4 + \cdots \\
\psi_1(\lambda) &= \underline{p}_1\lambda + \overline{p}_3\lambda^3 + \underline{p}_5\lambda^5 + \cdots \\
\psi_2(\lambda) &= \overline{p}_1\lambda + \underline{p}_3\lambda^3 + \overline{p}_5\lambda^5 + \cdots.
\end{aligned}$$

Then every polynomial

$$p(\lambda) = p_0 + p_1\lambda + \cdots + p_n\lambda^n$$

with coefficients $p_j$ that are subject to the constraints (21.37) is stable if and only if the four polynomials

$B(a, b) \prec 0$.

Theorem 21.14 is useful for this.

Suppose for the sake of definiteness that $n = 2m$ and $p_n > 0$. Then clearly $a_1(\lambda) \le a(\lambda) \le a_2(\lambda)$ for every point $\lambda \in \mathbb{R}$. Moreover, since the Bezoutians $B(a_1, b_1) \prec O$ and $B(a_2, b_1) \prec O$, $b_1(\lambda)$ has $n - 1$ distinct real roots and the polynomials $a_1(\lambda)$ and $a_2(\lambda)$ each have $n$ distinct real roots that interlace the roots of $b_1(\lambda)$. Thus, by Exercise 21.13, the polynomial $a(\lambda)$ also has $n$ real roots $\lambda_1, \ldots, \lambda_n$ that interlace the roots of $b_1(\lambda)$ and meet the constraint $a'(\lambda_j)b_1(\lambda_j) < 0$. Therefore, by Theorem 21.14, $B(a, b_1) \prec 0$. Similar considerations imply that $a'(\lambda_j)b_2(\lambda_j) < 0$ and $B(a, b_2) \prec 0$. Moreover, since $b_1(\lambda) \le b(\lambda) \le b_2(\lambda)$ for $\lambda > 0$, $b_2(\lambda) \le b(\lambda) \le b_1(\lambda)$ for $\lambda < 0$ and $b_1(0) = b(0) = b_2(0) = 0$, the roots of $a(\lambda)$ interlace the roots of $b(\lambda)$ and $a'(\lambda_j)b(\lambda_j) < 0$ for $j = 1, \ldots, n$. Thus, Theorem 21.14 guarantees that $B(a, b) \prec O$. This completes the proof for the case under consideration. The other cases may be handled in much the same way. The details are left to the reader. □
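Kharitonov's theorem reduces a test on an infinite family to four polynomials, which makes it easy to try out. The sketch below is an illustration under stated assumptions and not the Bezoutian argument used in the proof: stability of each corner polynomial is decided by the classical Routh array (a swapped-in technique), the function names `kharitonov_polynomials`, `is_stable` and `interval_family_is_stable` are hypothetical, coefficients are listed in increasing order of degree, and the leading coefficient bounds are assumed to be positive so that the degree does not drop.

```python
from fractions import Fraction


def kharitonov_polynomials(lower, upper):
    """The four corner polynomials phi_j + psi_k as ascending coefficient lists."""
    def coefficient(j, starts_with_lower):
        use_lower = ((j // 2) % 2 == 0) == starts_with_lower
        return lower[j] if use_lower else upper[j]

    corners = []
    for even_starts_low in (True, False):       # phi_1, then phi_2
        for odd_starts_low in (True, False):    # psi_1, then psi_2
            corners.append([
                coefficient(j, even_starts_low if j % 2 == 0 else odd_starts_low)
                for j in range(len(lower))
            ])
    return corners


def is_stable(p):
    """Routh test: True if all roots of p lie in the open left half plane."""
    c = [Fraction(x) for x in reversed(p)]      # descending order
    if c[0] < 0:
        c = [-x for x in c]
    n = len(c) - 1
    width = n // 2 + 1
    rows = [c[0::2] + [Fraction(0)] * (width - len(c[0::2])),
            c[1::2] + [Fraction(0)] * (width - len(c[1::2]))]
    for _ in range(n - 1):
        above, last = rows[-2], rows[-1]
        if last[0] == 0:
            return False                        # a zero pivot rules out strict stability
        rows.append([(last[0] * above[i + 1] - above[0] * last[i + 1]) / last[0]
                     for i in range(width - 1)] + [Fraction(0)])
    return all(row[0] > 0 for row in rows[:n + 1])


def interval_family_is_stable(lower, upper):
    return all(is_stable(p) for p in kharitonov_polynomials(lower, upper))


corners = kharitonov_polynomials([1, 2, 1], [2, 3, 1])
family_1 = interval_family_is_stable([1, 2, 1], [2, 3, 1])
family_2 = interval_family_is_stable([1, -1, 1], [2, 1, 1])
```

The first family consists of quadratics with positive coefficients, so all four corners pass; the second family contains $1 - \lambda + \lambda^2$ as a corner, which already fails.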

21.9. Bibliographical notes

My main source for early versions of this chapter was the book [56]. However, a number of the proofs changed considerably as the chapter evolved. The use of a realization formula to establish Barnett's identity in this edition leads to considerable reduction in the computational effort. Formula (12.19) and its block matrix analogue (12.46) exhibit the inverse of a Toeplitz matrix as a Bezoutian that is tailored to the unit circle rather than to $\mathbb{R}$ or $i\mathbb{R}$. The application of Bezoutians to verify Kharitonov's theorem is due to A. Olshevsky and V. Olshevsky [70]. There are also versions of Kharitonov's theorem for polynomials with complex coefficients. Exercise 21.16 is based on an observation in the paper [20]. Exercise 21.9 is based on a variant of a calculation by Percy Deift [private communication]. Lemma 21.1 is adapted from a paper by F. I. Lander [57]. A very neat way to establish the properties of Bezoutians and Resultants that avoids the use of Barnett's identity is presented in [22]. Another approach that extends Lemma 21.1 to the case of nonsimple roots with the help of the Barnett identity is [78].

Chapter 22

Convexity

All extremism, fanaticism and obscurantism come from a lack of security. A person who is secure cannot be an extremist. He uses his heart and his mind in a normal fashion.

I resent very much that certain roshei yeshiva and certain teachers want to impose their will upon the boys. ways are correct it is up to the individual to make the decision.

Rabbi Joseph B. Soloveitchik, cited in [74], pp. 237 and 240

• Warning: The warnings in preceding chapters should be kept in mind.

22.1. Preliminaries

Recall that a subset $Q$ of a vector space $\mathcal{U}$ over $\mathbb{F}$ is said to be convex if

$$u, v \in Q \Longrightarrow \lambda u + (1-\lambda) v \in Q \quad\text{for every } \lambda \text{ such that } 0 \le \lambda \le 1;$$

and if $\mathcal{U}$ is a normed linear space over $\mathbb{F}$ with norm $\|\ \|$ and $v \in \mathcal{U}$, then $\overline{B_r(v)} = \{u \in \mathcal{U} : \|u - v\| \le r\}$ is a closed convex subset of $\mathcal{U}$, whereas $B_r(v) = \{u \in \mathcal{U} : \|u - v\| < r\}$ is an open convex subset of $\mathcal{U}$ for every choice of $r > 0$.

Exercise 22.1. Verify that the two sets indicated just above are both convex. [HINT: $\lambda u_1 + (1-\lambda) u_2 - v = \lambda(u_1 - v) + (1-\lambda)(u_2 - v)$.]

Exercise 22.2. Show that $Q = \{A \in \mathbb{C}^{n\times n} : A \succeq O\}$ is a convex set.

Figure 1

Exercise 22.3. Show that $Q = \{(A, B, C) \in \mathbb{C}^{n\times n} \times \mathbb{C}^{n\times n} \times \mathbb{C}^{n\times n} : \|A\| \le 1,\ \|B\| \le 1 \text{ and } \|C\| \le 1\}$ is a convex set.

Exercise 22.4. Show that the four-sided figure in Figure 1 is not convex.

A convex combination of $n$ vectors is a sum of the form

$$\sum_{i=1}^n \lambda_i v_i \quad\text{with } \lambda_i \ge 0 \text{ and } \sum_{i=1}^n \lambda_i = 1. \tag{22.1}$$

Lemma 22.1. Let $Q$ be a nonempty subset of a vector space $\mathcal{U}$ over $\mathbb{F}$. Then $Q$ is convex if and only if it is closed under convex combinations, i.e., if and only if for every integer $n \ge 1$,

$$v_1, \ldots, v_n \in Q \Longrightarrow \sum_{i=1}^n \lambda_i v_i \in Q$$

for every choice of nonnegative numbers $\lambda_1, \ldots, \lambda_n$ such that $\lambda_1 + \cdots + \lambda_n = 1$.

Proof. Suppose first that $Q$ is convex, and that $v_1, \ldots, v_n \in Q$. Then, if $n > 2$, $\lambda_1 < 1$ and $\mu_j = \lambda_j/(1 - \lambda_1)$ for $j = 2, \ldots, n$, the formula

$$\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n = \lambda_1 v_1 + (1 - \lambda_1)\left\{\frac{\lambda_2 v_2 + \cdots + \lambda_n v_n}{1 - \lambda_1}\right\}$$

implies that

$$\lambda_1 v_1 + \cdots + \lambda_n v_n \in Q \Longleftrightarrow \mu_2 v_2 + \cdots + \mu_n v_n \in Q.$$

Thus, $Q$ is closed under convex combinations of $n$ vectors if and only if it is closed under convex combinations of $n - 1$ vectors. Therefore, $Q$ is closed under convex combinations of $n$ vectors if and only if it is closed under convex combinations of 2 vectors, i.e., if and only if $Q$ is convex. □


Lemma 22.2. Let $x \in \mathbb{C}^n$, let $Q$ be a nonempty subset of $\mathbb{C}^n$ and let

$$d = \inf\{\|x - u\| : u \in Q\}.$$

Then the following conclusions hold:

(a) If $Q$ is closed, then there exists at least one vector $u_0 \in Q$ such that $d = \|x - u_0\|$.

(b) If $Q$ is closed and convex, then there exists exactly one vector $u_0 \in Q$ such that $d = \|x - u_0\|$.

Proof. Choose a sequence of vectors $u_1, u_2, \ldots \in Q$ such that

$$\|x - u_k\| \le d + \frac{1}{k} \quad\text{for } k = 1, 2, \ldots.$$

Then the bound

$$\|u_k\| = \|u_k - x + x\| \le \|u_k - x\| + \|x\| \le d + \frac{1}{k} + \|x\|$$

guarantees that the vectors $u_k$ are bounded and hence that a subsequence $u_{k_1}, u_{k_2}, \ldots$ converges to a limit $u_0$, which must belong to $Q$ if $Q$ is closed. The bounds

$$d \le \|x - u_0\| \le \|x - u_{k_j}\| + \|u_{k_j} - u_0\| \le d + \frac{1}{k_j} + \|u_{k_j} - u_0\|$$

serve to complete the proof of (a).

Suppose next that $Q$ is both closed and convex and that $d = \|x - u_0\| = \|x - v_0\|$. Then, by the parallelogram law,

$$4d^2 = 2\|x - u_0\|^2 + 2\|x - v_0\|^2 = \|x - u_0 + x - v_0\|^2 + \|v_0 - u_0\|^2 = 4\left\|x - \frac{u_0 + v_0}{2}\right\|^2 + \|v_0 - u_0\|^2 \ge 4d^2 + \|v_0 - u_0\|^2.$$

Therefore, $0 \ge \|v_0 - u_0\|$, which proves uniqueness. □

22.2. Convex functions

Recall that a real-valued function $f(x)$ that is defined on a convex subset $Q$ of a vector space $\mathcal{U}$ over $\mathbb{R}$ is said to be convex if

$$f(tx + (1-t)y) \le t f(x) + (1-t) f(y) \tag{22.2}$$

for $x, y \in Q$ and $0 \le t \le 1$; $f$ is said to be strictly convex if the inequality (22.2) is strict for $0 < t < 1$. Recall also that (22.2) holds if and only if Jensen's inequality holds:

$$f\left(\sum_{i=1}^n \lambda_i x_i\right) \le \sum_{i=1}^n \lambda_i f(x_i) \tag{22.3}$$

for every convex combination $\sum_{i=1}^n \lambda_i x_i$ of vectors in $Q$.

Exercise 22.5. Verify the algebraic-geometric inequality: Show that if $a_1, \ldots, a_n$ and $t_1, \ldots, t_n$ are two sequences of positive numbers such that $t_1 + \cdots + t_n = 1$, then

$$a_1^{t_1} \cdots a_n^{t_n} \le t_1 a_1 + \cdots + t_n a_n \tag{22.4}$$

with equality if and only if $t_j = 1/n$ for $j = 1, \ldots, n$. [HINT: $-\ln x$ is convex on $(0, \infty)$.]
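The inequality (22.4) can be spot-checked numerically before attempting a proof. The following is a minimal sketch in Python, assuming randomly generated positive numbers and weights; the helper names are illustrative and not part of the text.

```python
import math
import random

def weighted_geometric_mean(a, t):
    """Return a_1**t_1 * ... * a_n**t_n for positive a_j and weights t_j."""
    return math.exp(sum(tj * math.log(aj) for aj, tj in zip(a, t)))

def weighted_arithmetic_mean(a, t):
    """Return t_1*a_1 + ... + t_n*a_n."""
    return sum(tj * aj for aj, tj in zip(a, t))

def random_weights(n, rng):
    """Return n positive weights that sum to 1."""
    raw = [rng.random() + 0.01 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(0)
gaps = []
for _ in range(1000):
    n = rng.randint(2, 6)
    a = [rng.uniform(0.1, 10.0) for _ in range(n)]
    t = random_weights(n, rng)
    # right-hand side of (22.4) minus left-hand side
    gaps.append(weighted_arithmetic_mean(a, t) - weighted_geometric_mean(a, t))

min_gap = min(gaps)  # nonnegative up to rounding if (22.4) holds
```

A numerical check of this kind is no substitute for the argument via Jensen's inequality suggested in the hint; it only illustrates the statement.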

Lemma 22.3. Let $f(x)$ be a convex function on an open subinterval $Q$ of $\mathbb{R}$ and let $a < c < b$ be three points in $Q$. Then

$$\frac{f(c) - f(a)}{c - a} \le \frac{f(b) - f(a)}{b - a} \le \frac{f(b) - f(c)}{b - c}. \tag{22.5}$$

Proof. Let $0 < t < 1$ and $c = ta + (1-t)b$. Then the inequality

$$f(c) \le t f(a) + (1-t) f(b)$$

implies that

$$f(b) - f(c) \ge t\,(f(b) - f(a)),$$

or, equivalently, that

$$\frac{f(b) - f(c)}{t(b - a)} \ge \frac{f(b) - f(a)}{b - a},$$

which serves to prove the second inequality, since $t(b - a) = b - c$. The proof of the first inequality is established in much the same way. The first step is to observe that

$$f(c) - f(a) \le (1-t)(f(b) - f(a)).$$

The rest is left to the reader as an exercise. □
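The chain of slopes in (22.5) is easy to test on concrete functions. The sketch below, in Python, assumes the sample functions $e^x$ and $\sin x$; the helper names `slope` and `three_chord_holds` are illustrative only.

```python
import math
import random

def slope(f, p, q):
    """Slope of the chord of f between the points p and q."""
    return (f(q) - f(p)) / (q - p)

def three_chord_holds(f, a, c, b, tol=1e-9):
    """Check (22.5) for three points a < c < b."""
    s_ac, s_ab, s_cb = slope(f, a, c), slope(f, a, b), slope(f, c, b)
    return s_ac <= s_ab + tol and s_ab <= s_cb + tol

rng = random.Random(0)
triples = [
    (rng.uniform(-3.0, -1.0), rng.uniform(-0.5, 0.5), rng.uniform(1.0, 3.0))
    for _ in range(1000)
]
# exp is convex on R, so every triple should pass
exp_ok = all(three_chord_holds(math.exp, a, c, b) for a, c, b in triples)
# sin is not convex on [0, pi]; the first inequality in (22.5) fails here
sin_ok = three_chord_holds(math.sin, 0.0, math.pi / 2, math.pi)
```

For $\sin$ on $[0, \pi]$ the chord from $0$ to $\pi/2$ has slope $2/\pi$, whereas the chord from $0$ to $\pi$ has slope $0$, so the first inequality is violated.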

Exercise 22.6. Complete the proof of Lemma 22.3.

Lemma 22.4. Let $Q = (\alpha, \beta)$ be an open subinterval of $\mathbb{R}$ and let $f \in \mathcal{C}^2(Q)$. Then $f(x)$ is convex on $Q$ if and only if $f''(x) \ge 0$ at every point $x \in Q$.

Proof. Suppose first that $f(x)$ is convex on $Q$ and let $a < c < b$ be three points in $Q$. Then upon letting $c \downarrow a$ in the inequality (22.5), one can readily see that

$$f'(a) \le \frac{f(b) - f(a)}{b - a}.$$

Next, upon letting $c \uparrow b$ in the same set of inequalities, it follows that

$$\frac{f(b) - f(a)}{b - a} \le f'(b).$$

Thus, $f'(a) \le f'(b)$ when $a \le b$ and $f(x)$ is convex. Therefore, $f''(x) \ge 0$ at every point $x \in Q$.

Conversely, if $f''(x) \ge 0$ at every point $x \in Q$ and if $a < c < b$ are three points in $Q$, then, by the mean value theorem,

$$\frac{f(c) - f(a)}{c - a} = f'(\xi) \quad\text{for some point } \xi \in (a, c)$$

and

$$\frac{f(b) - f(c)}{b - c} = f'(\eta) \quad\text{for some point } \eta \in (c, b).$$

Therefore, since $f'(\xi) \le f'(\eta)$, it follows that

$$(f(c) - f(a))(b - c) \le (f(b) - f(c))(c - a).$$

But this in turn implies that

$$f(c)(b - a) \le (b - c) f(a) + (c - a) f(b),$$

which, upon setting $c = ta + (1-t)b$ for any choice of $t \in (0, 1)$, is easily seen to be equivalent to the requisite condition for convexity. □

Exercise 22.7. Let $f(x) = x^r$ on the set $Q = (0, \infty)$. Show that $f(x)$ is convex on $Q$ if and only if $r \ge 1$ or $r \le 0$ and that $-f(x)$ is convex on $Q$ if and only if $0 \le r \le 1$.

Exercise 22.8. Show that $e^x$ is convex on $\mathbb{R}$ and that $-\ln x$ is convex on $(0, \infty)$.

Theorem 22.5. Let $Q$ be an open nonempty convex subset of $\mathbb{R}^n$, and let $f \in \mathcal{C}^2(Q)$ be a real-valued function on $Q$ with Hessian $H_f(x)$. Then

$$f \text{ is convex on } Q \text{ if and only if } H_f(x) \succeq O \text{ for every point } x \in Q. \tag{22.6}$$

Proof. Choose a pair of points $a, b \in Q$ and let $h(t) = f(a + t(b - a))$ for $t \in [0, 1]$. Then it is readily checked that

$$h''(t) = \langle H_f(a + t(b - a))(b - a),\ (b - a)\rangle. \tag{22.7}$$

Thus,

$$H_f(x) \succeq O \text{ on } Q \Longrightarrow h''(t) \ge 0 \text{ on } (0, 1) \Longrightarrow h \text{ is convex on } (0, 1);$$

the second implication follows from Lemma 22.4. Consequently,

$$f(a + t(b - a)) = h((1-t)0 + t1) \le (1-t)h(0) + t h(1) = (1-t) f(a) + t f(b),$$

i.e., $H_f(x) \succeq O$ on $Q \Longrightarrow f$ is convex.

Conversely, if $f$ is convex on $Q$ and $a, b \in Q$, then the preceding sequence of inequalities insures that $h(t) = f(a + t(b - a))$ is convex on $(0, 1)$ and hence, by another application of Lemma 22.4, that $h''(t) \ge 0$ on $(0, 1)$. Thus, upon letting $t \downarrow 0$ in formula (22.7), it follows that

$$\langle H_f(a)(b - a),\ (b - a)\rangle \ge 0 \quad\text{for every choice of } a, b \in Q.$$

Therefore, $f$ convex on $Q \Longrightarrow H_f(a) \succeq O$ for every $a \in Q$, as needed. □
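The Hessian criterion can be illustrated on two quadratic functions of two variables, whose Hessians are constant. The following Python sketch assumes the sample functions $f(x) = x_1^2 + x_1x_2 + x_2^2$ and $g(x) = x_1^2 - x_2^2$, which are chosen here for illustration and do not come from the text.

```python
import random

def is_psd_2x2(h):
    """Test a symmetric 2x2 matrix for positive semidefiniteness
    by checking its principal minors."""
    (a, b), (_, d) = h
    return a >= 0 and d >= 0 and a * d - b * b >= 0

def midpoint_convex(f, p, q, tol=1e-9):
    """Check f((p+q)/2) <= (f(p)+f(q))/2 for one pair of points."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return f(mid) <= (f(p) + f(q)) / 2 + tol

def f(x):
    # Hessian is [[2, 1], [1, 2]], which is positive definite
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2

def g(x):
    # Hessian is [[2, 0], [0, -2]], which is indefinite
    return x[0] ** 2 - x[1] ** 2

rng = random.Random(1)
pairs = [
    ([rng.uniform(-5, 5), rng.uniform(-5, 5)],
     [rng.uniform(-5, 5), rng.uniform(-5, 5)])
    for _ in range(500)
]
f_ok = all(midpoint_convex(f, p, q) for p, q in pairs)
# along the segment from (0,-1) to (0,1) the function g is concave
g_ok = midpoint_convex(g, [0.0, -1.0], [0.0, 1.0])
```

The segment used for $g$ points in the direction of the negative eigenvalue of its Hessian, which is exactly the direction in which $h''(t) < 0$ in formula (22.7).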

Exercise 22.9. Show that in the setting of Theorem 22.5,

$$f \text{ is strictly convex on } Q \text{ if and only if } H_f(x) \succ O \text{ on } Q. \tag{22.8}$$

22.3. Convex sets in $\mathbb{R}^n$

Lemma 22.6. Let $Q$ be a closed nonempty convex subset of $\mathbb{R}^n$, let $x \in \mathbb{R}^n$ and let $u_x$ be the unique element in $Q$ that is closest to $x$. Then

$$\langle x - u_x,\ u - u_x\rangle \le 0 \quad\text{for every } u \in Q. \tag{22.9}$$

Proof. Let $u \in Q$. Then clearly $(1 - \lambda)u_x + \lambda u \in Q$ for every number $\lambda$ in the interval $0 \le \lambda \le 1$. Therefore,

$$\|x - u_x\|^2 \le \|x - (1 - \lambda)u_x - \lambda u\|^2 = \|x - u_x - \lambda(u - u_x)\|^2 = \|x - u_x\|^2 - 2\lambda\langle x - u_x,\ u - u_x\rangle + \lambda^2\|u - u_x\|^2,$$

since $\lambda \in \mathbb{R}$ and all the vectors belong to $\mathbb{R}^n$. But this in turn implies that

$$2\lambda\langle x - u_x,\ u - u_x\rangle \le \lambda^2\|u - u_x\|^2$$

and hence that

$$2\langle x - u_x,\ u - u_x\rangle \le \lambda\|u - u_x\|^2$$

for every $\lambda$ in the interval $0 < \lambda \le 1$. (The restriction $\lambda > 0$ is imposed in order to permit division by $\lambda$ in the line preceding the last one.) The desired inequality (22.9) now drops out easily upon letting $\lambda \downarrow 0$. □

Exercise 22.10. Let $B$ be a closed nonempty convex subset of $\mathbb{R}^n$, let $a \in \mathbb{R}^n$ and let $h(x) = \langle a - b_a,\ x\rangle$, where $b_a$ is the unique vector in $B$ that is closest to $a$. Show that

$$h(a) \ge \|a - b_a\|^2 + h(b) \quad\text{for every vector } b \in B. \tag{22.10}$$

Lemma 22.7. Let $\mathcal{U}$ be a nonempty subspace of $\mathbb{R}^n$ and let $x \in \mathbb{R}^n$. Then there exists a unique vector $u_x \in \mathcal{U}$ that is closest to $x$. Moreover,

$$\langle x - u_x,\ u\rangle = 0 \quad\text{for every vector } u \in \mathcal{U}. \tag{22.11}$$

Proof. The existence and uniqueness of $u_x$ follows from Lemma 22.2, since a nonempty subspace of $\mathbb{R}^n$ is a closed nonempty convex set. Lemma 22.6 implies that

$$\langle x - u_x,\ u\rangle = \langle x - u_x,\ u + u_x - u_x\rangle \le 0$$

for every vector $u \in \mathcal{U}$. Therefore, since $\mathcal{U}$ is a subspace, the supplementary inequality

$$\langle x - u_x,\ -u\rangle \le 0$$

is also in force for every vector $u \in \mathcal{U}$. □


Lemma 22.8. Let $Q$ be a closed nonempty convex subset of $\mathbb{R}^n$, let $x, y \in \mathbb{R}^n$ and let $u_x$ and $u_y$ denote the unique elements in $Q$ that are closest to $x$ and $y$, respectively. Then

$$\|u_x - u_y\| \le \|x - y\|. \tag{22.12}$$

Proof.

Let

a=2 ux E Q is continuous.

This fact will be used to advantage in the next section, which deals with separation theorems.
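The closest point map can be computed explicitly when $Q$ is a box $\{u : \ell_i \le u_i \le h_i\}$, because the squared distance separates coordinate by coordinate and each coordinate is simply clipped to its interval. The Python sketch below assumes such a box (an illustrative choice that does not appear in the text) and checks the variational inequality (22.9) and the bound (22.12) on random samples.

```python
import math
import random

def project_onto_box(x, lo, hi):
    """Closest point to x in the box {u : lo_i <= u_i <= hi_i}."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(u):
    return math.sqrt(inner(u, u))

lo, hi = [0.0, 0.0, -1.0], [1.0, 2.0, 1.0]
rng = random.Random(2)

worst_variational = -math.inf   # largest value of <x - u_x, u - u_x>
worst_expansion = -math.inf     # largest value of |u_x - u_y| - |x - y|
for _ in range(2000):
    x = [rng.uniform(-4, 4) for _ in lo]
    y = [rng.uniform(-4, 4) for _ in lo]
    u = [rng.uniform(l, h) for l, h in zip(lo, hi)]  # a point of the box
    ux, uy = project_onto_box(x, lo, hi), project_onto_box(y, lo, hi)
    worst_variational = max(worst_variational, inner(sub(x, ux), sub(u, ux)))
    worst_expansion = max(worst_expansion, norm(sub(ux, uy)) - norm(sub(x, y)))
```

Both recorded quantities should be nonpositive up to rounding: the first reflects (22.9), the second the continuity estimate (22.12).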

22.4. Separation theorems in $\mathbb{R}^n$

The next theorem extends Exercise 22.10.

Theorem 22.9. Let $A$ and $B$ be disjoint nonempty closed convex sets in $\mathbb{R}^n$ such that $B$ is also compact. Then there exists a point $a_0 \in A$ and a point $b_0 \in B$ such that

$$\langle a_0 - b_0,\ a\rangle \ge \langle a_0 - b_0,\ a_0\rangle = \|a_0 - b_0\|^2 + \langle a_0 - b_0,\ b_0\rangle > \langle a_0 - b_0,\ b_0\rangle \ge \langle a_0 - b_0,\ b\rangle$$

for every choice of $a \in A$ and $b \in B$.

Proof. Let $a_x$ denote the unique point in $A$ that is closest to $x \in \mathbb{R}^n$. Then, by Lemma 22.6,

$$\langle x - a_x,\ a - a_x\rangle \le 0 \quad\text{for every } a \in A.$$

Moreover, since $a_x$ is a continuous vector function of $x$ by Lemma 22.8,

$$g(x) = \|x - a_x\|$$

is a continuous scalar valued function of $x \in \mathbb{R}^n$. In particular, $g$ is continuous on the compact set $B$, and hence there exists a vector $b_0 \in B$ such that

$$\|b_0 - a_{b_0}\| \le \|b - a_b\|$$

for every $b \in B$. Let $a_0 = a_{b_0}$. Then

$$\|b_0 - a_0\| \le \|b - a_b\| \le \|b - a_0\|$$

for every choice of $b \in B$, and hence, as $B$ is convex,

llbo—aoll2 s II C2 2 f(c)

for every point q E Q.

Thus, there exists a vector $u \in \mathbb{R}^n$ such that

$$\langle q, u\rangle - \langle c, u\rangle \ge c_1 - c_2 > 0 \quad\text{for every point } q \in Q.$$

Consequently, the hyperplane $\widetilde{H} = \{x \in \mathbb{R}^n : \langle x - c,\ u\rangle = 0\}$ does not intersect $Q$. It is left to the reader to check that

$$\min\{\|x - a\| : x \in \widetilde{H}\} = \frac{|\langle c - a,\ u\rangle|}{\|u\|} = \frac{\langle a - c,\ u\rangle}{\|u\|} \le \varepsilon. \tag{22.17}$$

Therefore, if $x_0 \in \widetilde{H}$ achieves the minimum distance in (22.17) and $d \in \mathbb{R}^n$ is the point on the line through $a$ and $x_0$ such that $\|d - a\| = 1$, then

$$\|d - x\| \ge \|d - x_0\| = \|d - a + a - x_0\| \ge 1 - \varepsilon$$

for every $x \in Q$. Thus, for each positive integer $k$ there is a point $d_k \in \mathbb{R}^n$ such that $\|d_k - a\| = 1$ and $\|d_k - x\| \ge 1 - 1/k$ for every $x \in Q$. Consequently, a subsequence of the points $d_k$ converges to a point $b \in \mathbb{R}^n$ that meets the conditions in (1). Next, Lemma 22.6 implies that $\langle b - a,\ x - a\rangle \le 0$ for every point $x \in Q$, which serves to justify (2), since the given hyperplane can also be written as $H = \{x \in \mathbb{R}^n : \langle x - a,\ b - a\rangle = 0\}$, and (3) with $g(x) = \langle x,\ b - a\rangle$. □

Exercise 22.11. Verify formula (22.17) and find the point $x_0 \in \widetilde{H}$ that achieves the minimum of interest. [HINT: Find the minimum value of $f(x) = \|x - a\|^2$ subject to the constraint $\langle x - c,\ u\rangle = 0$.]

22.7. Convex hulls

Let $Q$ be a subset of a vector space $V$ over $\mathbb{F}$. The convex hull of $Q$ is the smallest convex set in $V$ that contains $Q$. Since the intersection of two convex sets is convex, the convex hull of $Q$ can also be defined as the intersection of all convex sets in $V$ that contain $Q$. The symbol $\operatorname{conv} Q$ will be used to denote the convex hull of a set $Q$.

Lemma 22.14. Let $Q$ be a subset of $\mathbb{F}^n$. Then the convex hull of $Q$ is equal to the set of all convex combinations of elements in $Q$, i.e.,

$$\operatorname{conv} Q = \left\{\sum_{i=1}^n t_i x_i : n \ge 1,\ t_i \ge 0,\ \sum_{i=1}^n t_i = 1 \text{ and } x_i \in Q\right\}. \tag{22.18}$$

Proof. It is readily seen that the set on the right-hand side of (22.18) is a convex set: if

$$u = \sum_{i=1}^n t_i x_i \quad\text{is a convex combination of } x_1, \ldots, x_n \in Q$$

and

$$v = \sum_{j=1}^k s_j y_j \quad\text{is a convex combination of } y_1, \ldots, y_k \in Q,$$

then

$$\lambda u + (1 - \lambda) v = \sum_{i=1}^n \lambda t_i x_i + \sum_{j=1}^k (1 - \lambda) s_j y_j$$

is again a convex combination of elements of $Q$ for every choice of $\lambda$ in the interval $0 \le \lambda \le 1$, since for such $\lambda$, $\lambda t_i \ge 0$, $(1 - \lambda) s_j \ge 0$ and

$$\lambda t_1 + \cdots + \lambda t_n + (1 - \lambda) s_1 + \cdots + (1 - \lambda) s_k = 1.$$

Thus, the right-hand side of (22.18) is a convex set that contains $Q$. Moreover, since every convex set that contains $Q$ must contain the convex combinations of elements in $Q$, the right-hand side of (22.18) is the smallest convex set that contains $Q$. □

Theorem 22.15. (Carathéodory) Let $Q$ be a nonempty subset of $\mathbb{R}^n$. Then every vector $x \in \operatorname{conv} Q$ is a convex combination of at most $n + 1$ vectors in $Q$.

Proof.

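Let $x \in \operatorname{conv} Q$. Then

$$\begin{bmatrix} 1 \\ x \end{bmatrix} = \sum_{j=1}^k \alpha_j \begin{bmatrix} 1 \\ x_j \end{bmatrix} \quad\text{with } \alpha_j > 0,\ x_j \in Q \text{ and } \sum_{j=1}^k \alpha_j = 1.$$

If the vectors

$$y_j = \begin{bmatrix} 1 \\ x_j \end{bmatrix}, \quad j = 1, \ldots, k,$$

are linearly independent, then $k \le n + 1$, as claimed. If not, then there exists a set of coefficients $\beta_1, \ldots, \beta_k \in \mathbb{R}$ such that

$$\sum_{j=1}^k \beta_j y_j = 0 \quad\text{and}\quad \mathcal{P} = \{j : \beta_j > 0\} \ne \emptyset.$$

Let

$$\gamma = \min\left\{\frac{\alpha_j}{\beta_j} : j \in \mathcal{P}\right\}.$$

Then $\alpha_j - \gamma\beta_j \ge 0$ for $j = 1, \ldots, k$, and, since at least one of these numbers is equal to zero, the formula

$$\begin{bmatrix} 1 \\ x \end{bmatrix} = \sum_{j=1}^k (\alpha_j - \gamma\beta_j) \begin{bmatrix} 1 \\ x_j \end{bmatrix} \tag{22.19}$$

displays the vector on the left as a convex combination of at most $k - 1$ vectors. If these vectors are linearly independent, then there are at most $n + 1$ of them. If not, the same argument can be repeated to eliminate additional vectors from the representation until a representation of the form (22.19), but with $k \le n + 1$, is obtained. The resulting identities

$$1 = \sum_{j=1}^k (\alpha_j - \gamma\beta_j) \quad\text{and}\quad x = \sum_{j=1}^k (\alpha_j - \gamma\beta_j)\, x_j$$

serve to complete the proof. □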
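The elimination step in the proof of Carathéodory's theorem is constructive. The Python sketch below follows it with exact rational arithmetic, under two simplifying assumptions that are not in the text: the reduction is applied whenever more than $n + 1$ points remain (in which case the vectors $y_j$ are automatically linearly dependent), and a dependency $\beta$ is found by Gaussian elimination. The function names are illustrative.

```python
from fractions import Fraction

def null_vector(rows, ncols):
    """Return a nonzero beta with M beta = 0, where M (given by its rows)
    has more columns than rows. Uses reduced row echelon form."""
    m = [list(r) for r in rows]
    pivot_cols, r = [], 0
    for c in range(ncols):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pv = m[r][c]
        m[r] = [v / pv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [vi - factor * vr for vi, vr in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
    free = next(c for c in range(ncols) if c not in pivot_cols)
    beta = [Fraction(0)] * ncols
    beta[free] = Fraction(1)
    for row_idx, pc in enumerate(pivot_cols):
        beta[pc] = -m[row_idx][free]
    return beta

def caratheodory(points, weights):
    """Rewrite sum_j weights[j]*points[j] (positive weights summing to 1)
    as a convex combination of at most n+1 of the points."""
    pts = [tuple(Fraction(v) for v in p) for p in points]
    w = [Fraction(a) for a in weights]
    n = len(pts[0])
    while len(pts) > n + 1:
        k = len(pts)
        # rows of the matrix whose columns are y_j = [1; x_j]
        rows = [[Fraction(1)] * k] + [[p[i] for p in pts] for i in range(n)]
        beta = null_vector(rows, k)
        # the first row forces sum(beta) = 0, so some beta_j is positive
        gamma = min(a / b for a, b in zip(w, beta) if b > 0)
        w = [a - gamma * b for a, b in zip(w, beta)]
        keep = [j for j in range(k) if w[j] > 0]
        pts, w = [pts[j] for j in keep], [w[j] for j in keep]
    return pts, w

points = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
weights = [Fraction(1, 5)] * 5
reduced_points, reduced_weights = caratheodory(points, weights)
```

For the five points above, each taken with weight $1/5$, the combination is the point $(2, 2)$, and the sketch returns a representation of the same point that uses at most three of them.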

Exercise 22.12. Show that every vector in the convex hull of the four vectors q1 = [(1)], conv Q is open. (2) Q compact => con'v Q is compact. Proof. Suppose first that Q is open and that x = 25:1 Ci is a convex combination of vectors x1, . . . ,xe E Q. Then there exists an e > 0 such

that xj + u E Q forj = 1,...,£ and every vector u 6 IR" With ||u|| < 8. Therefore,

e

j+u>ec2 x+u=zcj 0. Consequently,

$q_k \in \operatorname{conv}\{b, c, d\} \cup \{x \in \mathbb{R}^2 : c_2 < x_2 < b_2\}$ for all sufficiently large $k$. But this is not possible, since the $q_k$ were presumed to be extreme points of $Q$. □

The conclusions of Lemma 22.18 do not propagate to higher dimensions:

Exercise 22.13. Let $Q_1 = \{x \in \mathbb{R}^3 : x_1^2 + x_2^2 \le 1 \text{ and } x_3 = 0\}$, $Q_2 = \{x \in \mathbb{R}^3 : x_1 = 1,\ x_2 = 0 \text{ and } -1 \le x_3 \le 1\}$. Show that the set of extreme points of the set $\operatorname{conv}(Q_1 \cup Q_2)$ is not a closed subset of $\mathbb{R}^3$.

We turn next to a finite dimensional version of the Krein-Milman theorem.

Theorem 22.19. Let $Q$ be a nonempty compact convex set in $\mathbb{R}^n$. Then $Q$ is equal to the convex hull of its extreme points.

Discussion. By Lemma 22.17, the set $E$ of extreme points of $Q$ is nonempty. Let $\overline{E}$ denote the closure of $E$ and let $F = \operatorname{conv}\overline{E}$ denote the convex hull of $\overline{E}$ and suppose that there exists a vector $q_0 \in Q$ such that $q_0 \notin F$. Then, since $F$ is closed, Theorem 22.10 guarantees that there exists a real linear functional $h$ on $\mathbb{R}^n$ and a number $\delta \in \mathbb{R}$ such that

$$h(q_0) > \delta \ge h(x) \quad\text{for every } x \in F. \tag{22.20}$$

Let

$$\gamma = \max\{h(x) : x \in Q\} \quad\text{and}\quad E_h = \{x \in Q : h(x) = \gamma\}.$$

The inequality (22.20) implies that $\gamma > \delta$ and hence that $\overline{E} \cap E_h = \emptyset$. On the other hand, it is readily checked that $E_h$ is a compact convex set. Therefore, by Lemma 22.17, it contains extreme points. The next step is to check that if $x_0 \in E_h$ is an extreme point of $E_h$, then it is also an extreme point of $Q$: If

$$x_0 = \alpha u + (1 - \alpha) v$$

for some pair of vectors $u, v \in Q$ and $0 < \alpha < 1$, then the identity

$$h(x_0) = \alpha h(u) + (1 - \alpha) h(v)$$

implies that $h(x_0) = h(u) = h(v) = \gamma$ and hence that $u, v \in E_h$. Therefore, since $x_0$ is an extreme point for $E_h$, $x_0 = u = v$. Thus, $x_0 \in E$, which proves that $E \cap E_h \ne \emptyset$; i.e.,

$$\text{if } F \ne Q, \text{ then } \overline{E} \cap E_h = \emptyset \text{ and } E \cap E_h \ne \emptyset,$$

which is clearly impossible. Therefore, $F = Q$; i.e., $\operatorname{conv}\overline{E} = Q$.

It remains to show that $Q = \operatorname{conv} E$. In view of Lemma 22.18, this is the case if $Q \subset \mathbb{R}^2$, because then $\overline{E} = E$. Proceeding inductively, suppose that in fact $Q = \operatorname{conv} E$ if $Q$ is a subset of $\mathbb{R}^k$ for $k < p$, let $Q$ be a subset of $\mathbb{R}^p$ and let $q \in \overline{E}$. Then $q$ belongs to the boundary of $Q$ and, by translating and rotating $Q$ appropriately, we can assume that $q$ belongs to the hyperplane $H = \{x \in \mathbb{R}^p : x_1 = 0\}$ and that $Q$ is a subset of the halfspace $H_- = \{x \in \mathbb{R}^p : x_1 \le 0\}$. Thus, $Q \cap H$ can be identified with a compact convex subset of $\mathbb{R}^{p-1}$. Let $E'$ denote the extreme points of $Q \cap H$. By the induction hypothesis $Q \cap H = \operatorname{conv} E'$. Therefore, since $q \in Q \cap H$, and $\operatorname{conv} E' \subseteq \operatorname{conv} E$, it follows that

$$\overline{E} \subseteq \operatorname{conv} E, \tag{22.21}$$

and hence that $Q = \operatorname{conv} E$, as claimed. □

Theorem 22.20. Let $Q$ be a nonempty compact convex set in $\mathbb{R}^n$. Then every vector in $Q$ is a convex combination of at most $n + 1$ extreme points of $Q$.

Proof. This is an immediate consequence of Carathéodory's theorem and the Krein-Milman theorem. □

22.9. Brouwer's theorem for compact convex sets

A simple argument serves to extend the Brouwer fixed point theorem to compact convex subsets of $\mathbb{R}^n$.

Theorem 22.21. Let $Q$ be a nonempty compact convex subset of $\mathbb{R}^n$ and let $f$ be a continuous mapping of $Q$ into $Q$. Then there exists at least one point $q \in Q$ such that $f(q) = q$.

Proof. Since $Q$ is compact, there exists an $r > 0$ such that the closed ball $\overline{B_r(0)} = \{x \in \mathbb{R}^n : \|x\| \le r\}$ contains $Q$. Then, by Lemma 22.2, for each point $x \in \overline{B_r(0)}$ there exists a unique vector $q_x \in Q$ that is closest to $x$. Moreover, by Lemma 22.8, the function $g$ from $\overline{B_r(0)}$ into $Q$ that is defined by the rule $g(x) = q_x$ is continuous. Therefore the composite function $h(x) = f(g(x))$ is a continuous map of $\overline{B_r(0)}$ into itself and therefore has a fixed point in $\overline{B_r(0)}$. But this serves to complete the proof:

$$f(g(x)) = x \Longrightarrow f(q_x) = x \Longrightarrow x \in Q \Longrightarrow x = q_x \Longrightarrow f(x) = x.$$ □

Exercise 22.14. Show that the function

$$f(x) = \begin{bmatrix} (x_1 + x_2)/2 \\ \sqrt{x_1 x_2} \end{bmatrix}$$

maps the set $Q = \{x \in \mathbb{R}^2 : 1 \le x_1 \le 2 \text{ and } 1 \le x_2 \le 2\}$ into itself and then invoke Theorem 22.21 to establish the existence of fixed points in this set and find them.

Exercise 22.15. Show that the function $f$ defined in Exercise 22.14 does not satisfy the constraint $\|f(x) - f(y)\| < \gamma\|x - y\|$ for all vectors $x, y$ in the set $Q$ that is considered there if $\gamma < 1$. [HINT: Consider the number of fixed points.]

22.10. The Minkowski functional

Let $X$ be a normed linear space over $\mathbb{F}$ and let $Q \subseteq X$. Then the functional

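$$p_Q(x) = \inf\left\{t > 0 : \frac{x}{t} \in Q\right\}$$

is called the Minkowski functional. If the indicated set of $t$ is empty, then $p_Q(x) = \infty$.

Lemma 22.22. Let $Q$ be a convex subset of a normed linear space $X$ over $\mathbb{F}$ such that

$$Q \supseteq B_r(0) \quad\text{for some } r > 0$$

and let $\operatorname{int} Q$ and $\overline{Q}$ denote the interior and the closure of $Q$, respectively. Then:

(1) $p_Q(x + y) \le p_Q(x) + p_Q(y)$ for $x, y \in X$.

(2) $p_Q(\alpha x) = \alpha\, p_Q(x)$ for $\alpha \ge 0$ and $x \in X$.

(3) $p_Q(x)$ is continuous.

(4) $\{x \in X : p_Q(x) < 1\} = \operatorname{int} Q$.

(5) $\{x \in X : p_Q(x) \le 1\} = \overline{Q}$.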


(6) If $Q$ is also bounded, then $p_Q(x) = 0 \Longrightarrow x = 0$.

Proof. Let $x, y \in X$ and suppose that $\alpha^{-1}x \in Q$ and $\beta^{-1}y \in Q$ for some choice of $\alpha > 0$ and $\beta > 0$. Then, since $Q$ is convex,

X+Y_ 5 —1 (oz—1x) + a+B_a+B a+fi O and pQ(x) = a. Then there exists a sequence of numbers t1, t2, . . . such that tj > 0,

$$\frac{x}{t_j} \in Q \quad\text{and}\quad \lim_{j \uparrow \infty} t_j = a.$$

Therefore, since

$$\frac{\alpha x}{\alpha t_j} \in Q \quad\text{and}\quad \lim_{j \uparrow \infty} \alpha t_j = \alpha a,$$

$$p_Q(\alpha x) \le \alpha\, p_Q(x).$$

However, the same argument yields the opposite inequality:

$$\alpha\, p_Q(x) = \alpha\, p_Q(\alpha^{-1}\alpha x) \le \alpha\alpha^{-1} p_Q(\alpha x) = p_Q(\alpha x).$$

Therefore, equality prevails. This completes the proof of (2) when $\alpha > 0$. However, (2) holds when $\alpha = 0$, because $p_Q(0) = 0$.

If $x \ne 0$, then

$$p_Q(x) \le \frac{2}{r}\|x\|, \quad\text{since}\quad \frac{r x}{2\|x\|} \in B_r(0).$$

Therefore, since the last inequality is clearly also valid if $x = 0$, and, as follows from (1),

$$|p_Q(x) - p_Q(y)| \le p_Q(x - y),$$

it is easily seen that $p_Q(x)$ is a continuous function of $x$ on $X$.

Items (4) and (5) are left to the reader. Finally, to verify (6), suppose that $p_Q(x) = 0$. Then there exists a sequence of points $\alpha_1 \ge \alpha_2 \ge \cdots$ decreasing to 0 such that $\alpha_j^{-1} x \in Q$. Therefore, since $Q$ is bounded, say $Q \subseteq \{x : \|x\| \le C\}$, the inequality $\|\alpha_j^{-1} x\| \le C$ implies that $\|x\| \le \alpha_j C$ for $j = 1, 2, \ldots$ and hence that $x = 0$. □
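When membership in $Q$ can be tested, the Minkowski functional can be approximated by bisection, since for a convex set $Q$ that contains 0 the set $\{t > 0 : x/t \in Q\}$ is a half-line. The Python sketch below assumes that $Q$ is the ellipse $\{x \in \mathbb{R}^2 : (x_1/2)^2 + x_2^2 \le 1\}$, an illustrative choice for which $p_Q(x) = \sqrt{(x_1/2)^2 + x_2^2}$; the function names are hypothetical.

```python
import math

def minkowski(in_q, x, iterations=100):
    """Approximate p_Q(x) = inf{t > 0 : x/t in Q} by bisection,
    assuming Q is convex, contains a ball around 0, and in_q tests membership."""
    if all(v == 0 for v in x):
        return 0.0
    hi = 1.0
    while not in_q([v / hi for v in x]):   # find some t with x/t in Q
        hi *= 2.0
    lo = 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if in_q([v / mid for v in x]):
            hi = mid
        else:
            lo = mid
    return hi

def in_ellipse(x):
    """Membership test for Q = {x : (x_1/2)^2 + x_2^2 <= 1}."""
    return (x[0] / 2.0) ** 2 + x[1] ** 2 <= 1.0

def p(x):
    return minkowski(in_ellipse, x)

x, y = (1.0, 2.0), (-3.0, 0.5)
x_plus_y = (x[0] + y[0], x[1] + y[1])
subadditive = p(x_plus_y) <= p(x) + p(y) + 1e-9                 # item (1)
homogeneous = abs(p((3 * x[0], 3 * x[1])) - 3 * p(x)) < 1e-9    # item (2)
```

For this particular $Q$ the functional is a norm, which is consistent with items (1), (2) and (6) of Lemma 22.22 together with the symmetry of the ellipse.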


Exercise 22.16. Complete the proof of Lemma 22.22 by verifying items (4) and (5).

Exercise 22.17. Show that in the setting of Lemma 22.22, $p_Q(x) < 1 \Longrightarrow x \in Q$ and $x \in Q \Longrightarrow p_Q(x) \le 1$.

The proof of the next theorem that is presented below serves to illustrate the use of the Minkowski functional. The existence of a support hyperplane is already covered by Theorem 22.13.

Theorem 22.23. Let $Q$ be a convex subset of $\mathbb{R}^n$ such that $B_r(q) \subset Q$ for some $q \in \mathbb{R}^n$ and some $r > 0$. Let $v \in \mathbb{R}^n$ and $\mathcal{U}$ be a $k$-dimensional subspace of $\mathbb{R}^n$ such that $0 \le k < n$ and the set $V = v + \mathcal{U}$ has no points in common with $\operatorname{int} Q$. Then there exist a vector $y \in \mathbb{R}^n$ and a constant $c \in \mathbb{R}$ such that $\langle x, y\rangle = c$ if $x \in V$ and $\langle x, y\rangle < c$ if $x \in \operatorname{int} Q$.

Proof. By an appropriate translation of the problem we may assume that $q = 0$, and hence that $0 \notin V$. Thus, there exists a linear functional $f$ on the vector space $\mathcal{W} = \{\alpha v + \mathcal{U} : \alpha \in \mathbb{R}\}$ such that

$$V = \{w \in \mathcal{W} : f(w) = 1\}.$$

The next step is to check that

$$f(x) \le p_Q(x) \quad\text{for every vector } x \in \mathcal{W}. \tag{22.22}$$

Since $p_Q(x) \ge 0$, the inequality (22.22) is clearly valid if $f(x) \le 0$. On the other hand, if $x \in \mathcal{W}$ and $f(x) = \alpha > 0$, then $\alpha^{-1}x \in V$ and thus, as $V \cap \operatorname{int} Q = \emptyset$,

$$\alpha > 0 \Longrightarrow \frac{1}{\alpha}\, p_Q(x) = p_Q\left(\frac{x}{\alpha}\right) \ge 1 = f\left(\frac{x}{\alpha}\right) = \frac{1}{\alpha} f(x),$$

i.e.,

$$f(x) > 0 \Longrightarrow f(x) \le p_Q(x).$$

Thus, (22.22) is verified. Therefore, by the variant of the Hahn-Banach theorem discussed in Exercise 7.34, there exists a linear functional $F$ on $\mathbb{R}^n$ such that $F(x) = f(x)$ for $x \in \mathcal{W}$ and $F(x) \le p_Q(x)$ for every $x \in \mathbb{R}^n$. Let $H = \{x \in \mathbb{R}^n : F(x) = 1\}$. Then $F(x) \le p_Q(x) < 1$ when $x \in \operatorname{int} Q$, whereas $F(x) = 1$ when $x \in V$, since $V \subseteq H$. Thus, as $F(x) = \langle x, y\rangle$ for some vector $y \in \mathbb{R}^n$, it follows that $\langle x, y\rangle < 1$ when $x \in \operatorname{int} Q$ and $\langle x, y\rangle = 1$ when $x \in H$. □

22.11. The numerical range

Let $A \in \mathbb{C}^{n\times n}$. The set

$$W(A) = \{\langle Ax, x\rangle : x \in \mathbb{C}^n \text{ and } \|x\| = 1\}$$

is called the numerical range of $A$. The objective of this section is to show that $W(A)$ is a convex subset of $\mathbb{C}$. We begin with a special case.

Lemma 22.24. If $B \in \mathbb{C}^{n\times n}$ and if $0 \in W(B)$ and $1 \in W(B)$, then every point $\mu \in [0, 1]$ also belongs to $W(B)$.

Proof. Let $x, y \in \mathbb{C}^n$ be vectors such that $\|x\| = \|y\| = 1$, $\langle Bx, x\rangle = 1$ and $\langle By, y\rangle = 0$ and let

$$u_t = t\gamma x + (1 - t) y,$$

where $|\gamma| = 1$ and $0 \le t \le 1$. Then

(Bub Ht) = 152(3):, X) + t(1 — t){7} + (1 — t)2 = t2 + t(1 — t)(7c + fi). Moreover, since (Bx,x> = 1 and (By,y) = 0, the vectors x and y are linearly independent. Thus,

ll’utll2 = “757x — (1 — t)yll2

= 152 + (1 - t)t{ = ta—I— (1 — t)b. If a = b, then ta + (1 — t)b = a = b, and hence we can choose ut = x or ut = y. Suppose therefore that a 7E b and let B = aA + BI”, Where a, fi are solutions of the system of equations

aoz+fi

=

1

boz+fi

=

0.

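Then 0. Then, since $A_\varepsilon$ and $B_\varepsilon$ are invertible Hermitian matrices, we can invoke (22.24) to obtain the inequality

$$2\|A_\varepsilon Q B_\varepsilon\| \le \|A_\varepsilon^2 Q B_\varepsilon B_\varepsilon^{-1} + A_\varepsilon^{-1} A_\varepsilon Q B_\varepsilon^2\| = \|A_\varepsilon^2 Q + Q B_\varepsilon^2\|,$$

which tends to (22.25) as $\varepsilon \downarrow 0$. □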

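Theorem 22.29. (Heinz) Let $A \in \mathbb{C}^{p\times p}$, $Q \in \mathbb{C}^{p\times q}$, $B \in \mathbb{C}^{q\times q}$ and suppose that $A \succeq O$ and $B \succeq O$. Then

$$\|A^t Q B^{1-t} + A^{1-t} Q B^t\| \le \|AQ + QB\| \quad\text{for } 0 \le t \le 1. \tag{22.27}$$

Proof. Let $f(t) = \|A^t Q B^{1-t} + A^{1-t} Q B^t\|$, let $0 \le a < b \le 1$ and set $c = (a + b)/2$ and $d = b - c$. Then, as $c = a + d$ and $1 - c = 1 - b + d$, (22.25) implies that

$$\begin{aligned}
f(c) &= \|A^c Q B^{1-c} + A^{1-c} Q B^c\| = \|A^d (A^a Q B^{1-b} + A^{1-b} Q B^a) B^d\| \\
&\le \tfrac{1}{2}\|A^{2d}(A^a Q B^{1-b} + A^{1-b} Q B^a) + (A^a Q B^{1-b} + A^{1-b} Q B^a) B^{2d}\| \\
&= \tfrac{1}{2}\|A^b Q B^{1-b} + A^{1-a} Q B^a + A^a Q B^{1-a} + A^{1-b} Q B^b\| \\
&\le \frac{f(a) + f(b)}{2};
\end{aligned}$$

i.e., $f(t)$ is a convex function on the interval $0 \le t \le 1$. Thus, as the upper bound in formula (22.27) is equal to $f(0) = f(1)$,

$$f(t) = f((1-t)0 + t1) \le (1-t) f(0) + t f(1) = (1-t) f(0) + t f(0) = f(0)$$

for every point $t$ in the interval $0 \le t \le 1$. □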
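The Heinz inequality (22.27) can be checked numerically in a small case. The Python sketch below assumes $p = q = 2$, real entries, and diagonal matrices $A = \operatorname{diag}(a_1, a_2)$ and $B = \operatorname{diag}(b_1, b_2)$ with positive diagonal entries, so that the $ij$ entry of $A^t Q B^{1-t} + A^{1-t} Q B^t$ is $(a_i^t b_j^{1-t} + a_i^{1-t} b_j^t)\, q_{ij}$. These restrictions and the helper names are assumptions made for the illustration.

```python
import math
import random

def spectral_norm_2x2(m):
    """Largest singular value of a real 2x2 matrix, from the trace and
    determinant of m^T m."""
    (a, b), (c, d) = m
    t = a * a + b * b + c * c + d * d      # trace of m^T m
    det = a * d - b * c                    # det(m^T m) = det**2
    disc = max(t * t - 4 * det * det, 0.0)
    return math.sqrt((t + math.sqrt(disc)) / 2)

def heinz_norm(a, b, q, t):
    """Norm of A^t Q B^(1-t) + A^(1-t) Q B^t for A = diag(a), B = diag(b)."""
    m = [
        [(a[i] ** t * b[j] ** (1 - t) + a[i] ** (1 - t) * b[j] ** t) * q[i][j]
         for j in range(2)]
        for i in range(2)
    ]
    return spectral_norm_2x2(m)

rng = random.Random(3)
violations = 0
for _ in range(500):
    a = [rng.uniform(0.1, 5.0) for _ in range(2)]
    b = [rng.uniform(0.1, 5.0) for _ in range(2)]
    q = [[rng.uniform(-3, 3) for _ in range(2)] for _ in range(2)]
    bound = heinz_norm(a, b, q, 0.0)       # t = 0 gives the norm of AQ + QB
    for step in range(11):
        t = step / 10
        if heinz_norm(a, b, q, t) > bound * (1 + 1e-9) + 1e-12:
            violations += 1
```

Since the norm is unitarily invariant and positive semidefinite matrices are unitarily diagonalizable, the restriction to diagonal $A$ and $B$ loses less generality than it might seem; the restriction to real $2 \times 2$ matrices is only for brevity.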

Theorem 22.30. Let $A \in \mathbb{C}^{p\times p}$, $B \in \mathbb{C}^{p\times p}$ and suppose that $A \succ O$ and $B \succ O$. Then

$$\|A^s B^s\| \le \|AB\|^s \quad\text{for } 0 \le s \le 1. \tag{22.28}$$

Proof. Let

$$Q = \{u \in [0, 1] : \|A^u B^u\| \le \|AB\|^u\}$$

and let $s$ and $t$ be a pair of points in $Q$. Then, with the help of the auxiliary inequality

$$\begin{aligned}
\|A^{(s+t)/2} B^{(s+t)/2}\|^2 &= \|B^{(s+t)/2} A^{s+t} B^{(s+t)/2}\| = r_\sigma(B^{(s+t)/2} A^{s+t} B^{(s+t)/2}) \\
&= r_\sigma(B^s A^{s+t} B^t) \le \|B^s A^{s+t} B^t\| \le \|B^s A^s\|\,\|A^t B^t\|,
\end{aligned}$$

it is readily checked that $Q$ is convex. The proof is easily completed, since $0 \in Q$ and $1 \in Q$. □

Exercise 22.20. Verify each of the assertions that lead to the auxiliary inequality in the proof of Theorem 22.30.

Exercise 22.21. Show that if $A$ and $B$ are as in Theorem 22.30, then $\varphi(s) = \|A^s B^s\|^{1/s}$ is an increasing function of $s$ for $s > 0$.

22.15. Extreme points for polyhedra

In this section we shall discuss the extreme points of the polyhedral sets

$$X = \{x \in \mathbb{R}^q : Ax \ge b\}, \quad\text{based on } A \in \mathbb{R}^{p\times q} \text{ and } b \in \mathbb{R}^p$$

that were introduced in Section 16.8, when they are not empty. Even though $X$ is a closed convex subset of $\mathbb{R}^q$ (when it is not empty), this is not simply an application of the Krein-Milman theorem because $X$ is not necessarily compact.

Exercise 22.22. Show that if A = E

31], b = [31], then X is not

bounded. Show further that if A 6 TV” is invertible, then X is never compact or empty.

Recall that if E = {i1, . . . ,ip} with integers 1 g i1
(2).

If (1) is in force, then $E \ne \emptyset$, because if $Ax_0 - b > 0$, then there exists a nonzero vector $u \in \mathbb{R}^q$ such that $x_0 \pm u \in X$. But this in turn implies that

$$x_0 = \frac{x_0 + u}{2} + \frac{x_0 - u}{2},$$

which is not viable with (1). Moreover, if $\operatorname{rank} A_E < q$, then there exists a nonzero vector $v \in \mathcal{N}_{A_E}$ and hence $x_0 \pm \varepsilon v \in X$ for sufficiently small $\varepsilon > 0$. But this again leads to a contradiction of (1). Thus, (2) must hold.

2. (2) $\Longrightarrow$ (3). Suppose next that (2) is in force, $x \in X$ and let $a = (A_E)^T w$ for some $w > 0$. Then

$$\langle x, a\rangle = \langle x, (A_E)^T w\rangle = \langle A_E x - b_E,\ w\rangle + \langle b_E, w\rangle \ge \langle b_E, w\rangle = \langle A_E x_0,\ w\rangle = \langle x_0, a\rangle$$

with equality if and only if $\langle A_E x - b_E,\ w\rangle = 0$. However, since $A_E x - b_E \ge 0$ for $x \in X$ and $w > 0$, this happens if and only if $A_E x = b_E$, i.e., if and only if $A_E x = A_E x_0$ or, equivalently, if and only if $x = x_0$, since $\mathcal{N}_{A_E} = \{0\}$ when (2) is in force, by the principle of conservation of dimension.

3. (3) $\Longrightarrow$ (1). If (3) is in force and $x_0 \in X$ is not an extreme point of $X$, then there exist a pair of vectors $u, v \in X$ such that $u \ne x_0 \ne v$ and

$$x_0 = tu + (1 - t)v \quad\text{for some choice of } t \in (0, 1).$$

Thus,

$$\langle x_0, a\rangle = t\langle u, a\rangle + (1 - t)\langle v, a\rangle > t\langle x_0, a\rangle + (1 - t)\langle x_0, a\rangle = \langle x_0, a\rangle,$$

which is clearly impossible. Therefore (3) $\Longrightarrow$ (1). □

Corollary 22.32. $X$ has at most $\binom{p}{q}$ extreme points.

Proof. The extreme points of $X$ are the solutions of equations of the form $A'x = b'$, where $A'$ is a $q \times q$ submatrix of $A$ of rank $q$ and $b' \in \mathbb{R}^q$ is the corresponding subvector of $b$. Thus, the number of extreme points is less than or equal to the number of ways of choosing $q$ rows out of $p$. □

Theorem 22.33. If $X \ne \emptyset$, then the following statements are equivalent:

(1) $X$ has at least one extreme point.

(2) $\operatorname{rank} A = q$.

(3) There does not exist a pair of vectors $u, v \in \mathbb{R}^q$ with $v \ne 0$ such that $\{u + tv : t \in \mathbb{R}\} \subseteq X$, i.e., $X$ does not contain a line.

Proof. The implication (1) $\Longrightarrow$ (2) follows from Theorem 22.31. Suppose next that (2) holds and that there exist a pair of vectors $u, v \in \mathbb{R}^q$ with $v \ne 0$ such that

$$\{u + tv : t \in \mathbb{R}\} \subseteq X;$$

then $A(u + tv) \ge b$ for every $t \in \mathbb{R}$, which is clearly impossible unless $Av = 0$. But this implies that $v = 0$, since, in view of (2), $\mathcal{N}_A = \{0\}$. Thus, (2) $\Longrightarrow$ (3).

Finally, to verify the implication (3) $\Longrightarrow$ (1), suppose that (3) is in force and $u \in X$, and observe first that if there is a nonzero vector $v \in \mathcal{N}_A$, then

$$\{u + tv : t \in \mathbb{R}\} \subseteq X,$$

which is not viable with (3). Therefore, (3) $\Longrightarrow$ (2). Now let $E = \{i \in \{1, \ldots, p\} : (Au)_i = b_i\}$. If $E \ne \emptyset$ and $\operatorname{rank} A_E = q$, then $u$ is an extreme point of $X$ by Theorem 22.31. On the other hand, if $E = \emptyset$ (in which case $A_E = O$) or $E \ne \emptyset$ and $\operatorname{rank} A_E < q$, then there exists a nonzero vector $v \in \mathcal{N}_{A_E}$ such that

$$(A(u + tv))_j - b_j > 0 \quad\text{for } j \notin E \text{ and } |t| \text{ small enough.}$$

But as $X$ does not contain a line, there exists an integer $j \notin E$ and a number $t_j \in \mathbb{R}$ such that $e_j^T(A(u + t_j v) - b) = 0$. Let $E_1 = E \cup \{j\}$. Then $E_1 \ne \emptyset$ and, as $e_j^T A v \ne 0$, whereas $A_E v = 0$, $\operatorname{rank} A_{E_1} \ge \operatorname{rank} A_E + 1$ (with the understanding that $\operatorname{rank} A_E = 0$ if $E = \emptyset$). If $\operatorname{rank} A_{E_1} = q$, then $u + t_j v$ is an extreme point of $X$ by Theorem 22.31. If $\operatorname{rank} A_{E_1} < q$, then the procedure can be repeated until after say $k$ such iterations $\operatorname{rank} A_{E_k} = q$. □

Theorem 22.34. If the set $X$ has at least one extreme point and if $c$ is a nonzero vector in $\mathbb{R}^q$ and $\gamma = \inf\{\langle x, c\rangle : x \in X\}$, then either $\gamma = -\infty$ or $\gamma$ is attained at an extreme point of $X$.

Proof. Suppose that $x \in X$ is not an extreme point of $X$ and let

$$E = E_x = \{i \in \{1, \ldots, p\} : (Ax)_i = b_i\} \quad\text{and}\quad F = F_x = \{1, \ldots, p\} \setminus E_x.$$

Then, in view of Theorem 22.31, there are two cases to consider: either $E = \emptyset$ or $E \ne \emptyset$ and $\operatorname{rank} A_E < q$. It is convenient to begin with a general principle that will be used in the analysis of both of these cases and to keep in mind that if $A_E u = 0$ for a nonzero vector $u \in \mathbb{R}^q$, then $e_j^T A u \ne 0$ for at least one integer $j \in F$, since $\mathcal{N}_A = \{0\}$; $e_j$ denotes the $j$'th column of $I_p$.

1. If $\langle u, c\rangle < 0$, then $e_j^T A u < 0$ for at least one choice of $j$.

If the assertion is false, then $Au \ge 0$ and hence $x + tu \in X$ for every $t \ge 0$. But then

$$\lim_{t \uparrow \infty}\langle x + tu,\ c\rangle = \langle x, c\rangle + \lim_{t \uparrow \infty} t\langle u, c\rangle = -\infty,$$

contrary to assumption. Therefore, there exists at least one integer $j$, $j = 1, \ldots, p$, such that $e_j^T A u < 0$.

2. If $E = \emptyset$, then there exists a vector $x_1 \in \mathbb{R}^q$ such that $E_{x_1} \ne \emptyset$.


If $E = \emptyset$, choose $u \in \mathbb{R}^q$ such that $\langle u, c\rangle < 0$. Then, in view of Step 1, there exist integers $j_1, \ldots, j_k$ such that $e_{j_i}^T A u < 0$ for $i = 1, \ldots, k$. Therefore, there is a corresponding set of positive numbers $t_1, \ldots, t_k$ such that

$$e_{j_i}^T A(x + t_i u) = b_{j_i} \quad\text{for } i = 1, \ldots, k.$$

Thus, if $\tau = \min\{t_1, \ldots, t_k\}$, then

$$e_j^T A(x + \tau u) \ge b_j \quad\text{for } j = 1, \ldots, p$$

with equality for at least one choice of $j$. Let $x_1 = x + \tau u$ and $E_1 = E_{x_1}$. Then $E_1 \ne \emptyset$ and $\operatorname{rank} A_{E_1} \ge 1$.

3. If $E \ne \emptyset$ and $\operatorname{rank} A_E < q$, then there exists a vector $x_1 \in \mathbb{R}^q$ such that

$$\operatorname{rank} A_{E_{x_1}} > \operatorname{rank} A_{E_x} \quad\text{and}\quad \langle x_1, c\rangle \le \langle x, c\rangle. \tag{22.29}$$

If $E \ne \emptyset$ and $\operatorname{rank} A_E < q$, then there exists a nonzero vector $u \in \mathcal{N}_{A_E}$ and an $\varepsilon > 0$ such that

$$A_E(x + tu) = b_E \quad\text{and}\quad A_F(x + tu) > b_F$$

for $|t| < \varepsilon$. Moreover, since $-u$ also meets these conditions, we may assume that $\langle u, c\rangle \le 0$ and hence that

$$\langle x + tu,\ c\rangle = \langle x, c\rangle + t\langle u, c\rangle \le \langle x, c\rangle \quad\text{when } t \ge 0. \tag{22.30}$$

But,

$$\text{if } \langle u, c\rangle < 0, \text{ then } (Au)_j < 0 \text{ for at least one integer } j \in F_x, \tag{22.31}$$

by Step 1. Therefore, there exists an integer $j \notin E$ and a number $\tau_1 > 0$ such that the vector $x_1 = x + \tau_1 u$ belongs to $X$ and $(Ax_1)_j = b_j$. Thus $E_{x_1} \supseteq E \cup \{j\}$. Moreover, since $e_i^T A u = 0$ for $i \in E$ and $e_j^T A u < 0$ for this choice of $j \in F$, this new row is linearly independent of the rows in $A_E$ and hence $\operatorname{rank} A_{E_1} \ge \operatorname{rank} A_E + 1$. This completes the proof of (22.29) when $E \ne \emptyset$ and $\langle u, c\rangle < 0$.

Suppose next that $E \ne \emptyset$ and $\langle u, c\rangle = 0$. Then the inequality (22.30) holds for every $t \in \mathbb{R}$ and, since $X$ cannot contain a line, there exists a number $\tau_1$ (possibly negative this time) such that $x_1 = x + \tau_1 u$ belongs to $X$ and $(Ax_1)_j = b_j$ and (22.29) is verified much as before.

4. There exists a set of integers $E_k$ and a vector $x_k$ such that

$$\operatorname{rank} A_{E_k} = q, \quad A_{E_k} x_k = b_{E_k} \quad\text{and}\quad \langle x_k, c\rangle \le \langle x, c\rangle.$$

The set $E_k$ is obtained by iterating the procedure in Step 3 $k$ times. Then, in view of Theorem 22.31, $x_k$ is an extreme point of $X$ and hence as $x$ was an arbitrary vector in $X$, it follows that the minimum value of $\langle x, c\rangle$ must be attained at an extreme point of $X$. □
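Corollary 22.32 and Theorem 22.34 together suggest a brute-force way to minimize $\langle x, c\rangle$ over a small polyhedron: solve $A'x = b'$ for each choice of $q$ rows, keep the feasible solutions, and compare the values of $\langle x, c\rangle$ at these points. The Python sketch below does this for $q = 2$ with integer data and exact rational arithmetic; the unit square, the two cost vectors and the function names are assumptions made for the illustration, and the comparison of values is justified only when the infimum is finite.

```python
from fractions import Fraction
from itertools import combinations

def extreme_points(A, b):
    """Extreme points of X = {x in R^2 : A x >= b} for integer A and b:
    feasible solutions of the 2x2 systems built from pairs of rows of rank 2."""
    pts = set()
    for i, j in combinations(range(len(A)), 2):
        det = A[i][0] * A[j][1] - A[i][1] * A[j][0]
        if det == 0:
            continue  # the two rows do not have rank 2
        # Cramer's rule for the system formed by rows i and j
        x1 = Fraction(b[i] * A[j][1] - A[i][1] * b[j], det)
        x2 = Fraction(A[i][0] * b[j] - b[i] * A[j][0], det)
        if all(row[0] * x1 + row[1] * x2 >= bi for row, bi in zip(A, b)):
            pts.add((x1, x2))
    return pts

def minimize_over_vertices(A, b, c):
    """Return (value, point) minimizing <x, c> over the extreme points of X."""
    return min((c[0] * x1 + c[1] * x2, (x1, x2)) for x1, x2 in extreme_points(A, b))

# the unit square: x1 >= 0, x2 >= 0, -x1 >= -1, -x2 >= -1
A = [(1, 0), (0, 1), (-1, 0), (0, -1)]
b = [0, 0, -1, -1]
vertices = extreme_points(A, b)
value_1, point_1 = minimize_over_vertices(A, b, (1, 2))
value_2, point_2 = minimize_over_vertices(A, b, (-1, -1))
```

Here $p = 4$ and $q = 2$, so the bound in Corollary 22.32 is $\binom{4}{2} = 6$, while the square has four extreme points; two of the six pairs of rows are parallel and yield no solution.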


22.16. Bibliographical notes

A number of the results stated in this chapter can be strengthened. The monograph by Webster [89] is an eminently readable source of supplementary information on convexity in $\mathbb{R}^n$. Exercise 22.13 was taken from [89]. Applications of convexity to optimization may be found in [12] and the references cited therein. The proof of Theorem 22.13 is adapted from the expository paper [14]; the proof of Theorem 22.23 is adapted from [61]. The presented proof of the Krein-Milman theorem, which works in more general settings (with convex hull replaced by the closure of the convex hull), is adapted from [91]. The presented proof of the convexity of numerical range is based on an argument that is sketched briefly in [45]. Halmos credits it to C. W. R. de Boor. The presented proof works also for bounded operators in Hilbert space; see also McIntosh [64] for another very attractive approach. The proof of the Heinz inequality is taken from a beautiful short paper [41] that establishes the equivalence of the inequalities in (1)-(4) of Lemma 22.28 with the Heinz inequality (22.27) for bounded operators in Hilbert space and sketches the history. The elegant passage from (22.25) to (22.27) is credited to an unpublished paper of A. McIntosh. The proof of Theorem 22.30 is adapted from [42].

The notion of convexity can be extended to matrix valued functions: a function $f$ that maps a convex set $Q$ of symmetric $p \times p$ matrices into a set of symmetric $q \times q$ matrices is said to be convex if

$$f(tX + (1 - t)Y) \preceq t f(X) + (1 - t) f(Y)$$

for every $t \in (0, 1)$ when $X$ and $Y$ belong to $Q$. Thus, for example, the function $f(X) = X^r$ on the set $Q = \{X \in \mathbb{R}^{n\times n} : X \succeq O\}$ is convex if $1 \le r \le 2$ or $-1 \le r \le 0$ and $-f$ is convex if $0 \le r \le 1$; see [2]. The proof of the Gauss-Lucas theorem in Section 22.13 is adapted from an exercise in [6]; the discussion of extreme points for polyhedra is adapted mostly from Chapter 2 of [8]. A complete description of the numerical range $W(A)$ of a matrix $A \in \mathbb{C}^{n\times n}$ is presented in [48].

Chapter 23

Matrices with nonnegative entries

Be wary of writing many books, there is no end, and much study is wearisome to the flesh. Ecclesiastes 12:12

Matrices with nonnegative entries play an important role in numerous applications. This chapter is devoted to the study of some of their special properties. A rectangular matrix $A \in \mathbb{R}^{n\times m}$ with entries $a_{ij}$, $i = 1,\ldots,n$, $j = 1,\ldots,m$, is said to be nonnegative if $a_{ij} \ge 0$ for $i = 1,\ldots,n$ and $j = 1,\ldots,m$; $A$ is said to be positive if $a_{ij} > 0$ for $i = 1,\ldots,n$ and $j = 1,\ldots,m$. The notations $A \ge O$ and $A > O$ are used to designate nonnegative matrices $A$ and positive matrices $A$, respectively. Note the distinction with the notation $A \succ O$ for positive definite and $A \succeq O$ for positive semidefinite matrices that was introduced earlier. The symbols $A \ge B$ and $A > B$ will be used to indicate that $A - B \ge O$ and $A - B > O$, respectively.

A nonnegative square matrix $A \in \mathbb{R}^{n\times n}$ is said to be irreducible if for every pair of indices $i, j \in \{1,\ldots,n\}$ there exists an integer $k \ge 1$ such that the $ij$ entry of $A^k$ is positive; i.e., in terms of the standard basis $e_i$, $i = 1,\ldots,n$, of $\mathbb{R}^n$, if

$\langle A^k e_j, e_i\rangle > 0$ for some positive integer $k$ that may depend upon $ij$.

This is less restrictive than assuming that there exists an integer $k \ge 1$ such that $A^k > O$.


Exercise 23.1. Show that the matrix

$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

is a nonnegative irreducible matrix, but that $A^k$ is never a positive matrix.

Exercise 23.2. Show that the matrix

A = [0 1

1] is irreducible, but the matrix B = [(1) 1

fl is not irreducible

and, more generally, that every triangular nonnegative matrix is not irreducible.

Lemma 23.1. Let $A \in \mathbb{R}^{n\times n}$ be a nonnegative irreducible matrix. Then $(I_n + A)^{n-1}$ is positive.

Proof. Suppose to the contrary that the $ij$ entry of $(I_n + A)^{n-1}$ is equal to zero for some choice of $i$ and $j$. Then, in view of the formula

$(I_n + A)^{n-1} = \sum_{k=0}^{n-1} \binom{n-1}{k} A^k$,

it is readily seen that

$\sum_{k=0}^{n-1} \binom{n-1}{k} \langle A^k e_j, e_i\rangle = 0$

and hence that $\langle A^k e_j, e_i\rangle = 0$ for $k = 0,\ldots,n-1$. Therefore, by the Cayley-Hamilton theorem, $\langle A^k e_j, e_i\rangle = 0$ for $k = 0, 1, \ldots$, which contradicts the assumed irreducibility. □
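Lemma 23.1 suggests a finite test for irreducibility. The following is a minimal Python sketch of that idea, not something taken from the text: the names `mat_mul` and `is_irreducible` are hypothetical, and the sketch also relies on the converse of the lemma (if $(I_n + A)^{n-1}$ is positive and $n \ge 2$, then $A$ is irreducible), which is a standard fact that the lemma itself does not state.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_irreducible(A):
    """Test a nonnegative square matrix for irreducibility.

    Assumption: for n >= 2 the matrix A is irreducible exactly when
    every entry of (I_n + A)^(n-1) is positive (Lemma 23.1 plus its
    standard converse).
    """
    n = len(A)
    if n == 1:
        # For a 1 x 1 matrix the definition needs (A^k)_{11} > 0 for some k >= 1.
        return A[0][0] > 0
    # Only the zero pattern matters, so use a 0/1 copy of I_n + A.
    M = [[1 if (i == j or A[i][j] > 0) else 0 for j in range(n)]
         for i in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(n - 1):
        power = mat_mul(power, M)
    return all(entry > 0 for row in power for entry in row)
```

For instance, the sketch reports the matrix with rows (0, 1) and (1, 0) as irreducible and an upper triangular matrix with rows (1, 1) and (0, 1) as not irreducible.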

Exercise 23.3. Let $A \in \mathbb{R}^{n\times n}$ have nonnegative entries and let $D \in \mathbb{R}^{n\times n}$ be a diagonal matrix with strictly positive diagonal entries. Show that $A$ is irreducible $\iff$ $AD$ is irreducible $\iff$ $DA$ is irreducible.

23.1. Perron-Frobenius theory

The main objective of this section is to show that if $A$ is a square nonnegative irreducible matrix, then $r_\sigma(A)$, the spectral radius of $A$, is an eigenvalue of $A$ with algebraic multiplicity equal to one and that it is possible to choose a corresponding eigenvector $u$ to have strictly positive entries.

Theorem 23.2. (Perron-Frobenius) Let $A \in \mathbb{R}^{n\times n}$ be a nonnegative irreducible matrix. Then $A \ne O_{n\times n}$ and:


(1) $r_\sigma(A) \in \sigma(A)$.

(2) There exists a positive vector $u \in \mathbb{R}^n$ such that $Au = r_\sigma(A)u$.

(3) There exists a positive vector $v \in \mathbb{R}^n$ such that $A^Tv = r_\sigma(A)v$.

(4) The algebraic multiplicity of $r_\sigma(A)$ as an eigenvalue of $A$ is equal to one.
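Items (1) and (2) of the theorem can be explored numerically. The sketch below is only an illustration under stated assumptions: `perron_pair` is a hypothetical name, and the method (power iteration applied to $I_n + A$, which is assumed to converge because $I_n + A$ has a positive power when $A$ is irreducible) is a standard numerical device, not the argument used in the proof that follows.

```python
def perron_pair(A, iterations=1000):
    """Approximate r_sigma(A) and a positive eigenvector of an
    irreducible nonnegative matrix A (list of rows).

    The iteration is applied to I_n + A rather than to A itself, so
    that matrices such as [[0, 1], [1, 0]] do not cause oscillation.
    """
    n = len(A)
    u = [1.0 / n] * n          # positive start vector with entries summing to 1
    r = 0.0
    for _ in range(iterations):
        # w = (I_n + A) u
        w = [u[i] + sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        total = sum(w)         # tends to 1 + r_sigma(A) because sum(u) == 1
        r = total - 1.0
        u = [wi / total for wi in w]
    return r, u
```

The returned vector is normalized so that its entries sum to one; any positive multiple is also an eigenvector.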

The proof of this theorem is divided into a number of lemmas. Let

$B = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$,

$C = \{x \in \mathbb{R}^n : x \ge 0\}$,

ez- denote the i’th column of In and, for nonzero vectors x E C, let (AX, 81> .

6A(x)= mzin{ 6,4(u),

by Lemma 23.3. But this contradicts the presumed maximality of $\delta_A(u)$. Therefore, $Au = \delta_A(u)u$. The last equality implies further that

$(I_n + A)^{n-1}u = (1 + \delta_A(u))^{n-1}u$

and hence, as $1 + \delta_A(u) > 0$, that $u \in C_A \cap B$.


Finally, if $\lambda \in \sigma(A)$, then there exists a vector $x \in B$ such that

$\lambda x_i = \sum_{j=1}^n a_{ij}x_j$ for $i = 1,\ldots,n$.

Thus,

$|\lambda|\,|x_i| \le \sum_{j=1}^n a_{ij}|x_j|$ for $i = 1,\ldots,n$,

since $a_{ij} \ge 0$ and hence the vector $v$ with components $v_i = |x_i|$ belongs to $C \cap B$ and

$|\lambda| \le \delta_A(v) \le \delta_A(u) \le r_\sigma(A)$.

But as this string of inequalities holds for every $\lambda \in \sigma(A)$, (3) holds.

D

Lemma 23.6. If $A \in \mathbb{R}^{n\times n}$ is a nonnegative irreducible matrix, $x$ is a nonzero vector in $\mathbb{C}^n$ such that $Ax = r_\sigma(A)x$ and $v$ is the vector in $\mathbb{R}^n$ with components $v_i = |x_i|$, then $Av = r_\sigma(A)v$ and $v > 0$.

Proof. It is readily checked that $Av - r_\sigma(A)v \ge 0$. Moreover, if $Av - r_\sigma(A)v \ne 0$, then $(I_n + A)^{n-1}(Av - r_\sigma(A)v) > 0$, i.e., $Aw - r_\sigma(A)w > 0$ for $w = (I_n + A)^{n-1}v$. Thus, $\delta_A(w) > r_\sigma(A)$, which contradicts Lemma 23.5. Consequently, $Av = r_\sigma(A)v$ and $v = (1 + r_\sigma(A))^{1-n}w > 0$, as claimed.

D

Lemma 23.7. Let $A \in \mathbb{R}^{n\times n}$ be a nonnegative irreducible matrix. Then the geometric multiplicity of $r_\sigma(A)$ as an eigenvalue of $A$ is equal to one, i.e., $\dim \mathcal{N}_{(r_\sigma(A)I_n - A)} = 1$.

Proof. Let $u$ and $v$ be any two nonzero vectors in $\mathbb{C}^n$ such that $Au = r_\sigma(A)u$ and $Av = r_\sigma(A)v$. Then, in view of Lemma 23.6, the entries $u_1,\ldots,u_n$ of $u$ and the entries $v_1,\ldots,v_n$ of $v$ are all nonzero and $v_1u - u_1v$ is also in the null space of the matrix $A - r_\sigma(A)I_n$. Thus, by another application of Lemma 23.6, either $v_1u - u_1v = 0$ or $|v_1u_j - u_1v_j| > 0$ for $j = 1,\ldots,n$. However, the second situation is clearly impossible since the first entry in the vector $v_1u - u_1v$ is equal to zero. Thus, $u$ and $v$ are linearly dependent. □

Lemma 23.8. Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{n\times n}$ be nonnegative matrices such that $A$ is irreducible and $A - B$ is nonnegative (i.e., $A \ge B \ge O$). Then:

(1) $r_\sigma(A) \ge r_\sigma(B)$.

(2) $r_\sigma(A) = r_\sigma(B) \iff A = B$.

Proof. Let $\beta \in \sigma(B)$ with $|\beta| = r_\sigma(B)$, let $By = \beta y$ for some nonzero vector $y \in \mathbb{C}^n$ and let $v \in \mathbb{R}^n$ be the vector with $v_i = |y_i|$ for $i = 1,\ldots,n$.


Then, in the usual notation,

$\beta y_i = \sum_{j=1}^n b_{ij}y_j$ for $i = 1,\ldots,n$

and so

(23.1) $r_\sigma(B)v_i = |\beta|\,|y_i| = \left|\sum_{j=1}^n b_{ij}y_j\right| \le \sum_{j=1}^n b_{ij}|y_j| \le \sum_{j=1}^n a_{ij}|y_j| = \sum_{j=1}^n a_{ij}v_j$.

Therefore, since $v \in C$, this implies that

$r_\sigma(B) \le \delta_A(v)$

and hence that $r_\sigma(B) \le r_\sigma(A)$. Suppose next that $r_\sigma(A) = r_\sigma(B)$. Then the inequality (23.1) implies that

$Av - r_\sigma(A)v \ge 0$.

But this forces $Av - r_\sigma(A)v = 0$ because otherwise Lemma 23.6 yields a contradiction to the already established inequality $r_\sigma(A) \ge \delta_A(u)$ for every $u \in C$. But this in turn implies that

$\sum_{j=1}^n (a_{ij} - b_{ij})v_j = 0$ for $i = 1,\ldots,n$,

and thus, as $a_{ij} - b_{ij} \ge 0$ and $v_j > 0$, we must have $a_{ij} = b_{ij}$ for every choice of $i, j \in \{1,\ldots,n\}$, i.e., $r_\sigma(A) = r_\sigma(B) \implies A = B$. The other direction is self-evident. □

Lemma 23.9. Let $A \in \mathbb{R}^{n\times n}$ be a nonnegative irreducible matrix. Then the algebraic multiplicity of $r_\sigma(A)$ as an eigenvalue of $A$ is equal to one.

Proof. It suffices to show that $r_\sigma(A)$ is a simple root of the characteristic polynomial $\varphi(\lambda) = \det(\lambda I_n - A)$ of the matrix $A$. Let

$C(\lambda) = \begin{bmatrix} c_{11}(\lambda) & \cdots & c_{1n}(\lambda) \\ \vdots & & \vdots \\ c_{n1}(\lambda) & \cdots & c_{nn}(\lambda) \end{bmatrix}$,

where

$c_{ij}(\lambda) = (-1)^{i+j}(\lambda I_n - A)_{\{ji\}}$

and $(\lambda I_n - A)_{\{ji\}}$ denotes the determinant of the $(n-1)\times(n-1)$ matrix that is obtained from $\lambda I_n - A$ by deleting the $j$'th row and the $i$'th column of $\lambda I_n - A$. Then, as

$(\lambda I_n - A)C(\lambda) = C(\lambda)(\lambda I_n - A) = \varphi(\lambda)I_n$


and $\varphi(r_\sigma(A)) = 0$, it follows that

$(r_\sigma(A)I_n - A)C(r_\sigma(A)) = O$,

and hence that each nonzero column of the matrix $C(r_\sigma(A))$ is an eigenvector of $A$ corresponding to the eigenvalue $r_\sigma(A)$. Therefore, in view of Lemma 23.7, each column of $C(r_\sigma(A))$ is a constant multiple of the unique vector $u \in C_A \cap B$ such that $Au = r_\sigma(A)u$, i.e.,

$C(r_\sigma(A)) = uw^T$ for some $w \in \mathbb{R}^n$.

Next, upon differentiating the formula $C(\lambda)(\lambda I_n - A) =$ [...] $|\lambda| < r_\sigma(A)$.

Exercise 23.9. Show that if the complex numbers $c_1,\ldots,c_n \in \mathbb{C}\setminus\{0\}$, then $|c_1 + \cdots + c_n| = |c_1| + |c_2| + \cdots + |c_n|$ if and only if there exists a number $\theta \in [0, 2\pi)$ such that $c_j = e^{i\theta}|c_j|$ for $j = 1,\ldots,n$.

Exercise 23.10. Prove Theorem 23.10. [HINT: Exploit Exercise 23.9.]

Exercise 23.11. Let $A \in \mathbb{R}^{n\times n}$ be a nonnegative irreducible matrix with spectral radius $r_\sigma(A) = 1$. Let $B = A - xy^T$, where $x = Ax$ and $y = A^Ty$ are positive eigenvectors of the matrices $A$ and $A^T$, respectively, corresponding to the eigenvalue 1 such that $y^Tx = 1$. Show that:

(a) $\sigma(B) \subset \sigma(A) \cup \{0\}$, but that $1 \notin \sigma(B)$.

(b) $\lim_{N\uparrow\infty} \frac{1}{N}\sum_{k=1}^N B^k = O$.

(c) $B^k = A^k - xy^T$ for $k = 1, 2, \ldots$.

(d) $\lim_{N\uparrow\infty} \frac{1}{N}\sum_{k=1}^N A^k = xy^T$.

[HINT: If $r_\sigma(B) < 1$, then it is readily checked that $B^k \to O$ as $k \to \infty$. However, if $r_\sigma(B) = 1$, then $B$ may have complex eigenvalues of the form $e^{i\theta}$ and a more careful analysis is required that exploits the fact that $\lim_{N\uparrow\infty} \frac{1}{N}\sum_{k=1}^N e^{ik\theta} = 0$ if $e^{i\theta} \ne 1$.]

23.2. Stochastic matrices

A nonnegative matrix $P \in \mathbb{R}^{n\times n}$ with entries $p_{ij}$, $i, j = 1,\ldots,n$, is said to be a stochastic matrix if

(23.2) $\sum_{j=1}^n p_{ij} = 1$ for $i = 1,\ldots,n$.

Stochastic matrices play a prominent role in the theory of Markov chains with a finite number of states.

Exercise 23.12. Show that $P \in \mathbb{R}^{n\times n}$ is a stochastic matrix if and only if $P^k$ is a stochastic matrix for every positive integer $k$.

Exercise 23.13. Show that the spectral radius $r_\sigma(P)$ of a stochastic matrix $P$ is equal to one. [HINT: Invoke Exercise 23.5 and Lemma 23.5 to justify the inequality $r_\sigma(P) \le 1$.]


Exercise 23.14. Show that if $P \in \mathbb{R}^{n\times n}$ is an irreducible stochastic matrix with entries $p_{ij}$ for $i, j = 1,\ldots,n$, then there exists a positive vector $u \in \mathbb{R}^n$ with entries $u_i$ for $i = 1,\ldots,n$ such that $u_j = \sum_{i=1}^n u_ip_{ij}$ for $j = 1,\ldots,n$. [HINT: Exploit Theorem 23.2 and Exercise 23.13.]

Exercise 23.15. Show that the matrix

$P = \begin{bmatrix} 1/2 & 0 & 1/2 \\ 1/4 & 1/2 & 1/4 \\ 1/8 & 3/8 & 1/2 \end{bmatrix}$

is an irreducible stochastic matrix and find a positive vector $u \in \mathbb{R}^3$ that meets the conditions discussed in Exercise 23.14.

Exercise 23.16. Find the eigenvalues of the (doubly) stochastic matrix $P = [e_1 \;\; e_3 \;\; e_2 \;\; e_4]$ based on the columns $e_j$, $j = 1,\ldots,4$, of $I_4$.

23.3. Behind Google

In a library of $n$ documents, the Google search engine associates a vector in $\mathbb{R}^n$ to each document. The entries of each such vector are based on a weighted average of the number of links of this document to other documents with overlapping sets of keywords; they are nonnegative and sum to one. Thus, if $G = [g_1 \cdots g_n]$ is the array of the $n$ vectors corresponding to the $n$ documents, then $G^T$ is a stochastic matrix. Let

$a^T = [1 \cdots 1]$, $A = \frac{1}{n}aa^T$ and $B = tG + (1-t)A$

for some fixed choice of $t \in (0,1)$. The matrix $B$ has good properties:

Theorem 23.11. If the matrix $B = tG + (1-t)A$ for some fixed choice of $t \in (0,1)$ has $k$ distinct eigenvalues $\lambda_1,\ldots,\lambda_k$ with $|\lambda_1| \ge \cdots \ge |\lambda_k|$, then

(23.3) $\lambda_1 = 1$ and $|\lambda_j| \le t$ for $j = 2,\ldots,k$.

Moreover, if in a Jordan decomposition $B = UJU^{-1}$, $U = [u_1 \cdots u_n]$ and $Bu_1 = u_1$, then

(23.4) $v = \sum_{j=1}^n c_ju_j \implies \lim_{m\uparrow\infty}\|B^mv - c_1u_1\| = 0$.

Proof. In view of Theorems 23.2 and 23.10, $\lambda_1 = r_\sigma(B)$ and $|\lambda_j| < 1$ for $j = 2,\ldots,k$. Moreover, since $B^T$ is a stochastic matrix, $r_\sigma(B) = 1$ and $B^Ta = a$. Thus, if

$Bu = \lambda u + w$, $|\lambda| < 1$ and $\langle w, a\rangle = 0$,

then

$\langle u, a\rangle = \langle u, B^Ta\rangle = \langle Bu, a\rangle = \langle \lambda u + w, a\rangle = \lambda\langle u, a\rangle$


and hence

$\langle u, a\rangle = 0$.

Therefore, upon applying this observation to the Jordan chains associated with each eigenvalue, it is readily checked that

$\langle u_j, a\rangle = 0$ for $j = 2,\ldots,n$.

Consequently,

$\lambda_ju_j = Bu_j = tGu_j + (1-t)Au_j = tGu_j$ for $j = 2,\ldots,n$.

Thus,

$\left\|B^m\sum_{j=1}^n c_ju_j - c_1u_1\right\| = \left\|B^m\sum_{j=2}^n c_ju_j\right\| = t^m\left\|G^m\sum_{j=2}^n c_ju_j\right\| \le t^m\|G^m\|\left\|\sum_{j=2}^n c_ju_j\right\|$,

which tends to 0 as $m \uparrow \infty$, since $\lim_{m\uparrow\infty}\|G^m\|^{1/m} = r_\sigma(G) = 1$. □
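The convergence statement (23.4) can be tried out on a toy library. The following Python sketch is illustrative only: the function names `apply_B` and `rank_vector` and the three-document matrix in the usage note are assumptions, not part of the text. It uses the observation that $(1-t)Av = (1-t)\frac{1}{n}(a^Tv)a$, so $Bv$ can be formed without building $B$.

```python
def apply_B(G, v, t):
    """Return B v for B = t G + (1 - t) (1/n) a a^T, with a = (1, ..., 1).

    G is a list of rows whose COLUMNS are the document vectors g_j
    (nonnegative entries, each column summing to one).
    """
    n = len(G)
    mean = sum(v) / n                      # (1/n) a^T v
    return [t * sum(G[i][j] * v[j] for j in range(n)) + (1.0 - t) * mean
            for i in range(n)]


def rank_vector(G, t=0.85, steps=100):
    """Approximate u_1 (normalized to have 1-norm one) by computing B^k v."""
    n = len(G)
    v = [1.0 / n] * n                      # start from the uniform vector
    for _ in range(steps):
        v = apply_B(G, v, t)
    return v
```

As a usage example, take the hypothetical matrix with rows (0, 1/2, 0), (1, 0, 1), (0, 1/2, 0): documents 1 and 3 both point to document 2, and the computed vector gives document 2 the largest entry.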

To have at least a rough idea of the advantage of working with $B$ instead of directly with $G$, look at Exercise 23.16.

The ranking of documents is based on the entries in a good approximation to $u_1$ (normalized so that $\|u_1\|_1 = 1$) that is obtained by computing $B^kv$ for large enough $k$. The numbers involved are reportedly on the order of: $n$ = 25,000,000,000, $k = 100$ with $t = .85$. This works because most of the entries in the vectors $g_j$ are equal to zero.

Exercise 23.17. Show that if $G = I_2$, then (in the notation of Theorem 23.11) $\sigma(B) = \{1, t\}$.

Exercise 23.18. Show that if (in the notation of Theorem 23.11) $B$ is diagonalizable, then

$\|B^mv - c_1u_1\| = \left\|\sum_{j=2}^n c_j\lambda_j^mu_j\right\| \le t^m\sum_{j=2}^n\|c_ju_j\|$.

23.4. Doubly stochastic matrices

A nonnegative matrix $P \in \mathbb{R}^{n\times n}$ is said to be a doubly stochastic matrix if both $P$ and $P^T$ are stochastic matrices, i.e., if (23.2) and

(23.5) $\sum_{i=1}^n p_{ij} = 1$ for $j = 1,\ldots,n$

are both in force. The main objective of this section is to establish a theorem of Birkhoff and von Neumann that states that every doubly stochastic matrix


is a convex combination of permutation matrices. It turns out that the notion of permanents is a convenient tool for obtaining this result.

If $A \in \mathbb{C}^{n\times n}$, the permanent of $A$, abbreviated $\operatorname{per}(A)$ or $\operatorname{per} A$, is defined by the rule

(23.6) $\operatorname{per}(A) = \sum_{\sigma\in\Sigma_n} a_{1\sigma(1)}\cdots a_{n\sigma(n)}$,

where the summation is taken over the set $\Sigma_n$ of all $n!$ permutations $\sigma$ of the integers $\{1,\ldots,n\}$. This differs from the formula (5.2) for $\det A$ because the term $d(P_\sigma)$ is replaced by the number one. There is also a formula for computing $\operatorname{per} A$ that is analogous to the formula for computing determinants by expanding by minors:

(23.7) $\operatorname{per} A = \sum_{j=1}^n a_{ij}\operatorname{per} A_{(ij)}$ for each choice of $i$,

where $A_{(ij)}$ denotes the $(n-1)\times(n-1)$ matrix that is obtained from $A$ by deleting the $i$'th row and the $j$'th column.
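Both formulas for the permanent translate directly into code. The sketch below is a brute-force illustration suitable only for small matrices; the function names `permanent` and `permanent_by_row` are hypothetical, and indices in the code start at 0 rather than 1.

```python
from itertools import permutations


def permanent(A):
    """Permanent via the defining sum (23.6) over all permutations."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        product = 1
        for i in range(n):
            product *= A[i][sigma[i]]
        total += product
    return total


def permanent_by_row(A, i=0):
    """Permanent via the expansion (23.7) along row i (0-based)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[i][j] == 0:
            continue                       # zero entries contribute nothing
        # A_(ij): delete row i and column j
        minor = [[A[r][c] for c in range(n) if c != j]
                 for r in range(n) if r != i]
        total += A[i][j] * permanent_by_row(minor)
    return total
```

For the matrix with rows (1, 2) and (3, 4), both routines return 1·4 + 2·3 = 10, whereas the determinant is 1·4 − 2·3 = −2.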

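Exercise 23.19. Show that if $A \in \mathbb{C}^{n\times n}$, then (i) $\operatorname{per} A$ is a multilinear functional of the rows of $A$; (ii) $\operatorname{per} PA = \operatorname{per} A = \operatorname{per} AP$ for every $n\times n$ permutation matrix $P$; and (iii) $\operatorname{per} I_n = 1$.

Exercise 23.20. Let $A = \begin{bmatrix} B & O \\ C & D \end{bmatrix}$, where $B$ and $D$ are square matrices. Show that

$\operatorname{per} A = \operatorname{per} B \cdot \operatorname{per} D$.

Exercise 23.21. Let $A \in \mathbb{R}^{2\times 2}$ be a nonnegative matrix. Show that $\operatorname{per} A = 0$ if and only if $A$ contains a $1\times 2$ submatrix of zeros or a $2\times 1$ submatrix of zeros.

Exercise 23.22. Let $A \in \mathbb{R}^{3\times 3}$ be a nonnegative matrix. Show that $\operatorname{per} A = 0$ if and only if $A$ contains an $r\times s$ submatrix of zeros where $r + s = 3 + 1$.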

Lemma 23.12. Let $A \in \mathbb{R}^{n\times n}$ be a nonnegative matrix. Then $\operatorname{per} A = 0$ if and only if there exists an $r\times s$ zero submatrix of $A$ with $r + s = n + 1$.

Proof. Suppose first that $\operatorname{per} A = 0$. Then, by Exercise 23.21, the claim is true for $2\times 2$ matrices $A$. Suppose that in fact the assertion is true for $k\times k$ matrices when $k < n$ and let $A \in \mathbb{R}^{n\times n}$. Then formula (23.7) implies that $a_{ij}\operatorname{per} A_{(ij)} = 0$ for every choice of $i, j = 1,\ldots,n$. If $a_{ij} = 0$ for $j = 1,\ldots,n$ and some $i$, then $A$ has a $1\times n$ submatrix of zeros. If $a_{ij} \ne 0$ for some $j$, then $\operatorname{per} A_{(ij)} = 0$ and so by the induction assumption $A_{(ij)}$ has an $r\times s$


submatrix of zeros with $r + s = n$. By permuting rows and columns we can assume $O_{r\times s}$ is the upper right-hand block of $A$, i.e.,

$P_1AP_2 = \begin{bmatrix} B & O \\ C & D \end{bmatrix}$

for some pair of permutation matrices $P_1$ and $P_2$. Thus, as

$\operatorname{per} B \cdot \operatorname{per} D = \operatorname{per}(P_1AP_2) = \operatorname{per} A = 0$,

it follows that either $\operatorname{per} B = 0$ or $\operatorname{per} D = 0$. Suppose, for the sake of definiteness, that $\operatorname{per} B = 0$. Then, since $B \in \mathbb{R}^{r\times r}$ and $r < n$, the induction assumption guarantees the existence of an $i\times j$ submatrix of zeros in $B$ with $i + j = r + 1$. Therefore, by permuting columns we obtain a zero submatrix of $A$ of size $i\times(j + s)$. This fits the assertion, since

$i + j + s = r + 1 + s = n + 1$.

The other cases are handled similarly.

To establish the converse, suppose now that $A$ has an $r\times s$ submatrix of zeros with $r + s = n + 1$. Then, by permuting rows and columns, we can without loss of generality assume that

$A = \begin{bmatrix} B & O \\ C & D \end{bmatrix}$, $r + s = n + 1$.

Thus, as $B \in \mathbb{R}^{r\times(n-s)}$ and $r\times(n-s) = r\times(r-1)$, any product of the form

$a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{r\sigma(r)}\cdots a_{n\sigma(n)}$

is equal to zero, since at least one of the first $r$ terms in the product sits in the zero block. □

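Lemma 23.13. Let $A \in \mathbb{R}^{n\times n}$ be a doubly stochastic matrix. Then $\operatorname{per} A > 0$.

Proof. If $\operatorname{per} A = 0$, then, by the last lemma, we can assume that

$A = \begin{bmatrix} B & O \\ C & D \end{bmatrix}$,

where $B \in \mathbb{R}^{r\times(n-s)}$, $C \in \mathbb{R}^{(n-r)\times(n-s)}$ and $r + s = n + 1$. Let $\Sigma_G$ denote the sum of the entries in the matrix $G$. Then, since $A$ is doubly stochastic,

$r = \Sigma_B \le \Sigma_B + \Sigma_C = n - s$;

i.e., $r + s \le n$, which is not compatible with the assumption $\operatorname{per} A = 0$. Therefore, $\operatorname{per} A > 0$, as claimed. □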


Theorem 23.14. (Birkhoff-von Neumann) Let $P \in \mathbb{R}^{n\times n}$ be a doubly stochastic matrix. Then $P$ is a convex combination of finitely many permutation matrices.

Proof. If $P$ is a permutation matrix, then the assertion is self-evident. If $P$ is not a permutation matrix, then, in view of Lemma 23.13 and the fact that $P$ is doubly stochastic, there exists a permutation $\sigma$ of the integers $\{1,\ldots,n\}$ such that $1 > p_{1\sigma(1)}p_{2\sigma(2)}\cdots p_{n\sigma(n)} > 0$. Let

$\lambda_1 = \min\{p_{1\sigma(1)},\ldots,p_{n\sigma(n)}\}$

and let $\Pi_1$ be the permutation matrix with 1's in the $i\sigma(i)$ position for $i = 1,\ldots,n$. Then it is readily checked that

$P_1 = \frac{P - \lambda_1\Pi_1}{1 - \lambda_1}$

is a doubly stochastic matrix with at least one more zero entry than $P$ and that $P = \lambda_1\Pi_1 + (1 - \lambda_1)P_1$.

If $P_1$ is not a permutation matrix, then the preceding argument can be repeated; i.e., there exists a number $\lambda_2$, $0 < \lambda_2 < 1$, and a permutation matrix $\Pi_2$ such that

$P_2 = \frac{P_1 - \lambda_2\Pi_2}{1 - \lambda_2}$

is a doubly stochastic matrix with at least one more zero entry than $P_1$. Then

$P = \lambda_1\Pi_1 + (1 - \lambda_1)\{\lambda_2\Pi_2 + (1 - \lambda_2)P_2\}$.

Clearly this procedure must terminate after a finite number of steps. □
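The proof is constructive, and the construction can be sketched in a few lines of Python for small matrices. This is an illustration under stated assumptions rather than the author's procedure verbatim: the name `birkhoff_decomposition` is hypothetical, the permutation is found by brute force, exact arithmetic with `fractions.Fraction` is assumed (entries such as 1/3 should be passed as `Fraction` objects, not floats), and the code tracks the unnormalized residual $P - \sum_s \lambda_s\Pi_s$ instead of renormalizing to $P_1, P_2, \ldots$ at each step, so the returned weights are the final convex coefficients.

```python
from fractions import Fraction
from itertools import permutations


def birkhoff_decomposition(P):
    """Write a doubly stochastic matrix P as sum_s weight_s * Pi_s.

    Returns a list of (weight, sigma) pairs, where sigma[i] is the
    column of the 1 in row i of the permutation matrix Pi_s.
    """
    n = len(P)
    R = [[Fraction(x) for x in row] for row in P]   # residual matrix
    remaining = Fraction(1)     # common row/column sum of the residual
    terms = []
    while remaining > 0:
        # R / remaining is doubly stochastic, so by Lemma 23.13 some
        # permutation meets only positive entries of R.
        sigma = next(s for s in permutations(range(n))
                     if all(R[i][s[i]] > 0 for i in range(n)))
        weight = min(R[i][sigma[i]] for i in range(n))
        terms.append((weight, sigma))
        for i in range(n):
            R[i][sigma[i]] -= weight   # creates at least one new zero entry
        remaining -= weight
    return terms
```

Each pass through the loop zeroes at least one more entry of the residual, which mirrors the termination argument in the proof.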

Exercise 23.23. Show that the representation of a doubly stochastic matrix $P \in \mathbb{R}^{n\times n}$ as a convex combination of permutation matrices is unique if $n = 1, 2, 3$, but not if $n \ge 4$. [HINT: If $n \ge 4$, then $n! > n^2$.]

Exercise 23.24. Let $Q$ denote the set of doubly stochastic $n\times n$ matrices.

(a) Show that $Q$ is a convex set and that every $n\times n$ permutation matrix is an extreme point of $Q$.

(b) Show that if $P \in Q$ and $P$ is not a permutation matrix, then $P$ is not an extreme point of $Q$.

(c) Give a second proof of Theorem 23.14 based on the Krein-Milman theorem.


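23.5. An inequality of Ky Fan

Let $A, B \in \mathbb{C}^{n\times n}$ be a pair of Hermitian matrices with eigenvalues $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_n$ and $\nu_1 \ge \nu_2 \ge \cdots \ge \nu_n$, respectively. Then the Cauchy-Schwarz inequality applied to the inner product space $\mathbb{C}^{n\times n}$ with inner product

$\langle A, B\rangle = \operatorname{trace}\{B^HA\}$

leads easily to the inequality

$\operatorname{trace}\{AB\} \le \left\{\sum_{j=1}^n \mu_j^2\right\}^{1/2}\left\{\sum_{j=1}^n \nu_j^2\right\}^{1/2}$.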

In this section we shall use the Birkhoff-von Neumann theorem and the Hardy-Littlewood-Polya rearrangement lemma to obtain a sharper result for real symmetric matrices. The Hardy-Littlewood-Polya rearrangement lemma, which extends the observation that

(23.8) $a_1 \ge a_2$ and $b_1 \ge b_2 \implies a_1b_1 + a_2b_2 \ge a_1b_2 + a_2b_1$

to longer ordered sequences of numbers, can be formulated as follows:

Lemma 23.15. Let $a$ and $b$ be vectors in $\mathbb{R}^n$ with entries $a_1 \ge a_2 \ge \cdots \ge a_n$ and $b_1 \ge b_2 \ge \cdots \ge b_n$, respectively. Then

(23.9) $a^TPb \le a^Tb$

for every $n\times n$ permutation matrix $P$.

Proof. Let $P = \sum_{j=1}^n e_je_{\sigma(j)}^T$ for some one to one mapping $\sigma$ of the integers $\{1,\ldots,n\}$ onto themselves, and suppose that $P \ne I_n$. Then there exists a smallest positive integer $k$ such that $\sigma(k) \ne k$. If $k > 1$, this means that $\sigma(1) = 1,\ldots,\sigma(k-1) = k-1$ and $k = \sigma(\ell)$, for some integer $\ell > k$. Therefore, $b_k = b_{\sigma(\ell)} \ge b_{\sigma(k)}$ and hence the inequality (23.8) implies that

$a_kb_{\sigma(k)} + a_\ell b_{\sigma(\ell)} \le a_kb_{\sigma(\ell)} + a_\ell b_{\sigma(k)} = a_kb_k + a_\ell b_{\sigma(k)}$.

In the same way, one can rearrange the remaining terms to obtain the inequality (23.9). □
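Lemma 23.15 is easy to confirm by brute force on small examples. The following Python sketch (the name `rearranged_sums` is hypothetical) lists the values $\sum_i a_ib_{\sigma(i)}$ over all permutations $\sigma$, so that the largest one can be compared with $a^Tb$.

```python
from itertools import permutations


def rearranged_sums(a, b):
    """Return the values a_1 b_sigma(1) + ... + a_n b_sigma(n)
    for every permutation sigma of the entries of b."""
    return [sum(x * y for x, y in zip(a, perm)) for perm in permutations(b)]
```

With the decreasing vectors a = (5, 3, 1) and b = (4, 2, 0), the six sums are 26, 22, 22, 14, 14 and 10; the largest is $a^Tb = 26$, and the smallest comes from pairing the entries of a with those of b in reversed order.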

Lemma 23.16. Let $A, B \in \mathbb{R}^{n\times n}$ be symmetric matrices with eigenvalues $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_n$ and $\nu_1 \ge \nu_2 \ge \cdots \ge \nu_n$, respectively. Then

(23.10) $\operatorname{trace}\{AB\} \le \mu_1\nu_1 + \cdots + \mu_n\nu_n$,

with equality if and only if there exists an $n\times n$ orthogonal matrix $U$ that diagonalizes both matrices and preserves the order of the eigenvalues in each.


Proof. Under the given assumptions there exists a pair of $n\times n$ orthogonal matrices $U$ and $V$ such that

$A = UD_AU^T$ and $B = VD_BV^T$,

where $D_A = \operatorname{diag}\{\mu_1,\ldots,\mu_n\}$ and $D_B = \operatorname{diag}\{\nu_1,\ldots,\nu_n\}$. Thus

$\operatorname{trace}\{AB\} = \operatorname{trace}\{UD_AU^TVD_BV^T\} = \operatorname{trace}\{D_AWD_BW^T\} = \sum_{i,j=1}^n \mu_iw_{ij}^2\nu_j$,

where $w_{ij}$ denotes the $ij$ entry of the matrix $W = U^TV$. Moreover, since $W$ is an orthogonal matrix, the matrix $Z \in \mathbb{R}^{n\times n}$ with entries $z_{ij} = w_{ij}^2$, $i, j = 1,\ldots,n$, is a doubly stochastic matrix and consequently, by the Birkhoff-von Neumann theorem,

$Z = \sum_{s=1}^\ell \lambda_sP_s$

is a convex combination of permutation matrices. Thus, upon setting $x^T = [\mu_1,\ldots,\mu_n]$ and $y^T = [\nu_1,\ldots,\nu_n]$ and invoking Lemma 23.15, it is readily seen that

$\operatorname{trace}\{AB\} = \sum_{s=1}^\ell \lambda_sx^TP_sy \le \sum_{s=1}^\ell \lambda_sx^Ty = x^Ty$,

as claimed. The case of equality is left to the reader. □

Remark 23.17. A byproduct of the proof is the observation that the Schur product $U \circ U$ of an orthogonal matrix $U$ with itself is a doubly stochastic matrix. Doubly stochastic matrices of this special form are often referred to as orthostochastic matrices. Not every doubly stochastic matrix is an orthostochastic matrix; see Exercise 23.26. The subclass of orthostochastic matrices plays a special role in the next section.

Exercise 23.25. Show that in the setting of Lemma 23.16, $\operatorname{trace}\{AB\} \ge \mu_1\nu_n + \cdots + \mu_n\nu_1$. [HINT: $b_1 \ge \cdots \ge b_n \implies -b_n \ge \cdots \ge -b_1$.]


Exercise 23.26. Show that the doubly stochastic matrix

$A = \frac{1}{6}\begin{bmatrix} 0 & 3 & 3 \\ 3 & 1 & 2 \\ 3 & 2 & 1 \end{bmatrix}$

is not an orthostochastic matrix.

23.6. The Schur-Horn convexity theorem

Let $A = [a_{ij}]$, $i, j = 1,\ldots,n$, be a real symmetric matrix with eigenvalues $\mu_1,\ldots,\mu_n$. Then, by Theorem 9.7, there exists an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$ such that $A = QDQ^T$, where $D = \operatorname{diag}\{\mu_1,\ldots,\mu_n\}$. Thus,

$a_{ii} = \sum_{j=1}^n q_{ij}^2\mu_j$ for $i = 1,\ldots,n$

and the vector $d_A$ with components $a_{ii}$ for $i = 1,\ldots,n$ is given by the formula

$d_A = \begin{bmatrix} a_{11} \\ \vdots \\ a_{nn} \end{bmatrix} = B\begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}$,

where $B$ denotes the orthostochastic matrix with entries

$b_{ij} = q_{ij}^2$ for $i, j = 1,\ldots,n$.

This observation is due to Schur [79]. By Theorem 23.14,

$B = \sum_{\sigma\in\Sigma_n} c_\sigma P_\sigma$

is a convex combination of permutation matrices $P_\sigma$. Thus, upon writing $P_\sigma$ in terms of the standard basis $e_i$, $i = 1,\ldots,n$, for $\mathbb{R}^n$ as $P_\sigma = \sum_{i=1}^n e_ie_{\sigma(i)}^T$, it is readily checked that the vector

$d_A = \sum_{\sigma\in\Sigma_n} c_\sigma\begin{bmatrix} \mu_{\sigma(1)} \\ \vdots \\ \mu_{\sigma(n)} \end{bmatrix}$

is a convex combination of the vectors corresponding to the eigenvalues of the matrix $A$ and all their permutations. In other words,

(23.11) $d_A \in \operatorname{conv}\left\{\begin{bmatrix} \mu_{\sigma(1)} \\ \vdots \\ \mu_{\sigma(n)} \end{bmatrix} : \sigma \in \Sigma_n\right\}$.

There is a converse statement due to Horn [50], but in order to state it, we must first introduce the notion of majorization.
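Schur's observation $d_A = B\mu$ with $b_{ij} = q_{ij}^2$ can be checked by hand in the $2\times 2$ case. The Python sketch below is only an illustration: the function name is hypothetical and the orthogonal matrix $Q$ is assumed to be a plane rotation through an angle `theta`.

```python
import math


def diagonal_via_orthostochastic(theta, mu1, mu2):
    """Compare the diagonal of A = Q D Q^T with B mu, where Q is the
    rotation through theta, D = diag(mu1, mu2) and b_ij = q_ij ** 2."""
    c, s = math.cos(theta), math.sin(theta)
    Q = [[c, -s], [s, c]]
    mu = [mu1, mu2]
    # A = Q D Q^T, computed entry by entry
    A = [[sum(Q[i][k] * mu[k] * Q[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]
    # Orthostochastic matrix of squared entries of Q
    B = [[Q[i][j] ** 2 for j in range(2)] for i in range(2)]
    d_A = [A[0][0], A[1][1]]
    B_mu = [sum(B[i][j] * mu[j] for j in range(2)) for i in range(2)]
    return d_A, B_mu
```

Here $B$ has rows $(\cos^2\theta, \sin^2\theta)$ and $(\sin^2\theta, \cos^2\theta)$, so each diagonal entry of $A$ is a convex combination of $\mu_1$ and $\mu_2$, in agreement with (23.11).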


Given a sequence $\{x_1,\ldots,x_n\}$ of real numbers, let $\{\hat{x}_1,\ldots,\hat{x}_n\}$ denote the rearrangement of the sequence in "decreasing" order: $\hat{x}_1 \ge \cdots \ge \hat{x}_n$. Thus, for example, if $n = 4$ and $\{x_1,\ldots,x_4\} = \{5,3,6,1\}$, then $\{\hat{x}_1,\ldots,\hat{x}_4\} = \{6,5,3,1\}$. A sequence $\{x_1,\ldots,x_n\}$ of real numbers is said to majorize a sequence $\{y_1,\ldots,y_n\}$ of real numbers if

$\hat{x}_1 + \cdots + \hat{x}_k \ge \hat{y}_1 + \cdots + \hat{y}_k$ for $k = 1,\ldots,n-1$

and

$\hat{x}_1 + \cdots + \hat{x}_n = \hat{y}_1 + \cdots + \hat{y}_n$.
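The definition of majorization is straightforward to test numerically. A minimal Python sketch follows; the name `majorizes` and the floating-point tolerance `tol` are assumptions made for illustration.

```python
def majorizes(x, y, tol=1e-12):
    """Return True if the sequence x majorizes the sequence y."""
    if len(x) != len(y):
        return False
    xs = sorted(x, reverse=True)           # rearrangement in decreasing order
    ys = sorted(y, reverse=True)
    partial_x = partial_y = 0
    for k in range(len(xs) - 1):           # partial sums for k = 1, ..., n-1
        partial_x += xs[k]
        partial_y += ys[k]
        if partial_x < partial_y - tol:
            return False
    return abs(sum(xs) - sum(ys)) <= tol   # the full sums must agree
```

For example, {5, 3, 6, 1} majorizes {4, 4, 4, 3}: the decreasing rearrangements have partial sums 6, 11, 14 and 4, 8, 12, and both sequences sum to 15. The reverse relation fails already at the first partial sum.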

Exercise 23.27. Show that if $A \in \mathbb{R}^{n\times n}$ is a doubly stochastic matrix and if $y = Ax$ for some $x \in \mathbb{R}^n$, then the set of entries $\{x_1,\ldots,x_n\}$ in the vector $x$ majorizes the set of entries $\{y_1,\ldots,y_n\}$ in the vector $y$. [HINT: If $x_1 \ge \cdots \ge x_n$, $y_1 \ge \cdots \ge y_n$ and $1 \le i \le k \le n$, then

$y_i = \sum_{j=1}^n a_{ij}x_j \le \sum_{j=1}^{k-1} a_{ij}x_j + x_k\sum_{j=k}^n a_{ij} = \sum_{j=1}^{k-1} a_{ij}(x_j - x_k) + x_k$.

Now exploit the fact that $\sum_{i=1}^k a_{ij} \le 1$ to bound $y_1 + \cdots + y_k$.]

Lemma 23.18. Let $\{x_1,\ldots,x_n\}$ and $\{y_1,\ldots,y_n\}$ be two sequences of real numbers such that $\{x_1,\ldots,x_n\}$ majorizes $\{y_1,\ldots,y_n\}$. Then there exists a set of $n-1$ orthonormal vectors $u_1,\ldots,u_{n-1}$ and a permutation $\sigma \in \Sigma_n$ such that

(23.12) $y_{\sigma(i)} = \sum_{j=1}^n (u_i)_j^2x_j$ for $i = 1,\ldots,n-1$.

Discussion. Without loss of generality, we may assume that $x_1 \ge \cdots \ge x_n$ and $y_1 \ge \cdots \ge y_n$. To ease the presentation, we shall focus on the case $n = 4$. Then the given assumptions imply that

$x_1 \ge y_1$, $x_1 + x_2 \ge y_1 + y_2$, $x_1 + x_2 + x_3 \ge y_1 + y_2 + y_3$

and, because of the equality $x_1 + \cdots + x_4 = y_1 + \cdots + y_4$,

$x_4 \le y_4$, $x_3 + x_4 \le y_3 + y_4$, $x_2 + x_3 + x_4 \le y_2 + y_3 + y_4$.

The rest of the argument depends upon the location of the points $y_1, y_2, y_3$ with respect to the points $x_1,\ldots,x_4$. We shall suppose that $x_1 > x_2 > x_3 > x_4$ and shall consider three cases:

Case 1: $y_3 \le x_3$ and $y_2 \le x_2$.

Since $y_3 \ge y_4 \ge x_4$, there clearly exists a choice of $c_3, c_4 \in \mathbb{R}$ such that

$y_3 = c_3^2x_3 + c_4^2x_4$ with $c_3^2 + c_4^2 = 1$.

Moreover, if $\gamma = -c_4$ and $\delta = c_3$, then

$\gamma c_3 + \delta c_4 = 0$, $\gamma^2 + \delta^2 = 1$


and

$\gamma^2x_3 + \delta^2x_4 = (1 - c_3^2)x_3 + (1 - c_4^2)x_4 = x_3 + x_4 - y_3 \le y_4 \le y_2$.

Therefore, there exists a choice of $u, v \in \mathbb{R}$ such that

$y_2 = u^2(\gamma^2x_3 + \delta^2x_4) + v^2x_2$ with $u^2 + v^2 = 1$.

Thus, the vectors $u_1$ and $u_2$ defined by the formulas

$u_1^T = [0 \;\; 0 \;\; c_3 \;\; c_4]$ and $u_2^T = [0 \;\; b_2 \;\; b_3 \;\; b_4]$

with $b_2 = v$, $b_3 = u\gamma$ and $b_4 = u\delta$ are orthonormal and

$y_2 = b_2^2x_2 + b_3^2x_3 + b_4^2x_4$.

Next, choose constants $\beta, \gamma, \delta \in \mathbb{R}$ such that the matrix

$\begin{bmatrix} 0 & b_2 & \beta \\ c_3 & b_3 & \gamma \\ c_4 & b_4 & \delta \end{bmatrix}$

is an orthogonal matrix. Then, since the transpose of this matrix is also an orthogonal matrix,

$\beta^2x_2 + \gamma^2x_3 + \delta^2x_4 = (1 - b_2^2)x_2 + (1 - c_3^2 - b_3^2)x_3 + (1 - c_4^2 - b_4^2)x_4 = x_2 + x_3 + x_4 - y_3 - y_2 \le y_4 \le y_1 \le x_1$.

Therefore, there exist $u, v \in \mathbb{R}$ such that

$y_1 = u^2(\beta^2x_2 + \gamma^2x_3 + \delta^2x_4) + v^2x_1$ and $u^2 + v^2 = 1$.

Consequently,

$y_1 = a_1^2x_1 + a_2^2x_2 + a_3^2x_3 + a_4^2x_4$,

with $a_1 = v$, $a_2 = u\beta$, $a_3 = u\gamma$ and $a_4 = u\delta$ and the vectors

$u_1 = \begin{bmatrix} 0 \\ 0 \\ c_3 \\ c_4 \end{bmatrix}$, $u_2 = \begin{bmatrix} 0 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}$ and $u_3 = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}$

are orthonormal.

Case 2: $y_1 \ge x_2$ and $y_2 \ge x_3$.

Since $y_1 \le x_1$, there exists a pair of constants $a_1, a_2 \in \mathbb{R}$ such that

$y_1 = a_1^2x_1 + a_2^2x_2$ with $a_1^2 + a_2^2 = 1$.

Thus, if $\gamma = -a_2$ and $\delta = a_1$, then $\gamma a_1 + \delta a_2 = 0$, $\gamma^2 + \delta^2 = 1$ and

$\gamma^2x_1 + \delta^2x_2 = (1 - a_1^2)x_1 + (1 - a_2^2)x_2 = x_1 + x_2 - y_1 \ge y_2$.

Therefore, there exist a pair of constants $u, v \in \mathbb{R}$ such that

$y_2 = u^2(\gamma^2x_1 + \delta^2x_2) + v^2x_3$ with $u^2 + v^2 = 1$,


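i.e.,

$y_2 = b_1^2x_1 + b_2^2x_2 + b_3^2x_3$

with $b_1 = u\gamma$, $b_2 = u\delta$, $b_3 = v$ and the vectors

$u_1^T = [a_1 \;\; a_2 \;\; 0 \;\; 0]$ and $u_2^T = [b_1 \;\; b_2 \;\; b_3 \;\; 0]$

are orthonormal. Now choose $\alpha_1, \alpha_2, \alpha_3 \in \mathbb{R}$ so that the matrix

$\begin{bmatrix} a_1 & b_1 & \alpha_1 \\ a_2 & b_2 & \alpha_2 \\ 0 & b_3 & \alpha_3 \end{bmatrix}$

is an orthogonal matrix. Then

$\alpha_1^2x_1 + \alpha_2^2x_2 + \alpha_3^2x_3 = (1 - a_1^2 - b_1^2)x_1 + (1 - a_2^2 - b_2^2)x_2 + (1 - b_3^2)x_3 = x_1 + x_2 + x_3 - y_1 - y_2 \ge y_3 \ge y_4 \ge x_4$.

Therefore, there exist a pair of constants $u, v \in \mathbb{R}$ such that

$y_4 = u^2(\alpha_1^2x_1 + \alpha_2^2x_2 + \alpha_3^2x_3) + v^2x_4$ with $u^2 + v^2 = 1$,

i.e.,

$y_4 = c_1^2x_1 + c_2^2x_2 + c_3^2x_3 + c_4^2x_4$

with $c_1 = u\alpha_1$, $c_2 = u\alpha_2$, $c_3 = u\alpha_3$ and $c_4 = v$; and the vectors

$u_1 = \begin{bmatrix} a_1 \\ a_2 \\ 0 \\ 0 \end{bmatrix}$, $u_2 = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ 0 \end{bmatrix}$ and $u_3 = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}$

are orthonormal.

Case 3: $x_3 \le y_3 \le y_2 \le y_1 \le x_2$.

Clearly

$y_2 = b_2^2x_2 + b_3^2x_3$ with $b_2^2 + b_3^2 = 1$.

Thus, if $\gamma = -b_3$ and $\delta = b_2$, then $\gamma b_2 + \delta b_3 = 0$, $\gamma^2 + \delta^2 = 1$,

(23.13) $\gamma^2x_2 + \delta^2x_3 = (1 - b_2^2)x_2 + (1 - b_3^2)x_3 = x_2 + x_3 - y_2$

and there are two subcases to consider:

(a) $x_2 + x_3 \le y_1 + y_2$ and (b) $x_2 + x_3 \ge y_1 + y_2$.

(a) If $x_2 + x_3 \le y_1 + y_2$, then (23.13) implies that $\gamma^2x_2 + \delta^2x_3 \le y_1 \le x_1$ and hence that there exist a pair of constants $u, v \in \mathbb{R}$ with $u^2 + v^2 = 1$ such that $y_1 = u^2(\gamma^2x_2 + \delta^2x_3) + v^2x_1$, i.e., if $a_1 = v$, $a_2 = u\gamma$ and $a_3 = u\delta$, then

$y_1 = a_1^2x_1 + a_2^2x_2 + a_3^2x_3$ with $a_1^2 + a_2^2 + a_3^2 = 1$ and $a_2b_2 + a_3b_3 = 0$.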


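Now choose $\alpha_1, \alpha_2, \alpha_3 \in \mathbb{R}$ such that the matrix

$\begin{bmatrix} 0 & a_1 & \alpha_1 \\ b_2 & a_2 & \alpha_2 \\ b_3 & a_3 & \alpha_3 \end{bmatrix}$

is an orthogonal matrix. Then

$\alpha_1^2x_1 + \alpha_2^2x_2 + \alpha_3^2x_3 = (1 - a_1^2)x_1 + (1 - a_2^2 - b_2^2)x_2 + (1 - a_3^2 - b_3^2)x_3 = x_1 + x_2 + x_3 - y_1 - y_2 \ge y_3 \ge y_4 \ge x_4$.

Thus,

$y_4 = u^2(\alpha_1^2x_1 + \alpha_2^2x_2 + \alpha_3^2x_3) + v^2x_4$ with $u^2 + v^2 = 1$,

i.e.,

$y_4 = c_1^2x_1 + c_2^2x_2 + c_3^2x_3 + c_4^2x_4$

with $c_1 = u\alpha_1$, $c_2 = u\alpha_2$, $c_3 = u\alpha_3$, $c_4 = v$ and the vectors

$u_1 = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ 0 \end{bmatrix}$, $u_2 = \begin{bmatrix} 0 \\ b_2 \\ b_3 \\ 0 \end{bmatrix}$ and $u_3 = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}$

are orthonormal.

(b) If $x_2 + x_3 \ge y_1 + y_2$, then (23.13) implies that

$\gamma^2x_2 + \delta^2x_3 \ge y_1 \ge y_3 \ge x_3 \ge x_4$.

Therefore,

$y_3 = u^2(\gamma^2x_2 + \delta^2x_3) + v^2x_4$ with $u^2 + v^2 = 1$,

i.e.,

$y_3 = c_2^2x_2 + c_3^2x_3 + c_4^2x_4$

with $c_2 = u\gamma$, $c_3 = u\delta$ and $c_4 = v$.

Now choose $\alpha_2, \alpha_3, \alpha_4 \in \mathbb{R}$ so that the matrix

$\begin{bmatrix} b_2 & c_2 & \alpha_2 \\ b_3 & c_3 & \alpha_3 \\ 0 & c_4 & \alpha_4 \end{bmatrix}$

is an orthogonal matrix. Then

$\alpha_2^2x_2 + \alpha_3^2x_3 + \alpha_4^2x_4 = (1 - b_2^2 - c_2^2)x_2 + (1 - b_3^2 - c_3^2)x_3 + (1 - c_4^2)x_4 = x_2 + x_3 + x_4 - y_2 - y_3 \le y_4 \le y_1 \le x_1$.

Thus,

$y_1 = u^2(\alpha_2^2x_2 + \alpha_3^2x_3 + \alpha_4^2x_4) + v^2x_1$ with $u^2 + v^2 = 1$,

i.e., if $a_1 = v$, $a_2 = u\alpha_2$, $a_3 = u\alpha_3$ and $a_4 = u\alpha_4$, then

$y_1 = a_1^2x_1 + a_2^2x_2 + a_3^2x_3 + a_4^2x_4$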


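and the vectors

$u_1 = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}$, $u_2 = \begin{bmatrix} 0 \\ b_2 \\ b_3 \\ 0 \end{bmatrix}$ and $u_3 = \begin{bmatrix} 0 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}$

are orthonormal. □

Theorem 23.19. (Schur-Horn) Let $\{\mu_1,\ldots,\mu_n\}$ be any set of real numbers (not necessarily distinct) and let $a \in \mathbb{R}^n$. Then

$a \in \operatorname{conv}\left\{\begin{bmatrix} \mu_{\sigma(1)} \\ \vdots \\ \mu_{\sigma(n)} \end{bmatrix} : \sigma \in \Sigma_n\right\}$

if and only if there exists a symmetric matrix $A \in \mathbb{R}^{n\times n}$ with eigenvalues $\mu_1,\ldots,\mu_n$ such that $d_A = a$.

Proof. Suppose first that $a$ belongs to the indicated convex hull. Then

$a = \sum_{\sigma\in\Sigma_n} c_\sigma P_\sigma\begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = P\begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}$,

where $P = \sum_{\sigma\in\Sigma_n} c_\sigma P_\sigma$ is a convex combination of permutation matrices $P_\sigma$ and hence, by Exercise 23.27, $\{\mu_1,\ldots,\mu_n\}$ majorizes $\{a_1,\ldots,a_n\}$. Therefore, by Lemma 23.18, there exists a set $\{u_1,\ldots,u_{n-1}\}$ of $n-1$ orthonormal vectors in $\mathbb{R}^n$ and a permutation $\sigma \in \Sigma_n$ such that

$a_{\sigma(i)} = \sum_{j=1}^n (u_i)_j^2\mu_j$ for $i = 1,\ldots,n-1$.

Let $u_n \in \mathbb{R}^n$ be a vector of norm one that is orthogonal to $u_1,\ldots,u_{n-1}$. Then

$\sum_{j=1}^n (u_n)_j^2\mu_j = \sum_{j=1}^n \left(1 - \sum_{i=1}^{n-1}(u_i)_j^2\right)\mu_j = \mu_1 + \cdots + \mu_n - (a_{\sigma(1)} + \cdots + a_{\sigma(n-1)}) = a_{\sigma(n)}$.

This completes the proof in one direction. The other direction is covered by the first few lines of this section. □

Exercise 23.28. The components $\{x_1,\ldots,x_n\}$ of $x \in \mathbb{R}^n$ majorize the components $\{y_1,\ldots,y_n\}$ of $y \in \mathbb{R}^n$ if and only if

$y \in \operatorname{conv}\{P_\sigma x : \sigma \in \Sigma_n\}$.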


Exercise 23.29. Verify Lemma 23.18 when $\{x_1,\ldots,x_4\}$ majorizes $\{y_1,\ldots,y_4\}$ and $x_1 \ge x_2 \ge y_1 \ge y_2 \ge y_3 \ge x_3 \ge x_4$. [HINT: Express $y_1$ as a convex combination of $x_1$ and $x_4$ and $y_3$ as a convex combination of $x_2$ and $x_3$.]

23.7. Bibliographical notes

Applications of Perron-Frobenius theory to the control of a group of autonomous wheeled vehicles are found in the paper [60]. The presented proof of Ky Fan's inequality is adapted from an exercise with hints in [12]. The section on Google was gleaned from the articles [4] and [47]. Exercise 23.23 is based on a question raised and answered in class by Rangarajan Bharatram. Exercise 23.26 is taken from [79]. Exercise 23.27 is taken from [46]; 23.28 is a theorem of Rado. Additional discussion on the history of the Schur-Horn convexity theorem and references to generalizations may be found in [36]. Related applications are discussed in [17].

The definitive account of permanents up until about 1978 was undoubtedly the book Permanents by Henryk Minc [65]. However, in 1980/81 two proofs of van der Waerden's conjecture, which states that

The permanent of a doubly stochastic $n\times n$ matrix is bounded below by $n!/n^n$, with equality if and only if each entry in the matrix is equal to $1/n$,

were published. The later book Nonnegative Matrices [66] includes a proof of this conjecture.

Appendix A

Some facts from analysis

... a liberal arts school that administers a light education to students lured by fine architecture and low admission requirements. Among financially gifted parents of academically challenged students along the Eastern Seaboard, the college is known as a place where ..., barring a felony conviction, [your child] will get to wear a black gown and attend graduation. ...

Garrison Keillor [53], p. 7

A.1. Convergence of sequences of points

A sequence of points $x_1, x_2, \ldots \in \mathbb{R}$ is said to

• be bounded if there exists a finite number $M > 0$ such that $|x_j| \le M$ for $j = 1, 2, \ldots$,

• be monotonic if either $x_1 \le x_2 \le \cdots$ or $x_1 \ge x_2 \ge \cdots$,

o converge to a limit cc if for every 8 > 0 there exists an integer N such that

lxj—x| 0 there exists an integer N such that |5cj+k—xj| 0, there exists an integer N such that

|fj(x)-f(fc)| 0 there exists an integer N that is independent of the choice of at E Q such that

|fj(x) — f(x)| < 8

and every

for jz N

x E Q.
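The role of the phrase "independent of the choice of x" can be checked numerically. The following is a minimal sketch that uses the functions $f_j(x) = x^j$ on $0 \le x < 1$, with limit $f(x) = 0$. Both this example and the helper name `first_index_below` are illustrative assumptions and are not taken from the text.

```python
def first_index_below(x, eps):
    """Smallest j >= 1 with x**j < eps, assuming 0 <= x < 1 and eps > 0.

    The powers are formed by repeated multiplication. Since x**j
    decreases in j, the returned value is the smallest N such that
    |f_j(x) - 0| < eps for every j >= N when f_j(x) = x**j.
    """
    j, power = 1, x
    while power >= eps:
        j += 1
        power *= x
    return j


eps = 1e-3
points = [0.5, 0.9, 0.99, 0.999]
indices = [first_index_below(x, eps) for x in points]

for x, n in zip(points, indices):
    print(f"x = {x}: N = {n}")
```

The required N grows without bound as x approaches 1, so no single N serves every point of $0 \le x < 1$: the convergence there is pointwise but not uniform. On a smaller set such as $0 \le x \le 0.9$, the value of N found for $x = 0.9$ works for every point, because $x^j \le 0.9^j$ on that set.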

A.3. Convergence of sums

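A sum $\sum_{j=1}^{\infty} a_j$ of points $a_j \in \mathbb{R}$ is said to converge to a limit $a \in \mathbb{R}$ if the partial sums

$$S_n = \sum_{j=1}^{n} a_j$$

tend to $a$ as $n \uparrow \infty$, i.e., if for every $\varepsilon > 0$, there exists an integer $N$ such that

$$|S_n - a| < \varepsilon \quad\text{for}\quad n > N.$$

The Cauchy criterion for convergence then translates to: for every $\varepsilon > 0$ there exists an integer $N$ such that

$$|S_{n+k} - S_n| < \varepsilon \quad\text{for}\quad n > N \quad\text{and}\quad k \ge 1$$

or, equivalently,

$$\Big|\sum_{j=n+1}^{n+k} a_j\Big| < \varepsilon \quad\text{for}\quad n > N \quad\text{and}\quad k \ge 1.$$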


In particular, a sufficient (but not necessary) condition for convergence is that for every $\varepsilon > 0$ there exists an integer $N$ such that

$$\sum_{j=n+1}^{n+k} |a_j| < \varepsilon \quad\text{for}\quad n > N \quad\text{and}\quad k \ge 1$$

or, equivalently,

$$\sum_{j=1}^{\infty} |a_j| < \infty.$$
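The gap between "sufficient" and "necessary" can be seen numerically. The sketch below uses the alternating harmonic series $a_j = (-1)^{j+1}/j$, whose sum is $\log 2$. This series is an illustrative assumption and is not an example from the text, and the helper names are hypothetical. It compares the partial sums $S_n$ with the partial sums of $|a_j|$.

```python
import math


def partial_sums(terms):
    """Return the list S_1, ..., S_n of partial sums of the given terms."""
    sums, total = [], 0.0
    for a in terms:
        total += a
        sums.append(total)
    return sums


def alternating_harmonic(n):
    """First n terms a_j = (-1)^(j+1) / j for j = 1, ..., n."""
    return [(-1) ** (j + 1) / j for j in range(1, n + 1)]


n = 10_000
a = alternating_harmonic(n)
S = partial_sums(a)                        # S[i] holds S_{i+1}
abs_S = partial_sums([abs(x) for x in a])  # partial sums of |a_j|

# Distance from the last partial sum to the limit log 2.
error = abs(S[-1] - math.log(2))

# Block sum from the Cauchy criterion: |a_{n0+1} + ... + a_{n0+k}|.
n0, k = 1000, 500
cauchy_block = abs(S[n0 + k - 1] - S[n0 - 1])

print(error, cauchy_block, abs_S[-1])
```

Here the block sums become small and $S_n$ approaches $\log 2$, while the partial sums of $|a_j|$ keep growing roughly like $\log n$, so the series converges without satisfying the absolute convergence condition.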

A.4. Sups and infs

If $Q$ is a bounded set of points in $\mathbb{R}$, then:

$m$ is said to be a lower bound for $Q$ if $m \le x$ for every $x \in Q$.

$M$ is said to be an upper bound for $Q$ if $x \le M$ for every $x \in Q$.

Moreover, there exists a unique greatest lower bound $\widehat{m}$ for $Q$ and a unique least upper bound $\widehat{M}$ for $Q$. These numbers are referred to as the infimum and supremum of $Q$, respectively. They are denoted by the symbols

$$\widehat{m} = \inf\{x : x \in Q\} \quad\text{and}\quad \widehat{M} = \sup\{x : x \in Q\},$$

respectively, and may be characterized by the conditions

infimum: $\widehat{m} = \inf\{x : x \in Q\}$ if and only if $\widehat{m} \le x$ for every $x \in Q$, but for every $\varepsilon > 0$ there exists at least one point $x \in Q$ such that $x < \widehat{m} + \varepsilon$.

supremum: $\widehat{M} = \sup\{x : x \in Q\}$ if and only if $x \le \widehat{M}$ for every $x \in Q$, but for every $\varepsilon > 0$ there exists at least one point $x \in Q$ such that $x > \widehat{M} - \varepsilon$.

Let $x_1, x_2, \ldots$ be a sequence of points in $\mathbb{R}$ such that $|x_j| \le M < \infty$ and let

$$M_k = \sup\{x_k, x_{k+1}, \ldots\} \quad\text{for}\quad k = 1, 2, \ldots.$$

Then

$$M_1 \ge M_2 \ge M_3 \ge \cdots \ge -M;$$

i.e., $M_1, M_2, \ldots$ is a bounded monotone sequence. Therefore $\lim_{j \uparrow \infty} M_j$ exists, even though the original sequence $x_1, x_2, \ldots$ may not have a limit. (Think of the sequence $0, 1, 0, 1, \ldots$.) This number is called the limit superior and is written

$$\limsup_{j \uparrow \infty} x_j = \lim_{j \uparrow \infty} \sup\{x_j, x_{j+1}, \ldots\}.$$

The limit inferior is defined similarly:

$$\liminf_{j \uparrow \infty} x_j = \lim_{j \uparrow \infty} \inf\{x_j, x_{j+1}, \ldots\}.$$
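The tail suprema $M_k$ and the corresponding tail infima can be approximated on a finite window of a sequence. The sketch below is only an illustration under that truncation assumption. The helper names `tail_sups` and `tail_infs` and the second test sequence $x_j = (-1)^j(1 + 1/j)$ are hypothetical choices and do not come from the text.

```python
def tail_sups(xs):
    """Return [max(xs[k:]) for each k], computed in one pass from the right."""
    out = [0.0] * len(xs)
    running = float("-inf")
    for k in range(len(xs) - 1, -1, -1):
        running = max(running, xs[k])
        out[k] = running
    return out


def tail_infs(xs):
    """Return [min(xs[k:]) for each k], computed in one pass from the right."""
    out = [0.0] * len(xs)
    running = float("inf")
    for k in range(len(xs) - 1, -1, -1):
        running = min(running, xs[k])
        out[k] = running
    return out


# The sequence 0, 1, 0, 1, ... (first 100 terms).
zero_one = [j % 2 for j in range(100)]
sups = tail_sups(zero_one)
infs = tail_infs(zero_one)

# x_j = (-1)^j (1 + 1/j) for j = 1, ..., 2000: the tail sups decrease
# toward 1 and the tail infs increase toward -1 within the window.
wobble = [(-1) ** j * (1 + 1 / j) for j in range(1, 2001)]
wobble_sups = tail_sups(wobble)
wobble_infs = tail_infs(wobble)
```

Entries near the right end of the window see only a few terms (the last tail consists of a single term), so only values well inside the window are meaningful approximations of the true tail suprema and infima.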


A.5. Topology

Let

$$B_r(y) = \{x \in \mathbb{R} : |x - y| < r\}$$

[...] $> 0$. The symbol int $A$ will be used to denote the set of all interior points of $A$. This set is called the interior of $A$.

Appendix B

More complex variables

The game was not as close as the score indicated.

Rud Rennie (after observing a 19 to 1 rout), cited in [51], p. 48

This appendix presents some facts on complex variable theory that supplement the brief introduction given in Chapter 17.

B.1. Power series

An infinite series of the form $\sum_{n=0}^{\infty} a_n(\lambda - \omega)^n$ is called a power series and the number $R$ that is defined by the formula

$$\frac{1}{R} = \limsup_{k \uparrow \infty}\{|a_k|^{1/k}\},$$

with the understanding that

$$R = \infty \quad\text{if}\quad \limsup_{k \uparrow \infty} |a_k|^{1/k} = 0 \quad\text{and}\quad R = 0 \quad\text{if}\quad \limsup_{k \uparrow \infty} |a_k|^{1/k} = \infty,$$

is termed the radius of convergence of the power series. The name stems from the fact that the series converges if $|\lambda - \omega| < R$ and diverges if $|\lambda - \omega| > R$. Thus, for example, if $0 < R < \infty$ and $0 < r < R$, then the partial sums

$$f_n(\lambda) = \sum_{k=0}^{n} a_k(\lambda - \omega)^k$$

[...] $> 0$. Then

[...] $> 0$, thanks to Lemma B.4. Therefore, by Lemma B.6, we can write

$$h(\zeta) = 1 - (1 - h(\zeta)) = \exp\{\cdots\}$$

[...] $\infty$. Therefore,

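$$|f(\lambda) - \{g_1(\lambda) + \cdots + g_\ell(\lambda)\}| \le M < \infty \quad\text{for all}\quad \lambda \in \mathbb{C}.$$

Thus, by Liouville’s theorem,

$$f(\lambda) - \{g_1(\lambda) + \cdots + g_\ell(\lambda)\} = c$$

for every point $\lambda \in \mathbb{C}$. □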

Bibliography

[1] Gregory S. Ammar and William B. Gragg. Schur flows for orthogonal Hessenberg matrices. Fields Inst. Commun., 3:27–34, 1994.
[2] Tsuyoshi Ando. Concavity of certain maps on positive definite matrices and applications to Hadamard products. Linear Algebra Appl., 26:203–241, 1979.
[3] Tom M. Apostol. Mathematical Analysis. Addison-Wesley, 1957.
[4] David Austin. How Google Finds Your Needle in the Web’s Haystack. Feature Column from the AMS, December 2006. www.ams.org/samplings/feature-column/fcarc-pagerank.
[5] Sheldon Axler. Down with determinants. Amer. Math. Monthly, 102:139–154, 1995.
[6] Mihály Bakonyi and Hugo J. Woerdeman. Matrix Completions, Moments and Sums of Hermitian Squares. Princeton University Press, 2011.
[7] Harm Bart, Israel Gohberg, and Marinus A. Kaashoek. Minimal Factorization of Matrix and Operator Functions. Birkhäuser, 1979.
[8] Dimitris Bertsimas and John N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.
[9] Rajendra Bhatia. Matrix Analysis. Springer-Verlag, 1997.
[10] Rajendra Bhatia. On the exponential metric increasing property. Linear Algebra Appl., 375:211–220, 2003.
[11] Vladimir Bolotnikov and Harry Dym. On degenerate interpolation, entropy and extremal problems for matrix Schur functions. Integral Equations Operator Theory, 32:367–435, 1998.
[12] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization. Theory and Examples. Springer, 2006.
[13] Albrecht Böttcher and Harold Widom. Szegő via Jacobi. Linear Algebra Appl., 419:656–657, 2006.
[14] Truman Botts. Convex sets. Amer. Math. Monthly, 49:527–535, 1942.
[15] Stanley Boylan. Learning With the Rav: Learning from the Rav. Tradition, 30:131–144, 1996.


[16] Alfred Brauer. Limits for the characteristic roots of a matrix. IV: Applications to stochastic matrices. Duke Math. J., 19:75–91, 1952.
[17] Roger Brockett. Using feedback to improve system identification. Lecture Notes in Control and Information Sciences, 329:45–65, 2006.
[18] Juan F. Camino, J. William Helton, Robert E. Skelton, and Ye Jieping. Matrix inequalities: a symbolic procedure to determine convexity automatically. Integral Equations Operator Theory, 46:399–454, 2003.
[19] Shalom Carmy. Polyphonic diversity and military music. Tradition, 34:6–32, 2000.
[20] Hervé Chapellat and S. P. Bhattacharyya. An alternative proof of Kharitonov’s theorem. IEEE Trans. Automatic Control, 34:448–450, 1989.
[21] Barry Cipra. Andy Rooney, PhD. The Mathematical Intelligencer, 10:10, 1988.
[22] Branko Ćurgus and Aad Dijksma. A proof of the main theorem on Bezoutians. Elemente der Mathematik, in press, arXiv:1208.2385v2.
[23] Chandler Davis, W. M. Kahan, and Hans F. Weinberger. Norm preserving dilations and their applications to optimal error bounds. SIAM J. Numer. Anal., 19:445–469, 1982.
[24] Ilan Degani. RCMS - right correction Magnus schemes for oscillatory ODE’s and cubature formulas and oscillatory extensions. PhD thesis, The Weizmann Institute of Science, 2005.
[25] Klaus Diepold. Intersection of subspaces. Patrick Dewilde Workshop on Algebra, Networks, Signal Processing and System Theory, Wassenaar, June 2008.
[26] Ronald G. Douglas. On majorization, factorization and range inclusion of operators in Hilbert spaces. Proc. Amer. Math. Soc., 17:413–415, 1966.
[27] Chen Dubi and Harry Dym. Riccati inequalities and reproducing kernel Hilbert spaces. Linear Algebra Appl., 420:458–482, 2007.
[28] Peter Duren. Invitation to Classical Analysis. Amer. Math. Soc., 2012.
[29] Harry Dym. J Contractive Matrix Functions, Reproducing Kernel Hilbert Spaces, and Interpolation. Amer. Math. Soc., 1989.
[30] Harry Dym. On Riccati equations and reproducing kernel spaces. Oper. Theory Adv. Appl., 124:189–215, 2001.
[31] Harry Dym. Riccati equations and bitangential interpolation problems with singular Pick matrices. Contemporary Mathematics, 323:361–391, 2002.
[32] Harry Dym and Israel Gohberg. Extensions of band matrices with band inverses. Linear Algebra Appl., 36:1–24, 1981.
[33] Harry Dym and Israel Gohberg. Extensions of kernels of Fredholm operators. Journal d’Analyse Mathématique, 42:51–97, 1982/1983.
[34] Harry Dym and J. William Helton. The matrix multidisk problem. Integral Equations Operator Theory, 46:285–339, 2003.
[35] Harry Dym, J. William Helton, and Scott McCullough. The Hessian of a noncommutative polynomial has numerous negative eigenvalues. Journal d’Analyse Mathématique, 102:29–76, 2007.
[36] Harry Dym and Victor Katsnelson. Contributions of Issai Schur to analysis. Progr. Math., 210:xci–clxxxviii, 2003.
[37] Harry Dym and David P. Kimsey. Trace formulas. Linear Algebra Appl., 439:3070–3099, 2013.
[38] Richard S. Ellis. Entropy, Large Deviations and Statistical Mechanics. Springer-Verlag, 1985.


[39] Ludwig D. Faddeev. 30 years in mathematical physics. Proc. Steklov Institute, 176:3–28, 1988.
[40] Avraham Feintuch. Robust Control Theory in Hilbert Space. Springer, 1998.
[41] J. I. Fujii, M. Fujii, T. Furuta, and R. Nakamoto. Norm inequalities equivalent to Heinz inequality. Proc. Amer. Math. Soc., 118:827–830, 1993.
[42] T. Furuta. Norm inequalities equivalent to Löwner-Heinz theorem. Rev. Math. Phys., 1:135–137, 1989.
[43] Israel Gohberg, Marinus A. Kaashoek, and Hugo J. Woerdeman. The band method for positive and contractive extension problems. J. Operator Theory, 22:109–155, 1989.
[44] Israel Gohberg and Mark G. Krein. Introduction to the Theory of Linear Nonselfadjoint Operators. American Math. Soc., 1969.
[45] Paul Halmos. A Hilbert Space Problem Book. Van Nostrand, 1967.
[46] G. H. Hardy, J. E. Littlewood, and G. Pólya. Inequalities. Cambridge University Press, 1959.
[47] Taher Haveliwala and Sepandar Kamvar. The second eigenvalue of the google matrix. Technical report, Stanford University.
[48] J. William Helton and Ilya M. Spitkovsky. The possible shapes of numerical ranges. Oper. Matrices, 6:607–611, 2012.
[49] Michael Heymann. The pole shifting theorem revisited. IEEE Trans. Automatic Control, 24:479–480, 1979.
[50] Alfred Horn. Doubly stochastic matrices and the diagonal of a rotation. Amer. J. Math., 76:620–630, 1953.
[51] Roger Kahn. Memories of Summer. University of Nebraska Press, 1997.
[52] Yakar Kannai. An elementary proof of the no-retraction theorem. American Math. Monthly, 88:264–268, 1981.
[53] Garrison Keillor. Woebegone Boy. Viking, 1997.
[54] Donald E. Knuth, Tracy Larrabee, and Paul M. Roberts. Mathematical Writing. Mathematical Association of America, 1989.
[55] Peter Lancaster and Leiba Rodman. Algebraic Riccati Equations. Oxford University Press, 1995.
[56] Peter Lancaster and Miron Tismenetsky. The Theory of Matrices. Academic Press, 1985.
[57] F. I. Lander. The Bezoutian and the inversion of Hankel and Toeplitz matrices. Matem. Issled. Kishinev, 9:69–87, 1974.
[58] Joseph LaSalle and Solomon Lefschetz. Stability by Liapunov’s Direct Method with Applications. Academic Press, 1961.
[59] E. Levy and Orr M. Shalit. Dilation theory in finite dimensions: the possible, the impossible and the unknown. Rocky Mountain J., in press, arXiv:1012.4514.
[60] Zhiyun Lin, Bruce Francis, and Manfredi Maggiore. Necessary and sufficient graphical conditions for formation control of unicycles. IEEE Trans. Automatic Control, 50:121–127, 2005.
[61] David G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, 1969.
[62] André Malraux. Antimemoirs. Holt Rhinehart and Winston, 1968.
[63] John E. McCarthy and Orr M. Shalit. Unitary N-dilations for tuples of commuting matrices. Proc. Amer. Math. Soc., 141:563–571, 2013.


[64] Alan McIntosh. The Toeplitz-Hausdorff theorem and ellipticity conditions. Amer. Math. Monthly, 85:475–477, 1978.
[65] Henryk Minc. Permanents. Addison-Wesley, 1978.
[66] Henryk Minc. Nonnegative Matrices. John Wiley, 1988.
[67] Leon Mirsky. On the trace of matrix products. Math. Nachr., 20:171–174, 1959.
[68] Patrick O’Brian. Master and Commander. Norton, 1990.
[69] Patrick O’Brian. The Far Side of the World. Norton, 1992.
[70] Alex Olshevsky and Vadim Olshevsky. Kharitonov’s theorem and Bezoutians. Linear Algebra Appl., 399:285–297, 2005.
[71] Vladimir Peller. Hankel Operators and Their Applications. Springer, 1957.
[72] Elijah Polak. Optimization: Algorithms and Consistent Approximation. Springer-Verlag, 2003.
[73] Vladimir P. Potapov. The multiplicative structure of J-contractive matrix functions. Amer. Math. Soc. Transl. (2), 15:131–243, 1960.
[74] Aaron Rakeffet-Rothkoff. The Rav: The World of Rabbi Joseph B. Soloveitchik, Volume 2. Ktav Publishing House, 1999.
[75] Walter Rudin. Real and Complex Analysis. McGraw Hill, 1966.
[76] David L. Russell. Mathematics of Finite-Dimensional Control Systems. Marcel Dekker, 1979.
[77] Thomas L. Saaty and Joseph Bram. Nonlinear Mathematics. Dover, 1981.
[78] Gabriela Sansigre and Manuel Alvarez. On Bezoutian reduction with the Vandermonde matrix. Linear Algebra Appl., 121:401–408, 1989. Linear algebra and applications (Valencia, 1987).
[79] Issai Schur. Über eine Klasse von Mittelbildungen mit Anwendungen auf die Determinantentheorie. Sitzungsberichte der Berliner Mathematischen Gesellschaft, 22:9–20, 1923.
[80] Alan Shuchat. Generalized least squares and eigenvalues. Amer. Math. Monthly, 92:656–659, 1985.
[81] Barry Simon. OPUC on one foot. Bull. Amer. Math. Soc., 42:431–460, 2005.
[82] Barry Simon. The sharp form of the strong Szegő theorem. Contemporary Mathematics, 387:253–275, 2005.
[83] Joshua Sobol. Remarks upon being awarded an honorary doctorate by the Weizmann Institute in 2008.
[84] Teiji Takagi. An algebraic problem related to an analytic theorem of Carathéodory and Fejér on an allied theorem of Landau. Japanese J. of Mathematics, 1:83–93, 1924.
[85] E. C. Titchmarsh. Eigenfunction expansions associated with second-order differential equations. Vol. 2. Oxford, 1958.
[86] Lloyd N. Trefethen and David Bau, III. Numerical Linear Algebra. SIAM, 1997.
[87] Sergei Treil and Alexander Volberg. Wavelets and the angle between past and future. J. Funct. Anal., 143:269–308, 1997.
[88] Robert James Waller. The Bridges of Madison County. Warner Books, 1992.
[89] Roger Webster. Convexity. Oxford University Press, 1994.
[90] W. Murray Wonham. Linear Multivariable Control. Springer-Verlag, 1985.
[91] Kosaku Yosida. Functional Analysis. Springer-Verlag, 1965.
[92] Kemin Zhou, John C. Doyle, and Keith Glover. Robust and Optimal Control. Prentice Hall, 1996.
